gene bed12#
The trackc.pl.gene_track method takes a BED12 file as its input format.
A description of the BED12 format can be found at the link below: https://bedtools.readthedocs.io/en/latest/content/general-usage.html#genome-file-format.
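For quick reference, the twelve BED12 fields can be sketched in Python as follows. The field names follow the common BED convention; the `BED12_COLUMNS` list, the `parse_bed12_line` helper, and the example line are illustrative assumptions, not part of trackc.

```python
# Standard BED12 field names in column order (illustrative constant, not a trackc API).
BED12_COLUMNS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes", "blockStarts",
]


def parse_bed12_line(line):
    """Split one tab-separated BED12 line into a dict keyed by field name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(BED12_COLUMNS):
        raise ValueError(f"expected 12 fields, got {len(fields)}")
    record = dict(zip(BED12_COLUMNS, fields))
    # Coordinates and counts are integers; BED starts are 0-based, ends are exclusive.
    for key in ("chromStart", "chromEnd", "thickStart", "thickEnd", "blockCount"):
        record[key] = int(record[key])
    return record


# A made-up example line, for illustration only.
example = "chr1\t100\t500\tgeneA\t0\t+\t100\t500\t0\t2\t50,100,\t0,300,"
record = parse_bed12_line(example)
print(record["name"], record["chromEnd"] - record["chromStart"])
```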
trackc gtf2bed#
If you have installed trackc, you can convert GTF to BED12 using the trackc gtf2bed command. By default, column 4 of the output BED12 file is the gene identifier, typically gene_name or gene_id from the GTF attributes. To include gene_biotype as a 13th column (BED13 format), use the --biotype2bed13 flag.
trackc gtf2bed GRCh38.84.gtf -o GRCh38.84.bed12
# To include gene biotype as a 13th column:
# trackc gtf2bed GRCh38.84.gtf -o GRCh38.84.bed13 --biotype2bed13
1 | 11869 | 14409 | DDX11L1 | 0 | 14409 | 14409 | 0 | 9 | 358,108,1188,47,48,84,77,153,217 | 11869,12613,13221,12010,12179,12613,12975,13221,13453
1 | 14404 | 29570 | WASH7P | 0 | 29570 | 29570 | 0 | 11 | 36,153,98,146,136,135,197,158,151,33,97 | 29534,24738,18268,17915,17606,17233,16858,16607,15796,15005,14404
1 | 17369 | 17436 | MIR6859-1 | 0 | 17436 | 17436 | 0 | 1 | 67 | 17369
1 | 29554 | 31109 | RP11-34P13.3 | 0 | 31109 | 31109 | 0 | 5 | 485,103,121,400,133 | 29554,30564,30976,30267,30976
gtf2bed.pl#
You can also use gtf2bed to convert a GTF file to BED12 format.
gtf2bed is a Perl script that can be downloaded from the link below: https://github.com/ExpressionAnalysis/ea-utils/blob/master/clipper/gtf2bed.
perl gtf2bed GRCh38.84.gtf >GRCh38.84.gtf.bed12
The table below shows the output of gtf2bed:
1 | 11868 | 14409 | ENST00000456328 | 0 | 11868 | 14409 | 0 | 3 | 359,109,1189, | 0,744,1352,
1 | 12009 | 13670 | ENST00000450305 | 0 | 12009 | 13670 | 0 | 6 | 48,49,85,78,154,218, | 0,169,603,965,1211,1443,
1 | 17368 | 17436 | ENST00000619216 | 0 | 17368 | 17436 | 0 | 1 | 68, | 0,
1 | 14403 | 29570 | ENST00000488147 | 0 | 14403 | 29570 | 0 | 11 | 98,34,152,159,198,136,137,147,99,154,37, | 0,601,1392,2203,2454,2829,3202,3511,3864,10334,15130,
Please note that the BED12 file produced by gtf2bed.pl is transcript-based: each row is one transcript, and column 4 is the transcript ID, not the gene name.
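In the gtf2bed output, the last two columns (block sizes and block starts) are comma-separated lists with a trailing comma, and the block starts are offsets relative to the start coordinate in column 2. A minimal sketch of recovering exon coordinates from such a row, assuming the standard BED12 meaning of these columns; `exon_intervals` is a hypothetical helper, not part of trackc or gtf2bed:

```python
def exon_intervals(chrom_start, block_sizes, block_starts):
    """Return (start, end) pairs for each block of a BED12 row.

    block_sizes and block_starts are the comma-separated strings from the
    last two BED12 columns; a trailing comma is tolerated.
    """
    sizes = [int(x) for x in block_sizes.split(",") if x]
    starts = [int(x) for x in block_starts.split(",") if x]
    if len(sizes) != len(starts):
        raise ValueError("blockSizes and blockStarts differ in length")
    # Block starts are offsets from chrom_start; ends are exclusive.
    return [(chrom_start + s, chrom_start + s + size) for s, size in zip(starts, sizes)]


# Values taken from the ENST00000456328 row of the table above.
exons = exon_intervals(11868, "359,109,1189,", "0,744,1352,")
print(exons)
```

For this row, the end of the last interval coincides with the transcript end in column 3 (14409), which is a quick consistency check on a parsed row.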
read bed12#
Read the BED12 file into a pd.DataFrame to use as input data for trackc.pl.gene_track:
import pandas as pd

gene_bed12 = pd.read_table("GRCh38.84.bed12", sep="\t", header=None)
# Chromosome naming (e.g. "1" vs. "chr1") may differ between datasets.
# To keep chromosome names consistent, use one of the following lines.
# Remove a leading "chr" (a regex is used because lstrip('chr') strips characters, not a prefix):
#gene_bed12[0] = gene_bed12[0].astype(str).str.replace('^chr', '', regex=True)
# Add a leading "chr":
#gene_bed12[0] = 'chr' + gene_bed12[0].astype(str)
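When the first column holds only bare chromosome numbers, pandas parses it as integers, and string operations on it then fail. A minimal sketch of one way around this, reading the column as strings via `dtype={0: str}` and normalizing the prefix; the in-memory rows are made up for illustration and `add_chr_prefix` is a hypothetical helper, not a trackc function:

```python
import io

import pandas as pd

# Two made-up BED12 rows standing in for a real file (illustration only).
bed12_text = (
    "1\t100\t500\tgeneA\t0\t+\t100\t500\t0\t2\t50,100,\t0,300,\n"
    "2\t200\t400\tgeneB\t0\t-\t200\t400\t0\t1\t200,\t0,\n"
)

# dtype={0: str} keeps chromosome names such as "1" as strings rather than integers.
gene_bed12 = pd.read_table(io.StringIO(bed12_text), sep="\t", header=None, dtype={0: str})


def add_chr_prefix(chrom):
    """Return a chromosome Series in which every name has exactly one leading 'chr'."""
    stripped = chrom.str.replace(r"^chr", "", regex=True)
    return "chr" + stripped


gene_bed12[0] = add_chr_prefix(gene_bed12[0])
print(gene_bed12[0].tolist())
```

Stripping any existing prefix before adding one makes the helper safe to apply to files that already use "chr" names.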