gene bed12#

The trackc.pl.gene_track method takes BED12 as its input format.

A description of the BED12 format can be found in the bedtools documentation: https://bedtools.readthedocs.io/en/latest/content/general-usage.html#genome-file-format

trackc gtf2bed#

If you have installed trackc, you can convert GTF to BED12 with the trackc gtf2bed command. By default, column 4 of the output BED12 file is the gene identifier, typically gene_name or gene_id from the GTF attributes. To include gene_biotype as a 13th column (BED13 format), use the --biotype2bed13 flag.

trackc gtf2bed GRCh38.84.gtf -o GRCh38.84.bed12
# To include gene biotype as a 13th column:
# trackc gtf2bed GRCh38.84.gtf -o GRCh38.84.bed13 --biotype2bed13

bed12-gene-name#
1	11869	14409	DDX11L1	14409	14409	9	358,108,1188,47,48,84,77,153,217	11869,12613,13221,12010,12179,12613,12975,13221,13453
1	14404	29570	WASH7P	29570	29570	11	36,153,98,146,136,135,197,158,151,33,97	29534,24738,18268,17915,17606,17233,16858,16607,15796,15005,14404
1	17369	17436	MIR6859-1	17436	17436	1	67	17369
1	29554	31109	RP11-34P13.3	31109	31109	5	485,103,121,400,133	29554,30564,30976,30267,30976

gtf2bed.pl#

You can also use gtf2bed to convert GTF format to BED12 format.

gtf2bed is a Perl script available at the link below: https://github.com/ExpressionAnalysis/ea-utils/blob/master/clipper/gtf2bed

perl gtf2bed GRCh38.84.gtf >GRCh38.84.gtf.bed12

The table below shows the output of gtf2bed:

bed12-transcript-id#
1	11868	14409	ENST00000456328	11868	14409	3	359,109,1189,	0,744,1352,
1	12009	13670	ENST00000450305	12009	13670	6	48,49,85,78,154,218,	0,169,603,965,1211,1443,
1	17368	17436	ENST00000619216	17368	17436	1	68,	0,
1	14403	29570	ENST00000488147	14403	29570	11	98,34,152,159,198,136,137,147,99,154,37,	0,601,1392,2203,2454,2829,3202,3511,3864,10334,15130,

Note that the BED12 output of gtf2bed.pl is transcript-based: each row is one transcript, and column 4 holds the transcript ID, not the gene name.
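
To make the row layout concrete, here is a minimal sketch of how the exon intervals of one transcript row can be recovered. It assumes, as in standard BED12 and as the gtf2bed table above appears to show, that the last two columns are comma-separated block sizes and block starts relative to the transcript start. The helper name bed12_exons is hypothetical and is not part of trackc or gtf2bed.

```python
def bed12_exons(chrom_start, block_sizes, block_starts):
    """Return absolute (start, end) exon intervals for one BED12 row.

    Assumes block_starts are offsets relative to chrom_start
    (the standard BED12 convention); trailing commas are tolerated.
    """
    sizes = [int(x) for x in block_sizes.split(",") if x]
    starts = [int(x) for x in block_starts.split(",") if x]
    if len(sizes) != len(starts):
        raise ValueError("blockSizes and blockStarts differ in length")
    return [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]


# First row of the gtf2bed table above (ENST00000456328)
exons = bed12_exons(11868, "359,109,1189,", "0,744,1352,")
print(exons)  # [(11868, 12227), (12612, 12721), (13220, 14409)]
```

The last exon ends at 14409, which matches the end coordinate of the row. The block starts in the trackc gtf2bed table look like absolute positions rather than offsets, so this helper would not apply to that file without adjustment.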

read bed12#

Read the BED12 file into a pd.DataFrame to use as input for trackc.pl.gene_track:

import pandas as pd

gene_bed12 = pd.read_table("GRCh38.84.bed12", sep="\t", header=None)
# Chromosome naming may differ between datasets (e.g. "1" vs "chr1").
# To keep the naming consistent across tracks, use one of the following lines.
# Remove a leading "chr" (lstrip('chr') would strip any leading 'c', 'h' or 'r' characters, not the prefix):
#gene_bed12[0] = gene_bed12[0].astype(str).str.replace('^chr', '', regex=True)
# Add a "chr" prefix:
#gene_bed12[0] = 'chr' + gene_bed12[0].astype(str)
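
The chromosome renaming can also be written as two small helpers that are safe to apply more than once. This is only a sketch: the names remove_chr_prefix and add_chr_prefix are hypothetical and not part of trackc. It also shows why str.lstrip('chr') is risky, since it strips a set of characters rather than a prefix.

```python
def remove_chr_prefix(name):
    """Drop a leading 'chr' if present; leave other names untouched."""
    name = str(name)
    return name[3:] if name.startswith("chr") else name


def add_chr_prefix(name):
    """Add a leading 'chr' unless the name already has one."""
    name = str(name)
    return name if name.startswith("chr") else "chr" + name


print(remove_chr_prefix("chr1"))    # 1
print(add_chr_prefix(1))            # chr1
print(add_chr_prefix("chrX"))       # chrX (not chrchrX)

# lstrip removes any leading 'c', 'h' or 'r' characters, so a contig
# name such as "hs37d5" loses its first letter:
print("hs37d5".lstrip("chr"))       # s37d5
print(remove_chr_prefix("hs37d5"))  # hs37d5
```

With a DataFrame loaded as above, the helpers can be applied to the first column with gene_bed12[0] = gene_bed12[0].map(add_chr_prefix).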