gene bed12#

The trackc.pl.gene_track method takes BED12 as its input format.

A description of the BED12 format can be found at the link below: https://bedtools.readthedocs.io/en/latest/content/general-usage.html#genome-file-format
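As a rough illustration of the format, the sketch below splits one BED12 line into the twelve standard fields and derives exon intervals from the block columns. The field names follow the usual BED convention; the example line, `parse_bed12_line` and `exon_intervals` are hypothetical and not part of trackc. It assumes blockStarts are offsets relative to chromStart, as in the standard BED definition.

```python
# Standard BED12 field names, in column order.
BED12_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount",
    "blockSizes", "blockStarts",
]


def parse_bed12_line(line):
    """Split one tab-separated BED12 line into a dict of named fields."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(cells)}")
    rec = dict(zip(BED12_FIELDS, cells[:12]))
    for key in ("chromStart", "chromEnd", "thickStart", "thickEnd", "blockCount"):
        rec[key] = int(rec[key])
    # Comma-separated lists may end with a trailing comma; skip empty items.
    for key in ("blockSizes", "blockStarts"):
        rec[key] = [int(x) for x in rec[key].split(",") if x]
    return rec


def exon_intervals(rec):
    """Return (start, end) per block, assuming blockStarts are relative to chromStart."""
    return [
        (rec["chromStart"] + start, rec["chromStart"] + start + size)
        for start, size in zip(rec["blockStarts"], rec["blockSizes"])
    ]


# Made-up record with two blocks, for illustration only.
example = "chr1\t1000\t5000\tgeneA\t0\t+\t1200\t4800\t0\t2\t500,400\t0,3600"
rec = parse_bed12_line(example)
exons = exon_intervals(rec)
print(rec["name"], rec["strand"], exons)
```

Some files store absolute positions in the blockStarts column instead of offsets; in that case the addition of chromStart in `exon_intervals` should be dropped.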

trackc gtf2bed#

If trackc is installed, you can convert GTF to BED12 with the trackc gtf2bed command. By default, column 4 of the output BED12 file is the gene identifier, typically gene_name or gene_id from the GTF attributes. To include gene_biotype as a 13th column (BED13 format), use the --biotype2bed13 flag.

trackc gtf2bed GRCh38.84.gtf -o GRCh38.84.bed12
# To include gene biotype as a 13th column:
# trackc gtf2bed GRCh38.84.gtf -o GRCh38.84.bed13 --biotype2bed13
bed12-gene-name#

1	11869	14409	DDX11L1	0	14409	14409	0	9	358,108,1188,47,48,84,77,153,217	11869,12613,13221,12010,12179,12613,12975,13221,13453
1	14404	29570	WASH7P	0	29570	29570	0	11	36,153,98,146,136,135,197,158,151,33,97	29534,24738,18268,17915,17606,17233,16858,16607,15796,15005,14404
1	17369	17436	MIR6859-1	0	17436	17436	0	1	67	17369
1	29554	31109	RP11-34P13.3	0	31109	31109	0	5	485,103,121,400,133	29554,30564,30976,30267,30976

gtf2bed.pl#

You can also use gtf2bed to convert GTF to BED12.

gtf2bed is a Perl script available at the link below: https://github.com/ExpressionAnalysis/ea-utils/blob/master/clipper/gtf2bed

perl gtf2bed GRCh38.84.gtf >GRCh38.84.gtf.bed12

The table below shows the output of gtf2bed:

bed12-transcript-id#

1	11868	14409	ENST00000456328	0	11868	14409	0	3	359,109,1189,	0,744,1352,
1	12009	13670	ENST00000450305	0	12009	13670	0	6	48,49,85,78,154,218,	0,169,603,965,1211,1443,
1	17368	17436	ENST00000619216	0	17368	17436	0	1	68,	0,
1	14403	29570	ENST00000488147	0	14403	29570	0	11	98,34,152,159,198,136,137,147,99,154,37,	0,601,1392,2203,2454,2829,3202,3511,3864,10334,15130,

Note that the BED12 file produced by gtf2bed.pl is transcript-based: each row is one transcript, and column 4 holds the transcript ID, not the gene name.
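If gene names are preferred in column 4 of a transcript-based file, one possible approach is to build a transcript-to-gene lookup from the GTF attributes and rewrite that column. The sketch below is only an illustration: `transcript_to_gene` and `rename_column4` are hypothetical helpers, not part of trackc or gtf2bed, and the demo lines are made up. It assumes the usual GTF attribute syntax (`key "value";`) and falls back to gene_id when gene_name is absent.

```python
import re


def transcript_to_gene(gtf_lines):
    """Build {transcript_id: gene_name} from GTF lines, falling back to gene_id."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = cols[8]
        tid = re.search(r'transcript_id "([^"]+)"', attrs)
        gene = (re.search(r'gene_name "([^"]+)"', attrs)
                or re.search(r'gene_id "([^"]+)"', attrs))
        if tid and gene:
            mapping[tid.group(1)] = gene.group(1)
    return mapping


def rename_column4(bed_lines, mapping):
    """Replace column 4 of each BED line when the ID is in the mapping."""
    renamed = []
    for line in bed_lines:
        cols = line.rstrip("\n").split("\t")
        cols[3] = mapping.get(cols[3], cols[3])  # unknown IDs are left unchanged
        renamed.append("\t".join(cols))
    return renamed


# Made-up GTF and BED lines, for illustration only.
gtf_demo = [
    '#!demo header',
    '1\tdemo\ttranscript\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "TX1"; gene_name "GENE_A";',
    '1\tdemo\ttranscript\t300\t400\t.\t-\t.\tgene_id "G2"; transcript_id "TX2";',
]
bed_demo = ["1\t99\t200\tTX1", "1\t299\t400\tTX2", "1\t500\t600\tTX3"]

mapping = transcript_to_gene(gtf_demo)
renamed = rename_column4(bed_demo, mapping)
print("\n".join(renamed))
```

The result still has one row per transcript; the rows are only relabeled, not merged into gene-level records.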

read bed12#

Read the BED12 file into a pd.DataFrame to use as input data for trackc.pl.gene_track.

import pandas as pd

gene_bed12 = pd.read_table("GRCh38.84.bed12", sep="\t", header=None)
# Chromosome naming may differ between datasets (e.g. "1" vs. "chr1").
# To keep the naming consistent, uncomment one of the following lines.
# Remove a leading "chr" prefix:
#gene_bed12[0] = gene_bed12[0].astype(str).str.replace(r'^chr', '', regex=True)
# Add a "chr" prefix:
#gene_bed12[0] = 'chr' + gene_bed12[0].astype(str)
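Before plotting, it can help to inspect the table with named columns and check that the block columns are consistent. The following is a minimal sketch under stated assumptions: `read_bed12` and `block_count_mismatches` are hypothetical helpers, not trackc functions, and the two demo rows are made up. The renamed copy is meant for inspection only; whether trackc.pl.gene_track accepts named columns is not confirmed here, so the DataFrame read with header=None above may be the safer input.

```python
import io

import pandas as pd

# Standard BED12 field names, in column order.
BED12_COLUMNS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount",
    "blockSizes", "blockStarts",
]


def read_bed12(path_or_buffer):
    """Read a BED12 (or BED13) file and name the first 12 columns."""
    # Read chromosome names as strings so "1" and "X" share one dtype.
    df = pd.read_table(path_or_buffer, header=None, dtype={0: str})
    if df.shape[1] < 12:
        raise ValueError(f"expected at least 12 columns, got {df.shape[1]}")
    # Extra columns (e.g. a 13th biotype column) keep their integer labels.
    return df.rename(columns=dict(enumerate(BED12_COLUMNS)))


def block_count_mismatches(df):
    """Return rows whose blockCount differs from the number of blockSizes entries."""
    # Tolerate a trailing comma, as in the gtf2bed output shown above.
    n_sizes = df["blockSizes"].astype(str).str.strip(",").str.split(",").str.len()
    return df[n_sizes != df["blockCount"]]


# Made-up rows, for illustration only.
demo = (
    "chr1\t1000\t5000\tgeneA\t0\t+\t1200\t4800\t0\t2\t500,400,\t0,3600,\n"
    "chr1\t6000\t6100\tgeneB\t0\t-\t6100\t6100\t0\t1\t100\t0\n"
)
df = read_bed12(io.StringIO(demo))
bad_rows = block_count_mismatches(df)
print(df[["chrom", "chromStart", "chromEnd", "name", "strand"]])
print("rows with inconsistent blocks:", len(bad_rows))
```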