Only noncoding annotations returned despite protein-coding genes in reference DB #6
Open
Description
Hi, thanks for this tool!
I'm running into an issue where every annotation is classified as noncoding, even though my reference database clearly contains protein-coding genes.
Interestingly, this seems related to the already reported (but unresolved) issue:
#3
Instead of expected annotations like:
- CDS
- exon
- UTR
I only get:
- distnoncoding_intron500
- intergenic
- noncoding_exon
- proxnoncoding_intron500
I generated my own gffutils database from GENCODE:
- Source: GENCODE Comprehensive gene annotation
- Release: 49 (GRCh38.p14)
db = gffutils.create_db(
    annotation_file, dbfn=db_file, force=force,
    keep_order=True, merge_strategy='merge', sort_attribute_values=True,
    disable_infer_genes=False,
    disable_infer_transcripts=False
)

I verified that the database contains protein-coding genes and got ~20,000 genes across ~290,000 transcripts.
Question:
Could this be related to how gene/transcript biotypes are parsed or expected by the annotator?
For example:
- Does the tool require specific attribute keys (e.g. gene_biotype vs. gene_type)?
- Is there a compatibility issue with newer GENCODE releases?
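In case it helps with debugging, here is a minimal sketch of how the biotype attribute keys in an annotation file could be tallied using only the standard library. It relies on GENCODE files storing the biotype under gene_type, while Ensembl files use gene_biotype. Whether the annotator expects one key or the other is exactly what I don't know. The function names, the sample records, and the gene IDs below are made up for illustration, and the attribute parser is deliberately simple (it does not handle quoted values containing semicolons).

```python
from collections import Counter

# Keys to look for: GENCODE uses gene_type, Ensembl uses gene_biotype.
BIOTYPE_KEYS = ("gene_type", "gene_biotype")


def parse_attributes(column):
    """Parse a GFF3 (key=value;...) or GTF (key "value"; ...) attribute column."""
    attrs = {}
    for field in column.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        if "=" in field:  # GFF3 style: key=value
            key, value = field.split("=", 1)
        else:  # GTF style: key "value"
            key, _, value = field.partition(" ")
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


def count_biotypes(lines):
    """Count (attribute key, biotype value) pairs over all gene records."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only well-formed gene records are of interest
        attrs = parse_attributes(cols[8])
        for key in BIOTYPE_KEYS:
            if key in attrs:
                counts[(key, attrs[key])] += 1
    return counts


# Toy records in GENCODE-like GFF3 layout (IDs and coordinates are invented).
sample = [
    "##gff-version 3\n",
    "chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tID=GENE_A;gene_type=lncRNA;gene_name=A\n",
    "chr1\tHAVANA\tgene\t2000\t5000\t.\t+\t.\tID=GENE_B;gene_type=protein_coding;gene_name=B\n",
    "chr1\tHAVANA\texon\t2000\t2500\t.\t+\t.\tID=exon:B1;Parent=TX_B;gene_type=protein_coding\n",
]

counts = count_biotypes(sample)
keys_seen = sorted({key for key, _ in counts})
print(counts)
print("biotype keys present:", keys_seen)
```

On a real file, passing an open file handle (for example, the downloaded GENCODE GFF3) to count_biotypes should show at a glance which key is present. If only gene_type shows up and the annotator looks for gene_biotype, that mismatch could explain why everything falls through to the noncoding categories.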
Thanks a lot for your help!