Only noncoding annotations returned despite protein-coding genes in reference DB #6

@schilau

Hi, thanks for this tool!

When I run it, all annotations are classified as noncoding, even though my reference database clearly contains protein-coding genes.

This seems related to the already reported (but still unresolved) issue #3.

Instead of the expected annotations, such as:

  • CDS
  • exon
  • UTR

I only get:

  • distnoncoding_intron500
  • intergenic
  • noncoding_exon
  • proxnoncoding_intron500

I generated my own gffutils database from GENCODE:

  • Source: GENCODE Comprehensive gene annotation
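  • Release: 49 (GRCh38.p14)

The database was created with:

import gffutils

# annotation_file: path to the GENCODE annotation file; db_file: output database path
db = gffutils.create_db(
    annotation_file, dbfn=db_file, force=force,
    keep_order=True, merge_strategy='merge', sort_attribute_values=True,
    disable_infer_genes=False,
    disable_infer_transcripts=False
)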

I verified that the database contains protein-coding genes and got ~20,000 genes across ~290,000 transcripts.
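A check of this kind could look roughly like the sketch below. It is illustrative only: `count_biotypes` is a hypothetical helper, and it assumes each feature exposes a dict-like `attributes` whose values are lists of strings, as gffutils features do. The demo features are made up so the sketch runs without a database.

```python
from collections import Counter
from types import SimpleNamespace

# GENCODE-style key first, then the Ensembl-style key.
BIOTYPE_KEYS = ("gene_type", "gene_biotype")


def count_biotypes(features, keys=BIOTYPE_KEYS):
    """Count (key, value) pairs for the first biotype key found on each feature."""
    counts = Counter()
    for feature in features:
        attrs = feature.attributes
        for key in keys:
            if key in attrs:
                # Attribute values are lists of strings; take the first entry.
                counts[(key, attrs[key][0])] += 1
                break
        else:
            # Neither key is present on this feature.
            counts[("missing", None)] += 1
    return counts


# Made-up stand-ins for database features (hypothetical IDs).
demo_genes = [
    SimpleNamespace(attributes={"gene_id": ["G1"], "gene_type": ["protein_coding"]}),
    SimpleNamespace(attributes={"gene_id": ["G2"], "gene_type": ["lncRNA"]}),
    SimpleNamespace(attributes={"gene_id": ["G3"], "gene_type": ["protein_coding"]}),
    SimpleNamespace(attributes={"gene_id": ["G4"]}),
]
demo_counts = count_biotypes(demo_genes)
print(demo_counts)
```

Against a real database, the same helper would be called as `count_biotypes(gffutils.FeatureDB(db_file).features_of_type("gene"))`, which also shows which attribute key the database actually carries.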


Question:

Could this be related to how gene/transcript biotypes are parsed or expected by the annotator?

For example:

  • Does the tool require specific attribute keys (e.g. gene_biotype vs gene_type)?
  • Is there a compatibility issue with newer GENCODE releases?
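
One way to test the first hypothesis without touching the annotator would be to mirror the GENCODE keys under the Ensembl-style names while building the database. This is only a sketch: `add_biotype_aliases` is a hypothetical helper, and whether the annotator actually reads `gene_biotype` / `transcript_biotype` is an unverified assumption. It relies on the `transform` argument of `gffutils.create_db`, which is called on each feature before it is stored.

```python
from types import SimpleNamespace

# Assumed mapping: GENCODE-style key -> Ensembl-style key (unverified for this tool).
ALIASES = {"gene_type": "gene_biotype", "transcript_type": "transcript_biotype"}


def add_biotype_aliases(feature):
    """Copy each GENCODE-style biotype attribute to its Ensembl-style name."""
    attrs = feature.attributes
    for src, dst in ALIASES.items():
        if src in attrs and dst not in attrs:
            # Copy the list so the two attributes stay independent.
            attrs[dst] = list(attrs[src])
    # A transform has to return the feature for it to be kept.
    return feature


# Made-up stand-in for a feature, so the sketch runs without gffutils.
demo = SimpleNamespace(
    attributes={"gene_type": ["protein_coding"], "transcript_type": ["protein_coding"]}
)
demo = add_biotype_aliases(demo)
print(demo.attributes["gene_biotype"])
```

With gffutils, the helper would be passed as `transform=add_biotype_aliases` in the `gffutils.create_db` call above. If the annotations then come out as CDS, exon, and UTR, the attribute key would be the likely cause.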

Thanks a lot for your help!
