Gene name matching and NCBI info (gene)

Example

from orangecontrib.bioinformatics.ncbi.gene import GeneMatcher, GENE_INFO_TAGS

# specify input
organism = 9606
genes_symbols_to_match = ['HB1', 'BCKDHB', 'TWIST1']

# initialize gene matcher object
gene_matcher = GeneMatcher(organism)
gene_matcher.genes = genes_symbols_to_match

# run matching process
gene_matcher.run_matcher()

# inspect results
for gene in gene_matcher.genes:
    print("\ninput name: " + gene.input_name,
          "\nid from ncbi: ", gene.ncbi_id,
          "\nmatch type: ", gene.type_of_match
          )
    if gene.ncbi_id is None and gene.possible_hits:
        print('possible_hits: ', [hit.ncbi_id for hit in gene.possible_hits])

Output:

input name: HB1
id from ncbi:  None
match type:  None
possible_hits:  [3887, 6331, 8184]

input name: BCKDHB
id from ncbi:  594
match type:  Symbol match

input name: TWIST1
id from ncbi:  7291
match type:  Symbol match

Two of the three genes matched a unique NCBI gene ID. The symbol ‘HB1’ is shared by several genes, so no single ID is assigned to it; the candidates are stored in possible_hits for further analysis.
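
The split between unique matches and ambiguous symbols can be sketched as follows. This is only an illustration: split_matches is a hypothetical helper, not part of the library, and the stand-in records below merely mimic the attributes used in the example above (input_name, ncbi_id, possible_hits). That matched genes carry an empty possible_hits list is an assumption of the sketch.

```python
from types import SimpleNamespace


def split_matches(genes):
    """Partition gene records into uniquely matched and ambiguous inputs.

    Hypothetical helper; relies only on the attributes shown in the example.
    """
    matched = {}    # input name -> NCBI gene id
    ambiguous = {}  # input name -> list of candidate NCBI gene ids
    for gene in genes:
        if gene.ncbi_id is not None:
            matched[gene.input_name] = gene.ncbi_id
        elif gene.possible_hits:
            ambiguous[gene.input_name] = [hit.ncbi_id for hit in gene.possible_hits]
    return matched, ambiguous


# Stand-in records (assumption) built from the values printed in the output above.
demo_genes = [
    SimpleNamespace(
        input_name='HB1',
        ncbi_id=None,
        possible_hits=[SimpleNamespace(ncbi_id=i) for i in (3887, 6331, 8184)],
    ),
    SimpleNamespace(input_name='BCKDHB', ncbi_id=594, possible_hits=[]),
    SimpleNamespace(input_name='TWIST1', ncbi_id=7291, possible_hits=[]),
]

matched, ambiguous = split_matches(demo_genes)
print('matched:', matched)
print('ambiguous:', ambiguous)
```

With the real matcher one would presumably pass gene_matcher.genes instead of demo_genes.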

Gene information from the NCBI database can also be displayed, here for the first candidate hit of ‘HB1’.

gene_of_interest = gene_matcher.genes[0].possible_hits[0]
gene_of_interest.load_ncbi_info()

for tag in GENE_INFO_TAGS:
    print(tag + ':', getattr(gene_of_interest, tag))

Output:

tax_id: 9606
gene_id: 3887
symbol: KRT81
synonyms: |HB1|Hb-1|KRTHB1|MLN137|ghHkb1|hHAKB2-1|
db_refs: MIM:602153|HGNC:HGNC:6458|Ensembl:ENSG00000205426|Vega:OTTHUMG00000167574
description: keratin 81
locus_tag: -
chromosome: 12
map_location: 12q13.13
type_of_gene: protein-coding
symbol_from_nomenclature_authority: KRT81
full_name_from_nomenclature_authority: keratin 81
nomenclature_status: O
other_designations: keratin, type II cuticular Hb1|K81|MLN 137|ghHb1|hair keratin K2.9|hard keratin, type II, 1|keratin 81, type II|keratin, hair, basic, 1|metastatic lymph node 137 gene protein|type II hair keratin Hb1|type-II keratin Kb21
modification_date: 20171105
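
Several of the fields above (synonyms, db_refs, other_designations) are pipe-delimited strings. A minimal sketch of turning them into Python structures is given below; parse_pipe_list and parse_db_refs are hypothetical helpers, not library functions, and treating '-' as an empty value is an assumption based on the locus_tag line.

```python
def parse_pipe_list(raw):
    """Split a pipe-delimited field into a list, dropping empty items and '-'."""
    return [item for item in raw.split('|') if item and item != '-']


def parse_db_refs(raw):
    """Map each database name to its identifier.

    Only the first ':' separates the database from the identifier, so
    'HGNC:HGNC:6458' yields the key 'HGNC' and the value 'HGNC:6458'.
    """
    refs = {}
    for entry in parse_pipe_list(raw):
        database, _, identifier = entry.partition(':')
        refs[database] = identifier
    return refs


# Values copied from the output above.
synonyms = parse_pipe_list('|HB1|Hb-1|KRTHB1|MLN137|ghHkb1|hHAKB2-1|')
db_refs = parse_db_refs(
    'MIM:602153|HGNC:HGNC:6458|Ensembl:ENSG00000205426|Vega:OTTHUMG00000167574'
)

print(synonyms)
print(db_refs['Ensembl'])
```

With a loaded gene object, the same helpers could be applied to getattr(gene_of_interest, 'synonyms') and getattr(gene_of_interest, 'db_refs'), assuming those attributes hold the strings printed above.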

Class References