KEGG - Kyoto Encyclopedia of Genes and Genomes (kegg)


kegg is a Python module for accessing KEGG (Kyoto Encyclopedia of Genes and Genomes) through its web services.

Note

This module requires the slumber and requests packages.

>>> from orangecontrib.bioinformatics.kegg import KEGGGenome
>>> # Create a KEGG Genome database interface
>>> genome = KEGGGenome()
>>> # List all available entry ids
>>> keys = list(genome.keys())
>>> print(keys[0])
T01001
>>> # Retrieve the entry for the key.
>>> entry = genome[keys[0]]
>>> print(entry.entry_key)
T01001
>>> print(entry.definition)
Homo sapiens (human)
>>> print(entry)  # doctest: +ELLIPSIS
ENTRY       T01001            Complete  Genome
NAME        hsa, HUMAN, 9606
DEFINITION  Homo sapiens (human)
...

The Organism class is a convenient starting point for organism-specific databases.

>>> from orangecontrib.bioinformatics.kegg import Organism
>>> organism = Organism("Homo sapiens")  # searches for the organism by name
>>> print(organism.org_code)  # prints the KEGG organism code
hsa
>>> genes = organism.genes  # get the genes database for the organism
>>> gene_ids = list(genes.keys())  # KEGG gene identifiers
>>> entry = genes["hsa:672"]
>>> print(entry.definition)
(RefSeq) BRCA1, DNA repair associated
>>> # Print the entry in DBGET database format.
>>> print(entry)  # doctest: +ELLIPSIS
ENTRY       672               CDS       T01001
NAME        BRCA1, BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4, PPP1R53, PSCP, RNF53
DEFINITION  ...

class orangecontrib.bioinformatics.kegg.Organism(org)

A convenience class for retrieving information regarding an organism in the KEGG Genes database.

Parameters: org (str) – KEGG organism code (e.g. ‘hsa’, ‘sce’). Can also be a descriptive name (e.g. ‘yeast’, ‘homo sapiens’), in which case the organism code is looked up with the KEGG find API.

See also

organism_name_search()
Search KEGG for an organism code
org

KEGG organism code.

genes

A Genes database instance for this organism.

gene_aliases()

Return a list of sets of equivalent gene identifiers (synonyms) in KEGG for this organism.

Note

This only includes ‘ncbi-geneid’ and ‘ncbi-proteinid’ records from the KEGG Genes DBLINKS entries.
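The grouping described above can be sketched in plain Python. This is a hypothetical helper, not the library's implementation. It assumes that each gene's DBLINKS section has already been parsed into a dict keyed by the lowercase database names from the note (‘ncbi-geneid’, ‘ncbi-proteinid’). It then forms the union of each KEGG gene id with those linked ids.

```python
from typing import Dict, List, Set

# Only these DBLINKS databases are used, mirroring the note above.
ALIAS_DATABASES = ("ncbi-geneid", "ncbi-proteinid")


def gene_alias_sets(dblinks: Dict[str, Dict[str, List[str]]]) -> List[Set[str]]:
    """Return one set of synonymous ids per KEGG gene.

    `dblinks` (hypothetical layout) maps a KEGG gene id to its parsed
    DBLINKS section, e.g. {"hsa:672": {"ncbi-geneid": ["672"]}}.
    """
    aliases = []
    for gene_id, links in dblinks.items():
        synonyms = {gene_id}
        for database in ALIAS_DATABASES:
            # Missing databases simply contribute no synonyms.
            synonyms.update(links.get(database, []))
        aliases.append(synonyms)
    return aliases
```

For example, `gene_alias_sets({"hsa:672": {"ncbi-geneid": ["672"]}})` yields `[{"hsa:672", "672"}]`, because human KEGG gene ids are NCBI Gene ids.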

pathways(with_ids=None)

Return a list of all pathways for this organism.

list_pathways()

List all pathways for this organism.

get_enriched_pathways(genes, reference=None, prob=orangecontrib.bioinformatics.utils.statistics.Binomial(), callback=None)

Return a dictionary with enriched pathway ids as keys and (list_of_genes, p_value, num_of_reference_genes) tuples as values.
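The sketch below illustrates what such a result contains by scoring each pathway with a one-sided binomial test. Let n be the number of query genes, k the number of them in the pathway, and p the fraction of reference genes that belong to the pathway. The p-value is then P(X ≥ k) for X ~ Binomial(n, p). This choice of statistic is an assumption: the library's Binomial object may define its parameters or tail differently. The names enriched_pathways, binomial_upper_tail and pathway_genes are hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enriched_pathways(
    genes: Set[str],
    reference: Set[str],
    pathway_genes: Dict[str, Set[str]],
) -> Dict[str, Tuple[List[str], float, int]]:
    """Map pathway id -> (genes found, p-value, number of reference genes)."""
    result = {}
    for pathway_id, members in pathway_genes.items():
        hits = sorted(genes & members)
        if not hits:
            continue  # pathways without any query gene are not reported
        ref_count = len(reference & members)
        p = ref_count / len(reference)  # background rate for this pathway
        p_value = binomial_upper_tail(len(hits), len(genes), p)
        result[pathway_id] = (hits, p_value, ref_count)
    return result
```

With a reference of 10 genes, a pathway containing 2 of them, and both query genes inside that pathway, the p-value is 0.2² = 0.04.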

get_pathways_by_genes(gene_ids)

Return the pathways that include all genes in gene_ids.
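Below is a minimal sketch of the "contains all genes" criterion. It assumes a hypothetical pathway_genes mapping from pathway ids to sets of member genes; the library obtains this membership from KEGG instead. A pathway qualifies only if the requested ids form a subset of its members.

```python
from typing import Dict, Iterable, List, Set


def pathways_containing_all(
    gene_ids: Iterable[str], pathway_genes: Dict[str, Set[str]]
) -> List[str]:
    """Return (sorted) ids of pathways whose members include every gene."""
    wanted = set(gene_ids)
    # `<=` is the subset test: every wanted gene must be a member.
    return sorted(pid for pid, members in pathway_genes.items() if wanted <= members)
```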

orangecontrib.bioinformatics.kegg.KEGGOrganism

alias of orangecontrib.bioinformatics.kegg.Organism

orangecontrib.bioinformatics.kegg.organism_name_search(name)

Search for an organism by name and return its KEGG organism code.

orangecontrib.bioinformatics.kegg.pathways(org)

Return a list of all KEGG pathways for the KEGG organism code org.

orangecontrib.bioinformatics.kegg.from_taxid(taxid)

Return the KEGG organism code for an NCBI Taxonomy id string taxid.

orangecontrib.bioinformatics.kegg.to_taxid(name)

Return the NCBI Taxonomy id for a given KEGG organism name.

DBEntry (entry)

entry.DBEntry represents a DBGET database entry. The individual KEGG database interfaces below provide their own specializations of this base class.

class orangecontrib.bioinformatics.kegg.entry.DBEntry(text=None)

Bases: object

A DBGET entry object.

entry_key

Primary entry key used for identifying the entry.

parse(text)

Parse a text string containing a formatted DBGET entry.

format(section_indent=12)

Return a DBGET-formatted string representation.
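The entry printouts at the top of this page use a fixed-column layout: the ENTRY, NAME and DEFINITION contents all start at column 12. A small formatter can reproduce it. The format_dbget helper below is hypothetical and only sketches the idea behind section_indent; it is not the library's format() code. It takes (title, contents) pairs and indents continuation lines to the same column.

```python
from typing import List, Tuple


def format_dbget(sections: List[Tuple[str, str]], section_indent: int = 12) -> str:
    """Render (title, contents) pairs in a DBGET-like fixed-column layout."""
    lines = []
    for title, contents in sections:
        content_lines = contents.splitlines() or [""]
        # Pad the title so contents start at column `section_indent`;
        # an over-long title still gets one separating space.
        lines.append(title.ljust(section_indent - 1) + " " + content_lines[0])
        # Continuation lines are indented to the same column.
        for extra in content_lines[1:]:
            lines.append(" " * section_indent + extra)
    return "\n".join(lines) + "\n"
```

For instance, `format_dbget([("DEFINITION", "Homo sapiens (human)")])` yields `"DEFINITION  Homo sapiens (human)\n"`, which matches the alignment shown in the examples.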

KEGG Databases interface (databases)

class orangecontrib.bioinformatics.kegg.databases.DBDataBase(**kwargs)

Bases: object

Base class for a DBGET database interface.

ENTRY_TYPE

alias of orangecontrib.bioinformatics.kegg.entry.DBEntry

DB = None

A database name/abbreviation (e.g. ‘pathway’). It must be set in a subclass, or in an instance’s constructor before calling the base __init__.
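The ordering requirement can be illustrated with a toy base class. ToyDBDataBase and its subclasses are hypothetical and not part of the library. DB can be a class attribute of the subclass, or it can be assigned on the instance before the base __init__ runs. Assigning it after the super().__init__() call would be too late.

```python
class ToyDBDataBase:
    """Hypothetical stand-in for DBDataBase that only checks DB is set."""

    DB = None  # database name/abbreviation; subclasses must provide it

    def __init__(self):
        if self.DB is None:
            raise ValueError("DB must be set before calling the base __init__")
        self.cache = {}  # per-instance entry cache


class ToyCompound(ToyDBDataBase):
    # Option 1: set DB as a class attribute in the subclass.
    DB = "compound"


class ToyGenes(ToyDBDataBase):
    def __init__(self, org_code):
        # Option 2: set DB on the instance *before* delegating to the base.
        self.DB = org_code
        super().__init__()
```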

iterkeys()

Return an iterator over the keys.

iteritems()

Return an iterator over the items.

itervalues()

Return an iterator over all DBDataBase.ENTRY_TYPE instances.

keys()

Return an iterator over all database keys. These are unique KEGG identifiers that can be used to query the database.

values()

Return an iterator over all DBDataBase.ENTRY_TYPE instances.

items()

Return an iterator over all (key, DBDataBase.ENTRY_TYPE) tuples.

get(key, default=None)

Return a DBDataBase.ENTRY_TYPE instance for the key. Raises KeyError if not found.

get_text(key)

Return the database entry for key as plain text.

get_entry(key)

Return the database entry for key as an instance of ENTRY_TYPE.

find(name)

Find name using the KEGG find API.

pre_cache(keys=None, batch_size=10, progress_callback=None)

Retrieve all the entries for keys and cache them locally for faster subsequent retrieval. If keys is None then all entries will be retrieved.

batch_get(keys)

Batch retrieve all entries for keys. This can be significantly faster than getting each entry separately, especially if the entries are not yet cached.
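pre_cache and batch_get both rely on one idea: fetch missing entries in groups of batch_size rather than one at a time. The sketch below shows that idea. fetch_many is a hypothetical callable standing in for one remote request. The KEGG REST get operation accepts at most 10 entries per request, which is presumably why the default batch size is 10.

```python
from typing import Callable, Dict, Iterator, List


def batches(keys: List[str], batch_size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of `keys` with at most `batch_size` items."""
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


def pre_cache(
    keys: List[str],
    fetch_many: Callable[[List[str]], Dict[str, str]],
    cache: Dict[str, str],
    batch_size: int = 10,
) -> Dict[str, str]:
    """Fetch every key not yet in `cache`, one batch per remote request."""
    missing = [key for key in keys if key not in cache]
    for chunk in batches(missing, batch_size):
        cache.update(fetch_many(chunk))  # one round trip per chunk
    return cache
```

Keys already present in the cache are skipped, so repeated calls only pay for entries that have not been downloaded yet.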

class orangecontrib.bioinformatics.kegg.databases.GenomeEntry(text)

Bases: orangecontrib.bioinformatics.kegg.entry.DBEntry

Entry for a KEGG Genome database.

organism_code

A three- or four-letter KEGG organism code (e.g. ‘hsa’, ‘sce’, …).

taxid

Organism NCBI taxonomy id.

annotation

ANNOTATION

chromosome

CHROMOSOME

comment

COMMENT

data_source

DATA_SOURCE

definition

DEFINITION

disease

DISEASE

entry

ENTRY

keywords

KEYWORDS

name

NAME

original_db

ORIGINAL_DB

plasmid

PLASMID

reference

REFERENCE

statistics

STATISTICS

taxonomy

TAXONOMY

class orangecontrib.bioinformatics.kegg.databases.Genome

Bases: orangecontrib.bioinformatics.kegg.databases.DBDataBase

An interface to the KEGG GENOME database.

ENTRY_TYPE

alias of GenomeEntry

org_code_to_entry_key(code)

Map an organism code (‘hsa’, ‘sce’, …) to the corresponding KEGG identifier (‘T’ followed by a five-digit number, e.g. ‘T01001’).

search(string, relevance=False)

Search the genome database for string using bfind.

class orangecontrib.bioinformatics.kegg.databases.GeneEntry(text=None)

Bases: orangecontrib.bioinformatics.kegg.entry.DBEntry

aaseq

AASEQ

brite

BRITE

class_

CLASS

DBLINKS

definition

DEFINITION

disease

DISEASE

drug_target

DRUG_TARGET

entry

ENTRY

module

MODULE

motif

MOTIF

name

NAME

ntseq

NTSEQ

organism

ORGANISM

orthology

ORTHOLOGY

pathway

PATHWAY

position

POSITION

structure

STRUCTURE

class orangecontrib.bioinformatics.kegg.databases.Genes(org_code)

Bases: orangecontrib.bioinformatics.kegg.databases.DBDataBase

Interface to the KEGG Genes database.

Parameters: org_code (str) – KEGG organism code (e.g. ‘hsa’).

ENTRY_TYPE

alias of GeneEntry

class orangecontrib.bioinformatics.kegg.databases.CompoundEntry(text=None)

Bases: orangecontrib.bioinformatics.kegg.entry.DBEntry

atom

ATOM

bond

BOND

brite

BRITE

comment

COMMENT

DBLINKS

entry

ENTRY

enzyme

ENZYME

exact_mass

EXACT_MASS

formula

FORMULA

mol_weight

MOL_WEIGHT

name

NAME

pathway

PATHWAY

reaction

REACTION

reference

REFERENCE

remark

REMARK

class orangecontrib.bioinformatics.kegg.databases.Compound

Bases: orangecontrib.bioinformatics.kegg.databases.DBDataBase

ENTRY_TYPE

alias of CompoundEntry

class orangecontrib.bioinformatics.kegg.databases.ReactionEntry(text=None)

Bases: orangecontrib.bioinformatics.kegg.entry.DBEntry

definition

DEFINITION

entry

ENTRY

enzyme

ENZYME

equation

EQUATION

name

NAME

class orangecontrib.bioinformatics.kegg.databases.Reaction

Bases: orangecontrib.bioinformatics.kegg.databases.DBDataBase

ENTRY_TYPE

alias of ReactionEntry

class orangecontrib.bioinformatics.kegg.databases.EnzymeEntry(text=None)

Bases: orangecontrib.bioinformatics.kegg.entry.DBEntry

all_reac

ALL_REAC

class_

CLASS

comment

COMMENT

DBLINKS

entry

ENTRY

genes

GENES

name

NAME

orthology

ORTHOLOGY

pathway

PATHWAY

product

PRODUCT

reaction

REACTION

reference

REFERENCE

substrate

SUBSTRATE

sysname

SYSNAME

class orangecontrib.bioinformatics.kegg.databases.Enzyme

Bases: orangecontrib.bioinformatics.kegg.databases.DBDataBase

ENTRY_TYPE

alias of EnzymeEntry

class orangecontrib.bioinformatics.kegg.databases.PathwayEntry(text=None)

Bases: orangecontrib.bioinformatics.kegg.entry.DBEntry

class_

CLASS

compound

COMPOUND

DBLINKS

description

DESCRIPTION

disease

DISEASE

drug

DRUG

entry

ENTRY

enzyme

ENZYME

ko_pathway

KO_PATHWAY

module

MODULE

name

NAME

organism

ORGANISM

pathway_map

PATHWAY_MAP

reference

REFERENCE

rel_pathway

REL_PATHWAY

class orangecontrib.bioinformatics.kegg.databases.Pathway(prefix='map')

Bases: orangecontrib.bioinformatics.kegg.databases.DBDataBase

KEGG Pathway database.

Parameters: prefix (str) – KEGG organism code (‘hsa’, …) or one of ‘map’, ‘ko’, ‘ec’, ‘rn’.

ENTRY_TYPE

alias of PathwayEntry

KEGG Pathway (pathway)

class orangecontrib.bioinformatics.kegg.pathway.Pathway(pathway_id, local_cache=None, connection=None)

Bases: object

Class representing a KEGG pathway (parsed from a KGML file).

Parameters: pathway_id (str) – A KEGG pathway id (e.g. ‘path:hsa05130’).

name

Pathway name/id (e.g. ‘path:hsa05130’).

org

Pathway organism code (e.g. ‘hsa’)

number

Pathway number as a string (e.g. ‘05130’)

title

Pathway title string.

image

URL of the pathway image.

URL to a pathway on the KEGG web site.

get_image()

Return a local filesystem path to an image of the pathway. The image is downloaded if it is not already cached.

classmethod list(organism)

List all pathways for KEGG organism code organism.
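A regular expression can sketch how a pathway id relates to the org and number attributes. The pattern is an assumption based on the examples above (‘path:hsa05130’ → ‘hsa’, ‘05130’). It allows an optional ‘path:’ prefix, then a lowercase organism code or reference prefix such as ‘map’, then a five-digit map number. split_pathway_id is a hypothetical helper.

```python
import re
from typing import Tuple

# Optional "path:" prefix, lowercase code/prefix, then a 5-digit map number.
_PATHWAY_ID = re.compile(r"^(?:path:)?([a-z]+)(\d{5})$")


def split_pathway_id(pathway_id: str) -> Tuple[str, str]:
    """Split e.g. 'path:hsa05130' into ('hsa', '05130')."""
    match = _PATHWAY_ID.match(pathway_id)
    if match is None:
        raise ValueError(f"not a KEGG pathway id: {pathway_id!r}")
    return match.group(1), match.group(2)
```

The number is kept as a string, as the number attribute is, so leading zeros such as the ‘0’ in ‘05130’ survive.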

Utilities

class orangecontrib.bioinformatics.kegg.entry.parser.DBGETEntryParser

A DBGET entry parser (inspired by xml.dom.pulldom).

Example

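>>> from io import StringIO
>>> from orangecontrib.bioinformatics.kegg.entry.parser import DBGETEntryParser
>>> stream = StringIO(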
...     "ENTRY foo\n"
...     "NAME  foo's name\n"
...     "  BAR A subsection of 'NAME'\n"
... )
>>> parser = DBGETEntryParser()
>>> for event, title, contents_part in parser.parse(stream):
...    print(parser.EVENTS[event], title, repr(contents_part))
...
ENTRY_START None None
SECTION_START ENTRY 'foo\n'
SECTION_END ENTRY None
SECTION_START NAME "foo's name\n"
SUBSECTION_START BAR "A subsection of 'NAME'\n"
SUBSECTION_END BAR None
SECTION_END NAME None
ENTRY_END None None

ENTRY_END = 1

Entry end event

ENTRY_START = 0

Entry start event

SECTION_END = 3

Section end event

SECTION_START = 2

Section start event

SUBSECTION_END = 5

Subsection end event

SUBSECTION_START = 4

Subsection start event

TEXT = 6

Text element event
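To make the event stream concrete, here is a toy event-based parser that uses the same event codes and reproduces the output of the example above. It is a simplified sketch, not the library's DBGETEntryParser. It assumes section titles start in the first column and subsection titles are indented by exactly two spaces. Any other non-blank line is treated as continuation text and reported as a TEXT event.

```python
from io import StringIO
from typing import Iterable, Iterator, Optional, Tuple

# Event codes, numbered as in DBGETEntryParser above.
ENTRY_START, ENTRY_END, SECTION_START, SECTION_END = 0, 1, 2, 3
SUBSECTION_START, SUBSECTION_END, TEXT = 4, 5, 6

Event = Tuple[int, Optional[str], Optional[str]]


def toy_parse(lines: Iterable[str]) -> Iterator[Event]:
    """Yield (event, title, contents) tuples for a single DBGET entry."""
    section: Optional[str] = None
    subsection: Optional[str] = None
    yield ENTRY_START, None, None
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("///"):
            break  # entry terminator used in KEGG flat files
        if not line[0].isspace():
            # A title in the first column opens a new top-level section.
            if subsection is not None:
                yield SUBSECTION_END, subsection, None
                subsection = None
            if section is not None:
                yield SECTION_END, section, None
            section, _, rest = line.partition(" ")
            yield SECTION_START, section, rest.lstrip()
        elif line.startswith("  ") and not line[2].isspace():
            # A title indented by exactly two spaces opens a subsection.
            if subsection is not None:
                yield SUBSECTION_END, subsection, None
            subsection, _, rest = line[2:].partition(" ")
            yield SUBSECTION_START, subsection, rest.lstrip()
        else:
            # Anything else continues the currently open (sub)section.
            yield TEXT, None, line.lstrip()
    if subsection is not None:
        yield SUBSECTION_END, subsection, None
    if section is not None:
        yield SECTION_END, section, None
    yield ENTRY_END, None, None


if __name__ == "__main__":
    names = ["ENTRY_START", "ENTRY_END", "SECTION_START", "SECTION_END",
             "SUBSECTION_START", "SUBSECTION_END", "TEXT"]
    stream = StringIO("ENTRY foo\n" "NAME  foo's name\n" "  BAR A subsection of 'NAME'\n")
    for event, title, contents in toy_parse(stream):
        print(names[event], title, repr(contents))
```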