Gene sets (geneset)

This module can load either gene sets distributed with Orange or custom gene sets in the GMT file format.

Loading gene sets

orangecontrib.bioinformatics.geneset.list_all(**kwargs)

Returns available gene sets from the server files repository.

Parameters: kwargs
  • organism (str) – Taxonomy ID (NCBI Taxonomy database)
Return type: list of (hierarchy, organism)

Example

The available gene set collections for an organism can be listed with:
>>> list_all(organism='10090')

orangecontrib.bioinformatics.geneset.load_gene_sets(hierarchy, tax_id)

Initialize gene sets from a given hierarchy.

Parameters:
  • hierarchy (tuple) – Gene set hierarchy, for example ('GO', 'biological_process')
  • tax_id (str) – Taxonomy ID (NCBI Taxonomy database)
Return type: GeneSets

Example

Gene sets provided with Orange are organized hierarchically:
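>>> list_of_genesets = list_all(organism='10090')
>>> list_of_genesets
[(('KEGG', 'Pathways'), '10090'),
 (('KEGG', 'pathways'), '10090'),
 (('GO', 'biological_process'), '10090'),
 (('GO', 'molecular_function'), '10090'),
 (('GO', 'cellular_component'), '10090')]
>>> # Each entry is a (hierarchy, organism) pair; unpack it into the two arguments.
>>> hierarchy, tax_id = list_of_genesets[0]
>>> load_gene_sets(hierarchy, tax_id)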

Supporting functionality

class orangecontrib.bioinformatics.geneset.GeneSets(sets=None)

Bases: set

A collection of gene sets: contains GeneSet objects.

common_hierarchy()

Return a common hierarchy.

common_org()

Return a common organism.

static from_gmt_file_format(file_path)

Load GeneSets object from GMT file.

Parameters: file_path – Path to a file on local disk
Return type: GeneSets
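
To illustrate what loading from GMT involves, the sketch below parses the tab-delimited layout (one gene set per row: identifier, description, then one gene per column) into plain Python dictionaries. It is not the code behind from_gmt_file_format: the helper name parse_gmt_lines and the sample rows are hypothetical, and the description field is kept as an opaque string rather than decoded.

```python
def parse_gmt_lines(lines):
    """Parse GMT rows into {gs_id: (description, set_of_genes)}.

    Hypothetical helper for illustration; not part of orangecontrib.
    """
    gene_sets = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Columns: identifier, description, then one gene per column.
        gs_id, description, *genes = line.split("\t")
        gene_sets[gs_id] = (description, {g for g in genes if g})
    return gene_sets


# Made-up sample rows, only to show the layout.
sample = [
    "set1\tfirst example set\tgeneA\tgeneB\tgeneC\n",
    "set2\tsecond example set\tgeneB\tgeneD\n",
]
parsed = parse_gmt_lines(sample)
print(sorted(parsed["set1"][1]))  # ['geneA', 'geneB', 'geneC']
```

A row with fewer than two columns raises ValueError in this sketch; a more forgiving reader might skip or report such rows instead.
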

genes()
Returns: All genes from GeneSets

hierarchies()

Return all hierarchies.

split_by_hierarchy()

Split gene sets by hierarchies. Return a list of GeneSets objects.
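
The grouping idea can be sketched with standard-library containers. This is only an illustration under stated assumptions: ToyGeneSet is a hypothetical stand-in for GeneSet, and the function returns plain Python sets rather than GeneSets objects.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a gene set; the real GeneSet class has more fields.
ToyGeneSet = namedtuple("ToyGeneSet", ["gs_id", "hierarchy", "genes"])


def split_by_hierarchy(gene_sets):
    """Group gene sets by their hierarchy and return one set per hierarchy."""
    groups = defaultdict(set)
    for gs in gene_sets:
        groups[gs.hierarchy].add(gs)
    return list(groups.values())


# Made-up collection spanning two hierarchies.
collection = {
    ToyGeneSet("a", ("GO", "biological_process"), frozenset({"g1", "g2"})),
    ToyGeneSet("b", ("GO", "biological_process"), frozenset({"g2"})),
    ToyGeneSet("c", ("KEGG", "pathways"), frozenset({"g3"})),
}
parts = split_by_hierarchy(collection)
print(len(parts))  # 2
```
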

to_gmt_file_format(file_path)

Write the gene sets to a file in the GMT file format. GMT is a tab-delimited file format that describes gene sets; each row represents one gene set.

Columns: gs_id, gmt_description, Gene, Gene, Gene, …

gmt_description: 'gs_id', 'hierarchy', 'organism', 'name', 'genes', 'description', 'link'

Parameters: file_path – Path where the file will be created
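
The row layout can be sketched as follows. This is a minimal illustration, not the method's actual implementation: write_gmt is a hypothetical helper, the description is passed through as a ready-made string, and an in-memory buffer stands in for the file on disk.

```python
import io


def write_gmt(gene_sets, handle):
    """Write (gs_id, description, genes) triples as tab-delimited GMT rows.

    Hypothetical helper for illustration; not part of orangecontrib.
    """
    for gs_id, description, genes in gene_sets:
        # Sort the genes so that the output is deterministic.
        row = [gs_id, description, *sorted(genes)]
        handle.write("\t".join(row) + "\n")


# Made-up data; StringIO stands in for a file opened for writing.
buffer = io.StringIO()
write_gmt([("set1", "first example set", {"geneB", "geneA"})], buffer)
print(repr(buffer.getvalue()))  # 'set1\tfirst example set\tgeneA\tgeneB\n'
```
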

update(sets)

Update a set with the union of itself and others.

class orangecontrib.bioinformatics.geneset.GeneSet(gs_id=None, hierarchy=None, organism=None, name=None, genes=None, description=None, link=None)

gmt_description()

Represent the GeneSet as a line in the GMT file format.

Returns: Comma-separated GeneSet attributes.

set_enrichment(reference, query)
Parameters:
  • reference
  • query

Helper functions to work with serverfiles

orangecontrib.bioinformatics.geneset.filename(hierarchy, organism)

Obtain a filename for given hierarchy and organism.

Parameters:
  • hierarchy – GeneSet hierarchy, for example ('GO', 'biological_process')
  • organism – Taxonomy ID
Returns: Filename for the given hierarchy and organism

Example

>>> filename(('CustomSet', 'subsets'), '6500')
'CustomSet-subsets-6500.gmt'

orangecontrib.bioinformatics.geneset.filename_parse(fn)

Return the hierarchy and the organism encoded in a gene set filename.

Parameters: fn – GeneSets file name (.gmt)
Returns: A hierarchy and taxonomy ID for the given filename

Example

>>> filename_parse('Custom-set-6500.gmt')
(('Custom', 'set'), '6500')
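
Judging from the two examples above, the naming convention appears to join the hierarchy levels and the taxonomy ID with '-' and append '.gmt'. The sketch below reproduces that convention with two hypothetical helpers, build_filename and parse_filename. They are inferred from the examples only and are not the module's own code.

```python
def build_filename(hierarchy, organism):
    """Join hierarchy levels and the taxonomy ID with '-' and add '.gmt'."""
    return "-".join(tuple(hierarchy) + (organism,)) + ".gmt"


def parse_filename(fn):
    """Split a name built by build_filename back into (hierarchy, organism)."""
    stem = fn[:-len(".gmt")] if fn.endswith(".gmt") else fn
    # The last '-'-separated part is the organism; the rest is the hierarchy.
    *levels, organism = stem.split("-")
    return tuple(levels), organism


name = build_filename(("GO", "biological_process"), "10090")
print(name)                  # GO-biological_process-10090.gmt
print(parse_filename(name))  # (('GO', 'biological_process'), '10090')
```

Under this convention a hierarchy level that itself contains '-' would not survive the round trip, because the parser cannot tell it apart from a level separator.
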