oats.annotation package
Submodules
oats.annotation.annotation module
-
annotate_using_fuzzy_matching(ids_to_texts, ontology, threshold=0.9, fixcase=1, local=1)
Build a dictionary of annotations using fuzzy string matching. This is useful for finding instances of ontology terms in larger text strings.
- Args:
ids_to_texts (dict of int:str): Mapping from unique integer IDs to natural language text strings.
ontology (oats.annotation.Ontology): Ontology object with specified terms.
threshold (float, optional): Value ranging from 0 to 1, the similarity threshold for string matches.
fixcase (int, optional): Set to 1 to normalize all strings before matching, set to 0 to ignore this option.
local (int, optional): Set the alignment method, 0 for global and 1 for local. Local alignment should always be used for annotating ontology terms to long strings of text.
- Returns:
- dict of int:list of str: Mapping from unique integer IDs to lists of ontology term IDs.
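The idea behind fuzzy matching of term labels against longer text can be sketched with the standard library alone. This is a minimal illustration and not the implementation used by oats: it compares each label with every window of the same number of words and scores the pair with difflib.SequenceMatcher. The helper name fuzzy_annotate, the labels dictionary, and the term IDs are hypothetical.

```python
from difflib import SequenceMatcher

def fuzzy_annotate(ids_to_texts, labels, threshold=0.9):
    """Return {text_id: [term_id, ...]} for labels that approximately occur in each text."""
    annotations = {}
    for text_id, text in ids_to_texts.items():
        words = text.lower().split()
        matched = []
        for term_id, label in labels.items():
            label_words = label.lower().split()
            n = len(label_words)
            target = " ".join(label_words)
            best = 0.0
            # Compare the label against every window of n consecutive words.
            for i in range(max(len(words) - n + 1, 0)):
                window = " ".join(words[i:i + n])
                best = max(best, SequenceMatcher(None, target, window).ratio())
            if best >= threshold:
                matched.append(term_id)
        annotations[text_id] = matched
    return annotations

# Made-up term IDs and labels, for illustration only.
labels = {"TOY:0000001": "decreased size", "TOY:0000002": "vascular leaf"}
texts = {1: "Plants show decresed size of the rosette", 2: "No visible phenotype"}
result = fuzzy_annotate(texts, labels, threshold=0.9)
```

The misspelled "decresed size" still scores above the threshold against "decreased size", so the first text receives that term while the second text receives none.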
-
annotate_using_noble_coder(ids_to_texts, jar_path, ontology_name, precise=1, output=None)
Build a dictionary of annotations using NOBLE Coder (Tseytlin et al., 2016).
- Args:
ids_to_texts (dict of int:str): Mapping from unique integer IDs to natural language text strings.
jar_path (str): Path of the NOBLE Coder jar file.
ontology_name (str): Name of the ontology (e.g., “pato”, “po”) used to find a matching NOBLE Coder terminology file (e.g., pato.term, po.term) in ~/.noble/terminologies. This name is not case-sensitive.
precise (int, optional): Set to 1 to do precise matching, set to 0 to accept partial matches.
output (str, optional): Path to a text file where the stdout from running NOBLE Coder should be redirected. If not provided, this output is redirected to a temporary file and deleted.
- Returns:
- dict of int:list of str: Mapping from unique integer IDs to lists of ontology term IDs.
- Raises:
- FileNotFoundError: NOBLE Coder cannot find the terminology file matching this ontology.
-
annotate_using_rabin_karp(ids_to_texts, ontology, fixcase=1)
Build a dictionary of annotations using the Rabin-Karp algorithm. This is useful for finding instances of ontology terms in larger text strings.
- Args:
ids_to_texts (dict of int:str): Mapping from unique integer IDs to natural language text strings.
ontology (oats.annotation.Ontology): Object of the ontology to be used.
fixcase (int, optional): Set to 1 to normalize all strings before matching, set to 0 to ignore this option.
- Returns:
- dict of int:list of str: Mapping from unique integer IDs to lists of ontology term IDs.
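For readers unfamiliar with Rabin-Karp, the following sketch shows the rolling-hash search that the algorithm is built on. It is a generic textbook version and not necessarily how oats implements it; the function name rabin_karp_find, the hash parameters, and the term IDs are assumptions made for illustration.

```python
def rabin_karp_find(pattern, text, base=256, mod=1_000_000_007):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return -1
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    for i in range(n - m + 1):
        # Compare the strings only when the hashes agree, to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:
            # Roll the hash: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return -1

# Made-up term IDs and labels, for illustration only.
labels = {"TOY:0000001": "leaf", "TOY:0000002": "root"}
text = "small leaf size"
found = [term_id for term_id, label in labels.items()
         if rabin_karp_find(label, text) != -1]
```

Because the window hash is updated in constant time per position, each label is located in roughly linear time in the length of the text.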
-
read_annotations_from_file(annotations_input_path, sep='\t')
Read a file of annotations and produce a dictionary. This is intended to read the types of files that are produced by the functions that write dictionaries of annotations to files, reversing that process to rebuild a dictionary from those files.
- Args:
- annotations_input_path (str): Path of the input annotations file to read.
- Returns:
- dict of int:list of str: Mapping from unique integer IDs to lists of ontology term IDs.
-
term_enrichment(all_ids_to_annotations, group_ids, ontology, inherited=False)
Obtain a dataframe with the results of a term enrichment analysis using Fisher's exact test, with the results sorted by p-value.
- Args:
all_ids_to_annotations (dict of int:list of str): A mapping between unique integer IDs (for genes) and list of ontology term IDs annotated to them.
group_ids (list of int): The IDs which should be a subset of the dictionary argument that refer to those belonging to the group to be tested.
ontology (oats.annotation.ontology.Ontology): An ontology object that should match the ontology from which the annotations are drawn.
inherited (bool, optional): By default this is false to indicate that the lists of ontology term IDs have not already been pre-populated to include the terms that are superclasses of the terms annotated to that given ID. Set to true to indicate that these superclasses are already accounted for and the process of inheriting additional terms should be skipped.
- Returns:
- pandas.DataFrame: A dataframe sorted by p-value that contains the results of the enrichment analysis with one row per ontology term.
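The per-term test can be illustrated without any dependencies. The sketch below is an assumption-laden simplification, not the oats implementation: it computes a one-sided (over-representation) Fisher exact p-value from the hypergeometric distribution with math.comb, skips the step of inheriting superclasses, and returns plain tuples instead of a dataframe. The names fisher_right_tail and enrich are hypothetical.

```python
from math import comb

def fisher_right_tail(k, n, K, N):
    """P(X >= k) when drawing n of N items, K of which carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enrich(all_ids_to_annotations, group_ids):
    """Return (term, in_group, overall, p_value) tuples sorted by p-value."""
    group = set(group_ids)
    N, n = len(all_ids_to_annotations), len(group)
    terms = {t for ts in all_ids_to_annotations.values() for t in ts}
    rows = []
    for term in terms:
        K = sum(term in ts for ts in all_ids_to_annotations.values())
        k = sum(term in all_ids_to_annotations[i] for i in group)
        rows.append((term, k, K, fisher_right_tail(k, n, K, N)))
    return sorted(rows, key=lambda row: row[3])

# Toy data: term "A" is concentrated in the group, term "B" is not.
annotations = {1: ["A"], 2: ["A"], 3: ["A", "B"], 4: ["B"], 5: ["B"], 6: ["B"]}
rows = enrich(annotations, group_ids=[1, 2, 3])
```

Here all three IDs annotated with "A" fall inside the group of three, giving a p-value of 1/20, so "A" sorts ahead of "B".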
-
write_annotations_to_file(annotations_dict, annotations_output_path, sep='\t')
Write a dictionary of annotations to a file. The produced file format of IDs followed by delimited ontology term IDs is used as an input and output format by some other packages, so this is included as an option for interfacing with other steps in a pipeline if necessary.
- Args:
annotations_dict (dict of int:list of str): Mapping from unique integer IDs to lists of ontology term IDs.
annotations_output_path (str): Path of the output file that will be created.
oats.annotation.ontology module
-
class Ontology(path)
Bases: pronto.ontology.Ontology
A wrapper class for pronto.ontology.Ontology that provides some extra functions that may be useful for natural language processing problems. Only the attributes and methods added by this derived class are documented here; see the documentation for the pronto package for information about the inherited class and its methods.
- Attributes:
term_to_tokens (dict of str:list of str): Mapping between ontology term IDs and lists of words that are related to those terms.
token_to_terms (dict of str:list of str): Mapping between words and the lists of ontology term IDs that are related to those words.
- Args:
- path (str): Path for the .obo file of the ontology to build this object from.
-
depth(term_id)
Given an ontology term ID, return the depth of that term in the hierarchical ontology graph. The depth is an integer giving the length of the shortest possible path from that term to a root term.
- Args:
- term_id (str): The ID for a particular ontology term.
- Returns:
- int: The depth of the term.
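Since a term can have several parents, the shortest path to a root is naturally found with a breadth-first search. The sketch below shows this on a toy hierarchy; the parents dictionary, the term IDs, and the function name toy_depth are made up for illustration and do not reflect how the class stores the graph.

```python
from collections import deque

# Toy is_a hierarchy (child -> parents); IDs are made up for illustration.
parents = {
    "T:root": [],
    "T:a": ["T:root"],
    "T:b": ["T:root"],
    "T:c": ["T:a"],
    "T:d": ["T:c", "T:b"],  # two routes to the root, of length 3 and 2
}

def toy_depth(term_id):
    """Length of the shortest path from term_id up to a term with no parents."""
    queue = deque([(term_id, 0)])
    seen = {term_id}
    while queue:
        term, dist = queue.popleft()
        if not parents[term]:
            return dist
        for parent in parents[term]:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    raise ValueError(f"no root reachable from {term_id}")

depths = {term: toy_depth(term) for term in parents}
```

For "T:d" the route through "T:b" is shorter than the route through "T:c" and "T:a", so its depth is 2 rather than 3.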
-
descendants(term_ids)
Given an ontology term ID, return a list of the term IDs for all the terms that are descendants of this particular term, including the term itself. This list is prepopulated using the pronto subclasses method. The only difference is that a list of term ID strings is returned instead of a generator of term objects, which is useful for other methods in this class. This also accepts a list of one or more term IDs, in which case the union of the descendants of all terms in the list is returned, including every term in the passed-in list.
- Args:
- term_ids (list of str or str): The ID of a particular ontology term, or a list of such IDs.
- Returns:
- list of str: The IDs of all descendant terms, including the given term(s).
-
ic(term_id, as_weight=True)
Given an ontology term ID, return the information content of that term, calculated from the structure of the hierarchical ontology graph. This value takes into account the depth of the term in the graph, as well as the proportion of the total graph that descends from this term. The equation used here is the depth of the term multiplied by the factor [1 - log(descendants+1)/log(total)]. As a result, information content is proportional to depth (it increases as terms get more specific), but is reduced when the number of descendants is very high. This calculates information content directly from the ontology graph, as an alternative to using the frequencies of terms appearing in data, which is useful when no such data is available.
- Args:
- term_id (str): The ID for a particular ontology term.
- Returns:
- float: The information content of the term.
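The formula above can be written out directly. The sketch below is only an illustration of the stated equation: it assumes that "descendants" counts the terms below the given term (not the term itself) and that "total" is the number of terms in the ontology, and it does not model the as_weight option. The function name information_content is hypothetical.

```python
from math import log

def information_content(depth, num_descendants, total_terms):
    """depth * [1 - log(descendants + 1) / log(total)], as described above."""
    return depth * (1 - log(num_descendants + 1) / log(total_terms))

# In an ontology of 1000 terms, compare terms at the same depth:
leaf_ic = information_content(depth=5, num_descendants=0, total_terms=1000)
broad_ic = information_content(depth=5, num_descendants=99, total_terms=1000)
root_ic = information_content(depth=0, num_descendants=999, total_terms=1000)
```

A leaf at depth 5 keeps its full depth as information content, a term at the same depth with 99 descendants is scaled down to a third of that, and the root scores zero.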
-
inherited(term_ids)
Given an ontology term ID, return a list of the term IDs for all the terms that are inherited by this particular term, including the term itself. The list is prepopulated using the pronto superclasses method. The only difference is that a list of term ID strings is returned instead of a generator of term objects, which is useful for other methods in this class. This also accepts a list of one or more term IDs, in which case the union of the terms inherited by all terms in the list is returned, including every term in the passed-in list.
- Args:
- term_ids (list of str or str): The ID of a particular ontology term, or a list of such IDs.
- Returns:
- list of str: The IDs of all inherited terms, including the given term(s).
-
similarity_ic(term_id_list_1, term_id_list_2, inherited=False, as_weight=True)
Find the similarity between two lists of ontology terms by finding the information content of the most specific term that is shared by the sets of all terms inherited by all terms in each list. In this case, the most specific term is the term with maximum information content.
- Args:
term_id_list_1 (list of str): A list of ontology term IDs.
term_id_list_2 (list of str): A list of ontology term IDs.
inherited (bool, optional): Setting to true indicates that the lists already include all inherited terms. By default this is set to false indicating that the ontology graph structure should be used to find the additional terms inherited by terms in each of the passed in sets.
- Returns:
- float: The maximum information content of any common ancestor between the two term lists.
-
similarity_jaccard(term_id_list_1, term_id_list_2, inherited=False)
Find the similarity between two lists of ontology terms by finding the Jaccard similarity between the two sets of all the terms that are inherited by each of the terms present in each list.
- Args:
term_id_list_1 (list of str): A list of ontology term IDs.
term_id_list_2 (list of str): A list of ontology term IDs.
inherited (bool, optional): Setting to true indicates that the lists already include all inherited terms. By default this is set to false indicating that the ontology graph structure should be used to find the additional terms inherited by terms in each of the passed in sets.
- Returns:
- float: The Jaccard similarity between the two lists of terms.
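The calculation can be illustrated on a toy hierarchy: expand each list to include every inherited term, then divide the size of the intersection by the size of the union. This is a sketch under stated assumptions rather than the class's own code; the parents dictionary, the term IDs, and the names toy_inherited and toy_jaccard are made up for illustration.

```python
# Toy is_a hierarchy (child -> parents); IDs are made up for illustration.
parents = {
    "T:root": [],
    "T:a": ["T:root"],
    "T:b": ["T:root"],
    "T:c": ["T:a"],
}

def toy_inherited(term_ids):
    """Union of the given terms and all of their ancestors."""
    stack = list(term_ids)
    result = set()
    while stack:
        term = stack.pop()
        if term not in result:
            result.add(term)
            stack.extend(parents[term])
    return result

def toy_jaccard(terms_1, terms_2):
    """Jaccard similarity of the two inherited term sets."""
    set_1, set_2 = toy_inherited(terms_1), toy_inherited(terms_2)
    union = set_1 | set_2
    return len(set_1 & set_2) / len(union) if union else 0.0

distant = toy_jaccard(["T:c"], ["T:b"])  # share only the root
close = toy_jaccard(["T:c"], ["T:a"])    # share "T:a" and the root
```

Terms on different branches share only the root and score 1/4, while a term and its direct parent share two of three terms and score 2/3.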
-
tokens()
Get a list of the tokens or words that appear in this ontology. This is intended to be useful for treating the ontology as a vocabulary source.
- Returns:
- list of str: The words present in the labels and synonyms of all terms in this ontology.