Project Documentation & Protocols: Maize Mapping Project: Ontologies and Controlled Vocabularies
Overview of Ontologies and Controlled Vocabularies:
Controlled vocabulary terms, retrieved from an appropriate ontology, facilitate the assignment of descriptive biological and botanical terms to genetic map and physical map data. The resulting mapping facilitates data retrieval and comprehension of data within a database. The application of ontology products should also facilitate communication between plant-based databases (interoperability), providing a uniform platform for queries based on controlled vocabularies. The terms used in the controlled vocabularies are derived from internationally published sources. This research is a component of that being undertaken by the Plant Ontology™ Consortium. The Plant Ontology™ (PO) Consortium is extending a paradigm developed by the Gene Ontology™ (GO) Consortium (http://www.geneontology.org).
Databases associated with the Plant Ontology™ Consortium are:
+ Gramene: A Resource for Comparative Grass Genomics (http://www.gramene.org);
+ MaizeDB (http://www.agron.missouri.edu);
+ The International Rice Research Institute (IRRI - http://www.irri.org), associated with the International Crop Information System (ICIS) database (http://www.cgiar.org/icis/).
The Arabidopsis Information Resource (TAIR - http://www.arabidopsis.org/) is closely associated with the collaborative efforts of the Plant Ontology™ Consortium.
+ The development of ontology products, primarily for Zea mays;
+ The development of software tools to display ontology products for Zea mays;
+ The application of ontology products to data in the MaizeDB database.
+ Plant Ontology Consortium
+ Gramene: A resource for comparative grass genomics (http://www.gramene.org/plant_ontology/) - consult the Trait Ontology™ for rice.
+ Gene Ontology™ Consortium (http://www.geneontology.org) - consult the ontologies for the three knowledge domains of a generic eukaryotic cell: molecular function, biological process and cellular component.
Background on Controlled Vocabularies and Ontologies:
Plant databases are expanding in number, size and complexity. This is especially true of economically important plant taxa such as maize/corn (Zea mays), rice (Oryza sativa) and soybean (Glycine max), but is also true of taxa regarded as "model organisms" for plant science research purposes, such as Arabidopsis thaliana and rice (Oryza sativa). These information-rich databases face the challenge of accurately and consistently documenting features such as gene structures, products and functions, phenotypes, traits, developmental stages and anatomical parts, among other information.
It will be increasingly desirable to perform inter-database queries between these plant-based databases, both to exploit comparative genomic strategies that elucidate functional aspects of plant biology and to conduct studies of synteny. Such queries will allow data to be interpolated and extrapolated, supporting the development of further hypotheses to be tested. However, the terms used to describe comparable objects within and between databases are sometimes quite variable, which limits the ability to query information accurately and successfully in and across different databases. One solution to this problem involves the development and application of structured controlled vocabularies arranged in ontologies.
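As a rough illustration of why variable terms hinder cross-database queries, the following Python sketch maps free-text labels onto a single controlled term before comparing records. The synonym table, the record layout and the locus names (locus_a1, locus_b7) are hypothetical and are not taken from MaizeDB, Gramene or any other database named here.

```python
# Hypothetical synonym table: free-text labels -> one controlled vocabulary term.
SYNONYMS = {
    "corn": "Zea mays",
    "maize": "Zea mays",
    "zea mays": "Zea mays",
    "rice": "Oryza sativa",
    "oryza sativa": "Oryza sativa",
}


def normalize(label):
    """Return the controlled term for a label, or None if the label is unknown."""
    return SYNONYMS.get(label.strip().lower())


def query(records, taxon_term):
    """Return the loci of all records whose taxon label maps to taxon_term."""
    return [r["locus"] for r in records if normalize(r["taxon"]) == taxon_term]


# Two hypothetical databases that describe the same taxon with different labels.
db_a = [{"taxon": "Maize", "locus": "locus_a1"}]
db_b = [{"taxon": "corn", "locus": "locus_b7"}]

# A single query on the controlled term retrieves records from both sources.
matches = query(db_a + db_b, "Zea mays")
```

Without the shared term, a query for "Maize" would miss the record labelled "corn"; with it, both records are returned by one query.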
What is an ontology? An ontology is a classification methodology for formalizing a subject's knowledge in a structured way (typically for consumption by an electronic database). A database schema is an example of an ontology. In the world of structured information, ontologies, comprising structured controlled vocabularies, play a very important role in facilitating information retrieval. Furthermore, the definitions that accompany the controlled vocabulary terms are very important in that they facilitate the consistent use of the controlled vocabulary terms in database curation.
Biology-based Ontologies: In biology-based ontologies the controlled vocabulary terms are arranged in such a way that their placement reflects the known or putative biological associations between the objects represented by the controlled vocabulary terms. Considerable effort must be invested into the compilation of the controlled vocabulary terms and the definitions of these terms.
The research in the area of descriptors is focusing on the development of an anatomy (incl. morphology) ontology and a structured controlled vocabulary, with definitions, for maize/corn/Zea mays. The controlled vocabularies will facilitate the mapping of phenotypes and traits to the genetic map data.
The incorporation of controlled vocabularies into an ontology for Zea mays involves the philosophy and methodology of Directed Acyclic Graphs (DAGs). A DAG is similar to a hierarchical structure but is superior because a term (representing a concept) within a DAG can have more than one "parent". Consequently, DAGs are able to represent biological relationships more readily than typical hierarchical structures. A novel approach in ontology development, involving the more explicit use of ontogenetic and phylogenetic data and concepts in the elucidation and development of ontological relationships, is also being considered. This approach should facilitate the testing of the True Path Rule in the development of plant ontologies. The True Path Rule states that the pathway from a "child" term all the way up to its top-level "parent(s)" must always be biologically accurate. Further information on the True Path Rule is available from: http://www.geneontology.org/GO.usage.html.
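The difference between a DAG and a strict hierarchy can be sketched in a few lines of Python. The terms and parent links below are hypothetical examples chosen for illustration; they are not the project's ontology, and the functions are only one possible way to walk such a structure. Listing every path from a child term to the top level gives a curator the set of statements that the True Path Rule requires to be biologically accurate.

```python
from collections import deque

# Hypothetical terms and parent links, for illustration only.
PARENTS = {
    "plant structure": [],
    "organ": ["plant structure"],
    "inflorescence": ["plant structure"],
    "leaf": ["organ"],
    "ear": ["inflorescence"],
    "husk leaf": ["leaf", "ear"],  # two parents: not possible in a strict tree
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links upward."""
    seen = set()
    queue = deque(parents[term])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return seen


def paths_to_root(term, parents=PARENTS):
    """List every path from `term` up to a top-level term (one with no parents)."""
    if not parents[term]:
        return [[term]]
    return [
        [term] + rest
        for parent in parents[term]
        for rest in paths_to_root(parent, parents)
    ]


# Each printed path is one chain of statements a curator would need to check.
for path in paths_to_root("husk leaf"):
    print(" -> ".join(path))
```

In this sketch a record annotated with "husk leaf" would also be retrieved by a query on "leaf" or on "ear", which is why every path, not just one, has to hold.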