Ontology Generation
The primary objective of this schema is to provide a data model in which chemical entities and their groupings are represented as database instances, and to use that model to align entries across different chemical databases.
A secondary objective is to generate an OBO-style ontology from the data model, and to use that ontology to help advance the development of CHEBI.
Download OWL
See the ontology folder on GitHub.
OWL generation
See Makefile.etl for specific details.
The basic idea is to transform the Turtle instance data (where, for example, carbon is an instance of the ChemicalElement class, and carbon-12 is an instance of the sibling Isotope class) into classes, and to use reasoning to classify.
Currently this is done via SPARQL CONSTRUCT queries (see the owlgen folder).
Conversion to use linkml-owl is in progress.
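The instance-to-class step can be sketched in plain Python. This is not the project's SPARQL CONSTRUCT query and it does not use linkml-owl; `MonoatomicIon` and `to_owl_class_turtle` are hypothetical names introduced only for illustration. The output follows the shape of the manganese example in this document (an element class intersected with a charge restriction). The identifiers are copied as they appear there, so the text may need escaping or full IRIs before a Turtle parser would accept it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonoatomicIon:
    """Hypothetical stand-in for one instance-level database record."""
    symbol: str  # element symbol, e.g. "Mn"
    charge: int  # signed charge, e.g. 4
    label: str   # human-readable name, e.g. "manganese(4+)"


def to_owl_class_turtle(ion: MonoatomicIon) -> str:
    """Render a class-level equivalence axiom for one ion as Turtle-like text."""
    # The sign is always written out, so 4 becomes "+4" and -1 stays "-1".
    ion_id = f"chem:MonoatomicIon/{ion.symbol}/{ion.charge:+d}"
    element_id = f"chem:ChemicalElement/{ion.symbol}"
    return "\n".join([
        ion_id,
        "    a owl:Class ;",
        f'    rdfs:label "{ion.label}" ;',
        "    owl:equivalentClass",
        "        [ owl:intersectionOf",
        f"            ( {element_id}",
        "              [ a owl:Restriction ;",
        f"                owl:hasValue {ion.charge} ;",
        "                owl:onProperty chem:charge",
        "              ]",
        "            ) ] .",
    ])


mn4 = MonoatomicIon(symbol="Mn", charge=4, label="manganese(4+)")
print(to_owl_class_turtle(mn4))
```

Because the axiom is built from typed fields rather than from string columns, the same function applies unchanged to any other ion record.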
Example
Mn(+4) is represented in the database as an individual of type MonoatomicIon:
chem:MonoatomicIon/Mn/+4 rdf:type chem:MonoatomicIon ;
    rdfs:label "manganese(4+)" ;
    ns1:chebi_iri CHEBI:25158 ;
    ns1:charge 4 ;
    ns1:has_element chem:Mn ;
    ns1:inchi_string "InChI=1S/Mn/q+4"^^xsd:string .
This is translated to the class level (via this query):
chem:MonoatomicIon/Mn/+4
    a owl:Class ;
    rdfs:label "manganese(4+)" ;
    owl:equivalentClass
        [ owl:intersectionOf
            ( chem:ChemicalElement/Mn
              [ a owl:Restriction ;
                owl:hasValue 4 ;
                owl:onProperty chem:charge
              ]
            ) ] .
This class will be automatically classified under "manganese ion" and similar groupings.
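To give an intuition for why an equivalence axiom is enough for a reasoner to place a class, here is a toy structural subsumption check. It is a simplification, not an OWL reasoner, and the definitions in `DEFINITIONS` are assumptions made up for this sketch (in particular, treating "manganese ion" as "manganese with some charge" is not taken from the project's axioms).

```python
ANY = object()  # sentinel meaning "the property has some value, whichever it is"

# Hypothetical definitions: each class is described by property constraints.
DEFINITIONS = {
    "manganese element": {"element": "Mn"},
    "manganese ion": {"element": "Mn", "charge": ANY},
    "manganese(4+)": {"element": "Mn", "charge": 4},
    "manganese(2+)": {"element": "Mn", "charge": 2},
    "iron(2+)": {"element": "Fe", "charge": 2},
}


def subsumes(parent: dict, child: dict) -> bool:
    """Return True if every constraint of `parent` is also met by `child`."""
    for prop, value in parent.items():
        if prop not in child:
            return False
        if value is not ANY and child[prop] != value:
            return False
    return True


def inferred_parents(name: str) -> list[str]:
    """List every other class whose definition subsumes the named class."""
    child = DEFINITIONS[name]
    return sorted(
        other
        for other, definition in DEFINITIONS.items()
        if other != name and subsumes(definition, child)
    )


print(inferred_parents("manganese(4+)"))
# -> ['manganese element', 'manganese ion']
```

No parent is ever asserted by hand: the placement falls out of comparing definitions, which is the property the class-level translation relies on.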
Here is an example of the atom hierarchy in Protégé, showing automatic classification:
Two-level representation
One thing that may seem unintuitive is that instances at the LinkML level are classes at the OBO level. This is illustrated here:
Relationship to templating systems
One way to view this project is:
- the schema is a hierarchical collection of DOSDP templates or ROBOT templates
- the database is the set of TSVs/spreadsheets that are the inputs to the templates used to generate OWL
Using LinkML as the modeling system provides some advantages. Rather than a collection of denormalized tables, the inputs to the OWL generation are objects/instances/rows conforming to a full object model/schema, allowing for both rigorous modeling and powerful programmatic transformations.
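The difference between loose template rows and schema-conforming objects can be sketched as follows. This is a plain-Python stand-in, not LinkML itself: `IonRow`, `load_rows`, the column names, and the non-zero-charge rule are all assumptions chosen for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class IonRow:
    """Hypothetical typed record; a schema would normally generate this."""
    symbol: str
    charge: int
    label: str

    def __post_init__(self):
        # Assumed constraints, checked once when the object is created.
        if not self.symbol.isalpha():
            raise ValueError(f"bad element symbol: {self.symbol!r}")
        if self.charge == 0:
            raise ValueError("an ion must carry a non-zero charge")


def load_rows(tsv_text: str) -> list[IonRow]:
    """Parse tab-separated text into validated IonRow objects."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        IonRow(symbol=row["symbol"], charge=int(row["charge"]), label=row["label"])
        for row in reader
    ]


TSV = "symbol\tcharge\tlabel\nMn\t4\tmanganese(4+)\nMn\t2\tmanganese(2+)\n"
ions = load_rows(TSV)
print([ion.label for ion in ions])
```

Downstream code then works with typed fields such as `ion.charge` instead of raw strings, and malformed rows fail at load time rather than during OWL generation.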