Homework #5: OWL Constraint Checker
Due: Thursday, Oct. 16
In this assignment, you will write a tool that takes as input a simple
OWL ontology and an OWL data document, and performs some basic checks
to determine whether the data document obeys the constraints of the ontology.
In order to keep this task manageable, we will only be concerned
with checking the following constraints:
- That the rdf:type of every instance in the data file is a class
defined in the ontology file.
- That, if the rdf:type of an instance is a class defined as a
subclass of an OWL DL Restriction, the instance's property values obey
the specified constraint.
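A class restricted this way appears in the ontology as an rdfs:subClassOf link to an anonymous node typed owl:Restriction, while an ordinary superclass is a named class. The sketch below shows one possible way to tell the two apart; it is not the API of the provided OwlClass.java. It assumes, hypothetically, that the ontology's triples have already been copied into plain String[3] arrays with abbreviated names such as owl:onProperty (in the real tool they would come from a Jena Model), and the class and method names are illustrative only.

```java
import java.util.*;

/** Hypothetical sketch: separate named superclasses from owl:Restriction nodes. */
class SubClassSplitSketch {

    /** One restriction on a property, e.g. minCardinality 2 on hasMember. */
    static final class Restriction {
        final String onProperty; // value of owl:onProperty
        final String kind;       // e.g. "owl:minCardinality"
        final String value;      // a cardinality, a class, or a required value

        Restriction(String onProperty, String kind, String value) {
            this.onProperty = onProperty;
            this.kind = kind;
            this.value = value;
        }

        @Override
        public String toString() {
            return kind + " " + value + " on " + onProperty;
        }
    }

    // The restriction kinds this assignment is concerned with.
    static final Set<String> KINDS = new HashSet<>(Arrays.asList(
            "owl:minCardinality", "owl:maxCardinality", "owl:cardinality",
            "owl:hasValue", "owl:allValuesFrom", "owl:someValuesFrom"));

    /** Objects of all triples matching (subject, predicate, ?o). */
    static List<String> objects(List<String[]> triples, String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(predicate)) {
                result.add(t[2]);
            }
        }
        return result;
    }

    /** True if the node is typed owl:Restriction. */
    static boolean isRestriction(List<String[]> triples, String node) {
        return objects(triples, node, "rdf:type").contains("owl:Restriction");
    }

    /** rdfs:subClassOf targets that are named classes rather than restrictions. */
    static List<String> namedSuperclasses(List<String[]> triples, String cls) {
        List<String> result = new ArrayList<>();
        for (String sup : objects(triples, cls, "rdfs:subClassOf")) {
            if (!isRestriction(triples, sup)) {
                result.add(sup);
            }
        }
        return result;
    }

    /** Restrictions reached from cls through rdfs:subClassOf. */
    static List<Restriction> restrictions(List<String[]> triples, String cls) {
        List<Restriction> result = new ArrayList<>();
        for (String node : objects(triples, cls, "rdfs:subClassOf")) {
            if (!isRestriction(triples, node)) continue;
            List<String> props = objects(triples, node, "owl:onProperty");
            if (props.isEmpty()) continue; // malformed restriction: skip it
            for (String[] t : triples) {
                if (t[0].equals(node) && KINDS.contains(t[1])) {
                    result.add(new Restriction(props.get(0), t[1], t[2]));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(
                new String[] {"Band", "rdfs:subClassOf", "Group"},
                new String[] {"Band", "rdfs:subClassOf", "_:r1"},
                new String[] {"_:r1", "rdf:type", "owl:Restriction"},
                new String[] {"_:r1", "owl:onProperty", "hasMember"},
                new String[] {"_:r1", "owl:minCardinality", "2"});
        System.out.println(namedSuperclasses(triples, "Band")); // [Group]
        System.out.println(restrictions(triples, "Band"));      // [owl:minCardinality 2 on hasMember]
    }
}
```

Keeping the two lists separate means the named superclasses can later feed the subclass test, while the restrictions feed the constraint checks.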
Furthermore, the input ontologies we use will not have:
- rdfs:subPropertyOf, rdfs:domain, or rdfs:range
- RDF-style datatyping (à la the rdf:datatype attribute)
- versioning or imports information
- property characteristics, such as owl:FunctionalProperty, owl:inverseOf, etc.
- ontology mapping properties, such as owl:equivalentClass, owl:equivalentProperty, etc.
- complex classes, such as those defined by owl:intersectionOf, owl:oneOf, etc.
When checking constraints, the tool should make both the unique names
assumption and the closed world assumption. That is, you can
assume that if two resources have different URIs then they are
distinct, and you can assume that all relevant data is specified
in the data file. Therefore, if there were a minCardinality of 2
on a property, and an instance had only one value for that property,
that would be considered a constraint violation that the tool should
report. A particular class may have multiple constraints, and each
must be checked for every instance of the class.
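Under these two assumptions a cardinality check reduces to counting: the unique names assumption lets distinct identifiers be counted as distinct values, and the closed world assumption lets that count be treated as final. The following is a minimal sketch under the hypothetical assumption that an instance's values for one property have already been collected into a list of strings; the class name and the kind strings are illustrative, not required names.

```java
import java.util.*;

/** Hypothetical sketch: cardinality checks under the closed world and unique names assumptions. */
class CardinalityCheckSketch {

    /**
     * Unique names assumption: values with different identifiers are different
     * individuals, so the count is the number of distinct identifiers.
     * Closed world assumption: the values passed in are all the values there are.
     */
    static int countValues(Collection<String> values) {
        return new HashSet<>(values).size();
    }

    /** Returns true if the values violate the given cardinality restriction. */
    static boolean violates(String kind, int n, Collection<String> values) {
        int count = countValues(values);
        switch (kind) {
            case "minCardinality":
                return count < n;
            case "maxCardinality":
                return count > n;
            case "cardinality":
                return count != n;
            default:
                throw new IllegalArgumentException("not a cardinality restriction: " + kind);
        }
    }

    public static void main(String[] args) {
        // band1 has only one hasMember value, but the class requires at least 2.
        List<String> members = Arrays.asList("person1");
        System.out.println(violates("minCardinality", 2, members)); // true
    }
}
```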
(Note: although you should not usually make the unique names
assumption and the closed world assumption with respect to the
Semantic Web, it is okay for a particular application with known
ontologies and data to do so).
Your program should be run as:
java Checker ontfile datafile
where ontfile is the pathname of an OWL ontology file, and
datafile is the pathname of a file with OWL instance data.
The output of your program should be:
No constraints violated.
or a list of specific error messages.
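The two possible outcomes can be captured by a small reporting helper. This is only a hypothetical skeleton of the Checker entry point: the argument handling and the report method are illustrative, the actual ontology and data checks are left as a placeholder, and joining messages with a newline is an assumed choice rather than a stated requirement.

```java
import java.util.*;

/** Hypothetical skeleton of the Checker entry point; only the reporting logic is filled in. */
class Checker {

    /** Joins the error messages, or returns the success message if there are none. */
    static String report(List<String> errors) {
        if (errors.isEmpty()) {
            return "No constraints violated.";
        }
        return String.join("\n", errors);
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java Checker ontfile datafile");
            System.exit(1);
        }
        // Placeholder: a real checker would parse args[0] (ontology) and
        // args[1] (data) with Jena and add one message per violation found.
        List<String> errors = new ArrayList<>();
        System.out.println(report(errors));
    }
}
```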
In the case of an instance that is of an undefined type, report an error
message of the form:
ERROR: Undefined Type - Instance instance_id
member of class class_id
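A minimal sketch of this check follows. It assumes, hypothetically, that the data file's rdf:type values have already been gathered into a map from instance ID to type list and that the ontology's class names are in a set; it also assumes the message above is printed on a single line. Untyped instances simply produce no message.

```java
import java.util.*;

/** Hypothetical sketch of the undefined-type check. */
class UndefinedTypeSketch {

    /**
     * @param types          rdf:type values of each instance in the data file
     * @param definedClasses classes defined in the ontology file
     * @return one error message per (instance, undefined class) pair
     */
    static List<String> check(Map<String, List<String>> types, Set<String> definedClasses) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : types.entrySet()) {
            // An untyped instance has an empty list and produces no messages.
            for (String cls : e.getValue()) {
                if (!definedClasses.contains(cls)) {
                    errors.add("ERROR: Undefined Type - Instance " + e.getKey()
                            + " member of class " + cls);
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, List<String>> types = new LinkedHashMap<>();
        types.put("band1", Arrays.asList("Band"));
        types.put("song1", Arrays.asList("Song"));
        types.put("thing1", Collections.emptyList());
        Set<String> defined = new HashSet<>(Arrays.asList("Band", "Person"));
        // prints: ERROR: Undefined Type - Instance song1 member of class Song
        check(types, defined).forEach(System.out::println);
    }
}
```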
In the case of the violation of
a restriction, the message should report
the ID of the instance in which the error occurs, the type of
constraint that is violated, the class in which this constraint is
specified, and the property on which the constraint is placed.
For example:
ERROR: Property Restriction Violation
Instance: band1
Class: Band
Property: hasMember
Constraint: minCardinality = 2
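A small helper that reproduces this layout is sketched below; the class and method names are hypothetical. Note that the class argument should be the class in which the restriction is specified, which may be a superclass of the instance's own type rather than the type itself.

```java
/** Hypothetical helper that formats a restriction violation in the layout shown above. */
class ViolationMessageSketch {

    static String format(String instance, String cls, String property, String constraint) {
        return "ERROR: Property Restriction Violation\n"
                + "Instance: " + instance + "\n"
                + "Class: " + cls + "\n"
                + "Property: " + property + "\n"
                + "Constraint: " + constraint;
    }

    public static void main(String[] args) {
        // Reproduces the band1 example above.
        System.out.println(format("band1", "Band", "hasMember", "minCardinality = 2"));
    }
}
```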
Use of Jena
You must use Jena to parse the input files.
You will need to use classes from the com.hp.hpl.jena.rdf.model package,
and may find the com.hp.hpl.jena.vocabulary package useful as well.
Do not directly use any other packages from the Jena distribution without
my permission. In particular, you are forbidden to use Jena's ontology
or reasoner components. Note: for Jena to run correctly, you will need
to include jena.jar, log4j-1.2.7.jar, and icu4j.jar in your classpath.
Design Hints
Even with the simplifications specified above, this task can be challenging,
so here is a suggested approach to solving this problem:
1. Create a set of Java classes that can parse an ontology and store
basic OWL class information, including superclasses and property
restrictions. To help you out, I have provided three unfinished
classes that you may use (OwlOnt.java, OwlClass.java, and
OwlProperty.java). These classes provide basic data structures and
some simple access methods. However, you will have to implement the
methods for parsing an ontology from an OWL file and, eventually, for
testing whether one OWL class is a subclass of another (see Step 4 for
the latter). You may modify these Java classes in any way you wish, or
choose not to use them at all. In any case, be careful when trying
to determine the superclasses of an OWL class. The property
rdfs:subClassOf can be used either with a named class or with
an anonymous owl:Restriction. These two forms should be treated
differently by your application.
2. Write code that reads a data file, determines what the instances are,
and verifies that every type corresponds to a class in the ontology file.
Print an error message for each instance that fails this test. You
can ignore instances that are untyped.
3. Since the cardinality and hasValue constraints should be the
easiest to check, you should next write code that checks whether
each typed instance identified in Step 2 obeys these constraints,
and reports any violations (e.g., an instance has a property that
doesn't have enough values, or has a property that doesn't include
the value specified by an owl:hasValue restriction). Note that this
must be done after Steps 1 and 2, because it requires the program to
know the type of the instance and which constraints apply to that type
(as specified in the ontology).
Since the constraints of any superclasses of the type also apply,
be sure your program checks these as well.
4. Write code to test whether one class is a subclass of another,
either explicitly or implicitly.
Because we are restricting ourselves to very simple ontologies,
we only need to look at the explicit rdfs:subClassOf relations and
any transitive inferences that result (e.g., if A is a subclass of B,
and B is a subclass of C, then we can conclude that A is also a subclass of C).
Note that more complex ontologies would require sophisticated reasoning
methods to determine all of the implicit subClassOf relations.
5. Write code to check the owl:allValuesFrom and owl:someValuesFrom
constraints. You must take extra care when checking these kinds of
constraints. An instance of a class is also an instance of all of its
superclasses, so be sure to take this into account when you try to
determine whether the value of a property is of the type specified by
the constraint.
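The subclass test and the value constraints described above fit together naturally: once the transitive closure of rdfs:subClassOf is available, "is this value an instance of C?" becomes "is any of its asserted types C or a subclass of C?". The sketch below is one possible approach, not the structure of the provided classes; all names are hypothetical, subclass edges and instance types are assumed to be registered by earlier parsing code, and a value with no asserted type is treated as belonging to no class, in keeping with the closed world assumption. A trivial owl:hasValue check is included for completeness.

```java
import java.util.*;

/** Hypothetical sketch: subclass closure plus value restrictions. */
class ValueConstraintSketch {
    // Explicit rdfs:subClassOf edges between named classes: class -> direct superclasses.
    private final Map<String, Set<String>> directSupers = new HashMap<>();
    // Asserted rdf:type values of each instance in the data file.
    private final Map<String, Set<String>> types = new HashMap<>();

    void addSubClassOf(String sub, String sup) {
        directSupers.computeIfAbsent(sub, k -> new HashSet<>()).add(sup);
    }

    void addType(String instance, String cls) {
        types.computeIfAbsent(instance, k -> new HashSet<>()).add(cls);
    }

    /** cls itself plus every class reachable through rdfs:subClassOf (transitive closure). */
    Set<String> superclassesOf(String cls) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(cls);
        while (!pending.isEmpty()) {
            String c = pending.pop();
            if (seen.add(c)) { // the seen set also guards against subclass cycles
                pending.addAll(directSupers.getOrDefault(c, Collections.emptySet()));
            }
        }
        return seen;
    }

    boolean isSubClassOf(String sub, String sup) {
        return superclassesOf(sub).contains(sup);
    }

    /** An instance of a class is also an instance of all of its superclasses. */
    boolean isInstanceOf(String instance, String cls) {
        for (String t : types.getOrDefault(instance, Collections.emptySet())) {
            if (isSubClassOf(t, cls)) {
                return true;
            }
        }
        return false; // closed world: an untyped value belongs to no class
    }

    /** owl:allValuesFrom: every value is an instance of cls (true if there are no values). */
    boolean allValuesFrom(Collection<String> values, String cls) {
        for (String v : values) {
            if (!isInstanceOf(v, cls)) {
                return false;
            }
        }
        return true;
    }

    /** owl:someValuesFrom: at least one value is an instance of cls. */
    boolean someValuesFrom(Collection<String> values, String cls) {
        for (String v : values) {
            if (isInstanceOf(v, cls)) {
                return true;
            }
        }
        return false;
    }

    /** owl:hasValue: the required value is among the values. */
    static boolean hasValue(Collection<String> values, String required) {
        return values.contains(required);
    }

    public static void main(String[] args) {
        ValueConstraintSketch o = new ValueConstraintSketch();
        o.addSubClassOf("Guitarist", "Musician");
        o.addSubClassOf("Musician", "Person");
        o.addType("john", "Guitarist");
        o.addType("rex", "Dog");
        List<String> members = Arrays.asList("john", "rex");
        System.out.println(o.isSubClassOf("Guitarist", "Person")); // true
        System.out.println(o.allValuesFrom(members, "Person"));    // false: rex is a Dog
        System.out.println(o.someValuesFrom(members, "Person"));   // true: john is a Person
    }
}
```

Recomputing the closure on every query keeps the sketch short; caching superclassesOf per class would be a natural refinement for larger ontologies.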
Submission Instructions
This assignment is due by the beginning of class on Thursday, Oct. 16.
Create a zip or tar file that contains both your source code (.java)
and compiled (.class) files (but do not include any of the Jena files
in it). If you used the three files I provided, make sure they are
included. Also, if you use a .BAT file or some other form of script
to compile and run your program, please include this as well.
Send the combined file to heflin@cse.lehigh.edu with the
subject line "CSE 497 - Homework #5". Also print out your .java files
and turn them in during class. All of your files should be reasonably
commented, including an initial comment that identifies you as
the author and descriptive comments for each class and method.