Chimezie’s posterous

Filed under

semanticweb

 

Updates to FuXi (backward chaining SPARQL/RIF/OWL entailment, various fixes, etc.)

So, I've re-organized python-dlp and FuXi. Originally, I thought of
python-dlp as a library on top of the expert system capabilities of a
SW reasoner (syntax libraries, etc.). Now with a small ecosystem of
(next generation) RDFlib-based, open-source libraries emerging from
the SW informatics being done in our Heart and Vascular Institute:
 
- Telescope 
- SPARQL to SQL method (wiki and changes)
- "rdflib-stable" revisions 
 
I've decided to separate the expert system (FuXi) into its own Google
Code project and decide soon what to do with python-dlp (the idea,
anyways):
 
It has a mercurial repository and can be cloned via:
 
hg clone https://fuxi.googlecode.com/hg/ fuxi
 
So, effectively, FuXi development has switched from SVN to
Mercurial :)
 
Shortly, I'll update the literature (wikis, etc.) on the python-dlp
Google Code project to (for now) act as a detour to the new project
until I determine what features, usecases are still left from the
original idea of having FuXi be everything inference-related on top of
"rdflib-stable"
 
I still have to determine if the RIF-Core XML syntax support should be
done using 4suite-xml, amara, amara2, or Python's (native) XML
processing capabilities (which is what rdflib originally did). This
will have an impact on the ease of distribution, etc.
 
Until then the recent changes are probably the last before the feature-
frozen candidate release testing leading to 1.0:
 
Revision 81d9eec480:
- Fixed use of Set (some performance boost perhaps)
- +++ b/lib/Rete/Magic.py (added imports from proof theory, minor
fixes, empty sip collections,
variable utility function, immutable dictionaries, )
+++ b/lib/Rete/Proof.py (updated proof builder, PML / RDF/XML
serialization, for top-down, query
generating strategies)
+++ b/lib/Rete/SidewaysInformationPassing.py (fix construction of sip
collection)
- Native Prolog-like Python implementation for RIF-Core, OWL, and
SPARQL (can be used to convert
SPARQL in-light of RIF-Core/OWL/OWL2 into a series of top-down
evaluated and combined SPARQL queries
against an RDF dataset)
 
Revision dc70e7a115:
Takes another Uniterm and in every case where the corresponding term
+ for both literals are different variables, we map from the
variable
+ for *this* uniterm to the corresponding variable of the other
.
Revision ff2acf3e96:
- Added --why option to command-line. Adds to a list of SPARQL
queries used to trigger the sip-
strategy to solve them top-down from the ruleset.
- minor checks/fixes for top-down method
 
Revision a0c14690be
fix to handle use of pyparsing or Bison for SPARQL parsing
 
There is a new wiki showing usage of top-down (backward-chaining) RIF/
OWL/N3 solver:
 
Also, the testOWL.py script has been updated with options for running
the OWL test using top-down or bottom-up reasoners:
 
$ python testOWL.py --bottomUp --singleTest=OWL/TransitiveProperty/
premises001
.. snip ..
$ python testOWL.py --topDown=ground --singleTest=OWL/
TransitiveProperty/premises001
.. snip ..
$ python testOWL.py --topDown=ground
ok
.. snip ..
----------------------------------------------------------------------
Ran 1 test in 3.089s
 
OK
 
For example, the command-line can be used to solve a OWL test by
remotely fetching the OWL class descriptions, converting them to RIF-
like rules internally (using the DLP feature), reducing the ruleset
and creating a rule/goal graph using the GMS strategy, taking a SPARQL/ASK
query via the --why option, and solving the query against the original
OWL graph using the new top-down feature, printing out a PML RDF/XML
proof.

Notice the backward chaining method (SipStrategy) can be passed a 'live' database-
connected graph in order to solve the query in backward-chaining
fashion by dispatching SPARQL queries against the possibly large RDF
warehouse).
 
The example below demonstrates the ability to handle the tail-
recursive rulesets / programs that can result from extracting rules
from an OWL ontology that includes RBox axioms about role transitivity
(the path between countries, for example). The top-down algorithm is
similar to prolog with memoization and last call optimization. Both
are trivial to implement in Python using one of many memioze
decorators and native generators respectively.
 
Note, the bottom-up (forward chaining) method using the RETE-UL
decision network is much quicker and only fires one rule (due to
sideways information passing optimization - similar to the kind used
by databases to answer relational queries in the face of live,
immaterial, and possibly recursive views). I'm devising larger in-
memory tests to determine why the forward chain algorithms that much
faster, but I imagine it has something to do with:
 
Ullman, JD , "Bottom-up beats top-down for datalog". Proceedings of
the eighth ACM symposium on Principles of database systems. pp
140-149, 1989, ACM New York, NY, USA.
 
$ time FuXi --ruleFacts --why="ASK { test:Ghent test:path
test:Amsterdam }" --ns=test=http://www.w3.org/2002/03owlt/
TransitiveProperty/premises001# --dlp --output=conflict
http://www.w3.org/2002/03owlt/TransitiveProperty/premises001
 
Time to solve goal ASK { test:Ghent test:path test:Amsterdam } top-
down: 17.4751281738 milli seconds

.. snip PML serlializtion of proof ..

Time to calculate closure on working memory:  2.55084037781 milli seconds

<Network: 1 rules, 4 nodes, 9 tokens in working memory, 1 inferred tokens>

<TerminalNode : CommonVariables: [?thJpxqsn18] (3 in left, 3 in right memories)>

test:path(?X ?thJpxqsn19) :- And( test:path(?X ?thJpxqsn18) test:path(?thJpxqsn18 ?thJpxqsn19) )

1 instanciations 

real 0m2.532s
user 0m1.770s
sys 0m0.479s

Proof generated via renderProof method on ProofBuilder instance

 

Filed under  //   fuxi   logic   logic-programming   n3   owl   rdf   semantic-web  

Comments [0]

Using OWL and Default Negation to Reason about Patient Records

We've been using SPARQL, OWL, and N3 lately to prototype the reporting
of common research variables to the Society of Thoracic Surgeon's
(STS) National Database. The reports are being run against our large
RDF dataset of abstracted electronic patient records from the
Cleveland Clinic's Electronic Health Record system. Our dataset
consists of about 200,000 patients each represented as statements in
named RDF graphs. The STS variables we are responsible for deriving
are represented using a combination of OWL-DL and Notation 3. The
constraints that do not benefit from the restricted, tree-like nature of description logic are captured using secondary plain Horn clauses
(or rules) represented in Notation 3.
 
We use an open source logic reasoning system for the semantic web that
converts the constraints and a SPARQL query for an RDF dataset
governed by these OWL-DL constraints into provably optimal sets of
rules used to calculate an entailed RDF graph (the specifics of this
method is a subject of a paper I'm working on for the RuleML 2009
conference). Such an entailed RDF graph can then be targeted with
SPARQL queries to answer for the STS variables. A recent challenge
has been to try to capture the semantics of negation in order to
implement 'exclusion criteria'. This is typically of the form of a
class of procedures that do not involve combinations of one or more
kinds of other procedures. A recent update to FuXi includes the
ability to convert OWL-DL expressions that use owl:complementOf into
general, stratified, logic programs that can be evaluated using SPARQL
in order to implement the semantics of stable model negation (which is
quite different from the way owl:complementOf is meant to be
interpreted: according to the negation of first-order logic).
 
In particular, statements of classical (first-order) negation are
making assertions about the lack of existence of models that satisfy
the positive form of such a statement in a theory. I prefer this
explanation to the way the term 'open world' assumption is often
used to describe this interpretation of negated terms in a
description logic language. Database theory, ofcourse, does not interpret negated terms
in this way, but instead (intuitively) understands statements of negated terms to be
'true' if the (ground) positive form is not in the set of known facts (the
database).

Our use of negation, and the nature of knowledge
recorded in a computer-based patient record system seems (so far) to
lend itself more to the database interpretation where there is an
understanding that a curated medical information system would have its
data entered under the governance of policies that would allow
medically useful inferences to be made from the absence of certain
facts about patient care.
 
In particular, if a fact is known to not be true about a patient or
some activity involving a patient, it is not recorded. This common
understanding can be used to make inferences about whether facts in a
patient record satisfy an exclusion criteria. Below is an example of
this:
 
Consider the following OWL descriptions of a class of operations:
 

SubClassOf(
 IntersectionOf(
  Operation
  PostOpInHOspitalEvent
  ObjectAllValuesFrom(
  involves
  ComplementOf( UnionOf( CardiacProcedure ThoracicControlBleeding ) ) )
 )
 sts:ReopForOtherNonCardiac )

The syntax above is the OWL2 functional-style syntax. We can
paraphrase the general class inclusion (GCI) axiom above as saying:
".. all operations that followed another operation and do not involve any 
cardiac procedures or thoracic control bleeding procedures.

The original documentation for this variable in the STS adult cardiac
database manual says: 

Indicate whether the patient returned to the operating room for
other non-cardiac reasons

Now, if we assume that all operations of interest and the involved
procedures are explicitly recorded in our patient RDF dataset. This
general class inclusion axioms can be reduced into a set of rules that use negated
'literals' (as they are called); understood to capture the semantics of
default negation (or the 'closed world assumption').  It is worth noting that this is exemplary of a class of expressions that description logic, tableaux-based reasoning algorithms often have problems with.

Conjunctive query answering for stratified datalog is a well-studied class of
problems in database theory. It is through the insight of this canon of theory that FuXi is now able to reduce
OWL-DL expressions that use owl:complementOf into sets of rules (or
logic programs) that can be efficiently processed in order to
implement SPARQL entailment regimes for combinations of OWL and
rule-based representations for the semantic web such as
Notation 3 or RIF core.
 
The current FuXi implementation converts the GCI into the following
two RIF rules:


Forall ?X ?QrjeKHuq961 (
?X # sts:ReopForOtherNonCardiac

  :- And(

    ?X # PostOpInHospitalEvent
    ?X # Operation,
    Naf ?X[involves -> ?QrjeKHuq961] ) )


Forall ?X ?QrjeKHuq961 (
?X # sts:ReopForOtherNonCardiac
  :- And(

    ?X # PostOpInHospitalEvent ,
    ?X # Operation,
    ?X[involves -> ?QrjeKHuq961],
    Naf ?QrjeKHuq961 # CardiacProcedure,
    Naf ?QrjeKHuq961 # ThoracicControlBleeding ) )


 Note, Naf is in the (current) 30 July 2008 version of the "RIF
Framework for Logic Dialects

The first rule describes members of the clas of ReopForOtherNonCardiac as those post-operative operations (i.e., operations that follow another operation in the same patient hospital visit or episode) that do not involve other procedures.

The second rule applies to those post-operative operations that do involve other procedures where these other operations are not either cardiac procedures or thoracic control bleeding procedures.

These RIF rules can be exchanged with other RIF-compliant rule-based
systems that implement any of the well-accepted semantics for negated
formulas in horn clause logic (stable models, well-founded models,
stratified models, etc.). A recent modification to FuXi makes
use of a programmatic SPARQL interface for Python that a colleague of
mind has been working on called telescope. It works with
rdflib (same as FuXi) and is used as a control layer that converts
negated RIF rules into a series of SPARQL queries involving
OPTIONAL/FILTER/!BOUND that are used to calculate "stratified models"
(i.e., the finite set of facts that can be inferred from the set of
rules that include negated literals).
 
Renzo Angles et al. (2008) and Polleres, A. (2007) have since
demonstrated that the expressive power of SPARQL coincides with that
of datalog with negation, so it comes as no suprise that certain
datalog clauses (or rules) can be converted into SPARQL queries using
so-called copy-patterns and the introduction of a MINUS operator. For
the details of how this operator works and how its semantics are
equivalent with that of datalog, the reader is urged to read any of
the above mentioned papers.
 
telescope is used to programatically convert MINUS operators into a
SPARQL queries that answer for RIF rules with the corresponding
negated frame formulas below:
 

SELECT ?X
WHERE {
 ?X a PostOpInHospitalEvent .
 ?X a Operation
 #The post-operative operation does not invlolve any procedures
 OPTIONAL { ?X involves ?QrjeKHuq961 }
 FILTER (!bound(?QrjeKHuq961))
}


SELECT ?X
WHERE {
 ?X a PostOpInHospitalEvent .
 ?X a Operation .
 ?X involves ?QrjeKHuq961
 #In the case where the post-operative operation involves a procedure
it is *not* either a
 # cardiac procedure or a thoracic control bleeding
 OPTIONAL {
  ?QrjeKHuq961 a CardiacProcedure .
  ?QrjeKHuq961 a ThoracicControlBleeding .
  ?QrjeKHuq16542 a CardiacProcedure .
  ?QrjeKHuq16542 a ThoracicControlBleeding
  FILTER (?QrjeKHuq961 = ?QrjeKHuq16542)
 }
 FILTER (!bound(?QrjeKHuq16542))
}


I'll be adding a wiki shortly (on the python-dlp google code wiki)
describing the explicit APIs that can be used for this purpose, but I
wanted to give the feature some context in the recent work I've been
doing on applications of semantic web for medical informatics
 
-- Chimezie

Filed under  //   n3   owl   rdf   semantic-web  

Comments [0]

RDF / OWL Tool Developers Should Respect xml:base!

Okay. I think it needs to be said that RDF / OWL tool-makers do not
do a good job of respecting the mechanics of xml:base . It seems
almost a given that most of them mangle the URI base resolution
conventions they find in existing RDF/OWL files, overwriting them with
their own. Often, this happens in a very destructive way that can be
even more annoying for a person who has much experience using XML and
thus has a better appreciation (perhaps) of the value of a base URI of
an OWL document for more than just owl:Ontology/@rdf:about.
 
This is the typical scenario: I create an OWL document like so:
 
<rdf:RDF>
 <owl:Ontology rdf:about="">
   <owl:imports rdf:resource=".. relative path .."/>
 </owl:Ontology>
 <owl:Class rdf:about="tag:info@example.com#Stuff"/>
</rdf:RDF>

 
First, let me describe the intent here. I think it is perfectly
reasonable to modularize large ontologies into fragments (if you will)
that are bundled together and linked via relative imports. Since they
are bundled together, I often like to use the power of URI resolution
to make relative references to imported OWL documents (as you can see
above) in a way that is completely independent from where the bundle
is being deployed (file system or on the web).
 
The downside of not using xml:base explicitly is that the URIs of my
classes need to be fully qualified, but as far as I'm concerned this
is outweighed by the advantage of knowing that I can deploy my bundled
import network on a website or on a filesystem and have any
self-respecting tool know how to resolve the relative URIs.
 
One of the more involved technical points during the development of
GRDDL was regarding the use of empty URI references in GRDDL results.
The primary importance of empty, relative URI references (in RDF serializations) is the
ability to refer to the containing document without necessarily having
the URI on hand. It was this particular dialogue that made me better
appreciate the power and simplicity of the URI base resolution process
that sits at the bottom of many of the important W3C specifications.
 
From RFC 3986 (the process of determining the Base URI to use when
resolving relative URIs):
 
* The base URI is embedded in the document's content.
* The base URI is that of the encapsulating entity (message, document, or none).
* The base URI is the URI used to retrieve the entity.
* The base URI is defined by the context of the application.
 
Getting back to my example above (this was supposed to be a short
rant). Too often what happens is that when I load an OWL document
such as the oneabove into an OWL editor and save it (even after not
making any changes), it results in:
 
<rdf:RDF xml:base="tag:info@example.com#">
 <owl:Ontology rdf:about="">
   <owl:imports rdf:resource="file:///path/to/OWL/document"/>
 </owl:Ontology>
 <owl:Class rdf:ID="Stuff"/>
</rdf:RDF>

 
<insert appropriate expletive>
 
Now, it's not so much the 'forced' use of rdf:ID that bothers me as
the fact that the relative paths used for the imports are now replaced
with absolute URIs. This sabotages the advantage I once had of being
able to rely on the URI resolution chain to be sufficient for clients
that need to resolve my relative URIs. Essentially, the OWL tool has
monopolized the opportunity to resolve relative URI references.
 
Now, in fairness, it seems much of the motivation of doing this is to
associate an explicit base URI with the ontology itself and often
ontology tools will complain if they are unable to determine an
absolute URI to use for this purpose. I think a better mechanism for
doing this would be explicit attributes or elements rather than
bastardization of the ability to give an explicit URI base in content.
 
It seems to me that if the author wanted to associate an explicit URI
with the ontology he or she would use one in the place of an empty,
relative URI reference. In fact, I would go as far as saying that it
is bad practice to forcibly insert an @xml:base in a document that
doesn't have one *and* uses empty, relative reference!
 
So, for any developers of RDF / OWL tools, please take care in trying
not to enforce a base resolution scheme despite the document author
providing one of their own for reasons that are orthogonal to
associating a URI to an ontology (and more important as far I'm
concerned): deployability.

Filed under  //   owl   rdf   semantic-web   tools   uri  

Comments [0]

Magic sets, DLP, and other strange ways to implement semantic web expert systems

I just finished some changes to python-dlp including a modification to
FuXi that includes an implementation of the Magic Set Transformation
(MST) method for RIF-like horn clauses. The most useful, immediate
value this has for me is to be able to (essentially) implement a DLP
(description logic programming) entailment regime for a SPARQL query
service.
 
Consider the test case for the SymmetricProperty OWL test.
 
The base facts are:
 
  first:Ghent first:path first:Antwerp .
  first:path a owl:SymmetricProperty
 
The goal we are trying to prove is:
 
  first:Antwerp first:path first:Ghent
 
I.e., a user wants to query an RDF dataset that includes an RDF graph
with the above statements and is expected to implement an entailment
regime for OWL-DL RDF such that the following query gives a positive
answer:
 
  ASK { first:Antwerp first:path first:Ghent }
 
The general pD* rule that would normally apply in helping answer this query is:
 
  {?P a owl:SymmetricProperty. ?S ?P ?O} => {?O ?P ?S}.
 
Re-written in a familiar (prolog-like) RIF-BLD syntax:
 
  Forall ?P ?O ?S ( ?P(?O ?S) :- And( owl:SymmetricProperty(?P) ?P(?S ?O) ) )
 
In order to maintain consistency, a rule-based engine that used this
clause to implement the definition of a symmetric property would need
to fire it for *every* triple in the fact base (in order to properly
calculate the herbrand base) because of the 2nd triple pattern in the
body / antecedent / left-hand-side of the rule: ?S ?P ?O
 
However, the DLP approach that converted tree-based OWL axioms into
colloquial horn clauses would allow us to use (instead) a
domain-specific rule:
 
  Forall ?Y ?X ( first:path(?Y ?X) :- first:path(?X ?Y) )]
 
This rule is domain-specific in the sense that it only applies to
instances of the first:path predicate rather than for every predicate.
 As a result of this transformation, the procedural evaluation of the
rule for symmetry has been reduced from the worst case to only the
fraction of the RDF dataset concerning first:path statements.
 
So, a knowledge base that could exhaustively evaluate rules in a
top-down fashion (via 'forward chaining') prior to bringing up the
SPARQL service could answer that question against the (smaller)
entailed RDF graph.
 
However, with the MST implementation, if the query was known a priori
the ruleset can be modified into a version that further restricts the
amount of redundant work done during the inference process. For
example, even if the SPARQL service is known to never have to answer
that query, the colloquial rule above would still be needed by a niave
implemenation and would apply to every statement that used the
first:path predicate.
 


 
The diagram above is a Graph-viz rendering of a Proof Markup Language
(PML) proof tree generated by taking the colloquial rule, modifying it
using the MST algorithm, evaluating the base facts against the
ruleset, and adding an RDF statement that 'triggers' the
backward-chaining process.  Fuxi includes a nice set of utilities for generating proof tree vizualizations.

 

Essentially, performing a top-down (or
forward chaining) evaluation of the rules and the facts simulates a
backward-chained proof.
 
Below are the 3 rules that replace the original domain-specific rule:
 
  :path_magic(?LOC1 ? LOC2) :- And( :path_magic(?X ? LOC2)
:path_bf(?X ? LOC1) :path_magic(?X) )
  :path_magic(?X) :- :path_magic(?X ?LOC)
  :path(?X ?LOC1) :- And( :path_magic(?X ? LOC1) :path_magic(?X)
:path(?X ?LOC2) :path_magic(? LOC2 ? LOC1) :path(?LOC2 ?LOC1) )
 
And finally, the trigger for the proof is the following RDF statement:
 
  first:Antwerp first:path_magic first:Ghent
 
The first two rules, pass through information about the sub-goals of
the query and essentially block the final rule from taking effect
until the trigger is added to the fact graph. It is clear to see that
the 3rd rule, will no longer apply to every RDF statement with a
first:path predicate, but rather only statements of that kind where
the subject and / or object terms are part of a query. So, for a
SPARQL service where we do not expect to answer queries that rely on
supporting symmetric properties in the first:path predicate, no
calculations will be performed and no unnecessary RDF statements will
be entailed.
 
I hope to write a bit more about some of the benefits of a Python
toolkit for building Semantic Web expert systems. I touched a bit on
these in my InfixOWL write-up and presentation, but haven't really put
the whole picture together.
 
-- Chimezie

 

Filed under  //   fuxi   logic   logic-programming   semantic-web  

Comments [0]