Chimezie’s posterous

Filed under

n3

 

Updates to FuXi (backward chaining SPARQL/RIF/OWL entailment, various fixes, etc.)

So, I've re-organized python-dlp and FuXi. Originally, I thought of
python-dlp as a library on top of the expert system capabilities of a
SW reasoner (syntax libraries, etc.). Now with a small ecosystem of
(next generation) RDFlib-based, open-source libraries emerging from
the SW informatics being done in our Heart and Vascular Institute:
 
- Telescope 
- SPARQL to SQL method (wiki and changes)
- "rdflib-stable" revisions 
 
I've decided to separate the expert system (FuXi) into its own Google
Code project and decide soon what to do with python-dlp (the idea,
anyways):
 
It has a mercurial repository and can be cloned via:
 
hg clone https://fuxi.googlecode.com/hg/ fuxi
 
So, effectively, FuXi development has switched from SVN to
Mercurial :)
 
Shortly, I'll update the literature (wikis, etc.) on the python-dlp
Google Code project to (for now) act as a detour to the new project
until I determine what features, usecases are still left from the
original idea of having FuXi be everything inference-related on top of
"rdflib-stable"
 
I still have to determine if the RIF-Core XML syntax support should be
done using 4suite-xml, amara, amara2, or Python's (native) XML
processing capabilities (which is what rdflib originally did). This
will have an impact on the ease of distribution, etc.
 
Until then the recent changes are probably the last before the feature-
frozen candidate release testing leading to 1.0:
 
Revision 81d9eec480:
- Fixed use of Set (some performance boost perhaps)
- +++ b/lib/Rete/Magic.py (added imports from proof theory, minor
fixes, empty sip collections,
variable utility function, immutable dictionaries, )
+++ b/lib/Rete/Proof.py (updated proof builder, PML / RDF/XML
serialization, for top-down, query
generating strategies)
+++ b/lib/Rete/SidewaysInformationPassing.py (fix construction of sip
collection)
- Native Prolog-like Python implementation for RIF-Core, OWL, and
SPARQL (can be used to convert
SPARQL in-light of RIF-Core/OWL/OWL2 into a series of top-down
evaluated and combined SPARQL queries
against an RDF dataset)
 
Revision dc70e7a115:
Takes another Uniterm and in every case where the corresponding term
+ for both literals are different variables, we map from the
variable
+ for *this* uniterm to the corresponding variable of the other
.
Revision ff2acf3e96:
- Added --why option to command-line. Adds to a list of SPARQL
queries used to trigger the sip-
strategy to solve them top-down from the ruleset.
- minor checks/fixes for top-down method
 
Revision a0c14690be
fix to handle use of pyparsing or Bison for SPARQL parsing
 
There is a new wiki showing usage of top-down (backward-chaining) RIF/
OWL/N3 solver:
 
Also, the testOWL.py script has been updated with options for running
the OWL test using top-down or bottom-up reasoners:
 
$ python testOWL.py --bottomUp --singleTest=OWL/TransitiveProperty/
premises001
.. snip ..
$ python testOWL.py --topDown=ground --singleTest=OWL/
TransitiveProperty/premises001
.. snip ..
$ python testOWL.py --topDown=ground
ok
.. snip ..
----------------------------------------------------------------------
Ran 1 test in 3.089s
 
OK
 
For example, the command-line can be used to solve a OWL test by
remotely fetching the OWL class descriptions, converting them to RIF-
like rules internally (using the DLP feature), reducing the ruleset
and creating a rule/goal graph using the GMS strategy, taking a SPARQL/ASK
query via the --why option, and solving the query against the original
OWL graph using the new top-down feature, printing out a PML RDF/XML
proof.

Notice the backward chaining method (SipStrategy) can be passed a 'live' database-
connected graph in order to solve the query in backward-chaining
fashion by dispatching SPARQL queries against the possibly large RDF
warehouse).
 
The example below demonstrates the ability to handle the tail-
recursive rulesets / programs that can result from extracting rules
from an OWL ontology that includes RBox axioms about role transitivity
(the path between countries, for example). The top-down algorithm is
similar to prolog with memoization and last call optimization. Both
are trivial to implement in Python using one of many memioze
decorators and native generators respectively.
 
Note, the bottom-up (forward chaining) method using the RETE-UL
decision network is much quicker and only fires one rule (due to
sideways information passing optimization - similar to the kind used
by databases to answer relational queries in the face of live,
immaterial, and possibly recursive views). I'm devising larger in-
memory tests to determine why the forward chain algorithms that much
faster, but I imagine it has something to do with:
 
Ullman, JD , "Bottom-up beats top-down for datalog". Proceedings of
the eighth ACM symposium on Principles of database systems. pp
140-149, 1989, ACM New York, NY, USA.
 
$ time FuXi --ruleFacts --why="ASK { test:Ghent test:path
test:Amsterdam }" --ns=test=http://www.w3.org/2002/03owlt/
TransitiveProperty/premises001# --dlp --output=conflict
http://www.w3.org/2002/03owlt/TransitiveProperty/premises001
 
Time to solve goal ASK { test:Ghent test:path test:Amsterdam } top-
down: 17.4751281738 milli seconds

.. snip PML serlializtion of proof ..

Time to calculate closure on working memory:  2.55084037781 milli seconds

<Network: 1 rules, 4 nodes, 9 tokens in working memory, 1 inferred tokens>

<TerminalNode : CommonVariables: [?thJpxqsn18] (3 in left, 3 in right memories)>

test:path(?X ?thJpxqsn19) :- And( test:path(?X ?thJpxqsn18) test:path(?thJpxqsn18 ?thJpxqsn19) )

1 instanciations 

real 0m2.532s
user 0m1.770s
sys 0m0.479s

Proof generated via renderProof method on ProofBuilder instance

 

Filed under  //   fuxi   logic   logic-programming   n3   owl   rdf   semantic-web  

Comments [0]

Using OWL and Default Negation to Reason about Patient Records

We've been using SPARQL, OWL, and N3 lately to prototype the reporting
of common research variables to the Society of Thoracic Surgeon's
(STS) National Database. The reports are being run against our large
RDF dataset of abstracted electronic patient records from the
Cleveland Clinic's Electronic Health Record system. Our dataset
consists of about 200,000 patients each represented as statements in
named RDF graphs. The STS variables we are responsible for deriving
are represented using a combination of OWL-DL and Notation 3. The
constraints that do not benefit from the restricted, tree-like nature of description logic are captured using secondary plain Horn clauses
(or rules) represented in Notation 3.
 
We use an open source logic reasoning system for the semantic web that
converts the constraints and a SPARQL query for an RDF dataset
governed by these OWL-DL constraints into provably optimal sets of
rules used to calculate an entailed RDF graph (the specifics of this
method is a subject of a paper I'm working on for the RuleML 2009
conference). Such an entailed RDF graph can then be targeted with
SPARQL queries to answer for the STS variables. A recent challenge
has been to try to capture the semantics of negation in order to
implement 'exclusion criteria'. This is typically of the form of a
class of procedures that do not involve combinations of one or more
kinds of other procedures. A recent update to FuXi includes the
ability to convert OWL-DL expressions that use owl:complementOf into
general, stratified, logic programs that can be evaluated using SPARQL
in order to implement the semantics of stable model negation (which is
quite different from the way owl:complementOf is meant to be
interpreted: according to the negation of first-order logic).
 
In particular, statements of classical (first-order) negation are
making assertions about the lack of existence of models that satisfy
the positive form of such a statement in a theory. I prefer this
explanation to the way the term 'open world' assumption is often
used to describe this interpretation of negated terms in a
description logic language. Database theory, ofcourse, does not interpret negated terms
in this way, but instead (intuitively) understands statements of negated terms to be
'true' if the (ground) positive form is not in the set of known facts (the
database).

Our use of negation, and the nature of knowledge
recorded in a computer-based patient record system seems (so far) to
lend itself more to the database interpretation where there is an
understanding that a curated medical information system would have its
data entered under the governance of policies that would allow
medically useful inferences to be made from the absence of certain
facts about patient care.
 
In particular, if a fact is known to not be true about a patient or
some activity involving a patient, it is not recorded. This common
understanding can be used to make inferences about whether facts in a
patient record satisfy an exclusion criteria. Below is an example of
this:
 
Consider the following OWL descriptions of a class of operations:
 

SubClassOf(
 IntersectionOf(
  Operation
  PostOpInHOspitalEvent
  ObjectAllValuesFrom(
  involves
  ComplementOf( UnionOf( CardiacProcedure ThoracicControlBleeding ) ) )
 )
 sts:ReopForOtherNonCardiac )

The syntax above is the OWL2 functional-style syntax. We can
paraphrase the general class inclusion (GCI) axiom above as saying:
".. all operations that followed another operation and do not involve any 
cardiac procedures or thoracic control bleeding procedures.

The original documentation for this variable in the STS adult cardiac
database manual says: 

Indicate whether the patient returned to the operating room for
other non-cardiac reasons

Now, if we assume that all operations of interest and the involved
procedures are explicitly recorded in our patient RDF dataset. This
general class inclusion axioms can be reduced into a set of rules that use negated
'literals' (as they are called); understood to capture the semantics of
default negation (or the 'closed world assumption').  It is worth noting that this is exemplary of a class of expressions that description logic, tableaux-based reasoning algorithms often have problems with.

Conjunctive query answering for stratified datalog is a well-studied class of
problems in database theory. It is through the insight of this canon of theory that FuXi is now able to reduce
OWL-DL expressions that use owl:complementOf into sets of rules (or
logic programs) that can be efficiently processed in order to
implement SPARQL entailment regimes for combinations of OWL and
rule-based representations for the semantic web such as
Notation 3 or RIF core.
 
The current FuXi implementation converts the GCI into the following
two RIF rules:


Forall ?X ?QrjeKHuq961 (
?X # sts:ReopForOtherNonCardiac

  :- And(

    ?X # PostOpInHospitalEvent
    ?X # Operation,
    Naf ?X[involves -> ?QrjeKHuq961] ) )


Forall ?X ?QrjeKHuq961 (
?X # sts:ReopForOtherNonCardiac
  :- And(

    ?X # PostOpInHospitalEvent ,
    ?X # Operation,
    ?X[involves -> ?QrjeKHuq961],
    Naf ?QrjeKHuq961 # CardiacProcedure,
    Naf ?QrjeKHuq961 # ThoracicControlBleeding ) )


 Note, Naf is in the (current) 30 July 2008 version of the "RIF
Framework for Logic Dialects

The first rule describes members of the clas of ReopForOtherNonCardiac as those post-operative operations (i.e., operations that follow another operation in the same patient hospital visit or episode) that do not involve other procedures.

The second rule applies to those post-operative operations that do involve other procedures where these other operations are not either cardiac procedures or thoracic control bleeding procedures.

These RIF rules can be exchanged with other RIF-compliant rule-based
systems that implement any of the well-accepted semantics for negated
formulas in horn clause logic (stable models, well-founded models,
stratified models, etc.). A recent modification to FuXi makes
use of a programmatic SPARQL interface for Python that a colleague of
mind has been working on called telescope. It works with
rdflib (same as FuXi) and is used as a control layer that converts
negated RIF rules into a series of SPARQL queries involving
OPTIONAL/FILTER/!BOUND that are used to calculate "stratified models"
(i.e., the finite set of facts that can be inferred from the set of
rules that include negated literals).
 
Renzo Angles et al. (2008) and Polleres, A. (2007) have since
demonstrated that the expressive power of SPARQL coincides with that
of datalog with negation, so it comes as no suprise that certain
datalog clauses (or rules) can be converted into SPARQL queries using
so-called copy-patterns and the introduction of a MINUS operator. For
the details of how this operator works and how its semantics are
equivalent with that of datalog, the reader is urged to read any of
the above mentioned papers.
 
telescope is used to programatically convert MINUS operators into a
SPARQL queries that answer for RIF rules with the corresponding
negated frame formulas below:
 

SELECT ?X
WHERE {
 ?X a PostOpInHospitalEvent .
 ?X a Operation
 #The post-operative operation does not invlolve any procedures
 OPTIONAL { ?X involves ?QrjeKHuq961 }
 FILTER (!bound(?QrjeKHuq961))
}


SELECT ?X
WHERE {
 ?X a PostOpInHospitalEvent .
 ?X a Operation .
 ?X involves ?QrjeKHuq961
 #In the case where the post-operative operation involves a procedure
it is *not* either a
 # cardiac procedure or a thoracic control bleeding
 OPTIONAL {
  ?QrjeKHuq961 a CardiacProcedure .
  ?QrjeKHuq961 a ThoracicControlBleeding .
  ?QrjeKHuq16542 a CardiacProcedure .
  ?QrjeKHuq16542 a ThoracicControlBleeding
  FILTER (?QrjeKHuq961 = ?QrjeKHuq16542)
 }
 FILTER (!bound(?QrjeKHuq16542))
}


I'll be adding a wiki shortly (on the python-dlp google code wiki)
describing the explicit APIs that can be used for this purpose, but I
wanted to give the feature some context in the recent work I've been
doing on applications of semantic web for medical informatics
 
-- Chimezie

Filed under  //   n3   owl   rdf   semantic-web  

Comments [0]