Chimezie’s posterous

Filed under

logicprogramming

 

Lazy test and consumption of generators

So, I do a lot of design of RDF querying middleware and one of the tools of the trade that I have come to rely on quite a bit is the lazy handling of results. Consider a query to a large RDF dataset (with millions of rows). Generally, the naive approach would be to fetch all the answers from the server and then iterate over them at the client.

The lazy approach would instead fetch answers one at a time. Python generators are excellent for this and I've found myself using them judiciously in Python SPARQL results processing as well as in RDF/RIF/OWL inference (FuXi).

However, the problem with generators is that unlike lists they can only be consumed once rather than multiple times (as is the case with a list since it is a first class data structure). So, if I want to see if there is anything to fetch from the generator at all, I can't do it without effecting the consumption, since any subsequent attempt to fetch additional items from the generator will begin with the second item (if there is any).

I searched high and low for a 'lazy' test to determine if a generator has length. It would be similar to rdflib's first function - which takes an iterable or generator and consumes/returns the first item if there is one or None if not - but basically tests if a generator has length as an O(1) operation rather than an O(n) operation via the niave approach.

So, I wrote one up and am sharing it for anyone who has been faced with the same problem. It uses itertools.chain method in order to return a (new) generator over the initial item consumed for the purpose of testing if the generator has any length and the original generator (after losing the first item):

def lazyGeneratorPeek(iterable):
    """
    Lazily peeks into a generator and returns None if it is empty
    or returns another generator over *all* content if it isn't
    
    >>> a=(i for i in [1,2,3])
    >>> first(a)
    1
    >>> list(a)
    [2, 3]
    >>> a=(i for i in [1,2,3])
    >>> result = lazyGeneratorPeek(a)
    >>> result  # doctest:+ELLIPSIS
    <generator object at ...>
    >>> list(result)
    [1, 2, 3]
    >>> lazyGeneratorPeek((i for i in []))
    """
    item = first(iterable)
    if item:
        return (i for i in itertools.chain([item],
                                           iterable))

Filed under  //   logic-programming   python   rdf   tip  

Comments [0]

Updates to FuXi (backward chaining SPARQL/RIF/OWL entailment, various fixes, etc.)

So, I've re-organized python-dlp and FuXi. Originally, I thought of
python-dlp as a library on top of the expert system capabilities of a
SW reasoner (syntax libraries, etc.). Now with a small ecosystem of
(next generation) RDFlib-based, open-source libraries emerging from
the SW informatics being done in our Heart and Vascular Institute:
 
- Telescope 
- SPARQL to SQL method (wiki and changes)
- "rdflib-stable" revisions 
 
I've decided to separate the expert system (FuXi) into its own Google
Code project and decide soon what to do with python-dlp (the idea,
anyways):
 
It has a mercurial repository and can be cloned via:
 
hg clone https://fuxi.googlecode.com/hg/ fuxi
 
So, effectively, FuXi development has switched from SVN to
Mercurial :)
 
Shortly, I'll update the literature (wikis, etc.) on the python-dlp
Google Code project to (for now) act as a detour to the new project
until I determine what features, usecases are still left from the
original idea of having FuXi be everything inference-related on top of
"rdflib-stable"
 
I still have to determine if the RIF-Core XML syntax support should be
done using 4suite-xml, amara, amara2, or Python's (native) XML
processing capabilities (which is what rdflib originally did). This
will have an impact on the ease of distribution, etc.
 
Until then the recent changes are probably the last before the feature-
frozen candidate release testing leading to 1.0:
 
Revision 81d9eec480:
- Fixed use of Set (some performance boost perhaps)
- +++ b/lib/Rete/Magic.py (added imports from proof theory, minor
fixes, empty sip collections,
variable utility function, immutable dictionaries, )
+++ b/lib/Rete/Proof.py (updated proof builder, PML / RDF/XML
serialization, for top-down, query
generating strategies)
+++ b/lib/Rete/SidewaysInformationPassing.py (fix construction of sip
collection)
- Native Prolog-like Python implementation for RIF-Core, OWL, and
SPARQL (can be used to convert
SPARQL in-light of RIF-Core/OWL/OWL2 into a series of top-down
evaluated and combined SPARQL queries
against an RDF dataset)
 
Revision dc70e7a115:
Takes another Uniterm and in every case where the corresponding term
+ for both literals are different variables, we map from the
variable
+ for *this* uniterm to the corresponding variable of the other
.
Revision ff2acf3e96:
- Added --why option to command-line. Adds to a list of SPARQL
queries used to trigger the sip-
strategy to solve them top-down from the ruleset.
- minor checks/fixes for top-down method
 
Revision a0c14690be
fix to handle use of pyparsing or Bison for SPARQL parsing
 
There is a new wiki showing usage of top-down (backward-chaining) RIF/
OWL/N3 solver:
 
Also, the testOWL.py script has been updated with options for running
the OWL test using top-down or bottom-up reasoners:
 
$ python testOWL.py --bottomUp --singleTest=OWL/TransitiveProperty/
premises001
.. snip ..
$ python testOWL.py --topDown=ground --singleTest=OWL/
TransitiveProperty/premises001
.. snip ..
$ python testOWL.py --topDown=ground
ok
.. snip ..
----------------------------------------------------------------------
Ran 1 test in 3.089s
 
OK
 
For example, the command-line can be used to solve a OWL test by
remotely fetching the OWL class descriptions, converting them to RIF-
like rules internally (using the DLP feature), reducing the ruleset
and creating a rule/goal graph using the GMS strategy, taking a SPARQL/ASK
query via the --why option, and solving the query against the original
OWL graph using the new top-down feature, printing out a PML RDF/XML
proof.

Notice the backward chaining method (SipStrategy) can be passed a 'live' database-
connected graph in order to solve the query in backward-chaining
fashion by dispatching SPARQL queries against the possibly large RDF
warehouse).
 
The example below demonstrates the ability to handle the tail-
recursive rulesets / programs that can result from extracting rules
from an OWL ontology that includes RBox axioms about role transitivity
(the path between countries, for example). The top-down algorithm is
similar to prolog with memoization and last call optimization. Both
are trivial to implement in Python using one of many memioze
decorators and native generators respectively.
 
Note, the bottom-up (forward chaining) method using the RETE-UL
decision network is much quicker and only fires one rule (due to
sideways information passing optimization - similar to the kind used
by databases to answer relational queries in the face of live,
immaterial, and possibly recursive views). I'm devising larger in-
memory tests to determine why the forward chain algorithms that much
faster, but I imagine it has something to do with:
 
Ullman, JD , "Bottom-up beats top-down for datalog". Proceedings of
the eighth ACM symposium on Principles of database systems. pp
140-149, 1989, ACM New York, NY, USA.
 
$ time FuXi --ruleFacts --why="ASK { test:Ghent test:path
test:Amsterdam }" --ns=test=http://www.w3.org/2002/03owlt/
TransitiveProperty/premises001# --dlp --output=conflict
http://www.w3.org/2002/03owlt/TransitiveProperty/premises001
 
Time to solve goal ASK { test:Ghent test:path test:Amsterdam } top-
down: 17.4751281738 milli seconds

.. snip PML serlializtion of proof ..

Time to calculate closure on working memory:  2.55084037781 milli seconds

<Network: 1 rules, 4 nodes, 9 tokens in working memory, 1 inferred tokens>

<TerminalNode : CommonVariables: [?thJpxqsn18] (3 in left, 3 in right memories)>

test:path(?X ?thJpxqsn19) :- And( test:path(?X ?thJpxqsn18) test:path(?thJpxqsn18 ?thJpxqsn19) )

1 instanciations 

real 0m2.532s
user 0m1.770s
sys 0m0.479s

Proof generated via renderProof method on ProofBuilder instance

 

Filed under  //   fuxi   logic   logic-programming   n3   owl   rdf   semantic-web  

Comments [0]

Magic sets, DLP, and other strange ways to implement semantic web expert systems

I just finished some changes to python-dlp including a modification to
FuXi that includes an implementation of the Magic Set Transformation
(MST) method for RIF-like horn clauses. The most useful, immediate
value this has for me is to be able to (essentially) implement a DLP
(description logic programming) entailment regime for a SPARQL query
service.
 
Consider the test case for the SymmetricProperty OWL test.
 
The base facts are:
 
  first:Ghent first:path first:Antwerp .
  first:path a owl:SymmetricProperty
 
The goal we are trying to prove is:
 
  first:Antwerp first:path first:Ghent
 
I.e., a user wants to query an RDF dataset that includes an RDF graph
with the above statements and is expected to implement an entailment
regime for OWL-DL RDF such that the following query gives a positive
answer:
 
  ASK { first:Antwerp first:path first:Ghent }
 
The general pD* rule that would normally apply in helping answer this query is:
 
  {?P a owl:SymmetricProperty. ?S ?P ?O} => {?O ?P ?S}.
 
Re-written in a familiar (prolog-like) RIF-BLD syntax:
 
  Forall ?P ?O ?S ( ?P(?O ?S) :- And( owl:SymmetricProperty(?P) ?P(?S ?O) ) )
 
In order to maintain consistency, a rule-based engine that used this
clause to implement the definition of a symmetric property would need
to fire it for *every* triple in the fact base (in order to properly
calculate the herbrand base) because of the 2nd triple pattern in the
body / antecedent / left-hand-side of the rule: ?S ?P ?O
 
However, the DLP approach that converted tree-based OWL axioms into
colloquial horn clauses would allow us to use (instead) a
domain-specific rule:
 
  Forall ?Y ?X ( first:path(?Y ?X) :- first:path(?X ?Y) )]
 
This rule is domain-specific in the sense that it only applies to
instances of the first:path predicate rather than for every predicate.
 As a result of this transformation, the procedural evaluation of the
rule for symmetry has been reduced from the worst case to only the
fraction of the RDF dataset concerning first:path statements.
 
So, a knowledge base that could exhaustively evaluate rules in a
top-down fashion (via 'forward chaining') prior to bringing up the
SPARQL service could answer that question against the (smaller)
entailed RDF graph.
 
However, with the MST implementation, if the query was known a priori
the ruleset can be modified into a version that further restricts the
amount of redundant work done during the inference process. For
example, even if the SPARQL service is known to never have to answer
that query, the colloquial rule above would still be needed by a niave
implemenation and would apply to every statement that used the
first:path predicate.
 


 
The diagram above is a Graph-viz rendering of a Proof Markup Language
(PML) proof tree generated by taking the colloquial rule, modifying it
using the MST algorithm, evaluating the base facts against the
ruleset, and adding an RDF statement that 'triggers' the
backward-chaining process.  Fuxi includes a nice set of utilities for generating proof tree vizualizations.

 

Essentially, performing a top-down (or
forward chaining) evaluation of the rules and the facts simulates a
backward-chained proof.
 
Below are the 3 rules that replace the original domain-specific rule:
 
  :path_magic(?LOC1 ? LOC2) :- And( :path_magic(?X ? LOC2)
:path_bf(?X ? LOC1) :path_magic(?X) )
  :path_magic(?X) :- :path_magic(?X ?LOC)
  :path(?X ?LOC1) :- And( :path_magic(?X ? LOC1) :path_magic(?X)
:path(?X ?LOC2) :path_magic(? LOC2 ? LOC1) :path(?LOC2 ?LOC1) )
 
And finally, the trigger for the proof is the following RDF statement:
 
  first:Antwerp first:path_magic first:Ghent
 
The first two rules, pass through information about the sub-goals of
the query and essentially block the final rule from taking effect
until the trigger is added to the fact graph. It is clear to see that
the 3rd rule, will no longer apply to every RDF statement with a
first:path predicate, but rather only statements of that kind where
the subject and / or object terms are part of a query. So, for a
SPARQL service where we do not expect to answer queries that rely on
supporting symmetric properties in the first:path predicate, no
calculations will be performed and no unnecessary RDF statements will
be entailed.
 
I hope to write a bit more about some of the benefits of a Python
toolkit for building Semantic Web expert systems. I touched a bit on
these in my InfixOWL write-up and presentation, but haven't really put
the whole picture together.
 
-- Chimezie

 

Filed under  //   fuxi   logic   logic-programming   semantic-web  

Comments [0]