Lazy test and consumption of generators
I design a lot of RDF querying middleware, and one of the tools of the trade that I have come to rely on quite a bit is the lazy handling of results. Consider a query against a large RDF dataset (with millions of rows). The naive approach would be to fetch all the answers from the server and then iterate over them at the client.
The lazy approach would instead fetch answers one at a time. Python generators are excellent for this and I've found myself using them judiciously in Python SPARQL results processing as well as in RDF/RIF/OWL inference (FuXi).
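As a rough illustration of the difference, here is a minimal sketch in which a generator stands in for a server-side result cursor. The simulated_rows function and the fetched log are hypothetical names made up for this example; they are not part of rdflib, FuXi or any SPARQL library.

```python
import itertools


def simulated_rows(count, fetched):
    """Hypothetical stand-in for a remote cursor: produces rows on demand
    and records each row number at the moment it is actually produced."""
    for n in range(count):
        fetched.append(n)
        yield ("row", n)


fetched = []
rows = simulated_rows(1000000, fetched)

# Only the rows that are asked for are ever produced
first_three = list(itertools.islice(rows, 3))
```

Even though the simulated dataset has a million rows, only three are produced here, which is the saving the lazy approach is after.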
However, the problem with generators is that, unlike lists, they can only be consumed once. So, if I want to see whether there is anything to fetch from the generator at all, I can't do it without affecting the consumption: any subsequent attempt to fetch items from the generator will begin with the second item (if there is one).
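A small sketch of the problem, using only built-ins (the variable names are just for illustration):

```python
gen = (i for i in [1, 2, 3])

# Testing for content by pulling an item consumes that item
has_items = next(gen, None) is not None

# The generator now starts from the second item...
remaining = list(gen)

# ...and once exhausted it yields nothing more
second_pass = list(gen)
```

After the test, remaining holds only 2 and 3, and second_pass is empty.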
I searched high and low for a 'lazy' test to determine whether a generator has anything in it. It would be similar to rdflib's first function (which takes an iterable or generator and consumes and returns the first item if there is one, or None if not), but would test whether a generator has any items as an O(1) operation rather than the O(n) operation of the naive approach.
So, I wrote one up and am sharing it for anyone who has been faced with the same problem. It uses itertools.chain to return a (new) generator over the first item (consumed in order to test whether the generator has any content) followed by the original generator (which has by then lost that first item):
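import itertools

# Assumes rdflib's first helper is importable from rdflib.util
from rdflib.util import first


def lazyGeneratorPeek(iterable):
    """
    Lazily peeks into a generator and returns None if it is empty
    or returns another generator over *all* content if it isn't

    >>> a=(i for i in [1,2,3])
    >>> first(a)
    1
    >>> list(a)
    [2, 3]
    >>> a=(i for i in [1,2,3])
    >>> result = lazyGeneratorPeek(a)
    >>> result # doctest:+ELLIPSIS
    <generator object ...>
    >>> list(result)
    [1, 2, 3]
    >>> lazyGeneratorPeek((i for i in []))
    """
    # Work on an iterator so that a plain list is not replayed from its start
    iterable = iter(iterable)
    item = first(iterable)
    # Compare against None rather than testing truth, so that a falsy first
    # item such as 0 is not mistaken for an empty generator
    if item is not None:
        return (i for i in itertools.chain([item],
                                           iterable))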
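One caveat: because first signals emptiness by returning None, a generator whose first item is itself None looks empty to the function above. A possible variant, sketched here with a private sentinel and the built-in next instead of first, sidesteps that. The names lazy_peek and _EMPTY are my own for this example, and it returns the itertools.chain object directly rather than wrapping it in a generator expression.

```python
import itertools

# Unique marker that no caller's data can be identical to
_EMPTY = object()


def lazy_peek(iterable):
    """Return None if iterable is empty, otherwise an iterator over
    *all* of its content, including the item consumed by the test."""
    iterator = iter(iterable)
    head = next(iterator, _EMPTY)
    if head is _EMPTY:
        return None
    # Put the consumed item back in front of the rest
    return itertools.chain([head], iterator)


with_zero = list(lazy_peek(i for i in [0, 1, 2]))
with_none = list(lazy_peek(i for i in [None, 1]))
empty = lazy_peek(i for i in [])
```

The test still costs a single item fetch, and falsy or None first items come through intact.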