Tutorial - Manipulating SPARQL using ARQ

When you've been working with SPARQL you quickly find that static queries are restrictive. Maybe you want to vary a value, perhaps add a filter, alter the limit, etc etc. Being an impatient sort you dive in to the query string, and it works. But what about little Bobby Tables? And, even if you sanitise your inputs, string manipulation is a fraught process and syntax errors await you. Although it might seem harder than string munging, the ARQ API is your friend in the long run.

Originally published on the Research Revealed project blog

Inserting values (simple prepared statements)

Let's begin with something simple. Suppose we wanted to restrict the following query to a particular person:

   select * { ?person <http://xmlns.com/foaf/0.1/name> ?name }

String#replaceAll would work, but there is a safer way. QueryExecutionFactory in most cases lets you supply a QuerySolution with which you can prebind values.

   QuerySolutionMap initialBinding = new QuerySolutionMap();
   initialBinding.add("name", personResource);
   qe = QueryExecutionFactory.create(query, dataset, initialBinding);

This is often much simpler than the string equivalent since you don't have to escape quotes in literals. (Beware that this doesn't work for sparqlService, which is a great shame. It would be nice to spend some time remedying that.)

Making a Query from Scratch

The previously mentioned limitation is due to the fact that prebinding doesn't actually change the query at all, but the execution of that query. So what how do we really alter queries?

ARQ provides two ways to work with queries: at the syntax level (Query and Element), or the algebra level (Op). The distinction is clear in filters:

   SELECT ?s { ?s <http://example.com/val> ?val . FILTER ( ?val < 20 ) }

If you work at the syntax level you'll find that this looks (in pseudo code) like:

   (GROUP (PATTERN ( ?s <http://example.com/val> ?val )) (FILTER ( < ?val 20 ) ))

That is there's a group containing a triple pattern and a filter, just as you see in the query. The algebra is different, and we can see it using arq.qparse --print op

   $ java arq.qparse --print op 'SELECT ?s { ?s <http://example.com/val> ?val . FILTER ( ?val < 20 ) }'
   (base <file:///...>
       (project (?s)
           (filter (< ?val 20)
               (bgp (triple ?s <http://example.com/val> ?val)))))

Here the filter contains the pattern, rather than sitting next to it. This form makes it clear that the expression is filtering the pattern.

Let's create that query from scratch using ARQ. We begin with some common pieces: the triple to match, and the expression for the filter.

   // ?s ?p ?o .
   Triple pattern =
       Triple.create(Var.alloc("s"), Var.alloc("p"), Var.alloc("o"));
   // ( ?s < 20 )
   Expr e = new E_LessThan(new ExprVar("s"), new NodeValueInteger(20));

Triple should be familiar from jena. Var is an extension of Node for variables. Expr is the root interface for expressions, those things that appear in FILTER and LET.

First the syntax route:

   ElementTriplesBlock block = new ElementTriplesBlock(); // Make a BGP
   block.addTriple(pattern);                              // Add our pattern match
   ElementFilter filter = new ElementFilter(e);           // Make a filter matching the expression
   ElementGroup body = new ElementGroup();                // Group our pattern match and filter
   body.addElement(block);
   body.addElement(filter);

   Query q = QueryFactory.make();
   q.setQueryPattern(body);                               // Set the body of the query to our group
   q.setQuerySelectType();                                // Make it a select query
   q.addResultVar("s");                                   // Select ?s

Now the algebra:

   Op op;
   BasicPattern pat = new BasicPattern();                 // Make a pattern
   pat.add(pattern);                                      // Add our pattern match
   op = new OpBGP(pat);                                   // Make a BGP from this pattern
   op = OpFilter.filter(e, op);                           // Filter that pattern with our expression
   op = new OpProject(op, Arrays.asList(Var.alloc("s"))); // Reduce to just ?s
   Query q = OpAsQuery.asQuery(op);                       // Convert to a query
   q.setQuerySelectType();                                // Make is a select query

Notice that the query form (SELECT, CONSTRUCT, DESCRIBE, ASK) isn't part of the algebra, and we have to set this in the query (although SELECT is the default). FROM and FROM NAMED are similarly absent.

You can also look around the algebra and syntax using visitors. Start by extending OpVisitorBase (ElementVisitorBase) which stubs out the interface so you can concentrate on the parts of interest, then walk using OpWalker.walk(Op, OpVisitor) (ElementWalker.walk(Element, ElementVisitor)). These work bottom up.

For some alterations, like manipulating triple matches in place, visitors will do the trick. They provide a simple way to get to the right parts of the query, and you can alter the pattern backing BGPs in both the algebra and syntax. Mutation isn't consistently available, however, so don't depend on it.

Transforming the Algebra

So far there is no obvious advantage in using the algebra. The real power is visible in transformers, which allow you to reorganise an algebra completely. ARQ makes extensive use of transformations to simplify and optimise query execution.

In Research Revealed I wrote some code to take a number of constraints and produce a query. There were a number of ways to do this, but one way I found was to generate ops from each constraint and join the results:

   for (Constraint con: cons) {
       op = OpJoin.create(op, consToOp(cons)); // join
   }

The result was a perfectly correct mess, which is only barely readable with just three conditions:

   (join
       (join
           (filter (< ?o0 20) (bgp (triple ?s <urn:ex:prop0> ?o0)))
           (filter (< ?o1 20) (bgp (triple ?s <urn:ex:prop1> ?o1))))
       (filter (< ?o2 20) (bgp (triple ?s <urn:ex:prop2> ?o2))))

Each of the constraints is a filter on a bgp. This can be made much more readable by moving the filters out, and merging the triple patterns. We can do this with the following Transform:

   class QueryCleaner extends TransformBase
   {
       @Override
       public Op transform(OpJoin join, Op left, Op right) {
           // Bail if not of the right form
           if (!(left instanceof OpFilter && right instanceof OpFilter)) return join;
           OpFilter leftF = (OpFilter) left;
           OpFilter rightF = (OpFilter) right;

           // Add all of the triple matches to the LHS BGP
           ((OpBGP) leftF.getSubOp()).getPattern().addAll(((OpBGP) rightF.getSubOp()).getPattern());
           // Add the RHS filter to the LHS
           leftF.getExprs().addAll(rightF.getExprs());
           return leftF;
       }
   }
   ...
   op = Transformer.transform(new QueryCleaner(), op); // clean query

This looks for joins of the form:

   (join
       (filter (exp1) (bgp1))
       (filter (exp2) (bgp2)))

And replaces it with:

   (filter (exp1 && exp2) (bgp1 && bgp2))

As we go through the original query all joins are removed, and the result is:

   (filter (exprlist (< ?o0 20) (< ?o1 20) (< ?o2 20))
       (bgp
           (triple ?s <urn:ex:prop0> ?o0)
           (triple ?s <urn:ex:prop1> ?o1)
           (triple ?s <urn:ex:prop2> ?o2)
   ))

That completes this brief introduction. There is much more to ARQ, of course, but hopefully you now have a taste for what it can do.