Note: These notes are quite old now, but may still be of some interest in the design and architecture of Jena.
This note is a development of the original note on the enhanced node and graph design of Jena 2.
One problem with the Jena 1 design was that both the DAML layer and the RDB layer independently extended Resource with domain-specific information. That made it impossible to have a DAML-over-RDB implementation. While this could have been fixed by using the "enhanced resource" mechanism of Jena 1, that would have left a second problem.
In Jena 1.0, once a resource has been determined to be a DAML Class (for instance), that remains true for the lifetime of the model. If a resource starts out not qualifying as a DAML Class (no rdf:type daml:Class) then adding the type assertion later doesn't make it a Class. Similarly, if a resource is a DAML Class, but then the type assertion is retracted, the resource is still apparently a class.
Hence being a DAMLClass is a view of the resource that may change
over time. Moreover, a given resource may validly have a number of
different views simultaneously. Using the current DAMLClass
implementation method means that a given resource is limited to a
single such view.
A key objective of the new design is to allow different views, or facets, to be used dynamically when accessing a node. The new design allows nodes to be polymorphic, in the sense that the same underlying node from the graph can present different encapsulations - thus different affordances to the programmer - on request.
In summary, the enhanced node design in Jena 2.0 allows programmers to:
To assist the following discussion, the key terms are introduced first.
node ~ A subject or object from a triple in the underlying graph
graph ~ The underlying container of RDF triples that simplifies the previous abstraction Model
enhanced node ~ An encapsulation of a node that adds additional state or functionality to the interface defined for node. For example, a bag is a resource that contains a number of other resources; an enhanced node encapsulating a bag might provide simplified programmatic access to the members of the bag.
enhanced graph ~ Just as an enhanced node encapsulates a node and adds extra functionality, an enhanced graph encapsulates an underlying graph and provides additional features. For example, both Model and DAMLModel can be thought of as enhancements to the (deliberately simple) interface to graphs.
polymorphic ~ An abstract super-class of enhanced graph and enhanced node that exists purely to provide shared implementation.
personality ~ An abstraction that circumscribes the set of alternative views that are available in a given context. In particular, defines a mapping from types (q.v.) to implementations (q.v.). This seems to be taken to be closed for graphs.
implementation ~ A factory object that is able to generate polymorphic objects that present a given enhanced node according to a given type. For example, an alt implementation can produce a sub-class of enhanced node that provides accessors for the members of the alt.
Some key features of the design are:
If en is an enhanced node representing some resource we wish to be able to view as being of some (Java) class/interface T, the expression en.as(T.class) will either deliver an EnhNode of type T, if it is possible to do so, or throw an exception if not.
To check if the conversion is allowed, without having to catch exceptions, the expression en.canAs(T.class) delivers true iff the conversion is possible.
Somehow, some seed enhanced node must be created, otherwise as()
would have nothing to work on. Subclasses of enhanced node provide
constructors (perhaps hidden behind factories) which wrap plain
nodes up in enhanced graphs. Eventually these invoke the
constructor
EnhNode(Node,EnhGraph)
It's up to the constructors for the enhanced node subclasses to ensure that they are called with appropriate arguments.
as(Class T) is defined on EnhNode to invoke asInternal(T) in Polymorphic. If the original enhanced node en is already a valid instance of T, it is returned as the result. Validity is checked by the method isValid().
If en is not already of type T, then a cache of alternative views of en is consulted to see if a suitable alternative exists.
The cache is implemented as a sibling ring of enhanced nodes -
each enhanced node has a link to its next sibling, and the "last"
node links back to the "first". This makes it cheap to find
alternative views if there are not too many of them, and avoids
caches filling up with dead nodes and having to be flushed.
If there is no existing suitable enhanced node, the node's personality is consulted. The personality maps the desired class type to an Implementation object, which is a factory with a wrap method which takes a (plain) node and an enhanced graph and delivers the new enhanced node after checking that its conditions apply. The new enhanced node is then linked into the sibling ring.
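The lookup order described above (existing view in the sibling ring first, then the personality's factory, then splicing the new view into the ring) can be sketched in a few lines. This is a reduced illustration, not Jena's code: View, ViewFactory, SeedView, Named and NamedView are hypothetical names, a String stands in for the plain node, and canAs here only checks whether a factory is registered rather than asking it whether the node qualifies.

```java
import java.util.HashMap;
import java.util.Map;

/** Demo entry point; every name in this file is hypothetical, not a Jena class. */
class SiblingRingDemo {
    static Map<Class<?>, ViewFactory> personality() {
        Map<Class<?>, ViewFactory> p = new HashMap<>();
        p.put(Named.class, NamedView::new);   // type -> factory
        return p;
    }

    public static void main(String[] args) {
        View seed = new SeedView("urn:x", personality());
        Named first = seed.as(Named.class);   // built by the personality's factory
        Named again = seed.as(Named.class);   // found in the sibling ring
        System.out.println(first == again);
        System.out.println(seed.canAs(Runnable.class));
    }
}

/** Factory for one kind of view; plays the role the text gives to Implementation. */
interface ViewFactory {
    View wrap(String node, Map<Class<?>, ViewFactory> personality);
}

/** Base of all views; a much reduced stand-in for Polymorphic/EnhNode. */
abstract class View {
    final String node;                              // stand-in for the plain node
    final Map<Class<?>, ViewFactory> personality;   // stand-in for the personality
    View next = this;                               // a ring of one to begin with

    View(String node, Map<Class<?>, ViewFactory> personality) {
        this.node = node;
        this.personality = personality;
    }

    /** Walk the ring once, looking for an existing view of the requested type. */
    private <T> T findInRing(Class<T> type) {
        View v = this;
        do {
            if (type.isInstance(v)) return type.cast(v);
            v = v.next;
        } while (v != this);
        return null;
    }

    boolean canAs(Class<?> type) {
        return findInRing(type) != null || personality.containsKey(type);
    }

    <T> T as(Class<T> type) {
        T existing = findInRing(type);
        if (existing != null) return existing;
        ViewFactory factory = personality.get(type);
        if (factory == null) {
            throw new UnsupportedOperationException("no view for " + type.getName());
        }
        View made = factory.wrap(node, personality);
        made.next = this.next;   // splice the new view into the ring
        this.next = made;
        return type.cast(made);
    }
}

class SeedView extends View {
    SeedView(String node, Map<Class<?>, ViewFactory> personality) { super(node, personality); }
}

interface Named { String name(); }

class NamedView extends View implements Named {
    NamedView(String node, Map<Class<?>, ViewFactory> personality) { super(node, personality); }
    public String name() { return node; }
}
```

Because the views reference each other only through the ring, nothing outside the ring holds them; once client code drops every view of a node, the whole ring can be collected together, which is the property the text contrasts with a separately flushed cache.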
What you have to do to define an enhanced node/graph implementation:
1. Define an interface I for the new enhanced node. (You could use just the implementation class, but we've stuck with the interface, because there might be different implementations.)
2. Define the implementation class C. This is just a front for the enhanced node. All the state of C is reflected in the graph (except for caching; but beware that the graph can change without notice).
3. Define an Implementation class for the factory. This class defines methods canWrap and wrap, which test a node to see if it is allowed to represent I and construct an implementation of C respectively.
4. Arrange that the personality of the graph maps the class of I to the factory. At the moment we do this by using (a copy of) the built-in graph personality as the personality for the enhanced graph.
For an example, see the code for ReifiedStatementImpl.
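As a rough illustration of those steps, the sketch below defines an interface, an implementation that keeps no state outside the graph, a factory with canWrap and wrap, and a personality entry. All names (ClassView, ClassViewImpl, Factory, ClassViewFactory) are hypothetical, the graph is reduced to a set of "subject predicate object" strings, and the real Jena Implementation class has a different signature.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Demo entry point; every name in this file is hypothetical. */
class EnhancedViewSteps {
    public static void main(String[] args) {
        Set<String> graph = new HashSet<>();
        Map<Class<?>, Factory> personality = new HashMap<>();
        personality.put(ClassView.class, new ClassViewFactory());   // map I to the factory

        Factory f = personality.get(ClassView.class);
        System.out.println(f.canWrap("urn:A", graph));   // no type assertion yet
        graph.add("urn:A rdf:type daml:Class");
        System.out.println(f.canWrap("urn:A", graph));   // now the node qualifies
        ClassView view = (ClassView) f.wrap("urn:A", graph);
        System.out.println(view.uri());
    }
}

/** The interface I that client code programs against. */
interface ClassView {
    String uri();
    boolean stillAClass();
}

/** The implementation class C: a front for the node, with all state read from the graph. */
class ClassViewImpl implements ClassView {
    private final String node;
    private final Set<String> graph;

    ClassViewImpl(String node, Set<String> graph) {
        this.node = node;
        this.graph = graph;
    }

    public String uri() { return node; }

    /** Re-checks the graph on every call, since the graph can change without notice. */
    public boolean stillAClass() { return graph.contains(node + " rdf:type daml:Class"); }
}

/** The factory shape: test a node, then build the view. */
interface Factory {
    boolean canWrap(String node, Set<String> graph);
    Object wrap(String node, Set<String> graph);
}

class ClassViewFactory implements Factory {
    public boolean canWrap(String node, Set<String> graph) {
        return graph.contains(node + " rdf:type daml:Class");
    }

    public Object wrap(String node, Set<String> graph) {
        if (!canWrap(node, graph)) {
            throw new IllegalArgumentException(node + " is not a daml:Class");
        }
        return new ClassViewImpl(node, graph);
    }
}
```

Keeping the view stateless means a retracted type assertion is noticed the next time the view is asked, which is the dynamic behaviour the Jena 1 design lacked.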
This document describes the reification API in Jena2, following discussions based on the 0.5a document. The essential decision made during that discussion is that reification triples are captured and dealt with by the Model transparently and appropriately.
The first Jena implementation made some attempt to optimise the representation of reification. In particular it tried to avoid so called 'triple bloat', ie requiring four triples to represent the reification of a statement. The approach taken was to make a Statement a subclass of Resource so that properties could be directly attached to statement objects.
There are a number of defects in the Jena 1 approach.
However, there are some supporters of the approach. They liked:
- the avoidance of triple bloat
- that the extra reification statements are not there to be found on queries or listStatements and do not affect the size() method.
Since Jena was first written, the RDFCore WG has clarified the meaning of a reified statement. Whilst Jena 1 took a reified statement to denote a statement, RDFCore has decided that a reified statement denotes an occurrence of a statement, otherwise called a stating. The Jena 1 .equals() method for Statements is thus inappropriate for comparing reified statements.
The goals of reification support in the Jena 2 implementation are:
Statement will no longer be a subclass of Resource. Thus a statement may not be used where a resource is expected. Instead, a new interface ReifiedStatement will be defined:
public interface ReifiedStatement extends Resource {
    public Statement getStatement();
    // could call it a day at that or could duplicate convenience
    // methods from Statement, eg getSubject(), getInt().
    ...
}
The Statement interface will be extended with the following methods:
public interface Statement ...
    public ReifiedStatement createReifiedStatement();
    public ReifiedStatement createReifiedStatement(String URI);
    /* */
    public boolean isReified();
    public ReifiedStatement getAnyReifiedStatement();
    /* */
    public RSIterator listReifiedStatements();
    /* */
    public void removeAllReifications();
    ...
RSIterator is a new iterator which returns ReifiedStatements. It is an extension of ResourceIterator. The Model interface will be extended with the following methods:
public interface Model ...
    public ReifiedStatement createReifiedStatement(Statement stmt);
    public ReifiedStatement createReifiedStatement(String URI, Statement stmt);
    /* */
    public boolean isReified(Statement st);
    public ReifiedStatement getAnyReifiedStatement(Statement stmt);
    /* */
    public RSIterator listReifiedStatements();
    public RSIterator listReifiedStatements(Statement stmt);
    /* */
    public void removeReifiedStatement(ReifiedStatement rs);
    public void removeAllReifications(Statement st);
    ...
The methods in Statement are defined to be the obvious calls of methods in Model. The interaction of those methods is expressed below.
Reification operates over statements in the model which use predicates rdf:subject, rdf:predicate, rdf:object, and rdf:type with object rdf:Statement. Statements with those predicates are, by default, invisible. They do not appear in calls of listStatements, contains, or uses of the Query mechanism. Adding them to the model will not affect size(). Models that do not hide reification quads will also be available.
The Model::as() mechanism will allow the retrieval of reified statements.
someResource.as( ReifiedStatement.class )
If someResource has an associated reification quad, then this will deliver an instance rs of ReifiedStatement such that rs.getStatement() will be the statement rs reifies. Otherwise a DoesNotReifyException will be thrown. (Use the predicate canAs() to test if the conversion is possible.) It does not matter how the quad components have arrived in the model; explicitly asserted or by the create mechanisms described below. If quad components are removed from the model, existing ReifiedStatement objects will continue to function, but conversions using as() will fail.
createReifiedStatement(Statement stmt) creates a new ReifiedStatement object that reifies stmt; the appropriate quads are inserted into the model. The resulting resource is a blank node.
createReifiedStatement(String URI, Statement stmt) creates a new ReifiedStatement object that reifies stmt; the appropriate quads are inserted into the model. The resulting resource is a Resource with the URI given.
Two reified statements are .equals() iff they reify the same statement and have .equals() resources. Thus it is possible for equal Statements to have unequal reifications.
isReified(Statement st) is true iff in the Model of this Statement there is a reification quad for this Statement. It does not matter if the quad was inserted piece-by-piece or all at once using a create method.
getAnyReifiedStatement(Statement st) delivers an existing ReifiedStatement object that reifies st, if there is one; otherwise it creates a new one. If there are multiple reifications for st, it is not specified which one will be returned.
listReifiedStatements() will return an RSIterator which will deliver all the reified statements in the model.
listReifiedStatements( Statement st ) will return an RSIterator which will deliver all the reified statements in the model that reify st.
removeReifiedStatement(ReifiedStatement rs) will remove the reification rs from the model by removing the reification quad. Other reified statements with different resources will remain.
removeAllReifications(Statement st) will remove all the reifications in this model which reify st.
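The behaviour of these methods can be summarised with a small in-memory model. This is only a sketch under simplifying assumptions: TinyModel and Reified are hypothetical names, a statement is a plain string, blank nodes are generated from a counter, and nothing is stored as quads or checked for conflicting URIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Demo entry point; TinyModel and Reified are hypothetical stand-ins. */
class ReificationApiSketch {
    public static void main(String[] args) {
        TinyModel m = new TinyModel();
        String st = "urn:s urn:p urn:o";
        Reified anon = m.createReifiedStatement(st);             // blank node resource
        Reified named = m.createReifiedStatement("urn:r1", st);  // resource with a URI
        System.out.println(m.isReified(st));
        System.out.println(anon.equals(named));   // same statement, different resources
        m.removeReifiedStatement(anon);           // the other reification remains
        System.out.println(m.listReifiedStatements(st).size());
        m.removeAllReifications(st);
        System.out.println(m.isReified(st));
    }
}

/** A reification: equal iff both the resource and the reified statement are equal. */
final class Reified {
    final String resource;
    final String statement;

    Reified(String resource, String statement) {
        this.resource = resource;
        this.statement = statement;
    }

    @Override public boolean equals(Object o) {
        return o instanceof Reified
            && ((Reified) o).resource.equals(resource)
            && ((Reified) o).statement.equals(statement);
    }

    @Override public int hashCode() { return Objects.hash(resource, statement); }
}

class TinyModel {
    private final List<Reified> reifications = new ArrayList<>();
    private int nextBlank = 0;

    Reified createReifiedStatement(String stmt) {
        return createReifiedStatement("_:b" + nextBlank++, stmt);
    }

    Reified createReifiedStatement(String uri, String stmt) {
        Reified r = new Reified(uri, stmt);
        reifications.add(r);
        return r;
    }

    boolean isReified(String stmt) { return !listReifiedStatements(stmt).isEmpty(); }

    List<Reified> listReifiedStatements(String stmt) {
        List<Reified> found = new ArrayList<>();
        for (Reified r : reifications) {
            if (r.statement.equals(stmt)) found.add(r);
        }
        return found;
    }

    void removeReifiedStatement(Reified rs) { reifications.remove(rs); }

    void removeAllReifications(String stmt) {
        reifications.removeIf(r -> r.statement.equals(stmt));
    }
}
```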
The writers will have access to the complete set of Statements and will be able to write out the quad components.
The readers need have no special machinery, but it would be efficient for them to be able to call createReifiedStatement when detecting a reification.
Jena1's "statements as resources" approach avoided triple bloat by not storing the reification quads. How, then, do we avoid triple bloat in Jena2?
The underlying machinery is intended to capture the reification quad components and store them in a form optimised for reification. In particular, in the case where a statement is completely reified, it is expected to store only the implementation representation of the Statement.
createReifiedStatement is expected to bypass the construction and detection of the quad components, so that in the "usual case" they will never come into existence.
This document describes the reification SPI, the mechanisms by which the Graph family supports the Model API reification interface.
Graphs handle reification at two levels. First, their reifier supports requests to reify triples and to search for reifications. The reifier is responsible for managing the reification information it adds and removes - the graph is not involved.
Second, a graph may optionally allow all triples added and removed through its normal operations (including the bulk update interfaces) to be monitored by its reifier. If so, all appropriate triples become the property of the reifier - they are no longer visible through the graph.
A graph may also have a reifier that doesn't do any reification.
This is useful for internal graphs that are not exposed as models.
So there are three kinds of Graph:
- Graphs that do no reification;
- Graphs that only do explicit reification;
- Graphs that do implicit reification.
The primary reification operation on graphs is to extract their Reifier instance. Handing reification off to a different class allows reification to be handled independently of other Graph issues, eg query handling, bulk update.
Returns the Reifier for this Graph. Each graph has a single reifier during its lifetime. The reifier object need not be allocated until the first call of getReifier().
The graph's add(Triple) and delete(Triple) operations may defer their triples to the graph's reifier using handledAdd(Triple) and handledDelete(Triple); see below for details.
Instances of Reifier handle reification requests from their Graph and from the API level code (issued by the API class ModelReifier).
The reifier may keep reification triples to itself, coded in some special way, rather than having them stored in the parent Graph. This method exposes those triples as another Graph. This is a dynamic graph - it changes as the underlying reifications change.
However, it is read-only; triples cannot be added to or removed
from it.
The SimpleReifier implementation currently does not implement a dynamic graph. This is a bug that will need fixing.
Get the Graph that this reifier serves; the result is never null. (Thus the observable relationship between graphs and reifiers is 1-1.)
This class extends RDFException; it is the exception that may be thrown by reifyAs.
Record the triple t as reified in the parent Graph by the given node n, and return n. If n already reifies a different Triple, throw an AlreadyReifiedException.
Calling reifyAs(t,n) is like adding the triples:
n rdf:type rdf:Statement
n rdf:subject t.getSubject()
n rdf:predicate t.getPredicate()
n rdf:object t.getObject()
to the associated Graph; however, it is intended that it is
efficient in both time and space.
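To make the equivalence concrete, here is a minimal sketch of a reifier that stores one compact node-to-triple entry yet can still produce the four triples on demand. ReifyAsSketch and quadFor are hypothetical names, nodes are strings, a triple is a three-element list, the argument order follows the text, and IllegalStateException stands in for AlreadyReifiedException.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reifier: one map entry per reification instead of four stored triples. */
class ReifyAsSketch {
    private final Map<String, List<String>> byNode = new HashMap<>();

    /** Record that node n reifies triple t (subject, predicate, object) and return n. */
    String reifyAs(List<String> t, String n) {
        List<String> existing = byNode.get(n);
        if (existing != null && !existing.equals(t)) {
            // stand-in for AlreadyReifiedException
            throw new IllegalStateException(n + " already reifies " + existing);
        }
        byNode.put(n, t);
        return n;
    }

    /** The four triples that the text says reifyAs(t, n) is like adding, as strings. */
    List<String> quadFor(String n) {
        List<String> t = byNode.get(n);
        if (t == null) return Collections.emptyList();
        return Arrays.asList(
            n + " rdf:type rdf:Statement",
            n + " rdf:subject " + t.get(0),
            n + " rdf:predicate " + t.get(1),
            n + " rdf:object " + t.get(2));
    }

    public static void main(String[] args) {
        ReifyAsSketch reifier = new ReifyAsSketch();
        reifier.reifyAs(Arrays.asList("urn:s", "urn:p", "urn:o"), "_:n");
        for (String triple : reifier.quadFor("_:n")) {
            System.out.println(triple);
        }
    }
}
```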
Returns true iff some Node n reifies t in this Reifier, typically by an unretracted call of reifyAs(t,n).
The intended (and actual) use for hasTriple(Triple) is in the implementation of isReified(Statement) in Model.
Get the single Triple associated with n, if there is one. If there isn't, return null.
A node reifies at most one triple. If reifyAs, with its explicit check, is bypassed, and extra reification triples are asserted into the parent graph, then getTriple() will simply return null.
Returns an (extended) iterator over all the nodes that (still) reify something in this reifier.
This is intended for the implementation of listReifiedStatements in Model.
Returns an iterator over all the nodes that (still) reify the triple t.
Remove the association between n and the triple t. Subsequently, hasNode(n) will return false and getTriple(n) will return null.
This method is used to implement removeReifiedStatement(ReifiedStatement) in Model.
Remove all the associations between any node n and t; ie, for all n do remove(n,t).
This method is used to implement removeAllReifications in Model.
A graph doing reification may choose to monitor the triples being added to it and have the reifier handle reification triples. In this case, the graph's add(t) should call handledAdd(t) and only proceed with its add if the result is false.
A graph that does not use handledAdd() [and handledDelete()] can only use the explicit reification supplied by its reifier.
As for handledAdd(t), but applied to delete.
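The interception protocol can be sketched as follows. For brevity the graph and its reifier are folded into one hypothetical class, InterceptingGraphSketch, triples are three-element lists of strings, and the reifier simply keeps every reification triple it is offered; a real reifier would assemble them into reifications.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical graph whose add/delete offer each triple to an intercepting reifier first. */
class InterceptingGraphSketch {
    final Set<List<String>> visible = new HashSet<>();    // triples the graph itself stores
    final Set<List<String>> captured = new HashSet<>();   // triples that belong to the reifier

    /** True for rdf:subject, rdf:predicate and rdf:object triples, and for rdf:type rdf:Statement. */
    static boolean isReificationTriple(List<String> t) {
        String p = t.get(1);
        return p.equals("rdf:subject") || p.equals("rdf:predicate") || p.equals("rdf:object")
            || (p.equals("rdf:type") && t.get(2).equals("rdf:Statement"));
    }

    /** Reifier side: take ownership of reification triples and report whether it did. */
    boolean handledAdd(List<String> t) {
        if (!isReificationTriple(t)) return false;
        captured.add(t);
        return true;
    }

    boolean handledDelete(List<String> t) {
        if (!isReificationTriple(t)) return false;
        captured.remove(t);
        return true;
    }

    /** Graph side: only proceed with the normal operation if the reifier declined. */
    void add(List<String> t) {
        if (!handledAdd(t)) visible.add(t);
    }

    void delete(List<String> t) {
        if (!handledDelete(t)) visible.remove(t);
    }

    public static void main(String[] args) {
        InterceptingGraphSketch g = new InterceptingGraphSketch();
        g.add(Arrays.asList("urn:s", "urn:p", "urn:o"));
        g.add(Arrays.asList("_:n", "rdf:subject", "urn:s"));
        System.out.println(g.visible.size() + " visible, " + g.captured.size() + " captured");
    }
}
```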
SimpleReifier is an implementation of Reifier suitable for in-memory Graphs built over GraphBase. It operates in either of two modes: with and without triple interception. With interception enabled, reification triples fed to (or removed from) its parent graph are captured using handledAdd() and handledDelete(); otherwise they are ignored and the graph must store them itself.
SimpleReifier
keeps a map from nodes to the reification
information about that node. Nodes which have no reification
information (most of them, in the usual case) do not appear in the
map at all.
Nodes with partial or excessive reification information are associated with Fragments. A Fragments for a node n records separately
- the Ss of all n rdf:subject S triples
- the Ps of all n rdf:predicate P triples
- the Os of all n rdf:object O triples
- the Ts of all n rdf:type T[Statement] triples
If the Fragments becomes singular, ie each of these sets contains exactly one element, then n represents a reification of the triple (S, P, O), and the Fragments object is replaced by that triple.
(If another reification triple for n arrives, then the triple is re-exploded into Fragments.)
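The bookkeeping for one node can be sketched as four sets plus a test for the singular case. FragmentsSketch, isSingular and asTriple are hypothetical names chosen for this illustration; the sketch does not show the map from nodes to fragments, nor the replacement of a singular Fragments by the triple itself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical per-node record of partial or excessive reification information. */
class FragmentsSketch {
    final Set<String> subjects = new HashSet<>();
    final Set<String> predicates = new HashSet<>();
    final Set<String> objects = new HashSet<>();
    final Set<String> types = new HashSet<>();

    /** File the object of one reification triple about the node under its predicate. */
    void add(String predicate, String value) { slot(predicate).add(value); }

    void remove(String predicate, String value) { slot(predicate).remove(value); }

    private Set<String> slot(String predicate) {
        switch (predicate) {
            case "rdf:subject":   return subjects;
            case "rdf:predicate": return predicates;
            case "rdf:object":    return objects;
            case "rdf:type":      return types;
            default:
                throw new IllegalArgumentException("not a reification predicate: " + predicate);
        }
    }

    /** Singular: each of the four sets holds exactly one element. */
    boolean isSingular() {
        return subjects.size() == 1 && predicates.size() == 1
            && objects.size() == 1 && types.size() == 1;
    }

    /** The reified triple (S, P, O) if the fragments are singular, otherwise null. */
    List<String> asTriple() {
        if (!isSingular()) return null;
        return Arrays.asList(
            subjects.iterator().next(),
            predicates.iterator().next(),
            objects.iterator().next());
    }

    public static void main(String[] args) {
        FragmentsSketch f = new FragmentsSketch();
        f.add("rdf:type", "rdf:Statement");
        f.add("rdf:subject", "urn:s");
        f.add("rdf:predicate", "urn:p");
        System.out.println(f.asTriple());   // still partial
        f.add("rdf:object", "urn:o");
        System.out.println(f.asTriple());   // now singular
        f.add("rdf:object", "urn:o2");
        System.out.println(f.asTriple());   // excessive again
    }
}
```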