ARP can be used either as a Jena subsystem or as a standalone RDF/XML parser. This document is a quick guide to using ARP standalone.
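To load an RDF file, create an ARP instance, set its options and handlers, and then call a load method; the sample code in this section shows each of these steps.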
Xerces is used for parsing the XML. The SAX events generated by Xerces are then analysed as RDF by ARP. It is possible to use a different source of SAX events.
Errors may occur in either the XML or the RDF part.
    ARP arp = new ARP();

    // initialisation - uses ARPConfig interface only.

    arp.getOptions().setLaxErrorMode();

    arp.getHandlers().setErrorHandler(new ErrorHandler() {
        public void fatalError(SAXParseException e) {
            // TODO code
        }
        public void error(SAXParseException e) {
            // TODO code
        }
        public void warning(SAXParseException e) {
            // TODO code
        }
    });

    arp.getHandlers().setStatementHandler(new StatementHandler() {
        public void statement(AResource a, AResource b, ALiteral l) {
            // TODO code
        }
        public void statement(AResource a, AResource b, AResource l) {
            // TODO code
        }
    });

    // parsing.

    try {
        // Loading fixed input ...
        arp.load(new StringReader(
            "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>\n"
            + "<rdf:Description><rdf:value rdf:parseType='Literal'>"
            + "<b>hello</b></rdf:value>\n"
            + "</rdf:Description></rdf:RDF>"
        ));
    } catch (IOException ioe) {
        // something unexpected went wrong
    } catch (SAXParseException s) {
        // This error will have been reported
    } catch (SAXException ss) {
        // This error will not have been reported.
    }
ARP reports events concerning:
User code is needed to respond to any of these events of interest. This is written by implementing any of the relevant interfaces: StatementHandler, org.xml.sax.ErrorHandler, NamespaceHandler, and ExtendedHandler.
An individual handler is set by calling the getHandlers method on the ARP instance. This returns an encapsulation of all the handlers being used. A specific handler is set by calling the appropriate set...Handler method on that object, e.g. setStatementHandler.
All the handlers can be copied from one ARP instance to another by using the setHandlersWith method:
    ARP from, to;
    // initialize from and to
    // ...
    to.setHandlersWith(from.getHandlers());
The error handler reports both XML and RDF errors, the former detected by Xerces. See ARPHandlers.setErrorHandler for details of how to distinguish between them.
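As a rough illustration of an error handler that does more than leave the callbacks empty, the sketch below records each reported problem so it can be inspected after parsing. It uses only the standard org.xml.sax.ErrorHandler interface; the class name CollectingErrorHandler and its fields are hypothetical, not part of ARP.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical helper (not part of ARP): records problems instead of throwing. */
class CollectingErrorHandler implements ErrorHandler {
    final List<String> warnings = new ArrayList<>();
    final List<String> errors = new ArrayList<>();
    final List<String> fatalErrors = new ArrayList<>();

    /** Formats the position and message carried by the exception. */
    private static String describe(SAXParseException e) {
        return "line " + e.getLineNumber()
            + ", column " + e.getColumnNumber()
            + ": " + e.getMessage();
    }

    @Override
    public void warning(SAXParseException e) {
        warnings.add(describe(e));
    }

    @Override
    public void error(SAXParseException e) {
        errors.add(describe(e));
    }

    @Override
    public void fatalError(SAXParseException e) {
        fatalErrors.add(describe(e));
    }

    /** True if anything more serious than a warning was reported. */
    boolean hasErrors() {
        return !errors.isEmpty() || !fatalErrors.isEmpty();
    }
}
```

An instance of such a class would be installed with arp.getHandlers().setErrorHandler(...) before calling load, and its lists examined once load returns. Because it never throws, it does not by itself stop the parse.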
ARP can be configured to treat most error conditions as warnings or to ignore them, and to treat some non-error conditions as warnings or errors.
In addition, the behaviour in response to input that does not have an <rdf:RDF> root element is configurable: either to treat the whole file as RDF anyway, or to scan the file looking for embedded <rdf:RDF> elements.
As with the handlers, there is an options object that encapsulates these settings. It can be accessed using getOptions, and then individual settings can be made using the methods in ARPOptions.
It is also possible to copy all the option settings from one ARP instance to another:
    ARP from, to;
    // initialize from and to
    // ...
    to.setOptionsWith(from.getOptions());
The I/O how-to gives some more detail about the options settings, although it assumes the use of the Jena RDFReader interface.
It is possible to interrupt an ARP thread. See the I/O how-to for details.
It is possible to use ARP with other SAX input sources, e.g. from a non-Xerces parser, or from an in-memory XML source, such as a DOM tree.
Instead of an ARP instance, you create an instance of SAX2RDF using the newInstance method. This can be configured just like an ARP instance, following the initialization section of the sample code.
This is used like a SAX2Model instance as described elsewhere.
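To make the idea of an in-memory SAX source concrete, here is a minimal sketch that replays a DOM tree as SAX events using the JDK's identity Transformer. The classes ElementRecorder and DomToSax are hypothetical names introduced for illustration. ElementRecorder is only a stand-in receiver; the assumption is that a configured SAX2RDF instance would be passed as the ContentHandler in its place.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in receiver: records the qualified name of each element. */
class ElementRecorder extends DefaultHandler {
    final List<String> qNames = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        qNames.add(qName);
    }
}

/** Hypothetical helper: builds a DOM tree and replays it as SAX events. */
class DomToSax {
    /** Parses XML text into a namespace-aware DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // RDF/XML depends on namespaces
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not build DOM tree", e);
        }
    }

    /** Walks the DOM tree and delivers it to the handler as SAX events. */
    static void replay(Document doc, ContentHandler handler) {
        try {
            // A Transformer created without a stylesheet performs an identity copy.
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new SAXResult(handler));
        } catch (TransformerException e) {
            throw new IllegalStateException("could not replay DOM as SAX events", e);
        }
    }
}
```

Calling DomToSax.replay(DomToSax.parse(xmlText), new ElementRecorder()) exercises the event flow without any RDF processing. Checked exceptions are wrapped in unchecked ones here only to keep the sketch short.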
For very large files, ARP does not use any additional memory except when ExtendedHandler.discardNodesWithNodeID returns false or when the AResource.setUserData method has been used. In these cases ARP needs to remember rdf:nodeID usage for as long as the file is being processed.