Using ARP Without Jena

ARP can be used either as a Jena subsystem or as a standalone RDF/XML parser. This document is a quick guide to using ARP standalone.

Overview

To load an RDF file:

  1. Create an ARP instance.
  2. Set the parse options, particularly those controlling error detection, using getOptions or setOptionsWith.
  3. Set its handlers by calling the getHandlers or setHandlersWith methods.
  4. Call a load method.

Xerces is used to parse the XML. ARP then analyses the SAX events generated by Xerces as RDF. It is also possible to use a different source of SAX events.

Errors may occur in either the XML or the RDF part.

Sample Code

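import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
// ARP, AResource, ALiteral and StatementHandler come from Jena's ARP package,
// whose name depends on the Jena version in use.

ARP arp = new ARP();

// initialisation - uses ARPConfig interface only.
arp.getOptions().setLaxErrorMode();

arp.getHandlers().setErrorHandler(new ErrorHandler(){
    public void fatalError(SAXParseException e){
        // TODO: handle a fatal error
    }
    public void error(SAXParseException e){
        // TODO: handle an error
    }
    public void warning(SAXParseException e){
        // TODO: handle a warning
    }
});
arp.getHandlers().setStatementHandler(new StatementHandler(){
    public void statement(AResource subj, AResource pred, ALiteral lit){
        // TODO: handle a triple whose object is a literal
    }
    public void statement(AResource subj, AResource pred, AResource obj){
        // TODO: handle a triple whose object is a resource
    }
});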

// parsing.

try {
    // Loading fixed input ...
    arp.load(new StringReader(
        "<rdf:RDF  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>\n"
        +"<rdf:Description><rdf:value rdf:parseType='Literal'>"
        +"<b>hello</b></rdf:value>\n"
        +"</rdf:Description></rdf:RDF>"
    ));
}
catch (IOException ioe){
    // something unexpected went wrong while reading the input
}
catch (SAXParseException s){
    // this error will already have been reported to the error handler
}
catch (SAXException ss) {
    // this error will not have been reported to the error handler
}

ARP Event Handling

ARP reports events concerning:

  • Triples found in the input.
  • Errors in the input.
  • Namespace declarations.
  • Scope of blank nodes.

User code is needed to respond to whichever of these events are of interest. It is written by implementing the relevant interfaces: StatementHandler, org.xml.sax.ErrorHandler, NamespaceHandler, and ExtendedHandler.

To set an individual handler, call the getHandlers method on the ARP instance. This returns an object that encapsulates all the handlers in use. Then call the appropriate set...Handler method on that object, e.g. setStatementHandler.

All the handlers can be copied from one ARP instance to another by using the setHandlersWith method:

 ARP from, to;
 // initialize from and to
 // ...

 to.setHandlersWith(from.getHandlers());

The error handler reports both XML and RDF errors, the former detected by Xerces. See ARPHandlers.setErrorHandler for details of how to distinguish between them.
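
Because the error handler is a plain org.xml.sax.ErrorHandler, it can be written and tried out without ARP. The following is a minimal sketch rather than anything prescribed by ARP: the class name CollectingErrorHandler and the policy of collecting warnings and errors while rethrowing fatal errors are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical handler: keeps warnings and errors in lists, rethrows fatal errors. */
final class CollectingErrorHandler implements ErrorHandler {
    final List<String> warnings = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    /** Formats the position and message carried by the exception. */
    private static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getMessage();
    }

    @Override
    public void warning(SAXParseException e) {
        warnings.add(describe(e));
    }

    @Override
    public void error(SAXParseException e) {
        errors.add(describe(e));
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        // Assumed policy: give up on the first fatal error.
        throw e;
    }
}
```

An instance would be installed with arp.getHandlers().setErrorHandler(handler), as in the sample code, and its lists inspected once the load method returns.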

Configuring ARP

ARP can be configured to treat most error conditions as warnings or to ignore them, and to treat some non-error conditions as warnings or errors.

In addition, the behaviour in response to input that does not have an <rdf:RDF> root element is configurable: either to treat the whole file as RDF anyway, or to scan the file looking for embedded <rdf:RDF> elements.

As with the handlers, there is an options object that encapsulates these settings. It can be accessed using getOptions, and then individual settings can be made using the methods in ARPOptions.

It is also possible to copy all the option settings from one ARP instance to another:

 ARP from, to;
 // initialize from and to ...

 to.setOptionsWith(from.getOptions());

The I/O how-to gives more detail about the option settings, although it assumes the use of the Jena RDFReader interface.

Interrupting ARP

It is possible to interrupt an ARP thread. See the I/O how-to for details.
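
One common way to arrange this is a watchdog thread that interrupts the parsing thread after a time limit. The sketch below assumes that ARP responds to the standard Thread.interrupt() call, which this document does not itself state; check the I/O how-to for the actual behaviour. The names InterruptSketch, longRunningLoad and runWithTimeout are hypothetical, and longRunningLoad is only a stand-in for a call to a load method.

```java
final class InterruptSketch {

    /** Stand-in for a long-running load call: spins until the thread is interrupted. */
    static boolean longRunningLoad() {
        while (!Thread.currentThread().isInterrupted()) {
            Thread.yield();
        }
        return true; // stopped because of the interrupt
    }

    /** Runs the stand-in load on the current thread and interrupts it after the given delay. */
    static boolean runWithTimeout(long millis) {
        final Thread parserThread = Thread.currentThread();
        Thread watchdog = new Thread(() -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                return; // watchdog itself was cancelled
            }
            parserThread.interrupt();
        });
        watchdog.setDaemon(true);
        watchdog.start();

        boolean stoppedByInterrupt = longRunningLoad();
        // Clear the interrupt flag so later code on this thread is unaffected.
        Thread.interrupted();
        return stoppedByInterrupt;
    }
}
```

In real use the call to longRunningLoad would be replaced by the load call, wrapped in the same try/catch structure as the sample code.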

Using Other SAX Sources

It is possible to use ARP with other SAX input sources, e.g. from a non-Xerces parser, or from an in-memory XML source, such as a DOM tree.

Instead of an ARP instance, you create an instance of SAX2RDF using the newInstance method. This can be configured just like an ARP instance, following the initialization section of the sample code.

The SAX2RDF instance is then used in the same way as a SAX2Model instance, as described elsewhere.
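
As an illustration of an in-memory SAX source, the sketch below replays a DOM tree as SAX events using the JDK's identity transformer. It uses only JDK classes. RecordingHandler is a hypothetical stand-in for the SAX2RDF instance and simply records element names; using a SAX2RDF instance in its place assumes that it is accepted as a SAX ContentHandler, which should be checked against the SAX2RDF documentation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class DomToSaxSketch {

    /** Hypothetical stand-in for the SAX2RDF instance: records the element names it receives. */
    static final class RecordingHandler extends DefaultHandler {
        final List<String> elementNames = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementNames.add(qName);
        }
    }

    /** Builds a namespace-aware DOM tree from a string, only so there is a tree to replay. */
    static Document parseToDom(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not build DOM tree", e);
        }
    }

    /** Replays the DOM tree as SAX events into the handler via an identity transform. */
    static void replay(Document doc, ContentHandler handler) {
        try {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new SAXResult(handler));
        } catch (Exception e) {
            throw new IllegalStateException("could not replay DOM tree as SAX events", e);
        }
    }
}
```

The handlers and options would still be set on the SAX2RDF instance before any events are sent to it, following the initialization section of the sample code.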

Memory Usage

Even for very large files, ARP does not use any additional memory, except when ExtendedHandler.discardNodesWithNodeID returns false or when the AResource.setUserData method has been used. In these cases ARP needs to remember the rdf:nodeID usage for the whole of the file.