Xerces is an XML library for several languages, but if a very common library in Java.
I recently came across a problem with code intermittently throwing a NullPointerException inside the library:
[sourcecode lang=”text”]java.lang.NullPointerExceptionat org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
at org.apache.xerces.dom.ParentNode.item(Unknown Source)
at com.example.xml.Element.getChildren(Element.java:377)
at com.example.xml.Element.newChildElementHelper(Element.java:229)
at com.example.xml.Element.newChildElement(Element.java:180)
…
[/sourcecode]You may also find the NullPointerException in ParentNode.nodeListGetLength() and other locations in ParentNode.
Debugging this was not helped by the fact that the xercesImpl.jar is stripped of line numbers, so I couldn’t find the exact issue. After some searching, it appeared that the issue was down to the fact that Xerces is not thread-safe. ParentNode caches iterations through the NodeList of children to speed up performance and stores them in the Node’s Document object. In multi-threaded applications, this can lead to race conditions and NullPointerExceptions. And because it’s a threading issue, the problem is intermittent and hard to track down.
The solution is to synchronise your code on the DOM, and this means the Document object, everywhere you access the nodes. I’m not certain exactly which methods need to be protected, but I believe it needs to be at least any function that will iterate a NodeList. I would start by protecting every access and testing performance, and removing some if needed.
[sourcecode lang=”java”]/*** Returns the concatenation of all the text in all child nodes
* of the current element.
*/
public String getText() {
StringBuilder result = new StringBuilder();
synchronized ( m_element.getOwnerDocument()) {
NodeList nl = m_element.getChildNodes();
for (int i = 0; i < nl.getLength(); i++) {
Node n = nl.item(i);
if (n != null && n.getNodeType() == org.w3c.dom.Node.TEXT_NODE) {
result.append(((CharacterData) n).getData());
}
}
}
return result.toString();
}[/sourcecode]Notice the “synchronized ( m_element.getOwnerDocument()) {}” block around the section that deals with the DOM. The NPE would normally be thrown on the nl.getLength() or nl.item() calls.
Since putting in the synchronized blocks, we’ve gone from having 78 NPEs between 2:30am and 3:00am, to having zero in the last 12 hours, so I think it’s safe to say, this has drastically reduced the problem.