How can I avoid validation against DTD using DOM parsing?
I want to parse XML from a stream source like a socket which includes a DOCTYPE reference.
With everything left at its default settings, the DocumentBuilder.parse(InputStream is) method tries to read the DTD.
This causes an exception because the DTD file cannot be found in the current working directory.
I know that I could set the factory to non validating by DocumentBuilder.setValidating(false);
But I wonder why it's needed because the default value for this property is false.
In case of SAX parsing I've seen that people use SAXParser.setProperty(String name, Object value); to achieve non validation.
However, DocumentBuilderFactory doesn't implement setProperty(). DOMParser does, but I'd like to use the proper interfaces instead of cutting directly into the DOMParser.
Hi Carsten,
Take a look at the following code:
// get the XML document from the input stream
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
document = docBuilder.parse(inputStream);
When you call docFactory.newDocumentBuilder() you can never be sure what kind of document builder will actually be returned. It could return any of the following classes, and possibly others:
* com.sun.xml.parser.DocumentBuilderImpl
* org.apache.xerces.jaxp.DocumentBuilderImpl
* org.apache.xindice.xml.jaxp.DocumentBuilderImpl
All you know for sure is that it will extend the abstract class DocumentBuilder. You are quite correct in asserting that the default value for validation is false; however, there is no way to enforce this through the abstract class. This is why the class interface provides the setValidating() method, so that you can ensure that the DocumentBuilder will not validate.
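To see which implementation you actually got, something like the following sketch might help. It only uses standard JAXP calls; the class name BuilderInspection and the method describeBuilder() are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper: reports which DocumentBuilder implementation JAXP picked.
class BuilderInspection {

    /** Returns the concrete builder class name plus its validating flag. */
    static String describeBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Set explicitly, even though false is the documented default.
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.getClass().getName() + " validating=" + builder.isValidating();
    }

    public static void main(String[] args) throws ParserConfigurationException {
        // The class name printed here depends on the JAXP implementation on the classpath.
        System.out.println(describeBuilder());
    }
}
```

The class name in the output varies with the parser on your classpath, which is the point being made above.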
Sorry, maybe my first posting was a bit confusing in one point.
I definitely call
...
fact = DocumentBuilderFactory.newInstance();
fact.setValidating(false);
...
As far as I know there is no setValidating() in the builder; one has to set it in the factory before calling fact.newDocumentBuilder(), correct?
I don't know if it's still validating or not, because I didn't try a non-matching XML. All I know is that the parser still tries to connect to the DTD which is mentioned in the XML. To be concrete, every time I call parse() and do not provide a valid SystemId which matches the DTD reference made in the XML, the parser throws an exception.
In fact I work on some distributed components and I don't want to, or cannot, provide a DTD at all times. And I don't need one; I just want the XML to be parsed on the assumption that it conforms to its DTD. On the other hand, I don't want to provide a URI to the DTD in the XML or remove the reference completely (which would also work, I guess).
My assumption was that setting the parser to non-validating would stop validation completely and therefore make it unnecessary for the parser to access the DTD. Isn't that correct?
Basically I'm able to work around this by providing a DTD on a network resource (or even on a server later), but since I don't need one it's a bit annoying.
Does anyone have a comprehensible answer as to why the DTD is accessed regardless of the builder's validation state, or even a workaround for the issue described?
setValidating(false) sets a parser to non validating.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
setValidating() doesn't disable DTD handling, at least not with JAXP 1.2ea2.
To prevent the XML from being checked against its DTD, this is what I did:
myDocumentBuilder.setEntityResolver(new EntityResolver() {
    public InputSource resolveEntity(java.lang.String publicId, java.lang.String systemId)
            throws SAXException, java.io.IOException
    {
        // publicId is null when the DOCTYPE has no PUBLIC identifier, so compare from the literal side
        if ("--myDTDpublicID--".equals(publicId))
            // this deactivates the open office DTD
            return new InputSource(new ByteArrayInputStream("<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
        else
            return null; // fall back to the parser's default resolution
    }
});
As you can see, when the parser hits the DTD, the entity resolver is called. I recognize my DTD with its specific ID and return an empty XML doc instead of the real DTD, stopping all validation...
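For anyone who wants to try this in isolation, here is a self-contained sketch of the same idea. It is not the exact code above: it answers every external entity request with an empty source rather than matching one public ID, and the class name, method names, and the sample document (with its made-up missing-note.dtd) are only assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; only the JAXP/SAX calls are real API.
class EmptyDtdResolverDemo {

    // Made-up sample: the DTD named here is assumed not to exist anywhere.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE note SYSTEM \"missing-note.dtd\">\n"
        + "<note>hello</note>";

    /** Parses the XML while handing the parser an empty stand-in for any external entity. */
    static Document parseIgnoringDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Every external entity (including the external DTD subset) resolves to
        // an empty source, so the parser has nothing to fetch from disk or network.
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseIgnoringDtd(SAMPLE);
        System.out.println("root element: " + doc.getDocumentElement().getNodeName());
    }
}
```

Note that blanking every entity also hides entity definitions and default attribute values that the real DTD might have supplied, so matching on a specific public or system ID as in the post above is the more careful choice.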
This works fine to keep the parser from trying to find the DTD and validate against it. I was happy to find this solution! But it also removes the DTD information from the !DOCTYPE, so if I write the document back out, I've lost the DTD reference. Is there a way to avoid this, or am I asking to have my cake and eat it too?
Seems like there should be a way to use DOM to modify a document without throwing an exception if it can't find the DTD or stripping the DTD reference out of the !DOCTYPE.
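One thing that may be worth trying, assuming the parser behind the factory is Xerces-based (the one bundled with the JDK is) and you have JAXP 1.3 or later so that DocumentBuilderFactory.setFeature() exists: Xerces has a feature that tells a non-validating parser not to load the external DTD at all. The feature URI is Xerces-specific, not part of JAXP, so other parsers may reject it with a ParserConfigurationException. The class name and sample document below are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

// Hypothetical demo class; assumes a Xerces-based JAXP implementation.
class KeepDoctypeDemo {

    // Xerces-specific feature: only honoured when validation is off.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Made-up sample: the DTD named here is assumed not to exist anywhere.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE note SYSTEM \"missing-note.dtd\">\n"
        + "<note>hello</note>";

    /** Parses without touching the external DTD, leaving the DOCTYPE node in the DOM. */
    static Document parseWithoutLoadingDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        return factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWithoutLoadingDtd(SAMPLE);
        DocumentType doctype = doc.getDoctype();
        // The DOCTYPE information is still available for writing the document back out.
        System.out.println("doctype name: " + doctype.getName());
        System.out.println("system id: " + doctype.getSystemId());
    }
}
```

Since the DTD is never read, any entities or default attributes it defines are not available to the parser either.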
Hello!
I am having exactly the same problem. I don't have the DTD, so I need the parser not to validate the document. However, I need to send the document to a server, and I need it to have the !DOCTYPE declaration (which is being removed).
Did you solve the problem?
Any help would be welcome. Thanks in advance,
Victor Batista
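If the DOCTYPE only has to be present in what gets sent to the server, one option might be to add it again when serializing. The sketch below uses the standard javax.xml.transform output properties for that; the class name, the sample element, and the system ID missing-note.dtd are placeholders, so substitute your own identifiers.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical demo class; only the JAXP calls are real API.
class DoctypeWriterDemo {

    /** Builds a tiny in-memory document standing in for one parsed without its DTD. */
    static Document newNoteDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("note"));
        return doc;
    }

    /** Serializes the document, asking the serializer to emit a DOCTYPE declaration. */
    static String serialize(Document doc, String publicId, String systemId) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        if (publicId != null) {
            transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, publicId);
        }
        transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // "missing-note.dtd" is a placeholder system ID.
        System.out.println(serialize(newNoteDocument(), null, "missing-note.dtd"));
    }
}
```

This writes a fresh declaration from the IDs you pass in, so an internal subset in the original DOCTYPE, if there was one, would not be reproduced.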
I have been looking for a way to disable DTD validation that REALLY WORKS for a couple of months and I must say this has been a great deal of unfair headache !!! My respects to you for providing THE ONLY VIABLE SOLUTION to this nasty $%^#$%% problem !!!!!
THANK YOU VERY MUCH, YOU'RE THE MAN !!!!!
How about replacing <!DOCTYPE blah> with <!--<!DOCTYPE blah>-->, thus commenting it out, and then uncomment it after you're done? Here's the first part, which seems to work OK.
try {
    java.io.File fXML = new java.io.File(sXMLFile); // sXMLFile is the path to an XML file with a DOCTYPE declaration
    java.io.FileReader fr = new java.io.FileReader(fXML);
    long nChars = fXML.length(); // file length in bytes, an upper bound on the number of chars
    char[] cbuf = new char[(int) nChars];
    int n = fr.read(cbuf);
    String sXML = new String(cbuf, 0, Math.max(n, 0)); // string representation of the XML file, without trailing padding
    java.util.regex.Pattern p = java.util.regex.Pattern.compile("<!DOCTYPE[^>]*>");
    java.util.regex.Matcher m = p.matcher(sXML);
    boolean bFound = m.find();
    if (bFound) {
        // quoteReplacement keeps any '$' or '\' in the DOCTYPE from being read as a group reference
        String sCommented = java.util.regex.Matcher.quoteReplacement("<!--" + m.group() + "-->");
        sXML = m.replaceFirst(sCommented); // with DOCTYPE commented out, the XML is still well formed
    }
    System.out.println(sXML);
    fr.close();
}
catch (java.io.FileNotFoundException e) {
    System.out.println(e.getMessage());
}
catch (java.io.IOException e) {
    System.out.println(e.getMessage());
}
System.exit(0);
} // closes the enclosing method, which is not shown here
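The second part (uncommenting afterwards) was not posted, so the following is only a guess at how it could look, working on strings with the same kind of regex. The class and method names are made up. It assumes the DOCTYPE has no internal subset containing '>' and contains no "--", since neither would survive being wrapped in a comment this way.

```java
import java.util.regex.Pattern;

// Hypothetical helper that toggles a comment around the DOCTYPE declaration.
class DoctypeCommentToggle {

    private static final Pattern DOCTYPE = Pattern.compile("<!DOCTYPE[^>]*>");
    private static final Pattern COMMENTED = Pattern.compile("<!--(<!DOCTYPE[^>]*>)-->");

    /** Wraps the first DOCTYPE declaration in an XML comment; $0 is the whole match. */
    static String commentOut(String xml) {
        return DOCTYPE.matcher(xml).replaceFirst("<!--$0-->");
    }

    /** Removes the comment markers again; $1 is the DOCTYPE captured inside the comment. */
    static String uncomment(String xml) {
        return COMMENTED.matcher(xml).replaceFirst("$1");
    }

    public static void main(String[] args) {
        // Made-up sample document.
        String original = "<?xml version=\"1.0\"?><!DOCTYPE note SYSTEM \"missing-note.dtd\"><note/>";
        String hidden = commentOut(original);
        System.out.println(hidden);
        System.out.println(uncomment(hidden));
    }
}
```

You would call commentOut() before parsing and uncomment() on the serialized result, provided the serializer writes the comment back out unchanged.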