Java Technology & XML - How to set non validating with DOM
2 Duke Stars rewarded for this thread
This topic has 12 replies on 1 page.
schoenc
Posts:11
Registered: 7/31/02
How to set non validating with DOM   
Aug 1, 2002 7:42 AM

 
How can I avoid validation against DTD using DOM parsing?

I want to parse XML from a stream source like a socket which includes a DOCTYPE reference.
With everything left at its default settings, the DocumentBuilder.parse(InputStream is) method tries to read the DTD.

This causes an exception because the DTD file can't be found in the current working directory.

I know that I could set the factory to non validating by DocumentBuilder.setValidating(false);

But I wonder why it's needed because the default value for this property is false.

In the case of SAX parsing I've seen that people use SAXParser.setProperty(String name, Object value); to achieve non-validation.
However, DocumentBuilderFactory doesn't implement setProperty(). DOMParser does, but I'd like to use the proper interfaces instead of cutting directly into the DOMParser.

Thanks for any hint
Carsten
 
delewis
Posts:29
Registered: 1/23/02
Re: How to set non validating with DOM   
Aug 12, 2002 12:41 PM (reply 1 of 12)  (In reply to original post )

 
How can I avoid validation against DTD using DOM
parsing?

I know that I could set the factory to non validating
by DocumentBuilder.setValidating(false);

But I wonder why it's needed because the default value
for this property is false.

Hi Carsten,

Take a look at the following code:
	// get the XML document from the input stream
	DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
	DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
	document = docBuilder.parse(inputStream);

When you call docFactory.newDocumentBuilder() you can never be sure what kind of document builder will actually be returned. It could be any of the following classes, and possibly others:
com.sun.xml.parser.DocumentBuilderImpl
org.apache.xerces.jaxp.DocumentBuilderImpl
org.apache.xindice.xml.jaxp.DocumentBuilderImpl

All you know for sure is that it will extend the abstract class DocumentBuilder. You are quite correct that the default value for validation is false; however, there is no way to enforce this through the abstract class. This is why DocumentBuilderFactory provides the setValidating() method, so that you can ensure the DocumentBuilder it creates will not validate.

I hope this helps!

- David
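
A minimal sketch of the point above, assuming only the standard JAXP classes: validation is switched off on the factory before the builder is created, and the builder can then be asked what it is and whether it validates. The class name NonValidatingBuilder is hypothetical and not from the thread.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper class; the name is an assumption for illustration.
final class NonValidatingBuilder {

    // Validation is configured on the factory, before the builder is created.
    static DocumentBuilder create() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // explicit, whatever the implementation's default is
        return factory.newDocumentBuilder();
    }

    public static void main(String[] args) throws ParserConfigurationException {
        DocumentBuilder builder = create();
        // The concrete class depends on which JAXP implementation is on the classpath.
        System.out.println(builder.getClass().getName());
        System.out.println("validating: " + builder.isValidating());
    }
}
```

Printing the concrete class is a quick way to see which implementation the factory actually picked on a given machine.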
 
cschoen
Posts:3
Registered: 3/1/01
Re: How to set non validating with DOM   
Aug 23, 2002 6:01 AM (reply 2 of 12)  (In reply to #1 )

 
Yes I agree.

Sorry, maybe my first posting was a bit confusing in one point.

I definitely call
...
fact = DocumentBuilderFactory.newInstance();
fact.setValidating(false);
...
As far as I know there is no setValidating() in DocumentBuilder; one has to set it on the factory before calling fact.newDocumentBuilder(), correct?

I don't know if it's still validating or not, because I didn't try a non-matching XML document. All I know is that the parser still tries to access the DTD referenced in the XML. To be concrete, every time I call parse() without providing a valid SystemId that matches the DTD reference made in the XML, the parser throws an exception.

In fact I work on some distributed components and I don't want to, or cannot, provide a DTD at all times. And I don't need one; I just want the XML to be parsed on the assumption that it conforms to the DTD. On the other hand I don't want to provide a URI to the DTD in the XML or remove the reference completely (which would also work, I guess).

My assumption was that setting the parser to non-validating would stop validation completely and therefore make the parser's access to the DTD unnecessary. Isn't that correct?

Basically I'm able to work around this by providing the DTD on a network resource (or even on a server later), but since I don't need one it's a bit annoying.

Does anyone have a clear explanation of why the DTD is accessed regardless of the builder's validation state, or even a workaround for the issue described?

Thanks, Carsten
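
On the "why" part of the question: a non-validating XML parser is still allowed to read the external DTD, because the DTD can declare entities and default attribute values that affect the parsed result. The sketch below is one possible way to switch that off. It is not from the thread and rests on two assumptions: the parser behind the factory is Xerces-based (the feature URI is Xerces-specific and other parsers may reject it), and the JAXP version is recent enough to have DocumentBuilderFactory.setFeature(), which did not exist when the question was asked. The class name SkipExternalDtd is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is an assumption for illustration.
final class SkipExternalDtd {

    // Xerces-specific feature URI; a non-Xerces parser may throw
    // ParserConfigurationException when asked to set it.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        factory.setFeature(LOAD_EXTERNAL_DTD, false); // do not fetch the DTD at all
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The referenced DTD does not exist; parsing should still succeed.
        String xml = "<?xml version='1.0'?>"
                + "<!DOCTYPE root SYSTEM 'no-such-file.dtd'><root/>";
        System.out.println(parse(xml).getDocumentElement().getNodeName());
    }
}
```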
 
dvohra09
Posts:3,591
Registered: 4/4/01
Re: How to set non validating with DOM   
Aug 24, 2002 6:13 AM (reply 3 of 12)  (In reply to original post )

 
setValidating(false) sets a parser to non validating.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
 
ddossot
Posts:2,124
Registered: 2/23/01
Re: How to set non validating with DOM      
Aug 25, 2002 10:46 PM (reply 4 of 12)  (In reply to #3 )

 
setValidating() doesn't disable DTD handling - at least not with JAXP 1.2ea2

To prevent the XML from being checked against its DTD, this is what I did:
// needs: java.io.ByteArrayInputStream, org.xml.sax.EntityResolver, org.xml.sax.InputSource, org.xml.sax.SAXException
myDocumentBuilder.setEntityResolver(new EntityResolver() {
          public InputSource resolveEntity(java.lang.String publicId, java.lang.String systemId)
                 throws SAXException, java.io.IOException
          {
            // constant-first equals() avoids a NullPointerException when the DOCTYPE has no public ID
            if ("--myDTDpublicID--".equals(publicId))
              // this deactivates the open office DTD
              return new InputSource(new ByteArrayInputStream("<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
            else return null;
          }
});

As you can see, when the parser hits the DTD, the entity resolver is called. I recognize my DTD with its specific ID and return an empty XML doc instead of the real DTD, stopping all validation...

Hope this helps,
David
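
For anyone who wants to try the trick in isolation, here is a self-contained sketch of the same idea. It assumes the DTD is identified by a public ID passed in by the caller and hands the parser an empty stream in place of the DTD; the class name EmptyDtdResolver and the example public ID are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is an assumption for illustration.
final class EmptyDtdResolver {

    static Document parse(String xml, final String dtdPublicId) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                // publicId is null for SYSTEM-only DOCTYPEs, so compare from the known value.
                if (dtdPublicId.equals(publicId)) {
                    return new InputSource(new StringReader("")); // empty stand-in for the DTD
                }
                return null; // default resolution for every other entity
            }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up public ID and a DTD file that does not exist.
        String xml = "<?xml version='1.0'?>"
                + "<!DOCTYPE root PUBLIC '-//Example//DTD Demo//EN' 'no-such-file.dtd'>"
                + "<root/>";
        Document doc = parse(xml, "-//Example//DTD Demo//EN");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```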
 
schoenc
Posts:11
Registered: 7/31/02
Re: How to set non validating with DOM   
Aug 26, 2002 7:56 AM (reply 5 of 12)  (In reply to #4 )

 
This is really a nice trick! It worked fine.

I also like the cool inline definition of the EntityResolver class.

Thanks.
Carsten
 
absayre
Posts:1
Registered: 7/27/99
Re: How to set non validating with DOM   
Dec 13, 2002 12:17 PM (reply 6 of 12)  (In reply to #4 )

 
This works fine to keep the parser from trying to find the dtd & validate it. I was happy to find this solution! But it also removes the dtd information from the !DOCTYPE, so if I write the document back out, I've lost the DTD reference. Is there a way to avoid this or am I asking to have my cake & eat it too?

Seems like there should be a way to use DOM to modify a document without throwing an exception if it can't find the DTD or stripping the DTD reference out of the !DOCTYPE.

Thanks,
AB
 
vbatista
Posts:14
Registered: 5/29/98
Re: How to set non validating with DOM   
Oct 24, 2003 4:06 AM (reply 7 of 12)  (In reply to #6 )

 
Hello!
I am having exactly the same problem. I don't have the DTD, so I need the parser not to validate the Document. However, I need to send the Document to a server, and I need it to have the !DOCTYPE declaration (which is being removed).
Did you solve the problem?
Any help would be welcome. Thanks in advance,
Victor Batista
 
BytEncoder
Posts:1
Registered: 12/3/03
Re: How to set non validating with DOM   
Dec 3, 2003 4:54 AM (reply 8 of 12)  (In reply to #4 )

 
I have been looking for a way to disable DTD validation that REALLY WORKS for a couple of months and I must say this has been a great deal of unfair headache !!! My respects to you for providing THE ONLY VIABLE SOLUTION to this nasty $%^#$%% problem !!!!!
THANK YOU VERY MUCH, YOU'RE THE MAN !!!!!
 
xlolo
Posts:20
Registered: 10/24/00
Re: How to set non validating with DOM   
Aug 17, 2005 9:56 AM (reply 9 of 12)  (In reply to #4 )

 
Hello,

I have a DOCTYPE declaration like this one:

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE rootname SYSTEM "mydtd.dtd">

I tried with "if (systemId.equals("mydtd.dtd"))" but it didn't work.

Any idea ?

Thanks
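
A likely cause, offered as a sketch rather than a confirmed diagnosis: the SAX documentation for EntityResolver says that a system identifier which is a URL is resolved fully by the parser before it is reported, so the resolver probably receives something like an absolute file: URI ending in mydtd.dtd rather than the bare file name, and equals() then fails. Matching on the end of the string sidesteps that. The class name SystemIdResolver and the method observedSystemId are hypothetical; the method also returns whatever value the parser passed in, so the real string can be inspected.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is an assumption for illustration.
final class SystemIdResolver {

    // Parses the XML and returns the system ID the parser handed to the resolver.
    static String observedSystemId(String xml) throws Exception {
        final String[] seen = new String[1];
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                seen[0] = systemId; // record what actually arrives
                // endsWith() matches both "mydtd.dtd" and an absolute URI ending in it.
                if (systemId != null && systemId.endsWith("mydtd.dtd")) {
                    return new InputSource(new StringReader(""));
                }
                return null;
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE rootname SYSTEM \"mydtd.dtd\"><rootname/>";
        System.out.println(observedSystemId(xml));
    }
}
```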
 
indianmanju
Posts:2
Registered: 4/23/00
Re: How to set non validating with DOM   
Sep 13, 2005 10:26 AM (reply 10 of 12)  (In reply to #6 )

 
To get back the DOCTYPE after transformation, use this code

transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "haXML.dtd");
 
laxmi83
Posts:11
Registered: 2/29/08
Re: How to set non validating with DOM   
Feb 29, 2008 2:29 AM (reply 11 of 12)  (In reply to #10 )

 
Hi All,

I found this thread, which was very useful. I have the same problem: ignoring the <!DOCTYPE> during parsing and writing it back out in the parsed XML.

I followed the step given, transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM ,"haXml.dtd");, replacing the DTD name with my own.

But the problem I have is that I need to output the <!DOCTYPE> with a public ID, like

<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" "http://java.sun.com/dtd/web-app_2_3.dtd">

Is there any way to achieve this?

Please help me solve this issue.

Thanks,
laxmi.
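
One possible approach, sketched under the assumption that the standard JAXP transformer is used for output: OutputKeys also defines DOCTYPE_PUBLIC, which is only honoured when DOCTYPE_SYSTEM is set as well. The sketch parses with a resolver that skips every external DTD, then copies the public and system IDs from the parsed document's DocumentType node into the output properties. The class name DoctypeRoundTrip is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is an assumption for illustration.
final class DoctypeRoundTrip {

    static String roundTrip(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Skip every external DTD by handing the parser an empty stream.
        builder.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(""));
            }
        });
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        DocumentType doctype = doc.getDoctype();
        if (doctype != null && doctype.getSystemId() != null) {
            transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            if (doctype.getPublicId() != null) {
                // The public ID is ignored unless a system ID is also set.
                transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
            }
        }
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE web-app PUBLIC "
                + "\"-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN\" "
                + "\"http://java.sun.com/dtd/web-app_2_3.dtd\"><web-app/>";
        System.out.println(roundTrip(xml));
    }
}
```

Reading the IDs from doc.getDoctype() avoids hard-coding them, at the cost of assuming the parser keeps the DocumentType node when the DTD itself is skipped.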
 
xyharv
Posts:1
Registered: 4/29/08
Re: How to set non validating with DOM   
Apr 29, 2008 7:22 AM (reply 12 of 12)  (In reply to #11 )

 
How about replacing <!DOCTYPE blah> with <!--<!DOCTYPE blah>-->, thus commenting it out, and then uncomment it after you're done? Here's the first part, which seems to work OK.

try {
    java.io.File fXML = new java.io.File(sXMLFile); // sXMLFile is the path to an XML file with a DOCTYPE declaration
    java.io.FileReader fr = new java.io.FileReader(fXML);
    char[] cbuf = new char[(int) fXML.length()];
    int n = fr.read(cbuf);
    fr.close();
    String sXML = new String(cbuf, 0, Math.max(n, 0)); // string representation of the XML file

    // [^>]* stops at the first '>', so this assumes a DOCTYPE without an internal subset
    java.util.regex.Pattern p = java.util.regex.Pattern.compile("<!DOCTYPE[^>]*>");
    java.util.regex.Matcher m = p.matcher(sXML);
    if (m.find()) {
        // quoteReplacement keeps any '$' or '\' in the DOCTYPE from being read as a group reference
        sXML = m.replaceFirst(java.util.regex.Matcher.quoteReplacement("<!--" + m.group() + "-->")); // with DOCTYPE commented out, the XML is still well formed
    }
    System.out.println(sXML);
}
catch (java.io.FileNotFoundException e) {
    System.out.println(e.getMessage());
}
catch (java.io.IOException e) {
    System.out.println(e.getMessage());
}
System.exit(0);

Harv Greenberg, XyEnterprise
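
The second part (uncommenting afterwards) is not shown in the post. A rough sketch of both directions on plain strings might look like the following; it assumes the DOCTYPE has no internal subset and contains no "--" (which would not be allowed inside a comment), and the class name DoctypeCommenter and its method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are assumptions for illustration.
final class DoctypeCommenter {

    // Matches a DOCTYPE declaration up to its first '>' (no internal subset).
    private static final Pattern DOCTYPE = Pattern.compile("<!DOCTYPE[^>]*>");
    // Matches a DOCTYPE that commentOut() wrapped in a comment.
    private static final Pattern COMMENTED = Pattern.compile("<!--(<!DOCTYPE[^>]*>)-->");

    // Wraps the first DOCTYPE declaration in an XML comment.
    static String commentOut(String xml) {
        Matcher m = DOCTYPE.matcher(xml);
        if (!m.find()) {
            return xml; // nothing to do
        }
        return m.replaceFirst(Matcher.quoteReplacement("<!--" + m.group() + "-->"));
    }

    // Removes the comment markers added by commentOut().
    static String restore(String xml) {
        Matcher m = COMMENTED.matcher(xml);
        if (!m.find()) {
            return xml;
        }
        return m.replaceFirst(Matcher.quoteReplacement(m.group(1)));
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><!DOCTYPE root SYSTEM \"mydtd.dtd\"><root/>";
        String hidden = commentOut(xml);
        System.out.println(hidden);
        System.out.println(restore(hidden));
    }
}
```

Note that commentOut() should only be applied once per document; running it on already commented text would nest the comment markers.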
 