How to remove special/invalid characters before reading the XML file

 
Greenhorn
Posts: 5
I have functionality that reads an XML file, gets data from the nodes, and takes the appropriate action to execute the business logic. I am using the code below:

SAXBuilder builder = new SAXBuilder();
Document doc = builder.build(fileName);

The XML file I get as input is modified by the end user. He adds data to it, and my functionality then loads that data into the system. In this scenario I get a JDOM parser exception: "Document root element is missing.."
The reason is that the editor used to modify the XML prepends some characters to the beginning of the file, and the parser fails when reading it.
Is there any way I can handle this problem, either by removing these characters or by ignoring them? The characters are not predictable; they might be garbage values.



 
Ranch Hand
Posts: 152
It's a brittle solution, but you can write something else to preprocess the file to strip away the "garbage" and produce a valid XML output. The XML classes in Java pretty much expect to find valid XML in their inputs.

If you can't change the process to avoid the prepended characters but you can detect some rhyme or reason to them, you can make this approach more robust.
 
Marshal
Posts: 28425
The proper way to solve this problem would be to not have the user modifying XML files. It's easy for people to create malformed XML, even if they do know the rules for well-formed XML. Which most people don't.

I don't believe your users are keying in those few non-printable characters which are not allowed in XML documents. I think this "garbage" you're talking about is more likely to be missing ">" characters at the end of tags or unescaped ampersands or just plain fat-finger mistyping. There's no way to fix that sort of thing via a preprocessor.

But if you're stuck with people modifying XML by hand, then give them an XML editor to do that with. Don't let them use Notepad or something like that. Make sure they send you well-formed XML.
 
Greenhorn
Posts: 26
I guess the special characters you are describing are the Byte Order Mark (BOM). These are characters inserted by the editor at the start of the file when you edit the XML file. The BOM is basically used to determine the encoding of the XML file.

Maybe the XML parser you are using does not understand the BOM and throws an exception.
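
For reference, the UTF-8 BOM is the three bytes EF BB BF, and UTF-16 uses FE FF (big-endian) or FF FE (little-endian). Here is a minimal sketch for checking whether the first bytes of a file are one of these. The names BomSniffer and detectBom are made up for illustration; they are not part of JDOM or the JDK.

```java
import java.util.Optional;

// Hypothetical helper (not a JDOM or JDK class): inspects the first bytes
// of a file and reports which byte order mark, if any, they contain.
final class BomSniffer {
    private BomSniffer() {}

    /**
     * Names the encoding signalled by a BOM at the start of head, or returns
     * empty if none of the three marks checked here is present.
     * Note: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE,
     * because this sketch only looks at UTF-8 and UTF-16.
     */
    static Optional<String> detectBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return Optional.of("UTF-8");
        if (startsWith(head, 0xFE, 0xFF)) return Optional.of("UTF-16BE");
        if (startsWith(head, 0xFF, 0xFE)) return Optional.of("UTF-16LE");
        return Optional.empty();
    }

    // Compares the leading bytes of data (as unsigned values) with prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Reading the first four bytes of the problem file and passing them to detectBom should be enough to confirm or rule out a BOM as the cause.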

Does anybody have any thoughts about this?
 
Paul Clapham
Marshal
Posts: 28425

Vinod Borole wrote: I guess the special characters you are describing are the Byte Order Mark (BOM). These are characters inserted by the editor at the start of the file when you edit the XML file.


This is definitely a possibility. Notepad has a bad habit of doing that. So if you gave people a proper XML editor to edit their XML with, that wouldn't happen.
 
Aditya Bhanose
Greenhorn
Posts: 5
Yes, my problem is just the byte order mark. I think there is no way we can handle this in code.
I need to provide a proper XML editor to the end users.
 
Paul Clapham
Marshal
Posts: 28425

Aditya Bhanose wrote: Yes, my problem is just the byte order mark. I think there is no way we can handle this in code.

Sure you can handle it in code. You just wrap the InputStream that contains the XML (with possible byte order mark) in another InputStream which skips over anything preceding the first "<" character when it's created. PushbackInputStream is a useful basis for that.
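
A minimal sketch of that idea, assuming the file is in UTF-8 or another encoding where "<" is the single byte 0x3C. It would not be safe for UTF-16, where dropping bytes before the first 0x3C can split a two-byte character. The names XmlPrefixSkipper and skipToFirstTag are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper: discards everything before the first '<' byte so the
// parser sees the stream starting at the XML declaration or root element.
final class XmlPrefixSkipper {
    private XmlPrefixSkipper() {}

    /**
     * Wraps raw and consumes bytes until the first '<' (0x3C) is found,
     * then pushes that byte back so it is the next byte returned.
     * If no '<' exists, the returned stream is simply at end of input.
     * Assumes a single-byte-compatible encoding such as UTF-8.
     */
    static InputStream skipToFirstTag(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 1);
        int b;
        while ((b = in.read()) != -1) {
            if (b == '<') {
                in.unread(b); // put the '<' back for the parser
                break;
            }
        }
        return in;
    }
}
```

The result can then be handed to the parser in place of the file name, for example builder.build(XmlPrefixSkipper.skipToFirstTag(new FileInputStream(fileName))), since SAXBuilder also accepts an InputStream.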

But don't underestimate the ability of end-users to botch up XML documents in other ways. I did it myself yesterday -- I made a little change to a configuration file which caused my application to fail to start up correctly.
 