XML Parser

 
Greenhorn
Posts: 27
Hi All,

I have a really huge XML file that is not well-formed.

Is there any way to find out exactly where it is not well-formed by reading the file in Java?

My file is about 100 MB, and opening it in any XML editor eats up a lot of memory. Moreover, it has 2 million lines, so an editor certainly can't help. If I try to parse it with any parser, it fails at the first step, saying that the document is not well-formed. Any input would really help.

Thanks,
Satish
 
Marshal
Posts: 25826
You could do that with a SAX parser. Here is an example of how to use a SAX parser and a Locator to find the current parse location. Save the information and use it when the exception is thrown.
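A minimal sketch of that approach, assuming the JDK's built-in SAX parser (`javax.xml.parsers.SAXParserFactory`); the class names `FindMalformedSpot` and `PositionTracker` are hypothetical. SAX streams the file instead of building a tree, so a 100 MB document never has to fit in memory. The handler keeps the `Locator` and remembers where the last element started, and the `SAXParseException` itself carries the line and column where the parser gave up:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class FindMalformedSpot {

    /** Remembers the parser's Locator and the last element that was started successfully. */
    static final class PositionTracker extends DefaultHandler {
        private Locator locator;
        String lastElement = "(none)";
        int lastElementLine = -1;

        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;  // supplied by the parser before it reports other events
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            lastElement = qName;
            lastElementLine = (locator != null) ? locator.getLineNumber() : -1;
        }
    }

    /** Streams the document through SAX and describes the first fatal error, or returns "well-formed". */
    static String locateError(InputSource source) {
        PositionTracker tracker = new PositionTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, tracker);
            return "well-formed";
        } catch (SAXParseException e) {
            // The exception records the position where the parser stopped.
            return "error at line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + "; last element started: <" + tracker.lastElement + "> on line "
                    + tracker.lastElementLine + "; " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the large file; it is read as a stream, not loaded whole.
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(locateError(new InputSource(in)));
        }
    }
}
```

The reported line number lets you jump straight to the problem with a tool that does not load the whole file, instead of scrolling through an editor.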

I assume this enormous malformed document is being generated by a computer program, and you're trying to fix that program?
 
Satish Kandagadla
Greenhorn
Posts: 27

Originally posted by Paul Clapham:
You could do that with a SAX parser. Here is an example of how to use a SAX parser and a Locator to find the current parse location. Save the information and use it when the exception is thrown.

I assume this enormous malformed document is being generated by a computer program, and you're trying to fix that program?



Thanks for the reply. The XML comes from a product, and I do not have access to how the product generates it. My intention is to find out how many tags are missing and then figure out how to fix it.

Does the code you pointed me to assume that the XML is well-formed, or does it work on any XML?
[ November 19, 2008: Message edited by: Satish Kandagadla ]
 
Paul Clapham
Marshal
Posts: 25826
Only well-formed documents are XML. But a SAX parser will process a document until it reaches a place where it sees a problem.
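Because SAX reports each start and end tag as it reads them, a handler can keep a stack of the elements that are still open; when the parser stops, that stack shows which elements had not yet been closed, which is a reasonable starting point when hunting for a missing end tag. This is a sketch under the same assumptions (JDK SAX parser; `OpenElementsAtError` is a hypothetical name). The parser stops at the first error, so each run reveals one problem, and the element it names is where the mismatch was noticed, not necessarily where the tag was actually dropped:

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class OpenElementsAtError {

    /** Mirrors the parser's view of which elements are currently open. */
    static final class StackHandler extends DefaultHandler {
        final Deque<String> open = new ArrayDeque<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            open.push(qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            open.pop();  // only reached for end tags that matched, so the stack is never empty here
        }
    }

    /**
     * Returns the path of open elements (outermost first) at the first fatal error,
     * or an empty list if the whole document parsed without one.
     */
    static List<String> openPathAtError(InputSource source) {
        StackHandler handler = new StackHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
            return List.of();
        } catch (SAXParseException e) {
            List<String> path = new ArrayList<>(handler.open);  // innermost first
            Collections.reverse(path);
            return path;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }
}
```

For example, on `<root><rec><name>x</rec></root>` the returned path is `root`, `rec`, `name`, pointing at the unclosed `name` element.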

In my opinion you would be better off getting the people who are sending you garbage documents to fix those documents themselves.
 
Satish Kandagadla
Greenhorn
Posts: 27

Originally posted by Paul Clapham:
Only well-formed documents are XML. But a SAX parser will process a document until it reaches a place where it sees a problem.

In my opinion you would be better off getting the people who are sending you garbage documents to fix those documents themselves.



Thanks, Paul. Yes, there are challenges in the project in getting the XML from them, but I see no other way to get proper XML apart from approaching them. My life will be a lot easier if I get well-formed XML.
 
Author and all-around good cowpoke
Posts: 13078
If the XML you are getting is consistent in terms of where the markup is incorrect, you might be able to code fix-up routines as part of an XML pipeline processing model.

I wrote this article and this follow-up article as an introduction to "pipeline" processing of XML.
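One way such a fix-up stage could look, sketched here as an assumption rather than the approach the articles describe: wrap the raw input in a `Reader` that repairs each line before the SAX parser sees it, so the file is still streamed rather than loaded whole. `LineFixupReader` and `escapeBareAmpersands` are hypothetical names, and the ampersand repair is only an example of a "consistent" defect. The regex recognizes only the five predefined entities and numeric character references, so it would wrongly escape references to entities declared in a DTD, and the whole approach assumes each defect is confined to a single line:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.UnaryOperator;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical pipeline stage: repairs the text line by line before the XML parser reads it. */
class LineFixupReader extends Reader {
    private final BufferedReader in;
    private final UnaryOperator<String> fixup;
    private String pending = "";  // repaired text not yet handed to the parser
    private int pos = 0;

    LineFixupReader(Reader in, UnaryOperator<String> fixup) {
        this.in = new BufferedReader(in);
        this.fixup = fixup;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (pos >= pending.length()) {
            String line = in.readLine();  // strips the line terminator
            if (line == null) {
                return -1;                // end of input
            }
            pending = fixup.apply(line) + "\n";
            pos = 0;
        }
        int n = Math.min(len, pending.length() - pos);
        pending.getChars(pos, pos + n, buf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Example repair (an assumption): escape '&' that does not begin a predefined entity or character reference. */
    static String escapeBareAmpersands(String line) {
        return line.replaceAll("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)", "&amp;");
    }

    /** True if the text from the reader parses as well-formed XML. */
    static boolean isWellFormed(Reader xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(xml), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }
}
```

For the real file, the parser would read from something like `new InputSource(new LineFixupReader(Files.newBufferedReader(path), LineFixupReader::escapeBareAmpersands))`. Because characters arrive through a `Reader`, the text encoding is decided by that reader (UTF-8 for `Files.newBufferedReader(Path)`), not by the encoding declared in the XML prolog.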

Bill
 