XML parsing vs. simple text parsing using Java streams

 
Ranch Hand
Posts: 47
Hi,
I have a situation here. I currently store large amounts of data (half a GB, 1 GB, at most 2 GB) in a CSV-style text file and parse it using simple Java streams. I read the file once, calculate some summaries, and fill in an Oracle table. Now I am thinking of storing the data in XML format instead and using SAX for parsing. The file size would surely shoot up, maybe double, but parsing performance matters more. Is XML suited to this amount of data? Will SAX parsing be any better than simply reading the text file with Java streams and tokenizing it?

Can someone please shed some light on this issue?
Thanks,

.....jw
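
For comparison, here is a rough sketch of what the read-once summary step might look like with SAX. It assumes a hypothetical layout in which each record carries an `<amount>` element holding an integer; the element name, the `SaxSum` class, and the `sumAmounts` method are illustrative only and do not come from the original post. Note how each value passes through a `StringBuilder` and a `String` before it becomes a number.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxSum {
    // Sums the integer text of every <amount> element (hypothetical element name).
    static long sumAmounts(InputStream in) {
        final long[] total = {0};
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inAmount;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("amount".equals(qName)) {
                    inAmount = true;
                    text.setLength(0); // reuse the buffer for each element
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one element's text in several chunks,
                // so accumulate instead of parsing here.
                if (inAmount) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("amount".equals(qName)) {
                    total[0] += Long.parseLong(text.toString().trim());
                    inAmount = false;
                }
            }
        };
        try {
            // Streaming parse: the document is never held in memory as a tree.
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (Exception e) {
            throw new RuntimeException("XML parsing failed", e);
        }
        return total[0];
    }
}
```

Memory use stays flat because SAX streams the input, but the parser still has to decode characters, match tags, and fire three callbacks per field, which a plain delimiter scan does not.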
 
Ranch Hand
Posts: 192
Yes, you can write huge amounts of XML, though not with plain SAX or DOM; use the SAX extensions that are available.
I would suggest reading this article.
It is a good one that covers the topic.
Hope it helps.
 
Author and all-around good cowpoke
Posts: 13078
Since XML parsing will add LOTS of overhead, I can't imagine how you could avoid a major slowdown. Any XML processing involves creating lots of objects, converting to and from String, and so on.
IF (big if) your data is all ASCII, you will be much faster handling the input as byte streams and byte[] buffers rather than character streams, staying well away from String conversion until the last minute.
XML shines when the data structure is complex; anything that can be represented as CSV is not a good candidate.
Bill
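
To illustrate the byte-buffer suggestion, here is a minimal sketch, assuming the file is pure ASCII CSV with no quoted fields and that the column of interest holds non-negative integers. The class name `ByteCsvSum`, the method `sumColumn`, and the 64 KB buffer size are hypothetical choices for the example, not something from the thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ByteCsvSum {
    // Sums the non-negative integers in column `col` (0-based) of ASCII CSV
    // input, working directly on bytes. No String is created per line or field.
    static long sumColumn(InputStream in, int col) {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        long value = 0; // digits of the current field in the target column
        int field = 0;  // index of the field currently being scanned
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    byte b = buf[i];
                    if (b == ',') {
                        if (field == col) total += value;
                        value = 0;
                        field++;
                    } else if (b == '\n') {
                        if (field == col) total += value;
                        value = 0;
                        field = 0;
                    } else if (field == col && b >= '0' && b <= '9') {
                        value = value * 10 + (b - '0');
                    }
                    // Any other byte (including '\r') is skipped.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Handle a final line that has no trailing newline.
        if (field == col) total += value;
        return total;
    }
}
```

A call such as `ByteCsvSum.sumColumn(new FileInputStream("data.csv"), 1)` (file name hypothetical) would total the second column. Because the parser state lives in a few local variables, fields that straddle two buffer reads need no special handling.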
 