Utility that uses Terse XML tags

 
bob connolly
Ranch Hand
Posts: 204
Hello,

Has anyone run across a utility that converts XML tags to short, cryptic codes that take up little space?

I'm hoping to find a utility that will read a BILLION-record XML file and replace each tag with a special byte-sized character to reduce the file size!

This utility would do something like convert <intex_traunche_cusip_id> to something like #@, or some kind of unmodifiable, cryptic, small value.

Thanks for any references or suggestions!

bc
 
Author and all-around good cowpoke
Posts: 13078
Actually that would be pretty trivial to program using SAX. The startElement and endElement methods would have to build and use a replacement table. However, before you embark on that you should look into how much compression the plain ZIP compression utility can provide.
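As a rough illustration of the replacement-table idea, here is a minimal sketch, not a finished utility. The class name TerseTagRewriter, the rewrite helper, and the generated names t0, t1, ... are all made up for this example. Note that something like #@ is not a legal XML element name, so the sketch assumes short letter-plus-number names instead. It also builds the output in memory, which is only sensible for a small test; a billion-record file would need the handler to write to a stream, and the table would have to be saved somewhere to reverse the mapping.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: rewrites element names to short generated names.
public class TerseTagRewriter extends DefaultHandler {
    // Replacement table: original element name -> short name (assumed scheme: t0, t1, ...).
    private final Map<String, String> table = new LinkedHashMap<>();
    // In-memory output, for demonstration only.
    private final StringBuilder out = new StringBuilder();

    // Look up the short name, assigning the next free one on first sight.
    private String shortName(String qName) {
        String s = table.get(qName);
        if (s == null) {
            s = "t" + table.size();
            table.put(qName, s);
        }
        return s;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        out.append('<').append(shortName(qName));
        // Attribute names are copied unchanged; only element names are shortened.
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i)).append("=\"")
               .append(escape(atts.getValue(i))).append('"');
        }
        out.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("</").append(shortName(qName)).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(escape(new String(ch, start, length)));
    }

    // Re-escape the characters that the parser has already unescaped.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Parses the XML string, copies the replacement table into tableOut,
    // and returns the rewritten document.
    public static String rewrite(String xml, Map<String, String> tableOut) throws Exception {
        TerseTagRewriter handler = new TerseTagRewriter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        tableOut.putAll(handler.table);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> table = new LinkedHashMap<>();
        String xml = "<rec><intex_traunche_cusip_id>123</intex_traunche_cusip_id></rec>";
        System.out.println(rewrite(xml, table));
        System.out.println(table);
    }
}
```

The sketch ignores comments, processing instructions, and namespaces, so treat it as a starting point for experiments rather than a drop-in tool.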

I looked into both ZIP encoding and "fast infoset" for this article. ZIP encoding compressed my test file by more than a factor of 10 with only a minor effect on parsing time.
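To get a feel for what plain compression does to tag-heavy XML, something like the following could be tried. It is only a sketch: it uses GZIP from java.util.zip rather than a ZIP archive (both rely on the same DEFLATE algorithm), the class name GzipXmlDemo and the sample record are invented for the example, and the ratio it prints for repeated sample data says nothing definite about a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical example class: compresses XML bytes with GZIP and reads them back.
public class GzipXmlDemo {
    // Compress a byte array with GZIP.
    public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        }
        return buf.toByteArray();
    }

    // Decompress GZIP data back to the original bytes.
    public static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Build a small, repetitive sample document (invented data).
        StringBuilder xml = new StringBuilder("<records>");
        for (int i = 0; i < 10000; i++) {
            xml.append("<rec><intex_traunche_cusip_id>").append(i)
               .append("</intex_traunche_cusip_id></rec>");
        }
        xml.append("</records>");

        byte[] raw = xml.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("gzipped bytes: " + packed.length);
    }
}
```

Because GZIPInputStream is an ordinary InputStream, a SAX parser can read the compressed file directly, so the XML never has to be expanded on disk.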

Let us know what you come up with; I think a lot of people are worried about large XML files.

Bill
 
bob connolly
Ranch Hand
Posts: 204
Thanks William, very good article!

I'm going to take a closer look into the specifications for that Fast Infoset technique!

And it's good to know that zipping is about the best anyone can do right now!

Have a good one William!

bc
 