• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Tree parsing and performance

 
Ranch Hand
Posts: 264
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I have written an application that creates a fairly large tree structure from an input file and then inserts it into a database. As it parses the tree it inserts records and part of each record is the id of its parent. With really large input files the process gets extremely slow. I'm trying to conceptualize a way to do batch inserts (or anything else to speed up the process).

my basic algorithm is:



Any ideas? It also seems the application speed decreases exponentially while the input size increases linearly (thats a little bit of an overstatement).

-Tad
 
author and iconoclast
Posts: 24207
46
Mac OS X Eclipse IDE Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
The Enumeration at the top iterates over... all the nodes? Are all the nodes in the tree held in one collection somewhere?

Then the parseNode() method iterates over all the children of each node.

Taken together, and combined with your observation that the runtime goes up exponentially with tree depth, this suggests that parseNode() is being called multiple times on any given node.

If you have a recursive algorithm for walking a tree, you generally start it by calling it once on one node at the root of the tree, and then the recursion takes care of calling it on every tree node. It looks to me as if you're calling it on every node, and many of those calls proceed to recursively call it again on many other nodes. Can the Enumeration at the top be replaced with a single call at the tree root?
 
Tad Dicks
Ranch Hand
Posts: 264
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
The enumeration at the top holds the top level "nodes." If the input file doesn't create a single root node, it artificially creates one from the sequence of nodes that is the top level (I hope that makes sense).
I guess I should have written while(root.hasMoreChildren())as thats what is happening.

Basically if an input file has
parent 1
child1a
child1b
parent 2
child2a
child2b
parent 3
child3a
child3b

The top while statement would get executed 3 times.
I just didn't want the "artificially created root" in the database.
I wouldn't say its exactly exponential. Just if the input file is 50k or so it runs lickety split, if the input file is 500k it takes a lot longer than 10 times the length of the first run.
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
reply
    Bookmark Topic Watch Topic
  • New Topic