Keeping 4 lakh DB records in an ArrayList of plain Java bean or VO objects

 
Greenhorn
Posts: 25
Dear Friends,

Is it a good idea to keep all 4 lakh records from the DB in an ArrayList? Will it affect performance? My requirement is that I need to migrate this data to a content repository, so I need to send the records one by one by accessing the plain Java bean/VO objects held in the ArrayList.

Please let me know the efficient way: should I read 1,000 records, process them, and then read the next 1,000, or can I load all 4 lakh at once? Which way will perform better?

Please help me ASAP. Thanks in advance.
 
Bartender
Posts: 6663
Read them X records at a time and process them. An ArrayList holding 400,000 records can become bulky and inefficient; trying to gulp down all the data at once is usually not the way to go.
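For illustration, here is a minimal JDBC sketch of that chunked approach: it streams rows with a fetch size of 1,000 instead of materialising all 400,000 beans in one ArrayList. The connection URL, table, and column names are hypothetical placeholders.

[code]
import java.sql.*;

public class ChunkedMigration {

    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details and schema; adjust for your driver and database.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/mydb", "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT id, payload FROM records ORDER BY id")) {

            ps.setFetchSize(1000); // driver fetches ~1,000 rows per round trip, not all 400,000
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Process each record as it streams in; nothing accumulates in memory.
                    sendToContentRepository(rs.getLong("id"), rs.getString("payload"));
                }
            }
        }
    }

    private static void sendToContentRepository(long id, String payload) {
        // Placeholder for the actual call that pushes one record to the repository.
    }
}
[/code]

Because the rows are processed as they arrive, memory use stays flat regardless of how many records the table holds.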
 
Bartender
Posts: 1638

venkata vinay bysani wrote: Which way will perform better?

Performance is very subjective; it depends on what kind of processing you are doing.
If you only have to iterate over the list, processing each record one by one, and you have no memory constraints, then there is nothing like keeping it simple.
However, if you have to search for elements in the list and massage the data, then a list with nearly half a million records is an absolute no.
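To make that concrete: finding one element in an ArrayList is a linear scan over every record, while a HashMap keyed on the lookup field answers in roughly constant time. A small hypothetical sketch (the RecordVO type and its fields are invented for illustration):

[code]
import java.util.*;

public class LookupComparison {

    // Plain VO, as in the original question; fields are illustrative.
    static class RecordVO {
        final long id;
        RecordVO(long id) { this.id = id; }
        public String toString() { return "RecordVO#" + id; }
    }

    public static void main(String[] args) {
        List<RecordVO> records = new ArrayList<>();
        for (long i = 1; i <= 5; i++) records.add(new RecordVO(i));

        long wantedId = 3;

        // Linear search: walks the list until a match is found -- O(n) per lookup.
        RecordVO slow = null;
        for (RecordVO r : records) {
            if (r.id == wantedId) { slow = r; break; }
        }

        // Index the list once, then each lookup is effectively O(1).
        Map<Long, RecordVO> byId = new HashMap<>();
        for (RecordVO r : records) byId.put(r.id, r);
        RecordVO fast = byId.get(wantedId);

        System.out.println(slow + " / " + fast);
    }
}
[/code]

With 400,000 records and many lookups, the difference between the two approaches is dramatic.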
 
Ranch Hand
Posts: 168

venkata vinay bysani wrote: Is it a good idea to keep all 4 lakh records

Please use English words; not all the members of this forum know what a lakh is.
 
Greenhorn
Posts: 1
Hi Vinay,

Why can't you have a PL/SQL procedure block that does this processing? => Option 1
(The database is intelligent enough to handle the performance; then again, it depends on how smart your PL/SQL block is written.)
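To illustrate Option 1 from the Java side: the application makes a single call and the database does all the row-by-row work. The procedure name MIGRATE_RECORDS and its parameter are purely hypothetical:

[code]
import java.sql.*;

public class ProcedureMigration {
    public static void main(String[] args) throws SQLException {
        // Hypothetical URL and procedure; the real PL/SQL block would do the looping.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/mydb", "user", "password");
             CallableStatement cs = con.prepareCall("{call MIGRATE_RECORDS(?)}")) {
            cs.setInt(1, 1000);   // e.g. a batch-size parameter for the procedure
            cs.execute();         // one network call; all processing stays in the database
        }
    }
}
[/code]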

Always remember that as a developer you should not interact with your DB directly => use an ORM (for instance Hibernate/iBATIS/..., because they take care of performance to some extent) to interact with your DB. => Option 2

One thing I would like to know: why do you need access to 400,000 DB records?

Finally, performance is a guideline, not a target.
 
Bartender
Posts: 10336


Why can't you have a procedure block that does this processing? => Option 1
(The database is intelligent enough to handle the performance; then again, it depends on how smart your PL/SQL block is written.)


True, though the downside of this approach is that you push work into the database layer, which can be expensive to distribute, instead of the application layer, which can be cheap to distribute.


Always remember that as a developer you should not interact with your DB directly => use an ORM (for instance Hibernate/iBATIS/..., because they take care of performance to some extent) to interact with your DB. => Option 2


Using an ORM for such large bulk operations is a bad idea. ORMs are not suitable for bulk data manipulation.
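By contrast, plain JDBC batching handles bulk writes efficiently: statements are queued locally and flushed in groups, so 400,000 inserts cost a few hundred network round trips instead of 400,000. A minimal sketch, with the table, columns, and VO type as placeholders:

[code]
import java.sql.*;
import java.util.List;

public class BulkInsert {

    static class RecordVO {
        final long id;
        final String payload;
        RecordVO(long id, String payload) { this.id = id; this.payload = payload; }
    }

    // Inserts one chunk of records using JDBC batching; schema is hypothetical.
    static void insertChunk(Connection con, List<RecordVO> chunk) throws SQLException {
        con.setAutoCommit(false); // commit per batch rather than per row
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO target (id, payload) VALUES (?, ?)")) {
            int queued = 0;
            for (RecordVO r : chunk) {
                ps.setLong(1, r.id);
                ps.setString(2, r.payload);
                ps.addBatch();                  // queued locally; no network call yet
                if (++queued % 1000 == 0) {
                    ps.executeBatch();          // 1,000 inserts in one round trip
                    con.commit();
                }
            }
            ps.executeBatch();                  // flush the remainder
            con.commit();
        }
    }
}
[/code]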
 
Greenhorn
Posts: 1
Processing n records at a time and then purging them to the DB is advised, since holding so much data in memory may eat up all the available memory and lead to an application/server crash. One more thing: whenever this type of processing is performed, make sure it happens during off-peak load.
 
Ranch Hand
Posts: 300
Hi all,

It's a very old thread, but I'm just curious to know how I can increase performance with the 400,000 records. Suppose I am using joins to search the records and multiple users are requesting the same data. How can I implement a caching mechanism that will be efficient? I am not using an ORM.

Regards
Jatan
 
Bartender
Posts: 10780

jatan bhavsar wrote: It's a very old thread, but I'm just curious to know how I can increase performance with the 400,000 records. Suppose I am using joins to search the records and multiple users are requesting the same data. How can I implement a caching mechanism that will be efficient? I am not using an ORM.


Simple rule: almost invariably, do all the complex stuff in the database, not in Java. That is exactly what databases were designed for.

I'm afraid this is a Java forum, not a database one; but just off the top of my head: if you find you're having to do tons of joins or inefficient searches, it's usually an indication that:
(a) Your design needs an overhaul.
(b) (Ugh) Your database needs an overhaul.
(c) (Easiest, but possibly a 'band-aid') Your db needs some new - or different - indexes.
If your db has an EXPLAIN PLAN statement, it may help you analyse what's going on and make that decision (see the sketch after this post); but I'd definitely tackle point (a) first.
Do you really need all these joins?

Winston
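To illustrate the EXPLAIN PLAN suggestion from Java: on Oracle, for instance, a plan can be generated and printed through plain JDBC. The joined tables in the query are invented for illustration; other databases have their own equivalents (e.g. a plain EXPLAIN statement):

[code]
import java.sql.*;

public class ShowPlan {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/mydb", "user", "password");
             Statement st = con.createStatement()) {

            // Ask the optimiser how it would execute the (hypothetical) query.
            st.execute("EXPLAIN PLAN FOR SELECT * FROM orders o "
                     + "JOIN customers c ON c.id = o.customer_id");

            // Read the formatted plan back; missing indexes show up as full table scans.
            try (ResultSet rs = st.executeQuery(
                    "SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
[/code]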
 
Bartender
Posts: 2407

jatan bhavsar wrote: Hi all,

It's a very old thread, but I'm just curious to know how I can increase performance with the 400,000 records. Suppose I am using joins to search the records and multiple users are requesting the same data. How can I implement a caching mechanism that will be efficient? I am not using an ORM.

Regards
Jatan


Winston's covered the key points, but it sounds like you need to re-think your query strategy.

First, build and test your SQL in the database, so you can explore issues like performance without all the overhead of Java-DB interfaces etc. Make your SQL as efficient as you can within the database - look at all those joins and see if they are really necessary or if you could tweak your table structures to make them easier e.g. denormalise commonly used columns from parent tables into the children (but remember you will need to maintain these denormalised values when the parent record changes).

If you have lots of similar queries, consider defining a database view to encapsulate the common joins etc.; then your individual queries against the view will be simpler, but you still need to make sure the underlying view SQL is as efficient as possible. Also, make sure you do NOT do lots of DML across the network: fetching large volumes of data and then writing it back again is very inefficient. Instead, do big INSERTs/UPDATEs as single SQL statements within the database where possible (see the sketch below), or use stored procedures if necessary.
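A sketch of that "no DML across the network" point: one set-based statement executed inside the database replaces a fetch-modify-write loop over thousands of rows. The table, column, and cut-off date are made up:

[code]
import java.sql.*;

public class SetBasedUpdate {
    // One round trip updates every matching row entirely server-side.
    static int archiveOldOrders(Connection con) throws SQLException {
        String sql = "UPDATE orders SET status = 'ARCHIVED' WHERE created < ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setDate(1, Date.valueOf("2011-01-01")); // hypothetical cut-off
            return ps.executeUpdate();                 // rows changed, no data fetched
        }
    }
}
[/code]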

Make sure you use all the available database tools to make your queries as efficient as possible - EXPLAIN PLAN, use of indexes, denormalisation, partitioning, etc. Partitioning might make a big difference if your users tend to query against particular logical sets of data e.g. for a particular month, as you can partition your table on the relevant column so that each month's data (for example) is physically separate, so any queries for e.g. January only have to search through January's partition instead of the whole table. This will depend on your RDBMS as these features are usually platform-specific.

If you don't know how to do all this stuff, get your DBA to help. If they don't know how to do this either, you shouldn't be running an application with millions of rows in the first place.

Now you have some SQL queries and you want to run them from your Java application. Again, make sure you do everything you can to make this as efficient as possible. Use PreparedStatement with bind variables to make your SQL both secure and re-usable (the SQL engine can re-use parsed SQL with bind variables), and use batches etc. if appropriate. Make sure you apply as much filtering as possible within the database before you fetch any records back. Reading data out of the database across the network is slow, so only read the data you really want - don't fetch everything and try to filter the data in Java.
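A small sketch of those last points: bind variables keep the SQL injection-safe and let the engine re-use the parsed statement, and the WHERE clause pushes the filtering into the database so only the wanted rows cross the network. Table and column names are illustrative:

[code]
import java.sql.*;

public class FilteredQuery {
    static void printRecentOrders(Connection con, String region) throws SQLException {
        // Filter in the database; only matching rows travel over the network.
        String sql = "SELECT id, total FROM orders WHERE region = ? AND created >= ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, region);                    // bind variable, not string concatenation
            ps.setDate(2, Date.valueOf("2011-01-01"));  // hypothetical cut-off
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + " " + rs.getBigDecimal("total"));
                }
            }
        }
    }
}
[/code]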

Caching query results is a problem - you'll need some proper Java experts to advise on that (i.e. not me!) - but remember that the DB has its own caching mechanisms as well. Again, talk to your DBA to make sure you are using these effectively.

Use a profiling tool to find out where all the time is being spent. No point trying to speed up your SQL queries if most of the time is being wasted on network traffic, for example.

If your queries are still too slow, consider separating the "read" data from the "write" data. Your data model may be optimised for writing transaction data into a particular set of tables, but these may not be the best structure for querying large volumes of data later on. You could look at building snapshots (in Oracle these are "materialised views") to hold commonly queried data in a more suitable structure, then query against the snapshots instead of the data tables, and refresh these snapshots e.g. once a day from your operational (transaction) data store. This approach is similar to the basic principle behind data warehouses/data marts (on a much larger scale), but you can take the same approach on a smaller scale if it will help your application. This will also depend on your query requirements etc.
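For instance, an Oracle materialised view along these lines would pre-join and pre-aggregate the data once, so reporting queries hit the snapshot instead of the transaction tables. The schema is invented for illustration:

[code]
import java.sql.*;

public class SnapshotSetup {
    static void createOrderSummary(Connection con) throws SQLException {
        try (Statement st = con.createStatement()) {
            // Oracle materialised view, refreshed on demand (e.g. nightly); schema hypothetical.
            st.execute(
                "CREATE MATERIALIZED VIEW order_summary " +
                "BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND AS " +
                "SELECT c.region, TRUNC(o.created, 'MM') AS month, SUM(o.total) AS total " +
                "FROM orders o JOIN customers c ON c.id = o.customer_id " +
                "GROUP BY c.region, TRUNC(o.created, 'MM')");
        }
    }
}
[/code]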

Finally, maybe think about whether you might be able to use some of the performance features of an ORM on your application.





 