Interview question

 
Greenhorn
Posts: 1
I have been asked this question in a couple of interviews: "A Java program reads records from a database and processes them. It takes about 6-7 hours to complete. How can you improve the performance and cut the time down to 10-15 minutes?" Any help would be much appreciated.
 
Ranch Hand
Posts: 287
Perhaps use multiple threads: one thread reads the records from the database while another thread processes them, something like producer-consumer (sketched below).

But let's wait for the "expert" answers.
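A minimal sketch of that producer-consumer split. Record and processRecord() are placeholders for the real row type and per-record work, not anything from the original program:

import java.util.concurrent.*;

public class ProducerConsumerSketch {
    record Record(long id, String payload) {}             // placeholder row type

    static void processRecord(Record r) { /* the expensive per-record work */ }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Record> queue = new ArrayBlockingQueue<>(10_000);
        Record poison = new Record(-1, null);              // shutdown marker
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Consumers: drain the queue until they see the poison pill.
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    for (Record r = queue.take(); r != poison; r = queue.take()) {
                        processRecord(r);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer: in the real program this loop would iterate over a ResultSet.
        for (long id = 0; id < 1_000_000; id++) {
            queue.put(new Record(id, "row-" + id));
        }
        for (int i = 0; i < workers; i++) {
            queue.put(poison);                             // one pill per worker
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}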
 
Sheriff
Posts: 3837
I think this is a trick question, assuming you didn't get any additional information. You don't know the cause of the performance problem; you don't even know whether there is a performance problem at all (you only know that someone claims the process can be made 30 or 40 times faster). In the real world, you need to find out what the bottleneck is and then offer solutions, if there are any at all.

If multithreading really were the correct answer, their box would have to be equipped with some 30 or 40 cores, probably more, which is still quite an uncommon configuration. Moreover, the database or the network might not be able to serve the records fast enough to keep forty CPUs busy all the time.
 
Bartender
Posts: 4568
I think Martin's right. I'm no expert on performance (or interviews), but I suspect what a sensible interviewer would be looking for is an explanation of how you'd go about solving the problem, starting with profiling it to track down exactly where the bottleneck is.

After you've explained your approach, there wouldn't be any harm in following it with "here are some potential causes I wouldn't be surprised to find, and if they turn out to be true, here's what might help" (e.g. finding the same query made many times with the same parameters: cache the results). That might also help you out with a less well-informed interviewer who thinks there is a single correct answer.
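For that repeated-query case, a minimal sketch of caching results keyed by query parameters; the key format and loadFromDatabase() are made up for illustration:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class QueryResultCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    // Returns the cached result for a query key, hitting the database only on a miss.
    public Object lookup(String queryKey) {
        return cache.computeIfAbsent(queryKey, this::loadFromDatabase);
    }

    private Object loadFromDatabase(String queryKey) {
        // Placeholder: execute the real query here.
        return new Object();
    }
}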

 
Ranch Hand
Posts: 100
How about adding an index (or indexes) on the data you're reading? That could improve performance (a sketch follows below).

What's interesting about this question is that they give you the time frame they want to get the process down to. If they were after a general "how would you approach this problem" type of answer, I'm not sure they would bound it with numbers like they did.

I could be wrong though...

Chris
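For what it's worth, adding an index through JDBC might look like this; the table and column names are hypothetical, and the right columns are whatever the slow query filters or joins on:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class IndexSetup {
    // Hypothetical example: index the columns the read query filters on.
    static void addIndex(Connection connection) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("CREATE INDEX idx_orders_status ON orders (status, created_at)");
        }
    }
}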
 
Ranch Hand
Posts: 2187
This is a silly question and a silly post, in my opinion. It was probably posted by a recruiter looking for answers to an interview question they may want to ask.
 
Rancher
Posts: 4804

Chris Bicnal wrote: How about adding an index (or indexes) on the data you're reading? That could improve performance.

This is one approach that might work. Of course, there may already be proper indexes on the data, in which case adding more won't help.

I can make any program run in half the time (twice as fast). But the first step is to find out where it is slow, rather than making wild guesses.
 
Greenhorn
Posts: 23
JDBC performance:
One point many developers forget while using plain JDBC, or even Hibernate (which uses the JDBC API internally), is properly setting fetchSize on the JDBC Statement; it can improve throughput considerably (about 10x in my experience). Suppose the default fetch size is 20: then in a single network call the JDBC driver fetches only 20 records, and the next 20 records need another network call. What if I am reading millions of records in those 6-7 hours? Increasing the fetch size from 20 to 500 or even 1000 cuts the number of network round trips by a factor of 25 or 50. This does need more memory to hold those records, but that is usually an acceptable trade-off for throughput.
The point above is definitely worth trying. Another interesting point is fetching CLOB/BLOB columns in a ResultSet. Using CLOB/BLOB columns forces the JDBC driver (at least Oracle's, which is the one I know) to fetch only one record per network call, so fetching BLOB/CLOB data results in many more network calls. If the BLOB/CLOB is not actually used, remove it from the ResultSet and you will see performance improve.
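A minimal sketch of setting the fetch size; the query and table are placeholders, and note that some drivers (e.g. MySQL Connector/J) treat the value only as a hint or need extra settings for cursor-based fetching:

import java.sql.*;

public class FetchSizeSketch {
    // Streams a large result set with a bigger fetch size; query and table are made up.
    static void readAll(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("SELECT id, payload FROM big_table")) {
            ps.setFetchSize(1000);                  // rows per driver round trip (a hint)
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    process(rs.getLong(1), rs.getString(2));
                }
            }
        }
    }

    static void process(long id, String payload) { /* per-record work */ }
}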
 
Pat Farrell
Rancher
Posts: 4804

Rohan Dhapodkar wrote: properly setting fetchSize on the JDBC Statement can improve throughput considerably (about 10x)

This is an interesting idea, so I thought I'd check it out. A quick Google search shows that the setFetchSize() default value is implementation specific, and is different for MySQL, Oracle, SQL Server, etc.

It is also clear that one can't reliably make such broad generalizations: a large value may cause the DBMS to return so much data that your program has to garbage collect constantly.

I wrote a simple test program against a fairly large MySQL database that I'm working with at my day job. A snippet of the code looked like this:

[code snippet not preserved in the archive]

As you can see, it loops, picks a random number, and switches between a setFetchSize() of zero (use the default), 1000, or 10,000, each about a third of the time. I use a random selection, repeated, so that any caching logic the DBMS may have is tested fairly.
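The original snippet did not survive, but a reconstruction of the loop as described (hypothetical table and query, not Pat's actual code) might look like:

import java.sql.*;
import java.util.Random;

public class FetchSizeBenchmark {
    static final int[] FETCH_SIZES = {0, 1000, 10_000};   // 0 = use the driver default

    // Reconstruction of the described test loop; the query and table are made up.
    static void run(Connection conn, int trials) throws SQLException {
        Random rnd = new Random();
        for (int i = 0; i < trials; i++) {
            int fetchSize = FETCH_SIZES[rnd.nextInt(FETCH_SIZES.length)];
            long start = System.nanoTime();
            try (Statement stmt = conn.createStatement()) {
                stmt.setFetchSize(fetchSize);
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                    while (rs.next()) { /* touch each row */ }
                }
            }
            System.out.printf("fetchSize=%d: %d ms%n",
                    fetchSize, (System.nanoTime() - start) / 1_000_000);
        }
    }
}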

At least in my testing with MySQL on a fast quad-core MacBook Pro, the answers are not obvious.


 
Pat Farrell
Rancher
Posts: 4804
Oops, my earlier test had some bugs in the libraries I was using. I have fixed that and re-run the code. The results are clear: there is no meaningful difference.

I wrote a simple test program against a fairly large MySQL database that I'm working with at my day job. The results of testing with a setFetchSize() of 0, 1000, and 40000 were:

[results not preserved in the archive]

Again, I picked a random number and ran one of the three tests each time. I ran at least 28 trials, yet no matter which value I used, the runtimes were within 7 milliseconds of each other, with small standard deviations. These values are so close together as to be within the measurement error.

So my tests with MySQL show that there is no advantage, none at all, to setting the fetch size.
 
Marshal
Posts: 28295

Pat Farrell wrote: So my tests with MySQL show that there is no advantage, none at all, to setting the fetch size.

Yeah, there are a lot of "simple solutions" out there which people freely pass around to each other. Many of them are... um... of limited utility.

However, would it be fair to guess that your tests were done in an environment with MySQL running locally? And that the results might have been different if the server were running elsewhere on the network? I wouldn't bet much money on the latter, frankly, but that's just guesswork, which isn't good practice in the performance-improvement world.
 
Pat Farrell
Rancher
Posts: 4804

Paul Clapham wrote: However, would it be fair to guess that your tests were done in an environment with MySQL running locally? And that the results might have been different if the server were running elsewhere on the network?

Yes, and yes: there is some chance the results would be different if there were real latency between the client and the server.

My tests are good enough for me to say: (1) the proposed change is not always useful, and (2) I won't bother testing or looking at this further.

As always, when there is a real performance problem, I will profile and then see if this helps. But I'm not gonna hold my breath.
 
Bartender
Posts: 6663
Welcome to CodeRanch, Sri.

As the others have mentioned, the question is quite open ended. You can drill into it further by asking more questions, like:

1. How much time is spent on the database queries? (see the sketch below)
2. What is the network latency between the database and the application server?
3. What sort of processing is done on the data?
4. And so on.

That can lead to an interesting discussion.
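A minimal sketch of answering that first question, splitting wall-clock time between fetching rows and processing them; the query, table, and process() are placeholders:

import java.sql.*;

public class WhereDoesTheTimeGo {
    // Rough split of elapsed time between fetching and processing.
    static void measure(Connection conn) throws SQLException {
        long fetchNanos = 0, processNanos = 0;
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, payload FROM big_table")) {
            long t = System.nanoTime();
            while (rs.next()) {
                fetchNanos += System.nanoTime() - t;
                t = System.nanoTime();
                process(rs.getLong(1), rs.getString(2));  // column reads counted as processing
                processNanos += System.nanoTime() - t;
                t = System.nanoTime();
            }
            fetchNanos += System.nanoTime() - t;          // time spent on the final next()
        }
        System.out.printf("fetch: %d ms, process: %d ms%n",
                fetchNanos / 1_000_000, processNanos / 1_000_000);
    }

    static void process(long id, String payload) { /* the real work */ }
}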
 
Ranch Hand
Posts: 52
A couple of thoughts:

1. Use connection pooling (see the sketch after this list).
2. Define proper indexes on the table.
3. Read only the required columns (in most cases we don't need to fetch all the column values).
4. Use the producer-consumer model: one set of threads reading from the database, another set processing the records.
5. Check CPU utilisation while the Java program runs; if it is very low, increase the number of threads in the program. That should speed things up. If there is still no improvement after increasing the threads, take a heap dump of the application and check for a memory leak.
6. Using a profiler, check which method or part of the program takes the longest and address that first.
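A minimal connection-pooling sketch for point 1, assuming the HikariCP library; the URL and credentials are placeholders:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    static HikariDataSource createPool() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://db-host:3306/mydb");  // placeholder URL
        config.setUsername("app");                            // placeholder credentials
        config.setPassword("secret");
        config.setMaximumPoolSize(10);    // roughly match the number of worker threads
        return new HikariDataSource(config);
    }
}

Each worker then borrows a connection with pool.getConnection() in a try-with-resources block, which returns it to the pool instead of closing it.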
 
Lakshman Arun
Ranch Hand
Posts: 52
7. In the producer-consumer model, we can also run the consumers in a distributed environment (for example, using JMS worker threads as the consumers). That way the application scales much better.
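A minimal sketch of such a JMS consumer. How the ConnectionFactory is obtained is provider specific (JNDI, ActiveMQ, etc.), and the queue name, message format, and process() are placeholders:

import javax.jms.*;

public class DistributedConsumer {
    // Consumes records that the reader process publishes to a queue.
    static void start(ConnectionFactory factory) throws JMSException {
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createQueue("records"));
        consumer.setMessageListener(message -> {
            try {
                process(((TextMessage) message).getText());  // assumes text messages
            } catch (JMSException e) {
                e.printStackTrace();                         // real code would handle/retry
            }
        });
        connection.start();                                  // begin message delivery
    }

    static void process(String record) { /* the real per-record work */ }
}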
 