
Problem accessing and manipulating huge DB data

 
Harish V Gupta
Ranch Hand
Posts: 37
Hello,
For one of the requirements in our project, we need to access a DB table with a huge amount of data and then do some calculations based on the criteria passed to the stored proc. This kind of query takes a very long time and consumes a lot of CPU as well. We thought of some alternative approaches:
1. Scheduler (schedule such queries to run during off-peak hours)
2. JMS queue (queue up the requests so that many concurrent requests do not cause problems)

The scheduler has the drawback that it changes the user experience: the user would get the result for his criteria only on the day after querying for it. As far as JMS is concerned, we are not sure whether it would be of use in this kind of scenario.
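For the JMS option, what we roughly have in mind is something like the sketch below (the JNDI names and the runHeavyQuery() placeholder are purely illustrative, not our actual code): the request layer drops the criteria onto a queue, and a single consumer drains the queue, so the database only ever runs one of these heavy queries at a time.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class HeavyQueryQueue {

    // Request layer: instead of running the stored proc directly, put the criteria on a queue.
    public static void enqueue(String criteria) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // assumed JNDI name
        Queue queue = (Queue) ctx.lookup("jms/HeavyQueryQueue");                        // assumed JNDI name
        Connection con = cf.createConnection();
        try {
            Session session = con.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage(criteria));
        } finally {
            con.close();
        }
    }

    // Worker side: a single MessageListener consumes the queue, so the heavy queries hit the DB one at a time.
    public static void startSingleConsumer() throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/HeavyQueryQueue");
        Connection con = cf.createConnection();
        Session session = con.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(queue);
        consumer.setMessageListener(message -> {
            try {
                runHeavyQuery(((TextMessage) message).getText());
            } catch (Exception e) {
                e.printStackTrace(); // real code would log and possibly retry
            }
        });
        con.start();
    }

    private static void runHeavyQuery(String criteria) {
        // Placeholder for the JDBC call to the stored procedure and for
        // storing/notifying the result so the user can fetch it later.
    }
}

The obvious drawback is that callers no longer get the result synchronously; they would have to poll, or be notified, once their request has been processed.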

Could you please comment on this? Could you also suggest an alternative approach?

Thanks
 
Ulf Dittmer
Rancher
Posts: 42972
Which problem are you trying to solve, exactly? Neither approach you mentioned will do anything about processing time, so if that's the primary issue (which I understand is your main concern) then they won't help.
 
Karthik Shiraly
Bartender
Posts: 1210
Harish V Gupta wrote: This kind of query takes a very long time and consumes a lot of CPU as well.


For such problems, the first thing to investigate is why the query takes so much time on that particular database instance on that particular machine.
- It may be a naively written query which selects too many columns when only a few would suffice.
- Or it may be that the database configuration is not optimal.
- Or that the requisite indexes have not been created.
- Or that the database server hardware is not powerful enough, or is powerful but hampered by disk-full conditions or similar issues.

Every database comes with its own quirks, its own optimum hardware environment, and a bag of tricks to speed it up.
I would concentrate on investigating, cleaning up and optimizing the query itself first.
Trying to do anything in the application layer is likely to be a temporary bandage at best.
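For example, a quick way to check the index situation is to look at the query's execution plan. Below is a minimal JDBC sketch, assuming a PostgreSQL-style EXPLAIN and made-up connection details, table and column names; other databases have their own equivalents (Oracle's EXPLAIN PLAN, SQL Server's execution plans, and so on).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExplainSlowQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details, table and column names.
        try (Connection con = DriverManager.getConnection(
                 "jdbc:postgresql://dbhost:5432/appdb", "appuser", "secret");
             Statement st = con.createStatement();
             // EXPLAIN shows the plan the optimizer would use without
             // executing the full query (PostgreSQL-style syntax).
             ResultSet rs = st.executeQuery(
                 "EXPLAIN SELECT account_id, SUM(amount) "
                 + "FROM big_transactions "
                 + "WHERE trade_date >= DATE '2014-01-01' "
                 + "GROUP BY account_id")) {
            while (rs.next()) {
                // A sequential/full table scan over a huge table is the usual
                // sign that a supporting index on the filter columns is missing.
                System.out.println(rs.getString(1));
            }
        }
    }
}

If the plan shows a full scan over the huge table, creating an index on the filter columns (for example CREATE INDEX idx_trade_date ON big_transactions (trade_date), names again hypothetical) is usually the first thing to try.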
 
Harish V Gupta
Ranch Hand
Posts: 37
Hello Karthik,
Thanks for the quick reply.

The query has already been optimised as far as possible by the DB experts. The task at hand now is to find out when and how to execute it.
 
Harish V Gupta
Ranch Hand
Posts: 37
Hello Dittmer,
The problem now is not processing time, but when and how to process that time-consuming query. If there are a lot of concurrent requests and all of them are processed in parallel, we fear that the DB will crash. If, on the other hand, the requests are processed sequentially, we can avoid that problem.
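To illustrate what we mean by sequential processing on the application side, something like the following sketch (class and method names are only placeholders) is what we are considering: a single worker thread accepts the heavy-query requests and runs them strictly one after another, so the DB never sees more than one of them at a time.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Serializes calls to the expensive stored procedure so that at most
 * one of them runs against the database at any given time.
 * (Hypothetical names; runHeavyQuery() would wrap the real JDBC/stored-proc call.)
 */
public class HeavyQueryGate {

    // One worker thread => requests are processed strictly one after another.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public Future<Result> submit(Criteria criteria) {
        // Callers get a Future immediately; the actual DB work is queued.
        return worker.submit(() -> runHeavyQuery(criteria));
    }

    private Result runHeavyQuery(Criteria criteria) {
        // Placeholder for the real stored-procedure call via JDBC.
        return new Result();
    }

    public void shutdown() {
        worker.shutdown();
    }

    // Hypothetical request/response types.
    public static class Criteria { }
    public static class Result { }
}

The trade-off is the same as with a JMS queue: the caller only gets a Future back and has to wait its turn for the actual result.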
 
Ulf Dittmer
Rancher
Posts: 42972
I see. Not kicking off long-running processes on user demand seems a good approach. Whether running them in parallel is feasible depends on the hardware of your DB server and of your processing server (assuming it's separate from the DB server). DB servers generally have numerous CPUs and/or cores, and DBs are optimized to take advantage of them, so from the DB's perspective some degree of concurrency should be feasible.

Whether processing during normal business hours is possible depends on what else the server is handling during that time. If it's so CPU-intensive that other operations are impacted, you may want to run the tasks only during off-peak hours.

You might also think about running these queries ahead of time (assuming that parameters and such are limited and/or predictable), and keeping the results.
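And if at least some criteria do repeat, "keeping the results" can be as simple as a cache keyed by the query parameters. A minimal sketch with hypothetical names and no expiry or eviction (a real implementation would need both to avoid serving stale data):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Caches results of the expensive query, keyed by the criteria passed to it.
 * Only pays off if the same criteria are requested more than once.
 */
public class QueryResultCache {

    private final ConcurrentMap<String, Result> cache = new ConcurrentHashMap<>();

    public Result getOrCompute(String criteriaKey) {
        // computeIfAbsent runs the heavy query only on a cache miss;
        // subsequent requests with the same key get the stored result.
        return cache.computeIfAbsent(criteriaKey, key -> runHeavyQuery(key));
    }

    private Result runHeavyQuery(String criteriaKey) {
        // Placeholder for the real stored-procedure call.
        return new Result();
    }

    // Hypothetical result type.
    public static class Result { }
}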

Harish V Gupta wrote: Hello Dittmer,

Out of curiosity: why are you addressing Karthik by his first name, and me by my last name? First name is fine, but if you want to use the last name, then the polite way to do so is to prefix it with "Mr".
 
Karthik Shiraly
Bartender
Posts: 1210
A few questions to narrow down the possibilities:

- What percentage of your total queries are queries of this nature?
- Does this query mostly read data or mostly write data?
- Is your database a single instance, or a replicated set of instances?
- Does the same database currently execute both transactional and analytical workloads? This particular query sounds to me like an analytical workload (use cases like report generation, for example), but you know best.
- How much delay between query and result is acceptable from a user-experience point of view? Seconds, minutes, or hours?
- How important is freshness of data to this query? Is it OK to execute it on a stale snapshot, instead of on fresh, transactionally consistent data?
 
Harish V Gupta
Ranch Hand
Posts: 37
Thanks, Ulf, and please accept my sincere apologies.

We actually can't run the queries ahead of time as the input passed to the query is unpredictable.
 