
EhCache vs In Memory Database vs Guava Table

 
Mike Cheung
Ranch Hand
Posts: 113
Hi, suppose I need to do the following:
1) Represent data in a table-like manner (i.e., accessible via row and column values).
2) Retrieve all rows of a given column (similar to "select columnName from tableName").
3) Retrieve all columns of a given row (similar to "select * from tableName where rowName = 'someName'").
4) Retrieve a specific cell (similar to "select columnName from tableName where rowName = 'someName'").
5) Store and update values for 2) - 4).
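To make requirements 1) to 5) concrete, here is a minimal sketch using only java.util. The class name SimpleTable is hypothetical; Guava's Table interface (row(), column(), get(), put()) covers the same ground with more features. Note that in this layout the column query is a full scan over all rows.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: a row/column table built from nested HashMaps.
class SimpleTable<R, C, V> {
    // row key -> (column key -> cell value)
    private final Map<R, Map<C, V>> rows = new HashMap<>();

    // 5) store or update one cell
    public void put(R row, C column, V value) {
        rows.computeIfAbsent(row, r -> new HashMap<>()).put(column, value);
    }

    // 4) one cell, like: select columnName from tableName where rowName = 'someName'
    public V get(R row, C column) {
        Map<C, V> cells = rows.get(row);
        return cells == null ? null : cells.get(column);
    }

    // 3) all columns of one row, like: select * from tableName where rowName = 'someName'
    public Map<C, V> row(R row) {
        Map<C, V> cells = rows.get(row);
        if (cells == null) {
            return Collections.emptyMap();
        }
        return Collections.unmodifiableMap(cells);
    }

    // 2) one column across all rows, like: select columnName from tableName
    // With row-major nesting this is a full scan over every row.
    public Map<R, V> column(C column) {
        Map<R, V> result = new HashMap<>();
        for (Map.Entry<R, Map<C, V>> e : rows.entrySet()) {
            V value = e.getValue().get(column);
            if (value != null) {
                result.put(e.getKey(), value);
            }
        }
        return result;
    }
}
```

If column queries are as frequent as row queries, a second map keyed column-first (kept in sync on every put) would trade memory for faster column reads.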

I want this to be fast, so I'm looking at in-memory options. So far I can see the following:
1) Using name-value-pair (NVP) caching libraries such as EhCache. From the following thread, it appears this can be done by using an object that represents the row and column as the key.
http://stackoverflow.com/questions/5908619/ehcache-key-type
2) Using the Guava Table.
3) Using an in-memory database such as H2.
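Option 1) hinges on a composite key. Below is a sketch of that idea, assuming Java 16+ records; CellKey and CompositeKeyCache are hypothetical names, and a ConcurrentHashMap stands in for the cache, so this is not the EhCache API. Making the key Serializable is a precaution in case a cache tier ever needs to serialize keys.

```java
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical composite key: one object identifies a (row, column) cell.
// A record generates equals() and hashCode() from both fields, which is
// exactly what a hash-based cache needs from its key type.
record CellKey(String row, String column) implements Serializable {}

class CompositeKeyCache {
    // Stand-in for a real cache; NOT the EhCache API.
    private final Map<CellKey, Object> cache = new ConcurrentHashMap<>();

    void put(String row, String column, Object value) {
        cache.put(new CellKey(row, column), value);
    }

    // Single-cell lookups (requirement 4) are a direct hash lookup.
    Object get(String row, String column) {
        return cache.get(new CellKey(row, column));
    }

    // Whole-row queries (requirement 3) have no index with a flat key:
    // every entry is scanned, which is the main drawback of this layout.
    Map<String, Object> row(String row) {
        return cache.entrySet().stream()
                .filter(e -> e.getKey().row().equals(row))
                .collect(Collectors.toMap(e -> e.getKey().column(), Map.Entry::getValue));
    }
}
```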

Has anyone done something similar? Could you share your experience and thoughts on speed, memory efficiency, and anything else? Thanks.
 
Jayesh A Lalwani
Rancher
Posts: 2762
The main questions are:
a) How fast is fast enough? And how fast is too fast?
b) What kind of operations do you want to perform? Are you going to do AND operations? Will you have multiple tables? Will you need to join between multiple tables?
c) How big is this data going to become? Is it even going to fit in the memory of one server?

All of these solutions trade off speed against simplicity, and which one fits depends on your exact needs. The simplest solution with the best performance is a HashMap. But you are limited by the size of memory, and implementing AND operations and joins across tables yourself will be a nightmare for the average programmer. An in-memory database that understands SQL takes all of those complications off your hands, but you add the overhead of parsing the SQL. A NoSQL database can grow as the data grows, but some of them limit the kinds of operations you can do.
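To show what "implementing AND operations yourself" on a HashMap involves, here is a minimal sketch: rows keyed by id plus one hand-maintained index per column, so a query like "WHERE city = 'Oslo' AND status = 'active'" becomes a set intersection. IndexedRows is a hypothetical name, and updates and deletes (which must also clean up stale index entries) are deliberately left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical store: HashMap of rows plus per-column value indexes.
class IndexedRows {
    private final Map<Integer, Map<String, String>> rowsById = new HashMap<>();
    // column -> value -> ids of rows holding that value
    private final Map<String, Map<String, Set<Integer>>> indexes = new HashMap<>();

    // Insert-only: every insert must update every index by hand.
    void insert(int id, Map<String, String> row) {
        rowsById.put(id, row);
        for (Map.Entry<String, String> cell : row.entrySet()) {
            indexes.computeIfAbsent(cell.getKey(), k -> new HashMap<>())
                   .computeIfAbsent(cell.getValue(), v -> new HashSet<>())
                   .add(id);
        }
    }

    // Ids of rows matching ALL column = value conditions (empty input -> empty result).
    Set<Integer> findAll(Map<String, String> conditions) {
        Set<Integer> result = null;
        for (Map.Entry<String, String> c : conditions.entrySet()) {
            Set<Integer> ids = indexes
                    .getOrDefault(c.getKey(), Map.of())
                    .getOrDefault(c.getValue(), Set.of());
            if (result == null) {
                result = new HashSet<>(ids);   // copy so the index is not mutated
            } else {
                result.retainAll(ids);         // AND = set intersection
            }
            if (result.isEmpty()) {
                break;                         // nothing can match any more
            }
        }
        if (result == null) {
            return Set.of();
        }
        return result;
    }

    Map<String, String> get(int id) {
        return rowsById.get(id);
    }
}
```

Joins, OR conditions, and range queries each need more code of this kind, which is the work an SQL engine already does for you.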

When it comes to performance, your solution should meet the anticipated need. Premature optimization is the root of all evil.
 
Mike Cheung
Ranch Hand
Posts: 113
Hi Jayesh, thanks for the reply. Very good questions indeed!
a) From my perspective, the faster the better. If you want a finger-in-the-air figure, somewhere in the neighborhood of 1-2 million CRUD operations per second. Shoot me down if this is not achievable, but I really don't know what the fastest known result is. I tried digging around sites like http://www.jpab.org/ but I can't see numbers there giving actual operations per unit of time; they give more of a normalized score. As for whether it can get too fast, I'm not sure. I hope not.
b) Yes, there will be multiple tables, and yes, there is a need to do joins and AND operations. If this incurs a large performance penalty, I'm interested to see whether we can do it some other way, for example by loading only the tables involved in a join into an embedded database just to do that join.
c) The data will be small enough to fit entirely on a server with as little as 1-2 GB of RAM, along with the OS, the rest of the binaries, etc. There is also a desire for this to be scalable, so if the data grows beyond the size of one server, it would be good to be able to spread it across two physical nodes.
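For point b), a join between two in-memory tables does not necessarily need a database. Here is a sketch of a plain hash join, assuming Java 16+ records; the row types and field names are hypothetical, and the join key on the build side is assumed to be unique.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row types for two "tables" and the joined result.
record Customer(int id, String name) {}
record Order(int orderId, int customerId, double amount) {}
record CustomerOrder(String customerName, int orderId, double amount) {}

class HashJoin {
    // Similar in spirit to:
    //   SELECT c.name, o.orderId, o.amount
    //   FROM orders o JOIN customers c ON o.customerId = c.id
    static List<CustomerOrder> join(List<Customer> customers, List<Order> orders) {
        // Build phase: hash one table on the join key (assumed unique here).
        Map<Integer, Customer> byId = new HashMap<>();
        for (Customer c : customers) {
            byId.put(c.id(), c);
        }
        // Probe phase: one hash lookup per order instead of a nested loop.
        List<CustomerOrder> result = new ArrayList<>();
        for (Order o : orders) {
            Customer c = byId.get(o.customerId());
            if (c != null) {  // inner join: orders without a customer are dropped
                result.add(new CustomerOrder(c.name(), o.orderId(), o.amount()));
            }
        }
        return result;
    }
}
```

Hashing the smaller table keeps the build-side memory down; the result preserves the order of the probe table.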
 
Mike Cheung
Ranch Hand
Posts: 113
I just want to put the 1-2 million CRUD operations per second figure in context. This obviously depends on hardware, type of operations, etc. For example, Oracle's TimesTen database achieves 0.8-7.5 million reads per second per chip and 60-273 thousand updates per second per chip, according to the following article.
https://blogs.oracle.com/BestPerf/entry/20121001_sparc_t4_2_timesten

Looking at the following benchmark by NitroCache, in-memory caches can achieve higher performance. For example, NitroLRU achieved 400 million fetches in 61.05 seconds (roughly 6.552 million reads per second) and 20 million puts in 20.41 seconds (roughly 979,911 puts per second). Judging by these numbers, simple caches are a lot faster than in-memory databases, bearing in mind that the Oracle tests were run on SPARC while the NitroCache benchmarks were run on an Intel Core i5.
http://sourceforge.net/p/nitrocache/blog/2012/05/performance-benchmark-nitrocache--ehcache--infinispan--jcs--cach4j/
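The per-second figures above are simply operations divided by elapsed time. This small sketch reproduces that arithmetic from the quoted NitroLRU numbers; opsPerSecond is a hypothetical helper, and the result says nothing about how the two benchmarks compare across different hardware.

```java
// Throughput = operations / elapsed seconds.
class Throughput {
    static double opsPerSecond(long operations, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return operations / seconds;
    }

    public static void main(String[] args) {
        // NitroLRU figures quoted from the NitroCache benchmark post
        System.out.printf("fetches per second: %.0f%n", opsPerSecond(400_000_000L, 61.05));
        System.out.printf("puts per second:    %.0f%n", opsPerSecond(20_000_000L, 20.41));
    }
}
```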

The performance advantage of simple caches over in-memory databases is presumably due to less bloat and simpler operations. For example, I'd imagine that doing simple reads from a cache in a loop is faster than running SELECT * FROM TABLE.

I know I don't need all the features of a full database, such as views, ACLs, replication, referential integrity, etc., so rather than going straight for an in-memory database, I'd like to know whether anyone has done something similar using caches; hence this thread.

Anyway, hope this helps give a bit more context on what I'm thinking through.
 
Jayesh A Lalwani
Rancher
Posts: 2762
Sounds like you need a NoSQL database. NoSQL databases are meant for applications that don't need transactions and referential integrity but do need low latency, high throughput, and scalability. The catch is that most NoSQL databases are not great at joins; even those that support them lose throughput when doing them. Most applications that need that kind of throughput avoid joins by denormalizing the data and putting indexes on the columns they search by. You will be trading space and simplicity for speed.
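Denormalization plus an index on a searched column can be sketched in a few lines of Java. In this sketch, which assumes Java 16+ records, customer fields are copied into each order row at write time so reads never join, and a second map indexes the city column. OrderView and DenormalizedStore are hypothetical names, and each orderId is assumed to be written only once (updates are not handled).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Denormalized row: customer data is duplicated into every order.
record OrderView(int orderId, String customerName, String customerCity, double amount) {}

class DenormalizedStore {
    private final Map<Integer, OrderView> byOrderId = new HashMap<>();
    // Secondary index on a column we expect to search by.
    private final Map<String, List<OrderView>> byCity = new HashMap<>();

    // Assumes each orderId is put once; a rewrite would leave a stale index entry.
    void put(OrderView v) {
        byOrderId.put(v.orderId(), v);
        byCity.computeIfAbsent(v.customerCity(), k -> new ArrayList<>()).add(v);
        // The price: extra memory for copies and the index, and if a customer's
        // city changes, every copy and index entry has to be rewritten.
    }

    OrderView get(int orderId) {
        return byOrderId.get(orderId);
    }

    // Read path: one hash lookup, no join.
    List<OrderView> findByCity(String city) {
        return Collections.unmodifiableList(byCity.getOrDefault(city, List.of()));
    }
}
```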
 
Winston Gutkowski
Bartender
Posts: 10573
Mike Cheung wrote:a) From my perspective the faster it is the better.

Really? Even if it creates a pile of spaghetti code that is difficult to understand, and even worse to maintain?

Speed is in the eye of the beholder; and in trying to create a system where speed (or throughput) is the only benchmark you're interested in, you may well find that you've shot yourself in the foot when, later on, you try to upgrade it, and find that it takes 10 times LONGER (and costs your company 10 times more) to modify and test than it would have done if you'd chosen a simpler, less "tuned" approach.

There's an old adage in computing: Good, Fast, Cheap - pick any two.

HIH

Winston
 