Win a copy of Murach's Python Programming this week in the Jython/Python forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

How to index java objects in memory  RSS feed

 
Aabha Varma
Greenhorn
Posts: 22
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,

I have to work with large junks of data.
Assume i have a basket full of apples, which has its own specific colour,specific weight,specific place where it is imported.
I will taking all these details as a list to the client, by creating a Value object of apple.

Now the issue is if i want to do a query on size or place of import or other characteristics, i need to go through the entire list in a for loop and get the list. This is taking hell lot of time.
Is there any way of indexing the objects so that i can say, give me size x apples, it will return me all the size matching x.(Basically without going in the for loop). I dont want to store individual arraylist based on size, weight etc.. because it will eat up a lot of memory......

Something like the one implemented in oracle queries where clause.
So can anybody help me through.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
There is a Sourceforge project http://josql.sourceforge.net/
that is supposed to let you apply SQL-like operations to collections of Java Objects.
Bill
 
steve souza
Ranch Hand
Posts: 862
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
How many rows of apples are you talking about?
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Aabha:
This is taking hell lot of time.


Are you sure? How much, with how many objects?


Is there any way of indexing the objects so that i can say, give me size x apples, it will return me all the size matching x.(Basically without going in the for loop). I dont want to store individual arraylist based on size, weight etc.. because it will eat up a lot of memory......


That is what an index does: bargaining speed for memory. I'd probably use a HashMap if I needed the performance.


Something like the one implemented in oracle queries where clause.


If you don't have an index on the column, oracle have to do a linear search. If you apply an index on the column, it basically does exactly what you described above: using a lot of additional disk space to build an additinal collection that maps the values of the column to the rows in the table. So there really is no magic involved...
[ April 04, 2005: Message edited by: Ilja Preuss ]
 
Aabha Varma
Greenhorn
Posts: 22
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Using hashmaps doesnt give a feasible solution. Because if my object has 3 different attributes, depending on the value i should create and put them in 3 different hashmap, which is unnecessarily loading the memory thrice with the same objects.

I had just look at Java Spaces. But its effective only for distributed system. There also it duplicates theobject at its server and local.
Is there any similar interface like java spaces....which can be used..
or can you suggest some idea regarding the structure of the object, so that if i give a size and color as parameter, i will get the result set, but without using two loops
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Aabha:
Using hashmaps doesnt give a feasible solution. Because if my object has 3 different attributes, depending on the value i should create and put them in 3 different hashmap, which is unnecessarily loading the memory thrice with the same objects.


No, the objects don't need to be more than once in memory. You can put the same object into as many HashMaps as you like. Again, it's exactly identical to how databases would do it - an index for every column.

can you suggest some idea regarding the structure of the object, so that if i give a size and color as parameter, i will get the result set, but without using two loops


You still didn't tell us how many objects you need to manage. It's hard to give good advice without knowing more details.

BTW, you need to adjust your display name to our naming policy, which requires that you use both a first and a last name. Thanks!
 
Aabha Varma
Greenhorn
Posts: 22
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi

An object is having 10 different attributes.
And they are loaded into the memory in junks.
There can be 1000 such objects added in one go.

These objects that are already added will be of no use after a particular condition. But that condition is again a random thing. sometimes it may last for a long time or it may notbe used in few seconds.

The whole program may run for 10 -20 minutes.
And the number of search may be 100-150 times in a minute or less than a minute. These search may be for the differnt combination of object parameter.

I havent tested the stuff so am not able to give you the exact figures.
These figures i have mentioned is very much possible.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Aabha:
An object is having 10 different attributes.
And they are loaded into the memory in junks.
There can be 1000 such objects added in one go.


A linear search in 1000 objects shouldn't take much longer than a few milliseconds. If it does, you should try to find to find out why first.

How long *does* it take? Can you show us some code?
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
And please adjust your display name, as asked by me above. Thanks.
 
Aabha Varma
Greenhorn
Posts: 22
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Ilja Preuss:


A linear search in 1000 objects shouldn't take much longer than a few milliseconds. If it does, you should try to find to find out why first.

How long *does* it take? Can you show us some code?


I havent done any coding on this. Its in desing phase only.

A few milliseconds(as quoted by you) is to search 1000 objects with a single attribute right? so if there are 10 such attributes.. 10 * few milliseconds..
if there are 100 such request to search it comes to 10*100*few milliseconds..

i.e., it might take a few seconds right?
i am supposed to do 1000 such request in 1 second.
So i am thinking of objects other than arraylist and hashmap.

lemme see if i can write some code on it, whcih i think it takes weeks.

:roll:
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I havent done any coding on this. Its in desing phase only.

I would say that you should explore the simplest approach using existing collections with realistic simulated loads before writing anything complex.
Premature Optimization is the root of all evil.
Bill
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Aabha Varma:
I havent done any coding on this. Its in desing phase only.


Trying to performance optimize a system without having any real data isn't going to be very effective. You really need to at least code a prototype to base your optimization on.

As an aside, the whole concept of *phases* is quite suboptimal, a highly iterative approach is much more effective.


A few milliseconds(as quoted by you) is to search 1000 objects with a single attribute right? so if there are 10 such attributes.. 10 * few milliseconds..
if there are 100 such request to search it comes to 10*100*few milliseconds..

i.e., it might take a few seconds right?


Might, or might not. It really was just an expression of my gut feel that you probably shouldn't get a performance problem at all. But that really depends on so many issues that to be sure, you need to try it.


i am supposed to do 1000 such request in 1 second.
So i am thinking of objects other than arraylist and hashmap.


HashMap is *really* fast - O(1). That is, if you have a reasonable hash code implementation, the performance of a search doesn't depend on the number of objects you search in.

lemme see if i can write some code on it, whcih i think it takes weeks.


Why should it take *weeks* to implement a simple spike on linear searches or using a HashMap...?
 
Peter Hu
Ranch Hand
Posts: 33
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well, Aabha, there are some factors you haven't mentioned here. Where is the data stored, in disk files, in a database or in another system? How frequently is the data updated? From the context of your posting, it seems like there is a system that notifies the availability of new sets of data in an unpredictable manner.

But regardless of how you load the data. Trying to index these objects in Java collections objects is not possible since they don't allow duplicate keys.

My opinion is to use a good Database system runs on a machine with lots of memory so the whole table of records can be loaded into memory cache.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Paul HG:
But regardless of how you load the data. Trying to index these objects in Java collections objects is not possible since they don't allow duplicate keys.


You don't need duplicate keys to do that, all you need is to associate a collection with the key.
 
Peter Hu
Ranch Hand
Posts: 33
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Ilja Preuss:


You don't need duplicate keys to do that, all you need is to associate a collection with the key.


By duplicate keys, I meant there are same key values for different objects, rather than make copies of the keys.

For example, we consider apples with attributes of color, weight and imported place:
Apple 1: RED, 1/2P, ENGLAND
Apple 2: RED, 1/2P, JAPAN
Apple 3, GREEN, 1/3P, THAILAND
Apple 4, GREEN, 1/3P, USA
Apple 5: GREEN, 1/4P, JAPAN

How do you index these apples by the 3 attributes? If you index on color, you can have only 2 objects stored in the collection since there are only 2 distinct keys RED and GREEN. Unless you prepare to write your own collection like B+ tree to allow duplicate keys with index on those attributes, or use like ArrayList to loop to count the number of objects, all the existing Map classes only allow unique key values. It's not possible to use the existing collection classes to index on the attributes.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Paul HG:
all the existing Map classes only allow unique key values.


Yes, but they also allow collections as the value for a single key.


It's not possible to use the existing collection classes to index on the attributes.


Then I probably should get a ticket to Milliways, because I've already done it...
 
Peter Hu
Ranch Hand
Posts: 33
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You can create one index for every single attribute. Yet if it needs to search on multiple attributes, you can get the collection for the first matched attribute, you still have to loop for match on the rest of the attributes. If the search is for a range of values, like weight between 1/4 and 1/3 pounds, you need one additional loop. If it searches for imported place from "L" to "S", extra logic needs to put in to find the places that meet the criteria.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Paul HG:
You can create one index for every single attribute.


You can also create an index for a combination of attributes, quite similar to creating an index over more than one column in an RDBMS - if you need that.

Yet if it needs to search on multiple attributes, you can get the collection for the first matched attribute, you still have to loop for match on the rest of the attributes.


Not true. You could also, for example, make use of the retainAll methods for two sets for different attributes.

In general, there is nothing magic about RDBMS' - they basically work on collections, too. And getting them to perform well on specific queries isn't always simple, either (for example, I once had to explicitely disallow oracle to use a specific index for a specific query, to improve performance by orders of magnitude).

I agree, that there is a point where the complexity of queries calls for the use of a database, but it's far from clear to me that this is true for the OP.


If the search is for a range of values, like weight between 1/4 and 1/3 pounds, you need one additional loop.


What do you think how a database does this? Why couldn't you do this with your objects?

If it searches for imported place from "L" to "S", extra logic needs to put in to find the places that meet the criteria.


Yes, *if* you need it. So?
 
Peter Hu
Ranch Hand
Posts: 33
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Tanks for the response, Ilja Preuss.

As you can see, when all the situations are handled using Java collection classes, how much time do think Aabha Varma needs to spend to design and implement the functions in Java, as well as to test the code? Please keep in mind, Aabha Varma may not be as experienced as you on this subject.
 
Kevin Peterson
Ranch Hand
Posts: 37
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
on a quick side note, after quickly viewing the thread, couldnt implementing the iterator interface help because it provides you with an add() method so you can add your data to an object array and randomly access it by adding a find() method to your code and sort through it using hasNext() and next(). The only problem would be that the data would have to be sorted by different attributes when loaded into the array for easy retrival. This would just be abstracting your data at a higher level.

- if this approach is totally wrong please correct me -

-Kevin
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Paul HG:
As you can see, when all the situations are handled using Java collection classes, how much time do think Aabha Varma needs to spend to design and implement the functions in Java, as well as to test the code?


But we don't know how many of those situations he really needs to handle. And, with all due respect, for simple queries a HashMap will be both simpler to implement and test, and probably also faster than using a database. Until I know that he will actually have a need for the features of a database, I will continue to suggest a simpler solution.

Please keep in mind, Aabha Varma may not be as experienced as you on this subject.


One more reason to suggest a *simple* solution, which I think using a HashMap is. Not that I wouldn't suggest it to a more experienced person than me, which for all I know Aabha might be as well.
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!