• Post Reply Bookmark Topic Watch Topic
  • New Topic

Management of data object  RSS feed

 
Gabor Bator
Greenhorn
Posts: 6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello,

I am new here and I hope I opened this topic in the right forum. I have a problem in designing my system, and I hope that someone can help me here.

I am currently working on a system that builds and manages an ontology. It has concepts, each having attributes like name, real world name, constraints, etc. It would be useful to have methods as well (at least comparison and formatting methods). Therefore I decided to have a class for them.

My problem is that I have to put them into 'something'. A simple collection is not enough, as I will need indices to speed up operations.

The first idea was to create an object to store them in, and this is what I have done. It is based on an interface called Storage. I have designed another interface called Storeable, which can be added to a Storage and I have also modified the concept class to implement it.

Now the problem is, that some functionalities I need cannot be described by interfaces. For example, if an indexed attribute were changed in the concept, the storage class would have to update the index. That is maybe where abstract classes come in, but I am afraid it may upset the inheritence model. Furthermore, I have to write all index functions myself, and that becomes more difficult as the "key" may include more than one field.

Databases would provide an alternative, I guess; however, I feel insecure about how to map an object to sql tables, if I have to use them as objects. I have also considered to use them as simple rows in the database, but I think that is not a solution: the problem is inherently OO-like, why change it?

I would be grateful for any suggestions.

Gabor
 
Frank Carver
Sheriff
Posts: 6920
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Gabor Bator:
My problem is that I have to put them into 'something'. A simple collection is not enough, as I will need indices to speed up operations.


OK. As long as you are really sure about needing the index. Remember that premature optimization is rarely a good idea. Can you give any idea how many of these items you might need to store, if there might be duplicates, etc. ? You might not have thought of it this way, but the standard collections also include general-purpose indexed ("Map") collections, and they are often enough.


The first idea was to create an object to store them in, and this is what I have done. It is based on an interface called Storage. I have designed another interface called Storeable, which can be added to a Storage and I have also modified the concept class to implement it.


This sounds reasonable so far.


Now the problem is, that some functionalities I need cannot be described by interfaces. For example, if an indexed attribute were changed in the concept, the storage class would have to update the index. That is maybe where abstract classes come in, but I am afraid it may upset the inheritence model. Furthermore, I have to write all index functions myself, and that becomes more difficult as the "key" may include more than one field.


You have lost me a little here. I hope I got the gist of it.

At the start you seem to be indicating that when an object is updated its index value might change, which implies that the Storage object maintaining the index needs to know, in order to update its lookup table.

So, if you use something like a Map, then this seems easily accomplished by removing the old object (mapped from the original index key value), and re-adding the object mapped from the new index key value.

If you go with the Storage object managing a custom lookup table, this sounds like a perfect job for an Observable pattern. Introduce some sort of "StorableChangeEvent" and an accompanying "StorableChangeListener" interface. Implement the StorableChangeListener in your Storage class, and pass itself as a listener to each Storable object as it is added. Then if any Storable object changes its value in a way which would effect the index lookup, it can easily notify the container.

The second option neatly takes care of your issue with potentially complex or variable index key algorithms: when the container is notified that a Storable item has changed, it can then simply call its "getIndexKey" method (or whatever) and re-index as appropriate.

Do either of those sound reasaoble?


Databases would provide an alternative, I guess; however, I feel insecure about how to map an object to sql tables, if I have to use them as objects. I have also considered to use them as simple rows in the database, but I think that is not a solution: the problem is inherently OO-like, why change it?


I tend to agree. Databases are good for keeping data around for a long time, and making it available to other applications which may not be in Java, or may not even be thought of yet. However, all of this comes at the expense of extra complication in the software, and potentially large performance issues with transferring data to/from the database

As described, your problem seems pretty OO to me.
 
Peer Reynders
Bartender
Posts: 2968
6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Gabor Bator:
I am currently working on a system that builds and manages an ontology. It has concepts, each having attributes like name, real world name, constraints, etc. It would be useful to have methods as well (at least comparison and formatting methods). Therefore I decided to have a class for them.

My problem is that I have to put them into 'something'. A simple collection is not enough, as I will need indices to speed up operations.

How do you know that? This type of premature optimization usually has a negative influence on design. Have you actually written some sample code representative to the operations that your program will perform using the different collections available in the JDK (each of them have their strengths and weaknesses. Collections Tutorial) and profiled it, resulting in some hard numbers?

Now if you had said that you need index-like features to support multiple types of searches that is a bit different � but you can still use the standard collections. You could simply implement a container object that manages multiple collections � one collection is the primary collection for your indexed objects, while each supplementary collection supports one index-type search. The primary collection could be a HashSet that uses your object's unique "domain key" as represented by your object's hashCode() (and equals()) function. The supplementary index collections could be HashMaps, mapping on an "index specific key object" extracted from your indexed object.

Alternately you may not need supplementary index collections. You could simply define a Comparator class (implementing java.util.Comparator) for each index type search and use java.util.Collections.binarySearch. A Comparator can also be used with a TreeSet and TreeMap.

There are two issues you have to deal with. You cannot change the "indexed" properties while the object belongs to a Set or a Map. Always remove the object from the (affected) collection, modify the "indexed" properties, and then re-insert it. One easy way out is to make all you indexed objects immutable. That way you have to create a new instance with the modified properties, remove the old instance from the collection(s), and insert the new one.

The other problem is that your "indexed" properties may not be unique. In that case the Set/Map would not refer to one of your indexed objects but to a List of them.

Originally posted by Gabor Bator:
Databases would provide an alternative, I guess; however, I feel insecure about how to map an object to sql tables, if I have to use them as objects. I have also considered to use them as simple rows in the database, but I think that is not a solution: the problem is inherently OO-like, why change it?


you should give this option very serious consideration. A real ontology can become rather large and right now you are trying to keep it all of it in memory � which puts a severe upper limit on the potential size of the ontology. Also most databases will give you the indexing/search facility "for free". You may want to look at
Data Modeling 101
Introduction to Object-Orientation and the UML
Mapping Objects to Relational Databases: O/R Mapping In Detail
After you have digested that you can look at O/R Mapping frameworks like Hibernate and decide whether they a worth the trouble. Once you understand the relational paradigm a bit better, you'll probably be able to exploit it more effectively. Depending on what the ontology is used for a Java/OO solution may not be optimal anyway, a tool using an ontology language may be more effective.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
It sounds to me as if a database would only be called for if you had to store massive amounts of data. In that case, you might also want to take a look at OO databases. They can be quite effective.
 
Gabor Bator
Greenhorn
Posts: 6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello,

thank you very much for your comments. I feel I should clarify some things which I have forgotten in my previous post.

I have chosen java because the ontology is not ready yet; I have to create it from dictionary data, which includes file processing, and applying various algorithms that have very little to do with reasoning. I think that for these kind of tasks, java is an appropriate choice. If it is ready, I may input the data to an inference system.

Secondly, there are some 65000 entries, which I have to manage. Therefore indices are a must. Currently I have two of them: one for the name and one for the 'parent' concept, since these two will be used by the algorithms (for now).

Finally, I am using a custom, serialization-like saving function for persistency (so that it can be later edited manually). I hope I can run all algorithms with the objects in the memory. However, I am considering databases as well, but I am not planning to use them right away, unless it fits into the design well.

Frank: sorry for not being clear, but you understood the problem perfectly. In fact, I have already been using the Observable pattern for this. However, I have some problems with it:

1. In case an index contains more than one field (like name, variant), the code quickly becomes quite complicated.
2. If I think about the generic problem (management of data objects), the only viable way I can think of would be to access the fields via reflection. I suspect this might result in some performance decrease, but the main issue is how can I access multiple-field indices in this way.

Peer: As you can see above, I have already created a simple version which kind of works, but is a bit messy; so I would like to know what the best solution is.

I use Lists for indices as the fields are not unique. The problem is (again, considering the generic problem): what if I later need another index? Modify the object again? Or use generics to create one on the fly (including Comparators)? Can this be done in java at all?

As for the pages you suggested, I have not had the time to look at them. However, you mentioned O/R mapping. Is this the only way to use a database? I mean, I heard about OO databases some 3-4 years ago, but not ever since. Are they any good?

About the relational paradigm: I could imagine the 2-3 tables I need to describe the model. However, I just feel a bit unsure about adding an extra framework and losing the simplicity of the model. I know it is hard to answer this (without knowing all details), but again, thinking generically, which do you think is better?

Thanks,
Gabor
 
Peer Reynders
Bartender
Posts: 2968
6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Greenspun's Tenth Rule of Programming: "Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified bug-ridden slow implementation of half of Common Lisp."
You may want to consider posting a question regarding the relative merits of a Java vs Lisp implementation of your problem on the comp.lang.lisp newsgroup. Java-Practitioner gone Lisper Peter Seibel, Author of Practical Common Lisp seems to frequent that newsgroup. It would be interesting to hear his comments.
 
Frank Carver
Sheriff
Posts: 6920
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Gabor Bator:
1. In case an index contains more than one field (like name, variant), the code quickly becomes quite complicated.


This maybe naive, but in the past I've usually got by by just building a compound index key (say name + "/" + variant). Is this reasonable in your case?

2. If I think about the generic problem (management of data objects), the only viable way I can think of would be to access the fields via reflection. I suspect this might result in some performance decrease, but the main issue is how can I access multiple-field indices in this way.


Again I'm a little puzzled here. It seems as if you are still writing about the container somehow "looking inside" the contained objects to access field data directly rather than calling a method on each object to let it do the data-fiddling itself. Is this correct?

Are you assuming that all contained objects will be instances of the same class, even though they may need different key-generation behaviour?

All sorts of cunning idas keep popping up about possible solutions to this problem, but I always end up with more questions. Can you tell us a bit more about the process you currently use for loading data into your container? How complex/flexible are the objects you build when you load the data - what methods do they expose? how polymorphic are they? etc.

Best of all, could you post (or send by email if it's too big) some example code from your current solution. That way we can see unabiguously where you are at the moment.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Gabor Bator:
Secondly, there are some 65000 entries, which I have to manage. Therefore indices are a must.


Have you *tried* without the indices first? With all due respect, 65000 doesn't sound *that* much to me (depending on what you are doing with it, of course).
 
Peer Reynders
Bartender
Posts: 2968
6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Frank Carver:
How complex/flexible are the objects you build when you load the data - what methods do they expose?


In addition I�m getting the (possibly mistaken) impression that the ontology-entry object is being forced into a single class/object with somewhat fixed properties, when in fact a composite object that exposes discovery and inspection methods appropriate for the problem space would be a better fit, doing away with the need for any type of reflection.
Such an ontology entry object may also expose some of its general Properties and "composed Properties" (represented-as-objects) as "index properties" to allow the specification of a more general indexing process.
In effect such an ontology entry would be much less defined on the Java syntax level but more configured during the loading/object-building cycle.
 
Gabor Bator
Greenhorn
Posts: 6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Frank: the compound index is a good idea. In fact, the name field was originally like that, and I still do not know whether to leave it like that or to split it to separate fields. I guess I will give it another go.

Ilja: I have conducted a small test in which I have added 65000 number entries to a List (both as Strings and as Integers, using Random or just incrementing the value, which is the worst case). Doing it with linear search took up to 40-50x, at worst 100x as much time as with binary search. This is still far from the theoretical ~4000x (n/logn) difference, but please note that operations that lower this multiplier (like insert cost to the Arraylist) will not be frequent in the actual algorithms, which will mainly consist of graph traversal.

About flexibility: that is one of the factors I would like to determine now. I began with a pretty much hardwired solution, with methods like findByParent, and now I am trying to find the balance. As of now, the Storage has a findByField and a findByFields method, while the Storeable exposes get/setField. However, I still use generics, which I feel to be a solution between the two. Or it may be a good way to add specific ("hardwired") content in addition to the general methods (like the above mentioned findByParent). What do you think about this?

Currently I have a Storeable<T> interface, and planning a Storage<E extends Storeable>. Previously I used something like



, but that is obviously not an award-winning design.

The problem is that indices require Comparables. I am just wondering how I could make them available. First idea would be to include them in the Storeable (descendant) class and provide a discovery method for them (and the properties on which the Storage has to create an index). What do you think about it?
[ April 27, 2006: Message edited by: Gabor Bator ]
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Gabor Bator:
Ilja: I have conducted a small test in which I have added 65000 number entries to a List (both as Strings and as Integers, using Random or just incrementing the value, which is the worst case). Doing it with linear search took up to 40-50x, at worst 100x as much time as with binary search.


Well, yes, that is to be expected.

But the *important* question is: is it *too slow* for you? Is it *your bottleneck*, or are there more important parts of the code to work on?
 
Gabor Bator
Greenhorn
Posts: 6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Ilja Preuss:

But the *important* question is: is it *too slow* for you? Is it *your bottleneck*,

Yes, it is. Most of the algorithms will use the indices extensively, and they will take quite some time even like this.

Originally posted by Ilja Preuss:

or are there more important parts of the code to work on?

Performance (speed)-wise, not really. Design-wise: this is why I have opened this topic.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Gabor Bator:

Most of the algorithms will use the indices extensively, and they will take quite some time even like this.


So, how much performance improvement did you get with the indices *for your algorithm(s)*?
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!