Win a copy of Programmer's Guide to Java SE 8 Oracle Certified Associate (OCA) this week in the OCAJP forum!

# 2 approaches, which to choose?

Mark Wales
Greenhorn
Posts: 6
I'm creating a simulation environment to investigate construction using swarms of virtual termites.

I have come up with 2 methods for representing the environment:

(1) As a grid, which would require a 2D storage structure.
(2) As a matrix-based system, which would require more mathematical computation but would require less storage.

I'm trying to compare the efficiencies of the two different approaches but I'm not really sure where to start. Any help would be greatly appreciated.

Scott Selikoff
author
Saloon Keeper
Posts: 4020
18
Could you elaborate on "matrix-based system"?

In general, processor power is considered far cheaper than memory space especially if there are permenant storage writes involved (such as a hard drive), although this can vary from situation to situation with different extremes.

For example, if the entire 2D grid can easily fit into memory, then its not a big deal, but if its big enough that entries will be often cached in and out of memory than this is clearly not a good thing.

Mark Wales
Greenhorn
Posts: 6
Sorry just re-read my post. I meant to say vector-based approach not matrixes, oops. So I'm trying to compare 2D vector with some kind of 2D grid system. Memory isn't too much of a problem, I was concerned about creating so many objects mostly, as in previous projects I've noticed that seems to hurt JVM performance vastly.

Ernest Friedman-Hill
author and iconoclast
Marshal
Posts: 24212
35
I'm afraid that the distinction between a "2D vector" and a "2D grid" is still lost on us. Could you write a little pseudocode to contrast the two implementations?

Scott Selikoff
author
Saloon Keeper
Posts: 4020
18
Is this what you mean?

Solution 1: A 2D typed array (possible of type Object such as Object[10][10])

Solution 2: A vector of vectors

If so, the first solution is better since you're laying the memory requirements for the array ahead of time instead of the second solution where you have to allocate multiple vectors at a time. Ergo, the first one has only one 'new' during instantiation whereas the second has one 'new' per vector child plus the parent vector.

The reason the second is preferred is if the number of size of the 2D typed array cannot be determined or the 2D typed array is often sparse, as in it rarely gets filled.

One note, if this is a typed array of primitives than the typed array is even more preferred since the vectors will require object children (not counting java 1.5).
[ November 01, 2005: Message edited by: Scott Selikoff ]

Mark Wales
Greenhorn
Posts: 6
OK, in my environment there will be ~100 "termites" lets say. They will all be constantly moving. To represent this motion I could give each termite a movement vector and a position vector, or I could place them in a grid square and give them a heading. To maintain reasonable accuracy the grid would need to very fine, i.e. 1000x1000 for example, creating such a large amount of objects would cause performance objects I believe, even if memory is not a problem. Using vectors would be the more elegant solution, but the computation would be much more intense, for resolving collisions between the vectors etc.

So I'm trying to weigh up the performance of each, although it only needs to be approximate.

Paul Clapham
Sheriff
Posts: 21316
32
If you are using vague metrics like "creating such a large amount of objects would cause performance objects I believe" then you are not going to be able to quantify your performance beforehand. So my approach would be to write one version and see if it is fast enough.

As Scott said earlier, don't write data to and from disk unless you have to.

William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13071
6
the grid would need to very fine, i.e. 1000x1000 for example,

Creating an object for each grid point would seem to be overkill. What do you really need to keep track of at each grid point?
Bill

Scott Selikoff
author
Saloon Keeper
Posts: 4020
18
One thing I don't thats been brought up is that defining a large array of objects does not mean you've actually created the objects. For example, you can use a 1000x1000 matrix that only has 100 'termite' objects in it, you just move them around the grid. The rest would be null.

On the other hand if you created 1000x1000 terminites or recreated them when you moved them...

Jim Yingst
Wanderer
Sheriff
Posts: 18671
Yeah, if most of your 1000 x 1000 grid is empty, the array structure itself (the array of arrays) takes about 4 Mb. (4 bytes for each object reference, of which there are 1000 x 1000.) Not a big deal in and of itelf, I think. But if you later decide you need the termites to inhabit a 3D region that's 10000 x 10000 x 10000, that could get ugly.

Modeling termites with position and velocity vectors and talking about calculating collisions makes me think of modeling gas molecules as though they were billiard balls. I would think that collision effects are probably minor for an insect colony (unless it's at war with another colony) - when termites "collide" they probably just alter their courses slightly so one goes over or around the other, and then they continue on their merry way. I'd think that other effects are more important, like the environment. What tunnels have already been dug? How hard is it to dig new tunnels in various types of material? Where are food resources to exploit? Etc. I'm thinking these types of considerations will push you towards keeping a 2D (or 3D) grid in memory in which you track this additional info - a map of tunnels etc which also may contain the termites themselves. Of course I could be completely wrong; I'm just making vague guesses at what sorts of things you may be trying to model here. Sounds interesting though - good luck.
[ November 01, 2005: Message edited by: Jim Yingst ]

JuanP barbancho
Ranch Hand
Posts: 52
Hi,

GRID is good for very heavy CPU process. But, grid need Network.

I think that if you want use only a machine, with a minimun of CPUs.

Memory hardware is cheaper than CPUs hardware, and memory do not pay license. Some product like J2EE server have license by CPUs.

I need know the CPU time that you will need and the memory that you will need.

You could divide the CPU time / num CPUs

You could divide the )CPU time + Network Time) / (num GRID cpus )

Thanks you

Mark Wales
Greenhorn
Posts: 6
Thanks for all the advice guys. The other parts of the simulation would involve pheremone trails etc, and so a grid would make detecting insects crossing the trails much easier than resolving vectors I believe.

A grid based approach it is.

Jim Yingst
Wanderer
Sheriff
Posts: 18671
Mark: Sounds cool. Enjoy.

Juan: I think we're using "grid" differently here. Mark isn't talking about grid computing; he's talking about a grid as a rectangular map, which will be stored as a 2D array (rather, an array of arrays). It's just a data structure here.

Similarly, Mark: you might want to be aware that when you say "vector" around Java programmers, many of them will probably think of the java.util.Vector class, which implements a variable-length list. This has very little to do with traditional vectors as the term is used in math / hard sciences / engineering, for a quantity which has both magnitude and direction. It's possible to use a Vector of numbers to represent a vector, but it's a bit inefficient since a vector has no need for changing the number of elements - a 3D vector isn't goign to suddenly turn into a 4D vector, unless you're doing some very unusual mathematics. This is one of many reasons why I dislike the Vector class - it uses a badly chosen name for the concept it represents. (This usage wasn't invented in Java; it came from other programmers / computer scientists - but in my opinion, those other programmers were wrong too.) So Mark: I think your usage of "vector" is entirely correct, but you will just have to beware that many programmer types will think you mean something different.

JuanP barbancho
Ranch Hand
Posts: 52
Sorry,

I think that you need a performance solution and not a algorithmic solution.

Thanks

Mark Wales
Greenhorn
Posts: 6
OK guys. If I adopt the 2D grid layout I have 2 options that I can see for data structures.

(1) 2D Sparse Array - implemented with linked lists say, which is efficient for storage, but relatively poor for retrieval, search etc.
(2) Array of arrays - efficient for search etc, poor for storage efficiency.

If I'm going to have approx. 100 "termite" entities in a grid of approx 1000x1000, which one is going to be the best tradeoff between memory and search speed?

Mr. C Lamont Gilbert
Ranch Hand
Posts: 1170
Originally posted by Jim Yingst:
Yeah, if most of your 1000 x 1000 grid is empty, the array structure itself (the array of arrays) takes about 4 Mb. (4 bytes for each object reference, of which there are 1000 x 1000.) Not a big deal in and of itelf, I think. But if you later decide you need the termites to inhabit a 3D region that's 10000 x 10000 x 10000, that could get ugly.

Wouldn't the array of arrays only be 1001 objects to begin with? With the array that holds 1000 other arrays, and the 1000 other arrays totallying 1001?

I like the grid idea. With a source point and a destination point your math is easier and you can use looser less accurate math. Even integer if you wish. with vectors, you need high accuracy to ensure they dont get off course. but this depends on if their endpoint is more significant than their path or not

Anyway, I thikn the grid scales better with respect to accuracy, the vector may scale better with respect to study size.

Good luck, sounds like a lot of fun.

William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13071
6
My approach would be to allocate a single 1000000 element array of Object and write methods for index conversion. The reasons? Simplified allocation, ease of handling edge conditions (ie wrapping or not wrapping) ability to mix termite and non-termite objects, ease of switching geometry.
You might want to look at the many cellular automata implementations - going all the way back to the "game of life" - most of them seem to use a single array.
Bill

Jim Yingst
Wanderer
Sheriff
Posts: 18671
Mark - I don't think 1000 x 1000 is large enough to warrant the additional complexity and decreased speed of a sparse array. It's a possibility to keep in mind for the future though, if you find you need a finer grid than you have now.

[JuanP]: I think that you need a performance solution and not a algorithmic solution.

Sorry, but I don't know what this means here. Some sort of algorithms will be involved whatever the solution is, and performance will probably become a noticeable factor (though not necessarily a critical one). I don't know if that addresses what you're talking about or not though.

[CLG]: Wouldn't the array of arrays only be 1001 objects to begin with? With the array that holds 1000 other arrays, and the 1000 other arrays totallying 1001?

Yes. And each of those 1001 arrays would be big enough for 1000 object references, so ~4000 bytes per array, 4004000 bytes total, approximately 4 Mb. The references take up the same amount of space whether they're null or not - of course if they're not null then they probably refer to other objects, and those objects take up space. I'm assuming that's negligible initally. The various arrays also probably have some minor overhead (additional memory cost) which we're overlooking.

[Bill]: My approach would be to allocate a single 1000000 element array of Object and write methods for index conversion.

Yeah, that works too. Total memory is about the same. Ease of allocation? Calling new Object[1000][1000] creates everything you need in one step, pretty easy. I'm not sure I'd see any reason to use wrapping here unless this is a termite video game. I don't have a strong preference though between new Object[1000][1000] and new Object[1000000] - I suppose I'd want to try both and measure access times to see if there's a difference.

As for ability to mix termite and non-termite objects - well that's a separate issue from whether we use array of arrays or one big array or a sparse array; it's determined by the type of the elements in the array. My first thought is to create a GridPoint class representing everything at a given point; could include a Termite or a PheremoneCount or both, plus whatever other elements are useful. Or maybe all the info we need can be encoded into an int or something smaller instead. E.g. perhaps all we need is the termite ID number (7 bits, could be 0000000 for no termite at all) and a rough pheremone intensity level (9 bits). Everything we need fits in a short, so use a new short[1000][1000] or new short[1000000] for about 2 Mb and no additional object. Well, there probably are more objects somewhere, but we've drastically reduced the number.

Bill - what do you mean by "ease of switching geometry"? What sort of switch?

James Hejmanowski
Greenhorn
Posts: 8
I'm a complete greenhorn so please feel free to disregard my rantings beyond this point but...

This sounds like a neat problem. Couldn't you model it as array or arrays of objects and just track everything through the state of the object i.e. variable represnting empty space vs termite vs pine vs oak vs tunnel etc?

Could one construct an object of a two-by-four being eaten by termites then build a virtural wall of such objects then further extend the concept? Then you'd have effecient storage and the speed of arrays...

Some questions: does the direction of the grain of the wood matter to a termite? Are termites smart enough to cross perpendicular to a tunnel's axis or just as likely to wander down a tunnel a bit? Is this a valuable model in only two dimensions?

James Kline
Greenhorn
Posts: 6
Hello, everyone:
For the sparse matrix approach, I think the best choice is the HashMap(). In a HashMap, you will save a lot of typing, have a clear code, and tuning its capacity at startup, it would be on the same order of efficiency than the array. A vector of vectors is cumbersome.
With the java.util.HashMap and Java2 v5.0, it would be something like this:

Map grid = new HashMap<Pos,Termite>;
...
grid.put(new Pos(2,5), new Termite(velocity, etcetera));
...
if(grid.containsKey(posi)){...}
...
Termite termite7 = grid.get(posi);