• Post Reply Bookmark Topic Watch Topic
  • New Topic

Large data structures  RSS feed

 
Gryphon Saldin
Greenhorn
Posts: 2
Firefox Browser Netbeans IDE Windows XP
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'm trying to develop a program that deals with a large amount of data that i'm currently visualizing as a 3d grid (500x500x50ish entries minimum, but ideally able to handle an order of magnitude or 2 more than that). Most of the entries will be using the byte data type (or possibly a custom smaller one) but being able to use multiple data types would be useful (though all data on the same 'level' would be of the same type)

I'm trying to make it able to run on low-ram systems, so will need to be able to read this data from the hard-drive in some way (&plan to save it in the same format) and so have been looking over the internet trying to figure out how to structure this, but am getting a bit lost; I need to be able to call up several wide horizontal 'slices' out of this as well as able to grab a number of vertical 'columns' with all of their data to work with and write back. However, I can't seem to figure out which way to go with this.

I will not, however, need to be able to add data entries in the middle; ex: entry 34x26x5 will always be next to 35x26x5 and I won't ever need to add an entry between the or any other data.

Please help get me pointed in the right direction, and feel free to ask for clarification if any of this was vague or you have any other questions.
 
Madhav Turangi
Greenhorn
Posts: 16
1
Eclipse IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
It is not clear where are you stuck with. If it is with 'how to represent the data structure', then to start with take 3 dimensional byte array. Wrap it into a class then keep adding methods for whatever behavior needed.
Say,



A slice of 3-dim arrray at an index is a 2-dim array (one less dimension), is my definition.
Based on that, the below method can be added.



Only slices from 2 dimensions will ever be needed and slicing from one dimension (3rd) is never needed, then add two methods


 
Joe Areeda
Ranch Hand
Posts: 334
2
Java Netbeans IDE Tomcat Server
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I agree with Madhav, it's not very clear where your stuck, or even if your stuck, but that won't stop you from getting responses on the Internet.

If we break it down into 2 problems 1) developing the algorithm and 2) dealing with data too big to be held in memory at one time we can come up with a plan. Not necessarily the solution you're looking for but a plan to find one.

Your initial size of 500x500x50x1 byte is only 12.5 MB so you can probably deal with it all in memory on your development system.

I would suggest a class for the data that hides where it's kept, disk or memory or some combination.

Develop the algorithm and see what order you will be accessing the data. Personally I would start with a very simple data class. All data in memory and a single getter and setter that takes the x,y,z coordinates and works on one value at a time.

Yes it will probably be slow but I've found it much easier to take a program that works and make it fast than to take one that's designed to be fast and make it work. Especially with a problem like this knowing how you access the data before trying to optimize multi-level memory simplifies the problem.

Give us more details on what you're planning and we can probably come up with more concrete suggestions.

Joe
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Gryphon Saldin wrote:I'm trying to develop a program that deals with a large amount of data that i'm currently visualizing as a 3d grid (500x500x50ish entries minimum, but ideally able to handle an order of magnitude or 2 more than that). Most of the entries will be using the byte data type (or possibly a custom smaller one) but being able to use multiple data types would be useful (though all data on the same 'level' would be of the same type)

In addition to the other good advice, it would be interesting to know what this data represents, and what it's properties are.

For example: is it sparse? ie, How many of the 12.5 million possible grid positions will actually contain data? Or: how many will contain data that is different to their neighbour?

How often are you likely to need to add new items? Or remove them?

You can probably gather that I'm trying to think of possible strategies for storage; but without more info it's very difficult to even suggest any.

However, just one possibility to think about might be a HashMap<Integer, Byte> (or possibly a TreeMap equivalent), where the 'Integer' is a denormalized grid location.

HIH

Winston

PS: Welcome to JavaRanch.
 
Gryphon Saldin
Greenhorn
Posts: 2
Firefox Browser Netbeans IDE Windows XP
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks & sorry it was a bit vague as my post was written on a phone while my brain was fried from heat.
//embarrassed
My goal is to create a map program (hex style for fantasy rpgs, incidentally, as i wasn't entirely happy with anything i could find) with ~1-mile hexes as it's base and designed to easily handle at least 500mile wide maps. Unlike a lot of the ones I've seen, however, I plan to store a lot of different information on each hex, most of which will be numeric in nature, and each 'layer' being representative of a specific type of information (elevation or rainfall, for example).

The amount of repetition will vary greatly by layer, and a large amount of the planned functionality will involve doing read-compute-write with several layers at the same time. Many of the planned algorithms, however, plan to use small incremental changes to iteratively generate larger patterns, so most of them will be completely full of data that doesn't repeat that often, and using larger data types than planned may have to happen to make some of them work as intended. On the upside, however, it does mean that the number of data entries can be constant hex-to-hex.

I'm not trying to make it completely optimized from the start, However, since all functions being things built around this central data-structure(s) I would like to get what kind(s) I should be using correct from the start to minimize re-writing.

I plan to save this data, along with other data, into a single file for ease or transport. In time, though, the data set may grow to be larger than available ram, geometric relations being what they are, possible future features, and with the inclusion of multiple data-types, so I was trying to structure it from the start to be able to pull portions of the data up without loading the whole set. I had considered making 2d arrays for each layer, but trying to quickly call up -all- the data on multiple hexes in this case would require pulling up, searching through, and closing every layer in turn. (Which i'm trying to avoid for latency reasons)

To directly answer another question, though, I shouldn't need to add/remove items, but if I choose to at some point, it will be infrequent enough that simply recreating a larger data-set and moving the data over would be feasible.

In development terms, however, being able to easily add new layers of data for future functionality would be a big plus. Unlike a number of projects like this, however, ai/pathfinding concerns with regard to terrain will be very minimal even in the worst case, so structuring with that in mind is thankfully not a concern. I may, however, end up having a number of data sets of this size, such as if I include even a reasonably abstracted economic component to the program, as I would like to eventually do once the main part of it is sufficiently fleshed out, so I'm trying to conserve resources (ideally keeping the program to 1gb of ram use or less if possible to increase the number of systems it can run on, and would rather have parts of the program work a bit slow, rather than make it unusable to people with older machines). Since this is intended as a single-user, offline application with no personal data, thankfully, data-security isn't really a concern. (In fact being able to manually pass data files between computers easily is intended)

Hope that makes more sense and that I didn't miss any of your questions and thank you for your help as I try to not code myself into a proverbial corner.

ps: i have some, albeit rusty, education in java, however, this is my first time coding anything where I had to be concerned with memory management, so I might be being just overly cautious as it's my first time dealing with these kinds of challenges
 
Hauke Ingmar Schmidt
Rancher
Posts: 436
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The number of items is much larger than the items that can be held in memory, you can access any item by a unique key, you don't need ultra-fast access to every item.

Sounds like a simple lazy structure backed by a database will work. You just need an API (i.e. interface) that allows access by key, e.g. "value getData(x, y, layerId)", "setData(x, y, layerId, value)".

Leaves the question if you want to preload data of a specific region (semi-)automatically, or if you always directly hit the db on each access. If you use the API you can change this behaviour on the fly later, so no problem there; if you should run into load performance or memory usage issues you can implement and experiment with different caching behaviour if the need arises.

A file based database that can be set up programmatically and doesn't need a server like SQLite, HSQL, or Derby will do a good job here.
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Gryphon Saldin wrote:My goal is to create a map program (hex style for fantasy rpgs, incidentally, as i wasn't entirely happy with anything i could find) with ~1-mile hexes as it's base and designed to easily handle at least 500mile wide maps.

I'm still not quite sure I understand. Do you mean that each "cell" in the "map" represents a hexagon as opposed to a square? That suggests to me three axes of movement rather than two, and very possibly a different arrangement of cells; although it could certainly be done with a simple 2D grid.
I suspect there are libraries around that can handle this type of map, but I'm afraid I don't know any specific one.

As to your "layers": it seems to me these could be handled at the cell level: Cell "has-A" Layer; and if they're relatively static, you could possibly represent them with an enum. Alternatively, they could be named classes.

In general though, I suspect Hauke is right: the "volume" part is a job for a database. I'm also not sure that optimizing at this stage is a good idea (it rarely is). Get it right first; then worry about speed - and even then ONLY if it proves to be a problem.

HIH

Winston
 
It is sorta covered in the JavaRanch Style Guide.
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!