Large Files

 
Ranch Hand
Posts: 750
Hi, this is not so much a Java question, but I'll be doing it in Java.

I will have, say, 20,000,000 strings, each around 30 characters long.

I want to store them on the server so a Java program can access them.

So for example, if the 17,000,000th string was required, I could, in theory, read the 17,000,000th line from one big text file.

I don't think this will be very quick though, so I thought I could perhaps break the 20,000,000 strings into 1000 files of 20,000 lines each, then read the required line from the required file.
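For illustration, the index arithmetic for that split could look like the sketch below. The class name `ChunkLocator` and the `book-NNNN.txt` naming scheme are hypothetical; only the 1000 x 20,000 split comes from the post. String numbers are treated as 1-based, file and line indexes as 0-based.

```java
import java.util.Locale;

/** Hypothetical helper: maps a 1-based string number to a chunk file and a line in it. */
final class ChunkLocator {
    /** Lines per chunk file, as proposed in the post (1000 files x 20,000 lines). */
    static final int LINES_PER_FILE = 20_000;

    /** 0-based index of the chunk file that holds string number n (1-based). */
    static int fileIndex(long n) {
        return (int) ((n - 1) / LINES_PER_FILE);
    }

    /** 0-based line offset of string number n inside its chunk file. */
    static int lineInFile(long n) {
        return (int) ((n - 1) % LINES_PER_FILE);
    }

    /** Assumed naming scheme for the chunk files, e.g. book-0849.txt. */
    static String fileName(long n) {
        return String.format(Locale.ROOT, "book-%04d.txt", fileIndex(n));
    }
}
```

With these assumptions, string number 17,000,000 is the last line (offset 19,999) of chunk 849. Finding that line still means reading up to 20,000 lines of text unless every line has the same length.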

Can anyone think of an alternative to this, preferably one that is faster and takes up less space?

Thanks
 
Ranch Hand
Posts: 479
I can probably think of 100 alternatives to this, most of them irrelevant to the program you are presumably trying to write.

My point is that there is not enough information here to advise on a good way to do this (as opposed to just another way to do this); it will depend on lots of things. How often do you expect to do this? Are the strings all equally likely to be needed? Does the program need to optimize for memory use? Or I/O? Or CPU? Is it going to run for a long time, or does your program do this once and then go off and do something else most of the time?

rc
 
colin shuker
Ranch Hand
Posts: 750
Yes, good point, I thought I would omit the details for clarity...

It's an opening book for my chess engine. Each entry contains a 64-bit Zobrist key of the position together with
a small selection of possible moves and weights.
I can wrap each entry into one number of about 30 decimal digits, or fewer in hexadecimal.

So the opening book file(s) will only be read at most 12 times (during the start of the game), say once a minute for 12 minutes.

But I would still like it to perform quickly, say under 1 second, just to keep things fast.

I have also just been looking at RandomAccessFile in Java, and this might be a good way to do it.
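A minimal sketch of that idea, assuming every entry is padded to the same width so that entry number n starts at byte n * RECORD_SIZE. The class name `FixedWidthRecords` and the 32-byte width are illustrative assumptions, not something fixed by the problem; `RandomAccessFile.seek` and `readFully` are the standard JDK calls.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: fixed-width ASCII records addressed by 0-based index. */
final class FixedWidthRecords {
    /** Assumed record width in bytes; every entry is space-padded to this length. */
    static final int RECORD_SIZE = 32;

    /** Writes each entry as exactly RECORD_SIZE bytes, padded on the right with spaces. */
    static void write(Path file, List<String> entries) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(0); // discard any previous content
            for (String entry : entries) {
                byte[] raw = entry.getBytes(StandardCharsets.US_ASCII);
                if (raw.length > RECORD_SIZE) {
                    throw new IllegalArgumentException("entry too long: " + entry);
                }
                byte[] record = new byte[RECORD_SIZE];
                Arrays.fill(record, (byte) ' ');
                System.arraycopy(raw, 0, record, 0, raw.length);
                raf.write(record);
            }
        }
    }

    /** Reads the entry at the given 0-based index with one seek and one read. */
    static String read(Path file, long index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long offset = index * RECORD_SIZE;
            if (index < 0 || offset + RECORD_SIZE > raf.length()) {
                throw new IndexOutOfBoundsException("no record " + index);
            }
            byte[] record = new byte[RECORD_SIZE];
            raf.seek(offset);
            raf.readFully(record);
            // trim() removes the padding (and any whitespace the entry itself ended with)
            return new String(record, StandardCharsets.US_ASCII).trim();
        }
    }
}
```

At 32 bytes per record, 20,000,000 entries would come to about 640 MB on disk, but only one record is ever held in memory per lookup. In a real engine the file would probably be opened once and kept open rather than reopened for every read.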

Also, I don't really want to load the file into the Java program, because I need as much memory as I can get for other parts of the program that use big arrays.


Thanks again for any advice.
 
Sheriff
Posts: 67746
Database.
 
colin shuker
Ranch Hand
Posts: 750
Can you be a bit more specific, please? Thanks.
 
Bear Bibeault
Sheriff
Posts: 67746
I think you'd be better off using a database rather than files for this. Databases are designed to quickly look up records in a large dataset.
 
Ralph Cook
Ranch Hand
Posts: 479
While I agree that this is a major use of databases, I can see wanting to avoid a general-purpose relational database management system in this case.

RDBMSs are built, necessarily, for the general case; they occupy large amounts of memory and take up extra processing power in order to make things flexible. They are good at what they do in general, and it is possible that one would be a good, or at least a workable, solution to this problem. But for a chess-playing program, I would worry about saddling my heap with objects generated by the RDBMS, which I could not control.

A chess-playing program is one of these things that occupies all your available memory and processing power and screams for more. I would be careful about putting an RDBMS in one; if I did, I would be careful to abstract all use of it so I could replace it with a special-purpose equivalent with a minimum of trouble.

I've not done anything significant with random-access files in Java, but from reading the runtime javadoc it appears they may suit your case. You will need some way to translate your key into the position you want to seek to, and of course you want to minimize seeks. If it were me, I would run tests with multiple seeks in files of different sizes, preferably on the most likely target OS, to try to determine whether splitting into different files made sense.

I would guess that opening a file would be expensive compared to seeking in one that was open, and that reading would be less expensive than either of those.
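One possible way to translate a key into a position, sketched under assumptions: store the entries as fixed-width binary records sorted by key, keep one file open, and binary-search it with seeks. The class name `SortedKeyFile` and the 16-byte layout (8-byte key followed by an 8-byte packed payload) are hypothetical; the actual opening-book entry format is not specified in this thread.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.OptionalLong;
import java.util.SortedMap;

/** Hypothetical helper: binary search over sorted fixed-width records on disk. */
final class SortedKeyFile {
    /** Assumed layout: 8-byte key followed by an 8-byte packed payload. */
    static final int RECORD_SIZE = 16;

    /**
     * Writes the entries in ascending key order. Assumes the map uses natural
     * (signed) Long ordering, which is what find() relies on.
     */
    static void write(Path file, SortedMap<Long, Long> entries) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (Map.Entry<Long, Long> e : entries.entrySet()) {
                out.writeLong(e.getKey());   // big-endian, matches RandomAccessFile.readLong
                out.writeLong(e.getValue());
            }
        }
    }

    /** Looks up a key in an already-open file; each probe costs one seek and one small read. */
    static OptionalLong find(RandomAccessFile raf, long key) throws IOException {
        long low = 0;
        long high = raf.length() / RECORD_SIZE - 1;
        while (low <= high) {
            long mid = (low + high) >>> 1;
            raf.seek(mid * RECORD_SIZE);
            long candidate = raf.readLong();
            int cmp = Long.compare(candidate, key); // signed comparison, same as the sort order
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return OptionalLong.of(raf.readLong()); // payload follows the key
            }
        }
        return OptionalLong.empty(); // position is not in the book
    }
}
```

For 20,000,000 records a lookup needs at most 25 probes, and the file is opened only once, which fits the guess above that opening costs more than seeking. Whether that beats splitting into many files would still need measuring on the target OS.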

Good luck with it!

rc
 
Java Cowboy
Posts: 16084
A relational database is not necessarily a huge piece of software that uses massive amounts of disk space and/or processing power.

You could use something like HSQLDB or Apache Derby, both small relational database systems that you can even run embedded in your application (which means that the database server runs in the same JVM as your application, not as a separate process that you have to connect to).
 
Ranch Hand
Posts: 222
What about SQLite?
 
Ralph Cook
Ranch Hand
Posts: 479
Certainly there are smaller and larger RDBMSs, and I have not made any survey of which ones are and are not large and so forth. I have some points that I still think are relevant here, however:

1. Any RDBMS is general purpose, and in order to maintain general-purpose flexibility, a system usually has to use more CPU cycles, memory, etc., in comparison with special-purpose code.

2. An RDBMS that is regarded as "small" is usually being compared to other RDBMSs, not to doing the same job for a specific purpose with code crafted for that purpose.

3. The purpose for which the OP wants this is VERY limited for an RDBMS, and it does not seem difficult to fulfill the purpose without an RDBMS.

4. If you use any RDBMS, you lose *some* control over the use of CPU and memory that you can keep better if you craft the code for your specific purpose.

5. The program the OP is writing has EXTREME needs in both CPU and memory use. So it makes sense to examine carefully any commitment made in either of these areas at the outset.

As I said, some RDBMS *might* fulfill what he needs, but I would make more sure than usual that I could detach the entire RDBMS and replace it with special-purpose code if I ever expected the program to, for instance, play tournament chess at any level.

rc
 