
sorting lines of strings in a file

 
jin sun
Ranch Hand
Posts: 30
Hi, here's my problem:

I have a file that can be 200 MB or larger, and it contains lines that look like this:

ENG|glucopyranoside|C0644705|L1129264|S1355824|

So, what I want to do is sort by the 2nd field (glucopyranoside) alphanumerically. I've been trying to add each line to a list and sort using Collections.sort(), but I get out of memory errors. I think my problem is trying to put all the lines in a list and sort, haha. Is there a way around this?
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13071
If you are sure that the character set is ASCII, and you have enough memory that can be assigned to the JVM, do this:
1. read the whole file as one big byte[] (half the size of the char[] used for Strings)
2. locate all the line starts by scanning through the byte[], keeping a List of the line starts - possibly as Integer objects or maybe as a custom object.
3. create a class implementing Comparator that can find the field to sort on and return the correct compare and equals results
4. sort the List by providing the Comparator to Collections.sort()
- That's not the fastest sort, but it will be simple to code.
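
For illustration, the four steps could look roughly like the sketch below. It is only one possible reading of the steps: the class and method names (ByteLineSorter, lineStarts, compareSecondField, writeSorted) are made up for this example, and it assumes pure ASCII data, '|' as the field separator, '\n' line ends, and plain byte-value ordering of the second field.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class ByteLineSorter {

    /** Step 2: collect the offset of the first byte of every line. */
    static List<Integer> lineStarts(byte[] data) {
        List<Integer> starts = new ArrayList<>();
        if (data.length > 0) starts.add(0);
        // Stop one byte early so a trailing '\n' does not start an empty line.
        for (int i = 0; i < data.length - 1; i++) {
            if (data[i] == '\n') starts.add(i + 1);
        }
        return starts;
    }

    /** Offset of the second field, or the line end if the line has no '|'. */
    static int secondFieldStart(byte[] data, int lineStart) {
        int i = lineStart;
        while (i < data.length && data[i] != '|' && data[i] != '\n') i++;
        return (i < data.length && data[i] == '|') ? i + 1 : i;
    }

    static boolean isFieldEnd(byte[] data, int i) {
        return i >= data.length || data[i] == '|' || data[i] == '\n' || data[i] == '\r';
    }

    /** Step 3: compare the second fields of two lines byte by byte (ASCII assumed). */
    static int compareSecondField(byte[] data, int lineA, int lineB) {
        int a = secondFieldStart(data, lineA);
        int b = secondFieldStart(data, lineB);
        while (true) {
            boolean endA = isFieldEnd(data, a);
            boolean endB = isFieldEnd(data, b);
            if (endA || endB) return endA ? (endB ? 0 : -1) : 1; // shorter field sorts first
            if (data[a] != data[b]) return data[a] - data[b];
            a++;
            b++;
        }
    }

    /** Step 4: sort the line starts, then write the lines out in that order. */
    static void writeSorted(byte[] data, OutputStream out) throws IOException {
        List<Integer> starts = lineStarts(data);
        starts.sort((x, y) -> compareSecondField(data, x, y));
        for (int start : starts) {
            int end = start;
            while (end < data.length && data[end] != '\n') end++;
            out.write(data, start, end - start);
            out.write('\n');
        }
    }
}
```

For step 1, Files.readAllBytes(path) would give you the byte[], and wrapping the output file in a BufferedOutputStream keeps the many small writes cheap. Keep in mind a Java array cannot hold more than about 2 GB.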

If you can't get it all in memory, you will have to read and sort chunks, later merging the sorted chunks.
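
A rough sketch of the chunk-and-merge idea, under the same assumptions (ASCII text, '|' separated fields, plain String ordering on the second field). The names ExternalLineSort, maxLinesInMemory and Cursor are invented for this example, and the chunk size is counted in lines only to keep the code short.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalLineSort {

    /** Orders lines by their second '|'-separated field (case-sensitive). */
    static final Comparator<String> BY_SECOND_FIELD =
            Comparator.comparing(ExternalLineSort::secondField);

    static String secondField(String line) {
        String[] parts = line.split("\\|", 3);
        return parts.length > 1 ? parts[1] : "";
    }

    /** Sorts input into output, holding at most maxLinesInMemory lines at once. */
    static void sort(Path input, Path output, int maxLinesInMemory) throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.US_ASCII)) {
            List<String> buffer = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() >= maxLinesInMemory) {
                    chunks.add(writeSortedChunk(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) chunks.add(writeSortedChunk(buffer));
        }
        merge(chunks, output);
        for (Path chunk : chunks) Files.deleteIfExists(chunk);
    }

    /** Sorts one chunk in memory and writes it to a temporary file. */
    static Path writeSortedChunk(List<String> lines) throws IOException {
        lines.sort(BY_SECOND_FIELD);
        Path chunk = Files.createTempFile("sort-chunk", ".txt");
        Files.write(chunk, lines, StandardCharsets.US_ASCII);
        return chunk;
    }

    /** One open chunk plus the line it is currently positioned on. */
    static final class Cursor {
        final BufferedReader reader;
        String line;

        Cursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.line = reader.readLine();
        }
    }

    /** Repeatedly takes the smallest current line across all chunks. */
    static void merge(List<Path> chunks, Path output) throws IOException {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> BY_SECOND_FIELD.compare(a.line, b.line));
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.US_ASCII)) {
            for (Path chunk : chunks) {
                Cursor c = new Cursor(Files.newBufferedReader(chunk, StandardCharsets.US_ASCII));
                if (c.line != null) heap.add(c); else c.reader.close();
            }
            while (!heap.isEmpty()) {
                Cursor c = heap.poll();
                out.write(c.line);
                out.newLine();
                c.line = c.reader.readLine();
                // Sketch only: readers are not closed if an exception is thrown mid-merge.
                if (c.line != null) heap.add(c); else c.reader.close();
            }
        }
    }
}
```

Something like ExternalLineSort.sort(in, out, 500000) would be the call; pick the chunk size to fit the heap you give the JVM.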

Bill
 
jin sun
Ranch Hand
Posts: 30
Sorry, I think I need to rephrase my problem. The file is already sorted and I want to insert new line(s) (following the same format as my example above) into the right place. For example:

I want to insert the following into the file:
ENG|horse|C0644705|L1129264|S1355824|

And in the file it would go between the following based on the second field:
ENG|giant|C0644727|L1129215|S1355816|
ENG|house|C0644732|L1129211|S1355819|

I'm stumped, this is probably simple but I'm really rusty, any suggestions would be a help.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13071
Your restatement of the problem does not resemble the original at all - anyway -

The only way to do this is by creating a new file, reading lines from the old and writing to the new until you hit the right spot to insert. If you know the old file is sorted, the right spot is when you hit a line that comes after the line to be inserted.
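
A minimal sketch of that copy-and-insert loop, assuming ASCII text, '|' separated fields and plain String.compareTo ordering on the second field (which may or may not match how the file was originally sorted). SortedFileInsert and its method names are made up for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SortedFileInsert {

    /** Returns the second '|'-separated field, or "" if the line has none. */
    static String secondField(String line) {
        String[] parts = line.split("\\|", 3);
        return parts.length > 1 ? parts[1] : "";
    }

    /**
     * Copies oldFile to newFile, writing lineToInsert just before the first
     * existing line whose second field sorts after it.
     */
    static void insert(Path oldFile, Path newFile, String lineToInsert) throws IOException {
        String key = secondField(lineToInsert);
        boolean inserted = false;
        try (BufferedReader in = Files.newBufferedReader(oldFile, StandardCharsets.US_ASCII);
             BufferedWriter out = Files.newBufferedWriter(newFile, StandardCharsets.US_ASCII)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!inserted && secondField(line).compareTo(key) > 0) {
                    out.write(lineToInsert);
                    out.newLine();
                    inserted = true;
                }
                out.write(line);
                out.newLine();
            }
            // The new line sorts after everything already in the file.
            if (!inserted) {
                out.write(lineToInsert);
                out.newLine();
            }
        }
    }
}
```

Once the new file is complete, Files.move(newFile, oldFile, StandardCopyOption.REPLACE_EXISTING) can put it in place of the old one.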
Bill
 
jin sun
Ranch Hand
Posts: 30
^Yeah, my restatement is completely different from my original one, sorry.

The only way to do this is by creating a new file, reading lines from the old and writing to the new until you hit the right spot to insert. If you know the old file is sorted, the right spot is when you hit a line that comes after the line to be inserted.


Thanks, I thought there was another way of doing it, but I guess I have to do it that way.
 