
sorting lines of strings in a file

Hi, here's my problem:

I have a file that can be 200 MB or larger, and it contains lines that look like this:

ENG|glucopyranoside|C0644705|L1129264|S1355824|

So, what I want to do is sort by the 2nd field (glucopyranoside) alphanumerically. I've been trying to add each line to a List and sort it with Collections.sort(), but I get out-of-memory errors. I think my problem is trying to put all the lines in one list and sort them, haha. Is there a way around this?
If you are sure that the character set is ASCII, and the JVM can be given enough memory, do this:
1. Read the whole file as one big byte[] (half the size of the char[] used for Strings).
2. Locate all the line starts by scanning through the byte[], keeping a List of the line starts, possibly as Integer objects or maybe as a custom object.
3. Create a class implementing Comparator that can find the field to sort on and return the correct compare and equals results.
4. Sort the List by providing the Comparator to Collections.sort().
- That's not the fastest sort, but it will be simple to code.
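The four steps above could be sketched roughly like this. It is only a minimal sketch under some assumptions: the file is ASCII with '\n' line ends, the whole file fits in one byte[], and the names ByteLineSorter, fieldStart, fieldEnd and bySecondField are made up for the example. The comparison is a plain byte-by-byte one, which may or may not be the "alphanumeric" order you want.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class, not from the thread.
class ByteLineSorter {

    // Index just after the first '|' of the line starting at lineStart.
    static int fieldStart(byte[] data, int lineStart) {
        int i = lineStart;
        while (i < data.length && data[i] != '\n') {
            if (data[i++] == '|') return i;
        }
        return i; // no delimiter on this line: treat the field as empty
    }

    // Index of the first byte after the field that begins at 'from'.
    static int fieldEnd(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '|' && data[i] != '\n' && data[i] != '\r') i++;
        return i;
    }

    // Step 3: compares two line-start offsets by the bytes of their 2nd field.
    static Comparator<Integer> bySecondField(byte[] data) {
        return (a, b) -> {
            int sa = fieldStart(data, a), ea = fieldEnd(data, sa);
            int sb = fieldStart(data, b), eb = fieldEnd(data, sb);
            int n = Math.min(ea - sa, eb - sb);
            for (int k = 0; k < n; k++) {
                int diff = data[sa + k] - data[sb + k]; // ASCII assumed, so bytes are >= 0
                if (diff != 0) return diff;
            }
            return (ea - sa) - (eb - sb); // shorter field sorts first on a shared prefix
        };
    }

    static void sort(Path in, Path out) throws IOException {
        byte[] data = Files.readAllBytes(in);            // step 1
        List<Integer> starts = new ArrayList<>();        // step 2
        int pos = 0;
        while (pos < data.length) {
            starts.add(pos);
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // skip the '\n'
        }
        Collections.sort(starts, bySecondField(data));   // step 4
        try (OutputStream os = new BufferedOutputStream(Files.newOutputStream(out))) {
            for (int s : starts) {
                int e = s;
                while (e < data.length && data[e] != '\n') e++;
                os.write(data, s, e - s);
                os.write('\n');
            }
        }
    }
}
```

Calling ByteLineSorter.sort(inputPath, outputPath) writes the sorted lines to a second file. Keep in mind the List of Integer objects also costs memory when there are millions of lines, so the JVM heap (-Xmx) still has to be sized with that in mind.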

If you can't get it all in memory, you will have to read and sort chunks, then merge the sorted chunks afterwards.
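For the chunk-and-merge route, here is a rough sketch of one way it might look. The assumptions are mine: lines are read as ISO-8859-1 so every byte round-trips, the chunk size is given as a line count (maxLinesPerChunk), chunks go to temp files, and the class and method names (ChunkedFileSorter, key, writeChunk, merge) are invented for the example. Keys are compared with String.compareTo, which is case-sensitive.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class, not from the thread.
class ChunkedFileSorter {

    // Extracts the 2nd '|'-separated field, or "" if the line has no '|'.
    static String key(String line) {
        int a = line.indexOf('|');
        if (a < 0) return "";
        int b = line.indexOf('|', a + 1);
        return b < 0 ? line.substring(a + 1) : line.substring(a + 1, b);
    }

    // One open chunk file plus the line currently at its head.
    static final class Source {
        final BufferedReader reader;
        String head;
        Source(BufferedReader r) throws IOException { reader = r; head = r.readLine(); }
    }

    static void sort(Path in, Path out, int maxLinesPerChunk) throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader r = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1)) {
            List<String> buf = new ArrayList<>();
            String line;
            while ((line = r.readLine()) != null) {
                buf.add(line);
                if (buf.size() >= maxLinesPerChunk) { chunks.add(writeChunk(buf)); buf.clear(); }
            }
            if (!buf.isEmpty()) chunks.add(writeChunk(buf));
        }
        merge(chunks, out);
        for (Path c : chunks) Files.deleteIfExists(c);
    }

    // Sorts one chunk in memory and writes it to a temp file.
    static Path writeChunk(List<String> lines) throws IOException {
        lines.sort(Comparator.comparing(ChunkedFileSorter::key));
        Path p = Files.createTempFile("chunk", ".txt");
        Files.write(p, lines, StandardCharsets.ISO_8859_1);
        return p;
    }

    // K-way merge: repeatedly emit the smallest head line among all chunks.
    static void merge(List<Path> chunks, Path out) throws IOException {
        PriorityQueue<Source> pq =
                new PriorityQueue<>(Comparator.comparing((Source s) -> key(s.head)));
        List<Source> all = new ArrayList<>();
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.ISO_8859_1)) {
            for (Path c : chunks) {
                Source s = new Source(Files.newBufferedReader(c, StandardCharsets.ISO_8859_1));
                all.add(s);
                if (s.head != null) pq.add(s);
            }
            while (!pq.isEmpty()) {
                Source s = pq.poll();
                w.write(s.head);
                w.newLine();
                s.head = s.reader.readLine();
                if (s.head != null) pq.add(s);
            }
        } finally {
            for (Source s : all) s.reader.close();
        }
    }
}
```

Something like ChunkedFileSorter.sort(in, out, 500000) would be the call; the right chunk size depends on your heap, and every chunk file stays open during the merge, so a very small chunk size on a huge file could run into open-file limits.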

Bill
Sorry, I think I need to rephrase my problem. The file is already sorted, and I want to insert new line(s) (following the same format as my example above) into the right place. For example:

I want to insert the following into the file:
ENG|horse|C0644705|L1129264|S1355824|

And in the file it would go between the following based on the second field:
ENG|giant|C0644727|L1129215|S1355816|
ENG|house|C0644732|L1129211|S1355819|

I'm stumped. This is probably simple, but I'm really rusty. Any suggestions would be a help.
Your restatement of the problem does not resemble the original at all - anyway -

The only way to do this is by creating a new file, reading lines from the old and writing to the new until you hit the right spot to insert. If you know the old file is sorted, the right spot is when you hit a line that comes after the line to be inserted.
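A minimal sketch of that copy-and-insert approach might look like the following. The assumptions here are mine: the sort key is the 2nd '|'-separated field compared with String.compareTo, the file is read as ISO-8859-1, the new file is a temp file in the same directory that replaces the old one at the end, and the names SortedFileInserter, key and insert are invented for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, not from the thread.
class SortedFileInserter {

    // Extracts the 2nd '|'-separated field, or "" if the line has no '|'.
    static String key(String line) {
        int a = line.indexOf('|');
        if (a < 0) return "";
        int b = line.indexOf('|', a + 1);
        return b < 0 ? line.substring(a + 1) : line.substring(a + 1, b);
    }

    static void insert(Path file, String newLine) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "insert", ".tmp");
        String newKey = key(newLine);
        boolean inserted = false;
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1);
             BufferedWriter w = Files.newBufferedWriter(tmp, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = r.readLine()) != null) {
                // The right spot is just before the first line that sorts after the new one.
                if (!inserted && key(line).compareTo(newKey) > 0) {
                    w.write(newLine);
                    w.newLine();
                    inserted = true;
                }
                w.write(line);
                w.newLine();
            }
            if (!inserted) { // new line sorts after everything already in the file
                w.write(newLine);
                w.newLine();
            }
        }
        // Swap the new file in for the old one.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With the example from the question, SortedFileInserter.insert(path, "ENG|horse|C0644705|L1129264|S1355824|") would put the horse line between the giant and house lines. To insert several lines, it is probably cheaper to sort the new lines first and merge them in one pass rather than rewriting the file once per line.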
Bill
^Yeah, my restatement is completely different from my original one, sorry.

Thanks, I thought there was another way of doing it, but I guess I have to do it that way.