Best and optimum method to delete the first line from a pipe-delimited file of huge size

 
Titan Spectra
Greenhorn
Posts: 4
Hi,

I would like to know the best and optimum method to delete the first line (specifically) from a pipe-delimited file which is huge in size.

I have tried RandomAccessFile, FileChannel, buffers... but in all of these cases I have to load the entire file, perform the operation, and then rewrite the whole file.

This is a very expensive operation. I do not want to load the whole file into memory and cause memory-related problems; since the size of the file may run to gigabytes, that is not feasible.

Can anybody direct me to an optimum method to just delete the first line from the pipe-delimited file, without loading/rewriting the file?

Note: Could regular expressions help me here? The problem is that the length/format of the first line would be unknown; the only known thing would be the pipe delimiter.

Thanks in advance.
 
Madhan Sundararajan Devaki
Ranch Hand
Posts: 312
I believe that you cannot skip/delete the first line without reading the file!
 
Titan Spectra
Greenhorn
Posts: 4
Yes, agreed. But loading a huge file, which may run into gigabytes, just to delete one line is my concern.

I would like to know if there is a better way of doing it; otherwise, if I do need to load the file, which method would require the least memory?
 
fred rosenberger
lowercase baba
Bartender
Posts: 12185
"best" and "optimum" are subjective terms. Best in terms of

a) program complexity/simplicity?
b) memory consumption?
c) speed?
d) fault tolerance?
e) recoverability?

Several of these are conflicting; in other words, you can't have it all. You need to define what is most important, what is least, and HOW important each is.

Your second post indicates that memory may be an issue.

Can you read the file a line (or 10 lines, or 100 lines) at a time, write them to the output file, then get the next 'chunk'?
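
For what it's worth, here is a minimal sketch of that chunked approach, reading and writing one line at a time. The file names are placeholders, and it assumes the platform default charset and line separator are acceptable for your data:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class DropFirstLine {
    public static void main(String[] args) throws IOException {
        // "input.txt" and "output.txt" are placeholder names.
        BufferedReader in = new BufferedReader(new FileReader("input.txt"));
        BufferedWriter out = new BufferedWriter(new FileWriter("output.txt"));
        try {
            in.readLine(); // read and discard the first line
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine(); // note: writes the platform line separator
            }
        } finally {
            in.close();
            out.close();
        }
    }
}

Only one line is ever held in memory at a time; the BufferedReader/BufferedWriter pair handles the chunking internally.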
 
Titan Spectra
Greenhorn
Posts: 4
fred rosenberger wrote:"best" and "optimum" are subjective terms. Best in terms of

a) program complexity/simplicity?
b) memory consumption?
c) speed?
d) fault tolerance?
e) recoverability?

Several of these are conflicting; in other words, you can't have it all. You need to define what is most important, what is least, and HOW important each is.

Your second post indicates that memory may be an issue.

Can you read the file a line (or 10 lines, or 100 lines) at a time, write them to the output file, then get the next 'chunk'?


Oops, my bad. I should have been more specific.

In terms of what I exactly need:

a) Memory consumption and speed are the first priority
b) Recoverability would be the second priority

The rest follow; program complexity/simplicity is not an issue as long as my first two priorities can be met.

I could read the file in chunks and then rewrite the output file. But this is something I want to avoid, since I want to delete just the first line. Loading and rewriting a 1 GB file just to delete its first line is what I don't want to do; I would like to know of any other approach to this problem (like the sed/tail commands in Linux).
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13071
You appear to be asking for a magic method to cause the first line to disappear while the rest of the file "moves up" to occupy the space previously used.

Think about it - how would an operating system store a file such that this is possible?

Reading and writing binary blocks to a new file, using a block size that fits the operating system's internal buffers, is your best bet. Do NOT do any character conversion; stick to binary.
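
A minimal sketch of such a binary, block-at-a-time copy, assuming the first line ends at the first '\n' byte and that an 8 KB block is a reasonable guess at the OS buffer size (file names are placeholders):

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BinaryDropFirstLine {
    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream("input.txt"));
        OutputStream out = new BufferedOutputStream(new FileOutputStream("output.txt"));
        try {
            // Skip raw bytes until just past the first '\n'.
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                // discard the first line, byte by byte
            }
            // Copy the rest as raw blocks; no character conversion anywhere.
            byte[] block = new byte[8192];
            int n;
            while ((n = in.read(block)) != -1) {
                out.write(block, 0, n);
            }
        } finally {
            in.close();
            out.close();
        }
    }
}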

Bill

 
Paul Clapham
Sheriff
Posts: 21322
Titan Spectra wrote:Can anybody direct me to an optimum method to just delete the first line from the pipe-delimited file, without loading/rewriting the file?


There isn't any such method, let alone an "optimum" method. Like Bill says, operating systems don't support that sort of thing.

If you omit the requirements at the end of that sentence, then the optimum method is to read the file one line at a time and write out a new version, omitting the first line. There is absolutely no need to read the entire file into memory.
 
Pat Farrell
Rancher
Posts: 4678
Paul Clapham wrote:There isn't any such method, let alone an "optimum" method. Like Bill says, operating systems don't support that sort of thing.

Some operating systems would let you get very close. But I don't know of any in widespread use today that support it.

VMS (the VAX OS of the 1970s and 80s) stored a file not as a string of bytes but as an array of records, with a binary record descriptor at the start of each record. For normal ASCII files, each record was just a line of the file. On a VAX you did not have a \n to delimit the line; rather, the binary record descriptor at the beginning of each line/record held the number of bytes in the record, which could be padded. With this, you could make the first record disappear by simply changing the descriptor for the first record to show zero interesting bytes.

This did not, of course, actually make the file smaller. To do that, you have to read each and every byte of the file and write out the ones you like.
Doing this for files of a gigabyte or two will not take all that long, assuming you don't do a lot of buffer reallocation/garbage collection. Naturally you want to read only a buffer's worth at a time.
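
One way to keep buffer churn near zero (a sketch of my own, not something spelled out above) is to locate the end of the first line with a single reused buffer, then let FileChannel.transferTo hand the bulk copy to the operating system; the file names are placeholders:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ChannelDropFirstLine {
    public static void main(String[] args) throws IOException {
        FileInputStream fis = new FileInputStream("input.txt");
        FileOutputStream fos = new FileOutputStream("output.txt");
        try {
            FileChannel in = fis.getChannel();
            FileChannel out = fos.getChannel();

            // Find the offset just past the first '\n', reusing one buffer.
            ByteBuffer buf = ByteBuffer.allocate(8192);
            long offset = 0;
            search:
            while (in.read(buf) != -1) {
                buf.flip();
                while (buf.hasRemaining()) {
                    offset++;
                    if (buf.get() == '\n') break search;
                }
                buf.clear();
            }

            // Hand the rest of the copy to the OS. transferTo may move fewer
            // bytes than requested, so loop until everything is written.
            long pos = offset;
            long size = in.size();
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        } finally {
            fis.close();
            fos.close();
        }
    }
}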
 
Titan Spectra
Greenhorn
Posts: 4
I think I get it now; the posts have helped me, and I will try out the solutions offered.

A very big thank you to William, Paul, Pat, Fred, and Madhan for all your help and input on the problem. I will try out the suggestions and update the thread.
 