Checking a file has been split properly

 
Ranch Hand
Posts: 76
Hi,

I am writing a program which will split a file into multiple smaller files. After the split, I need to make sure that all of the data made it into the smaller files.

After splitting, I was thinking of counting the number of lines in the master file and then counting the lines in each of the smaller files; the sum of the lines in the smaller files should equal the number of lines in the master file. I'm worried this might take a long time, and I'm wondering whether there is a quicker way of doing it.
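A minimal sketch of the line-count audit, assuming the splitter only ever cuts at line boundaries and that the pieces are passed in as a list of paths (the class and method names are hypothetical). Reading with ISO-8859-1 maps every byte to a character, so files in an unknown encoding can be counted without decoding errors:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

class LineCountCheck {

    // Counts lines the way BufferedReader.readLine() splits them:
    // "\n", "\r" and "\r\n" each end one line, and a final line
    // without a trailing terminator still counts as a line.
    // ISO-8859-1 decodes any byte sequence, so this never fails on bad input.
    static long countLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.ISO_8859_1)) {
            return lines.count();
        }
    }

    // True if the line counts of the pieces add up to the master file's count.
    static boolean lineCountsMatch(Path master, List<Path> pieces) throws IOException {
        long total = 0;
        for (Path piece : pieces) {
            total += countLines(piece);
        }
        return total == countLines(master);
    }
}
```

Note that this reads every byte of every file, so it costs about as much time as the split itself.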

Another possible approach is the File.length() method: should the sum of the lengths of the smaller files be the same as the length of the master file?
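A minimal sketch of the size check with File.length(), assuming the splitter copies bytes unchanged (no line terminators dropped, added or converted). The class and method names are hypothetical:

```java
import java.io.File;
import java.util.List;

class SizeCheck {

    // Sums File.length() over the pieces and compares it with the master file.
    // File.length() returns 0L for a missing file, so existence is checked
    // explicitly instead of letting a missing piece silently count as empty.
    static boolean sizesMatch(File master, List<File> pieces) {
        if (!master.isFile()) {
            return false;
        }
        long total = 0;
        for (File piece : pieces) {
            if (!piece.isFile()) {
                return false;
            }
            total += piece.length();
        }
        return total == master.length();
    }
}
```

This only reads file metadata, so it is fast, but it cannot tell whether the bytes themselves are correct.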

Would appreciate your thoughts and ideas on this.

Many Thanks
 
(instanceof Sidekick)
Posts: 8791
Sounds like your files are all text? A line-count audit should add up OK. A byte count might not, since you're losing one line terminator per file (except the last file) in the split, and you might change the line break from CR to CRLF or vice versa.

Probably the best way to make sure you didn't lose or break any data would be to put the files back together and see if the new merged file matches the original, maybe with a 3rd party compare tool.
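One way to do the rejoin-and-compare without writing a merged file to disk is to read the pieces back to back through a SequenceInputStream and compare them byte by byte with the original. A sketch, assuming the pieces are supplied in the order they were written (class and method names are hypothetical):

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RejoinCheck {

    // True if the pieces, concatenated in the given order, are
    // byte-for-byte identical to the master file.
    static boolean rejoinedMatches(Path master, List<Path> pieces) throws IOException {
        List<InputStream> streams = new ArrayList<>();
        try {
            for (Path piece : pieces) {
                streams.add(Files.newInputStream(piece));
            }
            try (InputStream original = new BufferedInputStream(Files.newInputStream(master));
                 InputStream joined = new BufferedInputStream(
                         new SequenceInputStream(Collections.enumeration(streams)))) {
                int a;
                int b;
                do {
                    a = original.read();
                    b = joined.read();
                    if (a != b) {
                        return false; // a differing byte, or one side ended early
                    }
                } while (a != -1);
                return true;
            }
        } finally {
            // Covers the case where opening a later piece failed before the
            // SequenceInputStream existed; closing a closed stream has no effect.
            for (InputStream in : streams) {
                in.close();
            }
        }
    }
}
```

If a merged file does get written, Files.mismatch(Path, Path) (Java 12 and later) returns -1 when two files are identical, which saves writing the comparison loop.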
 
Wanderer
Posts: 18671
Hm, Stan seems to be making several assumptions about how line terminators are being processed here, and I'm not sure they're warranted. I think the poster needs to determine whether things like line terminators will be changed during processing, and whether a new line terminator may be added at the end of each split file. Offhand I don't think either of those is necessary, though they may be desirable, or not, depending on what this application is for.

If you're concerned about time, then counting lines in a file does ultimately require that you read each and every byte of the file to discover if it's a line terminator (or part of one). If you're going to do that, you might as well also use some sort of checksum to verify that all the data is valid, not just the number of lines. The time spent calculating a checksum should be small compared to the time spent reading from the file in the first place. If you want something less reliable but much faster, and if you're not changing line terminators (something I typically find unnecessary or even undesirable anyway), then simply adding up the total file sizes should work reasonably well.
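A sketch of the checksum idea using java.util.zip.CRC32. Identical bytes always produce the same CRC, and different bytes almost always produce a different one, so feeding the pieces into a single CRC32 in order should reproduce the master file's value only if nothing was lost or altered. It assumes the pieces are listed in split order; the class and method names are hypothetical. CRC32 targets accidental corruption; a MessageDigest such as SHA-256 could be swapped in for stronger guarantees:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.CRC32;

class ChecksumCheck {

    // Feeds every byte of the given files, in order, into one CRC32.
    static long crcOf(List<Path> files) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[64 * 1024];
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    crc.update(buffer, 0, n);
                }
            }
        }
        return crc.getValue();
    }

    // The pieces pass if their combined checksum equals the master's.
    static boolean checksumsMatch(Path master, List<Path> pieces) throws IOException {
        return crcOf(List.of(master)) == crcOf(pieces);
    }
}
```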