Best Approach to read huge files utilizing multithreading

 
Vaibhav Gargs
Ranch Hand
If we have to read huge files (several GB in size), what is the optimal way to read them using multithreading?
 
Stephan van Hulst
Saloon Keeper
Using more than one thread to read a file is usually a really bad idea. However, if the contents of the file were designed for it, you can do it.

The file requires an index of sorts that says which records can be found in what position in the file. Read the index to find out what records the file contains, and divide them up over separate tasks that are responsible for processing a portion of the records.
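As a rough illustration of the index idea, here is a minimal sketch. It assumes a hypothetical index that is already loaded as a list of (offset, length) entries and records that are UTF-8 text; the names IndexedFileReader, IndexEntry and readRecords are made up for this example. Each record is read by its own task using positional reads on a shared FileChannel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IndexedFileReader {

    /** Hypothetical index entry: where one record starts and how many bytes it has. */
    public static final class IndexEntry {
        final long offset;
        final int length;

        public IndexEntry(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Reads every indexed record on a thread pool; results come back in index order. */
    public static List<String> readRecords(Path file, List<IndexEntry> index, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            List<Future<String>> futures = new ArrayList<>();
            for (IndexEntry entry : index) {
                Callable<String> task = () -> readRecord(channel, entry);
                futures.add(pool.submit(task));
            }
            List<String> records = new ArrayList<>();
            for (Future<String> future : futures) {
                records.add(future.get()); // blocks until that task has finished
            }
            return records;
        } finally {
            pool.shutdown();
        }
    }

    // Positional reads do not move the channel's own position,
    // so several threads can safely share one FileChannel.
    private static String readRecord(FileChannel channel, IndexEntry entry) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(entry.length);
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, entry.offset + buffer.position());
            if (n < 0) {
                throw new IOException("Index entry points past the end of the file");
            }
        }
        return new String(buffer.array(), StandardCharsets.UTF_8);
    }
}
```

In a real program each task would handle a batch of records rather than a single one, and whether this beats a single-threaded read depends heavily on the storage device.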

Why do you want to do this?
 
Vaibhav Gargs
Ranch Hand
Thank you, Stephan. It was an interview question, so I was just wondering how we can read and process large files optimally.
 
Campbell Ritchie
Marshal

Stephan van Hulst wrote: Using more than one thread to read a file is usually a really bad idea. . . .

Won't the file be locked by the OS, preventing several threads accessing it in the first place?

Consider reading line by line and passing the results to a parallel stream.
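A minimal sketch of that suggestion, assuming each line can be processed on its own. The class name ParallelLineCounter and the "count lines containing some text" task are placeholders chosen for illustration. The file is still read sequentially; only the per-line work is spread over threads.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class ParallelLineCounter {

    /** Counts the lines that contain the given text, processing lines in parallel. */
    public static long countMatching(Path file, String text) throws IOException {
        // Files.lines reads lazily (UTF-8 by default); closing the stream releases the file.
        try (Stream<String> lines = Files.lines(file)) {
            return lines.parallel()
                        .filter(line -> line.contains(text))
                        .count();
        }
    }
}
```

This only pays off when the work done per line is expensive compared with reading the line itself.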
 
Stephan van Hulst
Saloon Keeper
Operating systems (at least Windows) only lock files for a single process. The process itself can still access it with multiple threads. You will first want to map the file to memory.

Using lines() only works if each record is represented by a single line, and each line can be processed without any other context.

I haven't tested it, but I think you could do something like this:
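The snippet below is not the code from the original post. It is only a minimal sketch of the memory-mapping idea, assuming the per-chunk work does not depend on record boundaries (here, counting newline bytes). The names MappedNewlineCounter, countNewlines and chunkSize are invented for this example.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MappedNewlineCounter {

    /** Counts '\n' bytes by mapping the file in chunks and scanning each chunk in its own task. */
    public static long countNewlines(Path file, int threads, int chunkSize) throws Exception {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long fileSize = channel.size();
            List<Future<Long>> futures = new ArrayList<>();
            for (long start = 0; start < fileSize; start += chunkSize) {
                long offset = start;
                long length = Math.min(chunkSize, fileSize - start);
                Callable<Long> task = () -> countInChunk(channel, offset, length);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get(); // wait for each chunk before closing the channel
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    private static long countInChunk(FileChannel channel, long offset, long length)
            throws IOException {
        // One mapping cannot exceed Integer.MAX_VALUE bytes, hence the chunking.
        MappedByteBuffer chunk = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
        long count = 0;
        while (chunk.hasRemaining()) {
            if (chunk.get() == '\n') {
                count++;
            }
        }
        return count;
    }
}
```

If records can straddle a chunk boundary, each task would also need a rule for who handles the partial record at the start and end of its chunk.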
 
Campbell Ritchie
Marshal
Thank you