Reading a very Large Text File without Delimiters

 
Nikki Tha
Greenhorn
Posts: 13
My requirement:

1. I need to read a text file whose fields are defined by position (there are no delimiters).
2. It is a very big file, about 20k lines of data.
3. Each line has exactly 10 fields, which I want to store in an ArrayList.
4. I am worried that if I use substring(), the String pool will fill up.

Could anyone suggest a good solution for this?

Thanks in advance
 
Dave Tolls
Rancher
Posts: 4801
What do you want to do with the data?
 
Campbell Ritchie
Marshal
Posts: 73980

Nikki Tha wrote:. . . without delimiters . . . Each Line has exactly 10 fields . . .

That is contradictory; do you mean that each line contains the ten data without special characters separating them? If you are not using a particular character, not even spaces, to separate the data, they must have predefined lengths. In which case you can use substring. . . .

But I suggest you start by reading the individual lines.

What do you mean about the data having to go into a List? That sounds like bad design; you should write a class that encapsulates those ten data instead.
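
Something along these lines is what I have in mind. It is only a sketch: it assumes Java 16 or later for records, and the name Person and the column positions are made up, because the real field widths have not been posted. It also shows just three of the ten fields.

```java
// Hypothetical model of one line; only three of the ten fields are shown.
record Person(String firstName, String lastName, String areaCode) {

    // Assumed layout (not from the real file): columns 0-11 first name,
    // 12-23 last name, 24-28 area code. Replace with the real positions.
    private static final int FIRST_END = 12;
    private static final int LAST_END = 24;
    private static final int AREA_END = 29;

    // Cuts the fields out by position and strips the padding spaces.
    static Person parse(String line) {
        if (line.length() < AREA_END) {
            throw new IllegalArgumentException("Line too short: " + line.length());
        }
        return new Person(
                line.substring(0, FIRST_END).trim(),
                line.substring(FIRST_END, LAST_END).trim(),
                line.substring(LAST_END, AREA_END).trim());
    }
}
```

You would call Person.parse(line) once per line and then work with the accessor methods rather than with positions in a List.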
 
Nikki Tha
Greenhorn
Posts: 13
I have to take a few fields out of each line, e.g. firstname and lastname together, and single fields like area code. Then I have to search the DB for any users with that combination of firstname and lastname, and for any users with that area code. If such users are present, I have to write all of those details to another file and send it.

Thanks
 
Nikki Tha
Greenhorn
Posts: 13

Campbell Ritchie wrote:

Nikki Tha wrote:. . . without delimiters . . . Each Line has exactly 10 fields . . .

That is contradictory; do you mean that each line contains the ten data without special characters separating them? If you are not using a particular character, not even spaces, to separate the data, they must have predefined lengths. In which case you can use substring. . . .

But I suggest you start by reading the individual lines.

What do you mean about the data having to go into a List? That sounds like bad design; you should write a class that encapsulates those ten data instead.

Yes, the fields are of fixed length, and I know that I can do it with substring. But my question is: as this is a very big file, using substring won't be a good option, right? So is there a better solution?
The data looks like:
Firstname1Lastname1City1State1Country1Areacode1Address1
Firstname2Lastname2City2State2Country2Areacode2Address2
 
Campbell Ritchie
Marshal
Posts: 73980
20,000 lines isn't very large. 20,000,000 lines might be. I can't foresee any performance problems.
I presume you can read all the lines and print them to screen unchanged?

I can see another contradiction: you said all the fields are the same length and you said they are names. But names are different lengths. So are addresses. The fields therefore have to be different lengths. Are they separated by whitespace, then? Please show us an example line.
And what is this about databases? You haven't had somebody create that file from a database, have you?
 
Nikki Tha
Greenhorn
Posts: 13
Hi

The data looks like this:

Firstname1   Lastname13City1State1Country 1 Areacode1 Address1
Firstname234Lastname2 City2State2Country  2Areacode2 Address2

The length for each field is fixed.
 
Paul Clapham
Sheriff
Posts: 26770
Okay. So the length of each line is fixed too, and can be calculated. So after you calculate that number, write code which reads that number of characters into a String and then splits out the fields using elementary String methods. Repeat until there's no more data.
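
A rough sketch of that loop, under the assumption that recordLength either includes the line terminator or that the file has none. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

final class FixedRecordReader {

    // Reads records of exactly recordLength characters until the data runs out.
    static List<String> readRecords(Reader in, int recordLength) throws IOException {
        List<String> records = new ArrayList<>();
        char[] buffer = new char[recordLength];
        while (true) {
            int filled = 0;
            // Reader.read may deliver fewer characters than requested, so keep going.
            while (filled < recordLength) {
                int n = in.read(buffer, filled, recordLength - filled);
                if (n == -1) {
                    break;
                }
                filled += n;
            }
            if (filled == 0) {
                return records; // clean end of data
            }
            if (filled < recordLength) {
                throw new IOException("Truncated record of " + filled + " characters");
            }
            records.add(new String(buffer, 0, recordLength));
        }
    }
}
```

This version collects the records in a List only to keep the example short. In real use you could split out the fields and handle each record inside the loop instead.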
 
Dave Tolls
Rancher
Posts: 4801
Since there doesn't seem to be any relationship between lines (from your description of what you do with the data), then there is no need at all to read the whole file in.
Just read a line, create an object based on a model of the data you are working with, process that one then move on.

Indeed, especially if you're just comparing with data in the database, this is the sort of thing that would work well in parallel.
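
As a sketch of the one-line-at-a-time approach: the Predicate below stands in for the database lookup, which I have not seen, and the class and method names are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.function.Predicate;

final class LineByLineProcessor {

    // Copies every line accepted by the matcher to the output and returns the count.
    // The matcher is a placeholder for "does the database have a user like this?".
    static int copyMatches(BufferedReader in, Predicate<String> matcher, Writer out)
            throws IOException {
        int written = 0;
        String line;
        while ((line = in.readLine()) != null) { // only the current line is held in memory
            if (matcher.test(line)) {
                out.write(line);
                out.write(System.lineSeparator());
                written++;
            }
        }
        return written;
    }
}
```

In real code the matcher would parse the line into your model object first and then query the database with the relevant fields.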
 
Campbell Ritchie
Marshal
Posts: 73980

Paul Clapham wrote:Okay. So the length of each line is fixed too, and can be calculated. . . .

That sounds ideal for the String method we mentioned earlier. As long as you count the sizes of the fields carefully, that should produce no problems. I presume that all entries in the fields use only old‑fashioned Unicode characters and none is greater than 0xffff (substring counts chars, so a character above 0xffff would take up two positions). As DT says, create an object from that line.

Your fields in the String are in effect delimited by \s*: an entry shorter than the space available is separated from its successor by whitespace, but a datum filling its available space is separated from its successor by zero whitespace characters. So don't try String#split. Using it with \s* or \s+ will give you incorrect results: \s+ will fail to split where a datum fills its field, and \s* will split the line into one‑character Strings.
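
Here is a small demonstration you can run. The widths (12 and 10) and the class name are assumptions chosen to fit the first two fields of the second sample line; they are not the real layout.

```java
import java.util.Arrays;

final class SplitPitfall {

    // Hypothetical widths for a two-field record: 12 characters, then 10.
    static final int FIRST_WIDTH = 12;
    static final int LAST_WIDTH = 10;

    // Cuts the two fields out by position and removes the padding.
    static String[] byPosition(String line) {
        return new String[] {
            line.substring(0, FIRST_WIDTH).trim(),
            line.substring(FIRST_WIDTH, FIRST_WIDTH + LAST_WIDTH).trim()
        };
    }

    public static void main(String[] args) {
        // "Firstname234" fills all 12 columns, so no whitespace precedes "Lastname2".
        String line = "Firstname234Lastname2 ";
        System.out.println(Arrays.toString(line.split("\\s+"))); // one merged token
        System.out.println(line.split("\\s*").length);           // far more than 2 pieces
        System.out.println(Arrays.toString(byPosition(line)));   // the two intended fields
    }
}
```

Only the positional version gives you back the two fields you wanted.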
 