
Fast indexing / searching of a text file

 
Ranch Hand
Posts: 407
Hi: I am writing programs that read data from massive text files.
What is the best way to do this in Java? Is indexing a possibility, and if so, how do I jump from one index to another? Thanks, Jay
 
Bartender
Posts: 1638
Yes, indexing is definitely a way to reduce the time needed to fetch a record.
I am not sure what your requirements are, but Apache Lucene is a free text search engine that you may be interested in.
This article gives an insight into how to use a RandomAccessFile to build a small database. It may not fit your requirements perfectly, but it may give you a head start on indexes and on accessing records through them.
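To make the RandomAccessFile idea concrete, here is a minimal sketch (my own, not taken from the article) that assumes the file is line-oriented UTF-8 text and that every line fits in an int-sized buffer. The class name LineOffsetIndex and its methods are hypothetical. It scans the file once to record where each line starts, then uses seek() to jump straight to any line.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: an in-memory index of line start offsets.
class LineOffsetIndex {

    // Scan the file once and record the byte offset at which each line starts.
    static List<Long> build(Path file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    offsets.add(pos);
                    atLineStart = false;
                }
                if (b == '\n') {
                    atLineStart = true;
                }
                pos++;
            }
        }
        return offsets;
    }

    // Jump to line n (0-based) without reading the lines before it.
    static String readLine(Path file, List<Long> offsets, int n) throws IOException {
        long start = offsets.get(n);
        long end = (n + 1 < offsets.size()) ? offsets.get(n + 1) : Files.size(file);
        byte[] buf = new byte[(int) (end - start)]; // assumes a line is < 2 GB
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(start);
            raf.readFully(buf);
        }
        // Drop the trailing line terminator, if any.
        return new String(buf, StandardCharsets.UTF_8).replaceAll("[\\r\\n]+$", "");
    }
}
```

For a really massive file the offset list itself can get large, so you might write it to a second file instead of keeping it in memory. The lookup pattern (find the offset, seek, read) stays the same.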
 
Rancher
Posts: 43081
Reading data from a file is different from searching an index of the file, because the index typically does not contain the full text of the indexed documents. So whether an index would help depends on what exactly you need to do with the text.

I don't understand what you mean by "jump from one index to another" - random access of the file contents?
 
Nitesh Kant
Bartender
Posts: 1638

Ulf: the index typically does not contain the full text of the indexed documents.


True, but the index will typically give me the record pointer, won't it?
So if I have indexed a text file to give me record pointers and record lengths for the records containing a particular value of the indexed field, I can quickly retrieve those records from the file, can't I?
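A rough sketch of what I have in mind, under assumptions the original poster has not confirmed: one record per line, comma-separated fields, UTF-8 text. The class FieldIndex, the nested Pointer type and the lookup method are names made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index: field value -> (offset, length) of each matching record.
class FieldIndex {

    // Record pointer: where the record starts and how many bytes it spans.
    static final class Pointer {
        final long offset;
        final int length;
        Pointer(long offset, int length) { this.offset = offset; this.length = length; }
    }

    private final Map<String, List<Pointer>> byValue = new HashMap<>();
    private final String path;

    // Scan the file once, indexing the field at position fieldNo (0-based).
    FieldIndex(String path, int fieldNo) throws IOException {
        this.path = path;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(path)))) {
            ByteArrayOutputStream rec = new ByteArrayOutputStream();
            long start = 0, pos = 0;
            int b;
            while ((b = in.read()) != -1) {
                pos++;
                if (b == '\n') {
                    add(rec, start, fieldNo);
                    rec.reset();
                    start = pos; // the next record begins after the newline
                } else {
                    rec.write(b);
                }
            }
            add(rec, start, fieldNo); // last record without a trailing newline
        }
    }

    private void add(ByteArrayOutputStream rec, long start, int fieldNo) {
        if (rec.size() == 0) return; // skip empty lines
        String[] fields = new String(rec.toByteArray(), StandardCharsets.UTF_8).trim().split(",");
        if (fieldNo < fields.length) {
            byValue.computeIfAbsent(fields[fieldNo].trim(), k -> new ArrayList<>())
                   .add(new Pointer(start, rec.size()));
        }
    }

    // Fetch only the matching records by seeking to each stored pointer.
    List<String> lookup(String value) throws IOException {
        List<String> out = new ArrayList<>();
        List<Pointer> ptrs = byValue.getOrDefault(value, Collections.emptyList());
        if (ptrs.isEmpty()) return out;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            for (Pointer p : ptrs) {
                byte[] buf = new byte[p.length];
                raf.seek(p.offset);
                raf.readFully(buf);
                out.add(new String(buf, StandardCharsets.UTF_8).trim());
            }
        }
        return out;
    }
}
```

The index holds only field values and pointers, not the record text, so a lookup still goes back to the file. It just reads the matching bytes instead of scanning everything.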

Ulf: I don't understand what you mean by "jump from one index to another" - random access of the file contents?


I assumed this! It's worthwhile getting this confirmed.
 
Ulf Dittmer
Rancher
Posts: 43081

Nitesh: the index will typically give me the record pointer, won't it? So if I have indexed a text file to give me record pointers and record lengths for the records containing a particular value of the indexed field, I can quickly retrieve those records from the file, can't I?



It is possible to create an index like that. But that may or may not address the underlying problem. In particular, we don't know if there's a notion of structure or records within the files. That's why I asked the original poster to clarify what he's trying to do.
 
Nitesh Kant
Bartender
Posts: 1638

Ulf: That's why I asked the original poster for clarification what he's trying to do.


Oh yeah, absolutely, your question was perfectly valid. I was just confirming my understanding.
 