posted 14 years ago
If the search software is supposed to be available on each server, then every single server needs to inform all the other servers of any changes made to it, which is not a very appealing proposition. An alternative would be to have a single "master" server that collects updates from all servers and then distributes them to all servers; that way, only one server needs to communicate with all the others. Taking the "master server" concept a bit further, I think it would be preferable to do the searching on just a single server (the master); any search results could then point back to the server where the file is to be found.
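To make the "single master" idea a bit more concrete, here is a rough in-memory sketch of what the master could keep. The class and method names (MasterIndex, fileChanged, fileDeleted, search) are made up for illustration, the substring match is only a placeholder for a real search library, and the record syntax assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the central index held by the master. */
final class MasterIndex {

    /** Identifies a file by the server that holds it and its path on that server. */
    record Location(String server, String path) {}

    // Maps each known file to the lower-cased text reported for it.
    private final Map<Location, String> entries = new ConcurrentHashMap<>();

    /** Called when a file server reports a new or changed file. */
    void fileChanged(String server, String path, String text) {
        entries.put(new Location(server, path), text.toLowerCase(Locale.ROOT));
    }

    /** Called when a file server reports that a file is gone. */
    void fileDeleted(String server, String path) {
        entries.remove(new Location(server, path));
    }

    /** Returns the locations of all files whose text contains the term (case-insensitive). */
    List<Location> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Location> hits = new ArrayList<>();
        for (Map.Entry<Location, String> e : entries.entrySet()) {
            if (e.getValue().contains(needle)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```

The point of the design is that each hit carries the server name along with the path, so the client can go back to that server for the file itself; the master never needs to hold the files.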
While using RMI is a possibility, it's kind of on its way out in this world of interoperability. Using a RESTful web service to send the updates would be a better approach, IMO. (CORBA, of course, is truly dead.)
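As a sketch of what sending an update over REST could look like from a file server, here is one way to build the request with the JDK's own java.net.http API (Java 11 or later). The /updates endpoint, the JSON field names, and the UpdateNotifier class are all assumptions for the example, not an existing API.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper a file server could use to report a change to the master. */
final class UpdateNotifier {

    /** Escapes backslashes and double quotes only; control characters are not handled. */
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the JSON body describing one file change. */
    static String toJson(String server, String path, String action) {
        return "{\"server\":\"" + jsonEscape(server) + "\","
             + "\"path\":\"" + jsonEscape(path) + "\","
             + "\"action\":\"" + jsonEscape(action) + "\"}";
    }

    /** Builds (but does not send) a POST to the assumed /updates endpoint on the master. */
    static HttpRequest buildUpdate(URI masterBase, String server, String path, String action) {
        return HttpRequest.newBuilder(masterBase.resolve("/updates"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJson(server, path, action)))
                .build();
    }
}
```

Sending it would then be something like HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). In real code a JSON library would be a safer choice than hand-built strings.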
For indexing and searching, the Apache Lucene library is king these days. If you can extract the text and metadata that you want to go into the index by other means (maybe JPedal or PDFBox for PDFs, Apache POI for MS Office files, etc.), then it should work nicely for this.