[First of all, sorry for this long reply - it ended up much longer than I wanted it to be. If you want the TL;DR version, scroll down to the 3 options below "Phew" before going for Solr]
In my opinion,
you should consider Solr only if you find that it has features that you need, but are simply not available in Oracle, and are are too complex or effort consuming to implement in the application layer.
Solr has some nice features that are very useful for some projects. Features such as:
stemming, for language and grammar aware searchingfaceting, for supporting filtered search interfacessearching documents like PDFs and office formatsdistributed and high availability searches using sharding and replication
On the other hand, it is a standalone specialized server. Which means the application layer is responsible for smoothing out all the integration headaches with master database.
Issues will crop up, such as:
Justifying data consistency requirements:
I've found that a lot of software engineers assume and overestimate the need for strong consistency among users.
Most users rarely notice and often don't really care if there are some results that shouldn't be shown.
But for logical people like developers, strong consistency seems like a neat and clean thing, while eventual consistency looks sloppy.
You'll have to think and discuss which approach a majority of your users will find acceptable.
Strong consistency is usually a pain to implement and often not worth the effort, especially when you need it between multiple data stores, like an oracle db and a solr index.
On the other hand, if you sacrifice strong consistency, there will always be a couple of fault reports from users who complain that a search result could not be opened, and they will have to be tackled.
So ask yourself, how critical is strong consistency?
If it's absolutely critical (think Air Traffic Control system), you'll have to re-architect most of your application's DAOs and business logic to implement transactional consistency between DB and Solr.
If not so critical, Solr integration is magnitudes easier because you can configure it to autonomously pull data from db.
Strong consistency usually requires a central broker component for routing all database operations
You mention "1000 insert / updates per second by other applications". If multiple applications are autonomously updating your tables, then you'll have a tough time keeping the Solr index strongly consistent with DB. Does each application update the Solr index? Multiple applications updating a single Solr index will itself result in eventual consistency issues between the applications, because Solr keeps delta changes between index updates and reconciles them once in a while.
So this again brings up the same question of strong or eventual consistency.
Data transfer strategy:
Deciding whether application should push database updates to Solr, or Solr should retrieve them. Related to the above point on consistency.
If you want strong transaction level consistency between database and Solr, you will have to re-architect your application to push the data, since pulling it from Solr using the DIH (Data Import Handler) is always asynchronous, which affords you only eventual consistency. This will manifest as search results which when clicked give "database record not found" errors. Is that acceptable to your users?
Network latencies
Since all commands and queries from application to Solr go as HTTP requests over the network, you are introducing additional network latencies here. Keep that in mind, especially because your problem
seems to more about performance than missing features.
Keeping DB schema and Solr schema in lock step
Solr has its own schema, described in XML configuration files. Whenever you change DB schema, you'll have to update the Solr schema too and then reindex all affected records.
Handling JOIN tables
This is an interesting problem. Since JOINs are dynamically created views of multiple tables but Solr schemas are static configurations, you will likely have to create Solr schemas for every JOIN query you write and index the JOIN results before hand.
Generally, I don't follow this. I implement search result joining in the application layer, by combining 2 different Solr search results into one single result set. But then again, all those applications were being written newly, so I had the opportunity to pre-plan. To change an existing system would be prohibitively effort consuming, I suspect.
Phew! So what next?
In my opinion, it's better to first explore and exhaust ways to use the system of record (Oracle db in your case) itself to solve your problem, before introducing a secondary specialized data store like Solr and all the additional integration problems it brings with it.
Some possibilities I can think of:
1. Basic check: Are all your searchable columns indexed? Use Explain plans to find out bottlenecks. Invariably, a lot of performance can be strained out of the DB this way.
2. Use Oracle's full text search capabilities: Are you aware of
Oracle Text? It supports features like stemming and consistency.
3. Use replication to separate your transactional workload from querying workload:
This is a common technique used in data warehousing (OLTP vs OLAP). Basically, you replicate your database.
One replica is used as the primary system of record, and receives all your live inserts, updates and deletes.
The other replica has additional views where tables are joined, and text or SQL queries are executed.
Some additional thoughts on your requirements:
100+ million rows
Not a problem by itself for Solr, neither for indexing nor querying. JOINs may be the bigger problem.
1000 insert / updates per second by other applications
Have already described above the eventual consistency issues when a single Solr index is updated by multiple autonomous applications. Alternatively, you'll require a central broker to do it, but that means re-architecture.
Results should come faster
Explore possibilities of doing this with Oracle itself, before deciding on Solr.
Results should be accurate
Accuracy is pretty subjective. Solr supports things like stemming and wildcard searches, but whether they are judged as more or less accurate compared to DB searches depends on your users. Use A/B user
testing to decide that.
Results should have the latest data (ie, strong consistency with database)
This is a minefield, for reasons described above. First decide with all stakeholders whether you want to enter this minefield at all, before searching for a safe route through it. A manager / developer /tester will almost always say "hurrah for strong consistency", but your users may not even care.