I am working on a project to implement distributed grep and distributed sorting using MapReduce. I have an ordinary Core i5 laptop and I don't have a distributed environment to work on, so I am thinking of starting with a simplistic approach.
From your description of the problem, it looks like the intended system should indeed be distributed across machines at some point ("distributed grep and distributed sorting"), and the strategy for doing so has already been decided as MapReduce (presumably using Hadoop).
The only problem seems to be that for your development purposes you don't have a cluster of machines at the moment.
I don't think the solution should be decided by the non-availability of development resources. Rather, it should be decided by how the system is finally going to be deployed in production.
You can start off by installing Hadoop in single-machine (standalone) mode on your laptop (the Hadoop documentation has a tutorial that explains how - very easy to do).
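As a rough sketch of what standalone mode looks like in practice, here is the kind of session the Hadoop single-node tutorial walks you through. The exact paths and the examples-jar version depend on the release you download, so treat these as illustrative:

```shell
# Assumes a Hadoop tarball has been unpacked and JAVA_HOME is set;
# directory layout and jar name vary by Hadoop version.
mkdir input
cp etc/hadoop/*.xml input

# Run the bundled grep example in standalone mode - a single JVM,
# no daemons, no HDFS; local files in, local files out.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    grep input output 'dfs[a-z.]+'

cat output/*
```

The point is that the same job jar runs unchanged later on a real or virtual cluster; only the configuration underneath it changes.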
You can later simulate a cluster of machines on your laptop by installing a virtualization product like VirtualBox, creating at least one virtual machine (your host machine and the virtual machine will play the roles of name node and task tracker), installing Hadoop on both of them, and running your jobs on this "virtual cluster". There is a learning curve involved here, but it'll be well worth it.
If at a later point you have access to more machines, you can very easily add them to your Hadoop setup. The grepping and sorting logic (Hadoop already supports sorted aggregation) will remain the same, regardless of whether Hadoop runs on a single machine or a cluster.
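To make the "logic stays the same" point concrete, here is a plain-Java sketch of the two phases a grep-with-counts job goes through, without any Hadoop dependency. The class and method names are my own invention; a `TreeMap` stands in for Hadoop's sorted-by-key shuffle:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Single-machine sketch of the two MapReduce phases for grep:
// the "map" step filters matching lines, the "reduce" step sums
// the count per matched line. TreeMap keeps keys sorted, which is
// roughly what Hadoop's shuffle gives you for free.
public class GrepMapReduceSketch {
    static Map<String, Integer> grepCounts(List<String> lines, Pattern pattern) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {                  // "map": emit matching lines
            if (pattern.matcher(line).find()) {
                counts.merge(line, 1, Integer::sum); // "reduce": sum per key
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "error: disk", "ok", "error: net", "error: disk");
        System.out.println(grepCounts(lines, Pattern.compile("error")));
        // → {error: disk=2, error: net=1}
    }
}
```

Once this logic is expressed as a Hadoop Mapper/Reducer pair, the framework takes care of distributing it; the analysis code itself doesn't change.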
If I use threads to implement the same, will it be possible for me to have each thread use one core and execute simultaneously? Please help.
How each thread is scheduled and assigned to a core depends on how the JVM is implemented, how the underlying OS in turn schedules threads, what other applications are occupying the processor, etc. It's rather emergent behaviour. Java has no explicit core-affinity capability - you can't tell Java "I have 4 cores and I want this thread to run on this core and that thread to run on that core". You just implement multithreading using the Java APIs (Java 7, for example, introduces the fork/join API that makes tasks like yours easier), hope for the best, measure performance, and see if any code-level optimizations can utilize the threads better.
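For reference, this is roughly what a fork/join version of grep looks like - a divide-and-conquer task that splits the line list, greps the halves in parallel, and merges the matches. The class name, threshold, and test data are invented for illustration; `new ForkJoinPool()` defaults its parallelism to the number of available cores, which is as close as you get to "one thread per core":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.regex.Pattern;

// Recursively splits the line list until chunks are small enough,
// greps each chunk, and merges results preserving line order.
class GrepTask extends RecursiveTask<List<String>> {
    private static final int THRESHOLD = 1000; // below this, grep sequentially
    private final List<String> lines;
    private final Pattern pattern;

    GrepTask(List<String> lines, Pattern pattern) {
        this.lines = lines;
        this.pattern = pattern;
    }

    @Override
    protected List<String> compute() {
        if (lines.size() <= THRESHOLD) {
            List<String> matches = new ArrayList<>();
            for (String line : lines) {
                if (pattern.matcher(line).find()) matches.add(line);
            }
            return matches;
        }
        int mid = lines.size() / 2;
        GrepTask left = new GrepTask(lines.subList(0, mid), pattern);
        GrepTask right = new GrepTask(lines.subList(mid, lines.size()), pattern);
        left.fork();                                // left half runs asynchronously
        List<String> result = new ArrayList<>(right.compute()); // right half on this thread
        result.addAll(0, left.join());              // keep original order: left first
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 5000; i++) lines.add("line " + i);
        // Pool parallelism defaults to Runtime.getRuntime().availableProcessors().
        List<String> hits = new ForkJoinPool()
            .invoke(new GrepTask(lines, Pattern.compile("line 49\\d\\d")));
        System.out.println(hits.size()); // → 100 (lines 4900..4999)
    }
}
```

Even then, how many of those worker threads actually run simultaneously on distinct cores is up to the OS scheduler, as described above.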
From the description of your problem, I think you should stick with Hadoop instead of going this route, since you need a distributed solution. Going the threading route means you'll have to roll your own distribution logic later on using RMI or something like that. Hadoop already has all of that, and it's much less coding work. You can concentrate on the core analysis logic instead of the infrastructure needed to run it.