
Why does Hive run MapReduce jobs only for statements with a WHERE clause and not for plain SELECT statements?

When I run a query in Hive such as "select * from tablename", no MapReduce job runs. But when I run "select * from tablename where -----", it starts a MapReduce job in the background. Why does it run MapReduce only when there is a WHERE clause? Also, the response comes back faster for the plain query than for the one with a WHERE clause, presumably for the same reason. So what is the reason?
Thanks
Case 1: SELECT * FROM <table>;
In this case, all of the table's contents are delivered as they are. There is no precondition or filter of the kind a WHERE clause introduces.
Hive stores tables as files on HDFS, and AFAIK in this case Hive simply streams the file contents out (similar to 'cat' in Linux).
This looks like a deliberate optimization: launching an MR job would only slow the query down without adding anything.
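To make the 'cat' comparison concrete, here is a toy Python sketch. It is not Hive code; the function name `fetch_all_rows` and the sample rows are made up for illustration, and an in-memory string stands in for a table file on HDFS. It only shows that returning every row needs nothing more than reading the file front to back.

```python
import io
from typing import Iterator, TextIO


def fetch_all_rows(table_file: TextIO) -> Iterator[str]:
    """Yield every row unchanged, the way `cat` would print a file."""
    for line in table_file:
        yield line.rstrip("\n")


# Hypothetical table contents standing in for a file on HDFS.
table = io.StringIO("1,alice\n2,bob\n3,carol\n")
rows = list(fetch_all_rows(table))
print(rows)
```

No per-row logic runs here, so there is nothing to distribute across a cluster.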

Case 2: SELECT * FROM <table> WHERE <condition>;
In this case, the table contents must be passed through some kind of logic/filter to find the rows that match the condition.
As Hive is meant for huge data sets, this processing is done by taking advantage of the scalable, parallel Hadoop MapReduce framework.
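A rough Python model of that idea, again only a sketch and not how Hive is implemented: the table is assumed to be cut into splits, each split is filtered independently (the "map" step), and the matches are concatenated. The names `map_filter`, `run_where` and `is_adult`, the sample rows, and the use of a thread pool in place of a Hadoop cluster are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def map_filter(split: List[str], predicate: Callable[[str], bool]) -> List[str]:
    """Map step: keep only the rows of one split that satisfy the condition."""
    return [row for row in split if predicate(row)]


def run_where(splits: List[List[str]], predicate: Callable[[str], bool]) -> List[str]:
    """Filter every split in parallel, then concatenate the results in split order."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda s: map_filter(s, predicate), splits))
    return [row for part in parts for row in part]


# Hypothetical table (id,name,age), already divided into two splits.
splits = [["1,alice,30", "2,bob,17"], ["3,carol,45", "4,dave,12"]]


def is_adult(row: str) -> bool:
    """Stands in for a condition such as WHERE age >= 18."""
    return int(row.split(",")[2]) >= 18


matching = run_where(splits, is_adult)
print(matching)
```

Each split can be checked without looking at any other split, which is why this kind of work parallelizes well. On a small table the cost of starting the job outweighs the benefit, which matches the slower response seen in the question.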

Hope this solves your query.