Question: liveProject: How to Think about Manipulating Data -- DataFrame

Ranch Hand

Posts: 58

posted 2 years ago

Thank you for answering my previous question.
I have another one:

If the pandas library is single-threaded, working with partitioned data would not bring any benefit: the data would still be accessed sequentially rather than in parallel, right?

Basically, I am trying to find a good use case for pandas versus Spark.

Regards,
Alex

SCJP 5
SCWCD 5

A. Bell

Author

Posts: 3

posted 2 years ago

If you have a small-ish dataset, then pandas is just as good to use. Small-ish here probably means that the whole dataset fits into memory.
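
As a rough sketch of what "fits into memory" can mean in practice, pandas can report a DataFrame's footprint through `DataFrame.memory_usage`. The helper name `fits_in_memory`, the made-up data, and the 1 GB budget below are assumptions for illustration only, not something taken from the liveProject.

```python
import pandas as pd


def fits_in_memory(df: pd.DataFrame, budget_bytes: int) -> bool:
    """Return True if the DataFrame's in-memory footprint is within budget_bytes.

    Hypothetical helper: the budget is whatever you decide you can spare.
    """
    # deep=True also counts the Python objects (e.g. strings) held in object columns
    footprint = int(df.memory_usage(deep=True).sum())
    return footprint <= budget_bytes


# Small illustrative frame with made-up data
df = pd.DataFrame({
    "id": range(1000),
    "value": [float(i) for i in range(1000)],
})

footprint = int(df.memory_usage(deep=True).sum())
print(f"DataFrame footprint: {footprint} bytes")

# Assumed budget of 1 GB, chosen arbitrarily for the example
print("Fits in budget:", fits_in_memory(df, budget_bytes=1_000_000_000))
```

Keep in mind that many pandas operations create intermediate copies, so the footprint of the loaded data is a lower bound on what a job needs, and leaving some headroom is sensible.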
