# Specific problem domains in which Mahout is best

Alok Bhandari

Sean Owen

author

posted 5 years ago

Gives better results than what? And "better" in the sense of faster, or "more accurate"?

The clustering algorithms in Mahout are fairly standard algorithms, not some special approach. So I think they perform as well as any other implementation of these standard algorithms in terms of quality.
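To make "fairly standard" concrete: k-means, one of the clustering algorithms Mahout provides, is the classic Lloyd's iteration. Below is a minimal single-machine sketch in Java (the language Mahout itself is written in). It is not Mahout's API. The class and method names (`KMeansSketch`, `kMeans`) are hypothetical, and the sketch assumes dense `double[]` points and caller-supplied initial centroids.

```java
import java.util.Arrays;

/** Hypothetical single-machine k-means (Lloyd's algorithm) sketch; not Mahout code. */
public class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Index of the centroid closest to point p. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (sqDist(p, centroids[c]) < sqDist(p, centroids[best])) best = c;
        }
        return best;
    }

    /** Runs Lloyd's iterations until assignments stop changing or maxIter is reached. */
    static double[][] kMeans(double[][] points, double[][] init, int maxIter) {
        int k = init.length, dim = init[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = init[c].clone();
        int[] assign = new int[points.length];
        Arrays.fill(assign, -1); // -1 = not yet assigned

        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each point joins its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centroids);
                if (c != assign[i]) { assign[i] = c; changed = true; }
            }
            if (!changed) break; // converged

            // Update step: each centroid moves to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assign[i]]++;
                for (int d = 0; d < dim; d++) sums[assign[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster keeps its old centroid
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = { {1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9} };
        double[][] init = { {0, 0}, {10, 10} };
        System.out.println(Arrays.deepToString(kMeans(points, init, 100)));
    }
}
```

The two alternating steps (assign each point to its nearest centroid, then move each centroid to the mean of its points) are the whole algorithm. In principle a distributed implementation changes where those steps run, not what they compute, which is why result quality should be comparable to any other implementation.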

In terms of performance -- they are implemented on Hadoop. This makes it much easier to scale up to very large data sets, but it also means you incur a lot of Hadoop overhead. For small data sets, you could probably find a faster implementation that runs entirely on one machine, maybe something written in R. For very large data sets, where you can't apply non-distributed tools, I imagine it's about as good as anything else freely available out there. Honestly, I'm not aware of another distributed clustering package to compare it to.
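Where that Hadoop overhead comes from can be sketched by writing one k-means iteration as map, shuffle and reduce phases. The following is an in-memory simulation under stated assumptions: the names (`MapReduceKMeansStep`, `map`, `reduce`, `iterate`) are hypothetical, and none of this is Mahout's or Hadoop's actual API. On a real cluster each iteration would typically run as its own job, so the fixed per-job startup cost is paid once per iteration. That cost is negligible next to a huge input but can dominate on a small one.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory simulation of one k-means iteration as map -> shuffle -> reduce.
 * Hypothetical illustration only; not Mahout's or Hadoop's API.
 */
public class MapReduceKMeansStep {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Map: emit (id of nearest centroid, point) for a single input point. */
    static Map.Entry<Integer, double[]> map(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (sqDist(p, centroids[c]) < sqDist(p, centroids[best])) best = c;
        }
        return new SimpleEntry<>(best, p);
    }

    /** Reduce: the new centroid is the mean of all points sharing a key. */
    static double[] reduce(List<double[]> group) {
        double[] mean = new double[group.get(0).length];
        for (double[] p : group) {
            for (int d = 0; d < mean.length; d++) mean[d] += p[d];
        }
        for (int d = 0; d < mean.length; d++) mean[d] /= group.size();
        return mean;
    }

    /** One iteration; returns new centroids keyed by id (empty clusters are absent). */
    static Map<Integer, double[]> iterate(List<double[]> points, double[][] centroids) {
        // Map phase: each point is handled independently, so this parallelizes.
        List<Map.Entry<Integer, double[]>> emitted = new ArrayList<>();
        for (double[] p : points) emitted.add(map(p, centroids));

        // Shuffle phase: group mapper output by key (the framework does this on a cluster).
        Map<Integer, List<double[]>> grouped = new TreeMap<>();
        for (Map.Entry<Integer, double[]> e : emitted) {
            grouped.computeIfAbsent(e.getKey(), key -> new ArrayList<>()).add(e.getValue());
        }

        // Reduce phase: one call per key, independent of the other keys.
        Map<Integer, double[]> next = new TreeMap<>();
        for (Map.Entry<Integer, List<double[]>> g : grouped.entrySet()) {
            next.put(g.getKey(), reduce(g.getValue()));
        }
        return next;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[] {1, 1}, new double[] {1, 2}, new double[] {2, 1},
            new double[] {8, 8}, new double[] {9, 8}, new double[] {8, 9});
        double[][] centroids = { {0, 0}, {10, 10} };
        iterate(points, centroids).forEach(
            (id, c) -> System.out.println(id + " -> " + Arrays.toString(c)));
    }
}
```

Because the map phase touches each point independently and the reduce phase touches each cluster independently, both can be spread across many machines; the shuffle in between is where data moves over the network. That structure is what lets the approach scale, and the repeated job setup and data movement are what make it slower than a single-machine tool on small inputs.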