Should machine learning be done always using Spark?

 
Ranch Hand
Posts: 1300
8
Spark provides MLlib for machine learning, which has advantages over doing it without Spark, such as fewer lines of code. Should Spark always be used for machine learning, or are there cases where we should do machine learning without Spark?
Thanks
 
Marshal
Posts: 67443
257
  • Likes 1

Monica Shiralkar wrote:. . . Should spark always be used for machine learning . . . ?
Thanks

No.

Even without knowing anything about Spark, I would say that there most probably is nothing that should “always” be used for anything.
 
Monica Shiralkar
Ranch Hand
Posts: 1300
8
Thanks. What would be an example use case where it is better to implement machine learning without Spark than with it?
 
Campbell Ritchie
Marshal
Posts: 67443
257
Maybe I shall ask some questions: what would be a use case for Spark where you think other products wouldn't be suitable?
 
Monica Shiralkar
Ranch Hand
Posts: 1300
8
One advantage is that the number of lines of code is reduced if we implement in Spark.
 
Saloon Keeper
Posts: 11185
244
As compared to what?
 
Monica Shiralkar
Ranch Hand
Posts: 1300
8
As compared to implementing and running the program without using Spark (MLlib and PySpark).
 
Campbell Ritchie
Marshal
Posts: 67443
257

Monica Shiralkar wrote:. . . the number of lines of code is reduced . . .

How is that an answer to my question?
 
Monica Shiralkar
Ranch Hand
Posts: 1300
8
An example use case is a logistic regression program that predicts whether a customer will buy a product (0 or 1).
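
For readers who want to see roughly what such a program involves without Spark, here is a minimal sketch in plain Python using only the standard library. It is not anyone's code from this thread: the feature columns, the toy data, and the function names (`train_logistic_regression`, `predict`) are made up for illustration, and a real project would more likely use an existing library than hand-written gradient descent.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=5000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # error = predicted probability minus the true label (0 or 1)
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (will buy) or 0 (will not buy) using a 0.5 threshold."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if p >= 0.5 else 0


# Hypothetical toy data: [visits to the product page, items already in basket]
customers = [[0, 0], [1, 0], [1, 1], [2, 0], [4, 2], [5, 1], [6, 3], [7, 2]]
bought = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic_regression(customers, bought)
print([predict(weights, bias, c) for c in customers])
```

On a dataset this small, everything fits in memory on one machine, so a cluster framework adds nothing here; whether Spark helps depends mainly on how much data there is.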
 
Campbell Ritchie
Marshal
Posts: 67443
257
And how does Spark make that better, apart from reducing the amount of code?
 
lowercase baba
Posts: 12792
51
Chrome Java Linux
  • Likes 1

Monica Shiralkar wrote:One advantage is that the number of lines of code is reduced if we implement in Spark.


Lines of code is generally a poor metric.  I have seen C programs written on a single line, but you couldn't really make heads or tails of them.  I have seen Perl programs written using all the shortcut built-in variables.  That makes the program shorter, with fewer lines of code, but a week later when I come back to it, I have NO IDEA what those variables mean.

What is more important is readability of the code.

Just to be clear, "lines of code is reduced" is NOT an obvious advantage, or really an advantage at all.
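
To illustrate the point about line count versus readability, here is a small Python sketch. Both versions are invented for this example (the names `f` and `average_public_score` and the underscore convention are assumptions, not from any real codebase). They compute the same thing, but only one of them says what it is doing.

```python
# Terse: a single line, but the intent is hard to recover a week later.
f = lambda d: sum(v for k, v in d.items() if k[0] != "_") / max(1, sum(k[0] != "_" for k in d))


# Longer, but every step has a name.
def average_public_score(scores):
    """Average the values whose keys do not start with an underscore."""
    public_values = [value for name, value in scores.items()
                     if not name.startswith("_")]
    if not public_values:
        return 0.0
    return sum(public_values) / len(public_values)


scores = {"alice": 80, "bob": 90, "_debug": 0}
print(f(scores), average_public_score(scores))
```

The second version is several times longer and is still the one most people would rather maintain.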
 
Bartender
Posts: 1229
38
IBM DB2 Netbeans IDE Spring Java

fred rosenberger wrote:

Monica Shiralkar wrote:One advantage is that the number of lines of code is reduced if we implement in Spark.


Lines of code is generally a poor metric.
What is more important is readability of the code.


I would totally agree if we were talking about 'normal' programming, but when dealing with big data and/or machine learning, I think that the more successfully a language (or a framework) combines conciseness and expressiveness, the better it is. In my humble experience, when I played with machine learning (and I stress that I was playing, not working), I found Python far better than Java. Less code, less ceremony; just aim at the core of the problem.
 
Monica Shiralkar
Ranch Hand
Posts: 1300
8

Campbell Ritchie wrote:And how does Spark make that better, apart from reducing the amount of code?

Simplicity and speed are some of the main benefits. I think speed comes into the picture when the dataset is large.
 