Carl Osipov

since Jun 04, 2020

Recent posts by Carl Osipov

Hi Rishal,

Thank you for writing the book! Does it cover gradient descent and autograd?

Best,
Hi Junilu,

Thank you for your question.

I was asked that same question just yesterday! I recommend that everyone who wants to get into machine learning first refresh the basics, so start with:

Essence of Linear Algebra https://www.manning.com/livevideo/3blue1brown-essence-of-linear-algebra
Essence of Calculus https://www.manning.com/livevideo/3blue1brown-essence-of-calculus

Next, it is a good idea to get practical experience with using Python and the following libraries:

NumPy
Pandas

Then you should be ready to pick up Scikit-Learn and XGBoost and do some practical ML projects.

Best,

Carl.
4 months ago
Hi Junilu,

Thank you for your kind words and your question!

This is a really interesting question because I think the answer is cultural. I had a unique opportunity to "straddle" the worlds of software engineering and ML and get some insights into these cultural differences. In traditional computer science / software engineering (by traditional here I mean Knuth's Art of Computer Programming algorithms), there is a virtue in using strong, statically checked typing, as in Java, Kotlin, Scala, etc. Strong typing helps with software engineering because it lets you analyze your code in advance of execution (both manually and with the compiler) and so produce higher quality results in your applications or systems. So, there is a culture of software engineers working professionally in traditional software engineering roles (writing business logic, implementing backends, deploying microservices, etc.) whose day-to-day work constantly reinforces the virtue of strong typing.

The ML & data science field is different. In this field you still depend on traditional IT (compute, storage, networking), but the practice of doing ML & DS is more like the scientific process: you analyze data, form hypotheses about the data, choose models for the data, and compare model performance. Hence, the outcomes of ML & DS projects are less known in advance, and you don't rely on the compiler to help prove your algorithm correct. In ML & DS, your algorithms / models are correct primarily based on the data rather than on your code implementation.

To make matters worse for strongly typed languages, type checking is actually a hindrance in ML / DS rather than an advantage. In traditional software engineering, once you have a typed object (e.g. a Customer), that object has a consistent interface for the lifetime of the program: you expect Customer to continue to have the same fields (name, address, etc.) while your code is executing. In contrast, in ML & DS, the entire point is to keep changing the schema/type of a dataset that holds something like Customer data. For example, in ML & DS code you are likely to take the Customer schema and merge/join it with other tables, e.g. PersonalInformation. Then you may re-encode the Customer zip code from integer to string, then change the zip code from a column of strings to a "one-hot encoded" set of columns where each zip code is represented with a binary value, and so on.
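To make the schema changes described above concrete, here is a minimal sketch using Pandas. The column names (customer_id, name, zip_code, age) and all of the values are made up for illustration; only the Customer and PersonalInformation table names come from the post.

```python
import pandas as pd

# Hypothetical Customer and PersonalInformation tables (all values are made up).
customer = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
    "zip_code": [10001, 94105, 10001],
})
personal_information = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "age": [34, 51, 27],
})

# Step 1: a merge/join changes the schema by adding columns from the other table.
merged = customer.merge(personal_information, on="customer_id", how="inner")

# Step 2: re-encode the zip code from integer to string.
merged["zip_code"] = merged["zip_code"].astype(str)

# Step 3: replace the string column with one binary column per distinct zip code.
encoded = pd.get_dummies(merged, columns=["zip_code"], dtype=int)

print(list(encoded.columns))
```

Each step produces a table with a different set of columns or column types, which is the kind of change that a fixed Customer class in a statically checked language would resist.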

In a nutshell, traditional software engineers depend on strong typing to get good results, while ML & DS practitioners see strong typing as a hindrance or an obstacle to getting good results. Python is a mature, expressive language with a strong technical foundation that was designed outside of the strong typing culture. That's why I think it proved attractive to the practitioners in the field. Also, since the practitioners liked the language, they built great libraries for DS & ML, for example Pandas and Scikit-Learn.

As a result, Python is on track to become one of the top programming languages for practitioners, if not the top one.

Best,

Carl.
4 months ago
Hi Junilu,

Thank you for your question!

I don't have any plans to focus on data security in the book because I think there are many existing resources that cover cloud data security (e.g. https://aws.amazon.com/certification/certified-security-specialty/) and this topic can be handled independently of data analysis. In the industry, mixing the role of a security specialist and a data analyst is a bad idea since it gives rise to conflicts of interest.

With respect to companies that are still concerned about data security in the cloud: I think the situation today is better than it was 10 years ago, and the trend is that companies are coming to grips with cloud data security through education and certifications like the one that I mentioned from AWS, or those from Azure or GCP. If you are an architect talking to management about risks, you should be certified or at least bring experts with the right certifications into the conversation. Also, look for success stories and case studies: management is going to be more likely to relate to your arguments if you spend the time understanding what's happening in the industry and can tell them about other teams doing similar work with security in the cloud. If you just Google this topic, you'll probably end up with a bunch of marketing trash. You should do the work and reach out to various communities: social network groups, meetups, LinkedIn experts, podcasts. The success stories are there if you look for them and just talk to people.

Best,

Carl.
4 months ago
Hi Divya,

Thank you for your question.

The most difficult problem is cultural. Data scientists who have grown into a specific set of tools (e.g. notebooks, dashboards, graphical UIs) can feel like the cloud is too complex for them to use. However, major cloud providers have made serverless ML capabilities embarrassingly easy to use. I think the people who are smart enough to master data science are smart enough to master the cloud.

Best,

Carl.
4 months ago
Hi Tim,

Thank you for your question.

I'm biased, so I think that the project in my book is a great one for you to start with.

With that said, I've trained over 2,000 students on ML, so let me share some recurring themes that resonate with IT professionals:

- ML requires traditional IT (compute, storage, networking) but the practice of ML is different from traditional IT
- ML is more like the scientific process than IT: in ML you analyze data, hypothesize about different models for the data, and compare those models on your data
- ML sounds like a zoo of different models, but in reality you should study supervised learning (>80% of problems out there) for regression and classification. Make sure you understand these problems and whether they apply to what you are trying to achieve before diving deep into the machine learning algorithms. The worst-case scenario here is that you become the expert on random forest models/algorithms but find out that you need to do ML for image processing.
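As a rough illustration of the "hypothesize about different models and compare them" loop for a supervised classification problem, here is a minimal sketch with Scikit-Learn. The synthetic dataset, the two candidate models, and the 5-fold cross-validation setup are all assumptions chosen for the example, not a recommendation for any particular project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic classification data standing in for a real dataset (an assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Two candidate models, i.e. two hypotheses about what fits the data.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Compare the candidates by mean accuracy over 5-fold cross-validation.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()

for name, score in scores.items():
    print(f"{name}: mean accuracy {score:.3f}")
```

Which model comes out ahead depends on the data, which is why the comparison has to be run rather than reasoned out in advance.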

Last but not least, I can tell you from experience that even if you ask a Stanford PhD in ML to work on a dataset that they have never seen, they are not guaranteed to achieve state-of-the-art results on the first try. ML is much more iterative and experimental than traditional IT. So don't get discouraged if something doesn't work: ask for help or find a mentor, and try again!

Best,

Carl.
4 months ago
Hi Jose,

Thank you for your question. In general, if you are cost-constrained, then cloud is a double-edged sword. For a small business it might be simpler to take care of a well-known fixed capital expense upfront (for example, buying a few servers) than to subscribe to a cloud service where the total cost of the IT resources is less clear upfront. I find that many small businesses have figured out how to limit their cloud service subscription expenses or how to take advantage of the numerous credits or promotions from major cloud providers (AWS, Azure, GCP) so that subscription cost is less of an issue.

With that said, the point of serverless ML is to reduce OPERATIONAL costs. In other words, if you are a small business and you don't have the money to hire administrators or SREs to babysit your machine learning system in production, then serverless ML is the right approach for you. With serverless ML, you design your ML system so that in production you have little to no need for operations personnel. Hence, you can take your ML expertise, apply it to the serverless ML design, get to market sooner, and scale up or down to keep the operational costs in line with the demand for your ML system.
4 months ago
Hi everyone! I'm Carl, the author of "Serverless Machine Learning in Action" from Manning Publications. You can learn more about the book here: https://www.manning.com/books/serverless-machine-learning-in-action?a_aid=osipov&a_bid=fa913283 I'm happy to be here for the book promo and to tell you more if you are interested in machine learning.
5 months ago