Cole Howard

since Apr 22, 2019

Recent posts by Cole Howard

Hi Randy,

Hobson is the real expert on this point of NLP, but in a nutshell... LSI is one method to create a dense vector/mathematical representation of the content of a document in a particular corpus.

For the entire corpus, find the list of unique words.  
For each document in the corpus count up the occurrences of each of those words.  Some will be zero.  
Then use a dimensionality reduction technique such as SVD to get a vector that represents the meaning of that document 'in relation to' all the other docs.

You can then follow the same path for new documents and use a similarity metric such as cosine similarity to find similar documents in the corpus and surface them (in a search query, say).
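
To make those steps concrete, here is a minimal sketch using only NumPy. The toy corpus, the function names (`build_vocab`, `count_vector`, `fit_lsi`, `project`) and the choice of two topics are all assumptions made for illustration, not code from the book. It also uses raw word counts; in practice you would often weight them (with TF-IDF, say) before the SVD.

```python
import numpy as np

def build_vocab(docs):
    # Step 1: the list of unique words across the entire corpus.
    return sorted({w for d in docs for w in d.lower().split()})

def count_vector(doc, vocab):
    # Step 2: occurrences of each vocabulary word in one document.
    # Many entries will be zero; words outside the vocabulary are ignored.
    words = doc.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def fit_lsi(docs, n_topics=2):
    # Step 3: SVD of the document-by-word count matrix, keeping only
    # the strongest n_topics directions.
    vocab = build_vocab(docs)
    counts = np.array([count_vector(d, vocab) for d in docs])
    _, _, vt = np.linalg.svd(counts, full_matrices=False)
    components = vt[:n_topics]            # shape: (n_topics, n_words)
    doc_vectors = counts @ components.T   # shape: (n_docs, n_topics)
    return vocab, components, doc_vectors

def project(doc, vocab, components):
    # Send a new document down the same path as the corpus documents.
    return count_vector(doc, vocab) @ components.T

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical toy corpus: two documents about cats, two about markets.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]

vocab, components, doc_vectors = fit_lsi(corpus, n_topics=2)
query = project("cat and mouse", vocab, components)
scores = [cosine_similarity(query, v) for v in doc_vectors]
best = int(np.argmax(scores))
print(corpus[best], scores)
```

With this corpus the query scores higher against the two cat documents than against the two market documents, which is the behaviour you would want from a search feature. On a real corpus you would keep far more than two topics.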

Hope that helps,

Hi Sean,

I haven't used ANTLR myself, but a lot of the tools it provides are included in projects such as spaCy and NLTK, and the book discusses the Python implementations of both.  NLP itself isn't a package or tool so much as a concept.  Hopefully the book provides a good overview of the underlying concepts that power all of these tools and gives you a better intuition for how to tweak the levers to reach your project's end goal.

Hi Sourabh,

The concepts are covered in general terms.  It is completely reasonable to assume you could recreate any of the examples in Java, but we do rely heavily on existing Python libraries that abstract away a lot of the boilerplate.  I'm sure there are parallel libraries in Java for a good deal of the tools, though.

The deep learning libraries such as Keras haven't made their way to Java yet, as far as I know, so those sections may be a little extra slow-going.  But we do endeavor to express the concepts in general terms there as well.

Hi Paul,

Knowing Python is by no means an absolute prerequisite.  Although all of the examples are explicitly illustrated in Python code, we took great pains to explain the concepts in general terms first.

I won't say Python is the definitive tool for NLP, but it certainly has one of the richest ecosystems of NLP tools: NLTK and spaCy, to name just two.

The sections on deep learning for NLP are also illustrated in Python, with higher-order concepts approached in general terms.  However, all of the major deep learning libraries expose APIs in Python at the very least.  So to really get your feet wet with neural networks, Python will be a huge help.

All of that being said, Python is incredibly readable, even to the point of being confused with pseudo-code on occasion.  So don't let that be a barrier to entry.

Hi Carl,

There are many uses for machine learning in web development.  I started out as a web developer and slowly drifted into recommendation engines and then on to neural networks and NLP.

It would mostly depend on your current focus, I would imagine.  The book goes into detail on tools and approaches for building chatbots, for example, which can automate workflows for your users.  TensorFlow.js is now bringing the power of deep learning straight into the browser, so you can do near real-time classification of text or images client-side with no need to ship data to a remote server, thereby avoiding potential privacy concerns.

The sky is the limit!