Don Horrell wrote:My other ML interest is topic modelling, using document vectors.
There do not seem to be pre-trained sets of document vectors available yet, but when there are, how could we use transfer learning to take a pre-trained set of document vectors and adapt it to domain-specific documents (e.g. medical documents, documents about programming, patents, etc.)?
Don Horrell wrote:Thanks for your reply, Paul. I'm a little confused though.
The pre-trained FastText word embeddings I have downloaded map words to vectors, so in my case (using TensorFlow to do some NLP classification), I can only train my classifier on the words in the embedding list.
That is the crux of my original question: how can I add domain-specific vocabulary to pre-trained word embeddings? Will your book cover this?
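One possible way to approach the question above, shown only as a minimal sketch and not as the book's method: keep the pre-trained vectors as they are, append a small randomly initialised vector for each domain word missing from the list, and let those new rows be learned during training. The function name `extend_embeddings`, the toy vectors, and the `scale` parameter are hypothetical and used for illustration only.

```python
import random

def extend_embeddings(pretrained, domain_words, dim, seed=0, scale=0.1):
    """Return (word_to_index, matrix) covering pre-trained and new words."""
    rng = random.Random(seed)
    word_to_index = {}
    matrix = []
    # Copy the pre-trained vectors unchanged.
    for word, vector in pretrained.items():
        word_to_index[word] = len(matrix)
        matrix.append(list(vector))
    # Give each unseen domain word a small random vector to be learned later.
    for word in domain_words:
        if word not in word_to_index:
            word_to_index[word] = len(matrix)
            matrix.append([rng.uniform(-scale, scale) for _ in range(dim)])
    return word_to_index, matrix

# Toy stand-in for vectors read from a pre-trained file (values are made up).
pretrained = {"patient": [0.2, -0.1, 0.4], "doctor": [0.3, 0.0, 0.5]}
domain_words = ["patient", "tachycardia", "stent"]

word_to_index, matrix = extend_embeddings(pretrained, domain_words, dim=3)
print(len(matrix), word_to_index["tachycardia"])
```

The resulting matrix could then serve as the initial weights of a trainable embedding layer in the classifier, so that the new rows are adjusted on the domain text while the pre-trained rows provide a starting point.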
Lucian Maly wrote:Hi @Paul Azunre,
In your book, are you planning to cover this evolution from in-vocabulary, context-independent word embeddings to ones that take word order into account in their training? Or compare how Word2vec, GloVe, ELMo, and BERT generate different vectors for the same sentence? I have seen an attempt at something similar on the widely circulated sentence: but nothing that was easy to digest...
Don Horrell wrote:Hi Paul Azunre.
I am trying to do multi-label classification on some text. The number of times each label has been assigned to the training text shows a large skew.
Is there anything that Transfer Learning can do to help?
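Separately from transfer learning, a common way to handle this kind of label skew is to weight rare labels more heavily in the loss. The sketch below is only an illustration under that assumption: it counts how often each label occurs and derives a per-label positive weight as negatives divided by positives. The function name `positive_weights` and the toy label data are hypothetical.

```python
def positive_weights(label_rows, num_labels):
    """Count label frequencies and return (counts, weights) per label."""
    total = len(label_rows)
    counts = [0] * num_labels
    for row in label_rows:
        for label in row:
            counts[label] += 1
    weights = []
    for count in counts:
        # Ratio of negative to positive examples; fall back to 1.0 if unseen.
        weights.append((total - count) / count if count else 1.0)
    return counts, weights

# Each training example lists the indices of the labels assigned to it.
train_labels = [[0], [0, 1], [0], [0, 2], [0], [1]]

counts, weights = positive_weights(train_labels, num_labels=3)
print(counts, weights)
```

Weights like these could be passed as the positive-class weight of a weighted binary cross-entropy loss, so that mistakes on rarely assigned labels cost more than mistakes on frequent ones.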
Don Horrell wrote:What are the strengths and weaknesses of Gensim and TensorFlow for NLP?
Which is better for which types of project?
Serge Yurk wrote:Hello Paul.
Hope your book contains a lot of interesting info about cross-lingual models.
Could you please give me some links to English-German and English-Russian success stories in this field?
Thank you in advance,
Don Horrell wrote:Hi Paul Azunre.
There are several pre-trained word embeddings available, but they generally cover the most common words.
Can I do something similar to Transfer Learning - start with a pre-trained set of word embeddings, then add my own domain-specific words somehow?
peterr paul wrote:Hi
Can anyone suggest the best course for a fresher to learn?
Would it be better to go with AI or machine learning?
And what would the duration and learning methods of such a course be?
Lucian Maly wrote:Two emerging pre-trained language models, BERT (uses a bidirectional transformer) and ELMo (uses the concatenation of forward and backward LSTM representations), open up new possibilities in language processing. I see in the book sample that, for instance, BERT and logistic regression are the best algorithms for email classification and also for IMDB movie review classification, but what are the general rules for using one or the other, or even something else like GPT? Obviously the answer is not that simple, since it depends on the initial amount of data and on hyperparameter tuning, but is there some kind of guidance or list of specific use cases showing where to use which algorithm?
Thank you so much for the response.
Sherin Mathew wrote:
As far as I know, you would need a good grasp of the following subjects:
a. Linear algebra
b. Probability and Statistics
c. Artificial Intelligence and Neural Networks
d. Programming in any high-level language, preferably Python or MATLAB (built-in libraries and functions are available)
Some of the prerequisites for learning Natural Language Processing: as NLP is part of soft skill training, you must be able to understand concepts like sentence breaking, speech recognition, and information extraction. You should also learn Python or TensorFlow and have some knowledge of algorithms.
Campbell Ritchie wrote:Why?
Sherin Mathew wrote:. . . Linear algebra . . .
Abhisek Pattnaik wrote:Previously, NLP stood on its own ground, but now with AI and ML it has taken a new turn. What do I need to know to get a proper start with the subject?
Does the book cover these?
RajKumar Valluru wrote:Hi Paul, so will this book teach how to leverage the prebuilt NLP models?