Don Horrell wrote: Hi Paul Azunre.
There are several pre-trained word embeddings available, but they generally cover the most common words.
Can I do something similar to Transfer Learning - start with a pre-trained set of word embeddings, then add my own domain-specific words somehow?
Cheers
Don.
Don, the short answer is definitely YES!
Indeed, the newer breed of pretrained models - ELMo, BERT and the like - work at the character or subword level. This means you are no longer limited to in-vocabulary words, as used to be the case with word2vec and similar methods; an unseen domain-specific term simply gets broken into smaller pieces the model already knows. Even FastText, which preceded them, works at the subword level and handled out-of-vocabulary words reasonably well.
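To make the subword point concrete, here is a minimal sketch assuming the Hugging Face transformers library (not code from the book); the example word is arbitrary and the exact split is just illustrative.

from transformers import AutoTokenizer

# Load a standard pretrained subword tokenizer (WordPiece, in BERT's case).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# A rare, domain-specific term is unlikely to be a single vocabulary entry,
# but it is still represented as a sequence of known subword pieces rather
# than being mapped to an unknown token.
print(tokenizer.tokenize("electroencephalography"))
# The exact subword split depends on the tokenizer's vocabulary.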
Moreover, these models can be "fine-tuned" on your particular domain, so that the general language knowledge they contain is adapted to the slang, shifted word meanings and structure of the text you care about - not just to individual new words. A sketch of what that might look like follows below.
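This is a rough sketch of domain fine-tuning in the masked-language-model style, again assuming the Hugging Face transformers and datasets libraries; the model name, file path and hyperparameters are placeholders, not recommendations.

from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Plain-text file of in-domain sentences (hypothetical path).
dataset = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-domain-tuned",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized,
    # Randomly masks tokens so the model learns to predict in-domain text.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True),
)
trainer.train()
model.save_pretrained("bert-domain-tuned")

After this step, the adapted model (or its embeddings) can be used for your downstream task in place of the generic pretrained one.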
Hope this is helpful!