Data Skew in Multi-Label Classification - Can Transfer Learning Help?

 
Ranch Hand
Posts: 81
Hi Paul Azunre.
I am trying to do multi-label classification on some text. The number of times each label appears in the training data is heavily skewed.
Is there anything that Transfer Learning can do to help?

Thanks
Don.
 
Author
Posts: 14

Don Horrell wrote:Hi Paul Azunre.
I am trying to do multi-label classification on some text. The number of times each label appears in the training data is heavily skewed.
Is there anything that Transfer Learning can do to help?

Thanks
Don.

I don't think this is a transfer learning problem per se. It is more a fundamental challenge of multi-label classification, but I will try to suggest a way transfer learning can be used.

The first question I would ask is: "Is the skew representative of the target distribution?" Practitioners are often obsessed with balanced datasets, but in my opinion we tend to forget that the distribution in the training data needs to reflect the target distribution in the wild, not necessarily be balanced. If your training data shows 3% of an "anomaly" class, and your classifier is likely to see about 3% of this class when deployed, then keeping 3% "anomalies" in your training data is probably the right thing to do. I hope this makes sense.
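
As a quick sanity check, something along these lines can tell you how far your per-label frequencies are from what you expect in production. This is only a rough sketch, not code from the post or the book; the label names and the "expected" distribution are invented for illustration.

from collections import Counter

# Each training example carries a list of labels (multi-label setting).
# The examples below are made up just to show the shape of the data.
train_labels = [
    ["sports"],
    ["sports", "politics"],
    ["sports"],
    ["finance"],
    ["sports", "finance"],
]

# Count how often each label occurs across all examples.
counts = Counter(label for labels in train_labels for label in labels)
total = sum(counts.values())

# What you believe the deployed classifier will actually see.
# This has to come from domain knowledge or production logs.
expected = {"sports": 0.50, "politics": 0.20, "finance": 0.30}

for label, count in counts.most_common():
    print(f"{label:10s} observed {count / total:.2f}  expected {expected.get(label, 0.0):.2f}")

If the observed and expected frequencies line up, the "skew" may not be a problem at all.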

Beyond this, I would try "data augmentation": duplicate some of the samples in the class whose count you are trying to increase, and substitute some of the words in the duplicates with their synonyms. You can use either pretrained word embeddings or a thesaurus to do this (an example reference that talks about this: https://towardsdatascience.com/data-augmentation-in-nlp-2801a34dfc28). This will increase the count of your under-represented classes and has been observed to lead to significant improvements. Technically, since you are using pretrained knowledge in the form of embeddings or a thesaurus, this is an example of transfer learning, even if people may not acknowledge it as such.
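
For what it's worth, here is one way the thesaurus flavour of this could look. It is only a sketch of the general idea, not a recipe from the article or the book: the function name synonym_augment, the replace_prob parameter, and the example sentence are all invented. It uses NLTK's WordNet interface, so you would need nltk installed and the wordnet corpus downloaded first.

# Requires: pip install nltk, then nltk.download('wordnet') once.
import random
from nltk.corpus import wordnet

def synonym_augment(sentence, replace_prob=0.3):
    """Return a copy of the sentence with some words swapped for WordNet synonyms."""
    new_words = []
    # Naive whitespace tokenization keeps the sketch short.
    for word in sentence.split():
        synsets = wordnet.synsets(word)
        if synsets and random.random() < replace_prob:
            # Collect alternative lemma names across the word's synsets.
            lemmas = {l.name().replace("_", " ") for s in synsets for l in s.lemmas()}
            lemmas.discard(word)
            new_words.append(random.choice(sorted(lemmas)) if lemmas else word)
        else:
            new_words.append(word)
    return " ".join(new_words)

# Duplicate samples from the under-represented class and augment the duplicates,
# keeping the original labels for the augmented copies.
rare_samples = ["the engine made a strange noise during the flight"]
augmented = [synonym_augment(s) for s in rare_samples]
print(augmented)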

Hope this is helpful!

- Paul
 