In practice, what is the minimum number of times a label must appear in the training set for it to be viable in a multi-label project?
For example, suppose we are labelling pictures and have, say, 1 million training examples of animals. Most will be labelled as dogs, cats, puppies and kittens. A few will be horses, cows and sheep. Even fewer will be elephants, tigers or rhinos. By the time we get down to camelopards, there may be only 5 out of the million pictures. Is there any way we could train a TensorFlow model on the camelopard label, or do we just have to ignore it?
What would a sensible threshold be? 10 examples? 100 examples? Or should we think in terms of a percentage, so that any label applied to fewer than 1% of the pictures is not going to be trainable?
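To make the question concrete, here is a rough sketch of how I would measure the imbalance before deciding anything. It is plain Python, not TensorFlow. The function name `label_stats`, the `min_count` cutoff and the toy data are all made up for illustration, and the `pos_weight` value (negatives divided by positives) is just one common heuristic for weighting rare labels in the loss, not something I know to be the right answer.

```python
from collections import Counter


def label_stats(examples, min_count=100):
    """Summarise how often each label occurs in a multi-label dataset.

    examples  -- list of label collections, one per picture (hypothetical format)
    min_count -- illustrative cutoff below which a label is flagged as rare
    """
    n = len(examples)
    # Count each label at most once per picture.
    counts = Counter(label for labels in examples for label in set(labels))
    stats = {}
    for label, c in counts.items():
        stats[label] = {
            "count": c,
            "fraction": c / n,
            # Heuristic weight for the positive class: negatives / positives.
            "pos_weight": (n - c) / c,
            "rare": c < min_count,
        }
    return stats


# Toy data standing in for the real training set.
examples = [
    {"dog"},
    {"dog", "puppy"},
    {"cat"},
    {"cat", "kitten"},
    {"dog"},
    {"camelopard"},
]

stats = label_stats(examples, min_count=2)
for label, s in sorted(stats.items(), key=lambda kv: -kv[1]["count"]):
    print(label, s)
```

On the real data, a label with 5 positives out of 1 million would get a `pos_weight` of roughly 200,000, which is the scale of imbalance I am asking about.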