Sample reusability in importance-weighted active learning
Which example should we label next?
Active learning sounds like a wonderful idea: select the most interesting examples to learn the best classifier with the least effort. Not every example is equally helpful for training a classifier, so if labelling is expensive, it makes sense to label only those examples that you expect to improve your classifier the most.
Unfortunately, a non-random, unrepresentative selection of samples violates a basic assumption of machine learning: that the training examples are drawn independently from the same distribution as the data the classifier will see later. It is therefore not surprising that active learning can sometimes lead to poor results, because its biased sample selection produces the wrong classifier.
Modern active learning algorithms try to avoid these problems and can be reasonably successful at it. One of the remaining problems is that of sample reusability: if you used active learning to select a dataset tailored for one type of classifier, can you also use that same dataset to train another type of classifier?
In my thesis I investigate the reusability of samples selected by importance-weighted active learning, one of the current state-of-the-art active learning algorithms. I conclude that importance-weighted active learning does not solve the sample reusability problem completely: there are combinations of datasets and classifiers for which it does not work.
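The core trick of importance-weighted active learning can be sketched as follows. Each unlabelled example is queried with some probability p, and queried examples receive an importance weight of 1/p, which makes the weighted labelled set an unbiased estimate of the full distribution. This is a simplified illustration, not the exact algorithm from the thesis: the `query_prob` function below is a hypothetical stand-in for the algorithm's actual, disagreement-based query probabilities.

```python
import random

def iwal_select(stream, query_prob, seed=0):
    """Simplified sketch of importance-weighted sample selection.

    For each example, query its label with probability p = query_prob(x);
    if queried, store the example with importance weight 1/p. The 1/p
    weights correct the selection bias that active learning introduces.
    """
    rng = random.Random(seed)
    labelled = []
    for x, y in stream:        # y is revealed only when we query
        p = query_prob(x)      # must be > 0 so every example can be chosen
        if rng.random() < p:
            labelled.append((x, y, 1.0 / p))  # importance weight 1/p
    return labelled

# Toy usage: query points near the (assumed) decision boundary at 0
# more often than points far away from it.
data = [(x / 10.0, int(x >= 0)) for x in range(-10, 11)]
weighted = iwal_select(data, query_prob=lambda x: 0.9 if abs(x) < 0.3 else 0.2)
```

A classifier trained on `weighted` would use the third element of each tuple as a per-sample weight, so the rarely-queried regions still contribute their full share to the training objective.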
In fact, as I argue in the second part of my thesis, I think it is impossible to have an active learning algorithm that can always guarantee sample reusability between every pair of classifiers. To specialise your dataset for one classifier, you must necessarily exclude samples that could be useful for others. If you want to do active learning, decide which classifier you want to use before you start selecting your samples.