What is Semi-Supervised Learning?
Semi-supervised learning is a learning problem that involves a small number of labelled examples and a large number of unlabelled examples.
Learning problems of this type are challenging because neither supervised nor unsupervised learning algorithms can make effective use of the mixture of labelled and unlabelled data. As such, specialized semi-supervised learning algorithms are required.
In this guide, you will get an introduction to the field of semi-supervised learning for machine learning.
After working through this guide, you will know:
- That semi-supervised learning is a type of machine learning that sits between supervised and unsupervised learning.
- The top books on semi-supervised learning, to get you up to speed in the field.
- Additional resources on semi-supervised learning, such as review papers and APIs.
Tutorial Overview
This tutorial is divided into three parts, which are:
- Semi-supervised learning
- Books on semi-supervised learning
- Additional resources
Semi-Supervised Learning
Semi-supervised learning is a variant of machine learning.
It refers to a learning problem (and the algorithms designed for it) that involves a small portion of labelled examples and a large number of unlabelled examples, from which a model must learn and make predictions on new examples.
Because it addresses the situation where relatively few labelled training points are available but a large number of unlabelled points are given, it is directly relevant to the many practical problems where labelled data is relatively expensive to generate.
As such, it is a learning problem that sits between supervised learning and unsupervised learning.
Semi-supervised learning (SSL) is halfway between supervised and unsupervised learning. In addition to unlabelled data, the algorithm is provided with some supervision information, but not necessarily for all examples. Often, this information will be the targets associated with some of the examples.
We need semi-supervised learning algorithms when working with data where labelling examples is difficult or expensive.
Semi-supervised learning has tremendous practical value. In many tasks, there is a paucity of labelled data. The labels y may be difficult to obtain because they require human annotators, special devices, or expensive and slow experiments.
The sign of an effective semi-supervised learning algorithm is that it achieves better performance than a supervised learning algorithm fit only on the labelled training examples.
Semi-supervised learning algorithms are generally able to clear this low bar.
Compared with a supervised algorithm that uses only labelled data, can one hope for more accurate predictions by taking the unlabelled points into account? In principle, the answer is yes.
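As a rough illustration of this comparison, the sketch below fits a supervised baseline on the labelled examples only and a semi-supervised model on the labelled and unlabelled examples together, then scores both on the same held-out test set. The synthetic dataset, the 10% labelling rate, and the choice of logistic regression wrapped in scikit-learn's SelfTrainingClassifier are all assumptions made for the example, not a method prescribed by this guide. Whether the semi-supervised model actually comes out ahead depends on the data.

```python
# Sketch: supervised baseline vs. a semi-supervised model on the same data.
# The dataset, split sizes, and model choices are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic binary classification problem (a stand-in for real data).
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)

# Hold out half of the data as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1, stratify=y)

# Pretend that only 10% of the training examples have labels.
X_lab, X_unlab, y_lab, _ = train_test_split(
    X_train, y_train, test_size=0.9, random_state=1, stratify=y_train)

# Supervised baseline: fit on the labelled examples only.
baseline = LogisticRegression().fit(X_lab, y_lab)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# Semi-supervised model: the same classifier, but also given the unlabelled
# examples, which scikit-learn expects to be marked with the label -1.
X_mixed = np.concatenate([X_lab, X_unlab])
y_mixed = np.concatenate([y_lab, np.full(len(X_unlab), -1)])
ssl_model = SelfTrainingClassifier(LogisticRegression()).fit(X_mixed, y_mixed)
ssl_acc = accuracy_score(y_test, ssl_model.predict(X_test))

print(f"Supervised baseline accuracy: {baseline_acc:.3f}")
print(f"Self-training accuracy:       {ssl_acc:.3f}")
```

Using the same base classifier on both sides keeps the comparison fair: any difference in accuracy comes from the unlabelled examples rather than from a change of model.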
Finally, semi-supervised learning can be used for both inductive and transductive learning, and the distinction between the two is worth drawing.
Generally, inductive learning refers to a learning algorithm that learns from labelled training data and generalizes to new data, such as a test dataset. Transductive learning refers to learning from labelled training data and generalizing to the available unlabelled (training) data. Both types of learning task may be performed by a semi-supervised learning algorithm.
There are two distinct goals. One is to predict the labels on future test data. The other is to predict the labels on the unlabelled examples in the training sample. We call the former inductive semi-supervised learning, and the latter transductive learning.
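The two goals can be seen side by side in a small sketch. It assumes scikit-learn's LabelPropagation as the model and a synthetic two-cluster dataset, both chosen only for illustration. After fitting, the `transduction_` attribute holds the labels assigned to the training examples (the transductive result), while `predict` labels examples the model has never seen (the inductive result).

```python
# Sketch: transductive vs. inductive predictions from one fitted model.
# The data and the choice of LabelPropagation are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelPropagation

# 210 synthetic points in two clusters.
X, y = make_blobs(n_samples=210, centers=2, random_state=0)
X_train, y_train = X[:200], y[:200]
X_new = X[200:]  # "future" data, never seen during fitting

# Keep the labels of 10 examples per class; mark the rest as unlabelled (-1).
y_partial = np.full(len(y_train), -1)
for label in np.unique(y_train):
    keep = np.where(y_train == label)[0][:10]
    y_partial[keep] = label

model = LabelPropagation(kernel="knn", n_neighbors=7).fit(X_train, y_partial)

# Transductive: labels inferred for the unlabelled training examples.
unlabelled = y_partial == -1
transductive_labels = model.transduction_[unlabelled]

# Inductive: labels predicted for new examples.
inductive_labels = model.predict(X_new)

print("Unlabelled training examples labelled:", len(transductive_labels))
print("New examples labelled:", len(inductive_labels))
```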
If you are new to the idea of transduction versus induction, we will cover it in a separate post on transduction in machine learning.
Now that we are familiar with semi-supervised learning at a high level, let's take a look at the leading books on the topic.
Books on Semi-Supervised Learning
Semi-supervised learning is a new and fast-moving field of research, and as such, there are very few books on the topic.
There are perhaps two key books on semi-supervised learning that you should consider if you are new to the topic, which are:
- Semi-Supervised Learning, 2006
- Introduction to Semi-Supervised Learning, 2009
Semi-Supervised Learning, 2006
The book “Semi-Supervised Learning” was published in 2006 and was edited by Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien.
This book contains a large number of chapters, each written by leading researchers in the field.
It is designed to take you on a tour of the field of research, including intuitions, top techniques, and open problems.
The complete table of contents is detailed below.
Table of Contents
Chapter 1: Introduction to Semi-Supervised Learning
Part I: Generative Models
Chapter 2: A taxonomy for semi-supervised learning methods
Chapter 3: Semi-supervised text classification using EM
Chapter 4: Risks of semi-supervised learning
Chapter 5: Probabilistic semi-supervised clustering with constraints
Part II: Low-Density Separation
Chapter 6: Transductive Support Vector Machines
Chapter 7: Semi-supervised learning using semi-definite programming
Chapter 8: Gaussian processes and the null-category noise model
Chapter 9: Entropy regularization
Chapter 10: Data-dependent regularization
Part III: Graph-based methods
Chapter 11: Label propagation and quadratic criterion
Chapter 12: The geometric basis of semi-supervised learning
Chapter 13: Discrete regularization
Chapter 14: Semi-supervised learning with conditional harmonic mixing
Part IV: Change of Representation
Chapter 15: Graph Kernels by Spectral Transforms
Chapter 16: Spectral methods for dimensionality reduction
Chapter 17: Modifying distances
Part V: Semi-supervised learning in practice
Chapter 18: Large-scale algorithms
Chapter 19: Semi-supervised protein classification using cluster kernels
Chapter 20: Prediction of protein function from networks
Chapter 21: Analysis of benchmarks
Part VI: Perspectives
Chapter 22: An augmented PAC model for semi-supervised learning
Chapter 23: Metric-based approaches for semi-supervised regression and classification
Chapter 24: Transductive inference and semi-supervised learning
Chapter 25: A discussion of semi-supervised learning and transduction
Introduction to Semi-Supervised Learning, 2009
The book “Introduction to Semi-Supervised Learning” was published in 2009 and was written by Xiaojin Zhu and Andrew Goldberg.
The book is aimed at students, researchers, and engineers who are just getting started in the field.
The book is a beginner's guide to semi-supervised learning. It is aimed at advanced undergraduates, entry-level graduate students, and researchers in areas as diverse as Computer Science, Electrical Engineering, Statistics, and Psychology.
It is a shorter read than the book above and a great introduction.
The complete table of contents is detailed below:
Table of Contents
Chapter 1: Introduction to Statistical Machine Learning
Chapter 2: Overview of semi-supervised learning
Chapter 3: Mixture models and EM
Chapter 4: Co-training
Chapter 5: Graph-based semi-supervised learning
Chapter 6: Semi-supervised support vector machines
Chapter 7: Human semi-supervised learning
Chapter 8: Theory and outlook
Other Books
There are some additional books on semi-supervised learning that you might also want to consider, which are:
- Semi-supervised learning: Background, applications, and future directions, 2018
- Graph-based semi-supervised learning, 2014
Additional Resources
There are additional resources that may be helpful when getting started in the field of semi-supervised learning.
A good place to begin is with a few review papers.
Examples of good review papers on semi-supervised learning include:
- Semi-supervised learning literature survey, 2005
- Introduction to semi-supervised learning, 2009
- An overview of deep semi-supervised learning, 2020
The last of these papers provides a detailed overview of deep semi-supervised learning, beginning with an introduction to the field, followed by a summary of the dominant semi-supervised approaches in deep learning.
It is also a good idea to try out a few of the algorithms.
The scikit-learn Python machine learning library provides a few graph-based semi-supervised learning algorithms that you can try out:
- Section 1.14. Semi-supervised learning, scikit-learn User Guide
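As a starting point for experimenting, the sketch below runs scikit-learn's two graph-based estimators, LabelPropagation and LabelSpreading, on the same partially labelled data. The two-moons dataset, the five labelled examples per class, and the k-nearest-neighbour kernel settings are assumptions picked for the illustration, so treat the printed numbers as a quick check rather than a benchmark.

```python
# Sketch: trying scikit-learn's two graph-based semi-supervised estimators.
# Dataset and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation, LabelSpreading

# Two interleaving half-circles, a common toy problem for graph-based methods.
X, y = make_moons(n_samples=300, noise=0.1, random_state=42)

# Keep 5 labels per class; unlabelled examples are marked with -1.
y_partial = np.full(len(y), -1)
for label in np.unique(y):
    keep = np.where(y == label)[0][:5]
    y_partial[keep] = label
unlabelled = y_partial == -1

# Fit each estimator and measure how well it recovers the hidden labels.
results = {}
for estimator in (LabelPropagation(kernel="knn", n_neighbors=10),
                  LabelSpreading(kernel="knn", n_neighbors=10)):
    estimator.fit(X, y_partial)
    inferred = estimator.transduction_[unlabelled]
    results[type(estimator).__name__] = float(np.mean(inferred == y[unlabelled]))

for name, accuracy in results.items():
    print(f"{name}: accuracy on unlabelled examples = {accuracy:.3f}")
```

The hidden labels are only used here to score the result. In a real problem they would not be available, so a separate labelled validation set would be needed to evaluate the model.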
The Wikipedia article might also provide some useful links for further reading:
- Semi-supervised learning, Wikipedia
Conclusion
In this guide, you got a brief introduction to the field of semi-supervised learning for machine learning.
Specifically, you learned:
- That semi-supervised learning is a type of machine learning that sits between supervised and unsupervised learning.
- The top books on semi-supervised learning, to get you up to speed in the field.
- Additional resources on semi-supervised learning, such as review papers and APIs.