What is Semi-Supervised Learning?

Semi-supervised learning is a learning problem that involves a small number of labelled examples and a large number of unlabelled examples.

Learning problems of this type are challenging because neither supervised nor unsupervised learning algorithms can make effective use of a mixture of labelled and unlabelled data. As such, specialized semi-supervised learning algorithms are needed.

In this guide, you will get an introduction to the field of semi-supervised learning for machine learning.

After going through this guide, you will know:

  • Semi-supervised learning is a variant of machine learning that sits between supervised and unsupervised learning. 
  • The top books on semi-supervised learning that will get you up to speed in the field.
  • Additional resources on semi-supervised learning, such as review papers and APIs.

Tutorial Overview

This tutorial is divided into three parts:

  1. Semi-supervised learning 
  2. Books on semi-supervised learning 
  3. Additional resources 

Semi-Supervised Learning

Semi-supervised learning is a type of machine learning.

It refers to a learning problem (and the algorithms designed for it) that involves a small portion of labelled examples and a large number of unlabelled examples, from which a model must learn and then make predictions on new examples.

Because it deals with the situation where relatively few labelled training points are available but a large number of unlabelled points are given, it is directly relevant to a multitude of practical problems where labelled data is relatively expensive to produce.

As such, it is a learning problem that sits between supervised learning and unsupervised learning. 

Semi-supervised learning (SSL) is halfway between supervised and unsupervised learning. In addition to unlabelled data, the algorithm is provided with some supervision information, but not necessarily for all examples. Often, this information will be the targets associated with some of the examples.
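To make the idea of "targets for only some of the examples" concrete, here is a minimal sketch of how such a dataset might be represented. It assumes scikit-learn's convention of marking unlabelled examples with the target value -1. The synthetic dataset and the choice of 50 labelled examples are illustrative assumptions, not part of any real problem.

```python
import numpy as np
from sklearn.datasets import make_classification

# Synthetic two-class dataset (illustrative only).
X, y_true = make_classification(
    n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=1
)

# Pretend that only 50 randomly chosen examples were ever labelled.
rng = np.random.RandomState(1)
labelled_idx = rng.choice(len(y_true), size=50, replace=False)

# Start with every target hidden; -1 is scikit-learn's marker for "unlabelled".
y = np.full_like(y_true, -1)
y[labelled_idx] = y_true[labelled_idx]

n_labelled = int((y != -1).sum())
n_unlabelled = int((y == -1).sum())
print(n_labelled, n_unlabelled)  # 50 950
```

The feature matrix X is complete, while the target vector y carries real labels for only a small fraction of the rows.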

We need semi-supervised learning algorithms when working with data where labelling examples is difficult or expensive.

Semi-supervised learning has tremendous practical value. In many tasks, there is a paucity of labelled data. The labels y may be difficult to obtain because they require human annotators, special devices, or expensive and slow experiments.

The sign of an effective semi-supervised learning algorithm is that it achieves better performance than a supervised learning algorithm fit only on the labelled training examples.

Semi-supervised learning algorithms are generally able to clear this low bar.

Compared with a supervised algorithm that uses only labelled data, can one hope for a more accurate prediction by taking the unlabelled points into account? In theory, the answer is yes.
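One way to check whether a semi-supervised model clears this bar is to compare it against a supervised baseline fit only on the labelled examples. The sketch below assumes a synthetic dataset, a logistic regression baseline, and scikit-learn's LabelSpreading as the semi-supervised model. These are illustrative choices, and the result will vary with the data and settings, so treat it as a template for the comparison rather than evidence that one approach wins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import LabelSpreading

# Synthetic two-class dataset (illustrative only).
X, y = make_classification(
    n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=1
)

# Hold out half of the data as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1, stratify=y
)

# Split the training set into a small labelled part and a large unlabelled part.
X_lab, X_unlab, y_lab, _ = train_test_split(
    X_train, y_train, test_size=0.9, random_state=1, stratify=y_train
)

# Supervised baseline: sees the labelled examples only.
baseline = LogisticRegression().fit(X_lab, y_lab)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# Semi-supervised model: sees labelled and unlabelled examples (marked with -1).
X_mixed = np.concatenate([X_lab, X_unlab])
y_mixed = np.concatenate([y_lab, np.full(len(X_unlab), -1)])
ssl_model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X_mixed, y_mixed)
ssl_acc = accuracy_score(y_test, ssl_model.predict(X_test))

print(f"Supervised baseline accuracy: {baseline_acc:.3f}")
print(f"Semi-supervised accuracy:     {ssl_acc:.3f}")
```

If the semi-supervised score is not higher than the baseline, the unlabelled data is not helping with this model and configuration.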

Finally, semi-supervised learning may be used for either inductive or transductive learning.

Generally, inductive learning refers to a learning algorithm that learns from labelled training data and generalizes to new data, such as a test dataset. Transductive learning refers to learning from labelled training data and generalizing to the available unlabelled (training) data. Both types of learning task may be performed by a semi-supervised learning algorithm.

There are two distinct goals. One is to predict the labels on future test data. The other is to predict the labels on the unlabelled instances in the training sample. We call the former inductive semi-supervised learning, and the latter transductive learning.

If you are new to the idea of transduction vs. induction, we will be publishing a separate post on transduction in machine learning that you can refer to.

Now that we are familiar with semi-supervised learning at a high level, let’s take a look at the top books on the topic.
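The two goals can be seen side by side in a small sketch. It assumes scikit-learn's LabelPropagation, which exposes the labels inferred for the training examples through its transduction_ attribute and labels unseen examples through predict(). The dataset, the number of labelled points, and the kernel settings are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

# Synthetic two-class dataset (illustrative only).
X, y_true = make_moons(n_samples=300, noise=0.1, random_state=0)

# The first 200 points form the training sample; the last 100 are "future" data.
X_train, y_train_true = X[:200], y_true[:200]
X_new = X[200:]

# Keep the labels of 10 training points per class and hide the rest with -1.
y_train = np.full(200, -1)
labelled_idx = np.concatenate(
    [np.where(y_train_true == c)[0][:10] for c in (0, 1)]
)
y_train[labelled_idx] = y_train_true[labelled_idx]

model = LabelPropagation(kernel="knn", n_neighbors=7, max_iter=1000)
model.fit(X_train, y_train)

# Transductive goal: labels for the unlabelled points in the training sample.
transductive_labels = model.transduction_

# Inductive goal: labels for new points that were not seen during fitting.
inductive_labels = model.predict(X_new)

print(transductive_labels.shape, inductive_labels.shape)
```

Here the same fitted model serves both goals; a purely transductive method would stop at the first step and have no way to label X_new without refitting.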

Books on Semi-Supervised Learning 

Semi-supervised learning is a new and fast-moving field of research, and as such, there are very few books on the topic.

There are perhaps two key books on semi-supervised learning that you should consider if you are new to the topic:

  • Semi-Supervised Learning, 2006
  • Introduction to Semi-Supervised Learning, 2009 

Semi-Supervised Learning, 2006 

The book “Semi-Supervised Learning” was published in 2006 and was edited by Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien.

This book provides a large number of chapters, each written by leading researchers in the field.

It is designed to take you on a tour of the field of research, including intuitions, top techniques, and open problems.

The complete table of contents is listed below.

Table of Contents 

Chapter 1: Introduction to Semi-Supervised Learning

Part I: Generative Models 

Chapter 2: A taxonomy for semi-supervised learning methods

Chapter 3: Semi-supervised text classification using EM 

Chapter 4: Risks of semi-supervised learning 

Chapter 5: Probabilistic semi-supervised clustering with constraints 

Part II: Low-Density Separation 

Chapter 6: Transductive Support Vector Machines 

Chapter 7: Semi-supervised learning using semi-definite programming 

Chapter 8: Gaussian processes and the null-category noise model 

Chapter 9: Entropy regularization 

Chapter 10: Data-dependent regularization 

Part III: Graph-based methods

Chapter 11: Label propagation and quadratic criterion 

Chapter 12: The geometric basis of semi-supervised learning 

Chapter 13: Discrete regularization 

Chapter 14: Semi-supervised learning with conditional harmonic mixing 

Part IV: Change of Representation 

Chapter 15: Graph Kernels by Spectral Transforms 

Chapter 16: Spectral methods for dimensionality reduction 

Chapter 17: Modifying distances 

Part V: Semi-supervised learning in practice 

Chapter 18: Large-scale algorithms 

Chapter 19: Semi-supervised protein classification using cluster kernels 

Chapter 20: Prediction of protein function from networks 

Chapter 21: Analysis of benchmarks 

Part VI: Perspectives 

Chapter 22: An augmented PAC model for semi-supervised learning 

Chapter 23: Metric-based approaches for semi-supervised regression and classification

Chapter 24: Transductive inference and semi-supervised learning 

Chapter 25: A discussion of semi-supervised learning and transduction 

Introduction to Semi-Supervised Learning, 2009

The book “Introduction to Semi-Supervised Learning” was published in 2009 and was written by Xiaojin Zhu and Andrew Goldberg.

The book is aimed at students, researchers, and engineers who are just getting started in the field.

The book is a beginner’s guide to semi-supervised learning. It is aimed at advanced undergraduates, entry-level graduate students, and researchers in areas as diverse as Computer Science, Electrical Engineering, Statistics, and Psychology.

It’s a shorter read than the above book and a great introduction.

The complete table of contents is listed below.

Table of Contents 

Chapter 1: Introduction to Statistical Machine Learning

Chapter 2: Overview of semi-supervised learning 

Chapter 3: Mixture models and EM 

Chapter 4: Co-training 

Chapter 5: Graph-based semi-supervised learning 

Chapter 6: Semi-supervised support vector machines 

Chapter 7: Human semi-supervised learning 

Chapter 8: Theory and outlook 

Other Books 

There are a few additional books on semi-supervised learning that you might also like to consider:

  • Semi-supervised learning: Background, applications, and future directions, 2018 
  • Graph-based semi-supervised learning, 2014 

Additional Resources 

There are additional resources that may be helpful when getting started in the field of semi-supervised learning.

A good place to start is with some review papers.

Some examples of good review papers on semi-supervised learning include:

  • Semi-supervised learning literature survey, 2005 
  • Introduction to semi-supervised learning, 2009 
  • An overview of deep semi-supervised learning, 2020

The 2020 paper provides a comprehensive overview of deep semi-supervised learning, beginning with an introduction to the field, followed by a summary of the dominant semi-supervised approaches in deep learning.

It is also a good idea to try out some of the algorithms.

The scikit-learn Python machine learning library provides a few graph-based semi-supervised learning algorithms that you can try:

  • Section 1.14. Semi-supervised learning, scikit-learn User Guide
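As a starting point for experimenting, the sketch below fits the two graph-based estimators from scikit-learn's semi_supervised module, LabelPropagation and LabelSpreading, on the same partially labelled dataset. The dataset, the five labelled points per class, and the hyperparameter values are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.semi_supervised import LabelPropagation, LabelSpreading

# Synthetic dataset of two concentric circles (illustrative only).
X, y_true = make_circles(n_samples=400, factor=0.5, noise=0.05, random_state=0)

# Label only 5 points per class; mark everything else as unlabelled (-1).
y = np.full(len(y_true), -1)
for c in (0, 1):
    y[np.where(y_true == c)[0][:5]] = c
unlabelled = y == -1

models = {
    "LabelPropagation": LabelPropagation(kernel="knn", n_neighbors=10, max_iter=1000),
    "LabelSpreading": LabelSpreading(kernel="knn", n_neighbors=10, alpha=0.2),
}

scores = {}
for name, model in models.items():
    model.fit(X, y)
    # Accuracy of the inferred labels on the points whose labels were hidden.
    inferred = model.transduction_[unlabelled]
    scores[name] = float(np.mean(inferred == y_true[unlabelled]))
    print(f"{name}: {scores[name]:.3f}")
```

Both estimators accept kernel="rbf" as well as kernel="knn", so the graph construction is one of the first things worth varying when you try them on your own data.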

The Wikipedia article may also provide some useful links for further reading:

  • Semi-supervised learning, Wikipedia

Conclusion 

In this guide, you received a brief introduction to the field of semi-supervised learning for machine learning.

Specifically, you learned:

  • Semi-supervised learning is a variant of machine learning that sits between supervised and unsupervised learning. 
  • The top books on semi-supervised learning that will get you up to speed in the field.
  • Additional resources on semi-supervised learning, such as review papers and APIs.