Apply semi-supervised learning for semantic segmentation

SmartLab AI
Feb 25, 2021


Author: Gábor Lant

One of the key strengths of deep learning is that it can use large amounts of data to train models that capture the underlying structure of that data. The flip side is that these methods need a lot of data to work properly, and finding enough good-quality data can be a challenge in itself. This is a common problem in computer vision: labeling images is difficult and usually has to be done by a trained human annotator. Annotating a video frame by frame, for example, takes far longer than recording or playing back the original footage. This creates new problems for computer vision researchers to solve.


Deep learning methods have achieved remarkable performance in image processing tasks. However, training these models requires large datasets to avoid overfitting. An overfitted model fits the training data so closely that it fails to generalize. Generalization here means how well the model performs on data it has never seen before (e.g. test data). Generalization can be improved through careful network design or data augmentation.

There are many strategies that focus on increasing the performance of the network by modifying its architecture or the way it is trained. Commonly applied methods include dropout, batch normalization and transfer learning.

Data augmentation, on the other hand, does not modify the model; instead, it artificially increases the amount of available training data. It allows us to create new artificial data points by modifying existing ones. The appropriate modifications vary with the task. For computer vision problems, for example, translation, cropping, noise injection and color space transformations work well.

Augmentation examples
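To make the augmentations listed above concrete, here is a minimal NumPy sketch of translation, cropping, noise injection and a simple brightness change. The function names, the zero-fill border handling and the assumption that images are float arrays in [0, 1] with shape H x W x C are ours for illustration; in practice a library such as torchvision or Albumentations would normally be used.

```python
import numpy as np

def translate(image, dx, dy):
    """Shift an H x W x C image by (dx, dy) pixels; exposed areas are filled with zeros."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    if abs(dx) >= w or abs(dy) >= h:
        return out  # the image was shifted completely out of the frame
    # Windows of the source and destination that stay inside the frame.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out

def random_crop(image, size, rng):
    """Cut a random (crop_h, crop_w) window out of the image."""
    h, w = image.shape[:2]
    crop_h, crop_w = size
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w]

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to the valid [0, 1] range."""
    return np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)

def adjust_brightness(image, factor):
    """Scale pixel intensities; factor > 1 brightens, factor < 1 darkens."""
    return np.clip(image * factor, 0.0, 1.0)
```

Note that for segmentation, geometric transformations such as translation and cropping have to be applied to the label map in exactly the same way as to the image, while noise and brightness changes leave the labels untouched.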

One of the problems when using data augmentation is choosing the right set of policies for the given data. Some recent approaches solve this problem by applying policy search methods.

  • AutoAugment uses a reinforcement-learning-style policy search to find the best set of augmentations for the given data.
  • RandAugment does not run a search; instead, it applies randomly chosen augmentations to each batch. This method seems to reach similar performance with less computational overhead.
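The random-selection idea can be sketched in a few lines. This is only an illustration of the principle, not the reference RandAugment implementation: the four operations, the way the `magnitude` value is mapped to each operation and the names `OPS` and `rand_augment` are all hypothetical choices of ours. Images are assumed to be float arrays in [0, 1].

```python
import numpy as np

# Each operation takes (image, magnitude in [0, 1], rng) and returns a new image.
def brightness(img, m, rng):
    return np.clip(img * (1.0 + 0.5 * m), 0.0, 1.0)

def contrast(img, m, rng):
    mean = img.mean()
    return np.clip((img - mean) * (1.0 + m) + mean, 0.0, 1.0)

def gaussian_noise(img, m, rng):
    return np.clip(img + rng.normal(0.0, 0.1 * m, img.shape), 0.0, 1.0)

def horizontal_shift(img, m, rng):
    # np.roll wraps pixels around the border; a real pipeline would usually pad instead.
    return np.roll(img, int(round(0.25 * m * img.shape[1])), axis=1)

OPS = [brightness, contrast, gaussian_noise, horizontal_shift]

def rand_augment(img, n_ops=2, magnitude=0.5, rng=None):
    """Apply n_ops operations drawn uniformly at random, all with one shared magnitude."""
    if rng is None:
        rng = np.random.default_rng()
    for i in rng.integers(0, len(OPS), size=n_ops):
        img = OPS[i](img, magnitude, rng)
    return img
```

Because there is no search, the only things left to tune are the number of operations and the shared magnitude, which is where the lower computational overhead comes from.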

Semi-supervised learning

This approach falls between supervised and unsupervised learning. Semi-supervised methods exploit the fact that unlabeled data is much cheaper to produce than labeled data. They train the model on both labeled and unlabeled data in the hope of improving its performance and generalization.

A recent study has shown that there is a lot to learn from semi-supervised methods that apply data augmentation in the training phase.

Unsupervised Data Augmentation (UDA) has shown state-of-the-art performance on computer vision tasks where labeled data is scarce. It is a semi-supervised method based on consistency training. The left side of the model calculates a supervised loss on the labeled data, while the other side calculates a consistency loss between the model's predictions on unlabeled data and on augmented versions of the same data.

UDA model
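The two loss terms can be sketched as follows. This is a simplified NumPy illustration under our own assumptions, not the training code used for the experiments: the supervised term is a cross-entropy, the consistency term is a KL divergence between the predictions on clean and augmented unlabeled inputs, and the weighting factor `lam` and all function names are hypothetical. A real implementation would use an autodiff framework and would not propagate gradients through the clean predictions.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Supervised loss: mean cross-entropy for logits (N, C) and integer labels (N,)."""
    log_p = np.log(softmax(logits) + 1e-12)
    return -log_p[np.arange(len(labels)), labels].mean()

def consistency_kl(logits_clean, logits_aug):
    """Consistency loss: mean KL(p_clean || p_aug) over all unlabeled samples."""
    p = softmax(logits_clean)  # treated as a fixed target
    q = softmax(logits_aug)
    return np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1).mean()

def uda_loss(logits_labeled, labels, logits_unlabeled, logits_unlabeled_aug, lam=1.0):
    """Total loss = supervised loss + lam * consistency loss."""
    supervised = cross_entropy(logits_labeled, labels)
    consistency = consistency_kl(logits_unlabeled, logits_unlabeled_aug)
    return supervised + lam * consistency
```

For segmentation, the same functions can be applied per pixel by reshaping network outputs of shape (B, H, W, C) to (B*H*W, C). This assumes the augmentation does not move pixels (for example color or noise changes); with geometric augmentations the two predictions would first have to be aligned.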

Applying it for semantic segmentation

Semantic segmentation is a problem where a lot of training data is needed to achieve good performance. However, creating a label map that covers every pixel of an image is a time-consuming process.

UDA can help with this problem by combining the strengths of semi-supervised learning and data augmentation. We trained two standard semantic segmentation models, Deeplab v3 and U-Net, on the popular Cityscapes benchmark dataset and achieved noticeable improvements over baseline models trained on labeled data only. In baseline training, the two models achieved mIoU scores of 0.600 and 0.676.

UDA was able to improve the results by +0.062 and +0.002 mIoU score over the baseline.
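For readers unfamiliar with the metric, mIoU (mean intersection over union) averages, over all classes, the overlap between predicted and ground-truth pixels divided by their union. The sketch below is a generic NumPy version and not the evaluation script behind the numbers above; the function name is ours, and it assumes integer class maps without an "ignore" label, which Cityscapes evaluation would normally mask out first.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection over union for integer class maps of equal shape."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # confusion[i, j] = number of pixels whose true class is i and predicted class is j
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    present = union > 0  # skip classes that appear in neither map
    return (intersection[present] / union[present]).mean()

target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
score = mean_iou(pred, target, num_classes=2)
```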

Some examples of the output: (top to bottom: U-Net, Deeplab, UDA U-Net, UDA Deeplab).

Input image — predicted image — ground truth

This study shows that semi-supervised learning can improve model accuracy in semantic segmentation tasks and that there is a lot more to learn about how to train models with specific augmentations to improve generalization.



SmartLab AI

Deep Learning and AI solutions from Budapest University of Technology and Economics.