A brief overview of Imitation Learning

Basics of Imitation Learning

Behavioural Cloning

Behavioural cloning can fail if the agent makes a mistake (source)
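Behavioural cloning treats the expert's demonstrations as an ordinary supervised learning dataset of (state, action) pairs. The following is a minimal sketch under illustrative assumptions: a hypothetical one-dimensional task whose expert policy is a = -0.5·s, dynamics s' = s + a, and a linear policy fitted by least squares. None of these names or choices come from the text above.

```python
import random

def expert_policy(state: float) -> float:
    """Hypothetical expert: push the 1-D state towards 0."""
    return -0.5 * state

def collect_demonstrations(n_episodes: int, horizon: int, seed: int = 0):
    """Roll out the expert and record every (state, action) pair it produces."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_episodes):
        s = rng.uniform(-1.0, 1.0)          # random initial state
        for _ in range(horizon):
            a = expert_policy(s)
            data.append((s, a))
            s = s + a                       # assumed dynamics: s' = s + a
    return data

def fit_linear_policy(data):
    """Behavioural cloning as least-squares regression of a = w * s."""
    w = sum(s * a for s, a in data) / sum(s * s for s, _ in data)
    return (lambda s: w * s), w

demos = collect_demonstrations(n_episodes=20, horizon=10)
policy, w = fit_linear_policy(demos)
print(round(w, 3))  # -0.5
```

Note that the cloned policy is only ever trained on states the expert itself visited. In general, nothing in such a dataset tells the agent how to recover once a mistake takes it somewhere the expert never went, which is the failure mode in the figure above.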

Direct Policy Learning (via Interactive Demonstrator)

The general direct policy learning algorithm
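The general loop can be sketched in the style of DAgger (data aggregation), one well-known instance of direct policy learning. This is a simplified illustration under assumptions that are not from the text: a hypothetical 1-D corridor with a goal at 0, random "slips" standing in for the agent's mistakes, and a lookup table standing in for the supervised learner.

```python
import random

LOW, HIGH, GOAL = -10, 10, 0  # hypothetical 1-D corridor with a goal at 0

def expert_action(s: int) -> int:
    """Interactive demonstrator: step towards the goal (+1, 0 or -1)."""
    return (s < GOAL) - (s > GOAL)

def step(s: int, a: int, rng: random.Random, slip_prob: float = 0.2) -> int:
    """Dynamics with occasional random slips, i.e. the agent 'makes a mistake'."""
    if rng.random() < slip_prob:
        a = rng.choice((-1, 1))
    return max(LOW, min(HIGH, s + a))

def rollout(policy, start: int, horizon: int, rng: random.Random) -> list:
    """Execute a policy and return the list of visited states."""
    s, visited = start, []
    for _ in range(horizon):
        visited.append(s)
        s = step(s, policy(s), rng)
    return visited

def train(dataset):
    """Stand-in 'supervised learner': a lookup table; unseen states default to 0."""
    table = dict(dataset)
    return lambda s: table.get(s, 0)

def dagger(n_iters: int = 10, horizon: int = 30, start: int = 8, seed: int = 0):
    rng = random.Random(seed)
    # Initial policy: behavioural cloning on the expert's own rollout.
    dataset = [(s, expert_action(s)) for s in rollout(expert_action, start, horizon, rng)]
    policy = train(dataset)
    for _ in range(n_iters):
        # 1) Execute the current policy: these are the states the learner reaches.
        visited = rollout(policy, start, horizon, rng)
        # 2) Query the interactive expert for the correct action in each of them,
        # 3) and aggregate the new labels with all previous data.
        dataset += [(s, expert_action(s)) for s in visited]
        # 4) Train a new policy on the aggregated dataset.
        policy = train(dataset)
    return policy, dataset

policy, dataset = dagger()
print(len({s for s, _ in dataset}), "distinct states labelled by the expert")
```

The first policy is plain behavioural cloning; every later iteration adds expert labels for states the learner itself reached, which is exactly where a cloned policy lacks data. The original DAgger algorithm also mixes expert and learner actions during rollouts; this sketch omits that for brevity.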

Inverse Reinforcement Learning

  • We start with a set of expert demonstrations (we assume these are optimal) and then try to estimate the parameters of a reward function that would explain the expert’s behaviour/policy.
  • We update the reward function parameters.
  • Then we solve the reinforcement learning problem (given the current reward function, we try to find the optimal policy).
  • Finally, we compare the newly learned policy with the expert’s policy, and repeat the last three steps until the learned policy is good enough.
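The loop above can be illustrated on a hypothetical five-state chain in which the demonstrator always moves right. This is a minimal sketch whose choices are assumptions and are not taken from the text: a known transition model, a reward that is linear in one-hot state features (r(s) = θ_s), value iteration as the RL solver, and a simple feature-matching update that compares the discounted state-visitation counts of the expert and the learner.

```python
GAMMA = 0.9
N_STATES = 5           # hypothetical chain MDP: 0 - 1 - 2 - 3 - 4
ACTIONS = (-1, +1)     # move left / move right
START, HORIZON = 0, 50

def next_state(s: int, a: int) -> int:
    """Known, deterministic transition model (walls at both ends)."""
    return max(0, min(N_STATES - 1, s + a))

def solve_rl(theta, n_sweeps: int = 200):
    """Value iteration for the reward r(s) = theta[s]; returns a greedy policy."""
    v = [0.0] * N_STATES
    for _ in range(n_sweeps):
        v = [theta[s] + GAMMA * max(v[next_state(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    # Ties are broken towards the first action in ACTIONS (move left).
    return [max(ACTIONS, key=lambda a: v[next_state(s, a)]) for s in range(N_STATES)]

def feature_expectations(policy):
    """Discounted state-visitation counts, i.e. expectations of one-hot features."""
    mu, s = [0.0] * N_STATES, START
    for t in range(HORIZON):
        mu[s] += GAMMA ** t
        s = next_state(s, policy[s])
    return mu

def irl(expert_policy, lr: float = 0.1, tol: float = 1e-6, max_iters: int = 100):
    mu_expert = feature_expectations(expert_policy)     # from the demonstrations
    theta = [0.0] * N_STATES                            # reward parameters
    for it in range(max_iters):
        policy = solve_rl(theta)                        # solve the RL problem
        mu = feature_expectations(policy)
        gap = [me - m for me, m in zip(mu_expert, mu)]  # compare with the expert
        if max(abs(g) for g in gap) < tol:
            return theta, policy, it
        # Update the reward parameters: raise the reward where the expert spends
        # more (discounted) time than the current policy, lower it elsewhere.
        theta = [th + lr * g for th, g in zip(theta, gap)]
    return theta, policy, max_iters

expert = [+1] * N_STATES          # the demonstrator always moves right
theta, policy, iters = irl(expert)
print(policy, iters)  # [1, 1, 1, 1, 1] 1
```

Here the loop stops after the second RL solve, once the learned policy visits states exactly as the expert does. Because it calls the known `next_state` model, this sketch sits on the model-given side of the comparison in the figure below. The recovered θ is only one of many reward functions that explain the same behaviour; this ambiguity is a well-known property of IRL.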
The differences between the model-given and the model-free IRL algorithms


Summary

Behavioural Cloning
  • advantages: very simple, can be quite efficient in certain applications
  • disadvantages: no long-term planning; the state distribution can differ between training and testing (in some applications this can lead to critical failure)
  • use when: the application is “simple”, so an error committed by the agent does not lead to severe consequences

Direct Policy Learning
  • advantages: efficient once trained, has long-term planning
  • disadvantages: an interactive expert/demonstrator is required
  • use when: the application is more complex and an interactive expert is available

Inverse Reinforcement Learning
  • advantages: does not need an interactive expert, very efficient once trained (in some cases it can outperform the demonstrator), has long-term planning
  • disadvantages: can be difficult to train
  • use when: the application is more complex, an interactive expert is not available, or it might be easier to learn the reward function than the expert’s policy


Deep Learning and AI solutions from Budapest University of Technology and Economics. http://smartlab.tmit.bme.hu/

SmartLab AI