Applying transfer learning to the autonomous driving task

Author: András Kalapos

Building machines which perform complex, challenging tasks is interesting, and seeing the results in action is very rewarding, isn’t it? In my master’s thesis, I use machine learning to teach a mobile robot to perform lane following on simple roads.

Autonomous vehicles promise several improvements in transportation, one of the most important being that they could be safer and more reliable than human drivers, especially in monotonous situations like highway commutes. Automated taxis, trucks and vans don’t need regular breaks as human drivers do, enabling faster and potentially cheaper, more profitable rides and deliveries. Until recently, research and development dedicated to automated driving mostly applied supervised machine learning and hardcoded control techniques to these problems. Deep reinforcement learning is a subfield of machine learning in which an agent learns from the rewards (reinforcement) it receives from the environment after taking actions. Reinforcement learning agents are commonly trained in simulation, because the trial-and-error process inherent to these algorithms is hard to manage and very time-consuming in reality. Therefore, when applying these algorithms to real-world problems, bridging the gap between the simulated and real domains poses its own challenges. Reinforcement learning-based vehicle control and this simulation-to-reality transfer problem are both challenging open research questions.

In my master’s thesis, I study reinforcement learning techniques and apply them to simple vision-based vehicle control problems, such as lane following. The simulation-to-reality transfer problem in vehicle control is also a central question of my thesis.

Related work

Reinforcement learning has been used to solve many control and robotics tasks; however, only a handful of the papers published so far apply this technique to end-to-end driving. (Mnih, et al. 2016) primarily propose new RL algorithms, but also analyse their performance on the TORCS simulator (Wymann and Espie) by training policies to predict discrete control actions based on a single image from a forward-facing camera. (Jaritz, et al. 2018) used WRC6, a realistic racing simulator, to train a road-following policy which is able to drive on a variety of simulated rally tracks at high speeds. They assess the policy’s generalization capability by testing on previously unseen tracks and on real driving videos in an open-loop configuration, but their work doesn’t extend to evaluation on real vehicles in closed-loop control. (Kendall, et al. 2018) focus more on demonstrating real-world driving, by training a lane-following policy exclusively on a real vehicle, under the supervision of a safety driver. The paper closest to the topic of my thesis is by (Balaji, et al. 2019). They mostly focus on presenting the DeepRacer platform, but also show a method for successfully training a lane-following policy in a simulator and testing the trained agent in the real world, without any further tuning.


Duckietown (Paull, et al. 2017) is a relatively simple and accessible platform for research and education on mobile robotics and autonomous vehicles. It consists of an ecosystem of differential-wheeled mobile robots (so-called duckiebots) and a road system in which they can navigate. The platform also features many miscellaneous components, such as traffic lights, signs and, most importantly, rubber duckies, which frame the problem of automated driving in a more cheerful and friendly way. The Duckiebot is equipped with two driven wheels and a single camera mounted at its front, which is its only sensor. Therefore, the primary challenge of Duckietown is controlling the robots to follow the right lane on a two-way road by relying exclusively on images from the front camera. Leaving the road counts as a failure and results in the termination of the episode, while moving to the left lane is only a mistake: the robot may drive there, but this is not preferred.
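A differential-wheeled robot steers by driving its two wheels at different speeds. The following is a minimal sketch of that idea, not the Duckiebot's actual controller: the function names, the sign convention, the base speed and the wheel separation are all assumptions chosen for illustration.

```python
from typing import Tuple


def steering_to_wheel_speeds(steering: float, base_speed: float = 0.5) -> Tuple[float, float]:
    """Map a steering value in [-1, 1] to (left, right) wheel speeds.

    Hypothetical convention: positive steering turns left by slowing
    the left wheel and speeding up the right wheel.
    """
    steering = max(-1.0, min(1.0, steering))  # keep the command in range
    left = base_speed * (1.0 - steering)
    right = base_speed * (1.0 + steering)
    return left, right


def body_velocity(left: float, right: float, wheel_separation: float = 0.1) -> Tuple[float, float]:
    """Standard differential-drive kinematics.

    Returns the forward speed and the yaw rate of the robot body.
    The wheel separation (in metres) is an assumed value.
    """
    forward = (left + right) / 2.0
    yaw_rate = (right - left) / wheel_separation
    return forward, yaw_rate
```

With equal wheel speeds the yaw rate is zero and the robot drives straight; any difference between the two speeds makes it turn towards the slower wheel.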

At first, I aimed to solve the lane following problem in a simulation of Duckietown (Chevalier-Boisvert, et al. 2018). Using reinforcement learning, I trained a policy which takes the last three camera images and produces a single continuous output value, which determines the steering of the robot. To provide feedback to the policy on its actions, I developed a reward function which simply encourages the robot to move towards the centre of the right lane, where it’s supposed to drive. I designed and selected a diverse set of tracks for training, to obtain robust performance on any road structure. Using open-source implementations, I experimented with multiple reinforcement learning algorithms, mostly from the policy gradient “family”.
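The observation and reward described above can be sketched roughly as follows. This is an illustration only, not the thesis implementation: the class and function names, the lane half-width and the linear shape of the reward are assumptions. The text only states that the policy sees the last three camera images and that the reward encourages driving near the centre of the right lane.

```python
from collections import deque
from typing import Any, Deque, List


class FrameStack:
    """Keeps the most recent `size` camera frames as the policy's observation."""

    def __init__(self, size: int = 3) -> None:
        self.size = size
        self.frames: Deque[Any] = deque(maxlen=size)

    def reset(self, first_frame: Any) -> List[Any]:
        # At episode start, repeat the first frame to fill the buffer.
        self.frames.clear()
        for _ in range(self.size):
            self.frames.append(first_frame)
        return list(self.frames)

    def push(self, frame: Any) -> List[Any]:
        # Appending drops the oldest frame automatically (maxlen).
        self.frames.append(frame)
        return list(self.frames)


def lane_centre_reward(lateral_offset: float, lane_half_width: float = 0.1) -> float:
    """Reward in [0, 1]: 1 at the centre of the right lane, 0 at its edge or beyond.

    `lateral_offset` is the signed distance from the lane centre in metres;
    both the half-width value and the linear falloff are assumed here.
    """
    distance = abs(lateral_offset)
    return max(0.0, 1.0 - distance / lane_half_width)
```

Stacking consecutive frames gives the policy access to motion information that a single image cannot carry, while a reward that peaks at the lane centre gives it a dense signal at every step rather than only at failure.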

To solve the lane-following problem on a real duckiebot in the real Duckietown, I used domain randomisation. This technique involves training a policy in many different variants of a simulated environment, thus improving its generalization capability. The performance of the trained policy is then verified in a matching real-world scenario, without any fine-tuning on real data.
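As a rough sketch of what training in many variants of a simulated environment can look like, the snippet below draws a new set of simulator parameters for every episode. The parameter names and ranges are hypothetical and are not the options of the Duckietown simulator or the values used in the thesis.

```python
import random
from dataclasses import dataclass
from typing import Iterator


@dataclass
class SimVariant:
    """One randomised configuration of the simulator (all fields are assumed)."""
    light_intensity: float   # relative brightness of the scene
    camera_tilt_deg: float   # small perturbation of the camera angle
    road_hue_shift: float    # colour shift applied to road textures
    wheel_gain: float        # scales the motor response


def sample_variant(rng: random.Random) -> SimVariant:
    """Draw one randomised variant of the simulated environment."""
    return SimVariant(
        light_intensity=rng.uniform(0.5, 1.5),
        camera_tilt_deg=rng.uniform(-5.0, 5.0),
        road_hue_shift=rng.uniform(-0.1, 0.1),
        wheel_gain=rng.uniform(0.8, 1.2),
    )


def training_variants(num_episodes: int, seed: int = 0) -> Iterator[SimVariant]:
    """Yield a fresh variant for every training episode."""
    rng = random.Random(seed)
    for _ in range(num_episodes):
        yield sample_variant(rng)
```

The intent is that the real world then looks like just one more variant to the policy, so no real data is needed for fine-tuning.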

To measure the performance of the trained lane-following policies in simulation, I used some “standard” Duckietown metrics: the mean survival time, the distance travelled and the lateral deviation. The quality of the transfer learning process was analysed by comparing real performance to measurements in matching simulated scenarios.
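The three metrics could be computed from logged evaluation episodes along these lines. The `Episode` record and the choice to average the absolute lateral offset are assumptions made for this sketch; the official Duckietown definitions may differ in detail.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Episode:
    """Hypothetical log of one evaluation episode."""
    survival_time_s: float          # time until termination or the episode limit
    distance_m: float               # distance travelled along the road
    lateral_offsets_m: List[float]  # signed offsets from the lane centre, one per step


def summarise(episodes: List[Episode]) -> Dict[str, float]:
    """Average the three metrics over a set of evaluation episodes."""
    return {
        "mean_survival_time_s": mean(e.survival_time_s for e in episodes),
        "mean_distance_m": mean(e.distance_m for e in episodes),
        # Per-episode mean absolute offset, then averaged across episodes.
        "mean_abs_lateral_deviation_m": mean(
            mean(abs(x) for x in e.lateral_offsets_m) for e in episodes
        ),
    }
```

Running the same summary on simulated and on real episodes gives directly comparable numbers for judging the transfer.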


I found that Proximal Policy Optimisation (PPO) was a suitable reinforcement learning algorithm for training vision-based lane-following policies. On the other hand, under the time and resource constraints of this semester, I couldn’t achieve comparable results with other algorithms. Using PPO, the agent learns to control the robot well enough in the simulator that the mean survival time equals the maximum episode length. The distance it travels is 93% of that of a baseline agent, which relies on the position and orientation error provided by the simulator. PPO is not only capable of learning the lane following task in the Duckietown simulator: when trained with domain randomisation, it also generalizes well enough to perform similarly in the real world.
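For readers unfamiliar with PPO, its central idea is a clipped surrogate objective that limits how far a single update can move the policy. The sketch below computes that objective for a small batch in plain Python; it illustrates the general algorithm, not the open-source implementation used in the thesis, and the clip range of 0.2 is merely a commonly used default assumed here.

```python
from typing import Sequence


def clipped_surrogate(
    ratios: Sequence[float],
    advantages: Sequence[float],
    clip_range: float = 0.2,
) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    `ratios` are the probability ratios pi_new(a|s) / pi_old(a|s);
    `advantages` are the corresponding advantage estimates.
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the ratio so large policy changes earn no extra credit.
        clipped_r = max(1.0 - clip_range, min(1.0 + clip_range, r))
        total += min(r * a, clipped_r * a)
    return total / len(ratios)
```

Taking the minimum of the clipped and unclipped terms makes the objective a pessimistic bound: the policy gains nothing from pushing the ratio beyond the clip range in the direction the advantage favours.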

Reinforcement learning agent controlling a simulated duckiebot
Reinforcement learning agent controlling a real duckiebot
Reinforcement learning agent controlling a simulated duckiebot, viewed from the robot’s perspective


Conclusions and future work

During the first semester of my diploma project, I developed a solution to the problem of complex, vision-based lane following in the Duckietown environment. This solution uses reinforcement learning to train an end-to-end steering policy. I also demonstrated that inter-domain transfer learning is possible, by training a policy in a simulated environment and verifying its performance in the real world.

In the next semester, I plan to further examine the problems and techniques explored so far and carry out more thorough experimentation and hyperparameter tuning. To better evaluate the simulation-to-reality transfer, accurate quantitative indicators also need to be developed. Finally, I would like to attempt to solve the lane following challenge in the presence of other vehicles, using vertical transfer learning or curriculum learning in a reinforcement learning setup.

External supervisors: Moni Róbert (Continental Automotive Hungary Kft., BME TMIT), Gór Csaba (Continental Automotive Hungary Kft.)
Internal supervisor: Dr. Harmati István (BME IIT)

The picture illustrates how the trained policy is controlling the duckiebot in the real Duckietown.


Balaji, Bharathan, Sunil Mallya, Sahika Genc, Saurabh Gupta, Leo Dirac, Vineet Khare, Gourav Roy, et al. 2019. “DeepRacer: Educational Autonomous Racing Platform for Experimentation with Sim2Real Reinforcement Learning.”

Chevalier-Boisvert, Maxime, Florian Golemo, Yanjun Cao, Bhairav Mehta, and Liam Paull. 2018. “Duckietown Environments for OpenAI Gym.” GitHub.

Jaritz, Maximilian, Raoul de Charette, Marin Toromanoff, Etienne Perot, and Fawzi Nashashibi. 2018. “End-to-End Race Driving with Deep Reinforcement Learning.” CoRR abs/1807.02371.

Kendall, Alex, Jeffrey Hawke, David Janz, Przemyslaw Mazur, Daniele Reda, John-Mark Allen, Vinh-Dieu Lam, Alex Bewley, and Amar Shah. 2018. “Learning to Drive in a Day.” CoRR abs/1807.00412.

Mnih, Volodymyr, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. “Asynchronous Methods for Deep Reinforcement Learning.” CoRR abs/1602.01783.

Paull, L., J. Tani, H. Ahn, J. Alonso-Mora, L. Carlone, M. Cap, Y. F. Chen, et al. 2017. “Duckietown: An open, inexpensive and flexible platform for autonomy education and research.” 2017 IEEE International Conference on Robotics and Automation (ICRA). 1497–1504.

Wymann, Bernhard, and Eric Espie. “TORCS: The Open Racing Car Simulator.”