Author: Péter Almási
My goal was to create a method that uses deep reinforcement learning to control vehicles performing autonomous lane following. The agent is trained in a simulated environment without any real-world data and is then tested in the real world. Its performance was also evaluated under extreme test scenarios: night-mode driving and recovery from irregular starting positions.
Deep Reinforcement Learning (DRL) is a field of machine learning in which intelligent software agents learn to achieve their goals by interacting with an environment. DRL agents use deep neural networks to learn the best possible action in each state. This technique has been applied successfully to defeat world-class human players in board and computer games, for example Go¹ and StarCraft II².
However, using DRL to solve tasks that involve real-world devices, such as robots or autonomous vehicles, is a considerably harder challenge. A common approach is to train the agent in a simulator and then transfer it to the real world. Even training agents in an autonomous driving simulator is challenging, because most DRL methods still lack solid mathematical foundations and their training is often unstable. Furthermore, models trained in a simulator tend to suffer severe performance degradation when transferred to the real world, due to the differences between the simulated and the real environment (for example, in visual appearance and in vehicle dynamics).
In this article, I propose a method for training autonomous driving agents in a simulator and transferring them to real-world vehicles. I describe how self-driving robots are trained with deep reinforcement learning in the Duckietown simulation environment, and I present a method for effectively transferring the trained agents from the simulator to real vehicles. As a result, the agents can drive autonomously in the real world without any further training on real-world data. The robustness of the method is evaluated in extreme test cases that the agent was not explicitly trained to handle: driving at night and recovering from an invalid starting position. The method runs in real time on a single computer with limited hardware resources (i.e., without a dedicated graphics card).
I used the Duckietown environment³ to implement and test my method. It is an open, inexpensive, and flexible platform for small autonomous vehicles. The platform consists of two parts: the Duckietowns and the Duckiebots.
The Duckietowns are the “cities” with roads, intersections, and obstacles. They are assembled from standardized map tiles, from which many different kinds of environments can be built. Simpler setups consist of only a single road, whereas more complex configurations can even include intersections.
The Duckiebots are the vehicles that need to be controlled in the cities. They are small, three-wheeled vehicles with a differential drive. Their only sensor is a forward-facing monocular camera with a fish-eye lens. My goal was to create a control function that processes the camera images and outputs commands for the velocities of the vehicle's wheels in order to achieve successful lane following.
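To make the differential-drive control problem concrete, the following sketch converts between the two wheel speeds and the vehicle's forward speed and turning rate, using the standard differential-drive kinematics. The wheel base value and the function names are illustrative assumptions, not taken from the Duckietown software.

```python
from typing import Tuple

# Illustrative distance between the two driven wheels, in meters.
# This is an assumption for the example, not a measured Duckiebot value.
WHEEL_BASE = 0.1


def wheels_to_body(v_left: float, v_right: float,
                   wheel_base: float = WHEEL_BASE) -> Tuple[float, float]:
    """Convert left/right wheel ground speeds (m/s) into the vehicle's
    forward speed (m/s) and yaw rate (rad/s, positive = turning left)."""
    v = (v_right + v_left) / 2.0
    omega = (v_right - v_left) / wheel_base
    return v, omega


def body_to_wheels(v: float, omega: float,
                   wheel_base: float = WHEEL_BASE) -> Tuple[float, float]:
    """Inverse mapping: desired forward speed and yaw rate -> wheel speeds."""
    v_left = v - omega * wheel_base / 2.0
    v_right = v + omega * wheel_base / 2.0
    return v_left, v_right


# Equal wheel speeds drive straight; a faster right wheel turns the robot left.
print(wheels_to_body(0.2, 0.2))  # (0.2, 0.0)
print(wheels_to_body(0.1, 0.3))  # forward about 0.2 m/s, yaw rate about 2.0 rad/s
```

A lane-following controller therefore only has to decide how much faster one wheel should turn than the other; steering emerges from the speed difference.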
An overview of the method is shown in the following picture.
In each time step, the environment (either the simulator or the vehicle's camera) provides a raw RGB image to the agent. The agent processes this image and selects, from a fixed set of possible actions (wheel speed commands), the one it finds most suitable for controlling the vehicle.
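A minimal sketch of the action-selection step, assuming a hypothetical set of three discrete wheel-speed actions (the article does not list the actual ones). It uses the epsilon-greedy rule that is standard for DQN training; with epsilon set to 0, the agent simply picks the action with the highest predicted Q-value, i.e. the action it finds most suitable.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical discrete action set: (left, right) wheel speed commands,
# normalized to [0, 1]. The article does not list the actual actions.
ACTIONS: List[Tuple[float, float]] = [
    (0.4, 1.0),  # steer left
    (1.0, 1.0),  # drive straight
    (1.0, 0.4),  # steer right
]


def select_action(q_values: Sequence[float], epsilon: float,
                  rng: random.Random) -> int:
    """Epsilon-greedy choice of an action index from the network's Q-values.

    With probability epsilon a random action is taken (exploration during
    training); otherwise the action with the highest Q-value is chosen.
    At inference time epsilon is typically 0, i.e. purely greedy.
    """
    if len(q_values) != len(ACTIONS):
        raise ValueError("expected one Q-value per action")
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda i: q_values[i])


rng = random.Random(0)
idx = select_action([0.1, 0.7, 0.3], epsilon=0.0, rng=rng)
print(idx, ACTIONS[idx])  # 1 (1.0, 1.0)
```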
To enable successful transfer from the simulator to the real world, domain randomization⁴ is used during training: for each episode, the values of several simulator parameters, both physical and visual, are chosen randomly. Because the agent is trained across many randomized variants of the environment, it is expected to generalize to the real world as well, treating it as just another variant.
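Per-episode domain randomization can be sketched as drawing a fresh parameter set before every episode. The parameter names and ranges below are hypothetical placeholders; the article only states that physical and visual parameters are randomized.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    """Hypothetical set of randomized simulator parameters.

    The article only says that physical and visual parameters are randomized;
    the names and ranges below are illustrative assumptions.
    """
    light_intensity: float   # visual: global brightness multiplier
    color_shift: float       # visual: max per-channel color perturbation
    camera_fov_deg: float    # visual: camera field of view
    wheel_gain: float        # physical: scales commanded wheel speeds
    action_delay_steps: int  # physical: control latency, in time steps


def sample_episode_params(rng: random.Random) -> EpisodeParams:
    """Draw a fresh parameter set; called once at the start of each episode."""
    return EpisodeParams(
        light_intensity=rng.uniform(0.6, 1.4),
        color_shift=rng.uniform(0.0, 0.2),
        camera_fov_deg=rng.uniform(110.0, 130.0),
        wheel_gain=rng.uniform(0.8, 1.2),
        action_delay_steps=rng.randint(0, 2),
    )


rng = random.Random(42)
for episode in range(3):
    params = sample_episode_params(rng)
    # A real training loop would now reset the simulator with `params`
    # and run one training episode in that randomized world.
    print(episode, params)
```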
The images go through four preprocessing steps. First, they are rescaled to a smaller size to speed up training and inference. Next, the parts of the image that contain no useful information for lane following are cropped away, and the image is normalized. Finally, consecutive images are combined into a sequence, which describes the state of the agent more accurately than a single image, since it also captures how the vehicle is moving.
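The four preprocessing steps can be sketched with NumPy as follows. The downscaling factor, the crop fraction (dropping the top third of the image), the stack size of three frames, and the use of simple block averaging for rescaling are all illustrative assumptions rather than the article's exact settings.

```python
from collections import deque

import numpy as np


def downscale(frame: np.ndarray, factor: int) -> np.ndarray:
    """Shrink an (H, W, C) image by an integer factor using block averaging."""
    h, w, c = frame.shape
    h2, w2 = h // factor, w // factor
    frame = frame[: h2 * factor, : w2 * factor].astype(np.float32)
    return frame.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))


def crop_top(frame: np.ndarray, fraction: float) -> np.ndarray:
    """Drop the top `fraction` of rows (e.g. background above the road)."""
    cut = int(round(frame.shape[0] * fraction))
    return frame[cut:]


def normalize(frame: np.ndarray) -> np.ndarray:
    """Map pixel values from [0, 255] to [0, 1]."""
    return (frame / 255.0).astype(np.float32)


def preprocess(raw: np.ndarray, factor: int = 4,
               crop_fraction: float = 1 / 3) -> np.ndarray:
    """Steps 1-3: rescale, crop and normalize a single raw RGB frame."""
    return normalize(crop_top(downscale(raw, factor), crop_fraction))


class FrameStack:
    """Step 4: keep the last k preprocessed frames as the agent's state."""

    def __init__(self, k: int = 3):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame: np.ndarray) -> np.ndarray:
        # At episode start, fill the buffer by repeating the first frame.
        self.frames.clear()
        self.frames.extend([frame] * self.k)
        return self.state()

    def push(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(frame)  # the oldest frame is dropped automatically
        return self.state()

    def state(self) -> np.ndarray:
        # Concatenate along the channel axis: (H, W, C * k).
        return np.concatenate(list(self.frames), axis=-1)


raw = np.random.default_rng(0).integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
stack = FrameStack(k=3)
state = stack.reset(preprocess(raw))
print(state.shape)  # (80, 160, 9)
```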
The agent is trained using the Deep Q-Networks (DQN)⁵ algorithm, with a convolutional neural network processing the images. To keep training and inference fast, the network is deliberately simple. For training, a reward function is defined that describes how accurately the vehicle follows the lane. It is calculated from several metrics, e.g. the angle of the vehicle relative to the lane and its distance from the center of the lane. When the vehicle follows the lane precisely, it earns high rewards; when it drifts away from the optimal curve, it receives smaller rewards; and when it leaves the track, it receives a large penalty. The reward is the only feedback the agent receives from the environment, and the agent learns its control policy from it alone.
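The sketch below shows one possible shape for such a reward, combining the heading angle and the distance from the lane center, together with the one-step target toward which DQN regresses its Q-values. The reward formula, its constants (`max_dist`, `off_track_penalty`), and the discount factor are hypothetical; only the general structure (high reward for precise following, lower reward for drifting, a large penalty for leaving the track) comes from the text.

```python
import math
from typing import Sequence


def lane_reward(dist_from_center: float, heading_error: float, on_track: bool,
                max_dist: float = 0.1, off_track_penalty: float = -10.0) -> float:
    """Hypothetical lane-following reward (the article's exact formula is not given).

    dist_from_center: signed lateral offset from the lane center, in meters.
    heading_error: angle between the vehicle heading and the lane direction, in radians.
    """
    if not on_track:
        return off_track_penalty              # leaving the track: large penalty
    alignment = math.cos(heading_error)       # 1.0 when parallel to the lane
    centering = 1.0 - min(abs(dist_from_center) / max_dist, 1.0)  # 1.0 at the center
    return alignment * centering              # highest when centered and aligned


def dqn_target(reward: float, next_q_values: Sequence[float], done: bool,
               gamma: float = 0.99) -> float:
    """One-step DQN target: r + gamma * max_a' Q_target(s', a'); just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


print(lane_reward(0.0, 0.0, True))    # 1.0: centered and aligned
print(lane_reward(0.05, 0.0, True))   # 0.5: halfway to max_dist
print(lane_reward(0.0, 0.0, False))   # -10.0: off the track
```

In DQN, `next_q_values` would come from a periodically updated copy of the network (the target network), and the online network is trained to reduce the squared difference between its prediction and this target.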
Finally, the actions are post-processed during inference. Using discrete actions results in jerky vehicle movement, so the agent's predictions are transformed into a continuous action space. This yields much smoother vehicle control in both the simulated and the real-world environment.
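The article does not specify how the discrete predictions are mapped to continuous commands. One plausible approach, shown here purely as an assumption, is to average the discrete wheel-speed actions weighted by a softmax over the predicted Q-values; the action set and the `temperature` parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical (left, right) wheel-speed actions of a discrete DQN agent.
ACTIONS = [(0.4, 1.0), (1.0, 1.0), (1.0, 0.4)]


def to_continuous(q_values: Sequence[float],
                  temperature: float = 1.0) -> Tuple[float, float]:
    """Blend the discrete actions into one continuous wheel command.

    Each action is weighted by the softmax of its Q-value, so a confident
    prediction stays close to that action, while uncertain predictions are
    interpolated between neighboring actions.
    """
    m = max(q_values)  # subtract the max for a numerically stable softmax
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    left = sum(w * a[0] for w, a in zip(weights, ACTIONS)) / total
    right = sum(w * a[1] for w, a in zip(weights, ACTIONS)) / total
    return left, right


print(to_continuous([0.2, 0.9, 0.5]))  # gentle right turn: left wheel faster than right
```

A lower temperature keeps the output closer to the single best action; a higher one smooths it more.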
The performance of the agent was tested in both the simulated and the real-world environment. In each test case, the vehicle was started from a randomly chosen position on the track. A test case was considered successful if the vehicle drove at least one complete lap without leaving the right lane: by then it has passed through every part of the track, so it can be expected to continue for further laps as well. These tests were carried out on four simulated maps and one real-world map. The results are summarized in the following table; the agent was successful in most cases, both in the simulator and in the real world.
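The success criterion can be written down directly. In this sketch the record type and its field names are hypothetical, and the success rate is simply the fraction of successful runs.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EvalRun:
    """Outcome of one test episode (field names are hypothetical)."""
    laps_completed: float  # distance driven, measured in track lengths
    lane_departure: bool   # True if the vehicle ever left the right lane


def is_successful(run: EvalRun) -> bool:
    """Success criterion from the text: at least one full lap in the right lane."""
    return run.laps_completed >= 1.0 and not run.lane_departure


def success_rate(runs: Iterable[EvalRun]) -> float:
    """Fraction of runs that meet the success criterion."""
    runs = list(runs)
    if not runs:
        raise ValueError("no test runs")
    return sum(is_successful(r) for r in runs) / len(runs)
```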
Furthermore, I designed extreme test cases to evaluate the robustness of the method; the agent was not explicitly trained to handle these scenarios. The first was night-mode driving, in which the camera input differs significantly from the day-mode scenario. The tests showed that the agent could drive the vehicle in night mode in most cases.
The method was also tested with invalid starting positions, for example, with the vehicle placed in the oncoming lane or perpendicular to the road. In these cases, the vehicle was able to recover to the right lane and continue lane following there.
The vehicle in action can be seen in the following video.
If you are interested in reading more about this work, a paper will be published soon, so stay tuned for more details!
¹ Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., … & Hassabis, D. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354–359.
² Vinyals, O., Babuschkin, I., Chung, J., Mathieu, M., Jaderberg, M., Czarnecki, W. M., … & Silver, D. (2019). AlphaStar: Mastering the real-time strategy game StarCraft II. DeepMind blog, 2.
³ Paull, L., Tani, J., Ahn, H., Alonso-Mora, J., Carlone, L., Cap, M., … & Censi, A. (2017, May). Duckietown: An open, inexpensive and flexible platform for autonomy education and research. In 2017 IEEE International Conference on Robotics and Automation (ICRA) (pp. 1497–1504). IEEE.
⁴ Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., & Abbeel, P. (2017, September). Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 23–30). IEEE.
⁵ Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., … & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.