Controlling Self-Driving Robots with Deep Reinforcement Learning
Author: Péter Almási
Deep Reinforcement Learning (DRL) has recently been used to solve a variety of challenges, such as complex board and computer games. However, solving real-world robotics tasks with DRL remains considerably harder. The desired approach is to train the agent in a simulator and transfer it to the real world; still, models trained in a simulator tend to perform poorly in real-world environments due to the differences between the two. In this work, I present a DRL-based algorithm for autonomous robot control using Deep Q-Networks (DQN). In my approach, the agent is trained in a simulated environment and can navigate in both the simulated and the real-world environment. The method is evaluated in the Duckietown environment, where the agent has to follow the lane based on monocular camera input. The trained agent can run on limited hardware resources.
Deep learning plays an important role in the automotive industry. The latest advances and developments have made it possible to analyze and understand the images from cars' cameras. Using deep neural networks, certain objects (e.g. cars, pedestrians, cyclists, or traffic signs) can be detected and localized in the images, which is an important milestone in the development of self-driving vehicles. However, the problem of autonomous navigation is still far from completely solved, and it is under very active research today.
Duckietown is an educational and research platform where low-cost robots (“Duckiebots”) travel in a small city (“Duckietown”). Duckiebots are small three-wheeled robots whose only sensor is a forward-facing wide-angle monocular camera; the robot has to be controlled by processing and analyzing the camera images. Duckietowns are cities consisting of roads, intersections, traffic signs, houses, ducks, and other obstacles in which the robots have to operate. The most essential challenge on this platform is lane following, where the robot has to drive around the map in its own lane.
The Duckietown software library contains a simulator, which provides an environment similar to the real world. Training agents for real-world problems in a simulator is a promising approach, as it is much safer to simulate incidents that must be avoided in the real world (e.g. collisions). Also, with sufficient GPU resources, agents can be trained at a much faster pace than real time. Collecting sufficient training data is also much more convenient within a simulator. The simulator can also provide additional metrics (e.g. the accurate location of objects and the distance between them) that may be difficult to measure in the real world but help to evaluate the performance of the agent. However, simulators often differ significantly from the real world, and these differences (e.g. in detail, colours, lighting conditions, or dynamics) can cause trained models to suffer significant performance degradation in the real world. Training models in the simulator that perform similarly well in the real world is therefore a challenging task.
In my method, I chose the model-free, off-policy Deep Q-Networks (DQN) algorithm to train a neural network to control the robot. DQN is one of the most well-known reinforcement learning algorithms, and it can learn an optimal policy in the simulated traffic environment that performs nearly as well in the real environment.
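As a rough illustration of the update at the heart of DQN, the network is regressed toward the temporal-difference target r + γ·max Q(s′, ·) for each transition. The sketch below uses my own variable names and a discount factor of my choosing, not values from the paper:

```python
import numpy as np

def dqn_target(reward, q_next, gamma=0.99, done=False):
    """Temporal-difference target used to regress Q(s, a).

    reward : reward received after taking the action
    q_next : Q-values the network predicts for the next state
    gamma  : discount factor (0.99 is a common choice, assumed here)
    done   : True if the episode terminated (no bootstrapping)
    """
    if done:
        return float(reward)
    return float(reward + gamma * np.max(q_next))

# Example with three actions (turn left, go straight, turn right)
target = dqn_target(reward=0.5, q_next=np.array([0.2, 1.0, 0.4]), gamma=0.9)
print(target)  # 0.5 + 0.9 * 1.0 = 1.4
```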
The complete pipeline works as follows. First, the camera images go through several preprocessing steps: resizing to a smaller resolution (60 × 80) for faster processing; cropping the upper part of the image, which contains no useful information for navigation; segmenting the important parts of the image based on their colour (the lane markings); and normalizing the image. Next, a sequence is formed from the last five camera frames, which serves as the input of the Convolutional Neural Network (CNN) policy (the agent). The agent is trained in the simulator with the DQN algorithm, using a reward function that describes how accurately the robot follows the optimal curve. The output of the network is mapped to wheel speed commands.
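The preprocessing steps above can be sketched roughly as follows. The raw camera resolution, the crop offset, and the colour threshold below are my own assumptions for illustration; only the overall pipeline (resize, crop, segment, normalize, stack) follows the text:

```python
import numpy as np

def preprocess(frame):
    """Reduce a raw camera frame to the network's input format.

    frame : (480, 640, 3) uint8 RGB image (resolution assumed).
    Returns a (40, 80, 3) float32 image in [0, 1].
    """
    small = frame[::8, ::8]           # naive downsample to 60 x 80
    cropped = small[20:]              # drop the top rows (offset assumed)
    # crude colour segmentation: keep bright pixels (lane markings), zero the rest
    mask = cropped.max(axis=-1, keepdims=True) > 150   # threshold assumed
    segmented = np.where(mask, cropped, 0)
    return segmented.astype(np.float32) / 255.0

def stack_frames(frames):
    """Stack the last five preprocessed frames along the channel axis."""
    return np.concatenate(frames[-5:], axis=-1)

raw = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
obs = stack_frames([preprocess(raw)] * 5)
print(obs.shape)  # (40, 80, 15)
```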
I trained a convolutional neural network on the preprocessed images. The network was designed so that inference can be performed in real time on a computer with limited resources (i.e. without a dedicated GPU). The input of the network is a tensor with the shape of (40, 80, 15), obtained by stacking five RGB images along the channel dimension. The network consists of three convolutional layers, each followed by ReLU (nonlinearity) and MaxPool (dimension reduction) operations. The convolutional layers use 32, 32, and 64 filters of size 3 × 3; the MaxPool layers use 2 × 2 filters. The convolutional layers are followed by fully connected layers with 128 and 3 outputs. The output of the last layer corresponds to the selected action. The output of the neural network (one of the three actions) is mapped to wheel speed commands; the actions correspond to turning left, turning right, or going straight, respectively.
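To check that this architecture fits the stated input, one can trace the feature-map sizes layer by layer. The sketch below assumes "valid" (unpadded) convolutions and non-overlapping pooling, which the text does not specify:

```python
def conv_out(h, w, k=3):
    # 'valid' convolution with a k x k kernel (padding assumed to be zero)
    return h - k + 1, w - k + 1

def pool_out(h, w, k=2):
    # non-overlapping k x k max pooling
    return h // k, w // k

h, w = 40, 80                   # input: (40, 80, 15) stacked frames
for filters in (32, 32, 64):    # three blocks of conv -> ReLU -> MaxPool
    h, w = conv_out(h, w)
    h, w = pool_out(h, w)
    print(f"{filters} filters -> {h} x {w}")

flat = h * w * 64               # features flattened into the 128-unit FC layer
print(flat)                     # 1536, then FC(128) -> FC(3): one Q-value per action
```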
Evaluation and results
My primary goal was to train an agent in the simulator that can navigate the robot along the track both in the simulator and in the real world. I tested my method on several maps, different from the one used during training, to rule out overfitting to a single map. I trained on a larger, more complicated map so that the network could learn diverse turns and road situations. I tested the method by placing the robot at 50 randomly selected positions on each map and checking how many times it could drive at least one complete lap. I assumed that if the agent can drive a whole lap, it has successfully passed all parts of the track and would be able to do so in subsequent laps as well. The following table summarizes the results of my tests.
The following figure shows the paths taken by the robot on one of the simulated maps and on (a part of) the real-world map.
This work will also be presented at the IEEE World Congress on Computational Intelligence 2020 conference. Our paper, written together with my advisors, Bálint Gyires-Tóth and Róbert Moni, and titled “Robust Reinforcement Learning-based Autonomous Driving Agent for Simulation and Real World”, has been accepted for the conference.