PIA project’s achievement at NeurIPS AIDO6
Authors: Robert Moni, András Kalapos, András Béres, Bence Hámori, Dávid Bárdos, Tibor Áron Tóth, Bálint Gyires-Tóth
In December 2021 our team competed with 5 different solutions at the 6th edition of the AI Driving Olympics (AIDO), which was part of the 35th Conference on Neural Information Processing Systems (NeurIPS). In total, 40 competitors took part in three different challenges.
Our team ranked first in 2 of the 3 challenges.
The Professional Intelligence for Automotive (PIA) Project was founded in 2019 as a cooperation project between Continental Deep Learning Competence Center Budapest and the Budapest University of Technology and Economics. The project aims to provide academic and industrial expertise for students’ research projects in the field of Deep Learning and Autonomous Driving.
The challenges at the 6th Duckietown AI Driving Olympics @ NeurIPS 2021
1. Lane following [LF]
2. Lane following + Vehicles [LFV]
3. Lane following + Intersections [LFI]
Top winning entries
Top submissions in the real-world evaluation [Source]; PIA team members are highlighted in the blue boxes.
Our winning solutions
1. András Kalapos
My solution at the 6th AI Driving Olympics uses a neural network-based control policy trained with reinforcement learning. Its ‘brain’ is a convolutional neural network that computes control signals almost directly from the robot's camera images. Only very simple preprocessing is applied to the observations, such as downscaling, cropping and stacking. From this input, the network computes a single scalar output that is interpreted as a steering signal.
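The preprocessing described above can be sketched in a few lines; the crop offset and target resolution below are assumed values for illustration, not the actual pipeline's parameters:

```python
import numpy as np

def preprocess(frame, crop_top=24, size=(84, 84)):
    """Crop the top of the image (mostly sky), downscale with
    nearest-neighbour sampling, and convert to grayscale in [0, 1].
    Crop and target size are illustrative, not the real values."""
    frame = frame[crop_top:]                      # drop the top rows
    h, w = frame.shape[:2]
    rows = np.arange(size[0]) * h // size[0]      # nearest-neighbour row indices
    cols = np.arange(size[1]) * w // size[1]      # nearest-neighbour column indices
    small = frame[rows][:, cols]                  # downscaled RGB image
    return small.mean(axis=-1) / 255.0            # grayscale, scaled to [0, 1]

obs = preprocess(np.full((120, 160, 3), 128, dtype=np.uint8))
```

Several such frames are then stacked along the channel dimension before being fed to the network, so the policy also sees recent motion.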
An important feature of my solution is that I trained the agent only in simulation, while also testing it in the real world. I trained it using a policy gradient type reinforcement learning algorithm, namely Proximal Policy Optimization, for its stability, sample complexity, and ability to take advantage of multiple parallel workers. To achieve robust performance in the physical environment I used domain randomization. This involves training the policy in a set of different variants of the simulation, generated by randomly perturbing its parameters, such as lighting conditions, object textures, camera parameters and so on. The built-in randomization features of Duckietown’s official simulation proved to be sufficient for reliable lane following on the real roads of Duckietown, despite this simulation’s lack of realistic graphics and physical accuracy.
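Conceptually, domain randomization can be implemented as a thin wrapper that re-samples simulator parameters on every reset. The sketch below uses made-up attribute names and ranges, not the Duckietown simulator's real API:

```python
import random

class DomainRandomizationWrapper:
    """On every reset, re-sample simulator parameters so the policy
    never overfits to one visual/physical configuration. Attribute
    names and ranges are illustrative, not the real simulator's API."""
    def __init__(self, env, seed=None):
        self.env = env
        self.rng = random.Random(seed)

    def reset(self):
        self.env.light_intensity = self.rng.uniform(0.5, 1.5)  # lighting conditions
        self.env.camera_fov_deg = self.rng.uniform(70, 110)    # camera parameters
        self.env.road_texture = self.rng.randrange(5)          # object textures
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)

class _ToySim:                           # stand-in for a real simulator
    def reset(self): return "obs"
    def step(self, a): return "obs", 0.0, False, {}

env = DomainRandomizationWrapper(_ToySim(), seed=0)
obs = env.reset()
```

Because each episode starts in a slightly different world, the policy is forced to rely on features that survive the perturbations, which is what transfers to the real robot.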
If you are interested in the details of our work, we published two papers about it: Sim-to-real reinforcement learning applied to end-to-end vehicle control and Vision-based reinforcement learning for lane-tracking control.
Our source code is available on GitHub: github.com/kaland313/Duckietown-RL
2. András Béres
In the previous year and during my MSc thesis, I worked on algorithms combining supervised/unsupervised representation learning with reinforcement learning. This year, I experimented with a different class of algorithms based on imitation learning. I built a teacher agent using full-state feedback control, which was able to drive seamlessly using the physical state of the simulator (speed, angular velocity, track angle, distance and curvature). I then used this teacher to train a student model with supervised learning to imitate its steering behaviour using only stacked input images from the simulator. Because the agent itself was driving during the learning process, as prescribed by the DAgger imitation learning algorithm, it could also learn how to act in suboptimal situations. I used a continuous action space along with the PyTorch framework. For the finals, I also managed to train agents capable of overtaking other agents under the right circumstances using this method.
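The DAgger loop can be sketched in a few lines. Everything here — the toy one-dimensional lane, the linear teacher, the stand-in `fit` function — is an illustrative stub, not the competition code:

```python
import random

def dagger(teacher, student, fit, env, rounds=3, steps=50):
    """Minimal DAgger sketch: roll out a decaying mixture of teacher and
    student, label every visited state with the teacher's action, then
    retrain the student on the aggregated dataset."""
    data = []
    policy = student
    for i in range(rounds):
        beta = 0.5 ** i                     # probability of executing the teacher
        obs = env.reset()
        for _ in range(steps):
            label = teacher(obs)
            data.append((obs, label))       # teacher labels every visited state
            act = label if random.random() < beta else policy(obs)
            obs, done = env.step(act)
            if done:
                obs = env.reset()
        policy = fit(data)                  # supervised regression on all data
    return policy, data

class ToyLane:
    """State = lateral offset; episode ends if the car drifts too far."""
    def reset(self): self.x = 1.0; return self.x
    def step(self, a):
        self.x += a + random.uniform(-0.05, 0.05)
        return self.x, abs(self.x) > 2.0

teacher = lambda x: -0.2 * x                       # steer back toward the center
fit = lambda data: (lambda x: -0.2 * x)            # pretend regression recovers it
policy, data = dagger(teacher, lambda x: 0.0, fit, ToyLane())
```

The key property, as the text notes, is that the student collects data under its own (imperfect) driving, so the dataset covers the suboptimal states it will actually encounter.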
3. Bence Hámori
The agent I submitted was trained with Proximal Policy Optimization. For the training, I used frame stacking and domain randomisation.
With frame stacking, the agent evaluates not just a single picture but a sequence of pictures: the algorithm receives a stack of recent frames as input, which lets it recognise more accurately the correct action to take in a given situation.
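A frame stacker is typically a small buffer that keeps the last few observations and presents them as a single input; this is a minimal sketch of the idea, not the submitted agent's code:

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keep the last k observations and expose them as one stacked
    input, giving the policy short-term motion information."""
    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, obs):
        for _ in range(self.k):          # pad the buffer with the first frame
            self.frames.append(obs)
        return np.stack(self.frames)

    def push(self, obs):                 # call once per environment step
        self.frames.append(obs)
        return np.stack(self.frames)

fs = FrameStack(k=3)
first = fs.reset(np.ones((2, 2)))
second = fs.push(np.zeros((2, 2)))
```

From a stack like this the network can infer velocity-like cues (how fast the lane markings move between frames), which a single image cannot provide.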
Although with domain randomisation the agent collects less reward during training in the simulator, it performs better in real-life conditions.
My submitted agent achieved 3rd place in the real LFV validation challenge.
4. Dávid Bárdos
I used our Proximal Policy Optimization-based baseline implementation and attempted to speed up the training and evaluation process with mixed-precision training, which uses 16-bit floating-point numbers for part of the computations instead of the traditional 32-bit format. I submitted my result to the AI Driving Olympics 6 Duckietown challenge and achieved 4th place in the Lane Following with Vehicles section.
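The numeric trade-off behind mixed precision — and why implementations scale the loss before backpropagation — can be seen in a tiny NumPy experiment. The magnitudes here are illustrative and unrelated to the actual training run:

```python
import numpy as np

# Tiny gradients below fp16's subnormal range round to zero, which is why
# mixed-precision training multiplies the loss by a scale factor before
# backprop and divides the gradients by it afterwards (in fp32).
grad = np.float32(1e-8)

naive = np.float16(grad)                 # direct cast: underflows to 0.0
scale = np.float32(1024.0)
scaled = np.float16(grad * scale)        # representable once scaled up
recovered = np.float32(scaled) / scale   # unscale back in fp32
```

In frameworks such as PyTorch this loss-scaling bookkeeping is automated; the point of the sketch is only that without scaling, small gradient values are silently lost in 16-bit storage.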
5. Tibor Áron Tóth
I implemented a Transformer model inside a Proximal Policy Optimization-based deep reinforcement learning algorithm and competed with this solution at the AI-DO 6 Duckietown challenges. My purpose was to train the agent in the Duckietown simulator not only for speed and accuracy but for robustness too. This robustness was achievable because the attention mechanism in the Transformer takes previous states of the model into consideration. I used hyperparameter optimisation and hand-tuning to find a suitable hyperparameter set for this quite new algorithm.
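The attention mechanism referred to above weighs a sequence of past state embeddings against each other; a minimal scaled dot-product self-attention in NumPy (identity projections for brevity, not the competition model) looks like this:

```python
import numpy as np

def self_attention(states):
    """Scaled dot-product self-attention over a [T, d] sequence of state
    embeddings: each output row is a similarity-weighted mixture of all
    states, which is how the policy can attend to previous timesteps."""
    T, d = states.shape
    q = k = v = states                               # identity projections
    scores = q @ k.T / np.sqrt(d)                    # [T, T] pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ v                               # mix states by attention weight

out = self_attention(np.random.default_rng(0).normal(size=(5, 8)))
```

A real Transformer adds learned query/key/value projections, multiple heads, and feed-forward layers on top, but the weighting of previous states happens exactly in this softmax step.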