Accelerated Development of Learned Agents for Urban Autonomous Driving

Driving in an urban environment requires robust decision-making in complex situations. We often find ourselves turning at an intersection, yielding to a pedestrian, while allowing oncoming traffic to pass through. Sometimes we have to change lanes into a tight spot knowing that the vehicle behind us is going to slow down. At other times, we may have to bend the rules and nudge into an oncoming lane to pass around a bicyclist. When is it okay to proceed? Where should we merge? How far is safe enough? These are questions that we humans tend to address effortlessly and even subconsciously while driving.

This form of reasoning, often called Behavior Planning, is one of the most critical capabilities that we need to develop in our autonomous vehicle software in order to navigate safely and efficiently. It is also important to approach this in a scalable way in order to tackle the long tail of corner case events that arise in urban environments. This article gives a glimpse into how we are leveraging manual driving data from shared mobility fleets to accelerate the development of agents that learn to exhibit appropriate behaviors for a variety of driving scenarios.

Need for Learned Agents in Behavior Planning

Engineering a rule-based or algorithmic behavior planner that accounts for every possible driving situation can easily become a daunting task. Consider the following:

  • You have to account for dynamic constraints, traffic regulations and social conventions.
  • Tuning algorithm parameters so that they work well across driving conditions can be difficult.
  • When you want to start driving in a new country or city, the regulations or conventions may be different, requiring significant re-engineering.
  • The biggest challenge is perhaps interaction-aware planning, where you need to predict the behavior of other actors based on your own future actions.

Capturing all of this in a predefined computational model of the environment can be hard, let alone planning with such a model.

Decision-making flowchart
Decision-making for autonomous driving can become a complex beast!

Learned agents can avoid these impediments by automatically extracting the representations and relationships needed to solve the task at hand, without burdening us with defining every single detail. Another advantage of formulating this as a learning problem is that the capabilities of the agent are no longer limited by a predetermined set of design “features”. This approach helps keep us engineers from subconsciously building in biases rooted in our own idea of how one should drive. The system is allowed to extract effective driving policies through experience, potentially producing better results than we could develop manually.

One way to provide the experiences needed for learning to drive is to record human drivers in different scenarios – the state of the environment around them and the actions they take. These are indirect experiences that enable supervised approaches like Imitation Learning or Behavioral Cloning. While these provide a great way to bootstrap a driving policy, they are limited to the experiences that were originally recorded. The driving agent trained in this manner essentially learns to imitate human drivers, and may not know the best course of action to take when faced with a previously unseen situation.

Chimp imitating Jane Goodall
Animals learn new behaviors by imitation [source]
Small dog being trained to ride a skateboard with a treat
Positive reinforcement is often used for training dogs [source]

A less restrictive approach would be to use Reinforcement Learning (RL), where we provide the driving agent with an appropriate environment, and allow it to learn from direct experience while interacting with it. Here the role of the engineer is primarily to design the environment – the observations sent to the agent, the space of actions available to it, and an appropriate reward mechanism – in a way that enables efficient learning. The initial development effort required in this approach is high, and you may have to go through several iterations of environment and agent design before you begin to see clear signs of learning. But once you have it all set up, learning agents tend to keep improving over time. A new geographical region or driving environment with different rules or conventions may require some fine-tuning or retraining, but you are spared costly re-engineering.
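To make the three design elements (observations, actions, reward) concrete, here is a minimal sketch of an environment for a toy car-following task. Everything in it is an illustrative assumption: the class name FollowEnv, the observation layout, the three discrete actions and the reward weights are hypothetical, and this is not the environment we use in practice.

```python
import random
from dataclasses import dataclass

# Hypothetical discrete action space: ego acceleration command in m/s^2.
ACTIONS = {0: -2.0, 1: 0.0, 2: 1.0}  # brake, coast, accelerate


@dataclass
class FollowEnv:
    """Toy 1-D environment: the ego vehicle follows a lead car on a straight lane."""
    dt: float = 0.5           # simulation step (s)
    lead_speed: float = 8.0   # constant speed of the lead car (m/s)
    max_steps: int = 40       # episode length limit

    def reset(self, seed=None):
        rng = random.Random(seed)
        self.gap = rng.uniform(15.0, 30.0)    # distance to the lead car (m)
        self.speed = rng.uniform(5.0, 10.0)   # ego speed (m/s)
        self.steps = 0
        return self._observe()

    def _observe(self):
        # Observation: gap, ego speed, and speed of the lead car relative to ego.
        return (self.gap, self.speed, self.lead_speed - self.speed)

    def step(self, action):
        accel = ACTIONS[action]
        self.speed = max(0.0, self.speed + accel * self.dt)
        self.gap += (self.lead_speed - self.speed) * self.dt
        self.steps += 1

        collided = self.gap <= 0.0
        # Reward: progress made, minus a penalty for tailgating, minus a
        # large penalty for a collision. The weights are arbitrary.
        reward = 0.1 * self.speed * self.dt
        if self.gap < 5.0:
            reward -= 1.0
        if collided:
            reward -= 100.0
        done = collided or self.steps >= self.max_steps
        return self._observe(), reward, done
```

An agent would call reset() once per episode and then step() with an action index until done is true. Most of the iteration mentioned above happens in _observe() and in the reward terms, since those decide what the agent can perceive and what it is encouraged to do.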

Challenges in Training Agents for Real-World Decision-Making

We have established the need for learned approaches, but how do we make sure they are effective? The challenges we must contend with include:

  • Garbage-in / garbage-out: Train your agents in hand-coded scenarios, and they may not be able to adapt to emergent real-world situations.
  • Resimulation is not enough: Reproducing recorded trajectories does not provide enough variation for training.
  • Need better traffic actor models: Other actors must exhibit realistic behavior, including aggression, mistakes, violations, etc.
  • Difference between simulation and real world: This makes agents trained in simulation difficult to deploy onto physical vehicles.

Needless to say, if you want to train an RL agent to behave well in the real world, you have to expose it to realistic experiences. Using a simplified simulation environment with ideal actors (other cars and pedestrians) will result in learned agents that are not able to handle real-world complexity.

Simulated traffic at intersection
Homogeneous, orderly traffic in simulation [source]
Congested real-world traffic
Diverse, chaotic actors in real world [source]

Resimulation is a common approach that utilizes real-world data to make simulations more realistic. Here recorded scenarios consisting of actor trajectories are replayed as-is in simulation, potentially with some variation in speed, timing, position, etc. That may be appropriate for the testing and validation of an existing autonomous driving system, but is inadequate for providing the volume and diversity of scenarios needed for training an agent from scratch.
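A small sketch may help show why this kind of variation is limited. It assumes a hypothetical trajectory format of (time, x, y) samples and arbitrary perturbation ranges; the function names are ours for illustration only.

```python
import random
from typing import List, Tuple

# A recorded trajectory as (time_s, x_m, y_m) samples. Hypothetical format.
Trajectory = List[Tuple[float, float, float]]


def perturb(traj: Trajectory, speed_scale: float = 1.0,
            time_shift: float = 0.0, y_shift: float = 0.0) -> Trajectory:
    """Replay the same path faster or slower, earlier or later, shifted in y.

    speed_scale > 1 compresses the timestamps, so the actor covers the same
    path in less time. The shape of the path itself never changes.
    """
    if speed_scale <= 0:
        raise ValueError("speed_scale must be positive")
    t0 = traj[0][0]
    return [(t0 + (t - t0) / speed_scale + time_shift, x, y + y_shift)
            for t, x, y in traj]


def sample_variations(traj: Trajectory, n: int, seed: int = 0) -> List[Trajectory]:
    """Draw n perturbed replays; the ranges are arbitrary illustrative values."""
    rng = random.Random(seed)
    return [perturb(traj,
                    speed_scale=rng.uniform(0.8, 1.2),
                    time_shift=rng.uniform(-1.0, 1.0),
                    y_shift=rng.uniform(-0.3, 0.3))
            for _ in range(n)]
```

However many variations are sampled, every actor still follows its recorded path and never responds to what the ego vehicle does, which is why this works better for validation than for training.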

One could use a real-world scenario to define only the initial conditions of a simulation, and let the scenario play out. But for that to be realistic, you need actors that react to each other as well as the ego vehicle in natural ways. Canned or pre-scripted actors tend to produce behaviors that are too robotic, and may lead our agents into learning policies that fail in the real world.

An additional challenge we have to deal with is that real-world sensors and actuators are very difficult to model in simulation, making trained agents hard to transfer onto physical vehicles. Although certain areas like camera rendering are inching closer to reality, it can be incredibly difficult to reproduce the same kinds of patterns, artifacts and noise that different sensors exhibit, including LiDARs and RADARs. Even if we are able to model sensors accurately, do we really want to commit to them? If you have to later swap sensors (or even their positioning), that may change the input data distribution significantly enough to require retraining from scratch!

Recreating Realistic Driving Environments for Efficient Learning

We are adopting the following core initiatives to address these challenges in developing learned agents for Behavior Planning:

  • Generate a comprehensive library of driving scenarios that reflect real-world diversity
  • Develop data-driven traffic models to reproduce human-like actor behavior
  • Design modular RL agents that ease the transition from simulation to real vehicles
  • Accelerate all of this using naturalistic driving data from shared mobility fleets

The process of generating a library of scenarios begins with scenario modeling. It is our take on quantifying the space and distribution of driving scenarios, which involves analyzing driving data to find relevant events, extracting relevant logical scenarios, organizing them in a taxonomy defined by various factors, and reproducing them in simulation.

Scenario modeling pipeline
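As a rough illustration of the "organize them in a taxonomy" step, the sketch below buckets extracted events by a few factors and estimates how often each bucket occurs. The Event fields, the choice of factors and the time-to-collision threshold are assumptions made for this example, not the actual taxonomy.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event record extracted from a drive log."""
    road_type: str      # "intersection", "straight", ...
    ego_maneuver: str   # "left_turn", "lane_change", ...
    other_actor: str    # "oncoming_car", "pedestrian", "none", ...
    min_ttc_s: float    # smallest time-to-collision observed (s)


def taxonomy_key(event: Event, ttc_threshold_s: float = 2.0) -> Tuple[str, ...]:
    """Map an event to a taxonomy bucket; the threshold is an arbitrary choice."""
    criticality = "critical" if event.min_ttc_s < ttc_threshold_s else "nominal"
    return (event.road_type, event.ego_maneuver, event.other_actor, criticality)


def scenario_distribution(events: Iterable[Event]) -> Dict[Tuple[str, ...], float]:
    """Relative frequency of each taxonomy bucket among the given events."""
    counts = Counter(taxonomy_key(e) for e in events)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: count / total for key, count in counts.items()}
```

A distribution like this indicates which logical scenarios are common and which are rare, which in turn can guide how often each one is sampled in simulation.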

To further amplify the value derived from each recorded real-world scenario, we adopt a data augmentation approach. It is a common practice in problem domains like object detection, where each labeled input is used to create numerous variations through translation, rotation, scaling, etc. RL agents are often trained in the exact same environment over and over again, with little variation. They get really good at the very specific problem they learn, but aren’t able to generalize well. To avoid this tendency to overfit, we need the equivalent of data augmentation for RL.

Although there have been some recent advances in introducing data augmentation for improving sample efficiency in certain RL domains, producing meaningful variations of driving scenarios isn’t as simple as applying affine transformations, adding noise or fuzzing a few parameters. Instead, we need to understand the topological structure of the road network on which the scenario plays out, and quantify the positions and behaviors of each agent relative to that structure. This allows us to reproduce the same logical scenario, e.g. a left turn at an intersection with an oncoming car, in many different parts of a given map (or even across maps), exposing agents in training to a far wider variety of experiences.

Recorded drive at intersection with oncoming car
Concrete scenario variations at an intersection with oncoming car
Concrete scenario variations generated from a single drive sequence recorded at an intersection
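One way to picture this is to describe each actor relative to the road structure (which approach it is on, how far from the stop line) rather than in world coordinates, and then bind that description to different lanes. The sketch below does this for the left-turn example. The Lane and ActorSpec types and all the numbers are hypothetical, and a real map representation would be far richer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Lane:
    """Hypothetical map element: one approach lane ending at a stop line."""
    lane_id: str
    length: float  # metres from lane start to the stop line


@dataclass(frozen=True)
class ActorSpec:
    """Actor placement relative to road structure, not world coordinates."""
    role: str                 # "ego" or "oncoming"
    approach: str             # "ego_approach" or "opposing_approach"
    dist_to_stop_line: float  # metres before the stop line
    speed: float              # m/s
    maneuver: str             # "left", "straight", ...


# Logical scenario: left turn at an intersection with an oncoming car.
LEFT_TURN = [
    ActorSpec("ego", "ego_approach", 30.0, 8.0, "left"),
    ActorSpec("oncoming", "opposing_approach", 45.0, 12.0, "straight"),
]


def instantiate(scenario: List[ActorSpec], ego_lane: Lane,
                opposing_lane: Lane) -> List[dict]:
    """Bind a logical scenario to one pair of lanes, giving a concrete scenario."""
    lanes: Dict[str, Lane] = {"ego_approach": ego_lane,
                              "opposing_approach": opposing_lane}
    concrete = []
    for actor in scenario:
        lane = lanes[actor.approach]
        # Clamp so the actor still starts on the lane if the approach is short.
        dist = min(actor.dist_to_stop_line, lane.length)
        concrete.append({"role": actor.role, "lane_id": lane.lane_id,
                         "s": lane.length - dist,  # distance from lane start
                         "speed": actor.speed, "maneuver": actor.maneuver})
    return concrete


def variations(scenario: List[ActorSpec],
               opposing_pairs: List[Tuple[Lane, Lane]]) -> List[List[dict]]:
    """One concrete scenario per direction of travel through each lane pair."""
    out = []
    for a, b in opposing_pairs:
        out.append(instantiate(scenario, a, b))
        out.append(instantiate(scenario, b, a))
    return out
```

Given the opposing approach lanes of every suitable intersection in a map, a single recorded left turn yields two concrete scenarios per intersection, before any variation in speeds or distances is applied.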

Actors in these scenarios must react to each other and obey traffic rules, but that is not enough for realism. Humans are aggressive, they take risks, make mistakes, drive imperfectly and even break rules. All of this introduces additional challenges for a self-driving car, and in order to train one in simulation, we must reproduce a similar level of behavioral complexity.

In concert with our work on scenarios, we are trying to address this complexity using traffic modeling. Researchers have tried to model human driving behavior in many ways, including cognitive models like Ulysses, macroscopic simulations that describe traffic flow, and simplified microscopic models like the popular Intelligent Driver Model (IDM). Many of these approaches have been applied successfully in large-scale transportation research, urban road network design and capacity planning. However, they do not capture the complex decision-making process that affects individual driving behavior in urban environments with multiple other actors. We are designing a custom actor model from the ground up to represent and reproduce a wide range of driving behaviors, with parameters that can be tuned based on observed real-world data.

Simple traffic agent at yellow light
Simple traffic agent produces canned behavior
Data-driven agent at yellow light
Data-driven agent exhibits range of human-like behaviors
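For reference, the Intelligent Driver Model mentioned above fits in a few lines, which shows both its appeal and its limits. The sketch below follows the standard published IDM equations, using the common variant that clips the dynamic part of the desired gap at zero. The default parameter values are typical illustrative numbers and are not taken from our data or from our custom actor model.

```python
import math
from dataclasses import dataclass


@dataclass
class IDMParams:
    v0: float = 13.9    # desired speed (m/s), about 50 km/h
    T: float = 1.5      # desired time headway (s)
    s0: float = 2.0     # minimum standstill gap (m)
    a_max: float = 1.0  # maximum acceleration (m/s^2)
    b: float = 1.5      # comfortable deceleration (m/s^2)
    delta: float = 4.0  # acceleration exponent


def idm_acceleration(v: float, gap: float, dv: float, p: IDMParams) -> float:
    """IDM acceleration of a following vehicle.

    v   -- follower speed (m/s)
    gap -- bumper-to-bumper distance to the leader (m), must be > 0
    dv  -- approach rate: follower speed minus leader speed (m/s)
    """
    # Desired gap: standstill gap plus headway and braking terms.
    s_star = p.s0 + max(0.0, v * p.T + v * dv / (2.0 * math.sqrt(p.a_max * p.b)))
    # Free-road term pulls towards v0; interaction term pushes back from the leader.
    return p.a_max * (1.0 - (v / p.v0) ** p.delta - (s_star / gap) ** 2)
```

Changing T, b or v0 makes the follower more aggressive or more cautious, but the model only describes longitudinal car-following. It says nothing about whether to run a yellow light, accept a gap or yield, which is the kind of decision-making a richer actor model has to cover.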

Once we have a framework to produce realistic scenarios and traffic actor behaviors, we can learn driving policies in simulation. But how do we ensure that these policies continue to work well in the real world? After all, sensors and actuators on physical vehicles may not behave the same way as modeled in simulation. We circumvent this problem by using sensor-agnostic representations of the environment, such as occupancy grids and semantic segmentation maps.
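As a simple illustration of a sensor-agnostic representation, the sketch below rasterizes obstacle positions into an ego-centred occupancy grid. The function name, grid size and axis conventions are assumptions for this example. Representations used in practice would typically carry more information, such as multiple channels or semantic classes.

```python
from typing import Iterable, List, Tuple


def occupancy_grid(points: Iterable[Tuple[float, float]],
                   size_m: float = 40.0, cell_m: float = 1.0) -> List[List[int]]:
    """Rasterize obstacle points (ego frame, metres) into an ego-centred grid.

    x points forward and y to the left, with the ego at the grid centre.
    Row 0 is the farthest-ahead row and column 0 is the leftmost column.
    Points outside the square are dropped.
    """
    n = int(round(size_m / cell_m))
    half = size_m / 2.0
    grid = [[0] * n for _ in range(n)]
    for x, y in points:
        if -half <= x < half and -half <= y < half:
            i = min(n - 1, int((x + half) / cell_m))  # index along x, rear to front
            j = min(n - 1, int((y + half) / cell_m))  # index along y, right to left
            grid[n - 1 - i][n - 1 - j] = 1
    return grid
```

The policy only ever sees the grid. Whether the points came from a LiDAR, a RADAR, a camera pipeline or simulator ground truth is hidden behind this interface, so swapping a sensor changes the upstream perception code but not the policy input format.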

Similarly, instead of trying to model the effects of controlling a vehicle directly using throttle, brake and steering, we are leaning more towards universal action specifications like road-relative waypoint and target velocity. These actions are then processed by traditional motion planners to produce feasible collision-free trajectories, allowing us to apply reinforcement learning policies trained in simulation without significant change on real vehicles. This approach is consistent with introducing additional layers of safety, including formal methods such as Responsibility Sensitive Safety (RSS) and redundant systems for fault-tolerance.

Simplified architecture showing the role of sensor-agnostic abstractions and universal action specifications in creating a flexible interface for learned policies
Simplified architecture showing how learned policies can interface with other driving modules
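To illustrate what a road-relative action might look like at the interface between a learned policy and a motion planner, the sketch below converts a (lookahead, lateral offset, target speed) action into a map-frame waypoint along a lane centerline. The BehaviorAction type, its fields and the polyline centerline are hypothetical simplifications; the actual interface may differ.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class BehaviorAction:
    """Hypothetical policy output, expressed relative to the lane centerline."""
    lookahead_m: float       # distance ahead of the ego along the centerline
    lateral_offset_m: float  # offset from the centerline, positive to the left
    target_speed: float      # m/s


def to_world_waypoint(centerline: List[Point], ego_s: float,
                      action: BehaviorAction) -> Tuple[float, float, float]:
    """Convert a road-relative action into (x, y, target_speed) in the map frame.

    centerline is a polyline and ego_s is the ego's arc length along it.
    Targets past the end of the polyline are clamped to its last point.
    """
    target_s = max(0.0, ego_s + action.lookahead_m)
    segments = list(zip(centerline, centerline[1:]))
    for i, ((x0, y0), (x1, y1)) in enumerate(segments):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        is_last = i == len(segments) - 1
        if seg_len > 0.0 and (target_s <= seg_len or is_last):
            t = min(target_s, seg_len) / seg_len
            # Unit tangent of this segment and its left-pointing normal.
            tx, ty = (x1 - x0) / seg_len, (y1 - y0) / seg_len
            nx, ny = -ty, tx
            x = x0 + t * (x1 - x0) + action.lateral_offset_m * nx
            y = y0 + t * (y1 - y0) + action.lateral_offset_m * ny
            return (x, y, action.target_speed)
        target_s -= seg_len
    raise ValueError("could not place a waypoint on the given centerline")
```

The policy never commands throttle, brake or steering. A downstream planner receives the waypoint and target speed and is responsible for producing a feasible trajectory, so the same policy output can be consumed in simulation and on a vehicle.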

Naturalistic Driving Data from Shared Mobility Fleets

It should be clear by now that driving data is the underlying source of realism for both scenario and traffic modeling! So we have to be careful where and how we collect that data. For instance, we could run a dedicated fleet of vehicles driven by safety drivers to collect the data we need. But is that going to be truly representative of natural driving? For one, our safety drivers are extremely well-trained and are highly unlikely to even get into scenarios that may result in close calls – the very scenarios that constitute the long tail of urban driving! Moreover, they may have predetermined destinations to drive to, but they’re unlikely to have the same motivation to get there as natural road users. You can never fake the urgency of a student who is running late for their finals, or the cautious behavior of a visitor driving in a new city!

How can we then understand the true spectrum of driving behaviors to expect in the real world? Vehicles operating as part of a shared mobility fleet can become an invaluable source for the naturalistic driving data we need. They are utilized more often than privately owned vehicles, driven by people for a wide range of tasks, and in fairly busy urban areas. Moreover, since our focus is on extracting abstract representations of scenarios and statistical distributions of human driving behavior, we can effectively implement measures to protect the identity and privacy of individual drivers and other traffic participants.

Woman booking shared car on mobile phone app
Shared mobility options such as ride-hailing and car-sharing are popular in dense urban areas

We are partnering with several shared mobility operators who are keen on monitoring the health and operation of their fleets. This gives us a unique opportunity to gather the data we need, while providing valuable insights to our customers. It also paves the way for introducing autonomous vehicle technology to optimize the utilization of fleet vehicles in the long run.

A development strategy for autonomous driving that is centered around machine learning can be a complex endeavor. Over time, the focus tends to shift from it being an engineering problem to more of a data and simulation problem. No matter how capable your learning algorithm is, without the right kind of input, it will never be able to produce quality results. Ridecell is prepared to tackle this challenge by extracting relevant scenarios and traffic actor models from shared mobility fleets. Furthermore, our modular approach towards system design, which allows learned agents to work in conjunction with engineered components, puts us on an accelerated path towards deploying safe and optimal driving technology.