
Unlocking the black box: how new research is making reinforcement learning safer
At a time when machine learning yields a multitude of solutions to many problems, reliably assuring their quality remains a challenge.
In this field, PhD candidate Jørn Eirik Betten is researching how machine learning (ML) software can be tested more efficiently.
Testing the uninterpretable machine
Betten began his research journey after studying chemistry and taking a statistics course, which led him to machine learning and learning theory. He was intrigued by how little is understood about why large models work. At the time he was unfamiliar with both reinforcement learning (RL) and the field of testing, which is central to his research group. Reinforcement learning, which draws on the psychologist B. F. Skinner's theories, applies an immediate reward signal, like a virtual Pavlov’s bell, to teach a machine good behavior through trial and error, so that it learns to maximise the reward it collects from each state.
The challenge lies in the nature of ML models: they are hard to interpret and their behavior is hard to predict. While traditional software testing checks that a program executes as expected, ML models are "not easy to test". Testing is costly and must be done extremely well because of the real-world applications and potential risks involved. The models can be highly sensitive to small changes in sensor data or images, so even a slight change to an image can alter the output in ways that are difficult to predict.
In reinforcement learning specifically, Jørn Eirik sees three main challenges:
1. Understanding the value of the state space (all possible states).
2. Interpreting the observation the model receives from a given state.
3. Planning across the dimension of time to predict the consequences of actions and ensure foresight into the outcome.
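The third challenge, weighing consequences that arrive later in time, can be illustrated with the discounted return that is commonly used in reinforcement learning. The sketch below is not taken from Betten's work: the discount factor `gamma` and the two reward sequences are assumptions chosen purely for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where a reward t steps ahead is weighted by gamma**t.

    gamma is an assumed discount factor; values closer to 1 give the
    agent more foresight.
    """
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences for two strategies from the same state.
shortcut = [1.0, 0.0, -10.0]  # small reward now, costly outcome later
careful = [0.0, 0.0, 1.0]     # nothing now, modest reward later

print(discounted_return(shortcut))  # negative: the late penalty dominates
print(discounted_return(careful))   # positive: the delayed reward still counts
```

An agent that looked only at the first reward would prefer the shortcut. Accounting for the later steps reverses that ranking.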
From Rashōmon to real-world safety
Jørn Eirik explains his research simply as finding ways to ensure a machine behaves as desired. He demonstrated this to his 13-year-old nephew, who visited the Simula office, by showing how a computer could learn to play a game like Mario Kart through trial and error.
One of the most significant challenges in his work was grappling with what Betten at first likened to the Rashōmon effect. The effect, named after a Kurosawa film, describes how many different models can learn the same problem and arrive at the same answer while solving it in different ways, giving a multiplicity of good models. Unlike supervised learning, which has a ground truth, sequential decision-making tasks in reinforcement learning, such as a baby learning to walk or a person learning to drive, leave a certain freedom to strategise.
Now, after two years and rejections of papers related to the effect, the focus has shifted. The work is now dedicated to mapping the set of outcomes produced by different models that are trained with the same learning paradigm (algorithms) but with the randomness varied through different "random seeds".
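To make the idea of varying only the random seed concrete, here is a minimal sketch. It is not Betten's experimental setup. The 3x3 grid, the pit in the centre, tabular Q-learning and every parameter value are assumptions chosen so that two equally good routes exist. The same algorithm is trained once per seed, and the resulting strategies are then counted.

```python
import random
from collections import Counter

SIZE = 3
START, GOAL, PIT = (0, 0), (2, 2), (1, 1)
MOVES = {"right": (0, 1), "down": (1, 0)}  # listed order also breaks ties


def valid_actions(state):
    """Actions that keep the agent on the grid."""
    row, col = state
    return [a for a, (dr, dc) in MOVES.items()
            if row + dr < SIZE and col + dc < SIZE]


def step(state, action):
    """Return (next_state, reward, done) for the toy grid."""
    dr, dc = MOVES[action]
    nxt = (state[0] + dr, state[1] + dc)
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt == PIT:
        return nxt, -1.0, True
    return nxt, 0.0, False


def greedy(q, state):
    """Action with the highest learned value in this state."""
    return max(valid_actions(state), key=lambda a: q.get((state, a), 0.0))


def train(seed, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2):
    """Tabular Q-learning; the seed is the only thing that varies."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(valid_actions(state))  # explore
            else:
                action = greedy(q, state)                  # exploit
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(
                q.get((nxt, a), 0.0) for a in valid_actions(nxt))
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q


def rollout(q):
    """Follow the learned greedy policy; return (path, total reward)."""
    state, done, path, total = START, False, [], 0.0
    while not done:
        action = greedy(q, state)
        state, reward, done = step(state, action)
        path.append(action)
        total += reward
    return tuple(path), total


def map_outcomes(seeds):
    """Count how often each (path, return) outcome appears across seeds."""
    return Counter(rollout(train(seed)) for seed in seeds)


for outcome, count in map_outcomes(range(10)).items():
    print(count, outcome)
```

In this toy world both routes around the pit earn the same return, so the return alone cannot tell the learned strategies apart. Depending on the seeds, the counter may show one route or both, which is the kind of spread such a mapping is meant to expose.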
Contributing to the big question of safety
Jørn Eirik hopes his research will contribute to the "big question of safety," especially as general-purpose AI and technologies like home assistant robots are expected to arrive within five years. For these systems to be safe to deploy, Quality Assurance (QA) and testing, particularly in reinforcement learning, must receive more attention.
His research streamlines scaling and deployment by leveraging the inherent variation of AI agents created during development. By using these varying policies, which are the “decision-making rules” of an agent, developers can automatically generate difficult test cases to evaluate new policies on the fly. This provides a faster, more controlled overview of how an AI is evolving. Ultimately, this work seeks to look into the “black box” of AI behavior, providing quantitative data for inspection at high speeds.
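One way to picture how a set of varying policies could yield difficult test cases is to look for situations where the policies disagree. The sketch below rests on that assumption and is not the method described in Betten's papers. The threshold policies, the one-dimensional "distance" state, and the helper names `disagreement`, `hardest_states` and `inspect_policy` are all hypothetical.

```python
from collections import Counter


def make_threshold_policy(threshold):
    """Hypothetical policy: brake when an obstacle is closer than threshold."""
    def policy(distance):
        return "brake" if distance < threshold else "go"
    return policy


def disagreement(policies, state):
    """Fraction of policies that do not choose the most common action."""
    votes = Counter(policy(state) for policy in policies)
    return 1.0 - votes.most_common(1)[0][1] / len(policies)


def hardest_states(policies, candidate_states, k):
    """The k candidate states on which the policies disagree most."""
    ranked = sorted(candidate_states,
                    key=lambda s: disagreement(policies, s),
                    reverse=True)
    return ranked[:k]


def inspect_policy(new_policy, policies, test_states):
    """For each test state, pair the new policy's action with the votes."""
    report = []
    for state in test_states:
        votes = Counter(policy(state) for policy in policies)
        report.append((state, new_policy(state), dict(votes)))
    return report


# Stand-ins for policies produced by different training runs.
policies = [make_threshold_policy(t) for t in (3, 4, 5)]
tests = hardest_states(policies, range(10), k=2)

for row in inspect_policy(make_threshold_policy(4), policies, tests):
    print(row)
```

Here the existing policies agree on most distances and split only near their thresholds, so those are the states selected for testing the new policy. Disagreement is just one possible proxy for difficulty, and other measures could be swapped in.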
Future facing
Jørn Eirik is pursuing his PhD in a collaboration between Simula and OsloMet at the Faculty of Technology, Art, and Design.
The image illustrates machine learning variation, which is key to Jørn Eirik's research. It was generated using the Nano Banana Pro agent.