A Tool for Benchmarking Large Language Models' Robustness in Assessing the Realism of Driving Scenarios

Authors
J. Wu