A Tool for Benchmarking Large Language Models' Robustness in Assessing the Realism of Driving Scenarios
Authors: J. Wu, C. Lu, A. Arrieta and S. Ali
Status: Accepted
Publication type: Proceedings Refereed
Year of publication: 2025
Journal: 2nd ACM/IEEE International Conference on AI-powered Software (AIware 2025)
Publisher: ACM/IEEE
Citation key: 18493