SimpleAudit

Simula's open AI safety auditing framework that allows you to test your AI systems and find their weak spots.

SimpleAudit is recognised as a Digital Public Good. This confirms that the solution meets international standards for transparency, accountability, and its contribution to sustainable development.

Today’s large language models are powerful, but they also behave unpredictably. They may follow instructions in one moment and ignore them the next, or provide confident explanations even when their answers are wrong. For organisations that rely on accuracy, accountability and audit trails this is a serious challenge.

Today these systems are guided through “prompts”: plain-text instructions that are flexible but fragile, difficult to control, hard to document, and almost impossible to verify.

A research-driven instruction schema has been validated and integrated into an operational environment. In this environment, structured AI governance moves from a theoretical concept to something that works in practice. 

Instead of describing system behaviour in plain text, this framework defines instructions as structured, versioned configuration, making behaviour more predictable, testable, and easier to audit.

This is done through a Large Language Model (LLM) Instruction Schema Standard (LISS) paired with a Per-session Instruction Schema (PSIS).
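The source does not show what LISS or PSIS records actually look like, but as a purely hypothetical sketch of the underlying idea (versioned, machine-checkable instructions instead of free-text prompts), a per-session instruction record might be validated like this. All field names and rule identifiers below are invented for illustration and are not part of the actual standards:

```python
# Hypothetical illustration only: the field names and structure below are
# invented, not the actual LISS/PSIS format.

REQUIRED_FIELDS = {"schema_version", "session_id", "instructions"}

def validate_instruction_record(record: dict) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if not isinstance(record.get("instructions"), list):
        errors.append("'instructions' must be a list of rule objects")
    return errors

# An example per-session record: versioned, explicit, and auditable,
# unlike a free-text prompt.
example_record = {
    "schema_version": "1.0",
    "session_id": "demo-session-001",
    "instructions": [
        {"rule": "refuse_harmful_requests", "severity": "block"},
        {"rule": "cite_institutional_sources", "severity": "warn"},
    ],
}

print(validate_instruction_record(example_record))  # → []
```

Because the record is structured data rather than prose, it can be diffed across versions, validated automatically before deployment, and archived as part of an audit trail.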

It can be compared to how web development became more disciplined when structured standards like CSS replaced ad-hoc formatting.

Available on GitHub: https://github.com/kelkalot/simpleaudit

Reports

Safety Evaluation of Language Models for Norwegian Deployment

This report presents a comparative safety evaluation of five language models on 36 Norwegian-specific scenarios: four models in the 4B and 8B parameter range, plus GPT-5 from OpenAI as a frontier baseline.

Using our own SimpleAudit AI safety auditing framework, we tested behaviours that standard benchmarks do not measure: refusal of harmful requests, resistance to manipulation, accuracy of institutional information, and maintenance of appropriate boundaries. Our findings challenge conventional assumptions about model selection.

Access the full report.

Contributors

  • Michael A. Riegler (Simula)
  • Sushant Gautam (SimulaMet)
  • Klas H. Pettersen (SimulaMet)

Contact