Published Research
Publicly available research linked to Inquisitor Labs can be found here.
This is not an exhaustive list. Updates will be posted as further research is completed.
-
LLM INQUISITOR METHODOLOGY (GitHub Edition) v1.1
This document defines the LLM INQUISITOR METHODOLOGY: a structured, repeatable discipline for evaluating the behaviour of large language models under controlled load. It provides a formal approach for assessing reliability through observable behaviour and evidentiary traceability. The methodology is designed to support rigorous evaluation in research, safety, and enterprise assurance contexts, where behavioural stability under real operational conditions is a critical requirement.
Zenodo DOI
-
Argo AI Testing Protocol: Sustained Multi‑Axis Load Testing
Most evaluation of conversational AI relies on short, prompt‑based tests that fail to reflect how people actually use these systems across diverse situations. Such tests do not capture the demands of extended interaction, shifting user intent, or the cumulative effects of context over time. This paper introduces the Argo AI Testing Protocol (the Argo Protocol), a conceptual approach for evaluating AI systems within the User Interaction Space: the full set of observable outputs and interactions available to a user.
Zenodo DOI
-
Argo’s Fundamentals of Failings in Prompt‑Test Design & Evaluation for LLMs
This paper identifies the core structural failings in prompt‑test design and evaluation for LLMs. It shows that the methods currently used to assess model behaviour cannot produce reliable signals: they mismeasure capability, misinterpret outputs, and often generate failure states created by the tests themselves. These practices emerged in an industry expanding faster than it can define standards, leaving evaluation shaped by inconsistent methods and gatekeepers with little grounding in the systems they are judging.
Zenodo DOI
-
Argo Prompting: Pattern‑Formation in LLMs Under Sustained Conceptual Pressure
This paper introduces Argo Prompting, a method for inducing pattern‑formation behaviour in large language models through sustained conceptual pressure. It distinguishes pattern‑formation from collapse, hallucination, and surface‑level pattern‑matching, and shows how guided conceptual pressure produces coherent structural responses. The paper provides a practical framework for researchers studying LLM behaviour under extended reasoning conditions.
Zenodo DOI