EU AI Act Articles 9–15: A Technical Reading for Engineering Teams
The EU AI Act is a technical mandate, not just a policy one. For engineering teams deploying AI in high-risk contexts, Articles 9–15 require specific architectural choices—not documentation written after the fact. The Act doesn’t just ask what the model produced. It asks how it reasoned, and whether that reasoning can be shown.
Who This Is For
Most EU AI Act coverage is written by lawyers for lawyers. That’s a problem, because the compliance obligations in Articles 9 through 15 aren’t primarily legal obligations—they’re engineering ones. The decisions that determine whether a system is compliant get made by the teams designing and deploying it, not by the teams reviewing it afterward.
This is a technical reading. It assumes you know what a model is, what a prompt is, and what “production deployment” means. It does not assume you have read the Act in full. Most obligations for the high-risk systems listed in Annex III apply from 2 August 2026, so the timeline is no longer abstract.
What Makes a System “High-Risk”
Before Articles 9–15 apply, a system has to qualify as high-risk. Annex III of the Act defines the categories. These are not edge cases. Annex III covers AI systems used in: biometrics (including remote biometric identification), critical infrastructure, education and vocational training, employment and workers management, access to essential private and public services (including credit scoring), law enforcement, migration, asylum and border control, and the administration of justice and democratic processes.
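If you are deploying an LLM-based system that influences credit decisions, screens job applicants, or prices life or health insurance for individuals, you are very likely operating a high-risk AI system under Annex III. Systems that assist in medical diagnosis usually reach the same classification by a different route, as regulated medical devices under Article 6(1) and Annex I, and legal-analysis tools fall in scope when they are intended for use by judicial authorities. Article 6(3) carves out narrow exceptions, but for most teams in regulated industries the question is not whether the Act applies. The question is what it actually requires.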
Article 9: Risk Management Is a System, Not a Document
Article 9 requires providers of high-risk AI systems to establish, implement, document, and maintain a risk management system throughout the system’s lifecycle. The word “system” is deliberate. A one-time risk assessment document does not satisfy this requirement. What the Act describes is a continuous process: identifying risks, estimating and evaluating them, adopting mitigation measures, and testing residual risks against the system’s intended purpose.
For an LLM-based system, this has practical engineering implications. Risk management cannot be outsourced to a static evaluation run at launch. It requires ongoing monitoring of how the system behaves in production—not just whether outputs are accurate on average, but whether the system’s reasoning process holds up under the range of inputs it actually encounters. This is not a governance wrapper you add at the end of a project. It has to be designed in.
Article 9 also requires evaluating the risks that may emerge under conditions of “reasonably foreseeable misuse.” For LLM systems, this means you cannot limit your risk evaluation to intended use cases. You have to characterize how the system behaves when inputs are ambiguous, adversarial, or outside the distribution it was evaluated on. Teams that have only benchmarked their models on clean evaluation sets have not satisfied this requirement.
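One way to make the continuous process concrete is to keep the risk register as data and re-score it whenever production monitoring reports new evidence. The sketch below is illustrative only: the Risk fields, the 1 to 5 severity and likelihood scales, and the acceptance threshold are assumptions made for the example, not values taken from the Act.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Risk:
    """One entry in a hypothetical risk register."""
    name: str
    severity: int      # assumed scale: 1 (minor) to 5 (critical)
    likelihood: int    # assumed scale: 1 (rare) to 5 (frequent)
    mitigations: List[str] = field(default_factory=list)

    def score(self) -> int:
        # Simple severity x likelihood scoring; real schemes vary.
        return self.severity * self.likelihood


def risks_needing_action(register: List[Risk], threshold: int = 8) -> List[Risk]:
    """Return risks whose current score exceeds the assumed threshold."""
    return [risk for risk in register if risk.score() > threshold]


register = [
    Risk("prompt injection via uploaded documents", severity=4, likelihood=2),
    Risk("out-of-distribution applicant profile", severity=3, likelihood=2),
]

# Monitoring reports more injection attempts than were estimated at launch,
# so the likelihood is revised and the register is evaluated again.
register[0].likelihood = 4
open_risks = risks_needing_action(register)
```

The point of the design is that the register is re-evaluated on new evidence rather than written once; where the evidence comes from and who signs off on mitigations are left out of the sketch.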
Article 11: Technical Documentation Means the Reasoning, Not Just the Output
Article 11 requires technical documentation to be drawn up before a high-risk AI system is placed on the market and kept up to date. Annex IV specifies what that documentation must contain. The list is detailed, but the substance reduces to one core requirement: you must be able to show how the system works, not just what it produces.
For classical ML models, this is tractable. You document the training data, the architecture, the validation methodology, the performance metrics. The mechanics of the model are inspectable in principle, even if they are complex in practice.
For LLM-based systems, Article 11 creates a harder problem. A large language model’s “mechanics” are not meaningful to document in the traditional sense. What matters for compliance is not the model weights. It is the reasoning process the system follows when making a consequential determination. Annex IV asks for a description of “the general logic of the AI system and of the algorithms,” along with the key design choices and the rationale and assumptions behind them.
This is where the Act becomes an architectural mandate. If your system generates outputs in a single pass—prompt in, answer out—you do not have a reasoning process to document. You have an output. That is not the same thing, and it does not satisfy the requirement. What Article 11 describes, in engineering terms, is a system where the steps taken to reach a conclusion are represented explicitly and can be shown to an auditor.
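As a rough illustration of what "represented explicitly" could mean, the sketch below stores each reasoning step as a record and serializes the steps together with the outcome. The ReasoningStep and Determination types, their field names, and the example values are hypothetical; neither the Act nor Annex IV prescribes a schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ReasoningStep:
    step_id: str
    description: str     # what this step was meant to establish
    inputs: List[str]    # evidence fields or earlier step ids it relied on
    conclusion: str


@dataclass
class Determination:
    outcome: str
    steps: List[ReasoningStep] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested step dataclasses.
        return json.dumps(asdict(self), indent=2)


determination = Determination(
    outcome="refer to human underwriter",
    steps=[
        ReasoningStep(
            "s1",
            "Check debt-to-income ratio",
            ["income", "monthly_debt"],
            "ratio is 0.52, above the assumed 0.40 policy limit",
        ),
        ReasoningStep(
            "s2",
            "Decide routing",
            ["s1"],
            "limit exceeded, so a human must review",
        ),
    ],
)

# The serialized record is what would be stored alongside the output.
record = determination.to_json()
```

Because each step names the inputs it relied on, an auditor can follow the chain from evidence to outcome without re-running the model.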
Article 13: Transparency and What Users Need to Know
Article 13 requires that high-risk AI systems be designed to be sufficiently transparent that deployers can interpret the system’s output and use it appropriately. This is not a user interface requirement. It is a system design requirement.
The practical implication is that outputs from high-risk systems cannot be presented as bare conclusions. They must be accompanied by enough context for a deployer—the organization using the system, not necessarily the end user—to understand the basis for the output, assess its reliability, and exercise meaningful oversight. For a credit-scoring system, “applicant declined” is not sufficient transparency. The factors that drove the determination, and the reasoning that connected them to the conclusion, must be accessible.
For LLM-based systems, this is a harder problem than it appears. A model’s internal confidence is not surfaced by default, and the chain of reasoning that led to a particular output is not preserved unless you explicitly represent it. Transparency under Article 13 requires that the reasoning be a first-class output of the system—not a post-hoc explanation generated after the fact, but the actual process the system followed.
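A minimal sketch of the "no bare conclusions" idea follows: the deployer-facing report is only produced when the outcome carries the factors behind it. The ExplainedOutput structure and the render_for_deployer function are assumptions for illustration, not an interface defined by the Act or by any library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Factor:
    name: str
    value: str
    effect: str    # how the factor moved the outcome, in plain language


@dataclass
class ExplainedOutput:
    outcome: str
    factors: List[Factor]
    confidence: Optional[float]    # None means no estimate was produced
    known_limitations: List[str]


def render_for_deployer(out: ExplainedOutput) -> str:
    """Build a plain-text report; reject outputs that carry no factors."""
    if not out.factors:
        raise ValueError("refusing to present a bare conclusion without factors")
    if out.confidence is None:
        confidence = "not estimated"
    else:
        confidence = f"{out.confidence:.0%}"
    lines = [f"Outcome: {out.outcome}", f"Confidence: {confidence}"]
    for factor in out.factors:
        lines.append(f"- {factor.name} = {factor.value}: {factor.effect}")
    for limitation in out.known_limitations:
        lines.append(f"Limitation: {limitation}")
    return "\n".join(lines)


report = render_for_deployer(
    ExplainedOutput(
        outcome="applicant declined",
        factors=[Factor("debt_to_income", "0.52", "above the assumed 0.40 limit")],
        confidence=0.62,
        known_limitations=["income was self-reported and not verified"],
    )
)
```

Stating "not estimated" when no confidence exists is a deliberate choice in this sketch: it avoids implying a certainty the system never computed.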
Article 14: Human Oversight Is an Architecture, Not a Checkbox
Article 14 requires that high-risk AI systems be designed and developed so that they can be effectively overseen by natural persons during the period of use. This is the requirement that most teams underestimate.
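Article 14 is specific about what oversight means. It requires that human overseers be able to: properly understand the relevant capacities and limitations of the system; monitor its operation and detect anomalies and malfunctions; remain aware of the tendency to over-rely on its output (automation bias); correctly interpret the output; and disregard, override, or halt the system when necessary. Article 14(3) expects these measures to be built into the system by the provider where technically feasible, or identified by the provider for the deployer to implement. Either way, a policy document that says humans are responsible for reviewing outputs does not by itself provide the capability.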
For LLM systems in production, this has direct architectural implications. If your system produces outputs that humans cannot meaningfully evaluate—because the reasoning is opaque, the confidence is unstated, or the process is invisible—then human oversight is nominal rather than real. A human who receives “loan denied” with no reasoning cannot exercise the oversight Article 14 requires, even if they are nominally in the loop.
Building a system that satisfies Article 14 means making the reasoning process available to the human overseer. They need to see what the system considered, what it weighed, what it flagged as uncertain, and where it could have gone differently. That level of transparency requires that the reasoning be represented explicitly—as a structured artifact that persists alongside the output, not as a black box that produced it.
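The oversight capabilities described above can be sketched as a review gate that sits between the system's proposal and any released decision. Everything here is hypothetical: the ReviewAction values, the Proposal and FinalDecision types, and the apply_review function are illustrative names, and a real system would also need authentication and an audit trail.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ReviewAction(Enum):
    APPROVE = "approve"
    OVERRIDE = "override"
    HALT = "halt"


@dataclass
class Proposal:
    outcome: str
    reasoning: str
    uncertain_points: List[str]    # what the system flagged as uncertain


@dataclass
class FinalDecision:
    outcome: Optional[str]    # None means nothing was released
    decided_by: str
    note: str


def apply_review(
    proposal: Proposal,
    action: ReviewAction,
    reviewer: str,
    replacement: Optional[str] = None,
    note: str = "",
) -> FinalDecision:
    """Turn a system proposal into a decision only through a human action."""
    if action is ReviewAction.APPROVE:
        return FinalDecision(proposal.outcome, reviewer, note)
    if action is ReviewAction.OVERRIDE:
        if replacement is None:
            raise ValueError("an override needs a replacement outcome")
        return FinalDecision(replacement, reviewer, note)
    # HALT: the proposal is withheld and no outcome is released.
    return FinalDecision(None, reviewer, note)


proposal = Proposal(
    outcome="loan denied",
    reasoning="debt-to-income ratio above the assumed policy limit",
    uncertain_points=["income figure is self-reported"],
)
decision = apply_review(
    proposal,
    ReviewAction.OVERRIDE,
    reviewer="analyst-17",
    replacement="request income verification",
    note="income unverified; denial is premature",
)
```

The reviewer sees the reasoning and the flagged uncertainties before acting, and the recorded decision names the person who made it.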
Article 15: Accuracy, Robustness, and What “Reliability” Actually Requires
Article 15 requires that high-risk AI systems achieve appropriate levels of accuracy, robustness, and cybersecurity. The accuracy requirement is the one most teams address—run an evaluation, report a metric. But the robustness requirement is the harder one.
Robustness under Article 15 means the system must perform consistently across variations in input, including adversarial inputs, and that failures must be detectable. This is not a benchmark requirement. It is a behavioral specification: the system should fail gracefully, flag uncertainty when it is uncertain, and not produce confident outputs when the inputs do not justify confidence.
For LLMs, this maps directly to epistemic calibration: whether the system’s expressed confidence remains proportional to the available evidence. A system that produces confident determinations regardless of input quality falls short on robustness, because it cannot signal when it is in a failure mode. A system that never expresses uncertainty gives its overseers no way to tell a sound determination from a failure.
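Calibration can be measured. One common metric is expected calibration error (ECE), which bins predictions by stated confidence and compares each bin's average confidence with its observed accuracy. The Act does not name this metric; the sketch below is one possible check, and the bin count and sample data are assumptions.

```python
from typing import List, Tuple


def expected_calibration_error(
    preds: List[Tuple[float, bool]], n_bins: int = 5
) -> float:
    """ECE over (confidence, was_correct) pairs using equal-width bins."""
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for confidence, correct in preds:
        # A confidence of exactly 1.0 is placed in the last bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, correct))

    total = len(preds)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_confidence = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Weight each bin's gap by its share of all predictions.
        ece += (len(bucket) / total) * abs(accuracy - avg_confidence)
    return ece


# A system that always claims 95% confidence but is right half the time.
overconfident = [(0.95, True), (0.95, False), (0.95, False), (0.95, True)]
# A system that claims 50% confidence and is right half the time.
calibrated = [(0.5, True), (0.5, False)]

ece_overconfident = expected_calibration_error(overconfident)
ece_calibrated = expected_calibration_error(calibrated)
```

In this toy data the overconfident system has a gap of 0.45 between stated confidence and accuracy, while the calibrated one has none. Tracked over production traffic, a rising gap is the kind of signal the robustness discussion above calls for.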
The Architecture That Satisfies These Requirements
Reading Articles 9 through 15 together, the common thread is clear: the Act does not just require that AI systems produce good outputs. It requires that the process by which they produce those outputs be visible, structured, and auditable.
Conventional logging alone does not satisfy this. Logging captures what a system did. These requirements concern how it reasoned: the steps it took, the considerations it weighed, the uncertainties it encountered, and the chain of inference it followed to a conclusion. Those are different things, and the distinction is what makes compliance hard for teams whose systems treat reasoning as internal and transient.
What the Act describes, in engineering terms, is a reasoning layer that exists as a first-class artifact of the system’s operation. Not a log. Not a post-hoc explanation. An explicit representation of the reasoning process that is available for inspection, can be reviewed by human overseers, and can demonstrate to an auditor that the system followed a defensible process to reach its conclusion.
Structured reasoning graphs—systems where the steps of reasoning are represented as explicit, executable nodes with documented dependencies and revision flows—satisfy these requirements by architecture. The reasoning is not inferred from the output after the fact; it is the output, alongside the final determination. Article 11’s documentation requirement is satisfied because the reasoning is already a structured artifact. Article 14’s oversight requirement is satisfied because human overseers can read the reasoning, not just the conclusion. Article 9’s continuous risk management requirement is satisfiable because the reasoning traces are available for ongoing review.
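To make the idea of executable nodes with documented dependencies concrete, here is a minimal sketch built on Python's standard graphlib module. The node names, the 0.4 ratio limit, and the run_graph helper are assumptions for illustration; revision flows are not modeled, and this is not a description of any particular product.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# Each node maps to (names it depends on, function over those results).
NodeSpec = Tuple[List[str], Callable[[Dict[str, Any]], Any]]


def run_graph(
    nodes: Dict[str, NodeSpec], inputs: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Run nodes in dependency order and keep a trace of every step."""
    graph = {name: deps for name, (deps, _) in nodes.items()}
    results: Dict[str, Any] = dict(inputs)
    trace: List[Dict[str, Any]] = []
    for name in TopologicalSorter(graph).static_order():
        if name in results:
            continue    # raw input, nothing to compute
        deps, fn = nodes[name]
        results[name] = fn({dep: results[dep] for dep in deps})
        trace.append(
            {"node": name, "depends_on": list(deps), "result": results[name]}
        )
    return results, trace


nodes: Dict[str, NodeSpec] = {
    "debt_ratio": (
        ["income", "debt"],
        lambda v: round(v["debt"] / v["income"], 2),
    ),
    "ratio_ok": (["debt_ratio"], lambda v: v["debt_ratio"] <= 0.4),
    "outcome": (
        ["ratio_ok"],
        lambda v: "approve" if v["ratio_ok"] else "refer to human",
    ),
}

results, trace = run_graph(nodes, {"income": 50000, "debt": 30000})
```

The trace is produced by the same run that produces the outcome, so it records the steps actually taken rather than an explanation reconstructed afterward.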
This is not a product pitch. It is a technical observation: compliance with Articles 9–15 requires architectural choices that single-pass LLM systems cannot make after the fact. The teams that will be positioned for August 2026 are the ones making those choices now.
The EU AI Act doesn’t just regulate AI outputs—it regulates AI processes. Engineering teams that treat the Act as a documentation exercise will find, on closer reading, that the documentation it requires is only producible by systems designed from the outset to make their reasoning visible.