Designing at the Wrong Layer
AI interface design is not failing because designers lack skill. It is failing because designers are being asked to work at the wrong layer—and design leadership needs to reckon with what that means.
Written For
Design leaders and senior practitioners working on AI products, who suspect something structural is wrong but haven't quite named it yet.
In 2019, a team at Microsoft did something quietly devastating. They took the field’s best interaction design guidelines—the ones that had been settled practice for thirty years—and tested how well they translated to AI systems deployed in the wild.
The violations weren’t at the edges. They were in the fundamentals. Systems that couldn’t explain what they could do. Systems that couldn’t account for why they made a decision. Systems that offered no recovery path when things went wrong. Nielsen’s heuristics turned upside down.
The tempting diagnosis is to say “AI is new. We’re still learning.”
Here’s the uncomfortable truth: the guidelines aren’t being violated because designers are bad at following them. They’re being violated because designers are being asked to work at the wrong layer.
The assumption classical UI was built on
I’m going to use the wanky term “classical UI” to describe what most people recognise when they look at a screen on a device.
When Jakob Nielsen published his heuristics in 1994, he was describing a world that had one defining property: the designer controlled the surface being designed.
This seems obvious in retrospect. Of course the designer controls the interface. What else would they control? But trace what that assumption actually enables and you start to see how much classical UI theory depends on it.
If you—the designer—control the surface, you can guarantee what happens when a user clicks here. You can specify the error state. You can inspect every branch of the interaction tree. The model and the interface are, in effect, the same thing.
Thirty years of UI practice were built on what we know as the ‘stack’: information architecture, interaction design, usability testing. All of it depends on the designer’s control over the full stack.
In an AI system, the designer specifies what they want the interface to do. The model determines whether that specification is achievable. The designer cannot inspect the model. Cannot guarantee the output. Cannot fully anticipate the failure modes. There is a layer below the interface that design does not control and often cannot see into. With the designer’s control diminished, the classic model of the stack breaks down.
What the Microsoft team found in 2019 was that this thirty-year assumption no longer held, and no one had yet reckoned with what to replace it with.
Three layers classical UI never needed
What AI interface design actually requires, once you stop trying to retrofit classical frameworks, is a set of design responsibilities that have no precedent in UI history. Three of them emerge from the research.
The pre-interaction layer
Caetano and colleagues identified a failure mode in conversational AI that classical UX had never needed to worry about: users arrive at the interface without the capacity to use it well. Not because they lack skill. Because natural language implies open-ended capability that the system doesn’t have. The interface looks like it can do anything. It cannot.
Their response was architectural: build a structured goal-formation process before the conversation begins. Force clarity about intent upstream of the system attempting to fulfil it.
This isn’t a feature. It’s a new layer of design that sits upstream of everything classical UI assumed was the starting point. If the pre-interaction layer is a genuine design responsibility—and the evidence suggests it is—then AI interface design begins before the screen. It begins with how users form goals.
That is a service design problem. It requires different expertise, different methods, and a scope of authority that screen-level UX work doesn’t have.
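To make the pre-interaction layer concrete, here is a minimal sketch of what a structured goal-formation step might look like in code. This is an illustration of the pattern, not Caetano and colleagues’ implementation; every name, intent, and field here is a hypothetical assumption.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a pre-interaction goal-formation step that runs
# BEFORE the conversational interface opens. All names are illustrative.

@dataclass
class Goal:
    intent: str                              # what the user wants to accomplish
    constraints: list[str] = field(default_factory=list)
    in_scope: bool = False                   # does the system actually support this?

# The system's real capabilities, made explicit instead of implied by chat.
SUPPORTED_INTENTS = {"summarise_document", "draft_email", "answer_question"}

def form_goal(raw_request: str, chosen_intent: str, constraints: list[str]) -> Goal:
    """Force clarity about intent upstream of the conversation.

    Rather than dropping the user into an open-ended chat, the interface
    first asks them to pick a supported intent and state constraints, so
    a capability mismatch surfaces before the first message is sent.
    """
    goal = Goal(intent=chosen_intent, constraints=constraints)
    goal.in_scope = chosen_intent in SUPPORTED_INTENTS
    return goal

goal = form_goal("help with my report", "summarise_document", ["under 200 words"])
assert goal.in_scope  # the capability check happens before the chat begins
```

The design decision worth noticing is where the check lives: scope is resolved upstream of the conversation, which is exactly what a natural-language surface on its own cannot do.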
The temporal layer
Classical UI had fixed interfaces. A designer made decisions, they were implemented, the interface remained stable until someone deliberately changed it. Learning was a human activity. It happened outside the system.
AI systems learn. And this creates a tension in Microsoft’s work: learn from user behaviour versus adapt cautiously. The more aggressively a system adapts, the more disruptively it changes. The more cautiously it adapts, the less useful the learning becomes.
The interface the designer builds is not the interface users will encounter six months later. Which means the training pipeline is a design artefact. The feedback loop criteria are design decisions. The adaptation logic is a design decision. A design leadership that cedes these to data science teams is ceding its most important future territory—to people who are not accountable to design objectives and may not know what a design objective looks like.
The structural layer
This is the hardest one to sit with, because it describes the boundary of design’s actual authority.
Many of the most important design properties in AI interfaces—explainability, accurate intent sensing, appropriate calibration, recovery from unexpected outputs—are determined at the model layer, not the interface layer. The designer can specify what they want. They often cannot implement it. The decisions that matter most are made by people who don’t think of themselves as designers and aren’t accountable to anything a designer would recognise as a quality criterion.
No framework in UI history was built for this. No methodology, no heuristic, no principle was developed for a world in which the designer is structurally excluded from the decisions that most affect the user experience. This isn’t a gap we’ve failed to close. It’s a condition we haven’t yet learned to design within.
What the field is currently doing wrong
Three failure modes run through the research.
Scope misalignment. Design is being applied at the screen level to problems that originate at the model level. This produces polished interfaces on top of fundamentally broken experiences. Blümel and Jha’s analysis of deployed chatbots is instructive here: most add conversational UI to systems that are functionally search boxes. The design work is competent. The product is dishonest. No amount of interaction design refinement fixes a capability mismatch—it only makes the mismatch harder for users to detect. We are, in the most literal sense, putting lipstick on a pig and billing it as a makeover.
Diversity failure. The design vocabulary for culturally sensitive AI interaction exists. Persona design, mental model calibration, capability signalling—all of it can theoretically be applied with attention to diverse populations. In practice, it consistently isn’t. The failure mode is invisible to homogeneous teams: you cannot see the capability gap you don’t experience. Design leadership that doesn’t treat team composition as a design quality issue will keep producing AI interfaces that work well for some users and systematically mislead others. And won’t notice.
Boundary ambiguity. There is currently no clear position on where design’s authority ends and engineering’s begins in AI systems. This creates two simultaneous failure modes: designers specifying requirements that are technically unachievable at the interface layer, and engineers making decisions with profound UX consequences without any design input. The structural layer is not going away. We need a deliberate strategy for working at it—not hoping it doesn’t exist.
What design needs to do instead
None of this argues for design retreating from AI. It argues for design advancing into territory it hasn’t yet claimed.
The most important shift is conceptual. Design needs to stop treating AI interface design as a harder version of classical UX and start treating it as a different discipline that classical UX partially informs. The foundations transfer—signifiers, affordances, mental model alignment, user control. The assumptions don’t. Building on foundations while abandoning assumptions requires a kind of intellectual honesty that’s uncomfortable, especially when the foundations are what gave the field its authority in the first place.
Practically, three things follow.
Extend the remit upstream. The pre-interaction layer is a design responsibility. Goal formation, capability communication, and expectation-setting before users reach the interface are not marketing problems or onboarding problems. They are design problems that require service design skills, not just UX skills. Design leadership should be lobbying for this scope, not waiting to be invited into it.
Treat the training pipeline as a design artefact. If the interface changes over time because the model is learning, then learning objectives are design decisions. Feedback loops are design decisions. The criteria by which the system determines what counts as a good interaction are design decisions. This work won’t happen unless design claims it.
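What claiming this work might look like in practice: a sketch of adaptation criteria expressed as a design-owned artefact rather than a data-science default. This is purely illustrative; the thresholds, names, and policy fields are assumptions, not a real pipeline’s API.

```python
# Hypothetical sketch: design-owned criteria for a learning interface.
# The point is that these values are design decisions, reviewable like any
# other design artefact, not defaults buried in a training pipeline.

ADAPTATION_POLICY = {
    "min_positive_signal": 0.8,        # explicit-feedback rate required before a behaviour is reinforced
    "max_ui_drift_per_release": 0.05,  # cap on how much ranking/layout may shift between releases
    "require_design_review": True,     # model-driven interface changes gated on design sign-off
}

def counts_as_good_interaction(explicit_rating: float, task_completed: bool) -> bool:
    """The criterion for what the system learns from is itself a design decision."""
    return task_completed and explicit_rating >= ADAPTATION_POLICY["min_positive_signal"]
```

Whether the thresholds are right matters less than who sets them: a policy like this gives design a concrete place to exercise authority over the temporal layer.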
Develop a deliberate strategy for the structural layer. At minimum: clear escalation paths for design requirements that can’t be implemented at the interface level, relationships with model development teams, and shared quality criteria that span the design-engineering boundary. And honest conversations with product leadership about what AI interfaces can and cannot guarantee from a user experience perspective. The current norm—where design is held accountable for outcomes it doesn’t control—serves no one.
The designers who’ll do well
Thirty years of UI history produced real, durable knowledge. How people build mental models. How signifiers communicate affordances. How recovery paths build trust. None of it is wrong. All of it transfers, at least in part.
What it doesn’t do is resolve the design problems specific to AI: the pre-interaction layer, the temporal layer, the structural layer. These require frameworks we’re only beginning to build, applied by teams with the authority and scope to actually implement them.
The designers who will do well in AI aren’t the ones who apply classical heuristics most diligently. They’re the ones who understand why classical heuristics are insufficient—and have the intellectual courage to build from that understanding rather than retreat from it.
That is the work design leadership needs to be creating conditions for.
The research is pointing the way. The question is whether we’re ready to follow it somewhere unfamiliar.
Sources
Amershi et al. (2019). Guidelines for Human-AI Interaction. CHI Conference on Human Factors in Computing Systems.
Caetano et al. (2025). Agentic Workflows for Conversational Human-AI Interaction Design.
Blümel & Jha (2023). Designing a Conversational AI Agent. Hawaii International Conference on System Sciences.