Thoughts on Using Generative AI in Mission-Critical System Development


Author’s note (April 5, 2026): This article framed its argument around the opposition between AI’s non-deterministic nature and the deterministic nature of traditional tools. Over the past few months of working with AI-driven development, my view has changed substantially. I have come to see AI as heuristic in nature — much like humans. In practice, development has always involved building up specifications incrementally while working to minimize errors along the way. Working with LLMs turned out to be no different from doing this with humans. Some aspects of this article, such as parts of the modeling and the argument that specifications must admit only one interpretation, remain valid, but I would ask readers to weigh the rest accordingly.

I have been working in the payment industry for over 10 years. In an environment where ambiguous results are never acceptable, system development has always been strict and unforgiving. Perhaps because of this, the adoption of generative AI in real production environments has been slow. In these domains, introducing new technology into production is never easy.

Recently, I started working on a project to explore how generative AI could advance the development of payment systems. In this article, I won't mention any specific tools or products. Instead, I want to capture my current way of thinking: the mindset we should bring to AI before talking about concrete implementations.

AI Is a Prediction Machine

Just as traditional machine learning predicts future numbers from historical data, generative AI is, at its core, a prediction machine. It takes natural language as input and predicts the next most plausible sequence of natural language as output. Nothing more, nothing less.

This can be expressed with a very simple model:

y = f(x)

In the context of system development:

  • x: specification
  • y: artifact (code, configuration, documentation, etc.)
  • f: generative AI

Generative AI has made the function f — the transformation from x to y — extremely fast. In the past, we saw this function as the primary role of software engineers. But that is changing.

Our role is shifting toward:

  • defining x more rigorously
  • verifying whether the generated y truly matches x

These are becoming central responsibilities for engineers in the AI era.

As a side note, programmers themselves can also be modeled as a simple function:

💻 = 🦄(☕)

A mysterious creature that takes coffee as input and outputs programs. In a way, maybe we’ve always been a kind of generative AI.

What Engineers Must Do Around f

The ultimate goal of the function f is to achieve x = y (or more realistically, to make y as close as possible to x in meaning). To move toward that goal, there are several areas engineers must strengthen.

1. Defining Contracts That Cannot Be Reinterpreted

First, we must acknowledge that natural language is fragile. Even in the legal world — arguably the strictest domain when it comes to language — interpretation gaps still exist. So when we give instructions to generative AI, ambiguity is unavoidable.

That’s why we need to pursue something closer to “contracts that cannot be reinterpreted.” In older terms, this is essentially a return to contract-based thinking.

Instead of feeding vague natural language as specifications, we should express requirements as formalized contracts whenever possible. And we must carefully choose the languages, tools, and formats used to represent those contracts.
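As a minimal sketch of what such a contract might look like (the upper bound and rounding rules below are made-up examples, not real payment rules), the prose requirement "the amount must be a reasonable positive value" can be pinned down as an executable predicate that admits exactly one interpretation:

```python
from decimal import Decimal

MAX_AMOUNT = Decimal("1000000.00")  # hypothetical upper bound

def amount_contract(amount: Decimal) -> bool:
    """True iff the amount satisfies the contract exactly."""
    return (
        amount > 0                                        # strictly positive
        and amount <= MAX_AMOUNT                          # bounded above
        and amount == amount.quantize(Decimal("0.01"))    # at most 2 decimal places
    )

assert amount_contract(Decimal("10.50"))
assert not amount_contract(Decimal("10.505"))   # too many decimal places
assert not amount_contract(Decimal("-1.00"))    # not positive
```

Every boundary condition a reader might argue about in prose is settled here by the code itself.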

2. Constraining Implementation Choices

Suppose we can write an unambiguous contract and achieve x = y. Even then, a crucial question remains:

Do we need to care about how y was implemented?

Personally, I believe the answer is yes.

In payment systems, when an incident occurs, it can cause significant financial loss for customers within seconds. In those situations, quickly narrowing down the root cause is critical.

There is no time to ask an AI, “Something broke — any idea why?”

If we understand the underlying mechanisms and the implementation stack, troubleshooting becomes much easier. For that reason, the choices used to generate artifacts should be somewhat constrained. And as operators of generative AI, we as software engineers still need to be deeply familiar with the technologies involved.
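A crude sketch of how such constraints could be enforced mechanically (the approved-stack list and the regex-based import scan are illustrative assumptions, not a real policy):

```python
import re

# Hypothetical approved stack: the only modules a generated artifact may use.
ALLOWED_IMPORTS = {"decimal", "dataclasses", "logging"}

def constraint_violations(source: str) -> set:
    """Return top-level imports in the artifact outside the approved stack."""
    found = set(re.findall(r"^\s*import\s+(\w+)", source, flags=re.M))
    found |= set(re.findall(r"^\s*from\s+(\w+)", source, flags=re.M))
    return found - ALLOWED_IMPORTS

artifact = "import decimal\nimport requests\n"
assert constraint_violations(artifact) == {"requests"}
```

A real gate would parse the AST rather than use regexes, but the point stands: the allowed choices are explicit and machine-checked, so an on-call engineer always knows what stack they are debugging.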

3. Verifying Input-Output Consistency

Even if x = y is achieved, we still need a way to verify it.

This is where I believe the real bottleneck of future development will emerge. Generative AI can produce artifacts y many times faster than we can write them ourselves. At that point, the slowest part of the process becomes the time between receiving y and providing the next x.

And what happens during that time?

We are verifying whether y truly equals x.

Testing, automated verification, and spec-to-implementation alignment will become the new center of engineering effort.
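One way to picture this verification step (a sketch, not a prescription; the spec and the checks here are invented) is to derive executable checks from x and run them against the generated y:

```python
import random

# x, in prose: "sort a list of integers ascending, preserving contents"
def satisfies_spec(sort_fn) -> bool:
    """Run randomized checks derived from the spec against an artifact."""
    rng = random.Random(42)  # fixed seed so the check is reproducible
    for _ in range(100):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
        if sort_fn(xs) != sorted(xs):  # ordering plus multiset equality
            return False
    return True

# y: pretend this implementation came back from the model
generated_sort = lambda xs: sorted(xs)

assert satisfies_spec(generated_sort)
assert not satisfies_spec(lambda xs: list(xs))  # an unsorted "implementation" fails
```

The check itself is fast; the engineering effort goes into writing checks that actually exhaust the spec.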

If x = y Is Achieved, Will Bugs Disappear? (The x ≠ z Problem)

No :cry:

Even if an implementation y perfectly matches the contract x, the contract itself was written by humans trying to model an unstable and ever-changing world. That means the true requirement — let’s call it z — may still differ.

So:

  • x = y can be achieved
  • but x = z is never guaranteed

This is not a new problem — it has always been there.

However, I suspect that in the age of generative AI, this gap will become more visible. Development speed will increase dramatically, and we may spend less time questioning whether the specification itself is correct.

Because of this, I currently believe that even with highly skilled engineers, the actual bug rate may not decrease significantly. From a macro perspective, it might even increase, due to:

  • Overtrust in natural language specifications
  • Less attention paid to implementation details
  • A lower barrier to entry bringing more people into development

Both engineers and managers need to remain aware of these risks.

When it comes to bugs, responsibility also becomes an important question. At a high level, the structure hasn’t changed.

If x = y fails, the responsibility still belongs to the developers. “We just used what the AI generated” will not be an acceptable excuse.

The more difficult issue is when x = z fails. We may need to explicitly introduce a phase where we verify not only “Does y match x?” but also “Is x actually the right thing to build?”

So, Am I Against Natural-Language-Driven Development?

Not at all.

Natural language is extremely powerful for exploration and prototyping. We should absolutely use it aggressively in early stages. However, once the goal of exploration is achieved, I believe we should discard those temporary artifacts and switch to generation based on formal contracts.

Ideally, we want a world where:

Repeated generation from the same x should always produce the same y.

Now suppose we generate multiple artifacts from the same input x:

  • y1 from f(x)
  • y2 from f(x)

Then verify:

y1 = y2

If they converge, we can gain higher confidence in the reliability of f.
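That convergence check can be sketched directly (generate here is a stand-in for a real, normally non-deterministic model call, and the whitespace normalization is my own assumption about what counts as "the same" artifact):

```python
import hashlib

def normalize(artifact: str) -> str:
    """Strip cosmetic differences: trailing spaces and surrounding blank lines."""
    return "\n".join(line.rstrip() for line in artifact.strip().splitlines())

def digest(artifact: str) -> str:
    return hashlib.sha256(normalize(artifact).encode()).hexdigest()

def generate(spec: str) -> str:
    # Stand-in for f(x); a real model call goes here.
    return f"def handler():\n    # {spec}\n    raise NotImplementedError\n"

y1, y2 = generate("reject negative amounts"), generate("reject negative amounts")
converged = digest(y1) == digest(y2)
assert converged
```

With a real model, y1 and y2 will rarely be byte-identical, so in practice the comparison would be semantic (same tests passing, same contract satisfied) rather than a hash match.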

Conclusion

Generative AI will dramatically accelerate development speed. But that speed only becomes meaningful when supported by strong contracts and rigorous verification.

AI is not magic. It’s simply that the function f — the transformation from specification x to artifact y — has become incredibly fast. This shift raises three fundamental questions: how we define contracts, how we constrain implementations, and how we verify consistency.

Ultimately, these questions converge into two core responsibilities:

  • defining x precisely
  • verifying that y truly matches it

This is not about replacing engineers. It is about redefining where engineering effort should be spent.