Building Equity in AI Systems

How companies can build lasting intellectual property in AI systems by focusing on evaluation frameworks rather than prompt engineering

Avi Santoso

Copyright Vertical AI 2025. Written by Avi Santoso in September 2025.

The AI implementation boom is in full swing. Across industries, companies are rushing to integrate AI capabilities into their applications. This rush is driven primarily by competitive pressure and the fear of being left behind, not by confidence in a strong return on investment.

Most organisations are treating AI applications as temporary necessities. Dismissed as "ChatGPT wrappers", these applications are widely assumed not to represent genuine intellectual property. This misconception reveals a key gap between where companies are putting their resources and where they should really invest.

Why the Current Thinking Is Flawed

When companies create AI applications now, they usually put their top talent into making effective prompts. Months are spent refining instructions, testing edge cases, and optimising outputs. And it works. The resulting prompts are valuable and yield high-quality results.

But prompts have a fundamental flaw: they're inherently tied to specific models. Every prompt is optimised for GPT-4, Claude 3.5, or whatever model was current during development. When new model generations emerge - which happens annually - these carefully crafted prompts often need rework or complete rewrites.

This creates a hidden recurring cost. A company that spends six months perfecting a prompt for their legal document analysis system will likely need to invest comparable time and money when they migrate from ChatGPT to Claude, or when the next model generation changes the response patterns they've optimised for.

Worse still, prompts can be easily extracted using "jailbreaking" methods - techniques that coax AI systems into revealing their instructions. High-profile cases like v0, where the full system prompt was made public, highlight how easily companies' core intellectual property can be exposed.

See this GitHub repo for the leaked v0 prompt.

Where Real Equity Lives

While companies often focus just on prompts, the actual competitive advantage lies in evaluation systems: the infrastructure that tests, monitors, and validates AI outputs. Evaluation systems differ from prompts because they are model-agnostic. A good evaluation framework can measure agent quality whether you use ChatGPT, Claude, or any future model.

Consider a law firm that has built an AI system for analysing wills and estate documents. The prompt that instructs the AI how to perform this analysis might be specific to GPT-4 and require updates with each model change. But what about the evaluation system that tests whether the analysis correctly identifies beneficiaries, catches inconsistencies, and flags potential legal issues? That represents lasting value.
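To make this concrete, here is a minimal sketch of what such a model-agnostic evaluation harness might look like. The case structure, field names, and the analyse() contract are illustrative assumptions for this article, not the firm's actual system; the point is that nothing in the harness depends on which model sits behind the analyse callable.

from dataclasses import dataclass
from typing import Callable

# One evaluation case: an input document plus the facts the analysis
# must surface, independent of whichever model produced the answer.
@dataclass
class EvalCase:
    document: str
    expected_beneficiaries: set[str]
    expected_flags: set[str]

def score_case(case: EvalCase, analyse: Callable[[str], dict]) -> dict:
    # `analyse` is any callable that wraps a model (GPT-4, Claude, or a
    # future model) and returns a dict with "beneficiaries" and "flags"
    # keys - a hypothetical contract used only for this sketch.
    result = analyse(case.document)
    found_beneficiaries = set(result.get("beneficiaries", []))
    found_flags = set(result.get("flags", []))
    return {
        "beneficiary_recall": len(found_beneficiaries & case.expected_beneficiaries)
        / max(len(case.expected_beneficiaries), 1),
        "missed_flags": sorted(case.expected_flags - found_flags),
    }

def run_suite(cases: list[EvalCase], analyse: Callable[[str], dict]) -> dict:
    # Aggregate scores across the whole suite so different models (or
    # prompt versions) can be compared with the same numbers.
    scores = [score_case(c, analyse) for c in cases]
    return {
        "mean_beneficiary_recall": sum(s["beneficiary_recall"] for s in scores)
        / max(len(scores), 1),
        "cases_with_missed_flags": sum(1 for s in scores if s["missed_flags"]),
    }

if __name__ == "__main__":
    # A stand-in "model" for demonstration; in practice this would call an LLM.
    def fake_analyse(document: str) -> dict:
        return {"beneficiaries": ["Jane Doe"], "flags": []}

    suite = [
        EvalCase(
            document="Last will of John Doe, leaving the estate to Jane Doe...",
            expected_beneficiaries={"Jane Doe"},
            expected_flags={"no executor named"},
        )
    ]
    print(run_suite(suite, fake_analyse))

The harness, the test cases, and the scoring logic survive every model migration; only the thin analyse wrapper changes.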

Evaluation systems provide several layers of competitive advantage:

Quality Assurance at Scale

They help companies keep output quality steady while handling thousands of requests. They catch edge cases and ensure reliability that manual reviews simply can’t match at this scale.

Competitive Differentiation

A strong evaluation system lets a company demonstrate clear success rates that outshine competitors. This offers concrete proof of better performance instead of just subjective claims.

Future-Proofing

When new models appear, companies with solid evaluation systems can adapt quickly, validating performance across their specific use cases with confidence; a sketch of this validation workflow follows these points.

Proprietary Protection

Unlike prompts, evaluation systems are difficult to extract or reverse-engineer through adversarial methods. The logic defining what counts as "good" output in your specific domain represents real intellectual property.
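Continuing the earlier sketch (and reusing its EvalCase and run_suite helpers), the future-proofing workflow might look something like the following. The function name, the comparison logic, and the 0.95 recall threshold are illustrative assumptions, not recommendations.

def compare_models(cases, current_analyse, candidate_analyse, min_recall=0.95):
    # Run the identical suite against the incumbent model and a candidate
    # (for example, a GPT-4 wrapper and a Claude wrapper).
    current = run_suite(cases, current_analyse)
    candidate = run_suite(cases, candidate_analyse)
    # Only call the migration safe if the candidate matches the incumbent
    # and clears an absolute quality bar (0.95 is an illustrative assumption).
    safe = candidate["mean_beneficiary_recall"] >= max(
        current["mean_beneficiary_recall"], min_recall
    )
    return {"current": current, "candidate": candidate, "safe_to_migrate": safe}

Because the suite itself never changes, a report like this can be produced within days of a new model's release, turning migration from a rewrite project into a routine check.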

How Leading Companies Divide Resources

Forward-thinking organisations are already recognising this distinction and allocating resources accordingly. They assign their most skilled talent to creating advanced evaluation frameworks instead of prompt engineering. These frameworks are then used to assess output quality, track production performance, validate new model generations, and update prompts faster.

This resource allocation reflects a deeper understanding of where lasting competitive advantage lies. A junior developer can learn to write effective prompts in weeks. Building evaluation systems that accurately assess domain-specific outputs? That requires deep expertise and represents intellectual property that's genuinely difficult to replicate.

The difference becomes clear when you consider hiring priorities. A company focused on short-term results looks for talent with prompt engineering knowledge for the latest model generation. A company focused on sustainable competitive advantage looks for talent that can build systems to test the effectiveness of AI agents.

What This Means Going Forward

The implications of these different approaches will become increasingly clear over the next few years. Companies that invest heavily in prompt development but skip evaluation systems will keep incurring rework costs each time they change models.

Meanwhile, companies with robust evaluation systems will move faster. They can check performance and find the best settings within days, not weeks, of new model launches. This helps them keep their advantage as technology changes.

The timeline for this transition appears to be roughly three to five years before these practices become standard industry knowledge. Organisations that recognise this pattern now could gain a significant edge: up to five years of advantage before evaluation-focused methods become the norm, positioning those firms strongly for the decade ahead.

Building for the Long Term

The companies that will dominate AI applications aren't those with the most sophisticated prompts today. They are the ones building evaluation systems that allow production monitoring, enable quick improvement and adaptation of existing prompts, and preserve competitive advantages through technology changes.

This represents a fundamental shift in how we should think about AI investment. Instead of asking "How can we build the best prompt?" the question becomes "How can we build systems that can validate and improve our AI outputs over time?"


At our consulting practice in Perth, Western Australia, we help companies manage this change by designing evaluation systems that fit their unique needs and goals. Whether you're creating your first AI application or improving existing ones, we can talk about how systematic AI quality assurance can strengthen your competitive edge.

Please contact us at hello@verticalai.com.au or visit our website at VerticalAI.