Legal AI Experiment Deep Dive ~25 min read · 2026-04-01

$0.60 to $0.02 Per Message -- The Ontology Trick Nobody in Legal AI Is Talking About

By Stephane Boghossian, CEO of HAQQ Legal AI



TL;DR: So I stumbled onto a legal ontology built on Dynamic Interfaces' platform and it kind of broke my brain. A team in Mexico replaced 300 MCP tools with 7 ontology-aware ones. Per-message cost dropped from $0.60 to $0.02. Meanwhile, Harvey AI just raised at $11B and charges $1,200/lawyer/month -- for RAG. Stanford researchers found that production legal RAG tools hallucinate 17-33% of the time. The trick isn't better prompts or cheaper models -- it's modeling law the way law actually works: structured, versioned, cross-referenced. We're now building this for UAE labor law at HAQQ. I'm genuinely obsessed.


How a Demo Call Rewrote My Roadmap

A few weeks ago, I got on a call with the CEO of Dynamic Interfaces to look at something called Sentencia -- a legal ontology for Mexican labor law. I figured I'd see a demo, take some notes, move on.

That's not what happened.

I've been building legal AI at HAQQ for the MENA region, and I've sat through enough "revolutionary" demos to last a lifetime. Most of them are just RAG with a nicer UI. Sentencia looked different from the first five minutes. Not because of slick design or marketing speak -- because of what was happening under the hood.

Here's the thing that stopped me cold: 5 Mexican government customers were using this system daily. Court-appointed expert witnesses -- peritos -- were querying labor law across four federal statutes, getting precise answers with full legal citations, and the whole thing cost two cents per message. Not two dollars. Two cents.

Holy shit.

For context, Harvey AI -- the $11B golden child of legal AI -- charges $1,200 per lawyer per month. CoCounsel starts at $220/month. Even the cheapest seat in legal AI runs $100+/month. And here was this system in Mexico doing it for two cents per message. No subscriptions. No seat minimums. Just a structured knowledge graph and 7 well-designed tools.

I spent the next two weeks pulling Sentencia apart to understand why it works. This article is what I found, why it matters for anyone building legal AI, and what we're building at HAQQ because of it.


What the $11B Legal AI Companies Get Wrong

Before I get into the ontology architecture, I need to say something blunt about the current state of legal AI. Because the more I dug into the competitive landscape, the more a pattern emerged -- and it's not flattering for the incumbents.

Every major legal AI company is a RAG wrapper. Not one has a formal legal ontology.

Let me be specific.

Harvey AI -- $11B valuation, $1.2B raised, backed by Sequoia and GIC -- runs fine-tuned LLMs with RAG over legal databases. They charge ~$1,200/lawyer/month at list price ($100-500 after enterprise discounts, 20-seat minimum, $288K annual floor). They just announced a LexisNexis integration, adding another $400-600/lawyer/year. They claim 91% accuracy on their "BigLaw Bench." That still means 9% of legal work contains errors. In a profession where a single wrong citation gets you sanctioned.

CoCounsel (Thomson Reuters) -- 1 million users, bolted onto Westlaw's 100+ years of case law. Multi-model architecture across Anthropic, OpenAI, and Google. Pricing from $220 to $500/user/month. Better data moat than Harvey. But still RAG at its core -- federated search with AI summarization layered on top.

Legora (formerly Leya) -- $5.55B valuation, 800 law firms. Built on Claude with agentic workflows. $250/user/month, 10-seat minimum. No proprietary legal knowledge structure. It's a very well-designed wrapper.

Now here's the number that should make every legal AI founder lose sleep.

Turns out Stanford ran a preregistered empirical study on this -- the first of its kind. Magesh et al., published in the Journal of Empirical Legal Studies in 2025. They tested production legal RAG tools and found hallucination rates of 17-33% across the board:

| Tool | Hallucination Rate | Source |
| --- | --- | --- |
| GPT-4 (general purpose) | 58% | Magesh et al., JELS 2025 |
| Llama 2 (general purpose) | 88% | Magesh et al., JELS 2025 |
| Westlaw AI-Assisted Research | 33% | Magesh et al., JELS 2025 |
| Lexis+ AI | 17%+ | Magesh et al., JELS 2025 |
| Ask Practical Law AI | 17%+ | Magesh et al., JELS 2025 |

The Stanford team's conclusion: RAG reduces hallucinations versus general-purpose models, but hallucinations remain "substantial, wide-ranging, and potentially insidious." Legal AI providers' claims of "hallucination-free" citations are demonstrably overstated.

So the companies charging $1,200/month are shipping tools that get it wrong up to a third of the time. And none of them have the architectural foundation to fix it -- because the problem isn't the model. It's the retrieval architecture.

Meanwhile, in an entirely different domain -- clinical medicine -- researchers published a paper showing that ontology-grounded GraphRAG hit 98% accuracy versus ChatGPT-4's 37%. That's not a typo. Ontology-grounded hallucination rate: 1.7%. ChatGPT-4 hallucination rate: 63%. A 61-percentage-point improvement, published in the Journal of Biomedical Informatics, using SNOMED CT (the medical ontology standard) as the grounding layer.

The medical domain proved it. The legal domain needs it. And nobody's building it.

That's the gap. That's what HAQQ is walking into.


The Experiment: Poking Around Inside a Legal Ontology via MCP

I want to be upfront about what this was. Not a product review. Not a partnership announcement. This was me connecting to Dynamic Interfaces' MCP server, exploring Sentencia's data structures, analyzing the ontology design, and stress-testing it against everything I know about legal reasoning.

The Model Context Protocol (MCP) -- Anthropic's standard for AI-tool integration, now governed by the Linux Foundation -- was the interface. Every action in the Sentencia ontology is exposed as a callable MCP tool. Any MCP-compatible client can plug in. That alone is interesting, but the real story is what happens when you constrain an AI agent to operate within a well-defined ontology instead of throwing 300 generic tools at it.

The Model Context Protocol (MCP) is an open standard introduced by Anthropic in November 2024 for connecting AI models to external tools and data sources. Now governed by the Linux Foundation's Agentic AI Foundation, MCP defines a JSON-RPC interface through which AI agents can discover and invoke tools. As of early 2026, over 10,000 active public MCP servers are registered, with 97 million monthly SDK downloads. The protocol has been adopted by OpenAI, Google DeepMind, and major enterprise platforms.
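To make the interface concrete: MCP is JSON-RPC 2.0 under the hood, and the two methods that matter here are `tools/list` (tool discovery) and `tools/call` (invocation) -- both part of the MCP spec. A minimal sketch of the wire format, where the tool name `consultar_articulo` and its arguments are hypothetical stand-ins for one of Sentencia's 7 tools:

```python
import json

# Tool discovery: the client asks the server what it can do.
# "tools/list" and "tools/call" are real MCP method names.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Tool invocation. The tool name and argument schema below are
# hypothetical -- illustrative of the shape, not Sentencia's actual API.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "consultar_articulo",            # hypothetical tool name
        "arguments": {"ley": "LFT", "articulo": 48},
    },
}

print(json.dumps(call_request, indent=2))
```

Every tool a server exposes via `tools/list` ships a JSON Schema description that lands in the model's context -- which is exactly where the token economics discussed later in this article come from.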

Dynamic Interfaces' CEO put it plainly: "Ontologies are kind of the secret."

Coming from a guy whose entire platform is built around generating infrastructure from ontology definitions, sure -- that sounds self-serving. But after digging into the data, I think he's right. And I think this is the same insight that made Palantir a $100B+ company.

Turns out there's academic backing for this too. A 2025 paper on tool selection found that reducing tool count tripled accuracy -- from 13.6% to 43.1% -- while cutting prompt tokens by over 50% (RAG-MCP, arXiv:2505.03275). Fewer tools, dramatically better performance. That's exactly what the ontology does: collapses hundreds of granular database operations into a handful of semantically meaningful legal operations.


Why Does Legal AI Fail? The 3 Fatal Flaws of RAG for Law

I'll be blunt. Most legal AI products -- including most of what exists in the MENA market -- are doing RAG over PDFs. They chunk legal documents, embed them in a vector database, and retrieve semantically similar passages when you ask a question. This works for general knowledge queries. It fails catastrophically for law.

RAG-based legal AI fails in three specific ways: temporal blindness (retrieving wrong versions of amended law), structural ignorance (losing legislative hierarchy during chunking), and cross-reference amnesia (inability to follow typed relationships between provisions).

The consequences aren't theoretical. At least 6 attorneys have been sanctioned for filing AI-generated fake case citations since 2023 -- starting with the now-infamous Mata v. Avianca case where ChatGPT fabricated cases and then confirmed they "indeed exist" in Westlaw. In 2025, a large, well-regarded law firm got hit in Johnson v. Dunn. This is an ongoing crisis, not a one-off.

I keep running into the same three failure modes.

Temporal Blindness

RAG systems can't inherently distinguish between the 2022 version and the 2023 version of a legal article. A semantic search for "vacation days Mexico" might pull back the pre-2023 text (6 days minimum) instead of the post-2023 text (12 days minimum). In casual conversation, being off by 2x is embarrassing. In a legal calculation submitted to a court, it invalidates the entire document.

Mexico's 2023 "Vacaciones Dignas" reform doubled vacation entitlements overnight. A perito calculating a wrongful dismissal case spanning that boundary needs both versions -- the old table for pre-2023 years, the new table for post-2023. RAG has no mechanism for this. None.

An EMNLP 2025 survey of RAG-reasoning systems confirmed this as a systemic issue: "temporal blindness" is one of the identified failure modes, alongside "mode-switch fragility" where models actually perform worse with full retrieval sets than without any documents at all.
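The fix for temporal blindness is structurally simple: store each article as a set of versions with validity windows and resolve queries against a date. A minimal sketch, using the Art. 76 LFT vacation figures from the example above (the exact validity dates are illustrative; the 6-day/12-day values and the January 2023 reform boundary are from the text):

```python
from datetime import date

# Versioned article storage: each entry is valid over a date window.
# valid_to=None means "currently in force".
ARTICLE_VERSIONS = {
    ("LFT", 76): [
        {"valid_from": date(1970, 5, 1), "valid_to": date(2022, 12, 31),
         "min_vacation_days": 6},   # pre-reform minimum (year 1)
        {"valid_from": date(2023, 1, 1), "valid_to": None,
         "min_vacation_days": 12},  # "Vacaciones Dignas" reform
    ],
}

def article_as_of(law: str, article: int, when: date) -> dict:
    """Return the version of an article that was in force on a given date."""
    for version in ARTICLE_VERSIONS[(law, article)]:
        end = version["valid_to"] or date.max
        if version["valid_from"] <= when <= end:
            return version
    raise LookupError(f"No version of {law} art. {article} in force on {when}")

# A dismissal case spanning the reform boundary needs BOTH versions:
assert article_as_of("LFT", 76, date(2022, 6, 1))["min_vacation_days"] == 6
assert article_as_of("LFT", 76, date(2024, 6, 1))["min_vacation_days"] == 12
```

A vector index has no equivalent of that `when` parameter -- semantic similarity between the 2022 and 2023 texts is near-identical, which is precisely why RAG retrieves the wrong one.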

Structural Ignorance

Law is hierarchically structured: Constitution, Federal Law, Regulations, Circulars. Within a law: Titles, Chapters, Articles, Fractions, Paragraphs. RAG chunking destroys this hierarchy. When Article 50 says "in the terms of Article 48," a RAG system may retrieve Article 50 without Article 48, producing an incomplete answer. Worse -- it doesn't know what it's missing.

Cross-Reference Amnesia

Legal reasoning is inherently graph-based. A seniority premium calculation under Article 162 of Mexico's Federal Labor Law references UMA values defined in the Constitution. INFONAVIT housing credits are calculated in UMAs per a 2016 reform. The retirement savings law depends on the social security law's contribution definitions. A RAG system has no mechanism to follow these reference chains. It retrieves fragments. An ontology traverses connections.

A legal knowledge graph is a network of legal provisions connected by typed, semantic relationships. Unlike a document index where connections are inferred by keyword similarity, a legal knowledge graph explicitly encodes that Article 50 of Mexico's Federal Labor Law "establishes a formula" referenced by Article 48, which in turn "refers to" Article 84's salary definition. Each edge carries a relationship type -- refers_to, complements, creates_exception, establishes_formula, establishes_procedure, modifies, defines_term -- enabling an AI agent to traverse legal logic deterministically.
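The traversal itself is ordinary graph code -- the value is entirely in the typed edges. A sketch using the relationship vocabulary above, with an illustrative edge set mirroring the Art. 50 → Art. 48 → Art. 84 chain (edge directions and data here are my reconstruction, not Sentencia's actual graph):

```python
from collections import deque

# Typed cross-reference edges: node -> list of (relationship_type, target).
EDGES = {
    ("LFT", 50): [("establece_formula", ("LFT", 48))],
    ("LFT", 48): [("remite_a", ("LFT", 84))],
    ("LFT", 84): [],  # salary definition; no further dependencies
}

def dependency_closure(start):
    """Collect every article a provision depends on by following typed edges."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for _rel_type, target in EDGES.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Answering a question about Art. 50 automatically pulls in Arts. 48 and 84 --
# the fragments a chunk-based retriever can silently drop.
assert dependency_closure(("LFT", 50)) == {("LFT", 50), ("LFT", 48), ("LFT", 84)}
```

Note that the relationship type travels with the edge, so a downstream agent can treat an `excepciona` edge differently from a `remite_a` edge instead of flattening everything to "see also."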

A 2025 paper from arXiv -- "An Ontology-Driven Graph RAG for Legal Norms" (SAT-Graph RAG, arXiv:2505.00039) -- validated exactly this. The researchers found that standard flat-text retrieval is "blind to the hierarchical, diachronic, and causal structure of law, leading to anachronistic and unreliable answers." Their solution -- grounding a knowledge graph in a formal legal ontology with temporal versioning and causal event nodes -- mirrors what Sentencia built. Applied to the Brazilian Constitution, it demonstrated verifiable, temporally-correct answers with drastically reduced factual errors.

I read that paper after my Sentencia deep-dive and got chills. Independent validation from researchers who'd never seen the system.

HTML_BLOCK_0


What Is a Legal Ontology and How Does It Work?

Here's what Sentencia actually looks like under the hood.

A legal ontology is a structured, machine-readable knowledge model that represents a legal domain as a graph of entities (laws, articles, courts, computed values), their hierarchical relationships, and typed cross-references between provisions. Unlike flat-text retrieval systems, a legal ontology preserves the inherent structure of legislation -- hierarchy, versioning, exceptions, and formula chains -- enabling deterministic legal reasoning rather than probabilistic document retrieval.

Scale: the Sentencia legal ontology models 1,689 articles across 4 Mexican federal labor laws, with 11 entity types, 7 judicial precedents, 12 typed cross-references drawn from 7 relationship categories, and 10 enum taxonomies.

The four laws form a closed system around the Mexican employer-worker relationship:

| Law | Abbreviation | Articles | What It Governs |
| --- | --- | --- | --- |
| Ley Federal del Trabajo | LFT | 1,076 | Employment, rights, disputes, procedures |
| Ley del Seguro Social | LSS | 373 | Social security: health, disability, pensions |
| Ley de los Sistemas de Ahorro para el Retiro | LSAR | 146 | Retirement savings, AFORE accounts |
| INFONAVIT Law | INFONAVIT | 94 | National housing fund, mortgage credits |

The hierarchy flows naturally from the law itself. Click any law below to explore how it breaks down:

HTML_BLOCK_1

How Typed Cross-References Change Legal AI Reasoning

But hierarchy alone isn't the interesting part. What makes Sentencia architecturally powerful is the typed cross-reference system. In most legal AI systems, a reference between two articles is just a hyperlink -- "see also Article 48." In Sentencia, it's a typed relationship with semantic meaning:

| Reference Type | Meaning | Why It Matters |
| --- | --- | --- |
| remite_a | Refers to | Basic dependency chain |
| complementa | Complements | Expands scope across laws |
| modifica | Modifies | Tracks reform history |
| excepciona | Creates exception | Prevents over-application of general rules |
| define_termino | Defines term | Links usage to legal definition |
| establece_formula | Establishes formula | Points to calculation methods |
| establece_procedimiento | Establishes procedure | Maps how to exercise a right |

That excepciona type -- that's the one that keeps me up at night. Article 5 of the LFT establishes general labor rights, but special work regimes -- domestic workers, athletes, digital platform workers -- create exceptions. If your system doesn't model exceptions explicitly, it applies general rules where special rules should apply. That's not a rounding error. That's a legally catastrophic mistake that could cost someone their case.

The key insight -- and one that an independent 2025 arXiv paper confirmed -- is that law already has an ontology. It's already structured into hierarchies, cross-referenced with typed relationships, and versioned through reforms. Sentencia doesn't impose structure on unstructured text. It captures the structure that already exists in the law itself.

That was my "oh" moment. We've been trying to make AI understand law by throwing text at it. But law isn't text. Law is a graph.


How Much Does a Legal Ontology Reduce AI Costs?

This is where it goes from "architecturally interesting" to "holy shit, this changes the business model."

A legal ontology reduced AI reasoning costs from $0.60 to $0.02 per message -- a 97% reduction -- by collapsing 300 MCP tool descriptions into 7 ontology-aware tools.

Here's the before/after in a clean comparison:

| Metric | Before Ontology (RAG) | After Ontology | Improvement |
| --- | --- | --- | --- |
| Tools per request | ~300 | 7 | 97.7% fewer |
| Tokens for tool descriptions | ~90,000 | ~2,100 | 97.7% fewer |
| Cost per message | $0.60 | $0.02 | 96.7% cheaper |
| Model requirement | Frontier (GPT-4 class) | Open-source | ~30x cheaper |
| Answer traceability | Probabilistic chunks | Deterministic graph traversal | Full citation chain |

To put this in competitive context:

| Product | Cost per Interaction | Annual Cost (200 queries/mo) |
| --- | --- | --- |
| Harvey AI | $2.40-$6.00 | $14,400+ |
| CoCounsel All Access | $1.00-$2.50 | $6,000 |
| Legora | $1.00-$1.25 | $3,000 |
| Sentencia (ontology) | $0.02 | $48 |

That's not a different price tier. That's a different universe.

HTML_BLOCK_2


Why Do Ontology-Based Legal AI Systems Need Fewer Tools?

Token Economics: Why 300 Tools Cost $0.60 and 7 Tools Cost $0.02

The standard approach to building AI agents is to expose every database operation as a separate tool. Get article by number. Get article by topic. Get articles by law. Get article text. Get article formula. Search articles by keyword. Get article reforms. Get article at date. Multiply that across 11 entity types and you hit 300 tools fast.

Here's the problem most people miss: every one of those tool descriptions gets serialized into the LLM's context window on every single request. A single large MCP server can consume 10,000-17,000+ tokens of context just for tool descriptions. With 300 tools at roughly 300 tokens each, you're burning 90,000 tokens before the user even asks a question.

At Claude Opus pricing ($15 per million input tokens, $75 per million output tokens as of March 2026), those 90,000 tokens cost up to $1.35 per request for tool descriptions alone. Factor in prompt caching discounts and blend in the actual conversation, and you land around $0.60 per message.

Turns out this isn't just a cost problem -- it's an accuracy problem too. A 2025 paper (RAG-MCP, arXiv:2505.03275) found that baseline tool selection accuracy with all tools in context was a dismal 13.62%. By using RAG to retrieve only relevant tools, accuracy jumped to 43.13% -- a 3.17x improvement. Prompt tokens dropped by over 50%. The platforms figured this out empirically: OpenAI hard-caps at 128 tools per agent, and Cursor limits MCP tools to 40, explicitly to prevent "flooding the agent's context window."

The ontology collapses this. Instead of 300 granular database operations, you get 7 semantically meaningful legal operations.

Seven tools. 2,100 tokens for descriptions. $0.02 per message.

I stared at those numbers for a long time. This isn't incremental improvement. This is a different cost universe.

The Model Downgrade Effect: From Frontier to Open-Source

Ontology-based legal AI enables a shift from frontier models costing $15 per million input tokens to open-source models at $0.50 per million, because the simplified 7-tool interface requires less reasoning capability.

When an AI agent only has 7 well-defined tools to choose from -- instead of 300 -- the model doesn't need to be as smart. The ontology does the heavy structural lifting. The model just needs to understand the user's question, pick the right 1-3 tools, and synthesize a response from structured data. That's a much simpler task than reasoning about which of 300 tools to chain together.

Anthropic's own engineering team demonstrated the principle: swapping direct MCP calls for a code-execution approach collapsed a 150,000-token workflow to ~2,000 tokens -- a 98% reduction. The ontology achieves the same compression through domain modeling rather than code generation.

HTML_BLOCK_3

The Perito Salary Math -- This Is Where It Gets Real

The cost math hits different when you look at who actually uses this system. A perito laboral in Mexico City earns roughly 15,000-30,000 MXN per month -- about $830 to $1,670 USD. At $0.60 per message, with 50 cases per month and 20 messages per case, the API cost alone would be $600/month. That's 36-72% of their income.

Completely non-viable. Dead on arrival.

At $0.02 per message, the same usage costs $20/month. That's 1.2-2.4% of income -- less than a Netflix subscription. The cost reduction doesn't just make the product cheaper. It makes an entirely new market possible. These are people who literally couldn't afford legal AI before.
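The affordability math above, spelled out (income figures and usage pattern are the ones from the perito example):

```python
# Usage pattern from the perito example.
cases_per_month = 50
messages_per_case = 20
messages = cases_per_month * messages_per_case   # 1,000 messages/month

cost_rag = messages * 0.60        # $600/month at RAG-era pricing
cost_ontology = messages * 0.02   # $20/month with the ontology

# Perito income range: ~$830-$1,670 USD/month.
income_low, income_high = 830, 1670

rag_share = cost_rag / income_low * 100          # ~72% of income -- non-viable
ontology_share = cost_ontology / income_high * 100  # ~1.2% of income
```

A product that consumes most of your income is not a product; the same product at two cents per message is an impulse purchase.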

And there's a compounding effect here that I find genuinely exciting: lower cost per message means more messages per case. More messages means more thorough analysis. More thorough analysis means better legal documents. Better documents mean more wins, more reputation, more referrals. The cost reduction doesn't just affect price -- it affects quality. It changes the product itself.

Compare that to Harvey's model: $1,200/lawyer/month with a 20-seat minimum means $288,000/year before a single query runs. That's designed for Am Law 100 firms with 1,000+ lawyers. It structurally excludes the solo practitioners, small firms, and expert witnesses who handle the bulk of labor law globally.


Deep Dive: A Real Wrongful Dismissal Calculation -- $254,000 MXN

Let me trace through a real example -- not a theoretical one. This is a complete wrongful dismissal calculation using actual LFT articles, real 2026 wage data, and the full cross-law dependency chain. This is what convinced me the architecture produces better answers than RAG -- not theoretically, but in practice.

The case: Maria Elena Gutierrez, administrative assistant at a manufacturing company in Guadalajara. Monthly salary: $15,000 MXN ($500/day). Employed 5 years, 3 months (January 2021 to April 2026). Terminated without just cause -- employer claims "restructuring" but provides no written notice per Art. 47 LFT.

Here's how the ontology handles it -- and what RAG would miss.

Step 1: Classify the dispute. The system identifies despido injustificado (wrongful dismissal) from the query. This isn't semantic similarity -- it's an exact match against the ArticuloTema enum. No fuzzy retrieval. No "close enough." Critically, Art. 47 LFT requires the employer to deliver written notice stating specific conduct and dates. Failure to provide written notice makes the dismissal unjustified automatically. The ontology knows this because the excepciona relationship is modeled.

Step 2: Check binding precedent. The system retrieves Tesis 2a./J. 53/2017 -- binding jurisprudencia from the SCJN's Second Chamber. This tesis establishes that when an employer denies dismissal and offers reinstatement, the tribunal must evaluate whether the offer is in good or bad faith. If bad faith (e.g., offering reinstatement under worse conditions), the burden of proof stays on the employer. Per Art. 784 LFT, the employer bears the burden of proof for working conditions, attendance, seniority, wages, and more. RAG might retrieve this tesis -- or it might not. The ontology always does, because it's linked to tema: despido with a typed interprets relationship.

Step 3: Follow the formula chain and compute everything.

Here's the full calculation -- every line traceable to a specific article:

A. Indemnification (Liquidacion)

| Concept | Legal Basis | Calculation | Amount (MXN) |
| --- | --- | --- | --- |
| Indemnizacion constitucional | Art. 48 LFT | 90 days x $500/day | $45,000.00 |
| 20 dias por ano de servicio | Art. 50 LFT | 20 x 5.25 years x $500 | $52,500.00 |
| Prima de antiguedad | Art. 162 LFT | 12 x 5.25 years x $500 | $31,500.00 |
| Subtotal | | | $129,000.00 |

B. Settlement of Pending Benefits (Finiquito)

| Concept | Legal Basis | Calculation | Amount (MXN) |
| --- | --- | --- | --- |
| Proportional aguinaldo | Art. 87 LFT | 15 days x (3/12) x $500 | $1,875.00 |
| Proportional vacation | Art. 76 LFT (post-2023) | 20 days x (3/12) x $500 | $2,500.00 |
| Prima vacacional | Art. 80 LFT | 25% of $2,500 | $625.00 |
| Subtotal | | | $5,000.00 |

C. Back Wages (Salarios Vencidos)

| Concept | Legal Basis | Calculation | Amount (MXN) |
| --- | --- | --- | --- |
| Salarios vencidos (8 months; capped at 12 months max) | Art. 48 LFT | 8 months x $15,000 | $120,000.00 |

D. Cross-Law Verification

| Check | Legal Basis | What It Covers |
| --- | --- | --- |
| IMSS contribution audit | Arts. 27-28 LSS | Was salary correctly registered? SBC cap = 25 UMAs = $2,932.75/day |
| INFONAVIT contributions | Art. 29 Ley INFONAVIT | 5% employer contribution to housing subaccount |
| AFORE impact | LSAR | Unemployment withdrawal rights after 46 days |

E. Grand Total

| Component | Amount (MXN) |
| --- | --- |
| Indemnizacion | $129,000.00 |
| Finiquito | $5,000.00 |
| Salarios vencidos (8 months) | $120,000.00 |
| GRAND TOTAL | $254,000.00 |

That's $254,000 MXN -- roughly $14,100 USD -- computed deterministically from 8 articles across 4 laws. Every number traces to a specific article, in a specific version, via a named cross-reference.
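Because every input is structured data, the whole calculation reduces to deterministic arithmetic. A sketch reproducing the worked example above (formulas and figures come straight from the tables; this is illustrative, not legal software):

```python
# Case facts from the Maria Elena example.
daily_wage = 500.0        # MXN/day ($15,000/month)
years = 5.25              # seniority: 5 years, 3 months
months_into_year = 3      # for proportional benefits
monthly_salary = 15_000.0
back_wage_months = 8      # Art. 48 caps salarios vencidos at 12 months

# A. Indemnification (liquidacion)
indemnizacion = 90 * daily_wage               # Art. 48:  45,000
veinte_dias = 20 * years * daily_wage         # Art. 50:  52,500
prima_antiguedad = 12 * years * daily_wage    # Art. 162: 31,500
liquidacion = indemnizacion + veinte_dias + prima_antiguedad

# B. Pending benefits (finiquito)
aguinaldo = 15 * (months_into_year / 12) * daily_wage   # Art. 87: 1,875
vacaciones = 20 * (months_into_year / 12) * daily_wage  # Art. 76: 2,500
prima_vacacional = 0.25 * vacaciones                    # Art. 80:   625
finiquito = aguinaldo + vacaciones + prima_vacacional

# C. Back wages, capped at 12 months
salarios_vencidos = min(back_wage_months, 12) * monthly_salary  # 120,000

total = liquidacion + finiquito + salarios_vencidos
assert total == 254_000.0
```

Swap in different case facts and every downstream number updates with a citation attached -- no generation step in the loop that could invent a figure.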

A RAG system asked this question would retrieve some of these articles, miss others, and hallucinate the amounts it couldn't find. Turns out Stanford quantified exactly how often: 17-33% of the time. On a $254,000 calculation, a 17% error rate means the answer could be off by $43,000. That's not an approximation. That's malpractice.

The ontology computes this deterministically. Every time. Zero hallucination on the numbers -- because the numbers come from structured data, not generated text.

HTML_BLOCK_4


The UMA Trap: A $180,000 Mistake Hiding in Plain Sight

This deserves its own section because it's the single most common error in Mexican labor calculations -- and a perfect illustration of why structured knowledge representation matters.

Since Mexico's 2016 constitutional reform, two reference units coexist: the UMA (Unidad de Medida y Actualizacion) and the Salario Minimo (minimum wage). They sound similar. They are not. And the gap between them has been widening every year:

| Year | Salario Minimo (daily) | UMA (daily) | Gap |
| --- | --- | --- | --- |
| 2017 | $80.04 | $75.49 | 1.06x |
| 2020 | $123.22 | $86.88 | 1.42x |
| 2023 | $207.44 | $103.74 | 2.00x |
| 2025 | $278.80 | $113.14 | 2.46x |
| 2026 | $315.04 | $117.31 | 2.69x |

In 2026, the minimum wage is 2.69 times the UMA. Using the wrong one doesn't produce a rounding error. It produces a legally invalid calculation.

Here's where it gets treacherous. Different legal provisions reference different units.

If you calculate the prima de antiguedad cap using UMA instead of Salario Minimo, you get $234.62/day instead of $630.08/day -- shortchanging the worker by 62.8%. If you calculate an IMSS contribution cap using Salario Minimo instead of UMA, you overpay by 169%.

An ontology encodes which reference unit applies to which legal concept. It's not ambiguous. It's not up for interpretation. The define_termino cross-reference links each provision to the correct unit. A RAG system retrieves text about both UMA and Salario Minimo and leaves it to the LLM to figure out which one applies. That's where the hallucination lives.
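Encoding the unit binding is a one-line lookup once it lives in structured data. A sketch of the idea, using the 2026 values and the two caps from the figures above (the dict names are mine; the prima de antiguedad cap is 2x the minimum wage and the IMSS SBC cap is 25 UMAs per Arts. 27-28 LSS, as stated earlier):

```python
# 2026 daily reference values from the table above.
UNITS_2026 = {"salario_minimo": 315.04, "uma": 117.31}

# Each legal concept is bound to its correct unit and multiplier --
# the define_termino idea as a plain mapping (names are illustrative).
CONCEPT_UNIT = {
    "prima_antiguedad_cap": ("salario_minimo", 2),  # 2x minimum wage
    "imss_sbc_cap": ("uma", 25),                    # 25 UMAs (Arts. 27-28 LSS)
}

def daily_cap(concept: str) -> float:
    """Compute a daily cap from the structurally-bound reference unit."""
    unit, multiple = CONCEPT_UNIT[concept]
    return round(multiple * UNITS_2026[unit], 2)

assert daily_cap("prima_antiguedad_cap") == 630.08  # not 234.62 (the UMA mistake)
assert daily_cap("imss_sbc_cap") == 2932.75
```

There is no prompt, no retrieval, and no generation in that lookup -- which is exactly why the wrong-unit error can't happen.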

This is the kind of mistake that peritos laborales -- the expert witnesses who prepare dictamenes periciales for courts -- flag as the #1 most common error since 2017. An ontology eliminates it structurally.


Can a Legal Ontology Work for Any Jurisdiction?

This is the question I kept circling back to. Sentencia works for Mexican labor law. Does it generalize?

I looked at six jurisdictions. The answer is yes -- with caveats. Every legal system shares five structural primitives: hierarchical legislation, semantic categorization, cross-reference graphs, temporal versioning, and computed domain values -- making the ontology pattern universally replicable. The pattern is universal. The enums are jurisdiction-specific.

The pattern maps most cleanly onto civil law systems -- France, Brazil, Saudi Arabia, UAE mainland -- because they share codified, hierarchical structures nearly identical to Mexican law. Common law systems (US, UK) need an adaptation layer that elevates case law to first-class status alongside statutes. Sharia-influenced systems (Saudi Arabia, UAE) need a third dimension: the religious/jurisprudential source hierarchy.

Having lived in both Paris and Dubai, I can tell you -- I've seen these legal systems from the inside, both as a founder and as someone who's had to deal with employment law across jurisdictions. The structural similarities are real.

Brazil's CLT (Consolidacao das Leis do Trabalho) would be the easiest port -- structurally almost identical to Mexico's LFT. France is medium complexity because of its dual legislative/regulatory track. The US is the hardest because of federal/state duality and the centrality of binding case law. Saudi Arabia and the UAE sit in the middle, with added complexity from bilingual requirements and, in the UAE's case, a triple-jurisdiction model (Federal + DIFC + ADGM).

But here's what matters: the adaptation is in the enums, not the architecture. The entity model -- laws, structural units, articles, cross-references, case law, computed values -- is universal. You rename entities, adjust taxonomies, and add jurisdiction-specific extensions. The cross-reference type system generalizes directly, with two additions (interprets for common law case-to-statute links, overrides for hierarchy conflicts). The temporal modeling generalizes completely.
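In code terms, "adaptation is in the enums" means the jurisdiction is a configuration object over a shared entity model. A sketch of how I imagine that split (structure names are hypothetical; the base relationship types are Sentencia's seven, and `interprets`/`overrides` are the two common-law additions mentioned above):

```python
# The seven typed cross-references shared by every jurisdiction.
BASE_REFERENCE_TYPES = {
    "remite_a", "complementa", "modifica", "excepciona",
    "define_termino", "establece_formula", "establece_procedimiento",
}

# Per-jurisdiction configuration: same architecture, different enums.
JURISDICTIONS = {
    "MX": {
        "reference_types": BASE_REFERENCE_TYPES,
        "languages": ["es"],
    },
    "UAE": {
        # Common-law free zones (DIFC/ADGM) need case-to-statute links
        # and hierarchy-conflict resolution.
        "reference_types": BASE_REFERENCE_TYPES | {"interprets", "overrides"},
        "languages": ["ar", "en"],  # bilingual requirement
    },
}

# Every jurisdiction speaks at least the shared relationship vocabulary.
assert all(BASE_REFERENCE_TYPES <= j["reference_types"]
           for j in JURISDICTIONS.values())
```

The traversal, versioning, and computation code never changes between jurisdictions -- only these tables do. That's the whole porting story.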

HTML_BLOCK_5


What This Means for HAQQ -- and the $1.2B MENA Opportunity

I've been building legal AI for the MENA region for the past two years. After pulling Sentencia apart, I can say with confidence: nobody in MENA is doing this. Not even close.

The competitive landscape in MENA legal AI is dominated by RAG-over-PDFs. Al Tamimi partnered with Harvey. Legora launched Arabic support in January 2026. There are 185+ legal tech companies in the region. But not one of them -- as far as I can find -- has built a Sentencia-style structured ontology with typed cross-reference graphs and computed value tables for MENA labor law.

No Arabic legal ontology exists. Not for labor law, not for commercial law, not for any domain. Zero. That's not a gap -- that's a vacuum.

That's our opening. And the market is real.

The Numbers Behind MENA Legal Tech

| Metric | Value |
| --- | --- |
| GCC Legal Technology Market | $1.2B |
| UAE Legal Tech (2023) | $114.5M |
| UAE Legal Tech (projected 2030) | $234.4M (CAGR 10.8%) |
| UAE Digital Justice Budget | AED 2.1B ($572M) |
| MEA Legal AI CAGR (2025-2030) | 18% |

The UAE government alone has allocated $572 million for digital transformation across the justice sector. Saudi Arabia's Vision 2030 includes specific legal technology modernization provisions. This isn't speculative -- the budgets are allocated, the mandates are issued.

The Only Competitor

The closest MENA-native competitor is Qanooni -- a Dubai-based AI legal drafting tool with a $2M pre-seed from Village Global. They're generic. No ontology. No knowledge graph. No structured legal reasoning. They're a wrapper around an LLM with a nice Outlook/Word integration.

That's it. $2M pre-seed, generic RAG. In a $1.2B market.

Why MENA Is a Perfect Fit for This Approach

The UAE's Federal Decree-Law No. 33 of 2021 is roughly 65 articles plus implementing regulations. Saudi Arabia's labor law is about 245 articles. These are manageable corpora -- small enough to build a complete, human-validated ontology in weeks, not months.

UAE Decree-Law No. 33 is actually the ideal starting corpus for a legal ontology. It's modern (2021, amended 2024), well-structured, introduces six flexible employment models (remote, part-time, temporary) with different rules for each, and the temporal versioning challenge is already present thanks to the 2024 amendments. Perfect for proving the ontology pattern works.

The GCC also has high-value, high-frequency computed values that are perfectly suited to structured modeling: end-of-service gratuity (EOSG) formulas differ between UAE and Saudi Arabia. Emiratisation and Saudization quotas change by sector and company size. WPS (Wage Protection System) compliance thresholds matter for every employer. These aren't questions you want an LLM to hallucinate answers to. They need to come from structured, validated data.

Arabic-first is our advantage. Per ArabLegalEval benchmarks, Arabic legal NLP lacks the benchmarking frameworks available for English. RAG approaches that work reasonably well in English degrade significantly in Arabic due to rich morphology, orthographic ambiguity, and a shortage of annotated legal datasets. An ontology sidesteps this entirely -- it provides structured data rather than relying on NLP extraction from Arabic text. The LLM does synthesis, not extraction. That's a much simpler task and one where current models are already good.

And we already have the distribution channel. HAQQ runs on WhatsApp. Just like Sentencia delivers through Hiku on WhatsApp in Mexico, we deliver through WhatsApp in the GCC. The 7-tool MCP pattern means our WhatsApp interactions cost two cents each, not sixty. In a region where short, conversational message patterns dominate, that difference is -- no exaggeration -- the difference between a viable product and a money pit.

Our Implementation Sequence

Phase 1 (Month 1-2): UAE Federal Labor Law. 65 articles + implementing regulations. EOSG calculator, leave tables, WPS thresholds. Arabic + English bilingual from day one. This is our primary market and the smallest corpus -- highest ROI.

Phase 2 (Month 2-3): Saudi Labor Law. 245 articles + 2024/2025 amendments. Different EOSG formula, Nitaqat quotas. Same language, similar structure -- we leverage everything from Phase 1.

Phase 3 (Month 3-4): DIFC + ADGM. Common law overlay for free zone case law. Completes the UAE picture for international firms operating across all three jurisdictions.

Phase 4 (Month 4-6): GCC expansion. Egypt, Bahrain, Kuwait, Qatar, Oman. Each new jurisdiction gets faster as the framework matures.

I'm still figuring out the exact timeline -- these always slip, any founder will tell you that. But the sequence is right.


Machine-Readable Law: We're Not Alone

One thing that gave me confidence during this research -- we're not inventing the idea that law should be machine-readable. Serious institutions have been working on this for years.

Singapore announced SOLID (Singapore Open Legal Informatics Database) in November 2025 -- a partnership between SMU's Centre for Digital Law and the Ministry of Law to build machine-readable datasets of court decisions, statutes, and legal scholarship, with a public API. Full launch expected Q1 2028.

The EU has been running the European Legislation Identifier (ELI) since 2012, now implemented by 21+ countries. Every EU legal text gets a unique URI, standardized metadata, and machine-readable format in RDFa or JSON-LD. Italy built an entire legislative knowledge graph on Akoma Ntoso, the UN's XML standard for legislative documents.

New Zealand's "Better Rules" initiative, running since 2018, goes furthest -- developing legislation simultaneously in plain language, rule statements, and code. Estonia's CIO called it "the most transformative idea" from international digital government summits.

The pattern is clear: the governments that are investing in computational law now will have the infrastructure for legal AI later. The ones that aren't... will be buying expensive RAG wrappers from Silicon Valley.

MENA is at a crossroads. The UAE's $572M digital justice budget suggests they're ready. The question is whether the legal AI they adopt will be architecturally sound or just another hallucination machine with a nice interface.


How to Build a Legal Ontology: A 7-Step Playbook

For anyone who wants to replicate this pattern -- whether for labor law, tax law, regulatory compliance, or any domain where structured knowledge matters -- here's what I've distilled into a practical process.

For a focused legal domain like a single country's labor law (65-245 articles), a complete ontology can be built in 4-8 weeks at a parsing cost of $5-20 per law.

Step 1: Identify Source Laws and Official Repositories. Find the 3-5 primary statutes, locate the official digital repository (government gazette, law portal), and assess machine-readability. HTML is best, clean PDF needs parsing, scanned PDF needs OCR. For MENA, most government publications are clean PDF -- LlamaParse handles Arabic well at 93-95% accuracy on legal documents ($0.003 per page).

Step 2: Design the Entity Model. Start from the universal template (Law, StructuralUnit, Article, CaseLaw, CrossReference, ComputedValue) and add jurisdiction-specific entities. For GCC: EndOfServiceGratuity, NationalizationQuota, WPSCompliance, ShariaReference.
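Here's what that universal template looks like as types. A minimal sketch -- the field names are my own illustrative assumptions, not Sentencia's actual schema:

```typescript
// Sketch of the universal entity model: four core entities.
// Jurisdiction-specific entities (EndOfServiceGratuity, WPSCompliance, ...)
// would extend this base.

interface Law {
  id: string;
  jurisdiction: string;   // e.g. "AE" for the UAE
  title: string;
  enactedOn: string;      // ISO date
  amendedOn?: string;     // hook for temporal versioning
}

interface Article {
  id: string;
  lawId: string;
  number: string;         // "Article 50" -- kept as text; numbering schemes vary
  themes: string[];       // semantic tags from the Step 3 taxonomy
  validFrom: string;
  validTo?: string;       // undefined = currently in force
  text: string;
}

interface CrossReference {
  fromArticleId: string;
  toArticleId: string;
  type: "cites" | "amends" | "overrides" | "interprets" | "implements";
}

interface ComputedValue {
  id: string;
  lawId: string;
  kind: string;           // e.g. "eosg_rate", "leave_days", "wps_threshold"
  params: Record<string, number | string>;
}
```

Notice that versioning lives on the Article (`validFrom`/`validTo`), not just the Law -- individual articles get amended while the rest of the statute stands.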

Step 3: Define Enum Taxonomies. This is the highest-value design step -- don't rush it. Define 15-25 semantic theme tags (wages, termination, contracts, leave, safety, discrimination, plus jurisdiction-specific themes like emiratisation or sharia_compliance). Define cross-reference types (start with Sentencia's 7, add interprets, overrides, implements as needed). Define computed value types. Every article should map to at least one theme.

Step 4: Parse and Ingest Laws. Run source documents through a parser (LlamaParse for PDF, DOM parsing for HTML), extract the hierarchy (titles, chapters, articles), run semantic tagging via LLM, extract cross-references via regex + LLM, and load into Supabase. Expect $5-20 per law for initial parsing -- a one-time cost that pays back immediately.
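The regex half of the cross-reference extraction is the easy, deterministic part. A sketch of the English-pattern case (the patterns are illustrative; an Arabic corpus would need its own patterns for "المادة" and Arabic-Indic digits, which is exactly the kind of thing the LLM pass backstops):

```typescript
// Sketch: extract explicit "Article N" citations from statute text.
// Regex catches the unambiguous cases; an LLM pass handles phrasing
// like "the preceding article" that no regex will.

const CITATION = /\b(?:Article|Art\.)\s+(\d+)\b/gi;

function extractCitedArticles(text: string): number[] {
  return [...text.matchAll(CITATION)].map((m) => Number(m[1]));
}
```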

Step 5: Build the Cross-Reference Graph. Extract explicit references ("Article X", "pursuant to Section Y"), detect implicit thematic connections via LLM, link inter-law references, and connect case law to statutes. Type every edge. Enforce acyclicity on amends and overrides relationships.
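Enforcing acyclicity is ordinary cycle detection: a law can't amend something that (transitively) amends it back. A minimal DFS sketch over the amends/overrides edges:

```typescript
// Sketch: reject cross-reference graphs where amends/overrides edges
// form a cycle. Standard three-color DFS; node ids are illustrative.

type Edge = { from: string; to: string };

function hasCycle(edges: Edge[]): boolean {
  const adj = new Map<string, string[]>();
  for (const { from, to } of edges) {
    if (!adj.has(from)) adj.set(from, []);
    adj.get(from)!.push(to);
  }
  const WHITE = 0, GRAY = 1, BLACK = 2; // unvisited / in progress / done
  const color = new Map<string, number>();
  const visit = (node: string): boolean => {
    color.set(node, GRAY);
    for (const next of adj.get(node) ?? []) {
      const c = color.get(next) ?? WHITE;
      if (c === GRAY) return true; // back edge: cycle found
      if (c === WHITE && visit(next)) return true;
    }
    color.set(node, BLACK);
    return false;
  };
  for (const node of adj.keys()) {
    if ((color.get(node) ?? WHITE) === WHITE && visit(node)) return true;
  }
  return false;
}
```

Run it on the amends and overrides subgraphs only -- cites edges are legitimately circular (two articles can cite each other).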

Step 6: Generate SDK and MCP Tools. Expose 7 tools: search articles, get article, get cross-references, get case law, compute value, get law timeline, compare jurisdictions. Use the TypeScript MCP SDK. Each tool queries Supabase and returns structured JSON.
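For orientation, here are those seven tools sketched as plain tool-definition records, in roughly the shape an MCP tool listing takes (name plus input parameters). The names and parameter shapes are my assumptions from the list above, not the actual Sentencia tool set -- in production each would carry a JSON-Schema `inputSchema` and a handler that queries Supabase:

```typescript
// Sketch: the 7-tool surface as data. This entire definition fits in a
// few hundred tokens of LLM context -- the whole point of the pattern.

const TOOLS = [
  { name: "search_articles",       input: { query: "string", themes: "string[]?" } },
  { name: "get_article",           input: { articleId: "string", asOf: "date?" } },
  { name: "get_cross_references",  input: { articleId: "string", type: "string?" } },
  { name: "get_case_law",          input: { articleId: "string" } },
  { name: "compute_value",         input: { kind: "string", params: "object" } },
  { name: "get_law_timeline",      input: { lawId: "string" } },
  { name: "compare_jurisdictions", input: { topic: "string", jurisdictions: "string[]" } },
] as const;
```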

Step 7: Connect to Messaging. User sends a question via WhatsApp. LLM receives the message plus 7 tool definitions. LLM calls 1-3 tools to retrieve structured data. LLM synthesizes a response grounded in ontology data. Response sent with source citations. Total cost: $0.02.

That's it. No magic. Just modeling the domain correctly.


What's Next

Here's what we're doing at HAQQ in the next 90 days.

Connecting to Dynamic Interfaces' MCP. We're exploring integration with their platform for ontology definition and SDK generation. Building the infrastructure layer from scratch is expensive and slow -- I know because I've been doing it. Using a platform that generates database tables, MCP actions, and typed SDKs from an ontology definition could cut months off our timeline.

Building a UAE Labor Law Ontology. Federal Decree-Law No. 33 of 2021 is our first target. We're mapping its 65 articles into the universal entity model, defining Arabic-English bilingual entities, building EOSG and leave calculators as computed value tables, and extracting cross-references between the law and its implementing regulations.

Testing with Real Legal Workflows. We're working with practitioners in the UAE to validate the ontology against actual legal questions -- end-of-service calculations, termination procedures, Emiratisation compliance. The goal isn't theoretical correctness. It's practical utility: does the ontology produce answers that a lawyer would sign off on? I don't know yet. That's the honest answer. But the architecture gives us the right foundation to find out.

Open-Sourcing the Universal Entity Model. The cross-jurisdiction entity mapping and the universal legal ontology pattern should not be proprietary. We plan to publish the base schema, the cross-reference type taxonomy, and the computed value type system as an open standard. The value is in the jurisdiction-specific data, not the framework.


I went into this expecting a cool demo and some ideas for the backlog. What I got was the architectural pattern I think will define the next wave of legal AI -- not just for MENA, but globally.

The insight is almost embarrassingly simple: model law as what it actually is. A structured, versioned, cross-referenced knowledge system. Stop treating it like a pile of PDFs to search through.

Every major legal AI company -- Harvey at $11B, CoCounsel with a million users, Legora at $5.5B -- is built on RAG. Stanford measured hallucination rates of 17-33%. Meanwhile, in medicine, ontology-grounded systems hit 98% accuracy. The architecture exists. The academic validation exists. The market exists.

We've been building legal AI wrong. Not morally wrong -- architecturally wrong. And now I can see the better path.

The ontology is the secret. We're building ours.


Questions I Keep Getting

Since I started talking about this publicly, the same questions keep coming up. Here are the honest answers.

What is a legal ontology, exactly?

A legal ontology is a structured, machine-readable model of a legal domain. Think of it as a knowledge graph specifically designed for law -- it defines entities (laws, articles, courts, computed values like wage tables), maps their hierarchical relationships, tracks typed cross-references between provisions, and handles temporal versioning for reforms. The key difference from RAG-based systems: instead of treating law as flat text to search through, a legal ontology captures the actual structure that legislation already has. Hierarchy, versioning, exceptions, formula chains -- all of it, explicitly modeled.

How is ontology-based legal AI different from RAG?

RAG chunks legal documents and retrieves semantically similar passages. An ontology models law as a connected knowledge graph with typed relationships. In practice, this means RAG loses three things that matter enormously in legal reasoning: hierarchy (is this article still inside the chapter it references?), temporal versioning (is this the pre-reform or post-reform version?), and cross-reference chains (does Article 50 depend on Article 84 which depends on a UMA definition?). An ontology preserves all three. The result is deterministic legal reasoning instead of probabilistic retrieval. That's not a subtle distinction -- it's the difference between "probably right" and "cite-ably right."

How does this compare to Harvey AI?

Harvey is an $11B company charging ~$1,200/lawyer/month (list price) with a 20-seat minimum. They use fine-tuned LLMs with RAG over legal databases -- no formal ontology, no structured knowledge graph, no typed cross-references. They claim 91% accuracy on their own BigLaw Bench, which still means a 9% error rate. Sentencia's ontology approach costs $0.02/message with deterministic calculations that don't hallucinate numbers. The architectures are fundamentally different: Harvey makes the model smarter; the ontology approach makes the model's job simpler. Both have a place, but for structured legal calculations -- wrongful dismissal, end-of-service gratuity, compliance thresholds -- the ontology produces verifiably correct results where RAG produces probabilistically approximate ones.

What about hallucination? Is this really a problem?

It's not just a problem -- it's quantified. Stanford researchers (Magesh et al., published in the Journal of Empirical Legal Studies, 2025) ran the first preregistered empirical evaluation of production legal RAG tools. They found hallucination rates of 17-33% across Westlaw AI-Assisted Research, Lexis+ AI, and Ask Practical Law AI. These aren't prototype tools -- these are the products lawyers are paying thousands of dollars per month to use. The researchers concluded that hallucinations remain "substantial, wide-ranging, and potentially insidious" and that claims of "hallucination-free" are demonstrably overstated. Meanwhile, in clinical medicine, ontology-grounded GraphRAG achieved 98% accuracy versus ChatGPT-4's 37% -- published in the Journal of Biomedical Informatics. The evidence is clear: structured knowledge representation dramatically reduces hallucination.

How much does it actually save?

In the Sentencia case study, per-message costs dropped from $0.60 to $0.02 -- that's 97%. The savings come from two places: collapsing 300 MCP tool descriptions into 7 (which alone saves ~87,900 tokens per request) and enabling the switch from frontier models to open-source ones. When the model only needs to pick from 7 well-defined tools instead of 300, you don't need Claude Opus. A smaller, cheaper model handles it fine.
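The ~87,900-token figure is easy to sanity-check, assuming an average tool description of ~300 tokens (my assumption -- real descriptions vary in length):

```typescript
// Back-of-envelope: token overhead of tool definitions per request,
// before and after collapsing 300 tools into 7.
const TOKENS_PER_TOOL = 300; // assumed average description size

const before = 300 * TOKENS_PER_TOOL; // 90,000 tokens of tool definitions
const after = 7 * TOKENS_PER_TOOL;    //  2,100 tokens
const saved = before - after;         // 87,900 -- matches the article's figure
```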

What is MCP and why does it matter for legal AI?

MCP is Anthropic's open standard for connecting AI models to external tools and data sources, now governed by the Linux Foundation with support from OpenAI, Google, Microsoft, and AWS. As of 2026, there are 10,000+ active public MCP servers and 97 million monthly SDK downloads. In the context of legal AI, MCP tools expose ontology operations -- search articles, compute settlements, traverse cross-references -- as callable functions. Any MCP-compatible client can invoke them. The reason it matters: it means you can build your ontology once and make it accessible from any AI interface. WhatsApp, Slack, a web app, whatever. The ontology is the backend; MCP is the API layer.

Can this work for common law systems like the US or UK?

Yes, but with an adaptation layer. Civil law systems (Mexico, France, Brazil, UAE, Saudi Arabia) map directly because they share codified, hierarchical structures. Common law systems need to elevate case law to first-class status alongside statutes -- you add an interprets cross-reference type for case-to-statute links and an overrides type for hierarchy conflicts. It's more work, but the fundamental pattern holds. The five structural primitives (hierarchy, categorization, cross-references, versioning, computed values) exist in every legal system.

How long does it take to build one?

For a focused domain like a single country's labor law: 4-8 weeks for a small corpus (65-245 articles). That includes source parsing, entity modeling, cross-reference extraction, and MCP tool generation. Parsing costs run $5-20 per law using tools like LlamaParse. The universal entity model template we're building accelerates subsequent jurisdictions -- once you've done it for UAE labor law, Saudi labor law goes faster because the framework is already there.

What is ontology-driven development?

It's an approach where you define the domain model (the ontology) first, and then all infrastructure -- database tables, API endpoints, AI tool definitions, typed SDKs -- gets generated from that definition. Dynamic Interfaces built their entire platform around this idea. The benefit: type-level consistency across the whole stack. When you change an entity in the ontology, the database schema, the API, and the MCP tools all update in sync. It dramatically reduces the number of tools an AI agent needs because the ontology encodes the relationships that would otherwise require dozens of separate database queries.


About the Author

Stephane Boghossian is the CEO and co-founder of HAQQ Legal AI, building AI-powered legal tools for the MENA region. A serial founder with a focus on open-source models, Stephane splits his time between Paris and Dubai and writes about legal technology, AI architecture, and building in emerging markets. Connect with him on LinkedIn or reach out via HAQQ's WhatsApp.


References

Academic Papers:

Legal Sources (Mexico):

Industry:

Published on HAQQ.ai Blog · Built with 11 AI agents · haqq.ai