Edtek: how a small team built a hallucination-free legal AI that beats the giants

Over the last year, I have been spending a serious share of my time on a project that is, finally, beginning to grow up: Edtek. It started as a focused bet — could a small team with the right background build AI products that actually work in domains where most AI tools fail? — and turned into a small but real business with paying clients, named case studies, and a steady stream of expansion requests.

I want to write a short note on why I think it is working.

The “everyone does RAG” problem

Pick any AI conference talk from the last two years and you will hear that retrieval-augmented generation (RAG) is a commodity. Throw your documents into a vector database, plug in an LLM, ship. Three months and you have a chatbot.

This is true and it is also wrong. It is true because the basic recipe really is that simple. It is wrong because the basic recipe produces a chatbot that demos well and works badly: the kind of system that gives confident answers about clauses that do not exist, mis-cites case law, or quietly invents a regulation. In legal contexts, this is not a quirk. It is a liability.

What separates a good RAG from a bad one is not the part that is on every diagram in every blog post. It is the boring stuff: how you chunk, how you re-rank, how you evaluate, how you handle conflicts between sources, how you decide when not to answer, how you cite, how you measure regressions when you change a single prompt. Each of these is small. Together they are the entire product.
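To make the last item on that list concrete, here is a minimal sketch of a regression check in Python. It is an illustration, not our production harness: the names GoldenCase, Answer, and regressions are made up for this note, and the two toy pipelines stand in for "before the prompt change" and "after the prompt change". The idea is to keep a fixed set of questions with known-good outcomes, including questions the system should decline, and list every case that used to pass and now fails.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class GoldenCase:
    question: str
    expected_page: Optional[int]  # None means the system should decline to answer


@dataclass(frozen=True)
class Answer:
    text: str
    cited_page: Optional[int]  # None means the system declined


Pipeline = Callable[[str], Answer]


def passes(case: GoldenCase, answer: Answer) -> bool:
    # A case passes when the cited page (or the refusal) matches expectations.
    return answer.cited_page == case.expected_page


def regressions(cases: List[GoldenCase], old: Pipeline, new: Pipeline) -> List[str]:
    # Questions the old pipeline got right and the new pipeline gets wrong.
    return [
        c.question
        for c in cases
        if passes(c, old(c.question)) and not passes(c, new(c.question))
    ]


# Toy pipelines (hypothetical): the "new" one starts inventing an answer
# for a question the source does not cover.
def old_pipeline(q: str) -> Answer:
    return Answer("See page 12.", 12) if q == "extension" else Answer("Not covered.", None)


def new_pipeline(q: str) -> Answer:
    return Answer("See page 12.", 12) if q == "extension" else Answer("The fee is fixed.", 3)


cases = [GoldenCase("extension", 12), GoldenCase("fee", None)]
broken = regressions(cases, old_pipeline, new_pipeline)
```

In this toy run, broken contains only "fee": the new pipeline still handles the first question but has started answering one it should decline. A real harness would compare more than the cited page, but a non-empty list after a one-line prompt change is the signal that matters.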

This is exactly the kind of thing where our backgrounds happen to be a good fit. My co-founders and I come from two worlds that are usually kept apart: serious scientific research (astronomy, machine learning on real data, papers in real journals) and twenty years of building production software. The first taught us to evaluate things properly. The second taught us to ship them. In a market full of demos, this combination is surprisingly rare and surprisingly valuable.

The AAA case

Our most visible client is the American Arbitration Association — one of the oldest and most authoritative arbitration institutions in the world. We built the AAAi Chat Book for them: a conversational AI on top of their official case-preparation handbooks. Practitioners can now ask questions about AAA arbitration procedures in plain English and get source-verified answers with page-level citations back into the official documents.

This is the kind of project where “the basics work most of the time” is not good enough. Arbitrators, counsel, and parties are about to make decisions on the basis of what the system says. If the model invents a rule, somebody’s hearing goes sideways. If the citation points to the wrong page, the credibility of the entire tool collapses. So we did not ship the basic recipe. We built a multi-layer verification architecture — semantic retrieval with multi-criteria re-ranking, combined with both classical and AI-driven source validation. When the source material does not contain the answer, the system says so, out loud. No fabrication, no smooth-talking.
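The "says so, out loud" behaviour is easier to see in code than in prose. What follows is a deliberately tiny sketch, assuming a lot: plain word overlap stands in for semantic retrieval and re-ranking, the threshold of 0.5 is arbitrary, the two passages are invented sample text rather than anything from a real handbook, and the names Passage, Result, and answer are hypothetical. It shows only the shape of the decision: answer with a page citation when the best passage clears the bar, otherwise decline.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Passage:
    page: int
    text: str


@dataclass(frozen=True)
class Result:
    answered: bool
    text: str
    page: Optional[int] = None


NOT_FOUND = "The source material does not cover this question."


def tokens(s: str) -> Set[str]:
    # Lowercased alphanumeric words; a crude stand-in for real text processing.
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def overlap(question: str, passage: Passage) -> float:
    # Fraction of question words that also appear in the passage.
    q = tokens(question)
    if not q:
        return 0.0
    return len(q & tokens(passage.text)) / len(q)


def answer(question: str, passages: List[Passage], min_score: float = 0.5) -> Result:
    if not passages:
        return Result(False, NOT_FOUND)
    best = max(passages, key=lambda p: overlap(question, p))
    if overlap(question, best) < min_score:
        # Weak evidence: decline instead of producing a fluent guess.
        return Result(False, NOT_FOUND)
    return Result(True, best.text, best.page)


# Invented sample passages, for illustration only.
handbook = [
    Passage(12, "A party may request an extension of time to file the answering statement."),
    Passage(40, "The arbitrator shall issue the award in writing."),
]

covered = answer("How do I request an extension of time?", handbook)
uncovered = answer("What is the filing fee for appeals?", handbook)
```

Here covered comes back with a citation to page 12, while uncovered is a refusal with no page at all. Everything difficult lives in what replaces overlap and the single threshold, but the contract stays the same: no passage that clears the bar, no answer.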

The result is not magical. It is just good. And in a market saturated with confident-sounding nonsense, “just good” turns out to be a strong commercial position.

Beyond chat

Chat was the obvious first product, but the architecture generalises. We are now expanding in two directions.

The first is more products on the same foundation. Edtek Cite reads documents and surfaces every legal citation and factual claim with inline authority and source links — useful for briefs, contracts, audits, and academic manuscripts. Edtek Draft generates first drafts of documents from verified internal knowledge bases. EssayTutor — closer to my old EdTech soul — lets professors grade student essays against model answers and structured criteria like IRAC for legal analysis. ChatBldr, CourseBldr, and DocuBldr extend the same idea to chatbot, course, and document creation workflows. They share a common verification core and differ only in the surface.

The second is more domains. We started in legal because the cost of being wrong is high enough that clients are willing to pay for being right. The same dynamic exists in medical reference, regulatory publishing, professional services, and education. We have expansion conversations open in each of these, and the architecture transfers cleanly — the verification machinery does not care whether the source of truth is the AAA Construction Industry Arbitration Rules, a clinical guideline, or a corporate compliance manual.

What this confirms for me

A few things, mostly things I already half-believed but now believe more.

First, and this is the contrarian one: technology is not the moat; taste is. Anyone can wire up OpenAI to a vector database. Knowing which of the hundred small decisions actually matters, and being willing to spend real effort on them, is what separates a product from a demo. That is a taste problem, and taste comes from having done this kind of work before, in adjacent fields, under conditions where being approximately right does not count.

Second, narrow specialisation is overrated. The reason we can build legal AI well is not that we are lawyers. It is that we have spent years thinking about evaluation, about uncertainty, about software architecture, and that we are willing to learn the domain deeply alongside our clients. Pure legal-tech competitors who know the law but not the engineering — or pure AI startups who know the engineering but not the domain — keep shipping things that are almost right. We try, very hard, to ship things that are right.

Third, and most importantly: the era of “AI products” as a category will end, and what will remain is products that happen to use AI very well. Edtek is, on a good day, an early example of that. We are not selling LLMs. We are selling reliable, source-verified document intelligence in domains where reliability is the whole point.

It is still early. The team is small, the to-do list is long, and the market is moving faster than any of us can fully keep up with. But for the first time in a while, I am working on something where the answer to “why do we have a right to win here?” is concrete rather than aspirational. That is a nice feeling.

If you want to take a look, the products live at edtek.ai.