Fractional Ops

Will AI Hallucinate Case Law? How to Use Legal AI Safely

Lawyers were sanctioned $5,000 when ChatGPT invented fake cases. Here is what AI hallucination means for your firm and how to use legal AI safely.

David Yu | June 12, 2026 | 10 min read

In 2023, a federal judge in the Southern District of New York issued a sanctions opinion that became required reading for anyone thinking about AI in legal practice. The attorneys representing Roberto Mata in Mata v. Avianca had used ChatGPT to research their opposition to Avianca's motion to dismiss. The filing cited six cases that ChatGPT had produced, complete with quotations and internal citations. None of those cases existed. The judge, who had earlier written that the court was presented with an unprecedented circumstance, sanctioned the two attorneys and their firm $5,000.

The case settled one question definitively: AI can and does hallucinate legal citations, and the consequences fall on the attorney who files them, not on the software company.

The question worth spending time on now is not whether AI hallucination is real, but how it works, which tools manage it better than others, and what your firm needs to do to use AI for research without ending up in a sanctions opinion.

What "Hallucination" Actually Means

The word is imprecise, but the mechanism is specific. General-purpose large language models like ChatGPT and GPT-4 are trained by ingesting enormous amounts of text and learning statistical patterns across that text. They generate responses by predicting what text is likely to follow a given prompt. That works well for summarizing, drafting, and many other tasks. For legal citations, where every name, number, and quotation has to match a real source, it breaks down.

When a model has no exact citation in its training data to draw on, it typically does not say "I do not know." It generates what a citation would plausibly look like: a realistic-sounding case name, a plausible court, a credible year, a quote that fits the argument. Every element is statistically coherent. None of it is real.

This is not a bug in the sense of something developers missed. It is a structural property of how these systems work. Models that are not explicitly designed to suppress hallucination, or that are not grounded in an authoritative source at query time, will produce fabricated citations in a significant share of legal research tasks.

What the Research Shows

A study published in the Journal of Empirical Legal Studies measured how often the leading AI-assisted legal research tools hallucinate. The results matter for any firm considering these tools.

Among the tools examined, Lexis+ AI had a hallucination rate of approximately 17 percent and answered around 65 percent of queries accurately. Westlaw AI-Assisted Research had a higher hallucination rate of around 33 percent and an accuracy rate of 42 percent. GPT-4, tested without any legal grounding layer, hallucinated on roughly 43 percent of queries.

The study also surfaced a more troubling category: misgrounded responses. A misgrounded response cites a real case, but that case does not actually support the claim the AI attributed to it. In some ways this is more dangerous than an obviously fabricated citation, because it passes a quick existence check. An attorney verifying that a case exists might not read carefully enough to notice that the case does not say what the brief claims it says.
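The gap between an existence check and a support check can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: PULLED_OPINIONS is a hypothetical dictionary standing in for opinions you have already retrieved from an authoritative source, and the citation and opinion text in it are invented. Exact quote matching only catches fabricated quotations; a paraphrased proposition still needs a person to read the case.

```python
# Hypothetical stand-in for opinions you have actually pulled from an
# authoritative source. The citation and text below are invented.
PULLED_OPINIONS = {
    "123 F.4th 456": (
        "The plaintiff filed after the limitations period expired. "
        "The court declines to toll the period and dismisses the claim."
    ),
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not defeat matching."""
    return " ".join(text.split()).lower()


def check_citation(citation: str, quoted_text: str) -> str:
    """Run two separate checks: does the case exist, and does it contain the quote?"""
    opinion = PULLED_OPINIONS.get(citation)
    if opinion is None:
        # Fails the existence check: possibly fabricated, or simply not pulled yet.
        return "NOT FOUND"
    if normalize(quoted_text) not in normalize(opinion):
        # The case exists, but the quoted language is not in it: possibly misgrounded.
        return "QUOTE NOT IN OPINION"
    # Passing both checks is necessary, not sufficient: still read it in context.
    return "QUOTE FOUND"


print(check_citation("123 F.4th 456", "declines to toll the period"))
print(check_citation("123 F.4th 456", "equitable tolling is warranted"))
print(check_citation("987 F.4th 654", "any language at all"))
```

The second call is the misgrounded case: the citation is real, so a quick existence check would pass it, but the language attributed to it is not there.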

The takeaway is not that you should avoid all AI legal research tools. It is that the tools vary considerably in how well they manage this problem, and that even the best ones require human verification for anything consequential.
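To see why even the best-performing tool still needs verification, it helps to turn the reported rates into rough expectations. The sketch below is back-of-the-envelope arithmetic, not a reproduction of the study: it assumes each query hallucinates independently at the reported rate, which real research sessions will not strictly satisfy.

```python
# Approximate per-query hallucination rates as reported in the study above.
REPORTED_RATES = {
    "Lexis+ AI": 0.17,
    "Westlaw AI-Assisted Research": 0.33,
    "GPT-4 (no legal grounding)": 0.43,
}


def expected_hallucinations(rate: float, queries: int) -> float:
    """Average number of hallucinated answers across a number of queries."""
    return rate * queries


def chance_of_at_least_one(rate: float, queries: int) -> float:
    """Probability that at least one answer is hallucinated.

    Simplifying assumption: every query hallucinates independently at the same rate.
    """
    return 1 - (1 - rate) ** queries


for tool, rate in REPORTED_RATES.items():
    print(
        f"{tool}: about {expected_hallucinations(rate, 20):.1f} of 20 answers "
        f"hallucinated; {chance_of_at_least_one(rate, 10):.1%} chance of at "
        f"least one in 10"
    )
```

Under that simplifying assumption, even a 17 percent rate gives roughly an 84 percent chance that a ten-query session contains at least one hallucinated answer.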

The Difference Grounded AI Makes

The term grounded AI describes systems that do not rely solely on the model's learned patterns. Instead, they retrieve actual documents at query time and anchor their responses to what those documents say.

The underlying architecture is called Retrieval-Augmented Generation, or RAG. Here is how it works in plain terms. When you ask a RAG-powered legal research tool about a case or a doctrine, the system does not just ask the language model what it remembers. It first queries a database of real legal documents, pulls the relevant passages, and feeds those passages to the model along with your question. The model then generates a response grounded in the actual retrieved text and cites back to those documents.

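The practical result is that the model has far less room to invent a citation, because its answer is built from documents that actually exist in the database. That is a reduction, not a guarantee: the legal research tools in the study above are retrieval-based and still hallucinated, and a model can misdescribe a document it did retrieve. The quality ceiling is now set by what is in that database, not by what the model happens to have learned.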
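The retrieve-then-generate flow can be sketched with a toy example. This is a deliberately simplified illustration, not how any commercial product works: retrieval here is plain keyword overlap rather than the search or embedding systems real tools use, there is no language model at all, and the database entries (Alpha v. Beta, Gamma v. Delta) are fictional. What it shows is the shape of the pipeline: answers are assembled only from retrieved passages, each passage carries its citation, and the system declines when nothing relevant comes back.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    citation: str  # clearly fictional citations, for illustration only
    text: str


# Hypothetical stand-in for a real legal database such as Westlaw, Lexis,
# or a firm's own document library.
DATABASE = [
    Passage("Alpha v. Beta (fictional)",
            "A notice of claim must be filed within 90 days of the injury."),
    Passage("Gamma v. Delta (fictional)",
            "Equitable tolling requires extraordinary circumstances."),
]

STOPWORDS = {"a", "an", "the", "of", "is", "be", "to", "in", "on", "for",
             "and", "what", "when"}


def tokenize(text: str) -> set[str]:
    """Lowercase words with punctuation stripped and common words dropped."""
    words = (w.strip(".,?!;:\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def retrieve(question: str, k: int = 2, min_overlap: int = 2) -> list[Passage]:
    """Step 1: pull the passages that share the most terms with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(p.text)), p) for p in DATABASE]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored if score >= min_overlap][:k]


def answer(question: str) -> str:
    """Step 2: build the answer only from retrieved passages, citing each one."""
    passages = retrieve(question)
    if not passages:
        # Decline rather than guess when nothing relevant was retrieved.
        return "I could not find relevant authority in the database."
    # A real system would pass the question plus these passages to a language
    # model; this sketch returns the passages verbatim with their citations.
    return "\n".join(f"{p.text} [{p.citation}]" for p in passages)


print(answer("When must a notice of claim be filed?"))
print(answer("What is the capital of France?"))
```

The refusal branch matters as much as the retrieval step. Without it, a system falls back on whatever the model produces on its own, which is where fabricated citations come from.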

Tools like CoCounsel, built by Casetext and now part of Thomson Reuters, use this approach, drawing their citations from Thomson Reuters' Westlaw content. Harvey, which serves larger firms, describes a similar retrieval-based architecture with citation checks intended to flag output it cannot trace to a source. Both are meaningfully better than asking a general-purpose model a legal research question, though even these systems require attorney review.

The distinction worth keeping in mind: a tool that says "here is an answer with citations drawn from our verified database" behaves very differently from a tool that says "here is an answer from a model trained on general web text." Both might give you a citation. Only one of them has any real accountability to that citation.

What to Ask Before You Use Any Legal AI Tool

Before your firm commits to an AI research tool, these questions narrow the field quickly.

Where do the citations come from? If the vendor cannot tell you exactly which database their citations are drawn from, or if the answer is "the model learned from a large corpus of legal text," treat that as a high-risk answer. The grounded tools can be specific: Westlaw, Lexis, the firm's own document library.

What happens when the model is uncertain? Some tools are designed to refuse or flag rather than guess. Ask what the tool does when it cannot find a real supporting citation. A tool that says "I could not find relevant case law on that question" is far more useful than one that generates a plausible-sounding citation it cannot verify.

Can you see the source text, not just the citation? Good tools surface the actual passage from the underlying document alongside the citation. This lets you read the source in context rather than trusting the AI's characterization of it.

What are the data handling terms? Before putting any client matter into an AI tool, you need to know whether the vendor trains its models on your queries and documents, and what its data retention and access policies are. This is not optional under ABA Model Rule 1.6 on confidentiality. Many enterprise legal AI tools offer explicit contractual terms on data isolation and prohibit training on customer data. Verify this in writing before proceeding.

What ABA Formal Opinion 512 Requires

The American Bar Association issued Formal Opinion 512 on July 29, 2024, its first substantive ethics guidance on generative AI in legal practice. The opinion does not ban AI use, but it sets out what lawyers must do to use it ethically.

The opinion identifies six areas of ethical concern: competence under Model Rule 1.1, communication with clients under Model Rule 1.4, confidentiality under Model Rule 1.6, meritorious claims and candor toward the tribunal under Model Rules 3.1 and 3.3, supervisory obligations under Model Rules 5.1 and 5.3, and fees under Model Rule 1.5.

On competence, the opinion states that lawyers do not need to become experts in generative AI, but they do need a reasonable understanding of the capabilities and limitations of any AI tool they use. Submitting a brief containing AI-generated citations without independent verification is exactly the kind of uncritical reliance the opinion treats as a competence failure under Rule 1.1.

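On candor and meritorious claims, the logic is straightforward. Rule 3.3 bars knowingly making a false statement of law to a tribunal, and Rule 3.1 bars asserting a position without a non-frivolous basis in law and fact. Courts also enforce their own rules, such as Federal Rule of Civil Procedure 11, which requires a reasonable inquiry before a lawyer signs a filing. A lawyer who files an unverified AI citation cannot rely on not knowing it was fake, because failing to check is itself the lapse. Ignorance of the model's limitations is not a defense. This is exactly why the Mata v. Avianca sanctions, imposed under Rule 11, fell on the attorneys, not on OpenAI.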

On confidentiality, Opinion 512 requires lawyers to evaluate whether a vendor's data practices are consistent with their obligations to clients. In practice, that means reasonable due diligence before using any AI tool on client matters: reviewing the data handling agreement, not just clicking through the terms of service.

The full opinion is available on the ABA's website and is worth reading in full if you are making purchasing or policy decisions for your firm.

A Practical Workflow for AI-Assisted Legal Research

This is not an argument against using AI for legal research. Used correctly, these tools save real time on the exploratory phase of a research task. The point is building a workflow where the AI does the draft work and a human closes the loop before anything reaches a filing.

Start with a grounded tool. Use a legal research platform that retrieves from a known database, such as CoCounsel or a Westlaw AI-assisted product, rather than a general-purpose model for citation work.

Treat every citation as a draft, not a finding. Pull the actual case, read the relevant passage, and confirm that the case says what the AI claims it says. This is the step the Mata v. Avianca attorneys skipped. It takes two to three minutes per citation and is non-negotiable.
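Before you can check every citation, you need a list of them. A short script can pull reporter citations out of a draft and turn them into a checklist. This is a minimal sketch: the regular expression covers only a few federal reporters and is an assumption for illustration, and the citations in the sample draft are invented. For real drafts, a maintained parser such as the Free Law Project's open-source eyecite library is a better starting point than a hand-written pattern.

```python
import re

# Illustrative "volume reporter page" pattern for a handful of federal
# reporters. An assumption for this sketch, not a complete citation grammar:
# it ignores state reporters, short forms, "Id.", and much more.
CITATION_PATTERN = re.compile(
    r"\b(\d{1,4})\s+"
    r"(U\.S\.|S\. Ct\.|F\.(?:2d|3d|4th)|F\. Supp\.(?: 2d| 3d)?)"
    r"\s+(\d{1,5})\b"
)


def extract_citations(draft: str) -> list[str]:
    """Return each matching reporter citation once, in order of first appearance."""
    seen: list[str] = []
    for match in CITATION_PATTERN.finditer(draft):
        cite = " ".join(match.group(0).split())  # normalize internal whitespace
        if cite not in seen:
            seen.append(cite)
    return seen


draft = (
    "See Alpha v. Beta, 123 F.4th 456 (2d Cir. 2024); "
    "Gamma v. Delta, 200 F. Supp. 3d 50 (S.D.N.Y. 2020); "
    "cf. Epsilon v. Zeta, 999 U.S. 1 (2025). Alpha, 123 F.4th 456."
)
for number, cite in enumerate(extract_citations(draft), start=1):
    print(f"[ ] {number}. {cite}: pull it, read the passage, confirm support")
```

Extraction only builds the list. Every item on it still has to be pulled and read by a person.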

Limit general AI tools to non-citation tasks. ChatGPT, Claude, and similar general-purpose models can be useful for drafting arguments, summarizing documents you have already pulled, or brainstorming angles. They should not be the source of citations you file.

Never paste a client matter into a general AI tool without checking the data terms. Many general-purpose tools default to using conversations for model improvement unless you opt out or use an enterprise plan with different terms. Check before using.

Document your verification. Some firms are beginning to keep a brief verification log for AI-assisted research, noting which citations were independently confirmed and by whom. This is partly risk management and partly a response to supervisory obligations under Rule 5.3, which Opinion 512 suggests may extend to AI tools used as nonlawyer assistants.
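There is no standard format for a verification log. One minimal sketch, assuming a firm tracks each citation as a record, might look like this; the field names and the outstanding helper are hypothetical, and a shared spreadsheet with the same columns would serve the same purpose.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CitationCheck:
    """One row in a hypothetical verification log; field names are illustrative."""
    citation: str
    proposed_by: str                  # which tool (or person) suggested the citation
    exists: bool = False              # pulled from an authoritative database
    supports_claim: bool = False      # a lawyer read the passage and it supports the brief
    verified_by: Optional[str] = None
    verified_on: Optional[date] = None


def outstanding(log: list[CitationCheck]) -> list[str]:
    """Citations not yet fully verified; the filing is not ready until this is empty."""
    return [
        entry.citation
        for entry in log
        if not (entry.exists and entry.supports_claim and entry.verified_by)
    ]


log = [
    CitationCheck("123 F.4th 456", "research tool", exists=True,
                  supports_claim=True, verified_by="A. Associate",
                  verified_on=date(2026, 6, 1)),
    CitationCheck("200 F. Supp. 3d 50", "research tool", exists=True),
]
print(outstanding(log))
```

Recording existence and support as separate fields mirrors the misgrounding problem described earlier: a citation that exists but has not been read against the brief is still outstanding.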

Use AI earlier in the process, not at the finish line. AI is most useful during the exploratory phase: surfacing potentially relevant doctrines, identifying leading cases to pull, drafting a first pass at an argument structure. It is least safe as the final step, because that is when fabricated citations are most likely to slip through without a second look.

The Practical Question Your Firm Needs to Answer

The Mata v. Avianca opinion, the JELS hallucination study, and ABA Formal Opinion 512 are not arguments against AI in legal practice. They are a map of where the real risks sit and what responsible use actually requires.

Firms that run into trouble are typically the ones that hand AI output to a paralegal or junior associate without a clear protocol for verifying citations, or that assume a legal-branded tool cannot hallucinate. The firms that use this well are the ones that choose tools designed to be grounded in real legal sources, build verification into the workflow before anything reaches a filing, and treat AI output as a starting point rather than a finished product.

At Futureman Labs, we help small and mid-size firms work through the practical questions: which tools fit your existing workflow, what a defensible AI use policy looks like, and where the highest-leverage automation opportunities are. The free Law Firm AI Readiness Scorecard is a good starting point if you want a clear picture of where your firm stands today.

Is your firm AI-ready?

Take the free Law Firm AI Readiness Scorecard. Get a grounded, practical report on where AI safely saves your firm time, and where it is a liability.
