January 8, 2025

Raising the Performance Bar: How App Orchid Got 99.8% Text-to-SQL Accuracy on the Spider Benchmark

Author: App Orchid

What is Text-to-SQL?

Text-to-SQL is a technology that lets users query databases in natural language instead of writing SQL (Structured Query Language) code. Users simply type a question such as “What was the revenue for Q4 2024?” or “Which products had the highest customer satisfaction last year?” The Text-to-SQL system parses the question and generates a SQL query that retrieves the answer from the database.

An incorrect SQL query generated from a user's question can yield misleading results, disrupt decision-making, or even compromise sensitive data. Imagine getting the wrong answer to “What was the revenue for Q4 2024?”

Accuracy is the most important metric for systems like this, yet solutions rarely reach high accuracy on most benchmarks. For example, top approaches on the Spider 1 leaderboard reach roughly 85% to 90% accuracy.

The Evolution of Text-to-SQL: A Technical Deep Dive

In implementing Text-to-SQL systems, three architectural approaches have emerged, each representing fundamentally different strategies for achieving accuracy and maintainability:

  • Model Fine-tuning focuses on tuning Large Language Models (LLMs) for specific database schemas and SQL generation patterns. This approach requires training custom models on enterprise-specific SQL data, making it resource-intensive and challenging to maintain. As schemas evolve and new data patterns emerge, models need periodic retraining, creating significant operational overhead. While fine-tuned models can achieve high accuracy for specific use cases, they become increasingly complex to manage across multiple databases and domains.
  • Retrieval-Augmented Generation (RAG) takes a more dynamic approach, augmenting LLM prompts with relevant schema information and examples retrieved at query time. While this eliminates the need for constant model retraining, it introduces challenges around retrieval accuracy and latency. Each query requires searching through a vector store of schema documentation and example queries, making consistency difficult to guarantee. As databases grow and schemas evolve, maintaining accurate and current embeddings becomes increasingly complex.
  • App Orchid's Structured Ontology approach differs by creating a formal semantic layer through Managed Semantic Objects (MSOs). Unlike RAG's dynamic retrieval, this approach builds a comprehensive knowledge graph that explicitly defines business object hierarchies, relationships between those objects, field-level traits (temporal, spatial, categorical), computation rules, derived metrics, and domain-specific terminology and synonyms. Each MSO carries rich metadata: its purpose, description, relationships with other objects in the dataset, sourcing from backend systems, and synonyms. Each field within an MSO also carries extensive metadata, including description, data type, relationships, cardinality, completeness, synonyms, and traits. The ontology also incorporates pre-built types for common data elements such as addresses, amounts, areas, and percentages, each with associated functions for use in relevant visualizations and queries. Users can define derived fields using pre-built aggregation and window functions or custom SQL code. This structured semantic layer provides deterministic behavior through explicit constraints and rules rather than relying on retrieved context. These constraints act as guardrails that keep the LLM from generating incorrect or nonsensical SQL queries, mitigating the risk of hallucinations. Business users can enhance the ontology with custom traits, computations, and terminology, creating a living knowledge model that evolves with the organization. By enforcing consistency through formal object relationships and constraints, this approach achieves higher accuracy while remaining maintainable across diverse enterprise deployments.

The key advantage of this architecture lies in its ability to combine the flexibility of natural language understanding with the precision of traditional semantic modeling. While RAG systems struggle with consistency across queries and fine-tuned models resist schema evolution, App Orchid's structured ontology provides a stable yet adaptable foundation for accurate query generation, directly addressing the challenge of LLM hallucinations and delivering reliable results.

App Orchid Achieves the Highest Text-to-SQL Accuracy Score on This Benchmark, Hitting 99.8% With Ontology Enrichment

With the rise of Large Language Models (LLMs), Text-to-SQL accuracy has risen from 77% to the current highs. While these models are powerful tools capable of understanding and generating human-like text, they are not without their challenges. One of the most concerning issues is “hallucination”: an LLM confidently producing incorrect or nonsensical output.

The best way to reduce hallucinations is to give the LLM guardrails in the form of context and metadata.

App Orchid’s Text-to-SQL solution addresses this challenge by employing an ontology-driven methodology that enhances understanding, adaptability, and reliability. By prioritizing logical correctness, meaning that generated queries are both syntactically and semantically accurate, App Orchid achieved 94.6% accuracy out of the box, climbing to 99.8% with ontology enrichment.

The Spider Dataset Test: A Benchmark for Text-to-SQL Accuracy

To validate accuracy, App Orchid tested its Text-to-SQL model against the Spider dataset, a widely recognized benchmark developed at Yale University. The Spider dataset includes a diverse array of databases, each with natural language questions and corresponding SQL queries. This makes it an ideal testing ground for assessing the robustness and generalizability of Text-to-SQL solutions.

Spider is a large-scale, complex, cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables covering 138 different domains.

App Orchid’s approach involved creating a unique ontology for each database in the Spider dataset. These ontologies act as structured dictionaries, providing semantic context and relationships that guide the model in generating accurate queries.  

The App Orchid platform's automated ontology enrichment module generated an ontology for each of the 200 databases, and LLMs were used to enrich the ontology metadata.
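
As a rough illustration of what automatic ontology generation can start from, the sketch below reads table, column, and foreign-key metadata from a SQLite database (the format Spider databases ship in) and builds an empty skeleton for later enrichment. The function name `draft_ontology`, the dictionary layout, and the toy schema are assumptions made for this example; they are not App Orchid's module or data model.

```python
import sqlite3

def draft_ontology(conn: sqlite3.Connection) -> dict:
    """Build a bare ontology skeleton from SQLite schema metadata."""
    ontology = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        fields = {
            row[1]: {"data_type": row[2], "description": None, "synonyms": []}
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        }
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        relationships = [
            {"from_field": row[3], "to_object": row[2], "to_field": row[4]}
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        ontology[table] = {
            "description": None,  # placeholder that an LLM pass would fill in
            "fields": fields,
            "relationships": relationships,
        }
    return ontology

# Toy schema for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE concert (
        concert_id INTEGER PRIMARY KEY,
        singer_id INTEGER REFERENCES singer(singer_id),
        year INTEGER
    );
""")
ontology = draft_ontology(conn)
print(ontology["concert"]["relationships"])
```

The `description` and `synonyms` slots are left empty on purpose: they mark where LLM-generated or user-supplied metadata would be attached in an enrichment step.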

App Orchid also opted for manual evaluation of the outputs. Human reviewers meticulously compared the generated SQL queries with the Spider dataset’s baseline results, verifying that they retrieved the intended data logically and accurately.
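
The article describes a manual review, so the following is not App Orchid's procedure. It is a minimal sketch of an automated execution-match check that a reviewer could use as a first pass: run the generated query and the reference query against the same database and compare the rows. The helper name `same_result` and the sample table are hypothetical.

```python
import sqlite3
from collections import Counter

def same_result(conn, generated_sql, reference_sql):
    """Return True when both queries yield the same multiset of rows."""
    try:
        generated_rows = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run cannot match
    reference_rows = conn.execute(reference_sql).fetchall()
    # Counter comparison ignores row order but respects duplicates.
    return Counter(generated_rows) == Counter(reference_rows)

# Toy data for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 31), (2, 'Ben', 45), (3, 'Cy', 27);
""")
reference = "SELECT name FROM singer WHERE age > 30"
candidate = "SELECT s.name FROM singer AS s WHERE s.age >= 31"
wrong = "SELECT name FROM singer WHERE age < 30"

matches = same_result(conn, candidate, reference)
mismatches = same_result(conn, wrong, reference)
print(matches, mismatches)
```

Because row order is ignored, questions that require a specific ordering would still need a separate check, and queries that differ textually but agree on one dataset can still differ on another. That is one reason human review remains useful.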

Results

App Orchid’s results speak for themselves:

  • Out-of-the-box accuracy (zero shot): 94.6%
  • Accuracy with ontology enrichment: 99.8%

For context, the highest recorded accuracy on the Spider dataset leaderboard was 91.2%. App Orchid’s performance not only surpasses this benchmark but also redefines what’s possible in the Text-to-SQL domain.
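
One way to read these figures is as error rates rather than accuracies. The short calculation below uses only the percentages reported in this article; the variable names are illustrative, and the comparison with the leaderboard figure assumes the two evaluations are comparable, which the differing evaluation methods may not guarantee.

```python
# Accuracy figures as reported in this article, in percent.
reported_accuracy = {
    "leaderboard best": 91.2,
    "zero shot": 94.6,
    "with enrichment": 99.8,
}

# Error rate is the share of questions answered incorrectly.
error_rate = {
    label: round(100.0 - accuracy, 1)
    for label, accuracy in reported_accuracy.items()
}

# How many times fewer errors the enriched ontology makes.
reduction_vs_zero_shot = round(
    error_rate["zero shot"] / error_rate["with enrichment"]
)
reduction_vs_leaderboard = round(
    error_rate["leaderboard best"] / error_rate["with enrichment"]
)

print(error_rate)
print(reduction_vs_zero_shot, reduction_vs_leaderboard)
```

On these numbers the error rate falls from 5.4% to 0.2% with enrichment, about 27 times fewer errors, and about 44 times fewer than the 8.8% implied by the leaderboard figure.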

Why Ontology Matters: Reducing Hallucinations, Adding Context

The core of App Orchid’s success lies in its ontology-driven framework. Traditional Text-to-SQL models often struggle because they lack a semantic layer, relying instead on static training data that may not generalize well to new datasets. This limitation increases the likelihood of hallucinations, where the model generates queries that are incorrect or irrelevant.

Ontologies address this issue by providing a structured, context-rich foundation that guides the model’s reasoning. Each ontology serves as a knowledge graph, mapping out the relationships and hierarchies within a specific dataset. This semantic layer ensures that the model interprets the user’s query in the correct context, dramatically reducing the risk of hallucinations.

For example, consider a database with tables related to customer orders and product inventory. Without an ontology, an LLM might misinterpret a question about “total sales” and generate a query that erroneously combines unrelated tables. With an ontology, the model understands the relevant relationships and generates a query that accurately reflects the user’s intent.
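
The “total sales” scenario can be made concrete with a small, hypothetical schema. In this sketch the table names, columns, and values are invented for illustration; it shows how joining orders to an unrelated inventory table silently inflates the total.

```python
import sqlite3

# Toy data for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
    CREATE TABLE inventory (product_id INTEGER, warehouse TEXT, units INTEGER);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 10, 50.0), (3, 20, 25.0);
    INSERT INTO inventory VALUES (10, 'east', 5), (10, 'west', 7), (20, 'east', 3);
""")

# Plausible but wrong: the join repeats each order once per warehouse row.
wrong_sql = """
    SELECT SUM(o.amount) FROM orders AS o
    JOIN inventory AS i ON i.product_id = o.product_id
"""
# Intended: total sales depends on the orders table alone.
right_sql = "SELECT SUM(amount) FROM orders"

wrong_total = conn.execute(wrong_sql).fetchone()[0]
right_total = conn.execute(right_sql).fetchone()[0]
print(wrong_total, right_total)
```

Both queries run without error, which is what makes this failure mode dangerous: the first returns 325.0 and the second 175.0, and nothing in the SQL itself signals which one matches the user's intent.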

Ontology Architecture: Engineering for Precision

App Orchid's ontology system transforms traditional database metadata into a rich semantic layer that dramatically improves query accuracy. Here's how the architecture achieves this:

Managed Semantic Objects (MSOs)
App Orchid's ontology is built on the concept of Managed Semantic Objects (MSOs). These are representations of business objects in the dataset. Each MSO is enriched with extensive metadata that goes beyond traditional schema definitions, capturing a deeper understanding of the data within the business context.

Structured Knowledge Representation
Each MSO is automatically annotated with detailed descriptions generated by LLMs. These descriptions leverage the LLM's general knowledge and are further refined by the specific context provided within the ontology, ensuring a clear understanding of the MSO's role within the business. Each MSO also captures:

  • Explicit relationships to other business objects with defined cardinality
  • Field-level metadata including data types, constraints, and completeness metrics
  • Domain-specific synonyms and business terminology mappings
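
To make the metadata listed above more tangible, here is a minimal sketch of how an MSO might be modeled as a data structure. The class and attribute names (`ManagedSemanticObject`, `MsoField`, `Relationship`, `resolve`) are assumptions chosen for this example, not App Orchid's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MsoField:
    name: str
    data_type: str
    description: str = ""
    completeness: float = 1.0  # share of non-null values, 0.0 to 1.0
    synonyms: list = field(default_factory=list)
    traits: list = field(default_factory=list)  # e.g. "temporal"

@dataclass
class Relationship:
    target: str       # name of the related MSO
    cardinality: str  # e.g. "many-to-one"
    join_on: tuple    # (local field, target field)

@dataclass
class ManagedSemanticObject:
    name: str
    description: str
    source_table: str
    fields: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def resolve(self, term: str) -> Optional[str]:
        """Map a business term to a field name via names and synonyms."""
        wanted = term.strip().lower()
        for f in self.fields.values():
            if wanted == f.name.lower() or wanted in (s.lower() for s in f.synonyms):
                return f.name
        return None

order = ManagedSemanticObject(
    name="Order",
    description="A customer purchase",
    source_table="orders",
    fields={
        "order_date": MsoField(
            "order_date", "DATE", synonyms=["purchase date"], traits=["temporal"]
        )
    },
    relationships=[
        Relationship("Customer", "many-to-one", ("customer_id", "customer_id"))
    ],
)
print(order.resolve("Purchase Date"))
```

Resolving a business term to a concrete field before any SQL is written is one simple way synonym metadata can narrow what the LLM is allowed to reference.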

Semantic Trait System
Rather than treating all fields equally, the ontology applies specialized traits that guide query generation:

  • Temporal traits enable automatic handling of time-series patterns and date hierarchies
  • Spatial traits incorporate geographical relationships and distance computations
  • Categorical traits manage hierarchical dimensions and grouping logic
  • Numeric traits handle units, scales, and appropriate aggregation methods
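
The sketch below illustrates, under simplifying assumptions, how traits could steer query generation: a lookup of which aggregations make sense per trait, and a helper that turns a temporal field into a date bucket. The trait-to-aggregation table and both function names are hypothetical, and the SQL targets SQLite.

```python
import sqlite3

# Assumed mapping for illustration; a real ontology would be richer.
ALLOWED_AGGREGATIONS = {
    "numeric": {"SUM", "AVG", "MIN", "MAX", "COUNT"},
    "temporal": {"MIN", "MAX", "COUNT"},
    "categorical": {"COUNT"},
    "spatial": {"COUNT"},
}

def aggregation_allowed(function: str, trait: str) -> bool:
    """Check whether an aggregate function suits a field with this trait."""
    return function.upper() in ALLOWED_AGGREGATIONS.get(trait, set())

def temporal_bucket(column: str, grain: str) -> str:
    """Return a SQLite expression grouping a date column by year or month."""
    formats = {"year": "%Y", "month": "%Y-%m"}
    if grain not in formats:
        raise ValueError(f"unsupported grain: {grain}")
    return f"strftime('{formats[grain]}', {column})"

# Toy data for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('2024-10-03', 40.0), ('2024-10-20', 60.0), ('2024-11-02', 25.0);
""")
bucket = temporal_bucket("order_date", "month")
rows = conn.execute(
    f"SELECT {bucket} AS period, SUM(amount) FROM orders "
    "GROUP BY period ORDER BY period"
).fetchall()
print(rows)
```

With such a table in place, a request to average a categorical field can be rejected before any SQL reaches the database.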

Advanced Computation Layer
The ontology maintains a complete computational graph that captures:

  • Derived field definitions with full lineage tracking
  • Pre-built aggregation and window functions for common business metrics
  • Custom SQL functions that encapsulate complex business logic
  • Field-level validation rules and business constraints
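
A derived field with lineage tracking can be sketched as a recursive expansion: each derived name is replaced by its definition until only base columns remain, and the base columns encountered form the lineage. The field names and the `expand` helper are invented for this example, and the identifier matching is deliberately naive (it would misread literals such as `1e5`).

```python
import re

# Hypothetical derived-field definitions, each written in terms of other fields.
DERIVED_FIELDS = {
    "net_revenue": "gross_revenue - refunds",
    "margin_pct": "100.0 * (net_revenue - cost) / net_revenue",
}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

def expand(name, derived, _seen=()):
    """Return (sql_expression, base_columns) for a field name."""
    if name not in derived:
        return name, {name}  # a base column is its own lineage
    if name in _seen:
        raise ValueError(f"circular definition: {name}")
    lineage = set()

    def substitute(match):
        token = match.group(0)
        expression, columns = expand(token, derived, _seen + (name,))
        lineage.update(columns)
        # Parenthesize nested definitions to preserve operator precedence.
        return f"({expression})" if token in derived else expression

    sql = IDENTIFIER.sub(substitute, derived[name])
    return sql, lineage

sql, lineage = expand("margin_pct", DERIVED_FIELDS)
print(sql)
print(sorted(lineage))
```

Keeping definitions in one place means the LLM only needs to select `margin_pct`; the expansion into base columns is deterministic.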

Dynamic Enhancement Mechanisms
The semantic layer continuously evolves through:

  • Automated metadata enrichment using LLMs
  • Business user feedback and custom terminology additions, which form a crucial part of the feedback loop by letting the system learn from user interactions, correct errors, and improve its accuracy over time
  • Runtime pattern learning from successful queries
  • Power user customizations for specific domains or use cases

Query Generation Guardrails
The ontology provides critical constraints that prevent common LLM errors, directly addressing the challenge of hallucinations:

  • Valid join paths between business objects
  • Appropriate aggregation levels for metrics
  • Semantic compatibility between fields in comparisons
  • Business-rule validation for complex queries
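
The first guardrail, valid join paths, can be sketched as a graph search over declared relationships: a query may only join objects that are connected, and the join conditions come from the ontology rather than from the LLM. The relationship table and function names below are hypothetical.

```python
from collections import deque

# Declared relationships between business objects (illustrative only).
RELATIONSHIPS = {
    ("orders", "customers"): "orders.customer_id = customers.customer_id",
    ("orders", "products"): "orders.product_id = products.product_id",
}

def build_graph(relationships):
    """Turn relationship pairs into an undirected adjacency map."""
    graph = {}
    for (left, right), condition in relationships.items():
        graph.setdefault(left, {})[right] = condition
        graph.setdefault(right, {})[left] = condition
    return graph

def join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of declared joins."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, conditions = queue.popleft()
        if node == goal:
            return conditions
        for neighbor, condition in graph.get(node, {}).items():
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, conditions + [condition]))
    return None  # no declared path: the query should be rejected

graph = build_graph(RELATIONSHIPS)
print(join_path(graph, "customers", "products"))
print(join_path(graph, "customers", "warehouses"))
```

A `None` result gives the system a clear signal to refuse or rephrase, instead of letting the model invent a join condition.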

Advantages of App Orchid’s Ontology-Driven Approach

  1. Contextual Understanding: Ontologies provide a semantic layer that enhances the model’s ability to interpret queries accurately, even in complex scenarios.
  2. Enterprise Adaptability: Unlike traditional models, which require extensive fine-tuning, ontology-driven solutions adapt seamlessly to new datasets, making them ideal for dynamic enterprise environments.
  3. Scalability and Resilience: The ontology-based framework ensures robust performance across diverse datasets, minimizing the risk of failure in real-world applications.
  4. Rigorous Testing: By combining automated and manual ontology enrichment with human-reviewed evaluations, App Orchid delivers a solution that is both highly accurate and reliable.

Why Accuracy Matters More Than Ever

In enterprise applications, the stakes for accuracy are high.  

  • Misinformed Decisions: Incorrect data insights can derail strategic initiatives.
  • Operational Disruptions: Erroneous queries can compromise workflows and system integrity.
  • Eroded Trust: Stakeholders lose confidence in AI solutions that produce unreliable results.

App Orchid’s ontology-driven methodology addresses these risks head-on, providing a level of accuracy that builds trust and ensures reliable performance. By achieving near-perfect accuracy, the solution empowers enterprises to leverage Text-to-SQL capabilities with confidence, unlocking new efficiencies and insights.

Conclusion: A New Era for Text-to-SQL Solutions

App Orchid’s ontology-driven Text-to-SQL solution sets new standards for accuracy and reliability in a field often plagued by LLM hallucinations. By combining an ontology-driven architecture with rigorous testing and human-reviewed evaluation, App Orchid has created a solution that not only meets but exceeds the demands of modern enterprises.

As LLMs continue to transform how we interact with data, solutions like App Orchid’s offer a blueprint for success—one that prioritizes accuracy, adaptability, and trust. With its industry-leading performance, App Orchid is paving the way for a future where AI-driven tools deliver on their promises, free from the pitfalls of hallucination.