Queries are a robot’s best friend: AugQL

Written by Aaron Hammond

03.27.2025

Large Language Models (LLMs) excel at transforming unstructured text content into structured data. However, this distillation is often prone to “hallucination”, making it difficult to build a system that delivers insight consistently and correctly. Bridging this gap between the unstructured world and structured data is one of the biggest problems in Generative AI right now.

To solve this, we built AugQL, a proprietary query language that offers LLMs a deterministic path for fetching and filtering normalized data from the database. AugQL results ground the agent in reality, mitigating the risks of hallucination while significantly expanding the scope of what our agent can do in the wild.

In this AugQL example, students are queried for food allergies, including in the results the student’s full name, school, and any allergens present on their health record.

What’s in a language?

AugQL, short for “August Query Language,” is a domain-specific language inspired by natural language and friendly to machines. On the backend, AugQL transforms complex queries and calculations automatically into an efficient, optimized sequence of database operations.

In this example, we calculate the breakdown of encounters with third grade students by the allergy of the student subject.

The database operations required for even simple calculations make it infeasible for LLMs to generate SQL statements or MongoDB aggregations reliably on their own. Writing them correctly requires too much knowledge of the underlying data structures.

Given a query like the one in the previous example, AugQL would generate a MongoDB aggregation similar to the above. This aggregation is much less readable for humans, and likewise more challenging for the machine to produce.

Above is an equivalent SQL query. Generation of these statements would require a deep knowledge of the database schema.
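To make the point concrete, here is a minimal sketch of the same kind of breakdown written by hand. The three tables, their columns, and the sample rows are hypothetical and chosen only for illustration; they are not the schema behind AugQL. Even in this toy case, the query author has to know which tables exist and how they join.

```python
import sqlite3

# Hypothetical schema and data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, full_name TEXT, grade INTEGER);
CREATE TABLE allergies (student_id INTEGER, allergen TEXT);
CREATE TABLE encounters (id INTEGER PRIMARY KEY, student_id INTEGER);
INSERT INTO students VALUES (1, 'Ada', 3), (2, 'Ben', 3), (3, 'Cy', 5);
INSERT INTO allergies VALUES (1, 'Peanut'), (2, 'Peanut'), (2, 'Egg'), (3, 'Egg');
INSERT INTO encounters VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")


def encounters_by_allergy(conn, grade):
    """Count encounters per allergen for students in one grade."""
    rows = conn.execute(
        """
        SELECT a.allergen, COUNT(e.id) AS encounter_count
        FROM encounters AS e
        JOIN students AS s ON s.id = e.student_id
        JOIN allergies AS a ON a.student_id = s.id
        WHERE s.grade = ?
        GROUP BY a.allergen
        ORDER BY a.allergen
        """,
        (grade,),
    ).fetchall()
    return dict(rows)


breakdown = encounters_by_allergy(conn, 3)
print(breakdown)
```

A student with two allergies contributes each of their encounters to both groups, which is the kind of modeling detail a query writer must get right without any help from the database.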

AugQL abstracts away database-level modeling, providing a clean, straightforward interface to write queries without knowing anything about the codebase. For example, querying for a “Food” allergen automatically includes any allergy we’ve tagged on the backend under food.
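One way to picture this category expansion is the sketch below. The tagging table, the function names, and the sample students are all assumptions made for illustration; the post does not show how AugQL stores or resolves these tags.

```python
# Hypothetical tagging table: which specific allergens fall under each category.
ALLERGEN_CATEGORIES = {
    "Food": {"Peanut", "Egg", "Shellfish"},
    "Environmental": {"Pollen", "Dust"},
}


def expand_allergen(term):
    """Return the concrete allergens a query term should match.

    A category name expands to its tagged members; any other term matches itself.
    """
    return set(ALLERGEN_CATEGORIES.get(term, {term}))


def students_matching(students, term):
    """Return the names of students with at least one matching allergen."""
    allowed = expand_allergen(term)
    return [s["full_name"] for s in students if allowed & set(s["allergens"])]


# Hypothetical sample records.
students = [
    {"full_name": "Ada", "allergens": ["Peanut"]},
    {"full_name": "Ben", "allergens": ["Pollen"]},
    {"full_name": "Cy", "allergens": ["Egg", "Dust"]},
]

food_allergic = students_matching(students, "Food")
print(food_allergic)
```

The caller only ever writes “Food”; the mapping to individual allergens stays on the backend.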

Similarly, AugQL surfaces a single, synthetic field for chief complaints on encounters, whereas in reality this low-level form data is spread across many places. Using AugQL, our agent can query for food allergies or chief complaints without any knowledge of the actual complexity required to answer the question correctly.

In this AugQL example, we’re querying for encounters with allergy-related chief complaints, where the student subject is a student at Boutwell school and also has a peanut allergy. For columns, we’re including the student’s full name, as well as the email addresses of their contacts and of the teachers of any of their classes. Together with the columns, the complete operation involves joining students with the tables for schools, form values, contacts, classes, and people. The machine, however, only needs to write AugQL.

AugQL is dynamic based on the environment where it operates. When defining the schema for an AugQL request, we walk the contents of the database to determine which fields are used in the environment and what values are typically present. The AugQL schema is also sensitive to user roles. An agent that belongs to a counselor can only write AugQL against behavioral health data, not physical health fields like conditions or medications.
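The role sensitivity described here could be sketched roughly as follows. The field names, domain labels, and role names are hypothetical, and the real mechanism may differ; the point is only that the schema handed to the agent is filtered before the agent ever writes a query.

```python
# Hypothetical field catalog: each queryable field is tagged with a data domain.
FIELD_DOMAINS = {
    "student.fullName": "general",
    "encounter.counselingNotes": "behavioral_health",
    "student.conditions": "physical_health",
    "student.medications": "physical_health",
}

# Hypothetical mapping of user roles to the domains they may query.
ROLE_DOMAINS = {
    "counselor": {"general", "behavioral_health"},
    "nurse": {"general", "physical_health"},
}


def fields_for_role(role):
    """Return the fields exposed in the schema for a given role."""
    allowed = ROLE_DOMAINS.get(role, set())
    return sorted(f for f, domain in FIELD_DOMAINS.items() if domain in allowed)


counselor_fields = fields_for_role("counselor")
print(counselor_fields)
```

Because fields outside the role never appear in the schema, the agent cannot reference them, and an unknown role sees no fields at all.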

The code above represents a fragment of the JSON schema defined automatically for AugQL student queries. When we explore the database, we detect any fields that represent enumerated values, reflecting this structure in the AugQL schema. In this case, the agent knows that “Male” is a valid selection for the gender field and queries appropriately.
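A minimal sketch of that detection step is shown below, assuming a simple heuristic: a string field with only a few distinct values is treated as enumerated. The threshold, the function name, and the sample records are assumptions for illustration, not the actual detection logic.

```python
def infer_field_schema(records, field, max_enum_size=3):
    """Infer a JSON Schema fragment for one field from observed values.

    Heuristic (an assumption): a string field with at most `max_enum_size`
    distinct values is emitted as an enum; other string fields stay open.
    """
    values = [r[field] for r in records if r.get(field) is not None]
    if values and all(isinstance(v, str) for v in values):
        distinct = sorted(set(values))
        if len(distinct) <= max_enum_size:
            return {"type": "string", "enum": distinct}
        return {"type": "string"}
    return {}  # empty schema: no constraint inferred


# Hypothetical sample records.
records = [
    {"fullName": "Ada Park", "gender": "Female"},
    {"fullName": "Ben Ortiz", "gender": "Male"},
    {"fullName": "Cy Lee", "gender": "Male"},
    {"fullName": "Dee Shah", "gender": "Female"},
]

gender_schema = infer_field_schema(records, "gender")
name_schema = infer_field_schema(records, "fullName")
print(gender_schema, name_schema)
```

With the enum in place, a schema-constrained agent can only choose a value that actually occurs in the environment.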

How is this different?

Based on the name, you’d be forgiven for mistaking AugQL for a clone of GraphQL or of more traditional options like SQL. AugQL offers much more power than these general-purpose alternatives because its schema adapts to the environment and its access patterns hide the underlying data model.

LLMs aren’t very good at writing complex, multi-table SQL queries, and they may never be. How can we expect machines to write SQL well when it’s still so hard for well-informed humans?

On the other hand, GraphQL implementations might enable the specification of nested data access, but they leave open the question of how to fetch that data efficiently. Likewise, GraphQL does not unlock the same depth of queries that AugQL offers. Instead, every input parameter must be shaped by the developer, so generalization is tough for the agent.

Where does this fit in?

Insights can’t always be derived easily from structured data and queries. Indeed, one of the greatest strengths of LLMs is their ability to search for information semantically, based on the meaning of content and not the exact words. Asking the agent for examples of encounters that involve bullying, for example, would be challenging with AugQL alone, because there is no single “bullying” checkbox in encounter documentation. Rather, this label has to emerge from the freeform text of the encounter notes.

In such cases, traditional methods for Retrieval Augmented Generation (RAG) are more effective. Our agent not only has the ability to dispatch AugQL queries as needed, but also benefits from a hybrid search over chunks of content extracted from our application database records. Using a combination of text and vector search, we identify chunks relevant to a prompt and include these along with the user’s message. In this way, even before the agent begins to formulate a response, the answer may already be provided in the chunks.
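As a rough illustration of combining the two signals, the sketch below blends a keyword-overlap score with cosine similarity using a fixed weight. The scoring functions, the `alpha` weight, and the hand-written two-dimensional vectors are all assumptions; the post does not specify how text and vector results are actually merged.

```python
import math


def keyword_score(query, text):
    """Fraction of query words that also appear in the chunk text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query, query_vec, chunks, alpha=0.5, top_k=2):
    """Rank chunks by a weighted blend of text and vector scores."""
    scored = [
        (
            alpha * keyword_score(query, c["text"])
            + (1 - alpha) * cosine(query_vec, c["vector"]),
            c["id"],
        )
        for c in chunks
    ]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:top_k]]


# Hypothetical chunks with toy embeddings.
chunks = [
    {"id": "note-1", "text": "Student was teased and pushed at recess", "vector": [0.9, 0.1]},
    {"id": "note-2", "text": "Routine vision screening", "vector": [0.0, 1.0]},
    {"id": "note-3", "text": "Reported bullying on the bus", "vector": [0.8, 0.6]},
]

top_chunks = hybrid_search("bullying at recess", [1.0, 0.0], chunks)
print(top_chunks)
```

Here the first note never mentions the word “bullying”, yet it ranks highest because its vector sits close to the query, which is the behavior the semantic half of the search is meant to provide.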

This composite approach allows our agent to incrementally refine its understanding. When information fetched preemptively using the hybrid vector search is insufficient, the agent can more selectively access the needed data using AugQL, without becoming overwhelmed by useless detail.

Taken together, AugQL and RAG supply the precision and the recall, respectively, that critical applications in healthcare and education require, where hallucination can be dangerous.