Engineering · 6 min read

4 Things We Got Wrong in Our First Hybrid Parser

Our hybrid regex + AI order parser worked — until our test suite showed us where it didn't. Four design assumptions we had to rethink while building Elizabeth.ai's Taglish order parsing system.

By Elizabeth.ai Team

When we designed Elizabeth.ai's order parsing system, we made a deliberate architectural choice: regex first, AI only when needed. The majority of orders get parsed by regex at zero cost. The rest route through a multi-provider AI cascade with circuit breakers.

The architecture was sound. But as we stress-tested it against hundreds of realistic Taglish order patterns — messages modeled on how Filipino online sellers across different product categories and regions actually communicate — we found four assumptions that did not hold up. Each one led to a meaningful change in how the parser works.

1. Confidence Is Not a Single Number

What we assumed: A single confidence score would be enough to decide whether the regex parser handled an order or whether it needed to escalate to AI. High confidence meant "regex got it." Low confidence meant "send it to Claude."

What we found: Confidence has multiple dimensions that matter independently. A message like "adobo po" might have high product confidence — we know exactly what the customer wants — but low quantity confidence, because they never specified how many. Under a single score, this would either round up to "confident enough" (risking a wrong quantity) or round down to "not confident" (wasting an AI call on a message where we already know the product).

What we changed: The parser now tracks structured confidence with separate scores for product identification, quantity extraction, and an overall composite. Each dimension has its own threshold. When product confidence is high but quantity is uncertain, the system asks a targeted clarification question — "How many adobo?" — instead of re-parsing the entire order with AI. When product confidence itself is low, it escalates to the full AI cascade.
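The routing logic described above can be sketched roughly as follows. This is an illustrative sketch, not Elizabeth.ai's actual code: the class name, threshold values, and route labels are all hypothetical, and the real composite score may be weighted differently.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would be tuned per deployment.
PRODUCT_THRESHOLD = 0.8
QUANTITY_THRESHOLD = 0.8

@dataclass
class ParseConfidence:
    product: float   # how sure we are about *what* was ordered
    quantity: float  # how sure we are about *how many*

    @property
    def composite(self) -> float:
        # Overall score: the parse is only as strong as its weakest dimension.
        return min(self.product, self.quantity)

def route(conf: ParseConfidence) -> str:
    """Decide the parser's next step from per-dimension scores."""
    if conf.product >= PRODUCT_THRESHOLD and conf.quantity >= QUANTITY_THRESHOLD:
        return "accept"            # regex got it; no AI call needed
    if conf.product >= PRODUCT_THRESHOLD:
        return "clarify_quantity"  # ask "How many adobo?" instead of re-parsing
    return "escalate_to_ai"        # product itself is unclear; full AI cascade

# "adobo po": product is clear, quantity was never stated.
print(route(ParseConfidence(product=0.95, quantity=0.3)))  # clarify_quantity
```

The key design point is that a low quantity score no longer drags the whole parse below a single threshold; it triggers a cheap, targeted question instead.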

The result: fewer unnecessary AI calls, lower cost, and a better customer experience. Instead of a generic "is this your order?" confirmation, the system asks precisely the question it needs answered.

2. AI Hallucinates Orders That Sound Plausible

What we assumed: When an AI provider parses an order and returns structured items, those items actually exist in the customer's message. If the customer said "2 adobo," the AI would return adobo with quantity 2.

What we found: AI models sometimes invent items that were never mentioned. A customer writes "2 adobo" and the model returns adobo plus sinigang — because those dishes frequently co-occur in Filipino food ordering contexts within training data. The hallucinated items look plausible. They are real products in the merchant's catalog. But the customer never ordered them.

This is particularly dangerous because the items pass surface-level validation. They exist in the catalog, the quantities are reasonable, and the format is correct. Without a specific check, the system would confidently present a wrong order to the customer.

What we changed: We added a post-parse hallucination filter with three guards. First, a lexical overlap check — does the returned item have meaningful word overlap with the original message? If the customer never mentioned "sinigang" in any form, it gets flagged. Second, a single-product guard — if the message contains no item separators (no "and," "tsaka," commas) but the AI returned multiple items, we keep only the best-matched one. Third, an excess-item guard — if the AI returned more items than the message has meaningful segments, the weakest-scoring extras get dropped.

Items that fail these checks are quietly removed before the customer ever sees them.
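The three guards can be sketched as a single post-parse filter. This is a minimal illustration under assumed data shapes: the item dicts, the filler-word list, and the separator set are simplified stand-ins, not the production implementation.

```python
import re

FILLERS = {"po", "na", "lang", "ng", "yung", "ang"}  # assumed stop-word list
SEPARATORS = re.compile(r"\b(?:and|at|tsaka|saka)\b|,")

def _overlaps(item_name: str, message: str) -> bool:
    """Guard 1: the item must share at least one meaningful word
    with the customer's actual message."""
    msg_words = set(re.findall(r"\w+", message.lower())) - FILLERS
    item_words = set(re.findall(r"\w+", item_name.lower()))
    return bool(item_words & msg_words)

def filter_hallucinations(message, items):
    """items: list of dicts like {"name": ..., "score": ...}."""
    # Guard 1: drop items with no lexical overlap with the message.
    kept = [it for it in items if _overlaps(it["name"], message)]
    # Guard 2: no separators in the message means at most one item;
    # keep only the best-matched candidate.
    if not SEPARATORS.search(message) and len(kept) > 1:
        kept = [max(kept, key=lambda it: it["score"])]
    # Guard 3: never keep more items than the message has segments.
    segments = [s for s in SEPARATORS.split(message) if s and s.strip()]
    if len(kept) > len(segments):
        kept = sorted(kept, key=lambda it: it["score"], reverse=True)[: len(segments)]
    return kept

result = filter_hallucinations(
    "2 adobo po",
    [{"name": "adobo", "score": 0.97}, {"name": "sinigang", "score": 0.85}],
)
print([it["name"] for it in result])  # ['adobo']: hallucinated sinigang removed
```

Here the hallucinated "sinigang" fails the overlap check and is dropped before the order ever reaches the customer.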

3. Every Merchant's Language Is Different

What we assumed: A default set of Filipino number words ("isa," "dalawa," "tatlo") and item separators ("tsaka," "saka," "at," "and") would cover the vast majority of merchants.

What we found: Filipino commerce language varies more than we anticipated. Sellers in different regions and product categories have distinct vocabularies. Some customers use "tas" or "tapos" as connectors that our default separator list did not include. Some niches use number words or abbreviations specific to their community. A single hardcoded set of parsing rules could not flex to match each merchant's actual customer language.

The regex parser was handling these messages correctly when customers used standard patterns, but dropping to AI unnecessarily when they used regional or niche vocabulary. This meant higher AI costs for merchants whose customers happened to speak differently from our defaults.

What we changed: The parser configuration is now runtime-extensible per merchant. Each merchant can have additional number words and item separators beyond the defaults, driven entirely by configuration. When a merchant's customers consistently use "tapos" as a connector, adding it to their config lets the regex engine handle those orders at zero cost instead of escalating to AI.

No code deploys needed. The regex engine reads the merchant's config at parse time and adapts accordingly.
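A config merge along these lines shows the idea. The function name, config keys, and default lists are illustrative assumptions; the real defaults are far larger and merchant configs live in storage, not inline dicts.

```python
import re

# Platform-wide defaults (a small subset, for illustration).
DEFAULT_SEPARATORS = ["tsaka", "saka", "at", "and"]
DEFAULT_NUMBER_WORDS = {"isa": 1, "dalawa": 2, "tatlo": 3}

def build_parser_config(merchant_config: dict) -> dict:
    """Merge per-merchant additions (plain config, no code deploy)
    on top of the defaults at parse time."""
    separators = DEFAULT_SEPARATORS + merchant_config.get("extra_separators", [])
    number_words = {**DEFAULT_NUMBER_WORDS,
                    **merchant_config.get("extra_number_words", {})}
    # One alternation; longer tokens first so "tsaka" wins over "at".
    pattern = "|".join(sorted(map(re.escape, separators), key=len, reverse=True))
    return {
        "separator_re": re.compile(rf"\b(?:{pattern})\b|,"),
        "number_words": number_words,
    }

# A merchant whose customers use "tapos" and "tas" as connectors:
cfg = build_parser_config({"extra_separators": ["tapos", "tas"]})
print(bool(cfg["separator_re"].search("1 adobo tapos 2 sinigang")))  # True
```

Because the regex is compiled from config at parse time, adding a regional connector is a data change, not a release.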

4. Trust but Verify — Even AI Needs a Cross-Check

What we assumed: When an AI provider returns a high-confidence parse, the quantities are correct. If Claude says the customer ordered 3 sinigang, it is 3 sinigang.

What we found: In edge cases, AI providers confidently return wrong quantities. A customer writes "3 adobo" and the model returns quantity 2, or a single quantity gets split across items that should share it. These errors are subtle: the overall parse looks correct, the products are right, and only the numbers are slightly off. But for a seller fulfilling physical orders, a wrong quantity means either wasted product or an unhappy customer.

What we changed: We added a quantity verification step that cross-checks AI-parsed quantities against the raw message text. If the numbers in the original message do not match the quantities in the parsed result, the system flags the order for confirmation rather than silently accepting it. We also lowered the default confidence assigned to AI parses that do not self-report a confidence score. Previously, an AI result without an explicit confidence was treated as high-confidence by default. Now it is treated as moderate-confidence, which triggers customer confirmation — a safer default that catches errors the model itself does not flag.
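The cross-check can be sketched like this. It is a simplified illustration: the number-word map is a tiny stand-in for the merchant-extensible list, and the rule that an unstated quantity defaults to 1 is an assumption about the parsed-item shape.

```python
import re

# Assumed number-word map; the real list is merchant-extensible.
NUMBER_WORDS = {"isa": 1, "dalawa": 2, "tatlo": 3,
                "one": 1, "two": 2, "three": 3}

def quantities_in(message: str) -> list:
    """Every quantity the raw message actually contains, whether
    written as digits or as Filipino/English number words."""
    found = []
    for token in re.findall(r"\w+", message.lower()):
        if token.isdigit():
            found.append(int(token))
        elif token in NUMBER_WORDS:
            found.append(NUMBER_WORDS[token])
    return found

def verify_quantities(message, parsed_items) -> bool:
    """Cross-check: every AI-parsed quantity, other than the implicit
    default of 1, must appear somewhere in the original text."""
    stated = quantities_in(message)
    return all(it["qty"] == 1 or it["qty"] in stated for it in parsed_items)

# "3 adobo" parsed as quantity 2: mismatch, flag for confirmation.
print(verify_quantities("3 adobo", [{"name": "adobo", "qty": 2}]))  # False
print(verify_quantities("3 adobo", [{"name": "adobo", "qty": 3}]))  # True
```

An order that fails the check is routed to customer confirmation rather than silently accepted.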

The principle: trust the AI to do the heavy lifting on complex orders, but verify the basics before committing.

The Parser Is Stronger for It

Each of these lessons made both layers of our hybrid system better. The regex parser gained smarter escalation logic — it knows when to ask a targeted question versus when to hand off entirely. The AI layer gained guardrails that catch hallucinations and verify outputs.

The core philosophy has not changed: regex first, AI when needed. But "when needed" is now a much more nuanced decision, and "what AI returns" goes through more scrutiny before reaching the customer.

For the foundational architecture behind all of this, read Hybrid Regex + AI: Why We Parse 60% of Orders Without AI and Building a Taglish-Aware NLP Parser.


Building an order automation tool for Filipino sellers? Try Elizabeth.ai free and see the parser in action.