Hybrid Regex + AI: Why We Parse 60% of Orders Without AI
A deep dive into Elizabeth.ai's hybrid order parsing architecture: why regex handles 60% of orders at zero marginal cost, and when AI takes over for complex Taglish messages.
By Elizabeth.ai Team
The Architecture Decision
When we designed Elizabeth.ai's order parsing system, we faced a classic engineering tradeoff: accuracy versus cost. Pure AI parsing (sending every message to Claude or GPT) gives you the highest accuracy, but the API costs scale linearly with volume. Pure regex parsing is free but cannot handle ambiguous or complex messages.
We chose a hybrid approach. Regex first, AI only when needed. The result: approximately 60% of orders are parsed by regex at zero marginal cost, and the remaining 40% are routed through our AI cascade.
This was not an arbitrary split. It emerged from studying Filipino order message patterns and understanding their structure.
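The "regex first, AI only when needed" flow can be sketched in a few lines. This is an illustration rather than our production code: parse_hybrid and the two stub parsers are hypothetical names, and the only assumption is that the regex parser returns None when it cannot confidently parse a message.

```python
from typing import Callable, Optional

Order = list[tuple[str, int]]  # [(product, quantity), ...]


def parse_hybrid(
    message: str,
    regex_parse: Callable[[str], Optional[Order]],
    ai_parse: Callable[[str], Order],
) -> tuple[Order, str]:
    """Try the free regex parser first; fall back to the paid AI parser."""
    result = regex_parse(message)
    if result is not None:
        return result, "regex"       # zero marginal cost
    return ai_parse(message), "ai"   # one AI call


# Stand-in parsers so the sketch runs on its own (both are placeholders).
def stub_regex(message: str) -> Optional[Order]:
    return [("adobo", 2)] if message == "adobo x2" else None


def stub_ai(message: str) -> Order:
    return [("adobo", 1)]  # placeholder result, not a real model call


print(parse_hybrid("adobo x2", stub_regex, stub_ai))
print(parse_hybrid("Adobo for the whole family", stub_regex, stub_ai))
```

The router itself holds no parsing logic, so either side can be swapped out without touching the other.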
Why Regex Works for 60% of Orders
Most order messages, even in Taglish, follow predictable patterns. Customers order in structured ways:
Pattern 1: Quantity + Product
"2 chicken adobo, 3 sinigang"
Pattern 2: Product + Quantity
"adobo x2, sinigang x3"
Pattern 3: Claim Pattern
"mine po" (implicit quantity = 1 of the post's product)
Pattern 4: Numbered List
"1. Adobo - 2 pcs\n2. Sinigang - 1\n3. Ube cake - 3 slices"
These patterns can be captured with well-crafted regular expressions. The regex engine does not need to understand Filipino grammar or English semantics; it only needs to identify quantities, match product names against the merchant's catalog, and extract delivery and payment details.
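To make the four patterns concrete, here is a minimal sketch of how they might be expressed with Python's re module. The catalog, unit list, and function name are assumptions for illustration, and the expressions are deliberately simpler than anything a production parser would need.

```python
import re
from typing import Optional

# Hypothetical catalog and unit list; real values would come from the merchant's config.
CATALOG = ["chicken adobo", "adobo", "sinigang", "ube cake"]
UNITS = r"(?:pcs|pieces|slices?|box|dozen|dz)"

# Longest names first so "chicken adobo" is tried before "adobo".
PRODUCT = "|".join(re.escape(p) for p in sorted(CATALOG, key=len, reverse=True))

QTY_FIRST = re.compile(rf"(\d+)\s*(?:{UNITS}\s+)?({PRODUCT})", re.IGNORECASE)
PRODUCT_FIRST = re.compile(rf"({PRODUCT})\s*[x-]?\s*(\d+)(?:\s*{UNITS})?", re.IGNORECASE)
CLAIM = re.compile(r"\s*mine(?:\s+po)?\W*", re.IGNORECASE)
LIST_PREFIX = re.compile(r"^\d+[.)]\s+")


def regex_parse(message: str, post_product: Optional[str] = None):
    """Return [(product, quantity), ...], or None if any part is unrecognized."""
    if CLAIM.fullmatch(message):
        # Pattern 3: a claim means one unit of the product in the post.
        return [(post_product, 1)] if post_product else None
    items = []
    for segment in re.split(r"[,\n]+", message):
        segment = LIST_PREFIX.sub("", segment.strip())  # Pattern 4 numbering
        if not segment:
            continue
        if m := QTY_FIRST.fullmatch(segment):        # Pattern 1
            items.append((m.group(2).lower(), int(m.group(1))))
        elif m := PRODUCT_FIRST.fullmatch(segment):  # Pattern 2 and list rows
            items.append((m.group(1).lower(), int(m.group(2))))
        else:
            return None  # anything unexpected is left for the AI cascade
    return items or None


print(regex_parse("2 chicken adobo, 3 sinigang"))
print(regex_parse("adobo x2, sinigang x3"))
print(regex_parse("mine po", post_product="ube cake"))
```

Returning None for anything that does not match in full is the design choice that matters here: the sketch prefers to give up rather than guess.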
The Parser Config System
Each merchant's regex parser is driven by a configuration object (ParserConfig) that contains:
- Product catalog: Names, aliases, and common misspellings for each product
- Units: Recognized unit words ("pcs", "pieces", "box", "dozen", "dz")
- Filipino quantity words: "isa" (1), "dalawa" (2), "tatlo" (3), etc.
- Stop words: Filipino particles to ignore ("po", "naman", "lang", "din")
- Connectors: Words that signal additional items ("tsaka", "saka", "at", "and", "plus")
This config-driven approach means the regex engine adapts to each merchant's unique vocabulary without code changes. A food seller and a clothing seller have completely different product names, but the same regex patterns work for both.
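As a rough picture of what such a configuration could look like, the sketch below models it as a Python dataclass. Only the ParserConfig name and the five categories come from the description above; the field names, the canonical and normalize helpers, and the sample merchants are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParserConfig:
    # Canonical product name -> aliases and common misspellings.
    products: dict[str, list[str]]
    units: set[str] = field(default_factory=lambda: {"pcs", "pieces", "box", "dozen", "dz"})
    quantity_words: dict[str, int] = field(
        default_factory=lambda: {"isa": 1, "dalawa": 2, "tatlo": 3}
    )
    stop_words: set[str] = field(default_factory=lambda: {"po", "naman", "lang", "din"})
    connectors: set[str] = field(
        default_factory=lambda: {"tsaka", "saka", "at", "and", "plus"}
    )

    def canonical(self, name: str) -> Optional[str]:
        """Map a product name, alias, or known misspelling to its canonical name."""
        name = name.lower()
        for product, aliases in self.products.items():
            if name == product or name in aliases:
                return product
        return None


def normalize(message: str, config: ParserConfig) -> str:
    """Drop stop words, turn quantity words into digits, and connectors into commas."""
    tokens = []
    for token in message.lower().split():
        word = token.strip(".,!?")
        if word in config.stop_words:
            continue
        if word in config.connectors:
            tokens.append(",")
        elif word in config.quantity_words:
            tokens.append(str(config.quantity_words[word]))
        else:
            tokens.append(token)
    return " ".join(tokens).replace(" ,", ",")


# Two merchants, same code, different vocabularies.
food = ParserConfig(products={"chicken adobo": ["adobo", "adbo"]})
clothing = ParserConfig(products={"oversized tee": ["tee", "tshirt"]})

print(normalize("Dalawa adobo po tsaka isa sinigang", food))
print(food.canonical("Adbo"), "|", clothing.canonical("tshirt"))
```

After normalization the message is back in the "quantity + product" shape, so the same patterns apply regardless of which merchant's vocabulary was used.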
The Cost Analysis
Let us quantify why this matters.
Assume a merchant processes 5,000 orders per month. With pure AI parsing using a mid-tier model:
- Cost per AI call: ~$0.003-0.01 (depending on provider and prompt length)
- Monthly AI cost: $15-50 for parsing alone
- Annual: $180-600
With our hybrid approach at a 60/40 split:
- Regex (3,000 orders): $0
- AI cascade (2,000 orders): $6-20/month
- Monthly savings: $9-30
- Annual savings: $108-360
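The figures above follow from simple multiplication, reproduced here so the assumptions are easy to change. The function name is ours for illustration; the inputs are the 5,000 orders, the 40% AI share, and the two ends of the per-call price range quoted above.

```python
def parsing_costs(orders_per_month: int, ai_share: float, cost_per_call: float) -> dict:
    """Compare pure-AI parsing with a hybrid split where only ai_share of orders hit the AI."""
    pure_ai = orders_per_month * cost_per_call
    hybrid = orders_per_month * ai_share * cost_per_call
    savings = pure_ai - hybrid
    return {
        "pure_ai": pure_ai,
        "hybrid": hybrid,
        "monthly_savings": savings,
        "annual_savings": 12 * savings,
    }


for cost_per_call in (0.003, 0.01):
    costs = parsing_costs(5_000, 0.4, cost_per_call)
    summary = ", ".join(f"{name}=${value:.2f}" for name, value in costs.items())
    print(f"${cost_per_call} per call: {summary}")
```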
For a single merchant, the savings are modest. But Elizabeth.ai serves many merchants on shared infrastructure. At platform scale, 60% regex coverage removes roughly 60% of our AI parsing calls, which is what lets us offer a meaningful Free tier (100 orders/month) and keep Pro pricing at PHP 2,499/month.
When Regex Fails (and AI Takes Over)
The regex parser has clear limitations. It escalates to the AI cascade when:
Ambiguous Quantities
"Adobo for the whole family"
How many servings is "the whole family"? The regex parser cannot infer this. The AI cascade, given the merchant's product catalog and typical serving sizes, can make a reasonable estimate or ask a clarifying question.
Conversational Orders
"Ate, yung ginawa mo last week na cake na sobrang sarap, yun ulit, dalawa. Tsaka yung cookies na kasama nun."
This order references a previous purchase ("yung ginawa mo last week"), uses a relative description ("yung cookies na kasama nun" = the cookies that came with it), and requires conversational context. Pure regex cannot resolve these references.
Complex Modifications
"Sinigang pero wag po yung maanghang. Pwede po bang less sabaw? Tapos extra rice 2."
Modifications ("wag yung maanghang" = not the spicy one, "less sabaw" = less broth) require understanding intent, not just pattern matching.
Emoji-Heavy Messages
"chicken adobo 2, sinigang 1, love the ube cake!!!"
The regex parser cannot tell whether an emoji is decorative or meaningful (some sellers use specific emojis as product identifiers), so it may misinterpret the message. The AI layer uses the surrounding context to decide.
Accuracy Tradeoffs
Our regex parser operates with a confidence threshold. If pattern matching produces a result but with low confidence (e.g., a fuzzy product name match with high Levenshtein distance), it still escalates to AI for verification.
This means the regex parser prioritizes precision over recall:
- Precision: When the regex parser returns a result, it is almost always correct (~98%)
- Recall: It only returns a result for messages it is confident about (~60% of all messages) and escalates the rest
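The confidence gate for fuzzy product matches can be illustrated with a small sketch. The scoring formula (one minus edit distance over the longer string's length), the 0.8 threshold, and the function names are all assumptions chosen for the example, not the values our parser uses.

```python
CONFIDENCE_THRESHOLD = 0.8  # hypothetical cut-off for this sketch
CATALOG = ["chicken adobo", "sinigang", "ube cake"]  # hypothetical catalog


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


def best_match(word: str, catalog: list[str]) -> tuple[str, float]:
    """Return the closest catalog entry and a confidence score between 0 and 1."""
    scored = []
    for product in catalog:
        distance = levenshtein(word.lower(), product)
        scored.append((product, 1 - distance / max(len(word), len(product))))
    return max(scored, key=lambda pair: pair[1])


def route(word: str, catalog: list[str], threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Accept the regex result only when the fuzzy match is confident enough."""
    _, confidence = best_match(word, catalog)
    return "regex" if confidence >= threshold else "ai"


print(best_match("sinigan", CATALOG), route("sinigan", CATALOG))
print(route("sisig", CATALOG))
```

A one-letter typo such as "sinigan" stays on the regex path, while a word that is far from every catalog entry is escalated instead of being forced onto the nearest product.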
The AI cascade handles the remaining 40% with its own accuracy profile (~94-96% for the primary provider), and the combination delivers overall system accuracy above 96%.
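The overall figure is a weighted average of the two paths. The short calculation below assumes exactly 60% of messages take the regex path at ~98% and the remaining 40% take the AI path at 94-96%, treating regex precision as the accuracy of that share; the function name is illustrative.

```python
def blended_accuracy(regex_share: float, regex_precision: float, ai_accuracy: float) -> float:
    """Weighted average of regex-path and AI-path accuracy."""
    return regex_share * regex_precision + (1 - regex_share) * ai_accuracy


low = blended_accuracy(0.60, 0.98, 0.94)
high = blended_accuracy(0.60, 0.98, 0.96)
print(f"{low:.1%} to {high:.1%}")  # 96.4% to 97.2%
```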
For more on the AI cascade architecture, see Designing a 3-Provider AI Cascade with Circuit Breakers. For Taglish-specific parsing challenges, read Building a Taglish-Aware NLP Parser.
Update: Since writing this post, we've significantly evolved the parser's confidence system, added hallucination filtering, and made the regex engine runtime-configurable per merchant. Read 4 Things We Got Wrong in Our First Hybrid Parser for the full story.
Curious about the technology behind your order automation? Try Elizabeth.ai free and see the hybrid parser in action on your own orders.