Pillion Intelligence
Enterprise GenAI Platform
Pillion Intelligence engaged me to lead the delivery of their enterprise GenAI conversational platform entirely from scratch — no team, no architecture, no product definition. The platform needed to serve enterprise clients with complex, domain-specific queries at scale, using the best available LLMs dynamically routed based on query type, cost, and accuracy requirements.
Quick facts
- Client: Pillion Intelligence
- Role: Product Manager
- Category: GenAI · SaaS
- Year: 2025
- Duration: 5 months
- 0→MVP: Full product build
- 100%: Sprint delivery rate
- Multi-LLM: Dynamic routing
- C-suite: Sign-off achieved
Background & context
Enterprise GenAI is a different problem to consumer GenAI. Enterprise users ask domain-specific, high-stakes questions where a hallucinated answer has real business consequences. They have longer sessions, more context, and less tolerance for failure. The "connect to GPT-4 and deploy" approach that works for demos fails in production because it optimises for peak capability rather than consistent reliability.
The challenge
What made this hard.
01
Zero-to-one with a hard deadline
There was no existing team, no architecture, no product spec, and no prior work to build on. Everything — team hiring, requirement gathering, product definition, technical architecture, and delivery execution — had to happen in parallel under a tight timeline with C-suite stakeholders who needed to validate a production-ready MVP.
02
Multi-model LLM routing in production
Different enterprise query types have fundamentally different requirements. A simple factual lookup needs a model that is fast, cheap, and reliable. A nuanced multi-step analysis needs deep reasoning capability. A code generation request needs a coding-specialist model. Routing all queries to a single model either wastes money or sacrifices quality on tasks where that model is weak.
03
The hallucination problem at enterprise scale
Enterprise clients cannot tolerate confident wrong answers. The platform needed a resolution architecture that could distinguish between queries the AI could handle reliably and queries that required human verification — and transition between the two channels seamlessly, without making the user feel like the AI had failed.
The approach
How we solved it.
Recruited and structured the team in parallel with requirements
Rather than waiting for a complete requirements document before hiring, I recruited the team in two cohorts — first the senior engineers needed for architecture decisions, then the execution team once the scope was clear enough to define capacity needs. This parallel approach compressed the pre-build phase significantly.
OpenRouter for dynamic multi-model routing
OpenRouter provides a unified API across 100+ LLMs, enabling dynamic routing based on configurable criteria. I architected a routing layer that classified each incoming query by complexity, domain, and latency requirements, then selected the optimal model accordingly. Simple factual queries were routed to fast, cost-effective models. Complex reasoning tasks were routed to frontier models. The system tracked per-query cost and accuracy, enabling continuous routing optimisation.
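The routing layer described above can be sketched as a small classify-then-select step. This is an illustrative reconstruction, not the production code: the keyword heuristics, the tier names, and the specific OpenRouter model IDs are all assumptions for the sketch (the real classifier and model choices were configured per client).

```python
# Hypothetical model tiers mapped to OpenRouter-style model IDs.
# The real system tracked per-query cost and accuracy to tune this table.
MODEL_TIERS = {
    "fast": "openai/gpt-4o-mini",            # cheap, low-latency factual lookups
    "reasoning": "anthropic/claude-3.5-sonnet",  # multi-step analysis
    "coding": "deepseek/deepseek-coder",     # code generation requests
}

# Toy keyword hints standing in for a proper query classifier.
CODE_HINTS = ("code", "function", "script", "bug", "compile")
REASONING_HINTS = ("analyse", "compare", "why", "plan", "strategy")

def classify_query(query: str) -> str:
    """Classify a query into a routing tier: domain keywords first, then length."""
    q = query.lower()
    if any(hint in q for hint in CODE_HINTS):
        return "coding"
    if any(hint in q for hint in REASONING_HINTS) or len(q.split()) > 40:
        return "reasoning"
    return "fast"

def route(query: str) -> str:
    """Return the model ID to send this query to."""
    return MODEL_TIERS[classify_query(query)]
```

In production the classifier would be a model or a trained lightweight scorer rather than keyword matching, but the shape is the same: cheap classification up front, then a single lookup into a routing table that finance and engineering can both inspect.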
RAG pipeline for enterprise knowledge grounding
To address hallucination on domain-specific content, I designed a RAG (Retrieval-Augmented Generation) pipeline that grounded model responses in the client's verified knowledge base. Incoming queries were vectorised, matched against the knowledge store, and the retrieved context was injected into the model prompt before generation. This dramatically improved factual accuracy for domain-specific queries.
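The retrieve-then-inject flow can be shown with a minimal sketch. The embedding and similarity functions below are deliberate toys (token sets and Jaccard overlap standing in for a real embedding model and cosine similarity over a vector store); the prompt wording is also an assumption, not the production template.

```python
def embed(text: str) -> set[str]:
    # Stand-in for a real embedding model: a bag of lowercase tokens.
    return set(text.lower().split())

def similarity(a: set[str], b: set[str]) -> float:
    # Jaccard overlap as a toy proxy for cosine similarity.
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k knowledge-base entries most similar to the query."""
    q = embed(query)
    ranked = sorted(knowledge_base,
                    key=lambda doc: similarity(q, embed(doc)),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, knowledge_base: list[str]) -> str:
    """Inject retrieved context into the model prompt before generation."""
    context = "\n".join(retrieve(query, knowledge_base))
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

The key design point survives the simplification: the model is instructed to answer from verified retrieved context, and to refuse rather than improvise when the context is insufficient.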
Dual-channel resolution model
The platform distinguished between AI-resolvable queries and queries requiring human verification using a confidence scoring system. High-confidence queries received AI-generated responses directly. Low-confidence queries were flagged and routed to a human review queue, with the AI-generated draft surfaced as a starting point for the reviewer. This hybrid model delivered the speed of AI with the reliability of human oversight where it mattered.
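The dual-channel gate reduces to a single threshold check over a confidence score. A minimal sketch, assuming a pre-computed confidence value and an illustrative threshold (the real threshold was tuned per client, and the scoring itself combined retrieval quality and model signals not shown here):

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # illustrative; tuned per deployment in practice

@dataclass
class Resolution:
    channel: str   # "ai" for direct delivery, "human_review" for the queue
    response: str  # the final answer, or the draft surfaced to the reviewer

def resolve(draft: str, confidence: float) -> Resolution:
    """Route by confidence: deliver directly when high, queue for review when low."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return Resolution(channel="ai", response=draft)
    # Low confidence: the AI draft is kept as the reviewer's starting point,
    # so the human verifies and edits rather than writing from scratch.
    return Resolution(channel="human_review", response=draft)
```

Surfacing the draft in both channels is what makes the handover seamless for the user: the reviewer edits an existing answer, so the low-confidence path costs seconds of human time rather than minutes.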
Impact & outcomes
What we delivered.
Production-ready MVP delivered on schedule
Full product delivered from zero — team, architecture, product, and deployment — within the committed timeline. C-suite validation and sign-off achieved at the first review session.
100% sprint delivery consistency
Rigorous scope control, daily standups, and milestone tracking achieved perfect sprint delivery throughout the engagement. Not a single sprint closed with unresolved blockers.
Multi-model routing reduced LLM cost significantly
Routing cost-appropriate models to each query type reduced per-query LLM API cost while maintaining or improving accuracy on high-stakes queries — routing premium models only where they were needed.
Dual-channel resolution improved enterprise query outcomes
The hybrid AI/human resolution model measurably improved resolution accuracy for complex, domain-specific enterprise queries compared to a pure AI approach.
Tools & technologies
Lessons learned
What this taught me.
In GenAI product delivery, the hardest problems are not the AI problems — they are the product problems. What should the system do when it is not sure? How does it fail gracefully? How does it earn user trust over time?
Multi-model routing is not a premature optimisation — it is a product architecture decision that has significant downstream implications for cost, quality, and maintainability. Make it early.
The best way to build a GenAI team quickly is to hire for first-principles reasoning ability, not just current LLM framework knowledge. The frameworks change every six months. The thinking ability does not.