Pillion Intelligence
Enterprise GenAI Platform
Pillion Intelligence engaged me to lead the delivery of their enterprise GenAI conversational platform entirely from scratch — no team, no architecture, no product definition. The platform needed to serve enterprise clients with complex, domain-specific queries at scale, using the best available LLMs dynamically routed based on query type, cost, and accuracy requirements.
Quick facts
- Client: Pillion Intelligence
- Role: Product Manager
- Category: GenAI · SaaS
- Year: 2025
- Duration: 5 months
- 0→MVP: Full product build
- 100%: Sprint delivery rate
- Multi-LLM: Dynamic routing
- C-suite: Sign-off achieved
Background & context
Enterprise GenAI is a different problem to consumer GenAI. Enterprise users ask domain-specific, high-stakes questions where a hallucinated answer has real business consequences. They have longer sessions, more context, and less tolerance for failure. The "connect to GPT-4 and deploy" approach that works for demos fails in production because it optimises for peak capability rather than consistent reliability.
The challenge
What made this hard.
01
Zero-to-one with a hard deadline
There was no existing team, no architecture, no product spec, and no prior work to build on. Everything — team hiring, requirement gathering, product definition, technical architecture, and delivery execution — had to happen in parallel under a tight timeline with C-suite stakeholders who needed to validate a production-ready MVP.
02
Multi-model LLM routing in production
Different enterprise query types have fundamentally different requirements. A simple factual lookup needs a model that is fast, cheap, and reliable. A nuanced multi-step analysis needs deep reasoning capability. A code generation request needs a coding-specialist model. Routing all queries to a single model either wastes money or sacrifices quality on tasks where that model is weak.
03
The hallucination problem at enterprise scale
Enterprise clients cannot tolerate confident wrong answers. The platform needed a resolution architecture that could distinguish between queries the AI could handle reliably and queries that required human verification — and transition between the two channels seamlessly, without making the user feel like the AI had failed.
The approach
How we solved it.
Recruited and structured the team in parallel with requirements
Rather than waiting for a complete requirements document before hiring, I recruited the team in two cohorts — first the senior engineers needed for architecture decisions, then the execution team once the scope was clear enough to define capacity needs. This parallel approach compressed the pre-build phase significantly.
OpenRouter for dynamic multi-model routing
OpenRouter provides a unified API across 100+ LLMs, enabling dynamic routing based on configurable criteria. I architected a routing layer that classified each incoming query by complexity, domain, and latency requirements, then selected the optimal model accordingly. Simple factual queries were routed to fast, cost-effective models. Complex reasoning tasks were routed to frontier models. The system tracked per-query cost and accuracy, enabling continuous routing optimisation.
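The routing layer described above can be sketched as a small classify-then-select step. This is an illustrative reconstruction, not the production code: the keyword heuristics, the tier names, and the specific OpenRouter model IDs are all assumptions for the sketch (the real classifier and model choices were configured per client).

```python
# Hypothetical model tiers mapped to OpenRouter-style model IDs.
# The real system tracked per-query cost and accuracy to tune this table.
MODEL_TIERS = {
    "fast": "openai/gpt-4o-mini",            # cheap, low-latency factual lookups
    "reasoning": "anthropic/claude-3.5-sonnet",  # multi-step analysis
    "coding": "deepseek/deepseek-coder",     # code generation requests
}

# Toy keyword hints standing in for a proper query classifier.
CODE_HINTS = ("code", "function", "script", "bug", "compile")
REASONING_HINTS = ("analyse", "compare", "why", "plan", "strategy")

def classify_query(query: str) -> str:
    """Classify a query into a routing tier: domain keywords first, then length."""
    q = query.lower()
    if any(hint in q for hint in CODE_HINTS):
        return "coding"
    if any(hint in q for hint in REASONING_HINTS) or len(q.split()) > 40:
        return "reasoning"
    return "fast"

def route(query: str) -> str:
    """Return the model ID to send this query to."""
    return MODEL_TIERS[classify_query(query)]
```

In production the classifier would be a model or a trained lightweight scorer rather than keyword matching, but the shape is the same: cheap classification up front, then a single lookup into a routing table that finance and engineering can both inspect.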
RAG pipeline for enterprise knowledge grounding
To address hallucination on domain-specific content, I designed a RAG (Retrieval-Augmented Generation) pipeline that grounded model responses in the client's verified knowledge base. Incoming queries were vectorised, matched against the knowledge store, and the retrieved context was injected into the model prompt before generation. This dramatically improved factual accuracy for domain-specific queries.
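The retrieve-then-inject flow can be shown with a minimal sketch. The embedding and similarity functions below are deliberate toys (token sets and Jaccard overlap standing in for a real embedding model and cosine similarity over a vector store); the prompt wording is also an assumption, not the production template.

```python
def embed(text: str) -> set[str]:
    # Stand-in for a real embedding model: a bag of lowercase tokens.
    return set(text.lower().split())

def similarity(a: set[str], b: set[str]) -> float:
    # Jaccard overlap as a toy proxy for cosine similarity.
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k knowledge-base entries most similar to the query."""
    q = embed(query)
    ranked = sorted(knowledge_base,
                    key=lambda doc: similarity(q, embed(doc)),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, knowledge_base: list[str]) -> str:
    """Inject retrieved context into the model prompt before generation."""
    context = "\n".join(retrieve(query, knowledge_base))
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

The key design point survives the simplification: the model is instructed to answer from verified retrieved context, and to refuse rather than improvise when the context is insufficient.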
Dual-channel resolution model
The platform distinguished between AI-resolvable queries and queries requiring human verification using a confidence scoring system. High-confidence queries received AI-generated responses directly. Low-confidence queries were flagged and routed to a human review queue, with the AI-generated draft surfaced as a starting point for the reviewer. This hybrid model delivered the speed of AI with the reliability of human oversight where it mattered.
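The dual-channel gate reduces to a single threshold check over a confidence score. A minimal sketch, assuming a pre-computed confidence value and an illustrative threshold (the real threshold was tuned per client, and the scoring itself combined retrieval quality and model signals not shown here):

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # illustrative; tuned per deployment in practice

@dataclass
class Resolution:
    channel: str   # "ai" for direct delivery, "human_review" for the queue
    response: str  # the final answer, or the draft surfaced to the reviewer

def resolve(draft: str, confidence: float) -> Resolution:
    """Route by confidence: deliver directly when high, queue for review when low."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return Resolution(channel="ai", response=draft)
    # Low confidence: the AI draft is kept as the reviewer's starting point,
    # so the human verifies and edits rather than writing from scratch.
    return Resolution(channel="human_review", response=draft)
```

Surfacing the draft in both channels is what makes the handover seamless for the user: the reviewer edits an existing answer, so the low-confidence path costs seconds of human time rather than minutes.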
Impact & outcomes
What we delivered.
Production-ready MVP delivered on schedule
Full product delivered from zero — team, architecture, product, and deployment — within the committed timeline. C-suite validation and sign-off achieved at the first review session.
100% sprint delivery consistency
Rigorous scope control, daily standups, and milestone tracking achieved perfect sprint delivery throughout the engagement. Not a single sprint closed with unresolved blockers.
Multi-model routing reduced LLM cost significantly
Routing cost-appropriate models to each query type reduced per-query LLM API cost while maintaining or improving accuracy on high-stakes queries — routing premium models only where they were needed.
Dual-channel resolution improved enterprise query outcomes
The hybrid AI/human resolution model measurably improved resolution accuracy for complex, domain-specific enterprise queries compared to a pure AI approach.
Tools & technologies
Lessons learned
What this taught me.
In GenAI product delivery, the hardest problems are not the AI problems — they are the product problems. What should the system do when it is not sure? How does it fail gracefully? How does it earn user trust over time?
Multi-model routing is not a premature optimisation — it is a product architecture decision that has significant downstream implications for cost, quality, and maintainability. Make it early.
The best way to build a GenAI team quickly is to hire for first-principles reasoning ability, not just current LLM framework knowledge. The frameworks change every six months. The thinking ability does not.