Proof of work

Real problems.
Real results.

Three anonymised case studies from recent engagements. Metrics are verified, timelines are real. Client names withheld by request.

Fintech · Series A · Bangalore
Loan processing from 3 days to 4 hours using an agentic workflow
A 60-person lending startup was manually processing personal loan applications across 6 touchpoints — document collection, KYC verification, credit bureau pulls, risk scoring, underwriter review, and disbursal triggers. A 6-person ops team handled 150 applications per day. Growing loan volumes were creating a hiring bottleneck.
94% · Reduction in processing time
₹94L · Annual ops cost saved
Volume capacity without new hires
8wk · From first call to live in production
The problem

Each loan application touched 6 different systems: a WhatsApp onboarding bot, a document portal, CIBIL API, an internal risk model, a human underwriter queue, and a disbursal trigger. None were connected. Ops staff spent 4+ hours per application copying data between systems, chasing missing documents, and manually escalating edge cases.

At 150 applications/day and ₹180/hour blended ops cost, the manual processing alone cost ₹1.35L per day — ₹3.6Cr annually.

What we built

A 7-agent LangGraph pipeline that orchestrates the entire loan journey autonomously. A document intake agent accepts uploads via WhatsApp and email, runs OCR + classification, and flags missing items. A KYC agent cross-checks PAN, Aadhaar, and selfie verification. A credit agent pulls CIBIL scores and formats them for the risk model. A risk agent runs the client's scoring model and routes decisions to the right bucket.

The human underwriter now only sees applications the system has flagged as requiring judgment — roughly 12% of volume. Everything else is fully automated end-to-end.
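As a simplified illustration of that routing step, the decision logic looks something like the sketch below. The field names, thresholds, and bucket names are hypothetical — the client's actual scoring model and cut-offs are not published here.

```python
def route_application(app: dict) -> str:
    """Route a scored loan application to a decision bucket.

    Field names and thresholds are illustrative only -- the client's
    real risk model and policy cut-offs are not reproduced here.
    """
    # Incomplete paperwork short-circuits everything else.
    if not app.get("kyc_passed") or app.get("missing_docs"):
        return "document_followup"
    # Any rule-based flag sends the file straight to a human underwriter.
    if app.get("flags"):
        return "underwriter_review"
    score = app["risk_score"]  # 0-100, produced by the client's risk model
    if score >= 75:
        return "auto_approve"        # triggers disbursal automatically
    if score < 40:
        return "auto_reject"
    return "underwriter_review"      # mid-band cases need human judgment
```

In the production pipeline this kind of function sits behind the risk agent as a conditional edge in the LangGraph graph; only the review bucket ever reaches a person, which is how the bulk of volume stays fully automated.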

LangGraph · GPT-4o (routing) · Gemma 3 (doc classification) · FastAPI · WhatsApp Business API · n8n · PostgreSQL
Build timeline
Week 1–2: Process mapping, API access, test dataset of 500 past applications
Week 3–4: Document intake + KYC agents built and tested on historical data
Week 5–6: Credit + risk agents, routing logic, underwriter review UI
Week 7: Shadow mode — pipeline runs alongside humans, outputs compared
Week 8: Live in production. Ops team redeployed to exception handling and collections.

"We expected to spend 6 months on this. It was live in 8 weeks and handling 88% of applications fully automatically. The underwriters actually prefer it — they only see the interesting cases now."

— Head of Operations, Series A lending startup (name withheld)
Healthcare · 45-physician group · Hyderabad
80% of appointment queries now handled automatically — in Hindi and English
A mid-sized multi-speciality hospital group with 5 locations was fielding 600–800 WhatsApp messages per day across departments. Receptionists spent 3–4 hours daily answering repeat questions: availability, fees, directions, test preparation instructions. After-hours queries went unanswered until 9am, causing missed appointments and patient frustration.
80% · Queries handled automatically
0 · After-hours missed enquiries
22% · Increase in confirmed bookings
2wk · From briefing to live deployment
The problem

The hospital group had a single WhatsApp Business number per location, managed by one receptionist. Messages came in Hindi, Hinglish, and English — voice notes, text, and photos of prescriptions. Staff had no way to track which queries were resolved. After 6pm, patients received no response at all.

The ops manager estimated 3 receptionist-hours per location per day — 15 hours daily across 5 locations — were spent on routine questions an AI agent could answer.

What we built

A WhatsApp AI agent connected to the hospital's existing appointment system (custom Django backend). The agent handles: appointment booking and rescheduling, doctor availability queries, fee enquiries, test preparation instructions (pulled from a PDF knowledge base), directions and parking, and insurance panel queries.

Voice notes are transcribed using Whisper. Hindi and Hinglish messages are handled natively — we fine-tuned the response prompts on real message samples provided by the hospital. Queries it can't answer are escalated to a human queue with full context and suggested responses pre-filled.
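In production the intent detection runs on GPT-4o mini, but the escalation contract — answer what you recognise, hand everything else to a human with context — can be sketched with a toy keyword router. The intent names and patterns below are illustrative; the real system is LLM-driven and handles Hindi and Hinglish, which keyword matching cannot.

```python
import re

# Toy intent table distilled from an FAQ audit. The deployed system uses
# an LLM for this step; these English-only patterns are just a sketch.
INTENT_PATTERNS = {
    "book_appointment": re.compile(r"\b(book|appointment|reschedule)\b", re.I),
    "fee_enquiry":      re.compile(r"\b(fee|fees|charge|cost)\b", re.I),
    "timings":          re.compile(r"\b(open|timing|timings|opd)\b", re.I),
}

def route_message(text: str) -> str:
    """Return an intent the bot can answer itself, or escalate."""
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(text):
            return intent
    # Anything unrecognised goes to the human queue, pre-filled with
    # the transcript and a suggested reply.
    return "escalate_to_human"
```

The important design choice is the fallthrough: the agent never guesses on an unrecognised query, so the worst case is a normal human-handled message rather than a wrong automated answer.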

WhatsApp Business API · Whisper (voice transcription) · GPT-4o mini · RAG (PDF knowledge base) · Django REST · Redis (session state)
Build timeline
Day 1–3: Audit of 1,000 historical messages, intent mapping, FAQ extraction
Day 4–7: Agent built, connected to appointment API, Hindi prompt tuning
Day 8–10: Internal testing with 50 real staff test messages across languages
Day 11–14: Soft launch — one location, staff oversight. Zero critical errors. Full rollout.

"Patients are now getting instant replies at 11pm. Bookings went up and our reception team is actually less stressed — they deal with complex cases, not 'what time does OPD open'."

— Hospital Operations Director (name withheld by client request)
LegalTech SaaS · Series B · Europe
Custom LLM running on-premise, processing 40K contracts with 91% accuracy
A European legal technology company needed a contract classification and clause extraction model that could run entirely on their own infrastructure — clients were law firms and corporates with strict data sovereignty requirements. No data could be sent to OpenAI, Azure, or any cloud model provider. They had 40,000 labelled contracts and no internal ML team.
91% · Classification accuracy (held-out test set)
$22K · Total build cost incl. GPU time
6wk · To production-ready model
0 · Data left client servers
The problem

The client's existing approach used rule-based regex classifiers written in 2019 — 67% accuracy on modern contracts, which had become more varied in structure. They'd evaluated GPT-4 and Claude but couldn't use either due to client data policies. They needed a model that could run on a single A100 server in their Frankfurt data centre.

Their CTO had spoken with 4 ML consultancies. Two said it wasn't possible without cloud infrastructure. One quoted €180K and 9 months. We quoted $22K and 6 weeks.

What we built

We fine-tuned Gemma 3 4B using QLoRA on their labelled dataset. The approach was validated on rented Lambda Labs A100s using a 10% prototyping sample; the full training run then executed on the client's own server using our scripts, so the production dataset never left their premises.
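A minimal sketch of that QLoRA setup, using Hugging Face Transformers and PEFT. This is a training-configuration fragment, not the engagement's actual recipe: the model ID and every hyperparameter below are illustrative assumptions.

```python
# Illustrative QLoRA fine-tuning setup. Model ID, LoRA rank, target
# modules, and all other values here are assumptions for the sketch,
# not the hyperparameters tuned for the client.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit base weights (the "Q" in QLoRA)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",                  # hypothetical model ID
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only the adapters train
```

Because only the low-rank adapters are trained while the 4-bit base stays frozen, a run like this fits on a single A100 — which is what makes the on-premise, data-sovereign constraint workable in the first place.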

The final model handles 12 contract types, extracts 34 standard clause types, and flags non-standard language for human review. It runs in under 2 seconds per contract on their A100 — processing their entire historical archive in under 14 hours. We shipped an inference API wrapper, a monitoring dashboard, and full retraining documentation so their developers can fine-tune on new contract types without external help.

Gemma 3 4B · QLoRA (4-bit) · Hugging Face Transformers · PEFT · FastAPI inference server · MLflow tracking · ONNX export
Build timeline
Week 1: Dataset audit, label quality review, baseline benchmark (rule-based vs GPT-4 reference)
Week 2–3: Prototype fine-tune on 10% sample — validated approach, hit 88% on dev set
Week 4: Full training run on client server. Hyperparameter tuning. Evaluation across all 12 contract types.
Week 5: Inference API, monitoring dashboard, edge case review with their legal team
Week 6: Production deploy, documentation, retraining runbook handover

"Every other agency told us we needed the cloud. Ravi's team understood the sovereignty constraint from day one and built around it — not against it. The model runs on our servers, we own it completely."

— CTO, European LegalTech company (name withheld)

Your problem could be the next case study.

First call is free. We'll tell you within 30 minutes whether we can help — and what it would take.

Start a conversation →