
Intura CTO Speaks at BandungPy on LLM Routing for Efficiency and High Availability
Date: Saturday, 3 May 2025
Time: 11:00 WIB
Venue: Sans. Co Space Dago, Bandung, Indonesia
Organiser: Komunitas Bandung.py
Talk Title: LLM Routing with Python for Improved Efficiency and High Availability
Bandung, May 2025 — At Sans. Co Space Dago on the morning of Saturday, 3 May, the Bandung Python community gathered for its May meetup. On the agenda alongside a session on MicroPython and IoT protocols: a talk by Muhammad Ramadiansyah — Co-Founder and CTO of Intura — on one of the most pressing practical problems facing engineers building AI products today. Not how to use an LLM. How to decide which one — and how to build a system that makes that decision intelligently, automatically, and at the lowest possible cost.
Key Highlights
- LLM Routing in Practice: a live walkthrough of how to route different tasks to the most cost-effective and capable model, in Python, in production
- Open Source SDK: an introduction to Intura's `intura-ai` package on PyPI, a Python SDK for intelligent, personalised LLM routing
- ROI-First Approach: the session tackled token consumption and pricing head-on, showing how to maximise output quality while minimising unnecessary spend
- BandungPy Community: presented to Bandung's largest Python community of developers, engineers, and students building with AI today
The Problem Every AI Engineer Is Running Into
A year ago, the question was whether to use an LLM at all. Today, the question is which one. GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 3, Mistral, Qwen — the list of capable models grows every quarter, and each of them has a different price per token, a different latency profile, a different set of tasks it excels at, and a different risk of going down or changing behaviour after an update.
For a startup or an engineering team building an AI product, this fragmentation is not an academic problem. It is a daily operational one. You are paying for tokens on every request. You are relying on model uptime for your product's availability. You are getting different quality outputs depending on which model you call for which task — and most of the time, the default decision of "just use the best model for everything" is both the most expensive and the most brittle approach.
This is the problem that Muhammad's session at BandungPy was built around: how do you build an LLM routing layer — in Python, using patterns that real engineering teams can actually adopt — that solves all three dimensions of this problem simultaneously? Cost, capability, and availability.

BandungPy: Where Bandung's Python Engineers Show Up
BandungPy is Bandung's largest Python community — a regular gathering of developers, engineers, data scientists, and students who are building with Python in their day jobs and side projects. The meetups are hands-on and technically grounded: not keynote stages and sponsor pitches, but working engineers sharing what they have learned from doing real work.
The May 2025 edition was held at Sans. Co Space Dago — a relaxed venue in the Dago area that gave the session an appropriate energy: informal enough for honest technical conversation, focused enough for actual knowledge transfer. The room was full of engineers with laptops open, which is exactly the context in which a talk about a Python SDK lands best.
The Talk: LLM Routing with Python
Muhammad opened with the structural reality that every engineer in the room had encountered but perhaps not yet named: the LLM market is fragmenting rapidly, and the cost and capability gaps between models are large enough to matter significantly at scale. A request that costs $0.015 on one model might cost $0.0005 on another — for the same task, with comparable output quality. Multiply that across millions of requests and the difference is not a rounding error. It is a business outcome.
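To make that scale concrete, here is a quick back-of-the-envelope calculation using the two per-request figures quoted above; the one-million-request volume is an illustrative assumption:

```python
# Back-of-the-envelope cost comparison using the per-request
# figures quoted above. The request volume is illustrative.
COST_PREMIUM = 0.015     # $ per request on a premium model
COST_EFFICIENT = 0.0005  # $ per request on a cheaper, comparable model
REQUESTS = 1_000_000     # illustrative monthly volume

premium_bill = COST_PREMIUM * REQUESTS      # $15,000
efficient_bill = COST_EFFICIENT * REQUESTS  # $500

print(f"Premium model:   ${premium_bill:,.0f}")
print(f"Efficient model: ${efficient_bill:,.0f}")
print(f"Difference:      ${premium_bill - efficient_bill:,.0f} per million requests")
```

At that volume the gap is $14,500 per million requests for comparable output, which is the "business outcome" the talk pointed at.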
The core of the session was the concept of LLM routing: the practice of building a decision layer into your AI application that evaluates each incoming task — based on its complexity, required capabilities, latency tolerance, and cost budget — and routes it to the most appropriate model. Not always the best model. The most appropriate model for that specific task in that specific context.
This is not a new idea in systems engineering. Load balancing, traffic routing, and service mesh patterns have existed for decades. What is new is applying those patterns to the world of language models — where the "services" being routed to have meaningfully different output characteristics, not just performance profiles. A routing layer for LLMs has to understand what each model is good at, not just whether it is available and fast.
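As a rough illustration of the pattern (a minimal sketch, not code from the talk), a rule-based router can pick the cheapest model that satisfies a task's constraints. The model names, prices, and capability scores below are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    """What the router knows about an incoming request."""
    complexity: float    # 0.0 (trivial) to 1.0 (hard), e.g. from a cheap classifier
    max_latency_ms: int  # latency tolerance for this task
    cost_ceiling: float  # maximum acceptable $ per request

# Illustrative model table: price, typical latency, rough capability score.
MODELS = [
    {"name": "small-fast-model", "cost": 0.0005, "latency_ms": 300,  "capability": 0.6},
    {"name": "mid-tier-model",   "cost": 0.003,  "latency_ms": 800,  "capability": 0.8},
    {"name": "frontier-model",   "cost": 0.015,  "latency_ms": 1500, "capability": 1.0},
]

def route(task: TaskProfile) -> str:
    """Pick the cheapest model that satisfies the task's constraints."""
    candidates = [
        m for m in MODELS
        if m["capability"] >= task.complexity
        and m["latency_ms"] <= task.max_latency_ms
        and m["cost"] <= task.cost_ceiling
    ]
    if not candidates:
        raise RuntimeError("No model satisfies this task's constraints")
    return min(candidates, key=lambda m: m["cost"])["name"]

# A medium-complexity task routes to the cheapest capable model.
print(route(TaskProfile(complexity=0.5, max_latency_ms=1000, cost_ceiling=0.01)))
# -> small-fast-model
```

The point of the sketch is the selection criterion: the router optimises for the most appropriate model under explicit constraints, not the most capable one unconditionally.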

Every LLM has a different price, speed, and capability profile. Routing blindly — sending every task to the same model — is one of the most expensive mistakes an engineering team can make. The engineers who figure out how to match the right model to the right task will have a structural cost advantage that compounds over time.
Intura AI: The SDK Built for This Problem
The practical centrepiece of the session was the introduction of `intura-ai`, Intura's open source Python SDK, available on PyPI, which implements intelligent LLM routing out of the box. The SDK is built around the observation that most engineering teams end up building some version of this routing logic themselves, in a fragmented and ad-hoc way, every time they scale an AI application past a certain threshold of complexity.
With `intura-ai`, the routing logic is externalised and personalised. Engineers define their task profiles — what kind of output is needed, what the acceptable latency window is, what the cost ceiling is — and the SDK handles the model selection decision. It abstracts away the vendor-specific API differences, manages fallbacks when a model is unavailable, and tracks consumption patterns over time so the routing decisions can be continuously improved.
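In pattern terms, the externalised workflow described above looks something like the sketch below. This is plain illustrative Python, not the `intura-ai` API; every name in it is invented, so consult the package's documentation on PyPI for the real interface:

```python
import time

# Illustrative sketch of the pattern described above: task profiles
# declared once, vendor differences abstracted behind one call, fallback
# on failure, and consumption tracking. NOT the intura-ai API.

TASK_PROFILES = {
    # task name -> constraints an engineer declares outside application code
    "summarise":   {"max_latency_ms": 1000, "cost_ceiling": 0.002},
    "code-review": {"max_latency_ms": 5000, "cost_ceiling": 0.02},
}

usage_log = []  # consumption records, used to refine future routing decisions

def call_with_fallback(task, prompt, ranked_models, call_model):
    """Try models in preference order; fall back when one is unavailable."""
    for model in ranked_models:
        try:
            start = time.monotonic()
            result = call_model(model, prompt)  # vendor differences hidden here
            elapsed_ms = (time.monotonic() - start) * 1000
            usage_log.append({"task": task, "model": model, "ms": elapsed_ms})
            return result
        except Exception:
            continue  # model down or erroring: try the next candidate
    raise RuntimeError(f"all models failed for task {task!r}")
```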
The framing was explicitly ROI-first. Token consumption is already a significant cost line for any team running AI at scale, and the trajectory of pricing — as models get more capable and inference demand grows — is not guaranteed to go down. Teams that build smart routing infrastructure now are building a cost moat: a structural advantage that compounds the larger their usage grows.

The Room's Response
The BandungPy community is not a passive audience. Engineers pushed back with real questions from real systems: How does routing interact with stateful conversations? How do you handle cases where task complexity is hard to classify upfront? What does fallback behaviour look like when a primary model hits rate limits mid-session?
These are exactly the questions that sharpen a product. The session was as much a feedback loop as it was a presentation — the kind of technical conversation that is only possible in a community where the people in the room are actively building, not just learning about building. Several engineers expressed interest in contributing to or integrating the SDK into their existing pipelines.
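The rate-limit question in particular lends itself to a concrete sketch. One common answer (an illustrative pattern, not necessarily the one given in the session) is a fallback chain that demotes a rate-limited model for a cooldown window before retrying it:

```python
import time

# Illustrative answer to the rate-limit question above: when a model
# returns a rate-limit error mid-session, demote it for a cooldown
# window and route to the next model in the chain. Names are invented.

COOLDOWN_S = 30.0
_demoted_until: dict[str, float] = {}  # model -> timestamp when usable again

class RateLimited(Exception):
    pass

def call_with_rate_limit_fallback(prompt, chain, call_model):
    """Walk the fallback chain, skipping models that are cooling down."""
    now = time.monotonic()
    for model in chain:
        if _demoted_until.get(model, 0.0) > now:
            continue  # still cooling down from a recent rate-limit error
        try:
            return call_model(model, prompt)
        except RateLimited:
            _demoted_until[model] = now + COOLDOWN_S
            continue  # fall through to the next model in the chain
    raise RuntimeError("every model in the chain is rate limited")
```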
Why This Talk, Why Now
The LLM routing problem is not a future problem. It is a present one, and it is getting more acute every quarter as the number of available models grows, as capability differentiation between model tiers widens, and as more engineering teams move AI features from prototype to production. The engineers who figure out routing early will have systems that are cheaper to run, more resilient, and easier to optimise.
Sharing this at BandungPy — with Bandung's Python community, in an open and practical format — reflects something Intura believes about how good technical ideas should move through an ecosystem. Not gated behind sales cycles or proprietary tooling, but available to every engineer who wants to build better AI systems. The `intura-ai` SDK is open source for exactly that reason.
We are grateful to BandungPy and the community for the space, the questions, and the energy. The conversation started on 3 May in Bandung is one we intend to continue.
Intura is an AI-powered design platform that provides data-driven insights, predicts design performance before launch, and delivers recommendations that accelerate time-to-market, reduce decision fatigue, and maintain brand consistency.