Engineering Blog

Technical Insights & Tutorials

Deep dives into model optimization, HPC, MLOps, DevOps, and production-grade AI/ML engineering.

11 articles published

Cloud Egress Costs: The Hidden Tax Breaking Cloud Budgets (and How to Cut It 20–80%)
Cloud Cost

Egress is rarely a single line item: it spans internet-out, NAT, cross-AZ, cross-region, CDN cache-fill, and realtime fanout. We break down list pricing across AWS, GCP, Azure, Supabase, Neon, and Cloudflare, model four workloads, and show why your biggest network cost comes from architecture, not the rate card.

16 min read
Oat UI: Semantic HTML UI Library and React Alternative
Frontend Engineering

Oat UI is a lightweight semantic HTML UI library for building fast, browser-native interfaces without React, utility classes, or build tooling.

20 min read
Claude Mythos and Project Glasswing: The Morning an AI Found a 27-Year-Old Bug in OpenBSD
Security

Claude Mythos found a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw that five million fuzzing runs missed. What Project Glasswing means for your security work this week.

14 min read
Designing Software for AI Agents: Why Your CLI and API Now Have Two Readers
Engineering

A warning I saw in a CLI last week points at the biggest shift in software design since cloud. Software now has two readers: humans and AI agents. Here's what that actually means for your CLIs, APIs, docs, and cost structure, with the patterns and pitfalls we've learned building agent-native interfaces at NavyaAI.

12 min read
Threads Beat Multiprocessing for RAG: 70% Faster, 75% Less Memory
Engineering

We benchmarked RAG ingestion across Python 3.13, 3.14, and 3.14t. Threads are 70% faster than multiprocessing with 75% less memory — because NumPy and PyTorch already release the GIL. Your infra doesn't need more pods.

12 min read
Python vs Rust for Transformers: Performance and Cost
Engineering

We load-tested Python FastAPI against Rust Axum for GPU transformer inference. Rust delivered lower latency, higher throughput, and a lower cost per production request.

25 min read
Self-Knowledge Distillation for AI Efficiency: Orpheus TTS
Engineering

Unsloth distillation walkthrough: compress Orpheus-3B TTS with self-knowledge distillation, SNAC tokenization, and LoRA to cut serving memory and private AI deployment cost.

25 min read
Python 3.14 No-GIL vs Rust: We Benchmarked Both (4x Speedup)
Engineering

Free-threaded Python 3.14t hits a 4x speedup on 4 threads, narrowing the gap with Rust to just 3.4x. We ran head-to-head CPU-bound benchmarks with full code. Here are the results.

30 min read