Question 1

What does the edge LLM sizing agent do?

Accepted Answer

It sizes a private, on-device LLM deployment from six inputs: use case, data sensitivity, number of sites or devices, users per site, latency, and timeline. You get a feasibility verdict, recommended hardware, per-site monthly cost, fleet totals, and a break-even comparison against cloud APIs.
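A break-even comparison of this kind can be sketched roughly as follows. This is a minimal illustration, not the agent's actual model: the function name, the cost formula, and every number in the usage example are placeholders chosen for the example, not NavyaAI figures or real provider prices.

```python
import math


def break_even_months(hardware_cost: float,
                      edge_monthly_cost: float,
                      api_monthly_cost: float) -> float:
    """Months until cumulative API spend exceeds edge hardware plus running cost.

    Returns math.inf when the API is no more expensive per month than the
    edge running cost, so the up-front hardware spend is never recovered.
    """
    monthly_saving = api_monthly_cost - edge_monthly_cost
    if monthly_saving <= 0:
        return math.inf
    return hardware_cost / monthly_saving


# Placeholder inputs for one site (hypothetical, for illustration only).
months = break_even_months(hardware_cost=600.0,
                           edge_monthly_cost=10.0,
                           api_monthly_cost=60.0)
print(f"Break-even after {months:.0f} months")  # Break-even after 12 months
```

Fleet totals would then scale the per-site figures by the number of sites, assuming each site has the same hardware and workload.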

Question 2

Are the numbers real or modeled?

Accepted Answer

The throughput, power draw, and concurrency limits are physically measured on a Jetson Orin Nano 8GB in NavyaAI's June 2026 benchmark, including the cases where the board fails (4B-class models under concurrent load). The full benchmark, with all tables, is published on our blog. Workload volume and API prices are modeled from your inputs and from provider price pages, and every assumption is listed alongside the result.

Question 3

What hardware does an edge LLM deployment need?

Accepted Answer

For document Q&A, support assistants, and classification workloads, a Jetson Orin Nano-class board running a quantized 1B-class model serves up to 16 concurrent users at a measured 88 tokens/sec aggregate. Larger models or heavier concurrency need a GPU server. When that is the honest answer, the agent says so and routes you to the on-prem calculator.
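As a rough sanity check on the figures above, the aggregate throughput can be divided across the concurrent users. The sketch below assumes the 88 tokens/sec is shared evenly across all 16 users, which this answer does not state; the function name is illustrative.

```python
def per_user_tokens_per_sec(aggregate_tps: float, concurrent_users: int) -> float:
    """Even-split estimate of per-user throughput (an assumption, not a measurement)."""
    if concurrent_users <= 0:
        raise ValueError("concurrent_users must be positive")
    return aggregate_tps / concurrent_users


# Figures quoted in the answer above: 88 tokens/sec aggregate, 16 concurrent users.
rate = per_user_tokens_per_sec(88, 16)
print(f"{rate:.1f} tokens/sec per user")  # 5.5 tokens/sec per user
```

Under that even-split assumption, each user would see about 5.5 tokens/sec at full concurrency; real per-user rates depend on how requests are batched and scheduled.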

Question 4

What happens after I unlock the full sizing?

Accepted Answer

You see the complete fleet cost breakdown and API comparison immediately, and the NavyaAI team sends a written sizing read (hardware tier, per-site costs, fleet math, and a deployment checklist) within one business day.

Edge LLM Deployment Sizing

Sizing grounded in a real bench, not a spreadsheet

88-158 tok/s measured aggregate

8-9W under sustained load

Honest failure boundaries

Written read in 1 business day

Edge LLM sizing FAQ

The Full Jetson Benchmark

GPU Requirements: Edge to HPC

Edge RAG vs OpenAI API

On-Prem GPU Cost Calculator