AI GPU Simulator — LLM Performance, VRAM & Tokens/sec Calculator
AI GPU Simulator is a free web tool that estimates how fast large language models (LLMs) run on a given GPU. Simulate VRAM usage, tokens per second, time to first token (TTFT), and decode speed for models like Llama 3.1, Qwen 2.5, Mixtral, and DeepSeek on GPUs including the RTX 5090, RTX 4090, RTX 3090, H100, and A100.
This page requires JavaScript to run the interactive simulator. Enable JavaScript or visit the open-source GitHub repository for documentation and benchmark data.
What you can do
- Check if a model fits in a GPU's VRAM at FP16, INT8, or INT4 precision.
- Estimate tokens-per-second decode throughput for single-user or batched workloads.
- Compare hardware before spending $2,000+ on a GPU that can't run your model.
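The checks above follow a common rule of thumb: a model's weight footprint is roughly parameter count times bytes per parameter, and single-user decode is memory-bandwidth-bound, since every weight is streamed from VRAM once per generated token. A minimal sketch of that estimate (not the simulator's exact model; the overhead allowance and hardware numbers below are illustrative assumptions):

```python
# Rule-of-thumb sketch, NOT the simulator's exact formulas.
# Assumption: decode streams all weights once per token, so
# tokens/sec ~= memory bandwidth / model size in bytes.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def model_gb(params_b: float, precision: str) -> float:
    """Weight footprint in GB for a model with params_b billion parameters."""
    return params_b * BYTES_PER_PARAM[precision]

def fits(params_b: float, precision: str, vram_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """Fit check with a flat KV-cache/activation allowance (assumed 2 GB)."""
    return model_gb(params_b, precision) + overhead_gb <= vram_gb

def decode_tps(params_b: float, precision: str, bandwidth_gbps: float) -> float:
    """Bandwidth-bound single-user decode estimate in tokens/sec."""
    return bandwidth_gbps / model_gb(params_b, precision)

# Example: an 8B model at FP16 on a 24 GB GPU with ~1000 GB/s bandwidth.
print(fits(8, "fp16", 24))                 # True: 16 GB weights + 2 GB overhead
print(round(decode_tps(8, "fp16", 1000)))  # ~62 tokens/sec
```

The same arithmetic shows why quantization matters: the same 8B model at INT4 needs about 4 GB for weights and roughly quadruples the bandwidth-bound decode estimate.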