Million Instructions Per Second (MIPS)
Overview
MIPS stands for “Million Instructions Per Second”—it’s a measure of computer performance that indicates how many machine instructions a processor can execute in one second.
Examples
- A vintage Intel 8086 CPU (introduced in 1978) → ~0.33 MIPS
- A modern ARM Cortex-A76 → ~10,000+ MIPS (estimated)
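In plain arithmetic, MIPS is just the number of instructions executed divided by the execution time in seconds, scaled down by one million. A minimal sketch of that calculation (the figures below are made up for illustration, not measured):

```python
def mips(instruction_count: int, seconds: float) -> float:
    """MIPS = instructions executed / (execution time in seconds * 10**6)."""
    return instruction_count / (seconds * 1e6)

# Hypothetical figures: a program that retires 50 billion instructions in 5 seconds.
print(mips(50_000_000_000, 5.0))  # -> 10000.0, i.e. roughly 10,000 MIPS
```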
Unlike FLOPS (which specifically measures floating-point operations), MIPS counts all types of processor instructions.
What’s an “Instruction”?
In this context, an instruction is a low-level command a CPU understands; the sketch after this list shows what an instruction stream looks like in practice. Common categories include:
- Data movement (loading/storing values)
- Integer arithmetic (addition, subtraction with whole numbers)
- Logic operations (AND, OR, NOT)
- Control flow (jumps, branches, loops)
- Memory operations
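Machine instructions are most visible in assembly listings, but as a rough, runnable analogy, Python's standard `dis` module prints a comparable instruction stream (bytecode, not real machine code) with the same flavor of categories: loads and stores, arithmetic, comparisons, and jumps. This is only an illustration of the idea, not how MIPS is measured.

```python
import dis

def clamp_sum(a, b, limit):
    total = a + b       # arithmetic + data movement
    if total > limit:   # comparison + conditional jump (control flow)
        return limit
    return total

# Prints load/store, arithmetic, compare, and jump opcodes
# (exact opcode names vary by Python version).
dis.dis(clamp_sum)
```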
MIPS vs FLOPS
| Metric | Measures | Typical Use | Example workloads |
|---|---|---|---|
| MIPS | Integer/machine instructions | General-purpose computing | OS, compilers, system apps |
| FLOPS | Floating-point operations | Scientific/AI/ML workloads | AI models, physics, simulations |
Why MIPS Isn’t Always Reliable
MIPS was historically a popular benchmark for comparing processor performance, especially in the 1980s and 1990s. However, it has some significant limitations:
- Different instructions take different amounts of time to execute
- Simple instructions (like moving data) execute faster than complex ones (like division)
- Modern processors use techniques like pipelining and parallel execution that make instruction counting less meaningful
- The mix of instruction types varies greatly between different programs
For example: a CPU might boast “2000 MIPS,” but that tells you little about how fast it renders video or trains a neural net.
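One quick way to see why a raw operation count misleads is to time the same number of cheap versus expensive operations. The sketch below uses Python-level operations rather than machine instructions, so it only illustrates the principle: equal counts, very unequal runtimes.

```python
import timeit

N = 200_000
cheap = timeit.timeit("x = 1 + 1", number=N)                              # simple integer add
costly = timeit.timeit("x = big // 7", setup="big = 10**600", number=N)   # big-integer division
print(f"{N} additions: {cheap:.3f}s, {N} divisions: {costly:.3f}s")
```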
Modern performance benchmarks tend to use more realistic measures like actual program execution times, FLOPS for scientific computing, or specialized benchmarks that test real-world workloads rather than just raw instruction throughput.
Modern Performance Benchmarks
There isn’t a single universal benchmark that replaced MIPS, but rather a variety of specialized benchmarks depending on the use case:
For General Computing Performance:
- SPEC benchmarks (like SPEC CPU): These run actual application workloads including compression, compilation, scientific computing, and other real-world tasks
- Geekbench: Popular cross-platform benchmark that tests CPU and memory performance with practical workloads
- Cinebench: Focuses on rendering performance, useful for creative workloads
For Gaming and Graphics:
- 3DMark: Tests GPU performance with game-like graphics workloads
- Frame rates in actual games: Often the most practical measure for gaming performance
For AI/Machine Learning:
- FLOPS: Floating-point throughput, especially relevant for training neural networks (a rough measurement sketch follows this list)
- MLPerf: Standardized benchmarks for ML training and inference across different hardware
- Tokens per second: For language models
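As a rough illustration of what a FLOPS figure means, a common back-of-the-envelope check is to time a dense matrix multiply, which takes about 2 * n**3 floating-point operations. This sketch assumes NumPy is available and estimates sustained throughput for this one kernel only, not a general benchmark score.

```python
import time
import numpy as np

n = 2048
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.perf_counter()
c = a @ b  # a dense n x n matmul needs roughly 2 * n**3 floating-point operations
elapsed = time.perf_counter() - start

print(f"~{2 * n**3 / elapsed / 1e9:.1f} GFLOPS sustained on this matmul")
```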
For Mobile Devices:
- AnTuTu: Comprehensive mobile benchmark testing CPU, GPU, memory, and storage
- Battery life tests: Under specific workloads
For Servers/Data Centers:
- TPC benchmarks (Transaction Processing Performance Council): For database performance
- LINPACK: For high-performance computing (used to rank supercomputers)
The key shift has been from simple instruction counting to workload-based benchmarks that measure how fast systems complete actual tasks that users care about. This gives a much more realistic picture of real-world performance than abstract metrics like MIPS ever could.