Million Instructions Per Second (MIPS)
Overview
MIPS stands for “Million Instructions Per Second”—it’s a measure of computer performance that indicates how many machine instructions a processor can execute in one second.
Examples
- A vintage Intel 8086 CPU (introduced in 1978) → ~0.33 MIPS
- A modern ARM Cortex-A76 → ~10,000+ MIPS (estimated)
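In plain arithmetic, MIPS is just the number of instructions executed divided by the execution time in seconds, scaled down by one million. A minimal sketch of that calculation (the figures below are made up for illustration, not measured):

```python
def mips(instruction_count: int, seconds: float) -> float:
    """MIPS = instructions executed / (execution time in seconds * 10**6)."""
    return instruction_count / (seconds * 1e6)

# Hypothetical figures: a program that retires 50 billion instructions in 5 seconds.
print(mips(50_000_000_000, 5.0))  # -> 10000.0, i.e. roughly 10,000 MIPS
```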
Unlike FLOPS (which specifically measures floating-point operations), MIPS counts all types of processor instructions.
What’s an “Instruction”?
In this context, an instruction is a low-level command a CPU understands; the sketch after this list shows what an instruction stream looks like in practice. Common categories include:
- Data movement (loading/storing values)
- Integer arithmetic (addition, subtraction with whole numbers)
- Logic operations (AND, OR, NOT)
- Control flow (jumps, branches, loops)
- Memory operations
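Machine instructions are most visible in assembly listings, but as a rough, runnable analogy, Python's standard `dis` module prints a comparable instruction stream (bytecode, not real machine code) with the same flavor of categories: loads and stores, arithmetic, comparisons, and jumps. This is only an illustration of the idea, not how MIPS is measured.

```python
import dis

def clamp_sum(a, b, limit):
    total = a + b       # arithmetic + data movement
    if total > limit:   # comparison + conditional jump (control flow)
        return limit
    return total

# Prints load/store, arithmetic, compare, and jump opcodes
# (exact opcode names vary by Python version).
dis.dis(clamp_sum)
```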
MIPS vs FLOPS
| Metric | Measures | Typical Use | Example workloads |
|---|---|---|---|
| MIPS | Integer/machine instructions | General-purpose computing | OS, compilers, system apps |
| FLOPS | Floating-point operations | Scientific/AI/ML workloads | AI models, physics, simulations |
Why MIPS Isn’t Always Reliable
MIPS was historically a popular benchmark for comparing processor performance, especially in the 1980s and 1990s. However, it has some significant limitations:
- Different instructions take different amounts of time to execute
- Simple instructions (like moving data) execute faster than complex ones (like division)
- Modern processors use techniques like pipelining and parallel execution that make instruction counting less meaningful
- The mix of instruction types varies greatly between different programs
For example: a CPU might boast “2000 MIPS,” but that tells you little about how fast it renders video or trains a neural net.
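One quick way to see why a raw operation count misleads is to time the same number of cheap versus expensive operations. The sketch below uses Python-level operations rather than machine instructions, so it only illustrates the principle: equal counts, very unequal runtimes.

```python
import timeit

N = 200_000
cheap = timeit.timeit("x = 1 + 1", number=N)                              # simple integer add
costly = timeit.timeit("x = big // 7", setup="big = 10**600", number=N)   # big-integer division
print(f"{N} additions: {cheap:.3f}s, {N} divisions: {costly:.3f}s")
```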
Modern performance benchmarks tend to use more realistic measures like actual program execution times, FLOPS for scientific computing, or specialized benchmarks that test real-world workloads rather than just raw instruction throughput.
Modern Performance Benchmarks
There isn’t a single universal benchmark that replaced MIPS, but rather a variety of specialized benchmarks depending on the use case:
For General Computing Performance:
- SPEC benchmarks (like SPEC CPU): These run actual application workloads including compression, compilation, scientific computing, and other real-world tasks
- Geekbench: Popular cross-platform benchmark that tests CPU and memory performance with practical workloads
- Cinebench: Focuses on rendering performance, useful for creative workloads
For Gaming and Graphics:
- 3DMark: Tests GPU performance with game-like graphics workloads
- Frame rates in actual games: Often the most practical measure for gaming performance
For AI/Machine Learning:
- FLOPS: Floating-point throughput, especially relevant for training neural networks (a rough measurement sketch follows this list)
- MLPerf: Standardized benchmarks for ML training and inference across different hardware
- Tokens per second: For language models
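As a rough illustration of what a FLOPS figure means, a common back-of-the-envelope check is to time a dense matrix multiply, which takes about 2 * n**3 floating-point operations. This sketch assumes NumPy is available and estimates sustained throughput for this one kernel only, not a general benchmark score.

```python
import time
import numpy as np

n = 2048
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.perf_counter()
c = a @ b  # a dense n x n matmul needs roughly 2 * n**3 floating-point operations
elapsed = time.perf_counter() - start

print(f"~{2 * n**3 / elapsed / 1e9:.1f} GFLOPS sustained on this matmul")
```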
For Mobile Devices:
- AnTuTu: Comprehensive mobile benchmark testing CPU, GPU, memory, and storage
- Battery life tests: Under specific workloads
For Servers/Data Centers:
- TPC benchmarks (Transaction Processing Performance Council): For database performance
- LINPACK: For high-performance computing (used to rank supercomputers)
The key shift has been from simple instruction counting to workload-based benchmarks that measure how fast systems complete actual tasks that users care about. This gives a much more realistic picture of real-world performance than abstract metrics like MIPS ever could.