In *Computer Architecture: A Quantitative Approach*, Hennessy and Patterson classify benchmarks according to the following hierarchy, from best to worst:
- Real applications
- Modified applications (e.g. with I/O removed to make them CPU-bound).
- Kernels (key fragments of real applications).
- Toy benchmarks (e.g. the sieve of Eratosthenes).
- Synthetic benchmarks (code created artificially to match a profile of particular operations, e.g. Dhrystone).
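To make the "toy benchmark" level concrete, here is a minimal sketch of the sieve of Eratosthenes mentioned above; it is small and self-contained, exercises loops, branches, and array accesses, but is not code taken from any real application:

```python
def sieve(limit):
    """Sieve of Eratosthenes: return all primes below `limit`."""
    is_prime = [True] * limit
    is_prime[0:2] = [False, False]          # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Mark every multiple of p, starting at p*p (smaller
            # multiples were already marked by smaller primes).
            for multiple in range(p * p, limit, p):
                is_prime[multiple] = False
    return [i for i, prime in enumerate(is_prime) if prime]

print(sieve(30))  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

A toy benchmark would time many calls to `sieve` with a large `limit`; the concern is that the result mostly reflects how well the machine or compiler handles this one tight loop, not real workloads.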
Then there are microbenchmarks, which typically measure a single short-running operation by repeating it many times. One might put microbenchmarks level with toy benchmarks -- they don't contain code from a real program, but at least they measure something that will occur in real programs, unlike synthetic benchmarks.
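The "repeat a short operation many times" pattern can be sketched with Python's standard `timeit` module; the operation being timed here (joining a list of strings) is an arbitrary stand-in, not one drawn from the original text:

```python
import timeit

def joined():
    # The single short-running operation under test.
    return "-".join(str(i) for i in range(100))

# A microbenchmark repeats the operation enough times that the total
# elapsed time is large relative to timer resolution and loop overhead,
# then divides to estimate the per-call cost.
n = 100_000
total = timeit.timeit(joined, number=n)
print(f"{total / n * 1e9:.0f} ns per call")
```

Unlike a synthetic benchmark, the timed operation is something real programs actually do, which is why one might rank microbenchmarks alongside or slightly above toy benchmarks.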
People may argue about exactly which level a particular benchmark belongs to. For example, an implementation of a crypto algorithm could be considered a small but real application, or it could be considered a kernel. Such a difference of opinion probably doesn't matter; the classification serves as a rule of thumb at best.