What is SIMD?
Wikipedia has a nice definition of SIMD for us:
Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Thus, such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel) computations, but only a single process (instruction) at a given moment. SIMD is particularly applicable to common tasks like adjusting the contrast in a digital image or adjusting the volume of digital audio. Most modern CPU designs include SIMD instructions in order to improve the performance of multimedia use.
In short, SIMD processes several data values with a single instruction. It's a cheap way to increase the computational power of CPUs: what it mostly takes are wide ALUs (those are cheap) and comparatively little control logic.
To unlock the computational potential of modern CPUs, it is essential to use SIMD instructions.
SIMD engines usually work with wide registers (128 bits is a typical width) that can hold several independent values. A typical 128-bit SIMD register can contain...
- sixteen 8-bit integer values (int8x16 and uint8x16)
- eight 16-bit integer values (int16x8 and uint16x8)
- four 32-bit integer values (int32x4 and uint32x4)
- four single-precision floating-point values (float32x4)
- two double-precision floating-point values (float64x2)
SIMD registers essentially contain vectors. SIMD instructions thus essentially are vector instructions. Awesome!
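The lane layouts listed above can be made concrete with a small model. The following is a plain-Python sketch, not real SIMD code: it treats 16 bytes as the contents of one 128-bit register and reinterprets them as lanes of different widths. The `VIEWS` table and the `view` helper are hypothetical names made up for this illustration, and a little-endian byte order is assumed.

```python
import struct

# 16 bytes stand in for the contents of one 128-bit SIMD register.
register = bytes(range(16))

# Hypothetical view table: type name -> struct format (little-endian assumed).
VIEWS = {
    "uint8x16": "<16B",   # sixteen 8-bit lanes
    "uint16x8": "<8H",    # eight 16-bit lanes
    "uint32x4": "<4I",    # four 32-bit lanes
    "float32x4": "<4f",   # four single-precision lanes
    "float64x2": "<2d",   # two double-precision lanes
}

def view(reg: bytes, type_name: str) -> tuple:
    """Interpret the same 16 bytes as lanes of the given type."""
    fmt = VIEWS[type_name]
    assert struct.calcsize(fmt) == 16  # every view covers all 128 bits
    return struct.unpack(fmt, reg)

for name in VIEWS:
    print(name, len(view(register, name)), "lanes")
```

The point of the sketch is that the register itself has no type: the same 128 bits hold sixteen, eight, four, or two values depending on how an instruction interprets them.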
Operations on packed data
To actually get something done with SIMD, operations are needed that work on SIMD data types.
A list of typical operations is assembled on the SIMD Operations page.
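As a rough illustration of what an operation on packed data does, here is a plain-Python model of two lane-wise additions. It only mimics the semantics; no actual SIMD instruction is executed. The function names `add_float32x4` and `add_uint8x16` are hypothetical, and the wrap-around behavior of the 8-bit add is an assumption about one common flavor of integer add (many instruction sets also offer saturating variants).

```python
import struct

def f32(x: float) -> float:
    """Round a Python float to single precision, as a float32 lane would store it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def add_float32x4(a, b):
    """Model of a packed add: one 'instruction', four independent lane results."""
    assert len(a) == len(b) == 4
    return tuple(f32(x + y) for x, y in zip(a, b))

def add_uint8x16(a, b):
    """Model of a packed 8-bit add that wraps around (modulo 256) in each lane."""
    assert len(a) == len(b) == 16
    return tuple((x + y) & 0xFF for x, y in zip(a, b))

print(add_float32x4((1.0, 2.0, 3.0, 4.0), (10.0, 20.0, 30.0, 40.0)))
# (11.0, 22.0, 33.0, 44.0)
```

Each lane is computed independently of its neighbors: a carry or overflow in one lane never spills into the next one.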
SIMD instruction sets
- SSE2: Available on every x86 CPU from Intel, AMD, or VIA that is not completely outdated. The SSE2 instructions are guaranteed to be available on all 64-bit x86 CPUs ("x86-64").
- AVX: Available on modern high-performance CPUs from Intel and AMD.
- NEON: Available on basically every modern ARM-compatible CPU designed for general purpose applications (for instance, the vast majority of ARM CPUs in smartphones or tablets support NEON).
To be able to use SIMD operations, one needs
- SIMD data types
- Operations that are defined on the SIMD data types
Algorithms and use cases for SIMD
Not every algorithm can benefit from SIMD instructions. Generally speaking, it's necessary to have several identically typed values at hand to which an identical sequence of operations is applied.
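The audio volume adjustment mentioned in the definition at the top fits this pattern well: every sample has the same type and gets the same multiplication. The sketch below models the usual loop structure in plain Python, assuming four lanes per step. `scale_volume` is a hypothetical helper, and the inner generator merely stands in for what would be one packed multiply on real hardware.

```python
def scale_volume(samples, gain, lanes=4):
    """Scale samples by gain, processing `lanes` values per step."""
    out = []
    full = len(samples) - len(samples) % lanes
    # Main loop: each iteration stands in for one packed multiply.
    for i in range(0, full, lanes):
        chunk = samples[i:i + lanes]
        out.extend(s * gain for s in chunk)
    # Tail: leftover values that do not fill a whole vector are handled one by one.
    for s in samples[full:]:
        out.append(s * gain)
    return out

print(scale_volume([0.5, -0.25, 1.0, 0.0, 0.75, -1.0], 0.5))
# [0.25, -0.125, 0.5, 0.0, 0.375, -0.5]
```

With six samples and four lanes, one full step covers the first four values and the remaining two go through the scalar tail. An algorithm whose steps depend on the result of the previous element would not split into independent chunks like this.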
The following algorithms can benefit greatly from SIMD operations: