# SIMD/Operations

To make use of SIMD data, operations are needed that work on the corresponding data types.

## Load and Store

To assemble vectors of data that can subsequently be operated upon, SIMD instruction sets include load instructions that copy data values from consecutive memory locations into SIMD registers. Once the computation is complete, the contents of SIMD registers can be copied back to memory using store instructions.

In JavaScript, these instructions need to be exposed to programmers in a convenient way to instantiate SIMD data types such as uint16x8. One option is a constructor that accepts 8 numeric values (applying clamping etc. as needed):

var myUint16x8 = new uint16x8(1, 2, 3, 4, 5, 6, 7, 8);

This has the distinct disadvantage of being slow, as each data value needs to be converted to the corresponding scalar data type and written into memory to be accessible for the SIMD load instruction.

A more efficient approach may be to load data values from a fittingly typed ArrayBufferView, which is backed by a memory region containing data values in a linear fashion, as desired by the SIMD load/store instructions:

var myUint16x8 = new uint16x8(myUint16Array, 2); // myUint16Array is an ArrayBufferView of type Uint16Array

This loads eight uint16 values from myUint16Array into myUint16x8, starting at offset 2.
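The load and store semantics described above can be emulated with plain typed arrays. The following is a minimal sketch, not a proposed API: `loadUint16x8` and `storeUint16x8` are hypothetical helper names, and the offset is assumed to count elements rather than bytes.

```javascript
// Hypothetical helpers, not part of any proposed API.
// Assumption: `offset` counts elements, not bytes.
function loadUint16x8(source, offset) {
  if (offset < 0 || offset + 8 > source.length) {
    throw new RangeError("8 lanes do not fit at offset " + offset);
  }
  // slice() copies the lanes, like a load into a register.
  return source.slice(offset, offset + 8);
}

function storeUint16x8(vector, target, offset) {
  if (offset < 0 || offset + 8 > target.length) {
    throw new RangeError("8 lanes do not fit at offset " + offset);
  }
  // set() copies all eight lanes back into consecutive elements.
  target.set(vector, offset);
}

var memory = new Uint16Array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
var lanes = loadUint16x8(memory, 2); // lanes hold 2, 3, ..., 9
storeUint16x8(lanes, memory, 4);     // memory[4..11] now holds 2, 3, ..., 9
```

Because `slice()` copies, later writes to `memory` do not change `lanes`, which mirrors the separation between registers and memory.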

## Arithmetic operations

The following basic operations are clearly needed:

- Addition, optionally saturating
- Subtraction, optionally saturating
- Multiplication
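The difference between wrapping and saturating arithmetic can be illustrated with a scalar emulation on `Uint16Array` lanes. This is only a sketch; the function names are hypothetical and chosen for illustration.

```javascript
// Hypothetical scalar emulation on Uint16Array lanes.
function addWrapping(a, b) {
  var result = new Uint16Array(a.length);
  for (var i = 0; i < a.length; i++) {
    result[i] = a[i] + b[i]; // the typed array store wraps modulo 2^16
  }
  return result;
}

function addSaturating(a, b) {
  var result = new Uint16Array(a.length);
  for (var i = 0; i < a.length; i++) {
    result[i] = Math.min(a[i] + b[i], 0xFFFF); // clamp at the largest uint16
  }
  return result;
}

function subSaturating(a, b) {
  var result = new Uint16Array(a.length);
  for (var i = 0; i < a.length; i++) {
    result[i] = Math.max(a[i] - b[i], 0); // clamp at zero instead of wrapping
  }
  return result;
}
```

For example, adding 1 to a lane holding 65535 gives 0 with `addWrapping` and 65535 with `addSaturating`.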

On modern CPUs, division can be performed much faster by multiplying with the reciprocal of the divisor. For ease of programming, however, it might still make sense to offer a direct division operation. If the JIT engine can detect that the divisor is static, it may still convert the division to a reciprocal multiplication for full speed.
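As a rough illustration of the reciprocal approach, the sketch below divides every lane of a `Float32Array` by one scalar. The name `divideByScalar` is hypothetical, and the result can differ from a true division in the last bit because of rounding.

```javascript
// Hypothetical helper: divide all lanes by one scalar divisor.
function divideByScalar(values, divisor) {
  var reciprocal = 1 / divisor; // one division, computed once
  var result = new Float32Array(values.length);
  for (var i = 0; i < values.length; i++) {
    result[i] = values[i] * reciprocal; // one multiplication per lane
  }
  return result;
}
```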

For video codecs, support for the sum of absolute differences (SAD) operation on integer vector types can be very useful.
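A scalar reference for SAD, for instance to compare two blocks of pixels, might look like the following sketch. The function name is an assumption made for illustration.

```javascript
// Hypothetical scalar reference: sum of absolute differences over all lanes.
function sumOfAbsoluteDifferences(a, b) {
  var sum = 0;
  for (var i = 0; i < a.length; i++) {
    sum += Math.abs(a[i] - b[i]);
  }
  return sum;
}
```

A smaller result means the two blocks are more similar, which is how motion estimation uses the operation.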

Min and max operations (for clamping) are useful for every data type.
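Clamping can be built from lane-wise min and max. The sketch below uses `Int16Array` lanes and hypothetical function names.

```javascript
// Hypothetical lane-wise min and max on Int16Array lanes.
function minLanes(a, b) {
  var result = new Int16Array(a.length);
  for (var i = 0; i < a.length; i++) {
    result[i] = Math.min(a[i], b[i]);
  }
  return result;
}

function maxLanes(a, b) {
  var result = new Int16Array(a.length);
  for (var i = 0; i < a.length; i++) {
    result[i] = Math.max(a[i], b[i]);
  }
  return result;
}

// Clamp each lane of v to the range [lo[i], hi[i]].
function clampLanes(v, lo, hi) {
  return minLanes(maxLanes(v, lo), hi);
}
```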

Averaging: SSE2 seems to offer averaging operations only for unsigned 8-bit and 16-bit integers, which may indicate what is considered useful.
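The SSE2 averaging instructions (PAVGB and PAVGW) round up, computing (a + b + 1) >> 1 per lane. A scalar sketch of that behavior for unsigned 8-bit lanes, with a hypothetical function name:

```javascript
// Hypothetical scalar emulation of a rounding average on Uint8Array lanes.
function averageUnsigned(a, b) {
  var result = new Uint8Array(a.length);
  for (var i = 0; i < a.length; i++) {
    // The sum is computed in full precision, so no overflow occurs before the shift.
    result[i] = (a[i] + b[i] + 1) >> 1;
  }
  return result;
}
```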

Where applicable, every operation should accept a scalar argument, which is automatically expanded to vector form before the operation is applied.
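One way to picture the scalar expansion is a helper that fills all lanes with the same value before the vector operation runs. The names `splat` and `add` below are assumptions for illustration, not a proposed interface.

```javascript
// Hypothetical helper: expand a scalar into a vector with identical lanes.
function splat(scalar, laneCount) {
  return new Uint16Array(laneCount).fill(scalar);
}

// Hypothetical add that accepts either a vector or a scalar second operand.
function add(a, b) {
  var bVector = typeof b === "number" ? splat(b, a.length) : b;
  var result = new Uint16Array(a.length);
  for (var i = 0; i < a.length; i++) {
    result[i] = a[i] + bVector[i];
  }
  return result;
}
```

With this, `add(v, 10)` and `add(v, splat(10, v.length))` produce the same result.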

## Logical operations

- AND
- OR
- XOR
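Applied lane by lane, these operations behave like JavaScript's bitwise operators. A minimal sketch on `Uint16Array` lanes, with hypothetical function names:

```javascript
// Hypothetical lane-wise logical operations on Uint16Array lanes.
function andLanes(a, b) {
  var result = new Uint16Array(a.length);
  for (var i = 0; i < a.length; i++) result[i] = a[i] & b[i];
  return result;
}

function orLanes(a, b) {
  var result = new Uint16Array(a.length);
  for (var i = 0; i < a.length; i++) result[i] = a[i] | b[i];
  return result;
}

function xorLanes(a, b) {
  var result = new Uint16Array(a.length);
  for (var i = 0; i < a.length; i++) result[i] = a[i] ^ b[i];
  return result;
}
```

A typical use is masking, for example `andLanes(v, mask)` to keep only selected bits of each lane.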

## Shift operations

- Component-wise: shift left/right, logical and arithmetic, with the shift count given in bits
- Complete vectors: shift left/right, logical (does sign extension make sense here?); a shift count given in bytes might suffice
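The two kinds of shift can be sketched as follows on `Int16Array` lanes. All function names are hypothetical; the component-wise functions assume a shift count between 0 and 15, and the whole-vector shift moves bytes toward higher memory addresses and fills with zeros.

```javascript
// Component-wise shifts; assumes 0 <= bits < 16.
function shiftLeft(v, bits) {
  var result = new Int16Array(v.length);
  for (var i = 0; i < v.length; i++) result[i] = v[i] << bits; // store truncates to 16 bits
  return result;
}

function shiftRightArithmetic(v, bits) {
  var result = new Int16Array(v.length);
  for (var i = 0; i < v.length; i++) result[i] = v[i] >> bits; // sign bit is replicated
  return result;
}

function shiftRightLogical(v, bits) {
  var result = new Int16Array(v.length);
  // Mask to 16 bits first so that zeros, not sign bits, are shifted in.
  for (var i = 0; i < v.length; i++) result[i] = (v[i] & 0xFFFF) >>> bits;
  return result;
}

// Whole-vector shift by a byte count, toward higher byte addresses.
function shiftBytesUp(v, byteCount) {
  var src = new Uint8Array(v.buffer, v.byteOffset, v.byteLength);
  var dst = new Uint8Array(src.length); // zero-filled
  if (byteCount < src.length) {
    dst.set(src.subarray(0, src.length - byteCount), byteCount);
  }
  return new Int16Array(dst.buffer);
}
```

For a lane holding -4, an arithmetic right shift by 1 gives -2, while a logical right shift by 1 gives 32766. How an odd byte count affects individual lanes in `shiftBytesUp` depends on the byte order of the platform.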

## Pack and Unpack

Unpacking:

- 1 vector of 8 bit values -> 2 vectors of 16 bit values
- 1 vector of 16 bit values -> 2 vectors of 32 bit values

With fitting sign extension for signed data types.
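The widening step can be sketched with typed arrays, whose element conversion already performs the fitting sign extension. The function name and the `low`/`high` naming of the two results are assumptions for illustration.

```javascript
// Hypothetical helper: widen 16 signed 8-bit lanes into two vectors of 8 signed 16-bit lanes.
function unpackInt8x16(v) {
  return {
    // Constructing an Int16Array from Int8Array data sign-extends each value.
    low: new Int16Array(v.subarray(0, 8)),
    high: new Int16Array(v.subarray(8, 16))
  };
}
```

For unsigned data, the same pattern with `Uint8Array` and `Uint16Array` zero-extends instead.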

Packing:

The reverse of unpacking. Optional automatic clamping might be useful, but the programmer can also clamp prior to packing.
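A sketch of packing two vectors of signed 16-bit values into one vector of signed 8-bit values, with clamping as an option. The function names and the `saturate` flag are hypothetical.

```javascript
function clamp(x, lo, hi) {
  return Math.min(Math.max(x, lo), hi);
}

// Hypothetical helper: narrow two Int16Array vectors (8 lanes each) into one Int8Array (16 lanes).
function packInt16x8(low, high, saturate) {
  var result = new Int8Array(16);
  for (var i = 0; i < 8; i++) {
    // Without saturation, the typed array store keeps only the low 8 bits.
    result[i] = saturate ? clamp(low[i], -128, 127) : low[i];
    result[i + 8] = saturate ? clamp(high[i], -128, 127) : high[i];
  }
  return result;
}
```

With `saturate` set, a lane holding 200 becomes 127; without it, the value is truncated and becomes -56.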