SIMD/Uses/SAD

From MozillaWiki
Jump to: navigation, search

What is SAD?

The Sum of Absolute Differences is the sum of the absolute component wise difference of two vectors. For instance, in case of two vectors x and y represented as arrays:

function SAD(var x, var y) {
   var sad = 0;
   for(var i = 0; i < Math.min(x.length, y.length); ++i) {
      sad += Math.abs(x[i] - y[i]);
   }
   return sad;
}

Wikipedia has a nice article on SAD.

Uses of SAD

A major use case for SAD is motion estimation, which is an integral part of most video compression schemes. Instead of transmitting complete frames, video codecs usually try to copy similar frame data from previously transmitted frames. The positions from which to copy parts of a previous frame are coded as motion vectors. This works well, as in video sequences most objects can be found in several consecutive frames, albeit they may have moved.

SAD is often used to determine these motion vectors. Usually square regions of the current and a preceding frame are compared. Each region can be understood as a vector of pixel values. Generally speaking: the lower the SAD "score", the better the regions match. The computed motion vector is a vector pointing to the region in a preceding frame with the best match.

Code example

The excerpts are part of a public-domain MPEG-1 video encoder and can be found in https://github.com/maikmerten/mpeg/blob/master/me.c, which contains all motion estimation routines.

The following C code computes the SAD for 16 consecutive pixels (one line in a 16x16 block). The pointers "bptr" and "cptr" point to pixel values (8 bit unsigned integers) in the current and a preceding frame. The variable "error", which is the accumulator for the SAD score, is initialized as zero.

residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}
residue=(*(bptr++)-*(cptr++));
if (residue<0) {error-=residue;} else {error+=residue;}

This code only computes the difference between two pixel values at a time. It also is very branchy, given that the equivalent to Math.abs() is implemented with conditional statements.


The same computation can be done in a much more straightforward way with SSE2. In this example, SSE2 intrinsics are used.

a = _mm_loadu_si128((__m128i *) bptr);
b = _mm_loadu_si128((__m128i *) cptr);
a = _mm_sad_epu8(a, b);
b = _mm_srli_si128(a, 8);
a = _mm_add_epi32(a, b);
error += _mm_cvtsi128_si32(a);

The first two statements load 128 bit of data into SSE2 registers. Each register thus contains sixteen pixels from the frames to be compared, as every pixel is represented as a 8-bit unsigned integer.

The third statement computes the SAD score as one single operation.

By design of the SSE2 SAD instruction, the register represented by "a" now contains two sums -- for each 64-bit halve of the 128-bit register one SAD. The following instructions ensure that these two SAD components are added. The last step is to convert the content of the 128-bit SSE2 register to a 32-bit integer.

Involved data types

In this example the SAD for vectors of pixel values is computed. Each pixel is represented as a 8-bit unsigned integer. The type of the input vectors can thus be uint8x16. The output of the whole procedure can have the uint32 type.