Jsctypes/api

From MozillaWiki

js-ctypes is a library for calling C/C++ functions from JavaScript without having to write or generate any C/C++ "glue code".

js-ctypes is already in mozilla-central, but the API is subject to change. This page contains design proposals for the eventual js-ctypes API.

Proposal 1

1. opening a library and declaring a function

Cu.import("ctypes"); // imports the global ctypes object

// searches the path and opens "libmylib.so" on linux,
// "libmylib.dylib" on mac, and "mylib.dll" on windows
let mylib = ctypes.open("mylib", ctypes.SEARCH);

// declares the C prototype int32_t myfunc(int32_t)
// Int32 implies ctypes.Int32, shortened for brevity
let myfunc = mylib.declare("myfunc", DEFAULT_ABI, Int32, Int32);

let ret = myfunc(2); // calls myfunc

Note that for simple types (integers and strings), we will autoconvert the argument at call time - there's no need to pass in an Int32 object. The consumer should never need to instantiate such an object explicitly, unless they're using it to back a pointer - in which case we require explicit, strong typing. See later for examples.

2. declaring and passing a simple type (by object)

let myfunc = mylib.declare("myfunc", DEFAULT_ABI, Int32, Int32);
let i = new Int32(); // instantiates an Int32 object with default value 0
let ret = myfunc(i);

An Int32 object, like all other type objects in ctypes, can be used for type specification when passed as an object, as above. declare() can look at the prototype JSObject* of its argument, and use this as a canonical JSObject representing the type, a pointer to which can be used for simple type equality comparisons. (This will work for user-defined types such as structs also - see later - though for pointer types we need to dig down to the underlying type.)

Int32() can have two modes depending on whether JS_IsConstructing(cx) is JS_TRUE ("new Int32()") or JS_FALSE ("Int32()"). Used as a function, we could perform a type conversion with range checking, for instance:

let n = Int32(4); // JSVAL_IS_INT(n) == JS_TRUE
n = Int32(4e16); // RangeError - out of bounds
n = Int32.max; // 2^31 - 1
// etc
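
The range-checked function mode can be modeled in plain JavaScript. This is a minimal sketch, not the js-ctypes implementation: toInt32Checked is a hypothetical name, and rejecting non-integral numbers is an assumption that the proposal does not spell out.

```javascript
// Hypothetical model of calling Int32(value) as a function (no `new`).
const INT32_MIN = -(2 ** 31);
const INT32_MAX = 2 ** 31 - 1;

function toInt32Checked(value) {
  // Assumption: only integral JS numbers are accepted.
  if (typeof value !== "number" || !Number.isInteger(value)) {
    throw new TypeError("expected an integral number");
  }
  if (value < INT32_MIN || value > INT32_MAX) {
    throw new RangeError(value + " is out of bounds for int32_t");
  }
  return value;
}

console.log(toInt32Checked(4)); // 4
```

Calling toInt32Checked(4e16) throws a RangeError, mirroring the Int32(4e16) line above.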

For the new constructor, the resulting object stores three pieces of information internally in reserved slots. |new Int32()| creates a JSObject, allocates sizeof(int32_t) bytes, and stores the pointer to that allocation in a private slot. It also stores its type, as a JSObject* pointing to the canonical Int32 prototype, and can store a parent JSObject* in case it refers to an Int32 that is part of another object. Thus the slot layout of i above would be:

i object:
  slot 1 (parent): JSObject* -> NULL (no parent object)
  slot 2 (type) : JSObject* -> Int32 prototype
  slot 3 (value) : void* -> binary blob from malloc(sizeof(int32_t))

Do we need to provide an explicit set() method, to allow for efficient modification? For instance,

i.set(5); // cheaper than i = new Int32(5);
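
To make the cost argument concrete, here is a small model in plain JavaScript. It is only a sketch under assumptions: BoxedInt32, blob, and get() are hypothetical names, and an Int32Array of length 1 stands in for the malloc'd storage.

```javascript
// Hypothetical stand-in for the proposed Int32 object: the value lives in a
// 4-byte buffer (the "binary blob"), and set() overwrites it in place.
class BoxedInt32 {
  constructor(initial = 0) {
    this.blob = new Int32Array(1); // models malloc(sizeof(int32_t))
    this.set(initial);
  }
  set(value) {
    if (!Number.isInteger(value) || value < -(2 ** 31) || value > 2 ** 31 - 1) {
      throw new RangeError("value does not fit in int32_t");
    }
    this.blob[0] = value; // no new object, no new allocation
  }
  get() {
    return this.blob[0];
  }
}

const i = new BoxedInt32();
const blobBefore = i.blob;
i.set(5);
console.log(i.get(), i.blob === blobBefore); // 5 true
```

The storage object is the same before and after set(), whereas i = new Int32(5) would allocate a fresh object and a fresh blob.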

3. declaring and passing a pointer

// C prototype: int32_t myfunc(int32_t* p)
let myfunc = mylib.declare("myfunc", DEFAULT_ABI, Int32, Pointer(Int32));
let p = new Pointer(new Int32()); // instantiates an int and a pointer
let ret = myfunc(p); // the int is an outparam
let i = p.contents(); // i = *p (by reference)
let a = p.address(); // 0x...

// same thing, but with a named integer
let i = new Int32();
let p = new Pointer(i);
let ret = myfunc(p); // modifies i

// same thing, but with a pointer temporary
let i = new Int32();
let ret = myfunc(new Pointer(i)); // modifies i

// other examples
let q = new Pointer(); // instantiate a null pointer to a void type
q = new Pointer(5); // TypeError - require a ctypes type

Internally, a pointer requires a backing object (unless it's a null pointer). In the examples, the Pointer JSObject holds a reference to the Int32 JSObject for rooting purposes, and is laid out similarly to an Int32 object:

p object:
  slot 1 (parent): JSObject* -> Int32 backing object
  slot 2 (type) : JSObject* -> Pointer prototype
  slot 3 (value) : void* -> pointer to binary int32_t blob inside backing object
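
The rooting relationship can be illustrated without any native code. The following is a rough model, not the proposed implementation: Cell, PointerModel, and writeThroughPointer are hypothetical names, and a plain JavaScript function stands in for the C function that writes through the outparam.

```javascript
// Hypothetical model: a pointer keeps its backing object alive by holding a
// reference to it, and contents() hands back that same object, not a copy.
class Cell {
  constructor(value = 0) {
    this.value = value;
  }
}

class PointerModel {
  constructor(target = null) {
    if (target !== null && !(target instanceof Cell)) {
      throw new TypeError("a pointer needs a typed backing object");
    }
    this.parent = target; // slot 1: roots the backing object
  }
  contents() {
    if (this.parent === null) {
      throw new TypeError("cannot dereference a null pointer");
    }
    return this.parent;
  }
}

// Stand-in for a C function that writes through an int32_t* outparam.
function writeThroughPointer(p) {
  p.contents().value = 42;
  return 0;
}

const cell = new Cell();
writeThroughPointer(new PointerModel(cell)); // pointer temporary
console.log(cell.value); // 42
```

Because the temporary pointer holds a reference to cell, the write is visible through the original variable after the pointer itself is gone.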

4. declaring a pointer to opaque struct

const FILE = ctypes.Struct(); // creates a Struct() type with no allocated binary storage, and no fields to access
let fopen = mylib.declare("fopen", DEFAULT_ABI, Pointer(FILE), String, String);
let file = fopen("foo", "r"); // creates a new Pointer() object
file.contents(); // will throw - type is unknown
file.address(); // ok

5. declaring a struct

// C prototype: struct s_t { int32_t a; int64_t b; };
const s_t = Struct([{ a: Int32 }, { b: Int64 }]);
let myfunc = mylib.declare("myfunc", DEFAULT_ABI, Int32, s_t);

let s = new s_t(10, 20);

This creates an s_t object which allocates binary space for both fields, creates getters and setters to access the binary fields via their offset, assigns the values 10 and 20 to the fields, and whose prototype is s_t:

s object:
  slot 1 (parent): JSObject* -> NULL
  slot 2 (type) : JSObject* -> s_t prototype
  slot 3 (value) : void* -> pointer to binary blob from malloc()
  slot 4 (fields): array of data for each field:
    { JSObject* parent; JSObject* type; ptrdiff_t offset; }

The array of field information allows each field to be dependent on another JSObject (only for the case where the field is a pointer), have an associated type, and have an offset into the binary blob for ease of access.

let c = s.b; // invokes the getter for |b| to create an Int64 object like so:

c object:
  slot 1 (parent): JSObject* -> s backing object
  slot 2 (type) : JSObject* -> Int64 prototype
  slot 3 (value) : void* -> pointer to binary int64_t blob inside backing object

let i = myfunc(s); // checks the type of s by JSObject* prototype equality
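
The offset-based getters and setters described in this section can be sketched with a DataView over a single buffer. Everything here is an assumption made for illustration: STModel is a hypothetical name, the offsets assume natural alignment (a at 0, b at 8), the byte order is fixed to little-endian, and BigInt (a JavaScript feature newer than this proposal) stands in for the Int64 object.

```javascript
// Hypothetical model of s_t { int32_t a; int64_t b; } over one shared buffer.
// Offsets assume natural alignment: a at 0, b at 8, total size 16.
const LITTLE_ENDIAN = true; // assumption for this sketch

class STModel {
  constructor(a = 0, b = 0n) {
    this.view = new DataView(new ArrayBuffer(16)); // models the malloc() blob
    this.a = a; // goes through the setter below
    this.b = b;
  }
  get a() { return this.view.getInt32(0, LITTLE_ENDIAN); }
  set a(v) { this.view.setInt32(0, v, LITTLE_ENDIAN); }
  get b() { return this.view.getBigInt64(8, LITTLE_ENDIAN); }
  set b(v) { this.view.setBigInt64(8, BigInt(v), LITTLE_ENDIAN); }
}

const s = new STModel(10, 20n);
s.a = 11;
console.log(s.a, s.b); // 11 20n
```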

6. pointers to struct fields

let p = new Pointer(s.b);

Once the Int64 representing s.b is constructed, the Pointer object references it directly:

p object:
  slot 1 (parent): JSObject* -> Int64 backing object (which, in turn, is backed by s)
  slot 2 (type) : JSObject* -> Pointer prototype
  slot 3 (value) : void* -> pointer to binary int64_t blob inside backing object

7. nested structs

const u_t = Struct([{ x: Int64 }, { y: s_t }]);
let u = new u_t(5e4, s); // copies data from s into u.y - no references

let u_field = u.y; // creates an s_t object that points directly to the offset of u.y within u.

const v_t = Struct([{ x: Pointer(s_t) }, { y: Pointer(s_t) }]);
let v = new v_t(new Pointer(s), new Pointer(s));

In this case, each entry in the fields array will have its respective Pointer as the parent object, and both Pointers will point to the s binary blob.

Proposal 2

Types

A type maps JS values to C/C++ values and vice versa. They're used when declaring functions. They can also be used to create and populate C/C++ data structures entirely from JS.

Built-in types

ctypes provides the following types:

ctypes.int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float32_t, float64_t - Primitive numeric types that behave the same way on all platforms (with the usual caveat that floating-point behavior differs slightly between platforms in corner cases, and there's a limit to what we can realistically do about it).
Since some 64-bit values are outside the range of the JavaScript number type, ctypes.int64_t and ctypes.uint64_t do not autoconvert to JS numbers.
ctypes.size_t, ssize_t, intptr_t, uintptr_t - Primitive types whose size depends on the platform. These types do not autoconvert to JavaScript numbers because on some platforms, there are values of these types that cannot be precisely represented as a JS number.
(Open issue: Operator overloading will eventually come to JS. JS will likely have a 64-bit integer object type someday. The above non-autoconverting behavior prevents us from later autoconverting these CTypes to 64-bit values. Maybe we should autoconvert to a crummy 64-bit number type for now, with just valueOf and toString methods, so that in the future we can compatibly upgrade to a better one.)
ctypes.bool, short, unsigned_short, int, unsigned, unsigned_int, long, unsigned_long, float, double - Types that behave like the corresponding C types. Some or all of these might be aliases for the primitive types listed above. As in C, unsigned is always an alias for unsigned_int.
(Open issue: Does long autoconvert to a JS number?)
ctypes.char, ctypes.signed_char, ctypes.unsigned_char - Character types that behave like the corresponding C types. (These are distinct from int8_t and uint8_t in details of conversion behavior. For example, js-ctypes autoconverts between C characters and one-character JavaScript strings.)
ctypes.string, ustring - String types. The C/C++ type for ctypes.string is const char *. C/C++ values of this type must be either null or pointers to null-terminated strings. ctypes.ustring is the same, but for const jschar *; that is, the code units of the string are uint16_t.
ctypes.void_t - The special C type void. This can be used as a return value type. (void is a keyword in JavaScript.)
ctypes.voidptr_t - The C type void *.
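
The precision concern behind the non-autoconverting 64-bit and pointer-sized types can be checked directly in JavaScript. This sketch uses BigInt, which postdates this proposal, purely to make the lost bits visible.

```javascript
// 2^53 is the first point where consecutive integers collide as JS numbers.
const limit = 2 ** 53;
console.log(limit === Number.MAX_SAFE_INTEGER + 1); // true
console.log(limit + 1 === limit); // true: 2^53 + 1 is not representable

// INT64_MAX loses its low bits when forced into a double.
const int64Max = 2n ** 63n - 1n;
const asNumber = Number(int64Max);
console.log(BigInt(asNumber) === int64Max); // false
console.log(BigInt(asNumber) === 2n ** 63n); // true: rounded up to 2^63
```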

Starting from those builtin types, ctypes can create additional types:

new ctypes.PointerType(t) - If t is a ctypes type, return the type "pointer to t". If t is a string, instead return a new opaque pointer type named t. Otherwise throw a TypeError.
new ctypes.ArrayType(t) - Return an array type with unspecified length and element type t. If t is not a type or t.size is undefined, throw a TypeError.
new ctypes.ArrayType(t, n) - Return the array type t[n]. If t is not a type, or t.size is undefined, or n is not a nonnegative integer, throw a TypeError.
new ctypes.StructType(name, fields) - Create a new struct type with the given name and fields. fields is an array of field descriptors. js-ctypes calculates the offsets of the fields from its encyclopedic knowledge of the architecture's struct layout rules. If name is not a string, or fields contains a field descriptor with a type t such that t.size is undefined, throw a TypeError.

(Open issue: Specify a way to tell ctypes.StructType to use #pragma pack(n).)
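
As a rough illustration of the offset calculation and of what a pack option might change, here is a sketch in plain JavaScript. layoutStruct and its pack parameter are hypothetical, and the rule that a primitive's alignment equals its size is an assumption that holds on many ABIs but not all of them.

```javascript
// Hypothetical layout calculator. Each field is [size, name]; the sketch
// assumes a primitive's alignment equals its size.
function layoutStruct(fields, pack = Infinity) {
  let offset = 0;
  let maxAlign = 1;
  const offsets = {};
  for (const [size, name] of fields) {
    const align = Math.min(size, pack); // #pragma pack(n) caps alignment
    offset = Math.ceil(offset / align) * align; // pad up to the alignment
    offsets[name] = offset;
    offset += size;
    maxAlign = Math.max(maxAlign, align);
  }
  // Tail padding so that arrays of the struct keep every element aligned.
  const size = Math.ceil(offset / maxAlign) * maxAlign;
  return { offsets, size };
}

// struct { int32_t a; int64_t b; }
const natural = layoutStruct([[4, "a"], [8, "b"]]);
const packed = layoutStruct([[4, "a"], [8, "b"]], 4);
console.log(natural.offsets.b, natural.size); // 8 16
console.log(packed.offsets.b, packed.size);   // 4 12
```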

(TODO: Finish specifying field descriptors.)

These constructors behave exactly the same way when called without the new keyword.
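
One way a constructor can behave identically with and without new is to check new.target and re-enter itself. This is a sketch in modern JavaScript with a hypothetical PointerTypeModel; it says nothing about how js-ctypes would implement this natively.

```javascript
// Hypothetical type constructor that works with or without `new`.
function PointerTypeModel(target) {
  if (!new.target) {
    return new PointerTypeModel(target); // re-enter as a constructor
  }
  this.targetType = target;
}

const withNew = new PointerTypeModel("int32_t");
const withoutNew = PointerTypeModel("int32_t");
console.log(withNew instanceof PointerTypeModel, withoutNew instanceof PointerTypeModel); // true true
```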

Examples:

const DWORD = ctypes.uint32_t;
const HANDLE = new ctypes.PointerType("HANDLE");
const HANDLES = new ctypes.ArrayType(HANDLE);
const FILE = new ctypes.PointerType("FILE *");
const IOBuf = new ctypes.ArrayType(ctypes.uint8_t, 4096);

const struct_tm = new ctypes.StructType('tm', [[ctypes.int, 'tm_sec'], ...]);

Properties of types

All the fields described here are read-only.

All types have these properties:

t.size - The C/C++ sizeof the type, in bytes.
If t is an array type with unspecified length, t.size is undefined.
ctypes.void_t.size is undefined.
t.name - A string, the type's name. It's intended that in ordinary use, this will be a C/C++ type expression, but it's not really meant to be machine-readable in all cases.
For primitive types this is just the name of the corresponding C/C++ type, e.g. ctypes.int32_t.name == "int32_t" and ctypes.void_t.name == "void". But some of the builtin types are aliases for other types, so it might be that ctypes.unsigned_long.name == "uint32_t" (or something else). (Open issue: Is that too astonishing? Python ctypes does the same thing.)
For struct types and opaque pointer types, this is simply the string that was passed to the constructor; e.g. FILE.name == "FILE *" and struct_tm.name == "tm". For other pointer types and array types this should try to generate valid C/C++ type expressions, which isn't exactly trivial.
(Open issue: This conflicts with the usual meaning of .name for functions, and types are callable like functions.)
t.toString() - Returns "type " + t.name.

Pointer types also have:

t.targetType - Read-only. The pointed-to type, or null if t is an opaque pointer type.

Struct types also have:

t.fields - Read-only. A sealed array of field descriptors. (TODO: Details.)

Array types also have:

t.elementType - The type of the elements of an array of this type. E.g. IOBuf.elementType === ctypes.uint8_t.
t.length - The number of elements, a nonnegative integer; or undefined if this is an array type with unspecified length.

Minutiae:

ctypes.CType is the abstract-base-class constructor of all js-ctypes types. If called, it throws a TypeError. (This is exposed in order to expose ctypes.CType.prototype.)
The [[Class]] of a ctypes type is "CType".
The [[Class]] of the type constructors ctypes.{C,Array,Struct,Pointer}Type is "Function".
Every CType has a read-only, permanent .prototype property. The type-constructors ctypes.{C,Pointer,Struct,Array}Type each have a read-only, permanent .prototype property as well.
Types have a hierarchy of prototype objects. The prototype of ctypes.CType.prototype is Function.prototype. The prototype of ctypes.{Array,Struct,Pointer}Type.prototype and of all the builtin types except for the string types and ctypes.voidptr_t is ctypes.CType.prototype. The prototype of an array type is ctypes.ArrayType.prototype. The prototype of a struct type is ctypes.StructType.prototype. The prototype of a string type or pointer type is ctypes.PointerType.prototype.
Every CType t has t.prototype.constructor === t; that is, its .prototype has a read-only, permanent, own .constructor property that refers to the type. The same is true of the four type constructors ctypes.{C,Array,Struct,Pointer}Type.

Calling types

CTypes are JavaScript constructors. That is, they are functions, and they can be called to create new objects. (The objects they create are called CData objects, and they are described in the next section.)

new t or new t() or t() - Create a new CData object of type t.
Without arguments, these allocate a new buffer of t.size bytes, populate it with zeroes, and return a new CData object referring to the complete object in that buffer.
If t.size is undefined, this throws a TypeError.
new t(val) or t(val) - Convert val to type t according to the explicit conversion rules below, throwing a TypeError if the conversion is impossible. Allocate a new buffer of t.size bytes, populated with the converted value. Return a new CData object of type t referring to the complete object in that buffer. (When val is a CData object of type t, the behavior is like malloc followed by memcpy.)
As a special case, if t is an array type of unspecified length and typeof val is 'number' and val is a nonnegative integer, allocate a new buffer of size val * t.elementType.size. Populate it with zeroes. Return a CData object of type t referring to the new array.
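
The special case for array types of unspecified length can be modeled with an ArrayBuffer, whose contents start out zeroed. This is a sketch under assumptions: makeArrayTypeModel and the returned { length, buffer } shape are hypothetical, and a missing length is treated as the "t.size is undefined" error case.

```javascript
// Hypothetical model of t(n) for an array type of unspecified length.
function makeArrayTypeModel(elementSize) {
  return function construct(val) {
    if (typeof val !== "number" || !Number.isInteger(val) || val < 0) {
      throw new TypeError("length must be a nonnegative integer");
    }
    // ArrayBuffer contents start out as zeroes, like the proposed allocation.
    return { length: val, buffer: new ArrayBuffer(val * elementSize) };
  };
}

const Uint32ArrayModel = makeArrayTypeModel(4);
const arr = Uint32ArrayModel(30);
console.log(arr.buffer.byteLength); // 120
console.log(new Uint8Array(arr.buffer).every((byte) => byte === 0)); // true
```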

CData objects

A CData object represents a C/C++ value located in memory. The address of the C/C++ value can be taken (using the .address() method), and it can be assigned to (using the .assign() method).

All CData objects have these methods and properties:

cdata.address() - Return a new CData object of the pointer type ctypes.PointerType(cdata.constructor) whose value points to the C/C++ object referred to by cdata.
cdata.assign(val) - Convert val to the type of cdata using the implicit conversion rules. Store the converted value in the buffer location referred to by cdata.
cdata.constructor - Read-only. The type of cdata. (Implementation note: The prototype of cdata is an object that has a read-only constructor property, as detailed under "minutiae".)
cdata.toSource() - Return the string "t(arg)" where t and arg are implementation-defined JavaScript expressions (intended to represent the type of cdata and its value, respectively). The intent is that eval(cdata.toSource()) should ideally produce a new CData object containing a copy of cdata, but this can only work if the type of cdata happens to be bound to an appropriate name in scope.
cdata.toString() - Return the same string as cdata.toSource().

CData objects of struct types have getters and setters for each struct member:

cstruct.member - Let F be a CData object referring to the struct member. Return ConvertToJS(F).
cstruct.member = value - The value is converted to the type of the member using the implicit conversion rules. The converted value is stored in the buffer.
cstruct.addressOfField(name) - Return a new CData object of the appropriate pointer type, whose value points to the field of cstruct with the name name. If name is not a JavaScript string or does not name a field of cstruct, throw a TypeError.

These getters and setters can shadow the properties and methods described above. (Open issue: Can they really shadow .constructor? Maybe StructType should shoot you down if you try that one.)

Likewise, CData objects of array types have getters and setters for each element. Arrays additionally have a length property.

Note that these getters and setters are only present for integers i in the range 0 ≤ i < carray.length. (Open issue: can we arrange to throw an exception if i is out of range?)

carray[i] - Let E be a CData object referring to the element at index i. Return ConvertToJS(E).
carray[i] = val - Convert val to the type of the array element using the implicit conversion rules and store the result in the buffer.
carray.length - Read-only. The length of the array.
(Open issue: Do we care about arrays eventually having a length greater than 2^53, i.e. not representable as a JS number? It's currently impossible even on 64-bit platforms.)
carray.addressOfElement(i) - Return a new CData object of the appropriate pointer type (ctypes.PointerType(carray.constructor.elementType)) whose value points to element i of carray. If i is not a JavaScript number that is a valid index of carray, throw a TypeError.
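
On the open issue of throwing for out-of-range indices: in present-day JavaScript this can be done with a Proxy, a feature that did not exist when this proposal was written. The sketch below wraps a typed array; boundsChecked is a hypothetical helper and is not part of any ctypes API.

```javascript
// Hypothetical bounds-checked array view built on Proxy.
// Non-index properties such as "length" pass through untouched.
function boundsChecked(typedArray) {
  const checkIndex = (key) => {
    if (typeof key !== "string" || !/^-?\d+$/.test(key)) {
      return; // not an integer index, e.g. "length" or a symbol
    }
    const index = Number(key);
    if (index < 0 || index >= typedArray.length) {
      throw new RangeError("index " + key + " is out of range");
    }
  };
  return new Proxy(typedArray, {
    get(target, key) {
      checkIndex(key);
      const value = Reflect.get(target, key);
      return typeof value === "function" ? value.bind(target) : value;
    },
    set(target, key, value) {
      checkIndex(key);
      return Reflect.set(target, key, value);
    },
  });
}

const carray = boundsChecked(new Int32Array(4));
carray[3] = 7;
console.log(carray[3], carray.length); // 7 4
```

Reading or writing carray[4] or carray[-1] throws a RangeError instead of silently yielding undefined.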

(TODO: Figure out if the type of new FooArray(30) is FooArray or ArrayType(Foo, 30).)

(TODO: Possibly, a way to get a CData object that acts like a view on a window of an array. E.g. carray.slice(start, stop). Then you could .assign one region of memory to another, effectively memcpy-ing.)

(TODO: Pointer types might need some properties of their own.)

It is possible for multiple CData objects to refer to the same memory. (In this way they are sort of like C++ references.) For example:

const Point = new ctypes.StructType(
    "Point", [[ctypes.int32_t, 'x'], [ctypes.int32_t, 'y']]);
const Rect = new ctypes.StructType(
    "Rect", [[Point, 'topLeft'], [Point, 'bottomRight']]);

var r = Rect();     // a new CData object of type Rect
r.topLeft.x = 100;  // This works because r.topLeft is a CData object
                    // that refers to the topLeft member of r, not a copy.

r.toSource()
===> "Rect({topLeft: Point({x: 100, y: 0}), bottomRight: Point({x: 0, y: 0})})"

Minutiae:

The [[Class]] of a CData object is "CData".
The prototype of a CData object is the same as its type's .prototype property.

(Implementation notes: A CData object has a reserved slot that points to its type; a reserved slot that contains null if the object owns its own buffer, and otherwise points to the base CData object that owns the backing buffer where the data is stored; and a data pointer. The data pointer points to the actual location within the buffer of the C/C++ object to which the CData object refers. Since the data pointer might not be aligned to 2 bytes, PRIVATE_TO_JSVAL is insufficient; a custom JSClass.trace hook will be needed. If the object owns its own buffer, its finalizer frees it. Other CData objects that point into the buffer keep the base CData, and therefore the underlying buffer, alive.)
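
The ownership scheme in the implementation notes can be mimicked with DataView objects over one ArrayBuffer. This is only a model under assumptions: PointView, RectView, and the owner property are hypothetical names, garbage collection of the base object is left to the JavaScript engine, and the byte order is fixed to little-endian.

```javascript
// Hypothetical model of CData objects that share one buffer. `owner` is null
// when the object owns its buffer, else it points at the base object.
class PointView {
  constructor(buffer = new ArrayBuffer(8), offset = 0, owner = null) {
    this.owner = owner; // keeps the base object (and its buffer) alive
    this.view = new DataView(buffer, offset, 8);
  }
  get x() { return this.view.getInt32(0, true); }
  set x(v) { this.view.setInt32(0, v, true); }
  get y() { return this.view.getInt32(4, true); }
  set y(v) { this.view.setInt32(4, v, true); }
}

class RectView {
  constructor() {
    this.owner = null;
    this.buffer = new ArrayBuffer(16); // zero-filled: two Points of 8 bytes
  }
  get topLeft() { return new PointView(this.buffer, 0, this); }
  get bottomRight() { return new PointView(this.buffer, 8, this); }
}

const r = new RectView();
r.topLeft.x = 100; // writes into r's buffer through a temporary view
console.log(r.topLeft.x, r.bottomRight.x); // 100 0
```

Each access to r.topLeft creates a new view object, yet every such view reads and writes the same 8 bytes of r's buffer.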

Conversions

The implicit conversion rules are applied whenever a JavaScript value of any kind is passed to a parameter of a ctypes-declared function, passed to cdata.assign(val), or assigned to an array element or struct member, as in carray[i] = val or cstruct.member = val. These rules are intended to lose precision only when there is no reasonable alternative. They generally do not coerce values of one type to another type.

(TODO: precise rules. Some of the properties we're shooting for here are: if val is a CData object of the right type, return its C/C++ value; applying the rules to a JS number is exactly the same as applying them to the corresponding C/C++ double; applying the rules to a JS boolean is exactly the same as applying them to the corresponding C/C++ bool; plain old JS Objects can implicitly convert to C/C++ structs; plain old JS Arrays can implicitly convert to C/C++ arrays.)

The explicit conversion rules are applied when a JavaScript value is passed as a parameter when calling a type, as in t(val) or new t(val). These rules are a bit more aggressive.

(TODO: precise rules. Properties we're shooting for: if implicit conversion produces a result, explicit conversion produces the same result; in some but not all cases where a C++ typename(value) function-like cast expression would work, explicit conversion also works.)

ConvertToJS(x) - This function is used to convert a CData object or a C/C++ return value to a JavaScript value. The intent is to return a simple JavaScript value whenever possible, and a CData object otherwise. The precise rules are:

  • If the value is of type void, return undefined.
  • If the value is of type bool, return the corresponding JavaScript boolean.
  • If the value is of a number type other than the pointer-sized types and the 64-bit types, return the corresponding JavaScript number.
  • If the value is of a character type, select the corresponding Unicode character. (Open issue: here be dragons!) Convert the character to UTF-16. Return a JavaScript string containing the UTF-16 code units. (If the character type is 1 or 2 bytes, as it is on all platforms we care about, the result is a one-character JavaScript string.)
  • If the value is of a string type and is NULL, return null.
  • If the value is of type string and is non-null, transcode it to UTF-16 and return a JavaScript string containing the UTF-16 code units. (Open issue: here be dragons!)
  • If the value is of type ustring and is non-null, return a JavaScript string containing the same sequence of 16-bit characters.
  • Otherwise the value is of an array, struct, or pointer type. If the argument x is already a CData object, return it. Otherwise allocate a buffer containing a copy of the C/C++ value x, and return a CData object of the appropriate type referring to the object in the new buffer.

Note that we do not autoconvert null C/C++ pointers to the JavaScript null value.
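
The ConvertToJS rules above can be summarized as a dispatch on the C type. The sketch below is a simplified model: the { kind, value } description and the kind tags are invented for illustration, transcoding is skipped, and keeping 64-bit and pointer-sized integers wrapped is an assumption drawn from the built-in type descriptions rather than from the rule list itself.

```javascript
// Hypothetical model of ConvertToJS. A C value is described as
// { kind, value }; the `kind` tags are invented for this sketch.
function convertToJS(x) {
  switch (x.kind) {
    case "void":
      return undefined;
    case "bool":
      return Boolean(x.value);
    case "number": // numeric types other than 64-bit and pointer-sized ones
      return Number(x.value);
    case "char": // assumes the value is already a UTF-16 code unit
      return String.fromCharCode(x.value);
    case "string":
    case "ustring": // a null string becomes JS null; otherwise a JS string
      return x.value === null ? null : String(x.value);
    default: // arrays, structs, pointers, 64-bit and pointer-sized integers
      return x; // stays wrapped; a null pointer is NOT turned into JS null
  }
}

console.log(convertToJS({ kind: "number", value: 10 }));   // 10
console.log(convertToJS({ kind: "string", value: null })); // null
```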

Examples

Basic types:

let i = new ctypes.uint32_t(5); // allocate sizeof(uint32_t)==4 bytes, initialize to 5, and return a new CData object
const setint = ctypes.declare("setint", ctypes.abi.default, ctypes.void_t, ctypes.PointerType(ctypes.uint32_t));
setint(i); // implicitly passes the address of allocated buffer
That was what I originally proposed, but now I think we should use the more explicit idiom from Python ctypes: setint(byref(i)) or (I think equivalently in this case) setint(pointer(i)). This being JavaScript I think it would be OK to change the syntax to something like setint(i.ptr()) or setint(i.address). In fact I think it's important to provide this as a property or method because we do not expect people to do the equivalent of from ctypes import *, and setint(ctypes.byref(i)) is a mess. --jorendorff 14:24, 29 September 2009 (UTC)
const getintp = ctypes.declare("getintp", ctypes.abi.default, ctypes.PointerType(ctypes.uint32_t));
let p = getintp(); // creates a ctypes pointer that holds the returned address
ctypes pointers are gone. With the new language, this returns a CData object of type uint32_t *. --jorendorff 14:24, 29 September 2009 (UTC)
let q = ctypes.castPointer(ctypes.Pointer(ctypes.uint8_t), p); // cast to uint8_t... why isn't this a method on Pointer?
Because it's a footgun. It shouldn't be right at your fingertips, you should have to dig around for it. I think we can call this ctypes.cast, which is what Python ctypes calls it. --jorendorff 14:24, 29 September 2009 (UTC)
let k = ctypes.pointerToUnsafeReference(q); // likewise?
This is gone now, but I haven't looked at Python ctypes to steal a replacement for it yet. TODO. --jorendorff 14:24, 29 September 2009 (UTC)

Struct fields:

const u_t = new ctypes.StructType('u_t', [[ctypes.uint32_t, 'x'], [ctypes.uint32_t, 'y']]);
let u = new u_t(5, 10); // allocates 2*sizeof(uint32_t) == 8 bytes and creates a CData object
u.x = 7; // setter for u.x modifies field
let i = u.y; // getter for u.y returns ConvertToJS(reference to u.y) -> primitive value 10
i = 5; // doesn't touch u.y

const v_t = new ctypes.StructType('v_t', [[u_t, 'u'], [ctypes.uint32_t, 'z']]);
let v = new v_t;  // allocates 12 bytes, zeroes them out, and creates a CData object
let w = v.u; // ConvertToJS(reference to v.u) returns reference
w.x = 3; // invokes setter
setint(v.u.x); // TypeError - primitive is not a reference or pointer
let p = ctypes.addressOfField(v.u, 'x'); // pointer to v.u.x
setint(p); // ok - manually pass address
let q = v.u.addressOfField('x'); // abbreviated syntax?
That makes sense to me! --jorendorff 14:24, 29 September 2009 (UTC)

64-bit integers: (check me!)

// want to represent 64-bit ints as CData objects always, rather than
// autoconverting to an int/double primitive, to avoid loss of precision.
// use the same behavior for size_t and ptrdiff_t.
let i = new ctypes.int64_t(5);
let j = i;
j = 6; // invokes setter on i
This setter trick can't be done. The user will have to do: j.assign(6); --jorendorff 14:24, 29 September 2009 (UTC)
const m_t = new ctypes.StructType(
    'm_t', [[ctypes.int64_t, 'x'], [ctypes.int64_t, 'y']]);
let m = new m_t;
const setint64 = ctypes.declare("setint64", ctypes.abi.default, ctypes.void_t, ctypes.PointerType(ctypes.int64_t));
setint64(m.x); // ok - unlike int32_t case, ConvertToJS returns a reference to the field m.x
setint64(ctypes.addressOfField(m, 'x')); // also works, per int32_t case
Right. However I'm agitating to change this for future-compatibility; see hand-wringing near where int64_t and size_t are documented above. --jorendorff 14:24, 29 September 2009 (UTC)
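
The reply above about j = 6 comes down to how JavaScript assignment works: assigning to a variable rebinds the name and never reaches the object it used to refer to, so only a method call such as assign() can update the underlying value. The sketch below uses a hypothetical Int64Model, with BigInt standing in for the 64-bit storage.

```javascript
// Hypothetical stand-in for an int64_t CData object.
class Int64Model {
  constructor(value = 0n) { this.value = BigInt(value); }
  assign(value) { this.value = BigInt(value); }
}

const i = new Int64Model(5);
let j = i;   // j and i name the same object
j.assign(6); // mutates the shared object
console.log(i.value); // 6n

j = 7;       // rebinds the variable j only; no setter can intercept this
console.log(i.value); // 6n
```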