Jsctypes/api

From MozillaWiki

js-ctypes is a library for calling C/C++ functions from JavaScript without having to write or generate any C/C++ "glue code".

js-ctypes is already in mozilla-central, but the API is subject to change. This page contains design proposals for the eventual js-ctypes API.

Libraries

ctypes.open(name) - Open a library. (TODO: all the details) This always returns a Library object or throws an exception.

Library objects have the following methods:

lib.declare(name, abi, rtype, [argtype1, ...]) - Declare a function. (TODO: all the details) This always returns a new callable CData object representing a function pointer to name, or throws an exception.
If rtype is an array type, this throws a TypeError.
If any argtypeN is an array type, the result is the same as if it had been the corresponding pointer type, argtypeN.elementType.ptr. (Rationale: This is how C and C++ treat array types in function declarations.)

(TODO: Explain what happens when you call a declared function. In brief: It uses ImplicitConvert to convert the JavaScript arguments to C and ConvertToJS to convert the return value to JS.)

Types

A type maps JS values to C/C++ values and vice versa. They're used when declaring functions. They can also be used to create and populate C/C++ data structures entirely from JS.

(Types and their prototypes are extensible: scripts can add new properties to them. Rationale: This is how most JavaScript constructors behave.)

Built-in types

ctypes provides the following types:

ctypes.int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float32_t, float64_t - Primitive numeric types that behave the same way on all platforms (with the usual caveat that every platform has slightly different floating-point behavior, in corner cases, and there's a limit to what we can realistically do about it).
Since some 64-bit values are outside the range of the JavaScript number type, ctypes.int64_t and ctypes.uint64_t do not autoconvert to JavaScript numbers. Instead, they convert to objects of the wrapper types ctypes.Int64 and ctypes.UInt64 (which are JavaScript object types, not CTypes). See "64-bit integer objects" below.
ctypes.size_t, ssize_t, intptr_t, uintptr_t - Primitive types whose size depends on the platform. (These types do not autoconvert to JavaScript numbers. Instead they convert to wrapper objects, even on 32-bit platforms. See "64-bit integer objects" below. Rationale: On 64-bit platforms, there are values of these types that cannot be precisely represented as JS numbers. It will be easier to write code that works on multiple platforms if the builtin types autoconvert in the same way on all platforms.)
ctypes.bool, short, unsigned_short, int, unsigned, unsigned_int, long, unsigned_long, float, double - Types that behave like the corresponding C types. As in C, unsigned is always an alias for unsigned_int.
(ctypes.long and ctypes.unsigned_long autoconvert to 64-bit integer objects on all platforms. The rest autoconvert to JavaScript numbers. Rationale: Some platforms have 64-bit long and some do not.)
ctypes.char, ctypes.signed_char, ctypes.unsigned_char - Character types that behave like the corresponding C types. (These are very much like int8_t and uint8_t, but they differ in some details of conversion. For example, ctypes.char.array(30)(str) converts the string str to UTF-8 and returns a new CData object of array type.)
ctypes.char16_t - A 16-bit unsigned character type representing a UTF-16 code unit. (This is distinct from uint16_t in details of conversion behavior. js-ctypes autoconverts C char16_ts to JavaScript strings of length 1.) For backwards compatibility, ctypes.jschar is an alias for char16_t.
ctypes.void_t - The special C type void. This can be used as a return value type. (void is a keyword in JavaScript.)
ctypes.voidptr_t - The C type void *.

The wrapped integer types are the types int64_t, uint64_t, size_t, ssize_t, intptr_t, uintptr_t, long, and unsigned_long. These are the types that autoconvert to 64-bit integer objects rather than to primitive JavaScript numbers.

User-defined types

Starting from the builtin types above, these functions can be used to create additional types:

new ctypes.PointerType(t) - If t is a CType, return the type "pointer to t". The result is cached so that future requests for this pointer type produce the same CType object. If t is a string, instead return a new opaque pointer type named t. Otherwise throw a TypeError.
new ctypes.FunctionType(abi, rt, [ at1, ... ]) - Return a function pointer CType corresponding to the C type rt (*) (at1, ...), where abi is a ctypes ABI type and rt and at1, ... are CTypes. Otherwise throw a TypeError.
new ctypes.ArrayType(t) - Return an array type with unspecified length and element type t. If t is not a type or t.size is undefined, throw a TypeError.
new ctypes.ArrayType(t, n) - Return the array type t[n]. If t is not a type or t.size is undefined or n is not a size value (defined below), throw a TypeError. If the size of the resulting array type, in bytes, would not be exactly representable both as a size_t and as a JavaScript number, throw a RangeError.
A size value is either a non-negative, integer-valued primitive number, an Int64 object with a non-negative value, or a UInt64 object.
(Array types with 0 elements are allowed. Rationale: C/C++ allow them, and it is convenient to be able to pass an array to a foreign function, and have it autoconverted to a C array, without worrying about the special case where the array is empty.)
new ctypes.StructType(name, fields) - Create a new struct type with the given name and fields. fields is an array of field descriptors, of the format
[ { field1: type1 }, { field2: type2 }, ... ]
where fieldn is a string denoting the name of the field, and typen is a ctypes type. js-ctypes calculates the offsets of the fields from its encyclopedic knowledge of the architecture's struct layout rules. If name is not a string, or any typen is such that typen.size is undefined, throw a TypeError. If the size of the struct, in bytes, would not be exactly representable both as a size_t and as a JavaScript number, throw a RangeError.

(Open issue: Specify a way to tell ctypes.StructType to use #pragma pack(n).)
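The layout computation that ctypes.StructType performs can be illustrated in plain JavaScript. This is only a minimal sketch, assuming the common rule that a primitive field is aligned to its own size (true on many ABIs, but not guaranteed). The helper layoutStruct and its {name, size} descriptors are hypothetical and are not part of js-ctypes.

```javascript
// Hypothetical helper: compute field offsets under "natural alignment",
// where a primitive field of size s must start at a multiple of s.
// Real ABIs differ in details; this only illustrates the idea.
function layoutStruct(fields) {
  let offset = 0;
  let maxAlign = 1;
  const offsets = {};
  for (const { name, size } of fields) {
    const align = size;                          // assumption: alignment == size
    offset = Math.ceil(offset / align) * align;  // pad up to the alignment
    offsets[name] = offset;
    offset += size;
    maxAlign = Math.max(maxAlign, align);
  }
  // The total size is padded to a multiple of the strictest alignment.
  const size = Math.ceil(offset / maxAlign) * maxAlign;
  return { offsets, size };
}

const layout = layoutStruct([
  { name: "flag", size: 1 },   // like uint8_t
  { name: "count", size: 4 },  // like int32_t
  { name: "id", size: 2 },     // like uint16_t
]);
console.log(layout.offsets, layout.size); // { flag: 0, count: 4, id: 8 } 12
```

A packing option such as #pragma pack(n) would, in this model, amount to capping align at n.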

These constructors behave exactly the same way when called without the new keyword.

Examples:

const DWORD = ctypes.uint32_t;
const HANDLE = new ctypes.PointerType("HANDLE");
const HANDLES = new ctypes.ArrayType(HANDLE);
const FILE = new ctypes.StructType("FILE").ptr;
const IOBuf = new ctypes.ArrayType(ctypes.uint8_t, 4096);

const struct_tm = new ctypes.StructType('tm', [{'tm_sec': ctypes.int}, ...]);

const comparator_t = new ctypes.FunctionType(ctypes.default_abi, ctypes.int, [ ctypes.voidptr_t, ctypes.voidptr_t ]);
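The checks that ctypes.ArrayType(t, n) makes on n can be modelled in plain JavaScript. This is a rough sketch under stated assumptions: it handles only primitive numbers (the Int64 and UInt64 cases are left out), it assumes a 64-bit size_t, and it uses BigInt, which postdates this proposal. The names isPrimitiveSizeValue and checkedArrayByteSize are hypothetical.

```javascript
// Hypothetical model of the "size value" test, primitive numbers only.
function isPrimitiveSizeValue(n) {
  return typeof n === "number" && Number.isInteger(n) && n >= 0;
}

// Hypothetical model of the checks made by ArrayType(t, n), given the
// element size in bytes. Assumes size_t is 64 bits wide.
function checkedArrayByteSize(elementSize, n) {
  if (!isPrimitiveSizeValue(n)) {
    throw new TypeError("length is not a size value");
  }
  const bytes = BigInt(elementSize) * BigInt(n);
  const fitsSizeT = bytes < 2n ** 64n;
  // Exactly representable as a JS number: survives a round trip.
  const exactAsNumber = BigInt(Number(bytes)) === bytes;
  if (!fitsSizeT || !exactAsNumber) {
    throw new RangeError("array size is not exactly representable");
  }
  return Number(bytes);
}

console.log(checkedArrayByteSize(1, 4096)); // 4096, like IOBuf.size
console.log(checkedArrayByteSize(4, 0));    // 0: empty arrays are allowed
```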

Properties of types

All the fields described here are read-only.

All types have these properties and methods:

t.size - The C/C++ sizeof the type, in bytes. The result is a primitive number, not a UInt64 object.
If t is an array type with unspecified length, t.size is undefined.
ctypes.void_t.size is undefined.
t.name - A string, the type's name. It's intended that in ordinary use, this will be a C/C++ type expression, but it's not really meant to be machine-readable in all cases.
For primitive types this is just the name of the corresponding C/C++ type.
For struct types and opaque pointer types, this is simply the string that was passed to the constructor. For other function, pointer, and array types this should try to generate valid C/C++ type expressions, which isn't exactly trivial.
(Open issue: This conflicts with the usual meaning of .name for functions, and types are callable like functions.)
ctypes.int32_t.name
  ===> "int32_t"
ctypes.void_t.name
  ===> "void"
ctypes.char16_t.ptr.name
  ===> "char16_t *"

const FILE = new ctypes.StructType("FILE").ptr;
FILE.name
  ===> "FILE*"

const fn_t = new ctypes.FunctionType(ctypes.stdcall, ctypes.int, [ ctypes.voidptr_t, ctypes.voidptr_t ]);
fn_t.name
  ===> "int (__stdcall *)(void*, void*)"

const struct_tm = new ctypes.StructType("tm", [{tm_sec: ctypes.int}, ...]);
struct_tm.name
  ===> "tm"

// Pointer-to-array types are not often used in C/C++.
// Such types have funny-looking names.
const ptrTo_ptrTo_arrayOf4_strings =
    new ctypes.PointerType(
      new ctypes.PointerType(
        new ctypes.ArrayType(new ctypes.PointerType(ctypes.char), 4)));
ptrTo_ptrTo_arrayOf4_strings.name
  ===> "char *(**)[4]"
t.ptr - Return ctypes.PointerType(t).
t.array() - Return ctypes.ArrayType(t).
t.array(n) - Return ctypes.ArrayType(t, n).
Thus a quicker (but still almost as confusing) way to write the type in the previous example would be:
const ptrTo_ptrTo_arrayOf4_strings = ctypes.char.ptr.array(4).ptr.ptr;
(.array() requires parentheses but .ptr doesn't. Rationale: .array() has to be able to handle an optional parameter. Note that in C/C++, to write an array type requires brackets, optionally with a number in between: int [10] --> ctypes.int.array(10). Writing a pointer type does not require the brackets.)
t.toString() - Return "type " + t.name.
t.toSource() - Return a JavaScript expression that evaluates to a CType describing the same C/C++ type as t.
ctypes.uint32_t.toSource()
  ===> "ctypes.uint32_t"
ctypes.string.toSource()
  ===> "ctypes.string"

const charPtr = new ctypes.PointerType(ctypes.char);
charPtr.toSource()
  ===> "ctypes.char.ptr"

const Point = new ctypes.StructType(
    "Point", [{x: ctypes.int32_t}, {y: ctypes.int32_t}]);
Point.toSource()
  ===> "ctypes.StructType("Point", [{x: ctypes.int32_t}, {y: ctypes.int32_t}])"

Pointer types also have:

t.targetType - Read-only. The pointed-to type, or null if t is an opaque pointer type.

Function types also have:

t.abi - Read-only. The ABI of the function; one of the ctypes ABI objects.
t.returnType - Read-only. The return type.
t.argTypes - Read-only. A sealed array of argument types.

Struct types also have:

t.fields - Read-only. A sealed array of field descriptors. (TODO: Details.)

Array types also have:

t.elementType - The type of the elements of an array of this type. E.g. IOBuf.elementType === ctypes.uint8_t.
t.length - The number of elements, a non-negative integer; or undefined if this is an array type with unspecified length. (The result, if not undefined, is a primitive number, not a UInt64 object. Rationale: Having .length produce anything other than a number is foreign to JS, and arrays of more than 2^53 elements are currently unheard-of.)

Minutiae:

ctypes.CType is the abstract-base-class constructor of all js-ctypes types. If called, it throws a TypeError. (This is exposed in order to expose ctypes.CType.prototype.)
The [[Class]] of a ctypes type is "CType".
The [[Class]] of the type constructors ctypes.{C,Array,Struct,Pointer}Type is "Function".
Every CType has a read-only, permanent .prototype property. The type-constructors ctypes.{C,Pointer,Struct,Array}Type each have a read-only, permanent .prototype property as well.
Types have a hierarchy of prototype objects. The prototype of ctypes.CType.prototype is Function.prototype. The prototype of ctypes.{Array,Struct,Pointer,Function}Type.prototype and of all the builtin types except ctypes.voidptr_t is ctypes.CType.prototype. The prototype of an array type is ctypes.ArrayType.prototype. The prototype of a struct type is ctypes.StructType.prototype. The prototype of a pointer type is ctypes.PointerType.prototype. The prototype of a function type is ctypes.FunctionType.prototype.
Every CType t has t.prototype.constructor === t; that is, its .prototype has a read-only, permanent, own .constructor property that refers to the type. The same is true of the five type constructors ctypes.{C,Array,Struct,Pointer,Function}Type.

Calling types

CTypes are JavaScript constructors. That is, they are functions, and they can be called to create new objects. (The objects they create are called CData objects, and they are described in the next section.)

new t or new t() or t() - Create a new CData object of type t.
Without arguments, these allocate a new buffer of t.size bytes, populate it with zeroes, and return a new CData object referring to the complete object in that buffer.
If t.size is undefined, this throws a TypeError.
new t(val) or t(val) - Create a new CData object as follows:
  • If t.size is not undefined: Convert val to type t by calling ExplicitConvert(val, t), throwing a TypeError if the conversion is impossible. Allocate a new buffer of t.size bytes, populated with the converted value. Return a new CData object of type t referring to the complete object in that buffer. (When val is a CData object of type t, the behavior is like malloc followed by memcpy.)
  • If t is an array type of unspecified length:
      • If val is a size value (defined above): Let u = ArrayType(t.elementType, val) and return new u.
      • If t.elementType is char16_t and val is a string: Return a new CData object of type ArrayType(ctypes.char16_t, val.length + 1) containing the contents of val followed by a null character.
      • If t.elementType is an 8-bit character type and val is a string: If val is not a well-formed UTF-16 string, throw a TypeError. Otherwise, let s = a sequence of bytes, the result of converting val from UTF-16 to UTF-8, and let n = the number of bytes in s. Return a new CData object of type ArrayType(t.elementType, n + 1) containing the bytes in s followed by a null character.
      • If val is a JavaScript array object and val.length is a nonnegative integer, let u = ArrayType(t.elementType, val.length) and return new u(val). (Array CData objects created in this way have cdata.constructor === u, not t. Rationale: For all CData objects, cdata.constructor.size gives the size in bytes, unless a struct field shadows cdata.constructor.)
      • Otherwise, throw a TypeError.
  • Otherwise, t is void_t. Throw a TypeError.
let a_t = ctypes.ArrayType(ctypes.int32_t);
let a = new a_t(5);
a.length
  ===> 5
a.constructor.size
  ===> 20
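
The string cases above determine the length of the new array from the string itself. The following plain-JavaScript sketch models just that length calculation. The helpers isWellFormedUTF16 and charArrayLength are hypothetical, and TextEncoder stands in for whatever UTF-16 to UTF-8 conversion the implementation actually uses.

```javascript
// Hypothetical helper: true if str contains no unpaired surrogates.
function isWellFormedUTF16(str) {
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = str.charCodeAt(i + 1);
      if (!(next >= 0xdc00 && next <= 0xdfff)) return false;
      i++; // skip the low surrogate we just checked
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      return false; // lone low surrogate
    }
  }
  return true;
}

// Hypothetical model: number of elements in the array created by t(val)
// when t is an unspecified-length character array and val is a string.
function charArrayLength(val, elementBits) {
  if (elementBits === 16) return val.length + 1;   // code units + null
  if (!isWellFormedUTF16(val)) throw new TypeError("malformed UTF-16");
  return new TextEncoder().encode(val).length + 1; // UTF-8 bytes + null
}

console.log(charArrayLength("héllo", 16)); // 6
console.log(charArrayLength("héllo", 8));  // 7, since é takes two UTF-8 bytes
```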

CData objects

A CData object represents a C/C++ value located in memory. The address of the C/C++ value can be taken (using the .address() method), and it can be assigned to (using the .value property).

Every CData object has a type, the CType object that describes the type of the C/C++ value.

Minutiae:

The [[Class]] of a CData object is "CData".
The prototype of a CData object is the same as its type's .prototype property.

(Implementation notes: A CData object has a reserved slot that points to its type; a reserved slot that contains null if the object owns its own buffer, and otherwise points to the base CData object that owns the backing buffer where the data is stored; and a data pointer. The data pointer points to the actual location within the buffer of the C/C++ object to which the CData object refers. Since the data pointer might not be aligned to 2 bytes, PRIVATE_TO_JSVAL is insufficient; a custom JSClass.trace hook will be needed. If the object owns its own buffer, its finalizer frees it. Other CData objects that point into the buffer keep the base CData, and therefore the underlying buffer, alive.)

Properties and methods of CData objects

All CData objects have these methods and properties:

cdata.address() - Return a new CData object of the pointer type ctypes.PointerType(cdata.constructor) whose value points to the C/C++ object referred to by cdata.
(Open issue: Does this pointer keep cdata alive? Currently not but we could easily change it. It is impossible to have all pointers keep their referents alive in a totally general way--consider pointers embedded in structs and arrays. But this special case would be pretty easy to hack: put a .contents property on the resulting pointer, referring back to cdata.)
cdata.constructor - Read-only. The type of cdata. (This is never void_t or an array type with unspecified length. Implementation note: The prototype of cdata is an object that has a read-only constructor property, as detailed under "minutiae".)
cdata.toSource() - Return the string "t(arg)" where t and arg are implementation-defined JavaScript expressions (intended to represent the type of cdata and its value, respectively). The intent is that eval(cdata.toSource()) should ideally produce a new CData object containing a copy of cdata, but this can only work if the type of cdata happens to be bound to an appropriate name in scope.
cdata.toString() - Return the same string as cdata.toSource().

The .value property has a getter and a setter:

cdata.value - Let x = ConvertToJS(cdata). If x === cdata, throw a TypeError. Otherwise return x.
cdata.value = val - Let cval = ImplicitConvert(val, cdata.constructor). If conversion fails, throw a TypeError. Otherwise assign the value cval to the C/C++ object referred to by cdata.

Structs

CData objects of struct types also have this method:

cstruct.addressOfField(name) - Return a new CData object of the appropriate pointer type, whose value points to the field of cstruct with the name name. If name is not a JavaScript string or does not name a member of cstruct, throw a TypeError.

They also have getters and setters for each struct member:

cstruct.member - Let F be a CData object referring to the struct member. Return ConvertToJS(F).
cstruct.member = val - Let cval = ImplicitConvert(val, the type of the member). If conversion fails, throw a TypeError. Otherwise store cval in the appropriate member of the struct.

These getters and setters can shadow the properties and methods described above.

Pointers

CData objects of pointer types also have this property:

cptr.contents - Let C be a CData object referring to the pointed-to contents of cptr. Return ConvertToJS(C).
cptr.contents = val - Let cval = ImplicitConvert(val, the base type of the pointer). If conversion fails, throw a TypeError. Otherwise store cval in the pointed-to contents of cptr.

Functions

CData objects of function types are callable:

let result = cfn(arg1, ...) - For each argument, let cargn = ImplicitConvert(argn, the type of the corresponding parameter). If any conversion fails, throw a TypeError. Call the C function cfn with the arguments (carg1, ...), and let cresult be a CData object representing its return value. Return ConvertToJS(cresult).

Arrays

Likewise, CData objects of array types have getters and setters for each element. Arrays additionally have a length property.

Note that these getters and setters are only present for integers i in the range 0 ≤ i < carray.length. (Open issue: can we arrange to throw an exception if i is out of range?)

carray[i] - Let E be a CData object referring to the element at index i. Return ConvertToJS(E).
carray[i] = val - Let cval = ImplicitConvert(val, carray.elementType). If conversion fails, throw a TypeError. Otherwise store cval in element i of the array.
carray.length - Read-only. The length of the array as a JavaScript number. (The same as carray.constructor.length. This is not a UInt64 object. Rationale: Array CData objects should behave like other array-like objects for easy duck typing.)
carray.addressOfElement(i) - Return a new CData object of the appropriate pointer type (ctypes.PointerType(carray.constructor.elementType)) whose value points to element i of carray. If i is not a JavaScript number that is a valid index of carray, throw a TypeError.

(TODO: specify a way to read a C/C++ string and transcode it into a JS string.)

Aliasing

Note that it is possible for several CData objects to refer to the same or overlapping memory. (In this way CData objects are like C++ references.) For example:

const Point = new ctypes.StructType(
    "Point", [{x: ctypes.int32_t}, {y: ctypes.int32_t}]);
const Rect = new ctypes.StructType(
    "Rect", [{topLeft: Point}, {bottomRight: Point}]);

var r = Rect();     // a new CData object of type Rect
var p = r.topLeft;  // refers to the topLeft member of r, not a copy
r.topLeft.x = 100;  // This would not work if `r.topLeft` was a copy!
r.topLeft.x
  ===> 100          // It works...
p.x                 // and p refers to the same C/C++ object...
  ===> 100          // so it sees the change as well.

r.toSource()
  ===> "Rect({topLeft: {x: 100, y: 0}, bottomRight: {x: 0, y: 0}})"

p.x = 1.0e90;       // Assigning a value out of range is an error.
  **** TypeError

// The range checking is great, but it can have surprising
// consequences sometimes:
p.x = 0x7fffffff;   // (the maximum int32_t value)
p.x++;              // p.x = 0x7fffffff + 1, which is out of range...
  **** TypeError    // ...so this fails, leaving p.x unchanged.
// But JS code doesn't need to do that very often.
// To make this roll around to -0x80000000, you could write:
p.x = (p.x + 1) | 0; // In JS, `x|0` truncates a number to int32.
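
For readers without a js-ctypes build at hand, the same two ideas (aliasing and range-checked stores) can be imitated with typed arrays. This is only an analogy, not how CData objects are implemented, and storeInt32 is a hypothetical helper. Note that plain typed arrays wrap silently on overflow, so the range check has to be written by hand.

```javascript
// Two views over one buffer stand in for two CData objects that alias
// the same C/C++ memory.
const buffer = new ArrayBuffer(16);            // room for a "Rect": four int32 values
const rect = new Int32Array(buffer);           // the whole struct
const topLeft = new Int32Array(buffer, 0, 2);  // just the first "Point"

rect[0] = 100;
console.log(topLeft[0]); // 100: the alias sees the write

// Hypothetical range-checked store, modelling the TypeError in the text.
function storeInt32(view, index, val) {
  if (!Number.isInteger(val) || val < -0x80000000 || val > 0x7fffffff) {
    throw new TypeError("value out of range for int32_t");
  }
  view[index] = val;
}

storeInt32(topLeft, 0, 0x7fffffff);
storeInt32(topLeft, 0, (topLeft[0] + 1) | 0); // explicit wrap-around
console.log(rect[0]); // -2147483648
```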

Casting

ctypes.cast(cdata, t) - Return a new CData object which points to the same memory block as cdata, but with type t. If t.size is undefined or larger than cdata.constructor.size, throw a TypeError. This is like a C cast or a C++ reinterpret_cast.
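
The effect of such a cast can be imitated in plain JavaScript with a second typed-array view over the same bytes. This is an analogy only; castView is a hypothetical helper, and its size check mirrors the rule above that the target type may not be larger than the source object.

```javascript
// Analogy for ctypes.cast: reinterpret the bytes behind `view` as one
// element of TargetArray, without copying.
function castView(view, TargetArray) {
  if (TargetArray.BYTES_PER_ELEMENT > view.byteLength) {
    throw new TypeError("target type is larger than the source object");
  }
  return new TargetArray(view.buffer, view.byteOffset, 1);
}

const f = new Float32Array([1.0]);
const bits = castView(f, Uint32Array);
console.log(bits[0].toString(16)); // 3f800000, the IEEE 754 encoding of 1.0f
bits[0] = 0x40000000;              // writing through the cast view...
console.log(f[0]);                 // ...changes the original: 2
```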

Equality

According to the ECMAScript standard, if x and y are two different objects, then x === y and x == y are both false. This has consequences for code that uses js-ctypes pointers, pointer-sized integers, or 64-bit integers, because all these values are represented as JavaScript objects. In C/C++, the == operator would compare values of these types for equality. Not so in js-ctypes:

const HANDLE = new ctypes.PointerType("HANDLE");
const INVALID_HANDLE_VALUE = HANDLE(-1);
const kernel32 = ctypes.open("kernel32");
const CreateMutex = kernel32.declare("CreateMutex", ...);

var h = CreateMutex(null, false, null);
if (h == INVALID_HANDLE_VALUE)   // BAD - always false
    ...

This comparison is always false because CreateMutex returns a new CData object, which of course will be a different object from the existing value of INVALID_HANDLE_VALUE.

(Python ctypes has the same issue. It isn't mentioned in the docs, but:

>>> from ctypes import *
>>> c_void_p(0) == c_void_p(0)
False
>>> c_int(33) == c_int(33)
False

We could overload operator== using the nonstandard hook JSExtendedClass.equality but it might not be worth it.)
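
The underlying JavaScript behavior is easy to reproduce without js-ctypes. In this sketch FakePointer is a hypothetical stand-in for a pointer-valued CData object (it is not the js-ctypes representation), and BigInt is used to hold the address.

```javascript
// Hypothetical stand-in for a pointer-valued CData object.
class FakePointer {
  constructor(address) {
    this.address = BigInt(address);
  }
}

const INVALID = new FakePointer(-1);
const h = new FakePointer(-1); // e.g. a fresh object returned by a call

console.log(h == INVALID);                  // false: two different objects
console.log(h.address === INVALID.address); // true: same underlying value
```

The point is that code has to compare some value extracted from the two objects, never the objects themselves.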

64-bit integer objects

Since JavaScript numbers are floating-point values, they cannot precisely represent all 64-bit integer values. Therefore 64-bit and pointer-sized C/C++ values of numeric types do not autoconvert to JavaScript numbers. Instead they autoconvert to JavaScript objects of type ctypes.Int64 and ctypes.UInt64.
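
The precision problem is easy to demonstrate in plain JavaScript. This sketch uses BigInt (which postdates this proposal) only to hold an exact 64-bit value for comparison.

```javascript
const big = 2n ** 63n - 1n;            // largest int64_t value, held exactly
const asNumber = Number(big);          // nearest double-precision value
console.log(BigInt(asNumber) === big); // false: the value was rounded
console.log(2 ** 53 + 1 === 2 ** 53);  // true: 2^53 + 1 is not representable
```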

Int64 and UInt64 objects are immutable.

It's not possible to do arithmetic on Int64 objects using the standard arithmetic operators. JavaScript does not have operator overloading (yet). A few convenience functions are provided. (These types are intentionally feature-sparse so that they can be drop-in-replaced with a full-featured bignum type when JavaScript gets one.)

Int64

ctypes.Int64(n) or new ctypes.Int64(n) - If n is an integer-valued number such that -2^63 ≤ n < 2^63, return a sealed Int64 object with that value. Otherwise if n is a string consisting of an optional minus sign followed by either decimal digits or "0x" or "0X" and hexadecimal digits, and the string represents a number within range, convert the string to an integer and construct an Int64 object as above. Otherwise if n is an Int64 or UInt64 object, and represents a number within range, use the value to construct an Int64 object as above. Otherwise throw a TypeError.

Int64 objects have the following methods:

i64.toString([radix]) - If radix is omitted, assume 10. Return a string representation of the value of i64 in base radix, consisting of a leading minus sign, if the value is negative, followed by one or more lowercase digits in base radix.
i64.toSource() - Return a string. (This is provided for debugging purposes, and programs should not rely on details of the resulting string, which may change in the future.)

The following functions are also provided:

ctypes.Int64.compare(a, b) - If a and b are both Int64 objects, return -1 if a < b, 0 if a = b, and 1 if a > b. Otherwise throw a TypeError.
ctypes.Int64.lo(a) - If a is an Int64 object, return the low 32 bits of its value. (The result is an integer in the range 0 ≤ result < 2^32.) Otherwise throw a TypeError.
ctypes.Int64.hi(a) - If a is an Int64 object, return the high 32 bits of its value (like a >> 32). Otherwise throw a TypeError.
ctypes.Int64.join(hi, lo) - If hi is an integer-valued number in the range -2^31 ≤ hi < 2^31 and lo is an integer-valued number in the range 0 ≤ lo < 2^32, return a sealed Int64 object whose value is hi × 2^32 + lo. Otherwise throw a TypeError.
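
The arithmetic behind lo, hi and join can be checked with BigInt values standing in for Int64 objects. This is a sketch of the math only, not the js-ctypes implementation; the function names mirror the ones above but are plain local functions.

```javascript
const TWO_32 = 2n ** 32n;

// BigInt-based models of Int64.lo, Int64.hi and Int64.join.
function lo(a) {
  return Number(BigInt.asUintN(32, a));       // 0 ≤ result < 2^32
}
function hi(a) {
  return Number(BigInt.asIntN(64, a) >> 32n); // arithmetic shift keeps the sign
}
function join(hiPart, loPart) {
  const okHi = Number.isInteger(hiPart) && hiPart >= -(2 ** 31) && hiPart < 2 ** 31;
  const okLo = Number.isInteger(loPart) && loPart >= 0 && loPart < 2 ** 32;
  if (!okHi || !okLo) throw new TypeError("hi or lo out of range");
  return BigInt(hiPart) * TWO_32 + BigInt(loPart);
}

const v = -1n;
console.log(hi(v), lo(v));             // -1 4294967295
console.log(join(hi(v), lo(v)) === v); // true: the halves round-trip
```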

UInt64

UInt64 objects are the same except that the hi values are in the range 0 ≤ hi < 2^32 and the .toString() method never produces a minus sign.

Conversions

These functions are not exactly JS functions or C/C++ functions. They're algorithms used elsewhere in the spec.

ConvertToJS(x) - This function is used to convert a CData object or a C/C++ return value to a JavaScript value. The intent is to return a simple JavaScript value whenever possible without loss of data or different behavior on different platforms, and a CData object otherwise. The precise rules are:

  • If the type of x is void, return undefined.
  • If the type of x is bool, return the corresponding JavaScript boolean.
  • If x is of a number type but not a wrapped integer type, return the corresponding JavaScript number.
  • If x is of a signed wrapped integer type (long, int64_t, ssize_t, or intptr_t), return a ctypes.Int64 object with value x.
  • If x is of an unsigned wrapped integer type (unsigned long, uint64_t, size_t, or uintptr_t), return a ctypes.UInt64 object with value x.
  • If x is of type char16_t, return a JavaScript string of length 1 containing the value of x (like String.fromCharCode(x)).
  • If x is of any other character type, return the JavaScript number equal to its integer value. (This is sensitive to the signedness of the character type. Also, we assume no character types are so wide that they don't fit into a JavaScript number.)
  • Otherwise x is of an array, struct, or pointer type. If the argument x is already a CData object, return it. Otherwise allocate a buffer containing a copy of the C/C++ value x, and return a CData object of the appropriate type referring to the object in the new buffer.
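
As a quick summary of the rules above, the dispatch can be written as a lookup on the type name. This is a sketch, not the js-ctypes implementation: convertToJSKind is a hypothetical helper that returns only the kind of JavaScript value produced, and the type-name strings are the ctypes property names used on this page.

```javascript
// Type names grouped as in the rules above.
const SIGNED_WRAPPED = new Set(["long", "int64_t", "ssize_t", "intptr_t"]);
const UNSIGNED_WRAPPED = new Set(["unsigned_long", "uint64_t", "size_t", "uintptr_t"]);
const PLAIN_NUMBERS = new Set([
  "int8_t", "uint8_t", "int16_t", "uint16_t", "int32_t", "uint32_t",
  "float32_t", "float64_t", "short", "unsigned_short", "int", "unsigned",
  "unsigned_int", "float", "double",
  // 8-bit character types convert to their integer value
  "char", "signed_char", "unsigned_char",
]);

// Hypothetical helper: which kind of JS value ConvertToJS yields.
function convertToJSKind(typeName) {
  if (typeName === "void_t") return "undefined";
  if (typeName === "bool") return "boolean";
  if (PLAIN_NUMBERS.has(typeName)) return "number";
  if (SIGNED_WRAPPED.has(typeName)) return "Int64";
  if (UNSIGNED_WRAPPED.has(typeName)) return "UInt64";
  if (typeName === "char16_t") return "string";
  return "CData"; // arrays, structs, pointers
}

console.log(convertToJSKind("int32_t")); // number
console.log(convertToJSKind("size_t"));  // UInt64
```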

Note that null C/C++ pointers do not convert to the JavaScript null value. (Open issue: Should we? Is there any value in retaining the type of a particular null pointer?)

(Arrays of characters do not convert to JavaScript strings. Rationale: Suppose x is a CData object of a struct type with a member a of type char[10]. Then x.a[1] should return the character in element 1 of the array, even if x.a[0] is a null character. Likewise, x.a[0] = '\0'; should modify the contents of the array. Both are possible only if x.a is a CData object of array type, not a JavaScript string.)

ImplicitConvert(val, t) - Convert the JavaScript value val to a C/C++ value of type t. This is called whenever a JavaScript value of any kind is passed to a parameter of a ctypes-declared function, passed to cdata.value = val, or assigned to an array element or struct member, as in carray[i] = val or cstruct.member = val.

This function is intended to lose precision only when there is no reasonable alternative. It generally does not coerce values of one type to another type.

C/C++ values of all supported types round trip through ConvertToJS and ImplicitConvert without any loss of data. That is, for any C/C++ value v of type t, ImplicitConvert(ConvertToJS(v), t) produces a copy of v. (Note that not all JavaScript values can round-trip to C/C++ and back in an analogous way. JavaScript primitive numbers can round-trip to double on all current platforms, Int64 objects to int64_t, JavaScript booleans to bool, and so on. But some JavaScript values, such as functions, cannot be ImplicitConverted to any C/C++ type without loss of data.)

t must not be void or an array type with unspecified length. (Rationale: C/C++ variables and parameters cannot have such types. The parameter of a function declared int f(int x[]) is int *, not int[].)

  • First, if val is a CData object of type u and SameType(t, u), return the current value of the C/C++ object referred to by val. Otherwise the behavior depends on the target type t.
  • If t is ctypes.bool:
  • If val is a boolean, return the corresponding C/C++ boolean value.
  • If val is the number +0 or -0, return false.
  • If val is the number 1, return true.
  • Otherwise fail.
  • If t is a numeric type:
  • If val is a boolean, the result is a 0 or 1 of type t.
  • If val is a CData object of a numeric type, and every value of that type is precisely representable in type t, the result is a precise representation of the value of val in type t. (This is more conservative than the implicit integer conversions in C/C++ and more conservative than what we do if val is a JavaScript number. This is sensitive to the signedness of the two types.)
  • If val is a number that can be exactly represented as a value of type t, the result is that value.
  • If val is an Int64 or UInt64 object whose value can be exactly represented as a value of type t, the result is that value.
  • If val is a number and t is a floating-point type, the result is the jsdouble represented by val, cast to type t. (This can implicitly lose bits of precision. The rationale is to allow the user to pass values like 1/3 to float parameters.)
  • Otherwise fail.
  • If t is ctypes.char16_t:
  • If val is a string of length 1, the result is the 16-bit unsigned value of the code unit in the string. val.charCodeAt(0).
  • If val is a number that can be exactly represented as a value of type char16_t (that is, an integer in the range 0 ≤ val < 216), the result is that value.
  • Otherwise fail.
  • If t is any other character type:
  • If val is a string:
  • If the 16-bit elements of val are not the UTF-16 encoding of a single Unicode character, fail. (Open issue: If we support wchar_t we may want to allow unpaired surrogate code points to pass through without error.)
  • If that Unicode character can be represented by a single character of type t, the result is that character. (Open issue: Unicode conversions.)
  • Otherwise fail.
  • If val is a number that can be exactly represented as a value of type t, the result is that value. (This is sensitive to the signedness of t.)
  • Otherwise fail.
  • If t is a pointer type:
  • If val is null, the result is a C/C++ NULL pointer of type t.
  • If val is a CData object of array type u and either t is ctypes.voidptr_t or SameType(t.targetType, u.elementType), return a pointer to the first element of the array.
  • If t is ctypes.voidptr_t and val is a CData object of pointer type, return the value of the C/C++ pointer in val, cast to void *.
  • Otherwise fail. (Rationale: We don't convert strings to pointers yet; see the "Auto-converting strings" section below. We don't convert JavaScript arrays to pointers because this would have to allocate a C array implicitly, raising issues about who should deallocate it, and when, and how they know it's their responsibility.)
  • If t is an array type:
  • If val is a JavaScript string:
  • If t.elementType is char16_t and t.length >= val.length, the result is an array of type t whose first val.length elements are the 16-bit elements of val. If t.length > val.length, then element val.length of the result is a null character. The values of the rest of the array elements are unspecified.
  • If t.elementType is an 8-bit character type:
  • If t is not well-formed UTF-16, fail.
  • Let s = a sequence of bytes, the result of converting val from UTF-16 to UTF-8.
  • Let n = the number of bytes in s.
  • If t.length < n, fail.
  • The result is an array of type t whose first n elements are the 8-bit values in s. If t.length > n, then element n of the result is 0. The values of the rest of the array elements are unspecified.
  • Otherwise fail.
  • If val is a JavaScript array object:
  • If val.length is not a nonnegative integer, fail.
  • If val.length !== t.length, fail.
  • Otherwise, the result is a C/C++ array of val.length elements of type t.elementType. Element i of the result is ImplicitConvert(val[i], t.elementType).
  • Otherwise fail. (Rationale: The clause "If val is a JavaScript array object" requires some justification. If we allowed arbitrary JavaScript objects that resemble arrays, that would include CData objects of array type. Consequently, arr1.value = arr2 where arr1 is of type ctypes.uint8_t.array(30) and arr2 is of type ctypes.int.array(30) would work as long as the values in arr2 are small enough. We considered this conversion too astonishing and too error-prone.)
  • Otherwise t is a struct type.
  • If val is a JavaScript object that is not a CData object:
  • If the enumerable own properties of val are exactly the names of the members of the struct t, the result is a C/C++ struct of type t, each of whose members is ImplicitConvert(val[the member name], the type of the member).
  • Otherwise fail.
  • Otherwise fail.
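
The string-to-8-bit-array clause of ImplicitConvert can be sketched in plain JavaScript. This is only a model, not js-ctypes code: a Uint8Array stands in for the C array, "fail" is modelled as a thrown TypeError, and the names isWellFormedUTF16 and stringToCharArray are hypothetical.

```javascript
// Hypothetical model: a C array of 8-bit characters is represented by a
// Uint8Array, and "fail" is represented by throwing a TypeError.

// Returns true if str contains no unpaired surrogate code units.
function isWellFormedUTF16(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // A high surrogate must be followed by a low surrogate.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next < 0xDC00 || next > 0xDFFF) return false;
      i++; // skip the low surrogate that was just validated
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      return false; // low surrogate with no preceding high surrogate
    }
  }
  return true;
}

// Models ImplicitConvert(val, t) where t is an array of `length` 8-bit chars.
function stringToCharArray(val, length) {
  if (typeof val !== "string" || !isWellFormedUTF16(val))
    throw new TypeError("expected a well-formed UTF-16 string");
  const s = new TextEncoder().encode(val); // UTF-16 -> UTF-8 bytes
  const n = s.length;
  if (length < n)
    throw new TypeError("string does not fit in array");
  // A new Uint8Array is zero-filled, so element n is 0 when length > n.
  // The text leaves the remaining elements unspecified; zero is one choice.
  const result = new Uint8Array(length);
  result.set(s);
  return result;
}
```

The explicit well-formedness check matters in this model because TextEncoder silently replaces unpaired surrogates with U+FFFD instead of failing. Note also that a string of exactly t.length bytes is accepted and leaves no room for a terminating 0.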

ExplicitConvert(val, t) - Convert the JavaScript value val to a C/C++ value of type t, a little more forcefully than ImplicitConvert.

This is called when a JavaScript value is passed as a parameter when calling a type, as in t(val) or new t(val).

  • If ImplicitConvert(val, t) succeeds, use that result. Otherwise:
  • If t is ctypes.bool, the result is the C/C++ boolean value corresponding to ToBoolean(val), where the operator ToBoolean is as defined in the ECMAScript standard. (This is a bit less strict than the conversion behavior specified for numeric types below. This is just for convenience: the operators && and ||, which produce a boolean value in C/C++, do not always do so in JavaScript.)
  • If t is an integer or character type and val is an infinity or NaN, the result is a 0 of type t.
  • If t is an integer or character type and val is a finite number, the result is the same as casting the jsdouble value of val to type t with a C-style cast. (I think this basically means, start with val, discard the fractional part, convert the integer part to a bit-pattern, and mask off whatever doesn't fit in type t. But whatever C does is good enough for me. --jorendorff)
  • If t is an integer or character type and val is an Int64 or UInt64 object, the result is the same as casting the int64_t or uint64_t value of val to type t with a C-style cast.
  • If t is a pointer type and val is a number, Int64 object, or UInt64 object that can be exactly represented as an intptr_t or uintptr_t, the result is the same as casting that intptr_t or uintptr_t value to type t with a C-style cast.
  • If t is an integer type (not a character type) and val is a string consisting entirely of an optional minus sign, followed by either one or more decimal digits or the characters "0x" or "0X" and one or more hexadecimal digits, then the result is the same as casting the integer named by val to type t with a C-style cast.
  • Otherwise fail.
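
The integer clauses of ExplicitConvert can be sketched as follows. This is a hedged model, not js-ctypes code: explicitConvertToInt is a hypothetical name, a BigInt stands in for Int64 and UInt64 objects, and the wrap-around follows the bracketed interpretation above (discard the fraction, then mask). A real C cast of an out-of-range double is not guaranteed to behave this way. The initial ImplicitConvert step is omitted because, for values that fit exactly, it produces the same result.

```javascript
// Hypothetical model of the integer clauses of ExplicitConvert.
// bits: width of the target type; signed: whether the target is signed.
// Returns a BigInt so that 64-bit results stay exact.
function explicitConvertToInt(val, bits, signed) {
  const wrap = signed ? BigInt.asIntN : BigInt.asUintN;
  if (typeof val === "number") {
    if (!Number.isFinite(val)) return 0n;       // infinity or NaN -> 0
    return wrap(bits, BigInt(Math.trunc(val))); // drop fraction, then mask
  }
  if (typeof val === "bigint") {
    return wrap(bits, val); // stand-in for an Int64 or UInt64 object
  }
  if (typeof val === "string") {
    // Optional minus sign, then decimal digits or a 0x/0X hex literal.
    const m = /^(-?)([0-9]+|0[xX][0-9a-fA-F]+)$/.exec(val);
    if (!m) throw new TypeError("not an integer literal: " + val);
    const magnitude = BigInt(m[2]); // BigInt() parses decimal and 0x hex
    return wrap(bits, m[1] ? -magnitude : magnitude);
  }
  throw new TypeError("cannot convert value");
}
```

Under this model, explicitConvertToInt(300, 8, false) wraps to 44n and explicitConvertToInt("0xff", 8, true) yields -1n.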

SameType(t, u) - True if t and u represent the same C/C++ type.

  • If t and u represent the same built-in type, even void, return true.
  • If they are both pointer types, return SameType(t.targetType, u.targetType).
  • If they are both array types, return SameType(t.elementType, u.elementType) && t.length === u.length.
  • If they are both struct types, return t === u.
  • Otherwise return false.

(SameType(int, int32_t) is false. Rationale: As it stands, SameType behaves the same on all platforms. By making types match if they are typedef'd on the current platform, we could make e.g. ctypes.int.ptr and ctypes.int32_t.ptr compatible on platforms where we just have typedef int int32_t. But it was unclear how much that would matter in practice, balanced against cross-platform consistency. We might reverse this decision.)
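
The SameType rules can be sketched over a toy representation of types. The descriptor objects and helper names below (builtin, pointer, array, struct, sameType) are hypothetical and exist only for this illustration; real CType objects are not plain objects.

```javascript
// Hypothetical type descriptors, used only to illustrate the rules.
const builtin = (name) => ({ kind: "builtin", name });
const pointer = (targetType) => ({ kind: "pointer", targetType });
const array = (elementType, length) => ({ kind: "array", elementType, length });
const struct = (name) => ({ kind: "struct", name });

function sameType(t, u) {
  if (t.kind !== u.kind) return false;
  switch (t.kind) {
    case "builtin":
      return t.name === u.name; // int and int32_t are distinct names
    case "pointer":
      return sameType(t.targetType, u.targetType);
    case "array":
      return sameType(t.elementType, u.elementType) && t.length === u.length;
    case "struct":
      return t === u; // identity, not structural equality
    default:
      return false;
  }
}
```

In this model two struct descriptors with the same name are still different types, while two separately constructed pointers to the same struct object are the same type.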

Examples

Cu.import("ctypes"); // imports the global ctypes object

// searches the path and opens "libmylib.so" on linux,
// "libmylib.dylib" on mac, and "mylib.dll" on windows
let mylib = ctypes.open("mylib", ctypes.SEARCH);

// declares the C function:
//     int32_t myfunc(int32_t);
let myfunc = mylib.declare("myfunc", ctypes.default_abi,
    ctypes.int32_t, ctypes.int32_t);

let ret = myfunc(2); // calls myfunc

Note that for simple types (integers and characters), we will autoconvert the argument at call time - there's no need to pass in a ctypes.int32_t object. The consumer should never need to instantiate such an object explicitly, unless they're using it to back a pointer - in which case we require explicit, strong typing. See later for examples.

Here is how to create an object of type int32_t:

let i = new ctypes.int32_t; // new int32_t object with default value 0

This allocates a new C++ object of type int32_t (4 bytes of memory), zeroes it out, and returns a JS object that manages the allocated memory. Whenever the JS object is garbage-collected, the allocated memory will be automatically freed.

Of course you don't normally need to do this, as js-ctypes will autoconvert JS numbers to various C/C++ types for you:

let myfunc = mylib.declare("myfunc", ctypes.default_abi,
    ctypes.int32_t, ctypes.int32_t);
let ret = myfunc(i);
print(typeof ret); // The result is a JavaScript number.
number

ctypes.int32_t is a CType. Like all other CTypes, it can be used for type specification when passed as an object, as above. (This will work for user-defined CTypes such as structs and pointers also - see later.)

The object created by new ctypes.int32_t is called a CData object. CData objects are described in detail in the "CData objects" section above.

Opaque pointers:

// A new opaque pointer type.
const FILE_ptr = new ctypes.StructType("FILE").ptr;

let fopen = mylib.declare("fopen", ctypes.default_abi,
    FILE_ptr, ctypes.char.ptr, ctypes.char.ptr);
let file = fopen("foo", "r");
if (file.isNull())
    throw "fopen failed";
file.contents(); // TypeError: type is unknown

(Open issue: fopen("foo", "r") does not work under js-ctypes as currently specified.)

Declaring a struct:

// C prototype: struct s_t { int32_t a; int64_t b; };
const s_t = new ctypes.StructType("s_t", [{ a: ctypes.int32_t }, { b: ctypes.int64_t }]);
let myfunc = mylib.declare("myfunc", ctypes.default_abi, ctypes.int32_t, s_t);

let s = new s_t(10, 20);

This creates an s_t object which allocates enough memory for the whole struct, creates getters and setters to access the binary fields via their offset, and assigns the values 10 and 20 to the fields. The new object's prototype is s_t.prototype.

let i = myfunc(s); // checks the type of s

Nested structs:

const u_t = new ctypes.StructType("u_t", [{ x: ctypes.int64_t }, { y: s_t }]);
let u = new u_t(5e4, s); // copies data from s into u.y - no references

let u_field = u.y; // creates an s_t object that points directly to
                   // the offset of u.y within u.

An out parameter:

// allocate sizeof(uint32_t)==4 bytes,
// initialize to 5, and return a new CData object
let i = new ctypes.uint32_t(5);

// Declare a C function with an out parameter.
const getint = mylib.declare("getint", ctypes.default_abi,
    ctypes.void_t, ctypes.uint32_t.ptr);

getint(i.address()); // explicitly take the address of allocated buffer

(Python ctypes has byref(i) as an alternative to i.address(), but we do not expect users to do the equivalent of from ctypes import *, and getint(ctypes.byref(i)) is a bit much.)

Pointers:

// Declare a C function that returns a pointer.
const getintp = mylib.declare("getintp", ctypes.default_abi,
    ctypes.uint32_t.ptr);
let p = getintp(); // A CData object that holds the returned uint32_t *

// cast from (uint32_t *) to (uint8_t *)
let q = ctypes.cast(p, ctypes.uint8_t.ptr);

// first byte of buffer
let b0 = q.contents(); // an integer, 0 <= b0 < 256

Struct fields:

const u_t = new ctypes.StructType('u_t',
    [[ctypes.uint32_t, 'x'], [ctypes.uint32_t, 'y']]);
// allocates 2 * sizeof(uint32_t) == 8 bytes and creates a CData object
let u = new u_t(5, 10);
u.x = 7; // setter for u.x modifies field
let i = u.y; // getter for u.y returns ConvertToJS(reference to u.y)
print(i);    // ...which is the primitive number 10
10

i = 5; // doesn't touch u.y
print(u.y);
10

const v_t = new ctypes.StructType('v_t',
    [[u_t, 'u'], [ctypes.uint32_t, 'z']]);
// allocates 12 bytes, zeroes them out, and creates a CData object
let v = new v_t;
let w = v.u; // ConvertToJS(reference to v.u) returns CData object
w.x = 3; // invokes setter
getint(v.u.x); // TypeError: getint argument 1 expects type uint32_t *, got int
let p = v.u.addressOfField('x'); // pointer to v.u.x
getint(p); // ok - manually pass address

64-bit integers:

// Declare a function that returns a 64-bit unsigned int.
const getfilesize = mylib.declare("getfilesize", ctypes.default_abi,
    ctypes.uint64_t, ctypes.char.ptr);

// This autoconverts to a UInt64 object, not a JS number, even though the
// file is presumably much smaller than 4GiB. Converting to a different type
// each time you call the function, depending on the result value, would be
// worse.
let s = getfilesize("/usr/share/dict/words");
print(s instanceof ctypes.UInt64);
true
print(s < 1000000);    // Because s is an object, not a number,
false            // JS lies to you.
print(s >= 1000000);   // Neither of these is doing what you want,
false            // as evidenced by the bizarre answers.
print(s);              // It has a nice .toString() method at least!
931467

// There is no shortcut. To get an actual JS number out of a
// 64-bit integer, you have to use the ctypes.{Int64,UInt64}.{hi,lo}
// functions.
print(ctypes.UInt64.lo(s))
931467
// (OK, I lied. There is a shortcut. You can abuse the .toString() method.
// WARNING: This can lose precision!)
print(Number(s.toString()))
931467

let i = new ctypes.int64_t(5);  // a new 8-byte buffer
let j = i;  // another variable referring to the same CData object
j.value = 6; // invokes setter on i, auto-promotes 6 to Int64
print(typeof j.value)  // but j.value is still an Int64 object
object
print(j.value instanceof ctypes.Int64)
true
print(j.value);
6

const m_t = new ctypes.StructType(
    'm_t', [[ctypes.int64_t, 'x'], [ctypes.int64_t, 'y']]);
let m = new m_t;
const getint64 = mylib.declare("getint64", ctypes.default_abi,
    ctypes.void_t, ctypes.int64_t.ptr);
getint64(m.x); // TypeError: getint64 argument 1 expected type int64_t *,
               // got Int64 object
               // (because m.x's getter autoconverts to an Int64 object)
getint64(ctypes.addressOfField(m, 'x')); // works

(Open issue: As above, the implicit conversion from JS string to char * in getfilesize("/usr/share/dict/words") does not work in js-ctypes as specified.)
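
The precision warning in the 64-bit example can be made concrete with a plain-JavaScript model. A BigInt stands in for the value held by a UInt64 object, and hi, lo and toExactNumber are hypothetical stand-ins written for this sketch. They are assumed to behave like ctypes.UInt64.hi and ctypes.UInt64.lo, which may differ in detail.

```javascript
// A BigInt stands in for the 64-bit value held by a UInt64 object.
// hi() and lo() are hypothetical stand-ins for ctypes.UInt64.hi/lo.
const hi = (v) => Number(BigInt.asUintN(64, v) >> 32n);
const lo = (v) => Number(BigInt.asUintN(64, v) & 0xFFFFFFFFn);

// Combine the two halves into a JS number, refusing any value outside the
// range in which every integer is exactly representable as a double.
function toExactNumber(v) {
  const n = hi(v) * 2 ** 32 + lo(v);
  if (!Number.isSafeInteger(n))
    throw new RangeError("value does not fit in a JS number exactly");
  return n;
}

const small = 931467n;
const big = 2n ** 53n + 1n; // 9007199254740993

console.log(toExactNumber(small));   // 931467
console.log(Number(big.toString())); // 9007199254740992: silently off by one
```

Going through the two 32-bit halves lets the caller detect the lossy case, whereas the toString() shortcut rounds without any indication.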

(TODO - make this a real example:)

let i1 = ctypes.int32_t(5);
let i2 = ctypes.int32_t();
i2.value = i1; // i1 and i2 have separate binary storage, so this copies the bytes (like memcpy)
// You can copy the contents of one struct to another in the same way.

Future directions

Callbacks

The libffi part of this is presumably not too bad. Issues:

Lifetimes. C/C++ makes it impossible to track an object pointer. Both JavaScript's GC and experience with C/C++ function pointers will tend to discourage users from caring about function lifetimes.

I think the best solution to this problem is to put the burden of keeping the function alive entirely on the client.

Finding the right context to use. If we burn the cx right into the libffi closure, it will crash when called from a different thread or after the cx is destroyed. If we take a context at random from some internal JSAPI structure, it might be thread-safe, but the context's options and global will be random, which sounds dangerous. Perhaps ctypes itself can create a context per thread, on demand, for the use of function pointers. In a typical application, that would only create one context, if any.

Converting strings

I think we want an explicit API for converting strings, very roughly:

CData objects of certain pointer and array types have methods for reading and writing Unicode strings. These methods are present if the target or element type is an 8-bit character or integer type.

cdata.readString([encoding[, length]]) - Read bytes from cdata and convert them to Unicode characters using the specified encoding, returning a string. Specifically:

  • If cdata is an array, let p = a pointer to the first element. Otherwise cdata is a pointer; let p = the value of cdata.
  • If encoding is undefined or omitted, the selected encoding is UTF-8. Otherwise, if encoding is a string naming a known character encoding, that encoding is selected. Otherwise throw a TypeError.
  • If length is a size value, cdata is an array, and length > cdata.length, then throw a TypeError.
  • Otherwise, if length is a size value, take exactly length bytes starting at p and convert them to Unicode characters according to the selected encoding. (Open issue: Error handling.) Return a JavaScript string containing the Unicode characters, represented in UTF-16. (The result may contain null characters.)
  • Otherwise, if length is undefined or omitted, convert bytes starting at p to Unicode characters according to the selected encoding. Stop when the end of the array is reached (if cdata is an array) or when a null character (U+0000) is found. (Open issue: Error handling.) Return a JavaScript string containing the Unicode characters, represented in UTF-16. (If cdata is a pointer and there is no trailing null character, this can crash.)
  • Otherwise throw a TypeError.
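
The array case of readString can be sketched with the standard TextDecoder API. This is a hedged model: a Uint8Array stands in for a CData array of 8-bit elements, the free function readString is a hypothetical stand-in for the proposed method, and the pointer case is not modelled because plain JavaScript cannot read past the end of a buffer. Error handling is an open issue in the proposal; this sketch simply inherits TextDecoder's default of substituting U+FFFD for malformed input.

```javascript
// Hypothetical model: bytes is a Uint8Array standing in for a C array of
// 8-bit elements.
function readString(bytes, encoding, length) {
  let decoder;
  try {
    decoder = new TextDecoder(encoding === undefined ? "utf-8" : encoding);
  } catch (e) {
    // TextDecoder reports unknown labels differently; the text asks for
    // a TypeError.
    throw new TypeError("unknown encoding: " + encoding);
  }
  let end;
  if (length === undefined) {
    // Stop at the first zero byte, or at the end of the array.
    const nul = bytes.indexOf(0);
    end = nul === -1 ? bytes.length : nul;
  } else if (Number.isInteger(length) && length >= 0) {
    if (length > bytes.length)
      throw new TypeError("length exceeds array size");
    end = length; // exactly length bytes; the result may contain U+0000
  } else {
    throw new TypeError("length must be a size value");
  }
  return decoder.decode(bytes.subarray(0, end));
}
```

Stopping at the first zero byte only makes sense for encodings in which a zero byte always means end of text, which is the limitation noted for UTF-16 in the open issues.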

cdata.writeString(s, [encoding[, length]]) - Determine the starting pointer p as above. If s is not a well-formed UTF-16 string, throw a TypeError. (Open issue: Error handling.) Otherwise convert s to bytes in the specified encoding (default: UTF-8). If length is undefined or omitted, write all the converted bytes to memory starting at p; otherwise write at most length - 1 of them. Then write a converted null character after the data. Return the number of bytes of data written, not counting the terminating null character.
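
A minimal sketch of writeString, under stated assumptions: a Uint8Array stands in for the memory at p, only UTF-8 is modelled (the standard TextEncoder supports nothing else), and the free function writeString and the constant LONE_SURROGATE are hypothetical names. Where real C code would overrun a buffer that is too small, the model throws a RangeError instead.

```javascript
// Hypothetical model: target is a Uint8Array standing in for the memory at p.
// Matches a surrogate code unit that is not part of a valid pair.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;

function writeString(target, s, length) {
  if (typeof s !== "string" || LONE_SURROGATE.test(s))
    throw new TypeError("s is not a well-formed UTF-16 string");
  if (length !== undefined && !(Number.isInteger(length) && length >= 1))
    throw new TypeError("length must leave room for the terminator");
  const bytes = new TextEncoder().encode(s); // UTF-16 -> UTF-8
  // At most length - 1 data bytes. Truncating here can cut a multi-byte
  // UTF-8 sequence in half.
  const count = length === undefined
    ? bytes.length
    : Math.min(bytes.length, length - 1);
  // Real C code would overrun the buffer here; the model throws instead.
  if (count + 1 > target.length)
    throw new RangeError("target is too small");
  target.set(bytes.subarray(0, count));
  target[count] = 0; // terminating null character
  return count;      // data bytes written, excluding the terminator
}
```

The return value alone does not tell the caller whether the string was truncated: writing "hello" with length 3 returns 2, the same as writing "he" in full.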

(Open issue: cdata.writeString(...) is awkward for the case where you want an autosized ctypes.char.array() to hold the converted data. If cdata happens to be too small for the resulting string, and you don't supply length, you crash; and if you do supply length, you don't know whether conversion was halted because the target array was of insufficient length.)

(Open issue: As proposed, these are not suitable for working with encodings where a zero byte might not indicate the end of text. For example, a string encoded in UTF-16 will typically contain a lot of zero bytes. Unfortunately, in the case of readString, the underlying library demands the length up front.)

(Open issue: These methods offer no error handling options, which is pretty weak. Real-world code often wants to allow a few characters to be garbled rather than fail. For now we will likely be limited to whatever the underlying codec library, nsIScriptableUnicodeConverter, can do.)

(Open issue: 16-bit versions too, for UTF-16?)

isNull

If we do not convert NULL pointers to JS null (and I may have changed my mind about this) then we need:

cptr.isNull() - Return true if cptr's value is a null pointer, false otherwise.

Auto-converting strings

There are several issues:

Lifetimes. This problem arises when autoconverting from JS to C/C++ only.

When passing a string to a foreign function, like foo(s), what is the lifetime of the autoconverted pointer? We're comfortable with guaranteeing that the converted data stays valid for the duration of the call. But then there are situations like

const TenStrings = ctypes.char.ptr.array(10);
var arr = new TenStrings();
arr[0] = s;  // What is the lifetime of the data arr[0] points to?

The more implicit conversion we allow, the greater a problem this is; it's a tough trade-off.

Non-null-terminated strings. This problem arises when autoconverting from C/C++ to JS only. It applies to C/C++ character arrays as well as pointers (but it's worse when dealing with pointers).

In C/C++, the type char * effectively promises nothing about the pointed-to data. Autoconverting would make it hard to use APIs that return non-null-terminated strings (or structs containing char * pointers that aren't logically strings). The workaround would be to declare them as a different type.

Unicode. This problem does not apply to conversions between JS strings and char16_t arrays or pointers; only char arrays or pointers.

Converting both ways raises issues about what encoding should be assumed. We assume JS strings are UTF-16 and char strings are UTF-8, which is not the right thing on Windows. However Windows offers a lot of APIs that accept 16-bit strings and, for those, char16_t is the right thing.

Casting away const. This problem arises only when converting from a JS string to a C/C++ pointer type. The string data must not be modified, but the C/C++ types char * and char16_t * suggest that the referent might be modified.