ServerJS/Binary/B: Difference between revisions

Updated according to feedback from the list.
Line 1: Line 1:
 
All platforms must support two types for interacting with binary data: ByteArray and ByteString.  The ByteArray type resembles the interface of Array in that it is mutable, extensible, and indexing will return number values for the byte in the given position, zero by default, or undefined if the index is out of bounds.  The ByteString type resembles the interface of String in that it is immutable and indexing returns a ByteString of length 1.  These types are exported by the 'binary' top-level module and both types are subtypes of Binary, which is not instantiable but exists only for the convenience of referring to both ByteArray and ByteString.  (The idea of using these particular two types and their respective names originated with Jason Orendorff in the [http://groups.google.com/group/serverjs/msg/89808c05d46b92d0 Binary API Brouhaha] discussion.)
All platforms support two types for interacting with binary data: ByteArray and ByteString.  The ByteArray type resembles the interface of Array in that it is mutable, extensible, and indexing will return number values for the byte in the given position, or undefined.  The ByteString type resembles the interface of String in that it is immutable and indexing returns a ByteString of length 1.  These types are exported by the 'binary' top-level module.  (The idea of using these particular two types and their respective names originated with Jason Orendorff in the [http://groups.google.com/group/serverjs/msg/89808c05d46b92d0 Binary API Brouhaha] discussion.)


== Philosophy ==
== Philosophy ==
Line 10: Line 9:
This proposal also reflects both group sentiment and a pragmatic point about properties.  This isn't a decree that properties like "length" should be consistently used throughout the ServerJS APIs.  However, given that all platforms support properties at the native level (to host String and Array objects) and that byte strings and arrays will require support at the native level, pursuing client-side interoperability is beyond the scope of this proposal and therefore properties have been specified.  (See comments by Kris Zyp about the implementability of properties in all platforms, comments by Davey Waterson from Aptana about the counter-productivity of attempting to support this API in browsers, and support for properties over accessor and mutator functions from Ionut Gabriel Stand and Cameron McCormack on the [http://groups.google.com/group/serverjs/browse_thread/thread/be72ef3d8146731d/06c27162b698eef5?lnk=gst mailing list].)
This proposal also reflects both group sentiment and a pragmatic point about properties.  This isn't a decree that properties like "length" should be consistently used throughout the ServerJS APIs.  However, given that all platforms support properties at the native level (to host String and Array objects) and that byte strings and arrays will require support at the native level, pursuing client-side interoperability is beyond the scope of this proposal and therefore properties have been specified.  (See comments by Kris Zyp about the implementability of properties in all platforms, comments by Davey Waterson from Aptana about the counter-productivity of attempting to support this API in browsers, and support for properties over accessor and mutator functions from Ionut Gabriel Stand and Cameron McCormack on the [http://groups.google.com/group/serverjs/browse_thread/thread/be72ef3d8146731d/06c27162b698eef5?lnk=gst mailing list].)


The byte types provide functions for encoding, decoding, and transcoding, but these are all shallow interfaces that defer to a codec manager module.  That module may in turn use a system-level codec, or a pair of pure JavaScript modules that transcode through an array or stream of canonical Unicode code points.


== ByteString ==
== ByteString ==
Line 21: Line 21:
* ByteString(byteArray)
* ByteString(byteArray)
* ByteString(array)
* ByteString(array)
* ByteString(string, codecModuleId)
* ByteString(string, codec)
 
The ByteString object has the following methods:
 
* encode(string, codecModuleId)


ByteString instances support the following:
ByteString instances support the following:


* immutable length property
* immutable length property
* toByteArray()
* toByteArray() -> byte for byte
* toArray()
* toByteArray(sourceCodec, targetCodec) -> transcoded
* toString(codecModuleId)
* toByteString() -> copy
* decode(codecModuleId)
* toByteString(sourceCodec, targetCodec) -> transcoded
* hash(digestModuleId)
* toArray() -> byte value array
* compress(compressionModuleId)
* toArray(codec) -> decoded code point array
* indexOf(Number or ByteString)
* toString() -> a representation like "[ByteString 10]"
* lastIndexOf(Number or ByteString)
* toString(codec) -> decoded
* charAt(offset) -> ByteString
* indexOf(byte:Number|ByteString|ByteArray)
* charCodeAt(offset) -> Number
* indexOf(byte:Number|ByteString|ByteArray, start:Number)
* indexOf(byte:Number|ByteString|ByteArray, start:Number, stop:Number)
* lastIndexOf(byte:Number|ByteString|ByteArray)
* lastIndexOf(byte:Number|ByteString|ByteArray, start:Number)
* lastIndexOf(byte:Number|ByteString|ByteArray, start:Number, stop:Number)
* byteAt(offset) -> Number (same as charCodeAt)
* byteAt(offset) -> Number (same as charCodeAt)
* split(Number or ByteString) -> Array of ByteStrings
* charCodeAt(offset:Number) -> Number
* substring(first, last) or substring(first) to the end
* charAt(offset:Number) -> byte:ByteString
* substr(first, length) or substr(length)
* split(delimiter:Number|ByteString) -> Array of ByteStrings
* split(delimiter:Number|ByteString, count:Number) -> Array of ByteStrings
* slice()
* slice(begin)
* slice(begin, end)
* substr(start)
* substr(start, length)
* substring(first)
* substring(first, last)
* The + operator returning new ByteStrings
* The + operator returning new ByteStrings
* The immutable [] operator returning ByteStrings
* The immutable [] operator returning ByteStrings
* toSource() which would return "ByteString([])" for a null byte string
* toSource() which would return "ByteString([])" for a null byte string
* valueOf() returns itself
* valueOf()


ByteString does not implement toUpperCase() or toLowerCase() since they are not meaningful without the context of a codec.


ByteString does not implement toUpperCase() or toLowerCase().
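
The search signatures listed above can be illustrated with a small sketch over plain arrays of byte values.  The function name indexOfBytes is hypothetical and not part of the proposal.  The sketch also assumes details the list does not state: that start is inclusive, that stop is exclusive, and that -1 signals "not found", as with String.prototype.indexOf.

```javascript
// Hypothetical stand-in for ByteString.indexOf, operating on plain arrays
// of byte values. Assumes start is inclusive, stop is exclusive, and -1
// signals "not found", as with String.prototype.indexOf.
function indexOfBytes(haystack, needle, start, stop) {
    // A single Number is treated as a one-byte sequence.
    var pattern = typeof needle === "number" ? [needle] : needle;
    var from = start === undefined ? 0 : Math.max(0, start);
    var to = stop === undefined ? haystack.length : Math.min(stop, haystack.length);
    for (var i = from; i + pattern.length <= to; i++) {
        var matched = true;
        for (var j = 0; j < pattern.length; j++) {
            if (haystack[i + j] !== pattern[j]) {
                matched = false;
                break;
            }
        }
        if (matched) {
            return i;
        }
    }
    return -1;
}

var bytes = [104, 101, 108, 108, 111];          // "hello" in ASCII
var first = indexOfBytes(bytes, 108);           // 2
var later = indexOfBytes(bytes, 108, 3);        // 3
var pair = indexOfBytes(bytes, [108, 111]);     // 3
var missing = indexOfBytes(bytes, 111, 0, 4);   // -1
```

A native implementation would search the underlying buffer directly; the nested loop here only shows the intended result for each argument form.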


== ByteArray ==
== ByteArray ==
Line 63: Line 71:
* ByteArray(byteString)
* ByteArray(byteString)
* ByteArray(array)
* ByteArray(array)
* ByteArray(string, codecModuleId)
* ByteArray(string, codec)


Unlike the Array, the ByteArray is not variadic so that its initial length constructor is not ambiguous with its copy constructor.
Unlike the Array, the ByteArray is not variadic so that its initial length constructor is not ambiguous with its copy constructor.


The ByteArray object has the following methods:
All values within the length of the array are numbers stored as bytes that default to 0 if they have not been explicitly set.  Assigning beyond the bounds of a ByteArray implicitly grows the array, just like an Array.  Retrieving a value from an index that is out of bounds, that is, lower than 0 or at or beyond the length, returns undefined.  Assigning a value that is too large to fit in a byte implicitly and silently masks it against 0xFF.  Negative numbers are represented in two's complement form and likewise masked.
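
The masking rule can be sketched in plain JavaScript.  The helper name toByte is hypothetical and not part of the proposal; it only illustrates which value would end up stored for out-of-range and negative assignments.

```javascript
// Hypothetical helper, not part of the proposal: reduces any integer
// to the byte value a ByteArray would store for it.
function toByte(value) {
    // The bitwise AND converts the operand to a 32-bit two's complement
    // integer, then keeps only the low eight bits.
    return value & 0xFF;
}

var stored = [300, 256, -1, -128].map(toByte);
// stored is [44, 0, 255, 128]
```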
 
* encode(string, codecModuleId)


ByteArray instances support the following:
ByteArray instances support the following:
Line 75: Line 81:
* mutable length property
* mutable length property
** extending a byte array fills the new entries with 0.
** extending a byte array fills the new entries with 0.
* toByteString()
* toArray() -> an array of the byte values
* toArray()
* toArray(codec) -> an array of the code points, decoded
* toString(codecModuleId)
* toString() -> a string representation like "[ByteArray 10]"
* decode(codecModuleId) returns String
* toString(codec) -> decoded
* hash(digestModuleId)
* toByteArray() -> just a copy
* compress(compressionModuleId)
* toByteArray(sourceCodec, targetCodec) -> transcoded
* concat(iterable)
* toByteString() -> byte for byte
* join(byteString, byteArray, or Number)
* toByteString(sourceCodec, targetCodec) -> transcoded
* pop()
* concat(other:ByteArray|ByteString|Array)
* push(…variadic Numbers…)
* join(delimiter:ByteArray|ByteString|Array)
* shift()
* pop() -> byte:Number
* unshift(…variadic Numbers…)
* push(...variadic Numbers...) -> count:Number
* shift() -> byte:Number
* unshift(...variadic Numbers...) -> count:Number
* reverse() in place reversal
* reverse() in place reversal
* slice()
* slice()
Line 92: Line 100:
* splice()
* splice()
* toSource() returns a string like "ByteArray([])" for a null byte-array.
* toSource() returns a string like "ByteArray([])" for a null byte-array.
* valueOf() returns itself
* valueOf()
* The + operator returning new ByteArrays
* The + operator returning new ByteArrays
* The mutable [] operator for numbers
* The mutable [] operator for numbers
Line 100: Line 108:
The String prototype will be extended with the following members:
The String prototype will be extended with the following members:


* toByteArray(codecModuleId)
* toByteArray(codec)
* toByteString(codecModuleId)
* toByteString(codec)
* charCodes() -> Array of charcode:Number


== Array ==
== Array ==
Line 107: Line 116:
The Array prototype will be extended with the following members:
The Array prototype will be extended with the following members:


* toByteArray(codecModuleId)
* toByteArray(codec)
* toByteString(codecModuleId)
* toByteString(codec)
 
== General Requirements ==


== Conventions ==
None of the specified prototypes or augmentations to existing prototypes are enumerable.
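
One way to meet the non-enumerability requirement, assuming an engine that provides ECMAScript 5's Object.defineProperty, is sketched here using the charCodes() String extension as the example.  This is an illustration only; the proposal does not prescribe how implementations hide the members, and native implementations may do so by other means.

```javascript
// A sketch only: defines charCodes on String.prototype without making it
// enumerable, so for-in loops over strings do not see it.
Object.defineProperty(String.prototype, "charCodes", {
    value: function () {
        var codes = [];
        for (var i = 0; i < this.length; i++) {
            codes.push(this.charCodeAt(i));
        }
        return codes;
    },
    enumerable: false,   // already the default here, stated for clarity
    writable: true,
    configurable: true
});

var codes = "abc".charCodes();  // [97, 98, 99]

// Enumerating a string still yields only its indices.
var seen = [];
for (var key in "abc") {
    seen.push(key);
}
// seen is ["0", "1", "2"]
```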


"codecModuleId" always defaults to "utf8".  Codec modules must always export at least "encode", and "decode" methods to support byte strings and arrays. Digest modules must always export a "hash" method that accepts Array, ByteArray, or ByteString objects as their argument. Compression modules must always export a "compress" method.
Codec strings are as defined by IANA http://www.iana.org/assignments/character-sets.