ServerJS/Binary/B: Difference between revisions

(s/codec/charset/ in most places, and noted that ByteString().toByteString() can return itself instead of making a copy since it is immutable.)
(Edits in response to byteAt thread.)
Line 5: Line 5:
This proposal is not an object-oriented variation on pack and unpack with notions of inherent endianness, read/write head position, or intrinsic codec or charset information.  The objects described in this proposal are merely for the storage and direct manipulation of strings and arrays of byte data.  Some object-oriented conveniences are made, but implementing pack, unpack, or an object-oriented analog thereof is left as an exercise for a future proposal of a more abstract type or a 'struct' module (as mentioned by Ionut Gabriel Stan on [http://groups.google.com/group/serverjs/msg/592442ba98c6c70e the list]).  This goes against most mentioned [[ServerJS/Binary|prior art]].
This proposal is not an object-oriented variation on pack and unpack with notions of inherent endianness, read/write head position, or intrinsic codec or charset information.  The objects described in this proposal are merely for the storage and direct manipulation of strings and arrays of byte data.  Some object-oriented conveniences are made, but implementing pack, unpack, or an object-oriented analog thereof is left as an exercise for a future proposal of a more abstract type or a 'struct' module (as mentioned by Ionut Gabriel Stan on [http://groups.google.com/group/serverjs/msg/592442ba98c6c70e the list]).  This goes against most mentioned [[ServerJS/Binary|prior art]].


This proposal also does not provide named member functions for any particular subset of the possible charsets, codecs, compression algorithms, or digests that might operate on a byte string or array.  Instead, convenience member functions are provided for interfacing with any named codec or digest module, assuming that the given module exports the specified interface. (As supported originally by Robert Schultz, Davey Waterson, Ross Boucher, and tacitly myself, Kris Kowal, on the [http://groups.google.com/group/serverjs/browse_thread/thread/be72ef3d8146731d/06c27162b698eef5?lnk=gst First proposition] thread on the mailing list).  This proposal does not address the need for stream objects to support pipelined codecs and hash digests (mentioned by Tom Robinson and Robert Schultz in the same conversation).
This proposal also does not provide named member functions for any particular subset of the possible charsets, codecs, compression algorithms, or consistent hash digests that might operate on a byte string or array.  Instead, convenience member functions are provided for interfacing with any named charset, with the IANA charset name space, and with the possibility of eventually employing a system of modular extensions for other codecs or digests, requiring that the given module exports a specified interface. (As supported originally by Robert Schultz, Davey Waterson, Ross Boucher, and tacitly myself, Kris Kowal, on the [http://groups.google.com/group/serverjs/browse_thread/thread/be72ef3d8146731d/06c27162b698eef5?lnk=gst First proposition] thread on the mailing list).  This proposal does not address the need for stream objects to support pipelined codecs and hash digests (mentioned by Tom Robinson and Robert Schultz in the same conversation).


This proposal also reflects both group sentiment and a pragmatic point about properties.  This isn't a decree that properties like "length" should be consistently used throughout the ServerJS APIs.  However, given that all platforms support properties at the native level (to host String and Array objects) and that byte strings and arrays will require support at the native level, pursuing client-side interoperability is beyond the scope of this proposal and therefore properties have been specified.  (See comments by Kris Zyp about the implementability of properties in all platforms, comments by Davey Waterson from Aptana about the counter-productivity of attempting to support this API in browsers, and support for properties over accessor and mutator functions by Ionut Gabriel Stan and Cameron McCormack on the [http://groups.google.com/group/serverjs/browse_thread/thread/be72ef3d8146731d/06c27162b698eef5?lnk=gst mailing list]).
This proposal also reflects both group sentiment and a pragmatic point about properties.  This isn't a decree that properties like "length" should be consistently used throughout the ServerJS APIs.  However, given that all platforms support properties at the native level (to host String and Array objects) and that byte strings and arrays will require support at the native level, pursuing client-side interoperability is beyond the scope of this proposal and therefore properties have been specified.  (See comments by Kris Zyp about the implementability of properties in all platforms, comments by Davey Waterson from Aptana about the counter-productivity of attempting to support this API in browsers, and support for properties over accessor and mutator functions by Ionut Gabriel Stan and Cameron McCormack on the [http://groups.google.com/group/serverjs/browse_thread/thread/be72ef3d8146731d/06c27162b698eef5?lnk=gst mailing list]).


The byte types provide functions for encoding, decoding, and transcoding, but they are all shallow interfaces that defer to a charset manager module, which may in turn use a system-level charset or a pair of pure JavaScript modules to transcode through an array or stream of canonical Unicode code points.
The byte types provide functions for encoding, decoding, and transcoding, but they are all shallow interfaces that defer to a charset manager module, which may in turn use a system-level charset or a pair of pure JavaScript modules to transcode through an array or stream of canonical Unicode code points.  This behavior may be specified further in the future.
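
The transcoding path through canonical code points might look roughly like the following. This is only a sketch: the <code>charsets</code> registry, its <code>decode</code>/<code>encode</code> shape, and the <code>transcode</code> function are hypothetical names invented for illustration, since the proposal does not yet specify the charset manager's interface. Plain arrays of numbers stand in for byte strings, and "iso-8859-1" is included only because its codec is trivial.

```javascript
// Hypothetical registry (not part of the proposal): each charset supplies
// decode (bytes -> code points) and encode (code points -> bytes).
const charsets = {
  "us-ascii": {
    decode(bytes) {
      return bytes.map((b) => {
        if (b > 0x7f) throw new RangeError("byte out of range for us-ascii: " + b);
        return b;
      });
    },
    encode(points) {
      return points.map((p) => {
        if (p > 0x7f) throw new RangeError("code point not encodable in us-ascii: " + p);
        return p;
      });
    },
  },
  "iso-8859-1": {
    // Every byte maps directly to the code point of the same value.
    decode(bytes) {
      return bytes.slice();
    },
    encode(points) {
      return points.map((p) => {
        if (p > 0xff) throw new RangeError("code point not encodable in iso-8859-1: " + p);
        return p;
      });
    },
  },
};

// Transcode by decoding to code points with the source charset and
// re-encoding with the target charset. Names must be lowercase registry keys here.
function transcode(bytes, sourceCharset, targetCharset) {
  const source = charsets[sourceCharset];
  const target = charsets[targetCharset];
  if (!source || !target) {
    throw new Error("Unsupported charset");
  }
  return target.encode(source.decode(bytes));
}
```

A pair of codecs registered this way is enough to transcode between any two registered charsets without a dedicated converter for each pair, which is the appeal of routing through code points.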


= Specification =
= Specification =
The "binary" top-level module must export "ByteArray" and "ByteString".


== ByteString ==
== ByteString ==
Line 40: Line 43:
; toByteArray()
; toByteArray()
: Returns a byte for byte copy in a ByteArray.
: Returns a byte for byte copy in a ByteArray.
; toByteArray(sourceCodec, targetCodec)
; toByteArray(sourceCharset, targetCharset)
: Returns a transcoded copy in a ByteArray.
: Returns a transcoded copy in a ByteArray.
; toByteString()
; toByteString()
: Returns itself, since there's no need to copy an immutable ByteString.
: Returns itself, since there's no need to copy an immutable ByteString.
; toByteString(sourceCodec, targetCodec)
; toByteString(sourceCharset, targetCharset)
: Returns a transcoded copy.
: Returns a transcoded copy.
; toArray()
; toArray()
Line 79: Line 82:




* substr(start)
; substr(start)
* substr(start, length)
; substr(start, length)
* substring(first)
; substring(first)
* substring(first, last)
; substring(first, last)
* The + operator returning new ByteStrings
; The + operator returning new ByteStrings
* The immutable [] operator returning ByteStrings
; The immutable [] operator returning ByteStrings
* toSource() which would return "ByteString([])" for a null byte string
; toSource() which would return "ByteString([])" for a null byte string
* valueOf()
; valueOf()


ByteString does not implement toUpperCase() or toLowerCase() since they are not meaningful without the context of a charset.
ByteString does not implement toUpperCase() or toLowerCase() since they are not meaningful without the context of a charset.
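
As a rough illustration of why toByteString() can return the receiver itself, the sketch below models an immutable byte string in plain JavaScript. <code>SketchByteString</code> is a hypothetical stand-in, not the native type this proposal calls for, and it implements only the few members needed to make the point.

```javascript
// Hypothetical stand-in for an immutable ByteString (illustration only).
class SketchByteString {
  constructor(bytes = []) {
    // Copy the input, then freeze both the storage and the wrapper so that
    // neither can be changed after construction.
    this._bytes = Object.freeze(Array.from(bytes));
    Object.freeze(this);
  }

  get length() {
    return this._bytes.length;
  }

  // No copy is needed: nothing can mutate the receiver.
  toByteString() {
    return this;
  }

  // A plain Array is mutable, so callers get a fresh copy each time.
  toArray() {
    return this._bytes.slice();
  }
}
```

Because the storage is frozen, sharing one instance between callers is safe, whereas toArray() has to copy so that changes to the returned Array cannot leak back.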
Line 116: Line 119:
=== Instance properties ===
=== Instance properties ===


* mutable length property
; mutable length property
** extending a byte array fills the new entries with 0.
: extending a byte array fills the new entries with 0.


=== Instance methods (in prototype) ===
=== Instance methods (in prototype) ===


* toArray() -> an array of the byte values
; toArray()
* toArray(charset) -> an array of the code points, decoded
: an array of the byte values
* toString() -> a string representation like "[ByteArray 10]"
; toArray(charset)
* decodeToString(charset) - decoded
: an array of the code points, decoded
* toByteArray() -> just a copy
; toString()
* toByteArray(sourceCodec, targetCodec) -> transcoded
: a string representation like "[ByteArray 10]"
* toByteString() -> byte for byte
; <u>toString(charset)</u>
* toByteString(sourceCodec, targetCodec) -> transcoded
: <u>an alias for decodeToString(charset)</u>
* concat(other:ByteArray|ByteString|Array)
; decodeToString()
* join(delimiter:ByteArray|ByteString|Array)
; decodeToString(charset)
* pop() -> byte:Number
: returns a String from its decoded bytes in a given charset.  If no charset is provided, or if the charset is "undefined", assumes the default system encoding.
* push(...variadic Numbers...)-> count:Number
; toByteArray()
* shift() -> byte:Number
: just a copy
* unshift(...variadic Numbers...) -> count:Number
; toByteArray(sourceCharset, targetCharset)
* reverse() in place reversal
: transcoded
* slice()
; toByteString()
* sort()
: byte for byte copy
* splice()
; toByteString(sourceCharset, targetCharset)
* toSource() returns a string like "ByteArray([])" for a null byte-array.
: transcoded
* valueOf()
; <u>byteAt(offset)</u>
* The + operator returning new ByteArrays
; concat(other:ByteArray|ByteString|Array)
* The mutable [] operator for numbers
; <strike>join(delimiter:ByteArray|ByteString|Array)</strike>
: <u>deemed unnecessary and semantically unclear</u>
; pop() -> byte:Number
; push(...variadic Numbers...)-> count:Number
; shift() -> byte:Number
; unshift(...variadic Numbers...) -> count:Number
; reverse() in place reversal
; slice()
; sort()
; splice()
; toSource() returns a string like "ByteArray([])" for a null byte-array.
; valueOf()
; The + operator returning new ByteArrays
; The mutable [] operator for numbers
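
A few of the ByteArray behaviors listed above can be approximated in plain JavaScript as follows. <code>SketchByteArray</code> is a hypothetical illustration backed by an ordinary Array, not the native implementation the proposal expects, and it covers only the mutable length property, push, pop, toArray, and toString.

```javascript
// Hypothetical stand-in for ByteArray (illustration only).
class SketchByteArray {
  constructor(bytes = []) {
    this._bytes = Array.from(bytes);
  }

  get length() {
    return this._bytes.length;
  }

  // Mutable length: shrinking truncates, extending fills new entries with 0.
  set length(n) {
    const old = this._bytes.length;
    this._bytes.length = n;
    if (n > old) {
      this._bytes.fill(0, old);
    }
  }

  // push(...variadic Numbers...) -> count:Number
  push(...values) {
    return this._bytes.push(...values);
  }

  // pop() -> byte:Number
  pop() {
    return this._bytes.pop();
  }

  // toArray() -> an array of the byte values
  toArray() {
    return this._bytes.slice();
  }

  // toString() -> a string representation like "[ByteArray 10]"
  toString() {
    return "[ByteArray " + this.length + "]";
  }
}
```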


== String ==
== String ==
Line 163: Line 179:
; toByteString(charset)
; toByteString(charset)
: Converts an array of Unicode code points to a ByteString encoded in charset.
: Converts an array of Unicode code points to a ByteString encoded in charset.
; <u>join(delimiter)</u>
: <u>Overridden to distinguish its behavior on whether the delimiter is a ByteString, ByteArray, or String.  Defaults to usual behavior for Strings.  Coerces items to ByteString or ByteArray and joins them on the delimiter for delimiters with their respective types.</u>
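
The dispatch on delimiter type could be sketched as below. This is an assumption-laden illustration: <code>joinSketch</code> is a hypothetical free function rather than an override of Array.prototype.join, a standard <code>Uint8Array</code> stands in for a ByteString or ByteArray delimiter, and "coercion" of items is reduced to <code>Uint8Array.from</code>.

```javascript
// Hypothetical sketch of a delimiter-sensitive join (illustration only).
function joinSketch(items, delimiter) {
  // Non-byte delimiters keep the usual Array join behavior.
  if (!(delimiter instanceof Uint8Array)) {
    return Array.prototype.join.call(items, delimiter);
  }

  // Byte delimiter: coerce every item to bytes and concatenate them,
  // placing the delimiter between consecutive items.
  const parts = items.map((item) => Uint8Array.from(item));
  const gaps = Math.max(parts.length - 1, 0);
  const total = parts.reduce((sum, part) => sum + part.length, 0) + gaps * delimiter.length;

  const out = new Uint8Array(total);
  let offset = 0;
  parts.forEach((part, index) => {
    if (index > 0) {
      out.set(delimiter, offset);
      offset += delimiter.length;
    }
    out.set(part, offset);
    offset += part.length;
  });
  return out;
}
```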


== General Requirements ==
== General Requirements ==
Line 168: Line 186:
None of the specified prototypes or augmentations to existing prototypes are enumerable.
None of the specified prototypes or augmentations to existing prototypes are enumerable.


Codec strings are as defined by IANA http://www.iana.org/assignments/character-sets.
<u>Any operation that requires encoding, decoding, or transcoding among charsets may throw an error if that charset is not supported by the implementation.  All implementations MUST support "us-ascii" and "utf-8".</u>
 
Charset strings are as defined by IANA http://www.iana.org/assignments/character-sets.
 
<u>Charset names are case-insensitive.</u>
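
One way an implementation might apply these requirements is sketched below. <code>resolveCharset</code> and <code>SUPPORTED_CHARSETS</code> are hypothetical names, and the supported set is assumed to contain only the two mandatory charsets; a real implementation would likely support more and might also recognize IANA aliases.

```javascript
// Hypothetical helper (not part of the proposal): normalizes a charset name
// and rejects charsets the implementation does not support.
const SUPPORTED_CHARSETS = new Set(["us-ascii", "utf-8"]); // the two mandatory charsets

function resolveCharset(name) {
  // Charset names compare case-insensitively, so normalize before lookup.
  const canonical = String(name).toLowerCase();
  if (!SUPPORTED_CHARSETS.has(canonical)) {
    throw new Error("Unsupported charset: " + name);
  }
  return canonical;
}
```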


= Relevant Discussions =
= Relevant Discussions =


* [http://groups.google.com/group/serverjs/browse_thread/thread/f8ad81201f7b121b ByteArray and ByteString proposal]
* [http://groups.google.com/group/serverjs/browse_thread/thread/f8ad81201f7b121b ByteArray and ByteString proposal]
* [http://groups.google.com/group/serverjs/browse_thread/thread/a8d3a91af37fd355 ByteArray: byteAt method]