ServerJS/Encodings

From MozillaWiki
Revision as of 21:40, 9 April 2009 by MrN (talk | contribs) (→‎Class: Converter: unidirectional!)

Rationale

Streams need encoding support, and there should also be a low-level API available for it.

The topic has been discussed on the mailing list (see <http://groups.google.com/group/serverjs/browse_thread/thread/6365b2a54615a134>); this page summarizes those efforts.

Encoding Names

Encoding names should be among those supported by iconv, which appear to be a superset of the IANA character set registry (http://www.iana.org/assignments/character-sets).

The following encodings are required:

  • US-ASCII
  • UTF-8
  • UTF-16
  • ISO-8859-1

Encoding names must be case-insensitive.
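One way to get case-insensitive names is to canonicalize every name before looking it up. The following is a minimal sketch, not part of the proposal; `normalizeEncodingName`, `isRequiredEncoding` and the choice of upper case as the canonical form are assumptions made for illustration.

```javascript
// Hypothetical helpers, not part of the proposal.
// The four encodings the proposal lists as required.
var REQUIRED_ENCODINGS = ['US-ASCII', 'UTF-8', 'UTF-16', 'ISO-8859-1'];

// Canonicalize a name so that 'utf-8', 'Utf-8' and 'UTF-8' compare equal.
function normalizeEncodingName(name) {
    return String(name).toUpperCase();
}

// True if the name, in any letter case, is one of the required encodings.
function isRequiredEncoding(name) {
    return REQUIRED_ENCODINGS.indexOf(normalizeEncodingName(name)) !== -1;
}
```

With this, `isRequiredEncoding('Iso-8859-1')` and `isRequiredEncoding('ISO-8859-1')` give the same answer.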

API

This should probably be a module:

 var enc = require('encodings')

Simple methods

For convenience, there should be these easy methods for converting between encodings:

string = enc.convertToString(sourceEncoding, byteStringOrArray)
Converts a ByteString or a ByteArray to a JavaScript string.
byteString = enc.convertFromString(targetEncoding, string)
Converts a JavaScript string to a ByteString.
byteString = enc.convert(sourceEncoding, targetEncoding, byteStringOrArray)
Converts a ByteString or a ByteArray from sourceEncoding to targetEncoding and returns the result as a ByteString.
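The three methods above could behave roughly as in the sketch below. It is not the proposed implementation: Node.js `Buffer` stands in for both ByteString and ByteArray, the `NODE_NAMES` table and the `nodeName` helper are hypothetical, and 'UTF-16' is assumed to mean little-endian without a byte order mark.

```javascript
// Sketch only: Node.js Buffer stands in for ByteString/ByteArray.
// Maps the required encoding names to Node's Buffer encoding names.
// Assumption: 'UTF-16' is little-endian without a byte order mark.
var NODE_NAMES = {
    'US-ASCII': 'ascii',
    'UTF-8': 'utf8',
    'UTF-16': 'utf16le',
    'ISO-8859-1': 'latin1'
};

// Case-insensitive lookup; throws for names outside the table.
function nodeName(encodingName) {
    var name = NODE_NAMES[String(encodingName).toUpperCase()];
    if (!name) {
        throw new Error('Unsupported encoding: ' + encodingName);
    }
    return name;
}

// Bytes in sourceEncoding -> JavaScript string.
function convertToString(sourceEncoding, bytes) {
    return Buffer.from(bytes).toString(nodeName(sourceEncoding));
}

// JavaScript string -> bytes in targetEncoding.
function convertFromString(targetEncoding, string) {
    return Buffer.from(string, nodeName(targetEncoding));
}

// Bytes in sourceEncoding -> bytes in targetEncoding, via a string.
function convert(sourceEncoding, targetEncoding, bytes) {
    return convertFromString(targetEncoding, convertToString(sourceEncoding, bytes));
}
```

In this sketch `convert` simply decodes to a string and encodes again; an iconv-backed implementation would presumably convert the bytes directly.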

Checking for available encodings

enc.supports(encodingName)
Checks whether encodingName is supported and returns true if so, false otherwise.
enc.listEncodings([encodingCheckerFunction or regex])
Returns a list of supported encodings. If given, encodingCheckerFunction takes an encoding name as its parameter and returns a truthy value if that encoding should be listed. A regex may be passed instead of a function. If the parameter is missing, all supported encodings are returned.
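A minimal sketch of the two functions follows. The hard-coded `SUPPORTED` list is a hypothetical stand-in for whatever the underlying conversion library reports, and treating a regex as "list the names that match it" is an assumption, since the text does not spell that out.

```javascript
// Hypothetical list; a real implementation would presumably ask the
// underlying conversion library which encodings it knows.
var SUPPORTED = ['US-ASCII', 'UTF-8', 'UTF-16', 'ISO-8859-1'];

// Case-insensitive membership test.
function supports(encodingName) {
    return SUPPORTED.indexOf(String(encodingName).toUpperCase()) !== -1;
}

// filter may be omitted, a function, or a RegExp.
function listEncodings(filter) {
    if (filter === undefined) {
        return SUPPORTED.slice(); // copy, so callers cannot modify the list
    }
    var check = filter instanceof RegExp
        ? function (name) {
            filter.lastIndex = 0; // keep regexes with the g flag stateless
            return filter.test(name);
        }
        : filter;
    return SUPPORTED.filter(function (name) { return check(name); });
}
```

Because names are stored in upper case here, a caller filtering with a regex would likely want the `i` flag, for example `listEncodings(/^utf/i)`.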

Class: Converter

There also should be a class enc.Converter for more advanced conversion.

Please note that the interface is one-way. Although it has two methods, a Converter only converts in one direction: it supports encoding, not decoding! Several people have got that wrong when glancing over it, so it is emphasised here.

[Constructor] Converter(from, to)
Where from and to are the encoding names.
[Method] push(byteStringOrArray)
Converts input from a ByteString or ByteArray. The results are stored in an internal buffer. Those parts of byteStringOrArray that could not be converted (which can happen with multi-byte encodings) are kept in a separate buffer.
Returns nothing.
[Method] get([byteArray,] [maximumSize])
Reads up to maximumSize bytes, or as many bytes as are available, from the internal buffer. If byteArray is specified, the data is written into that ByteArray.
Returns a ByteString if byteArray is not specified, or byteArray itself otherwise.

Example usage:

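 var Converter = require('encodings').Converter;
 // input is a ByteString or ByteArray holding ISO-8859-1 data
 var converter = new Converter('iso-8859-1', 'utf-32');
 converter.push(input);
 var output = converter.get(); // ByteString holding UTF-32 data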
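To illustrate the two buffers described for push, here is a rough sketch of a Converter built on Node.js. It is an assumption-laden stand-in, not the proposed implementation: `Buffer` replaces ByteString and ByteArray, the encoding names are Node's own ('utf8', 'utf16le', 'latin1'), Node's `StringDecoder` plays the role of the separate buffer for unconverted bytes, and the optional byteArray argument of get is omitted.

```javascript
var StringDecoder = require('string_decoder').StringDecoder;

// Sketch only: Buffer stands in for ByteString and ByteArray, and only
// encodings known to Node.js (e.g. 'utf8', 'utf16le', 'latin1') work.
function Converter(from, to) {
    // The decoder holds back bytes of an incomplete multi-byte sequence.
    this.decoder = new StringDecoder(from);
    this.to = to;
    // Converted bytes waiting to be fetched with get().
    this.buffer = Buffer.alloc(0);
}

// Convert as much of the input as possible and append it to the buffer.
Converter.prototype.push = function (bytes) {
    var text = this.decoder.write(Buffer.from(bytes));
    this.buffer = Buffer.concat([this.buffer, Buffer.from(text, this.to)]);
};

// Return up to maximumSize converted bytes (all of them if omitted).
Converter.prototype.get = function (maximumSize) {
    var size = this.buffer.length;
    if (maximumSize !== undefined) {
        size = Math.min(size, maximumSize);
    }
    var chunk = this.buffer.subarray(0, size);
    this.buffer = this.buffer.subarray(size);
    return chunk;
};
```

If the first byte of a two-byte UTF-8 sequence is pushed on its own, get returns nothing until the second byte arrives, which is the behaviour the separate buffer is meant to provide.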