ServerJS/Encodings: Difference between revisions

From MozillaWiki
Revision as of 22:22, 9 April 2009

Rationale

For Streams, we need encodings support. There also should be a low-level API available for this.

There has been some discussion on the mailing list (see <http://groups.google.com/group/serverjs/browse_thread/thread/6365b2a54615a134>); this page summarises those efforts.

Encoding Names

The encoding names should be among those supported by ICONV, which seem to be a superset of http://www.iana.org/assignments/character-sets.

The following encodings are required:

  • US-ASCII
  • UTF-8
  • UTF-16
  • ISO-8859-1

Encoding names must be case-insensitive.

API

OK, so probably this should be a module:

 var enc = require('encodings')

Simple methods

For convenience, there should be these easy methods for converting between encodings:

string = enc.convertToString(sourceEncoding, byteStringOrArray)
Converts a ByteString or a ByteArray to a JavaScript string.
byteString = enc.convertFromString(targetEncoding, string)
Converts a JavaScript string to a ByteString.
byteString = enc.convert(sourceEncoding, targetEncoding, byteStringOrArray)
Converts a ByteString or a ByteArray to a ByteString.
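
To make the three signatures concrete, here is a minimal sketch of how they could behave. It is not part of the proposal: it assumes a Node.js environment and uses Node's Buffer as a stand-in for ByteString, the NODE_NAMES table and the nodeName helper are hypothetical, and mapping UTF-16 to little-endian without a byte order mark is an assumption of this sketch.

```javascript
// Hypothetical mapping from the required encoding names to Node.js Buffer encodings.
var NODE_NAMES = {
  'us-ascii': 'ascii',     // note: Node does not reject non-ASCII characters here
  'utf-8': 'utf8',
  'utf-16': 'utf16le',     // assumption: little-endian, no byte order mark
  'iso-8859-1': 'latin1'
}

// Look up an encoding name case-insensitively; throw if it is unknown.
function nodeName(encodingName) {
  var key = String(encodingName).toLowerCase()
  if (!Object.prototype.hasOwnProperty.call(NODE_NAMES, key)) {
    throw new Error('Unsupported encoding: ' + encodingName)
  }
  return NODE_NAMES[key]
}

var enc = {
  // Bytes (a Buffer or an array of byte values) to JavaScript string.
  convertToString: function (sourceEncoding, bytes) {
    return Buffer.from(bytes).toString(nodeName(sourceEncoding))
  },
  // JavaScript string to bytes (a Buffer stands in for ByteString).
  convertFromString: function (targetEncoding, string) {
    return Buffer.from(string, nodeName(targetEncoding))
  },
  // Bytes to bytes, going through a JavaScript string.
  convert: function (sourceEncoding, targetEncoding, bytes) {
    var text = enc.convertToString(sourceEncoding, bytes)
    return enc.convertFromString(targetEncoding, text)
  }
}
```

In this sketch convert is simply the composition of the other two methods; a real implementation would more likely convert directly, for example through iconv.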

Checking for available encodings

enc.supports(encodingName)
Checks whether encodingName is supported; returns true if so, false otherwise.
enc.listEncodings([encodingCheckerFunction or regex])
encodingCheckerFunction takes the encoding name as a parameter and returns true-ish if the encoding should be listed. Regexes should also be supported. If the parameter is missing, returns all supported encodings.
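
The two functions above could look roughly like the following sketch. The SUPPORTED array is a hypothetical fixed list containing only the four required encodings; a real implementation would presumably ask the underlying library which encodings it knows.

```javascript
// Hypothetical fixed list; only the four required encodings are included.
var SUPPORTED = ['US-ASCII', 'UTF-8', 'UTF-16', 'ISO-8859-1']

// Case-insensitive membership test.
function supports(encodingName) {
  var wanted = String(encodingName).toLowerCase()
  return SUPPORTED.some(function (name) {
    return name.toLowerCase() === wanted
  })
}

// With no argument, return a copy of the whole list. Otherwise keep the
// names accepted by the checker function or matched by the regex.
function listEncodings(filter) {
  if (filter === undefined) {
    return SUPPORTED.slice()
  }
  var check = filter
  if (filter instanceof RegExp) {
    check = function (name) {
      filter.lastIndex = 0 // keep regexes with the g flag from carrying state
      return filter.test(name)
    }
  }
  return SUPPORTED.filter(function (name) {
    return Boolean(check(name))
  })
}
```

For example, listEncodings(/^utf/i) would list the two UTF encodings, and supports('utf-8') and supports('UTF-8') give the same answer.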

Class: Converter

There also should be a class enc.Converter for more advanced conversion.

Please note that the interface is one-way. Although it has two methods, write and read, a Converter only converts in one direction, from the from encoding to the to encoding; it supports only encoding, not decoding. Multiple people have got that wrong when glancing over it, so it's emphasised here.

[Constructor] Converter(from, to)
Where from and to are the encoding names.
[Method] write(byteStringOrArray)
Converts input from a ByteString or ByteArray. The results are stored in an internal buffer. Any parts of byteStringOrArray that could not be converted (for multi-byte encodings) are stored as well, in a separate buffer.
Returns nothing.
[Method] read([byteArray,] [maximumSize])
Reads maximumSize bytes, or as many bytes as are available, out of the internal buffer. If byteArray is specified, the data is written into that ByteArray.
Returns a ByteString if byteArray is not specified, or byteArray itself otherwise.
[Method] close()
Close the stream. Throws an exception if there was a conversion error (specifically, a partial multibyte character).
Returns nothing and takes no parameters.

TODO: Which exception to throw on error?

Example usage:

 var Converter = require('encodings').Converter
 var converter = new Converter('iso-8859-1', 'utf-32')
 converter.write(input) // input is a ByteString
 var output = converter.read() // output is a ByteString
 converter.close()
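
The buffering behaviour described for write, read and close can be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: it runs on Node.js, uses Buffer in place of ByteString, only handles UTF-8 input and UTF-16 (little-endian) output, omits the optional byteArray argument of read, and relies on Node's StringDecoder to hold back partial multibyte characters between writes.

```javascript
var StringDecoder = require('string_decoder').StringDecoder

// Sketch only: converts UTF-8 input to UTF-16LE output.
function Converter(from, to) {
  if (from.toLowerCase() !== 'utf-8' || to.toLowerCase() !== 'utf-16') {
    throw new Error('This sketch only converts utf-8 to utf-16')
  }
  this.decoder = new StringDecoder('utf8') // keeps incomplete characters
  this.buffer = Buffer.alloc(0)            // converted output not yet read
}

// Convert what can be converted and append it to the internal buffer.
Converter.prototype.write = function (bytes) {
  var text = this.decoder.write(Buffer.from(bytes))
  this.buffer = Buffer.concat([this.buffer, Buffer.from(text, 'utf16le')])
}

// Return up to maximumSize bytes, or everything if no size is given.
Converter.prototype.read = function (maximumSize) {
  var size = this.buffer.length
  if (maximumSize !== undefined) {
    size = Math.min(maximumSize, size)
  }
  var chunk = this.buffer.subarray(0, size)
  this.buffer = this.buffer.subarray(size)
  return chunk
}

// Throw if the input ended in the middle of a multibyte character.
Converter.prototype.close = function () {
  if (this.decoder.end() !== '') {
    throw new Error('Partial multibyte character at end of input')
  }
}
```

Writing the two bytes of a UTF-8 character in separate write calls yields nothing from read after the first call and the complete converted character after the second, which is the reason for keeping unconverted input in its own buffer.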

Alternative: Converter

There is another way the Converter interface could work.

[Constructor] Converter(from, to)
Where from and to are the encoding names.
[Method] push(byteStringOrArray)
Converts input from a ByteString or ByteArray. The results are stored in an internal buffer. Any parts of byteStringOrArray that could not be converted (for multi-byte encodings) are stored as well, in a separate buffer.
Returns (as a ByteString) as much as could be converted.
[Method] close()
Close the stream. Throws an exception if there was a conversion error (specifically, a partial multibyte character).
Returns nothing and takes no parameters.

Example:

 var Converter = require('encodings').Converter
 var converter = new Converter('iso-8859-1', 'utf-32')
 var output = converter.push(input) // input is a ByteString, and output too
 converter.close()