ServerJS/Encodings: Difference between revisions

From MozillaWiki
m (updated discussion links)
(→Class: Transcoder: change Transcoder API in a _major_ way.)
Line 45:
  ; [Method] push(byteStringOrArray[, outputByteArray])
  : Convert input from a ByteString or ByteArray. Those parts of byteStringOrArray that could not be converted (for multi-byte encodings) are stored in a buffer. If outputByteArray is passed, the results are ''appended'' to outputByteArray.
- : If outputByteArray was passed, returns outputByteArray, otherwise returns (as a ByteString) as much output as could be converted.
+ : If outputByteArray was passed, returns outputByteArray, otherwise returns nothing and the output is accumulated in an internal buffer.
- ; [Method] close()
+ ; [Method] close([outputByteArray])
  : Close the stream. Throws an exception if there was a conversion error (specifically, a partial multibyte character).
- : Returns nothing and takes no parameters.
+ : Writes the remaining output bytes (including those that were accumulated because push was called without an outputByteArray) into the here given outputByteArray (appended) or a new ByteString. If outputByteArray is given, it is returned, otherwise the ByteString is returned.

  '''TODO''': Which exception to throw on error?
Line 56:
    Transcoder = require('encodings').Transcoder
    transcoder = new Transcoder('iso-8859-1', 'utf-32')
-   output = transcoder.push(input) // input is a ByteString, and output too
+   transcoder.push(input) // input is a ByteString
-   transcoder.close()
+   output = transcoder.close() // and output is a ByteString too

  Another example:
Line 66:
            transcoder.push(input, output)
    }
-   transcoder.close()
+   transcoder.close(output)
    // output is the complete conversion of all the input chunks concatenated now



Revision as of 11:59, 6 June 2009

Streams need encoding support. A low-level API for this should also be available.

Encoding Names

The encoding names should be among those supported by ICONV, which seem to be a superset of http://www.iana.org/assignments/character-sets.

The following encodings are required:

  • US-ASCII
  • UTF-8
  • UTF-16
  • ISO-8859-1

Encoding names must be case-insensitive.
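
One way to get case-insensitive names is to fold every name to a canonical form before comparing. The sketch below is only an illustration: normalizeEncodingName and isRequiredEncoding are hypothetical helpers, not part of the proposed API.

```javascript
// Hypothetical helper, not part of the proposed API: fold an encoding
// name to a canonical form so that lookups ignore case.
function normalizeEncodingName(name) {
  return String(name).trim().toUpperCase();
}

// The four required encodings, stored in canonical (upper-case) form.
const REQUIRED_ENCODINGS = new Set(['US-ASCII', 'UTF-8', 'UTF-16', 'ISO-8859-1']);

function isRequiredEncoding(name) {
  return REQUIRED_ENCODINGS.has(normalizeEncodingName(name));
}

console.log(isRequiredEncoding('utf-8'));      // true
console.log(isRequiredEncoding('Iso-8859-1')); // true
console.log(isRequiredEncoding('utf-7'));      // false
```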

API

OK, so probably this should be a module:

 var enc = require('encodings')

Simple methods

For convenience, there should be these easy methods for converting between encodings:

string = enc.convertToString(sourceEncoding, byteStringOrArray)
Converts a ByteString or a ByteArray to a JavaScript string.
byteString = enc.convertFromString(targetEncoding, string)
Converts a JavaScript string to a ByteString.
byteString = enc.convert(sourceEncoding, targetEncoding, byteStringOrArray)
Converts a ByteString or a ByteArray to a ByteString.
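
To make the three signatures concrete, here is a rough sketch on top of Node.js, which is an assumption and not something the proposal targets. A Node.js Buffer stands in for ByteString, and the NODE_NAMES table (hypothetical) maps only two of the required encodings to Node's own names.

```javascript
// Sketch only: Node.js Buffer stands in for ByteString, and only two of the
// required encodings are mapped. The NODE_NAMES table is an assumption.
const NODE_NAMES = { 'utf-8': 'utf8', 'iso-8859-1': 'latin1' };

function nodeName(encoding) {
  const name = NODE_NAMES[String(encoding).toLowerCase()];
  if (!name) throw new Error('Unsupported encoding: ' + encoding);
  return name;
}

// bytes -> JavaScript string
function convertToString(sourceEncoding, bytes) {
  return Buffer.from(bytes).toString(nodeName(sourceEncoding));
}

// JavaScript string -> bytes
function convertFromString(targetEncoding, string) {
  return Buffer.from(string, nodeName(targetEncoding));
}

// bytes -> bytes, going through a JavaScript string
function convert(sourceEncoding, targetEncoding, bytes) {
  return convertFromString(targetEncoding, convertToString(sourceEncoding, bytes));
}

const latin1 = Buffer.from([0x63, 0x61, 0x66, 0xe9]); // "café" in ISO-8859-1
const utf8 = convert('ISO-8859-1', 'UTF-8', latin1);
console.log(convertToString('utf-8', utf8)); // café
console.log(utf8.length);                    // 5, since é takes two bytes in UTF-8
```

This sketch does no error handling for characters that the target encoding cannot represent; a real implementation would have to decide how to report them.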

Checking for available encodings

enc.supports(encodingName)
Checks whether encodingName is supported; returns true if so, false otherwise.
enc.listEncodings([encodingCheckerFunction or regex])
encodingCheckerFunction takes the encoding name as a parameter and returns a truthy value if the encoding should be listed. A regex may be passed instead of a function. If the parameter is missing, returns all supported encodings.
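
The two lookup functions could behave roughly as follows. This is a sketch under assumptions: the SUPPORTED list is hypothetical and holds only the four required encodings, and a regex argument is taken to mean "list the names that match it".

```javascript
// Hypothetical list of supported encodings; a real implementation would
// query the underlying conversion library instead.
const SUPPORTED = ['US-ASCII', 'UTF-8', 'UTF-16', 'ISO-8859-1'];

function supports(encodingName) {
  // Names are compared case-insensitively.
  const wanted = String(encodingName).toUpperCase();
  return SUPPORTED.includes(wanted);
}

function listEncodings(filter) {
  if (filter === undefined) return SUPPORTED.slice();
  if (filter instanceof RegExp) {
    // search() ignores lastIndex, so regexes with the g flag are safe here.
    return SUPPORTED.filter((name) => name.search(filter) !== -1);
  }
  return SUPPORTED.filter((name) => Boolean(filter(name)));
}

console.log(supports('utf-8'));                  // true
console.log(listEncodings(/^utf-/i));            // UTF-8 and UTF-16
console.log(listEncodings((n) => n.length > 6)); // US-ASCII and ISO-8859-1
```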

Class: Transcoder

There should also be a class enc.Transcoder for general transcoding (between ByteStrings or ByteArrays).

[Constructor] Transcoder(from, to)
Where from and to are the encoding names.
[Method] push(byteStringOrArray[, outputByteArray])
Convert input from a ByteString or ByteArray. Those parts of byteStringOrArray that could not be converted (for multi-byte encodings) are stored in a buffer. If outputByteArray is passed, the results are appended to outputByteArray.
If outputByteArray was passed, returns outputByteArray; otherwise returns nothing, and the output is accumulated in an internal buffer.
[Method] close([outputByteArray])
Close the stream. Throws an exception if there was a conversion error (specifically, a partial multibyte character).
Writes the remaining output bytes (including those that were accumulated because push was called without an outputByteArray). If outputByteArray is given, the bytes are appended to it and it is returned; otherwise they are returned as a new ByteString.

TODO: Which exception to throw on error?

Example:

 var Transcoder = require('encodings').Transcoder
 var transcoder = new Transcoder('iso-8859-1', 'utf-32')
 transcoder.push(input) // input is a ByteString; the output is buffered internally
 var output = transcoder.close() // output is a ByteString too

Another example:

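 var transcoder = new Transcoder('utf-32', 'utf-8')
 var output = new ByteArray()
 var input
 // readSomeByteFromSomewhere() is a placeholder that returns the next chunk, or a false value at the end
 while (input = readSomeByteFromSomewhere()) {
         transcoder.push(input, output) // appends the converted chunk to output
 }
 transcoder.close(output) // appends any remaining output
 // output is now the complete conversion of all the input chunks, concatenated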

(See ServerJS/Encodings/OldClass for another API.)
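
The push/close contract described above can be sketched on top of Node.js. Everything here is an assumption made for illustration and not the proposed implementation: the source encoding is always UTF-8, Node's StringDecoder does the buffering of partial characters, a plain array of numbers stands in for ByteArray, a Buffer stands in for ByteString, and a generic Error is thrown because the exception type is still an open TODO.

```javascript
const { StringDecoder } = require('string_decoder');

// Sketch only. Assumptions: the source encoding is always UTF-8, `to` is a
// Node.js Buffer encoding name, a plain array of numbers stands in for
// ByteArray, and a Buffer stands in for ByteString.
class Transcoder {
  constructor(from, to) {
    if (String(from).toLowerCase() !== 'utf-8') {
      throw new Error('This sketch only decodes UTF-8');
    }
    this.decoder = new StringDecoder('utf8'); // keeps partial characters between calls
    this.to = to;
    this.pending = []; // output accumulated by push() calls without an output array
  }

  push(input, outputByteArray) {
    const text = this.decoder.write(Buffer.from(input));
    const target = outputByteArray || this.pending;
    for (const byte of Buffer.from(text, this.to)) target.push(byte);
    return outputByteArray; // undefined when no output array was passed
  }

  close(outputByteArray) {
    // end() yields a replacement character if a partial character is left over.
    if (this.decoder.end() !== '') {
      throw new Error('Input ended with a partial multibyte character');
    }
    const rest = this.pending;
    this.pending = [];
    if (!outputByteArray) return Buffer.from(rest);
    for (const byte of rest) outputByteArray.push(byte);
    return outputByteArray;
  }
}

// "é" is 0xC3 0xA9 in UTF-8; the two bytes arrive in separate chunks.
const t = new Transcoder('utf-8', 'latin1');
t.push([0x63, 0x61, 0x66, 0xc3]);
t.push([0xa9]);
const result = t.close();
console.log(Array.from(result)); // [ 99, 97, 102, 233 ]
```

The split character shows why push has to buffer: the first chunk ends in the middle of "é", so its last byte can only be converted once the second chunk arrives.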

Relevant Discussions