ServerJS/Encodings: Difference between revisions
m (updated discussion links)
(→Class: Transcoder: change Transcoder API in a _major_ way.)
Revision as of 11:59, 6 June 2009
For Streams, we need encodings support. There should also be a low-level API available for this.
Encoding Names
The encoding names should be among those supported by iconv, which seem to be a superset of the names registered at http://www.iana.org/assignments/character-sets.
The following encodings are required:
- US-ASCII
- UTF-8
- UTF-16
- ISO-8859-1
Encoding names must be case-insensitive.
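One way to get case-insensitive names is to canonicalize every name before looking it up. The following is a minimal sketch, not part of the proposal: normalizeEncodingName and isRequiredEncoding are hypothetical helpers, and the list holds only the four required encodings named above.

```javascript
// Hypothetical helpers, not part of the proposed API.
// The list holds only the four encodings the text marks as required.
const REQUIRED_ENCODINGS = ['US-ASCII', 'UTF-8', 'UTF-16', 'ISO-8859-1'];

// Canonicalize a name so that 'utf-8', 'Utf-8' and 'UTF-8' compare equal.
function normalizeEncodingName(name) {
  return String(name).trim().toUpperCase();
}

// True if the name, ignoring case, is one of the required encodings.
function isRequiredEncoding(name) {
  return REQUIRED_ENCODINGS.includes(normalizeEncodingName(name));
}
```

With this, isRequiredEncoding('utf-8') and isRequiredEncoding('UTF-8') give the same answer.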
API
OK, so probably this should be a module:
var enc = require('encodings')
Simple methods
For convenience, there should be these easy methods for converting between encodings:
- string = enc.convertToString(sourceEncoding, byteStringOrArray)
- Converts a ByteString or a ByteArray to a Javascript string.
- byteString = enc.convertFromString(targetEncoding, string)
- Converts a Javascript string to a ByteString.
- byteString = enc.convert(sourceEncoding, targetEncoding, byteStringOrArray)
- Converts a ByteString or a ByteArray to a ByteString.
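To make the three signatures concrete, here is a rough sketch of how they could behave. It rests on several assumptions that are not in the proposal: Node's Buffer stands in for ByteString and ByteArray, only encodings that Buffer handles natively are mapped, 'UTF-16' is treated as little-endian without a BOM, and the thrown Error for unknown names is a placeholder.

```javascript
// Sketch only: Node's Buffer stands in for ByteString/ByteArray.
// Mapping from the required encoding names to Node's Buffer encodings.
// Treating 'UTF-16' as little-endian without a BOM is an assumption.
const NODE_NAMES = {
  'US-ASCII': 'ascii',
  'UTF-8': 'utf8',
  'UTF-16': 'utf16le',
  'ISO-8859-1': 'latin1',
};

// Look up the Buffer encoding for a (case-insensitive) encoding name.
function nodeName(encodingName) {
  const mapped = NODE_NAMES[String(encodingName).toUpperCase()];
  if (!mapped) throw new Error('Unsupported encoding: ' + encodingName);
  return mapped;
}

const enc = {
  // Bytes in sourceEncoding -> JavaScript string.
  convertToString(sourceEncoding, bytes) {
    return Buffer.from(bytes).toString(nodeName(sourceEncoding));
  },
  // JavaScript string -> bytes in targetEncoding.
  convertFromString(targetEncoding, string) {
    return Buffer.from(string, nodeName(targetEncoding));
  },
  // Bytes in sourceEncoding -> bytes in targetEncoding, via a string.
  convert(sourceEncoding, targetEncoding, bytes) {
    return enc.convertFromString(
      targetEncoding,
      enc.convertToString(sourceEncoding, bytes)
    );
  },
};
```

Going through a JavaScript string in convert keeps the sketch short; a real implementation backed by iconv would presumably convert directly between the two byte encodings.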
Checking for available encodings
- enc.supports(encodingName)
- Checks whether encodingName is supported and returns true if so, false otherwise.
- enc.listEncodings([encodingCheckerFunction or regex])
- encodingCheckerFunction takes the encoding name as a parameter and returns true-ish if the encoding should be listed. Regexes should also be supported. If the parameter is missing, returns all supported encodings.
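The two checks could look roughly like this. The sketch assumes a fixed, hypothetical list of known encodings (KNOWN_ENCODINGS); a real implementation would presumably ask iconv what it supports.

```javascript
// Hypothetical registry; a real implementation would query iconv instead.
const KNOWN_ENCODINGS = ['US-ASCII', 'UTF-8', 'UTF-16', 'ISO-8859-1'];

// True if encodingName (case-insensitive) is in the registry.
function supports(encodingName) {
  return KNOWN_ENCODINGS.includes(String(encodingName).toUpperCase());
}

// With no argument, list everything. With a regex, list matching names.
// With a function, list the names for which it returns a truthy value.
function listEncodings(filter) {
  if (filter === undefined) return KNOWN_ENCODINGS.slice();
  if (filter instanceof RegExp) {
    // search() ignores lastIndex, so a regex with the g flag is safe here.
    return KNOWN_ENCODINGS.filter((name) => name.search(filter) !== -1);
  }
  return KNOWN_ENCODINGS.filter((name) => filter(name));
}
```

For example, listEncodings(/^UTF-/) would list only the UTF encodings from the registry.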
Class: Transcoder
There should also be a class enc.Transcoder for general transcoding (between ByteStrings or ByteArrays).
- [Constructor] Transcoder(from, to)
- Where from and to are the encoding names.
- [Method] push(byteStringOrArray[, outputByteArray])
- Convert input from a ByteString or ByteArray. Those parts of byteStringOrArray that cannot be converted yet (an incomplete character in a multi-byte encoding) are stored in a buffer. If outputByteArray is passed, the results are appended to outputByteArray.
- If outputByteArray was passed, returns outputByteArray, otherwise returns nothing and the output is accumulated in an internal buffer.
- [Method] close([outputByteArray])
- Close the stream. Throws an exception if there was a conversion error (specifically, a partial multibyte character).
- Writes the remaining output bytes (including those that were accumulated because push was called without an outputByteArray) to outputByteArray, appending to it, or to a new ByteString if no outputByteArray is given. If outputByteArray is given, it is returned, otherwise the new ByteString is returned.
TODO: Which exception to throw on error?
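The buffering behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: Utf8Transcoder and incompleteTailLength are hypothetical names, the source encoding is fixed to UTF-8, Node's Buffer stands in for ByteString, a plain array stands in for ByteArray, and a generic Error is thrown because the exception type is still open.

```javascript
// Number of trailing bytes that start a UTF-8 character but do not finish it.
function incompleteTailLength(bytes) {
  for (let back = 1; back <= 3 && back <= bytes.length; back++) {
    const byte = bytes[bytes.length - back];
    if ((byte & 0xc0) === 0x80) continue; // continuation byte, keep looking
    let needed = 1;                        // sequence length implied by lead byte
    if ((byte & 0xe0) === 0xc0) needed = 2;
    else if ((byte & 0xf0) === 0xe0) needed = 3;
    else if ((byte & 0xf8) === 0xf0) needed = 4;
    return needed > back ? back : 0;
  }
  return 0;
}

// Sketch only: UTF-8 in, any Node Buffer encoding out (e.g. 'utf16le').
class Utf8Transcoder {
  constructor(to) {
    this.to = to;
    this.pending = Buffer.alloc(0); // bytes of a not yet complete character
    this.accumulated = [];          // output kept when push() gets no target
  }

  push(input, outputByteArray) {
    const bytes = Buffer.concat([this.pending, Buffer.from(input)]);
    const cut = bytes.length - incompleteTailLength(bytes);
    this.pending = bytes.subarray(cut);
    const converted = Buffer.from(bytes.toString('utf8', 0, cut), this.to);
    if (outputByteArray) {
      for (const b of converted) outputByteArray.push(b); // append
      return outputByteArray;
    }
    this.accumulated.push(converted); // nothing is returned in this case
  }

  close(outputByteArray) {
    if (this.pending.length > 0) {
      // Placeholder exception type; the proposal leaves this open.
      throw new Error('Partial multibyte character at end of input');
    }
    const rest = Buffer.concat(this.accumulated);
    this.accumulated = [];
    if (outputByteArray) {
      for (const b of rest) outputByteArray.push(b);
      return outputByteArray;
    }
    return rest;
  }
}
```

The point of the sketch is the split in push: only the complete prefix of the input is converted, and the incomplete tail waits in pending until the next chunk arrives or close finds it left over.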
Example:
Transcoder = require('encodings').Transcoder
transcoder = new Transcoder('iso-8859-1', 'utf-32')
transcoder.push(input) // input is a ByteString
output = transcoder.close() // and output is a ByteString too
Another example:
transcoder = new Transcoder('utf-32', 'utf-8')
output = new ByteArray()
while (input = readSomeByteFromSomewhere()) {
    transcoder.push(input, output)
}
transcoder.close(output)
// output is the complete conversion of all the input chunks concatenated now
(See ServerJS/Encodings/OldClass for another API.)