ServerJS/Encodings: Difference between revisions

→‎Class: Transcoder: add sourceCharset/destinationCharset constants
(→‎Class: Transcoder: change Transcoder API in a _major_ way.)
(→‎Class: Transcoder: add sourceCharset/destinationCharset constants)
 
(2 intermediate revisions by the same user not shown)
Line 1: Line 1:
For Streams, we need encodings support. There also should be a low-level API available for this.
= Specification =


== Encoding Names ==
Line 43: Line 45:
; [Constructor] Transcoder(from, to)
: Where from and to are the encoding names.
; [Constant] sourceCharset
: String containing the (possibly normalised) source charset name.
; [Constant] destinationCharset
: String containing the (possibly normalised) destination charset name.
; [Method] push(byteStringOrArray[, outputByteArray])
: Convert input from a ByteString or ByteArray. Those parts of byteStringOrArray that could not be converted (for multi-byte encodings) are stored in a buffer. If outputByteArray is passed, the results are ''appended'' to outputByteArray.
: If outputByteArray was passed, returns outputByteArray, otherwise returns <u>nothing and the output is accumulated in an internal buffer.</u>
: If outputByteArray was passed, returns outputByteArray, otherwise returns <u>the converted bytes as a ByteString</u>.
: <u>The result will also contain bytes accumulated in prior calls to pushAccumulate.</u>
; <u>[Method] pushAccumulate(byteStringOrArray)</u>
: <u>Convert input from a ByteString or ByteArray into an internal buffer that will be read out the next time push or close is called.</u>
; [Method] close([outputByteArray])
: Close the stream. Throws an exception if there was a conversion error (specifically, a partial multibyte character).
: <u>Writes the remaining output bytes (including those that were accumulated because push was called without an outputByteArray) into the given outputByteArray (appended) or a new ByteString. If outputByteArray is given, it is returned, otherwise the ByteString is returned.</u>
: <u>Writes the remaining output bytes (including those that were accumulated in pushAccumulate) into the given outputByteArray (appended) or a new ByteString. If outputByteArray is given, it is returned, otherwise the ByteString is returned.</u>
: <u>Also emits the sequence that returns the output to its initial shift state, if the encoding requires one.</u>
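
The push/pushAccumulate/close contract described above can be illustrated with a minimal sketch. It is not a reference implementation: SketchTranscoder is a hypothetical name, it only handles utf-8 to iso-8859-1, plain arrays of byte values stand in for ByteString and ByteArray, lower-casing is just one assumed way of normalising charset names, and the exception type is arbitrary since the spec leaves it open.

```javascript
// Hypothetical sketch of the Transcoder contract (utf-8 -> iso-8859-1 only).
// Plain arrays of byte values stand in for ByteString/ByteArray.
function SketchTranscoder(from, to) {
  // Assumption: names are normalised by lower-casing; the spec does not say how.
  var src = String(from).toLowerCase();
  var dst = String(to).toLowerCase();
  if (src !== 'utf-8' || dst !== 'iso-8859-1') {
    throw new Error('this sketch only supports utf-8 -> iso-8859-1');
  }
  // Read-only properties play the role of the [Constant] members.
  Object.defineProperty(this, 'sourceCharset', { value: src, enumerable: true });
  Object.defineProperty(this, 'destinationCharset', { value: dst, enumerable: true });
  this._pending = []; // input bytes of a not yet complete multibyte character
  this._output = [];  // converted bytes not yet handed to the caller
}

// Convert input into the internal output buffer, keeping a trailing
// partial multibyte sequence for the next call.
SketchTranscoder.prototype._convert = function (input) {
  var bytes = this._pending.concat(Array.prototype.slice.call(input));
  this._pending = [];
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    if (b < 0x80) {
      this._output.push(b);
    } else if (b === 0xC2 || b === 0xC3) { // lead bytes for U+0080..U+00FF
      if (i + 1 === bytes.length) {
        this._pending = [b]; // second byte has not arrived yet
        break;
      }
      var c = bytes[++i];
      if ((c & 0xC0) !== 0x80) throw new Error('invalid UTF-8 continuation byte');
      this._output.push(((b & 0x03) << 6) | (c & 0x3F));
    } else {
      throw new Error('byte sequence not convertible in this sketch');
    }
  }
};

// Hand out everything converted so far, appending to outputByteArray if given.
SketchTranscoder.prototype._drain = function (outputByteArray) {
  var out = this._output;
  this._output = [];
  if (outputByteArray) {
    Array.prototype.push.apply(outputByteArray, out);
    return outputByteArray;
  }
  return out;
};

SketchTranscoder.prototype.pushAccumulate = function (input) {
  this._convert(input);
};

SketchTranscoder.prototype.push = function (input, outputByteArray) {
  this._convert(input);
  return this._drain(outputByteArray);
};

SketchTranscoder.prototype.close = function (outputByteArray) {
  if (this._pending.length > 0) {
    throw new Error('partial multibyte character at end of input');
  }
  return this._drain(outputByteArray);
};

var t = new SketchTranscoder('UTF-8', 'ISO-8859-1');
t.pushAccumulate([0x61]);   // 'a' is converted but held back
var first = t.push([0xC3]); // returns the held-back 'a'; 0xC3 is buffered as partial input
var rest = t.push([0xA9]);  // completes U+00E9, which is 0xE9 in iso-8859-1
var tail = t.close();       // nothing left to flush
```

In this sketch a multibyte character split across two push calls is converted once its last byte arrives, and close only throws if such a character is still incomplete.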


'''TODO''': Which exception to throw on error?
Line 56: Line 66:
   Transcoder = require('encodings').Transcoder
   transcoder = new Transcoder('iso-8859-1', 'utf-32')
   transcoder.push(input) // input is a ByteString
   transcoder.pushAccumulate(input) // input is a ByteString
   output = transcoder.close() // and output is a ByteString too


Line 70: Line 80:


(See [[ServerJS/Encodings/OldClass]] for another API.)
= Implementation Recommendations =
First of all, it is recommended to implement convertToString, convertFromString and convert on top of Transcoder.
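
As a rough sketch of that layering, a one-shot convert can be a thin wrapper around a Transcoder. The argument order convert(from, to, bytes) is an assumption here, makeConvert and IdentityTranscoder are hypothetical names used only to keep the example self-contained, and plain arrays stand in for ByteString and ByteArray.

```javascript
// Hypothetical helper: builds a one-shot convert() on top of any constructor
// that follows the Transcoder interface (push and close with an output array).
function makeConvert(Transcoder) {
  // Assumed signature: convert(from, to, bytes).
  return function convert(from, to, bytes) {
    var transcoder = new Transcoder(from, to);
    var output = [];                // stand-in for a ByteArray
    transcoder.push(bytes, output); // converted bytes are appended to output
    transcoder.close(output);       // flushes the rest; throws on a partial character
    return output;
  };
}

// Stub used only for this demonstration: copies bytes through unchanged.
function IdentityTranscoder(from, to) {
  this.sourceCharset = from;
  this.destinationCharset = to;
}
IdentityTranscoder.prototype.push = function (input, outputByteArray) {
  Array.prototype.push.apply(outputByteArray, input);
  return outputByteArray;
};
IdentityTranscoder.prototype.close = function (outputByteArray) {
  return outputByteArray;
};

var convert = makeConvert(IdentityTranscoder);
var result = convert('us-ascii', 'us-ascii', [0x68, 0x69]); // the bytes of "hi"
```

Routing the one-shot functions through close means they inherit the error reporting and shift-state handling of the Transcoder for free.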
Secondly, make sure that initial shift state support is properly implemented. When you're using iconv, you need to call iconv(cd, 0, 0, &ob, &ol) in Transcoder.close(). An example of what an initial shift state is: in the Japanese ISO-2022-JP encoding, the default state is ASCII. The state can be switched to Japanese with an escape sequence. To make sure that the state is ASCII again at the end of the text, iconv emits another escape sequence to switch back. This is important if you want to concatenate ISO-2022-JP texts, and an implementation of Transcoder that doesn't properly emit these sequences is <b>broken</b>.
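
The effect of the shift state can be seen in a toy encoder. This is only an illustration under stated assumptions: ToyIso2022JpEncoder is a hypothetical name, it takes a JavaScript string instead of bytes for brevity, and its table covers just two hiragana characters. The escape sequences ESC $ B and ESC ( B are the ones ISO-2022-JP uses to switch to JIS X 0208 and back to ASCII.

```javascript
// Toy ISO-2022-JP encoder; the two-entry table is illustrative only.
var ESC_TO_JIS = [0x1B, 0x24, 0x42];   // ESC $ B: switch to JIS X 0208
var ESC_TO_ASCII = [0x1B, 0x28, 0x42]; // ESC ( B: switch back to ASCII
var JIS_TABLE = {
  '\u3042': [0x24, 0x22], // hiragana A
  '\u3044': [0x24, 0x24]  // hiragana I
};

function ToyIso2022JpEncoder() {
  this._inJis = false; // the initial shift state is ASCII
}

ToyIso2022JpEncoder.prototype.push = function (text, out) {
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    if (Object.prototype.hasOwnProperty.call(JIS_TABLE, ch)) {
      if (!this._inJis) {
        Array.prototype.push.apply(out, ESC_TO_JIS);
        this._inJis = true;
      }
      Array.prototype.push.apply(out, JIS_TABLE[ch]);
    } else if (ch.charCodeAt(0) < 0x80) {
      if (this._inJis) {
        Array.prototype.push.apply(out, ESC_TO_ASCII);
        this._inJis = false;
      }
      out.push(ch.charCodeAt(0));
    } else {
      throw new Error('character not in the toy table');
    }
  }
  return out;
};

// close() returns the output to the initial shift state,
// which is what iconv(cd, 0, 0, &ob, &ol) does for a real conversion.
ToyIso2022JpEncoder.prototype.close = function (out) {
  if (this._inJis) {
    Array.prototype.push.apply(out, ESC_TO_ASCII);
    this._inJis = false;
  }
  return out;
};

var encoder = new ToyIso2022JpEncoder();
var unterminated = encoder.push('\u3042', []);       // ends while still in JIS mode
var terminated = encoder.close(unterminated.slice()); // same bytes plus ESC ( B
```

If the unterminated bytes were concatenated with ASCII text encoded separately, a decoder would still be in JIS mode and would misread the following bytes as double-byte characters; the terminated output does not have this problem.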


= Relevant Discussions =