== TSV Coding ==
The data is a UTF-8 encoded Unicode stream. Lines (= records) are separated by LF (newline, U+000A). There are no header/title records. Fields (= columns) are separated by TAB (U+0009). TAB and LF characters inside a field therefore need escaping: each is preceded by a backslash (U+005C). Consequently, a backslash inside a field must itself be escaped with a preceding backslash.
* [https://github.com/michaelku/grouper-worker/blob/488f1385fe5a1865cfc423ce7bec25237b150bca/src/main/java/org/mozilla/grouper/input/TsvReader.java Example FSM] to parse input data
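The escaping rule can be illustrated with a small round trip. The following is a minimal Python sketch, not the linked Java reader: the function names (<code>escape_field</code>, <code>encode_records</code>, <code>decode_records</code>) are hypothetical. It also makes three assumptions the text above does not state: a backslash followed by any other character yields that character literally, a stream ending directly after a backslash is rejected, and a final record without a trailing LF is still accepted.

```python
ESCAPE, TAB, LF = "\\", "\t", "\n"


def escape_field(field: str) -> str:
    """Precede every backslash, TAB and LF in a field with a backslash."""
    out = []
    for ch in field:
        if ch in (ESCAPE, TAB, LF):
            out.append(ESCAPE)
        out.append(ch)
    return "".join(out)


def encode_records(records) -> bytes:
    """Serialize records (lists of str fields) to a UTF-8 byte stream."""
    lines = (TAB.join(escape_field(f) for f in rec) + LF for rec in records)
    return "".join(lines).encode("utf-8")


def decode_records(data: bytes):
    """Parse the stream back into records with a two-state machine."""
    records, fields, buf = [], [], []
    escaped = False
    for ch in data.decode("utf-8"):
        if escaped:
            # Assumption: any character after a backslash is taken literally.
            buf.append(ch)
            escaped = False
        elif ch == ESCAPE:
            escaped = True
        elif ch == TAB:
            # Unescaped TAB ends the current field.
            fields.append("".join(buf))
            buf = []
        elif ch == LF:
            # Unescaped LF ends the current field and the current record.
            fields.append("".join(buf))
            records.append(fields)
            fields, buf = [], []
        else:
            buf.append(ch)
    if escaped:
        # Assumption: a dangling backslash at end of input is an error.
        raise ValueError("stream ends in the middle of an escape sequence")
    if buf or fields:
        # Assumption: a last record without a trailing LF is still accepted.
        fields.append("".join(buf))
        records.append(fields)
    return records
```

Because the escaped character is kept as is (a real TAB or LF after the backslash, not the letters <code>t</code> or <code>n</code>), the decoder only needs one flag to remember whether the previous character was an unescaped backslash.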