Firefox/Input/Data: Difference between revisions
Revision as of 18:33, 30 March 2011
Summary
Currently, Input exports the user feedback data as two TSV-coded tables:
- opinions.tsv.bz2 contains everything except the ratings
- ratings.tsv.bz2 contains the ratings data
The two tables form a 1:n relationship (one opinion, many ratings) and can be joined using the first column (the opinion ID). Both files are compressed with bzip2; decompress them with, for example, bunzip2 or bzip2 -d.
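As an alternative to decompressing on the command line, the exports can be read directly from a script. The following is a minimal sketch in Python using the standard bz2 module; the function name read_export and the file path are illustrative assumptions, not part of Input.

```python
import bz2


def read_export(path):
    """Return the decompressed contents of an export file as one string.

    newline="" disables newline translation, so the LF record separators
    (and any escaped LF inside a field) arrive unchanged.
    """
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return handle.read()


# Hypothetical usage, assuming the file was downloaded to the working directory:
# text = read_export("opinions.tsv.bz2")
```

Reading the whole file into memory keeps the sketch short; for large exports a streaming reader would be the better choice.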
TSV Coding
The data is a UTF-8 encoded Unicode stream. Lines (=records) are separated by LF (newline, U+000A). There are no header/title records. Fields (=columns) are separated by TAB (U+0009). A TAB or LF that occurs inside a field is therefore escaped by preceding it with a backslash (U+005C). Consequently, a backslash inside a field is itself escaped with a preceding backslash.
- Example FSM to parse input data: https://github.com/michaelku/grouper-worker/blob/488f1385fe5a1865cfc423ce7bec25237b150bca/src/main/java/org/mozilla/grouper/input/TsvReader.java
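The escaping rules can be illustrated with a small state machine. This is a sketch in Python, not a port of the linked Java reader. It assumes that an escaped TAB or LF is stored as a backslash followed by the literal control character (as described above), so the input cannot simply be split on newlines. The name parse_tsv is hypothetical.

```python
def parse_tsv(text):
    """Split an escaped TSV stream into records, each a list of field strings."""
    records = []
    fields = []
    buf = []
    escaped = False
    for ch in text:
        if escaped:
            # The character after a backslash is taken literally (TAB, LF or backslash).
            buf.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "\t":
            # Unescaped TAB ends the current field.
            fields.append("".join(buf))
            buf = []
        elif ch == "\n":
            # Unescaped LF ends the current field and the current record.
            fields.append("".join(buf))
            buf = []
            records.append(fields)
            fields = []
        else:
            buf.append(ch)
    # Assumption: a final record without a trailing LF is still emitted.
    if buf or fields:
        fields.append("".join(buf))
        records.append(fields)
    return records


rows = parse_tsv("1\tline one\\\nline two\n2\tC:\\\\temp\n")
```

Here the first record's second field contains an embedded newline, and the second record's field contains a single backslash after unescaping.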
Opinions
Fields
- 1. Opinion ID
- base10 integer, used to look up ratings or items on the Input website
- 2. Time of feedback
- base10 integer; this is UNIX time, i.e. seconds since 1970-01-01T00:00:00Z (UTC)
- 3. Type
- one of issue, praise, suggestion, rating
- 4. Product
- one of firefox, mobile
- 5. Version
- a version identifier such as 4.0b11 or 3.6.13
- 6. Platform
- one of mac, windows, linux, android, maemo
- 7. Locale
- a locale identifier such as en-US
- 8. Manufacturer
- for product:mobile only, the device manufacturer
- 9. Device
- for product:mobile only, a device identifier
- 10. URL
- an http, https, chrome or about URL given by the user with their feedback
- 11. Description
- Free text entered by the user. Limited to 140 Unicode characters (not bytes)
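The field list above maps naturally onto a typed record. The sketch below assumes a record has already been split and unescaped into 11 strings; the names Opinion and opinion_from_fields, and the attribute names, are illustrative and not defined by Input.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Opinion(NamedTuple):
    opinion_id: int
    time: datetime       # timezone-aware, UTC
    kind: str            # issue, praise, suggestion or rating
    product: str         # firefox or mobile
    version: str
    platform: str
    locale: str
    manufacturer: str    # only filled for product mobile
    device: str          # only filled for product mobile
    url: str
    description: str


def opinion_from_fields(fields):
    """Convert one unescaped record (11 strings) into an Opinion."""
    if len(fields) != 11:
        raise ValueError("expected 11 fields, got %d" % len(fields))
    timestamp = datetime.fromtimestamp(int(fields[1]), tz=timezone.utc)
    return Opinion(int(fields[0]), timestamp, *fields[2:])
```

Converting the UNIX time to a timezone-aware datetime avoids accidental interpretation in the local time zone.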
Ratings
One line per (opinion x rating category). Keyed to the opinions table by opinion ID.
Fields
- 1. Opinion ID
- base10 integer, used to group related ratings
- 2. Rating Type
- one of startup, pageload, responsive, crashy (higher = more stable), features
- 3. Rating Value
- base10 integer ranging from 1 to 5
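Because several ratings rows can share one opinion ID, a join usually starts by grouping the ratings. A minimal sketch, assuming each row is already a list of three unescaped strings; the function name group_ratings is hypothetical.

```python
from collections import defaultdict


def group_ratings(rating_rows):
    """Group (opinion ID, rating type, rating value) rows by opinion ID.

    Returns a dict mapping opinion ID to a dict of rating type to value.
    """
    grouped = defaultdict(dict)
    for opinion_id, rating_type, value in rating_rows:
        score = int(value)
        if not 1 <= score <= 5:
            raise ValueError("rating value out of range: %r" % value)
        grouped[int(opinion_id)][rating_type] = score
    return dict(grouped)


ratings = group_ratings([
    ["7", "startup", "4"],
    ["7", "crashy", "5"],
    ["9", "features", "2"],
])
```

The resulting dictionary can then be looked up by the opinion ID taken from the first column of the opinions table.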