Firefox/Input/Data
Revision as of 18:26, 30 March 2011
Summary
Currently, Input offers two exports of the user feedback data. The data is exported in the form of TSV-coded tables:
- opinions.tsv.bz2 (http://input.mozilla.com/data/opinions.tsv.bz2) contains everything except the ratings
- ratings.tsv.bz2 (http://input.mozilla.com/data/ratings.tsv.bz2) contains the ratings data
The two tables form a 1:n relationship and can be joined on the first column (the opinion ID). Both tables are compressed with bzip2, so decompress them first, e.g. using bunzip2.
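As an alternative to running bunzip2 by hand, the exports can be decompressed while reading. The following is a minimal sketch using Python's standard bz2 module; the function name read_export_text is hypothetical, and the sketch assumes the file has already been downloaded to a local path.

```python
import bz2


def read_export_text(path):
    """Return the decompressed contents of a .tsv.bz2 export as a str.

    Hypothetical helper: `path` is assumed to point at a local copy of
    opinions.tsv.bz2 or ratings.tsv.bz2.
    """
    # newline="" turns off newline translation, so only the LF characters
    # actually present in the data are seen by later parsing steps.
    with bz2.open(path, "rt", encoding="utf-8", newline="") as handle:
        return handle.read()
```

This reads the whole file into memory, which keeps the sketch short; for large exports, iterating over the handle in chunks would be the more careful choice.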
TSV Coding
The data is a UTF-8 encoded Unicode stream. Lines (= records) are separated by LF (newline, U+000A). There are no header/title records. Fields (= columns) are separated by TAB (U+0009). In this scheme, two characters need to be escaped within column values: TAB and LF. When they are part of the cell content rather than of the TSV coding, they are preceded by a backslash (U+005C). This means that backslashes in content must be escaped the same way, by a preceding backslash.
- Example FSM to parse input data
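The coding described above can be read with a small state machine. The following is only a sketch, not the linked example: the function name parse_tsv is hypothetical, and it assumes that a backslash always makes the single character after it (TAB, LF or backslash) part of the cell content.

```python
def parse_tsv(text):
    """Split TSV-coded text into records, each a list of field strings."""
    records, fields, cell = [], [], []
    escaped = False  # True while the previous character was an escaping backslash
    for ch in text:
        if escaped:
            # Escaped TAB, LF or backslash: keep it as cell content.
            cell.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "\t":
            # Unescaped TAB ends the current field.
            fields.append("".join(cell))
            cell = []
        elif ch == "\n":
            # Unescaped LF ends the current field and the current record.
            fields.append("".join(cell))
            cell = []
            records.append(fields)
            fields = []
        else:
            cell.append(ch)
    # Accept a final record that is not terminated by LF.
    if cell or fields:
        fields.append("".join(cell))
        records.append(fields)
    return records
```

Note that splitting lines on LF and fields on TAB with ordinary string functions would break records whose description contains an escaped TAB or LF, which is why the sketch walks the stream character by character.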
Opinions
Fields
- 1. Opinion ID
- coded as a base10 integer, used to look up ratings or items on the Input website
- 2. Time of feedback
- base10 integer; note that this is UNIX time, i.e. seconds since 1970-01-01T00:00:00Z (UTC)
- 3. Type
- one of issue, praise, suggestion, rating
- 4. Product
- one of firefox, mobile
- 5. Version
- a version identifier such as 4.0b11 or 3.6.13
- 6. Platform
- one of mac, windows, linux, android, maemo
- 7. Locale
- a locale identifier such as en-US
- 8. Manufacturer
- for product:mobile only, the device manufacturer
- 9. Device
- for product:mobile only, a device identifier
- 10. URL
- an http, https, chrome or about URL given by the user with their feedback
- 11. Description
- free text entered by the user, limited to 140 Unicode characters (not bytes)
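The field list above maps naturally onto a typed record. The sketch below assumes one already-unescaped record with exactly the eleven fields listed; the names Opinion and opinion_from_fields are hypothetical, and the values of type, product and platform are not validated.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Opinion(NamedTuple):
    opinion_id: int
    created: datetime  # timezone-aware, UTC
    type: str
    product: str
    version: str
    platform: str
    locale: str
    manufacturer: str  # empty unless product is mobile
    device: str        # empty unless product is mobile
    url: str
    description: str


def opinion_from_fields(fields):
    """Build an Opinion from one record of the opinions table."""
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    # Field 2 is UNIX time, so it is converted with an explicit UTC zone.
    created = datetime.fromtimestamp(int(fields[1]), tz=timezone.utc)
    return Opinion(int(fields[0]), created, *fields[2:])
```

Converting the timestamp with an explicit UTC zone avoids silently interpreting it in the local time zone of the machine doing the analysis.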
Ratings
One line per (opinion × rating category) pair. Each line is keyed to the opinions table by opinion ID.
Fields
- 1. Opinion ID
- base10 integer, used to group related ratings
- 2. Rating Type
- one of performance, startup
- 3. Rating Value
- base10 integer ranging from 0 to 100.
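Because several ratings lines can share one opinion ID, joining the two tables usually starts by grouping the ratings. The following is a minimal sketch, assuming the records have already been split into three unescaped fields each; the function name group_ratings is hypothetical.

```python
from collections import defaultdict


def group_ratings(rating_records):
    """Map opinion ID -> {rating type: rating value} for ratings records."""
    grouped = defaultdict(dict)
    for opinion_id, rating_type, value in rating_records:
        # IDs and values are base10 integers in the export.
        grouped[int(opinion_id)][rating_type] = int(value)
    return dict(grouped)
```

For example, group_ratings([["7", "performance", "75"], ["7", "startup", "100"]]) yields one entry for opinion 7 holding both rating categories, which can then be looked up by the opinion ID from the opinions table.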