Support/MetricsDashboardPRD/MetricsCurrent

From MozillaWiki

This is a copy of the metrics section from Metrics with notes about how each metric is currently collected and a quick overview of the query.


Overview

sumo_dump is a replica of our production database that's made nightly. $startdate is a string in this Perl script of the form YYYY-MM-DD and is the START of the previous week; each weekly window in the queries below is 604800 seconds (seven days) long.

of_dump is a nightly copy of the production Openfire database, which contains the chat logs and live chat user accounts.
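The $startdate convention (start of the previous week, 604800-second window) can be sketched in Python. This is only an illustration of the windowing arithmetic, not the actual Perl script:

```python
from datetime import date, datetime, timedelta, timezone

def previous_week_window(today):
    """Return (start, end) unix timestamps for the previous Monday-to-Monday week.

    Mirrors the $startdate convention: start is the beginning of the previous
    week, and the window is exactly 604800 seconds (7 days) long.
    """
    # Back up to the most recent Monday, then one more week.
    this_monday = today - timedelta(days=today.weekday())
    start_day = this_monday - timedelta(days=7)
    start = int(datetime(start_day.year, start_day.month, start_day.day,
                         tzinfo=timezone.utc).timestamp())
    return start, start + 604800

start, end = previous_week_window(date(2009, 6, 10))  # a Wednesday
print(start, end)
```

The queries below then compare timestamp columns against `start` and `start + 604800`.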

  • i@Number of unique visitors (Om)
    • Pulled directly from Omniture.
  • i+Number of new forum threads
    • SELECT COUNT(*) FROM sumo_dump.tiki_actionlog WHERE action = 'Posted' and object=1 and objectType='forum' AND lastModif BETWEEN UNIX_TIMESTAMP('$startdate') AND (UNIX_TIMESTAMP('$startdate') + 604800);
    • A record is made in tiki_actionlog every time certain actions are taken. The Posted action on the forum objectType refers to making a new thread. object=1 is the Firefox support forum (we have multiple forums -- some are contributor-specific, but we care mainly about the Firefox support forum).
  • i+Number of new live chats
    • Collected using live chat queries below.
  • i+Number of new KB articles
    • Collected by cilias. I think he uses the web interface to tiki_actionlog to get this info.
  • i-Number of active contributors
    • SELECT COUNT(DISTINCT(uu.login)) FROM sumo_dump.tiki_actionlog al JOIN sumo_dump.users_users uu ON al.user = uu.login JOIN sumo_dump.users_usergroups ug ON uu.userId = ug.userId JOIN sumo_dump.tiki_categories tc ON al.categId = tc.categId WHERE al.objectType='wiki page' AND al.lastModif BETWEEN UNIX_TIMESTAMP('$startdate') AND (UNIX_TIMESTAMP('$startdate') + 604800) AND ug.groupName IN ('System Admins', 'Approvers', 'Locale leaders', 'Contributors', 'Forum Moderators') AND tc.name = 'Staging Area';
    • tiki_actionlog also stores when users make edits or create new articles in the knowledge base. We join to users_usergroups (which stores which groups the users are in) via users_users to make sure we only count edits by allowed contributors, and we join to tiki_categories to only count actions on Staging articles (otherwise we'd end up counting article approvals, maintenance, or work on contributor documentation/Live Chat pages). Here, we're interested in the number of unique users who are helping.
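The four-table join can be demonstrated against a toy version of the Tiki schema in SQLite; the tables are minimal stand-ins and all rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
# Minimal stand-ins for the Tiki tables used by the query above.
c.executescript("""
CREATE TABLE tiki_actionlog (user TEXT, objectType TEXT, categId INT, lastModif INT);
CREATE TABLE users_users (userId INT, login TEXT);
CREATE TABLE users_usergroups (userId INT, groupName TEXT);
CREATE TABLE tiki_categories (categId INT, name TEXT);

INSERT INTO users_users VALUES (1, 'alice'), (2, 'bob'), (3, 'mallory');
INSERT INTO users_usergroups VALUES (1, 'Contributors'), (2, 'Approvers'), (3, 'Anonymous');
INSERT INTO tiki_categories VALUES (10, 'Staging Area'), (11, 'Approved');

-- alice edits a staging article twice, bob once, mallory (not in a contributor
-- group) once, and alice also touches an approved article (not counted).
INSERT INTO tiki_actionlog VALUES
  ('alice', 'wiki page', 10, 1000), ('alice', 'wiki page', 10, 2000),
  ('bob', 'wiki page', 10, 1500), ('mallory', 'wiki page', 10, 1600),
  ('alice', 'wiki page', 11, 1700);
""")
row = c.execute("""
SELECT COUNT(DISTINCT uu.login)
FROM tiki_actionlog al
JOIN users_users uu ON al.user = uu.login
JOIN users_usergroups ug ON uu.userId = ug.userId
JOIN tiki_categories tc ON al.categId = tc.categId
WHERE al.objectType = 'wiki page'
  AND al.lastModif BETWEEN 0 AND 604800
  AND ug.groupName IN ('System Admins', 'Approvers', 'Locale leaders',
                       'Contributors', 'Forum Moderators')
  AND tc.name = 'Staging Area'
""").fetchone()
print(row[0])  # alice and bob count; mallory's group and the Approved edit do not
```

The group filter and category filter each eliminate one candidate, and DISTINCT collapses alice's two staging edits into one contributor.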
  • i-Number of new contributors
    • SELECT COUNT(DISTINCT(uu.login)) FROM sumo_dump.tiki_actionlog al JOIN sumo_dump.users_users uu ON al.user = uu.login JOIN sumo_dump.users_usergroups ug ON uu.userId = ug.userId JOIN sumo_dump.tiki_categories tc ON al.categId = tc.categId WHERE al.objectType='wiki page' AND al.lastModif BETWEEN UNIX_TIMESTAMP('$startdate') AND (UNIX_TIMESTAMP('$startdate') + 604800) AND uu.registrationDate BETWEEN UNIX_TIMESTAMP('$startdate') AND (UNIX_TIMESTAMP('$startdate') + 604800) AND ug.groupName IN ('System Admins', 'Approvers', 'Locale leaders', 'Contributors', 'Forum Moderators') AND tc.name = 'Staging Area';
    • Same query as above, except we also restrict to users who registered in the past week.
  • i+Number of new registered accounts
    • SELECT COUNT(DISTINCT(uu.userId)) FROM sumo_dump.users_users uu JOIN sumo_dump.users_usergroups ug ON uu.userId = ug.userId WHERE uu.registrationDate BETWEEN UNIX_TIMESTAMP('$startdate') AND (UNIX_TIMESTAMP('$startdate') + 604800) AND ug.groupName IN ('Contributors');
    • This queries the users_users table for users who are in Contributors and who registered this past week. (As all new registrations are placed in the Contributors group by default, that restriction has become extraneous, but it's important for looking at historical data.)
  •  %@Combined total CSAT score (for KB, forum, chat)
    • We're not actually collecting this metric right now (because we have no good way of doing so).

Knowledge base (All tiki queries)

  • i@Number of modified articles
    • Pulled by cilias but easily queried from tiki_actionlog
  • i+Number of new articles
    • Ditto
  • i-Number of active KB contributors
    • Exact same data as above.
  • i-Number of new KB contributors
    • Ditto
  •  %=ratio of votes that say articles are helpful (calculated from:)
    • i+number of yes votes
      • SELECT SUM(yeshelpvotes) FROM sumo_processed.resultset;
      • sumo_processed.resultset is a DB table that stores every en-US KB article along with some key numbers about each of them (generated weekly and relevant for the previous week). See the section on KB articles below.
    • i+number of no votes
      • As above
  •  %=ratio of votes that say articles are easy to understand (calculated from:)
    • i+number of yes votes
      • ditto
    • i+number of no votes
      • ditto
  •  %@CSAT score This can also be + in that it can be collected daily and calculated for other date ranges using a weighted average rather than a sum.
    • SELECT FORMAT(SUM(numcsatvotes * avgcsatscore) * 100/(5*SUM(numcsatvotes)), 1) FROM sumo_processed.resultset;
  • i+Number of CSAT votes
    • SELECT SUM(numcsatvotes) FROM sumo_processed.resultset;
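The weighted-average score calculation pulled from resultset can be checked by hand; the per-article vote counts and averages below are invented:

```python
# Each (numcsatvotes, avgcsatscore) pair is one article's week of CSAT data.
articles = [(10, 4.0), (30, 3.0), (0, 0.0)]  # zero-vote articles contribute nothing

total_votes = sum(n for n, _ in articles)
# Weighted mean on the 1-5 scale, rescaled to a percentage of the maximum (5),
# matching SUM(numcsatvotes * avgcsatscore) * 100 / (5 * SUM(numcsatvotes)).
weighted_sum = sum(n * avg for n, avg in articles)
score = weighted_sum * 100 / (5 * total_votes)
print(round(score, 1))
```

Weighting by vote count means a heavily-voted article pulls the sitewide score toward its own average, which is why this can be recomputed for any date range.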

Forums (All tiki queries)

  • i+New forum threads
    • As in the overview section above.
  • i+Number of replies by contributors (total)
    • SELECT COUNT(*) FROM sumo_dump.tiki_actionlog al JOIN sumo_dump.users_users uu ON al.user = uu.login JOIN sumo_dump.users_usergroups ug ON uu.userId = ug.userId WHERE al.action = 'Replied' and al.object=1 and al.objectType='forum' AND al.lastModif BETWEEN UNIX_TIMESTAMP('$startdate') AND (UNIX_TIMESTAMP('$startdate') + 604800) AND ug.groupName IN ('System Admins', 'Approvers', 'Locale leaders', 'Contributors', 'Forum Moderators');
    • Unlike new threads above, here we're looking at the action of Replied and where the user is in one of the contributor groups.
  • i-Number of threads in each status category (after 1 week)
    • Request for more info
    • Solved
    • Proposed solution
    • Locked
    • None of the above
    • SELECT CONCAT(type, ' (', count(1), ') '), count(1) as numb FROM sumo_dump.tiki_comments where parentId = 0 and commentDate > (UNIX_TIMESTAMP('$startdate') - 604800) and commentDate < UNIX_TIMESTAMP('$startdate') and object = 1 group by type order by numb desc;
    • I'm taking a shortcut here and returning more than one row from this query, so we'll have to rewrite it for the dashboard. Basically, tiki_comments stores all the forum posts. Each top-level post has parentId = 0 and a type (n, r, p, o, l, s, a) which is set according to whether it's new/none of the above, has a request for more info, has a proposed solution, is solved, locked, stickied, or an announcement. Note that this query also has a different time frame: we're looking at threads started in the prior week, to allow our contributors a week to respond to them and change the statuses.
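The grouping of top-level posts by their one-letter status type can be demonstrated with a toy tiki_comments table in SQLite (invented rows; the date window is simplified to 0..604800):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE tiki_comments (threadId INT, parentId INT, object INT,
                            type TEXT, commentDate INT);
-- Three top-level threads in the window plus one reply (parentId != 0),
-- which the parentId = 0 filter ignores.
INSERT INTO tiki_comments VALUES
  (1, 0, 1, 'o', 100), (2, 0, 1, 'n', 200), (3, 0, 1, 'n', 300),
  (4, 1, 1, 'n', 400);
""")
rows = c.execute("""
SELECT type, COUNT(1) AS numb FROM tiki_comments
WHERE parentId = 0 AND commentDate > 0 AND commentDate < 604800 AND object = 1
GROUP BY type ORDER BY numb DESC
""").fetchall()
print(rows)
```

Each returned row is one status bucket with its thread count, which is why the production query needs a rewrite (one value per metric) before it can feed the dashboard.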
  • i-Number of threads with no replies for over 1 week
    • SELECT count(1) as numb FROM sumo_dump.tiki_comments where parentId = 0 and commentDate > (UNIX_TIMESTAMP('$startdate') - 604800) and commentDate < UNIX_TIMESTAMP('$startdate') and object = 1 and objectType = 'forum' and threadId not in ( select distinct(tc2.parentId) from sumo_dump.tiki_comments tc2 where tc2.commentDate > (UNIX_TIMESTAMP('$startdate') - 604800));
    • The subquery returns the threadIds of parents that have posts threaded UNDER them (i.e. replies). Threads whose ids don't appear in that list have no replies, so we count them.
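A toy SQLite version of the NOT IN subquery, with one answered thread and one unanswered one (invented rows, simplified window):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE tiki_comments (threadId INT, parentId INT, object INT,
                            objectType TEXT, commentDate INT);
-- Thread 1 got a reply (row with parentId = 1); thread 2 did not.
INSERT INTO tiki_comments VALUES
  (1, 0, 1, 'forum', 100),
  (2, 0, 1, 'forum', 200),
  (3, 1, 1, 'forum', 500);
""")
(unanswered,) = c.execute("""
SELECT COUNT(1) FROM tiki_comments
WHERE parentId = 0 AND commentDate > 0 AND commentDate < 604800
  AND object = 1 AND objectType = 'forum'
  AND threadId NOT IN (SELECT DISTINCT tc2.parentId FROM tiki_comments tc2
                       WHERE tc2.commentDate > 0)
""").fetchone()
print(unanswered)
```

Thread 1's id appears in the subquery's set of parentIds, so only thread 2 survives the NOT IN and gets counted.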
  • i-Number of unique active contributors
    • SELECT uu.login, COUNT(*) as replies FROM sumo_dump.tiki_actionlog al JOIN sumo_dump.users_users uu ON al.user = uu.login JOIN sumo_dump.users_usergroups ug ON uu.userId = ug.userId WHERE al.action = 'Replied' AND al.object=1 and al.objectType='forum' AND al.lastModif BETWEEN UNIX_TIMESTAMP('$startdate') AND (UNIX_TIMESTAMP('$startdate') + 604800) AND ug.groupName IN ('System Admins', 'Approvers', 'Locale leaders', 'Contributors', 'Forum Moderators') GROUP BY uu.login HAVING replies > 2 ORDER BY replies DESC;
    • This actually returns the names of the users who have contributed 3 or more replies in the past week but you can also SELECT COUNT(DISTINCT(uu.login)) to just get the number we're tracking in the metrics.
  • i-Number of new contributors
    • SELECT uu.login, COUNT(*) as replies FROM sumo_dump.tiki_actionlog al JOIN sumo_dump.users_users uu ON al.user = uu.login JOIN sumo_dump.users_usergroups ug ON uu.userId = ug.userId WHERE al.action = 'Replied' and al.object=1 and al.objectType='forum' AND al.lastModif BETWEEN UNIX_TIMESTAMP('$startdate') AND (UNIX_TIMESTAMP('$startdate') + 604800) AND uu.registrationDate BETWEEN UNIX_TIMESTAMP('$startdate') AND (UNIX_TIMESTAMP('$startdate') + 604800) AND ug.groupName IN ('System Admins', 'Approvers', 'Locale leaders', 'Contributors', 'Forum Moderators') GROUP BY uu.login HAVING replies > 2 ORDER BY replies DESC;
    • As above, but with the added restriction that users_users.registrationDate is in the past week.
  •  %@CSAT score This can also be + in that it can be collected daily and calculated for other date ranges using a weighted average rather than a sum.
    • SELECT (SUM(voteCount * tpo.position) / SUM(voteCount)) * 25 - 25 AS 'avgforumcsatscore' FROM sumo_dump.tiki_poll_votes tpv INNER JOIN sumo_dump.tiki_poll_options tpo ON tpo.optionId = tpv.optionId INNER JOIN sumo_dump.tiki_polls tp ON tp.pollId = tpo.pollId WHERE tp.pollQuestionId = 3 AND tp.pollId = 6508 AND voteDate BETWEEN '$startdate' AND DATE_ADD('$startdate', INTERVAL 6 DAY);
    • Our votes and CSAT voting system is a little complex and is explained in the KB articles section below. Here we're tracking the result of one poll (pollId=6508), which is specifically for forum CSAT.
  • i+Number of CSAT votes
    • I'm not actually pulling this right now but it'd be a query similar to the one above.
  • f-Average time to first response (in hours) for forum threads in the previous week.
    • SELECT MEDIAN(diff) FROM (SELECT MIN(thread.commentDate - parent.commentDate) as diff FROM sumo_dump.tiki_comments AS thread JOIN sumo_dump.tiki_comments AS parent ON thread.parentId = parent.threadId WHERE parent.parentId = 0 AND parent.object = 1 AND parent.commentDate BETWEEN UNIX_TIMESTAMP('$startdate') AND (UNIX_TIMESTAMP('$startdate') + 604800) GROUP BY parent.threadId) AS difftable;
    • Actually, MySQL has no MEDIAN function, so the real query (harder to read) is:
    • SELECT difftablex.diff/3600 FROM (SELECT MIN(thread.commentDate - parent.commentDate) as diff FROM sumo_dump.tiki_comments AS thread JOIN sumo_dump.tiki_comments AS parent ON thread.parentId = parent.threadId WHERE parent.parentId = 0 AND parent.object = 1 AND parent.commentDate BETWEEN UNIX_TIMESTAMP('$startdate') AND (UNIX_TIMESTAMP('$startdate') + 604800) GROUP BY parent.threadId) AS difftablex, (SELECT MIN(thread.commentDate - parent.commentDate) as diff FROM sumo_dump.tiki_comments AS thread JOIN sumo_dump.tiki_comments AS parent ON thread.parentId = parent.threadId WHERE parent.parentId = 0 AND parent.object = 1 AND parent.commentDate BETWEEN UNIX_TIMESTAMP('$startdate') AND (UNIX_TIMESTAMP('$startdate') + 604800) GROUP BY parent.threadId) AS difftabley GROUP BY difftablex.diff HAVING SUM(SIGN(1-SIGN(difftablex.diff - difftabley.diff))) = ROUND((COUNT(*)+1)/2);
    • Essentially, we're joining all posts with their parents and finding the minimum time difference (the time to first response, in seconds) in the subquery. Then we take the median of that column in the main query.
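Client-side, the median is much simpler than the SQL self-join trick. A sketch with invented per-thread response times (note the SQL version selects an element of the list for even counts, while this averages the middle two):

```python
def median(values):
    """Median of a non-empty list; averages the two middle values for even lengths."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

# Time-to-first-response per thread, in seconds (one value per parent thread,
# i.e. the MIN(...) the subquery computes).
diffs = [3600, 7200, 1800, 86400, 600]
print(median(diffs) / 3600)  # convert seconds to hours
```

Pulling the per-thread minimums out of the subquery and computing the median in the reporting script would let us drop the unreadable double-subquery form entirely.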


Live Chat

Many of these aren't done with queries, so there will be some work with zzxc here.


  • i+Number of chats answered
    • Collected from Openfire fpSession table, which contains one row for each chat session. The start and end dates for the metric are compared with startTime to select data from a range. The transcript must match "message", indicating that at least one message stanza was sent. (BEGIN and END are dates in YYYY-MM-DD format)
SELECT count(*) FROM of_dump.fpSession
WHERE (startTime > (unix_timestamp('BEGIN') * 1000))
AND (startTime < (unix_timestamp('END') * 1000))
AND transcript REGEXP "message";
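Openfire stores startTime as unix time in milliseconds, hence the unix_timestamp(...) * 1000 comparisons in these queries. A Python sketch of the conversion, assuming the BEGIN/END dates are interpreted as UTC (MySQL's unix_timestamp() actually uses the session time zone):

```python
from datetime import datetime, timezone

def day_bounds_ms(day):
    """Millisecond bounds for one YYYY-MM-DD day, matching unix_timestamp('...') * 1000."""
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    start_ms = int(start.timestamp()) * 1000
    return start_ms, start_ms + 86400 * 1000  # one day later, still in ms

begin_ms, next_day_ms = day_bounds_ms("2009-06-01")
print(begin_ms, next_day_ms)
```

Comparing a millisecond column against a plain unix_timestamp() value (without the * 1000) would silently match almost everything, so the scaling matters.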
  • i+Number of chats not accepted by an agent
    • Collected from fpSession table. The transcript is null in cases where no agent accepts the offer.
SELECT count(*) FROM of_dump.fpSession
WHERE (startTime > (unix_timestamp('BEGIN') * 1000))
AND (startTime < (unix_timestamp('END') * 1000))
AND transcript IS NULL;
  • i+Number of chats dropped
    • This is the number of requests dropped before any messages were sent. (Includes cases where users close the chat window, helpers get disconnected before answering, or live chat bugs resulting in disconnections)
SELECT count(*) FROM of_dump.fpSession
WHERE (startTime > (unix_timestamp('BEGIN') * 1000))
AND (startTime < (unix_timestamp('END') * 1000))
AND transcript NOT REGEXP "message"
AND transcript IS NOT NULL;
  • i=Total number of chat requests (calculated from summing chat request totals from above)
  • i-Number of active contributors
    • Collected from Openfire ofUser and fpSession tables. Pattern matching is used to determine the total number of helpers in chat sessions during a given week. The username column comes from the ofUser table and is the bare username (e.g. "zzxc") of the user account.
SELECT COUNT(DISTINCT username) FROM of_dump.ofUser
LEFT JOIN of_dump.fpSession ON transcript REGEXP CONCAT("\"", username,"[.@.]chat-support.mozilla.com[./.]")
WHERE (startTime > (unix_timestamp('BEGIN') * 1000))
AND (startTime < (unix_timestamp('END') * 1000));
  • i-Number of new contributors
    • Collected using the same method as the last query, adding a condition for accounts created during this time range. The creationDate column is from the ofUser table and specifies the unixtime (in milliseconds) that the account was created.
SELECT COUNT(DISTINCT username) FROM of_dump.ofUser
LEFT JOIN of_dump.fpSession ON transcript REGEXP CONCAT("\"", username,"[.@.]chat-support.mozilla.com[./.]")
WHERE (startTime > (unix_timestamp('BEGIN') * 1000))
AND (startTime < (unix_timestamp('END') * 1000))
AND (creationDate > (unix_timestamp('BEGIN') * 1000));
  • i+(CSAT survey) Number of chats with problem solved
    • Collected from Tikiwiki poll results combined with Openfire session data. The Webchat ID used in Tikiwiki is part of the referer metadata associated with each chat. Each row in fpSession can have one or more metadata rows (from fpSessionMetadata) associated with it. Metadata includes the question text, plugins, extensions, referer, and nickname.
    • For problems solved, a survey response of "yes" or a score from 1-5 is counted. A response of 1-5 indicates that "yes" was previously selected, since we only ask people who get their problems solved for a rating.
    • Note: These queries will become faster and simpler after bug 490638 is fixed.
SELECT count(*) as yesresults FROM sumo_dump.tiki_user_votings
LEFT JOIN sumo_dump.tiki_poll_options ON tiki_user_votings.optionId=tiki_poll_options.optionId
LEFT JOIN of_dump.fpSessionMetadata ON (tiki_user_votings.id regexp "feedback" and fpSessionMetadata.metadataValue regexp trim(leading 'feedback' from tiki_user_votings.id))
LEFT JOIN of_dump.fpSession ON (fpSession.sessionID = fpSessionMetadata.sessionID)
WHERE feedbackObjectType = "livechat"
AND fpSessionMetadata.metadataName = "referer"
AND tiki_user_votings.id regexp "feedback"
AND (title regexp "[1-5]" OR title regexp "yes")
AND startTime > (unix_timestamp('START') * 1000) and startTime < (unix_timestamp('END') * 1000);
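The "solved" filter above accepts either a literal "yes" or any 1-5 rating, since a rating implies "yes" was already selected. A tiny Python sketch of that classification, with invented answer titles (note MySQL's REGEXP is case-insensitive by default, which this lowercase-only sketch glosses over):

```python
import re

# Poll option titles as they might appear; a numeric rating implies the
# problem was solved, per the survey flow described above.
answers = ["yes", "3", "No", "5", "The helper wasn't responding"]
solved = [a for a in answers
          if re.search(r"[1-5]", a) or re.search(r"yes", a)]
print(len(solved))
```

The same regexp pair (`[1-5]` or `yes`) is what the WHERE clause applies to the option title.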
  • i+(CSAT survey) Number of chats to follow up on
    • This is based on the above query, but instead counts the number of people who answered that their issue was unsolved and that they are planning on following up.
SELECT count(*) as followupresults FROM sumo_dump.tiki_user_votings
LEFT JOIN sumo_dump.tiki_poll_options ON tiki_user_votings.optionId=tiki_poll_options.optionId
LEFT JOIN of_dump.fpSessionMetadata ON (tiki_user_votings.id regexp "feedback" and fpSessionMetadata.metadataValue regexp trim(leading 'feedback' from tiki_user_votings.id))
LEFT JOIN of_dump.fpSession ON (fpSession.sessionID = fpSessionMetadata.sessionID)
WHERE feedbackObjectType = "livechat"
AND fpSessionMetadata.metadataName = "referer"
AND tiki_user_votings.id regexp "feedback"
AND (title = "I will follow up later to continue solving this issue")
AND startTime > (unix_timestamp('START') * 1000) and startTime < (unix_timestamp('END') * 1000);
  • i+(CSAT survey) Number of chats unsolved
    • Based on the above queries, this counts users who selected "No" and who are not planning on following up.
SELECT count(*) as noresults FROM sumo_dump.tiki_user_votings
LEFT JOIN sumo_dump.tiki_poll_options ON tiki_user_votings.optionId=tiki_poll_options.optionId
LEFT JOIN of_dump.fpSessionMetadata ON (tiki_user_votings.id regexp "feedback" and fpSessionMetadata.metadataValue regexp trim(leading 'feedback' from tiki_user_votings.id))
LEFT JOIN of_dump.fpSession ON (fpSession.sessionID = fpSessionMetadata.sessionID)
WHERE feedbackObjectType = "livechat"
AND fpSessionMetadata.metadataName = "referer"
AND tiki_user_votings.id regexp "feedback"
AND ((title = "No") OR (title = "The chat ended before it was finished") OR (title = "The helper wasn't responding") OR (title = "The helper was unable to solve my problem") OR (title = "The chat was taking too much time"))
AND startTime > (unix_timestamp('START') * 1000) and startTime < (unix_timestamp('END') * 1000);
  •  %=chats answered % (calculated from the "number of chats" metrics above)
  •  %=chats solved % (calculated from the "number of chats" metrics above)
  •  %=chats to follow up on % (calculated from the "number of chats" metrics above)
  • f+Number of hours open
  •  %@CSAT score This can also be + in that it can be collected daily and calculated for other date ranges using a weighted average rather than a sum.
  • i=Number of CSAT votes (calculated from CSAT poll counts above)


Traffic and usage

All of these come directly from Omniture unless otherwise noted -- no queries.

  • i+Total page hits
  • i@Number of unique visitors
  • i+Start page # entries
  •  %=Start page bounce rate (calculated from...)
    • i+Start page number of bounces
  • i+Inproduct start page # entries
  •  %=Inproduct start page bounce rate (calculated from...)
    • i+Inproduct start page number of bounces
  • i+Number of searches
  • i+Number of people who go to Ask a question
    • from a KB article,
    • from a search results page,
    • directly.
  • i+Number of people who go to Ask a question and then go to forums, live chat (Om) -- This is a traffic question: of the people who land on Ask a question, how many try to go to the forums and how many try to go to live chat? It lets us know (for example) whether everyone is trying to use Live Chat and only using the forums if LC is closed.
  • =Top search terms from Omniture.

KB metrics: how it's done now

Currently the process is a little convoluted. I set up Omniture to e-mail me next-page reports for the main home pages and search results pages, as well as the pageviews report, and then use a Perl script to parse these and to import data about votes and CSAT for each article from Tikiwiki. All of this goes into a table called resultset in the sumo_processed database on dm-sumotools01. From there I pull out metrics using queries (as above) for the metrics spreadsheet.

The table has 11 columns:

  • Title (title)
  • Number of page hits total (pageviews)
  • Number of hits from search (searchresults)
  • Number of hits from the start page (startpage)
  • Number of hits from the inproduct start page (inproduct)
  • Number of yes votes in response to "Did this article solve a problem you had with Firefox?" (yeshelpvotes)
  • Ditto, no votes (nohelpvotes)
  • Number of yes votes in response to "Was this article easy to understand?" (yesundervotes)
  • Ditto, no votes (noundervotes)
  • Total number of votes in the CSAT question (numcsatvotes)
  • Average score on the CSAT question (avgcsatscore)

The data is generated every week and isn't retained historically in any way; it's more of an intermediate table so I can pull metrics from it.
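As a sketch, the resultset table and the yes/no helpfulness ratio pulled from it can be modeled in SQLite, with the columns listed above and one invented row:

```python
import sqlite3

# A stand-in for sumo_processed.resultset with the 11 columns listed above.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE resultset (
  title TEXT, pageviews INT, searchresults INT, startpage INT, inproduct INT,
  yeshelpvotes INT, nohelpvotes INT, yesundervotes INT, noundervotes INT,
  numcsatvotes INT, avgcsatscore REAL
)""")
conn.execute("INSERT INTO resultset VALUES "
             "('private browsing', 1000, 400, 300, 200, 50, 10, 40, 5, 20, 4.2)")
# The 'ratio of helpful votes' metric from the overview section.
(ratio,) = conn.execute(
    "SELECT SUM(yeshelpvotes) * 1.0 / (SUM(yeshelpvotes) + SUM(nohelpvotes)) "
    "FROM resultset"
).fetchone()
print(round(ratio, 3))
```

Because the table is regenerated weekly, every metric derived from it implicitly covers just the previous week.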

Pulling the data from tikiwiki to sumo_processed.resultset is one giant query:

INSERT INTO resultset (`title`, `yeshelpvotes`, `nohelpvotes`, `yesundervotes`, `noundervotes`, `numcsatvotes`, `avgcsatscore`)
SELECT
  TRIM(LOWER(tp.title)) AS 'title', -- title in lowercase to make matching with Omniture data easier
  SUM(IF(tp.pollQuestionId = 1 AND tpo.position = 0, voteCount, 0)) AS 'yeshelpvotes',
  SUM(IF(tp.pollQuestionId = 1 AND tpo.position = 1, voteCount, 0)) AS 'nohelpvotes',
  SUM(IF(tp.pollQuestionId = 2 AND tpo.position = 0, voteCount, 0)) AS 'yesundervotes',
  SUM(IF(tp.pollQuestionId = 2 AND tpo.position = 1, voteCount, 0)) AS 'noundervotes',
  SUM(IF(tp.pollQuestionId = 3, voteCount, 0)) AS 'numcsatvotes',
  SUM(IF(tp.pollQuestionId = 3, voteCount * tpo.position, 0)) / SUM(IF(tp.pollQuestionId = 3, voteCount, 0)) AS 'avgcsatscore'
FROM sumo_dump.tiki_poll_votes tpv
INNER JOIN sumo_dump.tiki_poll_options tpo ON tpo.optionId = tpv.optionId -- tpv stores the actual number of votes per day for each poll; tpo has the options offered (0 = yes, 1 = no, or 1-5 for CSAT questions)
INNER JOIN sumo_dump.tiki_polls tp ON tp.pollId = tpo.pollId -- tp has the actual polls, which are named for the KB articles
INNER JOIN sumo_dump.tiki_pages tpg ON tp.title = tpg.pageName -- which we can then join to the actual pages
WHERE voteDate BETWEEN CAST('$startdate' AS DATE) AND DATE_ADD(CAST('$startdate' AS DATE), INTERVAL 6 DAY) -- voteDate uses the MySQL date format, not UNIXTIME
  AND tp.title REGEXP '^[^[.*.]]' -- restrict to non-staging articles (which would be marked with a *)
  AND tpg.lang = 'en' -- and to articles in English; we don't want this restriction in the final dashboard
GROUP BY `title`;

Messy but amazingly effective.
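The SUM(IF(...)) pivot trick at the heart of the query can be demonstrated in SQLite, using CASE WHEN in place of MySQL's IF() and made-up vote counts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE votes (pollQuestionId INT, position INT, voteCount INT);
-- Question 1 = helpfulness (position 0 = yes, 1 = no); question 3 = CSAT (1-5).
INSERT INTO votes VALUES (1, 0, 30), (1, 1, 10), (3, 4, 6), (3, 5, 4);
""")
row = c.execute("""
SELECT
  SUM(CASE WHEN pollQuestionId = 1 AND position = 0 THEN voteCount ELSE 0 END) AS yeshelpvotes,
  SUM(CASE WHEN pollQuestionId = 1 AND position = 1 THEN voteCount ELSE 0 END) AS nohelpvotes,
  SUM(CASE WHEN pollQuestionId = 3 THEN voteCount ELSE 0 END) AS numcsatvotes,
  SUM(CASE WHEN pollQuestionId = 3 THEN voteCount * position ELSE 0 END) * 1.0
    / SUM(CASE WHEN pollQuestionId = 3 THEN voteCount ELSE 0 END) AS avgcsatscore
FROM votes
""").fetchone()
print(row)
```

Each conditional SUM picks out one question/answer combination, turning a tall table of per-option vote counts into one wide row per article.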

Locale-based metrics -- Gather each of the below metrics for EACH locale. List based

We're not actually collecting any of these as metrics but we have a localization dashboard that generates current statuses on the fly that we can use to turn them into metrics.

  • i+Number of KB edits/translations/approvals to indicate activity
  •  %-Percent of KB articles in each group:
    • Up to date
    • Needs updating
    • Needs translation
    • Needs approval
  •  %-Combined completeness score
  • i+Number of visitors (Om)