Labs/Ubiquity/Usability/Usability Testing/Fall 08 1.2 Tests

Kris graciously made [http://gist.github.com/62499 this] Ubiquity command that allows you to select any time code and jump to that point in time.
== Objectives ==


== Methodology ==
''See main article [[Labs/Ubiquity/Usability/Usability_Testing/Fall_08_1.2_Tests/Methodology | Methodology ]]''
This is largely an exploratory, qualitative test of what users do when presented with Ubiquity for the first time.  It is based on the [http://www.uie.com/articles/interview_based_tasks/ interview-based tasks] format, where the participants choose which tasks they perform next.


|- Style="background-color:#efefef;"
| <center>'''Tag'''</center>
| <center>'''Freq.'''</center>
| <center>'''[[Labs/Ubiquity/Usability/Usability_Testing/UI_Triangulation#Rankings |Sev.]]'''</center>
| <center>'''Videos'''</center>
| Discovery
| Users who gave up on finding the hotkey combo.
| [https://ubiquity.mozilla.com/trac/ticket/402 402], [https://ubiquity.mozilla.com/trac/ticket/440 440]
|
|-
|
|
| [https://ubiquity.mozilla.com/trac/ticket/545 545]
|
|-
| 6, 7, 11
| Learnability
| Wiki, hotkey, mashup, etc.  See [https://wiki.mozilla.org/Labs/Ubiquity/Usability/Usability_Testing/Fall_08_1.2_Tests#Demonstrating_Value_.28marketing.29 Demonstrating Value]
| [https://ubiquity.mozilla.com/trac/ticket/547 547]
|
|-
| Confused by Mozilla wiki
| 2
| 2
| 6, 8
| Learnability
| We should move the help to something other than this Wiki.
| [https://ubiquity.mozilla.com/trac/ticket/402 402]
|
|-
| 9, 5
| Learnability
| The "Help system" [http://www.indolering.com/indolering.com/Ubiquity_Blog/Entries/2008/12/9_Alternative_UI_Recommendations*.html includes] Wikipedia and Google.
|
|
| Learnability
| Lead for stat analysis; synonym common email URLs w/ email command.
| [https://ubiquity.mozilla.com/trac/ticket/572 572]
|
|-
| Learnability
| External validity of this data is poor because all users were using the same wifi connection (i.e. not a random sample)
| [https://ubiquity.mozilla.com/trac/ticket/547 547]
|
|-
|
| Video is buzzword laden
| [https://ubiquity.mozilla.com/trac/ticket/547 547]
|
|-
|
| Sound
| [https://ubiquity.mozilla.com/trac/ticket/547 547]
|
|}
|- Style="background-color:#efefef;"
! <center>Tag</center>
! <center>Freq</center>
! <center>Videos</center>
! <center>Notes</center>
|-
| Email-
| 5
| 9, 10, 11
| Multiple issues, one being that users don't understand the need for modifiers ("'''to''' janedoe@gmail.com" or "email '''this'''").  Another is that users try typing in the URL of their service provider (mail.yahoo.com); when that failed, they assumed Ubiquity didn't work with their email service provider.  Finally, the email command is just very buggy.
| [https://ubiquity.mozilla.com/trac/ticket/572 572] [https://ubiquity.mozilla.com/trac/ticket/574 574]
|-
| Email+
|
|-
| Map-
| 3
| 8, 9, 10
| Somewhat invalid, as the errors are due to discovery problems with Ubiquity itself.  All participants cruised to Google Maps instead of using the command; could we provide [http://www.azarask.in/blog/post/can-ubiquity-be-used-only-with-the-mouse/ contextual reminders/clues] on Google Maps itself?
|
|-
| Map+
| 4
| 7, 8, 10, 11
|-
| Weather-
| 0
|
|
|
| 1
| 5
| "Sudo" did not show up in Define.com.  Fallback dictionaries (Urban Dictionary, Wiktionary, Google's define:, etc.) would be a smart idea.
| [https://ubiquity.mozilla.com/trac/ticket/404 404]
|-
| Define+
| 2
| 10
| This caused [http://www.indolering.com/indolering.com/Ubiquity_Blog/Entries/2008/11/12_Ubiquity_Translator_Command.html more confusion] than is reflected here.  Executing it was not the problem; users didn't expect it to change the text on the page.
| [https://ubiquity.mozilla.com/trac/ticket/54 54]
|-
| Translate+
| 1
| 10
| Couldn't guess the command correctly.
|
|-
| 6
| 5, 6, 7, 8
| Requiring the user to click on the map is "counter-intuitive."
| [https://ubiquity.mozilla.com/trac/ticket/542 542]
|-
| Map-insert+

As [http://www.usabilityworks.net/ Dana Chisnell] commented, "I wonder if what this says is that your participants thought it's an interesting idea, it could be developed more, but there isn't a lot of immediate value because it doesn't actually make their tasks easier."

==== Is there something that the help is lacking? ====

Marketing and the possible use of contextual notifications on screen would help to expose users to Ubiquity and its interface outside of the browser experience.

=== Learnability ===
It's hard to gauge how long people stayed tuned in to the screencast.  They would often experiment and then come back.  It would be good to get some analysis with a little JS to track how long our current user-base stays on the page/tab after clicking play.  Stay tuned for some [http://www.indolering.com blog posts] on thoughts concerning video.
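The "little JS" mentioned above could start as something like the sketch below: a pure helper that sums up play/pause intervals, with the page wiring left as comments.  The element id and the stats endpoint are hypothetical; this is one possible shape, not an existing Ubiquity feature.

```javascript
// Sketch: estimate how long a visitor keeps the screencast playing.
// Pure helper: given a list of {type: "play"|"pause", t: seconds} events,
// sum the seconds spent between each play and the following pause.
// A trailing "play" with no matching "pause" is ignored.
function watchedSeconds(events) {
  let total = 0;
  let playStart = null;
  for (const ev of events) {
    if (ev.type === "play" && playStart === null) {
      playStart = ev.t;
    } else if (ev.type === "pause" && playStart !== null) {
      total += ev.t - playStart;
      playStart = null;
    }
  }
  return total;
}

// Hypothetical browser wiring (ids and endpoint are made up):
// const events = [];
// const video = document.getElementById("screencast");
// const log = (type) => events.push({ type, t: Date.now() / 1000 });
// video.addEventListener("play", () => log("play"));
// video.addEventListener("pause", () => log("pause"));
// window.addEventListener("unload", () => {
//   log("pause"); // close any open play interval
//   new Image().src = "/stats?watched=" + watchedSeconds(events);
// });
```

Keeping the arithmetic in a standalone function means the interesting part can be unit-tested without a browser.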


=== Demonstrating Value (marketing) ===
One angle of attack is to consider how to properly shape the users' mental model.  The [http://www.vimeo.com/1561578 buzzword-heavy video] was [[Labs/Ubiquity/Usability/Usability_Testing/Fall_08_1.2_Tests/Tester 07 | confusing]] to many participants, and I believe it also brought a [[Labs/Ubiquity/Usability/Usability_Testing/Fall_08_1.2_Tests/Tester 08 | level of expectation]] that was impossible to meet.


Even after the participants had learned to use Ub they didn't really [[Labs/Ubiquity/Usability/Usability_Testing/Fall_08_1.2_Tests#Feedback | find]] it all that useful, nor do they appear to have continued using Ubiquity after the tests.


Slashdot considers Ubiquity (or rather "natural language interfaces") [http://tech.slashdot.org/article.pl?sid=09/02/10/1334206 bloat].  Granted, this IS /., but tech fanboys ''are'' the base that seeded FF and taught others to use it.


[http://www.vimeo.com/2497726 Proper marketing] should teach the user how to use the unfamiliar interface, and the interface should either match or exceed those expectations.
 
[http://www.rockridgeinstitute.org/thinkingpoints/ThinkingPoints_Chapter3.pdf Explicit control] of mental [http://en.wikipedia.org/wiki/Framing_(social_sciences) framing] is what got Bush and Obama elected.  While framing and marketing are deep issues, my take-away from the studies is that control of the message is essential: we must not scare users off, must not give false impressions, and must convince users that Ubiquity is worth learning before they interact with it.


=== Continual Improvement ===
The TPS continual development cycle turns everything into a scientific experiment: coming up with theories and hypotheses, testing, and analyzing results in order to make exacting decisions based upon those changes.
Participants have high expectations of Ubiquity to get things just right.  If a problem prevents a user from using a command, they are unlikely to try again.  While we do our best to anticipate those expectations and problems, we can't get them all.  So how do we catch these errors and fix them?  A major learning experience was gained from working with the Ubiquity development team and its support structures.  UI engineers are a part of the development workflow, so usability study final reports generally provide recommendations for improvements in the development workflow in addition to UI changes.  Coming from a background in radically custom manufacturing, the implications of Lean/Agile processes jump out.  I want to share a piece of that here, specifically ''Kaizen''.


[http://en.wikipedia.org/wiki/Continual_improvement Kaizen] is business logic largely borrowed from the [http://en.wikipedia.org/wiki/Toyota_Production_System Toyota Production System].  In short, it is a way of structuring improvement strategies in a workflow or organization in a cyclical manner.  While Kaizen, or ''continual improvement'', can seem obvious, it is the implementation that is so interesting.  Toyota frames every problem through the scientific method: meticulously documenting every step in a process, framing hypotheses for improvement, experimenting, and measuring results.  While we can get away with pseudo-implementations of documenting the workflow and measuring results, vast improvements can be made as information becomes impersonal and assumptions are continually challenged.
It's best to provide an example in an obvious area where continual improvement can help, like Ubiquity commands.  Data analysis admins can identify large problem areas (many failed commands, for example) and alert usability engineers.  The UI engineers perform further analysis and come up with recommendations.  UI engineers then communicate with programming engineers who help to implement changes.  The UI engineers then communicate the changes and anticipated data changes to the data analysis people and wait for the results.  Repeat.
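The "many failed commands" trigger in the cycle above could be as simple as counting failures in a usage log.  A minimal sketch, assuming a hypothetical one-record-per-line log of the form "&lt;command&gt; &lt;ok|fail&gt;" (not the real Ubiquity instrumentation):

```javascript
// Sketch: flag Ubiquity commands whose failure count crosses a threshold,
// so data analysis can alert the UI engineers automatically.
// Log format and threshold are assumptions for illustration.
function failingCommands(logLines, threshold) {
  const failures = {};
  for (const line of logLines) {
    const [command, status] = line.trim().split(/\s+/);
    if (status === "fail") {
      failures[command] = (failures[command] || 0) + 1;
    }
  }
  return Object.keys(failures)
    .filter((cmd) => failures[cmd] >= threshold)
    .sort((a, b) => failures[b] - failures[a]); // worst offenders first
}
```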
Google has similar processes; a particularly tidy example of Lean in software development was given by Jen Fitzpatrick about the Google spellchecker, ~18:00 into the video below.
<video type="googlevideo" id="-6459171443654125383" width="640" height="480" desc="[http://video.google.com/videoplay?docid=-6459171443654125383 Original video]" frame="true" position="center"/>


==== Streaming UI ====
Our position is uniquely suited to take advantage of Kaizen techniques because:
 
# We have a huge user-base
# Ubiquity could eventually support streaming UIs (core commands and css updated via subscription feeds)
 
Anyone remotely familiar with web development will have noticed the huge impact analytics has had on web and UI development.  However, companies that have aligned UI engineering and backend development with Kaizen reap rewards on a [http://www.uie.com/articles/fast_iterations/ different plane]:
 
<blockquote>"We don't assume anything works and we don't like to make predictions without real-world tests. '''Predictions color our thinking'''. So, we continually make this up as we go along, keeping what works and throwing away what doesn't. We've found that about 90% of it doesn't work." -Lead designer at [http://www.uie.com/articles/fast_iterations/ Netflix]<br>
</blockquote>


Netflix makes major changes every 2 weeks, and most of them fail!  We make changes every 2 months, and if 90% of those are failing....  Granted, the Netflix team has a very well fleshed-out UI, so they are guaranteed a higher failure rate, but their rapid development cycle allows them to challenge previous assumptions and scour out problems.


==== UI Triangulation ====
''See main article [[Labs/Ubiquity/Usability/Usability Testing/UI Triangulation | UI Triangulation]]''
Measuring the impact of changes to our UI is trickier for us.  Google and Netflix just examine their server logs, but we can't (and won't) log every keystroke.  That's okay: if you watched all of the above Google talk, you will notice that Jen mentions how they have tied in their user support staff and technical engineers.  That's something the UI world calls [[Labs/Ubiquity/Usability/Usability Testing/UI Triangulation | Triangulation]].  Reports come in many forms: besides Test Pilot, we also get information flows from Get Satisfaction, Google Groups, Bugzilla, Trac, Mozilla quality control, etc.


Separately, all these places have their purpose, but by connecting these information flows, or creating [http://www.derickbailey.com/2008/06/03/OnePieceFlowInSoftwareDevelopment.aspx ''one piece flow''] (another [http://en.wikipedia.org/wiki/Toyota_Production_System TPS] principle), you can triangulate problems while simultaneously increasing the value of participation to all systems.  In short, problems can be identified more easily if all the vectors for data are aligned with one another, like communicating the severity and frequency of [http://getsatisfaction.com/mozilla/products/mozilla_ubiquity GSFN] complaints in Trac.  This goes for non-UI related problems as well; it's vital for the [http://www.toolness.com/wp/?p=64 web of trust].


By combining all of these sources of information we can get a very accurate picture of things.  By analogy, think about how tainted political polling data pools are: people with landline phones who happen to be home and are willing to spend 10-30 minutes answering questions.  But by combining polls and weighting them based on historical averages, Pollyvote predicted the 2008 US presidential election's popular vote to within a single percentage point.
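The Pollyvote-style combination is just a weighted average.  As a sketch, here is how several noisy feedback channels could be blended, with each channel weighted by how reliable it has been historically; the channels and weights in the example are made up:

```javascript
// Sketch: combine several noisy estimates of the same quantity using
// per-source reliability weights (a weighted arithmetic mean).
function weightedEstimate(sources) {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const s of sources) {
    weightedSum += s.value * s.weight;
    totalWeight += s.weight;
  }
  return weightedSum / totalWeight;
}

// Hypothetical example: three channels reporting the share of users
// hitting a bug, with Trac trusted most:
//   weightedEstimate([
//     { value: 0.30, weight: 3 },  // Trac
//     { value: 0.50, weight: 1 },  // GSFN
//     { value: 0.40, weight: 2 },  // Google Groups
//   ])
```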
 
To get started, it would be smart to bake these into our collection mechanisms and workflows so that they sync with other systems autonomously, or at least make it very fast for people to do manually.  A fully connected flow would combine GSFN (our customer-facing solution) with Trac (our developer solution), BUT that would lead to too much clutter in the Trac database, so those problems would have to be chunked and worked out first.


If we frame everything as a scientific test (as Kaizen does) we can know when to stop.  The amount of resources spent on any single step depends on how much value it provides and what its drawbacks are.  For example, it may start with putting the GSFN box on the download page and [http://groups.google.com/group/ubiquity-core/browse_thread/thread/ea8324e41b952b4d improving manual syncing], then working on manually connecting the [http://getsatisfaction.com/getsatisfaction/topics/can_get_satisfaction_email_me_every_time_a_new_response_is_added_to_a_thread RSS feeds] to specific Trac tickets, later adding a script to update GSFN when a correlating ticket on Trac closes, eventually [https://subtrac.sara.nl/oss/email2trac automatic] importation of only certain ticket types, and so on.
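The GSFN-to-Trac hop could begin with a plain translation step like the sketch below.  The GSFN fields, the "me too" thresholds, and the ticket fields are all hypothetical; a real sync would read the GSFN feeds and file tickets through Trac's own interfaces, neither of which is modeled here.

```javascript
// Sketch: turn a Get Satisfaction topic into a Trac-style ticket payload,
// carrying GSFN frequency ("me too" count) into a coarse Trac priority so
// severity/frequency information survives the hop between systems.
function gsfnTopicToTracTicket(topic) {
  let priority;
  if (topic.meToo >= 20) priority = "critical";
  else if (topic.meToo >= 5) priority = "major";
  else priority = "minor";

  return {
    summary: "[GSFN] " + topic.title,
    description: topic.body + "\n\nImported from: " + topic.url,
    priority: priority,
    keywords: "gsfn-import", // lets us filter imports out of Trac queries
  };
}
```

Keeping the mapping in one small function makes the clutter problem manageable: the thresholds decide what is worth importing at all, and they can be tuned as part of the same scientific-test framing.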


== Sessions ==