User:Ddahl: Difference between revisions
| Line 68: | Line 68: | ||
6. How well could have the Places original patch(es) been reviewed? I understand it was a huge patch, and no doubt difficult to review comprehensively. | 6. How well could have the Places original patch(es) been reviewed? I understand it was a huge patch, and no doubt difficult to review comprehensively. | ||
* Places routinely has ~1400 bugs in bugzilla. Really? This is a reflection of the complexity. | * Places routinely has ~1400 bugs in bugzilla. Really? This is a reflection of the complexity. | ||
7. When I interviewed at Mozilla, I was asked what I would like to do with Firefox. My best answer was that I wasn't sure yet, but because I have always been interested in bookmarking and online research (see rCache.com), I would like to know how I got to a particular URL, what was discovered on the way there and how I can easily get back later on without too much trouble. Almost like an analysis (think apache log analysis or perhaps a quasi data mining ) toolkit for your browsing. Our current schema would have trouble in supporting this kind of application. | 7. When I interviewed at Mozilla, I was asked what I would like to do with Firefox. My best answer was that I wasn't sure yet, but because I have always been interested in bookmarking and online research (see rCache.com), I would like to know how I got to a particular URL, what was discovered on the way there and how I can easily get back later on without too much trouble. Almost like an analysis (think apache log analysis or perhaps a quasi data mining ) toolkit for your browsing. Our current schema would have trouble in supporting this kind of application. | ||
Revision as of 22:08, 27 May 2009
Preface
Wouldn't it be amazing to have a Places module that community members could be excited about contributing code to? Shouldn't that be one of our goals?
import this
I take my software design and engineering philosophy partially from Python's famous 'import this':
ddahl-t500 ~ % python Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import this The Zen of Python, by Tim Peters Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess. There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. Now is better than never. Although never is often better than *right* now. If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea. Namespaces are one honking great idea -- let's do more of those!
I Love these sayings, following these precepts where appropriate will help you design systems that people want to use, and more importantly, want to extend.
As far as Places is concerned, this line is especially prescient:
"If the implementation is hard to explain, it's a bad idea."
Places Observations
I have been staring at the Places source code and fixing bugs for a few months now. I feel like I barely get what is going on under the surface due to the complexity, and I have a few observations as well as a proposal to put forward.
Let's review what I have learned about the Places module in Firefox.
0. Places code by lines is 20% of mozilla/browser, see below: https://wiki.mozilla.org/User:Ddahl#Places_Source_Code_is_bloated
1. Initially developed in a bunker at Google with little input from Mozilla, much less users at large (from what I understand), mostly in C++, by developers who should be working in a Computer Science research lab instead of working on a relatively simple (in scope) bookmarking application. This caused major code complexity.
- The source code in the Mozilla Labs HG repo is infinitely more readable than Places code, and this is OUR "research lab".
- What users were consulted on Places before, during and after development? Was there a call to the community to see what users really want or need?
2. The database schema was "fitted to the code that was already written", rather than being the driver of the Places application, or so it seems. The captured data should be central to this application. Instead it seems like an afterthought. Performance suffers because of this, as well as code over complexity. There is almost nothing in our schema to be proud of.
- The fact that the tags were not given their own table in the database is glaring. This causes our queries to be much more complex, constantly filtering out data, when a simple query could do.
3. Naming conventions do not exist, or rather, 15 of them exist. Even in the Places preferences, there are many "namespaces". This is unacceptable.
4. Teams have changed over and over, increasing the lack of coherence in the code. This fact alone speaks of the problems in maintaining an over complicated code base. This does not speak well for the future. Perhaps because of this, Places developers get distracted and actively look for other modules to work on?
5. A bookmarks organizer is not a complex piece of software compared to the rest of Firefox. This should be relatively simple tool that seems to have been made overly complex, for what reasons, I do not know.
6. How well could have the Places original patch(es) been reviewed? I understand it was a huge patch, and no doubt difficult to review comprehensively.
- Places routinely has ~1400 bugs in bugzilla. Really? This is a reflection of the complexity.
7. When I interviewed at Mozilla, I was asked what I would like to do with Firefox. My best answer was that I wasn't sure yet, but because I have always been interested in bookmarking and online research (see rCache.com), I would like to know how I got to a particular URL, what was discovered on the way there and how I can easily get back later on without too much trouble. Almost like an analysis (think apache log analysis or perhaps a quasi data mining ) toolkit for your browsing. Our current schema would have trouble in supporting this kind of application.
- the schema doesn't support _anything_ beyond what Places currently does, and it doesn't even support that very well.
- A star schema will allow for very simple (read: fast) queries to generate report data that can be used in dynamic containers or "about:me". As we proceed to Firefox.next we should create a new schema that actually makes any kind of "drill down" into the history and bookmarks easy.
8. Shawn Wilsher filed a bug in October 2008 about getting an experimental Places HG repo set up. It is my opinion that we need this ASAP. (I cannot find this bug, will ask sdwilsh about it)
9. Frankly, working on places feels like it is going to be death by a thousand patches, and I'm not sure how it will end up, but fixing it seems like more work than starting over. It will never be elegant and easy to extend if we keep on the road we are on, and the longer we keep on this path the harder it will be to pull the plug. The rationale for "marshaling on" with the current code is duly noted (it seems to be based on inertia in my opinion). I understand the concern in being conservative with code changes, but when did Mozilla become conservative? Are we really this afraid of change? What was Tracemonkey then? anything but conservative.
Proposal
0. Keep fixing bugs and working on Places performance tweaks. But we should slow down on new features.
1. Set up an experimental repo.
2. Create a new module, not called "Places", "Library" would be much better. perhaps we use a name never before used. (Let's fix the nomenclature problems)
3. Get user ideas and input early and often.
- What does Apple do and why? What does Google do and why? Do users get a lot out of the implementation? Let's start a dialog with Firefox, Safari and Chrome users.
4. Get UI design input early and often.
5. Get Labs input early and often.
6. Everything is a sprint, and a SHORT sprint at that. Simplicity is Key. What is the simplest way to do something. Design this for today, not around edge cases.
- Performance will come with time. Let's design this around a great experience first, the experience our users want.
- Test-driven development. We can make this simple and clean. We can make this a pleasure to work on. We can make extending it a pleasure as well, and we can make the user experience amazing.
7. Begin by creating new Schema designs. I suggest each Places developer go off and design a schema on their own. Put the call out for community developers to do the same. "Design challenge", re-assemble and discuss.
- guidelines are query simplicity, and the required data to capture for each activity: Visit, Bookmark, Tag, Keyword, Annotation, etc.
8. Create a database data generator. Let's develop this with a large dataset.
9. Design a really easy to use Query Api, that is a thin layer over a JS-freindly ORM/API over storage.
10. Database Schema and Query API sprint.
11. UI design sprint. Can we create a really clean MVC style application that makes skinning the "Library" trivial for add-on developers?
12. Get user feedback, Get UI feedback, Get Labs feedback. Capture Feedback. Iterate. Sprints should start out very short. 2/3 days a week for 3 weeks?
a list of Places code pain points. Mostly questions about perceived complexity in the places codebase
- Schema warts
- naming conventions
sqlite> .schema moz_bookmarks CREATE TABLE moz_bookmarks ( id INTEGER PRIMARY KEY, type INTEGER, fk INTEGER DEFAULT NULL, parent INTEGER, position INTEGER, title LONGVARCHAR, keyword_id INTEGER, folder_type TEXT, dateAdded INTEGER, lastModified INTEGER); sqlite> select * from moz_bookmarks where id = 1; 1|2||0|0||||1233943160822932|1233943160823457 sqlite> .schema moz_bookmarks_roots CREATE TABLE moz_bookmarks_roots ( root_name VARCHAR(16) UNIQUE, folder_id INTEGER); sqlite> select * from moz_bookmarks_roots; places|1 menu|2 toolbar|3 tags|4 unfiled|5 sqlite> select * from moz_bookmarks where id = 4; 4|2||1|2|Tags|||1233943160823352|1233943160823766
So the folder_id from moz_bookmarks_roots is the foreign key to id in moz_bookmarks.
This is a non-convention in database schema design.
- Here is a row from moz_bookmarks:
25792|2||25789|2|All Bookmarks|||1232646170341394|1232646170342806
- Why is a folder in the places UI kept as a record in the bookmarks table? If it is not really a bookmark, do not store it in the bookmarks table. the reason databases have various tables is usually so that you can keep like items together.
- Places services have many getters, most of them require arguments? WTF? Getters do not require arguements. NOTE: I have since figureedd out this is due to XPCOM. We should be able to write our code so that these things are wrapped in a more "object-oriented" manner.
bs.getItemLastModified
function getItemLastModified() {
[native code]
}
bs.getItemLastModified()
[Exception... "Not enough arguments [nsINavBookmarksService.getItemLastModified]" nsresult: "0x80570001 (NS_ERROR_XPC_NOT_ENOUGH_ARGS)" location: "JS frame :: chrome://mozshell/content/mozshell.js :: mzExecute :: line 363" data: no]
- I often see try/catch used to do something like this:
try { this() } catch(ex) {}
This seems a bit strange to me - so many code paths will silently fail - the very least we should do is log these exceptions if in a "testing" mode (not debug build). At best, we can take advantage of try/catch and formulate more of the control flow around try catch. in many occurrences you can replace if else and get a more solid control flow. You can also create a hierarchy of exception objects that do additional work for you. You should consider exceptions being thrown not so much as an error, but a helper for your code's logic.
create a set of tables to track uri info:
- places_tld
- places_host
- places_domain
- places_query_string_params ??
Places Source Code is bloated
A quick scan of the Places code reveals that it is quite large:
ddahl-t500 ~ % perl ~/bin/cloc.pl --exclude-dir=.hg ~/code/moz/mozilla-central/mozilla/browser/components/places
87 text files.
84 unique files.
26 files ignored.
http://cloc.sourceforge.net v 1.08 T=0.5 s (122.0 files/s, 47816.0 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code scale 3rd gen. equiv
-------------------------------------------------------------------------------
Javascript 46 1865 4579 10550 x 1.48 = 15614.00
XML 4 407 83 3138 x 1.90 = 5962.20
C++ 1 349 511 1729 x 1.51 = 2610.79
IDL 2 33 0 439 x 3.80 = 1668.20
HTML 4 5 13 96 x 1.90 = 182.40
C/C++ Header 1 9 0 53 x 1.00 = 53.00
CSS 3 11 1 37 x 1.00 = 37.00
-------------------------------------------------------------------------------
SUM: 61 2679 5187 16042 x 1.63 = 26127.59
-------------------------------------------------------------------------------
Just the portion of the code under mozilla/browser/components/places is 16000 lines of code
Check this out:
ddahl-t500 ~ % perl ~/bin/cloc.pl --exclude-dir=.hg ~/code/moz/mozilla-central/mozilla/browser/
737 text files.
717 unique files.
250 files ignored.
http://cloc.sourceforge.net v 1.08 T=2.0 s (240.0 files/s, 56893.5 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code scale 3rd gen. equiv
-------------------------------------------------------------------------------
Javascript 239 8725 16012 45326 x 1.48 = 67082.48
C++ 23 2094 2287 10010 x 1.51 = 15115.10
CSS 58 1996 1499 8664 x 1.00 = 8664.00
XML 24 1003 219 7771 x 1.90 = 14764.90
IDL 15 221 0 2078 x 3.80 = 7896.40
DTD 49 303 307 1614 x 1.90 = 3066.60
HTML 42 162 75 1085 x 1.90 = 2061.50
C/C++ Header 23 258 829 803 x 1.00 = 803.00
Perl 1 40 24 120 x 4.00 = 480.00
Bourne Shell 4 4 74 49 x 3.81 = 186.69
Python 2 21 75 39 x 4.20 = 163.80
-------------------------------------------------------------------------------
SUM: 480 14827 21401 77559 x 1.55 = 120284.47
-------------------------------------------------------------------------------
(this is not counting toolkit code)
The entire mozilla/browser is 77000 lines. That makes Places about 20% of browser. 20%? Really?
Ok, here is browser/ and toolkit/:
ddahl-t500 ~ % perl ~/bin/cloc.pl --exclude-dir=.hg ~/code/moz/mozilla-central/mozilla/browser/ ~/code/moz/mozilla-central/mozilla/toolkit
2926 text files.
2860 unique files.
956 files ignored.
http://cloc.sourceforge.net v 1.08 T=5.0 s (389.8 files/s, 97069.2 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code scale 3rd gen. equiv
-------------------------------------------------------------------------------
Javascript 643 22306 42431 110082 x 1.48 = 162921.36
C++ 195 15972 18124 73938 x 1.51 = 111646.38
XML 283 4359 1859 32640 x 1.90 = 62016.00
Bourne Shell 19 3368 4739 25717 x 3.81 = 97981.77
HTML 125 6454 460 25330 x 1.90 = 48127.00
CSS 250 5856 6691 22705 x 1.00 = 22705.00
IDL 86 1382 0 12617 x 3.80 = 47944.60
C/C++ Header 208 4059 12616 11622 x 1.00 = 11622.00
m4 2 759 31 6548 x 1.00 = 6548.00
C 6 338 297 2700 x 0.77 = 2079.00
DTD 106 527 783 2672 x 1.90 = 5076.80
Objective C 3 235 211 959 x 2.96 = 2838.64
Perl 9 198 404 854 x 4.00 = 3416.00
Python 4 115 294 542 x 4.20 = 2276.40
make 1 31 41 301 x 2.50 = 752.50
MATLAB 2 48 0 277 x 4.00 = 1108.00
PHP 2 75 126 248 x 3.50 = 868.00
Java 2 29 72 97 x 1.36 = 131.92
Assembly 1 15 39 57 x 0.25 = 14.25
SQL 1 2 0 56 x 2.29 = 128.24
XSLT 1 10 0 28 x 1.90 = 53.20
-------------------------------------------------------------------------------
SUM: 1949 66138 89218 329990 x 1.79 = 590255.06
-------------------------------------------------------------------------------
And here is browser/components/places and toolkit/components/places:
ddahl-t500 ~ % perl ~/bin/cloc.pl --exclude-dir=.hg ~/code/moz/mozilla-central/mozilla/browser/components/places ~/code/moz/mozilla-central/mozilla/toolkit/components/places
330 text files.
321 unique files.
57 files ignored.
http://cloc.sourceforge.net v 1.08 T=0.5 s (542.0 files/s, 169236.0 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code scale 3rd gen. equiv
-------------------------------------------------------------------------------
Javascript 203 5210 14992 26791 x 1.48 = 39650.68
C++ 13 3747 4643 17798 x 1.51 = 26874.98
IDL 12 347 0 3428 x 3.80 = 13026.40
XML 4 407 83 3138 x 1.90 = 5962.20
C/C++ Header 12 453 884 1979 x 1.00 = 1979.00
HTML 23 60 22 529 x 1.90 = 1005.10
SQL 1 2 0 56 x 2.29 = 128.24
CSS 3 11 1 37 x 1.00 = 37.00
-------------------------------------------------------------------------------
SUM: 271 10237 20625 53756 x 1.65 = 88663.60
-------------------------------------------------------------------------------
53000 Lines. Wow.
So, 16 percent of the code if you count toolkit.