User:Ddahl: Difference between revisions

Revision as of 22:08, 27 May 2009

Preface

Wouldn't it be amazing to have a Places module that community members could be excited about contributing code to? Shouldn't that be one of our goals?

import this

I take my software design and engineering philosophy partially from Python's famous 'import this':

ddahl-t500 ~ % python 
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

I Love these sayings, following these precepts where appropriate will help you design systems that people want to use, and more importantly, want to extend.

As far as Places is concerned, this line is especially prescient:

"If the implementation is hard to explain, it's a bad idea."

Places Observations

I have been staring at the Places source code and fixing bugs for a few months now. I feel like I barely get what is going on under the surface due to the complexity, and I have a few observations as well as a proposal to put forward.

Let's review what I have learned about the Places module in Firefox.

0. Places code by lines is 20% of mozilla/browser, see below: https://wiki.mozilla.org/User:Ddahl#Places_Source_Code_is_bloated

1. Initially developed in a bunker at Google with little input from Mozilla, much less users at large (from what I understand), mostly in C++, by developers who should be working in a Computer Science research lab instead of working on a relatively simple (in scope) bookmarking application. This caused major code complexity.

The source code in the Mozilla Labs HG repo is infinitely more readable than Places code, and this is OUR "research lab".

What users were consulted on Places before, during and after development? Was there a call to the community to see what users really want or need?

2. The database schema was "fitted to the code that was already written", rather than being the driver of the Places application, or so it seems. The captured data should be central to this application. Instead it seems like an afterthought. Performance suffers because of this, as well as code over complexity. There is almost nothing in our schema to be proud of.

The fact that the tags were not given their own table in the database is glaring. This causes our queries to be much more complex, constantly filtering out data, when a simple query could do.

3. Naming conventions do not exist, or rather, 15 of them exist. Even in the Places preferences, there are many "namespaces". This is unacceptable.

4. Teams have changed over and over, increasing the lack of coherence in the code. This fact alone speaks of the problems in maintaining an over complicated code base. This does not speak well for the future. Perhaps because of this, Places developers get distracted and actively look for other modules to work on?

5. A bookmarks organizer is not a complex piece of software compared to the rest of Firefox. This should be relatively simple tool that seems to have been made overly complex, for what reasons, I do not know.

6. How well could have the Places original patch(es) been reviewed? I understand it was a huge patch, and no doubt difficult to review comprehensively.

Places routinely has ~1400 bugs in bugzilla. Really? This is a reflection of the complexity.

7. When I interviewed at Mozilla, I was asked what I would like to do with Firefox. My best answer was that I wasn't sure yet, but because I have always been interested in bookmarking and online research (see rCache.com), I would like to know how I got to a particular URL, what was discovered on the way there and how I can easily get back later on without too much trouble. Almost like an analysis (think apache log analysis or perhaps a quasi data mining ) toolkit for your browsing. Our current schema would have trouble in supporting this kind of application.

the schema doesn't support _anything_ beyond what Places currently does, and it doesn't even support that very well.

A star schema will allow for very simple (read: fast) queries to generate report data that can be used in dynamic containers or "about:me". As we proceed to Firefox.next we should create a new schema that actually makes any kind of "drill down" into the history and bookmarks easy.

8. Shawn Wilsher filed a bug in October 2008 about getting an experimental Places HG repo set up. It is my opinion that we need this ASAP. (I cannot find this bug, will ask sdwilsh about it)

9. Frankly, working on places feels like it is going to be death by a thousand patches, and I'm not sure how it will end up, but fixing it seems like more work than starting over. It will never be elegant and easy to extend if we keep on the road we are on, and the longer we keep on this path the harder it will be to pull the plug. The rationale for "marshaling on" with the current code is duly noted (it seems to be based on inertia in my opinion). I understand the concern in being conservative with code changes, but when did Mozilla become conservative? Are we really this afraid of change? What was Tracemonkey then? anything but conservative.

Proposal

0. Keep fixing bugs and working on Places performance tweaks. But we should slow down on new features.

1. Set up an experimental repo.

2. Create a new module, not called "Places", "Library" would be much better. perhaps we use a name never before used. (Let's fix the nomenclature problems)

3. Get user ideas and input early and often.

What does Apple do and why? What does Google do and why? Do users get a lot out of the implementation? Let's start a dialog with Firefox, Safari and Chrome users.

4. Get UI design input early and often.

5. Get Labs input early and often.

6. Everything is a sprint, and a SHORT sprint at that. Simplicity is Key. What is the simplest way to do something. Design this for today, not around edge cases.

Performance will come with time. Let's design this around a great experience first, the experience our users want.

Test-driven development. We can make this simple and clean. We can make this a pleasure to work on. We can make extending it a pleasure as well, and we can make the user experience amazing.

7. Begin by creating new Schema designs. I suggest each Places developer go off and design a schema on their own. Put the call out for community developers to do the same. "Design challenge", re-assemble and discuss.

guidelines are query simplicity, and the required data to capture for each activity: Visit, Bookmark, Tag, Keyword, Annotation, etc.

8. Create a database data generator. Let's develop this with a large dataset.

9. Design a really easy to use Query Api, that is a thin layer over a JS-freindly ORM/API over storage.

10. Database Schema and Query API sprint.

11. UI design sprint. Can we create a really clean MVC style application that makes skinning the "Library" trivial for add-on developers?

12. Get user feedback, Get UI feedback, Get Labs feedback. Capture Feedback. Iterate. Sprints should start out very short. 2/3 days a week for 3 weeks?

a list of Places code pain points. Mostly questions about perceived complexity in the places codebase

Schema warts

- naming conventions

sqlite> .schema moz_bookmarks
CREATE TABLE moz_bookmarks (  id INTEGER PRIMARY KEY, type INTEGER, fk INTEGER DEFAULT NULL, parent INTEGER, position INTEGER, title LONGVARCHAR, keyword_id INTEGER, folder_type TEXT, dateAdded INTEGER, lastModified INTEGER);

sqlite> select * from moz_bookmarks where id = 1;
1|2||0|0||||1233943160822932|1233943160823457

sqlite> .schema moz_bookmarks_roots
CREATE TABLE moz_bookmarks_roots (  root_name VARCHAR(16) UNIQUE, folder_id INTEGER);

sqlite> select * from moz_bookmarks_roots;
places|1
menu|2
toolbar|3
tags|4
unfiled|5
sqlite> select * from moz_bookmarks where id = 4;
4|2||1|2|Tags|||1233943160823352|1233943160823766

So the folder_id from moz_bookmarks_roots is the foreign key to id in moz_bookmarks.

This is a non-convention in database schema design.

- Here is a row from moz_bookmarks:

25792|2||25789|2|All Bookmarks|||1232646170341394|1232646170342806

Why is a folder in the places UI kept as a record in the bookmarks table? If it is not really a bookmark, do not store it in the bookmarks table. the reason databases have various tables is usually so that you can keep like items together.

Places services have many getters, most of them require arguments? WTF? Getters do not require arguements. NOTE: I have since figureedd out this is due to XPCOM. We should be able to write our code so that these things are wrapped in a more "object-oriented" manner.

bs.getItemLastModified
function getItemLastModified() {
    [native code]
}
bs.getItemLastModified()
[Exception... "Not enough arguments [nsINavBookmarksService.getItemLastModified]" nsresult: "0x80570001 (NS_ERROR_XPC_NOT_ENOUGH_ARGS)" location: "JS frame :: chrome://mozshell/content/mozshell.js :: mzExecute :: line 363" data: no]

I often see try/catch used to do something like this:

try { this() } catch(ex) {}

This seems a bit strange to me - so many code paths will silently fail - the very least we should do is log these exceptions if in a "testing" mode (not debug build). At best, we can take advantage of try/catch and formulate more of the control flow around try catch. in many occurrences you can replace if else and get a more solid control flow. You can also create a hierarchy of exception objects that do additional work for you. You should consider exceptions being thrown not so much as an error, but a helper for your code's logic.

create a set of tables to track uri info:

places_tld
places_host
places_domain
places_query_string_params ??

Places Source Code is bloated

A quick scan of the Places code reveals that it is quite large:

ddahl-t500 ~ % perl ~/bin/cloc.pl --exclude-dir=.hg ~/code/moz/mozilla-central/mozilla/browser/components/places                                                            
      87 text files.
      84 unique files.                              
      26 files ignored.

http://cloc.sourceforge.net v 1.08  T=0.5 s (122.0 files/s, 47816.0 lines/s)
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
Javascript           46      1865      4579     10550 x   1.48 =       15614.00
XML                   4       407        83      3138 x   1.90 =        5962.20
C++                   1       349       511      1729 x   1.51 =        2610.79
IDL                   2        33         0       439 x   3.80 =        1668.20
HTML                  4         5        13        96 x   1.90 =         182.40
C/C++ Header          1         9         0        53 x   1.00 =          53.00
CSS                   3        11         1        37 x   1.00 =          37.00
-------------------------------------------------------------------------------
SUM:                 61      2679      5187     16042 x   1.63 =       26127.59
-------------------------------------------------------------------------------

Just the portion of the code under mozilla/browser/components/places is 16000 lines of code

Check this out:

ddahl-t500 ~ % perl ~/bin/cloc.pl --exclude-dir=.hg ~/code/moz/mozilla-central/mozilla/browser/           
     737 text files.
     717 unique files.                                          
     250 files ignored.

http://cloc.sourceforge.net v 1.08  T=2.0 s (240.0 files/s, 56893.5 lines/s)
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
Javascript          239      8725     16012     45326 x   1.48 =       67082.48
C++                  23      2094      2287     10010 x   1.51 =       15115.10
CSS                  58      1996      1499      8664 x   1.00 =        8664.00
XML                  24      1003       219      7771 x   1.90 =       14764.90
IDL                  15       221         0      2078 x   3.80 =        7896.40
DTD                  49       303       307      1614 x   1.90 =        3066.60
HTML                 42       162        75      1085 x   1.90 =        2061.50
C/C++ Header         23       258       829       803 x   1.00 =         803.00
Perl                  1        40        24       120 x   4.00 =         480.00
Bourne Shell          4         4        74        49 x   3.81 =         186.69
Python                2        21        75        39 x   4.20 =         163.80
-------------------------------------------------------------------------------
SUM:                480     14827     21401     77559 x   1.55 =      120284.47
-------------------------------------------------------------------------------

(this is not counting toolkit code)

The entire mozilla/browser is 77000 lines. That makes Places about 20% of browser. 20%? Really?

Ok, here is browser/ and toolkit/:

ddahl-t500 ~ % perl ~/bin/cloc.pl --exclude-dir=.hg ~/code/moz/mozilla-central/mozilla/browser/ ~/code/moz/mozilla-central/mozilla/toolkit 
    2926 text files.
    2860 unique files.                                          
     956 files ignored.

http://cloc.sourceforge.net v 1.08  T=5.0 s (389.8 files/s, 97069.2 lines/s)
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
Javascript          643     22306     42431    110082 x   1.48 =      162921.36
C++                 195     15972     18124     73938 x   1.51 =      111646.38
XML                 283      4359      1859     32640 x   1.90 =       62016.00
Bourne Shell         19      3368      4739     25717 x   3.81 =       97981.77
HTML                125      6454       460     25330 x   1.90 =       48127.00
CSS                 250      5856      6691     22705 x   1.00 =       22705.00
IDL                  86      1382         0     12617 x   3.80 =       47944.60
C/C++ Header        208      4059     12616     11622 x   1.00 =       11622.00
m4                    2       759        31      6548 x   1.00 =        6548.00
C                     6       338       297      2700 x   0.77 =        2079.00
DTD                 106       527       783      2672 x   1.90 =        5076.80
Objective C           3       235       211       959 x   2.96 =        2838.64
Perl                  9       198       404       854 x   4.00 =        3416.00
Python                4       115       294       542 x   4.20 =        2276.40
make                  1        31        41       301 x   2.50 =         752.50
MATLAB                2        48         0       277 x   4.00 =        1108.00
PHP                   2        75       126       248 x   3.50 =         868.00
Java                  2        29        72        97 x   1.36 =         131.92
Assembly              1        15        39        57 x   0.25 =          14.25
SQL                   1         2         0        56 x   2.29 =         128.24
XSLT                  1        10         0        28 x   1.90 =          53.20
-------------------------------------------------------------------------------
SUM:               1949     66138     89218    329990 x   1.79 =      590255.06
-------------------------------------------------------------------------------

And here is browser/components/places and toolkit/components/places:

ddahl-t500 ~ % perl ~/bin/cloc.pl --exclude-dir=.hg ~/code/moz/mozilla-central/mozilla/browser/components/places ~/code/moz/mozilla-central/mozilla/toolkit/components/places 
     330 text files.
     321 unique files.                                          
      57 files ignored.

http://cloc.sourceforge.net v 1.08  T=0.5 s (542.0 files/s, 169236.0 lines/s)
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
Javascript          203      5210     14992     26791 x   1.48 =       39650.68
C++                  13      3747      4643     17798 x   1.51 =       26874.98
IDL                  12       347         0      3428 x   3.80 =       13026.40
XML                   4       407        83      3138 x   1.90 =        5962.20
C/C++ Header         12       453       884      1979 x   1.00 =        1979.00
HTML                 23        60        22       529 x   1.90 =        1005.10
SQL                   1         2         0        56 x   2.29 =         128.24
CSS                   3        11         1        37 x   1.00 =          37.00
-------------------------------------------------------------------------------
SUM:                271     10237     20625     53756 x   1.65 =       88663.60
-------------------------------------------------------------------------------

53000 Lines. Wow.

So, 16 percent of the code if you count toolkit.

@@ Line 68: / Line 68: @@
 . How well could have the Places original patch(es) been reviewed? I understand it was a huge patch, and no doubt difficult to review comprehensively.
-* Places routinely has ~1400 bugs in bugzilla. Really? This is a reflection of the complexity. (as well as test coverage?)
+* Places routinely has ~1400 bugs in bugzilla. Really? This is a reflection of the complexity.
 . When I interviewed at Mozilla, I was asked what I would like to do with Firefox. My best answer was that I wasn't sure yet, but because I have always been interested in bookmarking and online research (see rCache.com), I would like to know how I got to a particular URL, what was discovered on the way there and how I can easily get back later on without too much trouble. Almost like an analysis (think apache log analysis or perhaps a quasi data mining ) toolkit for your browsing. Our current schema would have trouble in supporting this kind of application.

User:Ddahl: Difference between revisions

Revision as of 22:08, 27 May 2009

Contents

Preface

import this

Places Observations

Proposal

a list of Places code pain points. Mostly questions about perceived complexity in the places codebase

create a set of tables to track uri info:

Places Source Code is bloated

Navigation menu

User:Ddahl: Difference between revisions

Revision as of 22:08, 27 May 2009

Preface

import this

Places Observations

Proposal

a list of Places code pain points. Mostly questions about perceived complexity in the places codebase

create a set of tables to track uri info:

Places Source Code is bloated

Navigation menu

Search