User:Ddahl: Difference between revisions

From MozillaWiki
Jump to navigation Jump to search
(Replaced content with '== nothing here ==')
Line 1: Line 1:
== Preface ==
== nothing here ==
 
Wouldn't it be amazing to have a Places module that community members could be excited about contributing code to? Shouldn't that be one of our goals? 
 
== import this ==
 
I take my software design and engineering philosophy partially from Python's famous 'import this':
 
<pre>
ddahl-t500 ~ % python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters
 
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
</pre>
 
I Love these sayings, following these precepts where appropriate will help you design systems that people want to use, and more importantly,  want to extend.
 
As far as Places is concerned, this line is especially prescient:
 
"If the implementation is hard to explain, it's a bad idea."
 
== Places Observations ==
 
I have been staring at the Places source code and fixing bugs for a few months now. I feel like I barely get what is going on under the surface due to the complexity, and I have a few observations as well as a proposal to put forward.
 
Let's review what I have learned about the Places module in Firefox.
 
0. Places code by lines is 20% of mozilla/browser, see below: https://wiki.mozilla.org/User:Ddahl#Places_Source_Code_is_bloated
 
1. Initially developed in a bunker at Google with little input from Mozilla, much less users at large (from what I understand), mostly in C++, by developers who should be working in a Computer Science research lab instead of working on a relatively simple (in scope) bookmarking application. This caused major code complexity.
* The source code in the Mozilla Labs HG repo is infinitely more readable than Places code, and this is OUR "research lab".
 
* What users were consulted on Places before, during and after development? Was there a call to the community to see what users really want or need?
 
2. The database schema was "fitted to the code that was already written", rather than being the driver of the Places application, or so it seems. The captured data should be central to this application. Instead it seems like an afterthought. Performance suffers because of this, as well as code over complexity. There is almost nothing in our schema to be proud of.
 
* The fact that the tags were not given their own table in the database is glaring. This causes our queries to be much more complex, constantly filtering out data, when a simple query could do.
 
3. Naming conventions do not exist, or rather, 15 of them exist. Even in the Places preferences, there are many "namespaces". This is unacceptable.
 
4. Teams have changed over and over, increasing the lack of coherence in the code. This fact alone speaks of the problems in maintaining an over complicated code base. This does not speak well for the future. Perhaps because of this, Places developers get distracted and actively look for other modules to work on?
 
5. A bookmarks organizer is not a complex piece of software compared to the rest of Firefox. This should be relatively simple tool that seems to have been  made overly complex, for what reasons, I do not know.
 
6. How well could have the Places original patch(es) been reviewed? I understand it was a huge patch, and no doubt difficult to review comprehensively.
 
* Places routinely has ~1400 bugs in bugzilla. Really? This is a reflection of the complexity.
 
7. When I interviewed at Mozilla, I was asked what I would like to do with Firefox. My best answer was that I wasn't sure yet, but because I have always been interested in bookmarking and online research (see rCache.com), I would like to know how I got to a particular URL, what was discovered on the way there and how I can easily get back later on without too much trouble. Almost like an analysis (think apache log analysis or perhaps a quasi data mining ) toolkit for your browsing. Our current schema would have trouble in supporting this kind of application.
 
* the schema doesn't support _anything_ beyond what Places currently does, and it doesn't even support that very well.
 
* A star schema will allow for very simple (read: fast) queries to generate report data that can be used in dynamic containers or "about:me". As we proceed to Firefox.next we should create a new schema that actually makes any kind of "drill down" into the history and bookmarks easy.
 
8. Shawn Wilsher filed a bug in October 2008 about getting an experimental Places HG repo set up. It is my opinion that we need this ASAP. (I cannot find this bug, will ask sdwilsh about it)
 
9. Frankly, working on places feels like it is going to be death by a thousand patches, and I'm not sure how it will end up, but fixing it seems like more work than starting over. It will never be elegant and easy to extend if we keep on the road we are on, and the longer we keep on this path the harder it will be to pull the plug. The rationale for "marshaling on" with the current code is duly noted (it seems to be based on inertia in my opinion). I understand the concern in being conservative with code changes, but when did Mozilla become conservative? Are we really this afraid of change? What was Tracemonkey then? anything but conservative.
 
== Proposal ==
 
0. Keep fixing bugs and working on Places performance tweaks. But we should slow down on new features.
 
1. Set up an experimental repo.
 
2. Create a new module, not called "Places", "Library" would be much better. perhaps we use a name never before used. (Let's fix the nomenclature problems)
 
3. Get user ideas and input early and often.
 
* What does Apple do and why? What does Google do and why? Do users get a lot out of the implementation? Let's start a dialog with Firefox, Safari and Chrome users.
 
4. Get UI design input early and often.
 
5. Get Labs input early and often.
 
6. Everything is a sprint, and a SHORT sprint at that. Simplicity is Key. What is the simplest way to do something. Design this for today, not around edge cases.
 
* Performance will come with time. Let's design this around a great experience first, the experience our users want.
 
* Test-driven development. We can make this simple and clean. We can make this a pleasure to work on. We can make extending it a pleasure as well, and we can make the user experience amazing.
 
7. Begin by creating new Schema designs. I suggest each Places developer go off and design a schema on their own. Put the call out for community developers to do the same. "Design challenge", re-assemble and discuss.
 
* guidelines are query simplicity, and the required data to capture for each activity: Visit, Bookmark, Tag, Keyword, Annotation, etc.
 
8. Create a database data generator. Let's develop this with a large dataset.
 
9. Design a really easy to use Query Api, that is a thin layer over a JS-freindly ORM/API over storage.
 
10. Database Schema and Query API sprint.
 
11. UI design sprint.  Can we create a really clean MVC style application that makes skinning the "Library" trivial for add-on developers?
 
12. Get user feedback, Get UI feedback, Get Labs feedback. Capture Feedback. Iterate. Sprints should start out very short. 2/3 days a week for 3 weeks?
 
== a list of Places code pain points. Mostly questions about perceived complexity in the places codebase ==
 
* Schema warts
 
** naming conventions
 
<pre>sqlite> .schema moz_bookmarks
CREATE TABLE moz_bookmarks (  id INTEGER PRIMARY KEY, type INTEGER, fk INTEGER DEFAULT NULL, parent INTEGER, position INTEGER, title LONGVARCHAR, keyword_id INTEGER, folder_type TEXT, dateAdded INTEGER, lastModified INTEGER);
 
sqlite> select * from moz_bookmarks where id = 1;
1|2||0|0||||1233943160822932|1233943160823457
 
sqlite> .schema moz_bookmarks_roots
CREATE TABLE moz_bookmarks_roots (  root_name VARCHAR(16) UNIQUE, folder_id INTEGER);
 
sqlite> select * from moz_bookmarks_roots;
places|1
menu|2
toolbar|3
tags|4
unfiled|5
sqlite> select * from moz_bookmarks where id = 4;
4|2||1|2|Tags|||1233943160823352|1233943160823766
</pre>
 
So the folder_id from moz_bookmarks_roots is the foreign key to id in moz_bookmarks.
 
This is a non-convention in database schema design.
 
** Here is a row from moz_bookmarks:
 
<pre>
25792|2||25789|2|All Bookmarks|||1232646170341394|1232646170342806
 
</pre>
 
* Why is a folder in the places UI kept as a record in the bookmarks table? If it is not really a bookmark, do not store it in the bookmarks table. the reason databases have various tables is usually so that you can keep like items together.
 
* Places services have many getters, most of them require arguments? WTF? Getters do not require arguements. NOTE: I have since figureedd out this is due to XPCOM. We should be able to write our code so that these things are wrapped in a more "object-oriented" manner.
 
<pre>
bs.getItemLastModified
function getItemLastModified() {
    [native code]
}
bs.getItemLastModified()
[Exception... "Not enough arguments [nsINavBookmarksService.getItemLastModified]" nsresult: "0x80570001 (NS_ERROR_XPC_NOT_ENOUGH_ARGS)" location: "JS frame :: chrome://mozshell/content/mozshell.js :: mzExecute :: line 363" data: no]
</pre>
 
* I often see try/catch used to do something like this:
<pre>
try { this() } catch(ex) {}
</pre>
 
This seems a bit strange to me - so many code paths will silently fail - the very least we should do is log these exceptions if in a "testing" mode (not debug build). At best, we can take advantage of try/catch and formulate more of the control flow around try catch. in many occurrences you can replace if else and get a more solid control flow. You can also create a hierarchy of exception objects that do additional work for you. You should consider exceptions being thrown not so much as an error, but a helper for your code's logic.
 
 
 
=== create a set of tables to track uri info: ===
 
* places_tld
* places_host
* places_domain
* places_query_string_params ??
 
 
== Places Source Code is bloated ==
 
A quick scan of the Places code reveals that it is quite large:
<pre>
ddahl-t500 ~ % perl ~/bin/cloc.pl --exclude-dir=.hg ~/code/moz/mozilla-central/mozilla/browser/components/places                                                           
      87 text files.
      84 unique files.                             
      26 files ignored.
 
http://cloc.sourceforge.net v 1.08  T=0.5 s (122.0 files/s, 47816.0 lines/s)
-------------------------------------------------------------------------------
Language          files    blank  comment      code    scale  3rd gen. equiv
-------------------------------------------------------------------------------
Javascript          46      1865      4579    10550 x  1.48 =      15614.00
XML                  4      407        83      3138 x  1.90 =        5962.20
C++                  1      349      511      1729 x  1.51 =        2610.79
IDL                  2        33        0      439 x  3.80 =        1668.20
HTML                  4        5        13        96 x  1.90 =        182.40
C/C++ Header          1        9        0        53 x  1.00 =          53.00
CSS                  3        11        1        37 x  1.00 =          37.00
-------------------------------------------------------------------------------
SUM:                61      2679      5187    16042 x  1.63 =      26127.59
-------------------------------------------------------------------------------
</pre>
 
Just the portion of the code under mozilla/browser/components/places is 16000 lines of code
 
 
Check this out:
 
<pre>
ddahl-t500 ~ % perl ~/bin/cloc.pl --exclude-dir=.hg ~/code/moz/mozilla-central/mozilla/browser/         
    737 text files.
    717 unique files.                                         
    250 files ignored.
 
http://cloc.sourceforge.net v 1.08  T=2.0 s (240.0 files/s, 56893.5 lines/s)
-------------------------------------------------------------------------------
Language          files    blank  comment      code    scale  3rd gen. equiv
-------------------------------------------------------------------------------
Javascript          239      8725    16012    45326 x  1.48 =      67082.48
C++                  23      2094      2287    10010 x  1.51 =      15115.10
CSS                  58      1996      1499      8664 x  1.00 =        8664.00
XML                  24      1003      219      7771 x  1.90 =      14764.90
IDL                  15      221        0      2078 x  3.80 =        7896.40
DTD                  49      303      307      1614 x  1.90 =        3066.60
HTML                42      162        75      1085 x  1.90 =        2061.50
C/C++ Header        23      258      829      803 x  1.00 =        803.00
Perl                  1        40        24      120 x  4.00 =        480.00
Bourne Shell          4        4        74        49 x  3.81 =        186.69
Python                2        21        75        39 x  4.20 =        163.80
-------------------------------------------------------------------------------
SUM:                480    14827    21401    77559 x  1.55 =      120284.47
-------------------------------------------------------------------------------
</pre>
 
(this is not counting toolkit code)
 
The entire mozilla/browser is 77000 lines.  That makes Places about 20% of browser. 20%? Really?
 
 
Ok, here is browser/ and toolkit/:
<pre>
ddahl-t500 ~ % perl ~/bin/cloc.pl --exclude-dir=.hg ~/code/moz/mozilla-central/mozilla/browser/ ~/code/moz/mozilla-central/mozilla/toolkit
    2926 text files.
    2860 unique files.                                         
    956 files ignored.
 
http://cloc.sourceforge.net v 1.08  T=5.0 s (389.8 files/s, 97069.2 lines/s)
-------------------------------------------------------------------------------
Language          files    blank  comment      code    scale  3rd gen. equiv
-------------------------------------------------------------------------------
Javascript          643    22306    42431    110082 x  1.48 =      162921.36
C++                195    15972    18124    73938 x  1.51 =      111646.38
XML                283      4359      1859    32640 x  1.90 =      62016.00
Bourne Shell        19      3368      4739    25717 x  3.81 =      97981.77
HTML                125      6454      460    25330 x  1.90 =      48127.00
CSS                250      5856      6691    22705 x  1.00 =      22705.00
IDL                  86      1382        0    12617 x  3.80 =      47944.60
C/C++ Header        208      4059    12616    11622 x  1.00 =      11622.00
m4                    2      759        31      6548 x  1.00 =        6548.00
C                    6      338      297      2700 x  0.77 =        2079.00
DTD                106      527      783      2672 x  1.90 =        5076.80
Objective C          3      235      211      959 x  2.96 =        2838.64
Perl                  9      198      404      854 x  4.00 =        3416.00
Python                4      115      294      542 x  4.20 =        2276.40
make                  1        31        41      301 x  2.50 =        752.50
MATLAB                2        48        0      277 x  4.00 =        1108.00
PHP                  2        75      126      248 x  3.50 =        868.00
Java                  2        29        72        97 x  1.36 =        131.92
Assembly              1        15        39        57 x  0.25 =          14.25
SQL                  1        2        0        56 x  2.29 =        128.24
XSLT                  1        10        0        28 x  1.90 =          53.20
-------------------------------------------------------------------------------
SUM:              1949    66138    89218    329990 x  1.79 =      590255.06
-------------------------------------------------------------------------------
</pre>
 
And here is browser/components/places and toolkit/components/places:
 
<pre>
ddahl-t500 ~ % perl ~/bin/cloc.pl --exclude-dir=.hg ~/code/moz/mozilla-central/mozilla/browser/components/places ~/code/moz/mozilla-central/mozilla/toolkit/components/places
    330 text files.
    321 unique files.                                         
      57 files ignored.
 
http://cloc.sourceforge.net v 1.08  T=0.5 s (542.0 files/s, 169236.0 lines/s)
-------------------------------------------------------------------------------
Language          files    blank  comment      code    scale  3rd gen. equiv
-------------------------------------------------------------------------------
Javascript          203      5210    14992    26791 x  1.48 =      39650.68
C++                  13      3747      4643    17798 x  1.51 =      26874.98
IDL                  12      347        0      3428 x  3.80 =      13026.40
XML                  4      407        83      3138 x  1.90 =        5962.20
C/C++ Header        12      453      884      1979 x  1.00 =        1979.00
HTML                23        60        22      529 x  1.90 =        1005.10
SQL                  1        2        0        56 x  2.29 =        128.24
CSS                  3        11        1        37 x  1.00 =         37.00
-------------------------------------------------------------------------------
SUM:                271    10237    20625    53756 x  1.65 =       88663.60
-------------------------------------------------------------------------------
</pre>
 
53000 Lines. Wow.
 
So, 16 percent of the code if you count toolkit.

Revision as of 20:47, 9 November 2009

nothing here