Update:Remora Database Migration: Difference between revisions

 
(43 intermediate revisions by 4 users not shown)
Line 11: Line 11:
The script is called migration.py and lives in svn at .../trunk/services/bin
The script is called migration.py and lives in svn at .../trunk/services/bin


The script has two Python library dependencies:  
The script requires Python 2.4 to work correctly.  Versions before 2.4 will likely produce syntax errors like this one:
 
  File "./migration.py", line 961
    delimitedListOfVersionIdsThatShouldBeDeleted = ",".join((str(x.id) for x in newDB.executeSql("select v.id from versions v where addon_id in (%s)" %  delimitedListOfAddons)))
SyntaxError: invalid syntax
 
The tests require the use of "generator expressions" that were not available until 2.4.
 
Also, keep in mind that you shouldn't expect existing data in the target database to persist.  You should run the script on a clean target database.
 
The script has three Python library dependencies:  
* cse -- using svn at GoogleCode: http://csepython.googlecode.com/svn/trunk/
* cse -- using svn at GoogleCode: http://csepython.googlecode.com/svn/trunk/
* MySQLdb -- http://sourceforge.net/projects/mysql-python
* MySQLdb -- http://sourceforge.net/projects/mysql-python
* Crypto -- http://www.amk.ca/python/code/crypto
===Use===
Disclaimer
migration.py is a sharp edged tool.  It makes it easy to completely screw up the
data in the database, therefore it must be used carefully.  migration.py logs its
every use, so if something bad does happen, you can go back through the log to
see what went wrong and perhaps assign blame.
To use the script, just invoke it on the command line with appropriate options.  The complete list of options can be found with this command:
./migration.py --help


===Command Line Options===  
===Command Line Options===  
Line 24: Line 48:
                 a quoted comma delimited list of the ids of addons OR the word
                 a quoted comma delimited list of the ids of addons OR the word
                 "all" (default: all)
                 "all" (default: all)
            --addonStatusWhenNoFilesApproved
                the status to set an addon if no files approved
                (null, sandbox, pending, nominated, public, disabled)
                (default: sandbox)
            --addonStatusWhenSomeFilesApproved
                the status to set an addon if some files approved
                (null, sandbox, pending, nominated, public, disabled)
                (default: public)
             --clear
             --clear
                 clear all exisiting information from the new database
                 clear all exisiting information from the new database
Line 32: Line 64:
                 specify the location and name of the config file  
                 specify the location and name of the config file  
                 (default: ./migration.conf)
                 (default: ./migration.conf)
            --fileCachePath
                a path for the caching of files (blank for no cache)
                the path should no end in /
         -?, --help
         -?, --help
                 print this message
                 print this message
        -l, --locale
                the locale to use when migrating addons (default: en-US)
            --logPathName
                a progressive log of all runs of the migration script
                (default: ./migration.log)
         -A, --migrateAddons
         -A, --migrateAddons
                 if present, this switch causes the addons in the "addons"  
                 if present, this switch causes the addons in the "addons"  
Line 60: Line 100:
             --oldUserName
             --oldUserName
                 the name of the user in the old database (default: )
                 the name of the user in the old database (default: )
            --previewURIPrefix
                a prefix to add to the URI in the preview table to enable
                downloading (default: https://addons.mozilla.org)
             --tests
             --tests
                 run the test cases
                 run the test cases
Line 209: Line 252:
<pre>
<pre>
lars@deerlodge:~/project> ./migration.py --clearAddons -a 2506,2507
lars@deerlodge:~/project> ./migration.py --clearAddons -a 2506,2507
</pre>
Reinsert Addon 2506 as a German language addon:
<pre>
lars@deerlodge:~/project> ./migration.py -A -a 2506 -l de
</pre>
</pre>


Line 222: Line 271:
|GUID||applications.guid
|GUID||applications.guid
|-
|-
|AppName||applications.name
|AppName||translations.localized_string<br>''insert id into'' applications.name
|-
|AppName||translations.localized_string<br>''insert id into'' applications.shortname
|-
|supported||supported
|-
|-
|major||''no mapping''
|major||''no mapping''
Line 252: Line 305:
|CategoryID||id
|CategoryID||id
|-
|-
|CatName||name
|CatName||translations.localized_string<br>''insert id into'' name
|-
|-
|CatDesc||description
|CatDesc||translations.localized_string<br>''insert id into'' description
|-
|-
|CatType||addontype_id <br/>''lookup in addontypes.name assuming mapping of E->extensions, T->themes...''
|CatType||addontype_id <br/>''lookup in addontypes.name assuming mapping of E->extensions, T->themes...''
Line 271: Line 324:
|OSID||id
|OSID||id
|-
|-
|OSName||name
|OSName||translations.localized_string<br>''insert id into'' name
|-
|-
|OSName||shortname
|OSName||translations.localized_string<br>''insert id into'' shortname
|-
|-
| ||created<br/>''time.now''
| ||created<br/>''time.now''
Line 286: Line 339:
|UserID||id
|UserID||id
|-
|-
|UserName||parse to firstname, lastname
|UserName||''parse to'' firstname, lastname<br/>''lastname is always the last word from Username<br/>firstname is all but the last word or "?" if there are no other words''
|-
|-
| ||nickname<br/>'''no source for data, Null not allowed, no default - I put in an empty string'''
| ||nickname<br/>'''no source for data, Null not allowed, no default - I put in an empty string'''
Line 344: Line 397:
|DateUpdated||modified
|DateUpdated||modified
|-
|-
|Homepage||homepage
|Homepage||translations.localized_string<br>''insert id into'' homepage
|-
|-
|Description||translations.localized_string<br/>''insert into translations, then fetch id''<br/>description
|Description||translations.localized_string<br/>''insert into translations, then fetch id''<br/>description
Line 354: Line 407:
|TotalDownloads||totaldownloads
|TotalDownloads||totaldownloads
|-
|-
|devcomments||developercomments
|devcomments||translations.localized_string<br>''insert id into'' developercomments
|-
|-
| ||icon<br/>''use default value''
| ||icon<br/>''use default value''
|-
|-
|Description||translations.localized_string<br/>''take up to 250 characters, do not break inside a word, add ellipsis if break is not at the end of sentence insert into translations, then fetch id''<br/>summary
|Description||translations.localized_string<br/>''take up to 250 characters but not more than the first line, do not break inside a word, add ellipsis if break is not at the end of sentence insert into translations, then fetch id''<br/>summary
|-
|-
| ||inactive<br/>''use default value''
| ||inactive<br/>''use default value''
Line 392: Line 445:
|-
|-
|DateAdded||versions.created
|DateAdded||versions.created
|-
|DateApproved||versions.dateapproved
|-
|-
|DateUpdated||versions.modified
|DateUpdated||versions.modified
|-
|-
|Notes||versions.releasenotes
|Notes||translations.localized_string<br>''insert id into'' versions.releasenotes
|-
|-
| ||'''addons.approvalnotes'''<BR/>''empty string''
| ||'''addons.approvalnotes'''<BR/>''empty string''
|-
|approved||versions.approved<br/>''mapping -- "YES": 2, "NO": 3, "?":1, "DISABLED":6''
|-
|-
| ||versions.created<br/>''time.now''
| ||versions.created<br/>''time.now''
Line 441: Line 490:
|DateApproved||files.dateapproved
|DateApproved||files.dateapproved
|-
|-
|approved||files.approved<br/>''mapping -- "YES": 2, "NO": 3, "?":1, "DISABLED":6''
|approved||files.approved<br/>''mapping -- "YES": 4, "NO": 1, "?":2, "DISABLED":5''
|-
|-
| ||files.created<br/>''time.now''
| ||files.created<br/>''time.now''
Line 463: Line 512:
|PreviewID||id
|PreviewID||id
|-
|-
|PreviewURI||filename
|PreviewURI||filedata<br/>''data from download''
|-
|PreviewURI||filetype<br/>''the MIME type from download''
|-
|PreviewURI||thumbdata<br/>''data from download cooked into an image of 200x150 with aspect ratio preserved (add transparent filler if needed)''
|-
|PreviewURI||thumbtype<br/>''the MIME type from download''
|-
|-
|ID||addon_id
|ID||addon_id
|-
|-
|caption||caption
|caption||translations.localized_string<br>''insert id from translations into into'' caption
|-
|-
|preview||highlight<br/>''conversion -- "YES":1, "NO":0''
|preview||highlight<br/>''conversion -- "YES":1, "NO":0''
Line 478: Line 533:
==Feedback==
==Feedback==
If we don't migrate add-ons which are only compatible with fx 1.0, then users can upload add-ons with the same em:ID and then users who might have had the old add-ons still installed but disabled (or who are still using fx 1.0) will be updated to the new one.
If we don't migrate add-ons which are only compatible with fx 1.0, then users can upload add-ons with the same em:ID and then users who might have had the old add-ons still installed but disabled (or who are still using fx 1.0) will be updated to the new one.
=== Previews ===
* No previews were migrated -- but we need to solve this issue to help get images over.
* In order to pull them we'd need to generate the URI and grab them so we can insert them into the proper blob in the previews table
Old db:
mysql> describe previews;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type            | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| PreviewID  | int(11)          |      | PRI | NULL    | auto_increment |
| PreviewURI | varchar(200)    |      |    |        |                |
| ID        | int(5)          |      | MUL | 0      |                |
| caption    | varchar(255)    |      |    |        |                |
| preview    | enum('YES','NO') |      |    | NO      |                |
+------------+------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
* "PreviewURI" column points to the relative path of the preview.
* "ID" is the FK to main.ID
* "caption" would get ported to dynamic l10n
* "preview" would be "highlight" in the new previews table
Moving to new db:
* Use "https://addons.mozilla.org".$PreviewURI to grab the image and insert it into the filedata blog in the new db.
* Filetype should be preserved as well so it can display properly in the images controller.
=== Duplicate Files ===
* (clouserw) I think there were some duplications within the `files` table.  For example:
mysql> select * from files where filename like 'stumbleupon-2.1%';
+--------+------------+-------------+---------------------------------+------+------+------------+--------+---------------------+---------------------+---------------------+
| id    | version_id | platform_id | filename                        | size | hash | codereview | status | datestatuschanged  | created            | modified            |
+--------+------------+-------------+---------------------------------+------+------+------------+--------+---------------------+---------------------+---------------------+
| 133663 |      7507 |          1 | stumbleupon-2.1-fx+fl+mz+ns.xpi |  101 |      |          0 |      4 | 0000-00-00 00:00:00 | 2006-12-22 00:20:46 | 0000-00-00 00:00:00 |
| 133664 |      7507 |          1 | stumbleupon-2.1-fx+fl+mz+ns.xpi |  101 |      |          0 |      4 | 0000-00-00 00:00:00 | 2006-12-22 00:20:46 | 0000-00-00 00:00:00 |
| 133665 |      7507 |          1 | stumbleupon-2.1-fx+fl+mz+ns.xpi |  101 |      |          0 |      4 | 0000-00-00 00:00:00 | 2006-12-22 00:20:46 | 0000-00-00 00:00:00 |
| 133666 |      7507 |          1 | stumbleupon-2.1-fx+fl+mz+ns.xpi |  101 |      |          0 |      4 | 0000-00-00 00:00:00 | 2006-12-22 00:20:46 | 0000-00-00 00:00:00 |
+--------+------------+-------------+---------------------------------+------+------+------------+--------+---------------------+---------------------+---------------------+
4 rows in set (0.00 sec)
* (morgamic) I think the script may have created duped rows in files because in the old 'version' table rows would be duped except for the app/appversion info -- in this database that data is spliced off.  Might want to start there.
* (lars) files are duplicated because sometimes the the same physical file is specified for more than one application (Firefox & Mozilla) - it is very easy to  prevent this from resulting in two entries in to the file table.  However, if the two entries may not match exactly (hash, approval, etc), the one seen later will superceed the one seen earlier...
===Duplicate Categories/Tags===
* I haven't looked at the results since last week, but as I recall, there are many duplicate categories. My guess is that this is because in v1/v2, each application has to have its own category, so there might be an "Emo" category for Firefox and a different "Emo" category for Thunderbird, and the add-on is in both of them, but it only shows one because we grouped by name before. We will need to look at this and figure out whether to fix this in the code or migration. [[User:Fligtar|Fligtar]] 16:28, 27 December 2006 (PST)
* (lars) the database schema for the `tags` table requires that we duplicate the entries because it has foreign keys to both the addontypes and applications tables.  There are six tags called "Miscellaneous" because we need entries for the addontypes "extension" and "theme" for the applications "Firefox", "Mozilla" and "Thunderbird"
=== Other ===
* There are blank filenames in the `files` table - is this a result of the migration?
** (lars) empty filenames in the migration are faithful reproductions from the old database.  There are entries in the olddb version table where the URI column is blank. What would you like me to do with them?
*Some sha sums are calculated in the files table, some aren't.  Since we're migrating data, would it be useful to calculate hashes while we're at it?  (Would it be easily doable?)
=== Updates, 2007-02-07 ===
Things that we need to tweak:
* When there are multiple files per version/platform (one for each app, for example) AND there is at least one Firefox file, keep the firefox one and get rid of the non-firefox one.
        SELECT
            addons.guid as guid,
            addons.id as id,
            addons.addontype_id as type,
            applications.guid as appguid,
            appmin.version as min,
            appmax.version as max,
            files.filename,
            files.hash,
            versions.version as version
        FROM
            versions
        INNER JOIN addons ON versions.addon_id = addons.id AND addons.guid = '{c45c406e-ab73-11d8-be73-000a95be3b12}' AND addons.status = 4
        INNER JOIN applications_versions ON applications_versions.version_id = versions.id
        INNER JOIN applications ON applications_versions.application_id = applications.id  AND applications.guid = '{ec8030f7-c20a-464f-9b0e-13a3a9e97384}'     
        INNER JOIN appversions appmin ON appmin.id = applications_versions.min AND '2.0.0.2pre' >= appmin.version
        INNER JOIN appversions appmax ON appmax.id = applications_versions.max 
        INNER JOIN files ON files.version_id = versions.id AND (files.platform_id = 1 OR files.platform_id = '3' ) AND files.status = 4
        ORDER BY
            versions.id DESC
        LIMIT 1
In the case above, when using the migrated data set, the -mz file is returned because it is selected at random.  The file should actually be the -fx file.
* Verify main.description is getting trimmed at max 250 chars. when migrating to summary
== Final Migration ==
[[User:clouserw|Clouserw]] performed the final migration on 2007-03-05:
I expect this script will be retired at this point, but just to be complete, this is what happened with the final run:
* Initial AMOv2 database was loaded with a db dump from v2 from 2007-03-05 08:10am
*Initial AMOv3 database was loaded with [http://viewvc.svn.mozilla.org/vc/addons/trunk/site/app/config/sql/remora.sql?revision=2206&view=markup remora.sql from r2206]
*migration.conf was setup, etc.  Final command run was ''./migration.py -MAv --recalculateHash > migration.out 2>&1''.  We had to run the --recalculateHash flag so the script would download new addons (if it finds a hash in the db, it doesn't download the addon)
*The following previews failed to migrate (but there are blank entries for them in the db):
  https://addons.mozilla.org/images/previews/acid_burn-1.jpg for addon_id 453
  https://addons.mozilla.org/images/previews/acid_burn-2.jpg for addon_id 453
  https://addons.mozilla.org/images/previews/futurama-4.jpg for addon_id 1240
  https://addons.mozilla.org/images/previews/kiminbu?-1.jpg for addon_id 2904
  https://addons.mozilla.org/images/previews/ook?_video_ook!-4.jpg for addon_id 2584
  https://addons.mozilla.org/images/previews/webos_xul#_extension-1.jpg for addon_id 2945
  https://addons.mozilla.org/images/previews/ook?_video_ook!-5.jpg for addon_id 2584
  https://addons.mozilla.org/images/previews/print/print_preview-2.jpg for addon_id 1778
*Ran the [http://people.mozilla.org/~clouserw/public/remora/remora_supplement.final.sql supplemental SQL]
*''./migration.py --clearAddons -a `cat amov2_addons_marked_for_deletion`'' to remove the old disabled addons.  Just some housekeeping.
Confirmed users, Bureaucrats and Sysops emeriti
1,737

edits