Confirmed users, Bureaucrats and Sysops emeriti
1,737
edits
(29 intermediate revisions by 4 users not shown) | |||
Line 11: | Line 11: | ||
The script is called migration.py and lives in svn at .../trunk/services/bin | The script is called migration.py and lives in svn at .../trunk/services/bin | ||
The script has | The script requires Python 2.4 to work correctly. Versions before 2.4 will likely produce syntax errors like this one: | ||
File "./migration.py", line 961 | |||
delimitedListOfVersionIdsThatShouldBeDeleted = ",".join((str(x.id) for x in newDB.executeSql("select v.id from versions v where addon_id in (%s)" % delimitedListOfAddons))) | |||
SyntaxError: invalid syntax | |||
The tests require the use of "generator expressions" that were not available until 2.4. | |||
Also, keep in mind that you shouldn't expect existing data in the target database to persist. You should run the script on a clean target database. | |||
The script has three Python library dependencies: | |||
* cse -- using svn at GoogleCode: http://csepython.googlecode.com/svn/trunk/ | * cse -- using svn at GoogleCode: http://csepython.googlecode.com/svn/trunk/ | ||
* MySQLdb -- http://sourceforge.net/projects/mysql-python | * MySQLdb -- http://sourceforge.net/projects/mysql-python | ||
* Crypto -- http://www.amk.ca/python/code/crypto | |||
===Use=== | |||
Disclaimer | |||
migration.py is a sharp edged tool. It makes it easy to completely screw up the | |||
data in the database, therefore it must be used carefully. migration.py logs its | |||
every use, so if something bad does happen, you can go back through the log to | |||
see what went wrong and perhaps assign blame. | |||
To use the script, just invoke it on the command line with appropriate options. The complete list of options can be found with this command: | |||
./migration.py --help | |||
===Command Line Options=== | ===Command Line Options=== | ||
Line 40: | Line 64: | ||
specify the location and name of the config file | specify the location and name of the config file | ||
(default: ./migration.conf) | (default: ./migration.conf) | ||
--fileCachePath | |||
a path for the caching of files (blank for no cache) | |||
the path should no end in / | |||
-?, --help | -?, --help | ||
print this message | print this message | ||
Line 73: | Line 100: | ||
--oldUserName | --oldUserName | ||
the name of the user in the old database (default: ) | the name of the user in the old database (default: ) | ||
--previewURIPrefix | |||
a prefix to add to the URI in the preview table to enable | |||
downloading (default: https://addons.mozilla.org) | |||
--tests | --tests | ||
run the test cases | run the test cases | ||
Line 244: | Line 274: | ||
|- | |- | ||
|AppName||translations.localized_string<br>''insert id into'' applications.shortname | |AppName||translations.localized_string<br>''insert id into'' applications.shortname | ||
|- | |||
|supported||supported | |||
|- | |- | ||
|major||''no mapping'' | |major||''no mapping'' | ||
Line 307: | Line 339: | ||
|UserID||id | |UserID||id | ||
|- | |- | ||
|UserName||parse to firstname, lastname | |UserName||''parse to'' firstname, lastname<br/>''lastname is always the last word from Username<br/>firstname is all but the last word or "?" if there are no other words'' | ||
|- | |- | ||
| ||nickname<br/>'''no source for data, Null not allowed, no default - I put in an empty string''' | | ||nickname<br/>'''no source for data, Null not allowed, no default - I put in an empty string''' | ||
Line 379: | Line 411: | ||
| ||icon<br/>''use default value'' | | ||icon<br/>''use default value'' | ||
|- | |- | ||
|Description||translations.localized_string<br/>''take up to 250 characters, do not break inside a word, add ellipsis if break is not at the end of sentence insert into translations, then fetch id''<br/>summary | |Description||translations.localized_string<br/>''take up to 250 characters but not more than the first line, do not break inside a word, add ellipsis if break is not at the end of sentence insert into translations, then fetch id''<br/>summary | ||
|- | |- | ||
| ||inactive<br/>''use default value'' | | ||inactive<br/>''use default value'' | ||
Line 458: | Line 490: | ||
|DateApproved||files.dateapproved | |DateApproved||files.dateapproved | ||
|- | |- | ||
|approved||files.approved<br/>''mapping -- "YES": | |approved||files.approved<br/>''mapping -- "YES": 4, "NO": 1, "?":2, "DISABLED":5'' | ||
|- | |- | ||
| ||files.created<br/>''time.now'' | | ||files.created<br/>''time.now'' | ||
Line 480: | Line 512: | ||
|PreviewID||id | |PreviewID||id | ||
|- | |- | ||
|PreviewURI|| | |PreviewURI||filedata<br/>''data from download'' | ||
|- | |||
|PreviewURI||filetype<br/>''the MIME type from download'' | |||
|- | |||
|PreviewURI||thumbdata<br/>''data from download cooked into an image of 200x150 with aspect ratio preserved (add transparent filler if needed)'' | |||
|- | |||
|PreviewURI||thumbtype<br/>''the MIME type from download'' | |||
|- | |- | ||
|ID||addon_id | |ID||addon_id | ||
|- | |- | ||
|caption||translations.localized_string<br>''insert id into'' caption | |caption||translations.localized_string<br>''insert id from translations into into'' caption | ||
|- | |- | ||
|preview||highlight<br/>''conversion -- "YES":1, "NO":0'' | |preview||highlight<br/>''conversion -- "YES":1, "NO":0'' | ||
Line 496: | Line 534: | ||
If we don't migrate add-ons which are only compatible with fx 1.0, then users can upload add-ons with the same em:ID and then users who might have had the old add-ons still installed but disabled (or who are still using fx 1.0) will be updated to the new one. | If we don't migrate add-ons which are only compatible with fx 1.0, then users can upload add-ons with the same em:ID and then users who might have had the old add-ons still installed but disabled (or who are still using fx 1.0) will be updated to the new one. | ||
(clouserw) I think there were some duplications within the `files` table. For example: | === Previews === | ||
* No previews were migrated -- but we need to solve this issue to help get images over. | |||
* In order to pull them we'd need to generate the URI and grab them so we can insert them into the proper blob in the previews table | |||
Old db: | |||
mysql> describe previews; | |||
+------------+------------------+------+-----+---------+----------------+ | |||
| Field | Type | Null | Key | Default | Extra | | |||
+------------+------------------+------+-----+---------+----------------+ | |||
| PreviewID | int(11) | | PRI | NULL | auto_increment | | |||
| PreviewURI | varchar(200) | | | | | | |||
| ID | int(5) | | MUL | 0 | | | |||
| caption | varchar(255) | | | | | | |||
| preview | enum('YES','NO') | | | NO | | | |||
+------------+------------------+------+-----+---------+----------------+ | |||
5 rows in set (0.00 sec) | |||
* "PreviewURI" column points to the relative path of the preview. | |||
* "ID" is the FK to main.ID | |||
* "caption" would get ported to dynamic l10n | |||
* "preview" would be "highlight" in the new previews table | |||
Moving to new db: | |||
* Use "https://addons.mozilla.org".$PreviewURI to grab the image and insert it into the filedata blog in the new db. | |||
* Filetype should be preserved as well so it can display properly in the images controller. | |||
=== Duplicate Files === | |||
* (clouserw) I think there were some duplications within the `files` table. For example: | |||
mysql> select * from files where filename like 'stumbleupon-2.1%'; | mysql> select * from files where filename like 'stumbleupon-2.1%'; | ||
+--------+------------+-------------+---------------------------------+------+------+------------+--------+---------------------+---------------------+---------------------+ | +--------+------------+-------------+---------------------------------+------+------+------------+--------+---------------------+---------------------+---------------------+ | ||
Line 507: | Line 572: | ||
+--------+------------+-------------+---------------------------------+------+------+------------+--------+---------------------+---------------------+---------------------+ | +--------+------------+-------------+---------------------------------+------+------+------------+--------+---------------------+---------------------+---------------------+ | ||
4 rows in set (0.00 sec) | 4 rows in set (0.00 sec) | ||
* (morgamic) I think the script may have created duped rows in files because in the old 'version' table rows would be duped except for the app/appversion info -- in this database that data is spliced off. Might want to start there. | |||
* (lars) files are duplicated because sometimes the the same physical file is specified for more than one application (Firefox & Mozilla) - it is very easy to prevent this from resulting in two entries in to the file table. However, if the two entries may not match exactly (hash, approval, etc), the one seen later will superceed the one seen earlier... | |||
===Duplicate Categories/Tags=== | |||
* I haven't looked at the results since last week, but as I recall, there are many duplicate categories. My guess is that this is because in v1/v2, each application has to have its own category, so there might be an "Emo" category for Firefox and a different "Emo" category for Thunderbird, and the add-on is in both of them, but it only shows one because we grouped by name before. We will need to look at this and figure out whether to fix this in the code or migration. [[User:Fligtar|Fligtar]] 16:28, 27 December 2006 (PST) | |||
* (lars) the database schema for the `tags` table requires that we duplicate the entries because it has foreign keys to both the addontypes and applications tables. There are six tags called "Miscellaneous" because we need entries for the addontypes "extension" and "theme" for the applications "Firefox", "Mozilla" and "Thunderbird" | |||
=== Other === | |||
* There are blank filenames in the `files` table - is this a result of the migration? | |||
** (lars) empty filenames in the migration are faithful reproductions from the old database. There are entries in the olddb version table where the URI column is blank. What would you like me to do with them? | |||
*Some sha sums are calculated in the files table, some aren't. Since we're migrating data, would it be useful to calculate hashes while we're at it? (Would it be easily doable?) | |||
=== Updates, 2007-02-07 === | |||
Things that we need to tweak: | |||
* When there are multiple files per version/platform (one for each app, for example) AND there is at least one Firefox file, keep the firefox one and get rid of the non-firefox one. | |||
SELECT | |||
addons.guid as guid, | |||
addons.id as id, | |||
addons.addontype_id as type, | |||
applications.guid as appguid, | |||
appmin.version as min, | |||
appmax.version as max, | |||
files.filename, | |||
files.hash, | |||
versions.version as version | |||
FROM | |||
versions | |||
INNER JOIN addons ON versions.addon_id = addons.id AND addons.guid = '{c45c406e-ab73-11d8-be73-000a95be3b12}' AND addons.status = 4 | |||
INNER JOIN applications_versions ON applications_versions.version_id = versions.id | |||
INNER JOIN applications ON applications_versions.application_id = applications.id AND applications.guid = '{ec8030f7-c20a-464f-9b0e-13a3a9e97384}' | |||
INNER JOIN appversions appmin ON appmin.id = applications_versions.min AND '2.0.0.2pre' >= appmin.version | |||
INNER JOIN appversions appmax ON appmax.id = applications_versions.max | |||
INNER JOIN files ON files.version_id = versions.id AND (files.platform_id = 1 OR files.platform_id = '3' ) AND files.status = 4 | |||
ORDER BY | |||
versions.id DESC | |||
LIMIT 1 | |||
In the case above, when using the migrated data set, the -mz file is returned because it is selected at random. The file should actually be the -fx file. | |||
* Verify main.description is getting trimmed at max 250 chars. when migrating to summary | |||
== Final Migration == | |||
[[User:clouserw|Clouserw]] performed the final migration on 2007-03-05: | |||
I expect this script will be retired at this point, but just to be complete, this is what happened with the final run: | |||
* Initial AMOv2 database was loaded with a db dump from v2 from 2007-03-05 08:10am | |||
*Initial AMOv3 database was loaded with [http://viewvc.svn.mozilla.org/vc/addons/trunk/site/app/config/sql/remora.sql?revision=2206&view=markup remora.sql from r2206] | |||
*migration.conf was setup, etc. Final command run was ''./migration.py -MAv --recalculateHash > migration.out 2>&1''. We had to run the --recalculateHash flag so the script would download new addons (if it finds a hash in the db, it doesn't download the addon) | |||
*The following previews failed to migrate (but there are blank entries for them in the db): | |||
https://addons.mozilla.org/images/previews/acid_burn-1.jpg for addon_id 453 | |||
https://addons.mozilla.org/images/previews/acid_burn-2.jpg for addon_id 453 | |||
https://addons.mozilla.org/images/previews/futurama-4.jpg for addon_id 1240 | |||
https://addons.mozilla.org/images/previews/kiminbu?-1.jpg for addon_id 2904 | |||
https://addons.mozilla.org/images/previews/ook?_video_ook!-4.jpg for addon_id 2584 | |||
https://addons.mozilla.org/images/previews/webos_xul#_extension-1.jpg for addon_id 2945 | |||
https://addons.mozilla.org/images/previews/ook?_video_ook!-5.jpg for addon_id 2584 | |||
https://addons.mozilla.org/images/previews/print/print_preview-2.jpg for addon_id 1778 | |||
*Ran the [http://people.mozilla.org/~clouserw/public/remora/remora_supplement.final.sql supplemental SQL] | |||
*''./migration.py --clearAddons -a `cat amov2_addons_marked_for_deletion`'' to remove the old disabled addons. Just some housekeeping. |