CIDuty/How To/Deprecated / Archived/Slave Management: Difference between revisions

From MozillaWiki
Latest revision as of 10:52, 23 April 2019

In general, slave management involves:

  • keeping as many slaves up as possible, including
    • proactively checking for hung/broken slaves - see the SlaveHealth dashboard.
    • returning re-imaged slaves to production
  • handling nagios alerts for slaves
  • interacting with IT regarding slave maintenance

Known failure modes

  • Hardware machines that become unreachable generally need a reboot:
    • all Mac OS machines (bld-lion-r5* and t-yosemite-r7*) are connected to PDUs.
    • Windows and Linux machines use IPMI.
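The host-to-mechanism split above can be captured in a small lookup when scripting triage. This is a minimal sketch, assuming hostnames follow the patterns listed above and that every other hardware host is a Windows or Linux machine reachable via IPMI; power_cycle_method and the host t-yosemite-r7-042 are hypothetical.

```python
from fnmatch import fnmatch

# Mac OS host patterns from the list above; these machines are on PDUs.
PDU_PATTERNS = ("bld-lion-r5*", "t-yosemite-r7*")


def power_cycle_method(hostname: str) -> str:
    """Return "PDU" for the Mac OS hosts listed above, otherwise "IPMI".

    Assumption: any other hardware host is a Windows or Linux machine
    with IPMI access.
    """
    if any(fnmatch(hostname, pattern) for pattern in PDU_PATTERNS):
        return "PDU"
    return "IPMI"


print(power_cycle_method("t-yosemite-r7-042"))  # hypothetical host -> PDU
print(power_cycle_method("t-w732-ix-001"))      # Windows host -> IPMI
```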

Automated

There are currently no automated mechanisms for recovering individual slaves.

  • AWS instances will automatically terminate when idle.

Manual

Rebooting slaves

Find the slave page on SlaveHealth. There's a button to reboot the machine.

Filing bugs for IT

  • File a bug using the link on the slave's SlaveHealth page; it will "do the right thing" and set up a new bug if needed.
  • File a "slave is unreachable" bug for IT.
      • Note: SlaveAPI does this automatically when it fails to reboot the machine.
  • Create dependent bugs for any IT actions. (As of 2017, we should file per-slave bugs for reboots instead of grouping machines in the same DC into one bug.)
    • they should block the per-host bug (for record keeping)
    • consider whether the slave should be disabled in slavealloc, and note that in the bug (no slave should be disabled without a detailed bug)
    • DCOps assumes that if there is no separate bug, they only need to reboot the machine and see it come back online.
    • e.g. bug 1420132.
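When the SlaveHealth link is not an option, the "slave is unreachable" bug can be pre-filled through Bugzilla's enter_bug.cgi. A minimal sketch, assuming the product and component are mozilla.org and "Server Operations: DCOps" (check Bugzilla for the current component); unreachable_bug_url is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

ENTER_BUG = "https://bugzilla.mozilla.org/enter_bug.cgi"


def unreachable_bug_url(hostname: str) -> str:
    """Build a pre-filled "HOST is unreachable" bug URL for IT (DCOps)."""
    params = {
        "product": "mozilla.org",                 # assumption: product for DCOps bugs
        "component": "Server Operations: DCOps",  # assumption: may have moved since
        "short_desc": f"{hostname} is unreachable",
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{ENTER_BUG}?{urlencode(params, quote_via=quote)}"


print(unreachable_bug_url("t-w732-ix-001"))
```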

Slave Tracking

Slave tracking is done via the Slave Allocator (slavealloc). Please disable/enable slaves in slavealloc. You can also disable them directly from the slave's page in SlaveHealth, e.g. t-w732-ix-001.

NOTE: you no longer need to add the slave-specific bug number to the Notes field. Clicking the help icon in slavealloc will look up the bug number and status for you, or create a template you can use to file a new bug. If there is another bug, e.g. for IT re-imaging, add that extra bug number to the Notes field instead, using the format 'bug #######.'
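If you script around slavealloc, the 'bug #######.' convention makes extra bug numbers easy to pull back out of the Notes field. A minimal sketch; format_extra_bug and bugs_in_notes are hypothetical helpers, not part of slavealloc.

```python
import re

# Matches "bug 1234567" anywhere in a Notes string (case-insensitive).
BUG_RE = re.compile(r"\bbug (\d+)", re.IGNORECASE)


def format_extra_bug(bug_number: int) -> str:
    """Format an extra bug reference as 'bug #######.'."""
    return f"bug {bug_number}."


def bugs_in_notes(notes: str) -> list[int]:
    """Return every bug number mentioned in a Notes field, in order."""
    return [int(number) for number in BUG_RE.findall(notes)]


print(format_extra_bug(1420132))  # bug 1420132.
print(bugs_in_notes("bug 917923 - to be converted into try hosts"))  # [917923]
```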

Slavealloc

Connecting

Slaves are added to slavealloc via the 'dbimport' subcommand of the 'slavealloc' command.

You will need to ssh as your own user onto the server which hosts slavealloc:

ssh <your user>@relengwebadm.private.scl3.mozilla.com 

Staging vs production

The DB URLs for staging and production are shared in a PGP-encrypted file used by the Release Engineering team. Ask someone else on the team if you do not have this file.

Adding a slave

Once connected to relengwebadm (see above), run the following to see the help for the slavealloc dbimport command:

/data/releng/www/slavealloc/slavealloc dbimport -h

To import data, first you need to create a CSV file, like this one:

name,distro,bitlength,speed,datacenter,trustlevel,environment,purpose,pool,basedir
t-w864-ix-236,win8,64,ix,scl3,try,prod,tests,tests-inhouse-windows,C:\slave
t-w864-ix-237,win8,64,ix,scl3,try,prod,tests,tests-inhouse-windows,C:\slave
t-w864-ix-238,win8,64,ix,scl3,try,prod,tests,tests-inhouse-windows,C:\slave
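For a batch of similarly named machines, the CSV above can be generated instead of typed by hand. A minimal sketch using Python's csv module, assuming three-digit, zero-padded host numbers as in the example; slave_rows and write_slave_csv are hypothetical helpers, and the column values must still be valid slavealloc values.

```python
import csv
import io

# Column order used by the slave CSV example above.
FIELDS = ["name", "distro", "bitlength", "speed", "datacenter",
          "trustlevel", "environment", "purpose", "pool", "basedir"]


def slave_rows(prefix, first, last, **common):
    """Yield one row per host number; assumes zero-padded 3-digit suffixes."""
    for number in range(first, last + 1):
        yield {"name": f"{prefix}{number:03d}", **common}


def write_slave_csv(rows, out):
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Written to a StringIO here; use open("slaves.csv", "w", newline="") for a real file.
buf = io.StringIO()
write_slave_csv(
    slave_rows("t-w864-ix-", 236, 238, distro="win8", bitlength=64, speed="ix",
               datacenter="scl3", trustlevel="try", environment="prod",
               purpose="tests", pool="tests-inhouse-windows", basedir="C:\\slave"),
    buf,
)
print(buf.getvalue())
```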

You'll want a command line something like:

/data/releng/www/slavealloc/slavealloc dbimport -D mysql://user:password@host/DB_name --slave-data <the csv file you just created containing slaves>
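When running the import from a script, building the argument list explicitly avoids shell-quoting problems with the password in the DB URL. A minimal sketch: the SLAVEALLOC_DB_URL environment variable and the dbimport_argv helper are hypothetical; the real URL comes from the team's PGP-encrypted file.

```python
import os
import shlex
import subprocess

SLAVEALLOC = "/data/releng/www/slavealloc/slavealloc"


def dbimport_argv(db_url, slave_csv=None, master_csv=None):
    """Build a 'slavealloc dbimport' command line for slave and/or master CSVs."""
    argv = [SLAVEALLOC, "dbimport", "-D", db_url]
    if slave_csv:
        argv += ["--slave-data", slave_csv]
    if master_csv:
        argv += ["--master-data", master_csv]
    return argv


if __name__ == "__main__" and "SLAVEALLOC_DB_URL" in os.environ:
    argv = dbimport_argv(os.environ["SLAVEALLOC_DB_URL"], slave_csv="slaves.csv")
    # Echo the command without the DB URL (it contains the password).
    print(shlex.join(argv[:3] + ["<db url>"] + argv[4:]))
    subprocess.run(argv, check=True)
```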

Adding a master

Adding masters is similar to adding a slave:

/data/releng/www/slavealloc/slavealloc dbimport -D mysql://user:password@host/DB_name  --master-data <csv file containing masters>

The following example shows the required fields and example values:

nickname,fqdn,http_port,pb_port,datacenter,pool
bm141-tests1-linux32,buildbot-master141.bb.releng.use1.mozilla.com,8201,9201,scl3,tests-use1-linux32
bm142-tests1-linux32,buildbot-master142.bb.releng.usw2.mozilla.com,8201,9201,scl3,tests-usw2-linux32
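Before importing masters, a quick sanity check of the CSV can catch a wrong header, missing values, non-numeric ports, or duplicate nicknames. A minimal sketch; check_master_csv is a hypothetical helper, and it does not check that datacenter or pool are allowed values.

```python
import csv
import io

MASTER_FIELDS = ["nickname", "fqdn", "http_port", "pb_port", "datacenter", "pool"]


def check_master_csv(text: str) -> list[str]:
    """Return a list of problems found in a master CSV (empty if it looks sane)."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != MASTER_FIELDS:
        return [f"unexpected header: {reader.fieldnames}"]
    problems, seen = [], set()
    # Data starts on line 2, after the header.
    for lineno, row in enumerate(reader, start=2):
        for field in MASTER_FIELDS:
            if not (row.get(field) or "").strip():
                problems.append(f"line {lineno}: missing {field}")
        for port in ("http_port", "pb_port"):
            value = row.get(port) or ""
            if value and not value.isdigit():
                problems.append(f"line {lineno}: {port} is not a number: {value!r}")
        if row["nickname"] in seen:
            problems.append(f"line {lineno}: duplicate nickname {row['nickname']}")
        seen.add(row["nickname"])
    return problems


SAMPLE = "\n".join([
    "nickname,fqdn,http_port,pb_port,datacenter,pool",
    "bm141-tests1-linux32,buildbot-master141.bb.releng.use1.mozilla.com,8201,9201,scl3,tests-use1-linux32",
    "bm142-tests1-linux32,buildbot-master142.bb.releng.usw2.mozilla.com,8201,9201,scl3,tests-usw2-linux32",
]) + "\n"
print(check_master_csv(SAMPLE))  # []
```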

To get a full list of allowed values for the various normalized fields to use in both import files, you can connect to the mysql database and query the tables directly:

SELECT name FROM bitlengths;
SELECT name FROM datacenters;
SELECT name FROM distros;
SELECT name FROM environments;
SELECT name FROM pools;
SELECT name FROM purposes;
SELECT name FROM speeds;
SELECT name FROM trustlevels;

The values in your CSV file must match these allowed values.
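Once you have the results of those queries, a slave CSV can be checked against them before running dbimport. A minimal sketch, assuming each CSV column maps to the table whose name is its plural (distro to distros, and so on) and that the allowed values have already been fetched into a dict; invalid_values and the sample ALLOWED sets are hypothetical.

```python
import csv
import io

# CSV column -> lookup table queried with "SELECT name FROM <table>;".
COLUMN_TO_TABLE = {
    "bitlength": "bitlengths", "datacenter": "datacenters", "distro": "distros",
    "environment": "environments", "pool": "pools", "purpose": "purposes",
    "speed": "speeds", "trustlevel": "trustlevels",
}


def invalid_values(csv_text, allowed):
    """Return (name, column, value) for every value not in the allowed sets."""
    errors = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, table in COLUMN_TO_TABLE.items():
            value = row.get(column)
            if value is not None and value not in allowed.get(table, set()):
                errors.append((row.get("name"), column, value))
    return errors


# Hypothetical allowed values; in practice fill each set from the queries above.
ALLOWED = {
    "bitlengths": {"32", "64"}, "datacenters": {"scl3"}, "distros": {"win8"},
    "environments": {"prod"}, "pools": {"tests-inhouse-windows"},
    "purposes": {"tests"}, "speeds": {"ix"}, "trustlevels": {"try"},
}
SAMPLE_CSV = (
    "name,distro,bitlength,speed,datacenter,trustlevel,environment,purpose,pool,basedir\n"
    "t-w864-ix-236,win8,64,ix,scl3,try,prod,tests,tests-inhouse-windows,C:\\slave\n"
)
print(invalid_values(SAMPLE_CSV, ALLOWED))                          # []
print(invalid_values(SAMPLE_CSV.replace("win8", "win9"), ALLOWED))  # [('t-w864-ix-236', 'distro', 'win9')]
```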

The slavealloc dbimport mechanism converts each line of the CSV file into an SQL INSERT statement. Fields that are not specified are essentially set to NULL. To see how the fields are mapped and normalized, see lines 111-137 of https://hg.mozilla.org/build/tools/file/5439f10a7127/lib/python/slavealloc/scripts/dbimport.py#l111.
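As a rough illustration of that conversion (not the actual dbimport.py code, which also maps normalized names such as pool or distro to their ids), a CSV row can be turned into a parameterized INSERT in which missing or empty fields become NULL. row_to_insert is hypothetical; the %s placeholders follow the paramstyle of the common MySQL DB-API drivers.

```python
def row_to_insert(table, columns, row):
    """Return (sql, params) for one CSV row; missing or empty fields become None (NULL).

    Table and column names are interpolated directly, so only pass trusted constants.
    """
    params = tuple(row.get(column) or None for column in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, params


sql, params = row_to_insert("slaves", ["name", "notes"], {"name": "t-w864-ix-236"})
print(sql)     # INSERT INTO slaves (name, notes) VALUES (%s, %s)
print(params)  # ('t-w864-ix-236', None)
```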

Moving slaves

Connect to relengwebadm and then connect to the MySQL DB.

You have to determine the correct poolid and trustid values first, for example:

UPDATE slaves SET poolid=43, trustid=4 WHERE notes LIKE 'bug 917923 - to be converted into try hosts';
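Rather than hard-coding ids like 43 and 4, you can look them up by name and update in one step. A minimal sketch, demonstrated against an in-memory SQLite database so it runs anywhere; the id column names in pools and trustlevels (poolid, trustid), the pool name example-try-pool, and the host names are assumptions, and against the real MySQL DB you would use your driver's %s placeholders instead of ?.

```python
import sqlite3


def lookup_id(conn, table, id_column, name):
    # table/id_column are interpolated, so only pass trusted constants.
    row = conn.execute(f"SELECT {id_column} FROM {table} WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise LookupError(f"no {table} entry named {name!r}")
    return row[0]


def move_slaves(conn, notes_pattern, pool_name, trust_name):
    """Move every slave whose notes match notes_pattern; return the number moved."""
    poolid = lookup_id(conn, "pools", "poolid", pool_name)
    trustid = lookup_id(conn, "trustlevels", "trustid", trust_name)
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE slaves SET poolid = ?, trustid = ? WHERE notes LIKE ?",
            (poolid, trustid, notes_pattern),
        )
    return cur.rowcount


# Demo with an assumed, heavily simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pools (poolid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE trustlevels (trustid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE slaves (name TEXT, poolid INTEGER, trustid INTEGER, notes TEXT);
    INSERT INTO pools VALUES (43, 'example-try-pool');
    INSERT INTO trustlevels VALUES (4, 'try');
    INSERT INTO slaves VALUES ('host-1', 1, 1, 'bug 917923 - to be converted into try hosts');
    INSERT INTO slaves VALUES ('host-2', 1, 1, 'something else');
""")
print(move_slaves(conn, "bug 917923 - to be converted into try hosts",
                  "example-try-pool", "try"))  # 1
```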

Removing slaves

Connect to relengwebadm and then connect to the MySQL DB. Check which slaves match before deleting them:

 SELECT name FROM slaves WHERE notes LIKE '%bumblebumble%';
 DELETE FROM slaves WHERE notes LIKE '%bumblebumble%';
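The SELECT-then-DELETE order above matters because a LIKE pattern can match more rows than intended. A minimal sketch that refuses to delete unless the matched names are exactly the ones you expect, demonstrated on an in-memory SQLite table; remove_slaves and the host names are hypothetical.

```python
import sqlite3


def remove_slaves(conn, notes_pattern, expected_names):
    """Delete slaves whose notes match, but only if they are exactly the expected set."""
    matched = {name for (name,) in conn.execute(
        "SELECT name FROM slaves WHERE notes LIKE ?", (notes_pattern,))}
    if matched != set(expected_names):
        raise RuntimeError(f"pattern matched {sorted(matched)}; refusing to delete")
    with conn:  # committed only if the DELETE succeeds
        cur = conn.execute("DELETE FROM slaves WHERE notes LIKE ?", (notes_pattern,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slaves (name TEXT, notes TEXT)")
conn.executemany("INSERT INTO slaves VALUES (?, ?)", [
    ("host-1", "bumblebumble"), ("host-2", "bumblebumble"), ("host-3", "keep me"),
])
print(remove_slaves(conn, "%bumblebumble%", ["host-1", "host-2"]))  # 2
```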

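Returning a re-imaged slave to production

See ReleaseEngineering/How_To/Set_Up_a_Freshly_Imaged_Slave.

How to decommission a slave

See https://wiki.mozilla.org/ReleaseEngineering/How_To/Decommission_Slave.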

Windows

I'm hoping to add enough info to demystify Windows and allow anyone to debug a Windows machine.

Start up flow

This is how buildbot starts:

scheduled task (after login) -> start talos bat -> C:\slave\runslave.py

We log the startup of runslave.py to C:\slave\runslave.log. We also do some clean-up steps inside the .bat file.

TODO: correct file names and paths
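When buildbot does not come up after a reboot, the end of runslave.log is a good first place to look. A minimal sketch for printing the last lines of the log; last_lines and tail are hypothetical helpers, the UTF-8 encoding is an assumption, and the path is the one given above (subject to the TODO).

```python
import sys
from collections import deque

RUNSLAVE_LOG = r"C:\slave\runslave.log"


def last_lines(stream, count=20):
    """Return the last `count` lines of an iterable of lines, keeping only those in memory."""
    return [line.rstrip("\n") for line in deque(stream, maxlen=count)]


def tail(path, count=20):
    with open(path, encoding="utf-8", errors="replace") as log:
        return last_lines(log, count)


if __name__ == "__main__" and sys.platform == "win32":
    print("\n".join(tail(RUNSLAVE_LOG)))
```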

Trigger buildbot the natural way

You're logged in and you want to trigger buildbot the same way as if the machine had come back from a reboot.

Go to the Task Library, change the properties of the scheduled task to allow it to be run manually, and hit "Run" on the task (more or less).

Infra setup

The Windows machines are managed via GPO.

The Windows test machines have both the on-board graphics card and a third-party graphics card. The screenshot (Xp - two graphic cards.png) shows the two devices listed.

root vs .\root

You want to use .\root to use the local admin user rather than the remote one.

Fix 2nd monitor

(From Q) On all of the machines there is a script, c:\monitor_config\fakemon.vbs, that detects whether the second screen is missing, adds it if necessary, and then adjusts the resolution.

Windows basics

Command Prompt

Also known as cmd.exe. You can start it by clicking the "Start" button, then "Run...", and entering cmd.

Quick edit mode

You can change the properties of a Command Prompt to allow you to do these neat things:

  • right-click to paste
  • select with mouse and press enter to copy from selected text

To enable this, right-click on the Command Prompt window and change its properties. You can also change the default settings for Command Prompts opened in the future.

If I recall correctly, this feature was requested for RelOps to deploy to all of our Windows machines.

runas

In many places you can right-click and run a process as root. However, sometimes you want to do that from the command prompt:

runas /user:root command_that_you_want

Screen resolution

To change it manually, right-click on the desktop and click "Properties", then click the "Settings" tab.

A while ago I wrote a script that adjusts the screen resolution on Win7 machines: http://hg.mozilla.org/build/tools/file/default/scripts/support/mouse_and_screen_resolution.py

The script also contains code to query screen resolutions.

We should find a way to prevent machines from starting up with screen resolutions that are too small. We could use runslave.py or start-buildbot.bat for this (since we don't have pre-flight tasks yet).
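A pre-flight check along those lines could look like this. A minimal sketch, not what runslave.py currently does: it assumes a hypothetical minimum of 1280x1024, considers only the primary display, and reads the resolution with the Win32 GetSystemMetrics call (SM_CXSCREEN/SM_CYSCREEN) via ctypes.

```python
import sys

MIN_WIDTH, MIN_HEIGHT = 1280, 1024  # hypothetical minimum; adjust per test suite

SM_CXSCREEN, SM_CYSCREEN = 0, 1  # GetSystemMetrics indices for the primary display


def resolution_ok(width, height, min_width=MIN_WIDTH, min_height=MIN_HEIGHT):
    return width >= min_width and height >= min_height


def primary_resolution():
    """Return (width, height) of the primary display; Windows only."""
    import ctypes
    user32 = ctypes.windll.user32
    return user32.GetSystemMetrics(SM_CXSCREEN), user32.GetSystemMetrics(SM_CYSCREEN)


if __name__ == "__main__" and sys.platform == "win32":
    width, height = primary_resolution()
    if not resolution_ok(width, height):
        # A non-zero exit lets a start-up script refuse to start buildbot.
        sys.exit(f"screen resolution {width}x{height} is below {MIN_WIDTH}x{MIN_HEIGHT}")
```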

Registry

You can start the registry editor by running "regedit".

Rebooting

You can run this command (Start->Run...):

shutdown -f -r -t 0
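The same reboot can be issued remotely over ssh. A minimal sketch that only builds the command line: Windows hosts use the shutdown command above, while Linux and Mac hosts are assumed to use sudo reboot with passwordless sudo for your user; reboot_argv is hypothetical.

```python
import shlex


def reboot_argv(host, windows, user=None):
    """Build an ssh command that reboots `host` immediately."""
    target = f"{user}@{host}" if user else host
    remote = ["shutdown", "-f", "-r", "-t", "0"] if windows else ["sudo", "reboot"]
    return ["ssh", target] + remote


print(shlex.join(reboot_argv("t-w732-ix-001", windows=True)))
# ssh t-w732-ix-001 shutdown -f -r -t 0
```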

Computer Management

Right-click on "My Computer" and click "Manage".
(Screenshot: Xp - computer management.png)

Check logs

You can review the logs of the Windows machine to debug issues. You should see things like reboot times. (Screenshot: Xp - looking at logs.png)

Task Library - talosslave task

(Screenshots: W7 - task library.png, W8 - task library.png)

On Win7 and Win8 you can right-click on the "Computer" icon and click "Manage". For Win8, you will need to enter the admin credentials.

This takes you to the "Computer Management" window. Expand the following to reach the Task Scheduler Library:

  • System Tools
  • Task Scheduler
  • Task Scheduler Library

You should see "talosslave" listed there; it takes care of starting buildbot/runslave.py.
NOTE: To run the talosslave task manually, you might need to change the task's properties.
NOTE2: I have not figured this out for WinXP.
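On Win7 and Win8, the same task can also be inspected or started from an elevated Command Prompt with the built-in schtasks tool instead of the Task Scheduler UI. A minimal sketch that builds those command lines; schtasks_argv is hypothetical, and starting the task may still require its properties to allow on-demand runs, as noted above.

```python
import subprocess
import sys

TASK_NAME = "talosslave"


def schtasks_argv(action, task=TASK_NAME):
    """Build a schtasks command to query ("query") or start ("run") a scheduled task."""
    if action == "query":
        return ["schtasks", "/Query", "/TN", task, "/V", "/FO", "LIST"]
    if action == "run":
        return ["schtasks", "/Run", "/TN", task]
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__" and sys.platform == "win32":
    subprocess.run(schtasks_argv("query"), check=True)
```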

Working graphical setup

Xp

NOTE: Unlike on Win7, you see two monitors.
(Screenshot: Xp - working screen setup.png)

Win7

NOTE: Only one monitor shows up.
(Screenshot: W7 - screen resolution.png)

Win8

NOTE: Unlike on Win7, you see two monitors here.
(Screenshot: W8 - screen resolution.png)