Note that since the Myrinet upgrade on February 5, 2004, the following directions may seem a little over the top, given that there are now only 3 gigabit nodes. Chalk this up to management's desire to minimize the number of .html modifications needed.
Also note that, particularly until the WestGrid UBC/TRIUMF cluster is back on-line and stable, users should feel free to submit a reasonable number of serial jobs via the Myrinet batch queue, as described on this page.
As with the old cluster, there are no system-imposed limits on how many processors a single user can use at a given time. However, users are expected to be aware of, and considerate of, the needs of other users, and management reserves the right to impose restrictions should contention for resources become severe.
your-workstation% ssh [email protected]

Once you have logged in, use the avail command (see here for full usage information) to list the gig nodes in order of increasing load factor (i.e., from least to most busy):
head% avail
node055 0.00 0.00 0.00
node056 0.00 0.00 0.00
node057 0.00 0.00 0.00

Note that it will take a few seconds for the avail command to complete, as it must connect to each of the gig nodes and execute the uptime command there. The three columns of numbers listed by avail are the 1-, 5- and 15-minute load averages on the respective gig nodes. Roughly speaking, the load average can be interpreted as the number of CPU-intensive jobs currently running on the machine. Thus, a load average of 0.00 means that the machine is idle, while load averages of 1.00 and 2.00 indicate that one or two jobs, respectively, are running on the node. Since each node has two processors, a load average of 2.00 means that the node is essentially saturated.
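Based only on the description above, the behaviour of avail can be approximated by a short POSIX sh sketch. It is not the real avail script: the node list in GIG_NODES and the function names parse_uptime, sort_by_load and avail_sketch are hypothetical, and it assumes that rsh from the head node needs no password and that uptime prints its figures after "load average:" (some systems print "load averages:").

```sh
#!/bin/sh
# Hypothetical sketch of what avail does; NOT the actual avail implementation.

# Assumed list of gig node host names (taken from the sample listing above).
GIG_NODES="node055 node056 node057"

# Read one line of uptime output on stdin and print only the three load
# averages, e.g. "0.52 0.10 0.05". Accepts "load average:" or "load averages:".
parse_uptime() {
    sed -e 's/.*load averages*: *//' -e 's/,//g'
}

# Sort "node load1 load5 load15" lines by 1-minute load, least busy first.
sort_by_load() {
    sort -k2,2n
}

# Query each node in turn with rsh + uptime and print the sorted listing.
avail_sketch() {
    for node in $GIG_NODES; do
        echo "$node $(rsh "$node" uptime | parse_uptime)"
    done | sort_by_load
}
```

Loaded into an sh-compatible shell on the head node (e.g. with `. ./avail_sketch.sh`), calling avail_sketch would print a listing in the same format as the avail output above. Because the nodes are contacted one after another, it also takes a few seconds to finish.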
You should choose the nodes on which you will run from the top of the listing produced by avail. If all nodes have load averages of 2.00, you will have to wait until one or more nodes become available. DO NOT start a job on a machine that already has a load average of 2.00; this will only slow down overall throughput on the cluster.
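The selection rule above can be sketched as a small sh function that reads avail-style lines ("node load1 load5 load15") on standard input and prints the first N node names that still have a spare processor. The names pick_nodes and wait_for_node are hypothetical, as is the cut-off: a 1-minute load of at most 1.00 is treated as "at least one processor free", the same figure used when checking a node with top. It also assumes avail prints one node per line, as in the listing above.

```sh
#!/bin/sh
# Hypothetical helper: print up to $1 (default 1) node names whose 1-minute
# load average is at most 1.00, keeping avail's least-busy-first order.
# Input lines are assumed to look like "node055 0.00 0.00 0.00".
pick_nodes() {
    awk -v want="${1:-1}" '$2 + 0 <= 1.00 && count < want + 0 { print $1; count++ }'
}

# Hypothetical helper: poll avail until at least one node is free, then print
# its name. The 60-second polling interval is an arbitrary choice.
wait_for_node() {
    until node=$(avail | pick_nodes 1) && [ -n "$node" ]; do
        sleep 60
    done
    echo "$node"
}
```

For example, `avail | pick_nodes 2` would name two nodes to log in to, while wait_for_node blocks until avail reports a free node instead of starting a job on a saturated one.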
head% rsh node055

Again, note that the rsh command must be executed from the head node (not, e.g., from your local workstation), since the compute nodes are connected via a private local network. Once you have logged in to the node, run top for a few seconds to confirm that the load average is still 1.00 or less, since another user may have recently "claimed" the node. Then simply start your job as you would on any Unix workstation:
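If you prefer not to read the figure off top by eye, the same check can be done non-interactively on the node. This sketch assumes the compute nodes run Linux, so that the first field of /proc/loadavg is the 1-minute load average; the function names load_ok and current_load are hypothetical, and the 1.00 threshold is the one given above.

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the load average given as $1
# is at most 1.00, i.e. at least one of the node's two processors should be free.
load_ok() {
    awk -v load="$1" 'BEGIN { exit !(load + 0 <= 1.00) }'
}

# Hypothetical helper: print this node's current 1-minute load average.
# Assumes Linux, where /proc/loadavg starts with the 1-, 5- and 15-minute loads.
current_load() {
    cut -d ' ' -f 1 /proc/loadavg
}
```

From an sh-compatible shell on the node, `load_ok "$(current_load)" && echo free` prints "free" only when the node still appears to have a spare processor.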
node055% cd some-dir
node055% some-command

NOTE: If you run the job in the background and then log out of the node before it has finished, the job will continue to run; i.e., you do not have to explicitly nohup the job.
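Once you have logged out, anything a background job writes to your terminal is lost, so it is usually worth sending its output to a file. The hypothetical sh function run_bg below is one way to do that: it starts a command in the background inside a given directory, sends standard output and standard error to a log file in that directory, and prints the job's process ID. Nothing in it is specific to this cluster, and some-dir and some-command remain placeholders.

```sh
#!/bin/sh
# Hypothetical helper: run_bg DIR LOGFILE COMMAND [ARGS...]
# Starts COMMAND in the background with DIR as its working directory,
# writes its stdout and stderr to DIR/LOGFILE, and prints the background PID.
# Note: dir and log are ordinary (non-local) shell variables.
run_bg() {
    dir=$1
    log=$2
    shift 2
    # The subshell changes directory, then exec replaces it with the command,
    # so $! is the PID of the job itself.
    (cd "$dir" && exec "$@") > "$dir/$log" 2>&1 &
    echo "$!"
}
```

For example, `run_bg some-dir some-command.log some-command` started from an sh-compatible shell on the node returns immediately; the printed PID can later be looked up with ps or top, and the log file read once the job has finished.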