Mikael Johansson Jun 6, 2005:

There are now limits on how much memory a job can use. The reason is that
exceeding the physical memory on a node leads to horribly slow performance
due to swapping.

Also, the number of queues has increased to six: three phys-queues and
three chem-queues. In practice this means that you have to estimate how
much memory your job will need, as the different queues allow different
amounts of memory.

The queues, their memory limits and slot amounts are:

name               limit  slots
-------------------------------
test.q           1950 MB      1
phys-smallmem.q   300 MB      6
phys.q            950 MB     88
phys-largemem.q  1600 MB      6
chem-smallmem.q   350 MB      6
chem.q           1950 MB     20
chem-largemem.q  3550 MB      6

Future modifications of the limits and the number of CPUs dedicated to a
specific queue are possible, depending on your needs. This will not affect
the instructions below.
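
As a rough aid for the estimate, the table above can be turned into a small
lookup. This is only a sketch: smallest_queue is a made-up helper name, the
numbers are copied from the table as it stands today, and a job that needs
exactly the limit is treated as fitting, so leave yourself some headroom.

```shell
# smallest_queue PART MB   (hypothetical helper, not part of SGE)
# PART is "phys" or "chem"; MB is the estimated peak memory in megabytes.
# Prints the smallest queue from the table whose limit covers MB.
# Returns 1 if no queue of that part is large enough, 2 for an unknown part.
smallest_queue() {
    part=$1
    need=$2
    case $part in
        phys) small=300; medium=950;  large=1600 ;;
        chem) small=350; medium=1950; large=3550 ;;
        *)    echo "unknown part: $part" >&2; return 2 ;;
    esac
    if   [ "$need" -le "$small" ];  then echo "$part-smallmem.q"
    elif [ "$need" -le "$medium" ]; then echo "$part.q"
    elif [ "$need" -le "$large" ];  then echo "$part-largemem.q"
    else
        echo "no $part queue allows $need MB" >&2
        return 1
    fi
}
```

For example, 'smallest_queue phys 800' prints phys.q, and
'smallest_queue chem 2000' prints chem-largemem.q.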


HOW TO SPECIFY THE QUEUE
========================

There are several ways of choosing which queue(s) you want to use. As
before, you can of course specify the queue with the '-q' parameter, for
example:
#$ -q phys-smallmem.q

A more efficient, and perhaps even simpler, way is to choose which part
you belong to (phys or chem), select how much memory you need (small,
medium, large), and let SGE pick a suitable queue. The cluster part can
then be selected with, e.g., '-q phys*'. The memory amount can be
specified by either of two mechanisms:

Method 1        Method 2
-l memtype=1    -l smallmem=true
-l memtype=2    -l mediummem=true
-l memtype=3    -l largemem=true

A few hopefully clarifying examples:

Ex.1:
#$ -q chem-smallmem.q       <- Will only go to chem-smallmem.q

Ex.2:
#$ -q phys*                 <- Will go to the first available phys-queue
                               with free CPUs, in the order
                               phys-smallmem.q, phys.q, phys-largemem.q

Ex.3:
#$ -q chem*                 <- Will go to the first available chem-queue
#$ -l mediummem=true           with free CPUs and at least a "medium"
                               amount of memory, i.e., in order, either
                               chem.q or chem-largemem.q

Ex.4:
#$ -q phys*                 <- Same as Ex.2
#$ -l smallmem=true

Ex.5:
#$ -q phys*                 <- Will only go to phys-largemem.q
#$ -l memtype=3

Ex.6:
#$ -q chem*                 <- Same as Ex.3
#$ -l memtype=2
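
The '#$' lines above go at the top of the job script. The sketch below shows
where they end up by printing a complete, minimal script.
write_job_script is a made-up helper, and ./my_program and input.dat are
placeholders for whatever your job actually runs.

```shell
# write_job_script PART MEMTYPE COMMAND...   (hypothetical helper)
# Prints a minimal SGE job script to stdout.
#   PART     phys or chem
#   MEMTYPE  1 (small), 2 (medium) or 3 (large)
#   COMMAND  the command line the job should run
write_job_script() {
    part=$1
    memtype=$2
    shift 2
    # \$ keeps the '#$' directive prefix literal inside the here-document.
    cat <<EOF
#!/bin/sh
#\$ -q ${part}*
#\$ -l memtype=${memtype}
$*
EOF
}

# Example: a phys job that needs a "large" amount of memory (as in Ex.5).
write_job_script phys 3 ./my_program input.dat
```

Redirect the output to a file (say, job.sh) and submit it with 'qsub job.sh'.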

TEST QUEUE
==========
The test.q can be used, for example, to check whether a job really starts
with the job script you've meticulously prepared. It's also useful for
diagnosing programs that seem to crash right after start.

The jobs in test.q run on the front end, so you don't necessarily have to
copy your files to a /tmp-dir. And if you do, you don't have to log onto
a node to check the files produced.

The limits for test.q are: 10 min / 1950 MB / 1 slot

To use the test.q, select it with
#$ -q test.q

Have a nice weekend,



SOME MORE OR LESS USEFUL COMMANDS
=================================

To get a list of queues and available slots:

    qstat -g c


To show the properties of a queue (such as the memory limit, which is
given by the last two values in the output):

    qconf -sq queue.q


Note that the memory limit refers to the total amount of memory used by
your job, including data and stack (the limits in effect 'ulimit' your job
session). This is not normally reported by for example 'top'. To get top
to show the total amount of memory used (DSIZE), start top and press
'f', 'l', 'Enter'. If you then press 'W', top will save this setting and
show DSIZE by default in the future.
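
The limit a job actually runs under can also be printed from inside the job
script. This is only a sketch, and it assumes that the queue limit shows up as
the shell's virtual-memory limit ('ulimit -v'), which this note does not
state explicitly. vmem_limit_mb is a made-up helper name.

```shell
# vmem_limit_mb   (hypothetical helper)
# Prints the virtual-memory limit of the current shell in MB,
# or "unlimited" if no limit is set. 'ulimit -v' reports kilobytes.
vmem_limit_mb() {
    limit_kb=$(ulimit -v)
    if [ "$limit_kb" = "unlimited" ]; then
        echo unlimited
    else
        echo $((limit_kb / 1024))
    fi
}

echo "Virtual memory limit: $(vmem_limit_mb) MB"
```

Run in test.q, this is a quick way to compare what the job sees with the
numbers in the queue table.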

To quickly see how much memory the main processes use on a specific
running host (say, compute-1-23) you could for example execute:

    ssh c1-23 "top -b -n 0 | grep -A 4 USER"

or perhaps:

    ssh c1-23 "top -b -n 0 | grep $USER | head -4"


Again, please report oddities related to the queue system. Suggestions are
of course also welcome. For now, no new changes to the queue are expected
before ametisti joins the M-GRID for real. That will be interesting...