Latest update: 18 Mar 2005
M-GRID cluster "ametisti"
Introduction
Ametisti is now in use.
ametisti.grid.helsinki.fi is the dual-processor front end of a
132-processor (66-node) Linux
cluster owned and operated jointly by the Department of Physical
Sciences, the Department of Chemistry and the Helsinki Institute of
Physics (HIP). It is intended (a) for
computational research in materials science, and (b) as a prototype
cluster for the LHC DataGRID.
The cluster environment is built on the
NPACI Rocks cluster
distribution, which is a Red Hat Linux-based distribution
specialized for cluster installation, configuration, monitoring and
maintenance.
Introduction to the cluster environment
Cluster configuration
- Front-end machine: ametisti.grid.helsinki.fi
- Runs the batch queue system and other services.
- Fileserver for home directories.
- Only machine intended for interactive use.
- Computational nodes:
- To be accessed only through the batch queue system
- You can get into individual nodes via the batch queue system
using the command qrsh!
- "Chemistry" and "Physics" nodes
- Chemistry nodes: compute-0-0 to compute-0-15
- For Dept. of Chemistry users
- 2 GB of memory per processor
- 2.2 GHz
- 262 GB /tmp, RAID-0
- Physics nodes: compute-1-0 to compute-1-49
- For Dept. of Physical Sciences and HIP users
- 1 GB of memory per processor
- 1.8 GHz
- 116 GB /tmp, RAID-1
- By default the batch queue system submits to Physics nodes. For
information on how to submit to Chemistry nodes, see the
notes on the batch system below.
- Physics and HIP people who wish to use Chemistry nodes for a
well-motivated reason can ask for permission from Juha Vaara.
Account allocation
- Accounts are allocated to research scientists on a per-group basis.
- Eligible groups are those that work at the Department of Physical
Sciences, Department of Chemistry participating labs or HIP, and that
need more computer capacity than a few workstations can provide.
- To apply for an account, the group leader should fill in
this form, process it with latex/latex2e and dvips,
sign it, and send it to Kai Nordlund. The form is also available in
PS and PDF.
- The user list should only contain scientists who need massive
computational capacity.
- After the initial application has been accepted, new users can be
added to the group by a simple e-mail request from the group leader.
- The group leader has the responsibility to ensure that the cluster
users in the group are aware of the rules of usage (listed below), and
that they know enough of the use of Unix systems to be able to follow
the rules.
Advice on local usage
E-mail lists
- There are (at least) 3 e-mail alias lists for ametisti:
- ametisti-admin: Administrators
- ametisti-users: Users
- All of these are (at)helsinki.fi. The alias ametisti-users will
be used locally and should be kept
low-volume, i.e. used only when it is really necessary to reach
all users. Ordinary users should not normally need to e-mail
the ametisti-users list, but may find the ametisti-admin
alias useful.
Remember, though, that basic advice on usage should
come from within your own research group.
- ametisti-users will also be part of the CSC e-mail list
mgrid-users which will cover the entire national MGRID
consortium.
Security
- Because of the novelty and great power of GRID clusters,
they pose a special security risk. Be very careful with all
passwords and passphrases you use (i.e. use good passwords
and only log in from trusted sites). Report any suspicious
activities in the cluster which might be of cracker origin to the
technical administrators!
Login
- The only supported protocol for login and data transfer is ssh.
- Login is only possible from IP addresses which are explicitly
opened in the firewall, which basically are those of the
owning labs.
- Login with ssh ametisti.grid.helsinki.fi
- Note: the first time you
log in, the system asks for a passphrase. Please give an empty one by
pressing return! This is needed to make the batch queue system work
properly.
- If you have given a non-empty passphrase, just do
rm .ssh/identity* and then log in again to give an empty one.
- Ssh can be used to set up an ssh tunnel to the http server on
ametisti by doing a special login once, like this: ssh -L
8000:ametisti.grid.helsinki.fi:80 ametisti.grid.helsinki.fi
- This ssh tunnel enables access through your favourite local
browser to the Rocks manual, a graphical view of the batch queue
system and the cluster load. On a network connection with large latency
(ADSL or foreign connection) this is the fastest way to access the web
pages. This is also the recommended way to access the web pages, since
running several browser instances on ametisti itself can be quite
resource demanding.
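As a minimal sketch of the tunnel setup described above, the shell function below wraps the ssh command given in the text. The function name open_tunnel and the -N option (which asks ssh not to start a remote shell) are assumptions added for illustration; the port and host names are those given above.

```shell
# open_tunnel [PORT]: forward local PORT (default 8000) to the web server
# on ametisti. The function name and the -N option are illustrative
# additions, not part of the original instructions.
open_tunnel() {
    port="${1:-8000}"
    ssh -N -L "${port}:ametisti.grid.helsinki.fi:80" ametisti.grid.helsinki.fi
}

# Usage (blocks until interrupted with Ctrl-C):
#   open_tunnel 8000
# Then browse to http://localhost:8000/ on your own machine.
```

With the tunnel open, the pages served by ametisti should be reachable at http://localhost:8000/ in a browser running on your own workstation.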
Cluster status
On ametisti you can run mozilla localhost & to
view the (very nice) ganglia graphical reports of cluster status.
To get a similar thing on a text terminal interface use
Where to run
- All runs should be started under the SGE batch queue system from
the main node ametisti!
User support
- None. The system is a standard Linux system. It is
intended only for use by scientists knowledgeable about
computational methods and Unix/Linux systems, so no user support is
available. New group members who need to use the cluster should be
guided by old users within the research group.
Software support
- The only software supported by the administrators is the standard
Rocks distribution Linux software and the software provided by CSC.
- Commercial software may be installed on the system if the group(s)
needing the software provide the funding and do the installation and
maintenance.
Compilers
Fortran compilers
- Recommended: the Pathscale EKO suite Fortran compiler "pathf90"
is available via CSC's license servers
- Good optimization flags: -O3 -ipa -fno-math-errno -m64 -march=opteron
- The Portland group Fortran compiler "pgf90" is available via
CSC's license servers
- Good optimization flags: -O3 -Bstatic -fastsse -Mipa=fast
- The Absoft Fortran compiler "f90" is available with a local
licence (purchased by Kai Nordlund).
- Good optimization flags: -O3 -cpu:opteron -X"-Bstatic"
C and C++ compilers
- The Pathscale EKO suite C compiler "pathcc" is available
via CSC's license servers
- The Portland group C compilers "pgcc" and "pgCC" are available
via CSC's license servers
- It is also possible to run C and Fortran program binaries
compiled and linked statically on other Linux systems on the ametisti cluster. In
our experience this works fine, but of course the other Linux system
has to be sufficiently close to ametisti.
Optimization hints
- Always use optimization (-O or higher) when compiling your
programs.
This usually speeds up the execution by a factor of 2 or more.
- On MGRID machines always use compilers which have
Opteron-specific optimization. This can have a large effect on
performance.
- For additional info on optimization options read the compiler
man pages, "man gcc" or "man pgf90"
Advice on compilation and use of specific codes
More info
File system
General hints of usage
There are 3 commands which can be used to run things on the whole
cluster:
- Automatic removal of files
- Files on the /tmp disks, which should be used for batch jobs, are
automatically removed after some time. The time may be as short as 240 hours.
- If you have jobs which run longer than this, data may start vanishing
before the job is finished.
- To avoid this, you are allowed to use the command "touch -a"
regularly on all the files which are produced by the running job.
- However, you should not use touch -a after the job is finished,
but rather analyze and/or move the results to some permanent storage!
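The touch -a advice above could be applied with a small helper like the one sketched here. The function name refresh_atime, the example directory and the once-a-day interval are assumptions made for illustration only.

```shell
# refresh_atime DIR: update the access time of every regular file under
# DIR, so that files of a still-running job are not treated as stale.
# The function name is hypothetical.
refresh_atime() {
    find "$1" -type f -exec touch -a {} +
}

# Possible use inside a long job script (paths are placeholders):
#   ( while true; do refresh_atime /tmp/myjobdir; sleep 86400; done ) &
#   refresher=$!
#   ./mycode
#   kill "$refresher"    # stop refreshing once the job has finished
```

Stopping the background loop when the job ends keeps the helper in line with the rule that finished results should be moved away rather than kept alive on /tmp.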
- ametisti uses the
Sun Grid Engine (SGE) batch queue system.
- man sge_intro
How to see what jobs are running?
- qstat
- qstat -f
- qstat -u username
How to see what queues are available and what they contain
- qstat -g c
- qconf -sq long
How to find out what a specific job is doing
- qstat -f -j (jobid)
How to kill a job
- qdel (jobid)
- qdel -u myuserid - kills all your jobs
How to submit a job?
- qsub qsub_samplescript
- Including the line #$ -q phys.q in the script
- Submits to "physics" nodes, compute-1-*
- qsub qsub_samplescript
- Including the line #$ -q chem.q in the script
- Submits to "chemistry" nodes, compute-0-*
- A sample script for physics nodes is qsub_samplescript
- The sample script is for a code called "parcas". To run your
own code, it may be enough to just change "parcas" to the name of the
executable of your own code everywhere in the script.
- Also remember to change the e-mail address to your own!
- A sample script for chemistry nodes is gaussian-test.job
- As of Jun 1, 2005, jobs are subject to memory limits of the same
size as the physical memory of the processors. This prevents
jobs from swapping extensively, which would slow them
down tremendously and hence not be optimal use of
the processors.
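A submission script along the lines described above might look like the following sketch. It is not the actual qsub_samplescript: the file name myjob.sh, the job name and the e-mail address are placeholders, and only the queue names phys.q and chem.q and the executable name parcas come from the text. The -S, -N, -cwd, -q, -M and -m options are standard SGE qsub options.

```shell
# Write a hypothetical job script to myjob.sh (all names are placeholders).
cat > myjob.sh <<'EOF'
#!/bin/bash
# Run the job with bash, under the name "myjob", from the submit directory.
#$ -S /bin/bash
#$ -N myjob
#$ -cwd
# Physics nodes; use chem.q instead to reach the Chemistry nodes.
#$ -q phys.q
# Send mail to this address when the job ends (change it to your own).
#$ -M your.address@helsinki.fi
#$ -m e
./parcas
EOF

# Submit it from the front end with:
#   qsub myjob.sh
```

Lines starting with #$ are read by SGE as if they were qsub command-line options, which is why the queue can be selected inside the script.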
More info on SGE6
Miscellaneous hints
- Fortran codes should be compiled statically!
- The default stack size is 8192 kB (as reported by ulimit -s). If you
need to increase it, just add ulimit -s unlimited to your batch job script!
- If you want to run the same code multiple times with a script,
simply include the script in the submission script, at the place
where you would start the executable in the previous example.
- This does have the disadvantage that you cannot use command line
options, but you can set these as variables in the script instead.
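The idea of repeating a run from within the submission script, with settings held in variables rather than command-line options, can be sketched as follows. The names PROGRAM, NRUNS and run_all are hypothetical, and echo stands in for a real executable so that the sketch runs anywhere.

```shell
# Settings that would otherwise be command-line options are set here.
# PROGRAM and NRUNS are hypothetical names; "echo" is a stand-in that
# should be replaced by your own executable.
PROGRAM="${PROGRAM:-echo}"
NRUNS="${NRUNS:-3}"

# run_all: start PROGRAM NRUNS times, one after the other, and stop at
# the first failure.
run_all() {
    i=1
    while [ "$i" -le "$NRUNS" ]; do
        # Each run receives its own index as its only argument.
        "$PROGRAM" "run $i" || return 1
        i=$((i + 1))
    done
}

run_all
```

In a real submission script this loop would sit where the single executable was started, and the variable assignments at the top take the place of command-line options.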
How to get statistics of cluster usage?
Logging in to individual nodes
Normally you should never need to do this. Nevertheless,
if you need to do it, e.g., to figure out why your batch job
crashed, you can log in to node A-B by using
ssh compute-A-B. For example ssh compute-1-1.
A shorthand alias node name of the form c-A-B does not work.
It is expressly forbidden to circumvent the
batch system and run jobs by logging in to the individual nodes.
Known problems and how to avoid them
- e-mailing from SGE batch job scripts was fixed.
- Other MGRID nodes report system crashes if shared disk usage is high.
Therefore it is of utmost importance to set up your jobs such that they
use only local disk for all I/O-intensive operations! (KN 27.3. 2005)
- More problems to appear soon.
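The advice above to keep I/O-intensive work on local disk can be sketched as a helper that runs a command in a scratch directory and copies the results back afterwards. The function name run_on_local_disk and the example paths are assumptions made for illustration; this is one possible arrangement, not a procedure prescribed by the administrators.

```shell
# run_on_local_disk WORKROOT RESULTDIR COMMAND [ARGS...]
# Runs COMMAND inside a fresh directory under WORKROOT (e.g. /tmp on a
# compute node), then copies everything it produced to RESULTDIR.
# The function name is hypothetical.
run_on_local_disk() {
    workroot="$1"
    resultdir="$2"
    shift 2
    scratch=$(mktemp -d "$workroot/job.XXXXXX") || return 1
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$scratch" && "$@" )
    status=$?
    # Copy results back; keep the scratch copy if the copy fails.
    if mkdir -p "$resultdir" && cp -r "$scratch"/. "$resultdir"/; then
        rm -rf "$scratch"
    else
        echo "copy failed, results left in $scratch" >&2
    fi
    return "$status"
}

# Example (placeholder paths; give the executable by absolute path,
# since the command runs inside the scratch directory):
#   run_on_local_disk /tmp "$HOME/results/run1" "$HOME/bin/parcas"
```

Only the final copy touches the shared home directory, so the heavy reading and writing during the run stays on the node's local /tmp disk.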
Rules of usage for the "ametisti" computer cluster
- The cluster is intended for the use of personnel of the
Department of Physical Sciences, the Department of Chemistry and HIP.
- The cluster is administered by Administrators appointed by the
Heads of the Departments.
- Allowed use is research and education in physics and chemistry
utilizing efficient simulation and numerical codes. Any large-scale
simulations should be run with compiled software, that is, extensive
runs using interpreting programs such as Matlab, Mathematica etc. or
script languages such as awk and perl are not allowed unless an
explicit exception is granted by one of the Administrators. Running
password cracking, cryptography, and "seti@home"-kinds of programs
on ametisti is naturally strictly prohibited.
- Research use accounts are given on a per-group basis.
Eligible groups are those working at the owning institutions. In
unclear cases the Head of the respective owning institution decides
whether a group is eligible for an account.
- To open a group research account, the group leader should fill
in the initial application form, and send it to the Account
Administrator. After the initial application has been accepted, new
users can be added to the group by a simple e-mail request from the
group leader.
- Educational accounts might be allocated for the
lecturer of a computational physics course requiring parallel GRID
computing resources for the period of the course, according to a
separate agreement with one of the Administrators. But normally
the smaller cluster mill should be used for this purpose. The
lecturer bears the responsibility that the use of the educational
accounts is limited to proper course use, and for guiding the course
students into proper use of the cluster.
- As of now, there are no pre-defined limits for usage. Groups
are expected to use the machine in a gentlemanly manner, not attempting
to hoard as much computer capacity for themselves as possible at the
expense of other groups. All CPU use of each group is logged, and if a
single group has used what seems like an obviously unreasonable share
of the cluster for a long period of time, the Administrators have the
right to ask them to limit their use in the future. If after several
warnings the group still uses unreasonable amounts of capacity, the
group accounts can be closed for a fixed period of time.
- The use of the machine should take into account hardware
limitations such as memory and hard drive space limitations. Hard disk
space for public use is allocated on the /home/ and /tmp/ disks. Each
user should keep their disk space usage to a reasonable minimum, and
clean out stuff they no longer need. All long jobs should put their
output to the /tmp/ disks, which are not backed up, and are not
intended for long-term storage. Old files from the /tmp/ disks may
be removed without prior warning to the user.
- Although the cluster does have a capability for explicitly
(MPI) parallel simulations, the cluster load is expected to be so high
from serial jobs that running parallel jobs should be done only after
prior consultation and permission from one of the Administrators.
Running trivially (embarrassingly) parallel jobs using scripts is
allowed within the limits set by the batch queue system on the number
of jobs.
- The group leader has the responsibility to ensure that the
users in the group are aware of these rules, and that they know enough
of the use of Unix systems to be able to follow these rules.
- Any cluster user is allowed and indeed encouraged to report
clear violations of these rules to the Administrators.
- In case of clear violations of these rules, whether
intentional or due to negligence or poor understanding of the system,
the Administrators can issue formal warnings to the group leader or
course lecturer. If after two warnings the group still does not comply
with the rules, the group account on the cluster will be closed for a
fixed amount of time, or permanently.
Naturally you should also follow the
University of Helsinki general rules of computer usage.
Persons in charge
The Administrators of the cluster are Kai Nordlund,
kai.nordlund@helsinki.fi (Account Administrator), Tomas Lindén,
tomas.linden@helsinki.fi, and Juha Vaara, juha.t.vaara@helsinki.fi.
The technical administrators are (11.10 2005):
- Tomas Lindén
- Erkki Aalto
- Kai Nordlund
- Jani Kotakoski
- Teemu Pennanen
- Francisco Garcia (Geant4 and Root)
- CSC administrator: Olli-Pekka Lehto