Fgate sysadmin

From CCMSTWiki

Machine status

Machine status can be conveniently checked by logging into fgate with ssh -X fgate and starting firefox. Firefox brings up links to documentation and to status-monitoring software, e.g., ganglia or cluster top.

Machine layout

Host Names                    Type        Specs                                            Type of use     IPs: 10.255.255.xx, ipmi IPs: 20.255.255.xx
control                       Head node   Two 2.66 GHz Woodcrests, 4 GB RAM, 145 GB disk   Job submission
c0-0 -- c0-7                  Fat nodes   Two 2.66 GHz Woodcrests, 16 GB RAM, 720 GB disk  LSF             xx=254 -- 247
c0-8 -- c0-15, c1-0 -- c1-31  Thin nodes  Two 2.66 GHz Woodcrests, 4 GB RAM, 225 GB disk   LSF             xx=246 -- 239, 238 -- 207
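
The xx ranges in the table above count down by one per node, starting from 254 at c0-0 and continuing through c1-31 at 207. Assuming the numbering has no gaps inside each range, the address of a node can be derived from its short name. The function names below (node_octet, node_ip, ipmi_ip) are hypothetical helpers written for this page, not existing utilities on fgate.

```shell
# Sketch: map a short node name (c0-0 .. c0-15, c1-0 .. c1-31) to its
# last address octet, assuming xx decreases by one per node with no gaps.
node_octet() {
    name=$1
    case "$name" in
        c0-*) rack=0; max=15 ;;
        c1-*) rack=1; max=31 ;;
        *) echo "unknown node: $name" >&2; return 1 ;;
    esac
    idx=${name#c?-}
    # Reject empty or non-numeric indices.
    case "$idx" in
        ''|*[!0-9]*) echo "bad node index: $name" >&2; return 1 ;;
    esac
    if [ "$idx" -gt "$max" ]; then
        echo "node index out of range: $name" >&2
        return 1
    fi
    # Rack c0 holds 16 nodes, so c1-n sits 16 places after c0-n.
    echo $((254 - rack * 16 - idx))
}

# Node network address (10.255.255.xx).
node_ip() {
    octet=$(node_octet "$1") || return 1
    echo "10.255.255.$octet"
}

# IPMI address (20.255.255.xx) for the same node.
ipmi_ip() {
    octet=$(node_octet "$1") || return 1
    echo "20.255.255.$octet"
}
```

For example, node_ip c0-8 gives the first thin-node address and ipmi_ip c1-31 gives the last IPMI address in the table.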

OS updates

To update packages that should be on all the nodes, use rocks-update. To update packages that should only be on the head node, use yum. If the kernel is updated on the head node, it may be necessary to rebuild (via a make install) the driver for the Intel 10Gb module. The driver is under /root/10g_drivers/....

Filesystems

The following is the filesystem structure on fgate:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2             40313996   7362668  30903444  20% /
none                   2020784         0   2020784   0% /dev/shm
/dev/sda5            104814812   3504568  95985904   4% /scratch
/dev/sdb1            6242534700   6214056 6091016852   1% /state/partition1
/dev/sdc1            6242534700     89592 6097141316   1% /backups
tmpfs                   117904     17172    100732  15% /var/lib/ganglia/rrds

Note that fgate has its own local scratch (for testing compilations, etc.). There is some junk in /scratch that looks like it may belong in /state/partition1/home.

Filesystem /state/partition1 is physically located on the MD1000 storage array (RAID5) at /dev/sdb1. An identical array is at /dev/sdc1 and is mounted as /backups.

Under /state/partition1 there are two directories, apps and home: home contains all the user files, and apps contains applications. Only home and apps are exported to the nodes. /state/partition1/apps is exported to the nodes as /export/apps, and each user's directory under /state/partition1/home is exported separately (when needed) as /home/username; this latter feature is a little unusual. Because of this layout, all applications and sysadmin utility scripts need to go under /export/apps. Libraries that would normally go someplace like /usr/local (e.g., blas and lapack compilation directories) also go there. System administration scripts that need to be accessible from the nodes go under /export/apps/util.
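
The export layout described above amounts to a simple path translation between the head node and the compute nodes. The following is a minimal sketch of that translation; node_path is a hypothetical helper name, and the function only rewrites the path string. It does not check that a home directory is actually exported at the moment, since those are exported only when needed.

```shell
# Sketch: translate a path under /state/partition1 on fgate into the
# path under which the compute nodes would see it.
node_path() {
    path=$1
    case "$path" in
        /state/partition1/apps)
            echo "/export/apps" ;;
        /state/partition1/apps/*)
            echo "/export/apps/${path#/state/partition1/apps/}" ;;
        /state/partition1/home/?*)
            # Each user's directory is exported separately as /home/username.
            echo "/home/${path#/state/partition1/home/}" ;;
        *)
            echo "not exported to the nodes: $path" >&2
            return 1 ;;
    esac
}
```

For example, node_path /state/partition1/apps/util prints /export/apps/util, while a path under /scratch is reported as not exported.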

Adding users

1) To create a primary group of which the new user will be a member (if this group doesn't already exist), use

        /usr/sbin/groupadd -g gid groupname
(e.g.,  /usr/sbin/groupadd -g 1400 diracgroup )

Be sure this gid has not already been used (look through /etc/group).

2) Use the /admin/bin/setup_user command to add a new user account and create /scratch files on all the nodes:

        /admin/bin/setup_user username uid groupname userinfo
(e.g.,  /admin/bin/setup_user paul 1433 diracgroup "Paul Dirac" )
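
The gid check in step 1 can be scripted instead of reading /etc/group by eye. The sketch below uses a hypothetical helper, id_in_use, which looks at the third colon-separated field of a file; it inspects only the local file given, so it would miss groups defined anywhere else.

```shell
# Sketch: succeed (exit 0) if the numeric id appears in field 3 of the
# given colon-separated file, fail (exit 1) otherwise.
# Defaults to /etc/group, where field 3 is the gid.
id_in_use() {
    id=$1
    file=${2:-/etc/group}
    awk -F: -v id="$id" '$3 == id { found = 1 } END { exit !found }' "$file"
}

# Usage (not run here):
#   if id_in_use 1400; then echo "gid 1400 is already taken"; fi
```

Because the uid is also the third field of /etc/passwd, id_in_use 1433 /etc/passwd performs the same check for a uid before running setup_user.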

IPMI

To check the power status of a particular node (say, c1-1) use

    /admin/bin/ipmi c1-1 status
or  /admin/bin/ipmi compute-1-1 status

To turn the node off use

    /admin/bin/ipmi c1-1 off

To turn the node on use

    /admin/bin/ipmi c1-1 on

To restart it use

    /admin/bin/ipmi c1-1 reboot
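
To check a whole range of nodes at once, the status command above can be wrapped in a loop. This is a minimal sketch: node_names and status_all are hypothetical helpers, and the only assumption about /admin/bin/ipmi is the "node status" invocation shown above. The IPMI_CMD variable exists so that the command can be swapped out for a dry run.

```shell
# Command used to talk to the nodes; override IPMI_CMD for a dry run.
IPMI_CMD=${IPMI_CMD:-/admin/bin/ipmi}

# Print short node names c<rack>-<first> .. c<rack>-<last>, one per line.
node_names() {
    rack=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        echo "c${rack}-${i}"
        i=$((i + 1))
    done
}

# Print "<node>: <power status>" for every node in the given range.
status_all() {
    for node in $(node_names "$1" "$2" "$3"); do
        printf '%s: ' "$node"
        "$IPMI_CMD" "$node" status
    done
}

# Usage (not run here): power status of all of c1-0 .. c1-31
#   status_all 1 0 31
```

Only the status subcommand is looped here; off, on, and reboot are left as deliberate per-node commands.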


LSF Documentation

http://www.platform.com/services/support/docs/index_62.html
