Fgate sysadmin
Machine status
Machine status can be conveniently checked by logging into fgate with ssh -X fgate and starting firefox. The browser brings up links to documentation and to status-monitoring software, e.g., ganglia or cluster top.
Machine layout
| Host Names | Type | Specs | Type of use | IPs: 10.255.255.xx; IPMI IPs: 20.255.255.xx |
|---|---|---|---|---|
| control | Head node | Two 2.66 GHz Woodcrests, 4 GB RAM, 145 GB disk | Job submission | |
| c0-0 -- c0-7 | Fat nodes | Two 2.66 GHz Woodcrests, 16 GB RAM, 720 GB disk | LSF | xx=254 -- 247 |
| c0-8 -- c0-15, c1-0 -- c1-31 | Thin nodes | Two 2.66 GHz Woodcrests, 4 GB RAM, 225 GB disk | LSF | xx=246 -- 239, 238 -- 207 |
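The address ranges in the table suggest that the last octet counts down as the node index counts up. The following is a minimal sketch of that mapping, assuming it is strictly linear (c0-N maps to 254 - N and c1-N maps to 238 - N) and that the IPMI addresses use the same last octet. The function name node_ip is hypothetical and is not one of the existing admin scripts.

```shell
#!/bin/sh
# Hypothetical helper: derive a compute node's address from its short name,
# assuming the linear mapping implied by the layout table:
#   c0-N -> xx = 254 - N  (N = 0..15)
#   c1-N -> xx = 238 - N  (N = 0..31)
# Usage: node_ip c1-1          -> node address (10.255.255.xx)
#        node_ip c1-1 ipmi     -> IPMI address (20.255.255.xx)
node_ip() {
    name=$1
    kind=${2:-node}
    rack=${name%%-*}      # e.g. "c1"
    idx=${name##*-}       # e.g. "1"

    # Reject names without a rack-index pair, non-numeric indices,
    # and indices with leading zeros (which would be read as octal).
    case $name in *-*) ;; *) echo "bad node name: $name" >&2; return 1 ;; esac
    case $idx in ''|*[!0-9]*|0?*) echo "bad node index: $name" >&2; return 1 ;; esac

    case $rack in
        c0) base=254; max=15 ;;
        c1) base=238; max=31 ;;
        *)  echo "unknown rack: $rack" >&2; return 1 ;;
    esac
    if [ "$idx" -gt "$max" ]; then
        echo "index out of range: $name" >&2
        return 1
    fi

    case $kind in
        node) prefix=10.255.255 ;;
        ipmi) prefix=20.255.255 ;;
        *)    echo "unknown address kind: $kind" >&2; return 1 ;;
    esac
    echo "$prefix.$((base - idx))"
}
```

For example, node_ip c0-8 would give the first thin node's address and node_ip c0-8 ipmi its IPMI address, provided the assumed mapping holds.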
OS updates
To update packages that should be on all the nodes, use rocks-update. To update packages that should only be on the head node, use yum. If the kernel is updated on the head node, it may be necessary to rebuild (via make install) the driver for the Intel 10Gb module. The driver is under /root/10g_drivers/....
Filesystems
The following is the filesystem structure on fgate:
Filesystem           1K-blocks      Used  Available Use% Mounted on
/dev/sda2             40313996   7362668   30903444  20% /
none                   2020784         0    2020784   0% /dev/shm
/dev/sda5            104814812   3504568   95985904   4% /scratch
/dev/sdb1           6242534700   6214056 6091016852   1% /state/partition1
/dev/sdc1           6242534700     89592 6097141316   1% /backups
tmpfs                   117904     17172     100732  15% /var/lib/ganglia/rrds
Note that fgate has its own local scratch (for testing compilations, etc.). (There's some junk in /scratch that looks like it should have gone into /state/partition1/home.)
Filesystem /state/partition1 is physically located on the MD1000 storage array (RAID5) at /dev/sdb1. An identical array is at /dev/sdc1 and is mounted as /backups.
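Because the storage array and its twin hold all user data and backups, it may be worth flagging any filesystem that is filling up. The following is a minimal sketch, assuming the POSIX output format of df -P. The function name over_threshold and the default limit of 90 percent are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` style output on stdin and print the mount
# point and usage of every filesystem at or above a percentage threshold.
# Mount points containing spaces are not handled.
# Usage: df -P | over_threshold 90
over_threshold() {
    limit=${1:-90}
    awk -v limit="$limit" '
        NR == 1 { next }                 # skip the header line
        {
            use = $5
            sub(/%$/, "", use)           # "20%" -> "20"
            if (use + 0 >= limit + 0)
                print $6, $5
        }
    '
}
```

On fgate this could be run as df -P /state/partition1 /backups | over_threshold 80 to watch only the two arrays.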
Under /state/partition1 there are two directories, apps and home. home contains all the user files, and apps contains applications. Note that only home and apps are exported to the nodes. /state/partition1/apps is exported to the nodes as /export/apps, and each user's directory under /state/partition1/home is exported separately (when needed) as /home/username. This latter feature is a little unusual. Because of this layout, all applications and sysadmin utility scripts need to go under /export/apps. We will also put libraries that would normally go someplace like /usr/local there (e.g., blas and lapack compilation directories). System administration scripts that need to be accessible by the nodes will go under /export/apps/util.
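The export layout can be summarized as a path translation from the head node to the compute nodes. The following is a minimal sketch of that translation as described in this section. The function name node_path is hypothetical, and the sketch does not check whether a given home directory is actually exported at the moment.

```shell
#!/bin/sh
# Hypothetical helper: translate a path on the head node into the path under
# which a compute node would see it, following the export layout in this
# section. Prints nothing and returns 1 for paths that are not exported.
node_path() {
    case $1 in
        /state/partition1/apps|/state/partition1/apps/*)
            # apps is exported as a whole under /export/apps
            echo "/export/apps${1#/state/partition1/apps}" ;;
        /state/partition1/home/?*)
            # each user directory is exported separately as /home/username
            echo "/home/${1#/state/partition1/home/}" ;;
        *)
            return 1 ;;
    esac
}
```

Anything that falls through to the last branch, such as files in /scratch or /backups on fgate, is not visible on the nodes.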
Adding users
1) To create a primary group of which the new user will be a member (if this group doesn't already exist) use
/usr/sbin/groupadd -g gid groupname (e.g., /usr/sbin/groupadd -g 1400 diracgroup )
Be sure this gid has not already been used (look through /etc/group).
2) Use the /admin/bin/setup_user command to add a new user account and create /scratch files on all the nodes:
/admin/bin/setup_user username uid groupname userinfo (e.g., /admin/bin/setup_user paul 1433 diracgroup "Paul Dirac" )
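Looking through /etc/group by eye is error-prone, so the check can be scripted before running either step. The following is a minimal sketch, assuming the standard colon-separated layout of /etc/group and /etc/passwd, where the numeric id is the third field. The function name id_in_use is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a numeric id already appears in the third
# colon-separated field of a passwd- or group-style file.
# Usage: id_in_use /etc/group 1400   (gid check)
#        id_in_use /etc/passwd 1433  (uid check)
id_in_use() {
    file=$1
    id=$2
    awk -F: -v id="$id" '$3 == id { found = 1 } END { exit !found }' "$file"
}
```

For example, id_in_use /etc/group 1400 && echo "gid 1400 is taken" could be run before groupadd. This only inspects the local files; accounts supplied by any other name service would not be seen.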
IPMI
To check the power status of a particular node (say, c1-1) use
/admin/bin/ipmi c1-1 status or /admin/bin/ipmi compute-1-1 status
To turn the node off use
/admin/bin/ipmi c1-1 off
To turn the node on use
/admin/bin/ipmi c1-1 on
To restart it use
/admin/bin/ipmi c1-1 reboot
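To check many nodes at once, the commands above can be wrapped in a loop over node names. The following is a minimal sketch. The function name node_range is hypothetical, and the loop at the end is left commented out because it is only meaningful on the head node and the output format of /admin/bin/ipmi is not described here.

```shell
#!/bin/sh
# Hypothetical helper: print the short names of a run of nodes in one rack,
# one per line, e.g. `node_range c1 0 3` prints c1-0, c1-1, c1-2, c1-3.
node_range() {
    rack=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        echo "$rack-$i"
        i=$((i + 1))
    done
}

# Example (commented out; run on the head node): report the power status of
# every node in rack c1 using the ipmi wrapper described in this section.
# for node in $(node_range c1 0 31); do
#     printf '%s: ' "$node"
#     /admin/bin/ipmi "$node" status
# done
```

The same loop with off, on, or reboot in place of status would act on the whole range, so it is worth printing the node list first before doing anything other than a status query.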
