Cluster

From NetSysLab

Newcomers Checklist

  • Change your password as soon as possible using the yppasswd command on the head node.
  • Request access to the reservation spreadsheet.
  • Join the cluster's mailing list. Announcements, such as cluster outages, are posted on this list.


Naming and Access

  • You can access the cluster via ssh to alkindi.ece.ubc.ca.
  • The cluster can be accessed only from within the ECE department's network. To access it from outside, you need to tunnel through the department's ssh server.
  • To change your password, use the yppasswd command on the head node.
  • The cluster's internal domain name is netsyslab.cluster.
  • The head node (alkindi.ece.ubc.ca from outside) is named master inside the cluster (i.e., master.netsyslab.cluster).
  • The node naming convention is as follows:
   nodexxy ; 10.0.0.xxy
   where xx is the node's serial number, from 01 to 22,
   and y is the interface number (0 or 1); each node has two interfaces.
   The leading zero of xx is dropped in the IP address.
   For example:
     node 1 interfaces  : node010 (10.0.0.10) and node011 (10.0.0.11)
     node 20 interfaces : node200 (10.0.0.200) and node201 (10.0.0.201)
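
The naming convention above can be sketched in a few lines of Python. This is only an illustration, not a tool provided on the cluster: the function name node_address is hypothetical, and the rule that the last octet of the IP is xxy read as a decimal number is inferred from the two examples above.

```python
from typing import Tuple


def node_address(serial: int, interface: int) -> Tuple[str, str]:
    """Return (hostname, IP address) for one interface of a compute node.

    Assumption: the last octet is serial * 10 + interface, which matches
    the documented examples (node010 -> 10.0.0.10, node201 -> 10.0.0.201).
    """
    if not 1 <= serial <= 22:
        raise ValueError("serial must be between 1 and 22")
    if interface not in (0, 1):
        raise ValueError("interface must be 0 or 1")
    hostname = f"node{serial:02d}{interface}"      # e.g. node010
    ip = f"10.0.0.{serial * 10 + interface}"       # e.g. 10.0.0.10
    return hostname, ip


if __name__ == "__main__":
    # Print both interfaces of node 1 and node 20.
    for serial in (1, 20):
        for interface in (0, 1):
            print(*node_address(serial, interface))
```

A helper like this can be handy for generating host lists for the nodes you have reserved.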


Storage

  • Each user has a central home directory on the master node exported via NFS to all other nodes.
  • Each user has a local directory under "/local" on each node.
  • There is no backup for the data stored in the cluster (home and local directories).


Reservation

  • Make sure you reserve the nodes on the spreadsheet before you start using them.
  • To reserve, just assign the "From" and "To" dates.
  • We adopt a trust system, so don't change the dates of another person's reservation unless it has expired.
  • If the nodes that you want are already reserved, contact the person who reserved them directly to work things out (if you don't know their email, contact the administrator).
  • The spreadsheet also contains a waiting list for each node. Please put your name and dates in these fields if the nodes are already being used by someone else. You can always try to talk to users to make some nodes available for you.


IMPORTANT USAGE CONSIDERATIONS

  • DO NOT use the head node in your experiments.
  • DO NOT put large traces/output in the home directory. Use the /local directories on each node to store such data. This saves space in the home directory and reduces network traffic dramatically.
  • DO NOT keep many ssh connections open, either from your workstation to alkindi or from the master node to the other slave nodes. These connections consume resources on the master node, so users may experience slow response times in their ssh shells. You can always launch computations remotely on the slave nodes without keeping those connections alive. More than 4 connections is too many.
  • DO NOT change the dates of another person's reservation unless it has expired. We adopt a trust system.
  • DO NOT add extra rows to the sheet; only change the dates in the existing records.
  • DO NOT leave processes running after your reservation expires. People who use the nodes after you don't have permission to kill them, so you have to kill them yourself.
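
As an illustration of launching a computation on a node without keeping an ssh connection alive, the following Python sketch builds the command lines involved. It rests on several assumptions: the helper names are hypothetical, the log path /local/alice/run.log assumes a per-user directory layout that this page does not spell out, and the nohup plus full I/O redirection idiom is one common way to detach a remote process, not a cluster-specific tool. The sketch only prints the commands; it does not contact any node.

```python
import shlex
from typing import List


def detached_launch_argv(node: str, command: str, log_path: str) -> List[str]:
    """Build an ssh command that starts `command` on `node` in the background.

    All three standard streams are redirected on the remote side, so ssh
    can return as soon as the job has started instead of staying connected.
    Output goes to `log_path`, which should be under /local, not in the
    NFS home directory.
    """
    remote = f"nohup {command} > {shlex.quote(log_path)} 2>&1 < /dev/null &"
    return ["ssh", node, remote]


def list_processes_argv(node: str, user: str) -> List[str]:
    """Build an ssh command that lists `user`'s processes on `node`.

    Useful for checking for leftovers before a reservation expires.
    """
    return ["ssh", node, f"pgrep -u {shlex.quote(user)} -l"]


if __name__ == "__main__":
    # Hypothetical job, user name, and log path, for illustration only.
    launch = detached_launch_argv("node010", "./sim --trace", "/local/alice/run.log")
    check = list_processes_argv("node010", "alice")
    print(shlex.join(launch))
    print(shlex.join(check))
```

Passing either list to subprocess.run on the head node would execute it. Any leftover processes reported by the second command still have to be killed by their owner.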