Summary
The Brazos Cluster will undergo a major software upgrade in August and September 2014. Below are the highlights of what to expect from this upgrade:
- OS Upgrade - Operating System upgrade from CentOS 5.7 to CentOS 6.5
- Login - Login node name changes
- From Torque to SLURM - Replacing the Torque/Maui batch scheduler with SLURM
- Changes to modules and /apps - New modules environment software now using Lmod
- Website - Improved Brazos website and new account registration/management web portal
- New /home storage server - A new dedicated storage server for /home and /apps
- /fdata update - Update to the FhGFS software that provides /fdata
- Management and stability - A new management infrastructure focused on stability
The terms EL5 and EL5 Brazos refer to the Brazos cluster before the upgrade.
This software upgrade moves Brazos from EL5 to EL6.
OS Upgrade
The operating system is being upgraded from CentOS 5.7 to CentOS 6.5.
The operating system upgrade includes updates to numerous system libraries.
The kernel has also been upgraded from 188.8.131.52.scalable (provided by Scalable Informatics) to 2.6.32-431.23.3.el6.x86_64 (standard OS kernel).
In EL5, the InfiniBand nodes used very old OFED drivers that were compiled specifically against our non-standard kernel. In EL6, the InfiniBand nodes will use the 'stock' InfiniBand drivers that are part of the OS kernel. This lowers maintenance overhead and improves the stability of the InfiniBand software stack.
NOTE: Most user-compiled applications will very likely need to be recompiled on the upgraded software platform.
Login
Previously, the hostnames brazos.tamu.edu and hurr.tamu.edu were used to access Brazos via SSH. After the upgrade, the hostname will be login.brazos.tamu.edu. This hostname is a round-robin DNS entry that distributes logins across the head nodes.
See the Login information page for further details regarding login access to the Brazos Cluster.
From Torque to SLURM
Changes to modules and /apps
This upgrade replaces the Environment Modules application with Lmod, a more modern and actively maintained alternative. The way applications are defined on Brazos has also changed: the modules available to load now depend on your currently loaded compiler/MPI/language combination.
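To illustrate why the modules you can load depend on what is already loaded, here is a minimal bash sketch of a hierarchical module layout. It assumes the Core/Compiler/MPI directory convention described in the Lmod documentation and a hypothetical root of /apps/modulefiles; the function name visible_module_paths is also hypothetical, and the actual directory layout on Brazos may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a hierarchical Lmod layout. MODULE_ROOT and the
# Core/Compiler/MPI directory names are assumptions for illustration only.
MODULE_ROOT="/apps/modulefiles"

# visible_module_paths COMPILER [MPI]
# Prints the module directories that would be searched, given a loaded
# compiler (e.g. gcc/4.8.2) and MPI library (e.g. openmpi/1.8.1).
# Pass an empty string for COMPILER to model "no compiler loaded".
visible_module_paths() {
    local compiler="${1:-}" mpi="${2:-}"
    # Core modules (compilers and standalone tools) are always visible.
    echo "$MODULE_ROOT/Core"
    # Loading a compiler exposes modules built with that compiler,
    # including the MPI libraries compiled against it.
    if [ -n "$compiler" ]; then
        echo "$MODULE_ROOT/Compiler/$compiler"
        # MPI-dependent modules sit under both the compiler and the MPI
        # library, so they appear only when both are loaded.
        if [ -n "$mpi" ]; then
            echo "$MODULE_ROOT/MPI/$compiler/$mpi"
        fi
    fi
}

visible_module_paths gcc/4.8.2 openmpi/1.8.1
# /apps/modulefiles/Core
# /apps/modulefiles/Compiler/gcc/4.8.2
# /apps/modulefiles/MPI/gcc/4.8.2/openmpi/1.8.1
```

With no compiler loaded (visible_module_paths '' openmpi/1.8.1), only the Core directory is printed: in a layout like this, an MPI module cannot be found until the compiler it was built with has been loaded.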
It is strongly recommended that any references to the "module" command be removed from your login scripts, such as ".bashrc".
For additional details, see "Modules loaded at login" in the Brazos documentation and "Controlling which modules get loaded during login" in the Lmod User Guide.
Also check the scripts you use in your jobs for any module commands. For example, the command module load openmpi/1.8.1/gcc/64 will now be something like module load gcc/4.8.2 openmpi/1.8.1 (note that order matters: the compiler is loaded before the MPI library).
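One way to find the module commands that need attention in login scripts and job scripts is to search for them. The bash function below is a minimal sketch; the name find_module_commands is hypothetical, and it only catches lines that begin with module (optionally indented), so commands chained after && or placed inside one-line if statements are missed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every line in the given files that starts with
# the "module" command, prefixed by file name and line number.
# Files that do not exist are skipped silently.
find_module_commands() {
    local f
    for f in "$@"; do
        [ -f "$f" ] || continue
        # -H: always print the file name; -n: print the line number.
        # The pattern allows leading whitespace, so indented commands are
        # found, while commented-out lines ("# module ...") are not.
        # A non-zero grep status (e.g. 1 when a file has no matches) is
        # ignored so the loop continues with the next file.
        grep -Hn -E '^[[:space:]]*module[[:space:]]' "$f" || true
    done
}

# Example (file names are placeholders):
#   find_module_commands ~/.bashrc ~/.bash_profile myjob.sh
```

Each match is printed as file:line:text, which makes it easy to open the file at that line and either delete the command (login scripts) or update it to the new module names (job scripts).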
See the Brazos Modules documentation for further details on using the new modules environment.
Refer to Modules Available on Brazos for a full list of installed modules.
Website
The Brazos website has been redesigned. For the time being it is only accessible via http://www.brazos.tamu.edu; the www prefix is temporarily required to reach the new website.
Account registration using the new account management application is disabled until after the cluster upgrade is complete.
A new account management application has been developed to handle new user account registration. In the future, we hope to extend this application into a self-service portal for your Brazos account. More information about the account management application will be made available as development progresses.
New /home storage server
A dedicated server will be used to provide /home and /apps.
More details regarding changes to quotas will be posted once the system is online and tested.
/fdata update
We will upgrade the /fdata filesystem from fhgfs-2012.10, the version Brazos currently uses, to fhgfs-2014.01.
We will also add a seventh storage server, which will bring the capacity of /fdata to approximately 238 TB.
Management and stability
Part of this upgrade is the introduction of new tools for managing the Brazos cluster. These include