Using ACML

We have installed AMD's Core Math Library (ACML) version 6.5.0.7. It offers an optimized, full implementation of BLAS levels 1, 2, and 3; a full suite of LAPACK routines; FFTs for single, double, single-complex, and double-complex data; and random number generators in single and double precision.

The versions of ACML provided by the modules command contain both the single-threaded and multi-threaded libraries.
Currently, ACML 6 is available only for the GNU Fortran compiler.
The ACML modules add both the single-threaded and multi-threaded paths to CPATH, LD_LIBRARY_PATH, and LIBRARY_PATH.
The table below describes the environment variables provided by the ACML modules.

Environment variable    Description
BRAZOS_ACML_MP_ROOT     The root directory of the multi-threaded version.
BRAZOS_ACML_ROOT        The root directory of the single-threaded version.
BRAZOS_ACML_MP_INC      The directory of the multi-threaded header files.
BRAZOS_ACML_INC         The directory of the single-threaded header files.
BRAZOS_ACML_MP_LIB      The directory of the multi-threaded shared libraries.
BRAZOS_ACML_LIB         The directory of the single-threaded shared libraries.
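Because the modules already place these paths in CPATH, LIBRARY_PATH, and LD_LIBRARY_PATH, the variables are mainly useful when you want to name a flavor explicitly (for example, in a Makefile). The lines below are a sketch for this cluster's environment; the trailing -lgfortran and -lm libraries are an assumption based on how GCC-built ACML is commonly linked, so adjust them to match your code.

```shell
module load gcc acml

# Single-threaded ACML, selected explicitly:
gcc -O2 -I$BRAZOS_ACML_INC myapp.c -o myapp.exe \
    -L$BRAZOS_ACML_LIB -lacml -lgfortran -lm

# Multi-threaded (OpenMP) ACML, selected explicitly:
gcc -O2 -fopenmp -I$BRAZOS_ACML_MP_INC myapp.c -o myapp_mp.exe \
    -L$BRAZOS_ACML_MP_LIB -lacml_mp -lgfortran -lm
```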

Example 1
The generic (single-threaded) library is used, running on 8 AMD Shanghai nodes with 8 processes per node (64 processes in total).

Here's how we compile the code, assuming it resides in your home directory:

cd $HOME
module load gcc acml openmpi
mpicc -O myapp.c -o myapp.exe -lacml

Here's the SLURM script:

#!/bin/bash
#SBATCH -p mpi-core8
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=8
#SBATCH --time=01:00:00

# Load the acml and openmpi modules
module load gcc acml openmpi

# Print out the starting time and host
echo "$SLURM_JOB_ID started on `hostname` at `date`"
echo "-=-"

# Change directory
cd $SCRATCH

# Run it and save exit status as $ret
mpirun -np 64 -mca btl openib,self $HOME/myapp.exe
ret=$?

# Print the end time and host.
echo "-=-"
echo "$SLURM_JOB_ID ended on `hostname` at `date` with status $ret"

# Done
exit 0

Example 2
4 AMD Bulldozer nodes are used with the FMA4 multi-threaded library; each node has 32 processor cores. We compile the code with GCC. The mpirun -pernode option launches one process per node, and the multi-threaded ACML within each process provides node-local parallelism.

Here's how we compile the code, assuming it resides in your home directory:

cd $HOME
module load gcc acml openmpi
mpicc -O myapp.c -o myapp_acml_fma4_mp.exe -lacml_mp -lpthread -mfma4

Here's the SLURM script:

#!/bin/bash
#SBATCH -p mpi-core32
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --time=01:00:00

# Load the acml and openmpi modules
module load gcc acml openmpi

export OMP_NUM_THREADS=32

# Print out the starting time and host
echo "$SLURM_JOB_ID started on `hostname` at `date`"
echo "-=-"

# Change directory
cd $SCRATCH

# Run it and save exit status as $ret
mpirun -np 4 -pernode -mca btl openib,self $HOME/myapp_acml_fma4_mp.exe
ret=$?

# Print the end time and host.
echo "-=-"
echo "$SLURM_JOB_ID ended on `hostname` at `date` with status $ret"

# Done
exit 0

Example 3
Similar to Example 2, 4 AMD Bulldozer nodes are used with the FMA4 multi-threaded library, and each node has 32 processor cores. We compile the code with GCC. Here the mpirun -npernode 32 option launches 32 processes per node, and the multiple ACML processes on each node provide parallelism through MPI instead of threads.

Here's how we compile the code, assuming it resides in your home directory:

cd $HOME
module load gcc acml openmpi
mpicc -O myapp.c -o myapp_acml_fma4_mp.exe -lacml_mp -lpthread -mfma4

Here's the SLURM script:

#!/bin/bash
#SBATCH -q iamcs
#SBATCH -p mpi-core32
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00

# Load the acml and openmpi modules
module load gcc acml openmpi

export OMP_NUM_THREADS=1

# Print out the starting time and host
echo "$SLURM_JOB_ID started on `hostname` at `date`"
echo "-=-"

# Change directory
cd $SCRATCH

# Run it and save exit status as $ret
mpirun -npernode 32 -mca btl openib,self $HOME/myapp_acml_fma4_mp.exe
ret=$?

# Print the end time and host.
echo "-=-"
echo "$SLURM_JOB_ID ended on `hostname` at `date` with status $ret"

# Done
exit 0