Running a parallel computation with Z-set#

Z-set currently provides two levels of parallelism.

  • A first level of shared memory parallelism, where threads are used within a single machine to speed up parts of the computation. This capability is currently available mainly during the integration phase, in some external libraries such as the MUMPS and Dissection solvers, and for all linear algebra operations using a multithreaded BLAS such as the Intel MKL, e.g. in the new implementation of the domain decomposition solvers. The way of launching a multithreaded computation within Z-set is presented on page.

  • A second level of distributed memory parallelism, where several instances of the Z-set program are launched simultaneously in an MPI (Message Passing Interface) context, each acting on a previously split dataset (see page). Contrary to the previous mode, these instances may run either on the same machine or on several distinct machines, as all the required communication is handled transparently by the MPI communicator. This mode is especially useful to spread across several machines a computation that would be too large to fit on a single one. The way of launching a distributed memory computation within Z-set is presented on page.

Finally, note that the two previous levels of parallelism may be used simultaneously, which is advised to get the best performance on clusters with a large number of computing cores.

Shared memory parallelism#

The simplest way to activate the shared memory parallelism of Z-set on UNIX systems is to pass the switch -smp followed by the number of threads to the Zrun command. For a 12-core computation:

> Zrun -smp 12 problem.inp

This mode may also be activated by passing the Solver.SMP parameter to the main Z-set binary, although the previous way should be preferred as it checks the status of additional environment variables.

> ${Z7PATH}/calcul/Zebulon_cpp_${Z7MACHINE} -s Solver.SMP 12 problem.inp
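When the machine is dedicated to a single run, the thread count can be matched to the cores actually available. A minimal shell sketch, assuming a Linux system with GNU coreutils; the Zrun line is left commented since it requires a Z-set installation:

```shell
#!/bin/sh
# Query the number of logical CPUs visible to this shell (coreutils nproc).
NCORES=$(nproc)
echo "launching Z-set with ${NCORES} threads"
# Zrun -smp ${NCORES} problem.inp
```

On shared machines, or when hyper-threading is enabled, a lower thread count than `nproc` reports may perform better.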

In a single process computation, multithreading can be activated when using the following linear solvers:

  • MUMPS (***linear_solver mumps);

  • Dissection (***linear_solver dissection, see).

Shared memory parallelism within a Slurm allocation#

The following example shows a simple submission script to run Z-set in shared memory parallelism on a cluster managed by the Slurm workload manager. Refer to the Slurm documentation for information about #SBATCH directives and Slurm related environment variables.

#!/bin/bash
#SBATCH -J zset
#SBATCH -c 12         # number of cores

INPFILE=problem.inp

Zrun -smp ${SLURM_CPUS_PER_TASK} ${INPFILE}

Distributed memory parallelism#

The current way to activate the distributed memory parallelism of Z-set on UNIX systems is to launch the main binary with the mpirun command and the -mpimpi switch. This requires a dataset that has already been split for a distributed computation (see page ). The number of Z-set processes to launch is given to the mpirun command and must be exactly equal to the number of subdomains generated during the splitting phase. As an example, the following command launches 12 instances of Z-set reading the problem.inp file, each instance loading one of the 12 subdomains of the previously split dataset.

> mpirun -n 12 ${Z7PATH}/calcul/Zebulon_cpp_${Z7MACHINE} -mpimpi ./ problem.inp

The ./ argument given to the -mpimpi switch stands for the path of the directory storing the datasets (./ being the path of the current directory). This implies that the datasets of all subdomains must be accessible through the same path, which can be either a network file system or a local one (provided that the mount point is the same on all nodes).

There is no specific command to be issued in the main .inp file to enable the distributed memory parallelism, but a distributed linear solver has to be used. The following solvers are available in the distributed memory mode:

  • MUMPS in its distributed version (***linear_solver dmumps);

  • the new implementation of the FETI domain decomposition solver (***linear_solver dd_feti);

  • the AMPFETI multi-preconditioned domain decomposition solver (***linear_solver dd_mpfeti).
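As a sketch, the solver card goes at the three-star level of the computation block of the main .inp file. The fragment below is only an illustration of where the card is placed, assuming the usual ****calcul block; the surrounding cards depend on the actual analysis and are omitted here:

```
****calcul
 ***linear_solver dd_feti
****return
```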

Note

If the computation is local to one node, for example if you are connected through ssh to a computation node and want to exploit all its cores, you can also use the simpler syntax:

> Zrun -mpimpi 12 problem.inp

Hybrid shared and distributed memory parallelism#

The shared memory parallelism capability is available within each Z-set process launched in the MPI context. It is therefore advised to take advantage of both levels of parallelism in a hybrid way. The following example shows how to launch Z-set in hybrid parallelism with 12 Z-set processes using 4 cores each.

> mpirun -n 12 ${Z7PATH}/calcul/Zebulon_cpp_${Z7MACHINE} -s Solver.SMP 4 \
    -mpimpi ./ problem.inp
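As a sanity check when sizing such a hybrid run, the total number of cores it occupies is the number of MPI processes times the threads per process. A small shell sketch for the command above:

```shell
#!/bin/sh
NPROCS=12    # MPI ranks passed to mpirun -n
THREADS=4    # threads per rank passed via Solver.SMP
echo "total cores: $((NPROCS * THREADS))"   # prints "total cores: 48"
```

This product should not exceed the number of cores actually allocated to the job, otherwise the threads of different ranks compete for the same cores.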

Distributed memory parallelism in a Slurm context#

The following example shows a simple submission script to run Z-set with both shared and distributed memory parallelism on a cluster managed by the Slurm workload manager. Refer to the Slurm documentation for information about #SBATCH directives and Slurm related environment variables. The first part of the script generates the domain decomposition using the METIS mesh partitioner engine.

#!/bin/bash
#SBATCH -J zset
#SBATCH -n 12        # number of Z-set processes
#SBATCH -c 4         # number of cores per process

INPFILE=problem.inp
MESHFILE=problem.geo

cat <<EOF > ${PWD}/splitter.inp
****mesher
 ***mesh ${MESHFILE}
  **open ${MESHFILE}
  **metis_split
   *domains ${SLURM_NTASKS}
  **dont_save_final_mesh
****return
EOF
Zrun -m splitter.inp

mpirun -n ${SLURM_NTASKS} ${Z7PATH}/calcul/Zebulon_cpp_${Z7MACHINE} \
    -s Solver.SMP ${SLURM_CPUS_PER_TASK} -mpimpi ${PWD} ${INPFILE}