Running a parallel computation with Z-set#
Z-set currently provides two levels of parallelism.
A first level of shared memory parallelism, where threads are used within a single machine to speed up some parts of the computation. This capability is currently available mainly during the integration phase, in some external libraries such as the MUMPS and Dissection solvers, and for all linear algebra operations using a multithreaded BLAS such as the Intel MKL, e.g. in the new implementation of the domain decomposition solvers. The way of launching a multithreaded computation within Z-set is presented on page.
A second level of distributed memory parallelism, where several instances of the Z-set program are launched simultaneously in an MPI (Message Passing Interface) context, each acting on a previously split dataset (see page). Contrary to the previous mode, those instances may run either on the same machine or on several distinct machines, as all the required communication is handled by the MPI communicator in a transparent way. This last mode is especially useful to spread over several machines a computation that would be too large to fit on a single machine. The way of launching a distributed memory computation within Z-set is presented on page.
Finally, note that the two previous levels of parallelism may be used simultaneously, which is advised to get the best performance on clusters with a large number of computing cores.
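As an illustration, and anticipating the syntax detailed in the next sections, a combined launch on a single machine could look like the following command; the process and thread counts chosen here are arbitrary and should be adapted to the number of available cores:

> mpirun -n 4 ${Z7PATH}/calcul/Zebulon_cpp_${Z7MACHINE} -s Solver.SMP 2 -mpimpi ./ problem.inp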
Distributed memory parallelism#
The current way to activate the distributed memory parallelism of Z-set on UNIX systems is to launch the main binary with the mpirun command and the -mpimpi switch. This requires a dataset that has already been split for a distributed computation (see page ). The number of Z-set processes to be launched is given to the mpirun command and must be strictly equal to the number of subdomains generated during the splitting phase. As an example, the following command launches 12 instances of Z-set reading the problem.inp file, each instance loading one of the 12 subdomains of the initially split dataset.
> mpirun -n 12 ${Z7PATH}/calcul/Zebulon_cpp_${Z7MACHINE} -mpimpi ./ problem.inp
The ./ argument given to the -mpimpi switch stands for the path of the directory storing the datasets (./ being the path of the current directory). This implies that the datasets of all subdomains should be accessible through the same path, which can be either a network file system or a local one (provided that the mount point is the same on all nodes).
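For example, assuming the split dataset lives in a hypothetical shared directory /scratch/shared/case mounted identically on every node, the same launch could be issued from that directory with an absolute path given to the switch:

> cd /scratch/shared/case
> mpirun -n 12 ${Z7PATH}/calcul/Zebulon_cpp_${Z7MACHINE} -mpimpi /scratch/shared/case problem.inp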
There is no specific command to be issued in the main .inp file to enable the distributed memory parallelism, but a distributed linear solver has to be used. The following ones are available in the distributed memory mode:

- MUMPS in its distributed version (***linear_solver dmumps);
- the new implementation of the FETI domain decomposition solver (***linear_solver dd_feti);
- the AMPFETI multi-preconditioned domain decomposition solver (***linear_solver dd_mpfeti).
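As an illustration only, the sketch below shows where the solver selection sits in the main input file. The surrounding ****calcul block stands for a typical Z-set analysis and is assumed here; as stated above, only the ***linear_solver line differs from a sequential computation:

****calcul
 ***linear_solver dd_feti   % select a distributed solver
 % ... the other ***-level commands (mesh, resolution, bc, material, ...)
 % are unchanged with respect to a sequential computation
****return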
Note
If the computation is local to one node, for example if you are connected through ssh to a computation node and want to exploit all its cores, you can also use the simpler syntax:
> Zrun -mpimpi 12 problem.inp
Distributed memory parallelism in a SLURM context#
The following example shows a simple submission script to run Z-set with both shared and distributed memory parallelism on a cluster managed by the Slurm workload manager. Refer to the Slurm documentation for information about #SBATCH directives and Slurm related environment variables. The first part of the script generates the domain decomposition using the METIS mesh partitioner engine.
#!/bin/bash
#SBATCH -J zset
#SBATCH -n 12    # number of Z-set processes
#SBATCH -c 4     # number of cores per process
INPFILE=problem.inp
MESHFILE=problem.geo
cat <<EOF > ${PWD}/splitter.inp
****mesher
***mesh ${MESHFILE}
**open ${MESHFILE}
**metis_split
*domains ${SLURM_NTASKS}
**dont_save_final_mesh
****return
EOF
# generate the domain decomposition
Zrun -m splitter.inp
# launch the distributed computation: SLURM_NTASKS MPI processes,
# each using SLURM_CPUS_PER_TASK threads
mpirun -n ${SLURM_NTASKS} ${Z7PATH}/calcul/Zebulon_cpp_${Z7MACHINE} \
    -s Solver.SMP ${SLURM_CPUS_PER_TASK} -mpimpi ${PWD} ${INPFILE}
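Assuming this script is saved as, say, submit_zset.sh (the name is only illustrative), it is submitted in the usual Slurm way:

> sbatch submit_zset.sh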