21) Introduction to MPI.jl#
Last time:
Slurm demo
Parallel linear algebra
Today:
Introduction to MPI.jl
Setup MPI.jl
Hello world
Setup on tuckoo cluster
1. Introduction to MPI.jl#
Recap on MPI so far:
Each process executes its own program with its own data.
Each process can communicate with every other process through messages (chunks of binary data, generally arrays of primitive data types like `int`, `double`, etc.).
Messages can be point-to-point (between only two processes) or collective, such as broadcasts (one to all) and reductions (all to one); see the sketch after this list.
Though in principle each process can execute a unique program, generally each process executes the same program, but the work it does depends on its unique number within all the processes.
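For instance, here is a minimal sketch of a collective reduction in MPI.jl, summing the rank numbers of all processes onto rank 0. The setup needed to actually run it is covered later in this lecture, so treat it as a preview (it assumes a recent MPI.jl with the keyword-argument API):

```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Collective reduction (all to one): every rank contributes its
# rank number; the sum arrives only on the root (rank 0).
total = MPI.Reduce(rank, +, comm; root=0)
if rank == 0
    println("sum of all ranks = $total")
end
```

Launched with 4 ranks, rank 0 would print `sum of all ranks = 6` (that is, 0 + 1 + 2 + 3).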
Terminology:
communicator: a group of processes that can communicate (a process can be in many different communicators, though in this class we will generally just use a single global communicator).
rank: the unique identifying number of a process within a communicator. Process ranks run from 0 to the number of processes minus one in the communicator. In general, a process's rank will be different in each communicator it is a member of.
Key points:
There is no global memory shared by all the processors. Data must be exchanged with messages.
All processors operate independently of one another unless synchronization calls are made.
We often use the terms MPI rank, processor, process, program instance, etc. interchangeably. A processor/core may have more than one MPI rank running on it, but even in this case the ranks are independent.
Note: Though in principle each process can execute a unique program (MPMD: Multiple Program, Multiple Data), generally each process executes the same program, but the work it does depends on its unique number within all the processes. This is called SPMD: Single Program, Multiple Data.
Blocking vs. non-blocking#
Blocking operation: An MPI communication operation is blocking if the return of control to the calling process indicates that all resources specified in the call, such as buffers, can be reused, e.g., for other operations. In particular, all state transitions initiated by a blocking operation are completed before control returns to the calling process.
Non-blocking operation: An MPI communication operation is non-blocking if the corresponding call may return before all effects of the operation are completed and before the resources used by the call can be reused. A call to a non-blocking operation thus only starts the operation; the operation itself is not complete until all the state transitions it causes have finished and the specified resources can be reused.
Note that a blocking send is asynchronous in the sense that continuing execution does not imply that the message has been received by the receiving MPI rank, just that the buffers can be reused (e.g., an MPI implementation might make a copy of the data first and then return control to the program).
Required synchronization is achieved through other “wait” calls.
MPI.jl is a basic Julia wrapper for the MPI standard.
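To make the blocking vs. non-blocking distinction concrete, here is a minimal sketch using MPI.jl's point-to-point calls (`MPI.Send`, `MPI.Irecv!`, `MPI.Wait`); it assumes at least two ranks and a recent MPI.jl with the keyword-argument signatures:

```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

if rank == 0
    buf = [1.0, 2.0, 3.0]
    # Blocking send: returns once buf may be reused,
    # not necessarily once the message has been received.
    MPI.Send(buf, comm; dest=1, tag=0)
elseif rank == 1
    buf = zeros(3)
    # Non-blocking receive: this call only *starts* the operation.
    req = MPI.Irecv!(buf, comm; source=0, tag=0)
    # ... computation could overlap with communication here ...
    MPI.Wait(req)  # the "wait" call: buf now safely holds the message
    println("rank 1 received $buf")
end
```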
2. Setup MPI.jl#
To be able to run MPI instructions in Julia, we first need to configure MPI.jl.
To do so, we need the auxiliary package MPIPreferences.jl. This will (hopefully) detect an MPI implementation library on your system at one of the default locations.
Launch Julia and, in the REPL, do the following:
```julia
julia> using Pkg; Pkg.add("MPIPreferences")
julia> using MPIPreferences; MPIPreferences.use_system_binary()
```
The call to `use_system_binary()` should automatically detect the MPI binary.
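As a quick sanity check, you can confirm which binary was recorded; this assumes a recent MPIPreferences version, where the active choice is stored in the `MPIPreferences.binary` constant:

```julia
julia> MPIPreferences.binary
"system"
```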
Now you should be ready to run your first MPI.jl hello world program.
3. Hello world#
In the Julia REPL, you can do:
```julia
julia> using Pkg; Pkg.add("MPI")
julia> using MPI; mpiexec(cmd -> run(`$cmd -n 3 echo hello world`))
```
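Here `mpiexec` is MPI.jl's helper that hands your function the configured launcher command. The anonymous-function form above can also be written with a `do` block; this is just a sketch of the same call:

```julia
using MPI
mpiexec() do cmd
    # cmd is the configured launcher (e.g., `mpiexec`); we interpolate it
    run(`$cmd -n 3 echo hello world`)
end
```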
Or you can find the following example in `julia_codes/module6-1/`.
```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD

rank = MPI.Comm_rank(comm)  # this process's rank within the communicator
size = MPI.Comm_size(comm)  # total number of processes in the communicator

println("I'm rank $rank of $size")
```
Once you have set up your environment via MPIPreferences and added MPI, you can run the above example under your `mpiexec`:

```sh
mpiexec -n 4 julia ./julia_codes/module6-1/mpi_hello_world.jl
```
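With 4 ranks, the output should look something like the following; the line order varies from run to run because the ranks execute independently:

```
I'm rank 2 of 4
I'm rank 0 of 4
I'm rank 3 of 4
I'm rank 1 of 4
```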
4. Setup on tuckoo cluster#
On the cluster, you can simply start Julia by typing `julia` from any location.
We will still use MPIPreferences to set up our environment, but this time the call to `use_system_binary()`, instead of using its (empty) default argument list, will include the argument `library_names` with the location of the MPI library on the cluster's parallel file system.
```
$ julia
julia> using Pkg; Pkg.add("MPIPreferences")
julia> using MPIPreferences
julia> MPIPreferences.use_system_binary(library_names="/usr/lib64/openmpi/lib/libmpi.so")
julia> using Pkg; Pkg.add("MPI")
```
Once your environment is set up on the cluster, you can run with

```sh
mpiexec -np 4 julia mpi_hello_world.jl
```

Note that we have specified the number of processes with `-np` on the cluster.
You can find an example of a batch script that launches a Julia job in `/examples/slurm/batch-jello`:
```sh
#!/bin/sh
# note: run julia hello-world on a single node
#SBATCH --job-name=jhello
#SBATCH --output=%A-jhello.out
#SBATCH --ntasks=16
export OMPI_MCA_pml=ob1
export OMPI_MCA_btl=tcp,self
# use TCP over the InfiniBand interface
export OMPI_MCA_btl_tcp_if_include=ib0
mpirun julia mpi_hello_world.jl
```
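A sketch of how you would submit and monitor this script with Slurm's standard commands (the job ID below is just a placeholder):

```sh
$ sbatch /examples/slurm/batch-jello
Submitted batch job 12345
$ squeue -u $USER       # check the job's status
$ cat 12345-jhello.out  # stdout lands here, per the --output=%A-jhello.out pattern
```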