22) Blocking and non-blocking communication#
Last time:
Introduction to MPI.jl
Today:
1. Blocking communication
1.1 Examples
2. Non-blocking communication
2.1 Examples
1. Blocking communication#
Note that most MPI routines return error codes indicating successful execution of a function. For the most part these are handled by the MPI wrappers in Julia. For example, see the help string for MPI.Send below:
using MPI
help?> MPI.Send
Send(buf, comm::Comm; dest::Integer, tag::Integer=0)
Perform a blocking send from the buffer buf to MPI rank dest of communicator comm using the message tag tag.
Send(obj, comm::Comm; dest::Integer, tag::Integer=0)
Complete a blocking send of an isbits object obj to MPI rank dest of communicator comm using the message tag tag.
External links
≡≡≡≡≡≡≡≡≡≡≡≡≡≡
• MPI_Send man page: OpenMPI
(https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Send.3.html), MPICH
(https://www.mpich.org/static/docs/v4.0/www3/MPI_Send.html)
The tag above is used because there are cases when A might have to send many different types of messages to B. Instead of B having to go through extra measures to differentiate all these messages, MPI allows senders and receivers to also specify message IDs with the message (known as tags). When process B only requests a message with a certain tag number, messages with different tags will be buffered by the network until B is ready for them. Note that the receive operation has to match the tag of a message it wants to receive.
Likewise for the MPI.Recv! function:
help?> MPI.Recv!
data = Recv!(recvbuf, comm::Comm;
source::Integer=MPI.ANY_SOURCE, tag::Integer=MPI.ANY_TAG)
data, status = Recv!(recvbuf, comm::Comm, MPI.Status;
source::Integer=MPI.ANY_SOURCE, tag::Integer=MPI.ANY_TAG)
Completes a blocking receive into the buffer recvbuf from MPI rank source of communicator comm using the message tag tag.
recvbuf can be a Buffer, or any object for which Buffer(recvbuf) is defined.
Optionally returns the Status object of the receive.
See also
≡≡≡≡≡≡≡≡
• Recv
• recv
External links
≡≡≡≡≡≡≡≡≡≡≡≡≡≡
• MPI_Recv man page: OpenMPI
(https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Recv.3.html), MPICH
(https://www.mpich.org/static/docs/v4.0/www3/MPI_Recv.html)
MPI.Send blocks until the message has been sent (that means that whatever was in buf can be written over).
MPI.Recv! blocks until the message has been fully received (that means that buf holds valid data).
Since two ranks may be sending multiple messages to one another, tag uniquely identifies each message (both ranks must use the same tag).
For most applications (everything I've ever done) the process receiving the communication can predict how much data it will receive. If this is not the case, the status variable can be used to find out how much data was received via MPI.Get_count.
Similarly, source and tag in MPI.Recv! can be set to MPI.ANY_SOURCE and MPI.ANY_TAG, and then the status variable can be used to determine the source process rank and the tag received. That said, I've never used this in practice and most likely this is not needed for this class.
1.1 Examples#
Example of blocking communication#
Example of a blocking communication can be found at julia_codes/module6-2/blocking.jl.
We will demo this in class.
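A minimal sketch of what such a blocking exchange might look like is below; this is a hypothetical illustration of the MPI.Send / MPI.Recv! pattern, and the actual blocking.jl may differ.

```julia
# Hypothetical sketch of blocking point-to-point communication.
# Run with: mpiexec -n 2 julia blocking_sketch.jl
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

if rank == 0
    buf = Float64[1, 2, 3]
    MPI.Send(buf, comm; dest = 1, tag = 7)    # blocks until buf can be written over
elseif rank == 1
    buf = zeros(Float64, 3)
    MPI.Recv!(buf, comm; source = 0, tag = 7) # blocks until buf holds valid data
    println("rank 1 received: ", buf)
end

MPI.Finalize()
```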
Example of deadlock#
Deadlock: One of the dangers of blocking communication is that deadlock can occur where there are waiting patterns that cannot be resolved, i.e., ranks are mutually waiting to receive data.
Example can be found at julia_codes/module6-2/deadlock.jl.
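A hypothetical sketch of such a deadlock is below (the actual deadlock.jl may differ): both ranks post a blocking receive first, so neither ever reaches its send.

```julia
# Hypothetical deadlock sketch: this program will hang!
# Run with: mpiexec -n 2 julia deadlock_sketch.jl
using MPI

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
other = (rank + 1) % 2

send_buf = fill(Float64(rank), 4)
recv_buf = zeros(Float64, 4)

# Both ranks block here forever: each waits for a message
# that the other rank never gets a chance to send.
MPI.Recv!(recv_buf, comm; source = other, tag = 0)
MPI.Send(send_buf, comm; dest = other, tag = 0)   # never reached

MPI.Finalize()
```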
WARNING: if the sends were done first in the example shown above, deadlock may or may not occur, depending on whether the MPI implementation buffers the send (i.e., makes a copy and returns control to the user).
Other blocking communication#
Since it is often the case that two processes exchange messages, the MPI standard provides two functions to facilitate such pairwise communication: MPI_Sendrecv and MPI_Sendrecv_replace.
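A sketch of a pairwise exchange using MPI.jl's wrapper for MPI_Sendrecv is below; note the exact keyword names of MPI.Sendrecv! are an assumption here and may vary between MPI.jl versions.

```julia
# Hypothetical sketch of a combined send/receive with MPI.Sendrecv!
# (keyword names assumed; check your MPI.jl version's docstring).
# Run with: mpiexec -n 2 julia sendrecv_sketch.jl
using MPI

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
other = (rank + 1) % 2

send_buf = fill(Float64(rank), 4)
recv_buf = zeros(Float64, 4)

# One combined call: the MPI library orders the underlying send and
# receive so that this pairwise exchange cannot deadlock.
MPI.Sendrecv!(send_buf, recv_buf, comm;
              dest = other, sendtag = 0, source = other, recvtag = 0)

println("rank $rank received: ", recv_buf)
MPI.Finalize()
```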
2. Non-blocking communication#
Two problems with blocking communication:
As seen above, deadlock can occur if the programmer is not careful with the order of sends and receives.
The program has to wait for the sends to complete (or at least for the buffer to be reusable) and the receives to complete before doing any more work. In many cases, we can initiate sends and receives, do some work with data that does not depend on the received data, then come back to fix things up once we have the extra data. (This is called overlapping communication and computation.)
This gives rise to a need for non-blocking operations, where the process can start a send or receive, and then the program can continue and check at some future point whether the communication is done or not.
The commands for non-blocking sending and receiving are:
MPI.Isend
MPI.Irecv!
and these have an MPI.Request associated with them that can be checked with MPI.Wait and, in the case of an array of requests, with MPI.Waitall.
Recall that the “I” stands for “with Immediate return”; it does not block until the message is received.
julia> using MPI
help?> MPI.Irecv!
req = Irecv!(recvbuf, comm::Comm[, req::AbstractRequest = Request()];
source::Integer=MPI.ANY_SOURCE, tag::Integer=MPI.ANY_TAG)
Starts a nonblocking receive into the buffer data from MPI rank source of communicator comm using the message tag tag.
data can be a Buffer, or any object for which Buffer(data) is defined.
Returns the AbstractRequest object for the nonblocking receive.
External links
≡≡≡≡≡≡≡≡≡≡≡≡≡≡
• MPI_Irecv man page: OpenMPI
(https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Irecv.3.html), MPICH
(https://www.mpich.org/static/docs/v4.0/www3/MPI_Irecv.html)
help?> MPI.Isend
Isend(data, comm::Comm[, req::AbstractRequest = Request()]; dest::Integer, tag::Integer=0)
Starts a nonblocking send of data to MPI rank dest of communicator comm using the message tag tag.
data can be a Buffer, or any object for which Buffer_send is defined.
Returns the AbstractRequest object for the nonblocking send.
External links
≡≡≡≡≡≡≡≡≡≡≡≡≡≡
• MPI_Isend man page: OpenMPI
(https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Isend.3.html), MPICH
(https://www.mpich.org/static/docs/v4.0/www3/MPI_Isend.html)
help?> MPI.Wait
Wait(req::AbstractRequest)
status = Wait(req::AbstractRequest, Status)
Block until the request req is complete and deallocated.
The Status argument returns the Status of the completed request.
External links
≡≡≡≡≡≡≡≡≡≡≡≡≡≡
• MPI_Wait man page: OpenMPI
(https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Wait.3.html), MPICH
(https://www.mpich.org/static/docs/v4.0/www3/MPI_Wait.html)
help?> MPI.Waitall
Waitall(reqs::AbstractVector{Request}[, statuses::Vector{Status}])
statuses = Waitall(reqs::AbstractVector{Request}, Status)
Block until all active requests in the array reqs are complete.
The optional statuses or Status argument can be used to obtain the return Status of each request.
See also
≡≡≡≡≡≡≡≡
• RequestSet can be used to minimize allocations
External links
≡≡≡≡≡≡≡≡≡≡≡≡≡≡
• MPI_Waitall man page: OpenMPI
(https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Waitall.3.html), MPICH
(https://www.mpich.org/static/docs/v4.0/www3/MPI_Waitall.html)
help?> MPI.Waitany
i = Waitany(reqs::AbstractVector{Request}[, status::Ref{Status}])
i, status = Waitany(reqs::AbstractVector{Request}, Status)
Blocks until one of the requests in the array reqs is complete: if more than one is complete, one is chosen arbitrarily. The request is deallocated and the (1-based) index i of the completed request is returned.
If there are no active requests, then i = nothing.
The optional status argument can be used to obtain the return Status of the request.
See also
≡≡≡≡≡≡≡≡
• RequestSet can be used to minimize allocations
External links
≡≡≡≡≡≡≡≡≡≡≡≡≡≡
• MPI_Waitany man page: OpenMPI
(https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Waitany.3.html), MPICH
(https://www.mpich.org/static/docs/v4.0/www3/MPI_Waitany.html)
Note:
The MPI.Waitany function allows a program to wait on any of the requests in an array to complete (as opposed to all of them).
The functions MPI.Test, MPI.Testany, and MPI.Testall can be used to check whether a request or array of requests has completed, without blocking.
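The non-blocking test functions enable the overlap of communication and computation mentioned earlier. The sketch below is a hypothetical polling loop, assuming the MPI.Test(req) method that returns a Bool:

```julia
# Hypothetical sketch: overlap communication and computation by polling
# the receive request with MPI.Test instead of blocking in MPI.Wait.
# Run with: mpiexec -n 2 julia test_sketch.jl
using MPI

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
other = (rank + 1) % 2

send_buf = fill(Float64(rank), 4)
recv_buf = zeros(Float64, 4)

sreq = MPI.Isend(send_buf, comm; dest = other, tag = 0)
rreq = MPI.Irecv!(recv_buf, comm; source = other, tag = 0)

# Poll the receive request; do independent work while waiting.
while !MPI.Test(rreq)
    # ... work that does not depend on recv_buf goes here ...
end
# rreq is complete; recv_buf now holds valid data.

MPI.Wait(sreq)   # send_buf may be reused after this
println("rank $rank received: ", recv_buf)
MPI.Finalize()
```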
2.1 Examples#
Example of non-blocking communication#
Example of a non-blocking communication can be found at julia_codes/module6-2/non_blocking_wait.jl.
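A minimal sketch of the non-blocking pattern is below; this is a hypothetical illustration, and the actual non_blocking_wait.jl may differ.

```julia
# Hypothetical sketch of non-blocking communication with MPI.Wait.
# Run with: mpiexec -n 2 julia nonblocking_sketch.jl
using MPI

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
other = (rank + 1) % 2

send_buf = fill(Float64(rank), 4)
recv_buf = zeros(Float64, 4)

# Both calls return immediately with a request handle.
rreq = MPI.Irecv!(recv_buf, comm; source = other, tag = 0)
sreq = MPI.Isend(send_buf, comm; dest = other, tag = 0)

# ... work that depends on neither buffer could go here ...

MPI.Wait(rreq)   # after this, recv_buf holds valid data
MPI.Wait(sreq)   # after this, send_buf may be written over

println("rank $rank received: ", recv_buf)
MPI.Finalize()
```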
If we're not careful to wait on the recv request, we can have problems… Example of a buggy version can be found at julia_codes/module6-2/non_blocking_wait_with_bug.jl.
In the following example, which can be found at julia_codes/module6-2/non_blocking_waitall.jl, the MPI.Wait is replaced with an MPI.Waitall call.
There is no reason you'd do this; it is just an example. Generally you do not want to wait on a request until you absolutely have to (so waiting on the send request and receive request together is unneeded!).