22) Blocking and non-blocking communication#

Last time:

  • Introduction to MPI.jl

Today:

  1. Blocking communication
    1.1 Examples

  2. Non-blocking communication
    2.1 Examples

1. Blocking communication#

Note that most MPI routines return error codes indicating whether the function executed successfully. For the most part these are handled internally by the MPI wrappers in Julia. For example, see the help string for MPI.Send below:

using MPI

help?> MPI.Send
  Send(buf, comm::Comm; dest::Integer, tag::Integer=0)

  Perform a blocking send from the buffer buf to MPI rank dest of communicator comm using the message tag tag.

  Send(obj, comm::Comm; dest::Integer, tag::Integer=0)

  Complete a blocking send of an isbits object obj to MPI rank dest of communicator comm using with the message tag tag.

  External links
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡

      MPI_Send man page: OpenMPI
       (https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Send.3.html), MPICH
       (https://www.mpich.org/static/docs/v4.0/www3/MPI_Send.html)
  • The tag above exists because process A may have to send many different types of messages to process B. Instead of B having to go through extra measures to differentiate all these messages, MPI allows senders and receivers to attach message IDs (known as tags) to each message. When process B requests only a message with a certain tag number, messages with different tags are buffered by the network until B is ready for them.

    • Note that the receive operation has to match the tag of a message it wants to receive.
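The tag-matching behavior above can be sketched in a small two-rank program (the file name and values are hypothetical; run with something like `mpiexec -n 2 julia tags.jl`):

```julia
# Sketch: rank 0 sends two differently-tagged messages; rank 1 receives
# them by tag, in the opposite order from how they were sent.
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

if rank == 0
    MPI.Send(fill(1.0, 4), comm; dest = 1, tag = 7)   # message A, tag 7
    MPI.Send(fill(2.0, 4), comm; dest = 1, tag = 9)   # message B, tag 9
elseif rank == 1
    buf = zeros(4)
    # Request tag 9 first: message A (tag 7) is buffered until asked for.
    MPI.Recv!(buf, comm; source = 0, tag = 9)   # receives message B
    MPI.Recv!(buf, comm; source = 0, tag = 7)   # receives message A
end
```

Even though message A arrives at rank 1 first, the tag on the receive determines which message lands in the buffer.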

Likewise for the MPI.Recv! function:

help?> MPI.Recv!
  data = Recv!(recvbuf, comm::Comm;
          source::Integer=MPI.ANY_SOURCE, tag::Integer=MPI.ANY_TAG)
  data, status = Recv!(recvbuf, comm::Comm, MPI.Status;
          source::Integer=MPI.ANY_SOURCE, tag::Integer=MPI.ANY_TAG)

  Completes a blocking receive into the buffer recvbuf from MPI rank source of communicator comm
  using with the message tag tag.

  recvbuf can be a Buffer, or any object for which Buffer(recvbuf) is defined.

  Optionally returns the Status object of the receive.

  See also
  ≡≡≡≡≡≡≡≡

      Recv

      recv

  External links
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡

      MPI_Recv man page: OpenMPI
       (https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Recv.3.html), MPICH
       (https://www.mpich.org/static/docs/v4.0/www3/MPI_Recv.html)
  • MPI.Send blocks until the message has been sent (that means that whatever was in buf can safely be written over)

  • MPI.Recv! blocks until the message has been fully received (that means that buf now holds valid data). Since two ranks may be sending multiple messages to one another, tag uniquely identifies each message (both ranks must use the same tag).

  • For most applications (everything I’ve ever done) the process receiving the communication can predict how much data it will receive. If this is not the case the status variable can be used to find out how much data was received via MPI.Get_count.

  • Similarly, source and tag in MPI.Recv can be set to MPI.ANY_SOURCE and MPI.ANY_TAG and then the status variable can be used to determine the source process rank and the tag received. That said, I’ve never used this in practice and most likely this is not needed for this class.
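A minimal sketch of the Status machinery described above (wildcard source/tag, then querying the Status; the buffer size and values here are made up):

```julia
# Sketch: receive from any source/tag, then inspect the Status to learn
# who sent the message, which tag it carried, and how many values arrived.
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

if rank == 0
    buf = zeros(Float64, 16)   # large enough for any expected message
    _, status = MPI.Recv!(buf, comm, MPI.Status;
                          source = MPI.ANY_SOURCE, tag = MPI.ANY_TAG)
    n = MPI.Get_count(status, Float64)   # number of Float64s received
    println("got $n values from rank $(status.source) with tag $(status.tag)")
else
    # every other rank sends `rank` values with tag `rank`
    MPI.Send(fill(Float64(rank), rank), comm; dest = 0, tag = rank)
end
```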

1.1 Examples#

Example of blocking communication#

Example of a blocking communication can be found at julia_codes/module6-2/blocking.jl.

We will demo this in class.

Example of deadlock#

  • Deadlock: One of the dangers of blocking communication is that deadlock can occur when there are waiting patterns that cannot be resolved, i.e., ranks are mutually waiting to receive data from one another.

Example can be found at julia_codes/module6-2/deadlock.jl.

  • WARNING: if the sends were done first in the example shown above, deadlock may or may not occur depending on whether the MPI implementation buffered the send (i.e., made a copy and returned control to the user).
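One standard way to avoid this kind of deadlock with blocking calls is to break the symmetry of the ordering. A sketch (assuming an even number of ranks; values are made up):

```julia
# Sketch: a safe ordering for a pairwise exchange with blocking calls.
# Even ranks send first then receive; odd ranks receive first then send,
# so no two partners are ever both stuck inside MPI.Send.
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
partner = iseven(rank) ? rank + 1 : rank - 1   # pair up neighboring ranks

sendbuf = fill(Float64(rank), 4)
recvbuf = zeros(4)

if iseven(rank)
    MPI.Send(sendbuf, comm; dest = partner, tag = 0)
    MPI.Recv!(recvbuf, comm; source = partner, tag = 0)
else
    MPI.Recv!(recvbuf, comm; source = partner, tag = 0)
    MPI.Send(sendbuf, comm; dest = partner, tag = 0)
end
```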

Other blocking communication#

Since it is often the case that two processes exchange messages, the MPI standard provides two functions to facilitate such pairwise communication: MPI_Sendrecv and MPI_Sendrecv_replace.
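In MPI.jl the first of these is exposed as MPI.Sendrecv!, which pairs the send and receive internally so the user does not have to order them by hand. A sketch of a ring shift (buffer contents are made up):

```julia
# Sketch: each rank sends to its right neighbor and receives from its
# left neighbor in a single MPI.Sendrecv! call, with no deadlock risk.
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
size = MPI.Comm_size(comm)
dst = mod(rank + 1, size)   # right neighbor (wraps around)
src = mod(rank - 1, size)   # left neighbor (wraps around)

sendbuf = fill(Float64(rank), 4)
recvbuf = zeros(4)
MPI.Sendrecv!(sendbuf, recvbuf, comm;
              dest = dst, sendtag = 0, source = src, recvtag = 0)
# recvbuf now holds the left neighbor's data
```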

2. Non-blocking communication#

Two problems with blocking communication:

  1. As seen above, deadlock can occur if the programmer is not careful with the order of sends and receives.

  2. The program has to wait for the sends to complete (or at least for the buffer to be reusable) and the receives to complete before doing any more work. In many cases, we can initiate sends and receives, do some work with data that does not depend on the received data, and then come back to fix things up once we have the extra data. (This is called overlapping communication and computation.)

This gives rise to a need for non-blocking operations, where the process can start a send or receive, and then the program can continue and check at some future point whether the communication is done or not.

The commands for non-blocking sending and receiving are:

  • MPI.Isend

  • MPI.Irecv!

Each of these returns an MPI.Request that can be checked with MPI.Wait (or, for an array of requests, MPI.Waitall).

Recall that the “I” stands for “with Immediate return”; it does not block until the message is received.

julia> using MPI

help?> MPI.Irecv!
  req = Irecv!(recvbuf, comm::Comm[, req::AbstractRequest = Request()];
          source::Integer=MPI.ANY_SOURCE, tag::Integer=MPI.ANY_TAG)

  Starts a nonblocking receive into the buffer data from MPI rank source of communicator comm using with the message tag tag.

  data can be a Buffer, or any object for which Buffer(data) is defined.

  Returns the AbstractRequest object for the nonblocking receive.

  External links
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡

      MPI_Irecv man page: OpenMPI
       (https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Irecv.3.html), MPICH
       (https://www.mpich.org/static/docs/v4.0/www3/MPI_Irecv.html)
help?> MPI.Isend
  Isend(data, comm::Comm[, req::AbstractRequest = Request()]; dest::Integer, tag::Integer=0)

  Starts a nonblocking send of data to MPI rank dest of communicator comm using with the message tag tag.

  data can be a Buffer, or any object for which Buffer_send is defined.

  Returns the AbstractRequest object for the nonblocking send.

  External links
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡

      MPI_Isend man page: OpenMPI
       (https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Isend.3.html), MPICH
       (https://www.mpich.org/static/docs/v4.0/www3/MPI_Isend.html)
help?> MPI.Wait
  Wait(req::AbstractRequest)
  status = Wait(req::AbstractRequest, Status)

  Block until the request req is complete and deallocated.

  The Status argument returns the Status of the completed request.

  External links
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡

      MPI_Wait man page: OpenMPI
       (https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Wait.3.html), MPICH
       (https://www.mpich.org/static/docs/v4.0/www3/MPI_Wait.html)
help?> MPI.Waitall
  Waitall(reqs::AbstractVector{Request}[, statuses::Vector{Status}])
  statuses = Waitall(reqs::AbstractVector{Request}, Status)

  Block until all active requests in the array reqs are complete.

  The optional statuses or Status argument can be used to obtain the return Status of each request.

  See also
  ≡≡≡≡≡≡≡≡

      RequestSet can be used to minimize allocations

  External links
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡

      MPI_Waitall man page: OpenMPI
       (https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Waitall.3.html), MPICH
       (https://www.mpich.org/static/docs/v4.0/www3/MPI_Waitall.html)
help?> MPI.Waitany
  i = Waitany(reqs::AbstractVector{Request}[, status::Ref{Status}])
  i, status = Waitany(reqs::AbstractVector{Request}, Status)

  Blocks until one of the requests in the array reqs is complete: if more than one is complete, one is chosen arbitrarily. The request is deallocated and the (1-based) index i of the completed request is returned.

  If there are no active requests, then i = nothing.

  The optional status argument can be used to obtain the return Status of the request.

  See also
  ≡≡≡≡≡≡≡≡

      RequestSet can be used to minimize allocations

  External links
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡

      MPI_Waitany man page: OpenMPI
       (https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Waitany.3.html), MPICH
       (https://www.mpich.org/static/docs/v4.0/www3/MPI_Waitany.html)

Note:

  • The MPI.Waitany function allows a program to wait on any of the requests in an array to complete (as opposed to all of them)

  • The functions MPI.Test, MPI.Testany, and MPI.Testall can be used to check whether a request or array of requests is completed without blocking.
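A sketch of the non-blocking test idea: poll the request with MPI.Test and do other work on each pass instead of blocking in MPI.Wait (the buffer and values are made up):

```julia
# Sketch: poll a receive request with MPI.Test, doing independent work
# on each pass until the message has actually arrived.
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

if rank == 0
    buf = zeros(4)
    req = MPI.Irecv!(buf, comm; source = 1, tag = 0)
    while !MPI.Test(req)   # true once the receive has completed
        # ... do useful work that does not depend on buf ...
    end
    # buf is now valid data
elseif rank == 1
    MPI.Wait(MPI.Isend(fill(1.0, 4), comm; dest = 0, tag = 0))
end
```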

2.1 Examples#

Example of non-blocking communication#

Example of a non-blocking communication can be found at julia_codes/module6-2/non_blocking_wait.jl.

If we're not careful to wait on the recv request before using the buffer, we can have problems… Example of a buggy version can be found at julia_codes/module6-2/non_blocking_wait_with_bug.jl.

In the following example, which can be found at julia_codes/module6-2/non_blocking_waitall.jl, the MPI.Wait call is replaced with an MPI.Waitall call.

There is no reason you'd do this here; it's just an example. Generally you do not want to wait on a request until you absolutely have to (so waiting on the send request and receive request together is unnecessary!)
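To close the loop on the overlapping-communication-and-computation idea from the start of this section, here is a sketch of the usual pattern: start the exchange, compute with data that does not depend on the incoming message, and only wait right before touching the received values (buffer sizes and the "work" are made up):

```julia
# Sketch: overlap communication and computation in a ring exchange.
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
size = MPI.Comm_size(comm)
dst = mod(rank + 1, size)
src = mod(rank - 1, size)

sendbuf = fill(Float64(rank), 8)
recvbuf = zeros(8)

# Start both transfers immediately...
reqs = [MPI.Irecv!(recvbuf, comm; source = src, tag = 0),
        MPI.Isend(sendbuf, comm; dest = dst, tag = 0)]

# ...do work that does not depend on recvbuf while they are in flight...
local_sum = sum(abs2, sendbuf)

# ...and only wait when we actually need the received data.
MPI.Waitall(reqs)   # recvbuf is now valid; sendbuf is safe to reuse
total = local_sum + sum(recvbuf)
```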