From: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
Artemy Kovalyov
<artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: [rdma-next 11/11] Documentation: Hardware tag matching
Date: Mon, 14 Aug 2017 16:53:52 +0300 [thread overview]
Message-ID: <20170814135352.5324-12-leon@kernel.org> (raw)
In-Reply-To: <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Add document providing definitions of terms and core explanations
for tag matching (TM) protocols, eager and rendezvous,
TM application header, tag list manipulations and matching process.
Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
Documentation/infiniband/tag_matching.txt | 64 +++++++++++++++++++++++++++++++
1 file changed, 64 insertions(+)
create mode 100644 Documentation/infiniband/tag_matching.txt
diff --git a/Documentation/infiniband/tag_matching.txt b/Documentation/infiniband/tag_matching.txt
new file mode 100644
index 000000000000..d2a3bf819226
--- /dev/null
+++ b/Documentation/infiniband/tag_matching.txt
@@ -0,0 +1,64 @@
+Tag matching logic
+
+The MPI standard defines a set of rules, known as tag-matching, for matching
+source send operations to destination receives. The following parameters must
+match the following source and destination parameters:
+* Communicator
+* User tag - wild card may be specified by the receiver
+* Source rank – wild car may be specified by the receiver
+* Destination rank – wild
+The ordering rules require that when more than one pair of send and receive
+message envelopes may match, the pair that includes the earliest posted-send
+and the earliest posted-receive is the pair that must be used to satisfy the
+matching operation. However, this doesn’t imply that tags are consumed in
+the order they are created, e.g., a later generated tag may be consumed, if
+earlier tags can’t be used to satisfy the matching rules.
+
+When a message is sent from the sender to the receiver, the communication
+library may attempt to process the operation either after or before the
+corresponding matching receive is posted. If a matching receive is posted,
+this is an expected message, otherwise it is called an unexpected message.
+Implementations frequently use different matching schemes for these two
+different matching instances.
+
+To keep MPI library memory footprint down, MPI implementations typically use
+two different protocols for this purpose:
+
+1. The Eager protocol- the complete message is sent when the send is
+processed by the sender. A completion send is received in the send_cq
+notifying that the buffer can be reused.
+
+2. The Rendezvous Protocol - the sender sends the tag-matching header,
+and perhaps a portion of data when first notifying the receiver. When the
+corresponding buffer is posted, the responder will use the information from
+the header to initiate an RDMA READ operation directly to the matching buffer.
+A fin message needs to be received in order for the buffer to be reused.
+
+Tag matching implementation
+
+There are two types of matching objects used, the posted receive list and the
+unexpected message list. The application posts receive buffers through calls
+to the MPI receive routines in the posted receive list and posts send messages
+using the MPI send routines. The head of the posted receive list may be
+maintained by the hardware, with the software expected to shadow this list.
+
+When send is initiated and arrives at the receive side, if there is no
+pre-posted receive for this arriving message, it is passed to the software and
+placed in the unexpected message list. Otherwise the match is processed,
+including rendezvous processing, if appropriate, delivering the data to the
+specified receive buffer. This allows overlapping receive-side MPI tag
+matching with computation.
+
+When a receive-message is posted, the communication library will first check
+the software unexpected message list for a matching receive. If a match is
+found, data is delivered to the user buffer, using a software controlled
+protocol. The UCX implementation uses either an eager or rendezvous protocol,
+depending on data size. If no match is found, the entire pre-posted receive
+list is maintained by the hardware, and there is space to add one more
+pre-posted receive to this list, this receive is passed to the hardware.
+Software is expected to shadow this list, to help with processing MPI cancel
+operations. In addition, because hardware and software are not expected to be
+tightly synchronized with respect to the tag-matching operation, this shadow
+list is used to detect the case that a pre-posted receive is passed to the
+hardware, as the matching unexpected message is being passed from the hardware
+to the software.
--
2.14.0
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
prev parent reply other threads:[~2017-08-14 13:53 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-14 13:53 [pull request][rdma-next 00/11] Hardware tag matching support Leon Romanovsky
[not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2017-08-14 13:53 ` [rdma-next 01/11] net/mlx5: Update HW layout definitions Leon Romanovsky
2017-08-14 13:53 ` [rdma-next 02/11] IB/core: Add XRQ capabilities Leon Romanovsky
2017-08-14 13:53 ` [rdma-next 03/11] IB/core: Separate CQ handle in SRQ context Leon Romanovsky
2017-08-14 13:53 ` [rdma-next 04/11] IB/core: Add new SRQ type IB_SRQT_TM Leon Romanovsky
2017-08-14 13:53 ` [rdma-next 05/11] IB/uverbs: Add XRQ creation parameter to UAPI Leon Romanovsky
2017-08-14 13:53 ` [rdma-next 06/11] IB/uverbs: Add new SRQ type IB_SRQT_TM Leon Romanovsky
2017-08-14 13:53 ` [rdma-next 07/11] IB/uverbs: Expose XRQ capabilities Leon Romanovsky
2017-08-14 13:53 ` [rdma-next 08/11] IB/mlx5: Fill " Leon Romanovsky
2017-08-14 13:53 ` [rdma-next 09/11] net/mlx5: Add XRQ support Leon Romanovsky
2017-08-14 13:53 ` [rdma-next 10/11] IB/mlx5: Support IB_SRQT_TM Leon Romanovsky
2017-08-14 13:53 ` Leon Romanovsky [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170814135352.5324-12-leon@kernel.org \
--to=leon-dgejt+ai2ygdnm+yrofe0a@public.gmane.org \
--cc=artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox