* [PATCH rdma-next 0/9] Rework retry algorithm used when sending MADs
@ 2024-12-05 13:49 Leon Romanovsky
From: Leon Romanovsky @ 2024-12-05 13:49 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: linux-kernel, linux-rdma, Sean Hefty, Vlad Dumitrescu
From Vlad,
This series aims to improve behaviour of a MAD sender under congestion
and/or receiver overload. We've seen significant drops in goodput when
MAD receivers are overloaded. This typically happens with SA requests,
which are served by a single node (SM), but can also happen with CM.
Patch 7 introduces the main change: exponential backoff. This new retry
algorithm is applied to all MADs, except RMPP and OPA. To avoid
reductions in recovery speed under transient failures, the exponential
backoff algorithm only engages after a certain number of linear timeouts
has been experienced. The backoff algorithm resets to the beginning after
a CM MRA, assuming the remote is no longer overloaded.
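
To illustrate the retry schedule described above, here is a minimal
user-space sketch. It is not taken from the patches: the helper name, the
doubling factor and the max_ms cap are assumptions made for illustration
only, and the real logic lives in drivers/infiniband/core/mad.c.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patches: timeout to arm after
 * `timeouts_seen` earlier timeouts of the same send MAD.
 *
 * Assumptions: the interval doubles per timeout once the linear phase is
 * over, and it saturates at max_ms.
 */
static uint64_t mad_retry_timeout_ms(uint64_t base_ms,
				     unsigned int timeouts_seen,
				     unsigned int linear_timeouts,
				     uint64_t max_ms)
{
	uint64_t timeout = base_ms;
	unsigned int i;

	/* Linear phase: keep retrying at the caller's interval. */
	if (timeouts_seen < linear_timeouts)
		return timeout < max_ms ? timeout : max_ms;

	/* Backoff phase: double once per timeout beyond the linear phase. */
	for (i = linear_timeouts; i <= timeouts_seen; i++) {
		if (timeout >= max_ms / 2)
			return max_ms;	/* saturate, also avoids overflow */
		timeout *= 2;
	}

	return timeout;
}
```

With base_ms = 100 and linear_timeouts = 3, this sketch yields intervals of
100, 100, 100, 200, 400, 800, ... ms. In these terms, the reset after a CM
MRA corresponds to setting timeouts_seen back to 0.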
Because a trade-off between speed of recovery under transient failure
and reducing load from unnecessary retries under persistent failure must
be made, and this trade-off depends on the network scale, patch 8 makes
mad-linear-timeouts configurable.
Patch 1 makes CM MRA apply only once, to avoid excessive delays that
would persist even when the receiver is likely no longer overloaded.
The exponential backoff algorithm (a) increases the time until a send
MAD reaches the final timeout, and (b) makes that time hard for callers
to predict. Since certain callers appear to care about this, Patch 2
introduces a new option, deadline, which can be used to enforce when
the final timeout is reached. SA, UMAD and CM are updated to use this
new parameter (patches 3, 5, 6).
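
One way to picture the deadline is as a clamp on each retry interval. The
sketch below is an assumption about the general idea, not the code in
patch 2: the helper name, the millisecond units and the "0 means no
deadline" convention are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patches: shorten the next retry
 * interval so that it never extends past an absolute deadline.
 *
 * Returns false once the deadline has been reached, meaning the caller
 * should report the final timeout instead of retrying again.
 */
static bool mad_clamp_to_deadline(uint64_t now_ms, uint64_t deadline_ms,
				  uint64_t *timeout_ms)
{
	uint64_t remaining;

	/* Assumption for this sketch: 0 means no deadline was requested. */
	if (deadline_ms == 0)
		return true;

	if (now_ms >= deadline_ms)
		return false;

	remaining = deadline_ms - now_ms;
	if (*timeout_ms > remaining)
		*timeout_ms = remaining;

	return true;
}
```

Applied to the output of a backoff schedule, a clamp like this keeps the
total time to the final timeout bounded regardless of how the individual
retry intervals grow.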
Patch 3 also solves a related issue in SA, which configures the MAD
layer with extremely aggressive retry intervals, in certain cases.
Because the current aggressive retry was introduced to solve another
issue, patch 4 makes sa-min-timeout configurable.
Patch 9 resolves another related issue in CM, which uses a retry
interval that is far too high for (low latency) RDMA networks.
In summary:
1) IB/mad: Apply timeout modification (CM MRA) only once
2) IB/mad: Add deadline for send MADs
3) RDMA/sa_query: Enforce min retry interval and deadline
4) RDMA/nldev: Add sa-min-timeout management attribute
5) IB/umad: Set deadline when sending non-RMPP MADs
6) IB/cm: Set deadline when sending MADs
7) IB/mad: Exponential backoff when retrying sends
8) RDMA/nldev: Add mad-linear-timeouts management attribute
9) IB/cma: Lower response timeout to roughly 1s
Two tunables will be added to the rdma tool (iproute2), under the
'management' namespace, as a follow-up:
mad-linear-timeouts
sa-min-timeout
Thanks
Vlad Dumitrescu (9):
IB/mad: Apply timeout modification (CM MRA) only once
IB/mad: Add deadline for send MADs
RDMA/sa_query: Enforce min retry interval and deadline
RDMA/nldev: Add sa-min-timeout management attribute
IB/umad: Set deadline when sending non-RMPP MADs
IB/cm: Set deadline when sending MADs
IB/mad: Exponential backoff when retrying sends
RDMA/nldev: Add mad-linear-timeouts management attribute
IB/cma: Lower response timeout to roughly 1s
 drivers/infiniband/core/cm.c        |  13 +++
 drivers/infiniband/core/cma.c       |   2 +-
 drivers/infiniband/core/core_priv.h |   4 +
 drivers/infiniband/core/mad.c       | 141 ++++++++++++++++++++++++++--
 drivers/infiniband/core/mad_priv.h  |   8 ++
 drivers/infiniband/core/nldev.c     | 133 ++++++++++++++++++++++++++
 drivers/infiniband/core/sa_query.c  |  81 +++++++++++++---
 drivers/infiniband/core/user_mad.c  |   8 ++
 include/rdma/ib_mad.h               |  29 ++++++
 include/uapi/rdma/ib_user_mad.h     |  12 ++-
 include/uapi/rdma/rdma_netlink.h    |   7 ++
 11 files changed, 416 insertions(+), 22 deletions(-)
--
2.47.0