From: Leon Romanovsky <leon@kernel.org>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	Sean Hefty <shefty@nvidia.com>,
	Vlad Dumitrescu <vdumitrescu@nvidia.com>
Subject: [PATCH rdma-next 0/9] Rework retry algorithm used when sending MADs
Date: Thu,  5 Dec 2024 15:49:30 +0200
Message-ID: <cover.1733405453.git.leon@kernel.org>

From Vlad,

This series aims to improve behaviour of a MAD sender under congestion
and/or receiver overload.  We've seen significant drops in goodput when
MAD receivers are overloaded.  This typically happens with SA requests,
which are served by a single node (SM), but can also happen with CM.

Patch 7 introduces the main change: exponential backoff.  This new retry
algorithm is applied to all MADs, except RMPP and OPA.  To avoid
reductions in recovery speed under transient failures, the exponential
backoff algorithm only engages after a certain number of linear timeouts
have elapsed.  The backoff algorithm resets to the beginning after a CM
MRA, assuming the remote is no longer overloaded.
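
As a rough illustration of the retry schedule described above (not the
code from patch 7), the sketch below keeps the timeout constant for the
first few attempts and then doubles it on each further attempt.  The
function name, the doubling factor and the MAX_BACKOFF_SHIFT cap are all
assumptions made for this example; the series may use different values
and a different growth rule.

```c
#include <stdint.h>

/* Assumed cap for this sketch: never exceed 64x the base timeout. */
#define MAX_BACKOFF_SHIFT 6

/*
 * Hypothetical helper: return the timeout for a retry attempt (0-based).
 * The first `linear_timeouts` attempts use the base timeout unchanged.
 * After that, the timeout doubles per attempt until the cap is reached.
 */
static uint64_t retry_timeout_ms(uint32_t base_ms, unsigned int attempt,
				 unsigned int linear_timeouts)
{
	unsigned int shift;

	if (attempt < linear_timeouts)
		return base_ms;

	shift = attempt - linear_timeouts + 1;
	if (shift > MAX_BACKOFF_SHIFT)
		shift = MAX_BACKOFF_SHIFT;

	/* Widen before shifting so large base values cannot overflow. */
	return (uint64_t)base_ms << shift;
}
```

In this model, resetting the backoff after a CM MRA would amount to
restarting the attempt counter at zero.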

Because a trade-off between speed of recovery under transient failure
and reducing load from unnecessary retries under persistent failure must
be made, and this trade-off depends on the network scale, patch 8 makes
mad-linear-timeouts configurable.

Patch 1 makes CM MRA apply only once, to prevent entering an excessive
delay condition, even when the receiver is likely no longer overloaded.

The exponential backoff algorithm (a) increases the time until a send
MAD reaches the final timeout, and (b) makes that time hard for callers
to predict.  Since certain callers appear to care about this, patch 2
introduces a new option, deadline, which can be used to enforce when
the final timeout is reached.  SA, UMAD and CM are updated to use this
new parameter (patches 3, 5, 6).
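
One way to picture how a deadline bounds the final timeout is the
sketch below: each retry interval is clamped to the time remaining until
the deadline.  This is only an illustration under assumed semantics; the
helper name, the millisecond units and the "give up once the deadline
has passed" rule are not taken from patch 2.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp the next retry timeout so that a send never
 * waits past its deadline.  Returns false when the deadline has already
 * passed, meaning the send should complete with a timeout instead of
 * being retried.  All values are milliseconds on one monotonic clock.
 */
static bool clamp_to_deadline(uint64_t now_ms, uint64_t deadline_ms,
			      uint64_t *timeout_ms)
{
	uint64_t remaining;

	if (now_ms >= deadline_ms)
		return false;

	remaining = deadline_ms - now_ms;
	if (*timeout_ms > remaining)
		*timeout_ms = remaining;

	return true;
}
```

With such a clamp, the caller-visible worst case is the deadline itself,
regardless of how far the backoff has grown.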

Patch 3 also solves a related issue in SA, which configures the MAD
layer with extremely aggressive retry intervals, in certain cases.
Because the current aggressive retry was introduced to solve another
issue, patch 4 makes sa-min-timeout configurable.

Patch 9 resolves another related issue in CM, which uses a retry
interval that is way too high for (low latency) RDMA networks.

In summary:
  1) IB/mad: Apply timeout modification (CM MRA) only once
  2) IB/mad: Add deadline for send MADs
  3) RDMA/sa_query: Enforce min retry interval and deadline
  4) RDMA/nldev: Add sa-min-timeout management attribute
  5) IB/umad: Set deadline when sending non-RMPP MADs
  6) IB/cm: Set deadline when sending MADs
  7) IB/mad: Exponential backoff when retrying sends
  8) RDMA/nldev: Add mad-linear-timeouts management attribute
  9) IB/cma: Lower response timeout to roughly 1s

Two tunables will be added to the rdma tool (iproute2), under the
'management' namespace, as a follow-up:

  mad-linear-timeouts
  sa-min-timeout

Thanks

Vlad Dumitrescu (9):
  IB/mad: Apply timeout modification (CM MRA) only once
  IB/mad: Add deadline for send MADs
  RDMA/sa_query: Enforce min retry interval and deadline
  RDMA/nldev: Add sa-min-timeout management attribute
  IB/umad: Set deadline when sending non-RMPP MADs
  IB/cm: Set deadline when sending MADs
  IB/mad: Exponential backoff when retrying sends
  RDMA/nldev: Add mad-linear-timeouts management attribute
  IB/cma: Lower response timeout to roughly 1s

 drivers/infiniband/core/cm.c        |  13 +++
 drivers/infiniband/core/cma.c       |   2 +-
 drivers/infiniband/core/core_priv.h |   4 +
 drivers/infiniband/core/mad.c       | 141 ++++++++++++++++++++++++++--
 drivers/infiniband/core/mad_priv.h  |   8 ++
 drivers/infiniband/core/nldev.c     | 133 ++++++++++++++++++++++++++
 drivers/infiniband/core/sa_query.c  |  81 +++++++++++++---
 drivers/infiniband/core/user_mad.c  |   8 ++
 include/rdma/ib_mad.h               |  29 ++++++
 include/uapi/rdma/ib_user_mad.h     |  12 ++-
 include/uapi/rdma/rdma_netlink.h    |   7 ++
 11 files changed, 416 insertions(+), 22 deletions(-)

-- 
2.47.0

