public inbox for netdev@vger.kernel.org
* [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources
@ 2026-03-07  1:47 Long Li
From: Long Li @ 2026-03-07  1:47 UTC
  To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
	Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
	Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
	Dexuan Cui
  Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel

When the MANA hardware undergoes a service reset, the ETH auxiliary device
(mana.eth) used by DPDK persists across the reset cycle. Unlike the device
used for RC/UD/GSI QPs, it is not removed and re-added. As a result,
userspace RDMA consumers such as DPDK have no way of knowing that the
firmware handles for their PD, CQ, WQ, QP and MR resources have become stale.

This series adds per-ucontext resource tracking and a reset notification
mechanism so that:

1. The RDMA driver is informed of service reset events via direct callbacks
   from the ETH driver (reset_notify / resume_notify).

2. On reset, all tracked firmware handles are invalidated (set to
   INVALID_MANA_HANDLE), user doorbell mappings are revoked via
   rdma_user_mmap_disassociate(), and IB_EVENT_PORT_ERR is dispatched to
   each affected ucontext so userspace can detect the reset.

3. Destroy callbacks check for INVALID_MANA_HANDLE and skip firmware
   commands for resources already invalidated by the reset path,
   preventing stale handles from being sent to firmware.

4. A reset_rwsem serializes handle invalidation against resource creation
   to avoid races between the reset path and new resource allocation.
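
The invalidate-then-skip behaviour in points 2 and 3 can be illustrated with
a minimal userspace sketch. This is not the driver code from the series: the
sketch_* names, the structure layout and the value chosen for the invalid
handle are all assumptions made for illustration, firmware commands and the
IB event are reduced to counters, and the reset_rwsem locking from point 4
is omitted because the sketch is single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the driver's INVALID_MANA_HANDLE; this value is an assumption. */
#define SKETCH_INVALID_HANDLE UINT64_MAX

/* One tracked resource (PD, CQ, WQ, QP or MR) on a ucontext's list. */
struct sketch_res {
	uint64_t handle;		/* firmware handle, or SKETCH_INVALID_HANDLE */
	struct sketch_res *next;
};

/* Hypothetical per-ucontext state; the real driver structures differ. */
struct sketch_ucontext {
	struct sketch_res *resources;	/* per-ucontext tracking list */
	int fw_destroy_cmds;		/* counts simulated firmware destroy commands */
	int port_err_events;		/* counts simulated IB_EVENT_PORT_ERR dispatches */
};

/* Creation path: record the firmware handle and track the resource. */
void sketch_track(struct sketch_ucontext *ctx, struct sketch_res *res,
		  uint64_t fw_handle)
{
	assert(ctx && res);
	res->handle = fw_handle;
	res->next = ctx->resources;
	ctx->resources = res;
}

/* Reset path: invalidate every tracked handle, then signal the ucontext. */
void sketch_reset_notify(struct sketch_ucontext *ctx)
{
	assert(ctx);
	for (struct sketch_res *r = ctx->resources; r; r = r->next)
		r->handle = SKETCH_INVALID_HANDLE;
	ctx->port_err_events++;
}

/* Destroy path: untrack, and only issue a firmware command for a live handle. */
void sketch_destroy(struct sketch_ucontext *ctx, struct sketch_res *res)
{
	assert(ctx && res);
	struct sketch_res **pp = &ctx->resources;

	while (*pp && *pp != res)
		pp = &(*pp)->next;
	if (*pp)
		*pp = res->next;	/* unlink from the tracking list */

	if (res->handle != SKETCH_INVALID_HANDLE)
		ctx->fw_destroy_cmds++;	/* handle still valid: tell firmware */
	res->handle = SKETCH_INVALID_HANDLE;
}
```

In this sketch, a resource destroyed before sketch_reset_notify() costs one
simulated firmware command, while a resource destroyed after it is only
unlinked, so a stale handle never reaches the firmware counter.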

Patches 1-6 introduce per-ucontext tracking lists for each resource type.
Patch 7 implements the reset/resume notification mechanism with rwsem
serialization, mmap revocation, and IB event dispatch.
Patch 8 adds INVALID_MANA_HANDLE checks in destroy callbacks.

Tested with DPDK testpmd on an Azure VM (linux-next-20260306): confirmed
that IB_EVENT_PORT_ERR (type=10) and IB_EVENT_PORT_ACTIVE (type=9) are
delivered to userspace during service reset, and that testpmd tears down
cleanly afterwards.

Long Li (8):
  RDMA/mana_ib: Track ucontext per device
  RDMA/mana_ib: Track PD per ucontext
  RDMA/mana_ib: Track CQ per ucontext
  RDMA/mana_ib: Track WQ per ucontext
  RDMA/mana_ib: Track QP per ucontext
  RDMA/mana_ib: Track MR per ucontext
  RDMA/mana_ib: Notify service reset events to RDMA devices
  RDMA/mana_ib: Skip firmware commands for invalidated handles

 drivers/infiniband/hw/mana/cq.c               |  44 +++++--
 drivers/infiniband/hw/mana/device.c           | 105 ++++++++++++++++++
 drivers/infiniband/hw/mana/main.c             |  56 +++++++++-
 drivers/infiniband/hw/mana/mana_ib.h          |  19 ++++
 drivers/infiniband/hw/mana/mr.c               |  33 +++++-
 drivers/infiniband/hw/mana/qp.c               |  61 +++++++---
 drivers/infiniband/hw/mana/wq.c               |  24 ++++
 drivers/net/ethernet/microsoft/mana/mana_en.c |  14 ++-
 include/net/mana/gdma.h                       |   6 +
 9 files changed, 331 insertions(+), 31 deletions(-)

-- 
2.43.0


Thread overview: 15+ messages
2026-03-07  1:47 [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Long Li
2026-03-07  1:47 ` [PATCH rdma-next 1/8] RDMA/mana_ib: Track ucontext per device Long Li
2026-03-07  1:47 ` [PATCH rdma-next 2/8] RDMA/mana_ib: Track PD per ucontext Long Li
2026-03-07  1:47 ` [PATCH rdma-next 3/8] RDMA/mana_ib: Track CQ " Long Li
2026-03-07  1:47 ` [PATCH rdma-next 4/8] RDMA/mana_ib: Track WQ " Long Li
2026-03-07  1:47 ` [PATCH rdma-next 5/8] RDMA/mana_ib: Track QP " Long Li
2026-03-07  1:47 ` [PATCH rdma-next 6/8] RDMA/mana_ib: Track MR " Long Li
2026-03-07  1:47 ` [PATCH rdma-next 7/8] RDMA/mana_ib: Notify service reset events to RDMA devices Long Li
2026-03-07  1:47 ` [PATCH rdma-next 8/8] RDMA/mana_ib: Skip firmware commands for invalidated handles Long Li
2026-03-07 17:38 ` [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Leon Romanovsky
2026-03-13 16:59   ` Jason Gunthorpe
2026-03-16 20:08     ` Leon Romanovsky
2026-03-17 23:43       ` [EXTERNAL] " Long Li
2026-03-18 14:49         ` Leon Romanovsky
2026-03-21  0:49           ` Long Li
