* [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources
@ 2026-03-07 1:47 Long Li
2026-03-07 1:47 ` [PATCH rdma-next 1/8] RDMA/mana_ib: Track ucontext per device Long Li
` (8 more replies)
0 siblings, 9 replies; 15+ messages in thread
From: Long Li @ 2026-03-07 1:47 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
When the MANA hardware undergoes a service reset, the ETH auxiliary device
(mana.eth) used by DPDK persists across the reset cycle; unlike the RNIC
device, it is not removed and re-added. This means userspace RDMA consumers
such as DPDK have no way of knowing that the firmware handles for their PD,
CQ, WQ, QP and MR resources have become stale.
This series adds per-ucontext resource tracking and a reset notification
mechanism so that:
1. The RDMA driver is informed of service reset events via direct callbacks
from the ETH driver (reset_notify / resume_notify).
2. On reset, all tracked firmware handles are invalidated (set to
INVALID_MANA_HANDLE), user doorbell mappings are revoked via
rdma_user_mmap_disassociate(), and IB_EVENT_PORT_ERR is dispatched for
each port so userspace can detect the reset.
3. Destroy callbacks check for INVALID_MANA_HANDLE and skip firmware
commands for resources already invalidated by the reset path,
preventing stale handles from being sent to firmware.
4. A reset_rwsem serializes handle invalidation against resource creation
to avoid races between the reset path and new resource allocation.
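Point 3 amounts to a small guard at the top of every destroy callback. A
minimal sketch follows; destroy_fw_object() is a hypothetical stand-in for
the driver's destroy callbacks, while the handle type and the
INVALID_MANA_HANDLE value follow include/net/mana/gdma.h:

```c
#include <stdint.h>

typedef uint64_t mana_handle_t;
/* Matches the driver's definition in include/net/mana/gdma.h */
#define INVALID_MANA_HANDLE ((mana_handle_t)-1)

/* Hypothetical destroy callback: when the reset path has already set
 * the handle to INVALID_MANA_HANDLE, the firmware object no longer
 * exists and no destroy command may be sent. */
static int destroy_fw_object(mana_handle_t handle)
{
	if (handle == INVALID_MANA_HANDLE)
		return 0;	/* invalidated by the reset path; skip firmware */
	/* ... issue the GDMA destroy request to firmware here ... */
	return 0;
}
```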
Patches 1-6 introduce per-ucontext tracking lists for each resource type.
Patch 7 implements the reset/resume notification mechanism with rwsem
serialization, mmap revocation, and IB event dispatch.
Patch 8 adds INVALID_MANA_HANDLE checks in destroy callbacks.
Tested with DPDK testpmd on an Azure VM (linux-next-20260306): confirmed
that IB_EVENT_PORT_ERR (type=10) and IB_EVENT_PORT_ACTIVE (type=9) are
delivered to userspace during a service reset, and that testpmd tears down
cleanly afterwards.
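For reference, the numeric types quoted above follow the kernel's
enum ib_event_type ordering, which libibverbs mirrors as enum
ibv_event_type. A sketch of the mapping (not DPDK's actual handling; the
enum names and values are the only non-hypothetical parts):

```c
#include <assert.h>
#include <string.h>

/* Values follow enum ib_event_type in include/rdma/ib_verbs.h;
 * 9 and 10 are the two values observed in the testpmd run. */
enum {
	EV_PORT_ACTIVE = 9,
	EV_PORT_ERR    = 10,
};

static const char *ev_name(int type)
{
	switch (type) {
	case EV_PORT_ACTIVE: return "PORT_ACTIVE";
	case EV_PORT_ERR:    return "PORT_ERR";
	default:             return "OTHER";
	}
}

/*
 * A real consumer reads these from the async FD via libibverbs
 * (needs <infiniband/verbs.h> and a live device context):
 *
 *	struct ibv_async_event ev;
 *	while (!ibv_get_async_event(ctx, &ev)) {
 *		if (ev.event_type == IBV_EVENT_PORT_ERR)
 *			;	// firmware handles are now stale
 *		ibv_ack_async_event(&ev);
 *	}
 */
```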
Long Li (8):
RDMA/mana_ib: Track ucontext per device
RDMA/mana_ib: Track PD per ucontext
RDMA/mana_ib: Track CQ per ucontext
RDMA/mana_ib: Track WQ per ucontext
RDMA/mana_ib: Track QP per ucontext
RDMA/mana_ib: Track MR per ucontext
RDMA/mana_ib: Notify service reset events to RDMA devices
RDMA/mana_ib: Skip firmware commands for invalidated handles
drivers/infiniband/hw/mana/cq.c | 44 +++++--
drivers/infiniband/hw/mana/device.c | 105 ++++++++++++++++++
drivers/infiniband/hw/mana/main.c | 56 +++++++++-
drivers/infiniband/hw/mana/mana_ib.h | 19 ++++
drivers/infiniband/hw/mana/mr.c | 33 +++++-
drivers/infiniband/hw/mana/qp.c | 61 +++++++---
drivers/infiniband/hw/mana/wq.c | 24 ++++
drivers/net/ethernet/microsoft/mana/mana_en.c | 14 ++-
include/net/mana/gdma.h | 6 +
9 files changed, 331 insertions(+), 31 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH rdma-next 1/8] RDMA/mana_ib: Track ucontext per device
2026-03-07 1:47 [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Long Li
@ 2026-03-07 1:47 ` Long Li
2026-03-07 1:47 ` [PATCH rdma-next 2/8] RDMA/mana_ib: Track PD per ucontext Long Li
` (7 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Long Li @ 2026-03-07 1:47 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
Add per-device tracking of ucontext objects. Each ucontext is added
to the device's ucontext_list on allocation and removed on deallocation.
A mutex protects the list and a per-ucontext lock protects resource
lists that will be added in subsequent patches.
This enables iterating over all active ucontexts during service reset
cleanup.
Signed-off-by: Long Li <longli@microsoft.com>
---
drivers/infiniband/hw/mana/device.c | 2 ++
drivers/infiniband/hw/mana/main.c | 10 ++++++++++
drivers/infiniband/hw/mana/mana_ib.h | 6 ++++++
3 files changed, 18 insertions(+)
diff --git a/drivers/infiniband/hw/mana/device.c b/drivers/infiniband/hw/mana/device.c
index ccc2279ca63c..149e8d4d5b8e 100644
--- a/drivers/infiniband/hw/mana/device.c
+++ b/drivers/infiniband/hw/mana/device.c
@@ -132,6 +132,8 @@ static int mana_ib_probe(struct auxiliary_device *adev,
dev->ib_dev.dev.parent = gc->dev;
dev->gdma_dev = mdev;
xa_init_flags(&dev->qp_table_wq, XA_FLAGS_LOCK_IRQ);
+ mutex_init(&dev->ucontext_lock);
+ INIT_LIST_HEAD(&dev->ucontext_list);
if (mana_ib_is_rnic(dev)) {
dev->ib_dev.phys_port_cnt = 1;
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index 8d99cd00f002..fc28bdafcfd6 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -221,6 +221,12 @@ int mana_ib_alloc_ucontext(struct ib_ucontext *ibcontext,
ucontext->doorbell = doorbell_page;
+ mutex_init(&ucontext->lock);
+
+ mutex_lock(&mdev->ucontext_lock);
+ list_add_tail(&ucontext->dev_list, &mdev->ucontext_list);
+ mutex_unlock(&mdev->ucontext_lock);
+
return 0;
}
@@ -236,6 +242,10 @@ void mana_ib_dealloc_ucontext(struct ib_ucontext *ibcontext)
mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
gc = mdev_to_gc(mdev);
+ mutex_lock(&mdev->ucontext_lock);
+ list_del_init(&mana_ucontext->dev_list);
+ mutex_unlock(&mdev->ucontext_lock);
+
ret = mana_gd_destroy_doorbell_page(gc, mana_ucontext->doorbell);
if (ret)
ibdev_dbg(ibdev, "Failed to destroy doorbell page %d\n", ret);
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index a7c8c0fd7019..c7e333d3e9d8 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -83,6 +83,9 @@ struct mana_ib_dev {
struct dma_pool *av_pool;
netdevice_tracker dev_tracker;
struct notifier_block nb;
+ /* Protects ucontext_list */
+ struct mutex ucontext_lock;
+ struct list_head ucontext_list;
};
struct mana_ib_wq {
@@ -197,6 +200,9 @@ struct mana_ib_qp {
struct mana_ib_ucontext {
struct ib_ucontext ibucontext;
u32 doorbell;
+ struct list_head dev_list;
+ /* Protects resource lists below */
+ struct mutex lock;
};
struct mana_ib_rwq_ind_table {
--
2.43.0
* [PATCH rdma-next 2/8] RDMA/mana_ib: Track PD per ucontext
2026-03-07 1:47 [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Long Li
2026-03-07 1:47 ` [PATCH rdma-next 1/8] RDMA/mana_ib: Track ucontext per device Long Li
@ 2026-03-07 1:47 ` Long Li
2026-03-07 1:47 ` [PATCH rdma-next 3/8] RDMA/mana_ib: Track CQ " Long Li
` (6 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Long Li @ 2026-03-07 1:47 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
Add per-ucontext list tracking for PD objects. Each PD is added to
the ucontext's pd_list on creation and removed on destruction. This
enables iterating over all PDs belonging to a ucontext, which will
be needed for service reset cleanup.
Signed-off-by: Long Li <longli@microsoft.com>
---
drivers/infiniband/hw/mana/main.c | 21 +++++++++++++++++++++
drivers/infiniband/hw/mana/mana_ib.h | 2 ++
2 files changed, 23 insertions(+)
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index fc28bdafcfd6..62d89ca06ba1 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -72,6 +72,7 @@ int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
struct ib_device *ibdev = ibpd->device;
struct gdma_create_pd_resp resp = {};
struct gdma_create_pd_req req = {};
+ struct mana_ib_ucontext *mana_ucontext;
enum gdma_pd_flags flags = 0;
struct mana_ib_dev *dev;
struct gdma_context *gc;
@@ -107,6 +108,16 @@ int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
mutex_init(&pd->vport_mutex);
pd->vport_use_count = 0;
+
+ INIT_LIST_HEAD(&pd->ucontext_list);
+ if (udata) {
+ mana_ucontext = rdma_udata_to_drv_context(udata,
+ struct mana_ib_ucontext, ibucontext);
+ mutex_lock(&mana_ucontext->lock);
+ list_add_tail(&pd->ucontext_list, &mana_ucontext->pd_list);
+ mutex_unlock(&mana_ucontext->lock);
+ }
+
return 0;
}
@@ -123,6 +134,15 @@ int mana_ib_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
gc = mdev_to_gc(dev);
+ if (udata) {
+ struct mana_ib_ucontext *mana_ucontext =
+ rdma_udata_to_drv_context(udata,
+ struct mana_ib_ucontext, ibucontext);
+ mutex_lock(&mana_ucontext->lock);
+ list_del_init(&pd->ucontext_list);
+ mutex_unlock(&mana_ucontext->lock);
+ }
+
mana_gd_init_req_hdr(&req.hdr, GDMA_DESTROY_PD, sizeof(req),
sizeof(resp));
@@ -222,6 +242,7 @@ int mana_ib_alloc_ucontext(struct ib_ucontext *ibcontext,
ucontext->doorbell = doorbell_page;
mutex_init(&ucontext->lock);
+ INIT_LIST_HEAD(&ucontext->pd_list);
mutex_lock(&mdev->ucontext_lock);
list_add_tail(&ucontext->dev_list, &mdev->ucontext_list);
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index c7e333d3e9d8..6dba08bccc18 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -107,6 +107,7 @@ struct mana_ib_pd {
bool tx_shortform_allowed;
u32 tx_vp_offset;
+ struct list_head ucontext_list;
};
struct mana_ib_av {
@@ -203,6 +204,7 @@ struct mana_ib_ucontext {
struct list_head dev_list;
/* Protects resource lists below */
struct mutex lock;
+ struct list_head pd_list;
};
struct mana_ib_rwq_ind_table {
--
2.43.0
* [PATCH rdma-next 3/8] RDMA/mana_ib: Track CQ per ucontext
2026-03-07 1:47 [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Long Li
2026-03-07 1:47 ` [PATCH rdma-next 1/8] RDMA/mana_ib: Track ucontext per device Long Li
2026-03-07 1:47 ` [PATCH rdma-next 2/8] RDMA/mana_ib: Track PD per ucontext Long Li
@ 2026-03-07 1:47 ` Long Li
2026-03-07 1:47 ` [PATCH rdma-next 4/8] RDMA/mana_ib: Track WQ " Long Li
` (5 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Long Li @ 2026-03-07 1:47 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
Add per-ucontext list tracking for CQ objects. Each CQ is added to
the ucontext's cq_list on creation and removed on destruction. This
enables iterating over all CQs belonging to a ucontext for service
reset cleanup.
Signed-off-by: Long Li <longli@microsoft.com>
---
drivers/infiniband/hw/mana/cq.c | 19 +++++++++++++++++++
drivers/infiniband/hw/mana/main.c | 1 +
drivers/infiniband/hw/mana/mana_ib.h | 2 ++
3 files changed, 22 insertions(+)
diff --git a/drivers/infiniband/hw/mana/cq.c b/drivers/infiniband/hw/mana/cq.c
index b2749f971cd0..89cf60987ff5 100644
--- a/drivers/infiniband/hw/mana/cq.c
+++ b/drivers/infiniband/hw/mana/cq.c
@@ -95,6 +95,16 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
INIT_LIST_HEAD(&cq->list_send_qp);
INIT_LIST_HEAD(&cq->list_recv_qp);
+ INIT_LIST_HEAD(&cq->ucontext_list);
+ if (udata) {
+ struct mana_ib_ucontext *mana_ucontext =
+ rdma_udata_to_drv_context(udata,
+ struct mana_ib_ucontext, ibucontext);
+ mutex_lock(&mana_ucontext->lock);
+ list_add_tail(&cq->ucontext_list, &mana_ucontext->cq_list);
+ mutex_unlock(&mana_ucontext->lock);
+ }
+
return 0;
err_remove_cq_cb:
@@ -115,6 +125,15 @@ int mana_ib_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+ if (udata) {
+ struct mana_ib_ucontext *mana_ucontext =
+ rdma_udata_to_drv_context(udata,
+ struct mana_ib_ucontext, ibucontext);
+ mutex_lock(&mana_ucontext->lock);
+ list_del_init(&cq->ucontext_list);
+ mutex_unlock(&mana_ucontext->lock);
+ }
+
mana_ib_remove_cq_cb(mdev, cq);
/* Ignore return code as there is not much we can do about it.
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index 62d89ca06ba1..214c1d4e1548 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -243,6 +243,7 @@ int mana_ib_alloc_ucontext(struct ib_ucontext *ibcontext,
mutex_init(&ucontext->lock);
INIT_LIST_HEAD(&ucontext->pd_list);
+ INIT_LIST_HEAD(&ucontext->cq_list);
mutex_lock(&mdev->ucontext_lock);
list_add_tail(&ucontext->dev_list, &mdev->ucontext_list);
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index 6dba08bccc18..8d3edf7ba335 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -150,6 +150,7 @@ struct mana_ib_cq {
int cqe;
u32 comp_vector;
mana_handle_t cq_handle;
+ struct list_head ucontext_list;
};
enum mana_rc_queue_type {
@@ -205,6 +206,7 @@ struct mana_ib_ucontext {
/* Protects resource lists below */
struct mutex lock;
struct list_head pd_list;
+ struct list_head cq_list;
};
struct mana_ib_rwq_ind_table {
--
2.43.0
* [PATCH rdma-next 4/8] RDMA/mana_ib: Track WQ per ucontext
2026-03-07 1:47 [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Long Li
` (2 preceding siblings ...)
2026-03-07 1:47 ` [PATCH rdma-next 3/8] RDMA/mana_ib: Track CQ " Long Li
@ 2026-03-07 1:47 ` Long Li
2026-03-07 1:47 ` [PATCH rdma-next 5/8] RDMA/mana_ib: Track QP " Long Li
` (4 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Long Li @ 2026-03-07 1:47 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
Add per-ucontext list tracking for WQ objects. Each WQ is added to
the ucontext's wq_list on creation and removed on destruction. This
enables iterating over all WQs belonging to a ucontext for service
reset cleanup.
Signed-off-by: Long Li <longli@microsoft.com>
---
drivers/infiniband/hw/mana/main.c | 1 +
drivers/infiniband/hw/mana/mana_ib.h | 2 ++
drivers/infiniband/hw/mana/wq.c | 20 ++++++++++++++++++++
3 files changed, 23 insertions(+)
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index 214c1d4e1548..e6da5c8400f4 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -244,6 +244,7 @@ int mana_ib_alloc_ucontext(struct ib_ucontext *ibcontext,
mutex_init(&ucontext->lock);
INIT_LIST_HEAD(&ucontext->pd_list);
INIT_LIST_HEAD(&ucontext->cq_list);
+ INIT_LIST_HEAD(&ucontext->wq_list);
mutex_lock(&mdev->ucontext_lock);
list_add_tail(&ucontext->dev_list, &mdev->ucontext_list);
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index 8d3edf7ba335..96b5a13470ae 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -94,6 +94,7 @@ struct mana_ib_wq {
int wqe;
u32 wq_buf_size;
mana_handle_t rx_object;
+ struct list_head ucontext_list;
};
struct mana_ib_pd {
@@ -207,6 +208,7 @@ struct mana_ib_ucontext {
struct mutex lock;
struct list_head pd_list;
struct list_head cq_list;
+ struct list_head wq_list;
};
struct mana_ib_rwq_ind_table {
diff --git a/drivers/infiniband/hw/mana/wq.c b/drivers/infiniband/hw/mana/wq.c
index 6206244f762e..1af9869933aa 100644
--- a/drivers/infiniband/hw/mana/wq.c
+++ b/drivers/infiniband/hw/mana/wq.c
@@ -41,6 +41,17 @@ struct ib_wq *mana_ib_create_wq(struct ib_pd *pd,
wq->wqe = init_attr->max_wr;
wq->wq_buf_size = ucmd.wq_buf_size;
wq->rx_object = INVALID_MANA_HANDLE;
+
+ INIT_LIST_HEAD(&wq->ucontext_list);
+ if (udata) {
+ struct mana_ib_ucontext *mana_ucontext =
+ rdma_udata_to_drv_context(udata,
+ struct mana_ib_ucontext, ibucontext);
+ mutex_lock(&mana_ucontext->lock);
+ list_add_tail(&wq->ucontext_list, &mana_ucontext->wq_list);
+ mutex_unlock(&mana_ucontext->lock);
+ }
+
return &wq->ibwq;
err_free_wq:
@@ -64,6 +75,15 @@ int mana_ib_destroy_wq(struct ib_wq *ibwq, struct ib_udata *udata)
mdev = container_of(ib_dev, struct mana_ib_dev, ib_dev);
+ if (udata) {
+ struct mana_ib_ucontext *mana_ucontext =
+ rdma_udata_to_drv_context(udata,
+ struct mana_ib_ucontext, ibucontext);
+ mutex_lock(&mana_ucontext->lock);
+ list_del_init(&wq->ucontext_list);
+ mutex_unlock(&mana_ucontext->lock);
+ }
+
mana_ib_destroy_queue(mdev, &wq->queue);
kfree(wq);
--
2.43.0
* [PATCH rdma-next 5/8] RDMA/mana_ib: Track QP per ucontext
2026-03-07 1:47 [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Long Li
` (3 preceding siblings ...)
2026-03-07 1:47 ` [PATCH rdma-next 4/8] RDMA/mana_ib: Track WQ " Long Li
@ 2026-03-07 1:47 ` Long Li
2026-03-07 1:47 ` [PATCH rdma-next 6/8] RDMA/mana_ib: Track MR " Long Li
` (3 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Long Li @ 2026-03-07 1:47 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
Add per-ucontext list tracking for QP objects. Only RAW_PACKET QPs are
tracked, since they live on the ETH device that persists across reset
events. RC, UD and GSI QPs belong to the RNIC device, which is removed
and re-added during reset, so the IB core tears them down and they do
not need tracking.
Signed-off-by: Long Li <longli@microsoft.com>
---
drivers/infiniband/hw/mana/main.c | 1 +
drivers/infiniband/hw/mana/mana_ib.h | 2 ++
drivers/infiniband/hw/mana/qp.c | 47 ++++++++++++++++++++++------
3 files changed, 40 insertions(+), 10 deletions(-)
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index e6da5c8400f4..c6a859628ba3 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -244,6 +244,7 @@ int mana_ib_alloc_ucontext(struct ib_ucontext *ibcontext,
mutex_init(&ucontext->lock);
INIT_LIST_HEAD(&ucontext->pd_list);
INIT_LIST_HEAD(&ucontext->cq_list);
+ INIT_LIST_HEAD(&ucontext->qp_list);
INIT_LIST_HEAD(&ucontext->wq_list);
mutex_lock(&mdev->ucontext_lock);
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index 96b5a13470ae..9d90fda2c830 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -198,6 +198,7 @@ struct mana_ib_qp {
refcount_t refcount;
struct completion free;
+ struct list_head ucontext_list;
};
struct mana_ib_ucontext {
@@ -208,6 +209,7 @@ struct mana_ib_ucontext {
struct mutex lock;
struct list_head pd_list;
struct list_head cq_list;
+ struct list_head qp_list;
struct list_head wq_list;
};
diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 82f84f7ad37a..315bc54d8ae6 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -700,14 +700,31 @@ static int mana_ib_create_ud_qp(struct ib_qp *ibqp, struct ib_pd *ibpd,
int mana_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
struct ib_udata *udata)
{
+ struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp);
+ int err;
+
+ INIT_LIST_HEAD(&qp->ucontext_list);
+
switch (attr->qp_type) {
case IB_QPT_RAW_PACKET:
/* When rwq_ind_tbl is used, it's for creating WQs for RSS */
if (attr->rwq_ind_tbl)
- return mana_ib_create_qp_rss(ibqp, ibqp->pd, attr,
- udata);
+ err = mana_ib_create_qp_rss(ibqp, ibqp->pd, attr,
+ udata);
+ else
+ err = mana_ib_create_qp_raw(ibqp, ibqp->pd, attr,
+ udata);
+
+ if (!err && udata) {
+ struct mana_ib_ucontext *mana_ucontext =
+ rdma_udata_to_drv_context(udata,
+ struct mana_ib_ucontext, ibucontext);
+ mutex_lock(&mana_ucontext->lock);
+ list_add_tail(&qp->ucontext_list, &mana_ucontext->qp_list);
+ mutex_unlock(&mana_ucontext->lock);
+ }
- return mana_ib_create_qp_raw(ibqp, ibqp->pd, attr, udata);
+ return err;
case IB_QPT_RC:
return mana_ib_create_rc_qp(ibqp, ibqp->pd, attr, udata);
case IB_QPT_UD:
@@ -716,9 +733,8 @@ int mana_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
default:
ibdev_dbg(ibqp->device, "Creating QP type %u not supported\n",
attr->qp_type);
+ return -EINVAL;
}
-
- return -EINVAL;
}
static int mana_ib_gd_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
@@ -898,14 +914,26 @@ static int mana_ib_destroy_ud_qp(struct mana_ib_qp *qp, struct ib_udata *udata)
int mana_ib_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
{
struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp);
+ int ret = -ENOENT;
switch (ibqp->qp_type) {
case IB_QPT_RAW_PACKET:
+ if (udata) {
+ struct mana_ib_ucontext *mana_ucontext =
+ rdma_udata_to_drv_context(udata,
+ struct mana_ib_ucontext, ibucontext);
+ mutex_lock(&mana_ucontext->lock);
+ list_del_init(&qp->ucontext_list);
+ mutex_unlock(&mana_ucontext->lock);
+ }
+
if (ibqp->rwq_ind_tbl)
- return mana_ib_destroy_qp_rss(qp, ibqp->rwq_ind_tbl,
- udata);
+ ret = mana_ib_destroy_qp_rss(qp, ibqp->rwq_ind_tbl,
+ udata);
+ else
+ ret = mana_ib_destroy_qp_raw(qp, udata);
- return mana_ib_destroy_qp_raw(qp, udata);
+ return ret;
case IB_QPT_RC:
return mana_ib_destroy_rc_qp(qp, udata);
case IB_QPT_UD:
@@ -914,7 +942,6 @@ int mana_ib_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
default:
ibdev_dbg(ibqp->device, "Unexpected QP type %u\n",
ibqp->qp_type);
+ return ret;
}
-
- return -ENOENT;
}
--
2.43.0
* [PATCH rdma-next 6/8] RDMA/mana_ib: Track MR per ucontext
2026-03-07 1:47 [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Long Li
` (4 preceding siblings ...)
2026-03-07 1:47 ` [PATCH rdma-next 5/8] RDMA/mana_ib: Track QP " Long Li
@ 2026-03-07 1:47 ` Long Li
2026-03-07 1:47 ` [PATCH rdma-next 7/8] RDMA/mana_ib: Notify service reset events to RDMA devices Long Li
` (2 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Long Li @ 2026-03-07 1:47 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
Add per-ucontext list tracking for MR objects. Each MR is added to
the ucontext's mr_list on creation and removed on destruction. This
enables iterating over all MRs belonging to a ucontext for service
reset cleanup.
Also export mana_ib_gd_destroy_mr() for use by reset cleanup code.
Signed-off-by: Long Li <longli@microsoft.com>
---
drivers/infiniband/hw/mana/main.c | 1 +
drivers/infiniband/hw/mana/mana_ib.h | 3 +++
drivers/infiniband/hw/mana/mr.c | 21 ++++++++++++++++++++-
3 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index c6a859628ba3..f739e6da5435 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -243,6 +243,7 @@ int mana_ib_alloc_ucontext(struct ib_ucontext *ibcontext,
mutex_init(&ucontext->lock);
INIT_LIST_HEAD(&ucontext->pd_list);
+ INIT_LIST_HEAD(&ucontext->mr_list);
INIT_LIST_HEAD(&ucontext->cq_list);
INIT_LIST_HEAD(&ucontext->qp_list);
INIT_LIST_HEAD(&ucontext->wq_list);
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index 9d90fda2c830..ce5c6c030fb2 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -134,6 +134,7 @@ struct mana_ib_mr {
struct ib_mr ibmr;
struct ib_umem *umem;
mana_handle_t mr_handle;
+ struct list_head ucontext_list;
};
struct mana_ib_dm {
@@ -208,6 +209,7 @@ struct mana_ib_ucontext {
/* Protects resource lists below */
struct mutex lock;
struct list_head pd_list;
+ struct list_head mr_list;
struct list_head cq_list;
struct list_head qp_list;
struct list_head wq_list;
@@ -665,6 +667,7 @@ struct ib_mr *mana_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
struct ib_udata *udata);
int mana_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata);
+int mana_ib_gd_destroy_mr(struct mana_ib_dev *dev, u64 mr_handle);
int mana_ib_create_qp(struct ib_qp *qp, struct ib_qp_init_attr *qp_init_attr,
struct ib_udata *udata);
diff --git a/drivers/infiniband/hw/mana/mr.c b/drivers/infiniband/hw/mana/mr.c
index 9613b225dad4..559bb4f7c31d 100644
--- a/drivers/infiniband/hw/mana/mr.c
+++ b/drivers/infiniband/hw/mana/mr.c
@@ -87,7 +87,7 @@ static int mana_ib_gd_create_mr(struct mana_ib_dev *dev, struct mana_ib_mr *mr,
return 0;
}
-static int mana_ib_gd_destroy_mr(struct mana_ib_dev *dev, u64 mr_handle)
+int mana_ib_gd_destroy_mr(struct mana_ib_dev *dev, u64 mr_handle)
{
struct gdma_destroy_mr_response resp = {};
struct gdma_destroy_mr_request req = {};
@@ -185,6 +185,16 @@ struct ib_mr *mana_ib_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 length,
* as part of the lifecycle of this MR.
*/
+ INIT_LIST_HEAD(&mr->ucontext_list);
+ if (udata) {
+ struct mana_ib_ucontext *mana_ucontext =
+ rdma_udata_to_drv_context(udata,
+ struct mana_ib_ucontext, ibucontext);
+ mutex_lock(&mana_ucontext->lock);
+ list_add_tail(&mr->ucontext_list, &mana_ucontext->mr_list);
+ mutex_unlock(&mana_ucontext->lock);
+ }
+
return &mr->ibmr;
err_dma_region:
@@ -313,6 +323,15 @@ int mana_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+ if (udata) {
+ struct mana_ib_ucontext *mana_ucontext =
+ rdma_udata_to_drv_context(udata,
+ struct mana_ib_ucontext, ibucontext);
+ mutex_lock(&mana_ucontext->lock);
+ list_del_init(&mr->ucontext_list);
+ mutex_unlock(&mana_ucontext->lock);
+ }
+
err = mana_ib_gd_destroy_mr(dev, mr->mr_handle);
if (err)
return err;
--
2.43.0
* [PATCH rdma-next 7/8] RDMA/mana_ib: Notify service reset events to RDMA devices
2026-03-07 1:47 [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Long Li
` (5 preceding siblings ...)
2026-03-07 1:47 ` [PATCH rdma-next 6/8] RDMA/mana_ib: Track MR " Long Li
@ 2026-03-07 1:47 ` Long Li
2026-03-07 1:47 ` [PATCH rdma-next 8/8] RDMA/mana_ib: Skip firmware commands for invalidated handles Long Li
2026-03-07 17:38 ` [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Leon Romanovsky
8 siblings, 0 replies; 15+ messages in thread
From: Long Li @ 2026-03-07 1:47 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
Register reset_notify and resume_notify callbacks so the RDMA driver
is informed when the MANA service undergoes a reset cycle.
On reset notification:
- Take the reset_rwsem write lock to serialize with resource creation
- Walk every tracked ucontext and invalidate the firmware handles of
all PD, CQ, WQ, QP and MR resources (set them to INVALID_MANA_HANDLE)
- Revoke user doorbell mappings via rdma_user_mmap_disassociate()
- Dispatch IB_EVENT_PORT_ERR for each port so userspace (e.g. DPDK)
learns about the reset
On resume notification:
- Dispatch IB_EVENT_PORT_ACTIVE for each port so userspace knows the
device is usable again
Resource creation paths (alloc_pd, create_cq, create_wq, create_qp for
RAW_PACKET, reg_user_mr) acquire reset_rwsem read lock to ensure handles
are not invalidated while being set up.
Signed-off-by: Long Li <longli@microsoft.com>
---
drivers/infiniband/hw/mana/cq.c | 15 ++-
drivers/infiniband/hw/mana/device.c | 103 ++++++++++++++++++
drivers/infiniband/hw/mana/main.c | 9 ++
drivers/infiniband/hw/mana/mana_ib.h | 2 +
drivers/infiniband/hw/mana/mr.c | 4 +
drivers/infiniband/hw/mana/qp.c | 5 +
drivers/infiniband/hw/mana/wq.c | 4 +
drivers/net/ethernet/microsoft/mana/mana_en.c | 14 ++-
include/net/mana/gdma.h | 6 +
9 files changed, 155 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/hw/mana/cq.c b/drivers/infiniband/hw/mana/cq.c
index 89cf60987ff5..b054684b8de7 100644
--- a/drivers/infiniband/hw/mana/cq.c
+++ b/drivers/infiniband/hw/mana/cq.c
@@ -41,13 +41,17 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
ibdev_dbg(ibdev, "CQE %d exceeding limit\n", attr->cqe);
return -EINVAL;
}
+ }
+
+ down_read(&mdev->reset_rwsem);
+ if (udata) {
cq->cqe = attr->cqe;
err = mana_ib_create_queue(mdev, ucmd.buf_addr, cq->cqe * COMP_ENTRY_SIZE,
&cq->queue);
if (err) {
ibdev_dbg(ibdev, "Failed to create queue for create cq, %d\n", err);
- return err;
+ goto err_unlock;
}
mana_ucontext = rdma_udata_to_drv_context(udata, struct mana_ib_ucontext,
@@ -56,14 +60,15 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
} else {
if (attr->cqe > U32_MAX / COMP_ENTRY_SIZE / 2 + 1) {
ibdev_dbg(ibdev, "CQE %d exceeding limit\n", attr->cqe);
- return -EINVAL;
+ err = -EINVAL;
+ goto err_unlock;
}
buf_size = MANA_PAGE_ALIGN(roundup_pow_of_two(attr->cqe * COMP_ENTRY_SIZE));
cq->cqe = buf_size / COMP_ENTRY_SIZE;
err = mana_ib_create_kernel_queue(mdev, buf_size, GDMA_CQ, &cq->queue);
if (err) {
ibdev_dbg(ibdev, "Failed to create kernel queue for create cq, %d\n", err);
- return err;
+ goto err_unlock;
}
doorbell = mdev->gdma_dev->doorbell;
}
@@ -105,6 +110,7 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
mutex_unlock(&mana_ucontext->lock);
}
+ up_read(&mdev->reset_rwsem);
return 0;
err_remove_cq_cb:
@@ -113,7 +119,8 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
mana_ib_gd_destroy_cq(mdev, cq);
err_destroy_queue:
mana_ib_destroy_queue(mdev, &cq->queue);
-
+err_unlock:
+ up_read(&mdev->reset_rwsem);
return err;
}
diff --git a/drivers/infiniband/hw/mana/device.c b/drivers/infiniband/hw/mana/device.c
index 149e8d4d5b8e..081be31563ca 100644
--- a/drivers/infiniband/hw/mana/device.c
+++ b/drivers/infiniband/hw/mana/device.c
@@ -103,6 +103,7 @@ static int mana_ib_netdev_event(struct notifier_block *this,
netdev_put(ndev, &dev->dev_tracker);
return NOTIFY_OK;
+
default:
return NOTIFY_DONE;
}
@@ -110,6 +111,93 @@ static int mana_ib_netdev_event(struct notifier_block *this,
return NOTIFY_DONE;
}
+/*
+ * Reset cleanup: invalidate firmware handles for all tracked user objects.
+ *
+ * Called during service reset BEFORE dispatching IB_EVENT_PORT_ERR to
+ * user-mode.
+ *
+ * Only invalidates FW handles — does NOT free kernel resources (umem, queues)
+ * or remove objects from lists. The IB core's destroy callbacks handle full
+ * resource teardown when user-space closes the uverbs FD or ib_unregister_device
+ * is called. The destroy callbacks skip FW commands when the handle is already
+ * INVALID_MANA_HANDLE.
+ *
+ * For CQs, also removes the CQ callback to prevent stale completions.
+ */
+static void mana_ib_reset_notify(void *ctx)
+{
+ struct mana_ib_dev *mdev = ctx;
+ struct mana_ib_ucontext *uctx;
+ struct mana_ib_qp *qp;
+ struct mana_ib_wq *wq;
+ struct mana_ib_cq *cq;
+ struct mana_ib_mr *mr;
+ struct mana_ib_pd *pd;
+ struct ib_event ibev;
+ int i;
+
+ down_write(&mdev->reset_rwsem);
+
+ ibdev_dbg(&mdev->ib_dev, "reset cleanup starting\n");
+
+ mutex_lock(&mdev->ucontext_lock);
+ list_for_each_entry(uctx, &mdev->ucontext_list, dev_list) {
+ mutex_lock(&uctx->lock);
+
+ list_for_each_entry(qp, &uctx->qp_list, ucontext_list)
+ qp->qp_handle = INVALID_MANA_HANDLE;
+
+ list_for_each_entry(wq, &uctx->wq_list, ucontext_list)
+ wq->rx_object = INVALID_MANA_HANDLE;
+
+ list_for_each_entry(cq, &uctx->cq_list, ucontext_list) {
+ mana_ib_remove_cq_cb(mdev, cq);
+ cq->cq_handle = INVALID_MANA_HANDLE;
+ }
+
+ list_for_each_entry(mr, &uctx->mr_list, ucontext_list)
+ mr->mr_handle = INVALID_MANA_HANDLE;
+
+ list_for_each_entry(pd, &uctx->pd_list, ucontext_list)
+ pd->pd_handle = INVALID_MANA_HANDLE;
+
+ uctx->doorbell = INVALID_DOORBELL;
+
+ mutex_unlock(&uctx->lock);
+ }
+ mutex_unlock(&mdev->ucontext_lock);
+
+ up_write(&mdev->reset_rwsem);
+
+ /* Revoke user doorbell mappings so userspace cannot ring
+ * stale doorbells after firmware handles are invalidated.
+ */
+ rdma_user_mmap_disassociate(&mdev->ib_dev);
+
+ /* Notify userspace (e.g. DPDK) that the port is down */
+ for (i = 0; i < mdev->ib_dev.phys_port_cnt; i++) {
+ ibev.device = &mdev->ib_dev;
+ ibev.element.port_num = i + 1;
+ ibev.event = IB_EVENT_PORT_ERR;
+ ib_dispatch_event(&ibev);
+ }
+}
+
+static void mana_ib_resume_notify(void *ctx)
+{
+ struct mana_ib_dev *dev = ctx;
+ struct ib_event ibev;
+ int i;
+
+ for (i = 0; i < dev->ib_dev.phys_port_cnt; i++) {
+ ibev.device = &dev->ib_dev;
+ ibev.element.port_num = i + 1;
+ ibev.event = IB_EVENT_PORT_ACTIVE;
+ ib_dispatch_event(&ibev);
+ }
+}
+
static int mana_ib_probe(struct auxiliary_device *adev,
const struct auxiliary_device_id *id)
{
@@ -134,6 +222,7 @@ static int mana_ib_probe(struct auxiliary_device *adev,
xa_init_flags(&dev->qp_table_wq, XA_FLAGS_LOCK_IRQ);
mutex_init(&dev->ucontext_lock);
INIT_LIST_HEAD(&dev->ucontext_list);
+ init_rwsem(&dev->reset_rwsem);
if (mana_ib_is_rnic(dev)) {
dev->ib_dev.phys_port_cnt = 1;
@@ -216,6 +305,15 @@ static int mana_ib_probe(struct auxiliary_device *adev,
dev_set_drvdata(&adev->dev, dev);
+ /* ETH device persists across reset — use callback for cleanup.
+ * RNIC device is removed/re-added, so its cleanup happens in remove.
+ */
+ if (!mana_ib_is_rnic(dev)) {
+ mdev->reset_notify = mana_ib_reset_notify;
+ mdev->resume_notify = mana_ib_resume_notify;
+ mdev->reset_notify_ctx = dev;
+ }
+
return 0;
deallocate_pool:
@@ -242,6 +340,11 @@ static void mana_ib_remove(struct auxiliary_device *adev)
if (mana_ib_is_rnic(dev))
mana_drain_gsi_sqs(dev);
+ if (!mana_ib_is_rnic(dev)) {
+ dev->gdma_dev->reset_notify = NULL;
+ dev->gdma_dev->resume_notify = NULL;
+ dev->gdma_dev->reset_notify_ctx = NULL;
+ }
ib_unregister_device(&dev->ib_dev);
dma_pool_destroy(dev->av_pool);
if (mana_ib_is_rnic(dev)) {
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index f739e6da5435..61ce30aa9cb2 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -81,6 +81,8 @@ int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
gc = mdev_to_gc(dev);
+ down_read(&dev->reset_rwsem);
+
mana_gd_init_req_hdr(&req.hdr, GDMA_CREATE_PD, sizeof(req),
sizeof(resp));
@@ -98,6 +100,7 @@ int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
if (!err)
err = -EPROTO;
+ up_read(&dev->reset_rwsem);
return err;
}
@@ -118,6 +121,7 @@ int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
mutex_unlock(&mana_ucontext->lock);
}
+ up_read(&dev->reset_rwsem);
return 0;
}
@@ -230,10 +234,13 @@ int mana_ib_alloc_ucontext(struct ib_ucontext *ibcontext,
mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
gc = mdev_to_gc(mdev);
+ down_read(&mdev->reset_rwsem);
+
/* Allocate a doorbell page index */
ret = mana_gd_allocate_doorbell_page(gc, &doorbell_page);
if (ret) {
ibdev_dbg(ibdev, "Failed to allocate doorbell page %d\n", ret);
+ up_read(&mdev->reset_rwsem);
return ret;
}
@@ -252,6 +259,8 @@ int mana_ib_alloc_ucontext(struct ib_ucontext *ibcontext,
list_add_tail(&ucontext->dev_list, &mdev->ucontext_list);
mutex_unlock(&mdev->ucontext_lock);
+ up_read(&mdev->reset_rwsem);
+
return 0;
}
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index ce5c6c030fb2..29201cf3274c 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -86,6 +86,8 @@ struct mana_ib_dev {
/* Protects ucontext_list */
struct mutex ucontext_lock;
struct list_head ucontext_list;
+ /* Serializes resource create callbacks vs reset cleanup */
+ struct rw_semaphore reset_rwsem;
};
struct mana_ib_wq {
diff --git a/drivers/infiniband/hw/mana/mr.c b/drivers/infiniband/hw/mana/mr.c
index 559bb4f7c31d..7189ccd41576 100644
--- a/drivers/infiniband/hw/mana/mr.c
+++ b/drivers/infiniband/hw/mana/mr.c
@@ -141,6 +141,8 @@ struct ib_mr *mana_ib_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 length,
if (!mr)
return ERR_PTR(-ENOMEM);
+ down_read(&dev->reset_rwsem);
+
mr->umem = ib_umem_get(ibdev, start, length, access_flags);
if (IS_ERR(mr->umem)) {
err = PTR_ERR(mr->umem);
@@ -195,6 +197,7 @@ struct ib_mr *mana_ib_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 length,
mutex_unlock(&mana_ucontext->lock);
}
+ up_read(&dev->reset_rwsem);
return &mr->ibmr;
err_dma_region:
@@ -204,6 +207,7 @@ struct ib_mr *mana_ib_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 length,
ib_umem_release(mr->umem);
err_free:
+ up_read(&dev->reset_rwsem);
kfree(mr);
return ERR_PTR(err);
}
diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 315bc54d8ae6..d590aca9b93a 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -701,12 +701,16 @@ int mana_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
struct ib_udata *udata)
{
struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp);
+ struct mana_ib_dev *mdev =
+ container_of(ibqp->device, struct mana_ib_dev, ib_dev);
int err;
INIT_LIST_HEAD(&qp->ucontext_list);
switch (attr->qp_type) {
case IB_QPT_RAW_PACKET:
+ down_read(&mdev->reset_rwsem);
+
/* When rwq_ind_tbl is used, it's for creating WQs for RSS */
if (attr->rwq_ind_tbl)
err = mana_ib_create_qp_rss(ibqp, ibqp->pd, attr,
@@ -724,6 +728,7 @@ int mana_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
mutex_unlock(&mana_ucontext->lock);
}
+ up_read(&mdev->reset_rwsem);
return err;
case IB_QPT_RC:
return mana_ib_create_rc_qp(ibqp, ibqp->pd, attr, udata);
diff --git a/drivers/infiniband/hw/mana/wq.c b/drivers/infiniband/hw/mana/wq.c
index 1af9869933aa..67b757cf30f9 100644
--- a/drivers/infiniband/hw/mana/wq.c
+++ b/drivers/infiniband/hw/mana/wq.c
@@ -31,6 +31,8 @@ struct ib_wq *mana_ib_create_wq(struct ib_pd *pd,
ibdev_dbg(&mdev->ib_dev, "ucmd wq_buf_addr 0x%llx\n", ucmd.wq_buf_addr);
+ down_read(&mdev->reset_rwsem);
+
err = mana_ib_create_queue(mdev, ucmd.wq_buf_addr, ucmd.wq_buf_size, &wq->queue);
if (err) {
ibdev_dbg(&mdev->ib_dev,
@@ -52,9 +54,11 @@ struct ib_wq *mana_ib_create_wq(struct ib_pd *pd,
mutex_unlock(&mana_ucontext->lock);
}
+ up_read(&mdev->reset_rwsem);
return &wq->ibwq;
err_free_wq:
+ up_read(&mdev->reset_rwsem);
kfree(wq);
return ERR_PTR(err);
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index ea71de39f996..3493b36426f7 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3659,15 +3659,19 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
}
}
- err = add_adev(gd, "eth");
+ if (!resuming)
+ err = add_adev(gd, "eth");
INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler);
schedule_delayed_work(&ac->gf_stats_work, MANA_GF_STATS_PERIOD);
-
out:
if (err) {
mana_remove(gd, false);
} else {
+ /* Notify IB layer that ports are back up after reset */
+ if (resuming && gd->resume_notify)
+ gd->resume_notify(gd->reset_notify_ctx);
+
dev_dbg(dev, "gd=%p, id=%u, num_ports=%d, type=%u, instance=%u\n",
gd, gd->dev_id.as_uint32, ac->num_ports,
gd->dev_id.type, gd->dev_id.instance);
@@ -3691,9 +3695,13 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
cancel_delayed_work_sync(&ac->gf_stats_work);
/* adev currently doesn't support suspending, always remove it */
- if (gd->adev)
+ if (gd->adev && !suspending)
remove_adev(gd);
+ /* Notify IB layer before tearing down net devices during reset */
+ if (suspending && gd->reset_notify)
+ gd->reset_notify(gd->reset_notify_ctx);
+
for (i = 0; i < ac->num_ports; i++) {
ndev = ac->ports[i];
if (!ndev) {
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index ec17004b10c0..9187c5b4d0d1 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -249,6 +249,12 @@ struct gdma_dev {
struct auxiliary_device *adev;
bool is_suspended;
bool rdma_teardown;
+
+ /* Called by mana_remove() during reset to notify IB layer */
+ void (*reset_notify)(void *ctx);
+ /* Called by mana_probe() during resume to notify IB layer */
+ void (*resume_notify)(void *ctx);
+ void *reset_notify_ctx;
};
/* MANA_PAGE_SIZE is the DMA unit */
--
2.43.0
* [PATCH rdma-next 8/8] RDMA/mana_ib: Skip firmware commands for invalidated handles
2026-03-07 1:47 [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Long Li
` (6 preceding siblings ...)
2026-03-07 1:47 ` [PATCH rdma-next 7/8] RDMA/mana_ib: Notify service reset events to RDMA devices Long Li
@ 2026-03-07 1:47 ` Long Li
2026-03-07 17:38 ` [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Leon Romanovsky
8 siblings, 0 replies; 15+ messages in thread
From: Long Li @ 2026-03-07 1:47 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
After a service reset, firmware handles for PD, CQ, WQ, QP, and MR
are set to INVALID_MANA_HANDLE by the reset notification path.
Check for INVALID_MANA_HANDLE in each destroy callback before issuing
firmware destroy commands. When a handle is invalid, skip the firmware
call and proceed directly to kernel resource cleanup (umem, queues,
memory). This avoids sending stale handles to firmware after reset.
Affected callbacks:
- mana_ib_dealloc_pd: skip mana_ib_gd_destroy_pd
- mana_ib_destroy_cq: skip mana_ib_gd_destroy_cq and queue destroy
- mana_ib_destroy_wq: skip mana_ib_destroy_queue
- mana_ib_destroy_qp_rss: skip mana_destroy_wq_obj per WQ
- mana_ib_destroy_qp_raw: skip mana_destroy_wq_obj
- mana_ib_dereg_mr: skip mana_ib_gd_destroy_mr
Signed-off-by: Long Li <longli@microsoft.com>
---
drivers/infiniband/hw/mana/cq.c | 10 ++++++----
drivers/infiniband/hw/mana/main.c | 12 +++++++++---
drivers/infiniband/hw/mana/mr.c | 8 +++++---
drivers/infiniband/hw/mana/qp.c | 9 ++++++---
4 files changed, 26 insertions(+), 13 deletions(-)
diff --git a/drivers/infiniband/hw/mana/cq.c b/drivers/infiniband/hw/mana/cq.c
index b054684b8de7..315301bccb97 100644
--- a/drivers/infiniband/hw/mana/cq.c
+++ b/drivers/infiniband/hw/mana/cq.c
@@ -143,10 +143,12 @@ int mana_ib_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
mana_ib_remove_cq_cb(mdev, cq);
- /* Ignore return code as there is not much we can do about it.
- * The error message is printed inside.
- */
- mana_ib_gd_destroy_cq(mdev, cq);
+ if (cq->cq_handle != INVALID_MANA_HANDLE) {
+ /* Ignore return code as there is not much we can do about it.
+ * The error message is printed inside.
+ */
+ mana_ib_gd_destroy_cq(mdev, cq);
+ }
mana_ib_destroy_queue(mdev, &cq->queue);
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index 61ce30aa9cb2..d60205184dba 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -147,6 +147,9 @@ int mana_ib_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
mutex_unlock(&mana_ucontext->lock);
}
+ if (pd->pd_handle == INVALID_MANA_HANDLE)
+ return 0;
+
mana_gd_init_req_hdr(&req.hdr, GDMA_DESTROY_PD, sizeof(req),
sizeof(resp));
@@ -280,9 +283,12 @@ void mana_ib_dealloc_ucontext(struct ib_ucontext *ibcontext)
list_del_init(&mana_ucontext->dev_list);
mutex_unlock(&mdev->ucontext_lock);
- ret = mana_gd_destroy_doorbell_page(gc, mana_ucontext->doorbell);
- if (ret)
- ibdev_dbg(ibdev, "Failed to destroy doorbell page %d\n", ret);
+ if (mana_ucontext->doorbell != INVALID_DOORBELL) {
+ ret = mana_gd_destroy_doorbell_page(gc, mana_ucontext->doorbell);
+ if (ret)
+ ibdev_dbg(ibdev, "Failed to destroy doorbell page %d\n",
+ ret);
+ }
}
int mana_ib_create_kernel_queue(struct mana_ib_dev *mdev, u32 size, enum gdma_queue_type type,
diff --git a/drivers/infiniband/hw/mana/mr.c b/drivers/infiniband/hw/mana/mr.c
index 7189ccd41576..75bc2a9c366a 100644
--- a/drivers/infiniband/hw/mana/mr.c
+++ b/drivers/infiniband/hw/mana/mr.c
@@ -336,9 +336,11 @@ int mana_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
mutex_unlock(&mana_ucontext->lock);
}
- err = mana_ib_gd_destroy_mr(dev, mr->mr_handle);
- if (err)
- return err;
+ if (mr->mr_handle != INVALID_MANA_HANDLE) {
+ err = mana_ib_gd_destroy_mr(dev, mr->mr_handle);
+ if (err)
+ return err;
+ }
if (mr->umem)
ib_umem_release(mr->umem);
diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index d590aca9b93a..76d59addb645 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -846,9 +846,11 @@ static int mana_ib_destroy_qp_rss(struct mana_ib_qp *qp,
for (i = 0; i < (1 << ind_tbl->log_ind_tbl_size); i++) {
ibwq = ind_tbl->ind_tbl[i];
wq = container_of(ibwq, struct mana_ib_wq, ibwq);
- ibdev_dbg(&mdev->ib_dev, "destroying wq->rx_object %llu\n",
+ ibdev_dbg(&mdev->ib_dev,
+ "destroying wq->rx_object %llu\n",
wq->rx_object);
- mana_destroy_wq_obj(mpc, GDMA_RQ, wq->rx_object);
+ if (wq->rx_object != INVALID_MANA_HANDLE)
+ mana_destroy_wq_obj(mpc, GDMA_RQ, wq->rx_object);
}
return 0;
@@ -867,7 +869,8 @@ static int mana_ib_destroy_qp_raw(struct mana_ib_qp *qp, struct ib_udata *udata)
mpc = netdev_priv(ndev);
pd = container_of(ibpd, struct mana_ib_pd, ibpd);
- mana_destroy_wq_obj(mpc, GDMA_SQ, qp->qp_handle);
+ if (qp->qp_handle != INVALID_MANA_HANDLE)
+ mana_destroy_wq_obj(mpc, GDMA_SQ, qp->qp_handle);
mana_ib_destroy_queue(mdev, &qp->raw_sq);
--
2.43.0
* Re: [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources
2026-03-07 1:47 [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Long Li
` (7 preceding siblings ...)
2026-03-07 1:47 ` [PATCH rdma-next 8/8] RDMA/mana_ib: Skip firmware commands for invalidated handles Long Li
@ 2026-03-07 17:38 ` Leon Romanovsky
2026-03-13 16:59 ` Jason Gunthorpe
8 siblings, 1 reply; 15+ messages in thread
From: Leon Romanovsky @ 2026-03-07 17:38 UTC (permalink / raw)
To: Long Li
Cc: Konstantin Taranov, Jakub Kicinski, David S . Miller, Paolo Abeni,
Eric Dumazet, Andrew Lunn, Jason Gunthorpe, Haiyang Zhang,
K . Y . Srinivasan, Wei Liu, Dexuan Cui, Simon Horman, netdev,
linux-rdma, linux-hyperv, linux-kernel
On Fri, Mar 06, 2026 at 05:47:14PM -0800, Long Li wrote:
> When the MANA hardware undergoes a service reset, the ETH auxiliary device
> (mana.eth) used by DPDK persists across the reset cycle — it is not removed
> and re-added like RC/UD/GSI QPs. This means userspace RDMA consumers such
> as DPDK have no way of knowing that firmware handles for their PD, CQ, WQ,
> QP and MR resources have become stale.
NAK to any of this.
In case of hardware reset, mana_ib AUX device needs to be destroyed and
recreated later.
The same is applicable for mana.eth as well.
Thanks
* Re: [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources
2026-03-07 17:38 ` [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Leon Romanovsky
@ 2026-03-13 16:59 ` Jason Gunthorpe
2026-03-16 20:08 ` Leon Romanovsky
0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2026-03-13 16:59 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Haiyang Zhang,
K . Y . Srinivasan, Wei Liu, Dexuan Cui, Simon Horman, netdev,
linux-rdma, linux-hyperv, linux-kernel
On Sat, Mar 07, 2026 at 07:38:14PM +0200, Leon Romanovsky wrote:
> On Fri, Mar 06, 2026 at 05:47:14PM -0800, Long Li wrote:
> > When the MANA hardware undergoes a service reset, the ETH auxiliary device
> > (mana.eth) used by DPDK persists across the reset cycle — it is not removed
> > and re-added like RC/UD/GSI QPs. This means userspace RDMA consumers such
> > as DPDK have no way of knowing that firmware handles for their PD, CQ, WQ,
> > QP and MR resources have become stale.
>
> NAK to any of this.
>
> In case of hardware reset, mana_ib AUX device needs to be destroyed and
> recreated later.
Yeah, that is our general model for any serious RAS event where the
driver's view of resources becomes out of sync with the HW.
You have to tear down the ib_device by removing the aux and then bring
back a new one.
There is an IB_EVENT_DEVICE_FATAL, but the purpose of that event is to
tell userspace to close and re-open their uverbs FD.
We don't have a model where a uverbs FD in userspace can continue to
work after the device has a catastrophic RAS event.
There may be room to have a model where the ib device doesn't fully
unplug/replug so it retains its name and things, but that is core code
not driver stuff.
Jason
* Re: [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources
2026-03-13 16:59 ` Jason Gunthorpe
@ 2026-03-16 20:08 ` Leon Romanovsky
2026-03-17 23:43 ` [EXTERNAL] " Long Li
0 siblings, 1 reply; 15+ messages in thread
From: Leon Romanovsky @ 2026-03-16 20:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Haiyang Zhang,
K . Y . Srinivasan, Wei Liu, Dexuan Cui, Simon Horman, netdev,
linux-rdma, linux-hyperv, linux-kernel
On Fri, Mar 13, 2026 at 01:59:28PM -0300, Jason Gunthorpe wrote:
> On Sat, Mar 07, 2026 at 07:38:14PM +0200, Leon Romanovsky wrote:
> > On Fri, Mar 06, 2026 at 05:47:14PM -0800, Long Li wrote:
> > > When the MANA hardware undergoes a service reset, the ETH auxiliary device
> > > (mana.eth) used by DPDK persists across the reset cycle — it is not removed
> > > and re-added like RC/UD/GSI QPs. This means userspace RDMA consumers such
> > > as DPDK have no way of knowing that firmware handles for their PD, CQ, WQ,
> > > QP and MR resources have become stale.
> >
> > NAK to any of this.
> >
> > In case of hardware reset, mana_ib AUX device needs to be destroyed and
> > recreated later.
>
> Yeah, that is our general model for any serious RAS event where the
> driver's view of resources becomes out of sync with the HW.
>
> You have to tear down the ib_device by removing the aux and then bring
> back a new one.
>
> There is an IB_EVENT_DEVICE_FATAL, but the purpose of that event is to
> tell userspace to close and re-open their uverbs FD.
>
> We don't have a model where a uverbs FD in userspace can continue to
> work after the device has a catastrophic RAS event.
>
> There may be room to have a model where the ib device doesn't fully
> unplug/replug so it retains its name and things, but that is core code
> not driver stuff.
Good luck with that model. It is going to break RDMA-CM hotplug support.
Thanks
>
> Jason
>
* RE: [EXTERNAL] Re: [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources
2026-03-16 20:08 ` Leon Romanovsky
@ 2026-03-17 23:43 ` Long Li
2026-03-18 14:49 ` Leon Romanovsky
0 siblings, 1 reply; 15+ messages in thread
From: Long Li @ 2026-03-17 23:43 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe
Cc: Konstantin Taranov, Jakub Kicinski, David S . Miller, Paolo Abeni,
Eric Dumazet, Andrew Lunn, Haiyang Zhang, KY Srinivasan, Wei Liu,
Dexuan Cui, Simon Horman, netdev@vger.kernel.org,
linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org
>
> On Fri, Mar 13, 2026 at 01:59:28PM -0300, Jason Gunthorpe wrote:
> > On Sat, Mar 07, 2026 at 07:38:14PM +0200, Leon Romanovsky wrote:
> > > On Fri, Mar 06, 2026 at 05:47:14PM -0800, Long Li wrote:
> > > > When the MANA hardware undergoes a service reset, the ETH
> > > > auxiliary device
> > > > (mana.eth) used by DPDK persists across the reset cycle — it is
> > > > not removed and re-added like RC/UD/GSI QPs. This means userspace
> > > > RDMA consumers such as DPDK have no way of knowing that firmware
> > > > handles for their PD, CQ, WQ, QP and MR resources have become stale.
> > >
> > > NAK to any of this.
> > >
> > > In case of hardware reset, mana_ib AUX device needs to be destroyed
> > > and recreated later.
> >
> > Yeah, that is our general model for any serious RAS event where the
> > driver's view of resources becomes out of sync with the HW.
> >
> > You have to tear down the ib_device by removing the aux and then bring
> > back a new one.
> >
> > There is an IB_EVENT_DEVICE_FATAL, but the purpose of that event is to
> > tell userspace to close and re-open their uverbs FD.
> >
> > We don't have a model where a uverbs FD in userspace can continue to
> > work after the device has a catastrophic RAS event.
> >
> > There may be room to have a model where the ib device doesn't fully
> > unplug/replug so it retains its name and things, but that is core code
> > not driver stuff.
>
> Good luck with that model. It is going to break RDMA-CM hotplug support.
>
I think we can preserve RDMA-CM behavior without requiring ib_device
unregister/re-register.
On device reset, the driver can dispatch IB_EVENT_DEVICE_FATAL (or a
new reset event) through ib_dispatch_event(). RDMA-CM already handles
device events — we would add a handler that iterates all rdma_cm_ids
on the device and sends RDMA_CM_EVENT_DEVICE_REMOVAL to each, same
as cma_process_remove() does today. The difference: cma_device stays
alive, so applications can reconnect on the same device after recovery
instead of waiting for a new one to appear.
The motivation for keeping ib_device alive is that some RDMA consumers
— DPDK and NCCL — don't use RDMA-CM at all. They use raw verbs and
manage QP state themselves. For these users, a persistent ib_device
with IB_EVENT_PORT_ERR / IB_EVENT_PORT_ACTIVE notifications enables
reliable in-place recovery without reopening the device.
This matters especially for PCI DPC recovery, which is becoming
critical for large-scale GPU/storage deployments. See this talk for
context on the value of surviving DPC events:
https://www.youtube.com/watch?v=TpNNeMGEsdU&t=1619s
Today a DPC event on one NIC kills all RDMA connections and can
crash entire training jobs. If the ib_device persists and the driver
recreates firmware resources after recovery, raw verbs users can
resume without full teardown, and RDMA-CM users get the same
disconnect/reconnect behavior they have today.
Thanks,
Long
* Re: [EXTERNAL] Re: [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources
2026-03-17 23:43 ` [EXTERNAL] " Long Li
@ 2026-03-18 14:49 ` Leon Romanovsky
2026-03-21 0:49 ` Long Li
0 siblings, 1 reply; 15+ messages in thread
From: Leon Romanovsky @ 2026-03-18 14:49 UTC (permalink / raw)
To: Long Li
Cc: Jason Gunthorpe, Konstantin Taranov, Jakub Kicinski,
David S . Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Haiyang Zhang, KY Srinivasan, Wei Liu, Dexuan Cui, Simon Horman,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
On Tue, Mar 17, 2026 at 11:43:49PM +0000, Long Li wrote:
> >
> > On Fri, Mar 13, 2026 at 01:59:28PM -0300, Jason Gunthorpe wrote:
> > > On Sat, Mar 07, 2026 at 07:38:14PM +0200, Leon Romanovsky wrote:
> > > > On Fri, Mar 06, 2026 at 05:47:14PM -0800, Long Li wrote:
> > > > > When the MANA hardware undergoes a service reset, the ETH
> > > > > auxiliary device
> > > > > (mana.eth) used by DPDK persists across the reset cycle — it is
> > > > > not removed and re-added like RC/UD/GSI QPs. This means userspace
> > > > > RDMA consumers such as DPDK have no way of knowing that firmware
> > > > > handles for their PD, CQ, WQ, QP and MR resources have become stale.
> > > >
> > > > NAK to any of this.
> > > >
> > > > In case of hardware reset, mana_ib AUX device needs to be destroyed
> > > > and recreated later.
> > >
> > > Yeah, that is our general model for any serious RAS event where the
> > > driver's view of resources becomes out of sync with the HW.
> > >
> > > You have to tear down the ib_device by removing the aux and then bring
> > > back a new one.
> > >
> > > There is an IB_EVENT_DEVICE_FATAL, but the purpose of that event is to
> > > tell userspace to close and re-open their uverbs FD.
> > >
> > > We don't have a model where a uverbs FD in userspace can continue to
> > > work after the device has a catastrophic RAS event.
> > >
> > > There may be room to have a model where the ib device doesn't fully
> > > unplug/replug so it retains its name and things, but that is core code
> > > not driver stuff.
> >
> > Good luck with that model. It is going to break RDMA-CM hotplug support.
> >
>
> I think we can preserve RDMA-CM behavior without requiring ib_device
> unregister/re-register.
>
> On device reset, the driver can dispatch IB_EVENT_DEVICE_FATAL (or a
> new reset event) through ib_dispatch_event(). RDMA-CM already handles
> device events — we would add a handler that iterates all rdma_cm_ids
> on the device and sends RDMA_CM_EVENT_DEVICE_REMOVAL to each, same
> as cma_process_remove() does today. The difference: cma_device stays
> alive, so applications can reconnect on the same device after recovery
> instead of waiting for a new one to appear.
>
> The motivation for keeping ib_device alive is that some RDMA consumers
> — DPDK and NCCL — don't use RDMA-CM at all. They use raw verbs and
> manage QP state themselves.
RDMA-CM provides an "external QP" model where the QP is managed by the
rdma-cm user.
As Jason noted, you should propose the core changes together with the
corresponding librdmacm updates. The final result must ensure that legacy
applications continue to function correctly with the new kernel.
Thanks
* RE: [EXTERNAL] Re: [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources
2026-03-18 14:49 ` Leon Romanovsky
@ 2026-03-21 0:49 ` Long Li
0 siblings, 0 replies; 15+ messages in thread
From: Long Li @ 2026-03-21 0:49 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Jason Gunthorpe, Konstantin Taranov, Jakub Kicinski,
David S . Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Haiyang Zhang, KY Srinivasan, Wei Liu, Dexuan Cui, Simon Horman,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
> On Tue, Mar 17, 2026 at 11:43:49PM +0000, Long Li wrote:
> > >
> > > On Fri, Mar 13, 2026 at 01:59:28PM -0300, Jason Gunthorpe wrote:
> > > > On Sat, Mar 07, 2026 at 07:38:14PM +0200, Leon Romanovsky wrote:
> > > > > On Fri, Mar 06, 2026 at 05:47:14PM -0800, Long Li wrote:
> > > > > > When the MANA hardware undergoes a service reset, the ETH
> > > > > > auxiliary device
> > > > > > (mana.eth) used by DPDK persists across the reset cycle — it
> > > > > > is not removed and re-added like RC/UD/GSI QPs. This means
> > > > > > userspace RDMA consumers such as DPDK have no way of knowing
> > > > > > that firmware handles for their PD, CQ, WQ, QP and MR resources have
> become stale.
> > > > >
> > > > > NAK to any of this.
> > > > >
> > > > > In case of hardware reset, mana_ib AUX device needs to be
> > > > > destroyed and recreated later.
> > > >
> > > > Yeah, that is our general model for any serious RAS event where
> > > > the driver's view of resources becomes out of sync with the HW.
> > > >
> > > > You have to tear down the ib_device by removing the aux and then
> > > > bring back a new one.
> > > >
> > > > There is an IB_EVENT_DEVICE_FATAL, but the purpose of that event
> > > > is to tell userspace to close and re-open their uverbs FD.
> > > >
> > > > We don't have a model where a uverbs FD in userspace can continue
> > > > to work after the device has a catastrophic RAS event.
> > > >
> > > > There may be room to have a model where the ib device doesn't
> > > > fully unplug/replug so it retains its name and things, but that is
> > > > core code not driver stuff.
> > >
> > > Good luck with that model. It is going to break RDMA-CM hotplug support.
> > >
> >
> > I think we can preserve RDMA-CM behavior without requiring ib_device
> > unregister/re-register.
> >
> > On device reset, the driver can dispatch IB_EVENT_DEVICE_FATAL (or a
> > new reset event) through ib_dispatch_event(). RDMA-CM already handles
> > device events — we would add a handler that iterates all rdma_cm_ids
> > on the device and sends RDMA_CM_EVENT_DEVICE_REMOVAL to each,
> same
> > as cma_process_remove() does today. The difference: cma_device stays
> > alive, so applications can reconnect on the same device after recovery
> > instead of waiting for a new one to appear.
> >
> > The motivation for keeping ib_device alive is that some RDMA consumers
> > — DPDK and NCCL — don't use RDMA-CM at all. They use raw verbs and
> > manage QP state themselves.
>
> RDMA-CM provides an "external QP" model where the QP is managed by the
> rdma-cm user.
>
> As Jason noted, you should propose the core changes together with the
> corresponding librdmacm updates. The final result must ensure that legacy
> applications continue to function correctly with the new kernel.
>
> Thanks
Will send RFC patches.
Thank you,
Long
Thread overview: 15+ messages
2026-03-07 1:47 [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Long Li
2026-03-07 1:47 ` [PATCH rdma-next 1/8] RDMA/mana_ib: Track ucontext per device Long Li
2026-03-07 1:47 ` [PATCH rdma-next 2/8] RDMA/mana_ib: Track PD per ucontext Long Li
2026-03-07 1:47 ` [PATCH rdma-next 3/8] RDMA/mana_ib: Track CQ " Long Li
2026-03-07 1:47 ` [PATCH rdma-next 4/8] RDMA/mana_ib: Track WQ " Long Li
2026-03-07 1:47 ` [PATCH rdma-next 5/8] RDMA/mana_ib: Track QP " Long Li
2026-03-07 1:47 ` [PATCH rdma-next 6/8] RDMA/mana_ib: Track MR " Long Li
2026-03-07 1:47 ` [PATCH rdma-next 7/8] RDMA/mana_ib: Notify service reset events to RDMA devices Long Li
2026-03-07 1:47 ` [PATCH rdma-next 8/8] RDMA/mana_ib: Skip firmware commands for invalidated handles Long Li
2026-03-07 17:38 ` [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Leon Romanovsky
2026-03-13 16:59 ` Jason Gunthorpe
2026-03-16 20:08 ` Leon Romanovsky
2026-03-17 23:43 ` [EXTERNAL] " Long Li
2026-03-18 14:49 ` Leon Romanovsky
2026-03-21 0:49 ` Long Li