Linux RDMA and InfiniBand development
* [PATCH rdma-next 0/9] FRMR pools fixes
@ 2026-06-10  0:01 Michael Gur
  2026-06-10  0:01 ` [PATCH rdma-next 1/9] RDMA/mlx5: Fix mkey creation error flow rollback Michael Gur
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Michael Gur @ 2026-06-10  0:01 UTC
  To: jgg, leon, linux-rdma
  Cc: Edward Srouji, Yishai Hadas, Patrisious Haddad, Michael Guralnik

From: Michael Guralnik <michaelgur@nvidia.com>

This series addresses several bugs in FRMR pool handling.

Patch 2 fixes incorrect masking of TPH-related bits in the FRMR pool key,
which caused stale TPH values to be used when creating handles from an
empty pool.

Patch 3 fixes the set-pinned flow to use the pool key returned from the
driver's build_key callback instead of the raw key supplied by the user.

Patch 8 extends the FRMR pools API with a new drop operation,
ib_frmr_pool_drop(). It lets drivers update the pool state when a handle
is destroyed after a failed revocation, without incorrectly returning the
handle to the pool.

The remaining patches fix error path handling, covering cases where memory
allocation fails during queue expansion, and where handle creation or
destruction operations return errors.

Michael Guralnik (9):
  RDMA/mlx5: Fix mkey creation error flow rollback
  RDMA/mlx5: Fix TPH extraction in FRMR pool key
  RDMA/core: Fix skipped usage for driver built FRMR key
  RDMA/core: Fix FRMR aging push to queue error flow
  RDMA/core: Fix FRMR set pinned push error path
  RDMA/core: Avoid NULL dereference on FRMR bad usage
  RDMA/core: Fix FRMR handle leak on push failure
  RDMA/core: Add ib_frmr_pool_drop for unrecoverable handles
  RDMA/mlx5: Drop FRMR pool handle on UMR revoke failure

 drivers/infiniband/core/frmr_pools.c | 104 +++++++++++++++++++--------
 drivers/infiniband/hw/mlx5/mr.c      |  31 +++++---
 include/rdma/frmr_pools.h            |   3 +-
 3 files changed, 98 insertions(+), 40 deletions(-)

-- 
2.52.0



* [PATCH rdma-next 1/9] RDMA/mlx5: Fix mkey creation error flow rollback
  2026-06-10  0:01 [PATCH rdma-next 0/9] FRMR pools fixes Michael Gur
@ 2026-06-10  0:01 ` Michael Gur
  2026-06-10  0:01 ` [PATCH rdma-next 2/9] RDMA/mlx5: Fix TPH extraction in FRMR pool key Michael Gur
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Michael Gur @ 2026-06-10  0:01 UTC
  To: jgg, leon, linux-rdma
  Cc: Edward Srouji, Yishai Hadas, Patrisious Haddad, Michael Guralnik

From: Michael Guralnik <michaelgur@nvidia.com>

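Fix the indices of the mkeys destroyed when batch mkey creation fails.
The rollback loop walked handles[i] down to handles[1], so handles[0]
was never destroyed; it now walks handles[i - 1] down to handles[0].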

Fixes: 36680ef7bceb ("RDMA/mlx5: Switch from MR cache to FRMR pools")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 254e6aa4ccaf..c174e27e2e65 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -294,7 +294,7 @@ static int mlx5r_create_mkeys(struct ib_device *device, struct ib_frmr_key *key,
 free_in:
 	kfree(in);
 	if (err)
-		for (; i > 0; i--)
+		for (i--; i >= 0; i--)
 			mlx5_core_destroy_mkey(dev->mdev, handles[i]);
 	return err;
 }
-- 
2.52.0
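
The index fix can be illustrated outside the kernel. The following is a
minimal userspace sketch, not the driver code: fake_create(),
fake_destroy(), live[] and the fail_at parameter are hypothetical
stand-ins for mkey creation and destruction, and it assumes the loop
counter is left at the failing index when creation fails.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLES 8

/* Hypothetical stand-in for hardware state: live[i] != 0 means created. */
static int live[MAX_HANDLES];

/* Hypothetical create: fails at index fail_at (use -1 for "never fail"). */
static int fake_create(int idx, int fail_at)
{
	if (idx == fail_at)
		return -1;
	live[idx] = 1;
	return 0;
}

static void fake_destroy(int idx)
{
	live[idx] = 0;
}

/* Create count entries; on failure destroy exactly the ones created. */
static int create_batch(int count, int fail_at)
{
	int err = 0;
	int i;

	for (i = 0; i < count; i++) {
		err = fake_create(i, fail_at);
		if (err)
			break;
	}

	/* i is the failing index: roll back i - 1 down to 0, inclusive. */
	if (err)
		for (i--; i >= 0; i--)
			fake_destroy(i);
	return err;
}

/* Count entries that are still live among the first count slots. */
static int live_count(int count)
{
	int n = 0;
	int i;

	for (i = 0; i < count; i++)
		n += live[i];
	return n;
}
```

With the old bounds (start at i, stop before 0) the same loop would skip
index 0, which is the leak this patch addresses. The counter has to be a
signed type for the `i >= 0` condition to terminate.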



* [PATCH rdma-next 2/9] RDMA/mlx5: Fix TPH extraction in FRMR pool key
  2026-06-10  0:01 [PATCH rdma-next 0/9] FRMR pools fixes Michael Gur
  2026-06-10  0:01 ` [PATCH rdma-next 1/9] RDMA/mlx5: Fix mkey creation error flow rollback Michael Gur
@ 2026-06-10  0:01 ` Michael Gur
  2026-06-10  0:01 ` [PATCH rdma-next 3/9] RDMA/core: Fix skipped usage for driver built FRMR key Michael Gur
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Michael Gur @ 2026-06-10  0:01 UTC
  To: jgg, leon, linux-rdma
  Cc: Edward Srouji, Yishai Hadas, Patrisious Haddad, Michael Guralnik

From: Michael Guralnik <michaelgur@nvidia.com>

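Fix reading the PH value from the FRMR pool key. The key was masked but
never shifted down, so the extracted value still sat in bits 23:16.
Define both fields with GENMASK_ULL() and use FIELD_PREP()/FIELD_GET()
to pack and extract them.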

Fixes: 36680ef7bceb ("RDMA/mlx5: Switch from MR cache to FRMR pools")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mr.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index c174e27e2e65..c0b3a8066974 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -31,6 +31,7 @@
  * SOFTWARE.
  */
 
+#include <linux/bitfield.h>
 #include <linux/kref.h>
 #include <linux/random.h>
 #include <linux/debugfs.h>
@@ -163,9 +164,8 @@ static int get_unchangeable_access_flags(struct mlx5_ib_dev *dev,
 #define MLX5_FRMR_POOLS_KEY_VENDOR_KEY_SUPPORTED \
 	MLX5_FRMR_POOLS_KEY_ACCESS_MODE_KSM_MASK
 
-#define MLX5_FRMR_POOLS_KERNEL_KEY_PH_SHIFT 16
-#define MLX5_FRMR_POOLS_KERNEL_KEY_PH_MASK 0xFF0000
-#define MLX5_FRMR_POOLS_KERNEL_KEY_ST_INDEX_MASK 0xFFFF
+#define MLX5_FRMR_POOLS_KERNEL_KEY_PH_MASK GENMASK_ULL(23, 16)
+#define MLX5_FRMR_POOLS_KERNEL_KEY_ST_INDEX_MASK GENMASK_ULL(15, 0)
 
 static struct mlx5_ib_mr *
 _mlx5_frmr_pool_alloc(struct mlx5_ib_dev *dev, struct ib_umem *umem,
@@ -194,7 +194,8 @@ _mlx5_frmr_pool_alloc(struct mlx5_ib_dev *dev, struct ib_umem *umem,
 		ph ^= MLX5_IB_NO_PH;
 
 	mr->ibmr.frmr.key.kernel_vendor_key =
-		st_index | (ph << MLX5_FRMR_POOLS_KERNEL_KEY_PH_SHIFT);
+		FIELD_PREP(MLX5_FRMR_POOLS_KERNEL_KEY_ST_INDEX_MASK, st_index) |
+		FIELD_PREP(MLX5_FRMR_POOLS_KERNEL_KEY_PH_MASK, ph);
 	err = ib_frmr_pool_pop(&dev->ib_dev, &mr->ibmr);
 	if (err) {
 		kfree(mr);
@@ -271,9 +272,10 @@ static int mlx5r_create_mkeys(struct ib_device *device, struct ib_frmr_key *key,
 		 get_mkc_octo_size(access_mode, key->num_dma_blocks));
 	MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT);
 
-	st_index = key->kernel_vendor_key &
-		   MLX5_FRMR_POOLS_KERNEL_KEY_ST_INDEX_MASK;
-	ph = key->kernel_vendor_key & MLX5_FRMR_POOLS_KERNEL_KEY_PH_MASK;
+	st_index = FIELD_GET(MLX5_FRMR_POOLS_KERNEL_KEY_ST_INDEX_MASK,
+			     key->kernel_vendor_key);
+	ph = FIELD_GET(MLX5_FRMR_POOLS_KERNEL_KEY_PH_MASK,
+		       key->kernel_vendor_key);
 	if (ph) {
 		/* Normalize ph: swap MLX5_IB_NO_PH for 0 */
 		if (ph == MLX5_IB_NO_PH)
-- 
2.52.0
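
The difference between masking and extracting a field can be shown with
a small userspace sketch. It does not use the kernel's <linux/bitfield.h>;
field_prep(), field_get(), mask_shift() and pack_key() are hypothetical
helpers that only approximate what FIELD_PREP() and FIELD_GET() do, and
the mask values are written as literals matching bits 23:16 and 15:0.

```c
#include <assert.h>
#include <stdint.h>

#define PH_MASK       0xFF0000ULL /* bits 23:16 */
#define ST_INDEX_MASK 0x00FFFFULL /* bits 15:0 */

/* Position of the lowest set bit of a non-zero mask. */
static unsigned int mask_shift(uint64_t mask)
{
	unsigned int shift = 0;

	while (!(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Place val into the field described by mask. */
static uint64_t field_prep(uint64_t mask, uint64_t val)
{
	return (val << mask_shift(mask)) & mask;
}

/* Extract the field described by mask, shifted down to bit 0. */
static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> mask_shift(mask);
}

/* Pack both fields into one key, mirroring the layout in the patch. */
static uint64_t pack_key(uint64_t st_index, uint64_t ph)
{
	return field_prep(ST_INDEX_MASK, st_index) |
	       field_prep(PH_MASK, ph);
}
```

For a key packed from st_index 0x1234 and ph 0x02, `key & PH_MASK`
yields 0x020000 while `field_get(PH_MASK, key)` yields 0x02, which is
the value the later comparisons against the PH constants expect.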



* [PATCH rdma-next 3/9] RDMA/core: Fix skipped usage for driver built FRMR key
  2026-06-10  0:01 [PATCH rdma-next 0/9] FRMR pools fixes Michael Gur
  2026-06-10  0:01 ` [PATCH rdma-next 1/9] RDMA/mlx5: Fix mkey creation error flow rollback Michael Gur
  2026-06-10  0:01 ` [PATCH rdma-next 2/9] RDMA/mlx5: Fix TPH extraction in FRMR pool key Michael Gur
@ 2026-06-10  0:01 ` Michael Gur
  2026-06-10  0:01 ` [PATCH rdma-next 4/9] RDMA/core: Fix FRMR aging push to queue error flow Michael Gur
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Michael Gur @ 2026-06-10  0:01 UTC
  To: jgg, leon, linux-rdma
  Cc: Edward Srouji, Yishai Hadas, Patrisious Haddad, Michael Guralnik

From: Michael Guralnik <michaelgur@nvidia.com>

When creating FRMR handles in response to a netlink command to pin
handles, use the key produced by the driver callback instead of the key
passed directly by the user.

Fixes: 020d189d16a6 ("RDMA/core: Add pinned handles to FRMR pools")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/core/frmr_pools.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/frmr_pools.c b/drivers/infiniband/core/frmr_pools.c
index 5e992ff3d7cf..6170466ea958 100644
--- a/drivers/infiniband/core/frmr_pools.c
+++ b/drivers/infiniband/core/frmr_pools.c
@@ -426,7 +426,7 @@ int ib_frmr_pools_set_pinned(struct ib_device *device, struct ib_frmr_key *key,
 	if (!handles)
 		return -ENOMEM;
 
-	ret = pools->pool_ops->create_frmrs(device, key, handles,
+	ret = pools->pool_ops->create_frmrs(device, &driver_key, handles,
 					    needed_handles);
 	if (ret) {
 		kfree(handles);
-- 
2.52.0



* [PATCH rdma-next 4/9] RDMA/core: Fix FRMR aging push to queue error flow
  2026-06-10  0:01 [PATCH rdma-next 0/9] FRMR pools fixes Michael Gur
                   ` (2 preceding siblings ...)
  2026-06-10  0:01 ` [PATCH rdma-next 3/9] RDMA/core: Fix skipped usage for driver built FRMR key Michael Gur
@ 2026-06-10  0:01 ` Michael Gur
  2026-06-10  0:01 ` [PATCH rdma-next 5/9] RDMA/core: Fix FRMR set pinned push error path Michael Gur
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Michael Gur @ 2026-06-10  0:01 UTC
  To: jgg, leon, linux-rdma
  Cc: Edward Srouji, Yishai Hadas, Patrisious Haddad, Michael Guralnik

From: Michael Guralnik <michaelgur@nvidia.com>

Aging pools with pinned handles requires moving handles from the active
queue to a non-empty inactive queue. Each per-handle push may need to
allocate a new page and can fail. That failure is currently not handled,
so any mkey that fails the push is leaked.

Fix by introducing splice_frmr_queue_locked(), which fills the
destination's partial tail page from the source and then splices the
remaining source pages onto the destination, performing no allocation.

Replace the per-handle move loop in age_pinned_pool() and the
open-coded splice in pool_aging_work() with calls to the helper.
The helper cannot fail under memory pressure, which removes a class of
GFP_ATOMIC allocations under the pool lock and simplifies the error
flow.

Fixes: 020d189d16a6 ("RDMA/core: Add pinned handles to FRMR pools")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/core/frmr_pools.c | 53 ++++++++++++++++++++--------
 1 file changed, 38 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/frmr_pools.c b/drivers/infiniband/core/frmr_pools.c
index 6170466ea958..927642c06f3a 100644
--- a/drivers/infiniband/core/frmr_pools.c
+++ b/drivers/infiniband/core/frmr_pools.c
@@ -97,13 +97,44 @@ static void destroy_all_handles_in_queue(struct ib_device *device,
 	}
 }
 
+/*
+ * Bulk-move all handles from @src into @dst without allocating new pages.
+ * If @dst has a partial tail page, fill it handle-by-handle from @src first
+ * to preserve the invariant that only the tail page is partial, then splice
+ * the remaining @src pages onto @dst. On return @src is empty.
+ *
+ * Caller must hold the lock protecting both queues.
+ */
+static void splice_frmr_queue_locked(struct frmr_queue *dst,
+				     struct frmr_queue *src)
+{
+	u32 free_in_tail = dst->ci % NUM_HANDLES_PER_PAGE;
+	u32 handle;
+
+	if (free_in_tail) {
+		free_in_tail = NUM_HANDLES_PER_PAGE - free_in_tail;
+		while (free_in_tail && src->ci) {
+			handle = pop_handle_from_queue_locked(src);
+			push_handle_to_queue_locked(dst, handle);
+			free_in_tail--;
+		}
+	}
+
+	if (src->ci > 0) {
+		list_splice_tail_init(&src->pages_list, &dst->pages_list);
+		dst->num_pages += src->num_pages;
+		dst->ci += src->ci;
+		src->num_pages = 0;
+		src->ci = 0;
+	}
+}
+
 static bool age_pinned_pool(struct ib_device *device, struct ib_frmr_pool *pool)
 {
 	struct ib_frmr_pools *pools = device->frmr_pools;
 	u32 total, to_destroy, destroyed = 0;
 	bool has_work = false;
 	u32 *handles;
-	u32 handle;
 
 	spin_lock(&pool->lock);
 	total = pool->queue.ci + pool->inactive_queue.ci + pool->in_use;
@@ -112,7 +143,7 @@ static bool age_pinned_pool(struct ib_device *device, struct ib_frmr_pool *pool)
 		return false;
 	}
 
-	to_destroy = total - pool->pinned_handles;
+	to_destroy = min(total - pool->pinned_handles, pool->inactive_queue.ci);
 
 	handles = kcalloc(to_destroy, sizeof(*handles), GFP_ATOMIC);
 	if (!handles) {
@@ -121,15 +152,13 @@ static bool age_pinned_pool(struct ib_device *device, struct ib_frmr_pool *pool)
 	}
 
 	/* Destroy all excess handles in the inactive queue */
-	while (pool->inactive_queue.ci && destroyed < to_destroy) {
-		handles[destroyed++] = pop_handle_from_queue_locked(
+	for (; destroyed < to_destroy; destroyed++)
+		handles[destroyed] = pop_handle_from_queue_locked(
 			&pool->inactive_queue);
-	}
 
 	/* Move all handles from regular queue to inactive queue */
-	while (pool->queue.ci) {
-		handle = pop_handle_from_queue_locked(&pool->queue);
-		push_handle_to_queue_locked(&pool->inactive_queue, handle);
+	if (pool->queue.ci > 0) {
+		splice_frmr_queue_locked(&pool->inactive_queue, &pool->queue);
 		has_work = true;
 	}
 
@@ -158,13 +187,7 @@ static void pool_aging_work(struct work_struct *work)
 	/* Move all pages from regular queue to inactive queue */
 	spin_lock(&pool->lock);
 	if (pool->queue.ci > 0) {
-		list_splice_tail_init(&pool->queue.pages_list,
-				      &pool->inactive_queue.pages_list);
-		pool->inactive_queue.num_pages = pool->queue.num_pages;
-		pool->inactive_queue.ci = pool->queue.ci;
-
-		pool->queue.num_pages = 0;
-		pool->queue.ci = 0;
+		splice_frmr_queue_locked(&pool->inactive_queue, &pool->queue);
 		has_work = true;
 	}
 	spin_unlock(&pool->lock);
-- 
2.52.0
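
The fill-then-splice idea can be modelled in userspace. This is a
simplified sketch under stated assumptions, not the kernel data
structure: struct page, struct queue, push(), pop(), splice() and
drain_sum() are hypothetical, the page size of 4 is illustrative, and
pop() is assumed to take handles from the tail and release a page once
it becomes empty.

```c
#include <assert.h>
#include <stdlib.h>

#define HANDLES_PER_PAGE 4 /* illustrative, not the kernel's value */

struct page {
	unsigned int handles[HANDLES_PER_PAGE];
	struct page *prev, *next;
};

struct queue {
	struct page *head, *tail;
	unsigned int ci; /* number of handles currently stored */
};

/* Push one handle; allocates only when the tail page is full. */
static int push(struct queue *q, unsigned int handle)
{
	unsigned int slot = q->ci % HANDLES_PER_PAGE;

	if (slot == 0) {
		struct page *p = calloc(1, sizeof(*p));

		if (!p)
			return -1;
		p->prev = q->tail;
		if (q->tail)
			q->tail->next = p;
		else
			q->head = p;
		q->tail = p;
	}
	q->tail->handles[slot] = handle;
	q->ci++;
	return 0;
}

/* Pop the newest handle; the caller guarantees q->ci > 0. */
static unsigned int pop(struct queue *q)
{
	unsigned int slot, handle;

	q->ci--;
	slot = q->ci % HANDLES_PER_PAGE;
	handle = q->tail->handles[slot];
	if (slot == 0) { /* tail page is now empty: release it */
		struct page *p = q->tail;

		q->tail = p->prev;
		if (q->tail)
			q->tail->next = NULL;
		else
			q->head = NULL;
		free(p);
	}
	return handle;
}

/* Move everything from src to dst without allocating. */
static void splice(struct queue *dst, struct queue *src)
{
	unsigned int used = dst->ci % HANDLES_PER_PAGE;

	if (used) {
		unsigned int room = HANDLES_PER_PAGE - used;

		/* The tail page has room, so push() cannot allocate here. */
		for (; room && src->ci; room--)
			(void)push(dst, pop(src));
	}
	if (src->ci) {
		src->head->prev = dst->tail;
		if (dst->tail)
			dst->tail->next = src->head;
		else
			dst->head = src->head;
		dst->tail = src->tail;
		dst->ci += src->ci;
		src->head = src->tail = NULL;
		src->ci = 0;
	}
}

/* Pop every handle and return the sum, freeing all pages. */
static unsigned int drain_sum(struct queue *q)
{
	unsigned int sum = 0;

	while (q->ci)
		sum += pop(q);
	return sum;
}
```

Filling the destination's tail page first is what keeps slot arithmetic
based on `ci % HANDLES_PER_PAGE` valid after the splice: only the last
page of the combined list may be partial.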



* [PATCH rdma-next 5/9] RDMA/core: Fix FRMR set pinned push error path
  2026-06-10  0:01 [PATCH rdma-next 0/9] FRMR pools fixes Michael Gur
                   ` (3 preceding siblings ...)
  2026-06-10  0:01 ` [PATCH rdma-next 4/9] RDMA/core: Fix FRMR aging push to queue error flow Michael Gur
@ 2026-06-10  0:01 ` Michael Gur
  2026-06-10  0:01 ` [PATCH rdma-next 6/9] RDMA/core: Avoid NULL dereference on FRMR bad usage Michael Gur
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Michael Gur @ 2026-06-10  0:01 UTC
  To: jgg, leon, linux-rdma
  Cc: Edward Srouji, Yishai Hadas, Patrisious Haddad, Michael Guralnik

From: Michael Guralnik <michaelgur@nvidia.com>

Destroy the FRMR handles that were created but not pushed when a push to
the pool fails. This prevents a resource leak when pool page allocation
fails.

Fixes: 020d189d16a6 ("RDMA/core: Add pinned handles to FRMR pools")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/core/frmr_pools.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/frmr_pools.c b/drivers/infiniband/core/frmr_pools.c
index 927642c06f3a..1cfdddc3fcda 100644
--- a/drivers/infiniband/core/frmr_pools.c
+++ b/drivers/infiniband/core/frmr_pools.c
@@ -461,11 +461,16 @@ int ib_frmr_pools_set_pinned(struct ib_device *device, struct ib_frmr_key *key,
 		ret = push_handle_to_queue_locked(&pool->queue,
 						  handles[i]);
 		if (ret)
-			goto end;
+			break;
 	}
-
-end:
 	spin_unlock(&pool->lock);
+
+	if (ret) {
+		/* Destroy handles created but never pushed to the pool. */
+		pools->pool_ops->destroy_frmrs(device, &handles[i],
+				needed_handles - i);
+	}
+
 	kfree(handles);
 
 schedule_aging:
-- 
2.52.0
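
The "push until failure, then destroy the remainder" pattern can be
sketched in userspace as follows. Everything here is hypothetical:
pool_push() fails once a fixed capacity is reached to imitate a page
allocation failure, and destroy_handles() only counts what it is asked
to destroy.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAPACITY 3 /* hypothetical: push fails once the pool is full */

static unsigned int pool[POOL_CAPACITY];
static size_t pool_len;
static size_t destroyed;

/* Hypothetical push that fails like an allocation failure would. */
static int pool_push(unsigned int handle)
{
	if (pool_len == POOL_CAPACITY)
		return -1;
	pool[pool_len++] = handle;
	return 0;
}

/* Hypothetical batch destroy; only records how many were destroyed. */
static void destroy_handles(const unsigned int *handles, size_t count)
{
	(void)handles;
	destroyed += count;
}

/* Push n handles; on failure destroy handles[i] through handles[n - 1]. */
static int push_all_or_destroy_rest(const unsigned int *handles, size_t n)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = pool_push(handles[i]);
		if (ret)
			break;
	}

	/* handles[i] itself was not pushed, so the remainder starts at i. */
	if (ret)
		destroy_handles(&handles[i], n - i);
	return ret;
}
```

Handles pushed before the failure stay in the pool, so every created
handle ends up either queued or destroyed.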



* [PATCH rdma-next 6/9] RDMA/core: Avoid NULL dereference on FRMR bad usage
  2026-06-10  0:01 [PATCH rdma-next 0/9] FRMR pools fixes Michael Gur
                   ` (4 preceding siblings ...)
  2026-06-10  0:01 ` [PATCH rdma-next 5/9] RDMA/core: Fix FRMR set pinned push error path Michael Gur
@ 2026-06-10  0:01 ` Michael Gur
  2026-06-10  0:01 ` [PATCH rdma-next 7/9] RDMA/core: Fix FRMR handle leak on push failure Michael Gur
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Michael Gur @ 2026-06-10  0:01 UTC
  To: jgg, leon, linux-rdma
  Cc: Edward Srouji, Yishai Hadas, Patrisious Haddad, Michael Guralnik

From: Michael Guralnik <michaelgur@nvidia.com>

If a driver calls the FRMR pop operation without a successful init,
trigger a warning and return an error to avoid a NULL dereference.

Fixes: ce5df0b891ed ("IB/core: Introduce FRMR pools")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/core/frmr_pools.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/frmr_pools.c b/drivers/infiniband/core/frmr_pools.c
index 1cfdddc3fcda..892aedfe03be 100644
--- a/drivers/infiniband/core/frmr_pools.c
+++ b/drivers/infiniband/core/frmr_pools.c
@@ -529,7 +529,9 @@ int ib_frmr_pool_pop(struct ib_device *device, struct ib_mr *mr)
 	struct ib_frmr_pools *pools = device->frmr_pools;
 	struct ib_frmr_pool *pool;
 
-	WARN_ON_ONCE(!device->frmr_pools);
+	if (WARN_ON_ONCE(!pools))
+		return -EINVAL;
+
 	pool = ib_frmr_pool_find(pools, &mr->frmr.key);
 	if (!pool) {
 		pool = create_frmr_pool(device, &mr->frmr.key);
-- 
2.52.0



* [PATCH rdma-next 7/9] RDMA/core: Fix FRMR handle leak on push failure
  2026-06-10  0:01 [PATCH rdma-next 0/9] FRMR pools fixes Michael Gur
                   ` (5 preceding siblings ...)
  2026-06-10  0:01 ` [PATCH rdma-next 6/9] RDMA/core: Avoid NULL dereference on FRMR bad usage Michael Gur
@ 2026-06-10  0:01 ` Michael Gur
  2026-06-10  0:01 ` [PATCH rdma-next 8/9] RDMA/core: Add ib_frmr_pool_drop for unrecoverable handles Michael Gur
  2026-06-10  0:01 ` [PATCH rdma-next 9/9] RDMA/mlx5: Drop FRMR pool handle on UMR revoke failure Michael Gur
  8 siblings, 0 replies; 10+ messages in thread
From: Michael Gur @ 2026-06-10  0:01 UTC
  To: jgg, leon, linux-rdma
  Cc: Edward Srouji, Yishai Hadas, Patrisious Haddad, Michael Guralnik

From: Michael Guralnik <michaelgur@nvidia.com>

A failure to push a handle to the pool, caused by ENOMEM on queue page
allocation, skips the in_use counter update and leaves the pool state
skewed indefinitely.
Fix that by moving the destruction of the handle in this case into the
FRMR code, ensuring the handle is either pushed to the pool or destroyed
inside the same function.

Adjust mlx5_ib call site accordingly.

Fixes: ce5df0b891ed ("IB/core: Introduce FRMR pools")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/core/frmr_pools.c | 19 +++++++++++--------
 drivers/infiniband/hw/mlx5/mr.c      |  5 +++--
 include/rdma/frmr_pools.h            |  2 +-
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/frmr_pools.c b/drivers/infiniband/core/frmr_pools.c
index 892aedfe03be..e214a8273df8 100644
--- a/drivers/infiniband/core/frmr_pools.c
+++ b/drivers/infiniband/core/frmr_pools.c
@@ -549,9 +549,8 @@ EXPORT_SYMBOL(ib_frmr_pool_pop);
  * @device: The device to push the FRMR handle to.
  * @mr: The MR containing the FRMR handle to push back to the pool.
  *
- * Returns 0 on success, negative error code on failure.
  */
-int ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr)
+void ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr)
 {
 	struct ib_frmr_pool *pool = mr->frmr.pool;
 	struct ib_frmr_pools *pools = device->frmr_pools;
@@ -559,19 +558,23 @@ int ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr)
 	int ret;
 
 	spin_lock(&pool->lock);
+	pool->in_use--;
+	ret = push_handle_to_queue_locked(&pool->queue, mr->frmr.handle);
+
 	/* Schedule aging every time an empty pool becomes non-empty */
-	if (pool->queue.ci == 0)
+	if (!ret && pool->queue.ci == 1)
 		schedule_aging = true;
-	ret = push_handle_to_queue_locked(&pool->queue, mr->frmr.handle);
-	if (ret == 0)
-		pool->in_use--;
 
 	spin_unlock(&pool->lock);
 
-	if (ret == 0 && schedule_aging)
+	if (ret) {
+		pools->pool_ops->destroy_frmrs(device, &mr->frmr.handle, 1);
+		return;
+	}
+
+	if (schedule_aging)
 		queue_delayed_work(pools->aging_wq, &pool->aging_work,
 			secs_to_jiffies(READ_ONCE(pools->aging_period_sec)));
 
-	return ret;
 }
 EXPORT_SYMBOL(ib_frmr_pool_push);
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index c0b3a8066974..1a6a8ccf6832 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1379,9 +1379,10 @@ static int mlx5r_handle_mkey_cleanup(struct mlx5_ib_mr *mr)
 	bool is_odp = is_odp_mr(mr);
 	int ret;
 
-	if (mr->ibmr.frmr.pool && !mlx5_umr_revoke_mr_with_lock(mr) &&
-	    !ib_frmr_pool_push(mr->ibmr.device, &mr->ibmr))
+	if (mr->ibmr.frmr.pool && !mlx5_umr_revoke_mr_with_lock(mr)) {
+		ib_frmr_pool_push(mr->ibmr.device, &mr->ibmr);
 		return 0;
+	}
 
 	if (is_odp)
 		mutex_lock(&to_ib_umem_odp(mr->umem)->umem_mutex);
diff --git a/include/rdma/frmr_pools.h b/include/rdma/frmr_pools.h
index af1b88801fa4..5b57bafa3636 100644
--- a/include/rdma/frmr_pools.h
+++ b/include/rdma/frmr_pools.h
@@ -34,6 +34,6 @@ int ib_frmr_pools_init(struct ib_device *device,
 		       const struct ib_frmr_pool_ops *pool_ops);
 void ib_frmr_pools_cleanup(struct ib_device *device);
 int ib_frmr_pool_pop(struct ib_device *device, struct ib_mr *mr);
-int ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr);
+void ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr);
 
 #endif /* FRMR_POOLS_H */
-- 
2.52.0
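
The accounting change can be modelled with plain counters. This is a
userspace sketch with hypothetical names: struct pool_model,
queue_push() and pool_push() are not the kernel API, the capacity limit
imitates the ENOMEM case, and the destroyed counter stands in for the
driver's destroy callback.

```c
#include <assert.h>

struct pool_model {
	unsigned int in_use;    /* handles currently held by callers */
	unsigned int queued;    /* handles waiting in the pool */
	unsigned int capacity;  /* hypothetical limit that makes push fail */
	unsigned int destroyed; /* stand-in for the destroy callback */
};

/* Hypothetical queue push that fails when the model is full. */
static int queue_push(struct pool_model *p)
{
	if (p->queued == p->capacity)
		return -1;
	p->queued++;
	return 0;
}

/* The handle leaves the caller either way: queue it or destroy it. */
static void pool_push(struct pool_model *p)
{
	p->in_use--;
	if (queue_push(p))
		p->destroyed++;
}
```

Because in_use is decremented before the outcome of the queue push is
known, the counter cannot drift when the push fails, and the caller no
longer needs a return value to decide who destroys the handle.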



* [PATCH rdma-next 8/9] RDMA/core: Add ib_frmr_pool_drop for unrecoverable handles
  2026-06-10  0:01 [PATCH rdma-next 0/9] FRMR pools fixes Michael Gur
                   ` (6 preceding siblings ...)
  2026-06-10  0:01 ` [PATCH rdma-next 7/9] RDMA/core: Fix FRMR handle leak on push failure Michael Gur
@ 2026-06-10  0:01 ` Michael Gur
  2026-06-10  0:01 ` [PATCH rdma-next 9/9] RDMA/mlx5: Drop FRMR pool handle on UMR revoke failure Michael Gur
  8 siblings, 0 replies; 10+ messages in thread
From: Michael Gur @ 2026-06-10  0:01 UTC
  To: jgg, leon, linux-rdma
  Cc: Edward Srouji, Yishai Hadas, Patrisious Haddad, Michael Guralnik

From: Michael Guralnik <michaelgur@nvidia.com>

A driver that has popped a handle from an FRMR pool can hit failures
that leave the handle in a state where it can't safely be returned
for reuse. The driver destroys the handle itself, but the pool has
no way to learn about it, so the in_use counter drifts upward.

Add ib_frmr_pool_drop to balance the pool's accounting in this case.
Every pop is now balanced by exactly one push or drop.

Fixes: 36680ef7bceb ("RDMA/mlx5: Switch from MR cache to FRMR pools")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/core/frmr_pools.c | 15 +++++++++++++++
 include/rdma/frmr_pools.h            |  1 +
 2 files changed, 16 insertions(+)

diff --git a/drivers/infiniband/core/frmr_pools.c b/drivers/infiniband/core/frmr_pools.c
index e214a8273df8..ce8ae4305b9c 100644
--- a/drivers/infiniband/core/frmr_pools.c
+++ b/drivers/infiniband/core/frmr_pools.c
@@ -578,3 +578,18 @@ void ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr)
 
 }
 EXPORT_SYMBOL(ib_frmr_pool_push);
+
+/*
+ * Drop a handle previously popped from the pool without returning it for
+ * reuse. The caller is responsible for destroying the underlying hardware
+ * resource.
+ */
+void ib_frmr_pool_drop(struct ib_mr *mr)
+{
+	struct ib_frmr_pool *pool = mr->frmr.pool;
+
+	spin_lock(&pool->lock);
+	pool->in_use--;
+	spin_unlock(&pool->lock);
+}
+EXPORT_SYMBOL(ib_frmr_pool_drop);
diff --git a/include/rdma/frmr_pools.h b/include/rdma/frmr_pools.h
index 5b57bafa3636..aed4d69d3841 100644
--- a/include/rdma/frmr_pools.h
+++ b/include/rdma/frmr_pools.h
@@ -35,5 +35,6 @@ int ib_frmr_pools_init(struct ib_device *device,
 void ib_frmr_pools_cleanup(struct ib_device *device);
 int ib_frmr_pool_pop(struct ib_device *device, struct ib_mr *mr);
 void ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr);
+void ib_frmr_pool_drop(struct ib_mr *mr);
 
 #endif /* FRMR_POOLS_H */
-- 
2.52.0



* [PATCH rdma-next 9/9] RDMA/mlx5: Drop FRMR pool handle on UMR revoke failure
  2026-06-10  0:01 [PATCH rdma-next 0/9] FRMR pools fixes Michael Gur
                   ` (7 preceding siblings ...)
  2026-06-10  0:01 ` [PATCH rdma-next 8/9] RDMA/core: Add ib_frmr_pool_drop for unrecoverable handles Michael Gur
@ 2026-06-10  0:01 ` Michael Gur
  8 siblings, 0 replies; 10+ messages in thread
From: Michael Gur @ 2026-06-10  0:01 UTC
  To: jgg, leon, linux-rdma
  Cc: Edward Srouji, Yishai Hadas, Patrisious Haddad, Michael Guralnik

From: Michael Guralnik <michaelgur@nvidia.com>

When UMR revoke fails during MR cleanup, the handle is left in an
unknown state and cannot be returned to the pool. The driver already
destroys the mkey via the fallback path, but the pool's in_use counter
is never decremented, drifting upward over time.

Call ib_frmr_pool_drop on the revoke-failure path so the pool's
accounting stays consistent with the handles it has handed out.

Fixes: 36680ef7bceb ("RDMA/mlx5: Switch from MR cache to FRMR pools")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mr.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 1a6a8ccf6832..067e80f7875b 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1379,9 +1379,11 @@ static int mlx5r_handle_mkey_cleanup(struct mlx5_ib_mr *mr)
 	bool is_odp = is_odp_mr(mr);
 	int ret;
 
-	if (mr->ibmr.frmr.pool && !mlx5_umr_revoke_mr_with_lock(mr)) {
-		ib_frmr_pool_push(mr->ibmr.device, &mr->ibmr);
-		return 0;
+	if (mr->ibmr.frmr.pool) {
+		if (!mlx5_umr_revoke_mr_with_lock(mr)) {
+			ib_frmr_pool_push(mr->ibmr.device, &mr->ibmr);
+			return 0;
+		}
 	}
 
 	if (is_odp)
@@ -1403,6 +1405,10 @@ static int mlx5r_handle_mkey_cleanup(struct mlx5_ib_mr *mr)
 		dma_resv_unlock(
 			to_ib_umem_dmabuf(mr->umem)->attach->dmabuf->resv);
 	}
+
+	if (mr->ibmr.frmr.pool && !ret)
+		ib_frmr_pool_drop(&mr->ibmr);
+
 	return ret;
 }
 
-- 
2.52.0
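
The rule that every pop is balanced by exactly one push or drop can be
checked with a small counter model. This is a userspace sketch, not the
driver code: struct pool_model, pool_pop(), pool_push(), pool_drop() and
cleanup() are hypothetical, and the revoke_ok and destroy_ok flags stand
in for the results of the UMR revoke and of the fallback mkey destroy.

```c
#include <assert.h>
#include <stdbool.h>

struct pool_model {
	unsigned int in_use; /* handles currently held by callers */
	unsigned int queued; /* handles available for reuse */
};

/* Take a handle: reuse a queued one if present, else model a new one. */
static void pool_pop(struct pool_model *p)
{
	if (p->queued)
		p->queued--;
	p->in_use++;
}

/* Return a handle to the pool for reuse. */
static void pool_push(struct pool_model *p)
{
	p->in_use--;
	p->queued++;
}

/* Account for a handle the caller destroyed itself. */
static void pool_drop(struct pool_model *p)
{
	p->in_use--;
}

/* Cleanup flow: push on successful revoke, drop after a fallback destroy. */
static int cleanup(struct pool_model *p, bool revoke_ok, bool destroy_ok)
{
	if (revoke_ok) {
		pool_push(p);
		return 0;
	}
	if (!destroy_ok)
		return -1; /* handle still exists, accounting left untouched */
	pool_drop(p);
	return 0;
}
```

In this model the only way in_use stays elevated is when the fallback
destroy itself fails, which matches the `!ret` condition guarding the
drop in the patch.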

