From: Michael Gur <michaelgur@nvidia.com>
To: jgg@ziepe.ca, leon@kernel.org, linux-rdma@vger.kernel.org
Cc: Edward Srouji <edwards@nvidia.com>,
Yishai Hadas <yishaih@nvidia.com>,
Patrisious Haddad <phaddad@nvidia.com>,
Michael Guralnik <michaelgur@nvidia.com>
Subject: [PATCH rdma-next 7/9] RDMA/core: Fix FRMR handle leak on push failure
Date: Wed, 10 Jun 2026 03:01:43 +0300
Message-ID: <20260610000145.820592-8-michaelgur@nvidia.com>
In-Reply-To: <20260610000145.820592-1-michaelgur@nvidia.com>
From: Michael Guralnik <michaelgur@nvidia.com>
If pushing a handle back to the pool fails, which happens when the
queue page allocation returns ENOMEM, the pool's in_use counter is
never decremented, so the pool state stays skewed indefinitely.
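Fix this by moving handle destruction for that case into the FRMR
core, so that ib_frmr_pool_push() either returns the handle to the
pool or destroys it, and decrements in_use in both cases. Since the
push can no longer fail from the caller's point of view, change its
return type to void and adjust the mlx5_ib call site accordingly.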
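A minimal user-space sketch of the accounting change, for illustration only. Everything in it is hypothetical: struct model_pool, model_queue_push(), old_push(), old_caller() and new_push() merely mirror the shape of ib_frmr_pool_push() and push_handle_to_queue_locked() before and after this patch. There is no locking and no real handle, and ENOMEM is simulated with a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one FRMR pool; not the kernel API. */
struct model_pool {
	int in_use;      /* handles popped and not yet returned */
	int queued;      /* handles waiting in the free queue */
	int destroyed;   /* handles destroyed instead of being queued */
	int aging_kicks; /* times the queue went from empty to non-empty */
	bool fail_alloc; /* simulate ENOMEM on queue page allocation */
};

/* Hypothetical stand-in for push_handle_to_queue_locked(). */
static int model_queue_push(struct model_pool *p)
{
	if (p->fail_alloc)
		return -ENOMEM;
	p->queued++;
	return 0;
}

/* Pre-patch contract: in_use drops only on success. */
static int old_push(struct model_pool *p)
{
	int ret = model_queue_push(p);

	if (ret == 0)
		p->in_use--;
	return ret;
}

/* Pre-patch caller: destroys the handle itself on push failure. */
static void old_caller(struct model_pool *p)
{
	if (old_push(p))
		p->destroyed++; /* handle is gone, but in_use is never decremented */
}

/* Post-patch contract: push always consumes the handle and fixes in_use. */
static void new_push(struct model_pool *p)
{
	int ret;

	assert(p->in_use > 0);
	p->in_use--;
	ret = model_queue_push(p);
	if (ret) {
		p->destroyed++; /* destroy here instead of in the caller */
		return;
	}
	if (p->queued == 1)
		p->aging_kicks++; /* queue just became non-empty */
}
```

Starting from in_use = 1 with fail_alloc set, old_caller() leaves in_use at 1 even though the handle was destroyed, whereas new_push() drops it to 0 and records the destruction. On a successful push into an empty queue, new_push() counts one aging kick, mirroring the post-push `ci == 1` check that replaces the pre-push `ci == 0` check.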
Fixes: ce5df0b891ed ("IB/core: Introduce FRMR pools")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
drivers/infiniband/core/frmr_pools.c | 19 +++++++++++--------
drivers/infiniband/hw/mlx5/mr.c | 5 +++--
include/rdma/frmr_pools.h | 2 +-
3 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/core/frmr_pools.c b/drivers/infiniband/core/frmr_pools.c
index 892aedfe03be..e214a8273df8 100644
--- a/drivers/infiniband/core/frmr_pools.c
+++ b/drivers/infiniband/core/frmr_pools.c
@@ -549,9 +549,8 @@ EXPORT_SYMBOL(ib_frmr_pool_pop);
  * @device: The device to push the FRMR handle to.
  * @mr: The MR containing the FRMR handle to push back to the pool.
  *
- * Returns 0 on success, negative error code on failure.
  */
-int ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr)
+void ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr)
 {
 	struct ib_frmr_pool *pool = mr->frmr.pool;
 	struct ib_frmr_pools *pools = device->frmr_pools;
@@ -559,19 +558,23 @@ int ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr)
 	int ret;
 	spin_lock(&pool->lock);
+	pool->in_use--;
+	ret = push_handle_to_queue_locked(&pool->queue, mr->frmr.handle);
+
 	/* Schedule aging every time an empty pool becomes non-empty */
-	if (pool->queue.ci == 0)
+	if (!ret && pool->queue.ci == 1)
 		schedule_aging = true;
-	ret = push_handle_to_queue_locked(&pool->queue, mr->frmr.handle);
-	if (ret == 0)
-		pool->in_use--;
 	spin_unlock(&pool->lock);
-	if (ret == 0 && schedule_aging)
+	if (ret) {
+		pools->pool_ops->destroy_frmrs(device, &mr->frmr.handle, 1);
+		return;
+	}
+
+	if (schedule_aging)
 		queue_delayed_work(pools->aging_wq, &pool->aging_work,
 				   secs_to_jiffies(READ_ONCE(pools->aging_period_sec)));
-	return ret;
 }
 EXPORT_SYMBOL(ib_frmr_pool_push);
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index c0b3a8066974..1a6a8ccf6832 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1379,9 +1379,10 @@ static int mlx5r_handle_mkey_cleanup(struct mlx5_ib_mr *mr)
 	bool is_odp = is_odp_mr(mr);
 	int ret;
-	if (mr->ibmr.frmr.pool && !mlx5_umr_revoke_mr_with_lock(mr) &&
-	    !ib_frmr_pool_push(mr->ibmr.device, &mr->ibmr))
+	if (mr->ibmr.frmr.pool && !mlx5_umr_revoke_mr_with_lock(mr)) {
+		ib_frmr_pool_push(mr->ibmr.device, &mr->ibmr);
 		return 0;
+	}
 	if (is_odp)
 		mutex_lock(&to_ib_umem_odp(mr->umem)->umem_mutex);
diff --git a/include/rdma/frmr_pools.h b/include/rdma/frmr_pools.h
index af1b88801fa4..5b57bafa3636 100644
--- a/include/rdma/frmr_pools.h
+++ b/include/rdma/frmr_pools.h
@@ -34,6 +34,6 @@ int ib_frmr_pools_init(struct ib_device *device,
 		       const struct ib_frmr_pool_ops *pool_ops);
 void ib_frmr_pools_cleanup(struct ib_device *device);
 int ib_frmr_pool_pop(struct ib_device *device, struct ib_mr *mr);
-int ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr);
+void ib_frmr_pool_push(struct ib_device *device, struct ib_mr *mr);
 #endif /* FRMR_POOLS_H */
--
2.52.0