public inbox for netdev@vger.kernel.org
* [PATCH rdma-next v2 00/11] RDMA/core: Introduce FRMR pools infrastructure
@ 2025-12-22 12:40 Edward Srouji
  2025-12-22 12:40 ` [PATCH rdma-next v2 01/11] RDMA/mlx5: Move device async_ctx initialization Edward Srouji
                   ` (10 more replies)
  0 siblings, 11 replies; 14+ messages in thread
From: Edward Srouji @ 2025-12-22 12:40 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-kernel, linux-rdma, netdev, Michael Guralnik, Edward Srouji,
	Chiara Meiohas, Yishai Hadas, Patrisious Haddad

From Michael:

This patch series introduces a new FRMR (Fast Registration Memory Region)
pool infrastructure for the RDMA core subsystem. The goal is to provide
efficient management and allow reuse of MRs (Memory Regions) for RDMA
device drivers.

Background
==========

Memory registration and deregistration can be a significant bottleneck in
RDMA applications that need to register memory regions dynamically in
their data path or must re-register memory on application restart.
Repeatedly allocating and freeing these resources introduces overhead,
particularly in high-throughput or latency-sensitive environments where
memory regions are frequently cycled. The mlx5_ib driver has already
adopted memory registration reuse mechanisms and has demonstrated
notable performance improvements as a result.

FRMR pools will store handles of the reusable objects, giving drivers
the flexibility to choose what to store (e.g., pointers or indexes).
Device driver integration requires the ability to modify the hardware
objects underlying MRs when reusing FRMR handles, allowing the update
of pre-allocated handles to fit the parameters of requested MR
registrations. The FRMR pools manage memory region handles with respect
to attributes that cannot be changed after allocation, such as access
flags, ATS capabilities, vendor keys, and DMA block size, so each pool is
uniquely characterized by these non-modifiable attributes.
This ensures compatibility and correctness while allowing drivers
flexibility in managing other aspects of the MR lifecycle.
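
As a rough illustration of how a pool could be identified by its
non-modifiable attributes, here is a minimal userspace sketch. The struct
name, field names and field widths (toy_frmr_key, num_dma_blocks, ...) are
assumptions made for this example and are not taken from the patches; the
only point is that the fixed attributes form a key with a total order, so
pools can be stored in and looked up from a search tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pool key: names and widths are illustrative assumptions. */
struct toy_frmr_key {
	bool     ats;            /* ATS capability, fixed at allocation */
	uint32_t access_flags;   /* access flags, fixed at allocation */
	uint64_t vendor_key;     /* opaque, driver-defined bits */
	uint64_t num_dma_blocks; /* size class of the handle */
};

/*
 * Total order over keys: returns <0, 0 or >0, like a tree comparator.
 * Two requests map to the same pool only when every field matches.
 */
static int toy_frmr_key_cmp(const struct toy_frmr_key *a,
			    const struct toy_frmr_key *b)
{
	assert(a && b);
	if (a->ats != b->ats)
		return a->ats < b->ats ? -1 : 1;
	if (a->access_flags != b->access_flags)
		return a->access_flags < b->access_flags ? -1 : 1;
	if (a->vendor_key != b->vendor_key)
		return a->vendor_key < b->vendor_key ? -1 : 1;
	if (a->num_dma_blocks != b->num_dma_blocks)
		return a->num_dma_blocks < b->num_dma_blocks ? -1 : 1;
	return 0;
}
```

With such a comparator, a request whose key compares equal to an existing
pool's key can reuse that pool's handles, while any difference in a fixed
attribute selects (or creates) a different pool.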

Solution Overview
=================

This patch series introduces a centralized, per-device FRMR pooling
infrastructure that provides:

1. Pool Organization: Uses an RB-tree to organize pools by FRMR
   characteristics (ATS support, access flags, vendor-specific keys,
   and DMA block count). This allows efficient lookup and reuse of
   compatible FRMR handles.

2. Dynamic Allocation: Pools grow dynamically on demand when no cached
   handles are available, ensuring optimal memory usage without
   sacrificing performance.

3. Aging Mechanism: Implements an aging system. Unused handles are
   gradually freed after a configurable aging period
   (default: 60 seconds), preventing memory bloat during idle periods.

4. Pinned Handles: Supports pinning a minimum number of handles per
   pool to maintain performance for latency-sensitive workloads, avoiding
   allocation overhead on critical paths.

5. Driver Flexibility: Provides a callback-based interface
   (ib_frmr_pool_ops) that allows drivers to implement their own FRMR
   creation/destruction logic while leveraging the common pooling
   infrastructure.
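
To make the interaction between aging (item 3) and pinned handles (item 4)
concrete, here is a simplified userspace model. It is a sketch under
assumptions and not the implementation in this series: the toy_pool
structure, the fixed-size array and the explicit 'now' argument are all
hypothetical. It assumes a monotonic clock and that handles are returned
in time order, so the oldest cached handle is always at index 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_POOL_CAP 16

/* Hypothetical pool state; the layout is an assumption for illustration. */
struct toy_pool {
	uint32_t handles[TOY_POOL_CAP];    /* cached handles, oldest first */
	uint64_t idle_since[TOY_POOL_CAP]; /* time each handle was returned */
	size_t   count;                    /* handles currently cached */
	size_t   pinned;                   /* aging never goes below this */
	uint64_t aging_period;             /* in seconds, e.g. 60 */
};

/* Cache a returned handle. Returns 0, or -1 if the toy array is full. */
static int toy_pool_push(struct toy_pool *p, uint32_t handle, uint64_t now)
{
	assert(p);
	if (p->count == TOY_POOL_CAP)
		return -1;
	p->handles[p->count] = handle;
	p->idle_since[p->count] = now;
	p->count++;
	return 0;
}

/*
 * Drop handles that have been idle for at least aging_period, oldest
 * first, but always keep 'pinned' handles cached. Returns the number of
 * handles dropped; a real driver would destroy them at this point.
 */
static size_t toy_pool_age(struct toy_pool *p, uint64_t now)
{
	size_t dropped = 0;

	assert(p);
	while (p->count - dropped > p->pinned &&
	       now - p->idle_since[dropped] >= p->aging_period)
		dropped++;

	/* Shift the surviving handles to the front of the array. */
	for (size_t i = dropped; i < p->count; i++) {
		p->handles[i - dropped] = p->handles[i];
		p->idle_since[i - dropped] = p->idle_since[i];
	}
	p->count -= dropped;
	return dropped;
}
```

In this model the pinned count acts as a floor: an idle pool shrinks down
to 'pinned' handles and no further, so a latency-sensitive user still finds
cached handles after a quiet period.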

API
===

The infrastructure exposes the following APIs:

- ib_frmr_pools_init(): Initialize FRMR pools for a device
- ib_frmr_pools_cleanup(): Clean up all pools for a device
- ib_frmr_pool_pop(): Get an FRMR handle from the pool
- ib_frmr_pool_push(): Return an FRMR handle to the pool
- ib_frmr_pools_set_aging_period(): Configure aging period
- ib_frmr_pools_set_pinned(): Set minimum pinned handles per pool
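
The signatures of these functions are not given in this letter, so the
following is only a userspace model of the pop/push flow with
driver-supplied callbacks. Every name in it (toy_frmr_ops, toy_frmr_pop,
toy_frmr_push, toy_drv) is hypothetical and does not describe the real
ib_frmr_pool_ops layout. It shows the general idea: pop reuses a cached
handle when one exists and otherwise asks the driver to create one, and
push returns the handle to the cache for later reuse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_CAP 8

/* Hypothetical driver callbacks, standing in for a driver ops table. */
struct toy_frmr_ops {
	int  (*create)(void *drv, uint32_t *handle);
	void (*destroy)(void *drv, uint32_t handle);
};

struct toy_frmr_pool {
	const struct toy_frmr_ops *ops;
	void    *drv;                  /* driver private data */
	uint32_t cache[TOY_CACHE_CAP]; /* cached, reusable handles */
	size_t   cached;
};

/* Get a handle: reuse a cached one, or grow on demand via the driver. */
static int toy_frmr_pop(struct toy_frmr_pool *p, uint32_t *handle)
{
	assert(p && handle);
	if (p->cached > 0) {
		*handle = p->cache[--p->cached];
		return 0;
	}
	return p->ops->create(p->drv, handle);
}

/* Return a handle; destroy it only if the toy cache has no room left. */
static void toy_frmr_push(struct toy_frmr_pool *p, uint32_t handle)
{
	assert(p);
	if (p->cached < TOY_CACHE_CAP)
		p->cache[p->cached++] = handle;
	else
		p->ops->destroy(p->drv, handle);
}

/* Example driver: hands out increasing handle numbers and counts calls. */
struct toy_drv {
	uint32_t next;
	unsigned int created;
	unsigned int destroyed;
};

static int toy_drv_create(void *drv, uint32_t *handle)
{
	struct toy_drv *d = drv;

	*handle = d->next++;
	d->created++;
	return 0;
}

static void toy_drv_destroy(void *drv, uint32_t handle)
{
	struct toy_drv *d = drv;

	(void)handle;
	d->destroyed++;
}

static const struct toy_frmr_ops toy_drv_ops = {
	.create  = toy_drv_create,
	.destroy = toy_drv_destroy,
};
```

In this model, a pop followed by a push and a second pop calls the
driver's create callback only once; the second pop is served from the
cache, which is the reuse the pooling infrastructure is meant to provide.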

mlx5_ib
=======

The partial control and visibility that debugfs provided, covering only
the 'persistent' cache entries, is replaced by the netlink FRMR API, which
allows showing and setting properties of all available pools.
This series also changes the default behavior the MR cache had for PFs
(Physical Functions) by dropping the pre-allocation of MKEYs that was
costing 100MB of memory per PF and slowing down the loading and
unloading of the driver.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
Changes in v2:
- Fix stack size warning in netlink set_pinned flow.
- Add commit to move async command context init and cleanup out of MR
  cache logic.
- Add enforcement of access flags in set_pinned flow and enforce used
  bits in vendor specific fields to ensure old kernels fail if any
  unknown parameter is passed.
- Add an option to expose kernel-internal pools through netlink.
- Link to v1: https://lore.kernel.org/r/20251116-frmr_pools-v1-0-5eb3c8f5c9c4@nvidia.com

---
Chiara Meiohas (1):
      RDMA/mlx5: Move device async_ctx initialization

Michael Guralnik (10):
      IB/core: Introduce FRMR pools
      RDMA/core: Add aging to FRMR pools
      RDMA/core: Add FRMR pools statistics
      RDMA/core: Add pinned handles to FRMR pools
      RDMA/mlx5: Switch from MR cache to FRMR pools
      net/mlx5: Drop MR cache related code
      RDMA/nldev: Add command to get FRMR pools
      RDMA/core: Add netlink command to modify FRMR aging
      RDMA/nldev: Add command to set pinned FRMR handles
      RDMA/nldev: Expose kernel-internal FRMR pools in netlink

 drivers/infiniband/core/Makefile               |    2 +-
 drivers/infiniband/core/frmr_pools.c           |  557 ++++++++++++
 drivers/infiniband/core/frmr_pools.h           |   63 ++
 drivers/infiniband/core/nldev.c                |  286 ++++++
 drivers/infiniband/hw/mlx5/main.c              |   10 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h           |   86 +-
 drivers/infiniband/hw/mlx5/mr.c                | 1145 ++++--------------------
 drivers/infiniband/hw/mlx5/odp.c               |   19 -
 drivers/infiniband/hw/mlx5/umr.h               |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   67 +-
 include/linux/mlx5/driver.h                    |   11 -
 include/rdma/frmr_pools.h                      |   39 +
 include/rdma/ib_verbs.h                        |    8 +
 include/uapi/rdma/rdma_netlink.h               |   22 +
 14 files changed, 1171 insertions(+), 1145 deletions(-)
---
base-commit: d056bc45b62b5981ebcd18c4303a915490b8ebe9
change-id: 20251116-frmr_pools-f823cc5e8a58

Best regards,
-- 
Edward Srouji <edwards@nvidia.com>



Thread overview: 14+ messages
2025-12-22 12:40 [PATCH rdma-next v2 00/11] RDMA/core: Introduce FRMR pools infrastructure Edward Srouji
2025-12-22 12:40 ` [PATCH rdma-next v2 01/11] RDMA/mlx5: Move device async_ctx initialization Edward Srouji
2025-12-22 12:40 ` [PATCH rdma-next v2 02/11] IB/core: Introduce FRMR pools Edward Srouji
2026-01-20 16:44   ` Jason Gunthorpe
2026-01-26 22:55     ` Michael Gur
2025-12-22 12:40 ` [PATCH rdma-next v2 03/11] RDMA/core: Add aging to " Edward Srouji
2025-12-22 12:40 ` [PATCH rdma-next v2 04/11] RDMA/core: Add FRMR pools statistics Edward Srouji
2025-12-22 12:40 ` [PATCH rdma-next v2 05/11] RDMA/core: Add pinned handles to FRMR pools Edward Srouji
2025-12-22 12:40 ` [PATCH rdma-next v2 06/11] RDMA/mlx5: Switch from MR cache " Edward Srouji
2025-12-22 12:40 ` [PATCH rdma-next v2 07/11] net/mlx5: Drop MR cache related code Edward Srouji
2025-12-22 12:40 ` [PATCH rdma-next v2 08/11] RDMA/nldev: Add command to get FRMR pools Edward Srouji
2025-12-22 12:40 ` [PATCH rdma-next v2 09/11] RDMA/core: Add netlink command to modify FRMR aging Edward Srouji
2025-12-22 12:40 ` [PATCH rdma-next v2 10/11] RDMA/nldev: Add command to set pinned FRMR handles Edward Srouji
2025-12-22 12:40 ` [PATCH rdma-next v2 11/11] RDMA/nldev: Expose kernel-internal FRMR pools in netlink Edward Srouji
