Linux cgroups development
* [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory
@ 2026-05-25  5:55 Tao Cui
From: Tao Cui @ 2026-05-25  5:55 UTC (permalink / raw)
  To: tj, hannes, mkoutny, leon, jgg; +Cc: linux-rdma, cgroups, Tao Cui

Currently the RDMA cgroup only tracks two aggregate counters:
hca_handle and hca_object.  This is too coarse for real-world
deployment: a tenant can exhaust all HCA objects by creating nothing
but QPs, while the administrator has no way to impose separate limits
on QP count, MR count, or the cumulative memory registered through
MRs.

This RFC series adds per-type resource counters for three new
resource types on top of the existing hca_handle / hca_object:

  - qp      - Queue Pair count
  - mr      - Memory Region count
  - mr_mem  - Cumulative MR memory size in bytes

After this series an administrator can set limits like:

    echo "mlx5_0 qp=100 mr=500 mr_mem=1073741824" > rdma.max
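
To make the accepted syntax concrete, here is a minimal user-space sketch
of parsing such a line into 64-bit limits.  It is not the kernel parser:
parse_s64(), parse_limits(), struct limits and enum res_type are
hypothetical names, strtoll() stands in for kstrtoll(), and the literal
"max" value that the existing interface accepts is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space mirror of the three new resource types. */
enum res_type { RES_QP, RES_MR, RES_MR_MEM, RES_COUNT };

static const char *const res_names[RES_COUNT] = { "qp", "mr", "mr_mem" };

struct limits {
	char device[64];
	int64_t max[RES_COUNT];	/* -1 means "not set on this line" */
};

/* Parse a signed 64-bit decimal; strtoll() stands in for kstrtoll(). */
static int parse_s64(const char *s, int64_t *out)
{
	char *end;
	long long v;

	errno = 0;
	v = strtoll(s, &end, 10);
	if (errno == ERANGE || end == s || *end != '\0')
		return -EINVAL;
	*out = v;
	return 0;
}

/* Parse "<device> key=value ..." into *lim.  Returns 0 or -EINVAL. */
static int parse_limits(const char *line, struct limits *lim)
{
	char buf[256];
	char *tok;
	int i;

	assert(line != NULL && lim != NULL);
	if (strlen(line) >= sizeof(buf))
		return -EINVAL;
	strcpy(buf, line);

	/* First token is the device name. */
	tok = strtok(buf, " ");
	if (!tok || strlen(tok) >= sizeof(lim->device))
		return -EINVAL;
	strcpy(lim->device, tok);
	for (i = 0; i < RES_COUNT; i++)
		lim->max[i] = -1;

	/* Remaining tokens are key=value pairs. */
	while ((tok = strtok(NULL, " ")) != NULL) {
		char *eq = strchr(tok, '=');
		int64_t v;

		if (!eq)
			return -EINVAL;
		*eq = '\0';
		for (i = 0; i < RES_COUNT; i++)
			if (strcmp(tok, res_names[i]) == 0)
				break;
		/* Reject unknown keys, malformed numbers and negatives. */
		if (i == RES_COUNT || parse_s64(eq + 1, &v) || v < 0)
			return -EINVAL;
		lim->max[i] = v;
	}
	return 0;
}
```

With the line above, parse_limits() would fill in qp=100, mr=500 and
mr_mem=1073741824; a value that does not fit in 64 bits is rejected
instead of being silently truncated, which is the reason for moving from
an int parser to an s64 one.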

Design decisions that I would appreciate feedback on:

  1. Dual charging: the existing hca_object charge is retained for
     QP and MR objects.  The per-type counter is charged in addition.
     This keeps backward compatibility - existing deployments that rely
     on hca_object limits continue to work.  An alternative would be
     to replace hca_object with per-type counters entirely, but that
     breaks the ABI.

  2. MR memory is byte-based: unlike QP/MR which are simple counts,
     mr_mem tracks the actual length parameter passed at MR
     registration time (both ioctl and legacy verbs paths).  This
     required changing the internal accounting from int to s64.  The
     match_int parser is replaced with a match_s64 helper using
     kstrtoll.

  3. Charging point for mr_mem: the byte charge happens after the
     MR is created but before the uobject is finalized, so that the
     error path can deregister the MR cleanly.  The charged byte count
     is stored in uobj->rdmacg_mr_mem_bytes so that the generic
     destroy / abort paths can uncharge without knowing the MR length.

  4. Overflow protection: the s64 addition in rdmacg_try_charge()
     checks for both overflow (new < old) and limit exceedance.
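
To illustrate points 1, 3 and 4 together, here is a simplified
user-space model of the charging flow.  It is a sketch under stated
assumptions, not the code in the patches: struct pool, struct uobject,
try_charge(), uncharge(), charge_object(), charge_mr_mem() and
destroy_mr() are hypothetical names, the charge order and the error
codes are illustrative, and there is no hierarchy walk or locking.  The
overflow test is done before the addition because signed overflow is
undefined in standard C, so the (new < old) form is not portable outside
the kernel build.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical resource indices; hca_handle is omitted for brevity. */
enum res_type { RES_HCA_OBJECT, RES_QP, RES_MR, RES_MR_MEM, RES_COUNT };

struct pool {
	int64_t usage[RES_COUNT];
	int64_t max[RES_COUNT];
};

/* Stand-in for the uobject field that remembers the charged bytes. */
struct uobject {
	int64_t mr_mem_bytes;
};

/* Charge 'amount' units, rejecting both overflow and limit exceedance. */
static int try_charge(struct pool *p, enum res_type t, int64_t amount)
{
	int64_t new_usage;

	if (amount <= 0)
		return -EINVAL;
	/* Overflow check done before adding (portable C). */
	if (p->usage[t] > INT64_MAX - amount)
		return -EAGAIN;
	new_usage = p->usage[t] + amount;
	if (new_usage > p->max[t])
		return -EAGAIN;
	p->usage[t] = new_usage;
	return 0;
}

static void uncharge(struct pool *p, enum res_type t, int64_t amount)
{
	/* Uncharging more than was charged would be an accounting bug. */
	assert(amount > 0 && p->usage[t] >= amount);
	p->usage[t] -= amount;
}

/* Dual charging: hca_object plus the per-type counter, with rollback. */
static int charge_object(struct pool *p, enum res_type t)
{
	int ret = try_charge(p, RES_HCA_OBJECT, 1);

	if (ret)
		return ret;
	ret = try_charge(p, t, 1);
	if (ret)
		uncharge(p, RES_HCA_OBJECT, 1);
	return ret;
}

/* Byte charge for an MR; the amount is recorded for later uncharge. */
static int charge_mr_mem(struct pool *p, struct uobject *uobj, int64_t length)
{
	int ret = try_charge(p, RES_MR_MEM, length);

	if (!ret)
		uobj->mr_mem_bytes = length;
	return ret;
}

/* Generic destroy path: needs only the stored byte count, not the MR. */
static void destroy_mr(struct pool *p, struct uobject *uobj)
{
	if (uobj->mr_mem_bytes) {
		uncharge(p, RES_MR_MEM, uobj->mr_mem_bytes);
		uobj->mr_mem_bytes = 0;
	}
	uncharge(p, RES_MR, 1);
	uncharge(p, RES_HCA_OBJECT, 1);
}
```

In this model a QP that hits the qp limit leaves hca_object unchanged
because the first charge is rolled back, and destroy_mr() can release
the byte charge using only the value saved at charge time.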

Open questions:

  - Should hca_object be deprecated in favor of the per-type counters,
    or should we keep dual charging indefinitely?

  - The mr_mem counter tracks the length requested by the user, not
    the pages actually pinned.  A process that registers a large MR but
    only touches a subset still consumes the full quota.  Is this the
    right semantic, or should we track the number of pinned pages
    instead?

This is marked RFC because a cgroup ABI change (new resource types)
is hard to revert once merged, and I want to make sure the above
design choices are aligned with the maintainers' expectations before
proceeding to a formal submission.

Tao Cui (5):
  cgroup/rdma: extend charge/uncharge API with s64 amount parameter
  cgroup/rdma: add QP per-type resource counting
  cgroup/rdma: add MR per-type resource counting
  cgroup/rdma: add MR memory size per-type resource counting
  cgroup/rdma: update cgroup resource list for QP, MR and MR_MEM

 Documentation/admin-guide/cgroup-v2.rst       |  19 ++-
 drivers/infiniband/core/cgroup.c              |  10 +-
 drivers/infiniband/core/core_priv.h           |  12 +-
 drivers/infiniband/core/rdma_core.c           |  48 +++++-
 drivers/infiniband/core/uverbs_cmd.c          |  16 +-
 drivers/infiniband/core/uverbs_std_types_mr.c |  32 ++++
 include/linux/cgroup_rdma.h                   |  10 +-
 include/rdma/ib_verbs.h                       |   2 +
 kernel/cgroup/rdma.c                          | 151 ++++++++++++++----
 9 files changed, 243 insertions(+), 57 deletions(-)

-- 
2.43.0


Thread overview: 11+ messages
2026-05-25  5:55 [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory Tao Cui
2026-05-25  5:55 ` [RFC PATCH rdma-next 1/5] cgroup/rdma: extend charge/uncharge API with s64 amount parameter Tao Cui
2026-05-25  5:55 ` [RFC PATCH rdma-next 2/5] cgroup/rdma: add QP per-type resource counting Tao Cui
2026-05-25  5:55 ` [RFC PATCH rdma-next 3/5] cgroup/rdma: add MR " Tao Cui
2026-05-25  5:55 ` [RFC PATCH rdma-next 4/5] cgroup/rdma: add MR memory size " Tao Cui
2026-05-25  5:55 ` [RFC PATCH rdma-next 5/5] cgroup/rdma: update cgroup resource list for QP, MR and MR_MEM Tao Cui
2026-05-25 13:43 ` [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory Jason Gunthorpe
2026-05-27 11:28   ` Tao Cui
2026-05-27 13:34     ` Jason Gunthorpe
2026-05-28  7:55       ` Tao Cui
2026-05-28 13:06         ` Jason Gunthorpe
