From: Tao Cui <cui.tao@linux.dev>
To: tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com,
leon@kernel.org, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, cgroups@vger.kernel.org,
Tao Cui <cuitao@kylinos.cn>
Subject: [PATCH rdma-next v2 0/3] cgroup/rdma: add MR memory size resource tracking
Date: Fri, 29 May 2026 17:07:30 +0800
Message-ID: <20260529090733.2242822-1-cui.tao@linux.dev>
From: Tao Cui <cuitao@kylinos.cn>
Currently the RDMA cgroup tracks only two aggregate counters:
hca_handle and hca_object. In multi-tenant deployments, however, the
truly scarce resource is pinned memory: how much physical memory gets
registered through MRs. The existing hca_object counter is too coarse
to capture this, since it counts objects regardless of their size.
This series adds a single new resource type:
- mr_mem - Cumulative MR memory size in bytes
The per-object-type counters (qp, mr) from RFC v1 have been removed
per review feedback [1]. Modern NICs allocate these objects from the
same memory pool, so the distinction between QP count and MR count is
not meaningful for resource limiting. hca_object remains sufficient for
coarse object accounting.
After this series, an administrator can set limits like:
echo "mlx5_0 mr_mem=1073741824" > rdma.max
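mr_mem is a byte quantity, not an object count, so each charge has to carry an amount. The first patch extends the charge/uncharge API with an s64 amount parameter for exactly this reason. The sketch below is a minimal userspace model, not the kernel code. It assumes the limit check walks from the charging cgroup up to the root and rolls back on failure, as the existing count-based resources do. struct cg, enum res_type, model_try_charge() and model_uncharge() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one rdma cgroup node for a single device;
 * the real per-device pools in kernel/cgroup/rdma.c look different. */
enum res_type { RES_HCA_HANDLE, RES_HCA_OBJECT, RES_MR_MEM, RES_MAX };

struct cg {
	struct cg *parent;      /* NULL for the root cgroup */
	int64_t usage[RES_MAX];
	int64_t max[RES_MAX];   /* INT64_MAX stands for "max" (no limit) */
};

/* Charge @amount units of @res (bytes for RES_MR_MEM) against @cg and
 * every ancestor. On failure, no level is left charged. */
static bool model_try_charge(struct cg *cg, enum res_type res, int64_t amount)
{
	struct cg *p, *q;

	for (p = cg; p; p = p->parent) {
		/* usage <= max always holds, so this cannot overflow */
		if (amount > p->max[res] - p->usage[res])
			goto rollback;
		p->usage[res] += amount;
	}
	return true;

rollback:
	/* undo the levels below the one that hit its limit */
	for (q = cg; q != p; q = q->parent)
		q->usage[res] -= amount;
	return false;
}

/* Release a previous charge of @amount from @cg and all ancestors. */
static void model_uncharge(struct cg *cg, enum res_type res, int64_t amount)
{
	for (struct cg *p = cg; p; p = p->parent) {
		assert(p->usage[res] >= amount); /* catch unbalanced uncharge */
		p->usage[res] -= amount;
	}
}
```

With the 1 GiB limit from the example above, two 512 MiB registrations succeed and a third registration of any size fails. A tighter limit on an ancestor also rejects the charge, and in that case the descendant's usage is rolled back.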
Design
~~~~~~
mr_mem is not page-level ownership tracking; it is object-based
accounting tied to the MR lifetime:
- charged at MR registration time
- uncharged at MR destruction time
- the charge is pinned to the cgroup that created the MR for the
entire lifetime of the MR object
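In practice, the lifetime rules above mean two things. At registration, the MR object records which cgroup it charged. At destruction, it uncharges that same cgroup. The sketch below is a minimal userspace illustration, not the series' code. It assumes a flat (non-hierarchical) counter for brevity, and struct task, struct mr, model_reg_mr() and model_dereg_mr() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flat model: one mr_mem counter per cgroup. */
struct cg {
	int64_t mr_mem_usage;
	int64_t mr_mem_max;
};

struct task {
	struct cg *cgroup;     /* may change if the task is migrated */
};

struct mr {
	struct cg *charged_cg; /* cgroup charged at registration time */
	int64_t length;        /* bytes charged for this MR */
};

/* Register: charge the invoking task's current cgroup and remember it. */
static struct mr *model_reg_mr(struct task *t, int64_t length)
{
	struct cg *cg = t->cgroup;
	struct mr *mr;

	if (length > cg->mr_mem_max - cg->mr_mem_usage)
		return NULL;   /* would exceed the mr_mem limit */
	mr = malloc(sizeof(*mr));
	if (!mr)
		return NULL;
	cg->mr_mem_usage += length;
	mr->charged_cg = cg;
	mr->length = length;
	return mr;
}

/* Destroy: uncharge the cgroup recorded in the MR, not the caller's. */
static void model_dereg_mr(struct mr *mr)
{
	assert(mr->charged_cg->mr_mem_usage >= mr->length);
	mr->charged_cg->mr_mem_usage -= mr->length;
	free(mr);
}
```

Because the MR keeps its own cgroup pointer, the uncharge still lands on the original cgroup if the task moves to another cgroup between registration and destruction. This is the cgroup migration behaviour described for hca_object in item 2 below.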
This model intentionally defines accounting semantics around MR
object lifetime rather than page ownership:
1. fork(): fork() does not duplicate MR objects. Even though the
child inherits the uverbs fd and can access the parent's ucontext,
the MR remains a single kernel object. The charge is tied to the
MR object, not to the number of processes that can reach it, so
no splitting or re-accounting is needed.
2. Cgroup migration: mr_mem follows the same semantics as the existing
hca_object. It is charged at creation time against the invoking
task's cgroup and uncharged at destruction time. The RDMA cgroup does
not implement can_attach/attach callbacks today, so charges do not
migrate with the task. This is a known limitation that applies
equally to hca_handle and hca_object; mr_mem does not introduce any
new complication here.
3. Overlap with memory cgroup: mr_mem does not count process memory
usage. It represents a per-device DMA registration budget, i.e. the
amount of memory this cgroup may register through a given HCA. This
is a different dimension from what memory cgroup tracks. For example,
an administrator might set different mr_mem limits per device, which
memory cgroup cannot express.
In particular, mr_mem tracks the registered memory range associated
with the MR rather than the exact set of dynamically pinned pages
(e.g. for ODP MRs). This is a stable, policy-oriented approximation of
the registration footprint, not an attempt at precise physical page
accounting.
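The range-based approximation for ODP MRs can be illustrated with the minimal sketch below. It assumes the mr_mem charge equals the MR's registered length in bytes, and struct odp_mr, mr_mem_charge() and pinned_bytes() are hypothetical names. The second helper exists only to contrast with page-level accounting, which this series deliberately does not do.

```c
#include <stdint.h>

/* Hypothetical model of an ODP MR: the registered range is fixed at
 * registration, while the set of faulted-in pages varies at runtime. */
struct odp_mr {
	uint64_t iova;
	uint64_t length;        /* registered range in bytes */
	uint64_t faulted_pages; /* pages currently mapped; changes over time */
};

/* mr_mem charge (assumed): derived from the registered range only, so
 * it is fixed for the MR's lifetime and ignores faulted_pages. */
static int64_t mr_mem_charge(const struct odp_mr *mr)
{
	return (int64_t)mr->length;
}

/* For contrast only: what exact page-level accounting would report. */
static uint64_t pinned_bytes(const struct odp_mr *mr, uint64_t page_size)
{
	return mr->faulted_pages * page_size;
}
```

Faulting pages into or out of an ODP MR changes what page-level accounting would report. It leaves the mr_mem charge untouched, which is what keeps the charge stable enough to base a limit on.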
Tao Cui (3):
cgroup/rdma: extend charge/uncharge API with s64 amount parameter
cgroup/rdma: add MR memory size resource tracking
cgroup/rdma: update cgroup resource list for MR_MEM
Documentation/admin-guide/cgroup-v2.rst | 21 ++--
drivers/infiniband/core/cgroup.c | 10 +-
drivers/infiniband/core/core_priv.h | 12 +-
drivers/infiniband/core/rdma_core.c | 20 +++-
drivers/infiniband/core/uverbs_cmd.c | 61 +++++++++-
drivers/infiniband/core/uverbs_std_types_mr.c | 37 ++++++
include/linux/cgroup_rdma.h | 8 +-
include/rdma/ib_verbs.h | 1 +
kernel/cgroup/rdma.c | 108 +++++++++++++-----
9 files changed, 219 insertions(+), 59 deletions(-)
---
Changes from RFC v1:
- Removed RDMACG_RESOURCE_QP and RDMACG_RESOURCE_MR per-type
counters following review feedback from Jason Gunthorpe [1].
- Retained only RDMACG_RESOURCE_MR_MEM as the sole new resource.
- Added detailed semantic notes to the commit messages addressing
fork(), cgroup migration, and overlap with memory cgroup [2].
- Renamed patches to reflect the narrower scope.
[1] https://lore.kernel.org/all/20260525134314.GI7702@ziepe.ca/
[2] https://lore.kernel.org/all/20260528075537.2170697-1-cuitao@kylinos.cn/
--
2.43.0