From: Tao Cui <cui.tao@linux.dev>
To: "Michal Koutný" <mkoutny@suse.com>
Cc: cui.tao@linux.dev, tj@kernel.org, hannes@cmpxchg.org,
leon@kernel.org, jgg@ziepe.ca, linux-rdma@vger.kernel.org,
cgroups@vger.kernel.org, Tao Cui <cuitao@kylinos.cn>
Subject: Re: [PATCH rdma-next v2 0/3] cgroup/rdma: add MR memory size resource tracking
Date: Mon, 1 Jun 2026 13:37:48 +0800
Message-ID: <48e538b6-0eb3-463d-ae48-5190a5e196a7@linux.dev>
In-Reply-To: <ahmG_ualxJT5WU_B@localhost.localdomain>

Hi Michal,
Thanks for the review and for the reference.
> IIUC the pinned memory is regular RAM, i.e. it could be controlled
> with memcg as needed. Or is there "physical" limit of what can be
> assigned to a single device?
You are right that the pages associated with an MR are regular system
RAM. However, MR registration does not allocate new pages; it registers
existing pages that are already charged to the allocating process's
memcg.
For that reason, mr_mem is intended to represent a different resource
dimension: not "how much memory does this cgroup own", but "how much
memory may this cgroup register through a given HCA". In other words:
* memcg limits memory ownership/consumption
* mr_mem limits RDMA registration footprint
An administrator may reasonably wish to set different registration
budgets per device (for example, 1G through mlx5_0 and 4G through
mlx5_1) for the same cgroup. memcg has no notion of device-scoped
limits; it only tracks aggregate memory consumption.
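To make the device-scoped part concrete, here is a toy userspace model
in Python. It is not the patch code, and the class and method names are
made up for illustration. Each device gets its own limit and usage
counter, so exhausting the budget on mlx5_0 leaves mlx5_1 untouched:

```python
# Toy userspace model of device-scoped registration budgets.
# All names here are hypothetical; this is not the kernel implementation.

GiB = 1 << 30


class MrMemBudget:
    """Per-device mr_mem limits and usage for one cgroup."""

    def __init__(self, limits):
        self.limits = dict(limits)               # device name -> byte limit
        self.usage = {dev: 0 for dev in limits}  # device name -> bytes charged

    def try_charge(self, device, nbytes):
        """Charge nbytes against one device; return False if over budget."""
        if self.usage[device] + nbytes > self.limits[device]:
            return False
        self.usage[device] += nbytes
        return True

    def uncharge(self, device, nbytes):
        """Release a previous charge on one device."""
        self.usage[device] -= nbytes


budget = MrMemBudget({"mlx5_0": 1 * GiB, "mlx5_1": 4 * GiB})
first = budget.try_charge("mlx5_0", 1 * GiB)   # fits the mlx5_0 budget exactly
second = budget.try_charge("mlx5_0", 4096)     # rejected: mlx5_0 is exhausted
third = budget.try_charge("mlx5_1", 2 * GiB)   # accepted: separate budget
```

With only an aggregate counter there is no way to express "reject on
mlx5_0 but accept on mlx5_1" for the same cgroup.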
This distinction is important because memory ownership and DMA
registration are not necessarily constrained by the same policy. A
tenant may remain within its memcg limit while still consuming a large
portion of a particular HCA's registration capacity. The existing RDMA
controller already provides a per-device resource control framework,
and mr_mem extends that model to cover memory registration footprint.
> Or is there "physical" limit of what can be assigned to a single device?
Yes. Real HCAs have finite resources associated with memory
registration, such as MTT/MPT capacity and related DMA translation
resources. In practice, administrators often need to prevent one tenant
from consuming a disproportionate share of a particular HCA's
registration capacity, even when sufficient system memory remains
available.
It is also worth noting that mr_mem is intentionally not an attempt to
account for the exact number of pinned pages. The accounting model is
tied to MR object
lifetime and tracks registration footprint rather than dynamic physical
page state. For example, ODP MRs may have only a subset of their pages
pinned at any given time, yet still consume registration resources on
the HCA. This is why the proposal focuses on a stable,
policy-oriented registration budget rather than precise memory
ownership accounting.
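The lifetime-based model can be sketched in a few lines. This is a toy
Python model under the assumption that an MR is charged by its
registered length; the names are hypothetical. It only illustrates that
the charge is taken at registration and released at deregistration,
independent of how many pages are pinned in between:

```python
# Toy model: the charge follows the MR object, not the pinned-page count.
# The names and the "charge the full MR length" rule are assumptions
# made for illustration only.

PAGE_SIZE = 4096


class Mr:
    def __init__(self, length, odp=False):
        self.length = length
        # A regular MR pins every page up front; an ODP MR starts with none.
        self.pinned_pages = 0 if odp else length // PAGE_SIZE


class DeviceCounter:
    def __init__(self, limit):
        self.limit = limit
        self.usage = 0

    def reg_mr(self, length, odp=False):
        if self.usage + length > self.limit:
            raise MemoryError("mr_mem budget exceeded")
        self.usage += length      # charged once, at registration
        return Mr(length, odp)

    def dereg_mr(self, mr):
        self.usage -= mr.length   # uncharged once, at deregistration


counter = DeviceCounter(limit=1 << 30)
mr = counter.reg_mr(256 << 20, odp=True)
mr.pinned_pages = 16              # pages fault in; the charge does not move
usage_while_registered = counter.usage
counter.dereg_mr(mr)
```

In this model the ODP MR holds its full 256M charge while only 16 pages
are pinned, and the counter returns to zero once the MR is destroyed.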
> BTW, have a look at [1], it'd be good to converge to similar approach
> (the current proposal allows distinguishing whether charging should
> include or exempt memcg counting).
I've read the related dma-buf accounting work.
My understanding is that those proposals focus on allocations that
create new memory on behalf of a device, which is naturally accounted
through memcg.
RDMA MR registration is different because no new memory is allocated.
The MR object is an in-kernel registration of existing memory that has
already been accounted elsewhere. The resource being limited is
therefore the registration itself rather than the underlying memory
pages.
> Also it seems, that the dmem controller could be a one-stop solution
> for all DMA charges. Please tell me if there are any distinguishing
> factors between RDMA devices' memory and these dmem memory regions.
One distinction is that the current dmem work appears to focus on
memory resources allocated on behalf of a device, whereas mr_mem is
intended to limit host memory registered for DMA through RDMA MRs.
RDMA NICs typically do not have large device-local memory pools;
instead they provide DMA access to host RAM through memory
registration. As a result, the resource being controlled here is not
device memory consumption itself, but the registration footprint
associated with a particular HCA.
Another difference is the accounting model itself. As noted above,
mr_mem accounting is tied to MR object lifetime and tracks registration
footprint rather than precise physical page usage.
My understanding is that dmem is currently integrated with the DRM/TTM
subsystem for device-local memory accounting and has no RDMA
integration today. I have not investigated what would be required
to extend that model to RDMA registration accounting.
That said, I agree that convergence would be desirable if a generic
framework can naturally express per-device DMA registration budgets.
My goal here is not necessarily to require RDMA-specific accounting,
but to address a practical resource-control problem within the existing
RDMA cgroup framework.
Thanks,
Tao