From: Tao Cui <cui.tao@linux.dev>
To: tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com,
	leon@kernel.org, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, cgroups@vger.kernel.org,
	Tao Cui <cuitao@kylinos.cn>
Subject: [PATCH rdma-next v2 2/3] cgroup/rdma: add MR memory size resource tracking
Date: Fri, 29 May 2026 17:07:32 +0800
Message-ID: <20260529090733.2242822-3-cui.tao@linux.dev>
In-Reply-To: <20260529090733.2242822-1-cui.tao@linux.dev>

From: Tao Cui <cuitao@kylinos.cn>

Add RDMACG_RESOURCE_MR_MEM so that the cumulative memory size of
registered Memory Regions can be tracked and limited independently
from the aggregate hca_object counter.

Unlike count-based resources (hca_handle, hca_object), which are
charged in the generic IDR allocation path, MR_MEM is byte-based
and must be charged after the MR length is known. Charge in the
uverbs MR registration handlers (ioctl and legacy), and uncharge
in the generic destroy paths (alloc_abort_idr_uobject,
destroy_hw_idr_uobject).

Store the charged byte count in uobj->rdmacg_mr_mem_bytes so that
the destroy path knows how much to uncharge.
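As an illustration only, the charge-at-registration and uncharge-at-destroy
flow described above can be modelled in plain userspace C. This is a minimal
sketch, not the kernel code: struct toy_pool, struct toy_uobj and the toy_*
helpers are hypothetical stand-ins, and returning -EAGAIN when the limit is
exceeded is an assumption of the sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for one cgroup's mr_mem budget on one device. */
struct toy_pool {
	int64_t usage;	/* bytes currently charged */
	int64_t max;	/* configured limit in bytes */
};

/* Hypothetical stand-in for the uobject that remembers its own charge. */
struct toy_uobj {
	struct toy_pool *pool;
	int64_t mr_mem_bytes;	/* 0 means nothing was charged */
};

/* Charge bytes against the pool, or fail without changing it. */
static int toy_try_charge(struct toy_pool *pool, int64_t bytes)
{
	if (bytes < 0 || bytes > pool->max - pool->usage)
		return -EAGAIN;	/* assumed error code for "over limit" */
	pool->usage += bytes;
	return 0;
}

/* Registration: charge once the length is known and record the amount. */
static int toy_register(struct toy_uobj *uobj, struct toy_pool *pool,
			int64_t length)
{
	int ret;

	uobj->pool = pool;
	uobj->mr_mem_bytes = 0;
	if (length) {
		ret = toy_try_charge(pool, length);
		if (ret)
			return ret;
		uobj->mr_mem_bytes = length;
	}
	return 0;
}

/* Destruction: return exactly what was recorded at registration time. */
static void toy_destroy(struct toy_uobj *uobj)
{
	if (uobj->mr_mem_bytes) {
		uobj->pool->usage -= uobj->mr_mem_bytes;
		uobj->mr_mem_bytes = 0;
	}
}
```

Because the object carries its own byte count, the destroy side needs no
knowledge of the MR length that was used at registration time.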

Semantic notes
~~~~~~~~~~~~~~

mr_mem is not page-level ownership tracking - it is object-based
accounting tied to the MR lifetime:

  - charged at MR registration time
  - uncharged at MR destruction time
  - the charge lives with the MR's creating cgroup for the entire
    lifetime of the MR object

This model intentionally defines accounting semantics around MR
object lifetime rather than page ownership:

  1. fork(): fork() does not duplicate MR objects. Even though the
     child inherits the uverbs fd and can access the parent's ucontext,
     the MR remains a single kernel object. The charge is tied to the
     MR object, not to the number of processes that can reach it, so
     no splitting or re-accounting is needed.

  2. Cgroup migration: mr_mem follows the same semantics as the existing
     hca_object - charge at creation time against the invoking task's
     cgroup, uncharge at destruction time. The RDMA cgroup does not
     implement can_attach/attach callbacks today, so charges do not
     migrate with the task. This is a known limitation that applies
     equally to hca_handle and hca_object. mr_mem does not introduce
     any new complication here.

  3. Overlap with memory cgroup: mr_mem does not count process memory
     usage - it represents a per-device DMA registration budget: how
     much memory this cgroup can register through a given HCA. This is
     a different dimension from what memory cgroup tracks. An
     administrator might set mr_mem limits differently per device, which
     memory cgroup cannot express.

In particular, mr_mem tracks the registered memory range associated
with the MR rather than exact dynamically pinned pages (e.g. for
ODP MRs). This is a stable, policy-oriented approximation of
registration footprint - not an attempt at precise physical page
accounting.

Guard against u64-to-s64 overflow by rejecting MR lengths that
exceed S64_MAX at each registration site.
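The overflow guard can be sketched in userspace C as follows. This is an
illustrative sketch only: toy_length_to_charge is a hypothetical helper, and
INT64_MAX from <stdint.h> stands in for the kernel's S64_MAX.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Convert a user-supplied u64 length into a signed charge amount.
 * Lengths above INT64_MAX would turn negative when cast, so they are
 * rejected up front. The helper name is hypothetical.
 */
static int toy_length_to_charge(uint64_t length, int64_t *amount)
{
	if (length > (uint64_t)INT64_MAX)
		return -EINVAL;
	*amount = (int64_t)length;
	return 0;
}
```

The point of checking before the cast is that a negative amount must never
reach the charge path, where it would read as an uncharge.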

Handle MR reregistration (IB_USER_VERBS_CMD_REREG_MR with
IB_MR_REREG_TRANS) by computing the delta between old and new
lengths and charging or uncharging the difference. When the driver
creates a new HW object (new_mr != NULL), the full new length is
charged to the new uobj and the old uobj's mr_mem is released
through the existing rdma_assign_uobject -> destroy_hw_idr_uobject
-> rdmacg_uncharge_uobj path.
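For the in-place case, the delta accounting can be sketched in userspace C as
below. It is a toy model under stated assumptions: toy_rereg_in_place and its
parameters are hypothetical, the pool is reduced to a usage/max pair, and
-EAGAIN for an exceeded limit is an assumption of the sketch.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Toy in-place reregistration: adjust *usage by the difference between
 * the new length and the byte count charged so far (*charged).
 * On failure neither *usage nor *charged is modified.
 */
static int toy_rereg_in_place(int64_t *usage, int64_t max,
			      int64_t *charged, uint64_t new_length)
{
	int64_t delta;

	if (new_length > (uint64_t)INT64_MAX)
		return -EINVAL;

	delta = (int64_t)new_length - *charged;
	if (delta > 0) {
		/* Growing: only the extra bytes must fit under the limit. */
		if (delta > max - *usage)
			return -EAGAIN;
		*usage += delta;
	} else if (delta < 0) {
		/* Shrinking: give the difference back; this cannot fail. */
		*usage += delta;
	}
	*charged = (int64_t)new_length;
	return 0;
}
```

Charging only the delta avoids a transient double charge of the old and new
lengths, so an MR can grow inside a nearly exhausted budget.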

Enable MR memory limits:

  echo "mlx5_0 mr_mem=1073741824" > rdma.max

Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
 drivers/infiniband/core/rdma_core.c           | 14 ++++-
 drivers/infiniband/core/uverbs_cmd.c          | 57 +++++++++++++++++++
 drivers/infiniband/core/uverbs_std_types_mr.c | 37 ++++++++++++
 include/linux/cgroup_rdma.h                   |  1 +
 include/rdma/ib_verbs.h                       |  1 +
 kernel/cgroup/rdma.c                          | 21 ++++++-
 6 files changed, 126 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 3268285b5478..a540cef6bb67 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -523,10 +523,19 @@ struct ib_uobject *rdma_alloc_begin_uobject(const struct uverbs_api_object *obj,
 	return ret;
 }
 
-static void alloc_abort_idr_uobject(struct ib_uobject *uobj)
+static void rdmacg_uncharge_uobj(struct ib_uobject *uobj)
 {
 	ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
 			   RDMACG_RESOURCE_HCA_OBJECT, 1);
+	if (uobj->rdmacg_mr_mem_bytes)
+		ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
+				   RDMACG_RESOURCE_MR_MEM,
+				   uobj->rdmacg_mr_mem_bytes);
+}
+
+static void alloc_abort_idr_uobject(struct ib_uobject *uobj)
+{
+	rdmacg_uncharge_uobj(uobj);
 	xa_erase(&uobj->ufile->idr, uobj->id);
 }
 
@@ -546,8 +555,7 @@ static int __must_check destroy_hw_idr_uobject(struct ib_uobject *uobj,
 	if (why == RDMA_REMOVE_ABORT)
 		return 0;
 
-	ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
-			   RDMACG_RESOURCE_HCA_OBJECT, 1);
+	rdmacg_uncharge_uobj(uobj);
 	return 0;
 }
 
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 9540ac180711..901de117c808 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -752,6 +752,17 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 
 	uobj->object = mr;
 	uobj_put_obj_read(pd);
+
+	if (cmd.length > S64_MAX)
+		goto err_free;
+	if (cmd.length) {
+		ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
+					   RDMACG_RESOURCE_MR_MEM, cmd.length);
+		if (ret)
+			goto err_dereg;
+		uobj->rdmacg_mr_mem_bytes = cmd.length;
+	}
+
 	uobj_finalize_uobj_create(uobj, attrs);
 
 	resp.lkey = mr->lkey;
@@ -759,6 +770,8 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	resp.mr_handle = uobj->id;
 	return uverbs_response(attrs, &resp, sizeof(resp));
 
+err_dereg:
+	ib_dereg_mr_user(mr, &attrs->driver_udata);
 err_put:
 	uobj_put_obj_read(pd);
 err_free:
@@ -854,6 +867,20 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
 		rdma_restrack_set_name(&new_mr->res, NULL);
 		rdma_restrack_add(&new_mr->res);
 
+		if ((cmd.flags & IB_MR_REREG_TRANS) && cmd.length) {
+			if (cmd.length > S64_MAX) {
+				ret = -EINVAL;
+				goto err_rereg_new_mr;
+			}
+			ret = ib_rdmacg_try_charge(&new_uobj->cg_obj,
+						   new_uobj->context->device,
+						   RDMACG_RESOURCE_MR_MEM,
+						   cmd.length);
+			if (ret)
+				goto err_rereg_new_mr;
+			new_uobj->rdmacg_mr_mem_bytes = cmd.length;
+		}
+
 		/*
 		 * The new uobj for the new HW object is put into the same spot
 		 * in the IDR and the old uobj & HW object is deleted.
@@ -871,6 +898,31 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
 			atomic_inc(&new_pd->usecnt);
 		}
 		if (cmd.flags & IB_MR_REREG_TRANS) {
+			s64 delta;
+
+			if (cmd.length > S64_MAX) {
+				ret = -EINVAL;
+				goto put_new_uobj;
+			}
+			delta = (s64)cmd.length -
+				(s64)uobj->rdmacg_mr_mem_bytes;
+
+			if (delta > 0) {
+				ret = ib_rdmacg_try_charge(
+					&uobj->cg_obj,
+					uobj->context->device,
+					RDMACG_RESOURCE_MR_MEM,
+					delta);
+				if (ret)
+					goto put_new_uobj;
+			} else if (delta < 0) {
+				ib_rdmacg_uncharge(
+					&uobj->cg_obj,
+					uobj->context->device,
+					RDMACG_RESOURCE_MR_MEM,
+					-delta);
+			}
+			uobj->rdmacg_mr_mem_bytes = cmd.length;
 			mr->iova = cmd.hca_va;
 			mr->length = cmd.length;
 		}
@@ -887,6 +939,11 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
 put_new_uobj:
 	if (new_uobj)
 		uobj_alloc_abort(new_uobj, attrs);
+err_rereg_new_mr:
+	if (new_uobj) {
+		rdma_alloc_abort_uobject(new_uobj, attrs, true);
+		new_uobj = NULL;
+	}
 put_uobj_pd:
 	if (cmd.flags & IB_MR_REREG_PD)
 		uobj_put_obj_read(new_pd);
diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index 570b9656801d..3989ff2d282b 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -32,6 +32,7 @@
  */
 
 #include "rdma_core.h"
+#include "core_priv.h"
 #include "uverbs.h"
 #include <rdma/uverbs_std_types.h>
 #include "restrack.h"
@@ -140,6 +141,18 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
 	rdma_restrack_set_name(&mr->res, NULL);
 	rdma_restrack_add(&mr->res);
 	uobj->object = mr;
+	if (attr.length > S64_MAX)
+		return -EINVAL;
+
+	if (attr.length) {
+		ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
+					   RDMACG_RESOURCE_MR_MEM, attr.length);
+		if (ret) {
+			ib_dereg_mr_user(mr, &attrs->driver_udata);
+			return ret;
+		}
+		uobj->rdmacg_mr_mem_bytes = attr.length;
+	}
 
 	uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_DM_MR_HANDLE);
 
@@ -254,6 +267,18 @@ static int UVERBS_HANDLER(UVERBS_METHOD_REG_DMABUF_MR)(
 	rdma_restrack_add(&mr->res);
 	uobj->object = mr;
 
+	if (length > S64_MAX)
+		return -EINVAL;
+	if (length) {
+		ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
+					   RDMACG_RESOURCE_MR_MEM, length);
+		if (ret) {
+			ib_dereg_mr_user(mr, &attrs->driver_udata);
+			return ret;
+		}
+		uobj->rdmacg_mr_mem_bytes = length;
+	}
+
 	uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
 
 	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
@@ -383,6 +408,18 @@ static int UVERBS_HANDLER(UVERBS_METHOD_REG_MR)(
 	rdma_restrack_add(&mr->res);
 	uobj->object = mr;
 
+	if (length > S64_MAX)
+		return -EINVAL;
+	if (length) {
+		ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
+					   RDMACG_RESOURCE_MR_MEM, length);
+		if (ret) {
+			ib_dereg_mr_user(mr, &attrs->driver_udata);
+			return ret;
+		}
+		uobj->rdmacg_mr_mem_bytes = length;
+	}
+
 	uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_MR_HANDLE);
 
 	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_MR_RESP_LKEY,
diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
index 7146cefa95a6..2c8fb1ebb1a9 100644
--- a/include/linux/cgroup_rdma.h
+++ b/include/linux/cgroup_rdma.h
@@ -12,6 +12,7 @@
 enum rdmacg_resource_type {
 	RDMACG_RESOURCE_HCA_HANDLE,
 	RDMACG_RESOURCE_HCA_OBJECT,
+	RDMACG_RESOURCE_MR_MEM,
 	RDMACG_RESOURCE_MAX,
 };
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9dd76f489a0b..c7dcd5d085fb 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1569,6 +1569,7 @@ struct ib_uobject {
 	void		       *object;		/* containing object */
 	struct list_head	list;		/* link to context's list */
 	struct ib_rdmacg_object	cg_obj;		/* rdmacg object */
+	s64			rdmacg_mr_mem_bytes; /* charged MR memory size */
 	int			id;		/* index into kernel idr */
 	struct kref		ref;
 	atomic_t		usecnt;		/* protects exclusive access */
diff --git a/kernel/cgroup/rdma.c b/kernel/cgroup/rdma.c
index 519f7f537223..ebfc5721c098 100644
--- a/kernel/cgroup/rdma.c
+++ b/kernel/cgroup/rdma.c
@@ -23,14 +23,18 @@ enum rdmacg_limit_tokens {
 	RDMACG_HCA_HANDLE_MAX,
 	RDMACG_HCA_OBJECT_VAL,
 	RDMACG_HCA_OBJECT_MAX,
+	RDMACG_MR_MEM_VAL,
+	RDMACG_MR_MEM_MAX,
 	NR_RDMACG_LIMIT_TOKENS,
 };
 
 static const match_table_t rdmacg_limit_tokens = {
-	{ RDMACG_HCA_HANDLE_VAL, "hca_handle=%d" },
+	{ RDMACG_HCA_HANDLE_VAL, "hca_handle=%d" },
 	{ RDMACG_HCA_HANDLE_MAX, "hca_handle=max" },
-	{ RDMACG_HCA_OBJECT_VAL, "hca_object=%d" },
+	{ RDMACG_HCA_OBJECT_VAL, "hca_object=%d" },
 	{ RDMACG_HCA_OBJECT_MAX, "hca_object=max" },
+	{ RDMACG_MR_MEM_VAL, "mr_mem=%d" },
+	{ RDMACG_MR_MEM_MAX, "mr_mem=max" },
 	{ NR_RDMACG_LIMIT_TOKENS, NULL },
 };
 
@@ -55,6 +59,7 @@ enum rdmacg_file_type {
 static char const *rdmacg_resource_names[] = {
 	[RDMACG_RESOURCE_HCA_HANDLE]	= "hca_handle",
 	[RDMACG_RESOURCE_HCA_OBJECT]	= "hca_object",
+	[RDMACG_RESOURCE_MR_MEM]	= "mr_mem",
 };
 
 /* resource tracker for each resource of rdma cgroup */
@@ -566,6 +571,18 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
 			new_limits[RDMACG_RESOURCE_HCA_OBJECT] = S64_MAX;
 			enables |= BIT(RDMACG_RESOURCE_HCA_OBJECT);
 			break;
+		case RDMACG_MR_MEM_VAL:
+			if (match_s64(&args[0], &intval)) {
+				ret = -EINVAL;
+				goto parse_err;
+			}
+			new_limits[RDMACG_RESOURCE_MR_MEM] = intval;
+			enables |= BIT(RDMACG_RESOURCE_MR_MEM);
+			break;
+		case RDMACG_MR_MEM_MAX:
+			new_limits[RDMACG_RESOURCE_MR_MEM] = S64_MAX;
+			enables |= BIT(RDMACG_RESOURCE_MR_MEM);
+			break;
 		default:
 			ret = -EINVAL;
 			goto parse_err;
--
2.43.0