From: Tao Cui <cui.tao@linux.dev>
To: tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com,
leon@kernel.org, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, cgroups@vger.kernel.org,
Tao Cui <cuitao@kylinos.cn>
Subject: [PATCH rdma-next v2 2/3] cgroup/rdma: add MR memory size resource tracking
Date: Fri, 29 May 2026 17:07:32 +0800 [thread overview]
Message-ID: <20260529090733.2242822-3-cui.tao@linux.dev> (raw)
In-Reply-To: <20260529090733.2242822-1-cui.tao@linux.dev>
From: Tao Cui <cuitao@kylinos.cn>
Add RDMACG_RESOURCE_MR_MEM so that the cumulative memory size of
registered Memory Regions can be tracked and limited independently
from the aggregate hca_object counter.
Unlike count-based resources (hca_handle, hca_object) which are
charged in the generic IDR allocation path, MR_MEM is byte-based
and must be charged after the MR length is known. Charge in the
uverbs MR registration handlers (ioctl and legacy), and uncharge
in the generic destroy paths (alloc_abort_idr_uobject,
destroy_hw_idr_uobject).
Store the charged byte count in uobj->rdmacg_mr_mem_bytes so that
the destroy path knows how much to uncharge.
Semantic notes
~~~~~~~~~~~~~~
mr_mem is not page-level ownership tracking - it is object-based
accounting tied to the MR lifetime:
- charged at MR registration time
- uncharged at MR destruction time
- the charge lives with the MR's creating cgroup for the entire
lifetime of the MR object
This model intentionally defines accounting semantics around MR
object lifetime rather than page ownership:
1. fork(): fork() does not duplicate MR objects. Even though the
child inherits the uverbs fd and can access the parent's ucontext,
the MR remains a single kernel object. The charge is tied to the
MR object, not to the number of processes that can reach it, so
no splitting or re-accounting is needed.
2. Cgroup migration: mr_mem follows the same semantics as the existing
hca_object - charge at creation time against the invoking task's
cgroup, uncharge at destruction time. The RDMA cgroup does not
implement can_attach/attach callbacks today, so charges do not
migrate with the task. This is a known limitation that applies
equally to hca_handle and hca_object. mr_mem does not introduce
any new complication here.
3. Overlap with memory cgroup: mr_mem does not count process memory
usage - it represents a per-device DMA registration budget: how
much memory can this cgroup register through a given HCA. This is
a different dimension from what memory cgroup tracks. An
administrator might set mr_mem limits differently per device, which
memory cgroup cannot express.
In particular, mr_mem tracks the registered memory range associated
with the MR rather than exact dynamically pinned pages (e.g. for
ODP MRs). This is a stable, policy-oriented approximation of
registration footprint - not an attempt at precise physical page
accounting.
Guard against u64-to-s64 overflow by rejecting MR lengths that
exceed S64_MAX at each registration site.
Handle MR reregistration (IB_USER_VERBS_CMD_REREG_MR with
IB_MR_REREG_TRANS) by computing the delta between old and new
lengths and charging or uncharging the difference. When the driver
creates a new HW object (new_mr != NULL), the full new length is
charged to the new uobj and the old uobj's mr_mem is released
through the existing rdma_assign_uobject -> destroy_hw_idr_uobject
-> rdmacg_uncharge_uobj path.
Enable MR memory limits:
echo "mlx5_0 mr_mem=1073741824" > rdma.max
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
drivers/infiniband/core/rdma_core.c | 14 ++++-
drivers/infiniband/core/uverbs_cmd.c | 57 +++++++++++++++++++
drivers/infiniband/core/uverbs_std_types_mr.c | 37 ++++++++++++
include/linux/cgroup_rdma.h | 1 +
include/rdma/ib_verbs.h | 1 +
kernel/cgroup/rdma.c | 21 ++++++-
6 files changed, 126 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 3268285b5478..a540cef6bb67 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -523,10 +523,19 @@ struct ib_uobject *rdma_alloc_begin_uobject(const struct uverbs_api_object *obj,
return ret;
}
-static void alloc_abort_idr_uobject(struct ib_uobject *uobj)
+static void rdmacg_uncharge_uobj(struct ib_uobject *uobj)
{
ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
RDMACG_RESOURCE_HCA_OBJECT, 1);
+ if (uobj->rdmacg_mr_mem_bytes)
+ ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
+ RDMACG_RESOURCE_MR_MEM,
+ uobj->rdmacg_mr_mem_bytes);
+}
+
+static void alloc_abort_idr_uobject(struct ib_uobject *uobj)
+{
+ rdmacg_uncharge_uobj(uobj);
xa_erase(&uobj->ufile->idr, uobj->id);
}
@@ -546,8 +555,7 @@ static int __must_check destroy_hw_idr_uobject(struct ib_uobject *uobj,
if (why == RDMA_REMOVE_ABORT)
return 0;
- ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
- RDMACG_RESOURCE_HCA_OBJECT, 1);
+ rdmacg_uncharge_uobj(uobj);
return 0;
}
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 9540ac180711..901de117c808 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -752,6 +752,17 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
uobj->object = mr;
uobj_put_obj_read(pd);
+
+ if (cmd.length > S64_MAX)
+ goto err_free;
+ if (cmd.length) {
+ ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
+ RDMACG_RESOURCE_MR_MEM, cmd.length);
+ if (ret)
+ goto err_dereg;
+ uobj->rdmacg_mr_mem_bytes = cmd.length;
+ }
+
uobj_finalize_uobj_create(uobj, attrs);
resp.lkey = mr->lkey;
@@ -759,6 +770,8 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
resp.mr_handle = uobj->id;
return uverbs_response(attrs, &resp, sizeof(resp));
+err_dereg:
+ ib_dereg_mr_user(mr, &attrs->driver_udata);
err_put:
uobj_put_obj_read(pd);
err_free:
@@ -854,6 +867,20 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
rdma_restrack_set_name(&new_mr->res, NULL);
rdma_restrack_add(&new_mr->res);
+ if ((cmd.flags & IB_MR_REREG_TRANS) && cmd.length) {
+ if (cmd.length > S64_MAX) {
+ ret = -EINVAL;
+ goto err_rereg_new_mr;
+ }
+ ret = ib_rdmacg_try_charge(&new_uobj->cg_obj,
+ new_uobj->context->device,
+ RDMACG_RESOURCE_MR_MEM,
+ cmd.length);
+ if (ret)
+ goto err_rereg_new_mr;
+ new_uobj->rdmacg_mr_mem_bytes = cmd.length;
+ }
+
/*
* The new uobj for the new HW object is put into the same spot
* in the IDR and the old uobj & HW object is deleted.
@@ -871,6 +898,31 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
atomic_inc(&new_pd->usecnt);
}
if (cmd.flags & IB_MR_REREG_TRANS) {
+ s64 delta;
+
+ if (cmd.length > S64_MAX) {
+ ret = -EINVAL;
+ goto put_new_uobj;
+ }
+ delta = (s64)cmd.length -
+ (s64)uobj->rdmacg_mr_mem_bytes;
+
+ if (delta > 0) {
+ ret = ib_rdmacg_try_charge(
+ &uobj->cg_obj,
+ uobj->context->device,
+ RDMACG_RESOURCE_MR_MEM,
+ delta);
+ if (ret)
+ goto put_new_uobj;
+ } else if (delta < 0) {
+ ib_rdmacg_uncharge(
+ &uobj->cg_obj,
+ uobj->context->device,
+ RDMACG_RESOURCE_MR_MEM,
+ -delta);
+ }
+ uobj->rdmacg_mr_mem_bytes = cmd.length;
mr->iova = cmd.hca_va;
mr->length = cmd.length;
}
@@ -887,6 +939,11 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
put_new_uobj:
if (new_uobj)
uobj_alloc_abort(new_uobj, attrs);
+err_rereg_new_mr:
+ if (new_uobj) {
+ rdma_alloc_abort_uobject(new_uobj, attrs, true);
+ new_uobj = NULL;
+ }
put_uobj_pd:
if (cmd.flags & IB_MR_REREG_PD)
uobj_put_obj_read(new_pd);
diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index 570b9656801d..3989ff2d282b 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -32,6 +32,7 @@
*/
#include "rdma_core.h"
+#include "core_priv.h"
#include "uverbs.h"
#include <rdma/uverbs_std_types.h>
#include "restrack.h"
@@ -140,6 +141,18 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
rdma_restrack_set_name(&mr->res, NULL);
rdma_restrack_add(&mr->res);
uobj->object = mr;
+ if (attr.length > S64_MAX)
+ return -EINVAL;
+
+ if (attr.length) {
+ ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
+ RDMACG_RESOURCE_MR_MEM, attr.length);
+ if (ret) {
+ ib_dereg_mr_user(mr, &attrs->driver_udata);
+ return ret;
+ }
+ uobj->rdmacg_mr_mem_bytes = attr.length;
+ }
uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_DM_MR_HANDLE);
@@ -254,6 +267,18 @@ static int UVERBS_HANDLER(UVERBS_METHOD_REG_DMABUF_MR)(
rdma_restrack_add(&mr->res);
uobj->object = mr;
+ if (length > S64_MAX)
+ return -EINVAL;
+ if (length) {
+ ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
+ RDMACG_RESOURCE_MR_MEM, length);
+ if (ret) {
+ ib_dereg_mr_user(mr, &attrs->driver_udata);
+ return ret;
+ }
+ uobj->rdmacg_mr_mem_bytes = length;
+ }
+
uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
@@ -383,6 +408,18 @@ static int UVERBS_HANDLER(UVERBS_METHOD_REG_MR)(
rdma_restrack_add(&mr->res);
uobj->object = mr;
+ if (length > S64_MAX)
+ return -EINVAL;
+ if (length) {
+ ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
+ RDMACG_RESOURCE_MR_MEM, length);
+ if (ret) {
+ ib_dereg_mr_user(mr, &attrs->driver_udata);
+ return ret;
+ }
+ uobj->rdmacg_mr_mem_bytes = length;
+ }
+
uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_MR_HANDLE);
ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_MR_RESP_LKEY,
diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
index 7146cefa95a6..2c8fb1ebb1a9 100644
--- a/include/linux/cgroup_rdma.h
+++ b/include/linux/cgroup_rdma.h
@@ -12,6 +12,7 @@
enum rdmacg_resource_type {
RDMACG_RESOURCE_HCA_HANDLE,
RDMACG_RESOURCE_HCA_OBJECT,
+ RDMACG_RESOURCE_MR_MEM,
RDMACG_RESOURCE_MAX,
};
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9dd76f489a0b..c7dcd5d085fb 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1569,6 +1569,7 @@ struct ib_uobject {
void *object; /* containing object */
struct list_head list; /* link to context's list */
struct ib_rdmacg_object cg_obj; /* rdmacg object */
+ s64 rdmacg_mr_mem_bytes; /* charged MR memory size */
int id; /* index into kernel idr */
struct kref ref;
atomic_t usecnt; /* protects exclusive access */
diff --git a/kernel/cgroup/rdma.c b/kernel/cgroup/rdma.c
index 519f7f537223..ebfc5721c098 100644
--- a/kernel/cgroup/rdma.c
+++ b/kernel/cgroup/rdma.c
@@ -23,14 +23,18 @@ enum rdmacg_limit_tokens {
RDMACG_HCA_HANDLE_MAX,
RDMACG_HCA_OBJECT_VAL,
RDMACG_HCA_OBJECT_MAX,
+ RDMACG_MR_MEM_VAL,
+ RDMACG_MR_MEM_MAX,
NR_RDMACG_LIMIT_TOKENS,
};
static const match_table_t rdmacg_limit_tokens = {
- { RDMACG_HCA_HANDLE_VAL, "hca_handle=%d" },
+ { RDMACG_HCA_HANDLE_VAL, "hca_handle=%d" },
{ RDMACG_HCA_HANDLE_MAX, "hca_handle=max" },
- { RDMACG_HCA_OBJECT_VAL, "hca_object=%d" },
+ { RDMACG_HCA_OBJECT_VAL, "hca_object=%d" },
{ RDMACG_HCA_OBJECT_MAX, "hca_object=max" },
+ { RDMACG_MR_MEM_VAL, "mr_mem=%d" },
+ { RDMACG_MR_MEM_MAX, "mr_mem=max" },
{ NR_RDMACG_LIMIT_TOKENS, NULL },
};
@@ -55,6 +59,7 @@ enum rdmacg_file_type {
static char const *rdmacg_resource_names[] = {
[RDMACG_RESOURCE_HCA_HANDLE] = "hca_handle",
[RDMACG_RESOURCE_HCA_OBJECT] = "hca_object",
+ [RDMACG_RESOURCE_MR_MEM] = "mr_mem",
};
/* resource tracker for each resource of rdma cgroup */
@@ -566,6 +571,18 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
new_limits[RDMACG_RESOURCE_HCA_OBJECT] = S64_MAX;
enables |= BIT(RDMACG_RESOURCE_HCA_OBJECT);
break;
+ case RDMACG_MR_MEM_VAL:
+ if (match_s64(&args[0], &intval)) {
+ ret = -EINVAL;
+ goto parse_err;
+ }
+ new_limits[RDMACG_RESOURCE_MR_MEM] = intval;
+ enables |= BIT(RDMACG_RESOURCE_MR_MEM);
+ break;
+ case RDMACG_MR_MEM_MAX:
+ new_limits[RDMACG_RESOURCE_MR_MEM] = S64_MAX;
+ enables |= BIT(RDMACG_RESOURCE_MR_MEM);
+ break;
default:
ret = -EINVAL;
goto parse_err;
--
2.43.0
next prev parent reply other threads:[~2026-05-29 9:07 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-29 9:07 [PATCH rdma-next v2 0/3] cgroup/rdma: add MR memory size resource tracking Tao Cui
2026-05-29 9:07 ` [PATCH rdma-next v2 1/3] cgroup/rdma: extend charge/uncharge API with s64 amount parameter Tao Cui
2026-05-29 9:07 ` Tao Cui [this message]
2026-05-29 9:07 ` [PATCH rdma-next v2 3/3] cgroup/rdma: update cgroup resource list for MR_MEM Tao Cui
2026-05-29 16:18 ` kernel test robot
2026-05-29 12:46 ` [PATCH rdma-next v2 0/3] cgroup/rdma: add MR memory size resource tracking Michal Koutný
2026-06-01 5:37 ` Tao Cui
2026-05-29 21:14 ` yanjun.zhu
2026-06-01 6:08 ` Tao Cui
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260529090733.2242822-3-cui.tao@linux.dev \
--to=cui.tao@linux.dev \
--cc=cgroups@vger.kernel.org \
--cc=cuitao@kylinos.cn \
--cc=hannes@cmpxchg.org \
--cc=jgg@ziepe.ca \
--cc=leon@kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=mkoutny@suse.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.