* [RFC PATCH rdma-next 1/5] cgroup/rdma: extend charge/uncharge API with s64 amount parameter
2026-05-25 5:55 [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory Tao Cui
@ 2026-05-25 5:55 ` Tao Cui
2026-05-25 5:55 ` [RFC PATCH rdma-next 2/5] cgroup/rdma: add QP per-type resource counting Tao Cui
` (4 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Tao Cui @ 2026-05-25 5:55 UTC (permalink / raw)
To: tj, hannes, mkoutny, leon, jgg; +Cc: linux-rdma, cgroups, Tao Cui
Change the struct rdmacg_resource fields (max, usage, peak) from int to
s64 and add an s64 amount parameter to all charge/uncharge functions, to
prepare for byte-sized resource tracking such as MR memory. The
"unlimited" sentinel moves from S32_MAX to S64_MAX accordingly.
Replace match_int with a match_s64 helper that uses kstrtoll so that the
user-space limit tokens accept 64-bit values. All existing callers
pass amount=1, so the change is transparent for existing count-based
resources.
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
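For readers following along, here is a minimal userspace sketch of the two ideas in this patch: parsing a non-negative 64-bit limit and charging an arbitrary s64 amount with an overflow check. It is not the kernel code. The names struct res, parse_limit(), try_charge() and uncharge() are made up for illustration, strtoll() stands in for kstrtoll() (and is more permissive about leading whitespace), and the overflow test is done before the addition because userspace C does not get the kernel's wrapping signed arithmetic.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct rdmacg_resource. */
struct res {
	int64_t max;
	int64_t usage;
};

/* Parse a non-negative 64-bit limit, in the spirit of match_s64(). */
static int parse_limit(const char *s, int64_t *out)
{
	char *end;
	long long val;

	errno = 0;
	val = strtoll(s, &end, 0);
	/* Reject range errors, empty input and trailing garbage. */
	if (errno || end == s || *end != '\0')
		return -EINVAL;
	if (val < 0)
		return -EINVAL;
	*out = val;
	return 0;
}

/* Charge @amount; reject non-positive amounts, overflow and limit excess. */
static int try_charge(struct res *r, int64_t amount)
{
	if (amount <= 0)
		return -EINVAL;
	/* usage is assumed non-negative, so INT64_MAX - usage cannot overflow. */
	assert(r->usage >= 0);
	/* The overflow test runs first, so the sum below is always safe. */
	if (amount > INT64_MAX - r->usage || r->usage + amount > r->max)
		return -EAGAIN;
	r->usage += amount;
	return 0;
}

/* Return @amount to the pool; the caller must not uncharge more than it charged. */
static void uncharge(struct res *r, int64_t amount)
{
	assert(amount > 0 && amount <= r->usage);
	r->usage -= amount;
}
```

With count-based resources every caller would pass 1 for amount, which is why the existing behaviour is unchanged; a byte-based resource would pass the size instead.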
drivers/infiniband/core/cgroup.c | 10 +--
drivers/infiniband/core/core_priv.h | 12 ++--
drivers/infiniband/core/rdma_core.c | 8 +--
drivers/infiniband/core/uverbs_cmd.c | 4 +-
include/linux/cgroup_rdma.h | 7 +-
kernel/cgroup/rdma.c | 99 +++++++++++++++++++---------
6 files changed, 93 insertions(+), 47 deletions(-)
diff --git a/drivers/infiniband/core/cgroup.c b/drivers/infiniband/core/cgroup.c
index 1f037fe01450..81e24de72392 100644
--- a/drivers/infiniband/core/cgroup.c
+++ b/drivers/infiniband/core/cgroup.c
@@ -36,18 +36,20 @@ void ib_device_unregister_rdmacg(struct ib_device *device)
int ib_rdmacg_try_charge(struct ib_rdmacg_object *cg_obj,
struct ib_device *device,
- enum rdmacg_resource_type resource_index)
+ enum rdmacg_resource_type resource_index,
+ s64 amount)
{
return rdmacg_try_charge(&cg_obj->cg, &device->cg_device,
- resource_index);
+ resource_index, amount);
}
EXPORT_SYMBOL(ib_rdmacg_try_charge);
void ib_rdmacg_uncharge(struct ib_rdmacg_object *cg_obj,
struct ib_device *device,
- enum rdmacg_resource_type resource_index)
+ enum rdmacg_resource_type resource_index,
+ s64 amount)
{
rdmacg_uncharge(cg_obj->cg, &device->cg_device,
- resource_index);
+ resource_index, amount);
}
EXPORT_SYMBOL(ib_rdmacg_uncharge);
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index a2c36666e6fc..866d99268032 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -159,11 +159,13 @@ void ib_device_unregister_rdmacg(struct ib_device *device);
int ib_rdmacg_try_charge(struct ib_rdmacg_object *cg_obj,
struct ib_device *device,
- enum rdmacg_resource_type resource_index);
+ enum rdmacg_resource_type resource_index,
+ s64 amount);
void ib_rdmacg_uncharge(struct ib_rdmacg_object *cg_obj,
struct ib_device *device,
- enum rdmacg_resource_type resource_index);
+ enum rdmacg_resource_type resource_index,
+ s64 amount);
#else
static inline void ib_device_register_rdmacg(struct ib_device *device)
{
@@ -175,14 +177,16 @@ static inline void ib_device_unregister_rdmacg(struct ib_device *device)
static inline int ib_rdmacg_try_charge(struct ib_rdmacg_object *cg_obj,
struct ib_device *device,
- enum rdmacg_resource_type resource_index)
+ enum rdmacg_resource_type resource_index,
+ s64 amount)
{
return 0;
}
static inline void ib_rdmacg_uncharge(struct ib_rdmacg_object *cg_obj,
struct ib_device *device,
- enum rdmacg_resource_type resource_index)
+ enum rdmacg_resource_type resource_index,
+ s64 amount)
{
}
#endif
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 5018ec837056..3268285b5478 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -437,7 +437,7 @@ alloc_begin_idr_uobject(const struct uverbs_api_object *obj,
goto uobj_put;
ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
- RDMACG_RESOURCE_HCA_OBJECT);
+ RDMACG_RESOURCE_HCA_OBJECT, 1);
if (ret)
goto remove;
@@ -526,7 +526,7 @@ struct ib_uobject *rdma_alloc_begin_uobject(const struct uverbs_api_object *obj,
static void alloc_abort_idr_uobject(struct ib_uobject *uobj)
{
ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
- RDMACG_RESOURCE_HCA_OBJECT);
+ RDMACG_RESOURCE_HCA_OBJECT, 1);
xa_erase(&uobj->ufile->idr, uobj->id);
}
@@ -547,7 +547,7 @@ static int __must_check destroy_hw_idr_uobject(struct ib_uobject *uobj,
return 0;
ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
- RDMACG_RESOURCE_HCA_OBJECT);
+ RDMACG_RESOURCE_HCA_OBJECT, 1);
return 0;
}
@@ -878,7 +878,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
}
ib_rdmacg_uncharge(&ucontext->cg_obj, ib_dev,
- RDMACG_RESOURCE_HCA_HANDLE);
+ RDMACG_RESOURCE_HCA_HANDLE, 1);
rdma_restrack_del(&ucontext->res);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 91a62d2ade4d..9540ac180711 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -234,7 +234,7 @@ int ib_init_ucontext(struct uverbs_attr_bundle *attrs)
}
ret = ib_rdmacg_try_charge(&ucontext->cg_obj, ucontext->device,
- RDMACG_RESOURCE_HCA_HANDLE);
+ RDMACG_RESOURCE_HCA_HANDLE, 1);
if (ret)
goto err;
@@ -273,7 +273,7 @@ int ib_init_ucontext(struct uverbs_attr_bundle *attrs)
err_uncharge:
ib_rdmacg_uncharge(&ucontext->cg_obj, ucontext->device,
- RDMACG_RESOURCE_HCA_HANDLE);
+ RDMACG_RESOURCE_HCA_HANDLE, 1);
err:
mutex_unlock(&file->ucontext_lock);
up_read(&file->hw_destroy_rwsem);
diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
index 404e746552ca..7146cefa95a6 100644
--- a/include/linux/cgroup_rdma.h
+++ b/include/linux/cgroup_rdma.h
@@ -7,6 +7,7 @@
#define _CGROUP_RDMA_H
#include <linux/cgroup.h>
+#include <linux/types.h>
enum rdmacg_resource_type {
RDMACG_RESOURCE_HCA_HANDLE,
@@ -46,9 +47,11 @@ void rdmacg_unregister_device(struct rdmacg_device *device);
/* APIs for RDMA/IB stack to charge/uncharge pool specific resources */
int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
struct rdmacg_device *device,
- enum rdmacg_resource_type index);
+ enum rdmacg_resource_type index,
+ s64 amount);
void rdmacg_uncharge(struct rdma_cgroup *cg,
struct rdmacg_device *device,
- enum rdmacg_resource_type index);
+ enum rdmacg_resource_type index,
+ s64 amount);
#endif /* CONFIG_CGROUP_RDMA */
#endif /* _CGROUP_RDMA_H */
diff --git a/kernel/cgroup/rdma.c b/kernel/cgroup/rdma.c
index 5e82a03b3270..7e0fff415528 100644
--- a/kernel/cgroup/rdma.c
+++ b/kernel/cgroup/rdma.c
@@ -26,10 +26,15 @@ enum rdmacg_limit_tokens {
NR_RDMACG_LIMIT_TOKENS,
};
+/* match_token uses %d for substring extraction only (simple_strtol captures
+ * all digits regardless of overflow). Actual s64 range validation is done
+ * by match_s64() below - must not be changed to %s or the "xxx=max" exact
+ * match patterns would be shadowed.
+ */
static const match_table_t rdmacg_limit_tokens = {
- { RDMACG_HCA_HANDLE_VAL, "hca_handle=%d" },
+ { RDMACG_HCA_HANDLE_VAL, "hca_handle=%d" },
{ RDMACG_HCA_HANDLE_MAX, "hca_handle=max" },
- { RDMACG_HCA_OBJECT_VAL, "hca_object=%d" },
+ { RDMACG_HCA_OBJECT_VAL, "hca_object=%d" },
{ RDMACG_HCA_OBJECT_MAX, "hca_object=max" },
{ NR_RDMACG_LIMIT_TOKENS, NULL },
};
@@ -59,9 +64,9 @@ static char const *rdmacg_resource_names[] = {
/* resource tracker for each resource of rdma cgroup */
struct rdmacg_resource {
- int max;
- int usage;
- int peak;
+ s64 max;
+ s64 usage;
+ s64 peak;
};
/*
@@ -105,13 +110,13 @@ static inline struct rdma_cgroup *get_current_rdmacg(void)
}
static void set_resource_limit(struct rdmacg_resource_pool *rpool,
- int index, int new_max)
+ int index, s64 new_max)
{
- if (new_max == S32_MAX) {
- if (rpool->resources[index].max != S32_MAX)
+ if (new_max == S64_MAX) {
+ if (rpool->resources[index].max != S64_MAX)
rpool->num_max_cnt++;
} else {
- if (rpool->resources[index].max == S32_MAX)
+ if (rpool->resources[index].max == S64_MAX)
rpool->num_max_cnt--;
}
rpool->resources[index].max = new_max;
@@ -122,7 +127,7 @@ static void set_all_resource_max_limit(struct rdmacg_resource_pool *rpool)
int i;
for (i = 0; i < RDMACG_RESOURCE_MAX; i++)
- set_resource_limit(rpool, i, S32_MAX);
+ set_resource_limit(rpool, i, S64_MAX);
}
static void free_cg_rpool_locked(struct rdmacg_resource_pool *rpool)
@@ -198,6 +203,7 @@ get_cg_rpool_locked(struct rdma_cgroup *cg, struct rdmacg_device *device)
* @cg: pointer to cg to uncharge and all parents in hierarchy
* @device: pointer to rdmacg device
* @index: index of the resource to uncharge in cg (resource pool)
+ * @amount: amount to uncharge
*
* It also frees the resource pool which was created as part of
* charging operation when there are no resources attached to
@@ -206,7 +212,8 @@ get_cg_rpool_locked(struct rdma_cgroup *cg, struct rdmacg_device *device)
static void
uncharge_cg_locked(struct rdma_cgroup *cg,
struct rdmacg_device *device,
- enum rdmacg_resource_type index)
+ enum rdmacg_resource_type index,
+ s64 amount)
{
struct rdmacg_resource_pool *rpool;
@@ -222,7 +229,7 @@ uncharge_cg_locked(struct rdma_cgroup *cg,
return;
}
- rpool->resources[index].usage--;
+ rpool->resources[index].usage -= amount;
/*
* A negative count (or overflow) is invalid,
@@ -303,18 +310,20 @@ static void rdmacg_event_locked(struct rdma_cgroup *cg,
* @stop_cg: while traversing hirerchy, when meet with stop_cg cgroup
* stop uncharging
* @index: index of the resource to uncharge in cg in given resource pool
+ * @amount: amount to uncharge
*/
static void rdmacg_uncharge_hierarchy(struct rdma_cgroup *cg,
struct rdmacg_device *device,
struct rdma_cgroup *stop_cg,
- enum rdmacg_resource_type index)
+ enum rdmacg_resource_type index,
+ s64 amount)
{
struct rdma_cgroup *p;
mutex_lock(&rdmacg_mutex);
for (p = cg; p != stop_cg; p = parent_rdmacg(p))
- uncharge_cg_locked(p, device, index);
+ uncharge_cg_locked(p, device, index, amount);
mutex_unlock(&rdmacg_mutex);
@@ -326,15 +335,17 @@ static void rdmacg_uncharge_hierarchy(struct rdma_cgroup *cg,
* @cg: pointer to cg to uncharge and all parents in hierarchy
* @device: pointer to rdmacg device
* @index: index of the resource to uncharge in cgroup in given resource pool
+ * @amount: amount to uncharge
*/
void rdmacg_uncharge(struct rdma_cgroup *cg,
struct rdmacg_device *device,
- enum rdmacg_resource_type index)
+ enum rdmacg_resource_type index,
+ s64 amount)
{
if (index >= RDMACG_RESOURCE_MAX)
return;
- rdmacg_uncharge_hierarchy(cg, device, NULL, index);
+ rdmacg_uncharge_hierarchy(cg, device, NULL, index, amount);
}
EXPORT_SYMBOL(rdmacg_uncharge);
@@ -343,6 +354,7 @@ EXPORT_SYMBOL(rdmacg_uncharge);
* @rdmacg: pointer to rdma cgroup which will own this resource
* @device: pointer to rdmacg device
* @index: index of the resource to charge in cgroup (resource pool)
+ * @amount: amount to charge
*
* This function follows charging resource in hierarchical way.
* It will fail if the charge would cause the new value to exceed the
@@ -361,7 +373,8 @@ EXPORT_SYMBOL(rdmacg_uncharge);
*/
int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
struct rdmacg_device *device,
- enum rdmacg_resource_type index)
+ enum rdmacg_resource_type index,
+ s64 amount)
{
struct rdma_cgroup *cg, *p;
struct rdmacg_resource_pool *rpool;
@@ -371,6 +384,9 @@ int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
if (index >= RDMACG_RESOURCE_MAX)
return -EINVAL;
+ if (amount <= 0)
+ return -EINVAL;
+
/*
* hold on to css, as cgroup can be removed but resource
* accounting happens on css.
@@ -384,8 +400,9 @@ int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
ret = PTR_ERR(rpool);
goto err;
} else {
- new = (s64)rpool->resources[index].usage + 1;
- if (new > rpool->resources[index].max) {
+ new = rpool->resources[index].usage + amount;
+ if (new < rpool->resources[index].usage ||
+ new > rpool->resources[index].max) {
ret = -EAGAIN;
goto err;
} else {
@@ -409,7 +426,7 @@ int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
if (ret == -EAGAIN)
rdmacg_event_locked(cg, p, device, index);
mutex_unlock(&rdmacg_mutex);
- rdmacg_uncharge_hierarchy(cg, device, p, index);
+ rdmacg_uncharge_hierarchy(cg, device, p, index, amount);
return ret;
}
EXPORT_SYMBOL(rdmacg_try_charge);
@@ -477,6 +494,25 @@ static struct rdmacg_device *rdmacg_get_device_locked(const char *name)
return NULL;
}
+static int match_s64(substring_t *s, s64 *result)
+{
+ char *buf;
+ int ret;
+ s64 val;
+
+ buf = kmemdup_nul(s->from, s->to - s->from, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+ ret = kstrtoll(buf, 0, &val);
+ kfree(buf);
+ if (ret)
+ return ret;
+ if (val < 0)
+ return -EINVAL;
+ *result = val;
+ return 0;
+}
+
static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
@@ -486,7 +522,7 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
struct rdmacg_device *device;
char *options = strstrip(buf);
char *p;
- int *new_limits;
+ s64 *new_limits;
unsigned long enables = 0;
int i = 0, ret = 0;
@@ -497,7 +533,7 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
goto err;
}
- new_limits = kzalloc_objs(int, RDMACG_RESOURCE_MAX);
+ new_limits = kcalloc(RDMACG_RESOURCE_MAX, sizeof(s64), GFP_KERNEL);
if (!new_limits) {
ret = -ENOMEM;
goto err;
@@ -506,7 +542,8 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
/* parse resource limit tokens */
while ((p = strsep(&options, " \t\n"))) {
substring_t args[MAX_OPT_ARGS];
- int tok, intval;
+ int tok;
+ s64 intval;
if (!*p)
continue;
@@ -514,7 +551,7 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
tok = match_token(p, rdmacg_limit_tokens, args);
switch (tok) {
case RDMACG_HCA_HANDLE_VAL:
- if (match_int(&args[0], &intval) || intval < 0) {
+ if (match_s64(&args[0], &intval)) {
ret = -EINVAL;
goto parse_err;
}
@@ -522,11 +559,11 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
enables |= BIT(RDMACG_RESOURCE_HCA_HANDLE);
break;
case RDMACG_HCA_HANDLE_MAX:
- new_limits[RDMACG_RESOURCE_HCA_HANDLE] = S32_MAX;
+ new_limits[RDMACG_RESOURCE_HCA_HANDLE] = S64_MAX;
enables |= BIT(RDMACG_RESOURCE_HCA_HANDLE);
break;
case RDMACG_HCA_OBJECT_VAL:
- if (match_int(&args[0], &intval) || intval < 0) {
+ if (match_s64(&args[0], &intval)) {
ret = -EINVAL;
goto parse_err;
}
@@ -534,7 +571,7 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
enables |= BIT(RDMACG_RESOURCE_HCA_OBJECT);
break;
case RDMACG_HCA_OBJECT_MAX:
- new_limits[RDMACG_RESOURCE_HCA_OBJECT] = S32_MAX;
+ new_limits[RDMACG_RESOURCE_HCA_OBJECT] = S64_MAX;
enables |= BIT(RDMACG_RESOURCE_HCA_OBJECT);
break;
default:
@@ -588,7 +625,7 @@ static void print_rpool_values(struct seq_file *sf,
{
enum rdmacg_file_type sf_type;
int i;
- u32 value;
+ s64 value;
sf_type = seq_cft(sf)->private;
@@ -599,7 +636,7 @@ static void print_rpool_values(struct seq_file *sf,
if (rpool)
value = rpool->resources[i].max;
else
- value = S32_MAX;
+ value = S64_MAX;
} else if (sf_type == RDMACG_RESOURCE_TYPE_PEAK) {
value = rpool ? rpool->resources[i].peak : 0;
} else {
@@ -609,10 +646,10 @@ static void print_rpool_values(struct seq_file *sf,
value = 0;
}
- if (value == S32_MAX)
+ if (value == S64_MAX)
seq_puts(sf, RDMACG_MAX_STR);
else
- seq_printf(sf, "%d", value);
+ seq_printf(sf, "%lld", value);
seq_putc(sf, ' ');
}
}
--
2.43.0
* [RFC PATCH rdma-next 2/5] cgroup/rdma: add QP per-type resource counting
2026-05-25 5:55 [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory Tao Cui
2026-05-25 5:55 ` [RFC PATCH rdma-next 1/5] cgroup/rdma: extend charge/uncharge API with s64 amount parameter Tao Cui
@ 2026-05-25 5:55 ` Tao Cui
2026-05-25 5:55 ` [RFC PATCH rdma-next 3/5] cgroup/rdma: add MR " Tao Cui
` (3 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Tao Cui @ 2026-05-25 5:55 UTC (permalink / raw)
To: tj, hannes, mkoutny, leon, jgg; +Cc: linux-rdma, cgroups, Tao Cui
Add RDMACG_RESOURCE_QP so that Queue Pair creation can be tracked
and limited independently from the aggregate hca_object counter.
The existing hca_object charge in the generic IDR allocation path
still applies to QP objects (dual charging). The new per-type
counter allows administrators to set QP-specific limits:
echo "mlx5_0 qp=100" > rdma.max
Add uverbs_obj_to_rdmacg_type() to map uverbs object IDs to rdmacg
resource types. Store the mapped type in uobj->rdmacg_type so that
the generic uncharge paths (alloc_abort, destroy_hw) can issue the
per-type uncharge. On the uverbs side, adding a new count-based
per-type resource only requires extending this mapping function.
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
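The dual-charging order described above (aggregate hca_object first, then the per-type counter, with the aggregate charge undone if the second one fails) can be sketched in plain C as follows. This is an illustrative model only: struct pool, charge(), uncharge(), charge_object() and uncharge_object() are hypothetical names, and there is no hierarchy walk or locking here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, reduced version of enum rdmacg_resource_type. */
enum res_type { RES_HCA_OBJECT, RES_QP, RES_MAX };

struct pool {
	int64_t max[RES_MAX];
	int64_t usage[RES_MAX];
};

static int charge(struct pool *p, enum res_type t, int64_t amount)
{
	if (amount <= 0)
		return -EINVAL;
	/* usage <= max is an invariant, so the subtraction cannot overflow. */
	if (amount > p->max[t] - p->usage[t])
		return -EAGAIN;
	p->usage[t] += amount;
	return 0;
}

static void uncharge(struct pool *p, enum res_type t, int64_t amount)
{
	assert(amount > 0 && amount <= p->usage[t]);
	p->usage[t] -= amount;
}

/* Charge the aggregate counter, then the per-type one; roll back on failure. */
static int charge_object(struct pool *p, enum res_type type)
{
	int ret = charge(p, RES_HCA_OBJECT, 1);

	if (ret)
		return ret;
	if (type != RES_HCA_OBJECT) {
		ret = charge(p, type, 1);
		if (ret) {
			/* Undo the aggregate charge so no count leaks. */
			uncharge(p, RES_HCA_OBJECT, 1);
			return ret;
		}
	}
	return 0;
}

/* Mirror of charge_object(): release both counters for a typed object. */
static void uncharge_object(struct pool *p, enum res_type type)
{
	uncharge(p, RES_HCA_OBJECT, 1);
	if (type != RES_HCA_OBJECT)
		uncharge(p, type, 1);
}
```

The point of the rollback is that a QP rejected by the qp limit must not leave hca_object usage incremented, otherwise repeated failures would slowly consume the aggregate budget.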
drivers/infiniband/core/rdma_core.c | 37 ++++++++++++++++++++++++++---
include/linux/cgroup_rdma.h | 1 +
include/rdma/ib_verbs.h | 1 +
kernel/cgroup/rdma.c | 18 ++++++++++++++
4 files changed, 54 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 3268285b5478..aca735947230 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -40,6 +40,7 @@
#include <rdma/rdma_user_ioctl.h>
#include "uverbs.h"
#include "core_priv.h"
+#include <rdma/ib_user_ioctl_cmds.h>
#include "rdma_core.h"
static void uverbs_uobject_free(struct kref *ref)
@@ -421,12 +422,22 @@ struct ib_uobject *rdma_lookup_get_uobject(const struct uverbs_api_object *obj,
return ERR_PTR(ret);
}
+static enum rdmacg_resource_type
+uverbs_obj_to_rdmacg_type(u16 uverbs_obj_id)
+{
+ switch (uverbs_obj_id) {
+ case UVERBS_OBJECT_QP: return RDMACG_RESOURCE_QP;
+ default: return RDMACG_RESOURCE_HCA_OBJECT;
+ }
+}
+
static struct ib_uobject *
alloc_begin_idr_uobject(const struct uverbs_api_object *obj,
struct uverbs_attr_bundle *attrs)
{
int ret;
struct ib_uobject *uobj;
+ enum rdmacg_resource_type rdmacg_type;
uobj = alloc_uobj(attrs, obj);
if (IS_ERR(uobj))
@@ -441,6 +452,19 @@ alloc_begin_idr_uobject(const struct uverbs_api_object *obj,
if (ret)
goto remove;
+ rdmacg_type = uverbs_obj_to_rdmacg_type(uobj_get_object_id(uobj));
+ if (rdmacg_type != RDMACG_RESOURCE_HCA_OBJECT) {
+ ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
+ rdmacg_type, 1);
+ if (ret) {
+ ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
+ RDMACG_RESOURCE_HCA_OBJECT, 1);
+ goto remove;
+ }
+ }
+
+ uobj->rdmacg_type = rdmacg_type;
+
return uobj;
remove:
@@ -523,10 +547,18 @@ struct ib_uobject *rdma_alloc_begin_uobject(const struct uverbs_api_object *obj,
return ret;
}
-static void alloc_abort_idr_uobject(struct ib_uobject *uobj)
+static void rdmacg_uncharge_uobj(struct ib_uobject *uobj)
{
ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
RDMACG_RESOURCE_HCA_OBJECT, 1);
+ if (uobj->rdmacg_type != RDMACG_RESOURCE_HCA_OBJECT)
+ ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
+ uobj->rdmacg_type, 1);
+}
+
+static void alloc_abort_idr_uobject(struct ib_uobject *uobj)
+{
+ rdmacg_uncharge_uobj(uobj);
xa_erase(&uobj->ufile->idr, uobj->id);
}
@@ -546,8 +578,7 @@ static int __must_check destroy_hw_idr_uobject(struct ib_uobject *uobj,
if (why == RDMA_REMOVE_ABORT)
return 0;
- ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
- RDMACG_RESOURCE_HCA_OBJECT, 1);
+ rdmacg_uncharge_uobj(uobj);
return 0;
}
diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
index 7146cefa95a6..2dcae0e04063 100644
--- a/include/linux/cgroup_rdma.h
+++ b/include/linux/cgroup_rdma.h
@@ -12,6 +12,7 @@
enum rdmacg_resource_type {
RDMACG_RESOURCE_HCA_HANDLE,
RDMACG_RESOURCE_HCA_OBJECT,
+ RDMACG_RESOURCE_QP,
RDMACG_RESOURCE_MAX,
};
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9dd76f489a0b..7dcb3955505b 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1569,6 +1569,7 @@ struct ib_uobject {
void *object; /* containing object */
struct list_head list; /* link to context's list */
struct ib_rdmacg_object cg_obj; /* rdmacg object */
+ enum rdmacg_resource_type rdmacg_type; /* per-type cgroup index */
int id; /* index into kernel idr */
struct kref ref;
atomic_t usecnt; /* protects exclusive access */
diff --git a/kernel/cgroup/rdma.c b/kernel/cgroup/rdma.c
index 7e0fff415528..f9922f6f87c9 100644
--- a/kernel/cgroup/rdma.c
+++ b/kernel/cgroup/rdma.c
@@ -23,6 +23,8 @@ enum rdmacg_limit_tokens {
RDMACG_HCA_HANDLE_MAX,
RDMACG_HCA_OBJECT_VAL,
RDMACG_HCA_OBJECT_MAX,
+ RDMACG_QP_VAL,
+ RDMACG_QP_MAX,
NR_RDMACG_LIMIT_TOKENS,
};
@@ -36,6 +38,8 @@ static const match_table_t rdmacg_limit_tokens = {
{ RDMACG_HCA_HANDLE_MAX, "hca_handle=max" },
{ RDMACG_HCA_OBJECT_VAL, "hca_object=%d" },
{ RDMACG_HCA_OBJECT_MAX, "hca_object=max" },
+ { RDMACG_QP_VAL, "qp=%d" },
+ { RDMACG_QP_MAX, "qp=max" },
{ NR_RDMACG_LIMIT_TOKENS, NULL },
};
@@ -60,7 +64,9 @@ enum rdmacg_file_type {
static char const *rdmacg_resource_names[] = {
[RDMACG_RESOURCE_HCA_HANDLE] = "hca_handle",
[RDMACG_RESOURCE_HCA_OBJECT] = "hca_object",
+ [RDMACG_RESOURCE_QP] = "qp",
};
+static_assert(ARRAY_SIZE(rdmacg_resource_names) == RDMACG_RESOURCE_MAX);
/* resource tracker for each resource of rdma cgroup */
struct rdmacg_resource {
@@ -574,6 +580,18 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
new_limits[RDMACG_RESOURCE_HCA_OBJECT] = S64_MAX;
enables |= BIT(RDMACG_RESOURCE_HCA_OBJECT);
break;
+ case RDMACG_QP_VAL:
+ if (match_s64(&args[0], &intval)) {
+ ret = -EINVAL;
+ goto parse_err;
+ }
+ new_limits[RDMACG_RESOURCE_QP] = intval;
+ enables |= BIT(RDMACG_RESOURCE_QP);
+ break;
+ case RDMACG_QP_MAX:
+ new_limits[RDMACG_RESOURCE_QP] = S64_MAX;
+ enables |= BIT(RDMACG_RESOURCE_QP);
+ break;
default:
ret = -EINVAL;
goto parse_err;
--
2.43.0
* [RFC PATCH rdma-next 3/5] cgroup/rdma: add MR per-type resource counting
2026-05-25 5:55 [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory Tao Cui
2026-05-25 5:55 ` [RFC PATCH rdma-next 1/5] cgroup/rdma: extend charge/uncharge API with s64 amount parameter Tao Cui
2026-05-25 5:55 ` [RFC PATCH rdma-next 2/5] cgroup/rdma: add QP per-type resource counting Tao Cui
@ 2026-05-25 5:55 ` Tao Cui
2026-05-25 5:55 ` [RFC PATCH rdma-next 4/5] cgroup/rdma: add MR memory size " Tao Cui
` (2 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Tao Cui @ 2026-05-25 5:55 UTC (permalink / raw)
To: tj, hannes, mkoutny, leon, jgg; +Cc: linux-rdma, cgroups, Tao Cui
Add RDMACG_RESOURCE_MR so that Memory Region registration can be
tracked and limited independently from the aggregate hca_object
counter.
Like QP, MR uses dual charging: the existing hca_object charge in
the generic IDR path still applies, and an additional per-type MR
counter is incremented. This enables MR-specific limits:
echo "mlx5_0 mr=500" > rdma.max
Extend uverbs_obj_to_rdmacg_type() with the MR mapping case.
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
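As a rough illustration of why a new count-based type is a small change on the uverbs side, the sketch below models the object-ID-to-resource-type mapping together with the name table and its compile-time size check. The enum values and the names obj_to_res_type() and res_names are stand-ins invented for this example, not the kernel's UVERBS_OBJECT_* or RDMACG_RESOURCE_* definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for a few UVERBS_OBJECT_* IDs. */
enum obj_id { OBJ_PD, OBJ_QP, OBJ_MR };

/* Hypothetical stand-in for enum rdmacg_resource_type. */
enum res_type { RES_HCA_HANDLE, RES_HCA_OBJECT, RES_QP, RES_MR, RES_MAX };

static const char *const res_names[] = {
	[RES_HCA_HANDLE] = "hca_handle",
	[RES_HCA_OBJECT] = "hca_object",
	[RES_QP]         = "qp",
	[RES_MR]         = "mr",
};

/* Fails to compile if a trailing enum entry has no name. */
_Static_assert(sizeof(res_names) / sizeof(res_names[0]) == RES_MAX,
	       "res_names must cover every resource type");

/* Objects without a dedicated counter fall back to the aggregate type. */
static enum res_type obj_to_res_type(enum obj_id id)
{
	switch (id) {
	case OBJ_QP:
		return RES_QP;
	case OBJ_MR:
		return RES_MR;
	default:
		return RES_HCA_OBJECT;
	}
}
```

Note that a size check of this kind only catches a missing entry at the end of the table; a hole in the middle of a designated-initializer array would still be a NULL pointer.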
drivers/infiniband/core/rdma_core.c | 1 +
include/linux/cgroup_rdma.h | 1 +
kernel/cgroup/rdma.c | 17 +++++++++++++++++
3 files changed, 19 insertions(+)
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index aca735947230..8fb0df4aa0af 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -427,6 +427,7 @@ uverbs_obj_to_rdmacg_type(u16 uverbs_obj_id)
{
switch (uverbs_obj_id) {
case UVERBS_OBJECT_QP: return RDMACG_RESOURCE_QP;
+ case UVERBS_OBJECT_MR: return RDMACG_RESOURCE_MR;
default: return RDMACG_RESOURCE_HCA_OBJECT;
}
}
diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
index 2dcae0e04063..35caccc8eb8d 100644
--- a/include/linux/cgroup_rdma.h
+++ b/include/linux/cgroup_rdma.h
@@ -13,6 +13,7 @@ enum rdmacg_resource_type {
RDMACG_RESOURCE_HCA_HANDLE,
RDMACG_RESOURCE_HCA_OBJECT,
RDMACG_RESOURCE_QP,
+ RDMACG_RESOURCE_MR,
RDMACG_RESOURCE_MAX,
};
diff --git a/kernel/cgroup/rdma.c b/kernel/cgroup/rdma.c
index f9922f6f87c9..a056a14d9af5 100644
--- a/kernel/cgroup/rdma.c
+++ b/kernel/cgroup/rdma.c
@@ -25,6 +25,8 @@ enum rdmacg_limit_tokens {
RDMACG_HCA_OBJECT_MAX,
RDMACG_QP_VAL,
RDMACG_QP_MAX,
+ RDMACG_MR_VAL,
+ RDMACG_MR_MAX,
NR_RDMACG_LIMIT_TOKENS,
};
@@ -40,6 +42,8 @@ static const match_table_t rdmacg_limit_tokens = {
{ RDMACG_HCA_OBJECT_MAX, "hca_object=max" },
{ RDMACG_QP_VAL, "qp=%d" },
{ RDMACG_QP_MAX, "qp=max" },
+ { RDMACG_MR_VAL, "mr=%d" },
+ { RDMACG_MR_MAX, "mr=max" },
{ NR_RDMACG_LIMIT_TOKENS, NULL },
};
@@ -65,6 +69,7 @@ static char const *rdmacg_resource_names[] = {
[RDMACG_RESOURCE_HCA_HANDLE] = "hca_handle",
[RDMACG_RESOURCE_HCA_OBJECT] = "hca_object",
[RDMACG_RESOURCE_QP] = "qp",
+ [RDMACG_RESOURCE_MR] = "mr",
};
static_assert(ARRAY_SIZE(rdmacg_resource_names) == RDMACG_RESOURCE_MAX);
@@ -592,6 +597,18 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
new_limits[RDMACG_RESOURCE_QP] = S64_MAX;
enables |= BIT(RDMACG_RESOURCE_QP);
break;
+ case RDMACG_MR_VAL:
+ if (match_s64(&args[0], &intval)) {
+ ret = -EINVAL;
+ goto parse_err;
+ }
+ new_limits[RDMACG_RESOURCE_MR] = intval;
+ enables |= BIT(RDMACG_RESOURCE_MR);
+ break;
+ case RDMACG_MR_MAX:
+ new_limits[RDMACG_RESOURCE_MR] = S64_MAX;
+ enables |= BIT(RDMACG_RESOURCE_MR);
+ break;
default:
ret = -EINVAL;
goto parse_err;
--
2.43.0
* [RFC PATCH rdma-next 4/5] cgroup/rdma: add MR memory size per-type resource counting
2026-05-25 5:55 [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory Tao Cui
` (2 preceding siblings ...)
2026-05-25 5:55 ` [RFC PATCH rdma-next 3/5] cgroup/rdma: add MR " Tao Cui
@ 2026-05-25 5:55 ` Tao Cui
2026-05-25 5:55 ` [RFC PATCH rdma-next 5/5] cgroup/rdma: update cgroup resource list for QP, MR and MR_MEM Tao Cui
2026-05-25 13:43 ` [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory Jason Gunthorpe
5 siblings, 0 replies; 11+ messages in thread
From: Tao Cui @ 2026-05-25 5:55 UTC (permalink / raw)
To: tj, hannes, mkoutny, leon, jgg; +Cc: linux-rdma, cgroups, Tao Cui
Add RDMACG_RESOURCE_MR_MEM so that the cumulative memory size of
registered Memory Regions can be tracked and limited independently
from the MR count.
Unlike count-based resources (QP, MR) which are charged in the
generic IDR allocation path, MR_MEM is byte-based and must be
charged after the MR length is known. Charge in the uverbs MR
registration handlers (ioctl and legacy), and uncharge in the
generic destroy paths (alloc_abort_idr_uobject,
destroy_hw_idr_uobject).
Store the charged byte count in uobj->rdmacg_mr_mem_bytes so that the
abort and destroy paths uncharge exactly what was charged, including
the case where the charge succeeded but uobject finalization was never
reached.
Enable MR memory limits:
echo "mlx5_0 mr_mem=1073741824" > rdma.max
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
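The bookkeeping described above, where the charged byte count is recorded on the object and the destroy path uncharges only what was recorded, can be modelled like this. It is a simplified userspace sketch: struct mem_pool, struct mr_obj, mr_charge() and mr_destroy() are hypothetical names, and the explicit rejection of lengths above INT64_MAX is an assumption made here so that an unsigned 64-bit length never turns into a negative s64 amount.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical byte-based pool, standing in for the mr_mem resource. */
struct mem_pool {
	int64_t max;
	int64_t usage;
};

/* Hypothetical object; charged_bytes plays the role of rdmacg_mr_mem_bytes. */
struct mr_obj {
	int64_t charged_bytes;
};

/* Charge @length bytes and remember the amount on success. */
static int mr_charge(struct mem_pool *p, struct mr_obj *o, uint64_t length)
{
	if (length == 0)
		return 0;		/* nothing to account */
	if (length > (uint64_t)INT64_MAX)
		return -EINVAL;		/* would not fit a signed 64-bit amount */
	/* usage <= max is an invariant, so the subtraction cannot overflow. */
	if ((int64_t)length > p->max - p->usage)
		return -EAGAIN;
	p->usage += (int64_t)length;
	o->charged_bytes = (int64_t)length;
	return 0;
}

/* Uncharge exactly what was recorded; safe to call if nothing was charged. */
static void mr_destroy(struct mem_pool *p, struct mr_obj *o)
{
	if (o->charged_bytes) {
		assert(o->charged_bytes <= p->usage);
		p->usage -= o->charged_bytes;
		o->charged_bytes = 0;
	}
}
```

Recording the amount only after a successful charge is what keeps the destroy path idempotent: an object whose charge failed carries zero and is skipped.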
drivers/infiniband/core/rdma_core.c | 4 +++
drivers/infiniband/core/uverbs_cmd.c | 12 +++++++
drivers/infiniband/core/uverbs_std_types_mr.c | 32 +++++++++++++++++++
include/linux/cgroup_rdma.h | 1 +
include/rdma/ib_verbs.h | 1 +
kernel/cgroup/rdma.c | 17 ++++++++++
6 files changed, 67 insertions(+)
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 8fb0df4aa0af..cdae3d7d0931 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -555,6 +555,10 @@ static void rdmacg_uncharge_uobj(struct ib_uobject *uobj)
if (uobj->rdmacg_type != RDMACG_RESOURCE_HCA_OBJECT)
ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
uobj->rdmacg_type, 1);
+ if (uobj->rdmacg_mr_mem_bytes)
+ ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device,
+ RDMACG_RESOURCE_MR_MEM,
+ uobj->rdmacg_mr_mem_bytes);
}
static void alloc_abort_idr_uobject(struct ib_uobject *uobj)
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 9540ac180711..9f5a10da44dc 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -752,6 +752,15 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
uobj->object = mr;
uobj_put_obj_read(pd);
+
+ if (cmd.length) {
+ ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
+ RDMACG_RESOURCE_MR_MEM, cmd.length);
+ if (ret)
+ goto err_dereg;
+ uobj->rdmacg_mr_mem_bytes = cmd.length;
+ }
+
uobj_finalize_uobj_create(uobj, attrs);
resp.lkey = mr->lkey;
@@ -759,6 +768,9 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
resp.mr_handle = uobj->id;
return uverbs_response(attrs, &resp, sizeof(resp));
+err_dereg:
+ ib_dereg_mr_user(mr, &attrs->driver_udata);
+ goto err_free;
err_put:
uobj_put_obj_read(pd);
err_free:
diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index 570b9656801d..ffb7d1f97b20 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -34,6 +34,8 @@
#include "rdma_core.h"
#include "uverbs.h"
#include <rdma/uverbs_std_types.h>
+#include <linux/cgroup_rdma.h>
+#include "core_priv.h"
#include "restrack.h"
static int uverbs_free_mr(struct ib_uobject *uobject,
@@ -141,6 +143,16 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
rdma_restrack_add(&mr->res);
uobj->object = mr;
+ if (attr.length) {
+ ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
+ RDMACG_RESOURCE_MR_MEM, attr.length);
+ if (ret) {
+ ib_dereg_mr_user(mr, &attrs->driver_udata);
+ return ret;
+ }
+ uobj->rdmacg_mr_mem_bytes = attr.length;
+ }
+
uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_DM_MR_HANDLE);
ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DM_MR_RESP_LKEY, &mr->lkey,
@@ -254,6 +266,16 @@ static int UVERBS_HANDLER(UVERBS_METHOD_REG_DMABUF_MR)(
rdma_restrack_add(&mr->res);
uobj->object = mr;
+ if (length) {
+ ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
+ RDMACG_RESOURCE_MR_MEM, length);
+ if (ret) {
+ ib_dereg_mr_user(mr, &attrs->driver_udata);
+ return ret;
+ }
+ uobj->rdmacg_mr_mem_bytes = length;
+ }
+
uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
@@ -383,6 +405,16 @@ static int UVERBS_HANDLER(UVERBS_METHOD_REG_MR)(
rdma_restrack_add(&mr->res);
uobj->object = mr;
+ if (length) {
+ ret = ib_rdmacg_try_charge(&uobj->cg_obj, uobj->context->device,
+ RDMACG_RESOURCE_MR_MEM, length);
+ if (ret) {
+ ib_dereg_mr_user(mr, &attrs->driver_udata);
+ return ret;
+ }
+ uobj->rdmacg_mr_mem_bytes = length;
+ }
+
uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_MR_HANDLE);
ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_MR_RESP_LKEY,
diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
index 35caccc8eb8d..10a373b8148b 100644
--- a/include/linux/cgroup_rdma.h
+++ b/include/linux/cgroup_rdma.h
@@ -14,6 +14,7 @@ enum rdmacg_resource_type {
RDMACG_RESOURCE_HCA_OBJECT,
RDMACG_RESOURCE_QP,
RDMACG_RESOURCE_MR,
+ RDMACG_RESOURCE_MR_MEM,
RDMACG_RESOURCE_MAX,
};
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7dcb3955505b..951d8dfc98c4 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1570,6 +1570,7 @@ struct ib_uobject {
struct list_head list; /* link to context's list */
struct ib_rdmacg_object cg_obj; /* rdmacg object */
enum rdmacg_resource_type rdmacg_type; /* per-type cgroup index */
+ s64 rdmacg_mr_mem_bytes; /* charged MR memory size */
int id; /* index into kernel idr */
struct kref ref;
atomic_t usecnt; /* protects exclusive access */
diff --git a/kernel/cgroup/rdma.c b/kernel/cgroup/rdma.c
index a056a14d9af5..1386f93b9fbf 100644
--- a/kernel/cgroup/rdma.c
+++ b/kernel/cgroup/rdma.c
@@ -27,6 +27,8 @@ enum rdmacg_limit_tokens {
RDMACG_QP_MAX,
RDMACG_MR_VAL,
RDMACG_MR_MAX,
+ RDMACG_MR_MEM_VAL,
+ RDMACG_MR_MEM_MAX,
NR_RDMACG_LIMIT_TOKENS,
};
@@ -44,6 +46,8 @@ static const match_table_t rdmacg_limit_tokens = {
{ RDMACG_QP_MAX, "qp=max" },
{ RDMACG_MR_VAL, "mr=%d" },
{ RDMACG_MR_MAX, "mr=max" },
+ { RDMACG_MR_MEM_VAL, "mr_mem=%d" },
+ { RDMACG_MR_MEM_MAX, "mr_mem=max" },
{ NR_RDMACG_LIMIT_TOKENS, NULL },
};
@@ -70,6 +74,7 @@ static char const *rdmacg_resource_names[] = {
[RDMACG_RESOURCE_HCA_OBJECT] = "hca_object",
[RDMACG_RESOURCE_QP] = "qp",
[RDMACG_RESOURCE_MR] = "mr",
+ [RDMACG_RESOURCE_MR_MEM] = "mr_mem",
};
static_assert(ARRAY_SIZE(rdmacg_resource_names) == RDMACG_RESOURCE_MAX);
@@ -609,6 +614,18 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
new_limits[RDMACG_RESOURCE_MR] = S64_MAX;
enables |= BIT(RDMACG_RESOURCE_MR);
break;
+ case RDMACG_MR_MEM_VAL:
+ if (match_s64(&args[0], &intval)) {
+ ret = -EINVAL;
+ goto parse_err;
+ }
+ new_limits[RDMACG_RESOURCE_MR_MEM] = intval;
+ enables |= BIT(RDMACG_RESOURCE_MR_MEM);
+ break;
+ case RDMACG_MR_MEM_MAX:
+ new_limits[RDMACG_RESOURCE_MR_MEM] = S64_MAX;
+ enables |= BIT(RDMACG_RESOURCE_MR_MEM);
+ break;
default:
ret = -EINVAL;
goto parse_err;
--
2.43.0
* [RFC PATCH rdma-next 5/5] cgroup/rdma: update cgroup resource list for QP, MR and MR_MEM
2026-05-25 5:55 [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory Tao Cui
` (3 preceding siblings ...)
2026-05-25 5:55 ` [RFC PATCH rdma-next 4/5] cgroup/rdma: add MR memory size " Tao Cui
@ 2026-05-25 5:55 ` Tao Cui
2026-05-25 13:43 ` [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory Jason Gunthorpe
5 siblings, 0 replies; 11+ messages in thread
From: Tao Cui @ 2026-05-25 5:55 UTC (permalink / raw)
To: tj, hannes, mkoutny, leon, jgg; +Cc: linux-rdma, cgroups, Tao Cui
The RDMA cgroup now supports per-type resource counting for QP, MR
(count) and MR_MEM (cumulative memory size in bytes). Update the
RDMA section of cgroup-v2.rst to list all five resources and revise
the usage examples accordingly.
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
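For illustration only, not part of the patch: a minimal userspace sketch
of how a single "key=value" limit token from rdma.max could be parsed
into a 64-bit value. The helper name toy_parse_limit() is hypothetical
and strtoll() stands in for the kernel's kstrtoll(); the in-kernel
parser is table-driven and differs in detail. The point is only that
byte limits such as 8589934592 (8 GiB) need more than a 32-bit int.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse one token such as "mr_mem=1073741824" or
 * "mr_mem=max" for the given key. Returns 0 and stores the limit in
 * *out on success, -1 if the key does not match or the value is
 * malformed. "max" maps to INT64_MAX, mirroring S64_MAX in the patch.
 */
static int toy_parse_limit(const char *token, const char *key, int64_t *out)
{
	size_t klen;
	const char *val;
	char *end;
	long long v;

	assert(token && key && out);

	klen = strlen(key);
	if (strncmp(token, key, klen) != 0 || token[klen] != '=')
		return -1;

	val = token + klen + 1;
	if (strcmp(val, "max") == 0) {
		*out = INT64_MAX;
		return 0;
	}

	/* Reject empty values, signs and leading whitespace. */
	if (*val < '0' || *val > '9')
		return -1;

	errno = 0;
	v = strtoll(val, &end, 10);
	if (errno == ERANGE || *end != '\0')
		return -1;

	*out = (int64_t)v;
	return 0;
}
```

With this sketch, "mr_mem=1073741824" yields 1073741824, "mr_mem=max"
yields INT64_MAX, and a token for a different key such as "mr=5" is
rejected when the expected key is "mr_mem".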
Documentation/admin-guide/cgroup-v2.rst | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 993446ab66d0..512af59e302a 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2769,12 +2769,15 @@ RDMA Interface Files
========== =============================
hca_handle Maximum number of HCA Handles
hca_object Maximum number of HCA Objects
+ qp Maximum number of Queue Pairs
+ mr Maximum number of Memory Regions
+ mr_mem Maximum cumulative MR memory size in bytes
========== =============================
An example for mlx4 and ocrdma device follows::
- mlx4_0 hca_handle=2 hca_object=2000
- ocrdma1 hca_handle=3 hca_object=max
+ mlx4_0 hca_handle=2 hca_object=2000 qp=100 mr=500 mr_mem=1073741824
+ ocrdma1 hca_handle=3 hca_object=max qp=max mr=max mr_mem=max
rdma.current
A read-only file that describes current resource usage.
@@ -2782,8 +2785,8 @@ RDMA Interface Files
An example for mlx4 and ocrdma device follows::
- mlx4_0 hca_handle=1 hca_object=20
- ocrdma1 hca_handle=1 hca_object=23
+ mlx4_0 hca_handle=1 hca_object=20 qp=5 mr=10 mr_mem=10485760
+ ocrdma1 hca_handle=1 hca_object=23 qp=3 mr=8 mr_mem=8388608
rdma.peak
A read-only nested-keyed file that exists for all the cgroups
@@ -2792,8 +2795,8 @@ RDMA Interface Files
An example for mlx4 and ocrdma device follows::
- mlx4_0 hca_handle=1 hca_object=20
- ocrdma1 hca_handle=0 hca_object=23
+ mlx4_0 hca_handle=1 hca_object=20 qp=5 mr=10 mr_mem=10485760
+ ocrdma1 hca_handle=0 hca_object=23 qp=3 mr=8 mr_mem=8388608
rdma.events
A read-only nested-keyed file which exists on non-root
@@ -2815,7 +2818,7 @@ RDMA Interface Files
An example for mlx4 device follows::
- mlx4_0 hca_handle.max=5 hca_handle.alloc_fail=3 hca_object.max=0 hca_object.alloc_fail=0
+ mlx4_0 hca_handle.max=5 hca_handle.alloc_fail=3 hca_object.max=0 hca_object.alloc_fail=0 qp.max=0 qp.alloc_fail=0 mr.max=0 mr.alloc_fail=0 mr_mem.max=0 mr_mem.alloc_fail=0
rdma.events.local
Similar to rdma.events but the fields in the file are local
@@ -2836,7 +2839,7 @@ RDMA Interface Files
An example for mlx4 device follows::
- mlx4_0 hca_handle.max=5 hca_handle.alloc_fail=0 hca_object.max=0 hca_object.alloc_fail=0
+ mlx4_0 hca_handle.max=5 hca_handle.alloc_fail=0 hca_object.max=0 hca_object.alloc_fail=0 qp.max=0 qp.alloc_fail=0 mr.max=0 mr.alloc_fail=0 mr_mem.max=0 mr_mem.alloc_fail=0
DMEM
----
--
2.43.0
* Re: [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory
2026-05-25 5:55 [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory Tao Cui
` (4 preceding siblings ...)
2026-05-25 5:55 ` [RFC PATCH rdma-next 5/5] cgroup/rdma: update cgroup resource list for QP, MR and MR_MEM Tao Cui
@ 2026-05-25 13:43 ` Jason Gunthorpe
2026-05-27 11:28 ` Tao Cui
5 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2026-05-25 13:43 UTC (permalink / raw)
To: Tao Cui; +Cc: tj, hannes, mkoutny, leon, linux-rdma, cgroups
On Mon, May 25, 2026 at 01:55:01PM +0800, Tao Cui wrote:
> Currently the RDMA cgroup only tracks two aggregate counters:
> hca_handle and hca_object. This is too coarse for real-world
> deployment: a tenant can exhaust all HCA objects by creating nothing
> but QPs, while the administrator has no way to impose separate limits
> on QP count, MR count, or the cumulative memory registered through
> MRs.
This was a deliberate choice.
> - qp - Queue Pair count
> - mr - Memory Region count
> - mr_mem - Cumulative MR memory size in bytes
I would agree to mr_mem as a reasonable extension, but not to splitting
objects out at a finer grain. There are endless object types, and we
don't want 100 different cgroup knobs; that is not usable.
Jason
* Re: [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory
2026-05-25 13:43 ` [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory Jason Gunthorpe
@ 2026-05-27 11:28 ` Tao Cui
2026-05-27 13:34 ` Jason Gunthorpe
0 siblings, 1 reply; 11+ messages in thread
From: Tao Cui @ 2026-05-27 11:28 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: tj, hannes, mkoutny, leon, linux-rdma, cgroups
Hi Jason,
Thanks for the review.
On 2026/5/25 21:43, Jason Gunthorpe wrote:
>
> I would agree to mr_mem as a reasonable extension, but not to splitting
> objects out at a finer grain. There are endless object types, and we
> don't want 100 different cgroup knobs; that is not usable.
>
Understood. Our initial motivation was
multi-tenant isolation: a tenant could consume disproportionate
resources by creating many objects of a single type. In hindsight,
though, the real bottleneck is pinned memory, not object counts —
modern hardware has large object pools, and the scarce resource is
how much physical memory gets registered through MRs. mr_mem
addresses that directly, while hca_object remains sufficient for
coarse object accounting.
I'll send a v2 shortly that removes the qp and mr counters entirely
and retains only mr_mem.
---
Tao
* Re: [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory
2026-05-27 11:28 ` Tao Cui
@ 2026-05-27 13:34 ` Jason Gunthorpe
2026-05-28 7:55 ` Tao Cui
0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2026-05-27 13:34 UTC (permalink / raw)
To: Tao Cui; +Cc: tj, hannes, mkoutny, leon, linux-rdma, cgroups
On Wed, May 27, 2026 at 07:28:59PM +0800, Tao Cui wrote:
> Hi Jason,
>
> Thanks for the review.
>
> On 2026/5/25 21:43, Jason Gunthorpe wrote:
> >
> > I would agree to mr_mem as a reasonable extension, but not to splitting
> > objects out at a finer grain. There are endless object types, and we
> > don't want 100 different cgroup knobs; that is not usable.
> >
>
> Understood. Our initial motivation was
> multi-tenant isolation: a tenant could consume disproportionate
> resources by creating many objects of a single type. In hindsight,
> though, the real bottleneck is pinned memory, not object counts —
> modern hardware has large object pools, and the scarce resource is
> how much physical memory gets registered through MRs. mr_mem
> addresses that directly, while hca_object remains sufficient for
> coarse object accounting.
This was the same motivation that led us to a single object
limit. Inside a modern NIC the objects tend to draw from the same memory
pool, so it doesn't matter whether you have 100 QPs or 100 SRQs or
whatever; they all cost roughly the same.
Memory pin accounting should ideally be limited by the cgroup directly,
but we argued about that for a while and could never reach agreement
on an acceptable implementation. There are many nasty corner cases
around cgroups and fork and other cases, IIRC.
So I'm not sure whether making it RDMA-specific can easily solve these
problems.
Jason
* Re: [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory
2026-05-27 13:34 ` Jason Gunthorpe
@ 2026-05-28 7:55 ` Tao Cui
2026-05-28 13:06 ` Jason Gunthorpe
0 siblings, 1 reply; 11+ messages in thread
From: Tao Cui @ 2026-05-28 7:55 UTC (permalink / raw)
To: jgg; +Cc: cgroups, cuitao, hannes, leon, linux-rdma, mkoutny, tj
Hi Jason,
> Memory pin accounting should ideally be limited by the cgroup directly,
> but we argued about that for a while and could never reach agreement
> on an acceptable implementation. There are many nasty corner cases
> around cgroups and fork and other cases, IIRC.
>
> So I'm not sure whether making it RDMA-specific can easily solve these
> problems.
Thanks for the detailed context. I understand the concern — generic
pinned-page accounting at the memcg level has difficult ownership
semantics around fork(), cgroup migration, shared mappings, and page
lifetime tracking.
The intent of mr_mem is narrower and RDMA-scoped. It is not page-level
ownership tracking — it is object-based accounting tied to the MR
lifetime:
- charged at MR registration time
- uncharged at MR destruction time
- the charge lives with the MR's creating cgroup for the entire
lifetime of the MR object
This model intentionally defines accounting semantics around MR
object lifetime rather than page ownership:
1. fork(): The accounting model is based on MR object ownership
rather than ownership of the underlying pages after fork().
fork() does not duplicate MR objects. Even though the child
inherits the uverbs fd and can access the parent's ucontext,
the MR remains a single kernel object — fork itself creates no
additional MR registrations or associated RDMA resource accounting.
The charge is tied to the MR object, not to the number of processes
that can reach it, so no splitting or re-accounting is needed.
2. Cgroup migration: mr_mem follows the same semantics as the existing
hca_object — charge at creation time against the invoking task's
cgroup, uncharge at destruction time. The RDMA cgroup does not
implement can_attach/attach callbacks today, so charges do not
migrate with the task. This is a known limitation that applies
equally to hca_handle and hca_object. mr_mem does not introduce
any new complication here.
3. Overlap with memory cgroup: mr_mem does not count process memory
usage — it represents a per-device DMA registration budget: how
much memory can this cgroup register through a given HCA. This is
a different dimension from what memory cgroup tracks. An
administrator might set mr_mem limits differently per device, which
memory cgroup cannot express.
In particular, mr_mem tracks the registered memory range associated
with the MR rather than exact dynamically pinned pages (e.g. for
ODP MRs). This is a stable, policy-oriented approximation of
registration footprint — not an attempt at precise physical page
accounting.
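As a rough illustration of the model described above, here is a
userspace sketch, not the kernel code. All names (toy_cg_budget,
toy_mr, toy_reg_mr, toy_dereg_mr) are hypothetical. The charge is
taken once at registration, recorded in the MR object together with
the budget that paid for it, and returned to that same budget at
destruction, regardless of which process triggers the teardown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-cgroup, per-device budget; not the kernel's structs. */
struct toy_cg_budget {
	int64_t max;	/* mr_mem limit in bytes */
	int64_t usage;	/* bytes currently charged */
};

/* Hypothetical MR object: remembers who was charged and how much. */
struct toy_mr {
	struct toy_cg_budget *owner;
	int64_t charged;
};

/* Charge amount bytes, or fail without side effects if over the limit. */
static int toy_try_charge(struct toy_cg_budget *cg, int64_t amount)
{
	if (amount < 0 || amount > cg->max - cg->usage)
		return -1;
	cg->usage += amount;
	return 0;
}

static void toy_uncharge(struct toy_cg_budget *cg, int64_t amount)
{
	assert(amount >= 0 && amount <= cg->usage);
	cg->usage -= amount;
}

/* Registration: charge first, then create the object that owns the charge. */
static struct toy_mr *toy_reg_mr(struct toy_cg_budget *cg, int64_t length)
{
	struct toy_mr *mr;

	if (toy_try_charge(cg, length))
		return NULL;

	mr = malloc(sizeof(*mr));
	if (!mr) {
		toy_uncharge(cg, length);	/* roll back on failure */
		return NULL;
	}
	mr->owner = cg;
	mr->charged = length;
	return mr;
}

/* Destruction: return exactly what was charged to the creating budget. */
static void toy_dereg_mr(struct toy_mr *mr)
{
	assert(mr != NULL);
	toy_uncharge(mr->owner, mr->charged);
	free(mr);
}
```

Because the MR carries its own owner and charged amount, nothing in
this sketch needs to know which task currently holds a reference to
the MR, which is the property the fork() and migration points rely on.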
If you think this RDMA-scoped approach still has unresolved problems,
I'd appreciate guidance on which corner cases remain problematic.
Thanks,
Tao
* Re: [RFC PATCH rdma-next 0/5] cgroup/rdma: add per-type resource accounting for QP, MR and MR memory
2026-05-28 7:55 ` Tao Cui
@ 2026-05-28 13:06 ` Jason Gunthorpe
0 siblings, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2026-05-28 13:06 UTC (permalink / raw)
To: Tao Cui; +Cc: cgroups, hannes, leon, linux-rdma, mkoutny, tj
On Thu, May 28, 2026 at 03:55:37PM +0800, Tao Cui wrote:
> Hi Jason,
>
> > Memory pin accounting should ideally be limited by the cgroup directly,
> > but we argued about that for a while and could never reach agreement
> > on an acceptable implementation. There are many nasty corner cases
> > around cgroups and fork and other cases, IIRC.
> >
> > So I'm not sure whether making it RDMA-specific can easily solve these
> > problems.
>
> Thanks for the detailed context. I understand the concern — generic
> pinned-page accounting at the memcg level has difficult ownership
> semantics around fork(), cgroup migration, shared mappings, and page
> lifetime tracking.
>
> The intent of mr_mem is narrower and RDMA-scoped. It is not page-level
> ownership tracking — it is object-based accounting tied to the MR
> lifetime:
>
> - charged at MR registration time
> - uncharged at MR destruction time
> - the charge lives with the MR's creating cgroup for the entire
> lifetime of the MR object
Okay, that's an interesting framing. Perhaps it can work. You should
include this in the commit message and be sure to CC the cgroup
people.
Jason