* [PATCH 0/3] cgroup/rdma: add rdma.peak and rdma.events[.local]
@ 2026-05-12 3:17 Tao Cui
2026-05-12 3:17 ` [PATCH 1/3] cgroup/rdma: add rdma.peak for per-device peak usage tracking Tao Cui
From: Tao Cui @ 2026-05-12 3:17 UTC (permalink / raw)
To: tj, hannes, mkoutny, cgroups; +Cc: Tao Cui
Hi,
This series adds three new cgroup interface files to the RDMA controller
to improve observability of resource usage and limit enforcement:
- rdma.peak: per-device high watermark of resource usage
- rdma.events: hierarchical max event counters
- rdma.events.local: per-cgroup local max and failcnt counters
Why these interfaces?
Currently rdma.current only shows the instantaneous resource usage per
device. Administrators who need to set appropriate rdma.max limits have
no way to observe usage spikes or detect when limits are being hit.
rdma.peak addresses the observability gap: it tracks the historical high
watermark so administrators can determine a sensible rdma.max based on
actual peak demand rather than guesswork. This is directly analogous to
memory.peak.
rdma.events and rdma.events.local address the notification gap: they
provide per-device counters that track how often resource limits block
allocations, and can be monitored via poll/epoll for real-time alerting
when a cgroup hits its rdma.max. This follows the pids.events /
pids.events.local design, where events are attributed to the cgroup
whose limit was exceeded rather than the cgroup where the allocation was
attempted.
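For illustration, here is a minimal userspace sketch of the intended
monitoring flow (the cgroup path is a placeholder; this assumes the usual
cgroup notification semantics, where cgroup_file_notify() surfaces to
pollers as POLLPRI/POLLERR, as with memory.events):

	#include <fcntl.h>
	#include <poll.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		/* Placeholder path; point this at the cgroup of interest. */
		const char *path = "/sys/fs/cgroup/mygroup/rdma.events";
		struct pollfd pfd = { .events = POLLPRI };
		char buf[4096];
		ssize_t n;

		pfd.fd = open(path, O_RDONLY);
		if (pfd.fd < 0) {
			perror("open");
			return 1;
		}

		for (;;) {
			/* Blocks until the kernel notifies the file. */
			if (poll(&pfd, 1, -1) < 0) {
				perror("poll");
				return 1;
			}
			/* Re-read from the start to pick up the new counters. */
			lseek(pfd.fd, 0, SEEK_SET);
			n = read(pfd.fd, buf, sizeof(buf) - 1);
			if (n < 0) {
				perror("read");
				return 1;
			}
			buf[n] = '\0';
			printf("%s", buf);
		}
	}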
Patch overview:
Patch 1: rdma.peak
Adds peak tracking in the charge path and the rdma.peak interface
file. rpools are kept alive while peak is non-zero so the values
persist as historical records.
Patch 2: rdma.events
Adds hierarchical max event counters that propagate upward from the
cgroup whose limit was hit. Introduces rdmacg_event_locked() and
the rdma.events interface file with poll notification support.
Patch 3: rdma.events.local
Extends the event infrastructure with per-cgroup local counters:
local max counts how often this cgroup's limit blocked an allocation,
failcnt counts how often allocations from this subtree were rejected.
Adds the rdma.events.local interface file.
These patches have been tested locally.
Tao Cui (3):
cgroup/rdma: add rdma.peak for per-device peak usage tracking
cgroup/rdma: add rdma.events to track resource limit exhaustion
cgroup/rdma: add rdma.events.local for per-cgroup allocation failure
attribution
include/linux/cgroup_rdma.h | 4 +
kernel/cgroup/rdma.c | 165 +++++++++++++++++++++++++++++++++++-
2 files changed, 165 insertions(+), 4 deletions(-)
--
2.43.0
* [PATCH 1/3] cgroup/rdma: add rdma.peak for per-device peak usage tracking
2026-05-12 3:17 [PATCH 0/3] cgroup/rdma: add rdma.peak and rdma.events[.local] Tao Cui
@ 2026-05-12 3:17 ` Tao Cui
2026-05-12 3:17 ` [PATCH 2/3] cgroup/rdma: add rdma.events to track resource limit exhaustion Tao Cui
From: Tao Cui @ 2026-05-12 3:17 UTC (permalink / raw)
To: tj, hannes, mkoutny, cgroups; +Cc: Tao Cui
rdma.peak tracks the high watermark of resource usage per device,
giving a concrete baseline for setting rdma.max. Sampling
rdma.current is not sufficient for this, since short-lived spikes can
fall between samples.
This interface is analogous to memory.peak.
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
kernel/cgroup/rdma.c | 44 ++++++++++++++++++++++++++++++++++++++++----
1 file changed, 40 insertions(+), 4 deletions(-)
diff --git a/kernel/cgroup/rdma.c b/kernel/cgroup/rdma.c
index 3df7c38ce481..ed1f3f7996bd 100644
--- a/kernel/cgroup/rdma.c
+++ b/kernel/cgroup/rdma.c
@@ -44,6 +44,7 @@ static LIST_HEAD(rdmacg_devices);
enum rdmacg_file_type {
RDMACG_RESOURCE_TYPE_MAX,
RDMACG_RESOURCE_TYPE_STAT,
+ RDMACG_RESOURCE_TYPE_PEAK,
};
/*
@@ -60,6 +61,7 @@ static char const *rdmacg_resource_names[] = {
struct rdmacg_resource {
int max;
int usage;
+ int peak;
};
/*
@@ -204,9 +206,20 @@ uncharge_cg_locked(struct rdma_cgroup *cg,
rpool->usage_sum--;
if (rpool->usage_sum == 0 &&
rpool->num_max_cnt == RDMACG_RESOURCE_MAX) {
+ int i;
+
+ /*
+ * Keep the rpool alive if any peak value is non-zero,
+ * so that rdma.peak persists as a historical high-
+ * watermark even after all resources are freed.
+ */
+ for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
+ if (rpool->resources[i].peak)
+ return;
+ }
/*
- * No user of the rpool and all entries are set to max, so
- * safe to delete this rpool.
+ * No user of the rpool and all entries are
+ * set to max, so safe to delete this rpool.
*/
free_cg_rpool_locked(rpool);
}
@@ -306,6 +319,8 @@ int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
goto err;
} else {
rpool->resources[index].usage = new;
+ if (new > rpool->resources[index].peak)
+ rpool->resources[index].peak = new;
rpool->usage_sum++;
}
}
@@ -472,9 +487,15 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
if (rpool->usage_sum == 0 &&
rpool->num_max_cnt == RDMACG_RESOURCE_MAX) {
+ int i;
+
+ for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
+ if (rpool->resources[i].peak)
+ goto dev_err;
+ }
/*
- * No user of the rpool and all entries are set to max, so
- * safe to delete this rpool.
+ * No user of the rpool and all entries are
+ * set to max, so safe to delete this rpool.
*/
free_cg_rpool_locked(rpool);
}
@@ -506,6 +527,8 @@ static void print_rpool_values(struct seq_file *sf,
value = rpool->resources[i].max;
else
value = S32_MAX;
+ } else if (sf_type == RDMACG_RESOURCE_TYPE_PEAK) {
+ value = rpool ? rpool->resources[i].peak : 0;
} else {
if (rpool)
value = rpool->resources[i].usage;
@@ -556,6 +579,12 @@ static struct cftype rdmacg_files[] = {
.private = RDMACG_RESOURCE_TYPE_STAT,
.flags = CFTYPE_NOT_ON_ROOT,
},
+ {
+ .name = "peak",
+ .seq_show = rdmacg_resource_read,
+ .private = RDMACG_RESOURCE_TYPE_PEAK,
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
{ } /* terminate */
};
@@ -575,6 +604,13 @@ rdmacg_css_alloc(struct cgroup_subsys_state *parent)
static void rdmacg_css_free(struct cgroup_subsys_state *css)
{
struct rdma_cgroup *cg = css_rdmacg(css);
+ struct rdmacg_resource_pool *rpool, *tmp;
+
+ /* Clean up rpools kept alive by non-zero peak values */
+ mutex_lock(&rdmacg_mutex);
+ list_for_each_entry_safe(rpool, tmp, &cg->rpools, cg_node)
+ free_cg_rpool_locked(rpool);
+ mutex_unlock(&rdmacg_mutex);
kfree(cg);
}
--
2.43.0
* [PATCH 2/3] cgroup/rdma: add rdma.events to track resource limit exhaustion
2026-05-12 3:17 [PATCH 0/3] cgroup/rdma: add rdma.peak and rdma.events[.local] Tao Cui
2026-05-12 3:17 ` [PATCH 1/3] cgroup/rdma: add rdma.peak for per-device peak usage tracking Tao Cui
@ 2026-05-12 3:17 ` Tao Cui
2026-05-12 3:17 ` [PATCH 3/3] cgroup/rdma: add rdma.events.local for per-cgroup allocation failure attribution Tao Cui
2026-05-12 17:49 ` [PATCH 0/3] cgroup/rdma: add rdma.peak and rdma.events[.local] Tejun Heo
From: Tao Cui @ 2026-05-12 3:17 UTC (permalink / raw)
To: tj, hannes, mkoutny, cgroups; +Cc: Tao Cui
Add per-device hierarchical event counters to track when RDMA resource
limits are exceeded. The rdma.events file reports max event counts
propagated upward from the cgroup whose limit was hit to all ancestors.
This mirrors the design of pids.events, where events are attributed to
the cgroup that imposed the limit, not necessarily the cgroup where the
allocation was attempted. Userspace can monitor this file via
poll/epoll for real-time notification of resource exhaustion.
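For example, with two resources on one device the file would read
roughly as follows (device name and values are illustrative):

  mlx4_0 hca_handle.max 0 hca_object.max 3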
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
include/linux/cgroup_rdma.h | 3 ++
kernel/cgroup/rdma.c | 66 +++++++++++++++++++++++++++++++++++++
2 files changed, 69 insertions(+)
diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
index 80edae03c313..ac691fe7d3f5 100644
--- a/include/linux/cgroup_rdma.h
+++ b/include/linux/cgroup_rdma.h
@@ -24,6 +24,9 @@ struct rdma_cgroup {
* that belongs to this cgroup.
*/
struct list_head rpools;
+
+ /* Handle for rdma.events */
+ struct cgroup_file events_file;
};
struct rdmacg_device {
diff --git a/kernel/cgroup/rdma.c b/kernel/cgroup/rdma.c
index ed1f3f7996bd..66b853cf4ac8 100644
--- a/kernel/cgroup/rdma.c
+++ b/kernel/cgroup/rdma.c
@@ -8,6 +8,7 @@
* Copyright (C) 2016 Parav Pandit <pandit.parav@gmail.com>
*/
+#include <linux/atomic.h>
#include <linux/bitops.h>
#include <linux/limits.h>
#include <linux/slab.h>
@@ -81,6 +82,9 @@ struct rdmacg_resource_pool {
u64 usage_sum;
/* total number counts which are set to max */
int num_max_cnt;
+
+ /* per-resource hierarchical max event counters */
+ atomic64_t events_max[RDMACG_RESOURCE_MAX];
};
static struct rdma_cgroup *css_rdmacg(struct cgroup_subsys_state *css)
@@ -225,6 +229,33 @@ uncharge_cg_locked(struct rdma_cgroup *cg,
}
}
+/**
+ * rdmacg_event_locked - fire hierarchical max event when resource limit is hit
+ * @over_cg: cgroup whose limit was exceeded
+ * @device: rdma device
+ * @index: resource type index
+ *
+ * Must be called under rdmacg_mutex. Propagates max event counts upward
+ * from @over_cg to all ancestors and notifies userspace.
+ */
+static void rdmacg_event_locked(struct rdma_cgroup *over_cg,
+ struct rdmacg_device *device,
+ enum rdmacg_resource_type index)
+{
+ struct rdmacg_resource_pool *rpool;
+ struct rdma_cgroup *p;
+
+ lockdep_assert_held(&rdmacg_mutex);
+
+ for (p = over_cg; parent_rdmacg(p); p = parent_rdmacg(p)) {
+ rpool = get_cg_rpool_locked(p, device);
+ if (!IS_ERR(rpool)) {
+ atomic64_inc(&rpool->events_max[index]);
+ cgroup_file_notify(&p->events_file);
+ }
+ }
+}
+
/**
* rdmacg_uncharge_hierarchy - hierarchically uncharge rdma resource count
* @cg: pointer to cg to uncharge and all parents in hierarchy
@@ -331,6 +362,8 @@ int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
return 0;
err:
+ if (ret == -EAGAIN)
+ rdmacg_event_locked(p, device, index);
mutex_unlock(&rdmacg_mutex);
rdmacg_uncharge_hierarchy(cg, device, p, index);
return ret;
@@ -565,6 +598,33 @@ static int rdmacg_resource_read(struct seq_file *sf, void *v)
return 0;
}
+static int rdmacg_events_show(struct seq_file *sf, void *v)
+{
+ struct rdma_cgroup *cg = css_rdmacg(seq_css(sf));
+ struct rdmacg_resource_pool *rpool;
+ struct rdmacg_device *device;
+ int i;
+
+ mutex_lock(&rdmacg_mutex);
+
+ list_for_each_entry(device, &rdmacg_devices, dev_node) {
+ rpool = find_cg_rpool_locked(cg, device);
+ if (!rpool)
+ continue;
+
+ seq_printf(sf, "%s ", device->name);
+ for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
+ seq_printf(sf, "%s.max %lld ",
+ rdmacg_resource_names[i],
+ (s64)atomic64_read(&rpool->events_max[i]));
+ }
+ seq_putc(sf, '\n');
+ }
+
+ mutex_unlock(&rdmacg_mutex);
+ return 0;
+}
+
static struct cftype rdmacg_files[] = {
{
.name = "max",
@@ -585,6 +645,12 @@ static struct cftype rdmacg_files[] = {
.private = RDMACG_RESOURCE_TYPE_PEAK,
.flags = CFTYPE_NOT_ON_ROOT,
},
+ {
+ .name = "events",
+ .seq_show = rdmacg_events_show,
+ .file_offset = offsetof(struct rdma_cgroup, events_file),
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
{ } /* terminate */
};
--
2.43.0
* [PATCH 3/3] cgroup/rdma: add rdma.events.local for per-cgroup allocation failure attribution
2026-05-12 3:17 [PATCH 0/3] cgroup/rdma: add rdma.peak and rdma.events[.local] Tao Cui
2026-05-12 3:17 ` [PATCH 1/3] cgroup/rdma: add rdma.peak for per-device peak usage tracking Tao Cui
2026-05-12 3:17 ` [PATCH 2/3] cgroup/rdma: add rdma.events to track resource limit exhaustion Tao Cui
@ 2026-05-12 3:17 ` Tao Cui
2026-05-12 17:49 ` [PATCH 0/3] cgroup/rdma: add rdma.peak and rdma.events[.local] Tejun Heo
From: Tao Cui @ 2026-05-12 3:17 UTC (permalink / raw)
To: tj, hannes, mkoutny, cgroups; +Cc: Tao Cui
Add per-cgroup local event counters to track RDMA resource limit
exhaustion from the perspective of individual cgroups. The
rdma.events.local file reports two per-resource counters:
- max: number of times this cgroup's limit was the one that blocked
an allocation in the subtree
- failcnt: number of allocation attempts originating from this
cgroup (or its descendants) that failed due to an ancestor's limit
This mirrors the design of pids.events.local, where events are
attributed to the cgroup that imposed the limit, not necessarily the
cgroup where the allocation was attempted.
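For example (illustrative values):

  mlx4_0 hca_handle.max 1 hca_handle.failcnt 0 hca_object.max 0 hca_object.failcnt 2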
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
include/linux/cgroup_rdma.h | 3 +-
kernel/cgroup/rdma.c | 67 +++++++++++++++++++++++++++++++++----
2 files changed, 63 insertions(+), 7 deletions(-)
diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
index ac691fe7d3f5..404e746552ca 100644
--- a/include/linux/cgroup_rdma.h
+++ b/include/linux/cgroup_rdma.h
@@ -25,8 +25,9 @@ struct rdma_cgroup {
*/
struct list_head rpools;
- /* Handle for rdma.events */
+ /* Handles for rdma.events[.local] */
struct cgroup_file events_file;
+ struct cgroup_file events_local_file;
};
struct rdmacg_device {
diff --git a/kernel/cgroup/rdma.c b/kernel/cgroup/rdma.c
index 66b853cf4ac8..2c1e1a5d7b6d 100644
--- a/kernel/cgroup/rdma.c
+++ b/kernel/cgroup/rdma.c
@@ -83,8 +83,10 @@ struct rdmacg_resource_pool {
/* total number counts which are set to max */
int num_max_cnt;
- /* per-resource hierarchical max event counters */
+ /* per-resource event counters */
atomic64_t events_max[RDMACG_RESOURCE_MAX];
+ atomic64_t events_local_max[RDMACG_RESOURCE_MAX];
+ atomic64_t events_failcnt[RDMACG_RESOURCE_MAX];
};
static struct rdma_cgroup *css_rdmacg(struct cgroup_subsys_state *css)
@@ -230,15 +232,18 @@ uncharge_cg_locked(struct rdma_cgroup *cg,
}
/**
- * rdmacg_event_locked - fire hierarchical max event when resource limit is hit
+ * rdmacg_event_locked - fire event when resource allocation exceeds limit
+ * @cg: requesting cgroup
* @over_cg: cgroup whose limit was exceeded
* @device: rdma device
* @index: resource type index
*
- * Must be called under rdmacg_mutex. Propagates max event counts upward
- * from @over_cg to all ancestors and notifies userspace.
+ * Must be called under rdmacg_mutex. Updates event counters in the
+ * resource pools of @cg and @over_cg, propagates hierarchical max
+ * events upward, and notifies userspace via cgroup_file_notify().
*/
-static void rdmacg_event_locked(struct rdma_cgroup *over_cg,
+static void rdmacg_event_locked(struct rdma_cgroup *cg,
+ struct rdma_cgroup *over_cg,
struct rdmacg_device *device,
enum rdmacg_resource_type index)
{
@@ -247,6 +252,21 @@ static void rdmacg_event_locked(struct rdma_cgroup *over_cg,
lockdep_assert_held(&rdmacg_mutex);
+ /* Increment failcnt in requesting cgroup */
+ rpool = find_cg_rpool_locked(cg, device);
+ if (rpool) {
+ atomic64_inc(&rpool->events_failcnt[index]);
+ cgroup_file_notify(&cg->events_local_file);
+ }
+
+ /* Increment local max in the over-limit cgroup */
+ rpool = find_cg_rpool_locked(over_cg, device);
+ if (rpool) {
+ atomic64_inc(&rpool->events_local_max[index]);
+ cgroup_file_notify(&over_cg->events_local_file);
+ }
+
+ /* Propagate hierarchical max events upward, creating rpools as needed */
for (p = over_cg; parent_rdmacg(p); p = parent_rdmacg(p)) {
rpool = get_cg_rpool_locked(p, device);
if (!IS_ERR(rpool)) {
@@ -363,7 +383,7 @@ int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
err:
if (ret == -EAGAIN)
- rdmacg_event_locked(p, device, index);
+ rdmacg_event_locked(cg, p, device, index);
mutex_unlock(&rdmacg_mutex);
rdmacg_uncharge_hierarchy(cg, device, p, index);
return ret;
@@ -625,6 +645,35 @@ static int rdmacg_events_show(struct seq_file *sf, void *v)
return 0;
}
+static int rdmacg_events_local_show(struct seq_file *sf, void *v)
+{
+ struct rdma_cgroup *cg = css_rdmacg(seq_css(sf));
+ struct rdmacg_resource_pool *rpool;
+ struct rdmacg_device *device;
+ int i;
+
+ mutex_lock(&rdmacg_mutex);
+
+ list_for_each_entry(device, &rdmacg_devices, dev_node) {
+ rpool = find_cg_rpool_locked(cg, device);
+ if (!rpool)
+ continue;
+
+ seq_printf(sf, "%s ", device->name);
+ for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
+ seq_printf(sf, "%s.max %lld %s.failcnt %lld ",
+ rdmacg_resource_names[i],
+ (s64)atomic64_read(&rpool->events_local_max[i]),
+ rdmacg_resource_names[i],
+ (s64)atomic64_read(&rpool->events_failcnt[i]));
+ }
+ seq_putc(sf, '\n');
+ }
+
+ mutex_unlock(&rdmacg_mutex);
+ return 0;
+}
+
static struct cftype rdmacg_files[] = {
{
.name = "max",
@@ -651,6 +700,12 @@ static struct cftype rdmacg_files[] = {
.file_offset = offsetof(struct rdma_cgroup, events_file),
.flags = CFTYPE_NOT_ON_ROOT,
},
+ {
+ .name = "events.local",
+ .seq_show = rdmacg_events_local_show,
+ .file_offset = offsetof(struct rdma_cgroup, events_local_file),
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
{ } /* terminate */
};
--
2.43.0
* Re: [PATCH 0/3] cgroup/rdma: add rdma.peak and rdma.events[.local]
2026-05-12 3:17 [PATCH 0/3] cgroup/rdma: add rdma.peak and rdma.events[.local] Tao Cui
2026-05-12 3:17 ` [PATCH 3/3] cgroup/rdma: add rdma.events.local for per-cgroup allocation failure attribution Tao Cui
@ 2026-05-12 17:49 ` Tejun Heo
2026-05-13 1:51 ` Tao Cui
From: Tejun Heo @ 2026-05-12 17:49 UTC (permalink / raw)
To: Tao Cui; +Cc: hannes, mkoutny, cgroups
Hello,
The list below is from an AI-assisted review with some input from me.
* Patches 2 and 3 don't extend the rpool-free condition in
uncharge_cg_locked() and rdmacg_resource_set_max() to the new event
counters, so a "set limit -> hit limit -> uncharge to 0 -> write
'max max'" sequence frees the rpool and zeros the counts.
* rdmacg_event_locked() creates rpools in ancestors of over_cg via
get_cg_rpool_locked() just to host event counters. Those rpools have
usage_sum==0, num_max_cnt==max, peak==0, so the next real uncharge
through any such ancestor frees them.
* Patch 3 says failcnt covers "this cgroup (or its descendants)" but
the code only increments the directly-requesting cgroup. Either the
description or the propagation is wrong.
* rdma.events / rdma.events.local print "mlx4_0 hca_handle.max 5
hca_object.max 0 " (trailing space). That doesn't match any of the
formats in Documentation/admin-guide/cgroup-v2.rst. rdma.current and
rdma.max are nested-keyed; the new files should be too:
"mlx4_0 hca_handle.max=5 hca_object.max=0".
* Please document rdma.peak / rdma.events / rdma.events.local in
Documentation/admin-guide/cgroup-v2.rst.
* "failcnt" is cgroup-v1 vocabulary; pids.events.local uses
"fork_fail" for the same role.
* Event counters are atomic64_t but all updates are under
rdmacg_mutex. Plain u64 with READ_ONCE on the read side would do.
* Patch 1 reflows an unrelated comment ("No user of the rpool ...");
please drop the churn.
Thanks.
--
tejun
* Re: [PATCH 0/3] cgroup/rdma: add rdma.peak and rdma.events[.local]
2026-05-12 17:49 ` [PATCH 0/3] cgroup/rdma: add rdma.peak and rdma.events[.local] Tejun Heo
@ 2026-05-13 1:51 ` Tao Cui
From: Tao Cui @ 2026-05-13 1:51 UTC (permalink / raw)
To: Tejun Heo; +Cc: hannes, mkoutny, cgroups
Hello, Tejun,
Thank you very much for your review.
On 2026/5/13 1:49, Tejun Heo wrote:
> * Patches 2 and 3 don't extend the rpool-free condition in
> uncharge_cg_locked() and rdmacg_resource_set_max() to the new event
> counters, so a "set limit -> hit limit -> uncharge to 0 -> write
> 'max max'" sequence frees the rpool and zeros the counts.
>
The rpool-free condition in both uncharge_cg_locked()
and rdmacg_resource_set_max() only checks peak but misses the event
counters (events_max, events_local_max, events_failcnt). This means a
non-zero event counter can be silently discarded when the rpool is
freed. I'll add a helper that checks all persistent data (peak +
event counters) and use it in both sites.
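Roughly what I have in mind (the name and placement are placeholders,
and the counters may end up as plain u64 per your later point):

	/* true if the rpool carries history that should survive going idle */
	static bool rpool_has_history(struct rdmacg_resource_pool *rpool)
	{
		int i;

		for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
			if (rpool->resources[i].peak ||
			    atomic64_read(&rpool->events_max[i]) ||
			    atomic64_read(&rpool->events_local_max[i]) ||
			    atomic64_read(&rpool->events_failcnt[i]))
				return true;
		}
		return false;
	}

Both uncharge_cg_locked() and rdmacg_resource_set_max() would then only
call free_cg_rpool_locked() when !rpool_has_history(rpool).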
> * rdmacg_event_locked() creates rpools in ancestors of over_cg via
> get_cg_rpool_locked() just to host event counters. Those rpools have
> usage_sum==0, num_max_cnt==max, peak==0, so the next real uncharge
> through any such ancestor frees them.
>
Agreed. Using get_cg_rpool_locked() in the propagation loop was wrong
-- it allocates rpools in ancestors that never had any resource
configuration for the device, just to hold an event counter. These
empty rpools then get freed on the next uncharge, losing the event
data. I'll switch to find_cg_rpool_locked() so events are only
recorded in rpools that already exist.
> * Patch 3 says failcnt covers "this cgroup (or its descendants)" but
> the code only increments the directly-requesting cgroup. Either the
> description or the propagation is wrong.
>
The description is wrong. The code only increments failcnt in the
directly-requesting cgroup, not in its ancestors, which is consistent
with how pids.events.local tracks local attribution. I'll fix the
commit message to say "originating from this cgroup" instead of
"originating from this cgroup or its descendants".
> * rdma.events / rdma.events.local print "mlx4_0 hca_handle.max 5
> hca_object.max 0 " (trailing space). That doesn't match any of the
> formats in Documentation/admin-guide/cgroup-v2.rst. rdma.current and
> rdma.max are nested-keyed; the new files should be too:
> "mlx4_0 hca_handle.max=5 hca_object.max=0".
>
Will fix. I'll switch to the nested-keyed format with '=' and remove
the trailing space so the output matches rdma.max / rdma.current:
mlx4_0 hca_handle.max=5 hca_object.max=0
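i.e. in rdmacg_events_show() something like (untested):

	seq_printf(sf, "%s", device->name);
	for (i = 0; i < RDMACG_RESOURCE_MAX; i++)
		seq_printf(sf, " %s.max=%lld",
			   rdmacg_resource_names[i],
			   (s64)atomic64_read(&rpool->events_max[i]));
	seq_putc(sf, '\n');

with the same change applied to rdmacg_events_local_show().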
> * Please document rdma.peak / rdma.events / rdma.events.local in
> Documentation/admin-guide/cgroup-v2.rst.
>
Will add in the next revision.
> * "failcnt" is cgroup-v1 vocabulary; pids.events.local uses
> "fork_fail" for the same role.
>
Agreed. I'll rename "failcnt" to "fail" to follow the cgroup-v2
naming convention.
> * Event counters are atomic64_t but all updates are under
> rdmacg_mutex. Plain u64 with READ_ONCE on the read side would do.
>
I also noticed this. I have a version using plain u64 with READ_ONCE
on the read side, and it is currently being tested locally. Since the
change touches a hot path in the charge/uncharge code, I want to be
cautious and verify that there are no regressions before sending it
out.
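Roughly, the version being tested looks like this (sketch only):

	/* struct rdmacg_resource_pool */
	u64 events_max[RDMACG_RESOURCE_MAX];

	/* charge path, still under rdmacg_mutex */
	WRITE_ONCE(rpool->events_max[index],
		   rpool->events_max[index] + 1);

	/* show path */
	seq_printf(sf, " %s.max=%llu",
		   rdmacg_resource_names[i],
		   (unsigned long long)READ_ONCE(rpool->events_max[i]));

(WRITE_ONCE on the update side to pair with lockless READ_ONCE readers;
the same pattern applies to the local counters.)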
> * Patch 1 reflows an unrelated comment ("No user of the rpool ...");
> please drop the churn.
>
Sorry for the noise. I'll revert the comment reflow and keep the
original formatting.
I'll send v2 with all the fixes above. Thank you for the thorough
review.
Thanks.
--
Tao