* [PATCH v2 0/4] cgroup/rdma: add rdma.peak and rdma.events[.local]
@ 2026-05-13 10:49 Tao Cui
2026-05-13 10:49 ` [PATCH v2 1/4] cgroup/rdma: add rdma.peak for per-device peak usage tracking Tao Cui
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Tao Cui @ 2026-05-13 10:49 UTC (permalink / raw)
To: tj, hannes, mkoutny, cgroups; +Cc: Tao Cui
Hi,
This is v2 of the RDMA cgroup observability series. Thanks to the
reviewers for the detailed feedback on v1.
This series adds new cgroup interface files to the RDMA controller
to improve observability of resource usage and limit enforcement:
- rdma.peak: per-device high watermark of resource usage
- rdma.events: hierarchical max and alloc_fail event counters
- rdma.events.local: per-cgroup local max and alloc_fail counters
rdma.peak tracks the historical high watermark so administrators can
determine a sensible rdma.max based on actual peak demand rather than
guesswork. This is directly analogous to memory.peak.
rdma.events and rdma.events.local provide per-device counters that
track how often resource limits block allocations, and can be monitored
via poll/epoll for real-time alerting. Both files expose the same
keys (max and alloc_fail); rdma.events aggregates hierarchically while
rdma.events.local shows per-cgroup values. This follows the
pids.events / pids.events.local design.
Patch overview:
Patch 1 introduces rdma.peak. It adds a per-resource peak field that
tracks the high watermark of usage, updates it only after the full
hierarchical charge succeeds, and extends rpool lifetime to preserve
non-zero peak values.
Patch 2 adds rdma.events, which introduces rdmacg_event_locked() to
propagate hierarchical max counters upward from the over-limit
cgroup, with poll/epoll notification via cgroup_file_notify().
Patch 3 adds rdma.events.local and hierarchical alloc_fail, extending
the event framework with per-cgroup local counters (local_max for
the over-limit cgroup, local_alloc_fail for the requesting cgroup)
and a hierarchical alloc_fail counter propagated from the requestor
upward.
Patch 4 documents all three new interface files in cgroup-v2.rst.
Tao Cui (4):
cgroup/rdma: add rdma.peak for per-device peak usage tracking
cgroup/rdma: add rdma.events to track resource limit exhaustion
cgroup/rdma: add rdma.events.local for per-cgroup allocation failure
attribution
cgroup/rdma: document rdma.peak, rdma.events and rdma.events.local
Documentation/admin-guide/cgroup-v2.rst | 54 +++++++
include/linux/cgroup_rdma.h | 4 +
kernel/cgroup/rdma.c | 180 ++++++++++++++++++++++++
3 files changed, 238 insertions(+)
---
Changes in v2:
- Fix peak being updated before the full hierarchical charge succeeds.
- Use find_cg_rpool_locked() to avoid creating spurious rpools.
- Replace atomic64_t with u64 + READ_ONCE (all under rdmacg_mutex).
- Use key=value output format, remove trailing spaces.
- Always list all devices, show zero for devices without an rpool.
- Extend rpool-free condition to preserve non-zero event counters.
- Rename "failcnt" to "alloc_fail" (cgroup v2 naming convention).
- Fix alloc_fail semantics: local to the requesting cgroup only.
- Add hierarchical alloc_fail to rdma.events for key consistency.
- Add documentation in Documentation/admin-guide/cgroup-v2.rst.
v1:
https://lore.kernel.org/all/20260512031719.273507-1-cuitao@kylinos.cn/
--
2.43.0
* [PATCH v2 1/4] cgroup/rdma: add rdma.peak for per-device peak usage tracking
From: Tao Cui @ 2026-05-13 10:49 UTC (permalink / raw)
To: tj, hannes, mkoutny, cgroups; +Cc: Tao Cui
rdma.peak tracks the high watermark of resource usage per device,
giving administrators a concrete baseline for setting rdma.max.
Sampling rdma.current from userspace is no substitute, since it would
miss short-lived spikes between samples.
This interface is analogous to memory.peak.
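For illustration, the intended semantics can be modeled in plain C,
simplified to a single cgroup and a single resource (the struct and
function names here are made up, not the kernel's):

```c
/* Userspace model of the rdma.peak semantics: usage rises and falls,
 * but peak only ratchets upward, and only once a charge has fully
 * succeeded. */
struct res {
	int max;
	int usage;
	int peak;
};

/* Returns 0 on success, -1 when the charge would exceed max. */
static int charge(struct res *r, int n)
{
	if (r->usage + n > r->max)
		return -1;
	r->usage += n;
	if (r->usage > r->peak)		/* peak updated only after success */
		r->peak = r->usage;
	return 0;
}

static void uncharge(struct res *r, int n)
{
	r->usage -= n;			/* peak is deliberately left untouched */
}
```

A failed charge leaves both usage and peak untouched, which is why the
real patch moves the peak update after the whole hierarchical loop.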
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
kernel/cgroup/rdma.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/kernel/cgroup/rdma.c b/kernel/cgroup/rdma.c
index 3df7c38ce481..4e3bf0bade18 100644
--- a/kernel/cgroup/rdma.c
+++ b/kernel/cgroup/rdma.c
@@ -44,6 +44,7 @@ static LIST_HEAD(rdmacg_devices);
enum rdmacg_file_type {
RDMACG_RESOURCE_TYPE_MAX,
RDMACG_RESOURCE_TYPE_STAT,
+ RDMACG_RESOURCE_TYPE_PEAK,
};
/*
@@ -60,6 +61,7 @@ static char const *rdmacg_resource_names[] = {
struct rdmacg_resource {
int max;
int usage;
+ int peak;
};
/*
@@ -204,6 +206,17 @@ uncharge_cg_locked(struct rdma_cgroup *cg,
rpool->usage_sum--;
if (rpool->usage_sum == 0 &&
rpool->num_max_cnt == RDMACG_RESOURCE_MAX) {
+ int i;
+
+ /*
+ * Keep the rpool alive if any peak value is non-zero,
+ * so that rdma.peak persists as a historical high-
+ * watermark even after all resources are freed.
+ */
+ for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
+ if (rpool->resources[i].peak)
+ return;
+ }
/*
* No user of the rpool and all entries are set to max, so
* safe to delete this rpool.
@@ -310,6 +323,12 @@ int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
}
}
}
+ /* Update peak only after all charges succeed */
+ for (p = cg; p; p = parent_rdmacg(p)) {
+ rpool = find_cg_rpool_locked(p, device);
+ if (rpool && rpool->resources[index].usage > rpool->resources[index].peak)
+ rpool->resources[index].peak = rpool->resources[index].usage;
+ }
mutex_unlock(&rdmacg_mutex);
*rdmacg = cg;
@@ -472,6 +491,12 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
if (rpool->usage_sum == 0 &&
rpool->num_max_cnt == RDMACG_RESOURCE_MAX) {
+ int i;
+
+ for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
+ if (rpool->resources[i].peak)
+ goto dev_err;
+ }
/*
* No user of the rpool and all entries are set to max, so
* safe to delete this rpool.
@@ -506,6 +531,8 @@ static void print_rpool_values(struct seq_file *sf,
value = rpool->resources[i].max;
else
value = S32_MAX;
+ } else if (sf_type == RDMACG_RESOURCE_TYPE_PEAK) {
+ value = rpool ? rpool->resources[i].peak : 0;
} else {
if (rpool)
value = rpool->resources[i].usage;
@@ -556,6 +583,12 @@ static struct cftype rdmacg_files[] = {
.private = RDMACG_RESOURCE_TYPE_STAT,
.flags = CFTYPE_NOT_ON_ROOT,
},
+ {
+ .name = "peak",
+ .seq_show = rdmacg_resource_read,
+ .private = RDMACG_RESOURCE_TYPE_PEAK,
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
{ } /* terminate */
};
@@ -575,6 +608,13 @@ rdmacg_css_alloc(struct cgroup_subsys_state *parent)
static void rdmacg_css_free(struct cgroup_subsys_state *css)
{
struct rdma_cgroup *cg = css_rdmacg(css);
+ struct rdmacg_resource_pool *rpool, *tmp;
+
+ /* Clean up rpools kept alive by non-zero peak values */
+ mutex_lock(&rdmacg_mutex);
+ list_for_each_entry_safe(rpool, tmp, &cg->rpools, cg_node)
+ free_cg_rpool_locked(rpool);
+ mutex_unlock(&rdmacg_mutex);
kfree(cg);
}
--
2.43.0
* [PATCH v2 2/4] cgroup/rdma: add rdma.events to track resource limit exhaustion
From: Tao Cui @ 2026-05-13 10:49 UTC (permalink / raw)
To: tj, hannes, mkoutny, cgroups; +Cc: Tao Cui
Add per-device hierarchical event counters to track when RDMA resource
limits are exceeded. The rdma.events file reports max event counts
propagated upward from the cgroup whose limit was hit to all ancestors.
This mirrors the design of pids.events, where events are attributed to
the cgroup that imposed the limit, not necessarily the cgroup where the
allocation was attempted. Userspace can monitor this file via
poll/epoll for real-time notification of resource exhaustion.
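A monitoring sketch, following the usual kernfs notification pattern
(drain the file, poll for POLLPRI, rewind, re-read); the helper name
and buffer size are illustrative:

```c
#include <poll.h>
#include <unistd.h>

/* Wait for a modification event on an already-open cgroup events
 * file, e.g. <cgroup>/rdma.events.  Returns 1 on an event, 0 on
 * timeout, -1 on error. */
static int wait_for_rdma_event(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI };
	char buf[256];
	int n;

	/* Drain current contents first; kernfs signals POLLPRI only
	 * for notifications that arrive after the last read. */
	while (read(fd, buf, sizeof(buf)) > 0)
		;

	n = poll(&pfd, 1, timeout_ms);
	if (n > 0 && (pfd.revents & POLLPRI)) {
		lseek(fd, 0, SEEK_SET);	/* rewind so the caller can re-read */
		return 1;
	}
	return n;
}
```

On wakeup the caller re-reads the file to see which key changed, since
the notification itself carries no payload.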
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
include/linux/cgroup_rdma.h | 3 ++
kernel/cgroup/rdma.c | 72 +++++++++++++++++++++++++++++++++++--
2 files changed, 73 insertions(+), 2 deletions(-)
diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
index 80edae03c313..ac691fe7d3f5 100644
--- a/include/linux/cgroup_rdma.h
+++ b/include/linux/cgroup_rdma.h
@@ -24,6 +24,9 @@ struct rdma_cgroup {
* that belongs to this cgroup.
*/
struct list_head rpools;
+
+ /* Handle for rdma.events */
+ struct cgroup_file events_file;
};
struct rdmacg_device {
diff --git a/kernel/cgroup/rdma.c b/kernel/cgroup/rdma.c
index 4e3bf0bade18..2b729976b9e9 100644
--- a/kernel/cgroup/rdma.c
+++ b/kernel/cgroup/rdma.c
@@ -81,6 +81,9 @@ struct rdmacg_resource_pool {
u64 usage_sum;
/* total number counts which are set to max */
int num_max_cnt;
+
+ /* per-resource hierarchical max event counters */
+ u64 events_max[RDMACG_RESOURCE_MAX];
};
static struct rdma_cgroup *css_rdmacg(struct cgroup_subsys_state *css)
@@ -214,7 +217,8 @@ uncharge_cg_locked(struct rdma_cgroup *cg,
* watermark even after all resources are freed.
*/
for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
- if (rpool->resources[i].peak)
+ if (rpool->resources[i].peak ||
+ READ_ONCE(rpool->events_max[i]))
return;
}
/*
@@ -225,6 +229,34 @@ uncharge_cg_locked(struct rdma_cgroup *cg,
}
}
+/**
+ * rdmacg_event_locked - fire hierarchical max event when resource limit is hit
+ * @over_cg: cgroup whose limit was exceeded
+ * @device: rdma device
+ * @index: resource type index
+ *
+ * Must be called under rdmacg_mutex. Propagates max event counts
+ * from @over_cg (including itself) upward to all ancestors with
+ * an rpool and notifies userspace.
+ */
+static void rdmacg_event_locked(struct rdma_cgroup *over_cg,
+ struct rdmacg_device *device,
+ enum rdmacg_resource_type index)
+{
+ struct rdmacg_resource_pool *rpool;
+ struct rdma_cgroup *p;
+
+ lockdep_assert_held(&rdmacg_mutex);
+
+ for (p = over_cg; parent_rdmacg(p); p = parent_rdmacg(p)) {
+ rpool = find_cg_rpool_locked(p, device);
+ if (rpool) {
+ rpool->events_max[index]++;
+ cgroup_file_notify(&p->events_file);
+ }
+ }
+}
+
/**
* rdmacg_uncharge_hierarchy - hierarchically uncharge rdma resource count
* @cg: pointer to cg to uncharge and all parents in hierarchy
@@ -335,6 +367,8 @@ int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
return 0;
err:
+ if (ret == -EAGAIN)
+ rdmacg_event_locked(p, device, index);
mutex_unlock(&rdmacg_mutex);
rdmacg_uncharge_hierarchy(cg, device, p, index);
return ret;
@@ -494,7 +528,8 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
int i;
for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
- if (rpool->resources[i].peak)
+ if (rpool->resources[i].peak ||
+ READ_ONCE(rpool->events_max[i]))
goto dev_err;
}
/*
@@ -569,6 +604,33 @@ static int rdmacg_resource_read(struct seq_file *sf, void *v)
return 0;
}
+static int rdmacg_events_show(struct seq_file *sf, void *v)
+{
+ struct rdma_cgroup *cg = css_rdmacg(seq_css(sf));
+ struct rdmacg_resource_pool *rpool;
+ struct rdmacg_device *device;
+ int i;
+
+ mutex_lock(&rdmacg_mutex);
+
+ list_for_each_entry(device, &rdmacg_devices, dev_node) {
+ rpool = find_cg_rpool_locked(cg, device);
+
+ seq_printf(sf, "%s ", device->name);
+ for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
+ seq_printf(sf, "%s.max=%lld",
+ rdmacg_resource_names[i],
+ rpool ? (s64)READ_ONCE(rpool->events_max[i]) : 0);
+ if (i < RDMACG_RESOURCE_MAX - 1)
+ seq_putc(sf, ' ');
+ }
+ seq_putc(sf, '\n');
+ }
+
+ mutex_unlock(&rdmacg_mutex);
+ return 0;
+}
+
static struct cftype rdmacg_files[] = {
{
.name = "max",
@@ -589,6 +651,12 @@ static struct cftype rdmacg_files[] = {
.private = RDMACG_RESOURCE_TYPE_PEAK,
.flags = CFTYPE_NOT_ON_ROOT,
},
+ {
+ .name = "events",
+ .seq_show = rdmacg_events_show,
+ .file_offset = offsetof(struct rdma_cgroup, events_file),
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
{ } /* terminate */
};
--
2.43.0
* [PATCH v2 3/4] cgroup/rdma: add rdma.events.local for per-cgroup allocation failure attribution
From: Tao Cui @ 2026-05-13 10:49 UTC (permalink / raw)
To: tj, hannes, mkoutny, cgroups; +Cc: Tao Cui
Add per-cgroup local event counters to track RDMA resource limit
exhaustion from the perspective of individual cgroups. The
rdma.events.local file reports two per-resource counters:
- max: number of times this cgroup's limit was the one that blocked
an allocation in the subtree
- alloc_fail: number of allocation attempts originating from this
cgroup that failed due to an ancestor's limit
This mirrors the design of pids.events.local, where events are
attributed to the cgroup that imposed the limit, not necessarily the
cgroup where the allocation was attempted.
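The attribution rules can be summarized with a toy model (a plain
parent-pointer tree; the real walk additionally stops below the root
and skips cgroups without an rpool for the device):

```c
/* Toy model of the event attribution: the cgroup whose limit blocked
 * the allocation gets local max, the cgroup that attempted it gets
 * local alloc_fail, and both hierarchical counters propagate to all
 * ancestors. */
struct cg {
	struct cg *parent;
	unsigned long long max, alloc_fail;		/* rdma.events */
	unsigned long long local_max, local_alloc_fail;	/* rdma.events.local */
};

static void fire_event(struct cg *requester, struct cg *over_limit)
{
	struct cg *p;

	over_limit->local_max++;
	requester->local_alloc_fail++;
	for (p = over_limit; p; p = p->parent)	/* hierarchical max */
		p->max++;
	for (p = requester; p; p = p->parent)	/* hierarchical alloc_fail */
		p->alloc_fail++;
}
```

Note that when an ancestor's limit blocks a descendant's request, the
two local counters land on different cgroups, while both hierarchical
counters still meet on the common ancestors.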
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
include/linux/cgroup_rdma.h | 3 +-
kernel/cgroup/rdma.c | 94 ++++++++++++++++++++++++++++++++-----
2 files changed, 85 insertions(+), 12 deletions(-)
diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
index ac691fe7d3f5..404e746552ca 100644
--- a/include/linux/cgroup_rdma.h
+++ b/include/linux/cgroup_rdma.h
@@ -25,8 +25,9 @@ struct rdma_cgroup {
*/
struct list_head rpools;
- /* Handle for rdma.events */
+ /* Handles for rdma.events[.local] */
struct cgroup_file events_file;
+ struct cgroup_file events_local_file;
};
struct rdmacg_device {
diff --git a/kernel/cgroup/rdma.c b/kernel/cgroup/rdma.c
index 2b729976b9e9..5c94cf080655 100644
--- a/kernel/cgroup/rdma.c
+++ b/kernel/cgroup/rdma.c
@@ -82,8 +82,11 @@ struct rdmacg_resource_pool {
/* total number counts which are set to max */
int num_max_cnt;
- /* per-resource hierarchical max event counters */
+ /* per-resource event counters */
u64 events_max[RDMACG_RESOURCE_MAX];
+ u64 events_alloc_fail[RDMACG_RESOURCE_MAX];
+ u64 events_local_max[RDMACG_RESOURCE_MAX];
+ u64 events_local_alloc_fail[RDMACG_RESOURCE_MAX];
};
static struct rdma_cgroup *css_rdmacg(struct cgroup_subsys_state *css)
@@ -218,7 +221,10 @@ uncharge_cg_locked(struct rdma_cgroup *cg,
*/
for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
if (rpool->resources[i].peak ||
- READ_ONCE(rpool->events_max[i]))
+ READ_ONCE(rpool->events_max[i]) ||
+ READ_ONCE(rpool->events_local_max[i]) ||
+ READ_ONCE(rpool->events_alloc_fail[i]) ||
+ READ_ONCE(rpool->events_local_alloc_fail[i]))
return;
}
/*
@@ -230,16 +236,19 @@ uncharge_cg_locked(struct rdma_cgroup *cg,
}
/**
- * rdmacg_event_locked - fire hierarchical max event when resource limit is hit
+ * rdmacg_event_locked - fire event when resource allocation exceeds limit
+ * @cg: requesting cgroup
* @over_cg: cgroup whose limit was exceeded
* @device: rdma device
* @index: resource type index
*
- * Must be called under rdmacg_mutex. Propagates max event counts
- * from @over_cg (including itself) upward to all ancestors with
- * an rpool and notifies userspace.
+ * Must be called under rdmacg_mutex. Updates event counters in the
+ * resource pools of @cg and @over_cg, propagates hierarchical max
+ * events from @over_cg (including itself) upward, and notifies
+ * userspace via cgroup_file_notify().
*/
-static void rdmacg_event_locked(struct rdma_cgroup *over_cg,
+static void rdmacg_event_locked(struct rdma_cgroup *cg,
+ struct rdma_cgroup *over_cg,
struct rdmacg_device *device,
enum rdmacg_resource_type index)
{
@@ -248,6 +257,21 @@ static void rdmacg_event_locked(struct rdma_cgroup *over_cg,
lockdep_assert_held(&rdmacg_mutex);
+ /* Increment local alloc_fail in requesting cgroup */
+ rpool = find_cg_rpool_locked(cg, device);
+ if (rpool) {
+ rpool->events_local_alloc_fail[index]++;
+ cgroup_file_notify(&cg->events_local_file);
+ }
+
+ /* Increment local max in the over-limit cgroup */
+ rpool = find_cg_rpool_locked(over_cg, device);
+ if (rpool) {
+ rpool->events_local_max[index]++;
+ cgroup_file_notify(&over_cg->events_local_file);
+ }
+
+ /* Propagate hierarchical max events upward */
for (p = over_cg; parent_rdmacg(p); p = parent_rdmacg(p)) {
rpool = find_cg_rpool_locked(p, device);
if (rpool) {
@@ -255,6 +279,14 @@ static void rdmacg_event_locked(struct rdma_cgroup *over_cg,
cgroup_file_notify(&p->events_file);
}
}
+ /* Propagate hierarchical alloc_fail from requesting cgroup upward */
+ for (p = cg; parent_rdmacg(p); p = parent_rdmacg(p)) {
+ rpool = find_cg_rpool_locked(p, device);
+ if (rpool) {
+ rpool->events_alloc_fail[index]++;
+ cgroup_file_notify(&p->events_file);
+ }
+ }
}
/**
@@ -368,7 +400,7 @@ int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
err:
if (ret == -EAGAIN)
- rdmacg_event_locked(p, device, index);
+ rdmacg_event_locked(cg, p, device, index);
mutex_unlock(&rdmacg_mutex);
rdmacg_uncharge_hierarchy(cg, device, p, index);
return ret;
@@ -529,7 +561,10 @@ static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
if (rpool->resources[i].peak ||
- READ_ONCE(rpool->events_max[i]))
+ READ_ONCE(rpool->events_max[i]) ||
+ READ_ONCE(rpool->events_local_max[i]) ||
+ READ_ONCE(rpool->events_alloc_fail[i]) ||
+ READ_ONCE(rpool->events_local_alloc_fail[i]))
goto dev_err;
}
/*
@@ -618,9 +653,40 @@ static int rdmacg_events_show(struct seq_file *sf, void *v)
seq_printf(sf, "%s ", device->name);
for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
- seq_printf(sf, "%s.max=%lld",
+ seq_printf(sf, "%s.max=%lld %s.alloc_fail=%lld",
rdmacg_resource_names[i],
- rpool ? (s64)READ_ONCE(rpool->events_max[i]) : 0);
+ rpool ? (s64)READ_ONCE(rpool->events_max[i]) : 0,
+ rdmacg_resource_names[i],
+ rpool ? (s64)READ_ONCE(rpool->events_alloc_fail[i]) : 0);
+ if (i < RDMACG_RESOURCE_MAX - 1)
+ seq_putc(sf, ' ');
+ }
+ seq_putc(sf, '\n');
+ }
+
+ mutex_unlock(&rdmacg_mutex);
+ return 0;
+}
+
+static int rdmacg_events_local_show(struct seq_file *sf, void *v)
+{
+ struct rdma_cgroup *cg = css_rdmacg(seq_css(sf));
+ struct rdmacg_resource_pool *rpool;
+ struct rdmacg_device *device;
+ int i;
+
+ mutex_lock(&rdmacg_mutex);
+
+ list_for_each_entry(device, &rdmacg_devices, dev_node) {
+ rpool = find_cg_rpool_locked(cg, device);
+
+ seq_printf(sf, "%s ", device->name);
+ for (i = 0; i < RDMACG_RESOURCE_MAX; i++) {
+ seq_printf(sf, "%s.max=%lld %s.alloc_fail=%lld",
+ rdmacg_resource_names[i],
+ rpool ? (s64)READ_ONCE(rpool->events_local_max[i]) : 0,
+ rdmacg_resource_names[i],
+ rpool ? (s64)READ_ONCE(rpool->events_local_alloc_fail[i]) : 0);
if (i < RDMACG_RESOURCE_MAX - 1)
seq_putc(sf, ' ');
}
@@ -657,6 +723,12 @@ static struct cftype rdmacg_files[] = {
.file_offset = offsetof(struct rdma_cgroup, events_file),
.flags = CFTYPE_NOT_ON_ROOT,
},
+ {
+ .name = "events.local",
+ .seq_show = rdmacg_events_local_show,
+ .file_offset = offsetof(struct rdma_cgroup, events_local_file),
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
{ } /* terminate */
};
--
2.43.0
* [PATCH v2 4/4] cgroup/rdma: document rdma.peak, rdma.events and rdma.events.local
From: Tao Cui @ 2026-05-13 10:49 UTC (permalink / raw)
To: tj, hannes, mkoutny, cgroups; +Cc: Tao Cui
Add interface file documentation for the new rdma cgroup files to
Documentation/admin-guide/cgroup-v2.rst.
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
Documentation/admin-guide/cgroup-v2.rst | 54 +++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 6efd0095ed99..c8763300e827 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2785,6 +2785,60 @@ RDMA Interface Files
mlx4_0 hca_handle=1 hca_object=20
ocrdma1 hca_handle=1 hca_object=23
+ rdma.peak
+ A read-only nested-keyed file that exists for all the cgroups
+ except root. It shows the historical high watermark of
+ resource usage per device since the cgroup was created.
+
+ An example for mlx4 and ocrdma device follows::
+
+ mlx4_0 hca_handle=1 hca_object=20
+ ocrdma1 hca_handle=0 hca_object=23
+
+ rdma.events
+ A read-only nested-keyed file which exists on non-root
+ cgroups. The following nested keys are defined.
+
+ max
+ The number of times a process in this cgroup or its
+ descendants attempted an RDMA resource allocation that
+ was rejected because an rdma.max limit in the subtree
+ was reached. This is a hierarchical counter: the event
+ is propagated upward to all ancestor cgroups that
+ already have a resource pool for the device. A value
+ change in this file generates a file modified event.
+
+ alloc_fail
+ The number of RDMA resource allocation attempts that
+ originated in this cgroup or its descendants and failed
+ due to an rdma.max limit being reached. This is a
+ hierarchical counter propagated upward.
+
+ An example for mlx4 device follows::
+
+ mlx4_0 hca_handle.max=5 hca_handle.alloc_fail=3 hca_object.max=0 hca_object.alloc_fail=0
+
+ rdma.events.local
+ Similar to rdma.events but the fields in the file are local
+ to the cgroup i.e. not hierarchical. The file modified event
+ generated on this file reflects only the local events.
+
+ The following nested keys are defined.
+
+ max
+ The number of times a process in this cgroup or its
+ descendants attempted an RDMA resource allocation that
+ was rejected because this cgroup's own rdma.max limit
+ was reached.
+ alloc_fail
+ The number of RDMA resource allocation attempts
+ originating from this cgroup that failed due to this
+ cgroup's or an ancestor's rdma.max limit.
+
+ An example for mlx4 device follows::
+
+ mlx4_0 hca_handle.max=5 hca_handle.alloc_fail=0 hca_object.max=0 hca_object.alloc_fail=0
+
DMEM
----
--
2.43.0
* Re: [PATCH v2 0/4] cgroup/rdma: add rdma.peak and rdma.events[.local]
From: Tejun Heo @ 2026-05-13 20:27 UTC (permalink / raw)
To: Tao Cui; +Cc: hannes, mkoutny, cgroups
Hello,
v1 points are fully addressed. A few more on v2:
* In rdmacg_resource_set_max(), the new "rpool has peak/events" check
uses `goto dev_err` to skip free_cg_rpool_locked(). It works
because ret is still 0 at that point, but dev_err is the error
label and this isn't an error path. Restructure so the free is
guarded by an if, or rename the label.
* By the end of patch 3, the rpool-keep predicate is five lines
duplicated in uncharge_cg_locked() and rdmacg_resource_set_max().
Worth extracting into a rpool_has_persistent_state() helper, so
that a sixth counter added later changes one site, not two.
* Switching rdmacg_event_locked() from get_ to find_ avoids the
spurious-rpool problem I raised in v1, but it also means
ancestors of over_cg without a prior rpool for this device
silently drop the hierarchical event. Now that the rpool-keep
check covers event counters, get_ + keep-alive would give full
hierarchical coverage without the issue from v1 (rpools getting
freed on the next uncharge). The struct is small and rpool
presence isn't user-observable. Worth reconsidering — or, if you
keep find_, note the caveat in the rdma.events documentation.
* Patch 3 also extends rdma.events with hierarchical alloc_fail
but the commit message only describes rdma.events.local. Mention
the rdma.events change.
* In rdmacg_events_show() / rdmacg_events_local_show(), the
`(s64)READ_ONCE(u64) ... %lld` pattern can drop the cast and use
%llu.
Thanks.
--
tejun