* [PATCH 1/2] mm/page_counter: decouple peak_reset from peak_write
2026-05-06 11:58 [PATCH 0/2] cgroup/dmem: introduce a peak file Thadeu Lima de Souza Cascardo
@ 2026-05-06 11:58 ` Thadeu Lima de Souza Cascardo
2026-05-06 11:58 ` [PATCH 2/2] cgroup/dmem: introduce a peak file Thadeu Lima de Souza Cascardo
2026-05-06 13:53 ` [PATCH 0/2] " Michal Koutný
2 siblings, 0 replies; 6+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2026-05-06 11:58 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
Jonathan Corbet, Shuah Khan, Maarten Lankhorst, Maxime Ripard,
Natalie Vock, Tvrtko Ursulin
Cc: cgroups, linux-kernel, linux-mm, linux-doc, dri-devel,
Thadeu Lima de Souza Cascardo, kernel-dev
Create a new function of_peak_reset that resets the page_counter peak for a
given writer. This should allow it to be reused by other cgroups.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
include/linux/cgroup-defs.h | 6 ++++++
kernel/cgroup/cgroup.c | 32 ++++++++++++++++++++++++++++++++
mm/memcontrol.c | 42 ++++++++----------------------------------
3 files changed, 46 insertions(+), 34 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index f42563739d2e..a85044cb0553 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -22,6 +22,7 @@
#include <linux/workqueue.h>
#include <linux/bpf-cgroup-defs.h>
#include <linux/psi_types.h>
+#include <linux/page_counter.h>
#ifdef CONFIG_CGROUPS
@@ -868,11 +869,16 @@ struct cgroup_subsys {
extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
extern bool cgroup_enable_per_threadgroup_rwsem;
+#define OFP_PEAK_UNSET (((-1UL)))
+
struct cgroup_of_peak {
unsigned long value;
struct list_head list;
};
+void of_peak_reset(struct cgroup_of_peak *ofp, struct page_counter *pc,
+ struct list_head *watchers);
+
/**
* cgroup_threadgroup_change_begin - threadgroup exclusion for cgroups
* @tsk: target task
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 45c0b1ed687a..9b98a5cccf0e 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1981,6 +1981,38 @@ struct cgroup_of_peak *of_peak(struct kernfs_open_file *of)
return &ctx->peak;
}
+/**
+ * of_peak_reset - reset peak
+ * @ofp: open file context
+ * @pc: counter
+ * @watchers: list of other open file contexts
+ *
+ * This function updates all contexts in @watchers to the new usage of @pc.
+ * If @ofp is not in the list yet, that is, if its value is
+ * %OFP_PEAK_UNSET, it is added to @watchers list.
+ *
+ * A lock must be used to protect @watchers.
+ */
+void of_peak_reset(struct cgroup_of_peak *ofp, struct page_counter *pc,
+ struct list_head *watchers)
+{
+ unsigned long usage;
+ struct cgroup_of_peak *peer_ctx;
+
+ usage = page_counter_read(pc);
+ WRITE_ONCE(pc->local_watermark, usage);
+
+ list_for_each_entry(peer_ctx, watchers, list)
+ if (usage > peer_ctx->value)
+ WRITE_ONCE(peer_ctx->value, usage);
+
+ /* initial write, register watcher */
+ if (ofp->value == OFP_PEAK_UNSET)
+ list_add(&ofp->list, watchers);
+
+ WRITE_ONCE(ofp->value, usage);
+}
+
static void apply_cgroup_root_flags(unsigned int root_flags)
{
if (current->nsproxy->cgroup_ns == &init_cgroup_ns) {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c03d4787d466..8754927070d3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4517,8 +4517,6 @@ static u64 memory_current_read(struct cgroup_subsys_state *css,
return (u64)page_counter_read(&memcg->memory) * PAGE_SIZE;
}
-#define OFP_PEAK_UNSET (((-1UL)))
-
static int peak_show(struct seq_file *sf, void *v, struct page_counter *pc)
{
struct cgroup_of_peak *ofp = of_peak(sf->private);
@@ -4563,45 +4561,18 @@ static void peak_release(struct kernfs_open_file *of)
spin_unlock(&memcg->peaks_lock);
}
-static ssize_t peak_write(struct kernfs_open_file *of, char *buf, size_t nbytes,
- loff_t off, struct page_counter *pc,
- struct list_head *watchers)
+static ssize_t memory_peak_write(struct kernfs_open_file *of, char *buf,
+ size_t nbytes, loff_t off)
{
- unsigned long usage;
- struct cgroup_of_peak *peer_ctx;
struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
struct cgroup_of_peak *ofp = of_peak(of);
spin_lock(&memcg->peaks_lock);
-
- usage = page_counter_read(pc);
- WRITE_ONCE(pc->local_watermark, usage);
-
- list_for_each_entry(peer_ctx, watchers, list)
- if (usage > peer_ctx->value)
- WRITE_ONCE(peer_ctx->value, usage);
-
- /* initial write, register watcher */
- if (ofp->value == OFP_PEAK_UNSET)
- list_add(&ofp->list, watchers);
-
- WRITE_ONCE(ofp->value, usage);
+ of_peak_reset(ofp, &memcg->memory, &memcg->memory_peaks);
spin_unlock(&memcg->peaks_lock);
-
return nbytes;
}
-static ssize_t memory_peak_write(struct kernfs_open_file *of, char *buf,
- size_t nbytes, loff_t off)
-{
- struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
-
- return peak_write(of, buf, nbytes, off, &memcg->memory,
- &memcg->memory_peaks);
-}
-
-#undef OFP_PEAK_UNSET
-
static int memory_min_show(struct seq_file *m, void *v)
{
return seq_puts_memcg_tunable(m,
@@ -5611,9 +5582,12 @@ static ssize_t swap_peak_write(struct kernfs_open_file *of, char *buf,
size_t nbytes, loff_t off)
{
struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+ struct cgroup_of_peak *ofp = of_peak(of);
- return peak_write(of, buf, nbytes, off, &memcg->swap,
- &memcg->swap_peaks);
+ spin_lock(&memcg->peaks_lock);
+ of_peak_reset(ofp, &memcg->swap, &memcg->swap_peaks);
+ spin_unlock(&memcg->peaks_lock);
+ return nbytes;
}
static int swap_high_show(struct seq_file *m, void *v)
--
2.47.3
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH 2/2] cgroup/dmem: introduce a peak file
2026-05-06 11:58 [PATCH 0/2] cgroup/dmem: introduce a peak file Thadeu Lima de Souza Cascardo
2026-05-06 11:58 ` [PATCH 1/2] mm/page_counter: decouple peak_reset from peak_write Thadeu Lima de Souza Cascardo
@ 2026-05-06 11:58 ` Thadeu Lima de Souza Cascardo
2026-05-06 13:53 ` [PATCH 0/2] " Michal Koutný
2 siblings, 0 replies; 6+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2026-05-06 11:58 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
Jonathan Corbet, Shuah Khan, Maarten Lankhorst, Maxime Ripard,
Natalie Vock, Tvrtko Ursulin
Cc: cgroups, linux-kernel, linux-mm, linux-doc, dri-devel,
Thadeu Lima de Souza Cascardo, kernel-dev
Just like we have memory.peak, introduce a dmem.peak, which uses the
page_counter support for that.
It can be written to in order to reset the peak, but different from
memory.peak, which expects any write, dmem.peak expects the region name to
be written to it. That region peak is the one that is reset.
That requires ofp_peak to carry a pointer to the pool that was reset.
Writing a different region name will reset the different region and make
the original region peak get back to its non-reset value.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
Documentation/admin-guide/cgroup-v2.rst | 10 +++
include/linux/cgroup-defs.h | 1 +
kernel/cgroup/dmem.c | 132 ++++++++++++++++++++++++++++++--
3 files changed, 137 insertions(+), 6 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 6efd0095ed99..3ba7ab3a36b3 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2808,6 +2808,16 @@ DMEM Interface Files
The semantics are the same as for the memory cgroup controller, and are
calculated in the same way.
+ dmem.peak
+ A readwrite nested-keyed file that exists on non-root cgroups.
+
+ The max memory usage recorded for the cgroup and its descendants since
+ either the creation of the cgroup or the most recent reset for that FD.
+
+ A write of a region name to this file resets it to the current memory
+ usage for subsequent reads through the same file descriptor for that
+ region.
+
dmem.capacity
A read-only file that describes maximum region capacity.
It only exists on the root cgroup. Not all memory can be
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index a85044cb0553..b536054bd916 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -874,6 +874,7 @@ extern bool cgroup_enable_per_threadgroup_rwsem;
struct cgroup_of_peak {
unsigned long value;
struct list_head list;
+ struct dmem_cgroup_pool_state *pool;
};
void of_peak_reset(struct cgroup_of_peak *ofp, struct page_counter *pc,
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 1ab1fb47f271..afa380c9839b 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -57,6 +57,9 @@ struct dmemcg_state {
struct cgroup_subsys_state css;
struct list_head pools;
+
+ /** @peaks_lock: Protects access to the pools' peaks lists */
+ spinlock_t peaks_lock;
};
struct dmem_cgroup_pool_state {
@@ -72,6 +75,10 @@ struct dmem_cgroup_pool_state {
struct rcu_head rcu;
struct page_counter cnt;
+
+ /* Protected by the dmemcg_state peaks_lock */
+ struct list_head peaks;
+
struct dmem_cgroup_pool_state *parent;
refcount_t ref;
@@ -162,26 +169,45 @@ set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val)
page_counter_set_max(&pool->cnt, val);
}
-static u64 get_resource_low(struct dmem_cgroup_pool_state *pool)
+static u64 get_resource_low(struct seq_file *sf, struct dmem_cgroup_pool_state *pool)
{
return pool ? READ_ONCE(pool->cnt.low) : 0;
}
-static u64 get_resource_min(struct dmem_cgroup_pool_state *pool)
+static u64 get_resource_min(struct seq_file *sf, struct dmem_cgroup_pool_state *pool)
{
return pool ? READ_ONCE(pool->cnt.min) : 0;
}
-static u64 get_resource_max(struct dmem_cgroup_pool_state *pool)
+static u64 get_resource_max(struct seq_file *sf, struct dmem_cgroup_pool_state *pool)
{
return pool ? READ_ONCE(pool->cnt.max) : PAGE_COUNTER_MAX;
}
-static u64 get_resource_current(struct dmem_cgroup_pool_state *pool)
+static u64 get_resource_current(struct seq_file *sf, struct dmem_cgroup_pool_state *pool)
{
return pool ? page_counter_read(&pool->cnt) : 0;
}
+static u64 get_resource_peak(struct seq_file *sf, struct dmem_cgroup_pool_state *pool)
+{
+ struct cgroup_of_peak *ofp = of_peak(sf->private);
+ u64 fd_peak, peak;
+ struct dmem_cgroup_pool_state *of_pool;
+
+ if (!pool)
+ return 0;
+
+ of_pool = READ_ONCE(ofp->pool);
+
+ fd_peak = READ_ONCE(ofp->value);
+ if (of_pool != pool || fd_peak == OFP_PEAK_UNSET)
+ peak = pool->cnt.watermark;
+ else
+ peak = max(fd_peak, READ_ONCE(pool->cnt.local_watermark));
+ return peak;
+}
+
static void reset_all_resource_limits(struct dmem_cgroup_pool_state *rpool)
{
set_resource_min(rpool, 0);
@@ -227,6 +253,7 @@ dmemcs_alloc(struct cgroup_subsys_state *parent_css)
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&dmemcs->pools);
+ spin_lock_init(&dmemcs->peaks_lock);
return &dmemcs->css;
}
@@ -377,6 +404,7 @@ alloc_pool_single(struct dmemcg_state *dmemcs, struct dmem_cgroup_region *region
ppool ? &ppool->cnt : NULL, true);
reset_all_resource_limits(pool);
refcount_set(&pool->ref, 1);
+ INIT_LIST_HEAD(&pool->peaks);
kref_get(®ion->ref);
if (ppool && !pool->parent) {
pool->parent = ppool;
@@ -784,7 +812,7 @@ static ssize_t dmemcg_limit_write(struct kernfs_open_file *of,
}
static int dmemcg_limit_show(struct seq_file *sf, void *v,
- u64 (*fn)(struct dmem_cgroup_pool_state *))
+ u64 (*fn)(struct seq_file *, struct dmem_cgroup_pool_state *))
{
struct dmemcg_state *dmemcs = css_to_dmemcs(seq_css(sf));
struct dmem_cgroup_region *region;
@@ -796,7 +824,7 @@ static int dmemcg_limit_show(struct seq_file *sf, void *v,
seq_puts(sf, region->name);
- val = fn(pool);
+ val = fn(sf, pool);
if (val < PAGE_COUNTER_MAX)
seq_printf(sf, " %lld\n", val);
else
@@ -807,6 +835,90 @@ static int dmemcg_limit_show(struct seq_file *sf, void *v,
return 0;
}
+static int dmem_cgroup_region_peak_open(struct kernfs_open_file *of)
+{
+ struct cgroup_of_peak *ofp = of_peak(of);
+
+ ofp->value = OFP_PEAK_UNSET;
+
+ return 0;
+}
+
+static void dmem_cgroup_region_peak_remove(struct cgroup_of_peak *ofp)
+{
+ struct dmem_cgroup_pool_state *pool;
+ struct dmemcg_state *dmemcs;
+
+ pool = xchg(&ofp->pool, NULL);
+ if (!pool)
+ return;
+
+ dmemcs = pool->cs;
+
+ spin_lock(&dmemcs->peaks_lock);
+ list_del(&ofp->list);
+ spin_unlock(&dmemcs->peaks_lock);
+
+ WRITE_ONCE(ofp->value, OFP_PEAK_UNSET);
+
+ dmemcg_pool_put(pool);
+}
+
+static void dmem_cgroup_region_peak_release(struct kernfs_open_file *of)
+{
+ struct cgroup_of_peak *ofp = of_peak(of);
+
+ if (ofp->value == OFP_PEAK_UNSET) {
+ /* fast path (no writes on this fd) */
+ return;
+ }
+
+ dmem_cgroup_region_peak_remove(ofp);
+}
+
+static ssize_t dmem_cgroup_region_peak_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ struct dmemcg_state *dmemcs = css_to_dmemcs(of_css(of));
+ struct cgroup_of_peak *ofp = of_peak(of);
+ struct dmem_cgroup_pool_state *pool = NULL;
+ struct dmem_cgroup_region *region;
+ int err = 0;
+
+ buf = strstrip(buf);
+ if (!buf[0])
+ return -EINVAL;
+
+ rcu_read_lock();
+ region = dmemcg_get_region_by_name(buf);
+ rcu_read_unlock();
+
+ if (!region)
+ return -EINVAL;
+
+ pool = get_cg_pool_unlocked(dmemcs, region);
+ if (IS_ERR(pool)) {
+ err = PTR_ERR(pool);
+ goto out_put;
+ }
+
+ dmem_cgroup_region_peak_remove(ofp);
+
+ xchg(&ofp->pool, pool);
+ spin_lock(&dmemcs->peaks_lock);
+ of_peak_reset(ofp, &pool->cnt, &pool->peaks);
+ spin_unlock(&dmemcs->peaks_lock);
+
+out_put:
+ kref_put(®ion->ref, dmemcg_free_region);
+ return err ?: nbytes;
+}
+
+static int dmem_cgroup_region_peak_show(struct seq_file *sf, void *v)
+{
+ return dmemcg_limit_show(sf, v, get_resource_peak);
+}
+
static int dmem_cgroup_region_current_show(struct seq_file *sf, void *v)
{
return dmemcg_limit_show(sf, v, get_resource_current);
@@ -855,6 +967,14 @@ static struct cftype files[] = {
.name = "current",
.seq_show = dmem_cgroup_region_current_show,
},
+ {
+ .name = "peak",
+ .open = dmem_cgroup_region_peak_open,
+ .release = dmem_cgroup_region_peak_release,
+ .write = dmem_cgroup_region_peak_write,
+ .seq_show = dmem_cgroup_region_peak_show,
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
{
.name = "min",
.write = dmem_cgroup_region_min_write,
--
2.47.3
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH 0/2] cgroup/dmem: introduce a peak file
2026-05-06 11:58 [PATCH 0/2] cgroup/dmem: introduce a peak file Thadeu Lima de Souza Cascardo
2026-05-06 11:58 ` [PATCH 1/2] mm/page_counter: decouple peak_reset from peak_write Thadeu Lima de Souza Cascardo
2026-05-06 11:58 ` [PATCH 2/2] cgroup/dmem: introduce a peak file Thadeu Lima de Souza Cascardo
@ 2026-05-06 13:53 ` Michal Koutný
2026-05-06 14:18 ` Thadeu Lima de Souza Cascardo
2 siblings, 1 reply; 6+ messages in thread
From: Michal Koutný @ 2026-05-06 13:53 UTC (permalink / raw)
To: Thadeu Lima de Souza Cascardo
Cc: Tejun Heo, Johannes Weiner, Michal Hocko, Roman Gushchin,
Shakeel Butt, Muchun Song, Andrew Morton, Jonathan Corbet,
Shuah Khan, Maarten Lankhorst, Maxime Ripard, Natalie Vock,
Tvrtko Ursulin, cgroups, linux-kernel, linux-mm, linux-doc,
dri-devel, kernel-dev
[-- Attachment #1: Type: text/plain, Size: 1205 bytes --]
Hello Thadeu.
On Wed, May 06, 2026 at 08:58:23AM -0300, Thadeu Lima de Souza Cascardo <cascardo@igalia.com> wrote:
> Just like we have memory.peak, introduce a dmem.peak, which uses the
> page_counter support for that.
>
> It can be written to in order to reset the peak, but different from
> memory.peak, which expects any write, dmem.peak expects the region name to
> be written to it. That region peak is the one that is reset.
>
> That requires ofp_peak to carry a pointer to the pool that was reset.
(It'd be nicer to have generic data in that generic structure, at least
some void *priv. But see below.)
> Writing a different region name will reset the different region and make
> the original region peak get back to its non-reset value.
I'm slightly confused by this fds x pool matricity when there's only
a single slot in cgroup_file_ctx::cgroup_of_peak.
The intended use case is that users should maintain one fd per pool and
not mix it up?
This stanza would better fit to cgroup-v2.rst proper than the commit
message. Or make it simpler and start with non-resettable peak file
(like memory.peak had started too) and see how it fares. WDYT?
Thanks,
Michal
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 265 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] cgroup/dmem: introduce a peak file
2026-05-06 13:53 ` [PATCH 0/2] " Michal Koutný
@ 2026-05-06 14:18 ` Thadeu Lima de Souza Cascardo
2026-05-06 15:09 ` Michal Koutný
0 siblings, 1 reply; 6+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2026-05-06 14:18 UTC (permalink / raw)
To: Michal Koutný
Cc: Tejun Heo, Johannes Weiner, Michal Hocko, Roman Gushchin,
Shakeel Butt, Muchun Song, Andrew Morton, Jonathan Corbet,
Shuah Khan, Maarten Lankhorst, Maxime Ripard, Natalie Vock,
Tvrtko Ursulin, cgroups, linux-kernel, linux-mm, linux-doc,
dri-devel, kernel-dev
On Wed, May 06, 2026 at 03:53:59PM +0200, Michal Koutný wrote:
> Hello Thadeu.
>
> On Wed, May 06, 2026 at 08:58:23AM -0300, Thadeu Lima de Souza Cascardo <cascardo@igalia.com> wrote:
> > Just like we have memory.peak, introduce a dmem.peak, which uses the
> > page_counter support for that.
> >
> > It can be written to in order to reset the peak, but different from
> > memory.peak, which expects any write, dmem.peak expects the region name to
> > be written to it. That region peak is the one that is reset.
> >
> > That requires ofp_peak to carry a pointer to the pool that was reset.
>
> (It'd be nicer to have generic data in that generic structure, at least
> some void *priv. But see below.)
>
I used void *, at first, but as the only current use is for the pool and as
mixing different uses may lead to misuse, I thought it would be safer to
use the type directly. This has been pointed out before for other members
of cgroup_file_ctx. See [1].
> > Writing a different region name will reset the different region and make
> > the original region peak get back to its non-reset value.
>
> I'm slightly confused by this fds x pool matricity when there's only
> a single slot in cgroup_file_ctx::cgroup_of_peak.
>
> The intended use case is that users should maintain one fd per pool and
> not mix it up?
> This stanza would better fit to cgroup-v2.rst proper than the commit
> message. Or make it simpler and start with non-resettable peak file
> (like memory.peak had started too) and see how it fares. WDYT?
>
That is also due to the limitation that each file descriptor has a single
linked list under cgroup_file_ctx::cgroup_of_peak. To allow for all the
pools to be reset at once, we would need one list per file descriptor.
But, on the other hand, as you pointed out, this leads to the flexibility
of being able to reset only one pool, while leaving the others as is.
Whereas, if one needs to reset all pools, they can use one file descriptor
per region.
I started with a non-resettable peak file, but as memory.peak can be reset,
I added that feature too. If we want to merge non-resettable support
earlier and need to take longer to discuss how to work on the resettable
support given the above, I can resubmit. But I guess we can see if we can
reach an agreement sooner rather than later.
Thanks.
Cascardo.
>
> Thanks,
> Michal
[1] https://lore.kernel.org/all/CAHk-=wgiYkECT=hZRKj8ZwfBPw2Uz=gpOGBGd4ny0KYhSsjC0w@mail.gmail.com/
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] cgroup/dmem: introduce a peak file
2026-05-06 14:18 ` Thadeu Lima de Souza Cascardo
@ 2026-05-06 15:09 ` Michal Koutný
0 siblings, 0 replies; 6+ messages in thread
From: Michal Koutný @ 2026-05-06 15:09 UTC (permalink / raw)
To: Thadeu Lima de Souza Cascardo
Cc: Tejun Heo, Johannes Weiner, Michal Hocko, Roman Gushchin,
Shakeel Butt, Muchun Song, Andrew Morton, Jonathan Corbet,
Shuah Khan, Maarten Lankhorst, Maxime Ripard, Natalie Vock,
Tvrtko Ursulin, cgroups, linux-kernel, linux-mm, linux-doc,
dri-devel, kernel-dev
[-- Attachment #1: Type: text/plain, Size: 1514 bytes --]
On Wed, May 06, 2026 at 11:18:26AM -0300, Thadeu Lima de Souza Cascardo <cascardo@igalia.com> wrote:
> I used void *, at first, but as the only current use is for the pool and as
> mixing different uses may lead to misuse, I thought it would be safer to
> use the type directly. This has been pointed out before for other members
> of cgroup_file_ctx. See [1].
That mail reacts to union overlaps and pointer vs embedded struct
allocations. Correct me if I missed your part.
I agree that having properly typed pointer is safer.
cgroup_file_ctx sub-structs are for generic cgroup files. But here
somehow a specific controller's state needs to be propagated to the
generic member.
What about storing also the `list_head *watchers` inside `struct
cgroup_of_peak` and each subsys would manage it as needed?
(ofp->watchers == NULL could also substitute ofp->value ==
OFP_PEAK_UNSET)
> I started with a non-resettable peak file, but as memory.peak can be reset,
> I added that feature too.
At the same time pids.peak has survived without reset option till today.
> If we want to merge a non-resettable support ealier and need to take
> longer to discuss how to work on the resettable support given the
> above, I can resubmit. But I guess we can see if we can reach an
> agreement sonner rather than later.
What kind of users do you envision (i.e. would they need resets at all)?
Anyway, the behavior should be explained in cgroup-v2.rst since that's
where they'll look for it.
HTH,
Michal
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 265 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread