* [PATCH RFC 0/2] cgroup/mem: add a node to double charge in memcg
@ 2026-04-03 14:08 Eric Chanudet
2026-04-03 14:08 ` [PATCH RFC 1/2] mm/memcontrol: add page-level charge/uncharge functions Eric Chanudet
2026-04-03 14:08 ` [PATCH RFC 2/2] cgroup/dmem: add a node to double charge in memcg Eric Chanudet
0 siblings, 2 replies; 4+ messages in thread
From: Eric Chanudet @ 2026-04-03 14:08 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard,
Natalie Vock, Tejun Heo, Michal Koutný
Cc: cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier,
Christian König, Maxime Ripard, Albert Esteve, Dave Airlie,
Eric Chanudet
It was suggested previously[1] to introduce a knob for dmem regions to
double-charge dmem and memcg at the administrator's discretion.
This RFC tries to do that in the dmem controller through the cgroupfs
interface already available, and walks through the problems that creates.
[1] https://lore.kernel.org/all/a446b598-5041-450b-aaa9-3c39a09ff6a0@amd.com/
Signed-off-by: Eric Chanudet <echanude@redhat.com>
---
Eric Chanudet (2):
mm/memcontrol: add page-level charge/uncharge functions
cgroup/dmem: add a node to double charge in memcg
include/linux/memcontrol.h | 4 +++
kernel/cgroup/dmem.c | 86 ++++++++++++++++++++++++++++++++++++++++++++--
mm/memcontrol.c | 24 +++++++++++++
3 files changed, 111 insertions(+), 3 deletions(-)
---
base-commit: 4b9c36c83b34f710da9573291404f6a2246251c1
change-id: 20260327-cgroup-dmem-memcg-double-charge-0f100a9ffbf2
Best regards,
--
Eric Chanudet <echanude@redhat.com>
* [PATCH RFC 1/2] mm/memcontrol: add page-level charge/uncharge functions
2026-04-03 14:08 [PATCH RFC 0/2] cgroup/mem: add a node to double charge in memcg Eric Chanudet
@ 2026-04-03 14:08 ` Eric Chanudet
2026-04-03 17:15 ` Johannes Weiner
2026-04-03 14:08 ` [PATCH RFC 2/2] cgroup/dmem: add a node to double charge in memcg Eric Chanudet
1 sibling, 1 reply; 4+ messages in thread
From: Eric Chanudet @ 2026-04-03 14:08 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard,
Natalie Vock, Tejun Heo, Michal Koutný
Cc: cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier,
Christian König, Maxime Ripard, Albert Esteve, Dave Airlie,
Eric Chanudet
Expose functions to charge/uncharge a memcg by a number of pages instead
of a folio.
Signed-off-by: Eric Chanudet <echanude@redhat.com>
---
include/linux/memcontrol.h | 4 ++++
mm/memcontrol.c | 24 ++++++++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 70b685a85bf4cd0e830c9c0253e4d48f75957fe4..32f03890f13e06551fc910515eb478597c1235d8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -642,6 +642,8 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp);
+int mem_cgroup_try_charge_pages(struct mem_cgroup *memcg, gfp_t gfp_mask,
+ unsigned int nr_pages);
/**
* mem_cgroup_charge - Charge a newly allocated folio to a cgroup.
* @folio: Folio to charge.
@@ -692,6 +694,8 @@ static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
__mem_cgroup_uncharge_folios(folios);
}
+void mem_cgroup_uncharge_pages(struct mem_cgroup *memcg, unsigned int nr_pages);
+
void mem_cgroup_replace_folio(struct folio *old, struct folio *new);
void mem_cgroup_migrate(struct folio *old, struct folio *new);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 772bac21d15584ce495cba6ad2eebfa7f693677f..49ed069a2dafd5d26d77e6737dffe7e64ba5118c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4764,6 +4764,24 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
return ret;
}
+/**
+ * mem_cgroup_try_charge_pages - charge pages to a memory cgroup
+ * @memcg: memory cgroup to charge
+ * @gfp_mask: reclaim mode
+ * @nr_pages: number of pages to charge
+ *
+ * Try to charge @nr_pages to @memcg through try_charge_memcg.
+ *
+ * Returns 0 on success, an error code on failure.
+ */
+int mem_cgroup_try_charge_pages(struct mem_cgroup *memcg, gfp_t gfp_mask,
+ unsigned int nr_pages)
+{
+ return try_charge(memcg, gfp_mask, nr_pages);
+}
+EXPORT_SYMBOL_GPL(mem_cgroup_try_charge_pages);
+
+
/**
* mem_cgroup_charge_hugetlb - charge the memcg for a hugetlb folio
* @folio: folio being charged
@@ -4948,6 +4966,12 @@ void __mem_cgroup_uncharge_folios(struct folio_batch *folios)
uncharge_batch(&ug);
}
+void mem_cgroup_uncharge_pages(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+ memcg_uncharge(memcg, nr_pages);
+}
+EXPORT_SYMBOL_GPL(mem_cgroup_uncharge_pages);
+
/**
* mem_cgroup_replace_folio - Charge a folio's replacement.
* @old: Currently circulating folio.
--
2.52.0
* [PATCH RFC 2/2] cgroup/dmem: add a node to double charge in memcg
2026-04-03 14:08 [PATCH RFC 0/2] cgroup/mem: add a node to double charge in memcg Eric Chanudet
2026-04-03 14:08 ` [PATCH RFC 1/2] mm/memcontrol: add page-level charge/uncharge functions Eric Chanudet
@ 2026-04-03 14:08 ` Eric Chanudet
1 sibling, 0 replies; 4+ messages in thread
From: Eric Chanudet @ 2026-04-03 14:08 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard,
Natalie Vock, Tejun Heo, Michal Koutný
Cc: cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier,
Christian König, Maxime Ripard, Albert Esteve, Dave Airlie,
Eric Chanudet
Introduce /cgroupfs/<>/dmem.memcg to make allocations in a
dmem-controlled region also be charged to memcg.
This is disabled by default and requires the administrator to configure
it through the cgroupfs before the first charge occurs.
The memcg is derived from the pool's cgroup, if it exists, since the
pool holds a ref to the dmem cgroup state keeping the cgroup alive and
stable.
The resulting behavior is quirky. Since keeping track of each allocation
would add a fair amount of logic without solving the problem entirely,
the memcg switch is locked once the first charge is issued. Having this
as a dynamic configuration does not seem relevant anyway.
Signed-off-by: Eric Chanudet <echanude@redhat.com>
---
kernel/cgroup/dmem.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 83 insertions(+), 3 deletions(-)
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 9d95824dc6fa09422274422313b63c25986596de..b65ae8cf0c302ce3773a7aa5f0d6d8223d2c10c9 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -17,6 +17,7 @@
#include <linux/refcount.h>
#include <linux/rculist.h>
#include <linux/slab.h>
+#include <linux/memcontrol.h>
struct dmem_cgroup_region {
/**
@@ -76,6 +77,9 @@ struct dmem_cgroup_pool_state {
refcount_t ref;
bool inited;
+
+ bool memcg;
+ bool memcg_locked;
};
/*
@@ -162,6 +166,14 @@ set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val)
page_counter_set_max(&pool->cnt, val);
}
+static void
+set_resource_memcg(struct dmem_cgroup_pool_state *pool, u64 val)
+{
+ /* Cannot change once a charge happened. */
+ if (!pool->memcg_locked)
+ pool->memcg = !!val;
+}
+
static u64 get_resource_low(struct dmem_cgroup_pool_state *pool)
{
return pool ? READ_ONCE(pool->cnt.low) : 0;
@@ -182,11 +194,17 @@ static u64 get_resource_current(struct dmem_cgroup_pool_state *pool)
return pool ? page_counter_read(&pool->cnt) : 0;
}
+static u64 get_resource_memcg(struct dmem_cgroup_pool_state *pool)
+{
+ return pool ? READ_ONCE(pool->memcg) : 0;
+}
+
static void reset_all_resource_limits(struct dmem_cgroup_pool_state *rpool)
{
set_resource_min(rpool, 0);
set_resource_low(rpool, 0);
set_resource_max(rpool, PAGE_COUNTER_MAX);
+ set_resource_memcg(rpool, 0);
}
static void dmemcs_offline(struct cgroup_subsys_state *css)
@@ -609,6 +627,20 @@ get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
return pool;
}
+static struct mem_cgroup *mem_cgroup_from_cgroup(struct cgroup *c)
+{
+ struct cgroup_subsys_state *css;
+
+ if (mem_cgroup_disabled())
+ return NULL;
+
+ rcu_read_lock();
+ css = cgroup_e_css(c, &memory_cgrp_subsys);
+ rcu_read_unlock();
+
+ return mem_cgroup_from_css(css);
+}
+
/**
* dmem_cgroup_uncharge() - Uncharge a pool.
* @pool: Pool to uncharge.
@@ -624,6 +656,13 @@ void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size)
return;
page_counter_uncharge(&pool->cnt, size);
+
+ struct mem_cgroup *memcg = mem_cgroup_from_cgroup(pool->cs->css.cgroup);
+
+ if (pool->memcg && memcg)
+ mem_cgroup_uncharge_pages(memcg,
+ PAGE_ALIGN(size) >> PAGE_SHIFT);
+
css_put(&pool->cs->css);
dmemcg_pool_put(pool);
}
@@ -655,6 +694,8 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
struct dmemcg_state *cg;
struct dmem_cgroup_pool_state *pool;
struct page_counter *fail;
+ struct mem_cgroup *memcg;
+ unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
int ret;
*ret_pool = NULL;
@@ -670,7 +711,22 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
pool = get_cg_pool_unlocked(cg, region);
if (IS_ERR(pool)) {
ret = PTR_ERR(pool);
- goto err;
+ goto err_css_put;
+ }
+
+ pool->memcg_locked = true;
+ memcg = get_mem_cgroup_from_current();
+ if (pool->memcg && memcg) {
+ ret = mem_cgroup_try_charge_pages(memcg, GFP_KERNEL, nr_pages);
+ if (ret) {
+ /*
+ * No dmem_cgroup_state_evict_valuable() could help,
+ * there's no ret_limit_pool to return.
+ */
+ ret = -ENOMEM;
+ dmemcg_pool_put(pool);
+ goto err_memcg_put;
+ }
}
if (!page_counter_try_charge(&pool->cnt, size, &fail)) {
@@ -681,14 +737,21 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
}
dmemcg_pool_put(pool);
ret = -EAGAIN;
- goto err;
+ goto err_uncharge_memcg;
}
+ mem_cgroup_put(memcg);
+
/* On success, reference from get_current_dmemcs is transferred to *ret_pool */
*ret_pool = pool;
return 0;
-err:
+err_uncharge_memcg:
+ if (pool->memcg && memcg)
+ mem_cgroup_uncharge_pages(memcg, nr_pages);
+err_memcg_put:
+ mem_cgroup_put(memcg);
+err_css_put:
css_put(&cg->css);
return ret;
}
@@ -846,6 +909,17 @@ static ssize_t dmem_cgroup_region_max_write(struct kernfs_open_file *of,
return dmemcg_limit_write(of, buf, nbytes, off, set_resource_max);
}
+static int dmem_cgroup_memcg_show(struct seq_file *sf, void *v)
+{
+ return dmemcg_limit_show(sf, v, get_resource_memcg);
+}
+
+static ssize_t dmem_cgroup_memcg_write(struct kernfs_open_file *of, char *buf,
+ size_t nbytes, loff_t off)
+{
+ return dmemcg_limit_write(of, buf, nbytes, off, set_resource_memcg);
+}
+
static struct cftype files[] = {
{
.name = "capacity",
@@ -874,6 +948,12 @@ static struct cftype files[] = {
.seq_show = dmem_cgroup_region_max_show,
.flags = CFTYPE_NOT_ON_ROOT,
},
+ {
+ .name = "memcg",
+ .write = dmem_cgroup_memcg_write,
+ .seq_show = dmem_cgroup_memcg_show,
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
{ } /* Zero entry terminates. */
};
--
2.52.0
* Re: [PATCH RFC 1/2] mm/memcontrol: add page-level charge/uncharge functions
2026-04-03 14:08 ` [PATCH RFC 1/2] mm/memcontrol: add page-level charge/uncharge functions Eric Chanudet
@ 2026-04-03 17:15 ` Johannes Weiner
0 siblings, 0 replies; 4+ messages in thread
From: Johannes Weiner @ 2026-04-03 17:15 UTC (permalink / raw)
To: Eric Chanudet
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
Andrew Morton, Maarten Lankhorst, Maxime Ripard, Natalie Vock,
Tejun Heo, Michal Koutný, cgroups, linux-mm, linux-kernel,
dri-devel, T.J. Mercier, Christian König, Maxime Ripard,
Albert Esteve, Dave Airlie
On Fri, Apr 03, 2026 at 10:08:35AM -0400, Eric Chanudet wrote:
> Expose functions to charge/uncharge memcg with a number of pages instead
> of a folio.
>
> Signed-off-by: Eric Chanudet <echanude@redhat.com>
No naked number accounting, please. The reason existing charge paths
require you to pass an object is because there are other memory
attributes we need to track (such as NUMA node location).