* [PATCH v2 0/2] cgroup/dmem: allow double-charging dmem allocations to memcg
@ 2026-05-19 15:59 Eric Chanudet
2026-05-19 15:59 ` [PATCH v2 1/2] mm/memcontrol: add dmem charge/uncharge functions Eric Chanudet
2026-05-19 15:59 ` [PATCH v2 2/2] cgroup/dmem: add dmem.memcg control file for double-charging to memcg Eric Chanudet
0 siblings, 2 replies; 8+ messages in thread
From: Eric Chanudet @ 2026-05-19 15:59 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard,
Natalie Vock, Tejun Heo, Michal Koutný, Jonathan Corbet,
Shuah Khan
Cc: cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier,
Christian König, Maxime Ripard, Albert Esteve, Dave Airlie,
linux-doc, Eric Chanudet
Following suggestion[1], offer a cgroupfs entry to allow an
administrator to request that a dmem controlled region also charges to
the memory controller.
Add mem_cgroup_dmem_charge/uncharge helpers to resolve the effective
cgroup from a dmem pool's cgroup, perform the charge and update a
MEMCG_DMEM stat counter.
Add a "dmem.memcg" control file at the root level to configure memcg
charging per region. The setting is disabled by default and locked on
first charge attempt.
[1] https://lore.kernel.org/all/a446b598-5041-450b-aaa9-3c39a09ff6a0@amd.com/
Signed-off-by: Eric Chanudet <echanude@redhat.com>
---
Changes in v2:
- Use mem_cgroup_dmem_{,un}charge to account for memcg pages instead of
exposing raw nr_pages functions. Use it to centralize where to find
the effective cgroup from the pool's cgroup (Johannes)
- Set depends_on for cgrp_memory if CONFIG_MEMCG by having a memory
controller in children cgroup (Michal)
- Move dmem.memcg to the root level as it applies by region for all
cgroups
- Add a dmem memory.stats entry for reporting memcg charges for dmem
allocations.
- Wrap the memcg enable/disable/lock configuration under a single state
to avoid toctou races and simplify transitions.
- Link to v1: https://lore.kernel.org/r/20260403-cgroup-dmem-memcg-double-charge-v1-0-c371d155de2a@redhat.com
---
Eric Chanudet (2):
mm/memcontrol: add dmem charge/uncharge functions
cgroup/dmem: add dmem.memcg control file for double-charging to memcg
Documentation/admin-guide/cgroup-v2.rst | 23 +++++
include/linux/memcontrol.h | 16 ++++
kernel/cgroup/dmem.c | 158 +++++++++++++++++++++++++++++++-
mm/memcontrol.c | 65 +++++++++++++
4 files changed, 259 insertions(+), 3 deletions(-)
---
base-commit: d989f135f71699294bb2ffd4726b526456e2db68
change-id: 20260327-cgroup-dmem-memcg-double-charge-0f100a9ffbf2
Best regards,
--
Eric Chanudet <echanude@redhat.com>
^ permalink raw reply [flat|nested] 8+ messages in thread* [PATCH v2 1/2] mm/memcontrol: add dmem charge/uncharge functions 2026-05-19 15:59 [PATCH v2 0/2] cgroup/dmem: allow double-charging dmem allocations to memcg Eric Chanudet @ 2026-05-19 15:59 ` Eric Chanudet 2026-05-20 7:22 ` Albert Esteve 2026-05-22 15:53 ` Shakeel Butt 2026-05-19 15:59 ` [PATCH v2 2/2] cgroup/dmem: add dmem.memcg control file for double-charging to memcg Eric Chanudet 1 sibling, 2 replies; 8+ messages in thread From: Eric Chanudet @ 2026-05-19 15:59 UTC (permalink / raw) To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard, Natalie Vock, Tejun Heo, Michal Koutný, Jonathan Corbet, Shuah Khan Cc: cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier, Christian König, Maxime Ripard, Albert Esteve, Dave Airlie, linux-doc, Eric Chanudet Add mem_cgroup_dmem_charge() and mem_cgroup_dmem_uncharge() to allow dmem pool allocations to optionally be double-charged against the memory controller. Take the struct cgroup from the dmem pool's css as there is no convenient object exported to represent these allocations. These will resolve the effective memory css from that cgroup and perform the charge. Introduce a MEMCG_DMEM stat counter to memory.stat to make the cgroup's dmem charge visible. Signed-off-by: Eric Chanudet <echanude@redhat.com> --- include/linux/memcontrol.h | 16 ++++++++++++ mm/memcontrol.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 81 insertions(+) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index dc3fa687759b45748b2acee6d7f43da325eb50c1..8e1d49b87fb64e6114f3eb920293e14920290fe7 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -39,6 +39,7 @@ enum memcg_stat_item { MEMCG_ZSWAP_B, MEMCG_ZSWAPPED, MEMCG_ZSWAP_INCOMP, + MEMCG_DMEM, MEMCG_NR_STAT, }; @@ -1872,6 +1873,21 @@ static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg) } #endif +#if defined(CONFIG_MEMCG) && defined(CONFIG_CGROUP_DMEM) +bool mem_cgroup_dmem_charge(struct cgroup *cgrp, unsigned int nr_pages, + gfp_t gfp_mask); +void mem_cgroup_dmem_uncharge(struct cgroup *cgrp, unsigned int nr_pages); +#else +static inline bool mem_cgroup_dmem_charge(struct cgroup *cgrp, + unsigned int nr_pages, gfp_t gfp_mask) +{ + return true; +} +static inline void mem_cgroup_dmem_uncharge(struct cgroup *cgrp, + unsigned int nr_pages) +{ +} +#endif /* Cgroup v1-related declarations */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c03d4787d466803db49cdaa90e6d6ba426b7afe2..91a7ac16b6eac2d6c3700b6885a068bf8b640706 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -433,6 +433,7 @@ static const unsigned int memcg_stat_items[] = { MEMCG_ZSWAP_B, MEMCG_ZSWAPPED, MEMCG_ZSWAP_INCOMP, + MEMCG_DMEM, }; #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items) @@ -1606,6 +1607,9 @@ static const struct memory_stat memory_stats[] = { #ifdef CONFIG_NUMA_BALANCING { "pgpromote_success", PGPROMOTE_SUCCESS }, #endif +#ifdef CONFIG_CGROUP_DMEM + { "dmem", MEMCG_DMEM }, +#endif }; /* The actual unit of the state item, not the same as the output unit */ @@ -5909,6 +5913,67 @@ static struct cftype zswap_files[] = { }; #endif /* CONFIG_ZSWAP */ +#ifdef CONFIG_CGROUP_DMEM +/** + * mem_cgroup_dmem_charge - charge memcg for a dmem pool allocation + * @cgrp: cgroup of the dmem pool + * @nr_pages: number of pages to charge + * @gfp_mask: reclaim mode + * + * Charges @nr_pages to @memcg. Returns %true if the charge fit within + * @memcg's configured limit, %false if it doesn't. + */ +bool mem_cgroup_dmem_charge(struct cgroup *cgrp, unsigned int nr_pages, + gfp_t gfp_mask) +{ + struct cgroup_subsys_state *mem_css; + struct mem_cgroup *memcg; + + /* CGROUP_DMEM and MEMCG guarantees this cannot be NULL. */ + mem_css = cgroup_get_e_css(cgrp, &memory_cgrp_subsys); + + /* Use the memcg, if any, of the dmem cgroup. */ + memcg = mem_cgroup_from_css(mem_css); + if (!memcg || mem_cgroup_is_root(memcg)) { + css_put(mem_css); + return false; + } + + if (try_charge_memcg(memcg, gfp_mask, nr_pages)) { + css_put(mem_css); + return false; + } + + mod_memcg_state(memcg, MEMCG_DMEM, nr_pages); + css_put(mem_css); + return true; +} + +/** + * mem_cgroup_dmem_uncharge - uncharge memcg from a dmem pool allocation + * @cgrp: cgroup of the dmem pool + * @nr_pages: number of pages to uncharge + */ +void mem_cgroup_dmem_uncharge(struct cgroup *cgrp, unsigned int nr_pages) +{ + struct cgroup_subsys_state *mem_css; + struct mem_cgroup *memcg; + + /* CGROUP_DMEM and MEMCG guarantees this cannot be NULL. */ + mem_css = cgroup_get_e_css(cgrp, &memory_cgrp_subsys); + + memcg = mem_cgroup_from_css(mem_css); + if (!memcg || mem_cgroup_is_root(memcg)) { + css_put(mem_css); + return; + } + + mod_memcg_state(memcg, MEMCG_DMEM, -nr_pages); + refill_stock(memcg, nr_pages); + css_put(mem_css); +} +#endif /* CONFIG_CGROUP_DMEM */ + static int __init mem_cgroup_swap_init(void) { if (mem_cgroup_disabled()) -- 2.52.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] mm/memcontrol: add dmem charge/uncharge functions 2026-05-19 15:59 ` [PATCH v2 1/2] mm/memcontrol: add dmem charge/uncharge functions Eric Chanudet @ 2026-05-20 7:22 ` Albert Esteve 2026-05-22 15:53 ` Shakeel Butt 1 sibling, 0 replies; 8+ messages in thread From: Albert Esteve @ 2026-05-20 7:22 UTC (permalink / raw) To: Eric Chanudet Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard, Natalie Vock, Tejun Heo, Michal Koutný, Jonathan Corbet, Shuah Khan, cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier, Christian König, Maxime Ripard, Dave Airlie, linux-doc On Tue, May 19, 2026 at 6:01 PM Eric Chanudet <echanude@redhat.com> wrote: > > Add mem_cgroup_dmem_charge() and mem_cgroup_dmem_uncharge() to allow > dmem pool allocations to optionally be double-charged against the memory > controller. Take the struct cgroup from the dmem pool's css as there is > no convenient object exported to represent these allocations. These will > resolve the effective memory css from that cgroup and perform the > charge. > > Introduce a MEMCG_DMEM stat counter to memory.stat to make the cgroup's > dmem charge visible. > > Signed-off-by: Eric Chanudet <echanude@redhat.com> Reviewed-by: Albert Esteve <aesteve@redhat.com> > --- > include/linux/memcontrol.h | 16 ++++++++++++ > mm/memcontrol.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 81 insertions(+) > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index dc3fa687759b45748b2acee6d7f43da325eb50c1..8e1d49b87fb64e6114f3eb920293e14920290fe7 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -39,6 +39,7 @@ enum memcg_stat_item { > MEMCG_ZSWAP_B, > MEMCG_ZSWAPPED, > MEMCG_ZSWAP_INCOMP, > + MEMCG_DMEM, > MEMCG_NR_STAT, > }; > > @@ -1872,6 +1873,21 @@ static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg) > } > #endif > > +#if defined(CONFIG_MEMCG) && defined(CONFIG_CGROUP_DMEM) > +bool mem_cgroup_dmem_charge(struct cgroup *cgrp, unsigned int nr_pages, > + gfp_t gfp_mask); > +void mem_cgroup_dmem_uncharge(struct cgroup *cgrp, unsigned int nr_pages); > +#else > +static inline bool mem_cgroup_dmem_charge(struct cgroup *cgrp, > + unsigned int nr_pages, gfp_t gfp_mask) > +{ > + return true; > +} > +static inline void mem_cgroup_dmem_uncharge(struct cgroup *cgrp, > + unsigned int nr_pages) > +{ > +} > +#endif > > /* Cgroup v1-related declarations */ > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index c03d4787d466803db49cdaa90e6d6ba426b7afe2..91a7ac16b6eac2d6c3700b6885a068bf8b640706 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -433,6 +433,7 @@ static const unsigned int memcg_stat_items[] = { > MEMCG_ZSWAP_B, > MEMCG_ZSWAPPED, > MEMCG_ZSWAP_INCOMP, > + MEMCG_DMEM, > }; > > #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items) > @@ -1606,6 +1607,9 @@ static const struct memory_stat memory_stats[] = { > #ifdef CONFIG_NUMA_BALANCING > { "pgpromote_success", PGPROMOTE_SUCCESS }, > #endif > +#ifdef CONFIG_CGROUP_DMEM > + { "dmem", MEMCG_DMEM }, > +#endif > }; > > /* The actual unit of the state item, not the same as the output unit */ > @@ -5909,6 +5913,67 @@ static struct cftype zswap_files[] = { > }; > #endif /* CONFIG_ZSWAP */ > > +#ifdef CONFIG_CGROUP_DMEM > +/** > + * mem_cgroup_dmem_charge - charge memcg for a dmem pool allocation > + * @cgrp: cgroup of the dmem pool > + * @nr_pages: number of pages to charge > + * @gfp_mask: reclaim mode > + * > + * Charges @nr_pages to @memcg. Returns %true if the charge fit within > + * @memcg's configured limit, %false if it doesn't. > + */ > +bool mem_cgroup_dmem_charge(struct cgroup *cgrp, unsigned int nr_pages, > + gfp_t gfp_mask) > +{ > + struct cgroup_subsys_state *mem_css; > + struct mem_cgroup *memcg; > + > + /* CGROUP_DMEM and MEMCG guarantees this cannot be NULL. */ > + mem_css = cgroup_get_e_css(cgrp, &memory_cgrp_subsys); > + > + /* Use the memcg, if any, of the dmem cgroup. */ > + memcg = mem_cgroup_from_css(mem_css); > + if (!memcg || mem_cgroup_is_root(memcg)) { > + css_put(mem_css); > + return false; > + } > + > + if (try_charge_memcg(memcg, gfp_mask, nr_pages)) { > + css_put(mem_css); > + return false; > + } > + > + mod_memcg_state(memcg, MEMCG_DMEM, nr_pages); > + css_put(mem_css); > + return true; > +} > + > +/** > + * mem_cgroup_dmem_uncharge - uncharge memcg from a dmem pool allocation > + * @cgrp: cgroup of the dmem pool > + * @nr_pages: number of pages to uncharge > + */ > +void mem_cgroup_dmem_uncharge(struct cgroup *cgrp, unsigned int nr_pages) > +{ > + struct cgroup_subsys_state *mem_css; > + struct mem_cgroup *memcg; > + > + /* CGROUP_DMEM and MEMCG guarantees this cannot be NULL. */ > + mem_css = cgroup_get_e_css(cgrp, &memory_cgrp_subsys); > + > + memcg = mem_cgroup_from_css(mem_css); > + if (!memcg || mem_cgroup_is_root(memcg)) { > + css_put(mem_css); > + return; > + } > + > + mod_memcg_state(memcg, MEMCG_DMEM, -nr_pages); > + refill_stock(memcg, nr_pages); > + css_put(mem_css); > +} > +#endif /* CONFIG_CGROUP_DMEM */ > + > static int __init mem_cgroup_swap_init(void) > { > if (mem_cgroup_disabled()) > > -- > 2.52.0 > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] mm/memcontrol: add dmem charge/uncharge functions 2026-05-19 15:59 ` [PATCH v2 1/2] mm/memcontrol: add dmem charge/uncharge functions Eric Chanudet 2026-05-20 7:22 ` Albert Esteve @ 2026-05-22 15:53 ` Shakeel Butt 2026-05-22 15:55 ` Shakeel Butt 1 sibling, 1 reply; 8+ messages in thread From: Shakeel Butt @ 2026-05-22 15:53 UTC (permalink / raw) To: Eric Chanudet Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard, Natalie Vock, Tejun Heo, Michal Koutný, Jonathan Corbet, Shuah Khan, cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier, Christian König, Maxime Ripard, Albert Esteve, Dave Airlie, linux-doc On Tue, May 19, 2026 at 11:59:01AM -0400, Eric Chanudet wrote: > Add mem_cgroup_dmem_charge() and mem_cgroup_dmem_uncharge() to allow > dmem pool allocations to optionally be double-charged against the memory > controller. Take the struct cgroup from the dmem pool's css as there is > no convenient object exported to represent these allocations. These will > resolve the effective memory css from that cgroup and perform the > charge. > > Introduce a MEMCG_DMEM stat counter to memory.stat to make the cgroup's > dmem charge visible. > > Signed-off-by: Eric Chanudet <echanude@redhat.com> > --- > include/linux/memcontrol.h | 16 ++++++++++++ > mm/memcontrol.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 81 insertions(+) > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index dc3fa687759b45748b2acee6d7f43da325eb50c1..8e1d49b87fb64e6114f3eb920293e14920290fe7 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -39,6 +39,7 @@ enum memcg_stat_item { > MEMCG_ZSWAP_B, > MEMCG_ZSWAPPED, > MEMCG_ZSWAP_INCOMP, > + MEMCG_DMEM, > MEMCG_NR_STAT, > }; > > @@ -1872,6 +1873,21 @@ static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg) > } > #endif > > +#if defined(CONFIG_MEMCG) && defined(CONFIG_CGROUP_DMEM) > +bool mem_cgroup_dmem_charge(struct cgroup *cgrp, unsigned int nr_pages, > + gfp_t gfp_mask); > +void mem_cgroup_dmem_uncharge(struct cgroup *cgrp, unsigned int nr_pages); > +#else > +static inline bool mem_cgroup_dmem_charge(struct cgroup *cgrp, > + unsigned int nr_pages, gfp_t gfp_mask) Please follow Johannes's request to pass the actually memory object instead of naked numbers. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] mm/memcontrol: add dmem charge/uncharge functions 2026-05-22 15:53 ` Shakeel Butt @ 2026-05-22 15:55 ` Shakeel Butt 0 siblings, 0 replies; 8+ messages in thread From: Shakeel Butt @ 2026-05-22 15:55 UTC (permalink / raw) To: Eric Chanudet Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard, Natalie Vock, Tejun Heo, Michal Koutný, Jonathan Corbet, Shuah Khan, cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier, Christian König, Maxime Ripard, Albert Esteve, Dave Airlie, linux-doc On Fri, May 22, 2026 at 08:53:10AM -0700, Shakeel Butt wrote: > On Tue, May 19, 2026 at 11:59:01AM -0400, Eric Chanudet wrote: > > Add mem_cgroup_dmem_charge() and mem_cgroup_dmem_uncharge() to allow > > dmem pool allocations to optionally be double-charged against the memory > > controller. Take the struct cgroup from the dmem pool's css as there is > > no convenient object exported to represent these allocations. These will > > resolve the effective memory css from that cgroup and perform the > > charge. > > > > Introduce a MEMCG_DMEM stat counter to memory.stat to make the cgroup's > > dmem charge visible. > > > > Signed-off-by: Eric Chanudet <echanude@redhat.com> > > --- > > include/linux/memcontrol.h | 16 ++++++++++++ > > mm/memcontrol.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 81 insertions(+) > > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > > index dc3fa687759b45748b2acee6d7f43da325eb50c1..8e1d49b87fb64e6114f3eb920293e14920290fe7 100644 > > --- a/include/linux/memcontrol.h > > +++ b/include/linux/memcontrol.h > > @@ -39,6 +39,7 @@ enum memcg_stat_item { > > MEMCG_ZSWAP_B, > > MEMCG_ZSWAPPED, > > MEMCG_ZSWAP_INCOMP, > > + MEMCG_DMEM, > > MEMCG_NR_STAT, > > }; > > > > @@ -1872,6 +1873,21 @@ static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg) > > } > > #endif > > > > +#if defined(CONFIG_MEMCG) && defined(CONFIG_CGROUP_DMEM) > > +bool mem_cgroup_dmem_charge(struct cgroup *cgrp, unsigned int nr_pages, > > + gfp_t gfp_mask); > > +void mem_cgroup_dmem_uncharge(struct cgroup *cgrp, unsigned int nr_pages); > > +#else > > +static inline bool mem_cgroup_dmem_charge(struct cgroup *cgrp, > > + unsigned int nr_pages, gfp_t gfp_mask) > > Please follow Johannes's request to pass the actually memory object instead of > naked numbers. > Also what exactly is the backing memory here? Is it system memory? If yes, then you need to pass struct page. For non-system memory, I am not sure memcg is the right place to charge such memory. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 2/2] cgroup/dmem: add dmem.memcg control file for double-charging to memcg 2026-05-19 15:59 [PATCH v2 0/2] cgroup/dmem: allow double-charging dmem allocations to memcg Eric Chanudet 2026-05-19 15:59 ` [PATCH v2 1/2] mm/memcontrol: add dmem charge/uncharge functions Eric Chanudet @ 2026-05-19 15:59 ` Eric Chanudet 2026-05-22 15:26 ` Michal Koutný 1 sibling, 1 reply; 8+ messages in thread From: Eric Chanudet @ 2026-05-19 15:59 UTC (permalink / raw) To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard, Natalie Vock, Tejun Heo, Michal Koutný, Jonathan Corbet, Shuah Khan Cc: cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier, Christian König, Maxime Ripard, Albert Esteve, Dave Airlie, linux-doc, Eric Chanudet Add a root-only cgroupfs file "dmem.memcg" that lets an administrator configure whether allocations in a dmem region should also be charged to the memory controller. To handle inheritance, dmem adds a depends_on the memory controller, unless MEMCG isn't configured in. Double-charging is disabled by default. Once a charge is attempted, the setting is locked to prevent inconsistent accounting by a small 4-state machine (off, on, locked off, locked on). The memcg to charge is derived from the pool's cgroup, since the pool holds a reference to the dmem cgroup state that keeps the cgroup alive until it gets uncharged. Signed-off-by: Eric Chanudet <echanude@redhat.com> --- Documentation/admin-guide/cgroup-v2.rst | 23 +++++ kernel/cgroup/dmem.c | 158 +++++++++++++++++++++++++++++++- 2 files changed, 178 insertions(+), 3 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 6efd0095ed995b1550317662bc1b56c7a7f3db23..1d2fa55ddf0faa17baa916a8914d3033e8e42359 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -2828,6 +2828,29 @@ DMEM Interface Files drm/0000:03:00.0/vram0 12550144 drm/0000:03:00.0/stolen 8650752 + dmem.memcg + A readwrite nested-keyed file that exists only on the root + cgroup. It configures whether allocations in a dmem region + should also be charged to the memory controller. + + Upon the first charge to a region, its setting can no longer be changed + and is reported as "[true|false] (locked)". + + Charges to the memory controller are visible in ``memory.stat`` as the + ``dmem`` entry, reported in bytes. + + An example read output follows:: + + drm/0000:03:00.0/vram0 false + drm/0000:03:00.0/stolen false (locked) + + Writing uses the same nested-keyed format:: + + echo "drm/0000:03:00.0/vram0 true" > dmem.memcg + + This file is only available when the kernel is built with + ``CONFIG_MEMCG``. + HugeTLB ------- diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c index 1ab1fb47f2711ecc60dd13e611a8a4920b48f3e9..e07b20b8025c528f190f84c76b088cb8a32a7f5e 100644 --- a/kernel/cgroup/dmem.c +++ b/kernel/cgroup/dmem.c @@ -17,6 +17,14 @@ #include <linux/refcount.h> #include <linux/rculist.h> #include <linux/slab.h> +#include <linux/memcontrol.h> + +enum dmem_memcg_status { + DMEM_MEMCG_OFF, + DMEM_MEMCG_ON, + DMEM_MEMCG_LOCKED_OFF, + DMEM_MEMCG_LOCKED_ON, +}; struct dmem_cgroup_region { /** @@ -51,6 +59,14 @@ struct dmem_cgroup_region { * No new pools should be added to the region afterwards. */ bool unregistered; + + /** + * @memcg_status: Whether allocation in this region should charge memcg. + * DMEM_MEMCG_OFF/DMEM_MEMCG_ON or + * DMEM_MEMCG_LOCKED_OFF/DMEM_MEMCG_LOCKED_ON, frozen after first allocation. + * Transitions to a locked state are one-way. + */ + atomic_t memcg_status; }; struct dmemcg_state { @@ -609,6 +625,34 @@ get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region) return pool; } +static bool apply_memcg_charge(atomic_t *status) +{ + int state = atomic_read(status); + + for (;;) { + switch (state) { + case DMEM_MEMCG_OFF: + state = atomic_cmpxchg(status, DMEM_MEMCG_OFF, + DMEM_MEMCG_LOCKED_OFF); + if (state != DMEM_MEMCG_OFF) + continue; + return false; + case DMEM_MEMCG_LOCKED_OFF: + return false; + case DMEM_MEMCG_ON: + state = atomic_cmpxchg(status, DMEM_MEMCG_ON, + DMEM_MEMCG_LOCKED_ON); + if (state != DMEM_MEMCG_ON) + continue; + return true; + case DMEM_MEMCG_LOCKED_ON: + return true; + } + WARN_ONCE(1, "Invalid memcg_status (%#x).\n", state); + return false; + } +} + /** * dmem_cgroup_uncharge() - Uncharge a pool. * @pool: Pool to uncharge. @@ -624,6 +668,12 @@ void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size) return; page_counter_uncharge(&pool->cnt, size); + + if (atomic_read(&pool->region->memcg_status) == DMEM_MEMCG_LOCKED_ON && + !WARN_ON_ONCE(size > (u64)UINT_MAX << PAGE_SHIFT)) + mem_cgroup_dmem_uncharge(pool->cs->css.cgroup, + PAGE_ALIGN(size) >> PAGE_SHIFT); + css_put(&pool->cs->css); dmemcg_pool_put(pool); } @@ -655,6 +705,8 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size, struct dmemcg_state *cg; struct dmem_cgroup_pool_state *pool; struct page_counter *fail; + unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT; + bool charge_memcg; int ret; *ret_pool = NULL; @@ -670,7 +722,28 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size, pool = get_cg_pool_unlocked(cg, region); if (IS_ERR(pool)) { ret = PTR_ERR(pool); - goto err; + goto err_css_put; + } + + charge_memcg = apply_memcg_charge(®ion->memcg_status); + if (charge_memcg) { + /* mem_cgroup_dmem_charge limitation from try_charge_memcg */ + if (size > (u64)UINT_MAX << PAGE_SHIFT) { + ret = -EINVAL; + dmemcg_pool_put(pool); + goto err_css_put; + } + + if (!mem_cgroup_dmem_charge(pool->cs->css.cgroup, nr_pages, + GFP_KERNEL)) { + /* + * No dmem_cgroup_state_evict_valuable() could help, + * there's no ret_limit_pool to return. + */ + ret = -ENOMEM; + dmemcg_pool_put(pool); + goto err_css_put; + } } if (!page_counter_try_charge(&pool->cnt, size, &fail)) { @@ -681,14 +754,17 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size, } dmemcg_pool_put(pool); ret = -EAGAIN; - goto err; + goto err_uncharge_memcg; } /* On success, reference from get_current_dmemcs is transferred to *ret_pool */ *ret_pool = pool; return 0; -err: +err_uncharge_memcg: + if (charge_memcg) + mem_cgroup_dmem_uncharge(pool->cs->css.cgroup, nr_pages); +err_css_put: css_put(&cg->css); return ret; } @@ -845,6 +921,71 @@ static ssize_t dmem_cgroup_region_max_write(struct kernfs_open_file *of, return dmemcg_limit_write(of, buf, nbytes, off, set_resource_max); } +#ifdef CONFIG_MEMCG +static int dmem_cgroup_memcg_show(struct seq_file *sf, void *v) +{ + struct dmem_cgroup_region *region; + + rcu_read_lock(); + list_for_each_entry_rcu(region, &dmem_cgroup_regions, region_node) { + int state = atomic_read(®ion->memcg_status); + + seq_printf(sf, "%s %s\n", region->name, + state == DMEM_MEMCG_ON ? "true" : + state == DMEM_MEMCG_OFF ? "false" : + state == DMEM_MEMCG_LOCKED_ON ? "true (locked)" : + state == DMEM_MEMCG_LOCKED_OFF ? "false (locked)" : + "(invalid)"); + } + rcu_read_unlock(); + return 0; +} + +static ssize_t dmem_cgroup_memcg_write(struct kernfs_open_file *of, char *buf, + size_t nbytes, loff_t off) +{ + while (buf) { + struct dmem_cgroup_region *region; + char *options, *name; + bool flag; + + options = buf; + buf = strchr(buf, '\n'); + if (buf) + *buf++ = '\0'; + + options = strstrip(options); + if (!options[0]) + continue; + + name = strsep(&options, " \t"); + if (!name[0]) + continue; + + if (!options || !options[0]) + return -EINVAL; + + if (kstrtobool(options, &flag)) + return -EINVAL; + + rcu_read_lock(); + region = dmemcg_get_region_by_name(name); + rcu_read_unlock(); + if (!region) + return -ENODEV; + + atomic_cmpxchg(®ion->memcg_status, + flag ? DMEM_MEMCG_OFF : DMEM_MEMCG_ON, + flag ? DMEM_MEMCG_ON : DMEM_MEMCG_OFF); + /* Continue if a region is already locked. */ + + kref_put(®ion->ref, dmemcg_free_region); + } + + return nbytes; +} +#endif + static struct cftype files[] = { { .name = "capacity", @@ -873,6 +1014,14 @@ static struct cftype files[] = { .seq_show = dmem_cgroup_region_max_show, .flags = CFTYPE_NOT_ON_ROOT, }, +#ifdef CONFIG_MEMCG + { + .name = "memcg", + .write = dmem_cgroup_memcg_write, + .seq_show = dmem_cgroup_memcg_show, + .flags = CFTYPE_ONLY_ON_ROOT, + }, +#endif { } /* Zero entry terminates. */ }; @@ -882,4 +1031,7 @@ struct cgroup_subsys dmem_cgrp_subsys = { .css_offline = dmemcs_offline, .legacy_cftypes = files, .dfl_cftypes = files, +#ifdef CONFIG_MEMCG + .depends_on = 1 << memory_cgrp_id, +#endif }; -- 2.52.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] cgroup/dmem: add dmem.memcg control file for double-charging to memcg 2026-05-19 15:59 ` [PATCH v2 2/2] cgroup/dmem: add dmem.memcg control file for double-charging to memcg Eric Chanudet @ 2026-05-22 15:26 ` Michal Koutný 2026-05-22 16:17 ` Tejun Heo 0 siblings, 1 reply; 8+ messages in thread From: Michal Koutný @ 2026-05-22 15:26 UTC (permalink / raw) To: Eric Chanudet Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard, Natalie Vock, Tejun Heo, Jonathan Corbet, Shuah Khan, cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier, Christian König, Maxime Ripard, Albert Esteve, Dave Airlie, linux-doc [-- Attachment #1: Type: text/plain, Size: 2824 bytes --] Hello Eric. On Tue, May 19, 2026 at 11:59:02AM -0400, Eric Chanudet <echanude@redhat.com> wrote: > Add a root-only cgroupfs file "dmem.memcg" that lets an administrator > configure whether allocations in a dmem region should also be charged to > the memory controller. This kinda makes sense as it is not unlike io.cost.* device configurators. Just for my better understanding -- will there be a space for userspace to switch this? (No charged dmem allocations happen before responsible userspace runs, so that the attribute remains unlocked.) (I'm rather indifferent about the actual double charging/non-charging matter.) > > To handle inheritance, dmem adds a depends_on the memory controller, > unless MEMCG isn't configured in. > > Double-charging is disabled by default. Once a charge is attempted, the > setting is locked to prevent inconsistent accounting by a small 4-state > machine (off, on, locked off, locked on). > > The memcg to charge is derived from the pool's cgroup, since the pool > holds a reference to the dmem cgroup state that keeps the cgroup alive > until it gets uncharged. > > Signed-off-by: Eric Chanudet <echanude@redhat.com> > --- > Documentation/admin-guide/cgroup-v2.rst | 23 +++++ > kernel/cgroup/dmem.c | 158 +++++++++++++++++++++++++++++++- > 2 files changed, 178 insertions(+), 3 deletions(-) > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst > index 6efd0095ed995b1550317662bc1b56c7a7f3db23..1d2fa55ddf0faa17baa916a8914d3033e8e42359 100644 > --- a/Documentation/admin-guide/cgroup-v2.rst > +++ b/Documentation/admin-guide/cgroup-v2.rst > @@ -2828,6 +2828,29 @@ DMEM Interface Files > drm/0000:03:00.0/vram0 12550144 > drm/0000:03:00.0/stolen 8650752 > > + dmem.memcg > + A readwrite nested-keyed file that exists only on the root > + cgroup. Strictly speaking this is not nested-keyed but flat keyed [1], which leads me to realization that this is the first instance of a boolean. All in call, such a composition comes to my mind (latter is RO): drm/0000:03:00.0/vram0 enable=0|1 locked=0|1 > +static ssize_t dmem_cgroup_memcg_write(struct kernfs_open_file *of, char *buf, > + size_t nbytes, loff_t off) > +{ > + while (buf) { > + struct dmem_cgroup_region *region; > + char *options, *name; > + bool flag; > + > + options = buf; > + buf = strchr(buf, '\n'); > + if (buf) > + *buf++ = '\0'; I recall there was a discussion about accepting only a single device per write(2) (at the same time I see this idiom is still present in other dmem.* files, so this is nothing to change in _this_ patch). Thanks, Michal [1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#format [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 265 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] cgroup/dmem: add dmem.memcg control file for double-charging to memcg 2026-05-22 15:26 ` Michal Koutný @ 2026-05-22 16:17 ` Tejun Heo 0 siblings, 0 replies; 8+ messages in thread From: Tejun Heo @ 2026-05-22 16:17 UTC (permalink / raw) To: Michal Koutný Cc: Eric Chanudet, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard, Natalie Vock, Jonathan Corbet, Shuah Khan, cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier, Christian König, Maxime Ripard, Albert Esteve, Dave Airlie, linux-doc Hello, On Fri, May 22, 2026 at 05:26:16PM +0200, Michal Koutný wrote: > Hello Eric. > > On Tue, May 19, 2026 at 11:59:02AM -0400, Eric Chanudet <echanude@redhat.com> wrote: > > Add a root-only cgroupfs file "dmem.memcg" that lets an administrator > > configure whether allocations in a dmem region should also be charged to > > the memory controller. > > This kinda makes sense as it is not unlike io.cost.* device > configurators. > > Just for my better understanding -- will there be a space for userspace > to switch this? (No charged dmem allocations happen before responsible > userspace runs, so that the attribute remains unlocked.) > > (I'm rather indifferent about the actual double charging/non-charging > matter.) I wonder whether this would make more sense as a mount flag? What's the use case for e.g. having different config for different devices? Wouldn't that be really confusing? Thanks. -- tejun ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-05-22 16:17 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-19 15:59 [PATCH v2 0/2] cgroup/dmem: allow double-charging dmem allocations to memcg Eric Chanudet 2026-05-19 15:59 ` [PATCH v2 1/2] mm/memcontrol: add dmem charge/uncharge functions Eric Chanudet 2026-05-20 7:22 ` Albert Esteve 2026-05-22 15:53 ` Shakeel Butt 2026-05-22 15:55 ` Shakeel Butt 2026-05-19 15:59 ` [PATCH v2 2/2] cgroup/dmem: add dmem.memcg control file for double-charging to memcg Eric Chanudet 2026-05-22 15:26 ` Michal Koutný 2026-05-22 16:17 ` Tejun Heo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox