* [PATCH -next v2 1/4] cgroup/dmem: fix NULL pointer dereference when setting max
2026-02-02 12:27 [PATCH -next v2 0/4] cgroup/dmem: bugfixes Chen Ridong
@ 2026-02-02 12:27 ` Chen Ridong
2026-02-02 12:27 ` [PATCH -next v2 2/4] cgroup/dmem: avoid rcu warning when unregister region Chen Ridong
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Chen Ridong @ 2026-02-02 12:27 UTC (permalink / raw)
To: dev, mripard, natalie.vock, tj, hannes, mkoutny
Cc: cgroups, dri-devel, linux-kernel, lujialin4, chenridong
From: Chen Ridong <chenridong@huawei.com>
An issue was triggered:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 15 UID: 0 PID: 658 Comm: bash Tainted: 6.19.0-rc6-next-2026012
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
RIP: 0010:strcmp+0x10/0x30
RSP: 0018:ffffc900017f7dc0 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff888107cd4358
RDX: 0000000019f73907 RSI: ffffffff82cc381a RDI: 0000000000000000
RBP: ffff8881016bef0d R08: 000000006c0e7145 R09: 0000000056c0e714
R10: 0000000000000001 R11: ffff888107cd4358 R12: 0007ffffffffffff
R13: ffff888101399200 R14: ffff888100fcb360 R15: 0007ffffffffffff
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000105c79000 CR4: 00000000000006f0
Call Trace:
<TASK>
dmemcg_limit_write.constprop.0+0x16d/0x390
? __pfx_set_resource_max+0x10/0x10
kernfs_fop_write_iter+0x14e/0x200
vfs_write+0x367/0x510
ksys_write+0x66/0xe0
do_syscall_64+0x6b/0x390
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f42697e1887
It was triggered by setting max without a limit value, with a command such
as "echo test/region0 > dmem.max". To fix this issue, check that options is
valid after parsing the region_name.
Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
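Note: a minimal userspace sketch of the failure mode, not the kernel code
itself. It assumes the line is split with strsep(), which leaves the
remainder pointer NULL when the input contains no separator. The helper
name split_limit_line() and its signature are hypothetical and exist only
for illustration.

```c
#define _DEFAULT_SOURCE /* exposes strsep() on glibc */
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the per-line parsing step: split
 * "<region> <value>" in place. Returns 0 on success, or -EINVAL when the
 * value part is missing or empty.
 */
static int split_limit_line(char *line, char **region_name, char **options)
{
	*options = line;
	*region_name = strsep(options, " \t");

	/*
	 * With no separator in the line, strsep() sets *options to NULL.
	 * Handing that pointer to strcmp() is the NULL dereference seen
	 * in the oops, so reject it here instead.
	 */
	if (!*options || !**options)
		return -EINVAL;

	return 0;
}
```

With this check, "test/region0" alone yields -EINVAL, while
"test/region0 max" splits into the region name and the value "max".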
kernel/cgroup/dmem.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index e12b946278b6..1f0d6caaf2fb 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -700,6 +700,9 @@ static ssize_t dmemcg_limit_write(struct kernfs_open_file *of,
if (!region_name[0])
continue;
+ if (!options || !*options)
+ return -EINVAL;
+
rcu_read_lock();
region = dmemcg_get_region_by_name(region_name);
rcu_read_unlock();
--
2.34.1
* [PATCH -next v2 2/4] cgroup/dmem: avoid rcu warning when unregister region
2026-02-02 12:27 [PATCH -next v2 0/4] cgroup/dmem: bugfixes Chen Ridong
2026-02-02 12:27 ` [PATCH -next v2 1/4] cgroup/dmem: fix NULL pointer dereference when setting max Chen Ridong
@ 2026-02-02 12:27 ` Chen Ridong
2026-02-02 12:27 ` [PATCH -next v2 3/4] cgroup/dmem: avoid pool UAF Chen Ridong
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Chen Ridong @ 2026-02-02 12:27 UTC (permalink / raw)
To: dev, mripard, natalie.vock, tj, hannes, mkoutny
Cc: cgroups, dri-devel, linux-kernel, lujialin4, chenridong
From: Chen Ridong <chenridong@huawei.com>
A warning was detected:
WARNING: suspicious RCU usage
6.19.0-rc7-next-20260129+ #1101 Tainted: G O
kernel/cgroup/dmem.c:456 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by insmod/532:
#0: ffffffff85e78b38 (dmemcg_lock){+.+.}-dmem_cgroup_unregister_region+
stack backtrace:
CPU: 2 UID: 0 PID: 532 Comm: insmod Tainted: 6.19.0-rc7-next-
Tainted: [O]=OOT_MODULE
Call Trace:
<TASK>
dump_stack_lvl+0xb0/0xd0
lockdep_rcu_suspicious+0x151/0x1c0
dmem_cgroup_unregister_region+0x1e2/0x380
? __pfx_dmem_test_init+0x10/0x10 [dmem_uaf]
dmem_test_init+0x65/0xff0 [dmem_uaf]
do_one_initcall+0xbb/0x3a0
The macro list_for_each_rcu() must be used within an RCU read-side critical
section (between rcu_read_lock() and rcu_read_unlock()). Using it outside
that context, as seen in dmem_cgroup_unregister_region(), triggers the
lockdep warning because the RCU protection is not guaranteed.
Replace list_for_each_rcu() with list_for_each_entry_safe(), which is
appropriate for traversal under spinlock protection where nodes may be
deleted.
Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
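Note: a userspace sketch of the traversal pattern that a "safe" list
iterator relies on, assuming a plain circular doubly linked list. The
struct node type and the node_*() and drain() helpers are hypothetical;
the kernel's list macros are not reproduced here. The point is only that
the successor is cached before the current entry is unlinked, so deleting
during the walk is fine as long as the caller's lock keeps other writers
out.

```c
#include <stddef.h>

/* Hypothetical minimal circular list node, for illustration only. */
struct node {
	struct node *prev, *next;
};

static void node_init(struct node *n)
{
	n->prev = n->next = n;
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL; /* stale links must not be followed */
}

/* Unlink every entry; returns how many entries were removed. */
static int drain(struct node *head)
{
	struct node *pos, *next;
	int removed = 0;

	/* 'next' is read before node_del() clears pos->next. */
	for (pos = head->next, next = pos->next; pos != head;
	     pos = next, next = pos->next) {
		node_del(pos);
		removed++;
	}
	return removed;
}
```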
kernel/cgroup/dmem.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 1f0d6caaf2fb..787b334e0f5d 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -423,7 +423,7 @@ static void dmemcg_free_region(struct kref *ref)
*/
void dmem_cgroup_unregister_region(struct dmem_cgroup_region *region)
{
- struct list_head *entry;
+ struct dmem_cgroup_pool_state *pool, *next;
if (!region)
return;
@@ -433,10 +433,7 @@ void dmem_cgroup_unregister_region(struct dmem_cgroup_region *region)
/* Remove from global region list */
list_del_rcu(&region->region_node);
- list_for_each_rcu(entry, &region->pools) {
- struct dmem_cgroup_pool_state *pool =
- container_of(entry, typeof(*pool), region_node);
-
+ list_for_each_entry_safe(pool, next, &region->pools, region_node) {
list_del_rcu(&pool->css_node);
}
--
2.34.1
* [PATCH -next v2 3/4] cgroup/dmem: avoid pool UAF
2026-02-02 12:27 [PATCH -next v2 0/4] cgroup/dmem: bugfixes Chen Ridong
2026-02-02 12:27 ` [PATCH -next v2 1/4] cgroup/dmem: fix NULL pointer dereference when setting max Chen Ridong
2026-02-02 12:27 ` [PATCH -next v2 2/4] cgroup/dmem: avoid rcu warning when unregister region Chen Ridong
@ 2026-02-02 12:27 ` Chen Ridong
2026-02-02 12:27 ` [PATCH -next v2 4/4] cgroup/dmem: add argument checks in helpers Chen Ridong
2026-02-02 16:17 ` [PATCH -next v2 0/4] cgroup/dmem: bugfixes Tejun Heo
4 siblings, 0 replies; 7+ messages in thread
From: Chen Ridong @ 2026-02-02 12:27 UTC (permalink / raw)
To: dev, mripard, natalie.vock, tj, hannes, mkoutny
Cc: cgroups, dri-devel, linux-kernel, lujialin4, chenridong
From: Chen Ridong <chenridong@huawei.com>
A UAF issue was observed:
BUG: KASAN: slab-use-after-free in page_counter_uncharge+0x65/0x150
Write of size 8 at addr ffff888106715440 by task insmod/527
CPU: 4 UID: 0 PID: 527 Comm: insmod 6.19.0-rc7-next-20260129+ #11
Tainted: [O]=OOT_MODULE
Call Trace:
<TASK>
dump_stack_lvl+0x82/0xd0
kasan_report+0xca/0x100
kasan_check_range+0x39/0x1c0
page_counter_uncharge+0x65/0x150
dmem_cgroup_uncharge+0x1f/0x260
Allocated by task 527:
Freed by task 0:
The buggy address belongs to the object at ffff888106715400
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 64 bytes inside of
freed 512-byte region [ffff888106715400, ffff888106715600)
The buggy address belongs to the physical page:
Memory state around the buggy address:
ffff888106715300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888106715380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888106715400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888106715480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888106715500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
The issue occurs because a pool can still be held by a caller after its
associated memory region is unregistered. The current implementation frees
the pool even if users still hold references to it (e.g., before uncharge
operations complete).
This patch adds a reference counter to each pool, ensuring that a pool is
only freed when its reference count drops to zero.
Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
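Note: a userspace sketch of the lifetime rule described above, assuming
C11 atomics in place of the kernel's refcount_t. The struct pool type and
the pool_*() helpers are hypothetical, the pools_freed counter exists only
so the behaviour can be observed, and the free happens immediately here
rather than after an RCU grace period.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical pool object: one counter plus a pinned parent. */
struct pool {
	atomic_int ref;
	struct pool *parent;
};

static int pools_freed; /* instrumentation for the sketch only */

static struct pool *pool_new(struct pool *parent)
{
	struct pool *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	atomic_init(&p->ref, 1); /* initial reference, owned by the region */
	if (parent) {
		atomic_fetch_add(&parent->ref, 1); /* child pins its parent */
		p->parent = parent;
	}
	return p;
}

/* Take a reference only if the pool is not already on its way out. */
static bool pool_tryget(struct pool *p)
{
	int old = atomic_load(&p->ref);

	while (old != 0) {
		/* On failure 'old' is reloaded and the loop retries. */
		if (atomic_compare_exchange_weak(&p->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the last put frees the pool and unpins the parent. */
static void pool_put(struct pool *p)
{
	while (p && atomic_fetch_sub(&p->ref, 1) == 1) {
		struct pool *parent = p->parent;

		free(p);
		pools_freed++;
		p = parent;
	}
}
```

In this model, unregistering a region drops only the initial reference,
so a pool that is still charged stays allocated until the matching
uncharge calls pool_put().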
kernel/cgroup/dmem.c | 60 ++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 58 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 787b334e0f5d..1ea6afffa985 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -14,6 +14,7 @@
#include <linux/mutex.h>
#include <linux/page_counter.h>
#include <linux/parser.h>
+#include <linux/refcount.h>
#include <linux/rculist.h>
#include <linux/slab.h>
@@ -71,7 +72,9 @@ struct dmem_cgroup_pool_state {
struct rcu_head rcu;
struct page_counter cnt;
+ struct dmem_cgroup_pool_state *parent;
+ refcount_t ref;
bool inited;
};
@@ -88,6 +91,9 @@ struct dmem_cgroup_pool_state {
static DEFINE_SPINLOCK(dmemcg_lock);
static LIST_HEAD(dmem_cgroup_regions);
+static void dmemcg_free_region(struct kref *ref);
+static void dmemcg_pool_free_rcu(struct rcu_head *rcu);
+
static inline struct dmemcg_state *
css_to_dmemcs(struct cgroup_subsys_state *css)
{
@@ -104,10 +110,38 @@ static struct dmemcg_state *parent_dmemcs(struct dmemcg_state *cg)
return cg->css.parent ? css_to_dmemcs(cg->css.parent) : NULL;
}
+static void dmemcg_pool_get(struct dmem_cgroup_pool_state *pool)
+{
+ refcount_inc(&pool->ref);
+}
+
+static bool dmemcg_pool_tryget(struct dmem_cgroup_pool_state *pool)
+{
+ return refcount_inc_not_zero(&pool->ref);
+}
+
+static void dmemcg_pool_put(struct dmem_cgroup_pool_state *pool)
+{
+ if (!refcount_dec_and_test(&pool->ref))
+ return;
+
+ call_rcu(&pool->rcu, dmemcg_pool_free_rcu);
+}
+
+static void dmemcg_pool_free_rcu(struct rcu_head *rcu)
+{
+ struct dmem_cgroup_pool_state *pool = container_of(rcu, typeof(*pool), rcu);
+
+ if (pool->parent)
+ dmemcg_pool_put(pool->parent);
+ kref_put(&pool->region->ref, dmemcg_free_region);
+ kfree(pool);
+}
+
static void free_cg_pool(struct dmem_cgroup_pool_state *pool)
{
list_del(&pool->region_node);
- kfree(pool);
+ dmemcg_pool_put(pool);
}
static void
@@ -342,6 +376,12 @@ alloc_pool_single(struct dmemcg_state *dmemcs, struct dmem_cgroup_region *region
page_counter_init(&pool->cnt,
ppool ? &ppool->cnt : NULL, true);
reset_all_resource_limits(pool);
+ refcount_set(&pool->ref, 1);
+ kref_get(&region->ref);
+ if (ppool && !pool->parent) {
+ pool->parent = ppool;
+ dmemcg_pool_get(ppool);
+ }
list_add_tail_rcu(&pool->css_node, &dmemcs->pools);
list_add_tail(&pool->region_node, &region->pools);
@@ -389,6 +429,10 @@ get_cg_pool_locked(struct dmemcg_state *dmemcs, struct dmem_cgroup_region *regio
/* Fix up parent links, mark as inited. */
pool->cnt.parent = &ppool->cnt;
+ if (ppool && !pool->parent) {
+ pool->parent = ppool;
+ dmemcg_pool_get(ppool);
+ }
pool->inited = true;
pool = ppool;
@@ -435,6 +479,8 @@ void dmem_cgroup_unregister_region(struct dmem_cgroup_region *region)
list_for_each_entry_safe(pool, next, &region->pools, region_node) {
list_del_rcu(&pool->css_node);
+ list_del(&pool->region_node);
+ dmemcg_pool_put(pool);
}
/*
@@ -515,8 +561,10 @@ static struct dmem_cgroup_region *dmemcg_get_region_by_name(const char *name)
*/
void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
{
- if (pool)
+ if (pool) {
css_put(&pool->cs->css);
+ dmemcg_pool_put(pool);
+ }
}
EXPORT_SYMBOL_GPL(dmem_cgroup_pool_state_put);
@@ -530,6 +578,8 @@ get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
pool = find_cg_pool_locked(cg, region);
if (pool && !READ_ONCE(pool->inited))
pool = NULL;
+ if (pool && !dmemcg_pool_tryget(pool))
+ pool = NULL;
rcu_read_unlock();
while (!pool) {
@@ -538,6 +588,8 @@ get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
pool = get_cg_pool_locked(cg, region, &allocpool);
else
pool = ERR_PTR(-ENODEV);
+ if (!IS_ERR(pool))
+ dmemcg_pool_get(pool);
spin_unlock(&dmemcg_lock);
if (pool == ERR_PTR(-ENOMEM)) {
@@ -573,6 +625,7 @@ void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size)
page_counter_uncharge(&pool->cnt, size);
css_put(&pool->cs->css);
+ dmemcg_pool_put(pool);
}
EXPORT_SYMBOL_GPL(dmem_cgroup_uncharge);
@@ -624,7 +677,9 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
if (ret_limit_pool) {
*ret_limit_pool = container_of(fail, struct dmem_cgroup_pool_state, cnt);
css_get(&(*ret_limit_pool)->cs->css);
+ dmemcg_pool_get(*ret_limit_pool);
}
+ dmemcg_pool_put(pool);
ret = -EAGAIN;
goto err;
}
@@ -719,6 +774,7 @@ static ssize_t dmemcg_limit_write(struct kernfs_open_file *of,
/* And commit */
apply(pool, new_limit);
+ dmemcg_pool_put(pool);
out_put:
kref_put(&region->ref, dmemcg_free_region);
--
2.34.1
* [PATCH -next v2 4/4] cgroup/dmem: add argument checks in helpers
2026-02-02 12:27 [PATCH -next v2 0/4] cgroup/dmem: bugfixes Chen Ridong
` (2 preceding siblings ...)
2026-02-02 12:27 ` [PATCH -next v2 3/4] cgroup/dmem: avoid pool UAF Chen Ridong
@ 2026-02-02 12:27 ` Chen Ridong
2026-02-02 16:17 ` [PATCH -next v2 0/4] cgroup/dmem: bugfixes Tejun Heo
4 siblings, 0 replies; 7+ messages in thread
From: Chen Ridong @ 2026-02-02 12:27 UTC (permalink / raw)
To: dev, mripard, natalie.vock, tj, hannes, mkoutny
Cc: cgroups, dri-devel, linux-kernel, lujialin4, chenridong
From: Chen Ridong <chenridong@huawei.com>
Add WARN_ON_ONCE guards for NULL-sensitive arguments in dmem helpers to
avoid NULL dereferences on misused APIs. Valid callers are unaffected.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/dmem.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 1ea6afffa985..aa5bacf5fe45 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -307,6 +307,9 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
struct page_counter *ctest;
u64 used, min, low;
+ if (WARN_ON_ONCE(!test_pool))
+ return false;
+
/* Can always evict from current pool, despite limits */
if (limit_pool == test_pool)
return true;
@@ -343,7 +346,8 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
low = READ_ONCE(ctest->elow);
if (used > low)
return true;
-
+ if (WARN_ON_ONCE(!ret_hit_low))
+ return false;
*ret_hit_low = true;
return false;
}
@@ -512,7 +516,7 @@ struct dmem_cgroup_region *dmem_cgroup_register_region(u64 size, const char *fmt
char *region_name;
va_list ap;
- if (!size)
+ if (WARN_ON_ONCE(!size || !fmt))
return NULL;
va_start(ap, fmt);
@@ -520,6 +524,10 @@ struct dmem_cgroup_region *dmem_cgroup_register_region(u64 size, const char *fmt
va_end(ap);
if (!region_name)
return ERR_PTR(-ENOMEM);
+ if (WARN_ON_ONCE(!region_name[0])) {
+ kfree(region_name);
+ return ERR_PTR(-EINVAL);
+ }
ret = kzalloc(sizeof(*ret), GFP_KERNEL);
if (!ret) {
@@ -657,6 +665,9 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
struct page_counter *fail;
int ret;
+ if (WARN_ON_ONCE(!region || !ret_pool))
+ return -EINVAL;
+
*ret_pool = NULL;
if (ret_limit_pool)
*ret_limit_pool = NULL;
--
2.34.1
* Re: [PATCH -next v2 0/4] cgroup/dmem: bugfixes
2026-02-02 12:27 [PATCH -next v2 0/4] cgroup/dmem: bugfixes Chen Ridong
` (3 preceding siblings ...)
2026-02-02 12:27 ` [PATCH -next v2 4/4] cgroup/dmem: add argument checks in helpers Chen Ridong
@ 2026-02-02 16:17 ` Tejun Heo
2026-02-03 0:37 ` Chen Ridong
4 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2026-02-02 16:17 UTC (permalink / raw)
To: Chen Ridong
Cc: dev, mripard, natalie.vock, hannes, mkoutny, cgroups, dri-devel,
linux-kernel, lujialin4
> Chen Ridong (4):
> cgroup/dmem: fix NULL pointer dereference when setting max
> cgroup/dmem: avoid rcu warning when unregister region
> cgroup/dmem: avoid pool UAF
> cgroup/dmem: add argument checks in helpers
Applied 1-3 to cgroup/for-6.19-fixes w/ stable tags added.
I dropped 4/4 as we don't want this kind of blanket input validation
unless there are specific reasons to do so.
Thanks.
--
tejun
* Re: [PATCH -next v2 0/4] cgroup/dmem: bugfixes
2026-02-02 16:17 ` [PATCH -next v2 0/4] cgroup/dmem: bugfixes Tejun Heo
@ 2026-02-03 0:37 ` Chen Ridong
0 siblings, 0 replies; 7+ messages in thread
From: Chen Ridong @ 2026-02-03 0:37 UTC (permalink / raw)
To: Tejun Heo
Cc: dev, mripard, natalie.vock, hannes, mkoutny, cgroups, dri-devel,
linux-kernel, lujialin4
On 2026/2/3 0:17, Tejun Heo wrote:
>> Chen Ridong (4):
>> cgroup/dmem: fix NULL pointer dereference when setting max
>> cgroup/dmem: avoid rcu warning when unregister region
>> cgroup/dmem: avoid pool UAF
>> cgroup/dmem: add argument checks in helpers
>
> Applied 1-3 to cgroup/for-6.19-fixes w/ stable tags added.
>
> I dropped 4/4 as we don't want this kind of blanket input validation
> unless there are specific reasons to do so.
>
Thank you.
--
Best regards,
Ridong