cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] kernfs: Fix UAF in PSI polling when open file is released
@ 2025-08-15  1:34 Chen Ridong
  2025-08-15  6:11 ` Greg KH
       [not found] ` <0319ee9b-ce2c-4c02-a731-c538afcf008f@huawei.com>
  0 siblings, 2 replies; 11+ messages in thread
From: Chen Ridong @ 2025-08-15  1:34 UTC (permalink / raw)
  To: gregkh, tj, hannes, mkoutny, peterz, zhouchengming
  Cc: linux-kernel, cgroups, lujialin4, chenridong

From: Chen Ridong <chenridong@huawei.com>

A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
Stall Information) monitoring mechanism:

BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
Read of size 8 at addr ffff3de3d50bd308 by task systemd/1

psi_trigger_poll+0x3c/0x140
cgroup_pressure_poll+0x70/0xa0
cgroup_file_poll+0x8c/0x100
kernfs_fop_poll+0x11c/0x1c0
ep_item_poll.isra.0+0x188/0x2c0

Allocated by task 1:
cgroup_file_open+0x88/0x388
kernfs_fop_open+0x73c/0xaf0
do_dentry_open+0x5fc/0x1200
vfs_open+0xa0/0x3f0
do_open+0x7e8/0xd08
path_openat+0x2fc/0x6b0
do_filp_open+0x174/0x368

Freed by task 8462:
cgroup_file_release+0x130/0x1f8
kernfs_drain_open_files+0x17c/0x440
kernfs_drain+0x2dc/0x360
kernfs_show+0x1b8/0x288
cgroup_file_show+0x150/0x268
cgroup_pressure_write+0x1dc/0x340
cgroup_file_write+0x274/0x548

Reproduction Steps:
1. Open test/cpu.pressure and establish epoll monitoring
2. Disable monitoring: echo 0 > test/cgroup.pressure
3. Re-enable monitoring: echo 1 > test/cgroup.pressure

The race condition occurs because:
1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
   - Releases PSI triggers via cgroup_file_release()
   - Frees of->priv through kernfs_drain_open_files()
2. While epoll still holds reference to the file and continues polling
3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv

epolling			disable/enable cgroup.pressure
fd=open(cpu.pressure)
while(1)
...
epoll_wait
kernfs_fop_poll
kernfs_get_active = true	echo 0 > cgroup.pressure
...				cgroup_file_show
				kernfs_show
				// inactive kn
				kernfs_drain_open_files
				cft->release(of);
				kfree(ctx);
				...
kernfs_get_active = false
				echo 1 > cgroup.pressure
				kernfs_show
				kernfs_activate_one(kn);
kernfs_fop_poll
kernfs_get_active = true
cgroup_file_poll
psi_trigger_poll
// UAF
...
end: close(fd)

Fix this by adding released flag check in kernfs_fop_poll(), which is
treated as kn is inactive.

Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 fs/kernfs/file.c       | 2 +-
 kernel/cgroup/cgroup.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index a6c692cac616..d5d01f0b9392 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
 	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
 	__poll_t ret;
 
-	if (!kernfs_get_active(kn))
+	if (of->released || !kernfs_get_active(kn))
 		return DEFAULT_POLLMASK|EPOLLERR|EPOLLPRI;
 
 	if (kn->attr.ops->poll)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 312c6a8b55bb..d8b82afed181 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4159,6 +4159,7 @@ static void cgroup_file_release(struct kernfs_open_file *of)
 		cft->release(of);
 	put_cgroup_ns(ctx->ns);
 	kfree(ctx);
+	of->priv = NULL;
 }
 
 static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
  2025-08-15  1:34 [PATCH] kernfs: Fix UAF in PSI polling when open file is released Chen Ridong
@ 2025-08-15  6:11 ` Greg KH
  2025-08-15  6:22   ` Chen Ridong
  2025-08-15 14:42   ` Michal Koutný
       [not found] ` <0319ee9b-ce2c-4c02-a731-c538afcf008f@huawei.com>
  1 sibling, 2 replies; 11+ messages in thread
From: Greg KH @ 2025-08-15  6:11 UTC (permalink / raw)
  To: Chen Ridong
  Cc: tj, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups,
	lujialin4, chenridong

On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
> 
> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
> Stall Information) monitoring mechanism:
> 
> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1
> 
> psi_trigger_poll+0x3c/0x140
> cgroup_pressure_poll+0x70/0xa0
> cgroup_file_poll+0x8c/0x100
> kernfs_fop_poll+0x11c/0x1c0
> ep_item_poll.isra.0+0x188/0x2c0
> 
> Allocated by task 1:
> cgroup_file_open+0x88/0x388
> kernfs_fop_open+0x73c/0xaf0
> do_dentry_open+0x5fc/0x1200
> vfs_open+0xa0/0x3f0
> do_open+0x7e8/0xd08
> path_openat+0x2fc/0x6b0
> do_filp_open+0x174/0x368
> 
> Freed by task 8462:
> cgroup_file_release+0x130/0x1f8
> kernfs_drain_open_files+0x17c/0x440
> kernfs_drain+0x2dc/0x360
> kernfs_show+0x1b8/0x288
> cgroup_file_show+0x150/0x268
> cgroup_pressure_write+0x1dc/0x340
> cgroup_file_write+0x274/0x548
> 
> Reproduction Steps:
> 1. Open test/cpu.pressure and establish epoll monitoring
> 2. Disable monitoring: echo 0 > test/cgroup.pressure
> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure
> 
> The race condition occurs because:
> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
>    - Releases PSI triggers via cgroup_file_release()
>    - Frees of->priv through kernfs_drain_open_files()
> 2. While epoll still holds reference to the file and continues polling
> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
> 
> epolling			disable/enable cgroup.pressure
> fd=open(cpu.pressure)
> while(1)
> ...
> epoll_wait
> kernfs_fop_poll
> kernfs_get_active = true	echo 0 > cgroup.pressure
> ...				cgroup_file_show
> 				kernfs_show
> 				// inactive kn
> 				kernfs_drain_open_files
> 				cft->release(of);
> 				kfree(ctx);
> 				...
> kernfs_get_active = false
> 				echo 1 > cgroup.pressure
> 				kernfs_show
> 				kernfs_activate_one(kn);
> kernfs_fop_poll
> kernfs_get_active = true
> cgroup_file_poll
> psi_trigger_poll
> // UAF
> ...
> end: close(fd)
> 
> Fix this by adding released flag check in kernfs_fop_poll(), which is
> treated as kn is inactive.
> 
> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  fs/kernfs/file.c       | 2 +-
>  kernel/cgroup/cgroup.c | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index a6c692cac616..d5d01f0b9392 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
>  	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
>  	__poll_t ret;
>  
> -	if (!kernfs_get_active(kn))
> +	if (of->released || !kernfs_get_active(kn))

I can see why the cgroup change is needed, but why is this test for
released() an issue here?  This feels like two different changes/fixes
for different problems?  Why does testing for released matter at this
point in time?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
  2025-08-15  6:11 ` Greg KH
@ 2025-08-15  6:22   ` Chen Ridong
  2025-08-15  6:43     ` Greg KH
  2025-08-15 14:42   ` Michal Koutný
  1 sibling, 1 reply; 11+ messages in thread
From: Chen Ridong @ 2025-08-15  6:22 UTC (permalink / raw)
  To: Greg KH
  Cc: tj, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups,
	lujialin4, chenridong



On 2025/8/15 14:11, Greg KH wrote:
> On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
>> Stall Information) monitoring mechanism:
>>
>> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
>> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1
>>
>> psi_trigger_poll+0x3c/0x140
>> cgroup_pressure_poll+0x70/0xa0
>> cgroup_file_poll+0x8c/0x100
>> kernfs_fop_poll+0x11c/0x1c0
>> ep_item_poll.isra.0+0x188/0x2c0
>>
>> Allocated by task 1:
>> cgroup_file_open+0x88/0x388
>> kernfs_fop_open+0x73c/0xaf0
>> do_dentry_open+0x5fc/0x1200
>> vfs_open+0xa0/0x3f0
>> do_open+0x7e8/0xd08
>> path_openat+0x2fc/0x6b0
>> do_filp_open+0x174/0x368
>>
>> Freed by task 8462:
>> cgroup_file_release+0x130/0x1f8
>> kernfs_drain_open_files+0x17c/0x440
>> kernfs_drain+0x2dc/0x360
>> kernfs_show+0x1b8/0x288
>> cgroup_file_show+0x150/0x268
>> cgroup_pressure_write+0x1dc/0x340
>> cgroup_file_write+0x274/0x548
>>
>> Reproduction Steps:
>> 1. Open test/cpu.pressure and establish epoll monitoring
>> 2. Disable monitoring: echo 0 > test/cgroup.pressure
>> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure
>>
>> The race condition occurs because:
>> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
>>    - Releases PSI triggers via cgroup_file_release()
>>    - Frees of->priv through kernfs_drain_open_files()
>> 2. While epoll still holds reference to the file and continues polling
>> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
>>
>> epolling			disable/enable cgroup.pressure
>> fd=open(cpu.pressure)
>> while(1)
>> ...
>> epoll_wait
>> kernfs_fop_poll
>> kernfs_get_active = true	echo 0 > cgroup.pressure
>> ...				cgroup_file_show
>> 				kernfs_show
>> 				// inactive kn
>> 				kernfs_drain_open_files
>> 				cft->release(of);
>> 				kfree(ctx);
>> 				...
>> kernfs_get_active = false
>> 				echo 1 > cgroup.pressure
>> 				kernfs_show
>> 				kernfs_activate_one(kn);
>> kernfs_fop_poll
>> kernfs_get_active = true
>> cgroup_file_poll
>> psi_trigger_poll
>> // UAF
>> ...
>> end: close(fd)
>>
>> Fix this by adding released flag check in kernfs_fop_poll(), which is
>> treated as kn is inactive.
>>
>> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
>> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com>
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>>  fs/kernfs/file.c       | 2 +-
>>  kernel/cgroup/cgroup.c | 1 +
>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
>> index a6c692cac616..d5d01f0b9392 100644
>> --- a/fs/kernfs/file.c
>> +++ b/fs/kernfs/file.c
>> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
>>  	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
>>  	__poll_t ret;
>>  
>> -	if (!kernfs_get_active(kn))
>> +	if (of->released || !kernfs_get_active(kn))
> 
> I can see why the cgroup change is needed, but why is this test for
> released() an issue here?  This feels like two different changes/fixes
> for different problems?  Why does testing for released matter at this
> point in time?
> 
> thanks,
> 
> greg k-h

Thank you for your feedback.

The cgroup changes can prevent the UAF (Use-After-Free) issue, but they will introduce a NULL
pointer access problem.

If the open file is properly drained(released), it can not safely invoke the poll callback again.
Otherwise, it may still lead to either UAF or NULL pointer issues

Are you suggesting I should split this into two separate patches?

-- 
Best regards,
Ridong


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
  2025-08-15  6:22   ` Chen Ridong
@ 2025-08-15  6:43     ` Greg KH
  2025-08-15  7:16       ` Chen Ridong
  0 siblings, 1 reply; 11+ messages in thread
From: Greg KH @ 2025-08-15  6:43 UTC (permalink / raw)
  To: Chen Ridong
  Cc: tj, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups,
	lujialin4, chenridong

On Fri, Aug 15, 2025 at 02:22:54PM +0800, Chen Ridong wrote:
> 
> 
> On 2025/8/15 14:11, Greg KH wrote:
> > On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote:
> >> From: Chen Ridong <chenridong@huawei.com>
> >>
> >> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
> >> Stall Information) monitoring mechanism:
> >>
> >> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
> >> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1
> >>
> >> psi_trigger_poll+0x3c/0x140
> >> cgroup_pressure_poll+0x70/0xa0
> >> cgroup_file_poll+0x8c/0x100
> >> kernfs_fop_poll+0x11c/0x1c0
> >> ep_item_poll.isra.0+0x188/0x2c0
> >>
> >> Allocated by task 1:
> >> cgroup_file_open+0x88/0x388
> >> kernfs_fop_open+0x73c/0xaf0
> >> do_dentry_open+0x5fc/0x1200
> >> vfs_open+0xa0/0x3f0
> >> do_open+0x7e8/0xd08
> >> path_openat+0x2fc/0x6b0
> >> do_filp_open+0x174/0x368
> >>
> >> Freed by task 8462:
> >> cgroup_file_release+0x130/0x1f8
> >> kernfs_drain_open_files+0x17c/0x440
> >> kernfs_drain+0x2dc/0x360
> >> kernfs_show+0x1b8/0x288
> >> cgroup_file_show+0x150/0x268
> >> cgroup_pressure_write+0x1dc/0x340
> >> cgroup_file_write+0x274/0x548
> >>
> >> Reproduction Steps:
> >> 1. Open test/cpu.pressure and establish epoll monitoring
> >> 2. Disable monitoring: echo 0 > test/cgroup.pressure
> >> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure
> >>
> >> The race condition occurs because:
> >> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
> >>    - Releases PSI triggers via cgroup_file_release()
> >>    - Frees of->priv through kernfs_drain_open_files()
> >> 2. While epoll still holds reference to the file and continues polling
> >> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
> >>
> >> epolling			disable/enable cgroup.pressure
> >> fd=open(cpu.pressure)
> >> while(1)
> >> ...
> >> epoll_wait
> >> kernfs_fop_poll
> >> kernfs_get_active = true	echo 0 > cgroup.pressure
> >> ...				cgroup_file_show
> >> 				kernfs_show
> >> 				// inactive kn
> >> 				kernfs_drain_open_files
> >> 				cft->release(of);
> >> 				kfree(ctx);
> >> 				...
> >> kernfs_get_active = false
> >> 				echo 1 > cgroup.pressure
> >> 				kernfs_show
> >> 				kernfs_activate_one(kn);
> >> kernfs_fop_poll
> >> kernfs_get_active = true
> >> cgroup_file_poll
> >> psi_trigger_poll
> >> // UAF
> >> ...
> >> end: close(fd)
> >>
> >> Fix this by adding released flag check in kernfs_fop_poll(), which is
> >> treated as kn is inactive.
> >>
> >> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
> >> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com>
> >> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> >> ---
> >>  fs/kernfs/file.c       | 2 +-
> >>  kernel/cgroup/cgroup.c | 1 +
> >>  2 files changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> >> index a6c692cac616..d5d01f0b9392 100644
> >> --- a/fs/kernfs/file.c
> >> +++ b/fs/kernfs/file.c
> >> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
> >>  	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
> >>  	__poll_t ret;
> >>  
> >> -	if (!kernfs_get_active(kn))
> >> +	if (of->released || !kernfs_get_active(kn))
> > 
> > I can see why the cgroup change is needed, but why is this test for
> > released() an issue here?  This feels like two different changes/fixes
> > for different problems?  Why does testing for released matter at this
> > point in time?
> > 
> > thanks,
> > 
> > greg k-h
> 
> Thank you for your feedback.
> 
> The cgroup changes can prevent the UAF (Use-After-Free) issue, but they will introduce a NULL
> pointer access problem.

Where will the null pointer access happen?  kernfs_fop_poll() isn't
caring about the released file, AND you need to properly lock things
when testing that value (hint, what if it changes right after you tested
it?)

> If the open file is properly drained(released), it can not safely invoke the poll callback again.
> Otherwise, it may still lead to either UAF or NULL pointer issues
> 
> Are you suggesting I should split this into two separate patches?

This feels like two separate issues for two different things.  The PSI
change didn't cause the kernfs change to be required right?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
  2025-08-15  6:43     ` Greg KH
@ 2025-08-15  7:16       ` Chen Ridong
  0 siblings, 0 replies; 11+ messages in thread
From: Chen Ridong @ 2025-08-15  7:16 UTC (permalink / raw)
  To: Greg KH
  Cc: tj, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups,
	lujialin4, chenridong



On 2025/8/15 14:43, Greg KH wrote:
> On Fri, Aug 15, 2025 at 02:22:54PM +0800, Chen Ridong wrote:
>>
>>
>> On 2025/8/15 14:11, Greg KH wrote:
>>> On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote:
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>
>>>> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
>>>> Stall Information) monitoring mechanism:
>>>>
>>>> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
>>>> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1
>>>>
>>>> psi_trigger_poll+0x3c/0x140
>>>> cgroup_pressure_poll+0x70/0xa0
>>>> cgroup_file_poll+0x8c/0x100
>>>> kernfs_fop_poll+0x11c/0x1c0
>>>> ep_item_poll.isra.0+0x188/0x2c0
>>>>
>>>> Allocated by task 1:
>>>> cgroup_file_open+0x88/0x388
>>>> kernfs_fop_open+0x73c/0xaf0
>>>> do_dentry_open+0x5fc/0x1200
>>>> vfs_open+0xa0/0x3f0
>>>> do_open+0x7e8/0xd08
>>>> path_openat+0x2fc/0x6b0
>>>> do_filp_open+0x174/0x368
>>>>
>>>> Freed by task 8462:
>>>> cgroup_file_release+0x130/0x1f8
>>>> kernfs_drain_open_files+0x17c/0x440
>>>> kernfs_drain+0x2dc/0x360
>>>> kernfs_show+0x1b8/0x288
>>>> cgroup_file_show+0x150/0x268
>>>> cgroup_pressure_write+0x1dc/0x340
>>>> cgroup_file_write+0x274/0x548
>>>>
>>>> Reproduction Steps:
>>>> 1. Open test/cpu.pressure and establish epoll monitoring
>>>> 2. Disable monitoring: echo 0 > test/cgroup.pressure
>>>> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure
>>>>
>>>> The race condition occurs because:
>>>> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
>>>>    - Releases PSI triggers via cgroup_file_release()
>>>>    - Frees of->priv through kernfs_drain_open_files()
>>>> 2. While epoll still holds reference to the file and continues polling
>>>> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
>>>>
>>>> epolling			disable/enable cgroup.pressure
>>>> fd=open(cpu.pressure)
>>>> while(1)
>>>> ...
>>>> epoll_wait
>>>> kernfs_fop_poll
>>>> kernfs_get_active = true	echo 0 > cgroup.pressure
>>>> ...				cgroup_file_show
>>>> 				kernfs_show
>>>> 				// inactive kn
>>>> 				kernfs_drain_open_files
>>>> 				cft->release(of);
>>>> 				kfree(ctx);
>>>> 				...
>>>> kernfs_get_active = false
>>>> 				echo 1 > cgroup.pressure
>>>> 				kernfs_show
>>>> 				kernfs_activate_one(kn);
>>>> kernfs_fop_poll
>>>> kernfs_get_active = true
>>>> cgroup_file_poll
>>>> psi_trigger_poll
>>>> // UAF
>>>> ...
>>>> end: close(fd)
>>>>
>>>> Fix this by adding released flag check in kernfs_fop_poll(), which is
>>>> treated as kn is inactive.
>>>>
>>>> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
>>>> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com>
>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>> ---
>>>>  fs/kernfs/file.c       | 2 +-
>>>>  kernel/cgroup/cgroup.c | 1 +
>>>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
>>>> index a6c692cac616..d5d01f0b9392 100644
>>>> --- a/fs/kernfs/file.c
>>>> +++ b/fs/kernfs/file.c
>>>> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
>>>>  	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
>>>>  	__poll_t ret;
>>>>  
>>>> -	if (!kernfs_get_active(kn))
>>>> +	if (of->released || !kernfs_get_active(kn))
>>>
>>> I can see why the cgroup change is needed, but why is this test for
>>> released() an issue here?  This feels like two different changes/fixes
>>> for different problems?  Why does testing for released matter at this
>>> point in time?
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> Thank you for your feedback.
>>
>> The cgroup changes can prevent the UAF (Use-After-Free) issue, but they will introduce a NULL
>> pointer access problem.
> 
> Where will the null pointer access happen?  kernfs_fop_poll() isn't
> caring about the released file, AND you need to properly lock things
> when testing that value (hint, what if it changes right after you tested
> it?)
> 


Issue occurs in psi_trigger_poll() during this sequence:

fd = open("cpu.pressure")
<...> // cgroup.pressure disabled then re-enabled
of->released flag gets set to true
kernfs_fop_poll()
└─ kernfs_get_active() returns true (due to re-enable) <<<--  kernfs changes, check of->released
    └─ kn->attr.ops->poll()
        └─ cgroup_file_poll()
	    └─ *ctx = of->priv; // Already released by .release()(cgroup_file_release)
                 └─ psi_trigger_poll()
                     └─ smp_load_acquire(trigger_ptr);
// UAF: trigger_ptr == of->priv
// NULL dereference after of->priv = NULL patch


>> If the open file is properly drained(released), it can not safely invoke the poll callback again.
>> Otherwise, it may still lead to either UAF or NULL pointer issues
>>
>> Are you suggesting I should split this into two separate patches?
> 
> This feels like two separate issues for two different things.  The PSI
> change didn't cause the kernfs change to be required right?
> 
> thanks,
> 
> greg k-h

Correct. The cgroup modifications make the issue more easily observable, while the kernfs changes
provide the actual fix.

-- 
Best regards,
Ridong


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
  2025-08-15  6:11 ` Greg KH
  2025-08-15  6:22   ` Chen Ridong
@ 2025-08-15 14:42   ` Michal Koutný
  2025-08-18  7:41     ` Chen Ridong
  1 sibling, 1 reply; 11+ messages in thread
From: Michal Koutný @ 2025-08-15 14:42 UTC (permalink / raw)
  To: Greg KH
  Cc: Chen Ridong, tj, hannes, peterz, zhouchengming, linux-kernel,
	cgroups, lujialin4, chenridong

[-- Attachment #1: Type: text/plain, Size: 1390 bytes --]

On Fri, Aug 15, 2025 at 08:11:39AM +0200, Greg KH <gregkh@linuxfoundation.org> wrote:
> > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> > index a6c692cac616..d5d01f0b9392 100644
> > --- a/fs/kernfs/file.c
> > +++ b/fs/kernfs/file.c
> > @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
> >  	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
> >  	__poll_t ret;
> >  
> > -	if (!kernfs_get_active(kn))
> > +	if (of->released || !kernfs_get_active(kn))
> 
> I can see why the cgroup change is needed,

I don't see it that much. of->priv isn't checked in cgroup code anywhere
so it isn't helpful zeroing. As Ridong writes it may trade UaF for NULL
pointer deref :-/ (Additionally, same zeroing would be needed in error
path in cgroup_file_open().)

I _think_ the place to cleanup would be in
@@ -3978,6 +3978,8 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of,
                psi->enabled = enable;
                if (enable)
                        psi_cgroup_restart(psi);
+               else
+                       psi_trigger_destroy(???);
        }

        cgroup_kn_unlock(of->kn);

The issue is that cgroup_pressure_write doesn't know all possible
triggers to be cancelled. (The fix with of->released would only
sanitize effect but not the cause IMO.)

HTH,
Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
  2025-08-15 14:42   ` Michal Koutný
@ 2025-08-18  7:41     ` Chen Ridong
  0 siblings, 0 replies; 11+ messages in thread
From: Chen Ridong @ 2025-08-18  7:41 UTC (permalink / raw)
  To: Michal Koutný, Greg KH
  Cc: tj, hannes, peterz, zhouchengming, linux-kernel, cgroups,
	lujialin4, chenridong



On 2025/8/15 22:42, Michal Koutný wrote:
> On Fri, Aug 15, 2025 at 08:11:39AM +0200, Greg KH <gregkh@linuxfoundation.org> wrote:
>>> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
>>> index a6c692cac616..d5d01f0b9392 100644
>>> --- a/fs/kernfs/file.c
>>> +++ b/fs/kernfs/file.c
>>> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
>>>  	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
>>>  	__poll_t ret;
>>>  
>>> -	if (!kernfs_get_active(kn))
>>> +	if (of->released || !kernfs_get_active(kn))
>>
>> I can see why the cgroup change is needed,
> 
> I don't see it that much. of->priv isn't checked in cgroup code anywhere
> so it isn't helpful zeroing. As Ridong writes it may trade UaF for NULL
> pointer deref :-/ (Additionally, same zeroing would be needed in error
> path in cgroup_file_open().)
> 

Thank you, Michal,

I believe assigning NULL to of->priv should be harmless. This change would make the bug more
observable in practice. Without this explicit NULL assignment, the use-after-free (UAF) issue might
remain hidden in some cases, particularly when KASAN is disabled.

> I _think_ the place to cleanup would be in
> @@ -3978,6 +3978,8 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of,
>                 psi->enabled = enable;
>                 if (enable)
>                         psi_cgroup_restart(psi);
> +               else
> +                       psi_trigger_destroy(???);
>         }
> 

Could you please provide more details about this modification? Do you mean we need to consider
additional cleanup work when disabling cgroup.pressure? The psi_trigger_destroy is invoked as
follows:

cgroup_file_show
  kernfs_drain
    kernfs_drain_open_files
      kernfs_release_file
        cgroup_file_release
          cft->release(of);
            cgroup_pressure_release
              psi_trigger_destroy

>         cgroup_kn_unlock(of->kn);
> 
> The issue is that cgroup_pressure_write doesn't know all possible
> triggers to be cancelled. (The fix with of->released would only
> sanitize effect but not the cause IMO.)
> 
> HTH,
> Michal

-- 
Best regards,
Ridong


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
       [not found] ` <0319ee9b-ce2c-4c02-a731-c538afcf008f@huawei.com>
@ 2025-08-18  8:00   ` Chen Ridong
  2025-08-18  8:22     ` Chen Ridong
  2025-08-20  1:46     ` Tejun Heo
  0 siblings, 2 replies; 11+ messages in thread
From: Chen Ridong @ 2025-08-18  8:00 UTC (permalink / raw)
  To: Baokun Li
  Cc: cgroups, chenridong, gregkh, hannes, linux-kernel, lujialin4,
	mkoutny, peterz, tj, zhouchengming, Yang Erkun



On 2025/8/16 17:53, Baokun Li wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
>> Stall Information) monitoring mechanism:
>>
>> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
>> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1
>>
>> psi_trigger_poll+0x3c/0x140
>> cgroup_pressure_poll+0x70/0xa0
>> cgroup_file_poll+0x8c/0x100
>> kernfs_fop_poll+0x11c/0x1c0
>> ep_item_poll.isra.0+0x188/0x2c0
>>
>> Allocated by task 1:
>> cgroup_file_open+0x88/0x388
>> kernfs_fop_open+0x73c/0xaf0
>> do_dentry_open+0x5fc/0x1200
>> vfs_open+0xa0/0x3f0
>> do_open+0x7e8/0xd08
>> path_openat+0x2fc/0x6b0
>> do_filp_open+0x174/0x368
>>
>> Freed by task 8462:
>> cgroup_file_release+0x130/0x1f8
>> kernfs_drain_open_files+0x17c/0x440
>> kernfs_drain+0x2dc/0x360
>> kernfs_show+0x1b8/0x288
>> cgroup_file_show+0x150/0x268
>> cgroup_pressure_write+0x1dc/0x340
>> cgroup_file_write+0x274/0x548
>>
>> Reproduction Steps:
>> 1. Open test/cpu.pressure and establish epoll monitoring
>> 2. Disable monitoring: echo 0 > test/cgroup.pressure
>> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure
>>
>> The race condition occurs because:
>> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
>>    - Releases PSI triggers via cgroup_file_release()
>>    - Frees of->priv through kernfs_drain_open_files()
>> 2. While epoll still holds reference to the file and continues polling
>> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
>>
>> epolling			disable/enable cgroup.pressure
>> fd=open(cpu.pressure)
>> while(1)
>> ...
>> epoll_wait
>> kernfs_fop_poll
>> kernfs_get_active = true	echo 0 > cgroup.pressure
>> ...				cgroup_file_show
>> 				kernfs_show
>> 				// inactive kn
>> 				kernfs_drain_open_files
>> 				cft->release(of);
>> 				kfree(ctx);
>> 				...
>> kernfs_get_active = false
>> 				echo 1 > cgroup.pressure
>> 				kernfs_show
>> 				kernfs_activate_one(kn);
>> kernfs_fop_poll
>> kernfs_get_active = true
>> cgroup_file_poll
>> psi_trigger_poll
>> // UAF
>> ...
>> end: close(fd)

Thank you, Baokun.

> I think the problem is that kernfs_show() handles enable and disable
> inconsistently. When disable is called, it sets kn->active and then frees
> cgroup_file_ctx and psi_trigger. But when enable is called, it only sets
> kn->active. This mismatch means we can end up accessing the freed
> cgroup_file_ctx and psi_trigger later on.
>

I agree with that.

> A potential solution is to make the lifecycles of cgroup_file_ctx and
> psi_trigger match the struct kernfs_open_file they're associated with.
> Maybe we could just get rid of the kernfs_release_file call in
> kernfs_drain_open_files?
>

Hi, Tj, what do you think about this solution?

> That way, the resources would be safely released only when the file
> descriptor is actually freed. Plus, if cgroup.pressure is re-enabled,
> any open file descriptors would still work as expected.
> 
> 
> Cheers,
> Baokun
> 

-- 
Best regards,
Ridong


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
  2025-08-18  8:00   ` Chen Ridong
@ 2025-08-18  8:22     ` Chen Ridong
  2025-08-20  1:46     ` Tejun Heo
  1 sibling, 0 replies; 11+ messages in thread
From: Chen Ridong @ 2025-08-18  8:22 UTC (permalink / raw)
  To: Baokun Li
  Cc: cgroups, chenridong, gregkh, hannes, linux-kernel, lujialin4,
	mkoutny, peterz, tj, zhouchengming, Yang Erkun



On 2025/8/18 16:00, Chen Ridong wrote:
> 
> 
> On 2025/8/16 17:53, Baokun Li wrote:
>>> From: Chen Ridong <chenridong@huawei.com>
>>>
>>> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
>>> Stall Information) monitoring mechanism:
>>>
>>> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
>>> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1
>>>
>>> psi_trigger_poll+0x3c/0x140
>>> cgroup_pressure_poll+0x70/0xa0
>>> cgroup_file_poll+0x8c/0x100
>>> kernfs_fop_poll+0x11c/0x1c0
>>> ep_item_poll.isra.0+0x188/0x2c0
>>>
>>> Allocated by task 1:
>>> cgroup_file_open+0x88/0x388
>>> kernfs_fop_open+0x73c/0xaf0
>>> do_dentry_open+0x5fc/0x1200
>>> vfs_open+0xa0/0x3f0
>>> do_open+0x7e8/0xd08
>>> path_openat+0x2fc/0x6b0
>>> do_filp_open+0x174/0x368
>>>
>>> Freed by task 8462:
>>> cgroup_file_release+0x130/0x1f8
>>> kernfs_drain_open_files+0x17c/0x440
>>> kernfs_drain+0x2dc/0x360
>>> kernfs_show+0x1b8/0x288
>>> cgroup_file_show+0x150/0x268
>>> cgroup_pressure_write+0x1dc/0x340
>>> cgroup_file_write+0x274/0x548
>>>
>>> Reproduction Steps:
>>> 1. Open test/cpu.pressure and establish epoll monitoring
>>> 2. Disable monitoring: echo 0 > test/cgroup.pressure
>>> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure
>>>
>>> The race condition occurs because:
>>> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
>>>    - Releases PSI triggers via cgroup_file_release()
>>>    - Frees of->priv through kernfs_drain_open_files()
>>> 2. While epoll still holds reference to the file and continues polling
>>> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
>>>
>>> epolling			disable/enable cgroup.pressure
>>> fd=open(cpu.pressure)
>>> while(1)
>>> ...
>>> epoll_wait
>>> kernfs_fop_poll
>>> kernfs_get_active = true	echo 0 > cgroup.pressure
>>> ...				cgroup_file_show
>>> 				kernfs_show
>>> 				// inactive kn
>>> 				kernfs_drain_open_files
>>> 				cft->release(of);
>>> 				kfree(ctx);
>>> 				...
>>> kernfs_get_active = false
>>> 				echo 1 > cgroup.pressure
>>> 				kernfs_show
>>> 				kernfs_activate_one(kn);
>>> kernfs_fop_poll
>>> kernfs_get_active = true
>>> cgroup_file_poll
>>> psi_trigger_poll
>>> // UAF
>>> ...
>>> end: close(fd)
> 
> Thank you, Baokun.
> 
>> I think the problem is that kernfs_show() handles enable and disable
>> inconsistently. When disable is called, it sets kn->active and then frees
>> cgroup_file_ctx and psi_trigger. But when enable is called, it only sets
>> kn->active. This mismatch means we can end up accessing the freed
>> cgroup_file_ctx and psi_trigger later on.
>>
> 
> I agree with that.
> 
>> A potential solution is to make the lifecycles of cgroup_file_ctx and
>> psi_trigger match the struct kernfs_open_file they're associated with.
>> Maybe we could just get rid of the kernfs_release_file call in
>> kernfs_drain_open_files?
>>
> 
> Hi, Tj, what do you think about this solution?
> 

Or we should add KERNFS_HIDDEN check?

--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -812,7 +812,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
                        on->nr_mmapped--;
                }

-               if (kn->flags & KERNFS_HAS_RELEASE)
+               if (!kn->flags & KERNFS_HIDDEN && kn->flags & KERNFS_HAS_RELEASE)
                        kernfs_release_file(kn, of);
        }

-- 
Best regards,
Ridong


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
  2025-08-18  8:00   ` Chen Ridong
  2025-08-18  8:22     ` Chen Ridong
@ 2025-08-20  1:46     ` Tejun Heo
  2025-08-22  1:57       ` Chen Ridong
  1 sibling, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2025-08-20  1:46 UTC (permalink / raw)
  To: Chen Ridong
  Cc: Baokun Li, cgroups, chenridong, gregkh, hannes, linux-kernel,
	lujialin4, mkoutny, peterz, zhouchengming, Yang Erkun

Hello,

On Mon, Aug 18, 2025 at 04:00:08PM +0800, Chen Ridong wrote:
> > A potential solution is to make the lifecycles of cgroup_file_ctx and
> > psi_trigger match the struct kernfs_open_file they're associated with.
> > Maybe we could just get rid of the kernfs_release_file call in
> > kernfs_drain_open_files?
> >
> 
> Hi, Tj, what do you think about this solution?

So, I think it's really fragile for a killed (drained) kernfs_open_file to
be reused after the corresponding @kn is resurrected. Once killed, that file
should stay dead. I think it'd be best if we can do this in a generic manner
rather than trying to fix it only for poll.

kernfs_get_active() is the thing which gates active operations on the file.
Maybe we can add a wrapper, say, kernfs_get_active_of(struct
kernfs_open_file *of) which returns NULL if @of has already been killed or
the underlying @kn can't be activated?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
  2025-08-20  1:46     ` Tejun Heo
@ 2025-08-22  1:57       ` Chen Ridong
  0 siblings, 0 replies; 11+ messages in thread
From: Chen Ridong @ 2025-08-22  1:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Baokun Li, cgroups, chenridong, gregkh, hannes, linux-kernel,
	lujialin4, mkoutny, peterz, zhouchengming, Yang Erkun



On 2025/8/20 9:46, Tejun Heo wrote:
> Hello,
> 
> On Mon, Aug 18, 2025 at 04:00:08PM +0800, Chen Ridong wrote:
>>> A potential solution is to make the lifecycles of cgroup_file_ctx and
>>> psi_trigger match the struct kernfs_open_file they're associated with.
>>> Maybe we could just get rid of the kernfs_release_file call in
>>> kernfs_drain_open_files?
>>>
>>
>> Hi, Tj, what do you think about this solution?
> 
> So, I think it's really fragile for a killed (drained) kernfs_open_file to
> be reused after the corresponding @kn is resurrected. Once killed, that file
> should stay dead. I think it'd be best if we can do this in a generic manner
> rather than trying to fix it only for poll.
> 
> kernfs_get_active() is the thing which gates active operations on the file.
> Maybe we can add a wrapper, say, kernfs_get_active_of(struct
> kernfs_open_file *of) which returns NULL if @of has already been killed or
> the underlying @kn can't be activated?
> 
> Thanks.
> 

Thank you Tj,

This is reasonable, I will try.

-- 
Best regards,
Ridong


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-08-22  1:57 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-15  1:34 [PATCH] kernfs: Fix UAF in PSI polling when open file is released Chen Ridong
2025-08-15  6:11 ` Greg KH
2025-08-15  6:22   ` Chen Ridong
2025-08-15  6:43     ` Greg KH
2025-08-15  7:16       ` Chen Ridong
2025-08-15 14:42   ` Michal Koutný
2025-08-18  7:41     ` Chen Ridong
     [not found] ` <0319ee9b-ce2c-4c02-a731-c538afcf008f@huawei.com>
2025-08-18  8:00   ` Chen Ridong
2025-08-18  8:22     ` Chen Ridong
2025-08-20  1:46     ` Tejun Heo
2025-08-22  1:57       ` Chen Ridong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).