* [PATCH] kernfs: Fix UAF in PSI polling when open file is released @ 2025-08-15 1:34 Chen Ridong 2025-08-15 6:11 ` Greg KH [not found] ` <0319ee9b-ce2c-4c02-a731-c538afcf008f@huawei.com> 0 siblings, 2 replies; 11+ messages in thread From: Chen Ridong @ 2025-08-15 1:34 UTC (permalink / raw) To: gregkh, tj, hannes, mkoutny, peterz, zhouchengming Cc: linux-kernel, cgroups, lujialin4, chenridong From: Chen Ridong <chenridong@huawei.com> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure Stall Information) monitoring mechanism: BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140 Read of size 8 at addr ffff3de3d50bd308 by task systemd/1 psi_trigger_poll+0x3c/0x140 cgroup_pressure_poll+0x70/0xa0 cgroup_file_poll+0x8c/0x100 kernfs_fop_poll+0x11c/0x1c0 ep_item_poll.isra.0+0x188/0x2c0 Allocated by task 1: cgroup_file_open+0x88/0x388 kernfs_fop_open+0x73c/0xaf0 do_dentry_open+0x5fc/0x1200 vfs_open+0xa0/0x3f0 do_open+0x7e8/0xd08 path_openat+0x2fc/0x6b0 do_filp_open+0x174/0x368 Freed by task 8462: cgroup_file_release+0x130/0x1f8 kernfs_drain_open_files+0x17c/0x440 kernfs_drain+0x2dc/0x360 kernfs_show+0x1b8/0x288 cgroup_file_show+0x150/0x268 cgroup_pressure_write+0x1dc/0x340 cgroup_file_write+0x274/0x548 Reproduction Steps: 1. Open test/cpu.pressure and establish epoll monitoring 2. Disable monitoring: echo 0 > test/cgroup.pressure 3. Re-enable monitoring: echo 1 > test/cgroup.pressure The race condition occurs because: 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it: - Releases PSI triggers via cgroup_file_release() - Frees of->priv through kernfs_drain_open_files() 2. While epoll still holds reference to the file and continues polling 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv epolling disable/enable cgroup.pressure fd=open(cpu.pressure) while(1) ... epoll_wait kernfs_fop_poll kernfs_get_active = true echo 0 > cgroup.pressure ... cgroup_file_show kernfs_show // inactive kn kernfs_drain_open_files cft->release(of); kfree(ctx); ... kernfs_get_active = false echo 1 > cgroup.pressure kernfs_show kernfs_activate_one(kn); kernfs_fop_poll kernfs_get_active = true cgroup_file_poll psi_trigger_poll // UAF ... end: close(fd) Fix this by adding released flag check in kernfs_fop_poll(), which is treated as kn is inactive. Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface") Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com> Signed-off-by: Chen Ridong <chenridong@huawei.com> --- fs/kernfs/file.c | 2 +- kernel/cgroup/cgroup.c | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index a6c692cac616..d5d01f0b9392 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait) struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); __poll_t ret; - if (!kernfs_get_active(kn)) + if (of->released || !kernfs_get_active(kn)) return DEFAULT_POLLMASK|EPOLLERR|EPOLLPRI; if (kn->attr.ops->poll) diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 312c6a8b55bb..d8b82afed181 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -4159,6 +4159,7 @@ static void cgroup_file_release(struct kernfs_open_file *of) cft->release(of); put_cgroup_ns(ctx->ns); kfree(ctx); + of->priv = NULL; } static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf, -- 2.34.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released 2025-08-15 1:34 [PATCH] kernfs: Fix UAF in PSI polling when open file is released Chen Ridong @ 2025-08-15 6:11 ` Greg KH 2025-08-15 6:22 ` Chen Ridong 2025-08-15 14:42 ` Michal Koutný [not found] ` <0319ee9b-ce2c-4c02-a731-c538afcf008f@huawei.com> 1 sibling, 2 replies; 11+ messages in thread From: Greg KH @ 2025-08-15 6:11 UTC (permalink / raw) To: Chen Ridong Cc: tj, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote: > From: Chen Ridong <chenridong@huawei.com> > > A use-after-free (UAF) vulnerability was identified in the PSI (Pressure > Stall Information) monitoring mechanism: > > BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140 > Read of size 8 at addr ffff3de3d50bd308 by task systemd/1 > > psi_trigger_poll+0x3c/0x140 > cgroup_pressure_poll+0x70/0xa0 > cgroup_file_poll+0x8c/0x100 > kernfs_fop_poll+0x11c/0x1c0 > ep_item_poll.isra.0+0x188/0x2c0 > > Allocated by task 1: > cgroup_file_open+0x88/0x388 > kernfs_fop_open+0x73c/0xaf0 > do_dentry_open+0x5fc/0x1200 > vfs_open+0xa0/0x3f0 > do_open+0x7e8/0xd08 > path_openat+0x2fc/0x6b0 > do_filp_open+0x174/0x368 > > Freed by task 8462: > cgroup_file_release+0x130/0x1f8 > kernfs_drain_open_files+0x17c/0x440 > kernfs_drain+0x2dc/0x360 > kernfs_show+0x1b8/0x288 > cgroup_file_show+0x150/0x268 > cgroup_pressure_write+0x1dc/0x340 > cgroup_file_write+0x274/0x548 > > Reproduction Steps: > 1. Open test/cpu.pressure and establish epoll monitoring > 2. Disable monitoring: echo 0 > test/cgroup.pressure > 3. Re-enable monitoring: echo 1 > test/cgroup.pressure > > The race condition occurs because: > 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it: > - Releases PSI triggers via cgroup_file_release() > - Frees of->priv through kernfs_drain_open_files() > 2. While epoll still holds reference to the file and continues polling > 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv > > epolling disable/enable cgroup.pressure > fd=open(cpu.pressure) > while(1) > ... > epoll_wait > kernfs_fop_poll > kernfs_get_active = true echo 0 > cgroup.pressure > ... cgroup_file_show > kernfs_show > // inactive kn > kernfs_drain_open_files > cft->release(of); > kfree(ctx); > ... > kernfs_get_active = false > echo 1 > cgroup.pressure > kernfs_show > kernfs_activate_one(kn); > kernfs_fop_poll > kernfs_get_active = true > cgroup_file_poll > psi_trigger_poll > // UAF > ... > end: close(fd) > > Fix this by adding released flag check in kernfs_fop_poll(), which is > treated as kn is inactive. > > Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface") > Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com> > Signed-off-by: Chen Ridong <chenridong@huawei.com> > --- > fs/kernfs/file.c | 2 +- > kernel/cgroup/cgroup.c | 1 + > 2 files changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c > index a6c692cac616..d5d01f0b9392 100644 > --- a/fs/kernfs/file.c > +++ b/fs/kernfs/file.c > @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait) > struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); > __poll_t ret; > > - if (!kernfs_get_active(kn)) > + if (of->released || !kernfs_get_active(kn)) I can see why the cgroup change is needed, but why is this test for released() an issue here? This feels like two different changes/fixes for different problems? Why does testing for released matter at this point in time? thanks, greg k-h ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released 2025-08-15 6:11 ` Greg KH @ 2025-08-15 6:22 ` Chen Ridong 2025-08-15 6:43 ` Greg KH 2025-08-15 14:42 ` Michal Koutný 1 sibling, 1 reply; 11+ messages in thread From: Chen Ridong @ 2025-08-15 6:22 UTC (permalink / raw) To: Greg KH Cc: tj, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong On 2025/8/15 14:11, Greg KH wrote: > On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote: >> From: Chen Ridong <chenridong@huawei.com> >> >> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure >> Stall Information) monitoring mechanism: >> >> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140 >> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1 >> >> psi_trigger_poll+0x3c/0x140 >> cgroup_pressure_poll+0x70/0xa0 >> cgroup_file_poll+0x8c/0x100 >> kernfs_fop_poll+0x11c/0x1c0 >> ep_item_poll.isra.0+0x188/0x2c0 >> >> Allocated by task 1: >> cgroup_file_open+0x88/0x388 >> kernfs_fop_open+0x73c/0xaf0 >> do_dentry_open+0x5fc/0x1200 >> vfs_open+0xa0/0x3f0 >> do_open+0x7e8/0xd08 >> path_openat+0x2fc/0x6b0 >> do_filp_open+0x174/0x368 >> >> Freed by task 8462: >> cgroup_file_release+0x130/0x1f8 >> kernfs_drain_open_files+0x17c/0x440 >> kernfs_drain+0x2dc/0x360 >> kernfs_show+0x1b8/0x288 >> cgroup_file_show+0x150/0x268 >> cgroup_pressure_write+0x1dc/0x340 >> cgroup_file_write+0x274/0x548 >> >> Reproduction Steps: >> 1. Open test/cpu.pressure and establish epoll monitoring >> 2. Disable monitoring: echo 0 > test/cgroup.pressure >> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure >> >> The race condition occurs because: >> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it: >> - Releases PSI triggers via cgroup_file_release() >> - Frees of->priv through kernfs_drain_open_files() >> 2. While epoll still holds reference to the file and continues polling >> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv >> >> epolling disable/enable cgroup.pressure >> fd=open(cpu.pressure) >> while(1) >> ... >> epoll_wait >> kernfs_fop_poll >> kernfs_get_active = true echo 0 > cgroup.pressure >> ... cgroup_file_show >> kernfs_show >> // inactive kn >> kernfs_drain_open_files >> cft->release(of); >> kfree(ctx); >> ... >> kernfs_get_active = false >> echo 1 > cgroup.pressure >> kernfs_show >> kernfs_activate_one(kn); >> kernfs_fop_poll >> kernfs_get_active = true >> cgroup_file_poll >> psi_trigger_poll >> // UAF >> ... >> end: close(fd) >> >> Fix this by adding released flag check in kernfs_fop_poll(), which is >> treated as kn is inactive. >> >> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface") >> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com> >> Signed-off-by: Chen Ridong <chenridong@huawei.com> >> --- >> fs/kernfs/file.c | 2 +- >> kernel/cgroup/cgroup.c | 1 + >> 2 files changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c >> index a6c692cac616..d5d01f0b9392 100644 >> --- a/fs/kernfs/file.c >> +++ b/fs/kernfs/file.c >> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait) >> struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); >> __poll_t ret; >> >> - if (!kernfs_get_active(kn)) >> + if (of->released || !kernfs_get_active(kn)) > > I can see why the cgroup change is needed, but why is this test for > released() an issue here? This feels like two different changes/fixes > for different problems? Why does testing for released matter at this > point in time? > > thanks, > > greg k-h Thank you for your feedback. The cgroup changes can prevent the UAF (Use-After-Free) issue, but they will introduce a NULL pointer access problem. If the open file is properly drained(released), it can not safely invoke the poll callback again. Otherwise, it may still lead to either UAF or NULL pointer issues Are you suggesting I should split this into two separate patches? -- Best regards, Ridong ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released 2025-08-15 6:22 ` Chen Ridong @ 2025-08-15 6:43 ` Greg KH 2025-08-15 7:16 ` Chen Ridong 0 siblings, 1 reply; 11+ messages in thread From: Greg KH @ 2025-08-15 6:43 UTC (permalink / raw) To: Chen Ridong Cc: tj, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong On Fri, Aug 15, 2025 at 02:22:54PM +0800, Chen Ridong wrote: > > > On 2025/8/15 14:11, Greg KH wrote: > > On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote: > >> From: Chen Ridong <chenridong@huawei.com> > >> > >> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure > >> Stall Information) monitoring mechanism: > >> > >> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140 > >> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1 > >> > >> psi_trigger_poll+0x3c/0x140 > >> cgroup_pressure_poll+0x70/0xa0 > >> cgroup_file_poll+0x8c/0x100 > >> kernfs_fop_poll+0x11c/0x1c0 > >> ep_item_poll.isra.0+0x188/0x2c0 > >> > >> Allocated by task 1: > >> cgroup_file_open+0x88/0x388 > >> kernfs_fop_open+0x73c/0xaf0 > >> do_dentry_open+0x5fc/0x1200 > >> vfs_open+0xa0/0x3f0 > >> do_open+0x7e8/0xd08 > >> path_openat+0x2fc/0x6b0 > >> do_filp_open+0x174/0x368 > >> > >> Freed by task 8462: > >> cgroup_file_release+0x130/0x1f8 > >> kernfs_drain_open_files+0x17c/0x440 > >> kernfs_drain+0x2dc/0x360 > >> kernfs_show+0x1b8/0x288 > >> cgroup_file_show+0x150/0x268 > >> cgroup_pressure_write+0x1dc/0x340 > >> cgroup_file_write+0x274/0x548 > >> > >> Reproduction Steps: > >> 1. Open test/cpu.pressure and establish epoll monitoring > >> 2. Disable monitoring: echo 0 > test/cgroup.pressure > >> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure > >> > >> The race condition occurs because: > >> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it: > >> - Releases PSI triggers via cgroup_file_release() > >> - Frees of->priv through kernfs_drain_open_files() > >> 2. While epoll still holds reference to the file and continues polling > >> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv > >> > >> epolling disable/enable cgroup.pressure > >> fd=open(cpu.pressure) > >> while(1) > >> ... > >> epoll_wait > >> kernfs_fop_poll > >> kernfs_get_active = true echo 0 > cgroup.pressure > >> ... cgroup_file_show > >> kernfs_show > >> // inactive kn > >> kernfs_drain_open_files > >> cft->release(of); > >> kfree(ctx); > >> ... > >> kernfs_get_active = false > >> echo 1 > cgroup.pressure > >> kernfs_show > >> kernfs_activate_one(kn); > >> kernfs_fop_poll > >> kernfs_get_active = true > >> cgroup_file_poll > >> psi_trigger_poll > >> // UAF > >> ... > >> end: close(fd) > >> > >> Fix this by adding released flag check in kernfs_fop_poll(), which is > >> treated as kn is inactive. > >> > >> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface") > >> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com> > >> Signed-off-by: Chen Ridong <chenridong@huawei.com> > >> --- > >> fs/kernfs/file.c | 2 +- > >> kernel/cgroup/cgroup.c | 1 + > >> 2 files changed, 2 insertions(+), 1 deletion(-) > >> > >> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c > >> index a6c692cac616..d5d01f0b9392 100644 > >> --- a/fs/kernfs/file.c > >> +++ b/fs/kernfs/file.c > >> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait) > >> struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); > >> __poll_t ret; > >> > >> - if (!kernfs_get_active(kn)) > >> + if (of->released || !kernfs_get_active(kn)) > > > > I can see why the cgroup change is needed, but why is this test for > > released() an issue here? This feels like two different changes/fixes > > for different problems? Why does testing for released matter at this > > point in time? > > > > thanks, > > > > greg k-h > > Thank you for your feedback. > > The cgroup changes can prevent the UAF (Use-After-Free) issue, but they will introduce a NULL > pointer access problem. Where will the null pointer access happen? kernfs_fop_poll() isn't caring about the released file, AND you need to properly lock things when testing that value (hint, what if it changes right after you tested it?) > If the open file is properly drained(released), it can not safely invoke the poll callback again. > Otherwise, it may still lead to either UAF or NULL pointer issues > > Are you suggesting I should split this into two separate patches? This feels like two separate issues for two different things. The PSI change didn't cause the kernfs change to be required right? thanks, greg k-h ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released 2025-08-15 6:43 ` Greg KH @ 2025-08-15 7:16 ` Chen Ridong 0 siblings, 0 replies; 11+ messages in thread From: Chen Ridong @ 2025-08-15 7:16 UTC (permalink / raw) To: Greg KH Cc: tj, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong On 2025/8/15 14:43, Greg KH wrote: > On Fri, Aug 15, 2025 at 02:22:54PM +0800, Chen Ridong wrote: >> >> >> On 2025/8/15 14:11, Greg KH wrote: >>> On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote: >>>> From: Chen Ridong <chenridong@huawei.com> >>>> >>>> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure >>>> Stall Information) monitoring mechanism: >>>> >>>> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140 >>>> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1 >>>> >>>> psi_trigger_poll+0x3c/0x140 >>>> cgroup_pressure_poll+0x70/0xa0 >>>> cgroup_file_poll+0x8c/0x100 >>>> kernfs_fop_poll+0x11c/0x1c0 >>>> ep_item_poll.isra.0+0x188/0x2c0 >>>> >>>> Allocated by task 1: >>>> cgroup_file_open+0x88/0x388 >>>> kernfs_fop_open+0x73c/0xaf0 >>>> do_dentry_open+0x5fc/0x1200 >>>> vfs_open+0xa0/0x3f0 >>>> do_open+0x7e8/0xd08 >>>> path_openat+0x2fc/0x6b0 >>>> do_filp_open+0x174/0x368 >>>> >>>> Freed by task 8462: >>>> cgroup_file_release+0x130/0x1f8 >>>> kernfs_drain_open_files+0x17c/0x440 >>>> kernfs_drain+0x2dc/0x360 >>>> kernfs_show+0x1b8/0x288 >>>> cgroup_file_show+0x150/0x268 >>>> cgroup_pressure_write+0x1dc/0x340 >>>> cgroup_file_write+0x274/0x548 >>>> >>>> Reproduction Steps: >>>> 1. Open test/cpu.pressure and establish epoll monitoring >>>> 2. Disable monitoring: echo 0 > test/cgroup.pressure >>>> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure >>>> >>>> The race condition occurs because: >>>> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it: >>>> - Releases PSI triggers via cgroup_file_release() >>>> - Frees of->priv through kernfs_drain_open_files() >>>> 2. While epoll still holds reference to the file and continues polling >>>> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv >>>> >>>> epolling disable/enable cgroup.pressure >>>> fd=open(cpu.pressure) >>>> while(1) >>>> ... >>>> epoll_wait >>>> kernfs_fop_poll >>>> kernfs_get_active = true echo 0 > cgroup.pressure >>>> ... cgroup_file_show >>>> kernfs_show >>>> // inactive kn >>>> kernfs_drain_open_files >>>> cft->release(of); >>>> kfree(ctx); >>>> ... >>>> kernfs_get_active = false >>>> echo 1 > cgroup.pressure >>>> kernfs_show >>>> kernfs_activate_one(kn); >>>> kernfs_fop_poll >>>> kernfs_get_active = true >>>> cgroup_file_poll >>>> psi_trigger_poll >>>> // UAF >>>> ... >>>> end: close(fd) >>>> >>>> Fix this by adding released flag check in kernfs_fop_poll(), which is >>>> treated as kn is inactive. >>>> >>>> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface") >>>> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com> >>>> Signed-off-by: Chen Ridong <chenridong@huawei.com> >>>> --- >>>> fs/kernfs/file.c | 2 +- >>>> kernel/cgroup/cgroup.c | 1 + >>>> 2 files changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c >>>> index a6c692cac616..d5d01f0b9392 100644 >>>> --- a/fs/kernfs/file.c >>>> +++ b/fs/kernfs/file.c >>>> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait) >>>> struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); >>>> __poll_t ret; >>>> >>>> - if (!kernfs_get_active(kn)) >>>> + if (of->released || !kernfs_get_active(kn)) >>> >>> I can see why the cgroup change is needed, but why is this test for >>> released() an issue here? This feels like two different changes/fixes >>> for different problems? Why does testing for released matter at this >>> point in time? >>> >>> thanks, >>> >>> greg k-h >> >> Thank you for your feedback. >> >> The cgroup changes can prevent the UAF (Use-After-Free) issue, but they will introduce a NULL >> pointer access problem. > > Where will the null pointer access happen? kernfs_fop_poll() isn't > caring about the released file, AND you need to properly lock things > when testing that value (hint, what if it changes right after you tested > it?) > Issue occurs in psi_trigger_poll() during this sequence: fd = open("cpu.pressure") <...> // cgroup.pressure disabled then re-enabled of->released flag gets set to true kernfs_fop_poll() └─ kernfs_get_active() returns true (due to re-enable) <<<-- kernfs changes, check of->released └─ kn->attr.ops->poll() └─ cgroup_file_poll() └─ *ctx = of->priv; // Already released by .release()(cgroup_file_release) └─ psi_trigger_poll() └─ smp_load_acquire(trigger_ptr); // UAF: trigger_ptr == of->priv // NULL dereference after of->priv = NULL patch >> If the open file is properly drained(released), it can not safely invoke the poll callback again. >> Otherwise, it may still lead to either UAF or NULL pointer issues >> >> Are you suggesting I should split this into two separate patches? > > This feels like two separate issues for two different things. The PSI > change didn't cause the kernfs change to be required right? > > thanks, > > greg k-h Correct. The cgroup modifications make the issue more easily observable, while the kernfs changes provide the actual fix. -- Best regards, Ridong ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released 2025-08-15 6:11 ` Greg KH 2025-08-15 6:22 ` Chen Ridong @ 2025-08-15 14:42 ` Michal Koutný 2025-08-18 7:41 ` Chen Ridong 1 sibling, 1 reply; 11+ messages in thread From: Michal Koutný @ 2025-08-15 14:42 UTC (permalink / raw) To: Greg KH Cc: Chen Ridong, tj, hannes, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong [-- Attachment #1: Type: text/plain, Size: 1390 bytes --] On Fri, Aug 15, 2025 at 08:11:39AM +0200, Greg KH <gregkh@linuxfoundation.org> wrote: > > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c > > index a6c692cac616..d5d01f0b9392 100644 > > --- a/fs/kernfs/file.c > > +++ b/fs/kernfs/file.c > > @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait) > > struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); > > __poll_t ret; > > > > - if (!kernfs_get_active(kn)) > > + if (of->released || !kernfs_get_active(kn)) > > I can see why the cgroup change is needed, I don't see it that much. of->priv isn't checked in cgroup code anywhere so it isn't helpful zeroing. As Ridong writes it may trade UaF for NULL pointer deref :-/ (Additionally, same zeroing would be needed in error path in cgroup_file_open().) I _think_ the place to cleanup would be in @@ -3978,6 +3978,8 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, psi->enabled = enable; if (enable) psi_cgroup_restart(psi); + else + psi_trigger_destroy(???); } cgroup_kn_unlock(of->kn); The issue is that cgroup_pressure_write doesn't know all possible triggers to be cancelled. (The fix with of->released would only sanitize effect but not the cause IMO.) HTH, Michal [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released 2025-08-15 14:42 ` Michal Koutný @ 2025-08-18 7:41 ` Chen Ridong 0 siblings, 0 replies; 11+ messages in thread From: Chen Ridong @ 2025-08-18 7:41 UTC (permalink / raw) To: Michal Koutný, Greg KH Cc: tj, hannes, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong On 2025/8/15 22:42, Michal Koutný wrote: > On Fri, Aug 15, 2025 at 08:11:39AM +0200, Greg KH <gregkh@linuxfoundation.org> wrote: >>> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c >>> index a6c692cac616..d5d01f0b9392 100644 >>> --- a/fs/kernfs/file.c >>> +++ b/fs/kernfs/file.c >>> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait) >>> struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); >>> __poll_t ret; >>> >>> - if (!kernfs_get_active(kn)) >>> + if (of->released || !kernfs_get_active(kn)) >> >> I can see why the cgroup change is needed, > > I don't see it that much. of->priv isn't checked in cgroup code anywhere > so it isn't helpful zeroing. As Ridong writes it may trade UaF for NULL > pointer deref :-/ (Additionally, same zeroing would be needed in error > path in cgroup_file_open().) > Thank you, Michal, I believe assigning NULL to of->priv should be harmless. This change would make the bug more observable in practice. Without this explicit NULL assignment, the use-after-free (UAF) issue might remain hidden in some cases, particularly when KASAN is disabled. > I _think_ the place to cleanup would be in > @@ -3978,6 +3978,8 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, > psi->enabled = enable; > if (enable) > psi_cgroup_restart(psi); > + else > + psi_trigger_destroy(???); > } > Could you please provide more details about this modification? Do you mean we need to consider additional cleanup work when disabling cgroup.pressure? The psi_trigger_destroy is invoked as follows: cgroup_file_show kernfs_drain kernfs_drain_open_files kernfs_release_file cgroup_file_release cft->release(of); cgroup_pressure_release psi_trigger_destroy > cgroup_kn_unlock(of->kn); > > The issue is that cgroup_pressure_write doesn't know all possible > triggers to be cancelled. (The fix with of->released would only > sanitize effect but not the cause IMO.) > > HTH, > Michal -- Best regards, Ridong ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <0319ee9b-ce2c-4c02-a731-c538afcf008f@huawei.com>]
* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released [not found] ` <0319ee9b-ce2c-4c02-a731-c538afcf008f@huawei.com> @ 2025-08-18 8:00 ` Chen Ridong 2025-08-18 8:22 ` Chen Ridong 2025-08-20 1:46 ` Tejun Heo 0 siblings, 2 replies; 11+ messages in thread From: Chen Ridong @ 2025-08-18 8:00 UTC (permalink / raw) To: Baokun Li Cc: cgroups, chenridong, gregkh, hannes, linux-kernel, lujialin4, mkoutny, peterz, tj, zhouchengming, Yang Erkun On 2025/8/16 17:53, Baokun Li wrote: >> From: Chen Ridong <chenridong@huawei.com> >> >> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure >> Stall Information) monitoring mechanism: >> >> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140 >> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1 >> >> psi_trigger_poll+0x3c/0x140 >> cgroup_pressure_poll+0x70/0xa0 >> cgroup_file_poll+0x8c/0x100 >> kernfs_fop_poll+0x11c/0x1c0 >> ep_item_poll.isra.0+0x188/0x2c0 >> >> Allocated by task 1: >> cgroup_file_open+0x88/0x388 >> kernfs_fop_open+0x73c/0xaf0 >> do_dentry_open+0x5fc/0x1200 >> vfs_open+0xa0/0x3f0 >> do_open+0x7e8/0xd08 >> path_openat+0x2fc/0x6b0 >> do_filp_open+0x174/0x368 >> >> Freed by task 8462: >> cgroup_file_release+0x130/0x1f8 >> kernfs_drain_open_files+0x17c/0x440 >> kernfs_drain+0x2dc/0x360 >> kernfs_show+0x1b8/0x288 >> cgroup_file_show+0x150/0x268 >> cgroup_pressure_write+0x1dc/0x340 >> cgroup_file_write+0x274/0x548 >> >> Reproduction Steps: >> 1. Open test/cpu.pressure and establish epoll monitoring >> 2. Disable monitoring: echo 0 > test/cgroup.pressure >> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure >> >> The race condition occurs because: >> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it: >> - Releases PSI triggers via cgroup_file_release() >> - Frees of->priv through kernfs_drain_open_files() >> 2. While epoll still holds reference to the file and continues polling >> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv >> >> epolling disable/enable cgroup.pressure >> fd=open(cpu.pressure) >> while(1) >> ... >> epoll_wait >> kernfs_fop_poll >> kernfs_get_active = true echo 0 > cgroup.pressure >> ... cgroup_file_show >> kernfs_show >> // inactive kn >> kernfs_drain_open_files >> cft->release(of); >> kfree(ctx); >> ... >> kernfs_get_active = false >> echo 1 > cgroup.pressure >> kernfs_show >> kernfs_activate_one(kn); >> kernfs_fop_poll >> kernfs_get_active = true >> cgroup_file_poll >> psi_trigger_poll >> // UAF >> ... >> end: close(fd) Thank you, Baokun. > I think the problem is that kernfs_show() handles enable and disable > inconsistently. When disable is called, it sets kn->active and then frees > cgroup_file_ctx and psi_trigger. But when enable is called, it only sets > kn->active. This mismatch means we can end up accessing the freed > cgroup_file_ctx and psi_trigger later on. > I agree with that. > A potential solution is to make the lifecycles of cgroup_file_ctx and > psi_trigger match the struct kernfs_open_file they're associated with. > Maybe we could just get rid of the kernfs_release_file call in > kernfs_drain_open_files? > Hi, Tj, what do you think about this solution? > That way, the resources would be safely released only when the file > descriptor is actually freed. Plus, if cgroup.pressure is re-enabled, > any open file descriptors would still work as expected. > > > Cheers, > Baokun > -- Best regards, Ridong ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released 2025-08-18 8:00 ` Chen Ridong @ 2025-08-18 8:22 ` Chen Ridong 2025-08-20 1:46 ` Tejun Heo 1 sibling, 0 replies; 11+ messages in thread From: Chen Ridong @ 2025-08-18 8:22 UTC (permalink / raw) To: Baokun Li Cc: cgroups, chenridong, gregkh, hannes, linux-kernel, lujialin4, mkoutny, peterz, tj, zhouchengming, Yang Erkun On 2025/8/18 16:00, Chen Ridong wrote: > > > On 2025/8/16 17:53, Baokun Li wrote: >>> From: Chen Ridong <chenridong@huawei.com> >>> >>> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure >>> Stall Information) monitoring mechanism: >>> >>> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140 >>> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1 >>> >>> psi_trigger_poll+0x3c/0x140 >>> cgroup_pressure_poll+0x70/0xa0 >>> cgroup_file_poll+0x8c/0x100 >>> kernfs_fop_poll+0x11c/0x1c0 >>> ep_item_poll.isra.0+0x188/0x2c0 >>> >>> Allocated by task 1: >>> cgroup_file_open+0x88/0x388 >>> kernfs_fop_open+0x73c/0xaf0 >>> do_dentry_open+0x5fc/0x1200 >>> vfs_open+0xa0/0x3f0 >>> do_open+0x7e8/0xd08 >>> path_openat+0x2fc/0x6b0 >>> do_filp_open+0x174/0x368 >>> >>> Freed by task 8462: >>> cgroup_file_release+0x130/0x1f8 >>> kernfs_drain_open_files+0x17c/0x440 >>> kernfs_drain+0x2dc/0x360 >>> kernfs_show+0x1b8/0x288 >>> cgroup_file_show+0x150/0x268 >>> cgroup_pressure_write+0x1dc/0x340 >>> cgroup_file_write+0x274/0x548 >>> >>> Reproduction Steps: >>> 1. Open test/cpu.pressure and establish epoll monitoring >>> 2. Disable monitoring: echo 0 > test/cgroup.pressure >>> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure >>> >>> The race condition occurs because: >>> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it: >>> - Releases PSI triggers via cgroup_file_release() >>> - Frees of->priv through kernfs_drain_open_files() >>> 2. While epoll still holds reference to the file and continues polling >>> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv >>> >>> epolling disable/enable cgroup.pressure >>> fd=open(cpu.pressure) >>> while(1) >>> ... >>> epoll_wait >>> kernfs_fop_poll >>> kernfs_get_active = true echo 0 > cgroup.pressure >>> ... cgroup_file_show >>> kernfs_show >>> // inactive kn >>> kernfs_drain_open_files >>> cft->release(of); >>> kfree(ctx); >>> ... >>> kernfs_get_active = false >>> echo 1 > cgroup.pressure >>> kernfs_show >>> kernfs_activate_one(kn); >>> kernfs_fop_poll >>> kernfs_get_active = true >>> cgroup_file_poll >>> psi_trigger_poll >>> // UAF >>> ... >>> end: close(fd) > > Thank you, Baokun. > >> I think the problem is that kernfs_show() handles enable and disable >> inconsistently. When disable is called, it sets kn->active and then frees >> cgroup_file_ctx and psi_trigger. But when enable is called, it only sets >> kn->active. This mismatch means we can end up accessing the freed >> cgroup_file_ctx and psi_trigger later on. >> > > I agree with that. > >> A potential solution is to make the lifecycles of cgroup_file_ctx and >> psi_trigger match the struct kernfs_open_file they're associated with. >> Maybe we could just get rid of the kernfs_release_file call in >> kernfs_drain_open_files? >> > > Hi, Tj, what do you think about this solution? > Or we should add KERNFS_HIDDEN check? --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -812,7 +812,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn) on->nr_mmapped--; } - if (kn->flags & KERNFS_HAS_RELEASE) + if (!kn->flags & KERNFS_HIDDEN && kn->flags & KERNFS_HAS_RELEASE) kernfs_release_file(kn, of); } -- Best regards, Ridong ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released 2025-08-18 8:00 ` Chen Ridong 2025-08-18 8:22 ` Chen Ridong @ 2025-08-20 1:46 ` Tejun Heo 2025-08-22 1:57 ` Chen Ridong 1 sibling, 1 reply; 11+ messages in thread From: Tejun Heo @ 2025-08-20 1:46 UTC (permalink / raw) To: Chen Ridong Cc: Baokun Li, cgroups, chenridong, gregkh, hannes, linux-kernel, lujialin4, mkoutny, peterz, zhouchengming, Yang Erkun Hello, On Mon, Aug 18, 2025 at 04:00:08PM +0800, Chen Ridong wrote: > > A potential solution is to make the lifecycles of cgroup_file_ctx and > > psi_trigger match the struct kernfs_open_file they're associated with. > > Maybe we could just get rid of the kernfs_release_file call in > > kernfs_drain_open_files? > > > > Hi, Tj, what do you think about this solution? So, I think it's really fragile for a killed (drained) kernfs_open_file to be reused after the corresponding @kn is resurrected. Once killed, that file should stay dead. I think it'd be best if we can do this in a generic manner rather than trying to fix it only for poll. kernfs_get_active() is the thing which gates active operations on the file. Maybe we can add a wrapper, say, kernfs_get_active_of(struct kernfs_open_file *of) which returns NULL if @of has already been killed or the underlying @kn can't be activated? Thanks. -- tejun ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released 2025-08-20 1:46 ` Tejun Heo @ 2025-08-22 1:57 ` Chen Ridong 0 siblings, 0 replies; 11+ messages in thread From: Chen Ridong @ 2025-08-22 1:57 UTC (permalink / raw) To: Tejun Heo Cc: Baokun Li, cgroups, chenridong, gregkh, hannes, linux-kernel, lujialin4, mkoutny, peterz, zhouchengming, Yang Erkun On 2025/8/20 9:46, Tejun Heo wrote: > Hello, > > On Mon, Aug 18, 2025 at 04:00:08PM +0800, Chen Ridong wrote: >>> A potential solution is to make the lifecycles of cgroup_file_ctx and >>> psi_trigger match the struct kernfs_open_file they're associated with. >>> Maybe we could just get rid of the kernfs_release_file call in >>> kernfs_drain_open_files? >>> >> >> Hi, Tj, what do you think about this solution? > > So, I think it's really fragile for a killed (drained) kernfs_open_file to > be reused after the corresponding @kn is resurrected. Once killed, that file > should stay dead. I think it'd be best if we can do this in a generic manner > rather than trying to fix it only for poll. > > kernfs_get_active() is the thing which gates active operations on the file. > Maybe we can add a wrapper, say, kernfs_get_active_of(struct > kernfs_open_file *of) which returns NULL if @of has already been killed or > the underlying @kn can't be activated? > > Thanks. > Thank you Tj, This is reasonable, I will try. -- Best regards, Ridong ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-08-22 1:57 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-08-15 1:34 [PATCH] kernfs: Fix UAF in PSI polling when open file is released Chen Ridong 2025-08-15 6:11 ` Greg KH 2025-08-15 6:22 ` Chen Ridong 2025-08-15 6:43 ` Greg KH 2025-08-15 7:16 ` Chen Ridong 2025-08-15 14:42 ` Michal Koutný 2025-08-18 7:41 ` Chen Ridong [not found] ` <0319ee9b-ce2c-4c02-a731-c538afcf008f@huawei.com> 2025-08-18 8:00 ` Chen Ridong 2025-08-18 8:22 ` Chen Ridong 2025-08-20 1:46 ` Tejun Heo 2025-08-22 1:57 ` Chen Ridong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).