* [PATCH v2 0/2] kernfs: Fix UAF in PSI polling when open file is released
From: Chen Ridong @ 2025-08-22 7:07 UTC
To: gregkh, tj, hannes, mkoutny, peterz, zhouchengming
Cc: linux-kernel, cgroups, lujialin4, chenridong, libaokun1

From: Chen Ridong <chenridong@huawei.com>

This series fixes a use-after-free issue triggered when disabling and
enabling cgroup PSI.

---
v2:
- Add kernfs_get_active_of() to safely acquire active references for
  kernfs open files suggested by Tj.
- Split into two patches as suggested by Greg.

Chen Ridong (2):
  kernfs: Fix UAF in polling when open file is released
  cgroup/psi: Set of->priv to NULL upon file release

 fs/kernfs/file.c       | 58 +++++++++++++++++++++++++++---------------
 kernel/cgroup/cgroup.c |  1 +
 2 files changed, 39 insertions(+), 20 deletions(-)

-- 
2.34.1

* [PATCH v2 1/2] kernfs: Fix UAF in polling when open file is released
From: Chen Ridong @ 2025-08-22 7:07 UTC
To: gregkh, tj, hannes, mkoutny, peterz, zhouchengming
Cc: linux-kernel, cgroups, lujialin4, chenridong, libaokun1

From: Chen Ridong <chenridong@huawei.com>

A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
Stall Information) monitoring mechanism:

BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
Read of size 8 at addr ffff3de3d50bd308 by task systemd/1

 psi_trigger_poll+0x3c/0x140
 cgroup_pressure_poll+0x70/0xa0
 cgroup_file_poll+0x8c/0x100
 kernfs_fop_poll+0x11c/0x1c0
 ep_item_poll.isra.0+0x188/0x2c0

Allocated by task 1:
 cgroup_file_open+0x88/0x388
 kernfs_fop_open+0x73c/0xaf0
 do_dentry_open+0x5fc/0x1200
 vfs_open+0xa0/0x3f0
 do_open+0x7e8/0xd08
 path_openat+0x2fc/0x6b0
 do_filp_open+0x174/0x368

Freed by task 8462:
 cgroup_file_release+0x130/0x1f8
 kernfs_drain_open_files+0x17c/0x440
 kernfs_drain+0x2dc/0x360
 kernfs_show+0x1b8/0x288
 cgroup_file_show+0x150/0x268
 cgroup_pressure_write+0x1dc/0x340
 cgroup_file_write+0x274/0x548

Reproduction Steps:
1. Open test/cpu.pressure and establish epoll monitoring
2. Disable monitoring: echo 0 > test/cgroup.pressure
3. Re-enable monitoring: echo 1 > test/cgroup.pressure

The race condition occurs because:
1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
   - Releases PSI triggers via cgroup_file_release()
   - Frees of->priv through kernfs_drain_open_files()
2. While epoll still holds reference to the file and continues polling
3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv

epolling                        disable/enable cgroup.pressure
fd=open(cpu.pressure)
while(1)
...
epoll_wait
kernfs_fop_poll
kernfs_get_active = true        echo 0 > cgroup.pressure
...                             cgroup_file_show
                                kernfs_show
                                // inactive kn
                                kernfs_drain_open_files
                                cft->release(of);
                                kfree(ctx);
                                ...
kernfs_get_active = false
                                echo 1 > cgroup.pressure
                                kernfs_show
                                kernfs_activate_one(kn);
kernfs_fop_poll
kernfs_get_active = true
cgroup_file_poll
psi_trigger_poll
// UAF
...
end: close(fd)

To address this issue, introduce kernfs_get_active_of() for kernfs open
files to obtain active references. This function will fail if the open file
has been released. Replace kernfs_get_active() with kernfs_get_active_of()
to prevent further operations on released file descriptors.

Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
Reported-by: Zhang Zhaotian <zhangzhaotian@huawei.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 fs/kernfs/file.c | 58 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 20 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index a6c692cac616..9adf36e6364b 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -70,6 +70,24 @@ static struct kernfs_open_node *of_on(struct kernfs_open_file *of)
 					 !list_empty(&of->list));
 }
 
+/* Get active reference to kernfs node for an open file */
+static struct kernfs_open_file *kernfs_get_active_of(struct kernfs_open_file *of)
+{
+	/* Skip if file was already released */
+	if (unlikely(of->released))
+		return NULL;
+
+	if (!kernfs_get_active(of->kn))
+		return NULL;
+
+	return of;
+}
+
+static void kernfs_put_active_of(struct kernfs_open_file *of)
+{
+	return kernfs_put_active(of->kn);
+}
+
 /**
  * kernfs_deref_open_node_locked - Get kernfs_open_node corresponding to @kn
  *
@@ -139,7 +157,7 @@ static void kernfs_seq_stop_active(struct seq_file *sf, void *v)
 
 	if (ops->seq_stop)
 		ops->seq_stop(sf, v);
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 }
 
 static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos)
@@ -152,7 +170,7 @@ static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos)
 	 * the ops aren't called concurrently for the same open file.
 	 */
 	mutex_lock(&of->mutex);
-	if (!kernfs_get_active(of->kn))
+	if (!kernfs_get_active_of(of))
 		return ERR_PTR(-ENODEV);
 
 	ops = kernfs_ops(of->kn);
@@ -238,7 +256,7 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	 * the ops aren't called concurrently for the same open file.
 	 */
 	mutex_lock(&of->mutex);
-	if (!kernfs_get_active(of->kn)) {
+	if (!kernfs_get_active_of(of)) {
 		len = -ENODEV;
 		mutex_unlock(&of->mutex);
 		goto out_free;
@@ -252,7 +270,7 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	else
 		len = -EINVAL;
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 	mutex_unlock(&of->mutex);
 
 	if (len < 0)
@@ -323,7 +341,7 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	 * the ops aren't called concurrently for the same open file.
 	 */
 	mutex_lock(&of->mutex);
-	if (!kernfs_get_active(of->kn)) {
+	if (!kernfs_get_active_of(of)) {
 		mutex_unlock(&of->mutex);
 		len = -ENODEV;
 		goto out_free;
@@ -335,7 +353,7 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	else
 		len = -EINVAL;
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 	mutex_unlock(&of->mutex);
 
 	if (len > 0)
@@ -357,13 +375,13 @@ static void kernfs_vma_open(struct vm_area_struct *vma)
 	if (!of->vm_ops)
 		return;
 
-	if (!kernfs_get_active(of->kn))
+	if (!kernfs_get_active_of(of))
 		return;
 
 	if (of->vm_ops->open)
 		of->vm_ops->open(vma);
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 }
 
 static vm_fault_t kernfs_vma_fault(struct vm_fault *vmf)
@@ -375,14 +393,14 @@ static vm_fault_t kernfs_vma_fault(struct vm_fault *vmf)
 	if (!of->vm_ops)
 		return VM_FAULT_SIGBUS;
 
-	if (!kernfs_get_active(of->kn))
+	if (!kernfs_get_active_of(of))
 		return VM_FAULT_SIGBUS;
 
 	ret = VM_FAULT_SIGBUS;
 	if (of->vm_ops->fault)
 		ret = of->vm_ops->fault(vmf);
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 	return ret;
 }
 
@@ -395,7 +413,7 @@ static vm_fault_t kernfs_vma_page_mkwrite(struct vm_fault *vmf)
 	if (!of->vm_ops)
 		return VM_FAULT_SIGBUS;
 
-	if (!kernfs_get_active(of->kn))
+	if (!kernfs_get_active_of(of))
 		return VM_FAULT_SIGBUS;
 
 	ret = 0;
@@ -404,7 +422,7 @@ static vm_fault_t kernfs_vma_page_mkwrite(struct vm_fault *vmf)
 	else
 		file_update_time(file);
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 	return ret;
 }
 
@@ -418,14 +436,14 @@ static int kernfs_vma_access(struct vm_area_struct *vma, unsigned long addr,
 	if (!of->vm_ops)
 		return -EINVAL;
 
-	if (!kernfs_get_active(of->kn))
+	if (!kernfs_get_active_of(of))
 		return -EINVAL;
 
 	ret = -EINVAL;
 	if (of->vm_ops->access)
 		ret = of->vm_ops->access(vma, addr, buf, len, write);
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 	return ret;
 }
 
@@ -455,7 +473,7 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
 	mutex_lock(&of->mutex);
 
 	rc = -ENODEV;
-	if (!kernfs_get_active(of->kn))
+	if (!kernfs_get_active_of(of))
 		goto out_unlock;
 
 	ops = kernfs_ops(of->kn);
@@ -490,7 +508,7 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
 	}
 	vma->vm_ops = &kernfs_vm_ops;
 out_put:
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 out_unlock:
 	mutex_unlock(&of->mutex);
 
@@ -852,7 +870,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
 	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
 	__poll_t ret;
 
-	if (!kernfs_get_active(kn))
+	if (!kernfs_get_active_of(of))
 		return DEFAULT_POLLMASK|EPOLLERR|EPOLLPRI;
 
 	if (kn->attr.ops->poll)
@@ -860,7 +878,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
 	else
 		ret = kernfs_generic_poll(of, wait);
 
-	kernfs_put_active(kn);
+	kernfs_put_active_of(of);
 	return ret;
 }
 
@@ -875,7 +893,7 @@ static loff_t kernfs_fop_llseek(struct file *file, loff_t offset, int whence)
 	 * the ops aren't called concurrently for the same open file.
 	 */
 	mutex_lock(&of->mutex);
-	if (!kernfs_get_active(of->kn)) {
+	if (!kernfs_get_active_of(of)) {
 		mutex_unlock(&of->mutex);
 		return -ENODEV;
 	}
@@ -886,7 +904,7 @@ static loff_t kernfs_fop_llseek(struct file *file, loff_t offset, int whence)
 	else
 		ret = generic_file_llseek(file, offset, whence);
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 	mutex_unlock(&of->mutex);
 	return ret;
 }
-- 
2.34.1

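The effect of the kernfs_get_active_of() guard in the patch above can be modeled in user space roughly as follows. This is only a minimal sketch under stated assumptions, not kernel code: every model_* name and both structs are hypothetical stand-ins invented for illustration, and locking, refcount atomics and the real drain logic are left out. It shows that a check on the node alone succeeds again once the node is re-activated, while a check that also looks at the per-open released flag keeps failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct kernfs_node. */
struct model_node {
	bool active;      /* false while the node is hidden/deactivated */
	int active_refs;  /* number of active references currently held */
};

/* Hypothetical user-space stand-in for struct kernfs_open_file. */
struct model_open_file {
	struct model_node *kn;
	bool released;    /* set once the release callback has run */
};

/* Models kernfs_get_active(): succeeds only while the node is active. */
static bool model_get_active(struct model_node *kn)
{
	if (!kn->active)
		return false;
	kn->active_refs++;
	return true;
}

static void model_put_active(struct model_node *kn)
{
	kn->active_refs--;
}

/* Models kernfs_get_active_of(): additionally refuses released open files. */
static struct model_open_file *model_get_active_of(struct model_open_file *of)
{
	if (of->released)
		return NULL;
	if (!model_get_active(of->kn))
		return NULL;
	return of;
}

/* Models hiding the file: deactivate the node and drain the open file. */
void model_hide(struct model_open_file *of)
{
	of->kn->active = false;
	of->released = true;	/* per-open state is freed at this point */
}

/* Models re-showing the file: node is active again, open file stays released. */
void model_show(struct model_open_file *of)
{
	of->kn->active = true;
}

/* Pre-patch poll path: returns true if the poll callback would run. */
bool model_poll_old(struct model_open_file *of)
{
	if (!model_get_active(of->kn))
		return false;
	/* the callback would dereference per-open state here */
	model_put_active(of->kn);
	return true;
}

/* Post-patch poll path: returns true if the poll callback would run. */
bool model_poll_new(struct model_open_file *of)
{
	if (!model_get_active_of(of))
		return false;
	model_put_active(of->kn);
	return true;
}
```

After model_hide() followed by model_show(), model_poll_old() reports that the callback would run on a released open file, which corresponds to the UAF window in the diagram, whereas model_poll_new() keeps returning false.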
* Re: [PATCH v2 1/2] kernfs: Fix UAF in polling when open file is released
From: Greg KH @ 2025-08-22 7:47 UTC
To: Chen Ridong
Cc: tj, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong, libaokun1

On Fri, Aug 22, 2025 at 07:07:14AM +0000, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
> Stall Information) monitoring mechanism:
> [...]
> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
> Reported-by: Zhang Zhaotian <zhangzhaotian@huawei.com>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> [...]

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You have marked a patch with a "Fixes:" tag for a commit that is in an
  older released kernel, yet you do not have a cc: stable line in the
  signed-off-by area at all, which means that the patch will not be
  applied to any older kernel releases.  To properly fix this, please
  follow the documented rules in the
  Documentation/process/stable-kernel-rules.rst file for how to resolve
  this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

* Re: [PATCH v2 1/2] kernfs: Fix UAF in polling when open file is released
From: Tejun Heo @ 2025-08-22 17:46 UTC
To: Chen Ridong
Cc: gregkh, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong, libaokun1

On Fri, Aug 22, 2025 at 07:07:14AM +0000, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
> Stall Information) monitoring mechanism:
> [...]
> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
> Reported-by: Zhang Zhaotian <zhangzhaotian@huawei.com>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

* Re: [PATCH v2 1/2] kernfs: Fix UAF in polling when open file is released
From: Chen Ridong @ 2025-08-23 0:23 UTC
To: Tejun Heo
Cc: gregkh, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong, libaokun1

On 2025/8/23 1:46, Tejun Heo wrote:
> On Fri, Aug 22, 2025 at 07:07:14AM +0000, Chen Ridong wrote:
>> [...]
>> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
>> Reported-by: Zhang Zhaotian <zhangzhaotian@huawei.com>
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>
> Acked-by: Tejun Heo <tj@kernel.org>
>
> Thanks.
>

Thanks

-- 
Best regards,
Ridong

* [PATCH v2 2/2] cgroup/psi: Set of->priv to NULL upon file release
From: Chen Ridong @ 2025-08-22 7:07 UTC
To: gregkh, tj, hannes, mkoutny, peterz, zhouchengming
Cc: linux-kernel, cgroups, lujialin4, chenridong, libaokun1

From: Chen Ridong <chenridong@huawei.com>

Setting of->priv to NULL when the file is released enables earlier bug
detection. This allows potential bugs to manifest as NULL pointer
dereferences rather than use-after-free errors[1], which are generally more
difficult to diagnose.

[1] https://lore.kernel.org/cgroups/38ef3ff9-b380-44f0-9315-8b3714b0948d@huaweicloud.com/T/#m8a3b3f88f0ff3da5925d342e90043394f8b2091b
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cgroup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 312c6a8b55bb..d8b82afed181 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4159,6 +4159,7 @@ static void cgroup_file_release(struct kernfs_open_file *of)
 		cft->release(of);
 	put_cgroup_ns(ctx->ns);
 	kfree(ctx);
+	of->priv = NULL;
 }
 
 static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
-- 
2.34.1

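The idea behind clearing of->priv on release can be illustrated with a small user-space sketch. It is a simplified model under stated assumptions: struct model_ctx, struct model_open_file and the model_* functions are hypothetical names made up for this example, and the explicit NULL check stands in for the fault that kernel code would take when dereferencing a NULL of->priv. The point it shows is that once the pointer is cleared, a stale access becomes deterministically detectable instead of silently reading freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-open context stored in of->priv. */
struct model_ctx {
	int trigger_id;
};

/* Hypothetical stand-in for struct kernfs_open_file. */
struct model_open_file {
	void *priv;
};

/* Allocates the per-open context; returns 0 on success, -1 on failure. */
int model_open(struct model_open_file *of, int trigger_id)
{
	struct model_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return -1;
	ctx->trigger_id = trigger_id;
	of->priv = ctx;
	return 0;
}

/* Mirrors the shape of the patched release path: free, then clear. */
void model_release(struct model_open_file *of)
{
	free(of->priv);
	of->priv = NULL;
}

/* Returns the trigger id, or -1 if the open file was already released. */
int model_poll(const struct model_open_file *of)
{
	const struct model_ctx *ctx = of->priv;

	if (!ctx)
		return -1;	/* kernel code dereferencing priv would fault here */
	return ctx->trigger_id;
}
```

Without the assignment in model_release(), model_poll() would read through a dangling pointer after release, which is the harder-to-diagnose failure mode the commit message refers to.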
* Re: [PATCH v2 2/2] cgroup/psi: Set of->priv to NULL upon file release
From: Tejun Heo @ 2025-08-22 17:48 UTC
To: Chen Ridong
Cc: gregkh, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong, libaokun1

On Fri, Aug 22, 2025 at 07:07:15AM +0000, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Setting of->priv to NULL when the file is released enables earlier bug
> detection. This allows potential bugs to manifest as NULL pointer
> dereferences rather than use-after-free errors[1], which are generally more
> difficult to diagnose.
>
> [1] https://lore.kernel.org/cgroups/38ef3ff9-b380-44f0-9315-8b3714b0948d@huaweicloud.com/T/#m8a3b3f88f0ff3da5925d342e90043394f8b2091b
> Signed-off-by: Chen Ridong <chenridong@huawei.com>

Applied to cgroup/for-6.17-fixes.

Thanks.

-- 
tejun

* Re: [PATCH v2 2/2] cgroup/psi: Set of->priv to NULL upon file release
From: Greg KH @ 2025-08-23 6:43 UTC
To: Tejun Heo
Cc: Chen Ridong, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong, libaokun1

On Fri, Aug 22, 2025 at 07:48:08AM -1000, Tejun Heo wrote:
> On Fri, Aug 22, 2025 at 07:07:15AM +0000, Chen Ridong wrote:
> > From: Chen Ridong <chenridong@huawei.com>
> >
> > Setting of->priv to NULL when the file is released enables earlier bug
> > detection. This allows potential bugs to manifest as NULL pointer
> > dereferences rather than use-after-free errors[1], which are generally more
> > difficult to diagnose.
> >
> > [1] https://lore.kernel.org/cgroups/38ef3ff9-b380-44f0-9315-8b3714b0948d@huaweicloud.com/T/#m8a3b3f88f0ff3da5925d342e90043394f8b2091b
> > Signed-off-by: Chen Ridong <chenridong@huawei.com>
>
> Applied to cgroup/for-6.17-fixes.

Both or just this second patch?  Should I take the first through the
driver-core tree, or do you want to take it through the cgroup tree?  No
objection from me for you to take both :)

thanks,

greg k-h

* Re: [PATCH v2 2/2] cgroup/psi: Set of->priv to NULL upon file release
From: Tejun Heo @ 2025-08-25 17:32 UTC
To: Greg KH
Cc: Chen Ridong, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong, libaokun1

Hello, Greg.

On Sat, Aug 23, 2025 at 08:43:48AM +0200, Greg KH wrote:
> > Applied to cgroup/for-6.17-fixes.
>
> Both or just this second patch?  Should I take the first through the
> driver-core tree, or do you want to take it through the cgroup tree?  No
> objection from me for you to take both :)

Sorry about the lack of clarity. Just the second one. The first one looks
fine to me but it would probably be more appropriate if you take it.

Thanks!

-- 
tejun

* Re: [PATCH v2 2/2] cgroup/psi: Set of->priv to NULL upon file release
From: Chen Ridong @ 2025-09-01 1:38 UTC
To: Tejun Heo, Greg KH
Cc: hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong, libaokun1

On 2025/8/26 1:32, Tejun Heo wrote:
> Hello, Greg.
>
> On Sat, Aug 23, 2025 at 08:43:48AM +0200, Greg KH wrote:
>>> Applied to cgroup/for-6.17-fixes.
>>
>> Both or just this second patch?  Should I take the first through the
>> driver-core tree, or do you want to take it through the cgroup tree?  No
>> objection from me for you to take both :)
>
> Sorry about the lack of clarity. Just the second one. The first one looks
> fine to me but it would probably be more appropriate if you take it.
>
> Thanks!
>

Hello all,

Any other opinions? Can this patch be applied?

-- 
Best regards,
Ridong

* Re: [PATCH v2 2/2] cgroup/psi: Set of->priv to NULL upon file release
From: Greg KH @ 2025-09-01 6:06 UTC
To: Chen Ridong
Cc: Tejun Heo, hannes, mkoutny, peterz, zhouchengming, linux-kernel, cgroups, lujialin4, chenridong, libaokun1

On Mon, Sep 01, 2025 at 09:38:49AM +0800, Chen Ridong wrote:
>
>
> On 2025/8/26 1:32, Tejun Heo wrote:
> > Hello, Greg.
> >
> > On Sat, Aug 23, 2025 at 08:43:48AM +0200, Greg KH wrote:
> >>> Applied to cgroup/for-6.17-fixes.
> >>
> >> Both or just this second patch?  Should I take the first through the
> >> driver-core tree, or do you want to take it through the cgroup tree?  No
> >> objection from me for you to take both :)
> >
> > Sorry about the lack of clarity. Just the second one. The first one looks
> > fine to me but it would probably be more appropriate if you take it.
> >
> > Thanks!
> >
>
> Hello all,
>
> Any other opinions? Can this patch be applied?

Please give us a chance to catch up with patch reviews :)