From: Waiman Long <longman@redhat.com>
To: Chen Ridong <chenridong@huawei.com>,
tj@kernel.org, lizefan.x@bytedance.com, hannes@cmpxchg.org
Cc: bpf@vger.kernel.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH -next] cgroup: fix uaf when proc_cpuset_show
Date: Sat, 22 Jun 2024 11:05:31 -0400
Message-ID: <19648b9c-6df7-45cd-a5ae-624a3e4d860f@redhat.com>
In-Reply-To: <20240622113814.120907-1-chenridong@huawei.com>
On 6/22/24 07:38, Chen Ridong wrote:
> We found a refcount UAF bug as follows:
>
> BUG: KASAN: use-after-free in cgroup_path_ns+0x112/0x150
> Read of size 8 at addr ffff8882a4b242b8 by task atop/19903
>
> CPU: 27 PID: 19903 Comm: atop Kdump: loaded Tainted: GF
> Call Trace:
> dump_stack+0x7d/0xa7
> print_address_description.constprop.0+0x19/0x170
> ? cgroup_path_ns+0x112/0x150
> __kasan_report.cold+0x6c/0x84
> ? print_unreferenced+0x390/0x3b0
> ? cgroup_path_ns+0x112/0x150
> kasan_report+0x3a/0x50
> cgroup_path_ns+0x112/0x150
> proc_cpuset_show+0x164/0x530
> proc_single_show+0x10f/0x1c0
> seq_read_iter+0x405/0x1020
> ? aa_path_link+0x2e0/0x2e0
> seq_read+0x324/0x500
> ? seq_read_iter+0x1020/0x1020
> ? common_file_perm+0x2a1/0x4a0
> ? fsnotify_unmount_inodes+0x380/0x380
> ? bpf_lsm_file_permission_wrapper+0xa/0x30
> ? security_file_permission+0x53/0x460
> vfs_read+0x122/0x420
> ksys_read+0xed/0x1c0
> ? __ia32_sys_pwrite64+0x1e0/0x1e0
> ? __audit_syscall_exit+0x741/0xa70
> do_syscall_64+0x33/0x40
> entry_SYSCALL_64_after_hwframe+0x67/0xcc
>
> This is also reported by: https://syzkaller.appspot.com/bug?extid=9b1ff7be974a403aa4cd
>
> This can be reproduced by the following methods:
> 1. Add an mdelay(1000) before acquiring the cgroup_lock in the
>    cgroup_path_ns function.
> 2. $ cat /proc/<pid>/cpuset repeatedly.
> 3. $ mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset/
>    $ umount /sys/fs/cgroup/cpuset/ repeatedly.
>
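For step 2, a small C loop can stand in for the shell `cat`. This is only a
sketch: the helper names read_once() and hammer_cpuset() are made up here, and
the path to read (for example /proc/self/cpuset) is left to the caller. Each
iteration re-opens the file, which should cause the show callback to run again
on every pass.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `path` once into `buf` and NUL-terminate it.
 * Returns the number of bytes read, or -1 if the open or read failed.
 */
static ssize_t read_once(const char *path, char *buf, size_t len)
{
	int fd;
	ssize_t n;

	assert(len > 0);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, len - 1);
	close(fd);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}

/*
 * Hypothetical helper: read `path` `iterations` times, re-opening it on
 * every pass. Returns how many of the reads succeeded.
 */
static long hammer_cpuset(const char *path, long iterations)
{
	char buf[4096];
	long ok = 0;
	long i;

	for (i = 0; i < iterations; i++)
		if (read_once(path, buf, sizeof(buf)) >= 0)
			ok++;
	return ok;
}
```

Calling hammer_cpuset("/proc/self/cpuset", 100000) from one terminal while the
mount/umount loop of step 3 runs in another would approximate the reproducer.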
> The race that causes this bug is shown below:
>
> (umount)                |  (cat /proc/<pid>/cpuset)
> css_release             |  proc_cpuset_show
> css_release_work_fn     |  css = task_get_css(tsk, cpuset_cgrp_id);
> css_free_rwork_fn       |  cgroup_path_ns(css->cgroup, ...);
> cgroup_destroy_root     |  mutex_lock(&cgroup_mutex);
> rebind_subsystems       |
> cgroup_free_root        |
>                         |  // cgrp was freed, UAF
>                         |  cgroup_path_ns_locked(cgrp,..);
>
> When the cpuset is initialized, the root node top_cpuset.css.cgrp
> will point to &cgrp_dfl_root.cgrp. In cgroup v1, the mount operation will
> allocate cgroup_root, and top_cpuset.css.cgrp will point to the allocated
> &cgroup_root.cgrp. When the umount operation is executed,
> top_cpuset.css.cgrp will be rebound to &cgrp_dfl_root.cgrp.
>
> The problem is that when rebinding to cgrp_dfl_root, there are cases
> where the cgroup_root allocated by setting up the root for cgroup v1
> is cached. This could lead to a Use-After-Free (UAF) if it is
> subsequently freed. The descendant cgroups of cgroup v1 can only be
> freed after the css is released. However, the css of the root will never
> be released, yet the cgroup_root should be freed when it is unmounted.
> This means that obtaining a reference to the css of the root does
> not guarantee that css.cgrp->root will not be freed.
>
> To solve this issue, we have added a cgroup reference count in
> the proc_cpuset_show function to ensure that css.cgrp->root will not
> be freed prematurely. This is a temporary solution. Let's see if anyone
> has a better solution.
>
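To make the lifetime problem described above concrete, here is a small
userspace model in C. It is only a sketch under stated assumptions: struct
model_root, struct model_css and all helper names are invented for
illustration and are not the kernel's structures. A `live` flag stands in for
the actual free, so the stale access can be observed without undefined
behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup root; not a kernel structure. */
struct model_root {
	int refcnt;	/* references pinning this root */
	bool live;	/* false once the root would have been freed */
};

/* Hypothetical stand-in for the root css. */
struct model_css {
	int refcnt;			/* what a css get/put would adjust */
	struct model_root *cgroup;	/* plain pointer, holds no reference */
};

static void root_get(struct model_root *r)
{
	r->refcnt++;
}

static void root_put(struct model_root *r)
{
	assert(r->refcnt > 0);
	if (--r->refcnt == 0)
		r->live = false;	/* models the free of the root */
}

/* Models umount: rebind the css to the default root, drop the mount's ref. */
static void model_umount(struct model_css *css, struct model_root *v1,
			 struct model_root *dfl)
{
	css->cgroup = dfl;
	root_put(v1);
}

/*
 * Models a reader that caches css->cgroup and is then overtaken by umount.
 * Returns true if the cached root is still live when the reader uses it.
 */
static bool reader_survives_umount(bool pin_root)
{
	struct model_root dfl = { .refcnt = 1, .live = true };
	struct model_root v1 = { .refcnt = 1, .live = true }; /* mount's ref */
	struct model_css css = { .refcnt = 1, .cgroup = &v1 };
	struct model_root *cached;
	bool ok;

	css.refcnt++;			/* reader takes a css reference */
	cached = css.cgroup;		/* reader loads css->cgroup */
	if (pin_root)
		root_get(cached);	/* extra reference on the root itself */

	model_umount(&css, &v1, &dfl);	/* umount wins the race */

	ok = cached->live;		/* the access that would be the UAF */
	if (pin_root)
		root_put(cached);
	css.refcnt--;
	return ok;
}
```

In this model reader_survives_umount(false) returns false, because the css
reference alone never touches the root's count, while
reader_survives_umount(true) returns true.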
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> kernel/cgroup/cpuset.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index c12b9fdb22a4..782eaf807173 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -5045,6 +5045,7 @@ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
>  	char *buf;
>  	struct cgroup_subsys_state *css;
>  	int retval;
> +	struct cgroup *root_cgroup = NULL;
>
>  	retval = -ENOMEM;
>  	buf = kmalloc(PATH_MAX, GFP_KERNEL);
> @@ -5052,9 +5053,28 @@ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
>  		goto out;
>
>  	css = task_get_css(tsk, cpuset_cgrp_id);
> +	rcu_read_lock();
> +	/*
> +	 * When the cpuset subsystem is mounted on the legacy hierarchy,
> +	 * the top_cpuset.css->cgroup does not hold a reference count of
> +	 * cgroup_root.cgroup. This makes accessing css->cgroup very
> +	 * dangerous because when the cpuset subsystem is remounted to the
> +	 * default hierarchy, the cgroup_root.cgroup that css->cgroup points
> +	 * to will be released, leading to a UAF issue. To avoid this problem,
> +	 * get the reference count of top_cpuset.css->cgroup first.
> +	 *
> +	 * This is ugly!!
> +	 */
> +	if (css == &top_cpuset.css) {
> +		cgroup_get(css->cgroup);
> +		root_cgroup = css->cgroup;
> +	}
> +	rcu_read_unlock();
>  	retval = cgroup_path_ns(css->cgroup, buf, PATH_MAX,
>  				current->nsproxy->cgroup_ns);
>  	css_put(css);
> +	if (root_cgroup)
> +		cgroup_put(root_cgroup);
>  	if (retval == -E2BIG)
>  		retval = -ENAMETOOLONG;
>  	if (retval < 0)
Thanks for reporting this UAF bug. Could you try the attached patch to
see if it can fix the issue?

Cheers,
Longman
[-- Attachment #2: 0001-cgroup-cpuset-Prevent-UAF-in-proc_cpuset_show.patch --]
[-- Type: text/x-patch, Size: 3357 bytes --]
From 11036d027cc1f3dd0a6045794fb87711c840f426 Mon Sep 17 00:00:00 2001
From: Waiman Long <longman@redhat.com>
Date: Sat, 22 Jun 2024 10:25:15 -0400
Subject: [PATCH] cgroup/cpuset: Prevent UAF in proc_cpuset_show()

A UAF can happen when /proc/<pid>/cpuset is read, as reported in [1].

When the cpuset is initialized, the root node top_cpuset.css.cgrp
will point to &cgrp_dfl_root.cgrp. In cgroup v1, the mount operation will
allocate cgroup_root, and top_cpuset.css.cgrp will point to the allocated
&cgroup_root.cgrp. When the umount operation is executed,
top_cpuset.css.cgrp will be rebound to &cgrp_dfl_root.cgrp.

The problem is that when rebinding to cgrp_dfl_root, there are cases
where the cgroup_root allocated by setting up the root for cgroup v1
is cached. This could lead to a Use-After-Free (UAF) if it is
subsequently freed. The descendant cgroups of cgroup v1 can only be
freed after the css is released. However, the css of the root will never
be released, yet the cgroup_root should be freed when it is unmounted.
This means that obtaining a reference to the css of the root does
not guarantee that css.cgrp->root will not be freed.

Fix this problem by taking a reference to the v1 cgroup root in
cpuset_bind() and releasing it in the next cpuset_bind() call. The
top_cpuset will always be bound to either cgrp_dfl_root or the
allocated v1 cgroup root. So top_cpuset will always be remounted back
to cgrp_dfl_root whenever a v1 cpuset mount is released.

Access to css->cgroup in proc_cpuset_show() is now protected by
the cpuset_mutex to make sure that a UAF access to css->cgroup is
not possible.

[1] https://syzkaller.appspot.com/bug?extid=9b1ff7be974a403aa4cd

Reported-by: Chen Ridong <chenridong@huawei.com>
Closes: https://syzkaller.appspot.com/bug?extid=9b1ff7be974a403aa4cd
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index c12b9fdb22a4..8155ad9ff927 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -4143,9 +4143,20 @@ static void cpuset_css_free(struct cgroup_subsys_state *css)
 	free_cpuset(cs);
 }

+/*
+ * With a cgroup v1 mount, root_css.cgroup can be freed. We need to take a
+ * reference to it to avoid UAF as proc_cpuset_show() may access the content
+ * of this cgroup.
+ */
 static void cpuset_bind(struct cgroup_subsys_state *root_css)
 {
+	static struct cgroup *v1_cgroup_root;
+
 	mutex_lock(&cpuset_mutex);
+	if (v1_cgroup_root) {
+		cgroup_put(v1_cgroup_root);
+		v1_cgroup_root = NULL;
+	}
 	spin_lock_irq(&callback_lock);

 	if (is_in_v2_mode()) {
@@ -4159,6 +4170,10 @@ static void cpuset_bind(struct cgroup_subsys_state *root_css)
 	}

 	spin_unlock_irq(&callback_lock);
+	if (!cgroup_subsys_on_dfl(cpuset_cgrp_subsys)) {
+		v1_cgroup_root = root_css->cgroup;
+		cgroup_get(v1_cgroup_root);
+	}
 	mutex_unlock(&cpuset_mutex);
 }

@@ -5051,10 +5066,12 @@ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
 	if (!buf)
 		goto out;

+	mutex_lock(&cpuset_mutex);
 	css = task_get_css(tsk, cpuset_cgrp_id);
 	retval = cgroup_path_ns(css->cgroup, buf, PATH_MAX,
 				current->nsproxy->cgroup_ns);
 	css_put(css);
+	mutex_unlock(&cpuset_mutex);
 	if (retval == -E2BIG)
 		retval = -ENAMETOOLONG;
 	if (retval < 0)
--
2.39.3
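
The intent of the patch above can be modelled in userspace C roughly as
follows. This is a sketch, not the kernel code: struct model_root, model_bind()
and model_show() are hypothetical names, a pthread mutex stands in for
cpuset_mutex, and model_bind() also updates the root pointer itself, which in
the kernel happens outside the bind callback. A `live` flag replaces the real
free so the result can be checked safely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup root; not a kernel structure. */
struct model_root {
	int refcnt;		/* references pinning this root */
	bool live;		/* false once the root would have been freed */
	bool is_default;	/* true for the always-present default root */
};

/* Stands in for cpuset_mutex. */
static pthread_mutex_t model_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Mirrors the static v1_cgroup_root pointer in the patch. */
static struct model_root *pinned_v1_root;
/* What the top css would currently point to; set by the first bind. */
static struct model_root *top_cgroup;

static void root_get(struct model_root *r)
{
	r->refcnt++;
}

static void root_put(struct model_root *r)
{
	assert(r->refcnt > 0);
	if (--r->refcnt == 0)
		r->live = false;	/* models the free of the root */
}

/* Drop the pin taken by the previous bind, then pin a new v1 root. */
static void model_bind(struct model_root *new_root)
{
	pthread_mutex_lock(&model_mutex);
	if (pinned_v1_root) {
		root_put(pinned_v1_root);
		pinned_v1_root = NULL;
	}
	top_cgroup = new_root;
	if (!new_root->is_default) {
		pinned_v1_root = new_root;
		root_get(new_root);
	}
	pthread_mutex_unlock(&model_mutex);
}

/* Reader: looks at the current root only while holding the mutex. */
static bool model_show(void)
{
	bool ok;

	pthread_mutex_lock(&model_mutex);
	ok = top_cgroup->live;
	pthread_mutex_unlock(&model_mutex);
	return ok;
}
```

Because both the pin handover and the read happen under the same mutex, a
reader in this model sees either the still-pinned v1 root or the default root,
never a root whose last reference has already been dropped.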