From: Christian Brauner <brauner@kernel.org>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: "Tejun Heo" <tj@kernel.org>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Koutný" <mkoutny@suse.com>,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
"Meta kernel team" <kernel-team@meta.com>
Subject: Re: [PATCH] cgroup: fix race between fork and cgroup.kill
Date: Tue, 4 Feb 2025 13:44:38 +0100
Message-ID: <20250204-willen-aufmachen-69e8a849a5a7@brauner>
In-Reply-To: <20250131000542.1394856-1-shakeel.butt@linux.dev>
On Thu, Jan 30, 2025 at 04:05:42PM -0800, Shakeel Butt wrote:
> Tejun reported the following race between fork() and cgroup.kill at [1].
>
> Tejun:
> I was looking at cgroup.kill implementation and wondering whether there
> could be a race window. So, __cgroup_kill() does the following:
>
> k1. Set CGRP_KILL.
> k2. Iterate tasks and deliver SIGKILL.
> k3. Clear CGRP_KILL.
>
> The copy_process() does the following:
>
> c1. Copy a bunch of stuff.
> c2. Grab siglock.
> c3. Check fatal_signal_pending().
> c4. Commit to forking.
> c5. Release siglock.
> c6. Call cgroup_post_fork() which puts the task on the css_set and tests
> CGRP_KILL.
>
> The intention seems to be that either a forking task gets SIGKILL and
> terminates at c3 or it sees CGRP_KILL at c6 and kills the child. However, I
> don't see what guarantees that k3 can't happen before c6, i.e., after a
> forking task passes c5, k2 can take place and then, before the forking task
> reaches c6, k3 can happen. Then nobody would send SIGKILL to the child.
> What am I missing?
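A minimal single-threaded C sketch of the interleaving above, assuming a heavily simplified model: struct model_cgroup, struct model_task, k2_kill_linked(), c6_post_fork() and child_killed() are hypothetical names standing in for CGRP_KILL, the css_set task iteration and cgroup_post_fork(); locking and the real kernel types are omitted. It replays the steps in a fixed order after the parent has already committed the fork (c3..c5).

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model: no locking, no real kernel types. */
struct model_cgroup {
	bool kill_flag;   /* stands in for the CGRP_KILL bit */
};

struct model_task {
	bool on_css_set;  /* reachable by the k2 task iteration */
	bool sigkill;     /* SIGKILL has been delivered */
};

/* k2: only tasks already linked into the css_set receive SIGKILL. */
static void k2_kill_linked(struct model_task *tasks, int n)
{
	for (int i = 0; i < n; i++)
		if (tasks[i].on_css_set)
			tasks[i].sigkill = true;
}

/* c6: link the child into the css_set, then test the flag. */
static void c6_post_fork(const struct model_cgroup *cg, struct model_task *child)
{
	child->on_css_set = true;
	if (cg->kill_flag)
		child->sigkill = true;
}

/*
 * Replays one interleaving after the parent has already passed c3..c5
 * (no SIGKILL was pending, so the fork committed). Returns whether the
 * child ends up with SIGKILL.
 */
static bool child_killed(bool k3_before_c6)
{
	struct model_cgroup cg = { .kill_flag = false };
	struct model_task tasks[2] = {
		{ .on_css_set = true,  .sigkill = false },  /* forking parent */
		{ .on_css_set = false, .sigkill = false },  /* child, not yet linked */
	};

	cg.kill_flag = true;                  /* k1 */
	k2_kill_linked(tasks, 2);             /* k2: misses the child */
	if (k3_before_c6) {
		cg.kill_flag = false;             /* k3 */
		c6_post_fork(&cg, &tasks[1]);     /* c6 sees no flag */
	} else {
		c6_post_fork(&cg, &tasks[1]);     /* c6 sees CGRP_KILL */
		cg.kill_flag = false;             /* k3 */
	}
	return tasks[1].sigkill;
}
```

In this model child_killed(true) returns false: the child escapes, which is the window described above, while child_killed(false) returns true.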
>
> This is indeed a race. One way to fix this race is by taking
> cgroup_threadgroup_rwsem in write mode in __cgroup_kill() as the fork()
> side takes cgroup_threadgroup_rwsem in read mode from cgroup_can_fork()
> to cgroup_post_fork(). However, that would be heavy-handed, as it adds
> one more potential stall scenario for cgroup.kill, which is usually
> called in extreme situations such as memory pressure.
>
> To fix this race, let's maintain a sequence number per cgroup which gets
> incremented on each __cgroup_kill() call. On the fork() side,
> cgroup_can_fork() will cache the sequence number locally and recheck it
> against the cgroup's sequence number in cgroup_post_fork(). If the
> sequence numbers mismatch, it means __cgroup_kill() has been called and
> we should send SIGKILL to the newly created task.
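A single-threaded sketch of the sequence-number idea, under the same kind of simplified model. The names kill_seq, cached_seq, can_fork(), post_fork_must_kill() and replay_with_seq() are hypothetical, and keeping a CGRP_KILL-style flag test alongside the sequence comparison is an assumption about how the checks combine, not a quote of the patch. The model ignores the locking and memory ordering the kernel needs around these reads.

```c
#include <stdbool.h>

/* Hypothetical model; not the kernel's struct cgroup or kernel_clone_args. */
struct model_cgroup {
	bool kill_flag;          /* stands in for CGRP_KILL */
	unsigned int kill_seq;   /* bumped on every kill */
};

struct model_fork_ctx {
	unsigned int cached_seq; /* snapshot taken on the can_fork side */
};

static void kill_begin(struct model_cgroup *cg)
{
	cg->kill_seq++;          /* incremented on each kill */
	cg->kill_flag = true;    /* k1 */
}

static void kill_end(struct model_cgroup *cg)
{
	cg->kill_flag = false;   /* k3 */
}

static void can_fork(const struct model_cgroup *cg, struct model_fork_ctx *ctx)
{
	ctx->cached_seq = cg->kill_seq;
}

/* post_fork side: true if the new child must receive SIGKILL. */
static bool post_fork_must_kill(const struct model_cgroup *cg,
				const struct model_fork_ctx *ctx)
{
	return cg->kill_flag || ctx->cached_seq != cg->kill_seq;
}

/* Replays the racy order: snapshot, then a complete kill, then post_fork. */
static bool replay_with_seq(void)
{
	struct model_cgroup cg = { .kill_flag = false, .kill_seq = 0 };
	struct model_fork_ctx ctx;

	can_fork(&cg, &ctx);     /* cached_seq == 0 */
	kill_begin(&cg);         /* k1 (and k2, which misses the child) */
	kill_end(&cg);           /* k3 before c6 */
	return post_fork_must_kill(&cg, &ctx);  /* 0 != 1: kill the child */
}
```

Even though the flag is already clear when post_fork runs, the sequence mismatch still flags the child in this model, so replay_with_seq() returns true.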
>
> Reported-by: Tejun Heo <tj@kernel.org>
> Closes: https://lore.kernel.org/all/Z5QHE2Qn-QZ6M-KW@slm.duckdns.org/ [1]
> Fixes: 661ee6280931 ("cgroup: introduce cgroup.kill")
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
Acked-by: Christian Brauner <brauner@kernel.org>
Thread overview: 4+ messages
2025-01-31 0:05 [PATCH] cgroup: fix race between fork and cgroup.kill Shakeel Butt
2025-01-31 9:33 ` Michal Koutný
2025-02-02 16:56 ` Tejun Heo
2025-02-04 12:44 ` Christian Brauner [this message]