From: Roman Gushchin <roman.gushchin@linux.dev>
To: chenridong <chenridong@huawei.com>, tj@kernel.org
Cc: martin.lau@linux.dev, ast@kernel.org, daniel@iogearbox.net,
andrii@kernel.org, eddyz87@gmail.com, song@kernel.org,
yonghong.song@linux.dev, john.fastabend@gmail.com,
kpsingh@kernel.org, sdf@google.com, haoluo@google.com,
jolsa@kernel.org, tj@kernel.org, lizefan.x@bytedance.com,
hannes@cmpxchg.org, bpf@vger.kernel.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH -next] cgroup: Fix AA deadlock caused by cgroup_bpf_release
Date: Thu, 11 Jul 2024 03:52:34 +0000 [thread overview]
Message-ID: <Zo9XAmjpP6y0ZDGH@google.com> (raw)
In-Reply-To: <a1b23274-4a35-4cbf-8c4c-5f770fbcc187@huawei.com>
On Wed, Jul 10, 2024 at 11:02:57AM +0800, chenridong wrote:
>
>
> On 2024/7/9 21:42, chenridong wrote:
> >
> >
> > On 2024/6/10 10:47, Roman Gushchin wrote:
> > > Hi Chen!
> > >
> > > Was this problem found in the real life? Do you have a LOCKDEP
> > > splash available?
> > >
> > Sorry for the late email response.
> > Yes, it was. The issue occurred after a long period of stress testing,
> > with a very low probability.
> > > > On Jun 7, 2024, at 4:09 AM, Chen Ridong <chenridong@huawei.com> wrote:
> > > >
> > > > We found an AA deadlock problem as shown belowed:
> > > >
> > > > cgroup_destroy_wq TaskB WatchDog
> > > > system_wq
> > > >
> > > > ...
> > > > css_killed_work_fn:
> > > > P(cgroup_mutex)
> > > > ...
> > > > ...
> > > > __lockup_detector_reconfigure:
> > > > P(cpu_hotplug_lock.read)
> > > > ...
> > > > ...
> > > > percpu_down_write:
> > > > P(cpu_hotplug_lock.write)
> > > > ...
> > > > cgroup_bpf_release:
> > > > P(cgroup_mutex)
> > > > smp_call_on_cpu:
> > > > Wait system_wq
> > > >
> > > > cpuset_css_offline:
> > > > P(cpu_hotplug_lock.read)
> > > >
> > > > WatchDog is waiting for system_wq, who is waiting for cgroup_mutex, to
> > > > finish the jobs, but the owner of the cgroup_mutex is waiting for
> > > > cpu_hotplug_lock. This problem caused by commit 4bfc0bb2c60e ("bpf:
> > > > decouple the lifetime of cgroup_bpf from cgroup itself")
> > > > puts cgroup_bpf release work into system_wq. As cgroup_bpf is a
> > > > member of
> > > > cgroup, it is reasonable to put cgroup bpf release work into
> > > > cgroup_destroy_wq, which is only used for cgroup's release work, and the
> > > > preblem is solved.
> > >
> > > I need to think more on this, but at first glance the fix looks a
> > > bit confusing. cgroup_bpf_release() looks quite innocent, it only
> > > takes a cgroup_mutex. It’s not obvious why it’s not ok and requires
> > > a dedicated work queue. What exactly is achieved by placing it back
> > > on the dedicated cgroup destroy queue?
> > >
> > > I’m not trying to say your fix won’t work, but it looks like it
> > > might cover a more serious problem.
> >
> > The issue lies in the fact that different tasks require the cgroup_mutex
> > and cpu_hotplug_lock locks, eventually forming a deadlock. Placing
> > cgroup bpf release work on cgroup destroy queue can break loop.
> >
> The max_active of system_wq is WQ_DFL_ACTIVE(256). If all active works are
> cgroup bpf release works, it will block smp_call_on_cpu work which enque
> after cgroup bpf releases. So smp_call_on_cpu holding cpu_hotplug_lock will
> wait for completion, but it can never get a completion because cgroup bpf
> release works can not get cgroup_mutex and will never finish.
> However, Placing the cgroup bpf release works on cgroup destroy will never
> block smp_call_on_cpu work, which means loop is broken. Thus, it can solve
> the problem.
Tejun,
do you have an opinion on this?
If there are certain limitations from the cgroup side on what can be done
in a generic work context, it would be nice to document (e.g. don't grab
cgroup mutex), but I still struggle to understand what exactly is wrong
with the blamed commit.
Thanks,
Roman
next prev parent reply other threads:[~2024-07-11 3:52 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-07 11:03 [PATCH -next] cgroup: Fix AA deadlock caused by cgroup_bpf_release Chen Ridong
2024-06-10 2:47 ` Roman Gushchin
2024-07-09 13:42 ` chenridong
2024-07-10 3:02 ` chenridong
2024-07-11 3:52 ` Roman Gushchin [this message]
2024-07-11 17:36 ` Tejun Heo
2024-07-12 1:15 ` chenridong
2024-07-16 12:14 ` chenridong
2024-07-16 14:53 ` Roman Gushchin
2024-07-17 2:08 ` chenridong
2024-06-10 12:28 ` Markus Elfring
2024-07-09 13:45 ` chenridong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zo9XAmjpP6y0ZDGH@google.com \
--to=roman.gushchin@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=cgroups@vger.kernel.org \
--cc=chenridong@huawei.com \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=hannes@cmpxchg.org \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan.x@bytedance.com \
--cc=martin.lau@linux.dev \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=tj@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.