public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Roman Gushchin <roman.gushchin@linux.dev>
To: chenridong <chenridong@huawei.com>, tj@kernel.org
Cc: martin.lau@linux.dev, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, eddyz87@gmail.com, song@kernel.org,
	yonghong.song@linux.dev, john.fastabend@gmail.com,
	kpsingh@kernel.org, sdf@google.com, haoluo@google.com,
	jolsa@kernel.org, tj@kernel.org, lizefan.x@bytedance.com,
	hannes@cmpxchg.org, bpf@vger.kernel.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH -next] cgroup: Fix AA deadlock caused by cgroup_bpf_release
Date: Thu, 11 Jul 2024 03:52:34 +0000	[thread overview]
Message-ID: <Zo9XAmjpP6y0ZDGH@google.com> (raw)
In-Reply-To: <a1b23274-4a35-4cbf-8c4c-5f770fbcc187@huawei.com>

On Wed, Jul 10, 2024 at 11:02:57AM +0800, chenridong wrote:
> 
> 
> On 2024/7/9 21:42, chenridong wrote:
> > 
> > 
> > On 2024/6/10 10:47, Roman Gushchin wrote:
> > > Hi Chen!
> > > 
> > > Was this problem found in the real life? Do you have a LOCKDEP
> > > splash available?
> > > 
> > Sorry for the late email response.
> > Yes, it was. The issue occurred after a long period of stress testing,
> > with a very low probability.
> > > > On Jun 7, 2024, at 4:09 AM, Chen Ridong <chenridong@huawei.com> wrote:
> > > > 
> > > > We found an AA deadlock problem as shown belowed:
> > > > 
> > > > cgroup_destroy_wq        TaskB                WatchDog
> > > > system_wq
> > > > 
> > > > ...
> > > > css_killed_work_fn:
> > > > P(cgroup_mutex)
> > > > ...
> > > >                                 ...
> > > >                                 __lockup_detector_reconfigure:
> > > >                                 P(cpu_hotplug_lock.read)
> > > >                                 ...
> > > >                 ...
> > > >                 percpu_down_write:
> > > >                 P(cpu_hotplug_lock.write)
> > > >                                                 ...
> > > >                                                 cgroup_bpf_release:
> > > >                                                 P(cgroup_mutex)
> > > >                                 smp_call_on_cpu:
> > > >                                 Wait system_wq
> > > > 
> > > > cpuset_css_offline:
> > > > P(cpu_hotplug_lock.read)
> > > > 
> > > > WatchDog is waiting for system_wq, who is waiting for cgroup_mutex, to
> > > > finish the jobs, but the owner of the cgroup_mutex is waiting for
> > > > cpu_hotplug_lock. This problem caused by commit 4bfc0bb2c60e ("bpf:
> > > > decouple the lifetime of cgroup_bpf from cgroup itself")
> > > > puts cgroup_bpf release work into system_wq. As cgroup_bpf is a
> > > > member of
> > > > cgroup, it is reasonable to put cgroup bpf release work into
> > > > cgroup_destroy_wq, which is only used for cgroup's release work, and the
> > > > preblem is solved.
> > > 
> > > I need to think more on this, but at first glance the fix looks a
> > > bit confusing. cgroup_bpf_release() looks quite innocent, it only
> > > takes a cgroup_mutex. It’s not obvious why it’s not ok and requires
> > > a dedicated work queue. What exactly is achieved by placing it back
> > > on the dedicated cgroup destroy queue?
> > > 
> > > I’m not trying to say your fix won’t work, but it looks like it
> > > might cover a more serious problem.
> > 
> > The issue lies in the fact that different tasks require the cgroup_mutex
> > and cpu_hotplug_lock locks, eventually forming a deadlock. Placing
> > cgroup bpf release work on cgroup destroy queue can break loop.
> > 
> The max_active of system_wq is WQ_DFL_ACTIVE(256). If all active works are
> cgroup bpf release works, it will block smp_call_on_cpu work which enque
> after cgroup bpf releases. So smp_call_on_cpu holding cpu_hotplug_lock will
> wait for completion, but it can never get a completion because cgroup bpf
> release works can not get cgroup_mutex and will never finish.
> However, Placing the cgroup bpf release works on cgroup destroy will never
> block smp_call_on_cpu work, which means loop is broken. Thus, it can solve
> the problem.

Tejun,

do you have an opinion on this?

If there are certain limitations from the cgroup side on what can be done
in a generic work context, it would be nice to document (e.g. don't grab
cgroup mutex), but I still struggle to understand what exactly is wrong
with the blamed commit.

Thanks,
Roman

  reply	other threads:[~2024-07-11  3:52 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-07 11:03 [PATCH -next] cgroup: Fix AA deadlock caused by cgroup_bpf_release Chen Ridong
2024-06-10  2:47 ` Roman Gushchin
2024-07-09 13:42   ` chenridong
2024-07-10  3:02     ` chenridong
2024-07-11  3:52       ` Roman Gushchin [this message]
2024-07-11 17:36         ` Tejun Heo
2024-07-12  1:15           ` chenridong
2024-07-16 12:14             ` chenridong
2024-07-16 14:53               ` Roman Gushchin
2024-07-17  2:08                 ` chenridong
2024-06-10 12:28 ` Markus Elfring
2024-07-09 13:45   ` chenridong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zo9XAmjpP6y0ZDGH@google.com \
    --to=roman.gushchin@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=chenridong@huawei.com \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan.x@bytedance.com \
    --cc=martin.lau@linux.dev \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=tj@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox