public inbox for bpf@vger.kernel.org
From: chenridong <chenridong@huawei.com>
To: "Michal Koutný" <mkoutny@suse.com>
Cc: Hillf Danton <hdanton@sina.com>,
	Roman Gushchin <roman.gushchin@linux.dev>, <tj@kernel.org>,
	<bpf@vger.kernel.org>, <cgroups@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH -v2] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock
Date: Thu, 8 Aug 2024 10:22:21 +0800	[thread overview]
Message-ID: <8be4c357-a111-4134-b7de-ffa6f769c9e4@huawei.com> (raw)
In-Reply-To: <mxyismki3ln2pvrbhd36japfffpfcwgyvgmy5him3n746w6wd6@24zlflalef6x>



On 2024/8/7 21:32, Michal Koutný wrote:
> Hello.
> 
> On Sat, Jul 27, 2024 at 06:21:55PM GMT, chenridong <chenridong@huawei.com> wrote:
>> Yes, I have offered the scripts in Link(V1).
> 
> Thanks (and thanks for patience).
> There is no lockdep complain about a deadlock (i.e. some circular
> locking dependencies). (I admit the multiple holders of cgroup_mutex
> reported there confuse me, I guess that's an artifact of this lockdep
> report and they could be also waiters.)
> 
>>> Who'd be the holder of cgroup_mutex preventing cgroup_bpf_release from
>>> progress? (That's not clear to me from your diagram.)
>>>
>> This is a cumulative process. The stress testing deletes a large member of
>> cgroups, and cgroup_bpf_release is asynchronous, competing with cgroup
>> release works.
> 
> Those are different situations:
> - waiting for one holder that's stuck for some reason (that's what we're
>    after),
> - waiting because the mutex is contended (that's slow but progresses
>    eventually).
> 
>> You know, cgroup_mutex is used in many places. Finally, the number of
>> `cgroup_bpf_release` instances in system_wq accumulates up to 256, and
>> it leads to this issue.
> 
> Reaching max_active doesn't mean that queue_work() would block or the
> items were lost. They are only queued onto inactive_works list.

Yes, I agree. But what if all 256 active works cannot finish because they
are waiting for a lock? Then the works on the inactive list can never be
executed.
> (Remark: cgroup_destroy_wq has only max_active=1 but it apparently
> doesn't stop progress should there be more items queued (when
> cgroup_mutex is not guarding losing references.))
> 
cgroup_destroy_wq is not blocked by cgroup_mutex; it has already 
acquired cgroup_mutex, but it is blocked waiting for 
cpu_hotplug_lock.read, while cpu_hotplug_lock.write is held by the CPU 
offline process (step 3).
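As I understand it, the cycle looks like this (my reconstruction from
this thread, tasks left-to-right and time top-down, as you suggested;
the step numbering refers to my earlier diagram):

```
CPU-offline task              cgroup_destroy_wq worker        system_wq
----------------              ------------------------        ---------
takes cpu_hotplug_lock.write
                                                              256 x cgroup_bpf_release_fn
                                                                active, all waiting on
                                                                cgroup_mutex
                              takes cgroup_mutex
                              waits on cpu_hotplug_lock.read
__lockup_detector_reconfigure()
  -> smp_call_on_cpu()
  -> work queued on system_wq,
     parked behind the 256
     blocked works; flush
     never completes
```

Each party waits on a resource held by the next, closing the cycle, so
stopping the test cannot unwind it.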
> ---
> 
> The change on its own (deferred cgroup bpf progs removal via
> cgroup_destroy_wq instead of system_wq) is sensible by collecting
> related objects removal together (at the same time it shouldn't cause
> problems by sharing one cgroup_destroy_wq).
> 

> But the reasoning in the commit message doesn't add up to me. There
> isn't obvious deadlock, I'd say that system is overloaded with repeated
> calls of __lockup_detector_reconfigure() and it is not in deadlock
> state -- i.e. when you stop the test, it should eventually recover.
> Given that, I'd neither put Fixes: 4bfc0bb2c60e there.

If I stop the test, it can never recover. It would not need to be fixed 
if it could recover.
I have to admit, this is a complicated issue.

system_wq was not overloaded by __lockup_detector_reconfigure but by 
cgroup_bpf_release_fn. A large number of cgroups were deleted, so there 
were 256 active cgroup_bpf_release_fn works in system_wq, all blocked 
on cgroup_mutex.

To simplify, imagine that the max_active of system_wq were 1: could 
that result in a deadlock? If so, now imagine that all the works in 
system_wq are the same.


> (One could symetrically argue to move smp_call_on_cpu() away from
> system_wq instead of cgroup_bpf_release_fn().)
> 
I agree as well; that is why I moved cgroup_bpf_release_fn to cgroup's 
own queue. As TJ said: "system wqs are for misc things which shouldn't 
create a large number of concurrent work items. If something is going 
to generate 256+ concurrent work items, it should use its own 
workqueue."

> Honestly, I'm not sure it's worth the effort if there's no deadlock.
> 
There is a deadlock, and I think it has to be fixed.
> It's possible that I'm misunderstanding or I've missed a substantial
> detail for why this could lead to a deadlock. It'd be best visible in a
> sequence diagram with tasks/CPUs left-to-right and time top-down (in the
> original scheme it looks like time goes right-to-left and there's the
> unclear situation of the initial cgroup_mutex holder).
> 
> Thanks,
> Michal

I will rework the diagram, and I hope it will make clear how this leads 
to a deadlock.
Thank you, Michal, for your reply.

Thanks,
Ridong


Thread overview: 13+ messages
2024-07-19  2:52 [PATCH -v2] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock Chen Ridong
2024-07-19 18:54 ` bot+bpf-ci
2024-07-20  3:15 ` bot+bpf-ci
2024-07-24  0:53 ` chenridong
2024-08-01  1:34   ` chenridong
2024-07-24 11:08 ` Hillf Danton
2024-07-25  1:48   ` chenridong
2024-07-25 11:01     ` Hillf Danton
2024-07-26 13:04     ` Michal Koutný
2024-07-27 10:21       ` chenridong
2024-08-07 13:32         ` Michal Koutný
2024-08-08  2:22           ` chenridong [this message]
2024-08-08 17:03           ` Roman Gushchin
