linuxppc-dev.lists.ozlabs.org archive mirror
From: Luo Gengkun <luogengkun@huaweicloud.com>
To: linux-kernel@vger.kernel.org
Cc: pmladek@suse.com, mhocko@suse.com, lecopzer.chen@mediatek.com,
	yaoma@linux.alibaba.com, linuxppc-dev@lists.ozlabs.org,
	dianders@chromium.org, song@kernel.org, bpf@vger.kernel.org,
	npiggin@gmail.com, trix@redhat.com, naveen.n.rao@linux.ibm.com,
	kernelfans@gmail.com, akpm@linux-foundation.org,
	luogengkun@huaweicloud.com, tglx@linutronix.de
Subject: [PATCH] watchdog/core: Fix AA deadlock caused by watchdog
Date: Tue,  4 Jun 2024 11:57:36 +0000
Message-ID: <20240604115736.1013341-1-luogengkun@huaweicloud.com>

We found an AA deadlock problem, as shown below:

TaskA				TaskB				WatchDog			system_wq

...
css_killed_work_fn:
P(cgroup_mutex)
...
								...
								__lockup_detector_reconfigure:
								P(cpu_hotplug_lock.read)
								...
				...
				percpu_down_write:
				P(cpu_hotplug_lock.write)
												...
												cgroup_bpf_release:
												P(cgroup_mutex)
								smp_call_on_cpu:
								Wait system_wq

cpuset_css_offline:
P(cpu_hotplug_lock.read)

The watchdog holds cpu_hotplug_lock for reading while it waits for system_wq
to finish its jobs. system_wq is waiting for cgroup_mutex, and the owner of
cgroup_mutex (TaskA) is waiting for cpu_hotplug_lock, where new readers are
blocked behind the pending writer in TaskB. The key point is cpu_hotplug_lock,
because work queued on system_wq may be waiting for other locks. What's more,
it seems that smp_call_on_cpu() doesn't need protection from cpu_hotplug_lock.
I tried to revert the old patch to fix this problem, but I encountered some
conflicts. Or should I just release cpu_hotplug_lock before smp_call_on_cpu()
and reacquire it afterwards? I'm looking forward to any suggestions :).
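
To make the dependency chain above easier to check, here is a rough user-space
sketch (not kernel code) that models who waits on whom and looks for a cycle.
All names in it (enum actor, wait_graph_has_cycle(), the two build_*() helpers)
are made up for illustration, and the model assumes each actor is blocked
behind at most one other actor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the four contexts in the diagram above. */
enum actor { TASK_A, TASK_B, WATCHDOG, SYSTEM_WQ, NR_ACTORS };

#define NO_WAIT (-1)

/*
 * waits_on[x] is the actor that x is blocked behind, or NO_WAIT.
 * Returns true if following the chain from any actor leads back to it.
 */
static bool wait_graph_has_cycle(const int waits_on[NR_ACTORS])
{
	assert(waits_on != NULL);

	for (int start = 0; start < NR_ACTORS; start++) {
		int cur = start;

		/* A cycle through 'start' closes within NR_ACTORS hops. */
		for (int hops = 0; hops < NR_ACTORS; hops++) {
			cur = waits_on[cur];
			if (cur == NO_WAIT)
				break;
			if (cur == start)
				return true;
		}
	}
	return false;
}

/* The reported case: the watchdog waits while holding the read lock. */
static void build_reported_scenario(int waits_on[NR_ACTORS])
{
	waits_on[TASK_A]    = TASK_B;    /* reader blocked by pending writer */
	waits_on[TASK_B]    = WATCHDOG;  /* writer blocked by active reader */
	waits_on[WATCHDOG]  = SYSTEM_WQ; /* smp_call_on_cpu() waits for work */
	waits_on[SYSTEM_WQ] = TASK_A;    /* work item needs cgroup_mutex */
}

/* Assumed fix: the watchdog waits without holding cpu_hotplug_lock. */
static void build_unlocked_wait_scenario(int waits_on[NR_ACTORS])
{
	build_reported_scenario(waits_on);
	waits_on[TASK_B] = NO_WAIT;      /* writer no longer blocked */
}
```

Filling an int[NR_ACTORS] array with build_reported_scenario() should make
wait_graph_has_cycle() report a cycle; with build_unlocked_wait_scenario() the
chain ends at TaskB and no cycle remains.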

Fixes: e31d6883f21c ("watchdog/core, powerpc: Lock cpus across reconfiguration")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
---
 arch/powerpc/kernel/watchdog.c | 4 ++++
 kernel/watchdog.c              | 9 ---------
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 8c464a5d8246..f33f532ea7fa 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -550,17 +550,21 @@ void watchdog_hardlockup_stop(void)
 {
 	int cpu;
 
+	cpus_read_lock();
 	for_each_cpu(cpu, &wd_cpus_enabled)
 		stop_watchdog_on_cpu(cpu);
+	cpus_read_unlock();
 }
 
 void watchdog_hardlockup_start(void)
 {
 	int cpu;
 
+	cpus_read_lock();
 	watchdog_calc_timeouts();
 	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
 		start_watchdog_on_cpu(cpu);
+	cpus_read_unlock();
 }
 
 /*
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 51915b44ac73..13303a932cde 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -867,7 +867,6 @@ int lockup_detector_offline_cpu(unsigned int cpu)
 
 static void __lockup_detector_reconfigure(void)
 {
-	cpus_read_lock();
 	watchdog_hardlockup_stop();
 
 	softlockup_stop_all();
@@ -877,12 +876,6 @@ static void __lockup_detector_reconfigure(void)
 		softlockup_start_all();
 
 	watchdog_hardlockup_start();
-	cpus_read_unlock();
-	/*
-	 * Must be called outside the cpus locked section to prevent
-	 * recursive locking in the perf code.
-	 */
-	__lockup_detector_cleanup();
 }
 
 void lockup_detector_reconfigure(void)
@@ -916,11 +909,9 @@ static __init void lockup_detector_setup(void)
 #else /* CONFIG_SOFTLOCKUP_DETECTOR */
 static void __lockup_detector_reconfigure(void)
 {
-	cpus_read_lock();
 	watchdog_hardlockup_stop();
 	lockup_detector_update_enable();
 	watchdog_hardlockup_start();
-	cpus_read_unlock();
 }
 void lockup_detector_reconfigure(void)
 {
-- 
2.34.1

