From: Sodagudi Prasad <psodagud@codeaurora.org>
To: peterz@infradead.org, mingo@kernel.org,
gregkh@linuxfoundation.org, bigeasy@linutronix.de,
tglx@linutronix.de
Cc: isaacm@codeaurora.org, psodagud@codeaurora.org,
linux-kernel@vger.kernel.org, mingo@kernel.org
Subject: cpu stopper threads and setaffinity leads to deadlock
Date: Wed, 01 Aug 2018 18:34:40 -0700
Message-ID: <24eebe1d874cb8e3b9a18087554544fa@codeaurora.org>
Hi Peter and Tglx,
We are observing another deadlock caused by commit
0b26351b91 ("stop_machine, sched: Fix migrate_swap() vs. active_balance()
deadlock") on the Linux 4.14.56 kernel, even after applying the
following fix:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1740526.html
Here is the scenario that leads to this deadlock.
We used the stress-ng-64 --affinity test case to reproduce this
issue in a controlled environment, while simultaneously running CPU
hotplug and task migrations.
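For readers without stress-ng at hand, the affinity churn can be
approximated with a small userspace program. This is only a sketch under
assumptions: it is not stress-ng's actual code, the helper names
(single_cpu_mask, pin_self_to, bounce_affinity) are made up for
illustration, and the CPU numbers 3 and 7 are simply the ones from the
trace in this report. It does not by itself drive CPU hotplug or
cross migration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Build a mask that contains exactly one CPU (hypothetical helper). */
static void single_cpu_mask(int cpu, cpu_set_t *mask)
{
	assert(cpu >= 0 && cpu < CPU_SETSIZE);
	CPU_ZERO(mask);
	CPU_SET(cpu, mask);
}

/*
 * Restrict the calling thread to one CPU. When the thread is currently
 * running on a different CPU, the kernel has to migrate it, which is the
 * __set_cpus_allowed_ptr() -> stop_one_cpu() path seen in the trace.
 * Returns 0 on success and -1 on error (for example, CPU not present).
 */
static int pin_self_to(int cpu)
{
	cpu_set_t mask;

	single_cpu_mask(cpu, &mask);
	return sched_setaffinity(0, sizeof(mask), &mask);
}

/*
 * Alternate the affinity between two CPUs; returns how many of the
 * sched_setaffinity() calls succeeded.
 */
static long bounce_affinity(int cpu_a, int cpu_b, long iterations)
{
	long ok = 0;

	for (long i = 0; i < iterations; i++) {
		if (pin_self_to((i & 1) ? cpu_b : cpu_a) == 0)
			ok++;
	}
	return ok;
}
```

Calling bounce_affinity(3, 7, N) from several processes at once, on a
system where both CPUs are online, gives roughly the same kind of load on
the stopper threads as the --affinity stressor.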
Stress-ng-affin (call stack shown below) is changing its own affinity
from cpu3 to cpu7. It is preempted in the cpu_stop_queue_work() function
as soon as the stopper lock for migration/3 is released. At the same
time, on CPU 7, a cross migration of tasks takes place between cpu3 and
cpu7.
=======================================================
Process: stress-ng-affin, cpu: 3 pid: 1748 start: 0xffffffd8817e4480
=====================================================
Task name: stress-ng-affin pid: 1748 cpu: 3 start: ffffffd8817e4480
state: 0x0 exit_state: 0x0 stack base: 0xffffff801c8e8000 Prio: 120
Stack:
[<ffffff87754864f4>] __switch_to+0xb8
[<ffffff87763ebf8c>] __schedule+0x690
[<ffffff87763ec388>] preempt_schedule_common+0x100
[<ffffff87763eb8f4>] preempt_schedule+0x24
[<ffffff87763f0e58>] _raw_spin_unlock_irqrestore+0x64
[<ffffff8775574f8c>] cpu_stop_queue_work+0x9c
[<ffffff8775574dfc>] stop_one_cpu+0x58
[<ffffff87754e4884>] __set_cpus_allowed_ptr+0x234
[<ffffff87754e8888>] sched_setaffinity+0x150
[<ffffff87754e8ad8>] SyS_sched_setaffinity+0xcc
[<ffffff87754837c0>] el0_svc_naked+0x34
[<0>] UNKNOWN+0x0
Due to the cross migration of tasks between cpu7 and cpu3, migration/7
has started executing and waits for the migration/3 task, so that the
two can proceed through the multi-CPU stop state machine together.
Unfortunately, stress-ng-affin is affine to cpu7, and since migration/7
has started running and has monopolized cpu7's execution,
stress-ng-affin will never run on cpu7, and cpu3's migration task is
never woken up.
Essentially:
Due to the nature of the wake_q interface, a thread can only be in at
most one wake queue at a time.
migration/3 is currently in stress-ng-affin’s wake_q. This means that no
other thread can add migration/3 to their wake queue.
Thus, even if any attempt is made to stop CPU 3 (e.g. cross-migration,
hot plugging, etc), no thread will wake up migration/3.
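The "one wake queue at a time" rule can be illustrated with a small
userspace model. This is a sketch under assumptions, not the kernel
implementation: it borrows the kernel's names (wake_q_node, wake_q_head,
wake_q_add, wake_up_q, WAKE_Q_TAIL), but here wake_q_add() returns a bool
and wake_up_q() returns a count purely so the effect is visible, and no
task is actually woken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Sentinel marking the end of a queue, as in the kernel. */
#define WAKE_Q_TAIL ((struct wake_q_node *)0x01)

struct wake_q_node {
	/* NULL means "this task is not on any wake queue". */
	_Atomic(struct wake_q_node *) next;
};

struct wake_q_head {
	struct wake_q_node *first;
	struct wake_q_node *last;
};

static void wake_q_init(struct wake_q_head *head)
{
	head->first = WAKE_Q_TAIL;
	head->last = NULL;
}

/*
 * Queue a node for a later wakeup. The compare-and-exchange only succeeds
 * while node->next is NULL, so a node already sitting on some other queue
 * is silently skipped. Returns true if the node was queued (model only).
 */
static bool wake_q_add(struct wake_q_head *head, struct wake_q_node *node)
{
	struct wake_q_node *expected = NULL;

	assert(node != NULL);
	if (!atomic_compare_exchange_strong(&node->next, &expected, WAKE_Q_TAIL))
		return false;

	if (head->last)
		atomic_store(&head->last->next, node);
	else
		head->first = node;
	head->last = node;
	return true;
}

/*
 * Walk the queue and release every node so it can be queued again.
 * The kernel would call wake_up_process() for each entry; this model
 * only counts them.
 */
static int wake_up_q(struct wake_q_head *head)
{
	struct wake_q_node *node = head->first;
	int woken = 0;

	while (node != WAKE_Q_TAIL) {
		struct wake_q_node *next = atomic_load(&node->next);

		atomic_store(&node->next, NULL);
		woken++;
		node = next;
	}
	wake_q_init(head);
	return woken;
}
```

In terms of this model: stress-ng-affin has done wake_q_add() for
migration/3 on its own queue but was preempted before wake_up_q(). Any
wake_q_add() for migration/3 on another queue returns false until
stress-ng-affin gets to run again, which it cannot do on cpu7.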
The change below helped to fix this deadlock.
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e190d1e..f932e1e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -87,9 +87,9 @@ static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
 		__cpu_stop_queue_work(stopper, work, &wakeq);
 	else if (work->done)
 		cpu_stop_signal_done(work->done);
-	raw_spin_unlock_irqrestore(&stopper->lock, flags);
 	wake_up_q(&wakeq);
+	raw_spin_unlock_irqrestore(&stopper->lock, flags);
-Thanks, Prasad
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum,
Linux Foundation Collaborative Project