From: Sodagudi Prasad <psodagud@codeaurora.org>
To: peterz@infradead.org, mingo@kernel.org,
	gregkh@linuxfoundation.org, bigeasy@linutronix.de,
	tglx@linutronix.de
Cc: isaacm@codeaurora.org, psodagud@codeaurora.org,
	linux-kernel@vger.kernel.org, mingo@kernel.org
Subject: cpu stopper threads and setaffinity leads to deadlock
Date: Wed, 01 Aug 2018 18:34:40 -0700
Message-ID: <24eebe1d874cb8e3b9a18087554544fa@codeaurora.org>

Hi Peter and Tglx,

We are observing another deadlock issue due to commit
0b26351b91 ("stop_machine, sched: Fix migrate_swap() vs. active_balance()
deadlock"), even after taking the following fix
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1740526.html
on the Linux 4.14.56 kernel.

Here is the scenario that leads to this deadlock.
We used the stress-ng-64 --affinity test case to reproduce this
issue in a controlled environment, while simultaneously running CPU
hotplug and task migrations.
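
For readers without stress-ng at hand, the kind of affinity churn involved
can be sketched in user space as below. This is only an illustration under
assumptions: pin_self_to() and bounce_affinity() are hypothetical helper
names, and the sketch does not claim to match what the stress-ng --affinity
stressor does internally. It relies only on the glibc sched_setaffinity(2)
wrapper and the CPU_ZERO/CPU_SET macros.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno is set by sched_setaffinity).
 */
int pin_self_to(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* pid 0 means "the calling thread" */
	return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Hypothetical helper: alternate the caller's affinity between cpu_a and
 * cpu_b for the given number of rounds. Each switch away from the CPU the
 * caller is running on forces the kernel to migrate the running task.
 * Returns the number of switches that succeeded.
 */
int bounce_affinity(int cpu_a, int cpu_b, int rounds)
{
	int ok = 0;

	for (int i = 0; i < rounds; i++) {
		int target = (i & 1) ? cpu_b : cpu_a;

		if (pin_self_to(target) == 0)
			ok++;
	}
	return ok;
}
```

Calling bounce_affinity(3, 7, N) from several processes, with CPU hotplug
running alongside, approximates the load described here; whether it hits
the same window depends on timing and kernel configuration.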

Stress-ng-affin (call stack shown below) is changing its own affinity
from cpu3 to cpu7. Stress-ng-affin is preempted in the
cpu_stop_queue_work() function as soon as the stopper lock for
migration/3 is released, that is, before it reaches wake_up_q(). At the
same time, on CPU 7, cross migration of tasks happens between cpu3 and cpu7.

=======================================================
Process: stress-ng-affin, cpu: 3 pid: 1748 start: 0xffffffd8817e4480
=====================================================
     Task name: stress-ng-affin pid: 1748 cpu: 3 start: ffffffd8817e4480
     state: 0x0 exit_state: 0x0 stack base: 0xffffff801c8e8000 Prio: 120
     Stack:
     [<ffffff87754864f4>] __switch_to+0xb8
     [<ffffff87763ebf8c>] __schedule+0x690
     [<ffffff87763ec388>] preempt_schedule_common+0x100
     [<ffffff87763eb8f4>] preempt_schedule+0x24
     [<ffffff87763f0e58>] _raw_spin_unlock_irqrestore+0x64
     [<ffffff8775574f8c>] cpu_stop_queue_work+0x9c
     [<ffffff8775574dfc>] stop_one_cpu+0x58
     [<ffffff87754e4884>] __set_cpus_allowed_ptr+0x234
     [<ffffff87754e8888>] sched_setaffinity+0x150
     [<ffffff87754e8ad8>] SyS_sched_setaffinity+0xcc
     [<ffffff87754837c0>] el0_svc_naked+0x34
     [<0>] UNKNOWN+0x0

Due to the cross migration of tasks between cpu7 and cpu3, migration/7 has
started executing and waits for the migration/3 task, so that they can
proceed through the multi-CPU stop state machine together.
Unfortunately, stress-ng-affin is now affine to cpu7, and since migration/7
has started running and has monopolized cpu7’s execution, stress-ng-affin
will never run on cpu7, and cpu3’s migration task is never woken up.

Essentially:
Due to the nature of the wake_q interface, a thread can be in at
most one wake queue at a time.
migration/3 is currently in stress-ng-affin’s wake_q. This means that no
other thread can add migration/3 to its own wake queue.
Thus, whenever an attempt is made to stop CPU 3 (e.g. cross-migration,
hotplug, etc.), no thread will wake up migration/3.
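
The single-membership property can be illustrated with a small user-space
model. This is a simplified, single-threaded sketch, not the kernel code:
the model_* names are made up for illustration, and the real wake_q_add()
claims the node with an atomic cmpxchg and takes a task reference, both of
which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of a task: wake_next == NULL means "not on any wake queue". */
struct model_task {
	const char *name;
	struct model_task *wake_next;
	bool awake;
};

/* Sentinel that terminates a queue, so a queued task is never NULL-linked. */
static struct model_task queue_tail;
#define QUEUE_TAIL (&queue_tail)

struct model_wake_q {
	struct model_task *first;
	struct model_task **lastp;
};

void model_wake_q_init(struct model_wake_q *q)
{
	q->first = QUEUE_TAIL;
	q->lastp = &q->first;
}

/*
 * Queue a task for a deferred wake-up. If the task already sits on some
 * queue (this one or another), the request is silently dropped and the
 * function returns false: the existing queue owner is trusted to wake it.
 */
bool model_wake_q_add(struct model_wake_q *q, struct model_task *t)
{
	if (t->wake_next != NULL)
		return false;
	t->wake_next = QUEUE_TAIL;
	*q->lastp = t;
	q->lastp = &t->wake_next;
	return true;
}

/* Wake every queued task and release it; returns how many were woken. */
int model_wake_up_q(struct model_wake_q *q)
{
	int woken = 0;
	struct model_task *t = q->first;

	while (t != QUEUE_TAIL) {
		struct model_task *next = t->wake_next;

		assert(next != NULL);	/* a queued task is always linked */
		t->wake_next = NULL;	/* the task may be queued again */
		t->awake = true;
		woken++;
		t = next;
	}
	model_wake_q_init(q);	/* model only: leave the queue reusable */
	return woken;
}
```

In terms of the scenario above: once migration/3 has been added to
stress-ng-affin's queue, a second model_wake_q_add() from the
cross-migration path returns false, and flushing that second queue wakes
nobody. migration/3 only becomes runnable when the first owner gets to call
model_wake_up_q(), which is exactly the step stress-ng-affin never reaches.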

The change below helped to fix this deadlock.
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e190d1e..f932e1e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -87,9 +87,9 @@ static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
                 __cpu_stop_queue_work(stopper, work, &wakeq);
         else if (work->done)
                 cpu_stop_signal_done(work->done);
-       raw_spin_unlock_irqrestore(&stopper->lock, flags);

         wake_up_q(&wakeq);
+       raw_spin_unlock_irqrestore(&stopper->lock, flags);


-Thanks, Prasad

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,
Linux Foundation Collaborative Project
