All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Galbraith <efault@gmx.de>
To: Matt Fleming <matt@codeblueprint.co.uk>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org, Michal Hocko <mhocko@suse.com>
Subject: Re: cpu stopper threads and load balancing leads to deadlock
Date: Thu, 03 May 2018 14:12:22 +0200	[thread overview]
Message-ID: <1525349542.9956.2.camel@gmx.de> (raw)
In-Reply-To: <20180424133325.GA3179@codeblueprint.co.uk>

On Tue, 2018-04-24 at 14:33 +0100, Matt Fleming wrote:
> On Fri, 20 Apr, at 11:50:05AM, Peter Zijlstra wrote:
> > On Tue, Apr 17, 2018 at 03:21:19PM +0100, Matt Fleming wrote:
> > > Hi guys,
> > > 
> > > We've seen a bug in one of our SLE kernels where the cpu stopper
> > > thread ("migration/15") is entering idle balance. This then triggers
> > > active load balance.
> > > 
> > > At the same time, a task on another CPU triggers a page fault and NUMA
> > > balancing kicks in to try and migrate the task closer to the NUMA node
> > > for that page (we're inside stop_two_cpus()). This faulting task is
> > > spinning in try_to_wake_up() (inside smp_cond_load_acquire(&p->on_cpu,
> > > !VAL)), waiting for "migration/15" to context switch.
> > > 
> > > Unfortunately, because "migration/15" is doing active load balance
> > > it's spinning waiting for the NUMA-page-faulting CPU's stopper lock,
> > > which is already held (since it's inside stop_two_cpus()).
> > > 
> > > Deadlock ensues.
> > 
> > 
> > So if I read that right, something like the following happens:
> > 
> > CPU0					CPU1
> > 
> > schedule(.prev=migrate/0)		<fault>
> >   pick_next_task			  ...
> >     idle_balance			    migrate_swap()
> >       active_balance			      stop_two_cpus()
> > 						spin_lock(stopper0->lock)
> > 						spin_lock(stopper1->lock)
> > 						ttwu(migrate/0)
> > 						  smp_cond_load_acquire() -- waits for schedule()
> >         stop_one_cpu(1)
> > 	  spin_lock(stopper1->lock) -- waits for stopper lock
> 
> Yep, that's exactly right.
> 
> > Fix _this_ deadlock by taking out the wakeups from under stopper->lock.
> > I'm not entirely sure there isn't more dragons here, but this particular
> > one seems fixable by doing that.
> > 
> > Is there any way you can reproduce/test this?
> 
> I'm afraid I don't have any way to test this, but I can ask the
> customer that reported it if they can.
> 
> Either way, this fix looks good to me.

Seems there's another problem there with hotplug.  Virgin tip...

[  122.147601] smpboot: CPU 4 is now offline
[  122.189701] smpboot: CPU 5 is now offline
[  122.225612] smpboot: CPU 6 is now offline
[  122.257760] smpboot: CPU 7 is now offline
[  124.172418] smpboot: CPU 2 is now offline
[  124.209121] smpboot: CPU 3 is now offline
[  124.215810] smpboot: Booting Node 0 Processor 2 APIC 0x4

[  124.216939] =============================
[  124.216939] WARNING: suspicious RCU usage
[  124.216941] 4.17.0.g66d489e-tip-default #82 Tainted: G            E    
[  124.216941] -----------------------------
[  124.216943] kernel/sched/core.c:1614 suspicious rcu_dereference_check() usage!
[  124.216944] 
               other info that might help us debug this:

[  124.216945] 
               RCU used illegally from offline CPU!
               rcu_scheduler_active = 2, debug_locks = 0
[  124.216946] 4 locks held by swapper/2/0:
[  124.216947]  #0: 000000001f9fa447 (stop_cpus_mutex){+.+.}, at: stop_machine_from_inactive_cpu+0x86/0x130
[  124.216953]  #1: 000000004cb07b3b (&stopper->lock){..-.}, at: cpu_stop_queue_work+0x2d/0x80
[  124.216958]  #2: 00000000d3a46b90 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x2d/0x5f0
[  124.216964]  #3: 00000000f360767b (rcu_read_lock){....}, at: rcu_read_lock+0x0/0x80
[  124.216969] 
               stack backtrace:
[  124.216971] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Tainted: G            E     4.17.0.g66d489e-tip-default #82
[  124.216972] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  124.216973] Call Trace:
[  124.216977]  dump_stack+0x78/0xb3
[  124.216979]  ttwu_stat+0x121/0x130
[  124.216983]  try_to_wake_up+0x2c2/0x5f0
[  124.216988]  ? cpu_stop_park+0x30/0x30
[  124.216990]  cpu_stop_queue_work+0x7c/0x80
[  124.216993]  queue_stop_cpus_work+0x61/0xb0
[  124.216997]  stop_machine_from_inactive_cpu+0xd3/0x130
[  124.216999]  ? mtrr_restore+0x80/0x80
[  124.217005]  mtrr_ap_init+0x62/0x70
[  124.217008]  identify_secondary_cpu+0x18/0x80
[  124.217011]  smp_store_cpu_info+0x44/0x50
[  124.217014]  start_secondary+0x9a/0x1e0
[  124.217017]  secondary_startup_64+0xa5/0xb0

[  124.218433] ======================================================
[  124.218433] WARNING: possible circular locking dependency detected
[  124.218433] 4.17.0.g66d489e-tip-default #82 Tainted: G            E    
[  124.218434] ------------------------------------------------------
[  124.218434] swapper/2/0 is trying to acquire lock:
[  124.218434] 000000006b311cf8 ((console_sem).lock){-...}, at: down_trylock+0xf/0x30

[  124.218436] but task is already holding lock:
[  124.218436] 00000000d3a46b90 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x2d/0x5f0

[  124.218438] which lock already depends on the new lock.


[  124.218438] the existing dependency chain (in reverse order) is:

[  124.218439] -> #1 (&p->pi_lock){-.-.}:
[  124.218440]        lock_acquire+0xbd/0x200
[  124.218440]        _raw_spin_lock_irqsave+0x4c/0x60
[  124.218441]        try_to_wake_up+0x2d/0x5f0
[  124.218441]        up+0x40/0x50
[  124.218441]        __up_console_sem+0x41/0x80
[  124.218441]        console_unlock+0x398/0x6b0
[  124.218442]        do_con_write.part.20+0x71b/0x9d0
[  124.218442]        con_write+0x55/0x60
[  124.218442]        do_output_char+0x181/0x200
[  124.218442]        n_tty_write+0x20a/0x460
[  124.218443]        tty_write+0x14a/0x2b0
[  124.218443]        do_iter_write+0x144/0x180
[  124.218443]        vfs_writev+0x71/0xd0
[  124.218444]        do_writev+0x51/0xd0
[  124.218444]        do_syscall_64+0x60/0x210
[  124.218444]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  124.218445] -> #0 ((console_sem).lock){-...}:
[  124.218446]        __lock_acquire+0x436/0x770
[  124.218446]        lock_acquire+0xbd/0x200
[  124.218446]        _raw_spin_lock_irqsave+0x4c/0x60
[  124.218447]        down_trylock+0xf/0x30
[  124.218447]        __down_trylock_console_sem+0x37/0xa0
[  124.218447]        console_trylock+0x13/0x60
[  124.218447]        vprintk_emit+0x237/0x460
[  124.218448]        printk+0x48/0x4a
[  124.218448]        lockdep_rcu_suspicious+0x26/0x100
[  124.218448]        ttwu_stat+0x121/0x130
[  124.218449]        try_to_wake_up+0x2c2/0x5f0
[  124.218449]        cpu_stop_queue_work+0x7c/0x80
[  124.218449]        queue_stop_cpus_work+0x61/0xb0
[  124.218450]        stop_machine_from_inactive_cpu+0xd3/0x130
[  124.218450]        mtrr_ap_init+0x62/0x70
[  124.218450]        identify_secondary_cpu+0x18/0x80
[  124.218450]        smp_store_cpu_info+0x44/0x50
[  124.218451]        start_secondary+0x9a/0x1e0
[  124.218451]        secondary_startup_64+0xa5/0xb0

[  124.218451] other info that might help us debug this:

[  124.218452]  Possible unsafe locking scenario:

[  124.218452]        CPU0                    CPU1
[  124.218453]        ----                    ----
[  124.218453]   lock(&p->pi_lock);
[  124.218454]                                lock((console_sem).lock);
[  124.218454]                                lock(&p->pi_lock);
[  124.218455]   lock((console_sem).lock);

[  124.218456]  *** DEADLOCK ***

[  124.218456] 4 locks held by swapper/2/0:
[  124.218456]  #0: 000000001f9fa447 (stop_cpus_mutex){+.+.}, at: stop_machine_from_inactive_cpu+0x86/0x130
[  124.218458]  #1: 000000004cb07b3b (&stopper->lock){..-.}, at: cpu_stop_queue_work+0x2d/0x80
[  124.218459]  #2: 00000000d3a46b90 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x2d/0x5f0
[  124.218460]  #3: 00000000f360767b (rcu_read_lock){....}, at: rcu_read_lock+0x0/0x80

[  124.218462] stack backtrace:
[  124.218462] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Tainted: G            E     4.17.0.g66d489e-tip-default #82
[  124.218462] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  124.218463] Call Trace:
[  124.218463]  dump_stack+0x78/0xb3
[  124.218463]  print_circular_bug.isra.38+0x1f3/0x200
[  124.218463]  check_prev_add.constprop.46+0x717/0x730
[  124.218464]  validate_chain.isra.42+0x652/0x980
[  124.218464]  __lock_acquire+0x436/0x770
[  124.218464]  lock_acquire+0xbd/0x200
[  124.218464]  ? down_trylock+0xf/0x30
[  124.218465]  ? vprintk_emit+0x237/0x460
[  124.218465]  _raw_spin_lock_irqsave+0x4c/0x60
[  124.218465]  ? down_trylock+0xf/0x30
[  124.218466]  down_trylock+0xf/0x30
[  124.218466]  __down_trylock_console_sem+0x37/0xa0
[  124.218466]  console_trylock+0x13/0x60
[  124.218466]  vprintk_emit+0x237/0x460
[  124.218467]  printk+0x48/0x4a
[  124.218467]  lockdep_rcu_suspicious+0x26/0x100
[  124.218467]  ttwu_stat+0x121/0x130
[  124.218467]  try_to_wake_up+0x2c2/0x5f0
[  124.218468]  ? cpu_stop_park+0x30/0x30
[  124.218468]  cpu_stop_queue_work+0x7c/0x80
[  124.218468]  queue_stop_cpus_work+0x61/0xb0
[  124.218468]  stop_machine_from_inactive_cpu+0xd3/0x130
[  124.218469]  ? mtrr_restore+0x80/0x80
[  124.218469]  mtrr_ap_init+0x62/0x70
[  124.218469]  identify_secondary_cpu+0x18/0x80
[  124.218469]  smp_store_cpu_info+0x44/0x50
[  124.218470]  start_secondary+0x9a/0x1e0
[  124.218470]  secondary_startup_64+0xa5/0xb0
[  124.478753] smpboot: Booting Node 0 Processor 3 APIC 0x6

  reply	other threads:[~2018-05-03 12:12 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-17 14:21 cpu stopper threads and load balancing leads to deadlock Matt Fleming
2018-04-18  5:47 ` Mike Galbraith
2018-04-19  5:38   ` Mike Galbraith
2018-04-20  9:50 ` Peter Zijlstra
2018-04-24 13:33   ` Matt Fleming
2018-05-03 12:12     ` Mike Galbraith [this message]
2018-05-03 12:28       ` Peter Zijlstra
2018-05-03 12:40         ` Mike Galbraith
2018-05-03 12:49           ` Peter Zijlstra
2018-05-03 13:32             ` Mike Galbraith
2018-05-03 13:56               ` Peter Zijlstra
2018-05-03 14:16                 ` Mike Galbraith
2018-05-03 14:44                   ` Peter Zijlstra
2018-05-03 16:12                     ` Paul E. McKenney
2018-05-03 16:45                       ` Peter Zijlstra
2018-05-03 17:18                         ` Paul E. McKenney
2018-05-03 17:54                           ` Peter Zijlstra
2018-05-03 18:24                             ` Paul E. McKenney
2018-05-04  3:38                         ` Mike Galbraith
2018-05-15  4:30                         ` Mike Galbraith
2018-05-17 14:03                           ` Paul E. McKenney
2018-05-17 14:10                             ` Mike Galbraith
2018-05-17 14:23                             ` Peter Zijlstra
2018-05-17 14:56                               ` Paul E. McKenney
2018-05-22 17:05                                 ` Paul E. McKenney
2018-05-03 14:39                 ` Paul E. McKenney
2018-05-03 14:52                   ` Peter Zijlstra
2018-05-03  9:24   ` [tip:sched/urgent] stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock tip-bot for Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1525349542.9956.2.camel@gmx.de \
    --to=efault@gmx.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=matt@codeblueprint.co.uk \
    --cc=mhocko@suse.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.