public inbox for linux-kernel@vger.kernel.org
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Wanpeng Li <kernellwp@gmail.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Lai Jiangshan" <jiangshanlai@gmail.com>,
	dipankar@in.ibm.com, "Andrew Morton" <akpm@linux-foundation.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Josh Triplett" <josh@joshtriplett.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	dhowells@redhat.com, edumazet@google.com, dvhart@linux.intel.com,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	oleg@redhat.com, bobby.prani@gmail.com
Subject: Re: [PATCH tip/core/rcu 3/5] sched: Make wake_up_nohz_cpu() handle CPUs going offline
Date: Mon, 22 Aug 2016 17:45:26 -0700
Message-ID: <20160823004526.GW3482@linux.vnet.ibm.com>
In-Reply-To: <CANRm+CwgwTNTjvuNvLWCxUNTuAsmvYsX=kDV8J5GkjteG9-Ccw@mail.gmail.com>

On Tue, Aug 23, 2016 at 06:57:20AM +0800, Wanpeng Li wrote:
> 2016-08-22 23:30 GMT+08:00 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > Both timers and hrtimers are maintained on the outgoing CPU until
> > CPU_DEAD time, at which point they are migrated to a surviving CPU.  If a
> > mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems
> > will splat in native_smp_send_reschedule() when attempting to wake up
> > the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel:
> >
> > [ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40
> > [ 7976.741595] Modules linked in:
> > [ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1
> > [ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 7976.741595]  0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000
> > [ 7976.741595]  0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18
> > [ 7976.741595]  0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00
> > [ 7976.741595] Call Trace:
> > [ 7976.741595]  [<ffffffff8138ab2e>] dump_stack+0x67/0x99
> > [ 7976.741595]  [<ffffffff8105cabc>] __warn+0xcc/0xf0
> > [ 7976.741595]  [<ffffffff8105cb98>] warn_slowpath_null+0x18/0x20
> > [ 7976.741595]  [<ffffffff8103cba9>] native_smp_send_reschedule+0x39/0x40
> > [ 7976.741595]  [<ffffffff81089bc2>] wake_up_nohz_cpu+0x82/0x190
> > [ 7976.741595]  [<ffffffff810d275a>] internal_add_timer+0x7a/0x80
> > [ 7976.741595]  [<ffffffff810d3ee7>] mod_timer+0x187/0x2b0
> > [ 7976.741595]  [<ffffffff810c89dd>] rcu_torture_reader+0x33d/0x380
> > [ 7976.741595]  [<ffffffff810c66f0>] ? sched_torture_read_unlock+0x30/0x30
> > [ 7976.741595]  [<ffffffff810c86a0>] ? rcu_bh_torture_read_lock+0x80/0x80
> > [ 7976.741595]  [<ffffffff8108068f>] kthread+0xdf/0x100
> > [ 7976.741595]  [<ffffffff819dd83f>] ret_from_fork+0x1f/0x40
> > [ 7976.741595]  [<ffffffff810805b0>] ? kthread_create_on_node+0x200/0x200
> >
> > However, in this case, the wakeup is redundant, because the timer
> > migration will reprogram timer hardware as needed.  Note that the fact
> > that preemption is disabled does not avoid the splat, as the offline
> > operation has already passed both the synchronize_sched() and the
> > stop_machine() that would be blocked by disabled preemption.
> >
> > This commit therefore modifies wake_up_nohz_cpu() to avoid attempting
> > to wake up offline CPUs.  It also adds a comment stating that the
> > caller must tolerate lost wakeups when the target CPU is going offline,
> > and suggesting the CPU_DEAD notifier as a recovery mechanism.
> 
> Interesting. Several weeks ago I posted a patch that fixes another,
> similar issue: https://lkml.org/lkml/2016/8/4/143
> Does my patch also fix your bug?

I will see your several weeks and raise you more than a month:

http://lkml.kernel.org/g/20160630175845.GA10269@linux.vnet.ibm.com

So you try mine and then I will try yours.  ;-)

Especially given that I am not seeing how the code path in my trace
above reaches your change in sched_can_stop_tick()...

							Thanx, Paul
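
The fix described in the quoted commit message can be illustrated with a
small user-space model. This is only a sketch under stated assumptions, not
the kernel patch: every name prefixed with model_ is hypothetical, the
online mask is a plain array, and the "splat" is reduced to a counter that
stands in for the warning raised when a reschedule IPI targets an offline
CPU. The patched variant simply declines to wake an offline CPU and reports
that back, which mirrors the requirement that callers tolerate a lost
wakeup while the target CPU is going offline.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-ins for kernel state, for illustration only. */
static bool model_online[MODEL_NR_CPUS] = { true, true, true, true };
static int model_ipis[MODEL_NR_CPUS];	/* reschedule IPIs delivered per CPU */
static int model_splats;		/* warnings for IPIs aimed at offline CPUs */

/* Models the IPI send path: warn and bail out if the target is offline. */
static void model_send_reschedule(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (!model_online[cpu]) {
		model_splats++;		/* corresponds to the WARNING in the trace */
		return;
	}
	model_ipis[cpu]++;
}

/* Unpatched behavior: always attempt the wakeup, even for an offline CPU. */
static void model_wake_unpatched(int cpu)
{
	model_send_reschedule(cpu);
}

/*
 * Patched behavior: skip offline CPUs entirely.  Returns false when the
 * wakeup was dropped, so the caller knows it must tolerate the loss
 * (in the kernel, timer migration at CPU_DEAD time covers this case).
 */
static bool model_wake_patched(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (!model_online[cpu])
		return false;
	model_send_reschedule(cpu);
	return true;
}
```

In this model, marking a CPU offline and calling model_wake_unpatched() on
it bumps model_splats, whereas model_wake_patched() returns false and leaves
the counter untouched. Returning a bool is a choice made for the sketch so
the dropped wakeup is observable; the commit message itself only says that
the caller must tolerate lost wakeups.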
