From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Wanpeng Li <kernellwp@gmail.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Ingo Molnar" <mingo@kernel.org>,
"Lai Jiangshan" <jiangshanlai@gmail.com>,
dipankar@in.ibm.com, "Andrew Morton" <akpm@linux-foundation.org>,
"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
"Josh Triplett" <josh@joshtriplett.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Peter Zijlstra" <peterz@infradead.org>,
"Steven Rostedt" <rostedt@goodmis.org>,
dhowells@redhat.com, edumazet@google.com, dvhart@linux.intel.com,
"Frédéric Weisbecker" <fweisbec@gmail.com>,
oleg@redhat.com, bobby.prani@gmail.com
Subject: Re: [PATCH tip/core/rcu 3/5] sched: Make wake_up_nohz_cpu() handle CPUs going offline
Date: Mon, 22 Aug 2016 17:45:26 -0700
Message-ID: <20160823004526.GW3482@linux.vnet.ibm.com>
In-Reply-To: <CANRm+CwgwTNTjvuNvLWCxUNTuAsmvYsX=kDV8J5GkjteG9-Ccw@mail.gmail.com>

On Tue, Aug 23, 2016 at 06:57:20AM +0800, Wanpeng Li wrote:
> 2016-08-22 23:30 GMT+08:00 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > Both timers and hrtimers are maintained on the outgoing CPU until
> > CPU_DEAD time, at which point they are migrated to a surviving CPU. If a
> > mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems
> > will splat in native_smp_send_reschedule() when attempting to wake up
> > the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel:
> >
> > [ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40
> > [ 7976.741595] Modules linked in:
> > [ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1
> > [ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 7976.741595] 0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000
> > [ 7976.741595] 0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18
> > [ 7976.741595] 0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00
> > [ 7976.741595] Call Trace:
> > [ 7976.741595] [<ffffffff8138ab2e>] dump_stack+0x67/0x99
> > [ 7976.741595] [<ffffffff8105cabc>] __warn+0xcc/0xf0
> > [ 7976.741595] [<ffffffff8105cb98>] warn_slowpath_null+0x18/0x20
> > [ 7976.741595] [<ffffffff8103cba9>] native_smp_send_reschedule+0x39/0x40
> > [ 7976.741595] [<ffffffff81089bc2>] wake_up_nohz_cpu+0x82/0x190
> > [ 7976.741595] [<ffffffff810d275a>] internal_add_timer+0x7a/0x80
> > [ 7976.741595] [<ffffffff810d3ee7>] mod_timer+0x187/0x2b0
> > [ 7976.741595] [<ffffffff810c89dd>] rcu_torture_reader+0x33d/0x380
> > [ 7976.741595] [<ffffffff810c66f0>] ? sched_torture_read_unlock+0x30/0x30
> > [ 7976.741595] [<ffffffff810c86a0>] ? rcu_bh_torture_read_lock+0x80/0x80
> > [ 7976.741595] [<ffffffff8108068f>] kthread+0xdf/0x100
> > [ 7976.741595] [<ffffffff819dd83f>] ret_from_fork+0x1f/0x40
> > [ 7976.741595] [<ffffffff810805b0>] ? kthread_create_on_node+0x200/0x200
> >
> > However, in this case, the wakeup is redundant, because the timer
> > migration will reprogram timer hardware as needed. Note that the fact
> > that preemption is disabled does not avoid the splat, as the offline
> > operation has already passed both the synchronize_sched() and the
> > stop_machine() that would be blocked by disabled preemption.
> >
> > This commit therefore modifies wake_up_nohz_cpu() to avoid attempting
> > to wake up offline CPUs. It also adds a comment stating that the
> > caller must tolerate lost wakeups when the target CPU is going offline,
> > and suggesting the CPU_DEAD notifier as a recovery mechanism.
>
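
For readers following along, here is a minimal user-space sketch of the
behavior the commit log above describes. It is a toy model, not the patch
itself: every name ending in _model, and NR_CPUS_MODEL, is hypothetical and
stands in for kernel state. It assumes only what the log says: the wakeup is
skipped for an offline CPU, the caller tolerates the lost wakeup, and the
queued timers are picked up by a surviving CPU at CPU_DEAD time.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4

/* Hypothetical stand-ins for kernel state; none of this is kernel API. */
static bool cpu_online_model[NR_CPUS_MODEL] = { true, true, true, true };
static int ipis_sent_model[NR_CPUS_MODEL];      /* reschedule IPIs per CPU */
static int pending_timers_model[NR_CPUS_MODEL]; /* timers queued per CPU */

/* Models the IPI path; the assert plays the role of the WARN in the splat. */
static void send_reschedule_model(int cpu)
{
	assert(cpu_online_model[cpu]);
	ipis_sent_model[cpu]++;
}

/*
 * Models the modified wakeup: returns true if an IPI was sent, false if
 * the wakeup was skipped because the target CPU is offline.
 */
static bool wake_up_nohz_cpu_model(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	if (!cpu_online_model[cpu])
		return false;	/* Lost wakeup: the caller must tolerate it. */
	send_reschedule_model(cpu);
	return true;
}

/* Models mod_timer(): queue a timer on a CPU, then try to wake that CPU. */
static bool queue_timer_model(int cpu)
{
	pending_timers_model[cpu]++;
	return wake_up_nohz_cpu_model(cpu);
}

/* Models the CPU_DEAD step: hand the dead CPU's timers to a survivor. */
static void cpu_dead_migrate_model(int dead_cpu, int surviving_cpu)
{
	pending_timers_model[surviving_cpu] += pending_timers_model[dead_cpu];
	pending_timers_model[dead_cpu] = 0;
}
```

In this model, marking CPU 1 offline and then calling queue_timer_model(1)
leaves the timer queued without sending an IPI, and a later
cpu_dead_migrate_model(1, 0) moves that timer to CPU 0, which is why the
skipped wakeup is harmless here.
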
> Interesting, I posted a patch several weeks ago that fixes another,
> similar issue: https://lkml.org/lkml/2016/8/4/143 Does my patch also
> fix your bug?

I will see your several weeks and raise you more than a month:

http://lkml.kernel.org/g/20160630175845.GA10269@linux.vnet.ibm.com

So you try mine and then I will try yours. ;-)

Especially given that I am not seeing how the code path in my trace
above reaches your change in sched_can_stop_tick()...

Thanx, Paul