From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra <peterz@infradead.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Chatre, Reinette" <reinette.chatre@intel.com>,
"Jacob Pan" <jacob.jun.pan@linux.intel.com>,
"Josh Triplett" <josh@joshtriplett.org>,
"Ross Green" <rgkernel@gmail.com>,
"John Stultz" <john.stultz@linaro.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
lkml <linux-kernel@vger.kernel.org>,
"Ingo Molnar" <mingo@kernel.org>,
"Lai Jiangshan" <jiangshanlai@gmail.com>,
dipankar@in.ibm.com, "Andrew Morton" <akpm@linux-foundation.org>,
rostedt <rostedt@goodmis.org>,
"David Howells" <dhowells@redhat.com>,
"Eric Dumazet" <edumazet@google.com>,
"Darren Hart" <dvhart@linux.intel.com>,
"Frédéric Weisbecker" <fweisbec@gmail.com>,
"Oleg Nesterov" <oleg@redhat.com>,
"pranith kumar" <bobby.prani@gmail.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Date: Mon, 28 Mar 2016 02:23:45 +0000 (UTC)
Message-ID: <1124128277.37541.1459131825120.JavaMail.zimbra@efficios.com>
In-Reply-To: <683720290.37511.1459129458781.JavaMail.zimbra@efficios.com>
----- On Mar 27, 2016, at 9:44 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
> ----- On Mar 27, 2016, at 4:45 PM, Peter Zijlstra peterz@infradead.org wrote:
>
>> On Sun, Mar 27, 2016 at 08:40:18AM -0700, Paul E. McKenney wrote:
>>> Oh, and the patch I am running with is below. I am running x86, and so
>>> some other architectures would of course need the corresponding patch
>>> on that architecture.
>>
>>> -#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
>>> +/* #define TIF_POLLING_NRFLAG 21 idle is polling for TIF_NEED_RESCHED */
>>
>> x86 is the only arch that really uses this heavily IIRC.
>>
>> Most of the other archs need interrupts to wake up remote cores.
>>
>> So what we try to do is avoid sending IPIs when the CPU is idle, for the
>> remote wakeup case we use set_nr_if_polling() which sets
>> TIF_NEED_RESCHED if TIF_POLLING_NRFLAG was set. If it wasn't, we'll send
>> the IPI. Otherwise we rely on the idle loop to do sched_ttwu_pending()
>> when it breaks out of loop due to TIF_NEED_RESCHED.
>>
>> But, you need hotplug for this to happen, right?
>
> My understanding is that this seems to be detection of failures to be
> awakened for a long time on idle CPUs. It therefore seems to be more
> idle-related than cpu hotplug-related. I'm not saying that there is
> no issue with hotplug, just that the investigation so far seems to
> target mostly idle systems, AFAIK without stressing hotplug.
>
>>
>> We should not be migrating towards, or waking on, CPUs no longer present
>> in cpu_active_map, and there is a rcu/sched_sync() after clearing that
>> bit. Furthermore, migration_call() does a sched_ttwu_pending() (waking
>> any remaining stragglers) before we migrate all runnable tasks off the
>> dying CPU.
>>
>>
>>
>> The other interesting case would be resched_cpu(), which uses
>> set_nr_and_not_polling() to kick a remote cpu to call schedule(). It
>> atomically sets TIF_NEED_RESCHED and returns if TIF_POLLING_NRFLAG was
>> not set. If indeed not, it will send an IPI.
>>
>> This assumes the idle 'exit' path will do the same as the IPI does; and
>> if you look at cpu_idle_loop() it does indeed do both
>> preempt_fold_need_resched() and sched_ttwu_pending().
>>
>> Note that one cannot rely on irq_enter()/irq_exit() being called for the
>> scheduler IPI.
>
> Looking at commit e3baac47f0e82c4be632f4f97215bb93bf16b342 :
>
> set_nr_if_polling() returns true if the ti->flags read has the
> _TIF_NEED_RESCHED bit set, which will skip the IPI.
>
> But it seems weird. The side that calls set_nr_if_polling()
> does the following:
> 1) llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)
> 2) set_nr_if_polling(rq->idle)
> 3) (don't do smp_send_reschedule(cpu) since set_nr_if_polling() returned
> true)
>
> The idle loop does:
> 1) __current_set_polling()
> 2) __current_clr_polling()
> 3) smp_mb__after_atomic()
> 4) sched_ttwu_pending()
> 5) schedule_preempt_disabled()
> -> This will clear the TIF_NEED_RESCHED flag
>
> While the idle loop is in sched_ttwu_pending(), after
> it has done the llist_del_all() (thus has grabbed all the
> list entries), TIF_NEED_RESCHED is still set. If both llist_add() and
> set_nr_if_polling() are called right after the llist_del_all(), we
> will end up in a situation where we have an entry in the list, but
> there won't be any reschedule sent on the idle CPU until something
> else awakens it. On a _very_ idle CPU, this could take some time.
>
> set_nr_and_not_polling() doesn't seem to have the same issue, because
> it does not return true if TIF_NEED_RESCHED is observed as being
> already set: it really just depends on the state of the TIF_POLLING_NRFLAG
> bit.
>
> Am I missing something important ?

Well, it seems that the test for _TIF_POLLING_NRFLAG in set_nr_if_polling()
just before the test for _TIF_NEED_RESCHED should take care of it: while in
sched_ttwu_pending() within the idle loop, TIF_POLLING_NRFLAG should be
cleared, thus causing set_nr_if_polling() to return false.

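To make that check order concrete, here is a user-space sketch using C11
atomics. It is only a model: the MODEL_ flag values, the function name and
the use of atomic_compare_exchange_weak() in place of the kernel's cmpxchg()
are assumptions for illustration, not the code from the commit above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bit positions as quoted in this thread for x86-64 (model values). */
#define MODEL_TIF_NEED_RESCHED   (1UL << 3)
#define MODEL_TIF_POLLING_NRFLAG (1UL << 21)

/*
 * Hypothetical model of the waker side. Returns true when the IPI can
 * be skipped (target is polling and NEED_RESCHED is set), false when
 * the caller still has to send the IPI.
 */
static bool model_set_nr_if_polling(atomic_ulong *flags)
{
	unsigned long val = atomic_load(flags);

	for (;;) {
		/* Tested first: a non-polling target always gets the IPI. */
		if (!(val & MODEL_TIF_POLLING_NRFLAG))
			return false;
		/* Polling and already marked: nothing more to do. */
		if (val & MODEL_TIF_NEED_RESCHED)
			return true;
		/* On failure val is refreshed and both tests are redone. */
		if (atomic_compare_exchange_weak(flags, &val,
						 val | MODEL_TIF_NEED_RESCHED))
			return true;
	}
}
```

With only NEED_RESCHED set and POLLING clear, which is the state expected
while the idle loop is inside sched_ttwu_pending(), the model returns false
and leaves the flags untouched.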
I'm slightly concerned about the lack of smp_mb__after_atomic()
between the TIF_NEED_RESCHED flag being cleared within
schedule_preempt_disabled() and the TIF_POLLING_NRFLAG being set in the
following loop iteration. Indeed, clear_bit() does not have a compiler
barrier, nor processor-level memory barriers (of course, the processor
memory barrier should not really matter on x86-64 due to the lock prefix).
Moreover, TIF_NEED_RESCHED is bit 3 on x86-64, whereas TIF_POLLING_NRFLAG
is bit 21. Those are in two different bytes of the thread flags, and thus
set/cleared at different addresses by set_bit()/clear_bit() acting on an
immediate "nr" argument.

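The byte arithmetic behind that remark can be checked with a small sketch.
It assumes, as described above, that a bit operation with a constant "nr"
targets only the byte containing that bit on a little-endian layout; the
helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the byte, within the flags word, that holds bit nr. */
static size_t model_bit_byte_offset(unsigned int nr)
{
	return nr >> 3;
}

/* Mask selecting bit nr inside that byte. */
static uint8_t model_bit_byte_mask(unsigned int nr)
{
	return (uint8_t)(1u << (nr & 7));
}
```

Under that assumption bit 3 lives in byte 0 with mask 0x08 and bit 21 lives
in byte 2 with mask 0x20, so the two flags are updated through different
byte addresses.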
If we have any state where TIF_POLLING_NRFLAG is set before TIF_NEED_RESCHED
is cleared within the idle thread, we could end up missing a needed resched IPI.

Another question: why are set_nr_if_polling() and set_nr_and_not_polling() two
different implementations? Could they be combined?

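For comparison, going by the description earlier in this thread,
set_nr_and_not_polling() sets the flag unconditionally and only reports the
polling state. A user-space sketch of that behaviour follows; the MODEL_
names and the single atomic_fetch_or() are assumptions for illustration, not
the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bit positions as quoted in this thread for x86-64 (model values). */
#define MODEL_TIF_NEED_RESCHED   (1UL << 3)
#define MODEL_TIF_POLLING_NRFLAG (1UL << 21)

/*
 * Hypothetical model: always set NEED_RESCHED, then return true when
 * the target was NOT polling, i.e. when the caller must send the IPI.
 */
static bool model_set_nr_and_not_polling(atomic_ulong *flags)
{
	unsigned long old = atomic_fetch_or(flags, MODEL_TIF_NEED_RESCHED);

	/* The previous state of NEED_RESCHED plays no part in the result. */
	return !(old & MODEL_TIF_POLLING_NRFLAG);
}
```

In this model the return value has the opposite sense (true means "send the
IPI") and the flag is set even when the target is not polling, which is at
least one visible difference between the two helpers.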
Thanks,
Mathieu
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com