From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>,
josh@joshtriplett.org, mathieu.desnoyers@efficios.com,
jiangshanlai@gmail.com, linux-kernel@vger.kernel.org,
sramana@codeaurora.org, prsood@codeaurora.org,
pkondeti@codeaurora.org, markivx@codeaurora.org,
peterz@infradead.org
Subject: Re: Query regarding synchronize_sched_expedited and resched_cpu
Date: Tue, 19 Sep 2017 08:31:26 -0700 [thread overview]
Message-ID: <20170919153126.GA2955@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170918162412.GM3521@linux.vnet.ibm.com>
On Mon, Sep 18, 2017 at 09:24:12AM -0700, Paul E. McKenney wrote:
> On Mon, Sep 18, 2017 at 12:12:13PM -0400, Steven Rostedt wrote:
> > On Mon, 18 Sep 2017 09:01:25 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >
> >
> > > sched: Make resched_cpu() unconditional
> > >
> > > The current implementation of synchronize_sched_expedited() incorrectly
> > > assumes that resched_cpu() is unconditional, which it is not. This means
> > > that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
> > > fails as follows (analysis by Neeraj Upadhyay):
> > >
> > > o CPU1 is waiting for expedited wait to complete:
> > > sync_rcu_exp_select_cpus
> > > rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5
> > > IPI sent to CPU5
> > >
> > > synchronize_sched_expedited_wait
> > > ret = swait_event_timeout(
> > > rsp->expedited_wq,
> > > sync_rcu_preempt_exp_done(rnp_root),
> > > jiffies_stall);
> > >
> > > expmask = 0x20 , and CPU 5 is in idle path (in cpuidle_enter())
> > >
> > > o CPU5 handles IPI and fails to acquire rq lock.
> > >
> > > Handles IPI
> > > sync_sched_exp_handler
> > > resched_cpu
> > > returns while failing to try lock acquire rq->lock
> > > need_resched is not set
> > >
> > > o CPU5 calls rcu_idle_enter() and as need_resched is not set, goes to
> > > idle (schedule() is not called).
> > >
> > > o CPU 1 reports RCU stall.
> > >
> > > Given that resched_cpu() is used only by RCU, this commit fixes the
> > > assumption by making resched_cpu() unconditional.
> >
> > Probably want to run this with several workloads with lockdep enabled
> > first.
>
> As soon as I work through the backlog of lockdep complaints that
> appeared in the last merge window... :-(
And this patch survived all rcutorture scenarios, including those with
lockdep enabled. There were failures, but these are pre-existing issues
I am chasing: Lost timeouts on TREE01 and rt_mutex trying to awaken
an offline CPU in TREE03.
So I have this one queued. Objections?
Thanx, Paul
------------------------------------------------------------------------
commit bc43e2e7e08134e6f403ac845edcf4f85668d803
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Mon Sep 18 08:54:40 2017 -0700
sched: Make resched_cpu() unconditional
The current implementation of synchronize_sched_expedited() incorrectly
assumes that resched_cpu() is unconditional, which it is not. This means
that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
fails as follows (analysis by Neeraj Upadhyay):
o CPU1 is waiting for expedited wait to complete:
sync_rcu_exp_select_cpus
rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5
IPI sent to CPU5
synchronize_sched_expedited_wait
ret = swait_event_timeout(
rsp->expedited_wq,
sync_rcu_preempt_exp_done(rnp_root),
jiffies_stall);
expmask = 0x20 , and CPU 5 is in idle path (in cpuidle_enter())
o CPU5 handles IPI and fails to acquire rq lock.
Handles IPI
sync_sched_exp_handler
resched_cpu
returns while failing to try lock acquire rq->lock
need_resched is not set
o CPU5 calls rcu_idle_enter() and as need_resched is not set, goes to
idle (schedule() is not called).
o CPU 1 reports RCU stall.
Given that resched_cpu() is used only by RCU, this commit fixes the
assumption by making resched_cpu() unconditional.
Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cab8c5ec128e..b2281971894c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -505,8 +505,7 @@ void resched_cpu(int cpu)
struct rq *rq = cpu_rq(cpu);
unsigned long flags;
- if (!raw_spin_trylock_irqsave(&rq->lock, flags))
- return;
+ raw_spin_lock_irqsave(&rq->lock, flags);
resched_curr(rq);
raw_spin_unlock_irqrestore(&rq->lock, flags);
}
next prev parent reply other threads:[~2017-09-19 15:31 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-15 11:14 Query regarding synchronize_sched_expedited and resched_cpu Neeraj Upadhyay
2017-09-17 1:00 ` Paul E. McKenney
2017-09-17 6:07 ` Neeraj Upadhyay
2017-09-18 15:11 ` Steven Rostedt
2017-09-18 16:01 ` Paul E. McKenney
2017-09-18 16:12 ` Steven Rostedt
2017-09-18 16:24 ` Paul E. McKenney
2017-09-18 16:29 ` Steven Rostedt
2017-09-18 16:55 ` Paul E. McKenney
2017-09-18 23:53 ` Paul E. McKenney
2017-09-19 1:23 ` Steven Rostedt
2017-09-19 2:26 ` Paul E. McKenney
2017-09-19 1:50 ` Byungchul Park
2017-09-19 2:06 ` Byungchul Park
2017-09-19 2:33 ` Paul E. McKenney
2017-09-19 2:48 ` Byungchul Park
2017-09-19 4:04 ` Paul E. McKenney
2017-09-19 5:37 ` Boqun Feng
2017-09-19 6:11 ` Mike Galbraith
2017-09-19 6:53 ` Byungchul Park
2017-09-19 13:40 ` Paul E. McKenney
2017-09-21 13:57 ` Peter Zijlstra
2017-09-21 15:33 ` Paul E. McKenney
2017-09-19 1:55 ` Byungchul Park
2017-09-19 15:31 ` Paul E. McKenney [this message]
2017-09-19 15:58 ` Steven Rostedt
2017-09-19 16:12 ` Paul E. McKenney
2017-09-21 13:59 ` Peter Zijlstra
2017-09-21 16:00 ` Paul E. McKenney
2017-09-21 16:30 ` Peter Zijlstra
2017-09-21 16:47 ` Paul E. McKenney
2017-09-21 13:55 ` Peter Zijlstra
2017-09-21 15:31 ` Paul E. McKenney
2017-09-21 16:18 ` Peter Zijlstra
2017-09-21 15:46 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170919153126.GA2955@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=jiangshanlai@gmail.com \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=markivx@codeaurora.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=neeraju@codeaurora.org \
--cc=peterz@infradead.org \
--cc=pkondeti@codeaurora.org \
--cc=prsood@codeaurora.org \
--cc=rostedt@goodmis.org \
--cc=sramana@codeaurora.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.