From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: josh@joshtriplett.org, rostedt@goodmis.org,
mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com,
linux-kernel@vger.kernel.org, sramana@codeaurora.org,
prsood@codeaurora.org
Subject: Re: Query regarding synchronize_sched_expedited and resched_cpu
Date: Sat, 16 Sep 2017 18:00:15 -0700
Message-ID: <20170917010015.GW3521@linux.vnet.ibm.com>
In-Reply-To: <8f33e48e-ac6d-2c88-e16f-20b698c06292@codeaurora.org>
On Fri, Sep 15, 2017 at 04:44:38PM +0530, Neeraj Upadhyay wrote:
> Hi,
>
> We have one query regarding the behavior of RCU expedited grace period,
> for scenario where resched_cpu() in sync_sched_exp_handler() fails to
> acquire the rq lock and returns w/o setting the need_resched. In this
> case, how do we ensure that the CPU notify rcu about the
> end of sched grace period (schedule() -> __schedule() ->
> rcu_note_context_switch(cpu) -> rcu_sched_qs()) , for cases where tick
> is stopped on that CPU. Is it implied from the rq lock acquisition
> failure, that the owner of the rq lock will enforce context switch?
> For which scenarios in RCU paths (as the function is used only in RCU
> code), we need trylock check in resched_cpu()?
>
> void resched_cpu(int cpu)
> {
> 	struct rq *rq = cpu_rq(cpu);
> 	unsigned long flags;
>
> 	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
> 		return;
> 	resched_curr(rq);
> 	raw_spin_unlock_irqrestore(&rq->lock, flags);
> }
>
>
> This issue was observed in below scenario, where one of the CPUs (CPU1)
> started synchronize_sched_expedited and sent IPI to CPU5, which is in
> the idle path but handled sync_sched_exp_handler() IPI before
> rcu_idle_enter().
> As resched_cpu() failed to acquire the rq lock, need_resched was not set,
> and CPU went to idle; resulting in expedited stall getting reported
> by CPU1.
>
> Below is the scenario:
>
> • CPU1 is waiting for expedited wait to complete:
>     sync_rcu_exp_select_cpus
>         rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5
>         IPI sent to CPU5
>
>     synchronize_sched_expedited_wait
>         ret = swait_event_timeout(
>                 rsp->expedited_wq,
>                 sync_rcu_preempt_exp_done(rnp_root),
>                 jiffies_stall);
>
>     expmask = 0x20, and CPU 5 is in idle path (in cpuidle_enter())
>
>
>
> • CPU5 handles IPI and fails to acquire rq lock.
>
>     Handles IPI
>         sync_sched_exp_handler
>             resched_cpu
>                 returns while failing to try lock acquire rq->lock
>                 need_resched is not set
>
> • CPU5 calls rcu_idle_enter() and as need_resched is not set, goes to
> idle (schedule() is not called).
>
> • CPU 1 reports RCU stall.

Good catch and good detective work!!!

I will be working on a fix this week, hopefully involving resched_cpu()
getting a return value so that I can track who needs a later retry.

							Thanx, Paul
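
For readers following the thread, here is a minimal user-space sketch of
the "return value plus later retry" idea mentioned above. It is not the
fix that went into the kernel, only an illustration under stated
assumptions: every sim_* name, the SIM_NR_CPUS constant, and the use of a
C11 atomic flag as a stand-in for rq->lock are hypothetical. The only
detail taken from the report is that a failed trylock leaves need_resched
unset, and that CPU 5 corresponds to mask 0x20.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_NR_CPUS 8	/* hypothetical CPU count for the simulation */

/* Hypothetical stand-in for struct rq: a trylock-able lock and a flag. */
struct sim_rq {
	atomic_bool locked;	/* plays the role of rq->lock */
	bool need_resched;	/* plays the role of the need_resched flag */
};

static struct sim_rq sim_rqs[SIM_NR_CPUS];

/* Returns true if the lock was acquired, false if it was already held. */
static bool sim_rq_trylock(struct sim_rq *rq)
{
	return !atomic_exchange_explicit(&rq->locked, true,
					 memory_order_acquire);
}

static void sim_rq_unlock(struct sim_rq *rq)
{
	atomic_store_explicit(&rq->locked, false, memory_order_release);
}

/*
 * Like the quoted resched_cpu(), but reports whether need_resched was
 * actually set, so the caller can tell a silent failure from success.
 */
static bool sim_resched_cpu(int cpu)
{
	struct sim_rq *rq = &sim_rqs[cpu];

	if (!sim_rq_trylock(rq))
		return false;	/* lock contended: need_resched NOT set */
	rq->need_resched = true;
	sim_rq_unlock(rq);
	return true;
}

/*
 * Try every CPU in @mask and return the mask of CPUs whose trylock
 * failed, i.e. the ones that still need a later retry.
 */
static uint32_t sim_resched_mask(uint32_t mask)
{
	uint32_t retry = 0;

	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		if ((mask & (1u << cpu)) && !sim_resched_cpu(cpu))
			retry |= 1u << cpu;
	}
	return retry;
}
```

In this sketch, a caller waiting on the expedited grace period would keep
the mask returned by sim_resched_mask() and pass it back in on a later
pass, instead of assuming that every targeted CPU had need_resched set
the first time.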