linux-kernel.vger.kernel.org archive mirror
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>,
	josh@joshtriplett.org, mathieu.desnoyers@efficios.com,
	jiangshanlai@gmail.com, linux-kernel@vger.kernel.org,
	sramana@codeaurora.org, prsood@codeaurora.org,
	pkondeti@codeaurora.org, markivx@codeaurora.org,
	peterz@infradead.org
Subject: Re: Query regarding synchronize_sched_expedited and resched_cpu
Date: Tue, 19 Sep 2017 08:31:26 -0700
Message-ID: <20170919153126.GA2955@linux.vnet.ibm.com>
In-Reply-To: <20170918162412.GM3521@linux.vnet.ibm.com>

On Mon, Sep 18, 2017 at 09:24:12AM -0700, Paul E. McKenney wrote:
> On Mon, Sep 18, 2017 at 12:12:13PM -0400, Steven Rostedt wrote:
> > On Mon, 18 Sep 2017 09:01:25 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > 
> > >     sched: Make resched_cpu() unconditional
> > >     
> > >     The current implementation of synchronize_sched_expedited() incorrectly
> > >     assumes that resched_cpu() is unconditional, which it is not.  This means
> > >     that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
> > >     fails as follows (analysis by Neeraj Upadhyay):
> > >     
> > >     o    CPU 1 is waiting for the expedited grace period to complete:
> > >     
> > >          sync_rcu_exp_select_cpus
> > >              rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU 5
> > >              IPI sent to CPU 5
> > >     
> > >          synchronize_sched_expedited_wait
> > >              ret = swait_event_timeout(rsp->expedited_wq,
> > >                                        sync_rcu_preempt_exp_done(rnp_root),
> > >                                        jiffies_stall);
> > >     
> > >          expmask = 0x20, and CPU 5 is in the idle path (in cpuidle_enter())
> > >     
> > >     o    CPU 5 handles the IPI and fails to acquire the rq lock:
> > >     
> > >          Handles IPI
> > >              sync_sched_exp_handler
> > >                  resched_cpu
> > >                      returns because the trylock on rq->lock fails
> > >                  need_resched is not set
> > >     
> > >     o    CPU 5 calls rcu_idle_enter() and, as need_resched is not set,
> > >          goes idle (schedule() is not called).
> > >     
> > >     o    CPU 1 reports an RCU stall.
> > >     
> > >     Given that resched_cpu() is used only by RCU, this commit fixes the
> > >     assumption by making resched_cpu() unconditional.
> > 
> > Probably want to run this with several workloads with lockdep enabled
> > first.
> 
> As soon as I work through the backlog of lockdep complaints that
> appeared in the last merge window...  :-(

And this patch survived all rcutorture scenarios, including those with
lockdep enabled.  There were failures, but these are pre-existing issues
I am chasing:  Lost timeouts on TREE01 and rt_mutex trying to awaken
an offline CPU in TREE03.

So I have this one queued.  Objections?

							Thanx, Paul

------------------------------------------------------------------------

commit bc43e2e7e08134e6f403ac845edcf4f85668d803
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Sep 18 08:54:40 2017 -0700

    sched: Make resched_cpu() unconditional
    
    The current implementation of synchronize_sched_expedited() incorrectly
    assumes that resched_cpu() is unconditional, which it is not.  This means
    that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
    fails as follows (analysis by Neeraj Upadhyay):
    
    o    CPU 1 is waiting for the expedited grace period to complete:
    
         sync_rcu_exp_select_cpus
             rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU 5
             IPI sent to CPU 5
    
         synchronize_sched_expedited_wait
             ret = swait_event_timeout(rsp->expedited_wq,
                                       sync_rcu_preempt_exp_done(rnp_root),
                                       jiffies_stall);
    
         expmask = 0x20, and CPU 5 is in the idle path (in cpuidle_enter())
    
    o    CPU 5 handles the IPI and fails to acquire the rq lock:
    
         Handles IPI
             sync_sched_exp_handler
                 resched_cpu
                     returns because the trylock on rq->lock fails
                 need_resched is not set
    
    o    CPU 5 calls rcu_idle_enter() and, as need_resched is not set,
         goes idle (schedule() is not called).
    
    o    CPU 1 reports an RCU stall.
    
    Given that resched_cpu() is used only by RCU, this commit fixes the
    assumption by making resched_cpu() unconditional.
    
    Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cab8c5ec128e..b2281971894c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -505,8 +505,7 @@ void resched_cpu(int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
-	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
-		return;
+	raw_spin_lock_irqsave(&rq->lock, flags);
 	resched_curr(rq);
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
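
For anyone who wants to see the failure mode outside the kernel, here
is a rough userspace sketch of the difference between the two variants.
It is not kernel code: a pthread mutex stands in for rq->lock, a plain
bool stands in for the need_resched state, and every toy_ name is made
up for illustration.  The point is only that the trylock variant can
return without recording the request whenever the lock happens to be
held, while the unconditional variant cannot.

```c
/*
 * Userspace sketch only.  The toy_ names are hypothetical and are not
 * kernel APIs; a pthread mutex stands in for rq->lock.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct toy_rq {
	pthread_mutex_t lock;	/* stands in for rq->lock */
	bool need_resched;	/* stands in for the resched request */
};

/* Initialize the toy runqueue with no request pending. */
static void toy_rq_init(struct toy_rq *rq)
{
	int rc = pthread_mutex_init(&rq->lock, NULL);

	assert(rc == 0);
	(void)rc;
	rq->need_resched = false;
}

/* Old behavior: silently drop the request if the lock is contended. */
static void toy_resched_cpu_trylock(struct toy_rq *rq)
{
	if (pthread_mutex_trylock(&rq->lock) != 0)
		return;		/* lock busy: request is lost */
	rq->need_resched = true;
	pthread_mutex_unlock(&rq->lock);
}

/* New behavior: wait for the lock, so the request is always recorded. */
static void toy_resched_cpu_lock(struct toy_rq *rq)
{
	pthread_mutex_lock(&rq->lock);
	rq->need_resched = true;
	pthread_mutex_unlock(&rq->lock);
}
```

Calling toy_resched_cpu_trylock() while something else holds the lock
leaves need_resched false, which corresponds to the CPU 5 step in the
analysis above.  Calling toy_resched_cpu_lock() once the lock is free
always sets it.  The sketch says nothing about the lockdep or
lock-ordering side of taking rq->lock unconditionally, which is what
the testing discussed above is for.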
