linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org, jiangshanlai@gmail.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@efficios.com, josh@joshtriplett.org,
	tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org,
	dhowells@redhat.com, edumazet@google.com, fweisbec@gmail.com,
	oleg@redhat.com, Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v5 tip/core/rcu 01/15] sched,rcu: Make cond_resched() provide RCU quiescent state
Date: Thu, 17 Aug 2017 05:40:18 -0700	[thread overview]
Message-ID: <20170817124018.GM7017@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170817082244.ocqajpnfscd44cjr@gmail.com>

On Thu, Aug 17, 2017 at 10:22:44AM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > There is some confusion as to which of cond_resched() or
> > cond_resched_rcu_qs() should be added to long in-kernel loops.
> > This commit therefore eliminates the decision by adding RCU quiescent
> > states to cond_resched().  This commit also simplifies the code that
> > used to interact with cond_resched_rcu_qs(), and that now interacts with
> > cond_resched(), to reduce its overhead.  This reduction is necessary to
> > allow the heavier-weight cond_resched_rcu_qs() mechanism to be invoked
> > everywhere that cond_resched() is invoked.
> > 
> > Part of that reduction in overhead converts the jiffies_till_sched_qs
> > kernel parameter to read-only at runtime, thus eliminating the need for
> > bounds checking.
> > 
> > Reported-by: Michal Hocko <mhocko@kernel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > [ paulmck: Tuning for performance issues reported by 0day Test Robot. ]
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 8337e2db0bb2..d2f291a3a44a 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1522,10 +1522,11 @@ static inline int test_tsk_need_resched(struct task_struct *tsk)
> >   * cond_resched_lock() will drop the spinlock before scheduling,
> >   * cond_resched_softirq() will enable bhs before scheduling.
> >   */
> > +void rcu_all_qs(void);
> >  #ifndef CONFIG_PREEMPT
> >  extern int _cond_resched(void);
> >  #else
> > -static inline int _cond_resched(void) { return 0; }
> > +static inline int _cond_resched(void) { rcu_all_qs(); return 0; }
> >  #endif
> >  
> >  #define cond_resched() ({			\
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 51d4c3acf32d..e40cb5190783 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -537,8 +537,8 @@ module_param(rcu_kick_kthreads, bool, 0644);
> >   * How long the grace period must be before we start recruiting
> >   * quiescent-state help from rcu_note_context_switch().
> >   */
> > -static ulong jiffies_till_sched_qs = HZ / 20;
> > -module_param(jiffies_till_sched_qs, ulong, 0644);
> > +static ulong jiffies_till_sched_qs = HZ / 10;
> > +module_param(jiffies_till_sched_qs, ulong, 0444);
> >  
> >  static bool rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
> >  				  struct rcu_data *rdp);
> > @@ -1230,7 +1230,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  	unsigned long jtsq;
> >  	bool *rnhqp;
> >  	bool *ruqp;
> > -	unsigned long rjtsc;
> >  	struct rcu_node *rnp;
> >  
> >  	/*
> > @@ -1247,23 +1246,13 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  		return 1;
> >  	}
> >  
> > -	/* Compute and saturate jiffies_till_sched_qs. */
> > -	jtsq = jiffies_till_sched_qs;
> > -	rjtsc = rcu_jiffies_till_stall_check();
> > -	if (jtsq > rjtsc / 2) {
> > -		WRITE_ONCE(jiffies_till_sched_qs, rjtsc);
> > -		jtsq = rjtsc / 2;
> > -	} else if (jtsq < 1) {
> > -		WRITE_ONCE(jiffies_till_sched_qs, 1);
> > -		jtsq = 1;
> > -	}
> > -
> >  	/*
> >  	 * Has this CPU encountered a cond_resched_rcu_qs() since the
> >  	 * beginning of the grace period?  For this to be the case,
> >  	 * the CPU has to have noticed the current grace period.  This
> >  	 * might not be the case for nohz_full CPUs looping in the kernel.
> >  	 */
> > +	jtsq = jiffies_till_sched_qs;
> >  	rnp = rdp->mynode;
> >  	ruqp = per_cpu_ptr(&rcu_dynticks.rcu_urgent_qs, rdp->cpu);
> >  	if (time_after(jiffies, rdp->rsp->gp_start + jtsq) &&
> > @@ -1271,7 +1260,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  	    READ_ONCE(rdp->gpnum) == rnp->gpnum && !rdp->gpwrap) {
> >  		trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, TPS("rqc"));
> >  		return 1;
> > -	} else {
> > +	} else if (time_after(jiffies, rdp->rsp->gp_start + jtsq)) {
> >  		/* Load rcu_qs_ctr before store to rcu_urgent_qs. */
> >  		smp_store_release(ruqp, true);
> >  	}
> > @@ -1299,10 +1288,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  	 * updates are only once every few jiffies, the probability of
> >  	 * lossage (and thus of slight grace-period extension) is
> >  	 * quite low.
> > -	 *
> > -	 * Note that if the jiffies_till_sched_qs boot/sysfs parameter
> > -	 * is set too high, we override with half of the RCU CPU stall
> > -	 * warning delay.
> >  	 */
> >  	rnhqp = &per_cpu(rcu_dynticks.rcu_need_heavy_qs, rdp->cpu);
> >  	if (!READ_ONCE(*rnhqp) &&
> > @@ -1311,7 +1296,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  		WRITE_ONCE(*rnhqp, true);
> >  		/* Store rcu_need_heavy_qs before rcu_urgent_qs. */
> >  		smp_store_release(ruqp, true);
> > -		rdp->rsp->jiffies_resched += 5; /* Re-enable beating. */
> > +		rdp->rsp->jiffies_resched += jtsq; /* Re-enable beating. */
> >  	}
> >  
> >  	/*
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 17c667b427b4..9433633012ba 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4808,6 +4808,7 @@ int __sched _cond_resched(void)
> >  		preempt_schedule_common();
> >  		return 1;
> >  	}
> > +	rcu_all_qs();
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(_cond_resched);
> 
> So I'm a bit uneasy about this change:
> 
> - There's hundreds of uses of cond_resched(), some of them in commonly inlined
>   functions.
> 
> - cond_resched() typically gets called in functions that _might_ take a long time
>   to execute, but that's not a given.
> 
> - it's definitely getting called opportunistically as well, under
>   PREEMPT_VOLUNTARY, from common lightweight helpers that we know are in
>   schedulable contexts. We risk adding significant overhead here.
> 
> So what we risk here is turning a known to be super simple function into something 
> much slower - and exporting slowdowns to literally thousands of explicit and 
> implicit usage sites.

Yeah, I have been refining this one since November.  It finally seems
to make 0day Test Robot happy, but it isn't going to hurt to keep it
out for another cycle.

I will rebase it into my stack intended for 4.15 before sending you a
pull request.

							Thanx, Paul

  reply	other threads:[~2017-08-17 12:40 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 01/15] sched,rcu: Make cond_resched() provide RCU quiescent state Paul E. McKenney
2017-08-15 16:19   ` [PATCH v5 " Paul E. McKenney
2017-08-17  8:22     ` Ingo Molnar
2017-08-17 12:40       ` Paul E. McKenney [this message]
2017-07-24 21:44 ` [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups Paul E. McKenney
2017-07-25 18:12   ` Steven Rostedt
2017-07-25 19:18     ` Paul E. McKenney
2017-07-25 22:17       ` Steven Rostedt
2017-07-26  0:05         ` Paul E. McKenney
2017-07-26 21:18           ` Steven Rostedt
2017-07-26 21:47             ` Paul E. McKenney
2017-07-26 23:09               ` Steven Rostedt
2017-07-27 17:33                 ` Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 03/15] rcu: Drive TASKS_RCU directly off of PREEMPT Paul E. McKenney
2017-07-25 18:14   ` Steven Rostedt
2017-07-25 19:19     ` Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 04/15] rcu: Create reasonable API for do_exit() TASKS_RCU processing Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 05/15] rcu: Add TPS() to event-traced strings Paul E. McKenney
2017-07-28  1:32   ` Steven Rostedt
2017-07-24 21:44 ` [PATCH tip/core/rcu 06/15] rcu: Move rcu.h to new trivial-function style Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 07/15] rcu: Add event tracing to ->gp_tasks update at GP start Paul E. McKenney
2017-07-28  1:38   ` Steven Rostedt
2017-07-28  3:22     ` Paul E. McKenney
2017-07-28 12:18       ` Steven Rostedt
2017-07-28 17:13         ` Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 08/15] swait: add idle variants which don't contribute to load average Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 09/15] rcu: use idle versions of swait to make idle-hack clear Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 10/15] rcu: Add TPS() protection for _rcu_barrier_trace strings Paul E. McKenney
2017-07-28  1:40   ` Steven Rostedt
2017-07-24 21:44 ` [PATCH tip/core/rcu 11/15] rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit() Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 12/15] rcu: Add assertions verifying blocked-tasks list Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 13/15] rcu: Make rcu_idle_enter() rely on callers disabling irqs Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 14/15] rcu: Add warning to rcu_idle_enter() for irqs enabled Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 15/15] rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter() Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170817124018.GM7017@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).