All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org, jiangshanlai@gmail.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@efficios.com, josh@joshtriplett.org,
	tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org,
	dhowells@redhat.com, edumazet@google.com, fweisbec@gmail.com,
	oleg@redhat.com, Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v5 tip/core/rcu 01/15] sched,rcu: Make cond_resched() provide RCU quiescent state
Date: Thu, 17 Aug 2017 05:40:18 -0700	[thread overview]
Message-ID: <20170817124018.GM7017@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170817082244.ocqajpnfscd44cjr@gmail.com>

On Thu, Aug 17, 2017 at 10:22:44AM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > There is some confusion as to which of cond_resched() or
> > cond_resched_rcu_qs() should be added to long in-kernel loops.
> > This commit therefore eliminates the decision by adding RCU quiescent
> > states to cond_resched().  This commit also simplifies the code that
> > used to interact with cond_resched_rcu_qs(), and that now interacts with
> > cond_resched(), to reduce its overhead.  This reduction is necessary to
> > allow the heavier-weight cond_resched_rcu_qs() mechanism to be invoked
> > everywhere that cond_resched() is invoked.
> > 
> > Part of that reduction in overhead converts the jiffies_till_sched_qs
> > kernel parameter to read-only at runtime, thus eliminating the need for
> > bounds checking.
> > 
> > Reported-by: Michal Hocko <mhocko@kernel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > [ paulmck: Tuning for performance issues reported by 0day Test Robot. ]
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 8337e2db0bb2..d2f291a3a44a 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1522,10 +1522,11 @@ static inline int test_tsk_need_resched(struct task_struct *tsk)
> >   * cond_resched_lock() will drop the spinlock before scheduling,
> >   * cond_resched_softirq() will enable bhs before scheduling.
> >   */
> > +void rcu_all_qs(void);
> >  #ifndef CONFIG_PREEMPT
> >  extern int _cond_resched(void);
> >  #else
> > -static inline int _cond_resched(void) { return 0; }
> > +static inline int _cond_resched(void) { rcu_all_qs(); return 0; }
> >  #endif
> >  
> >  #define cond_resched() ({			\
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 51d4c3acf32d..e40cb5190783 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -537,8 +537,8 @@ module_param(rcu_kick_kthreads, bool, 0644);
> >   * How long the grace period must be before we start recruiting
> >   * quiescent-state help from rcu_note_context_switch().
> >   */
> > -static ulong jiffies_till_sched_qs = HZ / 20;
> > -module_param(jiffies_till_sched_qs, ulong, 0644);
> > +static ulong jiffies_till_sched_qs = HZ / 10;
> > +module_param(jiffies_till_sched_qs, ulong, 0444);
> >  
> >  static bool rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
> >  				  struct rcu_data *rdp);
> > @@ -1230,7 +1230,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  	unsigned long jtsq;
> >  	bool *rnhqp;
> >  	bool *ruqp;
> > -	unsigned long rjtsc;
> >  	struct rcu_node *rnp;
> >  
> >  	/*
> > @@ -1247,23 +1246,13 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  		return 1;
> >  	}
> >  
> > -	/* Compute and saturate jiffies_till_sched_qs. */
> > -	jtsq = jiffies_till_sched_qs;
> > -	rjtsc = rcu_jiffies_till_stall_check();
> > -	if (jtsq > rjtsc / 2) {
> > -		WRITE_ONCE(jiffies_till_sched_qs, rjtsc);
> > -		jtsq = rjtsc / 2;
> > -	} else if (jtsq < 1) {
> > -		WRITE_ONCE(jiffies_till_sched_qs, 1);
> > -		jtsq = 1;
> > -	}
> > -
> >  	/*
> >  	 * Has this CPU encountered a cond_resched_rcu_qs() since the
> >  	 * beginning of the grace period?  For this to be the case,
> >  	 * the CPU has to have noticed the current grace period.  This
> >  	 * might not be the case for nohz_full CPUs looping in the kernel.
> >  	 */
> > +	jtsq = jiffies_till_sched_qs;
> >  	rnp = rdp->mynode;
> >  	ruqp = per_cpu_ptr(&rcu_dynticks.rcu_urgent_qs, rdp->cpu);
> >  	if (time_after(jiffies, rdp->rsp->gp_start + jtsq) &&
> > @@ -1271,7 +1260,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  	    READ_ONCE(rdp->gpnum) == rnp->gpnum && !rdp->gpwrap) {
> >  		trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, TPS("rqc"));
> >  		return 1;
> > -	} else {
> > +	} else if (time_after(jiffies, rdp->rsp->gp_start + jtsq)) {
> >  		/* Load rcu_qs_ctr before store to rcu_urgent_qs. */
> >  		smp_store_release(ruqp, true);
> >  	}
> > @@ -1299,10 +1288,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  	 * updates are only once every few jiffies, the probability of
> >  	 * lossage (and thus of slight grace-period extension) is
> >  	 * quite low.
> > -	 *
> > -	 * Note that if the jiffies_till_sched_qs boot/sysfs parameter
> > -	 * is set too high, we override with half of the RCU CPU stall
> > -	 * warning delay.
> >  	 */
> >  	rnhqp = &per_cpu(rcu_dynticks.rcu_need_heavy_qs, rdp->cpu);
> >  	if (!READ_ONCE(*rnhqp) &&
> > @@ -1311,7 +1296,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  		WRITE_ONCE(*rnhqp, true);
> >  		/* Store rcu_need_heavy_qs before rcu_urgent_qs. */
> >  		smp_store_release(ruqp, true);
> > -		rdp->rsp->jiffies_resched += 5; /* Re-enable beating. */
> > +		rdp->rsp->jiffies_resched += jtsq; /* Re-enable beating. */
> >  	}
> >  
> >  	/*
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 17c667b427b4..9433633012ba 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4808,6 +4808,7 @@ int __sched _cond_resched(void)
> >  		preempt_schedule_common();
> >  		return 1;
> >  	}
> > +	rcu_all_qs();
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(_cond_resched);
> 
> So I'm a bit uneasy about this change:
> 
> - There's hundreds of uses of cond_resched(), some of them in commonly inlined
>   functions.
> 
> - cond_resched() typically gets called in functions that _might_ take a long time
>   to execute, but that's not a given.
> 
> - it's definitely getting called opportunistically as well, under
>   PREEMPT_VOLUNTARY, from common lightweight helpers that we know are in
>   schedulable contexts. We risk adding significant overhead here.
> 
> So what we risk here is turning a known to be super simple function into something 
> much slower - and exporting slowdowns to literally thousands of explicit and 
> implicit usage sites.

Yeah, I have been refining this one since November.  It finally seems
to make 0day Test Robot happy, but it isn't going to hurt to keep it
out for another cycle.

I will rebase it into my stack intended for 4.15 before sending you a
pull request.

							Thanx, Paul

  reply	other threads:[~2017-08-17 12:40 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 01/15] sched,rcu: Make cond_resched() provide RCU quiescent state Paul E. McKenney
2017-08-15 16:19   ` [PATCH v5 " Paul E. McKenney
2017-08-17  8:22     ` Ingo Molnar
2017-08-17 12:40       ` Paul E. McKenney [this message]
2017-07-24 21:44 ` [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups Paul E. McKenney
2017-07-25 18:12   ` Steven Rostedt
2017-07-25 19:18     ` Paul E. McKenney
2017-07-25 22:17       ` Steven Rostedt
2017-07-26  0:05         ` Paul E. McKenney
2017-07-26 21:18           ` Steven Rostedt
2017-07-26 21:47             ` Paul E. McKenney
2017-07-26 23:09               ` Steven Rostedt
2017-07-27 17:33                 ` Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 03/15] rcu: Drive TASKS_RCU directly off of PREEMPT Paul E. McKenney
2017-07-25 18:14   ` Steven Rostedt
2017-07-25 19:19     ` Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 04/15] rcu: Create reasonable API for do_exit() TASKS_RCU processing Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 05/15] rcu: Add TPS() to event-traced strings Paul E. McKenney
2017-07-28  1:32   ` Steven Rostedt
2017-07-24 21:44 ` [PATCH tip/core/rcu 06/15] rcu: Move rcu.h to new trivial-function style Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 07/15] rcu: Add event tracing to ->gp_tasks update at GP start Paul E. McKenney
2017-07-28  1:38   ` Steven Rostedt
2017-07-28  3:22     ` Paul E. McKenney
2017-07-28 12:18       ` Steven Rostedt
2017-07-28 17:13         ` Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 08/15] swait: add idle variants which don't contribute to load average Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 09/15] rcu: use idle versions of swait to make idle-hack clear Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 10/15] rcu: Add TPS() protection for _rcu_barrier_trace strings Paul E. McKenney
2017-07-28  1:40   ` Steven Rostedt
2017-07-24 21:44 ` [PATCH tip/core/rcu 11/15] rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit() Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 12/15] rcu: Add assertions verifying blocked-tasks list Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 13/15] rcu: Make rcu_idle_enter() rely on callers disabling irqs Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 14/15] rcu: Add warning to rcu_idle_enter() for irqs enabled Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 15/15] rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter() Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170817124018.GM7017@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.