Re: Fw: [lkp-developer] [sched,rcu] cf7a2dca60: [No primary change] +186% will-it-scale.time.involuntary_context_switches

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, peterz@infradead.org
Subject: Re: Fw: [lkp-developer] [sched,rcu]  cf7a2dca60: [No primary change] +186% will-it-scale.time.involuntary_context_switches
Date: Tue, 3 Jan 2017 16:55:59 -0800	[thread overview]
Message-ID: <20170104005559.GD3742@linux.vnet.ibm.com> (raw)
In-Reply-To: <20161214173923.GA16763@dhcp22.suse.cz>

On Wed, Dec 14, 2016 at 06:39:24PM +0100, Michal Hocko wrote:
> On Wed 14-12-16 08:48:27, Paul E. McKenney wrote:
> > On Wed, Dec 14, 2016 at 05:15:41PM +0100, Michal Hocko wrote:
> > > On Wed 14-12-16 03:06:09, Paul E. McKenney wrote:
> > > > On Wed, Dec 14, 2016 at 10:54:25AM +0100, Michal Hocko wrote:
> > > > > On Tue 13-12-16 07:14:08, Paul E. McKenney wrote:
> > > > > > Just FYI for the moment...
> > > > > > 
> > > > > > So even with the slowed-down checking, making cond_resched() do what
> > > > > > cond_resched_rcu_qs() does results in a smallish but quite measurable
> > > > > > degradation according to 0day.
> > > > > 
> > > > > So if I understand those results properly, the reason seems to be the
> > > > > increased involuntary context switches, right? Or am I misreading the
> > > > > data?
> > > > > I am looking at your "sched,rcu: Make cond_resched() provide RCU
> > > > > quiescent state" in linux-next and I am wondering whether rcu_all_qs has
> > > > > to be called unconditionally and not only when should_resched failed few
> > > > > times? I guess you have discussed that with Peter already but do not
> > > > > remember the outcome.
> > > > 
> > > > My first thought is to wait for the grace period to age further before
> > > > checking, the idea being to avoid increasing cond_resched() overhead
> > > > any further.  But if that doesn't work, then yes, I may have to look at
> > > > adding more checks to cond_resched().
> > > 
> > > This might be really naive but would something like the following work?
> > > The overhead should be pretty much negligible, I guess. Ideally the pcp
> > > variable could be set somewhere from check_cpu_stall() but I couldn't
> > > wrap my head around that code to see how exactly.
> > 
> > My concern (perhaps misplaced) with this approach is that there are
> > quite a few tight loops containing cond_resched().  So I would still
> > need to throttle the resulting grace-period acceleration to keep the
> > context switches down to a dull roar.
> 
> Yes, I see your point. Something based on the stall timeout would be
> much better of course. I just failed to come up with something that
> would make sense. This was more my lack of familiarity with the code so
> I hope you will be more successful ;)

Well, here is my current shot at this.  And so do I.  ;-)

So now it ignores cond_resched_rcu_qs() until at least
jiffies_till_sched_qs jiffies have elapsed since the start of the
grace period.  The jiffies_till_sched_qs variable defaults to HZ/20,
which should slow the checks down by about a factor of seven.  Plus I
don't see a problem with changing the default to (say) HZ/10 if needed.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 7acd02c9e62fb21e7466e7a99fc21bf6ed6cc3cf
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Jan 3 16:49:46 2017 -0800

    squash! rcu: Check cond_resched_rcu_qs() state less often to reduce GP overhead
    
    Now polling only after jiffies_till_sched_qs jiffies have elapsed.

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 083cb8a6299c..0369e0e0fe00 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1274,7 +1274,9 @@ static int dyntick_save_progress_counter(struct rcu_data *rdp,
 static int rcu_implicit_dynticks_qs(struct rcu_data *rdp,
 				    bool *isidle, unsigned long *maxj)
 {
+	unsigned long jtsq;
 	int *rcrmp;
+	unsigned long rjtsc;
 	struct rcu_node *rnp;
 
 	/*
@@ -1291,6 +1293,17 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp,
 		return 1;
 	}
 
+	/* Compute and saturate jiffies_till_sched_qs. */
+	jtsq = jiffies_till_sched_qs;
+	rjtsc = rcu_jiffies_till_stall_check();
+	if (jtsq > rjtsc / 2) {
+		WRITE_ONCE(jiffies_till_sched_qs, rjtsc);
+		jtsq = rjtsc / 2;
+	} else if (jtsq < 1) {
+		WRITE_ONCE(jiffies_till_sched_qs, 1);
+		jtsq = 1;
+	}
+
 	/*
 	 * Has this CPU encountered a cond_resched_rcu_qs() since the
 	 * beginning of the grace period?  For this to be the case,
@@ -1298,7 +1311,8 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp,
 	 * might not be the case for nohz_full CPUs looping in the kernel.
 	 */
 	rnp = rdp->mynode;
-	if (READ_ONCE(rdp->rcu_qs_ctr_snap) != per_cpu(rcu_qs_ctr, rdp->cpu) &&
+	if (time_after(jiffies, rdp->rsp->gp_start + jtsq) &&
+	    READ_ONCE(rdp->rcu_qs_ctr_snap) != per_cpu(rcu_qs_ctr, rdp->cpu) &&
 	    READ_ONCE(rdp->gpnum) == rnp->gpnum && !rdp->gpwrap) {
 		trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, TPS("rqc"));
 		return 1;
@@ -1333,9 +1347,8 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp,
 	 * warning delay.
 	 */
 	rcrmp = &per_cpu(rcu_sched_qs_mask, rdp->cpu);
-	if (ULONG_CMP_GE(jiffies,
-			 rdp->rsp->gp_start + jiffies_till_sched_qs) ||
-	    ULONG_CMP_GE(jiffies, rdp->rsp->jiffies_resched)) {
+	if (time_after(jiffies, rdp->rsp->gp_start + jtsq) ||
+	    time_after(jiffies, rdp->rsp->jiffies_resched)) {
 		if (!(READ_ONCE(*rcrmp) & rdp->rsp->flavor_mask)) {
 			WRITE_ONCE(rdp->cond_resched_completed,
 				   READ_ONCE(rdp->mynode->completed));

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, peterz@infradead.org
Subject: Re: Fw: [lkp-developer] [sched,rcu]  cf7a2dca60: [No primary change] +186% will-it-scale.time.involuntary_context_switches
Date: Tue, 3 Jan 2017 16:55:59 -0800	[thread overview]
Message-ID: <20170104005559.GD3742@linux.vnet.ibm.com> (raw)
In-Reply-To: <20161214173923.GA16763@dhcp22.suse.cz>

On Wed, Dec 14, 2016 at 06:39:24PM +0100, Michal Hocko wrote:
> On Wed 14-12-16 08:48:27, Paul E. McKenney wrote:
> > On Wed, Dec 14, 2016 at 05:15:41PM +0100, Michal Hocko wrote:
> > > On Wed 14-12-16 03:06:09, Paul E. McKenney wrote:
> > > > On Wed, Dec 14, 2016 at 10:54:25AM +0100, Michal Hocko wrote:
> > > > > On Tue 13-12-16 07:14:08, Paul E. McKenney wrote:
> > > > > > Just FYI for the moment...
> > > > > > 
> > > > > > So even with the slowed-down checking, making cond_resched() do what
> > > > > > cond_resched_rcu_qs() does results in a smallish but quite measurable
> > > > > > degradation according to 0day.
> > > > > 
> > > > > So if I understand those results properly, the reason seems to be the
> > > > > increased involuntary context switches, right? Or am I misreading the
> > > > > data?
> > > > > I am looking at your "sched,rcu: Make cond_resched() provide RCU
> > > > > quiescent state" in linux-next and I am wondering whether rcu_all_qs has
> > > > > to be called unconditionally and not only when should_resched failed few
> > > > > times? I guess you have discussed that with Peter already but do not
> > > > > remember the outcome.
> > > > 
> > > > My first thought is to wait for the grace period to age further before
> > > > checking, the idea being to avoid increasing cond_resched() overhead
> > > > any further.  But if that doesn't work, then yes, I may have to look at
> > > > adding more checks to cond_resched().
> > > 
> > > This might be really naive but would something like the following work?
> > > The overhead should be pretty much negligible, I guess. Ideally the pcp
> > > variable could be set somewhere from check_cpu_stall() but I couldn't
> > > wrap my head around that code to see how exactly.
> > 
> > My concern (perhaps misplaced) with this approach is that there are
> > quite a few tight loops containing cond_resched().  So I would still
> > need to throttle the resulting grace-period acceleration to keep the
> > context switches down to a dull roar.
> 
> Yes, I see your point. Something based on the stall timeout would be
> much better of course. I just failed to come up with something that
> would make sense. This was more my lack of familiarity with the code so
> I hope you will be more successful ;)

Well, here is my current shot at this.  And so do I.  ;-)

So now it ignores cond_resched_rcu_qs() until at least
jiffies_till_sched_qs jiffies have elapsed since the start of the
grace period.  The jiffies_till_sched_qs variable defaults to HZ/20,
which should slow the checks down by about a factor of seven.  Plus I
don't see a problem with changing the default to (say) HZ/10 if needed.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 7acd02c9e62fb21e7466e7a99fc21bf6ed6cc3cf
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Jan 3 16:49:46 2017 -0800

    squash! rcu: Check cond_resched_rcu_qs() state less often to reduce GP overhead
    
    Now polling only after jiffies_till_sched_qs jiffies have elapsed.

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 083cb8a6299c..0369e0e0fe00 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1274,7 +1274,9 @@ static int dyntick_save_progress_counter(struct rcu_data *rdp,
 static int rcu_implicit_dynticks_qs(struct rcu_data *rdp,
 				    bool *isidle, unsigned long *maxj)
 {
+	unsigned long jtsq;
 	int *rcrmp;
+	unsigned long rjtsc;
 	struct rcu_node *rnp;
 
 	/*
@@ -1291,6 +1293,17 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp,
 		return 1;
 	}
 
+	/* Compute and saturate jiffies_till_sched_qs. */
+	jtsq = jiffies_till_sched_qs;
+	rjtsc = rcu_jiffies_till_stall_check();
+	if (jtsq > rjtsc / 2) {
+		WRITE_ONCE(jiffies_till_sched_qs, rjtsc);
+		jtsq = rjtsc / 2;
+	} else if (jtsq < 1) {
+		WRITE_ONCE(jiffies_till_sched_qs, 1);
+		jtsq = 1;
+	}
+
 	/*
 	 * Has this CPU encountered a cond_resched_rcu_qs() since the
 	 * beginning of the grace period?  For this to be the case,
@@ -1298,7 +1311,8 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp,
 	 * might not be the case for nohz_full CPUs looping in the kernel.
 	 */
 	rnp = rdp->mynode;
-	if (READ_ONCE(rdp->rcu_qs_ctr_snap) != per_cpu(rcu_qs_ctr, rdp->cpu) &&
+	if (time_after(jiffies, rdp->rsp->gp_start + jtsq) &&
+	    READ_ONCE(rdp->rcu_qs_ctr_snap) != per_cpu(rcu_qs_ctr, rdp->cpu) &&
 	    READ_ONCE(rdp->gpnum) == rnp->gpnum && !rdp->gpwrap) {
 		trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, TPS("rqc"));
 		return 1;
@@ -1333,9 +1347,8 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp,
 	 * warning delay.
 	 */
 	rcrmp = &per_cpu(rcu_sched_qs_mask, rdp->cpu);
-	if (ULONG_CMP_GE(jiffies,
-			 rdp->rsp->gp_start + jiffies_till_sched_qs) ||
-	    ULONG_CMP_GE(jiffies, rdp->rsp->jiffies_resched)) {
+	if (time_after(jiffies, rdp->rsp->gp_start + jtsq) ||
+	    time_after(jiffies, rdp->rsp->jiffies_resched)) {
 		if (!(READ_ONCE(*rcrmp) & rdp->rsp->flavor_mask)) {
 			WRITE_ONCE(rdp->cond_resched_completed,
 				   READ_ONCE(rdp->mynode->completed));

next prev parent reply	other threads:[~2017-01-04  0:56 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-13 15:14 Fw: [lkp-developer] [sched,rcu] cf7a2dca60: [No primary change] +186% will-it-scale.time.involuntary_context_switches Paul E. McKenney
2016-12-13 15:14 ` Paul E. McKenney
2016-12-14  9:54 ` Michal Hocko
2016-12-14  9:54   ` Michal Hocko
2016-12-14 11:06   ` Paul E. McKenney
2016-12-14 11:06     ` Paul E. McKenney
2016-12-14 16:15     ` Michal Hocko
2016-12-14 16:15       ` Michal Hocko
2016-12-14 16:48       ` Paul E. McKenney
2016-12-14 16:48         ` Paul E. McKenney
2016-12-14 17:39         ` Michal Hocko
2016-12-14 17:39           ` Michal Hocko
2017-01-04  0:55           ` Paul E. McKenney [this message]
2017-01-04  0:55             ` Paul E. McKenney

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:083cb8a6299 dfblob:0369e0e0fe0 dfblob:083cb8a6299
dfblob:0369e0e0fe0 )
 OR (
bs:"Re: Fw: [lkp-developer] [sched,rcu]  cf7a2dca60: [No primary change] +186% will-it-scale.time.involuntary_context_switches" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170104005559.GD3742@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.