From: Vincent Guittot <vincent.guittot@linaro.org>
To: lkp@lists.01.org
Subject: Re: [lkp-developer] [sched/fair] 4e5160766f: +149% ftq.noise.50% regression
Date: Thu, 22 Dec 2016 16:12:15 +0100
Message-ID: <20161222151215.GA23448@linaro.org>
In-Reply-To: <878trk8urx.fsf@yhuang-dev.intel.com>

On Tuesday 13 Dec 2016 at 09:47:30 (+0800), Huang, Ying wrote:
> Hi, Vincent,
> 
> Vincent Guittot <vincent.guittot@linaro.org> writes:
> 
> > Hi Ying,
> >
> > On 12 December 2016 at 06:43, kernel test robot
> > <ying.huang@linux.intel.com> wrote:
> >> Greeting,
> >>
> >> FYI, we noticed a 149% regression of ftq.noise.50% due to commit:
> >>
> >>
> >> commit: 4e5160766fcc9f41bbd38bac11f92dce993644aa ("sched/fair: Propagate asynchrous detach")
> >> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> >>
> >> in testcase: ftq
> >> on test machine: 8 threads Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 8G memory
> >> with following parameters:
> >>
> >>         nr_task: 100%
> >>         samples: 6000ss
> >>         test: cache
> >>         freq: 20
> >>         cpufreq_governor: powersave
> >
> > Why use powersave? Are you testing every governor?
> 
> We will test the performance and powersave governors for FTQ.

Ok thanks

> 
> >>
> >> test-description: The FTQ benchmarks measure hardware and software interference or 'noise' on a node from the applications perspective.
> >> test-url: https://github.com/rminnich/ftq
> >
> > It's a bit difficult to understand exactly what is measured and what
> > is ftq.noise.50% because this result is not part of the bench which
> > seems to only record a log of data in a file and ftq.noise.50% seems
> > to be lkp specific
> 
> Yes. FTQ itself has no noise statistics built in, although it is an OS
> noise benchmark.  ftq.noise.50% is calculated as below:
> 
> There is a score for every sample of ftq.  The lower the score, the
> higher the noise.  ftq.noise.50% is the number (per 1000000 samples) of
> samples whose score is less than 50% of the mean score.
> 

OK, so IIUC ftq.noise.50% has moved from about 0.14% to about 0.35% of the samples (1386 to 3457 per 1000000).
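
For reference, here is a minimal sketch of how I read the definition above. It is not the actual lkp code: the function names and the sample array are hypothetical, and it only assumes what was stated, namely that the metric counts samples whose score is below a given fraction of the mean score and scales that count to 1000000 samples. The second helper converts such a per-million count into a percentage of samples.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of samples whose score is below
 * `fraction` of the mean score, scaled to a count per 1000000 samples.
 * Returns 0 for an empty sample set.
 */
static double ftq_noise_per_million(const double *score, size_t n,
				    double fraction)
{
	double sum = 0.0;
	double threshold;
	size_t below = 0;
	size_t i;

	if (n == 0)
		return 0.0;

	for (i = 0; i < n; i++)
		sum += score[i];

	/* A sample counts as noisy when it falls below this threshold. */
	threshold = fraction * (sum / (double)n);

	for (i = 0; i < n; i++)
		if (score[i] < threshold)
			below++;

	return (double)below * 1000000.0 / (double)n;
}

/* Convert a per-million count into a percentage of samples. */
static double per_million_to_percent(double per_million)
{
	return per_million / 10000.0;
}
```

With the values from the report table, 1386 and 3457 per million (the ftq.noise.50% row) correspond to about 0.14% and 0.35% of the samples, while 305 and 1100 per million (the ftq.noise.75% row) correspond to about 0.03% and 0.11%.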

I have not been able to reproduce the regression on the different systems that I have access to, so I can only guess at the root cause.

Would it be possible to test whether the patch below fixes the regression?


---
 kernel/sched/fair.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 090a9bb..8efa113 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3138,6 +3138,31 @@ static inline int propagate_entity_load_avg(struct sched_entity *se)
 	return 1;
 }
 
+/* Check if we need to update the load and the utilization of a group_entity */
+static inline bool skip_blocked_update(struct sched_entity *se)
+{
+	struct cfs_rq *gcfs_rq = group_cfs_rq(se);
+
+	/*
+	 * If the sched_entity still has non-zero load or utilization, we
+	 * have to decay it.
+	 */
+	if (se->avg.load_avg || se->avg.util_avg)
+		return false;
+
+	/*
+	 * If there is a pending propagation, we have to update the load and
+	 * the utilization of the sched_entity.
+	 */
+	if (gcfs_rq->propagate_avg)
+		return false;
+
+	/*
+	 * Otherwise, the load and the utilization of the sched_entity are
+	 * already null, so it would be a waste of time to try to decay them.
+	 */
+	return true;
+}
 #else /* CONFIG_FAIR_GROUP_SCHED */
 
 static inline void update_tg_load_avg(struct cfs_rq *cfs_rq, int force) {}
@@ -6858,6 +6883,7 @@ static void update_blocked_averages(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 	struct cfs_rq *cfs_rq;
+	struct sched_entity *se;
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&rq->lock, flags);
@@ -6876,7 +6902,8 @@ static void update_blocked_averages(int cpu)
 			update_tg_load_avg(cfs_rq, 0);
 
 		/* Propagate pending load changes to the parent */
-		if (cfs_rq->tg->se[cpu])
+		se = cfs_rq->tg->se[cpu];
+		if (se && !skip_blocked_update(se))
 			update_load_avg(cfs_rq->tg->se[cpu], 0);
 	}
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
-- 
2.7.4
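
To make the decision taken by skip_blocked_update() easier to follow outside the kernel tree, here is a small user-space model of it. This is only a sketch: the model_* structures are simplified stand-ins I made up for illustration and keep just the fields the check reads, and the group cfs_rq is reached through a plain pointer rather than group_cfs_rq().

```c
#include <stdbool.h>

/* Simplified stand-in for struct cfs_rq (hypothetical, illustration only). */
struct model_cfs_rq {
	unsigned long propagate_avg;	/* non-zero when a propagation is pending */
};

/* Simplified stand-in for a group sched_entity (hypothetical). */
struct model_sched_entity {
	unsigned long load_avg;
	unsigned long util_avg;
	struct model_cfs_rq *group_rq;	/* cfs_rq owned by this group entity */
};

/* Return true when updating the blocked entity would do no useful work. */
static bool model_skip_blocked_update(const struct model_sched_entity *se)
{
	/* Remaining load or utilization still has to be decayed. */
	if (se->load_avg || se->util_avg)
		return false;

	/* A pending propagation still has to be applied to the entity. */
	if (se->group_rq->propagate_avg)
		return false;

	/* Nothing left to decay and nothing to propagate. */
	return true;
}
```

The update is skipped only when both averages are already zero and no propagation is pending; in every other case update_load_avg() is still called as before.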

Thanks


> Best Regards,
> Huang, Ying
> 
> > I have tried to reproduce the lkp test on Debian jessie and then on
> > Ubuntu Server 16.10, but lkp doesn't seem to install cleanly as there
> > are some errors:
> >
> > sudo bin/lkp run     job.yaml
> > IPMI BMC is not supported on this machine, skip bmc-watchdog setup!
> > 2016-12-12 13:58:39 ./ftq_cache -f 20 -n 6000 -t 8 -a 524288
> > Start 5088418680237 end 5438443372098 elapsed 350024691861
> > cyclestart 14236344834332 cycleend 15214154208877 elapsed 977809374545
> > Avg Cycles(ticks) per ns. is 2.793544; nspercycle is 0.357968
> > Pre-computed ticks per ns: 2.793541
> > Sample frequency is 20.000000
> > ticks per ns 2.79354
> > chown: invalid user: «lkp.lkp»
> > chown: invalid user: «lkp.lkp»
> > wait for background monitors: 9405 9407 oom-killer nfs-hang
> > curl: (6) Could not resolve host: ftq.time
> >
> >
> >>
> >> In addition to that, the commit also has significant impact on the following tests:
> >>
> >> +------------------+--------------------------------------------------------------------------------+
> >> | testcase: change | unixbench: unixbench.score 2.7% improvement                                    |
> >> | test machine     | 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory               |
> >> | test parameters  | cpufreq_governor=performance                                                   |
> >> |                  | nr_task=100%                                                                   |
> >> |                  | runtime=300s                                                                   |
> >> |                  | test=execl                                                                     |
> >> +------------------+--------------------------------------------------------------------------------+
> >>
> >>
> >> Details are as below:
> >> -------------------------------------------------------------------------------------------------->
> >>
> >>
> >> To reproduce:
> >>
> >>         git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> >>         cd lkp-tests
> >>         bin/lkp install job.yaml  # job file is attached in this email
> >>         bin/lkp run     job.yaml
> >>
> >> testcase/path_params/tbox_group/run: ftq/100%-6000ss-cache-20-powersave/lkp-hsw-d01
> >>
> >> 09a43ace1f986b00  4e5160766fcc9f41bbd38bac11
> >> ----------------  --------------------------
> >>          %stddev      change         %stddev
> >>              \          |                \
> >>        305 ± 30%       260%       1100 ± 14%  ftq.noise.75%
> >>       1386 ± 19%       149%       3457 ±  7%  ftq.noise.50%
> >>       2148 ± 11%        98%       4257 ±  4%  ftq.noise.25%
> >>    3963589                     3898578        ftq.time.involuntary_context_switches
> >>
> >>
> >>
> >>                                    ftq.noise.50_
> >>
> >>   4000 ++------------O------------------------------------------------------+
> >>        |                                                           O      O |
> >>   3500 ++     O             O                        O    O O O             O
> >>        | O  O      O   O      O O  O O O    O O    O   O         O          |
> >>        O        O                         O                          O O    |
> >>   3000 ++                                       O                           |
> >>        |                 O                                                  |
> >>   2500 ++                                                                   |
> >>        |                                                                    |
> >>   2000 ++                                                                   |
> >>        |    *                  .*                                           |
> >>        |   + :     *   *      *  +                                          |
> >>   1500 ++ +  :    + + + +    :    + .*                                      |
> >>        |.*    *. +   *   *.. :     *  +                                     |
> >>   1000 *+-------*-----------*----------*------------------------------------+
> >>
> >>         [*] bisect-good sample
> >>         [O] bisect-bad  sample
> >>
> >>
> >> Disclaimer:
> >> Results have been estimated based on internal Intel analysis and are provided
> >> for informational purposes only. Any difference in system hardware or software
> >> design or configuration may affect actual performance.
> >>
> >>
> >> Thanks,
> >> Ying Huang
> > _______________________________________________
> > LKP mailing list
> > LKP@lists.01.org
> > https://lists.01.org/mailman/listinfo/lkp

WARNING: multiple messages have this Message-ID (diff)
From: Vincent Guittot <vincent.guittot@linaro.org>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
	Andi Kleen <ak@linux.intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>, LKP <lkp@01.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [LKP] [lkp-developer] [sched/fair] 4e5160766f: +149% ftq.noise.50% regression
Date: Thu, 22 Dec 2016 16:12:15 +0100
Message-ID: <20161222151215.GA23448@linaro.org>
In-Reply-To: <878trk8urx.fsf@yhuang-dev.intel.com>


