From: Ye Xiaolong <xiaolong.ye@intel.com>
To: lkp@lists.01.org
Subject: Re: [lkp-robot] [sched/fair] d519329f72: unixbench.score -9.9% regression
Date: Wed, 04 Apr 2018 14:14:06 +0800
Message-ID: <20180404061406.GI3845@yexl-desktop>
In-Reply-To: <20180403114322.GF13951@e110439-lin>

On 04/03, Patrick Bellasi wrote:
>Hi Xiaolong,
>
>On 02-Apr 11:20, kernel test robot wrote:
>> 
>> Greeting,
>> 
>> FYI, we noticed a -9.9% regression of unixbench.score due to commit:
>
>thanks for the report, I'll try to reproduce it locally to better
>understand what's going on.

Thanks for your attention and feedback.

>
>Meanwhile, I'm a little puzzled about some of the following
>numbers... looking at the code will likely make them clearer.
>But maybe someone already knows the answer.
>
>> commit: d519329f72a6f36bc4f2b85452640cfe583b4f81 ("sched/fair: Update util_est only on util_avg updates")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
>To give a bit of context, this patch is the last of a small series
>introducing util_est:
>   https://lkml.org/lkml/2018/3/9/245
>where we actually enable it by turning on by default the corresponding SCHED_FEAT.
>Thus, all the effects of util_est are visible only after this very last patch.
>
>I'm not surprised that, if there are issues related to util_est, they
>manifest at this stage.
>
>> in testcase: unixbench
>> on test machine: 8 threads Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz with 6G memory
>> with following parameters:
>> 
>> 	runtime: 300s
>> 	nr_task: 100%
>> 	test: execl
>> 
>> test-description: UnixBench is the original BYTE UNIX benchmark suite aims to test performance of Unix-like system.
>> test-url: https://github.com/kdlucas/byte-unixbench
>
>AFAIU, this benchmark is composed of 12 different test cases:
>   https://github.com/intel/lkp-tests/blob/master/jobs/unixbench.yaml
>and it's reporting a regression for "only" 1 of those tests (execl).
>
>Is that correct?

Yes, this regression concerns only the execl test in unixbench.

>
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>> 
>> 
>> To reproduce:
>> 
>>         git clone https://github.com/intel/lkp-tests.git
>>         cd lkp-tests
>>         bin/lkp install job.yaml  # job file is attached in this email
>>         bin/lkp run     job.yaml
>
>Will try to give it a run in the next few days.
>
>> 
>> =========================================================================================
>> compiler/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>>   gcc-7/x86_64-rhel-7.2/100%/debian-x86_64-2016-08-31.cgz/300s/nhm-white/execl/unixbench
>> 
>> commit: 
>>   a07630b8b2 ("sched/cpufreq/schedutil: Use util_est for OPP selection")
>>   d519329f72 ("sched/fair: Update util_est only on util_avg updates")
>> 
>> a07630b8b2c16f82 d519329f72a6f36bc4f2b85452 
>> ---------------- -------------------------- 
>>          %stddev     %change         %stddev
>>              \          |                \  
>>       4626            -9.9%       4167        unixbench.score
>
>Is this overall score a composition of the following scores?
>

Actually, unixbench.score is the main performance indicator of unixbench; it is
obtained directly from the raw output of unixbench, as below:

------------------------------------------------------------------------
Benchmark Run: Sat Mar 31 2018 08:51:09 - 08:56:48
8 CPUs in system; running 8 parallel copies of tests

Execl Throughput                              17946.9 lps   (30.0 s, 7 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Execl Throughput                                 43.0      17946.9   4173.7
                                                                   ========
System Benchmarks Index Score (Partial Only)                         4173.7
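
For reference, the INDEX column above is consistent with the result divided by
the baseline and scaled by 10. A minimal sketch, assuming that convention holds
(the function name is only illustrative, it is not part of unixbench or lkp):

```python
def unixbench_index(result: float, baseline: float) -> float:
    """Index score, assuming index = result / baseline * 10."""
    return result / baseline * 10.0


# Values taken from the raw output above.
index = unixbench_index(result=17946.9, baseline=43.0)
print(f"{index:.1f}")  # matches the 4173.7 shown in the INDEX column
```

Since only the execl test was run, the partial index score is just this single
index.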


>In general, it would be nice to see in the following metrics which one
>is considered "the lower the better" or "the higher the better".

That's a good suggestion; we'll work on making these metric changes more readable.

>
>>    3495362 ±  4%     +70.4%    5957769 ±  2%  unixbench.time.involuntary_context_switches
>
>The above seems to indicate an increase in contention, which generated
>scheduler-enforced context switches. But, AFAIU, the test under
>analysis generates just one single task which keeps calling execl on itself.
>I can't see how util_est could have generated an increase in context
>switches. Will investigate further.
>
>>  2.866e+08           -11.6%  2.534e+08        unixbench.time.minor_page_faults
>>     666.75            -9.7%     602.25        unixbench.time.percent_of_cpu_this_job_got
>>       1830            -9.7%       1653        unixbench.time.system_time
>>     395.13            -5.2%     374.58        unixbench.time.user_time
>
>When I read "system_time" and "user_time" I expect them to be
>"the lower the better" metrics.
>Thus, if that's the case, by just looking at these last two metrics:
>doesn't that mean that the test is actually completing faster with
>util_est enabled?

It seems so.

>
>>    8611715           -58.9%    3537314 ±  3%  unixbench.time.voluntary_context_switches
>>    6639375            -9.1%    6033775        unixbench.workload
>
>What does the above metric represent?

unixbench.time.voluntary_context_switches is obtained through time (GNU time 1.7), which we
use to record the time metrics while the benchmark is running. The full output looks like this:

        Command being timed: "/lkp/lkp/src/tests/unixbench"
        User time (seconds): 377.38
        System time (seconds): 1664.25
        Percent of CPU this job got: 601%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 5:39.39
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 8956
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 255131129
        Voluntary context switches: 3520290
        Involuntary context switches: 6137488
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
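
As a rough cross-check, the "Percent of CPU this job got" value in this output
is consistent with (user + system) / elapsed. A minimal sketch, assuming that
relation and assuming the percentage is truncated to an integer (the helper
name is illustrative only):

```python
def parse_elapsed(text: str) -> float:
    """Convert an 'h:mm:ss' or 'm:ss' elapsed-time string to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60.0 + float(part)
    return seconds


user, system = 377.38, 1664.25      # seconds, from the output above
elapsed = parse_elapsed("5:39.39")  # 339.39 seconds of wall-clock time

# Assumption: truncation rather than rounding, which matches the 601% above.
cpu_percent = int((user + system) / elapsed * 100)
print(cpu_percent)
```

With 8 CPUs in the system the ceiling is 800%, so 601% corresponds to about six
CPUs kept busy on average over the run.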

unixbench.workload is defined as the total number of operations performed in
unixbench by all processes/threads.

>
>>      26025         +3849.3%    1027825        interrupts.CAL:Function_call_interrupts
>
>What does this metric represent?

This metric is obtained from /proc/interrupts: we take a snapshot of this
file before and after the test, then sum up the Function call interrupts that
occurred on all CPUs during the test.
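
The idea can be sketched as follows. This is not the actual lkp code, only a
minimal illustration that assumes the usual /proc/interrupts layout (a header
row of CPU names, then one row per interrupt source); the function name and the
sample snapshots are hypothetical:

```python
def cal_total(proc_interrupts: str) -> int:
    """Sum the per-CPU counts on the CAL line of a /proc/interrupts snapshot."""
    lines = proc_interrupts.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "CAL:":
            # The first ncpus fields after the label are the per-CPU counters.
            return sum(int(x) for x in fields[1:1 + ncpus])
    return 0


# Hypothetical two-CPU snapshots, for illustration only.
before = """           CPU0       CPU1
 CAL:        100        200   Function call interrupts
"""
after = """           CPU0       CPU1
 CAL:        150        260   Function call interrupts
"""

delta = cal_total(after) - cal_total(before)
print(delta)  # function call interrupts raised during the test window
```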

Thanks,
Xiaolong
>
>This is a big variation which I cannot easily see as related to
>util_est overheads... unless these are interrupts related to task
>migrations, which potentially can affect LB and WKP code paths.
>
>>       4856 ± 14%     -27.4%       3523 ± 11%  slabinfo.filp.active_objs
>>    3534356            -8.8%    3223918        softirqs.RCU
>>      77929           -11.2%      69172        vmstat.system.cs
>>      19489 ±  2%      +7.5%      20956        vmstat.system.in
>>       9.05 ±  9%     +11.0%      10.05 ±  8%  boot-time.dhcp
>>     131.63 ±  4%      +8.6%     142.89 ±  7%  boot-time.idle
>>       9.07 ±  9%     +11.0%      10.07 ±  8%  boot-time.kernel_boot
>>      76288 ±  3%     -12.8%      66560 ±  3%  meminfo.DirectMap4k
>>      16606           -13.1%      14433        meminfo.Inactive
>>      16515           -13.2%      14341        meminfo.Inactive(anon)
>>      11.87 ±  5%      +7.8       19.63 ±  4%  mpstat.cpu.idle%
>>       0.07 ± 35%      -0.0        0.04 ± 17%  mpstat.cpu.soft%
>>      68.91            -6.1       62.82        mpstat.cpu.sys%
>
>The following:
>
>>   29291570          +325.4%  1.246e+08        cpuidle.C1.time
>>    8629105           -36.1%    5513780        cpuidle.C1.usage
>>     668733 ± 12%  +11215.3%   75668902 ±  2%  cpuidle.C1E.time
>>       9763 ± 12%  +16572.7%    1627882 ±  2%  cpuidle.C1E.usage
>>  1.834e+08 ±  9%     +23.1%  2.258e+08 ± 11%  cpuidle.C3.time
>>     222674 ±  8%    +133.4%     519690 ±  6%  cpuidle.C3.usage
>
>are other really big variations; these metrics represent, AFAIU, the
>following idle states:
>- C1  : Core doesn't work, clocks stopped
>- C1E : Core doesn't work, clocks stopped and voltage lowered
>- C3  : L1 and L2 copied to L3, all core clocks stopped
>
>Thus, it seems to me that, with the patches in, we are more likely to
>sleep longer. Which suggests a more aggressive race-to-idle policy:
>running at higher frequencies to complete faster and sleep
>longer and deeper.
>
>This would kind-of match with the idea of completing faster, but I'm
>not completely sure... however, if that should be the case, again I
>would say that's a benefit, not a regression.
>
>
>>       4129           -13.3%       3581        proc-vmstat.nr_inactive_anon
>>       4129           -13.3%       3581        proc-vmstat.nr_zone_inactive_anon
>>  2.333e+08           -12.2%  2.049e+08        proc-vmstat.numa_hit
>>  2.333e+08           -12.2%  2.049e+08        proc-vmstat.numa_local
>>       6625           -10.9%       5905        proc-vmstat.pgactivate
>>  2.392e+08           -12.1%  2.102e+08        proc-vmstat.pgalloc_normal
>>  2.936e+08           -12.6%  2.566e+08        proc-vmstat.pgfault
>>  2.392e+08           -12.1%  2.102e+08        proc-vmstat.pgfree
>
>All the following metrics:
>
>>       2850           -15.3%       2413        turbostat.Avg_MHz
>>    8629013           -36.1%    5513569        turbostat.C1
>>       1.09            +3.5        4.61        turbostat.C1%
>>       9751 ± 12%  +16593.0%    1627864 ±  2%  turbostat.C1E
>>       0.03 ± 19%      +2.8        2.80        turbostat.C1E%
>>     222574 ±  8%    +133.4%     519558 ±  6%  turbostat.C3
>>       6.84 ±  8%      +1.5        8.34 ± 10%  turbostat.C3%
>>       2.82 ±  7%    +250.3%       9.87 ±  2%  turbostat.CPU%c1
>>    6552773 ±  3%     +23.8%    8111699 ±  2%  turbostat.IRQ
>>       2.02 ± 11%     +28.3%       2.58 ±  9%  turbostat.Pkg%pc3
>
>maybe they can help to understand better what's going on with respect
>to the race-to-idle theory...
>
>
>>  7.635e+11           -12.5%  6.682e+11        perf-stat.branch-instructions
>>  3.881e+10           -12.9%  3.381e+10        perf-stat.branch-misses
>>       2.09            -0.3        1.77 ±  4%  perf-stat.cache-miss-rate%
>>  1.551e+09           -15.1%  1.316e+09 ±  4%  perf-stat.cache-misses
>>   26177920           -10.5%   23428188        perf-stat.context-switches
>>       1.99            -2.8%       1.93        perf-stat.cpi
>>  7.553e+12           -14.7%  6.446e+12        perf-stat.cpu-cycles
>
>This:
>
>>     522523 ±  2%    +628.3%    3805664        perf-stat.cpu-migrations
>
>is another good point. With util_est we are affecting LB and WKP
>paths... but, given the specific execl test, I'm not entirely sure how we
>can affect migrations using util_est.
>
>I have to check these two points more carefully:
>- do we reset PELT after an execl?
>- do we trigger a possible task migration after an execl?
>
>
>>  2.425e+10 ±  4%     -14.3%  2.078e+10        perf-stat.dTLB-load-misses
>>  1.487e+12           -11.3%  1.319e+12        perf-stat.dTLB-loads
>>  1.156e+10 ±  3%      -7.7%  1.066e+10        perf-stat.dTLB-store-misses
>>  6.657e+11           -11.1%  5.915e+11        perf-stat.dTLB-stores
>>       0.15            +0.0        0.15        perf-stat.iTLB-load-miss-rate%
>>  5.807e+09           -11.0%  5.166e+09        perf-stat.iTLB-load-misses
>>  3.799e+12           -12.1%   3.34e+12        perf-stat.iTLB-loads
>>  3.803e+12           -12.2%  3.338e+12        perf-stat.instructions
>>     654.99            -1.4%     646.07        perf-stat.instructions-per-iTLB-miss
>>       0.50            +2.8%       0.52        perf-stat.ipc
>>  2.754e+08           -11.6%  2.435e+08        perf-stat.minor-faults
>>  1.198e+08 ±  7%     +73.1%  2.074e+08 ±  4%  perf-stat.node-stores
>>  2.754e+08           -11.6%  2.435e+08        perf-stat.page-faults
>>     572928            -3.4%     553258        perf-stat.path-length
>> 
>> 
>>                                                                                 
>>                                   unixbench.score                               
>>                                                                                 
>>   4800 +-+------------------------------------------------------------------+   
>>        |+                                      +     +                      |   
>>   4700 +-+ +                +                  :+ +. :+ + +                 |   
>>        |  + +   +    +.     :  +              +  +  +  + + + .+++++    .+  +|   
>>   4600 +-+   +++ :+++  + ++: : :+  +++ ++.++++              +      ++++  ++ |   
>>        |         +      +  + ++  ++   +                                     |   
>>   4500 +-+                                                                  |   
>>        |                                                                    |   
>>   4400 +-+                                                                  |   
>>        |                                                                    |   
>>   4300 +-+                                                                  |   
>>        O                                                                    |   
>>   4200 +-O        O O    OOOO OO OOO OOOO  OOOO O O                         |   
>>        |O OO OOOOO O O OO    O      O     O    O O  OO                      |   
>>   4100 +-+------------------------------------------------------------------+   
>>                                                                                 
>>                                                                                                                                                                 
>>                                   unixbench.workload                            
>>                                                                                 
>>     9e+06 +-+---------------------------------------------------------------+   
>>           |                   :                                             |   
>>   8.5e+06 +-+                 :                                             |   
>>           |                   :                                             |   
>>     8e+06 +-+                 :                                             |   
>>           |                   ::                                            |   
>>   7.5e+06 +-+                : : +                                          |   
>>           |                 +: : :       +                                  |   
>>     7e+06 +-+      +    +   :: : ::  + + :  +   +     +    +            +   |   
>>           |:+ +   + :   :: : : :: :  :+ : : ::+ :+ .+ :+ ++ ++ + ++     ::++|   
>>   6.5e+06 +-O+ +++  ++++ +++ + ++ +.+   + ++ + +  +  +  +     + +  +.+++ +  |   
>>           O                O    O            O O O    O                     |   
>>     6e+06 +O+OOO O OOOOOOOO OOOO OO OOOOOOOOO O O O OO                      |   
>>           |     O                                                           |   
>>   5.5e+06 +-+---------------------------------------------------------------+   
>>                                                                                 
>>                                                                                 
>>                                                                                 
>> [*] bisect-good sample
>> [O] bisect-bad  sample
>> 
>> 
>> 
>> Disclaimer:
>> Results have been estimated based on internal Intel analysis and are provided
>> for informational purposes only. Any difference in system hardware or software
>> design or configuration may affect actual performance.
>> 
>> 
>> Thanks,
>> Xiaolong
>
>[...]
>
>> #
>> # CPU Frequency scaling
>> #
>> CONFIG_CPU_FREQ=y
>> CONFIG_CPU_FREQ_GOV_ATTR_SET=y
>> CONFIG_CPU_FREQ_GOV_COMMON=y
>> # CONFIG_CPU_FREQ_STAT is not set
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
>
>The governor in use is not schedutil... thus util_est could affect the
>test just because of signal tracking overheads, or because of the way
>we affect task placement in WKP and LB paths... which can be
>correlated to the impact on task migrations and preemption...
>
>> CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
>> CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
>> CONFIG_CPU_FREQ_GOV_POWERSAVE=y
>> CONFIG_CPU_FREQ_GOV_USERSPACE=y
>> CONFIG_CPU_FREQ_GOV_ONDEMAND=y
>> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
>> # CONFIG_CPU_FREQ_GOV_SCHEDUTIL is not set
>> 
>> #
>> # CPU frequency scaling drivers
>> #
>> CONFIG_X86_INTEL_PSTATE=y
>> CONFIG_X86_PCC_CPUFREQ=m
>> CONFIG_X86_ACPI_CPUFREQ=m
>> CONFIG_X86_ACPI_CPUFREQ_CPB=y
>> CONFIG_X86_POWERNOW_K8=m
>> CONFIG_X86_AMD_FREQ_SENSITIVITY=m
>> # CONFIG_X86_SPEEDSTEP_CENTRINO is not set
>> CONFIG_X86_P4_CLOCKMOD=m
>> 
>
>-- 
>#include <best/regards.h>
>
>Patrick Bellasi

WARNING: multiple messages have this Message-ID (diff)
From: Ye Xiaolong <xiaolong.ye@intel.com>
To: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	Chris Redpath <chris.redpath@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Joel Fernandes <joelaf@google.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Paul Turner <pjt@google.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Steve Muckle <smuckle@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Todd Kjos <tkjos@android.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	lkp@01.org
Subject: Re: [lkp-robot] [sched/fair]  d519329f72:  unixbench.score -9.9% regression
Date: Wed, 4 Apr 2018 14:14:06 +0800
Message-ID: <20180404061406.GI3845@yexl-desktop>
In-Reply-To: <20180403114322.GF13951@e110439-lin>

>>                                                                                 
>> [*] bisect-good sample
>> [O] bisect-bad  sample
>> 
>> 
>> 
>> Disclaimer:
>> Results have been estimated based on internal Intel analysis and are provided
>> for informational purposes only. Any difference in system hardware or software
>> design or configuration may affect actual performance.
>> 
>> 
>> Thanks,
>> Xiaolong
>
>[...]
>
>> #
>> # CPU Frequency scaling
>> #
>> CONFIG_CPU_FREQ=y
>> CONFIG_CPU_FREQ_GOV_ATTR_SET=y
>> CONFIG_CPU_FREQ_GOV_COMMON=y
>> # CONFIG_CPU_FREQ_STAT is not set
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
>
>The governor in use is not schedutil, so util_est could affect the
>test only through signal-tracking overhead, or through the way it
>changes task placement in the wakeup (WK) and load-balance (LB)
>paths, which could be correlated with the impact on task migrations
>and preemption.
>
>> CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
>> CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
>> CONFIG_CPU_FREQ_GOV_POWERSAVE=y
>> CONFIG_CPU_FREQ_GOV_USERSPACE=y
>> CONFIG_CPU_FREQ_GOV_ONDEMAND=y
>> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
>> # CONFIG_CPU_FREQ_GOV_SCHEDUTIL is not set
>> 
>> #
>> # CPU frequency scaling drivers
>> #
>> CONFIG_X86_INTEL_PSTATE=y
>> CONFIG_X86_PCC_CPUFREQ=m
>> CONFIG_X86_ACPI_CPUFREQ=m
>> CONFIG_X86_ACPI_CPUFREQ_CPB=y
>> CONFIG_X86_POWERNOW_K8=m
>> CONFIG_X86_AMD_FREQ_SENSITIVITY=m
>> # CONFIG_X86_SPEEDSTEP_CENTRINO is not set
>> CONFIG_X86_P4_CLOCKMOD=m
>> 
>
>-- 
>#include <best/regards.h>
>
>Patrick Bellasi
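
The governor observation above can be checked mechanically against the
quoted config. The following is a minimal sketch, not part of the original
report: the helper name cpufreq_governors and the SAMPLE text are
assumptions made for illustration, and the sample lines are copied from the
quoted .config. It only reports what was compiled in and which governor is
the build-time default; the governor actually active at run time would have
to be read on the test machine (for example from
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor).

```python
import re

# Real cpufreq governor names; helper symbols such as GOV_COMMON are ignored.
GOVERNORS = {"PERFORMANCE", "POWERSAVE", "USERSPACE",
             "ONDEMAND", "CONSERVATIVE", "SCHEDUTIL"}

def cpufreq_governors(config_text):
    """Hypothetical helper: return (default_governor, built_governors)
    parsed from a kernel .config fragment, tolerating mail '>' quoting."""
    default = None
    built = set()
    for raw in config_text.splitlines():
        # Strip mail quoting such as ">> " before parsing.
        line = re.sub(r"^(?:>\s?)+", "", raw).strip()
        m = re.match(r"CONFIG_CPU_FREQ_(DEFAULT_)?GOV_([A-Z]+)=([ym])$", line)
        if not m or m.group(2) not in GOVERNORS:
            continue
        name = m.group(2).lower()
        if m.group(1):
            default = name      # CONFIG_CPU_FREQ_DEFAULT_GOV_<NAME>=y
        else:
            built.add(name)     # CONFIG_CPU_FREQ_GOV_<NAME>=y or =m
    return default, built

# Lines copied from the config quoted in this thread.
SAMPLE = """\
>> CONFIG_CPU_FREQ_GOV_ATTR_SET=y
>> CONFIG_CPU_FREQ_GOV_COMMON=y
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
>> CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
>> CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
>> CONFIG_CPU_FREQ_GOV_POWERSAVE=y
>> CONFIG_CPU_FREQ_GOV_USERSPACE=y
>> CONFIG_CPU_FREQ_GOV_ONDEMAND=y
>> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
>> # CONFIG_CPU_FREQ_GOV_SCHEDUTIL is not set
"""

default, built = cpufreq_governors(SAMPLE)
print("default governor:", default)
print("schedutil built:", "schedutil" in built)
```

For this config the default is ondemand and schedutil is not built at all,
which is consistent with the point that any util_est effect in this test
would not come through frequency selection.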
