From: kernel test robot <rong.a.chen@intel.com>
To: lkp@lists.01.org
Subject: Re: [PM] 8234f6734c: will-it-scale.per_process_ops -3.6% regression
Date: Wed, 16 Jan 2019 23:38:39 +0800
Message-ID: <20190116153839.GA3867@shao2-debian>
In-Reply-To: <CAKfTPtB4cHpF2JcUiOLbmczDVSLEmBCpNDcYqCdAqwYZ2LAsRg@mail.gmail.com>

On Tue, Jan 15, 2019 at 02:13:47PM +0100, Vincent Guittot wrote:
> Hi Rong,
> 
> On Tue, 15 Jan 2019 at 04:24, kernel test robot <rong.a.chen@intel.com> wrote:
> >
> > Greeting,
> >
> > FYI, we noticed a -3.6% regression of will-it-scale.per_process_ops due to commit:
> >
> >
> > commit: 8234f6734c5d74ac794e5517437f51c57d65f865 ("PM-runtime: Switch autosuspend over to using hrtimers")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> 
> Could you rerun with the patch:
> https://lore.kernel.org/patchwork/patch/1030857/ ?
> It optimizes autosuspend by reducing the number of calls to ktime_get.
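
[For readers following the thread, below is a minimal, hypothetical C sketch
of the idea Vincent describes above: defer the clock read until an expiration
time actually has to be computed. Names such as fake_dev and now_ns() are
illustrative stand-ins; this is not the actual drivers/base/power/runtime.c
code.]

    #include <stdint.h>
    #include <stdbool.h>
    #include <time.h>

    #define NSEC_PER_MSEC 1000000ULL

    struct fake_dev {
            bool use_autosuspend;
            int autosuspend_delay_ms;
            uint64_t last_busy_ns;   /* recorded when the device was last used */
    };

    static uint64_t now_ns(void)
    {
            /* user-space stand-in for the kernel's ktime_get() */
            struct timespec ts;
            clock_gettime(CLOCK_MONOTONIC, &ts);
            return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
    }

    /* Return 0 if the device may suspend now, else the expiration time (ns). */
    static uint64_t autosuspend_expiration(const struct fake_dev *dev)
    {
            uint64_t expires;

            if (!dev->use_autosuspend || dev->autosuspend_delay_ms < 0)
                    return 0;   /* bail out early: no clock read at all */

            expires = dev->last_busy_ns +
                      (uint64_t)dev->autosuspend_delay_ms * NSEC_PER_MSEC;

            /* The clock is read only here, where it is actually needed. */
            return expires > now_ns() ? expires : 0;
    }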

Hi Vincent,

The regression of will-it-scale.per_process_ops still exists according to the results.

commit:
   v4.20-rc7
   c534491102 ("PM/runtime: Do not needlessly call ktime_get")

       v4.20-rc7  c534491102b35a2075c78b72bb  
----------------  --------------------------  
         %stddev      change         %stddev
             \          |                \  
  25028944              -4%   23987264        will-it-scale.workload
    240662              -4%     230646        will-it-scale.per_process_ops
     80031                       78804        proc-vmstat.nr_zone_active_anon
     80031                       78804        proc-vmstat.nr_active_anon
      7649 ±173%     -6e+03       1870 ±133%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup_revalidate_dentry.nfs_do_lookup_revalidate.__nfs_lookup_revalidate.lookup_fast.walk_component.link_path_walk
      7654 ±173%     -6e+03       1834 ±133%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_access.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat.filename_lookup
     13537 ±173%     -1e+04          0        latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_get_acl.get_acl.posix_acl_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open
     66199 ±130%     -7e+04          0        latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
    389513 ±161%     -4e+05          0        latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
       629 ± 65%      4e+03       4446 ±123%  latency_stats.max.io_schedule.__lock_page.do_wp_page.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
      7748 ±173%     -6e+03       1899 ±133%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup_revalidate_dentry.nfs_do_lookup_revalidate.__nfs_lookup_revalidate.lookup_fast.walk_component.link_path_walk
      7750 ±173%     -6e+03       1845 ±133%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_access.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat.filename_lookup
     13537 ±173%     -1e+04          0        latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_get_acl.get_acl.posix_acl_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open
     66199 ±130%     -7e+04          0        latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
    392739 ±159%     -4e+05          0        latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
     15365 ± 41%      2e+05     194745 ±123%  latency_stats.sum.io_schedule.__lock_page.do_wp_page.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
      9214 ± 30%      6e+04      71022 ± 22%  latency_stats.sum.pipe_wait.pipe_write.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15299 ±173%     -1e+04       3740 ±133%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup_revalidate_dentry.nfs_do_lookup_revalidate.__nfs_lookup_revalidate.lookup_fast.walk_component.link_path_walk
     13537 ±173%     -1e+04          0        latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_get_acl.get_acl.posix_acl_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open
     22963 ±173%     -2e+04       3668 ±133%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_access.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat.filename_lookup
     66199 ±130%     -7e+04          0        latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
    408736 ±151%     -4e+05          0        latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
     79146 ± 26%       626%     574528 ±  5%  perf-stat.i.iTLB-loads
      8211 ±  7%        56%      12770 ± 14%  perf-stat.i.node-store-misses
      0.28              15%       0.32        perf-stat.overall.branch-miss-rate%
  1.61e+08              11%  1.791e+08        perf-stat.i.branch-misses
     71.87              10%      79.18 ±  3%  perf-stat.overall.node-store-miss-rate%
     13107 ±  4%         7%      14023        perf-stat.i.node-loads
      1.04               5%       1.09        perf-stat.overall.cpi
     99.05                       97.63        perf-stat.i.iTLB-load-miss-rate%
     83.87                       82.36        perf-stat.overall.node-load-miss-rate%
     99.68                       97.65        perf-stat.overall.iTLB-load-miss-rate%
  24777147              -3%   23919344        perf-stat.i.iTLB-load-misses
 2.743e+11              -4%  2.646e+11        perf-stat.i.instructions
 5.791e+10              -4%  5.586e+10        perf-stat.i.branch-instructions
  2.89e+10              -4%  2.787e+10        perf-stat.i.dTLB-stores
 5.964e+10              -4%  5.752e+10        perf-stat.i.dTLB-loads
      0.96              -4%       0.92        perf-stat.i.ipc
 8.333e+13              -4%  7.976e+13        perf-stat.total.instructions
      0.96              -4%       0.92        perf-stat.overall.ipc
    355843 ±  4%       -12%     313369 ±  4%  perf-stat.i.cache-misses

Best Regards,
Rong Chen

> 
> Regards,
> Vincent
> 
> > in testcase: will-it-scale
> > on test machine: 104 threads Skylake with 192G memory
> > with following parameters:
> >
> >         nr_task: 100%
> >         mode: process
> >         test: poll2
> >         cpufreq_governor: performance
> >
> > test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a threads-based test in order to see any differences between the two.
> > test-url: https://github.com/antonblanchard/will-it-scale
> >
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > To reproduce:
> >
> >         git clone https://github.com/intel/lkp-tests.git
> >         cd lkp-tests
> >         bin/lkp install job.yaml  # job file is attached in this email
> >         bin/lkp run     job.yaml
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> >   gcc-7/performance/x86_64-rhel-7.2/process/100%/debian-x86_64-2018-04-03.cgz/lkp-skl-fpga01/poll2/will-it-scale
> >
> > commit:
> >   v4.20-rc7
> >   8234f6734c ("PM-runtime: Switch autosuspend over to using hrtimers")
> >
> >        v4.20-rc7 8234f6734c5d74ac794e551743
> > ---------------- --------------------------
> >        fail:runs  %reproduction    fail:runs
> >            |             |             |
> >            :2           50%           1:4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
> >          %stddev     %change         %stddev
> >              \          |                \
> >     240408            -3.6%     231711        will-it-scale.per_process_ops
> >   25002520            -3.6%   24097991        will-it-scale.workload
> >     351914            -1.7%     345882        interrupts.CAL:Function_call_interrupts
> >       1.77 ± 45%      -1.1        0.64        mpstat.cpu.idle%
> >     106164 ± 24%     -23.2%      81494 ± 28%  numa-meminfo.node0.AnonHugePages
> >     326430 ±  8%     -11.3%     289513        softirqs.SCHED
> >       1294            -2.0%       1268        vmstat.system.cs
> >       3178           +48.4%       4716 ± 16%  slabinfo.eventpoll_pwq.active_objs
> >       3178           +48.4%       4716 ± 16%  slabinfo.eventpoll_pwq.num_objs
> >     336.32          -100.0%       0.00        uptime.boot
> >       3192          -100.0%       0.00        uptime.idle
> >  3.456e+08 ± 76%     -89.9%   34913819 ± 62%  cpuidle.C1E.time
> >     747832 ± 72%     -87.5%      93171 ± 45%  cpuidle.C1E.usage
> >      16209 ± 26%     -38.2%      10021 ± 44%  cpuidle.POLL.time
> >       6352 ± 32%     -39.5%       3843 ± 48%  cpuidle.POLL.usage
> >     885259 ±  2%     -13.8%     763434 ±  7%  numa-vmstat.node0.numa_hit
> >     865117 ±  2%     -13.9%     744992 ±  7%  numa-vmstat.node0.numa_local
> >     405085 ±  7%     +38.0%     558905 ±  9%  numa-vmstat.node1.numa_hit
> >     254056 ± 11%     +59.7%     405824 ± 13%  numa-vmstat.node1.numa_local
> >     738158 ± 73%     -88.5%      85078 ± 47%  turbostat.C1E
> >       1.07 ± 76%      -1.0        0.11 ± 62%  turbostat.C1E%
> >       1.58 ± 49%     -65.4%       0.55 ±  6%  turbostat.CPU%c1
> >       0.15 ± 13%     -35.0%       0.10 ± 38%  turbostat.CPU%c6
> >     153.97 ± 16%     -54.7       99.31        turbostat.PKG_%
> >      64141            +1.5%      65072        proc-vmstat.nr_anon_pages
> >      19541            -7.0%      18178 ±  8%  proc-vmstat.nr_shmem
> >      18296            +1.1%      18506        proc-vmstat.nr_slab_reclaimable
> >     713938            -2.3%     697489        proc-vmstat.numa_hit
> >     693688            -2.4%     677228        proc-vmstat.numa_local
> >     772220            -1.9%     757334        proc-vmstat.pgalloc_normal
> >     798565            -1.8%     784042        proc-vmstat.pgfault
> >     732336            -2.7%     712661        proc-vmstat.pgfree
> >      20.33 ±  4%      -7.0%      18.92        sched_debug.cfs_rq:/.runnable_load_avg.max
> >     160603           -44.5%      89108 ± 38%  sched_debug.cfs_rq:/.spread0.avg
> >     250694           -29.3%     177358 ± 18%  sched_debug.cfs_rq:/.spread0.max
> >       1109 ±  4%      -7.0%       1031        sched_debug.cfs_rq:/.util_avg.max
> >      20.33 ±  4%      -7.2%      18.88        sched_debug.cpu.cpu_load[0].max
> >     -10.00           +35.0%     -13.50        sched_debug.cpu.nr_uninterruptible.min
> >       3.56 ± 10%     +44.2%       5.14 ± 18%  sched_debug.cpu.nr_uninterruptible.stddev
> >      87.10 ± 24%     -34.0%      57.44 ± 37%  sched_debug.cpu.sched_goidle.avg
> >     239.48           -25.6%     178.07 ± 18%  sched_debug.cpu.sched_goidle.stddev
> >     332.67 ±  7%     -25.5%     247.83 ± 13%  sched_debug.cpu.ttwu_count.min
> >     231.67 ±  8%     -15.4%     195.96 ± 12%  sched_debug.cpu.ttwu_local.min
> >      95.47           -95.5        0.00        perf-profile.calltrace.cycles-pp.poll
> >      90.26           -90.3        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.poll
> >      90.08           -90.1        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
> >      89.84           -89.8        0.00        perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
> >      88.04           -88.0        0.00        perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
> >       2.66            -0.1        2.54        perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       1.90            -0.1        1.81        perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64
> >       2.56            +0.1        2.64        perf-profile.calltrace.cycles-pp.__fdget.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.00            +2.3        2.29        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
> >       0.00            +2.3        2.34        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
> >      17.45            +3.8       21.24        perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.00           +92.7       92.66        perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.00           +94.5       94.51        perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.00           +94.8       94.75        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.00           +94.9       94.92        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> >      96.03           -96.0        0.00        perf-profile.children.cycles-pp.poll
> >      90.29           -90.3        0.00        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> >      90.11           -90.1        0.00        perf-profile.children.cycles-pp.do_syscall_64
> >      89.87           -89.9        0.00        perf-profile.children.cycles-pp.__x64_sys_poll
> >      89.39           -89.4        0.00        perf-profile.children.cycles-pp.do_sys_poll
> >      16.19           -16.2        0.00        perf-profile.children.cycles-pp.__fget_light
> >      68.59           -68.6        0.00        perf-profile.self.cycles-pp.do_sys_poll
> >      14.84           -14.8        0.00        perf-profile.self.cycles-pp.__fget_light
> >  1.759e+13          -100.0%       0.00        perf-stat.branch-instructions
> >       0.28            -0.3        0.00        perf-stat.branch-miss-rate%
> >  4.904e+10          -100.0%       0.00        perf-stat.branch-misses
> >       6.79 ±  3%      -6.8        0.00        perf-stat.cache-miss-rate%
> >  1.071e+08 ±  4%    -100.0%       0.00        perf-stat.cache-misses
> >  1.578e+09          -100.0%       0.00        perf-stat.cache-references
> >     385311 ±  2%    -100.0%       0.00        perf-stat.context-switches
> >       1.04          -100.0%       0.00        perf-stat.cpi
> >  8.643e+13          -100.0%       0.00        perf-stat.cpu-cycles
> >      13787          -100.0%       0.00        perf-stat.cpu-migrations
> >       0.00 ±  4%      -0.0        0.00        perf-stat.dTLB-load-miss-rate%
> >   23324811 ±  5%    -100.0%       0.00        perf-stat.dTLB-load-misses
> >  1.811e+13          -100.0%       0.00        perf-stat.dTLB-loads
> >       0.00            -0.0        0.00        perf-stat.dTLB-store-miss-rate%
> >    2478029          -100.0%       0.00        perf-stat.dTLB-store-misses
> >  8.775e+12          -100.0%       0.00        perf-stat.dTLB-stores
> >      99.66           -99.7        0.00        perf-stat.iTLB-load-miss-rate%
> >  7.527e+09          -100.0%       0.00        perf-stat.iTLB-load-misses
> >   25540468 ± 39%    -100.0%       0.00        perf-stat.iTLB-loads
> >   8.33e+13          -100.0%       0.00        perf-stat.instructions
> >      11066          -100.0%       0.00        perf-stat.instructions-per-iTLB-miss
> >       0.96          -100.0%       0.00        perf-stat.ipc
> >     777357          -100.0%       0.00        perf-stat.minor-faults
> >      81.69           -81.7        0.00        perf-stat.node-load-miss-rate%
> >   20040093          -100.0%       0.00        perf-stat.node-load-misses
> >    4491667 ±  7%    -100.0%       0.00        perf-stat.node-loads
> >      75.23 ± 10%     -75.2        0.00        perf-stat.node-store-miss-rate%
> >    3418662 ± 30%    -100.0%       0.00        perf-stat.node-store-misses
> >    1027183 ± 11%    -100.0%       0.00        perf-stat.node-stores
> >     777373          -100.0%       0.00        perf-stat.page-faults
> >    3331644          -100.0%       0.00        perf-stat.path-length
> >
> >
> >
> >                             will-it-scale.per_process_ops
> >
> >   242000 +-+----------------------------------------------------------------+
> >          |                      +.+..   .+..+.      .+.+..+.+.+.    .+.+..  |
> >   240000 +-+                   +     +.+      +.+..+            +..+      +.|
> >   238000 +-+..+.+.  .+.   .+..+                                             |
> >          |        +.   +.+                                                  |
> >   236000 +-+                                                                |
> >          |                                                                  |
> >   234000 +-+                                                                |
> >          |                                  O O O  O                        |
> >   232000 +-+             O O  O O                      O  O O O O  O O O  O |
> >   230000 +-+           O          O  O O O           O                      |
> >          |           O                                                      |
> >   228000 O-+    O O                                                         |
> >          | O  O                                                             |
> >   226000 +-+----------------------------------------------------------------+
> >
> >
> >                                 will-it-scale.workload
> >
> >   2.52e+07 +-+--------------------------------------------------------------+
> >            |                     +..+.   .+..+.      .+. .+.+..+.   .+..+.  |
> >    2.5e+07 +-+                  +     +.+      +.+.+.   +        +.+      +.|
> >   2.48e+07 +-+.+..+. .+.    .+.+                                            |
> >            |        +   +..+                                                |
> >   2.46e+07 +-+                                                              |
> >   2.44e+07 +-+                                                              |
> >            |                                                                |
> >   2.42e+07 +-+               O   O           O O O O        O        O      |
> >    2.4e+07 +-+          O  O   O                        O O    O O O    O O |
> >            |          O             O O O O           O                     |
> >   2.38e+07 O-+    O                                                         |
> >   2.36e+07 +-O O    O                                                       |
> >            |                                                                |
> >   2.34e+07 +-+--------------------------------------------------------------+
> >
> >
> > [*] bisect-good sample
> > [O] bisect-bad  sample
> >
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
> > Thanks,
> > Rong Chen
> _______________________________________________
> LKP mailing list
> LKP@lists.01.org
> https://lists.01.org/mailman/listinfo/lkp
