* [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses
From: Huang Ying @ 2015-04-23  6:55 UTC
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 15090 bytes --]

FYI, we noticed the below changes on

git://neil.brown.name/md for-next
commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")


testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
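
The testparams string packs several job settings into one dash-separated name. The sketch below splits it back into named fields. The key order is an assumption inferred from the order of the matching entries in the attached job.yaml (wait_disks_timeout, runtime, disk, md, iosched, fs, nr_threads); decode_testparams is a hypothetical helper, not part of lkp-tests.

```shell
# Hypothetical helper: split an LKP testparams string into key/value lines.
decode_testparams() {
    # Assumed key order, inferred from the attached job.yaml.
    keys="wait_disks_timeout runtime disk md iosched fs nr_threads"
    old_ifs=$IFS
    IFS=-
    set -- $1              # split the parameter string on "-"
    IFS=$old_ifs
    for key in $keys; do
        printf '%s: %s\n' "$key" "$1"
        [ "$#" -gt 0 ] && shift   # stop shifting once the fields run out
    done
}

decode_testparams 300-5m-11HDD-RAID5-cfq-xfs-1dd
```

For this report the fourth line of output is "md: RAID5" and the last is "nr_threads: 1dd", matching the md and nr_threads entries in job.yaml.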

a87d7f782b47e030  878ee6792799e2f88bdcac3298  
----------------  --------------------------  
         %stddev     %change         %stddev
             \          |                \  
     59035 ±  0%     +18.4%      69913 ±  1%  softirqs.SCHED
      1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.num_objs
      1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.active_objs
    305908 ±  0%      -1.8%     300427 ±  0%  vmstat.io.bo
         1 ±  0%    +100.0%          2 ±  0%  vmstat.procs.r
      8266 ±  1%     -15.7%       6968 ±  0%  vmstat.system.cs
     14819 ±  0%      -2.1%      14503 ±  0%  vmstat.system.in
     18.20 ±  6%     +10.2%      20.05 ±  4%  perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
      1.94 ±  9%     +90.6%       3.70 ±  9%  perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
      0.00 ±  0%      +Inf%      25.18 ±  3%  perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
      0.00 ±  0%      +Inf%      14.14 ±  4%  perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
      1.79 ±  7%    +102.9%       3.64 ±  9%  perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
      3.09 ±  4%     -10.8%       2.76 ±  4%  perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
      0.80 ± 14%     +28.1%       1.02 ± 10%  perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
     14.78 ±  6%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
     25.68 ±  4%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
      1.23 ±  5%    +140.0%       2.96 ±  7%  perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
      2.62 ±  6%     -95.6%       0.12 ± 33%  perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
      0.96 ±  9%     +17.5%       1.12 ±  2%  perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
 1.461e+10 ±  0%      -5.3%  1.384e+10 ±  1%  perf-stat.L1-dcache-load-misses
 3.688e+11 ±  0%      -2.7%   3.59e+11 ±  0%  perf-stat.L1-dcache-loads
 1.124e+09 ±  0%     -27.7%  8.125e+08 ±  0%  perf-stat.L1-dcache-prefetches
 2.767e+10 ±  0%      -1.8%  2.717e+10 ±  0%  perf-stat.L1-dcache-store-misses
 2.352e+11 ±  0%      -2.8%  2.287e+11 ±  0%  perf-stat.L1-dcache-stores
 6.774e+09 ±  0%      -2.3%   6.62e+09 ±  0%  perf-stat.L1-icache-load-misses
 5.571e+08 ±  0%     +40.5%  7.826e+08 ±  1%  perf-stat.LLC-load-misses
 6.263e+09 ±  0%     -13.7%  5.407e+09 ±  1%  perf-stat.LLC-loads
 1.914e+11 ±  0%      -4.2%  1.833e+11 ±  0%  perf-stat.branch-instructions
 1.145e+09 ±  2%      -5.6%  1.081e+09 ±  0%  perf-stat.branch-load-misses
 1.911e+11 ±  0%      -4.3%  1.829e+11 ±  0%  perf-stat.branch-loads
 1.142e+09 ±  2%      -5.1%  1.083e+09 ±  0%  perf-stat.branch-misses
 1.218e+09 ±  0%     +19.8%   1.46e+09 ±  0%  perf-stat.cache-misses
 2.118e+10 ±  0%      -5.2%  2.007e+10 ±  0%  perf-stat.cache-references
   2510308 ±  1%     -15.7%    2115410 ±  0%  perf-stat.context-switches
     39623 ±  0%     +22.1%      48370 ±  1%  perf-stat.cpu-migrations
 4.179e+08 ± 40%    +165.7%  1.111e+09 ± 35%  perf-stat.dTLB-load-misses
 3.684e+11 ±  0%      -2.5%  3.592e+11 ±  0%  perf-stat.dTLB-loads
 1.232e+08 ± 15%     +62.5%  2.002e+08 ± 27%  perf-stat.dTLB-store-misses
 2.348e+11 ±  0%      -2.5%  2.288e+11 ±  0%  perf-stat.dTLB-stores
   3577297 ±  2%      +8.7%    3888986 ±  1%  perf-stat.iTLB-load-misses
 1.035e+12 ±  0%      -3.5%  9.988e+11 ±  0%  perf-stat.iTLB-loads
 1.036e+12 ±  0%      -3.7%  9.978e+11 ±  0%  perf-stat.instructions
       594 ± 30%    +130.3%       1369 ± 13%  sched_debug.cfs_rq[0]:/.blocked_load_avg
        17 ± 10%     -28.2%         12 ± 23%  sched_debug.cfs_rq[0]:/.nr_spread_over
       210 ± 21%     +42.1%        298 ± 28%  sched_debug.cfs_rq[0]:/.tg_runnable_contrib
      9676 ± 21%     +42.1%      13754 ± 28%  sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
       772 ± 25%    +116.5%       1672 ±  9%  sched_debug.cfs_rq[0]:/.tg_load_contrib
      8402 ±  9%     +83.3%      15405 ± 11%  sched_debug.cfs_rq[0]:/.tg_load_avg
      8356 ±  9%     +82.8%      15272 ± 11%  sched_debug.cfs_rq[1]:/.tg_load_avg
       968 ± 25%    +100.8%       1943 ± 14%  sched_debug.cfs_rq[1]:/.blocked_load_avg
     16242 ±  9%     -22.2%      12643 ± 14%  sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
       353 ±  9%     -22.1%        275 ± 14%  sched_debug.cfs_rq[1]:/.tg_runnable_contrib
      1183 ± 23%     +77.7%       2102 ± 12%  sched_debug.cfs_rq[1]:/.tg_load_contrib
       181 ±  8%     -31.4%        124 ± 26%  sched_debug.cfs_rq[2]:/.tg_runnable_contrib
      8364 ±  8%     -31.3%       5745 ± 26%  sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
      8297 ±  9%     +81.7%      15079 ± 12%  sched_debug.cfs_rq[2]:/.tg_load_avg
     30439 ± 13%     -45.2%      16681 ± 26%  sched_debug.cfs_rq[2]:/.exec_clock
     39735 ± 14%     -48.3%      20545 ± 29%  sched_debug.cfs_rq[2]:/.min_vruntime
      8231 ± 10%     +82.2%      15000 ± 12%  sched_debug.cfs_rq[3]:/.tg_load_avg
      1210 ± 14%    +110.3%       2546 ± 30%  sched_debug.cfs_rq[4]:/.tg_load_contrib
      8188 ± 10%     +82.8%      14964 ± 12%  sched_debug.cfs_rq[4]:/.tg_load_avg
      8132 ± 10%     +83.1%      14890 ± 12%  sched_debug.cfs_rq[5]:/.tg_load_avg
       749 ± 29%    +205.9%       2292 ± 34%  sched_debug.cfs_rq[5]:/.blocked_load_avg
       963 ± 30%    +169.9%       2599 ± 33%  sched_debug.cfs_rq[5]:/.tg_load_contrib
     37791 ± 32%     -38.6%      23209 ± 13%  sched_debug.cfs_rq[6]:/.min_vruntime
       693 ± 25%    +132.2%       1609 ± 29%  sched_debug.cfs_rq[6]:/.blocked_load_avg
     10838 ± 13%     -39.2%       6587 ± 13%  sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
     29329 ± 27%     -33.2%      19577 ± 10%  sched_debug.cfs_rq[6]:/.exec_clock
       235 ± 14%     -39.7%        142 ± 14%  sched_debug.cfs_rq[6]:/.tg_runnable_contrib
      8085 ± 10%     +83.6%      14848 ± 12%  sched_debug.cfs_rq[6]:/.tg_load_avg
       839 ± 25%    +128.5%       1917 ± 18%  sched_debug.cfs_rq[6]:/.tg_load_contrib
      8051 ± 10%     +83.6%      14779 ± 12%  sched_debug.cfs_rq[7]:/.tg_load_avg
       156 ± 34%     +97.9%        309 ± 19%  sched_debug.cpu#0.cpu_load[4]
       160 ± 25%     +64.0%        263 ± 16%  sched_debug.cpu#0.cpu_load[2]
       156 ± 32%     +83.7%        286 ± 17%  sched_debug.cpu#0.cpu_load[3]
       164 ± 20%     -35.1%        106 ± 31%  sched_debug.cpu#2.cpu_load[0]
       249 ± 15%     +80.2%        449 ± 10%  sched_debug.cpu#4.cpu_load[3]
       231 ± 11%    +101.2%        466 ± 13%  sched_debug.cpu#4.cpu_load[2]
       217 ± 14%    +189.9%        630 ± 38%  sched_debug.cpu#4.cpu_load[0]
     71951 ±  5%     +21.6%      87526 ±  7%  sched_debug.cpu#4.nr_load_updates
       214 ±  8%    +146.1%        527 ± 27%  sched_debug.cpu#4.cpu_load[1]
       256 ± 17%     +75.7%        449 ± 13%  sched_debug.cpu#4.cpu_load[4]
       209 ± 23%     +98.3%        416 ± 48%  sched_debug.cpu#5.cpu_load[2]
     68024 ±  2%     +18.8%      80825 ±  1%  sched_debug.cpu#5.nr_load_updates
       217 ± 26%     +74.9%        380 ± 45%  sched_debug.cpu#5.cpu_load[3]
       852 ± 21%     -38.3%        526 ± 22%  sched_debug.cpu#6.curr->pid
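
For readers checking the table above, the %change column can be reproduced from the two averages on each row. This is a minimal sketch, assuming %change is the plain relative difference (new - base) / base; pct_change is a hypothetical helper, and the report itself may compute the figure from unrounded averages.

```shell
# pct_change BASE NEW: print the relative change of NEW versus BASE as a
# signed percentage with one decimal, in the shape of the %change column.
pct_change() {
    awk -v base="$1" -v new="$2" 'BEGIN {
        # A zero baseline has no finite relative change; the table shows +Inf%.
        if (base == 0) { print "+Inf%"; exit }
        printf "%+.1f%%\n", (new - base) / base * 100
    }'
}

pct_change 305908 300427        # vmstat.io.bo               -> -1.8%
pct_change 5.571e+08 7.826e+08  # perf-stat.LLC-load-misses  -> +40.5%
pct_change 8266 6968            # vmstat.system.cs           -> -15.7%
```

Rows whose baseline is 0.00, such as the handle_active_stripes.isra.45 entry, appear as +Inf% because the compiler-generated symbol suffix differs between the two kernels (isra.41 versus isra.45), not because the function is new.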

lkp-st02: Core2
Memory: 8G




                                perf-stat.cache-misses

  1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
          |                       O   O  O   O  O   O  O   O  O   O         |
  1.4e+09 ++                                                                |
  1.2e+09 *+.*...*      *..*      *      *...*..*...*..*...*..*...*..*...*..*
          |      :      :  :      :      :                                  |
    1e+09 ++      :    :    :    : :    :                                   |
          |       :    :    :    : :    :                                   |
    8e+08 ++      :    :    :    : :    :                                   |
          |       :   :      :   :  :   :                                   |
    6e+08 ++       :  :      :  :   :  :                                    |
    4e+08 ++       : :        : :    : :                                    |
          |        : :        : :    : :                                    |
    2e+08 ++       : :        : :    : :                                    |
          |         :          :      :                                     |
        0 ++-O------*----------*------*-------------------------------------+


                            perf-stat.L1-dcache-prefetches

  1.2e+09 ++----------------------------------------------------------------+
          *..*...*      *..*      *        ..*..  ..*..*...*..*...*..*...*..*
    1e+09 ++     :      :  :      :      *.     *.                          |
          |      :     :    :     ::     :                                  |
          |       :    :    :    : :     :                        O         |
    8e+08 O+     O: O  :O  O:  O :O:  O :O   O  O   O  O   O  O             |
          |       :   :      :   :  :   :                                   |
    6e+08 ++      :   :      :   :  :   :                                   |
          |        :  :      :  :   :   :                                   |
    4e+08 ++       :  :      :  :   :  :                                    |
          |        : :        : :    : :                                    |
          |        : :        : :    : :                                    |
    2e+08 ++        ::        ::     : :                                    |
          |         :          :      :                                     |
        0 ++-O------*----------*------*-------------------------------------+


                              perf-stat.LLC-load-misses

  1e+09 ++------------------------------------------------------------------+
  9e+08 O+     O   O  O   O  O                                              |
        |                        O   O  O   O                               |
  8e+08 ++                                     O   O   O  O   O  O          |
  7e+08 ++                                                                  |
        |                                                                   |
  6e+08 *+..*..*      *...*      *      *...*..*...*...*..*...*..*...*..*...*
  5e+08 ++      :     :   :      ::     :                                   |
  4e+08 ++      :    :     :    : :    :                                    |
        |        :   :     :    :  :   :                                    |
  3e+08 ++       :   :      :  :   :   :                                    |
  2e+08 ++        : :       :  :    : :                                     |
        |         : :       : :     : :                                     |
  1e+08 ++         :         ::      :                                      |
      0 ++--O------*---------*-------*--------------------------------------+


                              perf-stat.context-switches

    3e+06 ++----------------------------------------------------------------+
          |                              *...*..*...                        |
  2.5e+06 *+.*...*      *..*      *      :          *..*...  .*...*..*...  .*
          |      :      :  :      :      :                 *.            *. |
          O      O: O  :O  O:  O  ::    :       O   O  O   O  O   O         |
    2e+06 ++      :    :    :    :O:  O :O   O                              |
          |       :    :    :    : :    :                                   |
  1.5e+06 ++      :   :      :   :  :   :                                   |
          |        :  :      :   :  :  :                                    |
    1e+06 ++       :  :      :  :   :  :                                    |
          |        : :        : :    : :                                    |
          |        : :        : :    : :                                    |
   500000 ++        ::        : :    ::                                     |
          |         :          :      :                                     |
        0 ++-O------*----------*------*-------------------------------------+


                                  vmstat.system.cs

  10000 ++------------------------------------------------------------------+
   9000 ++                              *...*..                             |
        *...*..*      *...*      *      :      *...*...*..  ..*..*...*..  ..*
   8000 ++     :      :   :      :      :                 *.            *.  |
   7000 O+     O:  O  O   O: O  : :    :       O   O   O  O   O  O          |
        |       :    :     :    :O:  O :O   O                               |
   6000 ++      :    :     :    : :    :                                    |
   5000 ++       :   :     :   :   :   :                                    |
   4000 ++       :   :      :  :   :  :                                     |
        |        :  :       :  :   :  :                                     |
   3000 ++        : :       : :     : :                                     |
   2000 ++        : :       : :     : :                                     |
        |         : :        ::     ::                                      |
   1000 ++         :         :       :                                      |
      0 ++--O------*---------*-------*--------------------------------------+


	[*] bisect-good sample
	[O] bisect-bad  sample

To reproduce:

	apt-get install ruby
	git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
	cd lkp-tests
	bin/setup-local job.yaml # the job file attached in this email
	bin/run-local   job.yaml


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Ying Huang


_______________________________________________
LKP mailing list
LKP(a)linux.intel.com


[-- Attachment #2: job.yaml --]
[-- Type: text/plain, Size: 3296 bytes --]

---
testcase: dd-write
default-monitors:
  wait: pre-test
  uptime: 
  iostat: 
  vmstat: 
  numa-numastat: 
  numa-vmstat: 
  numa-meminfo: 
  proc-vmstat: 
  proc-stat: 
  meminfo: 
  slabinfo: 
  interrupts: 
  lock_stat: 
  latency_stats: 
  softirqs: 
  bdi_dev_mapping: 
  diskstats: 
  nfsstat: 
  cpuidle: 
  cpufreq-stats: 
  turbostat: 
  pmeter: 
  sched_debug:
    interval: 10
default-watchdogs:
  watch-oom: 
  watchdog: 
cpufreq_governor: 
commit: a1a71cc4c0a53e29fe27cede9392b0ad816ee956
model: Core2
memory: 8G
nr_hdd_partitions: 12
wait_disks_timeout: 300
hdd_partitions: "/dev/disk/by-id/scsi-35000c5000???????"
swap_partitions: 
runtime: 5m
disk: 11HDD
md: RAID5
iosched: cfq
fs: xfs
fs2: 
monitors:
  perf-stat: 
  perf-profile: 
  ftrace:
    events: balance_dirty_pages bdi_dirty_ratelimit global_dirty_state writeback_single_inode
nr_threads: 1dd
dd: 
testbox: lkp-st02
tbox_group: lkp-st02
kconfig: x86_64-rhel
enqueue_time: 2015-04-19 11:59:58.120063120 +08:00
head_commit: a1a71cc4c0a53e29fe27cede9392b0ad816ee956
base_commit: 39a8804455fb23f09157341d3ba7db6d7ae6ee76
branch: linux-devel/devel-hourly-2015042014
kernel: "/kernel/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/vmlinuz-4.0.0-09109-ga1a71cc"
user: lkp
queue: cyclic
rootfs: debian-x86_64-2015-02-07.cgz
result_root: "/result/lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd/debian-x86_64-2015-02-07.cgz/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/0"
LKP_SERVER: inn
job_file: "/lkp/scheduled/lkp-st02/cyclic_dd-write-300-5m-11HDD-RAID5-cfq-xfs-1dd-x86_64-rhel-HEAD-a1a71cc4c0a53e29fe27cede9392b0ad816ee956-0-20150419-35022-17ddag2.yaml"
dequeue_time: 2015-04-20 16:17:46.635323077 +08:00
nr_cpu: "$(nproc)"
initrd: "/osimage/debian/debian-x86_64-2015-02-07.cgz"
bootloader_append:
- root=/dev/ram0
- user=lkp
- job=/lkp/scheduled/lkp-st02/cyclic_dd-write-300-5m-11HDD-RAID5-cfq-xfs-1dd-x86_64-rhel-HEAD-a1a71cc4c0a53e29fe27cede9392b0ad816ee956-0-20150419-35022-17ddag2.yaml
- ARCH=x86_64
- kconfig=x86_64-rhel
- branch=linux-devel/devel-hourly-2015042014
- commit=a1a71cc4c0a53e29fe27cede9392b0ad816ee956
- BOOT_IMAGE=/kernel/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/vmlinuz-4.0.0-09109-ga1a71cc
- RESULT_ROOT=/result/lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd/debian-x86_64-2015-02-07.cgz/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/0
- LKP_SERVER=inn
- |2-


  earlyprintk=ttyS0,115200 rd.udev.log-priority=err systemd.log_target=journal systemd.log_level=warning
  debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100
  panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0
  console=ttyS0,115200 console=tty0 vga=normal

  rw
max_uptime: 1500
lkp_initrd: "/lkp/lkp/lkp-x86_64.cgz"
modules_initrd: "/kernel/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/modules.cgz"
bm_initrd: "/osimage/deps/debian-x86_64-2015-02-07.cgz/lkp.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/turbostat.cgz,/lkp/benchmarks/turbostat.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/fs.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/fs2.cgz"
job_state: finished
loadavg: 1.60 1.36 0.63 1/145 5859
start_time: '1429517927'
end_time: '1429518229'
version: "/lkp/lkp/.src-20150418-142223"
time_delta: '1429517881.362849165'

[-- Attachment #3: reproduce.ksh --]
[-- Type: text/plain, Size: 680 bytes --]

mdadm --stop /dev/md0
mdadm -q --create /dev/md0 --chunk=256 --level=raid5 --raid-devices=11 --force --assume-clean /dev/sdb /dev/sdg /dev/sdi /dev/sdh /dev/sdl /dev/sdf /dev/sdm /dev/sdk /dev/sdd /dev/sde /dev/sdc
mkfs -t xfs /dev/md0
mount -t xfs -o nobarrier,inode64 /dev/md0 /fs/md0
echo 1 > /sys/kernel/debug/tracing/events/writeback/balance_dirty_pages/enable
echo 1 > /sys/kernel/debug/tracing/events/writeback/bdi_dirty_ratelimit/enable
echo 1 > /sys/kernel/debug/tracing/events/writeback/global_dirty_state/enable
echo 1 > /sys/kernel/debug/tracing/events/writeback/writeback_single_inode/enable
dd  if=/dev/zero of=/fs/md0/zero-1 status=noxfer &
sleep 300
killall -9 dd
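
# The commit under test batches adjacent full-stripe writes, so the array
# geometry created above is worth spelling out. A minimal sketch, assuming the
# bare --chunk=256 value is in KiB (the mdadm default unit) and that RAID5
# spends one chunk per stripe on parity. full_stripe_kib is a hypothetical
# helper, not part of the original script.

```shell
# full_stripe_kib RAID_DEVICES CHUNK_KIB: data carried by one full RAID5 stripe.
full_stripe_kib() {
    raid_devices=$1
    chunk_kib=$2
    data_disks=$((raid_devices - 1))   # one chunk per stripe holds parity
    echo $((data_disks * chunk_kib))
}

full_stripe_kib 11 256   # prints 2560, i.e. 2.5 MiB of data per full stripe
```

# With 11 devices and 256 KiB chunks, a sequential writer such as the dd above
# fills a full stripe every 2.5 MiB, which is presumably why this workload
# exercises the batching path.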


* [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses
From: Huang Ying @ 2015-04-23  6:55 UTC
  To: shli@kernel.org; +Cc: NeilBrown, LKML, LKP ML

[-- Attachment #1: Type: text/plain, Size: 14775 bytes --]

FYI, we noticed the below changes on

git://neil.brown.name/md for-next
commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")


testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd

a87d7f782b47e030  878ee6792799e2f88bdcac3298  
----------------  --------------------------  
         %stddev     %change         %stddev
             \          |                \  
     59035 ±  0%     +18.4%      69913 ±  1%  softirqs.SCHED
      1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.num_objs
      1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.active_objs
    305908 ±  0%      -1.8%     300427 ±  0%  vmstat.io.bo
         1 ±  0%    +100.0%          2 ±  0%  vmstat.procs.r
      8266 ±  1%     -15.7%       6968 ±  0%  vmstat.system.cs
     14819 ±  0%      -2.1%      14503 ±  0%  vmstat.system.in
     18.20 ±  6%     +10.2%      20.05 ±  4%  perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
      1.94 ±  9%     +90.6%       3.70 ±  9%  perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
      0.00 ±  0%      +Inf%      25.18 ±  3%  perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
      0.00 ±  0%      +Inf%      14.14 ±  4%  perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
      1.79 ±  7%    +102.9%       3.64 ±  9%  perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
      3.09 ±  4%     -10.8%       2.76 ±  4%  perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
      0.80 ± 14%     +28.1%       1.02 ± 10%  perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
     14.78 ±  6%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
     25.68 ±  4%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
      1.23 ±  5%    +140.0%       2.96 ±  7%  perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
      2.62 ±  6%     -95.6%       0.12 ± 33%  perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
      0.96 ±  9%     +17.5%       1.12 ±  2%  perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
 1.461e+10 ±  0%      -5.3%  1.384e+10 ±  1%  perf-stat.L1-dcache-load-misses
 3.688e+11 ±  0%      -2.7%   3.59e+11 ±  0%  perf-stat.L1-dcache-loads
 1.124e+09 ±  0%     -27.7%  8.125e+08 ±  0%  perf-stat.L1-dcache-prefetches
 2.767e+10 ±  0%      -1.8%  2.717e+10 ±  0%  perf-stat.L1-dcache-store-misses
 2.352e+11 ±  0%      -2.8%  2.287e+11 ±  0%  perf-stat.L1-dcache-stores
 6.774e+09 ±  0%      -2.3%   6.62e+09 ±  0%  perf-stat.L1-icache-load-misses
 5.571e+08 ±  0%     +40.5%  7.826e+08 ±  1%  perf-stat.LLC-load-misses
 6.263e+09 ±  0%     -13.7%  5.407e+09 ±  1%  perf-stat.LLC-loads
 1.914e+11 ±  0%      -4.2%  1.833e+11 ±  0%  perf-stat.branch-instructions
 1.145e+09 ±  2%      -5.6%  1.081e+09 ±  0%  perf-stat.branch-load-misses
 1.911e+11 ±  0%      -4.3%  1.829e+11 ±  0%  perf-stat.branch-loads
 1.142e+09 ±  2%      -5.1%  1.083e+09 ±  0%  perf-stat.branch-misses
 1.218e+09 ±  0%     +19.8%   1.46e+09 ±  0%  perf-stat.cache-misses
 2.118e+10 ±  0%      -5.2%  2.007e+10 ±  0%  perf-stat.cache-references
   2510308 ±  1%     -15.7%    2115410 ±  0%  perf-stat.context-switches
     39623 ±  0%     +22.1%      48370 ±  1%  perf-stat.cpu-migrations
 4.179e+08 ± 40%    +165.7%  1.111e+09 ± 35%  perf-stat.dTLB-load-misses
 3.684e+11 ±  0%      -2.5%  3.592e+11 ±  0%  perf-stat.dTLB-loads
 1.232e+08 ± 15%     +62.5%  2.002e+08 ± 27%  perf-stat.dTLB-store-misses
 2.348e+11 ±  0%      -2.5%  2.288e+11 ±  0%  perf-stat.dTLB-stores
   3577297 ±  2%      +8.7%    3888986 ±  1%  perf-stat.iTLB-load-misses
 1.035e+12 ±  0%      -3.5%  9.988e+11 ±  0%  perf-stat.iTLB-loads
 1.036e+12 ±  0%      -3.7%  9.978e+11 ±  0%  perf-stat.instructions
       594 ± 30%    +130.3%       1369 ± 13%  sched_debug.cfs_rq[0]:/.blocked_load_avg
        17 ± 10%     -28.2%         12 ± 23%  sched_debug.cfs_rq[0]:/.nr_spread_over
       210 ± 21%     +42.1%        298 ± 28%  sched_debug.cfs_rq[0]:/.tg_runnable_contrib
      9676 ± 21%     +42.1%      13754 ± 28%  sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
       772 ± 25%    +116.5%       1672 ±  9%  sched_debug.cfs_rq[0]:/.tg_load_contrib
      8402 ±  9%     +83.3%      15405 ± 11%  sched_debug.cfs_rq[0]:/.tg_load_avg
      8356 ±  9%     +82.8%      15272 ± 11%  sched_debug.cfs_rq[1]:/.tg_load_avg
       968 ± 25%    +100.8%       1943 ± 14%  sched_debug.cfs_rq[1]:/.blocked_load_avg
     16242 ±  9%     -22.2%      12643 ± 14%  sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
       353 ±  9%     -22.1%        275 ± 14%  sched_debug.cfs_rq[1]:/.tg_runnable_contrib
      1183 ± 23%     +77.7%       2102 ± 12%  sched_debug.cfs_rq[1]:/.tg_load_contrib
       181 ±  8%     -31.4%        124 ± 26%  sched_debug.cfs_rq[2]:/.tg_runnable_contrib
      8364 ±  8%     -31.3%       5745 ± 26%  sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
      8297 ±  9%     +81.7%      15079 ± 12%  sched_debug.cfs_rq[2]:/.tg_load_avg
     30439 ± 13%     -45.2%      16681 ± 26%  sched_debug.cfs_rq[2]:/.exec_clock
     39735 ± 14%     -48.3%      20545 ± 29%  sched_debug.cfs_rq[2]:/.min_vruntime
      8231 ± 10%     +82.2%      15000 ± 12%  sched_debug.cfs_rq[3]:/.tg_load_avg
      1210 ± 14%    +110.3%       2546 ± 30%  sched_debug.cfs_rq[4]:/.tg_load_contrib
      8188 ± 10%     +82.8%      14964 ± 12%  sched_debug.cfs_rq[4]:/.tg_load_avg
      8132 ± 10%     +83.1%      14890 ± 12%  sched_debug.cfs_rq[5]:/.tg_load_avg
       749 ± 29%    +205.9%       2292 ± 34%  sched_debug.cfs_rq[5]:/.blocked_load_avg
       963 ± 30%    +169.9%       2599 ± 33%  sched_debug.cfs_rq[5]:/.tg_load_contrib
     37791 ± 32%     -38.6%      23209 ± 13%  sched_debug.cfs_rq[6]:/.min_vruntime
       693 ± 25%    +132.2%       1609 ± 29%  sched_debug.cfs_rq[6]:/.blocked_load_avg
     10838 ± 13%     -39.2%       6587 ± 13%  sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
     29329 ± 27%     -33.2%      19577 ± 10%  sched_debug.cfs_rq[6]:/.exec_clock
       235 ± 14%     -39.7%        142 ± 14%  sched_debug.cfs_rq[6]:/.tg_runnable_contrib
      8085 ± 10%     +83.6%      14848 ± 12%  sched_debug.cfs_rq[6]:/.tg_load_avg
       839 ± 25%    +128.5%       1917 ± 18%  sched_debug.cfs_rq[6]:/.tg_load_contrib
      8051 ± 10%     +83.6%      14779 ± 12%  sched_debug.cfs_rq[7]:/.tg_load_avg
       156 ± 34%     +97.9%        309 ± 19%  sched_debug.cpu#0.cpu_load[4]
       160 ± 25%     +64.0%        263 ± 16%  sched_debug.cpu#0.cpu_load[2]
       156 ± 32%     +83.7%        286 ± 17%  sched_debug.cpu#0.cpu_load[3]
       164 ± 20%     -35.1%        106 ± 31%  sched_debug.cpu#2.cpu_load[0]
       249 ± 15%     +80.2%        449 ± 10%  sched_debug.cpu#4.cpu_load[3]
       231 ± 11%    +101.2%        466 ± 13%  sched_debug.cpu#4.cpu_load[2]
       217 ± 14%    +189.9%        630 ± 38%  sched_debug.cpu#4.cpu_load[0]
     71951 ±  5%     +21.6%      87526 ±  7%  sched_debug.cpu#4.nr_load_updates
       214 ±  8%    +146.1%        527 ± 27%  sched_debug.cpu#4.cpu_load[1]
       256 ± 17%     +75.7%        449 ± 13%  sched_debug.cpu#4.cpu_load[4]
       209 ± 23%     +98.3%        416 ± 48%  sched_debug.cpu#5.cpu_load[2]
     68024 ±  2%     +18.8%      80825 ±  1%  sched_debug.cpu#5.nr_load_updates
       217 ± 26%     +74.9%        380 ± 45%  sched_debug.cpu#5.cpu_load[3]
       852 ± 21%     -38.3%        526 ± 22%  sched_debug.cpu#6.curr->pid

lkp-st02: Core2
Memory: 8G




                                perf-stat.cache-misses

  1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
          |                       O   O  O   O  O   O  O   O  O   O         |
  1.4e+09 ++                                                                |
  1.2e+09 *+.*...*      *..*      *      *...*..*...*..*...*..*...*..*...*..*
          |      :      :  :      :      :                                  |
    1e+09 ++      :    :    :    : :    :                                   |
          |       :    :    :    : :    :                                   |
    8e+08 ++      :    :    :    : :    :                                   |
          |       :   :      :   :  :   :                                   |
    6e+08 ++       :  :      :  :   :  :                                    |
    4e+08 ++       : :        : :    : :                                    |
          |        : :        : :    : :                                    |
    2e+08 ++       : :        : :    : :                                    |
          |         :          :      :                                     |
        0 ++-O------*----------*------*-------------------------------------+


                            perf-stat.L1-dcache-prefetches

  1.2e+09 ++----------------------------------------------------------------+
          *..*...*      *..*      *        ..*..  ..*..*...*..*...*..*...*..*
    1e+09 ++     :      :  :      :      *.     *.                          |
          |      :     :    :     ::     :                                  |
          |       :    :    :    : :     :                        O         |
    8e+08 O+     O: O  :O  O:  O :O:  O :O   O  O   O  O   O  O             |
          |       :   :      :   :  :   :                                   |
    6e+08 ++      :   :      :   :  :   :                                   |
          |        :  :      :  :   :   :                                   |
    4e+08 ++       :  :      :  :   :  :                                    |
          |        : :        : :    : :                                    |
          |        : :        : :    : :                                    |
    2e+08 ++        ::        ::     : :                                    |
          |         :          :      :                                     |
        0 ++-O------*----------*------*-------------------------------------+


                              perf-stat.LLC-load-misses

  1e+09 ++------------------------------------------------------------------+
  9e+08 O+     O   O  O   O  O                                              |
        |                        O   O  O   O                               |
  8e+08 ++                                     O   O   O  O   O  O          |
  7e+08 ++                                                                  |
        |                                                                   |
  6e+08 *+..*..*      *...*      *      *...*..*...*...*..*...*..*...*..*...*
  5e+08 ++      :     :   :      ::     :                                   |
  4e+08 ++      :    :     :    : :    :                                    |
        |        :   :     :    :  :   :                                    |
  3e+08 ++       :   :      :  :   :   :                                    |
  2e+08 ++        : :       :  :    : :                                     |
        |         : :       : :     : :                                     |
  1e+08 ++         :         ::      :                                      |
      0 ++--O------*---------*-------*--------------------------------------+


                              perf-stat.context-switches

    3e+06 ++----------------------------------------------------------------+
          |                              *...*..*...                        |
  2.5e+06 *+.*...*      *..*      *      :          *..*...  .*...*..*...  .*
          |      :      :  :      :      :                 *.            *. |
          O      O: O  :O  O:  O  ::    :       O   O  O   O  O   O         |
    2e+06 ++      :    :    :    :O:  O :O   O                              |
          |       :    :    :    : :    :                                   |
  1.5e+06 ++      :   :      :   :  :   :                                   |
          |        :  :      :   :  :  :                                    |
    1e+06 ++       :  :      :  :   :  :                                    |
          |        : :        : :    : :                                    |
          |        : :        : :    : :                                    |
   500000 ++        ::        : :    ::                                     |
          |         :          :      :                                     |
        0 ++-O------*----------*------*-------------------------------------+


                                  vmstat.system.cs

  10000 ++------------------------------------------------------------------+
   9000 ++                              *...*..                             |
        *...*..*      *...*      *      :      *...*...*..  ..*..*...*..  ..*
   8000 ++     :      :   :      :      :                 *.            *.  |
   7000 O+     O:  O  O   O: O  : :    :       O   O   O  O   O  O          |
        |       :    :     :    :O:  O :O   O                               |
   6000 ++      :    :     :    : :    :                                    |
   5000 ++       :   :     :   :   :   :                                    |
   4000 ++       :   :      :  :   :  :                                     |
        |        :  :       :  :   :  :                                     |
   3000 ++        : :       : :     : :                                     |
   2000 ++        : :       : :     : :                                     |
        |         : :        ::     ::                                      |
   1000 ++         :         :       :                                      |
      0 ++--O------*---------*-------*--------------------------------------+


	[*] bisect-good sample
	[O] bisect-bad  sample
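
A note on reading the comparison table in this report: the left column is the baseline commit a87d7f782b47e030 and the right column is 878ee6792799e2f88bdcac3298, with %change between them. The plotted values suggest that [*] samples correspond to the left column and [O] samples to the right. The sketch below is a minimal check, assuming %change is the plain relative difference of the two means; the helper name is illustrative and not part of lkp-tests.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0

# (metric, baseline mean, mean with commit 878ee679279), copied from the table
rows = [
    ("vmstat.io.bo", 305908, 300427),
    ("perf-stat.LLC-load-misses", 5.571e08, 7.826e08),
    ("perf-stat.context-switches", 2510308, 2115410),
]

for name, base, new in rows:
    print(f"{name:30s} {pct_change(base, new):+6.1f}%")
```

The sign alone does not say whether a change is an improvement. That depends on the metric: for this dd-write job, a lower vmstat.io.bo means fewer blocks written out per second.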

To reproduce:

	apt-get install ruby
	git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
	cd lkp-tests
	bin/setup-local job.yaml # the job file attached in this email
	bin/run-local   job.yaml


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Ying Huang


[-- Attachment #2: job.yaml --]
[-- Type: text/plain, Size: 3296 bytes --]

---
testcase: dd-write
default-monitors:
  wait: pre-test
  uptime: 
  iostat: 
  vmstat: 
  numa-numastat: 
  numa-vmstat: 
  numa-meminfo: 
  proc-vmstat: 
  proc-stat: 
  meminfo: 
  slabinfo: 
  interrupts: 
  lock_stat: 
  latency_stats: 
  softirqs: 
  bdi_dev_mapping: 
  diskstats: 
  nfsstat: 
  cpuidle: 
  cpufreq-stats: 
  turbostat: 
  pmeter: 
  sched_debug:
    interval: 10
default-watchdogs:
  watch-oom: 
  watchdog: 
cpufreq_governor: 
commit: a1a71cc4c0a53e29fe27cede9392b0ad816ee956
model: Core2
memory: 8G
nr_hdd_partitions: 12
wait_disks_timeout: 300
hdd_partitions: "/dev/disk/by-id/scsi-35000c5000???????"
swap_partitions: 
runtime: 5m
disk: 11HDD
md: RAID5
iosched: cfq
fs: xfs
fs2: 
monitors:
  perf-stat: 
  perf-profile: 
  ftrace:
    events: balance_dirty_pages bdi_dirty_ratelimit global_dirty_state writeback_single_inode
nr_threads: 1dd
dd: 
testbox: lkp-st02
tbox_group: lkp-st02
kconfig: x86_64-rhel
enqueue_time: 2015-04-19 11:59:58.120063120 +08:00
head_commit: a1a71cc4c0a53e29fe27cede9392b0ad816ee956
base_commit: 39a8804455fb23f09157341d3ba7db6d7ae6ee76
branch: linux-devel/devel-hourly-2015042014
kernel: "/kernel/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/vmlinuz-4.0.0-09109-ga1a71cc"
user: lkp
queue: cyclic
rootfs: debian-x86_64-2015-02-07.cgz
result_root: "/result/lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd/debian-x86_64-2015-02-07.cgz/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/0"
LKP_SERVER: inn
job_file: "/lkp/scheduled/lkp-st02/cyclic_dd-write-300-5m-11HDD-RAID5-cfq-xfs-1dd-x86_64-rhel-HEAD-a1a71cc4c0a53e29fe27cede9392b0ad816ee956-0-20150419-35022-17ddag2.yaml"
dequeue_time: 2015-04-20 16:17:46.635323077 +08:00
nr_cpu: "$(nproc)"
initrd: "/osimage/debian/debian-x86_64-2015-02-07.cgz"
bootloader_append:
- root=/dev/ram0
- user=lkp
- job=/lkp/scheduled/lkp-st02/cyclic_dd-write-300-5m-11HDD-RAID5-cfq-xfs-1dd-x86_64-rhel-HEAD-a1a71cc4c0a53e29fe27cede9392b0ad816ee956-0-20150419-35022-17ddag2.yaml
- ARCH=x86_64
- kconfig=x86_64-rhel
- branch=linux-devel/devel-hourly-2015042014
- commit=a1a71cc4c0a53e29fe27cede9392b0ad816ee956
- BOOT_IMAGE=/kernel/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/vmlinuz-4.0.0-09109-ga1a71cc
- RESULT_ROOT=/result/lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd/debian-x86_64-2015-02-07.cgz/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/0
- LKP_SERVER=inn
- |2-


  earlyprintk=ttyS0,115200 rd.udev.log-priority=err systemd.log_target=journal systemd.log_level=warning
  debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100
  panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0
  console=ttyS0,115200 console=tty0 vga=normal

  rw
max_uptime: 1500
lkp_initrd: "/lkp/lkp/lkp-x86_64.cgz"
modules_initrd: "/kernel/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/modules.cgz"
bm_initrd: "/osimage/deps/debian-x86_64-2015-02-07.cgz/lkp.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/turbostat.cgz,/lkp/benchmarks/turbostat.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/fs.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/fs2.cgz"
job_state: finished
loadavg: 1.60 1.36 0.63 1/145 5859
start_time: '1429517927'
end_time: '1429518229'
version: "/lkp/lkp/.src-20150418-142223"
time_delta: '1429517881.362849165'

[-- Attachment #3: reproduce --]
[-- Type: text/plain, Size: 680 bytes --]

mdadm --stop /dev/md0
mdadm -q --create /dev/md0 --chunk=256 --level=raid5 --raid-devices=11 --force --assume-clean /dev/sdb /dev/sdg /dev/sdi /dev/sdh /dev/sdl /dev/sdf /dev/sdm /dev/sdk /dev/sdd /dev/sde /dev/sdc
mkfs -t xfs /dev/md0
mount -t xfs -o nobarrier,inode64 /dev/md0 /fs/md0
echo 1 > /sys/kernel/debug/tracing/events/writeback/balance_dirty_pages/enable
echo 1 > /sys/kernel/debug/tracing/events/writeback/bdi_dirty_ratelimit/enable
echo 1 > /sys/kernel/debug/tracing/events/writeback/global_dirty_state/enable
echo 1 > /sys/kernel/debug/tracing/events/writeback/writeback_single_inode/enable
dd  if=/dev/zero of=/fs/md0/zero-1 status=noxfer &
sleep 300
killall -9 dd
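
For reference, the array geometry implied by the mdadm line above can be worked out as follows. This is a minimal sketch, assuming mdadm's --chunk value is in KiB and that RAID5 keeps one parity chunk per stripe; the function name is illustrative and not part of any tool.

```python
def full_stripe_kib(raid_devices: int, chunk_kib: int, parity_chunks: int = 1) -> int:
    """Data carried by one chunk-wide stripe, in KiB (RAID5: one parity chunk)."""
    data_chunks = raid_devices - parity_chunks
    return data_chunks * chunk_kib

stripe = full_stripe_kib(raid_devices=11, chunk_kib=256)
print(f"data per stripe: {stripe} KiB ({stripe / 1024} MiB)")
```

With 11 devices and a 256 KiB chunk, each stripe holds 10 data chunks. A long sequential write such as the dd above can overwrite every data chunk in a stripe, in which case parity can be computed from the new data alone.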

[-- Attachment #4: Type: text/plain, Size: 89 bytes --]

_______________________________________________
LKP mailing list
LKP@linux.intel.com


* Re: [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses
  2015-04-23  6:55 ` [LKP] " Huang Ying
@ 2015-04-24  2:15   ` NeilBrown
  -1 siblings, 0 replies; 6+ messages in thread
From: NeilBrown @ 2015-04-24  2:15 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 15980 bytes --]

On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying <ying.huang@intel.com> wrote:

> FYI, we noticed the below changes on
> 
> git://neil.brown.name/md for-next
> commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")

Hi,
 is there any chance that you could explain what some of this means?
There is lots of data and some very pretty graphs, but no explanation.

Which numbers are "good", which are "bad"?  Which is "worst"?
What do the graphs really show, and what would we like to see in them?

I think it is really great that you are doing this testing and reporting the
results.  It's just so sad that I completely fail to understand them.

Thanks,
NeilBrown

> 
> 
> testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
> 
> a87d7f782b47e030  878ee6792799e2f88bdcac3298  
> ----------------  --------------------------  
>          %stddev     %change         %stddev
>              \          |                \  
>      59035 ±  0%     +18.4%      69913 ±  1%  softirqs.SCHED
>       1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.num_objs
>       1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.active_objs
>     305908 ±  0%      -1.8%     300427 ±  0%  vmstat.io.bo
>          1 ±  0%    +100.0%          2 ±  0%  vmstat.procs.r
>       8266 ±  1%     -15.7%       6968 ±  0%  vmstat.system.cs
>      14819 ±  0%      -2.1%      14503 ±  0%  vmstat.system.in
>      18.20 ±  6%     +10.2%      20.05 ±  4%  perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
>       1.94 ±  9%     +90.6%       3.70 ±  9%  perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
>       0.00 ±  0%      +Inf%      25.18 ±  3%  perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
>       0.00 ±  0%      +Inf%      14.14 ±  4%  perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
>       1.79 ±  7%    +102.9%       3.64 ±  9%  perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
>       3.09 ±  4%     -10.8%       2.76 ±  4%  perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
>       0.80 ± 14%     +28.1%       1.02 ± 10%  perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
>      14.78 ±  6%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
>      25.68 ±  4%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
>       1.23 ±  5%    +140.0%       2.96 ±  7%  perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
>       2.62 ±  6%     -95.6%       0.12 ± 33%  perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
>       0.96 ±  9%     +17.5%       1.12 ±  2%  perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
>  1.461e+10 ±  0%      -5.3%  1.384e+10 ±  1%  perf-stat.L1-dcache-load-misses
>  3.688e+11 ±  0%      -2.7%   3.59e+11 ±  0%  perf-stat.L1-dcache-loads
>  1.124e+09 ±  0%     -27.7%  8.125e+08 ±  0%  perf-stat.L1-dcache-prefetches
>  2.767e+10 ±  0%      -1.8%  2.717e+10 ±  0%  perf-stat.L1-dcache-store-misses
>  2.352e+11 ±  0%      -2.8%  2.287e+11 ±  0%  perf-stat.L1-dcache-stores
>  6.774e+09 ±  0%      -2.3%   6.62e+09 ±  0%  perf-stat.L1-icache-load-misses
>  5.571e+08 ±  0%     +40.5%  7.826e+08 ±  1%  perf-stat.LLC-load-misses
>  6.263e+09 ±  0%     -13.7%  5.407e+09 ±  1%  perf-stat.LLC-loads
>  1.914e+11 ±  0%      -4.2%  1.833e+11 ±  0%  perf-stat.branch-instructions
>  1.145e+09 ±  2%      -5.6%  1.081e+09 ±  0%  perf-stat.branch-load-misses
>  1.911e+11 ±  0%      -4.3%  1.829e+11 ±  0%  perf-stat.branch-loads
>  1.142e+09 ±  2%      -5.1%  1.083e+09 ±  0%  perf-stat.branch-misses
>  1.218e+09 ±  0%     +19.8%   1.46e+09 ±  0%  perf-stat.cache-misses
>  2.118e+10 ±  0%      -5.2%  2.007e+10 ±  0%  perf-stat.cache-references
>    2510308 ±  1%     -15.7%    2115410 ±  0%  perf-stat.context-switches
>      39623 ±  0%     +22.1%      48370 ±  1%  perf-stat.cpu-migrations
>  4.179e+08 ± 40%    +165.7%  1.111e+09 ± 35%  perf-stat.dTLB-load-misses
>  3.684e+11 ±  0%      -2.5%  3.592e+11 ±  0%  perf-stat.dTLB-loads
>  1.232e+08 ± 15%     +62.5%  2.002e+08 ± 27%  perf-stat.dTLB-store-misses
>  2.348e+11 ±  0%      -2.5%  2.288e+11 ±  0%  perf-stat.dTLB-stores
>    3577297 ±  2%      +8.7%    3888986 ±  1%  perf-stat.iTLB-load-misses
>  1.035e+12 ±  0%      -3.5%  9.988e+11 ±  0%  perf-stat.iTLB-loads
>  1.036e+12 ±  0%      -3.7%  9.978e+11 ±  0%  perf-stat.instructions
>        594 ± 30%    +130.3%       1369 ± 13%  sched_debug.cfs_rq[0]:/.blocked_load_avg
>         17 ± 10%     -28.2%         12 ± 23%  sched_debug.cfs_rq[0]:/.nr_spread_over
>        210 ± 21%     +42.1%        298 ± 28%  sched_debug.cfs_rq[0]:/.tg_runnable_contrib
>       9676 ± 21%     +42.1%      13754 ± 28%  sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
>        772 ± 25%    +116.5%       1672 ±  9%  sched_debug.cfs_rq[0]:/.tg_load_contrib
>       8402 ±  9%     +83.3%      15405 ± 11%  sched_debug.cfs_rq[0]:/.tg_load_avg
>       8356 ±  9%     +82.8%      15272 ± 11%  sched_debug.cfs_rq[1]:/.tg_load_avg
>        968 ± 25%    +100.8%       1943 ± 14%  sched_debug.cfs_rq[1]:/.blocked_load_avg
>      16242 ±  9%     -22.2%      12643 ± 14%  sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
>        353 ±  9%     -22.1%        275 ± 14%  sched_debug.cfs_rq[1]:/.tg_runnable_contrib
>       1183 ± 23%     +77.7%       2102 ± 12%  sched_debug.cfs_rq[1]:/.tg_load_contrib
>        181 ±  8%     -31.4%        124 ± 26%  sched_debug.cfs_rq[2]:/.tg_runnable_contrib
>       8364 ±  8%     -31.3%       5745 ± 26%  sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
>       8297 ±  9%     +81.7%      15079 ± 12%  sched_debug.cfs_rq[2]:/.tg_load_avg
>      30439 ± 13%     -45.2%      16681 ± 26%  sched_debug.cfs_rq[2]:/.exec_clock
>      39735 ± 14%     -48.3%      20545 ± 29%  sched_debug.cfs_rq[2]:/.min_vruntime
>       8231 ± 10%     +82.2%      15000 ± 12%  sched_debug.cfs_rq[3]:/.tg_load_avg
>       1210 ± 14%    +110.3%       2546 ± 30%  sched_debug.cfs_rq[4]:/.tg_load_contrib
>       8188 ± 10%     +82.8%      14964 ± 12%  sched_debug.cfs_rq[4]:/.tg_load_avg
>       8132 ± 10%     +83.1%      14890 ± 12%  sched_debug.cfs_rq[5]:/.tg_load_avg
>        749 ± 29%    +205.9%       2292 ± 34%  sched_debug.cfs_rq[5]:/.blocked_load_avg
>        963 ± 30%    +169.9%       2599 ± 33%  sched_debug.cfs_rq[5]:/.tg_load_contrib
>      37791 ± 32%     -38.6%      23209 ± 13%  sched_debug.cfs_rq[6]:/.min_vruntime
>        693 ± 25%    +132.2%       1609 ± 29%  sched_debug.cfs_rq[6]:/.blocked_load_avg
>      10838 ± 13%     -39.2%       6587 ± 13%  sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
>      29329 ± 27%     -33.2%      19577 ± 10%  sched_debug.cfs_rq[6]:/.exec_clock
>        235 ± 14%     -39.7%        142 ± 14%  sched_debug.cfs_rq[6]:/.tg_runnable_contrib
>       8085 ± 10%     +83.6%      14848 ± 12%  sched_debug.cfs_rq[6]:/.tg_load_avg
>        839 ± 25%    +128.5%       1917 ± 18%  sched_debug.cfs_rq[6]:/.tg_load_contrib
>       8051 ± 10%     +83.6%      14779 ± 12%  sched_debug.cfs_rq[7]:/.tg_load_avg
>        156 ± 34%     +97.9%        309 ± 19%  sched_debug.cpu#0.cpu_load[4]
>        160 ± 25%     +64.0%        263 ± 16%  sched_debug.cpu#0.cpu_load[2]
>        156 ± 32%     +83.7%        286 ± 17%  sched_debug.cpu#0.cpu_load[3]
>        164 ± 20%     -35.1%        106 ± 31%  sched_debug.cpu#2.cpu_load[0]
>        249 ± 15%     +80.2%        449 ± 10%  sched_debug.cpu#4.cpu_load[3]
>        231 ± 11%    +101.2%        466 ± 13%  sched_debug.cpu#4.cpu_load[2]
>        217 ± 14%    +189.9%        630 ± 38%  sched_debug.cpu#4.cpu_load[0]
>      71951 ±  5%     +21.6%      87526 ±  7%  sched_debug.cpu#4.nr_load_updates
>        214 ±  8%    +146.1%        527 ± 27%  sched_debug.cpu#4.cpu_load[1]
>        256 ± 17%     +75.7%        449 ± 13%  sched_debug.cpu#4.cpu_load[4]
>        209 ± 23%     +98.3%        416 ± 48%  sched_debug.cpu#5.cpu_load[2]
>      68024 ±  2%     +18.8%      80825 ±  1%  sched_debug.cpu#5.nr_load_updates
>        217 ± 26%     +74.9%        380 ± 45%  sched_debug.cpu#5.cpu_load[3]
>        852 ± 21%     -38.3%        526 ± 22%  sched_debug.cpu#6.curr->pid
> 
> lkp-st02: Core2
> Memory: 8G
> 
> 
> 
> 
>                                 perf-stat.cache-misses
> 
>   1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
>           |                       O   O  O   O  O   O  O   O  O   O         |
>   1.4e+09 ++                                                                |
>   1.2e+09 *+.*...*      *..*      *      *...*..*...*..*...*..*...*..*...*..*
>           |      :      :  :      :      :                                  |
>     1e+09 ++      :    :    :    : :    :                                   |
>           |       :    :    :    : :    :                                   |
>     8e+08 ++      :    :    :    : :    :                                   |
>           |       :   :      :   :  :   :                                   |
>     6e+08 ++       :  :      :  :   :  :                                    |
>     4e+08 ++       : :        : :    : :                                    |
>           |        : :        : :    : :                                    |
>     2e+08 ++       : :        : :    : :                                    |
>           |         :          :      :                                     |
>         0 ++-O------*----------*------*-------------------------------------+
> 
> 
>                             perf-stat.L1-dcache-prefetches
> 
>   1.2e+09 ++----------------------------------------------------------------+
>           *..*...*      *..*      *        ..*..  ..*..*...*..*...*..*...*..*
>     1e+09 ++     :      :  :      :      *.     *.                          |
>           |      :     :    :     ::     :                                  |
>           |       :    :    :    : :     :                        O         |
>     8e+08 O+     O: O  :O  O:  O :O:  O :O   O  O   O  O   O  O             |
>           |       :   :      :   :  :   :                                   |
>     6e+08 ++      :   :      :   :  :   :                                   |
>           |        :  :      :  :   :   :                                   |
>     4e+08 ++       :  :      :  :   :  :                                    |
>           |        : :        : :    : :                                    |
>           |        : :        : :    : :                                    |
>     2e+08 ++        ::        ::     : :                                    |
>           |         :          :      :                                     |
>         0 ++-O------*----------*------*-------------------------------------+
> 
> 
>                               perf-stat.LLC-load-misses
> 
>   1e+09 ++------------------------------------------------------------------+
>   9e+08 O+     O   O  O   O  O                                              |
>         |                        O   O  O   O                               |
>   8e+08 ++                                     O   O   O  O   O  O          |
>   7e+08 ++                                                                  |
>         |                                                                   |
>   6e+08 *+..*..*      *...*      *      *...*..*...*...*..*...*..*...*..*...*
>   5e+08 ++      :     :   :      ::     :                                   |
>   4e+08 ++      :    :     :    : :    :                                    |
>         |        :   :     :    :  :   :                                    |
>   3e+08 ++       :   :      :  :   :   :                                    |
>   2e+08 ++        : :       :  :    : :                                     |
>         |         : :       : :     : :                                     |
>   1e+08 ++         :         ::      :                                      |
>       0 ++--O------*---------*-------*--------------------------------------+
> 
> 
>                               perf-stat.context-switches
> 
>     3e+06 ++----------------------------------------------------------------+
>           |                              *...*..*...                        |
>   2.5e+06 *+.*...*      *..*      *      :          *..*...  .*...*..*...  .*
>           |      :      :  :      :      :                 *.            *. |
>           O      O: O  :O  O:  O  ::    :       O   O  O   O  O   O         |
>     2e+06 ++      :    :    :    :O:  O :O   O                              |
>           |       :    :    :    : :    :                                   |
>   1.5e+06 ++      :   :      :   :  :   :                                   |
>           |        :  :      :   :  :  :                                    |
>     1e+06 ++       :  :      :  :   :  :                                    |
>           |        : :        : :    : :                                    |
>           |        : :        : :    : :                                    |
>    500000 ++        ::        : :    ::                                     |
>           |         :          :      :                                     |
>         0 ++-O------*----------*------*-------------------------------------+
> 
> 
>                                   vmstat.system.cs
> 
>   10000 ++------------------------------------------------------------------+
>    9000 ++                              *...*..                             |
>         *...*..*      *...*      *      :      *...*...*..  ..*..*...*..  ..*
>    8000 ++     :      :   :      :      :                 *.            *.  |
>    7000 O+     O:  O  O   O: O  : :    :       O   O   O  O   O  O          |
>         |       :    :     :    :O:  O :O   O                               |
>    6000 ++      :    :     :    : :    :                                    |
>    5000 ++       :   :     :   :   :   :                                    |
>    4000 ++       :   :      :  :   :  :                                     |
>         |        :  :       :  :   :  :                                     |
>    3000 ++        : :       : :     : :                                     |
>    2000 ++        : :       : :     : :                                     |
>         |         : :        ::     ::                                      |
>    1000 ++         :         :       :                                      |
>       0 ++--O------*---------*-------*--------------------------------------+
> 
> 
> 	[*] bisect-good sample
> 	[O] bisect-bad  sample
> 
> To reproduce:
> 
> 	apt-get install ruby
> 	git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> 	cd lkp-tests
> 	bin/setup-local job.yaml # the job file attached in this email
> 	bin/run-local   job.yaml
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 
> Thanks,
> Ying Huang
> 


[-- Attachment #2: attachment.sig --]
[-- Type: application/pgp-signature, Size: 811 bytes --]


* Re: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses
@ 2015-04-24  2:15   ` NeilBrown
  0 siblings, 0 replies; 6+ messages in thread
From: NeilBrown @ 2015-04-24  2:15 UTC (permalink / raw)
  To: Huang Ying; +Cc: shli@kernel.org, LKML, LKP ML

[-- Attachment #1: Type: text/plain, Size: 15980 bytes --]

On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying <ying.huang@intel.com> wrote:

> FYI, we noticed the below changes on
> 
> git://neil.brown.name/md for-next
> commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")

Hi,
 is there any chance that you could explain what some of this means?
There is lots of data and some very pretty graphs, but no explanation.

Which numbers are "good", which are "bad"?  Which is "worst"?
What do the graphs really show, and what would we like to see in them?

I think it is really great that you are doing this testing and reporting the
results.  It's just so sad that I completely fail to understand them.

Thanks,
NeilBrown

> 
> 
> testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
> 
> a87d7f782b47e030  878ee6792799e2f88bdcac3298  
> ----------------  --------------------------  
>          %stddev     %change         %stddev
>              \          |                \  
>      59035 ±  0%     +18.4%      69913 ±  1%  softirqs.SCHED
>       1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.num_objs
>       1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.active_objs
>     305908 ±  0%      -1.8%     300427 ±  0%  vmstat.io.bo
>          1 ±  0%    +100.0%          2 ±  0%  vmstat.procs.r
>       8266 ±  1%     -15.7%       6968 ±  0%  vmstat.system.cs
>      14819 ±  0%      -2.1%      14503 ±  0%  vmstat.system.in
>      18.20 ±  6%     +10.2%      20.05 ±  4%  perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
>       1.94 ±  9%     +90.6%       3.70 ±  9%  perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
>       0.00 ±  0%      +Inf%      25.18 ±  3%  perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
>       0.00 ±  0%      +Inf%      14.14 ±  4%  perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
>       1.79 ±  7%    +102.9%       3.64 ±  9%  perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
>       3.09 ±  4%     -10.8%       2.76 ±  4%  perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
>       0.80 ± 14%     +28.1%       1.02 ± 10%  perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
>      14.78 ±  6%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
>      25.68 ±  4%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
>       1.23 ±  5%    +140.0%       2.96 ±  7%  perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
>       2.62 ±  6%     -95.6%       0.12 ± 33%  perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
>       0.96 ±  9%     +17.5%       1.12 ±  2%  perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
>  1.461e+10 ±  0%      -5.3%  1.384e+10 ±  1%  perf-stat.L1-dcache-load-misses
>  3.688e+11 ±  0%      -2.7%   3.59e+11 ±  0%  perf-stat.L1-dcache-loads
>  1.124e+09 ±  0%     -27.7%  8.125e+08 ±  0%  perf-stat.L1-dcache-prefetches
>  2.767e+10 ±  0%      -1.8%  2.717e+10 ±  0%  perf-stat.L1-dcache-store-misses
>  2.352e+11 ±  0%      -2.8%  2.287e+11 ±  0%  perf-stat.L1-dcache-stores
>  6.774e+09 ±  0%      -2.3%   6.62e+09 ±  0%  perf-stat.L1-icache-load-misses
>  5.571e+08 ±  0%     +40.5%  7.826e+08 ±  1%  perf-stat.LLC-load-misses
>  6.263e+09 ±  0%     -13.7%  5.407e+09 ±  1%  perf-stat.LLC-loads
>  1.914e+11 ±  0%      -4.2%  1.833e+11 ±  0%  perf-stat.branch-instructions
>  1.145e+09 ±  2%      -5.6%  1.081e+09 ±  0%  perf-stat.branch-load-misses
>  1.911e+11 ±  0%      -4.3%  1.829e+11 ±  0%  perf-stat.branch-loads
>  1.142e+09 ±  2%      -5.1%  1.083e+09 ±  0%  perf-stat.branch-misses
>  1.218e+09 ±  0%     +19.8%   1.46e+09 ±  0%  perf-stat.cache-misses
>  2.118e+10 ±  0%      -5.2%  2.007e+10 ±  0%  perf-stat.cache-references
>    2510308 ±  1%     -15.7%    2115410 ±  0%  perf-stat.context-switches
>      39623 ±  0%     +22.1%      48370 ±  1%  perf-stat.cpu-migrations
>  4.179e+08 ± 40%    +165.7%  1.111e+09 ± 35%  perf-stat.dTLB-load-misses
>  3.684e+11 ±  0%      -2.5%  3.592e+11 ±  0%  perf-stat.dTLB-loads
>  1.232e+08 ± 15%     +62.5%  2.002e+08 ± 27%  perf-stat.dTLB-store-misses
>  2.348e+11 ±  0%      -2.5%  2.288e+11 ±  0%  perf-stat.dTLB-stores
>    3577297 ±  2%      +8.7%    3888986 ±  1%  perf-stat.iTLB-load-misses
>  1.035e+12 ±  0%      -3.5%  9.988e+11 ±  0%  perf-stat.iTLB-loads
>  1.036e+12 ±  0%      -3.7%  9.978e+11 ±  0%  perf-stat.instructions
>        594 ± 30%    +130.3%       1369 ± 13%  sched_debug.cfs_rq[0]:/.blocked_load_avg
>         17 ± 10%     -28.2%         12 ± 23%  sched_debug.cfs_rq[0]:/.nr_spread_over
>        210 ± 21%     +42.1%        298 ± 28%  sched_debug.cfs_rq[0]:/.tg_runnable_contrib
>       9676 ± 21%     +42.1%      13754 ± 28%  sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
>        772 ± 25%    +116.5%       1672 ±  9%  sched_debug.cfs_rq[0]:/.tg_load_contrib
>       8402 ±  9%     +83.3%      15405 ± 11%  sched_debug.cfs_rq[0]:/.tg_load_avg
>       8356 ±  9%     +82.8%      15272 ± 11%  sched_debug.cfs_rq[1]:/.tg_load_avg
>        968 ± 25%    +100.8%       1943 ± 14%  sched_debug.cfs_rq[1]:/.blocked_load_avg
>      16242 ±  9%     -22.2%      12643 ± 14%  sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
>        353 ±  9%     -22.1%        275 ± 14%  sched_debug.cfs_rq[1]:/.tg_runnable_contrib
>       1183 ± 23%     +77.7%       2102 ± 12%  sched_debug.cfs_rq[1]:/.tg_load_contrib
>        181 ±  8%     -31.4%        124 ± 26%  sched_debug.cfs_rq[2]:/.tg_runnable_contrib
>       8364 ±  8%     -31.3%       5745 ± 26%  sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
>       8297 ±  9%     +81.7%      15079 ± 12%  sched_debug.cfs_rq[2]:/.tg_load_avg
>      30439 ± 13%     -45.2%      16681 ± 26%  sched_debug.cfs_rq[2]:/.exec_clock
>      39735 ± 14%     -48.3%      20545 ± 29%  sched_debug.cfs_rq[2]:/.min_vruntime
>       8231 ± 10%     +82.2%      15000 ± 12%  sched_debug.cfs_rq[3]:/.tg_load_avg
>       1210 ± 14%    +110.3%       2546 ± 30%  sched_debug.cfs_rq[4]:/.tg_load_contrib
>       8188 ± 10%     +82.8%      14964 ± 12%  sched_debug.cfs_rq[4]:/.tg_load_avg
>       8132 ± 10%     +83.1%      14890 ± 12%  sched_debug.cfs_rq[5]:/.tg_load_avg
>        749 ± 29%    +205.9%       2292 ± 34%  sched_debug.cfs_rq[5]:/.blocked_load_avg
>        963 ± 30%    +169.9%       2599 ± 33%  sched_debug.cfs_rq[5]:/.tg_load_contrib
>      37791 ± 32%     -38.6%      23209 ± 13%  sched_debug.cfs_rq[6]:/.min_vruntime
>        693 ± 25%    +132.2%       1609 ± 29%  sched_debug.cfs_rq[6]:/.blocked_load_avg
>      10838 ± 13%     -39.2%       6587 ± 13%  sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
>      29329 ± 27%     -33.2%      19577 ± 10%  sched_debug.cfs_rq[6]:/.exec_clock
>        235 ± 14%     -39.7%        142 ± 14%  sched_debug.cfs_rq[6]:/.tg_runnable_contrib
>       8085 ± 10%     +83.6%      14848 ± 12%  sched_debug.cfs_rq[6]:/.tg_load_avg
>        839 ± 25%    +128.5%       1917 ± 18%  sched_debug.cfs_rq[6]:/.tg_load_contrib
>       8051 ± 10%     +83.6%      14779 ± 12%  sched_debug.cfs_rq[7]:/.tg_load_avg
>        156 ± 34%     +97.9%        309 ± 19%  sched_debug.cpu#0.cpu_load[4]
>        160 ± 25%     +64.0%        263 ± 16%  sched_debug.cpu#0.cpu_load[2]
>        156 ± 32%     +83.7%        286 ± 17%  sched_debug.cpu#0.cpu_load[3]
>        164 ± 20%     -35.1%        106 ± 31%  sched_debug.cpu#2.cpu_load[0]
>        249 ± 15%     +80.2%        449 ± 10%  sched_debug.cpu#4.cpu_load[3]
>        231 ± 11%    +101.2%        466 ± 13%  sched_debug.cpu#4.cpu_load[2]
>        217 ± 14%    +189.9%        630 ± 38%  sched_debug.cpu#4.cpu_load[0]
>      71951 ±  5%     +21.6%      87526 ±  7%  sched_debug.cpu#4.nr_load_updates
>        214 ±  8%    +146.1%        527 ± 27%  sched_debug.cpu#4.cpu_load[1]
>        256 ± 17%     +75.7%        449 ± 13%  sched_debug.cpu#4.cpu_load[4]
>        209 ± 23%     +98.3%        416 ± 48%  sched_debug.cpu#5.cpu_load[2]
>      68024 ±  2%     +18.8%      80825 ±  1%  sched_debug.cpu#5.nr_load_updates
>        217 ± 26%     +74.9%        380 ± 45%  sched_debug.cpu#5.cpu_load[3]
>        852 ± 21%     -38.3%        526 ± 22%  sched_debug.cpu#6.curr->pid
> 
> lkp-st02: Core2
> Memory: 8G
> 
> 
> 
> 
>                                 perf-stat.cache-misses
> 
>   1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
>           |                       O   O  O   O  O   O  O   O  O   O         |
>   1.4e+09 ++                                                                |
>   1.2e+09 *+.*...*      *..*      *      *...*..*...*..*...*..*...*..*...*..*
>           |      :      :  :      :      :                                  |
>     1e+09 ++      :    :    :    : :    :                                   |
>           |       :    :    :    : :    :                                   |
>     8e+08 ++      :    :    :    : :    :                                   |
>           |       :   :      :   :  :   :                                   |
>     6e+08 ++       :  :      :  :   :  :                                    |
>     4e+08 ++       : :        : :    : :                                    |
>           |        : :        : :    : :                                    |
>     2e+08 ++       : :        : :    : :                                    |
>           |         :          :      :                                     |
>         0 ++-O------*----------*------*-------------------------------------+
> 
> 
>                             perf-stat.L1-dcache-prefetches
> 
>   1.2e+09 ++----------------------------------------------------------------+
>           *..*...*      *..*      *        ..*..  ..*..*...*..*...*..*...*..*
>     1e+09 ++     :      :  :      :      *.     *.                          |
>           |      :     :    :     ::     :                                  |
>           |       :    :    :    : :     :                        O         |
>     8e+08 O+     O: O  :O  O:  O :O:  O :O   O  O   O  O   O  O             |
>           |       :   :      :   :  :   :                                   |
>     6e+08 ++      :   :      :   :  :   :                                   |
>           |        :  :      :  :   :   :                                   |
>     4e+08 ++       :  :      :  :   :  :                                    |
>           |        : :        : :    : :                                    |
>           |        : :        : :    : :                                    |
>     2e+08 ++        ::        ::     : :                                    |
>           |         :          :      :                                     |
>         0 ++-O------*----------*------*-------------------------------------+
> 
> 
>                               perf-stat.LLC-load-misses
> 
>   1e+09 ++------------------------------------------------------------------+
>   9e+08 O+     O   O  O   O  O                                              |
>         |                        O   O  O   O                               |
>   8e+08 ++                                     O   O   O  O   O  O          |
>   7e+08 ++                                                                  |
>         |                                                                   |
>   6e+08 *+..*..*      *...*      *      *...*..*...*...*..*...*..*...*..*...*
>   5e+08 ++      :     :   :      ::     :                                   |
>   4e+08 ++      :    :     :    : :    :                                    |
>         |        :   :     :    :  :   :                                    |
>   3e+08 ++       :   :      :  :   :   :                                    |
>   2e+08 ++        : :       :  :    : :                                     |
>         |         : :       : :     : :                                     |
>   1e+08 ++         :         ::      :                                      |
>       0 ++--O------*---------*-------*--------------------------------------+
> 
> 
>                               perf-stat.context-switches
> 
>     3e+06 ++----------------------------------------------------------------+
>           |                              *...*..*...                        |
>   2.5e+06 *+.*...*      *..*      *      :          *..*...  .*...*..*...  .*
>           |      :      :  :      :      :                 *.            *. |
>           O      O: O  :O  O:  O  ::    :       O   O  O   O  O   O         |
>     2e+06 ++      :    :    :    :O:  O :O   O                              |
>           |       :    :    :    : :    :                                   |
>   1.5e+06 ++      :   :      :   :  :   :                                   |
>           |        :  :      :   :  :  :                                    |
>     1e+06 ++       :  :      :  :   :  :                                    |
>           |        : :        : :    : :                                    |
>           |        : :        : :    : :                                    |
>    500000 ++        ::        : :    ::                                     |
>           |         :          :      :                                     |
>         0 ++-O------*----------*------*-------------------------------------+
> 
> 
>                                   vmstat.system.cs
> 
>   10000 ++------------------------------------------------------------------+
>    9000 ++                              *...*..                             |
>         *...*..*      *...*      *      :      *...*...*..  ..*..*...*..  ..*
>    8000 ++     :      :   :      :      :                 *.            *.  |
>    7000 O+     O:  O  O   O: O  : :    :       O   O   O  O   O  O          |
>         |       :    :     :    :O:  O :O   O                               |
>    6000 ++      :    :     :    : :    :                                    |
>    5000 ++       :   :     :   :   :   :                                    |
>    4000 ++       :   :      :  :   :  :                                     |
>         |        :  :       :  :   :  :                                     |
>    3000 ++        : :       : :     : :                                     |
>    2000 ++        : :       : :     : :                                     |
>         |         : :        ::     ::                                      |
>    1000 ++         :         :       :                                      |
>       0 ++--O------*---------*-------*--------------------------------------+
> 
> 
> 	[*] bisect-good sample
> 	[O] bisect-bad  sample
> 
> To reproduce:
> 
> 	apt-get install ruby
> 	git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> 	cd lkp-tests
> 	bin/setup-local job.yaml # the job file attached in this email
> 	bin/run-local   job.yaml
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 
> Thanks,
> Ying Huang
> 




* Re: [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses
  2015-04-24  2:15   ` [LKP] " NeilBrown
@ 2015-04-30  6:25     ` Yuanhan Liu
  -1 siblings, 0 replies; 6+ messages in thread
From: Yuanhan Liu @ 2015-04-30  6:25 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 16981 bytes --]

On Fri, Apr 24, 2015 at 12:15:59PM +1000, NeilBrown wrote:
> On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying <ying.huang@intel.com> wrote:
> 
> > FYI, we noticed the below changes on
> > 
> > git://neil.brown.name/md for-next
> > commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")
> 
> Hi,
>  is there any chance that you could explain what some of this means?
> There is lots of data and some very pretty graphs, but no explanation.

Hi Neil,

(Sorry for the late response: Ying is on vacation)

I guess you can simply ignore this report, as I already reported to you
a month ago that this patch made fsmark perform better in most cases:

    https://lists.01.org/pipermail/lkp/2015-March/002411.html

> 
> Which numbers are "good", which are "bad"?  Which is "worst"?
> What do the graphs really show? and what would we like to see in them?
> 
> I think it is really great that you are doing this testing and reporting the
> results.  It's just so sad that I completely fail to understand them.

Sorry, it's our fault for making them hard to understand, and for
reporting a duplicate (well, the commit hash is different ;).

We may need to take some time to make the data easier to understand.
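
In the meantime, here is a minimal sketch of one way to read the table.
It assumes (the report does not say so explicitly) that the left column
is the mean for the base commit a87d7f782b47e030, the right column is
the mean for 878ee679279, and %change is the relative difference
between the two. pct_change is a hypothetical helper for illustration,
not part of lkp-tests; the values are copied from the quoted table.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent, one decimal."""
    if base == 0:
        # The report prints +Inf% for rows whose base value is 0.00.
        return float("inf") if new > 0 else 0.0
    return round((new - base) / base * 100.0, 1)


# (base commit mean, reported commit mean), copied from the quoted table.
rows = {
    "vmstat.io.bo": (305908, 300427),
    "vmstat.system.cs": (8266, 6968),
    "perf-stat.LLC-load-misses": (5.571e08, 7.826e08),
}

for name, (base, new) in rows.items():
    print(f"{pct_change(base, new):+6.1f}%  {name}")
```

Whether a given sign is "good" or "bad" depends on the metric, which
the table itself does not encode: a drop in vmstat.io.bo means less
write throughput for dd-write, while a drop in context switches is
usually welcome.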

	--yliu

> 
> > 
> > 
> > testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
> > 
> > a87d7f782b47e030  878ee6792799e2f88bdcac3298  
> > ----------------  --------------------------  
> >          %stddev     %change         %stddev
> >              \          |                \  
> >      59035 ±  0%     +18.4%      69913 ±  1%  softirqs.SCHED
> >       1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.num_objs
> >       1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.active_objs
> >     305908 ±  0%      -1.8%     300427 ±  0%  vmstat.io.bo
> >          1 ±  0%    +100.0%          2 ±  0%  vmstat.procs.r
> >       8266 ±  1%     -15.7%       6968 ±  0%  vmstat.system.cs
> >      14819 ±  0%      -2.1%      14503 ±  0%  vmstat.system.in
> >      18.20 ±  6%     +10.2%      20.05 ±  4%  perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
> >       1.94 ±  9%     +90.6%       3.70 ±  9%  perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >       0.00 ±  0%      +Inf%      25.18 ±  3%  perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
> >       0.00 ±  0%      +Inf%      14.14 ±  4%  perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >       1.79 ±  7%    +102.9%       3.64 ±  9%  perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
> >       3.09 ±  4%     -10.8%       2.76 ±  4%  perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
> >       0.80 ± 14%     +28.1%       1.02 ± 10%  perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> >      14.78 ±  6%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >      25.68 ±  4%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
> >       1.23 ±  5%    +140.0%       2.96 ±  7%  perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
> >       2.62 ±  6%     -95.6%       0.12 ± 33%  perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
> >       0.96 ±  9%     +17.5%       1.12 ±  2%  perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> >  1.461e+10 ±  0%      -5.3%  1.384e+10 ±  1%  perf-stat.L1-dcache-load-misses
> >  3.688e+11 ±  0%      -2.7%   3.59e+11 ±  0%  perf-stat.L1-dcache-loads
> >  1.124e+09 ±  0%     -27.7%  8.125e+08 ±  0%  perf-stat.L1-dcache-prefetches
> >  2.767e+10 ±  0%      -1.8%  2.717e+10 ±  0%  perf-stat.L1-dcache-store-misses
> >  2.352e+11 ±  0%      -2.8%  2.287e+11 ±  0%  perf-stat.L1-dcache-stores
> >  6.774e+09 ±  0%      -2.3%   6.62e+09 ±  0%  perf-stat.L1-icache-load-misses
> >  5.571e+08 ±  0%     +40.5%  7.826e+08 ±  1%  perf-stat.LLC-load-misses
> >  6.263e+09 ±  0%     -13.7%  5.407e+09 ±  1%  perf-stat.LLC-loads
> >  1.914e+11 ±  0%      -4.2%  1.833e+11 ±  0%  perf-stat.branch-instructions
> >  1.145e+09 ±  2%      -5.6%  1.081e+09 ±  0%  perf-stat.branch-load-misses
> >  1.911e+11 ±  0%      -4.3%  1.829e+11 ±  0%  perf-stat.branch-loads
> >  1.142e+09 ±  2%      -5.1%  1.083e+09 ±  0%  perf-stat.branch-misses
> >  1.218e+09 ±  0%     +19.8%   1.46e+09 ±  0%  perf-stat.cache-misses
> >  2.118e+10 ±  0%      -5.2%  2.007e+10 ±  0%  perf-stat.cache-references
> >    2510308 ±  1%     -15.7%    2115410 ±  0%  perf-stat.context-switches
> >      39623 ±  0%     +22.1%      48370 ±  1%  perf-stat.cpu-migrations
> >  4.179e+08 ± 40%    +165.7%  1.111e+09 ± 35%  perf-stat.dTLB-load-misses
> >  3.684e+11 ±  0%      -2.5%  3.592e+11 ±  0%  perf-stat.dTLB-loads
> >  1.232e+08 ± 15%     +62.5%  2.002e+08 ± 27%  perf-stat.dTLB-store-misses
> >  2.348e+11 ±  0%      -2.5%  2.288e+11 ±  0%  perf-stat.dTLB-stores
> >    3577297 ±  2%      +8.7%    3888986 ±  1%  perf-stat.iTLB-load-misses
> >  1.035e+12 ±  0%      -3.5%  9.988e+11 ±  0%  perf-stat.iTLB-loads
> >  1.036e+12 ±  0%      -3.7%  9.978e+11 ±  0%  perf-stat.instructions
> >        594 ± 30%    +130.3%       1369 ± 13%  sched_debug.cfs_rq[0]:/.blocked_load_avg
> >         17 ± 10%     -28.2%         12 ± 23%  sched_debug.cfs_rq[0]:/.nr_spread_over
> >        210 ± 21%     +42.1%        298 ± 28%  sched_debug.cfs_rq[0]:/.tg_runnable_contrib
> >       9676 ± 21%     +42.1%      13754 ± 28%  sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
> >        772 ± 25%    +116.5%       1672 ±  9%  sched_debug.cfs_rq[0]:/.tg_load_contrib
> >       8402 ±  9%     +83.3%      15405 ± 11%  sched_debug.cfs_rq[0]:/.tg_load_avg
> >       8356 ±  9%     +82.8%      15272 ± 11%  sched_debug.cfs_rq[1]:/.tg_load_avg
> >        968 ± 25%    +100.8%       1943 ± 14%  sched_debug.cfs_rq[1]:/.blocked_load_avg
> >      16242 ±  9%     -22.2%      12643 ± 14%  sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
> >        353 ±  9%     -22.1%        275 ± 14%  sched_debug.cfs_rq[1]:/.tg_runnable_contrib
> >       1183 ± 23%     +77.7%       2102 ± 12%  sched_debug.cfs_rq[1]:/.tg_load_contrib
> >        181 ±  8%     -31.4%        124 ± 26%  sched_debug.cfs_rq[2]:/.tg_runnable_contrib
> >       8364 ±  8%     -31.3%       5745 ± 26%  sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
> >       8297 ±  9%     +81.7%      15079 ± 12%  sched_debug.cfs_rq[2]:/.tg_load_avg
> >      30439 ± 13%     -45.2%      16681 ± 26%  sched_debug.cfs_rq[2]:/.exec_clock
> >      39735 ± 14%     -48.3%      20545 ± 29%  sched_debug.cfs_rq[2]:/.min_vruntime
> >       8231 ± 10%     +82.2%      15000 ± 12%  sched_debug.cfs_rq[3]:/.tg_load_avg
> >       1210 ± 14%    +110.3%       2546 ± 30%  sched_debug.cfs_rq[4]:/.tg_load_contrib
> >       8188 ± 10%     +82.8%      14964 ± 12%  sched_debug.cfs_rq[4]:/.tg_load_avg
> >       8132 ± 10%     +83.1%      14890 ± 12%  sched_debug.cfs_rq[5]:/.tg_load_avg
> >        749 ± 29%    +205.9%       2292 ± 34%  sched_debug.cfs_rq[5]:/.blocked_load_avg
> >        963 ± 30%    +169.9%       2599 ± 33%  sched_debug.cfs_rq[5]:/.tg_load_contrib
> >      37791 ± 32%     -38.6%      23209 ± 13%  sched_debug.cfs_rq[6]:/.min_vruntime
> >        693 ± 25%    +132.2%       1609 ± 29%  sched_debug.cfs_rq[6]:/.blocked_load_avg
> >      10838 ± 13%     -39.2%       6587 ± 13%  sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
> >      29329 ± 27%     -33.2%      19577 ± 10%  sched_debug.cfs_rq[6]:/.exec_clock
> >        235 ± 14%     -39.7%        142 ± 14%  sched_debug.cfs_rq[6]:/.tg_runnable_contrib
> >       8085 ± 10%     +83.6%      14848 ± 12%  sched_debug.cfs_rq[6]:/.tg_load_avg
> >        839 ± 25%    +128.5%       1917 ± 18%  sched_debug.cfs_rq[6]:/.tg_load_contrib
> >       8051 ± 10%     +83.6%      14779 ± 12%  sched_debug.cfs_rq[7]:/.tg_load_avg
> >        156 ± 34%     +97.9%        309 ± 19%  sched_debug.cpu#0.cpu_load[4]
> >        160 ± 25%     +64.0%        263 ± 16%  sched_debug.cpu#0.cpu_load[2]
> >        156 ± 32%     +83.7%        286 ± 17%  sched_debug.cpu#0.cpu_load[3]
> >        164 ± 20%     -35.1%        106 ± 31%  sched_debug.cpu#2.cpu_load[0]
> >        249 ± 15%     +80.2%        449 ± 10%  sched_debug.cpu#4.cpu_load[3]
> >        231 ± 11%    +101.2%        466 ± 13%  sched_debug.cpu#4.cpu_load[2]
> >        217 ± 14%    +189.9%        630 ± 38%  sched_debug.cpu#4.cpu_load[0]
> >      71951 ±  5%     +21.6%      87526 ±  7%  sched_debug.cpu#4.nr_load_updates
> >        214 ±  8%    +146.1%        527 ± 27%  sched_debug.cpu#4.cpu_load[1]
> >        256 ± 17%     +75.7%        449 ± 13%  sched_debug.cpu#4.cpu_load[4]
> >        209 ± 23%     +98.3%        416 ± 48%  sched_debug.cpu#5.cpu_load[2]
> >      68024 ±  2%     +18.8%      80825 ±  1%  sched_debug.cpu#5.nr_load_updates
> >        217 ± 26%     +74.9%        380 ± 45%  sched_debug.cpu#5.cpu_load[3]
> >        852 ± 21%     -38.3%        526 ± 22%  sched_debug.cpu#6.curr->pid
> > 
> > lkp-st02: Core2
> > Memory: 8G
> > 
> > 
> > 
> > 
> >                                 perf-stat.cache-misses
> > 
> >   1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
> >           |                       O   O  O   O  O   O  O   O  O   O         |
> >   1.4e+09 ++                                                                |
> >   1.2e+09 *+.*...*      *..*      *      *...*..*...*..*...*..*...*..*...*..*
> >           |      :      :  :      :      :                                  |
> >     1e+09 ++      :    :    :    : :    :                                   |
> >           |       :    :    :    : :    :                                   |
> >     8e+08 ++      :    :    :    : :    :                                   |
> >           |       :   :      :   :  :   :                                   |
> >     6e+08 ++       :  :      :  :   :  :                                    |
> >     4e+08 ++       : :        : :    : :                                    |
> >           |        : :        : :    : :                                    |
> >     2e+08 ++       : :        : :    : :                                    |
> >           |         :          :      :                                     |
> >         0 ++-O------*----------*------*-------------------------------------+
> > 
> > 
> >                             perf-stat.L1-dcache-prefetches
> > 
> >   1.2e+09 ++----------------------------------------------------------------+
> >           *..*...*      *..*      *        ..*..  ..*..*...*..*...*..*...*..*
> >     1e+09 ++     :      :  :      :      *.     *.                          |
> >           |      :     :    :     ::     :                                  |
> >           |       :    :    :    : :     :                        O         |
> >     8e+08 O+     O: O  :O  O:  O :O:  O :O   O  O   O  O   O  O             |
> >           |       :   :      :   :  :   :                                   |
> >     6e+08 ++      :   :      :   :  :   :                                   |
> >           |        :  :      :  :   :   :                                   |
> >     4e+08 ++       :  :      :  :   :  :                                    |
> >           |        : :        : :    : :                                    |
> >           |        : :        : :    : :                                    |
> >     2e+08 ++        ::        ::     : :                                    |
> >           |         :          :      :                                     |
> >         0 ++-O------*----------*------*-------------------------------------+
> > 
> > 
> >                               perf-stat.LLC-load-misses
> > 
> >   1e+09 ++------------------------------------------------------------------+
> >   9e+08 O+     O   O  O   O  O                                              |
> >         |                        O   O  O   O                               |
> >   8e+08 ++                                     O   O   O  O   O  O          |
> >   7e+08 ++                                                                  |
> >         |                                                                   |
> >   6e+08 *+..*..*      *...*      *      *...*..*...*...*..*...*..*...*..*...*
> >   5e+08 ++      :     :   :      ::     :                                   |
> >   4e+08 ++      :    :     :    : :    :                                    |
> >         |        :   :     :    :  :   :                                    |
> >   3e+08 ++       :   :      :  :   :   :                                    |
> >   2e+08 ++        : :       :  :    : :                                     |
> >         |         : :       : :     : :                                     |
> >   1e+08 ++         :         ::      :                                      |
> >       0 ++--O------*---------*-------*--------------------------------------+
> > 
> > 
> >                               perf-stat.context-switches
> > 
> >     3e+06 ++----------------------------------------------------------------+
> >           |                              *...*..*...                        |
> >   2.5e+06 *+.*...*      *..*      *      :          *..*...  .*...*..*...  .*
> >           |      :      :  :      :      :                 *.            *. |
> >           O      O: O  :O  O:  O  ::    :       O   O  O   O  O   O         |
> >     2e+06 ++      :    :    :    :O:  O :O   O                              |
> >           |       :    :    :    : :    :                                   |
> >   1.5e+06 ++      :   :      :   :  :   :                                   |
> >           |        :  :      :   :  :  :                                    |
> >     1e+06 ++       :  :      :  :   :  :                                    |
> >           |        : :        : :    : :                                    |
> >           |        : :        : :    : :                                    |
> >    500000 ++        ::        : :    ::                                     |
> >           |         :          :      :                                     |
> >         0 ++-O------*----------*------*-------------------------------------+
> > 
> > 
> >                                   vmstat.system.cs
> > 
> >   10000 ++------------------------------------------------------------------+
> >    9000 ++                              *...*..                             |
> >         *...*..*      *...*      *      :      *...*...*..  ..*..*...*..  ..*
> >    8000 ++     :      :   :      :      :                 *.            *.  |
> >    7000 O+     O:  O  O   O: O  : :    :       O   O   O  O   O  O          |
> >         |       :    :     :    :O:  O :O   O                               |
> >    6000 ++      :    :     :    : :    :                                    |
> >    5000 ++       :   :     :   :   :   :                                    |
> >    4000 ++       :   :      :  :   :  :                                     |
> >         |        :  :       :  :   :  :                                     |
> >    3000 ++        : :       : :     : :                                     |
> >    2000 ++        : :       : :     : :                                     |
> >         |         : :        ::     ::                                      |
> >    1000 ++         :         :       :                                      |
> >       0 ++--O------*---------*-------*--------------------------------------+
> > 
> > 
> > 	[*] bisect-good sample
> > 	[O] bisect-bad  sample
> > 
> > To reproduce:
> > 
> > 	apt-get install ruby
> > 	git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> > 	cd lkp-tests
> > 	bin/setup-local job.yaml # the job file attached in this email
> > 	bin/run-local   job.yaml
> > 
> > 
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> > 
> > 
> > Thanks,
> > Ying Huang
> > 
> 




* Re: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses
@ 2015-04-30  6:25     ` Yuanhan Liu
  0 siblings, 0 replies; 6+ messages in thread
From: Yuanhan Liu @ 2015-04-30  6:25 UTC (permalink / raw)
  To: NeilBrown; +Cc: Huang Ying, shli@kernel.org, LKML, LKP ML, Fengguang Wu

On Fri, Apr 24, 2015 at 12:15:59PM +1000, NeilBrown wrote:
> On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying <ying.huang@intel.com> wrote:
> 
> > FYI, we noticed the below changes on
> > 
> > git://neil.brown.name/md for-next
> > commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")
> 
> Hi,
>  is there any chance that you could explain what some of this means?
> There is lots of data and some very pretty graphs, but no explanation.

Hi Neil,

(Sorry for the late response: Ying is on vacation)

I guess you can simply ignore this report, as I already reported to you
a month ago that this patch made fsmark perform better in most cases:

    https://lists.01.org/pipermail/lkp/2015-March/002411.html

> 
> Which numbers are "good", which are "bad"?  Which is "worst"?
> What do the graphs really show? and what would we like to see in them?
> 
> I think it is really great that you are doing this testing and reporting the
> results.  It's just so sad that I completely fail to understand them.

Sorry, it's our fault for making them hard to understand, and for
reporting a duplicate (well, the commit hash is different ;).

We may need to take some time to make the data easier to understand.

	--yliu

> 
> > 
> > 
> > testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
> > 
> > a87d7f782b47e030  878ee6792799e2f88bdcac3298  
> > ----------------  --------------------------  
> >          %stddev     %change         %stddev
> >              \          |                \  
> >      59035 ±  0%     +18.4%      69913 ±  1%  softirqs.SCHED
> >       1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.num_objs
> >       1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.active_objs
> >     305908 ±  0%      -1.8%     300427 ±  0%  vmstat.io.bo
> >          1 ±  0%    +100.0%          2 ±  0%  vmstat.procs.r
> >       8266 ±  1%     -15.7%       6968 ±  0%  vmstat.system.cs
> >      14819 ±  0%      -2.1%      14503 ±  0%  vmstat.system.in
> >      18.20 ±  6%     +10.2%      20.05 ±  4%  perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
> >       1.94 ±  9%     +90.6%       3.70 ±  9%  perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >       0.00 ±  0%      +Inf%      25.18 ±  3%  perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
> >       0.00 ±  0%      +Inf%      14.14 ±  4%  perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >       1.79 ±  7%    +102.9%       3.64 ±  9%  perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
> >       3.09 ±  4%     -10.8%       2.76 ±  4%  perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
> >       0.80 ± 14%     +28.1%       1.02 ± 10%  perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> >      14.78 ±  6%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >      25.68 ±  4%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
> >       1.23 ±  5%    +140.0%       2.96 ±  7%  perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
> >       2.62 ±  6%     -95.6%       0.12 ± 33%  perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
> >       0.96 ±  9%     +17.5%       1.12 ±  2%  perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> >  1.461e+10 ±  0%      -5.3%  1.384e+10 ±  1%  perf-stat.L1-dcache-load-misses
> >  3.688e+11 ±  0%      -2.7%   3.59e+11 ±  0%  perf-stat.L1-dcache-loads
> >  1.124e+09 ±  0%     -27.7%  8.125e+08 ±  0%  perf-stat.L1-dcache-prefetches
> >  2.767e+10 ±  0%      -1.8%  2.717e+10 ±  0%  perf-stat.L1-dcache-store-misses
> >  2.352e+11 ±  0%      -2.8%  2.287e+11 ±  0%  perf-stat.L1-dcache-stores
> >  6.774e+09 ±  0%      -2.3%   6.62e+09 ±  0%  perf-stat.L1-icache-load-misses
> >  5.571e+08 ±  0%     +40.5%  7.826e+08 ±  1%  perf-stat.LLC-load-misses
> >  6.263e+09 ±  0%     -13.7%  5.407e+09 ±  1%  perf-stat.LLC-loads
> >  1.914e+11 ±  0%      -4.2%  1.833e+11 ±  0%  perf-stat.branch-instructions
> >  1.145e+09 ±  2%      -5.6%  1.081e+09 ±  0%  perf-stat.branch-load-misses
> >  1.911e+11 ±  0%      -4.3%  1.829e+11 ±  0%  perf-stat.branch-loads
> >  1.142e+09 ±  2%      -5.1%  1.083e+09 ±  0%  perf-stat.branch-misses
> >  1.218e+09 ±  0%     +19.8%   1.46e+09 ±  0%  perf-stat.cache-misses
> >  2.118e+10 ±  0%      -5.2%  2.007e+10 ±  0%  perf-stat.cache-references
> >    2510308 ±  1%     -15.7%    2115410 ±  0%  perf-stat.context-switches
> >      39623 ±  0%     +22.1%      48370 ±  1%  perf-stat.cpu-migrations
> >  4.179e+08 ± 40%    +165.7%  1.111e+09 ± 35%  perf-stat.dTLB-load-misses
> >  3.684e+11 ±  0%      -2.5%  3.592e+11 ±  0%  perf-stat.dTLB-loads
> >  1.232e+08 ± 15%     +62.5%  2.002e+08 ± 27%  perf-stat.dTLB-store-misses
> >  2.348e+11 ±  0%      -2.5%  2.288e+11 ±  0%  perf-stat.dTLB-stores
> >    3577297 ±  2%      +8.7%    3888986 ±  1%  perf-stat.iTLB-load-misses
> >  1.035e+12 ±  0%      -3.5%  9.988e+11 ±  0%  perf-stat.iTLB-loads
> >  1.036e+12 ±  0%      -3.7%  9.978e+11 ±  0%  perf-stat.instructions
> >        594 ± 30%    +130.3%       1369 ± 13%  sched_debug.cfs_rq[0]:/.blocked_load_avg
> >         17 ± 10%     -28.2%         12 ± 23%  sched_debug.cfs_rq[0]:/.nr_spread_over
> >        210 ± 21%     +42.1%        298 ± 28%  sched_debug.cfs_rq[0]:/.tg_runnable_contrib
> >       9676 ± 21%     +42.1%      13754 ± 28%  sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
> >        772 ± 25%    +116.5%       1672 ±  9%  sched_debug.cfs_rq[0]:/.tg_load_contrib
> >       8402 ±  9%     +83.3%      15405 ± 11%  sched_debug.cfs_rq[0]:/.tg_load_avg
> >       8356 ±  9%     +82.8%      15272 ± 11%  sched_debug.cfs_rq[1]:/.tg_load_avg
> >        968 ± 25%    +100.8%       1943 ± 14%  sched_debug.cfs_rq[1]:/.blocked_load_avg
> >      16242 ±  9%     -22.2%      12643 ± 14%  sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
> >        353 ±  9%     -22.1%        275 ± 14%  sched_debug.cfs_rq[1]:/.tg_runnable_contrib
> >       1183 ± 23%     +77.7%       2102 ± 12%  sched_debug.cfs_rq[1]:/.tg_load_contrib
> >        181 ±  8%     -31.4%        124 ± 26%  sched_debug.cfs_rq[2]:/.tg_runnable_contrib
> >       8364 ±  8%     -31.3%       5745 ± 26%  sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
> >       8297 ±  9%     +81.7%      15079 ± 12%  sched_debug.cfs_rq[2]:/.tg_load_avg
> >      30439 ± 13%     -45.2%      16681 ± 26%  sched_debug.cfs_rq[2]:/.exec_clock
> >      39735 ± 14%     -48.3%      20545 ± 29%  sched_debug.cfs_rq[2]:/.min_vruntime
> >       8231 ± 10%     +82.2%      15000 ± 12%  sched_debug.cfs_rq[3]:/.tg_load_avg
> >       1210 ± 14%    +110.3%       2546 ± 30%  sched_debug.cfs_rq[4]:/.tg_load_contrib
> >       8188 ± 10%     +82.8%      14964 ± 12%  sched_debug.cfs_rq[4]:/.tg_load_avg
> >       8132 ± 10%     +83.1%      14890 ± 12%  sched_debug.cfs_rq[5]:/.tg_load_avg
> >        749 ± 29%    +205.9%       2292 ± 34%  sched_debug.cfs_rq[5]:/.blocked_load_avg
> >        963 ± 30%    +169.9%       2599 ± 33%  sched_debug.cfs_rq[5]:/.tg_load_contrib
> >      37791 ± 32%     -38.6%      23209 ± 13%  sched_debug.cfs_rq[6]:/.min_vruntime
> >        693 ± 25%    +132.2%       1609 ± 29%  sched_debug.cfs_rq[6]:/.blocked_load_avg
> >      10838 ± 13%     -39.2%       6587 ± 13%  sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
> >      29329 ± 27%     -33.2%      19577 ± 10%  sched_debug.cfs_rq[6]:/.exec_clock
> >        235 ± 14%     -39.7%        142 ± 14%  sched_debug.cfs_rq[6]:/.tg_runnable_contrib
> >       8085 ± 10%     +83.6%      14848 ± 12%  sched_debug.cfs_rq[6]:/.tg_load_avg
> >        839 ± 25%    +128.5%       1917 ± 18%  sched_debug.cfs_rq[6]:/.tg_load_contrib
> >       8051 ± 10%     +83.6%      14779 ± 12%  sched_debug.cfs_rq[7]:/.tg_load_avg
> >        156 ± 34%     +97.9%        309 ± 19%  sched_debug.cpu#0.cpu_load[4]
> >        160 ± 25%     +64.0%        263 ± 16%  sched_debug.cpu#0.cpu_load[2]
> >        156 ± 32%     +83.7%        286 ± 17%  sched_debug.cpu#0.cpu_load[3]
> >        164 ± 20%     -35.1%        106 ± 31%  sched_debug.cpu#2.cpu_load[0]
> >        249 ± 15%     +80.2%        449 ± 10%  sched_debug.cpu#4.cpu_load[3]
> >        231 ± 11%    +101.2%        466 ± 13%  sched_debug.cpu#4.cpu_load[2]
> >        217 ± 14%    +189.9%        630 ± 38%  sched_debug.cpu#4.cpu_load[0]
> >      71951 ±  5%     +21.6%      87526 ±  7%  sched_debug.cpu#4.nr_load_updates
> >        214 ±  8%    +146.1%        527 ± 27%  sched_debug.cpu#4.cpu_load[1]
> >        256 ± 17%     +75.7%        449 ± 13%  sched_debug.cpu#4.cpu_load[4]
> >        209 ± 23%     +98.3%        416 ± 48%  sched_debug.cpu#5.cpu_load[2]
> >      68024 ±  2%     +18.8%      80825 ±  1%  sched_debug.cpu#5.nr_load_updates
> >        217 ± 26%     +74.9%        380 ± 45%  sched_debug.cpu#5.cpu_load[3]
> >        852 ± 21%     -38.3%        526 ± 22%  sched_debug.cpu#6.curr->pid
> > 
> > lkp-st02: Core2
> > Memory: 8G
> > 
> > 
> > 
> > 
> >                                 perf-stat.cache-misses
> > 
> >   1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
> >           |                       O   O  O   O  O   O  O   O  O   O         |
> >   1.4e+09 ++                                                                |
> >   1.2e+09 *+.*...*      *..*      *      *...*..*...*..*...*..*...*..*...*..*
> >           |      :      :  :      :      :                                  |
> >     1e+09 ++      :    :    :    : :    :                                   |
> >           |       :    :    :    : :    :                                   |
> >     8e+08 ++      :    :    :    : :    :                                   |
> >           |       :   :      :   :  :   :                                   |
> >     6e+08 ++       :  :      :  :   :  :                                    |
> >     4e+08 ++       : :        : :    : :                                    |
> >           |        : :        : :    : :                                    |
> >     2e+08 ++       : :        : :    : :                                    |
> >           |         :          :      :                                     |
> >         0 ++-O------*----------*------*-------------------------------------+
> > 
> > 
> >                             perf-stat.L1-dcache-prefetches
> > 
> >   1.2e+09 ++----------------------------------------------------------------+
> >           *..*...*      *..*      *        ..*..  ..*..*...*..*...*..*...*..*
> >     1e+09 ++     :      :  :      :      *.     *.                          |
> >           |      :     :    :     ::     :                                  |
> >           |       :    :    :    : :     :                        O         |
> >     8e+08 O+     O: O  :O  O:  O :O:  O :O   O  O   O  O   O  O             |
> >           |       :   :      :   :  :   :                                   |
> >     6e+08 ++      :   :      :   :  :   :                                   |
> >           |        :  :      :  :   :   :                                   |
> >     4e+08 ++       :  :      :  :   :  :                                    |
> >           |        : :        : :    : :                                    |
> >           |        : :        : :    : :                                    |
> >     2e+08 ++        ::        ::     : :                                    |
> >           |         :          :      :                                     |
> >         0 ++-O------*----------*------*-------------------------------------+
> > 
> > 
> >                               perf-stat.LLC-load-misses
> > 
> >   1e+09 ++------------------------------------------------------------------+
> >   9e+08 O+     O   O  O   O  O                                              |
> >         |                        O   O  O   O                               |
> >   8e+08 ++                                     O   O   O  O   O  O          |
> >   7e+08 ++                                                                  |
> >         |                                                                   |
> >   6e+08 *+..*..*      *...*      *      *...*..*...*...*..*...*..*...*..*...*
> >   5e+08 ++      :     :   :      ::     :                                   |
> >   4e+08 ++      :    :     :    : :    :                                    |
> >         |        :   :     :    :  :   :                                    |
> >   3e+08 ++       :   :      :  :   :   :                                    |
> >   2e+08 ++        : :       :  :    : :                                     |
> >         |         : :       : :     : :                                     |
> >   1e+08 ++         :         ::      :                                      |
> >       0 ++--O------*---------*-------*--------------------------------------+
> > 
> > 
> >                               perf-stat.context-switches
> > 
> >     3e+06 ++----------------------------------------------------------------+
> >           |                              *...*..*...                        |
> >   2.5e+06 *+.*...*      *..*      *      :          *..*...  .*...*..*...  .*
> >           |      :      :  :      :      :                 *.            *. |
> >           O      O: O  :O  O:  O  ::    :       O   O  O   O  O   O         |
> >     2e+06 ++      :    :    :    :O:  O :O   O                              |
> >           |       :    :    :    : :    :                                   |
> >   1.5e+06 ++      :   :      :   :  :   :                                   |
> >           |        :  :      :   :  :  :                                    |
> >     1e+06 ++       :  :      :  :   :  :                                    |
> >           |        : :        : :    : :                                    |
> >           |        : :        : :    : :                                    |
> >    500000 ++        ::        : :    ::                                     |
> >           |         :          :      :                                     |
> >         0 ++-O------*----------*------*-------------------------------------+
> > 
> > 
> >                                   vmstat.system.cs
> > 
> >   10000 ++------------------------------------------------------------------+
> >    9000 ++                              *...*..                             |
> >         *...*..*      *...*      *      :      *...*...*..  ..*..*...*..  ..*
> >    8000 ++     :      :   :      :      :                 *.            *.  |
> >    7000 O+     O:  O  O   O: O  : :    :       O   O   O  O   O  O          |
> >         |       :    :     :    :O:  O :O   O                               |
> >    6000 ++      :    :     :    : :    :                                    |
> >    5000 ++       :   :     :   :   :   :                                    |
> >    4000 ++       :   :      :  :   :  :                                     |
> >         |        :  :       :  :   :  :                                     |
> >    3000 ++        : :       : :     : :                                     |
> >    2000 ++        : :       : :     : :                                     |
> >         |         : :        ::     ::                                      |
> >    1000 ++         :         :       :                                      |
> >       0 ++--O------*---------*-------*--------------------------------------+
> > 
> > 
> > 	[*] bisect-good sample
> > 	[O] bisect-bad  sample
> > 
> > To reproduce:
> > 
> > 	apt-get install ruby
> > 	git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> > 	cd lkp-tests
> > 	bin/setup-local job.yaml # the job file attached in this email
> > 	bin/run-local   job.yaml
> > 
> > 
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> > 
> > 
> > Thanks,
> > Ying Huang
> > 
> 




end of thread, other threads:[~2015-04-30  6:25 UTC | newest]

Thread overview: 6+ messages
2015-04-23  6:55 [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses Huang Ying
2015-04-23  6:55 ` [LKP] " Huang Ying
2015-04-24  2:15 ` NeilBrown
2015-04-24  2:15   ` [LKP] " NeilBrown
2015-04-30  6:25   ` Yuanhan Liu
2015-04-30  6:25     ` [LKP] " Yuanhan Liu
