public inbox for linux-kernel@vger.kernel.org
From: Huang Ying <ying.huang@intel.com>
To: "shli@kernel.org" <shli@kernel.org>
Cc: NeilBrown <neilb@suse.de>, LKML <linux-kernel@vger.kernel.org>,
	LKP ML <lkp@01.org>
Subject: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses
Date: Thu, 23 Apr 2015 14:55:59 +0800	[thread overview]
Message-ID: <1429772159.25120.9.camel@intel.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 14775 bytes --]

FYI, we noticed the following changes on

git://neil.brown.name/md for-next
commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")
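(The commit title describes grouping writes that cover adjacent full stripes so they can be handled as one batch. The following is purely an illustrative sketch of that batching idea, not the kernel code; plain stripe numbers stand in for stripe_head sectors, and the function name is hypothetical.)

```python
def batch_adjacent(stripes):
    """Group consecutive stripe numbers into batches.

    stripes: iterable of stripe numbers (hypothetical stand-in for
    stripe_head sectors). Returns a list of batches, where each batch
    is a run of adjacent stripes that could be processed together.
    """
    batches = []
    for s in sorted(stripes):
        # extend the current batch if this stripe is adjacent to its tail
        if batches and s == batches[-1][-1] + 1:
            batches[-1].append(s)
        else:
            batches.append([s])
    return batches
```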


testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd

a87d7f782b47e030  878ee6792799e2f88bdcac3298  
----------------  --------------------------  
         %stddev     %change         %stddev
             \          |                \  
     59035 ±  0%     +18.4%      69913 ±  1%  softirqs.SCHED
      1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.num_objs
      1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.active_objs
    305908 ±  0%      -1.8%     300427 ±  0%  vmstat.io.bo
         1 ±  0%    +100.0%          2 ±  0%  vmstat.procs.r
      8266 ±  1%     -15.7%       6968 ±  0%  vmstat.system.cs
     14819 ±  0%      -2.1%      14503 ±  0%  vmstat.system.in
     18.20 ±  6%     +10.2%      20.05 ±  4%  perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
      1.94 ±  9%     +90.6%       3.70 ±  9%  perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
      0.00 ±  0%      +Inf%      25.18 ±  3%  perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
      0.00 ±  0%      +Inf%      14.14 ±  4%  perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
      1.79 ±  7%    +102.9%       3.64 ±  9%  perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
      3.09 ±  4%     -10.8%       2.76 ±  4%  perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
      0.80 ± 14%     +28.1%       1.02 ± 10%  perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
     14.78 ±  6%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
     25.68 ±  4%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
      1.23 ±  5%    +140.0%       2.96 ±  7%  perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
      2.62 ±  6%     -95.6%       0.12 ± 33%  perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
      0.96 ±  9%     +17.5%       1.12 ±  2%  perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
 1.461e+10 ±  0%      -5.3%  1.384e+10 ±  1%  perf-stat.L1-dcache-load-misses
 3.688e+11 ±  0%      -2.7%   3.59e+11 ±  0%  perf-stat.L1-dcache-loads
 1.124e+09 ±  0%     -27.7%  8.125e+08 ±  0%  perf-stat.L1-dcache-prefetches
 2.767e+10 ±  0%      -1.8%  2.717e+10 ±  0%  perf-stat.L1-dcache-store-misses
 2.352e+11 ±  0%      -2.8%  2.287e+11 ±  0%  perf-stat.L1-dcache-stores
 6.774e+09 ±  0%      -2.3%   6.62e+09 ±  0%  perf-stat.L1-icache-load-misses
 5.571e+08 ±  0%     +40.5%  7.826e+08 ±  1%  perf-stat.LLC-load-misses
 6.263e+09 ±  0%     -13.7%  5.407e+09 ±  1%  perf-stat.LLC-loads
 1.914e+11 ±  0%      -4.2%  1.833e+11 ±  0%  perf-stat.branch-instructions
 1.145e+09 ±  2%      -5.6%  1.081e+09 ±  0%  perf-stat.branch-load-misses
 1.911e+11 ±  0%      -4.3%  1.829e+11 ±  0%  perf-stat.branch-loads
 1.142e+09 ±  2%      -5.1%  1.083e+09 ±  0%  perf-stat.branch-misses
 1.218e+09 ±  0%     +19.8%   1.46e+09 ±  0%  perf-stat.cache-misses
 2.118e+10 ±  0%      -5.2%  2.007e+10 ±  0%  perf-stat.cache-references
   2510308 ±  1%     -15.7%    2115410 ±  0%  perf-stat.context-switches
     39623 ±  0%     +22.1%      48370 ±  1%  perf-stat.cpu-migrations
 4.179e+08 ± 40%    +165.7%  1.111e+09 ± 35%  perf-stat.dTLB-load-misses
 3.684e+11 ±  0%      -2.5%  3.592e+11 ±  0%  perf-stat.dTLB-loads
 1.232e+08 ± 15%     +62.5%  2.002e+08 ± 27%  perf-stat.dTLB-store-misses
 2.348e+11 ±  0%      -2.5%  2.288e+11 ±  0%  perf-stat.dTLB-stores
   3577297 ±  2%      +8.7%    3888986 ±  1%  perf-stat.iTLB-load-misses
 1.035e+12 ±  0%      -3.5%  9.988e+11 ±  0%  perf-stat.iTLB-loads
 1.036e+12 ±  0%      -3.7%  9.978e+11 ±  0%  perf-stat.instructions
       594 ± 30%    +130.3%       1369 ± 13%  sched_debug.cfs_rq[0]:/.blocked_load_avg
        17 ± 10%     -28.2%         12 ± 23%  sched_debug.cfs_rq[0]:/.nr_spread_over
       210 ± 21%     +42.1%        298 ± 28%  sched_debug.cfs_rq[0]:/.tg_runnable_contrib
      9676 ± 21%     +42.1%      13754 ± 28%  sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
       772 ± 25%    +116.5%       1672 ±  9%  sched_debug.cfs_rq[0]:/.tg_load_contrib
      8402 ±  9%     +83.3%      15405 ± 11%  sched_debug.cfs_rq[0]:/.tg_load_avg
      8356 ±  9%     +82.8%      15272 ± 11%  sched_debug.cfs_rq[1]:/.tg_load_avg
       968 ± 25%    +100.8%       1943 ± 14%  sched_debug.cfs_rq[1]:/.blocked_load_avg
     16242 ±  9%     -22.2%      12643 ± 14%  sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
       353 ±  9%     -22.1%        275 ± 14%  sched_debug.cfs_rq[1]:/.tg_runnable_contrib
      1183 ± 23%     +77.7%       2102 ± 12%  sched_debug.cfs_rq[1]:/.tg_load_contrib
       181 ±  8%     -31.4%        124 ± 26%  sched_debug.cfs_rq[2]:/.tg_runnable_contrib
      8364 ±  8%     -31.3%       5745 ± 26%  sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
      8297 ±  9%     +81.7%      15079 ± 12%  sched_debug.cfs_rq[2]:/.tg_load_avg
     30439 ± 13%     -45.2%      16681 ± 26%  sched_debug.cfs_rq[2]:/.exec_clock
     39735 ± 14%     -48.3%      20545 ± 29%  sched_debug.cfs_rq[2]:/.min_vruntime
      8231 ± 10%     +82.2%      15000 ± 12%  sched_debug.cfs_rq[3]:/.tg_load_avg
      1210 ± 14%    +110.3%       2546 ± 30%  sched_debug.cfs_rq[4]:/.tg_load_contrib
      8188 ± 10%     +82.8%      14964 ± 12%  sched_debug.cfs_rq[4]:/.tg_load_avg
      8132 ± 10%     +83.1%      14890 ± 12%  sched_debug.cfs_rq[5]:/.tg_load_avg
       749 ± 29%    +205.9%       2292 ± 34%  sched_debug.cfs_rq[5]:/.blocked_load_avg
       963 ± 30%    +169.9%       2599 ± 33%  sched_debug.cfs_rq[5]:/.tg_load_contrib
     37791 ± 32%     -38.6%      23209 ± 13%  sched_debug.cfs_rq[6]:/.min_vruntime
       693 ± 25%    +132.2%       1609 ± 29%  sched_debug.cfs_rq[6]:/.blocked_load_avg
     10838 ± 13%     -39.2%       6587 ± 13%  sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
     29329 ± 27%     -33.2%      19577 ± 10%  sched_debug.cfs_rq[6]:/.exec_clock
       235 ± 14%     -39.7%        142 ± 14%  sched_debug.cfs_rq[6]:/.tg_runnable_contrib
      8085 ± 10%     +83.6%      14848 ± 12%  sched_debug.cfs_rq[6]:/.tg_load_avg
       839 ± 25%    +128.5%       1917 ± 18%  sched_debug.cfs_rq[6]:/.tg_load_contrib
      8051 ± 10%     +83.6%      14779 ± 12%  sched_debug.cfs_rq[7]:/.tg_load_avg
       156 ± 34%     +97.9%        309 ± 19%  sched_debug.cpu#0.cpu_load[4]
       160 ± 25%     +64.0%        263 ± 16%  sched_debug.cpu#0.cpu_load[2]
       156 ± 32%     +83.7%        286 ± 17%  sched_debug.cpu#0.cpu_load[3]
       164 ± 20%     -35.1%        106 ± 31%  sched_debug.cpu#2.cpu_load[0]
       249 ± 15%     +80.2%        449 ± 10%  sched_debug.cpu#4.cpu_load[3]
       231 ± 11%    +101.2%        466 ± 13%  sched_debug.cpu#4.cpu_load[2]
       217 ± 14%    +189.9%        630 ± 38%  sched_debug.cpu#4.cpu_load[0]
     71951 ±  5%     +21.6%      87526 ±  7%  sched_debug.cpu#4.nr_load_updates
       214 ±  8%    +146.1%        527 ± 27%  sched_debug.cpu#4.cpu_load[1]
       256 ± 17%     +75.7%        449 ± 13%  sched_debug.cpu#4.cpu_load[4]
       209 ± 23%     +98.3%        416 ± 48%  sched_debug.cpu#5.cpu_load[2]
     68024 ±  2%     +18.8%      80825 ±  1%  sched_debug.cpu#5.nr_load_updates
       217 ± 26%     +74.9%        380 ± 45%  sched_debug.cpu#5.cpu_load[3]
       852 ± 21%     -38.3%        526 ± 22%  sched_debug.cpu#6.curr->pid

lkp-st02: Core2
Memory: 8G




                                perf-stat.cache-misses

  1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
          |                       O   O  O   O  O   O  O   O  O   O         |
  1.4e+09 ++                                                                |
  1.2e+09 *+.*...*      *..*      *      *...*..*...*..*...*..*...*..*...*..*
          |      :      :  :      :      :                                  |
    1e+09 ++      :    :    :    : :    :                                   |
          |       :    :    :    : :    :                                   |
    8e+08 ++      :    :    :    : :    :                                   |
          |       :   :      :   :  :   :                                   |
    6e+08 ++       :  :      :  :   :  :                                    |
    4e+08 ++       : :        : :    : :                                    |
          |        : :        : :    : :                                    |
    2e+08 ++       : :        : :    : :                                    |
          |         :          :      :                                     |
        0 ++-O------*----------*------*-------------------------------------+


                            perf-stat.L1-dcache-prefetches

  1.2e+09 ++----------------------------------------------------------------+
          *..*...*      *..*      *        ..*..  ..*..*...*..*...*..*...*..*
    1e+09 ++     :      :  :      :      *.     *.                          |
          |      :     :    :     ::     :                                  |
          |       :    :    :    : :     :                        O         |
    8e+08 O+     O: O  :O  O:  O :O:  O :O   O  O   O  O   O  O             |
          |       :   :      :   :  :   :                                   |
    6e+08 ++      :   :      :   :  :   :                                   |
          |        :  :      :  :   :   :                                   |
    4e+08 ++       :  :      :  :   :  :                                    |
          |        : :        : :    : :                                    |
          |        : :        : :    : :                                    |
    2e+08 ++        ::        ::     : :                                    |
          |         :          :      :                                     |
        0 ++-O------*----------*------*-------------------------------------+


                              perf-stat.LLC-load-misses

  1e+09 ++------------------------------------------------------------------+
  9e+08 O+     O   O  O   O  O                                              |
        |                        O   O  O   O                               |
  8e+08 ++                                     O   O   O  O   O  O          |
  7e+08 ++                                                                  |
        |                                                                   |
  6e+08 *+..*..*      *...*      *      *...*..*...*...*..*...*..*...*..*...*
  5e+08 ++      :     :   :      ::     :                                   |
  4e+08 ++      :    :     :    : :    :                                    |
        |        :   :     :    :  :   :                                    |
  3e+08 ++       :   :      :  :   :   :                                    |
  2e+08 ++        : :       :  :    : :                                     |
        |         : :       : :     : :                                     |
  1e+08 ++         :         ::      :                                      |
      0 ++--O------*---------*-------*--------------------------------------+


                              perf-stat.context-switches

    3e+06 ++----------------------------------------------------------------+
          |                              *...*..*...                        |
  2.5e+06 *+.*...*      *..*      *      :          *..*...  .*...*..*...  .*
          |      :      :  :      :      :                 *.            *. |
          O      O: O  :O  O:  O  ::    :       O   O  O   O  O   O         |
    2e+06 ++      :    :    :    :O:  O :O   O                              |
          |       :    :    :    : :    :                                   |
  1.5e+06 ++      :   :      :   :  :   :                                   |
          |        :  :      :   :  :  :                                    |
    1e+06 ++       :  :      :  :   :  :                                    |
          |        : :        : :    : :                                    |
          |        : :        : :    : :                                    |
   500000 ++        ::        : :    ::                                     |
          |         :          :      :                                     |
        0 ++-O------*----------*------*-------------------------------------+


                                  vmstat.system.cs

  10000 ++------------------------------------------------------------------+
   9000 ++                              *...*..                             |
        *...*..*      *...*      *      :      *...*...*..  ..*..*...*..  ..*
   8000 ++     :      :   :      :      :                 *.            *.  |
   7000 O+     O:  O  O   O: O  : :    :       O   O   O  O   O  O          |
        |       :    :     :    :O:  O :O   O                               |
   6000 ++      :    :     :    : :    :                                    |
   5000 ++       :   :     :   :   :   :                                    |
   4000 ++       :   :      :  :   :  :                                     |
        |        :  :       :  :   :  :                                     |
   3000 ++        : :       : :     : :                                     |
   2000 ++        : :       : :     : :                                     |
        |         : :        ::     ::                                      |
   1000 ++         :         :       :                                      |
      0 ++--O------*---------*-------*--------------------------------------+


	[*] bisect-good sample
	[O] bisect-bad  sample

To reproduce:

	apt-get install ruby
	git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
	cd lkp-tests
	bin/setup-local job.yaml # the job file attached in this email
	bin/run-local   job.yaml


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Ying Huang


[-- Attachment #2: job.yaml --]
[-- Type: text/plain, Size: 3296 bytes --]

---
testcase: dd-write
default-monitors:
  wait: pre-test
  uptime: 
  iostat: 
  vmstat: 
  numa-numastat: 
  numa-vmstat: 
  numa-meminfo: 
  proc-vmstat: 
  proc-stat: 
  meminfo: 
  slabinfo: 
  interrupts: 
  lock_stat: 
  latency_stats: 
  softirqs: 
  bdi_dev_mapping: 
  diskstats: 
  nfsstat: 
  cpuidle: 
  cpufreq-stats: 
  turbostat: 
  pmeter: 
  sched_debug:
    interval: 10
default-watchdogs:
  watch-oom: 
  watchdog: 
cpufreq_governor: 
commit: a1a71cc4c0a53e29fe27cede9392b0ad816ee956
model: Core2
memory: 8G
nr_hdd_partitions: 12
wait_disks_timeout: 300
hdd_partitions: "/dev/disk/by-id/scsi-35000c5000???????"
swap_partitions: 
runtime: 5m
disk: 11HDD
md: RAID5
iosched: cfq
fs: xfs
fs2: 
monitors:
  perf-stat: 
  perf-profile: 
  ftrace:
    events: balance_dirty_pages bdi_dirty_ratelimit global_dirty_state writeback_single_inode
nr_threads: 1dd
dd: 
testbox: lkp-st02
tbox_group: lkp-st02
kconfig: x86_64-rhel
enqueue_time: 2015-04-19 11:59:58.120063120 +08:00
head_commit: a1a71cc4c0a53e29fe27cede9392b0ad816ee956
base_commit: 39a8804455fb23f09157341d3ba7db6d7ae6ee76
branch: linux-devel/devel-hourly-2015042014
kernel: "/kernel/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/vmlinuz-4.0.0-09109-ga1a71cc"
user: lkp
queue: cyclic
rootfs: debian-x86_64-2015-02-07.cgz
result_root: "/result/lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd/debian-x86_64-2015-02-07.cgz/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/0"
LKP_SERVER: inn
job_file: "/lkp/scheduled/lkp-st02/cyclic_dd-write-300-5m-11HDD-RAID5-cfq-xfs-1dd-x86_64-rhel-HEAD-a1a71cc4c0a53e29fe27cede9392b0ad816ee956-0-20150419-35022-17ddag2.yaml"
dequeue_time: 2015-04-20 16:17:46.635323077 +08:00
nr_cpu: "$(nproc)"
initrd: "/osimage/debian/debian-x86_64-2015-02-07.cgz"
bootloader_append:
- root=/dev/ram0
- user=lkp
- job=/lkp/scheduled/lkp-st02/cyclic_dd-write-300-5m-11HDD-RAID5-cfq-xfs-1dd-x86_64-rhel-HEAD-a1a71cc4c0a53e29fe27cede9392b0ad816ee956-0-20150419-35022-17ddag2.yaml
- ARCH=x86_64
- kconfig=x86_64-rhel
- branch=linux-devel/devel-hourly-2015042014
- commit=a1a71cc4c0a53e29fe27cede9392b0ad816ee956
- BOOT_IMAGE=/kernel/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/vmlinuz-4.0.0-09109-ga1a71cc
- RESULT_ROOT=/result/lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd/debian-x86_64-2015-02-07.cgz/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/0
- LKP_SERVER=inn
- |2-


  earlyprintk=ttyS0,115200 rd.udev.log-priority=err systemd.log_target=journal systemd.log_level=warning
  debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100
  panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0
  console=ttyS0,115200 console=tty0 vga=normal

  rw
max_uptime: 1500
lkp_initrd: "/lkp/lkp/lkp-x86_64.cgz"
modules_initrd: "/kernel/x86_64-rhel/a1a71cc4c0a53e29fe27cede9392b0ad816ee956/modules.cgz"
bm_initrd: "/osimage/deps/debian-x86_64-2015-02-07.cgz/lkp.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/turbostat.cgz,/lkp/benchmarks/turbostat.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/fs.cgz,/osimage/deps/debian-x86_64-2015-02-07.cgz/fs2.cgz"
job_state: finished
loadavg: 1.60 1.36 0.63 1/145 5859
start_time: '1429517927'
end_time: '1429518229'
version: "/lkp/lkp/.src-20150418-142223"
time_delta: '1429517881.362849165'

[-- Attachment #3: reproduce --]
[-- Type: text/plain, Size: 680 bytes --]

# stop any leftover array, then build an 11-disk RAID5 with a 256 KiB chunk size
mdadm --stop /dev/md0
mdadm -q --create /dev/md0 --chunk=256 --level=raid5 --raid-devices=11 --force --assume-clean /dev/sdb /dev/sdg /dev/sdi /dev/sdh /dev/sdl /dev/sdf /dev/sdm /dev/sdk /dev/sdd /dev/sde /dev/sdc
mkfs -t xfs /dev/md0
mount -t xfs -o nobarrier,inode64 /dev/md0 /fs/md0
# enable the writeback tracepoints collected by the ftrace monitor
echo 1 > /sys/kernel/debug/tracing/events/writeback/balance_dirty_pages/enable
echo 1 > /sys/kernel/debug/tracing/events/writeback/bdi_dirty_ratelimit/enable
echo 1 > /sys/kernel/debug/tracing/events/writeback/global_dirty_state/enable
echo 1 > /sys/kernel/debug/tracing/events/writeback/writeback_single_inode/enable
# single sequential writer for 300 seconds, then stop it
dd if=/dev/zero of=/fs/md0/zero-1 status=noxfer &
sleep 300
killall -9 dd

[-- Attachment #4: Type: text/plain, Size: 89 bytes --]

_______________________________________________
LKP mailing list
LKP@linux.intel.com

Thread overview: 3+ messages
2015-04-23  6:55 Huang Ying [this message]
2015-04-24  2:15 ` [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses NeilBrown
2015-04-30  6:25   ` Yuanhan Liu
