From: NeilBrown <neilb@suse.de>
To: Huang Ying <ying.huang@intel.com>
Cc: "shli@kernel.org" <shli@kernel.org>,
LKML <linux-kernel@vger.kernel.org>, LKP ML <lkp@01.org>
Subject: Re: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses
Date: Fri, 24 Apr 2015 12:15:59 +1000
Message-ID: <20150424121559.321677ce@notabene.brown>
In-Reply-To: <1429772159.25120.9.camel@intel.com>
[-- Attachment #1: Type: text/plain, Size: 15980 bytes --]
On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying <ying.huang@intel.com> wrote:
> FYI, we noticed the below changes on
>
> git://neil.brown.name/md for-next
> commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")
Hi,
is there any chance that you could explain what some of this means?
There is lots of data and some very pretty graphs, but no explanation.
Which numbers are "good", and which are "bad"? Which is "worst"?
What do the graphs really show, and what would we like to see in them?
I think it is really great that you are doing this testing and reporting the
results. It's just so sad that I completely fail to understand them.
Thanks,
NeilBrown
>
>
> testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
>
> a87d7f782b47e030 878ee6792799e2f88bdcac3298
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 59035 ± 0% +18.4% 69913 ± 1% softirqs.SCHED
> 1330 ± 10% +17.4% 1561 ± 4% slabinfo.kmalloc-512.num_objs
> 1330 ± 10% +17.4% 1561 ± 4% slabinfo.kmalloc-512.active_objs
> 305908 ± 0% -1.8% 300427 ± 0% vmstat.io.bo
> 1 ± 0% +100.0% 2 ± 0% vmstat.procs.r
> 8266 ± 1% -15.7% 6968 ± 0% vmstat.system.cs
> 14819 ± 0% -2.1% 14503 ± 0% vmstat.system.in
> 18.20 ± 6% +10.2% 20.05 ± 4% perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
> 1.94 ± 9% +90.6% 3.70 ± 9% perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 0.00 ± 0% +Inf% 25.18 ± 3% perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
> 0.00 ± 0% +Inf% 14.14 ± 4% perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 1.79 ± 7% +102.9% 3.64 ± 9% perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
> 3.09 ± 4% -10.8% 2.76 ± 4% perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
> 0.80 ± 14% +28.1% 1.02 ± 10% perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> 14.78 ± 6% -100.0% 0.00 ± 0% perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 25.68 ± 4% -100.0% 0.00 ± 0% perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
> 1.23 ± 5% +140.0% 2.96 ± 7% perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
> 2.62 ± 6% -95.6% 0.12 ± 33% perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
> 0.96 ± 9% +17.5% 1.12 ± 2% perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> 1.461e+10 ± 0% -5.3% 1.384e+10 ± 1% perf-stat.L1-dcache-load-misses
> 3.688e+11 ± 0% -2.7% 3.59e+11 ± 0% perf-stat.L1-dcache-loads
> 1.124e+09 ± 0% -27.7% 8.125e+08 ± 0% perf-stat.L1-dcache-prefetches
> 2.767e+10 ± 0% -1.8% 2.717e+10 ± 0% perf-stat.L1-dcache-store-misses
> 2.352e+11 ± 0% -2.8% 2.287e+11 ± 0% perf-stat.L1-dcache-stores
> 6.774e+09 ± 0% -2.3% 6.62e+09 ± 0% perf-stat.L1-icache-load-misses
> 5.571e+08 ± 0% +40.5% 7.826e+08 ± 1% perf-stat.LLC-load-misses
> 6.263e+09 ± 0% -13.7% 5.407e+09 ± 1% perf-stat.LLC-loads
> 1.914e+11 ± 0% -4.2% 1.833e+11 ± 0% perf-stat.branch-instructions
> 1.145e+09 ± 2% -5.6% 1.081e+09 ± 0% perf-stat.branch-load-misses
> 1.911e+11 ± 0% -4.3% 1.829e+11 ± 0% perf-stat.branch-loads
> 1.142e+09 ± 2% -5.1% 1.083e+09 ± 0% perf-stat.branch-misses
> 1.218e+09 ± 0% +19.8% 1.46e+09 ± 0% perf-stat.cache-misses
> 2.118e+10 ± 0% -5.2% 2.007e+10 ± 0% perf-stat.cache-references
> 2510308 ± 1% -15.7% 2115410 ± 0% perf-stat.context-switches
> 39623 ± 0% +22.1% 48370 ± 1% perf-stat.cpu-migrations
> 4.179e+08 ± 40% +165.7% 1.111e+09 ± 35% perf-stat.dTLB-load-misses
> 3.684e+11 ± 0% -2.5% 3.592e+11 ± 0% perf-stat.dTLB-loads
> 1.232e+08 ± 15% +62.5% 2.002e+08 ± 27% perf-stat.dTLB-store-misses
> 2.348e+11 ± 0% -2.5% 2.288e+11 ± 0% perf-stat.dTLB-stores
> 3577297 ± 2% +8.7% 3888986 ± 1% perf-stat.iTLB-load-misses
> 1.035e+12 ± 0% -3.5% 9.988e+11 ± 0% perf-stat.iTLB-loads
> 1.036e+12 ± 0% -3.7% 9.978e+11 ± 0% perf-stat.instructions
> 594 ± 30% +130.3% 1369 ± 13% sched_debug.cfs_rq[0]:/.blocked_load_avg
> 17 ± 10% -28.2% 12 ± 23% sched_debug.cfs_rq[0]:/.nr_spread_over
> 210 ± 21% +42.1% 298 ± 28% sched_debug.cfs_rq[0]:/.tg_runnable_contrib
> 9676 ± 21% +42.1% 13754 ± 28% sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
> 772 ± 25% +116.5% 1672 ± 9% sched_debug.cfs_rq[0]:/.tg_load_contrib
> 8402 ± 9% +83.3% 15405 ± 11% sched_debug.cfs_rq[0]:/.tg_load_avg
> 8356 ± 9% +82.8% 15272 ± 11% sched_debug.cfs_rq[1]:/.tg_load_avg
> 968 ± 25% +100.8% 1943 ± 14% sched_debug.cfs_rq[1]:/.blocked_load_avg
> 16242 ± 9% -22.2% 12643 ± 14% sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
> 353 ± 9% -22.1% 275 ± 14% sched_debug.cfs_rq[1]:/.tg_runnable_contrib
> 1183 ± 23% +77.7% 2102 ± 12% sched_debug.cfs_rq[1]:/.tg_load_contrib
> 181 ± 8% -31.4% 124 ± 26% sched_debug.cfs_rq[2]:/.tg_runnable_contrib
> 8364 ± 8% -31.3% 5745 ± 26% sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
> 8297 ± 9% +81.7% 15079 ± 12% sched_debug.cfs_rq[2]:/.tg_load_avg
> 30439 ± 13% -45.2% 16681 ± 26% sched_debug.cfs_rq[2]:/.exec_clock
> 39735 ± 14% -48.3% 20545 ± 29% sched_debug.cfs_rq[2]:/.min_vruntime
> 8231 ± 10% +82.2% 15000 ± 12% sched_debug.cfs_rq[3]:/.tg_load_avg
> 1210 ± 14% +110.3% 2546 ± 30% sched_debug.cfs_rq[4]:/.tg_load_contrib
> 8188 ± 10% +82.8% 14964 ± 12% sched_debug.cfs_rq[4]:/.tg_load_avg
> 8132 ± 10% +83.1% 14890 ± 12% sched_debug.cfs_rq[5]:/.tg_load_avg
> 749 ± 29% +205.9% 2292 ± 34% sched_debug.cfs_rq[5]:/.blocked_load_avg
> 963 ± 30% +169.9% 2599 ± 33% sched_debug.cfs_rq[5]:/.tg_load_contrib
> 37791 ± 32% -38.6% 23209 ± 13% sched_debug.cfs_rq[6]:/.min_vruntime
> 693 ± 25% +132.2% 1609 ± 29% sched_debug.cfs_rq[6]:/.blocked_load_avg
> 10838 ± 13% -39.2% 6587 ± 13% sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
> 29329 ± 27% -33.2% 19577 ± 10% sched_debug.cfs_rq[6]:/.exec_clock
> 235 ± 14% -39.7% 142 ± 14% sched_debug.cfs_rq[6]:/.tg_runnable_contrib
> 8085 ± 10% +83.6% 14848 ± 12% sched_debug.cfs_rq[6]:/.tg_load_avg
> 839 ± 25% +128.5% 1917 ± 18% sched_debug.cfs_rq[6]:/.tg_load_contrib
> 8051 ± 10% +83.6% 14779 ± 12% sched_debug.cfs_rq[7]:/.tg_load_avg
> 156 ± 34% +97.9% 309 ± 19% sched_debug.cpu#0.cpu_load[4]
> 160 ± 25% +64.0% 263 ± 16% sched_debug.cpu#0.cpu_load[2]
> 156 ± 32% +83.7% 286 ± 17% sched_debug.cpu#0.cpu_load[3]
> 164 ± 20% -35.1% 106 ± 31% sched_debug.cpu#2.cpu_load[0]
> 249 ± 15% +80.2% 449 ± 10% sched_debug.cpu#4.cpu_load[3]
> 231 ± 11% +101.2% 466 ± 13% sched_debug.cpu#4.cpu_load[2]
> 217 ± 14% +189.9% 630 ± 38% sched_debug.cpu#4.cpu_load[0]
> 71951 ± 5% +21.6% 87526 ± 7% sched_debug.cpu#4.nr_load_updates
> 214 ± 8% +146.1% 527 ± 27% sched_debug.cpu#4.cpu_load[1]
> 256 ± 17% +75.7% 449 ± 13% sched_debug.cpu#4.cpu_load[4]
> 209 ± 23% +98.3% 416 ± 48% sched_debug.cpu#5.cpu_load[2]
> 68024 ± 2% +18.8% 80825 ± 1% sched_debug.cpu#5.nr_load_updates
> 217 ± 26% +74.9% 380 ± 45% sched_debug.cpu#5.cpu_load[3]
> 852 ± 21% -38.3% 526 ± 22% sched_debug.cpu#6.curr->pid
>
> lkp-st02: Core2
> Memory: 8G
>
>
>
>
> perf-stat.cache-misses
>
> 1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
> | O O O O O O O O O O |
> 1.4e+09 ++ |
> 1.2e+09 *+.*...* *..* * *...*..*...*..*...*..*...*..*...*..*
> | : : : : : |
> 1e+09 ++ : : : : : : |
> | : : : : : : |
> 8e+08 ++ : : : : : : |
> | : : : : : : |
> 6e+08 ++ : : : : : : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> 2e+08 ++ : : : : : : |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> perf-stat.L1-dcache-prefetches
>
> 1.2e+09 ++----------------------------------------------------------------+
> *..*...* *..* * ..*.. ..*..*...*..*...*..*...*..*
> 1e+09 ++ : : : : *. *. |
> | : : : :: : |
> | : : : : : : O |
> 8e+08 O+ O: O :O O: O :O: O :O O O O O O O |
> | : : : : : : |
> 6e+08 ++ : : : : : : |
> | : : : : : : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> | : : : : : : |
> 2e+08 ++ :: :: : : |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> perf-stat.LLC-load-misses
>
> 1e+09 ++------------------------------------------------------------------+
> 9e+08 O+ O O O O O |
> | O O O O |
> 8e+08 ++ O O O O O O |
> 7e+08 ++ |
> | |
> 6e+08 *+..*..* *...* * *...*..*...*...*..*...*..*...*..*...*
> 5e+08 ++ : : : :: : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> 3e+08 ++ : : : : : : |
> 2e+08 ++ : : : : : : |
> | : : : : : : |
> 1e+08 ++ : :: : |
> 0 ++--O------*---------*-------*--------------------------------------+
>
>
> perf-stat.context-switches
>
> 3e+06 ++----------------------------------------------------------------+
> | *...*..*... |
> 2.5e+06 *+.*...* *..* * : *..*... .*...*..*... .*
> | : : : : : *. *. |
> O O: O :O O: O :: : O O O O O O |
> 2e+06 ++ : : : :O: O :O O |
> | : : : : : : |
> 1.5e+06 ++ : : : : : : |
> | : : : : : : |
> 1e+06 ++ : : : : : : |
> | : : : : : : |
> | : : : : : : |
> 500000 ++ :: : : :: |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> vmstat.system.cs
>
> 10000 ++------------------------------------------------------------------+
> 9000 ++ *...*.. |
> *...*..* *...* * : *...*...*.. ..*..*...*.. ..*
> 8000 ++ : : : : : *. *. |
> 7000 O+ O: O O O: O : : : O O O O O O |
> | : : : :O: O :O O |
> 6000 ++ : : : : : : |
> 5000 ++ : : : : : : |
> 4000 ++ : : : : : : |
> | : : : : : : |
> 3000 ++ : : : : : : |
> 2000 ++ : : : : : : |
> | : : :: :: |
> 1000 ++ : : : |
> 0 ++--O------*---------*-------*--------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> To reproduce:
>
> apt-get install ruby
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/setup-local job.yaml # the job file attached in this email
> bin/run-local job.yaml
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Ying Huang
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]