From: NeilBrown <neilb@suse.de>
To: Huang Ying <ying.huang@intel.com>
Cc: "shli@kernel.org" <shli@kernel.org>,
LKML <linux-kernel@vger.kernel.org>, LKP ML <lkp@01.org>
Subject: Re: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses
Date: Fri, 24 Apr 2015 12:15:59 +1000
Message-ID: <20150424121559.321677ce@notabene.brown>
In-Reply-To: <1429772159.25120.9.camel@intel.com>
[-- Attachment #1: Type: text/plain, Size: 15980 bytes --]
On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying <ying.huang@intel.com> wrote:
> FYI, we noticed the below changes on
>
> git://neil.brown.name/md for-next
> commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")
Hi,
is there any chance that you could explain what some of this means?
There is lots of data and some very pretty graphs, but no explanation.
Which numbers are "good", and which are "bad"? Which is "worst"?
What do the graphs really show, and what would we like to see in them?
I think it is really great that you are doing this testing and reporting the
results. It's just so sad that I completely fail to understand them.
Thanks,
NeilBrown
>
>
> testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
>
> a87d7f782b47e030 878ee6792799e2f88bdcac3298
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 59035 ± 0% +18.4% 69913 ± 1% softirqs.SCHED
> 1330 ± 10% +17.4% 1561 ± 4% slabinfo.kmalloc-512.num_objs
> 1330 ± 10% +17.4% 1561 ± 4% slabinfo.kmalloc-512.active_objs
> 305908 ± 0% -1.8% 300427 ± 0% vmstat.io.bo
> 1 ± 0% +100.0% 2 ± 0% vmstat.procs.r
> 8266 ± 1% -15.7% 6968 ± 0% vmstat.system.cs
> 14819 ± 0% -2.1% 14503 ± 0% vmstat.system.in
> 18.20 ± 6% +10.2% 20.05 ± 4% perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
> 1.94 ± 9% +90.6% 3.70 ± 9% perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 0.00 ± 0% +Inf% 25.18 ± 3% perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
> 0.00 ± 0% +Inf% 14.14 ± 4% perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 1.79 ± 7% +102.9% 3.64 ± 9% perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
> 3.09 ± 4% -10.8% 2.76 ± 4% perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
> 0.80 ± 14% +28.1% 1.02 ± 10% perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> 14.78 ± 6% -100.0% 0.00 ± 0% perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 25.68 ± 4% -100.0% 0.00 ± 0% perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
> 1.23 ± 5% +140.0% 2.96 ± 7% perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
> 2.62 ± 6% -95.6% 0.12 ± 33% perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
> 0.96 ± 9% +17.5% 1.12 ± 2% perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> 1.461e+10 ± 0% -5.3% 1.384e+10 ± 1% perf-stat.L1-dcache-load-misses
> 3.688e+11 ± 0% -2.7% 3.59e+11 ± 0% perf-stat.L1-dcache-loads
> 1.124e+09 ± 0% -27.7% 8.125e+08 ± 0% perf-stat.L1-dcache-prefetches
> 2.767e+10 ± 0% -1.8% 2.717e+10 ± 0% perf-stat.L1-dcache-store-misses
> 2.352e+11 ± 0% -2.8% 2.287e+11 ± 0% perf-stat.L1-dcache-stores
> 6.774e+09 ± 0% -2.3% 6.62e+09 ± 0% perf-stat.L1-icache-load-misses
> 5.571e+08 ± 0% +40.5% 7.826e+08 ± 1% perf-stat.LLC-load-misses
> 6.263e+09 ± 0% -13.7% 5.407e+09 ± 1% perf-stat.LLC-loads
> 1.914e+11 ± 0% -4.2% 1.833e+11 ± 0% perf-stat.branch-instructions
> 1.145e+09 ± 2% -5.6% 1.081e+09 ± 0% perf-stat.branch-load-misses
> 1.911e+11 ± 0% -4.3% 1.829e+11 ± 0% perf-stat.branch-loads
> 1.142e+09 ± 2% -5.1% 1.083e+09 ± 0% perf-stat.branch-misses
> 1.218e+09 ± 0% +19.8% 1.46e+09 ± 0% perf-stat.cache-misses
> 2.118e+10 ± 0% -5.2% 2.007e+10 ± 0% perf-stat.cache-references
> 2510308 ± 1% -15.7% 2115410 ± 0% perf-stat.context-switches
> 39623 ± 0% +22.1% 48370 ± 1% perf-stat.cpu-migrations
> 4.179e+08 ± 40% +165.7% 1.111e+09 ± 35% perf-stat.dTLB-load-misses
> 3.684e+11 ± 0% -2.5% 3.592e+11 ± 0% perf-stat.dTLB-loads
> 1.232e+08 ± 15% +62.5% 2.002e+08 ± 27% perf-stat.dTLB-store-misses
> 2.348e+11 ± 0% -2.5% 2.288e+11 ± 0% perf-stat.dTLB-stores
> 3577297 ± 2% +8.7% 3888986 ± 1% perf-stat.iTLB-load-misses
> 1.035e+12 ± 0% -3.5% 9.988e+11 ± 0% perf-stat.iTLB-loads
> 1.036e+12 ± 0% -3.7% 9.978e+11 ± 0% perf-stat.instructions
> 594 ± 30% +130.3% 1369 ± 13% sched_debug.cfs_rq[0]:/.blocked_load_avg
> 17 ± 10% -28.2% 12 ± 23% sched_debug.cfs_rq[0]:/.nr_spread_over
> 210 ± 21% +42.1% 298 ± 28% sched_debug.cfs_rq[0]:/.tg_runnable_contrib
> 9676 ± 21% +42.1% 13754 ± 28% sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
> 772 ± 25% +116.5% 1672 ± 9% sched_debug.cfs_rq[0]:/.tg_load_contrib
> 8402 ± 9% +83.3% 15405 ± 11% sched_debug.cfs_rq[0]:/.tg_load_avg
> 8356 ± 9% +82.8% 15272 ± 11% sched_debug.cfs_rq[1]:/.tg_load_avg
> 968 ± 25% +100.8% 1943 ± 14% sched_debug.cfs_rq[1]:/.blocked_load_avg
> 16242 ± 9% -22.2% 12643 ± 14% sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
> 353 ± 9% -22.1% 275 ± 14% sched_debug.cfs_rq[1]:/.tg_runnable_contrib
> 1183 ± 23% +77.7% 2102 ± 12% sched_debug.cfs_rq[1]:/.tg_load_contrib
> 181 ± 8% -31.4% 124 ± 26% sched_debug.cfs_rq[2]:/.tg_runnable_contrib
> 8364 ± 8% -31.3% 5745 ± 26% sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
> 8297 ± 9% +81.7% 15079 ± 12% sched_debug.cfs_rq[2]:/.tg_load_avg
> 30439 ± 13% -45.2% 16681 ± 26% sched_debug.cfs_rq[2]:/.exec_clock
> 39735 ± 14% -48.3% 20545 ± 29% sched_debug.cfs_rq[2]:/.min_vruntime
> 8231 ± 10% +82.2% 15000 ± 12% sched_debug.cfs_rq[3]:/.tg_load_avg
> 1210 ± 14% +110.3% 2546 ± 30% sched_debug.cfs_rq[4]:/.tg_load_contrib
> 8188 ± 10% +82.8% 14964 ± 12% sched_debug.cfs_rq[4]:/.tg_load_avg
> 8132 ± 10% +83.1% 14890 ± 12% sched_debug.cfs_rq[5]:/.tg_load_avg
> 749 ± 29% +205.9% 2292 ± 34% sched_debug.cfs_rq[5]:/.blocked_load_avg
> 963 ± 30% +169.9% 2599 ± 33% sched_debug.cfs_rq[5]:/.tg_load_contrib
> 37791 ± 32% -38.6% 23209 ± 13% sched_debug.cfs_rq[6]:/.min_vruntime
> 693 ± 25% +132.2% 1609 ± 29% sched_debug.cfs_rq[6]:/.blocked_load_avg
> 10838 ± 13% -39.2% 6587 ± 13% sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
> 29329 ± 27% -33.2% 19577 ± 10% sched_debug.cfs_rq[6]:/.exec_clock
> 235 ± 14% -39.7% 142 ± 14% sched_debug.cfs_rq[6]:/.tg_runnable_contrib
> 8085 ± 10% +83.6% 14848 ± 12% sched_debug.cfs_rq[6]:/.tg_load_avg
> 839 ± 25% +128.5% 1917 ± 18% sched_debug.cfs_rq[6]:/.tg_load_contrib
> 8051 ± 10% +83.6% 14779 ± 12% sched_debug.cfs_rq[7]:/.tg_load_avg
> 156 ± 34% +97.9% 309 ± 19% sched_debug.cpu#0.cpu_load[4]
> 160 ± 25% +64.0% 263 ± 16% sched_debug.cpu#0.cpu_load[2]
> 156 ± 32% +83.7% 286 ± 17% sched_debug.cpu#0.cpu_load[3]
> 164 ± 20% -35.1% 106 ± 31% sched_debug.cpu#2.cpu_load[0]
> 249 ± 15% +80.2% 449 ± 10% sched_debug.cpu#4.cpu_load[3]
> 231 ± 11% +101.2% 466 ± 13% sched_debug.cpu#4.cpu_load[2]
> 217 ± 14% +189.9% 630 ± 38% sched_debug.cpu#4.cpu_load[0]
> 71951 ± 5% +21.6% 87526 ± 7% sched_debug.cpu#4.nr_load_updates
> 214 ± 8% +146.1% 527 ± 27% sched_debug.cpu#4.cpu_load[1]
> 256 ± 17% +75.7% 449 ± 13% sched_debug.cpu#4.cpu_load[4]
> 209 ± 23% +98.3% 416 ± 48% sched_debug.cpu#5.cpu_load[2]
> 68024 ± 2% +18.8% 80825 ± 1% sched_debug.cpu#5.nr_load_updates
> 217 ± 26% +74.9% 380 ± 45% sched_debug.cpu#5.cpu_load[3]
> 852 ± 21% -38.3% 526 ± 22% sched_debug.cpu#6.curr->pid
>
> lkp-st02: Core2
> Memory: 8G
>
>
>
>
> perf-stat.cache-misses
>
> 1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
> | O O O O O O O O O O |
> 1.4e+09 ++ |
> 1.2e+09 *+.*...* *..* * *...*..*...*..*...*..*...*..*...*..*
> | : : : : : |
> 1e+09 ++ : : : : : : |
> | : : : : : : |
> 8e+08 ++ : : : : : : |
> | : : : : : : |
> 6e+08 ++ : : : : : : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> 2e+08 ++ : : : : : : |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> perf-stat.L1-dcache-prefetches
>
> 1.2e+09 ++----------------------------------------------------------------+
> *..*...* *..* * ..*.. ..*..*...*..*...*..*...*..*
> 1e+09 ++ : : : : *. *. |
> | : : : :: : |
> | : : : : : : O |
> 8e+08 O+ O: O :O O: O :O: O :O O O O O O O |
> | : : : : : : |
> 6e+08 ++ : : : : : : |
> | : : : : : : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> | : : : : : : |
> 2e+08 ++ :: :: : : |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> perf-stat.LLC-load-misses
>
> 1e+09 ++------------------------------------------------------------------+
> 9e+08 O+ O O O O O |
> | O O O O |
> 8e+08 ++ O O O O O O |
> 7e+08 ++ |
> | |
> 6e+08 *+..*..* *...* * *...*..*...*...*..*...*..*...*..*...*
> 5e+08 ++ : : : :: : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> 3e+08 ++ : : : : : : |
> 2e+08 ++ : : : : : : |
> | : : : : : : |
> 1e+08 ++ : :: : |
> 0 ++--O------*---------*-------*--------------------------------------+
>
>
> perf-stat.context-switches
>
> 3e+06 ++----------------------------------------------------------------+
> | *...*..*... |
> 2.5e+06 *+.*...* *..* * : *..*... .*...*..*... .*
> | : : : : : *. *. |
> O O: O :O O: O :: : O O O O O O |
> 2e+06 ++ : : : :O: O :O O |
> | : : : : : : |
> 1.5e+06 ++ : : : : : : |
> | : : : : : : |
> 1e+06 ++ : : : : : : |
> | : : : : : : |
> | : : : : : : |
> 500000 ++ :: : : :: |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> vmstat.system.cs
>
> 10000 ++------------------------------------------------------------------+
> 9000 ++ *...*.. |
> *...*..* *...* * : *...*...*.. ..*..*...*.. ..*
> 8000 ++ : : : : : *. *. |
> 7000 O+ O: O O O: O : : : O O O O O O |
> | : : : :O: O :O O |
> 6000 ++ : : : : : : |
> 5000 ++ : : : : : : |
> 4000 ++ : : : : : : |
> | : : : : : : |
> 3000 ++ : : : : : : |
> 2000 ++ : : : : : : |
> | : : :: :: |
> 1000 ++ : : : |
> 0 ++--O------*---------*-------*--------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> To reproduce:
>
> apt-get install ruby
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/setup-local job.yaml # the job file attached in this email
> bin/run-local job.yaml
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Ying Huang
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]