Re: [block] 34b48db66e0: +3291.6% iostat.sde.wrqm/s

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jens Axboe <axboe@fb.com>
To: lkp@lists.01.org
Subject: Re: [block] 34b48db66e0: +3291.6% iostat.sde.wrqm/s
Date: Thu, 22 Jan 2015 12:02:37 -0700	[thread overview]
Message-ID: <54C1494D.5050507@fb.com> (raw)
In-Reply-To: <x49zj9a9073.fsf@segfault.boston.devel.redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3827 bytes --]

On 01/22/2015 10:47 AM, Jeff Moyer wrote:
> Jens Axboe <axboe@fb.com> writes:
> 
>> On 01/21/2015 06:21 PM, Huang Ying wrote:
>>> FYI, we noticed the below changes on
>>>
>>> commit 34b48db66e08ca1c1bc07cf305d672ac940268dc ("block: remove artifical max_hw_sectors cap")
>>>
>>> testbox/testcase/testparams: lkp-ws02/fileio/600s-100%-1HDD-btrfs-64G-1024f-seqrewr-sync
>>>
>>> c2661b806092d8ea  34b48db66e08ca1c1bc07cf305
>>> ----------------  --------------------------
>>>           %stddev     %change         %stddev
>>>               \          |                \
>>>       47176 ±  2%     -67.3%      15406 ±  4%  softirqs.BLOCK
>>>        1110 ± 44%     -51.0%        544 ± 35%  sched_debug.cpu#8.curr->pid
>>>          22 ± 33%     -48.9%         11 ± 43%  sched_debug.cpu#1.cpu_load[0]
>>>          91 ± 43%    +125.0%        204 ± 32%  sched_debug.cfs_rq[4]:/.blocked_load_avg
>>>          17 ± 46%     -65.2%          6 ± 31%  sched_debug.cfs_rq[1]:/.runnable_load_avg
>>>         105 ± 43%    +102.6%        213 ± 32%  sched_debug.cfs_rq[4]:/.tg_load_contrib
>>>         163 ± 35%     +62.6%        265 ± 27%  sched_debug.cfs_rq[16]:/.blocked_load_avg
>>>         183 ± 29%     +51.4%        277 ± 26%  sched_debug.cfs_rq[16]:/.tg_load_contrib
>>>        1411 ± 31%     -42.5%        812 ± 32%  sched_debug.cpu#6.curr->pid
>>>    57565068 ± 15%     +66.8%   96024066 ± 17%  cpuidle.C1E-NHM.time
>>>       94625 ±  9%     -32.5%      63893 ±  4%  cpuidle.C3-NHM.usage
>>>         200 ± 14%     -22.8%        155 ± 24%  sched_debug.cfs_rq[8]:/.tg_load_contrib
>>>         244 ± 33%     -39.0%        149 ± 11%  sched_debug.cfs_rq[6]:/.blocked_load_avg
>>>         265 ± 31%     -38.4%        163 ±  9%  sched_debug.cfs_rq[6]:/.tg_load_contrib
>>>        4959 ±  9%     -18.2%       4058 ±  1%  slabinfo.kmalloc-128.active_objs
>>>        4987 ±  9%     -18.6%       4058 ±  1%  slabinfo.kmalloc-128.num_objs
>>>          19 ±  8%     -19.7%         15 ± 14%  sched_debug.cpu#0.cpu_load[1]
>>>      662307 ±  7%     -12.6%     579108 ±  3%  cpuidle.C6-NHM.usage
>>>        3028 ±  7%     -12.3%       2656 ±  1%  slabinfo.ext4_extent_status.num_objs
>>>        3028 ±  7%     -12.3%       2656 ±  1%  slabinfo.ext4_extent_status.active_objs
>>>        4.87 ±  0%   +3291.6%     165.07 ±  0%  iostat.sde.wrqm/s
>>>        1006 ±  0%    +120.3%       2216 ±  0%  iostat.sde.avgrq-sz
>>
>> So these two above tells us that we are doing way more write merges
>> per second, and that the average request size has roughly doubled from
>> 1006 to 2216 - both are excellent news.
>>
>>>         466 ±  0%    +115.9%       1007 ±  0%  iostat.sde.await
>>>         466 ±  0%    +115.9%       1007 ±  0%  iostat.sde.w_await
>>
>> Service time roughly doubled, must be mostly stream time.
>>
>>>         301 ±  0%     -52.7%        142 ±  0%  iostat.sde.w/s
>>
>> About half the number of writes completed, but from the stats above,
>> those writes are more than double. 1006 * 301 < 2216 * 142, so again,
>> this looks good.
>>
>>>        2230 ±  2%      -8.2%       2048 ±  2%  vmstat.system.in
>>
>> And a nice reduction in irq rate, also nice. Way less software irqs
>> from the first few lines, also a win.
> 
> Agreed on all above, but are the actual benchmark numbers included
> somewhere in all this mess?  I'd like to see if the benchmark numbers
> improved first, before digging into the guts of which functions are
> called more or which stats changed.

I deleted the original email, but the latter tables had drive throughput
rates and it looked higher for the ones I checked on the newer kernel.
Which the above math would indicate as well, multiplying reqs-per-sec
and req-size.

-- 
Jens Axboe

WARNING: multiple messages have this Message-ID (diff)

From: Jens Axboe <axboe@fb.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>, Christoph Hellwig <hch@lst.de>,
	LKML <linux-kernel@vger.kernel.org>, LKP ML <lkp@01.org>
Subject: Re: [LKP] [block] 34b48db66e0: +3291.6% iostat.sde.wrqm/s
Date: Thu, 22 Jan 2015 12:02:37 -0700	[thread overview]
Message-ID: <54C1494D.5050507@fb.com> (raw)
In-Reply-To: <x49zj9a9073.fsf@segfault.boston.devel.redhat.com>

On 01/22/2015 10:47 AM, Jeff Moyer wrote:
> Jens Axboe <axboe@fb.com> writes:
> 
>> On 01/21/2015 06:21 PM, Huang Ying wrote:
>>> FYI, we noticed the below changes on
>>>
>>> commit 34b48db66e08ca1c1bc07cf305d672ac940268dc ("block: remove artifical max_hw_sectors cap")
>>>
>>> testbox/testcase/testparams: lkp-ws02/fileio/600s-100%-1HDD-btrfs-64G-1024f-seqrewr-sync
>>>
>>> c2661b806092d8ea  34b48db66e08ca1c1bc07cf305
>>> ----------------  --------------------------
>>>           %stddev     %change         %stddev
>>>               \          |                \
>>>       47176 ±  2%     -67.3%      15406 ±  4%  softirqs.BLOCK
>>>        1110 ± 44%     -51.0%        544 ± 35%  sched_debug.cpu#8.curr->pid
>>>          22 ± 33%     -48.9%         11 ± 43%  sched_debug.cpu#1.cpu_load[0]
>>>          91 ± 43%    +125.0%        204 ± 32%  sched_debug.cfs_rq[4]:/.blocked_load_avg
>>>          17 ± 46%     -65.2%          6 ± 31%  sched_debug.cfs_rq[1]:/.runnable_load_avg
>>>         105 ± 43%    +102.6%        213 ± 32%  sched_debug.cfs_rq[4]:/.tg_load_contrib
>>>         163 ± 35%     +62.6%        265 ± 27%  sched_debug.cfs_rq[16]:/.blocked_load_avg
>>>         183 ± 29%     +51.4%        277 ± 26%  sched_debug.cfs_rq[16]:/.tg_load_contrib
>>>        1411 ± 31%     -42.5%        812 ± 32%  sched_debug.cpu#6.curr->pid
>>>    57565068 ± 15%     +66.8%   96024066 ± 17%  cpuidle.C1E-NHM.time
>>>       94625 ±  9%     -32.5%      63893 ±  4%  cpuidle.C3-NHM.usage
>>>         200 ± 14%     -22.8%        155 ± 24%  sched_debug.cfs_rq[8]:/.tg_load_contrib
>>>         244 ± 33%     -39.0%        149 ± 11%  sched_debug.cfs_rq[6]:/.blocked_load_avg
>>>         265 ± 31%     -38.4%        163 ±  9%  sched_debug.cfs_rq[6]:/.tg_load_contrib
>>>        4959 ±  9%     -18.2%       4058 ±  1%  slabinfo.kmalloc-128.active_objs
>>>        4987 ±  9%     -18.6%       4058 ±  1%  slabinfo.kmalloc-128.num_objs
>>>          19 ±  8%     -19.7%         15 ± 14%  sched_debug.cpu#0.cpu_load[1]
>>>      662307 ±  7%     -12.6%     579108 ±  3%  cpuidle.C6-NHM.usage
>>>        3028 ±  7%     -12.3%       2656 ±  1%  slabinfo.ext4_extent_status.num_objs
>>>        3028 ±  7%     -12.3%       2656 ±  1%  slabinfo.ext4_extent_status.active_objs
>>>        4.87 ±  0%   +3291.6%     165.07 ±  0%  iostat.sde.wrqm/s
>>>        1006 ±  0%    +120.3%       2216 ±  0%  iostat.sde.avgrq-sz
>>
>> So these two above tells us that we are doing way more write merges
>> per second, and that the average request size has roughly doubled from
>> 1006 to 2216 - both are excellent news.
>>
>>>         466 ±  0%    +115.9%       1007 ±  0%  iostat.sde.await
>>>         466 ±  0%    +115.9%       1007 ±  0%  iostat.sde.w_await
>>
>> Service time roughly doubled, must be mostly stream time.
>>
>>>         301 ±  0%     -52.7%        142 ±  0%  iostat.sde.w/s
>>
>> About half the number of writes completed, but from the stats above,
>> those writes are more than double. 1006 * 301 < 2216 * 142, so again,
>> this looks good.
>>
>>>        2230 ±  2%      -8.2%       2048 ±  2%  vmstat.system.in
>>
>> And a nice reduction in irq rate, also nice. Way less software irqs
>> from the first few lines, also a win.
> 
> Agreed on all above, but are the actual benchmark numbers included
> somewhere in all this mess?  I'd like to see if the benchmark numbers
> improved first, before digging into the guts of which functions are
> called more or which stats changed.

I deleted the original email, but the latter tables had drive throughput
rates and it looked higher for the ones I checked on the newer kernel.
Which the above math would indicate as well, multiplying reqs-per-sec
and req-size.

-- 
Jens Axboe

next prev parent reply	other threads:[~2015-01-22 19:02 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-22  1:21 [block] 34b48db66e0: +3291.6% iostat.sde.wrqm/s Huang Ying
2015-01-22  1:21 ` [LKP] " Huang Ying
2015-01-22  5:37 ` Jens Axboe
2015-01-22  5:37   ` [LKP] " Jens Axboe
2015-01-22 17:47   ` Jeff Moyer
2015-01-22 17:47     ` [LKP] " Jeff Moyer
2015-01-22 19:02     ` Jens Axboe [this message]
2015-01-22 19:02       ` Jens Axboe
2015-01-22 20:49       ` Jeff Moyer
2015-01-22 20:49         ` [LKP] " Jeff Moyer
2015-01-22 20:58         ` Jens Axboe
2015-01-22 20:58           ` [LKP] " Jens Axboe
2015-01-22 21:08           ` Jeff Moyer
2015-01-22 21:08             ` [LKP] " Jeff Moyer
2015-01-22 21:12             ` Jens Axboe
2015-01-22 21:12               ` [LKP] " Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54C1494D.5050507@fb.com \
    --to=axboe@fb.com \
    --cc=lkp@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.