From: Mustafa Mesanovic <mume@linux.vnet.ibm.com>
To: Mike Snitzer <snitzer@redhat.com>, dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>,
akpm@linux-foundation.org, cotte@de.ibm.com,
heiko.carstens@de.ibm.com, linux-kernel@vger.kernel.org,
ehrhardt@linux.vnet.ibm.com,
"Alasdair G. Kergon" <agk@redhat.com>,
Jeff Moyer <jmoyer@redhat.com>
Subject: Re: [PATCH v3] dm stripe: implement merge method
Date: Mon, 14 Mar 2011 12:54:50 +0100
Message-ID: <4D7E020A.1020708@linux.vnet.ibm.com>
In-Reply-To: <20110312224222.GA6176@redhat.com>

On 03/12/2011 11:42 PM, Mike Snitzer wrote:
> Hi Mustafa,
>
> On Thu, Mar 10 2011 at 9:02am -0500,
> Mustafa Mesanovic <mume@linux.vnet.ibm.com> wrote:
>
>> On 03/08/2011 05:48 PM, Mike Snitzer wrote:
>>> In any case, it clearly helps your workload.
>>>
>>> Could you explain your config in more detail?
>>> - what is your chunk_size?
>>> - how many stripes (how many mpath devices)?
>>> - what is the performance, of your test workload, of a single underlying
>>> mpath device?
>>>
>>> And, in particular, what is your test workload?
>>> - What is the nature of your IO (are you using a particular tool)?
>>> - Are you using AIO?
>>> - How many threads?
>>> - Are you driving deep queue depths? Etc.
>>>
>>> I have various configs that I'll be testing to help verify the benefit.
>>> The only other change Alasdair requested is that the target version
>>> should be bumped to 1.4 (rather than 1.3.2).
>>>
>>> Given that I can put some time to this now: we should be able to sort
>>> all this out for upstream inclusion in 2.6.39.
>>>
>>> Thanks,
>>> Mike
>> Mike,
>>
>> the setup that I have used to verify and check upon the changes
>> consisted of:
>>
>> - Benchmark
>> iozone (seq write, seq read, random read and write),
>> filesize 2000m, with 32 processes (no AIO used).
>>
>> - Disk-Setup
>> 2 disks (queue_depth=192) -> each disk with 8 paths
>> -> multipathed (multibus, rr_min_io=1)
>>
>> And a striped LVM out of these two (chunk_size=64KiB).
>>
>> The benchmark then runs on this LV.
> What record size are you using?
> Which filesystem are you using?
> Also, were you using O_DIRECT? If not then I'm having a hard time
> understanding why implementing stripe_merge was so beneficial for you.
> stripe_merge doesn't help buffered IO.
>
> Please share your exact iozone command line.
>
> In my testing with aio-stress I have seen the number of calls to
> stripe_map be inversely proportional to the record size (when record
> size is <= chunk_size).
>
> That is, with the following aio-stress commandline:
> aio-stress -O -o 0 -o 1 -r $RECORD_SIZE -d 64 -b 16 -i 16 -s 2048 /dev/snitm/striped_lv
>
> I varied the $RECORD_SIZE from 4k to 256k (striped_lv is using a 64k
> chunk_size across 8 mpath devices).
>
> The number of stripe_map_sector() calls resulting from having
> implemented stripe_merge is fixed at 1048560 (when reading and then
> writing 2048m). And there is one stripe_map_sector() call for each
> stripe_map() call.
>
> The following table shows the stripe_map_sector and stripe_map call
> count for writes then reads of 2048m (using $record_size AIO). AIO does
> make use of dm_merge_bvec and stripe_merge.
>
> record_size    stripe_map_sector calls    stripe_map calls
> 4k             2097152                    1048592
> 8k             1572864                    524304
> 16k            1310720                    262160
> 32k            1179648                    131088
> 64k            1114112                    65552
> 128k           1114112                    65552
> 256k           1114112                    65552
>
> The above shows that bios are being assembled using larger payloads (up
> to chunk_size) given that AIO does make use of stripe_merge.
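
The numbers in the quoted table can be cross-checked with a few lines of Python. This is only a sketch of the arithmetic: the counts are copied from the table, the 64k chunk size and the 2 x 2048m of I/O are taken from the text above, and the names (TABLE, merge_side_calls, residual_map_calls) are made up for illustration. The small constant left over per row is simply reported; its origin is not explained in this thread.

```python
# Counts copied from the quoted table: record size in KiB ->
# (stripe_map_sector calls, stripe_map calls).
TABLE = {
    4: (2097152, 1048592),
    8: (1572864, 524304),
    16: (1310720, 262160),
    32: (1179648, 131088),
    64: (1114112, 65552),
    128: (1114112, 65552),
    256: (1114112, 65552),
}

CHUNK_KIB = 64                 # chunk_size stated in the quoted text
TOTAL_KIB = 2 * 2048 * 1024    # 2048m written plus 2048m read, in KiB


def merge_side_calls():
    """stripe_map_sector calls that do not come from stripe_map itself."""
    return {rec: sector - mapped for rec, (sector, mapped) in TABLE.items()}


def residual_map_calls():
    """stripe_map calls beyond one call per bio of min(record, chunk) size."""
    return {
        rec: mapped - TOTAL_KIB // min(rec, CHUNK_KIB)
        for rec, (_, mapped) in TABLE.items()
    }


if __name__ == "__main__":
    print(merge_side_calls())
    print(residual_map_calls())
```

For every row the first difference is the fixed 1048560 mentioned above, and the stripe_map count stops shrinking once the record size reaches the chunk size.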
>
> When I did the same accounting (via attached systemtap script) for a
> buffered iozone run with a file size of 2000m (using -i 0 -i 1 -i 2) I
> saw that dm_merge_bvec() was _never_ called and the number of
> stripe_map_sector calls was very close to the stripe_map calls.
>
> Mike
>
> p.s.
> All the above aside, one of our more elaborate benchmarks against XFS
> has seen a significant benefit from stripe_merge() being present... I
> still need to understand that benchmark's IO workload though.

I used a 64k record size, with ext3 as the filesystem.

No, I was not using O_DIRECT. But I have also measured with O_DIRECT, and
the benefits there are significant too.

stripe_merge() helps a lot. The splitting of I/O records into 4KiB chunks
originates in dm_set_device_limits(), as I explained in my v1 patch.
If the target has no merge_fn of its own, max_sectors is limited to PAGE_SIZE,
which in my case is 4KiB. __bio_add_page then checks against max_sectors and
does not add any more pages to the bio, so the bio stays at 4KiB.

Now, by avoiding this "wrong" setting of max_sectors for the dm target,
__bio_add_page is able to add more than one page to a bio.
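
The effect of that limit can be shown with a toy model. This is not kernel code and does not reproduce __bio_add_page; it only mimics the rule described above (keep adding pages to a bio until the size limit would be exceeded). The function name and parameters are hypothetical.

```python
def split_into_bios(record_bytes, page_bytes=4096, max_bio_bytes=4096):
    """Toy model: return the sizes of the bios built for one record.

    Pages are appended to the current bio until adding another page
    would exceed max_bio_bytes; then a new bio is started.
    """
    bios = []
    current = 0
    remaining = record_bytes
    while remaining > 0:
        page = min(page_bytes, remaining)
        if current and current + page > max_bio_bytes:
            bios.append(current)   # limit reached: close this bio
            current = 0
        current += page
        remaining -= page
    if current:
        bios.append(current)
    return bios


if __name__ == "__main__":
    record = 64 * 1024
    # Limit of one page: the 64KiB record is cut into 4KiB bios.
    print(len(split_into_bios(record, max_bio_bytes=4096)))
    # Limit of one 64KiB chunk: the record fits into a single bio.
    print(len(split_into_bios(record, max_bio_bytes=64 * 1024)))
```

With a 64k record, the one-page limit yields sixteen 4KiB bios, while a limit of one chunk lets the whole record travel as a single bio.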

So this is my iozone call:

# iozone -s 2000m -r 64k -t 32 -e -w -R -C -i 0
  -F <mntpt>/Child0 .... <mntpt>/Child31

For direct I/O (O_DIRECT) add '-I'.

dm_merge_bvec/stripe_merge is called only on reads; that is what I
observed when testing the patch on my 2.6.32.x-stable kernel. Maybe it
depends on whether the I/O is page-cached or AIO-based... this might be
worth further analysis. Writes must take another path, but I have not
analysed it further so far.

I think it helps to avoid the "overhead" of always passing 4KiB bios to the
dm target. In my opinion it is "cheaper"/"faster" to pass one big bio
down to the dm target than to pass many bios of at most 4KiB each.

I used iostat to check the devices and the sizes of the requests. Just
start an iostat process that collects I/O statistics during your runs,
e.g. 'iostat -dmx 2 > outfile &', and check "avgrq-sz".

And yes, during my iostat runs I found that writes are still dropping
into the dm in 4KiB chunks; this is what I will analyse next.
Maybe there will be further patch(es) to fix that.
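
When reading the iostat output, note that "avgrq-sz" is reported in 512-byte sectors, so a value of 8 corresponds to 4KiB requests. A small sketch for pulling that column out of a saved 'iostat -dmx' log follows. It assumes the header line starts with "Device" and contains an "avgrq-sz" column; the function name and the sample line in the usage part are invented for illustration and are not measurements from my runs.

```python
SECTOR_BYTES = 512  # avgrq-sz is expressed in 512-byte sectors


def avgrq_kib(iostat_text):
    """Return {device: average request size in KiB} from iostat -x output.

    The column is located by its header name. If the log contains several
    reports, the value from the last report wins for each device.
    """
    sizes = {}
    column = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            column = fields.index("avgrq-sz") if "avgrq-sz" in fields else None
            continue
        if column is not None and len(fields) > column:
            try:
                sizes[fields[0]] = float(fields[column]) * SECTOR_BYTES / 1024
            except ValueError:
                pass  # not a device line
    return sizes


if __name__ == "__main__":
    # Invented sample line, only to show the expected input format.
    sample = (
        "Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util\n"
        "dm-0 0.00 0.00 0.00 100.00 0.00 0.39 8.00 1.00 1.00 1.00 10.00\n"
    )
    print(avgrq_kib(sample))
```

A device that still receives 4KiB requests shows up with a value of 4.0 KiB, and one that receives full 64KiB chunks with 64.0 KiB.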

Mustafa

ps:
aio-stress did not work for me; sorry, but I did not have the time to look
into it and find where the error might be...