public inbox for linux-kernel@vger.kernel.org
From: Mustafa Mesanovic <mume@linux.vnet.ibm.com>
To: Mike Snitzer <snitzer@redhat.com>, dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>,
	akpm@linux-foundation.org, cotte@de.ibm.com,
	heiko.carstens@de.ibm.com, linux-kernel@vger.kernel.org,
	ehrhardt@linux.vnet.ibm.com,
	"Alasdair G. Kergon" <agk@redhat.com>,
	Jeff Moyer <jmoyer@redhat.com>
Subject: Re: [PATCH v3] dm stripe: implement merge method
Date: Mon, 14 Mar 2011 12:54:50 +0100	[thread overview]
Message-ID: <4D7E020A.1020708@linux.vnet.ibm.com> (raw)
In-Reply-To: <20110312224222.GA6176@redhat.com>

On 03/12/2011 11:42 PM, Mike Snitzer wrote:
> Hi Mustafa,
>
> On Thu, Mar 10 2011 at  9:02am -0500,
> Mustafa Mesanovic <mume@linux.vnet.ibm.com> wrote:
>
>> On 03/08/2011 05:48 PM, Mike Snitzer wrote:
>>> In any case, it clearly helps your workload.
>>>
>>> Could you explain your config in more detail?
>>> - what is your chunk_size?
>>> - how many stripes (how many mpath devices)?
>>> - what is the performance, of your test workload, of a single underlying
>>>    mpath device?
>>>
>>> And, in particular, what is your test workload?
>>> - What is the nature of your IO (are you using a particular tool)?
>>> - Are you using AIO?
>>> - How many threads?
>>> - Are you driving deep queue depths? Etc.
>>>
>>> I have various configs that I'll be testing to help verify the benefit.
>>> The only other change Alasdair requested is that the target version should
>>> be bumped to 1.4 (rather than 1.3.2).
>>>
>>> Given that I can put some time to this now: we should be able to sort
>>> all this out for upstream inclusion in 2.6.39.
>>>
>>> Thanks,
>>> Mike
>> Mike,
>>
>> the setup that I have used to verify and check upon the changes
>> consisted of:
>>
>> - Benchmark
>> iozone (seq write, seq read, random read and write),
>> filesize 2000m, with 32 processes (no AIO used).
>>
>> - Disk-Setup
>> 2 disks (queue_depth=192) -> each disk with 8 paths
>> -> multipathed (multibus, rr_min_io=1)
>>
>> And a striped LVM out of these two (chunk_size=64KiB).
>>
>> The benchmark then runs on this LV.
> What record size are you using?
> Which filesystem are you using?
> Also, were you using O_DIRECT?  If not then I'm having a hard time
> understanding why implementing stripe_merge was so beneficial for you.
> stripe_merge doesn't help buffered IO.
>
> Please share your exact iozone command line.
>
> In my testing with aio-stress I have seen the number of calls to
> stripe_map be inversely proportional to the record size (when record
> size is <= chunk_size).
>
> That is, with the following aio-stress commandline:
> aio-stress -O -o 0 -o 1 -r $RECORD_SIZE -d 64 -b 16 -i 16 -s 2048 /dev/snitm/striped_lv
>
> I varied the $RECORD_SIZE from 4k to 256k (striped_lv is using a 64k
> chunk_size across 8 mpath devices).
>
> The number of stripe_map_sector() calls resulting from having
> implemented stripe_merge is fixed at 1048560 (when reading and then
> writing 2048m).  And there is one stripe_map_sector() call for each
> stripe_map() call.
>
> The following table shows the stripe_map_sector and stripe_map call
> count for writes then reads of 2048m (using $record_size AIO).  AIO does
> make use of dm_merge_bvec and stripe_merge.
>
> record_size    stripe_map_sector calls    stripe_map calls
> 4k             2097152                    1048592
> 8k             1572864                    524304
> 16k            1310720                    262160
> 32k            1179648                    131088
> 64k            1114112                    65552
> 128k           1114112                    65552
> 256k           1114112                    65552
>
> The above shows that bios are being assembled using larger payloads (up
> to chunk_size) given that AIO does make use of stripe_merge.
>
> When I did the same accounting (via attached systemtap script) for a
> buffered iozone run with a file size of 2000m (using -i 0 -i 1 -i 2) I
> saw that dm_merge_bvec() was _never_ called and the number of
> stripe_map_sector calls was very close to the number of stripe_map calls.
>
> Mike
>
> p.s.
> All the above aside, one of our more elaborate benchmarks against XFS
> has seen a significant benefit from stripe_merge() being present... I
> still need to understand that benchmark's IO workload though.
I used a 64k record size, and ext3 as the filesystem.

No, I was not using O_DIRECT. But I have measured with O_DIRECT as well,
and the benefits there are significant too.

stripe_merge() helps a lot. The reason I/O gets split into 4KiB chunks
lies in dm_set_device_limits(); that is what I explained in my v1 patch.
If the target has no merge_fn of its own, max_sectors is set to PAGE_SIZE,
which in my case is 4KiB. __bio_add_page() then checks against max_sectors
and does not add any more pages to the bio, so the bio stays at 4KiB.
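
For reference, the two checks involved look roughly like this - a
paraphrased sketch from my 2.6.32.x tree, not a verbatim excerpt:

  /* drivers/md/dm-table.c, dm_set_device_limits(), paraphrased:
   * the underlying queue restricts bvec merging but the target has
   * no merge method, so dm plays it safe and caps bios at one page.
   */
  if (q->merge_bvec_fn && !ti->type->merge)
          limits->max_sectors = min_not_zero(limits->max_sectors,
                          (unsigned int) (PAGE_SIZE >> 9));

  /* fs/bio.c, __bio_add_page(), paraphrased: refuse to grow the bio
   * beyond max_sectors, so with the cap above it stays at 4KiB.
   */
  if (((bio->bi_size + len) >> 9) > max_sectors)
          return 0;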

Now that the "wrong" max_sectors setting for the dm target is avoided,
__bio_add_page() is able to add more than one page to each bio.
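
This is where the merge method comes in: instead of the up-front
PAGE_SIZE cap, __bio_add_page() can now ask the stripe target how much
still fits. A sketch of the shape of such a merge method (my reading of
the approach, not the exact patch text):

  static int stripe_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
                          struct bio_vec *biovec, int max_size)
  {
          struct stripe_c *sc = ti->private;
          sector_t sector = bvm->bi_sector;
          uint32_t stripe;
          struct request_queue *q;

          /* map the bio's start to the underlying stripe device */
          stripe_map_sector(sc, sector - ti->begin, &stripe, &sector);

          q = bdev_get_queue(sc->stripe[stripe].dev->bdev);
          if (!q->merge_bvec_fn)
                  return max_size;

          /* redirect the query and let the underlying device decide */
          bvm->bi_bdev = sc->stripe[stripe].dev->bdev;
          bvm->bi_sector = sc->stripe[stripe].physical_start + sector;

          return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
  }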

So this is my iozone call:

  # iozone -s 2000m -r 64k -t 32 -e -w -R -C -i 0 \
           -F <mntpt>/Child0 ... <mntpt>/Child31

For direct I/O (O_DIRECT) add '-I'.

dm_merge_bvec()/stripe_merge() is being called only on reads; that is
what I observed when testing the patch on my 2.6.32.x-stable kernel.
Maybe it depends on whether the I/O is page-cached or AIO-based... this
might be worth further analysis. Writes must walk through another path,
but I have not analysed that any further so far.

I think it helps to avoid the overhead of always passing 4KiB bios to
the dm target. In my opinion it is "cheaper"/"faster" to pass one big
bio down to the dm target instead of many bios of at most 4KiB each.
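
A rough back-of-the-envelope example: reading a 2000m file in 4KiB bios
means about 512,000 bio submissions (2000 * 1024 / 4), while 64KiB bios
need only about 32,000 - a 16x reduction in per-bio allocation, mapping
and completion overhead.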

I used iostat to check the devices and the sizes of the requests; just
start an iostat process that collects I/O statistics during your runs,
e.g. 'iostat -dmx 2 > outfile &', and check out "avgrq-sz".

And yes, during my iostat runs I noticed that writes still drop into
the dm in 4KiB chunks; this is what I will analyse next. Maybe there
will be another patch (or patches) to fix that.

Mustafa

ps:
aio-stress did not work for me; sorry, I did not have the time to check
on that and to find where the error might be...



Thread overview: 15+ messages
2010-12-27 11:19 [RFC][PATCH] dm: improve read performance Mustafa Mesanovic
2010-12-27 11:54 ` Neil Brown
2010-12-27 12:23   ` Mustafa Mesanovic
2011-03-07 10:10     ` Mustafa Mesanovic
2011-03-08  2:21       ` [PATCH v3] dm stripe: implement merge method Mike Snitzer
2011-03-08 10:29         ` Mustafa Mesanovic
2011-03-08 16:48           ` Mike Snitzer
2011-03-10 14:02             ` Mustafa Mesanovic
2011-03-12 22:42               ` Mike Snitzer
2011-03-14 11:54                 ` Mustafa Mesanovic [this message]
2011-03-14 14:33                   ` Mike Snitzer
2011-03-16 20:21         ` [PATCH v4] " Mike Snitzer
2011-03-17  5:12       ` [RFC][PATCH] dm: improve read performance Nikanth Karthikesan
2011-03-17 13:08         ` Mike Snitzer
2011-03-18  4:59           ` Nikanth Karthikesan
