All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Mustafa Mesanovic <mume@linux.vnet.ibm.com>
Cc: dm-devel@redhat.com, Neil Brown <neilb@suse.de>,
	akpm@linux-foundation.org, cotte@de.ibm.com,
	heiko.carstens@de.ibm.com, linux-kernel@vger.kernel.org,
	ehrhardt@linux.vnet.ibm.com,
	"Alasdair G. Kergon" <agk@redhat.com>,
	Jeff Moyer <jmoyer@redhat.com>
Subject: Re: [PATCH v3] dm stripe: implement merge method
Date: Mon, 14 Mar 2011 10:33:06 -0400	[thread overview]
Message-ID: <20110314143305.GA16939@redhat.com> (raw)
In-Reply-To: <4D7E020A.1020708@linux.vnet.ibm.com>

On Mon, Mar 14 2011 at  7:54am -0400,
Mustafa Mesanovic <mume@linux.vnet.ibm.com> wrote:

> On 03/12/2011 11:42 PM, Mike Snitzer wrote:
> >What record size are you using?
> >Which filesystem are you using?
> >Also, were you using O_DIRECT?  If not then I'm having a hard time
> >understanding why implementing stripe_merge was so beneficial for you.
> >stripe_merge doesn't help buffered IO.

To clarify (thanks to the clue about dropping caches):
stripe_merge doesn't help _cached_ buffered IO ;)

> I used 64k record size, and ext3 as filesystem.
> 
> No, I was not using O_DIRECT. But I have measured as well with O_DIRECT, and
> the benefits there are significant too.
> 
> stripe_merge() helps a lot. The reason of splitting I/O records into 4KiB
> chunks happens at dm_set_device_limits(), thats what I explained in my v1 patch.
> If the target has no own merge_fn, max_sectors will be set to PAGE_SIZE, what
> in my case is 4KiB. Then __bio_add_page checks upon max_sectors and does not
> add any more pages to a bio. The bio stays at 4KiB.
>
> Now by avoiding the "wrong" setting of max_sectors for the dm target,
> __bio_add_page will be able to add more than one page to the bios.

Right, I understand the limitation that the patch addresses.  But given
that I hadn't dropped caches I was missing _why_ it was helping you so
much for buffered IO.

> So this is my iozone call:
>  # iozone -s 2000m -r 64k -t 32 -e -w -R -C -i 0
>                                         -F<mntpt>/Child0 ....<mntpt>/Child31
> For direct I/O (O_DIRECT) add '-I'.

I've been using a comparable iozone commandline (except I was using -i 0
-i 1 -i 2 in a single iozone run).  If I write (-i 0), drop caches, and
then read (-i 1) I see the benefit associated with stripe_merge() and
buffered reads:

iozone -s 2000m -r 64k -t 8 -e -w -i 0 -F ...

dm_merge_bvec_count calls: 192
stripe_map_sector calls: 8192527
stripe_map calls: 8192921

(do _not_ drop_caches)

iozone -s 2000m -r 64k -t 8 -e -w -i 1 -F ...

dm_merge_bvec_count calls: 0
stripe_map_sector calls: 6
stripe_map calls: 262

echo 3 > /proc/sys/vm/drop_caches
iozone -s 2000m -r 64k -t 8 -e -w -i 1 -F ...
...

dm_merge_bvec_count calls: 8147899
stripe_map_sector calls: 4205979
stripe_map calls: 247675

> dm_merge_bvec/stripe_merge is being called only on reads, thats what I have
> observed when I was testing the patch on my 2.6.32.x-stable kernel. Maybe it
> depends if the I/O is page cached or aio based...this might be worth a
> further analysis. On writes another path must be walked through, but I have
> not further analysed it so far.

Direct I/O (and AIO) writes do use dm_merge_bvec/stripe_merge:

iozone -s 2000m -r 64k -t 8 -e -w -I -i 0 -F ...

dm_merge_bvec_count calls: 16344806
stripe_map_sector calls: 8683179
stripe_map calls: 511595

> In think it helps to avoid "overhead" in passing always 4KiB bios to the
> dm target. In my opinion it is "cheaper"/"faster" to pass one big bio
> down to the dm target instead of passing 4KiB max each bio.
> 
> I used iostat to check on the devices and the sizes of the requests, just try
> to start an iostat process which collects I/O statistics during your
> runs. e.g. 'iostat -dmx 2>  outfile&' - check out "avgrq-sz".
> 
> And yes during my iostat runs I figured out that the writes are still dropping
> into the dm in 4KiB chunks, this is what I will analyse next.

Yes, for buffered writes that is definitely the case.

My iostat runs show this too, e.g.:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
dm-50             0.00 11970.00    0.00   45.50     0.00 23264.00  1022.59   175.31 1398.88  21.98 100.00
dm-52             0.00 11967.50    0.00   41.50     0.00 20742.00   999.61   170.87 1330.83  24.10 100.00
dm-56             0.00 11962.00    0.00   47.50     0.00 24320.00  1024.00   167.65 1324.91  21.05 100.00
dm-57             0.00 11970.00    0.00   45.00     0.00 23040.00  1024.00   174.03 1409.62  22.22 100.00
dm-58             0.00 11970.00    0.00   47.00     0.00 23808.00  1013.11   176.86 1415.36  21.28 100.00
dm-60             0.00 11962.00    0.00   45.00     0.00 23040.00  1024.00   171.99 1375.34  22.22 100.00
dm-62             0.00 11962.00    0.00   59.50     0.00 29952.00  1006.79   139.20 1129.12  16.81 100.00
dm-64             0.00 11967.50    0.00   63.50     0.00 32510.00  1023.94   136.29 1116.40  15.75 100.00
dm-66             0.00     0.00    0.00 96459.50     0.00 385838.00     8.00 167097.20  675.99   0.01 100.00

NOTE: dm-66 is the bio-based striped_lv volume, the other dm-* are the
underlying request-based mpath devices.

> Maybe there will be another patch(es) to fix that.

Doubtful, and it certainly isn't a DM-only phenomenon.  writeback is
always done in terms of PAGE_SIZE IOs.

> Mustafa
> 
> ps:
> aio-stress did not work for me, sorry but I did not have the time to check on that
> and to search where the error might be...

Odd, works fine for me.  I'm using the following commndline:
aio-stress -O -o 0 -o 1 -r 64 -d 128 -b 16 -i 16 -s 2048 /dev/snitm/striped_lv

Can you share how you're using it and what the error is?

  reply	other threads:[~2011-03-14 14:33 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-27 11:19 [RFC][PATCH] dm: improve read performance Mustafa Mesanovic
2010-12-27 11:54 ` Neil Brown
2010-12-27 12:23   ` Mustafa Mesanovic
2011-03-07 10:10     ` Mustafa Mesanovic
2011-03-08  2:21       ` [PATCH v3] dm stripe: implement merge method Mike Snitzer
2011-03-08 10:29         ` Mustafa Mesanovic
2011-03-08 16:48           ` Mike Snitzer
2011-03-10 14:02             ` Mustafa Mesanovic
2011-03-12 22:42               ` Mike Snitzer
2011-03-14 11:54                 ` Mustafa Mesanovic
2011-03-14 14:33                   ` Mike Snitzer [this message]
2011-03-16 20:21         ` [PATCH v4] " Mike Snitzer
2011-03-17  5:12       ` [RFC][PATCH] dm: improve read performance Nikanth Karthikesan
2011-03-17 13:08         ` Mike Snitzer
2011-03-18  4:59           ` Nikanth Karthikesan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110314143305.GA16939@redhat.com \
    --to=snitzer@redhat.com \
    --cc=agk@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=cotte@de.ibm.com \
    --cc=dm-devel@redhat.com \
    --cc=ehrhardt@linux.vnet.ibm.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=jmoyer@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mume@linux.vnet.ibm.com \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.