From: Mustafa Mesanovic <mume@linux.vnet.ibm.com>
To: Neil Brown <neilb@suse.de>,
akpm@linux-foundation.org, snitzer@redhat.com
Cc: dm-devel@redhat.com, linux-kernel@vger.kernel.org,
heiko.carstens@de.ibm.com, cotte@de.ibm.com,
ehrhardt@linux.vnet.ibm.com
Subject: Re: [RFC][PATCH] dm: improve read performance
Date: Mon, 07 Mar 2011 11:10:01 +0100 [thread overview]
Message-ID: <4D74AEF9.7050108@linux.vnet.ibm.com> (raw)
In-Reply-To: <201012271323.13406.mume@linux.vnet.ibm.com>
On 12/27/2010 01:23 PM, Mustafa Mesanovic wrote:
> On Mon December 27 2010 12:54:59 Neil Brown wrote:
>> On Mon, 27 Dec 2010 12:19:55 +0100 Mustafa Mesanovic
>>
>> <mume@linux.vnet.ibm.com> wrote:
>>> From: Mustafa Mesanovic<mume@linux.vnet.ibm.com>
>>>
>>> A short explanation in prior: in this case we have "stacked" dm devices.
>>> Two multipathed luns combined together to one striped logical volume.
>>>
>>> I/O throughput degradation happens at __bio_add_page when bio's get
>>> checked upon max_sectors. In this setup max_sectors is always set to 8
>>> -> what is 4KiB.
>>> A standalone striped logical volume on luns which are not multipathed do
>>> not have the problem: the logical volume will take over the max_sectors
>>> from luns below.
[...]
>>> Using the patch improves read I/O up to 3x. In this specific case from
>>> 600MiB/s up to 1800MiB/s.
>> and using this patch will cause IO to fail sometimes.
>> If an IO request which is larger than a page crosses a device boundary in
>> the underlying e.g. RAID0, the RAID0 will return an error as such things
>> should not happen - they are prevented by merge_bvec_fn.
>>
>> If merge_bvec_fn is not being honoured, then you MUST limit requests to a
>> single entry iovec of at most one page.
>>
>> NeilBrown
>>
> Thank you for that hint, I will try to write a merge_bvec_fn for dm-stripe.c
> which solves the problem, if that is ok?
>
> Mustafa Mesanovic
>
Now here my new suggestion to fix this issue, what is your opinion?
I tested this with different setups, and it worked fine and I had
very good performance improvements.
[RFC][PATCH] dm: improve read performance - v2
This patch adds a merge_fn for the dm stripe target. This merge_fn
prevents dm_set_device_limits() setting the max_sectors to 4KiB
(PAGE_SIZE). (As in a prior patch already mentioned.)
Now the read performance improved up to 3x higher compared to before.
What happened before:
I/O throughput degradation happened at __bio_add_page() when bio's got checked
at the very beginning upon max_sectors. In this setup max_sectors is always
set to 8. So bio's entered the dm target with a max of 4KiB.
Now dm-stripe target will have its own merge_fn so max_sectors will not
pushed down to 8 (4KiB), and bio's can get bigger than 4KiB.
Signed-off-by: Mustafa Mesanovic<mume@linux.vnet.ibm.com>
---
dm-stripe.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
Index: linux-2.6/drivers/md/dm-stripe.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-stripe.c 2011-02-28 10:23:37.000000000 +0100
+++ linux-2.6/drivers/md/dm-stripe.c 2011-02-28 10:24:29.000000000 +0100
@@ -396,6 +396,29 @@
blk_limits_io_opt(limits, chunk_size * sc->stripes);
}
+static int stripe_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+ struct bio_vec *biovec, int max_size)
+{
+ struct stripe_c *sc = (struct stripe_c *) ti->private;
+ sector_t offset, chunk;
+ uint32_t stripe;
+ struct request_queue *q;
+
+ offset = bvm->bi_sector - ti->begin;
+ chunk = offset>> sc->chunk_shift;
+ stripe = sector_div(chunk, sc->stripes);
+
+ if (!bdev_get_queue(sc->stripe[stripe].dev->bdev)->merge_bvec_fn)
+ return max_size;
+
+ bvm->bi_bdev = sc->stripe[stripe].dev->bdev;
+ q = bdev_get_queue(bvm->bi_bdev);
+ bvm->bi_sector = sc->stripe[stripe].physical_start +
+ (chunk<< sc->chunk_shift) + (offset& sc->chunk_mask);
+
+ return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
static struct target_type stripe_target = {
.name = "striped",
.version = {1, 3, 1},
@@ -403,6 +426,7 @@
.ctr = stripe_ctr,
.dtr = stripe_dtr,
.map = stripe_map,
+ .merge = stripe_merge,
.end_io = stripe_end_io,
.status = stripe_status,
.iterate_devices = stripe_iterate_devices,
next prev parent reply other threads:[~2011-03-07 10:10 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-27 11:19 [RFC][PATCH] dm: improve read performance Mustafa Mesanovic
2010-12-27 11:54 ` Neil Brown
2010-12-27 12:23 ` Mustafa Mesanovic
2011-03-07 10:10 ` Mustafa Mesanovic [this message]
2011-03-08 2:21 ` [PATCH v3] dm stripe: implement merge method Mike Snitzer
2011-03-08 10:29 ` Mustafa Mesanovic
2011-03-08 16:48 ` Mike Snitzer
2011-03-10 14:02 ` Mustafa Mesanovic
2011-03-12 22:42 ` Mike Snitzer
2011-03-14 11:54 ` Mustafa Mesanovic
2011-03-14 14:33 ` Mike Snitzer
2011-03-16 20:21 ` [PATCH v4] " Mike Snitzer
2011-03-17 5:12 ` [RFC][PATCH] dm: improve read performance Nikanth Karthikesan
2011-03-17 13:08 ` Mike Snitzer
2011-03-18 4:59 ` Nikanth Karthikesan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D74AEF9.7050108@linux.vnet.ibm.com \
--to=mume@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=cotte@de.ibm.com \
--cc=dm-devel@redhat.com \
--cc=ehrhardt@linux.vnet.ibm.com \
--cc=heiko.carstens@de.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=neilb@suse.de \
--cc=snitzer@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.