From: NeilBrown <neilb@suse.de>
To: Eric Mei <meijia@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH] md/raid5: don't do chunk aligned read on degraded array.
Date: Mon, 20 Apr 2015 16:20:38 +1000
Message-ID: <20150420162038.72af8591@notabene.brown>
In-Reply-To: <550B265C.7070907@gmail.com>
On Thu, 19 Mar 2015 13:41:16 -0600 Eric Mei <meijia@gmail.com> wrote:
> On 2015-03-19 12:02 AM, NeilBrown wrote:
> > On Wed, 18 Mar 2015 23:39:11 -0600 Eric Mei <meijia@gmail.com> wrote:
> >
> >> From: Eric Mei <eric.mei@seagate.com>
> >>
> >> When the array is degraded, a read that lands on a failed drive causes
> >> the rest of the data in that stripe to be read as well. So a single
> >> sequential read can result in the same data being read twice.
> >>
> >> This patch avoids chunk-aligned reads on a degraded array. The
> >> downside is that reads then go through the stripe cache, which means
> >> the associated CPU overhead and an extra memory copy.
> >>
> >> Signed-off-by: Eric Mei <eric.mei@seagate.com>
> >> ---
> >> drivers/md/raid5.c | 15 ++++++++++++---
> >> 1 files changed, 12 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> index cd2f96b..763c64a 100644
> >> --- a/drivers/md/raid5.c
> >> +++ b/drivers/md/raid5.c
> >> @@ -4180,8 +4180,12 @@ static int raid5_mergeable_bvec(struct mddev *mddev,
> >> unsigned int chunk_sectors = mddev->chunk_sectors;
> >> unsigned int bio_sectors = bvm->bi_size >> 9;
> >>
> >> - if ((bvm->bi_rw & 1) == WRITE)
> >> - return biovec->bv_len; /* always allow writes to be mergeable */
> >> + /*
> >> + * always allow writes to be mergeable, read as well if array
> >> + * is degraded as we'll go through stripe cache anyway.
> >> + */
> >> + if ((bvm->bi_rw & 1) == WRITE || mddev->degraded)
> >> + return biovec->bv_len;
> >>
> >> if (mddev->new_chunk_sectors < mddev->chunk_sectors)
> >> chunk_sectors = mddev->new_chunk_sectors;
> >> @@ -4656,7 +4660,12 @@ static void make_request(struct mddev *mddev, struct bio * bi)
> >>
> >> md_write_start(mddev, bi);
> >>
> >> - if (rw == READ &&
> >> + /*
> >> + * If array is degraded, better not do chunk aligned read because
> >> + * later we might have to read it again in order to reconstruct
> >> + * data on failed drives.
> >> + */
> >> + if (rw == READ && mddev->degraded == 0 &&
> >> mddev->reshape_position == MaxSector &&
> >> chunk_aligned_read(mddev,bi))
> >> return;
> >
> > Thanks for the patch.
> >
> > However this sort of patch really needs to come with some concrete
> > performance numbers. Preferably both sequential reads and random reads.
> >
> > I agree that sequential reads are likely to be faster, but how much faster
> > are they?
> > I imagine that this might make random reads a little slower. Does it? By
> > how much?
> >
> > Thanks,
> > NeilBrown
> >
>
> Hi Neil,
>
> Sorry, I should have done the test in the first place.
>
> The following tests were done on an enterprise storage node with Seagate
> 6T SAS drives and a Xeon E5-2648L CPU (10 cores, 1.9 GHz), using a
> 10-disk MD RAID6 8+2 array with a chunk size of 128 KiB.
>
> I used FIO with direct I/O, various bs sizes, and enough queue depth,
> and tested sequential and 100% random reads against 3 array configs: 1)
> optimal, as baseline; 2) degraded; 3) degraded with this patch. The
> kernel version is 4.0-rc3.
>
> I ran each individual test only once, so there might be some variation,
> but we are just focusing on the big trend.
>
> Sequential Read:
> bs (KiB)  optimal (MiB/s)  degraded (MiB/s)  degraded-with-patch (MiB/s)
>     1024             1608               656                          995
>      512             1624               710                          956
>      256             1635               728                          980
>      128             1636               771                          983
>       64             1612              1119                         1000
>       32             1580              1420                         1004
>       16             1368               688                          986
>        8              768               647                          953
>        4              411               413                          850
>
> Random Read:
> bs (KiB)  optimal (IOPS)  degraded (IOPS)  degraded-with-patch (IOPS)
>     1024             163              160                         156
>      512             274              273                         272
>      256             426              428                         424
>      128             576              592                         591
>       64             726              724                         726
>       32             849              848                         837
>       16             900              970                         971
>        8             927              940                         929
>        4             948              940                         955
>
> Some notes:
> * In sequential + optimal, as the bs size gets smaller, the FIO thread
> becomes CPU bound.
> * In sequential + degraded, there is a big increase when bs is 64K and
> 32K; I don't have an explanation for it.
> * In sequential + degraded-with-patch, the MD thread mostly becomes CPU
> bound.
>
> If you want, we can discuss specific data points in this data. But in
> general it seems that with this patch we get more predictable and, in
> most cases, significantly better sequential read performance when the
> array is degraded, with almost no noticeable impact on random reads.
>
> Performance is a complicated thing. The patch works well for this
> particular configuration, but the result may not be universal. For
> example, I imagine testing on an all-SSD array may give very different
> results. But I personally think that in most cases IO bandwidth is a
> scarcer resource than CPU.
>
> Eric
Thanks.
That is reasonably convincing.
I've added that text to the commit message, fixed up all the white-space
damage in the patch (tabs were converted to spaces etc ... if you are going
to be sending more patches, please find a way to convince your mailer that
spaces are important), and applied it.
It should be included in my pull request for 4.1
Thanks,
NeilBrown
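
Editorial sketch: the two conditions changed by the quoted patch can be modelled outside the kernel as plain predicates. This is only an illustration of the logic in the diff, not kernel code. The type and function names (`array_state`, `can_use_chunk_aligned_read`, `merge_is_unrestricted`) and the `MAX_SECTOR` constant are hypothetical stand-ins for `struct mddev`, the inline tests in `make_request()` and `raid5_mergeable_bvec()`, and `MaxSector`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's MaxSector ("no reshape running"). */
#define MAX_SECTOR UINT64_MAX

enum io_dir { IO_READ, IO_WRITE };

/* Hypothetical, reduced model of the struct mddev fields the patch tests. */
struct array_state {
    int degraded;              /* number of failed members, 0 if optimal */
    uint64_t reshape_position; /* MAX_SECTOR when no reshape is running */
};

/*
 * Models the patched test in make_request(): the chunk-aligned fast path,
 * which bypasses the stripe cache, is attempted only for reads on an
 * array that is neither degraded nor being reshaped.
 */
static bool can_use_chunk_aligned_read(const struct array_state *a,
                                       enum io_dir dir)
{
    return dir == IO_READ &&
           a->degraded == 0 &&
           a->reshape_position == MAX_SECTOR;
}

/*
 * Models the patched early return in raid5_mergeable_bvec(): writes are
 * always mergeable, and so are reads on a degraded array, because those
 * go through the stripe cache anyway.
 */
static bool merge_is_unrestricted(const struct array_state *a,
                                  enum io_dir dir)
{
    return dir == IO_WRITE || a->degraded != 0;
}
```

With this model, a read on an array with `degraded == 1` fails the first predicate and passes the second, which matches the intent described in the patch: skip the fast path and stop splitting bios at chunk boundaries.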
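
Editorial sketch: as a rough check on the sequential-read figures quoted in this message, the relative change from "degraded" to "degraded-with-patch" can be computed per block size. The throughput values are copied from the quoted table. The struct and function names (`seq_row`, `percent_change`, `count_improved`) are hypothetical and exist only for this illustration. Each test was run once, so the percentages should be read as indicative only.

```c
#include <stddef.h>

/* One row of the quoted sequential-read table (throughput in MiB/s). */
struct seq_row {
    int bs_kib;        /* FIO block size in KiB */
    int degraded_mibs; /* degraded array, without the patch */
    int patched_mibs;  /* degraded array, with the patch */
};

/* Values copied from the sequential-read table in the quoted message. */
static const struct seq_row seq_rows[] = {
    { 1024,  656,  995 },
    {  512,  710,  956 },
    {  256,  728,  980 },
    {  128,  771,  983 },
    {   64, 1119, 1000 },
    {   32, 1420, 1004 },
    {   16,  688,  986 },
    {    8,  647,  953 },
    {    4,  413,  850 },
};

#define SEQ_ROW_COUNT (sizeof(seq_rows) / sizeof(seq_rows[0]))

/* Relative change in percent; positive means the patched run was faster. */
static double percent_change(int before, int after)
{
    return 100.0 * (double)(after - before) / (double)before;
}

/* Number of block sizes at which the patched run beat the unpatched one. */
static size_t count_improved(const struct seq_row *rows, size_t n)
{
    size_t improved = 0;

    for (size_t i = 0; i < n; i++) {
        if (rows[i].patched_mibs > rows[i].degraded_mibs)
            improved++;
    }
    return improved;
}
```

Calling `count_improved(seq_rows, SEQ_ROW_COUNT)` counts the rows where the patch helped, and `percent_change()` on the 64 KiB and 32 KiB rows gives negative values, which correspond to the unexplained spike in the unpatched degraded numbers noted in the message.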