From: Eric Mei <meijia@gmail.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH] md/raid5: don't do chunk aligned read on degraded array.
Date: Thu, 19 Mar 2015 13:41:16 -0600	[thread overview]
Message-ID: <550B265C.7070907@gmail.com> (raw)
In-Reply-To: <20150319170215.7d6dfd60@notabene.brown>

On 2015-03-19 12:02 AM, NeilBrown wrote:
> On Wed, 18 Mar 2015 23:39:11 -0600 Eric Mei <meijia@gmail.com> wrote:
>
>> From: Eric Mei <eric.mei@seagate.com>
>>
>> When the array is degraded, reads that land on failed drives result
>> in reading the rest of the data in the stripe, so a single sequential
>> read ends up reading the same data twice.
>>
>> This patch avoids chunk-aligned reads on a degraded array. The
>> downside is that reads go through the stripe cache, which means
>> associated CPU overhead and an extra memory copy.
>>
>> Signed-off-by: Eric Mei <eric.mei@seagate.com>
>> ---
>>    drivers/md/raid5.c |   15 ++++++++++++---
>>    1 files changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index cd2f96b..763c64a 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -4180,8 +4180,12 @@ static int raid5_mergeable_bvec(struct mddev *mddev,
>>           unsigned int chunk_sectors = mddev->chunk_sectors;
>>           unsigned int bio_sectors = bvm->bi_size >> 9;
>>
>> -       if ((bvm->bi_rw & 1) == WRITE)
>> -               return biovec->bv_len; /* always allow writes to be mergeable */
>> +       /*
>> +        * always allow writes to be mergeable, read as well if array
>> +        * is degraded as we'll go through stripe cache anyway.
>> +        */
>> +       if ((bvm->bi_rw & 1) == WRITE || mddev->degraded)
>> +               return biovec->bv_len;
>>
>>           if (mddev->new_chunk_sectors < mddev->chunk_sectors)
>>                   chunk_sectors = mddev->new_chunk_sectors;
>> @@ -4656,7 +4660,12 @@ static void make_request(struct mddev *mddev, struct bio * bi)
>>
>>           md_write_start(mddev, bi);
>>
>> -       if (rw == READ &&
>> +       /*
>> +        * If array is degraded, better not do chunk aligned read because
>> +        * later we might have to read it again in order to reconstruct
>> +        * data on failed drives.
>> +        */
>> +       if (rw == READ && mddev->degraded == 0 &&
>>                mddev->reshape_position == MaxSector &&
>>                chunk_aligned_read(mddev,bi))
>>                   return;
>
> Thanks for the patch.
>
> However this sort of patch really needs to come with some concrete
> performance numbers.  Preferably both sequential reads and random reads.
>
> I agree that sequential reads are likely to be faster, but how much faster
> are they?
> I imagine that this might make random reads a little slower.   Does it?  By
> how much?
>
> Thanks,
> NeilBrown
>

Hi Neil,

Sorry, I should have done the tests in the first place.

The following tests were done on an enterprise storage node with
Seagate 6 TB SAS drives and a Xeon E5-2648L CPU (10 cores, 1.9 GHz),
using a 10-disk MD RAID6 (8+2 layout), chunk size 128 KiB.

I used FIO with direct I/O, various block sizes, and enough queue
depth, and tested sequential and 100% random reads against 3 array
configurations: 1) optimal, as baseline; 2) degraded; 3) degraded with
this patch. The kernel version is 4.0-rc3.
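
For reference, the jobs looked roughly like the sketch below
(illustrative values: the exact iodepth and runtime varied per run,
and /dev/md0 is just a placeholder name for the array):

  [global]
  ioengine=libaio   ; asynchronous I/O engine
  direct=1          ; direct-io, bypassing the page cache
  rw=read           ; sequential read; rw=randread for the random tests
  bs=128k           ; block size, swept from 4k up to 1024k
  iodepth=32        ; "enough queue depth" -- illustrative value
  runtime=60
  time_based=1

  [seq-read]
  filename=/dev/md0 ; placeholder device name for the RAID6 array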

I ran each individual test only once, so there may be some variation,
but let's focus on the big trends.

Sequential Read:
  bs=(KiB)  optimal(MiB/s)  degraded(MiB/s)  degraded-with-patch (MiB/s)
   1024       1608            656              995
    512       1624            710              956
    256       1635            728              980
    128       1636            771              983
     64       1612           1119             1000
     32       1580           1420             1004
     16       1368            688              986
      8        768            647              953
      4        411            413              850
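
To put the degraded numbers in perspective: with an 8+2 layout and
128 KiB chunks, a full stripe carries 8 x 128 KiB = 1 MiB of data.
When a sequential read lands on a chunk of a failed drive,
reconstruction has to read the surviving chunks of that stripe, so
data the reader has already fetched gets read from disk again; a
sequential pass can thus fetch roughly twice the data it returns,
which lines up with the large drop in the degraded column at big
block sizes (656 vs 1608 MiB/s at bs=1024 KiB).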

Random Read:
  bs=(KiB)  optimal(IOPS)  degraded(IOPS)  degraded-with-patch (IOPS)
   1024        163            160              156
    512        274            273              272
    256        426            428              424
    128        576            592              591
     64        726            724              726
     32        849            848              837
     16        900            970              971
      8        927            940              929
      4        948            940              955

Some notes:
  * In sequential + optimal, as the block size gets smaller, the FIO
thread becomes CPU bound (see the note after this list).
  * In sequential + degraded, there's a big jump when bs is 64 KiB and
32 KiB; I don't have an explanation for that.
  * In sequential + degraded-with-patch, the MD thread mostly becomes
CPU bound.
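
A simple way to verify those CPU-bound observations is to watch
per-thread CPU usage while a test runs, e.g. with "top -H": in the
optimal case the fio threads sit near 100% of a core, while in the
degraded-with-patch case it's the array's MD kernel thread (named
something like md0_raid6; the exact name depends on the array) that
saturates a core.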

If you want, we can discuss specific data points. But in general it
seems that with this patch we get more predictable and, in most cases,
significantly better sequential read performance when the array is
degraded, with almost no noticeable impact on random reads.

Performance is a complicated thing; the patch works well for this
particular configuration, but it may not be universal. For example, I
imagine testing on an all-SSD array may give very different results.
But I personally think that in most cases I/O bandwidth is a scarcer
resource than CPU.

Eric
