linux-nfs.vger.kernel.org archive mirror
From: Shaohua Li <shli@kernel.org>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 7/7] md/raid5: fix handling of degraded stripes in batches.
Date: Tue, 19 May 2015 22:56:24 -0700
Message-ID: <20150520055624.GA41173@kernel.org>
In-Reply-To: <20150513105604.55ac700a@notabene.brown>

On Wed, May 13, 2015 at 10:56:04AM +1000, NeilBrown wrote:
> On Fri, 8 May 2015 12:12:23 -0700 Shaohua Li <shli@kernel.org> wrote:
> 
> > On Fri, May 08, 2015 at 06:56:12PM +1000, NeilBrown wrote:
> > > There is no need for special handling of stripe-batches when the array
> > > is degraded.
> > > 
> > > There may be if there is a failure in the batch, but STRIPE_DEGRADED
> > > does not imply an error.
> > > 
> > > So don't set STRIPE_BATCH_ERR in ops_run_io just because the array is
> > > degraded.
> > > This actually causes a bug: the STRIPE_DEGRADED flag gets cleared in
> > > check_break_stripe_batch_list() and so the bitmap bit gets cleared
> > > when it shouldn't.
> > > 
> > > So in check_break_stripe_batch_list(), split the batch up completely -
> > > again STRIPE_DEGRADED isn't meaningful.
> > > 
> > > Also don't set STRIPE_BATCH_ERR when there is a write error to a
> > > replacement device.  This simply removes the replacement device and
> > > requires no extra handling.
> > > 
> > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > ---
> > >  drivers/md/raid5.c |   17 +++--------------
> > >  1 file changed, 3 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > index 3873eaa6fa2e..1ba97fdc6df1 100644
> > > --- a/drivers/md/raid5.c
> > > +++ b/drivers/md/raid5.c
> > > @@ -1078,9 +1078,6 @@ again:
> > >  			pr_debug("skip op %ld on disc %d for sector %llu\n",
> > >  				bi->bi_rw, i, (unsigned long long)sh->sector);
> > >  			clear_bit(R5_LOCKED, &sh->dev[i].flags);
> > > -			if (sh->batch_head)
> > > -				set_bit(STRIPE_BATCH_ERR,
> > > -					&sh->batch_head->state);
> > >  			set_bit(STRIPE_HANDLE, &sh->state);
> > >  		}
> > 
> > Patches look good to me. I had a question here. Is it possible some stripes in
> > a batch become degraded here but some not? Seems possible, then the batch
> > should be splitted too.
> 
> Why?
> 
> I don't really understand the purpose of splitting up the batch.
> The only possible error handling on a full-stripe write is:
>  - fail a device, or
>  - record a bad-block.
> 
> The first case affects all stripes in a batch equally so there is no need to
> split it up.
> The second case it is probably best to record the bad blocks while iterating
> through the batch in handle_stripe_clean_event().
> 
> What exactly do you expect to happen after the stripes in a batch after they
> have been split up?

My original concern was that a device failure can cause some stripes to fail
but not others, e.g. rdev is NULL in ops_run_io for some stripes of a batch but
not all of them. There is no locking, so this seems possible. But you are
right: the stripes that see no error in ops_run_io will get an IO error
eventually, so all the stripes in the batch still end up in the same state. So
my concern is invalid. I forgot to reply to the email, sorry.
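
To make the argument above concrete, here is a toy user-space model of it. It
is only a sketch and not kernel code: every name in it (toy_stripe, toy_submit,
toy_complete, toy_race) is hypothetical, and it assumes that any IO submitted
to the failing device completes with an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_BATCH 16

/* Toy model only: none of these names exist in drivers/md/raid5.c. */
struct toy_stripe {
	bool io_skipped;	/* submit path saw no device (rdev == NULL analogue) */
	bool io_error;		/* IO was submitted but completed with an error */
};

/* Submit one stripe; dev_present is what this stripe observed at submit time. */
static void toy_submit(struct toy_stripe *sh, bool dev_present)
{
	sh->io_skipped = !dev_present;
	sh->io_error = false;
}

/* Complete the stripe; a failed device turns every submitted IO into an error. */
static void toy_complete(struct toy_stripe *sh, bool dev_failed)
{
	if (!sh->io_skipped && dev_failed)
		sh->io_error = true;
}

/* A stripe lost the device if its IO was either skipped or failed. */
static bool toy_lost_device(const struct toy_stripe *sh)
{
	return sh->io_skipped || sh->io_error;
}

/*
 * Simulate a batch of n stripes where the device disappears after fail_at
 * stripes have been submitted (fail_at >= n means it never fails).
 * Returns true if all stripes in the batch end in the same state.
 */
static bool toy_race(size_t n, size_t fail_at)
{
	struct toy_stripe batch[TOY_MAX_BATCH];
	bool dev_failed = fail_at < n;
	size_t i;

	assert(n > 0 && n <= TOY_MAX_BATCH);

	for (i = 0; i < n; i++)
		toy_submit(&batch[i], i < fail_at);
	for (i = 0; i < n; i++)
		toy_complete(&batch[i], dev_failed);

	for (i = 1; i < n; i++)
		if (toy_lost_device(&batch[i]) != toy_lost_device(&batch[0]))
			return false;
	return true;
}
```

Under that assumption, toy_race(4, 2) reports a uniform batch even though only
the last two stripes skipped their IO at submit time, which is the reason the
batch does not need to be split for a device failure.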

The batch split is there to handle IO errors, e.g. recording a bad block and so
on. I'm not confident about changing the existing code to handle the error
case, so I feel splitting the batch and handling each stripe in the normal way
is the best thing to do. There certainly might be a better way. Again, the
device failure case should be ignored, which I didn't realize originally.

BTW, can you apply the fix for the bug reported by Maxime, which was introduced
by the batch patch?
http://marc.info/?l=linux-raid&m=143153461415534&w=2

Thanks,
Shaohua
