Re: [PATCH 3/3] Btrfs: make raid6 rebuild retry more

linux-btrfs.vger.kernel.org archive mirror

From: Liu Bo <bo.li.liu@oracle.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 3/3] Btrfs: make raid6 rebuild retry more
Date: Tue, 5 Dec 2017 11:04:03 -0700
Message-ID: <20171205180403.GA18865@dhcp-10-211-47-181.usdhcp.oraclecorp.com>
In-Reply-To: <1ab2da20-6294-67a4-2ef3-6a65afed0fc6@gmx.com>

On Tue, Dec 05, 2017 at 04:07:35PM +0800, Qu Wenruo wrote:
> 
> 
> On 2017-12-05 06:40, Liu Bo wrote:
> > There is a scenario in which the rebuild process can fail to return
> > good content.  Suppose that all disks can be read without problems,
> > but the content that was read out doesn't match its checksum;
> > currently, for raid6, btrfs retries at most twice:
> > 
> > - the 1st retry rebuilds from all other stripes, which eventually
> >   becomes a raid5 xor rebuild;
> > - if the 1st fails, the 2nd retry deliberately fails parity p so
> >   that it does a raid6-style rebuild.
> > 
> > However, it is possible that another non-parity stripe is also
> > corrupted, in which case the above retries cannot return correct
> > content, and users will perceive this as data loss.  More seriously,
> > if the loss happens on some important internal btree roots, the
> > filesystem could refuse to mount.
> > 
> > This extends btrfs to do more retries, with each retry failing only
> > one stripe.  Since raid6 can tolerate 2 disk failures, this always
> > works as long as there is at most one more failure besides the one
> > we are recovering from.
> 
> This should be the correct behavior for RAID6: try all possible
> combinations until either all of them are exhausted or correct data
> is recovered.
> 
> > 
> > The worst case is to retry as many times as the number of raid6 disks,
> > but given that such a scenario is really rare in practice, it's still
> > acceptable.
> 
> And even if we retry that many times, I don't think it will be a big
> problem.  Since most of that work happens purely in memory, it should
> be fast enough that no obvious impact can be observed.
>

It's basically a while loop, so it may cause some delay or even an
apparent hang; still, the scenario is rare.
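The caller-side retry loop mentioned above can be sketched roughly as follows. This is a hypothetical model, not btrfs code: `read_until_good()`, `try_mirror_fn` and `good_from_mirror()` are illustrative names, and the callback stands in for "read this mirror and verify its checksum". It only shows that once `num_copies` for raid6 equals the number of stripes, the worst case is one attempt per stripe for a given block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: read 'mirror_num' and report whether it verified. */
typedef bool (*try_mirror_fn)(int mirror_num, void *ctx);

/*
 * Try mirrors 1..num_copies in order and stop at the first one whose
 * content verifies.  If num_copies is the number of raid6 stripes, that
 * is also the worst-case number of attempts for one block.
 */
static int read_until_good(int num_copies, try_mirror_fn try_mirror,
			   void *ctx, int *attempts)
{
	assert(num_copies >= 1);
	*attempts = 0;
	for (int mirror_num = 1; mirror_num <= num_copies; mirror_num++) {
		(*attempts)++;
		if (try_mirror(mirror_num, ctx))
			return mirror_num;	/* first mirror that verified */
	}
	return -1;				/* all combinations exhausted */
}

/* Test stub (hypothetical): every mirror from *ctx onwards returns good data. */
static bool good_from_mirror(int mirror_num, void *ctx)
{
	return mirror_num >= *(const int *)ctx;
}
```

Each iteration is bounded work, so the cost of the rare worst case grows linearly with the stripe count rather than with the number of possible failure pairs.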

> Apart from some small nitpicks inlined below, the idea looks pretty
> good to me.
> 
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> 
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/raid56.c  | 18 ++++++++++++++----
> >  fs/btrfs/volumes.c |  9 ++++++++-
> >  2 files changed, 22 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> > index 8d09535..064d5bc 100644
> > --- a/fs/btrfs/raid56.c
> > +++ b/fs/btrfs/raid56.c
> > @@ -2166,11 +2166,21 @@ int raid56_parity_recover(struct btrfs_fs_info *fs_info, struct bio *bio,
> >  	}
> >  
> >  	/*
> > -	 * reconstruct from the q stripe if they are
> > -	 * asking for mirror 3
> > +	 * Loop retry:
> > +	 * for 'mirror == 2', reconstruct from all other stripes.
> 
> What about using a macro to make the reassembly method more human-readable?
> 
> And for the mirror == 2 case, by "rebuild from all" do you mean rebuilding
> from all remaining data stripes + P?  The word "all" here is a little confusing.
>

Thank you for the comments.

It depends.  If all other stripes can be read, it does 'data+p', which
is a raid5 xor rebuild; if some disks also fail, it may do 'data+p+q'
or 'data+q' instead.

Is it better to say "for mirror == 2, reconstruct from other available
stripes"?
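The "it depends" answer above can be modeled as a small decision function. This is a simplified sketch, not the btrfs implementation: `pick_rebuild()` and `enum rebuild_kind` are hypothetical names, it assumes at most one extra unreadable stripe (`failb`), and it assumes P and Q occupy the last two stripe slots, consistent with the patch failing the p stripe via `real_stripes - 2`.

```c
#include <assert.h>

/* Which parity a rebuild of data stripe 'faila' ends up using (simplified). */
enum rebuild_kind {
	REBUILD_DATA_P,		/* raid5-style xor of remaining data and P */
	REBUILD_DATA_Q,		/* remaining data and Q, because P is unavailable */
	REBUILD_DATA_P_Q,	/* two data stripes lost: needs both P and Q */
};

/*
 * faila: data stripe being reconstructed.
 * failb: another stripe that could not be read, or -1 if all others read fine.
 * Layout assumption: P is at real_stripes - 2 and Q at real_stripes - 1.
 */
static enum rebuild_kind pick_rebuild(int real_stripes, int faila, int failb)
{
	int p = real_stripes - 2, q = real_stripes - 1;

	assert(faila >= 0 && faila < p);
	assert(failb != faila);
	if (failb < 0 || failb == q)
		return REBUILD_DATA_P;	/* Q not needed when P is intact */
	if (failb == p)
		return REBUILD_DATA_Q;
	return REBUILD_DATA_P_Q;
}
```

In this model an unreadable Q still leaves a plain 'data+p' rebuild, which is why only 'data+q' and 'data+p+q' show up as the alternatives when other disks fail.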

Thanks,

-liubo

> Thanks,
> Qu
> 
> > +	 * for 'mirror_num > 2', select a stripe to fail on every retry.
> >  	 */
> > -	if (mirror_num == 3)
> > -		rbio->failb = rbio->real_stripes - 2;
> > +	if (mirror_num > 2) {
> > +		/*
> > +		 * 'mirror == 3' is to fail the p stripe and
> > +		 * reconstruct from the q stripe.  'mirror > 3' is to
> > +		 * fail a data stripe and reconstruct from p+q stripe.
> > +		 */
> > +		rbio->failb = rbio->real_stripes - (mirror_num - 1);
> > +		ASSERT(rbio->failb > 0);
> > +		if (rbio->failb <= rbio->faila)
> > +			rbio->failb--;
> > +	}
> >  
> >  	ret = lock_stripe_add(rbio);
> >  
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index b397375..95371f8 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -5094,7 +5094,14 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
> >  	else if (map->type & BTRFS_BLOCK_GROUP_RAID5)
> >  		ret = 2;
> >  	else if (map->type & BTRFS_BLOCK_GROUP_RAID6)
> > -		ret = 3;
> > +		/*
> > +		 * There could be two corrupted data stripes, we need
> > +		 * to loop retry in order to rebuild the correct data.
> > +		 * 
> > +		 * Fail a stripe at a time on every retry except the
> > +		 * stripe under reconstruction.
> > +		 */
> > +		ret = map->num_stripes;
> >  	else
> >  		ret = 1;
> >  	free_extent_map(em);
> > 
>
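To see why failing one extra stripe per retry recovers from two corrupted data stripes, here is a self-contained toy model. It is a sketch under stated assumptions, not the btrfs code: each "stripe" is a single byte, P and Q use the usual RAID6 GF(2^8) arithmetic (generator 2, polynomial 0x11d), comparing against a known-good byte stands in for the checksum check, and `compute_pq()`, `rebuild()` and `recover()` are hypothetical names. Only the `failb` selection in `recover()` follows the arithmetic of the patch hunk, with mirrors running up to the stripe count as the volumes.c change implies.

```c
#include <assert.h>
#include <stdint.h>

#define NR_DATA		4			/* toy geometry: 4 data stripes */
#define REAL_STRIPES	(NR_DATA + 2)		/* plus P and Q */
#define P_IDX		(REAL_STRIPES - 2)
#define Q_IDX		(REAL_STRIPES - 1)

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t r = 0;

	while (b) {
		if (b & 1)
			r ^= a;
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return r;
}

/* g^n for the generator g = 2. */
static uint8_t gf_pow2(int n)
{
	uint8_t r = 1;

	while (n-- > 0)
		r = gf_mul(r, 2);
	return r;
}

/* Multiplicative inverse by brute force; 'a' must be non-zero. */
static uint8_t gf_inv(uint8_t a)
{
	for (int x = 1; x < 256; x++)
		if (gf_mul(a, (uint8_t)x) == 1)
			return (uint8_t)x;
	return 0;
}

/* P = xor of all data, Q = sum of g^i * D_i. */
static void compute_pq(uint8_t s[REAL_STRIPES])
{
	s[P_IDX] = 0;
	s[Q_IDX] = 0;
	for (int i = 0; i < NR_DATA; i++) {
		s[P_IDX] ^= s[i];
		s[Q_IDX] ^= gf_mul(gf_pow2(i), s[i]);
	}
}

/* Rebuild data stripe 'faila', also treating 'failb' as lost (-1: none). */
static uint8_t rebuild(const uint8_t s[REAL_STRIPES], int faila, int failb)
{
	uint8_t pxy = s[P_IDX], qxy = s[Q_IDX];

	assert(faila >= 0 && faila < NR_DATA && failb != faila);
	/* Fold every data stripe we still trust out of P and Q. */
	for (int i = 0; i < NR_DATA; i++) {
		if (i == faila || i == failb)
			continue;
		pxy ^= s[i];
		qxy ^= gf_mul(gf_pow2(i), s[i]);
	}
	if (failb < 0 || failb == Q_IDX)	/* data + P: raid5-style xor */
		return pxy;
	if (failb == P_IDX)			/* data + Q */
		return gf_mul(qxy, gf_inv(gf_pow2(faila)));
	/* Two data stripes lost: solve with both P and Q. */
	uint8_t gx = gf_pow2(faila), gy = gf_pow2(failb);

	return gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy));
}

/* Mirror 2 uses all other stripes; each mirror > 2 fails one more stripe. */
static int recover(const uint8_t s[REAL_STRIPES], int faila, uint8_t good,
		   uint8_t *out)
{
	for (int mirror_num = 2; mirror_num <= REAL_STRIPES; mirror_num++) {
		int failb = -1;

		if (mirror_num > 2) {
			failb = REAL_STRIPES - (mirror_num - 1);
			if (failb <= faila)
				failb--;	/* never fail the stripe under rebuild */
		}
		*out = rebuild(s, faila, failb);
		if (*out == good)	/* stands in for the checksum check */
			return mirror_num;
	}
	return -1;
}
```

With stripe 0 under reconstruction and data stripe 2 also silently corrupted, mirrors 2 to 4 in this model produce wrong bytes, and mirror 5 (which fails stripe 2) returns the original data. With only P corrupted besides the target stripe, mirror 3 (fail P, rebuild from Q) is the one that succeeds.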


Thread overview: 16+ messages
2017-12-04 22:40 [PATCH 0/3] Btrfs: loop retry on raid6 read failures Liu Bo
2017-12-04 22:40 ` [PATCH 1/3] Btrfs: remove redundant check in rbio_can_merge Liu Bo
2017-12-05 18:20   ` David Sterba
2017-12-04 22:40 ` [PATCH 2/3] Btrfs: do not merge rbios if their fail stripe index are not identical Liu Bo
2017-12-05 18:24   ` David Sterba
2017-12-04 22:40 ` [PATCH 3/3] Btrfs: make raid6 rebuild retry more Liu Bo
2017-12-05  8:07   ` Qu Wenruo
2017-12-05 18:04     ` Liu Bo [this message]
2017-12-05 19:29       ` David Sterba
2017-12-05 18:09     ` David Sterba
2017-12-05 22:55       ` Liu Bo
2017-12-06  0:11         ` Qu Wenruo
2017-12-07  0:26           ` Liu Bo
2017-12-05  8:08   ` Qu Wenruo
2017-12-05 18:26 ` [PATCH 0/3] Btrfs: loop retry on raid6 read failures David Sterba
2018-01-05 15:54 ` David Sterba
