From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from imap1.codethink.co.uk ([176.9.8.82]:59995 "EHLO
	imap1.codethink.co.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752623AbeFHSm6 (ORCPT );
	Fri, 8 Jun 2018 14:42:58 -0400
Message-ID: <1528483367.2289.105.camel@codethink.co.uk>
Subject: Re: [PATCH 4.4 038/268] Btrfs: fix scrub to repair raid6 corruption
From: Ben Hutchings
To: Greg Kroah-Hartman, Liu Bo, Sasha Levin
Cc: stable@vger.kernel.org, David Sterba, LKML
Date: Fri, 08 Jun 2018 19:42:47 +0100
In-Reply-To: <20180528100206.374694208@linuxfoundation.org>
References: <20180528100202.045206534@linuxfoundation.org>
	<20180528100206.374694208@linuxfoundation.org>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: stable-owner@vger.kernel.org
List-ID:

On Mon, 2018-05-28 at 12:00 +0200, Greg Kroah-Hartman wrote:
> 4.4-stable review patch.  If anyone has any objections, please let me know.
>
> ------------------
>
> From: Liu Bo
>
> [ Upstream commit 762221f095e3932669093466aaf4b85ed9ad2ac1 ]

The diff here is actually from commit 8810f7517a3b ("Btrfs: make raid6
rebuild retry more", mentioned in this commit message).

(Sasha, please try to work out why commit messages and descriptions are
getting mixed up in your auto-selections.)

Maybe stable branches should get the real commit 762221f095e3 as well?

Ben.

> The raid6 corruption is that,
> suppose that all disks can be read without problems and if the content
> that was read out doesn't match its checksum, currently for raid6
> btrfs at most retries twice,
>
> - the 1st retry is to rebuild with all other stripes, it'll eventually
>   be a raid5 xor rebuild,
> - if the 1st fails, the 2nd retry will deliberately fail parity p so
>   that it will do raid6 style rebuild,
>
> however, the chances are that another non-parity stripe content also
> has something corrupted, so that the above retries are not able to
> return correct content.
>
> We've fixed normal reads to rebuild raid6 correctly with more retries
> in Patch "Btrfs: make raid6 rebuild retry more"[1], this is to fix
> scrub to do the exactly same rebuild process.
>
> [1]: https://patchwork.kernel.org/patch/10091755/
>
> Signed-off-by: Liu Bo
> Signed-off-by: David Sterba
> Signed-off-by: Sasha Levin
> Signed-off-by: Greg Kroah-Hartman
> ---
>  fs/btrfs/raid56.c  |   18 ++++++++++++++----
>  fs/btrfs/volumes.c |    9 ++++++++-
>  2 files changed, 22 insertions(+), 5 deletions(-)
>
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -2160,11 +2160,21 @@ int raid56_parity_recover(struct btrfs_r
>  	}
>
>  	/*
> -	 * reconstruct from the q stripe if they are
> -	 * asking for mirror 3
> +	 * Loop retry:
> +	 * for 'mirror == 2', reconstruct from all other stripes.
> +	 * for 'mirror_num > 2', select a stripe to fail on every retry.
>  	 */
> -	if (mirror_num == 3)
> -		rbio->failb = rbio->real_stripes - 2;
> +	if (mirror_num > 2) {
> +		/*
> +		 * 'mirror == 3' is to fail the p stripe and
> +		 * reconstruct from the q stripe.  'mirror > 3' is to
> +		 * fail a data stripe and reconstruct from p+q stripe.
> +		 */
> +		rbio->failb = rbio->real_stripes - (mirror_num - 1);
> +		ASSERT(rbio->failb > 0);
> +		if (rbio->failb <= rbio->faila)
> +			rbio->failb--;
> +	}
>
>  	ret = lock_stripe_add(rbio);
>
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5056,7 +5056,14 @@ int btrfs_num_copies(struct btrfs_fs_inf
>  	else if (map->type & BTRFS_BLOCK_GROUP_RAID5)
>  		ret = 2;
>  	else if (map->type & BTRFS_BLOCK_GROUP_RAID6)
> -		ret = 3;
> +		/*
> +		 * There could be two corrupted data stripes, we need
> +		 * to loop retry in order to rebuild the correct data.
> +		 *
> +		 * Fail a stripe at a time on every retry except the
> +		 * stripe under reconstruction.
> +		 */
> +		ret = map->num_stripes;
>  	else
>  		ret = 1;
>  	free_extent_map(em);

-- 
Ben Hutchings, Software Developer
Codethink Ltd
https://www.codethink.co.uk/
Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom
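
For reference, here is a minimal sketch of the stripe-selection arithmetic from the quoted raid56.c hunk, lifted into a standalone C function so it can be traced by hand. The name pick_failb and the plain int parameters are hypothetical stand-ins for the rbio fields, and the -1 return for "no second stripe forced to fail" is an assumption of this sketch, not something taken from the kernel.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): mirrors the arithmetic
 * of the quoted hunk in raid56_parity_recover().
 *
 * real_stripes: total stripes, including P and Q
 * faila:        index of the stripe already being reconstructed
 * mirror_num:   retry number, as passed by the caller
 *
 * Returns the index of the second stripe to treat as failed, or -1
 * (sketch-only convention) when mirror_num <= 2 and none is forced.
 */
int pick_failb(int real_stripes, int faila, int mirror_num)
{
	int failb = -1;

	if (mirror_num > 2) {
		/* mirror_num == 3 selects real_stripes - 2, the p stripe. */
		failb = real_stripes - (mirror_num - 1);
		assert(failb > 0);
		/* Step past the stripe that is already under reconstruction. */
		if (failb <= faila)
			failb--;
	}
	return failb;
}
```

Assuming a 6-stripe raid6 layout with data stripes at indexes 0-3, P at index 4 and Q at index 5, and faila == 2, mirror_num values 3, 4, 5 and 6 select failb 4, 3, 1 and 0 respectively, so every stripe other than faila and Q is failed exactly once.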