From mboxrd@z Thu Jan 1 00:00:00 1970 From: Nate Dailey Subject: Re: [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Date: Thu, 22 Oct 2015 18:37:04 -0400 Message-ID: <56296510.4030702@stratus.com> References: <1445357353-19906-1-git-send-email-Jes.Sorensen@redhat.com> <87pp092sid.fsf@notabene.neil.brown.name> <87r3kmziux.fsf@notabene.neil.brown.name> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <87r3kmziux.fsf@notabene.neil.brown.name> Sender: linux-raid-owner@vger.kernel.org To: Neil Brown , Jes Sorensen Cc: linux-raid@vger.kernel.org, William.Kuzeja@stratus.com, xni@redhat.com List-Id: linux-raid.ids The problem is that we aren't getting true write (medium) errors. In this case we're testing device removals. The write errors happen because the disk goes away. Narrow_write_error returns 1, the bitmap bit is cleared, and then when the device is re-added the resync might not include the sectors in that chunk (there's some luck involved; if other writes to that chunk happen while the disk is removed, we're okay--bug is easier to hit with smaller bitmap chunks because of this). On 10/22/2015 05:36 PM, Neil Brown wrote: > Jes Sorensen writes: > >> Neil Brown writes: >>> Jes.Sorensen@redhat.com writes: >>> >>>> From: Jes Sorensen >>>> >>>> Hi, >>>> >>>> Bill Kuzeja reported a problem to me about data corruption when >>>> repeatedly removing and re-adding devices in raid1 arrays. It showed >>>> up to be caused by the return value of submit_bio_wait() being handled >>>> incorrectly. Tracking this down is credit of Bill! >>>> >>>> Looks like commit 9e882242c6193ae6f416f2d8d8db0d9126bd996b changed the >>>> return of submit_bio_wait() to return != 0 on error, whereas before it >>>> returned 0 on error. >>>> >>>> This fix should be suitable for -stable as far back as 3.9 >>> 3.10? >>> >>> Thanks to both of you! >>> >>> I took the liberty of changing the patches a little so they are now: >>> >>> - if (submit_bio_wait(WRITE, wbio) == 0) >>> + if (submit_bio_wait(WRITE, wbio) < 0) >>> >>> because when there is no explicit test I tend to expect a Bool but these >>> values are not Bool. >>> >>> Patches are in my for-linus branch and will be forwarded sometime this >>> week. >>> >>> This bug only causes a problem when bad-block logs are active, so >>> hopefully it won't have caused too much corruption yet -- you would need >>> to be using a newish mdadm. >> Neil, >> >> An additional twist on this one - Nate ran more tests on this, but was >> still able to hit data corruption. He suggests the it is a mistake to >> set 'ok = rdev_set_badblocks()' and it should instead be set to 0 if >> submit_bio_wait() fails. Like this: >> >> --- raid1.c >> +++ raid1.c >> @@ -2234,11 +2234,12 @@ >> bio_trim(wbio, sector - r1_bio->sector, sectors); >> wbio->bi_sector += rdev->data_offset; >> wbio->bi_bdev = rdev->bdev; >> if (submit_bio_wait(WRITE, wbio) < 0) { >> /* failure! */ >> - ok = rdev_set_badblocks(rdev, sector, >> - sectors, 0) >> - && ok; >> + ok = 0; >> + rdev_set_badblocks(rdev, sector, >> + sectors, 0); >> + } >> >> Question is whether this change has any negative impact in case of a >> real write failure? >> >> I have actual patches, I'll send as a reply to this one. >> > If we unconditionally set ok to 0 on a write error, then > narrow_write_error() will return 0 and handle_write finished() will call > md_error() to kick the device out of the array. > > And given that we only call narrow_write_error() when we got a write > error, we strongly expect at least one sector to give an error. > > So it seems to me that the net result of this patch is to make > bad-block-lists completely ineffective. > > What sort of tests are you running, and what sort of corruption do you > see? > > NeilBrown