linux-raid.vger.kernel.org archive mirror
From: Jes Sorensen <Jes.Sorensen@redhat.com>
To: Nate Dailey <nate.dailey@stratus.com>
Cc: Neil Brown <neilb@suse.de>,
	linux-raid@vger.kernel.org, William.Kuzeja@stratus.com,
	xni@redhat.com
Subject: Re: [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error()
Date: Fri, 23 Oct 2015 14:02:47 -0400
Message-ID: <wrfjoafpcvjs.fsf@redhat.com>
In-Reply-To: <562A4475.1000904@stratus.com> (Nate Dailey's message of "Fri, 23 Oct 2015 10:30:13 -0400")

Nate Dailey <nate.dailey@stratus.com> writes:
> Thank you!
>
> I confirmed that this patch prevents the bug.
>
> Nate

Awesome, thanks Nate!

Neil, once you commit the final version of this patch, please let me
know.

Cheers,
Jes

>
>
>
> On 10/22/2015 08:09 PM, Neil Brown wrote:
>> Nate Dailey <nate.dailey@stratus.com> writes:
>>
>>> The problem is that we aren't getting true write (medium) errors.
>>>
>>> In this case we're testing device removals. The write errors happen
>>> because the disk goes away. narrow_write_error() returns 1, the bitmap
>>> bit is cleared, and then when the device is re-added the resync might
>>> not include the sectors in that chunk (there's some luck involved; if
>>> other writes to that chunk happen while the disk is removed, we're
>>> okay; the bug is easier to hit with smaller bitmap chunks because of
>>> this).
>>>
>>>
>> OK, that makes sense.
>>
>> The device removal will be noticed when the bad block log is written
>> out.  When a bad-block is recorded we make sure to write that out
>> promptly before bio_endio() gets called.  But not before close_write()
>> has called bitmap_end_write().
>>
>> So I guess we need to delay the close_write() call until the
>> bad-block-log has been written.
>>
>> I think this patch should do it.  Can you test?
>>
>> Thanks,
>> NeilBrown
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index c1ad0b075807..1a1c5160c930 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -2269,8 +2269,6 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
>>   			rdev_dec_pending(conf->mirrors[m].rdev,
>>   					 conf->mddev);
>>   		}
>> -	if (test_bit(R1BIO_WriteError, &r1_bio->state))
>> -		close_write(r1_bio);
>>   	if (fail) {
>>   		spin_lock_irq(&conf->device_lock);
>>   		list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
>> @@ -2396,6 +2394,9 @@ static void raid1d(struct md_thread *thread)
>>   			r1_bio = list_first_entry(&tmp, struct r1bio,
>>   						  retry_list);
>>   			list_del(&r1_bio->retry_list);
>> +			if (mddev->degraded)
>> +				set_bit(R1BIO_Degraded, &r1_bio->state);
>> +			close_write(r1_bio);
>>   			raid_end_bio_io(r1_bio);
>>   		}
>>   	}
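
For readers following the thread, the ordering problem Neil describes
above can be illustrated with a toy model. This is only a sketch under
stated assumptions, not kernel code: every toy_* name is hypothetical,
a single flag stands in for the bitmap bit of one chunk, and "writing
the bad block log" is reduced to the step at which the missing device
is noticed.

```c
#include <stdbool.h>

/* Toy model only: all toy_* names are hypothetical, not md APIs. */
struct toy_array {
	bool degraded;	/* set once the device removal has been noticed */
	bool dirty;	/* write-intent bit for one bitmap chunk */
};

/*
 * Stands in for ending the write on the bitmap: the bit is cleared
 * only if the array was not seen as degraded at this point.
 */
static void toy_close_write(struct toy_array *a)
{
	if (!a->degraded)
		a->dirty = false;
}

/*
 * Stands in for writing out the bad block log, which is the step
 * where the removed device is noticed in this model.
 */
static void toy_write_badblock_log(struct toy_array *a)
{
	a->degraded = true;
}

/*
 * Runs one failed write against a removed device and returns whether
 * the chunk is still marked dirty afterwards, i.e. whether a later
 * resync would cover it.
 */
static bool toy_failed_write(bool close_before_log)
{
	struct toy_array a = { .degraded = false, .dirty = true };

	if (close_before_log) {
		/* Old ordering: bit is cleared before removal is seen. */
		toy_close_write(&a);
		toy_write_badblock_log(&a);
	} else {
		/* Patched ordering: removal is seen first, bit stays set. */
		toy_write_badblock_log(&a);
		toy_close_write(&a);
	}
	return a.dirty;
}
```

In this model toy_failed_write(true) leaves the chunk clean, so a
resync after re-adding the device could skip it, while
toy_failed_write(false) keeps it dirty. That mirrors the intent of
moving close_write() into raid1d() after the degraded check, as far as
the quoted patch shows.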


Thread overview: 14+ messages
2015-10-20 16:09 [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Jes.Sorensen
2015-10-20 16:09 ` [PATCH 1/2] md/raid1: submit_bio_wait() returns 0 on success Jes.Sorensen
2015-10-20 16:09 ` [PATCH 2/2] md/raid10: " Jes.Sorensen
2015-10-20 20:29 ` [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Neil Brown
2015-10-20 23:12   ` Jes Sorensen
2015-10-22 15:59   ` Jes Sorensen
2015-10-22 16:01     ` [PATCH 1/2] md/raid1: Do not clear bitmap bit if submit_bio_wait() fails Jes.Sorensen
2015-10-22 16:01     ` [PATCH 2/2] md/raid10: " Jes.Sorensen
2015-10-22 21:36     ` [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Neil Brown
2015-10-22 22:37       ` Nate Dailey
2015-10-23  0:09         ` Neil Brown
2015-10-23 14:30           ` Nate Dailey
2015-10-23 18:02             ` Jes Sorensen [this message]
2015-10-24  5:31               ` Neil Brown
