From: Jes Sorensen <Jes.Sorensen@redhat.com>
To: Nate Dailey <nate.dailey@stratus.com>
Cc: Neil Brown <neilb@suse.de>,
	linux-raid@vger.kernel.org, William.Kuzeja@stratus.com,
	xni@redhat.com
Subject: Re: [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error()
Date: Fri, 23 Oct 2015 14:02:47 -0400
Message-ID: <wrfjoafpcvjs.fsf@redhat.com>
In-Reply-To: <562A4475.1000904@stratus.com> (Nate Dailey's message of "Fri, 23 Oct 2015 10:30:13 -0400")

Nate Dailey <nate.dailey@stratus.com> writes:
> Thank you!
>
> I confirmed that this patch prevents the bug.
>
> Nate

Awesome, thanks Nate!

Neil, once you commit the final version of this patch, please let me
know.

Cheers,
Jes

>
>
>
> On 10/22/2015 08:09 PM, Neil Brown wrote:
>> Nate Dailey <nate.dailey@stratus.com> writes:
>>
>>> The problem is that we aren't getting true write (medium) errors.
>>>
>>> In this case we're testing device removals. The write errors happen
>>> because the disk goes away. narrow_write_error() returns 1, the bitmap
>>> bit is cleared, and then when the device is re-added the resync might
>>> not include the sectors in that chunk (there's some luck involved; if
>>> other writes to that chunk happen while the disk is removed, we're
>>> okay--the bug is easier to hit with smaller bitmap chunks because of
>>> this).
>>>
>>>
>> OK, that makes sense.
>>
>> The device removal will be noticed when the bad block log is written
>> out.  When a bad-block is recorded we make sure to write that out
>> promptly before bio_endio() gets called.  But not before close_write()
>> has called bitmap_end_write().
>>
>> So I guess we need to delay the close_write() call until the
>> bad-block-log has been written.
>>
>> I think this patch should do it.  Can you test?
>>
>> Thanks,
>> NeilBrown
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index c1ad0b075807..1a1c5160c930 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -2269,8 +2269,6 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
>>   			rdev_dec_pending(conf->mirrors[m].rdev,
>>   					 conf->mddev);
>>   		}
>> -	if (test_bit(R1BIO_WriteError, &r1_bio->state))
>> -		close_write(r1_bio);
>>   	if (fail) {
>>   		spin_lock_irq(&conf->device_lock);
>>   		list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
>> @@ -2396,6 +2394,9 @@ static void raid1d(struct md_thread *thread)
>>   			r1_bio = list_first_entry(&tmp, struct r1bio,
>>   						  retry_list);
>>   			list_del(&r1_bio->retry_list);
>> +			if (mddev->degraded)
>> +				set_bit(R1BIO_Degraded, &r1_bio->state);
>> +			close_write(r1_bio);
>>   			raid_end_bio_io(r1_bio);
>>   		}
>>   	}
