Re: [PATCH] md: Fix nr_pending race during raid10 recovery

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Neil Brown <neilb@suse.de>
To: aniket@netezza.com
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] md: Fix nr_pending race during raid10 recovery
Date: Wed, 24 Nov 2010 16:29:53 +1100	[thread overview]
Message-ID: <20101124162953.4a405299@notabene.brown> (raw)
In-Reply-To: <1290020270.3055.12.camel@aniket-desktop>

On Wed, 17 Nov 2010 13:57:50 -0500
Aniket Kulkarni <aniket@netezza.com> wrote:

> If a RAID10 rdev that is undergoing recovery is marked 'faulty', the rdev 
> could get taken out of the array in spite of outstanding IOs leading to 
> a kernel panic. There are two issues here -
> 
> 1. The ref count (nr_pending) increment for sync or recovery leaves a lot of
> open windows for concurrent rdev removals
> 2. raid10 sync thread continues to submit recovery IOs to faulty devices. These get
> rejected at a later stage by management thread (raid10d).
> 
> Note - rd denotes the rdev from which we are reading, and wr the one we are
> writing to
> 
>   Sync Thread                                Management Thread
> 
> sync_request
>   ++rd.nr_pending
>   bi_end_io = end_sync_read
>   generic_make_request         -------> recovery_request_write
>          |                    |             wr.nr_pending++
>          |                    |             bi_end_io = end_sync_write
>          V                    |             generic_make_request
> end_sync_read    --------------                      |
>   --rd.nr_pending                                    |
>   reschedule_retry for write                         |
>                                                      v
>                                          end_sync_write
>                                              --wr.nr_pending
> 
> So a set-faulty and remove on recovery rdev between sync_request and 
> recovery_request_write is allowed and will lead to a panic.
> 
> The fix is -
> 
> 1. Increment wr.nr_pending immediately after selecting a good target. Ofcourse
> the decrements will be added to error paths in sync_request and end_sync_read.
> 2. Don't submit recovery IOs to faulty targets

Hi again,
 I've been thinking about this some more and cannot see that it is a real
 problem.
 Do you have an actual 'oops' showing a crash in this situation?

 The reason it shouldn't happen is that devices are only removed by
 remove_and_add_devices, and that is only called when no resync/recovery is
 happening.
 So when a device fail, the recovery will abort (waiting for all requests to
 complete), then failed devices are removed and possibly spares are added,
 then possible recovery starts up again.

 So it should work correctly as it is....

confused,
NeilBrown

next prev parent reply	other threads:[~2010-11-24  5:29 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-17 18:57 [PATCH] md: Fix nr_pending race during raid10 recovery Aniket Kulkarni
2010-11-22  7:32 ` Neil Brown
2010-11-24  5:29 ` Neil Brown [this message]
2010-11-25  1:36   ` Aniket Kulkarni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101124162953.4a405299@notabene.brown \
    --to=neilb@suse.de \
    --cc=aniket@netezza.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).