From: Shaohua Li <shli@kernel.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>,
	NeilBrown <neilb@suse.com>,
	linux-raid <linux-raid@vger.kernel.org>
Subject: Re: raid5d hangs when stopping an array during reshape
Date: Thu, 25 Feb 2016 11:17:39 -0800
Message-ID: <20160225191739.GB2390@kernel.org>
In-Reply-To: <CAPcyv4jTrYL7ctYgZ+P_Cb=z_eN9oRR9Z0OiyUu6jS+aphm0ag@mail.gmail.com>

On Thu, Feb 25, 2016 at 10:48:45AM -0800, Dan Williams wrote:
> On Thu, Feb 25, 2016 at 10:42 AM, Shaohua Li <shli@kernel.org> wrote:
> > On Thu, Feb 25, 2016 at 05:05:17PM +0100, Artur Paszkiewicz wrote:
> >> On 02/25/2016 02:17 AM, Shaohua Li wrote:
> >> > On Thu, Feb 25, 2016 at 11:31:04AM +1100, Neil Brown wrote:
> >> >> On Thu, Feb 25 2016, Shaohua Li wrote:
> >> >>
> >> >>>
> >> >>> As for the bug, write requests run in raid5d, mddev_suspend() waits for all IO,
> >> >>> which waits for the write requests. So this is a clear deadlock. I think we
> >> >>> should delete the check_reshape() in md_check_recovery(). If we change
> >> >>> layout/disks/chunk_size, check_reshape() is already called. If we start an
> >> >>> array, the .run() already handles the new layout. There is no point in
> >> >>> md_check_recovery() calling check_reshape() again.
> >> >>
> >> >> Are you sure?
> >> >> Did you look at the commit which added that code?
> >> >> commit b4c4c7b8095298ff4ce20b40bf180ada070812d0
> >> >>
> >> >> When there is an IO error, reshape (or resync or recovery) will abort
> >> >> and then possibly be automatically restarted.
> >> >
> >> > Thanks for pointing this out.
> >> >> Without the check here a reshape might be attempted on an array which
> >> >> has failed.  Not sure if that would be harmful, but it would certainly
> >> >> be pointless.
> >> >>
> >> >> But you are right that this is causing the problem.
> >> >> Maybe we should keep track of the size of the 'scribble' arrays and only
> >> >> call resize_chunks if the size needs to change?  Similar to what
> >> >> resize_stripes does.
> >> >
> >> > Yep, this was my first solution, but I later thought check_reshape() was
> >> > useless here; apparently I missed the restart case. I'll go this way.
> >>
> >> My idea was to replace mddev_suspend()/mddev_resume() in resize_chunks()
> >> with a rw lock that would prevent collisions with raid_run_ops(), since
> >> scribble is used only there. But if the parity operations are executed
> >> asynchronously this would also need to wait until all the submitted
> >> operations have completed. Seems a bit overkill, but I came up with
> >> this:
> >
> > Looks like it should work, but it's overkill indeed, especially the extra
> > lock, though we could replace it with srcu. The 'track scribble array size'
> > approach is much simpler, so I'd prefer that way. In the future, we probably
> > should move resize_stripes()/resize_chunks() to .start_reshape();
> > they don't really qualify as .check_reshape() work.
> >
> 
> Any time any linux-raid mail mentions the raid5_run_ops infrastructure
> I am prompted to remind everyone that async_tx needs to die and be
> up-leveled into md directly.  The "help wanted" request is still pending.

A quick search shows async_tx has another user: exofs.
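
For reference, the 'track scribble array size' idea discussed above can be
sketched roughly as below. This is a user-space illustration only, not the
kernel code: struct scribble_state, scribble_bytes(), resize_scribble() and
the suspend_count field are hypothetical names, the size formula is made up,
and the counter merely stands in for a mddev_suspend()/mddev_resume() pair.
The point is only that the resize returns early, without quiescing IO, when
the existing buffer is already large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-array state; not the kernel's r5conf. */
struct scribble_state {
	void *buf;              /* scratch buffer */
	size_t disks;           /* disk count the buffer was sized for */
	size_t sectors;         /* chunk sectors the buffer was sized for */
	unsigned suspend_count; /* how often IO would have been quiesced */
};

/* Made-up size formula, for illustration only. */
static size_t scribble_bytes(size_t disks, size_t sectors)
{
	return disks * sectors * sizeof(void *);
}

/* Returns 0 on success, -1 on bad arguments or allocation failure. */
static int resize_scribble(struct scribble_state *s,
			   size_t new_disks, size_t new_sectors)
{
	void *nbuf;

	assert(s != NULL);
	if (new_disks == 0 || new_sectors == 0)
		return -1;

	/* Fast path: the buffer already covers the request, so do not suspend. */
	if (s->buf && s->disks >= new_disks && s->sectors >= new_sectors)
		return 0;

	/* Slow path: this is where a suspend/resume pair would be needed. */
	s->suspend_count++;
	nbuf = realloc(s->buf, scribble_bytes(new_disks, new_sectors));
	if (!nbuf)
		return -1;

	s->buf = nbuf;
	s->disks = new_disks;
	s->sectors = new_sectors;
	return 0;
}
```

With this shape, a repeated check with an unchanged geometry never reaches the
slow path, which is the case that matters when the check runs from the same
thread that has to complete the outstanding writes.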
