From: Xiao Ni <xni@redhat.com>
To: NeilBrown <neilb@suse.com>, linux-raid <linux-raid@vger.kernel.org>
Cc: shli@kernel.org
Subject: Re: Stuck in md_write_start because MD_SB_CHANGE_PENDING can't be cleared
Date: Wed, 6 Sep 2017 21:37:57 -0400 (EDT)
Message-ID: <624049285.8379021.1504748277805.JavaMail.zimbra@redhat.com>
In-Reply-To: <3a5955de-e6a1-de83-b00b-1984f7125799@redhat.com>



----- Original Message -----
> From: "Xiao Ni" <xni@redhat.com>
> To: "NeilBrown" <neilb@suse.com>, "linux-raid" <linux-raid@vger.kernel.org>
> Cc: shli@kernel.org
> Sent: Tuesday, September 5, 2017 10:15:00 AM
> Subject: Re: Stuck in md_write_start because MD_SB_CHANGE_PENDING can't be cleared
> 
> 
> 
> On 09/05/2017 09:36 AM, NeilBrown wrote:
> > On Mon, Sep 04 2017, Xiao Ni wrote:
> >
> >>
> >> In function handle_stripe:
> >>         if (s.handle_bad_blocks ||
> >>             test_bit(MD_SB_CHANGE_PENDING, &conf->mddev->sb_flags)) {
> >>                 set_bit(STRIPE_HANDLE, &sh->state);
> >>                 goto finish;
> >>         }
> >>
> >> Because MD_SB_CHANGE_PENDING is set, the stripes can't be handled.
> >>
> > Right, of course.  I see what is happening now.
> >
> > - raid5d cannot complete stripes until the metadata is written
> > - the metadata cannot be written until raid5d gets the mddev_lock
> > - mddev_lock is held by the write to suspend_hi
> > - the write to suspend_hi is waiting for raid5_quiesce
> > - raid5_quiesce is waiting for some stripes to complete.
> >
> > We could declare that ->quiesce(, 1) cannot be called while holding the
> > lock.
> > We could possibly allow it but only if md_update_sb() is called first,
> > though that might still be racy.
> >
> > ->quiesce(, 1) is currently called from:
> >   mddev_suspend
> >   suspend_lo_store
> >   suspend_hi_store
> >   __md_stop_writes
> >   mddev_detach
> >   set_bitmap_file
> >   update_array_info (when setting/removing internal bitmap)
> >   md_do_sync
> >
> > and most of those are called with the lock held, or take the lock.
> >
> > Maybe we should *require* that mddev_lock is held when calling
> > ->quiesce() and have ->quiesce() do the metadata update.
> >
> > Something like the following maybe.  Can you test it?
> 
> Hi Neil
> 
> Thanks for the analysis. I need to think about it for a while :)
> I already added the patch and the test is running now. It usually needs
> more than 5 hours to reproduce this problem, so I'll let it run for more
> than 24 hours. I'll update the test result later.

Hi Neil

The problem still exists, but it doesn't show a call trace this time. It
got stuck yesterday; I didn't notice because there was no call trace.

I enabled the dynamic debug messages for raid5.c:

echo file raid5.c +p > /sys/kernel/debug/dynamic_debug/control

It shows that raid5d is still spinning.

Regards
Xiao

> 
> Regards
> Xiao
