From: NeilBrown <neilb@suse.com>
To: Xiao Ni <xni@redhat.com>, linux-raid <linux-raid@vger.kernel.org>
Cc: shli@kernel.org
Subject: Re: Stuck in md_write_start because MD_SB_CHANGE_PENDING can't be cleared
Date: Tue, 05 Sep 2017 11:36:12 +1000
Message-ID: <87k21ec4fn.fsf@notabene.neil.brown.name>
In-Reply-To: <34fedde7-cef9-34ff-1403-9d097267eb55@redhat.com>

On Mon, Sep 04 2017, Xiao Ni wrote:

> In function handle_stripe:
>         if (s.handle_bad_blocks ||
>             test_bit(MD_SB_CHANGE_PENDING, &conf->mddev->sb_flags)) {
>                 set_bit(STRIPE_HANDLE, &sh->state);
>                 goto finish;
>         }
>
> Because MD_SB_CHANGE_PENDING is set, the stripes can't be handled.
>

Right, of course.  I see what is happening now.

- raid5d cannot complete stripes until the metadata is written
- the metadata cannot be written until raid5d gets the mddev_lock
- mddev_lock is held by the write to suspend_hi
- the write to suspend_hi is waiting for raid5_quiesce
- raid5_quiesce is waiting for some stripes to complete.
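
To make the cycle explicit, here is a small user-space sketch (not kernel
code) that encodes the five dependencies above as a wait-for graph and
follows the edges.  The enum names and cycle_length() are made up for
illustration; only the order of the dependencies comes from the list above.

```c
#include <assert.h>

/* Hypothetical labels for the five waiters in the chain above. */
enum waiter {
	RAID5D_STRIPES,		/* raid5d completing stripes */
	METADATA_WRITE,		/* the metadata (superblock) write */
	MDDEV_LOCK,		/* raid5d getting mddev_lock */
	SUSPEND_HI_WRITE,	/* the write to suspend_hi */
	RAID5_QUIESCE,		/* raid5_quiesce() */
	NR_WAITERS
};

/* waits_on[x] is the one thing that x is blocked on. */
static const enum waiter waits_on[NR_WAITERS] = {
	[RAID5D_STRIPES]   = METADATA_WRITE,
	[METADATA_WRITE]   = MDDEV_LOCK,
	[MDDEV_LOCK]       = SUSPEND_HI_WRITE,
	[SUSPEND_HI_WRITE] = RAID5_QUIESCE,
	[RAID5_QUIESCE]    = RAID5D_STRIPES,
};

/*
 * Follow the "waits on" edges from start.  Return the number of hops
 * needed to get back to start, or 0 if that does not happen within
 * NR_WAITERS hops (i.e. start is not on a cycle).
 */
int cycle_length(enum waiter start)
{
	enum waiter cur = start;
	int hops;

	assert(start < NR_WAITERS);
	for (hops = 1; hops <= NR_WAITERS; hops++) {
		cur = waits_on[cur];
		if (cur == start)
			return hops;
	}
	return 0;
}
```

Starting from any of the five waiters, cycle_length() gets back to the
start after five hops.  That is the deadlock: nothing in the chain can
make progress first.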

We could declare that ->quiesce(, 1) cannot be called while holding
mddev_lock.
We could possibly allow it, but only if md_update_sb() is called first,
though that might still be racy.

->quiesce(, 1) is currently called from:
 mddev_suspend
 suspend_lo_store
 suspend_hi_store
 __md_stop_writes
 mddev_detach
 set_bitmap_file
 update_array_info (when setting/removing internal bitmap)
 md_do_sync

and most of those are called with the lock held, or take the lock themselves.

Maybe we should *require* that mddev_lock is held when calling
->quiesce() and have ->quiesce() do the metadata update.
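
A toy user-space model of that idea, assuming the waiter itself flushes any
pending metadata each time it is about to sleep.  All toy_* names are
hypothetical, the loop is single-threaded, and max_passes only stands in
for "would wait forever"; it says nothing about races in the real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the array state; field names only echo the kernel's. */
struct toy_array {
	int  active_stripes;	/* stripes still in flight */
	bool sb_change_pending;	/* models MD_SB_CHANGE_PENDING */
};

/* Models md_update_sb(): writing the metadata clears the pending flag. */
void toy_update_sb(struct toy_array *a)
{
	a->sb_change_pending = false;
}

/* Models one raid5d pass: no stripe completes while the flag is set. */
void toy_raid5d_pass(struct toy_array *a)
{
	if (a->sb_change_pending)
		return;
	if (a->active_stripes > 0)
		a->active_stripes--;
}

/*
 * Models the quiesce wait, bounded to max_passes raid5d passes.
 * Returns the number of passes needed to drain all stripes, or -1 if
 * the bound is hit first (the toy's notion of "stuck").
 */
int toy_quiesce(struct toy_array *a, bool flush_sb_while_waiting,
		int max_passes)
{
	int passes = 0;

	assert(max_passes >= 0);
	while (a->active_stripes != 0) {
		if (passes == max_passes)
			return -1;
		/* The waiter holds the lock, so it does the update itself. */
		if (flush_sb_while_waiting && a->sb_change_pending)
			toy_update_sb(a);
		toy_raid5d_pass(a);
		passes++;
	}
	return passes;
}
```

With three stripes in flight and the pending flag set,
toy_quiesce(&a, false, 100) gives up with -1, while
toy_quiesce(&a, true, 100) drains in three passes.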

Something like the following maybe.  Can you test it?
Thanks,
NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index b01e458d31e9..999ccf08c5db 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5805,9 +5805,11 @@ void md_stop(struct mddev *mddev)
 	/* stop the array and free an attached data structures.
 	 * This is called from dm-raid
 	 */
+	mddev_lock_nointr(mddev);
 	__md_stop(mddev);
 	if (mddev->bio_set)
 		bioset_free(mddev->bio_set);
+	mddev_unlock(mddev);
 }
 
 EXPORT_SYMBOL_GPL(md_stop);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0fc2748aaf95..cde5a82eb404 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4316,6 +4316,8 @@ static void handle_stripe_expansion(struct r5conf *conf, struct stripe_head *sh)
 
 			/* place all the copies on one channel */
 			init_async_submit(&submit, 0, tx, NULL, NULL, NULL);
+			WARN_ON(sh2->dev[dd_idx].page != sh2->dev[dd_idx].orig_page);
+			WARN_ON(sh->dev[i].page != sh->dev[i].orig_page);
 			tx = async_memcpy(sh2->dev[dd_idx].page,
 					  sh->dev[i].page, 0, 0, STRIPE_SIZE,
 					  &submit);
@@ -8031,7 +8033,10 @@ static void raid5_quiesce(struct mddev *mddev, int state)
 		wait_event_cmd(conf->wait_for_quiescent,
 				    atomic_read(&conf->active_stripes) == 0 &&
 				    atomic_read(&conf->active_aligned_reads) == 0,
-				    unlock_all_device_hash_locks_irq(conf),
+				    ({unlock_all_device_hash_locks_irq(conf);
+					if (mddev->sb_flags)
+						md_update_sb(mddev, 0);
+				    }),
 				    lock_all_device_hash_locks_irq(conf));
 		conf->quiesce = 1;
 		unlock_all_device_hash_locks_irq(conf);
