* raid1 IO hang - md_submit_flush_data ignores md_write_start return value
From: Nate Dailey @ 2017-09-20 12:49 UTC
To: linux-raid
I'm seeing an occasional XFS IO hang with raid1 (INFO: task xfsaild/md20:17963
blocked for more than 120 seconds).
It turns out that this is because md_submit_flush_data calls pers->make_request,
and doesn't check the return value (unlike md_make_request, which checks the
return value and retries). So if raid1_make_request/md_write_start return false,
md_submit_flush_data drops the write on the floor.
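To illustrate, the retry loop in md_make_request looks roughly like this (paraphrased from my reading of the upstream md.c, not verbatim), while md_submit_flush_data makes a single unchecked call:

	/* md_make_request: retry until the personality accepts the bio */
check_suspended:
	...
	if (!mddev->pers->make_request(mddev, bio)) {
		/* refused, e.g. md_write_start() returned false because the
		 * array is being suspended; back off and try again */
		atomic_dec(&mddev->active_io);
		goto check_suspended;
	}

	/* md_submit_flush_data: return value dropped, so a refused write
	 * is simply lost */
	mddev->pers->make_request(mddev, bio);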
I'm hitting this on a RHEL kernel, but looking at the upstream code it appears
that the same thing could happen.
Not sure how best to deal with this... thank you for any advice!
Nate
* Re: raid1 IO hang - md_submit_flush_data ignores md_write_start return value
From: Shaohua Li @ 2017-09-20 19:35 UTC
To: Nate Dailey; +Cc: linux-raid
On Wed, Sep 20, 2017 at 08:49:30AM -0400, Nate Dailey wrote:
> I'm seeing an occasional XFS IO hang with raid1 (INFO: task
> xfsaild/md20:17963 blocked for more than 120 seconds).
>
> It turns out that this is because md_submit_flush_data calls
> pers->make_request, and doesn't check the return value (unlike
> md_make_request, which checks the return value and retries). So if
> raid1_make_request/md_write_start return false, md_submit_flush_data drops
> the write on the floor.
>
> I'm hitting this on a RHEL kernel, but looking at the upstream code it
> appears that the same thing could happen.
>
> Not sure how best to deal with this... thank you for any advice!
Thanks for reporting! Looks like a regression introduced recently:
cc27b0c78c79 md: fix deadlock between mddev_suspend() and md_write_start()
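For reference, that commit changed md_write_start and the personalities' ->make_request hooks to report whether the request was accepted (roughly, paraphrasing rather than quoting the headers):

	/* returns false if the array got suspended while waiting;
	 * the caller must back off and resubmit the bio */
	bool md_write_start(struct mddev *mddev, struct bio *bi);
	bool (*make_request)(struct mddev *mddev, struct bio *bio);

Every caller is expected to retry on false, which md_submit_flush_data doesn't do.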
I think we can just reuse md_make_request. To avoid a deadlock, we clear
mddev->flush_bio before handling the bio. Something like this should fix it:
diff --git a/drivers/md/md.c b/drivers/md/md.c
index b01e458d31e9..3978607e390e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -439,16 +439,16 @@ static void md_submit_flush_data(struct work_struct *ws)
 	struct mddev *mddev = container_of(ws, struct mddev, flush_work);
 	struct bio *bio = mddev->flush_bio;
 
+	mddev->flush_bio = NULL;
+	wake_up(&mddev->sb_wait);
+
 	if (bio->bi_iter.bi_size == 0)
 		/* an empty barrier - all done */
 		bio_endio(bio);
 	else {
 		bio->bi_opf &= ~REQ_PREFLUSH;
-		mddev->pers->make_request(mddev, bio);
+		md_make_request(mddev->queue, bio);
 	}
-
-	mddev->flush_bio = NULL;
-	wake_up(&mddev->sb_wait);
 }
 
 void md_flush_request(struct mddev *mddev, struct bio *bio)
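For context, md_flush_request is the waiter on the other side; roughly (paraphrased, not verbatim) it blocks on sb_wait until flush_bio is free:

	void md_flush_request(struct mddev *mddev, struct bio *bio)
	{
		spin_lock_irq(&mddev->lock);
		/* only one flush may own mddev->flush_bio at a time */
		wait_event_lock_irq(mddev->sb_wait,
				    !mddev->flush_bio,
				    mddev->lock);
		mddev->flush_bio = bio;
		spin_unlock_irq(&mddev->lock);

		INIT_WORK(&mddev->flush_work, submit_flushes);
		queue_work(md_wq, &mddev->flush_work);
	}

Clearing flush_bio and waking sb_wait before handing the bio back to md_make_request means the flush machinery is already released even if that call has to sleep and retry.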
* Re: raid1 IO hang - md_submit_flush_data ignores md_write_start return value
From: Nate Dailey @ 2017-09-21 15:42 UTC
To: Shaohua Li; +Cc: linux-raid
I applied your change to the RHEL kernel where I was hitting the bug. It seems
to be working; I'm no longer hitting the hang (or seeing any other bad behavior).
Thank you!
Nate
On 09/20/2017 03:35 PM, Shaohua Li wrote:
> On Wed, Sep 20, 2017 at 08:49:30AM -0400, Nate Dailey wrote:
> > I'm seeing an occasional XFS IO hang with raid1 (INFO: task
> > xfsaild/md20:17963 blocked for more than 120 seconds).
> >
> > It turns out that this is because md_submit_flush_data calls
> > pers->make_request, and doesn't check the return value (unlike
> > md_make_request, which checks the return value and retries). So if
> > raid1_make_request/md_write_start return false, md_submit_flush_data drops
> > the write on the floor.
> >
> > I'm hitting this on a RHEL kernel, but looking at the upstream code it
> > appears that the same thing could happen.
> >
> > Not sure how best to deal with this... thank you for any advice!
>
> Thanks for reporting! Looks like a regression introduced recently:
> cc27b0c78c79 md: fix deadlock between mddev_suspend() and md_write_start()
>
> I think we can just reuse md_make_request. To avoid a deadlock, we clear
> mddev->flush_bio before handling the bio. Something like this should fix it:
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index b01e458d31e9..3978607e390e 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -439,16 +439,16 @@ static void md_submit_flush_data(struct work_struct *ws)
>  	struct mddev *mddev = container_of(ws, struct mddev, flush_work);
>  	struct bio *bio = mddev->flush_bio;
>  
> +	mddev->flush_bio = NULL;
> +	wake_up(&mddev->sb_wait);
> +
>  	if (bio->bi_iter.bi_size == 0)
>  		/* an empty barrier - all done */
>  		bio_endio(bio);
>  	else {
>  		bio->bi_opf &= ~REQ_PREFLUSH;
> -		mddev->pers->make_request(mddev, bio);
> +		md_make_request(mddev->queue, bio);
>  	}
> -
> -	mddev->flush_bio = NULL;
> -	wake_up(&mddev->sb_wait);
>  }
>  
>  void md_flush_request(struct mddev *mddev, struct bio *bio)