From: Mike Snitzer <snitzer@redhat.com>
To: Frank Mayhar <fmayhar@google.com>
Cc: dm-devel <dm-devel@redhat.com>
Subject: Re: RFC for multipath queue_if_no_path timeout.
Date: Thu, 17 Oct 2013 15:15:12 -0400
Message-ID: <20131017191511.GA30452@redhat.com>
In-Reply-To: <1382036590.1980.32.camel@bobble.lax.corp.google.com>
On Thu, Oct 17 2013 at 3:03pm -0400,
Frank Mayhar <fmayhar@google.com> wrote:
> Dragging this back up into the light...
>
> On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote:
> > Frank, I had a look at your patch. It leaves a lot to be desired, I was
> > starting to clean it up but ultimately found myself agreeing with
> > Alasdair's original point: that this policy should be implemented in the
> > userspace daemon.
>
> I've found and fixed a couple of bugs but I would still like to know
> what issues you had with the patch. As I said before, I would be more
> than happy to clean it up.

I don't recall; I'll let you know if/when I have time to look again.

> In the time since we had this discussion, by the way, we ran into a
> problem that a userspace daemon can't solve: That of shutdown. We ran
> into a number of failures in which systems were hung for hours. It
> turned out that they were caused by a regular system shutdown. Our
> backing store is network-based and networking was getting killed before
> applications (as is usually the case), leaving I/O outstanding on the
> device. Since queue_if_no_path was set, the I/O wasn't dumped and our
> daemon was killed by shutdown very shortly thereafter so it couldn't
> recover (otherwise it would have cleaned things up).
>
> With those I/Os sitting queued in multipath, with no network and no
> daemon to turn off queue_if_no_path, the systems just sat. When we
> finally diagnosed this, we realized that the timeout would work
> perfectly to solve the problem, automatically turning queue_if_no_path
> off shortly after the network went away without depending on the
> intervention of the no-longer-running daemon.
>
> So how do you guys deal with this failure scenario?

Shouldn't you wait for the application to shut down before ripping the
network out? It seems odd to just throw away queued I/O.

A proper shutdown sequence should avoid this problem in general: the
multipath daemon would only be shut down once all mpath devices are
deactivated.

Then, if you still want to gracefully handle the case where there is no
network (and hence no paths) on shutdown, multipathd would still be
around to transition to a table that doesn't have queue_if_no_path.