RFC for multipath queue_if_no_path timeout.

From: Frank Mayhar @ 2013-09-26 17:14 UTC
  To: dm-devel

Hey, folks.  We're using multipath as an in-kernel failover mechanism,
so that if an underlying device dies, multipath will switch to another
in its list.  Further, we use queue_if_no_path so that a daemon can get
involved and replace the list if the kernel runs out of alternatives.
In testing, however, we ran into a problem.

Obviously, if queue_if_no_path is on and multipath runs out of good
paths, the I/Os will sit there queued forever barring user intervention.
I was doing a lot of failure testing and encountered a daemon bug in
which it would abandon its recovery in the middle, leaving the list
intact and the I/Os queued, forever.  We fixed the daemon but the
problem is potentially still there if for some reason the daemon dies
and is not restarted.  This is a problem not solely (or even primarily)
for the queued I/O, but also because things like slab shrink can get
stuck behind that I/O, and then other stuff becomes stuck behind _that_
(since it tries to take locks held by shrink and may itself hold
semaphores).  This brings the whole system to its knees in fairly short
order, to the point that it's impossible to even get in via the network
and reboot it.  I have an existence proof that this is the case. :-)

My idea to deal with this in the kernel was to introduce a timeout on
queue_if_no_path and make it settable either kernel-wide or per-table.
The timeout is disabled by default.  Once set, its timer is armed only
when multipath runs out of valid paths and queue_if_no_path is on, and
is disarmed again on table load.  If it ever fires, the handler simply
turns off queue_if_no_path; this causes all the outstanding I/O to get
EIO and unsticks things all the way up the chain.  Losing those I/Os is
far better than losing the entire system.
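
To make the intended behavior concrete, here is a rough userspace model
of the logic described above.  It is a sketch, not the kernel patch: the
class and method names are invented for illustration, the clock is
passed in explicitly rather than driven by a real timer, and a table
load is assumed to simply replace the path list.

```python
import errno


class MultipathModel:
    """Toy model of a queue_if_no_path timeout (hypothetical names)."""

    def __init__(self, paths, queue_if_no_path=True, timeout=None):
        self.paths = set(paths)        # currently usable paths
        self.queue_if_no_path = queue_if_no_path
        self.timeout = timeout         # seconds; None = disabled (default)
        self.deadline = None           # None = timer not armed
        self.queued = []               # I/Os held while no path is usable
        self.completed = []            # (io, status); 0 means success

    def fail_path(self, path, now):
        """Mark a path dead; arm the timer if that was the last one."""
        self.paths.discard(path)
        if (not self.paths and self.queue_if_no_path
                and self.timeout is not None and self.deadline is None):
            self.deadline = now + self.timeout

    def submit(self, io, now):
        """Dispatch, queue, or fail an I/O depending on current state."""
        if self.paths:
            self.completed.append((io, 0))
        elif self.queue_if_no_path:
            self.queued.append(io)
        else:
            self.completed.append((io, -errno.EIO))

    def load_table(self, paths):
        """Daemon installs a new path list: disarm and drain the queue."""
        self.paths = set(paths)
        self.deadline = None
        if self.paths:
            self.completed.extend((io, 0) for io in self.queued)
            self.queued = []

    def tick(self, now):
        """Timer check: on expiry, stop queueing and fail queued I/O."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            self.queue_if_no_path = False
            self.completed.extend((io, -errno.EIO) for io in self.queued)
            self.queued = []


if __name__ == "__main__":
    mp = MultipathModel(["sda", "sdb"], timeout=30)
    mp.fail_path("sda", now=0)     # one path left, timer stays unarmed
    mp.fail_path("sdb", now=5)     # last path gone, timer armed
    mp.submit("io-1", now=6)       # no paths, so the I/O is queued
    mp.tick(now=20)                # before the deadline, nothing changes
    mp.tick(now=35)                # deadline reached, io-1 fails with EIO
    print(mp.completed, mp.queue_if_no_path)
```

If the daemon calls load_table() with working paths before the deadline,
the timer is disarmed and the queued I/O completes normally, which is
the case where nothing is lost.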

I've actually implemented this and it works.  I debated whether to bring
it up with you folks but figured it was worth a shot.  I can post the
patch if you're interested.
-- 
Frank Mayhar
310-460-4042
