* RFC for multipath queue_if_no_path timeout. @ 2013-09-26 17:14 Frank Mayhar 2013-09-26 17:24 ` Alasdair G Kergon 2013-09-26 17:41 ` Mike Snitzer 0 siblings, 2 replies; 49+ messages in thread From: Frank Mayhar @ 2013-09-26 17:14 UTC (permalink / raw) To: dm-devel Hey, folks. We're using multipath as an in-kernel failover mechanism, so that if an underlying device dies, multipath will switch to another in its list. Further, we use queue_if_no_path so that a daemon can get involved and replace the list if the kernel runs out of alternatives. In testing, however, we ran into a problem. Obviously, if queue_if_no_path is on and multipath runs out of good paths, the I/Os will sit there queued forever barring user intervention. I was doing a lot of failure testing and encountered a daemon bug in which it would abandon its recovery in the middle, leaving the list intact and the I/Os queued, forever. We fixed the daemon but the problem is potentially still there if for some reason the daemon dies and is not restarted. This is a problem not solely (or even primarily) for the queued I/O, but also because things like slab shrink can get stuck behind that I/O and then other stuff becomes stuck behind _that_ (since tries to get locks held by shrink and may itself hold semaphores), bringing the whole system to its knees in fairly short order, to the point that it's impossible to even get in via the network and reboot it. I have an existence proof that this is the case. :-) My idea to deal with this in the kernel was to introduce a timeout on queue_if_no_path and make it settable either kernel-wide or per-table. By default it's disabled and is only armed when multipath runs out of valid paths and queue_if_no_path is on. It's disabled again on table load. If the timeout ever fires, all that happens is that the handler turns off queue_if_no_path; this causes all the outstanding I/O to get EIO and unsticks things all the way up the chain. Losing those I/Os is far better than losing the entire system. I've actually implemented this and it works. I've debated about talking with you folks about it but figured it was worth a shot. I can post the patch if you're interested. -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 17:14 RFC for multipath queue_if_no_path timeout Frank Mayhar @ 2013-09-26 17:24 ` Alasdair G Kergon 2013-09-26 17:31 ` Frank Mayhar 2013-09-26 17:41 ` Mike Snitzer 1 sibling, 1 reply; 49+ messages in thread From: Alasdair G Kergon @ 2013-09-26 17:24 UTC (permalink / raw) To: Frank Mayhar; +Cc: dm-devel Timeouts were discussed when we added queue_if_no_path. and we made the decision to implement this in userspace not the kernel. Make sure that your daemon doesn't die - or that something monitors it and relaunches it if it does. Alasdair ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 17:24 ` Alasdair G Kergon @ 2013-09-26 17:31 ` Frank Mayhar 2013-09-26 17:38 ` Alasdair G Kergon 0 siblings, 1 reply; 49+ messages in thread From: Frank Mayhar @ 2013-09-26 17:31 UTC (permalink / raw) To: Alasdair G Kergon; +Cc: dm-devel On Thu, 2013-09-26 at 18:24 +0100, Alasdair G Kergon wrote: > Timeouts were discussed when we added queue_if_no_path. and we made the > decision to implement this in userspace not the kernel. > > Make sure that your daemon doesn't die - or that something monitors it > and relaunches it if it does. Uh, huh. And what about when (not if) _that_ fails? (For one thing, what if the stuckness caused by the queued I/O prevents the binary from being successfully pulled in from storage?) Granted, this is a belt-and-suspenders kind of thing that most won't need, but when you need it, you _need_ it. -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 17:31 ` Frank Mayhar @ 2013-09-26 17:38 ` Alasdair G Kergon 2013-09-26 17:47 ` Frank Mayhar 0 siblings, 1 reply; 49+ messages in thread From: Alasdair G Kergon @ 2013-09-26 17:38 UTC (permalink / raw) To: Frank Mayhar; +Cc: dm-devel On Thu, Sep 26, 2013 at 10:31:56AM -0700, Frank Mayhar wrote: > Uh, huh. And what about when (not if) _that_ fails? (For one thing, > what if the stuckness caused by the queued I/O prevents the binary from > being successfully pulled in from storage?) Lock the daemon in memory (or launch from ramdisk), don't allocate any new memory while it's doing critical monitoring, tell the OOM killer not to kill it, set high/real-time priority etc. lvm2 and multipath-tools use some of these techniques and seem to cope OK. Alasdair ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 17:38 ` Alasdair G Kergon @ 2013-09-26 17:47 ` Frank Mayhar 2013-09-26 17:52 ` Mike Snitzer 2013-09-26 23:22 ` RFC for multipath queue_if_no_path timeout Alasdair G Kergon 0 siblings, 2 replies; 49+ messages in thread From: Frank Mayhar @ 2013-09-26 17:47 UTC (permalink / raw) To: Alasdair G Kergon; +Cc: dm-devel On Thu, 2013-09-26 at 18:38 +0100, Alasdair G Kergon wrote: > On Thu, Sep 26, 2013 at 10:31:56AM -0700, Frank Mayhar wrote: > > Uh, huh. And what about when (not if) _that_ fails? (For one thing, > > what if the stuckness caused by the queued I/O prevents the binary from > > being successfully pulled in from storage?) > > Lock the daemon in memory (or launch from ramdisk), don't allocate any new > memory while it's doing critical monitoring, tell the OOM killer not to kill > it, set high/real-time priority etc. > > lvm2 and multipath-tools use some of these techniques and seem to cope OK. Launching it from ramdisk won't help, particularly, since it still goes through the block layer. The other stuff won't help if a (potentially unrelated) bug in the daemon happens to be being tickled at the same time, or if some dependency happens to be broken and _that's_ what's preventing the daemon from making progress. And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind of environments most people have, but that's not the kind of environment (or scale) we have to deal with. -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 17:47 ` Frank Mayhar @ 2013-09-26 17:52 ` Mike Snitzer 2013-09-26 20:36 ` [PATCH 1/1] dm mpath: Add timeout mechanism for queue_if_no_path Frank Mayhar 2013-09-26 23:22 ` RFC for multipath queue_if_no_path timeout Alasdair G Kergon 1 sibling, 1 reply; 49+ messages in thread From: Mike Snitzer @ 2013-09-26 17:52 UTC (permalink / raw) To: Frank Mayhar; +Cc: dm-devel, Alasdair G Kergon On Thu, Sep 26 2013 at 1:47pm -0400, Frank Mayhar <fmayhar@google.com> wrote: > On Thu, 2013-09-26 at 18:38 +0100, Alasdair G Kergon wrote: > > On Thu, Sep 26, 2013 at 10:31:56AM -0700, Frank Mayhar wrote: > > > Uh, huh. And what about when (not if) _that_ fails? (For one thing, > > > what if the stuckness caused by the queued I/O prevents the binary from > > > being successfully pulled in from storage?) > > > > Lock the daemon in memory (or launch from ramdisk), don't allocate any new > > memory while it's doing critical monitoring, tell the OOM killer not to kill > > it, set high/real-time priority etc. > > > > lvm2 and multipath-tools use some of these techniques and seem to cope OK. > > Launching it from ramdisk won't help, particularly, since it still goes > through the block layer. The other stuff won't help if a (potentially > unrelated) bug in the daemon happens to be being tickled at the same > time, or if some dependency happens to be broken and _that's_ what's > preventing the daemon from making progress. > > And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind > of environments most people have, but that's not the kind of environment > (or scale) we have to deal with. Fine, please see the post I made earlier in this thread and let me know what you think. ^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 1/1] dm mpath: Add timeout mechanism for queue_if_no_path. 2013-09-26 17:52 ` Mike Snitzer @ 2013-09-26 20:36 ` Frank Mayhar 0 siblings, 0 replies; 49+ messages in thread From: Frank Mayhar @ 2013-09-26 20:36 UTC (permalink / raw) To: Mike Snitzer; +Cc: dm-devel, Alasdair G Kergon Add a configurable timeout mechanism to the queue_if_no_path function of multipath. If set and if queue_if_no_path is set, the timeout is started when there are no active paths on a multipath device. The timeout is reset if an active path is introduced or, of course, if a new table (and therefore a new multipath definition) is loaded. If the timeout ever fires, the handler simply turns queue_if_no_path off. This allows I/O queued in multipath to be errored, possibly releasing locks and semaphores that may be being held waiting for that I/O to complete. The implementation dynamically allocates the timeout as needed, so as to add as little overhead as possible when it is not being used. This mechanism is not turned on by default (the default timeout is zero). It can be turned on by either adding the queue_if_no_path_timeout parameter to the table definition or sending the parameter via the DM message mechanism (sets a timeout for only that multipath instance; note that this doesn't survive a table reload unless the parameter is included in the new table). Signed-off-by: Frank Mayhar <fmayhar@google.com> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index de570a5..a746306 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -27,6 +27,8 @@ #define DM_PG_INIT_DELAY_MSECS 2000 #define DM_PG_INIT_DELAY_DEFAULT ((unsigned) -1) +#define DEFAULT_NOPATH_TIMEOUT_SECS 0 + /* Path properties */ struct pgpath { struct list_head list; @@ -105,6 +107,17 @@ struct multipath { mempool_t *mpio_pool; struct mutex work_mutex; + + /* + * When a context loses its last path, queue_if_no_path is set and + * nopath_timeout > 0, we start this timer. If it fires, we clear + * queue_if_no_path. If we get a new path before it fires, we stop + * the timer. + * + * The timeout is given in seconds. + */ + unsigned nopath_timeout; + struct timer_list *nopath_timer; }; /* @@ -117,12 +130,19 @@ struct dm_mpath_io { typedef int (*action_fn) (struct pgpath *pgpath); +static unsigned long dm_mpath_nopath_timeout = DEFAULT_NOPATH_TIMEOUT_SECS; +module_param_named(queue_if_no_path_timeout_secs, + dm_mpath_nopath_timeout, ulong, S_IRUGO | S_IWUSR); + static struct kmem_cache *_mpio_cache; static struct workqueue_struct *kmultipathd, *kmpath_handlerd; static void process_queued_ios(struct work_struct *work); static void trigger_event(struct work_struct *work); static void activate_path(struct work_struct *work); +static void handle_nopath_timeout(unsigned long data); +static void activate_nopath_timeout_l(struct multipath *m); +static int activate_nopath_timeout(struct multipath *m, unsigned to); /*----------------------------------------------- @@ -193,6 +213,8 @@ static struct multipath *alloc_multipath(struct dm_target *ti) m = kzalloc(sizeof(*m), GFP_KERNEL); if (m) { + int r; + INIT_LIST_HEAD(&m->priority_groups); INIT_LIST_HEAD(&m->queued_ios); spin_lock_init(&m->lock); @@ -207,6 +229,14 @@ static struct multipath *alloc_multipath(struct dm_target *ti) kfree(m); return NULL; } + m->nopath_timer = NULL; + r = activate_nopath_timeout(m, DEFAULT_NOPATH_TIMEOUT_SECS); + if (r == -ENOMEM) { + DMWARN("Couldn't allocate timer list!"); + mempool_destroy(m->mpio_pool); + kfree(m); + return NULL; + } m->ti = ti; ti->private = m; } @@ -790,6 +820,8 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m) {0, 6, "invalid number of feature args"}, {1, 50, "pg_init_retries must be between 1 and 50"}, {0, 60000, "pg_init_delay_msecs must be between 0 and 60000"}, + {0, 31557600, "queue_if_no_path_timeout_secs must be 0 or" + " a positive number of seconds"}, }; r = dm_read_arg_group(_args, as, &argc, &ti->error); @@ -827,6 +859,16 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m) continue; } + if (!strcasecmp(arg_name, "queue_if_no_path_timeout_secs") && + (argc >= 1)) { + unsigned to; + + r = dm_read_arg(_args+3, as, &to, &ti->error); + activate_nopath_timeout(m, to); + argc--; + continue; + } + ti->error = "Unrecognised multipath feature request"; r = -EINVAL; } while (argc && !r); @@ -952,6 +994,8 @@ static void multipath_dtr(struct dm_target *ti) { struct multipath *m = ti->private; + if (m->nopath_timer) + del_timer_sync(m->nopath_timer); flush_multipath_work(m); free_multipath(m); } @@ -978,6 +1022,17 @@ static int multipath_map(struct dm_target *ti, struct request *clone, } /* + * If the queue_if_no_path timeout fires, turn off queue_if_no_path and + * process any queued I/O. + */ +static void handle_nopath_timeout(unsigned long data) +{ + struct multipath *m = (struct multipath *)data; + + (void)queue_if_no_path(m, 0, 0); +} + +/* * Take a path out of use. */ static int fail_path(struct pgpath *pgpath) @@ -1006,6 +1061,8 @@ static int fail_path(struct pgpath *pgpath) schedule_work(&m->trigger_event); + activate_nopath_timeout_l(m); + out: spin_unlock_irqrestore(&m->lock, flags); @@ -1055,6 +1112,9 @@ static int reinstate_path(struct pgpath *pgpath) out: spin_unlock_irqrestore(&m->lock, flags); + if (m->nopath_timer) + del_timer_sync(m->nopath_timer); + return r; } @@ -1079,6 +1139,79 @@ static int action_dev(struct multipath *m, struct dm_dev *dev, } /* + * Activate the queue_if_no_path timeout if necessary. Called with m->lock + * held. + */ +static void activate_nopath_timeout_l(struct multipath *m) +{ + if (m->nr_valid_paths == 0 && m->nopath_timer && + m->nopath_timeout > 0 && m->queue_if_no_path) + mod_timer(m->nopath_timer, jiffies + m->nopath_timeout * HZ); +} + +/* + * Set the queue_if_no_path timeout and activate it. + */ +static int activate_nopath_timeout(struct multipath *m, unsigned to) +{ + unsigned long flags; + struct timer_list *nopath_timer = NULL; + + spin_lock_irqsave(&m->lock, flags); + /* + * If the user is turning the timeout on, allocate the timer list. + */ + if (to && !m->nopath_timer) { + spin_unlock_irqrestore(&m->lock, flags); + nopath_timer = kzalloc(sizeof(struct timer_list), GFP_KERNEL); + if (!nopath_timer) + return -ENOMEM; + init_timer(nopath_timer); + nopath_timer->data = (unsigned long)m; + nopath_timer->function = handle_nopath_timeout; + spin_lock_irqsave(&m->lock, flags); + /* If we raced with an allocate, drop the new one. */ + if (m->nopath_timer) + kfree(nopath_timer); + else + m->nopath_timer = nopath_timer; + nopath_timer = NULL; + } + m->nopath_timeout = to; + if (to) + activate_nopath_timeout_l(m); + else { + /* + * If the user is turning the timeout off, null out the timer + * list pointer so we can free it below. + */ + nopath_timer = m->nopath_timer; + m->nopath_timer = NULL; + } + spin_unlock_irqrestore(&m->lock, flags); + /* Now turn off the timer and free the timer list. */ + if (nopath_timer) { + del_timer_sync(nopath_timer); + kfree(nopath_timer); + } + return 0; +} + +/* + * Get the queue_if_no_path timeout, set it and activate it. + */ +static int set_nopath_timeout(struct multipath *m, const char *timestr) +{ + unsigned long to; + + if (!timestr || (sscanf(timestr, "%lu", &to) != 1)) { + DMWARN("invalid timeout supplied to set_nopath_timeout"); + return -EINVAL; + } + return activate_nopath_timeout(m, to); +} + +/* * Temporarily try to avoid having to use the specified PG */ static void bypass_pg(struct multipath *m, struct priority_group *pg, @@ -1423,7 +1556,8 @@ static void multipath_status(struct dm_target *ti, status_type_t type, DMEMIT("%u ", m->queue_if_no_path + (m->pg_init_retries > 0) * 2 + (m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT) * 2 + - m->retain_attached_hw_handler); + m->retain_attached_hw_handler) + + (m->nopath_timeout > 0) * 2; if (m->queue_if_no_path) DMEMIT("queue_if_no_path "); if (m->pg_init_retries) @@ -1432,6 +1566,9 @@ static void multipath_status(struct dm_target *ti, status_type_t type, DMEMIT("pg_init_delay_msecs %u ", m->pg_init_delay_msecs); if (m->retain_attached_hw_handler) DMEMIT("retain_attached_hw_handler "); + if (m->nopath_timeout) + DMEMIT("queue_if_no_path_timeout_secs %u ", + m->nopath_timeout); } if (!m->hw_handler_name || type == STATUSTYPE_INFO) @@ -1550,6 +1687,9 @@ static int multipath_message(struct dm_target *ti, unsigned argc, char **argv) } else if (!strcasecmp(argv[0], "switch_group")) { r = switch_pg_num(m, argv[1]); goto out; + } else if (!strcasecmp(argv[0], "queue_if_no_path_timeout_secs")) { + r = set_nopath_timeout(m, argv[1]); + goto out; } else if (!strcasecmp(argv[0], "reinstate_path")) action = reinstate_path; else if (!strcasecmp(argv[0], "fail_path")) -- Frank Mayhar 310-460-4042 ^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 17:47 ` Frank Mayhar 2013-09-26 17:52 ` Mike Snitzer @ 2013-09-26 23:22 ` Alasdair G Kergon 2013-09-26 23:49 ` Mike Snitzer 2013-09-27 16:27 ` Frank Mayhar 1 sibling, 2 replies; 49+ messages in thread From: Alasdair G Kergon @ 2013-09-26 23:22 UTC (permalink / raw) To: Frank Mayhar; +Cc: dm-devel On Thu, Sep 26, 2013 at 10:47:13AM -0700, Frank Mayhar wrote: > Launching it from ramdisk won't help, particularly, since it still goes > through the block layer. The other stuff won't help if a (potentially > unrelated) bug in the daemon happens to be being tickled at the same > time, or if some dependency happens to be broken and _that's_ what's > preventing the daemon from making progress. Then put more effort into debugging your daemon so it doesn't have bugs that make it die? Implement the timeout in a robust independent daemon if it's other code there that's unreliable? > And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind > of environments most people have, but that's not the kind of environment > (or scale) we have to deal with. In what way are your requirements so different that a locked-into-memory monitoring daemon cannot implement this timeout? Alasdair ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 23:22 ` RFC for multipath queue_if_no_path timeout Alasdair G Kergon @ 2013-09-26 23:49 ` Mike Snitzer 2013-09-27 6:07 ` Hannes Reinecke ` (2 more replies) 2013-09-27 16:27 ` Frank Mayhar 1 sibling, 3 replies; 49+ messages in thread From: Mike Snitzer @ 2013-09-26 23:49 UTC (permalink / raw) To: Frank Mayhar; +Cc: dm-devel On Thu, Sep 26 2013 at 7:22pm -0400, Alasdair G Kergon <agk@redhat.com> wrote: > On Thu, Sep 26, 2013 at 10:47:13AM -0700, Frank Mayhar wrote: > > Launching it from ramdisk won't help, particularly, since it still goes > > through the block layer. The other stuff won't help if a (potentially > > unrelated) bug in the daemon happens to be being tickled at the same > > time, or if some dependency happens to be broken and _that's_ what's > > preventing the daemon from making progress. > > Then put more effort into debugging your daemon so it doesn't have > bugs that make it die? Implement the timeout in a robust independent > daemon if it's other code there that's unreliable? > > > And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind > > of environments most people have, but that's not the kind of environment > > (or scale) we have to deal with. > > In what way are your requirements so different that a locked-into-memory > monitoring daemon cannot implement this timeout? Frank, I had a look at your patch. It leaves a lot to be desired, I was starting to clean it up but ultimately found myself agreeing with Alasdair's original point: that this policy should be implemented in the userspace daemon. Mike ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 23:49 ` Mike Snitzer @ 2013-09-27 6:07 ` Hannes Reinecke 2013-09-27 8:06 ` Hannes Reinecke 2013-09-27 16:29 ` Frank Mayhar 2013-10-17 19:03 ` Frank Mayhar 2 siblings, 1 reply; 49+ messages in thread From: Hannes Reinecke @ 2013-09-27 6:07 UTC (permalink / raw) To: dm-devel On 09/27/2013 01:49 AM, Mike Snitzer wrote: > On Thu, Sep 26 2013 at 7:22pm -0400, > Alasdair G Kergon <agk@redhat.com> wrote: > >> On Thu, Sep 26, 2013 at 10:47:13AM -0700, Frank Mayhar wrote: >>> Launching it from ramdisk won't help, particularly, since it still goes >>> through the block layer. The other stuff won't help if a (potentially >>> unrelated) bug in the daemon happens to be being tickled at the same >>> time, or if some dependency happens to be broken and _that's_ what's >>> preventing the daemon from making progress. >> >> Then put more effort into debugging your daemon so it doesn't have >> bugs that make it die? Implement the timeout in a robust independent >> daemon if it's other code there that's unreliable? >> >>> And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind >>> of environments most people have, but that's not the kind of environment >>> (or scale) we have to deal with. >> >> In what way are your requirements so different that a locked-into-memory >> monitoring daemon cannot implement this timeout? > > Frank, I had a look at your patch. It leaves a lot to be desired, I was > starting to clean it up but ultimately found myself agreeing with > Alasdair's original point: that this policy should be implemented in the > userspace daemon. > _Actually_ there is a way how this could be implemented properly: implement a blk_timeout function. Thing is, every request_queue might have a timeout function implemented, whose goal is to abort requests which are beyond that timeout. EG SCSI uses that for the dev_loss_tmo mechanism. Multipath what with it being request-based could easily implement the same mechanism, namely have to blk_timeout function which would just re-arm the timeout in the default case, but abort any queued I/O (after a timeout) if all paths are down. Hmm. I see to draft up a PoC. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-27 6:07 ` Hannes Reinecke @ 2013-09-27 8:06 ` Hannes Reinecke 2013-09-27 8:37 ` Alasdair G Kergon 2013-09-27 16:32 ` Frank Mayhar 0 siblings, 2 replies; 49+ messages in thread From: Hannes Reinecke @ 2013-09-27 8:06 UTC (permalink / raw) To: dm-devel; +Cc: Frank Mayhar, Mike Snitzer [-- Attachment #1: Type: text/plain, Size: 2403 bytes --] On 09/27/2013 08:07 AM, Hannes Reinecke wrote: > On 09/27/2013 01:49 AM, Mike Snitzer wrote: >> On Thu, Sep 26 2013 at 7:22pm -0400, >> Alasdair G Kergon <agk@redhat.com> wrote: >> >>> On Thu, Sep 26, 2013 at 10:47:13AM -0700, Frank Mayhar wrote: >>>> Launching it from ramdisk won't help, particularly, since it still goes >>>> through the block layer. The other stuff won't help if a (potentially >>>> unrelated) bug in the daemon happens to be being tickled at the same >>>> time, or if some dependency happens to be broken and _that's_ what's >>>> preventing the daemon from making progress. >>> >>> Then put more effort into debugging your daemon so it doesn't have >>> bugs that make it die? Implement the timeout in a robust independent >>> daemon if it's other code there that's unreliable? >>> >>>> And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind >>>> of environments most people have, but that's not the kind of environment >>>> (or scale) we have to deal with. >>> >>> In what way are your requirements so different that a locked-into-memory >>> monitoring daemon cannot implement this timeout? >> >> Frank, I had a look at your patch. It leaves a lot to be desired, I was >> starting to clean it up but ultimately found myself agreeing with >> Alasdair's original point: that this policy should be implemented in the >> userspace daemon. >> > _Actually_ there is a way how this could be implemented properly: > implement a blk_timeout function. > > Thing is, every request_queue might have a timeout function > implemented, whose goal is to abort requests which are beyond that > timeout. EG SCSI uses that for the dev_loss_tmo mechanism. > > Multipath what with it being request-based could easily implement > the same mechanism, namely have to blk_timeout function which would > just re-arm the timeout in the default case, but abort any queued > I/O (after a timeout) if all paths are down. > > Hmm. I see to draft up a PoC. > And indeed, here it is. Completely untested, just to give you an idea what I was going on about. Let's see if I can put this to test somewhere... Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) [-- Attachment #2: dm-mpath-no-path-timeout.patch --] [-- Type: text/x-patch, Size: 4589 bytes --] diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 5adede1..6801ac3 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -444,6 +444,61 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path, return 0; } +/* + * Block timeout callback, called from the block layer + * + * request_queue lock is held on entry. + * + * Return values: + * BLK_EH_RESET_TIMER if the request should be left running + * BLK_EH_NOT_HANDLED if the request is handled or terminated + * by the driver. + */ +enum blk_eh_timer_return abort_if_no_path(struct request *req) +{ + union map_info *info; + struct dm_mpath_io *mpio; + struct multipath *m; + unsigned long flags; + int rc = BLK_EH_RESET_TIMER; + int flush_ios = 0; + + info = dm_get_rq_mapinfo(req); + if (!info || !info->ptr) + return BLK_EH_NOT_HANDLED; + + mpio = info->ptr; + m = mpio->pgpath->pg->m; + /* + * Only abort request if: + * - queued_ios is not empty + * (protect against races with process_queued_ios) + * - queue_io is not set + * - no valid paths are found + */ + spin_lock_irqsave(&m->lock, flags); + if (!list_empty(&m->queued_ios) && + !m->queue_io && + !m->nr_valid_paths) { + list_del_init(&req->queuelist); + m->queue_size--; + m->queue_if_no_path = 0; + if (m->queue_size) + flush_ios = 1; + rc = BLK_EH_NOT_HANDLED; + } + spin_unlock_irqrestore(&m->lock, flags); + + if (rc == BLK_EH_NOT_HANDLED) { + mempool_free(mpio, m->mpio_pool); + dm_kill_unmapped_request(clone, -ETIMEOUT); + } + if (flush_ios) + queue_work(kmultipathd, &m->process_queue_ios); + + return rc; +} + /*----------------------------------------------------------------- * The multipath daemon is responsible for resubmitting queued ios. *---------------------------------------------------------------*/ @@ -790,6 +845,7 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m) {0, 6, "invalid number of feature args"}, {1, 50, "pg_init_retries must be between 1 and 50"}, {0, 60000, "pg_init_delay_msecs must be between 0 and 60000"}, + {0, 65535, "no_path_timeout must be between 0 and 65535"}, }; r = dm_read_arg_group(_args, as, &argc, &ti->error); @@ -827,6 +883,13 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m) continue; } + if (!strcasecmp(arg_name, "no_path_timeout") && + (argc >= 1)) { + r = dm_read_arg(_args + 3, as, &m->no_path_timeout, &ti->error); + argc--; + continue; + } + ti->error = "Unrecognised multipath feature request"; r = -EINVAL; } while (argc && !r); @@ -905,6 +968,12 @@ static int multipath_ctr(struct dm_target *ti, unsigned int argc, goto bad; } + if (m->no_path_timeout) + dm_set_queue_timeout(ti, abort_if_no_path, + m->no_path_timeout * HZ); + else + dm_set_queue_timeout(ti, NULL, 0) + ti->num_flush_bios = 1; ti->num_discard_bios = 1; ti->num_write_same_bios = 1; diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 9e39d2b..26bfad6 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1881,6 +1881,16 @@ static void dm_init_md_queue(struct mapped_device *md) blk_queue_merge_bvec(md->queue, dm_merge_bvec); } +static void dm_set_md_timeout(struct mapped_device *md, + rq_timed_out_fn *fn, unsigned int timeout) +{ + if (dm_get_md_type(md) != DM_TYPE_REQUEST_BASED) + return; + + blk_queue_rq_timed_out(md->queue, fn); + blk_queue_rq_timeout(md->queue, timeout); +} + /* * Allocate and initialise a blank device with a given minor. */ @@ -2790,6 +2800,13 @@ int dm_noflush_suspending(struct dm_target *ti) } EXPORT_SYMBOL_GPL(dm_noflush_suspending); +int dm_set_queue_timeout(struct dm_target *ti, rq_timed_out_fn *fn, + unsigned int timeout) +{ + return dm_set_md_timeout(dm_table_get_md(ti->table), fn, timeout); +} +EXPORT_SYMBOL_GPL(dm_set_queue_timeout); + struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity, unsigned per_bio_data_size) { struct dm_md_mempools *pools = kzalloc(sizeof(*pools), GFP_KERNEL); diff --git a/drivers/md/dm.h b/drivers/md/dm.h index 45b97da..c8df1ef 100644 --- a/drivers/md/dm.h +++ b/drivers/md/dm.h @@ -80,6 +80,8 @@ void dm_unlock_md_type(struct mapped_device *md); void dm_set_md_type(struct mapped_device *md, unsigned type); unsigned dm_get_md_type(struct mapped_device *md); struct target_type *dm_get_immutable_target_type(struct mapped_device *md); +void dm_set_queue_timeout(struct dm_table *t, rq_timed_out_fn *fn, + unsigned int timeout); int dm_setup_md_queue(struct mapped_device *md); [-- Attachment #3: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-27 8:06 ` Hannes Reinecke @ 2013-09-27 8:37 ` Alasdair G Kergon 2013-09-27 13:52 ` Hannes Reinecke 2013-09-27 16:32 ` Frank Mayhar 1 sibling, 1 reply; 49+ messages in thread From: Alasdair G Kergon @ 2013-09-27 8:37 UTC (permalink / raw) To: Hannes Reinecke; +Cc: dm-devel, Frank Mayhar, Mike Snitzer But this still dodges the fundamental problem: What is the right value to use for the timeout? - How long should you wait for a path to (re)appear? - In the current model, reinstating a path is a userspace responsibility. The timeout, as proposed, is being used in two conflicting ways: - How long to wait for path recovery when all paths went down - How long to wait when the system locks without enough free memory even to reinstate a path (because of broken userspace code) before having multipath fail queued I/O in a desperate attempt at releasing memory to assist recovery The second case should point to a very short timeout. The first case probably wants a longer one. In my view the correct approach for the case Frank is discussing is to use a different trigger to detect the (approaching?) locking up of the system. E.g. should something related to the handling of an out of memory condition have a hook to instruct multipath to release such queued I/O? Alasdair ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-27 8:37 ` Alasdair G Kergon @ 2013-09-27 13:52 ` Hannes Reinecke 2013-09-27 16:37 ` Frank Mayhar 0 siblings, 1 reply; 49+ messages in thread From: Hannes Reinecke @ 2013-09-27 13:52 UTC (permalink / raw) To: dm-devel, Frank Mayhar, Mike Snitzer On 09/27/2013 10:37 AM, Alasdair G Kergon wrote: > But this still dodges the fundamental problem: > > What is the right value to use for the timeout? > - How long should you wait for a path to (re)appear? > - In the current model, reinstating a path is a userspace > responsibility. > And with my proposed patch it would still be userspace which is setting the timeout. Currently, no_path_retry is not a proper measure anyway, as it's depending on the time multipathd takes to complete a path check round. Which depends on the number of device, the state of those etc. > The timeout, as proposed, is being used in two conflicting ways: > - How long to wait for path recovery when all paths went down That would be set via the new 'no_path_timeout' feature, which would be set instead of the (multipath-internal) no_path_retry setting. > - How long to wait when the system locks without enough free > memory even to reinstate a path (because of broken userspace > code) before having multipath fail queued I/O in a desperate > attempt at releasing memory to assist recovery > Do we even handle that case currently? Methinks this is precisely the use-case this is supposed to address. When currently 'no_path_retry' is set _and_ we're running under a low-mem condition there is a quite large likelyhood that the multipath daemon will be killed by the OOM-killer or not able to send any dm messages down to the kernel, as the latter most likely require some memory allocations. So in the current 'no_path_retry' scenario the maps would have been created with 'queue_if_no_path', and the daemon would have to reset the 'queue_if_no_path' flag if the no_path_retry value expires. Which it might not be able to do so due to the above scenario. So with the proposed 'no_path_timeout' we would enable the dm-mpath module to terminate all outstanding I/O, irrespective on all userland conditions. Which seems like an improvement to me ... > The second case should point to a very short timeout. > The first case probably wants a longer one. > > In my view the correct approach for the case Frank is discussing is to > use a different trigger to detect the (approaching?) locking up of the > system. E.g. should something related to the handling of an out > of memory condition have a hook to instruct multipath to release such > queued I/O? > Yeah, that was what I had planned for quite some time. But thinking it over the no_path_timeout seems like a better approach here. (Plus we're hooking into the generic 'blk_timeout' mechanism, which then would allow to blk_abort_request() to work) Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-27 13:52 ` Hannes Reinecke @ 2013-09-27 16:37 ` Frank Mayhar 0 siblings, 0 replies; 49+ messages in thread From: Frank Mayhar @ 2013-09-27 16:37 UTC (permalink / raw) To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer On Fri, 2013-09-27 at 15:52 +0200, Hannes Reinecke wrote: > On 09/27/2013 10:37 AM, Alasdair G Kergon wrote: > > But this still dodges the fundamental problem: > > > > What is the right value to use for the timeout? > > - How long should you wait for a path to (re)appear? > > - In the current model, reinstating a path is a userspace > > responsibility. > > > And with my proposed patch it would still be userspace which is > setting the timeout. > Currently, no_path_retry is not a proper measure anyway, as it's > depending on the time multipathd takes to complete a path check > round. Which depends on the number of device, the state of those etc. > > > The timeout, as proposed, is being used in two conflicting ways: > > - How long to wait for path recovery when all paths went down > > That would be set via the new 'no_path_timeout' feature, which would > be set instead of the (multipath-internal) no_path_retry > setting. Yes, this matches our setup as well. > > - How long to wait when the system locks without enough free > > memory even to reinstate a path (because of broken userspace > > code) before having multipath fail queued I/O in a desperate > > attempt at releasing memory to assist recovery > Do we even handle that case currently? My understanding is that the current code doesn't, no, but if it does I would love to know how. > Methinks this is precisely the use-case this is supposed to address. Yes, exactly. > When currently 'no_path_retry' is set _and_ we're running under a > low-mem condition there is a quite large likelyhood that the > multipath daemon will be killed by the OOM-killer or not able to > send any dm messages down to the kernel, as the latter most likely > require some memory allocations. > > So in the current 'no_path_retry' scenario the maps would have been > created with 'queue_if_no_path', and the daemon would have to reset > the 'queue_if_no_path' flag if the no_path_retry value expires. > Which it might not be able to do so due to the above scenario. > > So with the proposed 'no_path_timeout' we would enable the dm-mpath > module to terminate all outstanding I/O, irrespective on all > userland conditions. Which seems like an improvement to me ... And to me, which is why I went in this direction in the first place. I could see no dependable way to deal with outside of the kernel; if I had, I would have taken it, since userspace changes are _much_ easier for us to deal with than kernel changes. -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-27 8:06 ` Hannes Reinecke 2013-09-27 8:37 ` Alasdair G Kergon @ 2013-09-27 16:32 ` Frank Mayhar 1 sibling, 0 replies; 49+ messages in thread From: Frank Mayhar @ 2013-09-27 16:32 UTC (permalink / raw) To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer On Fri, 2013-09-27 at 10:06 +0200, Hannes Reinecke wrote: > On 09/27/2013 08:07 AM, Hannes Reinecke wrote: > > On 09/27/2013 01:49 AM, Mike Snitzer wrote: > >> On Thu, Sep 26 2013 at 7:22pm -0400, > >> Alasdair G Kergon <agk@redhat.com> wrote: > >> > >>> On Thu, Sep 26, 2013 at 10:47:13AM -0700, Frank Mayhar wrote: > >>>> Launching it from ramdisk won't help, particularly, since it still goes > >>>> through the block layer. The other stuff won't help if a (potentially > >>>> unrelated) bug in the daemon happens to be being tickled at the same > >>>> time, or if some dependency happens to be broken and _that's_ what's > >>>> preventing the daemon from making progress. > >>> > >>> Then put more effort into debugging your daemon so it doesn't have > >>> bugs that make it die? Implement the timeout in a robust independent > >>> daemon if it's other code there that's unreliable? > >>> > >>>> And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind > >>>> of environments most people have, but that's not the kind of environment > >>>> (or scale) we have to deal with. > >>> > >>> In what way are your requirements so different that a locked-into-memory > >>> monitoring daemon cannot implement this timeout? > >> > >> Frank, I had a look at your patch. It leaves a lot to be desired, I was > >> starting to clean it up but ultimately found myself agreeing with > >> Alasdair's original point: that this policy should be implemented in the > >> userspace daemon. > >> > > _Actually_ there is a way how this could be implemented properly: > > implement a blk_timeout function. > > > > Thing is, every request_queue might have a timeout function > > implemented, whose goal is to abort requests which are beyond that > > timeout. EG SCSI uses that for the dev_loss_tmo mechanism. > > > > Multipath what with it being request-based could easily implement > > the same mechanism, namely have to blk_timeout function which would > > just re-arm the timeout in the default case, but abort any queued > > I/O (after a timeout) if all paths are down. > > > > Hmm. I see to draft up a PoC. > > > And indeed, here it is. > > Completely untested, just to give you an idea what I was going on > about. Let's see if I can put this to test somewhere... Thanks, Hannes! I'll grab this and test it today. I clearly don't know enough about the block layer, since using blk_timeout never even crossed my mind. -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 23:49 ` Mike Snitzer 2013-09-27 6:07 ` Hannes Reinecke @ 2013-09-27 16:29 ` Frank Mayhar 2013-10-17 19:03 ` Frank Mayhar 2 siblings, 0 replies; 49+ messages in thread From: Frank Mayhar @ 2013-09-27 16:29 UTC (permalink / raw) To: Mike Snitzer; +Cc: dm-devel On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote:Frank, I had a look at your patch. It leaves a lot to be desired, I was > starting to clean it up but ultimately found myself agreeing with > Alasdair's original point: that this policy should be implemented in the > userspace daemon. If you could quickly summarize your issues with it I would be more than happy to clean it up myself. On the other hand, it appears that Hannes may have something more palatable that solves the same problem? -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 23:49 ` Mike Snitzer 2013-09-27 6:07 ` Hannes Reinecke 2013-09-27 16:29 ` Frank Mayhar @ 2013-10-17 19:03 ` Frank Mayhar 2013-10-17 19:15 ` Mike Snitzer 2013-10-21 16:05 ` RFC for multipath " Benjamin Marzinski 2 siblings, 2 replies; 49+ messages in thread From: Frank Mayhar @ 2013-10-17 19:03 UTC (permalink / raw) To: Mike Snitzer; +Cc: dm-devel Dragging this back up into the light... On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote: > Frank, I had a look at your patch. It leaves a lot to be desired, I was > starting to clean it up but ultimately found myself agreeing with > Alasdair's original point: that this policy should be implemented in the > userspace daemon. I've found and fixed a couple of bugs but I would still like to know what issues you had with the patch. As I said before, I would be more than happy to clean it up. In the time since we had this discussion, by the way, we ran into a problem that a userspace daemon can't solve: That of shutdown. We ran into a number of failures in which systems were hung for hours. It turned out that they were caused by a regular system shutdown. Our backing store is network-based and networking was getting killed before applications (as is usually the case), leaving I/O outstanding on the device. Since queue_if_no_path was set, the I/O wasn't dumped and our daemon was killed by shutdown very shortly thereafter so it couldn't recover (otherwise it would have cleaned things up). With those I/Os sitting queued in multipath, with no network and no daemon to turn off queue_if_no_path, the systems just sat. When we finally diagnosed this, we realized that the timeout would work perfectly to solve the problem, automatically turning queue_if_no_path off shortly after the network went away without depending on the intervention of the no-longer-running daemon. So how do you guys deal with this failure scenario? -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-10-17 19:03 ` Frank Mayhar @ 2013-10-17 19:15 ` Mike Snitzer 2013-10-17 20:45 ` Frank Mayhar 2013-10-21 16:05 ` RFC for multipath " Benjamin Marzinski 1 sibling, 1 reply; 49+ messages in thread From: Mike Snitzer @ 2013-10-17 19:15 UTC (permalink / raw) To: Frank Mayhar; +Cc: dm-devel On Thu, Oct 17 2013 at 3:03pm -0400, Frank Mayhar <fmayhar@google.com> wrote: > Dragging this back up into the light... > > On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote: > > Frank, I had a look at your patch. It leaves a lot to be desired, I was > > starting to clean it up but ultimately found myself agreeing with > > Alasdair's original point: that this policy should be implemented in the > > userspace daemon. > > I've found and fixed a couple of bugs but I would still like to know > what issues you had with the patch. As I said before, I would be more > than happy to clean it up. I don't recall, will let you know if/when I do have time to look again. > In the time since we had this discussion, by the way, we ran into a > problem that a userspace daemon can't solve: That of shutdown. We ran > into a number of failures in which systems were hung for hours. It > turned out that they were caused by a regular system shutdown. Our > backing store is network-based and networking was getting killed before > applications (as is usually the case), leaving I/O outstanding on the > device. Since queue_if_no_path was set, the I/O wasn't dumped and our > daemon was killed by shutdown very shortly thereafter so it couldn't > recover (otherwise it would have cleaned things up). > > With those I/Os sitting queued in multipath, with no network and no > daemon to turn off queue_if_no_path, the systems just sat. When we > finally diagnosed this, we realized that the timeout would work > perfectly to solve the problem, automatically turning queue_if_no_path > off shortly after the network went away without depending on the > intervention of the no-longer-running daemon. > > So how do you guys deal with this failure scenario? Shouldn't you wait for the application to shutdown before ripping the network out? Seems odd to just throw away queued IO. A proper shutdown sequence really should avoid this problem in general, the multipath daemon would only be shutdown once all mpath devices are deactivated. Then, if you still want to gracefully handle the case where there is no network (and hence no paths) on shutdown the multipathd would still be around to transition to a table that doesn't have queue_if_no_path. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-10-17 19:15 ` Mike Snitzer @ 2013-10-17 20:45 ` Frank Mayhar 2013-10-17 21:13 ` Mike Snitzer 0 siblings, 1 reply; 49+ messages in thread From: Frank Mayhar @ 2013-10-17 20:45 UTC (permalink / raw) To: Mike Snitzer; +Cc: dm-devel On Thu, 2013-10-17 at 15:15 -0400, Mike Snitzer wrote: > Shouldn't you wait for the application to shutdown before ripping the > network out? Seems odd to just throw away queued IO. In general, yes, and we do that, but there's still the occasional shutdown that doesn't go through those extra steps. > A proper shutdown sequence really should avoid this problem in general, > the multipath daemon would only be shutdown once all mpath devices are > deactivated. > > Then, if you still want to gracefully handle the case where there is no > network (and hence no paths) on shutdown the multipathd would still be > around to transition to a table that doesn't have queue_if_no_path. Of course, this still depends on the daemon surviving everything else circumstances can throw at it. In our particular case, changing the shutdown sequence is far more difficult than using the kernel mechanism I've already provided. This would also simplify things for other folks who use queue_if_no_path. -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-10-17 20:45 ` Frank Mayhar @ 2013-10-17 21:13 ` Mike Snitzer 2013-10-18 20:51 ` Frank Mayhar 0 siblings, 1 reply; 49+ messages in thread From: Mike Snitzer @ 2013-10-17 21:13 UTC (permalink / raw) To: Frank Mayhar; +Cc: dm-devel On Thu, Oct 17 2013 at 4:45pm -0400, Frank Mayhar <fmayhar@google.com> wrote: > On Thu, 2013-10-17 at 15:15 -0400, Mike Snitzer wrote: > > Shouldn't you wait for the application to shutdown before ripping the > > network out? Seems odd to just throw away queued IO. > > In general, yes, and we do that, but there's still the occasional > shutdown that doesn't go through those extra steps. > > > A proper shutdown sequence really should avoid this problem in general, > > the multipath daemon would only be shutdown once all mpath devices are > > deactivated. > > > > Then, if you still want to gracefully handle the case where there is no > > network (and hence no paths) on shutdown the multipathd would still be > > around to transition to a table that doesn't have queue_if_no_path. > > Of course, this still depends on the daemon surviving everything else > circumstances can throw at it. > > In our particular case, changing the shutdown sequence is far more > difficult than using the kernel mechanism I've already provided. This > would also simplify things for other folks who use queue_if_no_path. Cannot say that argument wins me over but I will say that if you intend to take the approach to have the kernel have a timeout; please pursue the approach Hannes offered: https://patchwork.kernel.org/patch/2953231/ It is much cleaner and if it works for your needs we can see about getting a tested version upstream. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-10-17 21:13 ` Mike Snitzer @ 2013-10-18 20:51 ` Frank Mayhar 2013-10-18 21:47 ` Alasdair G Kergon 2013-10-18 22:53 ` [RFC PATCH v2] dm mpath: add a " Mike Snitzer 0 siblings, 2 replies; 49+ messages in thread From: Frank Mayhar @ 2013-10-18 20:51 UTC (permalink / raw) To: Mike Snitzer; +Cc: dm-devel On Thu, 2013-10-17 at 17:13 -0400, Mike Snitzer wrote: > Cannot say that argument wins me over but I will say that if you intend > to take the approach to have the kernel have a timeout; please pursue > the approach Hannes offered: > > https://patchwork.kernel.org/patch/2953231/ > > It is much cleaner and if it works for your needs we can see about > getting a tested version upstream. Unfortunately his patch doesn't work as-is; it turns out that it tries to set the timeout only if the target is request-based but at the time he tries to set it the table type hasn't yet been set. I'm looking into fixing it. -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-10-18 20:51 ` Frank Mayhar @ 2013-10-18 21:47 ` Alasdair G Kergon 2013-10-18 22:53 ` [RFC PATCH v2] dm mpath: add a " Mike Snitzer 1 sibling, 0 replies; 49+ messages in thread From: Alasdair G Kergon @ 2013-10-18 21:47 UTC (permalink / raw) To: Frank Mayhar; +Cc: Jun'ichi Nomura, dm-devel, Mike Snitzer On Fri, Oct 18, 2013 at 01:51:55PM -0700, Frank Mayhar wrote: > I'm looking into fixing it. Do consider whether some of the code fits more naturally into the core part of dm rather than a particular target. Alasdair ^ permalink raw reply [flat|nested] 49+ messages in thread
* [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-10-18 20:51 ` Frank Mayhar 2013-10-18 21:47 ` Alasdair G Kergon @ 2013-10-18 22:53 ` Mike Snitzer 2013-10-30 1:02 ` Mike Snitzer 1 sibling, 1 reply; 49+ messages in thread From: Mike Snitzer @ 2013-10-18 22:53 UTC (permalink / raw) To: Frank Mayhar; +Cc: dm-devel, Alasdair G. Kergon On Fri, Oct 18 2013 at 4:51pm -0400, Frank Mayhar <fmayhar@google.com> wrote: > On Thu, 2013-10-17 at 17:13 -0400, Mike Snitzer wrote: > > Cannot say that argument wins me over but I will say that if you intend > > to take the approach to have the kernel have a timeout; please pursue > > the approach Hannes offered: > > > > https://patchwork.kernel.org/patch/2953231/ > > > > It is much cleaner and if it works for your needs we can see about > > getting a tested version upstream. > > Unfortunately his patch doesn't work as-is; it turns out that it tries > to set the timeout only if the target is request-based but at the time > he tries to set it the table type hasn't yet been set. > > I'm looking into fixing it. Ouch, yeah, can't access the DM device's queue from .ctr() There were other issues with Hannes RFC patch, wouldn't compile. Anyway, looks like we need a new target_type hook (e.g. .init_queue) that is called from dm_init_request_based_queue(). Request-based DM only allows a single DM target per device so we don't need the usual multi DM-target iterators. But, unfortunately, at the time we call dm_init_request_based_queue() the mapped_device isn't yet connected to the inactive table that is being loaded (but the table is connected to the mapped_device). In dm-ioctl.c:table_load(), the inactive table could be passed directly into dm_setup_md_queue(). Please give the following revised patch a try, if it works we can clean it up further (think multipath_status needs updating, we also may want to constrain .init_queue to only being called if the target is a singleton, which dm-mpath should be, but isn't flagged as such yet). It compiles, but I haven't tested it... --- drivers/md/dm-ioctl.c | 2 +- drivers/md/dm-mpath.c | 77 +++++++++++++++++++++++++++++++++++++++++ drivers/md/dm.c | 12 +++++-- drivers/md/dm.h | 2 +- include/linux/device-mapper.h | 4 ++ 5 files changed, 92 insertions(+), 5 deletions(-) diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c index afe0814..74d1ab4 100644 --- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -1289,7 +1289,7 @@ static int table_load(struct dm_ioctl *param, size_t param_size) } /* setup md->queue to reflect md's type (may block) */ - r = dm_setup_md_queue(md); + r = dm_setup_md_queue(md, t); if (r) { DMWARN("unable to set up device queue for new table."); goto err_unlock_md_type; diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index de570a5..2c3e427 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -105,6 +105,8 @@ struct multipath { mempool_t *mpio_pool; struct mutex work_mutex; + + unsigned no_path_timeout; }; /* @@ -444,6 +446,61 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path, return 0; } +/* + * Block timeout callback, called from the block layer + * + * request_queue lock is held on entry. + * + * Return values: + * BLK_EH_RESET_TIMER if the request should be left running + * BLK_EH_NOT_HANDLED if the request is handled or terminated + * by the driver. + */ +enum blk_eh_timer_return abort_if_no_path(struct request *rq) +{ + union map_info *info; + struct dm_mpath_io *mpio; + struct multipath *m; + unsigned long flags; + int rc = BLK_EH_RESET_TIMER; + int flush_ios = 0; + + info = dm_get_rq_mapinfo(rq); + if (!info || !info->ptr) + return BLK_EH_NOT_HANDLED; + + mpio = info->ptr; + m = mpio->pgpath->pg->m; + /* + * Only abort request if: + * - queued_ios is not empty + * (protect against races with process_queued_ios) + * - queue_io is not set + * - no valid paths are found + */ + spin_lock_irqsave(&m->lock, flags); + if (!list_empty(&m->queued_ios) && + !m->queue_io && + !m->nr_valid_paths) { + list_del_init(&rq->queuelist); + m->queue_size--; + m->queue_if_no_path = 0; + if (m->queue_size) + flush_ios = 1; + rc = BLK_EH_NOT_HANDLED; + } + spin_unlock_irqrestore(&m->lock, flags); + + if (rc == BLK_EH_NOT_HANDLED) { + mempool_free(mpio, m->mpio_pool); + dm_kill_unmapped_request(rq, -ETIMEDOUT); + } + if (flush_ios) + queue_work(kmultipathd, &m->process_queued_ios); + + return rc; +} + /*----------------------------------------------------------------- * The multipath daemon is responsible for resubmitting queued ios. *---------------------------------------------------------------*/ @@ -790,6 +847,7 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m) {0, 6, "invalid number of feature args"}, {1, 50, "pg_init_retries must be between 1 and 50"}, {0, 60000, "pg_init_delay_msecs must be between 0 and 60000"}, + {0, 65535, "no_path_timeout must be between 0 and 65535"}, }; r = dm_read_arg_group(_args, as, &argc, &ti->error); @@ -827,6 +885,13 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m) continue; } + if (!strcasecmp(arg_name, "no_path_timeout") && + (argc >= 1)) { + r = dm_read_arg(_args + 3, as, &m->no_path_timeout, &ti->error); + argc--; + continue; + } + ti->error = "Unrecognised multipath feature request"; r = -EINVAL; } while (argc && !r); @@ -1709,6 +1774,17 @@ out: return busy; } +static void multipath_init_queue(struct dm_target *ti, + struct request_queue *q) +{ + struct multipath *m = ti->private; + + if (m->no_path_timeout) { + blk_queue_rq_timed_out(q, abort_if_no_path); + blk_queue_rq_timeout(q, m->no_path_timeout * HZ); + } +} + /*----------------------------------------------------------------- * Module setup *---------------------------------------------------------------*/ @@ -1728,6 +1804,7 @@ static struct target_type multipath_target = { .ioctl = multipath_ioctl, .iterate_devices = multipath_iterate_devices, .busy = multipath_busy, + .init_queue = multipath_init_queue, }; static int __init dm_multipath_init(void) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index b3e26c7..ce87b8a 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -2336,8 +2336,10 @@ EXPORT_SYMBOL_GPL(dm_get_queue_limits); /* * Fully initialize a request-based queue (->elevator, ->request_fn, etc). */ -static int dm_init_request_based_queue(struct mapped_device *md) +static int dm_init_request_based_queue(struct mapped_device *md, + struct dm_table *table) { + struct dm_target *ti = NULL; struct request_queue *q = NULL; if (md->queue->elevator) @@ -2356,16 +2358,20 @@ static int dm_init_request_based_queue(struct mapped_device *md) elv_register_queue(md->queue); + ti = dm_table_get_target(table, 0); + if (ti->type->init_queue) + ti->type->init_queue(ti, md->queue); + return 1; } /* * Setup the DM device's queue based on md's type */ -int dm_setup_md_queue(struct mapped_device *md) +int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t) { if ((dm_get_md_type(md) == DM_TYPE_REQUEST_BASED) && - !dm_init_request_based_queue(md)) { + !dm_init_request_based_queue(md, t)) { DMWARN("Cannot initialize queue for request-based mapped device"); return -EINVAL; } diff --git a/drivers/md/dm.h b/drivers/md/dm.h index 1d1ad7b..55cb207 100644 --- a/drivers/md/dm.h +++ b/drivers/md/dm.h @@ -83,7 +83,7 @@ void dm_set_md_type(struct mapped_device *md, unsigned type); unsigned dm_get_md_type(struct mapped_device *md); struct target_type *dm_get_immutable_target_type(struct mapped_device *md); -int dm_setup_md_queue(struct mapped_device *md); +int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t); /* * To check the return value from dm_table_find_target(). diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index ed419c6..650c575 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -107,6 +107,9 @@ typedef int (*dm_iterate_devices_fn) (struct dm_target *ti, typedef void (*dm_io_hints_fn) (struct dm_target *ti, struct queue_limits *limits); +typedef void (*dm_init_queue_fn) (struct dm_target *ti, + struct request_queue *q); + /* * Returns: * 0: The target can handle the next I/O immediately. @@ -162,6 +165,7 @@ struct target_type { dm_busy_fn busy; dm_iterate_devices_fn iterate_devices; dm_io_hints_fn io_hints; + dm_init_queue_fn init_queue; /* For internal device-mapper use. */ struct list_head list; ^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-10-18 22:53 ` [RFC PATCH v2] dm mpath: add a " Mike Snitzer @ 2013-10-30 1:02 ` Mike Snitzer 2013-10-30 15:08 ` Frank Mayhar 0 siblings, 1 reply; 49+ messages in thread From: Mike Snitzer @ 2013-10-30 1:02 UTC (permalink / raw) To: Frank Mayhar; +Cc: dm-devel, Alasdair G. Kergon On Fri, Oct 18 2013 at 6:53pm -0400, Mike Snitzer <snitzer@redhat.com> wrote: > On Fri, Oct 18 2013 at 4:51pm -0400, > Frank Mayhar <fmayhar@google.com> wrote: > > > On Thu, 2013-10-17 at 17:13 -0400, Mike Snitzer wrote: > > > Cannot say that argument wins me over but I will say that if you intend > > > to take the approach to have the kernel have a timeout; please pursue > > > the approach Hannes offered: > > > > > > https://patchwork.kernel.org/patch/2953231/ > > > > > > It is much cleaner and if it works for your needs we can see about > > > getting a tested version upstream. > > > > Unfortunately his patch doesn't work as-is; it turns out that it tries > > to set the timeout only if the target is request-based but at the time > > he tries to set it the table type hasn't yet been set. > > > > I'm looking into fixing it. > > Ouch, yeah, can't access the DM device's queue from .ctr() > There were other issues with Hannes RFC patch, wouldn't compile. > > Anyway, looks like we need a new target_type hook (e.g. .init_queue) > that is called from dm_init_request_based_queue(). > > Request-based DM only allows a single DM target per device so we don't > need the usual multi DM-target iterators. > > But, unfortunately, at the time we call dm_init_request_based_queue() > the mapped_device isn't yet connected to the inactive table that is > being loaded (but the table is connected to the mapped_device). > > In dm-ioctl.c:table_load(), the inactive table could be passed directly > into dm_setup_md_queue(). > > Please give the following revised patch a try, if it works we can clean > it up further (think multipath_status needs updating, we also may want > to constrain .init_queue to only being called if the target is a > singleton, which dm-mpath should be, but isn't flagged as such yet). > > It compiles, but I haven't tested it... Frank, Any interest in this or should I just table it for >= v3.14? ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-10-30 1:02 ` Mike Snitzer @ 2013-10-30 15:08 ` Frank Mayhar 2013-10-30 15:43 ` Mike Snitzer 0 siblings, 1 reply; 49+ messages in thread From: Frank Mayhar @ 2013-10-30 15:08 UTC (permalink / raw) To: Mike Snitzer; +Cc: dm-devel, Alasdair G. Kergon On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote: > Any interest in this or should I just table it for >= v3.14? Sorry, I've been busy putting out another fire. Yes, there's definitely still interest. I grabbed your revised patch and tested with it. Unfortunately the timeout doesn't actually fire when requests are queued due to queue_if_no_path; IIRC the block request queue timeout logic wasn't triggering. I planned to look into it more deeply figure out why but I had to spend all last week fixing a nasty race and hadn't gotten back to it yet. -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-10-30 15:08 ` Frank Mayhar @ 2013-10-30 15:43 ` Mike Snitzer 2013-10-30 18:09 ` Frank Mayhar 2013-10-31 14:59 ` [RFC PATCH v2] " Hannes Reinecke 0 siblings, 2 replies; 49+ messages in thread From: Mike Snitzer @ 2013-10-30 15:43 UTC (permalink / raw) To: Frank Mayhar, hare; +Cc: dm-devel, Alasdair G. Kergon On Wed, Oct 30 2013 at 11:08am -0400, Frank Mayhar <fmayhar@google.com> wrote: > On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote: > > Any interest in this or should I just table it for >= v3.14? > > Sorry, I've been busy putting out another fire. Yes, there's definitely > still interest. I grabbed your revised patch and tested with it. > Unfortunately the timeout doesn't actually fire when requests are queued > due to queue_if_no_path; IIRC the block request queue timeout logic > wasn't triggering. I planned to look into it more deeply figure out why > but I had to spend all last week fixing a nasty race and hadn't gotten > back to it yet. OK, Hannes, any idea why this might be happening? The patch in question is here: https://patchwork.kernel.org/patch/3070391/ ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-10-30 15:43 ` Mike Snitzer @ 2013-10-30 18:09 ` Frank Mayhar 2013-10-31 9:36 ` Junichi Nomura 2013-10-31 14:59 ` [RFC PATCH v2] " Hannes Reinecke 1 sibling, 1 reply; 49+ messages in thread From: Frank Mayhar @ 2013-10-30 18:09 UTC (permalink / raw) To: Mike Snitzer; +Cc: dm-devel, Alasdair G. Kergon On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote: > On Wed, Oct 30 2013 at 11:08am -0400, > Frank Mayhar <fmayhar@google.com> wrote: > > > On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote: > > > Any interest in this or should I just table it for >= v3.14? > > > > Sorry, I've been busy putting out another fire. Yes, there's definitely > > still interest. I grabbed your revised patch and tested with it. > > Unfortunately the timeout doesn't actually fire when requests are queued > > due to queue_if_no_path; IIRC the block request queue timeout logic > > wasn't triggering. I planned to look into it more deeply figure out why > > but I had to spend all last week fixing a nasty race and hadn't gotten > > back to it yet. > > OK, Hannes, any idea why this might be happening? The patch in question > is here: https://patchwork.kernel.org/patch/3070391/ I got to this today and so far the most interesting I see is that the cloned request that's queued in multipath has no queue associated with it when it's queued; a printk reveals: [ 517.610042] map_io: queueing rq ffff8801150e0070 q (null) When it's eventually dequeued, it gets a queue from the destination device (in the pgpath) via bdev_get_queue(). Because of this and from just looking at the code, blk_start_request() (and therefore blk_add_timer()) isn't being called for those requests, so there's never a chance that the timeout would happen. Does this make sense? Or am I totally off-base? -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-10-30 18:09 ` Frank Mayhar @ 2013-10-31 9:36 ` Junichi Nomura 2013-10-31 14:16 ` Frank Mayhar 0 siblings, 1 reply; 49+ messages in thread From: Junichi Nomura @ 2013-10-31 9:36 UTC (permalink / raw) To: device-mapper development, Mike Snitzer, Frank Mayhar; +Cc: Alasdair G. Kergon On 10/31/13 03:09, Frank Mayhar wrote: > On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote: >> On Wed, Oct 30 2013 at 11:08am -0400, >> Frank Mayhar <fmayhar@google.com> wrote: >> >>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote: >>>> Any interest in this or should I just table it for >= v3.14? >>> >>> Sorry, I've been busy putting out another fire. Yes, there's definitely >>> still interest. I grabbed your revised patch and tested with it. >>> Unfortunately the timeout doesn't actually fire when requests are queued >>> due to queue_if_no_path; IIRC the block request queue timeout logic >>> wasn't triggering. I planned to look into it more deeply figure out why >>> but I had to spend all last week fixing a nasty race and hadn't gotten >>> back to it yet. >> >> OK, Hannes, any idea why this might be happening? The patch in question >> is here: https://patchwork.kernel.org/patch/3070391/ > > I got to this today and so far the most interesting I see is that the > cloned request that's queued in multipath has no queue associated with > it when it's queued; a printk reveals: > > [ 517.610042] map_io: queueing rq ffff8801150e0070 q (null) > > When it's eventually dequeued, it gets a queue from the destination > device (in the pgpath) via bdev_get_queue(). > > Because of this and from just looking at the code, blk_start_request() > (and therefore blk_add_timer()) isn't being called for those requests, > so there's never a chance that the timeout would happen. > > Does this make sense? Or am I totally off-base? Hi, I haven't checked the above patch in detail but there is a problem; abort_if_no_path() treats "rq" as a clone request, which it isn't. "rq" is an original request. It shouldn't be a correct fix but just for testing purpose, you can try changing: info = dm_get_rq_mapinfo(rq); to info = dm_get_rq_mapinfo(rq->special); and see what happens. -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-10-31 9:36 ` Junichi Nomura @ 2013-10-31 14:16 ` Frank Mayhar 2013-10-31 14:31 ` Alasdair G Kergon 2013-11-01 1:58 ` Junichi Nomura 0 siblings, 2 replies; 49+ messages in thread From: Frank Mayhar @ 2013-10-31 14:16 UTC (permalink / raw) To: Junichi Nomura Cc: device-mapper development, Alasdair G. Kergon, Mike Snitzer On Thu, 2013-10-31 at 09:36 +0000, Junichi Nomura wrote: > On 10/31/13 03:09, Frank Mayhar wrote: > > On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote: > >> On Wed, Oct 30 2013 at 11:08am -0400, > >> Frank Mayhar <fmayhar@google.com> wrote: > >> > >>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote: > >>>> Any interest in this or should I just table it for >= v3.14? > >>> > >>> Sorry, I've been busy putting out another fire. Yes, there's definitely > >>> still interest. I grabbed your revised patch and tested with it. > >>> Unfortunately the timeout doesn't actually fire when requests are queued > >>> due to queue_if_no_path; IIRC the block request queue timeout logic > >>> wasn't triggering. I planned to look into it more deeply figure out why > >>> but I had to spend all last week fixing a nasty race and hadn't gotten > >>> back to it yet. > >> > >> OK, Hannes, any idea why this might be happening? The patch in question > >> is here: https://patchwork.kernel.org/patch/3070391/ > > > > I got to this today and so far the most interesting I see is that the > > cloned request that's queued in multipath has no queue associated with > > it when it's queued; a printk reveals: > > > > [ 517.610042] map_io: queueing rq ffff8801150e0070 q (null) > > > > When it's eventually dequeued, it gets a queue from the destination > > device (in the pgpath) via bdev_get_queue(). > > > > Because of this and from just looking at the code, blk_start_request() > > (and therefore blk_add_timer()) isn't being called for those requests, > > so there's never a chance that the timeout would happen. > > > > Does this make sense? Or am I totally off-base? > > Hi, > > I haven't checked the above patch in detail but there is a problem; > abort_if_no_path() treats "rq" as a clone request, which it isn't. > "rq" is an original request. > > It shouldn't be a correct fix but just for testing purpose, you can try > changing: > info = dm_get_rq_mapinfo(rq); > to > info = dm_get_rq_mapinfo(rq->special); > and see what happens. Well, at the moment this is kind of moot since abort_if_no_path() isn't being called. But, regardless, don't we want to time out the clone request? That is, after all, what is being queued in map_io(). Unfortunately the clones don't appear to be associated with a request queue; they're just put on multipath's internal queue. -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-10-31 14:16 ` Frank Mayhar @ 2013-10-31 14:31 ` Alasdair G Kergon 2013-10-31 17:17 ` Frank Mayhar 2013-11-01 1:58 ` Junichi Nomura 1 sibling, 1 reply; 49+ messages in thread From: Alasdair G Kergon @ 2013-10-31 14:31 UTC (permalink / raw) To: Frank Mayhar, Junichi Nomura, device-mapper development, Mike Snitzer, Alasdair G. Kergon On Thu, Oct 31, 2013 at 07:16:51AM -0700, Frank Mayhar wrote: > Unfortunately the clones don't appear to be associated with a request > queue; they're just put on multipath's internal queue. (And also remember to test table swap/push back.) Alasdair ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-10-31 14:31 ` Alasdair G Kergon @ 2013-10-31 17:17 ` Frank Mayhar 2013-11-01 1:23 ` Junichi Nomura 0 siblings, 1 reply; 49+ messages in thread From: Frank Mayhar @ 2013-10-31 17:17 UTC (permalink / raw) To: Alasdair G Kergon; +Cc: Junichi Nomura, device-mapper development, Mike Snitzer On Thu, 2013-10-31 at 14:31 +0000, Alasdair G Kergon wrote: > On Thu, Oct 31, 2013 at 07:16:51AM -0700, Frank Mayhar wrote: > > Unfortunately the clones don't appear to be associated with a request > > queue; they're just put on multipath's internal queue. > > (And also remember to test table swap/push back.) That brings up something I wanted to ask. I've dug through the code and this particular thing isn't clear to me. So how does it handle the queued I/Os when switching tables? I see nothing in the table_load() path that would deal with this. I'm guessing that the requests are pushed back to the block layer and are later resubmitted and requeued on the new multipath queue, but I don't see how that works. Code references would be very welcome. -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-10-31 17:17 ` Frank Mayhar @ 2013-11-01 1:23 ` Junichi Nomura 0 siblings, 0 replies; 49+ messages in thread From: Junichi Nomura @ 2013-11-01 1:23 UTC (permalink / raw) To: Frank Mayhar; +Cc: device-mapper development, Mike Snitzer, Alasdair G Kergon On 11/01/13 02:17, Frank Mayhar wrote: > On Thu, 2013-10-31 at 14:31 +0000, Alasdair G Kergon wrote: >> (And also remember to test table swap/push back.) > > That brings up something I wanted to ask. I've dug through the code and > this particular thing isn't clear to me. So how does it handle the > queued I/Os when switching tables? I see nothing in the table_load() > path that would deal with this. I'm guessing that the requests are > pushed back to the block layer and are later resubmitted and requeued on > the new multipath queue, but I don't see how that works. > > Code references would be very welcome. Relevant piece of codes is: - multipath_presuspend() temporarily disables "queue_if_no_path" - during the suspend process, __must_push_back() catches (otherwise failing) requests and requeues them back to the block layer queue - upon resume, dm starts processing requests in the block layer queue as usual Hope this helps. -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-10-31 14:16 ` Frank Mayhar 2013-10-31 14:31 ` Alasdair G Kergon @ 2013-11-01 1:58 ` Junichi Nomura 2013-11-01 4:17 ` Junichi Nomura 1 sibling, 1 reply; 49+ messages in thread From: Junichi Nomura @ 2013-11-01 1:58 UTC (permalink / raw) To: Frank Mayhar; +Cc: device-mapper development, Alasdair G. Kergon, Mike Snitzer On 10/31/13 23:16, Frank Mayhar wrote: > On Thu, 2013-10-31 at 09:36 +0000, Junichi Nomura wrote: >> On 10/31/13 03:09, Frank Mayhar wrote: >>> On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote: >>>> On Wed, Oct 30 2013 at 11:08am -0400, >>>> Frank Mayhar <fmayhar@google.com> wrote: >>>> >>>>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote: >>>>>> Any interest in this or should I just table it for >= v3.14? >>>>> >>>>> Sorry, I've been busy putting out another fire. Yes, there's definitely >>>>> still interest. I grabbed your revised patch and tested with it. >>>>> Unfortunately the timeout doesn't actually fire when requests are queued >>>>> due to queue_if_no_path; IIRC the block request queue timeout logic >>>>> wasn't triggering. I planned to look into it more deeply figure out why >>>>> but I had to spend all last week fixing a nasty race and hadn't gotten >>>>> back to it yet. >>>> >>>> OK, Hannes, any idea why this might be happening? The patch in question >>>> is here: https://patchwork.kernel.org/patch/3070391/ >>> >>> I got to this today and so far the most interesting I see is that the >>> cloned request that's queued in multipath has no queue associated with >>> it when it's queued; a printk reveals: >>> >>> [ 517.610042] map_io: queueing rq ffff8801150e0070 q (null) >>> >>> When it's eventually dequeued, it gets a queue from the destination >>> device (in the pgpath) via bdev_get_queue(). >>> >>> Because of this and from just looking at the code, blk_start_request() >>> (and therefore blk_add_timer()) isn't being called for those requests, >>> so there's never a chance that the timeout would happen. >>> >>> Does this make sense? Or am I totally off-base? >> >> Hi, >> >> I haven't checked the above patch in detail but there is a problem; >> abort_if_no_path() treats "rq" as a clone request, which it isn't. >> "rq" is an original request. >> >> It shouldn't be a correct fix but just for testing purpose, you can try >> changing: >> info = dm_get_rq_mapinfo(rq); >> to >> info = dm_get_rq_mapinfo(rq->special); >> and see what happens. > > Well, at the moment this is kind of moot since abort_if_no_path() isn't > being called. But, regardless, don't we want to time out the clone > request? That is, after all, what is being queued in map_io(). > Unfortunately the clones don't appear to be associated with a request > queue; they're just put on multipath's internal queue. Hmm, "isn't being called" is strange. If the clone is in multipath's internal queue, the original should have been "started" from request queue point of view and timeout should fire. As for the "clone or original" question, if you are to use the block timer, you have to use it for the original request (then perhaps let the handler find its clone and kill it). That's because (as you already see) clones are not associated with request queue when it's queued in multipath internal queue and when it's associated, it belongs to the lower device's queue. -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-11-01 1:58 ` Junichi Nomura @ 2013-11-01 4:17 ` Junichi Nomura 2013-11-05 15:18 ` Frank Mayhar 0 siblings, 1 reply; 49+ messages in thread From: Junichi Nomura @ 2013-11-01 4:17 UTC (permalink / raw) To: Frank Mayhar; +Cc: device-mapper development, Mike Snitzer, Alasdair G. Kergon On 11/01/13 10:58, Junichi Nomura wrote: > On 10/31/13 23:16, Frank Mayhar wrote: >> On Thu, 2013-10-31 at 09:36 +0000, Junichi Nomura wrote: >>> On 10/31/13 03:09, Frank Mayhar wrote: >>>> On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote: >>>>> On Wed, Oct 30 2013 at 11:08am -0400, >>>>> Frank Mayhar <fmayhar@google.com> wrote: >>>>> >>>>>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote: >>>>>>> Any interest in this or should I just table it for >= v3.14? >>>>>> >>>>>> Sorry, I've been busy putting out another fire. Yes, there's definitely >>>>>> still interest. I grabbed your revised patch and tested with it. >>>>>> Unfortunately the timeout doesn't actually fire when requests are queued >>>>>> due to queue_if_no_path; IIRC the block request queue timeout logic >>>>>> wasn't triggering. I planned to look into it more deeply figure out why >>>>>> but I had to spend all last week fixing a nasty race and hadn't gotten >>>>>> back to it yet. >>>>> >>>>> OK, Hannes, any idea why this might be happening? The patch in question >>>>> is here: https://patchwork.kernel.org/patch/3070391/ >>>> >>>> I got to this today and so far the most interesting I see is that the >>>> cloned request that's queued in multipath has no queue associated with >>>> it when it's queued; a printk reveals: >>>> >>>> [ 517.610042] map_io: queueing rq ffff8801150e0070 q (null) >>>> >>>> When it's eventually dequeued, it gets a queue from the destination >>>> device (in the pgpath) via bdev_get_queue(). >>>> >>>> Because of this and from just looking at the code, blk_start_request() >>>> (and therefore blk_add_timer()) isn't being called for those requests, >>>> so there's never a chance that the timeout would happen. >>>> >>>> Does this make sense? Or am I totally off-base? >>> >>> Hi, >>> >>> I haven't checked the above patch in detail but there is a problem; >>> abort_if_no_path() treats "rq" as a clone request, which it isn't. >>> "rq" is an original request. >>> >>> It shouldn't be a correct fix but just for testing purpose, you can try >>> changing: >>> info = dm_get_rq_mapinfo(rq); >>> to >>> info = dm_get_rq_mapinfo(rq->special); >>> and see what happens. >> >> Well, at the moment this is kind of moot since abort_if_no_path() isn't >> being called. But, regardless, don't we want to time out the clone >> request? That is, after all, what is being queued in map_io(). >> Unfortunately the clones don't appear to be associated with a request >> queue; they're just put on multipath's internal queue. > > Hmm, "isn't being called" is strange. > If the clone is in multipath's internal queue, the original > should have been "started" from request queue point of view > and timeout should fire. > > As for the "clone or original" question, if you are to use the block timer, > you have to use it for the original request (then perhaps let the handler > find its clone and kill it). > That's because (as you already see) clones are not associated with > request queue when it's queued in multipath internal queue > and when it's associated, it belongs to the lower device's queue. I slightly modified the patch: - fixed the timeout handler to correctly find clone request and "struct multipath" - the timeout handler just disables "queue_if_no_path" instead of killing the request directly - "dmsetup status" to show the parameter - changed an interface between dm core and target - added some debugging printk (you can remove them) and checked the timeout occurs, at least. I'm not sure if this feature is good or not though. (The timer behavior is not intuitive, I think) --- Jun'ichi Nomura, NEC Corporation diff --git a/block/blk-timeout.c b/block/blk-timeout.c index 65f1035..7ea06b0 100644 --- a/block/blk-timeout.c +++ b/block/blk-timeout.c @@ -170,7 +170,7 @@ void blk_add_timer(struct request *req) struct request_queue *q = req->q; unsigned long expiry; - if (!q->rq_timed_out_fn) + if (!q->rq_timed_out_fn || !q->rq_timeout) return; BUG_ON(!list_empty(&req->timeout_list)); diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index de570a5..c345fef 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -105,6 +105,8 @@ struct multipath { mempool_t *mpio_pool; struct mutex work_mutex; + + unsigned no_path_timeout; }; /* @@ -444,6 +446,48 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path, return 0; } +/* + * Block timeout callback, called from the block layer + * + * request_queue lock is held on entry. + * + * Return values: + * BLK_EH_RESET_TIMER if the request should be left running + * BLK_EH_NOT_HANDLED if the request is handled or terminated + * by the driver. + */ +enum blk_eh_timer_return multipath_timed_out(struct dm_target *ti, + struct request *clone) +{ + unsigned long flags; + struct multipath *m = ti->private; + int rc = BLK_EH_RESET_TIMER; + int flush_ios = 0; + + printk(KERN_DEBUG "[%lu] multipath_timed_out %p %p\n", jiffies, clone, m); + + /* + * Only abort request if: + * - queued_ios is not empty + * (protect against races with process_queued_ios) + * - no valid paths are found + */ + spin_lock_irqsave(&m->lock, flags); + if (m->no_path_timeout && + !list_empty(&m->queued_ios) && !m->nr_valid_paths) { + flush_ios = 1; + } + spin_unlock_irqrestore(&m->lock, flags); + + if (flush_ios) { + DMWARN("disabling queue_if_no_path as I/O timed out\n"); + queue_if_no_path(m, 0, 0); + queue_work(kmultipathd, &m->process_queued_ios); + } + + return rc; +} + /*----------------------------------------------------------------- * The multipath daemon is responsible for resubmitting queued ios. *---------------------------------------------------------------*/ @@ -790,6 +834,7 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m) {0, 6, "invalid number of feature args"}, {1, 50, "pg_init_retries must be between 1 and 50"}, {0, 60000, "pg_init_delay_msecs must be between 0 and 60000"}, + {0, 65535, "no_path_timeout must be between 0 and 65535"}, }; r = dm_read_arg_group(_args, as, &argc, &ti->error); @@ -827,6 +872,13 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m) continue; } + if (!strcasecmp(arg_name, "no_path_timeout") && + (argc >= 1)) { + r = dm_read_arg(_args + 3, as, &m->no_path_timeout, &ti->error); + argc--; + continue; + } + ti->error = "Unrecognised multipath feature request"; r = -EINVAL; } while (argc && !r); @@ -1384,6 +1436,9 @@ static void multipath_resume(struct dm_target *ti) spin_lock_irqsave(&m->lock, flags); m->queue_if_no_path = m->saved_queue_if_no_path; + if (m->no_path_timeout) + printk("Setting no_path_timeout %d\n", m->no_path_timeout); + dm_set_timeout(dm_table_get_md(m->ti->table), m->no_path_timeout); spin_unlock_irqrestore(&m->lock, flags); } @@ -1421,11 +1476,14 @@ static void multipath_status(struct dm_target *ti, status_type_t type, DMEMIT("2 %u %u ", m->queue_size, m->pg_init_count); else { DMEMIT("%u ", m->queue_if_no_path + + (m->no_path_timeout > 0) * 2 + (m->pg_init_retries > 0) * 2 + (m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT) * 2 + m->retain_attached_hw_handler); if (m->queue_if_no_path) DMEMIT("queue_if_no_path "); + if (m->no_path_timeout) + DMEMIT("no_path_timeout %u ", m->no_path_timeout); if (m->pg_init_retries) DMEMIT("pg_init_retries %u ", m->pg_init_retries); if (m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT) @@ -1728,6 +1786,7 @@ static struct target_type multipath_target = { .ioctl = multipath_ioctl, .iterate_devices = multipath_iterate_devices, .busy = multipath_busy, + .timed_out = multipath_timed_out, }; static int __init dm_multipath_init(void) diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index 8f87835..32c5399 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -26,6 +26,8 @@ #define KEYS_PER_NODE (NODE_SIZE / sizeof(sector_t)) #define CHILDREN_PER_NODE (KEYS_PER_NODE + 1) +enum blk_eh_timer_return dm_rq_timed_out(struct request *rq); + struct dm_table { struct mapped_device *md; unsigned type; @@ -1519,6 +1521,7 @@ static void suspend_targets(struct dm_table *t, unsigned postsuspend) ti++; } + blk_queue_rq_timed_out(dm_disk(t->md)->queue, NULL); } void dm_table_presuspend_targets(struct dm_table *t) @@ -1540,10 +1543,14 @@ void dm_table_postsuspend_targets(struct dm_table *t) int dm_table_resume_targets(struct dm_table *t) { int i, r = 0; + int has_timeout = 0; for (i = 0; i < t->num_targets; i++) { struct dm_target *ti = t->targets + i; + if (ti->type->timed_out) + has_timeout = 1; + if (!ti->type->preresume) continue; @@ -1552,6 +1559,9 @@ int dm_table_resume_targets(struct dm_table *t) return r; } + if (has_timeout) + blk_queue_rq_timed_out(dm_disk(t->md)->queue, dm_rq_timed_out); + for (i = 0; i < t->num_targets; i++) { struct dm_target *ti = t->targets + i; diff --git a/drivers/md/dm.c b/drivers/md/dm.c index b3e26c7..701d3d2 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1026,6 +1026,13 @@ static void end_clone_request(struct request *clone, int error) dm_complete_request(clone, error); } +int dm_set_timeout(struct mapped_device *md, unsigned int tmo) +{ + blk_queue_rq_timeout(md->queue, tmo * HZ); + return 0; +} +EXPORT_SYMBOL_GPL(dm_set_timeout); + /* * Return maximum size of I/O possible at the supplied sector up to the current * target boundary. @@ -1829,6 +1836,23 @@ out: dm_put_live_table(md, srcu_idx); } +enum blk_eh_timer_return dm_rq_timed_out(struct request *rq) +{ + struct request *clone = rq->special; + struct dm_rq_target_io *tio = clone->end_io_data; + + printk(KERN_DEBUG "[%lu] dm_timed_out %p : %p\n", jiffies, rq, tio); + + if (!tio) + goto out; /* not mapped */ + + if (tio->ti->type->timed_out) + return tio->ti->type->timed_out(tio->ti, clone); + +out: + return BLK_EH_RESET_TIMER; +} + int dm_underlying_device_busy(struct request_queue *q) { return blk_lld_busy(q); diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index ed419c6..84c0368 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -107,6 +107,8 @@ typedef int (*dm_iterate_devices_fn) (struct dm_target *ti, typedef void (*dm_io_hints_fn) (struct dm_target *ti, struct queue_limits *limits); +typedef enum blk_eh_timer_return (*dm_rq_timed_out_fn) (struct dm_target *ti, + struct request *clone); /* * Returns: * 0: The target can handle the next I/O immediately. @@ -162,6 +164,7 @@ struct target_type { dm_busy_fn busy; dm_iterate_devices_fn iterate_devices; dm_io_hints_fn io_hints; + dm_rq_timed_out_fn timed_out; /* For internal device-mapper use. */ struct list_head list; @@ -604,5 +607,6 @@ void dm_dispatch_request(struct request *rq); void dm_requeue_unmapped_request(struct request *rq); void dm_kill_unmapped_request(struct request *rq, int error); int dm_underlying_device_busy(struct request_queue *q); +int dm_set_timeout(struct mapped_device *md, unsigned int tmo); #endif /* _LINUX_DEVICE_MAPPER_H */ ^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-11-01 4:17 ` Junichi Nomura @ 2013-11-05 15:18 ` Frank Mayhar 2013-11-05 16:02 ` [RFC PATCH v3] " Frank Mayhar 0 siblings, 1 reply; 49+ messages in thread From: Frank Mayhar @ 2013-11-05 15:18 UTC (permalink / raw) To: Junichi Nomura Cc: device-mapper development, Mike Snitzer, Alasdair G. Kergon On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote: > I slightly modified the patch: > - fixed the timeout handler to correctly find > clone request and "struct multipath" > - the timeout handler just disables "queue_if_no_path" > instead of killing the request directly > - "dmsetup status" to show the parameter > - changed an interface between dm core and target > - added some debugging printk (you can remove them) > and checked the timeout occurs, at least. > > I'm not sure if this feature is good or not though. > (The timer behavior is not intuitive, I think) > > --- > Jun'ichi Nomura, NEC Corporation > [patch trimmed] Thanks! I integrated your new patch and tested it. Sure enough, it seems to work. I've made a few tweaks (added a module tunable and support for setting the timer in multipath_message(), removed your debug printks) and will submit the modified patch for discussion shortly. -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* [RFC PATCH v3] dm mpath: add a queue_if_no_path timeout 2013-11-05 15:18 ` Frank Mayhar @ 2013-11-05 16:02 ` Frank Mayhar 2013-11-05 16:53 ` Mike Snitzer 2013-11-06 6:54 ` Hannes Reinecke 0 siblings, 2 replies; 49+ messages in thread From: Frank Mayhar @ 2013-11-05 16:02 UTC (permalink / raw) To: Junichi Nomura Cc: device-mapper development, Mike Snitzer, Alasdair G. Kergon This is the patch submitted by Jun'ichi Nomura, originally based on Mike's patch with some small changes by me. Jun'ichi's description follows, along with my changes: On Tue, 2013-11-05 at 07:18 -0800, Frank Mayhar wrote: > On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote: > > I slightly modified the patch: > > - fixed the timeout handler to correctly find > > clone request and "struct multipath" > > - the timeout handler just disables "queue_if_no_path" > > instead of killing the request directly > > - "dmsetup status" to show the parameter > > - changed an interface between dm core and target > > - added some debugging printk (you can remove them) > > and checked the timeout occurs, at least. > > > > I'm not sure if this feature is good or not though. > > (The timer behavior is not intuitive, I think) > Thanks! I integrated your new patch and tested it. Sure enough, it > seems to work. I've made a few tweaks (added a module tunable and > support for setting the timer in multipath_message(), removed your debug > printks) and will submit the modified patch for discussion shortly. Comments? --- block/blk-timeout.c | 2 drivers/md/dm-mpath.c | 85 ++++++++++++++++++++++++++++++++++++++++++ drivers/md/dm-table.c | 10 ++++ drivers/md/dm.c | 22 ++++++++++ include/linux/device-mapper.h | 4 + 5 files changed, 122 insertions(+), 1 deletion(-) diff --git a/block/blk-timeout.c b/block/blk-timeout.c index 65f1035..7ea06b0 100644 --- a/block/blk-timeout.c +++ b/block/blk-timeout.c @@ -170,7 +170,7 @@ void blk_add_timer(struct request *req) struct request_queue *q = req->q; unsigned long expiry; - if (!q->rq_timed_out_fn) + if (!q->rq_timed_out_fn || !q->rq_timeout) return; BUG_ON(!list_empty(&req->timeout_list)); diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index de570a5..6f127b7 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -27,6 +27,8 @@ #define DM_PG_INIT_DELAY_MSECS 2000 #define DM_PG_INIT_DELAY_DEFAULT ((unsigned) -1) +#define DEFAULT_NO_PATH_TIMEOUT_SECS 0 + /* Path properties */ struct pgpath { struct list_head list; @@ -105,6 +107,8 @@ struct multipath { mempool_t *mpio_pool; struct mutex work_mutex; + + unsigned no_path_timeout; }; /* @@ -117,6 +121,10 @@ struct dm_mpath_io { typedef int (*action_fn) (struct pgpath *pgpath); +static unsigned long global_no_path_timeout = DEFAULT_NO_PATH_TIMEOUT_SECS; +module_param_named(queue_if_no_path_timeout_secs, + global_no_path_timeout, ulong, S_IRUGO | S_IWUSR); + static struct kmem_cache *_mpio_cache; static struct workqueue_struct *kmultipathd, *kmpath_handlerd; @@ -207,6 +215,7 @@ static struct multipath *alloc_multipath(struct dm_target *ti) kfree(m); return NULL; } + m->no_path_timeout = global_no_path_timeout; m->ti = ti; ti->private = m; } @@ -444,6 +453,45 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path, return 0; } +/* + * Block timeout callback, called from the block layer + * + * request_queue lock is held on entry. + * + * Return values: + * BLK_EH_RESET_TIMER if the request should be left running + * BLK_EH_NOT_HANDLED if the request is handled or terminated + * by the driver. + */ +enum blk_eh_timer_return multipath_timed_out(struct dm_target *ti, + struct request *clone) +{ + unsigned long flags; + struct multipath *m = ti->private; + int rc = BLK_EH_RESET_TIMER; + int flush_ios = 0; + + /* + * Only abort request if: + * - queued_ios is not empty + * (protect against races with process_queued_ios) + * - no valid paths are found + */ + spin_lock_irqsave(&m->lock, flags); + if (m->no_path_timeout && + !list_empty(&m->queued_ios) && !m->nr_valid_paths) { + flush_ios = 1; + } + spin_unlock_irqrestore(&m->lock, flags); + + if (flush_ios) { + queue_if_no_path(m, 0, 0); + queue_work(kmultipathd, &m->process_queued_ios); + } + + return rc; +} + /*----------------------------------------------------------------- * The multipath daemon is responsible for resubmitting queued ios. *---------------------------------------------------------------*/ @@ -790,6 +838,7 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m) {0, 6, "invalid number of feature args"}, {1, 50, "pg_init_retries must be between 1 and 50"}, {0, 60000, "pg_init_delay_msecs must be between 0 and 60000"}, + {0, 65535, "queue_if_no_path_timeout_secs must be between 0 and 65535"}, }; r = dm_read_arg_group(_args, as, &argc, &ti->error); @@ -827,6 +876,13 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m) continue; } + if (!strcasecmp(arg_name, "queue_if_no_path_timeout_secs") && + (argc >= 1)) { + r = dm_read_arg(_args + 3, as, &m->no_path_timeout, &ti->error); + argc--; + continue; + } + ti->error = "Unrecognised multipath feature request"; r = -EINVAL; } while (argc && !r); @@ -978,6 +1034,27 @@ static int multipath_map(struct dm_target *ti, struct request *clone, } /* + * Set the no_path_timeout on behalf of a message. Note that this only + * affects new requests; already-queued requests are unaffected. + */ +static int set_no_path_timeout(struct multipath *m, const char *timestr) +{ + unsigned long to; + unsigned long flags; + + if (!timestr || (sscanf(timestr, "%lu", &to) != 1)) { + DMWARN("invalid timeout supplied to set_no_path_timeout"); + return -EINVAL; + } + + spin_lock_irqsave(&m->lock, flags); + m->no_path_timeout = to; + dm_set_timeout(dm_table_get_md(m->ti->table), m->no_path_timeout); + spin_unlock_irqrestore(&m->lock, flags); + return 0; +} + +/* * Take a path out of use. */ static int fail_path(struct pgpath *pgpath) @@ -1384,6 +1461,7 @@ static void multipath_resume(struct dm_target *ti) spin_lock_irqsave(&m->lock, flags); m->queue_if_no_path = m->saved_queue_if_no_path; + dm_set_timeout(dm_table_get_md(m->ti->table), m->no_path_timeout); spin_unlock_irqrestore(&m->lock, flags); } @@ -1421,11 +1499,14 @@ static void multipath_status(struct dm_target *ti, status_type_t type, DMEMIT("2 %u %u ", m->queue_size, m->pg_init_count); else { DMEMIT("%u ", m->queue_if_no_path + + (m->no_path_timeout > 0) * 2 + (m->pg_init_retries > 0) * 2 + (m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT) * 2 + m->retain_attached_hw_handler); if (m->queue_if_no_path) DMEMIT("queue_if_no_path "); + if (m->no_path_timeout) + DMEMIT("queue_if_no_path_timeout_secs %u ", m->no_path_timeout); if (m->pg_init_retries) DMEMIT("pg_init_retries %u ", m->pg_init_retries); if (m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT) @@ -1550,6 +1631,9 @@ static int multipath_message(struct dm_target *ti, unsigned argc, char **argv) } else if (!strcasecmp(argv[0], "switch_group")) { r = switch_pg_num(m, argv[1]); goto out; + } else if (!strcasecmp(argv[0], "queue_if_no_path_timeout_secs")) { + r = set_no_path_timeout(m, argv[1]); + goto out; } else if (!strcasecmp(argv[0], "reinstate_path")) action = reinstate_path; else if (!strcasecmp(argv[0], "fail_path")) @@ -1728,6 +1812,7 @@ static struct target_type multipath_target = { .ioctl = multipath_ioctl, .iterate_devices = multipath_iterate_devices, .busy = multipath_busy, + .timed_out = multipath_timed_out, }; static int __init dm_multipath_init(void) diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index 8f87835..32c5399 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -26,6 +26,8 @@ #define KEYS_PER_NODE (NODE_SIZE / sizeof(sector_t)) #define CHILDREN_PER_NODE (KEYS_PER_NODE + 1) +enum blk_eh_timer_return dm_rq_timed_out(struct request *rq); + struct dm_table { struct mapped_device *md; unsigned type; @@ -1519,6 +1521,7 @@ static void suspend_targets(struct dm_table *t, unsigned postsuspend) ti++; } + blk_queue_rq_timed_out(dm_disk(t->md)->queue, NULL); } void dm_table_presuspend_targets(struct dm_table *t) @@ -1540,10 +1543,14 @@ void dm_table_postsuspend_targets(struct dm_table *t) int dm_table_resume_targets(struct dm_table *t) { int i, r = 0; + int has_timeout = 0; for (i = 0; i < t->num_targets; i++) { struct dm_target *ti = t->targets + i; + if (ti->type->timed_out) + has_timeout = 1; + if (!ti->type->preresume) continue; @@ -1552,6 +1559,9 @@ int dm_table_resume_targets(struct dm_table *t) return r; } + if (has_timeout) + blk_queue_rq_timed_out(dm_disk(t->md)->queue, dm_rq_timed_out); + for (i = 0; i < t->num_targets; i++) { struct dm_target *ti = t->targets + i; diff --git a/drivers/md/dm.c b/drivers/md/dm.c index b3e26c7..bb7a29c 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1026,6 +1026,13 @@ static void end_clone_request(struct request *clone, int error) dm_complete_request(clone, error); } +int dm_set_timeout(struct mapped_device *md, unsigned int tmo) +{ + blk_queue_rq_timeout(md->queue, tmo * HZ); + return 0; +} +EXPORT_SYMBOL_GPL(dm_set_timeout); + /* * Return maximum size of I/O possible at the supplied sector up to the current * target boundary. @@ -1829,6 +1836,21 @@ out: dm_put_live_table(md, srcu_idx); } +enum blk_eh_timer_return dm_rq_timed_out(struct request *rq) +{ + struct request *clone = rq->special; + struct dm_rq_target_io *tio = clone->end_io_data; + + if (!tio) + goto out; /* not mapped */ + + if (tio->ti->type->timed_out) + return tio->ti->type->timed_out(tio->ti, clone); + +out: + return BLK_EH_RESET_TIMER; +} + int dm_underlying_device_busy(struct request_queue *q) { return blk_lld_busy(q); diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index ed419c6..84c0368 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -107,6 +107,8 @@ typedef int (*dm_iterate_devices_fn) (struct dm_target *ti, typedef void (*dm_io_hints_fn) (struct dm_target *ti, struct queue_limits *limits); +typedef enum blk_eh_timer_return (*dm_rq_timed_out_fn) (struct dm_target *ti, + struct request *clone); /* * Returns: * 0: The target can handle the next I/O immediately. @@ -162,6 +164,7 @@ struct target_type { dm_busy_fn busy; dm_iterate_devices_fn iterate_devices; dm_io_hints_fn io_hints; + dm_rq_timed_out_fn timed_out; /* For internal device-mapper use. */ struct list_head list; @@ -604,5 +607,6 @@ void dm_dispatch_request(struct request *rq); void dm_requeue_unmapped_request(struct request *rq); void dm_kill_unmapped_request(struct request *rq, int error); int dm_underlying_device_busy(struct request_queue *q); +int dm_set_timeout(struct mapped_device *md, unsigned int tmo); #endif /* _LINUX_DEVICE_MAPPER_H */ -- Frank Mayhar 310-460-4042 ^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v3] dm mpath: add a queue_if_no_path timeout 2013-11-05 16:02 ` [RFC PATCH v3] " Frank Mayhar @ 2013-11-05 16:53 ` Mike Snitzer 2013-11-06 6:54 ` Hannes Reinecke 1 sibling, 0 replies; 49+ messages in thread From: Mike Snitzer @ 2013-11-05 16:53 UTC (permalink / raw) To: Frank Mayhar Cc: Junichi Nomura, device-mapper development, Alasdair G. Kergon On Tue, Nov 05 2013 at 11:02am -0500, Frank Mayhar <fmayhar@google.com> wrote: > This is the patch submitted by Jun'ichi Nomura, originally based on > Mike's patch with some small changes by me. Jun'ichi's description > follows, along with my changes: > > On Tue, 2013-11-05 at 07:18 -0800, Frank Mayhar wrote: > > On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote: > > > I slightly modified the patch: > > > - fixed the timeout handler to correctly find > > > clone request and "struct multipath" > > > - the timeout handler just disables "queue_if_no_path" > > > instead of killing the request directly > > > - "dmsetup status" to show the parameter > > > - changed an interface between dm core and target > > > - added some debugging printk (you can remove them) > > > and checked the timeout occurs, at least. > > > > > > I'm not sure if this feature is good or not though. > > > (The timer behavior is not intuitive, I think) > > Thanks! I integrated your new patch and tested it. Sure enough, it > > seems to work. I've made a few tweaks (added a module tunable and > > support for setting the timer in multipath_message(), removed your debug > > printks) and will submit the modified patch for discussion shortly. > > Comments? My primary concern is that this patch is always establishing a timed_out method via blk_queue_rq_timed_out() regardless of whether the mpath device needs it. I'm also not a fan of adding such a specialized rq-based only hook (dm_rq_timed_out_fn timed_out). Could be we'll need to do other things to the queue (be it bio-based or rq-based) in the future. So I prefer the approach I took to arming the queue (via new .init_queue hook) in this patch: https://patchwork.kernel.org/patch/3070391/ ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v3] dm mpath: add a queue_if_no_path timeout 2013-11-05 16:02 ` [RFC PATCH v3] " Frank Mayhar 2013-11-05 16:53 ` Mike Snitzer @ 2013-11-06 6:54 ` Hannes Reinecke 2013-11-06 15:43 ` Frank Mayhar 1 sibling, 1 reply; 49+ messages in thread From: Hannes Reinecke @ 2013-11-06 6:54 UTC (permalink / raw) To: Frank Mayhar Cc: Jun'ichi Nomura, dm-devel, Alasdair G Kergon, Mike Snitzer On 11/05/2013 05:02 PM, Frank Mayhar wrote: > This is the patch submitted by Jun'ichi Nomura, originally based on > Mike's patch with some small changes by me. Jun'ichi's description > follows, along with my changes: > > On Tue, 2013-11-05 at 07:18 -0800, Frank Mayhar wrote: >> On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote: >>> I slightly modified the patch: >>> - fixed the timeout handler to correctly find >>> clone request and "struct multipath" >>> - the timeout handler just disables "queue_if_no_path" >>> instead of killing the request directly >>> - "dmsetup status" to show the parameter >>> - changed an interface between dm core and target >>> - added some debugging printk (you can remove them) >>> and checked the timeout occurs, at least. >>> >>> I'm not sure if this feature is good or not though. >>> (The timer behavior is not intuitive, I think) >> Thanks! I integrated your new patch and tested it. Sure enough, it >> seems to work. I've made a few tweaks (added a module tunable and >> support for setting the timer in multipath_message(), removed your debug >> printks) and will submit the modified patch for discussion shortly. > > Comments? > Yeah. Seems to be my eternal fate; initiating fixes and not getting mentioned at all. Sigh. I dimly remember having sent the original patch for the blk timeout function ... hence a short notice would've been nice. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v3] dm mpath: add a queue_if_no_path timeout 2013-11-06 6:54 ` Hannes Reinecke @ 2013-11-06 15:43 ` Frank Mayhar 2013-11-06 19:21 ` Mike Snitzer 0 siblings, 1 reply; 49+ messages in thread From: Frank Mayhar @ 2013-11-06 15:43 UTC (permalink / raw) To: Hannes Reinecke Cc: Jun'ichi Nomura, dm-devel, Alasdair G Kergon, Mike, Snitzer On Wed, 2013-11-06 at 07:54 +0100, Hannes Reinecke wrote: > On 11/05/2013 05:02 PM, Frank Mayhar wrote: > > This is the patch submitted by Jun'ichi Nomura, originally based on > > Mike's patch with some small changes by me. Jun'ichi's description > > follows, along with my changes: > > > > On Tue, 2013-11-05 at 07:18 -0800, Frank Mayhar wrote: > >> On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote: > >>> I slightly modified the patch: > >>> - fixed the timeout handler to correctly find > >>> clone request and "struct multipath" > >>> - the timeout handler just disables "queue_if_no_path" > >>> instead of killing the request directly > >>> - "dmsetup status" to show the parameter > >>> - changed an interface between dm core and target > >>> - added some debugging printk (you can remove them) > >>> and checked the timeout occurs, at least. > >>> > >>> I'm not sure if this feature is good or not though. > >>> (The timer behavior is not intuitive, I think) > >> Thanks! I integrated your new patch and tested it. Sure enough, it > >> seems to work. I've made a few tweaks (added a module tunable and > >> support for setting the timer in multipath_message(), removed your debug > >> printks) and will submit the modified patch for discussion shortly. > > > > Comments? > > > Yeah. Seems to be my eternal fate; initiating fixes and not getting > mentioned at all. > Sigh. > > I dimly remember having sent the original patch for the blk timeout > function ... hence a short notice would've been nice. Sorry, I did ding you early on (and I think Mike dinged you as well), but you were apparently busy with other things at the time. I also sent email last Thursday, quoted below: On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote: > On Wed, Oct 30 2013 at 11:08am -0400, > Frank Mayhar <fmayhar@google.com> wrote: > > > On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote: > > > Any interest in this or should I just table it for >= v3.14? > > > > Sorry, I've been busy putting out another fire. Yes, there's definitely > > still interest. I grabbed your revised patch and tested with it. > > Unfortunately the timeout doesn't actually fire when requests are queued > > due to queue_if_no_path; IIRC the block request queue timeout logic > > wasn't triggering. I planned to look into it more deeply figure out why > > but I had to spend all last week fixing a nasty race and hadn't gotten > > back to it yet. > > OK, Hannes, any idea why this might be happening? The patch in question > is here: https://patchwork.kernel.org/patch/3070391/ I got to this today and so far the most interesting I see is that the cloned request that's queued in multipath has no queue associated with it when it's queued; a printk reveals: [ 517.610042] map_io: queueing rq ffff8801150e0070 q (null) When it's eventually dequeued, it gets a queue from the destination device (in the pgpath) via bdev_get_queue(). Because of this and from just looking at the code, blk_start_request() (and therefore blk_add_timer()) isn't being called for those requests, so there's never a chance that the timeout would happen. Does this make sense? Or am I totally off-base? -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v3] dm mpath: add a queue_if_no_path timeout 2013-11-06 15:43 ` Frank Mayhar @ 2013-11-06 19:21 ` Mike Snitzer 2013-11-07 1:03 ` Junichi Nomura 0 siblings, 1 reply; 49+ messages in thread From: Mike Snitzer @ 2013-11-06 19:21 UTC (permalink / raw) To: Frank Mayhar; +Cc: Jun'ichi Nomura, dm-devel, Mike, Alasdair G Kergon On Wed, Nov 06 2013 at 10:43am -0500, Frank Mayhar <fmayhar@google.com> wrote: > On Wed, 2013-11-06 at 07:54 +0100, Hannes Reinecke wrote: > > On 11/05/2013 05:02 PM, Frank Mayhar wrote: > > > This is the patch submitted by Jun'ichi Nomura, originally based on > > > Mike's patch with some small changes by me. Jun'ichi's description > > > follows, along with my changes: > > > > > > On Tue, 2013-11-05 at 07:18 -0800, Frank Mayhar wrote: > > >> On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote: > > >>> I slightly modified the patch: > > >>> - fixed the timeout handler to correctly find > > >>> clone request and "struct multipath" > > >>> - the timeout handler just disables "queue_if_no_path" > > >>> instead of killing the request directly > > >>> - "dmsetup status" to show the parameter > > >>> - changed an interface between dm core and target > > >>> - added some debugging printk (you can remove them) > > >>> and checked the timeout occurs, at least. > > >>> > > >>> I'm not sure if this feature is good or not though. > > >>> (The timer behavior is not intuitive, I think) > > >> Thanks! I integrated your new patch and tested it. Sure enough, it > > >> seems to work. I've made a few tweaks (added a module tunable and > > >> support for setting the timer in multipath_message(), removed your debug > > >> printks) and will submit the modified patch for discussion shortly. > > > > > > Comments? > > > > > Yeah. Seems to be my eternal fate; initiating fixes and not getting > > mentioned at all. > > Sigh. > > > > I dimly remember having sent the original patch for the blk timeout > > function ... hence a short notice would've been nice. > > Sorry, I did ding you early on (and I think Mike dinged you as well), > but you were apparently busy with other things at the time. Hi Frank, I wouldn't worry about this. You didn't even supply a patch header.. so it isn't like Hannes was obviously left out. Fact is this patch has had 4 iterations, the first of which from Hannes didn't compile or even make sense. Anyway, he'll get attribution through Suggested-by unless he wins the race to produces the first upstream-worthy variant of this line of work. So far it has all been RFC-style patches.. your most recent one that builds on Jun'ichi's patch with a module param and timeout default: We generally don't add module params to targets (but I can appreciate why you might want that.. just not seeing the need). And having a message to change the timeout conflicts with my desire to conditionally establish a timed_out method only if a timeout was specified (like my last reply in this thread suggested). Mike ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v3] dm mpath: add a queue_if_no_path timeout 2013-11-06 19:21 ` Mike Snitzer @ 2013-11-07 1:03 ` Junichi Nomura 0 siblings, 0 replies; 49+ messages in thread From: Junichi Nomura @ 2013-11-07 1:03 UTC (permalink / raw) To: device-mapper development; +Cc: Alasdair G Kergon, Frank Mayhar, Mike Snitzer On 11/07/13 04:21, Mike Snitzer wrote: > On Wed, Nov 06 2013 at 10:43am -0500, > Frank Mayhar <fmayhar@google.com> wrote: > >> On Wed, 2013-11-06 at 07:54 +0100, Hannes Reinecke wrote: >>> On 11/05/2013 05:02 PM, Frank Mayhar wrote: >>>> This is the patch submitted by Jun'ichi Nomura, originally based on >>>> Mike's patch with some small changes by me. Jun'ichi's description >>>> follows, along with my changes: >>>> >>>> On Tue, 2013-11-05 at 07:18 -0800, Frank Mayhar wrote: >>>>> On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote: >>>>>> I slightly modified the patch: >>>>>> - fixed the timeout handler to correctly find >>>>>> clone request and "struct multipath" >>>>>> - the timeout handler just disables "queue_if_no_path" >>>>>> instead of killing the request directly >>>>>> - "dmsetup status" to show the parameter >>>>>> - changed an interface between dm core and target >>>>>> - added some debugging printk (you can remove them) >>>>>> and checked the timeout occurs, at least. >>>>>> >>>>>> I'm not sure if this feature is good or not though. >>>>>> (The timer behavior is not intuitive, I think) >>>>> Thanks! I integrated your new patch and tested it. Sure enough, it >>>>> seems to work. I've made a few tweaks (added a module tunable and >>>>> support for setting the timer in multipath_message(), removed your debug >>>>> printks) and will submit the modified patch for discussion shortly. >>>> >>>> Comments? >>>> >>> Yeah. Seems to be my eternal fate; initiating fixes and not getting >>> mentioned at all. >>> Sigh. >>> >>> I dimly remember having sent the original patch for the blk timeout >>> function ... hence a short notice would've been nice. >> >> Sorry, I did ding you early on (and I think Mike dinged you as well), >> but you were apparently busy with other things at the time. > > Hi Frank, > > I wouldn't worry about this. You didn't even supply a patch header.. so > it isn't like Hannes was obviously left out. Fact is this patch has had > 4 iterations, the first of which from Hannes didn't compile or even make > sense. Anyway, he'll get attribution through Suggested-by unless he > wins the race to produces the first upstream-worthy variant of this line > of work. > > So far it has all been RFC-style patches.. your most recent one that > builds on Jun'ichi's patch with a module param and timeout default: We > generally don't add module params to targets (but I can appreciate why > you might want that.. just not seeing the need). And having a message > to change the timeout conflicts with my desire to conditionally > establish a timed_out method only if a timeout was specified (like my > last reply in this thread suggested). BTW, this approach of using blk timer might conflict with possible future removal of multipath internal queue: https://www.redhat.com/archives/dm-devel/2013-November/msg00025.html because the timer is stopped if the request is queued to block layer. To cope with that, we might have to extend block layer to run timer for queued request, etc. -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout 2013-10-30 15:43 ` Mike Snitzer 2013-10-30 18:09 ` Frank Mayhar @ 2013-10-31 14:59 ` Hannes Reinecke 1 sibling, 0 replies; 49+ messages in thread From: Hannes Reinecke @ 2013-10-31 14:59 UTC (permalink / raw) To: Mike Snitzer, Frank Mayhar; +Cc: dm-devel, Alasdair G. Kergon On 10/30/2013 04:43 PM, Mike Snitzer wrote: > On Wed, Oct 30 2013 at 11:08am -0400, > Frank Mayhar <fmayhar@google.com> wrote: > >> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote: >>> Any interest in this or should I just table it for >= v3.14? >> >> Sorry, I've been busy putting out another fire. Yes, there's definitely >> still interest. I grabbed your revised patch and tested with it. >> Unfortunately the timeout doesn't actually fire when requests are queued >> due to queue_if_no_path; IIRC the block request queue timeout logic >> wasn't triggering. I planned to look into it more deeply figure out why >> but I had to spend all last week fixing a nasty race and hadn't gotten >> back to it yet. > > OK, Hannes, any idea why this might be happening? The patch in question > is here: https://patchwork.kernel.org/patch/3070391/ > Not yet; currently I'm on vacation, but will be looking into it on Monday. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-10-17 19:03 ` Frank Mayhar 2013-10-17 19:15 ` Mike Snitzer @ 2013-10-21 16:05 ` Benjamin Marzinski 2013-10-21 16:17 ` Frank Mayhar 1 sibling, 1 reply; 49+ messages in thread From: Benjamin Marzinski @ 2013-10-21 16:05 UTC (permalink / raw) To: device-mapper development; +Cc: Frank Mayhar On Thu, Oct 17, 2013 at 12:03:10PM -0700, Frank Mayhar wrote: > Dragging this back up into the light... > > On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote: > > Frank, I had a look at your patch. It leaves a lot to be desired, I was > > starting to clean it up but ultimately found myself agreeing with > > Alasdair's original point: that this policy should be implemented in the > > userspace daemon. > > I've found and fixed a couple of bugs but I would still like to know > what issues you had with the patch. As I said before, I would be more > than happy to clean it up. > > In the time since we had this discussion, by the way, we ran into a > problem that a userspace daemon can't solve: That of shutdown. We ran > into a number of failures in which systems were hung for hours. It > turned out that they were caused by a regular system shutdown. Our > backing store is network-based and networking was getting killed before > applications (as is usually the case), leaving I/O outstanding on the > device. Since queue_if_no_path was set, the I/O wasn't dumped and our > daemon was killed by shutdown very shortly thereafter so it couldn't > recover (otherwise it would have cleaned things up). > Was multipathd force killed? What was the default configuration parameter "queue_without_daemon" set to? If "queue_without_daemon" is set to "no", multipathd should disable queueing when it is stopped. This was added specifically to avoid this issue. -Ben > With those I/Os sitting queued in multipath, with no network and no > daemon to turn off queue_if_no_path, the systems just sat. When we > finally diagnosed this, we realized that the timeout would work > perfectly to solve the problem, automatically turning queue_if_no_path > off shortly after the network went away without depending on the > intervention of the no-longer-running daemon. > > So how do you guys deal with this failure scenario? > -- > Frank Mayhar > 310-460-4042 > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-10-21 16:05 ` RFC for multipath " Benjamin Marzinski @ 2013-10-21 16:17 ` Frank Mayhar 2013-10-23 13:39 ` Benjamin Marzinski 0 siblings, 1 reply; 49+ messages in thread From: Frank Mayhar @ 2013-10-21 16:17 UTC (permalink / raw) To: Benjamin Marzinski; +Cc: device-mapper development On Mon, 2013-10-21 at 11:05 -0500, Benjamin Marzinski wrote: > On Thu, Oct 17, 2013 at 12:03:10PM -0700, Frank Mayhar wrote: > > Dragging this back up into the light... > > > > On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote: > > > Frank, I had a look at your patch. It leaves a lot to be desired, I was > > > starting to clean it up but ultimately found myself agreeing with > > > Alasdair's original point: that this policy should be implemented in the > > > userspace daemon. > > > > I've found and fixed a couple of bugs but I would still like to know > > what issues you had with the patch. As I said before, I would be more > > than happy to clean it up. > > > > In the time since we had this discussion, by the way, we ran into a > > problem that a userspace daemon can't solve: That of shutdown. We ran > > into a number of failures in which systems were hung for hours. It > > turned out that they were caused by a regular system shutdown. Our > > backing store is network-based and networking was getting killed before > > applications (as is usually the case), leaving I/O outstanding on the > > device. Since queue_if_no_path was set, the I/O wasn't dumped and our > > daemon was killed by shutdown very shortly thereafter so it couldn't > > recover (otherwise it would have cleaned things up). > > > > Was multipathd force killed? What was the default configuration > parameter "queue_without_daemon" set to? > > If "queue_without_daemon" is set to "no", multipathd should disable > queueing when it is stopped. This was added specifically to avoid this > issue. We don't run multipathd (we need our own daemon for various reasons) and we're also running an older kernel (based on 3.3), so we don't have "queue_without_daemon" yet. (In fact, I don't see it on a cursory glance at dm-mpath.c; where is it implemented?) -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-10-21 16:17 ` Frank Mayhar @ 2013-10-23 13:39 ` Benjamin Marzinski 0 siblings, 0 replies; 49+ messages in thread From: Benjamin Marzinski @ 2013-10-23 13:39 UTC (permalink / raw) To: Frank Mayhar; +Cc: device-mapper development On Mon, Oct 21, 2013 at 09:17:31AM -0700, Frank Mayhar wrote: > On Mon, 2013-10-21 at 11:05 -0500, Benjamin Marzinski wrote: > > On Thu, Oct 17, 2013 at 12:03:10PM -0700, Frank Mayhar wrote: > > > Dragging this back up into the light... > > > > > > On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote: > > > > Frank, I had a look at your patch. It leaves a lot to be desired, I was > > > > starting to clean it up but ultimately found myself agreeing with > > > > Alasdair's original point: that this policy should be implemented in the > > > > userspace daemon. > > > > > > I've found and fixed a couple of bugs but I would still like to know > > > what issues you had with the patch. As I said before, I would be more > > > than happy to clean it up. > > > > > > In the time since we had this discussion, by the way, we ran into a > > > problem that a userspace daemon can't solve: That of shutdown. We ran > > > into a number of failures in which systems were hung for hours. It > > > turned out that they were caused by a regular system shutdown. Our > > > backing store is network-based and networking was getting killed before > > > applications (as is usually the case), leaving I/O outstanding on the > > > device. Since queue_if_no_path was set, the I/O wasn't dumped and our > > > daemon was killed by shutdown very shortly thereafter so it couldn't > > > recover (otherwise it would have cleaned things up). > > > > > > > Was multipathd force killed? What was the default configuration > > parameter "queue_without_daemon" set to? > > > > If "queue_without_daemon" is set to "no", multipathd should disable > > queueing when it is stopped. This was added specifically to avoid this > > issue. > > We don't run multipathd (we need our own daemon for various reasons) and > we're also running an older kernel (based on 3.3), so we don't have > "queue_without_daemon" yet. (In fact, I don't see it on a cursory > glance at dm-mpath.c; where is it implemented?) It's implemented in multipathd, not the kernel, so won't be any help to you then. -Ben > -- > Frank Mayhar > 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 23:22 ` RFC for multipath queue_if_no_path timeout Alasdair G Kergon 2013-09-26 23:49 ` Mike Snitzer @ 2013-09-27 16:27 ` Frank Mayhar 1 sibling, 0 replies; 49+ messages in thread From: Frank Mayhar @ 2013-09-27 16:27 UTC (permalink / raw) To: Alasdair G Kergon; +Cc: dm-devel On Fri, 2013-09-27 at 00:22 +0100, Alasdair G Kergon wrote: > On Thu, Sep 26, 2013 at 10:47:13AM -0700, Frank Mayhar wrote: > > Launching it from ramdisk won't help, particularly, since it still goes > > through the block layer. The other stuff won't help if a (potentially > > unrelated) bug in the daemon happens to be being tickled at the same > > time, or if some dependency happens to be broken and _that's_ what's > > preventing the daemon from making progress. > > Then put more effort into debugging your daemon so it doesn't have > bugs that make it die? Implement the timeout in a robust independent > daemon if it's other code there that's unreliable? I'm not sure how to respond to this. Some fifty years of people programming computers appears to show unequivocally that you can't rely on code not having bugs no matter _how_ much effort you put into it. It's just the nature of the beast. > > And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind > > of environments most people have, but that's not the kind of environment > > (or scale) we have to deal with. > In what way are your requirements so different that a locked-into-memory > monitoring daemon cannot implement this timeout? If we could _have_ an independent, locked-into-memory monitoring daemon just for this purpose, we might be able to get by. It would still be iffy, for the reason I mention above; at our scale anything that _can_ fail _will_ fail, at least occasionally and often many times per day. Unfortunately, that's a non-starter for a number of reasons, including but not limited to the fact that the environment the daemon is running in is memory-constrained. Add to that the fact that the daemon we actually have depends on stuff that my team has no direct control over and we end up really needing an in-kernel way to deal with this. -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 17:14 RFC for multipath queue_if_no_path timeout Frank Mayhar 2013-09-26 17:24 ` Alasdair G Kergon @ 2013-09-26 17:41 ` Mike Snitzer 2013-09-26 17:55 ` Frank Mayhar 1 sibling, 1 reply; 49+ messages in thread From: Mike Snitzer @ 2013-09-26 17:41 UTC (permalink / raw) To: Frank Mayhar; +Cc: dm-devel On Thu, Sep 26 2013 at 1:14pm -0400, Frank Mayhar <fmayhar@google.com> wrote: > Hey, folks. We're using multipath as an in-kernel failover mechanism, > so that if an underlying device dies, multipath will switch to another > in its list. Further, we use queue_if_no_path so that a daemon can get > involved and replace the list if the kernel runs out of alternatives. > In testing, however, we ran into a problem. > > Obviously, if queue_if_no_path is on and multipath runs out of good > paths, the I/Os will sit there queued forever barring user intervention. > I was doing a lot of failure testing and encountered a daemon bug in > which it would abandon its recovery in the middle, leaving the list > intact and the I/Os queued, forever. We fixed the daemon Did you share the fix upstream yet? If not please do ;) > but the > problem is potentially still there if for some reason the daemon dies > and is not restarted. This is a problem not solely (or even primarily) > for the queued I/O, but also because things like slab shrink can get > stuck behind that I/O and then other stuff becomes stuck behind _that_ > (since tries to get locks held by shrink and may itself hold > semaphores), bringing the whole system to its knees in fairly short > order, to the point that it's impossible to even get in via the network > and reboot it. I have an existence proof that this is the case. :-) > > My idea to deal with this in the kernel was to introduce a timeout on > queue_if_no_path and make it settable either kernel-wide or per-table. > By default it's disabled and is only armed when multipath runs out of > valid paths and queue_if_no_path is on. It's disabled again on table > load. If the timeout ever fires, all that happens is that the handler > turns off queue_if_no_path; this causes all the outstanding I/O to get > EIO and unsticks things all the way up the chain. Losing those I/Os is > far better than losing the entire system. > > I've actually implemented this and it works. I've debated about talking > with you folks about it but figured it was worth a shot. I can post the > patch if you're interested. A timeout is always going to be racey. But obviously with enough testing you could arrive at a timeout that is reasonable for your needs.. but in general I just don't think a timeout to release the queuing is the right way to go. And I understand Alasdair's point about hardening multipathd and using a watchdog to restart it if it fails. Ultimately that is ideal. But if multipathd does have a bug that makes it incapable of handling a case (like the one you just fixed) it doesn't help to restart the daemon. Therefore I'm not opposed to some solution in kernel. But I'd think it would be the kernel equivalent to multipathd's "queue_without_daemon". AFAIK we currently don't have a way for the kernel to _know_ multipathd is running; but that doesn't mean such a mechanism couldn't be implemented. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 17:41 ` Mike Snitzer @ 2013-09-26 17:55 ` Frank Mayhar 2013-09-26 18:41 ` Mike Snitzer 0 siblings, 1 reply; 49+ messages in thread From: Frank Mayhar @ 2013-09-26 17:55 UTC (permalink / raw) To: Mike Snitzer; +Cc: dm-devel On Thu, 2013-09-26 at 13:41 -0400, Mike Snitzer wrote: > On Thu, Sep 26 2013 at 1:14pm -0400, > Frank Mayhar <fmayhar@google.com> wrote: > > Obviously, if queue_if_no_path is on and multipath runs out of good > > paths, the I/Os will sit there queued forever barring user intervention. > > I was doing a lot of failure testing and encountered a daemon bug in > > which it would abandon its recovery in the middle, leaving the list > > intact and the I/Os queued, forever. We fixed the daemon > Did you share the fix upstream yet? If not please do ;) It's a daemon we wrote so the fix only really applies to us, sorry. > A timeout is always going to be racey. But obviously with enough > testing you could arrive at a timeout that is reasonable for your > needs.. but in general I just don't think a timeout to release the > queuing is the right way to go. Having it as an admin-settable option seems reasonable to me, though. I agree you don't want one by default. I expect that the timeout that's actually used to be on the order of single-digit minutes. > And I understand Alasdair's point about hardening multipathd and using a > watchdog to restart it if it fails. Ultimately that is ideal. But if > multipathd does have a bug that makes it incapable of handling a case > (like the one you just fixed) it doesn't help to restart the daemon. Believe me, we're hardening out daemon as much as possible, but the reality is that there's always going to be some situation that wasn't anticipated. In our environment, that kind of stuff happens almost constantly. No matter how hardened the daemon, _something_ can keep it from doing its job. > Therefore I'm not opposed to some solution in kernel. But I'd think it > would be the kernel equivalent to multipathd's "queue_without_daemon". > AFAIK we currently don't have a way for the kernel to _know_ multipathd > is running; but that doesn't mean such a mechanism couldn't be > implemented. If you have a reasonable alternative I'm all ears. However instantly failing the I/O if the daemon isn't present and we run out of paths isn't a good answer for us. Setting a timeout only if the daemon isn't present is functionally equivalent to setting a timeout regardless and having a running daemon nearly instantly reload the table (thereby turning off the timeout). -- Frank Mayhar 310-460-4042 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: RFC for multipath queue_if_no_path timeout. 2013-09-26 17:55 ` Frank Mayhar @ 2013-09-26 18:41 ` Mike Snitzer 0 siblings, 0 replies; 49+ messages in thread From: Mike Snitzer @ 2013-09-26 18:41 UTC (permalink / raw) To: Frank Mayhar; +Cc: dm-devel On Thu, Sep 26 2013 at 1:55pm -0400, Frank Mayhar <fmayhar@google.com> wrote: > On Thu, 2013-09-26 at 13:41 -0400, Mike Snitzer wrote: > > On Thu, Sep 26 2013 at 1:14pm -0400, > > Frank Mayhar <fmayhar@google.com> wrote: > > > Obviously, if queue_if_no_path is on and multipath runs out of good > > > paths, the I/Os will sit there queued forever barring user intervention. > > > I was doing a lot of failure testing and encountered a daemon bug in > > > which it would abandon its recovery in the middle, leaving the list > > > intact and the I/Os queued, forever. We fixed the daemon > > Did you share the fix upstream yet? If not please do ;) > > It's a daemon we wrote so the fix only really applies to us, sorry. Ah, unfortunate to hear multipath-tools wasn't suitable for your needs. We don't need to get in an elaborate tangent on why it wasn't used but if there is something that would help improve multipath-tools in general please share. > > A timeout is always going to be racey. But obviously with enough > > testing you could arrive at a timeout that is reasonable for your > > needs.. but in general I just don't think a timeout to release the > > queuing is the right way to go. > > Having it as an admin-settable option seems reasonable to me, though. I > agree you don't want one by default. I expect that the timeout that's > actually used to be on the order of single-digit minutes. > > > And I understand Alasdair's point about hardening multipathd and using a > > watchdog to restart it if it fails. Ultimately that is ideal. But if > > multipathd does have a bug that makes it incapable of handling a case > > (like the one you just fixed) it doesn't help to restart the daemon. > > Believe me, we're hardening out daemon as much as possible, but the > reality is that there's always going to be some situation that wasn't > anticipated. In our environment, that kind of stuff happens almost > constantly. No matter how hardened the daemon, _something_ can keep it > from doing its job. > > > Therefore I'm not opposed to some solution in kernel. But I'd think it > > would be the kernel equivalent to multipathd's "queue_without_daemon". > > AFAIK we currently don't have a way for the kernel to _know_ multipathd > > is running; but that doesn't mean such a mechanism couldn't be > > implemented. > > If you have a reasonable alternative I'm all ears. However instantly > failing the I/O if the daemon isn't present and we run out of paths > isn't a good answer for us. I see your point, and making a decision based on the availability of the registered daemon (your custom thing or multipathd) is racey if the daemon has a bug that causes it to crash and the watchdog restarts it repeatedly; could result in false positive that makes the kernel think the daemon is there (only for it to fail). > Setting a timeout only if the daemon isn't > present is functionally equivalent to setting a timeout regardless and > having a running daemon nearly instantly reload the table (thereby > turning off the timeout). Not really following you here but the big point is you don't want immediate release of the queued IO (even if you were armed with the insight that there is no daemon to recover the paths)... you want that release to be time delayed. Pretty sure multipathd doesn't have the ability to dequeue multipath devices after a timeout; so while Alasdair said it was a conscious decision to have that feature be done in a userspace daemon I can easily see that a bug in said daemon could prevent it from staying up long enough to hit the configured timeout and then reload tables. It is moot considering you aren't using multipath-tools, but your daemon could also suffer from inability to reliably staying up to reload the tables on timeout. So all said, please post your dm-mpath patch that implements this queue_if_no_path_timeout feature for further consideration. I'd hope 0 disables it (the default) and that the unit of the timeout is in seconds. Also, I don't think it should be dm-mpath (kernel) wide... best to just have it be per multipath table. Thanks, Mike ^ permalink raw reply [flat|nested] 49+ messages in thread
end of thread, other threads:[~2013-11-07 1:03 UTC | newest] Thread overview: 49+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-09-26 17:14 RFC for multipath queue_if_no_path timeout Frank Mayhar 2013-09-26 17:24 ` Alasdair G Kergon 2013-09-26 17:31 ` Frank Mayhar 2013-09-26 17:38 ` Alasdair G Kergon 2013-09-26 17:47 ` Frank Mayhar 2013-09-26 17:52 ` Mike Snitzer 2013-09-26 20:36 ` [PATCH 1/1] dm mpath: Add timeout mechanism for queue_if_no_path Frank Mayhar 2013-09-26 23:22 ` RFC for multipath queue_if_no_path timeout Alasdair G Kergon 2013-09-26 23:49 ` Mike Snitzer 2013-09-27 6:07 ` Hannes Reinecke 2013-09-27 8:06 ` Hannes Reinecke 2013-09-27 8:37 ` Alasdair G Kergon 2013-09-27 13:52 ` Hannes Reinecke 2013-09-27 16:37 ` Frank Mayhar 2013-09-27 16:32 ` Frank Mayhar 2013-09-27 16:29 ` Frank Mayhar 2013-10-17 19:03 ` Frank Mayhar 2013-10-17 19:15 ` Mike Snitzer 2013-10-17 20:45 ` Frank Mayhar 2013-10-17 21:13 ` Mike Snitzer 2013-10-18 20:51 ` Frank Mayhar 2013-10-18 21:47 ` Alasdair G Kergon 2013-10-18 22:53 ` [RFC PATCH v2] dm mpath: add a " Mike Snitzer 2013-10-30 1:02 ` Mike Snitzer 2013-10-30 15:08 ` Frank Mayhar 2013-10-30 15:43 ` Mike Snitzer 2013-10-30 18:09 ` Frank Mayhar 2013-10-31 9:36 ` Junichi Nomura 2013-10-31 14:16 ` Frank Mayhar 2013-10-31 14:31 ` Alasdair G Kergon 2013-10-31 17:17 ` Frank Mayhar 2013-11-01 1:23 ` Junichi Nomura 2013-11-01 1:58 ` Junichi Nomura 2013-11-01 4:17 ` Junichi Nomura 2013-11-05 15:18 ` Frank Mayhar 2013-11-05 16:02 ` [RFC PATCH v3] " Frank Mayhar 2013-11-05 16:53 ` Mike Snitzer 2013-11-06 6:54 ` Hannes Reinecke 2013-11-06 15:43 ` Frank Mayhar 2013-11-06 19:21 ` Mike Snitzer 2013-11-07 1:03 ` Junichi Nomura 2013-10-31 14:59 ` [RFC PATCH v2] " Hannes Reinecke 2013-10-21 16:05 ` RFC for multipath " Benjamin Marzinski 2013-10-21 16:17 ` Frank Mayhar 2013-10-23 13:39 ` Benjamin Marzinski 2013-09-27 16:27 ` Frank Mayhar 2013-09-26 17:41 ` Mike Snitzer 2013-09-26 17:55 ` Frank Mayhar 2013-09-26 18:41 ` Mike Snitzer
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.