From: Hannes Reinecke
To: dm-devel@redhat.com, Frank Mayhar, Mike Snitzer
Subject: Re: RFC for multipath queue_if_no_path timeout.
Date: Fri, 27 Sep 2013 15:52:58 +0200
Message-ID: <52458DBA.5090109@suse.de>

On 09/27/2013 10:37 AM, Alasdair G Kergon wrote:
> But this still dodges the fundamental problem:
>
> What is the right value to use for the timeout?
> - How long should you wait for a path to (re)appear?
> - In the current model, reinstating a path is a userspace
>   responsibility.
>
And with my proposed patch it would still be userspace that sets the timeout.
Currently, no_path_retry is not a proper measure anyway, as it depends on the time multipathd takes to complete a path check round, which in turn depends on the number of devices, their state, etc.

> The timeout, as proposed, is being used in two conflicting ways:
> - How long to wait for path recovery when all paths went down
That would be set via the new 'no_path_timeout' feature, which would be set instead of the (multipath-internal) no_path_retry setting.
> - How long to wait when the system locks without enough free
>   memory even to reinstate a path (because of broken userspace
>   code) before having multipath fail queued I/O in a desperate
>   attempt at releasing memory to assist recovery
>
Do we even handle that case currently? Methinks this is precisely the use-case this is supposed to address.
When 'no_path_retry' is currently set _and_ we're running under a low-memory condition, there is quite a large likelihood that the multipath daemon will be killed by the OOM-killer or will be unable to send any dm messages down to the kernel, as the latter most likely require some memory allocations.
So in the current 'no_path_retry' scenario the maps would have been created with 'queue_if_no_path', and the daemon would have to reset the 'queue_if_no_path' flag once the no_path_retry count expires, which it might not be able to do in the scenario above.
So with the proposed 'no_path_timeout' we would enable the dm-mpath module to terminate all outstanding I/O, irrespective of any userland conditions. Which seems like an improvement to me ...

> The second case should point to a very short timeout.
> The first case probably wants a longer one.
>
> In my view the correct approach for the case Frank is discussing is to
> use a different trigger to detect the (approaching?) locking up of the
> system. E.g. should something related to the handling of an out
> of memory condition have a hook to instruct multipath to release such
> queued I/O?
>
Yeah, that was what I had planned for quite some time. But thinking it over, the no_path_timeout seems like a better approach here. (Plus we're hooking into the generic 'blk_timeout' mechanism, which would then allow blk_abort_request() to work.)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)