From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hannes Reinecke
Subject: Re: [dm-devel] LSF: Multipathing and path checking question
Date: Fri, 17 Apr 2009 09:50:37 +0200
Message-ID: <49E834CD.1090306@suse.de>
References: <49E7B845.70400@cs.wisc.edu>
In-Reply-To: <49E7B845.70400@cs.wisc.edu>
List-Id: linux-scsi@vger.kernel.org
To: device-mapper development
Cc: SCSI Mailing List

Hi Mike,

Mike Christie wrote:
> Hey,
>
> For this topic:
>
> -----------------------
> Next-Gen Multipathing
> ---------------------
> Dr. Hannes Reinecke
>
> ......
>
> Should path checkers use sd->state to check for errors or availability?
> ----------------------
>
> What was decided?
>
> Could this problem be fixed or helped if multipath tools always sets the
> fast io fail tmo for FC or the replacement_timeout for iscsi?
>
No, I already do this for FC (we should be checking the replacement_timeout,
too ...).

> If those are set then IO in the blocked queue and in the driver will get
> failed after fast io fail tmo/replacement_timeout seconds (driver has to
> implement a terminate rport IO callback and only mptfc does not now). So
> at this time, do we want to fail the path?
>
> Or are people thinking that we want to fail the path when the problem is
> initially detected, like when the LLD deletes the rport for FC, for example?
>
Well, the idea is the following:

The primary purpose of the path checkers is to check the availability of
the paths (my, that was easy :-).
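(As an aside on those two timeouts: both are plain sysfs attributes, so a
tool can set them without any transport-specific ioctls. Below is a minimal,
hedged sketch of where they live; the rport and session names are made-up
examples, not real devices, and writing the attributes of course needs root
and actual hardware.)

```python
import os

def fc_fast_io_fail_path(rport):
    """sysfs attribute holding fast_io_fail_tmo for an FC remote port.

    'rport' is an fc_remote_ports entry name, e.g. "rport-2:0-3" (example)."""
    return "/sys/class/fc_remote_ports/%s/fast_io_fail_tmo" % rport

def iscsi_recovery_tmo_path(session):
    """sysfs attribute for the iSCSI recovery/replacement timeout.

    'session' is an iscsi_session entry name, e.g. "session1" (example)."""
    return "/sys/class/iscsi_session/%s/recovery_tmo" % session

def set_tmo(path, seconds):
    """Write a timeout value; guarded so the sketch runs on any box."""
    if os.path.exists(path):
        with open(path, "w") as f:
            f.write(str(seconds))
        return True
    return False

# e.g. set_tmo(fc_fast_io_fail_path("rport-2:0-3"), 5)
```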
And the main problem we have with the path checkers is that they are using
actual SCSI commands to determine this, thereby incurring unrelated errors
(disk errors, delayed responses due to blocked-path behaviour or error
handling, etc.). So we have to invest quite a bit of logic to separate the
'true' path condition from unrelated errors, simply because we're checking
at the wrong level; the path state is maintained by the transport layer,
not by the SCSI layer.

So the suggestion here is to check the transport layer for the path states
and do away with the existing path_checker SG_IO mechanism. The secondary
use of the path checkers (determining inactive paths) will have to be
delegated to the priority callouts, which then have to arrange the paths
correctly.

The FC transport already maintains an attribute for the path state, and
even sends netlink events if and when this attribute changes. For iSCSI I
have to defer to your superior knowledge; of course it would be easiest if
iSCSI could send out the very same message FC does.

>
> Also for this one:
> -----------------------
> How to communicate a device went away:
> 1) send event to udev (uses netlink)
> -----------------------
>
> Is this an event when dev_loss_tmo fires or when the LLD first detects
> something like a link down (or any event it might block the rport for),
> or would it be for when the fast fail io tmo fires (when the fc class is
> going to fail running IO and incoming IO), or would we have events for
> all of them?
>
Currently the event is sent when the device itself is removed from sysfs.
And only then can we actually update the path maps and (possibly) change
to another path. We cannot do anything while the path is blocked (ie while
dev_loss_tmo is active), as we require this interval to capture jitter on
the line.
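(For reference: that removal event arrives as a kernel uevent on the
NETLINK_KOBJECT_UEVENT socket, as a summary line followed by NUL-separated
KEY=VALUE pairs. A minimal sketch of picking it apart follows; the payload
shown is synthetic, and opening the socket needs appropriate privileges.)

```python
import os
import socket

NETLINK_KOBJECT_UEVENT = 15  # from <linux/netlink.h>

def parse_uevent(data):
    """Split a raw uevent datagram into a dict of its KEY=VALUE fields.

    The leading "action@devpath" summary carries no '=' and is skipped."""
    fields = {}
    for part in data.split(b"\0"):
        if b"=" in part:
            key, _, val = part.partition(b"=")
            fields[key.decode()] = val.decode()
    return fields

def uevent_socket():
    """Open the kernel uevent broadcast socket (group 1)."""
    s = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                      NETLINK_KOBJECT_UEVENT)
    s.bind((os.getpid(), 1))
    return s

# Synthetic example of a path going away (device path is made up):
raw = b"remove@/devices/example/sdc\0ACTION=remove\0SUBSYSTEM=block\0"
ev = parse_uevent(raw)
# ev["ACTION"] == "remove"
```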
So we have this state diagram:

sdev state:  RUNNING <-> BLOCKED -> CANCEL
mpath state: path up <->         -> path down / remove from map

Notice the gap in the mpath row here; we cannot check the path state while
the sdev is blocked, as all I/O will be queued. And also note that we
currently lump two different multipath path states together; a path down is
basically always followed immediately by a path remove event.

However, when all paths are down (and queue_if_no_path is active) we might
run into a deadlock when a path comes back, as we might not have enough
memory to actually create the required structures.

The idea was to modify the state machine so that fast_io_fail_tmo is made
mandatory, which transitions the sdev into an intermediate state 'DISABLED'
and sends out a netlink message:

sdev state:  RUNNING <-> BLOCKED <-> DISABLED -> CANCEL
mpath state: path up <->         <-> path down -> remove from map

This will allow us to switch paths early, ie as soon as the sdev moves into
the 'DISABLED' state. But the path structures themselves are still alive,
so when a path comes back between 'DISABLED' and 'CANCEL' we won't have an
issue reconnecting it. And we could even allow setting dev_loss_tmo to
infinity, thereby simulating the 'old' behaviour.

However, this proposal didn't go through. Instead it was proposed to do
away with the unlimited queue_if_no_path setting and _always_ have a
timeout there, so that the machine is able to recover after a certain
period of time.

I still like my original proposal, though. Maybe we can do the EU
referendum thing and just ask again and again until everyone becomes tired
of it and just says 'yes' to get rid of this issue ...

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html