From: Hannes Reinecke <hare@suse.de>
To: Mike Christie <michaelc@cs.wisc.edu>
Cc: device-mapper development <dm-devel@redhat.com>,
SCSI Mailing List <linux-scsi@vger.kernel.org>
Subject: Re: [dm-devel] LSF: Multipathing and path checking question
Date: Mon, 20 Apr 2009 09:59:33 +0200
Message-ID: <49EC2B65.1050508@suse.de>
In-Reply-To: <49E89866.50205@cs.wisc.edu>
Hi Mike,
Mike Christie wrote:
> Hannes Reinecke wrote:
>>
>> FC Transport already maintains an attribute for the path state, and even
>> sends netlink events if and when this attribute changes. For iSCSI I have
>
> Are you referring to fc_host_post_event? Is this the same thing we talked
> about last year, where you wanted events? Is this in multipath tools now
> or just in the SLES ones?
>
Yep, that's the thing.
> For something like FCH_EVT_LINKDOWN, are you going to fail the path at
> that time or when would the multipath path be marked failed?
>
This is just a notification that the path has gone down. Fast fail / dev_loss_tmo
still applies, i.e. the path won't get switched at that point.
>
>
>> to defer to your superior knowledge; of course it would be easiest if
>> iSCSI could send out the very same message FC does.
>
> We can do something like fc_host_event_code for iscsi.
>
Oh, that'll be grand.
> Question on what you are needing:
>
> Do you mean you want to make fc_host_event_code more generic (there are
> some FC specific ones like lip_reset)? Put them in scsi-ml and send from
> a new netlink group that just sends these events?
>
> Or do you just want something similar from iscsi? iscsi will hook into
> the iscsi netlink code using scsi_netlink.c and then send an
> ISCSIH_EVT_LINKUP, ISCSIH_EVT_LINKDOWN, etc.
>
Well, actually, I don't care. It's just that if we go with the
proposal, we'll have to fix up all transports to present the path state
to userspace, preferably through both netlink events and sysfs attributes.
The actual implementation might well be transport-specific.
> What do the FCH_EVT_PORT_* ones mean?
>
FC stuff methinks. James S. should know better.
>
>
>>
>> Idea was to modify the state machine so that fast_fail_io_tmo is
>> being made mandatory, which transitions the sdev into an intermediate
>> state 'DISABLED' and sends out a netlink message.
>
>
> Above when you said, "No, I already do this for FC (should be checking
> the replacement_timeout, too ...)", did you mean that you have multipath
> tools always setting fast io fail now?
>
Yes, quite so. Look at
git://git.kernel.org/pub/scm/linux/kernel/git/hare/multipath-tools
branch sles11
for details.
> For iscsi the replacement_timeout is always set already. If you are going
> to add some code to multipath tools so that multipath sets this, I
> can make iscsi allow the replacement_timeout to be set from sysfs, as
> is done for FC's fast io fail.
>
Oh, that would be awesome. Currently I think we have a mismatch / race
condition between iSCSI and multipathing, where ERL in iSCSI actually
counteracts multipathing. But I'll be investigating that one shortly.
>
>
>>
>> sdev state: RUNNING <-> BLOCKED <-> DISABLED -> CANCEL
>> mpath state: path up <-> <stall> <-> path down -> remove from map
>>
>> This will allow us to switch paths early, i.e. when the sdev moves into
>> 'DISABLED' state. But the path structures themselves are still alive,
>> so when a path comes back between 'DISABLED' and 'CANCEL' we won't
>> have an issue reconnecting it. And we could even allow setting
>> dev_loss_tmo to infinity, thereby simulating the 'old' behaviour.
>>
>> However, this proposal didn't go through.
>
> You got my hopes up for a solution in the long explanation, then you
> destroyed them :)
>
Yes, same here. I really thought this to be a sensible proposal, but
then the discussion veered off into queue_if_no_path handling.
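
For reference, the proposed state machine quoted above can be written down as
a small transition table. This is only an illustrative sketch in Python, not
kernel or multipath-tools code: the names SdevState, TRANSITIONS, MPATH_STATE
and transition() are made up for the example, and the allowed transitions are
read literally off the two-line diagram.

```python
from enum import Enum


class SdevState(Enum):
    RUNNING = "RUNNING"
    BLOCKED = "BLOCKED"
    DISABLED = "DISABLED"   # the proposed intermediate state
    CANCEL = "CANCEL"


# Allowed transitions, taken literally from the diagram:
#   RUNNING <-> BLOCKED <-> DISABLED -> CANCEL
TRANSITIONS = {
    SdevState.RUNNING: {SdevState.BLOCKED},
    SdevState.BLOCKED: {SdevState.RUNNING, SdevState.DISABLED},
    SdevState.DISABLED: {SdevState.BLOCKED, SdevState.CANCEL},
    SdevState.CANCEL: set(),  # terminal: the path is removed from the map
}

# What multipath would report for each sdev state (second diagram line).
MPATH_STATE = {
    SdevState.RUNNING: "path up",
    SdevState.BLOCKED: "stall",
    SdevState.DISABLED: "path down",
    SdevState.CANCEL: "remove from map",
}


def transition(current, target):
    """Return `target` if the diagram allows current -> target, else raise."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

The point of the table is that 'DISABLED' is the only state that maps to
'path down' while still having a way back, which is what allows switching
paths early without tearing down the path structures.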
>
> Was the reason people did not like this because of the scsi device
> lifetime issue?
>
>
> I think we still want someone to set the fast io fail tmo for users when
> multipath is being used, because we want IO out of the queues and
> drivers and sent to the multipath layer before dev_loss_tmo if
> dev_loss_tmo is still going to be a lot longer. fast io fail tmo is
> usually less than 5 or 10 seconds, while for dev_loss_tmo it seems we
> still have users setting it to minutes.
>
Exactly. The point here is that with the current implementation we basically
_cannot_ return 'path down' anymore, as the path is either blocked (during
which time all I/O is stalled) or failed completely (i.e. in state 'CANCEL').
That is a bit of a detriment, and we actually run into quite some contention
when the path is removed, as we have to kill all I/O, fail over paths, remove
stale paths, update device-mapper tables etc.
By decoupling this and having the midlayer always return 'DID_TRANSPORT_DISRUPTED'
after fast_fail_io_tmo expires, we would be able to kill all I/O and switch paths
gracefully. Path removal and the device-mapper table update would then be done
later on, when dev_loss_tmo triggers.
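
To make the decoupling concrete, here is a minimal sketch of the intended
timeline, assuming both timers start when the link goes down. The function
path_state() and the action strings are hypothetical and exist only for this
illustration; they are not an existing kernel or multipath-tools interface.

```python
def path_state(elapsed, fast_io_fail_tmo, dev_loss_tmo):
    """Return (sdev state, multipath action) `elapsed` seconds after link down.

    Assumes both timers start at link down and that
    fast_io_fail_tmo < dev_loss_tmo.
    """
    if elapsed < fast_io_fail_tmo:
        # Link just went down: I/O is held, nothing is reported yet.
        return ("BLOCKED", "stall I/O")
    if elapsed < dev_loss_tmo:
        # Fast fail fired: I/O is failed back, multipath can switch paths,
        # but the path itself is kept so it can come back.
        return ("DISABLED", "fail I/O and switch paths")
    # dev_loss_tmo fired: only now is the path torn down.
    return ("CANCEL", "remove path and update device-mapper table")
```

With fast_io_fail_tmo=5 and dev_loss_tmo=60, the expensive cleanup happens at
60 seconds while failover already happened at 5 seconds. Passing
float("inf") as dev_loss_tmo keeps the path in 'DISABLED' indefinitely.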
>
> Can't the transport layers just send two events?
> 1. On the initial link down when the port/session is blocked.
> 2. When their fast io fail tmos fire.
>
Yes, that would be a good start.
> Today, instead of #2, the Red Hat multipath tools guy and I were talking
> about doing a probe with SG_IO. For example we would send down a path
> tester IO and then wait for it to be failed with DID_TRANSPORT_FAILFAST.
>
No, this is exactly what you cannot do. SG_IO will be stalled when the
sdev is BLOCKED and will only return a result _after_ the sdev transitions
_out_ of the BLOCKED state.
Translated to FC this means that whenever dev_loss_tmo is _active_ (!)
no I/O will be sent out, nor will any I/O result be returned to userland.
Hence using SG_IO for the path checker is a bad idea here.
Hence my proposal.
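
A toy model of the stall described above, assuming that any I/O submitted
while the sdev is BLOCKED is simply held until the sdev leaves that state and
that I/O latency is otherwise zero. probe_result_time() is a hypothetical
helper written for this illustration only.

```python
def probe_result_time(submit_at, blocked_from, blocked_until):
    """Time at which a checker I/O submitted at `submit_at` returns a result.

    Assumes I/O submitted while the sdev is BLOCKED is held until the sdev
    leaves BLOCKED at `blocked_until`; outside that window it completes
    immediately.
    """
    if blocked_from <= submit_at < blocked_until:
        return blocked_until
    return submit_at


# Link drops at t=0 and the sdev stays BLOCKED for 60 seconds.
# A checker polling every 5 seconds learns nothing until t=60:
results = [probe_result_time(t, 0, 60) for t in range(5, 60, 5)]
```

Every probe in `results` completes at the same instant the sdev leaves
BLOCKED, so polling more often does not detect the failure any earlier.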
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)