* LSF: Multipathing and path checking question
From: Mike Christie @ 2009-04-16 22:59 UTC
To: device-mapper development, SCSI Mailing List

Hey,

For this topic:

-----------------------
Next-Gen Multipathing
---------------------
Dr. Hannes Reinecke

......

Should path checkers use sd->state to check for errors or availability?
----------------------

What was decided?

Could this problem be fixed or helped if multipath tools always set the
fast io fail tmo for FC or the replacement_timeout for iscsi? If those are
set, then IO in the blocked queue and in the driver will get failed after
fast io fail tmo/replacement_timeout seconds (the driver has to implement
a terminate rport IO callback, and only mptfc does not do so now). So at
that time, do we want to fail the path?

Or are people thinking that we want to fail the path when the problem is
initially detected, like when the LLD deletes the rport for FC, for
example?

Also for this one:
-----------------------
How to communicate that a device went away:
1) send event to udev (uses netlink)
-----------------------

Is this an event for when dev_loss_tmo fires, or for when the LLD first
detects something like a link down (or any event it might block the rport
for), or for when the fast io fail tmo fires (when the fc class is going
to fail running IO and incoming IO), or would we have events for all of
them?
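For context, "setting the fast io fail tmo" amounts to writing a small
number of seconds to the FC remote port's fast_io_fail_tmo sysfs
attribute. Below is a minimal userspace sketch with a placeholder rport
name; the iSCSI replacement_timeout was an iscsid/session setting rather
than a sysfs knob at this point, so it is not shown.

#include <stdio.h>
#include <errno.h>

/*
 * Sketch: set fast_io_fail_tmo for one FC remote port so queued and
 * in-flight I/O is failed up to dm-multipath after "tmo" seconds instead
 * of waiting for dev_loss_tmo.  The rport name is a placeholder; real
 * names look like rport-<host>:<channel>-<id>.
 */
static int set_fast_io_fail(const char *rport, unsigned int tmo)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/class/fc_remote_ports/%s/fast_io_fail_tmo", rport);
	f = fopen(path, "w");
	if (!f)
		return -errno;
	fprintf(f, "%u\n", tmo);
	return fclose(f) ? -errno : 0;
}

int main(void)
{
	/* "rport-4:0-1" is only an example name */
	if (set_fast_io_fail("rport-4:0-1", 5))
		perror("fast_io_fail_tmo");
	return 0;
}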
* Re: [dm-devel] LSF: Multipathing and path checking question
From: Hannes Reinecke @ 2009-04-17 7:50 UTC
To: device-mapper development; +Cc: SCSI Mailing List

Hi Mike,

Mike Christie wrote:
> Hey,
>
> For this topic:
>
> -----------------------
> Next-Gen Multipathing
> ---------------------
> Dr. Hannes Reinecke
>
> ......
>
> Should path checkers use sd->state to check for errors or availability?
> ----------------------
>
> What was decided?
>
> Could this problem be fixed or helped if multipath tools always set the
> fast io fail tmo for FC or the replacement_timeout for iscsi?
>
No, I already do this for FC (should be checking the replacement_timeout,
too ...)

> If those are set, then IO in the blocked queue and in the driver will get
> failed after fast io fail tmo/replacement_timeout seconds (the driver has
> to implement a terminate rport IO callback, and only mptfc does not do so
> now). So at that time, do we want to fail the path?
>
> Or are people thinking that we want to fail the path when the problem is
> initially detected, like when the LLD deletes the rport for FC, for
> example?
>
Well, the idea is the following:

The primary purpose of the path checkers is to check the availability of
the paths (my, that was easy :-). And the main problem we have with the
path checkers is that they are using actual SCSI commands to determine
this, thereby incurring unrelated errors (disk errors, delayed responses
due to blocked-path behaviour or error handling, etc.). So we have to
invest quite a bit of logic to separate the 'true' path condition from
unrelated errors, simply because we're checking at the wrong level; the
path state is maintained by the transport layer, not by the SCSI layer.

So the suggestion here is to check the transport layer for the path states
and do away with the existing path_checker SG_IO mechanism. The secondary
use of the path checkers (determining inactive paths) will have to be
delegated to the priority callouts, which then have to arrange the paths
correctly.

FC Transport already maintains an attribute for the path state, and even
sends netlink events if and when this attribute changes. For iSCSI I have
to defer to your superior knowledge; of course it would be easiest if
iSCSI could send out the very same message FC does.

> Also for this one:
> -----------------------
> How to communicate that a device went away:
> 1) send event to udev (uses netlink)
> -----------------------
>
> Is this an event for when dev_loss_tmo fires, or for when the LLD first
> detects something like a link down (or any event it might block the rport
> for), or for when the fast io fail tmo fires (when the fc class is going
> to fail running IO and incoming IO), or would we have events for all of
> them?
>
Currently the event is sent when the device itself is removed from sysfs.
And only then can we actually update the path maps and (possibly) change
to another path. We cannot do anything when the path is blocked (ie when
dev_loss_tmo is active) as we require this interval to capture jitter on
the line. So we have this state diagram:

sdev state:  RUNNING <-> BLOCKED -> CANCEL
mpath state: path up <-> <stall> -> path down / remove from map

Notice the '<stall>' here; we cannot check the path state when the sdev is
blocked as all I/O will be queued.
And also note that we now lump two different multipath path states
together; a path down is basically always followed immediately by a path
remove event. However, when all paths are down (and queue_if_no_path is
active) we might run into a deadlock when a path comes back, as we might
not have enough memory to actually create the required structures.

The idea was to modify the state machine so that fast_io_fail_tmo is made
mandatory, which transitions the sdev into an intermediate state
'DISABLED' and sends out a netlink message:

sdev state:  RUNNING <-> BLOCKED <-> DISABLED  -> CANCEL
mpath state: path up <-> <stall> <-> path down -> remove from map

This will allow us to switch paths early, ie when it moves into the
'DISABLED' state. But the path structures themselves are still alive, so
when a path comes back between 'DISABLED' and 'CANCEL' we won't have an
issue reconnecting it. And we could even allow setting dev_loss_tmo to
infinity, thereby simulating the 'old' behaviour.

However, this proposal didn't go through. Instead it was proposed to do
away with the unlimited queue_if_no_path setting and _always_ have a
timeout there, so that the machine is able to recover after a certain
period of time.

I still like my original proposal, though. Maybe we can do the EU
referendum thing and just ask again and again until everyone becomes
tired of it and just says 'yes' to get rid of this issue ...

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
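To make the proposed state machine concrete, here is a small illustrative
transition table. The state and event names are hypothetical (the
'DISABLED' state was never merged); this is a sketch, not kernel code.

/* Illustrative model of the proposed sdev/mpath state machine. */
enum sdev_path_state { PS_RUNNING, PS_BLOCKED, PS_DISABLED, PS_CANCEL };

enum path_event {
	EV_LINK_DOWN,    /* LLD blocks the rport/session              */
	EV_LINK_UP,      /* path came back before any timer fired     */
	EV_FAST_IO_FAIL, /* fast_io_fail_tmo expired: fail all I/O    */
	EV_DEV_LOSS,     /* dev_loss_tmo expired: remove the sdev     */
};

static enum sdev_path_state next_state(enum sdev_path_state s,
				       enum path_event e)
{
	switch (s) {
	case PS_RUNNING:
		return e == EV_LINK_DOWN ? PS_BLOCKED : s;
	case PS_BLOCKED:          /* multipath stalls, I/O is queued   */
		if (e == EV_LINK_UP)
			return PS_RUNNING;
		if (e == EV_FAST_IO_FAIL)
			return PS_DISABLED;
		return s;
	case PS_DISABLED:         /* path marked down, structures kept */
		if (e == EV_LINK_UP)
			return PS_RUNNING;
		if (e == EV_DEV_LOSS)
			return PS_CANCEL;
		return s;
	default:                  /* PS_CANCEL: path removed from map  */
		return s;
	}
}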
* Re: [dm-devel] LSF: Multipathing and path checking question
From: Mike Christie @ 2009-04-17 14:55 UTC
To: Hannes Reinecke; +Cc: device-mapper development, SCSI Mailing List

Hannes Reinecke wrote:
>
> FC Transport already maintains an attribute for the path state, and even
> sends netlink events if and when this attribute changes. For iSCSI I have

Are you referring to fc_host_post_event? Is this the same thing we talked
about last year, where you wanted events? Is this in multipath tools now
or just in the SLES ones?

For something like FCH_EVT_LINKDOWN, are you going to fail the path at
that time, or when would the multipath path be marked failed?

> to defer to your superior knowledge; of course it would be easiest if
> iSCSI could send out the very same message FC does.

We can do something like fc_host_event_code for iscsi.

Question on what you are needing:

Do you mean you want to make fc_host_event_code more generic (there are
some FC-specific ones like lip_reset)? Put them in scsi-ml and send them
from a new netlink group that just sends these events?

Or do you just want something similar from iscsi? iscsi will hook into the
netlink code using scsi_netlink.c and then send ISCSIH_EVT_LINKUP,
ISCSIH_EVT_LINKDOWN, etc.

What do the FCH_EVT_PORT_* ones mean?

>
> Idea was to modify the state machine so that fast_io_fail_tmo is
> being made mandatory, which transitions the sdev into an intermediate
> state 'DISABLED' and sends out a netlink message.

Above when you said, "No, I already do this for FC (should be checking
the replacement_timeout, too ...)", did you mean that you have multipath
tools always setting fast io fail now?

For iscsi the replacement_timeout is always set already. If from multipath
tools you are going to add some code so multipath sets this, I can make
iscsi allow the replacement_timeout to be set from sysfs like is done for
FC's fast io fail.

>
> sdev state:  RUNNING <-> BLOCKED <-> DISABLED  -> CANCEL
> mpath state: path up <-> <stall> <-> path down -> remove from map
>
> This will allow us to switch paths early, ie when it moves into the
> 'DISABLED' state. But the path structures themselves are still alive,
> so when a path comes back between 'DISABLED' and 'CANCEL' we won't
> have an issue reconnecting it. And we could even allow setting
> dev_loss_tmo to infinity, thereby simulating the 'old' behaviour.
>
> However, this proposal didn't go through.

You got my hopes up for a solution in the long explanation, then you
destroyed them :)

Was the reason people did not like this because of the scsi device
lifetime issue?

I think we still want someone to set the fast io fail tmo for users when
multipath is being used, because we want IO out of the queues and drivers
and sent to the multipath layer before dev_loss_tmo if dev_loss_tmo is
still going to be a lot longer. fast io fail tmo is usually less than 10
or 5 seconds, and for dev_loss_tmo it seems like we still have users
setting that to minutes.

Can't the transport layers just send two events?
1. On the initial link down when the port/session is blocked.
2. When their fast io fail tmos fire.

Today, instead of #2, the Red Hat multipath tools guy and I were talking
about doing a probe with SG_IO. For example we would send down a path
tester IO and then wait for it to be failed with DID_TRANSPORT_FAILFAST.
Or for #2, if we cannot have a new event, can we send a transport-level
bsg request? For iscsi this would be a nop. For FC, I am not sure what it
would be?
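The SG_IO probe being discussed would look roughly like the sketch below:
issue TEST UNIT READY and inspect the returned host byte. The device node
is a placeholder and the two host-byte values are taken from the kernel's
scsi.h of this era; as noted later in the thread, such a probe can stall
while the sdev is blocked.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Host-byte codes as defined in the kernel's scsi.h of this era. */
#define DID_TRANSPORT_DISRUPTED 0x0e
#define DID_TRANSPORT_FAILFAST  0x0f

/* Sketch of the discussed SG_IO path probe: send TEST UNIT READY and
 * report how the transport completed it.  /dev/sdc is a placeholder. */
int main(void)
{
	unsigned char tur[6] = { 0x00, 0, 0, 0, 0, 0 }; /* TEST UNIT READY */
	unsigned char sense[32];
	struct sg_io_hdr hdr;
	int fd = open("/dev/sdc", O_RDONLY | O_NONBLOCK);

	if (fd < 0)
		return 1;
	memset(&hdr, 0, sizeof(hdr));
	hdr.interface_id = 'S';
	hdr.dxfer_direction = SG_DXFER_NONE;
	hdr.cmdp = tur;
	hdr.cmd_len = sizeof(tur);
	hdr.sbp = sense;
	hdr.mx_sb_len = sizeof(sense);
	hdr.timeout = 30000;                            /* milliseconds */

	if (ioctl(fd, SG_IO, &hdr) < 0)
		perror("SG_IO");
	else if (hdr.host_status == DID_TRANSPORT_FAILFAST ||
		 hdr.host_status == DID_TRANSPORT_DISRUPTED)
		printf("path down (host_status 0x%x)\n", hdr.host_status);
	else
		printf("path usable (host_status 0x%x)\n", hdr.host_status);
	close(fd);
	return 0;
}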
* Re: LSF: Multipathing and path checking question
From: Mike Christie @ 2009-04-17 15:21 UTC
To: Hannes Reinecke; +Cc: device-mapper development, SCSI Mailing List

Oops, I mashed two topics together. See below.

Mike Christie wrote:
> Hannes Reinecke wrote:
>>
>> FC Transport already maintains an attribute for the path state, and even
>> sends netlink events if and when this attribute changes. For iSCSI I have
>
> Are you referring to fc_host_post_event? Is this the same thing we talked
> about last year, where you wanted events? Is this in multipath tools now
> or just in the SLES ones?
>
> For something like FCH_EVT_LINKDOWN, are you going to fail the path at
> that time, or when would the multipath path be marked failed?
>

I was asking this because it seems we have people always filing bugzillas
saying they did not want the path to be marked failed for short problems.

There was the problem where we might get DID_ERROR for a temporarily
dropped frame. That would be fixed by just listening to transport events
like you explained.

But then I thought there was the case where, if we get a linkdown then a
linkup within a couple of seconds, we would not want to transition the
multipath path state.

So below, while you were talking about when to remove the device, I was
talking about when to mark the path failed.

> You got my hopes up for a solution in the long explanation, then you
> destroyed them :)
>
> Was the reason people did not like this because of the scsi device
> lifetime issue?
>
> I think we still want someone to set the fast io fail tmo for users when
> multipath is being used, because we want IO out of the queues and drivers
> and sent to the multipath layer before dev_loss_tmo if dev_loss_tmo is
> still going to be a lot longer. fast io fail tmo is usually less than 10
> or 5 seconds, and for dev_loss_tmo it seems like we still have users
> setting that to minutes.
>
> Can't the transport layers just send two events?
> 1. On the initial link down when the port/session is blocked.
> 2. When their fast io fail tmos fire.

So for #2, I just want a way to figure out when the transport is giving up
on executing IO and is going to fail everything. At that time, I was
thinking we want to mark the path failed.

I guess if multipath tools is going to set fast io fail, it could also use
that as its down timer to decide when to fail the path, and not have to
send SG_IO or a bsg transport command.

> Today, instead of #2, the Red Hat multipath tools guy and I were talking
> about doing a probe with SG_IO. For example we would send down a path
> tester IO and then wait for it to be failed with DID_TRANSPORT_FAILFAST.
>
> Or for #2, if we cannot have a new event, can we send a transport-level
> bsg request? For iscsi this would be a nop. For FC, I am not sure what
> it would be?
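The "down timer" idea floated here could look like the following sketch
inside a monitoring daemon; all names are hypothetical and this is not
multipath-tools code.

#include <time.h>

/* Hypothetical per-path bookkeeping inside a monitoring daemon. */
struct mp_path {
	int failed;            /* path marked failed in the map        */
	time_t down_since;     /* 0 = link currently considered up     */
	unsigned int fail_tmo; /* mirrors the fast io fail tmo setting */
};

static void on_link_down(struct mp_path *p, time_t now)
{
	if (!p->down_since)
		p->down_since = now;    /* only arm the down timer      */
}

static void on_link_up(struct mp_path *p)
{
	p->down_since = 0;              /* short blip: never failed     */
	p->failed = 0;
}

/* Called periodically; fail the path only once the transport itself is
 * about to start failing I/O. */
static void on_tick(struct mp_path *p, time_t now)
{
	if (p->down_since && !p->failed &&
	    now - p->down_since >= p->fail_tmo)
		p->failed = 1;          /* switch paths at this point   */
}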
* Re: [dm-devel] LSF: Multipathing and path checking question
From: Hannes Reinecke @ 2009-04-20 8:19 UTC
To: Mike Christie; +Cc: device-mapper development, SCSI Mailing List

Hi Mike,

Mike Christie wrote:
> Oops, I mashed two topics together. See below.
>
> Mike Christie wrote:
>> Hannes Reinecke wrote:
>>>
>>> FC Transport already maintains an attribute for the path state, and even
>>> sends netlink events if and when this attribute changes. For iSCSI I
>>> have
>>
>> Are you referring to fc_host_post_event? Is this the same thing we talked
>> about last year, where you wanted events? Is this in multipath tools
>> now or just in the SLES ones?
>>
>> For something like FCH_EVT_LINKDOWN, are you going to fail the path at
>> that time, or when would the multipath path be marked failed?
>>
>
> I was asking this because it seems we have people always filing
> bugzillas saying they did not want the path to be marked failed for
> short problems.
>
> There was the problem where we might get DID_ERROR for a temporarily
> dropped frame. That would be fixed by just listening to transport events
> like you explained.
>
> But then I thought there was the case where, if we get a linkdown then a
> linkup within a couple of seconds, we would not want to transition the
> multipath path state.
>
> So below, while you were talking about when to remove the device, I was
> talking about when to mark the path failed.
>

I have the same bugzillas, too :-)

My proposal is to handle this in several stages:

- path fails
  -> Send out netlink event
  -> start dev_loss_tmo and fast_io_fail timer
  -> fast_io_fail timer triggers: Abort all outstanding I/O with
     DID_TRANSPORT_DISRUPTED, return DID_TRANSPORT_FAILFAST for
     any future I/O, and send out netlink event.
  -> dev_loss_tmo timer triggers: Remove sdev and clean up rport.
     The netlink event is sent implicitly by removing the sdev.

Multipath would then interact with this sequence by:

- Upon receiving the 'path failed' event: mark the path as 'ghost' or
  'blocked', ie no I/O is currently possible and it will be queued (no
  path switch yet).
- Upon receiving the 'fast_io_fail' event: switch paths and resubmit
  queued I/Os.
- Upon receiving the 'path removed' event: remove the path from internal
  structures, update multipath maps etc.

The time between 'path failed' and 'fast_io_fail triggers' would then be
able to capture any jitter / intermittent failures. Between 'fast_io_fail
triggers' and 'path removed' the path would be held in some sort of
'limbo' in case it comes back again, eg for maintenance / SP update etc.
And we can even increase this one to rather long timespans (eg hours) to
give the admin enough time for a manual intervention.

I still like this proposal as it makes multipath interaction far cleaner.
And we can do away with path checkers completely here.

>
>> You got my hopes up for a solution in the long explanation, then
>> you destroyed them :)
>>
>> Was the reason people did not like this because of the scsi device
>> lifetime issue?
>>
>> I think we still want someone to set the fast io fail tmo for users
>> when multipath is being used, because we want IO out of the queues and
>> drivers and sent to the multipath layer before dev_loss_tmo if
>> dev_loss_tmo is still going to be a lot longer. fast io fail tmo is
>> usually less than 10 or 5 seconds, and for dev_loss_tmo it seems like
>> we still have users setting that to minutes.
>>
>> Can't the transport layers just send two events?
>> 1. On the initial link down when the port/session is blocked.
>> 2. When their fast io fail tmos fire.
>
> So for #2, I just want a way to figure out when the transport is giving
> up on executing IO and is going to fail everything. At that time, I was
> thinking we want to mark the path failed.
>
See above. Exactly my proposal.

> I guess if multipath tools is going to set fast io fail, it could also
> use that as its down timer to decide when to fail the path, and not have
> to send SG_IO or a bsg transport command.
>
But that's a bit of out-guessing the midlayer, no?
We're instructing the midlayer to fail all I/O at one point; so it makes
far more sense to me to have the midlayer tell us when this is going to
happen instead of trying to figure this one out ourselves.

For starters we should just send a netlink event when fast_io_fail has
fired. We could easily integrate that one into multipathd and would gain
an instant benefit from it, as we could switch paths in advance.
Next step would be to implement an additional sdev state which would
return 'DID_TRANSPORT_FAILFAST' for any 'normal' I/O; it would be
inserted between 'RUNNING' and 'CANCEL'.
Transitions would be possible between 'RUNNING' and 'FASTFAIL', but
it would only be possible to transition into 'CANCEL' from 'FASTFAIL'.

Oh, and of course we have to persuade Eric Moore et al to implement
fast_io_fail in mptfc ...

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
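A sketch of the daemon side of the staged proposal above, mapping the
three stages to path-state changes; the event and state names are
hypothetical, since no such netlink messages existed at this point.

/* Hypothetical transport events for the three stages described above. */
enum transport_event { TP_PATH_FAILED, TP_FAST_IO_FAIL, TP_PATH_REMOVED };

enum mp_path_state { MP_PATH_UP, MP_PATH_GHOST, MP_PATH_DOWN, MP_PATH_GONE };

/* Hypothetical daemon-side dispatch: block first, switch paths when the
 * transport starts failing I/O, and only tear the path down on removal. */
static void handle_transport_event(enum mp_path_state *state,
				   enum transport_event ev)
{
	switch (ev) {
	case TP_PATH_FAILED:         /* rport/session blocked            */
		*state = MP_PATH_GHOST;  /* queue I/O, no path switch    */
		break;
	case TP_FAST_IO_FAIL:        /* transport now fails all I/O      */
		*state = MP_PATH_DOWN;   /* switch paths, resubmit I/O   */
		break;
	case TP_PATH_REMOVED:        /* dev_loss_tmo fired, sdev removed */
		*state = MP_PATH_GONE;   /* drop path, reload the map    */
		break;
	}
}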
* Re: LSF: Multipathing and path checking question
From: Mike Christie @ 2009-04-20 19:23 UTC
To: Hannes Reinecke; +Cc: device-mapper development, SCSI Mailing List

Hannes Reinecke wrote:
> Hi Mike,
>
> Mike Christie wrote:
>> Oops, I mashed two topics together. See below.
>>
>> Mike Christie wrote:
>>> Hannes Reinecke wrote:
>>>> FC Transport already maintains an attribute for the path state, and even
>>>> sends netlink events if and when this attribute changes. For iSCSI I
>>>> have
>>> Are you referring to fc_host_post_event? Is this the same thing we talked
>>> about last year, where you wanted events? Is this in multipath tools
>>> now or just in the SLES ones?
>>>
>>> For something like FCH_EVT_LINKDOWN, are you going to fail the path at
>>> that time, or when would the multipath path be marked failed?
>>>
>> I was asking this because it seems we have people always filing
>> bugzillas saying they did not want the path to be marked failed for
>> short problems.
>>
>> There was the problem where we might get DID_ERROR for a temporarily
>> dropped frame. That would be fixed by just listening to transport events
>> like you explained.
>>
>> But then I thought there was the case where, if we get a linkdown then a
>> linkup within a couple of seconds, we would not want to transition the
>> multipath path state.
>>
>> So below, while you were talking about when to remove the device, I was
>> talking about when to mark the path failed.
>>
> I have the same bugzillas, too :-)
>
> My proposal is to handle this in several stages:
>
> - path fails
>   -> Send out netlink event
>   -> start dev_loss_tmo and fast_io_fail timer
>   -> fast_io_fail timer triggers: Abort all outstanding I/O with
>      DID_TRANSPORT_DISRUPTED, return DID_TRANSPORT_FAILFAST for
>      any future I/O, and send out netlink event.

This is almost done. The IOs are failed. There is no netlink event yet.

>   -> dev_loss_tmo timer triggers: Remove sdev and clean up rport.
>      The netlink event is sent implicitly by removing the sdev.
>
> Multipath would then interact with this sequence by:
>
> - Upon receiving the 'path failed' event: mark the path as 'ghost' or
>   'blocked', ie no I/O is currently possible and it will be queued (no
>   path switch yet).
> - Upon receiving the 'fast_io_fail' event: switch paths and resubmit
>   queued I/Os.
> - Upon receiving the 'path removed' event: remove the path from internal
>   structures, update multipath maps etc.
>
> The time between 'path failed' and 'fast_io_fail triggers' would then be
> able to capture any jitter / intermittent failures. Between 'fast_io_fail
> triggers' and 'path removed' the path would be held in some sort of
> 'limbo' in case it comes back again, eg for maintenance / SP update etc.
> And we can even increase this one to rather long timespans (eg hours) to
> give the admin enough time for a manual intervention.
>
> I still like this proposal as it makes multipath interaction far cleaner.
> And we can do away with path checkers completely here.
>
>>> You got my hopes up for a solution in the long explanation, then
>>> you destroyed them :)
>>>
>>> Was the reason people did not like this because of the scsi device
>>> lifetime issue?
>>>
>>> I think we still want someone to set the fast io fail tmo for users
>>> when multipath is being used, because we want IO out of the queues and
>>> drivers and sent to the multipath layer before dev_loss_tmo if
>>> dev_loss_tmo is still going to be a lot longer. fast io fail tmo is
>>> usually less than 10 or 5 seconds, and for dev_loss_tmo it seems like
>>> we still have users setting that to minutes.
>>>
>>> Can't the transport layers just send two events?
>>> 1. On the initial link down when the port/session is blocked.
>>> 2. When their fast io fail tmos fire.
>>
>> So for #2, I just want a way to figure out when the transport is giving
>> up on executing IO and is going to fail everything. At that time, I was
>> thinking we want to mark the path failed.
>>
> See above. Exactly my proposal.
>
>> I guess if multipath tools is going to set fast io fail, it could also
>> use that as its down timer to decide when to fail the path, and not have
>> to send SG_IO or a bsg transport command.
>>
> But that's a bit of out-guessing the midlayer, no?

Yeah, agree. Just brainstorming.

> We're instructing the midlayer to fail all I/O at one point; so it makes
> far more sense to me to have the midlayer tell us when this is going to
> happen instead of trying to figure this one out ourselves.
>
> For starters we should just send a netlink event when fast_io_fail has
> fired. We could easily integrate that one into multipathd and would gain
> an instant benefit from it, as we could switch paths in advance.
> Next step would be to implement an additional sdev state which would
> return 'DID_TRANSPORT_FAILFAST' for any 'normal' I/O; it would be
> inserted between 'RUNNING' and 'CANCEL'.
> Transitions would be possible between 'RUNNING' and 'FASTFAIL', but
> it would only be possible to transition into 'CANCEL' from 'FASTFAIL'.
>

Yeah, a new sdev state might be nice. Right now this state is handled by
the classes. For iscsi and FC the port/session will be in
blocked/ISCSI_SESSION_FAILED. Then internally the classes are deciding
what to do with IO in the *_chkready functions.

> Oh, and of course we have to persuade Eric Moore et al to implement
> fast_io_fail in mptfc ...

Yeah, the last holdout, not counting the old qlogic driver. But actually,
in the current code, if you just set the fast io fail tmo, all IO in the
block queues and any incoming IO will get failed. It is sort of partial
support. Even if you cannot kill IO in the driver because you do not have
the terminate rport IO callback, you can at least get the queues cleared,
so that IO does not sit in there.
* Re: LSF: Multipathing and path checking question
From: Mike Christie @ 2009-04-20 23:02 UTC
To: device-mapper development; +Cc: SCSI Mailing List

Mike Christie wrote:
>> For starters we should just send a netlink event when fast_io_fail has
>> fired. We could easily integrate that one into multipathd and would gain
>> an instant benefit from it, as we could switch paths in advance.
>> Next step would be to implement an additional sdev state which would
>> return 'DID_TRANSPORT_FAILFAST' for any 'normal' I/O; it would be
>> inserted between 'RUNNING' and 'CANCEL'.
>> Transitions would be possible between 'RUNNING' and 'FASTFAIL', but
>> it would only be possible to transition into 'CANCEL' from 'FASTFAIL'.
>>
>
> Yeah, a new sdev state might be nice. Right now this state is handled by
> the classes. For iscsi and FC the port/session will be in
> blocked/ISCSI_SESSION_FAILED. Then internally the classes are deciding
> what to do with IO in the *_chkready functions.
>

How about setting the device to the offline state for this case where
fast_io_fail has fired but dev_loss_tmo has not yet fired? As far as
failing IO goes, we get the same result. scsi-ml would fail the incoming
IO instead of it getting to the class _chkready functions, but the scsi
device state would indicate that it cannot execute IO, which might be nice
for users.

Can we not do this because offline only means the scsi-eh has put the
device offline because it could not recover it, or is it more generic,
covering any time the device cannot execute IO?
* Re: [dm-devel] LSF: Multipathing and path checking question
From: Hannes Reinecke @ 2009-04-21 7:26 UTC
To: Mike Christie; +Cc: device-mapper development, SCSI Mailing List

Mike Christie wrote:
> Mike Christie wrote:
>>> For starters we should just send a netlink event when fast_io_fail has
>>> fired. We could easily integrate that one into multipathd and would gain
>>> an instant benefit from it, as we could switch paths in advance.
>>> Next step would be to implement an additional sdev state which would
>>> return 'DID_TRANSPORT_FAILFAST' for any 'normal' I/O; it would be
>>> inserted between 'RUNNING' and 'CANCEL'.
>>> Transitions would be possible between 'RUNNING' and 'FASTFAIL', but
>>> it would only be possible to transition into 'CANCEL' from 'FASTFAIL'.
>>>
>>
>> Yeah, a new sdev state might be nice. Right now this state is handled
>> by the classes. For iscsi and FC the port/session will be in
>> blocked/ISCSI_SESSION_FAILED. Then internally the classes are deciding
>> what to do with IO in the *_chkready functions.
>>
>
> How about setting the device to the offline state for this case where
> fast_io_fail has fired but dev_loss_tmo has not yet fired? As far as
> failing IO goes, we get the same result. scsi-ml would fail the incoming
> IO instead of it getting to the class _chkready functions, but the scsi
> device state would indicate that it cannot execute IO, which might be
> nice for users.
>
Ah, no. OFFLINE is a dead-end state out of which we cannot transition from
inside the kernel. I'd prefer a new state here.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
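To illustrate what "a new state" would mean, here is a sketch loosely
modeled on the midlayer's scsi_device_set_state() transition checks. The
SDEV_TRANSPORT_FASTFAIL name is hypothetical and the enum is a local
stand-in, not the kernel's enum scsi_device_state.

#include <errno.h>

/* Local mirror of the sdev states relevant here, plus one hypothetical
 * addition; not the kernel's enum scsi_device_state. */
enum sdev_state {
	SDEV_RUNNING,
	SDEV_BLOCK,              /* transport blocked, I/O queued         */
	SDEV_TRANSPORT_FASTFAIL, /* hypothetical: fail I/O with           */
				 /* DID_TRANSPORT_FAILFAST, keep the sdev */
	SDEV_CANCEL,             /* being torn down                       */
	SDEV_OFFLINE,            /* dead end, per the discussion above    */
};

/* Sketch of the transition rules Hannes describes: FASTFAIL is entered
 * from RUNNING/BLOCK, and CANCEL may only be entered from FASTFAIL. */
static int sdev_set_state(enum sdev_state oldstate, enum sdev_state newstate)
{
	switch (newstate) {
	case SDEV_TRANSPORT_FASTFAIL:
		if (oldstate != SDEV_RUNNING && oldstate != SDEV_BLOCK)
			return -EINVAL;
		break;
	case SDEV_CANCEL:
		if (oldstate != SDEV_TRANSPORT_FASTFAIL)
			return -EINVAL;
		break;
	default:
		break;
	}
	return 0;
}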
* Re: [dm-devel] LSF: Multipathing and path checking question
From: Hannes Reinecke @ 2009-04-20 7:59 UTC
To: Mike Christie; +Cc: device-mapper development, SCSI Mailing List

Hi Mike,

Mike Christie wrote:
> Hannes Reinecke wrote:
>>
>> FC Transport already maintains an attribute for the path state, and even
>> sends netlink events if and when this attribute changes. For iSCSI I have
>
> Are you referring to fc_host_post_event? Is this the same thing we talked
> about last year, where you wanted events? Is this in multipath tools now
> or just in the SLES ones?
>
Yep, that's the thing.

> For something like FCH_EVT_LINKDOWN, are you going to fail the path at
> that time, or when would the multipath path be marked failed?
>
This is just a notification that the path has gone down. Fast fail /
dev_loss_tmo still applies, ie the path won't get switched at that point.

>
>> to defer to your superior knowledge; of course it would be easiest if
>> iSCSI could send out the very same message FC does.
>
> We can do something like fc_host_event_code for iscsi.
>
Oh, that'll be grand.

> Question on what you are needing:
>
> Do you mean you want to make fc_host_event_code more generic (there are
> some FC-specific ones like lip_reset)? Put them in scsi-ml and send them
> from a new netlink group that just sends these events?
>
> Or do you just want something similar from iscsi? iscsi will hook into
> the netlink code using scsi_netlink.c and then send ISCSIH_EVT_LINKUP,
> ISCSIH_EVT_LINKDOWN, etc.
>
Well, actually, I don't care. It's just that if we were to go with the
proposal, we'll have to fix up all transports to present the path state to
userspace, preferably with both netlink events and sysfs attributes. The
actual implementation might well be transport-specific.

> What do the FCH_EVT_PORT_* ones mean?
>
FC stuff, methinks. James S. should know better.

>
>> Idea was to modify the state machine so that fast_io_fail_tmo is
>> being made mandatory, which transitions the sdev into an intermediate
>> state 'DISABLED' and sends out a netlink message.
>
> Above when you said, "No, I already do this for FC (should be checking
> the replacement_timeout, too ...)", did you mean that you have multipath
> tools always setting fast io fail now?
>
Yes, quite so. Look at
git://git.kernel.org/pub/scm/linux/kernel/git/hare/multipath-tools
branch sles11 for details.

> For iscsi the replacement_timeout is always set already. If from
> multipath tools you are going to add some code so multipath sets this, I
> can make iscsi allow the replacement_timeout to be set from sysfs like is
> done for FC's fast io fail.
>
Oh, that would be awesome. Currently I think we have a mismatch / race
condition between iSCSI and multipathing, where ERL in iSCSI actually
counteracts multipathing. But I'll be investigating that one shortly.

>
>> sdev state:  RUNNING <-> BLOCKED <-> DISABLED  -> CANCEL
>> mpath state: path up <-> <stall> <-> path down -> remove from map
>>
>> This will allow us to switch paths early, ie when it moves into the
>> 'DISABLED' state. But the path structures themselves are still alive,
>> so when a path comes back between 'DISABLED' and 'CANCEL' we won't
>> have an issue reconnecting it. And we could even allow setting
>> dev_loss_tmo to infinity, thereby simulating the 'old' behaviour.
>>
>> However, this proposal didn't go through.
>
> You got my hopes up for a solution in the long explanation, then you
> destroyed them :)
>
Yes, same here. I really thought this to be a sensible proposal, but then
the discussion veered off into queue_if_no_path handling.

>
> Was the reason people did not like this because of the scsi device
> lifetime issue?
>
> I think we still want someone to set the fast io fail tmo for users when
> multipath is being used, because we want IO out of the queues and drivers
> and sent to the multipath layer before dev_loss_tmo if dev_loss_tmo is
> still going to be a lot longer. fast io fail tmo is usually less than 10
> or 5 seconds, and for dev_loss_tmo it seems like we still have users
> setting that to minutes.
>
Exactly. The point here is that with the current implementation we
basically _cannot_ return 'path down' anymore, as the path is either
blocked (during which time all I/O gets stalled) or failed completely (ie
in state 'CANCEL'). Which is a bit of a detriment, and we actually run
into quite some contention when the path is removed, as we have to kill
all I/O, fail over paths, remove stale paths, update device-mapper tables
etc.

When decoupling this by having the midlayer always return
'DID_TRANSPORT_DISRUPTED' after fast_io_fail, we would be able to kill all
I/O and switch paths gracefully. Path removal and the device-mapper table
update would then be done later on, when dev_loss_tmo triggers.

>
> Can't the transport layers just send two events?
> 1. On the initial link down when the port/session is blocked.
> 2. When their fast io fail tmos fire.
>
Yes, that would be a good start.

> Today, instead of #2, the Red Hat multipath tools guy and I were talking
> about doing a probe with SG_IO. For example we would send down a path
> tester IO and then wait for it to be failed with DID_TRANSPORT_FAILFAST.
>
No, this is exactly what you cannot do. SG_IO will be stalled while the
sdev is BLOCKED and will only return a result _after_ the sdev transitions
_out_ of the BLOCKED state.
Translated to FC this means that whenever dev_loss_tmo is _active_ (!) no
I/O will be sent out, nor will any I/O result be returned to userland.
Hence using SG_IO for the path checker is a bad idea here.

Hence my proposal.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
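The transport-level check Hannes is arguing for amounts to reading the
path state the FC transport class already exports, rather than pushing a
command through a possibly blocked queue. A minimal sketch; the rport name
is a placeholder, and the state strings are those reported by the
fc_remote_ports port_state attribute.

#include <stdio.h>
#include <string.h>

/* Sketch: ask the FC transport class for the path state instead of
 * probing with SG_IO.  Returns 1 for usable, 0 for not usable, -1 on
 * error.  The rport name is a placeholder. */
static int fc_path_usable(const char *rport)
{
	char path[256], state[32] = "";
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/class/fc_remote_ports/%s/port_state", rport);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(state, sizeof(state), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	/* "Online" means the port is usable; anything else ("Blocked",
	 * "Not Present", ...) means I/O would stall or fail. */
	return strncmp(state, "Online", 6) == 0;
}

int main(void)
{
	int up = fc_path_usable("rport-4:0-1");   /* example name only */

	printf("path state: %s\n",
	       up < 0 ? "unknown" : up ? "usable" : "down");
	return 0;
}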
* Re: LSF: Multipathing and path checking question
From: Mike Christie @ 2009-04-20 19:10 UTC
To: Hannes Reinecke; +Cc: device-mapper development, SCSI Mailing List

Hannes Reinecke wrote:
>> Today, instead of #2, the Red Hat multipath tools guy and I were talking
>> about doing a probe with SG_IO. For example we would send down a path
>> tester IO and then wait for it to be failed with DID_TRANSPORT_FAILFAST.
>>
> No, this is exactly what you cannot do. SG_IO will be stalled while the
> sdev is BLOCKED and will only return a result _after_ the sdev transitions
> _out_ of the BLOCKED state.
> Translated to FC this means that whenever dev_loss_tmo is _active_ (!)
> no I/O will be sent out, nor will any I/O result be returned to userland.
>

That is not true anymore. When fast io fail fires, the sdev and rport will
be blocked, but the fc class will call into the LLD to have it fail any IO
still running in the driver. The FC class will then fail any IO in the
block queues, and then it will also fail any new IO sent to it.

With your patch to have multipath-tools set fast io fail for multipath, we
should always get the IO failed before dev_loss_tmo fires.
* Re: [dm-devel] LSF: Multipathing and path checking question
From: Mike Christie @ 2009-04-20 19:28 UTC
To: Hannes Reinecke; +Cc: device-mapper development, SCSI Mailing List

Mike Christie wrote:
> Hannes Reinecke wrote:
>>> Today, instead of #2, the Red Hat multipath tools guy and I were talking
>>> about doing a probe with SG_IO. For example we would send down a path
>>> tester IO and then wait for it to be failed with
>>> DID_TRANSPORT_FAILFAST.
>>>
>> No, this is exactly what you cannot do. SG_IO will be stalled while the
>> sdev is BLOCKED and will only return a result _after_ the sdev
>> transitions _out_ of the BLOCKED state.
>> Translated to FC this means that whenever dev_loss_tmo is _active_ (!)
>> no I/O will be sent out, nor will any I/O result be returned to
>> userland.
>>
>
> That is not true anymore. When fast io fail fires, the sdev and rport
> will be blocked, but the fc class will call into the LLD to have it

I miswrote that. The rport will show the blocked state, but when the fast
io fail tmo fires, fc_terminate_rport_io will unblock the sdev, the fc
class chkready will fail any IO sent to it, and of course
terminate_rport_io will fail IO in the driver like I said below. And you
do not need a terminate_rport_io callback to have the fast io fail tmo
now. If you set that timer, at least IO in the block queue and new IO will
be failed.

> fail any IO still running in the driver. The FC class will then fail any
> IO in the block queues, and then it will also fail any new IO sent to it.
>
> With your patch to have multipath-tools set fast io fail for multipath,
> we should always get the IO failed before dev_loss_tmo fires.
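For reference, the "chkready" behaviour Mike describes decides the fate of
newly issued I/O from the rport state. The sketch below is loosely modeled
on fc_remote_port_chkready(); the states, flag and host-byte values are
reproduced locally as assumptions rather than taken from kernel headers.

/* Local, simplified model of the FC rport check; not kernel code. */
enum rport_port_state { PORT_ONLINE, PORT_BLOCKED, PORT_NOT_PRESENT };

struct rport_model {
	enum rport_port_state port_state;
	int fast_io_fail_fired;    /* set when fast_io_fail_tmo expires */
};

#define DID_OK                  0x00
#define DID_NO_CONNECT          0x01
#define DID_IMM_RETRY           0x0c
#define DID_TRANSPORT_FAILFAST  0x0f

/* Returns the host byte the midlayer would see for a newly queued
 * command, shifted as in the real chkready helpers. */
static int rport_chkready(const struct rport_model *rport)
{
	switch (rport->port_state) {
	case PORT_ONLINE:
		return DID_OK << 16;
	case PORT_BLOCKED:
		/* Before fast_io_fail fires, I/O is retried/queued; after
		 * it fires, new I/O is failed immediately. */
		return (rport->fast_io_fail_fired ?
			DID_TRANSPORT_FAILFAST : DID_IMM_RETRY) << 16;
	default:
		return DID_NO_CONNECT << 16;
	}
}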
* Re: [dm-devel] LSF: Multipathing and path checking question
From: Hannes Reinecke @ 2009-04-21 7:04 UTC
To: Mike Christie; +Cc: device-mapper development, SCSI Mailing List

Hi Mike,

Mike Christie wrote:
> Mike Christie wrote:
>> Hannes Reinecke wrote:
>>>> Today, instead of #2, the Red Hat multipath tools guy and I were
>>>> talking about doing a probe with SG_IO. For example we would send
>>>> down a path tester IO and then wait for it to be failed with
>>>> DID_TRANSPORT_FAILFAST.
>>>>
>>> No, this is exactly what you cannot do. SG_IO will be stalled while
>>> the sdev is BLOCKED and will only return a result _after_ the sdev
>>> transitions _out_ of the BLOCKED state.
>>> Translated to FC this means that whenever dev_loss_tmo is _active_ (!)
>>> no I/O will be sent out, nor will any I/O result be returned to
>>> userland.
>>>
>>
>> That is not true anymore. When fast io fail fires, the sdev and rport
>> will be blocked, but the fc class will call into the LLD to have it
>
> I miswrote that. The rport will show the blocked state, but when the
> fast io fail tmo fires, fc_terminate_rport_io will unblock the sdev, the
> fc class chkready will fail any IO sent to it, and of course
> terminate_rport_io will fail IO in the driver like I said below. And you
> do not need a terminate_rport_io callback to have the fast io fail tmo
> now. If you set that timer, at least IO in the block queue and new IO
> will be failed.
>
Indeed, I didn't look closely enough. OK, so I/O will be failed after
terminate_rport_io.

So that means we can just implement a new netlink message after
terminate_rport_io to inform the multipath daemon about these changes.
And, of course, we _really_ should introduce a new sdev state here. Having
the sdev set to 'RUNNING' but having all I/O failed in the transport class
is just quirky behaviour which is bound to cause trouble.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
end of thread, other threads: [~2009-04-21 7:26 UTC | newest]

Thread overview: 12+ messages

2009-04-16 22:59 LSF: Multipathing and path checking question Mike Christie
2009-04-17  7:50 ` [dm-devel] " Hannes Reinecke
2009-04-17 14:55   ` Mike Christie
2009-04-17 15:21     ` Mike Christie
2009-04-20  8:19       ` [dm-devel] " Hannes Reinecke
2009-04-20 19:23         ` Mike Christie
2009-04-20 23:02           ` Mike Christie
2009-04-21  7:26             ` [dm-devel] " Hannes Reinecke
2009-04-20  7:59   ` Hannes Reinecke
2009-04-20 19:10     ` Mike Christie
2009-04-20 19:28       ` [dm-devel] " Mike Christie
2009-04-21  7:04         ` Hannes Reinecke