public inbox for linux-scsi@vger.kernel.org
* Re: [SRP] [RFC] Needed changes to support fail-over drivers
       [not found] <20060724165602.GA8600@mellanox.co.il>
@ 2006-07-24 22:34 ` Roland Dreier
  2006-07-25  2:06   ` Mike Christie
  0 siblings, 1 reply; 2+ messages in thread
From: Roland Dreier @ 2006-07-24 22:34 UTC (permalink / raw)
  To: Ishai Rabinovitz; +Cc: linux-scsi, openib-general, Roland Dreier, vu

[CC'ing linux-scsi as well -- I think we'll get better insight from there]

 > The current SRP initiator code cannot work with several fail-over mechanisms. 
 > 
 > The current srp driver's behavior when a target goes offline and then
 > comes back online is:
 > 1) The target goes offline.
 > 2) The initiator tries to reconnect and fails.
 > 3) The initiator calls srp_remove_work, which removes the scsi_host.
 > 4) The target comes back online.
 > 5) The user (or the ibsrpdm daemon) is expected to execute a new add_target.
 > 6) This creates a new scsi_host (with new device names and a new index in
 > the scsi_host directory in sysfs) for this target.
 > 
 > Fail-over drivers (e.g., MPP, used by Engenio, and XVM, used by SGI) have
 > problems with this behavior (item 3). They need the scsi_host to keep
 > existing and to return errors in the meantime, until the connection to the
 > target resumes.

OK, but is this a valid assumption?  What happens for iSCSI and/or iSER?

 > In addition, removing and re-allocating the scsi_host is a "heavy"
 > operation compared to simply disconnecting and reconnecting.
 > 
 > In order to support these tools I propose the following changes that will allow
 > the user to move the srp initiator to a disconnected state (when the target
 > leaves the fabric) and reconnect it later (when the target returns to the
 > fabric).

Seems OK but see below...

 > Once these changes are in the ib_srp module, the ibsrpdm daemon will be
 > able to monitor the presence of targets in the fabric and use this interface
 > when targets leave or rejoin the fabric.

How does the daemon know when something is gone for good vs. when it
might come back?

 > Here is the description of the new design (I have already implemented most
 > of the code):
 > 
 > 1) Split the function srp_reconnect_target into two functions:
 > _srp_disconnect_target and _srp_reconnect_target 
 > 
 > 2) Add two new states: SRP_TARGET_DISCONNECTED (entered after
 > _srp_disconnect_target has run and left when _srp_reconnect_target runs)
 > and SRP_TARGET_DISCONNECTING (held while in srp_remove_target).
 > 
 > 3) Add new input files in sysfs:
 > /sys/class/scsi_host/host?/{disconnect_target,reconnect_target,erase_target}
 > 
 > 4) Writing the string "remove" to /sys/class/scsi_host/host?/disconnect_target
 > calls srp_disconnect_target, which moves the corresponding target to the
 > SRP_TARGET_DISCONNECTED state (after closing the CM connection and resetting
 > all pending requests).  When the SCSI mid-layer then calls queuecommand on
 > this host, the result is DID_NO_CONNECT.  This causes the mid-layer to
 > return an I/O error to the user without initiating the SCSI error
 > auto-recovery chain.

Why does userspace need to be able to disconnect a connection?

 > 5) Writing anything to /sys/class/scsi_host/host?/reconnect_target calls
 > _srp_reconnect_target, which moves the target back to the SRP_TARGET_LIVE
 > state.
 > 
 > 6) Writing "erase" to /sys/class/scsi_host/host?/erase_target calls
 > srp_remove_work, which removes the scsi_host.

Why the asymmetry here?  In other words, why does anything work for
reconnect_target but only the literal "erase" work for erase_target?

 > 7) Add output files in sysfs that show the HCA and port the initiator
 > used to connect to the target. Using these files and the target GUID,
 > ibsrpdm can determine which scsi_host to perform the reconnect_target on.
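Taken together, items 2, 4, and 5 of the proposal describe a small state
machine. Here is a user-space Python sketch of that machine: the state names
and result codes come from the proposal, but the class, method names, and
return strings are illustrative only, not driver code.

```python
from enum import Enum, auto

class TargetState(Enum):
    LIVE = auto()          # normal operation
    DISCONNECTED = auto()  # after _srp_disconnect_target, before reconnect
    DISCONNECTING = auto() # transient state while in srp_remove_target

class SrpTargetModel:
    """Toy model of the proposed SRP target state machine (not driver code)."""

    def __init__(self):
        self.state = TargetState.LIVE

    def disconnect_target_store(self, buf):
        # Item 4: only the literal string "remove" triggers a disconnect;
        # closing the CM connection and failing pending requests is elided.
        if buf.strip() != "remove":
            raise ValueError('expected "remove"')
        self.state = TargetState.DISCONNECTED

    def reconnect_target_store(self, buf):
        # Item 5: writing anything reconnects (the asymmetry questioned above).
        self.state = TargetState.LIVE

    def queuecommand(self):
        # Item 4: while disconnected, I/O fails fast with DID_NO_CONNECT
        # instead of entering the SCSI error auto-recovery chain.
        if self.state is not TargetState.LIVE:
            return "DID_NO_CONNECT"
        return "DID_OK"

t = SrpTargetModel()
t.disconnect_target_store("remove")
print(t.queuecommand())        # DID_NO_CONNECT
t.reconnect_target_store("1")
print(t.queuecommand())        # DID_OK
```

The sketch makes the fail-over requirement visible: while disconnected the
host object survives and I/O fails fast, which is exactly what MPP/XVM want
from item 3 of the problem description.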

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [SRP] [RFC] Needed changes to support fail-over drivers
  2006-07-24 22:34 ` [SRP] [RFC] Needed changes to support fail-over drivers Roland Dreier
@ 2006-07-25  2:06   ` Mike Christie
  0 siblings, 0 replies; 2+ messages in thread
From: Mike Christie @ 2006-07-25  2:06 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Ishai Rabinovitz, linux-scsi, openib-general, Roland Dreier, vu

Roland Dreier wrote:
> [CC'ing linux-scsi as well -- I think we'll get better insight from there]
> 
>  > The current SRP initiator code cannot work with several fail-over mechanisms. 
>  > 
>  > The current srp driver's behavior when a target goes offline and then
>  > comes back online is:
>  > 1) The target goes offline.
>  > 2) The initiator tries to reconnect and fails.
>  > 3) The initiator calls srp_remove_work, which removes the scsi_host.
>  > 4) The target comes back online.
>  > 5) The user (or the ibsrpdm daemon) is expected to execute a new add_target.
>  > 6) This creates a new scsi_host (with new device names and a new index in
>  > the scsi_host directory in sysfs) for this target.
>  > 
>  > Fail-over drivers (e.g., MPP, used by Engenio, and XVM, used by SGI) have
>  > problems with this behavior (item 3). They need the scsi_host to keep
>  > existing and to return errors in the meantime, until the connection to the
>  > target resumes.
> 
> OK, but is this a valid assumption?  What happens for iSCSI and/or iSER?

I do not see why the host has to remain constant to solve the above problem.
I can understand why it may be easier to program, though. However, this
is not a requirement for other multipath drivers like dm-multipath or md
multipath, and I do not think you should rely on that type of behavior.

The short story is that I think we are moving to something similar to
what srp does very soon.

The long story....

iscsi and iser allocate a host per session (the session is allocated in the
host's hostdata). If there are problems with the connection (the target is
unreachable for N seconds, we get some error value from the network layer,
etc.) we keep the host, session, connection, target and scsi devices around.
We then have a userspace daemon that tries to reconnect to the target and
relogin.

If we reconnect within X seconds (we call this the replacement_timeout,
and it is similar to the FC class dev_loss_tmo), we reuse those structs
and go on as normal. If after replacement_timeout seconds we have not
reconnected, we can either remove the host, session, connection, target and
scsi_devices, or keep them around and reuse them if we later reconnect. If
we remove those structs we of course have to allocate new ones later and
will get a new host number. Whether we reuse the structs or remove them is
controlled from userspace, and we currently do the wrong thing by default
and keep the structs around.

I guess what we are supposed to do is something similar to the FC class,
where if dev_loss_tmo expires we remove the session, connection, target
and devices. I am not sure if we should be removing the scsi host though.
I think it makes sense to remove that too, since the host and session are
so closely tied in our model. We are in the process of moving to the model
where removing all the structs is the default and only behavior we support,
and it looks like we will do this in 2.6.19.
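The reuse-vs-reallocate policy described above can be condensed into a toy
model. replacement_timeout is the parameter Mike names; the Session class,
its method names, and the return strings are hypothetical illustration, not
the open-iscsi implementation.

```python
class Session:
    """Toy model of the iSCSI replacement_timeout policy (illustrative)."""

    def __init__(self, replacement_timeout=120):
        self.replacement_timeout = replacement_timeout
        self.lost_at = None   # time the connection dropped, None if connected
        self.host_no = 0      # stays stable only while the structs are reused

    def connection_lost(self, now):
        # Connection error detected; structs are kept while the daemon
        # tries to relogin.
        self.lost_at = now

    def reconnected(self, now):
        if self.lost_at is not None and \
           now - self.lost_at <= self.replacement_timeout:
            # Within the window: reuse host/session/connection/target structs.
            self.lost_at = None
            return "reused"
        # Window expired: the structs were torn down, so relogin allocates
        # new ones and the host gets a new number.
        self.host_no += 1
        self.lost_at = None
        return "reallocated"

s = Session(replacement_timeout=120)
s.connection_lost(now=0)
print(s.reconnected(now=60))    # reused (host number unchanged)
s.connection_lost(now=200)
print(s.reconnected(now=400))   # reallocated (new host number)
```

The second path is the behavior Mike says will become the default and only
model: after the timeout, everything is removed, exactly as the FC class
does when dev_loss_tmo fires.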

