From: Roland Dreier <rdreier@cisco.com>
To: Ishai Rabinovitz <ishai@mellanox.co.il>
Cc: linux-scsi@vger.kernel.org, openib-general@openib.org,
Roland Dreier <rolandd@cisco.com>,
vu@mellanox.com
Subject: Re: [SRP] [RFC] Needed changes to support fail-over drivers
Date: Mon, 24 Jul 2006 15:34:14 -0700
Message-ID: <ada8xmiiqe1.fsf@cisco.com>
In-Reply-To: <20060724165602.GA8600@mellanox.co.il> (Ishai Rabinovitz's message of "Mon, 24 Jul 2006 19:56:02 +0300")

[CC'ing linux-scsi as well -- I think we'll get better insight from there]

> The current SRP initiator code cannot work with several fail-over mechanisms.
>
> The current SRP driver's behavior when a target goes offline and then comes
> back online:
> 1) The target is offline.
> 2) The initiator tries to reconnect and fails.
> 3) The initiator calls srp_remove_work, which removes the scsi_host.
> 4) The target is back online.
> 5) The user (or the ibsrpdm daemon) is expected to execute a new add_target.
> 6) This creates a new scsi_host (with new names for the devices and a new index
> in the scsi_host directory in sysfs) for this target.
>
> Fail-over drivers (e.g., MPP, which is used by Engenio, and XVM, which is used
> by SGI) have problems with this behavior (item 3). They need the scsi_host to
> keep existing and to return errors in the meantime, until the connection to
> the target resumes.

OK, but is this a valid assumption? What happens for iSCSI and/or iSER?

> In addition, removing and re-allocating the scsi_host is a "heavy" operation
> compared with only disconnecting and reconnecting the connection.
>
> In order to support these tools I propose the following changes that will allow
> the user to move the srp initiator to a disconnected state (when the target
> leaves the fabric) and reconnect it later (when the target returns to the
> fabric).

Seems OK but see below...

> Once these changes are in the ib_srp module, the ibsrpdm daemon will be able
> to monitor the presence of targets in the fabric and to use this interface
> (when targets leave or rejoin the fabric).

How does the daemon know when something is gone for good vs. when it
might come back?

> Here is the description of the new design: (I already implemented most of the
> code)
>
> 1) Split the function srp_reconnect_target into two functions:
> _srp_disconnect_target and _srp_reconnect_target
>
> 2) Adding two new states: SRP_TARGET_DISCONNECTED (The state after
> _srp_disconnect_target was executed and before _srp_reconnect_target is
> executed) and SRP_TARGET_DISCONNECTING (The state while in srp_remove_target).
>
> 3) Adding new input files in sysfs:
> /sys/class/scsi_host/host?/{disconnect_target,connect_target,erase_target}
>
> 4) Writing the string "remove" to /sys/class/scsi_host/host?/disconnect_target
> calls srp_disconnect_target, which moves the corresponding target to the
> SRP_TARGET_DISCONNECTED state (after closing the CM and resetting all pending
> requests). From then on, when the SCSI mid-layer calls queuecommand on this
> host, the result is DID_NO_CONNECT. This causes the SCSI mid-layer to return
> an I/O error to the user without starting the SCSI error recovery chain.

Why does userspace need to be able to disconnect a connection?

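For reference, the state handling described in items 2 and 4 could be sketched in plain userspace C roughly as follows. This is an illustration only, not code from the proposed patch: the struct and the sketch_* function names are hypothetical, the enum lists only the states named in this thread, and treating SRP_TARGET_DISCONNECTING as a transitional state during teardown is an assumption. The DID_OK and DID_NO_CONNECT values mirror the host-byte codes in the kernel's SCSI headers.

```c
#include <assert.h>
#include <stddef.h>

/* Host-byte codes, mirroring the kernel's SCSI headers. */
#define DID_OK         0x00
#define DID_NO_CONNECT 0x01

/* Only the states named in this thread; the real driver has others. */
enum srp_target_state {
	SRP_TARGET_LIVE,
	SRP_TARGET_DISCONNECTING,	/* assumption: teardown in progress */
	SRP_TARGET_DISCONNECTED,
};

/* Hypothetical stand-in for the driver's per-target structure. */
struct srp_target_sketch {
	enum srp_target_state state;
	int pending_requests;
};

/* Fail fast with DID_NO_CONNECT unless the target is live. */
int sketch_queuecommand(struct srp_target_sketch *t)
{
	assert(t != NULL);
	if (t->state != SRP_TARGET_LIVE)
		return DID_NO_CONNECT << 16;
	t->pending_requests++;
	return DID_OK << 16;
}

/* Disconnect: drop the connection and reset all pending requests. */
void sketch_disconnect(struct srp_target_sketch *t)
{
	assert(t != NULL);
	t->state = SRP_TARGET_DISCONNECTING;
	t->pending_requests = 0;	/* stands in for closing the CM and resetting requests */
	t->state = SRP_TARGET_DISCONNECTED;
}

/* Reconnect: only valid from the disconnected state. Returns 0 on success. */
int sketch_reconnect(struct srp_target_sketch *t)
{
	assert(t != NULL);
	if (t->state != SRP_TARGET_DISCONNECTED)
		return -1;
	t->state = SRP_TARGET_LIVE;
	return 0;
}
```

The point of the sketch is that the scsi_host (here, the struct) survives across disconnect and reconnect; only the state changes, and commands queued in between fail immediately instead of entering error recovery.
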
> 5) Writing anything to /sys/class/scsi_host/host?/reconnect_target calls
> _srp_reconnect_target, which moves the target back to the SRP_TARGET_LIVE
> state.
>
> 6) Writing "erase" to /sys/class/scsi_host/host?/erase_target calls
> srp_remove_work that removes the scsi_host.

Why the asymmetry here? In other words, why does anything work for
reconnect_target but only the literal "erase" work for erase_target?

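One way to make the three files behave the same would be to run every write through a single keyword check. The following is a minimal userspace sketch of such a check, assuming each attribute requires its own literal keyword; sketch_keyword_matches is a hypothetical helper, not a function from the patch or from the kernel, and the (buf, count) pair only imitates what a sysfs store handler receives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the count bytes at buf are exactly keyword, optionally
 * followed by a single trailing newline (as "echo" would add), else 0.
 */
int sketch_keyword_matches(const char *buf, size_t count, const char *keyword)
{
	size_t klen;

	assert(buf != NULL && keyword != NULL);
	klen = strlen(keyword);

	/* Tolerate exactly one trailing newline. */
	if (count == klen + 1 && buf[klen] == '\n')
		count--;

	return count == klen && memcmp(buf, keyword, klen) == 0;
}
```

With a helper like this, disconnect_target could require "remove", erase_target could require "erase", and reconnect_target could require a keyword of its own, so that no file accepts arbitrary input.
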
> 7) Adding output files in sysfs to present the HCA and port that the initiator
> used to connect to the target. Using these files and the target GUID the
> ibsrpdm can know on which scsi_host to perform the reconnect_target.