From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sebastian Riemer
Subject: Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
Date: Mon, 17 Jun 2013 12:13:41 +0200
Message-ID: <51BEE155.1000609@profitbricks.com>
References: <51B87501.4070005@acm.org> <51B8777B.5050201@acm.org>
 <51BA20ED.6040200@mellanox.com> <51BB1857.7040802@acm.org>
 <51BB5A04.3080901@mellanox.com> <51BC3945.9030900@acm.org>
 <51BEAA40.9070908@suse.de> <51BEB4FF.9000607@acm.org>
 <51BEB770.9030305@suse.de> <51BEBAEA.4080202@acm.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <51BEBAEA.4080202-HInyCGIudOg@public.gmane.org>
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Bart Van Assche
Cc: Hannes Reinecke, Vu Pham, Roland Dreier, David Dillow, linux-rdma, linux-scsi, James Bottomley
List-Id: linux-rdma@vger.kernel.org

On 17.06.2013 09:29, Bart Van Assche wrote:
> On 06/17/13 09:14, Hannes Reinecke wrote:
>> On 06/17/2013 09:04 AM, Bart Van Assche wrote:
>>> I agree that the value of fast_io_fail_tmo should be kept small.
>>> Although as you explained changing the SCSI device state into
>>> SDEV_BLOCK doesn't help for I/O that has already been queued on a
>>> failed path, I think it's still useful for I/O that is queued after
>>> the fast_io_fail timer has been started and before that timer has
>>> expired.
>>
>> Why, but of course.
>>
>> The typical scenario would be:
>> -> detect link-loss
>> -> call scsi_block_requests()
>> -> start dev_loss_tmo and fast_io_fail_tmo
>>
>> -> When fast_io_fail_tmo triggers:
>>    -> Abort all outstanding requests
>>
>> -> When dev_loss_tmo triggers:
>>    -> Abort all outstanding requests
>>    -> Remove/disable the I_T nexus
>>    -> call scsi_unblock_requests()
>>
>> However, if and whether multipath detects SDEV_BLOCK doesn't
>> guarantee a fast failover; in fact it was only added rather recently
>> as it's not a big win in most cases.
>
> Even if setting the state SDEV_BLOCK doesn't help much with improving
> failover time, it still has the advantage over using
> scsi_block_requests() that it can be overridden by a user via sysfs.

In my opinion, SDEV_BLOCK can help the reconnect. The only reason for a
high fast_io_fail_tmo is that you don't use multipath at all and hope
that the connection becomes available again before that timeout expires.
You place the reconnect attempts in between, so that there is a chance
that a reconnect succeeds and the transport layer error work can be
canceled.

But I have to look at all of your patches first to see how you
implemented the big picture.
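
To make the ordering discussed above concrete, here is a rough user-space
sketch of the timeline: link loss, blocking, the two timers, and reconnect
attempts placed in between. It is only an illustration and not kernel code.
The function name simulate_link_loss, the parameters reconnect_delay and
link_back_at, and the rule that a reconnect attempt runs before a timer
expiring in the same second are all assumptions made for this example. They
are not taken from the patch series.

```python
def simulate_link_loss(fast_io_fail_tmo, dev_loss_tmo, reconnect_delay,
                       link_back_at=None):
    """Return the ordered (time, event) list for one link-loss episode.

    All times are seconds since the link loss. link_back_at is the
    (hypothetical) time at which the link becomes usable again, or None
    if it never comes back.
    """
    if reconnect_delay <= 0:
        raise ValueError("reconnect_delay must be positive")

    events = [(0, "link loss: block device, start both timers")]

    # Each pending entry is (time, priority, kind). Priority 0 lets a
    # reconnect attempt run before a timer that expires in the same second.
    pending = [(fast_io_fail_tmo, 1, "fast_io_fail"),
               (dev_loss_tmo, 2, "dev_loss")]
    t = reconnect_delay
    while t < dev_loss_tmo:
        pending.append((t, 0, "reconnect"))
        t += reconnect_delay

    for when, _, kind in sorted(pending):
        if kind == "reconnect":
            if link_back_at is not None and when >= link_back_at:
                # A successful reconnect cancels whatever is still pending.
                events.append(
                    (when, "reconnect succeeded: cancel timers, unblock device"))
                return events
            events.append((when, "reconnect failed"))
        elif kind == "fast_io_fail":
            events.append((when, "fast_io_fail_tmo expired: fail outstanding I/O"))
        else:
            events.append((when, "dev_loss_tmo expired: abort I/O, remove I_T nexus"))
    return events


if __name__ == "__main__":
    # Example values only: 15 s fast_io_fail_tmo, 60 s dev_loss_tmo,
    # a reconnect attempt every 10 s, link usable again after 8 s.
    for when, what in simulate_link_loss(15, 60, 10, link_back_at=8):
        print(f"t={when:>3}s  {what}")
```

With a small fast_io_fail_tmo, the first reconnect attempt may come too late
to prevent outstanding I/O from being failed, which is fine when multipath
can fail over. Without multipath, a larger fast_io_fail_tmo leaves room for
more reconnect attempts before any I/O is failed.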