From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vu Pham
Subject: Re: [LSF/MM TOPIC] Reducing the SRP initiator failover time
Date: Thu, 7 Feb 2013 14:42:49 -0800
Message-ID: <51142DE9.30900@mellanox.com>
References: <510BC68A.90708@acm.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Bart Van Assche
Cc: lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
 linux-scsi, "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org",
 David Dillow, Oren Duer, Sagi Grimberg
List-Id: linux-rdma@vger.kernel.org

> It is known that it takes about two to three minutes before the
> upstream SRP initiator fails over from a failed path to a working
> path. This is not only considered longer than acceptable but is also
> longer than other Linux SCSI initiators (e.g. iSCSI and FC). Progress
> so far with improving the fail-over SRP initiator has been slow. This
> is because the discussion about candidate patches occurred at two
> different levels: not only the patches itself were discussed but also
> the approach that should be followed. That last aspect is easier to
> discuss in a meeting than over a mailing list. Hence the proposal to
> discuss SRP initiator failover behavior during the LSF/MM summit. The
> topics that need further discussion are:
> * If a path fails, remove the entire SCSI host or preserve the SCSI
>   host and only remove the SCSI devices associated with that host ?
> * Which software component should test the state of a path and should
>   reconnect to an SRP target if a path is restored ? Should that be
>   done by the user space process srp_daemon or by the SRP initiator
>   kernel module ?
> * How should the SRP initiator behave after a path failure has been
>   detected ? Should the behavior be similar to the FC initiator with
>   its fast_io_fail_tmo and dev_loss_tmo parameters ?
>
> Dave, if this topic gets accepted, I really hope you will be able to
> attend the LSF/MM summit.
>
> Bart.
>

Hello Bart,

Thank you for taking the initiative.
Mellanox thinks that this should be discussed. We'd be happy to attend.

We would also like to discuss:
* How, and how quickly, does SRP detect a path failure other than
  through an RC error?
* The role of srp_daemon: how often srp_daemon scans the fabric for
  new/old targets, how to scale srp_daemon discovery, and traps.

-vu