* [LSF/MM TOPIC] Reducing the SRP initiator failover time
From: Bart Van Assche @ 2013-02-01 13:43 UTC
To: lsf-pc, linux-scsi, linux-rdma@vger.kernel.org, David Dillow
It is known that it takes about two to three minutes before the upstream
SRP initiator fails over from a failed path to a working path. This is
not only longer than is considered acceptable but also longer than with
other Linux SCSI initiators (e.g. iSCSI and FC). Progress so far with
improving SRP initiator fail-over has been slow. This is because the
discussion about candidate patches occurred at two different levels: not
only the patches themselves were discussed but also the approach that
should be followed. That last aspect is easier to discuss in a meeting
than over a mailing list. Hence the proposal to discuss SRP initiator
failover behavior during the LSF/MM summit. The topics that need further
discussion are:
* If a path fails, should the entire SCSI host be removed, or should the
  SCSI host be preserved and only the SCSI devices associated with that
  host be removed?
* Which software component should test the state of a path and
  reconnect to an SRP target once a path is restored? Should that be
  done by the user space process srp_daemon or by the SRP initiator
  kernel module?
* How should the SRP initiator behave after a path failure has been
  detected? Should the behavior be similar to that of the FC initiator
  with its fast_io_fail_tmo and dev_loss_tmo parameters?
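
For reference, the FC-style behavior mentioned in the last item can be
sketched roughly as follows. This is only an illustrative model of the
two timeouts, not code from the FC transport class or from any SRP
patch; the enum and function names are hypothetical, and treating a
negative fast_io_fail_tmo as "off" is an assumption of this sketch.

```c
/* Illustrative model only: not kernel code. All names are hypothetical. */
enum path_action {
    PATH_QUEUE_IO,    /* path is blocked; I/O is held back hoping for recovery */
    PATH_FAIL_IO,     /* fast_io_fail_tmo expired: fail I/O so multipath can fail over */
    PATH_REMOVE_DEVS  /* dev_loss_tmo expired: give up and remove the SCSI devices */
};

/*
 * secs_since_failure: seconds elapsed since the path failure was detected.
 * fast_io_fail_tmo:   seconds before pending I/O is failed; negative means "off".
 * dev_loss_tmo:       seconds before the devices behind the path are removed.
 */
static enum path_action path_action_after(int secs_since_failure,
                                          int fast_io_fail_tmo,
                                          int dev_loss_tmo)
{
    if (secs_since_failure >= dev_loss_tmo)
        return PATH_REMOVE_DEVS;
    if (fast_io_fail_tmo >= 0 && secs_since_failure >= fast_io_fail_tmo)
        return PATH_FAIL_IO;
    return PATH_QUEUE_IO;
}
```

In this model, a fast_io_fail_tmo of a few seconds would let a multipath
layer see I/O errors on the failed path, and switch to a working path,
well before dev_loss_tmo expires.
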
Dave, if this topic gets accepted, I really hope you will be able to
attend the LSF/MM summit.
Bart.
* Re: [LSF/MM TOPIC] Reducing the SRP initiator failover time
From: Vu Pham @ 2013-02-07 22:42 UTC
To: Bart Van Assche
Cc: lsf-pc, linux-scsi, linux-rdma@vger.kernel.org, David Dillow,
Oren Duer, Sagi Grimberg
Hello Bart,
Thank you for taking the initiative.
Mellanox thinks that this should be discussed. We'd be happy to attend.
We would also like to discuss:
* How, and how quickly, does SRP detect a path failure other than
  through an RC error?
* The role of srp_daemon: how often srp_daemon scans the fabric for
  new/old targets, how to scale srp_daemon discovery, and traps.
-vu
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [LSF/MM TOPIC] Reducing the SRP initiator failover time
From: Sagi Grimberg @ 2013-02-08 9:24 UTC
To: Bart Van Assche
Cc: Vu Pham, lsf-pc, linux-scsi, linux-rdma@vger.kernel.org,
David Dillow, Oren Duer
Hey Bart,
I agree with Vu that this issue should be discussed. We'd be happy to
attend.
--
Sagi
* Re: [LSF/MM TOPIC] Reducing the SRP initiator failover time
From: Sebastian Riemer @ 2013-02-08 11:38 UTC
To: Sagi Grimberg
Cc: Bart Van Assche, Vu Pham, lsf-pc, linux-scsi,
linux-rdma@vger.kernel.org, David Dillow, Oren Duer
Wow, thanks also to Mellanox for spending resources on SRP! In June of
last year we encountered a very different situation.
Cheers,
Sebastian and the ProfitBricks storage team