* [LSF/MM TOPIC] NVMe over Fabrics auto-discovery in Linux
@ 2018-01-23 15:11 Johannes Thumshirn
  2018-01-23 16:09 ` Bart Van Assche
  2018-01-24 18:42 ` Sagi Grimberg
  0 siblings, 2 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2018-01-23 15:11 UTC (permalink / raw)

In NVMe over Fabrics we currently perform target discovery by running either
'nvme discover' or 'nvme connect-all' (with or without an appropriate
/etc/nvme/discovery.conf).

This is well suited for the RDMA transport, which has no idea of the
underlying fabric and its connections. To automatically connect to an RDMA
target, Sagi proposed a systemd one-shot service in [1].

The Fibre Channel transport, on the other hand, already knows its mapping of
rports to lports and thus could automatically connect to the target (with a
little help from udev), as shown in [2].

Unfortunately the FC method is not possible with RDMA, and the currently used
'nvme discover/connect/connect-all' method is extremely cumbersome with Fibre
Channel, especially as no special setup was or is needed for SCSI devices over
Fibre Channel, so administrators expect the same for NVMe.

Other downsides of the "RDMA version" are: 1) once the network topology and
thus /etc/nvme/discovery.conf changes, the initrd has to be rebuilt if NVMe is
to be started from the initrd, and 2) if we use the one-shot systemd service
there is no way to automatically retry the discovery/connect.

I'm hoping we have developers from the RDMA and Fibre Channel transports,
seasoned storage developers with SCSI Fibre Channel and RDMA knowledge, and
distribution maintainers around to discuss a way to address this problem in a
user-friendly way.

Byte,
	Johannes

[1] http://lists.infradead.org/pipermail/linux-nvme/2017-September/012976.html
[2] http://lists.infradead.org/pipermail/linux-nvme/2017-December/014324.html

-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
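For concreteness, here is a minimal sketch of what the current mechanism looks
like in practice. The transport addresses and the unit name are placeholders,
and the one-shot unit is only an approximation of the service proposed in [1],
not its actual contents:

  # /etc/nvme/discovery.conf -- one line of nvme-cli options per discovery
  # controller; 'nvme discover' and 'nvme connect-all' read this file when
  # run without arguments (example values only)
  --transport=rdma --traddr=192.168.100.10 --trsvcid=4420
  --transport=rdma --traddr=192.168.100.11 --trsvcid=4420

  # nvmf-autoconnect.service -- sketch of a one-shot unit along the lines of [1]
  [Unit]
  Description=Connect NVMe-oF subsystems listed in /etc/nvme/discovery.conf
  After=network-online.target
  Requires=network-online.target

  [Service]
  Type=oneshot
  ExecStart=/usr/sbin/nvme connect-all

  [Install]
  WantedBy=default.target

Downside 2) above is visible directly in the sketch: Type=oneshot runs exactly
once at boot, so a discovery controller that is unreachable at that moment is
never retried.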
* [LSF/MM TOPIC] NVMe over Fabrics auto-discovery in Linux
  2018-01-23 15:11 [LSF/MM TOPIC] NVMe over Fabrics auto-discovery in Linux Johannes Thumshirn
@ 2018-01-23 16:09 ` Bart Van Assche
  2018-01-24  8:26   ` Hannes Reinecke
  2018-01-24 18:42 ` Sagi Grimberg
  1 sibling, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2018-01-23 16:09 UTC (permalink / raw)

On 01/23/18 07:11, Johannes Thumshirn wrote:
> In NVMe over Fabrics we currently perform target discovery by running either
> 'nvme discover' or 'nvme connect-all' (with or without an appropriate
> /etc/nvme/discovery.conf).
>
> This is well suited for the RDMA transport, which has no idea of the
> underlying fabric and its connections. To automatically connect to an RDMA
> target, Sagi proposed a systemd one-shot service in [1].
>
> The Fibre Channel transport, on the other hand, already knows its mapping of
> rports to lports and thus could automatically connect to the target (with a
> little help from udev), as shown in [2].
>
> Unfortunately the FC method is not possible with RDMA, and the currently used
> 'nvme discover/connect/connect-all' method is extremely cumbersome with Fibre
> Channel, especially as no special setup was or is needed for SCSI devices over
> Fibre Channel, so administrators expect the same for NVMe.
>
> Other downsides of the "RDMA version" are: 1) once the network topology and
> thus /etc/nvme/discovery.conf changes, the initrd has to be rebuilt if NVMe is
> to be started from the initrd, and 2) if we use the one-shot systemd service
> there is no way to automatically retry the discovery/connect.
>
> I'm hoping we have developers from the RDMA and Fibre Channel transports,
> seasoned storage developers with SCSI Fibre Channel and RDMA knowledge, and
> distribution maintainers around to discuss a way to address this problem in a
> user-friendly way.
>
> Byte,
>     Johannes
>
> [1] http://lists.infradead.org/pipermail/linux-nvme/2017-September/012976.html
> [2] http://lists.infradead.org/pipermail/linux-nvme/2017-December/014324.html

Hello Johannes,

Can you have a look at the SSDP and SLP protocols and see whether one of
these protocols or an alternative is appropriate? See also
https://en.wikipedia.org/wiki/Simple_Service_Discovery_Protocol and
https://en.wikipedia.org/wiki/Service_Location_Protocol.

Thanks,

Bart.
* [LSF/MM TOPIC] NVMe over Fabrics auto-discovery in Linux
  2018-01-23 16:09 ` Bart Van Assche
@ 2018-01-24  8:26   ` Hannes Reinecke
  2018-01-24 17:17     ` James Smart
  0 siblings, 1 reply; 9+ messages in thread
From: Hannes Reinecke @ 2018-01-24 8:26 UTC (permalink / raw)

On 01/23/2018 05:09 PM, Bart Van Assche wrote:
> On 01/23/18 07:11, Johannes Thumshirn wrote:
>> In NVMe over Fabrics we currently perform target discovery by running
>> either 'nvme discover' or 'nvme connect-all' (with or without an
>> appropriate /etc/nvme/discovery.conf).
>>
>> This is well suited for the RDMA transport, which has no idea of the
>> underlying fabric and its connections. To automatically connect to an
>> RDMA target, Sagi proposed a systemd one-shot service in [1].
>>
>> The Fibre Channel transport, on the other hand, already knows its mapping
>> of rports to lports and thus could automatically connect to the target
>> (with a little help from udev), as shown in [2].
>>
>> Unfortunately the FC method is not possible with RDMA, and the currently
>> used 'nvme discover/connect/connect-all' method is extremely cumbersome
>> with Fibre Channel, especially as no special setup was or is needed for
>> SCSI devices over Fibre Channel, so administrators expect the same for
>> NVMe.
>>
>> Other downsides of the "RDMA version" are: 1) once the network topology
>> and thus /etc/nvme/discovery.conf changes, the initrd has to be rebuilt
>> if NVMe is to be started from the initrd, and 2) if we use the one-shot
>> systemd service there is no way to automatically retry the
>> discovery/connect.
>>
>> I'm hoping we have developers from the RDMA and Fibre Channel transports,
>> seasoned storage developers with SCSI Fibre Channel and RDMA knowledge,
>> and distribution maintainers around to discuss a way to address this
>> problem in a user-friendly way.
>>
>> Byte,
>>     Johannes
>>
>> [1] http://lists.infradead.org/pipermail/linux-nvme/2017-September/012976.html
>> [2] http://lists.infradead.org/pipermail/linux-nvme/2017-December/014324.html
>
> Hello Johannes,
>
> Can you have a look at the SSDP and SLP protocols and see whether one of
> these protocols or an alternative is appropriate? See also
> https://en.wikipedia.org/wiki/Simple_Service_Discovery_Protocol and
> https://en.wikipedia.org/wiki/Service_Location_Protocol.
>
Partially beside the point.

The problem currently is that FC-NVMe is the only transport which implements
dev_loss_tmo, causing connections to be dropped completely after a certain
time. After that the user has to manually re-establish the connection via
nvme-cli, or one has to create some udev/systemd interaction (cf. the thread
"nvme/fc: add 'discovery' sysfs attribute to fc transport devices" and
others).

The other transports just keep the reconnection loop running, and the user
has to manually _disconnect_ here.

So we have a difference in user experience, which should be reconciled.

Also, a user-space based rediscovery/reconnect will get tricky during path
failover, as one might end up with all connections down and no way of ever
being _able_ to call nvme-cli because the root fs is inaccessible. But that
might be another topic.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                     Teamlead Storage & Networking
hare at suse.de                                    +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
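To make the udev/systemd interaction Hannes mentions concrete: the approach
discussed in [2] and in the 'discovery' sysfs attribute thread boils down to
the FC transport emitting a uevent when it (re)discovers an NVMe-capable
remote port, and a udev rule reacting to it. A rough sketch follows; the event
name and environment variables (FC_EVENT, NVMEFC_HOST_TRADDR, NVMEFC_TRADDR)
were still under discussion at the time and are assumptions here, not the
final interface:

  # /etc/udev/rules.d/70-nvmefc-autoconnect.rules
  # (illustrative only; the rule is a single physical line)
  ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", ENV{NVMEFC_HOST_TRADDR}=="?*", ENV{NVMEFC_TRADDR}=="?*", RUN+="/usr/sbin/nvme connect-all --transport=fc --host-traddr=$env{NVMEFC_HOST_TRADDR} --traddr=$env{NVMEFC_TRADDR}"

Running long-lived commands from RUN+= is generally discouraged; a real rule
would more likely hand the work off to a templated systemd service, but the
flow is the same.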
* [LSF/MM TOPIC] NVMe over Fabrics auto-discovery in Linux
  2018-01-24 8:26 ` Hannes Reinecke
@ 2018-01-24 17:17   ` James Smart
  2018-01-24 18:46     ` Sagi Grimberg
  0 siblings, 1 reply; 9+ messages in thread
From: James Smart @ 2018-01-24 17:17 UTC (permalink / raw)

On 1/24/2018 12:26 AM, Hannes Reinecke wrote:
> Partially beside the point.
>
> The problem currently is that FC-NVMe is the only transport which
> implements dev_loss_tmo, causing connections to be dropped completely
> after a certain time. After that the user has to manually re-establish
> the connection via nvme-cli, or one has to create some udev/systemd
> interaction (cf. the thread "nvme/fc: add 'discovery' sysfs attribute to
> fc transport devices" and others).
>
> The other transports just keep the reconnection loop running, and the
> user has to manually _disconnect_ here.
>
> So we have a difference in user experience, which should be reconciled.

This is incorrect. RDMA (and FC too) have the per-controller ctrl_loss_tmo /
reconnect_delay settings that cap how long the reconnection loop will run
before the controller is deleted. In FC's case we know the state of the node,
which may have multiple controllers connected via it, and we have inherited
the SCSI semantics for how long to wait for connectivity to a node before
giving up - thus FC's reconnect window is capped at min(controller
ctrl_loss_tmo, FC node dev_loss_tmo).

So it is the same experience - at least in termination behavior. It is
possibly the same in recovery after this full termination/deletion of the
controller as well: if there is connectivity but the controller's
ctrl_loss_tmo has expired, FC needs the manual reconnect action just like
RDMA. However, FC does support the case where connectivity is lost and later
regained: it can automatically reconnect back to the storage - granted the
controller may have a different /dev name at that point.

As stated by Johannes, the real difference in behavior is establishing the
initial connectivity, as well as those auto-reconnect behaviors where
connectivity was lost and later regained.

> Also, a user-space based rediscovery/reconnect will get tricky during path
> failover, as one might end up with all connections down and no way of ever
> being _able_ to call nvme-cli because the root fs is inaccessible. But that
> might be another topic.
>
> Cheers,
>
> Hannes

I don't disagree with this, and I do believe that to cope with low-memory
situations during failover and reconnect (as we've seen in the past), as well
as to make booting easier (stop futzing with the ramdisk), it will likely
require some amount of NVMe discovery engine within the kernel. It's
difficult, as without things like SSDP or SLP, RDMA and TCP can't really do
this without admin help. But as you say - this can be a later topic.

-- james
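For reference, the per-controller knobs James refers to are exposed as
nvme-cli connect options. A sketch (the address and NQN are placeholders;
check the installed nvme-cli for the exact option spellings):

  # --reconnect-delay : seconds between reconnect attempts
  # --ctrl-loss-tmo   : total seconds to keep retrying before the controller
  #                     is removed; on FC the effective window is additionally
  #                     bounded by the remote port's dev_loss_tmo
  nvme connect -t rdma -a 192.168.100.10 -s 4420 \
      -n nqn.2018-01.io.example:subsys1 \
      --reconnect-delay=10 --ctrl-loss-tmo=600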
* [LSF/MM TOPIC] NVMe over Fabrics auto-discovery in Linux
  2018-01-24 17:17 ` James Smart
@ 2018-01-24 18:46   ` Sagi Grimberg
  0 siblings, 0 replies; 9+ messages in thread
From: Sagi Grimberg @ 2018-01-24 18:46 UTC (permalink / raw)

>> Partially beside the point.
>>
>> The problem currently is that FC-NVMe is the only transport which
>> implements dev_loss_tmo, causing connections to be dropped completely
>> after a certain time. After that the user has to manually re-establish
>> the connection via nvme-cli, or one has to create some udev/systemd
>> interaction (cf. the thread "nvme/fc: add 'discovery' sysfs attribute to
>> fc transport devices" and others).
>>
>> The other transports just keep the reconnection loop running, and the
>> user has to manually _disconnect_ here.
>>
>> So we have a difference in user experience, which should be reconciled.
>
> This is incorrect. RDMA (and FC too) have the per-controller ctrl_loss_tmo /
> reconnect_delay settings that cap how long the reconnection loop will run
> before the controller is deleted.

echo that...

> As stated by Johannes, the real difference in behavior is establishing the
> initial connectivity, as well as those auto-reconnect behaviors where
> connectivity was lost and later regained.

I agree that this is a problem for IP based transports. So until we have
discovery enhancements we probably need to poll predefined discovery
subsystems periodically and cross-check against what we are already connected
to, like the iSCSI discovery daemon does..

>> Also, a user-space based rediscovery/reconnect will get tricky during path
>> failover, as one might end up with all connections down and no way of ever
>> being _able_ to call nvme-cli because the root fs is inaccessible. But that
>> might be another topic.

Again, polling discovery subsystems every X time will overcome that...

> I don't disagree with this, and I do believe that to cope with low-memory
> situations during failover and reconnect (as we've seen in the past), as
> well as to make booting easier (stop futzing with the ramdisk), it will
> likely require some amount of NVMe discovery engine within the kernel.
> It's difficult, as without things like SSDP or SLP, RDMA and TCP can't
> really do this without admin help. But as you say - this can be a later
> topic.

I hope fabrics 1.1 discovery enhancements will tackle all that...
* [LSF/MM TOPIC] NVMe over Fabrics auto-discovery in Linux
  2018-01-23 15:11 [LSF/MM TOPIC] NVMe over Fabrics auto-discovery in Linux Johannes Thumshirn
  2018-01-23 16:09 ` Bart Van Assche
@ 2018-01-24 18:42 ` Sagi Grimberg
  2018-01-24 18:51   ` James Smart
  2018-01-29 13:05   ` Johannes Thumshirn
  1 sibling, 2 replies; 9+ messages in thread
From: Sagi Grimberg @ 2018-01-24 18:42 UTC (permalink / raw)

Hi Johannes,

> In NVMe over Fabrics we currently perform target discovery by running either
> 'nvme discover' or 'nvme connect-all' (with or without an appropriate
> /etc/nvme/discovery.conf).
>
> This is well suited for the RDMA transport, which has no idea of the
> underlying fabric and its connections. To automatically connect to an RDMA
> target, Sagi proposed a systemd one-shot service in [1].
>
> The Fibre Channel transport, on the other hand, already knows its mapping of
> rports to lports and thus could automatically connect to the target (with a
> little help from udev), as shown in [2].
>
> Unfortunately the FC method is not possible with RDMA, and the currently used
> 'nvme discover/connect/connect-all' method is extremely cumbersome with Fibre
> Channel, especially as no special setup was or is needed for SCSI devices over
> Fibre Channel, so administrators expect the same for NVMe.
>
> Other downsides of the "RDMA version" are: 1) once the network topology and
> thus /etc/nvme/discovery.conf changes, the initrd has to be rebuilt if NVMe is
> to be started from the initrd, and 2) if we use the one-shot systemd service
> there is no way to automatically retry the discovery/connect.
>
> I'm hoping we have developers from the RDMA and Fibre Channel transports,
> seasoned storage developers with SCSI Fibre Channel and RDMA knowledge, and
> distribution maintainers around to discuss a way to address this problem in a
> user-friendly way.

Discovery enhancements is a subject the NVMe TWG will be working on in the
near future, and "discovery of the discovery service" is indeed a sub-topic
IIRC. I'm not sure LSF would be the appropriate forum for this.

What we do need to have is a way to support existing devices. I think it's
acceptable that FC and Ethernet based transports diverge in their
implementations for this.

For Ethernet based transports we could follow the open-iscsi model, which has
a discovery daemon that periodically polls predefined addresses. As for
updating the initramfs, maybe we can live with this limitation for the time
being?

FC can keep doing its own thing...
* [LSF/MM TOPIC] NVMe over Fabrics auto-discovery in Linux
  2018-01-24 18:42 ` Sagi Grimberg
@ 2018-01-24 18:51   ` James Smart
  2018-01-24 18:59     ` Sagi Grimberg
  0 siblings, 1 reply; 9+ messages in thread
From: James Smart @ 2018-01-24 18:51 UTC (permalink / raw)

On 1/24/2018 10:42 AM, Sagi Grimberg wrote:
>
> FC can keep doing its own thing...
>

:) Funny, I see it as the Ethernet solutions doing their own thing with each
new protocol. FC keeps doing what it has done for years.

-- james
* [LSF/MM TOPIC] NVMe over Fabrics auto-discovery in Linux
  2018-01-24 18:51 ` James Smart
@ 2018-01-24 18:59   ` Sagi Grimberg
  0 siblings, 0 replies; 9+ messages in thread
From: Sagi Grimberg @ 2018-01-24 18:59 UTC (permalink / raw)

>> FC can keep doing its own thing...
>>
>
> :) Funny, I see it as the Ethernet solutions doing their own thing with
> each new protocol. FC keeps doing what it has done for years.

I meant "its own mature, well-known thing that works" :)
* [LSF/MM TOPIC] NVMe over Fabrics auto-discovery in Linux
  2018-01-24 18:42 ` Sagi Grimberg
  2018-01-24 18:51   ` James Smart
@ 2018-01-29 13:05   ` Johannes Thumshirn
  1 sibling, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2018-01-29 13:05 UTC (permalink / raw)

Sagi Grimberg <sagi at grimberg.me> writes:
> Discovery enhancements is a subject the NVMe TWG will be working on in the
> near future, and "discovery of the discovery service" is indeed a sub-topic
> IIRC. I'm not sure LSF would be the appropriate forum for this.
>
> What we do need to have is a way to support existing devices. I think it's
> acceptable that FC and Ethernet based transports diverge in their
> implementations for this.
>
> For Ethernet based transports we could follow the open-iscsi model, which
> has a discovery daemon that periodically polls predefined addresses. As for
> updating the initramfs, maybe we can live with this limitation for the time
> being?
>
> FC can keep doing its own thing...

Sure, general discovery enhancements are out of LSF's scope, but discussing
the implementation of an nvme-discoveryd (or something similar) for Fabrics
1.0 based systems would make LSF the right forum IMHO. This model still needs
/etc/nvme/discovery.conf updates mirrored into the initrd etc...

Byte,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
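The "mirrored into the initrd" step Johannes refers to is, on dracut-based
distributions, roughly the following (a sketch; whether the file must be
listed explicitly depends on which initramfs modules the distribution ships):

  # after editing /etc/nvme/discovery.conf, refresh the copy inside the
  # initramfs so connects made from early boot see the new topology
  dracut --force --install /etc/nvme/discovery.conf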
end of thread, other threads:[~2018-01-29 13:05 UTC | newest]

Thread overview: 9+ messages
2018-01-23 15:11 [LSF/MM TOPIC] NVMe over Fabrics auto-discovery in Linux Johannes Thumshirn
2018-01-23 16:09 ` Bart Van Assche
2018-01-24  8:26   ` Hannes Reinecke
2018-01-24 17:17     ` James Smart
2018-01-24 18:46       ` Sagi Grimberg
2018-01-24 18:42 ` Sagi Grimberg
2018-01-24 18:51   ` James Smart
2018-01-24 18:59     ` Sagi Grimberg
2018-01-29 13:05   ` Johannes Thumshirn