* Native multipath across multiple subsystem NQNs
@ 2022-02-11 22:07 Uday Shankar
2022-02-12 6:34 ` Christoph Hellwig
0 siblings, 1 reply; 10+ messages in thread
From: Uday Shankar @ 2022-02-11 22:07 UTC (permalink / raw)
To: linux-nvme
Cc: Prabhath Sajeepa, Keith Busch, Jens Axboe, Christoph Hellwig,
Sagi Grimberg
Hello,
We need to expose the same namespace (considered the same because it
reports the same unique identifier in "Identify Namespace") from two
distinct targets that have different subsystem NQNs. In this case, the
native multipath implementation in this driver behaves differently from
most other multipath implementations, including DM-multipath and the
implementations in other operating systems.
Native multipath: Does not consider the namespaces the same. Two block
devices are created, corresponding to the two subsystems in play.
Others: The namespaces are considered the same. One block device is
created, backed by paths to both subsystems.
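For illustration, a rough sketch of the setup with nvme-cli (the
transport addresses and NQNs below are made up):

  # Same namespace (same NGUID/UUID) exported by two targets that use
  # different subsystem NQNs.
  nvme connect -t tcp -a 192.0.2.10 -s 4420 -n nqn.2022-02.com.example:subsys-a
  nvme connect -t tcp -a 192.0.2.20 -s 4420 -n nqn.2022-02.com.example:subsys-b

  # Native multipath groups paths per subsystem, so this produces two
  # block devices (e.g. /dev/nvme0n1 and /dev/nvme1n1), one per NQN.
  # DM-multipath would instead stack a single map over both, since the
  # namespaces report the same unique identifier.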
The second behavior is desired in our use case. I want to implement this
behavior in the driver. I can think of two approaches, and want some
feedback on which one(s) would be acceptable.
1. A "direct" approach: add a parameter, which, when set, will result in
the second behavior - paths to namespaces are consolidated purely based
on namespace unique ID, without considering the subsystem.
2. The recently ratified TP 4034 (Dispersed Namespaces) feature provides
a means to get the second behavior, and a lot more. It requires both
host and target support.
I look forward to your feedback!
Thanks,
Uday
* Re: Native multipath across multiple subsystem NQNs
2022-02-11 22:07 Native multipath across multiple subsystem NQNs Uday Shankar
@ 2022-02-12 6:34 ` Christoph Hellwig
2022-02-12 21:22 ` Keith Busch
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2022-02-12 6:34 UTC (permalink / raw)
To: Uday Shankar
Cc: linux-nvme, Prabhath Sajeepa, Keith Busch, Jens Axboe,
Christoph Hellwig, Sagi Grimberg
The answer is neither. The fact that NVMe with ANA required namespaces
to be attached to the same subsystem is good and important.
TP 4034 is retrograde and completely broken in this respect and should
never have been ratified. Please fix your storage system to export
virtual subsystems like everyone else and don't break the nice NVMe
architecture.
* Re: Native multipath across multiple subsystem NQNs
2022-02-12 6:34 ` Christoph Hellwig
@ 2022-02-12 21:22 ` Keith Busch
2022-02-14 8:12 ` Christoph Hellwig
0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2022-02-12 21:22 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Uday Shankar, linux-nvme, Prabhath Sajeepa, Jens Axboe,
Sagi Grimberg
On Sat, Feb 12, 2022 at 07:34:22AM +0100, Christoph Hellwig wrote:
> The answer is neither. The fact that NVMe with ANA required namespaces
> to be attached to the same subsystem is good and important.
> TP 4034 is retrograde and completely broken in this respect and should
> never have been ratified. Please fix your storage system to export
> virtual subsystems like everyone else and don't break the nice NVMe
> architecture.
If anything, the driver could check for duplicate namespace identifiers
across subsystems so that we could warn about or prevent exposing them.
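A userspace approximation of that check, for illustration (sysfs paths
as seen on a recent kernel with native multipath; they may differ
elsewhere):

  # Flag namespaces that report the same WWID under more than one
  # subsystem NQN.
  for ns in /sys/block/nvme*n*; do
          [ -r "$ns/wwid" ] || continue
          printf '%s %s\n' "$(cat "$ns/wwid")" "$(cat "$ns/device/subsysnqn")"
  done | sort -u | awk '{ seen[$1]++ }
          END { for (id in seen) if (seen[id] > 1)
                  print "namespace " id " visible from multiple subsystems" }'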
* Re: Native multipath across multiple subsystem NQNs
2022-02-12 21:22 ` Keith Busch
@ 2022-02-14 8:12 ` Christoph Hellwig
2022-02-25 0:15 ` Uday Shankar
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2022-02-14 8:12 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Uday Shankar, linux-nvme, Prabhath Sajeepa,
Jens Axboe, Sagi Grimberg
On Sat, Feb 12, 2022 at 01:22:21PM -0800, Keith Busch wrote:
> On Sat, Feb 12, 2022 at 07:34:22AM +0100, Christoph Hellwig wrote:
> > The answer is neither. The fact that NVMe with ANA required namespaces
> > to be attached to the same subsystem is good and important.
> > TP 4034 is retrograde and completely broken in this respect and should
> > never have been ratified. Please fix your storage system to export
> > virtual subsystems like everyone else and don't break the nice NVMe
> > architecture.
>
> If anything, the driver could check for duplicate namespace identifiers
> across subsystems so that we could warn about or prevent exposing them.
Yes, this has actually been something I've been wanting to do for a long
time.
* Re: Native multipath across multiple subsystem NQNs
2022-02-14 8:12 ` Christoph Hellwig
@ 2022-02-25 0:15 ` Uday Shankar
2022-02-25 0:53 ` Randy Jennings
0 siblings, 1 reply; 10+ messages in thread
From: Uday Shankar @ 2022-02-25 0:15 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, linux-nvme, Prabhath Sajeepa, Jens Axboe,
Sagi Grimberg, Randy Jennings
Pulling Randy into this thread.
* Re: Native multipath across multiple subsystem NQNs
2022-02-25 0:15 ` Uday Shankar
@ 2022-02-25 0:53 ` Randy Jennings
2022-03-01 11:08 ` Christoph Hellwig
0 siblings, 1 reply; 10+ messages in thread
From: Randy Jennings @ 2022-02-25 0:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christoph Hellwig, Keith Busch, linux-nvme, Prabhath Sajeepa,
Jens Axboe, Sagi Grimberg, Uday Shankar
> The fact that NVMe with ANA required namespaces
> to be attached to the same subsystem is good and important.
> TP 4034 is retrograde and completely broken in this respect and should
> never have been ratified. Please fix your storage system to export
> virtual subsystems like everyone else and don't break the nice NVMe
> architecture.
Suppose we did implement virtual subsystems to handle exposing
namespaces on multiple arrays.
In addition to other considerations, migrating a namespace with hot data
from a different storage vendor's array nondisruptively to client I/O
would be difficult to do without having a namespace exist under multiple
NQNs. One approach that has been used for SCSI is to multipath to both
targets with a proxy mirroring writes; on cut-over, the old path
disconnects. If the filehandle has to change, that is disruptive to the
client software.
Unifying different storage arrays behind the same NQN/virtual subsystem
requires coordination of nvme ctrl_ids at least, and doing that between
different storage vendor arrays is unlikely. Having a mechanism to
migrate between different JBOF devices non-disruptively would be helpful
regardless of the source/destination vendors. Such devices will
probably not have the option of virtual subsystems.
Additionally, the granularity at which NQNs/subsystems are exposed to
hosts affects how many connections the host creates with the controller.
Each connection has a cost in hardware & software.
In other words, even if we implement virtual subsystems, we still have
a use for non-disruptively moving a namespace between subsystems. How
will this use case be supported on Linux?
Sincerely,
Randy Jennings
* Re: Native multipath across multiple subsystem NQNs
2022-02-25 0:53 ` Randy Jennings
@ 2022-03-01 11:08 ` Christoph Hellwig
[not found] ` <2A0F911A-7900-401F-B75B-7D0C2866C33A@netapp.com>
2022-03-03 11:04 ` Adurthi, Prashanth
0 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2022-03-01 11:08 UTC (permalink / raw)
To: Randy Jennings
Cc: Christoph Hellwig, Keith Busch, linux-nvme, Prabhath Sajeepa,
Jens Axboe, Sagi Grimberg, Uday Shankar
On Thu, Feb 24, 2022 at 05:53:06PM -0700, Randy Jennings wrote:
> In addition to other considerations, migrating a namespace with hot data
> from a different storage vendor's array nondisruptively to client I/O
> would be difficult to do without having a namespace exist under multiple
> NQNs. One approach that has been used for SCSI is to multipath to both
> targets with a proxy mirroring writes; on cut-over, the old path
> disconnects. If the filehandle has to change, that is disruptive to the
> client software.
>
> Unifying different storage arrays behind the same NQN/virtual subsystem
> requires coordination of nvme ctrl_ids at least, and doing that between
> different storage vendor arrays is unlikely. Having a mechanism to
> migrate between different JBOF devices non-disruptively would be helpful
> regardless of the source/destination vendors. Such devices will
> probably not have the option of virtual subsystems.
Yes, it does require coordination, so please coordinate.
> In other words, even if we implement virtual subsystems, we still have
> a use for non-disruptively moving a namespace between subsystems. How
> will this use case be supported on Linux?
You can create a mirror using dm (or md but not as easily), resync and
then cut off a leg. If you want to do that transparently on an
already existing device node you'll need to pick up the block device
interposer patchset and help to get it upstream.
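For completeness, a rough sketch of that dm route (device names and the
region size are placeholders, and it assumes the consumer sits on the dm
node from the start; without that, you are back to the interposer):

  # /dev/nvme0n1 = old namespace, /dev/nvme1n1 = new namespace, same size.
  SECTORS=$(blockdev --getsz /dev/nvme0n1)

  # 1. Build a mirror over both legs; dm-raid1 resyncs onto the new leg.
  dmsetup create migrate --table \
      "0 $SECTORS mirror core 1 1024 2 /dev/nvme0n1 0 /dev/nvme1n1 0"

  # 2. Wait until 'dmsetup status migrate' shows the mirror fully in sync.

  # 3. Cut off the old leg by swapping in a linear map to the new device.
  dmsetup suspend migrate
  dmsetup reload migrate --table "0 $SECTORS linear /dev/nvme1n1 0"
  dmsetup resume migrate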
* Re: Native multipath across multiple subsystem NQNs
[not found] ` <2A0F911A-7900-401F-B75B-7D0C2866C33A@netapp.com>
@ 2022-03-03 10:41 ` Christoph Hellwig
0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2022-03-03 10:41 UTC (permalink / raw)
To: Adurthi, Prashanth
Cc: Christoph Hellwig, Randy Jennings, Keith Busch,
linux-nvme@lists.infradead.org, Prabhath Sajeepa, Jens Axboe,
Sagi Grimberg, Uday Shankar, Knight, Frederick
Hi Prashanth,
your mail client completely messed up the quoting, so I could not easily
find what you actually wrote in between the full quote.
* Re: Native multipath across multiple subsystem NQNs
2022-03-01 11:08 ` Christoph Hellwig
[not found] ` <2A0F911A-7900-401F-B75B-7D0C2866C33A@netapp.com>
@ 2022-03-03 11:04 ` Adurthi, Prashanth
2022-03-03 11:13 ` Christoph Hellwig
1 sibling, 1 reply; 10+ messages in thread
From: Adurthi, Prashanth @ 2022-03-03 11:04 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Randy Jennings, Keith Busch, linux-nvme@lists.infradead.org,
Prabhath Sajeepa, Jens Axboe, Sagi Grimberg, Uday Shankar,
Knight, Frederick
> On 01-Mar-2022, at 4:38 PM, Christoph Hellwig <hch@lst.de> wrote:
>
>
>
> On Thu, Feb 24, 2022 at 05:53:06PM -0700, Randy Jennings wrote:
>> In addition to other considerations, migrating a namespace with hot data
>> from a different storage vendor's array nondisruptively to client I/O
>> would be difficult to do without having a namespace exist under multiple
>> NQNs. One approach that has been used for SCSI is to multipath to both
>> targets with a proxy mirroring writes; on cut-over, the old path
>> disconnects. If the filehandle has to change, that is disruptive to the
>> client software.
>>
>> Unifying different storage arrays behind the same NQN/virtual subsystem
>> requires coordination of nvme ctrl_ids at least, and doing that between
>> different storage vendor arrays is unlikely. Having a mechanism to
>> migrate between different JBOF devices non-disruptively would be helpful
>> regardless of the source/destination vendors. Such devices will
>> probably not have the option of virtual subsystems.
>
> Yes, it does require coordination, so please coordinate.
>
>> In other words, even if we implement virtual subsystems, we still have
>> a use for non-disruptively moving a namespace between subsystems. How
>> will this use case be supported on Linux?
Implementing the same subsystem NQN on multiple storage arrays imposes excessive
architecture/design costs on the target. In addition to implementing
the same subsystem NQN, the target has to ensure that all components
that make up the virtual subsystem (across multiple arrays) have
the same inventory (same namespaces with matching NSIDs and ANA
groups). This needs to be kept consistent even when there are communication
failures between those components of the virtual subsystem.
The virtual subsystem would also be required to have the same host
access control settings, similar QoS settings, etc. on all of the
components that make up the virtual subsystem. This coordination, in
addition to what is required for controller IDs, is extremely
complicated even when both arrays belong to the same vendor, and next
to impossible if a multi-vendor solution is ever developed.
TP 4034 is designed specifically to address these migration/business
continuity use cases while minimizing all of that complexity in the
target, so it is quite useful in that respect.
Can you please shed some light on why you consider TP 4034 retrograde? What issues are you concerned about if Linux nvme were to handle namespaces shared across subsystems?
Prashanth
* Re: Native multipath across multiple subsystem NQNs
2022-03-03 11:04 ` Adurthi, Prashanth
@ 2022-03-03 11:13 ` Christoph Hellwig
0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2022-03-03 11:13 UTC (permalink / raw)
To: Adurthi, Prashanth
Cc: Christoph Hellwig, Randy Jennings, Keith Busch,
linux-nvme@lists.infradead.org, Prabhath Sajeepa, Jens Axboe,
Sagi Grimberg, Uday Shankar, Knight, Frederick,
Greg Kroah-Hartman
Hi Prashanth,
thanks for adding Fred, who apparently complained to the NVMe BOD about
this issue, and who really needs some shaming from Greg for trying to
use corporate means to force Linux developers into implementing broken
standards.
On Thu, Mar 03, 2022 at 11:04:48AM +0000, Adurthi, Prashanth wrote:
> Implementing the same subsystem NQN on multiple storage arrays imposes excessive
> architecture/design costs on the target.
On the other hand, implementing TP 4034 forces excessive architecture,
design, maintenance and debugging costs on the host. That's why I've
told the NVMe technical working group, on the very day this TPAR was
brought in, that we are not going to support it, and I've also heard
feedback from various other host implementers that there is very little
interest in it.
>
> In addition to implementing
> the same subsystem NQN, the target has to ensure that all components
> that make up the virtual subsystem (across multiple arrays) have
> the same inventory (same namespaces with matching NSIDs and ANA
> groups). This needs to be kept consistent even when there are communication
> failures between those components of the virtual subsystem.
> The virtual subsystem would also be required to have the same host
> access control settings, similar QoS settings, etc. on all of the
> components that make up the virtual subsystem. This coordination, in
> addition to what is required for controller IDs, is extremely
> complicated even when both arrays belong to the same vendor, and next
> to impossible if a multi-vendor solution is ever developed.
Yes, and that is the whole point.
> Can you please shed some light on why you consider TP 4034 retrograde? What issues are you concerned about if Linux nvme were to handle namespaces shared across subsystems?
NVMe has a sensible object model that allows for easy sanity checking
of duplicate IDs and, more importantly, for proper discovery. We do not
want to lose that, nor do we want to break the sysfs hierarchy that
directly results from encoding the NVMe architecture model in the Linux
device model.
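For reference, this is roughly the hierarchy in question (illustrative;
exact entries vary by kernel version):

  /sys/class/nvme-subsystem/nvme-subsys0/
      subsysnqn        # one NQN for the whole subsystem
      nvme0/           # controller (one path)
      nvme1/           # controller (another path)
      nvme0n1/         # shared namespace head node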
Thread overview: 10+ messages
2022-02-11 22:07 Native multipath across multiple subsystem NQNs Uday Shankar
2022-02-12 6:34 ` Christoph Hellwig
2022-02-12 21:22 ` Keith Busch
2022-02-14 8:12 ` Christoph Hellwig
2022-02-25 0:15 ` Uday Shankar
2022-02-25 0:53 ` Randy Jennings
2022-03-01 11:08 ` Christoph Hellwig
[not found] ` <2A0F911A-7900-401F-B75B-7D0C2866C33A@netapp.com>
2022-03-03 10:41 ` Christoph Hellwig
2022-03-03 11:04 ` Adurthi, Prashanth
2022-03-03 11:13 ` Christoph Hellwig