linux-nvme.lists.infradead.org archive mirror
From: snitzer@redhat.com (Mike Snitzer)
Subject: [PATCH 0/3] Provide more fine grained control over multipathing
Date: Wed, 30 May 2018 18:02:06 -0400	[thread overview]
Message-ID: <20180530220206.GA7037@redhat.com> (raw)
In-Reply-To: <db1aa53f-1152-c003-ccc1-3e439ca6916d@grimberg.me>

On Wed, May 30 2018 at  5:20pm -0400,
Sagi Grimberg <sagi@grimberg.me> wrote:

> Hi Folks,
> 
> I'm sorry to chime in super late on this, but a lot has been
> going on for me lately which got me off the grid.
> 
> So I'll try to provide my input hopefully without starting any more
> flames..
> 
> >>>This patch series aims to provide a more fine grained control over
> >>>nvme's native multipathing, by allowing it to be switched on and off
> >>>on a per-subsystem basis instead of a big global switch.
> >>
> >>No.  The only reason we even allowed to turn multipathing off is
> >>because you complained about installer issues.  The path forward
> >>clearly is native multipathing and there will be no additional support
> >>for the use cases of not using it.
> >
> >We all basically knew this would be your position.  But at this year's
> >LSF we pretty quickly reached consensus that we do in fact need this.
> >Except for yourself, Sagi and afaik Martin George: all on the cc were in
> >attendance and agreed.
> 
> Correction, I wasn't able to attend LSF this year (unfortunately).

Yes, I was trying to say you weren't at LSF (but are on the cc).

> >And since then we've exchanged mails to refine and test Johannes'
> >implementation.
> >
> >You've isolated yourself on this issue.  Please just accept that we all
> >have a pretty solid command of what is needed to properly provide
> >commercial support for NVMe multipath.
> >
> >The ability to switch between "native" and "other" multipath absolutely
> >does _not_ imply anything about the winning disposition of native vs
> >other.  It is purely about providing commercial flexibility to use
> >whatever solution makes sense for a given environment.  The default _is_
> >native NVMe multipath.  It is on userspace solutions for "other"
> >multipath (e.g. multipathd) to allow users to whitelist an NVMe
> >subsystem to be switched to "other".
> >
> >Hopefully this clarifies things, thanks.
> 
> Mike, I understand what you're saying, but I also agree with hch on
> the simple fact that this is a burden on linux nvme (although less
> passionate about it than hch).
> 
> Beyond that, this is going to get much worse when we support "dispersed
> namespaces" which is a submitted TPAR in the NVMe TWG. "dispersed
> namespaces" makes NVMe namespaces share-able over different subsystems
> so changing the personality on a per-subsystem basis is just asking for
> trouble.
> 
> Moreover, I also wanted to point out that fabrics array vendors are
> building products that rely on standard nvme multipathing (and probably
> multipathing over dispersed namespaces as well), and keeping a knob that
> will keep nvme users with dm-multipath will probably not help them
> educate their customers as well... So there is another angle to this.

Wouldn't expect you guys to nurture this 'mpath_personality' knob.  So
when features like "dispersed namespaces" land, a negative check would
need to be added in the code to prevent switching away from "native".

And once something like "dispersed namespaces" lands we'd then have to
see about a more sophisticated switch that operates at a different
granularity.  Could also be that switching one subsystem that is part of
"dispersed namespaces" would then cascade to all other associated
subsystems?  Not that dissimilar from the 3rd patch in this series that
allows a 'device' switch to be done in terms of the subsystem.

Anyway, I don't know the end from the beginning on something you just
told me about ;)  But we're all in this together.  And can take it as it
comes.  I'm merely trying to bridge the gap from old dm-multipath while
native NVMe multipath gets its legs.

In time I really do have aspirations to contribute more to NVMe
multipathing.  I think Christoph's NVMe multipath implementation of a
bio-based device on top of NVMe core's blk-mq device(s) is very clever
and effective (blk_steal_bios() hack and all).

> Don't get me wrong, I do support your cause, and I think nvme should try
> to help, I just think that subsystem granularity is not the correct
> approach going forward.

I understand there will be limits to this 'mpath_personality' knob's
utility and it'll need to evolve over time.  But the burden of making
more advanced NVMe multipath features accessible outside of native NVMe
isn't intended to be on any of the NVMe maintainers (other than maybe
remembering to disallow the switch where it makes sense in the future).
 
> As I said, I've been off the grid, can you remind me why global knob is
> not sufficient?

Because once nvme_core.multipath=N is set, native NVMe multipath is no
longer accessible from the same host.  The goal of this patchset is to
give users choice, not to limit them to _only_ using dm-multipath just
because they have some legacy needs.
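To make the contrast concrete, here is a rough sketch of the two knobs.  The sysfs path and attribute name are assumptions based on the patch descriptions in this series (they may not match the final implementation, and won't exist on an unpatched kernel), so the script checks before writing:

```shell
#!/bin/sh
# Global knob: disables native NVMe multipath for the ENTIRE host.
# Set at module load time (kernel command line or modprobe option),
# so it's all-or-nothing -- no native multipath anywhere on this host.
GLOBAL_KNOB="nvme_core.multipath=N"

# Per-subsystem knob proposed by this series.  The exact sysfs path is
# an assumption drawn from the patch descriptions:
KNOB=/sys/class/nvme-subsystem/nvme-subsys0/mpath_personality

if [ -w "$KNOB" ]; then
    # Switch just this one subsystem to "other" (dm-multipath),
    # leaving native multipath available for every other subsystem.
    echo other > "$KNOB"
else
    echo "mpath_personality not present (patch not applied or no such subsystem)"
fi
```

The point of the sketch is the granularity: the first knob removes the choice host-wide, the second preserves it per subsystem.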

Tough to be convincing with hypotheticals, but I could imagine a very
obvious use case for native NVMe multipathing being PCI-based embedded
NVMe "fabrics" (especially if/when the numa-based path selector lands).
But the same host with PCI NVMe could be connected to an FC network that
has historically always been managed via dm-multipath... and say that
FC-based infrastructure gets updated to use NVMe (to leverage a wider
NVMe investment, whatever?) -- maybe admins would still prefer to use
dm-multipath for the NVMe over FC.
 
> This might sound stupid to you, but can't users that desperately must
> keep using dm-multipath (for its mature toolset or what-not) just
> stack it on multipath nvme device? (I might be completely off on
> this so feel free to correct my ignorance).

We could certainly pursue adding multipath-tools support for native NVMe
multipathing.  Not opposed to it (even if just reporting topology and
state).  But given the extensive lengths NVMe multipath goes to hide
devices, we'd need some way to pierce through the opaque nvme device
that native NVMe multipath exposes.  That really is a tangent relative
to this patchset, though.  Such visibility would also benefit the nvme
cli... otherwise how are users even able to verify that native NVMe
multipathing did what they expected it to?
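For what it's worth, a sketch of the kind of visibility meant here, using nvme-cli's list-subsys (availability varies by nvme-cli version, so treat it as an assumption) with a sysfs fallback that works without any tooling:

```shell
#!/bin/sh
# Try nvme-cli first: list-subsys shows each subsystem and the
# controllers (paths) behind it, which is exactly the topology that
# native multipath otherwise hides behind one opaque /dev/nvmeXnY.
if command -v nvme >/dev/null 2>&1; then
    topo=$(nvme list-subsys 2>/dev/null || true)
else
    topo=""  # nvme-cli not installed; rely on sysfs below
fi

# Sysfs fallback: each controller hangs off its subsystem here even
# though only the multipath head device is exposed under /dev.
count=$(ls -d /sys/class/nvme-subsystem/*/nvme* 2>/dev/null | wc -l)
echo "controllers visible via sysfs: $count"
```

On a host without NVMe hardware this simply reports zero controllers; the point is that per-path state is recoverable from sysfs even when the block devices themselves are hidden.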

Mike


Thread overview: 52+ messages
2018-05-25 12:53 [PATCH 0/3] Provide more fine grained control over multipathing Johannes Thumshirn
2018-05-25 12:53 ` [PATCH 1/3] nvme: provide a way to disable nvme mpath per subsystem Johannes Thumshirn
2018-05-25 13:47   ` Mike Snitzer
2018-05-31  8:17   ` Sagi Grimberg
2018-05-25 12:53 ` [PATCH 2/3] nvme multipath: added SUBSYS_ATTR_RW Johannes Thumshirn
2018-05-25 12:53 ` [PATCH 3/3] nvme multipath: add dev_attr_mpath_personality Johannes Thumshirn
2018-05-25 13:05 ` [PATCH 0/3] Provide more fine grained control over multipathing Christoph Hellwig
2018-05-25 13:58   ` Mike Snitzer
2018-05-25 14:12     ` Christoph Hellwig
2018-05-25 14:50       ` Mike Snitzer
2018-05-29  1:19         ` Martin K. Petersen
2018-05-29  3:02           ` Mike Snitzer
2018-05-29  7:18             ` Hannes Reinecke
2018-05-29  7:22             ` Johannes Thumshirn
2018-05-29  8:09               ` Christoph Hellwig
2018-05-29  9:54                 ` Mike Snitzer
2018-05-29 23:27                 ` Mike Snitzer
2018-05-30 19:05                   ` Jens Axboe
2018-05-30 19:59                     ` Mike Snitzer
2018-06-04  6:19                     ` Hannes Reinecke
2018-06-04  7:18                       ` Johannes Thumshirn
2018-06-04 12:59                         ` Christoph Hellwig
2018-06-04 13:27                           ` Mike Snitzer
2018-05-31  2:42               ` Ming Lei
2018-05-30 21:20     ` Sagi Grimberg
2018-05-30 22:02       ` Mike Snitzer [this message]
2018-05-31  8:37         ` Sagi Grimberg
2018-05-31 12:37           ` Mike Snitzer
2018-05-31 16:34             ` Christoph Hellwig
2018-06-01  4:11               ` Mike Snitzer
2018-05-31 16:36           ` Christoph Hellwig
2018-05-31 16:33         ` Christoph Hellwig
2018-05-31 18:17           ` Mike Snitzer
2018-06-01  2:40             ` Martin K. Petersen
2018-06-01  4:24               ` Mike Snitzer
2018-06-01 14:09                 ` Martin K. Petersen
2018-06-01 15:21                   ` Mike Snitzer
2018-06-03 11:00                 ` Sagi Grimberg
2018-06-03 16:06                   ` Mike Snitzer
2018-06-04 11:46                     ` Sagi Grimberg
2018-06-04 12:48                       ` Johannes Thumshirn
2018-05-30 22:44       ` Mike Snitzer
2018-05-31  8:51         ` Sagi Grimberg
2018-05-31 12:41           ` Mike Snitzer
2018-06-04 21:58       ` Roland Dreier
2018-06-05  4:42         ` Christoph Hellwig
2018-06-05 22:57           ` Roland Dreier
2018-06-06  9:51             ` Christoph Hellwig
2018-06-06  9:32           ` Sagi Grimberg
2018-06-06  9:50             ` Christoph Hellwig
2018-05-25 14:22   ` Johannes Thumshirn
2018-05-25 14:30     ` Christoph Hellwig
