public inbox for linux-nvme@lists.infradead.org
From: Ming Lei <ming.lei@redhat.com>
To: Keith Busch <kbusch@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>,
	John Meneghini <jmeneghi@redhat.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	ming.lei@redhat.com
Subject: Re: nvme/pcie hot plug results in /dev name change
Date: Wed, 1 Feb 2023 10:33:15 +0800
Message-ID: <Y9nPa2ZBc80g2LcA@T590>
In-Reply-To: <Y9lEF7oY+aQw2POp@kbusch-mbp.dhcp.thefacebook.com>

On Tue, Jan 31, 2023 at 09:38:47AM -0700, Keith Busch wrote:
> On Sun, Jan 29, 2023 at 06:28:05PM +0800, Ming Lei wrote:
> > On Fri, Jan 20, 2023 at 11:01:53PM -0800, Christoph Hellwig wrote:
> > > On Fri, Jan 20, 2023 at 02:42:23PM -0700, Keith Busch wrote:
> > > > That is correct. We don't know the identity of the device at the point
> > > > we have to assign it an instance number, so the hot added one will just
> > > > get the first available unique number. If you need a consistent name, we
> > > > have the persistent naming rules that should create those links in
> > > > /dev/disk/by-id/.
> > > 
> > > Note that this is a bit of a problem under a file system or stacking
> > > driver that handles failing drives (e.g. btrfs or md raid), which holds
> > > on to the "old" device file and then fails to find the new one.  I had a
> > > customer complaint for that as well :)
> > > 
> > > The first hack was to force run the multipath code that can keep the
> > > node alive.  That works, but is really ugly especially when dealing
> > > with corner cases such as overlapping nsids between different
> > > controllers.
> > > 
> > > In the long run I think we'll need to:
> > >  - send a notification to the holder if a device is hot removed from
> > >    the block layer so that it can clean up
> > 
> > When the disk is deleted, a notification is already sent to userspace
> > via the udev/kobject uevent, so the user can umount the original FS, or
> > DM/MD userspace can handle the device removal.
> > 
> > >  - make the upper layers look for the replugged device
> > > 
> > > I've been working on some of this for a while but haven't made much
> > > progress due to other commitments.
> > 
> > Persistent block device naming is supposed to be handled by userspace,
> > for example via udev rules.
> 
> Come to think of it, I actually have heard many complaints about this behavior.
> Requiring user space to deal with the teardown and restore of their open files
> and mount points on a transient link loss can be inconvenient. Example use cases

If an I/O error is returned to the FS, I guess umount may have to be done,
since it might be a metadata I/O. But if userspace has a persistent device
name, it is easy for userspace to handle the umount and re-setup.
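
To illustrate what I mean by userspace relying on the persistent name, here is
a minimal sketch. It assumes udev has created a symlink under /dev/disk/by-id/
as Keith described; the helper name and the link name in the usage part are
hypothetical, made up for illustration only.

```python
import os


def resolve_persistent_name(link_path):
    """Return the device node a persistent symlink currently points to.

    Returns None when the link does not exist, e.g. while the device is
    unplugged and udev has removed the link.
    """
    if not os.path.islink(link_path):
        return None
    # realpath() follows the symlink to whatever node the kernel assigned
    # after the most recent probe.
    return os.path.realpath(link_path)


if __name__ == "__main__":
    # Hypothetical link name; real names depend on the device model/serial.
    link = "/dev/disk/by-id/nvme-EXAMPLE_MODEL_EXAMPLE_SERIAL"
    node = resolve_persistent_name(link)
    if node is None:
        print("device not present")
    else:
        print("current device node:", node)
```

After a hot plug the same link may resolve to a different /dev node, so the
re-setup path would look the name up again instead of caching the old node.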

> are firmware activation requiring a Subsystem Reset, or a PCIe error
> containment event. Those cause the links to bounce, which can trigger hot plug
> events in some platforms.

The above isn't unique to nvme; it is just easier for nvme-pci to handle
timeouts/errors by removing the device, IMO.

> The native nvme multipath looks like it could be leveraged to improve that
> user experience if we wanted to make that layer an option for non-multipath
> devices.

Can you share the basic idea? Will nvme multipath hold the I/O error and
not propagate it to the upper layer until the new device is probed? What if
the new device is probed late, after the I/O has already timed out and the
error has been returned to the upper layer?


Thanks, 
Ming



