From: Nilay Shroff <nilay@linux.ibm.com>
To: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org
Cc: hch@lst.de, kbusch@kernel.org, hare@suse.de, sagi@grimberg.me,
	jmeneghi@redhat.com, axboe@kernel.dk, gjoyce@ibm.com
Subject: [RFC PATCH 0/2] improve NVMe multipath handling
Date: Fri, 21 Mar 2025 12:07:21 +0530
Message-ID: <20250321063901.747605-1-nilay@linux.ibm.com>

Hi,

This patch series introduces improvements to NVMe multipath handling by
refining the removal behavior of the multipath head node and simplifying
configuration options. The idea/POC for this change was originally
proposed by Christoph[1] and Keith[2]. I built on their original
idea/POC to implement this series.

The first patch in the series addresses an issue where the multipath
head node of a PCIe NVMe disk is removed immediately when all disk paths
are lost. This can cause problems in scenarios such as:
- Hot removal and re-addition of a disk.
- Transient PCIe link failures that trigger re-enumeration,
  briefly removing and restoring the disk.

In such cases, premature removal of the head node may result in a device
node name change, requiring applications to reopen device handles if
they were performing I/O during the failure. To mitigate this, we
introduce a delayed removal mechanism. Instead of removing the head node
immediately, the system waits for a configurable timeout, allowing the
disk to recover. If the disk comes back online within this window, the
head node remains unchanged, ensuring uninterrupted workloads.
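
To make the intended behavior concrete, here is a rough userspace model
of the delayed removal described above. It is only a sketch, not the code
from the patch: the type and function names (head_node, head_path_lost,
head_tick and so on) are hypothetical, time is passed in as plain
seconds, and I assume a path returning exactly at the deadline counts as
too late.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names come from the actual patch. */
enum head_state { HEAD_LIVE, HEAD_PENDING_REMOVAL, HEAD_REMOVED };

struct head_node {
	enum head_state state;
	unsigned int nr_paths;   /* currently usable paths */
	unsigned int delay_sec;  /* 0 means remove immediately */
	unsigned long deadline;  /* valid only in HEAD_PENDING_REMOVAL */
};

void head_init(struct head_node *h, unsigned int nr_paths,
	       unsigned int delay_sec)
{
	assert(nr_paths > 0);
	h->state = HEAD_LIVE;
	h->nr_paths = nr_paths;
	h->delay_sec = delay_sec;
	h->deadline = 0;
}

/* Timer callback: drop the head once the grace period has expired. */
void head_tick(struct head_node *h, unsigned long now)
{
	if (h->state == HEAD_PENDING_REMOVAL && now >= h->deadline)
		h->state = HEAD_REMOVED;
}

/* One path disappeared at time 'now' (in seconds). */
void head_path_lost(struct head_node *h, unsigned long now)
{
	if (h->state != HEAD_LIVE || h->nr_paths == 0)
		return;
	if (--h->nr_paths > 0)
		return;                     /* other paths still usable */
	if (h->delay_sec == 0) {
		h->state = HEAD_REMOVED;    /* existing behavior */
		return;
	}
	h->state = HEAD_PENDING_REMOVAL;    /* start the grace period */
	h->deadline = now + h->delay_sec;
}

/* A path came back; returns true if the existing head node was reused. */
bool head_path_added(struct head_node *h, unsigned long now)
{
	head_tick(h, now);                  /* expire a stale grace period */
	if (h->state == HEAD_REMOVED)
		return false;               /* caller needs a new head node */
	h->nr_paths++;
	h->state = HEAD_LIVE;
	return true;
}
```

With a delay of 10 seconds, losing the last path at t=100 and regaining
it at t=105 keeps the same head node in this model, whereas a delay of 0
removes it as soon as the last path is lost.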

A new sysfs attribute, delayed_shutdown_sec, allows users to configure
this timeout. By default, it is set to 0 seconds, preserving the
existing behavior unless explicitly changed.
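
For anyone who wants to script this from userspace, a minimal sketch of
setting and reading back the timeout follows. Only the attribute name
delayed_shutdown_sec comes from this series; the helper names are
hypothetical, and the directory holding the attribute is left to the
caller because its exact sysfs location (presumably the head disk's
directory under /sys/block) is an assumption here.

```c
#include <assert.h>
#include <stdio.h>

/* Build "<sysfs_dir>/delayed_shutdown_sec"; returns 0 on success. */
static int attr_path(char *buf, size_t len, const char *sysfs_dir)
{
	int n = snprintf(buf, len, "%s/delayed_shutdown_sec", sysfs_dir);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write the timeout in seconds; returns 0 on success, -1 on failure. */
int set_delayed_shutdown_sec(const char *sysfs_dir, unsigned int seconds)
{
	char path[512];
	FILE *f;
	int ok;

	assert(sysfs_dir != NULL);
	if (attr_path(path, sizeof(path), sysfs_dir) != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", seconds) > 0;
	if (fclose(f) != 0)     /* write errors may only surface on close */
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the current timeout; returns 0 on success, -1 on failure. */
int get_delayed_shutdown_sec(const char *sysfs_dir, unsigned int *seconds)
{
	char path[512];
	FILE *f;
	int matched;

	assert(sysfs_dir != NULL && seconds != NULL);
	if (attr_path(path, sizeof(path), sysfs_dir) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	matched = fscanf(f, "%u", seconds);
	fclose(f);
	return matched == 1 ? 0 : -1;
}
```

A caller would pass the head disk's sysfs directory, for example
set_delayed_shutdown_sec("/sys/block/nvme0n1", 30), where that path is
again only a guess at where the attribute ends up.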

Additionally, please note that this change now always creates a head disk
node for all types of NVMe disks (single-ported or multi-ported) and for
both shared and private namespaces, unless the multipath nvme-core module
parameter is explicitly set to false or CONFIG_NVME_MULTIPATH is disabled.

The second patch removes the multipath module parameter from
nvme-core, making native NVMe multipath support explicit. With the first
patch applied, the multipath head node is always created, even for
single-port NVMe disks, when CONFIG_NVME_MULTIPATH is configured. Since
this behavior is now the default, the multipath module parameter may no
longer be needed. IMO, CONFIG_NVME_MULTIPATH (native multipath) should be
the default and non-native multipath should ideally be deprecated by now;
however, I didn't remove CONFIG_NVME_MULTIPATH in this series. So users
who still prefer non-native multipath can disable CONFIG_NVME_MULTIPATH
at compile time. Having said that, if everyone agrees, we may deprecate
non-native multipath support for NVMe.

These changes should help improve NVMe multipath reliability and simplify
configuration. Feedback and testing are welcome!

PS: Yes, I know this RFC is late, but the intention is to get feedback
and suggestions at the upcoming LSF/MM/BPF summit. This might be used as
a reference implementation for discussion. I also saw that we've already
got a timeslot where John is going to talk about removing the NVMe
multipath config option. Maybe we could include it in that discussion, if
everyone agrees.

Thanks!
--Nilay

Nilay Shroff (2):
  nvme-multipath: introduce delayed removal of the multipath head node
  nvme-multipath: remove multipath module param

 drivers/nvme/host/core.c      |  36 ++++------
 drivers/nvme/host/multipath.c | 127 ++++++++++++++++++++++++++--------
 drivers/nvme/host/nvme.h      |   5 +-
 drivers/nvme/host/sysfs.c     |  13 ++++
 4 files changed, 132 insertions(+), 49 deletions(-)

-- 
2.47.1


