* [PATCH 00/19] nvme: switch to libmultipath
@ 2026-02-25 15:39 John Garry
2026-02-25 15:39 ` [PATCH 01/19] nvme-multipath: pass NS head to nvme_mpath_revalidate_paths() John Garry
` (19 more replies)
0 siblings, 20 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:39 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
This switches the NVMe host driver to use libmultipath. That library
is very heavily based on the NVMe multipath code, so the change over
should hopefully be straightforward. There is often a direct replacement
for functions.
The multipath functionality in the nvme_ns_head and nvme_ns structures is
replaced with the mpath_head, mpath_disk, and mpath_device structures.
In the driver we have places which test whether the nvme_ns_head structure
has any member nvme_ns - for this the nvme_ns_head list was used. Since
that list will disappear, a count of nvme_ns structures is added.
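The counter-based emptiness check described above can be sketched in isolation. Everything here is a hypothetical stand-in for illustration only (demo_ns_head and the helper names are not the driver's types); the real patch keeps the count under the subsystem lock:

```c
#include <assert.h>

/* Hypothetical stand-in for the ns head: once the per-namespace list
 * goes away, "does this head have any nvme_ns?" becomes a counter test
 * instead of list_empty(&head->list). */
struct demo_ns_head {
	int ns_count;
};

static void demo_head_add_ns(struct demo_ns_head *head)
{
	head->ns_count++;
}

/* Returns 1 when the last namespace was just removed, i.e. the point
 * where nvme_ns_remove() would unlink the head. */
static int demo_head_del_ns(struct demo_ns_head *head)
{
	head->ns_count--;
	return head->ns_count == 0;
}
```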
It's hard to switch to libmultipath in a step-by-step fashion without
breaking builds or functionality. To make the series reviewable, I took
the approach of adding libmultipath-based code, which would initially be
unused, and then finally making the full switch.
I think that more testing is required here and any help on that would be
appreciated.
The series is based on baa47c4f89eb (nvme/nvme-7.0) nvme-pci: do not
try to add queue maps at runtime and [0]
[0] https://lore.kernel.org/linux-block/20260225153225.1031169-1-john.g.garry@oracle.com/T/#m928333859c0320e57ece0dfcf4ecf58baae3220f
John Garry (19):
nvme-multipath: pass NS head to nvme_mpath_revalidate_paths()
nvme: introduce a namespace count in the ns head structure
nvme-multipath: add nvme_is_mpath_request()
nvme-multipath: add initial support for using libmultipath
nvme-multipath: add nvme_mpath_available_path()
nvme-multipath: add nvme_mpath_{add, remove}_cdev()
nvme-multipath: add nvme_mpath_is_{disabled, optimised}
nvme-multipath: add nvme_mpath_get_access_state()
nvme-multipath: add nvme_mpath_{bdev, cdev}_ioctl()
nvme-multipath: add uring_cmd support
nvme-multipath: add nvme_mpath_get_iopolicy()
nvme-multipath: add PR support for libmultipath
nvme-multipath: add nvme_mpath_report_zones()
nvme-multipath: add nvme_mpath_get_unique_id()
nvme-multipath: add nvme_mpath_synchronize()
nvme-multipath: add nvme_mpath_{add,delete}_ns()
nvme-multipath: add nvme_mpath_head_queue_if_no_path()
nvme-multipath: set mpath_head_template.device_groups
nvme-multipath: switch to use libmultipath
drivers/nvme/host/Kconfig | 1 +
drivers/nvme/host/core.c | 81 ++-
drivers/nvme/host/ioctl.c | 96 ++--
drivers/nvme/host/multipath.c | 962 +++++++++++-----------------------
drivers/nvme/host/nvme.h | 117 +++--
drivers/nvme/host/pr.c | 205 ++++++--
drivers/nvme/host/sysfs.c | 84 +--
7 files changed, 632 insertions(+), 914 deletions(-)
--
2.43.5
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 01/19] nvme-multipath: pass NS head to nvme_mpath_revalidate_paths()
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
@ 2026-02-25 15:39 ` John Garry
2026-02-25 15:39 ` [PATCH 02/19] nvme: introduce a namespace count in the ns head structure John Garry
` (18 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:39 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
In nvme_mpath_revalidate_paths(), we are passed a NS pointer and use that
to look up the NS head and then as an iterator variable.
It makes more sense to pass the NS head and use a local variable for the
NS iterator.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/core.c | 2 +-
drivers/nvme/host/multipath.c | 4 ++--
drivers/nvme/host/nvme.h | 4 ++--
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3a2126584a236..37e30caff4149 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2522,7 +2522,7 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
set_capacity_and_notify(ns->head->disk, get_capacity(ns->disk));
set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
- nvme_mpath_revalidate_paths(ns);
+ nvme_mpath_revalidate_paths(ns->head);
blk_mq_unfreeze_queue(ns->head->disk->queue, memflags);
}
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index bfcc5904e6a26..c70fff58b5698 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -271,10 +271,10 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
srcu_read_unlock(&ctrl->srcu, srcu_idx);
}
-void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
+void nvme_mpath_revalidate_paths(struct nvme_ns_head *head)
{
- struct nvme_ns_head *head = ns->head;
sector_t capacity = get_capacity(head->disk);
+ struct nvme_ns *ns;
int node;
int srcu_idx;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9971045dbc05e..057d051ef925d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1035,7 +1035,7 @@ void nvme_mpath_update(struct nvme_ctrl *ctrl);
void nvme_mpath_uninit(struct nvme_ctrl *ctrl);
void nvme_mpath_stop(struct nvme_ctrl *ctrl);
bool nvme_mpath_clear_current_path(struct nvme_ns *ns);
-void nvme_mpath_revalidate_paths(struct nvme_ns *ns);
+void nvme_mpath_revalidate_paths(struct nvme_ns_head *head);
void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl);
void nvme_mpath_remove_disk(struct nvme_ns_head *head);
void nvme_mpath_start_request(struct request *rq);
@@ -1100,7 +1100,7 @@ static inline bool nvme_mpath_clear_current_path(struct nvme_ns *ns)
{
return false;
}
-static inline void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
+static inline void nvme_mpath_revalidate_paths(struct nvme_ns_head *head)
{
}
static inline void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 02/19] nvme: introduce a namespace count in the ns head structure
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
2026-02-25 15:39 ` [PATCH 01/19] nvme-multipath: pass NS head to nvme_mpath_revalidate_paths() John Garry
@ 2026-02-25 15:39 ` John Garry
2026-03-02 12:46 ` Nilay Shroff
2026-02-25 15:39 ` [PATCH 03/19] nvme-multipath: add nvme_is_mpath_request() John Garry
` (17 subsequent siblings)
19 siblings, 1 reply; 28+ messages in thread
From: John Garry @ 2026-02-25 15:39 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
For switching to use libmultipath, the per-namespace sibling list entry in
nvme_ns.siblings will be replaced with the multipath_device.sibling list
pointer.
When CONFIG_LIBMULTIPATH is disabled, that list of namespaces would no
longer be maintained.
However, the core code checks in many places whether there is any
namespace in the head list, e.g. in nvme_ns_remove().
Introduce a separate count of the number of namespaces for the namespace
head and use that count for the places where the per-namespace head list
of namespaces is checked to be empty.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/core.c | 10 +++++++---
drivers/nvme/host/multipath.c | 4 ++--
drivers/nvme/host/nvme.h | 1 +
3 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 37e30caff4149..76249871dd7c2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4024,7 +4024,7 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
} else {
ret = -EINVAL;
if ((!info->is_shared || !head->shared) &&
- !list_empty(&head->list)) {
+ head->ns_count) {
dev_err(ctrl->device,
"Duplicate unshared namespace %d\n",
info->nsid);
@@ -4047,6 +4047,7 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
}
list_add_tail_rcu(&ns->siblings, &head->list);
+ head->ns_count++;
ns->head = head;
mutex_unlock(&ctrl->subsys->lock);
@@ -4192,7 +4193,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
out_unlink_ns:
mutex_lock(&ctrl->subsys->lock);
list_del_rcu(&ns->siblings);
- if (list_empty(&ns->head->list)) {
+ ns->head->ns_count--;
+ if (!ns->head->ns_count) {
list_del_init(&ns->head->entry);
/*
* If multipath is not configured, we still create a namespace
@@ -4217,6 +4219,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
static void nvme_ns_remove(struct nvme_ns *ns)
{
+ struct nvme_ns_head *head = ns->head;
bool last_path = false;
if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
@@ -4238,7 +4241,8 @@ static void nvme_ns_remove(struct nvme_ns *ns)
mutex_lock(&ns->ctrl->subsys->lock);
list_del_rcu(&ns->siblings);
- if (list_empty(&ns->head->list)) {
+ head->ns_count--;
+ if (!head->ns_count) {
if (!nvme_mpath_queue_if_no_path(ns->head))
list_del_init(&ns->head->entry);
last_path = true;
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index c70fff58b5698..0d540749b16ee 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -703,7 +703,7 @@ static void nvme_remove_head_work(struct work_struct *work)
bool remove = false;
mutex_lock(&head->subsys->lock);
- if (list_empty(&head->list)) {
+ if (!head->ns_count) {
list_del_init(&head->entry);
remove = true;
}
@@ -1307,7 +1307,7 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
* head->list here. If it is no longer empty then we skip enqueuing the
* delayed head removal work.
*/
- if (!list_empty(&head->list))
+ if (head->ns_count)
goto out;
if (head->delayed_removal_secs) {
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 057d051ef925d..397e8685f6c38 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -539,6 +539,7 @@ struct nvme_ns_head {
struct nvme_effects_log *effects;
u64 nuse;
unsigned ns_id;
+ int ns_count;
int instance;
#ifdef CONFIG_BLK_DEV_ZONED
u64 zsze;
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 03/19] nvme-multipath: add nvme_is_mpath_request()
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
2026-02-25 15:39 ` [PATCH 01/19] nvme-multipath: pass NS head to nvme_mpath_revalidate_paths() John Garry
2026-02-25 15:39 ` [PATCH 02/19] nvme: introduce a namespace count in the ns head structure John Garry
@ 2026-02-25 15:39 ` John Garry
2026-02-25 15:39 ` [PATCH 04/19] nvme-multipath: add initial support for using libmultipath John Garry
` (16 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:39 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Add a helper to check whether a request has flag REQ_NVME_MPATH set.
An advantage of this is that for !CONFIG_NVME_MULTIPATH the helper
compiles to a constant false, so the check is compiled out.
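The compile-out behaviour follows the usual kernel pattern of pairing the real helper with a constant-false static inline stub under the config option. A minimal userspace sketch with stand-in names (DEMO_MULTIPATH and demo_request are illustrative, not the driver's symbols):

```c
#include <assert.h>

#define DEMO_REQ_MPATH (1u << 0)	/* stand-in for REQ_NVME_MPATH */

struct demo_request {
	unsigned int cmd_flags;
};

#ifdef DEMO_MULTIPATH
static inline int demo_is_mpath_request(struct demo_request *req)
{
	return req->cmd_flags & DEMO_REQ_MPATH;
}
#else
/* With multipath compiled out this collapses to a constant, so the
 * compiler can drop every branch guarded by it. */
static inline int demo_is_mpath_request(struct demo_request *req)
{
	(void)req;
	return 0;
}
#endif
```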
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/core.c | 6 +++---
drivers/nvme/host/nvme.h | 13 +++++++++++--
2 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 76249871dd7c2..2d0faec902eb2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -409,7 +409,7 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
if ((nvme_req(req)->status & NVME_SCT_SC_MASK) == NVME_SC_AUTH_REQUIRED)
return AUTHENTICATE;
- if (req->cmd_flags & REQ_NVME_MPATH) {
+ if (nvme_is_mpath_request(req)) {
if (nvme_is_path_error(nvme_req(req)->status) ||
blk_queue_dying(req->q))
return FAILOVER;
@@ -442,7 +442,7 @@ static inline void __nvme_end_req(struct request *req)
}
nvme_end_req_zoned(req);
nvme_trace_bio_complete(req);
- if (req->cmd_flags & REQ_NVME_MPATH)
+ if (nvme_is_mpath_request(req))
nvme_mpath_end_request(req);
}
@@ -762,7 +762,7 @@ blk_status_t nvme_fail_nonready_command(struct nvme_ctrl *ctrl,
state != NVME_CTRL_DELETING &&
state != NVME_CTRL_DEAD &&
!test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) &&
- !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
+ !blk_noretry_request(rq) && !nvme_is_mpath_request(rq))
return BLK_STS_RESOURCE;
if (!(rq->rq_flags & RQF_DONTPREP))
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 397e8685f6c38..6b5977610d886 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1042,11 +1042,16 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head);
void nvme_mpath_start_request(struct request *rq);
void nvme_mpath_end_request(struct request *rq);
+static inline bool nvme_is_mpath_request(struct request *req)
+{
+ return req->cmd_flags & REQ_NVME_MPATH;
+}
+
static inline void nvme_trace_bio_complete(struct request *req)
{
struct nvme_ns *ns = req->q->queuedata;
- if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio)
+ if (nvme_is_mpath_request(req) && req->bio)
trace_block_bio_complete(ns->head->disk->queue, req->bio);
}
@@ -1145,6 +1150,10 @@ static inline void nvme_mpath_start_freeze(struct nvme_subsystem *subsys)
static inline void nvme_mpath_default_iopolicy(struct nvme_subsystem *subsys)
{
}
+static inline bool nvme_is_mpath_request(struct request *req)
+{
+ return false;
+}
static inline void nvme_mpath_start_request(struct request *rq)
{
}
@@ -1213,7 +1222,7 @@ static inline void nvme_hwmon_exit(struct nvme_ctrl *ctrl)
static inline void nvme_start_request(struct request *rq)
{
- if (rq->cmd_flags & REQ_NVME_MPATH)
+ if (nvme_is_mpath_request(rq))
nvme_mpath_start_request(rq);
blk_mq_start_request(rq);
}
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 04/19] nvme-multipath: add initial support for using libmultipath
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (2 preceding siblings ...)
2026-02-25 15:39 ` [PATCH 03/19] nvme-multipath: add nvme_is_mpath_request() John Garry
@ 2026-02-25 15:39 ` John Garry
2026-02-25 15:39 ` [PATCH 05/19] nvme-multipath: add nvme_mpath_available_path() John Garry
` (15 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:39 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Add initial support, as follows:
- Add mpath_head_template
- Add mpath_device in nvme_ns
- Add mpath_disk pointer to head structure
Initially all the functionality which mpath_head_template points to will be
unused, until the driver fully switches to libmultipath. Otherwise it's
hard to make the switch in a step-wise fashion without breaking
functionality.
Many of the libmultipath-based functions to be added will reference the
ns mpath_device, so add that now. Also add the NS head mpath_disk pointer
for the same reason.
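The nvme_mpath_to_ns() macro added below is the standard container_of idiom: given a pointer to the embedded mpath_device, recover the enclosing nvme_ns. A userspace sketch with simplified stand-in types (demo_ns and demo_mpath_device are illustrative only):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct mpath_device / struct nvme_ns. */
struct demo_mpath_device {
	int path_id;
};

struct demo_ns {
	struct demo_mpath_device mpath_device;	/* embedded, as in nvme_ns */
	int nsid;
};

/* container_of: subtract the member's offset from the member pointer
 * to get back to the enclosing structure. */
#define demo_mpath_to_ns(d) \
	((struct demo_ns *)((char *)(d) - offsetof(struct demo_ns, mpath_device)))
```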
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/Kconfig | 1 +
drivers/nvme/host/multipath.c | 6 ++++++
drivers/nvme/host/nvme.h | 6 ++++++
3 files changed, 13 insertions(+)
diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index 31974c7dd20c9..fc6e75fe8cbfe 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -17,6 +17,7 @@ config BLK_DEV_NVME
config NVME_MULTIPATH
bool "NVMe multipath support"
depends on NVME_CORE
+ select LIBMULTIPATH
help
This option controls support for multipath access to NVMe
subsystems. If this option is enabled support for NVMe multipath
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 0d540749b16ee..390a1d1133921 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -12,6 +12,8 @@
bool multipath = true;
static bool multipath_always_on;
+static const struct mpath_head_template mpdt;
+
static int multipath_param_set(const char *val, const struct kernel_param *kp)
{
int ret;
@@ -1407,3 +1409,7 @@ void nvme_mpath_uninit(struct nvme_ctrl *ctrl)
ctrl->ana_log_buf = NULL;
ctrl->ana_log_size = 0;
}
+
+__maybe_unused
+static const struct mpath_head_template mpdt = {
+};
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 6b5977610d886..c48efbfb46efc 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -13,6 +13,7 @@
#include <linux/blk-mq.h>
#include <linux/sed-opal.h>
#include <linux/fault-inject.h>
+#include <linux/multipath.h>
#include <linux/rcupdate.h>
#include <linux/wait.h>
#include <linux/t10-pi.h>
@@ -555,6 +556,8 @@ struct nvme_ns_head {
u16 nr_plids;
u16 *plids;
+
+ struct mpath_disk *mpath_disk;
#ifdef CONFIG_NVME_MULTIPATH
struct bio_list requeue_list;
spinlock_t requeue_lock;
@@ -582,6 +585,7 @@ enum nvme_ns_features {
};
struct nvme_ns {
+ struct mpath_device mpath_device;
struct list_head list;
struct nvme_ctrl *ctrl;
@@ -608,6 +612,8 @@ struct nvme_ns {
struct nvme_fault_inject fault_inject;
};
+#define nvme_mpath_to_ns(d) container_of(d, struct nvme_ns, mpath_device)
+
/* NVMe ns supports metadata actions by the controller (generate/strip) */
static inline bool nvme_ns_has_pi(struct nvme_ns_head *head)
{
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 05/19] nvme-multipath: add nvme_mpath_available_path()
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (3 preceding siblings ...)
2026-02-25 15:39 ` [PATCH 04/19] nvme-multipath: add initial support for using libmultipath John Garry
@ 2026-02-25 15:39 ` John Garry
2026-02-25 15:39 ` [PATCH 06/19] nvme-multipath: add nvme_mpath_{add, remove}_cdev() John Garry
` (14 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:39 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
This is for the mpath_head_template.available_path callback.
Currently the same functionality is in nvme_available_path().
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/multipath.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 390a1d1133921..e888791b8947a 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -510,6 +510,26 @@ static bool nvme_available_path(struct nvme_ns_head *head)
return nvme_mpath_queue_if_no_path(head);
}
+static bool nvme_mpath_available_path(struct mpath_device *mpath_device,
+ bool *available)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+
+ if (test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ns->ctrl->flags))
+ return false;
+
+ switch (nvme_ctrl_state(ns->ctrl)) {
+ case NVME_CTRL_LIVE:
+ case NVME_CTRL_RESETTING:
+ case NVME_CTRL_CONNECTING:
+ *available = true;
+ default:
+ break;
+ }
+
+ return true;
+}
+
static void nvme_ns_head_submit_bio(struct bio *bio)
{
struct nvme_ns_head *head = bio->bi_bdev->bd_disk->private_data;
@@ -1412,4 +1432,5 @@ void nvme_mpath_uninit(struct nvme_ctrl *ctrl)
__maybe_unused
static const struct mpath_head_template mpdt = {
+ .available_path = nvme_mpath_available_path,
};
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 06/19] nvme-multipath: add nvme_mpath_{add, remove}_cdev()
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (4 preceding siblings ...)
2026-02-25 15:39 ` [PATCH 05/19] nvme-multipath: add nvme_mpath_available_path() John Garry
@ 2026-02-25 15:39 ` John Garry
2026-02-25 15:39 ` [PATCH 07/19] nvme-multipath: add nvme_mpath_is_{disabled, optimised} John Garry
` (13 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:39 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
These are for the mpath_head_template.add_cdev and del_cdev callbacks.
Currently the same functionality is in nvme_add_ns_cdev() and
nvme_cdev_del().
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/multipath.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index e888791b8947a..c90ac76dbe317 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -628,6 +628,26 @@ const struct block_device_operations nvme_ns_head_ops = {
.pr_ops = &nvme_pr_ops,
};
+static int nvme_mpath_add_cdev(struct mpath_head *mpath_head)
+{
+ struct nvme_ns_head *head = mpath_head->drvdata;
+ int ret;
+
+ mpath_head->cdev_device.parent = &head->subsys->dev;
+ ret = dev_set_name(&mpath_head->cdev_device, "ng%dn%d",
+ head->subsys->instance, head->instance);
+ if (ret)
+ return ret;
+ ret = nvme_cdev_add(&mpath_head->cdev, &mpath_head->cdev_device,
+ &mpath_generic_chr_fops, THIS_MODULE);
+ return ret;
+}
+
+static void nvme_mpath_del_cdev(struct mpath_head *mpath_head)
+{
+ nvme_cdev_del(&mpath_head->cdev, &mpath_head->cdev_device);
+}
+
static inline struct nvme_ns_head *cdev_to_ns_head(struct cdev *cdev)
{
return container_of(cdev, struct nvme_ns_head, cdev);
@@ -1433,4 +1453,6 @@ void nvme_mpath_uninit(struct nvme_ctrl *ctrl)
__maybe_unused
static const struct mpath_head_template mpdt = {
.available_path = nvme_mpath_available_path,
+ .add_cdev = nvme_mpath_add_cdev,
+ .del_cdev = nvme_mpath_del_cdev,
};
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 07/19] nvme-multipath: add nvme_mpath_is_{disabled, optimised}
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (5 preceding siblings ...)
2026-02-25 15:39 ` [PATCH 06/19] nvme-multipath: add nvme_mpath_{add, remove}_cdev() John Garry
@ 2026-02-25 15:39 ` John Garry
2026-02-25 15:39 ` [PATCH 08/19] nvme-multipath: add nvme_mpath_get_access_state() John Garry
` (12 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:39 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
These are for mpath_head_template.is_{disabled, optimized} callbacks, and
just call into nvme_path_is_disabled() and nvme_path_is_optimized(),
respectively.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/multipath.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index c90ac76dbe317..07461a7d8d1fa 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -310,6 +310,13 @@ static bool nvme_path_is_disabled(struct nvme_ns *ns)
return false;
}
+static bool nvme_mpath_is_disabled(struct mpath_device *mpath_device)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+
+ return nvme_path_is_disabled(ns);
+}
+
static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
{
int found_distance = INT_MAX, fallback_distance = INT_MAX, distance;
@@ -452,6 +459,13 @@ static inline bool nvme_path_is_optimized(struct nvme_ns *ns)
ns->ana_state == NVME_ANA_OPTIMIZED;
}
+static bool nvme_mpath_is_optimized(struct mpath_device *mpath_device)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+
+ return nvme_path_is_optimized(ns);
+}
+
static struct nvme_ns *nvme_numa_path(struct nvme_ns_head *head)
{
int node = numa_node_id();
@@ -1455,4 +1469,6 @@ static const struct mpath_head_template mpdt = {
.available_path = nvme_mpath_available_path,
.add_cdev = nvme_mpath_add_cdev,
.del_cdev = nvme_mpath_del_cdev,
+ .is_disabled = nvme_mpath_is_disabled,
+ .is_optimized = nvme_mpath_is_optimized,
};
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 08/19] nvme-multipath: add nvme_mpath_get_access_state()
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (6 preceding siblings ...)
2026-02-25 15:39 ` [PATCH 07/19] nvme-multipath: add nvme_mpath_is_{disabled, optimised} John Garry
@ 2026-02-25 15:39 ` John Garry
2026-02-25 15:39 ` [PATCH 09/19] nvme-multipath: add nvme_mpath_{bdev, cdev}_ioctl() John Garry
` (11 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:39 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Add nvme_mpath_get_access_state(), which reads the NS ana_state and
translates it into enum mpath_access_state.
This replicates the ANA state checking in __nvme_find_path().
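The translation is a plain switch from one enum to the other: only optimized and non-optimized paths map to usable access states, everything else is invalid. A sketch with stand-in enums mirroring the patch (the values here are illustrative, not the real NVMe or libmultipath definitions):

```c
#include <assert.h>

/* Stand-ins for the NVMe ANA states and the mpath access states. */
enum demo_ana_state {
	DEMO_ANA_OPTIMIZED,
	DEMO_ANA_NONOPTIMIZED,
	DEMO_ANA_INACCESSIBLE,
	DEMO_ANA_PERSISTENT_LOSS,
	DEMO_ANA_CHANGE,
};

enum demo_access_state {
	DEMO_STATE_INVALID,
	DEMO_STATE_ACTIVE,
	DEMO_STATE_OPTIMIZED,
};

/* Mirrors the shape of nvme_mpath_get_access_state(): only optimized
 * and non-optimized ANA states yield usable access states. */
static enum demo_access_state demo_get_access_state(enum demo_ana_state ana)
{
	switch (ana) {
	case DEMO_ANA_OPTIMIZED:
		return DEMO_STATE_OPTIMIZED;
	case DEMO_ANA_NONOPTIMIZED:
		return DEMO_STATE_ACTIVE;
	default:
		return DEMO_STATE_INVALID;
	}
}
```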
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/multipath.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 07461a7d8d1fa..a67db36f3c5a5 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -1464,6 +1464,24 @@ void nvme_mpath_uninit(struct nvme_ctrl *ctrl)
ctrl->ana_log_size = 0;
}
+static enum mpath_access_state nvme_mpath_get_access_state(
+ struct mpath_device *mpath_device)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+
+ switch (ns->ana_state) {
+ case NVME_ANA_OPTIMIZED:
+ return MPATH_STATE_OPTIMIZED;
+ case NVME_ANA_NONOPTIMIZED:
+ return MPATH_STATE_ACTIVE;
+ case NVME_ANA_INACCESSIBLE:
+ case NVME_ANA_PERSISTENT_LOSS:
+ case NVME_ANA_CHANGE:
+ default:
+ return MPATH_STATE_INVALID;
+ }
+}
+
__maybe_unused
static const struct mpath_head_template mpdt = {
.available_path = nvme_mpath_available_path,
@@ -1471,4 +1489,5 @@ static const struct mpath_head_template mpdt = {
.del_cdev = nvme_mpath_del_cdev,
.is_disabled = nvme_mpath_is_disabled,
.is_optimized = nvme_mpath_is_optimized,
+ .get_access_state = nvme_mpath_get_access_state,
};
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 09/19] nvme-multipath: add nvme_mpath_{bdev, cdev}_ioctl()
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (7 preceding siblings ...)
2026-02-25 15:39 ` [PATCH 08/19] nvme-multipath: add nvme_mpath_get_access_state() John Garry
@ 2026-02-25 15:39 ` John Garry
2026-02-25 15:39 ` [PATCH 10/19] nvme-multipath: add uring_cmd support John Garry
` (10 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:39 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Add ioctl callbacks as follows:
- nvme_mpath_bdev_ioctl(), which does the same as nvme_ns_head_ioctl()
- nvme_mpath_cdev_ioctl(), which does the same as nvme_ns_head_chr_ioctl()
Note that the ioctl callbacks are called with the mpath_head SRCU read
lock held. Since nvme_ns_head_ctrl_ioctl() releases the lock itself, the
ioctl callbacks are expected to always release the SRCU read lock
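The lock-ownership convention above (the callback, not the caller, drops the read lock on every path, so the controller branch can release it early before potentially blocking work) can be sketched with a toy lock counter. Everything here is illustrative, not the real SRCU API:

```c
#include <assert.h>

struct demo_lock {
	int held;	/* toy stand-in for an SRCU read-side critical section */
};

static int demo_read_lock(struct demo_lock *l)
{
	l->held++;
	return 0;	/* toy srcu_idx */
}

static void demo_read_unlock(struct demo_lock *l, int idx)
{
	(void)idx;
	l->held--;
}

/* Mirrors the ctrl-ioctl branch: drop the lock early, then do the
 * (potentially blocking) controller work outside it. */
static int demo_ctrl_ioctl(struct demo_lock *l, int idx)
{
	demo_read_unlock(l, idx);
	return 0;	/* controller work would happen here, unlocked */
}

/* Mirrors the bdev/cdev callbacks: every return path either unlocks
 * directly or delegates to a callee that unlocks. */
static int demo_ioctl(struct demo_lock *l, int is_ctrl_cmd)
{
	int idx = demo_read_lock(l);

	if (is_ctrl_cmd)
		return demo_ctrl_ioctl(l, idx);	/* callee unlocks */

	demo_read_unlock(l, idx);
	return 0;
}
```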
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/ioctl.c | 76 +++++++++++++++++++++++++++++++++++
drivers/nvme/host/multipath.c | 42 ++++++++++---------
drivers/nvme/host/nvme.h | 6 +++
3 files changed, 104 insertions(+), 20 deletions(-)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index a9c097dacad6f..7f0bd38f8c24e 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -682,6 +682,25 @@ int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
return 0;
}
#ifdef CONFIG_NVME_MULTIPATH
+static int nvme_mpath_device_ctrl_ioctl(struct mpath_device *mpath_device,
+ unsigned int cmd, void __user *argp,
+ struct nvme_ns_head *head, int srcu_idx,
+ bool open_for_write)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+ struct mpath_head *mpath_head = mpath_disk->mpath_head;
+ struct nvme_ctrl *ctrl = ns->ctrl;
+ int ret;
+
+ nvme_get_ctrl(ns->ctrl);
+ mpath_head_read_unlock(mpath_head, srcu_idx);
+ ret = nvme_ctrl_ioctl(ns->ctrl, cmd, argp, open_for_write);
+
+ nvme_put_ctrl(ctrl);
+ return ret;
+}
+
static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd,
void __user *argp, struct nvme_ns_head *head, int srcu_idx,
bool open_for_write)
@@ -698,6 +717,63 @@ static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd,
return ret;
}
+int nvme_mpath_bdev_ioctl(struct block_device *bdev,
+ struct mpath_device *mpath_device, blk_mode_t mode,
+ unsigned int cmd, unsigned long arg, int srcu_idx)
+{
+ struct gendisk *disk = bdev->bd_disk;
+ struct mpath_disk *mpath_disk = mpath_gendisk_to_disk(disk);
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+ struct nvme_ns_head *head = ns->head;
+ bool open_for_write = mode & BLK_OPEN_WRITE;
+ void __user *argp = (void __user *)arg;
+ struct mpath_head *mpath_head = mpath_disk->mpath_head;
+ int ret = -EWOULDBLOCK;
+ unsigned int flags = 0;
+
+ if (bdev_is_partition(bdev))
+ flags |= NVME_IOCTL_PARTITION;
+
+ /*
+ * Handle ioctls that apply to the controller instead of the namespace
+ * separately and drop the ns SRCU reference early. This avoids a
+ * deadlock when deleting namespaces using the passthrough interface.
+ */
+ if (is_ctrl_ioctl(cmd))
+ return nvme_mpath_device_ctrl_ioctl(mpath_device, cmd, argp,
+ head, srcu_idx, open_for_write);
+
+ ret = nvme_ns_ioctl(ns, cmd, argp, flags, open_for_write);
+ mpath_head_read_unlock(mpath_head, srcu_idx);
+
+ return ret;
+}
+
+int nvme_mpath_cdev_ioctl(struct mpath_head *mpath_head,
+ struct mpath_device *mpath_device, blk_mode_t mode,
+ unsigned int cmd, unsigned long arg, int srcu_idx)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+ struct nvme_ns_head *head = ns->head;
+ bool open_for_write = mode & BLK_OPEN_WRITE;
+ void __user *argp = (void __user *)arg;
+ int ret = -EWOULDBLOCK;
+
+ /*
+ * Handle ioctls that apply to the controller instead of the namespace
+ * separately and drop the ns SRCU reference early. This avoids a
+ * deadlock when deleting namespaces using the passthrough interface.
+ */
+ if (is_ctrl_ioctl(cmd))
+ return nvme_mpath_device_ctrl_ioctl(mpath_device, cmd, argp,
+ head, srcu_idx, open_for_write);
+
+ ret = nvme_ns_ioctl(ns, cmd, argp, 0, open_for_write);
+ mpath_head_read_unlock(mpath_head, srcu_idx);
+
+ return ret;
+}
+
int nvme_ns_head_ioctl(struct block_device *bdev, blk_mode_t mode,
unsigned int cmd, unsigned long arg)
{
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index a67db36f3c5a5..513d73e589a58 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -642,26 +642,6 @@ const struct block_device_operations nvme_ns_head_ops = {
.pr_ops = &nvme_pr_ops,
};
-static int nvme_mpath_add_cdev(struct mpath_head *mpath_head)
-{
- struct nvme_ns_head *head = mpath_head->drvdata;
- int ret;
-
- mpath_head->cdev_device.parent = &head->subsys->dev;
- ret = dev_set_name(&mpath_head->cdev_device, "ng%dn%d",
- head->subsys->instance, head->instance);
- if (ret)
- return ret;
- ret = nvme_cdev_add(&mpath_head->cdev, &mpath_head->cdev_device,
- &mpath_generic_chr_fops, THIS_MODULE);
- return ret;
-}
-
-static void nvme_mpath_del_cdev(struct mpath_head *mpath_head)
-{
- nvme_cdev_del(&mpath_head->cdev, &mpath_head->cdev_device);
-}
-
static inline struct nvme_ns_head *cdev_to_ns_head(struct cdev *cdev)
{
return container_of(cdev, struct nvme_ns_head, cdev);
@@ -690,6 +670,26 @@ static const struct file_operations nvme_ns_head_chr_fops = {
.uring_cmd_iopoll = nvme_ns_chr_uring_cmd_iopoll,
};
+static int nvme_mpath_add_cdev(struct mpath_head *mpath_head)
+{
+ struct nvme_ns_head *head = mpath_head->drvdata;
+ int ret;
+
+ mpath_head->cdev_device.parent = &head->subsys->dev;
+ ret = dev_set_name(&mpath_head->cdev_device, "ng%dn%d",
+ head->subsys->instance, head->instance);
+ if (ret)
+ return ret;
+ ret = nvme_cdev_add(&mpath_head->cdev, &mpath_head->cdev_device,
+ &mpath_generic_chr_fops, THIS_MODULE);
+ return ret;
+}
+
+static void nvme_mpath_del_cdev(struct mpath_head *mpath_head)
+{
+ nvme_cdev_del(&mpath_head->cdev, &mpath_head->cdev_device);
+}
+
static int nvme_add_ns_head_cdev(struct nvme_ns_head *head)
{
int ret;
@@ -1490,4 +1490,6 @@ static const struct mpath_head_template mpdt = {
.is_disabled = nvme_mpath_is_disabled,
.is_optimized = nvme_mpath_is_optimized,
.get_access_state = nvme_mpath_get_access_state,
+ .bdev_ioctl = nvme_mpath_bdev_ioctl,
+ .cdev_ioctl = nvme_mpath_cdev_ioctl,
};
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index c48efbfb46efc..11b63e92502ad 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1047,6 +1047,12 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl);
void nvme_mpath_remove_disk(struct nvme_ns_head *head);
void nvme_mpath_start_request(struct request *rq);
void nvme_mpath_end_request(struct request *rq);
+int nvme_mpath_bdev_ioctl(struct block_device *bdev,
+ struct mpath_device *mpath_device, blk_mode_t mode,
+ unsigned int cmd, unsigned long arg, int srcu_idx);
+int nvme_mpath_cdev_ioctl(struct mpath_head *mpath_head,
+		struct mpath_device *mpath_device, blk_mode_t mode,
+ unsigned int cmd, unsigned long arg, int srcu_idx);
static inline bool nvme_is_mpath_request(struct request *req)
{
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 10/19] nvme-multipath: add uring_cmd support
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (8 preceding siblings ...)
2026-02-25 15:39 ` [PATCH 09/19] nvme-multipath: add nvme_mpath_{bdev, cdev}_ioctl() John Garry
@ 2026-02-25 15:39 ` John Garry
2026-02-25 15:39 ` [PATCH 11/19] nvme-multipath: add nvme_mpath_get_iopolicy() John Garry
` (9 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:39 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Add callback nvme_mpath_chr_uring_cmd(), which is equivalent to
nvme_ns_head_chr_uring_cmd().
Also fill in chr_uring_cmd_iopoll with the same function as currently
used, nvme_ns_chr_uring_cmd_iopoll().
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/ioctl.c | 9 +++++++++
drivers/nvme/host/multipath.c | 2 ++
drivers/nvme/host/nvme.h | 2 ++
3 files changed, 13 insertions(+)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 7f0bd38f8c24e..773c819cde52a 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -701,6 +701,15 @@ static int nvme_mpath_device_ctrl_ioctl(struct mpath_device *mpath_device,
return ret;
}
+int nvme_mpath_chr_uring_cmd(struct mpath_device *mpath_device,
+ struct io_uring_cmd *ioucmd,
+ unsigned int issue_flags)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+
+ return nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
+}
+
static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd,
void __user *argp, struct nvme_ns_head *head, int srcu_idx,
bool open_for_write)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 513d73e589a58..12386f9caa72a 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -1492,4 +1492,6 @@ static const struct mpath_head_template mpdt = {
.get_access_state = nvme_mpath_get_access_state,
.bdev_ioctl = nvme_mpath_bdev_ioctl,
.cdev_ioctl = nvme_mpath_cdev_ioctl,
+ .chr_uring_cmd = nvme_mpath_chr_uring_cmd,
+ .chr_uring_cmd_iopoll = nvme_ns_chr_uring_cmd_iopoll,
};
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 11b63e92502ad..bc0ad0bbb68fd 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1053,6 +1053,8 @@ int nvme_mpath_bdev_ioctl(struct block_device *bdev,
int nvme_mpath_cdev_ioctl(struct mpath_head *mpath_head,
		struct mpath_device *mpath_device, blk_mode_t mode,
unsigned int cmd, unsigned long arg, int srcu_idx);
+int nvme_mpath_chr_uring_cmd(struct mpath_device *mpath_device,
+ struct io_uring_cmd *ioucmd, unsigned int issue_flags);
static inline bool nvme_is_mpath_request(struct request *req)
{
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 11/19] nvme-multipath: add nvme_mpath_get_iopolicy()
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (9 preceding siblings ...)
2026-02-25 15:39 ` [PATCH 10/19] nvme-multipath: add uring_cmd support John Garry
@ 2026-02-25 15:39 ` John Garry
2026-02-25 15:40 ` [PATCH 12/19] nvme-multipath: add PR support for libmultipath John Garry
` (8 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:39 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Add a function to return the iopolicy for the head structure.
Since iopolicy for NVMe is currently per-subsystem, we add the
mpath_iopolicy struct to the subsystem struct, and
nvme_mpath_get_iopolicy() needs to access that member.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/multipath.c | 10 ++++++++++
drivers/nvme/host/nvme.h | 1 +
2 files changed, 11 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 12386f9caa72a..6cadbc0449d3d 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -1464,6 +1464,15 @@ void nvme_mpath_uninit(struct nvme_ctrl *ctrl)
ctrl->ana_log_size = 0;
}
+static enum mpath_iopolicy_e nvme_mpath_get_iopolicy(
+ struct mpath_head *mpath_head)
+{
+ struct nvme_ns_head *head = mpath_head->drvdata;
+ struct nvme_subsystem *subsys = head->subsys;
+
+ return mpath_read_iopolicy(&subsys->mpath_iopolicy);
+}
+
static enum mpath_access_state nvme_mpath_get_access_state(
struct mpath_device *mpath_device)
{
@@ -1494,4 +1503,5 @@ static const struct mpath_head_template mpdt = {
.cdev_ioctl = nvme_mpath_cdev_ioctl,
.chr_uring_cmd = nvme_mpath_chr_uring_cmd,
.chr_uring_cmd_iopoll = nvme_ns_chr_uring_cmd_iopoll,
+ .get_iopolicy = nvme_mpath_get_iopolicy,
};
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bc0ad0bbb68fd..da9bd1ada6ad6 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -502,6 +502,7 @@ struct nvme_subsystem {
struct ida ns_ida;
#ifdef CONFIG_NVME_MULTIPATH
enum nvme_iopolicy iopolicy;
+ struct mpath_iopolicy mpath_iopolicy;
#endif
};
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 12/19] nvme-multipath: add PR support for libmultipath
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (10 preceding siblings ...)
2026-02-25 15:39 ` [PATCH 11/19] nvme-multipath: add nvme_mpath_get_iopolicy() John Garry
@ 2026-02-25 15:40 ` John Garry
2026-02-25 15:40 ` [PATCH 13/19] nvme-multipath: add nvme_mpath_report_zones() John Garry
` (7 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:40 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Add PR support for libmultipath by adding the nvme_mpath_pr_ops
structure.
The callbacks here pass mpath_device pointers, which can be converted to
NS pointers. However, the current PR callbacks for nvme_pr_ops are
passed a bdev, and that is what lets us figure out whether we are
operating on a multipath head or on a NS. The send command helpers can
be changed to work per NS when the full change to libmultipath happens;
until then, have separate per-NS command send helpers. The original PR
callback functions from nvme_pr_ops can then also be refactored to use
the new NS-based callbacks, reducing duplication.
The new NS-based helpers are marked as __maybe_unused until the switch
to libmultipath happens.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/multipath.c | 1 +
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/pr.c | 314 ++++++++++++++++++++++++++++++++++
3 files changed, 316 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 6cadbc0449d3d..ac75db92dd124 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -1501,6 +1501,7 @@ static const struct mpath_head_template mpdt = {
.get_access_state = nvme_mpath_get_access_state,
.bdev_ioctl = nvme_mpath_bdev_ioctl,
.cdev_ioctl = nvme_mpath_cdev_ioctl,
+ .pr_ops = &nvme_mpath_pr_ops,
.chr_uring_cmd = nvme_mpath_chr_uring_cmd,
.chr_uring_cmd_iopoll = nvme_ns_chr_uring_cmd_iopoll,
.get_iopolicy = nvme_mpath_get_iopolicy,
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index da9bd1ada6ad6..619d2fff969e3 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -22,6 +22,7 @@
#include <trace/events/block.h>
extern const struct pr_ops nvme_pr_ops;
+extern const struct mpath_pr_ops nvme_mpath_pr_ops;
extern unsigned int nvme_io_timeout;
#define NVME_IO_TIMEOUT (nvme_io_timeout * HZ)
diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
index ad2ecc2f49a97..fd5a9f309a56f 100644
--- a/drivers/nvme/host/pr.c
+++ b/drivers/nvme/host/pr.c
@@ -116,6 +116,51 @@ static int nvme_send_pr_command(struct block_device *bdev, u32 cdw10, u32 cdw11,
return ret < 0 ? ret : nvme_status_to_pr_err(ret);
}
+static int __nvme_send_pr_command_ns(struct nvme_ns *ns, u32 cdw10,
+ u32 cdw11, u8 op, void *data, unsigned int data_len)
+{
+ struct nvme_command c = { 0 };
+
+ c.common.opcode = op;
+ c.common.cdw10 = cpu_to_le32(cdw10);
+ c.common.cdw11 = cpu_to_le32(cdw11);
+
+ return nvme_send_ns_pr_command(ns, &c, data, data_len);
+}
+
+static int nvme_send_pr_command_ns(struct nvme_ns *ns, u32 cdw10, u32 cdw11,
+ u8 op, void *data, unsigned int data_len)
+{
+ int ret;
+
+ ret = __nvme_send_pr_command_ns(ns, cdw10, cdw11, op, data, data_len);
+ return ret < 0 ? ret : nvme_status_to_pr_err(ret);
+}
+
+__maybe_unused
+static int nvme_pr_register_ns(struct nvme_ns *ns, u64 old_key, u64 new_key,
+ u32 flags)
+{
+ struct nvmet_pr_register_data data = { 0 };
+ u32 cdw10;
+ int ret;
+
+ if (flags & ~PR_FL_IGNORE_KEY)
+ return -EOPNOTSUPP;
+
+ data.crkey = cpu_to_le64(old_key);
+ data.nrkey = cpu_to_le64(new_key);
+
+ cdw10 = old_key ? NVME_PR_REGISTER_ACT_REPLACE :
+ NVME_PR_REGISTER_ACT_REG;
+ cdw10 |= (flags & PR_FL_IGNORE_KEY) ? NVME_PR_IGNORE_KEY : 0;
+ cdw10 |= NVME_PR_CPTPL_PERSIST;
+
+ ret = nvme_send_pr_command_ns(ns, cdw10, 0, nvme_cmd_resv_register,
+ &data, sizeof(data));
+ return ret;
+}
+
static int nvme_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
unsigned int flags)
{
@@ -137,6 +182,26 @@ static int nvme_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
&data, sizeof(data));
}
+__maybe_unused
+static int nvme_pr_reserve_ns(struct nvme_ns *ns, u64 key, enum pr_type type,
+ u32 flags)
+{
+ struct nvmet_pr_acquire_data data = { 0 };
+ u32 cdw10;
+
+ if (flags & ~PR_FL_IGNORE_KEY)
+ return -EOPNOTSUPP;
+
+ data.crkey = cpu_to_le64(key);
+
+ cdw10 = NVME_PR_ACQUIRE_ACT_ACQUIRE;
+ cdw10 |= nvme_pr_type_from_blk(type) << 8;
+ cdw10 |= (flags & PR_FL_IGNORE_KEY) ? NVME_PR_IGNORE_KEY : 0;
+
+ return nvme_send_pr_command_ns(ns, cdw10, 0, nvme_cmd_resv_acquire,
+ &data, sizeof(data));
+}
+
static int nvme_pr_reserve(struct block_device *bdev, u64 key,
enum pr_type type, unsigned flags)
{
@@ -156,6 +221,24 @@ static int nvme_pr_reserve(struct block_device *bdev, u64 key,
&data, sizeof(data));
}
+__maybe_unused
+static int nvme_pr_preempt_ns(struct nvme_ns *ns, u64 old, u64 new,
+ enum pr_type type, bool abort)
+{
+ struct nvmet_pr_acquire_data data = { 0 };
+ u32 cdw10;
+
+ data.crkey = cpu_to_le64(old);
+ data.prkey = cpu_to_le64(new);
+
+ cdw10 = abort ? NVME_PR_ACQUIRE_ACT_PREEMPT_AND_ABORT :
+ NVME_PR_ACQUIRE_ACT_PREEMPT;
+ cdw10 |= nvme_pr_type_from_blk(type) << 8;
+
+ return nvme_send_pr_command_ns(ns, cdw10, 0, nvme_cmd_resv_acquire,
+ &data, sizeof(data));
+}
+
static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new,
enum pr_type type, bool abort)
{
@@ -173,6 +256,21 @@ static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new,
&data, sizeof(data));
}
+__maybe_unused
+static int nvme_pr_clear_ns(struct nvme_ns *ns, u64 key)
+{
+ struct nvmet_pr_release_data data = { 0 };
+ u32 cdw10;
+
+ data.crkey = cpu_to_le64(key);
+
+ cdw10 = NVME_PR_RELEASE_ACT_CLEAR;
+ cdw10 |= key ? 0 : NVME_PR_IGNORE_KEY;
+
+ return nvme_send_pr_command_ns(ns, cdw10, 0, nvme_cmd_resv_release,
+ &data, sizeof(data));
+}
+
static int nvme_pr_clear(struct block_device *bdev, u64 key)
{
struct nvmet_pr_release_data data = { 0 };
@@ -202,6 +300,45 @@ static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type
&data, sizeof(data));
}
+__maybe_unused
+static int nvme_pr_release_ns(struct nvme_ns *ns, u64 key, enum pr_type type)
+{
+ struct nvmet_pr_release_data data = { 0 };
+ u32 cdw10;
+
+ data.crkey = cpu_to_le64(key);
+
+ cdw10 = NVME_PR_RELEASE_ACT_RELEASE;
+ cdw10 |= nvme_pr_type_from_blk(type) << 8;
+ cdw10 |= key ? 0 : NVME_PR_IGNORE_KEY;
+
+ return nvme_send_pr_command_ns(ns, cdw10, 0, nvme_cmd_resv_release,
+ &data, sizeof(data));
+}
+
+static int nvme_mpath_pr_resv_report_ns(struct nvme_ns *ns, void *data,
+ u32 data_len, bool *eds)
+{
+ u32 cdw10, cdw11;
+ int ret;
+
+ cdw10 = nvme_bytes_to_numd(data_len);
+ cdw11 = NVME_EXTENDED_DATA_STRUCT;
+ *eds = true;
+
+retry:
+ ret = __nvme_send_pr_command_ns(ns, cdw10, cdw11, nvme_cmd_resv_report,
+ data, data_len);
+ if (ret == NVME_SC_HOST_ID_INCONSIST &&
+ cdw11 == NVME_EXTENDED_DATA_STRUCT) {
+ cdw11 = 0;
+ *eds = false;
+ goto retry;
+ }
+
+ return ret < 0 ? ret : nvme_status_to_pr_err(ret);
+}
+
static int nvme_pr_resv_report(struct block_device *bdev, void *data,
u32 data_len, bool *eds)
{
@@ -225,6 +362,52 @@ static int nvme_pr_resv_report(struct block_device *bdev, void *data,
return ret < 0 ? ret : nvme_status_to_pr_err(ret);
}
+__maybe_unused
+static int nvme_pr_read_keys_ns(struct nvme_ns *ns, struct pr_keys *keys_info)
+{
+ size_t rse_len;
+ u32 num_keys = keys_info->num_keys;
+ struct nvme_reservation_status_ext *rse;
+ int ret, i;
+ bool eds;
+
+ /*
+ * Assume we are using 128-bit host IDs and allocate a buffer large
+ * enough to get enough keys to fill the return keys buffer.
+ */
+ rse_len = struct_size(rse, regctl_eds, num_keys);
+ if (rse_len > U32_MAX)
+ return -EINVAL;
+
+ rse = kzalloc(rse_len, GFP_KERNEL);
+ if (!rse)
+ return -ENOMEM;
+
+ ret = nvme_mpath_pr_resv_report_ns(ns, rse, rse_len, &eds);
+ if (ret)
+ goto free_rse;
+
+ keys_info->generation = le32_to_cpu(rse->gen);
+ keys_info->num_keys = get_unaligned_le16(&rse->regctl);
+
+ num_keys = min(num_keys, keys_info->num_keys);
+ for (i = 0; i < num_keys; i++) {
+ if (eds) {
+ keys_info->keys[i] =
+ le64_to_cpu(rse->regctl_eds[i].rkey);
+ } else {
+ struct nvme_reservation_status *rs;
+
+ rs = (struct nvme_reservation_status *)rse;
+ keys_info->keys[i] = le64_to_cpu(rs->regctl_ds[i].rkey);
+ }
+ }
+
+free_rse:
+ kfree(rse);
+ return ret;
+}
+
static int nvme_pr_read_keys(struct block_device *bdev,
struct pr_keys *keys_info)
{
@@ -271,6 +454,70 @@ static int nvme_pr_read_keys(struct block_device *bdev,
return ret;
}
+__maybe_unused
+static int nvme_pr_read_reservation_ns(struct nvme_ns *ns,
+ struct pr_held_reservation *resv)
+{
+ struct nvme_reservation_status_ext tmp_rse, *rse;
+ int ret, i, num_regs;
+ u32 rse_len;
+ bool eds;
+
+get_num_regs:
+ /*
+ * Get the number of registrations so we know how big to allocate
+ * the response buffer.
+ */
+ ret = nvme_mpath_pr_resv_report_ns(ns, &tmp_rse, sizeof(tmp_rse),
+ &eds);
+ if (ret)
+ return ret;
+
+ num_regs = get_unaligned_le16(&tmp_rse.regctl);
+ if (!num_regs) {
+ resv->generation = le32_to_cpu(tmp_rse.gen);
+ return 0;
+ }
+
+ rse_len = struct_size(rse, regctl_eds, num_regs);
+ rse = kzalloc(rse_len, GFP_KERNEL);
+ if (!rse)
+ return -ENOMEM;
+
+ ret = nvme_mpath_pr_resv_report_ns(ns, rse, rse_len, &eds);
+ if (ret)
+ goto free_rse;
+
+ if (num_regs != get_unaligned_le16(&rse->regctl)) {
+ kfree(rse);
+ goto get_num_regs;
+ }
+
+ resv->generation = le32_to_cpu(rse->gen);
+ resv->type = block_pr_type_from_nvme(rse->rtype);
+
+ for (i = 0; i < num_regs; i++) {
+ if (eds) {
+ if (rse->regctl_eds[i].rcsts) {
+ resv->key = le64_to_cpu(rse->regctl_eds[i].rkey);
+ break;
+ }
+ } else {
+ struct nvme_reservation_status *rs;
+
+ rs = (struct nvme_reservation_status *)rse;
+ if (rs->regctl_ds[i].rcsts) {
+ resv->key = le64_to_cpu(rs->regctl_ds[i].rkey);
+ break;
+ }
+ }
+ }
+
+free_rse:
+ kfree(rse);
+ return ret;
+}
+
static int nvme_pr_read_reservation(struct block_device *bdev,
struct pr_held_reservation *resv)
{
@@ -333,6 +580,73 @@ static int nvme_pr_read_reservation(struct block_device *bdev,
return ret;
}
+#if defined(CONFIG_NVME_MULTIPATH)
+static int nvme_mpath_pr_register(struct mpath_device *mpath_device,
+ u64 old_key, u64 new_key, unsigned int flags)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+
+ return nvme_pr_register_ns(ns, old_key, new_key, flags);
+}
+
+static int nvme_mpath_pr_reserve(struct mpath_device *mpath_device, u64 key,
+ enum pr_type type, unsigned flags)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+
+ return nvme_pr_reserve_ns(ns, key, type, flags);
+}
+
+static int nvme_mpath_pr_release(struct mpath_device *mpath_device, u64 key,
+ enum pr_type type)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+
+ return nvme_pr_release_ns(ns, key, type);
+}
+
+static int nvme_mpath_pr_preempt(struct mpath_device *mpath_device, u64 old,
+ u64 new, enum pr_type type, bool abort)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+
+ return nvme_pr_preempt_ns(ns, old, new, type, abort);
+}
+
+static int nvme_mpath_pr_clear(struct mpath_device *mpath_device, u64 key)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+
+ return nvme_pr_clear_ns(ns, key);
+}
+
+static int nvme_mpath_pr_read_keys(struct mpath_device *mpath_device,
+ struct pr_keys *keys_info)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+
+ return nvme_pr_read_keys_ns(ns, keys_info);
+}
+
+static int nvme_mpath_pr_read_reservation(struct mpath_device *mpath_device,
+ struct pr_held_reservation *resv)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+
+ return nvme_pr_read_reservation_ns(ns, resv);
+}
+
+const struct mpath_pr_ops nvme_mpath_pr_ops = {
+ .pr_register = nvme_mpath_pr_register,
+ .pr_reserve = nvme_mpath_pr_reserve,
+ .pr_release = nvme_mpath_pr_release,
+ .pr_preempt = nvme_mpath_pr_preempt,
+ .pr_clear = nvme_mpath_pr_clear,
+ .pr_read_keys = nvme_mpath_pr_read_keys,
+ .pr_read_reservation = nvme_mpath_pr_read_reservation,
+};
+#endif
+
const struct pr_ops nvme_pr_ops = {
.pr_register = nvme_pr_register,
.pr_reserve = nvme_pr_reserve,
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 13/19] nvme-multipath: add nvme_mpath_report_zones()
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (11 preceding siblings ...)
2026-02-25 15:40 ` [PATCH 12/19] nvme-multipath: add PR support for libmultipath John Garry
@ 2026-02-25 15:40 ` John Garry
2026-02-25 15:40 ` [PATCH 14/19] nvme-multipath: add nvme_mpath_get_unique_id() John Garry
` (6 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:40 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Add a callback for mpath_head_template.report_zones, which just calls
into nvme_ns_report_zones() after converting from mpath_device to a NS.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/multipath.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index ac75db92dd124..ee7228fced375 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -625,8 +625,17 @@ static int nvme_ns_head_report_zones(struct gendisk *disk, sector_t sector,
srcu_read_unlock(&head->srcu, srcu_idx);
return ret;
}
+
+static int nvme_mpath_report_zones(struct mpath_device *mpath_device,
+ sector_t sector, unsigned int nr_zones,
+ struct blk_report_zones_args *args)
+{
+ return nvme_ns_report_zones(nvme_mpath_to_ns(mpath_device), sector,
+ nr_zones, args);
+}
#else
#define nvme_ns_head_report_zones NULL
+#define nvme_mpath_report_zones NULL
#endif /* CONFIG_BLK_DEV_ZONED */
const struct block_device_operations nvme_ns_head_ops = {
@@ -1501,6 +1510,9 @@ static const struct mpath_head_template mpdt = {
.get_access_state = nvme_mpath_get_access_state,
.bdev_ioctl = nvme_mpath_bdev_ioctl,
.cdev_ioctl = nvme_mpath_cdev_ioctl,
+ #ifdef CONFIG_BLK_DEV_ZONED
+ .report_zones = nvme_mpath_report_zones,
+ #endif
.pr_ops = &nvme_mpath_pr_ops,
.chr_uring_cmd = nvme_mpath_chr_uring_cmd,
.chr_uring_cmd_iopoll = nvme_ns_chr_uring_cmd_iopoll,
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 14/19] nvme-multipath: add nvme_mpath_get_unique_id()
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (12 preceding siblings ...)
2026-02-25 15:40 ` [PATCH 13/19] nvme-multipath: add nvme_mpath_report_zones() John Garry
@ 2026-02-25 15:40 ` John Garry
2026-02-25 15:40 ` [PATCH 15/19] nvme-multipath: add nvme_mpath_synchronize() John Garry
` (5 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:40 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Add a callback for handling .get_unique_id, which calls into
nvme_ns_get_unique_id().
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/multipath.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index ee7228fced375..15fba20cded67 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -610,6 +610,12 @@ static int nvme_ns_head_get_unique_id(struct gendisk *disk, u8 id[16],
return ret;
}
+static int nvme_mpath_get_unique_id(struct mpath_device *mpath_device,
+ u8 id[16], enum blk_unique_id type)
+{
+ return nvme_ns_get_unique_id(nvme_mpath_to_ns(mpath_device), id, type);
+}
+
#ifdef CONFIG_BLK_DEV_ZONED
static int nvme_ns_head_report_zones(struct gendisk *disk, sector_t sector,
unsigned int nr_zones, struct blk_report_zones_args *args)
@@ -1517,4 +1523,5 @@ static const struct mpath_head_template mpdt = {
.chr_uring_cmd = nvme_mpath_chr_uring_cmd,
.chr_uring_cmd_iopoll = nvme_ns_chr_uring_cmd_iopoll,
.get_iopolicy = nvme_mpath_get_iopolicy,
+ .get_unique_id = nvme_mpath_get_unique_id,
};
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 15/19] nvme-multipath: add nvme_mpath_synchronize()
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (13 preceding siblings ...)
2026-02-25 15:40 ` [PATCH 14/19] nvme-multipath: add nvme_mpath_get_unique_id() John Garry
@ 2026-02-25 15:40 ` John Garry
2026-02-25 15:40 ` [PATCH 16/19] nvme-multipath: add nvme_mpath_{add,delete}_ns() John Garry
` (4 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:40 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Add a wrapper which calls into mpath_synchronize().
The mpath_disk check is added as we can be called from code paths where
the mpath_head has not been allocated.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/multipath.c | 10 ++++++++++
drivers/nvme/host/nvme.h | 4 ++++
2 files changed, 14 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 15fba20cded67..7ee0ad7bdfa26 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -972,6 +972,16 @@ static void nvme_update_ns_ana_state(struct nvme_ana_group_desc *desc,
}
}
+void nvme_mpath_synchronize(struct nvme_ns_head *head)
+{
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+
+ if (!mpath_disk)
+ return;
+
+ mpath_synchronize(mpath_disk->mpath_head);
+}
+
static int nvme_update_ana_state(struct nvme_ctrl *ctrl,
struct nvme_ana_group_desc *desc, void *data)
{
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 619d2fff969e3..d642b0eddf010 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1027,6 +1027,7 @@ static inline bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl)
return ctrl->ana_log_buf != NULL;
}
+void nvme_mpath_synchronize(struct nvme_ns_head *head);
void nvme_mpath_unfreeze(struct nvme_subsystem *subsys);
void nvme_mpath_wait_freeze(struct nvme_subsystem *subsys);
void nvme_mpath_start_freeze(struct nvme_subsystem *subsys);
@@ -1095,6 +1096,9 @@ static inline bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl)
{
return false;
}
+static inline void nvme_mpath_synchronize(struct nvme_ns_head *head)
+{
+}
static inline void nvme_failover_req(struct request *req)
{
}
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 16/19] nvme-multipath: add nvme_mpath_{add,delete}_ns()
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (14 preceding siblings ...)
2026-02-25 15:40 ` [PATCH 15/19] nvme-multipath: add nvme_mpath_synchronize() John Garry
@ 2026-02-25 15:40 ` John Garry
2026-03-02 12:48 ` Nilay Shroff
2026-02-25 15:40 ` [PATCH 17/19] nvme-multipath: add nvme_mpath_head_queue_if_no_path() John Garry
` (3 subsequent siblings)
19 siblings, 1 reply; 28+ messages in thread
From: John Garry @ 2026-02-25 15:40 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Add functions which call into mpath_add_device() and
mpath_delete_device().
The per-NS gendisk pointer is used as the mpath_device disk pointer,
which libmultipath uses to reference the per-path block device.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/multipath.c | 26 ++++++++++++++++++++++++++
drivers/nvme/host/nvme.h | 8 ++++++++
2 files changed, 34 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 7ee0ad7bdfa26..bd96211123fee 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -982,6 +982,32 @@ void nvme_mpath_synchronize(struct nvme_ns_head *head)
mpath_synchronize(mpath_disk->mpath_head);
}
+void nvme_mpath_add_ns(struct nvme_ns *ns)
+{
+ struct nvme_ns_head *head = ns->head;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+ struct mpath_head *mpath_head;
+
+ if (!mpath_disk)
+ return;
+
+ mpath_head = mpath_disk->mpath_head;
+
+ ns->mpath_device.disk = ns->disk;
+ mpath_add_device(mpath_head, &ns->mpath_device);
+}
+
+void nvme_mpath_delete_ns(struct nvme_ns *ns)
+{
+ struct nvme_ns_head *head = ns->head;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+
+ if (!mpath_disk)
+ return;
+
+ mpath_delete_device(mpath_disk->mpath_head, &ns->mpath_device);
+}
+
static int nvme_update_ana_state(struct nvme_ctrl *ctrl,
struct nvme_ana_group_desc *desc, void *data)
{
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index d642b0eddf010..3c08212e4a54f 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1028,6 +1028,8 @@ static inline bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl)
}
void nvme_mpath_synchronize(struct nvme_ns_head *head);
+void nvme_mpath_add_ns(struct nvme_ns *ns);
+void nvme_mpath_delete_ns(struct nvme_ns *ns);
void nvme_mpath_unfreeze(struct nvme_subsystem *subsys);
void nvme_mpath_wait_freeze(struct nvme_subsystem *subsys);
void nvme_mpath_start_freeze(struct nvme_subsystem *subsys);
@@ -1099,6 +1101,12 @@ static inline bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl)
static inline void nvme_mpath_synchronize(struct nvme_ns_head *head)
{
}
+static inline void nvme_mpath_add_ns(struct nvme_ns *ns)
+{
+}
+static inline void nvme_mpath_delete_ns(struct nvme_ns *ns)
+{
+}
static inline void nvme_failover_req(struct request *req)
{
}
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 17/19] nvme-multipath: add nvme_mpath_head_queue_if_no_path()
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (15 preceding siblings ...)
2026-02-25 15:40 ` [PATCH 16/19] nvme-multipath: add nvme_mpath_{add,delete}_ns() John Garry
@ 2026-02-25 15:40 ` John Garry
2026-02-25 15:40 ` [PATCH 18/19] nvme-multipath: set mpath_head_template.device_groups John Garry
` (2 subsequent siblings)
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:40 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Add a wrapper to call into mpath_head_queue_if_no_path().
The mpath_disk check is added as we can be called from code paths where
the mpath_head has not been allocated.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/multipath.c | 10 ++++++++++
drivers/nvme/host/nvme.h | 5 +++++
2 files changed, 15 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index bd96211123fee..fdb7f3b55a197 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -214,6 +214,16 @@ void nvme_mpath_end_request(struct request *rq)
nvme_req(rq)->start_time);
}
+bool nvme_mpath_head_queue_if_no_path(struct nvme_ns_head *head)
+{
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+
+ if (!mpath_disk)
+ return false;
+
+ return mpath_head_queue_if_no_path(mpath_disk->mpath_head);
+}
+
void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
{
struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 3c08212e4a54f..e276a7bcb7aff 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1052,6 +1052,7 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl);
void nvme_mpath_remove_disk(struct nvme_ns_head *head);
void nvme_mpath_start_request(struct request *rq);
void nvme_mpath_end_request(struct request *rq);
+bool nvme_mpath_head_queue_if_no_path(struct nvme_ns_head *head);
int nvme_mpath_bdev_ioctl(struct block_device *bdev,
struct mpath_device *mpath_device, blk_mode_t mode,
unsigned int cmd, unsigned long arg, int srcu_idx);
@@ -1196,6 +1197,10 @@ static inline bool nvme_mpath_queue_if_no_path(struct nvme_ns_head *head)
{
return false;
}
+static inline bool nvme_mpath_head_queue_if_no_path(struct nvme_ns_head *head)
+{
+ return false;
+}
#endif /* CONFIG_NVME_MULTIPATH */
int nvme_ns_get_unique_id(struct nvme_ns *ns, u8 id[16],
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 18/19] nvme-multipath: set mpath_head_template.device_groups
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (16 preceding siblings ...)
2026-02-25 15:40 ` [PATCH 17/19] nvme-multipath: add nvme_mpath_head_queue_if_no_path() John Garry
@ 2026-02-25 15:40 ` John Garry
2026-02-25 15:40 ` [PATCH 19/19] nvme-multipath: switch to use libmultipath John Garry
2026-03-02 14:12 ` [PATCH 00/19] nvme: switch to libmultipath Christoph Hellwig
19 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-02-25 15:40 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Set mpath_head_template.device_groups to nvme_ns_attr_groups.
This member is used to set the attribute groups when adding the
multipath gendisk.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/multipath.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index fdb7f3b55a197..081a8a20a9908 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -1570,4 +1570,5 @@ static const struct mpath_head_template mpdt = {
.chr_uring_cmd_iopoll = nvme_ns_chr_uring_cmd_iopoll,
.get_iopolicy = nvme_mpath_get_iopolicy,
.get_unique_id = nvme_mpath_get_unique_id,
+ .device_groups = nvme_ns_attr_groups,
};
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 19/19] nvme-multipath: switch to use libmultipath
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (17 preceding siblings ...)
2026-02-25 15:40 ` [PATCH 18/19] nvme-multipath: set mpath_head_template.device_groups John Garry
@ 2026-02-25 15:40 ` John Garry
2026-03-02 12:57 ` Nilay Shroff
2026-03-02 14:12 ` [PATCH 00/19] nvme: switch to libmultipath Christoph Hellwig
19 siblings, 1 reply; 28+ messages in thread
From: John Garry @ 2026-02-25 15:40 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel, John Garry
Now that all of the (initially unused) libmultipath-based code has been
added, make the full switchover.
The major change is that multipath management is moved out of the
nvme_ns_head structure and into the mpath_head and mpath_disk structures.
The check for ns->head->disk is replaced with a ns->mpath_disk check,
which decides whether we are really in multipath mode. Similarly,
everywhere we were referencing ns->head->disk, we now reference
ns->mpath_disk->disk.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/nvme/host/core.c | 65 ++-
drivers/nvme/host/ioctl.c | 89 ----
drivers/nvme/host/multipath.c | 865 +++++++---------------------------
drivers/nvme/host/nvme.h | 72 +--
drivers/nvme/host/pr.c | 355 +++-----------
drivers/nvme/host/sysfs.c | 84 ++--
6 files changed, 318 insertions(+), 1212 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2d0faec902eb2..be757879f19b2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -667,9 +667,7 @@ static void nvme_free_ns_head(struct kref *ref)
struct nvme_ns_head *head =
container_of(ref, struct nvme_ns_head, ref);
- nvme_mpath_put_disk(head);
ida_free(&head->subsys->ns_ida, head->instance);
- cleanup_srcu_struct(&head->srcu);
nvme_put_subsystem(head->subsys);
kfree(head->plids);
kfree(head);
@@ -2488,9 +2486,12 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
struct queue_limits *ns_lim = &ns->disk->queue->limits;
struct queue_limits lim;
unsigned int memflags;
+ struct nvme_ns_head *head = ns->head;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+ struct gendisk *disk = mpath_disk->disk;
- lim = queue_limits_start_update(ns->head->disk->queue);
- memflags = blk_mq_freeze_queue(ns->head->disk->queue);
+ lim = queue_limits_start_update(disk->queue);
+ memflags = blk_mq_freeze_queue(disk->queue);
/*
* queue_limits mixes values that are the hardware limitations
* for bio splitting with what is the device configuration.
@@ -2511,20 +2512,20 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
lim.io_min = ns_lim->io_min;
lim.io_opt = ns_lim->io_opt;
queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
- ns->head->disk->disk_name);
+ disk->disk_name);
if (unsupported)
- ns->head->disk->flags |= GENHD_FL_HIDDEN;
+ disk->flags |= GENHD_FL_HIDDEN;
else
nvme_init_integrity(ns->head, &lim, info);
lim.max_write_streams = ns_lim->max_write_streams;
lim.write_stream_granularity = ns_lim->write_stream_granularity;
- ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
+ ret = queue_limits_commit_update(disk->queue, &lim);
- set_capacity_and_notify(ns->head->disk, get_capacity(ns->disk));
- set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
- nvme_mpath_revalidate_paths(ns->head);
+ set_capacity_and_notify(disk, get_capacity(ns->disk));
+ set_disk_ro(disk, nvme_ns_is_readonly(ns, info));
+ nvme_mpath_revalidate_paths(head);
- blk_mq_unfreeze_queue(ns->head->disk->queue, memflags);
+ blk_mq_unfreeze_queue(disk->queue, memflags);
}
return ret;
@@ -3884,10 +3885,6 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
size_t size = sizeof(*head);
int ret = -ENOMEM;
-#ifdef CONFIG_NVME_MULTIPATH
- size += num_possible_nodes() * sizeof(struct nvme_ns *);
-#endif
-
head = kzalloc(size, GFP_KERNEL);
if (!head)
goto out;
@@ -3895,10 +3892,7 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
if (ret < 0)
goto out_free_head;
head->instance = ret;
- INIT_LIST_HEAD(&head->list);
- ret = init_srcu_struct(&head->srcu);
- if (ret)
- goto out_ida_remove;
+
head->subsys = ctrl->subsys;
head->ns_id = info->nsid;
head->ids = info->ids;
@@ -3911,22 +3905,20 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
if (head->ids.csi) {
ret = nvme_get_effects_log(ctrl, head->ids.csi, &head->effects);
if (ret)
- goto out_cleanup_srcu;
+ goto out_ida_free;
} else
head->effects = ctrl->effects;
ret = nvme_mpath_alloc_disk(ctrl, head);
if (ret)
- goto out_cleanup_srcu;
+ goto out_ida_free;
list_add_tail(&head->entry, &ctrl->subsys->nsheads);
kref_get(&ctrl->subsys->ref);
return head;
-out_cleanup_srcu:
- cleanup_srcu_struct(&head->srcu);
-out_ida_remove:
+out_ida_free:
ida_free(&ctrl->subsys->ns_ida, head->instance);
out_free_head:
kfree(head);
@@ -3965,7 +3957,7 @@ static int nvme_global_check_duplicate_ids(struct nvme_subsystem *this,
static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
{
struct nvme_ctrl *ctrl = ns->ctrl;
- struct nvme_ns_head *head = NULL;
+ struct nvme_ns_head *head;
int ret;
ret = nvme_global_check_duplicate_ids(ctrl->subsys, &info->ids);
@@ -4046,14 +4038,11 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
}
}
- list_add_tail_rcu(&ns->siblings, &head->list);
head->ns_count++;
ns->head = head;
+ nvme_mpath_add_ns(ns);
mutex_unlock(&ctrl->subsys->lock);
-#ifdef CONFIG_NVME_MULTIPATH
- cancel_delayed_work(&head->remove_work);
-#endif
return 0;
out_put_ns_head:
@@ -4192,24 +4181,24 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
synchronize_srcu(&ctrl->srcu);
out_unlink_ns:
mutex_lock(&ctrl->subsys->lock);
- list_del_rcu(&ns->siblings);
+ nvme_mpath_delete_ns(ns);
ns->head->ns_count--;
if (!ns->head->ns_count) {
list_del_init(&ns->head->entry);
/*
* If multipath is not configured, we still create a namespace
- * head (nshead), but head->disk is not initialized in that
+ * head (nshead), but mpath_head->disk is not initialized in that
* case. As a result, only a single reference to nshead is held
* (via kref_init()) when it is created. Therefore, ensure that
* we do not release the reference to nshead twice if head->disk
* is not present.
*/
- if (ns->head->disk)
+ if (nvme_mpath_has_disk(ns->head))
last_path = true;
}
mutex_unlock(&ctrl->subsys->lock);
if (last_path)
- nvme_put_ns_head(ns->head);
+ nvme_mpath_remove_disk(ns->head);
nvme_put_ns_head(ns->head);
out_cleanup_disk:
put_disk(disk);
@@ -4233,24 +4222,24 @@ static void nvme_ns_remove(struct nvme_ns *ns)
* Ensure that !NVME_NS_READY is seen by other threads to prevent
* this ns going back into current_path.
*/
- synchronize_srcu(&ns->head->srcu);
+ nvme_mpath_synchronize(head);
/* wait for concurrent submissions */
if (nvme_mpath_clear_current_path(ns))
- synchronize_srcu(&ns->head->srcu);
+ nvme_mpath_synchronize(head);
mutex_lock(&ns->ctrl->subsys->lock);
- list_del_rcu(&ns->siblings);
+ nvme_mpath_delete_ns(ns);
head->ns_count--;
if (!head->ns_count) {
- if (!nvme_mpath_queue_if_no_path(ns->head))
+ if (!nvme_mpath_head_queue_if_no_path(head))
list_del_init(&ns->head->entry);
last_path = true;
}
mutex_unlock(&ns->ctrl->subsys->lock);
/* guarantee not available in head->list */
- synchronize_srcu(&ns->head->srcu);
+ nvme_mpath_synchronize(head);
if (!nvme_ns_head_multipath(ns->head))
nvme_cdev_del(&ns->cdev, &ns->cdev_device);
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 773c819cde52a..a243662b461e9 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -710,22 +710,6 @@ int nvme_mpath_chr_uring_cmd(struct mpath_device *mpath_device,
return nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
}
-static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd,
- void __user *argp, struct nvme_ns_head *head, int srcu_idx,
- bool open_for_write)
- __releases(&head->srcu)
-{
- struct nvme_ctrl *ctrl = ns->ctrl;
- int ret;
-
- nvme_get_ctrl(ns->ctrl);
- srcu_read_unlock(&head->srcu, srcu_idx);
- ret = nvme_ctrl_ioctl(ns->ctrl, cmd, argp, open_for_write);
-
- nvme_put_ctrl(ctrl);
- return ret;
-}
-
int nvme_mpath_bdev_ioctl(struct block_device *bdev,
struct mpath_device *mpath_device, blk_mode_t mode,
unsigned int cmd, unsigned long arg, int srcu_idx)
@@ -783,79 +767,6 @@ int nvme_mpath_cdev_ioctl(struct mpath_head *mpath_head,
return ret;
}
-int nvme_ns_head_ioctl(struct block_device *bdev, blk_mode_t mode,
- unsigned int cmd, unsigned long arg)
-{
- struct nvme_ns_head *head = bdev->bd_disk->private_data;
- bool open_for_write = mode & BLK_OPEN_WRITE;
- void __user *argp = (void __user *)arg;
- struct nvme_ns *ns;
- int srcu_idx, ret = -EWOULDBLOCK;
- unsigned int flags = 0;
-
- if (bdev_is_partition(bdev))
- flags |= NVME_IOCTL_PARTITION;
-
- srcu_idx = srcu_read_lock(&head->srcu);
- ns = nvme_find_path(head);
- if (!ns)
- goto out_unlock;
-
- /*
- * Handle ioctls that apply to the controller instead of the namespace
- * separately and drop the ns SRCU reference early. This avoids a
- * deadlock when deleting namespaces using the passthrough interface.
- */
- if (is_ctrl_ioctl(cmd))
- return nvme_ns_head_ctrl_ioctl(ns, cmd, argp, head, srcu_idx,
- open_for_write);
-
- ret = nvme_ns_ioctl(ns, cmd, argp, flags, open_for_write);
-out_unlock:
- srcu_read_unlock(&head->srcu, srcu_idx);
- return ret;
-}
-
-long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- bool open_for_write = file->f_mode & FMODE_WRITE;
- struct cdev *cdev = file_inode(file)->i_cdev;
- struct nvme_ns_head *head =
- container_of(cdev, struct nvme_ns_head, cdev);
- void __user *argp = (void __user *)arg;
- struct nvme_ns *ns;
- int srcu_idx, ret = -EWOULDBLOCK;
-
- srcu_idx = srcu_read_lock(&head->srcu);
- ns = nvme_find_path(head);
- if (!ns)
- goto out_unlock;
-
- if (is_ctrl_ioctl(cmd))
- return nvme_ns_head_ctrl_ioctl(ns, cmd, argp, head, srcu_idx,
- open_for_write);
-
- ret = nvme_ns_ioctl(ns, cmd, argp, 0, open_for_write);
-out_unlock:
- srcu_read_unlock(&head->srcu, srcu_idx);
- return ret;
-}
-
-int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
- unsigned int issue_flags)
-{
- struct cdev *cdev = file_inode(ioucmd->file)->i_cdev;
- struct nvme_ns_head *head = container_of(cdev, struct nvme_ns_head, cdev);
- int srcu_idx = srcu_read_lock(&head->srcu);
- struct nvme_ns *ns = nvme_find_path(head);
- int ret = -EINVAL;
-
- if (ns)
- ret = nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
- srcu_read_unlock(&head->srcu, srcu_idx);
- return ret;
-}
#endif /* CONFIG_NVME_MULTIPATH */
int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 081a8a20a9908..c686cabfd9d16 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -67,33 +67,17 @@ module_param_cb(multipath_always_on, &multipath_always_on_ops,
MODULE_PARM_DESC(multipath_always_on,
"create multipath node always except for private namespace with non-unique nsid; note that this also implicitly enables native multipath support");
-static const char *nvme_iopolicy_names[] = {
- [NVME_IOPOLICY_NUMA] = "numa",
- [NVME_IOPOLICY_RR] = "round-robin",
- [NVME_IOPOLICY_QD] = "queue-depth",
-};
-static int iopolicy = NVME_IOPOLICY_NUMA;
+static int iopolicy = MPATH_IOPOLICY_NUMA;
static int nvme_set_iopolicy(const char *val, const struct kernel_param *kp)
{
- if (!val)
- return -EINVAL;
- if (!strncmp(val, "numa", 4))
- iopolicy = NVME_IOPOLICY_NUMA;
- else if (!strncmp(val, "round-robin", 11))
- iopolicy = NVME_IOPOLICY_RR;
- else if (!strncmp(val, "queue-depth", 11))
- iopolicy = NVME_IOPOLICY_QD;
- else
- return -EINVAL;
-
- return 0;
+ return mpath_set_iopolicy(val, &iopolicy);
}
static int nvme_get_iopolicy(char *buf, const struct kernel_param *kp)
{
- return sprintf(buf, "%s\n", nvme_iopolicy_names[iopolicy]);
+ return mpath_get_iopolicy(buf, iopolicy);
}
module_param_call(iopolicy, nvme_set_iopolicy, nvme_get_iopolicy,
@@ -103,7 +87,7 @@ MODULE_PARM_DESC(iopolicy,
void nvme_mpath_default_iopolicy(struct nvme_subsystem *subsys)
{
- subsys->iopolicy = iopolicy;
+ subsys->iopolicy.iopolicy = iopolicy;
}
void nvme_mpath_unfreeze(struct nvme_subsystem *subsys)
@@ -111,9 +95,13 @@ void nvme_mpath_unfreeze(struct nvme_subsystem *subsys)
struct nvme_ns_head *h;
lockdep_assert_held(&subsys->lock);
- list_for_each_entry(h, &subsys->nsheads, entry)
- if (h->disk)
- blk_mq_unfreeze_queue_nomemrestore(h->disk->queue);
+ list_for_each_entry(h, &subsys->nsheads, entry) {
+ struct mpath_disk *mpath_disk = h->mpath_disk;
+
+ if (mpath_disk)
+ blk_mq_unfreeze_queue_nomemrestore(
+ mpath_disk->disk->queue);
+ }
}
void nvme_mpath_wait_freeze(struct nvme_subsystem *subsys)
@@ -121,9 +109,12 @@ void nvme_mpath_wait_freeze(struct nvme_subsystem *subsys)
struct nvme_ns_head *h;
lockdep_assert_held(&subsys->lock);
- list_for_each_entry(h, &subsys->nsheads, entry)
- if (h->disk)
- blk_mq_freeze_queue_wait(h->disk->queue);
+ list_for_each_entry(h, &subsys->nsheads, entry) {
+ struct mpath_disk *mpath_disk = h->mpath_disk;
+
+ if (mpath_disk)
+ blk_mq_freeze_queue_wait(mpath_disk->disk->queue);
+ }
}
void nvme_mpath_start_freeze(struct nvme_subsystem *subsys)
@@ -131,15 +122,22 @@ void nvme_mpath_start_freeze(struct nvme_subsystem *subsys)
struct nvme_ns_head *h;
lockdep_assert_held(&subsys->lock);
- list_for_each_entry(h, &subsys->nsheads, entry)
- if (h->disk)
- blk_freeze_queue_start(h->disk->queue);
+ list_for_each_entry(h, &subsys->nsheads, entry) {
+ struct mpath_disk *mpath_disk = h->mpath_disk;
+
+ if (mpath_disk)
+ blk_freeze_queue_start(mpath_disk->disk->queue);
+ }
}
void nvme_failover_req(struct request *req)
{
struct nvme_ns *ns = req->q->queuedata;
+ struct nvme_ns_head *head = ns->head;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+ struct mpath_head *mpath_head = mpath_disk->mpath_head;
u16 status = nvme_req(req)->status & NVME_SCT_SC_MASK;
+ struct gendisk *disk = mpath_disk->disk;
unsigned long flags;
struct bio *bio;
@@ -155,9 +153,9 @@ void nvme_failover_req(struct request *req)
queue_work(nvme_wq, &ns->ctrl->ana_work);
}
- spin_lock_irqsave(&ns->head->requeue_lock, flags);
+ spin_lock_irqsave(&mpath_head->requeue_lock, flags);
for (bio = req->bio; bio; bio = bio->bi_next) {
- bio_set_dev(bio, ns->head->disk->part0);
+ bio_set_dev(bio, disk->part0);
if (bio->bi_opf & REQ_POLLED) {
bio->bi_opf &= ~REQ_POLLED;
bio->bi_cookie = BLK_QC_T_NONE;
@@ -171,20 +169,23 @@ void nvme_failover_req(struct request *req)
*/
bio->bi_opf &= ~REQ_NOWAIT;
}
- blk_steal_bios(&ns->head->requeue_list, req);
- spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
+ blk_steal_bios(&mpath_head->requeue_list, req);
+ spin_unlock_irqrestore(&mpath_head->requeue_lock, flags);
nvme_req(req)->status = 0;
nvme_end_req(req);
- kblockd_schedule_work(&ns->head->requeue_work);
+ kblockd_schedule_work(&mpath_head->requeue_work);
}
void nvme_mpath_start_request(struct request *rq)
{
struct nvme_ns *ns = rq->q->queuedata;
- struct gendisk *disk = ns->head->disk;
+ struct nvme_ns_head *head = ns->head;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+ struct gendisk *disk = mpath_disk->disk;
+ struct nvme_subsystem *subsys = head->subsys;
- if ((READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD) &&
+ if (mpath_qd_iopolicy(&subsys->iopolicy) &&
!(nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE)) {
atomic_inc(&ns->ctrl->nr_active);
nvme_req(rq)->flags |= NVME_MPATH_CNT_ACTIVE;
@@ -203,13 +204,15 @@ EXPORT_SYMBOL_GPL(nvme_mpath_start_request);
void nvme_mpath_end_request(struct request *rq)
{
struct nvme_ns *ns = rq->q->queuedata;
+ struct nvme_ns_head *head = ns->head;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
if (nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE)
atomic_dec_if_positive(&ns->ctrl->nr_active);
if (!(nvme_req(rq)->flags & NVME_MPATH_IO_STATS))
return;
- bdev_end_io_acct(ns->head->disk->part0, req_op(rq),
+ bdev_end_io_acct(mpath_disk->disk->part0, req_op(rq),
blk_rq_bytes(rq) >> SECTOR_SHIFT,
nvme_req(rq)->start_time);
}
@@ -232,11 +235,17 @@ void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
srcu_idx = srcu_read_lock(&ctrl->srcu);
list_for_each_entry_srcu(ns, &ctrl->namespaces, list,
srcu_read_lock_held(&ctrl->srcu)) {
- if (!ns->head->disk)
+ struct mpath_disk *mpath_disk = ns->head->mpath_disk;
+ struct mpath_head *mpath_head;
+ struct gendisk *disk;
+
+ if (!mpath_disk)
continue;
- kblockd_schedule_work(&ns->head->requeue_work);
+ mpath_head = mpath_disk->mpath_head;
+ disk = mpath_disk->disk;
+ kblockd_schedule_work(&mpath_head->requeue_work);
if (nvme_ctrl_state(ns->ctrl) == NVME_CTRL_LIVE)
- disk_uevent(ns->head->disk, KOBJ_CHANGE);
+ disk_uevent(disk, KOBJ_CHANGE);
}
srcu_read_unlock(&ctrl->srcu, srcu_idx);
}
@@ -253,20 +262,13 @@ static const char *nvme_ana_state_names[] = {
bool nvme_mpath_clear_current_path(struct nvme_ns *ns)
{
struct nvme_ns_head *head = ns->head;
- bool changed = false;
- int node;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
- if (!head)
- goto out;
+ if (!mpath_disk)
+ return false;
- for_each_node(node) {
- if (ns == rcu_access_pointer(head->current_path[node])) {
- rcu_assign_pointer(head->current_path[node], NULL);
- changed = true;
- }
- }
-out:
- return changed;
+ return mpath_clear_current_path(mpath_disk->mpath_head,
+ &ns->mpath_device);
}
void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
@@ -277,30 +279,35 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
srcu_idx = srcu_read_lock(&ctrl->srcu);
list_for_each_entry_srcu(ns, &ctrl->namespaces, list,
srcu_read_lock_held(&ctrl->srcu)) {
+ struct nvme_ns_head *head = ns->head;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+
+ if (!mpath_disk)
+ continue;
+
nvme_mpath_clear_current_path(ns);
- kblockd_schedule_work(&ns->head->requeue_work);
+ kblockd_schedule_work(&mpath_disk->mpath_head->requeue_work);
}
srcu_read_unlock(&ctrl->srcu, srcu_idx);
}
+static void nvme_mpath_revalidate_paths_cb(struct mpath_device *mpath_device,
+ sector_t capacity)
+{
+ struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
+
+ if (capacity != get_capacity(ns->disk))
+ clear_bit(NVME_NS_READY, &ns->flags);
+}
+
void nvme_mpath_revalidate_paths(struct nvme_ns_head *head)
{
- sector_t capacity = get_capacity(head->disk);
- struct nvme_ns *ns;
- int node;
- int srcu_idx;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
- srcu_idx = srcu_read_lock(&head->srcu);
- list_for_each_entry_srcu(ns, &head->list, siblings,
- srcu_read_lock_held(&head->srcu)) {
- if (capacity != get_capacity(ns->disk))
- clear_bit(NVME_NS_READY, &ns->flags);
- }
- srcu_read_unlock(&head->srcu, srcu_idx);
+ if (!mpath_disk)
+ return;
- for_each_node(node)
- rcu_assign_pointer(head->current_path[node], NULL);
- kblockd_schedule_work(&head->requeue_work);
+ mpath_revalidate_paths(mpath_disk, nvme_mpath_revalidate_paths_cb);
}
static bool nvme_path_is_disabled(struct nvme_ns *ns)
@@ -327,142 +334,6 @@ static bool nvme_mpath_is_disabled(struct mpath_device *mpath_device)
return nvme_path_is_disabled(ns);
}
-static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
-{
- int found_distance = INT_MAX, fallback_distance = INT_MAX, distance;
- struct nvme_ns *found = NULL, *fallback = NULL, *ns;
-
- list_for_each_entry_srcu(ns, &head->list, siblings,
- srcu_read_lock_held(&head->srcu)) {
- if (nvme_path_is_disabled(ns))
- continue;
-
- if (ns->ctrl->numa_node != NUMA_NO_NODE &&
- READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_NUMA)
- distance = node_distance(node, ns->ctrl->numa_node);
- else
- distance = LOCAL_DISTANCE;
-
- switch (ns->ana_state) {
- case NVME_ANA_OPTIMIZED:
- if (distance < found_distance) {
- found_distance = distance;
- found = ns;
- }
- break;
- case NVME_ANA_NONOPTIMIZED:
- if (distance < fallback_distance) {
- fallback_distance = distance;
- fallback = ns;
- }
- break;
- default:
- break;
- }
- }
-
- if (!found)
- found = fallback;
- if (found)
- rcu_assign_pointer(head->current_path[node], found);
- return found;
-}
-
-static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
- struct nvme_ns *ns)
-{
- ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
- siblings);
- if (ns)
- return ns;
- return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
-}
-
-static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head)
-{
- struct nvme_ns *ns, *found = NULL;
- int node = numa_node_id();
- struct nvme_ns *old = srcu_dereference(head->current_path[node],
- &head->srcu);
-
- if (unlikely(!old))
- return __nvme_find_path(head, node);
-
- if (list_is_singular(&head->list)) {
- if (nvme_path_is_disabled(old))
- return NULL;
- return old;
- }
-
- for (ns = nvme_next_ns(head, old);
- ns && ns != old;
- ns = nvme_next_ns(head, ns)) {
- if (nvme_path_is_disabled(ns))
- continue;
-
- if (ns->ana_state == NVME_ANA_OPTIMIZED) {
- found = ns;
- goto out;
- }
- if (ns->ana_state == NVME_ANA_NONOPTIMIZED)
- found = ns;
- }
-
- /*
- * The loop above skips the current path for round-robin semantics.
- * Fall back to the current path if either:
- * - no other optimized path found and current is optimized,
- * - no other usable path found and current is usable.
- */
- if (!nvme_path_is_disabled(old) &&
- (old->ana_state == NVME_ANA_OPTIMIZED ||
- (!found && old->ana_state == NVME_ANA_NONOPTIMIZED)))
- return old;
-
- if (!found)
- return NULL;
-out:
- rcu_assign_pointer(head->current_path[node], found);
- return found;
-}
-
-static struct nvme_ns *nvme_queue_depth_path(struct nvme_ns_head *head)
-{
- struct nvme_ns *best_opt = NULL, *best_nonopt = NULL, *ns;
- unsigned int min_depth_opt = UINT_MAX, min_depth_nonopt = UINT_MAX;
- unsigned int depth;
-
- list_for_each_entry_srcu(ns, &head->list, siblings,
- srcu_read_lock_held(&head->srcu)) {
- if (nvme_path_is_disabled(ns))
- continue;
-
- depth = atomic_read(&ns->ctrl->nr_active);
-
- switch (ns->ana_state) {
- case NVME_ANA_OPTIMIZED:
- if (depth < min_depth_opt) {
- min_depth_opt = depth;
- best_opt = ns;
- }
- break;
- case NVME_ANA_NONOPTIMIZED:
- if (depth < min_depth_nonopt) {
- min_depth_nonopt = depth;
- best_nonopt = ns;
- }
- break;
- default:
- break;
- }
-
- if (min_depth_opt == 0)
- return best_opt;
- }
-
- return best_opt ? best_opt : best_nonopt;
-}
-
static inline bool nvme_path_is_optimized(struct nvme_ns *ns)
{
return nvme_ctrl_state(ns->ctrl) == NVME_CTRL_LIVE &&
@@ -476,64 +347,6 @@ static bool nvme_mpath_is_optimized(struct mpath_device *mpath_device)
return nvme_path_is_optimized(ns);
}
-static struct nvme_ns *nvme_numa_path(struct nvme_ns_head *head)
-{
- int node = numa_node_id();
- struct nvme_ns *ns;
-
- ns = srcu_dereference(head->current_path[node], &head->srcu);
- if (unlikely(!ns))
- return __nvme_find_path(head, node);
- if (unlikely(!nvme_path_is_optimized(ns)))
- return __nvme_find_path(head, node);
- return ns;
-}
-
-inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
-{
- switch (READ_ONCE(head->subsys->iopolicy)) {
- case NVME_IOPOLICY_QD:
- return nvme_queue_depth_path(head);
- case NVME_IOPOLICY_RR:
- return nvme_round_robin_path(head);
- default:
- return nvme_numa_path(head);
- }
-}
-
-static bool nvme_available_path(struct nvme_ns_head *head)
-{
- struct nvme_ns *ns;
-
- if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags))
- return false;
-
- list_for_each_entry_srcu(ns, &head->list, siblings,
- srcu_read_lock_held(&head->srcu)) {
- if (test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ns->ctrl->flags))
- continue;
- switch (nvme_ctrl_state(ns->ctrl)) {
- case NVME_CTRL_LIVE:
- case NVME_CTRL_RESETTING:
- case NVME_CTRL_CONNECTING:
- return true;
- default:
- break;
- }
- }
-
- /*
- * If "head->delayed_removal_secs" is configured (i.e., non-zero), do
- * not immediately fail I/O. Instead, requeue the I/O for the configured
- * duration, anticipating that if there's a transient link failure then
- * it may recover within this time window. This parameter is exported to
- * userspace via sysfs, and its default value is zero. It is internally
- * mapped to NVME_NSHEAD_QUEUE_IF_NO_PATH. When delayed_removal_secs is
- * non-zero, this flag is set to true. When zero, the flag is cleared.
- */
- return nvme_mpath_queue_if_no_path(head);
-}
-
static bool nvme_mpath_available_path(struct mpath_device *mpath_device,
bool *available)
{
@@ -554,94 +367,12 @@ static bool nvme_mpath_available_path(struct mpath_device *mpath_device,
return true;
}
-static void nvme_ns_head_submit_bio(struct bio *bio)
-{
- struct nvme_ns_head *head = bio->bi_bdev->bd_disk->private_data;
- struct device *dev = disk_to_dev(head->disk);
- struct nvme_ns *ns;
- int srcu_idx;
-
- /*
- * The namespace might be going away and the bio might be moved to a
- * different queue via blk_steal_bios(), so we need to use the bio_split
- * pool from the original queue to allocate the bvecs from.
- */
- bio = bio_split_to_limits(bio);
- if (!bio)
- return;
-
- srcu_idx = srcu_read_lock(&head->srcu);
- ns = nvme_find_path(head);
- if (likely(ns)) {
- bio_set_dev(bio, ns->disk->part0);
- bio->bi_opf |= REQ_NVME_MPATH;
- trace_block_bio_remap(bio, disk_devt(ns->head->disk),
- bio->bi_iter.bi_sector);
- submit_bio_noacct(bio);
- } else if (nvme_available_path(head)) {
- dev_warn_ratelimited(dev, "no usable path - requeuing I/O\n");
-
- spin_lock_irq(&head->requeue_lock);
- bio_list_add(&head->requeue_list, bio);
- spin_unlock_irq(&head->requeue_lock);
- } else {
- dev_warn_ratelimited(dev, "no available path - failing I/O\n");
-
- bio_io_error(bio);
- }
-
- srcu_read_unlock(&head->srcu, srcu_idx);
-}
-
-static int nvme_ns_head_open(struct gendisk *disk, blk_mode_t mode)
-{
- if (!nvme_tryget_ns_head(disk->private_data))
- return -ENXIO;
- return 0;
-}
-
-static void nvme_ns_head_release(struct gendisk *disk)
-{
- nvme_put_ns_head(disk->private_data);
-}
-
-static int nvme_ns_head_get_unique_id(struct gendisk *disk, u8 id[16],
- enum blk_unique_id type)
-{
- struct nvme_ns_head *head = disk->private_data;
- struct nvme_ns *ns;
- int srcu_idx, ret = -EWOULDBLOCK;
-
- srcu_idx = srcu_read_lock(&head->srcu);
- ns = nvme_find_path(head);
- if (ns)
- ret = nvme_ns_get_unique_id(ns, id, type);
- srcu_read_unlock(&head->srcu, srcu_idx);
- return ret;
-}
-
static int nvme_mpath_get_unique_id(struct mpath_device *mpath_device,
u8 id[16], enum blk_unique_id type)
{
return nvme_ns_get_unique_id(nvme_mpath_to_ns(mpath_device), id, type);
}
-
#ifdef CONFIG_BLK_DEV_ZONED
-static int nvme_ns_head_report_zones(struct gendisk *disk, sector_t sector,
- unsigned int nr_zones, struct blk_report_zones_args *args)
-{
- struct nvme_ns_head *head = disk->private_data;
- struct nvme_ns *ns;
- int srcu_idx, ret = -EWOULDBLOCK;
-
- srcu_idx = srcu_read_lock(&head->srcu);
- ns = nvme_find_path(head);
- if (ns)
- ret = nvme_ns_report_zones(ns, sector, nr_zones, args);
- srcu_read_unlock(&head->srcu, srcu_idx);
- return ret;
-}
-
static int nvme_mpath_report_zones(struct mpath_device *mpath_device,
sector_t sector, unsigned int nr_zones,
struct blk_report_zones_args *args)
@@ -650,51 +381,9 @@ static int nvme_mpath_report_zones(struct mpath_device *mpath_device,
nr_zones, args);
}
#else
-#define nvme_ns_head_report_zones NULL
#define nvme_mpath_report_zones NULL
#endif /* CONFIG_BLK_DEV_ZONED */
-const struct block_device_operations nvme_ns_head_ops = {
- .owner = THIS_MODULE,
- .submit_bio = nvme_ns_head_submit_bio,
- .open = nvme_ns_head_open,
- .release = nvme_ns_head_release,
- .ioctl = nvme_ns_head_ioctl,
- .compat_ioctl = blkdev_compat_ptr_ioctl,
- .getgeo = nvme_getgeo,
- .get_unique_id = nvme_ns_head_get_unique_id,
- .report_zones = nvme_ns_head_report_zones,
- .pr_ops = &nvme_pr_ops,
-};
-
-static inline struct nvme_ns_head *cdev_to_ns_head(struct cdev *cdev)
-{
- return container_of(cdev, struct nvme_ns_head, cdev);
-}
-
-static int nvme_ns_head_chr_open(struct inode *inode, struct file *file)
-{
- if (!nvme_tryget_ns_head(cdev_to_ns_head(inode->i_cdev)))
- return -ENXIO;
- return 0;
-}
-
-static int nvme_ns_head_chr_release(struct inode *inode, struct file *file)
-{
- nvme_put_ns_head(cdev_to_ns_head(inode->i_cdev));
- return 0;
-}
-
-static const struct file_operations nvme_ns_head_chr_fops = {
- .owner = THIS_MODULE,
- .open = nvme_ns_head_chr_open,
- .release = nvme_ns_head_chr_release,
- .unlocked_ioctl = nvme_ns_head_chr_ioctl,
- .compat_ioctl = compat_ptr_ioctl,
- .uring_cmd = nvme_ns_head_chr_uring_cmd,
- .uring_cmd_iopoll = nvme_ns_chr_uring_cmd_iopoll,
-};
-
static int nvme_mpath_add_cdev(struct mpath_head *mpath_head)
{
struct nvme_ns_head *head = mpath_head->drvdata;
@@ -715,72 +404,17 @@ static void nvme_mpath_del_cdev(struct mpath_head *mpath_head)
nvme_cdev_del(&mpath_head->cdev, &mpath_head->cdev_device);
}
-static int nvme_add_ns_head_cdev(struct nvme_ns_head *head)
-{
- int ret;
-
- head->cdev_device.parent = &head->subsys->dev;
- ret = dev_set_name(&head->cdev_device, "ng%dn%d",
- head->subsys->instance, head->instance);
- if (ret)
- return ret;
- ret = nvme_cdev_add(&head->cdev, &head->cdev_device,
- &nvme_ns_head_chr_fops, THIS_MODULE);
- return ret;
-}
-
-static void nvme_partition_scan_work(struct work_struct *work)
-{
- struct nvme_ns_head *head =
- container_of(work, struct nvme_ns_head, partition_scan_work);
-
- if (WARN_ON_ONCE(!test_and_clear_bit(GD_SUPPRESS_PART_SCAN,
- &head->disk->state)))
- return;
-
- mutex_lock(&head->disk->open_mutex);
- bdev_disk_changed(head->disk, false);
- mutex_unlock(&head->disk->open_mutex);
-}
-
-static void nvme_requeue_work(struct work_struct *work)
+bool nvme_mpath_has_disk(struct nvme_ns_head *head)
{
- struct nvme_ns_head *head =
- container_of(work, struct nvme_ns_head, requeue_work);
- struct bio *bio, *next;
-
- spin_lock_irq(&head->requeue_lock);
- next = bio_list_get(&head->requeue_list);
- spin_unlock_irq(&head->requeue_lock);
-
- while ((bio = next) != NULL) {
- next = bio->bi_next;
- bio->bi_next = NULL;
-
- submit_bio_noacct(bio);
- }
-}
-
-static void nvme_remove_head(struct nvme_ns_head *head)
-{
- if (test_and_clear_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
- /*
- * requeue I/O after NVME_NSHEAD_DISK_LIVE has been cleared
- * to allow multipath to fail all I/O.
- */
- kblockd_schedule_work(&head->requeue_work);
-
- nvme_cdev_del(&head->cdev, &head->cdev_device);
- synchronize_srcu(&head->srcu);
- del_gendisk(head->disk);
- }
- nvme_put_ns_head(head);
+ return head->mpath_disk;
}
static void nvme_remove_head_work(struct work_struct *work)
{
- struct nvme_ns_head *head = container_of(to_delayed_work(work),
- struct nvme_ns_head, remove_work);
+ struct mpath_head *mpath_head = container_of(to_delayed_work(work),
+ struct mpath_head, remove_work);
+ struct nvme_ns_head *head = mpath_head->drvdata;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
bool remove = false;
mutex_lock(&head->subsys->lock);
@@ -789,24 +423,21 @@ static void nvme_remove_head_work(struct work_struct *work)
remove = true;
}
mutex_unlock(&head->subsys->lock);
- if (remove)
- nvme_remove_head(head);
+ if (remove) {
+ mpath_unregister_disk(mpath_disk);
+ nvme_put_ns_head(head);
+ }
module_put(THIS_MODULE);
}
int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
{
+ struct mpath_disk *mpath_disk;
+ struct mpath_head *mpath_head;
+ struct nvme_subsystem *subsys = ctrl->subsys;
struct queue_limits lim;
- mutex_init(&head->lock);
- bio_list_init(&head->requeue_list);
- spin_lock_init(&head->requeue_lock);
- INIT_WORK(&head->requeue_work, nvme_requeue_work);
- INIT_WORK(&head->partition_scan_work, nvme_partition_scan_work);
- INIT_DELAYED_WORK(&head->remove_work, nvme_remove_head_work);
- head->delayed_removal_secs = 0;
-
/*
* If "multipath_always_on" is enabled, a multipath node is added
* regardless of whether the disk is single/multi ported, and whether
@@ -832,66 +463,29 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
if (head->ids.csi == NVME_CSI_ZNS)
lim.features |= BLK_FEAT_ZONED;
- head->disk = blk_alloc_disk(&lim, ctrl->numa_node);
- if (IS_ERR(head->disk))
- return PTR_ERR(head->disk);
- head->disk->fops = &nvme_ns_head_ops;
- head->disk->private_data = head;
-
- /*
- * We need to suppress the partition scan from occuring within the
- * controller's scan_work context. If a path error occurs here, the IO
- * will wait until a path becomes available or all paths are torn down,
- * but that action also occurs within scan_work, so it would deadlock.
- * Defer the partition scan to a different context that does not block
- * scan_work.
- */
- set_bit(GD_SUPPRESS_PART_SCAN, &head->disk->state);
- sprintf(head->disk->disk_name, "nvme%dn%d",
- ctrl->subsys->instance, head->instance);
- nvme_tryget_ns_head(head);
- return 0;
-}
-
-static void nvme_mpath_set_live(struct nvme_ns *ns)
-{
- struct nvme_ns_head *head = ns->head;
- int rc;
-
- if (!head->disk)
- return;
+ mpath_disk = mpath_alloc_head_disk(&lim, ctrl->numa_node);
+ if (!mpath_disk)
+ return -ENOMEM;
- /*
- * test_and_set_bit() is used because it is protecting against two nvme
- * paths simultaneously calling device_add_disk() on the same namespace
- * head.
- */
- if (!test_and_set_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
- rc = device_add_disk(&head->subsys->dev, head->disk,
- nvme_ns_attr_groups);
- if (rc) {
- clear_bit(NVME_NSHEAD_DISK_LIVE, &head->flags);
- return;
- }
- nvme_add_ns_head_cdev(head);
- queue_work(nvme_wq, &head->partition_scan_work);
+ mpath_head = mpath_alloc_head();
+ if (IS_ERR(mpath_head)) {
+ mpath_put_disk(mpath_disk);
+ return PTR_ERR(mpath_head);
}
- nvme_mpath_add_sysfs_link(ns->head);
+ mpath_head->drvdata = head;
- mutex_lock(&head->lock);
- if (nvme_path_is_optimized(ns)) {
- int node, srcu_idx;
+ head->mpath_disk = mpath_disk;
+ mpath_disk->mpath_head = mpath_head;
+ mpath_disk->parent = &subsys->dev;
- srcu_idx = srcu_read_lock(&head->srcu);
- for_each_online_node(node)
- __nvme_find_path(head, node);
- srcu_read_unlock(&head->srcu, srcu_idx);
- }
- mutex_unlock(&head->lock);
+ mpath_head->mpdt = &mpdt;
+ INIT_DELAYED_WORK(&mpath_head->remove_work, nvme_remove_head_work);
- synchronize_srcu(&head->srcu);
- kblockd_schedule_work(&head->requeue_work);
+ sprintf(mpath_disk->disk->disk_name, "nvme%dn%d",
+ ctrl->subsys->instance, head->instance);
+ nvme_tryget_ns_head(head);
+ return 0;
}
static int nvme_parse_ana_log(struct nvme_ctrl *ctrl, void *data,
@@ -946,9 +540,13 @@ static inline bool nvme_state_is_live(enum nvme_ana_state state)
static void nvme_update_ns_ana_state(struct nvme_ana_group_desc *desc,
struct nvme_ns *ns)
{
+ struct nvme_ns_head *head = ns->head;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+ struct mpath_head *mpath_head = mpath_disk->mpath_head;
ns->ana_grpid = le32_to_cpu(desc->grpid);
ns->ana_state = desc->state;
clear_bit(NVME_NS_ANA_PENDING, &ns->flags);
+
/*
* nvme_mpath_set_live() will trigger I/O to the multipath path device
* and in turn to this path device. However we cannot accept this I/O
@@ -960,7 +558,7 @@ static void nvme_update_ns_ana_state(struct nvme_ana_group_desc *desc,
*/
if (nvme_state_is_live(ns->ana_state) &&
nvme_ctrl_state(ns->ctrl) == NVME_CTRL_LIVE)
- nvme_mpath_set_live(ns);
+ mpath_device_set_live(mpath_disk, &ns->mpath_device);
else {
/*
* Add sysfs link from multipath head gendisk node to path
@@ -977,8 +575,8 @@ static void nvme_update_ns_ana_state(struct nvme_ana_group_desc *desc,
* is not live but still create the sysfs link to this path from
* head node if head node of the path has already come alive.
*/
- if (test_bit(NVME_NSHEAD_DISK_LIVE, &ns->head->flags))
- nvme_mpath_add_sysfs_link(ns->head);
+ if (test_bit(MPATH_HEAD_DISK_LIVE, &mpath_head->flags))
+ mpath_add_sysfs_link(mpath_disk);
}
}
@@ -1018,6 +616,17 @@ void nvme_mpath_delete_ns(struct nvme_ns *ns)
mpath_delete_device(mpath_disk->mpath_head, &ns->mpath_device);
}
+void nvme_mpath_remove_sysfs_link(struct nvme_ns *ns)
+{
+ struct nvme_ns_head *head = ns->head;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+
+ if (!mpath_disk)
+ return;
+
+ mpath_remove_sysfs_link(mpath_disk, &ns->mpath_device);
+}
+
static int nvme_update_ana_state(struct nvme_ctrl *ctrl,
struct nvme_ana_group_desc *desc, void *data)
{
@@ -1140,32 +749,23 @@ static ssize_t nvme_subsys_iopolicy_show(struct device *dev,
{
struct nvme_subsystem *subsys =
container_of(dev, struct nvme_subsystem, dev);
- return sysfs_emit(buf, "%s\n",
- nvme_iopolicy_names[READ_ONCE(subsys->iopolicy)]);
+ return mpath_iopolicy_show(&subsys->iopolicy, buf);
}
-static void nvme_subsys_iopolicy_update(struct nvme_subsystem *subsys,
- int iopolicy)
+static void nvme_subsys_iopolicy_store_update(void *data)
{
+ struct nvme_subsystem *subsys = data;
struct nvme_ctrl *ctrl;
- int old_iopolicy = READ_ONCE(subsys->iopolicy);
- if (old_iopolicy == iopolicy)
- return;
-
- WRITE_ONCE(subsys->iopolicy, iopolicy);
-
- /* iopolicy changes clear the mpath by design */
mutex_lock(&nvme_subsystems_lock);
list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
nvme_mpath_clear_ctrl_paths(ctrl);
mutex_unlock(&nvme_subsystems_lock);
-
- pr_notice("subsysnqn %s iopolicy changed from %s to %s\n",
- subsys->subnqn,
- nvme_iopolicy_names[old_iopolicy],
- nvme_iopolicy_names[iopolicy]);
}
static ssize_t nvme_subsys_iopolicy_store(struct device *dev,
@@ -1173,16 +773,9 @@ static ssize_t nvme_subsys_iopolicy_store(struct device *dev,
{
struct nvme_subsystem *subsys =
container_of(dev, struct nvme_subsystem, dev);
- int i;
- for (i = 0; i < ARRAY_SIZE(nvme_iopolicy_names); i++) {
- if (sysfs_streq(buf, nvme_iopolicy_names[i])) {
- nvme_subsys_iopolicy_update(subsys, i);
- return count;
- }
- }
-
- return -EINVAL;
+ return mpath_iopolicy_store(&subsys->iopolicy, buf, count,
+ nvme_subsys_iopolicy_store_update, subsys);
}
SUBSYS_ATTR_RW(iopolicy, S_IRUGO | S_IWUSR,
nvme_subsys_iopolicy_show, nvme_subsys_iopolicy_store);
@@ -1207,8 +800,9 @@ static ssize_t queue_depth_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+ struct nvme_subsystem *subsys = ns->head->subsys;
- if (ns->head->subsys->iopolicy != NVME_IOPOLICY_QD)
+ if (!mpath_qd_iopolicy(&subsys->iopolicy))
return 0;
return sysfs_emit(buf, "%d\n", atomic_read(&ns->ctrl->nr_active));
@@ -1218,69 +812,33 @@ DEVICE_ATTR_RO(queue_depth);
static ssize_t numa_nodes_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
- int node, srcu_idx;
- nodemask_t numa_nodes;
- struct nvme_ns *current_ns;
struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
struct nvme_ns_head *head = ns->head;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+ struct mpath_head *mpath_head = mpath_disk->mpath_head;
+ struct nvme_subsystem *subsys = ns->head->subsys;
+ struct mpath_device *mpath_device = &ns->mpath_device;
- if (head->subsys->iopolicy != NVME_IOPOLICY_NUMA)
- return 0;
-
- nodes_clear(numa_nodes);
-
- srcu_idx = srcu_read_lock(&head->srcu);
- for_each_node(node) {
- current_ns = srcu_dereference(head->current_path[node],
- &head->srcu);
- if (ns == current_ns)
- node_set(node, numa_nodes);
- }
- srcu_read_unlock(&head->srcu, srcu_idx);
-
- return sysfs_emit(buf, "%*pbl\n", nodemask_pr_args(&numa_nodes));
+ return mpath_numa_nodes_show(mpath_head, mpath_device,
+ &subsys->iopolicy, buf);
}
DEVICE_ATTR_RO(numa_nodes);
-static ssize_t delayed_removal_secs_show(struct device *dev,
+static ssize_t delayed_removal_secs_show(struct device *bd_device,
struct device_attribute *attr, char *buf)
{
- struct gendisk *disk = dev_to_disk(dev);
- struct nvme_ns_head *head = disk->private_data;
- int ret;
+ struct mpath_disk *mpath_disk = mpath_bd_device_to_disk(bd_device);
+ struct mpath_head *mpath_head = mpath_disk->mpath_head;
- mutex_lock(&head->subsys->lock);
- ret = sysfs_emit(buf, "%u\n", head->delayed_removal_secs);
- mutex_unlock(&head->subsys->lock);
- return ret;
+ return mpath_delayed_removal_secs_show(mpath_head, buf);
}
-static ssize_t delayed_removal_secs_store(struct device *dev,
+static ssize_t delayed_removal_secs_store(struct device *bd_device,
struct device_attribute *attr, const char *buf, size_t count)
{
- struct gendisk *disk = dev_to_disk(dev);
- struct nvme_ns_head *head = disk->private_data;
- unsigned int sec;
- int ret;
-
- ret = kstrtouint(buf, 0, &sec);
- if (ret < 0)
- return ret;
-
- mutex_lock(&head->subsys->lock);
- head->delayed_removal_secs = sec;
- if (sec)
- set_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags);
- else
- clear_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags);
- mutex_unlock(&head->subsys->lock);
- /*
- * Ensure that update to NVME_NSHEAD_QUEUE_IF_NO_PATH is seen
- * by its reader.
- */
- synchronize_srcu(&head->srcu);
+ struct mpath_disk *mpath_disk = mpath_bd_device_to_disk(bd_device);
+ struct mpath_head *mpath_head = mpath_disk->mpath_head;
- return count;
+ return mpath_delayed_removal_secs_store(mpath_head, buf, count);
}
DEVICE_ATTR_RW(delayed_removal_secs);
@@ -1297,87 +855,14 @@ static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl,
return -ENXIO; /* just break out of the loop */
}
-void nvme_mpath_add_sysfs_link(struct nvme_ns_head *head)
-{
- struct device *target;
- int rc, srcu_idx;
- struct nvme_ns *ns;
- struct kobject *kobj;
-
- /*
- * Ensure head disk node is already added otherwise we may get invalid
- * kobj for head disk node
- */
- if (!test_bit(GD_ADDED, &head->disk->state))
- return;
-
- kobj = &disk_to_dev(head->disk)->kobj;
-
- /*
- * loop through each ns chained through the head->list and create the
- * sysfs link from head node to the ns path node
- */
- srcu_idx = srcu_read_lock(&head->srcu);
-
- list_for_each_entry_srcu(ns, &head->list, siblings,
- srcu_read_lock_held(&head->srcu)) {
- /*
- * Ensure that ns path disk node is already added otherwise we
- * may get invalid kobj name for target
- */
- if (!test_bit(GD_ADDED, &ns->disk->state))
- continue;
-
- /*
- * Avoid creating link if it already exists for the given path.
- * When path ana state transitions from optimized to non-
- * optimized or vice-versa, the nvme_mpath_set_live() is
- * invoked which in truns call this function. Now if the sysfs
- * link already exists for the given path and we attempt to re-
- * create the link then sysfs code would warn about it loudly.
- * So we evaluate NVME_NS_SYSFS_ATTR_LINK flag here to ensure
- * that we're not creating duplicate link.
- * The test_and_set_bit() is used because it is protecting
- * against multiple nvme paths being simultaneously added.
- */
- if (test_and_set_bit(NVME_NS_SYSFS_ATTR_LINK, &ns->flags))
- continue;
-
- target = disk_to_dev(ns->disk);
- /*
- * Create sysfs link from head gendisk kobject @kobj to the
- * ns path gendisk kobject @target->kobj.
- */
- rc = sysfs_add_link_to_group(kobj, nvme_ns_mpath_attr_group.name,
- &target->kobj, dev_name(target));
- if (unlikely(rc)) {
- dev_err(disk_to_dev(ns->head->disk),
- "failed to create link to %s\n",
- dev_name(target));
- clear_bit(NVME_NS_SYSFS_ATTR_LINK, &ns->flags);
- }
- }
-
- srcu_read_unlock(&head->srcu, srcu_idx);
-}
-
-void nvme_mpath_remove_sysfs_link(struct nvme_ns *ns)
+void nvme_mpath_add_disk(struct nvme_ns *ns, __le32 anagrpid)
{
- struct device *target;
- struct kobject *kobj;
+ struct nvme_ns_head *head = ns->head;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
- if (!test_bit(NVME_NS_SYSFS_ATTR_LINK, &ns->flags))
+ if (!mpath_disk)
return;
- target = disk_to_dev(ns->disk);
- kobj = &disk_to_dev(ns->head->disk)->kobj;
- sysfs_remove_link_from_group(kobj, nvme_ns_mpath_attr_group.name,
- dev_name(target));
- clear_bit(NVME_NS_SYSFS_ATTR_LINK, &ns->flags);
-}
-
-void nvme_mpath_add_disk(struct nvme_ns *ns, __le32 anagrpid)
-{
if (nvme_ctrl_use_ana(ns->ctrl)) {
struct nvme_ana_group_desc desc = {
.grpid = anagrpid,
@@ -1398,23 +883,28 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, __le32 anagrpid)
}
} else {
ns->ana_state = NVME_ANA_OPTIMIZED;
- nvme_mpath_set_live(ns);
+ mpath_device_set_live(mpath_disk, &ns->mpath_device);
}
#ifdef CONFIG_BLK_DEV_ZONED
- if (blk_queue_is_zoned(ns->queue) && ns->head->disk)
- ns->head->disk->nr_zones = ns->disk->nr_zones;
+ if (blk_queue_is_zoned(ns->queue) && mpath_disk->disk)
+ mpath_disk->disk->nr_zones = ns->disk->nr_zones;
#endif
}
void nvme_mpath_remove_disk(struct nvme_ns_head *head)
{
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+ struct mpath_head *mpath_head;
bool remove = false;
- if (!head->disk)
+ if (!mpath_disk)
return;
+ mpath_head = mpath_disk->mpath_head;
+
mutex_lock(&head->subsys->lock);
+
/*
* We are called when all paths have been removed, and at that point
* head->list is expected to be empty. However, nvme_ns_remove() and
@@ -1424,37 +914,21 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
* head->list here. If it is no longer empty then we skip enqueuing the
* delayed head removal work.
*/
if (head->ns_count)
goto out;
- if (head->delayed_removal_secs) {
- /*
- * Ensure that no one could remove this module while the head
- * remove work is pending.
- */
- if (!try_module_get(THIS_MODULE))
- goto out;
- mod_delayed_work(nvme_wq, &head->remove_work,
- head->delayed_removal_secs * HZ);
- } else {
+ if (mpath_can_remove_head(mpath_head)) {
list_del_init(&head->entry);
remove = true;
}
out:
mutex_unlock(&head->subsys->lock);
- if (remove)
- nvme_remove_head(head);
-}
-void nvme_mpath_put_disk(struct nvme_ns_head *head)
-{
- if (!head->disk)
- return;
- /* make sure all pending bios are cleaned up */
- kblockd_schedule_work(&head->requeue_work);
- flush_work(&head->requeue_work);
- flush_work(&head->partition_scan_work);
- put_disk(head->disk);
+ if (remove) {
+ mpath_unregister_disk(mpath_disk);
+ nvme_put_ns_head(head);
+ }
}
void nvme_mpath_init_ctrl(struct nvme_ctrl *ctrl)
@@ -1525,15 +999,16 @@ void nvme_mpath_uninit(struct nvme_ctrl *ctrl)
ctrl->ana_log_size = 0;
}
static enum mpath_iopolicy_e nvme_mpath_get_iopolicy(
 struct mpath_head *mpath_head)
{
 struct nvme_ns_head *head = mpath_head->drvdata;
 struct nvme_subsystem *subsys = head->subsys;
- return mpath_read_iopolicy(&subsys->mpath_iopolicy);
+ return mpath_read_iopolicy(&subsys->iopolicy);
}
static enum mpath_access_state nvme_mpath_get_access_state(
struct mpath_device *mpath_device)
{
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index e276a7bcb7aff..d83495dead590 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -253,11 +253,6 @@ struct nvme_request {
struct nvme_ctrl *ctrl;
};
-/*
- * Mark a bio as coming in through the mpath node.
- */
-#define REQ_NVME_MPATH REQ_DRV
-
enum {
NVME_REQ_CANCELLED = (1 << 0),
NVME_REQ_USERCMD = (1 << 1),
@@ -475,11 +470,6 @@ static inline enum nvme_ctrl_state nvme_ctrl_state(struct nvme_ctrl *ctrl)
return READ_ONCE(ctrl->state);
}
-enum nvme_iopolicy {
- NVME_IOPOLICY_NUMA,
- NVME_IOPOLICY_RR,
- NVME_IOPOLICY_QD,
-};
struct nvme_subsystem {
int instance;
@@ -502,8 +492,7 @@ struct nvme_subsystem {
u16 vendor_id;
struct ida ns_ida;
#ifdef CONFIG_NVME_MULTIPATH
- enum nvme_iopolicy iopolicy;
- struct mpath_iopolicy mpath_iopolicy;
+ struct mpath_iopolicy iopolicy;
#endif
};
@@ -525,8 +514,6 @@ struct nvme_ns_ids {
* only ever has a single entry for private namespaces.
*/
struct nvme_ns_head {
- struct list_head list;
- struct srcu_struct srcu;
struct nvme_subsystem *subsys;
struct nvme_ns_ids ids;
u8 lba_shift;
@@ -551,33 +538,15 @@ struct nvme_ns_head {
struct ratelimit_state rs_nuse;
- struct cdev cdev;
- struct device cdev_device;
-
- struct gendisk *disk;
-
u16 nr_plids;
u16 *plids;
struct mpath_disk *mpath_disk;
-#ifdef CONFIG_NVME_MULTIPATH
- struct bio_list requeue_list;
- spinlock_t requeue_lock;
- struct work_struct requeue_work;
- struct work_struct partition_scan_work;
- struct mutex lock;
- unsigned long flags;
- struct delayed_work remove_work;
- unsigned int delayed_removal_secs;
-#define NVME_NSHEAD_DISK_LIVE 0
-#define NVME_NSHEAD_QUEUE_IF_NO_PATH 1
- struct nvme_ns __rcu *current_path[];
-#endif
};
static inline bool nvme_ns_head_multipath(struct nvme_ns_head *head)
{
- return IS_ENABLED(CONFIG_NVME_MULTIPATH) && head->disk;
+ return IS_ENABLED(CONFIG_NVME_MULTIPATH) && head->mpath_disk;
}
enum nvme_ns_features {
@@ -1011,9 +980,7 @@ int nvme_getgeo(struct gendisk *disk, struct hd_geometry *geo);
int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags);
extern const struct attribute_group *nvme_ns_attr_groups[];
-extern const struct attribute_group nvme_ns_mpath_attr_group;
extern const struct pr_ops nvme_pr_ops;
-extern const struct block_device_operations nvme_ns_head_ops;
extern const struct attribute_group nvme_dev_attrs_group;
extern const struct attribute_group *nvme_subsys_attrs_groups[];
extern const struct attribute_group *nvme_dev_attr_groups[];
@@ -1030,6 +997,7 @@ static inline bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl)
void nvme_mpath_synchronize(struct nvme_ns_head *head);
void nvme_mpath_add_ns(struct nvme_ns *ns);
void nvme_mpath_delete_ns(struct nvme_ns *ns);
+void nvme_mpath_remove_sysfs_link(struct nvme_ns *ns);
void nvme_mpath_unfreeze(struct nvme_subsystem *subsys);
void nvme_mpath_wait_freeze(struct nvme_subsystem *subsys);
void nvme_mpath_start_freeze(struct nvme_subsystem *subsys);
@@ -1037,8 +1005,7 @@ void nvme_mpath_default_iopolicy(struct nvme_subsystem *subsys);
void nvme_failover_req(struct request *req);
void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
-void nvme_mpath_add_sysfs_link(struct nvme_ns_head *ns);
-void nvme_mpath_remove_sysfs_link(struct nvme_ns *ns);
+bool nvme_mpath_has_disk(struct nvme_ns_head *head);
void nvme_mpath_add_disk(struct nvme_ns *ns, __le32 anagrpid);
void nvme_mpath_put_disk(struct nvme_ns_head *head);
int nvme_mpath_init_identify(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id);
@@ -1064,15 +1031,19 @@ int nvme_mpath_chr_uring_cmd(struct mpath_device *mpath_device,
static inline bool nvme_is_mpath_request(struct request *req)
{
- return req->cmd_flags & REQ_NVME_MPATH;
+ return is_mpath_request(req);
}
static inline void nvme_trace_bio_complete(struct request *req)
{
struct nvme_ns *ns = req->q->queuedata;
- if (nvme_is_mpath_request(req) && req->bio)
- trace_block_bio_complete(ns->head->disk->queue, req->bio);
+ if (nvme_is_mpath_request(req) && req->bio) {
+ struct nvme_ns_head *head = ns->head;
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+
+ trace_block_bio_complete(mpath_disk->disk->queue, req->bio);
+ }
}
extern bool multipath;
@@ -1085,13 +1056,7 @@ extern struct device_attribute subsys_attr_iopolicy;
static inline bool nvme_disk_is_ns_head(struct gendisk *disk)
{
- return disk->fops == &nvme_ns_head_ops;
-}
-static inline bool nvme_mpath_queue_if_no_path(struct nvme_ns_head *head)
-{
- if (test_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags))
- return true;
- return false;
+ return is_mpath_head(disk);
}
#else
#define multipath false
@@ -1108,6 +1073,9 @@ static inline void nvme_mpath_add_ns(struct nvme_ns *ns)
static inline void nvme_mpath_delete_ns(struct nvme_ns *ns)
{
}
+static inline void nvme_mpath_remove_sysfs_link(struct nvme_ns *ns)
+{
+}
static inline void nvme_failover_req(struct request *req)
{
}
@@ -1119,16 +1087,14 @@ static inline int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,
{
return 0;
}
-static inline void nvme_mpath_add_disk(struct nvme_ns *ns, __le32 anagrpid)
-{
-}
-static inline void nvme_mpath_put_disk(struct nvme_ns_head *head)
+static inline bool nvme_mpath_has_disk(struct nvme_ns_head *head)
{
+ return false;
}
-static inline void nvme_mpath_add_sysfs_link(struct nvme_ns *ns)
+static inline void nvme_mpath_add_disk(struct nvme_ns *ns, __le32 anagrpid)
{
}
-static inline void nvme_mpath_remove_sysfs_link(struct nvme_ns *ns)
+static inline void nvme_mpath_put_disk(struct nvme_ns_head *head)
{
}
static inline bool nvme_mpath_clear_current_path(struct nvme_ns *ns)
diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
index fd5a9f309a56f..b1002c3d43eb3 100644
--- a/drivers/nvme/host/pr.c
+++ b/drivers/nvme/host/pr.c
@@ -49,24 +49,9 @@ static enum pr_type block_pr_type_from_nvme(enum nvme_pr_type type)
return 0;
}
-static int nvme_send_ns_head_pr_command(struct block_device *bdev,
- struct nvme_command *c, void *data, unsigned int data_len)
-{
- struct nvme_ns_head *head = bdev->bd_disk->private_data;
- int srcu_idx = srcu_read_lock(&head->srcu);
- struct nvme_ns *ns = nvme_find_path(head);
- int ret = -EWOULDBLOCK;
-
- if (ns) {
- c->common.nsid = cpu_to_le32(ns->head->ns_id);
- ret = nvme_submit_sync_cmd(ns->queue, c, data, data_len);
- }
- srcu_read_unlock(&head->srcu, srcu_idx);
- return ret;
-}
-
-static int nvme_send_ns_pr_command(struct nvme_ns *ns, struct nvme_command *c,
- void *data, unsigned int data_len)
+static int nvme_send_device_pr_command(struct nvme_ns *ns,
+ struct nvme_command *c, void *data,
+ unsigned int data_len)
{
c->common.nsid = cpu_to_le32(ns->head->ns_id);
return nvme_submit_sync_cmd(ns->queue, c, data, data_len);
@@ -92,31 +77,7 @@ static int nvme_status_to_pr_err(int status)
}
}
-static int __nvme_send_pr_command(struct block_device *bdev, u32 cdw10,
- u32 cdw11, u8 op, void *data, unsigned int data_len)
-{
- struct nvme_command c = { 0 };
-
- c.common.opcode = op;
- c.common.cdw10 = cpu_to_le32(cdw10);
- c.common.cdw11 = cpu_to_le32(cdw11);
-
- if (nvme_disk_is_ns_head(bdev->bd_disk))
- return nvme_send_ns_head_pr_command(bdev, &c, data, data_len);
- return nvme_send_ns_pr_command(bdev->bd_disk->private_data, &c,
- data, data_len);
-}
-
-static int nvme_send_pr_command(struct block_device *bdev, u32 cdw10, u32 cdw11,
- u8 op, void *data, unsigned int data_len)
-{
- int ret;
-
- ret = __nvme_send_pr_command(bdev, cdw10, cdw11, op, data, data_len);
- return ret < 0 ? ret : nvme_status_to_pr_err(ret);
-}
-
-static int __nvme_send_pr_command_ns(struct nvme_ns *ns, u32 cdw10,
+static int __nvme_send_pr_command(struct nvme_ns *ns, u32 cdw10,
u32 cdw11, u8 op, void *data, unsigned int data_len)
{
struct nvme_command c = { 0 };
@@ -125,19 +86,18 @@ static int __nvme_send_pr_command_ns(struct nvme_ns *ns, u32 cdw10,
c.common.cdw10 = cpu_to_le32(cdw10);
c.common.cdw11 = cpu_to_le32(cdw11);
- return nvme_send_ns_pr_command(ns, &c, data, data_len);
+ return nvme_send_device_pr_command(ns, &c, data, data_len);
}
-static int nvme_send_pr_command_ns(struct nvme_ns *ns, u32 cdw10, u32 cdw11,
+static int nvme_send_pr_command(struct nvme_ns *ns, u32 cdw10, u32 cdw11,
u8 op, void *data, unsigned int data_len)
{
int ret;
- ret = __nvme_send_pr_command_ns(ns, cdw10, cdw11, op, data, data_len);
+ ret = __nvme_send_pr_command(ns, cdw10, cdw11, op, data, data_len);
return ret < 0 ? ret : nvme_status_to_pr_err(ret);
}
-__maybe_unused
static int nvme_pr_register_ns(struct nvme_ns *ns, u64 old_key, u64 new_key,
u32 flags)
{
@@ -156,33 +116,11 @@ static int nvme_pr_register_ns(struct nvme_ns *ns, u64 old_key, u64 new_key,
cdw10 |= (flags & PR_FL_IGNORE_KEY) ? NVME_PR_IGNORE_KEY : 0;
cdw10 |= NVME_PR_CPTPL_PERSIST;
- ret = nvme_send_pr_command_ns(ns, cdw10, 0, nvme_cmd_resv_register,
+ ret = nvme_send_pr_command(ns, cdw10, 0, nvme_cmd_resv_register,
&data, sizeof(data));
return ret;
}
-static int nvme_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
- unsigned int flags)
-{
- struct nvmet_pr_register_data data = { 0 };
- u32 cdw10;
-
- if (flags & ~PR_FL_IGNORE_KEY)
- return -EOPNOTSUPP;
-
- data.crkey = cpu_to_le64(old_key);
- data.nrkey = cpu_to_le64(new_key);
-
- cdw10 = old_key ? NVME_PR_REGISTER_ACT_REPLACE :
- NVME_PR_REGISTER_ACT_REG;
- cdw10 |= (flags & PR_FL_IGNORE_KEY) ? NVME_PR_IGNORE_KEY : 0;
- cdw10 |= NVME_PR_CPTPL_PERSIST;
-
- return nvme_send_pr_command(bdev, cdw10, 0, nvme_cmd_resv_register,
- &data, sizeof(data));
-}
-
-__maybe_unused
static int nvme_pr_reserve_ns(struct nvme_ns *ns, u64 key, enum pr_type type,
u32 flags)
{
@@ -198,30 +136,10 @@ static int nvme_pr_reserve_ns(struct nvme_ns *ns, u64 key, enum pr_type type,
cdw10 |= nvme_pr_type_from_blk(type) << 8;
cdw10 |= (flags & PR_FL_IGNORE_KEY) ? NVME_PR_IGNORE_KEY : 0;
- return nvme_send_pr_command_ns(ns, cdw10, 0, nvme_cmd_resv_acquire,
+ return nvme_send_pr_command(ns, cdw10, 0, nvme_cmd_resv_acquire,
&data, sizeof(data));
}
-static int nvme_pr_reserve(struct block_device *bdev, u64 key,
- enum pr_type type, unsigned flags)
-{
- struct nvmet_pr_acquire_data data = { 0 };
- u32 cdw10;
-
- if (flags & ~PR_FL_IGNORE_KEY)
- return -EOPNOTSUPP;
-
- data.crkey = cpu_to_le64(key);
-
- cdw10 = NVME_PR_ACQUIRE_ACT_ACQUIRE;
- cdw10 |= nvme_pr_type_from_blk(type) << 8;
- cdw10 |= (flags & PR_FL_IGNORE_KEY) ? NVME_PR_IGNORE_KEY : 0;
-
- return nvme_send_pr_command(bdev, cdw10, 0, nvme_cmd_resv_acquire,
- &data, sizeof(data));
-}
-
-__maybe_unused
static int nvme_pr_preempt_ns(struct nvme_ns *ns, u64 old, u64 new,
enum pr_type type, bool abort)
{
@@ -235,28 +153,10 @@ static int nvme_pr_preempt_ns(struct nvme_ns *ns, u64 old, u64 new,
NVME_PR_ACQUIRE_ACT_PREEMPT;
cdw10 |= nvme_pr_type_from_blk(type) << 8;
- return nvme_send_pr_command_ns(ns, cdw10, 0, nvme_cmd_resv_acquire,
- &data, sizeof(data));
-}
-
-static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new,
- enum pr_type type, bool abort)
-{
- struct nvmet_pr_acquire_data data = { 0 };
- u32 cdw10;
-
- data.crkey = cpu_to_le64(old);
- data.prkey = cpu_to_le64(new);
-
- cdw10 = abort ? NVME_PR_ACQUIRE_ACT_PREEMPT_AND_ABORT :
- NVME_PR_ACQUIRE_ACT_PREEMPT;
- cdw10 |= nvme_pr_type_from_blk(type) << 8;
-
- return nvme_send_pr_command(bdev, cdw10, 0, nvme_cmd_resv_acquire,
+ return nvme_send_pr_command(ns, cdw10, 0, nvme_cmd_resv_acquire,
&data, sizeof(data));
}
-__maybe_unused
static int nvme_pr_clear_ns(struct nvme_ns *ns, u64 key)
{
struct nvmet_pr_release_data data = { 0 };
@@ -267,40 +167,10 @@ static int nvme_pr_clear_ns(struct nvme_ns *ns, u64 key)
cdw10 = NVME_PR_RELEASE_ACT_CLEAR;
cdw10 |= key ? 0 : NVME_PR_IGNORE_KEY;
- return nvme_send_pr_command_ns(ns, cdw10, 0, nvme_cmd_resv_release,
- &data, sizeof(data));
-}
-
-static int nvme_pr_clear(struct block_device *bdev, u64 key)
-{
- struct nvmet_pr_release_data data = { 0 };
- u32 cdw10;
-
- data.crkey = cpu_to_le64(key);
-
- cdw10 = NVME_PR_RELEASE_ACT_CLEAR;
- cdw10 |= key ? 0 : NVME_PR_IGNORE_KEY;
-
- return nvme_send_pr_command(bdev, cdw10, 0, nvme_cmd_resv_release,
+ return nvme_send_pr_command(ns, cdw10, 0, nvme_cmd_resv_release,
&data, sizeof(data));
}
-static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
-{
- struct nvmet_pr_release_data data = { 0 };
- u32 cdw10;
-
- data.crkey = cpu_to_le64(key);
-
- cdw10 = NVME_PR_RELEASE_ACT_RELEASE;
- cdw10 |= nvme_pr_type_from_blk(type) << 8;
- cdw10 |= key ? 0 : NVME_PR_IGNORE_KEY;
-
- return nvme_send_pr_command(bdev, cdw10, 0, nvme_cmd_resv_release,
- &data, sizeof(data));
-}
-
-__maybe_unused
static int nvme_pr_release_ns(struct nvme_ns *ns, u64 key, enum pr_type type)
{
struct nvmet_pr_release_data data = { 0 };
@@ -312,11 +182,11 @@ static int nvme_pr_release_ns(struct nvme_ns *ns, u64 key, enum pr_type type)
cdw10 |= nvme_pr_type_from_blk(type) << 8;
cdw10 |= key ? 0 : NVME_PR_IGNORE_KEY;
- return nvme_send_pr_command_ns(ns, cdw10, 0, nvme_cmd_resv_release,
+ return nvme_send_pr_command(ns, cdw10, 0, nvme_cmd_resv_release,
&data, sizeof(data));
}
-static int nvme_mpath_pr_resv_report_ns(struct nvme_ns *ns, void *data,
+static int nvme_mpath_pr_resv_report(struct nvme_ns *ns, void *data,
u32 data_len, bool *eds)
{
u32 cdw10, cdw11;
@@ -327,7 +197,7 @@ static int nvme_mpath_pr_resv_report_ns(struct nvme_ns *ns, void *data,
*eds = true;
retry:
- ret = __nvme_send_pr_command_ns(ns, cdw10, cdw11, nvme_cmd_resv_report,
+ ret = __nvme_send_pr_command(ns, cdw10, cdw11, nvme_cmd_resv_report,
data, data_len);
if (ret == NVME_SC_HOST_ID_INCONSIST &&
cdw11 == NVME_EXTENDED_DATA_STRUCT) {
@@ -339,30 +209,6 @@ static int nvme_mpath_pr_resv_report_ns(struct nvme_ns *ns, void *data,
return ret < 0 ? ret : nvme_status_to_pr_err(ret);
}
-static int nvme_pr_resv_report(struct block_device *bdev, void *data,
- u32 data_len, bool *eds)
-{
- u32 cdw10, cdw11;
- int ret;
-
- cdw10 = nvme_bytes_to_numd(data_len);
- cdw11 = NVME_EXTENDED_DATA_STRUCT;
- *eds = true;
-
-retry:
- ret = __nvme_send_pr_command(bdev, cdw10, cdw11, nvme_cmd_resv_report,
- data, data_len);
- if (ret == NVME_SC_HOST_ID_INCONSIST &&
- cdw11 == NVME_EXTENDED_DATA_STRUCT) {
- cdw11 = 0;
- *eds = false;
- goto retry;
- }
-
- return ret < 0 ? ret : nvme_status_to_pr_err(ret);
-}
-
-__maybe_unused
static int nvme_pr_read_keys_ns(struct nvme_ns *ns, struct pr_keys *keys_info)
{
size_t rse_len;
@@ -383,7 +229,7 @@ static int nvme_pr_read_keys_ns(struct nvme_ns *ns, struct pr_keys *keys_info)
if (!rse)
return -ENOMEM;
- ret = nvme_mpath_pr_resv_report_ns(ns, rse, rse_len, &eds);
+ ret = nvme_mpath_pr_resv_report(ns, rse, rse_len, &eds);
if (ret)
goto free_rse;
@@ -399,53 +245,8 @@ static int nvme_pr_read_keys_ns(struct nvme_ns *ns, struct pr_keys *keys_info)
struct nvme_reservation_status *rs;
rs = (struct nvme_reservation_status *)rse;
- keys_info->keys[i] = le64_to_cpu(rs->regctl_ds[i].rkey);
- }
- }
-
-free_rse:
- kfree(rse);
- return ret;
-}
-
-static int nvme_pr_read_keys(struct block_device *bdev,
- struct pr_keys *keys_info)
-{
- size_t rse_len;
- u32 num_keys = keys_info->num_keys;
- struct nvme_reservation_status_ext *rse;
- int ret, i;
- bool eds;
-
- /*
- * Assume we are using 128-bit host IDs and allocate a buffer large
- * enough to get enough keys to fill the return keys buffer.
- */
- rse_len = struct_size(rse, regctl_eds, num_keys);
- if (rse_len > U32_MAX)
- return -EINVAL;
-
- rse = kzalloc(rse_len, GFP_KERNEL);
- if (!rse)
- return -ENOMEM;
-
- ret = nvme_pr_resv_report(bdev, rse, rse_len, &eds);
- if (ret)
- goto free_rse;
-
- keys_info->generation = le32_to_cpu(rse->gen);
- keys_info->num_keys = get_unaligned_le16(&rse->regctl);
-
- num_keys = min(num_keys, keys_info->num_keys);
- for (i = 0; i < num_keys; i++) {
- if (eds) {
keys_info->keys[i] =
- le64_to_cpu(rse->regctl_eds[i].rkey);
- } else {
- struct nvme_reservation_status *rs;
-
- rs = (struct nvme_reservation_status *)rse;
- keys_info->keys[i] = le64_to_cpu(rs->regctl_ds[i].rkey);
+ le64_to_cpu(rs->regctl_ds[i].rkey);
}
}
@@ -454,7 +255,6 @@ static int nvme_pr_read_keys(struct block_device *bdev,
return ret;
}
-__maybe_unused
static int nvme_pr_read_reservation_ns(struct nvme_ns *ns,
struct pr_held_reservation *resv)
{
@@ -468,7 +268,7 @@ static int nvme_pr_read_reservation_ns(struct nvme_ns *ns,
* Get the number of registrations so we know how big to allocate
* the response buffer.
*/
- ret = nvme_mpath_pr_resv_report_ns(ns, &tmp_rse, sizeof(tmp_rse),
+ ret = nvme_mpath_pr_resv_report(ns, &tmp_rse, sizeof(tmp_rse),
&eds);
if (ret)
return ret;
@@ -484,7 +284,7 @@ static int nvme_pr_read_reservation_ns(struct nvme_ns *ns,
if (!rse)
return -ENOMEM;
- ret = nvme_mpath_pr_resv_report_ns(ns, rse, rse_len, &eds);
+ ret = nvme_mpath_pr_resv_report(ns, rse, rse_len, &eds);
if (ret)
goto free_rse;
@@ -499,7 +299,8 @@ static int nvme_pr_read_reservation_ns(struct nvme_ns *ns,
for (i = 0; i < num_regs; i++) {
if (eds) {
if (rse->regctl_eds[i].rcsts) {
- resv->key = le64_to_cpu(rse->regctl_eds[i].rkey);
+ resv->key =
+ le64_to_cpu(rse->regctl_eds[i].rkey);
break;
}
} else {
@@ -518,67 +319,6 @@ static int nvme_pr_read_reservation_ns(struct nvme_ns *ns,
return ret;
}
-static int nvme_pr_read_reservation(struct block_device *bdev,
- struct pr_held_reservation *resv)
-{
- struct nvme_reservation_status_ext tmp_rse, *rse;
- int ret, i, num_regs;
- u32 rse_len;
- bool eds;
-
-get_num_regs:
- /*
- * Get the number of registrations so we know how big to allocate
- * the response buffer.
- */
- ret = nvme_pr_resv_report(bdev, &tmp_rse, sizeof(tmp_rse), &eds);
- if (ret)
- return ret;
-
- num_regs = get_unaligned_le16(&tmp_rse.regctl);
- if (!num_regs) {
- resv->generation = le32_to_cpu(tmp_rse.gen);
- return 0;
- }
-
- rse_len = struct_size(rse, regctl_eds, num_regs);
- rse = kzalloc(rse_len, GFP_KERNEL);
- if (!rse)
- return -ENOMEM;
-
- ret = nvme_pr_resv_report(bdev, rse, rse_len, &eds);
- if (ret)
- goto free_rse;
-
- if (num_regs != get_unaligned_le16(&rse->regctl)) {
- kfree(rse);
- goto get_num_regs;
- }
-
- resv->generation = le32_to_cpu(rse->gen);
- resv->type = block_pr_type_from_nvme(rse->rtype);
-
- for (i = 0; i < num_regs; i++) {
- if (eds) {
- if (rse->regctl_eds[i].rcsts) {
- resv->key = le64_to_cpu(rse->regctl_eds[i].rkey);
- break;
- }
- } else {
- struct nvme_reservation_status *rs;
-
- rs = (struct nvme_reservation_status *)rse;
- if (rs->regctl_ds[i].rcsts) {
- resv->key = le64_to_cpu(rs->regctl_ds[i].rkey);
- break;
- }
- }
- }
-
-free_rse:
- kfree(rse);
- return ret;
-}
#if defined(CONFIG_NVME_MULTIPATH)
static int nvme_mpath_pr_register(struct mpath_device *mpath_device,
@@ -647,6 +387,61 @@ const struct mpath_pr_ops nvme_mpath_pr_ops = {
};
#endif
+static int nvme_pr_register(struct block_device *bdev, u64 old_key,
+ u64 new_key, unsigned int flags)
+{
+ struct nvme_ns *ns = bdev->bd_disk->private_data;
+
+ return nvme_pr_register_ns(ns, old_key, new_key, flags);
+}
+
+static int nvme_pr_reserve(struct block_device *bdev, u64 key,
+ enum pr_type type, unsigned flags)
+{
+ struct nvme_ns *ns = bdev->bd_disk->private_data;
+
+ return nvme_pr_reserve_ns(ns, key, type, flags);
+}
+
+static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new,
+ enum pr_type type, bool abort)
+{
+ struct nvme_ns *ns = bdev->bd_disk->private_data;
+
+ return nvme_pr_preempt_ns(ns, old, new, type, abort);
+}
+
+static int nvme_pr_clear(struct block_device *bdev, u64 key)
+{
+ struct nvme_ns *ns = bdev->bd_disk->private_data;
+
+ return nvme_pr_clear_ns(ns, key);
+}
+
+static int nvme_pr_release(struct block_device *bdev, u64 key,
+ enum pr_type type)
+{
+ struct nvme_ns *ns = bdev->bd_disk->private_data;
+
+ return nvme_pr_release_ns(ns, key, type);
+}
+
+static int nvme_pr_read_keys(struct block_device *bdev,
+ struct pr_keys *keys_info)
+{
+ struct nvme_ns *ns = bdev->bd_disk->private_data;
+
+ return nvme_pr_read_keys_ns(ns, keys_info);
+}
+
+static int nvme_pr_read_reservation(struct block_device *bdev,
+ struct pr_held_reservation *resv)
+{
+ struct nvme_ns *ns = bdev->bd_disk->private_data;
+
+ return nvme_pr_read_reservation_ns(ns, resv);
+}
+
const struct pr_ops nvme_pr_ops = {
.pr_register = nvme_pr_register,
.pr_reserve = nvme_pr_reserve,
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 16c6fea4b2db6..95f621c0a5b05 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -64,8 +64,11 @@ static inline struct nvme_ns_head *dev_to_ns_head(struct device *dev)
{
struct gendisk *disk = dev_to_disk(dev);
- if (nvme_disk_is_ns_head(disk))
- return disk->private_data;
+ if (nvme_disk_is_ns_head(disk)) {
+ struct mpath_disk *mpath_disk = mpath_gendisk_to_disk(disk);
+
+ return mpath_disk->mpath_head->drvdata;
+ }
return nvme_get_ns_from_dev(dev)->head;
}
@@ -183,30 +186,36 @@ static ssize_t metadata_bytes_show(struct device *dev,
}
static DEVICE_ATTR_RO(metadata_bytes);
-static int ns_head_update_nuse(struct nvme_ns_head *head)
+static int ns_head_update_nuse_cb(struct mpath_device *mpath_device)
{
+ struct nvme_ns *ns = container_of(mpath_device, struct nvme_ns, mpath_device);
+ struct nvme_ns_head *head = ns->head;
struct nvme_id_ns *id;
- struct nvme_ns *ns;
- int srcu_idx, ret = -EWOULDBLOCK;
-
- /* Avoid issuing commands too often by rate limiting the update */
- if (!__ratelimit(&head->rs_nuse))
- return 0;
-
- srcu_idx = srcu_read_lock(&head->srcu);
- ns = nvme_find_path(head);
- if (!ns)
- goto out_unlock;
+ int ret;
ret = nvme_identify_ns(ns->ctrl, head->ns_id, &id);
if (ret)
- goto out_unlock;
+ return ret;
head->nuse = le64_to_cpu(id->nuse);
kfree(id);
+ return 0;
+}
+
+static int ns_head_update_nuse(struct nvme_ns_head *head)
+{
+ struct mpath_disk *mpath_disk = head->mpath_disk;
+ struct mpath_head *mpath_head = mpath_disk->mpath_head;
+ int ret;
+
+ /* Avoid issuing commands too often by rate limiting the update */
+ if (!__ratelimit(&head->rs_nuse))
+ return 0;
+
+ ret = mpath_call_for_device(mpath_head, ns_head_update_nuse_cb);
+ if (ret == -ENODEV)
+ return -EWOULDBLOCK;
-out_unlock:
- srcu_read_unlock(&head->srcu, srcu_idx);
return ret;
}
@@ -312,49 +321,10 @@ static const struct attribute_group nvme_ns_attr_group = {
.is_visible = nvme_ns_attrs_are_visible,
};
-#ifdef CONFIG_NVME_MULTIPATH
-/*
- * NOTE: The dummy attribute does not appear in sysfs. It exists solely to allow
- * control over the visibility of the multipath sysfs node. Without at least one
- * attribute defined in nvme_ns_mpath_attrs[], the sysfs implementation does not
- * invoke the multipath_sysfs_group_visible() method. As a result, we would not
- * be able to control the visibility of the multipath sysfs node.
- */
-static struct attribute dummy_attr = {
- .name = "dummy",
-};
-
-static struct attribute *nvme_ns_mpath_attrs[] = {
- &dummy_attr,
- NULL,
-};
-
-static bool multipath_sysfs_group_visible(struct kobject *kobj)
-{
- struct device *dev = container_of(kobj, struct device, kobj);
-
- return nvme_disk_is_ns_head(dev_to_disk(dev));
-}
-
-static bool multipath_sysfs_attr_visible(struct kobject *kobj,
- struct attribute *attr, int n)
-{
- return false;
-}
-
-DEFINE_SYSFS_GROUP_VISIBLE(multipath_sysfs)
-
-const struct attribute_group nvme_ns_mpath_attr_group = {
- .name = "multipath",
- .attrs = nvme_ns_mpath_attrs,
- .is_visible = SYSFS_GROUP_VISIBLE(multipath_sysfs),
-};
-#endif
-
const struct attribute_group *nvme_ns_attr_groups[] = {
&nvme_ns_attr_group,
#ifdef CONFIG_NVME_MULTIPATH
- &nvme_ns_mpath_attr_group,
+ &mpath_attr_group,
#endif
NULL,
};
--
2.43.5
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 02/19] nvme: introduce a namespace count in the ns head structure
2026-02-25 15:39 ` [PATCH 02/19] nvme: introduce a namespace count in the ns head structure John Garry
@ 2026-03-02 12:46 ` Nilay Shroff
2026-03-02 15:57 ` John Garry
0 siblings, 1 reply; 28+ messages in thread
From: Nilay Shroff @ 2026-03-02 12:46 UTC (permalink / raw)
To: John Garry, hch, kbusch, sagi, axboe, martin.petersen,
james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel
On 2/25/26 9:09 PM, John Garry wrote:
> For switching to use libmultipath, the per-namespace sibling list entry in
> nvme_ns.sibling will be replaced with multipath_device.sibling list
> pointer.
>
> For when CONFIG_LIBMULTIPATH is disabled, that list of namespaces would no
> longer be maintained.
>
> However the core code checks in many places whether there is any
> namespace in the head list, like in nvme_ns_remove().
>
> Introduce a separate count of the number of namespaces for the namespace
> head and use that count for the places where the per-namespace head list
> of namespaces is checked to be empty.
>
> Signed-off-by: John Garry<john.g.garry@oracle.com>
> ---
> drivers/nvme/host/core.c | 10 +++++++---
> drivers/nvme/host/multipath.c | 4 ++--
> drivers/nvme/host/nvme.h | 1 +
> 3 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 37e30caff4149..76249871dd7c2 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4024,7 +4024,7 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
> } else {
> ret = -EINVAL;
> if ((!info->is_shared || !head->shared) &&
> - !list_empty(&head->list)) {
> + head->ns_count) {
> dev_err(ctrl->device,
> "Duplicate unshared namespace %d\n",
> info->nsid);
> @@ -4047,6 +4047,7 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
> }
>
> list_add_tail_rcu(&ns->siblings, &head->list);
> + head->ns_count++;
> ns->head = head;
> mutex_unlock(&ctrl->subsys->lock);
>
I think we could still access head->mpath_disk->mpath_head->dev_list.
So in that case do we really need to have ->ns_count? Moreover, if
we could maintain a pointer to struct mpath_head from struct
nvme_ns_head then we may avoid one dereference. What do you think?
Thanks,
--Nilay
* Re: [PATCH 16/19] nvme-multipath: add nvme_mpath_{add,delete}_ns()
2026-02-25 15:40 ` [PATCH 16/19] nvme-multipath: add nvme_mpath_{add,delete}_ns() John Garry
@ 2026-03-02 12:48 ` Nilay Shroff
2026-03-02 15:59 ` John Garry
0 siblings, 1 reply; 28+ messages in thread
From: Nilay Shroff @ 2026-03-02 12:48 UTC (permalink / raw)
To: John Garry, hch, kbusch, sagi, axboe, martin.petersen,
james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel
On 2/25/26 9:10 PM, John Garry wrote:
> Add functions to call into the mpath_add_device() and mpath_delete_device()
> functions.
>
> The per-NS gendisk pointer is used as the mpath_device disk pointer, which
> is used in libmultipath for referencing the per-path block device.
>
> Signed-off-by: John Garry<john.g.garry@oracle.com>
> ---
> drivers/nvme/host/multipath.c | 26 ++++++++++++++++++++++++++
> drivers/nvme/host/nvme.h | 8 ++++++++
> 2 files changed, 34 insertions(+)
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 7ee0ad7bdfa26..bd96211123fee 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -982,6 +982,32 @@ void nvme_mpath_synchronize(struct nvme_ns_head *head)
> mpath_synchronize(mpath_disk->mpath_head);
> }
>
> +void nvme_mpath_add_ns(struct nvme_ns *ns)
> +{
> + struct nvme_ns_head *head = ns->head;
> + struct mpath_disk *mpath_disk = head->mpath_disk;
> + struct mpath_head *mpath_head;
> +
> + if (!mpath_disk)
> + return;
> +
> + mpath_head = mpath_disk->mpath_head;
> +
> + ns->mpath_device.disk = ns->disk;
> + mpath_add_device(mpath_head, &ns->mpath_device);
> +}
As we now have a reference to mpath_device from struct nvme_ns,
why do we still maintain a reference to ns->disk? We may
want to directly access the path device disk using ns->mpath_device.disk -
does that make sense?
Thanks,
--Nilay
* Re: [PATCH 19/19] nvme-multipath: switch to use libmultipath
2026-02-25 15:40 ` [PATCH 19/19] nvme-multipath: switch to use libmultipath John Garry
@ 2026-03-02 12:57 ` Nilay Shroff
2026-03-02 16:13 ` John Garry
0 siblings, 1 reply; 28+ messages in thread
From: Nilay Shroff @ 2026-03-02 12:57 UTC (permalink / raw)
To: John Garry, hch, kbusch, sagi, axboe, martin.petersen,
james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel
On 2/25/26 9:10 PM, John Garry wrote:
> void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
> @@ -277,30 +279,35 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
> srcu_idx = srcu_read_lock(&ctrl->srcu);
> list_for_each_entry_srcu(ns, &ctrl->namespaces, list,
> srcu_read_lock_held(&ctrl->srcu)) {
> + struct nvme_ns_head *head = ns->head;
> + struct mpath_disk *mpath_disk = head->mpath_disk;
> +
> + if (!mpath_disk)
> + continue;
> +
> nvme_mpath_clear_current_path(ns);
> - kblockd_schedule_work(&ns->head->requeue_work);
> + kblockd_schedule_work(&mpath_disk->mpath_head->requeue_work);
> }
> srcu_read_unlock(&ctrl->srcu, srcu_idx);
> }
>
> +static void nvme_mpath_revalidate_paths_cb(struct mpath_device *mpath_device,
> + sector_t capacity)
> +{
> + struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
> +
> + if (capacity != get_capacity(ns->disk))
> + clear_bit(NVME_NS_READY, &ns->flags);
> +}
> +
I don't quite understand the intent of the above function.
Here I see that we compare the mpath_disk capacity with the per-path
disk's capacity. Do we really have sectors allocated for mpath_disk?
Overall, IMO abstracting out common multipath functionality into
a separate library is a good move. But then I just want to
understand layering here with libmultipath. Does it sit above
the driver or below? I see in some places we have back and forth
callbacks from driver to libmultipath and then back to the
driver, for instance:
nvme_mpath_add_disk => driver
-> mpath_device_set_live => libmultipath
-> mpath_head_add_cdev => libmultipath
-> nvme_mpath_add_cdev => driver
Is this intentional? Or am I missing the overall picture...
Thanks,
--Nilay
* Re: [PATCH 00/19] nvme: switch to libmultipath
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
` (18 preceding siblings ...)
2026-02-25 15:40 ` [PATCH 19/19] nvme-multipath: switch to use libmultipath John Garry
@ 2026-03-02 14:12 ` Christoph Hellwig
2026-03-02 14:58 ` John Garry
19 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2026-03-02 14:12 UTC (permalink / raw)
To: John Garry
Cc: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel
On Wed, Feb 25, 2026 at 03:39:48PM +0000, John Garry wrote:
> This switches the NVMe host driver to use libmultipath. That library
> is very heavily based on the NVMe multipath code, so the change over
> should hopefully be straightforward. There is often a direct replacement
> for functions.
Given how little code this removes while adding the new libmultipath
dependency and abstractions I can't say I like this at all.
* Re: [PATCH 00/19] nvme: switch to libmultipath
2026-03-02 14:12 ` [PATCH 00/19] nvme: switch to libmultipath Christoph Hellwig
@ 2026-03-02 14:58 ` John Garry
0 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-03-02 14:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel
On 02/03/2026 14:12, Christoph Hellwig wrote:
> On Wed, Feb 25, 2026 at 03:39:48PM +0000, John Garry wrote:
>> This switches the NVMe host driver to use libmultipath. That library
>> is very heavily based on the NVMe multipath code, so the change over
>> should hopefully be straightforward. There is often a direct replacement
>> for functions.
>
> Given how little code this removes while adding the new libmultipath
> dependency and abstractions I can't say I like this at all.
>
Yeah, so we're losing about 300 lines here in the conversion. However
nvme multipath.c only goes from 1410 -> 1250 lines with this series -
that's not good enough to justify the change.
In a quick review of the code, there is more stuff which I could push
down (into the lib). And a lot of the abstraction code can be condensed - if
you check something like numa_node_show(), there are 6 variables needed
just to get to the point where the common helper could be called - that
is just silly.
Another issue is just that the remaining code is NVMe specific, like ANA
support, or just doesn't fit the SCSI model, e.g. NVMe iopolicy is per
subsystem, while for SCSI we don't have a subsystem concept, and so I
made iopolicy per multipathed SCSI disk.
* Re: [PATCH 02/19] nvme: introduce a namespace count in the ns head structure
2026-03-02 12:46 ` Nilay Shroff
@ 2026-03-02 15:57 ` John Garry
0 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-03-02 15:57 UTC (permalink / raw)
To: Nilay Shroff, hch, kbusch, sagi, axboe, martin.petersen,
james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel
On 02/03/2026 12:46, Nilay Shroff wrote:
>>
>> Signed-off-by: John Garry<john.g.garry@oracle.com>
>> ---
>> drivers/nvme/host/core.c | 10 +++++++---
>> drivers/nvme/host/multipath.c | 4 ++--
>> drivers/nvme/host/nvme.h | 1 +
>> 3 files changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 37e30caff4149..76249871dd7c2 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -4024,7 +4024,7 @@ static int nvme_init_ns_head(struct nvme_ns *ns,
>> struct nvme_ns_info *info)
>> } else {
>> ret = -EINVAL;
>> if ((!info->is_shared || !head->shared) &&
>> - !list_empty(&head->list)) {
>> + head->ns_count) {
>> dev_err(ctrl->device,
>> "Duplicate unshared namespace %d\n",
>> info->nsid);
>> @@ -4047,6 +4047,7 @@ static int nvme_init_ns_head(struct nvme_ns *ns,
>> struct nvme_ns_info *info)
>> }
>> list_add_tail_rcu(&ns->siblings, &head->list);
>> + head->ns_count++;
>> ns->head = head;
>> mutex_unlock(&ctrl->subsys->lock);
>
> I think we could still access head->mpath_disk->mpath_head->dev_list.
> So in that case do we really need to have ->ns_count?
As mentioned, if CONFIG_NVME_MULTIPATH is disabled, mpath_head->dev_list
is not maintained. So we need another method to track the NS count in
the core code.
> Moreover, if
> we could maintain a pointer to struct mpath_head from struct
> nvme_ns_head then we may avoid one dereference. What do you think?
I think that it should be ok. I was just trying to reduce pointer
declarations (as they need to be maintained).
Thanks!
* Re: [PATCH 16/19] nvme-multipath: add nvme_mpath_{add,delete}_ns()
2026-03-02 12:48 ` Nilay Shroff
@ 2026-03-02 15:59 ` John Garry
0 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-03-02 15:59 UTC (permalink / raw)
To: Nilay Shroff, hch, kbusch, sagi, axboe, martin.petersen,
james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel
On 02/03/2026 12:48, Nilay Shroff wrote:
>> +void nvme_mpath_add_ns(struct nvme_ns *ns)
>> +{
>> + struct nvme_ns_head *head = ns->head;
>> + struct mpath_disk *mpath_disk = head->mpath_disk;
>> + struct mpath_head *mpath_head;
>> +
>> + if (!mpath_disk)
>> + return;
>> +
>> + mpath_head = mpath_disk->mpath_head;
>> +
>> + ns->mpath_device.disk = ns->disk;
>> + mpath_add_device(mpath_head, &ns->mpath_device);
>> +}
>
> As we have now reference to mpath_device from struct nvme_ns
> then why do we still maintain reference to ns->disk? We may
> want to directly access path device disk using ns->mpath_device.disk,
> makes sense?
For !CONFIG_NVME_MULTIPATH or !multipath mod param set, mpath_device is
not properly maintained, so I would rather not use it in the core code.
Thanks,
John
* Re: [PATCH 19/19] nvme-multipath: switch to use libmultipath
2026-03-02 12:57 ` Nilay Shroff
@ 2026-03-02 16:13 ` John Garry
0 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2026-03-02 16:13 UTC (permalink / raw)
To: Nilay Shroff, hch, kbusch, sagi, axboe, martin.petersen,
james.bottomley, hare
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
bmarzins, dm-devel, linux-block, linux-kernel
On 02/03/2026 12:57, Nilay Shroff wrote:
> On 2/25/26 9:10 PM, John Garry wrote:
>> void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
>> @@ -277,30 +279,35 @@ void nvme_mpath_clear_ctrl_paths(struct
>> nvme_ctrl *ctrl)
>> srcu_idx = srcu_read_lock(&ctrl->srcu);
>> list_for_each_entry_srcu(ns, &ctrl->namespaces, list,
>> srcu_read_lock_held(&ctrl->srcu)) {
>> + struct nvme_ns_head *head = ns->head;
>> + struct mpath_disk *mpath_disk = head->mpath_disk;
>> +
>> + if (!mpath_disk)
>> + continue;
>> +
>> nvme_mpath_clear_current_path(ns);
>> - kblockd_schedule_work(&ns->head->requeue_work);
>> + kblockd_schedule_work(&mpath_disk->mpath_head->requeue_work);
>> }
>> srcu_read_unlock(&ctrl->srcu, srcu_idx);
>> }
>> +static void nvme_mpath_revalidate_paths_cb(struct mpath_device
>> *mpath_device,
>> + sector_t capacity)
>> +{
>> + struct nvme_ns *ns = nvme_mpath_to_ns(mpath_device);
>> +
>> + if (capacity != get_capacity(ns->disk))
>> + clear_bit(NVME_NS_READY, &ns->flags);
>> +}
>> +
>
> I don't quite understand the intent of the above function.
This specifically is a callback for when the NVMe driver calls into
libmultipath. It could also be supplied in the function template.
If you check mainline nvme_mpath_revalidate_paths(), it iterates the
paths, and for each path checks capacity and unsets NVME_NS_READY if set.
Now libmultipath manages the paths, so we provide an API for the driver
to call into to iterate the paths to support
nvme_mpath_revalidate_paths(). Since we are doing something
nvme-specific in nvme_mpath_revalidate_paths(), we need the driver to
provide a CB to do this driver-specific functionality.
> Here I see that we compare mpath_disk capacity with per-path
> disk. Do we really have sectors allocated for mpath_disk?
mpath_disk manages the multipath gendisk - it is no longer in
nvme_ns_head.disk
>
> Overall, IMO abstracting out common multipath function into
> a separate library is a good move. But then I just want to
> understand layering here with libmultipath. Does it sit above
> the driver or below?
I would say neither - or it sits above the driver, if anything. It is
just a library for managing multipathed devices.
> I see in some places we have back and forth
> callbacks from driver to libmultipath and then back to the
> driver, for instance:
> nvme_mpath_add_disk => driver
> -> mpath_device_set_live => libmultipath
> -> mpath_head_add_cdev => libmultipath
> -> nvme_mpath_add_cdev => driver
>
> Does this intentional? Or am I missing overall picture...
Something like nvme_mpath_add_cdev() is a callback supplied for
libmultipath to do some driver-specific action. That said, I think that
it can be pushed into libmultipath. The only NVMe-specific thing it
does is the device naming - so a method for the nvme driver to supply
that is required.
Thanks for checking all this.
end of thread, other threads:[~2026-03-02 16:14 UTC | newest]
Thread overview: 28+ messages
2026-02-25 15:39 [PATCH 00/19] nvme: switch to libmultipath John Garry
2026-02-25 15:39 ` [PATCH 01/19] nvme-multipath: pass NS head to nvme_mpath_revalidate_paths() John Garry
2026-02-25 15:39 ` [PATCH 02/19] nvme: introduce a namespace count in the ns head structure John Garry
2026-03-02 12:46 ` Nilay Shroff
2026-03-02 15:57 ` John Garry
2026-02-25 15:39 ` [PATCH 03/19] nvme-multipath: add nvme_is_mpath_request() John Garry
2026-02-25 15:39 ` [PATCH 04/19] nvme-multipath: add initial support for using libmultipath John Garry
2026-02-25 15:39 ` [PATCH 05/19] nvme-multipath: add nvme_mpath_available_path() John Garry
2026-02-25 15:39 ` [PATCH 06/19] nvme-multipath: add nvme_mpath_{add, remove}_cdev() John Garry
2026-02-25 15:39 ` [PATCH 07/19] nvme-multipath: add nvme_mpath_is_{disabled, optimised} John Garry
2026-02-25 15:39 ` [PATCH 08/19] nvme-multipath: add nvme_mpath_get_access_state() John Garry
2026-02-25 15:39 ` [PATCH 09/19] nvme-multipath: add nvme_mpath_{bdev, cdev}_ioctl() John Garry
2026-02-25 15:39 ` [PATCH 10/19] nvme-multipath: add uring_cmd support John Garry
2026-02-25 15:39 ` [PATCH 11/19] nvme-multipath: add nvme_mpath_get_iopolicy() John Garry
2026-02-25 15:40 ` [PATCH 12/19] nvme-multipath: add PR support for libmultipath John Garry
2026-02-25 15:40 ` [PATCH 13/19] nvme-multipath: add nvme_mpath_report_zones() John Garry
2026-02-25 15:40 ` [PATCH 14/19] nvme-multipath: add nvme_mpath_get_unique_id() John Garry
2026-02-25 15:40 ` [PATCH 15/19] nvme-multipath: add nvme_mpath_synchronize() John Garry
2026-02-25 15:40 ` [PATCH 16/19] nvme-multipath: add nvme_mpath_{add,delete}_ns() John Garry
2026-03-02 12:48 ` Nilay Shroff
2026-03-02 15:59 ` John Garry
2026-02-25 15:40 ` [PATCH 17/19] nvme-multipath: add nvme_mpath_head_queue_if_no_path() John Garry
2026-02-25 15:40 ` [PATCH 18/19] nvme-multipath: set mpath_head_template.device_groups John Garry
2026-02-25 15:40 ` [PATCH 19/19] nvme-multipath: switch to use libmultipath John Garry
2026-03-02 12:57 ` Nilay Shroff
2026-03-02 16:13 ` John Garry
2026-03-02 14:12 ` [PATCH 00/19] nvme: switch to libmultipath Christoph Hellwig
2026-03-02 14:58 ` John Garry