* [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers
@ 2026-04-28 11:10 John Garry
2026-04-28 11:10 ` [PATCH v2 01/13] libmultipath: Add initial framework John Garry
` (13 more replies)
0 siblings, 14 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:10 UTC
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
libmultipath: a generic multipath lib for block drivers
This series introduces libmultipath. It is essentially a refactoring of
NVMe multipath support, so that a common library can also support
native SCSI multipath.
Much of the code is taken directly from the NVMe multipath code. However,
NVMe specifics are removed. A template structure is provided so the driver
may provide callbacks for driver specifics, like ANA support for NVMe.
Important new structures introduced include:
- mpath_head
This contains much of the multipath-specific functionality from
nvme_ns_head, including a pointer to the gendisk structure and
an SRCU-protected array of current paths.
- mpath_device
This is the per-path structure, and contains much of the same
multipath-specific functionality as nvme_ns.
libmultipath provides functionality for path management, path selection,
data path, and failover handling.
Since the NVMe driver has code in its sysfs and ioctl handling that
iterates over all multipath namespaces, functions like
mpath_call_for_device() are added to do the same per-path iteration.
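As a rough illustration of the template approach described above, here is a
minimal userspace sketch (not kernel code, and not taken from the series).
Only the callback names .is_disabled and .is_optimized come from the patches;
the structure layouts, the demo_* names and count_usable() are assumptions
made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; only the callback names mirror the series. */
struct mpath_device {
	int driver_state;	/* hypothetical per-path state owned by the driver */
};

struct mpath_head_template {
	bool (*is_disabled)(struct mpath_device *);
	bool (*is_optimized)(struct mpath_device *);
};

/* Hypothetical driver-side states, loosely modelled on ANA/ALUA. */
enum demo_state { DEMO_OPTIMIZED, DEMO_NONOPTIMIZED, DEMO_INACCESSIBLE };

static bool demo_is_disabled(struct mpath_device *dev)
{
	return dev->driver_state == DEMO_INACCESSIBLE;
}

static bool demo_is_optimized(struct mpath_device *dev)
{
	return dev->driver_state == DEMO_OPTIMIZED;
}

/* The driver fills in the template once and hands it to the library. */
static const struct mpath_head_template demo_template = {
	.is_disabled = demo_is_disabled,
	.is_optimized = demo_is_optimized,
};

/* Library-side code only ever reaches driver state through the template. */
static size_t count_usable(const struct mpath_head_template *mpdt,
			   struct mpath_device *devs, size_t n)
{
	size_t i, usable = 0;

	for (i = 0; i < n; i++)
		if (!mpdt->is_disabled(&devs[i]))
			usable++;
	return usable;
}
```

The point of the indirection is that the generic code never needs to know
whether the underlying state is ANA, ALUA, or something else.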
Full series also available at
https://github.com/johnpgarry/linux/commits/scsi-multipath-pre-7.1-upstream-v2/
Differences to v1:
- put current_path[] at end of struct mpath_head (Nilay)
- drop struct mpath_disk and keep nvme_remove_head() (Nilay)
- don't pass iopolicy from mpath_find_path() (Benjamin)
- change mpath_access_state names (Nilay)
- fix for setting mpath_device.nr_active and .numa_node (Nilay)
- fix uninit'ed pointers in __mpath_find_path() (Nilay)
- simplify mpath_head_template.available_path (Nilay, Benjamin)
- use DEFINE_SIMPLE_SYSFS_GROUP_VISIBLE (Benjamin)
- check mpath_bdev_submit_bio() -> .clone_bio() for errors (Benjamin)
- drop struct mpath_pr_ops (Keith)
- drop mpath_head_template.bdev_ioctl
- drop mpath_head_template.get_unique_id
- drop mpath_head_template.report_zones
- drop mpath_head_template.get_access_state
- add mpath_head_template.ioctl_{begin, finish} and drop
mpath_head_read_unlock()
- add mpath_device.access_state
- add mpath_head_devices_empty()
- make mpath_delete_device() return a bool
John Garry (13):
libmultipath: Add initial framework
libmultipath: Add basic gendisk support
libmultipath: Add path selection support
libmultipath: Add bio handling
libmultipath: Add support for mpath_device management
libmultipath: Add cdev support
libmultipath: Add delayed removal support
libmultipath: Add sysfs helpers
libmultipath: Add PR support
libmultipath: Add mpath_bdev_report_zones()
libmultipath: Add support for block device IOCTL
libmultipath: Add mpath_bdev_getgeo()
libmultipath: Add mpath_bdev_get_unique_id()
include/linux/multipath.h | 181 ++++++
lib/Kconfig | 6 +
lib/Makefile | 2 +
lib/multipath.c | 1293 +++++++++++++++++++++++++++++++++++++
4 files changed, 1482 insertions(+)
create mode 100644 include/linux/multipath.h
create mode 100644 lib/multipath.c
--
2.43.5
* [PATCH v2 01/13] libmultipath: Add initial framework
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
@ 2026-04-28 11:10 ` John Garry
2026-04-28 11:10 ` [PATCH v2 02/13] libmultipath: Add basic gendisk support John Garry
` (12 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:10 UTC
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
Add initial framework for libmultipath. libmultipath is a library for
multipath-capable block drivers, such as NVMe. The main function is to
support path management, path selection, and failover handling.
Basic support to add and remove the head structure - mpath_head - is
included.
The main purpose of this structure is to manage available paths and path
selection. It is quite similar to the multipath functionality in
nvme_ns_head. It also manages the multipath gendisk.
Each path is represented by the mpath_device structure. It should hold a
pointer to the per-path gendisk and also a list element linking it to its
sibling paths. For NVMe, there would be one mpath_device per nvme_ns.
All the libmultipath code is more or less taken from
drivers/nvme/host/multipath.c, which was originally authored by Christoph
Hellwig <hch@lst.de>.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
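For illustration only (not part of the patch): a minimal userspace sketch of
the reference-counting contract that mpath_get_head() and mpath_put_head()
follow. The demo_* names and the plain int counter are assumptions for this
example; the real code uses struct kref, which is atomic.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_head {
	int ref;	/* stand-in for struct kref; NOT atomic, single-threaded demo */
	int *freed;	/* set to 1 by the release path so a caller can observe it */
};

static struct demo_head *demo_alloc_head(int *freed)
{
	struct demo_head *head = calloc(1, sizeof(*head));

	if (!head)
		return NULL;
	head->ref = 1;		/* like kref_init(): the allocator holds one reference */
	head->freed = freed;
	return head;
}

/* Like mpath_get_head(): refuse to revive an object whose count hit zero. */
static int demo_get_head(struct demo_head *head)
{
	if (head->ref == 0)
		return -ENXIO;
	head->ref++;
	return 0;
}

/* Like mpath_put_head(): the final put runs the release function. */
static void demo_put_head(struct demo_head *head)
{
	if (--head->ref == 0) {
		*head->freed = 1;
		free(head);
	}
}
```

The get-unless-zero behaviour is what lets an open of the block device fail
cleanly with -ENXIO when it races with teardown of the head.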
include/linux/multipath.h | 28 +++++++++++++++
lib/Kconfig | 6 ++++
lib/Makefile | 2 ++
lib/multipath.c | 74 +++++++++++++++++++++++++++++++++++++++
4 files changed, 110 insertions(+)
create mode 100644 include/linux/multipath.h
create mode 100644 lib/multipath.c
diff --git a/include/linux/multipath.h b/include/linux/multipath.h
new file mode 100644
index 0000000000000..102dfcf21ffc9
--- /dev/null
+++ b/include/linux/multipath.h
@@ -0,0 +1,28 @@
+
+#ifndef _LIBMULTIPATH_H
+#define _LIBMULTIPATH_H
+
+#include <linux/blkdev.h>
+#include <linux/srcu.h>
+
+struct mpath_device {
+ struct list_head siblings;
+ struct gendisk *disk;
+};
+
+struct mpath_head {
+ struct srcu_struct srcu;
+ struct list_head dev_list; /* list of all mpath_devs */
+ struct mutex lock;
+
+ struct kref ref;
+
+ void *drvdata;
+ struct mpath_device __rcu *current_path[];
+};
+
+int mpath_get_head(struct mpath_head *mpath_head);
+void mpath_put_head(struct mpath_head *mpath_head);
+struct mpath_head *mpath_alloc_head(void);
+
+#endif // _LIBMULTIPATH_H
diff --git a/lib/Kconfig b/lib/Kconfig
index 0f2fb96106476..5c70e6df89740 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -636,3 +636,9 @@ config UNION_FIND
config MIN_HEAP
bool
+
+config LIBMULTIPATH
+ bool "MULTIPATH BLOCK DRIVER LIBRARY"
+ depends on BLOCK
+ help
+ If you say yes here then you get a multipath lib for block drivers
diff --git a/lib/Makefile b/lib/Makefile
index 1b9ee167517f3..98948b18af7f7 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -335,3 +335,5 @@ CONTEXT_ANALYSIS_test_context-analysis.o := y
obj-$(CONFIG_CONTEXT_ANALYSIS_TEST) += test_context-analysis.o
subdir-$(CONFIG_FORTIFY_SOURCE) += test_fortify
+
+obj-$(CONFIG_LIBMULTIPATH) += multipath.o
diff --git a/lib/multipath.c b/lib/multipath.c
new file mode 100644
index 0000000000000..0d5690b5ba4ca
--- /dev/null
+++ b/lib/multipath.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2017-2018 Christoph Hellwig.
+ * Copyright (c) 2026 Oracle and/or its affiliates.
+ */
+#include <linux/module.h>
+#include <linux/multipath.h>
+
+static struct workqueue_struct *mpath_wq;
+
+static void mpath_free_head(struct kref *ref)
+{
+ struct mpath_head *mpath_head =
+ container_of(ref, struct mpath_head, ref);
+
+ cleanup_srcu_struct(&mpath_head->srcu);
+ kfree(mpath_head);
+}
+
+int mpath_get_head(struct mpath_head *mpath_head)
+{
+ if (!kref_get_unless_zero(&mpath_head->ref))
+ return -ENXIO;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(mpath_get_head);
+
+void mpath_put_head(struct mpath_head *mpath_head)
+{
+ kref_put(&mpath_head->ref, mpath_free_head);
+}
+EXPORT_SYMBOL_GPL(mpath_put_head);
+
+struct mpath_head *mpath_alloc_head(void)
+{
+ struct mpath_head *mpath_head;
+ int ret;
+
+ mpath_head = kzalloc(struct_size(mpath_head, current_path,
+ num_possible_nodes()), GFP_KERNEL);
+ if (!mpath_head)
+ return ERR_PTR(-ENOMEM);
+ INIT_LIST_HEAD(&mpath_head->dev_list);
+ mutex_init(&mpath_head->lock);
+ kref_init(&mpath_head->ref);
+
+ ret = init_srcu_struct(&mpath_head->srcu);
+ if (ret) {
+ kfree(mpath_head);
+ return ERR_PTR(ret);
+ }
+
+ return mpath_head;
+}
+EXPORT_SYMBOL_GPL(mpath_alloc_head);
+
+static int __init mpath_init(void)
+{
+ mpath_wq = alloc_workqueue("mpath-wq",
+ WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
+ if (!mpath_wq)
+ return -ENOMEM;
+ return 0;
+}
+
+static void __exit mpath_exit(void)
+{
+ destroy_workqueue(mpath_wq);
+}
+
+module_init(mpath_init);
+module_exit(mpath_exit);
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("libmultipath");
--
2.43.5
* [PATCH v2 02/13] libmultipath: Add basic gendisk support
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
2026-04-28 11:10 ` [PATCH v2 01/13] libmultipath: Add initial framework John Garry
@ 2026-04-28 11:10 ` John Garry
2026-04-28 11:10 ` [PATCH v2 03/13] libmultipath: Add path selection support John Garry
` (11 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:10 UTC
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
Add support to allocate and free a multipath gendisk.
NVMe has almost like-for-like equivalents here:
- mpath_alloc_head_disk() -> nvme_mpath_alloc_disk()
- multipath_partition_scan_work() -> nvme_partition_scan_work()
- mpath_remove_disk() -> nvme_remove_head()
- mpath_device_set_live() -> nvme_mpath_set_live()
struct mpath_head_template is introduced as a method for drivers to
provide custom multipath functionality.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
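For illustration only (not part of the patch): a userspace sketch of the
"register once, roll back on failure" shape used by mpath_device_set_live().
Everything named demo_* is hypothetical; demo_test_and_set() is a non-atomic
stand-in for test_and_set_bit() and demo_add_disk() stands in for
device_add_disk().

```c
#include <stdbool.h>

struct demo_head {
	bool live;		/* stand-in for the MPATH_HEAD_DISK_LIVE flag bit */
	int add_calls;		/* how many times registration was attempted */
	int add_result;		/* result the fake registration should return */
};

/* Non-atomic stand-in for test_and_set_bit(): returns the old value. */
static bool demo_test_and_set(bool *flag)
{
	bool old = *flag;

	*flag = true;
	return old;
}

/* Hypothetical stand-in for device_add_disk(). */
static int demo_add_disk(struct demo_head *head)
{
	head->add_calls++;
	return head->add_result;
}

/* Register the disk only on the first call; undo the flag on error. */
static void demo_set_live(struct demo_head *head)
{
	if (!demo_test_and_set(&head->live)) {
		if (demo_add_disk(head)) {
			head->live = false;	/* lets a later path retry */
			return;
		}
	}
}
```

With this shape, the first path to go live registers the disk and later
paths see the flag already set and skip registration.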
include/linux/multipath.h | 38 ++++++++++++++++
lib/multipath.c | 96 +++++++++++++++++++++++++++++++++++++++
2 files changed, 134 insertions(+)
diff --git a/include/linux/multipath.h b/include/linux/multipath.h
index 102dfcf21ffc9..3e2a513059cde 100644
--- a/include/linux/multipath.h
+++ b/include/linux/multipath.h
@@ -5,11 +5,19 @@
#include <linux/blkdev.h>
#include <linux/srcu.h>
+extern const struct block_device_operations mpath_ops;
+
struct mpath_device {
+ struct mpath_head *mpath_head;
struct list_head siblings;
struct gendisk *disk;
};
+struct mpath_head_template {
+};
+
+#define MPATH_HEAD_DISK_LIVE 0
+
struct mpath_head {
struct srcu_struct srcu;
struct list_head dev_list; /* list of all mpath_devs */
@@ -18,11 +26,41 @@ struct mpath_head {
struct kref ref;
void *drvdata;
+ unsigned long flags;
+ struct gendisk *disk;
+ struct work_struct partition_scan_work;
+ struct device *parent;
+ const struct attribute_group **disk_groups;
+ const struct mpath_head_template *mpdt;
struct mpath_device __rcu *current_path[];
};
+static inline struct mpath_head *mpath_bd_device_to_head(struct device *dev)
+{
+ return dev_get_drvdata(dev);
+}
+
+static inline struct mpath_head *mpath_gendisk_to_head(struct gendisk *disk)
+{
+ return mpath_bd_device_to_head(disk_to_dev(disk));
+}
+
int mpath_get_head(struct mpath_head *mpath_head);
void mpath_put_head(struct mpath_head *mpath_head);
struct mpath_head *mpath_alloc_head(void);
+void mpath_put_disk(struct mpath_head *mpath_head);
+void mpath_remove_disk(struct mpath_head *mpath_head);
+int mpath_alloc_head_disk(struct mpath_head *mpath_head,
+ struct queue_limits *lim, int numa_node);
+void mpath_device_set_live(struct mpath_device *mpath_device);
+
+static inline bool is_mpath_disk(struct gendisk *disk)
+{
+ #if IS_ENABLED(CONFIG_LIBMULTIPATH)
+ return disk->fops == &mpath_ops;
+ #else
+ return false;
+ #endif
+}
#endif // _LIBMULTIPATH_H
diff --git a/lib/multipath.c b/lib/multipath.c
index 0d5690b5ba4ca..726d9bec13553 100644
--- a/lib/multipath.c
+++ b/lib/multipath.c
@@ -31,6 +31,99 @@ void mpath_put_head(struct mpath_head *mpath_head)
}
EXPORT_SYMBOL_GPL(mpath_put_head);
+static int mpath_bdev_open(struct gendisk *disk, blk_mode_t mode)
+{
+ struct mpath_head *mpath_head = disk->private_data;
+
+ return mpath_get_head(mpath_head);
+}
+
+static void mpath_bdev_release(struct gendisk *disk)
+{
+ struct mpath_head *mpath_head = disk->private_data;
+
+ mpath_put_head(mpath_head);
+}
+
+const struct block_device_operations mpath_ops = {
+ .owner = THIS_MODULE,
+ .open = mpath_bdev_open,
+ .release = mpath_bdev_release,
+};
+EXPORT_SYMBOL_GPL(mpath_ops);
+
+static void multipath_partition_scan_work(struct work_struct *work)
+{
+ struct mpath_head *mpath_head =
+ container_of(work, struct mpath_head, partition_scan_work);
+
+ if (WARN_ON_ONCE(!test_and_clear_bit(GD_SUPPRESS_PART_SCAN,
+ &mpath_head->disk->state)))
+ return;
+
+ mutex_lock(&mpath_head->disk->open_mutex);
+ bdev_disk_changed(mpath_head->disk, false);
+ mutex_unlock(&mpath_head->disk->open_mutex);
+}
+
+void mpath_remove_disk(struct mpath_head *mpath_head)
+{
+ if (test_and_clear_bit(MPATH_HEAD_DISK_LIVE, &mpath_head->flags)) {
+ struct gendisk *disk = mpath_head->disk;
+
+ del_gendisk(disk);
+ }
+}
+EXPORT_SYMBOL_GPL(mpath_remove_disk);
+
+void mpath_put_disk(struct mpath_head *mpath_head)
+{
+ if (!mpath_head->disk)
+ return;
+
+ /* make sure all pending bios are cleaned up */
+ flush_work(&mpath_head->partition_scan_work);
+ put_disk(mpath_head->disk);
+}
+EXPORT_SYMBOL_GPL(mpath_put_disk);
+
+int mpath_alloc_head_disk(struct mpath_head *mpath_head,
+ struct queue_limits *lim, int numa_node)
+{
+ mpath_head->disk = blk_alloc_disk(lim, numa_node);
+ if (IS_ERR(mpath_head->disk))
+ return PTR_ERR(mpath_head->disk);
+
+ mpath_head->disk->private_data = mpath_head;
+ mpath_head->disk->fops = &mpath_ops;
+
+ set_bit(GD_SUPPRESS_PART_SCAN, &mpath_head->disk->state);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(mpath_alloc_head_disk);
+
+void mpath_device_set_live(struct mpath_device *mpath_device)
+{
+ struct mpath_head *mpath_head = mpath_device->mpath_head;
+ int ret;
+
+ if (!mpath_head->disk)
+ return;
+
+ if (!test_and_set_bit(MPATH_HEAD_DISK_LIVE, &mpath_head->flags)) {
+ dev_set_drvdata(disk_to_dev(mpath_head->disk), mpath_head);
+ ret = device_add_disk(mpath_head->parent, mpath_head->disk,
+ mpath_head->disk_groups);
+ if (ret) {
+ clear_bit(MPATH_HEAD_DISK_LIVE, &mpath_head->flags);
+ return;
+ }
+ queue_work(mpath_wq, &mpath_head->partition_scan_work);
+ }
+}
+EXPORT_SYMBOL_GPL(mpath_device_set_live);
+
struct mpath_head *mpath_alloc_head(void)
{
struct mpath_head *mpath_head;
@@ -44,6 +137,9 @@ struct mpath_head *mpath_alloc_head(void)
mutex_init(&mpath_head->lock);
kref_init(&mpath_head->ref);
+ INIT_WORK(&mpath_head->partition_scan_work,
+ multipath_partition_scan_work);
+
ret = init_srcu_struct(&mpath_head->srcu);
if (ret) {
kfree(mpath_head);
--
2.43.5
* [PATCH v2 03/13] libmultipath: Add path selection support
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
2026-04-28 11:10 ` [PATCH v2 01/13] libmultipath: Add initial framework John Garry
2026-04-28 11:10 ` [PATCH v2 02/13] libmultipath: Add basic gendisk support John Garry
@ 2026-04-28 11:10 ` John Garry
2026-04-28 11:10 ` [PATCH v2 04/13] libmultipath: Add bio handling John Garry
` (10 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:10 UTC
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
Add code for path selection.
NVMe ANA is abstracted into enum mpath_access_state. The motivation here is
so that SCSI ALUA can be used. Callbacks .is_disabled and .is_optimized are
added to query the path state; the access state itself is held in
mpath_device.access_state.
The round-robin, NUMA, and queue-depth path selection modes are added, the
same as NVMe supports.
NVMe has almost like-for-like equivalents here:
- __mpath_find_path() -> __nvme_find_path()
- mpath_find_path() -> nvme_find_path()
and similar for all introduced callee functions.
Functions mpath_set_iopolicy() and mpath_get_iopolicy() are added for
setting and reading the default iopolicy.
A separate mpath_iopolicy structure is introduced. There is no iopolicy
member included in the mpath_head structure as it may not suit NVMe, where
iopolicy is per-subsystem and not per namespace.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
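For illustration only (not part of the patch): a userspace sketch of the
queue-depth selection rule, i.e. prefer the least-loaded optimized path and
fall back to the least-loaded non-optimized one. The demo_* types and the
array-based iteration are assumptions for this example; the real code walks
an SRCU-protected list and asks the driver for the depth via .get_nr_active.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_access_state { DEMO_OPTIMIZED, DEMO_NONOPTIMIZED, DEMO_OTHER };

struct demo_path {
	enum demo_access_state state;
	bool disabled;
	unsigned int depth;	/* in-flight requests on this path */
};

/* Returns the index of the chosen path, or -1 if none is usable. */
static int demo_queue_depth_path(const struct demo_path *paths, size_t n)
{
	unsigned int min_opt = UINT_MAX, min_nonopt = UINT_MAX;
	int best_opt = -1, best_nonopt = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (paths[i].disabled)
			continue;

		switch (paths[i].state) {
		case DEMO_OPTIMIZED:
			if (paths[i].depth < min_opt) {
				min_opt = paths[i].depth;
				best_opt = (int)i;
			}
			break;
		case DEMO_NONOPTIMIZED:
			if (paths[i].depth < min_nonopt) {
				min_nonopt = paths[i].depth;
				best_nonopt = (int)i;
			}
			break;
		default:
			break;	/* any other state is never selected */
		}

		/* An idle optimized path cannot be beaten; stop early. */
		if (min_opt == 0)
			return best_opt;
	}

	return best_opt >= 0 ? best_opt : best_nonopt;
}
```

The NUMA policy in __mpath_find_path() has the same two-tier structure, with
node distance in place of queue depth as the metric being minimised.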
include/linux/multipath.h | 37 ++++++
lib/multipath.c | 248 ++++++++++++++++++++++++++++++++++++++
2 files changed, 285 insertions(+)
diff --git a/include/linux/multipath.h b/include/linux/multipath.h
index 3e2a513059cde..13d810148a96a 100644
--- a/include/linux/multipath.h
+++ b/include/linux/multipath.h
@@ -7,13 +7,36 @@
extern const struct block_device_operations mpath_ops;
+enum mpath_iopolicy_e {
+ MPATH_IOPOLICY_NUMA,
+ MPATH_IOPOLICY_RR,
+ MPATH_IOPOLICY_QD,
+};
+
+struct mpath_iopolicy {
+ enum mpath_iopolicy_e iopolicy;
+};
+
+enum mpath_access_state {
+ MPATH_STATE_OPTIMIZED,
+ MPATH_STATE_NONOPTIMIZED,
+ MPATH_STATE_OTHER
+};
+
struct mpath_device {
struct mpath_head *mpath_head;
struct list_head siblings;
struct gendisk *disk;
+ int numa_node;
+ enum mpath_access_state access_state;
};
struct mpath_head_template {
+ bool (*is_disabled)(struct mpath_device *);
+ bool (*is_optimized)(struct mpath_device *);
+ int (*get_nr_active)(struct mpath_device *);
+ enum mpath_iopolicy_e (*get_iopolicy)(struct mpath_head *);
+ const struct attribute_group **device_groups;
};
#define MPATH_HEAD_DISK_LIVE 0
@@ -45,6 +68,14 @@ static inline struct mpath_head *mpath_gendisk_to_head(struct gendisk *disk)
return mpath_bd_device_to_head(disk_to_dev(disk));
}
+static inline enum mpath_iopolicy_e mpath_read_iopolicy(
+ struct mpath_iopolicy *mpath_iopolicy)
+{
+ return READ_ONCE(mpath_iopolicy->iopolicy);
+}
+void mpath_synchronize(struct mpath_head *mpath_head);
+int mpath_set_iopolicy(const char *val, int *iopolicy);
+int mpath_get_iopolicy(char *buf, int iopolicy);
int mpath_get_head(struct mpath_head *mpath_head);
void mpath_put_head(struct mpath_head *mpath_head);
struct mpath_head *mpath_alloc_head(void);
@@ -63,4 +94,10 @@ static inline bool is_mpath_disk(struct gendisk *disk)
return false;
#endif
}
+
+static inline bool mpath_qd_iopolicy(struct mpath_iopolicy *mpath_iopolicy)
+{
+ return mpath_read_iopolicy(mpath_iopolicy) == MPATH_IOPOLICY_QD;
+}
+
#endif // _LIBMULTIPATH_H
diff --git a/lib/multipath.c b/lib/multipath.c
index 726d9bec13553..fa211420b72c3 100644
--- a/lib/multipath.c
+++ b/lib/multipath.c
@@ -6,8 +6,242 @@
#include <linux/module.h>
#include <linux/multipath.h>
+static struct mpath_device *mpath_find_path(struct mpath_head *mpath_head);
+
static struct workqueue_struct *mpath_wq;
+static const char *mpath_iopolicy_names[] = {
+ [MPATH_IOPOLICY_NUMA] = "numa",
+ [MPATH_IOPOLICY_RR] = "round-robin",
+ [MPATH_IOPOLICY_QD] = "queue-depth",
+};
+
+int mpath_set_iopolicy(const char *val, int *iopolicy)
+{
+ if (!val)
+ return -EINVAL;
+ if (!strncmp(val, "numa", 4))
+ *iopolicy = MPATH_IOPOLICY_NUMA;
+ else if (!strncmp(val, "round-robin", 11))
+ *iopolicy = MPATH_IOPOLICY_RR;
+ else if (!strncmp(val, "queue-depth", 11))
+ *iopolicy = MPATH_IOPOLICY_QD;
+ else
+ return -EINVAL;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(mpath_set_iopolicy);
+
+int mpath_get_iopolicy(char *buf, int iopolicy)
+{
+ return sprintf(buf, "%s\n", mpath_iopolicy_names[iopolicy]);
+}
+EXPORT_SYMBOL_GPL(mpath_get_iopolicy);
+
+
+void mpath_synchronize(struct mpath_head *mpath_head)
+{
+ synchronize_srcu(&mpath_head->srcu);
+}
+EXPORT_SYMBOL_GPL(mpath_synchronize);
+
+static bool mpath_path_is_disabled(struct mpath_head *mpath_head,
+ struct mpath_device *mpath_device)
+{
+ return mpath_head->mpdt->is_disabled(mpath_device);
+}
+
+static struct mpath_device *__mpath_find_path(struct mpath_head *mpath_head,
+ int node)
+{
+ int found_distance = INT_MAX, fallback_distance = INT_MAX, distance;
+ struct mpath_device *found = NULL, *fallback = NULL, *mpath_device;
+
+ list_for_each_entry_srcu(mpath_device, &mpath_head->dev_list, siblings,
+ srcu_read_lock_held(&mpath_head->srcu)) {
+ if (mpath_path_is_disabled(mpath_head, mpath_device))
+ continue;
+
+ if (mpath_device->numa_node != NUMA_NO_NODE &&
+ (mpath_head->mpdt->get_iopolicy(mpath_head) ==
+ MPATH_IOPOLICY_NUMA))
+ distance = node_distance(node,
+ mpath_device->numa_node);
+ else
+ distance = LOCAL_DISTANCE;
+
+ switch (mpath_device->access_state) {
+ case MPATH_STATE_OPTIMIZED:
+ if (distance < found_distance) {
+ found_distance = distance;
+ found = mpath_device;
+ }
+ break;
+ case MPATH_STATE_NONOPTIMIZED:
+ if (distance < fallback_distance) {
+ fallback_distance = distance;
+ fallback = mpath_device;
+ }
+ break;
+ default:
+ break;
+ }
+ }
+
+ if (!found)
+ found = fallback;
+
+ if (found)
+ rcu_assign_pointer(mpath_head->current_path[node], found);
+
+ return found;
+}
+
+static struct mpath_device *mpath_next_dev(struct mpath_head *mpath_head,
+ struct mpath_device *mpath_dev)
+{
+ mpath_dev = list_next_or_null_rcu(&mpath_head->dev_list,
+ &mpath_dev->siblings, struct mpath_device,
+ siblings);
+
+ if (mpath_dev)
+ return mpath_dev;
+ return list_first_or_null_rcu(&mpath_head->dev_list,
+ struct mpath_device, siblings);
+}
+
+static struct mpath_device *mpath_round_robin_path(
+ struct mpath_head *mpath_head)
+{
+ struct mpath_device *mpath_device, *found = NULL;
+ int node = numa_node_id();
+ enum mpath_access_state access_state_old;
+ struct mpath_device *old =
+ srcu_dereference(mpath_head->current_path[node],
+ &mpath_head->srcu);
+
+ if (unlikely(!old))
+ return __mpath_find_path(mpath_head, node);
+
+ if (list_is_singular(&mpath_head->dev_list)) {
+ if (mpath_path_is_disabled(mpath_head, old))
+ return NULL;
+ return old;
+ }
+
+ for (mpath_device = mpath_next_dev(mpath_head, old);
+ mpath_device && mpath_device != old;
+ mpath_device = mpath_next_dev(mpath_head, mpath_device)) {
+
+ if (mpath_path_is_disabled(mpath_head, mpath_device))
+ continue;
+ if (mpath_device->access_state == MPATH_STATE_OPTIMIZED) {
+ found = mpath_device;
+ goto out;
+ }
+ if (mpath_device->access_state == MPATH_STATE_NONOPTIMIZED)
+ found = mpath_device;
+ }
+
+ /*
+ * The loop above skips the current path for round-robin semantics.
+ * Fall back to the current path if either:
+ * - no other optimized path found and current is optimized,
+ * - no other usable path found and current is usable.
+ */
+ access_state_old = old->access_state;
+ if (!mpath_path_is_disabled(mpath_head, old) &&
+ (access_state_old == MPATH_STATE_OPTIMIZED ||
+ (!found && access_state_old == MPATH_STATE_NONOPTIMIZED)))
+ return old;
+
+ if (!found)
+ return NULL;
+out:
+ rcu_assign_pointer(mpath_head->current_path[node], found);
+
+ return found;
+}
+
+static struct mpath_device *mpath_queue_depth_path(
+ struct mpath_head *mpath_head)
+{
+ struct mpath_device *best_opt = NULL, *mpath_device;
+ struct mpath_device *best_nonopt = NULL;
+ unsigned int min_depth_opt = UINT_MAX, min_depth_nonopt = UINT_MAX;
+ unsigned int depth;
+ int (*get_nr_active)(struct mpath_device *) =
+ mpath_head->mpdt->get_nr_active;
+
+ list_for_each_entry_srcu(mpath_device, &mpath_head->dev_list, siblings,
+ srcu_read_lock_held(&mpath_head->srcu)) {
+
+ if (mpath_path_is_disabled(mpath_head, mpath_device))
+ continue;
+
+ depth = get_nr_active(mpath_device);
+
+ switch (mpath_device->access_state) {
+ case MPATH_STATE_OPTIMIZED:
+ if (depth < min_depth_opt) {
+ min_depth_opt = depth;
+ best_opt = mpath_device;
+ }
+ break;
+ case MPATH_STATE_NONOPTIMIZED:
+ if (depth < min_depth_nonopt) {
+ min_depth_nonopt = depth;
+ best_nonopt = mpath_device;
+ }
+ break;
+ default:
+ break;
+ }
+
+ if (min_depth_opt == 0)
+ return best_opt;
+ }
+
+ return best_opt ? best_opt : best_nonopt;
+}
+
+static inline bool mpath_path_is_optimized(struct mpath_head *mpath_head,
+ struct mpath_device *mpath_device)
+{
+ return mpath_head->mpdt->is_optimized(mpath_device);
+}
+
+static struct mpath_device *mpath_numa_path(struct mpath_head *mpath_head)
+{
+ int node = numa_node_id();
+ struct mpath_device *mpath_device;
+
+ mpath_device = srcu_dereference(mpath_head->current_path[node],
+ &mpath_head->srcu);
+ if (unlikely(!mpath_device))
+ return __mpath_find_path(mpath_head, node);
+ if (unlikely(!mpath_path_is_optimized(mpath_head, mpath_device)))
+ return __mpath_find_path(mpath_head, node);
+ return mpath_device;
+}
+
+__maybe_unused
+static struct mpath_device *mpath_find_path(struct mpath_head *mpath_head)
+{
+ enum mpath_iopolicy_e iopolicy =
+ mpath_head->mpdt->get_iopolicy(mpath_head);
+
+ switch (iopolicy) {
+ case MPATH_IOPOLICY_QD:
+ return mpath_queue_depth_path(mpath_head);
+ case MPATH_IOPOLICY_RR:
+ return mpath_round_robin_path(mpath_head);
+ default:
+ return mpath_numa_path(mpath_head);
+ }
+}
+
static void mpath_free_head(struct kref *ref)
{
struct mpath_head *mpath_head =
@@ -71,6 +305,7 @@ void mpath_remove_disk(struct mpath_head *mpath_head)
if (test_and_clear_bit(MPATH_HEAD_DISK_LIVE, &mpath_head->flags)) {
struct gendisk *disk = mpath_head->disk;
+ mpath_synchronize(mpath_head);
del_gendisk(disk);
}
}
@@ -121,6 +356,19 @@ void mpath_device_set_live(struct mpath_device *mpath_device)
}
queue_work(mpath_wq, &mpath_head->partition_scan_work);
}
+
+ mutex_lock(&mpath_head->lock);
+ if (mpath_path_is_optimized(mpath_head, mpath_device)) {
+ int node, srcu_idx;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ for_each_online_node(node)
+ __mpath_find_path(mpath_head, node);
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+ }
+ mutex_unlock(&mpath_head->lock);
+
+ mpath_synchronize(mpath_head);
}
EXPORT_SYMBOL_GPL(mpath_device_set_live);
--
2.43.5
* [PATCH v2 04/13] libmultipath: Add bio handling
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
` (2 preceding siblings ...)
2026-04-28 11:10 ` [PATCH v2 03/13] libmultipath: Add path selection support John Garry
@ 2026-04-28 11:10 ` John Garry
2026-04-28 11:10 ` [PATCH v2 05/13] libmultipath: Add support for mpath_device management John Garry
` (9 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:10 UTC
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
Add support to submit a bio per-path. In addition, for failover, add
support to requeue a failed bio.
NVMe has almost like-for-like equivalents here:
- mpath_available_path() -> nvme_available_path()
- mpath_requeue_work() -> nvme_requeue_work()
- mpath_bdev_submit_bio() -> nvme_ns_head_submit_bio()
For failover, a driver may want to re-submit a bio, so add support to
clone a bio prior to submission.
A bio which is submitted to a per-path device has flag REQ_MPATH set,
same as what is done for NVMe with REQ_NVME_MPATH.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
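For illustration only (not part of the patch): a userspace sketch of the
three-way decision made in mpath_bdev_submit_bio() and of the detach-then-walk
loop in mpath_requeue_work(). All demo_* names are hypothetical, locking is
omitted, and "resubmitting" a bio is reduced to setting a flag.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_action { DEMO_SUBMIT, DEMO_REQUEUE, DEMO_FAIL };

/*
 * usable:    path selection returned a path for this bio
 * available: the disk is live and the driver reports at least one path
 *            that may become usable again
 */
static enum demo_action demo_submit_decision(bool usable, bool available)
{
	if (usable)
		return DEMO_SUBMIT;
	if (available)
		return DEMO_REQUEUE;	/* park the bio until a path returns */
	return DEMO_FAIL;		/* nothing will come back; fail the I/O */
}

struct demo_bio {
	struct demo_bio *next;
	int resubmitted;
};

struct demo_head {
	struct demo_bio *requeue_head, *requeue_tail;
};

/* Append to the requeue list (the kernel does this under requeue_lock). */
static void demo_requeue(struct demo_head *head, struct demo_bio *bio)
{
	bio->next = NULL;
	if (head->requeue_tail)
		head->requeue_tail->next = bio;
	else
		head->requeue_head = bio;
	head->requeue_tail = bio;
}

/* Detach the whole list first, then resubmit each bio; returns the count. */
static int demo_requeue_work(struct demo_head *head)
{
	struct demo_bio *bio, *next = head->requeue_head;
	int n = 0;

	head->requeue_head = head->requeue_tail = NULL;

	while ((bio = next) != NULL) {
		next = bio->next;
		bio->next = NULL;
		bio->resubmitted = 1;	/* stand-in for submit_bio_noacct() */
		n++;
	}
	return n;
}
```

Detaching the list before walking it keeps the lock hold time short, and a
resubmitted bio that still finds no usable path simply lands on the fresh
list for the next run.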
include/linux/multipath.h | 18 +++++++
lib/multipath.c | 100 +++++++++++++++++++++++++++++++++++++-
2 files changed, 116 insertions(+), 2 deletions(-)
diff --git a/include/linux/multipath.h b/include/linux/multipath.h
index 13d810148a96a..2a5a9236480f7 100644
--- a/include/linux/multipath.h
+++ b/include/linux/multipath.h
@@ -3,6 +3,7 @@
#define _LIBMULTIPATH_H
#include <linux/blkdev.h>
+#include <linux/blk-mq.h>
#include <linux/srcu.h>
extern const struct block_device_operations mpath_ops;
@@ -32,10 +33,12 @@ struct mpath_device {
};
struct mpath_head_template {
+ bool (*available_path)(struct mpath_device *);
bool (*is_disabled)(struct mpath_device *);
bool (*is_optimized)(struct mpath_device *);
int (*get_nr_active)(struct mpath_device *);
enum mpath_iopolicy_e (*get_iopolicy)(struct mpath_head *);
+ struct bio *(*clone_bio)(struct bio *);
const struct attribute_group **device_groups;
};
@@ -48,6 +51,10 @@ struct mpath_head {
struct kref ref;
+ struct bio_list requeue_list; /* list for requeueing bios */
+ spinlock_t requeue_lock;
+ struct work_struct requeue_work; /* work struct for requeue */
+
void *drvdata;
unsigned long flags;
struct gendisk *disk;
@@ -58,6 +65,13 @@ struct mpath_head {
struct mpath_device __rcu *current_path[];
};
+#define REQ_MPATH REQ_DRV
+
+static inline bool is_mpath_request(struct request *req)
+{
+ return req->cmd_flags & REQ_MPATH;
+}
+
static inline struct mpath_head *mpath_bd_device_to_head(struct device *dev)
{
return dev_get_drvdata(dev);
@@ -100,4 +114,8 @@ static inline bool mpath_qd_iopolicy(struct mpath_iopolicy *mpath_iopolicy)
return mpath_read_iopolicy(mpath_iopolicy) == MPATH_IOPOLICY_QD;
}
+static inline void mpath_schedule_requeue_work(struct mpath_head *mpath_head)
+{
+ kblockd_schedule_work(&mpath_head->requeue_work);
+}
#endif // _LIBMULTIPATH_H
diff --git a/lib/multipath.c b/lib/multipath.c
index fa211420b72c3..eabf1347d9acc 100644
--- a/lib/multipath.c
+++ b/lib/multipath.c
@@ -5,6 +5,7 @@
*/
#include <linux/module.h>
#include <linux/multipath.h>
+#include <trace/events/block.h>
static struct mpath_device *mpath_find_path(struct mpath_head *mpath_head);
@@ -39,7 +40,6 @@ int mpath_get_iopolicy(char *buf, int iopolicy)
}
EXPORT_SYMBOL_GPL(mpath_get_iopolicy);
-
void mpath_synchronize(struct mpath_head *mpath_head)
{
synchronize_srcu(&mpath_head->srcu);
@@ -226,7 +226,6 @@ static struct mpath_device *mpath_numa_path(struct mpath_head *mpath_head)
return mpath_device;
}
-__maybe_unused
static struct mpath_device *mpath_find_path(struct mpath_head *mpath_head)
{
enum mpath_iopolicy_e iopolicy =
@@ -242,6 +241,73 @@ static struct mpath_device *mpath_find_path(struct mpath_head *mpath_head)
}
}
+static bool mpath_available_path(struct mpath_head *mpath_head)
+{
+ struct mpath_device *mpath_device;
+
+ if (!test_bit(MPATH_HEAD_DISK_LIVE, &mpath_head->flags))
+ return false;
+
+ list_for_each_entry_srcu(mpath_device, &mpath_head->dev_list, siblings,
+ srcu_read_lock_held(&mpath_head->srcu)) {
+ if (mpath_head->mpdt->available_path(mpath_device))
+ return true;
+ }
+
+ return false;
+}
+
+static void mpath_bdev_submit_bio(struct bio *bio)
+{
+ struct mpath_head *mpath_head = bio->bi_bdev->bd_disk->private_data;
+ struct device *dev = mpath_head->parent;
+ struct mpath_device *mpath_device;
+ int srcu_idx;
+
+ /*
+ * The mpath_device might be going away and the bio might be moved to a
+ * different queue in failover, so we need to use the bio_split
+ * pool from the original queue to allocate the bvecs from.
+ */
+ bio = bio_split_to_limits(bio);
+ if (!bio)
+ return;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+
+ if (likely(mpath_device)) {
+ if (mpath_head->mpdt->clone_bio) {
+ struct bio *orig = bio;
+
+ bio = mpath_head->mpdt->clone_bio(bio);
+ if (!bio) {
+ bio_io_error(orig);
+ goto out;
+ }
+ }
+ trace_block_bio_remap(bio, disk_devt(mpath_device->disk),
+ bio->bi_iter.bi_sector);
+ bio_set_dev(bio, mpath_device->disk->part0);
+ bio->bi_opf |= REQ_MPATH;
+
+ submit_bio_noacct(bio);
+ } else if (mpath_available_path(mpath_head)) {
+ dev_warn_ratelimited(dev, "no usable path - requeuing I/O\n");
+
+ spin_lock_irq(&mpath_head->requeue_lock);
+ bio_list_add(&mpath_head->requeue_list, bio);
+ spin_unlock_irq(&mpath_head->requeue_lock);
+ } else {
+ dev_warn_ratelimited(dev, "no available path - failing I/O\n");
+
+ bio_io_error(bio);
+ }
+
+out:
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+}
+
static void mpath_free_head(struct kref *ref)
{
struct mpath_head *mpath_head =
@@ -283,6 +349,7 @@ const struct block_device_operations mpath_ops = {
.owner = THIS_MODULE,
.open = mpath_bdev_open,
.release = mpath_bdev_release,
+ .submit_bio = mpath_bdev_submit_bio,
};
EXPORT_SYMBOL_GPL(mpath_ops);
@@ -300,11 +367,34 @@ static void multipath_partition_scan_work(struct work_struct *work)
mutex_unlock(&mpath_head->disk->open_mutex);
}
+static void mpath_requeue_work(struct work_struct *work)
+{
+ struct mpath_head *mpath_head =
+ container_of(work, struct mpath_head, requeue_work);
+ struct bio *bio, *next;
+
+ spin_lock_irq(&mpath_head->requeue_lock);
+ next = bio_list_get(&mpath_head->requeue_list);
+ spin_unlock_irq(&mpath_head->requeue_lock);
+
+ while ((bio = next) != NULL) {
+ next = bio->bi_next;
+ bio->bi_next = NULL;
+ submit_bio_noacct(bio);
+ }
+}
+
void mpath_remove_disk(struct mpath_head *mpath_head)
{
if (test_and_clear_bit(MPATH_HEAD_DISK_LIVE, &mpath_head->flags)) {
struct gendisk *disk = mpath_head->disk;
+ /*
+ * requeue I/O after MPATH_HEAD_DISK_LIVE has been cleared
+ * to allow multipath to fail all I/O.
+ */
+ mpath_schedule_requeue_work(mpath_head);
+
mpath_synchronize(mpath_head);
del_gendisk(disk);
}
@@ -317,6 +407,8 @@ void mpath_put_disk(struct mpath_head *mpath_head)
return;
/* make sure all pending bios are cleaned up */
+ kblockd_schedule_work(&mpath_head->requeue_work);
+ flush_work(&mpath_head->requeue_work);
flush_work(&mpath_head->partition_scan_work);
put_disk(mpath_head->disk);
}
@@ -369,6 +461,7 @@ void mpath_device_set_live(struct mpath_device *mpath_device)
mutex_unlock(&mpath_head->lock);
mpath_synchronize(mpath_head);
+ mpath_schedule_requeue_work(mpath_head);
}
EXPORT_SYMBOL_GPL(mpath_device_set_live);
@@ -387,6 +480,9 @@ struct mpath_head *mpath_alloc_head(void)
INIT_WORK(&mpath_head->partition_scan_work,
multipath_partition_scan_work);
+ INIT_WORK(&mpath_head->requeue_work, mpath_requeue_work);
+ spin_lock_init(&mpath_head->requeue_lock);
+ bio_list_init(&mpath_head->requeue_list);
ret = init_srcu_struct(&mpath_head->srcu);
if (ret) {
--
2.43.5
* [PATCH v2 05/13] libmultipath: Add support for mpath_device management
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
` (3 preceding siblings ...)
2026-04-28 11:10 ` [PATCH v2 04/13] libmultipath: Add bio handling John Garry
@ 2026-04-28 11:10 ` John Garry
2026-04-28 11:10 ` [PATCH v2 06/13] libmultipath: Add cdev support John Garry
` (8 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:10 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
Add support to add or remove a mpath_device as a path.
NVMe has almost like-for-like equivalents here:
- nvme_mpath_clear_current_path() -> mpath_clear_current_path()
- nvme_mpath_add_sysfs_link() -> mpath_add_sysfs_link()
- nvme_mpath_remove_sysfs_link() -> mpath_remove_sysfs_link()
- nvme_mpath_revalidate_paths() -> mpath_revalidate_paths()
mpath_revalidate_paths() takes a callback argument for NVMe-specific handling.
The functionality in mpath_clear_paths() and mpath_synchronize() follows a
pattern which is frequently used in the NVMe code.
Helper mpath_call_for_device() is added to allow a driver to run a callback
on any available path. It is intended to be used for occasions when the
NVMe driver accesses the list of paths outside its multipath code, like in
NVMe sysfs.c
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
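Note (not part of the commit message): the calling convention of
mpath_call_for_device() can be modelled in userspace as below. This is only a
sketch under stated assumptions. The two structures are cut-down stand-ins,
every model_* name and report_id() are hypothetical, the path selector is
reduced to "first usable path", and the SRCU read-side section that the real
helper holds around the callback is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Cut-down stand-ins for the real structures (assumption, not the kernel layout). */
struct mpath_device { int id; int usable; };
struct mpath_head { struct mpath_device *paths; size_t nr_paths; };

/* Hypothetical selector: return the first usable path, or NULL if there is none. */
static struct mpath_device *model_find_path(struct mpath_head *head)
{
	for (size_t i = 0; i < head->nr_paths; i++)
		if (head->paths[i].usable)
			return &head->paths[i];
	return NULL;
}

/*
 * Same shape as mpath_call_for_device(): run cb on one selected path and
 * return its result, or -EWOULDBLOCK when no path could be selected.
 */
static int model_call_for_device(struct mpath_head *head,
				 int (*cb)(struct mpath_device *))
{
	struct mpath_device *dev;

	assert(cb != NULL);
	dev = model_find_path(head);
	return dev ? cb(dev) : -EWOULDBLOCK;
}

/* Hypothetical driver callback: report which path was chosen. */
static int report_id(struct mpath_device *dev)
{
	return dev->id;
}
```

A driver would pass its own callback in place of report_id() and must be
prepared for -EWOULDBLOCK when no path is currently usable.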
include/linux/multipath.h | 16 ++++
lib/multipath.c | 182 ++++++++++++++++++++++++++++++++++++++
2 files changed, 198 insertions(+)
diff --git a/include/linux/multipath.h b/include/linux/multipath.h
index 2a5a9236480f7..72186ab220083 100644
--- a/include/linux/multipath.h
+++ b/include/linux/multipath.h
@@ -24,10 +24,13 @@ enum mpath_access_state {
MPATH_STATE_OTHER
};
+#define MPATH_DEVICE_SYSFS_ATTR_LINK 0
+
struct mpath_device {
struct mpath_head *mpath_head;
struct list_head siblings;
struct gendisk *disk;
+ unsigned long flags;
int numa_node;
enum mpath_access_state access_state;
};
@@ -90,6 +93,19 @@ static inline enum mpath_iopolicy_e mpath_read_iopolicy(
void mpath_synchronize(struct mpath_head *mpath_head);
int mpath_set_iopolicy(const char *val, int *iopolicy);
int mpath_get_iopolicy(char *buf, int iopolicy);
+bool mpath_clear_current_path(struct mpath_device *mpath_device);
+void mpath_synchronize(struct mpath_head *mpath_head);
+void mpath_add_device(struct mpath_head *mpath_head,
+ struct mpath_device *mpath_device);
+bool mpath_delete_device(struct mpath_device *mpath_device);
+bool mpath_head_devices_empty(struct mpath_head *mpath_head);
+int mpath_call_for_device(struct mpath_head *mpath_head,
+ int (*cb)(struct mpath_device *mpath_device));
+void mpath_clear_paths(struct mpath_head *mpath_head);
+void mpath_revalidate_paths(struct mpath_head *mpath_head,
+ void (*not_ready_cb)(struct mpath_device *mpath_device));
+void mpath_add_sysfs_link(struct mpath_head *mpath_head);
+void mpath_remove_sysfs_link(struct mpath_device *mpath_device);
int mpath_get_head(struct mpath_head *mpath_head);
void mpath_put_head(struct mpath_head *mpath_head);
struct mpath_head *mpath_alloc_head(void);
diff --git a/lib/multipath.c b/lib/multipath.c
index eabf1347d9acc..1232e057199ae 100644
--- a/lib/multipath.c
+++ b/lib/multipath.c
@@ -46,6 +46,115 @@ void mpath_synchronize(struct mpath_head *mpath_head)
}
EXPORT_SYMBOL_GPL(mpath_synchronize);
+void mpath_add_device(struct mpath_head *mpath_head,
+ struct mpath_device *mpath_device)
+{
+ mpath_device->mpath_head = mpath_head;
+ mutex_lock(&mpath_head->lock);
+ list_add_tail_rcu(&mpath_device->siblings, &mpath_head->dev_list);
+ mutex_unlock(&mpath_head->lock);
+}
+EXPORT_SYMBOL_GPL(mpath_add_device);
+
+bool mpath_delete_device(struct mpath_device *mpath_device)
+{
+ bool empty;
+
+ mutex_lock(&mpath_device->mpath_head->lock);
+ list_del_rcu(&mpath_device->siblings);
+ empty = list_empty(&mpath_device->mpath_head->dev_list);
+ mutex_unlock(&mpath_device->mpath_head->lock);
+
+ return empty;
+}
+EXPORT_SYMBOL_GPL(mpath_delete_device);
+
+bool mpath_head_devices_empty(struct mpath_head *mpath_head)
+{
+ bool empty;
+
+ mutex_lock(&mpath_head->lock);
+ empty = list_empty(&mpath_head->dev_list);
+ mutex_unlock(&mpath_head->lock);
+
+ return empty;
+}
+EXPORT_SYMBOL_GPL(mpath_head_devices_empty);
+
+int mpath_call_for_device(struct mpath_head *mpath_head,
+ int (*cb)(struct mpath_device *mpath_device))
+{
+ struct mpath_device *mpath_device;
+ int ret = -EWOULDBLOCK, srcu_idx;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+ if (mpath_device)
+ ret = cb(mpath_device);
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(mpath_call_for_device);
+
+bool mpath_clear_current_path(struct mpath_device *mpath_device)
+{
+ struct mpath_head *mpath_head = mpath_device->mpath_head;
+ bool changed = false;
+ int node;
+
+ for_each_node(node) {
+ if (mpath_device ==
+ rcu_access_pointer(mpath_head->current_path[node])) {
+ rcu_assign_pointer(mpath_head->current_path[node],
+ NULL);
+ changed = true;
+ }
+ }
+
+ return changed;
+}
+EXPORT_SYMBOL_GPL(mpath_clear_current_path);
+
+static void mpath_revalidate_paths_iter(struct mpath_head *mpath_head,
+ void (*not_ready_cb)(struct mpath_device *mpath_device))
+{
+ sector_t capacity = get_capacity(mpath_head->disk);
+ struct mpath_device *mpath_device;
+ int srcu_idx;
+
+ if (!not_ready_cb)
+ return;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ list_for_each_entry_srcu(mpath_device, &mpath_head->dev_list, siblings,
+ srcu_read_lock_held(&mpath_head->srcu)) {
+ if (capacity != get_capacity(mpath_device->disk))
+ not_ready_cb(mpath_device);
+ }
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+}
+
+void mpath_clear_paths(struct mpath_head *mpath_head)
+{
+ int node;
+
+ for_each_node(node)
+ rcu_assign_pointer(mpath_head->current_path[node], NULL);
+}
+EXPORT_SYMBOL_GPL(mpath_clear_paths);
+
+void mpath_revalidate_paths(struct mpath_head *mpath_head,
+ void (*not_ready_cb)(struct mpath_device *mpath_device))
+{
+
+ mpath_revalidate_paths_iter(mpath_head, not_ready_cb);
+ mpath_clear_paths(mpath_head);
+
+ mpath_schedule_requeue_work(mpath_head);
+}
+EXPORT_SYMBOL_GPL(mpath_revalidate_paths);
+
static bool mpath_path_is_disabled(struct mpath_head *mpath_head,
struct mpath_device *mpath_device)
{
@@ -449,6 +558,8 @@ void mpath_device_set_live(struct mpath_device *mpath_device)
queue_work(mpath_wq, &mpath_head->partition_scan_work);
}
+ mpath_add_sysfs_link(mpath_head);
+
mutex_lock(&mpath_head->lock);
if (mpath_path_is_optimized(mpath_head, mpath_device)) {
int node, srcu_idx;
@@ -465,6 +576,77 @@ void mpath_device_set_live(struct mpath_device *mpath_device)
}
EXPORT_SYMBOL_GPL(mpath_device_set_live);
+void mpath_add_sysfs_link(struct mpath_head *mpath_head)
+{
+ struct device *target;
+ struct device *source;
+ int rc, srcu_idx;
+ struct kobject *mpath_gd_kobj;
+ struct mpath_device *mpath_device;
+
+ /*
+ * Ensure head disk node is already added otherwise we may get invalid
+ * kobj for head disk node
+ */
+ if (!test_bit(GD_ADDED, &mpath_head->disk->state))
+ return;
+
+ mpath_gd_kobj = &disk_to_dev(mpath_head->disk)->kobj;
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+
+ list_for_each_entry_srcu(mpath_device, &mpath_head->dev_list, siblings,
+ srcu_read_lock_held(&mpath_head->srcu)) {
+ if (!test_bit(GD_ADDED, &mpath_device->disk->state))
+ continue;
+
+ if (test_and_set_bit(MPATH_DEVICE_SYSFS_ATTR_LINK,
+ &mpath_device->flags))
+ continue;
+
+ target = disk_to_dev(mpath_device->disk);
+ source = disk_to_dev(mpath_head->disk);
+ /*
+ * Create sysfs link from head gendisk kobject @mpath_gd_kobj to
+ * the path gendisk kobject @target->kobj.
+ */
+ rc = sysfs_add_link_to_group(mpath_gd_kobj, "multipath",
+ &target->kobj, dev_name(target));
+
+ if (unlikely(rc)) {
+ dev_err(disk_to_dev(mpath_head->disk),
+ "failed to create link to %s rc=%d\n",
+ dev_name(target), rc);
+ clear_bit(MPATH_DEVICE_SYSFS_ATTR_LINK,
+ &mpath_device->flags);
+ } else {
+ dev_info(source, "Created multipath sysfs link to %s\n",
+ mpath_device->disk->disk_name);
+ }
+ }
+
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+}
+EXPORT_SYMBOL_GPL(mpath_add_sysfs_link);
+
+void mpath_remove_sysfs_link(struct mpath_device *mpath_device)
+{
+ struct device *target;
+ struct kobject *mpath_gd_kobj;
+ struct mpath_head *mpath_head = mpath_device->mpath_head;
+
+ if (!test_bit(MPATH_DEVICE_SYSFS_ATTR_LINK, &mpath_device->flags))
+ return;
+
+ target = disk_to_dev(mpath_device->disk);
+ mpath_gd_kobj = &disk_to_dev(mpath_head->disk)->kobj;
+
+ sysfs_remove_link_from_group(mpath_gd_kobj, "multipath",
+ dev_name(target));
+
+ clear_bit(MPATH_DEVICE_SYSFS_ATTR_LINK, &mpath_device->flags);
+}
+EXPORT_SYMBOL_GPL(mpath_remove_sysfs_link);
+
struct mpath_head *mpath_alloc_head(void)
{
struct mpath_head *mpath_head;
--
2.43.5
* [PATCH v2 06/13] libmultipath: Add cdev support
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
` (4 preceding siblings ...)
2026-04-28 11:10 ` [PATCH v2 05/13] libmultipath: Add support for mpath_device management John Garry
@ 2026-04-28 11:10 ` John Garry
2026-04-28 11:10 ` [PATCH v2 07/13] libmultipath: Add delayed removal support John Garry
` (7 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:10 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
Add support to create a cdev multipath device. The functionality is much
the same as NVMe, where the cdev is created when a mpath device is set
live.
The driver must provide a mpath_head_template.cdev_ioctl callback to
actually handle the ioctl.
Structure mpath_chr_fops would be used for setting the cdev fops in
the mpath_head_template.add_cdev callback.
The NVMe cdev ioctl handler has special handling for NVMe controller commands.
In this case, the SRCU read lock is dropped before executing the ioctl.
For reference, see nvme_ns_head_ctrl_ioctl(). This means that it is not always
possible to hold the SRCU read lock when calling the ioctl callback. To handle
this scenario, add template callbacks .ioctl_begin and .ioctl_finish to be
called before and after the ioctl callback. If .ioctl_begin returns data, then
we know to drop the SRCU lock before calling the ioctl callback, and then later
call the .ioctl_finish callback with that same data. For NVMe using
libmultipath, we would take a reference to the controller structure and
pass a pointer to the controller structure back in the .ioctl_begin callback,
and use that same data in the .ioctl_finish callback to put the reference
to the controller.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
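Note (not part of the commit message): the .ioctl_begin/.ioctl_finish protocol
described above can be modelled in userspace as below. This is a sketch under
stated assumptions, not the kernel code. All model_* names are hypothetical,
the SRCU read lock is reduced to a flag, and the controller reference is
reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

struct model_state {
	int srcu_held;	/* 1 while the modelled SRCU read lock is held */
	int ctrl_refs;	/* modelled controller reference count */
};

/*
 * Hypothetical .ioctl_begin: for a controller command, take a reference and
 * hand back opaque data, which tells the caller to drop the lock.
 */
static void model_ioctl_begin(struct model_state *st, int is_ctrl_cmd,
			      void **opaque)
{
	if (is_ctrl_cmd) {
		st->ctrl_refs++;
		*opaque = st;
	}
}

/* Hypothetical .ioctl_finish: put the reference taken in model_ioctl_begin(). */
static void model_ioctl_finish(void *opaque)
{
	struct model_state *st = opaque;

	assert(st->ctrl_refs > 0);
	st->ctrl_refs--;
}

/* Returns 1 if the modelled lock was held while the ioctl ran, else 0. */
static int model_chr_ioctl(struct model_state *st, int is_ctrl_cmd)
{
	void *opaque = NULL;
	int held_during_ioctl;

	st->srcu_held = 1;			/* srcu_read_lock() */
	model_ioctl_begin(st, is_ctrl_cmd, &opaque);
	if (opaque)
		st->srcu_held = 0;		/* early srcu_read_unlock() */
	held_during_ioctl = st->srcu_held;	/* the ioctl callback runs here */
	if (opaque) {
		model_ioctl_finish(opaque);
		return held_during_ioctl;
	}
	st->srcu_held = 0;			/* normal srcu_read_unlock() */
	return held_during_ioctl;
}
```

The point of the model is that the lock is released exactly once on either
path, and the reference taken in the begin step is always dropped in the
finish step.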
include/linux/multipath.h | 18 ++++++
lib/multipath.c | 129 ++++++++++++++++++++++++++++++++++++++
2 files changed, 147 insertions(+)
diff --git a/include/linux/multipath.h b/include/linux/multipath.h
index 72186ab220083..3ac77c089a58c 100644
--- a/include/linux/multipath.h
+++ b/include/linux/multipath.h
@@ -4,8 +4,11 @@
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
+#include <linux/cdev.h>
#include <linux/srcu.h>
+#include <linux/io_uring/cmd.h>
+extern const struct file_operations mpath_chr_fops;
extern const struct block_device_operations mpath_ops;
enum mpath_iopolicy_e {
@@ -37,12 +40,24 @@ struct mpath_device {
struct mpath_head_template {
bool (*available_path)(struct mpath_device *);
+ int (*add_cdev)(struct mpath_head *);
+ void (*del_cdev)(struct mpath_head *);
bool (*is_disabled)(struct mpath_device *);
bool (*is_optimized)(struct mpath_device *);
int (*get_nr_active)(struct mpath_device *);
+ long (*cdev_ioctl)(struct mpath_device *, unsigned int cmd,
+ unsigned long arg, bool open_for_write);
+ int (*chr_uring_cmd)(struct mpath_device *,
+ struct io_uring_cmd *ioucmd,
+ unsigned int issue_flags);
+ int (*chr_uring_cmd_iopoll)(struct io_uring_cmd *ioucmd,
+ struct io_comp_batch *iob,
+ unsigned int poll_flags);
enum mpath_iopolicy_e (*get_iopolicy)(struct mpath_head *);
struct bio *(*clone_bio)(struct bio *);
const struct attribute_group **device_groups;
+ void (*ioctl_begin)(struct mpath_device *, unsigned int cmd, void **);
+ void (*ioctl_finish)(void *opaque);
};
#define MPATH_HEAD_DISK_LIVE 0
@@ -58,6 +73,9 @@ struct mpath_head {
spinlock_t requeue_lock;
struct work_struct requeue_work; /* work struct for requeue */
+ struct cdev cdev;
+ struct device cdev_device;
+
void *drvdata;
unsigned long flags;
struct gendisk *disk;
diff --git a/lib/multipath.c b/lib/multipath.c
index 1232e057199ae..69e48ca3169c2 100644
--- a/lib/multipath.c
+++ b/lib/multipath.c
@@ -462,6 +462,122 @@ const struct block_device_operations mpath_ops = {
};
EXPORT_SYMBOL_GPL(mpath_ops);
+static int mpath_chr_open(struct inode *inode, struct file *file)
+{
+ struct cdev *cdev = file_inode(file)->i_cdev;
+ struct mpath_head *mpath_head =
+ container_of(cdev, struct mpath_head, cdev);
+
+ return mpath_get_head(mpath_head);
+}
+
+static int mpath_chr_release(struct inode *inode, struct file *file)
+{
+ struct cdev *cdev = file_inode(file)->i_cdev;
+ struct mpath_head *mpath_head =
+ container_of(cdev, struct mpath_head, cdev);
+
+ mpath_put_head(mpath_head);
+ return 0;
+}
+
+static long mpath_chr_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct cdev *cdev = file_inode(file)->i_cdev;
+ struct mpath_head *mpath_head =
+ container_of(cdev, struct mpath_head, cdev);
+ struct mpath_device *mpath_device;
+ int srcu_idx, err = -EWOULDBLOCK;
+ void *unlocked_ioctl_data = NULL;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+ if (!mpath_device)
+ goto out_unlock;
+ if (mpath_head->mpdt->ioctl_begin)
+ mpath_head->mpdt->ioctl_begin(mpath_device, cmd,
+ &unlocked_ioctl_data);
+ if (unlocked_ioctl_data)
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+ err = mpath_head->mpdt->cdev_ioctl(mpath_device, cmd, arg,
+ file->f_mode & FMODE_WRITE);
+ if (unlocked_ioctl_data) {
+ mpath_head->mpdt->ioctl_finish(unlocked_ioctl_data);
+ return err;
+ }
+
+out_unlock:
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+ return err;
+}
+
+static int mpath_chr_uring_cmd(struct io_uring_cmd *ioucmd,
+ unsigned int issue_flags)
+{
+ struct cdev *cdev = file_inode(ioucmd->file)->i_cdev;
+ struct mpath_head *mpath_head =
+ container_of(cdev, struct mpath_head, cdev);
+ struct mpath_device *mpath_device;
+ /* error code copied from nvme_ns_head_chr_uring_cmd */
+ int srcu_idx, ret = -EINVAL;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+
+ if (!mpath_device)
+ goto out_unlock;
+
+ if (!mpath_head->mpdt->chr_uring_cmd) {
+ ret = -EOPNOTSUPP;
+ goto out_unlock;
+ }
+
+ ret = mpath_head->mpdt->chr_uring_cmd(mpath_device, ioucmd,
+ issue_flags);
+out_unlock:
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+ return ret;
+}
+
+static int mpath_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
+ struct io_comp_batch *iob,
+ unsigned int poll_flags)
+{
+ struct cdev *cdev = file_inode(ioucmd->file)->i_cdev;
+ struct mpath_head *mpath_head =
+ container_of(cdev, struct mpath_head, cdev);
+
+ if (!mpath_head->mpdt->chr_uring_cmd_iopoll)
+ return -EOPNOTSUPP;
+
+ return mpath_head->mpdt->chr_uring_cmd_iopoll(ioucmd, iob, poll_flags);
+}
+
+const struct file_operations mpath_chr_fops = {
+ .owner = THIS_MODULE,
+ .open = mpath_chr_open,
+ .release = mpath_chr_release,
+ .unlocked_ioctl = mpath_chr_ioctl,
+ .compat_ioctl = compat_ptr_ioctl,
+ .uring_cmd = mpath_chr_uring_cmd,
+ .uring_cmd_iopoll = mpath_chr_uring_cmd_iopoll,
+};
+EXPORT_SYMBOL_GPL(mpath_chr_fops);
+
+static int mpath_head_add_cdev(struct mpath_head *mpath_head)
+{
+ if (mpath_head->mpdt->add_cdev)
+ return mpath_head->mpdt->add_cdev(mpath_head);
+ return 0;
+}
+
+static void mpath_head_del_cdev(struct mpath_head *mpath_head)
+{
+ if (mpath_head->mpdt->del_cdev)
+ mpath_head->mpdt->del_cdev(mpath_head);
+}
+
static void multipath_partition_scan_work(struct work_struct *work)
{
struct mpath_head *mpath_head =
@@ -504,6 +620,7 @@ void mpath_remove_disk(struct mpath_head *mpath_head)
*/
mpath_schedule_requeue_work(mpath_head);
+ mpath_head_del_cdev(mpath_head);
mpath_synchronize(mpath_head);
del_gendisk(disk);
}
@@ -526,6 +643,16 @@ EXPORT_SYMBOL_GPL(mpath_put_disk);
int mpath_alloc_head_disk(struct mpath_head *mpath_head,
struct queue_limits *lim, int numa_node)
{
+ /* Do limited sanity checks on template */
+ if (!mpath_head->mpdt->ioctl_begin ^ !mpath_head->mpdt->ioctl_finish)
+ return -EINVAL;
+
+ if (!mpath_head->mpdt->add_cdev ^ !mpath_head->mpdt->del_cdev)
+ return -EINVAL;
+
+ if (!mpath_head->mpdt->add_cdev ^ !mpath_head->mpdt->cdev_ioctl)
+ return -EINVAL;
+
mpath_head->disk = blk_alloc_disk(lim, numa_node);
if (IS_ERR(mpath_head->disk))
return PTR_ERR(mpath_head->disk);
@@ -555,6 +682,8 @@ void mpath_device_set_live(struct mpath_device *mpath_device)
clear_bit(MPATH_HEAD_DISK_LIVE, &mpath_head->flags);
return;
}
+
+ mpath_head_add_cdev(mpath_head);
queue_work(mpath_wq, &mpath_head->partition_scan_work);
}
--
2.43.5
* [PATCH v2 07/13] libmultipath: Add delayed removal support
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
` (5 preceding siblings ...)
2026-04-28 11:10 ` [PATCH v2 06/13] libmultipath: Add cdev support John Garry
@ 2026-04-28 11:10 ` John Garry
2026-04-28 11:11 ` [PATCH v2 08/13] libmultipath: Add sysfs helpers John Garry
` (6 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:10 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
Add support for delayed removal, the same as already exists for NVMe.
The purpose of this feature is to keep the multipath disk and cdev present
during transient periods when no path is available.
Helpers mpath_delayed_removal_secs_show() and
mpath_delayed_removal_secs_store() may be used in the driver sysfs code.
The driver is responsible for supplying the removal work callback for
the delayed work.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
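Note (not part of the commit message): the effect of delayed_removal_secs on
the bio submission decision can be modelled in userspace as below. This is a
sketch under stated assumptions. All model_* names are hypothetical, the two
boolean fields stand in for the results of path selection and of the
per-path availability check, and any other checks made by
mpath_available_path() are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_action { MODEL_SUBMIT, MODEL_REQUEUE, MODEL_FAIL };

struct model_head {
	bool have_usable_path;		/* path selection found a path */
	bool have_recoverable_path;	/* some path may become usable again */
	unsigned int delayed_removal_secs;
};

/* Models MPATH_HEAD_QUEUE_IF_NO_PATH, which is set iff the value is non-zero. */
static bool model_queue_if_no_path(const struct model_head *h)
{
	return h->delayed_removal_secs != 0;
}

/* Three-way decision: submit on a path, park on the requeue list, or fail. */
static enum model_action model_submit(const struct model_head *h)
{
	assert(h != NULL);
	if (h->have_usable_path)
		return MODEL_SUBMIT;
	if (h->have_recoverable_path || model_queue_if_no_path(h))
		return MODEL_REQUEUE;
	return MODEL_FAIL;
}
```

With the default of zero, I/O fails as soon as no path remains. With a
non-zero value, I/O is parked until a path returns or the removal work runs.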
include/linux/multipath.h | 18 ++++++++
lib/multipath.c | 91 ++++++++++++++++++++++++++++++++++++++-
2 files changed, 108 insertions(+), 1 deletion(-)
diff --git a/include/linux/multipath.h b/include/linux/multipath.h
index 3ac77c089a58c..6afbf6ae1d2a9 100644
--- a/include/linux/multipath.h
+++ b/include/linux/multipath.h
@@ -40,6 +40,7 @@ struct mpath_device {
struct mpath_head_template {
bool (*available_path)(struct mpath_device *);
+ void (*remove_head)(struct mpath_head *);
int (*add_cdev)(struct mpath_head *);
void (*del_cdev)(struct mpath_head *);
bool (*is_disabled)(struct mpath_device *);
@@ -61,6 +62,7 @@ struct mpath_head_template {
};
#define MPATH_HEAD_DISK_LIVE 0
+#define MPATH_HEAD_QUEUE_IF_NO_PATH 1
struct mpath_head {
struct srcu_struct srcu;
@@ -76,6 +78,10 @@ struct mpath_head {
struct cdev cdev;
struct device cdev_device;
+ struct delayed_work remove_work;
+ unsigned int delayed_removal_secs;
+ struct module *drv_module;
+
void *drvdata;
unsigned long flags;
struct gendisk *disk;
@@ -133,6 +139,11 @@ void mpath_remove_disk(struct mpath_head *mpath_head);
int mpath_alloc_head_disk(struct mpath_head *mpath_head,
struct queue_limits *lim, int numa_node);
void mpath_device_set_live(struct mpath_device *mpath_device);
+bool mpath_can_remove_head(struct mpath_head *mpath_head);
+ssize_t mpath_delayed_removal_secs_show(struct mpath_head *mpath_head,
+ char *buf);
+ssize_t mpath_delayed_removal_secs_store(struct mpath_head *mpath_head,
+ const char *buf, size_t count);
static inline bool is_mpath_disk(struct gendisk *disk)
{
@@ -148,6 +159,13 @@ static inline bool mpath_qd_iopolicy(struct mpath_iopolicy *mpath_iopolicy)
return mpath_read_iopolicy(mpath_iopolicy) == MPATH_IOPOLICY_QD;
}
+static inline bool mpath_head_queue_if_no_path(struct mpath_head *mpath_head)
+{
+ if (test_bit(MPATH_HEAD_QUEUE_IF_NO_PATH, &mpath_head->flags))
+ return true;
+ return false;
+}
+
static inline void mpath_schedule_requeue_work(struct mpath_head *mpath_head)
{
kblockd_schedule_work(&mpath_head->requeue_work);
diff --git a/lib/multipath.c b/lib/multipath.c
index 69e48ca3169c2..9a1a8cb4a417f 100644
--- a/lib/multipath.c
+++ b/lib/multipath.c
@@ -53,6 +53,8 @@ void mpath_add_device(struct mpath_head *mpath_head,
mutex_lock(&mpath_head->lock);
list_add_tail_rcu(&mpath_device->siblings, &mpath_head->dev_list);
mutex_unlock(&mpath_head->lock);
+ if (cancel_delayed_work(&mpath_head->remove_work))
+ module_put(mpath_head->drv_module);
}
EXPORT_SYMBOL_GPL(mpath_add_device);
@@ -363,7 +365,17 @@ static bool mpath_available_path(struct mpath_head *mpath_head)
return true;
}
- return false;
+ /*
+ * If "mpath_head->delayed_removal_secs" is set (i.e., non-zero), do
+ * not immediately fail I/O. Instead, requeue the I/O for the configured
+ * duration, anticipating that if there's a transient link failure then
+ * it may recover within this time window. This parameter is exported to
+ * userspace via sysfs, and its default value is zero. It is internally
+ * mapped to MPATH_HEAD_QUEUE_IF_NO_PATH. When delayed_removal_secs is
+ * non-zero, this flag is set to true. When zero, the flag is cleared.
+ */
+ return mpath_head_queue_if_no_path(mpath_head);
+
}
static void mpath_bdev_submit_bio(struct bio *bio)
@@ -609,6 +621,39 @@ static void mpath_requeue_work(struct work_struct *work)
}
}
+bool mpath_can_remove_head(struct mpath_head *mpath_head)
+{
+ bool remove = false;
+
+ mutex_lock(&mpath_head->lock);
+ /*
+ * Ensure that no one could remove this module while the head
+ * remove work is pending.
+ */
+ if (mpath_head_queue_if_no_path(mpath_head) &&
+ try_module_get(mpath_head->drv_module)) {
+
+ mod_delayed_work(mpath_wq, &mpath_head->remove_work,
+ mpath_head->delayed_removal_secs * HZ);
+ } else {
+ remove = true;
+ }
+
+ mutex_unlock(&mpath_head->lock);
+ return remove;
+}
+EXPORT_SYMBOL_GPL(mpath_can_remove_head);
+
+static void mpath_remove_head_work(struct work_struct *work)
+{
+ struct mpath_head *mpath_head = container_of(to_delayed_work(work),
+ struct mpath_head, remove_work);
+ struct module *drv_module = mpath_head->drv_module;
+
+ mpath_head->mpdt->remove_head(mpath_head);
+ module_put(drv_module);
+}
+
void mpath_remove_disk(struct mpath_head *mpath_head)
{
if (test_and_clear_bit(MPATH_HEAD_DISK_LIVE, &mpath_head->flags)) {
@@ -660,6 +705,9 @@ int mpath_alloc_head_disk(struct mpath_head *mpath_head,
mpath_head->disk->private_data = mpath_head;
mpath_head->disk->fops = &mpath_ops;
+ INIT_DELAYED_WORK(&mpath_head->remove_work, mpath_remove_head_work);
+ mpath_head->delayed_removal_secs = 0;
+
set_bit(GD_SUPPRESS_PART_SCAN, &mpath_head->disk->state);
return 0;
@@ -705,6 +753,47 @@ void mpath_device_set_live(struct mpath_device *mpath_device)
}
EXPORT_SYMBOL_GPL(mpath_device_set_live);
+ssize_t mpath_delayed_removal_secs_show(struct mpath_head *mpath_head,
+ char *buf)
+{
+ int ret;
+
+ mutex_lock(&mpath_head->lock);
+ ret = sysfs_emit(buf, "%u\n", mpath_head->delayed_removal_secs);
+ mutex_unlock(&mpath_head->lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(mpath_delayed_removal_secs_show);
+
+ssize_t mpath_delayed_removal_secs_store(struct mpath_head *mpath_head,
+ const char *buf, size_t count)
+{
+ ssize_t ret;
+ unsigned int sec;
+
+ ret = kstrtouint(buf, 0, &sec);
+ if (ret < 0)
+ return ret;
+
+ mutex_lock(&mpath_head->lock);
+ mpath_head->delayed_removal_secs = sec;
+ if (sec)
+ set_bit(MPATH_HEAD_QUEUE_IF_NO_PATH, &mpath_head->flags);
+ else
+ clear_bit(MPATH_HEAD_QUEUE_IF_NO_PATH, &mpath_head->flags);
+ mutex_unlock(&mpath_head->lock);
+
+ /*
+ * Ensure that update to MPATH_HEAD_QUEUE_IF_NO_PATH is seen
+ * by its reader.
+ */
+ mpath_synchronize(mpath_head);
+
+ return count;
+}
+EXPORT_SYMBOL_GPL(mpath_delayed_removal_secs_store);
+
void mpath_add_sysfs_link(struct mpath_head *mpath_head)
{
struct device *target;
--
2.43.5
* [PATCH v2 08/13] libmultipath: Add sysfs helpers
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
` (6 preceding siblings ...)
2026-04-28 11:10 ` [PATCH v2 07/13] libmultipath: Add delayed removal support John Garry
@ 2026-04-28 11:11 ` John Garry
2026-04-28 11:11 ` [PATCH v2 09/13] libmultipath: Add PR support John Garry
` (5 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:11 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
Add helpers for driver sysfs code for the following functionality:
- get/set iopolicy with mpath_iopolicy_store() and mpath_iopolicy_show()
- show device path per NUMA node
- "multipath" attribute group, equivalent to nvme_ns_mpath_attr_group
- device groups attribute array, similar to nvme_ns_attr_groups but not
containing NVMe members.
Note that mpath_iopolicy_store() has an update callback to allow the same
functionality as nvme_subsys_iopolicy_update() to be run for clearing paths.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
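Note (not part of the commit message): the name matching done by
mpath_iopolicy_store() can be modelled in userspace as below. This is a sketch
under stated assumptions. All model_* names are hypothetical, the policy name
strings are assumed to match the NVMe ones since the real table is not part of
this patch, and model_sysfs_streq() is a local stand-in for the kernel's
sysfs_streq(), which treats one trailing newline as equal to end of string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed policy names, indexed by policy number (not taken from this patch). */
static const char *const model_iopolicy_names[] = {
	"numa", "round-robin", "queue-depth",
};

/* Stand-in for sysfs_streq(): equal, ignoring a single trailing newline. */
static bool model_sysfs_streq(const char *s1, const char *s2)
{
	while (*s1 && *s1 == *s2) {
		s1++;
		s2++;
	}
	if (*s1 == *s2)
		return true;
	if (!*s1 && *s2 == '\n' && !s2[1])
		return true;
	if (*s1 == '\n' && !s1[1] && !*s2)
		return true;
	return false;
}

/* Returns true and updates *iopolicy if buf names a known policy. */
static bool model_iopolicy_store(int *iopolicy, const char *buf)
{
	size_t n = sizeof(model_iopolicy_names) / sizeof(model_iopolicy_names[0]);

	assert(iopolicy != NULL && buf != NULL);
	for (size_t i = 0; i < n; i++) {
		if (model_sysfs_streq(buf, model_iopolicy_names[i])) {
			*iopolicy = (int)i;
			return true;
		}
	}
	return false;
}
```

The trailing-newline handling matters because a write from "echo" to a sysfs
attribute arrives with a newline appended.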
include/linux/multipath.h | 7 +++
lib/multipath.c | 96 +++++++++++++++++++++++++++++++++++++++
2 files changed, 103 insertions(+)
diff --git a/include/linux/multipath.h b/include/linux/multipath.h
index 6afbf6ae1d2a9..b18491c1d077f 100644
--- a/include/linux/multipath.h
+++ b/include/linux/multipath.h
@@ -10,6 +10,8 @@
extern const struct file_operations mpath_chr_fops;
extern const struct block_device_operations mpath_ops;
+extern const struct attribute_group mpath_attr_group;
+extern const struct attribute_group *mpath_device_groups[];
enum mpath_iopolicy_e {
MPATH_IOPOLICY_NUMA,
@@ -140,6 +142,11 @@ int mpath_alloc_head_disk(struct mpath_head *mpath_head,
struct queue_limits *lim, int numa_node);
void mpath_device_set_live(struct mpath_device *mpath_device);
bool mpath_can_remove_head(struct mpath_head *mpath_head);
+ssize_t mpath_numa_nodes_show(struct mpath_device *mpath_device,
+ struct mpath_iopolicy *iopolicy, char *buf);
+ssize_t mpath_iopolicy_show(struct mpath_iopolicy *mpath_iopolicy, char *buf);
+bool mpath_iopolicy_store(struct mpath_iopolicy *mpath_iopolicy,
+ const char *buf, size_t count);
ssize_t mpath_delayed_removal_secs_show(struct mpath_head *mpath_head,
char *buf);
ssize_t mpath_delayed_removal_secs_store(struct mpath_head *mpath_head,
diff --git a/lib/multipath.c b/lib/multipath.c
index 9a1a8cb4a417f..680bb4f0ae237 100644
--- a/lib/multipath.c
+++ b/lib/multipath.c
@@ -753,6 +753,102 @@ void mpath_device_set_live(struct mpath_device *mpath_device)
}
EXPORT_SYMBOL_GPL(mpath_device_set_live);
+static struct attribute dummy_attr = {
+ .name = "dummy",
+};
+
+static struct attribute *mpath_attrs[] = {
+ &dummy_attr,
+ NULL
+};
+
+static bool multipath_sysfs_group_visible(struct kobject *kobj)
+{
+ struct device *dev = container_of(kobj, struct device, kobj);
+ struct gendisk *disk = dev_to_disk(dev);
+
+ return is_mpath_disk(disk);
+}
+DEFINE_SIMPLE_SYSFS_GROUP_VISIBLE(multipath_sysfs)
+
+const struct attribute_group mpath_attr_group = {
+ .name = "multipath",
+ .attrs = mpath_attrs,
+ .is_visible = SYSFS_GROUP_VISIBLE(multipath_sysfs),
+};
+EXPORT_SYMBOL_GPL(mpath_attr_group);
+
+const struct attribute_group *mpath_device_groups[] = {
+ &mpath_attr_group,
+ NULL
+};
+EXPORT_SYMBOL_GPL(mpath_device_groups);
+
+ssize_t mpath_iopolicy_show(struct mpath_iopolicy *mpath_iopolicy, char *buf)
+{
+ return sysfs_emit(buf, "%s\n",
+ mpath_iopolicy_names[mpath_read_iopolicy(mpath_iopolicy)]);
+}
+EXPORT_SYMBOL_GPL(mpath_iopolicy_show);
+
+static void mpath_iopolicy_update(struct mpath_iopolicy *mpath_iopolicy,
+ int iopolicy)
+{
+ int old_iopolicy = READ_ONCE(mpath_iopolicy->iopolicy);
+
+ if (old_iopolicy == iopolicy)
+ return;
+
+ WRITE_ONCE(mpath_iopolicy->iopolicy, iopolicy);
+
+ pr_info("iopolicy changed from %s to %s\n",
+ mpath_iopolicy_names[old_iopolicy],
+ mpath_iopolicy_names[iopolicy]);
+}
+
+bool mpath_iopolicy_store(struct mpath_iopolicy *mpath_iopolicy,
+ const char *buf, size_t count)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(mpath_iopolicy_names); i++) {
+ if (sysfs_streq(buf, mpath_iopolicy_names[i])) {
+ mpath_iopolicy_update(mpath_iopolicy, i);
+ return true;
+ }
+ }
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(mpath_iopolicy_store);
+
+ssize_t mpath_numa_nodes_show(struct mpath_device *mpath_device,
+ struct mpath_iopolicy *mpath_iopolicy, char *buf)
+{
+ struct mpath_head *mpath_head = mpath_device->mpath_head;
+ int node, srcu_idx;
+ nodemask_t numa_nodes;
+ struct mpath_device *current_mpath_dev;
+
+ if (mpath_read_iopolicy(mpath_iopolicy) != MPATH_IOPOLICY_NUMA)
+ return 0;
+
+ nodes_clear(numa_nodes);
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ for_each_node(node) {
+ current_mpath_dev =
+ srcu_dereference(mpath_head->current_path[node],
+ &mpath_head->srcu);
+ if (current_mpath_dev == mpath_device)
+ node_set(node, numa_nodes);
+ }
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+
+ return sysfs_emit(buf, "%*pbl\n", nodemask_pr_args(&numa_nodes));
+}
+EXPORT_SYMBOL_GPL(mpath_numa_nodes_show);
+
ssize_t mpath_delayed_removal_secs_show(struct mpath_head *mpath_head,
char *buf)
{
--
2.43.5
* [PATCH v2 09/13] libmultipath: Add PR support
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
` (7 preceding siblings ...)
2026-04-28 11:11 ` [PATCH v2 08/13] libmultipath: Add sysfs helpers John Garry
@ 2026-04-28 11:11 ` John Garry
2026-04-28 11:11 ` [PATCH v2 10/13] libmultipath: Add mpath_bdev_report_zones() John Garry
` (4 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:11 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
Add support for persistent reservations.
Effectively all that is done here is that a multipath version of pr_ops is
created which calls into the bdev fops callback for the mpath_device
selected.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
include/linux/multipath.h | 1 +
lib/multipath.c | 182 ++++++++++++++++++++++++++++++++++++++
2 files changed, 183 insertions(+)
diff --git a/include/linux/multipath.h b/include/linux/multipath.h
index b18491c1d077f..7464c94fbcc3e 100644
--- a/include/linux/multipath.h
+++ b/include/linux/multipath.h
@@ -5,6 +5,7 @@
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/cdev.h>
+#include <linux/pr.h>
#include <linux/srcu.h>
#include <linux/io_uring/cmd.h>
diff --git a/lib/multipath.c b/lib/multipath.c
index 680bb4f0ae237..d2270c70b9913 100644
--- a/lib/multipath.c
+++ b/lib/multipath.c
@@ -466,11 +466,193 @@ static void mpath_bdev_release(struct gendisk *disk)
mpath_put_head(mpath_head);
}
+static int mpath_pr_register(struct block_device *bdev, u64 old_key,
+ u64 new_key, unsigned int flags)
+{
+ struct mpath_head *mpath_head = dev_get_drvdata(&bdev->bd_device);
+ struct mpath_device *mpath_device;
+ int srcu_idx, ret = -EWOULDBLOCK;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+ if (mpath_device) {
+ const struct pr_ops *ops = mpath_device->disk->fops->pr_ops;
+
+ if (!ops || !ops->pr_register) {
+ ret = -EOPNOTSUPP;
+ goto unlock;
+ }
+ ret = ops->pr_register(mpath_device->disk->part0,
+ old_key, new_key, flags);
+ }
+unlock:
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+
+ return ret;
+}
+
+static int mpath_pr_reserve(struct block_device *bdev, u64 key,
+ enum pr_type type, unsigned flags)
+{
+ struct mpath_head *mpath_head = dev_get_drvdata(&bdev->bd_device);
+ struct mpath_device *mpath_device;
+ int srcu_idx, ret = -EWOULDBLOCK;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+ if (mpath_device) {
+ const struct pr_ops *ops = mpath_device->disk->fops->pr_ops;
+
+ if (!ops || !ops->pr_reserve) {
+ ret = -EOPNOTSUPP;
+ goto unlock;
+ }
+ ret = ops->pr_reserve(mpath_device->disk->part0, key,
+ type, flags);
+ }
+unlock:
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+
+ return ret;
+}
+
+static int mpath_pr_release(struct block_device *bdev, u64 key,
+ enum pr_type type)
+{
+ struct mpath_head *mpath_head = dev_get_drvdata(&bdev->bd_device);
+ struct mpath_device *mpath_device;
+ int srcu_idx, ret = -EWOULDBLOCK;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+ if (mpath_device) {
+ const struct pr_ops *ops = mpath_device->disk->fops->pr_ops;
+
+ if (!ops || !ops->pr_release) {
+ ret = -EOPNOTSUPP;
+ goto unlock;
+ }
+ ret = ops->pr_release(mpath_device->disk->part0, key, type);
+ }
+unlock:
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+
+ return ret;
+}
+
+static int mpath_pr_preempt(struct block_device *bdev, u64 old, u64 new,
+ enum pr_type type, bool abort)
+{
+ struct mpath_head *mpath_head = dev_get_drvdata(&bdev->bd_device);
+ struct mpath_device *mpath_device;
+ int srcu_idx, ret = -EWOULDBLOCK;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+ if (mpath_device) {
+ const struct pr_ops *ops = mpath_device->disk->fops->pr_ops;
+
+ if (!ops || !ops->pr_preempt) {
+ ret = -EOPNOTSUPP;
+ goto unlock;
+ }
+ ret = ops->pr_preempt(mpath_device->disk->part0, old,
+ new, type, abort);
+ }
+unlock:
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+
+ return ret;
+}
+
+static int mpath_pr_clear(struct block_device *bdev, u64 key)
+{
+ struct mpath_head *mpath_head = dev_get_drvdata(&bdev->bd_device);
+ struct mpath_device *mpath_device;
+ int srcu_idx, ret = -EWOULDBLOCK;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+ if (mpath_device) {
+ const struct pr_ops *ops = mpath_device->disk->fops->pr_ops;
+
+ if (!ops || !ops->pr_clear) {
+ ret = -EOPNOTSUPP;
+ goto unlock;
+ }
+ ret = ops->pr_clear(mpath_device->disk->part0, key);
+ }
+unlock:
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+
+ return ret;
+}
+
+static int mpath_pr_read_keys(struct block_device *bdev,
+ struct pr_keys *keys_info)
+{
+ struct mpath_head *mpath_head = dev_get_drvdata(&bdev->bd_device);
+ struct mpath_device *mpath_device;
+ int srcu_idx, ret = -EWOULDBLOCK;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+ if (mpath_device) {
+ const struct pr_ops *ops = mpath_device->disk->fops->pr_ops;
+
+ if (!ops || !ops->pr_read_keys) {
+ ret = -EOPNOTSUPP;
+ goto unlock;
+ }
+ ret = ops->pr_read_keys(mpath_device->disk->part0, keys_info);
+ }
+unlock:
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+
+ return ret;
+}
+
+static int mpath_pr_read_reservation(struct block_device *bdev,
+ struct pr_held_reservation *resv)
+{
+ struct mpath_head *mpath_head = dev_get_drvdata(&bdev->bd_device);
+ struct mpath_device *mpath_device;
+ int srcu_idx, ret = -EWOULDBLOCK;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+ if (mpath_device) {
+ const struct pr_ops *ops = mpath_device->disk->fops->pr_ops;
+
+ if (!ops || !ops->pr_read_reservation) {
+ ret = -EOPNOTSUPP;
+ goto unlock;
+ }
+ ret = ops->pr_read_reservation(mpath_device->disk->part0,
+ resv);
+ }
+unlock:
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+
+ return ret;
+}
+
+static const struct pr_ops mpath_pr_ops = {
+ .pr_register = mpath_pr_register,
+ .pr_reserve = mpath_pr_reserve,
+ .pr_release = mpath_pr_release,
+ .pr_preempt = mpath_pr_preempt,
+ .pr_clear = mpath_pr_clear,
+ .pr_read_keys = mpath_pr_read_keys,
+ .pr_read_reservation = mpath_pr_read_reservation,
+};
+
const struct block_device_operations mpath_ops = {
.owner = THIS_MODULE,
.open = mpath_bdev_open,
.release = mpath_bdev_release,
.submit_bio = mpath_bdev_submit_bio,
+ .pr_ops = &mpath_pr_ops,
};
EXPORT_SYMBOL_GPL(mpath_ops);
--
2.43.5
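Note on the PR handlers above: all seven follow the same shape. They pick a path, return -EWOULDBLOCK if there is none, return -EOPNOTSUPP if the path's driver has no pr_ops or lacks the specific callback, and otherwise forward the call. The userspace sketch below models only that decision logic for one operation. All toy_* names are hypothetical, and the SRCU read-side locking used in the patch is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct pr_ops. */
struct toy_pr_ops {
	int (*pr_clear)(unsigned long long key);
};

struct toy_path {
	int usable;                   /* nonzero if the path can take I/O */
	const struct toy_pr_ops *ops; /* may be NULL, as fops->pr_ops may be */
};

struct toy_head {
	struct toy_path *paths;
	size_t nr_paths;
};

/* Example per-path callback: accept any nonzero key. */
static int toy_clear_ok(unsigned long long key)
{
	return key ? 0 : -EINVAL;
}

static const struct toy_pr_ops toy_ops_ok = { .pr_clear = toy_clear_ok };

/* Stand-in for mpath_find_path(): first usable path, or NULL. */
static struct toy_path *toy_find_path(struct toy_head *head)
{
	size_t i;

	for (i = 0; i < head->nr_paths; i++)
		if (head->paths[i].usable)
			return &head->paths[i];
	return NULL;
}

/* Same error policy as mpath_pr_clear(), minus the SRCU locking. */
static int toy_pr_clear(struct toy_head *head, unsigned long long key)
{
	struct toy_path *path = toy_find_path(head);

	if (!path)
		return -EWOULDBLOCK;
	if (!path->ops || !path->ops->pr_clear)
		return -EOPNOTSUPP;
	return path->ops->pr_clear(key);
}
```

The sketch takes the first usable path for simplicity; in the series, path selection depends on the configured iopolicy.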
* [PATCH v2 10/13] libmultipath: Add mpath_bdev_report_zones()
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
` (8 preceding siblings ...)
2026-04-28 11:11 ` [PATCH v2 09/13] libmultipath: Add PR support John Garry
@ 2026-04-28 11:11 ` John Garry
2026-04-28 11:11 ` [PATCH v2 11/13] libmultipath: Add support for block device IOCTL John Garry
` (3 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:11 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
Add a multipath handler for block_device_operations.report_zones.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
lib/multipath.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/lib/multipath.c b/lib/multipath.c
index d2270c70b9913..c72f35e02e2ab 100644
--- a/lib/multipath.c
+++ b/lib/multipath.c
@@ -40,6 +40,30 @@ int mpath_get_iopolicy(char *buf, int iopolicy)
}
EXPORT_SYMBOL_GPL(mpath_get_iopolicy);
+#ifdef CONFIG_BLK_DEV_ZONED
+static int mpath_bdev_report_zones(struct gendisk *disk, sector_t sector,
+ unsigned int nr_zones, struct blk_report_zones_args *args)
+{
+ struct mpath_head *mpath_head = mpath_gendisk_to_head(disk);
+ struct mpath_device *mpath_device;
+ int srcu_idx, ret = -EWOULDBLOCK;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+ if (mpath_device) {
+ if (mpath_device->disk->fops->report_zones)
+ ret = mpath_device->disk->fops->report_zones
+ (mpath_device->disk, sector, nr_zones, args);
+ else
+ ret = -EOPNOTSUPP;
+ }
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+ return ret;
+}
+#else
+#define mpath_bdev_report_zones NULL
+#endif /* CONFIG_BLK_DEV_ZONED */
+
void mpath_synchronize(struct mpath_head *mpath_head)
{
synchronize_srcu(&mpath_head->srcu);
@@ -652,6 +676,7 @@ const struct block_device_operations mpath_ops = {
.open = mpath_bdev_open,
.release = mpath_bdev_release,
.submit_bio = mpath_bdev_submit_bio,
+ .report_zones = mpath_bdev_report_zones,
.pr_ops = &mpath_pr_ops,
};
EXPORT_SYMBOL_GPL(mpath_ops);
--
2.43.5
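Note on the #else branch above: defining mpath_bdev_report_zones as NULL lets the mpath_ops initializer stay free of #ifdefs, since the member is simply left unset when zoned support is compiled out. A small userspace sketch of that idiom is below. TOY_HAVE_ZONES and every toy_* name are hypothetical, and the caller-side NULL check is an assumption about how a consumer of such an ops table would typically behave.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_ops {
	int (*report_zones)(unsigned int nr_zones);
};

#ifdef TOY_HAVE_ZONES
/* Built only when the (hypothetical) feature macro is defined. */
static int toy_report_zones(unsigned int nr_zones)
{
	return (int)nr_zones;
}
#else
/* Feature compiled out: the initializer below stores a NULL pointer. */
#define toy_report_zones NULL
#endif

/* No #ifdef is needed here, whichever branch was taken above. */
static const struct toy_ops toy_ops = {
	.report_zones = toy_report_zones,
};

/* A caller checks for NULL before using the optional callback. */
static int toy_call_report_zones(const struct toy_ops *ops,
				 unsigned int nr_zones)
{
	if (!ops->report_zones)
		return -EOPNOTSUPP;
	return ops->report_zones(nr_zones);
}
```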
* [PATCH v2 11/13] libmultipath: Add support for block device IOCTL
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
` (9 preceding siblings ...)
2026-04-28 11:11 ` [PATCH v2 10/13] libmultipath: Add mpath_bdev_report_zones() John Garry
@ 2026-04-28 11:11 ` John Garry
2026-04-28 11:11 ` [PATCH v2 12/13] libmultipath: Add mpath_bdev_getgeo() John Garry
` (2 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:11 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
Add mpath_bdev_ioctl() as a multipath block device IOCTL handler. This
handler calls into the mpath_device bdev fops handler.
Like what is done for the cdev IOCTL handler, use the .ioctl_begin and
.ioctl_finish methods to know when to unlock the SRCU read lock before
calling into the path's handler - this is for special NVMe controller
IOCTL handling.
The .compat_ioctl handler is given the standard handler.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
lib/multipath.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/lib/multipath.c b/lib/multipath.c
index c72f35e02e2ab..e2998c1b277c0 100644
--- a/lib/multipath.c
+++ b/lib/multipath.c
@@ -490,6 +490,38 @@ static void mpath_bdev_release(struct gendisk *disk)
mpath_put_head(mpath_head);
}
+static int mpath_bdev_ioctl(struct block_device *bdev, blk_mode_t mode,
+ unsigned int cmd, unsigned long arg)
+{
+ struct gendisk *disk = bdev->bd_disk;
+ struct mpath_head *mpath_head = mpath_gendisk_to_head(disk);
+ struct mpath_device *mpath_device;
+ int srcu_idx, err;
+ void *unlocked_ioctl_data = NULL;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+ if (!mpath_device) {
+ err = -EWOULDBLOCK;
+ goto out_unlock;
+ }
+
+ if (mpath_head->mpdt->ioctl_begin)
+ mpath_head->mpdt->ioctl_begin(mpath_device, cmd,
+ &unlocked_ioctl_data);
+ if (unlocked_ioctl_data)
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+ err = mpath_device->disk->fops->ioctl(
+ mpath_device->disk->part0, mode, cmd, arg);
+ if (unlocked_ioctl_data) {
+ mpath_head->mpdt->ioctl_finish(unlocked_ioctl_data);
+ return err;
+ }
+out_unlock:
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+ return err;
+}
+
static int mpath_pr_register(struct block_device *bdev, u64 old_key,
u64 new_key, unsigned int flags)
{
@@ -676,6 +708,8 @@ const struct block_device_operations mpath_ops = {
.open = mpath_bdev_open,
.release = mpath_bdev_release,
.submit_bio = mpath_bdev_submit_bio,
+ .ioctl = mpath_bdev_ioctl,
+ .compat_ioctl = blkdev_compat_ptr_ioctl,
.report_zones = mpath_bdev_report_zones,
.pr_ops = &mpath_pr_ops,
};
--
2.43.5
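Note on the control flow of mpath_bdev_ioctl() above: the read lock is released exactly once on every exit. It is dropped early if .ioctl_begin handed back a cookie, and at out_unlock otherwise. The userspace sketch below models the lock as a counter to make that visible. What .ioctl_begin really does to keep the path alive is not shown in this patch, so the cmd >= 100 rule and all toy_* names are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_head {
	int read_locks;         /* models the SRCU read-side nesting depth */
	int finish_calls;       /* how often the finish hook ran */
	int locks_during_ioctl; /* lock depth seen by the path handler */
};

static void toy_lock(struct toy_head *h)   { h->read_locks++; }
static void toy_unlock(struct toy_head *h) { h->read_locks--; }

/* Hypothetical rule: commands >= 100 must run without the lock held. */
static void toy_ioctl_begin(struct toy_head *h, unsigned int cmd,
			    void **cookie)
{
	if (cmd >= 100)
		*cookie = h;
}

static void toy_ioctl_finish(void *cookie)
{
	((struct toy_head *)cookie)->finish_calls++;
}

/* Stand-in for the per-path handler; records the lock depth it saw. */
static int toy_path_ioctl(struct toy_head *h, unsigned int cmd)
{
	h->locks_during_ioctl = h->read_locks;
	return (int)cmd;
}

/* Mirrors the branch structure of mpath_bdev_ioctl(). */
static int toy_ioctl(struct toy_head *h, int have_path, unsigned int cmd)
{
	void *cookie = NULL;
	int err;

	toy_lock(h);
	if (!have_path) {
		err = -EWOULDBLOCK;
		goto out_unlock;
	}

	toy_ioctl_begin(h, cmd, &cookie);
	if (cookie)
		toy_unlock(h); /* early unlock for the special case */
	err = toy_path_ioctl(h, cmd);
	if (cookie) {
		toy_ioctl_finish(cookie);
		return err;    /* lock was already dropped above */
	}
out_unlock:
	toy_unlock(h);
	return err;
}
```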
* [PATCH v2 12/13] libmultipath: Add mpath_bdev_getgeo()
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
` (10 preceding siblings ...)
2026-04-28 11:11 ` [PATCH v2 11/13] libmultipath: Add support for block device IOCTL John Garry
@ 2026-04-28 11:11 ` John Garry
2026-04-28 11:11 ` [PATCH v2 13/13] libmultipath: Add mpath_bdev_get_unique_id() John Garry
2026-05-10 22:03 ` [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers Sagi Grimberg
13 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:11 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
Add mpath_bdev_getgeo() as a multipath block device .getgeo handler.
Here we just redirect into the selected mpath_device disk fops->getgeo
handler.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
lib/multipath.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/lib/multipath.c b/lib/multipath.c
index e2998c1b277c0..1228837e5eeac 100644
--- a/lib/multipath.c
+++ b/lib/multipath.c
@@ -522,6 +522,26 @@ static int mpath_bdev_ioctl(struct block_device *bdev, blk_mode_t mode,
return err;
}
+static int mpath_bdev_getgeo(struct gendisk *disk, struct hd_geometry *geo)
+{
+ struct mpath_head *mpath_head = mpath_gendisk_to_head(disk);
+ int srcu_idx, ret = -EWOULDBLOCK;
+ struct mpath_device *mpath_device;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+ if (mpath_device) {
+ if (mpath_device->disk->fops->getgeo)
+ ret = mpath_device->disk->fops->getgeo(
+ mpath_device->disk, geo);
+ else
+ ret = -ENOTTY; /* See blkdev_getgeo */
+ }
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+
+ return ret;
+}
+
static int mpath_pr_register(struct block_device *bdev, u64 old_key,
u64 new_key, unsigned int flags)
{
@@ -711,6 +731,7 @@ const struct block_device_operations mpath_ops = {
.ioctl = mpath_bdev_ioctl,
.compat_ioctl = blkdev_compat_ptr_ioctl,
.report_zones = mpath_bdev_report_zones,
+ .getgeo = mpath_bdev_getgeo,
.pr_ops = &mpath_pr_ops,
};
EXPORT_SYMBOL_GPL(mpath_ops);
--
2.43.5
* [PATCH v2 13/13] libmultipath: Add mpath_bdev_get_unique_id()
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
` (11 preceding siblings ...)
2026-04-28 11:11 ` [PATCH v2 12/13] libmultipath: Add mpath_bdev_getgeo() John Garry
@ 2026-04-28 11:11 ` John Garry
2026-05-10 22:03 ` [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers Sagi Grimberg
13 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-04-28 11:11 UTC (permalink / raw)
To: hch, kbusch, sagi, axboe, martin.petersen, james.bottomley, hare,
bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel, John Garry
Add mpath_bdev_get_unique_id() as a multipath block device .get_unique_id
handler.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
lib/multipath.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/lib/multipath.c b/lib/multipath.c
index 1228837e5eeac..001a02e6df274 100644
--- a/lib/multipath.c
+++ b/lib/multipath.c
@@ -490,6 +490,26 @@ static void mpath_bdev_release(struct gendisk *disk)
mpath_put_head(mpath_head);
}
+static int mpath_bdev_get_unique_id(struct gendisk *disk, u8 id[16],
+ enum blk_unique_id type)
+{
+ struct mpath_head *mpath_head = mpath_gendisk_to_head(disk);
+ int srcu_idx, ret = -EWOULDBLOCK;
+ struct mpath_device *mpath_device;
+
+ srcu_idx = srcu_read_lock(&mpath_head->srcu);
+ mpath_device = mpath_find_path(mpath_head);
+ if (mpath_device) {
+ if (mpath_device->disk->fops->get_unique_id)
+ ret = mpath_device->disk->fops->get_unique_id(
+ mpath_device->disk, id, type);
+ else
+ ret = 0; /* referencing __dm_get_unique_id() */
+ }
+ srcu_read_unlock(&mpath_head->srcu, srcu_idx);
+
+ return ret;
+}
static int mpath_bdev_ioctl(struct block_device *bdev, blk_mode_t mode,
unsigned int cmd, unsigned long arg)
{
@@ -730,6 +750,7 @@ const struct block_device_operations mpath_ops = {
.submit_bio = mpath_bdev_submit_bio,
.ioctl = mpath_bdev_ioctl,
.compat_ioctl = blkdev_compat_ptr_ioctl,
+ .get_unique_id = mpath_bdev_get_unique_id,
.report_zones = mpath_bdev_report_zones,
.getgeo = mpath_bdev_getgeo,
.pr_ops = &mpath_pr_ops,
--
2.43.5
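Note on the forwarding handlers in patches 10, 12 and 13: they share one skeleton and differ only in what they return when the selected path's driver does not implement the callback: -EOPNOTSUPP for report_zones, -ENOTTY for getgeo, and 0 for get_unique_id. The sketch below expresses that as a single hypothetical helper with the fallback passed in. The series itself open-codes each handler; toy_forward() and the other toy_* names do not exist in it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_path {
	int id;
};

typedef int (*toy_cb_t)(struct toy_path *path);

/*
 * Forward to cb on the selected path.
 *   - no path available:     -EWOULDBLOCK (same in every handler)
 *   - callback not provided: the per-operation fallback value
 */
static int toy_forward(struct toy_path *path, toy_cb_t cb, int fallback)
{
	if (!path)
		return -EWOULDBLOCK;
	if (!cb)
		return fallback;
	return cb(path);
}

/* Example callback: report the path id. */
static int toy_cb_id(struct toy_path *path)
{
	return path->id;
}
```

For example, toy_forward(path, NULL, -ENOTTY) corresponds to the getgeo case and toy_forward(path, NULL, 0) to the get_unique_id case.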
* Re: [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers
2026-04-28 11:10 [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers John Garry
` (12 preceding siblings ...)
2026-04-28 11:11 ` [PATCH v2 13/13] libmultipath: Add mpath_bdev_get_unique_id() John Garry
@ 2026-05-10 22:03 ` Sagi Grimberg
2026-05-11 7:30 ` John Garry
13 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2026-05-10 22:03 UTC (permalink / raw)
To: John Garry, hch, kbusch, axboe, martin.petersen, james.bottomley,
hare, bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel
On 28/04/2026 14:10, John Garry wrote:
> libmultipath: a generic multipath lib for block drivers
This is very nice John.
>
> This series introduces libmultipath. It is essentially a refactoring of
> NVME multipath support, so we can have a common library to also support
> native SCSI multipath.
>
> Much of the code is taken directly from the NVMe multipath code. However,
> NVMe specifics are removed. A template structure is provided so the driver
> may provide callbacks for driver specifics, like ANA support for NVMe.
>
> Important new structures introduced include:
>
> - mpath_head
> These contain much of the multipath-specific functionality from
> nvme_ns_head, including a pointer to the gendisk structure and
> a path SRCU-based array.
I think it should be placed first in its parent struct, as it holds the
hot-path head->srcu and head->list.
>
> - mpath_device
> This is the per-path structure, and contains much the same
> multipath-specific functionality in nvme_ns
>
> libmultipath provides functionality for path management, path selection,
> data path, and failover handling.
>
> Since the NVMe driver has some code in the sysfs and ioctl handling
> which iterate all multipath NSes, functions like mpath_call_for_device()
> are added to do the same per-path iteration.
Very nice, overall seems fairly straightforward.
* Re: [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers
2026-05-10 22:03 ` [PATCH v2 00/13] libmultipath: a generic multipath lib for block drivers Sagi Grimberg
@ 2026-05-11 7:30 ` John Garry
0 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2026-05-11 7:30 UTC (permalink / raw)
To: Sagi Grimberg, hch, kbusch, axboe, martin.petersen,
james.bottomley, hare, bmarzins, nilay
Cc: jmeneghi, linux-nvme, linux-scsi, michael.christie, snitzer,
dm-devel, linux-kernel
On 10/05/2026 23:03, Sagi Grimberg wrote:
>> Important new structures introduced include:
>>
>> - mpath_head
>> These contain much of the multipath-specific functionality from
>> nvme_ns_head, including a pointer to the gendisk structure and
>> a path SRCU-based array.
>
> I think it should be placed first in its parent struct, as it holds the
> hot-path head->srcu and head->list.
>
Yeah, I did originally try this. However, it becomes a pain to manage
the lifecycle of the mpath_head and nvme_ns_head/scsi_mpath_head
structures, especially in scenarios like delayed head removal.
I can try again to see if I can make it work.
>>
>> - mpath_device
>> This is the per-path structure, and contains much the same
>> multipath-specific functionality in nvme_ns
>>
>> libmultipath provides functionality for path management, path selection,
>> data path, and failover handling.
>>
>> Since the NVMe driver has some code in the sysfs and ioctl handling
>> which iterate all multipath NSes, functions like mpath_call_for_device()
>> are added to do the same per-path iteration.
>
> Very nice, overall seems fairly straightforward.
thanks a lot
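
Note on the layout point discussed above: if a head structure is embedded as the first member of its parent, its address equals the parent's address, so its leading fields share the parent's first cache line, and a container_of()-style conversion subtracts an offset of zero. The userspace sketch below shows the pointer arithmetic for both placements. It assumes an embedded layout, which this thread does not confirm, and every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(), minus type checks. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the generic multipath head. */
struct toy_mpath_head {
	int srcu_placeholder;
	int list_placeholder;
};

/* Head embedded first: it starts at offset 0 of the parent. */
struct toy_parent_first {
	struct toy_mpath_head mpath_head;
	long driver_private[8];
};

/* Head embedded later: it starts after the driver-private fields. */
struct toy_parent_later {
	long driver_private[8];
	struct toy_mpath_head mpath_head;
};

/* Recover the parent from a pointer to the embedded head. */
static struct toy_parent_first *toy_first_from_head(struct toy_mpath_head *h)
{
	return toy_container_of(h, struct toy_parent_first, mpath_head);
}

static struct toy_parent_later *toy_later_from_head(struct toy_mpath_head *h)
{
	return toy_container_of(h, struct toy_parent_later, mpath_head);
}
```

Both conversions recover the parent correctly; only the offset differs. The lifecycle concern raised in the reply is separate from this arithmetic and is not modelled here.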