* [PATCHv2 0/3] nvme: multi-path scan races fixes
@ 2026-02-26 18:32 Keith Busch
2026-02-26 18:32 ` [PATCHv2 1/3] nvme-multipath: fix leak on try_module_get failure Keith Busch
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Keith Busch @ 2026-02-26 18:32 UTC (permalink / raw)
To: linux-nvme, hch, nilay; +Cc: Keith Busch
From: Keith Busch <kbusch@kernel.org>
Changes from v1:
* Added reviews and Fixes tags to patch 1
* Redid the logic in patch 2 to match Christoph's suggestions
* Also fixed a compile bug for CONFIG_NVME_MULTIPATH=n
* Added a new patch 3 to fix a dependency race between multiple
controllers' scan_work
Keith Busch (3):
nvme-multipath: fix leak on try_module_get failure
nvme-multipath: rescan siblings on last path removal
nvme: fix unmatched id's under delayed path deletion
drivers/nvme/host/core.c | 37 ++++++++++++++++++++++++++++++++++-
drivers/nvme/host/multipath.c | 12 +++++-------
drivers/nvme/host/nvme.h | 9 +++++++++
3 files changed, 50 insertions(+), 8 deletions(-)
--
2.47.3
* [PATCHv2 1/3] nvme-multipath: fix leak on try_module_get failure
2026-02-26 18:32 [PATCHv2 0/3] nvme: multi-path scan races fixes Keith Busch
@ 2026-02-26 18:32 ` Keith Busch
2026-02-26 18:32 ` [PATCHv2 2/3] nvme-multipath: rescan siblings on last path removal Keith Busch
2026-02-26 18:32 ` [PATCHv2 3/3] nvme: fix unmatched id's under delayed path deletion Keith Busch
2 siblings, 0 replies; 10+ messages in thread
From: Keith Busch @ 2026-02-26 18:32 UTC (permalink / raw)
To: linux-nvme, hch, nilay; +Cc: Keith Busch, John Garry
From: Keith Busch <kbusch@kernel.org>
We need to fall back to the synchronous removal if we can't get a
reference on the module needed for the deferred removal.
Fixes: 62188639ec16 ("nvme-multipath: introduce delayed removal of the multipath head node")
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/nvme/host/multipath.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index bfcc5904e6a26..fc6800a9f7f94 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -1310,13 +1310,11 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
if (!list_empty(&head->list))
goto out;
- if (head->delayed_removal_secs) {
- /*
- * Ensure that no one could remove this module while the head
- * remove work is pending.
- */
- if (!try_module_get(THIS_MODULE))
- goto out;
+ /*
+ * Ensure that no one could remove this module while the head
+ * remove work is pending.
+ */
+ if (head->delayed_removal_secs && try_module_get(THIS_MODULE)) {
mod_delayed_work(nvme_wq, &head->remove_work,
head->delayed_removal_secs * HZ);
} else {
--
2.47.3
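
The control-flow change in patch 1 can be sketched in userspace as follows. This is only an illustration under stated assumptions: `head_model`, `remove_disk_old`, `remove_disk_new`, and the `module_ref_ok` parameter are hypothetical stand-ins, not kernel symbols. `module_ref_ok` models the return value of try_module_get(THIS_MODULE).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the relevant nvme_ns_head state. */
struct head_model {
	unsigned int delayed_removal_secs; /* 0 means delayed removal is off */
	bool removal_scheduled;            /* deferred removal work was queued */
	bool removed_now;                  /* synchronous removal ran */
};

/* Pre-patch shape: a failed module reference skips removal entirely. */
void remove_disk_old(struct head_model *h, bool module_ref_ok)
{
	if (h->delayed_removal_secs) {
		if (!module_ref_ok)
			return; /* neither path runs, so the head is leaked */
		h->removal_scheduled = true;
	} else {
		h->removed_now = true;
	}
}

/* Post-patch shape: fall back to synchronous removal on failure. */
void remove_disk_new(struct head_model *h, bool module_ref_ok)
{
	if (h->delayed_removal_secs && module_ref_ok)
		h->removal_scheduled = true;
	else
		h->removed_now = true;
}
```

With delayed removal configured and the module reference unavailable, the old shape leaves both flags clear, while the new shape takes the synchronous branch.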
* [PATCHv2 2/3] nvme-multipath: rescan siblings on last path removal
2026-02-26 18:32 [PATCHv2 0/3] nvme: multi-path scan races fixes Keith Busch
2026-02-26 18:32 ` [PATCHv2 1/3] nvme-multipath: fix leak on try_module_get failure Keith Busch
@ 2026-02-26 18:32 ` Keith Busch
2026-02-27 6:59 ` Nilay Shroff
2026-02-27 13:54 ` Christoph Hellwig
2026-02-26 18:32 ` [PATCHv2 3/3] nvme: fix unmatched id's under delayed path deletion Keith Busch
2 siblings, 2 replies; 10+ messages in thread
From: Keith Busch @ 2026-02-26 18:32 UTC (permalink / raw)
To: linux-nvme, hch, nilay; +Cc: Keith Busch
From: Keith Busch <kbusch@kernel.org>
When a controller's scan removes the last path to a multipath namespace
head, sibling controllers in the same subsystem may need to rescan. A
concurrent scan on another controller could have encountered the stale
head with mismatched identifiers (e.g. from a recycled NSID) and failed
to set up the namespace. Without notification, the sibling won't retry
until the next AEN or explicit rescan.
After a scan that performed last-path removals, notify all controllers
in the subsystem to rescan by requeueing their scan work. This ensures
that recycled NSIDs are promptly discovered by sibling controllers.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/nvme/host/core.c | 19 ++++++++++++++++++-
drivers/nvme/host/nvme.h | 1 +
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3de52f1d27234..5bb4b18511b7b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4262,8 +4262,10 @@ static void nvme_ns_remove(struct nvme_ns *ns)
mutex_unlock(&ns->ctrl->namespaces_lock);
synchronize_srcu(&ns->ctrl->srcu);
- if (last_path)
+ if (last_path) {
nvme_mpath_remove_disk(ns->head);
+ set_bit(NVME_CTRL_SCAN_REMOVED_NS, &ns->ctrl->flags);
+ }
nvme_put_ns(ns);
}
@@ -4530,6 +4532,21 @@ static void nvme_scan_work(struct work_struct *work)
}
mutex_unlock(&ctrl->scan_lock);
+ /*
+ * If the scan removed the last path to a namespace, notify all
+ * controllers in the subsystem to rescan. A controller that is
+ * concurrently scanning may have missed the namespace due to the
+ * stale head still occupying the NSID in the subsystem list.
+ */
+ if (test_and_clear_bit(NVME_CTRL_SCAN_REMOVED_NS, &ctrl->flags)) {
+ struct nvme_ctrl *tmp;
+
+ mutex_lock(&ctrl->subsys->lock);
+ list_for_each_entry(tmp, &ctrl->subsys->ctrls, subsys_entry)
+ nvme_queue_scan(tmp);
+ mutex_unlock(&ctrl->subsys->lock);
+ }
+
/* Requeue if we have missed AENs */
if (test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events))
nvme_queue_scan(ctrl);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9971045dbc05e..e73cc2e67ac51 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -329,6 +329,7 @@ enum nvme_ctrl_flags {
NVME_CTRL_SKIP_ID_CNS_CS = 4,
NVME_CTRL_DIRTY_CAPABILITY = 5,
NVME_CTRL_FROZEN = 6,
+ NVME_CTRL_SCAN_REMOVED_NS = 7,
};
struct nvme_ctrl {
--
2.47.3
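
The flag handoff in patch 2 can be modeled in userspace roughly like this. It is a sketch, not the kernel implementation: `ctrl_model`, `note_last_path_removed`, `test_and_clear_removed`, and `finish_scan` are hypothetical names, C11 atomics stand in for the kernel's set_bit()/test_and_clear_bit(), and a plain array stands in for the subsystem's controller list (locking is omitted).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SCAN_REMOVED_NS_BIT 7 /* mirrors NVME_CTRL_SCAN_REMOVED_NS = 7 */

/* Hypothetical stand-in for the relevant nvme_ctrl state. */
struct ctrl_model {
	atomic_ulong flags;
	unsigned int scans_queued; /* counts requeued scan work */
};

/* Called when a scan drops the last path to a namespace head. */
void note_last_path_removed(struct ctrl_model *c)
{
	atomic_fetch_or(&c->flags, 1UL << SCAN_REMOVED_NS_BIT);
}

/* Atomically clear the bit and report whether it had been set. */
bool test_and_clear_removed(struct ctrl_model *c)
{
	unsigned long mask = 1UL << SCAN_REMOVED_NS_BIT;

	return (atomic_fetch_and(&c->flags, ~mask) & mask) != 0;
}

/*
 * End of scan work: if this scan removed a last path, requeue a scan on
 * every controller in the subsystem (the scanning one included).
 * Returns the number of controllers queued.
 */
size_t finish_scan(struct ctrl_model *self, struct ctrl_model *subsys, size_t n)
{
	size_t queued = 0;

	if (!test_and_clear_removed(self))
		return 0;
	for (size_t i = 0; i < n; i++) {
		subsys[i].scans_queued++;
		queued++;
	}
	return queued;
}
```

Because the bit is cleared in the same atomic step that tests it, a second call to `finish_scan()` without an intervening removal queues nothing.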
* [PATCHv2 3/3] nvme: fix unmatched id's under delayed path deletion
2026-02-26 18:32 [PATCHv2 0/3] nvme: multi-path scan races fixes Keith Busch
2026-02-26 18:32 ` [PATCHv2 1/3] nvme-multipath: fix leak on try_module_get failure Keith Busch
2026-02-26 18:32 ` [PATCHv2 2/3] nvme-multipath: rescan siblings on last path removal Keith Busch
@ 2026-02-26 18:32 ` Keith Busch
2026-02-27 5:55 ` Nilay Shroff
2026-02-27 13:54 ` Christoph Hellwig
2 siblings, 2 replies; 10+ messages in thread
From: Keith Busch @ 2026-02-26 18:32 UTC (permalink / raw)
To: linux-nvme, hch, nilay; +Cc: Keith Busch
From: Keith Busch <kbusch@kernel.org>
The NVMe controller is allowed to reuse an NSID for a new namespace after
deleting the previous namespace that had been using it. The delayed removal may
have the stale namespace head in the subsystem list pending the timer, which
would cause the scan to falsely report an ID mismatch error for the new
namespace. Flush the pending removal work and retry to resolve this.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/nvme/host/core.c | 18 ++++++++++++++++++
drivers/nvme/host/nvme.h | 8 ++++++++
2 files changed, 26 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5bb4b18511b7b..906421047debd 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3968,6 +3968,7 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
{
struct nvme_ctrl *ctrl = ns->ctrl;
struct nvme_ns_head *head = NULL;
+ bool retried = false;
int ret;
ret = nvme_global_check_duplicate_ids(ctrl->subsys, &info->ids);
@@ -4008,6 +4009,7 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
ctrl->quirks |= NVME_QUIRK_BOGUS_NID;
}
+again:
mutex_lock(&ctrl->subsys->lock);
head = nvme_find_ns_head(ctrl, info->nsid);
if (!head) {
@@ -4033,6 +4035,22 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
goto out_put_ns_head;
}
if (!nvme_ns_ids_equal(&head->ids, &info->ids)) {
+ /*
+ * A newly created namespace can reuse an NSID that was
+ * previously deleted. If the head has no active paths,
+ * it is pending delayed removal and still occupying
+ * this NSID in the subsystem list. Flush the removal
+ * work to clear the stale head and retry.
+ */
+ if (!retried && multipath && list_empty(&head->list)) {
+ mutex_unlock(&ctrl->subsys->lock);
+ nvme_mpath_flush_remove_work(head);
+ nvme_put_ns_head(head);
+ retried = true;
+ goto again;
+ }
+
+ WARN_ON_ONCE(list_empty(&head->list));
dev_err(ctrl->device,
"IDs don't match for shared namespace %d\n",
info->nsid);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index e73cc2e67ac51..44801801fc289 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1042,6 +1042,11 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head);
void nvme_mpath_start_request(struct request *rq);
void nvme_mpath_end_request(struct request *rq);
+static inline void nvme_mpath_flush_remove_work(struct nvme_ns_head *head)
+{
+ flush_delayed_work(&head->remove_work);
+}
+
static inline void nvme_trace_bio_complete(struct request *req)
{
struct nvme_ns *ns = req->q->queuedata;
@@ -1110,6 +1115,9 @@ static inline void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
static inline void nvme_mpath_remove_disk(struct nvme_ns_head *head)
{
}
+static inline void nvme_mpath_flush_remove_work(struct nvme_ns_head *head)
+{
+}
static inline void nvme_trace_bio_complete(struct request *req)
{
}
--
2.47.3
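
The retry-once logic in patch 3 can be illustrated with a small userspace model. Everything here is an assumption made for illustration: `ns_head_model`, `subsys_model`, `find_head`, `flush_remove_work`, and `init_ns_head` are hypothetical names, an `int` stands in for the namespace identifiers, and `removal_pending` models whether the delayed removal work has already been scheduled. Reference counting and locking are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HEADS 4

/* Hypothetical stand-in for a namespace head in the subsystem list. */
struct ns_head_model {
	int nsid;
	int ids;              /* stand-in for the namespace identifiers */
	int active_paths;     /* 0 models list_empty(&head->list) */
	bool removal_pending; /* delayed removal work has been scheduled */
	bool present;         /* still in the subsystem list */
};

struct subsys_model {
	struct ns_head_model heads[MAX_HEADS];
};

struct ns_head_model *find_head(struct subsys_model *s, int nsid)
{
	for (size_t i = 0; i < MAX_HEADS; i++)
		if (s->heads[i].present && s->heads[i].nsid == nsid)
			return &s->heads[i];
	return NULL;
}

/* Flushing only helps if the removal work was actually scheduled. */
void flush_remove_work(struct ns_head_model *h)
{
	if (h->removal_pending) {
		h->removal_pending = false;
		h->present = false;
	}
}

/* Returns 0 on success, -1 on a persistent ID mismatch, -2 if full. */
int init_ns_head(struct subsys_model *s, int nsid, int ids)
{
	struct ns_head_model *h;
	bool retried = false;

again:
	h = find_head(s, nsid);
	if (!h) {
		for (size_t i = 0; i < MAX_HEADS; i++) {
			if (s->heads[i].present)
				continue;
			s->heads[i] = (struct ns_head_model){
				.nsid = nsid, .ids = ids,
				.active_paths = 1, .present = true,
			};
			return 0;
		}
		return -2;
	}
	if (h->ids != ids) {
		/* Stale head with no paths: flush its removal, retry once. */
		if (!retried && h->active_paths == 0) {
			flush_remove_work(h);
			retried = true;
			goto again;
		}
		return -1;
	}
	h->active_paths++;
	return 0;
}
```

In this model, a stale head whose removal is pending is cleared by the flush and the retry succeeds. A stale head whose removal has not been scheduled yet still yields a mismatch after the single retry, which is the case the sibling rescan from patch 2 is meant to cover.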
* Re: [PATCHv2 3/3] nvme: fix unmatched id's under delayed path deletion
2026-02-26 18:32 ` [PATCHv2 3/3] nvme: fix unmatched id's under delayed path deletion Keith Busch
@ 2026-02-27 5:55 ` Nilay Shroff
2026-02-27 13:54 ` Christoph Hellwig
1 sibling, 0 replies; 10+ messages in thread
From: Nilay Shroff @ 2026-02-27 5:55 UTC (permalink / raw)
To: Keith Busch, linux-nvme, hch; +Cc: Keith Busch
On 2/27/26 12:02 AM, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> The NVMe controller is allowed to reuse an NSID for a new namespace after
> deleting the previous namespace that had been using it. The delayed removal may
> have the stale namespace head in the subsystem list pending the timer, which
> would cause the scan to falsely report an ID mismatch error for the new
> namespace. Flush the pending removal work and retry to resolve this.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
* Re: [PATCHv2 2/3] nvme-multipath: rescan siblings on last path removal
2026-02-26 18:32 ` [PATCHv2 2/3] nvme-multipath: rescan siblings on last path removal Keith Busch
@ 2026-02-27 6:59 ` Nilay Shroff
2026-02-27 15:34 ` Keith Busch
2026-02-27 13:54 ` Christoph Hellwig
1 sibling, 1 reply; 10+ messages in thread
From: Nilay Shroff @ 2026-02-27 6:59 UTC (permalink / raw)
To: Keith Busch, linux-nvme, hch; +Cc: Keith Busch
On 2/27/26 12:02 AM, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> When a controller's scan removes the last path to a multipath namespace
> head, sibling controllers in the same subsystem may need to rescan. A
> concurrent scan on another controller could have encountered the stale
> head with mismatched identifiers (e.g. from a recycled NSID) and failed
> to set up the namespace. Without notification, the sibling won't retry
> until the next AEN or explicit rescan.
>
> After a scan that performed last-path removals, notify all controllers
> in the subsystem to rescan by requeueing their scan work. This ensures
> that recycled NSIDs are promptly discovered by sibling controllers.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> drivers/nvme/host/core.c | 19 ++++++++++++++++++-
> drivers/nvme/host/nvme.h | 1 +
> 2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 3de52f1d27234..5bb4b18511b7b 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4262,8 +4262,10 @@ static void nvme_ns_remove(struct nvme_ns *ns)
> mutex_unlock(&ns->ctrl->namespaces_lock);
> synchronize_srcu(&ns->ctrl->srcu);
>
> - if (last_path)
> + if (last_path) {
> nvme_mpath_remove_disk(ns->head);
> + set_bit(NVME_CTRL_SCAN_REMOVED_NS, &ns->ctrl->flags);
> + }
> nvme_put_ns(ns);
> }
>
> @@ -4530,6 +4532,21 @@ static void nvme_scan_work(struct work_struct *work)
> }
> mutex_unlock(&ctrl->scan_lock);
>
> + /*
> + * If the scan removed the last path to a namespace, notify all
> + * controllers in the subsystem to rescan. A controller that is
> + * concurrently scanning may have missed the namespace due to the
> + * stale head still occupying the NSID in the subsystem list.
> + */
> + if (test_and_clear_bit(NVME_CTRL_SCAN_REMOVED_NS, &ctrl->flags)) {
> + struct nvme_ctrl *tmp;
> +
> + mutex_lock(&ctrl->subsys->lock);
> + list_for_each_entry(tmp, &ctrl->subsys->ctrls, subsys_entry)
> + nvme_queue_scan(tmp);
> + mutex_unlock(&ctrl->subsys->lock);
> + }
> +
> /* Requeue if we have missed AENs */
> if (test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events))
> nvme_queue_scan(ctrl);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 9971045dbc05e..e73cc2e67ac51 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -329,6 +329,7 @@ enum nvme_ctrl_flags {
> NVME_CTRL_SKIP_ID_CNS_CS = 4,
> NVME_CTRL_DIRTY_CAPABILITY = 5,
> NVME_CTRL_FROZEN = 6,
> + NVME_CTRL_SCAN_REMOVED_NS = 7,
> };
>
> struct nvme_ctrl {
Should we consider reordering patches 2/3 and 3/3? It seems like this
patch depends on the changes introduced in 3/3.
Without the updates from 3/3, if delayed head removal is still in effect
and the NSIDs are recycled with mismatched identifiers, a forced rescan
of the sibling controller may still fail to instantiate the namespace.
In that case, requeueing the scan alone might not be sufficient.
Thanks,
--Nilay
* Re: [PATCHv2 2/3] nvme-multipath: rescan siblings on last path removal
2026-02-26 18:32 ` [PATCHv2 2/3] nvme-multipath: rescan siblings on last path removal Keith Busch
2026-02-27 6:59 ` Nilay Shroff
@ 2026-02-27 13:54 ` Christoph Hellwig
1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2026-02-27 13:54 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-nvme, hch, nilay, Keith Busch
On Thu, Feb 26, 2026 at 10:32:15AM -0800, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> When a controller's scan removes the last path to a multipath namespace
> head, sibling controllers in the same subsystem may need to rescan. A
> concurrent scan on another controller could have encountered the stale
> head with mismatched identifiers (e.g. from a recycled NSID) and failed
> to set up the namespace. Without notification, the sibling won't retry
> until the next AEN or explicit rescan.
>
> After a scan that performed last-path removals, notify all controllers
> in the subsystem to rescan by requeueing their scan work. This ensures
> that recycled NSIDs are promptly discovered by sibling controllers.
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCHv2 3/3] nvme: fix unmatched id's under delayed path deletion
2026-02-26 18:32 ` [PATCHv2 3/3] nvme: fix unmatched id's under delayed path deletion Keith Busch
2026-02-27 5:55 ` Nilay Shroff
@ 2026-02-27 13:54 ` Christoph Hellwig
1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2026-02-27 13:54 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-nvme, hch, nilay, Keith Busch
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCHv2 2/3] nvme-multipath: rescan siblings on last path removal
2026-02-27 6:59 ` Nilay Shroff
@ 2026-02-27 15:34 ` Keith Busch
2026-02-27 15:57 ` Nilay Shroff
0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2026-02-27 15:34 UTC (permalink / raw)
To: Nilay Shroff; +Cc: Keith Busch, linux-nvme, hch
On Fri, Feb 27, 2026 at 12:29:00PM +0530, Nilay Shroff wrote:
> Should we consider reordering patches 2/3 and 3/3? It seems like this
> patch depends on the changes introduced in 3/3.
>
> Without the updates from 3/3, if delayed head removal is still in effect
> and the NSIDs are recycled with mismatched identifiers, a forced rescan
> of the sibling controller may still fail to instantiate the namespace.
> In that case, requeueing the scan alone might not be sufficient.
But the other way around, if we're racing with another controller's scan
to complete the last path removal, then the delayed work hasn't been
scheduled yet so patch 3 has a significant gap without patch 2 preceding
it.
* Re: [PATCHv2 2/3] nvme-multipath: rescan siblings on last path removal
2026-02-27 15:34 ` Keith Busch
@ 2026-02-27 15:57 ` Nilay Shroff
0 siblings, 0 replies; 10+ messages in thread
From: Nilay Shroff @ 2026-02-27 15:57 UTC (permalink / raw)
To: Keith Busch; +Cc: Keith Busch, linux-nvme, hch
On 2/27/26 9:04 PM, Keith Busch wrote:
> On Fri, Feb 27, 2026 at 12:29:00PM +0530, Nilay Shroff wrote:
>> Should we consider reordering patches 2/3 and 3/3? It seems like this
>> patch depends on the changes introduced in 3/3.
>>
>> Without the updates from 3/3, if delayed head removal is still in effect
>> and the NSIDs are recycled with mismatched identifiers, a forced rescan
>> of the sibling controller may still fail to instantiate the namespace.
>> In that case, requeueing the scan alone might not be sufficient.
>
> But the other way around, if we're racing with another controller's scan
> to complete the last path removal, then the delayed work hasn't been
> scheduled yet so patch 3 has a significant gap without patch 2 preceding
> it.
Yes, that's also correct; both patches depend on each other to some extent.
If the user does not configure delayed_removal_secs, then patch 2 by itself
is sufficient to resolve the race. So with that, overall, the changes look
good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>