From: Hannes Reinecke <hare@suse.de>
To: Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@wdc.com>, linux-nvme@lists.infradead.org
Subject: Re: [PATCHv6] nvme: allow to re-attach namespaces after all paths are down
Date: Tue, 22 Jun 2021 08:31:54 +0200
Message-ID: <4a05b0de-8639-0747-e9f4-c20400854b02@suse.de>
In-Reply-To: <4903ef70-ed16-4b81-3570-60e9fcc5ecb0@grimberg.me>
On 6/21/21 8:13 PM, Sagi Grimberg wrote:
>
>
> On 6/9/21 8:01 AM, Hannes Reinecke wrote:
>> We should only remove the ns head from the list of heads per
>> subsystem if the reference count drops to zero. That cleans up
>> reference counting, and allows us to call del_gendisk() once the last
>> path is removed (as then the ns_head should be removed anyway).
>> As this introduces a (theoretical) race condition where I/O might have
>> been requeued before the last path went down, we should also check
>> in nvme_ns_head_submit_bio() whether the gendisk is still present,
>> and fail the I/O if it is not.
>>
>> Changes to v5:
>> - Synchronize between nvme_init_ns_head() and
>> nvme_mpath_check_last_path()
>> - Check for removed gendisk in nvme_ns_head_submit_bio()
>> Changes to v4:
>> - Call del_gendisk() in nvme_mpath_check_last_path() to avoid deadlock
>> Changes to v3:
>> - Simplify if() clause to detect duplicate namespaces
>> Changes to v2:
>> - Drop memcpy() statement
>> Changes to v1:
>> - Always check NSIDs after reattach
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>> drivers/nvme/host/core.c | 9 ++++-----
>> drivers/nvme/host/multipath.c | 30 +++++++++++++++++++++++++-----
>> drivers/nvme/host/nvme.h | 11 ++---------
>> 3 files changed, 31 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 177cae44b612..6d7c2958b3e2 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -566,6 +566,9 @@ static void nvme_free_ns_head(struct kref *ref)
>> struct nvme_ns_head *head =
>> container_of(ref, struct nvme_ns_head, ref);
>> + mutex_lock(&head->subsys->lock);
>> + list_del_init(&head->entry);
>> + mutex_unlock(&head->subsys->lock);
>> nvme_mpath_remove_disk(head);
>> ida_simple_remove(&head->subsys->ns_ida, head->instance);
>> cleanup_srcu_struct(&head->srcu);
>> @@ -3806,8 +3809,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid,
>> out_unlink_ns:
>> mutex_lock(&ctrl->subsys->lock);
>> list_del_rcu(&ns->siblings);
>> - if (list_empty(&ns->head->list))
>> - list_del_init(&ns->head->entry);
>> mutex_unlock(&ctrl->subsys->lock);
>> nvme_put_ns_head(ns->head);
>> out_free_queue:
>> @@ -3828,8 +3829,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>> mutex_lock(&ns->ctrl->subsys->lock);
>> list_del_rcu(&ns->siblings);
>> - if (list_empty(&ns->head->list))
>> - list_del_init(&ns->head->entry);
>> mutex_unlock(&ns->ctrl->subsys->lock);
>> synchronize_rcu(); /* guarantee not available in head->list */
>> @@ -3849,7 +3848,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>> list_del_init(&ns->list);
>> up_write(&ns->ctrl->namespaces_rwsem);
>> - nvme_mpath_check_last_path(ns);
>> + nvme_mpath_check_last_path(ns->head);
>> nvme_put_ns(ns);
>> }
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 23573fe3fc7d..31153f6ec582 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -266,6 +266,8 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
>> int node = numa_node_id();
>> struct nvme_ns *ns;
>> + if (!(head->disk->flags & GENHD_FL_UP))
>> + return NULL;
>> ns = srcu_dereference(head->current_path[node], &head->srcu);
>> if (unlikely(!ns))
>> return __nvme_find_path(head, node);
>> @@ -281,6 +283,8 @@ static bool nvme_available_path(struct nvme_ns_head *head)
>> {
>> struct nvme_ns *ns;
>> + if (!(head->disk->flags & GENHD_FL_UP))
>> + return false;
>
> nvme_available_path should have no business looking at the head gendisk,
> it should just understand if a PATH (a.k.a a controller) exists.
>
Agreed. I was being overly cautious here; I will drop this check.
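To illustrate the point: a path-availability check only needs to walk the sibling paths and look at the state of the controller behind each one. The following is a simplified userspace model, not the driver code; the types, the state names and the `disk_up` field are hypothetical stand-ins, and which states count as "available" is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the controller states. */
enum ctrl_state {
	CTRL_LIVE,
	CTRL_RESETTING,
	CTRL_CONNECTING,
	CTRL_DELETING,
	CTRL_DEAD,
};

/* One path to a namespace, i.e. the namespace as seen through one controller. */
struct path {
	enum ctrl_state state;	/* state of the controller behind this path */
	struct path *next;	/* next sibling path of the same head */
};

/* Hypothetical namespace head shared by all sibling paths. */
struct head {
	struct path *paths;	/* singly linked list of sibling paths */
	bool disk_up;		/* head disk state; deliberately NOT consulted */
};

/*
 * Return true if at least one sibling path could carry I/O now or after
 * its controller finishes recovering. Only per-path controller state is
 * examined; the head's disk state plays no role here.
 */
static bool available_path(const struct head *h)
{
	assert(h != NULL);
	for (const struct path *p = h->paths; p; p = p->next) {
		switch (p->state) {
		case CTRL_LIVE:
		case CTRL_RESETTING:
		case CTRL_CONNECTING:
			return true;
		default:
			break;
		}
	}
	return false;
}
```

A head whose only path is connecting still reports an available path even if `disk_up` is false; whether the head's disk still exists is left to the I/O submission side, which is the separation asked for above.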
> IMO, the fact that it does should tell us to take a step back
> and think about this. We are trying to keep a zombie nshead around
> just for the possibility that the host will reconnect (not as part of
> error recovery, but as a brand-new connect). Why shouldn't we just
> remove it and restore it as a brand-new nshead when the host attaches
> again?
>
This patch has evolved quite a bit by now, and has in fact diverged slightly
from its description. The original intent was indeed to keep the nshead
around until the last reference drops, so that if a controller gets
reattached it can connect its namespaces to the correct (existing) ns_head.

However, as it turned out, this was just a band-aid; the real fix is
to get the reference counting between 'struct ns' and 'struct ns_head'
right: when the last path to an ns_head drops, we should remove the
ns_head by calling del_gendisk() and taking it off the list of ns_heads.

As Keith noted, the first part is done correctly in this patch (namely,
del_gendisk() is called when the last path drops), but the second part,
detaching the ns_head from the list of ns_heads, is _not_ done correctly.
Both should happen at the same time to avoid race conditions.
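The ordering argued for above can be illustrated with a small userspace model: the path count, the "disk" and the list membership are updated in one critical section, so a controller that reattaches either finds a head that still has a disk or finds none and allocates a brand-new one. This is a minimal sketch under stated assumptions, not the driver code: a pthread mutex stands in for subsys->lock, a plain path counter stands in for the kref, a boolean stands in for the gendisk, and all names are hypothetical. In the kernel, del_gendisk() can sleep and flush I/O, so where exactly it may be called needs separate care; the model only shows the ordering.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a namespace head; names are illustrative only. */
struct ns_head {
	unsigned int nsid;
	int npaths;		/* attached paths; stands in for the kref */
	bool disk_up;		/* stands in for "gendisk still registered" */
	struct ns_head *next;	/* link in the subsystem's list of heads */
};

struct subsys {
	pthread_mutex_t lock;	/* protects heads, npaths and disk_up */
	struct ns_head *heads;
};

/* Unlink @h from @s->heads; caller holds s->lock. */
static void unlink_head_locked(struct subsys *s, struct ns_head *h)
{
	for (struct ns_head **pp = &s->heads; *pp; pp = &(*pp)->next) {
		if (*pp == h) {
			*pp = h->next;
			h->next = NULL;
			return;
		}
	}
}

/*
 * Detach one path. If it was the last one, "remove the disk" and unlink the
 * head in the same critical section, so no lookup can observe a head that is
 * still listed but already has no disk. Returns true if the head was freed.
 */
static bool remove_path(struct subsys *s, struct ns_head *h)
{
	bool last;

	pthread_mutex_lock(&s->lock);
	assert(h->npaths > 0);
	last = (--h->npaths == 0);
	if (last) {
		h->disk_up = false;	/* models del_gendisk() */
		unlink_head_locked(s, h);
	}
	pthread_mutex_unlock(&s->lock);

	if (last)
		free(h);
	return last;
}

/* Attach a path for @nsid: reuse a listed head, else allocate a new one. */
static struct ns_head *attach_path(struct subsys *s, unsigned int nsid)
{
	struct ns_head *h;

	pthread_mutex_lock(&s->lock);
	for (h = s->heads; h; h = h->next)
		if (h->nsid == nsid)
			break;
	if (!h) {
		h = calloc(1, sizeof(*h));
		if (h) {
			h->nsid = nsid;
			h->disk_up = true;	/* models add_disk() for the new head */
			h->next = s->heads;
			s->heads = h;
		}
	}
	if (h)
		h->npaths++;
	pthread_mutex_unlock(&s->lock);
	return h;
}
```

Unlike the real driver, this model frees the head as soon as its last path goes away; in the kernel, other references (for example an open block device) can keep an ns_head alive after it has been unlinked.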
Will be sending an updated patch.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme