public inbox for linux-nvme@lists.infradead.org
From: Hannes Reinecke <hare@suse.de>
To: Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@wdc.com>, linux-nvme@lists.infradead.org
Subject: Re: [PATCHv6] nvme: allow to re-attach namespaces after all paths are down
Date: Tue, 22 Jun 2021 08:31:54 +0200	[thread overview]
Message-ID: <4a05b0de-8639-0747-e9f4-c20400854b02@suse.de> (raw)
In-Reply-To: <4903ef70-ed16-4b81-3570-60e9fcc5ecb0@grimberg.me>

On 6/21/21 8:13 PM, Sagi Grimberg wrote:
> 
> 
> On 6/9/21 8:01 AM, Hannes Reinecke wrote:
>> We should only remove the ns head from the list of heads per
>> subsystem if the reference count drops to zero. That cleans up
>> reference counting, and allows us to call del_gendisk() once the last
>> path is removed (as then the ns_head should be removed anyway).
>> As this introduces a (theoretical) race condition where I/O might have
>> been requeued before the last path went down, we should also check
>> whether the gendisk is still present in nvme_ns_head_submit_bio(),
>> and fail the I/O if it is not.
>>
>> Changes to v5:
>> - Synchronize between nvme_init_ns_head() and nvme_mpath_check_last_path()
>> - Check for removed gendisk in nvme_ns_head_submit_bio()
>> Changes to v4:
>> - Call del_gendisk() in nvme_mpath_check_last_path() to avoid deadlock
>> Changes to v3:
>> - Simplify if() clause to detect duplicate namespaces
>> Changes to v2:
>> - Drop memcpy() statement
>> Changes to v1:
>> - Always check NSIDs after reattach
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/nvme/host/core.c      |  9 ++++-----
>>   drivers/nvme/host/multipath.c | 30 +++++++++++++++++++++++++-----
>>   drivers/nvme/host/nvme.h      | 11 ++---------
>>   3 files changed, 31 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 177cae44b612..6d7c2958b3e2 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -566,6 +566,9 @@ static void nvme_free_ns_head(struct kref *ref)
>>       struct nvme_ns_head *head =
>>           container_of(ref, struct nvme_ns_head, ref);
>> +    mutex_lock(&head->subsys->lock);
>> +    list_del_init(&head->entry);
>> +    mutex_unlock(&head->subsys->lock);
>>       nvme_mpath_remove_disk(head);
>>       ida_simple_remove(&head->subsys->ns_ida, head->instance);
>>       cleanup_srcu_struct(&head->srcu);
>> @@ -3806,8 +3809,6 @@ static void nvme_alloc_ns(struct nvme_ctrl 
>> *ctrl, unsigned nsid,
>>    out_unlink_ns:
>>       mutex_lock(&ctrl->subsys->lock);
>>       list_del_rcu(&ns->siblings);
>> -    if (list_empty(&ns->head->list))
>> -        list_del_init(&ns->head->entry);
>>       mutex_unlock(&ctrl->subsys->lock);
>>       nvme_put_ns_head(ns->head);
>>    out_free_queue:
>> @@ -3828,8 +3829,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>>       mutex_lock(&ns->ctrl->subsys->lock);
>>       list_del_rcu(&ns->siblings);
>> -    if (list_empty(&ns->head->list))
>> -        list_del_init(&ns->head->entry);
>>       mutex_unlock(&ns->ctrl->subsys->lock);
>>       synchronize_rcu(); /* guarantee not available in head->list */
>> @@ -3849,7 +3848,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>>       list_del_init(&ns->list);
>>       up_write(&ns->ctrl->namespaces_rwsem);
>> -    nvme_mpath_check_last_path(ns);
>> +    nvme_mpath_check_last_path(ns->head);
>>       nvme_put_ns(ns);
>>   }
>> diff --git a/drivers/nvme/host/multipath.c 
>> b/drivers/nvme/host/multipath.c
>> index 23573fe3fc7d..31153f6ec582 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -266,6 +266,8 @@ inline struct nvme_ns *nvme_find_path(struct 
>> nvme_ns_head *head)
>>       int node = numa_node_id();
>>       struct nvme_ns *ns;
>> +    if (!(head->disk->flags & GENHD_FL_UP))
>> +        return NULL;
>>       ns = srcu_dereference(head->current_path[node], &head->srcu);
>>       if (unlikely(!ns))
>>           return __nvme_find_path(head, node);
>> @@ -281,6 +283,8 @@ static bool nvme_available_path(struct 
>> nvme_ns_head *head)
>>   {
>>       struct nvme_ns *ns;
>> +    if (!(head->disk->flags & GENHD_FL_UP))
>> +        return false;
> 
> nvme_available_path should have no business looking at the head gendisk,
> it should just understand if a PATH (a.k.a a controller) exists.
> 
Agreed. I was only overly cautious here; will be dropping this check.
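To make the point concrete: path availability is a property of the paths (controllers) behind the head alone. Here is a minimal userspace sketch of that idea; the names, the state set, and the array layout are all illustrative stand-ins, not the actual kernel structures or the real nvme_available_path() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model: a head has some paths, each backed by a
 * controller in some state. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING, CTRL_DELETING };

struct path {
    enum ctrl_state state;
};

/* A path is usable, or may become usable again, unless its controller
 * is going away -- so availability depends only on the paths, never on
 * the state of the head's gendisk. */
bool available_path(const struct path *paths, size_t npaths)
{
    for (size_t i = 0; i < npaths; i++) {
        switch (paths[i].state) {
        case CTRL_LIVE:
        case CTRL_RESETTING:
        case CTRL_CONNECTING:
            return true;
        default:
            break;
        }
    }
    return false;
}
```

With this framing, a head whose last controller is deleting has no available path, regardless of whether its gendisk has already been torn down.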

> IMO, the fact that it does should tell that we should take a step back
> and think about this. We are trying to keep a zombie nshead around
> just for the possibility the host will reconnect (not as part of
> error recovery, but as a brand new connect). Why shouldn't we just
> remove it and restore it as a brand new nshead when the host attaches
> again?
> 
This patch has now evolved quite a bit, and in fact diverged slightly 
from the description. The original intent indeed was to keep the nshead 
around until the last reference drops, such that if a controller gets 
reattached it will be able to connect the namespaces to the correct 
(existing) ns_head.
However, as it turned out, this was just a band-aid, and the real fix is 
to get the reference counting between 'struct ns' and 'struct ns_head' 
correct: when the last path to a ns_head drops, we should tear down the 
ns_head by calling del_gendisk() and unlinking it from the list of ns_heads.

As noted by Keith, the first part is done correctly in this patch 
(namely, del_gendisk() is called when the last path drops), but the 
second part, detaching the head from the list of ns_heads, is _not_.
Both should happen at the same time to avoid any race conditions.
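The intended teardown order can be sketched in a small single-threaded 
userspace model. All names here are illustrative stand-ins for the kernel 
structures; in the kernel, both teardown steps would run under subsys->lock 
so that a concurrent lookup can never observe a head that is still on the 
subsystem list but whose gendisk is already gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified model of the teardown described above; illustrative only. */
struct ns_head {
    int refcount;        /* one reference per attached path (ns) */
    bool on_subsys_list; /* models list_del_init(&head->entry)   */
    bool disk_up;        /* models del_gendisk()                 */
};

struct ns_head *head_alloc(void)
{
    struct ns_head *h = malloc(sizeof(*h));
    h->refcount = 0;
    h->on_subsys_list = true;
    h->disk_up = true;
    return h;
}

void head_get(struct ns_head *h)
{
    h->refcount++;
}

/* On the final put, take the disk down and unlink the head in the same
 * step, so nothing can observe a listed head without a disk.
 * Returns true if this was the last reference. */
bool head_put(struct ns_head *h)
{
    if (--h->refcount > 0)
        return false;
    h->disk_up = false;        /* del_gendisk()      */
    h->on_subsys_list = false; /* list_del_init(...) */
    free(h);
    return true;
}
```

As long as any path still holds a reference, the head stays on the list 
with its disk up; the last put performs both teardown steps together.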

Will be sending an updated patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Thread overview: 7+ messages
2021-06-09 15:01 [PATCHv6] nvme: allow to re-attach namespaces after all paths are down Hannes Reinecke
2021-06-21  6:38 ` Christoph Hellwig
2021-06-21  7:33   ` Hannes Reinecke
2021-06-21 17:26 ` Keith Busch
2021-06-22  6:21   ` Hannes Reinecke
2021-06-21 18:13 ` Sagi Grimberg
2021-06-22  6:31   ` Hannes Reinecke [this message]
