From: Hannes Reinecke <hare@suse.de>
To: Keith Busch <kbusch@kernel.org>,
linux-nvme@lists.infradead.org, hch@lst.de, sagi@grimberg.me
Subject: Re: [RFC] nvme-mpath: delete disk after last connection
Date: Sat, 26 Sep 2020 13:16:51 +0200
Message-ID: <f8fdd99f-84da-bf68-a07b-7a6200cc02ab@suse.de>
In-Reply-To: <20200925213819.224198-1-kbusch@kernel.org>
On 9/25/20 11:38 PM, Keith Busch wrote:
> I have this tagged "RFC" because I'm not sure if there's a reason why the code
> is done the way that it is today.
>
> The multipath code currently deletes the disk only after all references
> to it are dropped rather than when the last path to that disk is lost.
> This has been reported to cause problems with some usage, like MD RAID.
>
> Delete the disk when the last path is gone. This is the same behavior we
> currently have with non-multipathed nvme devices.
>
> The following is just a simple example that demonstrates what is currently
> observed using a simple nvme loop back (loop setup file not shown):
>
> # nvmetcli restore loop.json
> [ 31.156452] nvmet: adding nsid 1 to subsystem testnqn1
> [ 31.159140] nvmet: adding nsid 1 to subsystem testnqn2
>
> # nvme connect -t loop -n testnqn1 -q hostnqn
> [ 36.866302] nvmet: creating controller 1 for subsystem testnqn1 for NQN hostnqn.
> [ 36.872926] nvme nvme3: new ctrl: "testnqn1"
>
> # nvme connect -t loop -n testnqn1 -q hostnqn
> [ 38.227186] nvmet: creating controller 2 for subsystem testnqn1 for NQN hostnqn.
> [ 38.234450] nvme nvme4: new ctrl: "testnqn1"
>
> # nvme connect -t loop -n testnqn2 -q hostnqn
> [ 43.902761] nvmet: creating controller 3 for subsystem testnqn2 for NQN hostnqn.
> [ 43.907401] nvme nvme5: new ctrl: "testnqn2"
>
> # nvme connect -t loop -n testnqn2 -q hostnqn
> [ 44.627689] nvmet: creating controller 4 for subsystem testnqn2 for NQN hostnqn.
> [ 44.641773] nvme nvme6: new ctrl: "testnqn2"
>
> # mdadm --create /dev/md0 --level=mirror --raid-devices=2 /dev/nvme3n1 /dev/nvme5n1
> [ 53.497038] md/raid1:md0: active with 2 out of 2 mirrors
> [ 53.501717] md0: detected capacity change from 0 to 66060288
>
> # cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 nvme5n1[1] nvme3n1[0]
> 64512 blocks super 1.2 [2/2] [UU]
>
> Now delete all paths to one of the namespaces:
>
> # echo 1 > /sys/class/nvme/nvme3/delete_controller
> # echo 1 > /sys/class/nvme/nvme4/delete_controller
>
> We have no path, but mdstat says:
>
> # cat /proc/mdstat
> Personalities : [raid1]
> md0 : active (auto-read-only) raid1 nvme5n1[1]
> 64512 blocks super 1.2 [2/1] [_U]
>
> And this is reported to cause a problem.
>
> With the proposed patch, the following messages appear:
>
> [ 227.516807] md/raid1:md0: Disk failure on nvme3n1, disabling device.
> [ 227.516807] md/raid1:md0: Operation continuing on 1 devices.
>
> And mdstat shows only the viable members:
>
> # cat /proc/mdstat
> Personalities : [raid1]
> md0 : active (auto-read-only) raid1 nvme5n1[1]
> 64512 blocks super 1.2 [2/1] [_U]
>
> Reported-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/nvme/host/core.c      | 3 ++-
>  drivers/nvme/host/multipath.c | 1 -
>  drivers/nvme/host/nvme.h      | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 4857168f71f2..a2faa3625e39 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -475,7 +475,8 @@ static void nvme_free_ns_head(struct kref *ref)
> struct nvme_ns_head *head =
> container_of(ref, struct nvme_ns_head, ref);
>
> - nvme_mpath_remove_disk(head);
> + if (head->disk)
> + put_disk(head->disk);
> ida_simple_remove(&head->subsys->ns_ida, head->instance);
> cleanup_srcu_struct(&head->srcu);
> nvme_put_subsystem(head->subsys);
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 74896be40c17..55045291b4de 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -697,7 +697,6 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
>  		 */
>  		head->disk->queue = NULL;
>  	}
> -	put_disk(head->disk);
>  }
>
> int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index a42b75869213..745cda1a63fd 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -670,7 +670,7 @@ static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
>  	struct nvme_ns_head *head = ns->head;
>
>  	if (head->disk && list_empty(&head->list))
> -		kblockd_schedule_work(&head->requeue_work);
> +		nvme_mpath_remove_disk(head);
>  }
>
> static inline void nvme_trace_bio_complete(struct request *req,
>
I'm okay with that in general, but then again we might run into
situations where an 'all paths down' scenario is actually expected
(think of a temporary network outage on nvme-tcp).
So I guess we need to introduce an additional setting
(queue_if_no_path?) to be specified during the initial connection.
Cheers,
Hannes
--
Dr. Hannes Reinecke            Kernel Storage Architect
hare@suse.de                   +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Thread overview: 8+ messages
2020-09-25 21:38 [RFC] nvme-mpath: delete disk after last connection Keith Busch
2020-09-26 11:16 ` Hannes Reinecke [this message]
2020-09-27 6:07 ` Christoph Hellwig
2020-09-28 14:11 ` Keith Busch
2020-09-29 8:28 ` Sagi Grimberg
2020-09-29 18:08 ` Christoph Hellwig
2020-09-29 18:16 ` Sagi Grimberg
2020-09-30 5:36 ` Hannes Reinecke