From: Mike Snitzer <snitzer@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@intel.com>,
Sagi Grimberg <sagi@grimberg.me>, Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, Hannes Reinecke <hare@suse.de>,
linux-nvme@lists.infradead.org,
Johannes Thumshirn <jthumshirn@suse.de>
Subject: Re: [PATCH 4/7] nvme: implement multipath access to nvme subsystems
Date: Thu, 9 Nov 2017 16:22:17 -0500
Message-ID: <20171109212217.GA16454@redhat.com>
In-Reply-To: <20171109174450.17142-5-hch@lst.de>
On Thu, Nov 09 2017 at 12:44pm -0500,
Christoph Hellwig <hch@lst.de> wrote:
> This patch adds native multipath support to the nvme driver. For each
> namespace we create only a single block device node, which can be used
> to access that namespace through any of the controllers that refer to it.
> The gendisk for each controller's path to the namespace still exists
> inside the kernel, but is hidden from userspace. The character device
> nodes are still available on a per-controller basis. A new link from
> the sysfs directory for the subsystem allows finding all controllers
> for a given subsystem.
>
> Currently we will always send I/O to the first available path. This will
> be changed once the NVMe Asymmetric Namespace Access (ANA) TP is
> ratified and implemented, at which point we will look at the ANA state
> for each namespace. Another possibility that was prototyped is to
> use the path that is closest to the submitting NUMA node, which will be
> mostly interesting for PCI, but might also be useful for RDMA or FC
> transports in the future. There is no plan to implement round robin
> or I/O service time path selectors, as those are not scalable with
> the performance rates provided by NVMe.
>
> The multipath device will go away once all paths to it disappear,
> any delay to keep it alive needs to be implemented at the controller
> level.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Your cover letter (0/7) explains that the NVMe multipath I/O path relies
on NVMe's lack of partial completions, but I think it'd be useful for
this header (the one that actually gets committed) to say so as well.
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> new file mode 100644
> index 000000000000..062754ebebfd
> --- /dev/null
> +++ b/drivers/nvme/host/multipath.c
...
> +void nvme_failover_req(struct request *req)
> +{
> + struct nvme_ns *ns = req->q->queuedata;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&ns->head->requeue_lock, flags);
> + blk_steal_bios(&ns->head->requeue_list, req);
> + spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
> + blk_mq_end_request(req, 0);
> +
> + nvme_reset_ctrl(ns->ctrl);
> + kblockd_schedule_work(&ns->head->requeue_work);
> +}
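For readers less familiar with the block layer, here is a rough userspace
sketch of what the bio stealing in nvme_failover_req() amounts to. It is a
toy model, not kernel code: the toy_* types and toy_steal_bios() are made-up
stand-ins for struct bio, struct bio_list, struct request and
blk_steal_bios(), and it assumes that every bio still attached to the request
is moved to the tail of the requeue list and the request is left empty.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; all names here are illustrative. */
struct toy_bio {
	struct toy_bio *next;
	int id;
};

struct toy_bio_list {
	struct toy_bio *head;
	struct toy_bio *tail;
};

struct toy_request {
	struct toy_bio *bio;      /* first bio still attached to the request */
	struct toy_bio *biotail;  /* last bio still attached to the request */
	unsigned int data_len;    /* bytes the request still has to transfer */
};

/*
 * Move every bio from rq to the tail of list and leave rq empty, so the
 * request can then be ended without completing any of its bios.
 */
static void toy_steal_bios(struct toy_bio_list *list, struct toy_request *rq)
{
	assert(list != NULL && rq != NULL);

	if (rq->bio) {
		if (list->tail)
			list->tail->next = rq->bio;
		else
			list->head = rq->bio;
		list->tail = rq->biotail;

		rq->bio = NULL;
		rq->biotail = NULL;
	}

	rq->data_len = 0;
}
```

In this model the whole chain is moved as one unit, which is only obviously
safe if none of the bios has been partially completed already.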
Also, the block core patch that introduces blk_steal_bios() already went
in, but should there be a QUEUE_FLAG that drivers like NVMe set to
indicate that they don't support partial completion?
That would make it easier for future drivers to know whether they
can use a more optimized I/O path.
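To make the suggestion concrete, something along these lines is what I have
in mind. This is only a userspace sketch under stated assumptions: the flag
name, struct toy_queue and both helpers are hypothetical and do not exist in
the block layer or in this patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; no such QUEUE_FLAG is defined by this series. */
#define TOY_QUEUE_FLAG_NO_PARTIAL_COMPLETION (1UL << 0)

/* Toy stand-in for struct request_queue, reduced to its flags word. */
struct toy_queue {
	unsigned long queue_flags;
};

/* A driver such as NVMe would set the flag once at queue setup time. */
static void toy_queue_set_no_partial(struct toy_queue *q)
{
	assert(q != NULL);
	q->queue_flags |= TOY_QUEUE_FLAG_NO_PARTIAL_COMPLETION;
}

/* Callers would test the flag before taking the bio-stealing path. */
static bool toy_queue_can_steal_bios(const struct toy_queue *q)
{
	assert(q != NULL);
	return (q->queue_flags & TOY_QUEUE_FLAG_NO_PARTIAL_COMPLETION) != 0;
}
```

A failover path could then check toy_queue_can_steal_bios() and fall back to
a conventional requeue when the flag is not set.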
Mike