From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mike Snitzer <snitzer@redhat.com>
Subject: Re: nvme: allow ANA support to be independent of native multipathing
Date: Fri, 16 Nov 2018 09:01:53 -0500
Message-ID: <20181116140153.GB28870@redhat.com>
References: <2691abf6733f791fb16b86d96446440e4aaff99f.camel@suse.com>
 <20181112215323.GA7983@redhat.com>
 <20181113161838.GC9827@localhost.localdomain>
 <20181113180008.GA12513@redhat.com>
 <20181114053837.GA15086@redhat.com>
 <30cf7af7-8826-55bd-e39a-4f81ed032f6d@suse.de>
 <20181114174746.GA18526@redhat.com>
 <87c931e5-4ac9-1795-8d40-cc5541d3ebcf@suse.de>
 <20181115174605.GA19782@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
To: Hannes Reinecke
Cc: linux-nvme@lists.infradead.org, Keith Busch, Sagi Grimberg,
 hch@lst.de, axboe@kernel.dk, Martin Wilck, lijie,
 xose.vazquez@gmail.com, chengjike.cheng@huawei.com,
 shenhong09@huawei.com, dm-devel@redhat.com,
 wangzhoumengjian@huawei.com, christophe.varoqui@opensvc.com,
 bmarzins@redhat.com, sschremm@netapp.com,
 linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
List-Id: dm-devel.ids

On Fri, Nov 16 2018 at 2:25am -0500,
Hannes Reinecke wrote:

> On 11/15/18 6:46 PM, Mike Snitzer wrote:
> >Whether or not ANA is present is a choice of the target implementation;
> >the host (and whether it supports multipathing) has _zero_ influence on
> >this. If the target declares a path as 'inaccessible' the path _is_
> >inaccessible to the host. As such, ANA support should be functional
> >even if native multipathing is not.
> >
> >Introduce ability to always re-read ANA log page as required due to ANA
> >error and make current ANA state available via sysfs -- even if native
> >multipathing is disabled on the host (e.g. nvme_core.multipath=N).
> >
> >This affords userspace access to the current ANA state independent of
> >which layer might be doing multipathing. It also allows multipath-tools
> >to rely on the NVMe driver for ANA support while dm-multipath takes care
> >of multipathing.
> >
> >While implementing these changes care was taken to preserve the exact
> >ANA functionality and code sequence native multipathing has provided.
> >This manifests as native multipathing's nvme_failover_req() being
> >tweaked to call __nvme_update_ana() which was factored out to allow
> >nvme_update_ana() to be called independent of nvme_failover_req().
> >
> >And as always, if embedded NVMe users do not want any performance
> >overhead associated with ANA or native NVMe multipathing they can
> >disable CONFIG_NVME_MULTIPATH.
> >
> >Signed-off-by: Mike Snitzer
> >---
> >  drivers/nvme/host/core.c      | 10 +++++----
> >  drivers/nvme/host/multipath.c | 49 +++++++++++++++++++++++++++++++++----------
> >  drivers/nvme/host/nvme.h      |  4 ++++
> >  3 files changed, 48 insertions(+), 15 deletions(-)
> >
> >diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> >index fe957166c4a9..3df607905628 100644
> >--- a/drivers/nvme/host/core.c
> >+++ b/drivers/nvme/host/core.c
> >@@ -255,10 +255,12 @@ void nvme_complete_rq(struct request *req)
> > 		nvme_req(req)->ctrl->comp_seen = true;
> > 	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
> >-		if ((req->cmd_flags & REQ_NVME_MPATH) &&
> >-		    blk_path_error(status)) {
> >-			nvme_failover_req(req);
> >-			return;
> >+		if (blk_path_error(status)) {
> >+			if (req->cmd_flags & REQ_NVME_MPATH) {
> >+				nvme_failover_req(req);
> >+				return;
> >+			}
> >+			nvme_update_ana(req);
> > 		}
> > 		if (!blk_queue_dying(req->q)) {
...
> >diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> >index 8e03cda770c5..0adbcff5fba2 100644
> >--- a/drivers/nvme/host/multipath.c
> >+++ b/drivers/nvme/host/multipath.c
> >@@ -58,25 +87,22 @@ void nvme_failover_req(struct request *req)
> > 	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
> > 	blk_mq_end_request(req, 0);
> >-	switch (status & 0x7ff) {
> >-	case NVME_SC_ANA_TRANSITION:
> >-	case NVME_SC_ANA_INACCESSIBLE:
> >-	case NVME_SC_ANA_PERSISTENT_LOSS:
> >+	if (nvme_ana_error(status)) {
> > 		/*
> > 		 * If we got back an ANA error we know the controller is alive,
> > 		 * but not ready to serve this namespaces. The spec suggests
> > 		 * we should update our general state here, but due to the fact
> > 		 * that the admin and I/O queues are not serialized that is
> > 		 * fundamentally racy. So instead just clear the current path,
> >-		 * mark the the path as pending and kick of a re-read of the ANA
> >+		 * mark the path as pending and kick off a re-read of the ANA
> > 		 * log page ASAP.
> > 		 */
> > 		nvme_mpath_clear_current_path(ns);
> >-		if (ns->ctrl->ana_log_buf) {
> >-			set_bit(NVME_NS_ANA_PENDING, &ns->flags);
> >-			queue_work(nvme_wq, &ns->ctrl->ana_work);
> >-		}
> >-		break;
> >+		__nvme_update_ana(ns);
> >+		goto kick_requeue;
> >+	}
> >+
> >+	switch (status & 0x7ff) {
> > 	case NVME_SC_HOST_PATH_ERROR:
> > 		/*
> > 		 * Temporary transport disruption in talking to the controller.
> >@@ -93,6 +119,7 @@ void nvme_failover_req(struct request *req)
> > 		break;
> > 	}
> >+kick_requeue:
> > 	kblockd_schedule_work(&ns->head->requeue_work);
> > }
> Doesn't this need to be protected by 'if (ns->head->disk)' or somesuch?

No. nvme_failover_req() is only ever called by native multipathing; see
nvme_complete_rq()'s check for req->cmd_flags & REQ_NVME_MPATH as the
condition for calling nvme_failover_req().

The previous RFC-style patch I posted muddled ANA and multipathing in
nvme_update_ana() but this final patch submission was fixed because I
saw a cleaner way forward by having nvme_failover_req() also do ANA work
just like it always has -- albeit with new helpers that
nvme_update_ana() also calls.

Mike
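
The dispatch described in the reply above can be modelled outside the kernel. The following is a minimal userspace sketch, not kernel code and not part of the patch. Every sketch_* name is hypothetical, the two booleans stand in for (req->cmd_flags & REQ_NVME_MPATH) and blk_path_error(status), and the ANA status values are assumptions that should be checked against include/linux/nvme.h. The body of the real nvme_ana_error() helper is not shown in the quoted diff, so sketch_ana_error() only mirrors the three case labels the patch removes from the switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed NVMe status values; verify against include/linux/nvme.h. */
enum {
	SKETCH_SC_ANA_PERSISTENT_LOSS = 0x301,
	SKETCH_SC_ANA_INACCESSIBLE    = 0x302,
	SKETCH_SC_ANA_TRANSITION      = 0x303,
};

/* Hypothetical model of the outcomes in nvme_complete_rq()'s error branch. */
enum sketch_action {
	SKETCH_FAILOVER,   /* native multipath: nvme_failover_req(), then return */
	SKETCH_UPDATE_ANA, /* no native multipath: nvme_update_ana(), then retry path */
	SKETCH_RETRY_ONLY, /* not a path error: ordinary retry handling */
};

struct sketch_req {
	bool mpath_flag; /* models req->cmd_flags & REQ_NVME_MPATH */
	bool path_error; /* models blk_path_error(status) */
};

/* Mirrors the case labels the patch folds into nvme_ana_error(). */
static bool sketch_ana_error(uint16_t status)
{
	switch (status & 0x7ff) {
	case SKETCH_SC_ANA_TRANSITION:
	case SKETCH_SC_ANA_INACCESSIBLE:
	case SKETCH_SC_ANA_PERSISTENT_LOSS:
		return true;
	default:
		return false;
	}
}

/*
 * Failover is reachable only when the multipath flag is set, which is
 * why the reply argues no extra ns->head->disk check is needed there.
 */
static enum sketch_action sketch_complete_rq(const struct sketch_req *req)
{
	assert(req != NULL);

	if (req->path_error) {
		if (req->mpath_flag)
			return SKETCH_FAILOVER;
		return SKETCH_UPDATE_ANA;
	}
	return SKETCH_RETRY_ONLY;
}
```

In this model a request without the multipath flag can never produce SKETCH_FAILOVER, whatever its status, which is the invariant the reply relies on.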