Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvme-multipath: do not reset on unknown status
@ 2020-02-20 14:52 Keith Busch
  2020-02-20 16:47 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Keith Busch @ 2020-02-20 14:52 UTC (permalink / raw)
  To: linux-nvme, hch, sagi; +Cc: Keith Busch, hare, John Meneghini

From: John Meneghini <johnm@netapp.com>

The nvme multipath error handling defaults to controller reset if the
error is unknown. There are, however, no existing nvme status codes that
indicate a reset should be used, and resetting causes unnecessary
disruption to the rest of IO.

Change nvme's error handling to first check if failover should happen.
If not, let the normal error handling take over rather than reset the
controller.

Signed-off-by: John Meneghini <johnm@netapp.com>
[changelog]
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c      |  4 +---
 drivers/nvme/host/multipath.c | 21 +++++++++------------
 drivers/nvme/host/nvme.h      |  5 +++--
 3 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 84914223c537..0eef1ef8c659 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -292,10 +292,8 @@ void nvme_complete_rq(struct request *req)
 
 	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
 		if ((req->cmd_flags & REQ_NVME_MPATH) &&
-		    blk_path_error(status)) {
-			nvme_failover_req(req);
+		    nvme_failover_req(req))
 			return;
-		}
 
 		if (!blk_queue_dying(req->q)) {
 			nvme_retry_req(req);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 797c18337d96..16df0baaeb40 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -64,17 +64,12 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 	}
 }
 
-void nvme_failover_req(struct request *req)
+bool nvme_failover_req(struct request *req)
 {
 	struct nvme_ns *ns = req->q->queuedata;
 	u16 status = nvme_req(req)->status;
 	unsigned long flags;
 
-	spin_lock_irqsave(&ns->head->requeue_lock, flags);
-	blk_steal_bios(&ns->head->requeue_list, req);
-	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
-	blk_mq_end_request(req, 0);
-
 	switch (status & 0x7ff) {
 	case NVME_SC_ANA_TRANSITION:
 	case NVME_SC_ANA_INACCESSIBLE:
@@ -103,15 +98,17 @@ void nvme_failover_req(struct request *req)
 		nvme_mpath_clear_current_path(ns);
 		break;
 	default:
-		/*
-		 * Reset the controller for any non-ANA error as we don't know
-		 * what caused the error.
-		 */
-		nvme_reset_ctrl(ns->ctrl);
-		break;
+		/* This was a non-ANA error so follow the normal error path. */
+		return false;
 	}
 
+	spin_lock_irqsave(&ns->head->requeue_lock, flags);
+	blk_steal_bios(&ns->head->requeue_list, req);
+	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
+	blk_mq_end_request(req, 0);
+
 	kblockd_schedule_work(&ns->head->requeue_work);
+	return true;
 }
 
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1024fec7914c..d800b9a51c2c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -550,7 +550,7 @@ void nvme_mpath_wait_freeze(struct nvme_subsystem *subsys);
 void nvme_mpath_start_freeze(struct nvme_subsystem *subsys);
 void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 			struct nvme_ctrl *ctrl, int *flags);
-void nvme_failover_req(struct request *req);
+bool nvme_failover_req(struct request *req);
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
 void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id);
@@ -599,8 +599,9 @@ static inline void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 	sprintf(disk_name, "nvme%dn%d", ctrl->instance, ns->head->instance);
 }
 
-static inline void nvme_failover_req(struct request *req)
+static inline bool nvme_failover_req(struct request *req)
 {
+	return false;
 }
 static inline void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
 {
-- 
2.21.0


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-multipath: do not reset on unknown status
  2020-02-20 14:52 [PATCH] nvme-multipath: do not reset on unknown status Keith Busch
@ 2020-02-20 16:47 ` Christoph Hellwig
  2020-02-20 17:17   ` Meneghini, John
  2020-02-20 19:39 ` Hannes Reinecke
  2020-03-10 21:06 ` Keith Busch
  2 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2020-02-20 16:47 UTC (permalink / raw)
  To: Keith Busch; +Cc: hare, John Meneghini, hch, linux-nvme, sagi

On Thu, Feb 20, 2020 at 11:52:41PM +0900, Keith Busch wrote:
> From: John Meneghini <johnm@netapp.com>
> 
> The nvme multipath error handling defaults to controller reset if the
> error is unknown. There are, however, no existing nvme status codes that
> indicate a reset should be used, and resetting causes unnecessary
> disruption to the rest of IO.
> 
> Change nvme's error handling to first check if failover should happen.
> If not, let the normal error handling take over rather than reset the
> controller.
> 
> Signed-off-by: John Meneghini <johnm@netapp.com>
> [changelog]
> Signed-off-by: Keith Busch <kbusch@kernel.org>

This pretty much looks like my edited version of Hanne's patch with
John's changelog and your edits..

>  		if ((req->cmd_flags & REQ_NVME_MPATH) &&
> +		    nvme_failover_req(req))
> -		    blk_path_error(status)) {
> -			nvme_failover_req(req);
> +		    nvme_failover_req(req))

This conditional could now fit onto a single line.

Otherwise this looks fine to me.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-multipath: do not reset on unknown status
  2020-02-20 16:47 ` Christoph Hellwig
@ 2020-02-20 17:17   ` Meneghini, John
  0 siblings, 0 replies; 6+ messages in thread
From: Meneghini, John @ 2020-02-20 17:17 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Meneghini, John, sagi@grimberg.me, linux-nvme@lists.infradead.org,
	hare@suse.de

On 2/20/20, 11:47 AM, "Christoph Hellwig" <hch@lst.de> wrote:
    
    This pretty much looks like my edited version of Hanne's patch with
    John's changelog and your edits..

That's because it is!
    
    >               if ((req->cmd_flags & REQ_NVME_MPATH) &&
    > +                 nvme_failover_req(req))
    > -                 blk_path_error(status)) {
    > -                     nvme_failover_req(req);
    > +                 nvme_failover_req(req))
    
    This conditional could now fit onto a single line.
    
    Otherwise this looks fine to me.
  
Here's another copy of the version I sent to Kieth, with the new commit message.

From: John Meneghini <johnm@netapp.com>

The nvme multipath error handling defaults to controller reset if the
error is unknown. There are, however, no existing nvme status codes that
indicate a reset should be used, and resetting causes unnecessary
disruption to the rest of IO.

Change nvme's error handling to first check if failover should happen.
If not, let the normal error handling take over rather than reset the
controller.

Signed-off-by: John Meneghini <johnm@netapp.com>
[changelog]
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c      |  4 +---
 drivers/nvme/host/multipath.c | 22 ++++++++++------------
 drivers/nvme/host/nvme.h      |  5 +++--
 3 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 84914223c537..e1696fcf0f8b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -291,9 +291,7 @@ void nvme_complete_rq(struct request *req)
                nvme_req(req)->ctrl->comp_seen = true;
 
        if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
-               if ((req->cmd_flags & REQ_NVME_MPATH) &&
-                   blk_path_error(status)) {
-                       nvme_failover_req(req);
+               if ((req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req)) {
                        return;
                }
 
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 797c18337d96..8a1697f7986b 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -64,18 +64,12 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
        }
 }
 
-void nvme_failover_req(struct request *req)
+bool nvme_failover_req(struct request *req)
 {
        struct nvme_ns *ns = req->q->queuedata;
-       u16 status = nvme_req(req)->status;
        unsigned long flags;
 
-       spin_lock_irqsave(&ns->head->requeue_lock, flags);
-       blk_steal_bios(&ns->head->requeue_list, req);
-       spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
-       blk_mq_end_request(req, 0);
-
-       switch (status & 0x7ff) {
+       switch (nvme_req(req)->status & 0x7ff) {
        case NVME_SC_ANA_TRANSITION:
        case NVME_SC_ANA_INACCESSIBLE:
        case NVME_SC_ANA_PERSISTENT_LOSS:
@@ -104,14 +98,18 @@ void nvme_failover_req(struct request *req)
                break;
        default:
                /*
-                * Reset the controller for any non-ANA error as we don't know
-                * what caused the error.
+                * This was a non-ANA error so follow the normal error path.
                 */
-               nvme_reset_ctrl(ns->ctrl);
-               break;
+               return false;
        }

void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1024fec7914c..d800b9a51c2c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -550,7 +550,7 @@ void nvme_mpath_wait_freeze(struct nvme_subsystem *subsys);
 void nvme_mpath_start_freeze(struct nvme_subsystem *subsys);
 void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
                        struct nvme_ctrl *ctrl, int *flags);
-void nvme_failover_req(struct request *req);
+bool nvme_failover_req(struct request *req);
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
 void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id);
@@ -599,8 +599,9 @@ static inline void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
        sprintf(disk_name, "nvme%dn%d", ctrl->instance, ns->head->instance);
 }
 
-static inline void nvme_failover_req(struct request *req)
+static inline bool nvme_failover_req(struct request *req)
 {
+       return false;
 }
 static inline void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
 {
-- 
2.21.0

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-multipath: do not reset on unknown status
  2020-02-20 14:52 [PATCH] nvme-multipath: do not reset on unknown status Keith Busch
  2020-02-20 16:47 ` Christoph Hellwig
@ 2020-02-20 19:39 ` Hannes Reinecke
  2020-03-10 21:06 ` Keith Busch
  2 siblings, 0 replies; 6+ messages in thread
From: Hannes Reinecke @ 2020-02-20 19:39 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, hch, sagi; +Cc: John Meneghini

On 2/20/20 3:52 PM, Keith Busch wrote:
> From: John Meneghini <johnm@netapp.com>
> 
> The nvme multipath error handling defaults to controller reset if the
> error is unknown. There are, however, no existing nvme status codes that
> indicate a reset should be used, and resetting causes unnecessary
> disruption to the rest of IO.
> 
> Change nvme's error handling to first check if failover should happen.
> If not, let the normal error handling take over rather than reset the
> controller.
> 
> Signed-off-by: John Meneghini <johnm@netapp.com>
> [changelog]
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   drivers/nvme/host/core.c      |  4 +---
>   drivers/nvme/host/multipath.c | 21 +++++++++------------
>   drivers/nvme/host/nvme.h      |  5 +++--
>   3 files changed, 13 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 84914223c537..0eef1ef8c659 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -292,10 +292,8 @@ void nvme_complete_rq(struct request *req)
>   
>   	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
>   		if ((req->cmd_flags & REQ_NVME_MPATH) &&
> -		    blk_path_error(status)) {
> -			nvme_failover_req(req);
> +		    nvme_failover_req(req))
>   			return;
> -		}
>   
>   		if (!blk_queue_dying(req->q)) {
>   			nvme_retry_req(req);
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 797c18337d96..16df0baaeb40 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -64,17 +64,12 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
>   	}
>   }
>   
> -void nvme_failover_req(struct request *req)
> +bool nvme_failover_req(struct request *req)
>   {
>   	struct nvme_ns *ns = req->q->queuedata;
>   	u16 status = nvme_req(req)->status;
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(&ns->head->requeue_lock, flags);
> -	blk_steal_bios(&ns->head->requeue_list, req);
> -	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
> -	blk_mq_end_request(req, 0);
> -
>   	switch (status & 0x7ff) {
>   	case NVME_SC_ANA_TRANSITION:
>   	case NVME_SC_ANA_INACCESSIBLE:
> @@ -103,15 +98,17 @@ void nvme_failover_req(struct request *req)
>   		nvme_mpath_clear_current_path(ns);
>   		break;
>   	default:
> -		/*
> -		 * Reset the controller for any non-ANA error as we don't know
> -		 * what caused the error.
> -		 */
> -		nvme_reset_ctrl(ns->ctrl);
> -		break;
> +		/* This was a non-ANA error so follow the normal error path. */
> +		return false;
>   	}
>   
> +	spin_lock_irqsave(&ns->head->requeue_lock, flags);
> +	blk_steal_bios(&ns->head->requeue_list, req);
> +	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
> +	blk_mq_end_request(req, 0);
> +
>   	kblockd_schedule_work(&ns->head->requeue_work);
> +	return true;
>   }
>   
>   void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 1024fec7914c..d800b9a51c2c 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -550,7 +550,7 @@ void nvme_mpath_wait_freeze(struct nvme_subsystem *subsys);
>   void nvme_mpath_start_freeze(struct nvme_subsystem *subsys);
>   void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
>   			struct nvme_ctrl *ctrl, int *flags);
> -void nvme_failover_req(struct request *req);
> +bool nvme_failover_req(struct request *req);
>   void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
>   int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
>   void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id);
> @@ -599,8 +599,9 @@ static inline void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
>   	sprintf(disk_name, "nvme%dn%d", ctrl->instance, ns->head->instance);
>   }
>   
> -static inline void nvme_failover_req(struct request *req)
> +static inline bool nvme_failover_req(struct request *req)
>   {
> +	return false;
>   }
>   static inline void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
>   {
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-multipath: do not reset on unknown status
  2020-02-20 14:52 [PATCH] nvme-multipath: do not reset on unknown status Keith Busch
  2020-02-20 16:47 ` Christoph Hellwig
  2020-02-20 19:39 ` Hannes Reinecke
@ 2020-03-10 21:06 ` Keith Busch
  2020-03-10 22:08   ` Meneghini, John
  2 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2020-03-10 21:06 UTC (permalink / raw)
  To: Keith Busch
  Cc: Hannes Reinecke, John Meneghini, Christoph Hellwig, linux-nvme,
	Sagi Grimberg

On Thu, Feb 20, 2020 at 7:53 AM Keith Busch <kbusch@kernel.org> wrote:
>
> From: John Meneghini <johnm@netapp.com>
>
> The nvme multipath error handling defaults to controller reset if the
> error is unknown. There are, however, no existing nvme status codes that
> indicate a reset should be used, and resetting causes unnecessary
> disruption to the rest of IO.
>
> Change nvme's error handling to first check if failover should happen.
> If not, let the normal error handling take over rather than reset the
> controller.


I've applied this to 5.7. Many developers had input to this patch.
Please let me know if attribution should be adjusted, and I will fix
ASAP.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-multipath: do not reset on unknown status
  2020-03-10 21:06 ` Keith Busch
@ 2020-03-10 22:08   ` Meneghini, John
  0 siblings, 0 replies; 6+ messages in thread
From: Meneghini, John @ 2020-03-10 22:08 UTC (permalink / raw)
  To: Keith Busch, Keith Busch
  Cc: Hannes Reinecke, Meneghini, John, Christoph Hellwig, linux-nvme,
	Sagi Grimberg

I think the answer to this question needs to come from Hannes and Christoph.

This patch is a version of a change Hannes and I worked on privately for some time.

I reported this problem on this list months ago, when I proposed an ACRE patch,
and we've had a BZ open with SUSE for this problem.  After some time working on
this BZ Hannes took one of my suggestions and pushed a proposed patch out.

Christoph then significantly modified the patch, and I took both Christoph's
changes and Hannes' changes and pushed up the final patch.

/John

P.S. that’s for working on this Keith, I realize my patch did not apply cleanly.

On 3/10/20, 5:06 PM, "Keith Busch" <keith.busch@gmail.com> wrote:
    
    On Thu, Feb 20, 2020 at 7:53 AM Keith Busch <kbusch@kernel.org> wrote:
    >
    > From: John Meneghini <johnm@netapp.com>
    >
    > The nvme multipath error handling defaults to controller reset if the
    > error is unknown. There are, however, no existing nvme status codes that
    > indicate a reset should be used, and resetting causes unnecessary
    > disruption to the rest of IO.
    >
    > Change nvme's error handling to first check if failover should happen.
    > If not, let the normal error handling take over rather than reset the
    > controller.
    
    
    I've applied this to 5.7. Many developers had input to this patch.
    Please let me know if attribution should be adjusted, and I will fix
    ASAP.
    

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-03-10 22:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-02-20 14:52 [PATCH] nvme-multipath: do not reset on unknown status Keith Busch
2020-02-20 16:47 ` Christoph Hellwig
2020-02-20 17:17   ` Meneghini, John
2020-02-20 19:39 ` Hannes Reinecke
2020-03-10 21:06 ` Keith Busch
2020-03-10 22:08   ` Meneghini, John

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox