From: Mike Snitzer <snitzer@redhat.com>
To: "Meneghini, John" <John.Meneghini@netapp.com>
Cc: Ewan Milne <emilne@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Chao Leng <lengchao@huawei.com>, Keith Busch <kbusch@kernel.org>,
	Hannes Reinecke <hare@suse.de>
Subject: [PATCH] nvme: restore use of blk_path_error() in nvme_complete_rq()
Date: Thu, 6 Aug 2020 15:19:43 -0400
Message-ID: <20200806191943.GA27868@redhat.com>
In-Reply-To: <20200806184057.GA27858@redhat.com>

On Thu, Aug 06 2020 at  2:40pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Thu, Aug 06 2020 at 12:17pm -0400,
> Meneghini, John <John.Meneghini@netapp.com> wrote:
> 
> > On 8/6/20, 11:59 AM, "Meneghini, John" <John.Meneghini@netapp.com> wrote:
> > 
> >     Maybe translate to
> >     >> BLK_STS_IOERR is also not suitable, we should translate
> >     >> NVME_SC_CMD_INTERRUPTED to BLK_STS_AGAIN.
> > 
> > I think this depends upon what the error handling is up the stack for BLK_STS_IOERR.
> > 
> > What does DM do with BLK_STS_IOERR?
> 
> DM treats it as retryable.  See blk_path_error().
> 
> >     > BLK_STS_AGAIN is a bad choice as we use it for calls that block when
> >     > the callers asked for non-blocking submission.  I'm really not sure
> >     > we want to change anything here - the error definition clearly states
> >     > it is not a failure but a request to retry later.
> > 
> > So it sounds like you may need a new BLK_STS error. However, even if you add
> > a new error, that's not going to be enough to communicate the CRDT or DNR
> > information up the stack.
> >
> > } blk_errors[] = {
> >         [BLK_STS_OK]            = { 0,          "" },
> >         [BLK_STS_NOTSUPP]       = { -EOPNOTSUPP, "operation not supported" },
> >         [BLK_STS_TIMEOUT]       = { -ETIMEDOUT, "timeout" },
> >         [BLK_STS_NOSPC]         = { -ENOSPC,    "critical space allocation" },
> >         [BLK_STS_TRANSPORT]     = { -ENOLINK,   "recoverable transport" },
> >         [BLK_STS_TARGET]        = { -EREMOTEIO, "critical target" },
> >         [BLK_STS_NEXUS]         = { -EBADE,     "critical nexus" },
> >         [BLK_STS_MEDIUM]        = { -ENODATA,   "critical medium" },
> >         [BLK_STS_PROTECTION]    = { -EILSEQ,    "protection" },
> >         [BLK_STS_RESOURCE]      = { -ENOMEM,    "kernel resource" },
> >         [BLK_STS_DEV_RESOURCE]  = { -EBUSY,     "device resource" },
> >         [BLK_STS_AGAIN]         = { -EAGAIN,    "nonblocking retry" },
> > 
> >         /* device mapper special case, should not leak out: */
> >         [BLK_STS_DM_REQUEUE]    = { -EREMCHG, "dm internal retry" },
> > 
> >         /* everything else not covered above: */
> >         [BLK_STS_IOERR]         = { -EIO,       "I/O" },
> > };
> > 
> 
> We've yet to determine how important it is that the target-provided
> delay information be honored...
> 
> In any case, NVMe translating NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET
> is definitely wrong.  That conveys the error is not retryable (see
> blk_path_error()).
> 
> Shouldn't NVMe translate NVME_SC_CMD_INTERRUPTED to BLK_STS_RESOURCE or
> BLK_STS_DEV_RESOURCE?
> 
> DM will retry immediately if BLK_STS_RESOURCE is returned.
> DM will delay a fixed 100ms if BLK_STS_DEV_RESOURCE is used.

Ngh, I got that inverted: BLK_STS_RESOURCE results in the 100ms
delayed retry, while BLK_STS_DEV_RESOURCE results in an immediate retry.
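The corrected mapping can be restated as a tiny lookup. This is a hypothetical userspace model of what this message says DM does with the two statuses, not code taken from DM; the enum and function names are invented for illustration.

```c
/* Hypothetical stand-ins for the two block statuses discussed above;
 * these are NOT the kernel's blk_status_t values. */
enum dm_sketch_status {
	DM_SKETCH_RESOURCE,     /* models BLK_STS_RESOURCE */
	DM_SKETCH_DEV_RESOURCE, /* models BLK_STS_DEV_RESOURCE */
};

/* Requeue delay in milliseconds, per the corrected statement above:
 * BLK_STS_RESOURCE     -> fixed 100ms delayed retry
 * BLK_STS_DEV_RESOURCE -> immediate retry */
static unsigned int dm_sketch_requeue_delay_ms(enum dm_sketch_status sts)
{
	switch (sts) {
	case DM_SKETCH_RESOURCE:
		return 100;
	case DM_SKETCH_DEV_RESOURCE:
	default:
		return 0;
	}
}
```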

But going back to BLK_STS_IOERR by reverting commit 35038bffa87 would
work too.

> (Ming said BLK_STS_RESOURCE isn't Linux [block core] specific and can
> be used by drivers)
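For reference, here is a userspace sketch of the classification blk_path_error() performs, as we understand the helper at the time of this thread. The enum is a hypothetical stand-in for blk_status_t (names mirror the blk_errors[] table quoted above, values are not the kernel's), and sketch_blk_path_error() reproduces the helper's switch. It shows why BLK_STS_TARGET stops DM multipath from retrying while BLK_STS_IOERR and BLK_STS_RESOURCE do not.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for blk_status_t; numeric values are illustrative. */
enum sketch_blk_status {
	STS_OK, STS_NOTSUPP, STS_TIMEOUT, STS_NOSPC, STS_TRANSPORT,
	STS_TARGET, STS_NEXUS, STS_MEDIUM, STS_PROTECTION, STS_RESOURCE,
	STS_DEV_RESOURCE, STS_AGAIN, STS_IOERR,
};

/* Mirrors blk_path_error(): these statuses describe a failure that another
 * path cannot fix, so a multipath layer must not retry them. Anything else
 * could be a path failure and is treated as retryable. Callers only pass a
 * non-OK status. */
static bool sketch_blk_path_error(enum sketch_blk_status error)
{
	switch (error) {
	case STS_NOTSUPP:
	case STS_NOSPC:
	case STS_TARGET:
	case STS_NEXUS:
	case STS_MEDIUM:
	case STS_PROTECTION:
		return false;
	default:
		return true;
	}
}
```

For example, sketch_blk_path_error(STS_TARGET) is false, so DM fails the I/O rather than retrying it, whereas STS_IOERR and STS_RESOURCE yield true.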

Regardless, reading back on this thread, I think there is at least some
consensus about reverting commit 35038bffa87 ("nvme: Translate more
status codes to blk_status_t")?

And on a related note, building on the thread I started here (on which
I haven't yet heard back from any NVMe maintainers):
https://www.redhat.com/archives/dm-devel/2020-July/msg00051.html
I'd also be happy as a pig in shit if this patch were applied:

From: Mike Snitzer <snitzer@redhat.com>
Date: Thu, 2 Jul 2020 01:43:27 -0400
Subject: [PATCH] nvme: restore use of blk_path_error() in nvme_complete_rq()

Commit 764e9332098c0 ("nvme-multipath: do not reset on unknown
status") removed NVMe's use of blk_path_error(), presumably because
nvme_failover_req() was modified to return whether a command should be
retried or not.

By not using blk_path_error(), NVMe risks serious regressions in how
upper layers (e.g. DM multipath) respond to its error conditions.  This
has now played out due to commit 35038bffa87 ("nvme: Translate more
status codes to blk_status_t").  Had NVMe continued to use
blk_path_error(), it too would not have retried an NVMe command that
got NVME_SC_CMD_INTERRUPTED.

Fix this potential error-handling regression, which may also surface
outside NVMe, by restoring NVMe's use of blk_path_error().

Fixes: 764e9332098c0 ("nvme-multipath: do not reset on unknown status")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/nvme/host/core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6585d57112ad..072f629da4d8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -290,8 +290,13 @@ void nvme_complete_rq(struct request *req)
 		nvme_req(req)->ctrl->comp_seen = true;
 
 	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
-		if ((req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
-			return;
+		if (blk_path_error(status)) {
+			if (req->cmd_flags & REQ_NVME_MPATH) {
+				if (nvme_failover_req(req))
+					return;
+				/* fallthru to normal error handling */
+			}
+		}
 
 		if (!blk_queue_dying(req->q)) {
 			nvme_retry_req(req);
-- 
2.18.0
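To make the patched control flow concrete, here is a hypothetical userspace model of the error branch in nvme_complete_rq() after this change. Plain booleans stand in for blk_path_error(status), the REQ_NVME_MPATH flag, the return value of nvme_failover_req() and blk_queue_dying(); sketch_complete_rq() and its enum are illustrative names, not kernel code. The model assumes status != BLK_STS_OK and nvme_req_needs_retry() has already returned true.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the patched error branch. */
enum sketch_action {
	ACTION_FAILOVER, /* nvme_failover_req() took the request; return */
	ACTION_RETRY,    /* nvme_retry_req() on the same controller */
	ACTION_COMPLETE, /* queue is dying: complete with the error status */
};

/* path_error:  result of blk_path_error(status)
 * mpath:       req->cmd_flags & REQ_NVME_MPATH
 * failover_ok: what nvme_failover_req(req) would return
 * queue_dying: blk_queue_dying(req->q)
 * The && order matches the patch: failover is only attempted for
 * path errors on multipath requests. */
static enum sketch_action sketch_complete_rq(bool path_error, bool mpath,
					     bool failover_ok, bool queue_dying)
{
	if (path_error && mpath && failover_ok)
		return ACTION_FAILOVER;

	/* fall through to normal NVMe error handling */
	if (!queue_dying)
		return ACTION_RETRY;

	return ACTION_COMPLETE;
}
```

With path_error false (e.g. BLK_STS_TARGET), the model never reaches the failover step and falls through to the normal NVMe retry, which is the behaviour restoring blk_path_error() is meant to guarantee.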


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
