From: hare@kernel.org
To: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>, Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org, Hannes Reinecke <hare@suse.de>
Subject: [PATCH] nvme-tcp: Do not terminate commands when in RESETTING
Date: Thu, 11 Jan 2024 08:29:29 +0100
Message-ID: <20240111072929.29381-1-hare@kernel.org>

From: Hannes Reinecke <hare@suse.de>

Terminating commands from the timeout handler might lead
to data corruption, as a command timeout might trigger
before the KATO interval has expired.
This happens when several commands have been started
before the keep-alive command and their timeouts trigger
just after the keep-alive command has been sent.
The first timed-out command then triggers error recovery,
but all the other commands are aborted directly and
retried immediately.
So return BLK_EH_RESET_TIMER for I/O commands once
error recovery has been started, to ensure that the
commands are retried only after the KATO interval.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/tcp.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
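
For review purposes, the decision taken by nvme_tcp_timeout() with this
change applied can be modelled outside the kernel roughly as below. This
is a simplified sketch, not driver code: the enum names and
model_timeout() are hypothetical, and the two trailing branches are an
assumption about the existing handler (complete the command when the
controller is not LIVE, start error recovery when it is).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the controller states the handler looks at. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING, CTRL_DELETING };

/* Hypothetical labels for what the timeout handler ends up doing. */
enum timeout_action {
	ACT_RESET_TIMER,	/* leave the command alone, re-arm its timer */
	ACT_COMPLETE_NOW,	/* complete the timed-out command right away */
	ACT_START_RECOVERY,	/* start error recovery, then re-arm the timer */
};

/*
 * Model of the timeout decision. fw_act_pending stands in for
 * work_pending(&ctrl->fw_act_work); the WARN_ON() is not modelled.
 */
enum timeout_action model_timeout(enum ctrl_state state, int qid,
				  bool fw_act_pending)
{
	/* New check: I/O queue command while error recovery is running. */
	if (state == CTRL_RESETTING && qid > 0 && !fw_act_pending)
		return ACT_RESET_TIMER;

	/* Assumed existing path: not LIVE, complete the command now. */
	if (state != CTRL_LIVE)
		return ACT_COMPLETE_NOW;

	/* Assumed existing path: LIVE, start error recovery. */
	return ACT_START_RECOVERY;
}
```

In this model only I/O queue commands (qid > 0) seen in RESETTING take
the new path; admin queue commands and commands seen while fw_act_work
is pending still fall through to the immediate completion.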

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index b234f0674aeb..b9ec121b3fc6 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2429,6 +2429,18 @@ static enum blk_eh_timer_return nvme_tcp_timeout(struct request *rq)
 		 rq->tag, nvme_cid(rq), pdu->hdr.type, opc,
 		 nvme_opcode_str(qid, opc, fctype), qid);
 
+	/*
+	 * If error recovery has already been started, leave all
+	 * I/O commands alone; error recovery will abort them.
+	 * Otherwise they would be failed over immediately and might
+	 * cause data corruption.
+	 */
+	if (ctrl->state == NVME_CTRL_RESETTING && qid > 0) {
+		/* Avoid interfering with firmware activation */
+		if (!WARN_ON(work_pending(&ctrl->fw_act_work)))
+			return BLK_EH_RESET_TIMER;
+	}
+
 	if (ctrl->state != NVME_CTRL_LIVE) {
 		/*
 		 * If we are resetting, connecting or deleting we should
-- 
2.35.3


