public inbox for linux-nvme@lists.infradead.org
From: Hannes Reinecke <hare@suse.de>
To: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>, Keith Busch <kbusch@kernel.org>,
	linux-nvme@lists.infradead.org, Hannes Reinecke <hare@suse.de>
Subject: [PATCH 0/3] nvme-tcp: start error recovery after KATO
Date: Fri,  8 Sep 2023 12:00:46 +0200
Message-ID: <20230908100049.80809-1-hare@suse.de>

Hi all,

there have been some very insistent reports of data corruption
with certain target implementations due to command retries.
The problem here is that for TCP we start error recovery
immediately after either a command timeout or a (local) link loss.
That is contrary to the NVMe base spec, which states in
section 3.9:

 If a Keep Alive Timer expires:
  a) the controller shall ...
  and
  b) the host assumes all outstanding commands are not completed
     and re-issues commands as appropriate.

I.e. we should retry commands only after KATO has expired.
With this patchset we will always wait until KATO has expired before
starting error recovery. This will cause a longer delay until
failed commands are retried, but that's kinda the point
of this patchset :-)
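
To illustrate the timing rule described above, here is a minimal
user-space sketch of how a deferral could be computed so that error
recovery starts no earlier than the end of the current KATO interval.
It is not taken from the patches: the helper name recovery_delay_ms()
and its parameters (a millisecond clock, the time the current
keep-alive interval started) are assumptions made for illustration,
and the actual driver may compute the delay differently.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not from the patchset).
 *
 * now_ms      current monotonic time in milliseconds
 * ka_start_ms time the current keep-alive interval started (assumed input)
 * kato_ms     keep-alive timeout (KATO) in milliseconds
 *
 * Returns how long error recovery should be deferred so that it does
 * not start before the KATO interval has expired; 0 if it already has.
 */
static uint64_t recovery_delay_ms(uint64_t now_ms, uint64_t ka_start_ms,
				  uint64_t kato_ms)
{
	uint64_t elapsed;

	/* A monotonic clock never runs backwards. */
	assert(now_ms >= ka_start_ms);

	elapsed = now_ms - ka_start_ms;
	if (elapsed >= kato_ms)
		return 0;		/* KATO already expired: recover now */

	return kato_ms - elapsed;	/* wait out the rest of the interval */
}
```

With a KATO of 5000 ms, a command timeout seen 1000 ms into the
interval would defer recovery by 4000 ms, while one seen after the
interval has elapsed would not be deferred at all.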

As usual, comments and reviews are welcome.

Hannes Reinecke (3):
  nvme-tcp: Do not terminate commands when in RESETTING
  nvme-tcp: make 'err_work' a delayed work
  nvme-tcp: delay error recovery until the next KATO interval

 drivers/nvme/host/core.c |  3 ++-
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/tcp.c  | 29 +++++++++++++++++++++++------
 3 files changed, 26 insertions(+), 7 deletions(-)

-- 
2.35.3




Thread overview: 20+ messages
2023-09-08 10:00 Hannes Reinecke [this message]
2023-09-08 10:00 ` [PATCH 1/3] nvme-tcp: Do not terminate commands when in RESETTING Hannes Reinecke
2023-09-12 11:54   ` Sagi Grimberg
2023-09-12 12:06     ` Hannes Reinecke
2023-09-12 12:15       ` Sagi Grimberg
2023-09-12 12:59         ` Hannes Reinecke
2023-09-12 13:02           ` Sagi Grimberg
2023-09-08 10:00 ` [PATCH 2/3] nvme-tcp: make 'err_work' a delayed work Hannes Reinecke
2023-09-08 10:00 ` [PATCH 3/3] nvme-tcp: delay error recovery until the next KATO interval Hannes Reinecke
2023-09-12 12:17   ` Sagi Grimberg
2023-09-12 11:51 ` [PATCH 0/3] nvme-tcp: start error recovery after KATO Sagi Grimberg
2023-09-12 11:56   ` Hannes Reinecke
2023-09-12 12:12     ` Sagi Grimberg
2023-09-12 13:25       ` Hannes Reinecke
2023-09-12 13:43         ` Sagi Grimberg
2023-12-04 10:30           ` Sagi Grimberg
2023-12-04 11:37             ` Hannes Reinecke
2023-12-04 13:27               ` Sagi Grimberg
2023-12-04 14:15                 ` Hannes Reinecke
2023-12-04 16:11                   ` Sagi Grimberg
