public inbox for linux-nvme@lists.infradead.org
* [PATCH v3 0/3] nvme-fc: fix race with connectivity loss and nvme_fc_create_association
@ 2024-11-29  9:28 Daniel Wagner
From: Daniel Wagner @ 2024-11-29  9:28 UTC (permalink / raw)
  To: James Smart, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Hannes Reinecke, Paul Ely
  Cc: linux-nvme, linux-kernel, Daniel Wagner

After a long hard stare at the keep alive machinery I am convinced we
need to trigger a reset when nvme_keep_alive_end_io is called with status
!= 0. There is a lengthy explanation in patch #3.
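
As a rough illustration of that rule, here is a minimal user-space sketch, not the kernel code. Every name prefixed with sketch_ is hypothetical and only models the idea: a keep alive completion with a non-zero status schedules a reset instead of re-arming the keep alive timer.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum sketch_ctrl_state { SKETCH_LIVE, SKETCH_RESETTING };

struct sketch_ctrl {
	enum sketch_ctrl_state state;
	unsigned int resets_scheduled;	/* how often a reset was scheduled */
	unsigned int ka_rearmed;	/* how often keep alive was re-armed */
};

/* Models scheduling a reset; refuses if the controller is not LIVE. */
static bool sketch_ctrl_reset(struct sketch_ctrl *ctrl)
{
	if (ctrl->state != SKETCH_LIVE)
		return false;
	ctrl->state = SKETCH_RESETTING;
	ctrl->resets_scheduled++;
	return true;
}

/* Models the keep alive completion handler described above. */
static void sketch_keep_alive_end_io(struct sketch_ctrl *ctrl, int status)
{
	if (status != 0) {
		/* Keep alive failed: trigger a reset, do not re-arm. */
		sketch_ctrl_reset(ctrl);
		return;
	}
	/* Keep alive succeeded: re-arm for the next interval. */
	ctrl->ka_rearmed++;
}
```

In this model a second failed completion while the reset is pending does not schedule another reset; whether the real handler behaves the same way is an assumption of the sketch.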

I've also tested this version with blktests and some manual tests, though
it's not that easy to get into the exact sequence reported by Paul.

Daniel

previous cover letter:

We got a bug report that a controller was stuck in the connected state
after an association dropped.

It turns out that nvme_fc_create_association can succeed even though some
operations fail. This is on purpose to handle the degraded controller
case, where the admin queue is up and running but not the io queues. In
this case the controller will still reach the LIVE state.

Unfortunately, this will also ignore full connectivity loss for fabric
controllers. Let's address this by not filtering out all errors in
nvme_set_queue_count.
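
The distinction between a degraded controller and a lost connection can be sketched as follows. This is a user-space model under stated assumptions, not the code from the patch: sketch_set_queue_count, SKETCH_STATUS_PATH_LOST and its value, and the choice of -ENOTCONN are all hypothetical and only illustrate "do not filter out all errors".

```c
#include <errno.h>

/* Hypothetical placeholder for a "connection lost" status value. */
#define SKETCH_STATUS_PATH_LOST 0x1000

/*
 * status:  result of the (modelled) set-features command;
 *          < 0 is a transport error, > 0 a controller status.
 * granted: number of io queues the controller granted on success.
 * count:   in: requested io queues, out: usable io queues.
 */
static int sketch_set_queue_count(int status, int granted, int *count)
{
	/* Transport errors are always propagated. */
	if (status < 0)
		return status;

	/* Connectivity loss must not be mistaken for a degraded controller. */
	if (status == SKETCH_STATUS_PATH_LOST)
		return -ENOTCONN;

	/*
	 * Any other controller error: degraded case. Continue with the
	 * admin queue only, so the controller can still reach LIVE.
	 */
	if (status > 0) {
		*count = 0;
		return 0;
	}

	/* Success: never use more queues than the controller granted. */
	if (granted < *count)
		*count = granted;
	return 0;
}
```

With this split, only the degraded case reports success with zero io queues; a lost connection surfaces as an error to the caller, which can then fail the association attempt.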

---
Changes in v3:
- collected reviewed tags
- added nvme_ctrl_reset to keep alive end io handler
- Link to v2: https://lore.kernel.org/r/20241029-nvme-fc-handle-com-lost-v2-0-5b0d137e2a0a@kernel.org

Changes in v2:
- handle connection lost in nvme_set_queue_count directly
- collected reviewed tags
- Link to v1: https://lore.kernel.org/r/20240611190647.11856-1-dwagner@suse.de

---
Daniel Wagner (3):
      nvme-fc: go straight to connecting state when initializing
      nvme: trigger reset when keep alive fails
      nvme: handle connectivity loss in nvme_set_queue_count

 drivers/nvme/host/core.c | 13 ++++++++++++-
 drivers/nvme/host/fc.c   |  3 +--
 2 files changed, 13 insertions(+), 3 deletions(-)
---
base-commit: 029cc98dec2eadb5d0978b5fea9ae6c427f2a020
change-id: 20241029-nvme-fc-handle-com-lost-9b241936809a

Best regards,
-- 
Daniel Wagner <wagi@kernel.org>



Thread overview: 18+ messages
2024-11-29  9:28 [PATCH v3 0/3] nvme-fc: fix race with connectivity loss and nvme_fc_create_association Daniel Wagner
2024-11-29  9:28 ` [PATCH v3 1/3] nvme-fc: go straight to connecting state when initializing Daniel Wagner
2024-11-29  9:28 ` [PATCH v3 2/3] nvme: trigger reset when keep alive fails Daniel Wagner
2024-11-29 11:09   ` Hannes Reinecke
2024-12-09 13:36   ` Christoph Hellwig
2024-12-24 10:31   ` Sagi Grimberg
2025-01-07 14:38     ` Daniel Wagner
2025-01-08 10:50       ` Sagi Grimberg
2024-11-29  9:28 ` [PATCH v3 3/3] nvme: handle connectivity loss in nvme_set_queue_count Daniel Wagner
2024-11-29 11:10   ` Hannes Reinecke
2024-12-17  8:35     ` Daniel Wagner
2024-12-17  9:45       ` Hannes Reinecke
2024-12-17 14:01         ` Daniel Wagner
2024-12-20  8:32           ` Hannes Reinecke
2024-12-24 10:35       ` Sagi Grimberg
2025-01-07 14:40         ` Daniel Wagner
2025-01-08 10:51           ` Sagi Grimberg
2024-12-24 10:39   ` Sagi Grimberg