All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Anton Eidelman <anton@lightbitslabs.com>,
	Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.10 22/97] nvme-multipath: fix hang when disk goes live over reconnect
Date: Mon, 23 May 2022 19:05:26 +0200	[thread overview]
Message-ID: <20220523165815.776802739@linuxfoundation.org> (raw)
In-Reply-To: <20220523165812.244140613@linuxfoundation.org>

From: Anton Eidelman <anton.eidelman@gmail.com>

[ Upstream commit a4a6f3c8f61c3cfbda4998ad94596059ad7e4332 ]

nvme_mpath_init_identify() invoked from nvme_init_identify() fetches a
fresh ANA log from the ctrl.  This is essential to have an up to date
path states for both existing namespaces and for those scan_work may
discover once the ctrl is up.

This happens in the following cases:
  1) A new ctrl is being connected.
  2) An existing ctrl is successfully reconnected.
  3) An existing ctrl is being reset.

While in (1) ctrl->namespaces is empty, (2 & 3) may have namespaces, and
nvme_read_ana_log() may call nvme_update_ns_ana_state().

This result in a hang when the ANA state of an existing namespace changes
and makes the disk live: nvme_mpath_set_live() issues IO to the namespace
through the ctrl, which does NOT have IO queues yet.

See sample hang below.

Solution:
- nvme_update_ns_ana_state() to call set_live only if ctrl is live
- nvme_read_ana_log() call from nvme_mpath_init_identify()
  therefore only fetches and parses the ANA log;
  any erros in this process will fail the ctrl setup as appropriate;
- a separate function nvme_mpath_update()
  is called in nvme_start_ctrl();
  this parses the ANA log without fetching it.
  At this point the ctrl is live,
  therefore, disks can be set live normally.

Sample failure:
    nvme nvme0: starting error recovery
    nvme nvme0: Reconnecting in 10 seconds...
    block nvme0n6: no usable path - requeuing I/O
    INFO: task kworker/u8:3:312 blocked for more than 122 seconds.
          Tainted: G            E     5.14.5-1.el7.elrepo.x86_64 #1
    Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
    Call Trace:
     __schedule+0x2a2/0x7e0
     schedule+0x4e/0xb0
     io_schedule+0x16/0x40
     wait_on_page_bit_common+0x15c/0x3e0
     do_read_cache_page+0x1e0/0x410
     read_cache_page+0x12/0x20
     read_part_sector+0x46/0x100
     read_lba+0x121/0x240
     efi_partition+0x1d2/0x6a0
     bdev_disk_changed.part.0+0x1df/0x430
     bdev_disk_changed+0x18/0x20
     blkdev_get_whole+0x77/0xe0
     blkdev_get_by_dev+0xd2/0x3a0
     __device_add_disk+0x1ed/0x310
     device_add_disk+0x13/0x20
     nvme_mpath_set_live+0x138/0x1b0 [nvme_core]
     nvme_update_ns_ana_state+0x2b/0x30 [nvme_core]
     nvme_update_ana_state+0xca/0xe0 [nvme_core]
     nvme_parse_ana_log+0xac/0x170 [nvme_core]
     nvme_read_ana_log+0x7d/0xe0 [nvme_core]
     nvme_mpath_init_identify+0x105/0x150 [nvme_core]
     nvme_init_identify+0x2df/0x4d0 [nvme_core]
     nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core]
     nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp]
     nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp]
     process_one_work+0x1bd/0x360
     worker_thread+0x50/0x3d0

Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/nvme/host/core.c      |  1 +
 drivers/nvme/host/multipath.c | 25 +++++++++++++++++++++++--
 drivers/nvme/host/nvme.h      |  4 ++++
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ad4f1cfbad2e..e73a5c62a858 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4420,6 +4420,7 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
 	if (ctrl->queue_count > 1) {
 		nvme_queue_scan(ctrl);
 		nvme_start_queues(ctrl);
+		nvme_mpath_update(ctrl);
 	}
 }
 EXPORT_SYMBOL_GPL(nvme_start_ctrl);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 18a756444d5a..a9e15c8f907b 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -484,8 +484,17 @@ static void nvme_update_ns_ana_state(struct nvme_ana_group_desc *desc,
 	ns->ana_grpid = le32_to_cpu(desc->grpid);
 	ns->ana_state = desc->state;
 	clear_bit(NVME_NS_ANA_PENDING, &ns->flags);
-
-	if (nvme_state_is_live(ns->ana_state))
+	/*
+	 * nvme_mpath_set_live() will trigger I/O to the multipath path device
+	 * and in turn to this path device.  However we cannot accept this I/O
+	 * if the controller is not live.  This may deadlock if called from
+	 * nvme_mpath_init_identify() and the ctrl will never complete
+	 * initialization, preventing I/O from completing.  For this case we
+	 * will reprocess the ANA log page in nvme_mpath_update() once the
+	 * controller is ready.
+	 */
+	if (nvme_state_is_live(ns->ana_state) &&
+	    ns->ctrl->state == NVME_CTRL_LIVE)
 		nvme_mpath_set_live(ns);
 }
 
@@ -572,6 +581,18 @@ static void nvme_ana_work(struct work_struct *work)
 	nvme_read_ana_log(ctrl);
 }
 
+void nvme_mpath_update(struct nvme_ctrl *ctrl)
+{
+	u32 nr_change_groups = 0;
+
+	if (!ctrl->ana_log_buf)
+		return;
+
+	mutex_lock(&ctrl->ana_lock);
+	nvme_parse_ana_log(ctrl, &nr_change_groups, nvme_update_ana_state);
+	mutex_unlock(&ctrl->ana_lock);
+}
+
 static void nvme_anatt_timeout(struct timer_list *t)
 {
 	struct nvme_ctrl *ctrl = from_timer(ctrl, t, anatt_timer);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 10e5ae3a8c0d..95b9657cabaf 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -712,6 +712,7 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id);
 void nvme_mpath_remove_disk(struct nvme_ns_head *head);
 int nvme_mpath_init_identify(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id);
 void nvme_mpath_init_ctrl(struct nvme_ctrl *ctrl);
+void nvme_mpath_update(struct nvme_ctrl *ctrl);
 void nvme_mpath_uninit(struct nvme_ctrl *ctrl);
 void nvme_mpath_stop(struct nvme_ctrl *ctrl);
 bool nvme_mpath_clear_current_path(struct nvme_ns *ns);
@@ -798,6 +799,9 @@ static inline int nvme_mpath_init_identify(struct nvme_ctrl *ctrl,
 "Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.\n");
 	return 0;
 }
+static inline void nvme_mpath_update(struct nvme_ctrl *ctrl)
+{
+}
 static inline void nvme_mpath_uninit(struct nvme_ctrl *ctrl)
 {
 }
-- 
2.35.1




  parent reply	other threads:[~2022-05-23 17:18 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-23 17:05 [PATCH 5.10 00/97] 5.10.118-rc1 review Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 01/97] usb: gadget: fix race when gadget driver register via ioctl Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 02/97] io_uring: always grab file table for deferred statx Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 03/97] floppy: use a statically allocated error counter Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 04/97] Revert "drm/i915/opregion: check port number bounds for SWSCI display power state" Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 05/97] igc: Remove _I_PHY_ID checking Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 06/97] igc: Remove phy->type checking Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 07/97] igc: Update I226_K device ID Greg Kroah-Hartman
2022-05-25 10:45   ` Pavel Machek
2022-05-26  4:02     ` Neftin, Sasha
2022-05-23 17:05 ` [PATCH 5.10 08/97] rtc: fix use-after-free on device removal Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 09/97] rtc: pcf2127: fix bug when reading alarm registers Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 10/97] um: Cleanup syscall_handler_t definition/cast, fix warning Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 11/97] Input: add bounds checking to input_set_capability() Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 12/97] Input: stmfts - fix reference leak in stmfts_input_open Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 13/97] nvme-pci: add quirks for Samsung X5 SSDs Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 14/97] gfs2: Disable page faults during lockless buffered reads Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 15/97] rtc: sun6i: Fix time overflow handling Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 16/97] crypto: stm32 - fix reference leak in stm32_crc_remove Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 17/97] crypto: x86/chacha20 - Avoid spurious jumps to other functions Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 18/97] ALSA: hda/realtek: Enable headset mic on Lenovo P360 Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 19/97] s390/pci: improve zpci_dev reference counting Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 20/97] vhost_vdpa: dont setup irq offloading when irq_num < 0 Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 21/97] tools/virtio: compile with -pthread Greg Kroah-Hartman
2022-05-23 17:05 ` Greg Kroah-Hartman [this message]
2022-05-23 17:05 ` [PATCH 5.10 23/97] rtc: mc146818-lib: Fix the AltCentury for AMD platforms Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 24/97] fs: fix an infinite loop in iomap_fiemap Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 25/97] MIPS: lantiq: check the return value of kzalloc() Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 26/97] drbd: remove usage of list iterator variable after loop Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 27/97] platform/chrome: cros_ec_debugfs: detach log reader wq from devm Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 28/97] ARM: 9191/1: arm/stacktrace, kasan: Silence KASAN warnings in unwind_frame() Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 29/97] nilfs2: fix lockdep warnings in page operations for btree nodes Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 30/97] nilfs2: fix lockdep warnings during disk space reclamation Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.10 31/97] Revert "swiotlb: fix info leak with DMA_FROM_DEVICE" Greg Kroah-Hartman
2022-05-23 18:25 ` [PATCH 5.10 00/97] 5.10.118-rc1 review Florian Fainelli
2022-05-23 21:36 ` Daniel Díaz
2022-05-25  7:16   ` Greg Kroah-Hartman
2022-05-23 22:56 ` Shuah Khan
2022-05-24  9:24 ` Fox Chen
2022-05-24 14:49 ` Sudip Mukherjee
2022-05-24 15:25 ` Pavel Machek
2022-05-24 20:04 ` Guenter Roeck
2022-05-25  1:05 ` Samuel Zou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220523165815.776802739@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=anton@lightbitslabs.com \
    --cc=hch@lst.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sagi@grimberg.me \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.