From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Anton Eidelman <anton@lightbitslabs.com>,
Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.4 11/68] nvme-multipath: fix hang when disk goes live over reconnect
Date: Mon, 23 May 2022 19:04:38 +0200 [thread overview]
Message-ID: <20220523165804.401178525@linuxfoundation.org> (raw)
In-Reply-To: <20220523165802.500642349@linuxfoundation.org>
From: Anton Eidelman <anton.eidelman@gmail.com>
[ Upstream commit a4a6f3c8f61c3cfbda4998ad94596059ad7e4332 ]
nvme_mpath_init_identify() invoked from nvme_init_identify() fetches a
fresh ANA log from the ctrl. This is essential to have an up to date
path states for both existing namespaces and for those scan_work may
discover once the ctrl is up.
This happens in the following cases:
1) A new ctrl is being connected.
2) An existing ctrl is successfully reconnected.
3) An existing ctrl is being reset.
While in (1) ctrl->namespaces is empty, (2 & 3) may have namespaces, and
nvme_read_ana_log() may call nvme_update_ns_ana_state().
This result in a hang when the ANA state of an existing namespace changes
and makes the disk live: nvme_mpath_set_live() issues IO to the namespace
through the ctrl, which does NOT have IO queues yet.
See sample hang below.
Solution:
- nvme_update_ns_ana_state() to call set_live only if ctrl is live
- nvme_read_ana_log() call from nvme_mpath_init_identify()
therefore only fetches and parses the ANA log;
any erros in this process will fail the ctrl setup as appropriate;
- a separate function nvme_mpath_update()
is called in nvme_start_ctrl();
this parses the ANA log without fetching it.
At this point the ctrl is live,
therefore, disks can be set live normally.
Sample failure:
nvme nvme0: starting error recovery
nvme nvme0: Reconnecting in 10 seconds...
block nvme0n6: no usable path - requeuing I/O
INFO: task kworker/u8:3:312 blocked for more than 122 seconds.
Tainted: G E 5.14.5-1.el7.elrepo.x86_64 #1
Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
Call Trace:
__schedule+0x2a2/0x7e0
schedule+0x4e/0xb0
io_schedule+0x16/0x40
wait_on_page_bit_common+0x15c/0x3e0
do_read_cache_page+0x1e0/0x410
read_cache_page+0x12/0x20
read_part_sector+0x46/0x100
read_lba+0x121/0x240
efi_partition+0x1d2/0x6a0
bdev_disk_changed.part.0+0x1df/0x430
bdev_disk_changed+0x18/0x20
blkdev_get_whole+0x77/0xe0
blkdev_get_by_dev+0xd2/0x3a0
__device_add_disk+0x1ed/0x310
device_add_disk+0x13/0x20
nvme_mpath_set_live+0x138/0x1b0 [nvme_core]
nvme_update_ns_ana_state+0x2b/0x30 [nvme_core]
nvme_update_ana_state+0xca/0xe0 [nvme_core]
nvme_parse_ana_log+0xac/0x170 [nvme_core]
nvme_read_ana_log+0x7d/0xe0 [nvme_core]
nvme_mpath_init_identify+0x105/0x150 [nvme_core]
nvme_init_identify+0x2df/0x4d0 [nvme_core]
nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core]
nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp]
nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp]
process_one_work+0x1bd/0x360
worker_thread+0x50/0x3d0
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/nvme/host/core.c | 1 +
drivers/nvme/host/multipath.c | 25 +++++++++++++++++++++++--
drivers/nvme/host/nvme.h | 4 ++++
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6a9a42809f97..79e22618817d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4047,6 +4047,7 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
if (ctrl->queue_count > 1) {
nvme_queue_scan(ctrl);
nvme_start_queues(ctrl);
+ nvme_mpath_update(ctrl);
}
ctrl->created = true;
}
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 4d615337e6e2..811f7b96b551 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -501,8 +501,17 @@ static void nvme_update_ns_ana_state(struct nvme_ana_group_desc *desc,
ns->ana_grpid = le32_to_cpu(desc->grpid);
ns->ana_state = desc->state;
clear_bit(NVME_NS_ANA_PENDING, &ns->flags);
-
- if (nvme_state_is_live(ns->ana_state))
+ /*
+ * nvme_mpath_set_live() will trigger I/O to the multipath path device
+ * and in turn to this path device. However we cannot accept this I/O
+ * if the controller is not live. This may deadlock if called from
+ * nvme_mpath_init_identify() and the ctrl will never complete
+ * initialization, preventing I/O from completing. For this case we
+ * will reprocess the ANA log page in nvme_mpath_update() once the
+ * controller is ready.
+ */
+ if (nvme_state_is_live(ns->ana_state) &&
+ ns->ctrl->state == NVME_CTRL_LIVE)
nvme_mpath_set_live(ns);
}
@@ -586,6 +595,18 @@ static void nvme_ana_work(struct work_struct *work)
nvme_read_ana_log(ctrl);
}
+void nvme_mpath_update(struct nvme_ctrl *ctrl)
+{
+ u32 nr_change_groups = 0;
+
+ if (!ctrl->ana_log_buf)
+ return;
+
+ mutex_lock(&ctrl->ana_lock);
+ nvme_parse_ana_log(ctrl, &nr_change_groups, nvme_update_ana_state);
+ mutex_unlock(&ctrl->ana_lock);
+}
+
static void nvme_anatt_timeout(struct timer_list *t)
{
struct nvme_ctrl *ctrl = from_timer(ctrl, t, anatt_timer);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 2df90d4355b9..1d1431dd4f9e 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -551,6 +551,7 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id);
void nvme_mpath_remove_disk(struct nvme_ns_head *head);
int nvme_mpath_init_identify(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id);
void nvme_mpath_init_ctrl(struct nvme_ctrl *ctrl);
+void nvme_mpath_update(struct nvme_ctrl *ctrl);
void nvme_mpath_uninit(struct nvme_ctrl *ctrl);
void nvme_mpath_stop(struct nvme_ctrl *ctrl);
bool nvme_mpath_clear_current_path(struct nvme_ns *ns);
@@ -648,6 +649,9 @@ static inline int nvme_mpath_init_identify(struct nvme_ctrl *ctrl,
"Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.\n");
return 0;
}
+static inline void nvme_mpath_update(struct nvme_ctrl *ctrl)
+{
+}
static inline void nvme_mpath_uninit(struct nvme_ctrl *ctrl)
{
}
--
2.35.1
next prev parent reply other threads:[~2022-05-23 17:17 UTC|newest]
Thread overview: 72+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-23 17:04 [PATCH 5.4 00/68] 5.4.196-rc1 review Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 01/68] floppy: use a statically allocated error counter Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 02/68] x86/xen: Make the boot CPU idle task reliable Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 03/68] x86/xen: Make the secondary CPU idle tasks reliable Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 04/68] rtc: fix use-after-free on device removal Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 05/68] um: Cleanup syscall_handler_t definition/cast, fix warning Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 06/68] Input: add bounds checking to input_set_capability() Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 07/68] Input: stmfts - fix reference leak in stmfts_input_open Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 08/68] crypto: stm32 - fix reference leak in stm32_crc_remove Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 09/68] crypto: x86/chacha20 - Avoid spurious jumps to other functions Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 10/68] ALSA: hda/realtek: Enable headset mic on Lenovo P360 Greg Kroah-Hartman
2022-05-23 17:04 ` Greg Kroah-Hartman [this message]
2022-05-23 17:04 ` [PATCH 5.4 12/68] rtc: mc146818-lib: Fix the AltCentury for AMD platforms Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 13/68] MIPS: lantiq: check the return value of kzalloc() Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 14/68] drbd: remove usage of list iterator variable after loop Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 15/68] platform/chrome: cros_ec_debugfs: detach log reader wq from devm Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 16/68] ARM: 9191/1: arm/stacktrace, kasan: Silence KASAN warnings in unwind_frame() Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 17/68] nilfs2: fix lockdep warnings in page operations for btree nodes Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 18/68] nilfs2: fix lockdep warnings during disk space reclamation Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 19/68] mmc: core: Specify timeouts for BKOPS and CACHE_FLUSH for eMMC Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 20/68] mmc: block: Use generic_cmd6_time when modifying INAND_CMD38_ARG_EXT_CSD Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 21/68] mmc: core: Default to generic_cmd6_time as timeout in __mmc_switch() Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 22/68] SUNRPC: Clean up scheduling of autoclose Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 23/68] SUNRPC: Prevent immediate close+reconnect Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 24/68] SUNRPC: Dont call connect() more than once on a TCP socket Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 25/68] SUNRPC: Ensure we flush any closed sockets before xs_xprt_free() Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 26/68] ALSA: wavefront: Proper check of get_user() error Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 27/68] perf: Fix sys_perf_event_open() race against self Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 28/68] Fix double fget() in vhost_net_set_backend() Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 29/68] PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 30/68] KVM: x86/mmu: Update number of zapped pages even if page list is stable Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 31/68] crypto: qcom-rng - fix infinite loop on requests not multiple of WORD_SZ Greg Kroah-Hartman
2022-05-23 17:04 ` [PATCH 5.4 32/68] drm/dp/mst: fix a possible memory leak in fetch_monitor_name() Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 33/68] dma-buf: fix use of DMA_BUF_SET_NAME_{A,B} in userspace Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 34/68] ARM: dts: aspeed-g6: remove FWQSPID group in pinctrl dtsi Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 35/68] ARM: dts: aspeed-g6: fix SPI1/SPI2 quad pin group Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 36/68] net: macb: Increment rx bd head after allocating skb and buffer Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 37/68] net/sched: act_pedit: sanitize shift argument before usage Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 38/68] net: vmxnet3: fix possible use-after-free bugs in vmxnet3_rq_alloc_rx_buf() Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 39/68] net: vmxnet3: fix possible NULL pointer dereference in vmxnet3_rq_cleanup() Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 40/68] ice: fix possible under reporting of ethtool Tx and Rx statistics Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 41/68] clk: at91: generated: consider range when calculating best rate Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 42/68] net/qla3xxx: Fix a test in ql_reset_work() Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 43/68] NFC: nci: fix sleep in atomic context bugs caused by nci_skb_alloc Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 44/68] net/mlx5e: Properly block LRO when XDP is enabled Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 45/68] net: af_key: add check for pfkey_broadcast in function pfkey_process Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 46/68] ARM: 9196/1: spectre-bhb: enable for Cortex-A15 Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 47/68] ARM: 9197/1: spectre-bhb: fix loop8 sequence for Thumb2 Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 48/68] igb: skip phy status check where unavailable Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 49/68] net: bridge: Clear offload_fwd_mark when passing frame up bridge interface Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 50/68] gpio: gpio-vf610: do not touch other bits when set the target bit Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 51/68] gpio: mvebu/pwm: Refuse requests with inverted polarity Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 52/68] perf bench numa: Address compiler error on s390 Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 53/68] scsi: qla2xxx: Fix missed DMA unmap for aborted commands Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 54/68] mac80211: fix rx reordering with non explicit / psmp ack policy Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 55/68] selftests: add ping test with ping_group_range tuned Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 56/68] ethernet: tulip: fix missing pci_disable_device() on error in tulip_init_one() Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 57/68] net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe() Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 58/68] net: atlantic: verify hw_head_ lies within TX buffer ring Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 59/68] Input: ili210x - fix reset timing Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 60/68] block: return ELEVATOR_DISCARD_MERGE if possible Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 61/68] net: stmmac: disable Split Header (SPH) for Intel platforms Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 62/68] firmware_loader: use kernel credentials when reading firmware Greg Kroah-Hartman
2022-05-23 17:05 ` [PATCH 5.4 63/68] ARM: dts: imx7: Use audio_mclk_post_div instead audio_mclk_root_clk Greg Kroah-Hartman
2022-05-23 18:06 ` [PATCH 5.4 00/68] 5.4.196-rc1 review Florian Fainelli
2022-05-23 22:57 ` Shuah Khan
2022-05-24 11:23 ` Naresh Kamboju
2022-05-24 14:52 ` Sudip Mukherjee
2022-05-24 20:03 ` Guenter Roeck
2022-05-25 0:42 ` Khalid Masum
2022-05-25 0:42 ` Khalid Masum
2022-05-25 1:06 ` Samuel Zou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220523165804.401178525@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=anton@lightbitslabs.com \
--cc=hch@lst.de \
--cc=linux-kernel@vger.kernel.org \
--cc=sagi@grimberg.me \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.