From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Sagi Grimberg <sagi@grimberg.me>
Subject: [PATCH 5.4 21/23] nvme-mpath: replace direct_make_request with generic_make_request
Date: Fri,  9 Apr 2021 11:53:51 +0200
Message-ID: <20210409095303.569029742@linuxfoundation.org>
In-Reply-To: <20210409095302.894568462@linuxfoundation.org>

From: Sagi Grimberg <sagi@grimberg.me>

The below patches caused a regression in a multipath setup:
Fixes: 9f98772ba307 ("nvme-rdma: fix controller reset hang during traffic")
Fixes: 2875b0aecabe ("nvme-tcp: fix controller reset hang during traffic")

These patches on their own are correct because they fixed a controller reset
regression.

When we reset/teardown a controller, we must freeze and quiesce the namespaces'
request queues to make sure that we safely stop inflight I/O submissions.
Freeze is mandatory because if our hctx map changed between reconnects,
blk_mq_update_nr_hw_queues will immediately attempt to freeze the queue, and
if it still has pending submissions (that are still quiesced) it will hang.
This is what the above patches fixed.

However, by freezing the namespaces' request queues, and only unfreezing them
when we successfully reconnect, inflight submissions that are running
concurrently can now block while holding the nshead srcu until either we
successfully reconnect or ctrl_loss_tmo expires (or the user explicitly
disconnects).

This caused a deadlock [1] when a different controller (different path on the
same subsystem) became live (i.e. optimized/non-optimized). This is because
nvme_mpath_set_live needs to synchronize the nshead srcu before requeueing I/O
in order to make sure that current_path is visible to future (re)submissions.
However, the srcu read lock is held by a submission that is blocked on a frozen
request queue, and we have a deadlock.

In recent kernels (v5.9+) direct_make_request was replaced by submit_bio_noacct,
which does not have this issue because its bio_list will be active when
nvme-mpath calls submit_bio_noacct on the bottom device (because it was
populated when submit_bio was triggered on it).

Hence, we need to fix all kernels that predate the introduction of
submit_bio_noacct.
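
The difference between the two submission calls can be illustrated with a small
userspace model. This is only a sketch and not kernel code: every toy_* name,
the toy_task structure and its list_active flag (standing in for an active
current->bio_list) are hypothetical, and the srcu read section is reduced to a
counter. It assumes, as the description above suggests, that a bio submitted
while a bio_list is active is queued and only enters the bottom queue after the
head's make_request function has returned and dropped the srcu read lock.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical stand-in for a bio; only an id is needed here. */
struct toy_bio { int id; };

/* Hypothetical stand-in for the per-task state used by the block layer. */
struct toy_task {
	struct toy_bio *pending[MAX_PENDING]; /* models current->bio_list */
	size_t npending;
	int list_active;   /* nonzero while a submission loop is running */
	int srcu_readers;  /* nonzero inside the modelled srcu read section */
};

static struct toy_task current_task;
static int entered_with_srcu_held = -1;

/* Models entering the bottom queue, which can block while it is frozen. */
static void toy_queue_enter(struct toy_bio *bio)
{
	(void)bio;
	entered_with_srcu_held = current_task.srcu_readers > 0;
}

/* Models direct submission: enter the bottom queue in the caller's context. */
static void toy_direct_make_request(struct toy_bio *bio)
{
	toy_queue_enter(bio);
}

/* Models deferred submission: queue the bio if a list is already active. */
static void toy_generic_make_request(struct toy_bio *bio)
{
	if (current_task.list_active) {
		assert(current_task.npending < MAX_PENDING);
		current_task.pending[current_task.npending++] = bio;
		return;
	}
	toy_queue_enter(bio);
}

/* Models the multipath head: pick a path under srcu, then submit. */
static void toy_head_make_request(struct toy_bio *bio, int use_generic)
{
	current_task.srcu_readers++;        /* srcu read lock */
	if (use_generic)
		toy_generic_make_request(bio);
	else
		toy_direct_make_request(bio);
	current_task.srcu_readers--;        /* srcu read unlock */
}

/* Models the top-level submission loop; returns 1 if the bottom queue
 * was entered while the srcu read section was still held, else 0. */
static int toy_submit(int use_generic)
{
	struct toy_bio bio = { .id = 1 };
	size_t i;

	current_task.npending = 0;
	current_task.srcu_readers = 0;
	entered_with_srcu_held = -1;

	current_task.list_active = 1;
	toy_head_make_request(&bio, use_generic);
	/* Drain deferred bios after the head has dropped srcu. */
	for (i = 0; i < current_task.npending; i++)
		toy_queue_enter(current_task.pending[i]);
	current_task.list_active = 0;

	return entered_with_srcu_held;
}
```

In this model toy_submit(0), the direct variant, returns 1: the bottom queue is
entered inside the srcu read section, which is where a frozen queue would block
the reader. toy_submit(1), the deferred variant, returns 0, so a blocked
submission would no longer hold up a concurrent srcu synchronization.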

[1]:
Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
Call Trace:
 __schedule+0x293/0x730
 schedule+0x33/0xa0
 schedule_timeout+0x1d3/0x2f0
 wait_for_completion+0xba/0x140
 __synchronize_srcu.part.21+0x91/0xc0
 synchronize_srcu_expedited+0x27/0x30
 synchronize_srcu+0xce/0xe0
 nvme_mpath_set_live+0x64/0x130 [nvme_core]
 nvme_update_ns_ana_state+0x2c/0x30 [nvme_core]
 nvme_update_ana_state+0xcd/0xe0 [nvme_core]
 nvme_parse_ana_log+0xa1/0x180 [nvme_core]
 nvme_read_ana_log+0x76/0x100 [nvme_core]
 nvme_mpath_init+0x122/0x180 [nvme_core]
 nvme_init_identify+0x80e/0xe20 [nvme_core]
 nvme_tcp_setup_ctrl+0x359/0x660 [nvme_tcp]
 nvme_tcp_reconnect_ctrl_work+0x24/0x70 [nvme_tcp]

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/nvme/host/multipath.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -330,7 +330,7 @@ static blk_qc_t nvme_ns_head_make_reques
 		trace_block_bio_remap(bio->bi_disk->queue, bio,
 				      disk_devt(ns->head->disk),
 				      bio->bi_iter.bi_sector);
-		ret = direct_make_request(bio);
+		ret = generic_make_request(bio);
 	} else if (nvme_available_path(head)) {
 		dev_warn_ratelimited(dev, "no usable path - requeuing I/O\n");
 


