From: Sasha Levin <Alexander.Levin@microsoft.com>
To: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Cc: Rafael David Tinoco <rafael.tinoco@canonical.com>,
"Martin K . Petersen" <martin.petersen@oracle.com>,
Sasha Levin <Alexander.Levin@microsoft.com>
Subject: [PATCH AUTOSEL for 4.14 31/97] scsi: libiscsi: Allow sd_shutdown on bad transport
Date: Mon, 19 Mar 2018 15:55:05 +0000
Message-ID: <20180319155411.12348-31-alexander.levin@microsoft.com>
In-Reply-To: <20180319155411.12348-1-alexander.levin@microsoft.com>
From: Rafael David Tinoco <rafael.tinoco@canonical.com>
[ Upstream commit d754941225a7dbc61f6dd2173fa9498049f9a7ee ]
If, for any reason, userland shuts down iSCSI transport interfaces
before proper logouts (for example, when LUNs were logged in to
manually and not logged out on server shutdown, or when automated
scripts can't umount/logout from logged-in LUNs), the kernel will hang
forever in its sd_sync_cache() logic after issuing the
SYNCHRONIZE_CACHE cmd to all still-existing paths.
PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
#0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
#1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
#2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
#3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
#4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
#5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
#6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
#7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
#8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
#9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c
This happens because iscsi_eh_cmd_timed_out(), the transport layer
timeout helper, tells the queue timeout function (scsi_times_out)
to reset the request timer over and over, until the session is back
in the logged-in state. Unfortunately, during server shutdown, this
might never happen again.
Another option would be "not to handle" the issue in the transport
layer. That would trigger the error handler logic, which would also need
the session to be logged in again.
The best option, in such a case, is to tell upper layers that the
command was handled by the transport layer error handler helper,
marking it as DID_NO_CONNECT, which allows completion and informs them
of the problem.
Once the session has been marked as ISCSI_STATE_FAILED, due to the first
timeout during the server shutdown phase, all subsequent cmds will fail
to be queued, allowing upper logic to fail faster.
Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
drivers/scsi/libiscsi.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index f8dc1601efd5..bddbe2da5283 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1696,6 +1696,15 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
*/
switch (session->state) {
case ISCSI_STATE_FAILED:
+ /*
+ * cmds should fail during shutdown, if the session
+ * state is bad, allowing completion to happen
+ */
+ if (unlikely(system_state != SYSTEM_RUNNING)) {
+ reason = FAILURE_SESSION_FAILED;
+ sc->result = DID_NO_CONNECT << 16;
+ break;
+ }
case ISCSI_STATE_IN_RECOVERY:
reason = FAILURE_SESSION_IN_RECOVERY;
sc->result = DID_IMM_RETRY << 16;
@@ -1980,6 +1989,19 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
}
if (session->state != ISCSI_STATE_LOGGED_IN) {
+ /*
+ * During shutdown, if session is prematurely disconnected,
+ * recovery won't happen and there will be hung cmds. Not
+ * handling cmds would trigger EH, also bad in this case.
+ * Instead, handle cmd, allow completion to happen and let
+ * upper layer to deal with the result.
+ */
+ if (unlikely(system_state != SYSTEM_RUNNING)) {
+ sc->result = DID_NO_CONNECT << 16;
+ ISCSI_DBG_EH(session, "sc on shutdown, handled\n");
+ rc = BLK_EH_HANDLED;
+ goto done;
+ }
/*
* We are probably in the middle of iscsi recovery so let
* that complete and handle the error.
@@ -2084,7 +2106,7 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
task->last_timeout = jiffies;
spin_unlock(&session->frwd_lock);
ISCSI_DBG_EH(session, "return %s\n", rc == BLK_EH_RESET_TIMER ?
- "timer reset" : "nh");
+ "timer reset" : "shutdown or nh");
return rc;
}
EXPORT_SYMBOL_GPL(iscsi_eh_cmd_timed_out);
--
2.14.1