From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Jens Remus <jremus@linux.ibm.com>,
Benjamin Block <bblock@linux.ibm.com>,
Steffen Maier <maier@linux.ibm.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.4 30/41] scsi: zfcp: fix reaction on bit error threshold notification
Date: Sun, 27 Oct 2019 22:01:08 +0100 [thread overview]
Message-ID: <20191027203125.040101070@linuxfoundation.org> (raw)
In-Reply-To: <20191027203056.220821342@linuxfoundation.org>
From: Steffen Maier <maier@linux.ibm.com>
[ Upstream commit 2190168aaea42c31bff7b9a967e7b045f07df095 ]
On excessive bit errors for the FCP channel ingress fibre path, the channel
notifies us. Previously, we only emitted a kernel message and a trace
record. Since performance can become suboptimal with I/O timeouts due to
bit errors, we now stop using an FCP device by default on channel
notification so multipath on top can timely failover to other paths. A new
module parameter zfcp.ber_stop can be used to get zfcp old behavior.
User explanation of new kernel message:
* Description:
* The FCP channel reported that its bit error threshold has been exceeded.
* These errors might result from a problem with the physical components
* of the local fibre link into the FCP channel.
* The problem might be damage or malfunction of the cable or
* cable connection between the FCP channel and
* the adjacent fabric switch port or the point-to-point peer.
* Find details about the errors in the HBA trace for the FCP device.
* The zfcp device driver closed down the FCP device
* to limit the performance impact from possible I/O command timeouts.
* User action:
* Check for problems on the local fibre link, ensure that fibre optics are
* clean and functional, and all cables are properly plugged.
* After the repair action, you can manually recover the FCP device by
* writing "0" into its "failed" sysfs attribute.
* If recovery through sysfs is not possible, set the CHPID of the device
* offline and back online on the service element.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org> #2.6.30+
Link: https://lore.kernel.org/r/20191001104949.42810-1-maier@linux.ibm.com
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/s390/scsi/zfcp_fsf.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 1964391db9047..a3aaef4c53a3c 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -20,6 +20,11 @@
struct kmem_cache *zfcp_fsf_qtcb_cache;
+static bool ber_stop = true;
+module_param(ber_stop, bool, 0600);
+MODULE_PARM_DESC(ber_stop,
+ "Shuts down FCP devices for FCP channels that report a bit-error count in excess of its threshold (default on)");
+
static void zfcp_fsf_request_timeout_handler(unsigned long data)
{
struct zfcp_adapter *adapter = (struct zfcp_adapter *) data;
@@ -231,10 +236,15 @@ static void zfcp_fsf_status_read_handler(struct zfcp_fsf_req *req)
case FSF_STATUS_READ_SENSE_DATA_AVAIL:
break;
case FSF_STATUS_READ_BIT_ERROR_THRESHOLD:
- dev_warn(&adapter->ccw_device->dev,
- "The error threshold for checksum statistics "
- "has been exceeded\n");
zfcp_dbf_hba_bit_err("fssrh_3", req);
+ if (ber_stop) {
+ dev_warn(&adapter->ccw_device->dev,
+ "All paths over this FCP device are disused because of excessive bit errors\n");
+ zfcp_erp_adapter_shutdown(adapter, 0, "fssrh_b");
+ } else {
+ dev_warn(&adapter->ccw_device->dev,
+ "The error threshold for checksum statistics has been exceeded\n");
+ }
break;
case FSF_STATUS_READ_LINK_DOWN:
zfcp_fsf_status_read_link_down(req);
--
2.20.1
next prev parent reply other threads:[~2019-10-27 21:04 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-27 21:00 [PATCH 4.4 00/41] 4.4.198-stable review Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 01/41] scsi: ufs: skip shutdown if hba is not powered Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 02/41] scsi: megaraid: disable device when probe failed after enabled device Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 03/41] scsi: qla2xxx: Fix unbound sleep in fcport delete path Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 04/41] ARM: OMAP2+: Fix missing reset done flag for am3 and am43 Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 05/41] ARM: dts: am4372: Set memory bandwidth limit for DISPC Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 06/41] nl80211: fix null pointer dereference Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 07/41] mips: Loongson: Fix the link time qualifier of serial_exit() Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 08/41] net: hisilicon: Fix usage of uninitialized variable in function mdio_sc_cfg_reg_write() Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 09/41] namespace: fix namespace.pl script to support relative paths Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 10/41] MIPS: Treat Loongson Extensions as ASEs Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 11/41] MIPS: elf_hwcap: Export userspace ASEs Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 12/41] loop: Add LOOP_SET_DIRECT_IO to compat ioctl Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 13/41] net: bcmgenet: Fix RGMII_MODE_EN value for GENET v1/2/3 Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 14/41] net: bcmgenet: Set phydev->dev_flags only for internal PHYs Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 15/41] sctp: change sctp_prot .no_autobind with true Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 16/41] net: avoid potential infinite loop in tc_ctl_action() Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 17/41] ipv4: Return -ENETUNREACH if we cant create route but saddr is valid Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 18/41] memfd: Fix locking when tagging pins Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 19/41] USB: legousbtower: fix memleak on disconnect Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 20/41] usb: udc: lpc32xx: fix bad bit shift operation Greg Kroah-Hartman
2019-10-27 21:00 ` [PATCH 4.4 21/41] USB: serial: ti_usb_3410_5052: fix port-close races Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 22/41] USB: ldusb: fix memleak on disconnect Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 23/41] USB: usblp: fix use-after-free " Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 24/41] USB: ldusb: fix read info leaks Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 25/41] scsi: core: try to get module before removing device Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 26/41] ASoC: rsnd: Reinitialize bit clock inversion flag for every format setting Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 27/41] cfg80211: wext: avoid copying malformed SSIDs Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 28/41] mac80211: Reject malformed SSID elements Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 29/41] drm/edid: Add 6 bpc quirk for SDC panel in Lenovo G50 Greg Kroah-Hartman
2019-10-27 21:01 ` Greg Kroah-Hartman [this message]
2019-10-27 21:01 ` [PATCH 4.4 31/41] mm/slub: fix a deadlock in show_slab_objects() Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 32/41] xtensa: drop EXPORT_SYMBOL for outs*/ins* Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 33/41] parisc: Fix vmap memory leak in ioremap()/iounmap() Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 34/41] CIFS: avoid using MID 0xFFFF Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 35/41] btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group() Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 36/41] memstick: jmb38x_ms: Fix an error handling path in jmb38x_ms_probe() Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 37/41] cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 38/41] xen/netback: fix error path of xenvif_connect_data() Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 39/41] PCI: PM: Fix pci_power_up() Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 40/41] net: sched: Fix memory exposure from short TCA_U32_SEL Greg Kroah-Hartman
2019-10-27 21:01 ` [PATCH 4.4 41/41] RDMA/cxgb4: Do not dma memory off of the stack Greg Kroah-Hartman
2019-10-28 3:55 ` [PATCH 4.4 00/41] 4.4.198-stable review kernelci.org bot
2019-10-28 9:07 ` Didik Setiawan
2019-10-28 13:32 ` Guenter Roeck
2019-10-28 13:49 ` Greg Kroah-Hartman
2019-10-28 13:57 ` Greg Kroah-Hartman
2019-10-28 18:12 ` Guenter Roeck
2019-10-29 8:19 ` Greg Kroah-Hartman
2019-10-28 13:33 ` Guenter Roeck
2019-10-28 21:36 ` Jon Hunter
2019-10-29 13:29 ` Naresh Kamboju
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191027203125.040101070@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=bblock@linux.ibm.com \
--cc=jremus@linux.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maier@linux.ibm.com \
--cc=martin.petersen@oracle.com \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).