public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 4.19 09/37] scsi: qedf: Do not retry ELS request if qedf_alloc_cmd fails
       [not found] <20191025135603.25093-1-sashal@kernel.org>
@ 2019-10-25 13:55 ` Sasha Levin
  2019-10-25 13:55 ` [PATCH AUTOSEL 4.19 15/37] scsi: bnx2fc: Only put reference to io_req in bnx2fc_abts_cleanup if cleanup times out Sasha Levin
  1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2019-10-25 13:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Chad Dupuis, Saurav Kashyap, Martin K . Petersen, Sasha Levin,
	linux-scsi

From: Chad Dupuis <cdupuis@marvell.com>

[ Upstream commit f1c43590365bac054d753d808dbbd207d09e088d ]

If we cannot allocate an ELS middlepath request, simply fail instead of
trying to delay and then reallocate.  This delay logic is causing soft
lockup messages:

NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1:7639]
Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dm_service_time vfat fat rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm
irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support qedr(OE) ib_core joydev ipmi_ssif pcspkr hpilo hpwdt sg ipmi_si ipmi_devintf ipmi_msghandler ioatdma shpchp lpc_ich wmi dca acpi_power_meter dm_multipath ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic qedf(OE) libfcoe mgag200 libfc i2c_algo_bit drm_kms_helper scsi_transport_fc qede(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops ttm qed(OE) drm crct10dif_pclmul e1000e crct10dif_common crc32c_intel scsi_tgt hpsa i2c_core ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 2 PID: 7639 Comm: kworker/2:1 Kdump: loaded Tainted: G           OEL ------------   3.10.0-861.el7.x86_64 #1
Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 07/21/2016
Workqueue: qedf_2_dpc qedf_handle_rrq [qedf]
task: ffff959edd628fd0 ti: ffff959ed6f08000 task.ti: ffff959ed6f08000
RIP: 0010:[<ffffffff8355913a>]  [<ffffffff8355913a>] delay_tsc+0x3a/0x60
RSP: 0018:ffff959ed6f0bd30  EFLAGS: 00000246
RAX: 000000008ef5f791 RBX: 5f646d635f666465 RCX: 0000025b8ededa2f
RDX: 000000000000025b RSI: 0000000000000002 RDI: 0000000000217d1e
RBP: ffff959ed6f0bd30 R08: ffffffffc079aae8 R09: 0000000000000200
R10: ffffffffc07952c6 R11: 0000000000000000 R12: 6c6c615f66646571
R13: ffff959ed6f0bcc8 R14: ffff959ed6f0bd08 R15: ffff959e00000028
FS:  0000000000000000(0000) GS:ffff959eff480000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4117fa1eb0 CR3: 0000002039e66000 CR4: 00000000003607e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff8355907d>] __const_udelay+0x2d/0x30
[<ffffffffc079444a>] qedf_initiate_els+0x13a/0x450 [qedf]
[<ffffffffc0794210>] ? qedf_srr_compl+0x2a0/0x2a0 [qedf]
[<ffffffffc0795337>] qedf_send_rrq+0x127/0x230 [qedf]
[<ffffffffc078ed55>] qedf_handle_rrq+0x15/0x20 [qedf]
[<ffffffff832b2dff>] process_one_work+0x17f/0x440
[<ffffffff832b3ac6>] worker_thread+0x126/0x3c0
[<ffffffff832b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[<ffffffff832bae31>] kthread+0xd1/0xe0
[<ffffffff832bad60>] ? insert_kthread_work+0x40/0x40
[<ffffffff8391f637>] ret_from_fork_nospec_begin+0x21/0x21
[<ffffffff832bad60>] ? insert_kthread_work+0x40/0x40

Signed-off-by: Chad Dupuis <cdupuis@marvell.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/qedf/qedf_els.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/qedf/qedf_els.c b/drivers/scsi/qedf/qedf_els.c
index 04f0c4d2e256e..5178cd03666a6 100644
--- a/drivers/scsi/qedf/qedf_els.c
+++ b/drivers/scsi/qedf/qedf_els.c
@@ -23,8 +23,6 @@ static int qedf_initiate_els(struct qedf_rport *fcport, unsigned int op,
 	int rc = 0;
 	uint32_t did, sid;
 	uint16_t xid;
-	uint32_t start_time = jiffies / HZ;
-	uint32_t current_time;
 	struct fcoe_wqe *sqe;
 	unsigned long flags;
 	u16 sqe_idx;
@@ -59,18 +57,12 @@ static int qedf_initiate_els(struct qedf_rport *fcport, unsigned int op,
 		goto els_err;
 	}
 
-retry_els:
 	els_req = qedf_alloc_cmd(fcport, QEDF_ELS);
 	if (!els_req) {
-		current_time = jiffies / HZ;
-		if ((current_time - start_time) > 10) {
-			QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_ELS,
-				   "els: Failed els 0x%x\n", op);
-			rc = -ENOMEM;
-			goto els_err;
-		}
-		mdelay(20 * USEC_PER_MSEC);
-		goto retry_els;
+		QEDF_INFO(&qedf->dbg_ctx, QEDF_LOG_ELS,
+			  "Failed to alloc ELS request 0x%x\n", op);
+		rc = -ENOMEM;
+		goto els_err;
 	}
 
 	QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_ELS, "initiate_els els_req = "
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* [PATCH AUTOSEL 4.19 15/37] scsi: bnx2fc: Only put reference to io_req in bnx2fc_abts_cleanup if cleanup times out
       [not found] <20191025135603.25093-1-sashal@kernel.org>
  2019-10-25 13:55 ` [PATCH AUTOSEL 4.19 09/37] scsi: qedf: Do not retry ELS request if qedf_alloc_cmd fails Sasha Levin
@ 2019-10-25 13:55 ` Sasha Levin
  1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2019-10-25 13:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Chad Dupuis, Saurav Kashyap, Martin K . Petersen, Sasha Levin,
	linux-scsi

From: Chad Dupuis <cdupuis@marvell.com>

[ Upstream commit a92ac6ee7980f3c139910d0d0a079802363818cb ]

In certain tests where the SCSI error handler issues an abort that is
already outstanding, we will cleanup the command so that the SCSI error
handler can proceed.  In some of these cases we were seeing a command
mismatch:

 kernel: scsi host2: bnx2fc: xid:0x42b eh_abort - refcnt = 2
 kernel: bnx2fc: eh_abort: io_req (xid = 0x42b) already in abts processing
 kernel: scsi host2: bnx2fc: xid:0x42b Entered bnx2fc_initiate_cleanup
 kernel: scsi host2: bnx2fc: xid:0x42b CLEANUP io_req xid = 0x80b
 kernel: scsi host2: bnx2fc: xid:0x80b cq_compl- cleanup resp rcvd
 kernel: scsi host2: bnx2fc: xid:0x42b complete - rx_state = 9
 kernel: scsi host2: bnx2fc: xid:0x42b Entered process_cleanup_compl refcnt = 2, cmd_type = 1
 kernel: scsi host2: bnx2fc: xid:0x42b scsi_done. err_code = 0x7
 kernel: scsi host2: bnx2fc: xid:0x42b sc=ffff8807f93dfb80, result=0x7, retries=0, allowed=5
 kernel: ------------[ cut here ]------------
 kernel: WARNING: at /root/rpmbuild/BUILD/netxtreme2-7.14.43/obj/default/bnx2fc-2.12.1/driver/bnx2fc_io.c:1347 bnx2fc_eh_abort+0x56f/0x680 [bnx2fc]()
 kernel: xid=0x42b refcount=-1
 kernel: Modules linked in:
 kernel: nls_utf8 isofs sr_mod cdrom tcp_lp dm_round_robin xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge ebtable_filter ebtables fuse ip6table_filter ip6_tables iptable_filter bnx2fc(OE) cnic(OE) uio fcoe libfcoe 8021q libfc garp mrp scsi_transport_fc stp llc scsi_tgt vfat fat dm_service_time intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses enclosure ipmi_ssif i2c_core hpilo hpwdt wmi sg ipmi_devintf pcspkr ipmi_si ipmi_msghandler shpchp acpi_power_meter dm_multipath nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs sd_mod crc_t10dif
 kernel: crct10dif_generic bnx2x(OE) crct10dif_pclmul crct10dif_common crc32c_intel mdio ptp pps_core libcrc32c smartpqi scsi_transport_sas fjes uas usb_storage dm_mirror dm_region_hash dm_log dm_mod
 kernel: CPU: 9 PID: 2012 Comm: scsi_eh_2 Tainted: G        W  OE  ------------   3.10.0-514.el7.x86_64 #1
 kernel: Hardware name: HPE Synergy 480 Gen10/Synergy 480 Gen10 Compute Module, BIOS I42 03/21/2018
 kernel: ffff8807f25a3d98 0000000015e7fa0c ffff8807f25a3d50 ffffffff81685eac
 kernel: ffff8807f25a3d88 ffffffff81085820 ffff8807f8e39000 ffff880801ff7468
 kernel: ffff880801ff7610 0000000000002002 ffff8807f8e39014 ffff8807f25a3df0
 kernel: Call Trace:
 kernel: [<ffffffff81685eac>] dump_stack+0x19/0x1b
 kernel: [<ffffffff81085820>] warn_slowpath_common+0x70/0xb0
 kernel: [<ffffffff810858bc>] warn_slowpath_fmt+0x5c/0x80
 kernel: [<ffffffff8168d842>] ? _raw_spin_lock_bh+0x12/0x50
 kernel: [<ffffffffa0549e6f>] bnx2fc_eh_abort+0x56f/0x680 [bnx2fc]
 kernel: [<ffffffff814570af>] scsi_error_handler+0x59f/0x8b0
 kernel: [<ffffffff81456b10>] ? scsi_eh_get_sense+0x250/0x250
 kernel: [<ffffffff810b052f>] kthread+0xcf/0xe0
 kernel: [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
 kernel: [<ffffffff81696418>] ret_from_fork+0x58/0x90
 kernel: [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
 kernel: ---[ end trace 42deb88f2032b111 ]---

The reason that there was a mismatch is that the SCSI command is actual
returned from the cleanup handler.  In previous testing, the type of
cleanup notification we'd get from the CQE did not trigger the code that
returned the SCSI command.  To overcome the previous behavior we would put
a reference in bnx2fc_abts_cleanup() to account for the SCSI command.
However, in cases where the SCSI command is actually off, we end up with an
extra put.

The fix for this is to only take the extra put in bnx2fc_abts_cleanup if
the completion for the cleanup times out.

Signed-off-by: Chad Dupuis <cdupuis@marvell.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/bnx2fc/bnx2fc_io.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/bnx2fc/bnx2fc_io.c b/drivers/scsi/bnx2fc/bnx2fc_io.c
index bc9f2a2365f4d..d27cf966e5600 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_io.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_io.c
@@ -1098,16 +1098,16 @@ static int bnx2fc_abts_cleanup(struct bnx2fc_cmd *io_req)
 	time_left = wait_for_completion_timeout(&io_req->tm_done,
 						BNX2FC_FW_TIMEOUT);
 	io_req->wait_for_comp = 0;
-	if (!time_left)
+	if (!time_left) {
 		BNX2FC_IO_DBG(io_req, "%s(): Wait for cleanup timed out.\n",
 			      __func__);
 
-	/*
-	 * Release reference held by SCSI command the cleanup completion
-	 * hits the BNX2FC_CLEANUP case in bnx2fc_process_cq_compl() and
-	 * thus the SCSI command is not returnedi by bnx2fc_scsi_done().
-	 */
-	kref_put(&io_req->refcount, bnx2fc_cmd_release);
+		/*
+		 * Put the extra reference to the SCSI command since it would
+		 * not have been returned in this case.
+		 */
+		kref_put(&io_req->refcount, bnx2fc_cmd_release);
+	}
 
 	spin_lock_bh(&tgt->tgt_lock);
 	return rc;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2019-10-25 14:05 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20191025135603.25093-1-sashal@kernel.org>
2019-10-25 13:55 ` [PATCH AUTOSEL 4.19 09/37] scsi: qedf: Do not retry ELS request if qedf_alloc_cmd fails Sasha Levin
2019-10-25 13:55 ` [PATCH AUTOSEL 4.19 15/37] scsi: bnx2fc: Only put reference to io_req in bnx2fc_abts_cleanup if cleanup times out Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox