From: James Smart <jsmart2021@gmail.com>
To: linux-scsi@vger.kernel.org
Cc: James Smart <jsmart2021@gmail.com>,
Dick Kennedy <dick.kennedy@broadcom.com>
Subject: [PATCH 23/42] lpfc: Fix crash due to port reset racing vs adapter error handling
Date: Wed, 14 Aug 2019 16:56:53 -0700 [thread overview]
Message-ID: <20190814235712.4487-24-jsmart2021@gmail.com> (raw)
In-Reply-To: <20190814235712.4487-1-jsmart2021@gmail.com>
If the adapter encounters a condition which causes the adapter
to fail (driver must detect the failure) simultaneously to a
request to the driver to reset the adapter (such as a host_reset),
the reset path will be racing with the asynchronously-detect
adapter failure path. In the failing situation, one path has
started to tear down the adapter data structures (io_wq's) while
the other path has initiated a repeat of the teardown and is in
the lpfc_sli_flush_xxx_rings path and attempting to access the
just-freed data structures.
Fix by the following:
- In cases where an adapter failure is detected, rather than
explicitly calling offline_eratt() to start the teardown,
change the adapter state and let the later calls of posted
work to the slowpath thread invoke the adapter recovery.
In essence, this means all requests to reset are serialized
on the slowpath thread.
- Clean up the routine that restarts the adapter. If there is a
failure from brdreset, don't immediately error and leave
things in a partial state. Instead, ensure the adapter state
is set and finish the teardown of structures before returning.
- if in the scsi host reset handler and the board fails to
reset and restart (which can be due to parallel reset/recovery
paths), instead of hard failing and explicitly calling
offline_eratt() (which gets into the redundant path), just
fail out and let the asynchronous path resolve the adapter
state.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
drivers/scsi/lpfc/lpfc_init.c | 7 +++----
drivers/scsi/lpfc/lpfc_scsi.c | 16 +++++++++-------
drivers/scsi/lpfc/lpfc_sli.c | 6 ++++--
3 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 84a77faed114..c580d512a3db 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -1926,7 +1926,7 @@ lpfc_handle_eratt_s4(struct lpfc_hba *phba)
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
"7624 Firmware not ready: Failing UE recovery,"
" waited %dSec", i);
- lpfc_sli4_offline_eratt(phba);
+ phba->link_state = LPFC_HBA_ERROR;
break;
case LPFC_SLI_INTF_IF_TYPE_2:
@@ -2000,9 +2000,8 @@ lpfc_handle_eratt_s4(struct lpfc_hba *phba)
}
/* fall through for not able to recover */
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
- "3152 Unrecoverable error, bring the port "
- "offline\n");
- lpfc_sli4_offline_eratt(phba);
+ "3152 Unrecoverable error\n");
+ phba->link_state = LPFC_HBA_ERROR;
break;
case LPFC_SLI_INTF_IF_TYPE_1:
default:
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index f9df800e7067..720a98266986 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -5295,18 +5295,20 @@ lpfc_host_reset_handler(struct scsi_cmnd *cmnd)
lpfc_offline(phba);
rc = lpfc_sli_brdrestart(phba);
if (rc)
- ret = FAILED;
+ goto error;
+
rc = lpfc_online(phba);
if (rc)
- ret = FAILED;
+ goto error;
+
lpfc_unblock_mgmt_io(phba);
- if (ret == FAILED) {
- lpfc_printf_vlog(vport, KERN_ERR, LOG_FCP,
- "3323 Failed host reset, bring it offline\n");
- lpfc_sli4_offline_eratt(phba);
- }
return ret;
+error:
+ lpfc_printf_vlog(vport, KERN_ERR, LOG_FCP,
+ "3323 Failed host reset\n");
+ lpfc_unblock_mgmt_io(phba);
+ return FAILED;
}
/**
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 058c092bda73..ca6988ae8924 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -4509,7 +4509,7 @@ lpfc_sli_brdreset(struct lpfc_hba *phba)
* checking during resets the device. The caller is not required to hold
* any locks.
*
- * This function returns 0 always.
+ * This function returns 0 on success else returns negative error code.
**/
int
lpfc_sli4_brdreset(struct lpfc_hba *phba)
@@ -4667,7 +4667,7 @@ lpfc_sli_brdrestart_s4(struct lpfc_hba *phba)
rc = lpfc_sli4_brdreset(phba);
if (rc)
- return rc;
+ goto error;
spin_lock_irq(&phba->hbalock);
phba->pport->stopped = 0;
@@ -4682,6 +4682,8 @@ lpfc_sli_brdrestart_s4(struct lpfc_hba *phba)
if (hba_aer_enabled)
pci_disable_pcie_error_reporting(phba->pcidev);
+error:
+ phba->link_state = LPFC_HBA_ERROR;
lpfc_hba_down_post(phba);
lpfc_sli4_queue_destroy(phba);
--
2.13.7
next prev parent reply other threads:[~2019-08-14 23:57 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-14 23:56 [PATCH 00/42] lpfc: Update lpfc to revision 12.4.0.0 James Smart
2019-08-14 23:56 ` [PATCH 01/42] lpfc: Limit xri count for kdump environment James Smart
2019-08-14 23:56 ` [PATCH 02/42] lpfc: Fix PLOGI failure with high remoteport count James Smart
2019-08-14 23:56 ` [PATCH 03/42] lpfc: Fix ELS field alignments James Smart
2019-08-14 23:56 ` [PATCH 04/42] lpfc: Fix crash on driver unload in wq free James Smart
2019-08-14 23:56 ` [PATCH 05/42] lpfc: Fix failure to clear non-zero eq_delay after io rate reduction James Smart
2019-08-14 23:56 ` [PATCH 06/42] lpfc: Fix leak of ELS completions on adapter reset James Smart
2019-08-14 23:56 ` [PATCH 07/42] lpfc: Fix port relogin failure due to GID_FT interaction James Smart
2019-08-14 23:56 ` [PATCH 08/42] lpfc: Fix discovery when target has no GID_FT information James Smart
2019-08-14 23:56 ` [PATCH 09/42] lpfc: Fix ADISC reception terminating login state if a NVME target James Smart
2019-08-14 23:56 ` [PATCH 10/42] lpfc: Fix issuing init_vpi mbox on SLI-3 card James Smart
2019-08-14 23:56 ` [PATCH 11/42] lpfc: Fix Oops in nvme_register with target logout/login James Smart
2019-08-14 23:56 ` [PATCH 12/42] lpfc: Fix irq raising in lpfc_sli_hba_down James Smart
2019-08-14 23:56 ` [PATCH 13/42] lpfc: Fix oops when fewer hdwqs than cpus James Smart
2019-08-14 23:56 ` [PATCH 14/42] lpfc: Fix FLOGI handling across multiple link up/down conditions James Smart
2019-08-14 23:56 ` [PATCH 15/42] lpfc: Fix null ptr oops updating lpfc_devloss_tmo via sysfs attribute James Smart
2019-08-14 23:56 ` [PATCH 16/42] lpfc: Fix devices that don't return after devloss followed by rediscovery James Smart
2019-08-14 23:56 ` [PATCH 17/42] lpfc: Fix loss of remote port after devloss due to lack of RPIs James Smart
2019-08-14 23:56 ` [PATCH 18/42] lpfc: Fix propagation of devloss_tmo setting to nvme transport James Smart
2019-08-14 23:56 ` [PATCH 19/42] lpfc: Fix sg_seg_cnt for HBAs that don't support NVME James Smart
2019-08-14 23:56 ` [PATCH 20/42] lpfc: Fix driver nvme rescan logging James Smart
2019-08-14 23:56 ` [PATCH 21/42] lpfc: Fix error in remote port address change James Smart
2019-08-14 23:56 ` [PATCH 22/42] lpfc: Fix deadlock on host_lock during cable pulls James Smart
2019-08-14 23:56 ` James Smart [this message]
2019-08-14 23:56 ` [PATCH 24/42] lpfc: Fix too many sg segments spamming in kernel log James Smart
2019-08-14 23:56 ` [PATCH 25/42] lpfc: Fix hang when downloading fw on port enabled for nvme James Smart
2019-08-14 23:56 ` [PATCH 26/42] lpfc: Fix nvme target mode ABTSing a received ABTS James Smart
2019-08-14 23:56 ` [PATCH 27/42] lpfc: Fix nvme sg_seg_cnt display if HBA does not support NVME James Smart
2019-08-14 23:56 ` [PATCH 28/42] lpfc: Fix sli4 adapter initialization with MSI James Smart
2019-08-14 23:56 ` [PATCH 29/42] lpfc: Fix upcall to bsg done in non-success cases James Smart
2019-08-14 23:57 ` [PATCH 30/42] lpfc: Fix Max Frame Size value shown in fdmishow output James Smart
2019-08-14 23:57 ` [PATCH 31/42] lpfc: Fix reported physical link speed on a disabled trunked link James Smart
2019-08-14 23:57 ` [PATCH 32/42] lpfc: Fix BlockGuard enablement on FCoE adapters James Smart
2019-08-14 23:57 ` [PATCH 33/42] lpfc: Fix nvme first burst module parameter description James Smart
2019-08-14 23:57 ` [PATCH 34/42] lpfc: Fix coverity warnings James Smart
2019-08-14 23:57 ` [PATCH 35/42] lpfc: Add simple unlikely optimizations to reduce NVME latency James Smart
2019-08-14 23:57 ` [PATCH 36/42] lpfc: Migrate to %px and %pf in kernel print calls James Smart
2019-08-14 23:57 ` [PATCH 37/42] lpfc: Add first and second level hardware revisions to sysfs reporting James Smart
2019-08-14 23:57 ` [PATCH 38/42] lpfc: Add MDS driver loopback diagnostics support James Smart
2019-08-14 23:57 ` [PATCH 39/42] lpfc: Support dynamic unbounded SGL lists on G7 hardware James Smart
2019-08-14 23:57 ` [PATCH 40/42] lpfc: Add NVMe sequence level error recovery support James Smart
2019-08-14 23:57 ` [PATCH 41/42] lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair James Smart
2019-08-14 23:57 ` [PATCH 42/42] lpfc: Update lpfc version to 12.4.0.0 James Smart
2019-08-20 3:06 ` [PATCH 00/42] lpfc: Update lpfc to revision 12.4.0.0 Martin K. Petersen
2019-08-27 13:31 ` Hannes Reinecke
2019-08-28 0:10 ` James Smart
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190814235712.4487-24-jsmart2021@gmail.com \
--to=jsmart2021@gmail.com \
--cc=dick.kennedy@broadcom.com \
--cc=linux-scsi@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).