Linux SCSI subsystem development
 help / color / mirror / Atom feed
From: James Smart <jsmart2021@gmail.com>
To: linux-scsi@vger.kernel.org
Cc: James Smart <jsmart2021@gmail.com>,
	Dick Kennedy <dick.kennedy@broadcom.com>
Subject: [PATCH v2 04/15] lpfc: Fix crash when a fabric node is released prematurely.
Date: Mon,  4 Jan 2021 10:02:29 -0800	[thread overview]
Message-ID: <20210104180240.46824-5-jsmart2021@gmail.com> (raw)
In-Reply-To: <20210104180240.46824-1-jsmart2021@gmail.com>

The driver's management of the fabric controller (aka pseudo-scsi
initiator) node in SLI3 mode is causing this crash. The crash occurs
because of a node reference imbalance that frees the fabric controller
node while devloss is outstanding from the SCSI transport.  This is
triggered by an odd behavior where the switch reacts to a rejected RDP
request with a PLOGI and nothing else, not even a LOGO.  The driver
ACKS the PLOGI and after successfully registering the RPI, incorrectly
registers the fabric controller node because it has the NLP_FC4_FCP
flag still set from the fabric controller PRLI.  If a LIP is issued,
the driver attempts to cleanup on Link Up and ends up executing too
many puts.

Fix by detecting the fabric node type and clearing out the nodes internal
flags that triggered a SCSI transport registration and subsequence
dev_loss event.  The driver cannot count on any persistence from
fabric controller nodes.

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_hbadisc.c   | 18 +++++++++++++-----
 drivers/scsi/lpfc/lpfc_nportdisc.c |  8 +++++++-
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 2b6b5fc671fe..bcb5bf7e19dc 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -73,6 +73,16 @@ static void lpfc_unregister_fcfi_cmpl(struct lpfc_hba *, LPFC_MBOXQ_t *);
 static int lpfc_fcf_inuse(struct lpfc_hba *);
 static void lpfc_mbx_cmpl_read_sparam(struct lpfc_hba *, LPFC_MBOXQ_t *);
 
+static int
+lpfc_valid_xpt_node(struct lpfc_nodelist *ndlp)
+{
+	if (ndlp->nlp_fc4_type ||
+	    ndlp->nlp_DID == Fabric_DID ||
+	    ndlp->nlp_DID == NameServer_DID ||
+	    ndlp->nlp_DID == FDMI_DID)
+		return 1;
+	return 0;
+}
 /* The source of a terminate rport I/O is either a dev_loss_tmo
  * event or a call to fc_remove_host.  While the rport should be
  * valid during these downcalls, the transport can call twice
@@ -4318,7 +4328,8 @@ lpfc_nlp_state_cleanup(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 	/* FCP and NVME Transport interface */
 	if ((old_state == NLP_STE_MAPPED_NODE ||
 	     old_state == NLP_STE_UNMAPPED_NODE)) {
-		if (ndlp->rport) {
+		if (ndlp->rport &&
+		    lpfc_valid_xpt_node(ndlp)) {
 			vport->phba->nport_event_cnt++;
 			lpfc_unregister_remote_port(ndlp);
 		}
@@ -4340,10 +4351,7 @@ lpfc_nlp_state_cleanup(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 
 	if (new_state ==  NLP_STE_MAPPED_NODE ||
 	    new_state == NLP_STE_UNMAPPED_NODE) {
-		if (ndlp->nlp_fc4_type ||
-		    ndlp->nlp_DID == Fabric_DID ||
-		    ndlp->nlp_DID == NameServer_DID ||
-		    ndlp->nlp_DID == FDMI_DID) {
+		if (lpfc_valid_xpt_node(ndlp)) {
 			vport->phba->nport_event_cnt++;
 			/*
 			 * Tell the fc transport about the port, if we haven't
diff --git a/drivers/scsi/lpfc/lpfc_nportdisc.c b/drivers/scsi/lpfc/lpfc_nportdisc.c
index 4961a8a55844..0d0d2ca1a5d8 100644
--- a/drivers/scsi/lpfc/lpfc_nportdisc.c
+++ b/drivers/scsi/lpfc/lpfc_nportdisc.c
@@ -1021,7 +1021,12 @@ lpfc_rcv_prli(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 			ndlp->nlp_fc4_type |= NLP_FC4_NVME;
 			lpfc_nlp_set_state(vport, ndlp, NLP_STE_UNMAPPED_NODE);
 		}
-		if (npr->prliType == PRLI_FCP_TYPE)
+
+		/* Fabric Controllers send FCP PRLI as an initiator but should
+		 * not get recognized as FCP type and registered with transport.
+		 */
+		if (npr->prliType == PRLI_FCP_TYPE &&
+		    !(ndlp->nlp_type & NLP_FABRIC))
 			ndlp->nlp_fc4_type |= NLP_FC4_FCP;
 	}
 	if (rport) {
@@ -2044,6 +2049,7 @@ lpfc_cmpl_reglogin_reglogin_issue(struct lpfc_vport *vport,
 		 * must complete PRLI.
 		 */
 		if (ndlp->nlp_type & NLP_FABRIC) {
+			ndlp->nlp_fc4_type &= ~NLP_FC4_FCP;
 			ndlp->nlp_prev_state = NLP_STE_REG_LOGIN_ISSUE;
 			lpfc_nlp_set_state(vport, ndlp, NLP_STE_UNMAPPED_NODE);
 		}
-- 
2.26.2


  parent reply	other threads:[~2021-01-04 18:03 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-04 18:02 [PATCH v2 00/15] lpfc: Update lpfc to revision 12.8.0.7 James Smart
2021-01-04 18:02 ` [PATCH v2 01/15] lpfc: Fix PLOGI S_ID of 0 on pt2pt config James Smart
2021-01-04 18:02 ` [PATCH v2 02/15] lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3 James Smart
2021-06-07 11:06   ` Daniel Wagner
2021-06-07 15:12     ` James Smart
2021-06-15 12:45       ` Daniel Wagner
2021-06-18  8:52         ` Daniel Wagner
2021-12-14 13:19           ` [PATCH] lpfc: Reintroduce old IRQ probe logic Daniel Wagner
2021-01-04 18:02 ` [PATCH v2 03/15] lpfc: Refresh ndlp when a new PRLI is received in the PRLI issue state James Smart
2021-01-04 18:02 ` James Smart [this message]
2021-01-04 18:02 ` [PATCH v2 05/15] lpfc: Use the nvme-fc transport supplied timeout for LS requests James Smart
2021-01-04 18:02 ` [PATCH v2 06/15] lpfc: Fix FW reset action if IOs are outstanding James Smart
2021-01-04 18:02 ` [PATCH v2 07/15] lpfc: Prevent duplicate requests to unregister with cpuhp framework James Smart
2021-01-04 18:02 ` [PATCH v2 08/15] lpfc: Fix error log messages being logged following scsi task mgnt James Smart
2021-01-04 18:02 ` [PATCH v2 09/15] lpfc: Fix target reset failing James Smart
2021-01-04 18:02 ` [PATCH v2 10/15] lpfc: Fix NVME recovery after mailbox timeout James Smart
2021-01-04 18:02 ` [PATCH v2 11/15] lpfc: Fix vport create logging James Smart
2021-01-04 18:02 ` [PATCH v2 12/15] lpfc: Fix crash when nvmet transport calls host_release James Smart
2021-01-04 18:02 ` [PATCH v2 13/15] lpfc: Implement health checking when aborting io James Smart
2021-01-04 18:02 ` [PATCH v2 14/15] lpfc: Enhancements to LOG_TRACE_EVENT for better readability James Smart
2021-01-04 18:02 ` [PATCH v2 15/15] lpfc: Update lpfc version to 12.8.0.7 James Smart
2021-01-08  4:02 ` [PATCH v2 00/15] lpfc: Update lpfc to revision 12.8.0.7 Martin K. Petersen
2021-01-13  5:48 ` Martin K. Petersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210104180240.46824-5-jsmart2021@gmail.com \
    --to=jsmart2021@gmail.com \
    --cc=dick.kennedy@broadcom.com \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox