From: James Smart <jsmart2021@gmail.com>
To: linux-scsi@vger.kernel.org
Cc: James Smart <jsmart2021@gmail.com>,
	Dick Kennedy <dick.kennedy@broadcom.com>
Subject: [PATCH 12/20] lpfc: Fix host hang at boot or slow boot
Date: Sat, 21 Sep 2019 20:58:58 -0700
Message-ID: <20190922035906.10977-13-jsmart2021@gmail.com>
In-Reply-To: <20190922035906.10977-1-jsmart2021@gmail.com>

Scenarios were seen where a host hung at boot, or booted very
slowly. The link did not come up and no LUNs were visible to the
host.

After investigation, this was found to be caused by a new ACQE
that the adapter may generate to report an adapter hardware
warning. The ACQE was delivered to the driver very early in
adapter initialization, when the driver did not expect any
command completions. As part of handling this unexpected
interrupt, the EQEs were consumed and discarded and the EQ was
rearmed. The problem is that the CQ that generated the EQE, and
thus the interrupt, was not processed and was left unarmed,
meaning it would no longer generate a new interrupt condition.
The subsequent mailbox commands used to initialize the adapter
use the same CQ. Because no completion interrupt was generated,
the driver never saw the mailbox commands complete and waited
out long command timeouts.

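Fix by having the early flush routine also process the related
CQ: for each EQE, look up the CQ it refers to, discard that CQ's
pending CQEs, and rearm the CQ. The routine is renamed from
lpfc_sli4_eq_flush() to lpfc_sli4_eqcq_flush() to reflect this.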

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_sli.c | 42 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 939efee6b5dd..412cd8c56d90 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -87,6 +87,10 @@ static void lpfc_sli4_hba_handle_eqe(struct lpfc_hba *phba,
 				     struct lpfc_eqe *eqe);
 static bool lpfc_sli4_mbox_completions_pending(struct lpfc_hba *phba);
 static bool lpfc_sli4_process_missed_mbox_completions(struct lpfc_hba *phba);
+static struct lpfc_cqe *lpfc_sli4_cq_get(struct lpfc_queue *q);
+static void __lpfc_sli4_consume_cqe(struct lpfc_hba *phba,
+				    struct lpfc_queue *cq,
+				    struct lpfc_cqe *cqe);
 
 static IOCB_t *
 lpfc_get_iocb_from_iocbq(struct lpfc_iocbq *iocbq)
@@ -467,21 +471,47 @@ __lpfc_sli4_consume_eqe(struct lpfc_hba *phba, struct lpfc_queue *eq,
 }
 
 static void
-lpfc_sli4_eq_flush(struct lpfc_hba *phba, struct lpfc_queue *eq)
+lpfc_sli4_eqcq_flush(struct lpfc_hba *phba, struct lpfc_queue *eq)
 {
-	struct lpfc_eqe *eqe;
-	uint32_t count = 0;
+	struct lpfc_eqe *eqe = NULL;
+	u32 eq_count = 0, cq_count = 0;
+	struct lpfc_cqe *cqe = NULL;
+	struct lpfc_queue *cq = NULL, *childq = NULL;
+	int cqid = 0;
 
 	/* walk all the EQ entries and drop on the floor */
 	eqe = lpfc_sli4_eq_get(eq);
 	while (eqe) {
+		/* Get the reference to the corresponding CQ */
+		cqid = bf_get_le32(lpfc_eqe_resource_id, eqe);
+		cq = NULL;
+
+		list_for_each_entry(childq, &eq->child_list, list) {
+			if (childq->queue_id == cqid) {
+				cq = childq;
+				break;
+			}
+		}
+		/* If CQ is valid, iterate through it and drop all the CQEs */
+		if (cq) {
+			cqe = lpfc_sli4_cq_get(cq);
+			while (cqe) {
+				__lpfc_sli4_consume_cqe(phba, cq, cqe);
+				cq_count++;
+				cqe = lpfc_sli4_cq_get(cq);
+			}
+			/* Clear and re-arm the CQ */
+			phba->sli4_hba.sli4_write_cq_db(phba, cq, cq_count,
+			    LPFC_QUEUE_REARM);
+			cq_count = 0;
+		}
 		__lpfc_sli4_consume_eqe(phba, eq, eqe);
-		count++;
+		eq_count++;
 		eqe = lpfc_sli4_eq_get(eq);
 	}
 
 	/* Clear and re-arm the EQ */
-	phba->sli4_hba.sli4_write_eq_db(phba, eq, count, LPFC_QUEUE_REARM);
+	phba->sli4_hba.sli4_write_eq_db(phba, eq, eq_count, LPFC_QUEUE_REARM);
 }
 
 static int
@@ -14236,7 +14266,7 @@ lpfc_sli4_hba_intr_handler(int irq, void *dev_id)
 		spin_lock_irqsave(&phba->hbalock, iflag);
 		if (phba->link_state < LPFC_LINK_DOWN)
 			/* Flush, clear interrupt, and rearm the EQ */
-			lpfc_sli4_eq_flush(phba, fpeq);
+			lpfc_sli4_eqcq_flush(phba, fpeq);
 		spin_unlock_irqrestore(&phba->hbalock, iflag);
 		return IRQ_NONE;
 	}
-- 
2.13.7


Thread overview: 22+ messages
2019-09-22  3:58 [PATCH 00/20] lpfc: Update lpfc to revision 12.4.0.1 James Smart
2019-09-22  3:58 ` [PATCH 01/20] lpfc: Fix pt2pt discovery on SLI3 HBAs James Smart
2019-09-22  3:58 ` [PATCH 02/20] lpfc: Fix premature re-enabling of interrupts in lpfc_sli_host_down James Smart
2019-09-22  3:58 ` [PATCH 03/20] lpfc: Fix miss of register read failure check James Smart
2019-09-22  3:58 ` [PATCH 04/20] lpfc: Fix NVME io abort failures causing hangs James Smart
2019-09-22  3:58 ` [PATCH 05/20] lpfc: Fix rpi release when deleting vport James Smart
2019-09-22  3:58 ` [PATCH 06/20] lpfc: Fix device recovery errors after PLOGI failures James Smart
2019-09-22  3:58 ` [PATCH 07/20] lpfc: Fix locking on mailbox command completion James Smart
2019-09-22  3:58 ` [PATCH 08/20] lpfc: Fix GPF on scsi " James Smart
2019-09-22  3:58 ` [PATCH 09/20] lpfc: Fix discovery failures when target device connectivity bounces James Smart
2019-09-22  3:58 ` [PATCH 10/20] lpfc: Fix NVMe ABTS in response to receiving an ABTS James Smart
2019-09-22  3:58 ` [PATCH 11/20] lpfc: Fix coverity errors on NULL pointer checks James Smart
2019-09-22  3:58 ` [PATCH 12/20] lpfc: Fix host hang at boot or slow boot James Smart
2019-09-22  3:58 ` [PATCH 13/20] lpfc: Fix list corruption in lpfc_sli_get_iocbq James Smart
2019-09-22  3:59 ` [PATCH 14/20] lpfc: Fix spinlock_irq issues in lpfc_els_flush_cmd() James Smart
2019-09-22  3:59 ` [PATCH 15/20] lpfc: Fix hdwq sgl locks and irq handling James Smart
2019-09-22  3:59 ` [PATCH 16/20] lpfc: Fix list corruption detected in lpfc_put_sgl_per_hdwq James Smart
2019-09-22  3:59 ` [PATCH 17/20] lpfc: Update async event logging James Smart
2019-09-22  3:59 ` [PATCH 18/20] lpfc: Complete removal of FCoE T10diff support on SLI-4 adapters James Smart
2019-09-22  3:59 ` [PATCH 19/20] lpfc: cleanup: remove unused fcp_txcmlpq_cnt James Smart
2019-09-22  3:59 ` [PATCH 20/20] lpfc: Update lpfc version to 12.4.0.1 James Smart
2019-10-01  2:07 ` [PATCH 00/20] lpfc: Update lpfc to revision 12.4.0.1 Martin K. Petersen
