Linux SCSI subsystem development
 help / color / mirror / Atom feed
From: Tony Battersby <tonyb@cybernetics.com>
To: Nilesh Javali <njavali@marvell.com>,
	GR-QLogic-Storage-Upstream@marvell.com,
	"James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi <linux-scsi@vger.kernel.org>,
	target-devel@vger.kernel.org, scst-devel@lists.sourceforge.net,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: [PATCH 03/15] scsi: qla2xxx: fix lost interrupts with qlini_mode=disabled
Date: Mon, 8 Sep 2025 14:50:14 -0400	[thread overview]
Message-ID: <3572eea7-a6ef-46b4-8bdc-d0da72fd4b3c@cybernetics.com> (raw)
In-Reply-To: <f8977250-638c-4d7d-ac0c-65f742b8d535@cybernetics.com>

(target mode)

When qla2xxx is loaded with qlini_mode=disabled,
ha->flags.disable_msix_handshake is used before it is set, resulting in
the wrong interrupt handler being used on certain HBAs
(qla2xxx_msix_rsp_q_hs() is used when qla2xxx_msix_rsp_q() should be
used).  The only difference between these two interrupt handlers is
that the _hs() version writes to a register to clear the "RISC"
interrupt, whereas the other version does not.  So this bug results in
the RISC interrupt being cleared when it should not be.  This
occasionally causes a different interrupt handler
qla24xx_msix_default() for a different vector to see ((stat &
HSRX_RISC_INT) == 0) and ignore its interrupt, which then causes
problems like:

qla2xxx [0000:02:00.0]-d04c:6: MBX Command timeout for cmd 20,
  iocontrol=8 jiffies=1090c0300 mb[0-3]=[0x4000 0x0 0x40 0xda] mb7 0x500
  host_status 0x40000010 hccr 0x3f00
qla2xxx [0000:02:00.0]-101e:6: Mailbox cmd timeout occurred, cmd=0x20,
  mb[0]=0x20. Scheduling ISP abort
(the cmd varies; sometimes it is 0x20, 0x22, 0x54, 0x5a, 0x5d, or 0x6a)

This problem can be reproduced with a 16 or 32 Gbps HBA by loading
qla2xxx with qlini_mode=disabled and running a high IOPS test
while triggering frequent RSCN database change events.

While analyzing the problem I discovered that even with
disable_msix_handshake forced to 0, it is not necessary to clear the
RISC interrupt from qla2xxx_msix_rsp_q_hs() (more below).  So just
completely remove qla2xxx_msix_rsp_q_hs() and the logic for selecting
it, which also fixes the bug with qlini_mode=disabled.

The test below describes the justification for not needing
qla2xxx_msix_rsp_q_hs():

Force disable_msix_handshake to 0:
qla24xx_config_rings():
if (0 && (ha->fw_attributes & BIT_6) && (IS_MSIX_NACK_CAPABLE(ha)) &&
    (ha->flags.msix_enabled)) {

In qla24xx_msix_rsp_q() and qla2xxx_msix_rsp_q_hs(), check:
  (rd_reg_dword(&reg->host_status) & HSRX_RISC_INT)

Count the number of calls to each function with HSRX_RISC_INT set and
the number with HSRX_RISC_INT not set while performing some I/O.

If qla2xxx_msix_rsp_q_hs() clears the RISC interrupt (original code):
qla24xx_msix_rsp_q:    50% of calls have HSRX_RISC_INT set
qla2xxx_msix_rsp_q_hs:  5% of calls have HSRX_RISC_INT set
(# of qla2xxx_msix_rsp_q_hs interrupts) =
    (# of qla24xx_msix_rsp_q interrupts) * 3

If qla2xxx_msix_rsp_q_hs() does not clear the RISC interrupt (patched
code):
qla24xx_msix_rsp_q:    100% of calls have HSRX_RISC_INT set
qla2xxx_msix_rsp_q_hs:   9% of calls have HSRX_RISC_INT set
(# of qla2xxx_msix_rsp_q_hs interrupts) =
    (# of qla24xx_msix_rsp_q interrupts) * 3

In the case of the original code, qla24xx_msix_rsp_q() was seeing
HSRX_RISC_INT set only 50% of the time because qla2xxx_msix_rsp_q_hs()
was clearing it when it shouldn't have been.  In the patched code,
qla24xx_msix_rsp_q() sees HSRX_RISC_INT set 100% of the time, which
makes sense if that interrupt handler needs to clear the RISC interrupt
(which it does).  qla2xxx_msix_rsp_q_hs() sees HSRX_RISC_INT only 9% of
the time, which is just overlap from the other interrupt during the
high IOPS test.

Tested with SCST on:
QLE2742  FW:v9.08.02 (32 Gbps 2-port)
QLE2694L FW:v9.10.11 (16 Gbps 4-port)
QLE2694L FW:v9.08.02 (16 Gbps 4-port)
QLE2672  FW:v8.07.12 (16 Gbps 2-port)
both initiator and target mode

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---
 drivers/scsi/qla2xxx/qla_def.h |  1 -
 drivers/scsi/qla2xxx/qla_gbl.h |  2 +-
 drivers/scsi/qla2xxx/qla_isr.c | 32 +++-----------------------------
 drivers/scsi/qla2xxx/qla_mid.c |  4 +---
 4 files changed, 5 insertions(+), 34 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index cb95b7b12051..b3265952c4be 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -3503,7 +3503,6 @@ struct isp_operations {
 #define QLA_MSIX_RSP_Q			0x01
 #define QLA_ATIO_VECTOR		0x02
 #define QLA_MSIX_QPAIR_MULTIQ_RSP_Q	0x03
-#define QLA_MSIX_QPAIR_MULTIQ_RSP_Q_HS	0x04
 
 #define QLA_MIDX_DEFAULT	0
 #define QLA_MIDX_RSP_Q		1
diff --git a/drivers/scsi/qla2xxx/qla_gbl.h b/drivers/scsi/qla2xxx/qla_gbl.h
index 145defc420f2..55d531c19e6b 100644
--- a/drivers/scsi/qla2xxx/qla_gbl.h
+++ b/drivers/scsi/qla2xxx/qla_gbl.h
@@ -766,7 +766,7 @@ extern int qla2x00_dfs_remove(scsi_qla_host_t *);
 
 /* Globa function prototypes for multi-q */
 extern int qla25xx_request_irq(struct qla_hw_data *, struct qla_qpair *,
-	struct qla_msix_entry *, int);
+	struct qla_msix_entry *);
 extern int qla25xx_init_req_que(struct scsi_qla_host *, struct req_que *);
 extern int qla25xx_init_rsp_que(struct scsi_qla_host *, struct rsp_que *);
 extern int qla25xx_create_req_que(struct qla_hw_data *, uint16_t, uint8_t,
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index c4c6b5c6658c..a3971afc2dd1 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -4467,32 +4467,6 @@ qla2xxx_msix_rsp_q(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
-irqreturn_t
-qla2xxx_msix_rsp_q_hs(int irq, void *dev_id)
-{
-	struct qla_hw_data *ha;
-	struct qla_qpair *qpair;
-	struct device_reg_24xx __iomem *reg;
-	unsigned long flags;
-
-	qpair = dev_id;
-	if (!qpair) {
-		ql_log(ql_log_info, NULL, 0x505b,
-		    "%s: NULL response queue pointer.\n", __func__);
-		return IRQ_NONE;
-	}
-	ha = qpair->hw;
-
-	reg = &ha->iobase->isp24;
-	spin_lock_irqsave(&ha->hardware_lock, flags);
-	wrt_reg_dword(&reg->hccr, HCCRX_CLR_RISC_INT);
-	spin_unlock_irqrestore(&ha->hardware_lock, flags);
-
-	queue_work(ha->wq, &qpair->q_work);
-
-	return IRQ_HANDLED;
-}
-
 /* Interrupt handling helpers. */
 
 struct qla_init_msix_entry {
@@ -4505,7 +4479,6 @@ static const struct qla_init_msix_entry msix_entries[] = {
 	{ "rsp_q", qla24xx_msix_rsp_q },
 	{ "atio_q", qla83xx_msix_atio_q },
 	{ "qpair_multiq", qla2xxx_msix_rsp_q },
-	{ "qpair_multiq_hs", qla2xxx_msix_rsp_q_hs },
 };
 
 static const struct qla_init_msix_entry qla82xx_msix_entries[] = {
@@ -4792,9 +4765,10 @@ qla2x00_free_irqs(scsi_qla_host_t *vha)
 }
 
 int qla25xx_request_irq(struct qla_hw_data *ha, struct qla_qpair *qpair,
-	struct qla_msix_entry *msix, int vector_type)
+	struct qla_msix_entry *msix)
 {
-	const struct qla_init_msix_entry *intr = &msix_entries[vector_type];
+	const struct qla_init_msix_entry *intr =
+		&msix_entries[QLA_MSIX_QPAIR_MULTIQ_RSP_Q];
 	scsi_qla_host_t *vha = pci_get_drvdata(ha->pdev);
 	int ret;
 
diff --git a/drivers/scsi/qla2xxx/qla_mid.c b/drivers/scsi/qla2xxx/qla_mid.c
index 8b71ac0b1d99..0abc47e72e0b 100644
--- a/drivers/scsi/qla2xxx/qla_mid.c
+++ b/drivers/scsi/qla2xxx/qla_mid.c
@@ -899,9 +899,7 @@ qla25xx_create_rsp_que(struct qla_hw_data *ha, uint16_t options,
 	    rsp->options, rsp->id, rsp->rsp_q_in,
 	    rsp->rsp_q_out);
 
-	ret = qla25xx_request_irq(ha, qpair, qpair->msix,
-		ha->flags.disable_msix_handshake ?
-		QLA_MSIX_QPAIR_MULTIQ_RSP_Q : QLA_MSIX_QPAIR_MULTIQ_RSP_Q_HS);
+	ret = qla25xx_request_irq(ha, qpair, qpair->msix);
 	if (ret)
 		goto que_failed;
 
-- 
2.43.0



  parent reply	other threads:[~2025-09-08 18:50 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-08 18:45 [PATCH 00/15] qla2xxx target mode improvements Tony Battersby
2025-09-08 18:47 ` [PATCH 01/15] Revert "scsi: qla2xxx: Perform lockless command completion in abort path" Tony Battersby
2025-09-08 18:48 ` [PATCH 02/15] scsi: qla2xxx: fix initiator mode with qlini_mode=exclusive Tony Battersby
2025-09-08 18:50 ` Tony Battersby [this message]
2025-09-08 18:51 ` [PATCH 04/15] scsi: qla2xxx: use reinit_completion on mbx_intr_comp Tony Battersby
2025-09-08 18:53 ` [PATCH 05/15] scsi: qla2xxx: remove code for unsupported hardware Tony Battersby
2025-09-08 18:54 ` [PATCH 06/15] scsi: qla2xxx: improve debug output for term exchange Tony Battersby
2025-09-08 18:56 ` [PATCH 07/15] scsi: qla2xxx: fix term exchange when cmd_sent_to_fw == 1 Tony Battersby
2025-09-08 18:58 ` [PATCH 08/15] scsi: qla2xxx: fix oops during cmd abort Tony Battersby
2025-09-08 18:59   ` [SCST PATCH " Tony Battersby
2025-09-11 14:21   ` [PATCH " Dmitry Bogdanov
2025-09-24 19:41     ` Tony Battersby
2025-09-24 19:43       ` [PATCH v2 09/16] scsi: qla2xxx: fix races with aborting commands Tony Battersby
2025-09-24 19:45         ` [SCST PATCH " Tony Battersby
2025-09-25  8:42       ` [DMARC Error]Re: [PATCH 08/15] scsi: qla2xxx: fix oops during cmd abort Dmitry Bogdanov
2025-09-08 19:01 ` [PATCH 09/15] scsi: qla2xxx: improve checks in qlt_xmit_response / qlt_rdy_to_xfer Tony Battersby
2025-09-08 19:02 ` [PATCH 10/15] scsi: qla2xxx: fix TMR failure handling Tony Battersby
2025-09-12 14:36   ` Dmitry Bogdanov
2025-09-16 16:04     ` Tony Battersby
2025-09-17 13:06       ` [DMARC Error]Re: " Dmitry Bogdanov
2025-09-17 20:38         ` Tony Battersby
2025-09-08 19:04 ` [PATCH 11/15] scsi: qla2xxx: fix invalid memory access with big CDBs Tony Battersby
2025-09-08 19:05   ` [SCST PATCH " Tony Battersby
2025-09-08 19:07 ` [PATCH 12/15] scsi: qla2xxx: add cmd->rsp_sent Tony Battersby
2025-09-08 19:08   ` [SCST PATCH " Tony Battersby
2025-09-15 13:47   ` [PATCH " Dmitry Bogdanov
2025-09-24 20:04     ` Tony Battersby
2025-09-08 19:09 ` [PATCH 13/15] scsi: qla2xxx: improve cmd logging Tony Battersby
2025-09-08 19:10 ` [PATCH 14/15] scsi: qla2xxx: add back SRR support Tony Battersby
2025-09-08 19:11   ` [SCST PATCH " Tony Battersby
2025-09-25 12:49   ` [PATCH " Xose Vazquez Perez
2025-09-25 15:30     ` Xose Vazquez Perez
2025-09-25 16:04       ` Tony Battersby
2025-09-25 17:00         ` Tony Battersby
2025-09-25 19:30           ` Tony Battersby
2025-09-08 19:13 ` [PATCH 15/15] scsi: qla2xxx: improve safety of cmd lookup by handle Tony Battersby
2025-09-08 19:14 ` [SCST PATCH] qla2x00t-32gbit: add on_abort_cmd callback Tony Battersby

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3572eea7-a6ef-46b4-8bdc-d0da72fd4b3c@cybernetics.com \
    --to=tonyb@cybernetics.com \
    --cc=GR-QLogic-Storage-Upstream@marvell.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=njavali@marvell.com \
    --cc=scst-devel@lists.sourceforge.net \
    --cc=target-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox