public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Bogdanov <d.bogdanov@yadro.com>
To: Tony Battersby <tonyb@cybernetics.com>
Cc: Nilesh Javali <njavali@marvell.com>,
	<GR-QLogic-Storage-Upstream@marvell.com>,
	"James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	<target-devel@vger.kernel.org>,
	<scst-devel@lists.sourceforge.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 08/15] scsi: qla2xxx: fix oops during cmd abort
Date: Thu, 11 Sep 2025 17:21:35 +0300	[thread overview]
Message-ID: <20250911142135.GA624@yadro.com> (raw)
In-Reply-To: <80974286-f8ac-4eff-9439-c05fe38716b1@cybernetics.com>

On Mon, Sep 08, 2025 at 02:58:06PM -0400, Tony Battersby wrote:
> 
> (target mode)
> 
> There is a race between the following:
> 
> CPU 1:
> scst_hw_pending_work_fn() ->
> sqa_on_hw_pending_cmd_timeout() ->
> qlt_abort_cmd() ->
> qlt_unmap_sg()
> 
> CPU 2:
> qla_do_work() ->
> qla24xx_process_response_queue() ->
> qlt_do_ctio_completion() ->
> qlt_unmap_sg()
> 
> Two CPUs calling qlt_unmap_sg() on the same cmd at the same time
> results in an oops:
> 
> dma_unmap_sg_attrs()
>         BUG_ON(!valid_dma_direction(dir));
> 
> This race is more likely to happen because qlt_abort_cmd() may cause the
> hardware to send a CTIO.
> 
> The solution is to lock cmd->qpair->qp_lock_ptr when aborting a command.
> This makes it possible to check the cmd state and make decisions about
> what to do without racing with the CTIO handler and other code.
> 
> - Lock cmd->qpair->qp_lock_ptr when aborting a cmd.
> - Eliminate cmd->cmd_lock and change cmd->aborted back to a bitfield
>   since it is now protected by qp_lock_ptr just like all the other
>   flags.
> - Add another command state QLA_TGT_STATE_DONE to avoid any possible
>   races between qlt_abort_cmd() and tgt_ops->free_cmd().
> - Add the cmd->sent_term_exchg flag to indicate if
>   qlt_send_term_exchange() has already been called.
> - For SCST (scst_hw_pending_work_fn()), export qlt_send_term_exchange()
>   and qlt_unmap_sg() so that they can be called directly instead of
>   trying to make qlt_abort_cmd() work for both HW timeout and TMR abort.
> - Add TRC_CTIO_IGNORED for scst_hw_pending_work_fn().
> 
> Fixes: 26f9ce53817a ("scsi: qla2xxx: Fix missed DMA unmap for aborted commands")

You are trying to fix that commit using its approach, but actually that
approach is the root cause itself. It is not ok to unmap dma while that
memory is owned by HW.

We use this patch 4 years already instead of 26f9ce53817a and never
faced with such crashes.


From: Dmitry Bogdanov <d.bogdanov@yadro.com>
Date: Wed, 20 Oct 2021 15:57:31 +0300
Subject: [PATCH] scsi: qla2xxx: clear cmds after chip reset

Commands sent to FW, after chip reset got stuck and never freed as FW is
not going to response to them anymore.

This patch partially reverts aefed3e5548f at __qla2x00_abort_all_cmds.

Fixes: aefed3e5548f ("scsi: qla2xxx: target: Fix offline port handling and host reset handling")
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
---
 drivers/scsi/qla2xxx/qla_os.c     | 20 ++++++++++++++++++--
 drivers/scsi/qla2xxx/qla_target.c |  2 +-
 drivers/scsi/qla2xxx/qla_target.h |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 407047e8b42b..04b0d3eb97e7 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1809,10 +1809,26 @@ __qla2x00_abort_all_cmds(struct qla_qpair *qp, int res)
 					continue;
 				}
 				cmd = (struct qla_tgt_cmd *)sp;
-				cmd->aborted = 1;
+
+				if (cmd->sg_mapped)
+					qlt_unmap_sg(vha, cmd);
+
+				if (cmd->se_cmd.t_state == TRANSPORT_WRITE_PENDING) {
+					cmd->aborted = 1;
+					cmd->write_data_transferred = 0;
+					cmd->state = QLA_TGT_STATE_DATA_IN;
+					ha->tgt.tgt_ops->handle_data(cmd);
+				} else {
+					ha->tgt.tgt_ops->free_cmd(cmd);
+				}
 				break;
 			case TYPE_TGT_TMCMD:
-				/* Skip task management functions. */
+				/*
+				 * Currently, only ABTS response gets on the
+				 * outstanding_cmds[]
+				 */
+				ha->tgt.tgt_ops->free_mcmd(
+					(struct qla_tgt_mgmt_cmd *) sp);
 				break;
 			default:
 				break;
diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 8dfd13af48ea..c3b1a1426253 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -2463,7 +2463,7 @@ static int qlt_pci_map_calc_cnt(struct qla_tgt_prm *prm)
 	return -1;
 }
 
-static void qlt_unmap_sg(struct scsi_qla_host *vha, struct qla_tgt_cmd *cmd)
+void qlt_unmap_sg(struct scsi_qla_host *vha, struct qla_tgt_cmd *cmd)
 {
 	struct qla_hw_data *ha;
 	struct qla_qpair *qpair;
diff --git a/drivers/scsi/qla2xxx/qla_target.h b/drivers/scsi/qla2xxx/qla_target.h
index 10e5e6c8087d..76e80208d731 100644
--- a/drivers/scsi/qla2xxx/qla_target.h
+++ b/drivers/scsi/qla2xxx/qla_target.h
@@ -1050,6 +1050,7 @@ extern int qlt_abort_cmd(struct qla_tgt_cmd *);
 extern void qlt_xmit_tm_rsp(struct qla_tgt_mgmt_cmd *);
 extern void qlt_free_mcmd(struct qla_tgt_mgmt_cmd *);
 extern void qlt_free_cmd(struct qla_tgt_cmd *cmd);
+extern void qlt_unmap_sg(struct scsi_qla_host *vha, struct qla_tgt_cmd *cmd);
 extern void qlt_async_event(uint16_t, struct scsi_qla_host *, uint16_t *);
 extern void qlt_enable_vha(struct scsi_qla_host *);
 extern void qlt_vport_create(struct scsi_qla_host *, struct qla_hw_data *);
-- 
2.25.1







  parent reply	other threads:[~2025-09-11 14:28 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-08 18:45 [PATCH 00/15] qla2xxx target mode improvements Tony Battersby
2025-09-08 18:47 ` [PATCH 01/15] Revert "scsi: qla2xxx: Perform lockless command completion in abort path" Tony Battersby
2025-09-08 18:48 ` [PATCH 02/15] scsi: qla2xxx: fix initiator mode with qlini_mode=exclusive Tony Battersby
2025-09-08 18:50 ` [PATCH 03/15] scsi: qla2xxx: fix lost interrupts with qlini_mode=disabled Tony Battersby
2025-09-08 18:51 ` [PATCH 04/15] scsi: qla2xxx: use reinit_completion on mbx_intr_comp Tony Battersby
2025-09-08 18:53 ` [PATCH 05/15] scsi: qla2xxx: remove code for unsupported hardware Tony Battersby
2025-09-08 18:54 ` [PATCH 06/15] scsi: qla2xxx: improve debug output for term exchange Tony Battersby
2025-09-08 18:56 ` [PATCH 07/15] scsi: qla2xxx: fix term exchange when cmd_sent_to_fw == 1 Tony Battersby
2025-09-08 18:58 ` [PATCH 08/15] scsi: qla2xxx: fix oops during cmd abort Tony Battersby
2025-09-08 18:59   ` [SCST PATCH " Tony Battersby
2025-09-11 14:21   ` Dmitry Bogdanov [this message]
2025-09-24 19:41     ` [PATCH " Tony Battersby
2025-09-24 19:43       ` [PATCH v2 09/16] scsi: qla2xxx: fix races with aborting commands Tony Battersby
2025-09-24 19:45         ` [SCST PATCH " Tony Battersby
2025-09-25  8:42       ` [DMARC Error]Re: [PATCH 08/15] scsi: qla2xxx: fix oops during cmd abort Dmitry Bogdanov
2025-09-08 19:01 ` [PATCH 09/15] scsi: qla2xxx: improve checks in qlt_xmit_response / qlt_rdy_to_xfer Tony Battersby
2025-09-08 19:02 ` [PATCH 10/15] scsi: qla2xxx: fix TMR failure handling Tony Battersby
2025-09-12 14:36   ` Dmitry Bogdanov
2025-09-16 16:04     ` Tony Battersby
2025-09-17 13:06       ` [DMARC Error]Re: " Dmitry Bogdanov
2025-09-17 20:38         ` Tony Battersby
2025-09-08 19:04 ` [PATCH 11/15] scsi: qla2xxx: fix invalid memory access with big CDBs Tony Battersby
2025-09-08 19:05   ` [SCST PATCH " Tony Battersby
2025-09-08 19:07 ` [PATCH 12/15] scsi: qla2xxx: add cmd->rsp_sent Tony Battersby
2025-09-08 19:08   ` [SCST PATCH " Tony Battersby
2025-09-15 13:47   ` [PATCH " Dmitry Bogdanov
2025-09-24 20:04     ` Tony Battersby
2025-09-08 19:09 ` [PATCH 13/15] scsi: qla2xxx: improve cmd logging Tony Battersby
2025-09-08 19:10 ` [PATCH 14/15] scsi: qla2xxx: add back SRR support Tony Battersby
2025-09-08 19:11   ` [SCST PATCH " Tony Battersby
2025-09-25 12:49   ` [PATCH " Xose Vazquez Perez
2025-09-25 15:30     ` Xose Vazquez Perez
2025-09-25 16:04       ` Tony Battersby
2025-09-25 17:00         ` Tony Battersby
2025-09-25 19:30           ` Tony Battersby
2025-09-08 19:13 ` [PATCH 15/15] scsi: qla2xxx: improve safety of cmd lookup by handle Tony Battersby
2025-09-08 19:14 ` [SCST PATCH] qla2x00t-32gbit: add on_abort_cmd callback Tony Battersby

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250911142135.GA624@yadro.com \
    --to=d.bogdanov@yadro.com \
    --cc=GR-QLogic-Storage-Upstream@marvell.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=njavali@marvell.com \
    --cc=scst-devel@lists.sourceforge.net \
    --cc=target-devel@vger.kernel.org \
    --cc=tonyb@cybernetics.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox