From: Dmitry Bogdanov <d.bogdanov@yadro.com>
To: Tony Battersby <tonyb@cybernetics.com>
Cc: Nilesh Javali <njavali@marvell.com>,
<GR-QLogic-Storage-Upstream@marvell.com>,
"James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
linux-scsi <linux-scsi@vger.kernel.org>,
<target-devel@vger.kernel.org>,
<scst-devel@lists.sourceforge.net>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [DMARC Error]Re: [PATCH 08/15] scsi: qla2xxx: fix oops during cmd abort
Date: Thu, 25 Sep 2025 11:42:52 +0300 [thread overview]
Message-ID: <20250925084252.GA821@yadro.com> (raw)
In-Reply-To: <e8cc07cf-9bd1-41a4-bd46-44e18179154b@cybernetics.com>
On Wed, Sep 24, 2025 at 03:41:32PM -0400, Tony Battersby wrote:
>
> On 9/11/25 10:21, Dmitry Bogdanov wrote:
> > On Mon, Sep 08, 2025 at 02:58:06PM -0400, Tony Battersby wrote:
> >> (target mode)
> >>
> >> There is a race between the following:
> >>
> >> CPU 1:
> >> scst_hw_pending_work_fn() ->
> >> sqa_on_hw_pending_cmd_timeout() ->
> >> qlt_abort_cmd() ->
> >> qlt_unmap_sg()
> >>
> >> CPU 2:
> >> qla_do_work() ->
> >> qla24xx_process_response_queue() ->
> >> qlt_do_ctio_completion() ->
> >> qlt_unmap_sg()
> >>
> >> Two CPUs calling qlt_unmap_sg() on the same cmd at the same time
> >> results in an oops:
> >>
> >> dma_unmap_sg_attrs()
> >> BUG_ON(!valid_dma_direction(dir));
> >>
> >> This race is more likely to happen because qlt_abort_cmd() may cause the
> >> hardware to send a CTIO.
> >>
> >> The solution is to lock cmd->qpair->qp_lock_ptr when aborting a command.
> >> This makes it possible to check the cmd state and make decisions about
> >> what to do without racing with the CTIO handler and other code.
> >>
> >> - Lock cmd->qpair->qp_lock_ptr when aborting a cmd.
> >> - Eliminate cmd->cmd_lock and change cmd->aborted back to a bitfield
> >> since it is now protected by qp_lock_ptr just like all the other
> >> flags.
> >> - Add another command state QLA_TGT_STATE_DONE to avoid any possible
> >> races between qlt_abort_cmd() and tgt_ops->free_cmd().
> >> - Add the cmd->sent_term_exchg flag to indicate if
> >> qlt_send_term_exchange() has already been called.
> >> - For SCST (scst_hw_pending_work_fn()), export qlt_send_term_exchange()
> >> and qlt_unmap_sg() so that they can be called directly instead of
> >> trying to make qlt_abort_cmd() work for both HW timeout and TMR abort.
> >> - Add TRC_CTIO_IGNORED for scst_hw_pending_work_fn().
> >>
> >> Fixes: 26f9ce53817a ("scsi: qla2xxx: Fix missed DMA unmap for aborted commands")
> > You are trying to fix that commit using its approach, but actually that
> > approach is the root cause itself. It is not ok to unmap dma while that
> > memory is owned by HW.
> >
> > We use this patch 4 years already instead of 26f9ce53817a and never
> > faced with such crashes.
> >
> >
> > From: Dmitry Bogdanov <d.bogdanov@yadro.com>
> > Date: Wed, 20 Oct 2021 15:57:31 +0300
> > Subject: [PATCH] scsi: qla2xxx: clear cmds after chip reset
> >
> > Commands sent to FW, after chip reset got stuck and never freed as FW is
> > not going to response to them anymore.
> >
> > This patch partially reverts aefed3e5548f at __qla2x00_abort_all_cmds.
> >
> > Fixes: aefed3e5548f ("scsi: qla2xxx: target: Fix offline port handling and host reset handling")
> > Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
>
> Hey Dmitry, I want to pick up your patch and add it to my v2 patchset,
> but I have made a few changes to it. Do I have your permission to add
> your "Co-developed-by" and "Signed-off-by" tags to the patch below?
> (Never did this before, I think I need to ask permission.)
Yes, of course, feel free to use it. Signed-off-by is enough I think.
> Compared to your patch, I changed "if (cmd->se_cmd.t_state ==
> TRANSPORT_WRITE_PENDING)" to "if (cmd->state ==
> QLA_TGT_STATE_NEED_DATA)" (which is the way it was originally) to work
> better with SCST and added the revert of 26f9ce53817a.
Yes, that is more correct.
> I will reply to this message with the two updated v2 patches that follow
> your suggestions and remove the dangerous code that you objected to. If
> you approve of them, then I will submit the entire v2 patchset, since
> some of the other patches needed to be rebased.
>
next prev parent reply other threads:[~2025-09-25 8:52 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-08 18:45 [PATCH 00/15] qla2xxx target mode improvements Tony Battersby
2025-09-08 18:47 ` [PATCH 01/15] Revert "scsi: qla2xxx: Perform lockless command completion in abort path" Tony Battersby
2025-09-08 18:48 ` [PATCH 02/15] scsi: qla2xxx: fix initiator mode with qlini_mode=exclusive Tony Battersby
2025-09-08 18:50 ` [PATCH 03/15] scsi: qla2xxx: fix lost interrupts with qlini_mode=disabled Tony Battersby
2025-09-08 18:51 ` [PATCH 04/15] scsi: qla2xxx: use reinit_completion on mbx_intr_comp Tony Battersby
2025-09-08 18:53 ` [PATCH 05/15] scsi: qla2xxx: remove code for unsupported hardware Tony Battersby
2025-09-08 18:54 ` [PATCH 06/15] scsi: qla2xxx: improve debug output for term exchange Tony Battersby
2025-09-08 18:56 ` [PATCH 07/15] scsi: qla2xxx: fix term exchange when cmd_sent_to_fw == 1 Tony Battersby
2025-09-08 18:58 ` [PATCH 08/15] scsi: qla2xxx: fix oops during cmd abort Tony Battersby
2025-09-08 18:59 ` [SCST PATCH " Tony Battersby
2025-09-11 14:21 ` [PATCH " Dmitry Bogdanov
2025-09-24 19:41 ` Tony Battersby
2025-09-24 19:43 ` [PATCH v2 09/16] scsi: qla2xxx: fix races with aborting commands Tony Battersby
2025-09-24 19:45 ` [SCST PATCH " Tony Battersby
2025-09-25 8:42 ` Dmitry Bogdanov [this message]
2025-09-08 19:01 ` [PATCH 09/15] scsi: qla2xxx: improve checks in qlt_xmit_response / qlt_rdy_to_xfer Tony Battersby
2025-09-08 19:02 ` [PATCH 10/15] scsi: qla2xxx: fix TMR failure handling Tony Battersby
2025-09-12 14:36 ` Dmitry Bogdanov
2025-09-16 16:04 ` Tony Battersby
2025-09-17 13:06 ` [DMARC Error]Re: " Dmitry Bogdanov
2025-09-17 20:38 ` Tony Battersby
2025-09-08 19:04 ` [PATCH 11/15] scsi: qla2xxx: fix invalid memory access with big CDBs Tony Battersby
2025-09-08 19:05 ` [SCST PATCH " Tony Battersby
2025-09-08 19:07 ` [PATCH 12/15] scsi: qla2xxx: add cmd->rsp_sent Tony Battersby
2025-09-08 19:08 ` [SCST PATCH " Tony Battersby
2025-09-15 13:47 ` [PATCH " Dmitry Bogdanov
2025-09-24 20:04 ` Tony Battersby
2025-09-08 19:09 ` [PATCH 13/15] scsi: qla2xxx: improve cmd logging Tony Battersby
2025-09-08 19:10 ` [PATCH 14/15] scsi: qla2xxx: add back SRR support Tony Battersby
2025-09-08 19:11 ` [SCST PATCH " Tony Battersby
2025-09-25 12:49 ` [PATCH " Xose Vazquez Perez
2025-09-25 15:30 ` Xose Vazquez Perez
2025-09-25 16:04 ` Tony Battersby
2025-09-25 17:00 ` Tony Battersby
2025-09-25 19:30 ` Tony Battersby
2025-09-08 19:13 ` [PATCH 15/15] scsi: qla2xxx: improve safety of cmd lookup by handle Tony Battersby
2025-09-08 19:14 ` [SCST PATCH] qla2x00t-32gbit: add on_abort_cmd callback Tony Battersby
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250925084252.GA821@yadro.com \
--to=d.bogdanov@yadro.com \
--cc=GR-QLogic-Storage-Upstream@marvell.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=njavali@marvell.com \
--cc=scst-devel@lists.sourceforge.net \
--cc=target-devel@vger.kernel.org \
--cc=tonyb@cybernetics.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.