* [PATCH] ata: libata: cancel pending work after clearing deferred_qc
From: Niklas Cassel @ 2026-03-03 10:03 UTC
To: Damien Le Moal, Niklas Cassel, John Garry, Martin K. Petersen,
Hannes Reinecke, Igor Pylypiv
Cc: syzbot+bcaf842a1e8ead8dfb89, linux-ide
Syzbot reported a WARN_ON() in ata_scsi_deferred_qc_work(), caused by
ap->ops->qc_defer() returning non-zero before issuing the deferred qc.
ata_scsi_schedule_deferred_qc() is called during each command completion.
This function will check if there is a deferred QC, and if
ap->ops->qc_defer() returns zero, meaning that it is possible to queue the
deferred qc at this time (without being deferred), then it will queue the
work which will issue the deferred qc.
Once the work gets to run, which can potentially be a very long time after
the work was scheduled, there is a WARN_ON() if ap->ops->qc_defer() returns
non-zero.
While we hold the ap->lock both when assigning and clearing deferred_qc,
and the work itself holds the ap->lock, the code currently does not cancel
the work after clearing the deferred qc.
This means that the following scenario can happen:
1) One or several NCQ commands are queued.
2) A non-NCQ command is queued, gets stored in ap->deferred_qc.
3) Last NCQ command gets completed, work is queued to issue the deferred
qc.
4) Timeout or error happens, ap->deferred_qc is cleared. The queued work is
currently NOT canceled.
5) Port is reset.
6) One or several NCQ commands are queued.
7) A non-NCQ command is queued, gets stored in ap->deferred_qc.
8) Work is finally run. Yet at this time, there are still NCQ commands in
flight.
The work in 8) really belongs to the non-NCQ command in 2), not to the
non-NCQ command in 7). The work is executed when it is not supposed to be
because it was never canceled when ap->deferred_qc was cleared in 4).
Thus, ensure that we always cancel the work after clearing
ap->deferred_qc.
Another potential fix would have been to let ata_scsi_deferred_qc_work() do
nothing if ap->ops->qc_defer() returns non-zero. However, canceling the
work when clearing ap->deferred_qc seems slightly more logical, as we hold
the ap->lock when clearing ap->deferred_qc, so we know that the work cannot
be holding the lock. (The function could be waiting for the lock, but that
is okay since it will do nothing if ap->deferred_qc is not set.)
Reported-by: syzbot+bcaf842a1e8ead8dfb89@syzkaller.appspotmail.com
Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Fixes: eddb98ad9364 ("ata: libata-eh: correctly handle deferred qc timeouts")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
drivers/ata/libata-eh.c | 1 +
drivers/ata/libata-scsi.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index b373cceb95d2..563432400f72 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -659,6 +659,7 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
*/
WARN_ON_ONCE(qc->flags & ATA_QCFLAG_ACTIVE);
ap->deferred_qc = NULL;
+ cancel_work(&ap->deferred_qc_work);
set_host_byte(scmd, DID_TIME_OUT);
scsi_eh_finish_cmd(scmd, &ap->eh_done_q);
} else if (i < ATA_MAX_QUEUE) {
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index c0dd75a0287c..ad798e5246b4 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1699,6 +1699,7 @@ void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
scmd = qc->scsicmd;
ap->deferred_qc = NULL;
+ cancel_work(&ap->deferred_qc_work);
ata_qc_free(qc);
scmd->result = (DID_SOFT_ERROR << 16);
scsi_done(scmd);
--
2.53.0
* Re: [PATCH] ata: libata: cancel pending work after clearing deferred_qc
From: Damien Le Moal @ 2026-03-03 10:38 UTC
To: Niklas Cassel, John Garry, Martin K. Petersen, Hannes Reinecke,
Igor Pylypiv
Cc: syzbot+bcaf842a1e8ead8dfb89, linux-ide
On 3/3/26 19:03, Niklas Cassel wrote:
> Syzbot reported a WARN_ON() in ata_scsi_deferred_qc_work(), caused by
> ap->ops->qc_defer() returning non-zero before issuing the deferred qc.
>
> ata_scsi_schedule_deferred_qc() is called during each command completion.
> This function will check if there is a deferred QC, and if
> ap->ops->qc_defer() returns zero, meaning that it is possible to queue the
> deferred qc at this time (without being deferred), then it will queue the
> work which will issue the deferred qc.
>
> Once the work gets to run, which can potentially be a very long time after
> the work was scheduled, there is a WARN_ON() if ap->ops->qc_defer() returns
> non-zero.
>
> While we hold the ap->lock both when assigning and clearing deferred_qc,
> and the work itself holds the ap->lock, the code currently does not cancel
> the work after clearing the deferred qc.
>
> This means that the following scenario can happen:
> 1) One or several NCQ commands are queued.
> 2) A non-NCQ command is queued, gets stored in ap->deferred_qc.
> 3) Last NCQ command gets completed, work is queued to issue the deferred
> qc.
> 4) Timeout or error happens, ap->deferred_qc is cleared. The queued work is
> currently NOT canceled.
> 5) Port is reset.
> 6) One or several NCQ commands are queued.
> 7) A non-NCQ command is queued, gets stored in ap->deferred_qc.
> 8) Work is finally run. Yet at this time, there are still NCQ commands in
> flight.
>
> The work in 8) really belongs to the non-NCQ command in 2), not to the
> non-NCQ command in 7). The work is executed when it is not supposed to be
> because it was never canceled when ap->deferred_qc was cleared in 4).
> Thus, ensure that we always cancel the work after clearing
> ap->deferred_qc.
>
> Another potential fix would have been to let ata_scsi_deferred_qc_work() do
> nothing if ap->ops->qc_defer() returns non-zero. However, canceling the
> work when clearing ap->deferred_qc seems slightly more logical, as we hold
> the ap->lock when clearing ap->deferred_qc, so we know that the work cannot
> be holding the lock. (The function could be waiting for the lock, but that
> is okay since it will do nothing if ap->deferred_qc is not set.)
>
> Reported-by: syzbot+bcaf842a1e8ead8dfb89@syzkaller.appspotmail.com
> Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
> Fixes: eddb98ad9364 ("ata: libata-eh: correctly handle deferred qc timeouts")
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
Looks good.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Note that as a followup cleanup, we may want to define an
ata_scsi_cancel_deferred_qc() helper for the pattern:
ap->deferred_qc = NULL;
cancel_work(&ap->deferred_qc_work);
to avoid repetitions.
> ---
> drivers/ata/libata-eh.c | 1 +
> drivers/ata/libata-scsi.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index b373cceb95d2..563432400f72 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -659,6 +659,7 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
> */
> WARN_ON_ONCE(qc->flags & ATA_QCFLAG_ACTIVE);
> ap->deferred_qc = NULL;
> + cancel_work(&ap->deferred_qc_work);
> set_host_byte(scmd, DID_TIME_OUT);
> scsi_eh_finish_cmd(scmd, &ap->eh_done_q);
> } else if (i < ATA_MAX_QUEUE) {
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index c0dd75a0287c..ad798e5246b4 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1699,6 +1699,7 @@ void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
>
> scmd = qc->scsicmd;
> ap->deferred_qc = NULL;
> + cancel_work(&ap->deferred_qc_work);
> ata_qc_free(qc);
> scmd->result = (DID_SOFT_ERROR << 16);
> scsi_done(scmd);
--
Damien Le Moal
Western Digital Research
* Re: [PATCH] ata: libata: cancel pending work after clearing deferred_qc
From: Igor Pylypiv @ 2026-03-03 18:26 UTC
To: Niklas Cassel
Cc: Damien Le Moal, John Garry, Martin K. Petersen, Hannes Reinecke,
syzbot+bcaf842a1e8ead8dfb89, linux-ide
On Tue, Mar 03, 2026 at 11:03:42AM +0100, Niklas Cassel wrote:
> Syzbot reported a WARN_ON() in ata_scsi_deferred_qc_work(), caused by
> ap->ops->qc_defer() returning non-zero before issuing the deferred qc.
>
> ata_scsi_schedule_deferred_qc() is called during each command completion.
> This function will check if there is a deferred QC, and if
> ap->ops->qc_defer() returns zero, meaning that it is possible to queue the
> deferred qc at this time (without being deferred), then it will queue the
> work which will issue the deferred qc.
>
> Once the work gets to run, which can potentially be a very long time after
> the work was scheduled, there is a WARN_ON() if ap->ops->qc_defer() returns
> non-zero.
>
> While we hold the ap->lock both when assigning and clearing deferred_qc,
> and the work itself holds the ap->lock, the code currently does not cancel
> the work after clearing the deferred qc.
>
> This means that the following scenario can happen:
> 1) One or several NCQ commands are queued.
> 2) A non-NCQ command is queued, gets stored in ap->deferred_qc.
> 3) Last NCQ command gets completed, work is queued to issue the deferred
> qc.
> 4) Timeout or error happens, ap->deferred_qc is cleared. The queued work is
> currently NOT canceled.
> 5) Port is reset.
> 6) One or several NCQ commands are queued.
> 7) A non-NCQ command is queued, gets stored in ap->deferred_qc.
> 8) Work is finally run. Yet at this time, there are still NCQ commands in
> flight.
>
> The work in 8) really belongs to the non-NCQ command in 2), not to the
> non-NCQ command in 7). The work is executed when it is not supposed to be
> because it was never canceled when ap->deferred_qc was cleared in 4).
> Thus, ensure that we always cancel the work after clearing
> ap->deferred_qc.
>
> Another potential fix would have been to let ata_scsi_deferred_qc_work() do
> nothing if ap->ops->qc_defer() returns non-zero. However, canceling the
> work when clearing ap->deferred_qc seems slightly more logical, as we hold
> the ap->lock when clearing ap->deferred_qc, so we know that the work cannot
> be holding the lock. (The function could be waiting for the lock, but that
> is okay since it will do nothing if ap->deferred_qc is not set.)
>
> Reported-by: syzbot+bcaf842a1e8ead8dfb89@syzkaller.appspotmail.com
> Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
> Fixes: eddb98ad9364 ("ata: libata-eh: correctly handle deferred qc timeouts")
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Thanks,
Igor
* Re: [PATCH] ata: libata: cancel pending work after clearing deferred_qc
From: Niklas Cassel @ 2026-03-04 11:19 UTC
To: Damien Le Moal, John Garry, Martin K. Petersen, Hannes Reinecke,
Igor Pylypiv, Niklas Cassel
Cc: syzbot+bcaf842a1e8ead8dfb89, linux-ide
On Tue, 03 Mar 2026 11:03:42 +0100, Niklas Cassel wrote:
> Syzbot reported a WARN_ON() in ata_scsi_deferred_qc_work(), caused by
> ap->ops->qc_defer() returning non-zero before issuing the deferred qc.
>
> ata_scsi_schedule_deferred_qc() is called during each command completion.
> This function will check if there is a deferred QC, and if
> ap->ops->qc_defer() returns zero, meaning that it is possible to queue the
> deferred qc at this time (without being deferred), then it will queue the
> work which will issue the deferred qc.
>
> [...]
Applied to libata/linux.git (libata-for-7.0-fixes), thanks!
[1/1] ata: libata: cancel pending work after clearing deferred_qc
https://git.kernel.org/libata/linux/c/aac9b27f
Kind regards,
Niklas