* [PATCH v3 0/2] Prevent non-NCQ command starvation
@ 2025-12-20 0:21 Damien Le Moal
2025-12-20 0:21 ` [PATCH v3 1/2] ata: libata-scsi: refactor ata_scsi_translate() Damien Le Moal
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Damien Le Moal @ 2025-12-20 0:21 UTC (permalink / raw)
To: linux-ide, Niklas Cassel; +Cc: Igor Pylypiv, Xingui Yang, John Garry
This small patch series addresses potential command starvation issues
with non-NCQ passthrough commands issued to a device accessed through
a multi-queue libsas HBA. In such setup, a non-NCQ passthrough command
issued by an application may suffer long delays and even complete
starvation if the device is under constant NCQ read/write workload.
This issue is avoided by limiting the requeing through the block layer
and instead explicitly waiting for the device command queue to drain
before issuing the non-NCQ command. This is done reliably by not trying
to second guess ata EH in case of error or device reset, by always
requeuing a deferred command whenever EH is scheduled for the device
port.
When a non-ncq command is deferred waiting for the device queue to
drain, all new incoming commands are always requeued through the regular
busy mechanism, thus avoiding the need to manage an internal command
queue.
With this, the long latencies observed when executing non-NCQ
passthrough commands under a heavy NCQ read/write workload at high queue
depth is resolved: the non-NCQ command latency only depends on how long
it takes for all in-flight commands to complete.
Changes from v2:
- Improved comment about ata_scsi_qc_new() behavior in patch 1
(suggested by Garry).
- Removed some lockdep annotations from patch 2 as they have little
value.
- In patch 2, change ata_scsi_requeue_deferred_qc() and
ata_scsi_schedule_deferred_qc() to return early as suggested by
Garry.
- Fixed comments typo in patch 2.
Changes from v1:
- Fixed ac leak in ata_scsi_defer() in patch 2.
- Added Hannes review tag
Damien Le Moal (2):
ata: libata-scsi: refactor ata_scsi_translate()
ata: libata-scsi: avoid passthrough command starvation
drivers/ata/libata-core.c | 1 +
drivers/ata/libata-eh.c | 6 ++
drivers/ata/libata-scsi.c | 155 ++++++++++++++++++++++++++++++++------
drivers/ata/libata.h | 2 +
include/linux/libata.h | 3 +
5 files changed, 142 insertions(+), 25 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 1/2] ata: libata-scsi: refactor ata_scsi_translate()
2025-12-20 0:21 [PATCH v3 0/2] Prevent non-NCQ command starvation Damien Le Moal
@ 2025-12-20 0:21 ` Damien Le Moal
2025-12-22 21:19 ` Igor Pylypiv
2025-12-30 13:40 ` Niklas Cassel
2025-12-20 0:21 ` [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation Damien Le Moal
2025-12-31 10:04 ` [PATCH v3 0/2] Prevent non-NCQ " Niklas Cassel
2 siblings, 2 replies; 18+ messages in thread
From: Damien Le Moal @ 2025-12-20 0:21 UTC (permalink / raw)
To: linux-ide, Niklas Cassel; +Cc: Igor Pylypiv, Xingui Yang, John Garry
Factor out of ata_scsi_translate() the code handling queued command
deferral using the port qc_defer callback into the new function
ata_scsi_defer(), and simplify the goto used in ata_scsi_translate().
While at it, also add a lockdep annotation to check that the port lock
is held when ata_scsi_translate() is called.
No functional changes.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
drivers/ata/libata-scsi.c | 66 ++++++++++++++++++++++++---------------
1 file changed, 40 insertions(+), 26 deletions(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 721d3f270c8e..8e04fc173ea3 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1691,6 +1691,30 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
ata_qc_done(qc);
}
+static int ata_scsi_defer(struct ata_port *ap, struct ata_queued_cmd *qc)
+{
+ int ret;
+
+ if (!ap->ops->qc_defer)
+ return 0;
+
+ ret = ap->ops->qc_defer(qc);
+ if (!ret)
+ return 0;
+
+ ata_qc_free(qc);
+
+ switch (ret) {
+ case ATA_DEFER_LINK:
+ return SCSI_MLQUEUE_DEVICE_BUSY;
+ case ATA_DEFER_PORT:
+ return SCSI_MLQUEUE_HOST_BUSY;
+ default:
+ WARN_ON_ONCE(1);
+ return SCSI_MLQUEUE_HOST_BUSY;
+ }
+}
+
/**
* ata_scsi_translate - Translate then issue SCSI command to ATA device
* @dev: ATA device to which the command is addressed
@@ -1714,8 +1738,8 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
* spin_lock_irqsave(host lock)
*
* RETURNS:
- * 0 on success, SCSI_ML_QUEUE_DEVICE_BUSY if the command
- * needs to be deferred.
+ * 0 on success, SCSI_ML_QUEUE_DEVICE_BUSY or SCSI_MLQUEUE_HOST_BUSY if the
+ * command needs to be deferred.
*/
static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
ata_xlat_func_t xlat_func)
@@ -1724,56 +1748,46 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
struct ata_queued_cmd *qc;
int rc;
+ lockdep_assert_held(ap->lock);
+
+ /*
+ * ata_scsi_qc_new() calls scsi_done(cmd) in case of failure. So we
+ * have nothing further to do here when allocating a qc fails.
+ */
qc = ata_scsi_qc_new(dev, cmd);
if (!qc)
- goto err_mem;
+ return 0;
/* data is present; dma-map it */
if (cmd->sc_data_direction == DMA_FROM_DEVICE ||
cmd->sc_data_direction == DMA_TO_DEVICE) {
if (unlikely(scsi_bufflen(cmd) < 1)) {
ata_dev_warn(dev, "WARNING: zero len r/w req\n");
- goto err_did;
+ cmd->result = (DID_ERROR << 16);
+ goto done;
}
ata_sg_init(qc, scsi_sglist(cmd), scsi_sg_count(cmd));
-
qc->dma_dir = cmd->sc_data_direction;
}
qc->complete_fn = ata_scsi_qc_complete;
if (xlat_func(qc))
- goto early_finish;
+ goto done;
- if (ap->ops->qc_defer) {
- if ((rc = ap->ops->qc_defer(qc)))
- goto defer;
- }
+ rc = ata_scsi_defer(ap, qc);
+ if (rc)
+ return rc;
- /* select device, send command to hardware */
ata_qc_issue(qc);
return 0;
-early_finish:
- ata_qc_free(qc);
- scsi_done(cmd);
- return 0;
-
-err_did:
+done:
ata_qc_free(qc);
- cmd->result = (DID_ERROR << 16);
scsi_done(cmd);
-err_mem:
return 0;
-
-defer:
- ata_qc_free(qc);
- if (rc == ATA_DEFER_LINK)
- return SCSI_MLQUEUE_DEVICE_BUSY;
- else
- return SCSI_MLQUEUE_HOST_BUSY;
}
/**
--
2.52.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation
2025-12-20 0:21 [PATCH v3 0/2] Prevent non-NCQ command starvation Damien Le Moal
2025-12-20 0:21 ` [PATCH v3 1/2] ata: libata-scsi: refactor ata_scsi_translate() Damien Le Moal
@ 2025-12-20 0:21 ` Damien Le Moal
2025-12-22 21:28 ` Igor Pylypiv
` (3 more replies)
2025-12-31 10:04 ` [PATCH v3 0/2] Prevent non-NCQ " Niklas Cassel
2 siblings, 4 replies; 18+ messages in thread
From: Damien Le Moal @ 2025-12-20 0:21 UTC (permalink / raw)
To: linux-ide, Niklas Cassel; +Cc: Igor Pylypiv, Xingui Yang, John Garry
When a non-NCQ passthrough command is issued while NCQ commands are
being executed, ata_scsi_defer() indicates to ata_scsi_translate() that
ata_qc_issue() should not be called for the passthrough command, and
instead returns SCSI_MLQUEUE_XXX_BUSY to defer the command execution.
This command deferring is correct and as mandated by the ACS
specifications since NCQ and non-NCQ commands cannot be mixed.
However, in the case of a host adapter using multiple submission queues,
when the target device is under a constant load of NCQ read or write
commands, there are no guarantees that requeueing the non-NCQ command
will lead to it not being deferred again repeatedly, since other
submission queues can constantly issue NCQ commands from different CPUs
ahead of the non-NCQ command. This can lead to very long delays for the
execution of non-NCQ passthrough commands, and even complete starvation
in the worst case scenario.
Since the block layer and the SCSI layer do not distinguish between
queuable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT
implementation must ensure forward progress for non-NCQ commands in the
presence of NCQ command traffic. This is similar to what SAS HBAs with a
hardware/firmware based implementation do.
Implement such forward progress guarantee by limiting requeueing of
non-NCQ commands: when a non-NCQ command is received and NCQ commands
are in-flight, do not force a requeue of the non-NCQ command by
returning SCSI_MLQUEUE_XXX_BUSY in ata_scsi_translate() and instead
hold on to the qc using the new deferred_qc field of struct ata_port.
This deferred qc will be issued using the work item deferred_qc_work
running the function ata_scsi_deferred_qc_work() once all in-flight
commands complete, which is checked with the port qc_defer() callback
indicating that no further delay is necessary. This check is done using
the helper function ata_scsi_schedule_deferred_qc() which is called from
ata_scsi_qc_complete(). This thus excludes this mechanism from all
internal non-NCQ commands issued by ATA EH.
When a port deferred_qc is non NULL, that is, the port has a command
waiting for the device queue to drain, the issuing of all incoming
commands (both NCQ and non-NCQ) is deferred using the regular busy
mechanism. This simplifies the code and also avoids potential denial of
service problems if a user issues too many non-NCQ passthrough commands.
Finally, whenever ata EH is scheduled, regardless of the reason, a
deferred qc is always requeued so that it can be retried once EH
completes. This is done by calling the function
ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids
the need for any special processing for the deferred qc in case of NCQ
error, link or device reset, or device timeout.
Reported-by: Xingui Yang <yangxingui@huawei.com>
Reported-by: Igor Pylypiv <ipylypiv@google.com>
Fixes: 42f22fe36d51 ("scsi: pm8001: Expose hardware queues for pm80xx")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
drivers/ata/libata-core.c | 1 +
drivers/ata/libata-eh.c | 6 +++
drivers/ata/libata-scsi.c | 93 ++++++++++++++++++++++++++++++++++++++-
drivers/ata/libata.h | 2 +
include/linux/libata.h | 3 ++
5 files changed, 104 insertions(+), 1 deletion(-)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 0b24bd169d61..121f35115d33 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5558,6 +5558,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
mutex_init(&ap->scsi_scan_mutex);
INIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug);
INIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
+ INIT_WORK(&ap->deferred_qc_work, ata_scsi_deferred_qc_work);
INIT_LIST_HEAD(&ap->eh_done_q);
init_waitqueue_head(&ap->eh_wait_q);
init_completion(&ap->park_req_pending);
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 2586e77ebf45..b90b17f680f8 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -917,6 +917,12 @@ static void ata_eh_set_pending(struct ata_port *ap, bool fastdrain)
ap->pflags |= ATA_PFLAG_EH_PENDING;
+ /*
+ * If we have a deferred qc, requeue it so that it is retried once EH
+ * completes.
+ */
+ ata_scsi_requeue_deferred_qc(ap);
+
if (!fastdrain)
return;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 8e04fc173ea3..e39837846566 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1658,8 +1658,73 @@ static void ata_qc_done(struct ata_queued_cmd *qc)
done(cmd);
}
+void ata_scsi_deferred_qc_work(struct work_struct *work)
+{
+ struct ata_port *ap =
+ container_of(work, struct ata_port, deferred_qc_work);
+ struct ata_queued_cmd *qc;
+ unsigned long flags;
+
+ spin_lock_irqsave(ap->lock, flags);
+
+ /*
+ * If we still have a deferred qc and we are not in EH, issue it. In
+ * such case, we should not need any more deferring the qc, so warn if
+ * qc_defer() says otherwise.
+ */
+ qc = ap->deferred_qc;
+ if (qc && !ata_port_eh_scheduled(ap)) {
+ WARN_ON_ONCE(ap->ops->qc_defer(qc));
+ ap->deferred_qc = NULL;
+ ata_qc_issue(qc);
+ }
+
+ spin_unlock_irqrestore(ap->lock, flags);
+}
+
+void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
+{
+ struct ata_queued_cmd *qc = ap->deferred_qc;
+ struct scsi_cmnd *scmd;
+
+ /*
+ * If we have a deferred qc when a reset occurs or NCQ commands fail,
+ * do not try to be smart about what to do with this deferred command
+ * and simply retry it by completing it with DID_SOFT_ERROR.
+ */
+ if (!qc)
+ return;
+
+ scmd = qc->scsicmd;
+ ap->deferred_qc = NULL;
+ ata_qc_free(qc);
+ scmd->result = (DID_SOFT_ERROR << 16);
+ scsi_done(scmd);
+}
+
+static void ata_scsi_schedule_deferred_qc(struct ata_port *ap)
+{
+ struct ata_queued_cmd *qc = ap->deferred_qc;
+
+ /*
+ * If we have a deferred qc, then qc_defer() is defined and we can use
+ * this callback to determine if this qc is good to go, unless EH has
+ * been scheduled.
+ */
+ if (!qc)
+ return;
+
+ if (ata_port_eh_scheduled(ap)) {
+ ata_scsi_requeue_deferred_qc(ap);
+ return;
+ }
+ if (!ap->ops->qc_defer(qc))
+ queue_work(system_highpri_wq, &ap->deferred_qc_work);
+}
+
static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
{
+ struct ata_port *ap = qc->ap;
struct scsi_cmnd *cmd = qc->scsicmd;
u8 *cdb = cmd->cmnd;
bool have_sense = qc->flags & ATA_QCFLAG_SENSE_VALID;
@@ -1689,12 +1754,24 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
}
ata_qc_done(qc);
+
+ ata_scsi_schedule_deferred_qc(ap);
}
static int ata_scsi_defer(struct ata_port *ap, struct ata_queued_cmd *qc)
{
int ret;
+ /*
+ * If we already have a deferred qc, then rely on the SCSI layer to
+ * requeue and defer all incoming commands until the deferred qc is
+ * processed, once all on-going commands are completed.
+ */
+ if (ap->deferred_qc) {
+ ata_qc_free(qc);
+ return SCSI_MLQUEUE_DEVICE_BUSY;
+ }
+
if (!ap->ops->qc_defer)
return 0;
@@ -1702,6 +1779,17 @@ static int ata_scsi_defer(struct ata_port *ap, struct ata_queued_cmd *qc)
if (!ret)
return 0;
+ /*
+ * We must defer this qc: if this is not an NCQ command, keep this qc
+ * as a deferred one and wait for all on-going NCQ commands to complete
+ * before issuing it with the deferred qc work.
+ */
+ if (!ata_is_ncq(qc->tf.protocol)) {
+ ap->deferred_qc = qc;
+ return SCSI_MLQUEUE_DEVICE_BUSY;
+ }
+
+ /* Use the SCSI layer to requeue and defer the command. */
ata_qc_free(qc);
switch (ret) {
@@ -1777,8 +1865,11 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
goto done;
rc = ata_scsi_defer(ap, qc);
- if (rc)
+ if (rc) {
+ if (qc == ap->deferred_qc)
+ return 0;
return rc;
+ }
ata_qc_issue(qc);
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 0e7ecac73680..60a675df61dc 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -165,6 +165,8 @@ void ata_scsi_sdev_config(struct scsi_device *sdev);
int ata_scsi_dev_config(struct scsi_device *sdev, struct queue_limits *lim,
struct ata_device *dev);
int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev);
+void ata_scsi_deferred_qc_work(struct work_struct *work);
+void ata_scsi_requeue_deferred_qc(struct ata_port *ap);
/* libata-eh.c */
extern unsigned int ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 39534fafa36a..c5b27d97dfaf 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -903,6 +903,9 @@ struct ata_port {
u64 qc_active;
int nr_active_links; /* #links with active qcs */
+ struct work_struct deferred_qc_work;
+ struct ata_queued_cmd *deferred_qc;
+
struct ata_link link; /* host default link */
struct ata_link *slave_link; /* see ata_slave_link_init() */
--
2.52.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v3 1/2] ata: libata-scsi: refactor ata_scsi_translate()
2025-12-20 0:21 ` [PATCH v3 1/2] ata: libata-scsi: refactor ata_scsi_translate() Damien Le Moal
@ 2025-12-22 21:19 ` Igor Pylypiv
2025-12-30 13:40 ` Niklas Cassel
1 sibling, 0 replies; 18+ messages in thread
From: Igor Pylypiv @ 2025-12-22 21:19 UTC (permalink / raw)
To: Damien Le Moal; +Cc: linux-ide, Niklas Cassel, Xingui Yang, John Garry
On Sat, Dec 20, 2025 at 09:21:39AM +0900, Damien Le Moal wrote:
> Factor out of ata_scsi_translate() the code handling queued command
> deferral using the port qc_defer callback into the new function
> ata_scsi_defer(), and simplify the goto used in ata_scsi_translate().
> While at it, also add a lockdep annotation to check that the port lock
> is held when ata_scsi_translate() is called.
>
> No functional changes.
>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation
2025-12-20 0:21 ` [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation Damien Le Moal
@ 2025-12-22 21:28 ` Igor Pylypiv
2025-12-28 2:25 ` Damien Le Moal
2025-12-23 11:17 ` yangxingui
` (2 subsequent siblings)
3 siblings, 1 reply; 18+ messages in thread
From: Igor Pylypiv @ 2025-12-22 21:28 UTC (permalink / raw)
To: Damien Le Moal; +Cc: linux-ide, Niklas Cassel, Xingui Yang, John Garry
On Sat, Dec 20, 2025 at 09:21:40AM +0900, Damien Le Moal wrote:
> When a non-NCQ passthrough command is issued while NCQ commands are
> being executed, ata_scsi_defer() indicates to ata_scsi_translate() that
> ata_qc_issue() should not be called for the passthrough command, and
> instead returns SCSI_MLQUEUE_XXX_BUSY to defer the command execution.
> This command deferring is correct and as mandated by the ACS
> specifications since NCQ and non-NCQ commands cannot be mixed.
>
> However, in the case of a host adapter using multiple submission queues,
> when the target device is under a constant load of NCQ read or write
> commands, there are no guarantees that requeueing the non-NCQ command
> will lead to it not being deferred again repeatedly, since other
> submission queues can constantly issue NCQ commands from different CPUs
> ahead of the non-NCQ command. This can lead to very long delays for the
> execution of non-NCQ passthrough commands, and even complete starvation
> in the worst case scenario.
>
> Since the block layer and the SCSI layer do not distinguish between
> queuable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT
> implementation must ensure forward progress for non-NCQ commands in the
> presence of NCQ command traffic. This is similar to what SAS HBAs with a
> hardware/firmware based implementation do.
>
> Implement such forward progress guarantee by limiting requeueing of
> non-NCQ commands: when a non-NCQ command is received and NCQ commands
> are in-flight, do not force a requeue of the non-NCQ command by
> returning SCSI_MLQUEUE_XXX_BUSY in ata_scsi_translate() and instead
> hold on to the qc using the new deferred_qc field of struct ata_port.
>
> This deferred qc will be issued using the work item deferred_qc_work
> running the function ata_scsi_deferred_qc_work() once all in-flight
> commands complete, which is checked with the port qc_defer() callback
> indicating that no further delay is necessary. This check is done using
> the helper function ata_scsi_schedule_deferred_qc() which is called from
> ata_scsi_qc_complete(). This thus excludes this mechanism from all
> internal non-NCQ commands issued by ATA EH.
>
> When a port deferred_qc is non NULL, that is, the port has a command
> waiting for the device queue to drain, the issuing of all incoming
> commands (both NCQ and non-NCQ) is deferred using the regular busy
> mechanism. This simplifies the code and also avoids potential denial of
> service problems if a user issues too many non-NCQ passthrough commands.
>
> Finally, whenever ata EH is scheduled, regardless of the reason, a
> deferred qc is always requeued so that it can be retried once EH
> completes. This is done by calling the function
> ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids
> the need for any special processing for the deferred qc in case of NCQ
> error, link or device reset, or device timeout.
>
> Reported-by: Xingui Yang <yangxingui@huawei.com>
> Reported-by: Igor Pylypiv <ipylypiv@google.com>
Tested-by: Igor Pylypiv <ipylypiv@google.com>
Thanks again for fixing this issue, Damien!
> Fixes: 42f22fe36d51 ("scsi: pm8001: Expose hardware queues for pm80xx")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
> drivers/ata/libata-core.c | 1 +
> drivers/ata/libata-eh.c | 6 +++
> drivers/ata/libata-scsi.c | 93 ++++++++++++++++++++++++++++++++++++++-
> drivers/ata/libata.h | 2 +
> include/linux/libata.h | 3 ++
> 5 files changed, 104 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 0b24bd169d61..121f35115d33 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5558,6 +5558,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
> mutex_init(&ap->scsi_scan_mutex);
> INIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug);
> INIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
> + INIT_WORK(&ap->deferred_qc_work, ata_scsi_deferred_qc_work);
> INIT_LIST_HEAD(&ap->eh_done_q);
> init_waitqueue_head(&ap->eh_wait_q);
> init_completion(&ap->park_req_pending);
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 2586e77ebf45..b90b17f680f8 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -917,6 +917,12 @@ static void ata_eh_set_pending(struct ata_port *ap, bool fastdrain)
>
> ap->pflags |= ATA_PFLAG_EH_PENDING;
>
> + /*
> + * If we have a deferred qc, requeue it so that it is retried once EH
> + * completes.
> + */
> + ata_scsi_requeue_deferred_qc(ap);
> +
> if (!fastdrain)
> return;
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 8e04fc173ea3..e39837846566 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1658,8 +1658,73 @@ static void ata_qc_done(struct ata_queued_cmd *qc)
> done(cmd);
> }
>
> +void ata_scsi_deferred_qc_work(struct work_struct *work)
> +{
> + struct ata_port *ap =
> + container_of(work, struct ata_port, deferred_qc_work);
> + struct ata_queued_cmd *qc;
> + unsigned long flags;
> +
> + spin_lock_irqsave(ap->lock, flags);
> +
> + /*
> + * If we still have a deferred qc and we are not in EH, issue it. In
> + * such case, we should not need any more deferring the qc, so warn if
> + * qc_defer() says otherwise.
> + */
> + qc = ap->deferred_qc;
> + if (qc && !ata_port_eh_scheduled(ap)) {
> + WARN_ON_ONCE(ap->ops->qc_defer(qc));
> + ap->deferred_qc = NULL;
> + ata_qc_issue(qc);
> + }
> +
> + spin_unlock_irqrestore(ap->lock, flags);
> +}
> +
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
> +{
> + struct ata_queued_cmd *qc = ap->deferred_qc;
> + struct scsi_cmnd *scmd;
> +
> + /*
> + * If we have a deferred qc when a reset occurs or NCQ commands fail,
> + * do not try to be smart about what to do with this deferred command
> + * and simply retry it by completing it with DID_SOFT_ERROR.
> + */
> + if (!qc)
> + return;
> +
> + scmd = qc->scsicmd;
> + ap->deferred_qc = NULL;
> + ata_qc_free(qc);
> + scmd->result = (DID_SOFT_ERROR << 16);
> + scsi_done(scmd);
> +}
> +
> +static void ata_scsi_schedule_deferred_qc(struct ata_port *ap)
> +{
> + struct ata_queued_cmd *qc = ap->deferred_qc;
> +
> + /*
> + * If we have a deferred qc, then qc_defer() is defined and we can use
> + * this callback to determine if this qc is good to go, unless EH has
> + * been scheduled.
> + */
> + if (!qc)
> + return;
> +
> + if (ata_port_eh_scheduled(ap)) {
> + ata_scsi_requeue_deferred_qc(ap);
> + return;
> + }
> + if (!ap->ops->qc_defer(qc))
> + queue_work(system_highpri_wq, &ap->deferred_qc_work);
Do we need to call cancel_work_sync(&ap->deferred_qc_work) in ata_port_detach()?
> +}
> +
> static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
> {
> + struct ata_port *ap = qc->ap;
> struct scsi_cmnd *cmd = qc->scsicmd;
> u8 *cdb = cmd->cmnd;
> bool have_sense = qc->flags & ATA_QCFLAG_SENSE_VALID;
> @@ -1689,12 +1754,24 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
> }
>
> ata_qc_done(qc);
> +
> + ata_scsi_schedule_deferred_qc(ap);
> }
>
> static int ata_scsi_defer(struct ata_port *ap, struct ata_queued_cmd *qc)
> {
> int ret;
>
> + /*
> + * If we already have a deferred qc, then rely on the SCSI layer to
> + * requeue and defer all incoming commands until the deferred qc is
> + * processed, once all on-going commands are completed.
> + */
> + if (ap->deferred_qc) {
> + ata_qc_free(qc);
> + return SCSI_MLQUEUE_DEVICE_BUSY;
> + }
> +
> if (!ap->ops->qc_defer)
> return 0;
>
> @@ -1702,6 +1779,17 @@ static int ata_scsi_defer(struct ata_port *ap, struct ata_queued_cmd *qc)
> if (!ret)
> return 0;
>
> + /*
> + * We must defer this qc: if this is not an NCQ command, keep this qc
> + * as a deferred one and wait for all on-going NCQ commands to complete
> + * before issuing it with the deferred qc work.
> + */
> + if (!ata_is_ncq(qc->tf.protocol)) {
> + ap->deferred_qc = qc;
> + return SCSI_MLQUEUE_DEVICE_BUSY;
> + }
> +
> + /* Use the SCSI layer to requeue and defer the command. */
> ata_qc_free(qc);
>
> switch (ret) {
> @@ -1777,8 +1865,11 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
> goto done;
>
> rc = ata_scsi_defer(ap, qc);
> - if (rc)
> + if (rc) {
> + if (qc == ap->deferred_qc)
> + return 0;
> return rc;
> + }
>
> ata_qc_issue(qc);
>
> diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
> index 0e7ecac73680..60a675df61dc 100644
> --- a/drivers/ata/libata.h
> +++ b/drivers/ata/libata.h
> @@ -165,6 +165,8 @@ void ata_scsi_sdev_config(struct scsi_device *sdev);
> int ata_scsi_dev_config(struct scsi_device *sdev, struct queue_limits *lim,
> struct ata_device *dev);
> int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev);
> +void ata_scsi_deferred_qc_work(struct work_struct *work);
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap);
>
> /* libata-eh.c */
> extern unsigned int ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 39534fafa36a..c5b27d97dfaf 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -903,6 +903,9 @@ struct ata_port {
> u64 qc_active;
> int nr_active_links; /* #links with active qcs */
>
> + struct work_struct deferred_qc_work;
> + struct ata_queued_cmd *deferred_qc;
> +
> struct ata_link link; /* host default link */
> struct ata_link *slave_link; /* see ata_slave_link_init() */
>
> --
> 2.52.0
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation
2025-12-20 0:21 ` [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation Damien Le Moal
2025-12-22 21:28 ` Igor Pylypiv
@ 2025-12-23 11:17 ` yangxingui
2025-12-31 11:17 ` Niklas Cassel
2025-12-31 11:39 ` Niklas Cassel
3 siblings, 0 replies; 18+ messages in thread
From: yangxingui @ 2025-12-23 11:17 UTC (permalink / raw)
To: Damien Le Moal, linux-ide, Niklas Cassel; +Cc: Igor Pylypiv, John Garry
On 2025/12/20 8:21, Damien Le Moal wrote:
> When a non-NCQ passthrough command is issued while NCQ commands are
> being executed, ata_scsi_defer() indicates to ata_scsi_translate() that
> ata_qc_issue() should not be called for the passthrough command, and
> instead returns SCSI_MLQUEUE_XXX_BUSY to defer the command execution.
> This command deferring is correct and as mandated by the ACS
> specifications since NCQ and non-NCQ commands cannot be mixed.
>
> However, in the case of a host adapter using multiple submission queues,
> when the target device is under a constant load of NCQ read or write
> commands, there are no guarantees that requeueing the non-NCQ command
> will lead to it not being deferred again repeatedly, since other
> submission queues can constantly issue NCQ commands from different CPUs
> ahead of the non-NCQ command. This can lead to very long delays for the
> execution of non-NCQ passthrough commands, and even complete starvation
> in the worst case scenario.
>
> Since the block layer and the SCSI layer do not distinguish between
> queuable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT
> implementation must ensure forward progress for non-NCQ commands in the
> presence of NCQ command traffic. This is similar to what SAS HBAs with a
> hardware/firmware based implementation do.
>
> Implement such forward progress guarantee by limiting requeueing of
> non-NCQ commands: when a non-NCQ command is received and NCQ commands
> are in-flight, do not force a requeue of the non-NCQ command by
> returning SCSI_MLQUEUE_XXX_BUSY in ata_scsi_translate() and instead
> hold on to the qc using the new deferred_qc field of struct ata_port.
>
> This deferred qc will be issued using the work item deferred_qc_work
> running the function ata_scsi_deferred_qc_work() once all in-flight
> commands complete, which is checked with the port qc_defer() callback
> indicating that no further delay is necessary. This check is done using
> the helper function ata_scsi_schedule_deferred_qc() which is called from
> ata_scsi_qc_complete(). This thus excludes this mechanism from all
> internal non-NCQ commands issued by ATA EH.
>
> When a port deferred_qc is non NULL, that is, the port has a command
> waiting for the device queue to drain, the issuing of all incoming
> commands (both NCQ and non-NCQ) is deferred using the regular busy
> mechanism. This simplifies the code and also avoids potential denial of
> service problems if a user issues too many non-NCQ passthrough commands.
>
> Finally, whenever ata EH is scheduled, regardless of the reason, a
> deferred qc is always requeued so that it can be retried once EH
> completes. This is done by calling the function
> ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids
> the need for any special processing for the deferred qc in case of NCQ
> error, link or device reset, or device timeout.
>
> Reported-by: Xingui Yang <yangxingui@huawei.com>
Tested-by: Xingui Yang <yangxingui@huawei.com>
I'm excited that this issue has been fixed, thank you.
> Reported-by: Igor Pylypiv <ipylypiv@google.com>
> Fixes: 42f22fe36d51 ("scsi: pm8001: Expose hardware queues for pm80xx")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
> drivers/ata/libata-core.c | 1 +
> drivers/ata/libata-eh.c | 6 +++
> drivers/ata/libata-scsi.c | 93 ++++++++++++++++++++++++++++++++++++++-
> drivers/ata/libata.h | 2 +
> include/linux/libata.h | 3 ++
> 5 files changed, 104 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 0b24bd169d61..121f35115d33 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5558,6 +5558,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
> mutex_init(&ap->scsi_scan_mutex);
> INIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug);
> INIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
> + INIT_WORK(&ap->deferred_qc_work, ata_scsi_deferred_qc_work);
> INIT_LIST_HEAD(&ap->eh_done_q);
> init_waitqueue_head(&ap->eh_wait_q);
> init_completion(&ap->park_req_pending);
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 2586e77ebf45..b90b17f680f8 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -917,6 +917,12 @@ static void ata_eh_set_pending(struct ata_port *ap, bool fastdrain)
>
> ap->pflags |= ATA_PFLAG_EH_PENDING;
>
> + /*
> + * If we have a deferred qc, requeue it so that it is retried once EH
> + * completes.
> + */
> + ata_scsi_requeue_deferred_qc(ap);
> +
> if (!fastdrain)
> return;
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 8e04fc173ea3..e39837846566 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1658,8 +1658,73 @@ static void ata_qc_done(struct ata_queued_cmd *qc)
> done(cmd);
> }
>
> +void ata_scsi_deferred_qc_work(struct work_struct *work)
> +{
> + struct ata_port *ap =
> + container_of(work, struct ata_port, deferred_qc_work);
> + struct ata_queued_cmd *qc;
> + unsigned long flags;
> +
> + spin_lock_irqsave(ap->lock, flags);
> +
> + /*
> + * If we still have a deferred qc and we are not in EH, issue it. In
> + * such case, we should not need any more deferring the qc, so warn if
> + * qc_defer() says otherwise.
> + */
> + qc = ap->deferred_qc;
> + if (qc && !ata_port_eh_scheduled(ap)) {
> + WARN_ON_ONCE(ap->ops->qc_defer(qc));
> + ap->deferred_qc = NULL;
> + ata_qc_issue(qc);
> + }
> +
> + spin_unlock_irqrestore(ap->lock, flags);
> +}
> +
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
> +{
> + struct ata_queued_cmd *qc = ap->deferred_qc;
> + struct scsi_cmnd *scmd;
> +
> + /*
> + * If we have a deferred qc when a reset occurs or NCQ commands fail,
> + * do not try to be smart about what to do with this deferred command
> + * and simply retry it by completing it with DID_SOFT_ERROR.
> + */
> + if (!qc)
> + return;
> +
> + scmd = qc->scsicmd;
> + ap->deferred_qc = NULL;
> + ata_qc_free(qc);
> + scmd->result = (DID_SOFT_ERROR << 16);
> + scsi_done(scmd);
> +}
> +
> +static void ata_scsi_schedule_deferred_qc(struct ata_port *ap)
> +{
> + struct ata_queued_cmd *qc = ap->deferred_qc;
> +
> + /*
> + * If we have a deferred qc, then qc_defer() is defined and we can use
> + * this callback to determine if this qc is good to go, unless EH has
> + * been scheduled.
> + */
> + if (!qc)
> + return;
> +
> + if (ata_port_eh_scheduled(ap)) {
> + ata_scsi_requeue_deferred_qc(ap);
> + return;
> + }
> + if (!ap->ops->qc_defer(qc))
> + queue_work(system_highpri_wq, &ap->deferred_qc_work);
> +}
> +
> static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
> {
> + struct ata_port *ap = qc->ap;
> struct scsi_cmnd *cmd = qc->scsicmd;
> u8 *cdb = cmd->cmnd;
> bool have_sense = qc->flags & ATA_QCFLAG_SENSE_VALID;
> @@ -1689,12 +1754,24 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
> }
>
> ata_qc_done(qc);
> +
> + ata_scsi_schedule_deferred_qc(ap);
> }
>
> static int ata_scsi_defer(struct ata_port *ap, struct ata_queued_cmd *qc)
> {
> int ret;
>
> + /*
> + * If we already have a deferred qc, then rely on the SCSI layer to
> + * requeue and defer all incoming commands until the deferred qc is
> + * processed, once all on-going commands are completed.
> + */
> + if (ap->deferred_qc) {
> + ata_qc_free(qc);
> + return SCSI_MLQUEUE_DEVICE_BUSY;
> + }
> +
> if (!ap->ops->qc_defer)
> return 0;
>
> @@ -1702,6 +1779,17 @@ static int ata_scsi_defer(struct ata_port *ap, struct ata_queued_cmd *qc)
> if (!ret)
> return 0;
>
> + /*
> + * We must defer this qc: if this is not an NCQ command, keep this qc
> + * as a deferred one and wait for all on-going NCQ commands to complete
> + * before issuing it with the deferred qc work.
> + */
> + if (!ata_is_ncq(qc->tf.protocol)) {
> + ap->deferred_qc = qc;
> + return SCSI_MLQUEUE_DEVICE_BUSY;
> + }
> +
> + /* Use the SCSI layer to requeue and defer the command. */
> ata_qc_free(qc);
>
> switch (ret) {
> @@ -1777,8 +1865,11 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
> goto done;
>
> rc = ata_scsi_defer(ap, qc);
> - if (rc)
> + if (rc) {
> + if (qc == ap->deferred_qc)
> + return 0;
> return rc;
> + }
>
> ata_qc_issue(qc);
>
> diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
> index 0e7ecac73680..60a675df61dc 100644
> --- a/drivers/ata/libata.h
> +++ b/drivers/ata/libata.h
> @@ -165,6 +165,8 @@ void ata_scsi_sdev_config(struct scsi_device *sdev);
> int ata_scsi_dev_config(struct scsi_device *sdev, struct queue_limits *lim,
> struct ata_device *dev);
> int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev);
> +void ata_scsi_deferred_qc_work(struct work_struct *work);
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap);
>
> /* libata-eh.c */
> extern unsigned int ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 39534fafa36a..c5b27d97dfaf 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -903,6 +903,9 @@ struct ata_port {
> u64 qc_active;
> int nr_active_links; /* #links with active qcs */
>
> + struct work_struct deferred_qc_work;
> + struct ata_queued_cmd *deferred_qc;
> +
> struct ata_link link; /* host default link */
> struct ata_link *slave_link; /* see ata_slave_link_init() */
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation
2025-12-22 21:28 ` Igor Pylypiv
@ 2025-12-28 2:25 ` Damien Le Moal
0 siblings, 0 replies; 18+ messages in thread
From: Damien Le Moal @ 2025-12-28 2:25 UTC (permalink / raw)
To: Igor Pylypiv; +Cc: linux-ide, Niklas Cassel, Xingui Yang, John Garry
On 12/23/25 06:28, Igor Pylypiv wrote:
>> +static void ata_scsi_schedule_deferred_qc(struct ata_port *ap)
>> +{
>> + struct ata_queued_cmd *qc = ap->deferred_qc;
>> +
>> + /*
>> + * If we have a deferred qc, then qc_defer() is defined and we can use
>> + * this callback to determine if this qc is good to go, unless EH has
>> + * been scheduled.
>> + */
>> + if (!qc)
>> + return;
>> +
>> + if (ata_port_eh_scheduled(ap)) {
>> + ata_scsi_requeue_deferred_qc(ap);
>> + return;
>> + }
>> + if (!ap->ops->qc_defer(qc))
>> + queue_work(system_highpri_wq, &ap->deferred_qc_work);
>
> Do we need to call cancel_work_sync(&ap->deferred_qc_work) in ata_port_detach()?
Good point. I added that.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 1/2] ata: libata-scsi: refactor ata_scsi_translate()
2025-12-20 0:21 ` [PATCH v3 1/2] ata: libata-scsi: refactor ata_scsi_translate() Damien Le Moal
2025-12-22 21:19 ` Igor Pylypiv
@ 2025-12-30 13:40 ` Niklas Cassel
1 sibling, 0 replies; 18+ messages in thread
From: Niklas Cassel @ 2025-12-30 13:40 UTC (permalink / raw)
To: Damien Le Moal; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry
On Sat, Dec 20, 2025 at 09:21:39AM +0900, Damien Le Moal wrote:
> Factor out of ata_scsi_translate() the code handling queued command
> deferral using the port qc_defer callback into the new function
> ata_scsi_defer(), and simplify the goto used in ata_scsi_translate().
> While at it, also add a lockdep annotation to check that the port lock
> is held when ata_scsi_translate() is called.
>
> No functional changes.
>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 0/2] Prevent non-NCQ command starvation
2025-12-20 0:21 [PATCH v3 0/2] Prevent non-NCQ command starvation Damien Le Moal
2025-12-20 0:21 ` [PATCH v3 1/2] ata: libata-scsi: refactor ata_scsi_translate() Damien Le Moal
2025-12-20 0:21 ` [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation Damien Le Moal
@ 2025-12-31 10:04 ` Niklas Cassel
2026-01-02 1:14 ` Damien Le Moal
2 siblings, 1 reply; 18+ messages in thread
From: Niklas Cassel @ 2025-12-31 10:04 UTC (permalink / raw)
To: Damien Le Moal; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry
On Sat, Dec 20, 2025 at 09:21:38AM +0900, Damien Le Moal wrote:
> This small patch series addresses potential command starvation issues
> with non-NCQ passthrough commands issued to a device accessed through
Here and everywhere else:
s/non-NCQ passthrough commands/non-NCQ commands/
The problem is really not related to passthrough commands,
but rather the mix between NCQ and non-NCQ commands.
E.g. you can perform a ioctl with the REQ_OP_DISCARD command,
for most ATA devices this is a SCSI WRITE SAME 16 CMD, which gets
translated to either a NCQ or non-NCQ command depending on if the
device supports NCQ encapsulated Data Set Management (DSM) commands,
and specifically if the device has the QUEUED DATA SET MANAGEMENT
SUPPORTS TRIM bit set or not, which is not a given.
TL;DR: the starvation problem is not directly tied to non-NCQ
passthrough commands, but any command, even commands that goes via
the block layer, that happens to be translated to a non-NCQ command.
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation
2025-12-20 0:21 ` [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation Damien Le Moal
2025-12-22 21:28 ` Igor Pylypiv
2025-12-23 11:17 ` yangxingui
@ 2025-12-31 11:17 ` Niklas Cassel
2026-01-02 0:47 ` Damien Le Moal
2025-12-31 11:39 ` Niklas Cassel
3 siblings, 1 reply; 18+ messages in thread
From: Niklas Cassel @ 2025-12-31 11:17 UTC (permalink / raw)
To: Damien Le Moal; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry
On Sat, Dec 20, 2025 at 09:21:40AM +0900, Damien Le Moal wrote:
> When a non-NCQ passthrough command is issued while NCQ commands are
Same as cover letter:
s/non-NCQ passthrough command/non-NCQ command/
> being executed, ata_scsi_defer() indicates to ata_scsi_translate() that
> ata_qc_issue() should not be called for the passthrough command, and
s/passthrough//
> instead returns SCSI_MLQUEUE_XXX_BUSY to defer the command execution.
> This command deferring is correct and as mandated by the ACS
> specifications since NCQ and non-NCQ commands cannot be mixed.
>
> However, in the case of a host adapter using multiple submission queues,
> when the target device is under a constant load of NCQ read or write
s/NCQ read or write commands/NCQ commands/
There are other NCQ commands, such as NCQ SEND and NCQ RECEIVE.
E.g. a user could saturate the queue by doing a NCQ encapsulated READ LOG
DMA EXT.
> commands, there are no guarantees that requeueing the non-NCQ command
> will lead to it not being deferred again repeatedly, since other
> submission queues can constantly issue NCQ commands from different CPUs
> ahead of the non-NCQ command. This can lead to very long delays for the
> execution of non-NCQ passthrough commands, and even complete starvation
> in the worst case scenario.
>
> Since the block layer and the SCSI layer do not distinguish between
> queuable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT
s/queuable/queueable/
> implementation must ensure forward progress for non-NCQ commands in the
> presence of NCQ command traffic. This is similar to what SAS HBAs with a
> hardware/firmware based implementation do.
>
> Implement such forward progress guarantee by limiting requeueing of
> non-NCQ commands: when a non-NCQ command is received and NCQ commands
> are in-flight, do not force a requeue of the non-NCQ command by
> returning SCSI_MLQUEUE_XXX_BUSY in ata_scsi_translate() and instead
> hold on to the qc using the new deferred_qc field of struct ata_port.
>
> This deferred qc will be issued using the work item deferred_qc_work
> running the function ata_scsi_deferred_qc_work() once all in-flight
> commands complete, which is checked with the port qc_defer() callback
> indicating that no further delay is necessary. This check is done using
> the helper function ata_scsi_schedule_deferred_qc() which is called from
> ata_scsi_qc_complete(). This thus excludes this mechanism from all
> internal non-NCQ commands issued by ATA EH.
>
> When a port deferred_qc is non NULL, that is, the port has a command
> waiting for the device queue to drain, the issuing of all incoming
> commands (both NCQ and non-NCQ) is deferred using the regular busy
> mechanism. This simplifies the code and also avoids potential denial of
> service problems if a user issues too many non-NCQ passthrough commands.
Instead of introducing a workqueue, did you try/consider to do it the same
way that we handle CDL commands that completed without error:
https://github.com/torvalds/linux/blob/v6.19-rc3/drivers/ata/libata-core.c#L4971-L4988
I.e., in qc_defer(), if we try to queue a non-NCQ command while there
are NCQ commands active, set ATA_PFLAG_EH_PENDING so that we do not
trigger fast drain (we wait for the active commands to finish),
and then call ata_qc_schedule_eh() on the non-NCQ qc.
Sure, for passthrough commands specifically, SCSI will not want to
retry the command, because scsi_noretry_cmd() will return true:
https://github.com/torvalds/linux/blob/v6.19-rc3/drivers/scsi/scsi_error.c#L2227
But this could easily be solved by e.g. modifying scsi_noretry_cmd()
to add an additional "case DID_REQUEUE: return false;"
And also set set_host_byte(scmd, DID_REQUEUE); (probably based on a
new ATA_QCFLAG_DEFER flag or simiar), after we have cleared DID_TIME_OUT
using set_host_byte(scmd, DID_OK):
https://github.com/torvalds/linux/blob/v6.19-rc3/drivers/ata/libata-eh.c#L640
Since this works for CDL commands, I don't see why we shouldn't be able to
also send a non-NCQ command (if there are NCQ commands active), via SCSI
EH, so that SCSI itself retries the command, rather than us holding/hiding
a command from the SCSI queue, by using an additional workqueue in libata.
(If libata is just a SCSI LLDD, it somehow feels more logical to make use
of the SCSI queue, rather than to add our own queueing in libata using a
workqueue.)
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation
2025-12-20 0:21 ` [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation Damien Le Moal
` (2 preceding siblings ...)
2025-12-31 11:17 ` Niklas Cassel
@ 2025-12-31 11:39 ` Niklas Cassel
2026-01-02 1:10 ` Damien Le Moal
3 siblings, 1 reply; 18+ messages in thread
From: Niklas Cassel @ 2025-12-31 11:39 UTC (permalink / raw)
To: Damien Le Moal; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry
On Sat, Dec 20, 2025 at 09:21:40AM +0900, Damien Le Moal wrote:
> @@ -1702,6 +1779,17 @@ static int ata_scsi_defer(struct ata_port *ap, struct ata_queued_cmd *qc)
> if (!ret)
> return 0;
>
> + /*
> + * We must defer this qc: if this is not an NCQ command, keep this qc
> + * as a deferred one and wait for all on-going NCQ commands to complete
> + * before issuing it with the deferred qc work.
> + */
> + if (!ata_is_ncq(qc->tf.protocol)) {
> + ap->deferred_qc = qc;
> + return SCSI_MLQUEUE_DEVICE_BUSY;
Here you save the qc, and you return SCSI_MLQUEUE_DEVICE_BUSY;
SCSI_MLQUEUE_DEVICE_BUSY will cause the block layer to reinsert the request
in the queue, so you will get the same request sent to ata_scsi_translate()
again, even though you have save it. A little ugly IMO.
It seems like you are relying on the ata_scsi_qc_new() call will give you
the same *qc as the one you have stored...
> + }
> +
> + /* Use the SCSI layer to requeue and defer the command. */
> ata_qc_free(qc);
>
> switch (ret) {
> @@ -1777,8 +1865,11 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
> goto done;
>
> rc = ata_scsi_defer(ap, qc);
> - if (rc)
> + if (rc) {
> + if (qc == ap->deferred_qc)
> + return 0;
...and then you have this code here, to do nothing in that case.
But, are we guaranteed that we will always get the same qc?
Have you tried this both with and without mq-deadline?
And on both ahci and libsas HBAs?
Because AFAICT, for AHCI with mq-deadline, I don't think that rq->tag will
be kept, so you will probably get another *qc.
This is apparently because when using mq-deadline, q->elevator is non-NULL:
https://github.com/torvalds/linux/blob/v6.19-rc3/block/blk-mq.c#L742-L745
Thus RQF_SCHED_TAGS will be set.
When RQF_SCHED_TAGS is set, the rq->tag is not kept during requeue:
https://github.com/torvalds/linux/blob/v6.19-rc3/block/blk-mq.c#L427-L433
Perhaps it does work (only) for libsas HBAs though, because:
https://github.com/torvalds/linux/blob/v6.19-rc3/drivers/ata/libata-scsi.c#L752-L762
in that case, it seems like the tag is taken from scsi_cmd->budget_token
rather than from rq->tag.
> return rc;
> + }
>
> ata_qc_issue(qc);
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation
2025-12-31 11:17 ` Niklas Cassel
@ 2026-01-02 0:47 ` Damien Le Moal
2026-01-02 10:24 ` Niklas Cassel
0 siblings, 1 reply; 18+ messages in thread
From: Damien Le Moal @ 2026-01-02 0:47 UTC (permalink / raw)
To: Niklas Cassel; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry
On 12/31/25 20:17, Niklas Cassel wrote:
>> Implement such forward progress guarantee by limiting requeueing of
>> non-NCQ commands: when a non-NCQ command is received and NCQ commands
>> are in-flight, do not force a requeue of the non-NCQ command by
>> returning SCSI_MLQUEUE_XXX_BUSY in ata_scsi_translate() and instead
>> hold on to the qc using the new deferred_qc field of struct ata_port.
>>
>> This deferred qc will be issued using the work item deferred_qc_work
>> running the function ata_scsi_deferred_qc_work() once all in-flight
>> commands complete, which is checked with the port qc_defer() callback
>> indicating that no further delay is necessary. This check is done using
>> the helper function ata_scsi_schedule_deferred_qc() which is called from
>> ata_scsi_qc_complete(). This thus excludes this mechanism from all
>> internal non-NCQ commands issued by ATA EH.
>>
>> When a port deferred_qc is non NULL, that is, the port has a command
>> waiting for the device queue to drain, the issuing of all incoming
>> commands (both NCQ and non-NCQ) is deferred using the regular busy
>> mechanism. This simplifies the code and also avoids potential denial of
>> service problems if a user issues too many non-NCQ passthrough commands.
>
> Instead of introducing a workqueue, did you try/consider to do it the same
> way that we handle CDL commands that completed without error:
> https://github.com/torvalds/linux/blob/v6.19-rc3/drivers/ata/libata-core.c#L4971-L4988
>
> I.e., in qc_defer(), if we try to queue a non-NCQ command while there
> are NCQ commands active, set ATA_PFLAG_EH_PENDING so that we do not
> trigger fast drain (we wait for the active commands to finish),
> and then call ata_qc_schedule_eh() on the non-NCQ qc.
I thought about it but ruled it out so that we do not overload EH with handling
of non-NCQ commands. libata EH is already complicated enough :)
> Sure, for passthrough commands specifically, SCSI will not want to
> retry the command, because scsi_noretry_cmd() will return true:
> https://github.com/torvalds/linux/blob/v6.19-rc3/drivers/scsi/scsi_error.c#L2227
>
> But this could easily be solved by e.g. modifying scsi_noretry_cmd()
> to add an additional "case DID_REQUEUE: return false;"
>
> And also set set_host_byte(scmd, DID_REQUEUE); (probably based on a
> new ATA_QCFLAG_DEFER flag or simiar), after we have cleared DID_TIME_OUT
> using set_host_byte(scmd, DID_OK):
> https://github.com/torvalds/linux/blob/v6.19-rc3/drivers/ata/libata-eh.c#L640
You are kind of proving my point here: I feel that is all much more complicated
than using the libata internal deferred qc work. Also, involving the SCSI layer
about non-NCQ commands deferral is kind of a layering violation as the
non-queueable command concept does not exist in SCSI.
> Since this works for CDL commands, I don't see why we shouldn't be able to
> also send a non-NCQ command (if there are NCQ commands active), via SCSI
> EH, so that SCSI itself retries the command, rather than us holding/hiding
> a command from the SCSI queue, by using an additional workqueue in libata.
Sure, we could make that working, but as mentioned above, I feel that is a lot
more complicated.
> (If libata is just a SCSI LLDD, it somehow feels more logical to make use
> of the SCSI queue, rather than to add our own queueing in libata using a
> workqueue.)
libata must act as a SATL and the SAT specifications are clear about what a SATL
implementation can do for non-NCQ commands. Referring to SATA6r01:
6.2.5 Commands the SATL queues internally
If the translation of a SCSI command requires the SATL to send an ATA non-NCQ
command to the ATA device while one or more ATA commands are active, then the
SATL shall:
a) suspend processing of that SCSI command, and:
1) maintain the SCSI command in a task set;
2) resume processing of that SCSI command when the ATA device returns command
complete for all ATA commands the SATL has previously sent to the ATA device; and
3) after that SCSI command completes, resume processing of other SCSI commands;
b) return TASK SET FULL status for that SCSI command; or
c) return BUSY status for that SCSI command.
What you are proposing does not nicely fit. This patch essentially implements
option (a). Option (c) is the current code, which causes the issue for
multi-queue cases (note that AHCI is not affected because it is single queue).
And option (c) would be mostly equivalent to (b) and cause the same issue as
that would involve a requeue too.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation
2025-12-31 11:39 ` Niklas Cassel
@ 2026-01-02 1:10 ` Damien Le Moal
2026-01-02 9:46 ` Niklas Cassel
0 siblings, 1 reply; 18+ messages in thread
From: Damien Le Moal @ 2026-01-02 1:10 UTC (permalink / raw)
To: Niklas Cassel; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry
On 12/31/25 20:39, Niklas Cassel wrote:
> On Sat, Dec 20, 2025 at 09:21:40AM +0900, Damien Le Moal wrote:
>> @@ -1702,6 +1779,17 @@ static int ata_scsi_defer(struct ata_port *ap, struct ata_queued_cmd *qc)
>> if (!ret)
>> return 0;
>>
>> + /*
>> + * We must defer this qc: if this is not an NCQ command, keep this qc
>> + * as a deferred one and wait for all on-going NCQ commands to complete
>> + * before issuing it with the deferred qc work.
>> + */
>> + if (!ata_is_ncq(qc->tf.protocol)) {
>> + ap->deferred_qc = qc;
>> + return SCSI_MLQUEUE_DEVICE_BUSY;
>
> Here you save the qc, and you return SCSI_MLQUEUE_DEVICE_BUSY;
> SCSI_MLQUEUE_DEVICE_BUSY will cause the block layer to reinsert the request
> in the queue, so you will get the same request sent to ata_scsi_translate()
> again, even though you have save it. A little ugly IMO.
No it will not cause a requeue in this case because this change:
rc = ata_scsi_defer(ap, qc);
- if (rc)
+ if (rc) {
+ if (qc == ap->deferred_qc)
+ return 0;
ensures that we return "0", thus telling the scsi and block layer that the
command/request was accepted, but we do *not* call ata_qc_issue() for that qc.
It is deferred and will be issued by the deferred qc work once the queue drains.
> It seems like you are relying on the ata_scsi_qc_new() call will give you
> the same *qc as the one you have stored...
No because we will not see that request/command again since we told the scsi
layer that we accepted it (but did not issue it yet, but the scsi layer does not
need to know that).
>> rc = ata_scsi_defer(ap, qc);
>> - if (rc)
>> + if (rc) {
>> + if (qc == ap->deferred_qc)
>> + return 0;
>
> ...and then you have this code here, to do nothing in that case.
Yes, we do not issue the qc in this case. But the "return 0" tells the scsi
layer that we accepted the command, so as far as scsi is concerned, the command
was "issued".
> But, are we guaranteed that we will always get the same qc?
We will not see the same command again because we told the scsi layer we
accepted it with the above "return 0".
> Have you tried this both with and without mq-deadline?
> And on both ahci and libsas HBAs?
Yes to all.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 0/2] Prevent non-NCQ command starvation
2025-12-31 10:04 ` [PATCH v3 0/2] Prevent non-NCQ " Niklas Cassel
@ 2026-01-02 1:14 ` Damien Le Moal
0 siblings, 0 replies; 18+ messages in thread
From: Damien Le Moal @ 2026-01-02 1:14 UTC (permalink / raw)
To: Niklas Cassel; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry
On 12/31/25 19:04, Niklas Cassel wrote:
> On Sat, Dec 20, 2025 at 09:21:38AM +0900, Damien Le Moal wrote:
>> This small patch series addresses potential command starvation issues
>> with non-NCQ passthrough commands issued to a device accessed through
>
> Here and everywhere else:
> s/non-NCQ passthrough commands/non-NCQ commands/
>
> The problem is really not related to passthrough commands,
> but rather the mix between NCQ and non-NCQ commands.
>
> E.g. you can perform a ioctl with the REQ_OP_DISCARD command,
> for most ATA devices this is a SCSI WRITE SAME 16 CMD, which gets
> translated to either a NCQ or non-NCQ command depending on if the
> device supports NCQ encapsulated Data Set Management (DSM) commands,
> and specifically if the device has the QUEUED DATA SET MANAGEMENT
> SUPPORTS TRIM bit set or not, which is not a given.
>
> TL;DR: the starvation problem is not directly tied to non-NCQ
> passthrough commands, but any command, even commands that goes via
> the block layer, that happens to be translated to a non-NCQ command.
Yep. I corrected that.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation
2026-01-02 1:10 ` Damien Le Moal
@ 2026-01-02 9:46 ` Niklas Cassel
2026-01-02 11:01 ` Damien Le Moal
0 siblings, 1 reply; 18+ messages in thread
From: Niklas Cassel @ 2026-01-02 9:46 UTC (permalink / raw)
To: Damien Le Moal; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry
On Fri, Jan 02, 2026 at 10:10:44AM +0900, Damien Le Moal wrote:
> On 12/31/25 20:39, Niklas Cassel wrote:
> > On Sat, Dec 20, 2025 at 09:21:40AM +0900, Damien Le Moal wrote:
> >> @@ -1702,6 +1779,17 @@ static int ata_scsi_defer(struct ata_port *ap, struct ata_queued_cmd *qc)
> >> if (!ret)
> >> return 0;
> >>
> >> + /*
> >> + * We must defer this qc: if this is not an NCQ command, keep this qc
> >> + * as a deferred one and wait for all on-going NCQ commands to complete
> >> + * before issuing it with the deferred qc work.
> >> + */
> >> + if (!ata_is_ncq(qc->tf.protocol)) {
> >> + ap->deferred_qc = qc;
> >> + return SCSI_MLQUEUE_DEVICE_BUSY;
> >
> > Here you save the qc, and you return SCSI_MLQUEUE_DEVICE_BUSY;
> > SCSI_MLQUEUE_DEVICE_BUSY will cause the block layer to reinsert the request
> > in the queue, so you will get the same request sent to ata_scsi_translate()
> > again, even though you have save it. A little ugly IMO.
>
> No it will not cause a requeue in this case because this change:
>
> rc = ata_scsi_defer(ap, qc);
> - if (rc)
> + if (rc) {
> + if (qc == ap->deferred_qc)
> + return 0;
>
> ensures that we return "0", thus telling the scsi and block layer that the
> command/request was accepted, but we do *not* call ata_qc_issue() for that qc.
> It is deferred and will be issued by the deferred qc work once the queue drains.
I see how I got confused.
In the case where we store the deferred QC, you overload the return value.
But the original return value did not make sense (SCSI_MLQUEUE_DEVICE_BUSY),
as it means ask upper layer to requeue. I suggest that you let ata_scsi_defer()
return some other value in this case.
ATA_DEFER_ACTION_CMD_STORED or something.
Then at the call site:
rc = ata_scsi_defer(ap, qc);
if (rc) {
if (rc == ATA_DEFER_ACTION_CMD_STORED)
return 0;
return rc;
}
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation
2026-01-02 0:47 ` Damien Le Moal
@ 2026-01-02 10:24 ` Niklas Cassel
2026-01-02 11:00 ` Damien Le Moal
0 siblings, 1 reply; 18+ messages in thread
From: Niklas Cassel @ 2026-01-02 10:24 UTC (permalink / raw)
To: Damien Le Moal; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry
On Fri, Jan 02, 2026 at 09:47:57AM +0900, Damien Le Moal wrote:
> On 12/31/25 20:17, Niklas Cassel wrote:
>
> You are kind of proving my point here: I feel that is all much more complicated
> than using the libata internal deferred qc work. Also, involving the SCSI layer
> about non-NCQ commands deferral is kind of a layering violation as the
> non-queueable command concept does not exist in SCSI.
Personally, I think my suggestion is much less complicated, as we do not need
to think about when to clear ap->deferred_qc or when to stop the workqueue,
during the special cases like when a reset occurs, or when a NCQ command fails.
I'm a little bit worried that there are some other special cases that we might
be overlooking.
Reusing the method we already use for CDL commands would completely sidestep
these issues, as we are putting the command in the SCSI EH workqueue, which
will requeue the command for us.
I do not see it as a layering violation, as we simply wait for the queue to
drain, then let SCSI EH retry/insert the failed/deferred non-NCQ command in
the normal SCSI queue.
The non-NCQ should be the only command that is failed, but even if there was
a NCQ command that failed as well, we will retry scmd->allowed times anyway,
so the non-NCQ command should eventually be the only command that is retried.
> libata must act as a SATL and the SAT specifications are clear about what a SATL
> implementation can do for non-NCQ commands. Referring to SATA6r01:
>
> 6.2.5 Commands the SATL queues internally
> If the translation of a SCSI command requires the SATL to send an ATA non-NCQ
> command to the ATA device while one or more ATA commands are active, then the
> SATL shall:
> a) suspend processing of that SCSI command, and:
> 1) maintain the SCSI command in a task set;
> 2) resume processing of that SCSI command when the ATA device returns command
> complete for all ATA commands the SATL has previously sent to the ATA device; and
> 3) after that SCSI command completes, resume processing of other SCSI commands;
> b) return TASK SET FULL status for that SCSI command; or
> c) return BUSY status for that SCSI command.
>
> What you are proposing does not nicely fit. This patch essentially implements
> option (a). Option (c) is the current code, which causes the issue for
> multi-queue cases (note that AHCI is not affected because it is single queue).
> And option (c) would be mostly equivalent to (b) and cause the same issue as
> that would involve a requeue too.
I think my suggestion would be a)
since by triggering EH, we wait for all commands to complete,
then resume processing of the non-NCQ command.
That said, I know that you had plans to change the processing of CDL commands
to use a workqueue instead of EH, so in the grand scheme of things, if we will
migrate to use a workqueue instead of EH for CDL, perhaps it makes sense if we
also use a workqueue for a deferred non-NCQ command.
Will we be able to use the same workqueue, or will we soon have two different
workqueues?
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation
2026-01-02 10:24 ` Niklas Cassel
@ 2026-01-02 11:00 ` Damien Le Moal
0 siblings, 0 replies; 18+ messages in thread
From: Damien Le Moal @ 2026-01-02 11:00 UTC (permalink / raw)
To: Niklas Cassel; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry
On 1/2/26 19:24, Niklas Cassel wrote:
> On Fri, Jan 02, 2026 at 09:47:57AM +0900, Damien Le Moal wrote:
>> On 12/31/25 20:17, Niklas Cassel wrote:
>>
>> You are kind of proving my point here: I feel that is all much more complicated
>> than using the libata internal deferred qc work. Also, involving the SCSI layer
>> about non-NCQ commands deferral is kind of a layering violation as the
>> non-queueable command concept does not exist in SCSI.
>
> Personally, I think my suggestion is much less complicated, as we do not need
> to think about when to clear ap->deferred_qc or when to stop the workqueue,
> during the special cases like when a reset occurs, or when a NCQ command fails.
> I'm a little bit worried that there are some other special cases that we might
> be overlooking.
>
> Reusing the method we already use for CDL commands would completely sidestep
> these issues, as we are putting the command in the SCSI EH workqueue, which
> will requeue the command for us.
>
> I do not see it as a layering violation, as we simply wait for the queue to
> drain, then let SCSI EH retry/insert the failed/deferred non-NCQ command in
> the normal SCSI queue.
>
> The non-NCQ should be the only command that is failed, but even if there was
> a NCQ command that failed as well, we will retry scmd->allowed times anyway,
> so the non-NCQ command should eventually be the only command that is retried.
>
>
>> libata must act as a SATL and the SAT specifications are clear about what a SATL
>> implementation can do for non-NCQ commands. Referring to SATA6r01:
>>
>> 6.2.5 Commands the SATL queues internally
>> If the translation of a SCSI command requires the SATL to send an ATA non-NCQ
>> command to the ATA device while one or more ATA commands are active, then the
>> SATL shall:
>> a) suspend processing of that SCSI command, and:
>> 1) maintain the SCSI command in a task set;
>> 2) resume processing of that SCSI command when the ATA device returns command
>> complete for all ATA commands the SATL has previously sent to the ATA device; and
>> 3) after that SCSI command completes, resume processing of other SCSI commands;
>> b) return TASK SET FULL status for that SCSI command; or
>> c) return BUSY status for that SCSI command.
>>
>> What you are proposing does not nicely fit. This patch essentially implements
>> option (a). Option (c) is the current code, which causes the issue for
>> multi-queue cases (note that AHCI is not affected because it is single queue).
>> And option (c) would be mostly equivalent to (b) and cause the same issue as
>> that would involve a requeue too.
>
> I think my suggestion would be a)
> since by triggering EH, we wait for all commands to complete,
> then resume processing of the non-NCQ command.
>
> That said, I know that you had plans to change the processing of CDL commands
> to use a workqueue instead of EH, so in the grand scheme of things, if we will
> migrate to use a workqueue instead of EH for CDL, perhaps it makes sense if we
> also use a workqueue for a deferred non-NCQ command.
>
> Will we be able to use the same workqueue, or will we soon have two different
> workqueues?
I can try reusing. Let's see how that looks.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation
2026-01-02 9:46 ` Niklas Cassel
@ 2026-01-02 11:01 ` Damien Le Moal
0 siblings, 0 replies; 18+ messages in thread
From: Damien Le Moal @ 2026-01-02 11:01 UTC (permalink / raw)
To: Niklas Cassel; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry
On 1/2/26 18:46, Niklas Cassel wrote:
> On Fri, Jan 02, 2026 at 10:10:44AM +0900, Damien Le Moal wrote:
>> On 12/31/25 20:39, Niklas Cassel wrote:
>>> On Sat, Dec 20, 2025 at 09:21:40AM +0900, Damien Le Moal wrote:
>>>> @@ -1702,6 +1779,17 @@ static int ata_scsi_defer(struct ata_port *ap, struct ata_queued_cmd *qc)
>>>> if (!ret)
>>>> return 0;
>>>>
>>>> + /*
>>>> + * We must defer this qc: if this is not an NCQ command, keep this qc
>>>> + * as a deferred one and wait for all on-going NCQ commands to complete
>>>> + * before issuing it with the deferred qc work.
>>>> + */
>>>> + if (!ata_is_ncq(qc->tf.protocol)) {
>>>> + ap->deferred_qc = qc;
>>>> + return SCSI_MLQUEUE_DEVICE_BUSY;
>>>
>>> Here you save the qc, and you return SCSI_MLQUEUE_DEVICE_BUSY;
>>> SCSI_MLQUEUE_DEVICE_BUSY will cause the block layer to reinsert the request
>>> in the queue, so you will get the same request sent to ata_scsi_translate()
>>> again, even though you have save it. A little ugly IMO.
>>
>> No it will not cause a requeue in this case because this change:
>>
>> rc = ata_scsi_defer(ap, qc);
>> - if (rc)
>> + if (rc) {
>> + if (qc == ap->deferred_qc)
>> + return 0;
>>
>> ensures that we return "0", thus telling the scsi and block layer that the
>> command/request was accepted, but we do *not* call ata_qc_issue() for that qc.
>> It is deferred and will be issued by the deferred qc work once the queue drains.
>
> I see how I got confused.
>
> In the case where we store the deferred QC, you overload the return value.
> But the original return value did not make sense (SCSI_MLQUEUE_DEVICE_BUSY),
> as it means ask upper layer to requeue. I suggest that you let ata_scsi_defer()
> return some other value in this case.
>
> ATA_DEFER_ACTION_CMD_STORED or something.
>
> Then at the call site:
>
> rc = ata_scsi_defer(ap, qc);
> if (rc) {
> if (rc == ATA_DEFER_ACTION_CMD_STORED)
> return 0;
> return rc;
> }
OK. Will send v5 with that.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2026-01-02 11:01 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-20 0:21 [PATCH v3 0/2] Prevent non-NCQ command starvation Damien Le Moal
2025-12-20 0:21 ` [PATCH v3 1/2] ata: libata-scsi: refactor ata_scsi_translate() Damien Le Moal
2025-12-22 21:19 ` Igor Pylypiv
2025-12-30 13:40 ` Niklas Cassel
2025-12-20 0:21 ` [PATCH v3 2/2] ata: libata-scsi: avoid passthrough command starvation Damien Le Moal
2025-12-22 21:28 ` Igor Pylypiv
2025-12-28 2:25 ` Damien Le Moal
2025-12-23 11:17 ` yangxingui
2025-12-31 11:17 ` Niklas Cassel
2026-01-02 0:47 ` Damien Le Moal
2026-01-02 10:24 ` Niklas Cassel
2026-01-02 11:00 ` Damien Le Moal
2025-12-31 11:39 ` Niklas Cassel
2026-01-02 1:10 ` Damien Le Moal
2026-01-02 9:46 ` Niklas Cassel
2026-01-02 11:01 ` Damien Le Moal
2025-12-31 10:04 ` [PATCH v3 0/2] Prevent non-NCQ " Niklas Cassel
2026-01-02 1:14 ` Damien Le Moal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox