public inbox for linux-ide@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/2] Prevent non-NCQ command starvation
@ 2026-01-04  8:22 Damien Le Moal
  2026-01-04  8:22 ` [PATCH v5 1/2] ata: libata-scsi: refactor ata_scsi_translate() Damien Le Moal
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Damien Le Moal @ 2026-01-04  8:22 UTC (permalink / raw)
  To: linux-ide, Niklas Cassel; +Cc: Igor Pylypiv, Xingui Yang, John Garry

This small patch series addresses potential command starvation issues
with non-NCQ commands issued to a device accessed through a multi-queue
libsas HBA. In such setup, a non-NCQ command issued to a device may
suffer long delays and even complete starvation if the device is under
a constant workload of NCQ commands issued from multiple queues.

This issue is avoided by limiting requeing request through the block
layer and instead explicitly waiting for the device command queue to
drain before issuing the non-NCQ command. This is done reliably by not
trying to second guess ata EH in case of error or device reset, by
always requeuing a deferred command whenever EH is scheduled for the
device port.

When a non-ncq command is deferred waiting for the device queue to
drain, all new incoming commands are always requeued through the regular
busy mechanism, thus avoiding the need to manage an internal command
queue.

With this, the long latencies observed when executing non-NCQ commands
under a heavy workload of NCQ commands at high queue depth is resolved:
the non-NCQ command latency only depends on how long it takes for all
in-flight NCQ commands to complete.

Changes from v4:
 - Changed patch 1 refactoring to create the helper function
   ata_scsi_qc_issue() to handle both deferral and issuing of commands,
   instead of the formet ata_scsi_qc_defer(9 helper which was handling
   only command deferral. this simplifies patch 2 by avoiding needing
   modifications in ata_scsi_translate()

Changes from v3:
 - Added review and test tags
 - Improved patch 2 commit message
 - Added a comment in ata_scsi_translate() to clarify the handling of
   ata_scsi_defer() return value.

Changes from v2:
 - Improved comment about ata_scsi_qc_new() behavior in patch 1
   (suggested by Garry).
 - Removed some lockdep annotations from patch 2 as they have little
   value.
 - In patch 2, change ata_scsi_requeue_deferred_qc() and
   ata_scsi_schedule_deferred_qc() to return early as suggested by
   Garry.
 - Fixed comments typo in patch 2.

Changes from v1:
 - Fixed ac leak in ata_scsi_defer() in patch 2.
 - Added Hannes review tag

Damien Le Moal (2):
  ata: libata-scsi: refactor ata_scsi_translate()
  ata: libata-scsi: avoid Non-NCQ command starvation

 drivers/ata/libata-core.c |   5 ++
 drivers/ata/libata-eh.c   |   6 ++
 drivers/ata/libata-scsi.c | 170 +++++++++++++++++++++++++++++++-------
 drivers/ata/libata.h      |   2 +
 include/linux/libata.h    |   3 +
 5 files changed, 155 insertions(+), 31 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v5 1/2] ata: libata-scsi: refactor ata_scsi_translate()
  2026-01-04  8:22 [PATCH v5 0/2] Prevent non-NCQ command starvation Damien Le Moal
@ 2026-01-04  8:22 ` Damien Le Moal
  2026-01-04 17:05   ` Niklas Cassel
                     ` (2 more replies)
  2026-01-04  8:22 ` [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation Damien Le Moal
  2026-01-04 21:49 ` [PATCH v5 0/2] Prevent non-NCQ " Martin K. Petersen
  2 siblings, 3 replies; 11+ messages in thread
From: Damien Le Moal @ 2026-01-04  8:22 UTC (permalink / raw)
  To: linux-ide, Niklas Cassel; +Cc: Igor Pylypiv, Xingui Yang, John Garry

Factor out of ata_scsi_translate() the code handling queued command
deferral using the port qc_defer callback and inssuing the queued
command with ata_qc_issue() into the new function ata_scsi_qc_issue(),
and simplify the goto used in ata_scsi_translate().
While at it, also add a lockdep annotation to check that the port lock
is held when ata_scsi_translate() is called.

No functional changes.

Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-scsi.c | 81 ++++++++++++++++++++++++---------------
 1 file changed, 50 insertions(+), 31 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 721d3f270c8e..5b6b5f1ff3c7 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1691,6 +1691,42 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
 	ata_qc_done(qc);
 }
 
+static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
+{
+	int ret;
+
+	if (!ap->ops->qc_defer)
+		goto issue;
+
+	/* Check if the command needs to be deferred. */
+	ret = ap->ops->qc_defer(qc);
+	switch (ret) {
+	case 0:
+		break;
+	case ATA_DEFER_LINK:
+		ret = SCSI_MLQUEUE_DEVICE_BUSY;
+		break;
+	case ATA_DEFER_PORT:
+		ret = SCSI_MLQUEUE_HOST_BUSY;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		ret = SCSI_MLQUEUE_HOST_BUSY;
+		break;
+	}
+
+	if (ret) {
+		/* Force a requeue of the command to defer its execution. */
+		ata_qc_free(qc);
+		return ret;
+	}
+
+issue:
+	ata_qc_issue(qc);
+
+	return 0;
+}
+
 /**
  *	ata_scsi_translate - Translate then issue SCSI command to ATA device
  *	@dev: ATA device to which the command is addressed
@@ -1714,66 +1750,49 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
  *	spin_lock_irqsave(host lock)
  *
  *	RETURNS:
- *	0 on success, SCSI_ML_QUEUE_DEVICE_BUSY if the command
- *	needs to be deferred.
+ *	0 on success, SCSI_ML_QUEUE_DEVICE_BUSY or SCSI_MLQUEUE_HOST_BUSY if the
+ *	command needs to be deferred.
  */
 static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
 			      ata_xlat_func_t xlat_func)
 {
 	struct ata_port *ap = dev->link->ap;
 	struct ata_queued_cmd *qc;
-	int rc;
 
+	lockdep_assert_held(ap->lock);
+
+	/*
+	 * ata_scsi_qc_new() calls scsi_done(cmd) in case of failure. So we
+	 * have nothing further to do when allocating a qc fails.
+	 */
 	qc = ata_scsi_qc_new(dev, cmd);
 	if (!qc)
-		goto err_mem;
+		return 0;
 
 	/* data is present; dma-map it */
 	if (cmd->sc_data_direction == DMA_FROM_DEVICE ||
 	    cmd->sc_data_direction == DMA_TO_DEVICE) {
 		if (unlikely(scsi_bufflen(cmd) < 1)) {
 			ata_dev_warn(dev, "WARNING: zero len r/w req\n");
-			goto err_did;
+			cmd->result = (DID_ERROR << 16);
+			goto done;
 		}
 
 		ata_sg_init(qc, scsi_sglist(cmd), scsi_sg_count(cmd));
-
 		qc->dma_dir = cmd->sc_data_direction;
 	}
 
 	qc->complete_fn = ata_scsi_qc_complete;
 
 	if (xlat_func(qc))
-		goto early_finish;
-
-	if (ap->ops->qc_defer) {
-		if ((rc = ap->ops->qc_defer(qc)))
-			goto defer;
-	}
-
-	/* select device, send command to hardware */
-	ata_qc_issue(qc);
+		goto done;
 
-	return 0;
-
-early_finish:
-	ata_qc_free(qc);
-	scsi_done(cmd);
-	return 0;
+	return ata_scsi_qc_issue(ap, qc);
 
-err_did:
+done:
 	ata_qc_free(qc);
-	cmd->result = (DID_ERROR << 16);
 	scsi_done(cmd);
-err_mem:
 	return 0;
-
-defer:
-	ata_qc_free(qc);
-	if (rc == ATA_DEFER_LINK)
-		return SCSI_MLQUEUE_DEVICE_BUSY;
-	else
-		return SCSI_MLQUEUE_HOST_BUSY;
 }
 
 /**
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation
  2026-01-04  8:22 [PATCH v5 0/2] Prevent non-NCQ command starvation Damien Le Moal
  2026-01-04  8:22 ` [PATCH v5 1/2] ata: libata-scsi: refactor ata_scsi_translate() Damien Le Moal
@ 2026-01-04  8:22 ` Damien Le Moal
  2026-01-04 17:16   ` Niklas Cassel
                     ` (3 more replies)
  2026-01-04 21:49 ` [PATCH v5 0/2] Prevent non-NCQ " Martin K. Petersen
  2 siblings, 4 replies; 11+ messages in thread
From: Damien Le Moal @ 2026-01-04  8:22 UTC (permalink / raw)
  To: linux-ide, Niklas Cassel; +Cc: Igor Pylypiv, Xingui Yang, John Garry

When a non-NCQ command is issued while NCQ commands are being executed,
ata_scsi_qc_issue() indicates to the SCSI layer that the command issuing
should be deferred by returning SCSI_MLQUEUE_XXX_BUSY.  This command
deferring is correct and as mandated by the ACS specifications since
NCQ and non-NCQ commands cannot be mixed.

However, in the case of a host adapter using multiple submission queues,
when the target device is under a constant load of NCQ commands, there
are no guarantees that requeueing the non-NCQ command will be executed
later and it may be deferred again repeatedly as other submission queues
can constantly issue NCQ commands from different CPUs ahead of the
non-NCQ command. This can lead to very long delays for the execution of
non-NCQ commands, and even complete starvation for these commands in the
worst case scenario.

Since the block layer and the SCSI layer do not distinguish between
queueable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT
implementation must ensure forward progress for non-NCQ commands in the
presence of NCQ command traffic. This is similar to what SAS HBAs with a
hardware/firmware based SAT implementation do.

Implement such forward progress guarantee by limiting requeueing of
non-NCQ commands from ata_scsi_qc_issue(): when a non-NCQ command is
received and NCQ commands are in-flight, do not force a requeue of the
non-NCQ command by returning SCSI_MLQUEUE_XXX_BUSY and instead return 0
to indicate that the command was accepted but hold on to the qc using
the new deferred_qc field of struct ata_port.

This deferred qc will be issued using the work item deferred_qc_work
running the function ata_scsi_deferred_qc_work() once all in-flight
commands complete, which is checked with the port qc_defer() callback
return value indicating that no further delay is necessary. This check
is done using the helper function ata_scsi_schedule_deferred_qc() which
is called from ata_scsi_qc_complete(). This thus excludes this mechanism
from all internal non-NCQ commands issued by ATA EH.

When a port deferred_qc is non NULL, that is, the port has a command
waiting for the device queue to drain, the issuing of all incoming
commands (both NCQ and non-NCQ) is deferred using the regular busy
mechanism. This simplifies the code and also avoids potential denial of
service problems if a user issues too many non-NCQ commands.

Finally, whenever ata EH is scheduled, regardless of the reason, a
deferred qc is always requeued so that it can be retried once EH
completes. This is done by calling the function
ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids
the need for any special processing for the deferred qc in case of NCQ
error, link or device reset, or device timeout.

Reported-by: Xingui Yang <yangxingui@huawei.com>
Reported-by: Igor Pylypiv <ipylypiv@google.com>
Fixes: 42f22fe36d51 ("scsi: pm8001: Expose hardware queues for pm80xx")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c |  5 +++
 drivers/ata/libata-eh.c   |  6 +++
 drivers/ata/libata-scsi.c | 89 +++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata.h      |  2 +
 include/linux/libata.h    |  3 ++
 5 files changed, 105 insertions(+)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 09d8c035fcdf..447d8dc666a4 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5561,6 +5561,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
 	mutex_init(&ap->scsi_scan_mutex);
 	INIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug);
 	INIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
+	INIT_WORK(&ap->deferred_qc_work, ata_scsi_deferred_qc_work);
 	INIT_LIST_HEAD(&ap->eh_done_q);
 	init_waitqueue_head(&ap->eh_wait_q);
 	init_completion(&ap->park_req_pending);
@@ -6173,6 +6174,10 @@ static void ata_port_detach(struct ata_port *ap)
 		}
 	}
 
+	/* Make sure the deferred qc work finished. */
+	cancel_work_sync(&ap->deferred_qc_work);
+	WARN_ON(ap->deferred_qc);
+
 	/* Tell EH to disable all devices */
 	ap->pflags |= ATA_PFLAG_UNLOADING;
 	ata_port_schedule_eh(ap);
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 2586e77ebf45..b90b17f680f8 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -917,6 +917,12 @@ static void ata_eh_set_pending(struct ata_port *ap, bool fastdrain)
 
 	ap->pflags |= ATA_PFLAG_EH_PENDING;
 
+	/*
+	 * If we have a deferred qc, requeue it so that it is retried once EH
+	 * completes.
+	 */
+	ata_scsi_requeue_deferred_qc(ap);
+
 	if (!fastdrain)
 		return;
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 5b6b5f1ff3c7..4aeffd6a5640 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1658,8 +1658,73 @@ static void ata_qc_done(struct ata_queued_cmd *qc)
 	done(cmd);
 }
 
+void ata_scsi_deferred_qc_work(struct work_struct *work)
+{
+	struct ata_port *ap =
+		container_of(work, struct ata_port, deferred_qc_work);
+	struct ata_queued_cmd *qc;
+	unsigned long flags;
+
+	spin_lock_irqsave(ap->lock, flags);
+
+	/*
+	 * If we still have a deferred qc and we are not in EH, issue it. In
+	 * such case, we should not need any more deferring the qc, so warn if
+	 * qc_defer() says otherwise.
+	 */
+	qc = ap->deferred_qc;
+	if (qc && !ata_port_eh_scheduled(ap)) {
+		WARN_ON_ONCE(ap->ops->qc_defer(qc));
+		ap->deferred_qc = NULL;
+		ata_qc_issue(qc);
+	}
+
+	spin_unlock_irqrestore(ap->lock, flags);
+}
+
+void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
+{
+	struct ata_queued_cmd *qc = ap->deferred_qc;
+	struct scsi_cmnd *scmd;
+
+	/*
+	 * If we have a deferred qc when a reset occurs or NCQ commands fail,
+	 * do not try to be smart about what to do with this deferred command
+	 * and simply retry it by completing it with DID_SOFT_ERROR.
+	 */
+	if (!qc)
+		return;
+
+	scmd = qc->scsicmd;
+	ap->deferred_qc = NULL;
+	ata_qc_free(qc);
+	scmd->result = (DID_SOFT_ERROR << 16);
+	scsi_done(scmd);
+}
+
+static void ata_scsi_schedule_deferred_qc(struct ata_port *ap)
+{
+	struct ata_queued_cmd *qc = ap->deferred_qc;
+
+	/*
+	 * If we have a deferred qc, then qc_defer() is defined and we can use
+	 * this callback to determine if this qc is good to go, unless EH has
+	 * been scheduled.
+	 */
+	if (!qc)
+		return;
+
+	if (ata_port_eh_scheduled(ap)) {
+		ata_scsi_requeue_deferred_qc(ap);
+		return;
+	}
+	if (!ap->ops->qc_defer(qc))
+		queue_work(system_highpri_wq, &ap->deferred_qc_work);
+}
+
 static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
 {
+	struct ata_port *ap = qc->ap;
 	struct scsi_cmnd *cmd = qc->scsicmd;
 	u8 *cdb = cmd->cmnd;
 	bool have_sense = qc->flags & ATA_QCFLAG_SENSE_VALID;
@@ -1689,6 +1754,8 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
 	}
 
 	ata_qc_done(qc);
+
+	ata_scsi_schedule_deferred_qc(ap);
 }
 
 static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
@@ -1698,6 +1765,16 @@ static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
 	if (!ap->ops->qc_defer)
 		goto issue;
 
+	/*
+	 * If we already have a deferred qc, then rely on the SCSI layer to
+	 * requeue and defer all incoming commands until the deferred qc is
+	 * processed, once all on-going commands complete.
+	 */
+	if (ap->deferred_qc) {
+		ata_qc_free(qc);
+		return SCSI_MLQUEUE_DEVICE_BUSY;
+	}
+
 	/* Check if the command needs to be deferred. */
 	ret = ap->ops->qc_defer(qc);
 	switch (ret) {
@@ -1716,6 +1793,18 @@ static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
 	}
 
 	if (ret) {
+		/*
+		 * We must defer this qc: if this is not an NCQ command, keep
+		 * this qc as a deferred one and report to the SCSI layer that
+		 * we issued it so that it is not requeued. The deferred qc will
+		 * be issued with the port deferred_qc_work once all on-going
+		 * commands complete.
+		 */
+		if (!ata_is_ncq(qc->tf.protocol)) {
+			ap->deferred_qc = qc;
+			return 0;
+		}
+
 		/* Force a requeue of the command to defer its execution. */
 		ata_qc_free(qc);
 		return ret;
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 0e7ecac73680..60a675df61dc 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -165,6 +165,8 @@ void ata_scsi_sdev_config(struct scsi_device *sdev);
 int ata_scsi_dev_config(struct scsi_device *sdev, struct queue_limits *lim,
 		struct ata_device *dev);
 int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev);
+void ata_scsi_deferred_qc_work(struct work_struct *work);
+void ata_scsi_requeue_deferred_qc(struct ata_port *ap);
 
 /* libata-eh.c */
 extern unsigned int ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 39534fafa36a..c5b27d97dfaf 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -903,6 +903,9 @@ struct ata_port {
 	u64			qc_active;
 	int			nr_active_links; /* #links with active qcs */
 
+	struct work_struct	deferred_qc_work;
+	struct ata_queued_cmd	*deferred_qc;
+
 	struct ata_link		link;		/* host default link */
 	struct ata_link		*slave_link;	/* see ata_slave_link_init() */
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 1/2] ata: libata-scsi: refactor ata_scsi_translate()
  2026-01-04  8:22 ` [PATCH v5 1/2] ata: libata-scsi: refactor ata_scsi_translate() Damien Le Moal
@ 2026-01-04 17:05   ` Niklas Cassel
  2026-01-05 10:30   ` John Garry
  2026-01-05 23:26   ` Igor Pylypiv
  2 siblings, 0 replies; 11+ messages in thread
From: Niklas Cassel @ 2026-01-04 17:05 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry

On Sun, Jan 04, 2026 at 05:22:02PM +0900, Damien Le Moal wrote:
> Factor out of ata_scsi_translate() the code handling queued command
> deferral using the port qc_defer callback and inssuing the queued
> command with ata_qc_issue() into the new function ata_scsi_qc_issue(),
> and simplify the goto used in ata_scsi_translate().
> While at it, also add a lockdep annotation to check that the port lock
> is held when ata_scsi_translate() is called.
> 
> No functional changes.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

Reviewed-by: Niklas Cassel <cassel@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation
  2026-01-04  8:22 ` [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation Damien Le Moal
@ 2026-01-04 17:16   ` Niklas Cassel
  2026-01-05 10:51   ` John Garry
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Niklas Cassel @ 2026-01-04 17:16 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry

On Sun, Jan 04, 2026 at 05:22:03PM +0900, Damien Le Moal wrote:
> @@ -1658,8 +1658,73 @@ static void ata_qc_done(struct ata_queued_cmd *qc)
>  	done(cmd);
>  }
>  
> +void ata_scsi_deferred_qc_work(struct work_struct *work)
> +{
> +	struct ata_port *ap =
> +		container_of(work, struct ata_port, deferred_qc_work);
> +	struct ata_queued_cmd *qc;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(ap->lock, flags);
> +
> +	/*
> +	 * If we still have a deferred qc and we are not in EH, issue it. In
> +	 * such case, we should not need any more deferring the qc, so warn if
> +	 * qc_defer() says otherwise.
> +	 */
> +	qc = ap->deferred_qc;
> +	if (qc && !ata_port_eh_scheduled(ap)) {
> +		WARN_ON_ONCE(ap->ops->qc_defer(qc));
> +		ap->deferred_qc = NULL;
> +		ata_qc_issue(qc);
> +	}
> +
> +	spin_unlock_irqrestore(ap->lock, flags);
> +}
> +
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
> +{
> +	struct ata_queued_cmd *qc = ap->deferred_qc;
> +	struct scsi_cmnd *scmd;

Since ap->deferred_qc needs to be checked while holding the spinlock,
in order to synchronize with ata_scsi_deferred_qc_work(), please add a
lockdep_assert_held(ap->lock) here.

Yes, I know that you had multiple lockdep_assert_held() in V2.

But I think that a single lockdep_assert_held() in either 
ata_scsi_requeue_deferred_qc() or ata_scsi_qc_complete() is warranted
(and yes, it is a no-op on production systems (where lockdep is disabled)).

Other than that, looks good to me, no need for a resend, you canfixup when
applying:

Reviewed-by: Niklas Cassel <cassel@kernel.org>


> +
> +	/*
> +	 * If we have a deferred qc when a reset occurs or NCQ commands fail,
> +	 * do not try to be smart about what to do with this deferred command
> +	 * and simply retry it by completing it with DID_SOFT_ERROR.
> +	 */
> +	if (!qc)
> +		return;
> +
> +	scmd = qc->scsicmd;
> +	ap->deferred_qc = NULL;
> +	ata_qc_free(qc);
> +	scmd->result = (DID_SOFT_ERROR << 16);
> +	scsi_done(scmd);
> +}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 0/2] Prevent non-NCQ command starvation
  2026-01-04  8:22 [PATCH v5 0/2] Prevent non-NCQ command starvation Damien Le Moal
  2026-01-04  8:22 ` [PATCH v5 1/2] ata: libata-scsi: refactor ata_scsi_translate() Damien Le Moal
  2026-01-04  8:22 ` [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation Damien Le Moal
@ 2026-01-04 21:49 ` Martin K. Petersen
  2 siblings, 0 replies; 11+ messages in thread
From: Martin K. Petersen @ 2026-01-04 21:49 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, Niklas Cassel, Igor Pylypiv, Xingui Yang, John Garry


Damien,

> This small patch series addresses potential command starvation issues
> with non-NCQ commands issued to a device accessed through a
> multi-queue libsas HBA. In such setup, a non-NCQ command issued to a
> device may suffer long delays and even complete starvation if the
> device is under a constant workload of NCQ commands issued from
> multiple queues.

LGTM.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 1/2] ata: libata-scsi: refactor ata_scsi_translate()
  2026-01-04  8:22 ` [PATCH v5 1/2] ata: libata-scsi: refactor ata_scsi_translate() Damien Le Moal
  2026-01-04 17:05   ` Niklas Cassel
@ 2026-01-05 10:30   ` John Garry
  2026-01-05 23:26   ` Igor Pylypiv
  2 siblings, 0 replies; 11+ messages in thread
From: John Garry @ 2026-01-05 10:30 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide, Niklas Cassel; +Cc: Igor Pylypiv, Xingui Yang

On 04/01/2026 08:22, Damien Le Moal wrote:
> Factor out of ata_scsi_translate() the code handling queued command
> deferral using the port qc_defer callback and inssuing the queued
> command with ata_qc_issue() into the new function ata_scsi_qc_issue(),
> and simplify the goto used in ata_scsi_translate().
> While at it, also add a lockdep annotation to check that the port lock
> is held when ata_scsi_translate() is called.
> 
> No functional changes.
> 
> Cc:stable@vger.kernel.org
> Signed-off-by: Damien Le Moal<dlemoal@kernel.org>

Reviewed-by: John Garry <john.g.garry@oracle.com>

> ---
>   drivers/ata/libata-scsi.c | 81 ++++++++++++++++++++++++---------------
>   1 file changed, 50 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 721d3f270c8e..5b6b5f1ff3c7 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1691,6 +1691,42 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
>   	ata_qc_done(qc);
>   }
>   
> +static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
> +{
> +	int ret;
> +
> +	if (!ap->ops->qc_defer)
> +		goto issue;
> +
> +	/* Check if the command needs to be deferred. */

nit: I am not sure if there is much value in this comment

> +	ret = ap->ops->qc_defer(qc);
> +	switch (ret) {
> +	case 0:
> +		break;


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation
  2026-01-04  8:22 ` [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation Damien Le Moal
  2026-01-04 17:16   ` Niklas Cassel
@ 2026-01-05 10:51   ` John Garry
  2026-01-06 21:03   ` Igor Pylypiv
  2026-01-14  3:16   ` yangxingui
  3 siblings, 0 replies; 11+ messages in thread
From: John Garry @ 2026-01-05 10:51 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide, Niklas Cassel; +Cc: Igor Pylypiv, Xingui Yang

On 04/01/2026 08:22, Damien Le Moal wrote:
> When a non-NCQ command is issued while NCQ commands are being executed,
> ata_scsi_qc_issue() indicates to the SCSI layer that the command issuing
> should be deferred by returning SCSI_MLQUEUE_XXX_BUSY.  This command
> deferring is correct and as mandated by the ACS specifications since
> NCQ and non-NCQ commands cannot be mixed.
> 
> However, in the case of a host adapter using multiple submission queues,
> when the target device is under a constant load of NCQ commands, there
> are no guarantees that requeueing the non-NCQ command will be executed
> later and it may be deferred again repeatedly as other submission queues
> can constantly issue NCQ commands from different CPUs ahead of the
> non-NCQ command. This can lead to very long delays for the execution of
> non-NCQ commands, and even complete starvation for these commands in the
> worst case scenario.
> 
> Since the block layer and the SCSI layer do not distinguish between
> queueable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT
> implementation must ensure forward progress for non-NCQ commands in the
> presence of NCQ command traffic. This is similar to what SAS HBAs with a
> hardware/firmware based SAT implementation do.
> 
> Implement such forward progress guarantee by limiting requeueing of
> non-NCQ commands from ata_scsi_qc_issue(): when a non-NCQ command is
> received and NCQ commands are in-flight, do not force a requeue of the
> non-NCQ command by returning SCSI_MLQUEUE_XXX_BUSY and instead return 0
> to indicate that the command was accepted but hold on to the qc using
> the new deferred_qc field of struct ata_port.
> 
> This deferred qc will be issued using the work item deferred_qc_work
> running the function ata_scsi_deferred_qc_work() once all in-flight
> commands complete, which is checked with the port qc_defer() callback
> return value indicating that no further delay is necessary. This check
> is done using the helper function ata_scsi_schedule_deferred_qc() which
> is called from ata_scsi_qc_complete(). This thus excludes this mechanism
> from all internal non-NCQ commands issued by ATA EH.
> 
> When a port deferred_qc is non NULL, that is, the port has a command
> waiting for the device queue to drain, the issuing of all incoming
> commands (both NCQ and non-NCQ) is deferred using the regular busy
> mechanism. This simplifies the code and also avoids potential denial of
> service problems if a user issues too many non-NCQ commands.
> 
> Finally, whenever ata EH is scheduled, regardless of the reason, a
> deferred qc is always requeued so that it can be retried once EH
> completes. 
> This is done by calling the function
> ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids
> the need for any special processing for the deferred qc in case of NCQ
> error, link or device reset, or device timeout.
> 
> Reported-by: Xingui Yang <yangxingui@huawei.com>
> Reported-by: Igor Pylypiv <ipylypiv@google.com>
> Fixes: 42f22fe36d51 ("scsi: pm8001: Expose hardware queues for pm80xx")

I'd say that this is more related to the following:

commit bdb01301f3ea51a59eff252b06643fc1fe843e57
Author: Hannes Reinecke <hare@suse.com>
Date:   Wed Aug 19 23:20:30 2020 +0800

     scsi: Add host and host template flag 'host_tagset'

That is when we could expose multiple queues.

> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---

Generally looks ok, but a couple of comments remain for me, so:

Reviewed-by: John Garry <john.g.garry@oracle.com>

>   drivers/ata/libata-core.c |  5 +++
>   drivers/ata/libata-eh.c   |  6 +++
>   drivers/ata/libata-scsi.c | 89 +++++++++++++++++++++++++++++++++++++++
>   drivers/ata/libata.h      |  2 +
>   include/linux/libata.h    |  3 ++
>   5 files changed, 105 insertions(+)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 09d8c035fcdf..447d8dc666a4 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5561,6 +5561,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
>   	mutex_init(&ap->scsi_scan_mutex);
>   	INIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug);
>   	INIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
> +	INIT_WORK(&ap->deferred_qc_work, ata_scsi_deferred_qc_work);
>   	INIT_LIST_HEAD(&ap->eh_done_q);
>   	init_waitqueue_head(&ap->eh_wait_q);
>   	init_completion(&ap->park_req_pending);
> @@ -6173,6 +6174,10 @@ static void ata_port_detach(struct ata_port *ap)
>   		}
>   	}
>   
> +	/* Make sure the deferred qc work finished. */
> +	cancel_work_sync(&ap->deferred_qc_work);
> +	WARN_ON(ap->deferred_qc);
> +
>   	/* Tell EH to disable all devices */
>   	ap->pflags |= ATA_PFLAG_UNLOADING;
>   	ata_port_schedule_eh(ap);
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 2586e77ebf45..b90b17f680f8 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -917,6 +917,12 @@ static void ata_eh_set_pending(struct ata_port *ap, bool fastdrain)
>   
>   	ap->pflags |= ATA_PFLAG_EH_PENDING;
>   
> +	/*
> +	 * If we have a deferred qc, requeue it so that it is retried once EH
> +	 * completes.
> +	 */
> +	ata_scsi_requeue_deferred_qc(ap);
> +
>   	if (!fastdrain)
>   		return;
>   
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 5b6b5f1ff3c7..4aeffd6a5640 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1658,8 +1658,73 @@ static void ata_qc_done(struct ata_queued_cmd *qc)
>   	done(cmd);
>   }
>   
> +void ata_scsi_deferred_qc_work(struct work_struct *work)
> +{
> +	struct ata_port *ap =
> +		container_of(work, struct ata_port, deferred_qc_work);
> +	struct ata_queued_cmd *qc;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(ap->lock, flags);
> +
> +	/*
> +	 * If we still have a deferred qc and we are not in EH, issue it. In
> +	 * such case, we should not need any more deferring the qc, so warn if
> +	 * qc_defer() says otherwise.
> +	 */
> +	qc = ap->deferred_qc;
> +	if (qc && !ata_port_eh_scheduled(ap)) {
> +		WARN_ON_ONCE(ap->ops->qc_defer(qc));
> +		ap->deferred_qc = NULL;
> +		ata_qc_issue(qc);
> +	}
> +
> +	spin_unlock_irqrestore(ap->lock, flags);
> +}
> +
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
> +{
> +	struct ata_queued_cmd *qc = ap->deferred_qc;
> +	struct scsi_cmnd *scmd;
> +
> +	/*
> +	 * If we have a deferred qc when a reset occurs or NCQ commands fail,
> +	 * do not try to be smart about what to do with this deferred command
> +	 * and simply retry it by completing it with DID_SOFT_ERROR.
> +	 */
> +	if (!qc)
> +		return;
> +
> +	scmd = qc->scsicmd;
> +	ap->deferred_qc = NULL;
> +	ata_qc_free(qc);
> +	scmd->result = (DID_SOFT_ERROR << 16);
> +	scsi_done(scmd);
> +}
> +
> +static void ata_scsi_schedule_deferred_qc(struct ata_port *ap)
> +{
> +	struct ata_queued_cmd *qc = ap->deferred_qc;

Can we assert that the host lock is held here?

It seems that the flags which we set in ata_port_eh_scheduled() are only 
set with the lock held, so we should ensure that it is held when we check.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 1/2] ata: libata-scsi: refactor ata_scsi_translate()
  2026-01-04  8:22 ` [PATCH v5 1/2] ata: libata-scsi: refactor ata_scsi_translate() Damien Le Moal
  2026-01-04 17:05   ` Niklas Cassel
  2026-01-05 10:30   ` John Garry
@ 2026-01-05 23:26   ` Igor Pylypiv
  2 siblings, 0 replies; 11+ messages in thread
From: Igor Pylypiv @ 2026-01-05 23:26 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-ide, Niklas Cassel, Xingui Yang, John Garry

On Sun, Jan 04, 2026 at 05:22:02PM +0900, Damien Le Moal wrote:
> Factor out of ata_scsi_translate() the code handling queued command
> deferral using the port qc_defer callback and inssuing the queued

s/inssuing/issuing/

> command with ata_qc_issue() into the new function ata_scsi_qc_issue(),
> and simplify the goto used in ata_scsi_translate().
> While at it, also add a lockdep annotation to check that the port lock
> is held when ata_scsi_translate() is called.
> 
> No functional changes.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

Reviewed-by: Igor Pylypiv <ipylypiv@google.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation
  2026-01-04  8:22 ` [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation Damien Le Moal
  2026-01-04 17:16   ` Niklas Cassel
  2026-01-05 10:51   ` John Garry
@ 2026-01-06 21:03   ` Igor Pylypiv
  2026-01-14  3:16   ` yangxingui
  3 siblings, 0 replies; 11+ messages in thread
From: Igor Pylypiv @ 2026-01-06 21:03 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-ide, Niklas Cassel, Xingui Yang, John Garry

On Sun, Jan 04, 2026 at 05:22:03PM +0900, Damien Le Moal wrote:
> When a non-NCQ command is issued while NCQ commands are being executed,
> ata_scsi_qc_issue() indicates to the SCSI layer that the command issuing
> should be deferred by returning SCSI_MLQUEUE_XXX_BUSY.  This command
> deferring is correct and as mandated by the ACS specifications since
> NCQ and non-NCQ commands cannot be mixed.
> 
> However, in the case of a host adapter using multiple submission queues,
> when the target device is under a constant load of NCQ commands, there
> are no guarantees that requeueing the non-NCQ command will be executed
> later and it may be deferred again repeatedly as other submission queues
> can constantly issue NCQ commands from different CPUs ahead of the
> non-NCQ command. This can lead to very long delays for the execution of
> non-NCQ commands, and even complete starvation for these commands in the
> worst case scenario.
> 
> Since the block layer and the SCSI layer do not distinguish between
> queueable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT
> implementation must ensure forward progress for non-NCQ commands in the
> presence of NCQ command traffic. This is similar to what SAS HBAs with a
> hardware/firmware based SAT implementation do.
> 
> Implement such forward progress guarantee by limiting requeueing of
> non-NCQ commands from ata_scsi_qc_issue(): when a non-NCQ command is
> received and NCQ commands are in-flight, do not force a requeue of the
> non-NCQ command by returning SCSI_MLQUEUE_XXX_BUSY and instead return 0
> to indicate that the command was accepted but hold on to the qc using
> the new deferred_qc field of struct ata_port.
> 
> This deferred qc will be issued using the work item deferred_qc_work
> running the function ata_scsi_deferred_qc_work() once all in-flight
> commands complete, which is checked with the port qc_defer() callback
> return value indicating that no further delay is necessary. This check
> is done using the helper function ata_scsi_schedule_deferred_qc() which
> is called from ata_scsi_qc_complete(). This thus excludes this mechanism
> from all internal non-NCQ commands issued by ATA EH.
> 
> When a port deferred_qc is non NULL, that is, the port has a command
> waiting for the device queue to drain, the issuing of all incoming
> commands (both NCQ and non-NCQ) is deferred using the regular busy
> mechanism. This simplifies the code and also avoids potential denial of
> service problems if a user issues too many non-NCQ commands.
> 
> Finally, whenever ata EH is scheduled, regardless of the reason, a
> deferred qc is always requeued so that it can be retried once EH
> completes. This is done by calling the function
> ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids
> the need for any special processing for the deferred qc in case of NCQ
> error, link or device reset, or device timeout.
> 
> Reported-by: Xingui Yang <yangxingui@huawei.com>
> Reported-by: Igor Pylypiv <ipylypiv@google.com>
> Fixes: 42f22fe36d51 ("scsi: pm8001: Expose hardware queues for pm80xx")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

Tested-by: Igor Pylypiv <ipylypiv@google.com>

Thanks again for fixing this issue, Damien!

Best,
Igor

> ---
>  drivers/ata/libata-core.c |  5 +++
>  drivers/ata/libata-eh.c   |  6 +++
>  drivers/ata/libata-scsi.c | 89 +++++++++++++++++++++++++++++++++++++++
>  drivers/ata/libata.h      |  2 +
>  include/linux/libata.h    |  3 ++
>  5 files changed, 105 insertions(+)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 09d8c035fcdf..447d8dc666a4 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5561,6 +5561,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
>  	mutex_init(&ap->scsi_scan_mutex);
>  	INIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug);
>  	INIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
> +	INIT_WORK(&ap->deferred_qc_work, ata_scsi_deferred_qc_work);
>  	INIT_LIST_HEAD(&ap->eh_done_q);
>  	init_waitqueue_head(&ap->eh_wait_q);
>  	init_completion(&ap->park_req_pending);
> @@ -6173,6 +6174,10 @@ static void ata_port_detach(struct ata_port *ap)
>  		}
>  	}
>  
> +	/* Make sure the deferred qc work finished. */
> +	cancel_work_sync(&ap->deferred_qc_work);
> +	WARN_ON(ap->deferred_qc);
> +
>  	/* Tell EH to disable all devices */
>  	ap->pflags |= ATA_PFLAG_UNLOADING;
>  	ata_port_schedule_eh(ap);
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 2586e77ebf45..b90b17f680f8 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -917,6 +917,12 @@ static void ata_eh_set_pending(struct ata_port *ap, bool fastdrain)
>  
>  	ap->pflags |= ATA_PFLAG_EH_PENDING;
>  
> +	/*
> +	 * If we have a deferred qc, requeue it so that it is retried once EH
> +	 * completes.
> +	 */
> +	ata_scsi_requeue_deferred_qc(ap);
> +
>  	if (!fastdrain)
>  		return;
>  
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 5b6b5f1ff3c7..4aeffd6a5640 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1658,8 +1658,73 @@ static void ata_qc_done(struct ata_queued_cmd *qc)
>  	done(cmd);
>  }
>  
> +void ata_scsi_deferred_qc_work(struct work_struct *work)
> +{
> +	struct ata_port *ap =
> +		container_of(work, struct ata_port, deferred_qc_work);
> +	struct ata_queued_cmd *qc;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(ap->lock, flags);
> +
> +	/*
> +	 * If we still have a deferred qc and we are not in EH, issue it. In
> +	 * such case, we should not need any more deferring the qc, so warn if
> +	 * qc_defer() says otherwise.
> +	 */
> +	qc = ap->deferred_qc;
> +	if (qc && !ata_port_eh_scheduled(ap)) {
> +		WARN_ON_ONCE(ap->ops->qc_defer(qc));
> +		ap->deferred_qc = NULL;
> +		ata_qc_issue(qc);
> +	}
> +
> +	spin_unlock_irqrestore(ap->lock, flags);
> +}
> +
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
> +{
> +	struct ata_queued_cmd *qc = ap->deferred_qc;
> +	struct scsi_cmnd *scmd;
> +
> +	/*
> +	 * If we have a deferred qc when a reset occurs or NCQ commands fail,
> +	 * do not try to be smart about what to do with this deferred command
> +	 * and simply retry it by completing it with DID_SOFT_ERROR.
> +	 */
> +	if (!qc)
> +		return;
> +
> +	scmd = qc->scsicmd;
> +	ap->deferred_qc = NULL;
> +	ata_qc_free(qc);
> +	scmd->result = (DID_SOFT_ERROR << 16);
> +	scsi_done(scmd);
> +}
> +
> +static void ata_scsi_schedule_deferred_qc(struct ata_port *ap)
> +{
> +	struct ata_queued_cmd *qc = ap->deferred_qc;
> +
> +	/*
> +	 * If we have a deferred qc, then qc_defer() is defined and we can use
> +	 * this callback to determine if this qc is good to go, unless EH has
> +	 * been scheduled.
> +	 */
> +	if (!qc)
> +		return;
> +
> +	if (ata_port_eh_scheduled(ap)) {
> +		ata_scsi_requeue_deferred_qc(ap);
> +		return;
> +	}
> +	if (!ap->ops->qc_defer(qc))
> +		queue_work(system_highpri_wq, &ap->deferred_qc_work);
> +}
> +
>  static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
>  {
> +	struct ata_port *ap = qc->ap;
>  	struct scsi_cmnd *cmd = qc->scsicmd;
>  	u8 *cdb = cmd->cmnd;
>  	bool have_sense = qc->flags & ATA_QCFLAG_SENSE_VALID;
> @@ -1689,6 +1754,8 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
>  	}
>  
>  	ata_qc_done(qc);
> +
> +	ata_scsi_schedule_deferred_qc(ap);
>  }
>  
>  static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
> @@ -1698,6 +1765,16 @@ static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
>  	if (!ap->ops->qc_defer)
>  		goto issue;
>  
> +	/*
> +	 * If we already have a deferred qc, then rely on the SCSI layer to
> +	 * requeue and defer all incoming commands until the deferred qc is
> +	 * processed, once all on-going commands complete.
> +	 */
> +	if (ap->deferred_qc) {
> +		ata_qc_free(qc);
> +		return SCSI_MLQUEUE_DEVICE_BUSY;
> +	}
> +
>  	/* Check if the command needs to be deferred. */
>  	ret = ap->ops->qc_defer(qc);
>  	switch (ret) {
> @@ -1716,6 +1793,18 @@ static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
>  	}
>  
>  	if (ret) {
> +		/*
> +		 * We must defer this qc: if this is not an NCQ command, keep
> +		 * this qc as a deferred one and report to the SCSI layer that
> +		 * we issued it so that it is not requeued. The deferred qc will
> +		 * be issued with the port deferred_qc_work once all on-going
> +		 * commands complete.
> +		 */
> +		if (!ata_is_ncq(qc->tf.protocol)) {
> +			ap->deferred_qc = qc;
> +			return 0;
> +		}
> +
>  		/* Force a requeue of the command to defer its execution. */
>  		ata_qc_free(qc);
>  		return ret;
> diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
> index 0e7ecac73680..60a675df61dc 100644
> --- a/drivers/ata/libata.h
> +++ b/drivers/ata/libata.h
> @@ -165,6 +165,8 @@ void ata_scsi_sdev_config(struct scsi_device *sdev);
>  int ata_scsi_dev_config(struct scsi_device *sdev, struct queue_limits *lim,
>  		struct ata_device *dev);
>  int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev);
> +void ata_scsi_deferred_qc_work(struct work_struct *work);
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap);
>  
>  /* libata-eh.c */
>  extern unsigned int ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 39534fafa36a..c5b27d97dfaf 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -903,6 +903,9 @@ struct ata_port {
>  	u64			qc_active;
>  	int			nr_active_links; /* #links with active qcs */
>  
> +	struct work_struct	deferred_qc_work;
> +	struct ata_queued_cmd	*deferred_qc;
> +
>  	struct ata_link		link;		/* host default link */
>  	struct ata_link		*slave_link;	/* see ata_slave_link_init() */
>  
> -- 
> 2.52.0
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation
  2026-01-04  8:22 ` [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation Damien Le Moal
                     ` (2 preceding siblings ...)
  2026-01-06 21:03   ` Igor Pylypiv
@ 2026-01-14  3:16   ` yangxingui
  3 siblings, 0 replies; 11+ messages in thread
From: yangxingui @ 2026-01-14  3:16 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide, Niklas Cassel; +Cc: Igor Pylypiv, John Garry


On 2026/1/4 16:22, Damien Le Moal wrote:
> When a non-NCQ command is issued while NCQ commands are being executed,
> ata_scsi_qc_issue() indicates to the SCSI layer that the command issuing
> should be deferred by returning SCSI_MLQUEUE_XXX_BUSY.  This command
> deferring is correct and as mandated by the ACS specifications since
> NCQ and non-NCQ commands cannot be mixed.
> 
> However, in the case of a host adapter using multiple submission queues,
> when the target device is under a constant load of NCQ commands, there
> are no guarantees that requeueing the non-NCQ command will be executed
> later and it may be deferred again repeatedly as other submission queues
> can constantly issue NCQ commands from different CPUs ahead of the
> non-NCQ command. This can lead to very long delays for the execution of
> non-NCQ commands, and even complete starvation for these commands in the
> worst case scenario.
> 
> Since the block layer and the SCSI layer do not distinguish between
> queueable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT
> implementation must ensure forward progress for non-NCQ commands in the
> presence of NCQ command traffic. This is similar to what SAS HBAs with a
> hardware/firmware based SAT implementation do.
> 
> Implement such forward progress guarantee by limiting requeueing of
> non-NCQ commands from ata_scsi_qc_issue(): when a non-NCQ command is
> received and NCQ commands are in-flight, do not force a requeue of the
> non-NCQ command by returning SCSI_MLQUEUE_XXX_BUSY and instead return 0
> to indicate that the command was accepted but hold on to the qc using
> the new deferred_qc field of struct ata_port.
> 
> This deferred qc will be issued using the work item deferred_qc_work
> running the function ata_scsi_deferred_qc_work() once all in-flight
> commands complete, which is checked with the port qc_defer() callback
> return value indicating that no further delay is necessary. This check
> is done using the helper function ata_scsi_schedule_deferred_qc() which
> is called from ata_scsi_qc_complete(). This thus excludes this mechanism
> from all internal non-NCQ commands issued by ATA EH.
> 
> When a port deferred_qc is non NULL, that is, the port has a command
> waiting for the device queue to drain, the issuing of all incoming
> commands (both NCQ and non-NCQ) is deferred using the regular busy
> mechanism. This simplifies the code and also avoids potential denial of
> service problems if a user issues too many non-NCQ commands.
> 
> Finally, whenever ata EH is scheduled, regardless of the reason, a
> deferred qc is always requeued so that it can be retried once EH
> completes. This is done by calling the function
> ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids
> the need for any special processing for the deferred qc in case of NCQ
> error, link or device reset, or device timeout.
> 
> Reported-by: Xingui Yang <yangxingui@huawei.com>
> Reported-by: Igor Pylypiv <ipylypiv@google.com>
> Fixes: 42f22fe36d51 ("scsi: pm8001: Expose hardware queues for pm80xx")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

Tested-by: Xingui Yang <yangxingui@huawei.com>

Thanks,
Xingui
> ---
>   drivers/ata/libata-core.c |  5 +++
>   drivers/ata/libata-eh.c   |  6 +++
>   drivers/ata/libata-scsi.c | 89 +++++++++++++++++++++++++++++++++++++++
>   drivers/ata/libata.h      |  2 +
>   include/linux/libata.h    |  3 ++
>   5 files changed, 105 insertions(+)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 09d8c035fcdf..447d8dc666a4 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5561,6 +5561,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
>   	mutex_init(&ap->scsi_scan_mutex);
>   	INIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug);
>   	INIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
> +	INIT_WORK(&ap->deferred_qc_work, ata_scsi_deferred_qc_work);
>   	INIT_LIST_HEAD(&ap->eh_done_q);
>   	init_waitqueue_head(&ap->eh_wait_q);
>   	init_completion(&ap->park_req_pending);
> @@ -6173,6 +6174,10 @@ static void ata_port_detach(struct ata_port *ap)
>   		}
>   	}
>   
> +	/* Make sure the deferred qc work finished. */
> +	cancel_work_sync(&ap->deferred_qc_work);
> +	WARN_ON(ap->deferred_qc);
> +
>   	/* Tell EH to disable all devices */
>   	ap->pflags |= ATA_PFLAG_UNLOADING;
>   	ata_port_schedule_eh(ap);
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 2586e77ebf45..b90b17f680f8 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -917,6 +917,12 @@ static void ata_eh_set_pending(struct ata_port *ap, bool fastdrain)
>   
>   	ap->pflags |= ATA_PFLAG_EH_PENDING;
>   
> +	/*
> +	 * If we have a deferred qc, requeue it so that it is retried once EH
> +	 * completes.
> +	 */
> +	ata_scsi_requeue_deferred_qc(ap);
> +
>   	if (!fastdrain)
>   		return;
>   
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 5b6b5f1ff3c7..4aeffd6a5640 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1658,8 +1658,73 @@ static void ata_qc_done(struct ata_queued_cmd *qc)
>   	done(cmd);
>   }
>   
> +void ata_scsi_deferred_qc_work(struct work_struct *work)
> +{
> +	struct ata_port *ap =
> +		container_of(work, struct ata_port, deferred_qc_work);
> +	struct ata_queued_cmd *qc;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(ap->lock, flags);
> +
> +	/*
> +	 * If we still have a deferred qc and we are not in EH, issue it. In
> +	 * such case, we should not need any more deferring the qc, so warn if
> +	 * qc_defer() says otherwise.
> +	 */
> +	qc = ap->deferred_qc;
> +	if (qc && !ata_port_eh_scheduled(ap)) {
> +		WARN_ON_ONCE(ap->ops->qc_defer(qc));
> +		ap->deferred_qc = NULL;
> +		ata_qc_issue(qc);
> +	}
> +
> +	spin_unlock_irqrestore(ap->lock, flags);
> +}
> +
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
> +{
> +	struct ata_queued_cmd *qc = ap->deferred_qc;
> +	struct scsi_cmnd *scmd;
> +
> +	/*
> +	 * If we have a deferred qc when a reset occurs or NCQ commands fail,
> +	 * do not try to be smart about what to do with this deferred command
> +	 * and simply retry it by completing it with DID_SOFT_ERROR.
> +	 */
> +	if (!qc)
> +		return;
> +
> +	scmd = qc->scsicmd;
> +	ap->deferred_qc = NULL;
> +	ata_qc_free(qc);
> +	scmd->result = (DID_SOFT_ERROR << 16);
> +	scsi_done(scmd);
> +}
> +
> +static void ata_scsi_schedule_deferred_qc(struct ata_port *ap)
> +{
> +	struct ata_queued_cmd *qc = ap->deferred_qc;
> +
> +	/*
> +	 * If we have a deferred qc, then qc_defer() is defined and we can use
> +	 * this callback to determine if this qc is good to go, unless EH has
> +	 * been scheduled.
> +	 */
> +	if (!qc)
> +		return;
> +
> +	if (ata_port_eh_scheduled(ap)) {
> +		ata_scsi_requeue_deferred_qc(ap);
> +		return;
> +	}
> +	if (!ap->ops->qc_defer(qc))
> +		queue_work(system_highpri_wq, &ap->deferred_qc_work);
> +}
> +
>   static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
>   {
> +	struct ata_port *ap = qc->ap;
>   	struct scsi_cmnd *cmd = qc->scsicmd;
>   	u8 *cdb = cmd->cmnd;
>   	bool have_sense = qc->flags & ATA_QCFLAG_SENSE_VALID;
> @@ -1689,6 +1754,8 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
>   	}
>   
>   	ata_qc_done(qc);
> +
> +	ata_scsi_schedule_deferred_qc(ap);
>   }
>   
>   static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
> @@ -1698,6 +1765,16 @@ static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
>   	if (!ap->ops->qc_defer)
>   		goto issue;
>   
> +	/*
> +	 * If we already have a deferred qc, then rely on the SCSI layer to
> +	 * requeue and defer all incoming commands until the deferred qc is
> +	 * processed, once all on-going commands complete.
> +	 */
> +	if (ap->deferred_qc) {
> +		ata_qc_free(qc);
> +		return SCSI_MLQUEUE_DEVICE_BUSY;
> +	}
> +
>   	/* Check if the command needs to be deferred. */
>   	ret = ap->ops->qc_defer(qc);
>   	switch (ret) {
> @@ -1716,6 +1793,18 @@ static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
>   	}
>   
>   	if (ret) {
> +		/*
> +		 * We must defer this qc: if this is not an NCQ command, keep
> +		 * this qc as a deferred one and report to the SCSI layer that
> +		 * we issued it so that it is not requeued. The deferred qc will
> +		 * be issued with the port deferred_qc_work once all on-going
> +		 * commands complete.
> +		 */
> +		if (!ata_is_ncq(qc->tf.protocol)) {
> +			ap->deferred_qc = qc;
> +			return 0;
> +		}
> +
>   		/* Force a requeue of the command to defer its execution. */
>   		ata_qc_free(qc);
>   		return ret;
> diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
> index 0e7ecac73680..60a675df61dc 100644
> --- a/drivers/ata/libata.h
> +++ b/drivers/ata/libata.h
> @@ -165,6 +165,8 @@ void ata_scsi_sdev_config(struct scsi_device *sdev);
>   int ata_scsi_dev_config(struct scsi_device *sdev, struct queue_limits *lim,
>   		struct ata_device *dev);
>   int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev);
> +void ata_scsi_deferred_qc_work(struct work_struct *work);
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap);
>   
>   /* libata-eh.c */
>   extern unsigned int ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 39534fafa36a..c5b27d97dfaf 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -903,6 +903,9 @@ struct ata_port {
>   	u64			qc_active;
>   	int			nr_active_links; /* #links with active qcs */
>   
> +	struct work_struct	deferred_qc_work;
> +	struct ata_queued_cmd	*deferred_qc;
> +
>   	struct ata_link		link;		/* host default link */
>   	struct ata_link		*slave_link;	/* see ata_slave_link_init() */
>   
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-01-14  3:16 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-04  8:22 [PATCH v5 0/2] Prevent non-NCQ command starvation Damien Le Moal
2026-01-04  8:22 ` [PATCH v5 1/2] ata: libata-scsi: refactor ata_scsi_translate() Damien Le Moal
2026-01-04 17:05   ` Niklas Cassel
2026-01-05 10:30   ` John Garry
2026-01-05 23:26   ` Igor Pylypiv
2026-01-04  8:22 ` [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation Damien Le Moal
2026-01-04 17:16   ` Niklas Cassel
2026-01-05 10:51   ` John Garry
2026-01-06 21:03   ` Igor Pylypiv
2026-01-14  3:16   ` yangxingui
2026-01-04 21:49 ` [PATCH v5 0/2] Prevent non-NCQ " Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox