* Re: [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation
2026-01-04 8:22 ` [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation Damien Le Moal
@ 2026-01-04 17:16 ` Niklas Cassel
2026-01-05 10:51 ` John Garry
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Niklas Cassel @ 2026-01-04 17:16 UTC (permalink / raw)
To: Damien Le Moal; +Cc: linux-ide, Igor Pylypiv, Xingui Yang, John Garry
On Sun, Jan 04, 2026 at 05:22:03PM +0900, Damien Le Moal wrote:
> @@ -1658,8 +1658,73 @@ static void ata_qc_done(struct ata_queued_cmd *qc)
> done(cmd);
> }
>
> +void ata_scsi_deferred_qc_work(struct work_struct *work)
> +{
> + struct ata_port *ap =
> + container_of(work, struct ata_port, deferred_qc_work);
> + struct ata_queued_cmd *qc;
> + unsigned long flags;
> +
> + spin_lock_irqsave(ap->lock, flags);
> +
> + /*
> + * If we still have a deferred qc and we are not in EH, issue it. In
> + * such case, we should not need any more deferring the qc, so warn if
> + * qc_defer() says otherwise.
> + */
> + qc = ap->deferred_qc;
> + if (qc && !ata_port_eh_scheduled(ap)) {
> + WARN_ON_ONCE(ap->ops->qc_defer(qc));
> + ap->deferred_qc = NULL;
> + ata_qc_issue(qc);
> + }
> +
> + spin_unlock_irqrestore(ap->lock, flags);
> +}
> +
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
> +{
> + struct ata_queued_cmd *qc = ap->deferred_qc;
> + struct scsi_cmnd *scmd;
Since ap->deferred_qc needs to be checked while holding the spinlock,
in order to synchronize with ata_scsi_deferred_qc_work(), please add a
lockdep_assert_held(ap->lock) here.
Yes, I know that you had multiple lockdep_assert_held() in V2.
But I think that a single lockdep_assert_held() in either
ata_scsi_requeue_deferred_qc() or ata_scsi_qc_complete() is warranted
(and yes, it is a no-op on production systems (where lockdep is disabled)).
Other than that, looks good to me, no need for a resend, you canfixup when
applying:
Reviewed-by: Niklas Cassel <cassel@kernel.org>
> +
> + /*
> + * If we have a deferred qc when a reset occurs or NCQ commands fail,
> + * do not try to be smart about what to do with this deferred command
> + * and simply retry it by completing it with DID_SOFT_ERROR.
> + */
> + if (!qc)
> + return;
> +
> + scmd = qc->scsicmd;
> + ap->deferred_qc = NULL;
> + ata_qc_free(qc);
> + scmd->result = (DID_SOFT_ERROR << 16);
> + scsi_done(scmd);
> +}
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation
2026-01-04 8:22 ` [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation Damien Le Moal
2026-01-04 17:16 ` Niklas Cassel
@ 2026-01-05 10:51 ` John Garry
2026-01-06 21:03 ` Igor Pylypiv
2026-01-14 3:16 ` yangxingui
3 siblings, 0 replies; 11+ messages in thread
From: John Garry @ 2026-01-05 10:51 UTC (permalink / raw)
To: Damien Le Moal, linux-ide, Niklas Cassel; +Cc: Igor Pylypiv, Xingui Yang
On 04/01/2026 08:22, Damien Le Moal wrote:
> When a non-NCQ command is issued while NCQ commands are being executed,
> ata_scsi_qc_issue() indicates to the SCSI layer that the command issuing
> should be deferred by returning SCSI_MLQUEUE_XXX_BUSY. This command
> deferring is correct and as mandated by the ACS specifications since
> NCQ and non-NCQ commands cannot be mixed.
>
> However, in the case of a host adapter using multiple submission queues,
> when the target device is under a constant load of NCQ commands, there
> are no guarantees that requeueing the non-NCQ command will be executed
> later and it may be deferred again repeatedly as other submission queues
> can constantly issue NCQ commands from different CPUs ahead of the
> non-NCQ command. This can lead to very long delays for the execution of
> non-NCQ commands, and even complete starvation for these commands in the
> worst case scenario.
>
> Since the block layer and the SCSI layer do not distinguish between
> queueable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT
> implementation must ensure forward progress for non-NCQ commands in the
> presence of NCQ command traffic. This is similar to what SAS HBAs with a
> hardware/firmware based SAT implementation do.
>
> Implement such forward progress guarantee by limiting requeueing of
> non-NCQ commands from ata_scsi_qc_issue(): when a non-NCQ command is
> received and NCQ commands are in-flight, do not force a requeue of the
> non-NCQ command by returning SCSI_MLQUEUE_XXX_BUSY and instead return 0
> to indicate that the command was accepted but hold on to the qc using
> the new deferred_qc field of struct ata_port.
>
> This deferred qc will be issued using the work item deferred_qc_work
> running the function ata_scsi_deferred_qc_work() once all in-flight
> commands complete, which is checked with the port qc_defer() callback
> return value indicating that no further delay is necessary. This check
> is done using the helper function ata_scsi_schedule_deferred_qc() which
> is called from ata_scsi_qc_complete(). This thus excludes this mechanism
> from all internal non-NCQ commands issued by ATA EH.
>
> When a port deferred_qc is non NULL, that is, the port has a command
> waiting for the device queue to drain, the issuing of all incoming
> commands (both NCQ and non-NCQ) is deferred using the regular busy
> mechanism. This simplifies the code and also avoids potential denial of
> service problems if a user issues too many non-NCQ commands.
>
> Finally, whenever ata EH is scheduled, regardless of the reason, a
> deferred qc is always requeued so that it can be retried once EH
> completes.
> This is done by calling the function
> ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids
> the need for any special processing for the deferred qc in case of NCQ
> error, link or device reset, or device timeout.
>
> Reported-by: Xingui Yang <yangxingui@huawei.com>
> Reported-by: Igor Pylypiv <ipylypiv@google.com>
> Fixes: 42f22fe36d51 ("scsi: pm8001: Expose hardware queues for pm80xx")
I'd say that this is more related to the following:
commit bdb01301f3ea51a59eff252b06643fc1fe843e57
Author: Hannes Reinecke <hare@suse.com>
Date: Wed Aug 19 23:20:30 2020 +0800
scsi: Add host and host template flag 'host_tagset'
That is when we could expose multiple queues.
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
Generally looks ok, but a couple of comments remain for me, so:
Reviewed-by: John Garry <john.g.garry@oracle.com>
> drivers/ata/libata-core.c | 5 +++
> drivers/ata/libata-eh.c | 6 +++
> drivers/ata/libata-scsi.c | 89 +++++++++++++++++++++++++++++++++++++++
> drivers/ata/libata.h | 2 +
> include/linux/libata.h | 3 ++
> 5 files changed, 105 insertions(+)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 09d8c035fcdf..447d8dc666a4 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5561,6 +5561,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
> mutex_init(&ap->scsi_scan_mutex);
> INIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug);
> INIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
> + INIT_WORK(&ap->deferred_qc_work, ata_scsi_deferred_qc_work);
> INIT_LIST_HEAD(&ap->eh_done_q);
> init_waitqueue_head(&ap->eh_wait_q);
> init_completion(&ap->park_req_pending);
> @@ -6173,6 +6174,10 @@ static void ata_port_detach(struct ata_port *ap)
> }
> }
>
> + /* Make sure the deferred qc work finished. */
> + cancel_work_sync(&ap->deferred_qc_work);
> + WARN_ON(ap->deferred_qc);
> +
> /* Tell EH to disable all devices */
> ap->pflags |= ATA_PFLAG_UNLOADING;
> ata_port_schedule_eh(ap);
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 2586e77ebf45..b90b17f680f8 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -917,6 +917,12 @@ static void ata_eh_set_pending(struct ata_port *ap, bool fastdrain)
>
> ap->pflags |= ATA_PFLAG_EH_PENDING;
>
> + /*
> + * If we have a deferred qc, requeue it so that it is retried once EH
> + * completes.
> + */
> + ata_scsi_requeue_deferred_qc(ap);
> +
> if (!fastdrain)
> return;
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 5b6b5f1ff3c7..4aeffd6a5640 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1658,8 +1658,73 @@ static void ata_qc_done(struct ata_queued_cmd *qc)
> done(cmd);
> }
>
> +void ata_scsi_deferred_qc_work(struct work_struct *work)
> +{
> + struct ata_port *ap =
> + container_of(work, struct ata_port, deferred_qc_work);
> + struct ata_queued_cmd *qc;
> + unsigned long flags;
> +
> + spin_lock_irqsave(ap->lock, flags);
> +
> + /*
> + * If we still have a deferred qc and we are not in EH, issue it. In
> + * such case, we should not need any more deferring the qc, so warn if
> + * qc_defer() says otherwise.
> + */
> + qc = ap->deferred_qc;
> + if (qc && !ata_port_eh_scheduled(ap)) {
> + WARN_ON_ONCE(ap->ops->qc_defer(qc));
> + ap->deferred_qc = NULL;
> + ata_qc_issue(qc);
> + }
> +
> + spin_unlock_irqrestore(ap->lock, flags);
> +}
> +
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
> +{
> + struct ata_queued_cmd *qc = ap->deferred_qc;
> + struct scsi_cmnd *scmd;
> +
> + /*
> + * If we have a deferred qc when a reset occurs or NCQ commands fail,
> + * do not try to be smart about what to do with this deferred command
> + * and simply retry it by completing it with DID_SOFT_ERROR.
> + */
> + if (!qc)
> + return;
> +
> + scmd = qc->scsicmd;
> + ap->deferred_qc = NULL;
> + ata_qc_free(qc);
> + scmd->result = (DID_SOFT_ERROR << 16);
> + scsi_done(scmd);
> +}
> +
> +static void ata_scsi_schedule_deferred_qc(struct ata_port *ap)
> +{
> + struct ata_queued_cmd *qc = ap->deferred_qc;
Can we assert that the host lock is held here?
It seems that the flags which we set in ata_port_eh_scheduled() are only
set with the lock held, so we should ensure that it is held when we check.
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation
2026-01-04 8:22 ` [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation Damien Le Moal
2026-01-04 17:16 ` Niklas Cassel
2026-01-05 10:51 ` John Garry
@ 2026-01-06 21:03 ` Igor Pylypiv
2026-01-14 3:16 ` yangxingui
3 siblings, 0 replies; 11+ messages in thread
From: Igor Pylypiv @ 2026-01-06 21:03 UTC (permalink / raw)
To: Damien Le Moal; +Cc: linux-ide, Niklas Cassel, Xingui Yang, John Garry
On Sun, Jan 04, 2026 at 05:22:03PM +0900, Damien Le Moal wrote:
> When a non-NCQ command is issued while NCQ commands are being executed,
> ata_scsi_qc_issue() indicates to the SCSI layer that the command issuing
> should be deferred by returning SCSI_MLQUEUE_XXX_BUSY. This command
> deferring is correct and as mandated by the ACS specifications since
> NCQ and non-NCQ commands cannot be mixed.
>
> However, in the case of a host adapter using multiple submission queues,
> when the target device is under a constant load of NCQ commands, there
> are no guarantees that requeueing the non-NCQ command will be executed
> later and it may be deferred again repeatedly as other submission queues
> can constantly issue NCQ commands from different CPUs ahead of the
> non-NCQ command. This can lead to very long delays for the execution of
> non-NCQ commands, and even complete starvation for these commands in the
> worst case scenario.
>
> Since the block layer and the SCSI layer do not distinguish between
> queueable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT
> implementation must ensure forward progress for non-NCQ commands in the
> presence of NCQ command traffic. This is similar to what SAS HBAs with a
> hardware/firmware based SAT implementation do.
>
> Implement such forward progress guarantee by limiting requeueing of
> non-NCQ commands from ata_scsi_qc_issue(): when a non-NCQ command is
> received and NCQ commands are in-flight, do not force a requeue of the
> non-NCQ command by returning SCSI_MLQUEUE_XXX_BUSY and instead return 0
> to indicate that the command was accepted but hold on to the qc using
> the new deferred_qc field of struct ata_port.
>
> This deferred qc will be issued using the work item deferred_qc_work
> running the function ata_scsi_deferred_qc_work() once all in-flight
> commands complete, which is checked with the port qc_defer() callback
> return value indicating that no further delay is necessary. This check
> is done using the helper function ata_scsi_schedule_deferred_qc() which
> is called from ata_scsi_qc_complete(). This thus excludes this mechanism
> from all internal non-NCQ commands issued by ATA EH.
>
> When a port deferred_qc is non NULL, that is, the port has a command
> waiting for the device queue to drain, the issuing of all incoming
> commands (both NCQ and non-NCQ) is deferred using the regular busy
> mechanism. This simplifies the code and also avoids potential denial of
> service problems if a user issues too many non-NCQ commands.
>
> Finally, whenever ata EH is scheduled, regardless of the reason, a
> deferred qc is always requeued so that it can be retried once EH
> completes. This is done by calling the function
> ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids
> the need for any special processing for the deferred qc in case of NCQ
> error, link or device reset, or device timeout.
>
> Reported-by: Xingui Yang <yangxingui@huawei.com>
> Reported-by: Igor Pylypiv <ipylypiv@google.com>
> Fixes: 42f22fe36d51 ("scsi: pm8001: Expose hardware queues for pm80xx")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Igor Pylypiv <ipylypiv@google.com>
Thanks again for fixing this issue, Damien!
Best,
Igor
> ---
> drivers/ata/libata-core.c | 5 +++
> drivers/ata/libata-eh.c | 6 +++
> drivers/ata/libata-scsi.c | 89 +++++++++++++++++++++++++++++++++++++++
> drivers/ata/libata.h | 2 +
> include/linux/libata.h | 3 ++
> 5 files changed, 105 insertions(+)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 09d8c035fcdf..447d8dc666a4 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5561,6 +5561,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
> mutex_init(&ap->scsi_scan_mutex);
> INIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug);
> INIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
> + INIT_WORK(&ap->deferred_qc_work, ata_scsi_deferred_qc_work);
> INIT_LIST_HEAD(&ap->eh_done_q);
> init_waitqueue_head(&ap->eh_wait_q);
> init_completion(&ap->park_req_pending);
> @@ -6173,6 +6174,10 @@ static void ata_port_detach(struct ata_port *ap)
> }
> }
>
> + /* Make sure the deferred qc work finished. */
> + cancel_work_sync(&ap->deferred_qc_work);
> + WARN_ON(ap->deferred_qc);
> +
> /* Tell EH to disable all devices */
> ap->pflags |= ATA_PFLAG_UNLOADING;
> ata_port_schedule_eh(ap);
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 2586e77ebf45..b90b17f680f8 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -917,6 +917,12 @@ static void ata_eh_set_pending(struct ata_port *ap, bool fastdrain)
>
> ap->pflags |= ATA_PFLAG_EH_PENDING;
>
> + /*
> + * If we have a deferred qc, requeue it so that it is retried once EH
> + * completes.
> + */
> + ata_scsi_requeue_deferred_qc(ap);
> +
> if (!fastdrain)
> return;
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 5b6b5f1ff3c7..4aeffd6a5640 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1658,8 +1658,73 @@ static void ata_qc_done(struct ata_queued_cmd *qc)
> done(cmd);
> }
>
> +void ata_scsi_deferred_qc_work(struct work_struct *work)
> +{
> + struct ata_port *ap =
> + container_of(work, struct ata_port, deferred_qc_work);
> + struct ata_queued_cmd *qc;
> + unsigned long flags;
> +
> + spin_lock_irqsave(ap->lock, flags);
> +
> + /*
> + * If we still have a deferred qc and we are not in EH, issue it. In
> + * such case, we should not need any more deferring the qc, so warn if
> + * qc_defer() says otherwise.
> + */
> + qc = ap->deferred_qc;
> + if (qc && !ata_port_eh_scheduled(ap)) {
> + WARN_ON_ONCE(ap->ops->qc_defer(qc));
> + ap->deferred_qc = NULL;
> + ata_qc_issue(qc);
> + }
> +
> + spin_unlock_irqrestore(ap->lock, flags);
> +}
> +
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
> +{
> + struct ata_queued_cmd *qc = ap->deferred_qc;
> + struct scsi_cmnd *scmd;
> +
> + /*
> + * If we have a deferred qc when a reset occurs or NCQ commands fail,
> + * do not try to be smart about what to do with this deferred command
> + * and simply retry it by completing it with DID_SOFT_ERROR.
> + */
> + if (!qc)
> + return;
> +
> + scmd = qc->scsicmd;
> + ap->deferred_qc = NULL;
> + ata_qc_free(qc);
> + scmd->result = (DID_SOFT_ERROR << 16);
> + scsi_done(scmd);
> +}
> +
> +static void ata_scsi_schedule_deferred_qc(struct ata_port *ap)
> +{
> + struct ata_queued_cmd *qc = ap->deferred_qc;
> +
> + /*
> + * If we have a deferred qc, then qc_defer() is defined and we can use
> + * this callback to determine if this qc is good to go, unless EH has
> + * been scheduled.
> + */
> + if (!qc)
> + return;
> +
> + if (ata_port_eh_scheduled(ap)) {
> + ata_scsi_requeue_deferred_qc(ap);
> + return;
> + }
> + if (!ap->ops->qc_defer(qc))
> + queue_work(system_highpri_wq, &ap->deferred_qc_work);
> +}
> +
> static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
> {
> + struct ata_port *ap = qc->ap;
> struct scsi_cmnd *cmd = qc->scsicmd;
> u8 *cdb = cmd->cmnd;
> bool have_sense = qc->flags & ATA_QCFLAG_SENSE_VALID;
> @@ -1689,6 +1754,8 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
> }
>
> ata_qc_done(qc);
> +
> + ata_scsi_schedule_deferred_qc(ap);
> }
>
> static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
> @@ -1698,6 +1765,16 @@ static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
> if (!ap->ops->qc_defer)
> goto issue;
>
> + /*
> + * If we already have a deferred qc, then rely on the SCSI layer to
> + * requeue and defer all incoming commands until the deferred qc is
> + * processed, once all on-going commands complete.
> + */
> + if (ap->deferred_qc) {
> + ata_qc_free(qc);
> + return SCSI_MLQUEUE_DEVICE_BUSY;
> + }
> +
> /* Check if the command needs to be deferred. */
> ret = ap->ops->qc_defer(qc);
> switch (ret) {
> @@ -1716,6 +1793,18 @@ static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
> }
>
> if (ret) {
> + /*
> + * We must defer this qc: if this is not an NCQ command, keep
> + * this qc as a deferred one and report to the SCSI layer that
> + * we issued it so that it is not requeued. The deferred qc will
> + * be issued with the port deferred_qc_work once all on-going
> + * commands complete.
> + */
> + if (!ata_is_ncq(qc->tf.protocol)) {
> + ap->deferred_qc = qc;
> + return 0;
> + }
> +
> /* Force a requeue of the command to defer its execution. */
> ata_qc_free(qc);
> return ret;
> diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
> index 0e7ecac73680..60a675df61dc 100644
> --- a/drivers/ata/libata.h
> +++ b/drivers/ata/libata.h
> @@ -165,6 +165,8 @@ void ata_scsi_sdev_config(struct scsi_device *sdev);
> int ata_scsi_dev_config(struct scsi_device *sdev, struct queue_limits *lim,
> struct ata_device *dev);
> int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev);
> +void ata_scsi_deferred_qc_work(struct work_struct *work);
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap);
>
> /* libata-eh.c */
> extern unsigned int ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 39534fafa36a..c5b27d97dfaf 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -903,6 +903,9 @@ struct ata_port {
> u64 qc_active;
> int nr_active_links; /* #links with active qcs */
>
> + struct work_struct deferred_qc_work;
> + struct ata_queued_cmd *deferred_qc;
> +
> struct ata_link link; /* host default link */
> struct ata_link *slave_link; /* see ata_slave_link_init() */
>
> --
> 2.52.0
>
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation
2026-01-04 8:22 ` [PATCH v5 2/2] ata: libata-scsi: avoid Non-NCQ command starvation Damien Le Moal
` (2 preceding siblings ...)
2026-01-06 21:03 ` Igor Pylypiv
@ 2026-01-14 3:16 ` yangxingui
3 siblings, 0 replies; 11+ messages in thread
From: yangxingui @ 2026-01-14 3:16 UTC (permalink / raw)
To: Damien Le Moal, linux-ide, Niklas Cassel; +Cc: Igor Pylypiv, John Garry
On 2026/1/4 16:22, Damien Le Moal wrote:
> When a non-NCQ command is issued while NCQ commands are being executed,
> ata_scsi_qc_issue() indicates to the SCSI layer that the command issuing
> should be deferred by returning SCSI_MLQUEUE_XXX_BUSY. This command
> deferring is correct and as mandated by the ACS specifications since
> NCQ and non-NCQ commands cannot be mixed.
>
> However, in the case of a host adapter using multiple submission queues,
> when the target device is under a constant load of NCQ commands, there
> are no guarantees that requeueing the non-NCQ command will be executed
> later and it may be deferred again repeatedly as other submission queues
> can constantly issue NCQ commands from different CPUs ahead of the
> non-NCQ command. This can lead to very long delays for the execution of
> non-NCQ commands, and even complete starvation for these commands in the
> worst case scenario.
>
> Since the block layer and the SCSI layer do not distinguish between
> queueable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT
> implementation must ensure forward progress for non-NCQ commands in the
> presence of NCQ command traffic. This is similar to what SAS HBAs with a
> hardware/firmware based SAT implementation do.
>
> Implement such forward progress guarantee by limiting requeueing of
> non-NCQ commands from ata_scsi_qc_issue(): when a non-NCQ command is
> received and NCQ commands are in-flight, do not force a requeue of the
> non-NCQ command by returning SCSI_MLQUEUE_XXX_BUSY and instead return 0
> to indicate that the command was accepted but hold on to the qc using
> the new deferred_qc field of struct ata_port.
>
> This deferred qc will be issued using the work item deferred_qc_work
> running the function ata_scsi_deferred_qc_work() once all in-flight
> commands complete, which is checked with the port qc_defer() callback
> return value indicating that no further delay is necessary. This check
> is done using the helper function ata_scsi_schedule_deferred_qc() which
> is called from ata_scsi_qc_complete(). This thus excludes this mechanism
> from all internal non-NCQ commands issued by ATA EH.
>
> When a port deferred_qc is non NULL, that is, the port has a command
> waiting for the device queue to drain, the issuing of all incoming
> commands (both NCQ and non-NCQ) is deferred using the regular busy
> mechanism. This simplifies the code and also avoids potential denial of
> service problems if a user issues too many non-NCQ commands.
>
> Finally, whenever ata EH is scheduled, regardless of the reason, a
> deferred qc is always requeued so that it can be retried once EH
> completes. This is done by calling the function
> ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids
> the need for any special processing for the deferred qc in case of NCQ
> error, link or device reset, or device timeout.
>
> Reported-by: Xingui Yang <yangxingui@huawei.com>
> Reported-by: Igor Pylypiv <ipylypiv@google.com>
> Fixes: 42f22fe36d51 ("scsi: pm8001: Expose hardware queues for pm80xx")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Xingui Yang <yangxingui@huawei.com>
Thanks,
Xingui
> ---
> drivers/ata/libata-core.c | 5 +++
> drivers/ata/libata-eh.c | 6 +++
> drivers/ata/libata-scsi.c | 89 +++++++++++++++++++++++++++++++++++++++
> drivers/ata/libata.h | 2 +
> include/linux/libata.h | 3 ++
> 5 files changed, 105 insertions(+)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 09d8c035fcdf..447d8dc666a4 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5561,6 +5561,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
> mutex_init(&ap->scsi_scan_mutex);
> INIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug);
> INIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
> + INIT_WORK(&ap->deferred_qc_work, ata_scsi_deferred_qc_work);
> INIT_LIST_HEAD(&ap->eh_done_q);
> init_waitqueue_head(&ap->eh_wait_q);
> init_completion(&ap->park_req_pending);
> @@ -6173,6 +6174,10 @@ static void ata_port_detach(struct ata_port *ap)
> }
> }
>
> + /* Make sure the deferred qc work finished. */
> + cancel_work_sync(&ap->deferred_qc_work);
> + WARN_ON(ap->deferred_qc);
> +
> /* Tell EH to disable all devices */
> ap->pflags |= ATA_PFLAG_UNLOADING;
> ata_port_schedule_eh(ap);
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 2586e77ebf45..b90b17f680f8 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -917,6 +917,12 @@ static void ata_eh_set_pending(struct ata_port *ap, bool fastdrain)
>
> ap->pflags |= ATA_PFLAG_EH_PENDING;
>
> + /*
> + * If we have a deferred qc, requeue it so that it is retried once EH
> + * completes.
> + */
> + ata_scsi_requeue_deferred_qc(ap);
> +
> if (!fastdrain)
> return;
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 5b6b5f1ff3c7..4aeffd6a5640 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1658,8 +1658,73 @@ static void ata_qc_done(struct ata_queued_cmd *qc)
> done(cmd);
> }
>
> +void ata_scsi_deferred_qc_work(struct work_struct *work)
> +{
> + struct ata_port *ap =
> + container_of(work, struct ata_port, deferred_qc_work);
> + struct ata_queued_cmd *qc;
> + unsigned long flags;
> +
> + spin_lock_irqsave(ap->lock, flags);
> +
> + /*
> + * If we still have a deferred qc and we are not in EH, issue it. In
> + * such case, we should not need any more deferring the qc, so warn if
> + * qc_defer() says otherwise.
> + */
> + qc = ap->deferred_qc;
> + if (qc && !ata_port_eh_scheduled(ap)) {
> + WARN_ON_ONCE(ap->ops->qc_defer(qc));
> + ap->deferred_qc = NULL;
> + ata_qc_issue(qc);
> + }
> +
> + spin_unlock_irqrestore(ap->lock, flags);
> +}
> +
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap)
> +{
> + struct ata_queued_cmd *qc = ap->deferred_qc;
> + struct scsi_cmnd *scmd;
> +
> + /*
> + * If we have a deferred qc when a reset occurs or NCQ commands fail,
> + * do not try to be smart about what to do with this deferred command
> + * and simply retry it by completing it with DID_SOFT_ERROR.
> + */
> + if (!qc)
> + return;
> +
> + scmd = qc->scsicmd;
> + ap->deferred_qc = NULL;
> + ata_qc_free(qc);
> + scmd->result = (DID_SOFT_ERROR << 16);
> + scsi_done(scmd);
> +}
> +
> +static void ata_scsi_schedule_deferred_qc(struct ata_port *ap)
> +{
> + struct ata_queued_cmd *qc = ap->deferred_qc;
> +
> + /*
> + * If we have a deferred qc, then qc_defer() is defined and we can use
> + * this callback to determine if this qc is good to go, unless EH has
> + * been scheduled.
> + */
> + if (!qc)
> + return;
> +
> + if (ata_port_eh_scheduled(ap)) {
> + ata_scsi_requeue_deferred_qc(ap);
> + return;
> + }
> + if (!ap->ops->qc_defer(qc))
> + queue_work(system_highpri_wq, &ap->deferred_qc_work);
> +}
> +
> static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
> {
> + struct ata_port *ap = qc->ap;
> struct scsi_cmnd *cmd = qc->scsicmd;
> u8 *cdb = cmd->cmnd;
> bool have_sense = qc->flags & ATA_QCFLAG_SENSE_VALID;
> @@ -1689,6 +1754,8 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
> }
>
> ata_qc_done(qc);
> +
> + ata_scsi_schedule_deferred_qc(ap);
> }
>
> static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
> @@ -1698,6 +1765,16 @@ static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
> if (!ap->ops->qc_defer)
> goto issue;
>
> + /*
> + * If we already have a deferred qc, then rely on the SCSI layer to
> + * requeue and defer all incoming commands until the deferred qc is
> + * processed, once all on-going commands complete.
> + */
> + if (ap->deferred_qc) {
> + ata_qc_free(qc);
> + return SCSI_MLQUEUE_DEVICE_BUSY;
> + }
> +
> /* Check if the command needs to be deferred. */
> ret = ap->ops->qc_defer(qc);
> switch (ret) {
> @@ -1716,6 +1793,18 @@ static int ata_scsi_qc_issue(struct ata_port *ap, struct ata_queued_cmd *qc)
> }
>
> if (ret) {
> + /*
> + * We must defer this qc: if this is not an NCQ command, keep
> + * this qc as a deferred one and report to the SCSI layer that
> + * we issued it so that it is not requeued. The deferred qc will
> + * be issued with the port deferred_qc_work once all on-going
> + * commands complete.
> + */
> + if (!ata_is_ncq(qc->tf.protocol)) {
> + ap->deferred_qc = qc;
> + return 0;
> + }
> +
> /* Force a requeue of the command to defer its execution. */
> ata_qc_free(qc);
> return ret;
> diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
> index 0e7ecac73680..60a675df61dc 100644
> --- a/drivers/ata/libata.h
> +++ b/drivers/ata/libata.h
> @@ -165,6 +165,8 @@ void ata_scsi_sdev_config(struct scsi_device *sdev);
> int ata_scsi_dev_config(struct scsi_device *sdev, struct queue_limits *lim,
> struct ata_device *dev);
> int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev);
> +void ata_scsi_deferred_qc_work(struct work_struct *work);
> +void ata_scsi_requeue_deferred_qc(struct ata_port *ap);
>
> /* libata-eh.c */
> extern unsigned int ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 39534fafa36a..c5b27d97dfaf 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -903,6 +903,9 @@ struct ata_port {
> u64 qc_active;
> int nr_active_links; /* #links with active qcs */
>
> + struct work_struct deferred_qc_work;
> + struct ata_queued_cmd *deferred_qc;
> +
> struct ata_link link; /* host default link */
> struct ata_link *slave_link; /* see ata_slave_link_init() */
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread