public inbox for linux-ide@vger.kernel.org
 help / color / mirror / Atom feed
From: Henry Tseng <henrytseng@qnap.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Niklas Cassel <cassel@kernel.org>,
	linux-ide@vger.kernel.org, Kevin Ko <kevinko@qnap.com>,
	SW Chen <swchen@qnap.com>, Henry Tseng <henrytseng@qnap.com>
Subject: [PATCH] ata: libata: avoid long timeouts on hot-unplugged SATA DAS
Date: Thu, 20 Nov 2025 15:35:13 +0800	[thread overview]
Message-ID: <20251120073513.702344-1-henrytseng@qnap.com> (raw)

When a SATA DAS enclosure is connected behind a Thunderbolt PCIe
switch, hot-unplugging the whole enclosure causes pciehp to tear down
the PCI hierarchy before the SCSI layer issues SYNCHRONIZE CACHE and
START STOP UNIT for the disks.

libata still queues these commands and the AHCI driver tries to access
the HBA registers even though the PCI channel is already offline. This
results in a series of timeouts and error recovery attempts, e.g.:

  [  824.778346] pcieport 0000:00:07.0: pciehp: Slot(14): Link Down
  [  891.612720] ata8.00: qc timeout after 5000 msecs (cmd 0xec)
  [  902.876501] ata8.00: qc timeout after 10000 msecs (cmd 0xec)
  [  934.107998] ata8.00: qc timeout after 30000 msecs (cmd 0xec)
  [  936.206431] sd 7:0:0:0: [sda] Synchronize Cache(10) failed:
      Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
  ...
  [ 1006.298356] ata1.00: qc timeout after 5000 msecs (cmd 0xec)
  [ 1017.561926] ata1.00: qc timeout after 10000 msecs (cmd 0xec)
  [ 1048.791790] ata1.00: qc timeout after 30000 msecs (cmd 0xec)
  [ 1050.890035] sd 0:0:0:0: [sdb] Synchronize Cache(10) failed:
      Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

With this patch applied, the same hot-unplug looks like:

  [   55.452526] pcieport 0000:00:07.0: pciehp: Slot(14): Link Down
  [   55.496563] sd 7:0:0:0: [sda] Synchronize Cache(10) failed:
      Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
  [   55.617450] sd 0:0:0:0: [sdb] Synchronize Cache(10) failed:
      Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

In this test setup with two disks, the hot-unplug sequence shrinks from
about 226 seconds (~3.8 minutes) between the Link Down event and the
last SYNCHRONIZE CACHE failure to under 200 ms. Without this patch the
total delay grows roughly with the number of disks, because each disk
gets its own SYNCHRONIZE CACHE and qc timeout series.

If the underlying PCI device is already gone, these commands cannot
succeed anyway. Avoid issuing them by introducing
ata_port_pci_channel_offline(), which checks pci_channel_offline() for
PCI-based hosts and is used in both ata_scsi_queuecmd() and
ata_qc_issue().

For SCSI-originated commands we now complete the request immediately
with DID_BAD_TARGET instead of queuing it. For internal ATA commands
we fail the qc with AC_ERR_SYSTEM early without touching the HBA
registers.

With this change, SYNCHRONIZE CACHE issued during hot-unplug fails
quickly with DID_BAD_TARGET, without qc timeout spam, and the whole
unplug completes much faster.

Signed-off-by: Henry Tseng <henrytseng@qnap.com>
---
 drivers/ata/libata-core.c |  3 +++
 drivers/ata/libata-scsi.c |  3 ++-
 drivers/ata/libata.h      | 16 ++++++++++++++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 2a210719c4ce..49a59a90e8eb 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5060,6 +5060,9 @@ void ata_qc_issue(struct ata_queued_cmd *qc)
 		if (ata_sg_setup(qc))
 			goto sys_err;
 
+	if (ata_port_pci_channel_offline(ap))
+		goto sys_err;
+
 	/* if device is sleeping, schedule reset and abort the link */
 	if (unlikely(qc->dev->flags & ATA_DFLAG_SLEEPING)) {
 		link->eh_info.action |= ATA_EH_RESET;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index b43a3196e2be..a49fd4bbf321 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4388,7 +4388,8 @@ int ata_scsi_queuecmd(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
 	spin_lock_irqsave(ap->lock, irq_flags);
 
 	dev = ata_scsi_find_dev(ap, scsidev);
-	if (likely(dev))
+
+	if (likely(dev) && !ata_port_pci_channel_offline(ap))
 		rc = __ata_scsi_queuecmd(cmd, dev);
 	else {
 		cmd->result = (DID_BAD_TARGET << 16);
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index e5b977a8d3e1..ff2eb824e51b 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -12,6 +12,8 @@
 #ifndef __LIBATA_H__
 #define __LIBATA_H__
 
+#include <linux/pci.h>
+
 #define DRV_NAME	"libata"
 #define DRV_VERSION	"3.00"	/* must be exactly four chars */
 
@@ -56,6 +58,20 @@ static inline bool ata_port_eh_scheduled(struct ata_port *ap)
 	return ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS);
 }
 
+static inline bool ata_port_pci_channel_offline(struct ata_port *ap)
+{
+	struct device *dev;
+
+	if (!ap || !ap->host)
+		return false;
+
+	dev = ap->host->dev;
+	if (!dev || !dev_is_pci(dev))
+		return false;
+
+	return pci_channel_offline(to_pci_dev(dev));
+}
+
 #ifdef CONFIG_ATA_FORCE
 extern void ata_force_cbl(struct ata_port *ap);
 #else
-- 
2.43.0


             reply	other threads:[~2025-11-20  7:37 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-20  7:35 Henry Tseng [this message]
2025-11-20  8:30 ` [PATCH] ata: libata: avoid long timeouts on hot-unplugged SATA DAS Damien Le Moal
2025-11-21 10:14   ` [PATCH v2] " Henry Tseng
2025-11-26  7:50     ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251120073513.702344-1-henrytseng@qnap.com \
    --to=henrytseng@qnap.com \
    --cc=cassel@kernel.org \
    --cc=dlemoal@kernel.org \
    --cc=kevinko@qnap.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=swchen@qnap.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox