public inbox for linux-kernel@vger.kernel.org
* [PATCH] ata: libata-sata: retry hardreset when device detected but PHY not established
@ 2026-04-25  6:04 Xingui Yang
  2026-04-25 22:53 ` Damien Le Moal
  2026-04-27 13:17 ` Niklas Cassel
  0 siblings, 2 replies; 5+ messages in thread
From: Xingui Yang @ 2026-04-25  6:04 UTC (permalink / raw)
  To: dlemoal, cassel
  Cc: linux-scsi, linux-kernel, yangxingui, liuyonglong, kangfenglong

When sata_link_hardreset() detects that the link is offline, it currently
returns immediately without distinguishing the reason. According to the SATA
specification, the SStatus register's DET field (bits 0-3) indicates:
  - 0x0: No device detected, PHY not communicating
  - 0x1: Device detected but PHY communication not established
  - 0x3: Device detected and PHY communication established

To improve device detection reliability, add a check: when the link is
offline but the DET field reads 0x1, return -EAGAIN to trigger a retry
rather than giving up immediately.
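
The decision described above can be modelled outside the kernel. The
following is only a user-space sketch: the enum and classify_sstatus() are
hypothetical names for illustration, not libata symbols, and the only part
taken from the patch is the mapping of DET values to outcomes.

```c
#include <stdint.h>

/* Hypothetical names for illustration; libata does not define this enum. */
enum reset_action {
	RESET_GIVE_UP,	/* DET 0x0 or any other value: treat as no usable device */
	RESET_RETRY,	/* DET 0x1: device seen, PHY not up (-EAGAIN in the patch) */
	RESET_CONTINUE,	/* DET 0x3: link online, carry on with the reset */
};

/* Map a raw SStatus value to the action taken after the link resume step. */
static enum reset_action classify_sstatus(uint32_t sstatus)
{
	switch (sstatus & 0xf) {	/* DET lives in bits 0-3 */
	case 0x3:
		return RESET_CONTINUE;
	case 0x1:
		return RESET_RETRY;
	default:
		return RESET_GIVE_UP;
	}
}
```

With this model, an SStatus of 0x1 yields RESET_RETRY, 0x0 yields
RESET_GIVE_UP, and a value such as 0x123 yields RESET_CONTINUE because only
the low four bits are examined.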

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
---
 drivers/ata/libata-sata.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
index b9d635088f5f..e5bb92c38e38 100644
--- a/drivers/ata/libata-sata.c
+++ b/drivers/ata/libata-sata.c
@@ -667,8 +667,18 @@ int sata_link_hardreset(struct ata_link *link, const unsigned int *timing,
 	if (rc)
 		goto out;
 	/* if link is offline nothing more to do */
-	if (ata_phys_link_offline(link))
+	if (ata_phys_link_offline(link)) {
+		u32 sstatus;
+
+		if (sata_scr_read(link, SCR_STATUS, &sstatus) == 0 &&
+		    (sstatus & 0xf) == 0x1) {
+			ata_link_warn(link, "device detected but PHY not ready (SStatus %X), retrying\n",
+				      sstatus);
+			rc = -EAGAIN;
+		}
+
 		goto out;
+	}
 
 	/* Link is online.  From this point, -ENODEV too is an error. */
 	if (online)
-- 
2.33.0



* Re: [PATCH] ata: libata-sata: retry hardreset when device detected but PHY not established
  2026-04-25  6:04 [PATCH] ata: libata-sata: retry hardreset when device detected but PHY not established Xingui Yang
@ 2026-04-25 22:53 ` Damien Le Moal
  2026-04-27  1:51   ` yangxingui
  2026-04-27 13:17 ` Niklas Cassel
  1 sibling, 1 reply; 5+ messages in thread
From: Damien Le Moal @ 2026-04-25 22:53 UTC (permalink / raw)
  To: Xingui Yang, cassel; +Cc: linux-scsi, linux-kernel, liuyonglong, kangfenglong

On 4/25/26 15:04, Xingui Yang wrote:
> When sata_link_hardreset() detects that the link is offline, it currently
> returns immediately without distinguishing the reason. According to the SATA
> specification, the SStatus register's DET field (bits 0-3) indicates:
>   - 0x0: No device detected, PHY not communicating
>   - 0x1: Device detected but PHY communication not established
>   - 0x3: Device detected and PHY communication established
> 
> To improve device detection reliability, add a check: when the link is
> offline but the DET field reads 0x1, return -EAGAIN to trigger a retry
> rather than giving up immediately.
> 
> Signed-off-by: Xingui Yang <yangxingui@huawei.com>

This is a pure ATA patch so please CC the linux-ide list, not the linux-scsi list.

Also, please check your mail setup: your email was in my Junk folder.

> ---
>  drivers/ata/libata-sata.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
> index b9d635088f5f..e5bb92c38e38 100644
> --- a/drivers/ata/libata-sata.c
> +++ b/drivers/ata/libata-sata.c
> @@ -667,8 +667,18 @@ int sata_link_hardreset(struct ata_link *link, const unsigned int *timing,
>  	if (rc)
>  		goto out;
>  	/* if link is offline nothing more to do */
> -	if (ata_phys_link_offline(link))
> +	if (ata_phys_link_offline(link)) {

This is preceded by a call to sata_link_resume(), which calls
sata_link_debounce(), and that function makes sure that DET is stable. So if
after that DET still shows that there is no PHY, there is likely a big problem
with it and it is super slow to be established.

In this case, I do not think that doing another hardreset is the right thing to
do. Have you tried increasing the deadline for hardreset ? That deadline is used
as the limit for the link debounce too.
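
To make the relation between debounce and deadline concrete, here is a
simplified user-space model of DET debouncing. It is a sketch, not the kernel
function: debounce_det(), the DEB_* codes, and the idea of feeding
pre-recorded samples are all assumptions made for illustration. The one
behavior it is meant to show is that a steady DET of 1 is not accepted as
final until the deadline has passed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes for this model only. */
#define DEB_OK		0	/* DET settled; *det_out holds the value */
#define DEB_UNSTABLE	(-1)	/* still changing past the deadline, or out of samples */

/*
 * samples[i] is the SStatus value read at time i * interval_ms.
 * DET must hold one value for longer than duration_ms to count as stable.
 * A steady DET of 1 (device seen, no PHY) keeps waiting until deadline_ms.
 */
static int debounce_det(const uint32_t *samples, size_t n,
			unsigned int interval_ms, unsigned int duration_ms,
			unsigned int deadline_ms, uint32_t *det_out)
{
	unsigned int stable_since = 0;
	uint32_t last;
	size_t i;

	if (n == 0)
		return DEB_UNSTABLE;
	last = samples[0] & 0xf;

	for (i = 1; i < n; i++) {
		unsigned int now = (unsigned int)i * interval_ms;
		uint32_t cur = samples[i] & 0xf;

		if (cur == last) {
			/* The PHY may still come up: keep polling until the deadline. */
			if (cur == 1 && now < deadline_ms)
				continue;
			if (now - stable_since > duration_ms) {
				*det_out = cur;
				return DEB_OK;
			}
			continue;
		}

		/* DET changed: restart the stability window. */
		last = cur;
		stable_since = now;
		if (now > deadline_ms)
			break;
	}

	*det_out = last;
	return DEB_UNSTABLE;
}
```

In this model, with a 5 ms interval, a 100 ms stability window, and a 2000 ms
deadline, a link that reads DET 3 from the start is accepted after about
105 ms, while a link stuck at DET 1 is only reported once the full 2000 ms
have elapsed. A longer deadline therefore directly gives a slow PHY more time.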

Do you have a specific controller/device where you see this issue ? What exactly
is the hardware setup where you see this issue ?



> +		u32 sstatus;
> +
> +		if (sata_scr_read(link, SCR_STATUS, &sstatus) == 0 &&
> +		    (sstatus & 0xf) == 0x1) {
> +			ata_link_warn(link, "device detected but PHY not ready (SStatus %X), retrying\n",
> +				      sstatus);
> +			rc = -EAGAIN;
> +		}
> +
>  		goto out;
> +	}
>  
>  	/* Link is online.  From this point, -ENODEV too is an error. */
>  	if (online)


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] ata: libata-sata: retry hardreset when device detected but PHY not established
  2026-04-25 22:53 ` Damien Le Moal
@ 2026-04-27  1:51   ` yangxingui
  2026-04-27  4:45     ` Damien Le Moal
  0 siblings, 1 reply; 5+ messages in thread
From: yangxingui @ 2026-04-27  1:51 UTC (permalink / raw)
  To: Damien Le Moal, cassel
  Cc: linux-scsi, linux-kernel, liuyonglong, kangfenglong, linux-ide



On 2026/4/26 6:53, Damien Le Moal wrote:
> On 4/25/26 15:04, Xingui Yang wrote:
>> When sata_link_hardreset() detects that the link is offline, it currently
>> returns immediately without distinguishing the reason. According to the SATA
>> specification, the SStatus register's DET field (bits 0-3) indicates:
>>    - 0x0: No device detected, PHY not communicating
>>    - 0x1: Device detected but PHY communication not established
>>    - 0x3: Device detected and PHY communication established
>>
>> To improve device detection reliability, add a check: when the link is
>> offline but the DET field reads 0x1, return -EAGAIN to trigger a retry
>> rather than giving up immediately.
>>
>> Signed-off-by: Xingui Yang <yangxingui@huawei.com>
> 
> This is a pure ATA patch so please CC the linux-ide list, not the linux-scsi list.

Ok.
> 
> Also, please check your mail setup: your email was in my Junk folder.

Well, the patch was sent using the git send-email command.

> 
>> ---
>>   drivers/ata/libata-sata.c | 12 +++++++++++-
>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
>> index b9d635088f5f..e5bb92c38e38 100644
>> --- a/drivers/ata/libata-sata.c
>> +++ b/drivers/ata/libata-sata.c
>> @@ -667,8 +667,18 @@ int sata_link_hardreset(struct ata_link *link, const unsigned int *timing,
>>   	if (rc)
>>   		goto out;
>>   	/* if link is offline nothing more to do */
>> -	if (ata_phys_link_offline(link))
>> +	if (ata_phys_link_offline(link)) {
> 
> This is preceded by a call to sata_link_resume(), which calls
> sata_link_debounce(), and that function makes sure that DET is stable. So if
> after that DET still shows that there is no PHY, there is likely a big problem
> with it and it is super slow to be established.
> 
> In this case, I do not think that doing another hardreset is the right thing to
> do. Have you tried increasing the deadline for hardreset ? That deadline is used
> as the limit for the link debounce too.
> 
> Do you have a specific controller/device where you see this issue ? What exactly
> is the hardware setup where you see this issue ?

While our customer was bringing in and verifying a new disk, a hard reset
on the disk occasionally failed, and no exception log was generated for
the resume and debounce steps.

[   22.864418][ T1285] ahci 0000:76:03.0: Adding to iommu group 23
[   22.870403][ T1285] ahci 0000:76:03.0: controller does not support SXS, disabling CAP_SXS
[   22.878655][ T1285] ahci 0000:76:03.0: SSS flag set, parallel bus scan disabled
[   22.885966][ T1285] ahci 0000:76:03.0: AHCI 0001.0300 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
[   22.894743][ T1285] ahci 0000:76:03.0: flags: 64bit ncq sntf stag pm led clo only pmp fbs slum part ccc ems boh
[   22.905277][ T1285] scsi host0: ahci
[   22.909061][ T1285] scsi host1: ahci
[   22.966463][ T1285] ata1: SATA max UDMA/133 abar m4096@0xa3010000 port 0xa3010100 irq 108
[   22.974629][ T1285] ata2: SATA max UDMA/133 abar m4096@0xa3010000 port 0xa3010180 irq 109
[   25.242373][ T1286] ata1: SATA link down (SStatus 1 SControl 300) <==============
[   25.659901][ T1288] ata2: SATA link down (SStatus 0 SControl 300)
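
For readers decoding the two "link down" lines, SStatus and SControl share
the same field layout: DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11.
The snippet below is a stand-alone sketch of that decoding; struct scr_fields
and both helpers are hypothetical names, not libata code, and the values are
printed in hex in the log.

```c
#include <stdint.h>

/* Hypothetical helper type: the three 4-bit fields of SStatus/SControl. */
struct scr_fields {
	unsigned int det;	/* bits 0-3: device detection */
	unsigned int spd;	/* bits 4-7: interface speed */
	unsigned int ipm;	/* bits 8-11: interface power management */
};

static struct scr_fields scr_decode(uint32_t reg)
{
	struct scr_fields f = {
		.det = reg & 0xf,
		.spd = (reg >> 4) & 0xf,
		.ipm = (reg >> 8) & 0xf,
	};

	return f;
}

/* Text for the SStatus DET values listed in the patch description. */
static const char *sstatus_det_name(unsigned int det)
{
	switch (det) {
	case 0x0:
		return "no device detected, PHY not communicating";
	case 0x1:
		return "device detected, PHY communication not established";
	case 0x3:
		return "device detected, PHY communication established";
	default:
		return "other value (not covered here)";
	}
}
```

Applied to the log, ata1's SStatus 0x1 decodes to DET=1, SPD=0, IPM=0, which
is the "device detected, PHY communication not established" case, while
ata2's SStatus 0x0 has DET=0. SControl 0x300 on both ports decodes to DET=0,
SPD=0, IPM=3.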
> 
> 
> 
>> +		u32 sstatus;
>> +
>> +		if (sata_scr_read(link, SCR_STATUS, &sstatus) == 0 &&
>> +		    (sstatus & 0xf) == 0x1) {
>> +			ata_link_warn(link, "device detected but PHY not ready (SStatus %X), retrying\n",
>> +				      sstatus);
>> +			rc = -EAGAIN;
>> +		}
>> +
>>   		goto out;
>> +	}
>>   
>>   	/* Link is online.  From this point, -ENODEV too is an error. */
>>   	if (online)
> 
> 

Thanks,
Xingui


* Re: [PATCH] ata: libata-sata: retry hardreset when device detected but PHY not established
  2026-04-27  1:51   ` yangxingui
@ 2026-04-27  4:45     ` Damien Le Moal
  0 siblings, 0 replies; 5+ messages in thread
From: Damien Le Moal @ 2026-04-27  4:45 UTC (permalink / raw)
  To: yangxingui, cassel
  Cc: linux-scsi, linux-kernel, liuyonglong, kangfenglong, linux-ide

On 4/27/26 10:51 AM, yangxingui wrote:
> 
> 
> On 2026/4/26 6:53, Damien Le Moal wrote:
>> On 4/25/26 15:04, Xingui Yang wrote:
>>> When sata_link_hardreset() detects that the link is offline, it currently
>>> returns immediately without distinguishing the reason. According to the SATA
>>> specification, the SStatus register's DET field (bits 0-3) indicates:
>>>    - 0x0: No device detected, PHY not communicating
>>>    - 0x1: Device detected but PHY communication not established
>>>    - 0x3: Device detected and PHY communication established
>>>
>>> To improve device detection reliability, add a check: when the link is
>>> offline but the DET field reads 0x1, return -EAGAIN to trigger a retry
>>> rather than giving up immediately.
>>>
>>> Signed-off-by: Xingui Yang <yangxingui@huawei.com>
>>
>> This is a pure ATA patch so please CC the linux-ide list, not the linux-scsi
>> list.
> 
> Ok.
>>
>> Also, please check your mail setup: your email was in my Junk folder.
> 
> Well, the patch was sent using the git send-email command.

The problem is not git send-email but your SMTP server. It probably has
something wrong with its DMARC setup. All your emails end up in my junk folder.

>> This is preceded by a call to sata_link_resume(), which calls
>> sata_link_debounce(), and that function makes sure that DET is stable. So if
>> after that DET still shows that there is no PHY, there is likely a big problem
>> with it and it is super slow to be established.
>>
>> In this case, I do not think that doing another hardreset is the right thing to
>> do. Have you tried increasing the deadline for hardreset ? That deadline is used
>> as the limit for the link debounce too.
>>
>> Do you have a specific controller/device where you see this issue ? What exactly
>> is the hardware setup where you see this issue ?
> 
> While our customer was bringing in and verifying a new disk, a hard reset
> on the disk occasionally failed, and no exception log was generated for
> the resume and debounce steps.

Does this hold for all disks or for only one or some models ?

> 
> [   22.864418][ T1285] ahci 0000:76:03.0: Adding to iommu group 23
> [   22.870403][ T1285] ahci 0000:76:03.0: controller does not support SXS, disabling CAP_SXS
> [   22.878655][ T1285] ahci 0000:76:03.0: SSS flag set, parallel bus scan disabled
> [   22.885966][ T1285] ahci 0000:76:03.0: AHCI 0001.0300 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
> [   22.894743][ T1285] ahci 0000:76:03.0: flags: 64bit ncq sntf stag pm led clo only pmp fbs slum part ccc ems boh
> [   22.905277][ T1285] scsi host0: ahci
> [   22.909061][ T1285] scsi host1: ahci
> [   22.966463][ T1285] ata1: SATA max UDMA/133 abar m4096@0xa3010000 port 0xa3010100 irq 108
> [   22.974629][ T1285] ata2: SATA max UDMA/133 abar m4096@0xa3010000 port 0xa3010180 irq 109
> [   25.242373][ T1286] ata1: SATA link down (SStatus 1 SControl 300) <==============
> [   25.659901][ T1288] ata2: SATA link down (SStatus 0 SControl 300)
>>
>>
>>
>>> +		u32 sstatus;
>>> +
>>> +		if (sata_scr_read(link, SCR_STATUS, &sstatus) == 0 &&
>>> +		    (sstatus & 0xf) == 0x1) {
>>> +			ata_link_warn(link, "device detected but PHY not ready (SStatus %X), retrying\n",
>>> +				      sstatus);
>>> +			rc = -EAGAIN;
>>> +		}
>>> +
>>>  		goto out;
>>> +	}
>>>  
>>>  	/* Link is online.  From this point, -ENODEV too is an error. */
>>>  	if (online)
>>
>>
> 
> Thanks,
> Xingui
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] ata: libata-sata: retry hardreset when device detected but PHY not established
  2026-04-25  6:04 [PATCH] ata: libata-sata: retry hardreset when device detected but PHY not established Xingui Yang
  2026-04-25 22:53 ` Damien Le Moal
@ 2026-04-27 13:17 ` Niklas Cassel
  1 sibling, 0 replies; 5+ messages in thread
From: Niklas Cassel @ 2026-04-27 13:17 UTC (permalink / raw)
  To: Xingui Yang; +Cc: dlemoal, linux-scsi, linux-kernel, liuyonglong, kangfenglong

On Sat, Apr 25, 2026 at 02:04:47PM +0800, Xingui Yang wrote:
> When sata_link_hardreset() detects that the link is offline, it currently
> returns immediately without distinguishing the reason. According to the SATA
> specification, the SStatus register's DET field (bits 0-3) indicates:
>   - 0x0: No device detected, PHY not communicating
>   - 0x1: Device detected but PHY communication not established
>   - 0x3: Device detected and PHY communication established
> 
> To improve device detection reliability, add a check: when the link is
> offline but the DET field reads 0x1, return -EAGAIN to trigger a retry
> rather than giving up immediately.
> 
> Signed-off-by: Xingui Yang <yangxingui@huawei.com>
> ---
>  drivers/ata/libata-sata.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
> index b9d635088f5f..e5bb92c38e38 100644
> --- a/drivers/ata/libata-sata.c
> +++ b/drivers/ata/libata-sata.c
> @@ -667,8 +667,18 @@ int sata_link_hardreset(struct ata_link *link, const unsigned int *timing,
>  	if (rc)
>  		goto out;
>  	/* if link is offline nothing more to do */
> -	if (ata_phys_link_offline(link))
> +	if (ata_phys_link_offline(link)) {
> +		u32 sstatus;
> +
> +		if (sata_scr_read(link, SCR_STATUS, &sstatus) == 0 &&
> +		    (sstatus & 0xf) == 0x1) {
> +			ata_link_warn(link, "device detected but PHY not ready (SStatus %X), retrying\n",
> +				      sstatus);
> +			rc = -EAGAIN;
> +		}
> +

This looks like you are more or less duplicating the function
ata_eh_link_established(), introduced in commit 4371fe1ba400 ("ata:
libata-eh: Avoid unnecessary resets when revalidating devices").

Could you perhaps try to reuse this function?

(It is currently private, so you would need to make it public.)


Kind regards,
Niklas
