Linux ATA/IDE development
* libahci driver and power switching HDD on newer kernels
@ 2024-09-15 12:44 W
  2024-09-24  5:20 ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 8+ messages in thread
From: W @ 2024-09-15 12:44 UTC (permalink / raw)
  To: linux-ide

[-- Attachment #1: Type: text/plain, Size: 2303 bytes --]

Hi,

I'm having problems with newer kernels when hibernating and when
waking up from hibernation.
The symptom is that HDD power switching happens after I run the
pm-hibernate command, and again during the wake-up process.

By "HDD power switching" I mean powering off my HDD and immediately
powering it on again. In my case this takes about 1 second.

Here are more details about both operations, hibernation and
waking up:

Hibernation process:
1. in a shell I type: "pm-hibernate"
2. the kernel prepares the system for hibernation
3. the HDD is powered off and immediately powered on again - the power
switch takes about one second or less
4. the kernel saves the memory image to the swap partition, printing
progress as a percentage
5. the PC and HDD are powered off

Waking up process:
1. my PC and HDD are powered off; I press any key on my keyboard to
power up the PC
2. the kernel starts, recognizes that there is a memory image on swap
and starts loading it, printing progress as a percentage
3. the HDD is powered off and immediately powered on again - the power
switch takes about one second or less
4. the system is ready to use and works fine

To sum up: in both processes described above, the problematic step is
step 3.

I noticed this issue when I switched from kernel 6.4.12 to 6.7.5. I
haven't used git bisect yet to find the exact offending commit, so all
I know is that the change was introduced somewhere between 6.4.12 and
6.7.5.
With 6.4.12 I have no such issues with HDD power switching. The issue
exists in 6.7.5 and newer kernels, and probably in some of the releases
between 6.4.12 and 6.7.5 as well.

I noticed some errors in dmesg coming from the ahci driver, like these:
Sep 11 15:49:30 localhost kernel: ahci 0000:00:17.0: port does not
support device sleep

and ACPI BIOS errors like these:
Sep 11 15:49:30 localhost kernel: ACPI BIOS Error (bug): Could not
resolve symbol [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND
(20240322/psargs-330)

Please take a look at the dmesg output included in the attachment.

The mainboard I use is a Gigabyte Z370 HD3P with the newest available
BIOS update.

I'd like to ask whether this is a known issue and, if so, how I could
fix it. I'm not quite sure where exactly the issue is: in the kernel or
in my Gigabyte BIOS?

[-- Attachment #2: kern_log.txt --]
[-- Type: text/plain, Size: 8797 bytes --]

Sep 11 15:48:14 localhost kernel: EXT4-fs (sdc3): re-mounted 74f580dd-0819-46ea-bb1d-761996f80da2 r/w. Quota mode: none.
Sep 11 15:48:14 localhost kernel: PM: hibernation: hibernation entry
Sep 11 15:48:14 localhost kernel: Filesystems sync: 0.001 seconds
Sep 11 15:49:30 localhost kernel: Freezing user space processes
Sep 11 15:49:30 localhost kernel: Freezing user space processes completed (elapsed 0.001 seconds)
Sep 11 15:49:30 localhost kernel: OOM killer disabled.
Sep 11 15:49:30 localhost kernel: PM: hibernation: Preallocating image memory
Sep 11 15:49:30 localhost kernel: PM: hibernation: Allocated 766253 pages for snapshot
Sep 11 15:49:30 localhost kernel: PM: hibernation: Allocated 3065012 kbytes in 0.36 seconds (8513.92 MB/s)
Sep 11 15:49:30 localhost kernel: Freezing remaining freezable tasks
Sep 11 15:49:30 localhost kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
Sep 11 15:49:30 localhost kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Sep 11 15:49:30 localhost kernel: serial 00:04: disabled
Sep 11 15:49:30 localhost kernel: parport_pc 00:02: disabled
Sep 11 15:49:30 localhost kernel: ata2.00: Entering standby power mode
Sep 11 15:49:30 localhost kernel: ata4.00: Entering standby power mode
Sep 11 15:49:30 localhost kernel: ata1.00: Entering standby power mode
Sep 11 15:49:30 localhost kernel: ACPI: PM: Preparing to enter system sleep state S4
Sep 11 15:49:30 localhost kernel: ACPI: PM: Saving platform NVS memory
Sep 11 15:49:30 localhost kernel: Disabling non-boot CPUs ...
Sep 11 15:49:30 localhost kernel: smpboot: CPU 1 is now offline
Sep 11 15:49:30 localhost kernel: smpboot: CPU 2 is now offline
Sep 11 15:49:30 localhost kernel: smpboot: CPU 3 is now offline
Sep 11 15:49:30 localhost kernel: smpboot: CPU 4 is now offline
Sep 11 15:49:30 localhost kernel: smpboot: CPU 5 is now offline
Sep 11 15:49:30 localhost kernel: smpboot: CPU 6 is now offline
Sep 11 15:49:30 localhost kernel: smpboot: CPU 7 is now offline
Sep 11 15:49:30 localhost kernel: smpboot: CPU 8 is now offline
Sep 11 15:49:30 localhost kernel: smpboot: CPU 9 is now offline
Sep 11 15:49:30 localhost kernel: smpboot: CPU 10 is now offline
Sep 11 15:49:30 localhost kernel: smpboot: CPU 11 is now offline
Sep 11 15:49:30 localhost kernel: PM: hibernation: Creating image:
Sep 11 15:49:30 localhost kernel: PM: hibernation: Need to copy 756937 pages
Sep 11 15:49:30 localhost kernel: ACPI: PM: Restoring platform NVS memory
Sep 11 15:49:30 localhost kernel: Enabling non-boot CPUs ...
Sep 11 15:49:30 localhost kernel: smpboot: Booting Node 0 Processor 1 APIC 0x2
Sep 11 15:49:30 localhost kernel: CPU1 is up
Sep 11 15:49:30 localhost kernel: smpboot: Booting Node 0 Processor 2 APIC 0x4
Sep 11 15:49:30 localhost kernel: CPU2 is up
Sep 11 15:49:30 localhost kernel: smpboot: Booting Node 0 Processor 3 APIC 0x6
Sep 11 15:49:30 localhost kernel: CPU3 is up
Sep 11 15:49:30 localhost kernel: smpboot: Booting Node 0 Processor 4 APIC 0x8
Sep 11 15:49:30 localhost kernel: CPU4 is up
Sep 11 15:49:30 localhost kernel: smpboot: Booting Node 0 Processor 5 APIC 0xa
Sep 11 15:49:30 localhost kernel: CPU5 is up
Sep 11 15:49:30 localhost kernel: smpboot: Booting Node 0 Processor 6 APIC 0x1
Sep 11 15:49:30 localhost kernel: CPU6 is up
Sep 11 15:49:30 localhost kernel: smpboot: Booting Node 0 Processor 7 APIC 0x3
Sep 11 15:49:30 localhost kernel: CPU7 is up
Sep 11 15:49:30 localhost kernel: smpboot: Booting Node 0 Processor 8 APIC 0x5
Sep 11 15:49:30 localhost kernel: CPU8 is up
Sep 11 15:49:30 localhost kernel: smpboot: Booting Node 0 Processor 9 APIC 0x7
Sep 11 15:49:30 localhost kernel: CPU9 is up
Sep 11 15:49:30 localhost kernel: smpboot: Booting Node 0 Processor 10 APIC 0x9
Sep 11 15:49:30 localhost kernel: CPU10 is up
Sep 11 15:49:30 localhost kernel: smpboot: Booting Node 0 Processor 11 APIC 0xb
Sep 11 15:49:30 localhost kernel: CPU11 is up
Sep 11 15:49:30 localhost kernel: ACPI: PM: Waking up from system sleep state S4
Sep 11 15:49:30 localhost kernel: usb usb1: root hub lost power or was reset
Sep 11 15:49:30 localhost kernel: usb usb2: root hub lost power or was reset
Sep 11 15:49:30 localhost kernel: usb usb3: root hub lost power or was reset
Sep 11 15:49:30 localhost kernel: usb usb4: root hub lost power or was reset
Sep 11 15:49:30 localhost kernel: parport_pc 00:02: activated
Sep 11 15:49:30 localhost kernel: serial 00:04: activated
Sep 11 15:49:30 localhost kernel: sd 0:0:0:0: [sda] Starting disk
Sep 11 15:49:30 localhost kernel: sd 1:0:0:0: [sdb] Starting disk
Sep 11 15:49:30 localhost kernel: sd 3:0:0:0: [sdc] Starting disk
Sep 11 15:49:30 localhost kernel: ata5: SATA link down (SStatus 4 SControl 300)
Sep 11 15:49:30 localhost kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 11 15:49:30 localhost kernel: ata3: SATA link down (SStatus 4 SControl 300)
Sep 11 15:49:30 localhost kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 11 15:49:30 localhost kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.PRT1._GTF.DSSP], AE_NOT_FOUND (20240322/psargs-330)
Sep 11 15:49:30 localhost kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.PRT1._GTF due to previous error (AE_NOT_FOUND) (20240322/psparse-529)
Sep 11 15:49:30 localhost kernel: ata2.00: supports DRM functions and may not be fully accessible
Sep 11 15:49:30 localhost kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20240322/psargs-330)
Sep 11 15:49:30 localhost kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.PRT0._GTF due to previous error (AE_NOT_FOUND) (20240322/psparse-529)
Sep 11 15:49:30 localhost kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 11 15:49:30 localhost kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 11 15:49:30 localhost kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20240322/psargs-330)
Sep 11 15:49:30 localhost kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.PRT0._GTF due to previous error (AE_NOT_FOUND) (20240322/psparse-529)
Sep 11 15:49:30 localhost kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.PRT3._GTF.DSSP], AE_NOT_FOUND (20240322/psargs-330)
Sep 11 15:49:30 localhost kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.PRT3._GTF due to previous error (AE_NOT_FOUND) (20240322/psparse-529)
Sep 11 15:49:30 localhost kernel: ata1.00: configured for UDMA/133
Sep 11 15:49:30 localhost kernel: ata1.00: Entering active power mode
Sep 11 15:49:30 localhost kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.PRT5._GTF.DSSP], AE_NOT_FOUND (20240322/psargs-330)
Sep 11 15:49:30 localhost kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.PRT5._GTF due to previous error (AE_NOT_FOUND) (20240322/psparse-529)
Sep 11 15:49:30 localhost kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.PRT1._GTF.DSSP], AE_NOT_FOUND (20240322/psargs-330)
Sep 11 15:49:30 localhost kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.PRT1._GTF due to previous error (AE_NOT_FOUND) (20240322/psparse-529)
Sep 11 15:49:30 localhost kernel: ata2.00: supports DRM functions and may not be fully accessible
Sep 11 15:49:30 localhost kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.PRT5._GTF.DSSP], AE_NOT_FOUND (20240322/psargs-330)
Sep 11 15:49:30 localhost kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.PRT5._GTF due to previous error (AE_NOT_FOUND) (20240322/psparse-529)
Sep 11 15:49:30 localhost kernel: ata6.00: configured for UDMA/100
Sep 11 15:49:30 localhost kernel: ata2.00: configured for UDMA/133
Sep 11 15:49:30 localhost kernel: ata2.00: Entering active power mode
Sep 11 15:49:30 localhost kernel: ahci 0000:00:17.0: port does not support device sleep
Sep 11 15:49:30 localhost kernel: ata2.00: Enabling discard_zeroes_data
Sep 11 15:49:30 localhost kernel: usb 1-10: reset low-speed USB device number 2 using xhci_hcd
Sep 11 15:49:30 localhost kernel: OOM killer enabled.
Sep 11 15:49:30 localhost kernel: Restarting tasks ... done.
Sep 11 15:49:30 localhost kernel: PM: hibernation: hibernation exit
Sep 11 15:49:30 localhost kernel: EXT4-fs (sdc3): re-mounted 74f580dd-0819-46ea-bb1d-761996f80da2 r/w. Quota mode: none.
Sep 11 15:49:31 localhost kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.PRT3._GTF.DSSP], AE_NOT_FOUND (20240322/psargs-330)
Sep 11 15:49:31 localhost kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.PRT3._GTF due to previous error (AE_NOT_FOUND) (20240322/psparse-529)
Sep 11 15:49:31 localhost kernel: ata4.00: configured for UDMA/133
Sep 11 15:49:32 localhost kernel: e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: libahci driver and power switching HDD on newer kernels
  2024-09-15 12:44 libahci driver and power switching HDD on newer kernels W
@ 2024-09-24  5:20 ` Linux regression tracking (Thorsten Leemhuis)
  2024-09-24  7:31   ` Damien Le Moal
  0 siblings, 1 reply; 8+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-09-24  5:20 UTC (permalink / raw)
  To: W; +Cc: linux-ide, Linux kernel regressions list, Damien Le Moal,
	Niklas Cassel

Hi!

On 15.09.24 14:44, W wrote:
> 
> I've got some problems with newer kernels during hibernation and
> waking up from hibernation.
> The symptom of the issue is that there is HDD power switching executed
> after I run pm-hibernate command and the same HDD power switching
> during the wake up process from hibernation.

Thx for the report. I CCed Damien and Niklas (and the regressions list),
maybe they have an idea. If they do not reply you most likely need to
perform a bisection to raise interest in your report.

And FWIW, there is one important piece of information that afaics is
missing from your report: is the latest mainline kernel (e.g. 6.11)
still affected?

Ciao, Thorsten


* Re: libahci driver and power switching HDD on newer kernels
  2024-09-24  5:20 ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-09-24  7:31   ` Damien Le Moal
  2024-09-24 10:42     ` W
  0 siblings, 1 reply; 8+ messages in thread
From: Damien Le Moal @ 2024-09-24  7:31 UTC (permalink / raw)
  To: Linux regressions mailing list, W; +Cc: linux-ide, Niklas Cassel

On 2024/09/24 7:20, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi!
> 
> On 15.09.24 14:44, W wrote:
>>
>> I've got some problems with newer kernels during hibernation and
>> waking up from hibernation.
>> The symptom of the issue is that there is HDD power switching executed
>> after I run pm-hibernate command and the same HDD power switching
>> during the wake up process from hibernation.
> 
> Thx for the report. I CCed Damien and Niklas (and the regressions list),
> maybe they have an idea. If they do not reply you most likely need to
> perform a bisection to raise interest in your report.

Thanks for that. Niklas and I will look at this (I thought we already had fixed
that issue...). However, this week may be difficult as Niklas and I are
attending a conference and we may not have time to do much. But this will be our
highest priority next week.

> 
> And FWIW, there is one important info that afaics is missing from your
> report: If the latest mainline kernel (e.g. 6.11) still affected?

Yes, please check that.

> 
> Ciao, Thorsten
> 
>> As the HDD power switch process I mean: powering off my HDD and
>> immediately after that powering on the HDD. This process in my case
>> takes about 1 second.
>>
>> Here there are more details about both operations: hibernation and
>> waking up:
>>
>> Hibernation process:
>> 1. in shell I type: "pm-hibernate"
>> 2. kernel is preparing system to hibernation
>> 3. hdd is powered off and immediately powered on - it takes about one
>> second or less to do the power switch

I think that the issue is that this HDD power-off is done way too early, as
saving the mem image generates IOs which wake up the drive again. The 1s it
takes is because the HDD started spinning down and needs to be spun up again.

This is likely because the mem image must be prepared with the devices already
suspended so that, when the image is restored, the restarted kernel sees the
devices suspended (as they should be). For this case, libata (or scsi SD PM
code, not sure which one) should suspend the device logically but not spin down
the physical HDD...

>> 4. kernel is saving mem image to swap partition with printing progress
>> in percentage
>> 5. PC and HDD are powered off

... and the power off of the HDD should happen here.
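
A minimal userspace sketch of that sequence, not kernel code: `disk_model`,
`simulate_hibernate()` and the reduction of hibernation to three phases
(freeze, image write, poweroff) are assumptions made for illustration. It only
counts how often the platters would stop and start, depending on whether the
freeze phase spins the disk down.

```c
#include <stdbool.h>

/* Hypothetical model of a disk: tracks only whether the platters spin. */
struct disk_model {
	bool spinning;
	int spin_downs;
	int spin_ups;
};

/* Put the disk in standby; counts a spin-down only if it was spinning. */
static void disk_standby(struct disk_model *d)
{
	if (d->spinning) {
		d->spinning = false;
		d->spin_downs++;
	}
}

/* Any I/O needs spinning platters; counts a spin-up if they had stopped. */
static void disk_io(struct disk_model *d)
{
	if (!d->spinning) {
		d->spinning = true;
		d->spin_ups++;
	}
}

/*
 * Simplified hibernation: freeze -> write image -> poweroff.
 * spin_down_on_freeze selects whether the freeze phase also puts the
 * disk in standby (the behavior reported in this thread) or leaves it
 * spinning (the behavior asked for).
 */
static struct disk_model simulate_hibernate(bool spin_down_on_freeze)
{
	struct disk_model d = { .spinning = true };

	/* freeze: devices are quiesced so the image can be created */
	if (spin_down_on_freeze)
		disk_standby(&d);

	/* image write: I/O to the swap partition */
	disk_io(&d);

	/* poweroff: final spin-down before the machine is switched off */
	disk_standby(&d);

	return d;
}
```

Under these assumptions, `simulate_hibernate(true)` ends with two spin-downs
and one spin-up (the power switch described in step 3), while
`simulate_hibernate(false)` ends with a single spin-down at poweroff.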

>>
>> Waking up process:
>> 1. my PC and HDD are powered off, I'm pressing any key on my keyboard
>> so it is powering up my PC
>> 2. kernel is starting, recognizes that there is mem image on swap and
>> starting to load it - printing progress percentage
>> 3. hdd is powered off and immediately powered on - it takes about one
>> second or less to do the power switch

That is a weird one, but again I think it is due to the IO activity that loading
the mem image generates. Similarly, I think this is a mishandling of the device
logical state vs physical power state...

>> 4. system is ready to use and working fine
>>
>> So to sum up - in both processes described above the problematic step
>> is the step 3.
>>
>> I noticed this issue when I switched kernel 6.4.12 to 6.7.5. So far I
>> haven't used git bisect yet to find the exact offending commit so the
>> change might be introduced somewhere between 6.4.12 and 6.7.5.

It would really be helpful if you could bisect this!

>> In 6.4.12 I have not such an issues with HDD power switching. The
>> issues exist in 6.7.5 and newer ones and probably somewhere between
>> 6.4.12 and 6.7.5.
>>
>> I noticed some errors in dmesg coming from ahci driver like these:
>> Sep 11 15:49:30 localhost kernel: ahci 0000:00:17.0: port does not
>> support device sleep

This is relevant. This is not really an error but rather a statement that your
device does not support sleep. This could be the reason for the behavior (hence
my request: if you could test with a different device, it would help).

>> and ACPI BIOS errors like these:
>> Sep 11 15:49:30 localhost kernel: ACPI BIOS Error (bug): Could not
>> resolve symbol [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND
>> (20240322/psargs-330)

Yeah... I see these all the time on different machines. That comes from the BIOS
not being great :) Not sure if that one relates to SATA though?

>> Please take a look at included dmesg in the attachment.
>>
>> The mainboard I use is: Gigabyte Z370 HD3P with the newest available
>> BIOS update.
>>
>> I'd like to ask if this is known issue and if yes how could I fix it?
>> I'm not quite sure where exactly is the issue: in kernel or in my
>> Gigabyte BIOS?

Given that you had 6.4.12 working OK, it is likely that some commit introduced a
regression. If you can git bisect it, we will have a better idea of how to remove
the regression.

Best regards.

-- 
Damien Le Moal
Western Digital Research

* Re: libahci driver and power switching HDD on newer kernels
  2024-09-24  7:31   ` Damien Le Moal
@ 2024-09-24 10:42     ` W
  2024-10-02 21:20       ` Niklas Cassel
  0 siblings, 1 reply; 8+ messages in thread
From: W @ 2024-09-24 10:42 UTC (permalink / raw)
  To: Damien Le Moal, Linux regressions mailing list; +Cc: linux-ide, Niklas Cassel

On 24.09.2024 at 9:31 AM, Damien Le Moal wrote:
> On 2024/09/24 7:20, Linux regression tracking (Thorsten Leemhuis) wrote:
>> Hi!

Hi
>> On 15.09.24 14:44, W wrote:
>>>
>>> I've got some problems with newer kernels during hibernation and
>>> waking up from hibernation.
>>> The symptom of the issue is that there is HDD power switching executed
>>> after I run pm-hibernate command and the same HDD power switching
>>> during the wake up process from hibernation.
>>
>> Thx for the report. I CCed Damien and Niklas (and the regressions list),
>> maybe they have an idea. If they do not reply you most likely need to
>> perform a bisection to raise interest in your report.
> 
> Thanks for that. Niklas and I will look at this (I thought we already had fixed
> that issue...). However, this week may be difficult as Niklas and I are
> attending a conference and we may not have time to do much. But this will be our
> highest priority next week.
> 
>>
>> And FWIW, there is one important info that afaics is missing from your
>> report: If the latest mainline kernel (e.g. 6.11) still affected?
> 
> Yes, please check that.

I've already reported the issue here:
https://bugzilla.kernel.org/show_bug.cgi?id=219296
and unfortunately the issue also exists on 6.11.

>>> 4. system is ready to use and working fine
>>>
>>> So to sum up - in both processes described above the problematic step
>>> is the step 3.
>>>
>>> I noticed this issue when I switched kernel 6.4.12 to 6.7.5. So far I
>>> haven't used git bisect yet to find the exact offending commit so the
>>> change might be introduced somewhere between 6.4.12 and 6.7.5.
> 
> It would really be helpful if you could bisect this !

I did that and put the bisect log in my bugzilla report:
https://bugzilla.kernel.org/show_bug.cgi?id=219296

>>> In 6.4.12 I have not such an issues with HDD power switching. The
>>> issues exist in 6.7.5 and newer ones and probably somewhere between
>>> 6.4.12 and 6.7.5.
>>>
>>> I noticed some errors in dmesg coming from ahci driver like these:
>>> Sep 11 15:49:30 localhost kernel: ahci 0000:00:17.0: port does not
>>> support device sleep
> 
> This is relevant. This is not really an error but rather a statement that your
> device does not support sleep. This could be the reason for the behavior. (hence
> my request that if you could test with a different device, it would be help).
> 
>>> and ACPI BIOS errors like these:
>>> Sep 11 15:49:30 localhost kernel: ACPI BIOS Error (bug): Could not
>>> resolve symbol [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND
>>> (20240322/psargs-330)
> 
> Yeah... I see these all the time on different machines. That comes from the BIOS
> not being great :) Not sure if that one relates to SATA though ?

I tried to report this issue to American Megatrends Inc., as the BIOS
maintainer, but their website form didn't accept any of my emails :)
So in the end I couldn't let them know about these issues.

>>> Please take a look at included dmesg in the attachment.
>>>
>>> The mainboard I use is: Gigabyte Z370 HD3P with the newest available
>>> BIOS update.
>>>
>>> I'd like to ask if this is known issue and if yes how could I fix it?
>>> I'm not quite sure where exactly is the issue: in kernel or in my
>>> Gigabyte BIOS?
> 
> Given that you had 6.4.12 working OK, it is likely some commit that introduced a
> regression. If you can git bisect it, we will have a better idea how to remove
> the regression.

Please take a look at the bugzilla report:
https://bugzilla.kernel.org/show_bug.cgi?id=219296 - the details are there.

I'm wondering which is the better way to communicate: here on the mailing
list, or in comments on the bugzilla ticket?
Probably here is the better idea...

W


* Re: libahci driver and power switching HDD on newer kernels
  2024-09-24 10:42     ` W
@ 2024-10-02 21:20       ` Niklas Cassel
  2024-10-02 22:47         ` Damien Le Moal
  2024-10-03  7:01         ` W
  0 siblings, 2 replies; 8+ messages in thread
From: Niklas Cassel @ 2024-10-02 21:20 UTC (permalink / raw)
  To: W; +Cc: Damien Le Moal, Linux regressions mailing list, linux-ide

On Tue, Sep 24, 2024 at 12:42:10PM +0200, W wrote:
> > 
> > Given that you had 6.4.12 working OK, it is likely some commit that introduced a
> > regression. If you can git bisect it, we will have a better idea how to remove
> > the regression.
> 
> Please take a look at bugzilla report:
> https://bugzilla.kernel.org/show_bug.cgi?id=219296 - there are the details.
> 
> I'm wondering what is the better way for communication - here on mailing
> list or put the comments in bugzilla ticket?
> Probably here will be better idea...
> 
> W
> 

Hello W,

Could you please try the following patch,
and see if it helps:


From dba01b7d68fffc26f3abf3252296082311a767a0 Mon Sep 17 00:00:00 2001
From: Niklas Cassel <cassel@kernel.org>
Date: Wed, 2 Oct 2024 21:40:41 +0200
Subject: [PATCH] ata: libata: do not spin down disk on PM event freeze

Currently, ata_eh_handle_port_suspend() will return early if
ATA_PFLAG_PM_PENDING is not set, or if the PM event has flag
PM_EVENT_RESUME set.

This means that the following PM callbacks:
.suspend = ata_port_pm_suspend,
.freeze = ata_port_pm_freeze,
.poweroff = ata_port_pm_poweroff,
.runtime_suspend = ata_port_runtime_suspend,
will actually make ata_eh_handle_port_suspend() perform some work.

ata_eh_handle_port_suspend() will spin down the disks (by calling
ata_dev_power_set_standby()), regardless of the PM event.

Documentation/driver-api/pm/devices.rst, section "Entering Hibernation",
explicitly mentions that .freeze() does not have to put the device in
a low-power state, and actually recommends not doing so. Thus, let's not
spin down the disk for the .freeze() callback. (The disk will instead be
spun down during the succeeding .poweroff() callback.)

Fixes: aa3998dbeb3a ("ata: libata-scsi: Disable scsi device manage_system_start_stop")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
 drivers/ata/libata-eh.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 3f0144e7dc80..45a0d9af2d54 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -4099,10 +4099,20 @@ static void ata_eh_handle_port_suspend(struct ata_port *ap)
 
        WARN_ON(ap->pflags & ATA_PFLAG_SUSPENDED);
 
-       /* Set all devices attached to the port in standby mode */
-       ata_for_each_link(link, ap, HOST_FIRST) {
-               ata_for_each_dev(dev, link, ENABLED)
-                       ata_dev_power_set_standby(dev);
+       /*
+        * We will reach this point for all of the PM events:
+        * PM_EVENT_SUSPEND (if runtime pm, PM_EVENT_AUTO will also be set)
+        * PM_EVENT_FREEZE, and PM_EVENT_HIBERNATE.
+        *
+        * We do not want to perform disk spin down for PM_EVENT_FREEZE.
+        * (Spin down will be performed by the succeeding PM_EVENT_HIBERNATE.)
+        */
+       if (!(ap->pm_mesg.event & PM_EVENT_FREEZE)) {
+               /* Set all devices attached to the port in standby mode */
+               ata_for_each_link(link, ap, HOST_FIRST) {
+                       ata_for_each_dev(dev, link, ENABLED)
+                               ata_dev_power_set_standby(dev);
+               }
        }
 
        /*
-- 
2.46.2
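
The check added by the patch can be exercised in isolation. This is a minimal
sketch, assuming the PM_EVENT_* values below match include/linux/pm.h (they are
copied here by hand, so check them against your tree);
`port_suspend_does_work()` and `should_spin_down()` are hypothetical helper
names, not libata functions:

```c
#include <stdbool.h>

/* Assumed to mirror include/linux/pm.h; verify against your kernel tree. */
#define PM_EVENT_FREEZE		0x0001
#define PM_EVENT_SUSPEND	0x0002
#define PM_EVENT_HIBERNATE	0x0004
#define PM_EVENT_RESUME		0x0010
#define PM_EVENT_AUTO		0x0400
#define PM_EVENT_AUTO_SUSPEND	(PM_EVENT_AUTO | PM_EVENT_SUSPEND)

/*
 * Hypothetical helper modelling the early return described in the commit
 * message: no work is done unless a PM operation is pending, and nothing
 * is done for events carrying PM_EVENT_RESUME.
 */
static bool port_suspend_does_work(bool pm_pending, int event)
{
	return pm_pending && !(event & PM_EVENT_RESUME);
}

/*
 * Hypothetical helper modelling the condition added by the patch:
 * spin the disk down for every event except PM_EVENT_FREEZE.
 */
static bool should_spin_down(int event)
{
	return !(event & PM_EVENT_FREEZE);
}
```

With these definitions, `should_spin_down()` is false only for
PM_EVENT_FREEZE; PM_EVENT_SUSPEND, PM_EVENT_AUTO_SUSPEND and PM_EVENT_HIBERNATE
still lead to a spin-down.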

* Re: libahci driver and power switching HDD on newer kernels
  2024-10-02 21:20       ` Niklas Cassel
@ 2024-10-02 22:47         ` Damien Le Moal
  2024-10-07 16:29           ` Niklas Cassel
  2024-10-03  7:01         ` W
  1 sibling, 1 reply; 8+ messages in thread
From: Damien Le Moal @ 2024-10-02 22:47 UTC (permalink / raw)
  To: Niklas Cassel, W; +Cc: Linux regressions mailing list, linux-ide

On 10/3/24 06:20, Niklas Cassel wrote:
> On Tue, Sep 24, 2024 at 12:42:10PM +0200, W wrote:
>>>
>>> Given that you had 6.4.12 working OK, it is likely some commit that introduced a
>>> regression. If you can git bisect it, we will have a better idea how to remove
>>> the regression.
>>
>> Please take a look at bugzilla report:
>> https://bugzilla.kernel.org/show_bug.cgi?id=219296 - there are the details.
>>
>> I'm wondering what is the better way for communication - here on mailing
>> list or put the comments in bugzilla ticket?
>> Probably here will be better idea...
>>
>> W
>>
> 
> Hello W,
> 
> Could you please try the following patch,
> and see if it helps:
> 
> 
> From dba01b7d68fffc26f3abf3252296082311a767a0 Mon Sep 17 00:00:00 2001
> From: Niklas Cassel <cassel@kernel.org>
> Date: Wed, 2 Oct 2024 21:40:41 +0200
> Subject: [PATCH] ata: libata: do not spin down disk on PM event freeze
> 
> Currently, ata_eh_handle_port_suspend() will return early if
> ATA_PFLAG_PM_PENDING is not set, or if the PM event has flag
> PM_EVENT_RESUME set.
> 
> This means that the following PM callbacks:
> .suspend = ata_port_pm_suspend,
> .freeze = ata_port_pm_freeze,
> .poweroff = ata_port_pm_poweroff,
> .runtime_suspend = ata_port_runtime_suspend,
> will actually make ata_eh_handle_port_suspend() perform some work.
> 
> ata_eh_handle_port_suspend() will spin down the disks (by calling
> ata_dev_power_set_standby()), regardless of the PM event.
> 
> Documentation/driver-api/pm/devices.rst, section "Entering Hibernation",
> explicitly mentions that .freeze() does not have to be put the device in
> a low-power state, and actually recommends not doing so. Thus, let's not
> spin down the disk for the .freeze() callback. (The disk will instead be
> spun down during the succeeding .poweroff() callback.)
> 
> Fixes: aa3998dbeb3a ("ata: libata-scsi: Disable scsi device manage_system_start_stop")
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
> ---
>  drivers/ata/libata-eh.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 3f0144e7dc80..45a0d9af2d54 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -4099,10 +4099,20 @@ static void ata_eh_handle_port_suspend(struct ata_port *ap)
>  
>         WARN_ON(ap->pflags & ATA_PFLAG_SUSPENDED);
>  
> -       /* Set all devices attached to the port in standby mode */
> -       ata_for_each_link(link, ap, HOST_FIRST) {
> -               ata_for_each_dev(dev, link, ENABLED)
> -                       ata_dev_power_set_standby(dev);
> +       /*
> +        * We will reach this point for all of the PM events:
> +        * PM_EVENT_SUSPEND (if runtime pm, PM_EVENT_AUTO will also be set)
> +        * PM_EVENT_FREEZE, and PM_EVENT_HIBERNATE.
> +        *
> +        * We do not want to perform disk spin down for PM_EVENT_FREEZE.
> +        * (Spin down will be performed by the succeeding PM_EVENT_HIBERNATE.)
> +        */
> +       if (!(ap->pm_mesg.event & PM_EVENT_FREEZE)) {

This feels odd: not doing anything to the drive for PM_EVENT_FREEZE will still
end up freezing the port, which will later cause a reset. And we still end up
calling the port suspend op and ata_acpi_set_state(), which seems to be doing
nothing for freeze...

So I wonder if a simpler approach would not be to simply remove the
ata_port_pm_freeze() method entirely and do nothing for freeze events?

> +               /* Set all devices attached to the port in standby mode */
> +               ata_for_each_link(link, ap, HOST_FIRST) {
> +                       ata_for_each_dev(dev, link, ENABLED)
> +                               ata_dev_power_set_standby(dev);
> +               }
>         }
>  
>         /*


-- 
Damien Le Moal
Western Digital Research


* Re: libahci driver and power switching HDD on newer kernels
  2024-10-02 21:20       ` Niklas Cassel
  2024-10-02 22:47         ` Damien Le Moal
@ 2024-10-03  7:01         ` W
  1 sibling, 0 replies; 8+ messages in thread
From: W @ 2024-10-03  7:01 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: Damien Le Moal, Linux regressions mailing list, linux-ide

On 2.10.2024 at 11:20 PM, Niklas Cassel wrote:
> On Tue, Sep 24, 2024 at 12:42:10PM +0200, W wrote:
>>>
>>> Given that you had 6.4.12 working OK, it is likely some commit that introduced a
>>> regression. If you can git bisect it, we will have a better idea how to remove
>>> the regression.
>>
>> Please take a look at bugzilla report:
>> https://bugzilla.kernel.org/show_bug.cgi?id=219296 - there are the details.
>>
>> I'm wondering what is the better way for communication - here on mailing
>> list or put the comments in bugzilla ticket?
>> Probably here will be better idea...
>>
>> W
>>
> 
> Hello W,
> 
> Could you please try the following patch,
> and see if it helps:
> 
> 
>  From dba01b7d68fffc26f3abf3252296082311a767a0 Mon Sep 17 00:00:00 2001
> From: Niklas Cassel <cassel@kernel.org>
> Date: Wed, 2 Oct 2024 21:40:41 +0200
> Subject: [PATCH] ata: libata: do not spin down disk on PM event freeze
> 
> Currently, ata_eh_handle_port_suspend() will return early if
> ATA_PFLAG_PM_PENDING is not set, or if the PM event has flag
> PM_EVENT_RESUME set.
> 
> This means that the following PM callbacks:
> .suspend = ata_port_pm_suspend,
> .freeze = ata_port_pm_freeze,
> .poweroff = ata_port_pm_poweroff,
> .runtime_suspend = ata_port_runtime_suspend,
> will actually make ata_eh_handle_port_suspend() perform some work.
> 
> ata_eh_handle_port_suspend() will spin down the disks (by calling
> ata_dev_power_set_standby()), regardless of the PM event.
> 
> Documentation/driver-api/pm/devices.rst, section "Entering Hibernation",
> explicitly mentions that .freeze() does not have to put the device in
> a low-power state, and actually recommends not doing so. Thus, let's not
> spin down the disk for the .freeze() callback. (The disk will instead be
> spun down during the succeeding .poweroff() callback.)
> 
> Fixes: aa3998dbeb3a ("ata: libata-scsi: Disable scsi device manage_system_start_stop")
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
> ---
>   drivers/ata/libata-eh.c | 18 ++++++++++++++----
>   1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 3f0144e7dc80..45a0d9af2d54 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -4099,10 +4099,20 @@ static void ata_eh_handle_port_suspend(struct ata_port *ap)
>   
>          WARN_ON(ap->pflags & ATA_PFLAG_SUSPENDED);
>   
> -       /* Set all devices attached to the port in standby mode */
> -       ata_for_each_link(link, ap, HOST_FIRST) {
> -               ata_for_each_dev(dev, link, ENABLED)
> -                       ata_dev_power_set_standby(dev);
> +       /*
> +        * We will reach this point for all of the PM events:
> +        * PM_EVENT_SUSPEND (if runtime pm, PM_EVENT_AUTO will also be set)
> +        * PM_EVENT_FREEZE, and PM_EVENT_HIBERNATE.
> +        *
> +        * We do not want to perform disk spin down for PM_EVENT_FREEZE.
> +        * (Spin down will be performed by the succeeding PM_EVENT_HIBERNATE.)
> +        */
> +       if (!(ap->pm_mesg.event & PM_EVENT_FREEZE)) {
> +               /* Set all devices attached to the port in standby mode */
> +               ata_for_each_link(link, ap, HOST_FIRST) {
> +                       ata_for_each_dev(dev, link, ENABLED)
> +                               ata_dev_power_set_standby(dev);
> +               }
>          }
>   
>          /*

Hi Niklas, Damien and others,

Niklas, I applied your patch on top of:
commit 9852d85ec9d492ebef56dc5f229416c925758edc (tag: v6.12-rc1)
and gave it a try.

I have done 2 hibernate/wake-up cycles and in both cases it worked
fine: the HDD is no longer powered off and then powered back on.

W



* Re: libahci driver and power switching HDD on newer kernels
  2024-10-02 22:47         ` Damien Le Moal
@ 2024-10-07 16:29           ` Niklas Cassel
  0 siblings, 0 replies; 8+ messages in thread
From: Niklas Cassel @ 2024-10-07 16:29 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: W, Linux regressions mailing list, linux-ide

On Thu, Oct 03, 2024 at 07:47:41AM +0900, Damien Le Moal wrote:
> On 10/3/24 06:20, Niklas Cassel wrote:
> > On Tue, Sep 24, 2024 at 12:42:10PM +0200, W wrote:
> >>>
> >>> Given that you had 6.4.12 working OK, it is likely some commit that introduced a
> >>> regression. If you can git bisect it, we will have a better idea how to remove
> >>> the regression.
> >>
> >> Please take a look at bugzilla report:
> >> https://bugzilla.kernel.org/show_bug.cgi?id=219296 - there are the details.
> >>
> >> I'm wondering what is the better way for communication - here on mailing
> >> list or put the comments in bugzilla ticket?
> >> Probably here will be better idea...
> >>
> >> W
> >>
> > 
> > Hello W,
> > 
> > Could you please try the following patch,
> > and see if it helps:
> > 
> > 
> > From dba01b7d68fffc26f3abf3252296082311a767a0 Mon Sep 17 00:00:00 2001
> > From: Niklas Cassel <cassel@kernel.org>
> > Date: Wed, 2 Oct 2024 21:40:41 +0200
> > Subject: [PATCH] ata: libata: do not spin down disk on PM event freeze
> > 
> > Currently, ata_eh_handle_port_suspend() will return early if
> > ATA_PFLAG_PM_PENDING is not set, or if the PM event has flag
> > PM_EVENT_RESUME set.
> > 
> > This means that the following PM callbacks:
> > .suspend = ata_port_pm_suspend,
> > .freeze = ata_port_pm_freeze,
> > .poweroff = ata_port_pm_poweroff,
> > .runtime_suspend = ata_port_runtime_suspend,
> > will actually make ata_eh_handle_port_suspend() perform some work.
> > 
> > ata_eh_handle_port_suspend() will spin down the disks (by calling
> > ata_dev_power_set_standby()), regardless of the PM event.
> > 
> > Documentation/driver-api/pm/devices.rst, section "Entering Hibernation",
> > explicitly mentions that .freeze() does not have to put the device in
> > a low-power state, and actually recommends not doing so. Thus, let's not
> > spin down the disk for the .freeze() callback. (The disk will instead be
> > spun down during the succeeding .poweroff() callback.)
> > 
> > Fixes: aa3998dbeb3a ("ata: libata-scsi: Disable scsi device manage_system_start_stop")
> > Signed-off-by: Niklas Cassel <cassel@kernel.org>
> > ---
> >  drivers/ata/libata-eh.c | 18 ++++++++++++++----
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> > index 3f0144e7dc80..45a0d9af2d54 100644
> > --- a/drivers/ata/libata-eh.c
> > +++ b/drivers/ata/libata-eh.c
> > @@ -4099,10 +4099,20 @@ static void ata_eh_handle_port_suspend(struct ata_port *ap)
> >  
> >         WARN_ON(ap->pflags & ATA_PFLAG_SUSPENDED);
> >  
> > -       /* Set all devices attached to the port in standby mode */
> > -       ata_for_each_link(link, ap, HOST_FIRST) {
> > -               ata_for_each_dev(dev, link, ENABLED)
> > -                       ata_dev_power_set_standby(dev);
> > +       /*
> > +        * We will reach this point for all of the PM events:
> > +        * PM_EVENT_SUSPEND (if runtime pm, PM_EVENT_AUTO will also be set)
> > +        * PM_EVENT_FREEZE, and PM_EVENT_HIBERNATE.
> > +        *
> > +        * We do not want to perform disk spin down for PM_EVENT_FREEZE.
> > +        * (Spin down will be performed by the succeeding PM_EVENT_HIBERNATE.)
> > +        */
> > +       if (!(ap->pm_mesg.event & PM_EVENT_FREEZE)) {
> 
> This feels odd: not doing anything to the drive for PM_EVENT_FREEZE will still
> end up freezing the port, which will later cause a reset. And we still end up
> calling the port suspend op and ata_acpi_set_state(), which seems to be doing
> nothing for freeze...

Hello Damien,

Hibernation is done by three consecutive PM events, in the following order:
1) PM_EVENT_FREEZE
2) PM_EVENT_THAW
3) PM_EVENT_HIBERNATE

Hibernate before commit aa3998dbeb3a ("ata: libata-scsi: Disable scsi
device manage_system_start_stop"):
-On event PM_EVENT_FREEZE:
 ata_eh_handle_port_suspend() would call ata_eh_freeze_port() and then
 call ap->ops->port_suspend().
-On event PM_EVENT_THAW:
 ata_port_resume() would trigger EH with ATA_EH_RESET, which triggers a
 call to ata_eh_reset(), which will thaw the port, and then later
 ata_eh_handle_port_resume() would call ap->ops->port_resume().
-On event PM_EVENT_HIBERNATE:
 same as PM_EVENT_FREEZE.

After commit aa3998dbeb3a ("ata: libata-scsi: Disable scsi device
manage_system_start_stop"), on event PM_EVENT_FREEZE, and on event
PM_EVENT_HIBERNATE, ata_eh_handle_port_suspend() would call
ata_eh_freeze_port() and then call ap->ops->port_suspend(),
but it would also call ata_dev_power_set_standby() to spin down the disk.

So what my proposed patch does is simply restore
ata_eh_handle_port_suspend() to the behavior before commit aa3998dbeb3a
for event PM_EVENT_FREEZE, but for event PM_EVENT_HIBERNATE,
ata_eh_handle_port_suspend() will continue to spin down the disk.
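
As a rough illustration of the sequence described above, here is a small
userspace sketch (not kernel code). It only models whether
ata_eh_handle_port_suspend() itself would put the disk in standby for each
event of one hibernation cycle. The enum labels and both function names are
hypothetical, and the flag values are assumed to match include/linux/pm.h,
so please check them against your tree. The "before" case models only what
the port suspend handler did on its own, not what other layers may have done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values assumed to mirror include/linux/pm.h (verify in your tree). */
#define PM_EVENT_FREEZE    0x0001
#define PM_EVENT_SUSPEND   0x0002
#define PM_EVENT_HIBERNATE 0x0004
#define PM_EVENT_THAW      0x0020

/* Hypothetical labels for the three behaviors discussed in this thread. */
enum eh_behavior {
	BEFORE_AA3998DBEB3A,	/* port suspend handler never spins down */
	AFTER_AA3998DBEB3A,	/* spins down on every suspend-type event */
	WITH_PROPOSED_PATCH,	/* skips spin down for PM_EVENT_FREEZE only */
};

/* Would the port suspend path put the disk in standby for this event? */
bool eh_spins_down(enum eh_behavior b, int event)
{
	/* This sketch only models the suspend-type events. */
	assert(event & (PM_EVENT_FREEZE | PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE));

	switch (b) {
	case BEFORE_AA3998DBEB3A:
		return false;
	case AFTER_AA3998DBEB3A:
		return true;
	case WITH_PROPOSED_PATCH:
		/* Same test as in the proposed patch. */
		return !(event & PM_EVENT_FREEZE);
	}
	return false;
}

/* Count spin downs over one hibernation: FREEZE, THAW, HIBERNATE. */
int spin_downs_per_hibernate(enum eh_behavior b)
{
	static const int seq[] = {
		PM_EVENT_FREEZE, PM_EVENT_THAW, PM_EVENT_HIBERNATE,
	};
	int n = 0;

	for (size_t i = 0; i < sizeof(seq) / sizeof(seq[0]); i++) {
		/* THAW goes through the resume path, not port suspend. */
		if (seq[i] == PM_EVENT_THAW)
			continue;
		if (eh_spins_down(b, seq[i]))
			n++;
	}
	return n;
}
```

Calling spin_downs_per_hibernate() for each of the three labels shows the
difference: the extra spin down on PM_EVENT_FREEZE only exists in the
AFTER_AA3998DBEB3A case, which would be consistent with the power switching
W reported while the image is being written.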


> 
> So I wonder if a simpler approach would not be to simply remove the
> ata_port_pm_freeze() method entirely and do nothing for freeze events?

I do not think that is a good option, as:
https://docs.kernel.org/driver-api/pm/devices.html#entering-hibernation
clearly states that:
"The ->freeze methods should quiesce the device so that it doesn’t
generate IRQs or DMA."

That is what we do when calling ata_eh_freeze_port().

I think my proposed patch is the simplest thing that we can do to
restore the behavior of ata_eh_handle_port_suspend() for event
PM_EVENT_FREEZE, as it was before commit aa3998dbeb3a.

Thus, I will send out this proposed patch as a real patch shortly.


Kind regards,
Niklas


Thread overview: 8+ messages
2024-09-15 12:44 libahci driver and power switching HDD on newer kernels W
2024-09-24  5:20 ` Linux regression tracking (Thorsten Leemhuis)
2024-09-24  7:31   ` Damien Le Moal
2024-09-24 10:42     ` W
2024-10-02 21:20       ` Niklas Cassel
2024-10-02 22:47         ` Damien Le Moal
2024-10-07 16:29           ` Niklas Cassel
2024-10-03  7:01         ` W
