From: Damien Le Moal <dlemoal@kernel.org>
To: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org,
"Martin K . Petersen" <martin.petersen@oracle.com>,
John Garry <john.g.garry@oracle.com>,
Rodrigo Vivi <rodrigo.vivi@intel.com>,
Paul Ausbeck <paula@soe.ucsc.edu>,
Kai-Heng Feng <kai.heng.feng@canonical.com>,
Joe Breuer <linux-kernel@jmbreuer.net>,
Chia-Lin Kao <acelan.kao@canonical.com>
Subject: Re: [PATCH v7 00/23] Fix libata suspend/resume handling and code cleanup
Date: Fri, 29 Sep 2023 16:29:17 +0200
Message-ID: <b37801fd-dc5f-3d79-a5b3-2fc0008037bf@kernel.org>
In-Reply-To: <CAMuHMdUN0yiMiEjev6gx2tv8eqQoecv6kuHSSztxUhoAvQ9OdA@mail.gmail.com>
On 2023/09/29 15:56, Geert Uytterhoeven wrote:
> Hi Damien,
>
> On Fri, Sep 29, 2023 at 3:37 PM Damien Le Moal <dlemoal@kernel.org> wrote:
>> On 2023/09/28 14:26, Geert Uytterhoeven wrote:
>>> On Tue, Sep 26, 2023 at 10:15 AM Damien Le Moal <dlemoal@kernel.org> wrote:
>>>> The first 9 patches of this series fix several issues with suspend/resume
>>>> power management operations in scsi and libata. The most significant
>>>> changes introduced are in patches 4 and 5, where the manage_start_stop
>>>> flag of scsi devices is split into the manage_system_start_stop and
>>>> manage_runtime_start_stop flags to allow keeping scsi runtime power
>>>> operations for spinning up/down ATA devices but have libata do its own
>>>> system suspend/resume device power state management using EH.
>>>>
>>>> The remaining patches are code cleanups that do not introduce any
>>>> significant functional change.
>>>>
>>>> This series was tested on qemu and on various PCs and servers. I am
>>>> CC-ing people who recently reported issues with suspend/resume.
>>>> Additional testing would be much appreciated.
>>>
>>> JFTR, with current libata/for-next[*], I saw the following with
>>> rcar-sata, once (interesting lines marked with "!"):
>>>
>>> PM: suspend entry (s2idle)
>>> Filesystems sync: 0.026 seconds
>>> Freezing user space processes
>>> ! ata1.00: qc timeout after 10000 msecs (cmd 0x40)
>>> Freezing user space processes completed (elapsed 0.007 seconds)
>>> ! ata1.00: VERIFY failed (err_mask=0x4)
>>> OOM killer disabled.
>>> ! ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
>>> Freezing remaining freezable tasks
>>> ! ata1.00: revalidation failed (errno=-5)
>>> Freezing remaining freezable tasks completed (elapsed 0.002 seconds)
>>> sd 0:0:0:0: [sda] Synchronizing SCSI cache
>>> ata1: link resume succeeded after 1 retries
>>> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> ata1.00: configured for UDMA/133
>>> ata1.00: Entering active power mode
>>> ata1.00: Entering standby power mode
>>> ravb e6800000.ethernet eth0: Link is Down
>>> Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached PHY driver (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=136)
>>> OOM killer enabled.
>>> Restarting tasks ... done.
>>> random: crng reseeded on system resumption
>>> PM: suspend exit
>>> ata1: link resume succeeded after 1 retries
>>> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> ata1.00: Entering active power mode
>>> ata1.00: configured for UDMA/133
>>> ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
>>>
>>> Regardless, the disk worked fine after resume.
>>>
>>> Note that I saw this only once.
>>
>> I think I found the reason for this, but to confirm, were you doing a suspend
>> right after resuming the system? If yes, then I think I understand exactly the
>> issue and why you saw it only once (it is a subtle race with scheduling
>> libata-EH suspend/resume operations). I will send a fix next week.
>
> Now you ask that, yes there was a system suspend before.
>
> Relevant log with timing info:
>
> [ 130.177616] PM: suspend exit
> [ 130.257981] ata1: link resume succeeded after 1 retries
> [ 130.376714] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 130.388525] ata1.00: Entering active power mode
>
> so the drive should have been ready here.
yep.
>
> [ 140.452669] PM: suspend entry (s2idle)
Then suspend again 10s later.
> [ 140.488313] Filesystems sync: 0.026 seconds
> [ 140.515957] Freezing user space processes
> [ 140.518209] ata1.00: qc timeout after 10000 msecs (cmd 0x40)
> [ 140.523384] Freezing user space processes completed (elapsed 0.007 seconds)
> [ 140.527718] ata1.00: VERIFY failed (err_mask=0x4)
But that verify, sent 10s earlier to spin up the drive, failed... Hmmm... That is
not exactly what I was thinking of. While the race between scheduling a suspend
while a resume is still ongoing does exist, it is likely not what is happening
here, given the time interval to the suspend entry. I need to dig further.
Do you perhaps have "Power-up in Standby" (PUIS) enabled on that drive?
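
For reference, the timing can be checked directly from the timestamps in the
log quoted in this message. The following is only a minimal sketch in Python,
not anything from the kernel tree; the helper names (parse, find) are made up
for illustration, and the three log lines are copied from the timed log.

```python
import re

# Lines copied from the timed log quoted in this message.
LOG = """\
[ 130.388525] ata1.00: Entering active power mode
[ 140.452669] PM: suspend entry (s2idle)
[ 140.518209] ata1.00: qc timeout after 10000 msecs (cmd 0x40)
"""

# Matches a dmesg-style "[ seconds.micros] message" line.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(log):
    """Return a list of (seconds, message) tuples, one per stamped line."""
    events = []
    for line in log.splitlines():
        m = STAMP.match(line)
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return events


def find(events, needle):
    """Return the timestamp of the first message containing needle."""
    for t, msg in events:
        if needle in msg:
            return t
    raise ValueError(f"no log line contains {needle!r}")


events = parse(LOG)
active = find(events, "Entering active power mode")
suspend = find(events, "suspend entry")
timeout = find(events, "qc timeout")

# Interval between the end of the first resume and the reported timeout.
print(f"active power mode -> qc timeout: {timeout - active:.3f} s")
# Interval between the second suspend entry and the reported timeout.
print(f"suspend entry -> qc timeout: {timeout - suspend:.3f} s")
```

The timeout is reported well under a second after the second suspend entry,
but about 10s after "Entering active power mode". Since the command had a
10000 msecs timeout, it must have been issued around the end of the first
resume, not during the second suspend.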
> [ 140.532541] OOM killer disabled.
> [ 140.537270] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
> [ 140.542069] Freezing remaining freezable tasks
> [ 140.546784] ata1.00: revalidation failed (errno=-5)
>
> Gr{oetje,eeting}s,
>
> Geert
>
--
Damien Le Moal
Western Digital Research