public inbox for linux-ide@vger.kernel.org
 help / color / mirror / Atom feed
From: Damien Le Moal <dlemoal@kernel.org>
To: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	Paul Ausbeck <paula@soe.ucsc.edu>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	TW <dalzot@gmail.com>,
	regressions@lists.linux.dev, Bart Van Assche <bvanassche@acm.org>,
	linux-renesas-soc@vger.kernel.org
Subject: Re: [PATCH] ata,scsi: do not issue START STOP UNIT on resume
Date: Thu, 14 Sep 2023 15:58:26 +0900	[thread overview]
Message-ID: <ced723bd-dc88-d796-c86f-9fa96641d982@kernel.org> (raw)
In-Reply-To: <CAMuHMdVHu3zeM0WA7TcTA2QT9X68cTttU-TzjsQw6ZuYCu4t=A@mail.gmail.com>

On 9/14/23 15:53, Geert Uytterhoeven wrote:
> Hi Damien,
> 
> On Thu, Sep 14, 2023 at 12:03 AM Damien Le Moal <dlemoal@kernel.org> wrote:
>> On 9/13/23 19:21, Geert Uytterhoeven wrote:
>>>> Thanks for the report. The delay for the first data access from user space right
>>>> after resume is 100% expected, with or without this patch. The reason is that
>>>> waking up the drive (spinning it up) is done asynchronously from the PM resume
>>>> context, so when you get "PM suspend exit" message signaling that the system is
>>>> resumed, the drive may not yet be spinning. Any access will wait for that to
>>>> happen before proceeding. Depending on the drive that can take up to 10s or so.
>>>
>>> That does not match with what I am seeing: before this patch, there
>>> was no delay on first data access from user space, as the drive is fully
>>> spun up when system resume returns.
>>
>> Yes, that is a possibility, but not by design. Some users have complained about
>> the long resume times which causes laptop screens to be "black" until disks spin
>> up. That did not happen before 5.16, when the change to scsi using the PM async
>> ops to do suspend/resume created all the issues with suspend/resume on ata side.
>> I am going to run 5.15 again to check.
>>
>> The patch "do not issue START STOP UNIT on resume" was a botch attempt to remove
>> this delay. But it is a bad one because the ATA specs define that a drive can
>> get out of standby (spun down) power state only with a media access command (a
>> VERIFY command is used to spin up the disk). And furthermore, the specs also
>> says that even a reset shall not change the device power state. So issuing a
>> VERIFY command to spin up the drive is required per specs. Note that I do see
>> many of my drives (I have hundreds in the lab) spinning up on reset, which is
>> against the specs. But not all of them. So with the patch "do not issue START
>> STOP UNIT on resume", one risks not seeing the drive resuming correctly (timeout
>> errors on IDENTIFY command issued on resume will get the drive removed).
>>
>>> With this patch, system resume returns earlier, and the drive is only
>>> spun up when user space starts accessing data.
>>
>> Yes, but "when user space starts accessing data" -> "user space accesses are
>> processed only after the drive completes spinning up" would be more accurate.
> 
> Sure, I wrote it down in terms of what the user is experiencing, which may
> not be identical to what's happening under the hood.
> 
>> That is by design and expected. This is the behavior one would see if the drive
>> is set to use standby timers (to go to standby on its own after some idle time)
>> or if runtime suspend is used. I do not see any issue with this behavior. Is
>> this causing any issue on your end ? Do you have concerns about this approach ?
>>
>> Having the resume process wait for the drive to fully spin-up is easy to do. But
>> as I mentioned, many users are really not happy about how slow resuming become
>> with that. If I do that, you will get back the previous behavior you mention,
>> but I will be again getting emails about "resume is broken".
>>
>> I made a decision: no waiting for spinup. That causes a delay for the user on
>> first access, but that makes resume overall far faster. I do not want to go back
>> on that, unless you can confirm that there is a real regression/error/data
>> corruption happening.
> 
> I agree that is fine.

Still, R-CAR is not working for you with the patch adding the link. Looking into
it now. Trying to verify things with qemu & regular AHCI on PCs too as I do not
have an R-CAR board. Hard to debug :)

Preparing some patches for you to get more debug info.

> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 

-- 
Damien Le Moal
Western Digital Research


  reply	other threads:[~2023-09-14  6:58 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-31  0:39 [PATCH] ata,scsi: do not issue START STOP UNIT on resume Damien Le Moal
2023-07-31  3:48 ` TW
2023-07-31  4:44   ` Damien Le Moal
2023-07-31  5:47 ` Tanner Watkins
2023-07-31 16:13 ` Hannes Reinecke
2023-08-01  3:44   ` Damien Le Moal
2023-08-01  6:16     ` Hannes Reinecke
2023-07-31 19:43 ` Paul Ausbeck
2023-08-01 18:36 ` Bart Van Assche
2023-08-02  8:05 ` Damien Le Moal
2023-08-24 18:28 ` Rodrigo Vivi
2023-08-24 23:42   ` Damien Le Moal
2023-08-25  1:31     ` Martin K. Petersen
2023-08-25  1:33       ` Damien Le Moal
2023-08-25 17:09     ` Rodrigo Vivi
2023-08-25 22:06       ` Damien Le Moal
2023-08-29  6:17       ` Damien Le Moal
2023-08-30 22:14         ` Rodrigo Vivi
2023-08-31  0:32           ` Damien Le Moal
2023-08-31  1:48             ` Vivi, Rodrigo
2023-08-31  3:06               ` Damien Le Moal
2023-09-05  5:20               ` Damien Le Moal
2023-09-05 17:17                 ` Rodrigo Vivi
2023-09-06  1:07                   ` Damien Le Moal
2023-08-31  6:55       ` Damien Le Moal
2023-08-25 12:19   ` Damien Le Moal
2023-09-12 17:39 ` Geert Uytterhoeven
2023-09-12 22:58   ` Damien Le Moal
2023-09-13 10:21     ` Geert Uytterhoeven
2023-09-13 10:34       ` Geert Uytterhoeven
2023-09-13 22:07         ` Damien Le Moal
2023-09-14  6:59           ` Geert Uytterhoeven
2023-09-13 22:03       ` Damien Le Moal
2023-09-14  6:53         ` Geert Uytterhoeven
2023-09-14  6:58           ` Damien Le Moal [this message]
2023-09-14 15:29         ` Phillip Susi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ced723bd-dc88-d796-c86f-9fa96641d982@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=bvanassche@acm.org \
    --cc=dalzot@gmail.com \
    --cc=geert@linux-m68k.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-renesas-soc@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=paula@soe.ucsc.edu \
    --cc=regressions@leemhuis.info \
    --cc=regressions@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox