All of lore.kernel.org
 help / color / mirror / Atom feed
From: Damien Le Moal <dlemoal@kernel.org>
To: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	Paul Ausbeck <paula@soe.ucsc.edu>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	TW <dalzot@gmail.com>,
	regressions@lists.linux.dev, Bart Van Assche <bvanassche@acm.org>,
	linux-renesas-soc@vger.kernel.org
Subject: Re: [PATCH] ata,scsi: do not issue START STOP UNIT on resume
Date: Thu, 14 Sep 2023 15:58:26 +0900	[thread overview]
Message-ID: <ced723bd-dc88-d796-c86f-9fa96641d982@kernel.org> (raw)
In-Reply-To: <CAMuHMdVHu3zeM0WA7TcTA2QT9X68cTttU-TzjsQw6ZuYCu4t=A@mail.gmail.com>

On 9/14/23 15:53, Geert Uytterhoeven wrote:
> Hi Damien,
> 
> On Thu, Sep 14, 2023 at 12:03 AM Damien Le Moal <dlemoal@kernel.org> wrote:
>> On 9/13/23 19:21, Geert Uytterhoeven wrote:
>>>> Thanks for the report. The delay for the first data access from user space right
>>>> after resume is 100% expected, with or without this patch. The reason is that
>>>> waking up the drive (spinning it up) is done asynchronously from the PM resume
>>>> context, so when you get "PM suspend exit" message signaling that the system is
>>>> resumed, the drive may not yet be spinning. Any access will wait for that to
>>>> happen before proceeding. Depending on the drive that can take up to 10s or so.
>>>
>>> That does not match with what I am seeing: before this patch, there
>>> was no delay on first data access from user space, as the drive is fully
>>> spun up when system resume returns.
>>
>> Yes, that is a possibility, but not by design. Some users have complained about
>> the long resume times which causes laptop screens to be "black" until disks spin
>> up. That did not happen before 5.16, when the change to scsi using the PM async
>> ops to do suspend/resume created all the issues with suspend/resume on ata side.
>> I am going to run 5.15 again to check.
>>
>> The patch "do not issue START STOP UNIT on resume" was a botch attempt to remove
>> this delay. But it is a bad one because the ATA specs define that a drive can
>> get out of standby (spun down) power state only with a media access command (a
>> VERIFY command is used to spin up the disk). And furthermore, the specs also
>> says that even a reset shall not change the device power state. So issuing a
>> VERIFY command to spin up the drive is required per specs. Note that I do see
>> many of my drives (I have hundreds in the lab) spinning up on reset, which is
>> against the specs. But not all of them. So with the patch "do not issue START
>> STOP UNIT on resume", one risks not seeing the drive resuming correctly (timeout
>> errors on IDENTIFY command issued on resume will get the drive removed).
>>
>>> With this patch, system resume returns earlier, and the drive is only
>>> spun up when user space starts accessing data.
>>
>> Yes, but "when user space starts accessing data" -> "user space accesses are
>> processed only after the drive completes spinning up" would be more accurate.
> 
> Sure, I wrote it down in terms of what the user is experiencing, which may
> not be identical to what's happening under the hood.
> 
>> That is by design and expected. This is the behavior one would see if the drive
>> is set to use standby timers (to go to standby on its own after some idle time)
>> or if runtime suspend is used. I do not see any issue with this behavior. Is
>> this causing any issue on your end ? Do you have concerns about this approach ?
>>
>> Having the resume process wait for the drive to fully spin-up is easy to do. But
>> as I mentioned, many users are really not happy about how slow resuming become
>> with that. If I do that, you will get back the previous behavior you mention,
>> but I will be again getting emails about "resume is broken".
>>
>> I made a decision: no waiting for spinup. That causes a delay for the user on
>> first access, but that makes resume overall far faster. I do not want to go back
>> on that, unless you can confirm that there is a real regression/error/data
>> corruption happening.
> 
> I agree that is fine.

Still, R-CAR is not working for you with the patch adding the link. Looking into
it now. Trying to verify things with qemu & regular AHCI on PCs too as I do not
have an R-CAR board. Hard to debug :)

Preparing some patches for you to get more debug info.

> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 

-- 
Damien Le Moal
Western Digital Research


  reply	other threads:[~2023-09-14  6:58 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-31  0:39 [PATCH] ata,scsi: do not issue START STOP UNIT on resume Damien Le Moal
2023-07-31  3:48 ` TW
2023-07-31  4:44   ` Damien Le Moal
2023-07-31  5:47 ` Tanner Watkins
2023-07-31 16:13 ` Hannes Reinecke
2023-08-01  3:44   ` Damien Le Moal
2023-08-01  6:16     ` Hannes Reinecke
2023-07-31 19:43 ` Paul Ausbeck
2023-08-01 18:36 ` Bart Van Assche
2023-08-02  8:05 ` Damien Le Moal
2023-08-24 18:28 ` Rodrigo Vivi
2023-08-24 23:42   ` Damien Le Moal
2023-08-25  1:31     ` Martin K. Petersen
2023-08-25  1:33       ` Damien Le Moal
2023-08-25 17:09     ` Rodrigo Vivi
2023-08-25 22:06       ` Damien Le Moal
2023-08-29  6:17       ` Damien Le Moal
2023-08-30 22:14         ` Rodrigo Vivi
2023-08-31  0:32           ` Damien Le Moal
2023-08-31  1:48             ` Vivi, Rodrigo
2023-08-31  3:06               ` Damien Le Moal
2023-09-05  5:20               ` Damien Le Moal
2023-09-05 17:17                 ` Rodrigo Vivi
2023-09-06  1:07                   ` Damien Le Moal
2023-08-31  6:55       ` Damien Le Moal
2023-08-25 12:19   ` Damien Le Moal
2023-09-12 17:39 ` Geert Uytterhoeven
2023-09-12 22:58   ` Damien Le Moal
2023-09-13 10:21     ` Geert Uytterhoeven
2023-09-13 10:34       ` Geert Uytterhoeven
2023-09-13 22:07         ` Damien Le Moal
2023-09-14  6:59           ` Geert Uytterhoeven
2023-09-13 22:03       ` Damien Le Moal
2023-09-14  6:53         ` Geert Uytterhoeven
2023-09-14  6:58           ` Damien Le Moal [this message]
2023-09-14 15:29         ` Phillip Susi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ced723bd-dc88-d796-c86f-9fa96641d982@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=bvanassche@acm.org \
    --cc=dalzot@gmail.com \
    --cc=geert@linux-m68k.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-renesas-soc@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=paula@soe.ucsc.edu \
    --cc=regressions@leemhuis.info \
    --cc=regressions@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.