All of lore.kernel.org
 help / color / mirror / Atom feed
From: Damien Le Moal <dlemoal@kernel.org>
To: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	Paul Ausbeck <paula@soe.ucsc.edu>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	TW <dalzot@gmail.com>,
	regressions@lists.linux.dev, Bart Van Assche <bvanassche@acm.org>,
	linux-renesas-soc@vger.kernel.org
Subject: Re: [PATCH] ata,scsi: do not issue START STOP UNIT on resume
Date: Thu, 14 Sep 2023 07:07:56 +0900	[thread overview]
Message-ID: <0f164526-d1b6-64d3-a802-aa2a6c7951ee@kernel.org> (raw)
In-Reply-To: <CAMuHMdUN5M3CkUn+HyPmgynT+9QLnE1GW_-v7gH5xOObQHfz2Q@mail.gmail.com>

On 9/13/23 19:34, Geert Uytterhoeven wrote:
> Hi Damien,
> 
> On Wed, Sep 13, 2023 at 12:21 PM Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
>> On Wed, Sep 13, 2023 at 12:58 AM Damien Le Moal <dlemoal@kernel.org> wrote:
>>> On 9/13/23 02:39, Geert Uytterhoeven wrote:
>>>> On Mon, 31 Jul 2023, Damien Le Moal wrote:
>>>>> During system resume, ata_port_pm_resume() triggers ata EH to
>>>>> 1) Resume the controller
>>>>> 2) Reset and rescan the ports
>>>>> 3) Revalidate devices
>>>>> This EH execution is started asynchronously from ata_port_pm_resume(),
>>>>> which means that when sd_resume() is executed, none or only part of the
>>>>> above processing may have been executed. However, sd_resume() issues a
>>>>> START STOP UNIT to wake up the drive from sleep mode. This command is
>>>>> translated to ATA with ata_scsi_start_stop_xlat() and issued to the
>>>>> device. However, depending on the state of execution of the EH process
>>>>> and revalidation triggerred by ata_port_pm_resume(), two things may
>>>>> happen:
>>>>> 1) The START STOP UNIT fails if it is received before the controller has
>>>>>   been reenabled at the beginning of the EH execution. This is visible
>>>>>   with error messages like:
>>>>>
>>>>> ata10.00: device reported invalid CHS sector 0
>>>>> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
>>>>> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current]
>>>>> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command
>>>>> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5
>>>>> sd 9:0:0:0: PM: failed to resume async: error -5
>>>>>
>>>>> 2) The START STOP UNIT command is received while the EH process is
>>>>>   on-going, which mean that it is stopped and must wait for its
>>>>>   completion, at which point the command is rather useless as the drive
>>>>>   is already fully spun up already. This case results also in a
>>>>>   significant delay in sd_resume() which is observable by users as
>>>>>   the entire system resume completion is delayed.
>>>>>
>>>>> Given that ATA devices will be woken up by libata activity on resume,
>>>>> sd_resume() has no need to issue a START STOP UNIT command, which solves
>>>>> the above mentioned problems. Do not issue this command by introducing
>>>>> the new scsi_device flag no_start_on_resume and setting this flag to 1
>>>>> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP
>>>>> UNIT command only if this flag is not set.
>>>>>
>>>>> Reported-by: Paul Ausbeck <paula@soe.ucsc.edu>
>>>>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880
>>>>> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
>>>>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>>>>
>>>> Thanks for your patch, which is now commit 0a8589055936d8fe
>>>> ("ata,scsi: do not issue START STOP UNIT on resume") in v6.5-rc5.
>>>> Sorry for being late to the party, but this commit landed upstream
>>>> during my summer holidays, and apparently I wasn't that focussed on
>>>> noticing small behavioral changes after getting back to work...
>>>>
>>>> I noticed an oddity after s2idle or s2ram on Renesas Salvator-XS (R-Car
>>>> H3 ES2.0) with an old (spinning rust) SATA drive, and bisected it to
>>>> this commit: when accessing the drive after system resume, there is now
>>>> a delay of ca. 5s before data is returned, presumably due to starting
>>>> the drive, and having to wait for it to spin up to be able to read data.
>>>> But the good news is that the actual system resume takes less time than
>>>> before (reduced by even more than ca. 5s!), so this looks like a net
>>>> win...
>>>
>>> Thanks for the report. The delay for the first data access from user space right
>>> after resume is 100% expected, with or without this patch. The reason is that
>>> waking up the drive (spinning it up) is done asynchronously from the PM resume
>>> context, so when you get "PM suspend exit" message signaling that the system is
>>> resumed, the drive may not yet be spinning. Any access will wait for that to
>>> happen before proceeding. Depending on the drive that can take up to 10s or so.
>>
>> That does not match with what I am seeing: before this patch, there
>> was no delay on first data access from user space, as the drive is fully
>> spun up when system resume returns.
>> With this patch, system resume returns earlier, and the drive is only
>> spun up when user space starts accessing data.
>>
>> Note that I do not have any file system mounted, and use
>> "hd /dev/sda | head -70" to access the disk.
>>
>>> I am not entirely sure where the net win you see come from. But the patch you
>>> mention is in fact completely wrong and does not fix the underlying issues with
>>> ata suspend/resume and potential deadlocks in PM due to ata ports relationship
>>> with scsi devices. So I have been working on fixing this for the last 2 weeks,
>>> after another user also reported issues with the patch you mention [1].
>>>
>>> Could you try libata for-next branch on your system ? There are 7 fix patches in
>>> there that I plan to send out for 6.6-rc2 to fix the patch in question and other
>>> issues potentially causing deadlocks on resume. The patches were posted also [2].
>>>
>>> https://lore.kernel.org/linux-ide/20230912005655.368075-1-dlemoal@kernel.org/T/#t
>>
>> Unfortunately that didn't work, as /dev/sda no longer exists.
>> Will reply to the patch I bisected the issue to...
> 
> With libata/for-next (fa2259a59966c005 ("ata: libata: Cleanup inline
> DMA helper functions")) and commit 99626085d036ec32 ("ata: libata-scsi:
> link ata port and scsi device") reverted, it behaves as before (disk
> is spun up when system resume completes, no delay when accessing the
> disk from userspace).

I will check the ata platform driver for R-CAR. I may have overlooked something
in that area. I tested with AHCI and libsas adapters only as I do not have ATA
on the few ARM SBC boards I have. And I do not have an R-CAR board.

What surprises me is that you need to revert ata: libata: Cleanup inline DMA
helper functions". This patch has 0 functional changes and really is only a code
cleanup... Nothing should change with it. Can you confirm that you really need
to revert that patch to get things working again ?

-- 
Damien Le Moal
Western Digital Research


  reply	other threads:[~2023-09-13 22:08 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-31  0:39 [PATCH] ata,scsi: do not issue START STOP UNIT on resume Damien Le Moal
2023-07-31  3:48 ` TW
2023-07-31  4:44   ` Damien Le Moal
2023-07-31  5:47 ` Tanner Watkins
2023-07-31 16:13 ` Hannes Reinecke
2023-08-01  3:44   ` Damien Le Moal
2023-08-01  6:16     ` Hannes Reinecke
2023-07-31 19:43 ` Paul Ausbeck
2023-08-01 18:36 ` Bart Van Assche
2023-08-02  8:05 ` Damien Le Moal
2023-08-24 18:28 ` Rodrigo Vivi
2023-08-24 23:42   ` Damien Le Moal
2023-08-25  1:31     ` Martin K. Petersen
2023-08-25  1:33       ` Damien Le Moal
2023-08-25 17:09     ` Rodrigo Vivi
2023-08-25 22:06       ` Damien Le Moal
2023-08-29  6:17       ` Damien Le Moal
2023-08-30 22:14         ` Rodrigo Vivi
2023-08-31  0:32           ` Damien Le Moal
2023-08-31  1:48             ` Vivi, Rodrigo
2023-08-31  3:06               ` Damien Le Moal
2023-09-05  5:20               ` Damien Le Moal
2023-09-05 17:17                 ` Rodrigo Vivi
2023-09-06  1:07                   ` Damien Le Moal
2023-08-31  6:55       ` Damien Le Moal
2023-08-25 12:19   ` Damien Le Moal
2023-09-12 17:39 ` Geert Uytterhoeven
2023-09-12 22:58   ` Damien Le Moal
2023-09-13 10:21     ` Geert Uytterhoeven
2023-09-13 10:34       ` Geert Uytterhoeven
2023-09-13 22:07         ` Damien Le Moal [this message]
2023-09-14  6:59           ` Geert Uytterhoeven
2023-09-13 22:03       ` Damien Le Moal
2023-09-14  6:53         ` Geert Uytterhoeven
2023-09-14  6:58           ` Damien Le Moal
2023-09-14 15:29         ` Phillip Susi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0f164526-d1b6-64d3-a802-aa2a6c7951ee@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=bvanassche@acm.org \
    --cc=dalzot@gmail.com \
    --cc=geert@linux-m68k.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-renesas-soc@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=paula@soe.ucsc.edu \
    --cc=regressions@leemhuis.info \
    --cc=regressions@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.