Re: [PATCH] ata,scsi: do not issue START STOP UNIT on resume

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: "regressions@leemhuis.info" <regressions@leemhuis.info>,
	"dalzot@gmail.com" <dalzot@gmail.com>,
	"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"paula@soe.ucsc.edu" <paula@soe.ucsc.edu>,
	"regressions@lists.linux.dev" <regressions@lists.linux.dev>,
	"bvanassche@acm.org" <bvanassche@acm.org>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Subject: Re: [PATCH] ata,scsi: do not issue START STOP UNIT on resume
Date: Tue, 5 Sep 2023 13:17:40 -0400	[thread overview]
Message-ID: <ZPditFZNQWQw5yp3@intel.com> (raw)
In-Reply-To: <83ebb54c-a114-7cd9-4eb4-b9860f1afd26@kernel.org>

On Tue, Sep 05, 2023 at 02:20:23PM +0900, Damien Le Moal wrote:
> On 8/31/23 10:48, Vivi, Rodrigo wrote:
> > On Thu, 2023-08-31 at 09:32 +0900, Damien Le Moal wrote:
> >> On 8/31/23 07:14, Rodrigo Vivi wrote:
> >>> On Tue, Aug 29, 2023 at 03:17:38PM +0900, Damien Le Moal wrote:
> >>>> On 8/26/23 02:09, Rodrigo Vivi wrote:
> >>>>>>> So, maybe we have some kind of disks/configuration out
> >>>>>>> there where this
> >>>>>>> start upon resume is needed? Maybe it is just a matter of
> >>>>>>> timming to
> >>>>>>> ensure some firmware underneath is up and back to life?
> >>>>>>
> >>>>>> I do not think so. Suspend will issue a start stop unit
> >>>>>> command to put the drive
> >>>>>> to sleep and resume will reset the port (which should wake up
> >>>>>> the drive) and
> >>>>>> then issue an IDENTIFY command (which will also wake up the
> >>>>>> drive) and other
> >>>>>> read logs etc to rescan the drive.
> >>>>>> In both cases, if the commands do not complete, we would see
> >>>>>> errors/timeout and
> >>>>>> likely port reset/drive gone events. So I think this is
> >>>>>> likely another subtle
> >>>>>> race between scsi suspend and ata suspend that is causing a
> >>>>>> deadlock.
> >>>>>>
> >>>>>> The main issue I think is that there is no direct ancestry
> >>>>>> between the ata port
> >>>>>> (device) and scsi device, so the change to scsi async pm ops
> >>>>>> made a mess of the
> >>>>>> suspend/resume operations ordering. For suspend, scsi device
> >>>>>> (child of ata port)
> >>>>>> should be first, then ata port device (parent). For resume,
> >>>>>> the reverse order is
> >>>>>> needed. PM normally ensures that parent/child ordering, but
> >>>>>> we lack that
> >>>>>> parent/child relationship. I am working on fixing that but it
> >>>>>> is very slow
> >>>>>> progress because I have been so far enable to recreate any of
> >>>>>> the issues that
> >>>>>> have been reported. I am patching "blind"...
> >>>>>
> >>>>> I believe your suspicious makes sense. And on these lines, that
> >>>>> patch you
> >>>>> attached earlier would fix that. However my initial tries of
> >>>>> that didn't
> >>>>> help. I'm going to run more tests and get back to you.
> >>>>
> >>>> Rodrigo,
> >>>>
> >>>> I pushed the resume-v2 branch to libata tree:
> >>>>
> >>>> git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata
> >>>> (or
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.gi
> >>>> t)
> >>>>
> >>>> This branch adds 13 patches on top of 6.5.0 to cleanup libata
> >>>> suspend/resume and
> >>>> other device shutdown issues. The first 4 patches are the main
> >>>> ones to fix
> >>>> suspend resume. I tested that on 2 different machines with
> >>>> different drives and
> >>>> with qemu. All seems fine.
> >>>>
> >>>> Could you try to run this through your CI ? I am very interested
> >>>> in seeing if it
> >>>> survives your suspend/resume tests.
> >>>
> >>> well, in the end this didn't affect the CI machinery as I was
> >>> afraid.
> >>> it is only in my local DG2.
> >>>
> >>> https://intel-gfx-ci.01.org/tree/intel-xe/bat-
> >>> all.html?testfilter=suspend
> >>> (bat-dg2-oem2 one)
> >>>
> >>> I just got these 13 patches and applied to my branch and tested it
> >>> again
> >>> and it still *fails* for me.
> >>
> >> That is annoying... But I think the messages give us a hint as to
> >> what is going
> >> on. See below.
> >>
> >>>
> >>> [   79.648328] [IGT] kms_pipe_crc_basic: finished subtest pipe-A-
> >>> DP-2, SUCCESS
> >>> [   79.657353] [IGT] kms_pipe_crc_basic: starting dynamic subtest
> >>> pipe-B-DP-2
> >>> [   80.375042] PM: suspend entry (deep)
> >>> [   80.380799] Filesystems sync: 0.002 seconds
> >>> [   80.386476] Freezing user space processes
> >>> [   80.392286] Freezing user space processes completed (elapsed
> >>> 0.001 seconds)
> >>> [   80.399294] OOM killer disabled.
> >>> [   80.402536] Freezing remaining freezable tasks
> >>> [   80.408335] Freezing remaining freezable tasks completed
> >>> (elapsed 0.001 seconds)
> >>> [   80.439372] sd 5:0:0:0: [sdb] Synchronizing SCSI cache
> >>> [   80.439716] serial 00:01: disabled
> >>> [   80.448011] sd 4:0:0:0: [sda] Synchronizing SCSI cache
> >>> [   80.448014] sd 7:0:0:0: [sdc] Synchronizing SCSI cache
> >>> [   80.453600] ata6.00: Entering standby power mode
> >>
> >> This is sd 5:0:0:0. All good, ordered properly with the
> >> "Synchronizing SCSI cache".
> >>
> >>> [   80.464217] ata5.00: Entering standby power mode
> >>
> >> Same here for sd 4:0:0:0.
> >>
> >>> [   80.812294] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl
> >>> 300)
> >>> [   80.818520] ata8.00: Entering active power mode
> >>> [   80.842989] ata8.00: configured for UDMA/133
> >>
> >> Arg ! sd 7:0:0:0 is resuming ! But the above "Synchronizing SCSI
> >> cache" tells
> >> us that it was suspending and libata EH did not yet put that drive to
> >> standby...
> >>
> >>> [   80.847660] ata8.00: Entering standby power mode
> >>
> >> ... which happens here. So it looks like libata EH had both the
> >> suspend and
> >> resume requests at the same time, which is totally weird.
> > 
> > although it looks weird, it totally matches the 'use case'.
> > I mean, if I suspend, resume, and wait a bit before suspend and resume
> > again, it will work 100% of the time.
> > The issue is really only when another suspend comes right after the
> > resume, in a loop without any wait.
> > 
> >>
> >>> [   81.119426] xe 0000:03:00.0: [drm] GT0: suspended
> >>> [   81.800508] PM: suspend of devices complete after 1367.829 msecs
> >>> [   81.806661] PM: start suspend of devices complete after 1390.859
> >>> msecs
> >>> [   81.813244] PM: suspend devices took 1.398 seconds
> >>> [   81.820101] PM: late suspend of devices complete after 2.036
> >>> msecs
> >>
> >> ...and PM suspend completes here. Resume "starts" now (but clearly it
> >> started
> >> earlier already given that sd 7:0:0:0 was reactivated.
> > 
> > that is weird.
> > 
> >>
> >>> �[   82.403857] serial 00:01: activated
> >>> [   82.489612] nvme nvme0: 16/0/0 default/read/poll queues
> >>> [   82.563318] r8169 0000:07:00.0 enp7s0: Link is Down
> >>> [   82.581444] xe REG[0x223a8-0x223af]: allow read access
> >>> [   82.586704] xe REG[0x1c03a8-0x1c03af]: allow read access
> >>> [   82.592071] xe REG[0x1d03a8-0x1d03af]: allow read access
> >>> [   82.597423] xe REG[0x1c83a8-0x1c83af]: allow read access
> >>> [   82.602765] xe REG[0x1d83a8-0x1d83af]: allow read access
> >>> [   82.608113] xe REG[0x1a3a8-0x1a3af]: allow read access
> >>> [   82.613281] xe REG[0x1c3a8-0x1c3af]: allow read access
> >>> [   82.618454] xe REG[0x1e3a8-0x1e3af]: allow read access
> >>> [   82.623634] xe REG[0x263a8-0x263af]: allow read access
> >>> [   82.628816] xe 0000:03:00.0: [drm] GT0: resumed
> >>> [   82.728005] ata7: SATA link down (SStatus 4 SControl 300)
> >>> [   82.733531] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl
> >>> 300)
> >>> [   82.739773] ata5.00: Entering active power mode
> >>> [   82.744398] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl
> >>> 300)
> >>> [   82.750618] ata6.00: Entering active power mode
> >>> [   82.755961] ata5.00: configured for UDMA/133
> >>> [   82.760479] ata5.00: Enabling discard_zeroes_data
> >>> [   82.836266] ata6.00: configured for UDMA/133
> >>> [   84.460081] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl
> >>> 300)
> >>> [   84.466354] ata8.00: Entering active power mode
> >>> [   84.497256] ata8.00: configured for UDMA/133
> >>> ...
> >>
> >> And this looks all normal, the drives have all been transitioned to
> >> active
> >> power mode as expected. And yet, your system is stuck after this,
> >> right ?
> > 
> > yes
> 
> I think I have now figured it out, and fixed. I could reliably recreate the same
> hang both with qemu using a failed suspend (using a device not supporting
> suspend) and real hardware with a short rtc wake.
> 
> It turns out that the root cause of the hang is ata_scsi_dev_rescan(), which is
> scheduled asynchronously from PM context on resume. With quick suspend after a
> resume, suspend may win the race against that ata_scsi_dev_rescan() task
> execution and we endup calling scsi_rescan_device() on a suspended device,
> causing that function to wait with the device_lock() held, which causes PM to
> deadlock when it needs to resume the scsi device. The recent commit 6aa0365a3c85
> ("ata: libata-scsi: Avoid deadlock on rescan after device resume") was intended
> to fix that, but it did so less than ideally and the fix has a race on the scsi
> power state check, thus not always preventing the resume hang.
> 
> I pushed a new patch series that goes on top of 6.5.0: resume-v3 branch in the
> libata tree:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git
> 
> This works very well for me. Using this script on real hardware:
> 
> for (( i=0; i<20; i++ )); do
> 	echo "+2" > /sys/class/rtc/rtc0/wakealarm
> 	echo mem > /sys/power/state
> done
> 
> The system repeatedly suspends and resumes and comes back OK. Of note is that if
> I set the delay to +1 second, then I sometime do not see the system resume and
> the script stops. But using wakeup-on-lan (wol command) from another machine to
> wake it up, the machine resumes normally and continues executing the script. So
> it seems that setting the rtc alarm unreasonably early result in it being lost
> and the system suspending wating to be woken up.
> 
> I also tested this in qemu. As mentioned before, I cannot get rtc alarm to wake
> up the VM guest though. However, using a virtio device that does not support
> suspend, resume strats in the middle of the suspend operation due to the suspend
> error reported by that device. And it turns out that systemd really insists on
> suspending the system despite the error, so when running "systemctl suspend" I
> see a retry for suspend right after the first failed one. That is enough to
> trigger the issue without the patches.
> 
> Please test !

\o/ works for me!

Feel free to use:
Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Thank you so much,
Rodrigo.

> 
> Thanks.
> 
> -- 
> Damien Le Moal
> Western Digital Research
>

next prev parent reply	other threads:[~2023-09-05 18:28 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-31  0:39 [PATCH] ata,scsi: do not issue START STOP UNIT on resume Damien Le Moal
2023-07-31  3:48 ` TW
2023-07-31  4:44   ` Damien Le Moal
2023-07-31  5:47 ` Tanner Watkins
2023-07-31 16:13 ` Hannes Reinecke
2023-08-01  3:44   ` Damien Le Moal
2023-08-01  6:16     ` Hannes Reinecke
2023-07-31 19:43 ` Paul Ausbeck
2023-08-01 18:36 ` Bart Van Assche
2023-08-02  8:05 ` Damien Le Moal
2023-08-24 18:28 ` Rodrigo Vivi
2023-08-24 23:42   ` Damien Le Moal
2023-08-25  1:31     ` Martin K. Petersen
2023-08-25  1:33       ` Damien Le Moal
2023-08-25 17:09     ` Rodrigo Vivi
2023-08-25 22:06       ` Damien Le Moal
2023-08-29  6:17       ` Damien Le Moal
2023-08-30 22:14         ` Rodrigo Vivi
2023-08-31  0:32           ` Damien Le Moal
2023-08-31  1:48             ` Vivi, Rodrigo
2023-08-31  3:06               ` Damien Le Moal
2023-09-05  5:20               ` Damien Le Moal
2023-09-05 17:17                 ` Rodrigo Vivi [this message]
2023-09-06  1:07                   ` Damien Le Moal
2023-08-31  6:55       ` Damien Le Moal
2023-08-25 12:19   ` Damien Le Moal
2023-09-12 17:39 ` Geert Uytterhoeven
2023-09-12 22:58   ` Damien Le Moal
2023-09-13 10:21     ` Geert Uytterhoeven
2023-09-13 10:34       ` Geert Uytterhoeven
2023-09-13 22:07         ` Damien Le Moal
2023-09-14  6:59           ` Geert Uytterhoeven
2023-09-13 22:03       ` Damien Le Moal
2023-09-14  6:53         ` Geert Uytterhoeven
2023-09-14  6:58           ` Damien Le Moal
2023-09-14 15:29         ` Phillip Susi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZPditFZNQWQw5yp3@intel.com \
    --to=rodrigo.vivi@intel.com \
    --cc=bvanassche@acm.org \
    --cc=dalzot@gmail.com \
    --cc=dlemoal@kernel.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=paula@soe.ucsc.edu \
    --cc=regressions@leemhuis.info \
    --cc=regressions@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox