From: Damien Le Moal <dlemoal@kernel.org>
To: "Vivi, Rodrigo" <rodrigo.vivi@intel.com>
Cc: "regressions@leemhuis.info" <regressions@leemhuis.info>,
"dalzot@gmail.com" <dalzot@gmail.com>,
"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"paula@soe.ucsc.edu" <paula@soe.ucsc.edu>,
"regressions@lists.linux.dev" <regressions@lists.linux.dev>,
"bvanassche@acm.org" <bvanassche@acm.org>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Subject: Re: [PATCH] ata,scsi: do not issue START STOP UNIT on resume
Date: Thu, 31 Aug 2023 12:06:45 +0900 [thread overview]
Message-ID: <610a1f58-d64c-406c-9102-ef3508e3ffec@kernel.org> (raw)
In-Reply-To: <9e09411348ae7469b4a9a7d076a8c42f84d12823.camel@intel.com>
On 8/31/23 10:48, Vivi, Rodrigo wrote:
> On Thu, 2023-08-31 at 09:32 +0900, Damien Le Moal wrote:
>> On 8/31/23 07:14, Rodrigo Vivi wrote:
>>> On Tue, Aug 29, 2023 at 03:17:38PM +0900, Damien Le Moal wrote:
>>>> On 8/26/23 02:09, Rodrigo Vivi wrote:
>>>>>>> So, maybe we have some kind of disks/configuration out
>>>>>>> there where this
>>>>>>> start upon resume is needed? Maybe it is just a matter of
>>>>>>> timming to
>>>>>>> ensure some firmware underneath is up and back to life?
>>>>>>
>>>>>> I do not think so. Suspend will issue a start stop unit
>>>>>> command to put the drive
>>>>>> to sleep and resume will reset the port (which should wake up
>>>>>> the drive) and
>>>>>> then issue an IDENTIFY command (which will also wake up the
>>>>>> drive) and other
>>>>>> read logs etc to rescan the drive.
>>>>>> In both cases, if the commands do not complete, we would see
>>>>>> errors/timeout and
>>>>>> likely port reset/drive gone events. So I think this is
>>>>>> likely another subtle
>>>>>> race between scsi suspend and ata suspend that is causing a
>>>>>> deadlock.
>>>>>>
>>>>>> The main issue I think is that there is no direct ancestry
>>>>>> between the ata port
>>>>>> (device) and scsi device, so the change to scsi async pm ops
>>>>>> made a mess of the
>>>>>> suspend/resume operations ordering. For suspend, scsi device
>>>>>> (child of ata port)
>>>>>> should be first, then ata port device (parent). For resume,
>>>>>> the reverse order is
>>>>>> needed. PM normally ensures that parent/child ordering, but
>>>>>> we lack that
>>>>>> parent/child relationship. I am working on fixing that but it
>>>>>> is very slow
>>>>>> progress because I have been so far enable to recreate any of
>>>>>> the issues that
>>>>>> have been reported. I am patching "blind"...
>>>>>
>>>>> I believe your suspicious makes sense. And on these lines, that
>>>>> patch you
>>>>> attached earlier would fix that. However my initial tries of
>>>>> that didn't
>>>>> help. I'm going to run more tests and get back to you.
>>>>
>>>> Rodrigo,
>>>>
>>>> I pushed the resume-v2 branch to libata tree:
>>>>
>>>> git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata
>>>> (or
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.gi
>>>> t)
>>>>
>>>> This branch adds 13 patches on top of 6.5.0 to cleanup libata
>>>> suspend/resume and
>>>> other device shutdown issues. The first 4 patches are the main
>>>> ones to fix
>>>> suspend resume. I tested that on 2 different machines with
>>>> different drives and
>>>> with qemu. All seems fine.
>>>>
>>>> Could you try to run this through your CI ? I am very interested
>>>> in seeing if it
>>>> survives your suspend/resume tests.
>>>
>>> well, in the end this didn't affect the CI machinery as I was
>>> afraid.
>>> it is only in my local DG2.
>>>
>>> https://intel-gfx-ci.01.org/tree/intel-xe/bat-
>>> all.html?testfilter=suspend
>>> (bat-dg2-oem2 one)
>>>
>>> I just got these 13 patches and applied to my branch and tested it
>>> again
>>> and it still *fails* for me.
>>
>> That is annoying... But I think the messages give us a hint as to
>> what is going
>> on. See below.
>>
>>>
>>> [ 79.648328] [IGT] kms_pipe_crc_basic: finished subtest pipe-A-
>>> DP-2, SUCCESS
>>> [ 79.657353] [IGT] kms_pipe_crc_basic: starting dynamic subtest
>>> pipe-B-DP-2
>>> [ 80.375042] PM: suspend entry (deep)
>>> [ 80.380799] Filesystems sync: 0.002 seconds
>>> [ 80.386476] Freezing user space processes
>>> [ 80.392286] Freezing user space processes completed (elapsed
>>> 0.001 seconds)
>>> [ 80.399294] OOM killer disabled.
>>> [ 80.402536] Freezing remaining freezable tasks
>>> [ 80.408335] Freezing remaining freezable tasks completed
>>> (elapsed 0.001 seconds)
>>> [ 80.439372] sd 5:0:0:0: [sdb] Synchronizing SCSI cache
>>> [ 80.439716] serial 00:01: disabled
>>> [ 80.448011] sd 4:0:0:0: [sda] Synchronizing SCSI cache
>>> [ 80.448014] sd 7:0:0:0: [sdc] Synchronizing SCSI cache
>>> [ 80.453600] ata6.00: Entering standby power mode
>>
>> This is sd 5:0:0:0. All good, ordered properly with the
>> "Synchronizing SCSI cache".
>>
>>> [ 80.464217] ata5.00: Entering standby power mode
>>
>> Same here for sd 4:0:0:0.
>>
>>> [ 80.812294] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl
>>> 300)
>>> [ 80.818520] ata8.00: Entering active power mode
>>> [ 80.842989] ata8.00: configured for UDMA/133
>>
>> Arg ! sd 7:0:0:0 is resuming ! But the above "Synchronizing SCSI
>> cache" tells
>> us that it was suspending and libata EH did not yet put that drive to
>> standby...
>>
>>> [ 80.847660] ata8.00: Entering standby power mode
>>
>> ... which happens here. So it looks like libata EH had both the
>> suspend and
>> resume requests at the same time, which is totally weird.
>
> although it looks weird, it totally matches the 'use case'.
> I mean, if I suspend, resume, and wait a bit before suspend and resume
> again, it will work 100% of the time.
> The issue is really only when another suspend comes right after the
> resume, in a loop without any wait.
>
>>
>>> [ 81.119426] xe 0000:03:00.0: [drm] GT0: suspended
>>> [ 81.800508] PM: suspend of devices complete after 1367.829 msecs
>>> [ 81.806661] PM: start suspend of devices complete after 1390.859
>>> msecs
>>> [ 81.813244] PM: suspend devices took 1.398 seconds
>>> [ 81.820101] PM: late suspend of devices complete after 2.036
>>> msecs
>>
>> ...and PM suspend completes here. Resume "starts" now (but clearly it
>> started
>> earlier already given that sd 7:0:0:0 was reactivated.
>
> that is weird.
>
>>
>>> �[ 82.403857] serial 00:01: activated
>>> [ 82.489612] nvme nvme0: 16/0/0 default/read/poll queues
>>> [ 82.563318] r8169 0000:07:00.0 enp7s0: Link is Down
>>> [ 82.581444] xe REG[0x223a8-0x223af]: allow read access
>>> [ 82.586704] xe REG[0x1c03a8-0x1c03af]: allow read access
>>> [ 82.592071] xe REG[0x1d03a8-0x1d03af]: allow read access
>>> [ 82.597423] xe REG[0x1c83a8-0x1c83af]: allow read access
>>> [ 82.602765] xe REG[0x1d83a8-0x1d83af]: allow read access
>>> [ 82.608113] xe REG[0x1a3a8-0x1a3af]: allow read access
>>> [ 82.613281] xe REG[0x1c3a8-0x1c3af]: allow read access
>>> [ 82.618454] xe REG[0x1e3a8-0x1e3af]: allow read access
>>> [ 82.623634] xe REG[0x263a8-0x263af]: allow read access
>>> [ 82.628816] xe 0000:03:00.0: [drm] GT0: resumed
>>> [ 82.728005] ata7: SATA link down (SStatus 4 SControl 300)
>>> [ 82.733531] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl
>>> 300)
>>> [ 82.739773] ata5.00: Entering active power mode
>>> [ 82.744398] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl
>>> 300)
>>> [ 82.750618] ata6.00: Entering active power mode
>>> [ 82.755961] ata5.00: configured for UDMA/133
>>> [ 82.760479] ata5.00: Enabling discard_zeroes_data
>>> [ 82.836266] ata6.00: configured for UDMA/133
>>> [ 84.460081] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl
>>> 300)
>>> [ 84.466354] ata8.00: Entering active power mode
>>> [ 84.497256] ata8.00: configured for UDMA/133
>>> ...
>>
>> And this looks all normal, the drives have all been transitioned to
>> active
>> power mode as expected. And yet, your system is stuck after this,
>> right ?
>
> yes
>
>> Can you try to boot with "sysrq_always_enabled" and try to see if
>> sending
>> "ctrl-sysrq-t" keys can give you a stack backtrace of the tasks to
>> see where
>> they are stuck ?
>
> I will try tomorrow.
After fixing my qemu setup to get rtcwake to work, I succedded in recreating
the hang ! With multiple disks attached to the VM, suspending the VM with an
rtcwake timer set to +1 seconds triggers the resume in the middle of the
suspend and it hangs.
And I found the problem: it is a race in ata_port_request_pm() where the call
to ata_port_wait_eh() is completely innefective, causing suspend and resume to
be scheduled simultaneously, and so one being lost since the pm_msg field is
per port.
I have a fix. I keep doing suspend/resume in the VM and all looks good now.
Cleaning up the patches and I will push a resume-v3 branch soon for you to test.
>
>>
>> I am going to try something like you do with very short resume rtc
>> timer and
>> multiple disks to see if I can reproduce. But it is starting to look
>> like PM is
>> starting resuming before suspend completes...
>
> yes, this is what it looks like.
It seems that the PM code is fine with that. The issue really was with libata :)
--
Damien Le Moal
Western Digital Research
next prev parent reply other threads:[~2023-08-31 3:06 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-31 0:39 [PATCH] ata,scsi: do not issue START STOP UNIT on resume Damien Le Moal
2023-07-31 3:48 ` TW
2023-07-31 4:44 ` Damien Le Moal
2023-07-31 5:47 ` Tanner Watkins
2023-07-31 16:13 ` Hannes Reinecke
2023-08-01 3:44 ` Damien Le Moal
2023-08-01 6:16 ` Hannes Reinecke
2023-07-31 19:43 ` Paul Ausbeck
2023-08-01 18:36 ` Bart Van Assche
2023-08-02 8:05 ` Damien Le Moal
2023-08-24 18:28 ` Rodrigo Vivi
2023-08-24 23:42 ` Damien Le Moal
2023-08-25 1:31 ` Martin K. Petersen
2023-08-25 1:33 ` Damien Le Moal
2023-08-25 17:09 ` Rodrigo Vivi
2023-08-25 22:06 ` Damien Le Moal
2023-08-29 6:17 ` Damien Le Moal
2023-08-30 22:14 ` Rodrigo Vivi
2023-08-31 0:32 ` Damien Le Moal
2023-08-31 1:48 ` Vivi, Rodrigo
2023-08-31 3:06 ` Damien Le Moal [this message]
2023-09-05 5:20 ` Damien Le Moal
2023-09-05 17:17 ` Rodrigo Vivi
2023-09-06 1:07 ` Damien Le Moal
2023-08-31 6:55 ` Damien Le Moal
2023-08-25 12:19 ` Damien Le Moal
2023-09-12 17:39 ` Geert Uytterhoeven
2023-09-12 22:58 ` Damien Le Moal
2023-09-13 10:21 ` Geert Uytterhoeven
2023-09-13 10:34 ` Geert Uytterhoeven
2023-09-13 22:07 ` Damien Le Moal
2023-09-14 6:59 ` Geert Uytterhoeven
2023-09-13 22:03 ` Damien Le Moal
2023-09-14 6:53 ` Geert Uytterhoeven
2023-09-14 6:58 ` Damien Le Moal
2023-09-14 15:29 ` Phillip Susi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=610a1f58-d64c-406c-9102-ef3508e3ffec@kernel.org \
--to=dlemoal@kernel.org \
--cc=bvanassche@acm.org \
--cc=dalzot@gmail.com \
--cc=linux-ide@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=paula@soe.ucsc.edu \
--cc=regressions@leemhuis.info \
--cc=regressions@lists.linux.dev \
--cc=rodrigo.vivi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox