Re: [PATCH] ata,scsi: do not issue START STOP UNIT on resume

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Damien Le Moal <dlemoal@kernel.org>
To: "Vivi, Rodrigo" <rodrigo.vivi@intel.com>
Cc: "regressions@leemhuis.info" <regressions@leemhuis.info>,
	"dalzot@gmail.com" <dalzot@gmail.com>,
	"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"paula@soe.ucsc.edu" <paula@soe.ucsc.edu>,
	"regressions@lists.linux.dev" <regressions@lists.linux.dev>,
	"bvanassche@acm.org" <bvanassche@acm.org>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Subject: Re: [PATCH] ata,scsi: do not issue START STOP UNIT on resume
Date: Thu, 31 Aug 2023 12:06:45 +0900	[thread overview]
Message-ID: <610a1f58-d64c-406c-9102-ef3508e3ffec@kernel.org> (raw)
In-Reply-To: <9e09411348ae7469b4a9a7d076a8c42f84d12823.camel@intel.com>

On 8/31/23 10:48, Vivi, Rodrigo wrote:
> On Thu, 2023-08-31 at 09:32 +0900, Damien Le Moal wrote:
>> On 8/31/23 07:14, Rodrigo Vivi wrote:
>>> On Tue, Aug 29, 2023 at 03:17:38PM +0900, Damien Le Moal wrote:
>>>> On 8/26/23 02:09, Rodrigo Vivi wrote:
>>>>>>> So, maybe we have some kind of disks/configuration out
>>>>>>> there where this
>>>>>>> start upon resume is needed? Maybe it is just a matter of
>>>>>>> timming to
>>>>>>> ensure some firmware underneath is up and back to life?
>>>>>>
>>>>>> I do not think so. Suspend will issue a start stop unit
>>>>>> command to put the drive
>>>>>> to sleep and resume will reset the port (which should wake up
>>>>>> the drive) and
>>>>>> then issue an IDENTIFY command (which will also wake up the
>>>>>> drive) and other
>>>>>> read logs etc to rescan the drive.
>>>>>> In both cases, if the commands do not complete, we would see
>>>>>> errors/timeout and
>>>>>> likely port reset/drive gone events. So I think this is
>>>>>> likely another subtle
>>>>>> race between scsi suspend and ata suspend that is causing a
>>>>>> deadlock.
>>>>>>
>>>>>> The main issue I think is that there is no direct ancestry
>>>>>> between the ata port
>>>>>> (device) and scsi device, so the change to scsi async pm ops
>>>>>> made a mess of the
>>>>>> suspend/resume operations ordering. For suspend, scsi device
>>>>>> (child of ata port)
>>>>>> should be first, then ata port device (parent). For resume,
>>>>>> the reverse order is
>>>>>> needed. PM normally ensures that parent/child ordering, but
>>>>>> we lack that
>>>>>> parent/child relationship. I am working on fixing that but it
>>>>>> is very slow
>>>>>> progress because I have been so far enable to recreate any of
>>>>>> the issues that
>>>>>> have been reported. I am patching "blind"...
>>>>>
>>>>> I believe your suspicious makes sense. And on these lines, that
>>>>> patch you
>>>>> attached earlier would fix that. However my initial tries of
>>>>> that didn't
>>>>> help. I'm going to run more tests and get back to you.
>>>>
>>>> Rodrigo,
>>>>
>>>> I pushed the resume-v2 branch to libata tree:
>>>>
>>>> git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata
>>>> (or
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.gi
>>>> t)
>>>>
>>>> This branch adds 13 patches on top of 6.5.0 to cleanup libata
>>>> suspend/resume and
>>>> other device shutdown issues. The first 4 patches are the main
>>>> ones to fix
>>>> suspend resume. I tested that on 2 different machines with
>>>> different drives and
>>>> with qemu. All seems fine.
>>>>
>>>> Could you try to run this through your CI ? I am very interested
>>>> in seeing if it
>>>> survives your suspend/resume tests.
>>>
>>> well, in the end this didn't affect the CI machinery as I was
>>> afraid.
>>> it is only in my local DG2.
>>>
>>> https://intel-gfx-ci.01.org/tree/intel-xe/bat-
>>> all.html?testfilter=suspend
>>> (bat-dg2-oem2 one)
>>>
>>> I just got these 13 patches and applied to my branch and tested it
>>> again
>>> and it still *fails* for me.
>>
>> That is annoying... But I think the messages give us a hint as to
>> what is going
>> on. See below.
>>
>>>
>>> [   79.648328] [IGT] kms_pipe_crc_basic: finished subtest pipe-A-
>>> DP-2, SUCCESS
>>> [   79.657353] [IGT] kms_pipe_crc_basic: starting dynamic subtest
>>> pipe-B-DP-2
>>> [   80.375042] PM: suspend entry (deep)
>>> [   80.380799] Filesystems sync: 0.002 seconds
>>> [   80.386476] Freezing user space processes
>>> [   80.392286] Freezing user space processes completed (elapsed
>>> 0.001 seconds)
>>> [   80.399294] OOM killer disabled.
>>> [   80.402536] Freezing remaining freezable tasks
>>> [   80.408335] Freezing remaining freezable tasks completed
>>> (elapsed 0.001 seconds)
>>> [   80.439372] sd 5:0:0:0: [sdb] Synchronizing SCSI cache
>>> [   80.439716] serial 00:01: disabled
>>> [   80.448011] sd 4:0:0:0: [sda] Synchronizing SCSI cache
>>> [   80.448014] sd 7:0:0:0: [sdc] Synchronizing SCSI cache
>>> [   80.453600] ata6.00: Entering standby power mode
>>
>> This is sd 5:0:0:0. All good, ordered properly with the
>> "Synchronizing SCSI cache".
>>
>>> [   80.464217] ata5.00: Entering standby power mode
>>
>> Same here for sd 4:0:0:0.
>>
>>> [   80.812294] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl
>>> 300)
>>> [   80.818520] ata8.00: Entering active power mode
>>> [   80.842989] ata8.00: configured for UDMA/133
>>
>> Arg ! sd 7:0:0:0 is resuming ! But the above "Synchronizing SCSI
>> cache" tells
>> us that it was suspending and libata EH did not yet put that drive to
>> standby...
>>
>>> [   80.847660] ata8.00: Entering standby power mode
>>
>> ... which happens here. So it looks like libata EH had both the
>> suspend and
>> resume requests at the same time, which is totally weird.
> 
> although it looks weird, it totally matches the 'use case'.
> I mean, if I suspend, resume, and wait a bit before suspend and resume
> again, it will work 100% of the time.
> The issue is really only when another suspend comes right after the
> resume, in a loop without any wait.
> 
>>
>>> [   81.119426] xe 0000:03:00.0: [drm] GT0: suspended
>>> [   81.800508] PM: suspend of devices complete after 1367.829 msecs
>>> [   81.806661] PM: start suspend of devices complete after 1390.859
>>> msecs
>>> [   81.813244] PM: suspend devices took 1.398 seconds
>>> [   81.820101] PM: late suspend of devices complete after 2.036
>>> msecs
>>
>> ...and PM suspend completes here. Resume "starts" now (but clearly it
>> started
>> earlier already given that sd 7:0:0:0 was reactivated.
> 
> that is weird.
> 
>>
>>> �[   82.403857] serial 00:01: activated
>>> [   82.489612] nvme nvme0: 16/0/0 default/read/poll queues
>>> [   82.563318] r8169 0000:07:00.0 enp7s0: Link is Down
>>> [   82.581444] xe REG[0x223a8-0x223af]: allow read access
>>> [   82.586704] xe REG[0x1c03a8-0x1c03af]: allow read access
>>> [   82.592071] xe REG[0x1d03a8-0x1d03af]: allow read access
>>> [   82.597423] xe REG[0x1c83a8-0x1c83af]: allow read access
>>> [   82.602765] xe REG[0x1d83a8-0x1d83af]: allow read access
>>> [   82.608113] xe REG[0x1a3a8-0x1a3af]: allow read access
>>> [   82.613281] xe REG[0x1c3a8-0x1c3af]: allow read access
>>> [   82.618454] xe REG[0x1e3a8-0x1e3af]: allow read access
>>> [   82.623634] xe REG[0x263a8-0x263af]: allow read access
>>> [   82.628816] xe 0000:03:00.0: [drm] GT0: resumed
>>> [   82.728005] ata7: SATA link down (SStatus 4 SControl 300)
>>> [   82.733531] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl
>>> 300)
>>> [   82.739773] ata5.00: Entering active power mode
>>> [   82.744398] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl
>>> 300)
>>> [   82.750618] ata6.00: Entering active power mode
>>> [   82.755961] ata5.00: configured for UDMA/133
>>> [   82.760479] ata5.00: Enabling discard_zeroes_data
>>> [   82.836266] ata6.00: configured for UDMA/133
>>> [   84.460081] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl
>>> 300)
>>> [   84.466354] ata8.00: Entering active power mode
>>> [   84.497256] ata8.00: configured for UDMA/133
>>> ...
>>
>> And this looks all normal, the drives have all been transitioned to
>> active
>> power mode as expected. And yet, your system is stuck after this,
>> right ?
> 
> yes
> 
>> Can you try to boot with "sysrq_always_enabled" and try to see if
>> sending
>> "ctrl-sysrq-t" keys can give you a stack backtrace of the tasks to
>> see where
>> they are stuck ?
> 
> I will try tomorrow.

After fixing my qemu setup to get rtcwake to work, I succedded in recreating
the hang ! With multiple disks attached to the VM, suspending the VM with an
rtcwake timer set to +1 seconds triggers the resume in the middle of the
suspend and it hangs.

And I found the problem: it is a race in ata_port_request_pm() where the call
to ata_port_wait_eh() is completely innefective, causing suspend and resume to
be scheduled simultaneously, and so one being lost since the pm_msg field is
per port.

I have a fix. I keep doing suspend/resume in the VM and all looks good now.
Cleaning up the patches and I will push a resume-v3 branch soon for you to test.

> 
>>
>> I am going to try something like you do with very short resume rtc
>> timer and
>> multiple disks to see if I can reproduce. But it is starting to look
>> like PM is
>> starting resuming before suspend completes...
> 
> yes, this is what it looks like.

It seems that the PM code is fine with that. The issue really was with libata :)


-- 
Damien Le Moal
Western Digital Research

next prev parent reply	other threads:[~2023-08-31  3:06 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-31  0:39 [PATCH] ata,scsi: do not issue START STOP UNIT on resume Damien Le Moal
2023-07-31  3:48 ` TW
2023-07-31  4:44   ` Damien Le Moal
2023-07-31  5:47 ` Tanner Watkins
2023-07-31 16:13 ` Hannes Reinecke
2023-08-01  3:44   ` Damien Le Moal
2023-08-01  6:16     ` Hannes Reinecke
2023-07-31 19:43 ` Paul Ausbeck
2023-08-01 18:36 ` Bart Van Assche
2023-08-02  8:05 ` Damien Le Moal
2023-08-24 18:28 ` Rodrigo Vivi
2023-08-24 23:42   ` Damien Le Moal
2023-08-25  1:31     ` Martin K. Petersen
2023-08-25  1:33       ` Damien Le Moal
2023-08-25 17:09     ` Rodrigo Vivi
2023-08-25 22:06       ` Damien Le Moal
2023-08-29  6:17       ` Damien Le Moal
2023-08-30 22:14         ` Rodrigo Vivi
2023-08-31  0:32           ` Damien Le Moal
2023-08-31  1:48             ` Vivi, Rodrigo
2023-08-31  3:06               ` Damien Le Moal [this message]
2023-09-05  5:20               ` Damien Le Moal
2023-09-05 17:17                 ` Rodrigo Vivi
2023-09-06  1:07                   ` Damien Le Moal
2023-08-31  6:55       ` Damien Le Moal
2023-08-25 12:19   ` Damien Le Moal
2023-09-12 17:39 ` Geert Uytterhoeven
2023-09-12 22:58   ` Damien Le Moal
2023-09-13 10:21     ` Geert Uytterhoeven
2023-09-13 10:34       ` Geert Uytterhoeven
2023-09-13 22:07         ` Damien Le Moal
2023-09-14  6:59           ` Geert Uytterhoeven
2023-09-13 22:03       ` Damien Le Moal
2023-09-14  6:53         ` Geert Uytterhoeven
2023-09-14  6:58           ` Damien Le Moal
2023-09-14 15:29         ` Phillip Susi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=610a1f58-d64c-406c-9102-ef3508e3ffec@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=bvanassche@acm.org \
    --cc=dalzot@gmail.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=paula@soe.ucsc.edu \
    --cc=regressions@leemhuis.info \
    --cc=regressions@lists.linux.dev \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.