From: Damien Le Moal <dlemoal@kernel.org>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: "regressions@leemhuis.info" <regressions@leemhuis.info>,
"dalzot@gmail.com" <dalzot@gmail.com>,
"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"paula@soe.ucsc.edu" <paula@soe.ucsc.edu>,
"regressions@lists.linux.dev" <regressions@lists.linux.dev>,
"bvanassche@acm.org" <bvanassche@acm.org>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Subject: Re: [PATCH] ata,scsi: do not issue START STOP UNIT on resume
Date: Wed, 6 Sep 2023 10:07:07 +0900
Message-ID: <a4255590-26ef-56ed-8574-5297ac0ef40e@kernel.org>
In-Reply-To: <ZPditFZNQWQw5yp3@intel.com>

On 9/6/23 02:17, Rodrigo Vivi wrote:
>> I think I have now figured it out and fixed it. I could reliably recreate the
>> same hang both with qemu using a failed suspend (using a device not supporting
>> suspend) and real hardware with a short rtc wake.
>>
>> It turns out that the root cause of the hang is ata_scsi_dev_rescan(), which is
>> scheduled asynchronously from PM context on resume. With a quick suspend after
>> a resume, suspend may win the race against that ata_scsi_dev_rescan() task
>> execution and we end up calling scsi_rescan_device() on a suspended device,
>> causing that function to wait with the device_lock() held, which causes PM to
>> deadlock when it needs to resume the scsi device. The recent commit 6aa0365a3c85
>> ("ata: libata-scsi: Avoid deadlock on rescan after device resume") was intended
>> to fix that, but it did so less than ideally and the fix has a race on the scsi
>> power state check, thus not always preventing the resume hang.
>>
>> I pushed a new patch series that goes on top of 6.5.0: resume-v3 branch in the
>> libata tree:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git
>>
>> This works very well for me. Using this script on real hardware:
>>
>> for (( i=0; i<20; i++ )); do
>>     echo "+2" > /sys/class/rtc/rtc0/wakealarm
>>     echo mem > /sys/power/state
>> done
>>
>> The system repeatedly suspends and resumes and comes back OK. Of note is that if
>> I set the delay to +1 second, then I sometimes do not see the system resume and
>> the script stops. But using wake-on-LAN (wol command) from another machine to
>> wake it up, the machine resumes normally and continues executing the script. So
>> it seems that setting the rtc alarm unreasonably early results in it being lost
>> and the system suspending, waiting to be woken up.
>>
>> I also tested this in qemu. As mentioned before, I cannot get the rtc alarm to
>> wake up the VM guest though. However, using a virtio device that does not
>> support suspend, resume starts in the middle of the suspend operation due to the
>> suspend error reported by that device. And it turns out that systemd really
>> insists on suspending the system despite the error, so when running
>> "systemctl suspend" I see a retry for suspend right after the first failed one.
>> That is enough to trigger the issue without the patches.
>>
>> Please test!
>
> \o/ works for me!
>
> Feel free to use:
> Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Awesome! Thank you for testing. I will rebase the patches and post the official
version for 6.6 fixes (and the other cleanup patches for 6.7) after retesting.
You never know :)
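
For anyone repeating the test, the quoted loop can be parameterized so that the
cycle count and wake delay are easy to vary. What follows is only a minimal
sketch, not the script used above: the function name suspend_cycles and the
RTC_ALARM / PM_STATE override variables are hypothetical, added so the loop can
be dry-run against ordinary files. Writing 0 before arming is assumed to cancel
a still-pending alarm.

```sh
#!/bin/sh
# Sketch of a parameterized suspend/resume stress loop.
# RTC_ALARM and PM_STATE are hypothetical overrides (not from the thread);
# they default to the sysfs paths used in the quoted script.

suspend_cycles() {
    local count="${1:-20}"   # number of suspend/resume cycles
    local delay="${2:-2}"    # rtc wake delay in seconds
    local alarm="${RTC_ALARM:-/sys/class/rtc/rtc0/wakealarm}"
    local state="${PM_STATE:-/sys/power/state}"
    local i=0

    while [ "$i" -lt "$count" ]; do
        # Assumption: writing 0 cancels an alarm that is still pending.
        echo 0 > "$alarm" || return 1
        # Arm a relative alarm, as the quoted script does with "+2".
        echo "+${delay}" > "$alarm" || return 1
        # Request suspend-to-RAM; stop at the first reported failure.
        echo mem > "$state" || {
            echo "suspend failed at cycle $i" >&2
            return 1
        }
        echo "cycle $i: resumed"
        i=$((i + 1))
    done
}
```

Running "suspend_cycles 20 2" as root should correspond to the quoted loop.
Given the lost-alarm behaviour seen with +1, a delay of 2 seconds or more is
probably the safer choice.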
--
Damien Le Moal
Western Digital Research