All of lore.kernel.org
 help / color / mirror / Atom feed
From: Damien Le Moal <dlemoal@kernel.org>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: "regressions@leemhuis.info" <regressions@leemhuis.info>,
	"dalzot@gmail.com" <dalzot@gmail.com>,
	"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"paula@soe.ucsc.edu" <paula@soe.ucsc.edu>,
	"regressions@lists.linux.dev" <regressions@lists.linux.dev>,
	"bvanassche@acm.org" <bvanassche@acm.org>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Subject: Re: [PATCH] ata,scsi: do not issue START STOP UNIT on resume
Date: Wed, 6 Sep 2023 10:07:07 +0900	[thread overview]
Message-ID: <a4255590-26ef-56ed-8574-5297ac0ef40e@kernel.org> (raw)
In-Reply-To: <ZPditFZNQWQw5yp3@intel.com>

On 9/6/23 02:17, Rodrigo Vivi wrote:
>> I think I have now figured it out, and fixed. I could reliably recreate the same
>> hang both with qemu using a failed suspend (using a device not supporting
>> suspend) and real hardware with a short rtc wake.
>>
>> It turns out that the root cause of the hang is ata_scsi_dev_rescan(), which is
>> scheduled asynchronously from PM context on resume. With quick suspend after a
>> resume, suspend may win the race against that ata_scsi_dev_rescan() task
>> execution and we endup calling scsi_rescan_device() on a suspended device,
>> causing that function to wait with the device_lock() held, which causes PM to
>> deadlock when it needs to resume the scsi device. The recent commit 6aa0365a3c85
>> ("ata: libata-scsi: Avoid deadlock on rescan after device resume") was intended
>> to fix that, but it did so less than ideally and the fix has a race on the scsi
>> power state check, thus not always preventing the resume hang.
>>
>> I pushed a new patch series that goes on top of 6.5.0: resume-v3 branch in the
>> libata tree:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git
>>
>> This works very well for me. Using this script on real hardware:
>>
>> for (( i=0; i<20; i++ )); do
>> 	echo "+2" > /sys/class/rtc/rtc0/wakealarm
>> 	echo mem > /sys/power/state
>> done
>>
>> The system repeatedly suspends and resumes and comes back OK. Of note is that if
>> I set the delay to +1 second, then I sometime do not see the system resume and
>> the script stops. But using wakeup-on-lan (wol command) from another machine to
>> wake it up, the machine resumes normally and continues executing the script. So
>> it seems that setting the rtc alarm unreasonably early result in it being lost
>> and the system suspending wating to be woken up.
>>
>> I also tested this in qemu. As mentioned before, I cannot get rtc alarm to wake
>> up the VM guest though. However, using a virtio device that does not support
>> suspend, resume strats in the middle of the suspend operation due to the suspend
>> error reported by that device. And it turns out that systemd really insists on
>> suspending the system despite the error, so when running "systemctl suspend" I
>> see a retry for suspend right after the first failed one. That is enough to
>> trigger the issue without the patches.
>>
>> Please test !
> 
> \o/ works for me!
> 
> Feel free to use:
> Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Awesome ! Thank you for testing. I will rebase the patches and post the official
version for 6.6 fixes (and the other cleanup patches for 6.7), after retesting
again. Never know :)

-- 
Damien Le Moal
Western Digital Research


  reply	other threads:[~2023-09-06  1:07 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-31  0:39 [PATCH] ata,scsi: do not issue START STOP UNIT on resume Damien Le Moal
2023-07-31  3:48 ` TW
2023-07-31  4:44   ` Damien Le Moal
2023-07-31  5:47 ` Tanner Watkins
2023-07-31 16:13 ` Hannes Reinecke
2023-08-01  3:44   ` Damien Le Moal
2023-08-01  6:16     ` Hannes Reinecke
2023-07-31 19:43 ` Paul Ausbeck
2023-08-01 18:36 ` Bart Van Assche
2023-08-02  8:05 ` Damien Le Moal
2023-08-24 18:28 ` Rodrigo Vivi
2023-08-24 23:42   ` Damien Le Moal
2023-08-25  1:31     ` Martin K. Petersen
2023-08-25  1:33       ` Damien Le Moal
2023-08-25 17:09     ` Rodrigo Vivi
2023-08-25 22:06       ` Damien Le Moal
2023-08-29  6:17       ` Damien Le Moal
2023-08-30 22:14         ` Rodrigo Vivi
2023-08-31  0:32           ` Damien Le Moal
2023-08-31  1:48             ` Vivi, Rodrigo
2023-08-31  3:06               ` Damien Le Moal
2023-09-05  5:20               ` Damien Le Moal
2023-09-05 17:17                 ` Rodrigo Vivi
2023-09-06  1:07                   ` Damien Le Moal [this message]
2023-08-31  6:55       ` Damien Le Moal
2023-08-25 12:19   ` Damien Le Moal
2023-09-12 17:39 ` Geert Uytterhoeven
2023-09-12 22:58   ` Damien Le Moal
2023-09-13 10:21     ` Geert Uytterhoeven
2023-09-13 10:34       ` Geert Uytterhoeven
2023-09-13 22:07         ` Damien Le Moal
2023-09-14  6:59           ` Geert Uytterhoeven
2023-09-13 22:03       ` Damien Le Moal
2023-09-14  6:53         ` Geert Uytterhoeven
2023-09-14  6:58           ` Damien Le Moal
2023-09-14 15:29         ` Phillip Susi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a4255590-26ef-56ed-8574-5297ac0ef40e@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=bvanassche@acm.org \
    --cc=dalzot@gmail.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=paula@soe.ucsc.edu \
    --cc=regressions@leemhuis.info \
    --cc=regressions@lists.linux.dev \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.