From: "Petr Tesařík" <petr@tesarici.cz>
To: linux-pm@vger.kernel.org, linux-acpi@vger.kernel.org,
"Rafael J. Wysocki" <rafael@kernel.org>
Subject: Re: Thinkpad E595 system deadlock on resume from S3
Date: Tue, 3 Oct 2023 11:31:35 +0200 [thread overview]
Message-ID: <20231003113135.38384a87@meshulam.tesarici.cz> (raw)
In-Reply-To: <20230930122054.3cf727a4@meshulam>
Hi again (adding more recipients),
On Sat, 30 Sep 2023 12:20:54 +0200
Petr Tesařík <petr@tesarici.cz> wrote:
> Hi all,
>
> this time no patch (yet). In short, my Thinkpad running v6.6-rc3 fails
> to resume from S3. It also fails the same way with Tumbleweed v6.5
> kernel. I was able to capture a crash dump of the v6.5 kernel, and
> here's my analysis:
>
> The system never gets to waking up my SATA SSD disk:
>
> [0:0:0:0] disk ATA KINGSTON SEDC600 H5.1 /dev/sda
>
> There is a pending resume work for kworker/u32:12 (PID 11032), but this
> worker is stuck in 'D' state:
>
> >>> prog.stack_trace(11032)
> #0 context_switch (../kernel/sched/core.c:5381:2)
> #1 __schedule (../kernel/sched/core.c:6710:8)
> #2 schedule (../kernel/sched/core.c:6786:3)
> #3 schedule_preempt_disabled (../kernel/sched/core.c:6845:2)
> #4 __mutex_lock_common (../kernel/locking/mutex.c:679:3)
> #5 __mutex_lock (../kernel/locking/mutex.c:747:9)
> #6 acpi_device_hotplug (../drivers/acpi/scan.c:382:2)
> #7 acpi_hotplug_work_fn (../drivers/acpi/osl.c:1162:2)
> #8 process_one_work (../kernel/workqueue.c:2600:2)
> #9 worker_thread (../kernel/workqueue.c:2751:4)
> #10 kthread (../kernel/kthread.c:389:9)
> #11 ret_from_fork (../arch/x86/kernel/process.c:145:3)
> #12 ret_from_fork_asm+0x1b/0x20 (../arch/x86/entry/entry_64.S:304)
>
> acpi_device_hotplug() tries to acquire acpi_scan_lock, which is held by
> systemd-sleep (PID 11002). This task is also in 'D' state:
>
> >>> prog.stack_trace(11002)
> #0 context_switch (../kernel/sched/core.c:5381:2)
> #1 __schedule (../kernel/sched/core.c:6710:8)
> #2 schedule (../kernel/sched/core.c:6786:3)
> #3 schedule_preempt_disabled (../kernel/sched/core.c:6845:2)
> #4 __mutex_lock_common (../kernel/locking/mutex.c:679:3)
> #5 __mutex_lock (../kernel/locking/mutex.c:747:9)
> #6 device_lock (../include/linux/device.h:958:2)
> #7 device_complete (../drivers/base/power/main.c:1063:2)
> #8 dpm_complete (../drivers/base/power/main.c:1121:3)
> #9 suspend_devices_and_enter (../kernel/power/suspend.c:516:2)
I believe the issue must be somewhere here. The whole suspend and
resume logic in suspend_devices_and_enter() is framed by
platform_suspend_begin() and platform_resume_end().
My system is an ACPI system, so suspend_ops contains:
.begin = acpi_suspend_begin,
.end = acpi_pm_end,
Now, acpi_suspend_begin() acquires acpi_scan_lock through
acpi_pm_start(), and the lock is not released until acpi_pm_end().
Since dpm_complete() waits for the completion of a work that tries to
acquire acpi_scan_lock, the system will deadlock.
AFAICS either:
a. the ACPI lock cannot be held while dpm_complete() runs, or
b. ata_scsi_dev_rescan() must not be scheduled before the system is
resumed, or
c. acpi_device_hotplug() must be implemented without taking dev->mutex.
My gut feeling is that b. is the right answer.
I'm afraid I can't contribute too much more ATM, at least not before I
find out what broke my USB hub between rc3 and rc4.
Regards
Petr T
> #10 enter_state (../kernel/power/suspend.c:592:10)
> #11 pm_suspend (../kernel/power/suspend.c:619:10)
> #12 state_store (../kernel/power/main.c:707:11)
> #13 kernfs_fop_write_iter (../fs/kernfs/file.c:334:9)
> #14 call_write_iter (../include/linux/fs.h:1877:9)
> #15 new_sync_write (../fs/read_write.c:491:8)
> #16 vfs_write (../fs/read_write.c:584:9)
> #17 ksys_write (../fs/read_write.c:637:9)
> #18 do_syscall_x64 (../arch/x86/entry/common.c:50:14)
> #19 do_syscall_64 (../arch/x86/entry/common.c:80:7)
> #20 entry_SYSCALL_64+0xaa/0x1a6 (../arch/x86/entry/entry_64.S:120)
>
> It is trying to acquire dev->mutex, which is in turn held by
> kworker/0:0 (PID 10830), also in 'D' state:
>
> >>> prog.stack_trace(10830)
> #0 context_switch (../kernel/sched/core.c:5381:2)
> #1 __schedule (../kernel/sched/core.c:6710:8)
> #2 schedule (../kernel/sched/core.c:6786:3)
> #3 blk_queue_enter (../block/blk-core.c:326:3)
> #4 blk_mq_alloc_request (../block/blk-mq.c:592:9)
> #5 scsi_alloc_request (../drivers/scsi/scsi_lib.c:1139:7)
> #6 scsi_execute_cmd (../drivers/scsi/scsi_lib.c:217:8)
> #7 scsi_vpd_inquiry (../drivers/scsi/scsi.c:312:11)
> #8 scsi_get_vpd_size (../drivers/scsi/scsi.c:345:11)
> #9 scsi_get_vpd_size (../drivers/scsi/scsi.c:336:5)
> #10 scsi_get_vpd_buf (../drivers/scsi/scsi.c:415:12)
> #11 scsi_attach_vpd (../drivers/scsi/scsi.c:483:12)
> #12 scsi_rescan_device (../drivers/scsi/scsi_scan.c:1628:2)
> #13 ata_scsi_dev_rescan (../drivers/ata/libata-scsi.c:4894:4)
> #14 process_one_work (../kernel/workqueue.c:2600:2)
> #15 worker_thread (../kernel/workqueue.c:2751:4)
> #16 kthread (../kernel/kthread.c:389:9)
> #17 ret_from_fork (../arch/x86/kernel/process.c:145:3)
> #18 ret_from_fork_asm+0x1b/0x20 (../arch/x86/entry/entry_64.S:304)
>
> And here we are, waiting for the completion of the resume work that is
> scheduled on the blocked kworker/u32:12 (PID 11032), see above.
>
> FWIW the deadlock does not happen if autosuspend is turned off for this
> disk.
>
> I'm willing to invest more time into debugging, but I'm not well-versed
> in power management, so I may need some help from you as to what else I
> should look at. Please, include me in Cc of all potential replies; I'm
> not subscribed to linux-pm.
>
> Petr T
next prev parent reply other threads:[~2023-10-03 9:31 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-30 10:20 Thinkpad E595 system deadlock on resume from S3 Petr Tesařík
2023-10-03 9:31 ` Petr Tesařík [this message]
2023-10-03 10:15 ` Rafael J. Wysocki
2023-10-03 11:02 ` Petr Tesařík
2023-10-03 12:34 ` Rafael J. Wysocki
2023-10-03 12:40 ` Petr Tesařík
2023-10-03 12:48 ` Rafael J. Wysocki
2023-10-03 12:51 ` Petr Tesařík
2023-10-03 12:57 ` Rafael J. Wysocki
2023-10-03 15:18 ` Petr Tesařík
2023-10-03 16:16 ` Petr Tesařík
2023-10-03 16:48 ` Rafael J. Wysocki
2023-10-03 17:19 ` Petr Tesařík
2023-10-03 20:07 ` Petr Tesařík
2023-10-04 1:25 ` Damien Le Moal
2023-10-04 6:43 ` Petr Tesařík
2023-10-04 7:38 ` Damien Le Moal
2023-10-04 6:13 ` Hannes Reinecke
2023-10-04 6:18 ` Damien Le Moal
2023-10-04 6:42 ` Petr Tesařík
2023-10-03 12:35 ` Petr Tesařík
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231003113135.38384a87@meshulam.tesarici.cz \
--to=petr@tesarici.cz \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=rafael@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox