public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed
* Thinkpad E595 system deadlock on resume from S3
@ 2023-09-30 10:20 Petr Tesařík
  2023-10-03  9:31 ` Petr Tesařík
  0 siblings, 1 reply; 21+ messages in thread
From: Petr Tesařík @ 2023-09-30 10:20 UTC (permalink / raw)
  To: linux-pm

Hi all,

this time no patch (yet). In short, my Thinkpad running v6.6-rc3 fails
to resume from S3. It also fails the same way with Tumbleweed v6.5
kernel. I was able to capture a crash dump of the v6.5 kernel, and
here's my analysis:

The system never gets to waking up my SATA SSD disk:

[0:0:0:0]    disk    ATA      KINGSTON SEDC600 H5.1  /dev/sda 

There is a pending resume work for kworker/u32:12 (PID 11032), but this
worker is stuck in 'D' state:

>>> prog.stack_trace(11032)
#0  context_switch (../kernel/sched/core.c:5381:2)
#1  __schedule (../kernel/sched/core.c:6710:8)
#2  schedule (../kernel/sched/core.c:6786:3)
#3  schedule_preempt_disabled (../kernel/sched/core.c:6845:2)
#4  __mutex_lock_common (../kernel/locking/mutex.c:679:3)
#5  __mutex_lock (../kernel/locking/mutex.c:747:9)
#6  acpi_device_hotplug (../drivers/acpi/scan.c:382:2)
#7  acpi_hotplug_work_fn (../drivers/acpi/osl.c:1162:2)
#8  process_one_work (../kernel/workqueue.c:2600:2)
#9  worker_thread (../kernel/workqueue.c:2751:4)
#10 kthread (../kernel/kthread.c:389:9)
#11 ret_from_fork (../arch/x86/kernel/process.c:145:3)
#12 ret_from_fork_asm+0x1b/0x20 (../arch/x86/entry/entry_64.S:304)

acpi_device_hotplug() tries to acquire acpi_scan_lock, which is held by
systemd-sleep (PID 11002). This task is also in 'D' state:

>>> prog.stack_trace(11002)
#0  context_switch (../kernel/sched/core.c:5381:2)
#1  __schedule (../kernel/sched/core.c:6710:8)
#2  schedule (../kernel/sched/core.c:6786:3)
#3  schedule_preempt_disabled (../kernel/sched/core.c:6845:2)
#4  __mutex_lock_common (../kernel/locking/mutex.c:679:3)
#5  __mutex_lock (../kernel/locking/mutex.c:747:9)
#6  device_lock (../include/linux/device.h:958:2)
#7  device_complete (../drivers/base/power/main.c:1063:2)
#8  dpm_complete (../drivers/base/power/main.c:1121:3)
#9  suspend_devices_and_enter (../kernel/power/suspend.c:516:2)
#10 enter_state (../kernel/power/suspend.c:592:10)
#11 pm_suspend (../kernel/power/suspend.c:619:10)
#12 state_store (../kernel/power/main.c:707:11)
#13 kernfs_fop_write_iter (../fs/kernfs/file.c:334:9)
#14 call_write_iter (../include/linux/fs.h:1877:9)
#15 new_sync_write (../fs/read_write.c:491:8)
#16 vfs_write (../fs/read_write.c:584:9)
#17 ksys_write (../fs/read_write.c:637:9)
#18 do_syscall_x64 (../arch/x86/entry/common.c:50:14)
#19 do_syscall_64 (../arch/x86/entry/common.c:80:7)
#20 entry_SYSCALL_64+0xaa/0x1a6 (../arch/x86/entry/entry_64.S:120)

It is trying to acquire dev->mutex, which is in turn held by
kworker/0:0 (PID 10830), also in 'D' state:

>>> prog.stack_trace(10830)
#0  context_switch (../kernel/sched/core.c:5381:2)
#1  __schedule (../kernel/sched/core.c:6710:8)
#2  schedule (../kernel/sched/core.c:6786:3)
#3  blk_queue_enter (../block/blk-core.c:326:3)
#4  blk_mq_alloc_request (../block/blk-mq.c:592:9)
#5  scsi_alloc_request (../drivers/scsi/scsi_lib.c:1139:7)
#6  scsi_execute_cmd (../drivers/scsi/scsi_lib.c:217:8)
#7  scsi_vpd_inquiry (../drivers/scsi/scsi.c:312:11)
#8  scsi_get_vpd_size (../drivers/scsi/scsi.c:345:11)
#9  scsi_get_vpd_size (../drivers/scsi/scsi.c:336:5)
#10 scsi_get_vpd_buf (../drivers/scsi/scsi.c:415:12)
#11 scsi_attach_vpd (../drivers/scsi/scsi.c:483:12)
#12 scsi_rescan_device (../drivers/scsi/scsi_scan.c:1628:2)
#13 ata_scsi_dev_rescan (../drivers/ata/libata-scsi.c:4894:4)
#14 process_one_work (../kernel/workqueue.c:2600:2)
#15 worker_thread (../kernel/workqueue.c:2751:4)
#16 kthread (../kernel/kthread.c:389:9)
#17 ret_from_fork (../arch/x86/kernel/process.c:145:3)
#18 ret_from_fork_asm+0x1b/0x20 (../arch/x86/entry/entry_64.S:304)

And here we are, waiting for the completion of the resume work that is
scheduled on the blocked kworker/u32:12 (PID 11032), see above.

FWIW the deadlock does not happen if autosuspend is turned off for this
disk.

I'm willing to invest more time into debugging, but I'm not well-versed
in power management, so I may need some help from you as to what else I
should look at. Please, include me in Cc of all potential replies; I'm
not subscribed to linux-pm.

Petr T

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-10-04  7:38 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-30 10:20 Thinkpad E595 system deadlock on resume from S3 Petr Tesařík
2023-10-03  9:31 ` Petr Tesařík
2023-10-03 10:15   ` Rafael J. Wysocki
2023-10-03 11:02     ` Petr Tesařík
2023-10-03 12:34       ` Rafael J. Wysocki
2023-10-03 12:40         ` Petr Tesařík
2023-10-03 12:48           ` Rafael J. Wysocki
2023-10-03 12:51             ` Petr Tesařík
2023-10-03 12:57               ` Rafael J. Wysocki
2023-10-03 15:18                 ` Petr Tesařík
2023-10-03 16:16                   ` Petr Tesařík
2023-10-03 16:48                   ` Rafael J. Wysocki
2023-10-03 17:19                     ` Petr Tesařík
2023-10-03 20:07                     ` Petr Tesařík
2023-10-04  1:25                       ` Damien Le Moal
2023-10-04  6:43                         ` Petr Tesařík
2023-10-04  7:38                           ` Damien Le Moal
2023-10-04  6:13                   ` Hannes Reinecke
2023-10-04  6:18                     ` Damien Le Moal
2023-10-04  6:42                       ` Petr Tesařík
2023-10-03 12:35       ` Petr Tesařík

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox