public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed
* SMP suspend broken due to "swsusp: Change code ordering in disk.c" et al.
@ 2007-02-23  3:29 Johannes Berg
  2007-02-23 11:54 ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Berg @ 2007-02-23  3:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, Linus Torvalds, Dave Vasilevsky, Pavel Machek,
	Alexey Starikovskiy, Nigel Cunningham, linux-pm


[-- Attachment #1.1: Type: text/plain, Size: 2439 bytes --]

Hi,

After first debugging a while and then bisecting I found out why my quad
G5 won't suspend any longer.

Let me explain. The patch in question (committed as
ed746e3b18f4df18afa3763155972c5835f284c5, but the other ones around that
for other suspend methods will have the same problems) modifies the
suspend sequence to be like this:

freeze_processes
swsusp_shrink_memory
platform_prepare
device_suspend
disable_nonboot_cpus
[...]

while previously it was

disable_nonboot_cpus
freeze_processes
platform_prepare
swsusp_shrink_memory
[...]


The only thing I'm worried about here is the ordering of
freeze_processes vs. disable_nonboot_cpus. The problem with this new
ordering is with workqueues, specifically freezable per-CPU workqueues
which consist of one kthread per CPU, bound to a single CPU. Now, when
CPUs are hot-unplugged, the workqueue code (by having a cpu notifier
called) will kill the thread for the CPU that is being unplugged. If you
look into kernel/workqueue.c, you'll notice that this is done by a
regular kthread_stop() as one might expect.

However, and this is the problem, for any freezable workqueue, the
workqueue kthread will be frozen at this point! Hence, kthread_stop()
will wait forever for the thread to finish, blocking the suspend
process.

Now, as for a solution, I don't really have a great idea yet. We have a
bunch of things we could do:
 (1) simply change the ordering to disable nonboot CPUs much earlier
 (2a) teach kthread_stop() about frozen processes and that it doesn't
      need to wait for them because they'll die once they wake up again
 (2b) teach kthread_stop() about frozen processes modify the freezer to
      allow waking up a process that is destined to die
 (3) teach the workqueue code about suspend

Of these options,

(1) would work, but also only punts the problem until someone wants to
do multi-threaded suspend (as if...).

(2a) would sort-of work, but what if someone unplugs a CPU while the
system is suspended [will that even work]? the thread would get really
stuck there, bound to a CPU that no longer exists.

(2b) should be possible, but would require some sort of per-thread
exit-the-freezer API

(3) is icky

I think I prefer (2b) or alternatively (1). In any case, with the commit
mentioned above reverted, my quad G5 can suspend to disk again and I'm
happy that it isn't my fault ;)

johannes

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2007-03-12 18:14 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-02-23  3:29 SMP suspend broken due to "swsusp: Change code ordering in disk.c" et al Johannes Berg
2007-02-23 11:54 ` Rafael J. Wysocki
2007-02-23 12:17   ` Johannes Berg
2007-02-23 13:25     ` Rafael J. Wysocki
2007-02-23 20:23       ` Johannes Berg
2007-02-24  0:01         ` Rafael J. Wysocki
2007-02-24  0:31           ` Johannes Berg
2007-02-24  8:57             ` Rafael J. Wysocki
2007-02-24 20:54               ` Rafael J. Wysocki
2007-02-24 21:07                 ` Johannes Berg
2007-03-12 16:57                 ` Roman Jarosz
2007-03-12 18:14                   ` Rafael J. Wysocki
2007-02-23 13:31     ` Johannes Berg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox