public inbox for linux-pm@vger.kernel.org
From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: Andrew Morton <akpm@osdl.org>, Linus Torvalds <torvalds@osdl.org>,
	Dave Vasilevsky <djvasi@gmail.com>, Pavel Machek <pavel@ucw.cz>,
	Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>,
	Nigel Cunningham <ncunningham@cyclades.com>,
	linux-pm <linux-pm@lists.osdl.org>
Subject: Re: SMP suspend broken due to "swsusp: Change code ordering in disk.c" et al.
Date: Fri, 23 Feb 2007 12:54:45 +0100
Message-ID: <200702231254.47009.rjw@sisk.pl>
In-Reply-To: <1172201385.15769.32.camel@johannes.berg>

Hi,

On Friday, 23 February 2007 04:29, Johannes Berg wrote:
> Hi,
> 
> After first debugging a while and then bisecting I found out why my quad
> G5 won't suspend any longer.
> 
> Let me explain. The patch in question (committed as
> ed746e3b18f4df18afa3763155972c5835f284c5, but the other ones around that
> for other suspend methods will have the same problems) modifies the
> suspend sequence to be like this:
> 
> freeze_processes
> swsusp_shrink_memory
> platform_prepare
> device_suspend
> disable_nonboot_cpus
> [...]
> 
> while previously it was
> 
> disable_nonboot_cpus
> freeze_processes
> platform_prepare
> swsusp_shrink_memory
> [...]
> 
> 
> The only thing I'm worried about here is the ordering of
> freeze_processes vs. disable_nonboot_cpus. The problem with this new
> ordering is with workqueues, specifically freezable per-CPU workqueues
> which consist of one kthread per CPU, bound to a single CPU. Now, when
> CPUs are hot-unplugged, the workqueue code (by having a cpu notifier
> called) will kill the thread for the CPU that is being unplugged. If you
> look into kernel/workqueue.c, you'll notice that this is done by a
> regular kthread_stop() as one might expect.
> 
> However, and this is the problem, for any freezable workqueue, the
> workqueue kthread will be frozen at this point! Hence, kthread_stop()
> will wait forever for the thread to finish, blocking the suspend
> process.
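The hang described above can be reproduced with a small userspace analogy. This is a sketch under stated assumptions, not kernel code: a pthread stands in for a freezable per-CPU workqueue kthread, a `frozen` flag stands in for the freezer, `should_stop` for kthread_should_stop(), and the hypothetical model_kthread_stop() for kthread_stop(). The stopper has a timeout only so the demo terminates; where it reports ETIMEDOUT, the real kthread_stop() would keep waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Userspace toy model, NOT kernel code. All names are hypothetical. */
struct worker {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool frozen;        /* set by the "freezer", cleared on thaw */
    bool should_stop;   /* set by the stopper */
    bool exited;        /* set by the worker just before it returns */
};

static void *worker_fn(void *arg)
{
    struct worker *w = arg;

    pthread_mutex_lock(&w->lock);
    for (;;) {
        /* A frozen thread stays in the "refrigerator" until thawed;
         * it does not look at should_stop while it is in there. */
        while (w->frozen)
            pthread_cond_wait(&w->cond, &w->lock);
        if (w->should_stop)
            break;
        pthread_cond_wait(&w->cond, &w->lock);  /* idle: wait for work */
    }
    w->exited = true;
    pthread_cond_broadcast(&w->cond);           /* wake the stopper */
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Like kthread_stop(): ask the worker to exit and wait for it.
 * Returns 0 if it exited, ETIMEDOUT if it did not exit in time. */
static int model_kthread_stop(struct worker *w, long timeout_ms)
{
    struct timespec deadline;
    int err = 0, ret;

    assert(timeout_ms >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&w->lock);
    w->should_stop = true;
    pthread_cond_broadcast(&w->cond);
    while (!w->exited && err == 0)
        err = pthread_cond_timedwait(&w->cond, &w->lock, &deadline);
    ret = w->exited ? 0 : err;
    pthread_mutex_unlock(&w->lock);
    return ret;
}

/* Start a worker, optionally freeze it first, then try to stop it.
 * Returns model_kthread_stop()'s result, or -1 if no thread was created. */
static int run_scenario(bool freeze_first, long timeout_ms)
{
    struct worker w = { .frozen = false, .should_stop = false, .exited = false };
    pthread_t tid;
    int ret = -1;

    pthread_mutex_init(&w.lock, NULL);
    pthread_cond_init(&w.cond, NULL);
    if (pthread_create(&tid, NULL, worker_fn, &w) == 0) {
        pthread_mutex_lock(&w.lock);
        w.frozen = freeze_first;                  /* freeze_processes() analogue */
        pthread_mutex_unlock(&w.lock);

        ret = model_kthread_stop(&w, timeout_ms); /* CPU-down notifier analogue */

        /* Thaw so the worker can finish and be joined. */
        pthread_mutex_lock(&w.lock);
        w.frozen = false;
        pthread_cond_broadcast(&w.cond);
        pthread_mutex_unlock(&w.lock);
        pthread_join(tid, NULL);
    }
    pthread_cond_destroy(&w.cond);
    pthread_mutex_destroy(&w.lock);
    return ret;
}
```

run_scenario(false, 2000) returns 0, while run_scenario(true, 200) returns ETIMEDOUT, mirroring the new ordering in which freeze_processes() runs before disable_nonboot_cpus().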

Hm, the only freezable workqueues I was aware of were those in XFS.

Moreover, the patch has got _a_ _lot_ of testing on SMP on x86_64
and I believe it works for people on i386 too.  So the workqueues in question
seem to be architecture-specific.  Is that correct?

> Now, as for a solution, I don't really have a great idea yet. We have a
> bunch of things we could do:
>  (1) simply change the ordering to disable nonboot CPUs much earlier
>  (2a) teach kthread_stop() about frozen processes and that it doesn't
>       need to wait for them because they'll die once they wake up again
>  (2b) teach kthread_stop() about frozen processes and modify the freezer to
>       allow waking up a process that is destined to die
>  (3) teach the workqueue code about suspend
> 
> Of these options,
> 
> (1) would work, but also only punts the problem until someone wants to
> do multi-threaded suspend (as if...).

It will also break symmetry with the resume code that has to be like this
because of ACPI-related issues.

> (2a) would sort-of work, but what if someone unplugs a CPU while the
> system is suspended [will that even work]? The thread would get really
> stuck there, bound to a CPU that no longer exists.

Right now we are working on using the task freezer for CPU hotplugging and if
that works, this won't be an issue.
 
> (2b) should be possible, but would require some sort of per-thread
> exit-the-freezer API.
> 
> (3) is icky

The workqueue code knows about suspend already; that's why we have
create_freezeable_workqueue(), for example.

I'd like to first understand why the workqueues in question here are freezable.

> I think I prefer (2b) or alternatively (1). In any case, with the commit
> mentioned above reverted, my quad G5 can suspend to disk again and I'm
> happy that it isn't my fault ;)

Could you please check if the appended patch (on top of the commit you have
reverted) changes anything?

Rafael

---
 kernel/power/disk.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6.21-rc1/kernel/power/disk.c
===================================================================
--- linux-2.6.21-rc1.orig/kernel/power/disk.c
+++ linux-2.6.21-rc1/kernel/power/disk.c
@@ -132,9 +132,13 @@ int pm_suspend_disk(void)
 	if (error)
 		goto Thaw;
 
+	error = disable_nonboot_cpus();
+	if (error)
+		goto Enable_cpus;
+
 	error = platform_prepare();
 	if (error)
-		goto Thaw;
+		goto Enable_cpus;
 
 	suspend_console();
 	error = device_suspend(PMSG_FREEZE);
@@ -142,10 +146,6 @@ int pm_suspend_disk(void)
 		printk(KERN_ERR "PM: Some devices failed to suspend\n");
 		goto Resume_devices;
 	}
-	error = disable_nonboot_cpus();
-	if (error)
-		goto Enable_cpus;
-
 	if (pm_disk_mode == PM_DISK_TEST) {
 		printk("swsusp debug: Waiting for 5 seconds.\n");
 		mdelay(5000);
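The effect of the patch on the error paths can be modelled in userspace. In this sketch every step is a hypothetical stub that only records its name, the step named by the caller fails, and the goto labels mirror Thaw, Enable_cpus and Resume_devices from the diff. Two things are assumptions: that a failing freeze also goes to Thaw, and that everything after device_suspend() can be collapsed into a single "snapshot" step. Note that in this ordering processes are still frozen before the non-boot CPUs go down.

```c
#include <assert.h>
#include <string.h>

/* Userspace toy model of the step order after the patch, NOT kernel
 * code. Each step only appends its name to trace[]. */
static char trace[256];
static const char *fail_at;   /* name of the step that should fail, or NULL */

static int do_step(const char *name)
{
    assert(strlen(trace) + strlen(name) + 2 <= sizeof(trace));
    strcat(trace, name);
    strcat(trace, " ");
    return (fail_at != NULL && strcmp(fail_at, name) == 0) ? -1 : 0;
}

static int model_suspend_disk(const char *fail_step)
{
    int error;

    trace[0] = '\0';
    fail_at = fail_step;

    error = do_step("freeze");          /* freeze_processes() */
    if (error)
        goto Thaw;
    error = do_step("shrink");          /* swsusp_shrink_memory() */
    if (error)
        goto Thaw;
    error = do_step("cpus_down");       /* disable_nonboot_cpus(), moved up */
    if (error)
        goto Enable_cpus;
    error = do_step("prepare");         /* platform_prepare() */
    if (error)
        goto Enable_cpus;
    error = do_step("dev_suspend");     /* device_suspend(PMSG_FREEZE) */
    if (error)
        goto Resume_devices;
    do_step("snapshot");                /* everything after this, collapsed */

 Resume_devices:
    do_step("dev_resume");
 Enable_cpus:
    do_step("cpus_up");                 /* bring the non-boot CPUs back */
 Thaw:
    do_step("thaw");                    /* thaw the frozen tasks */
    return error;
}
```

For example, model_suspend_disk("prepare") returns -1 and leaves "freeze shrink cpus_down prepare cpus_up thaw " in trace: a platform_prepare() failure now brings the CPUs back before thawing.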

