From: Maxim <maximlevitsky@gmail.com>
To: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: linux-kernel@vger.kernel.org, Pavel Machek <pavel@ucw.cz>,
Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Subject: Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
Date: Sun, 25 Mar 2007 02:40:39 +0200 [thread overview]
Message-ID: <200703250240.39457.maximlevitsky@gmail.com> (raw)
In-Reply-To: <200703231542.45459.rjw@sisk.pl>
On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote:
> On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
> > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
> > > On Thursday, 22 March 2007 00:39, Maxim wrote:
> > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > > > > > Hi!
> > > > > > > >
> > > > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> > > > > > > > >
> > > > > > > > > I did a git-bisect and found that those commits break it:
> > > > > > > > >
> > > > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> > > > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> > > > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> > > > > > > > >
> > > > > > > >
> > > > > > > > (Yep, it was in my "to analyze" queue).
> > > > > > > >
> > > > > > > > > I already reported about it, but now i know the reason why suspend breaks.
> > > > > > > > >
> > > > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until now,
> > > > > > > > > and it did work because those functions could be called only in process context
> > > > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()).
> > > > > > > > >
> > > > > > > > > But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> > > > > > > > > that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Thanks for detailed explanation.
> > > > > > > >
> > > > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > > > > > >
> > > > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > > > > > > holds a lock, that's a bug.
> > > > > > > >
> > > > > > > > > Or, it is also possible to revert this change.
> > > > > > > >
> > > > > > > > Are you using xfs?
> > > > > > >
> > > > > > > Well, this is the only case that can trigger it. There are no other freezable
> > > > > > > workqueues.
> > > > > > >
> > > > > > > Greetings,
> > > > > > > Rafael
> > > > > > >
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > Yes, you are right and it is XFS
> > > > > >
> > > > > > System suspends and resumes with xfs and your patch correctly,
> > > > >
> > > > > Could you please sent this information to the list? I'd like it to reach all
> > > > > of the CCed parites. ;-)
> > > >
> > > > I did now ( sorry I just keep using this Answer command, instead of Answer to everybody)
> > > > I didn't intend to send private email.
> > > > >
> > > > > > Of course I need to mention that I had to unload microcode update driver because it prevented resume,
> > > > > > because it calls firmware loader helper, and again sleeps on lock
> > > > >
> > > > > This is interesting. Did it happen before or is it a regression?
> > > >
> > > > It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks
> > > > Of course it didn't happen before those reordering commits were introduced
> > >
> > > Well, we want cpu_up/down to be called after processes have been frozen, for
> > > various reasons (one of them being that applications shouldn't see us playing
> > > with the CPUs).
> > >
> > > Thanks for reporting this, I'll have a look at the microcode update driver.
> >
> > Well, I have invented the appended workaround, but I'm not sure how much
> > sense it makes with respect to the microcode driver. At least, it doesn't
> > break my AMD64 SMP setup. ;-)
>
> Modified version of the patch is appended. Unfortunately I have no hardware
> supporting the microcode updates.
>
> Greetings,
> Rafael
>
>
> ---
> arch/i386/kernel/microcode.c | 28 +++++++++++++++++++++++++---
> include/linux/cpu.h | 2 ++
> kernel/cpu.c | 32 ++++++++++++++++----------------
> 3 files changed, 43 insertions(+), 19 deletions(-)
>
> Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
> ===================================================================
> --- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c
> +++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c
> @@ -567,6 +567,16 @@ static int cpu_request_microcode(int cpu
> return error;
> }
>
> +static void apply_microcode_on_cpu(int cpu)
> +{
> + cpumask_t old;
> +
> + old = current->cpus_allowed;
> + set_cpus_allowed(current, cpumask_of_cpu(cpu));
> + apply_microcode(cpu);
> + set_cpus_allowed(current, old);
> +}
> +
> static void microcode_init_cpu(int cpu)
> {
> cpumask_t old;
> @@ -663,13 +673,21 @@ static int mc_sysdev_add(struct sys_devi
> return 0;
>
> pr_debug("Microcode:CPU %d added\n", cpu);
> - memset(uci, 0, sizeof(*uci));
> + /* If suspend_cpu_hotplug is set, the system is resuming and we should
> + * use the data from before the suspend.
> + */
> + if (!suspend_cpu_hotplug)
> + memset(uci, 0, sizeof(*uci));
>
> err = sysfs_create_group(&sys_dev->kobj, &mc_attr_group);
> if (err)
> return err;
>
> - microcode_init_cpu(cpu);
> + if (suspend_cpu_hotplug && uci->valid)
> + apply_microcode_on_cpu(cpu);
> + else
> + microcode_init_cpu(cpu);
> +
> return 0;
> }
>
> @@ -680,7 +698,11 @@ static int mc_sysdev_remove(struct sys_d
> if (!cpu_online(cpu))
> return 0;
> pr_debug("Microcode:CPU %d removed\n", cpu);
> - microcode_fini_cpu(cpu);
> + /* If suspend_cpu_hotplug is set, the system is suspending and we should
> + * keep the microcode in memory for the resume.
> + */
> + if (!suspend_cpu_hotplug)
> + microcode_fini_cpu(cpu);
> sysfs_remove_group(&sys_dev->kobj, &mc_attr_group);
> return 0;
> }
> Index: linux-2.6.21-rc4/include/linux/cpu.h
> ===================================================================
> --- linux-2.6.21-rc4.orig/include/linux/cpu.h
> +++ linux-2.6.21-rc4/include/linux/cpu.h
> @@ -127,6 +127,8 @@ static inline int cpu_is_offline(int cpu
> #endif /* CONFIG_HOTPLUG_CPU */
>
> #ifdef CONFIG_SUSPEND_SMP
> +extern int suspend_cpu_hotplug;
> +
> extern int disable_nonboot_cpus(void);
> extern void enable_nonboot_cpus(void);
> #else
> Index: linux-2.6.21-rc4/kernel/cpu.c
> ===================================================================
> --- linux-2.6.21-rc4.orig/kernel/cpu.c
> +++ linux-2.6.21-rc4/kernel/cpu.c
> @@ -254,6 +254,12 @@ int __cpuinit cpu_up(unsigned int cpu)
> }
>
> #ifdef CONFIG_SUSPEND_SMP
> +/* Needed to prevent the microcode driver from requesting firmware in its CPU
> + * hotplug notifier during the suspend/resume.
> + */
> +int suspend_cpu_hotplug;
> +EXPORT_SYMBOL(suspend_cpu_hotplug);
> +
> static cpumask_t frozen_cpus;
>
> int disable_nonboot_cpus(void)
> @@ -261,16 +267,8 @@ int disable_nonboot_cpus(void)
> int cpu, first_cpu, error = 0;
>
> mutex_lock(&cpu_add_remove_lock);
> - first_cpu = first_cpu(cpu_present_map);
> - if (!cpu_online(first_cpu)) {
> - error = _cpu_up(first_cpu);
> - if (error) {
> - printk(KERN_ERR "Could not bring CPU%d up.\n",
> - first_cpu);
> - goto out;
> - }
> - }
> -
> + suspend_cpu_hotplug = 1;
> + first_cpu = first_cpu(cpu_online_map);
> /* We take down all of the non-boot CPUs in one shot to avoid races
> * with the userspace trying to use the CPU hotplug at the same time
> */
> @@ -296,7 +294,7 @@ int disable_nonboot_cpus(void)
> } else {
> printk(KERN_ERR "Non-boot CPUs are not disabled\n");
> }
> -out:
> + suspend_cpu_hotplug = 0;
> mutex_unlock(&cpu_add_remove_lock);
> return error;
> }
> @@ -308,20 +306,22 @@ void enable_nonboot_cpus(void)
> /* Allow everyone to use the CPU hotplug again */
> mutex_lock(&cpu_add_remove_lock);
> cpu_hotplug_disabled = 0;
> - mutex_unlock(&cpu_add_remove_lock);
> if (cpus_empty(frozen_cpus))
> - return;
> + goto out;
>
> + suspend_cpu_hotplug = 1;
> printk("Enabling non-boot CPUs ...\n");
> for_each_cpu_mask(cpu, frozen_cpus) {
> - error = cpu_up(cpu);
> + error = _cpu_up(cpu);
> if (!error) {
> printk("CPU%d is up\n", cpu);
> continue;
> }
> - printk(KERN_WARNING "Error taking CPU%d up: %d\n",
> - cpu, error);
> + printk(KERN_WARNING "Error taking CPU%d up: %d\n", cpu, error);
> }
> cpus_clear(frozen_cpus);
> + suspend_cpu_hotplug = 0;
> +out:
> + mutex_unlock(&cpu_add_remove_lock);
> }
> #endif
>
Hi,
I confirm that the above patch works,
At least system didn't hang on resume with microcode driver loaded,
I can't really test whenever it did update microcode because I almost sure there is nothing to update
(I use core 2 duo that I bought a month ago, and an intel motherboard with latest bios ( updated yesterday) )
I selected this driver just in case when I compiled kernel.
Regards,
Maxim Levitsky
next prev parent reply other threads:[~2007-03-25 0:40 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-03-21 16:40 [BUG] Code reordering in swsusp breaks suspend on SMP systems Maxim Levitsky
2007-03-21 21:22 ` Nigel Cunningham
2007-03-21 21:38 ` Rafael J. Wysocki
2007-03-21 23:47 ` Nigel Cunningham
2007-03-22 0:25 ` Maxim
2007-03-22 4:51 ` David Chinner
2007-03-22 7:23 ` Rafael J. Wysocki
2007-03-22 7:31 ` Andrew Morton
2007-03-22 8:17 ` Rafael J. Wysocki
[not found] ` <200703220114.05228.maximlevitsky@gmail.com>
2007-03-21 23:16 ` Maxim
2007-03-22 0:32 ` Maxim
2007-03-21 22:21 ` Pavel Machek
2007-03-21 22:39 ` Rafael J. Wysocki
2007-03-21 22:58 ` [RFC] : Is /proc/kcore still usefull and/or maintained ? Eric Dumazet
2007-03-21 23:11 ` Jan Engelhardt
2007-03-21 23:28 ` Maxim
2007-03-21 23:53 ` Eric Dumazet
2007-03-22 0:04 ` Maxim
2007-03-22 6:35 ` Eric Dumazet
[not found] ` <200703220109.54719.maximlevitsky@gmail.com>
2007-03-21 23:18 ` [BUG] Code reordering in swsusp breaks suspend on SMP systems Maxim
[not found] ` <200703220024.25436.rjw@sisk.pl>
2007-03-21 23:39 ` Maxim
2007-03-21 23:44 ` Maxim
2007-03-21 23:53 ` Rafael J. Wysocki
2007-03-22 0:01 ` Maxim
2007-03-22 23:30 ` Rafael J. Wysocki
2007-03-23 14:42 ` Rafael J. Wysocki
2007-03-25 0:40 ` Maxim [this message]
2007-03-25 12:13 ` Rafael J. Wysocki
2007-03-25 15:10 ` Maxim
2007-03-25 19:27 ` Rafael J. Wysocki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200703250240.39457.maximlevitsky@gmail.com \
--to=maximlevitsky@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=pavel@ucw.cz \
--cc=rjw@sisk.pl \
--cc=tigran@aivazian.fsnet.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.