* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq [not found] ` <cNOy0-5C5-23@gated-at.bofh.it> @ 2009-07-04 18:09 ` Michael Witten 2009-07-04 21:29 ` Rafael J. Wysocki 0 siblings, 1 reply; 17+ messages in thread From: Michael Witten @ 2009-07-04 18:09 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Johannes Stezenbach, Andrew Morton, Venkatesh Pallipadi, linux-acpi, linux-kernel On Tue, 16 Jun 2009 23:39:59 +0200, Rafael J. Wysocki wrote (http://www.spinics.net/lists/linux-acpi/msg22661.html): > In fact, we need to do this entire thing differently. > > The basic problem is that cpufreq_suspend() is a sysdev thing, so it will > always be called with iterrupts off and *only* for CPU0. So, it looks like > the majority of things we do there is just unnecessary (at least). What's the status? This bug is driving me nuts. Thanks, Michael Witten PS See http://groups.google.com/group/linux.kernel/browse_thread/thread/bb3f4e0fc32273c4/e83178dfc9374669 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-07-04 18:09 ` 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq Michael Witten @ 2009-07-04 21:29 ` Rafael J. Wysocki 2009-07-06 21:20 ` Pallipadi, Venkatesh 0 siblings, 1 reply; 17+ messages in thread From: Rafael J. Wysocki @ 2009-07-04 21:29 UTC (permalink / raw) To: Michael Witten Cc: Johannes Stezenbach, Andrew Morton, Venkatesh Pallipadi, linux-acpi, linux-kernel On Saturday 04 July 2009, Michael Witten wrote: > On Tue, 16 Jun 2009 23:39:59 +0200, Rafael J. Wysocki wrote > (http://www.spinics.net/lists/linux-acpi/msg22661.html): > > > In fact, we need to do this entire thing differently. > > > > The basic problem is that cpufreq_suspend() is a sysdev thing, so it will > > always be called with iterrupts off and *only* for CPU0. So, it looks like > > the majority of things we do there is just unnecessary (at least). > > What's the status? This bug is driving me nuts. Unfortunately, still unresolved. Rafael ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-07-04 21:29 ` Rafael J. Wysocki @ 2009-07-06 21:20 ` Pallipadi, Venkatesh 2009-07-06 21:39 ` Rafael J. Wysocki 0 siblings, 1 reply; 17+ messages in thread From: Pallipadi, Venkatesh @ 2009-07-06 21:20 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Michael Witten, Johannes Stezenbach, Andrew Morton, linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org On Sat, 2009-07-04 at 14:29 -0700, Rafael J. Wysocki wrote: > On Saturday 04 July 2009, Michael Witten wrote: > > On Tue, 16 Jun 2009 23:39:59 +0200, Rafael J. Wysocki wrote > > (http://www.spinics.net/lists/linux-acpi/msg22661.html): > > > > > In fact, we need to do this entire thing differently. > > > > > > The basic problem is that cpufreq_suspend() is a sysdev thing, so it will > > > always be called with iterrupts off and *only* for CPU0. So, it looks like > > > the majority of things we do there is just unnecessary (at least). > > > > What's the status? This bug is driving me nuts. > > Unfortunately, still unresolved. Looked at this a bit more from acpi cpufreq angle. But, I feel the patch that Johannes had here http://lkml.indiana.edu/hypermail/linux/kernel/0906.2/00335.html is the right fix as we do the same saving and restoring of flags on SMP when cpu==this_cpu. This change will make code in UP same as that in SMP with 1 CPU active. We can avoid the driver->get call from cpufreq_suspend for the drivers that do not do any freq changes in their suspend methods. But, then we will hit this same problem in cpufreq_resume() path and there we do want to check for adjust_jiffies for all drivers as long as CONSTANT_LOOPS is not set. Thanks, Venki ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-07-06 21:20 ` Pallipadi, Venkatesh @ 2009-07-06 21:39 ` Rafael J. Wysocki 2009-07-06 22:16 ` Pallipadi, Venkatesh 0 siblings, 1 reply; 17+ messages in thread From: Rafael J. Wysocki @ 2009-07-06 21:39 UTC (permalink / raw) To: Pallipadi, Venkatesh Cc: Michael Witten, Johannes Stezenbach, Andrew Morton, linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org On Monday 06 July 2009, Pallipadi, Venkatesh wrote: > On Sat, 2009-07-04 at 14:29 -0700, Rafael J. Wysocki wrote: > > On Saturday 04 July 2009, Michael Witten wrote: > > > On Tue, 16 Jun 2009 23:39:59 +0200, Rafael J. Wysocki wrote > > > (http://www.spinics.net/lists/linux-acpi/msg22661.html): > > > > > > > In fact, we need to do this entire thing differently. > > > > > > > > The basic problem is that cpufreq_suspend() is a sysdev thing, so it will > > > > always be called with iterrupts off and *only* for CPU0. So, it looks like > > > > the majority of things we do there is just unnecessary (at least). > > > > > > What's the status? This bug is driving me nuts. > > > > Unfortunately, still unresolved. > > Looked at this a bit more from acpi cpufreq angle. Thanks! > But, I feel the patch that Johannes had here > http://lkml.indiana.edu/hypermail/linux/kernel/0906.2/00335.html > is the right fix as we do the same saving and restoring of flags on SMP > when cpu==this_cpu. This change will make code in UP same as that in SMP > with 1 CPU active. Andrew didn't like this patch IIRC. > We can avoid the driver->get call from cpufreq_suspend for the drivers > that do not do any freq changes in their suspend methods. But, then we > will hit this same problem in cpufreq_resume() path and there we do want > to check for adjust_jiffies for all drivers as long as CONSTANT_LOOPS is > not set. Why do we need to call that driver->get from cpufreq_suspend() in the first place? We know we are on CPU0, there are no any other CPUs active and interrupts are disabled, so why do we need to care for any races? Rafael ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-07-06 21:39 ` Rafael J. Wysocki @ 2009-07-06 22:16 ` Pallipadi, Venkatesh 0 siblings, 0 replies; 17+ messages in thread From: Pallipadi, Venkatesh @ 2009-07-06 22:16 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Michael Witten, Johannes Stezenbach, Andrew Morton, linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org On Mon, 2009-07-06 at 14:39 -0700, Rafael J. Wysocki wrote: > On Monday 06 July 2009, Pallipadi, Venkatesh wrote: > > On Sat, 2009-07-04 at 14:29 -0700, Rafael J. Wysocki wrote: > > > On Saturday 04 July 2009, Michael Witten wrote: > > > > On Tue, 16 Jun 2009 23:39:59 +0200, Rafael J. Wysocki wrote > > > > (http://www.spinics.net/lists/linux-acpi/msg22661.html): > > > > > > > > > In fact, we need to do this entire thing differently. > > > > > > > > > > The basic problem is that cpufreq_suspend() is a sysdev thing, so it will > > > > > always be called with iterrupts off and *only* for CPU0. So, it looks like > > > > > the majority of things we do there is just unnecessary (at least). > > > > > > > > What's the status? This bug is driving me nuts. > > > > > > Unfortunately, still unresolved. > > > > Looked at this a bit more from acpi cpufreq angle. > > Thanks! > > > But, I feel the patch that Johannes had here > > http://lkml.indiana.edu/hypermail/linux/kernel/0906.2/00335.html > > is the right fix as we do the same saving and restoring of flags on SMP > > when cpu==this_cpu. This change will make code in UP same as that in SMP > > with 1 CPU active. > > Andrew didn't like this patch IIRC. > > > We can avoid the driver->get call from cpufreq_suspend for the drivers > > that do not do any freq changes in their suspend methods. But, then we > > will hit this same problem in cpufreq_resume() path and there we do want > > to check for adjust_jiffies for all drivers as long as CONSTANT_LOOPS is > > not set. > > Why do we need to call that driver->get from cpufreq_suspend() in the first > place? We know we are on CPU0, there are no any other CPUs active and > interrupts are disabled, so why do we need to care for any races? > cpufreq core is trying to get the current frequency on the CPU and call adjust_jiffies(). Both in suspend and resume. Looks like it is needed only for some systems on suspend path, as the driver suspend routine may change freq. But, it is probably needed for most of the systems on resume time as CPU may come up with different freq. It doesn't care about races. Problem is that ->get method used here is same as the one used in normal system working state. In system working state, ->get has to get the frequency from proper CPU using smp_call_function and int enabled. I thought Andrew was worried about SMP case. But, we already do this in SMP case kernel/smp.c:smp_call_function_single() if (cpu == this_cpu) { local_irq_save(flags); func(info); local_irq_restore(flags); } else { That patch changes kernel/up.c:smp_call_function_single() local_irq_disable(); (func)(info); local_irq_enable(); to add save and restore flags. Thanks, Venki ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq
@ 2009-07-04 2:32 mfwitten
0 siblings, 0 replies; 17+ messages in thread
From: mfwitten @ 2009-07-04 2:32 UTC (permalink / raw)
To: linux-acpi
On Tue, 16 Jun 2009 23:39:59 +0200, Rafael J. Wysocki wrote
(http://www.spinics.net/lists/linux-acpi/msg22661.html):
> In fact, we need to do this entire thing differently.
>
> The basic problem is that cpufreq_suspend() is a sysdev thing, so it will
> always be called with iterrupts off and *only* for CPU0. So, it looks like
> the majority of things we do there is just unnecessary (at least).
What's the status? This bug is driving me nuts.
Thanks,
Michael Witten
^ permalink raw reply [flat|nested] 17+ messages in thread[parent not found: <20090615232709.GA6059@sig21.net>]
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq [not found] <20090615232709.GA6059@sig21.net> @ 2009-06-16 0:16 ` Rafael J. Wysocki 2009-06-16 14:22 ` Johannes Stezenbach 0 siblings, 1 reply; 17+ messages in thread From: Rafael J. Wysocki @ 2009-06-16 0:16 UTC (permalink / raw) To: Johannes Stezenbach, Andrew Morton Cc: linux-kernel, Dave Jones, Pavel Machek, ACPI Devel Maling List, Len Brown, Venki Pallipadi, Arjan van de Ven, Thomas Gleixner On Tuesday 16 June 2009, Johannes Stezenbach wrote: > Hi, > > on my aging Thinkpad T42p resume from hibernation > fails in 2.6.30. There is a backtrace on suspend prior > to writing out the disk image, but I cannot capture > it due to lack of a serial port on the T42p. On > resume the machine is dead after reading the image > from disk. > > I've bisected this to: > > commit 01599fca6758d2cd133e78f87426fc851c9ea725 > Author: Andrew Morton <akpm@linux-foundation.org> > Date: Mon Apr 13 10:27:49 2009 -0700 > > cpufreq: use smp_call_function_[single|many]() in acpi-cpufreq.c > > I see in git log that this commit is known broken, but the > resume on my machine is still broken in 2.6.30. > > If I disable CONFIG_X86_ACPI_CPUFREQ suspend/resume works in 2.6.30. Thanks a lot for bisecting this! Is it the reason for the enabling of interrupts during cpufreq_suspend()? /me wonders Is there anything we can do to fix this quickly? Rafael ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-06-16 0:16 ` Rafael J. Wysocki @ 2009-06-16 14:22 ` Johannes Stezenbach 2009-06-16 18:55 ` Andrew Morton 0 siblings, 1 reply; 17+ messages in thread From: Johannes Stezenbach @ 2009-06-16 14:22 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Andrew Morton, linux-kernel, Dave Jones, Pavel Machek, ACPI Devel Maling List, Len Brown, Venki Pallipadi, Arjan van de Ven, Thomas Gleixner On Tue, Jun 16, 2009 at 02:16:28AM +0200, Rafael J. Wysocki wrote: > On Tuesday 16 June 2009, Johannes Stezenbach wrote: > > > > on my aging Thinkpad T42p resume from hibernation > > fails in 2.6.30. There is a backtrace on suspend prior > > to writing out the disk image, but I cannot capture > > it due to lack of a serial port on the T42p. On > > resume the machine is dead after reading the image > > from disk. > > > > I've bisected this to: > > > > commit 01599fca6758d2cd133e78f87426fc851c9ea725 > > Author: Andrew Morton <akpm@linux-foundation.org> > > Date: Mon Apr 13 10:27:49 2009 -0700 > > > > cpufreq: use smp_call_function_[single|many]() in acpi-cpufreq.c > > > > I see in git log that this commit is known broken, but the > > resume on my machine is still broken in 2.6.30. > > > > If I disable CONFIG_X86_ACPI_CPUFREQ suspend/resume works in 2.6.30. > > Thanks a lot for bisecting this! > > Is it the reason for the enabling of interrupts during cpufreq_suspend()? > > /me wonders > > Is there anything we can do to fix this quickly? I think your guess was right. The patch below fixes the problem for me (hang after resume and backtrace on suspend). Johannes ----------------------------- Fix swsusp failure on !SMP Commit 01599fca6758d2cd133e78f87426fc851c9ea725 introduced a regression which caused a backtrace on suspend and a hang on resume on a Thinkpad T42p (Pentium M CPU). Signed-off-by: Johannes Stezenbach <js@sig21.net> --- linux-2.6.30/kernel/up.c.orig 2009-06-16 15:56:28.000000000 +0200 +++ linux-2.6.30/kernel/up.c 2009-06-16 15:57:27.000000000 +0200 @@ -10,11 +10,13 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info, int wait) { + unsigned long flags; + WARN_ON(cpu != 0); - local_irq_disable(); + local_irq_save(flags); (func)(info); - local_irq_enable(); + local_irq_restore(flags); return 0; } ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-06-16 14:22 ` Johannes Stezenbach @ 2009-06-16 18:55 ` Andrew Morton 2009-06-16 19:57 ` Johannes Stezenbach 0 siblings, 1 reply; 17+ messages in thread From: Andrew Morton @ 2009-06-16 18:55 UTC (permalink / raw) To: Johannes Stezenbach Cc: rjw, linux-kernel, davej, pavel, linux-acpi, lenb, venkatesh.pallipadi, arjan, tglx On Tue, 16 Jun 2009 16:22:17 +0200 Johannes Stezenbach <js@sig21.net> wrote: > Fix swsusp failure on !SMP > > Commit 01599fca6758d2cd133e78f87426fc851c9ea725 introduced > a regression which caused a backtrace on suspend and > a hang on resume on a Thinkpad T42p (Pentium M CPU). > > Signed-off-by: Johannes Stezenbach <js@sig21.net> > > > --- linux-2.6.30/kernel/up.c.orig 2009-06-16 15:56:28.000000000 +0200 > +++ linux-2.6.30/kernel/up.c 2009-06-16 15:57:27.000000000 +0200 > @@ -10,11 +10,13 @@ > int smp_call_function_single(int cpu, void (*func) (void *info), void *info, > int wait) > { > + unsigned long flags; > + > WARN_ON(cpu != 0); > > - local_irq_disable(); > + local_irq_save(flags); > (func)(info); > - local_irq_enable(); > + local_irq_restore(flags); > > return 0; > } ok, what's going on here? The patch implies that someone (presumably acpi-cpufreq) is calling smp_call_function_single() with local interrupts disabled. That's a bug on SMP kernels. And it'll generate a trace if it happens: /* Can deadlock when called with interrupts disabled */ WARN_ON_ONCE(irqs_disabled() && !oops_in_progress); but nobody has reported such a trace AFAIK? Also, prior to 01599fca6758d2cd133e78f87426fc851c9ea725, acpi-cpufreq was using work_on_cpu(). If it was calling work_on_cpu() with local interrupts disabled then that would have been a bug too, which could generate might_sleep() or scheduling-while-atomic warnings. Because it is a bug to call the SMP version of smp_call_function_single() with local interrupts disabled, I don't think we should need to apply the above patch. But I don't know what we _should_ do because I don't know what the bug is. Are you able to get us a copy of that stack trace? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-06-16 18:55 ` Andrew Morton @ 2009-06-16 19:57 ` Johannes Stezenbach 2009-06-16 20:25 ` Pallipadi, Venkatesh 0 siblings, 1 reply; 17+ messages in thread From: Johannes Stezenbach @ 2009-06-16 19:57 UTC (permalink / raw) To: Andrew Morton Cc: rjw, linux-kernel, davej, pavel, linux-acpi, lenb, venkatesh.pallipadi, arjan, tglx On Tue, Jun 16, 2009 at 11:55:40AM -0700, Andrew Morton wrote: > On Tue, 16 Jun 2009 16:22:17 +0200 > Johannes Stezenbach <js@sig21.net> wrote: > > > Fix swsusp failure on !SMP > > > > Commit 01599fca6758d2cd133e78f87426fc851c9ea725 introduced > > a regression which caused a backtrace on suspend and > > a hang on resume on a Thinkpad T42p (Pentium M CPU). > > > > Signed-off-by: Johannes Stezenbach <js@sig21.net> > > > > > > --- linux-2.6.30/kernel/up.c.orig 2009-06-16 15:56:28.000000000 +0200 > > +++ linux-2.6.30/kernel/up.c 2009-06-16 15:57:27.000000000 +0200 > > @@ -10,11 +10,13 @@ > > int smp_call_function_single(int cpu, void (*func) (void *info), void *info, > > int wait) > > { > > + unsigned long flags; > > + > > WARN_ON(cpu != 0); > > > > - local_irq_disable(); > > + local_irq_save(flags); > > (func)(info); > > - local_irq_enable(); > > + local_irq_restore(flags); > > > > return 0; > > } > > ok, what's going on here? The patch implies that someone (presumably > acpi-cpufreq) is calling smp_call_function_single() with local > interrupts disabled. That's a bug on SMP kernels. And it'll generate > a trace if it happens: > > /* Can deadlock when called with interrupts disabled */ > WARN_ON_ONCE(irqs_disabled() && !oops_in_progress); > > but nobody has reported such a trace AFAIK? This problem apparently only exists on !SMP kernels... > Also, prior to 01599fca6758d2cd133e78f87426fc851c9ea725, acpi-cpufreq > was using work_on_cpu(). If it was calling work_on_cpu() with local > interrupts disabled then that would have been a bug too, which could > generate might_sleep() or scheduling-while-atomic warnings. On !SMP, work_on_cpu() is just a function call: http://lxr.linux.no/linux+v2.6.30/include/linux/workqueue.h#L261 > Because it is a bug to call the SMP version of > smp_call_function_single() with local interrupts disabled, I don't > think we should need to apply the above patch. and on SMP, smp_call_function_single() also uses local_irq_save/restore() iff cpu == this_cpu: http://lxr.linux.no/linux+v2.6.30/kernel/smp.c#L272 > But I don't know what we _should_ do because I don't know what the bug > is. Are you able to get us a copy of that stack trace? Unfortunately my laptop doesn't have a serial port, and the stack trace is large and scrolls off the screen, I can only see the last part of it and I would need to find someone with a camera to take a picture... Johannes ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-06-16 19:57 ` Johannes Stezenbach @ 2009-06-16 20:25 ` Pallipadi, Venkatesh 2009-06-16 20:40 ` Johannes Stezenbach 2009-06-16 20:44 ` Johannes Stezenbach 0 siblings, 2 replies; 17+ messages in thread From: Pallipadi, Venkatesh @ 2009-06-16 20:25 UTC (permalink / raw) To: Johannes Stezenbach Cc: Andrew Morton, rjw@sisk.pl, linux-kernel@vger.kernel.org, davej@redhat.com, pavel@ucw.cz, linux-acpi@vger.kernel.org, lenb@kernel.org, Pallipadi, Venkatesh, arjan@infradead.org, tglx@linutronix.de On Tue, Jun 16, 2009 at 12:57:50PM -0700, Johannes Stezenbach wrote: > On Tue, Jun 16, 2009 at 11:55:40AM -0700, Andrew Morton wrote: > > On Tue, 16 Jun 2009 16:22:17 +0200 > > Johannes Stezenbach <js@sig21.net> wrote: > > > > > Fix swsusp failure on !SMP > > > > > > Commit 01599fca6758d2cd133e78f87426fc851c9ea725 introduced > > > a regression which caused a backtrace on suspend and > > > a hang on resume on a Thinkpad T42p (Pentium M CPU). > > > > > > Signed-off-by: Johannes Stezenbach <js@sig21.net> > > > > > > > > > --- linux-2.6.30/kernel/up.c.orig 2009-06-16 15:56:28.000000000 +0200 > > > +++ linux-2.6.30/kernel/up.c 2009-06-16 15:57:27.000000000 +0200 > > > @@ -10,11 +10,13 @@ > > > int smp_call_function_single(int cpu, void (*func) (void *info), void *info, > > > int wait) > > > { > > > + unsigned long flags; > > > + > > > WARN_ON(cpu != 0); > > > > > > - local_irq_disable(); > > > + local_irq_save(flags); > > > (func)(info); > > > - local_irq_enable(); > > > + local_irq_restore(flags); > > > > > > return 0; > > > } > > > > ok, what's going on here? The patch implies that someone (presumably > > acpi-cpufreq) is calling smp_call_function_single() with local > > interrupts disabled. That's a bug on SMP kernels. And it'll generate > > a trace if it happens: > > > > /* Can deadlock when called with interrupts disabled */ > > WARN_ON_ONCE(irqs_disabled() && !oops_in_progress); > > > > but nobody has reported such a trace AFAIK? > > This problem apparently only exists on !SMP kernels... > > > Also, prior to 01599fca6758d2cd133e78f87426fc851c9ea725, acpi-cpufreq > > was using work_on_cpu(). If it was calling work_on_cpu() with local > > interrupts disabled then that would have been a bug too, which could > > generate might_sleep() or scheduling-while-atomic warnings. > > On !SMP, work_on_cpu() is just a function call: > http://lxr.linux.no/linux+v2.6.30/include/linux/workqueue.h#L261 > > > Because it is a bug to call the SMP version of > > smp_call_function_single() with local interrupts disabled, I don't > > think we should need to apply the above patch. > > and on SMP, smp_call_function_single() also uses > local_irq_save/restore() iff cpu == this_cpu: > http://lxr.linux.no/linux+v2.6.30/kernel/smp.c#L272 > > > But I don't know what we _should_ do because I don't know what the bug > > is. Are you able to get us a copy of that stack trace? > > Unfortunately my laptop doesn't have a serial port, and the > stack trace is large and scrolls off the screen, I can only > see the last part of it and I would need to find someone with > a camera to take a picture... Can you try the patch below (your changes + a warnon). That should give the stack trace with successful suspend-resume. acpi-cpufreq will not directly disable interrupt and call these routines. So, it will be interesting to see how we are ending up in this state. Thanks, Venki diff --git a/kernel/up.c b/kernel/up.c index 1ff27a2..a4318ff 100644 --- a/kernel/up.c +++ b/kernel/up.c @@ -10,11 +10,15 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info, int wait) { + unsigned long flags; + WARN_ON(cpu != 0); - local_irq_disable(); + WARN_ON(irqs_disabled()); + + local_irq_save(flags); (func)(info); - local_irq_enable(); + local_irq_restore(flags); return 0; } ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-06-16 20:25 ` Pallipadi, Venkatesh @ 2009-06-16 20:40 ` Johannes Stezenbach 2009-06-16 21:09 ` Andrew Morton 2009-06-16 20:44 ` Johannes Stezenbach 1 sibling, 1 reply; 17+ messages in thread From: Johannes Stezenbach @ 2009-06-16 20:40 UTC (permalink / raw) To: Pallipadi, Venkatesh Cc: Andrew Morton, rjw@sisk.pl, linux-kernel@vger.kernel.org, davej@redhat.com, pavel@ucw.cz, linux-acpi@vger.kernel.org, lenb@kernel.org, arjan@infradead.org, tglx@linutronix.de On Tue, Jun 16, 2009 at 01:25:58PM -0700, Pallipadi, Venkatesh wrote: > > Can you try the patch below (your changes + a warnon). That should give > the stack trace with successful suspend-resume. > > acpi-cpufreq will not directly disable interrupt and call these routines. > So, it will be interesting to see how we are ending up in this state. Yes, I actually had the same idea and just did it ;-) I also found this: http://lkml.org/lkml/2007/7/17/674 ------------[ cut here ]------------ WARNING: at kernel/up.c:18 smp_call_function_single+0x45/0x60() Hardware name: 2373Y4M Modules linked in: ath5k mac80211 cfg80211 uhci_hcd ehci_hcd Pid: 4139, comm: bash Not tainted 2.6.30 #8 Call Trace: [<c011ea0d>] warn_slowpath_common+0x60/0x90 [<c010d86c>] ? do_drv_read+0x0/0x31 [<c011ea4a>] warn_slowpath_null+0xd/0x10 [<c013acc1>] smp_call_function_single+0x45/0x60 [<c010d4e5>] get_cur_val+0x62/0x6c [<c010d72f>] get_cur_freq_on_cpu+0x35/0x58 [<c03786e9>] cpufreq_suspend+0x76/0xd9 [<c0136c3b>] ? clockevents_notify+0x1e/0x68 [<c02ff570>] sysdev_suspend+0x4e/0x182 [<c013fd28>] hibernation_snapshot+0x89/0x16b [<c013fe99>] hibernate+0x8f/0x147 [<c013ec82>] ? state_store+0x0/0xa2 [<c013ecd7>] state_store+0x55/0xa2 [<c013ec82>] ? state_store+0x0/0xa2 [<c024dff5>] kobj_attr_store+0x1a/0x22 [<c01a7164>] sysfs_write_file+0xb4/0xdf [<c01a70b0>] ? sysfs_write_file+0x0/0xdf [<c0170cf2>] vfs_write+0x8a/0x12c [<c0170e2d>] sys_write+0x3b/0x60 [<c01028f4>] sysenter_do_call+0x12/0x26 ---[ end trace 1c2172bce3982a59 ]--- Johannes ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-06-16 20:40 ` Johannes Stezenbach @ 2009-06-16 21:09 ` Andrew Morton 2009-06-16 21:18 ` Pallipadi, Venkatesh 2009-06-16 22:44 ` Johannes Stezenbach 0 siblings, 2 replies; 17+ messages in thread From: Andrew Morton @ 2009-06-16 21:09 UTC (permalink / raw) To: Johannes Stezenbach Cc: venkatesh.pallipadi, rjw, linux-kernel, davej, pavel, linux-acpi, lenb, arjan, tglx On Tue, 16 Jun 2009 22:40:39 +0200 Johannes Stezenbach <js@sig21.net> wrote: > On Tue, Jun 16, 2009 at 01:25:58PM -0700, Pallipadi, Venkatesh wrote: > > > > Can you try the patch below (your changes + a warnon). That should give > > the stack trace with successful suspend-resume. > > > > acpi-cpufreq will not directly disable interrupt and call these routines. > > So, it will be interesting to see how we are ending up in this state. > > Yes, I actually had the same idea and just did it ;-) > I also found this: > http://lkml.org/lkml/2007/7/17/674 > > ------------[ cut here ]------------ > WARNING: at kernel/up.c:18 smp_call_function_single+0x45/0x60() > Hardware name: 2373Y4M > Modules linked in: ath5k mac80211 cfg80211 uhci_hcd ehci_hcd > Pid: 4139, comm: bash Not tainted 2.6.30 #8 > Call Trace: > [<c011ea0d>] warn_slowpath_common+0x60/0x90 > [<c010d86c>] ? do_drv_read+0x0/0x31 > [<c011ea4a>] warn_slowpath_null+0xd/0x10 > [<c013acc1>] smp_call_function_single+0x45/0x60 > [<c010d4e5>] get_cur_val+0x62/0x6c > [<c010d72f>] get_cur_freq_on_cpu+0x35/0x58 > [<c03786e9>] cpufreq_suspend+0x76/0xd9 > [<c0136c3b>] ? clockevents_notify+0x1e/0x68 > [<c02ff570>] sysdev_suspend+0x4e/0x182 > [<c013fd28>] hibernation_snapshot+0x89/0x16b > [<c013fe99>] hibernate+0x8f/0x147 > [<c013ec82>] ? state_store+0x0/0xa2 > [<c013ecd7>] state_store+0x55/0xa2 > [<c013ec82>] ? state_store+0x0/0xa2 > [<c024dff5>] kobj_attr_store+0x1a/0x22 > [<c01a7164>] sysfs_write_file+0xb4/0xdf > [<c01a70b0>] ? sysfs_write_file+0x0/0xdf > [<c0170cf2>] vfs_write+0x8a/0x12c > [<c0170e2d>] sys_write+0x3b/0x60 > [<c01028f4>] sysenter_do_call+0x12/0x26 > ---[ end trace 1c2172bce3982a59 ]--- Right, so it's the suspend-must-disable-local-interrupts thing. Again. create_image()'s local_irq_disable(). It was wrong to call work_on_cpu() with lcoal interrupts disabled, and it's now wrong to call smp_call_function_single() with local interrupts disabled. It's just that smp_call_function_single() warns while work_on_cpu() didn't. That all explains the warning But afaik we still don't know why your machine actually failed. Perhaps it is a side-efect of emitting the warning when the console is in a weird state? So.. what to do? Possibly we could hack cpufreq to not use smp_call_function_single() if the call is to be done on the local CPU. But SMP might still be broken - if it really does want to do a cross-cpu call. Why does cpufreq need to do a cross-CPU get_cur_freq_on_cpu() call at suspend time _anyway_? Surely cpufreq knows the target CPU's frequency from its internal in-main-memory state? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-06-16 21:09 ` Andrew Morton @ 2009-06-16 21:18 ` Pallipadi, Venkatesh 2009-06-16 21:39 ` Rafael J. Wysocki 2009-06-16 22:44 ` Johannes Stezenbach 1 sibling, 1 reply; 17+ messages in thread From: Pallipadi, Venkatesh @ 2009-06-16 21:18 UTC (permalink / raw) To: Andrew Morton Cc: Johannes Stezenbach, rjw@sisk.pl, linux-kernel@vger.kernel.org, davej@redhat.com, pavel@ucw.cz, linux-acpi@vger.kernel.org, lenb@kernel.org, arjan@infradead.org, tglx@linutronix.de, benh On Tue, 2009-06-16 at 14:09 -0700, Andrew Morton wrote: > On Tue, 16 Jun 2009 22:40:39 +0200 > Johannes Stezenbach <js@sig21.net> wrote: > > > On Tue, Jun 16, 2009 at 01:25:58PM -0700, Pallipadi, Venkatesh wrote: > > > > > > Can you try the patch below (your changes + a warnon). That should give > > > the stack trace with successful suspend-resume. > > > > > > acpi-cpufreq will not directly disable interrupt and call these routines. > > > So, it will be interesting to see how we are ending up in this state. > > > > Yes, I actually had the same idea and just did it ;-) > > I also found this: > > http://lkml.org/lkml/2007/7/17/674 > > > > ------------[ cut here ]------------ > > WARNING: at kernel/up.c:18 smp_call_function_single+0x45/0x60() > > Hardware name: 2373Y4M > > Modules linked in: ath5k mac80211 cfg80211 uhci_hcd ehci_hcd > > Pid: 4139, comm: bash Not tainted 2.6.30 #8 > > Call Trace: > > [<c011ea0d>] warn_slowpath_common+0x60/0x90 > > [<c010d86c>] ? do_drv_read+0x0/0x31 > > [<c011ea4a>] warn_slowpath_null+0xd/0x10 > > [<c013acc1>] smp_call_function_single+0x45/0x60 > > [<c010d4e5>] get_cur_val+0x62/0x6c > > [<c010d72f>] get_cur_freq_on_cpu+0x35/0x58 > > [<c03786e9>] cpufreq_suspend+0x76/0xd9 > > [<c0136c3b>] ? clockevents_notify+0x1e/0x68 > > [<c02ff570>] sysdev_suspend+0x4e/0x182 > > [<c013fd28>] hibernation_snapshot+0x89/0x16b > > [<c013fe99>] hibernate+0x8f/0x147 > > [<c013ec82>] ? state_store+0x0/0xa2 > > [<c013ecd7>] state_store+0x55/0xa2 > > [<c013ec82>] ? state_store+0x0/0xa2 > > [<c024dff5>] kobj_attr_store+0x1a/0x22 > > [<c01a7164>] sysfs_write_file+0xb4/0xdf > > [<c01a70b0>] ? sysfs_write_file+0x0/0xdf > > [<c0170cf2>] vfs_write+0x8a/0x12c > > [<c0170e2d>] sys_write+0x3b/0x60 > > [<c01028f4>] sysenter_do_call+0x12/0x26 > > ---[ end trace 1c2172bce3982a59 ]--- > > Right, so it's the suspend-must-disable-local-interrupts thing. Again. > create_image()'s local_irq_disable(). > > It was wrong to call work_on_cpu() with lcoal interrupts disabled, and > it's now wrong to call smp_call_function_single() with local interrupts > disabled. It's just that smp_call_function_single() warns while > work_on_cpu() didn't. > > That all explains the warning But afaik we still don't know why your > machine actually failed. Perhaps it is a side-efect of emitting the > warning when the console is in a weird state? > > So.. what to do? Possibly we could hack cpufreq to not use > smp_call_function_single() if the call is to be done on the local CPU. > But SMP might still be broken - if it really does want to do a cross-cpu > call. We surely do not need cross CPU cal at this point as all secondary cpus will be offline at this point. > Why does cpufreq need to do a cross-CPU get_cur_freq_on_cpu() call at > suspend time _anyway_? Surely cpufreq knows the target CPU's frequency > from its internal in-main-memory state? That was what I was wondering as well. Looks like this part of cpufreq_suspend came from commit 42d4dc3f4e1ec1396371aac89d0dccfdd977191b Author: Benjamin Herrenschmidt <benh@kernel.crashing.org> Date: Fri Apr 29 07:40:12 2005 -0700 [PATCH] Add suspend method to cpufreq core In order to properly fix some issues with cpufreq vs. sleep on PowerBooks, I had to add a suspend callback to the pmac_cpufreq driver. I must force a switch to full speed before sleep and I switch back to previous speed on resume. I also added a driver flag to disable the warnings in suspend/resume since it is expected in this case to have different speed (and I want it to fixup the jiffies properly). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> benh: Do you think we still need this cpufreq_driver->get() and return error on (!cur_freq || !cpu_policy->cur) stuff? May be we should all the checks only if CPUFREQ_PM_NO_WARN is set? Thanks, Venki ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-06-16 21:18 ` Pallipadi, Venkatesh @ 2009-06-16 21:39 ` Rafael J. Wysocki 0 siblings, 0 replies; 17+ messages in thread From: Rafael J. Wysocki @ 2009-06-16 21:39 UTC (permalink / raw) To: Pallipadi, Venkatesh Cc: Andrew Morton, Johannes Stezenbach, linux-kernel@vger.kernel.org, davej@redhat.com, pavel@ucw.cz, linux-acpi@vger.kernel.org, lenb@kernel.org, arjan@infradead.org, tglx@linutronix.de, benh On Tuesday 16 June 2009, Pallipadi, Venkatesh wrote: > On Tue, 2009-06-16 at 14:09 -0700, Andrew Morton wrote: > > On Tue, 16 Jun 2009 22:40:39 +0200 > > > > Johannes Stezenbach <js@sig21.net> wrote: > > > On Tue, Jun 16, 2009 at 01:25:58PM -0700, Pallipadi, Venkatesh wrote: > > > > Can you try the patch below (your changes + a warnon). That should > > > > give the stack trace with successful suspend-resume. > > > > > > > > acpi-cpufreq will not directly disable interrupt and call these > > > > routines. So, it will be interesting to see how we are ending up in > > > > this state. > > > > > > Yes, I actually had the same idea and just did it ;-) > > > I also found this: > > > http://lkml.org/lkml/2007/7/17/674 > > > > > > ------------[ cut here ]------------ > > > WARNING: at kernel/up.c:18 smp_call_function_single+0x45/0x60() > > > Hardware name: 2373Y4M > > > Modules linked in: ath5k mac80211 cfg80211 uhci_hcd ehci_hcd > > > Pid: 4139, comm: bash Not tainted 2.6.30 #8 > > > Call Trace: > > > [<c011ea0d>] warn_slowpath_common+0x60/0x90 > > > [<c010d86c>] ? do_drv_read+0x0/0x31 > > > [<c011ea4a>] warn_slowpath_null+0xd/0x10 > > > [<c013acc1>] smp_call_function_single+0x45/0x60 > > > [<c010d4e5>] get_cur_val+0x62/0x6c > > > [<c010d72f>] get_cur_freq_on_cpu+0x35/0x58 > > > [<c03786e9>] cpufreq_suspend+0x76/0xd9 > > > [<c0136c3b>] ? clockevents_notify+0x1e/0x68 > > > [<c02ff570>] sysdev_suspend+0x4e/0x182 > > > [<c013fd28>] hibernation_snapshot+0x89/0x16b > > > [<c013fe99>] hibernate+0x8f/0x147 > > > [<c013ec82>] ? state_store+0x0/0xa2 > > > [<c013ecd7>] state_store+0x55/0xa2 > > > [<c013ec82>] ? state_store+0x0/0xa2 > > > [<c024dff5>] kobj_attr_store+0x1a/0x22 > > > [<c01a7164>] sysfs_write_file+0xb4/0xdf > > > [<c01a70b0>] ? sysfs_write_file+0x0/0xdf > > > [<c0170cf2>] vfs_write+0x8a/0x12c > > > [<c0170e2d>] sys_write+0x3b/0x60 > > > [<c01028f4>] sysenter_do_call+0x12/0x26 > > > ---[ end trace 1c2172bce3982a59 ]--- > > > > Right, so it's the suspend-must-disable-local-interrupts thing. Again. > > create_image()'s local_irq_disable(). > > > > It was wrong to call work_on_cpu() with lcoal interrupts disabled, and > > it's now wrong to call smp_call_function_single() with local interrupts > > disabled. It's just that smp_call_function_single() warns while > > work_on_cpu() didn't. > > > > That all explains the warning But afaik we still don't know why your > > machine actually failed. Perhaps it is a side-efect of emitting the > > warning when the console is in a weird state? > > > > So.. what to do? Possibly we could hack cpufreq to not use > > smp_call_function_single() if the call is to be done on the local CPU. > > But SMP might still be broken - if it really does want to do a cross-cpu > > call. > > We surely do not need cross CPU cal at this point as all secondary cpus > will be offline at this point. > > > Why does cpufreq need to do a cross-CPU get_cur_freq_on_cpu() call at > > suspend time _anyway_? Surely cpufreq knows the target CPU's frequency > > from its internal in-main-memory state? > > That was what I was wondering as well. Looks like this part of > cpufreq_suspend came from > > commit 42d4dc3f4e1ec1396371aac89d0dccfdd977191b > Author: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Date: Fri Apr 29 07:40:12 2005 -0700 > > [PATCH] Add suspend method to cpufreq core > > In order to properly fix some issues with cpufreq vs. sleep on > PowerBooks, I had to add a suspend callback to the pmac_cpufreq > driver. > I must force a switch to full speed before sleep and I switch back > to > previous speed on resume. > > I also added a driver flag to disable the warnings in suspend/resume > since it is expected in this case to have different speed (and I > want it > to fixup the jiffies properly). > > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Signed-off-by: Andrew Morton <akpm@osdl.org> > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > > > benh: Do you think we still need this cpufreq_driver->get() and return > error on (!cur_freq || !cpu_policy->cur) stuff? > May be we should all the checks only if CPUFREQ_PM_NO_WARN is set? In fact, we need to do this entire thing differently. The basic problem is that cpufreq_suspend() is a sysdev thing, so it will always be called with iterrupts off and *only* for CPU0. So, it looks like the majority of things we do there is just unnecessary (at least). Best, Rafael ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-06-16 21:09 ` Andrew Morton 2009-06-16 21:18 ` Pallipadi, Venkatesh @ 2009-06-16 22:44 ` Johannes Stezenbach 1 sibling, 0 replies; 17+ messages in thread From: Johannes Stezenbach @ 2009-06-16 22:44 UTC (permalink / raw) To: Andrew Morton Cc: venkatesh.pallipadi, rjw, linux-kernel, davej, pavel, linux-acpi, lenb, arjan, tglx On Tue, Jun 16, 2009 at 02:09:23PM -0700, Andrew Morton wrote: > > Right, so it's the suspend-must-disable-local-interrupts thing. Again. > create_image()'s local_irq_disable(). > > It was wrong to call work_on_cpu() with lcoal interrupts disabled, and > it's now wrong to call smp_call_function_single() with local interrupts > disabled. It's just that smp_call_function_single() warns while > work_on_cpu() didn't. > > That all explains the warning But afaik we still don't know why your > machine actually failed. Perhaps it is a side-efect of emitting the > warning when the console is in a weird state? smp_call_function_single() enables irqs and hibernate doesn't like that? BTW, I have no other UP machine to test with, but I reported in another thread that a !SMP kernel (or a SMP kernel with maxcpus=0 parameter) does not boot at all on my destop machine, see http://lkml.org/lkml/2009/6/12/468 No idea if I should be worried about this since the SMP kernel now works fine, another hibernate problem was solved in http://lkml.org/lkml/2009/6/14/156 Johannes ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq 2009-06-16 20:25 ` Pallipadi, Venkatesh 2009-06-16 20:40 ` Johannes Stezenbach @ 2009-06-16 20:44 ` Johannes Stezenbach 1 sibling, 0 replies; 17+ messages in thread From: Johannes Stezenbach @ 2009-06-16 20:44 UTC (permalink / raw) To: Pallipadi, Venkatesh Cc: Andrew Morton, rjw@sisk.pl, linux-kernel@vger.kernel.org, davej@redhat.com, pavel@ucw.cz, linux-acpi@vger.kernel.org, lenb@kernel.org, arjan@infradead.org, tglx@linutronix.de On Tue, Jun 16, 2009 at 01:25:58PM -0700, Pallipadi, Venkatesh wrote: > > diff --git a/kernel/up.c b/kernel/up.c > index 1ff27a2..a4318ff 100644 > --- a/kernel/up.c > +++ b/kernel/up.c > @@ -10,11 +10,15 @@ > int smp_call_function_single(int cpu, void (*func) (void *info), void *info, > int wait) > { > + unsigned long flags; > + > WARN_ON(cpu != 0); > > - local_irq_disable(); > + WARN_ON(irqs_disabled()); > + > + local_irq_save(flags); > (func)(info); > - local_irq_enable(); > + local_irq_restore(flags); > > return 0; > } > PS: It seems like a good idea to apply this patch with the warning even if the root cause of the hibernate problem is elsewhere, for better debuggability of such issues? Johannes ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2009-07-06 22:18 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <cNtMS-6Iq-11@gated-at.bofh.it>
[not found] ` <cNuzd-7VN-3@gated-at.bofh.it>
[not found] ` <cNHPT-3kB-1@gated-at.bofh.it>
[not found] ` <cNM3f-1ty-29@gated-at.bofh.it>
[not found] ` <cNMZg-2ZR-17@gated-at.bofh.it>
[not found] ` <cNNsg-3U5-3@gated-at.bofh.it>
[not found] ` <cNNLx-4kf-5@gated-at.bofh.it>
[not found] ` <cNOeG-5cB-23@gated-at.bofh.it>
[not found] ` <cNOok-5pJ-11@gated-at.bofh.it>
[not found] ` <cNOy0-5C5-23@gated-at.bofh.it>
2009-07-04 18:09 ` 2.6.30: hibernation/swsusp lockup due to acpi-cpufreq Michael Witten
2009-07-04 21:29 ` Rafael J. Wysocki
2009-07-06 21:20 ` Pallipadi, Venkatesh
2009-07-06 21:39 ` Rafael J. Wysocki
2009-07-06 22:16 ` Pallipadi, Venkatesh
2009-07-04 2:32 mfwitten
[not found] <20090615232709.GA6059@sig21.net>
2009-06-16 0:16 ` Rafael J. Wysocki
2009-06-16 14:22 ` Johannes Stezenbach
2009-06-16 18:55 ` Andrew Morton
2009-06-16 19:57 ` Johannes Stezenbach
2009-06-16 20:25 ` Pallipadi, Venkatesh
2009-06-16 20:40 ` Johannes Stezenbach
2009-06-16 21:09 ` Andrew Morton
2009-06-16 21:18 ` Pallipadi, Venkatesh
2009-06-16 21:39 ` Rafael J. Wysocki
2009-06-16 22:44 ` Johannes Stezenbach
2009-06-16 20:44 ` Johannes Stezenbach
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox