public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [BUG] APM resume breakage from 2.6.18-rc1 clocksource changes
@ 2006-07-09 20:58 Mikael Pettersson
  2006-07-09 21:20 ` john stultz
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Mikael Pettersson @ 2006-07-09 20:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: johnstul, pavel

On Fri, 7 Jul 2006 21:01:26 +0200 (MEST), I wrote:
>Kernel 2.6.18-rc1 broke resume from APM suspend (to RAM)
>on my old Dell Latitude CPi laptop. At resume the disk
>spins up and the screen gets lit, but there is no response
>to the keyboard, not even sysrq. All other system activity
>also appears to be halted.
>
>I did the obvious test of reverting apm.c to the 2.6.17
>version and fixing up the fallout from the TIF_POLLING_NRFLAG
>changes, but it made no difference. So the problem must be
>somewhere else.

I've traced the cause of this problem to the i386 time-keeping
changes in kernel 2.6.17-git11. What happens is that:
- The kernel autoselects TSC as my clocksource, which is
  reasonable since it's a PentiumII. 2.6.17 also chose the TSC.
- Immediately after APM resumes (arch/i386/kernel/apm.c line
  1231 in 2.6.18-rc1) there is an interrupt from the PIT,
  which takes us to kernel/timer.c:update_wall_time().
- update_wall_time() does a clocksource_read() and computes
  the offset from the previous read. However, the TSC was
  reset by HW or BIOS during the APM suspend/resume cycle and
  is now smaller than it was at the prevous read. On my machine,
  the offset is 0xffffffd598e0a566 at this point, which appears
  to throw update_wall_time() into a very very long loop.

Hacks around the problem:
- echo jiffies>/sys/devices/system/clocksource/clocksource0/current_clocksource
  before suspending eliminates the hang. (Note: this doesn't affect
  the i386 delay loop implementation. Is that a bug or a feature?)
- Hacking apm.c to set a flag just before suspending and clear
  it only after all resume actions are done, and to have update_wall_time()
  return immediately when this flag is set also eliminates the hang.

I guess the appropriate fix would be to augment either the
TSC clocksource driver or the clocksource API to deal with
pseudo-monotonic clocks that get reset at suspend.

/Mikael

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] APM resume breakage from 2.6.18-rc1 clocksource changes
@ 2006-07-09 23:52 Mikael Pettersson
  2006-07-10  7:55 ` Pavel Machek
  2006-07-10 17:58 ` john stultz
  0 siblings, 2 replies; 28+ messages in thread
From: Mikael Pettersson @ 2006-07-09 23:52 UTC (permalink / raw)
  To: johnstul, mikpe; +Cc: linux-kernel, pavel

On Sun, 09 Jul 2006 14:20:56 -0700, john stultz wrote:
>> I've traced the cause of this problem to the i386 time-keeping
>> changes in kernel 2.6.17-git11. What happens is that:
>> - The kernel autoselects TSC as my clocksource, which is
>>   reasonable since it's a PentiumII. 2.6.17 also chose the TSC.
>> - Immediately after APM resumes (arch/i386/kernel/apm.c line
>>   1231 in 2.6.18-rc1) there is an interrupt from the PIT,
>>   which takes us to kernel/timer.c:update_wall_time().
>> - update_wall_time() does a clocksource_read() and computes
>>   the offset from the previous read. However, the TSC was
>>   reset by HW or BIOS during the APM suspend/resume cycle and
>>   is now smaller than it was at the prevous read. On my machine,
>>   the offset is 0xffffffd598e0a566 at this point, which appears
>>   to throw update_wall_time() into a very very long loop.
>
>Huh. It seems you're getting an interrupt before timekeeping_resume()
>runs (which resets cycle_last). I'll look over the code and see if I can
>sort out why it works w/ ACPI suspend, but not APM, or if the
>resume/interrupt-enablement bit is just racy in general.

I forgot to mention this, but I had a debug printk() in apm.c
which showed that irqs_disabled() == 0 at the point when APM
resumes the kernel.

/Mikael

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] APM resume breakage from 2.6.18-rc1 clocksource changes
@ 2006-07-09 23:53 Mikael Pettersson
  0 siblings, 0 replies; 28+ messages in thread
From: Mikael Pettersson @ 2006-07-09 23:53 UTC (permalink / raw)
  To: Valdis.Kletnieks, mikpe; +Cc: johnstul, linux-kernel, pavel

On Sun, 09 Jul 2006 17:31:39 -0400, Valdis.Kletnieks wrote:
>On Sun, 09 Jul 2006 22:58:31 +0200, Mikael Pettersson said:
>
>> I've traced the cause of this problem to the i386 time-keeping
>> changes in kernel 2.6.17-git11. What happens is that:
>> - The kernel autoselects TSC as my clocksource, which is
>>   reasonable since it's a PentiumII. 2.6.17 also chose the TSC.
>> - Immediately after APM resumes (arch/i386/kernel/apm.c line
>>   1231 in 2.6.18-rc1) there is an interrupt from the PIT,
>>   which takes us to kernel/timer.c:update_wall_time().
>> - update_wall_time() does a clocksource_read() and computes
>>   the offset from the previous read. However, the TSC was
>>   reset by HW or BIOS during the APM suspend/resume cycle and
>>   is now smaller than it was at the prevous read. On my machine,
>>   the offset is 0xffffffd598e0a566 at this point, which appears
>>   to throw update_wall_time() into a very very long loop.
>
>Does applying this patch make it work?
>
>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc1/2.6.18-rc1-mm1/broken-out/adjust-clock-for-lost-ticks.patch
>
>Or is this a different breakage?

I haven't tried it, but I'm 99% certain it's unrelated breakage.
In my case the TSC jumped backwards not forwards as would be the
case in a lost ticks scenario, and the patch above doesn't touch
the code that gets into an semi-infinite loop in my case.

/Mikael

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [BUG] APM resume breakage from 2.6.18-rc1 clocksource changes
@ 2006-07-10 23:36 Mikael Pettersson
  0 siblings, 0 replies; 28+ messages in thread
From: Mikael Pettersson @ 2006-07-10 23:36 UTC (permalink / raw)
  To: johnstul, mikpe; +Cc: linux-kernel, pavel

On Mon, 10 Jul 2006 10:58:47 -0700, john stultz wrote:
>So it seems possible that the timer tick will be enabled before the
>timekeeping resume code runs. I'm not sure why this isn't seen w/ ACPI
>suspend/resume, as I think they're using the same
>sysdev_class .suspend/.resume bits. 
>
>Anyway, I think this patch should fix it (I've only compile tested it,
>as I don't have my laptop on me right now). Would you mind giving it a
>try? 
>
>thanks
>-john

With this patch my Latitude resumes OK from APM suspend again.

Thanks John.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>

>diff --git a/kernel/timer.c b/kernel/timer.c
>index 396a3c0..afaa594 100644
>--- a/kernel/timer.c
>+++ b/kernel/timer.c
>@@ -966,7 +966,7 @@ void __init timekeeping_init(void)
> 	write_sequnlock_irqrestore(&xtime_lock, flags);
> }
> 
>-
>+static int timekeeping_suspended;
> /*
>  * timekeeping_resume - Resumes the generic timekeeping subsystem.
>  * @dev:	unused
>@@ -982,6 +982,18 @@ static int timekeeping_resume(struct sys
> 	write_seqlock_irqsave(&xtime_lock, flags);
> 	/* restart the last cycle value */
> 	clock->cycle_last = clocksource_read(clock);
>+	clock->error = 0;
>+	timekeeping_suspended = 0;
>+	write_sequnlock_irqrestore(&xtime_lock, flags);
>+	return 0;
>+}
>+
>+static int timekeeping_suspend(struct sys_device *dev, pm_message_t state)
>+{
>+	unsigned long flags;
>+
>+	write_seqlock_irqsave(&xtime_lock, flags);
>+	timekeeping_suspended = 1;
> 	write_sequnlock_irqrestore(&xtime_lock, flags);
> 	return 0;
> }
>@@ -989,6 +1001,7 @@ static int timekeeping_resume(struct sys
> /* sysfs resume/suspend bits for timekeeping */
> static struct sysdev_class timekeeping_sysclass = {
> 	.resume		= timekeeping_resume,
>+	.suspend	= timekeeping_suspend,
> 	set_kset_name("timekeeping"),
> };
> 
>@@ -1090,14 +1103,17 @@ static void clocksource_adjust(struct cl
> static void update_wall_time(void)
> {
> 	cycle_t offset;
>-
>-	clock->xtime_nsec += (s64)xtime.tv_nsec << clock->shift;
>+	
>+	/* avoid timekeeping before we're fully resumed */
>+	if (unlikely(timekeeping_suspended))
>+		return;
> 
> #ifdef CONFIG_GENERIC_TIME
> 	offset = (clocksource_read(clock) - clock->cycle_last) & clock->mask;
> #else
> 	offset = clock->cycle_interval;
> #endif
>+	clock->xtime_nsec += (s64)xtime.tv_nsec << clock->shift;
> 
> 	/* normally this loop will run just once, however in the
> 	 * case of lost or late ticks, it will accumulate correctly.
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2006-07-16 16:15 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-09 20:58 [BUG] APM resume breakage from 2.6.18-rc1 clocksource changes Mikael Pettersson
2006-07-09 21:20 ` john stultz
2006-07-09 21:31 ` Valdis.Kletnieks
2006-07-09 21:44 ` Pavel Machek
2006-07-09 22:51   ` Alan Cox
  -- strict thread matches above, loose matches on Subject: below --
2006-07-09 23:52 Mikael Pettersson
2006-07-10  7:55 ` Pavel Machek
2006-07-10 17:58 ` john stultz
2006-07-10 18:08   ` Pavel Machek
2006-07-10 18:19     ` john stultz
2006-07-10 22:37     ` Roman Zippel
2006-07-10 22:50       ` john stultz
2006-07-10 22:59         ` Roman Zippel
2006-07-11  8:07           ` Thomas Gleixner
2006-07-11  9:29             ` Roman Zippel
2006-07-11 11:07               ` Thomas Gleixner
2006-07-11 23:31                 ` Roman Zippel
2006-07-12  0:42                   ` john stultz
2006-07-13 20:27                     ` Thomas Gleixner
2006-07-13 22:05                       ` john stultz
2006-07-14  6:56                         ` Thomas Gleixner
2006-07-16 15:52                         ` Roman Zippel
2006-07-16 15:50                       ` Roman Zippel
2006-07-16 16:09                         ` Thomas Gleixner
2006-07-16 16:15                           ` Roman Zippel
2006-07-10 23:17       ` Pavel Machek
2006-07-09 23:53 Mikael Pettersson
2006-07-10 23:36 Mikael Pettersson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox