* [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
From: Rafael J. Wysocki @ 2012-07-15 20:40 UTC
To: Linus Torvalds
Cc: Linux PM list, LKML, John Stultz, Ingo Molnar, Peter Zijlstra,
Prarit Bhargava, stable, Thomas Gleixner, Andreas Schwab
Hi Linus,
Please revert:
commit 5baefd6d84163443215f4a99f6a20f054ef11236
Author: John Stultz <johnstul@us.ibm.com>
Date: Tue Jul 10 18:43:25 2012 -0400
hrtimer: Update hrtimer base offsets each hrtimer_interrupt
This breaks resume on the iBook G4 and Toshiba Portege R500 (at least), by
adding an excessive delay to it (the Toshiba box sometimes hangs hard during
resume from system suspend). According to Andreas
(https://lkml.org/lkml/2012/7/15/66):
"Apparently during or before noirq resume the system is hanging by the same
amount of time as the system was sleeping."
which seems to agree with my observations.
Given that the two known-affected boxes are so different, it is quite probable
that the total number of affected systems is actually quite high.
Thanks!
To everyone involved: the fact that this change, which was likely to introduce
regressions from the look of it alone, has been pushed to Linus (and to -stable
at the same time!) so late in the cycle, is seriously disappointing.
Thanks,
Rafael

* Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
From: Thomas Gleixner @ 2012-07-16 9:47 UTC
To: Rafael J. Wysocki
Cc: Linus Torvalds, Linux PM list, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra, Prarit Bhargava, stable, Andreas Schwab

On Sun, 15 Jul 2012, Rafael J. Wysocki wrote:
> To everyone involved: the fact that this change, which was likely to introduce
> regressions from the look of it alone, has been pushed to Linus (and to -stable
> at the same time!) so late in the cycle, is seriously disappointing.

Well, we spent a massive amount of time in testing, reviewing and
discussion and it definitely did not break suspend/resume here.

This was not pushed without a lot of thought and in fact what you are
seeing is another long standing bug in the timekeeping resume code,
which was just papered over by the incorrect handling of the clock was
set cases in the other parts of the system.

Does the following patch fix the problem for you?

@John: Should that clear ntp as well or is it enough to set ntp_error
       to 0?

/me really goes on vacation now.

Thanks,

	tglx
---------
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 269b1fe..3447cfa 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -717,6 +717,7 @@ static void timekeeping_resume(void)
 	timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
 	timekeeper.ntp_error = 0;
 	timekeeping_suspended = 0;
+	timekeeping_update(false);
 	write_sequnlock_irqrestore(&timekeeper.lock, flags);
 
 	touch_softlockup_watchdog();

* Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
From: Rafael J. Wysocki @ 2012-07-16 11:16 UTC
To: Thomas Gleixner
Cc: Linus Torvalds, Linux PM list, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra, Prarit Bhargava, stable, Andreas Schwab

On Monday, July 16, 2012, Thomas Gleixner wrote:
> On Sun, 15 Jul 2012, Rafael J. Wysocki wrote:
> > To everyone involved: the fact that this change, which was likely to introduce
> > regressions from the look of it alone, has been pushed to Linus (and to -stable
> > at the same time!) so late in the cycle, is seriously disappointing.
>
> Well, we spent a massive amount of time in testing, reviewing and
> discussion and it definitely did not break suspend/resume here.

I'm not saying that you didn't consider it thoroughly, but unfortunately you
did overlook this particular issue, didn't you?

> This was not pushed without a lot of thought and in fact what you are
> seeing is another long standing bug in the timekeeping resume code,
> which was just papered over by the incorrect handling of the clock was
> set cases in the other parts of the system.
>
> Does the following patch fix the problem for you?

Yes, it does, thanks!

> @John: Should that clear ntp as well or is it enough to set ntp_error
>        to 0?
>
> /me really goes on vacation now.

So who's going to take care of the patch? :-)

Rafael

> ---------
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 269b1fe..3447cfa 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -717,6 +717,7 @@ static void timekeeping_resume(void)
>  	timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
>  	timekeeper.ntp_error = 0;
>  	timekeeping_suspended = 0;
> +	timekeeping_update(false);
>  	write_sequnlock_irqrestore(&timekeeper.lock, flags);
>
>  	touch_softlockup_watchdog();

* Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
From: Thomas Gleixner @ 2012-07-16 11:15 UTC
To: Rafael J. Wysocki
Cc: Linus Torvalds, Linux PM list, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra, Prarit Bhargava, stable, Andreas Schwab

On Mon, 16 Jul 2012, Rafael J. Wysocki wrote:
> On Monday, July 16, 2012, Thomas Gleixner wrote:
> > On Sun, 15 Jul 2012, Rafael J. Wysocki wrote:
> > > To everyone involved: the fact that this change, which was likely to introduce
> > > regressions from the look of it alone, has been pushed to Linus (and to -stable
> > > at the same time!) so late in the cycle, is seriously disappointing.
> >
> > Well, we spent a massive amount of time in testing, reviewing and
> > discussion and it definitely did not break suspend/resume here.
>
> I'm not saying that you didn't consider it thoroughly, but unfortunately you
> did overlook this particular issue, didn't you?
>
> > This was not pushed without a lot of thought and in fact what you are
> > seeing is another long standing bug in the timekeeping resume code,
> > which was just papered over by the incorrect handling of the clock was
> > set cases in the other parts of the system.
> >
> > Does the following patch fix the problem for you?
>
> Yes, it does, thanks!
>
> > @John: Should that clear ntp as well or is it enough to set ntp_error
> >        to 0?
> >
> > /me really goes on vacation now.
>
> So who's going to take care of the patch? :-)

I'm still packing gear. So I'll push it into timers/urgent.

* Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
From: Thomas Gleixner @ 2012-07-16 11:26 UTC
To: Rafael J. Wysocki
Cc: Linus Torvalds, Linux PM list, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra, Prarit Bhargava, stable, Andreas Schwab

On Mon, 16 Jul 2012, Thomas Gleixner wrote:
> On Mon, 16 Jul 2012, Rafael J. Wysocki wrote:
>
> > On Monday, July 16, 2012, Thomas Gleixner wrote:
> > > On Sun, 15 Jul 2012, Rafael J. Wysocki wrote:
> > > > To everyone involved: the fact that this change, which was likely to introduce
> > > > regressions from the look of it alone, has been pushed to Linus (and to -stable
> > > > at the same time!) so late in the cycle, is seriously disappointing.
> > >
> > > Well, we spent a massive amount of time in testing, reviewing and
> > > discussion and it definitely did not break suspend/resume here.
> >
> > I'm not saying that you didn't consider it thoroughly, but unfortunately you
> > did overlook this particular issue, didn't you?
> >
> > > This was not pushed without a lot of thought and in fact what you are
> > > seeing is another long standing bug in the timekeeping resume code,
> > > which was just papered over by the incorrect handling of the clock was
> > > set cases in the other parts of the system.
> > >
> > > Does the following patch fix the problem for you?
> >
> > Yes, it does, thanks!
> >
> > > @John: Should that clear ntp as well or is it enough to set ntp_error
> > >        to 0?
> > >
> > > /me really goes on vacation now.
> >
> > So who's going to take care of the patch? :-)
>
> I'm still packing gear. So I'll push it into timers/urgent.

Actually that's a bad idea. John wants to double check vs. the
ntp_clear question. So John can send it to Linus directly.

@John: Should it be:

	timekeeping_update(true)

Now I'm gone for real.

Thanks,

	tglx
-----
Subject: timekeeping: Add missing update call in timekeeping_resume()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 16 Jul 2012 11:47:31 +0200 (CEST)

The leap second rework unearthed another issue of inconsistent data.

On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.

This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for quite
some time.

Add the missing update call, so all the data is consistent everywhere.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reported-by-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux PM list <linux-pm@vger.kernel.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Index: tip/kernel/time/timekeeping.c
===================================================================
--- tip.orig/kernel/time/timekeeping.c
+++ tip/kernel/time/timekeeping.c
@@ -717,6 +717,7 @@ static void timekeeping_resume(void)
 	timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
 	timekeeper.ntp_error = 0;
 	timekeeping_suspended = 0;
+	timekeeping_update(false);
 	write_sequnlock_irqrestore(&timekeeper.lock, flags);
 
 	touch_softlockup_watchdog();

* Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
From: John Stultz @ 2012-07-16 15:47 UTC
To: Thomas Gleixner
Cc: Rafael J. Wysocki, Linus Torvalds, Linux PM list, LKML, Ingo Molnar,
	Peter Zijlstra, Prarit Bhargava, stable, Andreas Schwab

On 07/16/2012 04:26 AM, Thomas Gleixner wrote:
> On Mon, 16 Jul 2012, Thomas Gleixner wrote:
>
>> On Mon, 16 Jul 2012, Rafael J. Wysocki wrote:
>>
>>> On Monday, July 16, 2012, Thomas Gleixner wrote:
>>>> On Sun, 15 Jul 2012, Rafael J. Wysocki wrote:
>>>>> To everyone involved: the fact that this change, which was likely to introduce
>>>>> regressions from the look of it alone, has been pushed to Linus (and to -stable
>>>>> at the same time!) so late in the cycle, is seriously disappointing.
>>>> Well, we spent a massive amount of time in testing, reviewing and
>>>> discussion and it definitely did not break suspend/resume here.
>>> I'm not saying that you didn't consider it thoroughly, but unfortunately you
>>> did overlook this particular issue, didn't you?
>>>
>>>> This was not pushed without a lot of thought and in fact what you are
>>>> seeing is another long standing bug in the timekeeping resume code,
>>>> which was just papered over by the incorrect handling of the clock was
>>>> set cases in the other parts of the system.
>>>>
>>>> Does the following patch fix the problem for you?
>>> Yes, it does, thanks!
>>>
>>>> @John: Should that clear ntp as well or is it enough to set ntp_error
>>>>        to 0?
>>>>
>>>> /me really goes on vacation now.
>>> So who's going to take care of the patch? :-)
>> I'm still packing gear. So I'll push it into timers/urgent.
> Actually that's a bad idea. John wants to double check vs. the
> ntp_clear question. So John can send it to Linus directly.
>
> @John: Should it be:
>
> 	timekeeping_update(true)

I think it's better to leave it as false, so we don't reset the NTP state
machine completely after suspend. When we come back from suspend our
error is usually off by the persistent_clock/rtc granularity, so it
might make sense, but I'd want a lot more testing of using ntp over
suspend before changing the existing behavior of not doing it.

> Now I'm gone for real.

Ok. Thanks for spinning this up so quickly. I'll go ahead and send it on
to Linus.

thanks
-john

* Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
From: Andreas Schwab @ 2012-07-16 12:48 UTC
To: Rafael J. Wysocki
Cc: Thomas Gleixner, Linus Torvalds, Linux PM list, LKML, John Stultz,
	Ingo Molnar, Peter Zijlstra, Prarit Bhargava, stable

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> On Monday, July 16, 2012, Thomas Gleixner wrote:
>> Does the following patch fix the problem for you ?
>
> Yes, it does, thanks!

Works for me as well.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."