linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH] sched_clock: Avoid corrupting hrtimer tree during suspend
@ 2014-07-18 22:09 Stephen Boyd
  2014-07-18 22:25 ` John Stultz
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Boyd @ 2014-07-18 22:09 UTC (permalink / raw)
  To: linux-arm-kernel

During suspend we call sched_clock_poll() to update the epoch and
accumulated time and reprogram the sched_clock_timer to fire
before the next wrap-around time. Unfortunately,
sched_clock_poll() doesn't restart the timer, instead it relies
on the hrtimer layer to do that and during suspend we aren't
calling that function from the hrtimer layer. Instead, we're
reprogramming the expires time while the hrtimer is enqueued,
which can cause the hrtimer tree to be corrupted. Fix this
problem by updating the state via update_sched_clock() and
properly restarting the timer via hrtimer_start().

Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---

I also wonder if we should be restarting the timer during resume
instead of suspend given that the resume path modifies the epoch.
At that point timers can't run because interrupts are disabled and
we don't really care if the timer fires earlier than it's supposed
to anyway because it's just there to avoid rollover events, but
does it seem better to do it that way? I didn't send that version
because this patch is to fix the code intention, but I'm curious
if anyone else feels like it should be changed.

 kernel/time/sched_clock.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 445106d2c729..9e32ce88e9ee 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -191,7 +191,9 @@ void __init sched_clock_postinit(void)
 
 static int sched_clock_suspend(void)
 {
-	sched_clock_poll(&sched_clock_timer);
+	update_sched_clock();
+	/* Restart the timer because we forced an update */
+	hrtimer_start(&sched_clock_timer, cd.wrap_kt, HRTIMER_MODE_REL);
 	cd.suspended = true;
 	return 0;
 }
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
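
As a rough illustration of the failure mode described in the commit message, the userspace sketch below models the timer queue as a list sorted by expiry. It is not kernel code: `toy_timer`, `toy_queue`, and every `toy_*` and `demo_*` function are hypothetical names invented for this example, and a sorted list stands in for the hrtimer tree. Under those assumptions it shows that rewriting the expiry of a node that is still enqueued breaks the ordering invariant, while removing, updating, and reinserting the node (the role hrtimer_start() plays in the patch) preserves it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an hrtimer: one node in a queue sorted by expiry. */
struct toy_timer {
	uint64_t expires;
	struct toy_timer *next;
	bool enqueued;
};

struct toy_queue {
	struct toy_timer *head;	/* earliest expiry first */
};

/* Returns true if the queue is still sorted by expiry. */
static bool toy_queue_sorted(const struct toy_queue *q)
{
	const struct toy_timer *t;

	for (t = q->head; t && t->next; t = t->next)
		if (t->expires > t->next->expires)
			return false;
	return true;
}

/* Insert the timer at its sorted position. */
static void toy_enqueue(struct toy_queue *q, struct toy_timer *t)
{
	struct toy_timer **pp = &q->head;

	while (*pp && (*pp)->expires <= t->expires)
		pp = &(*pp)->next;
	t->next = *pp;
	*pp = t;
	t->enqueued = true;
}

/* Unlink the timer if it is present in the queue. */
static void toy_dequeue(struct toy_queue *q, struct toy_timer *t)
{
	struct toy_timer **pp = &q->head;

	while (*pp && *pp != t)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = t->next;
		t->next = NULL;
		t->enqueued = false;
	}
}

/* Safe restart: dequeue, change the key, then enqueue again. */
static void toy_restart(struct toy_queue *q, struct toy_timer *t,
			uint64_t expires)
{
	if (t->enqueued)
		toy_dequeue(q, t);
	t->expires = expires;
	toy_enqueue(q, t);
	assert(toy_queue_sorted(q));
}

/* Buggy pattern: rewrite the key while the node is still enqueued. */
static bool demo_in_place_update(void)
{
	struct toy_timer a = { .expires = 10 }, b = { .expires = 20 };
	struct toy_queue q = { 0 };

	toy_enqueue(&q, &a);
	toy_enqueue(&q, &b);
	a.expires = 30;		/* a stays at the head; its position is stale */
	return toy_queue_sorted(&q);
}

/* Fixed pattern: go through the restart helper instead. */
static bool demo_restart(void)
{
	struct toy_timer a = { .expires = 10 }, b = { .expires = 20 };
	struct toy_queue q = { 0 };

	toy_enqueue(&q, &a);
	toy_enqueue(&q, &b);
	toy_restart(&q, &a, 30);
	return toy_queue_sorted(&q) && q.head == &b;
}
```

Calling demo_in_place_update() from a small main() reports an unsorted queue, while demo_restart() leaves the queue sorted with the later timer moved behind the earlier one. A real rbtree can fail in less obvious ways than a list, so treat this only as a model of the invariant being violated.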


* [PATCH] sched_clock: Avoid corrupting hrtimer tree during suspend
  2014-07-18 22:09 [PATCH] sched_clock: Avoid corrupting hrtimer tree during suspend Stephen Boyd
@ 2014-07-18 22:25 ` John Stultz
  2014-07-18 22:38   ` Stephen Boyd
  0 siblings, 1 reply; 7+ messages in thread
From: John Stultz @ 2014-07-18 22:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/18/2014 03:09 PM, Stephen Boyd wrote:
> During suspend we call sched_clock_poll() to update the epoch and
> accumulated time and reprogram the sched_clock_timer to fire
> before the next wrap-around time. Unfortunately,
> sched_clock_poll() doesn't restart the timer, instead it relies
> on the hrtimer layer to do that and during suspend we aren't
> calling that function from the hrtimer layer. Instead, we're
> reprogramming the expires time while the hrtimer is enqueued,
> which can cause the hrtimer tree to be corrupted. Fix this
> problem by updating the state via update_sched_clock() and
> properly restarting the timer via hrtimer_start().
>
> Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> ---
>
> I also wonder if we should be restarting the timer during resume
> instead of suspend given that the resume path modifies the epoch.
> At that point timers can't run because interrupts are disabled and
> we don't really care if the timer fires earlier than it's supposed
> to anyway because it's just there to avoid rollover events, but
> does it seem better to do it that way? I didn't send that version
> because this patch is to fix the code intention, but I'm curious
> if anyone else feels like it should be changed.

Yea, starting the timer on suspend seems unintuitive to me.

Is this something you were hoping to get in for 3.17 or is this an urgent
3.16 item?

thanks
-john


* [PATCH] sched_clock: Avoid corrupting hrtimer tree during suspend
  2014-07-18 22:25 ` John Stultz
@ 2014-07-18 22:38   ` Stephen Boyd
  2014-07-18 22:42     ` John Stultz
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Boyd @ 2014-07-18 22:38 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/18/14 15:25, John Stultz wrote:
> On 07/18/2014 03:09 PM, Stephen Boyd wrote:
>> During suspend we call sched_clock_poll() to update the epoch and
>> accumulated time and reprogram the sched_clock_timer to fire
>> before the next wrap-around time. Unfortunately,
>> sched_clock_poll() doesn't restart the timer, instead it relies
>> on the hrtimer layer to do that and during suspend we aren't
>> calling that function from the hrtimer layer. Instead, we're
>> reprogramming the expires time while the hrtimer is enqueued,
>> which can cause the hrtimer tree to be corrupted. Fix this
>> problem by updating the state via update_sched_clock() and
>> properly restarting the timer via hrtimer_start().
>>
>> Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
>> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
>> ---
>>
>> I also wonder if we should be restarting the timer during resume
>> instead of suspend given that the resume path modifies the epoch.
>> At that point timers can't run because interrupts are disabled and
>> we don't really care if the timer fires earlier than it's supposed
>> to anyway because it's just there to avoid rollover events, but
>> does it seem better to do it that way? I didn't send that version
>> because this patch is to fix the code intention, but I'm curious
>> if anyone else feels like it should be changed.
> Yea, starting the timer on suspend seems unintuitive to me.
>
> Is this something you were hoping to get in for 3.17 or is this an urgent
> 3.16 item?

OK, I'll send a follow-up patch to cancel the timer during suspend and
start it during resume, unless you want that to be part of this fix? It's
a regression dating back to v3.13, so I would think it's urgent, although
I haven't seen any reports on the mailing list, just reports on some of
our Android kernels.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


* [PATCH] sched_clock: Avoid corrupting hrtimer tree during suspend
  2014-07-18 22:38   ` Stephen Boyd
@ 2014-07-18 22:42     ` John Stultz
  2014-07-18 23:24       ` Stephen Boyd
  0 siblings, 1 reply; 7+ messages in thread
From: John Stultz @ 2014-07-18 22:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/18/2014 03:38 PM, Stephen Boyd wrote:
> On 07/18/14 15:25, John Stultz wrote:
>> On 07/18/2014 03:09 PM, Stephen Boyd wrote:
>>> During suspend we call sched_clock_poll() to update the epoch and
>>> accumulated time and reprogram the sched_clock_timer to fire
>>> before the next wrap-around time. Unfortunately,
>>> sched_clock_poll() doesn't restart the timer, instead it relies
>>> on the hrtimer layer to do that and during suspend we aren't
>>> calling that function from the hrtimer layer. Instead, we're
>>> reprogramming the expires time while the hrtimer is enqueued,
>>> which can cause the hrtimer tree to be corrupted. Fix this
>>> problem by updating the state via update_sched_clock() and
>>> properly restarting the timer via hrtimer_start().
>>>
>>> Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
>>> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
>>> ---
>>>
>>> I also wonder if we should be restarting the timer during resume
>>> instead of suspend given that the resume path modifies the epoch.
>>> At that point timers can't run because interrupts are disabled and
>>> we don't really care if the timer fires earlier than it's supposed
>>> to anyway because it's just there to avoid rollover events, but
>>> does it seem better to do it that way? I didn't send that version
>>> because this patch is to fix the code intention, but I'm curious
>>> if anyone else feels like it should be changed.
>> Yea, starting the timer on suspend seems unintuitive to me.
>>
>> Is this something you were hoping to get in for 3.17 or is this an urgent
>> 3.16 item?
> OK, I'll send a follow-up patch to cancel the timer during suspend and
> start it during resume, unless you want that to be part of this fix? It's
> a regression dating back to v3.13, so I would think it's urgent, although
> I haven't seen any reports on the mailing list, just reports on some of
> our Android kernels.

If it's a regression (and needs -stable backports) it needs to go in via
tip/timers/urgent, and not via the regular merge window.

What's the additional risk, -stable wise, of canceling the timer during
suspend and starting it back up during resume?

thanks
-john


* [PATCH] sched_clock: Avoid corrupting hrtimer tree during suspend
  2014-07-18 22:42     ` John Stultz
@ 2014-07-18 23:24       ` Stephen Boyd
  2014-07-19  0:14         ` John Stultz
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Boyd @ 2014-07-18 23:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/18/14 15:42, John Stultz wrote:
> If it's a regression (and needs -stable backports) it needs to go in via
> tip/timers/urgent, and not via the regular merge window.
>
> What's the additional risk, -stable wise, of canceling the timer during
> suspend and starting it back up during resume?
>

I'd say close to zero given that we'd only be making the timer run a
little bit later and we have slack in there already. Here's that version.

----8<-----

Subject: [PATCH] sched_clock: Avoid corrupting hrtimer tree during suspend

During suspend we call sched_clock_poll() to update the epoch and
accumulated time and reprogram the sched_clock_timer to fire
before the next wrap-around time. Unfortunately,
sched_clock_poll() doesn't restart the timer, instead it relies
on the hrtimer layer to do that and during suspend we aren't
calling that function from the hrtimer layer. Instead, we're
reprogramming the expires time while the hrtimer is enqueued,
which can cause the hrtimer tree to be corrupted. Furthermore, we
restart the timer during suspend but we update the epoch during
resume which seems counter-intuitive.

Let's fix this by saving the accumulated state and canceling the
timer during suspend. On resume we can update the epoch and
restart the timer similar to what we would do if we were starting
the clock for the first time.

Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 kernel/time/sched_clock.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 445106d2c729..01d2d15aa662 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -191,7 +191,8 @@ void __init sched_clock_postinit(void)
 
 static int sched_clock_suspend(void)
 {
-	sched_clock_poll(&sched_clock_timer);
+	update_sched_clock();
+	hrtimer_cancel(&sched_clock_timer);
 	cd.suspended = true;
 	return 0;
 }
@@ -199,6 +200,7 @@ static int sched_clock_suspend(void)
 static void sched_clock_resume(void)
 {
 	cd.epoch_cyc = read_sched_clock();
+	hrtimer_start(&sched_clock_timer, cd.wrap_kt, HRTIMER_MODE_REL);
 	cd.suspended = false;
 }
 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
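
The suspend/resume ordering in this version can be modeled in userspace as a sketch. Everything here is an assumption made for illustration: the `toy_*` names are hypothetical, the counter is a made-up 16-bit one where one cycle equals one nanosecond, and `timer_armed` is a plain flag standing in for hrtimer_cancel() and hrtimer_start(). The model accumulates time up to the suspend point, freezes the reported clock while suspended, and on resume re-reads the counter as the new epoch before re-arming the timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MASK 0xffffu	/* hypothetical 16-bit hardware counter */

/* Loosely mirrors the fields the patch touches; not the kernel's struct. */
struct toy_clock_data {
	uint64_t epoch_ns;	/* time accumulated up to the last update */
	uint64_t epoch_cyc;	/* counter value at the last update */
	bool suspended;
	bool timer_armed;	/* stands in for the wrap-around hrtimer */
};

static uint64_t toy_counter;	/* fake free-running counter, 1 cycle == 1 ns */

static uint64_t toy_read(void)
{
	return toy_counter & TOY_MASK;
}

/* Fold the cycles elapsed since the last epoch into epoch_ns. */
static void toy_update(struct toy_clock_data *cd)
{
	uint64_t cyc = toy_read();

	/* Masking the delta keeps it correct across a single wrap. */
	cd->epoch_ns += (cyc - cd->epoch_cyc) & TOY_MASK;
	cd->epoch_cyc = cyc;
}

static uint64_t toy_sched_clock(const struct toy_clock_data *cd)
{
	if (cd->suspended)
		return cd->epoch_ns;	/* clock is frozen while suspended */
	return cd->epoch_ns + ((toy_read() - cd->epoch_cyc) & TOY_MASK);
}

/* Suspend: save the accumulated state, then cancel the timer. */
static void toy_suspend(struct toy_clock_data *cd)
{
	toy_update(cd);
	cd->timer_armed = false;
	cd->suspended = true;
}

/* Resume: take a fresh epoch, then restart the timer. */
static void toy_resume(struct toy_clock_data *cd)
{
	assert(cd->suspended);
	cd->epoch_cyc = toy_read();
	cd->timer_armed = true;
	cd->suspended = false;
}
```

In this model, time that passes while suspended is never added to the clock, even if the counter wraps several times, because the epoch is taken again on resume. That is also why a timer restarted on resume only needs to fire within one wrap period of the new epoch.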


* [PATCH] sched_clock: Avoid corrupting hrtimer tree during suspend
  2014-07-18 23:24       ` Stephen Boyd
@ 2014-07-19  0:14         ` John Stultz
  2014-07-22 22:21           ` Stephen Boyd
  0 siblings, 1 reply; 7+ messages in thread
From: John Stultz @ 2014-07-19  0:14 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/18/2014 04:24 PM, Stephen Boyd wrote:
> On 07/18/14 15:42, John Stultz wrote:
>> If it's a regression (and needs -stable backports) it needs to go in via
>> tip/timers/urgent, and not via the regular merge window.
>>
>> What's the additional risk, -stable wise, of canceling the timer during
>> suspend and starting it back up during resume?
>>
> I'd say close to zero given that we'd only be making the timer run a
> little bit later and we have slack in there already. Here's that version.

OK, thanks. I'll try to do a closer review of it and get it queued. Is
there anyone who might be able to validate this and provide a Tested-by: ?

thanks
-john


* [PATCH] sched_clock: Avoid corrupting hrtimer tree during suspend
  2014-07-19  0:14         ` John Stultz
@ 2014-07-22 22:21           ` Stephen Boyd
  0 siblings, 0 replies; 7+ messages in thread
From: Stephen Boyd @ 2014-07-22 22:21 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/18/14 17:14, John Stultz wrote:
> On 07/18/2014 04:24 PM, Stephen Boyd wrote:
>> On 07/18/14 15:42, John Stultz wrote:
>>> If it's a regression (and needs -stable backports) it needs to go in via
>>> tip/timers/urgent, and not via the regular merge window.
>>>
>>> What's the additional risk, -stable wise, of canceling the timer during
>>> suspend and starting it back up during resume?
>>>
>> I'd say close to zero given that we'd only be making the timer run a
>> little bit later and we have slack in there already. Here's that version.
> OK, thanks. I'll try to do a closer review of it and get it queued. Is
> there anyone who might be able to validate this and provide a Tested-by: ?
>

Maybe someone from Linaro can give a Tested-by? I basically did this:

# grep -A1 'sched_clock' /proc/timer_list && echo mem > /sys/power/state && grep -A1 'sched_clock' /proc/timer_list


and made sure that the expires time was reset.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
