From: Bernhard Schiffner <bernhard@schiffner-limbach.de>
To: linux-rt-users <linux-rt-users@vger.kernel.org>
Subject: Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)
Date: Wed, 01 May 2013 10:30:48 +0200
Message-ID: <47157022.drQAKXAcr6@bs8>
In-Reply-To: <20130430170948.GB4688@linutronix.de>
On Tuesday, 30 April 2013, 19:09:48, Sebastian Andrzej Siewior wrote:
> * Clark Williams | 2013-04-29 16:19:25 [-0500]:
> >On Mon, 29 Apr 2013 22:12:02 +0200
> >
> >Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> >> - suspend / resume seems to program the timer wrong and wait
> >>   ages until it continues.
> >
> >It has to be something we're doing when we apply RT to v3.8.x, since
> >v3.8.x suspends/resumes with no issues and I was able to suspend and
> >resume fine with the 3.6-rt series.
>
> I think I figured out what is going on, or at least I think I did.
>
> This log snippet is from the resume path (from suspend to mem):
>
> [ 15.052115] Enabling non-boot CPUs ...
> [ 15.052115] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [ 14.841378] Initializing CPU#1
> [ 42.840017] [sched_delayed] sched: RT throttling activated
> [ 42.842144] CPU1 is up
> [ 42.842536] smpboot: Booting Node 0 Processor 2 APIC 0x2
>
> Two things happen here:
> - the time goes backwards from 15.X to 14.X. This is okay because the
>   14.X is the timestamp from the secondary CPU, which is not yet
>   synchronized with the boot CPU
> - the printk with "CPU1 is up" is coming from the boot CPU and,
>   according to the timestamp, about 28 secs passed by. But this did not
>   really happen, as the whole procedure took less time.
>
> The next thing that happens is that RCU assumes nobody is making any
> progress (for almost 28 secs) and triggers NMIs & printks to get some
> attention. I have a trace where
> - CPU0: arch_trigger_all_cpu_backtrace_handler() => printk()
> has "lock" and is spinning for logbuf_lock
>
> - CPU1: print_cpu_stall() => printk() (spinning for the lock) => NMI =>
> arch_trigger_all_cpu_backtrace_handler()
> it may have logbuf_lock and is spinning for "lock"
>
> I can't tell if CPU1 got the logbuf_lock at this time, but it seemed that
> it made no progress until I ended it.
> This NMI-related deadlock is a problem which should also trigger on
> mainline, right?
>
> Now, the time jump on the other hand is the real issue here and is
> RT-only. It looks like we get a big number of timer updates via
> tick_do_update_jiffies64() because according to ktime_get() that much
> time really passed by.
>
> The solution seems as simple as
>
> From c27eb2e0ab0b5acd96a4b62288976f1b72789b3e Mon Sep 17 00:00:00 2001
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Date: Tue, 30 Apr 2013 18:53:55 +0200
> Subject: [PATCH] time/timekeeping: shadow tk->cycle_last together with
> clock->cycle_last
>
> Commit ("timekeeping: Store cycle_last value in timekeeper struct as
> well") introduced a tk-> based cycle_last values which needs to be reset
> on resume path as well or else ktime_get() will think that time
> increased a lot.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> kernel/time/timekeeping.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 99f943b..688817f 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -777,6 +777,7 @@ static void timekeeping_resume(void)
> }
> /* re-base the last cycle value */
> tk->clock->cycle_last = tk->clock->read(tk->clock);
> + tk->cycle_last = tk->clock->cycle_last;
> tk->ntp_error = 0;
> timekeeping_suspended = 0;
> timekeeping_update(tk, false, true);
>
> >Clark
>
> Sebastian
> --
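For anyone following along, here is a minimal user-space sketch of why the
one-line patch above matters. It is not kernel code: the struct names,
read_clock(), resume_model() and apparent_jump() are hypothetical stand-ins,
and it assumes (as the commit message implies) that readers such as
ktime_get() compute elapsed cycles relative to tk->cycle_last. The counter
values are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the clocksource. */
struct clocksource_model {
	uint64_t counter;    /* free-running hardware counter */
	uint64_t cycle_last; /* models clock->cycle_last */
};

/* Hypothetical stand-in for the timekeeper with the shadow copy. */
struct timekeeper_model {
	struct clocksource_model *clock;
	uint64_t cycle_last; /* models tk->cycle_last */
};

static uint64_t read_clock(const struct clocksource_model *cs)
{
	return cs->counter;
}

/* Elapsed cycles as seen by a reader that uses the shadow copy. */
static uint64_t delta_since_last(const struct timekeeper_model *tk)
{
	return read_clock(tk->clock) - tk->cycle_last;
}

/* Models the re-base step in timekeeping_resume(), with or without the fix. */
static void resume_model(struct timekeeper_model *tk, int with_fix)
{
	tk->clock->cycle_last = read_clock(tk->clock);
	if (with_fix)
		tk->cycle_last = tk->clock->cycle_last;
}

/* Runs one suspend/resume cycle and returns the delta a reader would see. */
static uint64_t apparent_jump(int with_fix)
{
	struct clocksource_model cs = { .counter = 1000, .cycle_last = 1000 };
	struct timekeeper_model tk = { .clock = &cs, .cycle_last = 1000 };

	cs.counter += 28000;          /* counter moved on while suspended */
	resume_model(&tk, with_fix);
	cs.counter += 5;              /* a little real work after resume */

	return delta_since_last(&tk);
}
```

In this model apparent_jump(0) returns 28005, i.e. the stale shadow copy makes
the whole suspended interval look like elapsed time, while apparent_jump(1)
returns 5.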
This patch together with the in_nmi() patch solves the resume problem for me.
Architecture X64, patched against 3.8.10-rt6.
THANKS!
Bernhard