All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Wessel <jason.wessel@windriver.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: lkml <linux-kernel@vger.kernel.org>,
	"Deng, Dongdong" <Dongdong.Deng@windriver.com>,
	peterz@infradead.org
Subject: [PATCH] softlockup: fix problem with long kernel pauses from kgdb
Date: Mon, 27 Jul 2009 15:03:49 -0500	[thread overview]
Message-ID: <4A6E0825.3020604@windriver.com> (raw)


Ingo,

Given that you are the maintainer of kernel/softlockup.c, I am seeking
advice as to how to properly fix this problem.

The short version of the problem is:

  * CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
  * Attach to kgdb after boot
  * Wait for 90 seconds
  * Execute a continue in gdb
  * You receive a warning about softlockup

The patch that follows is a lengthy analysis on the issue, but the
question here is what is the right way to fix this?

It seems that a provision is required in order to get the clock synced
up prior to touching the watch dog.  It was not clear that it was a
good idea to unconditionally call the sched_clock_tick() from the
softlockup touch code.

Your input on this issue is greatly appreciated.

Thanks,
Jason.

----------
From: "Dongdong Deng" <Dongdong.Deng@windriver.com>
Subject: [PATCH] softlockup: add sched_clock_tick() to avoid kernel warning on kgdb resume

When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set and the sched_clock()
was gets the time from hardware, such as from TSC, kgdb often
causes softlock warning messages on resuming or detaching from
a debug session.

Sequence of events in the problem case:

1) "cpu sched clock" and "hardware time" are at 100 seconds prior
to a call to kgdb_handle_exception()

2) Debugger waits in kgdb_handle_exception() for 80 seconds and on
   exit the following is called
    ...
    touch_softlockup_watchdog()
       --> __raw_get_cpu_var(touch_timestamp) = 0;

3) "cpu sched clock" = 100s (it didn't be updated, because the
   interrupt was disabled in kgdb) but the "hardware time" = 180s

4) The first timer interrupt after resuming from kgdb_handle_exception
   updates the watchdog from the "cpu sched clock"

update_process_times()
{
    ...
    run_local_timers()
          --> softlockup_tick()
              --> check (touch_timestamp == 0)
                (it is "YES" here, we have set "touch_timestamp = 0"
                at kgdb)
                --> __touch_softlockup_watchdog()
              ***(A)--> reset "touch_timestamp" to "get_timestamp()"
                     (Here, the "touch_timestamp" will still be set to
                      100s.)
    ...

    scheduler_tick()
      ***(B)--> sched_clock_tick()
               (update "cpu sched clock" to "hardware time" = 180s)
    ...
}

5) The Second timer interrupt handler appears to have a large jump and
   trips the softlockup warning.

update_process_times()
{
    ...
    run_local_timers()
        --> softlockup_tick()
           --> "cpu sched clock" - "touch_timestamp" = 180s-100s > 60s
           --> printk "soft lockup error messages"
    ...
}

note:
***(A) reset "touch_timestamp" to "get_timestamp(this_cpu)"

Why "touch_timestamp" will be 100s, not 180s ?

We enable "CONFIG_HAVE_UNSTABLE_SCHED_CLOCK",
so the call trace of get_timestamp() is:

get_timestamp(this_cpu)
      -->cpu_clock(this_cpu)
         -->sched_clock_cpu(this_cpu)
            -->__update_sched_clock(sched_clock_data, now)

22 static u64 __update_sched_clock(struct sched_clock_data *scd, u64 =
now)
23 {
24    s64 delta = now - scd->tick_raw;
25    u64 clock, min_clock, max_clock;
26
27    WARN_ON_ONCE(!irqs_disabled());
28
29    if (unlikely(delta < 0))
30        delta = 0;
31
32    clock = scd->tick_gtod + delta;
33
34    min_clock = wrap_max(scd->tick_gtod, scd->clock);
35    max_clock = wrap_max(scd->clock, scd->tick_gtod + TICK_NSEC);
36
37    clock = wrap_max(clock, min_clock);
38    clock = wrap_min(clock, max_clock);
39
40    scd->clock = clock;
41
42    return scd->clock;
43 }

Please pay attention to __update_sched_clock() function, it uses the
GTOD tick value to create a window to filter crazy "now" values.  So
if "now" values is too big for sched_clock_data, it will be ignored.

as the data of "step 3)",
"now" = sched_clock() = "hardware time" = 180s and
sched_clock_data = 100s.
180s is too big for 100s, it will be ignored.

That's why the touch_timestamp will be set to 100s, not 180s.

The fix is to simply invoke sched_clock_tick() to update "cpu sched
clock" on exit from kgdb_handle_exception.

Signed-off-by: Dongdong Deng <Dongdong.Deng@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: peterz@infradead.org

---
 kernel/softlockup.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -118,6 +118,9 @@ void softlockup_tick(void)
 	}
 
 	if (touch_timestamp == 0) {
+		/* If the time stamp was touched externally make sure the
+		 * scheduler tick is up to date as well */
+		sched_clock_tick();
 		__touch_softlockup_watchdog();
 		return;
 	}

             reply	other threads:[~2009-07-27 20:04 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-27 20:03 Jason Wessel [this message]
2009-07-27 20:18 ` [PATCH] softlockup: fix problem with long kernel pauses from kgdb Peter Zijlstra
2009-07-27 21:25   ` Jason Wessel
2009-07-28 15:05   ` Jason Wessel
2009-08-04 14:16     ` Ingo Molnar
2009-08-04 14:53       ` [PATCH] softlockup: fix problem with long kernel pauses fromkgdb Jason Wessel
2009-08-04 14:59         ` Ingo Molnar
2009-08-21 12:42           ` [PATCH] softlockup: fix problem with long kernel pauses from kgdb DDD
2009-09-26  3:01       ` Yong Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A6E0825.3020604@windriver.com \
    --to=jason.wessel@windriver.com \
    --cc=Dongdong.Deng@windriver.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.