From: Ingo Molnar <mingo@elte.hu>
To: David Miller <davem@davemloft.net>
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: Soft lockup regression from today's sched.git merge.
Date: Wed, 23 Apr 2008 09:51:40 +0200 [thread overview]
Message-ID: <20080423075140.GA20980@elte.hu> (raw)
In-Reply-To: <20080422.224212.32058127.davem@davemloft.net>
* David Miller <davem@davemloft.net> wrote:
> From: Ingo Molnar <mingo@elte.hu>
> Date: Tue, 22 Apr 2008 11:14:56 +0200
>
> > thanks for reporting it. I havent seen this false positive happen in a
> > long time - but then again, PC CPUs are a lot less idle than a 128-CPU
> > Niagara2 :-/ I'm wondering what the best method would be to provoke a
> > CPU to stay idle that long - to make sure this bug is fixed.
>
> I looked more closely at this.
>
> There is no way the patch in question can work properly.
>
> The algorithm is, essentialy "if time - prev_cpu_time is large enough,
> call __sync_cpu_clock()" which if fine, except that nothing ever sets
> prev_cpu_time.
you are right - and this causes us to hit that global spinlock every
time cpu_clock() is called. Note that only debugging code uses
cpu_clock() though (softlockup watchdog, blktrace, etc.) - but you are
right that the performance bug should be fixed - the patch below fixes
the bogosity.
the second patch below makes time_sync_thresh available to architecture
code - that way, if your platform has a guaranteed-synchronous
sched_clock(), you can set that to a larger value (or just -1LL for
infinite).
this problem cannot explain the softlockup bug though.
Ingo
---------------------------->
Subject: sched: fix cpu clock
From: Ingo Molnar <mingo@elte.hu>
Date: Wed Apr 23 09:24:06 CEST 2008
David Miller pointed it out that nothing in cpu_clock() sets
prev_cpu_time. This caused __sync_cpu_clock() to be called
all the time - against the intention of this code.
The result was that in practice we hit a global spinlock every
time cpu_clock() is called - which - even though cpu_clock()
is used for tracing and debugging, is suboptimal.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched.c | 2 ++
1 file changed, 2 insertions(+)
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1001,6 +1001,8 @@ unsigned long long notrace cpu_clock(int
if (unlikely(delta_time > time_sync_thresh))
time = __sync_cpu_clock(time, cpu);
+ per_cpu(prev_cpu_time, cpu) = time;
+
return time;
}
EXPORT_SYMBOL_GPL(cpu_clock);
Subject: sched: make clock sync tunable by architecture code
From: Ingo Molnar <mingo@elte.hu>
Date: Wed Apr 23 09:31:35 CEST 2008
make time_sync_thresh tunable to architecture code.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/linux/sched.h | 2 ++
kernel/sched.c | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -159,6 +159,8 @@ print_cfs_rq(struct seq_file *m, int cpu
}
#endif
+extern unsigned long long time_sync_thresh;
+
/*
* Task state bitmask. NOTE! These bits are also
* encoded in fs/proc/array.c: get_task_state().
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -924,7 +924,7 @@ static inline u64 global_rt_runtime(void
return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC;
}
-static const unsigned long long time_sync_thresh = 100000;
+unsigned long long time_sync_thresh = 100000;
static DEFINE_PER_CPU(unsigned long long, time_offset);
static DEFINE_PER_CPU(unsigned long long, prev_cpu_time);
next prev parent reply other threads:[~2008-04-23 7:52 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-22 8:59 Soft lockup regression from today's sched.git merge David Miller
2008-04-22 9:14 ` Ingo Molnar
2008-04-22 10:05 ` David Miller
2008-04-22 12:45 ` Peter Zijlstra
2008-05-06 22:41 ` Rafael J. Wysocki
2008-05-06 23:05 ` David Miller
2008-05-07 6:43 ` Ingo Molnar
2008-05-07 18:56 ` Rafael J. Wysocki
2008-04-23 8:50 ` [patch] softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds Ingo Molnar
2008-04-23 10:55 ` David Miller
2008-04-23 12:29 ` David Miller
2008-04-23 13:36 ` Ingo Molnar
2008-04-23 23:23 ` David Miller
2008-04-23 5:42 ` Soft lockup regression from today's sched.git merge David Miller
2008-04-23 7:32 ` Dhaval Giani
2008-04-23 7:51 ` Ingo Molnar [this message]
2008-04-23 9:40 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080423075140.GA20980@elte.hu \
--to=mingo@elte.hu \
--cc=a.p.zijlstra@chello.nl \
--cc=davem@davemloft.net \
--cc=linux-kernel@vger.kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox