* [PATCH] Instrumenting high latency
@ 2004-07-12 6:29 Con Kolivas
2004-07-12 6:37 ` Andrew Morton
` (3 more replies)
0 siblings, 4 replies; 21+ messages in thread
From: Con Kolivas @ 2004-07-12 6:29 UTC (permalink / raw)
To: linux-kernel; +Cc: akpm
[-- Attachment #1: Type: text/plain, Size: 1239 bytes --]
Because of the recent discussion about latency in the kernel I asked William
Lee Irwin III to help create some instrumentation to determine where in the
kernel there were still sustained periods of non-preemptible code. He hacked
together this simple patch which times periods according to the preempt
count. Hopefully we can use this patch in the advice of Linus to avoid the
"mental masturbation" at guessing where latency is and track down real
problem areas.
It is enabled via a config option and by setting the threshold at boot by
passing the parameter:
preempt_thresh=2
to set the threshold at 2ms for example.
The output is a warning in syslog like so:
5ms non-preemptible critical section violated 2 ms preempt threshold
starting at add_wait_queue+0x21/0x82 and ending at add_wait _queue+0x4a/0x82
I would not recommend using this patch for daily use but please try it out
on multiple setups/filesystems etc and help us track down the areas.
Unfortunately I am not personally capable of fixing the code paths in
question so I'll need the help of others in this.
The patch appears to require CONFIG_PREEMPT enabled on uniprocessor and is
i386 only at the moment.
Patch applies to 2.6.8-rc1
Cheers,
Con Kolivas
[-- Attachment #2: wli_preempttest.diff --]
[-- Type: text/plain, Size: 4406 bytes --]
Index: linux-2.6.8-rc1/arch/i386/Kconfig
===================================================================
--- linux-2.6.8-rc1.orig/arch/i386/Kconfig 2004-07-12 16:11:43.531241636 +1000
+++ linux-2.6.8-rc1/arch/i386/Kconfig 2004-07-12 16:12:45.036678502 +1000
@@ -504,6 +504,13 @@
Say Y here if you are building a kernel for a desktop, embedded
or real-time system. Say N if you are unsure.
+config PREEMPT_TIMING
+ bool "Non-preemptible critical section timing"
+ help
+ This option measures the time spent in non-preemptible critical
+ sections and reports warnings when a boot-time configurable
+ latency threshold is exceeded.
+
config X86_UP_APIC
bool "Local APIC support on uniprocessors" if !SMP
depends on !(X86_VISWS || X86_VOYAGER)
Index: linux-2.6.8-rc1/arch/i386/kernel/traps.c
===================================================================
--- linux-2.6.8-rc1.orig/arch/i386/kernel/traps.c 2004-07-12 16:11:43.601230753 +1000
+++ linux-2.6.8-rc1/arch/i386/kernel/traps.c 2004-07-12 16:12:45.037678346 +1000
@@ -947,3 +947,68 @@
trap_init_hook();
}
+
+#ifdef CONFIG_PREEMPT_TIMING
+
+static int preempt_thresh;
+static DEFINE_PER_CPU(u64, preempt_timings);
+static DEFINE_PER_CPU(unsigned long, preempt_entry);
+static DEFINE_PER_CPU(unsigned long, preempt_exit);
+
+static int setup_preempt_thresh(char *s)
+{
+ int thresh;
+
+ get_option(&s, &thresh);
+ if (thresh > 0) {
+ preempt_thresh = thresh;
+ printk("Preemption threshold = %dms\n", preempt_thresh);
+ }
+ return 1;
+}
+__setup("preempt_thresh=", setup_preempt_thresh);
+
+void __inc_preempt_count(void)
+{
+ preempt_count()++;
+ if (preempt_count() == 1 && system_state == SYSTEM_RUNNING) {
+ rdtscll(__get_cpu_var(preempt_timings));
+ __get_cpu_var(preempt_entry)
+ = (unsigned long)__builtin_return_address(0);
+ }
+}
+
+void __dec_preempt_count(void)
+{
+ if (preempt_count() == 1 && system_state == SYSTEM_RUNNING &&
+ __get_cpu_var(preempt_entry)) {
+ u64 exit;
+ __get_cpu_var(preempt_exit)
+ = (unsigned long)__builtin_return_address(0);
+ rdtscll(exit);
+ if (cpu_khz) {
+ __get_cpu_var(preempt_timings) =
+ exit - __get_cpu_var(preempt_timings);
+ do_div(__get_cpu_var(preempt_timings), cpu_khz);
+ if (__get_cpu_var(preempt_timings) > preempt_thresh &&
+ preempt_thresh) {
+ printk("%lums non-preemptible critical "
+ "section violated %d ms preempt "
+ "threshold starting at ",
+ (unsigned long)
+ __get_cpu_var(preempt_timings),
+ preempt_thresh);
+ print_symbol("%s and ending at ",
+ __get_cpu_var(preempt_entry));
+ print_symbol("%s\n",
+ __get_cpu_var(preempt_exit));
+ dump_stack();
+ }
+ }
+ __get_cpu_var(preempt_exit) = __get_cpu_var(preempt_entry) = 0;
+ }
+ preempt_count()--;
+}
+EXPORT_SYMBOL(__inc_preempt_count);
+EXPORT_SYMBOL(__dec_preempt_count);
+#endif
Index: linux-2.6.8-rc1/include/linux/preempt.h
===================================================================
--- linux-2.6.8-rc1.orig/include/linux/preempt.h 2004-03-11 21:29:26.000000000 +1100
+++ linux-2.6.8-rc1/include/linux/preempt.h 2004-07-12 16:12:45.055675548 +1000
@@ -9,17 +9,22 @@
#include <linux/config.h>
#include <linux/linkage.h>
-#define preempt_count() (current_thread_info()->preempt_count)
-
-#define inc_preempt_count() \
-do { \
- preempt_count()++; \
-} while (0)
+#ifdef CONFIG_PREEMPT_TIMING
+void __inc_preempt_count(void);
+void __dec_preempt_count(void);
+#else
+#ifdef CONFIG_PREEMPT
+#define __inc_preempt_count() do { preempt_count()++; } while (0)
+#define __dec_preempt_count() do { preempt_count()--; } while (0)
+#else
+#define __inc_preempt_count() do { } while (0)
+#define __dec_preempt_count() do { } while (0)
+#endif
+#endif
-#define dec_preempt_count() \
-do { \
- preempt_count()--; \
-} while (0)
+#define preempt_count() (current_thread_info()->preempt_count)
+#define inc_preempt_count() __inc_preempt_count()
+#define dec_preempt_count() __dec_preempt_count()
#ifdef CONFIG_PREEMPT
@@ -51,9 +56,9 @@
#else
-#define preempt_disable() do { } while (0)
-#define preempt_enable_no_resched() do { } while (0)
-#define preempt_enable() do { } while (0)
+#define preempt_disable() __inc_preempt_count()
+#define preempt_enable_no_resched() __dec_preempt_count()
+#define preempt_enable() __dec_preempt_count()
#define preempt_check_resched() do { } while (0)
#endif
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 6:29 [PATCH] Instrumenting high latency Con Kolivas
@ 2004-07-12 6:37 ` Andrew Morton
2004-07-12 6:43 ` Con Kolivas
2004-07-12 7:00 ` [PATCH] " William Lee Irwin III
2004-07-12 7:34 ` Andrew Morton
` (2 subsequent siblings)
3 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2004-07-12 6:37 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel
Con Kolivas <kernel@kolivas.org> wrote:
>
> Because of the recent discussion about latency in the kernel I asked William
> Lee Irwin III to help create some instrumentation to determine where in the
> kernel there were still sustained periods of non-preemptible code. He hacked
> together this simple patch which times periods according to the preempt
> count.
Looks sane.
> The patch appears to require CONFIG_PREEMPT enabled on uniprocessor and is
> i386 only at the moment.
Not sure what you mean by "on uniprocessor"? AFAICT the patch will work
as-is on uniprocessor and on SMP. Looks like it'll work with
CONFIG_PREEMPT=n, too, although that would be a slightly bizarre thing to
do.
+ print_symbol("%s\n",
+ __get_cpu_var(preempt_exit));
I'll change this to
print_symbol("%s",
__get_cpu_var(preempt_exit));
printk("\n");
so it doesn't make a mess with CONFIG_KALLSYMS=n.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Instrumenting high latency
2004-07-12 6:37 ` Andrew Morton
@ 2004-07-12 6:43 ` Con Kolivas
2004-07-12 6:58 ` Andrew Morton
2004-07-12 7:00 ` [PATCH] " William Lee Irwin III
1 sibling, 1 reply; 21+ messages in thread
From: Con Kolivas @ 2004-07-12 6:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Andrew Morton writes:
> Con Kolivas <kernel@kolivas.org> wrote:
>>
>> Because of the recent discussion about latency in the kernel I asked William
>> Lee Irwin III to help create some instrumentation to determine where in the
>> kernel there were still sustained periods of non-preemptible code. He hacked
>> together this simple patch which times periods according to the preempt
>> count.
>
> Looks sane.
>
>> The patch appears to require CONFIG_PREEMPT enabled on uniprocessor and is
>> i386 only at the moment.
>
> Not sure what you mean by "on uniprocessor"? AFAICT the patch will work
> as-is on uniprocessor and on SMP. Looks like it'll work with
> CONFIG_PREEMPT=n, too, although that would be a slightly bizarre thing to
That was WLI's aim, but I had one user report an OOPS with that combination.
> do.
>
> + print_symbol("%s\n",
> + __get_cpu_var(preempt_exit));
>
> I'll change this to
>
> print_symbol("%s",
> __get_cpu_var(preempt_exit));
> printk("\n");
>
> so it doesn't make a mess with CONFIG_KALLSYMS=n.
Ok
It's not heavily tested but ran for a day on my box at home. It can trip
over itself saying that it's own code paths are a problem too if it gets
real busy.
Cheers,
Con
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Instrumenting high latency
2004-07-12 6:43 ` Con Kolivas
@ 2004-07-12 6:58 ` Andrew Morton
0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2004-07-12 6:58 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel
Con Kolivas <kernel@kolivas.org> wrote:
>
> It can trip
> over itself saying that it's own code paths are a problem too if it gets
> real busy.
This might fix it?
diff -puN arch/i386/kernel/traps.c~instrumenting-high-latency-fixes arch/i386/kernel/traps.c
--- 25/arch/i386/kernel/traps.c~instrumenting-high-latency-fixes 2004-07-11 23:47:51.007799680 -0700
+++ 25-akpm/arch/i386/kernel/traps.c 2004-07-11 23:57:58.726412408 -0700
@@ -1065,6 +1065,9 @@ void __dec_preempt_count(void)
if (preempt_count() == 1 && system_state == SYSTEM_RUNNING &&
__get_cpu_var(preempt_entry)) {
u64 exit;
+
+ preempt_count()++; /* Prevent recursive warnings */
+
__get_cpu_var(preempt_exit)
= (unsigned long)__builtin_return_address(0);
rdtscll(exit);
@@ -1082,12 +1085,13 @@ void __dec_preempt_count(void)
preempt_thresh);
print_symbol("%s and ending at ",
__get_cpu_var(preempt_entry));
- print_symbol("%s\n",
- __get_cpu_var(preempt_exit));
+ print_symbol("%s", __get_cpu_var(preempt_exit));
+ printk("\n");
dump_stack();
}
}
__get_cpu_var(preempt_exit) = __get_cpu_var(preempt_entry) = 0;
+ preempt_count()--;
}
preempt_count()--;
}
_
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 6:37 ` Andrew Morton
2004-07-12 6:43 ` Con Kolivas
@ 2004-07-12 7:00 ` William Lee Irwin III
1 sibling, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2004-07-12 7:00 UTC (permalink / raw)
To: Andrew Morton; +Cc: Con Kolivas, linux-kernel
Con Kolivas <kernel@kolivas.org> wrote:
>> The patch appears to require CONFIG_PREEMPT enabled on uniprocessor and is
>> i386 only at the moment.
On Sun, Jul 11, 2004 at 11:37:50PM -0700, Andrew Morton wrote:
> Not sure what you mean by "on uniprocessor"? AFAICT the patch will work
> as-is on uniprocessor and on SMP. Looks like it'll work with
> CONFIG_PREEMPT=n, too, although that would be a slightly bizarre thing to
> do.
> + print_symbol("%s\n",
> + __get_cpu_var(preempt_exit));
> I'll change this to
> print_symbol("%s",
> __get_cpu_var(preempt_exit));
> printk("\n");
> so it doesn't make a mess with CONFIG_KALLSYMS=n.
I'm not 100% sure why UP goes down in flames yet, fscking laptop has no
loggable console. Guess I'd better figure out netconsole or some such.
-- wli
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 6:29 [PATCH] Instrumenting high latency Con Kolivas
2004-07-12 6:37 ` Andrew Morton
@ 2004-07-12 7:34 ` Andrew Morton
2004-07-12 8:02 ` William Lee Irwin III
2004-07-12 8:29 ` Arjan van de Ven
2004-07-12 10:41 ` Rajput
2004-07-12 14:24 ` Martin J. Bligh
3 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2004-07-12 7:34 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel
Con Kolivas <kernel@kolivas.org> wrote:
>
> He hacked
> together this simple patch which times periods according to the preempt
> count.
OK, small problem. We have code which does, effectively,
if (need_resched()) {
drop_the_lock();
schedule();
grab_the_lock();
}
so if need_resched() stays false then this will hold the lock for a long
time and bogus reports are generated:
46ms non-preemptible critical section violated 1 ms preempt threshold starting at exit_mmap+0x26/0x188 and ending at exit_mmap+0x154/0x188
To fix that you need to generate high scheduling pressure so that
need_resched() is frequently true. On all CPUs. Modify realfeel to pin
itself to each CPU, or something like that.
This rather decreases the patch's usefulness.
The way I normally do this stuff is with
http://www.zip.com.au/~akpm/linux/patches/stuff/rtc-debug.patch
and `amlat', from http://www.zip.com.au/~akpm/linux/amlat.tar.gz
It _might_ be sufficient to redefine need_resched() to just return 1 all
the time. If that causes the kernel to livelock then we need to fix that
up anyway.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 7:34 ` Andrew Morton
@ 2004-07-12 8:02 ` William Lee Irwin III
2004-07-12 8:38 ` William Lee Irwin III
2004-07-12 8:29 ` Arjan van de Ven
1 sibling, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2004-07-12 8:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: Con Kolivas, linux-kernel
On Mon, Jul 12, 2004 at 12:34:18AM -0700, Andrew Morton wrote:
> OK, small problem. We have code which does, effectively,
> if (need_resched()) {
> drop_the_lock();
> schedule();
> grab_the_lock();
> }
> so if need_resched() stays false then this will hold the lock for a long
> time and bogus reports are generated:
> 46ms non-preemptible critical section violated 1 ms preempt threshold starting at exit_mmap+0x26/0x188 and ending at exit_mmap+0x154/0x188
> To fix that you need to generate high scheduling pressure so that
> need_resched() is frequently true. On all CPUs. Modify realfeel to pin
> itself to each CPU, or something like that.
I suspect it's better to drop in hooks to trap those e.g. in
cond_resched() and cond_resched_lock().
On Mon, Jul 12, 2004 at 12:34:18AM -0700, Andrew Morton wrote:
> This rather decreases the patch's usefulness.
> The way I normally do this stuff is with
> http://www.zip.com.au/~akpm/linux/patches/stuff/rtc-debug.patch
> and `amlat', from http://www.zip.com.au/~akpm/linux/amlat.tar.gz
> It _might_ be sufficient to redefine need_resched() to just return 1 all
> the time. If that causes the kernel to livelock then we need to fix that
> up anyway.
Less code... not sure how nasty performance the implications are.
-- wli
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 7:34 ` Andrew Morton
2004-07-12 8:02 ` William Lee Irwin III
@ 2004-07-12 8:29 ` Arjan van de Ven
2004-07-12 8:42 ` William Lee Irwin III
1 sibling, 1 reply; 21+ messages in thread
From: Arjan van de Ven @ 2004-07-12 8:29 UTC (permalink / raw)
To: Andrew Morton; +Cc: Con Kolivas, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 625 bytes --]
On Mon, 2004-07-12 at 09:34, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> >
> > He hacked
> > together this simple patch which times periods according to the preempt
> > count.
>
> OK, small problem. We have code which does, effectively,
>
> if (need_resched()) {
> drop_the_lock();
> schedule();
> grab_the_lock();
> }
>
> so if need_resched() stays false then this will hold the lock for a long
> time and bogus reports are generated:
... or reset the time(r) in need_resched() under the assumption that all
need_resched() callers will yield when it returns true...
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 8:02 ` William Lee Irwin III
@ 2004-07-12 8:38 ` William Lee Irwin III
2004-07-12 11:31 ` William Lee Irwin III
0 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2004-07-12 8:38 UTC (permalink / raw)
To: Andrew Morton, Con Kolivas, linux-kernel
On Mon, Jul 12, 2004 at 01:02:59AM -0700, William Lee Irwin III wrote:
> I suspect it's better to drop in hooks to trap those e.g. in
> cond_resched() and cond_resched_lock().
Something like this should do, modulo everything outside kernel/ and mm/.
Index: timing-2.6.7/include/linux/sched.h
===================================================================
--- timing-2.6.7.orig/include/linux/sched.h 2004-06-15 22:18:57.000000000 -0700
+++ timing-2.6.7/include/linux/sched.h 2004-07-12 01:17:32.000000000 -0700
@@ -1008,6 +1008,7 @@
extern void __cond_resched(void);
static inline void cond_resched(void)
{
+ touch_preempt_timing();
if (need_resched())
__cond_resched();
}
@@ -1022,6 +1023,7 @@
*/
static inline void cond_resched_lock(spinlock_t * lock)
{
+ touch_preempt_timing();
if (need_resched()) {
_raw_spin_unlock(lock);
preempt_enable_no_resched();
Index: timing-2.6.7/include/linux/preempt.h
===================================================================
--- timing-2.6.7.orig/include/linux/preempt.h 2004-07-11 04:33:55.000000000 -0700
+++ timing-2.6.7/include/linux/preempt.h 2004-07-12 01:18:17.000000000 -0700
@@ -12,7 +12,9 @@
#ifdef CONFIG_PREEMPT_TIMING
void __inc_preempt_count(void);
void __dec_preempt_count(void);
+void touch_preempt_timing(void);
#else
+#define touch_preempt_timing() do { } while (0)
#ifdef CONFIG_PREEMPT
#define __inc_preempt_count() do { preempt_count()++; } while (0)
#define __dec_preempt_count() do { preempt_count()--; } while (0)
Index: timing-2.6.7/arch/i386/kernel/traps.c
===================================================================
--- timing-2.6.7.orig/arch/i386/kernel/traps.c 2004-07-12 00:10:27.000000000 -0700
+++ timing-2.6.7/arch/i386/kernel/traps.c 2004-07-12 01:34:58.000000000 -0700
@@ -969,14 +969,24 @@
}
__setup("preempt_thresh=", setup_preempt_thresh);
+static void __touch_preempt_timing(void *addr)
+{
+ rdtscll(__get_cpu_var(preempt_timings));
+ __get_cpu_var(preempt_entry) = (unsigned long)addr;
+}
+
+void touch_preempt_timing(void)
+{
+ if (preempt_count() == 1 && system_state == SYSTEM_RUNNING)
+ __touch_preempt_timing(__builtin_return_address(0));
+}
+EXPORT_SYMBOL(touch_preempt_timing);
+
void __inc_preempt_count(void)
{
preempt_count()++;
- if (preempt_count() == 1 && system_state == SYSTEM_RUNNING) {
- rdtscll(__get_cpu_var(preempt_timings));
- __get_cpu_var(preempt_entry)
- = (unsigned long)__builtin_return_address(0);
- }
+ if (preempt_count() == 1 && system_state == SYSTEM_RUNNING)
+ __touch_preempt_timing(__builtin_return_address(0));
}
EXPORT_SYMBOL(__inc_preempt_count);
Index: timing-2.6.7/mm/memory.c
===================================================================
--- timing-2.6.7.orig/mm/memory.c 2004-06-15 22:19:22.000000000 -0700
+++ timing-2.6.7/mm/memory.c 2004-07-12 01:27:32.000000000 -0700
@@ -558,6 +558,8 @@
zap_bytes -= block;
if ((long)zap_bytes > 0)
continue;
+ if (!atomic)
+ touch_preempt_timing();
if (!atomic && need_resched()) {
int fullmm = tlb_is_full_mm(*tlbp);
tlb_finish_mmu(*tlbp, tlb_start, start);
Index: timing-2.6.7/kernel/printk.c
===================================================================
--- timing-2.6.7.orig/kernel/printk.c 2004-06-15 22:20:26.000000000 -0700
+++ timing-2.6.7/kernel/printk.c 2004-07-12 01:28:04.000000000 -0700
@@ -650,6 +650,8 @@
*/
void console_conditional_schedule(void)
{
+ if (console_may_schedule)
+ touch_preempt_timing();
if (console_may_schedule && need_resched()) {
set_current_state(TASK_RUNNING);
schedule();
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 8:29 ` Arjan van de Ven
@ 2004-07-12 8:42 ` William Lee Irwin III
2004-07-12 8:47 ` William Lee Irwin III
0 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2004-07-12 8:42 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Andrew Morton, Con Kolivas, linux-kernel
On Mon, 2004-07-12 at 09:34, Andrew Morton wrote:
>> so if need_resched() stays false then this will hold the lock for a long
>> time and bogus reports are generated:
On Mon, Jul 12, 2004 at 10:29:40AM +0200, Arjan van de Ven wrote:
> ... or reset the time(r) in need_resched() under the assumption that all
> need_resched() callers will yield when it returns true...
Might be a good enough approximation. Two examples and a counterexample...
-- wli
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 8:42 ` William Lee Irwin III
@ 2004-07-12 8:47 ` William Lee Irwin III
0 siblings, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2004-07-12 8:47 UTC (permalink / raw)
To: Arjan van de Ven, Andrew Morton, Con Kolivas, linux-kernel
On Mon, Jul 12, 2004 at 10:29:40AM +0200, Arjan van de Ven wrote:
>> ... or reset the time(r) in need_resched() under the assumption that all
>> need_resched() callers will yield when it returns true...
On Mon, Jul 12, 2004 at 01:42:31AM -0700, William Lee Irwin III wrote:
> Might be a good enough approximation. Two examples and a counterexample...
Index: timing-2.6.7/include/linux/sched.h
===================================================================
--- timing-2.6.7.orig/include/linux/sched.h 2004-06-15 22:18:57.000000000 -0700
+++ timing-2.6.7/include/linux/sched.h 2004-07-12 01:45:08.000000000 -0700
@@ -1000,7 +1000,8 @@
return unlikely(test_tsk_thread_flag(p,TIF_SIGPENDING));
}
-static inline int need_resched(void)
+#define need_resched() ({ touch_preempt_timing(); __need_resched(); })
+static inline int __need_resched(void)
{
return unlikely(test_thread_flag(TIF_NEED_RESCHED));
}
Index: timing-2.6.7/include/linux/preempt.h
===================================================================
--- timing-2.6.7.orig/include/linux/preempt.h 2004-07-11 04:33:55.000000000 -0700
+++ timing-2.6.7/include/linux/preempt.h 2004-07-12 01:18:17.000000000 -0700
@@ -12,7 +12,9 @@
#ifdef CONFIG_PREEMPT_TIMING
void __inc_preempt_count(void);
void __dec_preempt_count(void);
+void touch_preempt_timing(void);
#else
+#define touch_preempt_timing() do { } while (0)
#ifdef CONFIG_PREEMPT
#define __inc_preempt_count() do { preempt_count()++; } while (0)
#define __dec_preempt_count() do { preempt_count()--; } while (0)
Index: timing-2.6.7/arch/i386/kernel/traps.c
===================================================================
--- timing-2.6.7.orig/arch/i386/kernel/traps.c 2004-07-12 00:10:27.000000000 -0700
+++ timing-2.6.7/arch/i386/kernel/traps.c 2004-07-12 01:34:58.000000000 -0700
@@ -969,14 +969,24 @@
}
__setup("preempt_thresh=", setup_preempt_thresh);
+static void __touch_preempt_timing(void *addr)
+{
+ rdtscll(__get_cpu_var(preempt_timings));
+ __get_cpu_var(preempt_entry) = (unsigned long)addr;
+}
+
+void touch_preempt_timing(void)
+{
+ if (preempt_count() == 1 && system_state == SYSTEM_RUNNING)
+ __touch_preempt_timing(__builtin_return_address(0));
+}
+EXPORT_SYMBOL(touch_preempt_timing);
+
void __inc_preempt_count(void)
{
preempt_count()++;
- if (preempt_count() == 1 && system_state == SYSTEM_RUNNING) {
- rdtscll(__get_cpu_var(preempt_timings));
- __get_cpu_var(preempt_entry)
- = (unsigned long)__builtin_return_address(0);
- }
+ if (preempt_count() == 1 && system_state == SYSTEM_RUNNING)
+ __touch_preempt_timing(__builtin_return_address(0));
}
EXPORT_SYMBOL(__inc_preempt_count);
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 6:29 [PATCH] Instrumenting high latency Con Kolivas
2004-07-12 6:37 ` Andrew Morton
2004-07-12 7:34 ` Andrew Morton
@ 2004-07-12 10:41 ` Rajput
2004-07-12 10:44 ` William Lee Irwin III
2004-07-12 14:24 ` Martin J. Bligh
3 siblings, 1 reply; 21+ messages in thread
From: Rajput @ 2004-07-12 10:41 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel, Rajput
what is the difference in interrupt latency with and without your patch.
Regards,
Rajput.
----- Original Message -----
From: "Con Kolivas" <kernel@kolivas.org>
To: <linux-kernel@vger.kernel.org>
Cc: <akpm@osdl.org>
Sent: Monday, July 12, 2004 11:59 AM
Subject: [PATCH] Instrumenting high latency
> Because of the recent discussion about latency in the kernel I asked
William
> Lee Irwin III to help create some instrumentation to determine where in
the
> kernel there were still sustained periods of non-preemptible code. He
hacked
> together this simple patch which times periods according to the preempt
> count. Hopefully we can use this patch in the advice of Linus to avoid the
> "mental masturbation" at guessing where latency is and track down real
> problem areas.
>
> It is enabled via a config option and by setting the threshold at boot by
> passing the parameter:
> preempt_thresh=2
>
> to set the threshold at 2ms for example.
>
> The output is a warning in syslog like so:
>
> 5ms non-preemptible critical section violated 2 ms preempt threshold
> starting at add_wait_queue+0x21/0x82 and ending at add_wait
_queue+0x4a/0x82
>
> I would not recommend using this patch for daily use but please try it out
> on multiple setups/filesystems etc and help us track down the areas.
> Unfortunately I am not personally capable of fixing the code paths in
> question so I'll need the help of others in this.
>
> The patch appears to require CONFIG_PREEMPT enabled on uniprocessor and is
> i386 only at the moment.
>
> Patch applies to 2.6.8-rc1
>
> Cheers,
> Con Kolivas
>
>
----------------------------------------------------------------------------
----
> Index: linux-2.6.8-rc1/arch/i386/Kconfig
> ===================================================================
> --- linux-2.6.8-rc1.orig/arch/i386/Kconfig 2004-07-12 16:11:43.531241636
+1000
> +++ linux-2.6.8-rc1/arch/i386/Kconfig 2004-07-12 16:12:45.036678502 +1000
> @@ -504,6 +504,13 @@
> Say Y here if you are building a kernel for a desktop, embedded
> or real-time system. Say N if you are unsure.
>
> +config PREEMPT_TIMING
> + bool "Non-preemptible critical section timing"
> + help
> + This option measures the time spent in non-preemptible critical
> + sections and reports warnings when a boot-time configurable
> + latency threshold is exceeded.
> +
> config X86_UP_APIC
> bool "Local APIC support on uniprocessors" if !SMP
> depends on !(X86_VISWS || X86_VOYAGER)
> Index: linux-2.6.8-rc1/arch/i386/kernel/traps.c
> ===================================================================
> --- linux-2.6.8-rc1.orig/arch/i386/kernel/traps.c 2004-07-12
16:11:43.601230753 +1000
> +++ linux-2.6.8-rc1/arch/i386/kernel/traps.c 2004-07-12 16:12:45.037678346
+1000
> @@ -947,3 +947,68 @@
>
> trap_init_hook();
> }
> +
> +#ifdef CONFIG_PREEMPT_TIMING
> +
> +static int preempt_thresh;
> +static DEFINE_PER_CPU(u64, preempt_timings);
> +static DEFINE_PER_CPU(unsigned long, preempt_entry);
> +static DEFINE_PER_CPU(unsigned long, preempt_exit);
> +
> +static int setup_preempt_thresh(char *s)
> +{
> + int thresh;
> +
> + get_option(&s, &thresh);
> + if (thresh > 0) {
> + preempt_thresh = thresh;
> + printk("Preemption threshold = %dms\n", preempt_thresh);
> + }
> + return 1;
> +}
> +__setup("preempt_thresh=", setup_preempt_thresh);
> +
> +void __inc_preempt_count(void)
> +{
> + preempt_count()++;
> + if (preempt_count() == 1 && system_state == SYSTEM_RUNNING) {
> + rdtscll(__get_cpu_var(preempt_timings));
> + __get_cpu_var(preempt_entry)
> + = (unsigned long)__builtin_return_address(0);
> + }
> +}
> +
> +void __dec_preempt_count(void)
> +{
> + if (preempt_count() == 1 && system_state == SYSTEM_RUNNING &&
> + __get_cpu_var(preempt_entry)) {
> + u64 exit;
> + __get_cpu_var(preempt_exit)
> + = (unsigned long)__builtin_return_address(0);
> + rdtscll(exit);
> + if (cpu_khz) {
> + __get_cpu_var(preempt_timings) =
> + exit - __get_cpu_var(preempt_timings);
> + do_div(__get_cpu_var(preempt_timings), cpu_khz);
> + if (__get_cpu_var(preempt_timings) > preempt_thresh &&
> + preempt_thresh) {
> + printk("%lums non-preemptible critical "
> + "section violated %d ms preempt "
> + "threshold starting at ",
> + (unsigned long)
> + __get_cpu_var(preempt_timings),
> + preempt_thresh);
> + print_symbol("%s and ending at ",
> + __get_cpu_var(preempt_entry));
> + print_symbol("%s\n",
> + __get_cpu_var(preempt_exit));
> + dump_stack();
> + }
> + }
> + __get_cpu_var(preempt_exit) = __get_cpu_var(preempt_entry) = 0;
> + }
> + preempt_count()--;
> +}
> +EXPORT_SYMBOL(__inc_preempt_count);
> +EXPORT_SYMBOL(__dec_preempt_count);
> +#endif
> Index: linux-2.6.8-rc1/include/linux/preempt.h
> ===================================================================
> --- linux-2.6.8-rc1.orig/include/linux/preempt.h 2004-03-11
21:29:26.000000000 +1100
> +++ linux-2.6.8-rc1/include/linux/preempt.h 2004-07-12 16:12:45.055675548
+1000
> @@ -9,17 +9,22 @@
> #include <linux/config.h>
> #include <linux/linkage.h>
>
> -#define preempt_count() (current_thread_info()->preempt_count)
> -
> -#define inc_preempt_count() \
> -do { \
> - preempt_count()++; \
> -} while (0)
> +#ifdef CONFIG_PREEMPT_TIMING
> +void __inc_preempt_count(void);
> +void __dec_preempt_count(void);
> +#else
> +#ifdef CONFIG_PREEMPT
> +#define __inc_preempt_count() do { preempt_count()++; } while (0)
> +#define __dec_preempt_count() do { preempt_count()--; } while (0)
> +#else
> +#define __inc_preempt_count() do { } while (0)
> +#define __dec_preempt_count() do { } while (0)
> +#endif
> +#endif
>
> -#define dec_preempt_count() \
> -do { \
> - preempt_count()--; \
> -} while (0)
> +#define preempt_count() (current_thread_info()->preempt_count)
> +#define inc_preempt_count() __inc_preempt_count()
> +#define dec_preempt_count() __dec_preempt_count()
>
> #ifdef CONFIG_PREEMPT
>
> @@ -51,9 +56,9 @@
>
> #else
>
> -#define preempt_disable() do { } while (0)
> -#define preempt_enable_no_resched() do { } while (0)
> -#define preempt_enable() do { } while (0)
> +#define preempt_disable() __inc_preempt_count()
> +#define preempt_enable_no_resched() __dec_preempt_count()
> +#define preempt_enable() __dec_preempt_count()
> #define preempt_check_resched() do { } while (0)
>
> #endif
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 10:41 ` Rajput
@ 2004-07-12 10:44 ` William Lee Irwin III
0 siblings, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2004-07-12 10:44 UTC (permalink / raw)
To: Rajput; +Cc: Con Kolivas, linux-kernel
On Mon, Jul 12, 2004 at 04:11:32PM +0530, Rajput wrote:
> what is the difference in interrupt latency with and without your patch.
> Regards,
> Rajput.
This is meant to instrument scheduling latency, it doesn't improve
anything directly. It likely introduces a fair amount of overhead.
In the event a poor preempt_thresh= parameter is chosen, it may render
systems effectively unusable due to endless printk()'ing (which I should
fix via printk_ratelimit() shortly).
I recommend applying and/or enabling this only for diagnostic purposes.
-- wli
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 8:38 ` William Lee Irwin III
@ 2004-07-12 11:31 ` William Lee Irwin III
2004-07-12 14:32 ` William Lee Irwin III
2004-07-12 17:09 ` Martin Josefsson
0 siblings, 2 replies; 21+ messages in thread
From: William Lee Irwin III @ 2004-07-12 11:31 UTC (permalink / raw)
To: Andrew Morton, Con Kolivas, linux-kernel
On Mon, Jul 12, 2004 at 01:02:59AM -0700, William Lee Irwin III wrote:
>> I suspect it's better to drop in hooks to trap those e.g. in
>> cond_resched() and cond_resched_lock().
On Mon, Jul 12, 2004 at 01:38:20AM -0700, William Lee Irwin III wrote:
> Something like this should do, modulo everything outside kernel/ and mm/.
Okay, finally, it should be possible to genericize this all, since
sched_clock() is an arch-independent interface for high-precision
timings in units of nanoseconds.
Some cleanups and fixing a minor oversight in touch_preempt_timing(),
which was that in principle things can break out of preempt_count() > 1
after checking need_resched(). Cleaner, smaller, yes. How crippled it
just got with the various potentially false negatives introduced by
using need_resched() as a hook for touch_preempt_timing() is unclear.
Also, the major oversight of nopping out inc_preempt_count() and
dec_preempt_count(), which will take out boxen, is corrected here.
Happily, some underscore-itis is also removed.
I couldn't think of a better place to put the config option for generic
use than under CONFIG_EMBEDDED, otherwise I'd have to drop it in
alongside CONFIG_PREEMPT on all arches, as it's basically more generic
than CONFIG_PREEMPT in and of itself.
I guess the other things are:
(a) do people want this sysctl-configurable instead of boot-time?
I chose boot-time because sysctls seem to introduce more risk
of people shooting themselves in the foot.
(b) the damn thing finally boots on my craptop, where else did it break?
(c) everything else and the kitchen sink
-- wli
Index: timing-2.6.7/include/linux/preempt.h
===================================================================
--- timing-2.6.7.orig/include/linux/preempt.h 2004-06-15 22:19:02.000000000 -0700
+++ timing-2.6.7/include/linux/preempt.h 2004-07-12 03:18:19.000000000 -0700
@@ -9,17 +9,17 @@
#include <linux/config.h>
#include <linux/linkage.h>
-#define preempt_count() (current_thread_info()->preempt_count)
-
-#define inc_preempt_count() \
-do { \
- preempt_count()++; \
-} while (0)
+#ifdef CONFIG_PREEMPT_TIMING
+void inc_preempt_count(void);
+void dec_preempt_count(void);
+void touch_preempt_timing(void);
+#else
+#define touch_preempt_timing() do { } while (0)
+#define inc_preempt_count() do { preempt_count()++; } while (0)
+#define dec_preempt_count() do { preempt_count()--; } while (0)
+#endif
-#define dec_preempt_count() \
-do { \
- preempt_count()--; \
-} while (0)
+#define preempt_count() (current_thread_info()->preempt_count)
#ifdef CONFIG_PREEMPT
@@ -51,9 +51,15 @@
#else
+#ifdef CONFIG_PREEMPT_TIMING
+#define preempt_disable() inc_preempt_count()
+#define preempt_enable_no_resched() dec_preempt_count()
+#define preempt_enable() dec_preempt_count()
+#else
#define preempt_disable() do { } while (0)
#define preempt_enable_no_resched() do { } while (0)
#define preempt_enable() do { } while (0)
+#endif
#define preempt_check_resched() do { } while (0)
#endif
Index: timing-2.6.7/include/linux/sched.h
===================================================================
--- timing-2.6.7.orig/include/linux/sched.h 2004-06-15 22:18:57.000000000 -0700
+++ timing-2.6.7/include/linux/sched.h 2004-07-12 01:45:08.000000000 -0700
@@ -1000,7 +1000,8 @@
return unlikely(test_tsk_thread_flag(p,TIF_SIGPENDING));
}
-static inline int need_resched(void)
+#define need_resched() ({ touch_preempt_timing(); __need_resched(); })
+static inline int __need_resched(void)
{
return unlikely(test_thread_flag(TIF_NEED_RESCHED));
}
Index: timing-2.6.7/init/Kconfig
===================================================================
--- timing-2.6.7.orig/init/Kconfig 2004-06-15 22:19:52.000000000 -0700
+++ timing-2.6.7/init/Kconfig 2004-07-12 03:55:15.000000000 -0700
@@ -261,6 +261,14 @@
Disabling this option will cause the kernel to be built without
support for epoll family of system calls.
+config PREEMPT_TIMING
+ bool "Non-preemptible critical section timing"
+ default n
+ help
+ This option measures the time spent in non-preemptible critical
+ sections and reports warnings when a boot-time configurable
+ latency threshold is exceeded.
+
source "drivers/block/Kconfig.iosched"
config CC_OPTIMIZE_FOR_SIZE
Index: timing-2.6.7/kernel/sched.c
===================================================================
--- timing-2.6.7.orig/kernel/sched.c 2004-06-15 22:19:51.000000000 -0700
+++ timing-2.6.7/kernel/sched.c 2004-07-12 03:14:41.000000000 -0700
@@ -4029,3 +4029,74 @@
EXPORT_SYMBOL(__preempt_write_lock);
#endif /* defined(CONFIG_SMP) && defined(CONFIG_PREEMPT) */
+
+#ifdef CONFIG_PREEMPT_TIMING
+#include <linux/kallsyms.h>
+
+static int preempt_thresh;
+static DEFINE_PER_CPU(u64, preempt_timings);
+static DEFINE_PER_CPU(unsigned long, preempt_entry);
+
+static int setup_preempt_thresh(char *s)
+{
+ int thresh;
+
+ get_option(&s, &thresh);
+ if (thresh > 0) {
+ preempt_thresh = thresh;
+ printk("Preemption threshold = %dms\n", preempt_thresh);
+ }
+ return 1;
+}
+__setup("preempt_thresh=", setup_preempt_thresh);
+
+static void __touch_preempt_timing(void *addr)
+{
+ __get_cpu_var(preempt_timings) = sched_clock();
+ __get_cpu_var(preempt_entry) = (unsigned long)addr;
+}
+
+void touch_preempt_timing(void)
+{
+ if (preempt_count() > 0 && system_state == SYSTEM_RUNNING &&
+ __get_cpu_var(preempt_entry))
+ __touch_preempt_timing(__builtin_return_address(0));
+}
+EXPORT_SYMBOL(touch_preempt_timing);
+
+void inc_preempt_count(void)
+{
+ preempt_count()++;
+ if (preempt_count() == 1 && system_state == SYSTEM_RUNNING)
+ __touch_preempt_timing(__builtin_return_address(0));
+}
+EXPORT_SYMBOL(inc_preempt_count);
+
+void dec_preempt_count(void)
+{
+ if (preempt_count() == 1 && system_state == SYSTEM_RUNNING &&
+ __get_cpu_var(preempt_entry)) {
+ u64 hold;
+ unsigned long preempt_exit
+ = (unsigned long)__builtin_return_address(0);
+ hold = sched_clock() - __get_cpu_var(preempt_timings) + 999999;
+ do_div(hold, 1000000);
+ if (preempt_thresh && hold > preempt_thresh &&
+ printk_ratelimit()) {
+ printk("%lums non-preemptible critical section "
+ "violated %d ms preempt threshold "
+ "starting at ",
+ (unsigned long)hold,
+ preempt_thresh);
+ print_symbol("%s", __get_cpu_var(preempt_entry));
+ printk(" and ending at ");
+ print_symbol("%s", preempt_exit);
+ printk("\n");
+ dump_stack();
+ }
+ __get_cpu_var(preempt_entry) = 0;
+ }
+ preempt_count()--;
+}
+EXPORT_SYMBOL(dec_preempt_count);
+#endif
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 6:29 [PATCH] Instrumenting high latency Con Kolivas
` (2 preceding siblings ...)
2004-07-12 10:41 ` Rajput
@ 2004-07-12 14:24 ` Martin J. Bligh
2004-07-12 14:27 ` Con Kolivas
3 siblings, 1 reply; 21+ messages in thread
From: Martin J. Bligh @ 2004-07-12 14:24 UTC (permalink / raw)
To: Con Kolivas, linux-kernel; +Cc: akpm
> Because of the recent discussion about latency in the kernel I asked
> William Lee Irwin III to help create some instrumentation to determine
> where in the kernel there were still sustained periods of non-preemptible
> code. He hacked together this simple patch which times periods according
> to the preempt count. Hopefully we can use this patch in the advice of
> Linus to avoid the "mental masturbation" at guessing where latency is
> and track down real problem areas.
Is this much different from Rick's schedstat's work, which was itself based
on some earlier patches by Bill? I'd hate to end up with two sets of patches,
and schedstats seemed pretty comprehensive to me. He's on vacation, but his
stuff is here, if you want to take a look:
http://eaglet.rain.com/rick/linux/schedstats/
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 14:24 ` Martin J. Bligh
@ 2004-07-12 14:27 ` Con Kolivas
2004-07-12 14:33 ` Martin J. Bligh
0 siblings, 1 reply; 21+ messages in thread
From: Con Kolivas @ 2004-07-12 14:27 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel, akpm
[-- Attachment #1: Type: text/plain, Size: 1024 bytes --]
Martin J. Bligh wrote:
>>Because of the recent discussion about latency in the kernel I asked
>>William Lee Irwin III to help create some instrumentation to determine
>>where in the kernel there were still sustained periods of non-preemptible
>>code. He hacked together this simple patch which times periods according
>>to the preempt count. Hopefully we can use this patch in the advice of
>>Linus to avoid the "mental masturbation" at guessing where latency is
>>and track down real problem areas.
>
>
> Is this much different from Rick's schedstat's work, which was itself based
> on some earlier patches by Bill? I'd hate to end up with two sets of patches,
> and schedstats seemed pretty comprehensive to me. He's on vacation, but his
> stuff is here, if you want to take a look:
>
> http://eaglet.rain.com/rick/linux/schedstats/
No I remember his work and this is tackling it via a different area if I
recall correctly. He was looking at scheduler latencies as opposed to
non-preemptible kernel code.
Con
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 11:31 ` William Lee Irwin III
@ 2004-07-12 14:32 ` William Lee Irwin III
2004-07-12 17:09 ` Martin Josefsson
1 sibling, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2004-07-12 14:32 UTC (permalink / raw)
To: Andrew Morton, Con Kolivas, linux-kernel
On Mon, Jul 12, 2004 at 04:31:14AM -0700, William Lee Irwin III wrote:
> Some cleanups and fixing a minor oversight in touch_preempt_timing(),
> which was that in principle things can break out of preempt_count() > 1
> after checking need_resched(). Cleaner, smaller, yes. How crippled it
> just got with the various potentially false negatives introduced by
> using need_resched() as a hook for touch_preempt_timing() is unclear.
Something's gone wrong with using need_resched() as a hook for that.
The following, incremental atop the cleaned up patch, backs that out
and restores the apparent accuracy of reporting, though there isn't
any rigorous correlation of this with other scheduling latency stuff
(it should report lower worst-case latencies since it's measuring what
are tighter sections of code and ignores waiting time on runqueues).
vs. 2.6.7 (and the prior patch too). I'll probably repost a full patch
vs. -mm7 (or whatever's current when I do) in a dedicated thread.
-- wli
Index: timing-2.6.7/include/linux/sched.h
===================================================================
--- timing-2.6.7.orig/include/linux/sched.h 2004-07-12 01:45:08.000000000 -0700
+++ timing-2.6.7/include/linux/sched.h 2004-07-12 05:28:50.000000000 -0700
@@ -1000,8 +1000,7 @@
return unlikely(test_tsk_thread_flag(p,TIF_SIGPENDING));
}
-#define need_resched() ({ touch_preempt_timing(); __need_resched(); })
-static inline int __need_resched(void)
+static inline int need_resched(void)
{
return unlikely(test_thread_flag(TIF_NEED_RESCHED));
}
@@ -1009,6 +1008,7 @@
extern void __cond_resched(void);
static inline void cond_resched(void)
{
+ touch_preempt_timing();
if (need_resched())
__cond_resched();
}
Index: timing-2.6.7/kernel/printk.c
===================================================================
--- timing-2.6.7.orig/kernel/printk.c 2004-06-15 22:20:26.000000000 -0700
+++ timing-2.6.7/kernel/printk.c 2004-07-12 05:30:14.000000000 -0700
@@ -650,10 +650,8 @@
*/
void console_conditional_schedule(void)
{
- if (console_may_schedule && need_resched()) {
- set_current_state(TASK_RUNNING);
- schedule();
- }
+ if (console_may_schedule)
+ cond_resched();
}
EXPORT_SYMBOL(console_conditional_schedule);
Index: timing-2.6.7/mm/memory.c
===================================================================
--- timing-2.6.7.orig/mm/memory.c 2004-06-15 22:19:22.000000000 -0700
+++ timing-2.6.7/mm/memory.c 2004-07-12 05:33:17.000000000 -0700
@@ -558,14 +558,17 @@
zap_bytes -= block;
if ((long)zap_bytes > 0)
continue;
- if (!atomic && need_resched()) {
+ zap_bytes = ZAP_BLOCK_SIZE;
+ if (!atomic)
+ continue;
+ touch_preempt_timing();
+ if (need_resched()) {
int fullmm = tlb_is_full_mm(*tlbp);
tlb_finish_mmu(*tlbp, tlb_start, start);
cond_resched_lock(&mm->page_table_lock);
*tlbp = tlb_gather_mmu(mm, fullmm);
tlb_start_valid = 0;
}
- zap_bytes = ZAP_BLOCK_SIZE;
}
}
return ret;
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 14:27 ` Con Kolivas
@ 2004-07-12 14:33 ` Martin J. Bligh
2004-07-12 14:42 ` Con Kolivas
0 siblings, 1 reply; 21+ messages in thread
From: Martin J. Bligh @ 2004-07-12 14:33 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel, akpm
--Con Kolivas <kernel@kolivas.org> wrote (on Tuesday, July 13, 2004 00:27:27 +1000):
> Martin J. Bligh wrote:
>>> Because of the recent discussion about latency in the kernel I asked
>>> William Lee Irwin III to help create some instrumentation to determine
>>> where in the kernel there were still sustained periods of non-preemptible
>>> code. He hacked together this simple patch which times periods according
>>> to the preempt count. Hopefully we can use this patch in the advice of
>>> Linus to avoid the "mental masturbation" at guessing where latency is
>>> and track down real problem areas.
>>
>>
>> Is this much different from Rick's schedstat's work, which was itself based
>> on some earlier patches by Bill? I'd hate to end up with two sets of patches,
>> and schedstats seemed pretty comprehensive to me. He's on vacation, but his
>> stuff is here, if you want to take a look:
>>
>> http://eaglet.rain.com/rick/linux/schedstats/
>
> No I remember his work and this is tackling it via a different area if I
> recall correctly. He was looking at scheduler latencies as opposed to
> non-preemptible kernel code.
Fair enough ... is it worth adding it to the same harness though? He had lots
of nice analysis tools set up to do comparisons and graphing, etc.
M.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 14:33 ` Martin J. Bligh
@ 2004-07-12 14:42 ` Con Kolivas
2004-07-12 14:47 ` Martin J. Bligh
0 siblings, 1 reply; 21+ messages in thread
From: Con Kolivas @ 2004-07-12 14:42 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel, akpm
[-- Attachment #1: Type: text/plain, Size: 2570 bytes --]
Martin J. Bligh wrote:
> --Con Kolivas <kernel@kolivas.org> wrote (on Tuesday, July 13, 2004 00:27:27 +1000):
>
>
>>Martin J. Bligh wrote:
>>
>>>>Because of the recent discussion about latency in the kernel I asked
>>>>William Lee Irwin III to help create some instrumentation to determine
>>>>where in the kernel there were still sustained periods of non-preemptible
>>>>code. He hacked together this simple patch which times periods according
>>>>to the preempt count. Hopefully we can use this patch in the advice of
>>>>Linus to avoid the "mental masturbation" at guessing where latency is
>>>>and track down real problem areas.
>>>
>>>
>>>Is this much different from Rick's schedstat's work, which was itself based
>>>on some earlier patches by Bill? I'd hate to end up with two sets of patches,
>>>and schedstats seemed pretty comprehensive to me. He's on vacation, but his
>>>stuff is here, if you want to take a look:
>>>
>>>http://eaglet.rain.com/rick/linux/schedstats/
>>
>>No I remember his work and this is tackling it via a different area if I
>>recall correctly. He was looking at scheduler latencies as opposed to
>>non-preemptible kernel code.
>
>
> Fair enough ... is it worth adding it to the same harness though? He had lots
> of nice analysis tools set up to do comparisons and graphing, etc.
This works very nicely standalone getting us this for example with the
fixed patch:
6ms non-preemptible critical section violated 1 ms preempt threshold
starting at exit_mmap+0x1c/0x188 and ending at exit_mmap+0x118/0x188
[<c011d769>] dec_preempt_count+0x14f/0x151
[<c014d0d4>] exit_mmap+0x118/0x188
[<c014d0d4>] exit_mmap+0x118/0x188
[<c011df0a>] mmput+0x61/0x7b
[<c01226fa>] do_exit+0x142/0x510
[<c014c928>] unmap_vma_list+0xe/0x17
[<c0122b8a>] do_group_exit+0x41/0xf9
[<c0105fd5>] sysenter_past_esp+0x52/0x71
which then an objdump of the inlined code has allowed us to track it
down to this:
profile_exit_mmap(mm);
lru_add_drain();
c014cfce: e8 18 72 ff ff call c01441eb <lru_add_drain>
spin_lock(&mm->page_table_lock);
c014cfd3: e8 16 06 fd ff call c011d5ee <inc_preempt_count>
That's pretty specific. I dont think this comes under the umbrella of
statistics as such. Sure it can be modified to do it but I was looking
for a tool to find where specific latency hotspots still exist. Of
course I'm not capable myself of tackling the actual hotspots but those
who code those areas certainly can tackle them knowing what firm data shows.
Cheers,
Con
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 14:42 ` Con Kolivas
@ 2004-07-12 14:47 ` Martin J. Bligh
0 siblings, 0 replies; 21+ messages in thread
From: Martin J. Bligh @ 2004-07-12 14:47 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel, akpm
> This works very nicely standalone getting us this for example with the fixed patch:
>
> 6ms non-preemptible critical section violated 1 ms preempt threshold starting at exit_mmap+0x1c/0x188 and ending at exit_mmap+0x118/0x188
> [<c011d769>] dec_preempt_count+0x14f/0x151
> [<c014d0d4>] exit_mmap+0x118/0x188
> [<c014d0d4>] exit_mmap+0x118/0x188
> [<c011df0a>] mmput+0x61/0x7b
> [<c01226fa>] do_exit+0x142/0x510
> [<c014c928>] unmap_vma_list+0xe/0x17
> [<c0122b8a>] do_group_exit+0x41/0xf9
> [<c0105fd5>] sysenter_past_esp+0x52/0x71
>
> which then an objdump of the inlined code has allowed us to track it down to this:
>
> profile_exit_mmap(mm);
> lru_add_drain();
> c014cfce: e8 18 72 ff ff call c01441eb <lru_add_drain>
> spin_lock(&mm->page_table_lock);
> c014cfd3: e8 16 06 fd ff call c011d5ee <inc_preempt_count>
>
>
> That's pretty specific. I dont think this comes under the umbrella of statistics as such. Sure it can be modified to do it but I was looking for a tool to find where specific latency hotspots still exist. Of course I'm not capable myself of tackling the actual hotspots but those who code those areas certainly can tackle them knowing what firm data shows.
Guess so. Thought it might be nice to have a measurement of worst case
latencies across various functions ... I'll have a play with it, and see
if I can come up with something.
M.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Instrumenting high latency
2004-07-12 11:31 ` William Lee Irwin III
2004-07-12 14:32 ` William Lee Irwin III
@ 2004-07-12 17:09 ` Martin Josefsson
1 sibling, 0 replies; 21+ messages in thread
From: Martin Josefsson @ 2004-07-12 17:09 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Andrew Morton, Con Kolivas, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 400 bytes --]
On Mon, 2004-07-12 at 13:31, William Lee Irwin III wrote:
> (b) the damn thing finally boots on my craptop, where else did it break?
Con sent me the version he later posted here, and it broke during boot
with a preempt_count() of 0xffffff6d which didn't make in_interrupt()
very friendly and kmem_cache_create() doesn't like that...
I'll test this new version in a while.
--
/Martin
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2004-07-12 17:10 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-07-12 6:29 [PATCH] Instrumenting high latency Con Kolivas
2004-07-12 6:37 ` Andrew Morton
2004-07-12 6:43 ` Con Kolivas
2004-07-12 6:58 ` Andrew Morton
2004-07-12 7:00 ` [PATCH] " William Lee Irwin III
2004-07-12 7:34 ` Andrew Morton
2004-07-12 8:02 ` William Lee Irwin III
2004-07-12 8:38 ` William Lee Irwin III
2004-07-12 11:31 ` William Lee Irwin III
2004-07-12 14:32 ` William Lee Irwin III
2004-07-12 17:09 ` Martin Josefsson
2004-07-12 8:29 ` Arjan van de Ven
2004-07-12 8:42 ` William Lee Irwin III
2004-07-12 8:47 ` William Lee Irwin III
2004-07-12 10:41 ` Rajput
2004-07-12 10:44 ` William Lee Irwin III
2004-07-12 14:24 ` Martin J. Bligh
2004-07-12 14:27 ` Con Kolivas
2004-07-12 14:33 ` Martin J. Bligh
2004-07-12 14:42 ` Con Kolivas
2004-07-12 14:47 ` Martin J. Bligh
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.