* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5
@ 2004-08-30 19:13 Mark_H_Johnson
2004-08-30 19:21 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 100+ messages in thread
From: Mark_H_Johnson @ 2004-08-30 19:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt,
Lee Revell
>i've uploaded -Q5 to:
> [snip the rest...]
Thanks.
This appears to be the first 2.6.x kernel I've run that has results
comparable to 2.4.x kernels with low latency patches and kernel preemption.
The few remaining symptoms I see include:
- a few long (> 1 msec) delays in the real time CPU loop (no system calls)
- varying time to complete the write system call (for audio) - much
different than 2.4
- a couple latency traces (> 700 usec) in the network driver
For reference, these tests were performed on the following SMP system:
Dual 866 MHz Pentium III
512 Mbyte memory
IDE system disk (DMA enabled)
The basic test is Benno's latency test
(http://www.gardena.net/benno/linux/audio) with some slight modifications
to the tests to keep the second CPU busy (non real time CPU burner) and to
add network I/O tests. The 2.4 tests were run with 2.4.20, the 2.6 tests
were run with 2.6.9-rc1-Q5. On 2.6, voluntary_preemption,
kernel_preemption, hardirq_preemption, and softirq_preemption are all 1. I
also set
/sys/block/hda/queue/max_sectors_kb = 32
/sys/block/hda/queue/read_ahead_kb = 32
/proc/sys/net/core/netdev_max_backlog = 8
and the audio driver was set to be non-threaded.
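For reference, the tuning above boils down to a few writes like the following (a sketch; the block-queue paths assume the IDE disk at hda as on this system, and all three knobs require root):

```shell
# Reduce per-request I/O batch sizes and the network RX backlog, as
# described above. These paths exist only on kernels of this era with
# the corresponding drivers / voluntary-preempt patches applied.
echo 32 > /sys/block/hda/queue/max_sectors_kb
echo 32 > /sys/block/hda/queue/read_ahead_kb
echo 8 > /proc/sys/net/core/netdev_max_backlog
```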
BASIC RESULTS
=============
Comparison of results between 2.6.x and 2.4.x; values in milliseconds.
The nominal value for the write operation is 1.45 msec; for the CPU loop,
1.16 msec.
              Max CPU Delta      Max Write Delta
Test          2.4.x   2.6.x      2.4.x   2.6.x
X11            0.10    0.16       0.05    0.65
/proc          0.07    0.17       0.05    0.65
net out        0.15    0.19       0.05    0.75
net in         0.17    0.23       0.05    0.95
dsk wrt        0.49    0.18       0.25    1.05
dsk copy       2.48    0.68       2.25    1.25
disk rd        3.03    1.61       2.75    1.35
LONG DELAYS
===========
Note I still see over 110% worst-case overhead on a max-priority real-time
CPU task (no system calls) when doing heavy disk I/O on 2.6. It is much
better than 2.4, but still disturbing. What I would hope to see on a
dual-CPU system like mine is that the real-time task stays on one CPU
while the other system activity tends to stay on the other CPU. However,
the results do not seem to indicate that behavior.
VARYING SYSTEM CALL TIMES
=========================
In 2.4, it appears that the duration of the write system call is basically
fixed and dependent on the duration of the audio fragment. In 2.6, this
behavior is now different. If I look at the chart in detail, it appears the
system is queueing up several write operations during the first few seconds
of testing. You can see this by consistently low elapsed times for the
write system call. Then the elapsed time for the write bounces up / down in
a sawtooth pattern over a 1 msec range. Could someone explain the cause of
this new behavior and if there is a setting to restore the old behavior? I
am concerned that this queueing adds latency to audio operations (when
trying to synchronize audio with other real time behavior).
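The per-write elapsed-time measurement discussed above can be sketched as follows (a minimal user-space illustration, not the actual latency-test code; the helper name is made up):

```c
#include <sys/time.h>
#include <unistd.h>

/* Measure the elapsed time of a single write() in microseconds - the
   per-call quantity whose sawtooth behavior is described above. */
static long timed_write_usec(int fd, const void *buf, size_t len)
{
    struct timeval t0, t1;

    gettimeofday(&t0, NULL);
    (void)write(fd, buf, len);
    gettimeofday(&t1, NULL);

    return (t1.tv_sec - t0.tv_sec) * 1000000L +
           (t1.tv_usec - t0.tv_usec);
}
```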
LONG NETWORK LATENCIES
======================
In about 25 minutes of heavy testing, I had two latency traces with
/proc/sys/kernel/preempt_max_latency set to 700. They had the same start /
end location with the long delay as follows:
730 us, entries: 361
...
started at rtl8139_poll+0x3c/0x160
ended at rtl8139_poll+0x100/0x160
00000001 0.000ms (+0.000ms): rtl8139_poll (net_rx_action)
00000001 0.140ms (+0.140ms): rtl8139_rx (rtl8139_poll)
00000001 0.556ms (+0.416ms): alloc_skb (rtl8139_rx)
... remaining items all > +0.005ms ...
731 us, entries: 360
...
started at rtl8139_poll+0x3c/0x160
ended at rtl8139_poll+0x100/0x160
00000001 0.000ms (+0.000ms): rtl8139_poll (net_rx_action)
00000001 0.000ms (+0.000ms): rtl8139_rx (rtl8139_poll)
00000001 0.002ms (+0.001ms): alloc_skb (rtl8139_rx)
00000001 0.141ms (+0.139ms): kmem_cache_alloc (alloc_skb)
00000001 0.211ms (+0.070ms): __kmalloc (alloc_skb)
00000001 0.496ms (+0.284ms): eth_type_trans (rtl8139_rx)
00000001 0.565ms (+0.068ms): netif_receive_skb (rtl8139_rx)
... remaining items all > +0.005ms ...
Still much better than my previous results (before setting
netdev_max_backlog).
I will be running some additional tests
- reducing preempt_max_latency
- running with softirq_preemption and hardirq_preemption=0
to see if these uncover any further problems.
Thanks again for the good work.
--Mark H Johnson
<mailto:Mark_H_Johnson@raytheon.com>
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5
2004-08-30 19:13 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Mark_H_Johnson
@ 2004-08-30 19:21 ` Ingo Molnar
2004-09-01 12:31 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 - netdev_max_back_log is too small P.O. Gaillard
2004-08-31 8:49 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Ingo Molnar
2004-09-02 6:33 ` Ingo Molnar
2 siblings, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-08-30 19:21 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt,
Lee Revell
[-- Attachment #1: Type: text/plain, Size: 728 bytes --]
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> LONG NETWORK LATENCIES
> ======================
>
> In about 25 minutes of heavy testing, I had two latency traces with
> /proc/sys/kernel/preempt_max_latency set to 700. They had the same start /
> end location with the long delay as follows:
> 730 us, entries: 361
> ...
> started at rtl8139_poll+0x3c/0x160
> ended at rtl8139_poll+0x100/0x160
regarding this particular latency, could you try the attached patch
ontop of -Q5? It turns the ->poll() loop into separate, individually
preemptable iterations instead of one batch of processing. In theory
this should result in latency being lower regardless of the
netdev_max_backlog value.
Ingo
[-- Attachment #2: 2 --]
[-- Type: text/plain, Size: 738 bytes --]
--- linux/net/core/dev.c.orig2
+++ linux/net/core/dev.c
@@ -1903,7 +1903,7 @@ static void net_rx_action(struct softirq
{
struct softnet_data *queue = &__get_cpu_var(softnet_data);
unsigned long start_time = jiffies;
- int budget = netdev_max_backlog;
+ int budget = netdev_max_backlog, loops;
local_irq_disable();
@@ -1926,7 +1926,10 @@ static void net_rx_action(struct softirq
dev = list_entry(queue->poll_list.next,
struct net_device, poll_list);
- if (dev->quota <= 0 || dev->poll(dev, &budget)) {
+ loops = 1;
+ if (dev->quota <= 0 || dev->poll(dev, &loops)) {
+ if (loops < 1)
+ budget--;
local_irq_disable();
list_del(&dev->poll_list);
list_add_tail(&dev->poll_list, &queue->poll_list);
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5
2004-08-30 19:13 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Mark_H_Johnson
2004-08-30 19:21 ` Ingo Molnar
@ 2004-08-31 8:49 ` Ingo Molnar
2004-09-02 6:33 ` Ingo Molnar
2 siblings, 0 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-08-31 8:49 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt,
Lee Revell
[-- Attachment #1: Type: text/plain, Size: 2144 bytes --]
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> VARYING SYSTEM CALL TIMES
> =========================
>
> In 2.4, it appears that the duration of the write system call is
> basically fixed and dependent on the duration of the audio fragment.
> In 2.6, this behavior is now different. If I look at the chart in
> detail, it appears the system is queueing up several write operations
> during the first few seconds of testing. You can see this by
> consistently low elapsed times for the write system call. Then the
> elapsed time for the write bounces up / down in a sawtooth pattern
> over a 1 msec range. Could someone explain the cause of this new
> behavior and if there is a setting to restore the old behavior? I am
> concerned that this queueing adds latency to audio operations (when
> trying to synchronize audio with other real time behavior).
since the latency tracer does not trigger, we need a modified tracer to
find out what's happening during such long delays. I've attached the
'user-latency-tracer' patch ontop of -Q5, which is a modification of the
latency tracer. This patch enables free-running tracing which includes
all kernel functions not just critical sections. To activate this, two
things have to be done. Firstly:
echo 2 > /proc/sys/kernel/trace_enabled
this turns off the normal latency tracer and turns on the 'user tracer'.
Traces can be generated by userspace via a hack done to
sys_gettimeofday():
gettimeofday(0,1); // enable the tracer
gettimeofday(0,0); // save current trace and disable the tracer
this way the tracing can be limited to the suspected codepaths only.
could you try to insert gettimeofday(0,1) into your testsuite just
before the write() call is done, and right after the write() call, and
save a couple of representative traces? The patch also ups the # of
latency entries to 8000 - if that is still insufficient then please
increase it as needed.
NOTE: on SMP the tracing on/off is strictly per-CPU. So do the enabling
and disabling of the trace on the same CPU. (doing otherwise wont cause
problems, but the generated traces will be less useful.)
Ingo
[-- Attachment #2: user-latency-2.6.9-rc1-bk4-Q5-A0 --]
[-- Type: text/plain, Size: 3598 bytes --]
--- linux/kernel/latency.c.orig
+++ linux/kernel/latency.c
@@ -27,7 +27,7 @@ static DECLARE_MUTEX(max_mutex);
#ifdef CONFIG_LATENCY_TRACE
-#define MAX_TRACE 4000UL
+#define MAX_TRACE 8000UL
struct trace_entry {
unsigned long preempt_count;
@@ -63,14 +63,13 @@ static unsigned long max_nice;
static unsigned long max_policy;
static unsigned long max_rt_priority;
-inline void notrace
+static inline void notrace
____trace(struct cpu_trace *tr, unsigned long eip, unsigned long parent_eip)
{
struct trace_entry *entry;
- BUG_ON(!irqs_disabled());
-
- if (tr->trace_idx < MAX_TRACE) {
+ if ((tr->critical_start || (trace_enabled == 2)) &&
+ (tr->trace_idx < MAX_TRACE)) {
entry = tr->trace + tr->trace_idx;
entry->eip = eip;
entry->parent_eip = parent_eip;
@@ -80,7 +79,7 @@ ____trace(struct cpu_trace *tr, unsigned
tr->trace_idx++;
}
-inline void notrace
+static inline void notrace
___trace(unsigned long eip, unsigned long parent_eip)
{
unsigned long flags;
@@ -266,6 +265,18 @@ static int setup_preempt_thresh(char *s)
}
__setup("preempt_thresh=", setup_preempt_thresh);
+static void update_max_trace(struct cpu_trace *tr)
+{
+ memcpy(&max_trace, tr, sizeof (max_trace));
+
+ memcpy(max_comm, current->comm, 16);
+ max_pid = current->pid;
+ max_uid = current->uid;
+ max_nice = current->static_prio - 20 - MAX_RT_PRIO;
+ max_policy = current->policy;
+ max_rt_priority = current->rt_priority;
+}
+
static void notrace check_preempt_timing(struct cpu_trace *tr)
{
#ifdef CONFIG_LATENCY_TRACE
@@ -274,6 +285,10 @@ static void notrace check_preempt_timing
unsigned long parent_eip = (unsigned long)__builtin_return_address(1);
unsigned long latency;
+#ifdef CONFIG_LATENCY_TRACE
+ if (trace_enabled == 2)
+ return;
+#endif
atomic_inc(&tr->disabled);
latency = cycles_to_usecs(get_cycles() - tr->preempt_timestamp);
@@ -293,14 +308,7 @@ static void notrace check_preempt_timing
#ifdef CONFIG_LATENCY_TRACE
____trace(tr, eip, parent_eip);
- memcpy(&max_trace, tr, sizeof (max_trace));
-
- memcpy(max_comm, current->comm, 16);
- max_pid = current->pid;
- max_uid = current->uid;
- max_nice = current->static_prio - 20 - MAX_RT_PRIO;
- max_policy = current->policy;
- max_rt_priority = current->rt_priority;
+ update_max_trace(tr);
#endif
if (preempt_thresh)
@@ -354,6 +362,10 @@ void notrace add_preempt_count(int val)
#endif
preempt_count() += val;
+#ifdef CONFIG_LATENCY_TRACE
+ if (trace_enabled == 2)
+ return;
+#endif
if (preempt_count() == val) {
struct cpu_trace *tr = &__get_cpu_var(trace);
@@ -383,3 +395,27 @@ void notrace sub_preempt_count(int val)
preempt_count() -= val;
}
EXPORT_SYMBOL(sub_preempt_count);
+
+void user_trace_start(void)
+{
+ struct cpu_trace *tr;
+
+ if (trace_enabled != 2)
+ return;
+ tr = &get_cpu_var(trace);
+ tr->trace_idx = 0;
+ mcount();
+ put_cpu_var(trace);
+}
+
+void user_trace_stop(void)
+{
+ struct cpu_trace *tr;
+
+ if (trace_enabled != 2)
+ return;
+ tr = &get_cpu_var(trace);
+ mcount();
+ update_max_trace(tr);
+ put_cpu_var(trace);
+}
--- linux/kernel/time.c.orig2
+++ linux/kernel/time.c
@@ -90,8 +90,17 @@ asmlinkage long sys_stime(time_t __user
#endif /* __ARCH_WANT_SYS_TIME */
+extern void user_trace_start(void);
+extern void user_trace_stop(void);
+
asmlinkage long sys_gettimeofday(struct timeval __user *tv, struct timezone __user *tz)
{
+#ifdef CONFIG_LATENCY_TRACE
+ if (!tv && ((int)tz == 1))
+ user_trace_start();
+ if (!tv && !tz)
+ user_trace_stop();
+#endif
if (likely(tv != NULL)) {
struct timeval ktv;
do_gettimeofday(&ktv);
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 - netdev_max_back_log is too small
2004-08-30 19:21 ` Ingo Molnar
@ 2004-09-01 12:31 ` P.O. Gaillard
2004-09-01 13:05 ` Ingo Molnar
0 siblings, 1 reply; 100+ messages in thread
From: P.O. Gaillard @ 2004-09-01 12:31 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel
Hello,
I know that you have reduced netdev_max_back_log to 8 to reduce the latency. But
I think that you should know that I had to set it back to 32 to avoid ethernet
frame losses on a 3GHz P4 using e1000 (eth0/threaded=1).
My app is still the one with a 20Mbytes/s flow of data and some signal
processing work. That's only 20% of max Gigabit Ethernet load and 40% of CPU
time. Not really hot stuff.
So, these were my 2 cents to help you decide how to balance latency and
throughput. I hope that you can find a good balance or find a way to allow
people to easily choose their own balance (e.g. maybe there should be a list of
parameters impacting latency in the Documentation directory).
please keep up the good work and thanks a lot!
P.O. Gaillard
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 - netdev_max_back_log is too small
2004-09-01 12:31 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 - netdev_max_back_log is too small P.O. Gaillard
@ 2004-09-01 13:05 ` Ingo Molnar
2004-09-02 11:24 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 - network is no longer smooth P.O. Gaillard
0 siblings, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-01 13:05 UTC (permalink / raw)
To: P.O. Gaillard; +Cc: linux-kernel
* P.O. Gaillard <pierre-olivier.gaillard@fr.thalesgroup.com> wrote:
> I know that you have reduced netdev_max_back_log to 8 to reduce the
> latency. But I think that you should know that I had to set it back to
> 32 to avoid ethernet frame losses on a 3GHz P4 using e1000
> (eth0/threaded=1).
could you try the -Q6 patch? That leaves netdev_max_backlog at the
default value of 300 and uses another method to break up the networking
codepaths, without any bad tradeoffs.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5
2004-08-30 19:13 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Mark_H_Johnson
2004-08-30 19:21 ` Ingo Molnar
2004-08-31 8:49 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Ingo Molnar
@ 2004-09-02 6:33 ` Ingo Molnar
2004-09-02 6:55 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 Ingo Molnar
2 siblings, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-02 6:33 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt,
Lee Revell
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> In 2.4, it appears that the duration of the write system call is
> basically fixed and dependent on the duration of the audio fragment.
> In 2.6, this behavior is now different. If I look at the chart in
> detail, it appears the system is queueing up several write operations
> during the first few seconds of testing. You can see this by
> consistently low elapsed times for the write system call. Then the
> elapsed time for the write bounces up / down in a sawtooth pattern
> over a 1 msec range. Could someone explain the cause of this new
> behavior and if there is a setting to restore the old behavior? I am
> concerned that this queueing adds latency to audio operations (when
> trying to synchronize audio with other real time behavior).
i think i found the reason for the sawtooth: it's a bug in hardirq
redirection. In certain situations we can end up not waking up softirqd,
resulting in a random 0-1msec latency between hardirq arrival and
softirq execution. We dont see higher latencies because timer IRQs
always wake up softirqd which hides the bug to a certain degree.
I'll fix this in -Q8.
Ingo
* [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8
2004-09-02 6:33 ` Ingo Molnar
@ 2004-09-02 6:55 ` Ingo Molnar
2004-09-02 7:04 ` Lee Revell
` (2 more replies)
0 siblings, 3 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-02 6:55 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt,
Lee Revell, alsa-devel
i've released the -Q8 patch:
http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q8
ontop of:
http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
this release fixes an artificial 0-1msec delay between hardirq arrival
and softirq invocation. This should solve some of the ALSA artifacts
reported by Mark H Johnson. It should also solve the rtl8139 problems -
i've put such a card into a testbox and with -Q7 i had similar packet
latency problems while with -Q8 it works just fine.
So netdev_backlog_granularity still has a value of 1 in -Q8; please check
whether the networking problems (bootup, service startup and latency) are
resolved. (and increase this value in case there are still problems.)
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8
2004-09-02 6:55 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 Ingo Molnar
@ 2004-09-02 7:04 ` Lee Revell
2004-09-02 7:15 ` Ingo Molnar
2004-09-02 7:17 ` Ingo Molnar
2004-09-02 8:23 ` Ingo Molnar
2004-09-02 11:10 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Ingo Molnar
2 siblings, 2 replies; 100+ messages in thread
From: Lee Revell @ 2004-09-02 7:04 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana,
Daniel Schmitt, alsa-devel
On Thu, 2004-09-02 at 02:55, Ingo Molnar wrote:
> i've released the -Q8 patch:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q8
>
> ontop of:
>
> http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
>
Here are traces of 145, 190, and 217 usec latencies in
netif_receive_skb:
http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace2.txt
http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace3.txt
http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace4.txt
Some of these are with ip_conntrack enabled, at the request of another
poster; this does not make much of a difference, increasing the worst-case
latency by 20 usec or so.
Also there is the rt_garbage_collect issue, previously reported. I have
not seen this lately but I do not remember seeing that it was fixed.
Lee
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8
2004-09-02 7:04 ` Lee Revell
@ 2004-09-02 7:15 ` Ingo Molnar
2004-09-02 7:31 ` Lee Revell
2004-09-02 23:25 ` Lee Revell
2004-09-02 7:17 ` Ingo Molnar
1 sibling, 2 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-02 7:15 UTC (permalink / raw)
To: Lee Revell
Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana,
Daniel Schmitt, alsa-devel
* Lee Revell <rlrevell@joe-job.com> wrote:
> Here are traces of a 145, 190, and 217 usec latencies in
> netif_receive_skb:
>
> http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace2.txt
> http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace3.txt
> http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace4.txt
these all seem to be single-packet processing latencies - it would be
quite hard to make those codepaths preemptible.
i'd suggest to turn off things like netfilter and ip_conntrack (and
other optional networking features that show up in the trace), they can
only increase latency:
00000001 0.016ms (+0.000ms): ip_rcv (netif_receive_skb)
00000001 0.019ms (+0.002ms): nf_hook_slow (ip_rcv)
00000002 0.019ms (+0.000ms): nf_iterate (nf_hook_slow)
00000002 0.021ms (+0.001ms): ip_conntrack_defrag (nf_iterate)
00000002 0.022ms (+0.000ms): ip_conntrack_in (nf_iterate)
00000002 0.022ms (+0.000ms): ip_ct_find_proto (ip_conntrack_in)
00000103 0.023ms (+0.000ms): __ip_ct_find_proto (ip_ct_find_proto)
00000102 0.024ms (+0.000ms): local_bh_enable (ip_ct_find_proto)
00000002 0.025ms (+0.001ms): tcp_error (ip_conntrack_in)
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8
2004-09-02 7:04 ` Lee Revell
2004-09-02 7:15 ` Ingo Molnar
@ 2004-09-02 7:17 ` Ingo Molnar
1 sibling, 0 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-02 7:17 UTC (permalink / raw)
To: Lee Revell
Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana,
Daniel Schmitt
* Lee Revell <rlrevell@joe-job.com> wrote:
> Also there is the rt_garbage_collect issue, previously reported. I
> have not seen this lately but I do not remember seeing that it was
> fixed.
i dont think it's fixed, please re-report it if it occurs again, there
have been many changes.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8
2004-09-02 7:15 ` Ingo Molnar
@ 2004-09-02 7:31 ` Lee Revell
2004-09-02 7:46 ` Ingo Molnar
2004-09-02 23:25 ` Lee Revell
1 sibling, 1 reply; 100+ messages in thread
From: Lee Revell @ 2004-09-02 7:31 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana,
Daniel Schmitt, alsa-devel
On Thu, 2004-09-02 at 03:15, Ingo Molnar wrote:
> * Lee Revell <rlrevell@joe-job.com> wrote:
>
> > Here are traces of a 145, 190, and 217 usec latencies in
> > netif_receive_skb:
> >
> > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace2.txt
> > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace3.txt
> > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace4.txt
>
> these all seem to be single-packet processing latencies - it would be
> quite hard to make those codepaths preemptible.
>
I suspected as much; these are not a problem. The large latencies from
reading the /proc filesystem are a bit worrisome (trace1.txt); I will
report these again if they still happen with Q8.
Lee
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8
2004-09-02 7:31 ` Lee Revell
@ 2004-09-02 7:46 ` Ingo Molnar
2004-09-03 1:10 ` Rusty Russell
0 siblings, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-02 7:46 UTC (permalink / raw)
To: Lee Revell
Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana,
Daniel Schmitt, alsa-devel, Rusty Russell, netfilter-devel
* Lee Revell <rlrevell@joe-job.com> wrote:
> > these all seem to be single-packet processing latencies - it would be
> > quite hard to make those codepaths preemptible.
>
> I suspected as much, these are not a problem. The large latencies
> from reading the /proc filesystem are a bit worrisome (trace1.txt), I
> will report these again if they still happen with Q8.
conntrack's ct_seq ops indeed seems to have latency problems - the quick
workaround is to disable conntrack.
The reason for the latency is that ct_seq_start() does a read_lock() on
ip_conntrack_lock and only ct_seq_stop() releases it - possibly
milliseconds later. But the whole conntrack /proc code is quite flawed:
	READ_LOCK(&ip_conntrack_lock);
	if (*pos >= ip_conntrack_htable_size)
		return NULL;
	bucket = kmalloc(sizeof(unsigned int), GFP_KERNEL);
	if (!bucket) {
		return ERR_PTR(-ENOMEM);
	}
	*bucket = *pos;
	return bucket;
#1: we kmalloc(GFP_KERNEL) with a spinlock held and softirqs off - ouch!
#2: why does it do the kmalloc() anyway? It could store the position in
the seq pointer just fine. No need to alloc an integer pointer to
store the value in ...
#3: to fix the latency, ct_seq_show() could take the ip_conntrack_lock
and could check the current index against ip_conntrack_htable_size.
There's not much point in making this non-preemptible, there's
a 4K granularity anyway.
Rusty, what's going on in this code?
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8
2004-09-02 6:55 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 Ingo Molnar
2004-09-02 7:04 ` Lee Revell
@ 2004-09-02 8:23 ` Ingo Molnar
2004-09-02 11:10 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Ingo Molnar
2 siblings, 0 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-02 8:23 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt,
Lee Revell, alsa-devel
* Ingo Molnar <mingo@elte.hu> wrote:
> this release fixes an artificial 0-1msec delay between hardirq arrival
> and softirq invocation. This should solve some of the ALSA artifacts
> reported by Mark H Johnson. It should also solve the rtl8139 problems
> - i've put such a card into a testbox and with -Q7 i had similar
> packet latency problems while with -Q8 it works just fine.
the rtl8139 problems are not fixed yet - i can still reproduce the
delayed packet issues.
Ingo
* [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9
2004-09-02 6:55 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 Ingo Molnar
2004-09-02 7:04 ` Lee Revell
2004-09-02 8:23 ` Ingo Molnar
@ 2004-09-02 11:10 ` Ingo Molnar
2004-09-02 12:14 ` Thomas Charbonnel
` (2 more replies)
2 siblings, 3 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-02 11:10 UTC (permalink / raw)
To: linux-kernel
Cc: K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson
i've released the -Q9 patch:
http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q9
ontop of:
http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
Changes:
- fixed the cond_resched_softirq() bug noticed by Mika Penttila.
- updated the preemption-friendly network-RX code but 8139too.c still
produces delayed packets so netdev_backlog_granularity now defaults
to 2, which seems to be working fine on my testbox.
- the latency_trace output now includes the kernel and patch version,
for easier sorting of reports.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 - network is no longer smooth
2004-09-01 13:05 ` Ingo Molnar
@ 2004-09-02 11:24 ` P.O. Gaillard
2004-09-02 11:28 ` Ingo Molnar
0 siblings, 1 reply; 100+ messages in thread
From: P.O. Gaillard @ 2004-09-02 11:24 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel
Hello,
I just tried with Q6 and the network behaviour seems odd:
- my ssh connection freezes a bit,
- communication between the two PCs sharing the GigE connection is difficult
to start:
13:17:37.965962 arp who-has djerba-gig tell canopus-gig
13:17:37.966220 IP djerba-gig > canopus-gig: udp
13:17:37.966230 IP djerba-gig > canopus-gig: udp
13:17:37.966313 IP djerba-gig > canopus-gig: udp
13:17:38.965906 arp who-has djerba-gig tell canopus-gig
13:17:38.966040 IP djerba-gig > canopus-gig: udp
This is strange since canopus is the one with Q6.
- ping does not start for a moment since all packets are RX-ERROR, then they
start being received.
Here is the output of ethtool -S eth1
NIC statistics:
rx_packets: 581
tx_packets: 552
rx_bytes: 545746
tx_bytes: 66508
rx_errors: 112950
tx_errors: 0
rx_dropped: 90
tx_dropped: 0
multicast: 0
collisions: 0
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 112860
rx_missed_errors: 112860
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 0
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 0
tx_tcp_seg_failed: 0
rx_flow_control_xon: 0
rx_flow_control_xoff: 0
tx_flow_control_xon: 45
tx_flow_control_xoff: 112905
rx_long_byte_count: 545746
rx_csum_offload_good: 1
rx_csum_offload_errors: 0
I rebooted with Q5 (+ netdev_max_backlog=32) and everything was fine. Is there
anything I can do to make my report more useful for you?
P.O. Gaillard
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 - network is no longer smooth
2004-09-02 11:24 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 - network is no longer smooth P.O. Gaillard
@ 2004-09-02 11:28 ` Ingo Molnar
2004-09-02 15:26 ` P.O. Gaillard
0 siblings, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-02 11:28 UTC (permalink / raw)
To: P.O. Gaillard; +Cc: linux-kernel
* P.O. Gaillard <pierre-olivier.gaillard@fr.thalesgroup.com> wrote:
> Hello,
>
> I just tried with Q6 and the network behaviour seems odd:
> - my ssh connection freezes a bit,
> - communication between the two PCs sharing the GigE connection is
> difficult to start:
please try the -Q9 patch i just released, and if you still see any
problems does the following:
echo 300 > /proc/sys/net/core/netdev_backlog_granularity
help?
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9
2004-09-02 11:10 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Ingo Molnar
@ 2004-09-02 12:14 ` Thomas Charbonnel
2004-09-02 13:16 ` Thomas Charbonnel
2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
2 siblings, 0 replies; 100+ messages in thread
From: Thomas Charbonnel @ 2004-09-02 12:14 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
Lee Revell, Mark_H_Johnson
Ingo Molnar wrote:
> i've released the -Q9 patch:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q9
>
> ontop of:
>
> http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
>
> Changes:
>
> - fixed the cond_resched_softirq() bug noticed by Mika Penttila.
>
> - updated the preemption-friendly network-RX code but 8193too.c still
> produces delayed packets so netdev_backlog_granularity now defaults
> to 2, which seems to be working fine on my testbox.
Thanks, network seems to be working fine with the new default value.
I still get > 140 us non-preemptible sections in rtl8139_poll, though:
http://www.undata.org/~thomas/q9_rtl8139.trace
Thomas
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9
2004-09-02 11:10 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Ingo Molnar
2004-09-02 12:14 ` Thomas Charbonnel
@ 2004-09-02 13:16 ` Thomas Charbonnel
2004-09-02 13:23 ` Ingo Molnar
2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
2 siblings, 1 reply; 100+ messages in thread
From: Thomas Charbonnel @ 2004-09-02 13:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
Lee Revell, Mark_H_Johnson
Ingo Molnar wrote:
> i've released the -Q9 patch:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q9
With ACPI compiled in and booting with acpi=off (which again doesn't
seem to be honoured), here's another weird one:
preemption latency trace v1.0.5 on 2.6.9-rc1-VP-Q9
--------------------------------------------------
latency: 2888 us, entries: 37 (37)
-----------------
| task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: do_IRQ+0x19/0x1a0
=> ended at: do_IRQ+0x14d/0x1a0
=======>
00010000 0.000ms (+0.000ms): do_IRQ (common_interrupt)
00010000 0.000ms (+0.000ms): do_IRQ (default_idle)
00010000 0.000ms (+0.000ms): do_IRQ (<00000000>)
00010001 0.000ms (+0.002ms): mask_and_ack_8259A (do_IRQ)
00010001 0.002ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
00010000 0.002ms (+0.000ms): generic_handle_IRQ_event (do_IRQ)
00010000 0.003ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event)
00010001 0.003ms (+2.878ms): mark_offset_tsc (timer_interrupt)
00010001 2.882ms (+0.000ms): do_timer (timer_interrupt)
00010001 2.882ms (+0.000ms): update_process_times (do_timer)
00010001 2.882ms (+0.000ms): update_one_process (update_process_times)
00010001 2.882ms (+0.000ms): run_local_timers (update_process_times)
00010001 2.882ms (+0.000ms): raise_softirq (update_process_times)
00010001 2.882ms (+0.000ms): scheduler_tick (update_process_times)
00010001 2.883ms (+0.000ms): sched_clock (scheduler_tick)
00010001 2.883ms (+0.000ms): update_wall_time (do_timer)
00010001 2.883ms (+0.000ms): update_wall_time_one_tick (update_wall_time)
00010001 2.883ms (+0.000ms): profile_tick (timer_interrupt)
00010001 2.884ms (+0.000ms): profile_hook (profile_tick)
00010002 2.884ms (+0.000ms): notifier_call_chain (profile_hook)
00010001 2.884ms (+0.000ms): profile_hit (timer_interrupt)
00010001 2.884ms (+0.000ms): generic_note_interrupt (do_IRQ)
00010001 2.884ms (+0.000ms): end_8259A_irq (do_IRQ)
00010001 2.885ms (+0.000ms): enable_8259A_irq (do_IRQ)
00000001 2.886ms (+0.000ms): do_softirq (do_IRQ)
00000001 2.886ms (+0.000ms): __do_softirq (do_softirq)
00000001 2.886ms (+0.000ms): wake_up_process (do_softirq)
00000001 2.886ms (+0.000ms): try_to_wake_up (wake_up_process)
00000001 2.886ms (+0.000ms): task_rq_lock (try_to_wake_up)
00000002 2.886ms (+0.000ms): activate_task (try_to_wake_up)
00000002 2.886ms (+0.000ms): sched_clock (activate_task)
00000002 2.887ms (+0.000ms): recalc_task_prio (activate_task)
00000002 2.887ms (+0.000ms): effective_prio (recalc_task_prio)
00000002 2.887ms (+0.000ms): enqueue_task (activate_task)
00000001 2.887ms (+0.000ms): preempt_schedule (try_to_wake_up)
00000001 2.888ms (+0.000ms): sub_preempt_count (do_IRQ)
00000001 2.888ms (+0.000ms): update_max_trace (check_preempt_timing)
Thomas
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9
2004-09-02 13:16 ` Thomas Charbonnel
@ 2004-09-02 13:23 ` Ingo Molnar
2004-09-02 14:38 ` Thomas Charbonnel
0 siblings, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-02 13:23 UTC (permalink / raw)
To: Thomas Charbonnel
Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
Lee Revell, Mark_H_Johnson
* Thomas Charbonnel <thomas@undata.org> wrote:
> With ACPI compiled in and booting with acpi=off (which again doesn't
> seem to be honoured), here's another weird one :
> 00010000 0.003ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event)
> 00010001 0.003ms (+2.878ms): mark_offset_tsc (timer_interrupt)
> 00010001 2.882ms (+0.000ms): do_timer (timer_interrupt)
do you have the NMI watchdog enabled? That could help us debug this.
Enabling the APIC/IO_APIC and using nmi_watchdog=1 would be the ideal
solution - if that doesnt work then nmi_watchdog=2 would be fine too,
but remove this code from arch/i386/kernel/nmi.c:
if (nmi_watchdog == NMI_LOCAL_APIC)
nmi_hz = 1;
(otherwise nmi_watchdog=2 would result in one NMI per second, not enough
to shed light on the above 2.8 msec latency.)
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9
2004-09-02 13:23 ` Ingo Molnar
@ 2004-09-02 14:38 ` Thomas Charbonnel
0 siblings, 0 replies; 100+ messages in thread
From: Thomas Charbonnel @ 2004-09-02 14:38 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
Lee Revell, Mark_H_Johnson
Ingo Molnar wrote :
> do you have the NMI watchdog enabled? That could help us debug this.
> Enabling the APIC/IO_APIC and using nmi_watchdog=1 would be the ideal
> solution - if that doesnt work then nmi_watchdog=2 would be fine too,
> but remove this code from arch/i386/kernel/nmi.c:
>
> if (nmi_watchdog == NMI_LOCAL_APIC)
> nmi_hz = 1;
>
> (otherwise nmi_watchdog=2 would result in one NMI per second, not enough
> to shed light on the above 2.8 msec latency.)
>
> Ingo
nmi_watchdog=1 worked fine.
Here's a trace (there seems to be a counter overflow):
preemption latency trace v1.0.5 on 2.6.9-rc1-VP-Q9
--------------------------------------------------
latency: 539 us, entries: 38 (38)
-----------------
| task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: do_IRQ+0x19/0x190
=> ended at: do_IRQ+0x14a/0x190
=======>
00010000 0.000ms (+0.000ms): do_IRQ (common_interrupt)
00010000 0.000ms (+0.000ms): do_IRQ (default_idle)
00010000 0.000ms (+0.000ms): do_IRQ (<00000000>)
00010001 0.000ms (+0.002ms): mask_and_ack_8259A (do_IRQ)
00010001 0.002ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
00010000 0.002ms (+0.000ms): generic_handle_IRQ_event (do_IRQ)
00010000 0.002ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event)
00010001 0.003ms (+0.005ms): mark_offset_tsc (timer_interrupt)
00010001 0.008ms (+0.000ms): do_timer (timer_interrupt)
00010001 0.008ms (+0.000ms): update_process_times (do_timer)
00010001 0.008ms (+0.000ms): update_one_process (update_process_times)
00010001 0.009ms (+0.000ms): run_local_timers (update_process_times)
00010001 0.009ms (+0.000ms): raise_softirq (update_process_times)
00010001 0.009ms (+0.000ms): scheduler_tick (update_process_times)
00010001 0.009ms (+0.000ms): sched_clock (scheduler_tick)
00010001 0.009ms (+0.000ms): update_wall_time (do_timer)
00010001 0.009ms (+0.000ms): update_wall_time_one_tick (update_wall_time)
00010001 0.010ms (+0.000ms): generic_note_interrupt (do_IRQ)
00010001 0.010ms (+0.000ms): end_8259A_irq (do_IRQ)
00010001 0.010ms (+0.000ms): enable_8259A_irq (do_IRQ)
00000001 0.011ms (+0.524ms): do_softirq (do_IRQ)
00000001 0.535ms (+0.000ms): do_nmi (mcount)
00010001 0.536ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.536ms (+0.000ms): profile_hook (profile_tick)
00010002 0.536ms (+0.000ms): notifier_call_chain (profile_hook)
00010001 0.537ms (+689953.062ms): profile_hit (nmi_watchdog_tick)
00000002 689953.599ms (+1.523ms): sched_clock (activate_task)
00000001 0.537ms (+0.000ms): wake_up_process (do_softirq)
00000001 0.537ms (+0.000ms): try_to_wake_up (wake_up_process)
00000001 0.537ms (+0.000ms): task_rq_lock (try_to_wake_up)
00000002 0.537ms (+0.000ms): activate_task (try_to_wake_up)
00000002 0.538ms (+0.000ms): sched_clock (activate_task)
00000002 0.538ms (+0.000ms): recalc_task_prio (activate_task)
00000002 0.538ms (+0.000ms): effective_prio (recalc_task_prio)
00000002 0.538ms (+0.000ms): enqueue_task (activate_task)
00000001 0.538ms (+0.000ms): preempt_schedule (try_to_wake_up)
00000001 0.539ms (+0.000ms): sub_preempt_count (do_IRQ)
00000001 0.539ms (+0.000ms): update_max_trace (check_preempt_timing)
Thomas
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 - network is no longer smooth
2004-09-02 11:28 ` Ingo Molnar
@ 2004-09-02 15:26 ` P.O. Gaillard
0 siblings, 0 replies; 100+ messages in thread
From: P.O. Gaillard @ 2004-09-02 15:26 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel
Ingo Molnar wrote:
> * P.O. Gaillard <pierre-olivier.gaillard@fr.thalesgroup.com> wrote:
>
>
>>Hello,
>>
>>I just tried with Q6 and the network behaviour seems odd :
>> - my ssh connection freezes a bit,
>> - communication between the two PCs sharing the GigE connection is
>> difficult to start :
>
>
> please try the -Q9 patch i just released, and if you still see any
> problems does the following:
>
> echo 300 > /proc/sys/net/core/netdev_backlog_granularity
>
> help?
>
Q9 just crashed after showing the login prompt. It says there is a kernel bug in
netdevice.h at line 874. The backtrace is:
finegrained_poll
net_rx_action
___do_soft_irq
_do_soft_irq
ksoftirqd
kthread
ksoftirqd
kthread
kernel_thread_helper
I have the following modules :
e1000
i2c_isa
i2c_sensor
w83627hf
asus_acpi
P.O. Gaillard
* [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-02 11:10 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Ingo Molnar
2004-09-02 12:14 ` Thomas Charbonnel
2004-09-02 13:16 ` Thomas Charbonnel
@ 2004-09-02 21:57 ` Ingo Molnar
2004-09-02 22:06 ` Lee Revell
` (3 more replies)
2 siblings, 4 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-02 21:57 UTC (permalink / raw)
To: linux-kernel
Cc: K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
i've released the -R0 patch:
http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R0
ontop of:
http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
i've given up on the netdev_backlog_granularity approach, and as a
replacement i've modified specific network drivers to return at a safe
point if softirq preemption is requested. This gives the same end result
but is more robust. For the time being i've fixed 8139too.c and e100.c.
(will fix up other drivers too as latencies get reported)
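The idea can be sketched as a small userspace simulation (all names here, softirq_preempt_requested, rx_queue_pop, sketch_poll, are illustrative stand-ins, not the actual -R0 code): the driver's ->poll routine checks a preemption flag between packets and returns at a safe point, and the NAPI core simply re-polls later for whatever work is left.

```c
#include <assert.h>

/* Stand-in for the VP "softirq preemption requested" flag. */
static int softirq_preempt_requested;

struct rx_queue {
    int pending;   /* packets still waiting in the ring */
};

/* Take one packet off the queue; returns 0 when the ring is empty. */
static int rx_queue_pop(struct rx_queue *q)
{
    if (q->pending <= 0)
        return 0;
    q->pending--;
    return 1;
}

/* Process up to 'budget' packets, but bail out early if preemption of
 * the softirq has been requested.  Returns the number of packets
 * handled; the core will re-poll later for whatever is left. */
static int sketch_poll(struct rx_queue *q, int budget)
{
    int done = 0;

    while (done < budget && rx_queue_pop(q)) {
        done++;
        if (softirq_preempt_requested)
            break;          /* safe point: ring state is consistent */
    }
    return done;
}
```

This keeps single-packet processing intact (which, per the earlier traces, is hard to shorten) while bounding how many packets are handled in one non-preemptible stretch.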
this should fix the crash reported by P.O. Gaillard, and it should solve
the packet delay/loss issues reported by Mark H Johnson. I cannot see
any problems on my rtl8139 testbox anymore.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
@ 2004-09-02 22:06 ` Lee Revell
2004-09-02 22:14 ` Ingo Molnar
2004-09-03 11:04 ` K.R. Foley
` (2 subsequent siblings)
3 siblings, 1 reply; 100+ messages in thread
From: Lee Revell @ 2004-09-02 22:06 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
Mark_H_Johnson, P.O. Gaillard
On Thu, 2004-09-02 at 17:57, Ingo Molnar wrote:
> i've released the -R0 patch:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R0
>
> ontop of:
>
> http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
>
> i've given up on the netdev_backlog_granularity approach, and as a
> replacement i've modified specific network drivers to return at a safe
> point if softirq preemption is requested.
Makes sense, netdev_max_backlog never made a difference on my system
(via-rhine driver).
If you read that Microsoft paper I posted a link to earlier, they
describe all of the main categories of latencies we have dealt with.
They give special mention to network driver DPCs/softirqs.
The worst offender was a driver that used a DPC/softirq to reset the
card if there was no traffic in 10 seconds, to work around a hardware
lockup bug in some versions. Since the reset path was never designed to
be fast this caused problems. They also mention a similar problem to
the one that still exists with the via-rhine driver (fixed in -mm) where
it uses a DPC to poll for link status.
Lee
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-02 22:06 ` Lee Revell
@ 2004-09-02 22:14 ` Ingo Molnar
2004-09-02 22:15 ` Lee Revell
2004-09-03 0:24 ` Lee Revell
0 siblings, 2 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-02 22:14 UTC (permalink / raw)
To: Lee Revell
Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
Mark_H_Johnson, P.O. Gaillard
* Lee Revell <rlrevell@joe-job.com> wrote:
> > i've given up on the netdev_backlog_granularity approach, and as a
> > replacement i've modified specific network drivers to return at a safe
> > point if softirq preemption is requested.
>
> Makes sense, netdev_max_backlog never made a difference on my system
> (via-rhine driver).
via-rhine does RX processing from the hardirq handler, this codepath is
harder to break up. The NAPI ->poll functions used by e100 and 8139too
are much easier to break up because RX throttling and re-trying is a
basic property of NAPI.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-02 22:14 ` Ingo Molnar
@ 2004-09-02 22:15 ` Lee Revell
2004-09-03 0:24 ` Lee Revell
1 sibling, 0 replies; 100+ messages in thread
From: Lee Revell @ 2004-09-02 22:15 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
Mark_H_Johnson, P.O. Gaillard
On Thu, 2004-09-02 at 18:14, Ingo Molnar wrote:
> * Lee Revell <rlrevell@joe-job.com> wrote:
>
> > > i've given up on the netdev_backlog_granularity approach, and as a
> > > replacement i've modified specific network drivers to return at a safe
> > > point if softirq preemption is requested.
> >
> > Makes sense, netdev_max_backlog never made a difference on my system
> > (via-rhine driver).
>
> via-rhine does RX processing from the hardirq handler, this codepath is
> harder to break up. The NAPI ->poll functions used by e100 and 8139too
> are much easier to break up because RX throttling and re-trying is a
> basic property of NAPI.
>
What I meant was, I did not consider the latencies from via-rhine to be
a problem. The other posters reported 300-500 usec latencies in the
network driver; I did not see this.
Lee
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8
2004-09-02 7:15 ` Ingo Molnar
2004-09-02 7:31 ` Lee Revell
@ 2004-09-02 23:25 ` Lee Revell
2004-09-02 23:28 ` Ingo Molnar
1 sibling, 1 reply; 100+ messages in thread
From: Lee Revell @ 2004-09-02 23:25 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana,
Daniel Schmitt, alsa-devel
On Thu, 2004-09-02 at 03:15, Ingo Molnar wrote:
> * Lee Revell <rlrevell@joe-job.com> wrote:
>
> > Here are traces of a 145, 190, and 217 usec latencies in
> > netif_receive_skb:
> >
> > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace2.txt
> > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace3.txt
> > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace4.txt
>
> these all seem to be single-packet processing latencies - it would be
> quite hard to make those codepaths preemptible.
>
> i'd suggest to turn off things like netfilter and ip_conntrack (and
> other optional networking features that show up in the trace), they can
> only increase latency:
>
Do you see any optional networking features in the trace (other than
ip_conntrack)? I was under the impression that I had everything
optional disabled.
Lee
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8
2004-09-02 23:25 ` Lee Revell
@ 2004-09-02 23:28 ` Ingo Molnar
2004-09-02 23:32 ` Lee Revell
0 siblings, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-02 23:28 UTC (permalink / raw)
To: Lee Revell
Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana,
Daniel Schmitt, alsa-devel
* Lee Revell <rlrevell@joe-job.com> wrote:
> Do you see any optional networking features in the trace (other than
> ip_conntrack)? I was under the impression that I had everything
> optional disabled.
yeah, it seems to be only ip_conntrack and netfilter (which conntrack
relies on).
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8
2004-09-02 23:28 ` Ingo Molnar
@ 2004-09-02 23:32 ` Lee Revell
0 siblings, 0 replies; 100+ messages in thread
From: Lee Revell @ 2004-09-02 23:32 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana,
Daniel Schmitt, alsa-devel
On Thu, 2004-09-02 at 19:28, Ingo Molnar wrote:
> * Lee Revell <rlrevell@joe-job.com> wrote:
>
> > Do you see any optional networking features in the trace (other than
> > ip_conntrack)? I was under the impression that I had everything
> > optional disabled.
>
> yeah, it seems to be only ip_conntrack and netfilter (which conntrack
> relies on).
>
FWIW these seem to only slow down the single packet path by about 10%.
This is pretty good.
Lee
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-02 22:14 ` Ingo Molnar
2004-09-02 22:15 ` Lee Revell
@ 2004-09-03 0:24 ` Lee Revell
2004-09-03 3:17 ` Eric St-Laurent
1 sibling, 1 reply; 100+ messages in thread
From: Lee Revell @ 2004-09-03 0:24 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
Mark_H_Johnson, P.O. Gaillard
On Thu, 2004-09-02 at 18:14, Ingo Molnar wrote:
> * Lee Revell <rlrevell@joe-job.com> wrote:
>
> > > i've given up on the netdev_backlog_granularity approach, and as a
> > > replacement i've modified specific network drivers to return at a safe
> > > point if softirq preemption is requested.
> >
> > Makes sense, netdev_max_backlog never made a difference on my system
> > (via-rhine driver).
>
> via-rhine does RX processing from the hardirq handler, this codepath is
> harder to break up. The NAPI ->poll functions used by e100 and 8193too
> are much easier to break up because RX throttling and re-trying is a
> basic property of NAPI.
>
Judging from these graphs, all of the latency issues are solved, at
least on my UP hardware, and the latencies seem to be getting very close
to the limits of what the hardware can do:
http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/jack-test-1
The worst case latency is only 160 usecs, and the vast majority fall
into the pattern from 0 to 120 usecs. All of the spikes above 120 are
almost certainly caused by netdev_max_backlog. However these are not
long enough to cause any problems with my workload; the lowest practical
latency for audio work is around 0.66 ms (32 frames at 48 kHz).
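For reference, that 0.66 ms figure is just the JACK period length, period = frames / rate; a trivial helper (illustrative, not from jackd) makes the arithmetic explicit:

```c
#include <assert.h>

/* Audio period length in microseconds: frames / rate seconds.
 * 32 frames at 48000 Hz -> 32 * 1000000 / 48000 = 666 us, ~0.66 ms. */
static long period_us(long frames, long rate)
{
    return frames * 1000000L / rate;
}
```

Any scheduling latency comfortably below one period (here, the 160 usec worst case against a 666 usec period) will not cause an xrun.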
Lee
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8
2004-09-02 7:46 ` Ingo Molnar
@ 2004-09-03 1:10 ` Rusty Russell
0 siblings, 0 replies; 100+ messages in thread
From: Rusty Russell @ 2004-09-03 1:10 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Mark_H_Johnson, K.R. Foley, linux-kernel,
Felipe Alfaro Solana, Daniel Schmitt, alsa-devel, netfilter-devel
On Thu, 2004-09-02 at 17:46, Ingo Molnar wrote:
> Rusty, what's going on in this code?
Good question! Not my code, fortunately...
> #1: we kmalloc(GFP_KERNEL) with a spinlock held and softirqs off - ouch!
>
> #2: why does it do the kmalloc() anyway? It could store the position in
> the seq pointer just fine. No need to alloc an integer pointer to
> store the value in ...
>
> #3: to fix the latency, ct_seq_show() could take the ip_conntrack_lock
> and could check the current index against ip_conntrack_htable_size.
> There's not much point in making this non-preemptible, there's
> a 4K granularity anyway.
The code tries to put an entire hash bucket into a single seq_read().
That's not going to work if the hash is really deep. On the other hand,
not much will, and it's simple.
The lock is only needed while traversing: htable_size can't change after
init anyway, so the locking should be done in ct_seq_show.
Fix should be fairly simple...
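A minimal userspace sketch of that fix (the names mirror the seq_file start/next pattern but are illustrative, not the real ip_conntrack code): encode the bucket index in the iterator cookie instead of kmalloc()ing an integer to hold it, so nothing is allocated under the lock and nothing needs freeing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HTABLE_SIZE 16  /* stand-in for ip_conntrack_htable_size */

/* Encode the bucket number in the iterator pointer itself, the usual
 * seq_file trick: offset by 1 so bucket 0 is distinguishable from the
 * NULL that signals end-of-sequence. */
static void *ct_seq_start(uint64_t *pos)
{
    if (*pos >= HTABLE_SIZE)
        return NULL;                    /* past the last bucket: done */
    return (void *)(uintptr_t)(*pos + 1);
}

static void *ct_seq_next(void *v, uint64_t *pos)
{
    (void)v;                            /* cookie carries no state to free */
    (*pos)++;
    return ct_seq_start(pos);           /* no allocation anywhere */
}
```

With no kmalloc(GFP_KERNEL) in the iterator, the "sleep with spinlock held and softirqs off" problem from Ingo's point #1 disappears along with the allocation itself.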
Rusty.
--
Anyone who quotes me in their signature is an idiot -- Rusty Russell
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 0:24 ` Lee Revell
@ 2004-09-03 3:17 ` Eric St-Laurent
2004-09-03 6:26 ` Lee Revell
0 siblings, 1 reply; 100+ messages in thread
From: Eric St-Laurent @ 2004-09-03 3:17 UTC (permalink / raw)
To: Lee Revell
Cc: Ingo Molnar, linux-kernel, K.R. Foley, Felipe Alfaro Solana,
Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard
> Judging from these graphs, all of the latency issues are solved, at
> least on my UP hardware, and the latencies seem to be getting very close
> to the limits of what the hardware can do:
>
> http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/jack-test-1
>
> The worst case latency is only 160 usecs, and the vast majority fall
> into the pattern from 0 to 120 usecs. All of the spikes above 120 are
> almost certainly caused by netdev_max_backlog. However these are not
> long enough to cause any problems with my workload; the lowest practical
> latency for audio work is around 0.66 ms (32 frames at 48khz).
Lee,
A few weeks ago you wrote that "the worst latency I was able to trigger
was 46 usecs", now it's 160 usecs.
Ingo has done much work on his patches since then.
Why is the worst latency higher now? I presume that the latency
measurement technique is more accurate and the 46 usecs figure was
inaccurate?
Ref: http://uwsg.indiana.edu/hypermail/linux/kernel/0407.3/0994.html
Best regards,
Eric St-Laurent
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 3:17 ` Eric St-Laurent
@ 2004-09-03 6:26 ` Lee Revell
2004-09-03 6:36 ` Ingo Molnar
0 siblings, 1 reply; 100+ messages in thread
From: Lee Revell @ 2004-09-03 6:26 UTC (permalink / raw)
To: Eric St-Laurent
Cc: Ingo Molnar, linux-kernel, K.R. Foley, Felipe Alfaro Solana,
Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard
On Thu, 2004-09-02 at 23:17, Eric St-Laurent wrote:
> > Judging from these graphs, all of the latency issues are solved, at
> > least on my UP hardware, and the latencies seem to be getting very close
> > to the limits of what the hardware can do:
> >
> > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/jack-test-1
> >
> > The worst case latency is only 160 usecs, and the vast majority fall
> > into the pattern from 0 to 120 usecs. All of the spikes above 120 are
> > almost certainly caused by netdev_max_backlog. However these are not
> > long enough to cause any problems with my workload; the lowest practical
> > latency for audio work is around 0.66 ms (32 frames at 48khz).
>
> Lee,
>
> A few weeks ago you wrote that "the worst latency I was able to trigger
> was 46 usecs", now it's 160 usecs.
>
> Ingo has done much work on his patches since then.
>
> Why is the worst latency higher now? I presume that the latency
> measurement technique is more accurate and the 46 usecs figure was
> inaccurate?
>
> Ref: http://uwsg.indiana.edu/hypermail/linux/kernel/0407.3/0994.html
>
Yup, due to my incomplete understanding of the jackd code, my initial
measurements were measuring the time it took to run one process cycle
(basically a NOOP if there are no clients), rather than the actual time
jackd spent in poll() between cycles.
This did have the effect of measuring the scheduler latency, but I
believe it was being measured indirectly via cache effects - the longer
it had been since jackd last ran, the colder the cachelines touched by
the last cycle.
Since I am using a patch to jackd to measure these latencies, which will
be merged in the near future, it's more important for me that the patch
accurately reflect the latencies jackd users will see than it is for the
new, accurate results to be compatible with the old.
All datasets for -O and earlier use the old code:
http://krustophenia.net/testresults.php?dataset=2.6.8-rc3-O5
This one uses the initial version of the new code, measuring the time
jackd spends in poll(). The bimodal distribution is due to my sound
card having two different interrupt sources for capture and playback.
So one of the spikes represents the elapsed time between the capture
interrupt and the playback interrupt, and the other the time between the
playback interrupt and the next capture interrupt:
http://krustophenia.net/testresults.php?dataset=2.6.8.1-P0
-Q and later use the current method, which is like the above except the
second hump is discarded, as it is a function of the scheduling latency
and the period size rather than just the scheduling latency:
http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6
So, don't be fooled by the numbers: the newest version of the patch is
in fact the best. I have been meaning to go back and measure the
current patches with the old code but it's pretty low priority...
Lee
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 6:26 ` Lee Revell
@ 2004-09-03 6:36 ` Ingo Molnar
2004-09-03 6:49 ` Lee Revell
0 siblings, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-03 6:36 UTC (permalink / raw)
To: Lee Revell
Cc: Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana,
Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard
* Lee Revell <rlrevell@joe-job.com> wrote:
> -Q and later use the current method, which is like the above except
> the second hump is discarded, as it is a function of the scheduling
> latency and the period size rather than just the scheduling latency:
>
> http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6
>
> So, don't be fooled by the numbers, the newest version of the patch is
> in fact the best. I have been meaning to go back and measure the
> current patches with the old code but it's pretty low priority...
vanilla kernel 2.6.8.1 would be quite interesting to get a few charts of
- especially if your measurement methodology has changed. There's not
much sense in re-testing older VP patches.
also, has the userspace workload you are using stayed constant during
all these tests?
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 6:36 ` Ingo Molnar
@ 2004-09-03 6:49 ` Lee Revell
2004-09-03 7:01 ` Ingo Molnar
2004-09-03 7:05 ` Ingo Molnar
0 siblings, 2 replies; 100+ messages in thread
From: Lee Revell @ 2004-09-03 6:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana,
Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard
On Fri, 2004-09-03 at 02:36, Ingo Molnar wrote:
> * Lee Revell <rlrevell@joe-job.com> wrote:
>
> > -Q and later use the current method, which is like the above except
> > the second hump is discarded, as it is a function of the scheduling
> > latency and the period size rather than just the scheduling latency:
> >
> > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6
> >
> > So, don't be fooled by the numbers, the newest version of the patch is
> > in fact the best. I have been meaning to go back and measure the
> > current patches with the old code but it's pretty low priority...
>
> vanilla kernel 2.6.8.1 would be quite interesting to get a few charts of
> - especially if your measurement methodology has changed.
OK, I will give this a shot. Now that the VP patches are stabilizing I
will be doing more profiling. I also want to try the -mm kernel; it
has some interesting differences from the stock kernel. For example I
measured about a 10% improvement with the old method, which implies a
big performance gain.
> There's not
> much sense in re-testing older VP patches.
>
Yup, my thoughts exactly; this would just tell us what we already know:
that the latency gets better with each version.
> also, has the userspace workload you are using stayed constant during
> all these tests?
>
I am mostly just using normal desktop hacker workloads, web browsing,
email, builds. Lately I am using the box as a Samba server. At first,
I was stressing the system using every disk benchmark I could think of,
but it never seemed to affect the worst case and did not even change the
shape of the distribution much, so I don't bother. For all practical
purposes, it's impossible to change the shape of these graphs much by
stressing the system.
I am able to induce large latencies by using up all available swap with
make -j12 on a KDE program, and by pingflooding the broadcast address,
but these are pathological enough that I have not worried about them.
Lee
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 6:49 ` Lee Revell
@ 2004-09-03 7:01 ` Ingo Molnar
2004-09-03 7:05 ` Ingo Molnar
1 sibling, 0 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-03 7:01 UTC (permalink / raw)
To: Lee Revell
Cc: Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana,
Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard
* Lee Revell <rlrevell@joe-job.com> wrote:
> > vanilla kernel 2.6.8.1 would be quite interesting to get a few charts of
> > - especially if your measurement methodology has changed.
>
> OK, I will give this a shot. Now that the VP patches are stabilizing
> I will be doing more profiling. I also want to try the -mm kernel,
> this has some interesting differences from the stock kernel. For
> example I measured about a 10% improvement with the old method, which
> implies a big performance gain.
the -mm kernel used to have additional *-latency-fix patches that were
done based on the initial preemption-timing patch in -mm and partly
based on early VP discussions and findings. I recently reviewed and
merged the 2-3 missing ones into VP. Andrew has dropped these patches
meanwhile and i expect to submit the cleaner and more complete solution
that is in VP.
So i'd expect -mm to still perform better than vanilla (it usually
does), but if that big 10% difference in latencies doesnt show up
anymore i'd attribute it to the shuffling around of latency related
patches, not some genuine deficiency in -mm.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 6:49 ` Lee Revell
2004-09-03 7:01 ` Ingo Molnar
@ 2004-09-03 7:05 ` Ingo Molnar
2004-09-03 7:40 ` Lee Revell
2004-09-03 18:09 ` K.R. Foley
1 sibling, 2 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-03 7:05 UTC (permalink / raw)
To: Lee Revell
Cc: Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana,
Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard
another question - any objections against me rebasing the VP patch to
the current -mm tree and keeping it there exclusively until all possible
merges are done? It would probably quite some work to create a complete
patch for the upstream or BK tree during that process, as small patches
start to flow in the VP => -mm => BK direction. Would any of the regular
VP users/testers be wary to use the -mm tree?
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 7:05 ` Ingo Molnar
@ 2004-09-03 7:40 ` Lee Revell
2004-09-03 7:50 ` Free Ekanayaka
2004-09-03 8:09 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Luke Yelavich
2004-09-03 18:09 ` K.R. Foley
1 sibling, 2 replies; 100+ messages in thread
From: Lee Revell @ 2004-09-03 7:40 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana,
Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, luke,
free78
On Fri, 2004-09-03 at 03:05, Ingo Molnar wrote:
> another question - any objections against me rebasing the VP patch to
> the current -mm tree and keeping it there exclusively until all possible
> merges are done? It would probably be quite some work to create a complete
> patch for the upstream or BK tree during that process, as small patches
> start to flow in the VP => -mm => BK direction. Would any of the regular
> VP users/testers be wary to use the -mm tree?
>
None here. Assuming the SMP issues are resolved, the only remaining
issues on UP will probably be with various drivers, and those fixes
should apply just as easily to -mm as to vanilla.
As far as I am concerned the VP patches are stable enough for the
audio-centric distros to start distributing VP kernel packages; these
will certainly be using the vanilla kernel. I think the PlanetCCRMA and
AGNULA people are planning to start distributing test VP-kernel packages
as soon as the patches stabilize. IIRC Nando is on vacation this week.
I will make an announcement on LAD that as of R0 the VP patches should
be stable and are ready for wider testing. You may want to wait until
after the initial slew of bug reports before rebasing VP against MM. I
suspect most of the problems will be driver specific, and most of the
fixes will apply equally to -mm and vanilla.
I have added Luke (AudioSlack), Free (AGNULA), and Nando (CCRMA) to the
cc: list. They would be in the best position to answer your question.
Lee
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 7:40 ` Lee Revell
@ 2004-09-03 7:50 ` Free Ekanayaka
2004-09-03 8:05 ` Lee Revell
2004-09-03 8:09 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Luke Yelavich
1 sibling, 1 reply; 100+ messages in thread
From: Free Ekanayaka @ 2004-09-03 7:50 UTC (permalink / raw)
To: Lee Revell
Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley,
Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, luke, free78
|--==> "LR" == Lee Revell <rlrevell@joe-job.com> writes:
LR> On Fri, 2004-09-03 at 03:05, Ingo Molnar wrote:
>>another question - any objections against me rebasing the VP patch to
>>the current -mm tree and keeping it there exclusively until all possible
>>merges are done? It would probably be quite some work to create a complete
>>patch for the upstream or BK tree during that process, as small patches
>>start to flow in the VP => -mm => BK direction. Would any of the regular
>>VP users/testers be wary to use the -mm tree?
>>
LR> None here. Assuming the SMP issues are resolved, then the only
LR> remaining issues on UP will probably be with various drivers, those
LR> fixes should apply just as easily to -mm as vanilla.
LR> As far as I am concerned the VP patches are stable enough for the
LR> audio-centric distros to start distributing VP kernel packages, these
LR> will certainly be using the vanilla kernel. I think the PlanetCCRMA and
LR> AGNULA people are planning to start distributing test VP-kernel packages
LR> as soon as the patches stabilize. IIRC Nando is on vacation this week.
LR> I will make an announcement on LAD that as of R0 the VP patches should
LR> be stable and are ready for wider testing. You may want to wait until
LR> after the initial slew of bug reports before rebasing VP against MM. I
LR> suspect most of the problems will be driver specific, and most of the
LR> fixes will apply equally to -mm and vanilla.
LR> I have added Luke (AudioSlack), Free (AGNULA), and Nando (CCRMA) to the
LR> cc: list. They would be in the best position to answer your question.
Yes, you're right. We plan to provide test 2.6.x packages as soon as
patches stabilise. Please let me know if you have any recommendations
(configuration flags, additional patches, etc.).
Cheers,
Free
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 7:50 ` Free Ekanayaka
@ 2004-09-03 8:05 ` Lee Revell
2004-09-03 9:05 ` Free Ekanayaka
2004-09-03 9:25 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar
0 siblings, 2 replies; 100+ messages in thread
From: Lee Revell @ 2004-09-03 8:05 UTC (permalink / raw)
To: Free Ekanayaka
Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley,
Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, luke, free78
On Fri, 2004-09-03 at 03:50, Free Ekanayaka wrote:
> LR> As far as I am concerned the VP patches are stable enough for the
> LR> audio-centric distros to start distributing VP kernel packages, these
> LR> will certainly be using the vanilla kernel. I think the PlanetCCRMA and
> LR> AGNULA people are planning to start distributing test VP-kernel packages
> LR> as soon as the patches stabilize. IIRC Nando is on vacation this week.
>
> LR> I will make an announcement on LAD that as of R0 the VP patches should
> LR> be stable and are ready for wider testing. You may want to wait until
> LR> after the initial slew of bug reports before rebasing VP against MM. I
> LR> suspect most of the problems will be driver specific, and most of the
> LR> fixes will apply equally to -mm and vanilla.
>
> LR> I have added Luke (AudioSlack), Free (AGNULA), and Nando (CCRMA) to the
> LR> cc: list. They would be in the best position to answer your question.
>
> Yes, you're right. We plan to provide test 2.6.x packages as soon as
> patches stabilise. Please let me know if you have any recommendations
> (configuration flags, additional patches, etc.).
>
As of -R0 it's definitely stable on UP, and SMP users are reporting the
same. All known problems should be fixed, and there are no known
regressions. You should probably post a UP version and have your users
test that before posting SMP packages; the latter are not quite as well
tested.
No other patches (ie various scheduler tweaks, CK) should be necessary,
and in fact are not recommended because they might mask latency issues
that we would rather fix.
I use Debian unstable, which should be pretty close to AGNULA, and for
several weeks now I have been unable to produce an xrun in jack at 32
frames no matter what I throw at the machine. I actually have not had
xrun debugging enabled in weeks because I don't get any xruns.
Any problems at this point are most likely going to involve less common
hardware, stuff the LKML testers don't have.
Lee
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 7:40 ` Lee Revell
2004-09-03 7:50 ` Free Ekanayaka
@ 2004-09-03 8:09 ` Luke Yelavich
2004-09-03 8:13 ` Lee Revell
1 sibling, 1 reply; 100+ messages in thread
From: Luke Yelavich @ 2004-09-03 8:09 UTC (permalink / raw)
To: Lee Revell
Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley,
Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, free78
On Fri, Sep 03, 2004 at 05:40:34PM EST, Lee Revell wrote:
> On Fri, 2004-09-03 at 03:05, Ingo Molnar wrote:
> > another question - any objections against me rebasing the VP patch to
> > the current -mm tree and keeping it there exclusively until all possible
> > merges are done? It would probably be quite some work to create a complete
> > patch for the upstream or BK tree during that process, as small patches
> > start to flow in the VP => -mm => BK direction. Would any of the regular
> > VP users/testers be wary to use the -mm tree?
> >
>
> None here. Assuming the SMP issues are resolved, then the only
> remaining issues on UP will probably be with various drivers, those
> fixes should apply just as easily to -mm as vanilla.
>
> As far as I am concerned the VP patches are stable enough for the
> audio-centric distros to start distributing VP kernel packages, these
> will certainly be using the vanilla kernel. I think the PlanetCCRMA and
> AGNULA people are planning to start distributing test VP-kernel packages
> as soon as the patches stabilize. IIRC Nando is on vacation this week.
I certainly have intentions of making test kernels available in my repository
in a separate testing branch. I don't really mind for myself what tree the
patch is based on, but I feel that the patches may get more testing if they
are against vanilla, rather than against another tree, as users
may consider the patch to be more stable. On the other hand, it is not hard for
packagers like myself to do the legwork, and we can also provide a pre-patched
kernel source for those who still want to roll their own.
> I will make an announcement on LAD that as of R0 the VP patches should
> be stable and are ready for wider testing. You may want to wait until
> after the initial slew of bug reports before rebasing VP against MM. I
> suspect most of the problems with be driver specific, and most of the
> fixes will apply equally to -mm and vanilla.
Well with Lee's help, I think I have identified an ICE1712 sound driver issue,
but this is yet to be determined.
> I have added Luke (AudioSlack), Free (AGNULA), and Nando (CCRMA) to the
> cc: list. They would be in the best position to answer your question.
I will go with whatever is decided, as I am prepared to do the legwork for my
repository users.
--
Luke Yelavich
http://www.audioslack.com
luke@audioslack.com
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 8:09 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Luke Yelavich
@ 2004-09-03 8:13 ` Lee Revell
2004-09-03 8:21 ` Luke Yelavich
2004-09-03 12:52 ` Luke Yelavich
0 siblings, 2 replies; 100+ messages in thread
From: Lee Revell @ 2004-09-03 8:13 UTC (permalink / raw)
To: Luke Yelavich
Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley,
Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, free78
On Fri, 2004-09-03 at 04:09, Luke Yelavich wrote:
> On Fri, Sep 03, 2004 at 05:40:34PM EST, Lee Revell wrote:
> > On Fri, 2004-09-03 at 03:05, Ingo Molnar wrote:
> Well with Lee's help, I think I have identified an ICE1712 sound driver issue,
> but this is yet to be determined.
Hmm, this one is still not fixed, using the latest VP patches?
What are the symptoms again?
Lee
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 8:13 ` Lee Revell
@ 2004-09-03 8:21 ` Luke Yelavich
2004-09-03 12:52 ` Luke Yelavich
1 sibling, 0 replies; 100+ messages in thread
From: Luke Yelavich @ 2004-09-03 8:21 UTC (permalink / raw)
To: Lee Revell
Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley,
Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, free78
On Fri, Sep 03, 2004 at 06:13:16PM EST, Lee Revell wrote:
> > On Fri, Sep 03, 2004 at 05:40:34PM EST, Lee Revell wrote:
> > > On Fri, 2004-09-03 at 03:05, Ingo Molnar wrote:
> > Well with Lee's help, I think I have identified an ICE1712 sound driver issue,
> > but this is yet to be determined.
>
> Hmm, this one is still not fixed, using the latest VP patches?
I have been keeping up with the patches, but haven't had a chance to try it out
with them lately. Currently building the latest one, and will be able to test
again when done.
--
Luke Yelavich
http://www.audioslack.com
luke@audioslack.com
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 8:05 ` Lee Revell
@ 2004-09-03 9:05 ` Free Ekanayaka
2004-09-03 9:25 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar
1 sibling, 0 replies; 100+ messages in thread
From: Free Ekanayaka @ 2004-09-03 9:05 UTC (permalink / raw)
To: Lee Revell
Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley,
Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, luke
|--==> "LR" == Lee Revell <rlrevell@joe-job.com> writes:
LR> On Fri, 2004-09-03 at 03:50, Free Ekanayaka wrote:
LR> As far as I am concerned the VP patches are stable enough for the
LR> audio-centric distros to start distributing VP kernel packages, these
LR> will certainly be using the vanilla kernel. I think the PlanetCCRMA and
LR> AGNULA people are planning to start distributing test VP-kernel packages
LR> as soon as the patches stabilize. IIRC Nando is on vacation this week.
>>
LR> I will make an announcement on LAD that as of R0 the VP patches should
LR> be stable and are ready for wider testing. You may want to wait until
LR> after the initial slew of bug reports before rebasing VP against MM. I
LR> suspect most of the problems will be driver specific, and most of the
LR> fixes will apply equally to -mm and vanilla.
>>
LR> I have added Luke (AudioSlack), Free (AGNULA), and Nando (CCRMA) to the
LR> cc: list. They would be in the best position to answer your question.
>>
>>Yes, you're right. We plan to provide test 2.6.x packages as soon as
>>patches stabilise. Please let me know if you have some recommendation
>>(configuration flags, additional patches, etc.).
>>
LR> As of -R0 it's definitely stable on UP and SMP users are reporting the
LR> same. All known problems should be fixed, and there are no known
LR> regressions. You should probably post a UP version and have your users
LR> test that before posting SMP packages, the latter are not quite as well
LR> tested.
LR> No other patches (ie various scheduler tweaks, CK) should be necessary,
LR> and in fact are not recommended because they might mask latency issues
LR> that we would rather fix.
LR> I use Debian unstable which should be pretty close to AGNULA, and for
LR> several weeks now I have been unable to produce an xrun in jack at 32
LR> frames no matter what I throw at the machine. I actually have not had
LR> xrun debugging enabled in weeks because I don't get any xruns.
LR> Any problems at this point are most likely going to involve less common
LR> hardware, stuff the LKML testers don't have.
A/DeMuDi is based on a sarge snapshot (currently 17 July... time to
shift forward!), plus some bits of sid (especially up-to-date audio
packages).
Which kernel tree are you using?
Have you built your kernel starting from the stock kernel-source-2.6.8
package along with the relevant debian patch?
Cheers,
Free
^ permalink raw reply [flat|nested] 100+ messages in thread
* [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-03 8:05 ` Lee Revell
2004-09-03 9:05 ` Free Ekanayaka
@ 2004-09-03 9:25 ` Ingo Molnar
2004-09-03 9:50 ` Luke Yelavich
` (2 more replies)
1 sibling, 3 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-03 9:25 UTC (permalink / raw)
To: Lee Revell
Cc: Free Ekanayaka, Eric St-Laurent, linux-kernel, K.R. Foley,
Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, luke, free78
* Lee Revell <rlrevell@joe-job.com> wrote:
> As of -R0 it's definitely stable on UP and SMP users are reporting the
> same. All known problems should be fixed, and there are no known
> regressions. You should probably post a UP version and have your
> users test that before posting SMP packages, the latter are not quite
> as well tested.
Florian Schmidt reported a minor bug that prevents a successful build if
!CONFIG_LATENCY_TRACE - i've uploaded -R1 that fixes this:
http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R1
there are no other changes in -R1, and there are no known pending bugs
currently.
for a packaged kernel i'd suggest enabling all the CONFIG_PREEMPT_*
options in the .config, including CONFIG_PREEMPT_TIMING.
CONFIG_LATENCY_TRACE can add overhead if active, so while it would be
useful for initial packages to enable this .config option, i'd suggest
turning it off by default by changing 'tracing_enabled = 1' to
'tracing_enabled = 0' in the patch. Then people can enable it and do
precise tracing whenever they encounter a particularly high latency on
their system.
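The tracing_enabled default change described above is a one-line substitution over the patch file. A sketch, shown here on a sample line (the real patch filename is an assumption; the variable is taken to appear exactly as quoted):

```shell
# Sketch: the substitution Ingo describes, demonstrated on a sample
# line.  Against the real patch one would run something like:
#   sed -i 's/tracing_enabled = 1/tracing_enabled = 0/' <vp-patch-file>
echo 'int tracing_enabled = 1;' |
    sed 's/tracing_enabled = 1/tracing_enabled = 0/'
```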
configuring the threaded/nonthreaded properties of IRQ handlers can be
tricky. Perhaps a script could scan /proc (and/or /sys) for audio
interrupts and make them directly executed? Unfortunately audio
interrupt handler requests don't seem to go through a central
ALSA function, so i see no easy way to tag audio interrupts
automatically and provide some /proc flag to make audio interrupts
non-threaded by default.
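A rough sketch of such a script, under the assumption that the VP patch exposes per-handler /proc/irq/<n>/<name>/threaded flags (as with the i8042 example later in this thread) and that the sound driver's handler name shows up in /proc/interrupts:

```shell
#!/bin/sh
# Sketch only: find the IRQ(s) whose handler name matches the sound
# driver and print the command that would make them non-threaded.
# The handler name ("ice1712") and the /proc layout are assumptions,
# not verified against the actual VP patch.
DRIVER="${1:-ice1712}"
grep -i "$DRIVER" /proc/interrupts 2>/dev/null |
while read -r line; do
    irq=${line%%:*}                 # the leading field is the IRQ number
    irq=$(echo "$irq" | tr -d ' ')
    echo "echo 0 > /proc/irq/$irq/$DRIVER/threaded"
done
```

Dropping the outer `echo` would apply the change directly (root required).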
Ingo
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-03 9:25 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar
@ 2004-09-03 9:50 ` Luke Yelavich
2004-09-03 10:29 ` Ingo Molnar
2004-09-03 11:33 ` Thomas Charbonnel
2004-09-03 11:36 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R2 Ingo Molnar
2 siblings, 1 reply; 100+ messages in thread
From: Luke Yelavich @ 2004-09-03 9:50 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, free78
On Fri, Sep 03, 2004 at 07:25:47PM EST, Ingo Molnar wrote:
>
> * Lee Revell <rlrevell@joe-job.com> wrote:
>
> > As of -R0 it's definitely stable on UP and SMP users are reporting the
> > same. All known problems should be fixed, and there are no known
> > regressions. You should probably post a UP version and have your
> > users test that before posting SMP packages, the latter are not quite
> > as well tested.
>
> Florian Schmidt reported a minor bug that prevents a successful build if
> !CONFIG_LATENCY_TRACE - i've uploaded -R1 that fixes this:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R1
>
> there are no other changes in -R1, and there are no known pending bugs
> currently.
I think I might be having a problem here that I didn't have with previous
patches. Bear in mind that the previous patch I tested was against 2.6.8.1
vanilla.
I am using a D-Link KVM here between my notebook and my desktop machine.
It is the desktop I am currently testing these patches on, and the KVM
requires a double-tap of the scroll lock key to switch between machines. With
the latest R0 patch, something is not working when I attempt to change from
my desktop to my notebook. The KVM usually lets out a beep when I can
use the arrow keys to switch, but it doesn't here. On top of that, my console
locks up totally for about 10 seconds, before allowing me to go on and
type commands. I also seem to get some debugging output or a trace of
some sort when rebooting, and the kernel panics with the message:
(0)Kernel Panic - not syncing: Failed exception in interrupt
Included is some dmesg output from beginning of bootup.
Let me know if any more info is needed.
--
Luke Yelavich
http://www.audioslack.com
luke@audioslack.com
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: After all inits, caps: bfebfbff 00000000 00000000 00000080
CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz stepping 07
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
Badness in schedule at kernel/sched.c:2637
[<c0302fb0>] schedule+0x558/0x55d
[<c0302ff5>] preempt_schedule+0x40/0x5f
[<c010041f>] rest_init+0x67/0x73
[<c03da780>] start_kernel+0x195/0x1d4
[<c03da38c>] unknown_bootoption+0x0/0x147
(swapper/1): new 666 us maximum-latency critical section.
=> started at: <preempt_schedule+0x3b/0x5f>
=> ended at: <finish_task_switch+0x37/0x93>
[<c012f01f>] check_preempt_timing+0x11b/0x1da
[<c01103df>] finish_task_switch+0x37/0x93
[<c01103df>] finish_task_switch+0x37/0x93
[<c012f212>] sub_preempt_count+0x4d/0x65
[<c012f212>] sub_preempt_count+0x4d/0x65
[<c01103df>] finish_task_switch+0x37/0x93
[<c011045e>] schedule_tail+0x23/0x5c
[<c0105f5e>] ret_from_fork+0x6/0x14
[<c010046b>] init+0x0/0x179
[<c0104210>] kernel_thread_helper+0x0/0xb
Badness in schedule at kernel/sched.c:2637
[<c0302fb0>] schedule+0x558/0x55d
[<c0302ff5>] preempt_schedule+0x40/0x5f
[<c0110439>] finish_task_switch+0x91/0x93
[<c011045e>] schedule_tail+0x23/0x5c
[<c0105f5e>] ret_from_fork+0x6/0x14
[<c010046b>] init+0x0/0x179
[<c0104210>] kernel_thread_helper+0x0/0xb
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xf1e60, last bus=2
(swapper/1): new 904 us maximum-latency critical section.
=> started at: <cond_resched+0xd/0x41>
=> ended at: <cond_resched+0xd/0x41>
[<c012f01f>] check_preempt_timing+0x11b/0x1da
[<c0303417>] cond_resched+0xd/0x41
[<c0303417>] cond_resched+0xd/0x41
[<c012f114>] touch_preempt_timing+0x36/0x3a
[<c012f114>] touch_preempt_timing+0x36/0x3a
[<c0303417>] cond_resched+0xd/0x41
[<c0138b6b>] kmem_cache_alloc+0x5f/0x61
[<c0119fe8>] __request_region+0x26/0xbd
[<c03f1336>] pci_direct_init+0x3e/0x10e
[<c03da807>] do_initcalls+0x2f/0xbc
[<c01004c4>] init+0x59/0x179
[<c010046b>] init+0x0/0x179
[<c0104215>] kernel_thread_helper+0x5/0xb
PCI: Using configuration type 1
Linux Plug and Play Support v0.97 (c) Adam Belay
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
<snip>
get_random_bytes called before random driver initialization
(swapper/1): new 6605 us maximum-latency critical section.
=> started at: <cond_resched+0xd/0x41>
=> ended at: <cond_resched+0xd/0x41>
[<c012f01f>] check_preempt_timing+0x11b/0x1da
[<c0303417>] cond_resched+0xd/0x41
[<c0303417>] cond_resched+0xd/0x41
[<c012f114>] touch_preempt_timing+0x36/0x3a
[<c012f114>] touch_preempt_timing+0x36/0x3a
[<c0303417>] cond_resched+0xd/0x41
[<c0138da7>] __kmalloc+0x89/0x90
[<c0183f6c>] proc_create+0x86/0xd9
[<c018416b>] create_proc_entry+0x66/0xc9
[<c03f1fcc>] dev_proc_init+0x2b/0xa3
[<c03f206c>] net_dev_init+0x28/0x17c
[<c03f1ce4>] pcibios_init+0x65/0x7e
[<c03da807>] do_initcalls+0x2f/0xbc
[<c01004c4>] init+0x59/0x179
[<c010046b>] init+0x0/0x179
[<c0104215>] kernel_thread_helper+0x5/0xb
(swapper/1): new 43385 us maximum-latency critical section.
=> started at: <cond_resched+0xd/0x41>
=> ended at: <cond_resched+0xd/0x41>
[<c012f01f>] check_preempt_timing+0x11b/0x1da
[<c0303417>] cond_resched+0xd/0x41
[<c0107b1d>] do_IRQ+0x14a/0x18e
[<c0303417>] cond_resched+0xd/0x41
[<c012f114>] touch_preempt_timing+0x36/0x3a
[<c012f114>] touch_preempt_timing+0x36/0x3a
[<c0303417>] cond_resched+0xd/0x41
[<c0138b6b>] kmem_cache_alloc+0x5f/0x61
[<c0119fe8>] __request_region+0x26/0xbd
[<c01efb09>] isapnp_next_rdp+0x66/0xa1
[<c03e6d2d>] isapnp_isolate_rdp_select+0x5b/0xc5
[<c03e6eda>] isapnp_isolate+0x143/0x1f8
[<c03e7c15>] isapnp_init+0xbb/0x2da
[<c03da807>] do_initcalls+0x2f/0xbc
[<c01004c4>] init+0x59/0x179
[<c010046b>] init+0x0/0x179
[<c0104215>] kernel_thread_helper+0x5/0xb
(swapper/1): new 43492 us maximum-latency critical section.
=> started at: <cond_resched+0xd/0x41>
=> ended at: <cond_resched+0xd/0x41>
[<c012f01f>] check_preempt_timing+0x11b/0x1da
[<c0303417>] cond_resched+0xd/0x41
[<c0303417>] cond_resched+0xd/0x41
[<c012f114>] touch_preempt_timing+0x36/0x3a
[<c012f114>] touch_preempt_timing+0x36/0x3a
[<c0303417>] cond_resched+0xd/0x41
[<c0138b6b>] kmem_cache_alloc+0x5f/0x61
[<c0119fe8>] __request_region+0x26/0xbd
[<c01efb09>] isapnp_next_rdp+0x66/0xa1
[<c03e6d2d>] isapnp_isolate_rdp_select+0x5b/0xc5
[<c03e6eda>] isapnp_isolate+0x143/0x1f8
[<c03e7c15>] isapnp_init+0xbb/0x2da
[<c03da807>] do_initcalls+0x2f/0xbc
[<c01004c4>] init+0x59/0x179
[<c010046b>] init+0x0/0x179
[<c0104215>] kernel_thread_helper+0x5/0xb
(swapper/1): new 43561 us maximum-latency critical section.
=> started at: <cond_resched+0xd/0x41>
=> ended at: <cond_resched+0xd/0x41>
[<c012f01f>] check_preempt_timing+0x11b/0x1da
[<c0303417>] cond_resched+0xd/0x41
[<c0107b1d>] do_IRQ+0x14a/0x18e
[<c0303417>] cond_resched+0xd/0x41
[<c012f114>] touch_preempt_timing+0x36/0x3a
[<c012f114>] touch_preempt_timing+0x36/0x3a
[<c0303417>] cond_resched+0xd/0x41
[<c0138b6b>] kmem_cache_alloc+0x5f/0x61
[<c0119fe8>] __request_region+0x26/0xbd
[<c01efb09>] isapnp_next_rdp+0x66/0xa1
[<c03e6d2d>] isapnp_isolate_rdp_select+0x5b/0xc5
[<c03e6eda>] isapnp_isolate+0x143/0x1f8
[<c03e7c15>] isapnp_init+0xbb/0x2da
[<c03da807>] do_initcalls+0x2f/0xbc
[<c01004c4>] init+0x59/0x179
[<c010046b>] init+0x0/0x179
[<c0104215>] kernel_thread_helper+0x5/0xb
isapnp: No Plug & Play device found
requesting new irq thread for IRQ8...
Real Time Clock Driver v1.12
Using anticipatory io scheduler
<snip>
requesting new irq thread for IRQ6...
<snip>
IRQ#6 thread started up.
eth0: Tigon3 [partno(BCM95702A20) rev 1002 PHY(5703)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:e0:18:df:d5:b6
eth0: HostTXDS[1] RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
requesting new irq thread for IRQ3...
(dhcpcd/2226): new 115753 us maximum-latency critical section.
=> started at: <tg3_open+0xc9/0x263 [tg3]>
=> ended at: <tg3_open+0x122/0x263 [tg3]>
[<c012f01f>] check_preempt_timing+0x11b/0x1da
[<f88c805a>] tg3_open+0x122/0x263 [tg3]
[<f88c805a>] tg3_open+0x122/0x263 [tg3]
[<c012f212>] sub_preempt_count+0x4d/0x65
[<c012f212>] sub_preempt_count+0x4d/0x65
[<f88c805a>] tg3_open+0x122/0x263 [tg3]
[<c029f2ad>] dev_open+0xcf/0xfe
[<c02a30c1>] dev_mc_upload+0x3d/0x5d
[<c02a0987>] dev_change_flags+0x5b/0x12c
[<c029e930>] __dev_get_by_name+0xe/0xb9
[<c02e2d4b>] devinet_ioctl+0x247/0x5c1
[<c02e5066>] inet_ioctl+0x5f/0x9f
[<c0301f9d>] packet_ioctl+0x14d/0x177
[<c0296a96>] sock_ioctl+0x112/0x2df
[<c0161e06>] sys_ioctl+0x13a/0x2a0
[<c0106087>] syscall_call+0x7/0xb
IRQ#3 thread started up.
divert: not allocating divert_blk for non-ethernet device sit0
(IRQ 1/716): new 204218 us maximum-latency critical section.
=> started at: <__do_softirq+0x39/0x59>
=> ended at: <__do_softirq+0x4a/0x59>
[<c012f01f>] check_preempt_timing+0x11b/0x1da
[<c01186d0>] __do_softirq+0x4a/0x59
[<c01186d0>] __do_softirq+0x4a/0x59
[<c012f212>] sub_preempt_count+0x4d/0x65
[<c012f212>] sub_preempt_count+0x4d/0x65
[<c01186d0>] __do_softirq+0x4a/0x59
[<c01194d6>] do_irqd+0x71/0x91
[<c012819f>] kthread+0xaa/0xaf
[<c0119465>] do_irqd+0x0/0x91
[<c01280f5>] kthread+0x0/0xaf
[<c0104215>] kernel_thread_helper+0x5/0xb
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-03 9:50 ` Luke Yelavich
@ 2004-09-03 10:29 ` Ingo Molnar
2004-09-03 10:43 ` Luke Yelavich
0 siblings, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-03 10:29 UTC (permalink / raw)
To: Luke Yelavich
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, free78
* Luke Yelavich <luke@audioslack.com> wrote:
> I think I might be having a problem here that I didn't have with
> previous patches. Bear in mind that the previous patch I tested was
> against 2.6.8.1 vanilla.
>
> I am using a D-Link KVM here between my notebook and my desktop
> machine. It is the desktop I am currently testing these patches on,
> and the KVM requires a double-tap of the scroll lock key to switch
> between machines. With the latest R0 patch, something is not working
> when I attempt to change from my desktop to my notebook. The KVM
> usually lets out a beep when I can use the arrow keys to switch, but
> it isn't here. Adding to that, my console locks up totally for about
> 10 seconds, before allowing me to go on and type commands. [...]
i also have a KVM connected to two testsystems and unfortunately i cannot
reproduce your problems, either with KVM (key-based) switching or with
scroll lock. But this KVM uses a triple-key combination to switch, not
scroll-lock.
it's a PS2 keyboard, right? If yes then does:
echo 0 > /proc/irq/1/i8042/threaded
( maybe also: echo 0 > /proc/irq/12/i8042/threaded )
fix the problem? The PS2 driver has been a bit unrobust when hardirq
redirection is enabled.
> [...] I also seem to get some debugging output or a trace of some sort
> when rebooting, and the kernel panics with the message: (0)Kernel
> Panic - not syncing: Failed exception in interrupt
hm. Would be nice to get a serial console capture of this, if possible.
Ingo
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-03 10:29 ` Ingo Molnar
@ 2004-09-03 10:43 ` Luke Yelavich
0 siblings, 0 replies; 100+ messages in thread
From: Luke Yelavich @ 2004-09-03 10:43 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, free78
On Fri, Sep 03, 2004 at 08:29:48PM EST, Ingo Molnar wrote:
> * Luke Yelavich <luke@audioslack.com> wrote:
> > I am using a D-Link KVM here between my notebook and my desktop
> > machine. It is the desktop I am currently testing these patches on,
> > and the KVM requires a double-tap of the scroll lock key to switch
> > between machines. With the latest R0 patch, something is not working
> > when I attempt to change from my desktop to my notebook. The KVM
> > usually lets out a beep when I can use the arrow keys to switch, but
> > it isn't here. Adding to that, my console locks up totally for about
> > 10 seconds, before allowing me to go on and type commands. [...]
>
> i also have a KVM connected to two testsystems and unfortunately i cannot
> reproduce your problems, either with KVM (key-based) switching or with
> scroll lock. But this KVM uses a triple-key combination to switch, not
> scroll-lock.
>
> it's a PS2 keyboard, right?
Yes, that is correct.
> If yes then does:
>
> echo 0 > /proc/irq/1/i8042/threaded
> ( maybe also: echo 0 > /proc/irq/12/i8042/threaded )
>
> fix the problem? The PS2 driver has been a bit unrobust when hardirq
> redirection is enabled.
I only had to turn off IRQ threading for IRQ 1, and all is well again.
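To keep a fix like that across reboots, a guarded snippet in a boot script (e.g. rc.local) would do. A sketch, assuming the /proc path stays as shown earlier in the thread on VP kernels:

```shell
# Sketch: re-apply the non-threaded setting for the keyboard IRQ at
# boot.  The guard keeps this harmless on kernels without the VP
# patch, where the /proc file does not exist.
f=/proc/irq/1/i8042/threaded
if [ -w "$f" ]; then
    echo 0 > "$f"
else
    echo "note: $f not present, skipping"
fi
```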
>
> > [...] I also seem to get some debugging output or a trace of some sort
> > when rebooting, and the kernel panics with the message: (0)Kernel
> > Panic - not syncing: Failed exception in interrupt
>
> hm. Would be nice to get a serial console capture of this, if possible.
I am pretty sure I have a cable; however, my desktop is the only computer in
this room that has serial ports, and I also don't know how to set up a serial
console. If you feel that this is important, I will see what I can do.
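For reference, a minimal serial-console capture setup (device names and baud rate are assumptions; the two machines joined by a null-modem cable) looks roughly like this:

```shell
# On the crashing desktop, add a serial console to the kernel command
# line (in lilo.conf or grub's menu.lst), keeping the VGA console too:
#
#     console=ttyS0,115200n8 console=tty0
#
# On the capturing machine, log whatever the kernel prints:
#
#     stty -F /dev/ttyS0 115200 raw -echo
#     cat /dev/ttyS0 | tee panic-capture.txt
```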
--
Luke Yelavich
http://www.audioslack.com
luke@audioslack.com
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
2004-09-02 22:06 ` Lee Revell
@ 2004-09-03 11:04 ` K.R. Foley
2004-09-03 17:02 ` K.R. Foley
2004-09-03 17:10 ` K.R. Foley
3 siblings, 0 replies; 100+ messages in thread
From: K.R. Foley @ 2004-09-03 11:04 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
Ingo Molnar wrote:
> i've released the -R0 patch:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R0
>
> ontop of:
>
> http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
>
> i've given up on the netdev_backlog_granularity approach, and as a
> replacement i've modified specific network drivers to return at a safe
> point if softirq preemption is requested. This gives the same end result
> but is more robust. For the time being i've fixed 8139too.c and e100.c.
> (will fix up other drivers too as latencies get reported)
>
> this should fix the crash reported by P.O. Gaillard, and it should solve
> the packet delay/loss issues reported by Mark H Johnson. I cannot see
> any problems on my rtl8139 testbox anymore.
>
> Ingo
>
Last night when rebooting on R0 for the first time, my system locked up.
It appears to have happened the first time trying to send/recv any real
data over the e100 interface when ntp was starting up. Will try to
investigate further a little later.
kr
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-03 9:25 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar
2004-09-03 9:50 ` Luke Yelavich
@ 2004-09-03 11:33 ` Thomas Charbonnel
2004-09-03 11:49 ` Ingo Molnar
2004-09-03 11:36 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R2 Ingo Molnar
2 siblings, 1 reply; 100+ messages in thread
From: Thomas Charbonnel @ 2004-09-03 11:33 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, luke, free78
Ingo Molnar wrote :
> Florian Schmidt reported a minor bug that prevents a successful build if
> !CONFIG_LATENCY_TRACE - i've uploaded -R1 that fixes this:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R1
>
I still get > 170 us latencies from the rtl8139 driver:
http://www.undata.org/~thomas/R1_rtl8139.trace
And again this one :
preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
--------------------------------------------------
latency: 597 us, entries: 12 (12)
-----------------
| task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: smp_apic_timer_interrupt+0x32/0xd0
=> ended at: smp_apic_timer_interrupt+0x86/0xd0
=======>
00010000 0.000ms (+0.000ms): smp_apic_timer_interrupt (apic_timer_interrupt)
00010000 0.000ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010000 0.000ms (+0.000ms): profile_hook (profile_tick)
00010001 0.000ms (+0.595ms): notifier_call_chain (profile_hook)
00010000 0.595ms (+0.000ms): do_nmi (mcount)
00020000 0.596ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00020000 0.596ms (+0.000ms): profile_hook (profile_tick)
00020001 0.597ms (+0.000ms): notifier_call_chain (profile_hook)
00020000 0.597ms (+689953.444ms): profile_hit (nmi_watchdog_tick)
00010001 689954.042ms (+1.141ms): update_process_times (do_timer)
00000001 0.597ms (+0.000ms): sub_preempt_count (smp_apic_timer_interrupt)
00000001 0.598ms (+0.000ms): update_max_trace (check_preempt_timing)
Thomas
^ permalink raw reply [flat|nested] 100+ messages in thread
* [patch] voluntary-preempt-2.6.9-rc1-bk4-R2
2004-09-03 9:25 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar
2004-09-03 9:50 ` Luke Yelavich
2004-09-03 11:33 ` Thomas Charbonnel
@ 2004-09-03 11:36 ` Ingo Molnar
2 siblings, 0 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-03 11:36 UTC (permalink / raw)
To: linux-kernel
Cc: Free Ekanayaka, Eric St-Laurent, linux-kernel, K.R. Foley,
Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, luke, free78, Lee Revell
i've uploaded -R2:
http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R2
no functional changes, only fixing some rough edges: it fixes a UP
boot-time assert that occurs right after 'checking hlt'. I also fixed
PREEMPT_TIMING + !LATENCY_TRACE build bugs, and another boot bug that
occurs when PREEMPT_TIMING + !FRAME_POINTERS is used.
(the reboot assert i'm not sure about - tried to reproduce but here
reboot works fine. Will need some sort of serial log to debug this.)
Ingo
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-03 11:33 ` Thomas Charbonnel
@ 2004-09-03 11:49 ` Ingo Molnar
2004-09-03 12:05 ` Thomas Charbonnel
2004-09-03 16:14 ` Thomas Charbonnel
0 siblings, 2 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-03 11:49 UTC (permalink / raw)
To: Thomas Charbonnel
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, luke, free78
[-- Attachment #1: Type: text/plain, Size: 2395 bytes --]
* Thomas Charbonnel <thomas@undata.org> wrote:
> I still get > 170 us latency from rtl8139 :
> http://www.undata.org/~thomas/R1_rtl8139.trace
this is a single-packet latency, we won't get much lower than this with
the current techniques. Disabling ip_conntrack and tracing ought to
lower the real latency somewhat.
> And again this one :
> preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
> --------------------------------------------------
> latency: 597 us, entries: 12 (12)
> -----------------
> | task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0
> -----------------
> => started at: smp_apic_timer_interrupt+0x32/0xd0
> => ended at: smp_apic_timer_interrupt+0x86/0xd0
> =======>
> 00010000 0.000ms (+0.000ms): smp_apic_timer_interrupt (apic_timer_interrupt)
> 00010000 0.000ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
> 00010000 0.000ms (+0.000ms): profile_hook (profile_tick)
> 00010001 0.000ms (+0.595ms): notifier_call_chain (profile_hook)
> 00010000 0.595ms (+0.000ms): do_nmi (mcount)
> 00020000 0.596ms (+0.000ms): profile_tick (nmi_watchdog_tick)
> 00020000 0.596ms (+0.000ms): profile_hook (profile_tick)
> 00020001 0.597ms (+0.000ms): notifier_call_chain (profile_hook)
> 00020000 0.597ms (+689953.444ms): profile_hit (nmi_watchdog_tick)
> 00010001 689954.042ms (+1.141ms): update_process_times (do_timer)
> 00000001 0.597ms (+0.000ms): sub_preempt_count (smp_apic_timer_interrupt)
> 00000001 0.598ms (+0.000ms): update_max_trace (check_preempt_timing)
this is a pretty weird one. First it shows an apparently non-monotonic
RDTSC: the jump forward and backward in time around profile_hit. I
suspect the real RDTSC value was lower than the previous one and caused
an underflow. What is your cpu_khz in /proc/cpuinfo?
the other weird one is the +0.595 msec entry at notifier_call_chain().
That code is just a couple of instructions, so there is no real reason for
any overhead there.
could you try the attached robust-get-cycles.patch ontop of your current
tree and see whether it impacts these weirdnesses? The patch makes sure
that the cycle counter is sane: two subsequent readings of it were
monotonic and less than 1000 cycles apart.
this patch probably won't remove the +0.595 msec latency though - the
RDTSC value jumped forward there permanently. Maybe the RDTSC value is
somehow corrupted by NMIs - could you turn off the NMI watchdog to
check?
Ingo
[-- Attachment #2: robust-get-cycles.patch --]
[-- Type: text/plain, Size: 1594 bytes --]
--- linux/kernel/latency.c.orig2
+++ linux/kernel/latency.c
@@ -66,6 +66,18 @@ static unsigned long notrace cycles_to_u
return (unsigned long) delta;
}
+static cycles_t notrace robust_get_cycles(void)
+{
+ cycles_t t0 = get_cycles(), t1;
+
+ for (;;) {
+ t1 = get_cycles();
+ if (t1 - t0 < 1000)
+ return t1;
+ t0 = t1;
+ }
+}
+
#ifdef CONFIG_LATENCY_TRACE
unsigned int trace_enabled = 1;
@@ -89,7 +101,7 @@ ____trace(struct cpu_trace *tr, unsigned
entry = tr->trace + tr->trace_idx;
entry->eip = eip;
entry->parent_eip = parent_eip;
- entry->timestamp = get_cycles();
+ entry->timestamp = robust_get_cycles();
entry->preempt_count = preempt_count();
}
tr->trace_idx++;
@@ -295,7 +307,7 @@ check_preempt_timing(struct cpu_trace *t
return;
#endif
atomic_inc(&tr->disabled);
- latency = cycles_to_usecs(get_cycles() - tr->preempt_timestamp);
+ latency = cycles_to_usecs(robust_get_cycles() - tr->preempt_timestamp);
if (preempt_thresh) {
if (latency < preempt_thresh)
@@ -337,7 +349,7 @@ check_preempt_timing(struct cpu_trace *t
out:
#ifdef CONFIG_LATENCY_TRACE
tr->trace_idx = 0;
- tr->preempt_timestamp = get_cycles();
+ tr->preempt_timestamp = robust_get_cycles();
#endif
tr->critical_start = parent_eip;
__trace(eip, parent_eip);
@@ -376,7 +388,7 @@ void notrace add_preempt_count(int val)
struct cpu_trace *tr = &__get_cpu_var(trace);
local_irq_save(flags);
- tr->preempt_timestamp = get_cycles();
+ tr->preempt_timestamp = robust_get_cycles();
tr->critical_start = eip;
#ifdef CONFIG_LATENCY_TRACE
tr->trace_idx = 0;
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-03 11:49 ` Ingo Molnar
@ 2004-09-03 12:05 ` Thomas Charbonnel
2004-09-03 16:14 ` Thomas Charbonnel
1 sibling, 0 replies; 100+ messages in thread
From: Thomas Charbonnel @ 2004-09-03 12:05 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, luke, free78
Ingo Molnar wrote :
> * Thomas Charbonnel <thomas@undata.org> wrote:
>
> > I still get > 170 us latency from rtl8139 :
> > http://www.undata.org/~thomas/R1_rtl8139.trace
>
> this is a single-packet latency, we won't get much lower than this with
> the current techniques. Disabling ip_conntrack and tracing ought to
> lower the real latency somewhat.
>
Ok, I'll do that.
> > And again this one :
> > preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
> > --------------------------------------------------
> > latency: 597 us, entries: 12 (12)
> > -----------------
> > | task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0
> > -----------------
> > => started at: smp_apic_timer_interrupt+0x32/0xd0
> > => ended at: smp_apic_timer_interrupt+0x86/0xd0
> > =======>
> > 00010000 0.000ms (+0.000ms): smp_apic_timer_interrupt (apic_timer_interrupt)
> > 00010000 0.000ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
> > 00010000 0.000ms (+0.000ms): profile_hook (profile_tick)
> > 00010001 0.000ms (+0.595ms): notifier_call_chain (profile_hook)
> > 00010000 0.595ms (+0.000ms): do_nmi (mcount)
> > 00020000 0.596ms (+0.000ms): profile_tick (nmi_watchdog_tick)
> > 00020000 0.596ms (+0.000ms): profile_hook (profile_tick)
> > 00020001 0.597ms (+0.000ms): notifier_call_chain (profile_hook)
> > 00020000 0.597ms (+689953.444ms): profile_hit (nmi_watchdog_tick)
> > 00010001 689954.042ms (+1.141ms): update_process_times (do_timer)
> > 00000001 0.597ms (+0.000ms): sub_preempt_count (smp_apic_timer_interrupt)
> > 00000001 0.598ms (+0.000ms): update_max_trace (check_preempt_timing)
>
> this is a pretty weird one. First it shows an apparently non-monotonic
> RDTSC: the jump forward and backward in time around profile_hit. I
> suspect the real RDTSC value was lower than the previous one and caused
> an underflow. What is your cpu_khz in /proc/cpuinfo?
>
root@satellite thomas # cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Pentium(R) III Mobile CPU 1000MHz
stepping : 1
cpu MHz : 996.879
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1966.08
> the other weird one is the +0.595 msec entry at notifier_call_chain().
> That code is just a couple of instructions, so there is no real reason
> for any overhead there.
>
> could you try the attached robust-get-cycles.patch ontop of your current
> tree and see whether it impacts these weirdnesses? The patch makes sure
> that the cycle counter is sane: two subsequent readings of it were
> monotonic and less than 1000 cycles apart.
>
> this patch probably won't remove the +0.595 msec latency though - the
> RDTSC value jumped forward there permanently. Maybe the RDTSC value is
> somehow corrupted by NMIs - could you turn off the NMI watchdog to
> check?
>
> Ingo
I enabled the NMI watchdog precisely to track those weird latencies
down. My guess is still that when ACPI is enabled my BIOS does something
funky with SMM/SMI that increments the TSC. I'll try the patch and let
you know.
Thomas
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 8:13 ` Lee Revell
2004-09-03 8:21 ` Luke Yelavich
@ 2004-09-03 12:52 ` Luke Yelavich
1 sibling, 0 replies; 100+ messages in thread
From: Luke Yelavich @ 2004-09-03 12:52 UTC (permalink / raw)
To: Lee Revell
Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley,
Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, free78
On Fri, Sep 03, 2004 at 06:13:16PM EST, Lee Revell wrote:
> On Fri, 2004-09-03 at 04:09, Luke Yelavich wrote:
> > On Fri, Sep 03, 2004 at 05:40:34PM EST, Lee Revell wrote:
> > > On Fri, 2004-09-03 at 03:05, Ingo Molnar wrote:
> > Well with Lee's help, I think I have identified an ICE1712 sound driver issue,
> > but this is yet to be determined.
>
> Hmm, this one is still not fixed, using the latest VP patches?
With R2, it turns out that this problem still exists. If I turn off threading
for either ICE1712 soundcard when it is being used by JACK, I get xruns of
around 3 msecs.
I can't remember the info you needed. What is needed to debug this problem?
--
Luke Yelavich
http://www.audioslack.com
luke@audioslack.com
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-03 11:49 ` Ingo Molnar
2004-09-03 12:05 ` Thomas Charbonnel
@ 2004-09-03 16:14 ` Thomas Charbonnel
2004-09-03 17:36 ` Thomas Charbonnel
1 sibling, 1 reply; 100+ messages in thread
From: Thomas Charbonnel @ 2004-09-03 16:14 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, luke, free78
Ingo Molnar wrote :
> this is a pretty weird one. First it shows an apparently non-monotonic
> RDTSC: the jump forward and backward in time around profile_hit. I
> suspect the real RDTSC value was lower than the previous one and caused
> an underflow. What is your cpu_khz in /proc/cpuinfo?
>
> the other weird one is the +0.595 msec entry at notifier_call_chain().
> That code is just a couple of instructions, so there is no real reason
> for any overhead there.
>
> could you try the attached robust-get-cycles.patch ontop of your current
> tree and see whether it impacts these weirdnesses? The patch makes sure
> that the cycle counter is sane: two subsequent readings of it were
> monotonic and less than 1000 cycles apart.
>
> this patch probably won't remove the +0.595 msec latency though - the
> RDTSC value jumped forward there permanently. Maybe the RDTSC value is
> somehow corrupted by NMIs - could you turn off the NMI watchdog to
> check?
Here are more traces with robust-get-cycles applied. So far no
non-monotonic issue.
preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
--------------------------------------------------
latency: 626 us, entries: 25 (25)
-----------------
| task: ksoftirqd/0/2, uid:0 nice:-10 policy:0 rt_prio:0
-----------------
=> started at: preempt_schedule+0x3b/0x60
=> ended at: schedule+0x2e4/0x570
=======>
04000000 0.000ms (+0.000ms): preempt_schedule (need_resched)
04000000 0.000ms (+0.000ms): schedule (preempt_schedule)
04000001 0.000ms (+0.620ms): sched_clock (schedule)
04010001 0.620ms (+0.000ms): do_nmi (sched_clock)
04010001 0.621ms (+0.000ms): profile_tick (nmi_watchdog_tick)
04010001 0.622ms (+0.000ms): profile_hook (profile_tick)
04010002 0.622ms (+0.000ms): notifier_call_chain (profile_hook)
04010001 0.622ms (+0.000ms): preempt_schedule (profile_tick)
04010001 0.622ms (+0.000ms): profile_hit (nmi_watchdog_tick)
04000001 0.623ms (+0.000ms): smp_apic_timer_interrupt (sched_clock)
04010001 0.623ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
04010001 0.623ms (+0.000ms): profile_hook (profile_tick)
04010002 0.623ms (+0.000ms): notifier_call_chain (profile_hook)
04010001 0.624ms (+0.000ms): preempt_schedule (profile_tick)
04010001 0.624ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
04000002 0.624ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
04000002 0.624ms (+0.000ms): __do_softirq (do_softirq)
04000002 0.625ms (+0.000ms): dequeue_task (schedule)
04000002 0.625ms (+0.000ms): recalc_task_prio (schedule)
04000002 0.625ms (+0.000ms): effective_prio (recalc_task_prio)
04000002 0.625ms (+0.000ms): enqueue_task (schedule)
00000002 0.626ms (+0.000ms): __switch_to (schedule)
00000002 0.626ms (+0.000ms): finish_task_switch (schedule)
00000001 0.626ms (+0.000ms): sub_preempt_count (schedule)
00000001 0.627ms (+0.000ms): update_max_trace (check_preempt_timing)
preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
--------------------------------------------------
latency: 460 us, entries: 38 (38)
-----------------
| task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: do_IRQ+0x19/0x190
=> ended at: do_IRQ+0x14a/0x190
=======>
00010000 0.000ms (+0.000ms): do_IRQ (common_interrupt)
00010000 0.000ms (+0.000ms): do_IRQ (default_idle)
00010000 0.000ms (+0.000ms): do_IRQ (<00000000>)
00010001 0.000ms (+0.444ms): mask_and_ack_8259A (do_IRQ)
00020002 0.444ms (+0.000ms): do_nmi (mask_and_ack_8259A)
00020002 0.445ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00020002 0.446ms (+0.000ms): profile_hook (profile_tick)
00020003 0.446ms (+0.000ms): notifier_call_chain (profile_hook)
00020002 0.446ms (+0.001ms): profile_hit (nmi_watchdog_tick)
00010001 0.448ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
00010000 0.448ms (+0.000ms): generic_handle_IRQ_event (do_IRQ)
00010000 0.448ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event)
00010001 0.448ms (+0.005ms): mark_offset_tsc (timer_interrupt)
00010001 0.454ms (+0.000ms): do_timer (timer_interrupt)
00010001 0.454ms (+0.000ms): update_process_times (do_timer)
00010001 0.454ms (+0.000ms): update_one_process (update_process_times)
00010001 0.455ms (+0.000ms): run_local_timers (update_process_times)
00010001 0.455ms (+0.000ms): raise_softirq (update_process_times)
00010001 0.455ms (+0.000ms): scheduler_tick (update_process_times)
00010001 0.455ms (+0.000ms): sched_clock (scheduler_tick)
00010001 0.456ms (+0.000ms): update_wall_time (do_timer)
00010001 0.456ms (+0.000ms): update_wall_time_one_tick (update_wall_time)
00010001 0.456ms (+0.000ms): generic_note_interrupt (do_IRQ)
00010001 0.456ms (+0.000ms): end_8259A_irq (do_IRQ)
00010001 0.457ms (+0.001ms): enable_8259A_irq (do_IRQ)
00000001 0.458ms (+0.000ms): do_softirq (do_IRQ)
00000001 0.458ms (+0.000ms): __do_softirq (do_softirq)
00000001 0.458ms (+0.000ms): wake_up_process (do_softirq)
00000001 0.458ms (+0.000ms): try_to_wake_up (wake_up_process)
00000001 0.458ms (+0.000ms): task_rq_lock (try_to_wake_up)
00000002 0.459ms (+0.000ms): activate_task (try_to_wake_up)
00000002 0.459ms (+0.000ms): sched_clock (activate_task)
00000002 0.459ms (+0.000ms): recalc_task_prio (activate_task)
00000002 0.459ms (+0.000ms): effective_prio (recalc_task_prio)
00000002 0.459ms (+0.000ms): enqueue_task (activate_task)
00000001 0.460ms (+0.000ms): preempt_schedule (try_to_wake_up)
00000001 0.460ms (+0.000ms): sub_preempt_count (do_IRQ)
00000001 0.460ms (+0.000ms): update_max_trace (check_preempt_timing)
preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
--------------------------------------------------
latency: 557 us, entries: 39 (39)
-----------------
| task: evolution-1.4/5492, uid:1000 nice:0 policy:0 rt_prio:0
-----------------
=> started at: fget+0x32/0x80
=> ended at: fget+0x58/0x80
=======>
00000001 0.000ms (+0.537ms): fget (do_pollfd)
00010001 0.537ms (+0.000ms): do_nmi (sub_preempt_count)
00010001 0.538ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.538ms (+0.000ms): profile_hook (profile_tick)
00010002 0.539ms (+0.000ms): notifier_call_chain (profile_hook)
00010001 0.539ms (+0.001ms): profile_hit (nmi_watchdog_tick)
00010001 0.540ms (+0.000ms): do_IRQ (sub_preempt_count)
00010001 0.540ms (+0.000ms): do_IRQ (<00000000>)
00010002 0.541ms (+0.002ms): mask_and_ack_8259A (do_IRQ)
00010002 0.543ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
00010001 0.543ms (+0.000ms): generic_handle_IRQ_event (do_IRQ)
00010001 0.543ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event)
00010002 0.544ms (+0.005ms): mark_offset_tsc (timer_interrupt)
00010002 0.549ms (+0.000ms): do_timer (timer_interrupt)
00010002 0.550ms (+0.000ms): update_process_times (do_timer)
00010002 0.550ms (+0.000ms): update_one_process (update_process_times)
00010002 0.550ms (+0.000ms): run_local_timers (update_process_times)
00010002 0.550ms (+0.000ms): raise_softirq (update_process_times)
00010002 0.551ms (+0.000ms): scheduler_tick (update_process_times)
00010002 0.551ms (+0.000ms): sched_clock (scheduler_tick)
00010003 0.551ms (+0.000ms): task_timeslice (scheduler_tick)
00010002 0.552ms (+0.000ms): update_wall_time (do_timer)
00010002 0.552ms (+0.000ms): update_wall_time_one_tick (update_wall_time)
00010002 0.552ms (+0.000ms): generic_note_interrupt (do_IRQ)
00010002 0.553ms (+0.000ms): end_8259A_irq (do_IRQ)
00010002 0.553ms (+0.001ms): enable_8259A_irq (do_IRQ)
00000002 0.554ms (+0.000ms): do_softirq (do_IRQ)
00000002 0.554ms (+0.000ms): __do_softirq (do_softirq)
00000002 0.554ms (+0.000ms): wake_up_process (do_softirq)
00000002 0.555ms (+0.000ms): try_to_wake_up (wake_up_process)
00000002 0.555ms (+0.000ms): task_rq_lock (try_to_wake_up)
00000003 0.555ms (+0.000ms): activate_task (try_to_wake_up)
00000003 0.555ms (+0.000ms): sched_clock (activate_task)
00000003 0.555ms (+0.000ms): recalc_task_prio (activate_task)
00000003 0.556ms (+0.000ms): effective_prio (recalc_task_prio)
00000003 0.556ms (+0.000ms): enqueue_task (activate_task)
00000002 0.556ms (+0.000ms): preempt_schedule (try_to_wake_up)
00000001 0.557ms (+0.000ms): sub_preempt_count (fget)
00000001 0.557ms (+0.000ms): update_max_trace (check_preempt_timing)
I had similar problems affecting apparently random code paths before
enabling nmi_watchdog, but if you think it would be useful, I can do some
more tests with robust-get-cycles and without the NMI watchdog.
Thomas
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
2004-09-02 22:06 ` Lee Revell
2004-09-03 11:04 ` K.R. Foley
@ 2004-09-03 17:02 ` K.R. Foley
2004-09-03 20:40 ` Lee Revell
2004-09-03 17:10 ` K.R. Foley
3 siblings, 1 reply; 100+ messages in thread
From: K.R. Foley @ 2004-09-03 17:02 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
Ingo Molnar wrote:
> i've released the -R0 patch:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R0
>
> ontop of:
>
> http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
>
> i've given up on the netdev_backlog_granularity approach, and as a
> replacement i've modified specific network drivers to return at a safe
> point if softirq preemption is requested. This gives the same end result
> but is more robust. For the time being i've fixed 8139too.c and e100.c.
> (will fix up other drivers too as latencies get reported)
>
> this should fix the crash reported by P.O. Gaillard, and it should solve
> the packet delay/loss issues reported by Mark H Johnson. I cannot see
> any problems on my rtl8139 testbox anymore.
>
> Ingo
>
OK. I previously reported about the system hanging trying to boot this
patch. It may have been a fluke. In any event, it doesn't hang
consistently. I have rebooted several times now without problems. I do
however still get some of these:
(ksoftirqd/0/2): new 131 us maximum-latency critical section.
=> started at: <netif_receive_skb+0x82/0x280>
=> ended at: <netif_receive_skb+0x1d7/0x280>
[<c0136789>] check_preempt_timing+0x119/0x1f0
[<c0255017>] netif_receive_skb+0x1d7/0x280
[<c0255017>] netif_receive_skb+0x1d7/0x280
[<c013699d>] sub_preempt_count+0x4d/0x70
[<c013699d>] sub_preempt_count+0x4d/0x70
[<c0255017>] netif_receive_skb+0x1d7/0x280
[<d08d40b7>] e100_poll+0x5b7/0x630 [e100]
[<c013699d>] sub_preempt_count+0x4d/0x70
[<c025527f>] net_rx_action+0x7f/0x110
[<c011eb55>] ___do_softirq+0x75/0x90
[<c011ec1e>] _do_softirq+0xe/0x20
[<c011f034>] ksoftirqd+0x94/0xe0
[<c012f8aa>] kthread+0xaa/0xb0
[<c011efa0>] ksoftirqd+0x0/0xe0
[<c012f800>] kthread+0x0/0xb0
[<c0104575>] kernel_thread_helper+0x5/0x10
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
` (2 preceding siblings ...)
2004-09-03 17:02 ` K.R. Foley
@ 2004-09-03 17:10 ` K.R. Foley
2004-09-03 18:17 ` Ingo Molnar
2004-09-03 18:39 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
3 siblings, 2 replies; 100+ messages in thread
From: K.R. Foley @ 2004-09-03 17:10 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
Ingo Molnar wrote:
> i've released the -R0 patch:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R0
>
> ontop of:
>
> http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
>
Managed to hang the system again under heavy load. This time with the
above patch:
http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-R0.txt
Last time was with Q7:
http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-Q7.txt
kr
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-03 16:14 ` Thomas Charbonnel
@ 2004-09-03 17:36 ` Thomas Charbonnel
0 siblings, 0 replies; 100+ messages in thread
From: Thomas Charbonnel @ 2004-09-03 17:36 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson,
P.O. Gaillard, nando, luke, free78
I wrote :
> Ingo Molnar wrote :
> > this is a pretty weird one. First it shows an apparently non-monotonic
> > RDTSC: the jump forward and backward in time around profile_hit. I
> > suspect the real RDTSC value was lower than the previous one and caused
> > an underflow. What is your cpu_khz in /proc/cpuinfo?
> >
> > the other weird one is the +0.595 msec entry at notifier_call_chain().
> > That code is just a couple of instructions, so there is no real reason
> > for any overhead there.
> >
> > could you try the attached robust-get-cycles.patch ontop of your current
> > tree and see whether it impacts these weirdnesses? The patch makes sure
> > that the cycle counter is sane: two subsequent readings of it were
> > monotonic and less than 1000 cycles apart.
> >
> > this patch probably won't remove the +0.595 msec latency though - the
> > RDTSC value jumped forward there permanently. Maybe the RDTSC value is
> > somehow corrupted by NMIs - could you turn off the NMI watchdog to
> > check?
>
> Here are more traces with robust-get-cycles applied. So far no
> non-monotonic issue.
In the end here's one :
preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
--------------------------------------------------
latency: 891 us, entries: 38 (38)
-----------------
| task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: do_IRQ+0x19/0x190
=> ended at: do_IRQ+0x14a/0x190
=======>
00010000 0.000ms (+0.876ms): do_nmi (robust_get_cycles)
00020000 0.876ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00020000 0.876ms (+0.000ms): profile_hook (profile_tick)
00020001 0.876ms (+0.000ms): notifier_call_chain (profile_hook)
00020000 0.876ms (+689952.730ms): profile_hit (nmi_watchdog_tick)
04000002 689953.607ms (+1.855ms): finish_task_switch (schedule)
00010000 0.877ms (+0.000ms): do_IRQ (default_idle)
00010000 0.877ms (+0.000ms): do_IRQ (<00000000>)
00010001 0.877ms (+0.002ms): mask_and_ack_8259A (do_IRQ)
00010001 0.879ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
00010000 0.880ms (+0.000ms): generic_handle_IRQ_event (do_IRQ)
00010000 0.880ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event)
00010001 0.880ms (+0.005ms): mark_offset_tsc (timer_interrupt)
00010001 0.886ms (+0.000ms): do_timer (timer_interrupt)
00010001 0.886ms (+0.000ms): update_process_times (do_timer)
00010001 0.886ms (+0.000ms): update_one_process (update_process_times)
00010001 0.886ms (+0.000ms): run_local_timers (update_process_times)
00010001 0.886ms (+0.000ms): raise_softirq (update_process_times)
00010001 0.887ms (+0.000ms): scheduler_tick (update_process_times)
00010001 0.887ms (+0.000ms): sched_clock (scheduler_tick)
00010001 0.887ms (+0.000ms): update_wall_time (do_timer)
00010001 0.887ms (+0.000ms): update_wall_time_one_tick (update_wall_time)
00010001 0.888ms (+0.000ms): generic_note_interrupt (do_IRQ)
00010001 0.888ms (+0.000ms): end_8259A_irq (do_IRQ)
00010001 0.888ms (+0.001ms): enable_8259A_irq (do_IRQ)
00000001 0.889ms (+0.000ms): do_softirq (do_IRQ)
00000001 0.889ms (+0.000ms): __do_softirq (do_softirq)
00000001 0.889ms (+0.000ms): wake_up_process (do_softirq)
00000001 0.890ms (+0.000ms): try_to_wake_up (wake_up_process)
00000001 0.890ms (+0.000ms): task_rq_lock (try_to_wake_up)
00000002 0.890ms (+0.000ms): activate_task (try_to_wake_up)
00000002 0.890ms (+0.000ms): sched_clock (activate_task)
00000002 0.890ms (+0.000ms): recalc_task_prio (activate_task)
00000002 0.891ms (+0.000ms): effective_prio (recalc_task_prio)
00000002 0.891ms (+0.000ms): enqueue_task (activate_task)
00000001 0.891ms (+0.000ms): preempt_schedule (try_to_wake_up)
00000001 0.892ms (+0.000ms): sub_preempt_count (do_IRQ)
00000001 0.892ms (+0.000ms): update_max_trace (check_preempt_timing)
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 7:05 ` Ingo Molnar
2004-09-03 7:40 ` Lee Revell
@ 2004-09-03 18:09 ` K.R. Foley
1 sibling, 0 replies; 100+ messages in thread
From: K.R. Foley @ 2004-09-03 18:09 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Eric St-Laurent, linux-kernel, Felipe Alfaro Solana,
Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard
Ingo Molnar wrote:
> another question - any objections against me rebasing the VP patch to
> the current -mm tree and keeping it there exclusively until all possible
> merges are done? It would probably be quite some work to create a complete
> patch for the upstream or BK tree during that process, as small patches
> start to flow in the VP => -mm => BK direction. Would any of the regular
> VP users/testers be wary to use the -mm tree?
>
> Ingo
>
I don't have any objections. Should I be wary of running the -mm tree?
kr
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 17:10 ` K.R. Foley
@ 2004-09-03 18:17 ` Ingo Molnar
2004-09-03 18:36 ` K.R. Foley
2004-09-03 19:30 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 Ingo Molnar
2004-09-03 18:39 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
1 sibling, 2 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-03 18:17 UTC (permalink / raw)
To: K.R. Foley
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
* K.R. Foley <kr@cybsft.com> wrote:
> Managed to hang the system again under heavy load. This time with the
> above patch:
>
> http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-R0.txt
>
> Last time was with Q7:
>
> http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-Q7.txt
seems to be the same thing - an unbalanced preemption count, possibly
due to some locking error. Unfortunately the assert catches the
imbalance only at exit time. (it's unlikely that the do_exit() code is
buggy.)
i'll add a new feature to debug this: when crashing on an assert and
tracing is enabled the trace leading up to the crash will be printed to
the console. How did you capture the crash - was it in the syslog or do
you have serial logging? Maybe this is not the real crash but only a
followup crash?
Ingo
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 18:17 ` Ingo Molnar
@ 2004-09-03 18:36 ` K.R. Foley
2004-09-03 19:30 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 Ingo Molnar
1 sibling, 0 replies; 100+ messages in thread
From: K.R. Foley @ 2004-09-03 18:36 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
Ingo Molnar wrote:
> * K.R. Foley <kr@cybsft.com> wrote:
>
>
>>Managed to hang the system again under heavy load. This time with the
>>above patch:
>>
>>http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-R0.txt
>>
>>Last time was with Q7:
>>
>>http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-Q7.txt
>
>
> seems to be the same thing - an unbalanced preemption count, possibly
> due to some locking error. Unfortunately the assert catches the
> imbalance only at exit time. (it's unlikely that the do_exit() code is
> buggy.)
>
> i'll add a new feature to debug this: when crashing on an assert and
> tracing is enabled the trace leading up to the crash will be printed to
> the console. How did you capture the crash - was it in the syslog or do
> you have serial logging? Maybe this is not the real crash but only a
> followup crash?
>
> Ingo
>
This actually came from syslog. When it happened it was completely
locked, so I have no idea if there was more that didn't make it to disk.
Of course the monitor was blanked because most of the access is via the
network. I could probably enable serial logging if necessary because
it's sitting next to a couple of other machines.
kr
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 17:10 ` K.R. Foley
2004-09-03 18:17 ` Ingo Molnar
@ 2004-09-03 18:39 ` Ingo Molnar
2004-09-03 18:41 ` K.R. Foley
1 sibling, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-03 18:39 UTC (permalink / raw)
To: K.R. Foley
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
* K.R. Foley <kr@cybsft.com> wrote:
> Managed to hang the system again under heavy load. This time with the
> above patch:
>
> http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-R0.txt
do you have CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP
enabled? These are pretty good at e.g. catching illegal scheduling in
critical sections much earlier. You can enable them on !SMP kernels as
well.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 18:39 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
@ 2004-09-03 18:41 ` K.R. Foley
0 siblings, 0 replies; 100+ messages in thread
From: K.R. Foley @ 2004-09-03 18:41 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
Ingo Molnar wrote:
> * K.R. Foley <kr@cybsft.com> wrote:
>
>
>>Managed to hang the system again under heavy load. This time with the
>>above patch:
>>
>>http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-R0.txt
>
>
> do you have CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP
> enabled? These are pretty good at e.g. catching illegal scheduling in
> critical sections much earlier. You can enable them on !SMP kernels as
> well.
>
> Ingo
Not currently, but I am enabling them now.
kr
* [patch] voluntary-preempt-2.6.9-rc1-bk4-R3
2004-09-03 18:17 ` Ingo Molnar
2004-09-03 18:36 ` K.R. Foley
@ 2004-09-03 19:30 ` Ingo Molnar
2004-09-03 19:49 ` K.R. Foley
` (2 more replies)
1 sibling, 3 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-03 19:30 UTC (permalink / raw)
To: K.R. Foley
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
* Ingo Molnar <mingo@elte.hu> wrote:
> i'll add a new feature to debug this: when crashing on an assert and
> tracing is enabled the trace leading up to the crash will be printed
> to the console. [...]
the -R3 patch has this feature:
http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R3
you can enable it via enabling CONFIG_LATENCY_TRACE and doing:
echo 3 > /proc/sys/kernel/trace_enabled
it's all automatic from this point on: tracing will happen nonstop and
any assert or crash that prints the process stack will also print the
last 100 trace entries. Sample output:
Call Trace:
[<c0160401>] sys_munmap+0x61/0x80
[<c010520d>] sysenter_past_esp+0x52/0x71
Last 100 trace entries:
00000001: zap_pmd_range+0xe/0x90 <= (unmap_page_range+0x55/0x80)
00000001: stop_trace+0x8/0x20 <= (bust_spinlocks+0x20/0x60)
00000001: bust_spinlocks+0xe/0x60 <= (die+0xbc/0x2a0)
[... 97 more trace entries ...]
Please capture and mail the first 'extended' oops that triggers
(secondary followup traces are probably just side-effects of the crash),
it could give us clues about where the bug is.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3
2004-09-03 19:30 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 Ingo Molnar
@ 2004-09-03 19:49 ` K.R. Foley
2004-09-04 3:39 ` K.R. Foley
2004-09-04 3:43 ` K.R. Foley
2 siblings, 0 replies; 100+ messages in thread
From: K.R. Foley @ 2004-09-03 19:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>
>>i'll add a new feature to debug this: when crashing on an assert and
>>tracing is enabled the trace leading up to the crash will be printed
>>to the console. [...]
>
>
> the -R3 patch has this feature:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R3
>
> you can enable it via enabling CONFIG_LATENCY_TRACE and doing:
>
> echo 3 > /proc/sys/kernel/trace_enabled
>
> it's all automatic from this point on: tracing will happen nonstop and
> any assert or crash that prints the process stack will also print the
> last 100 trace entries. Sample output:
>
> Call Trace:
> [<c0160401>] sys_munmap+0x61/0x80
> [<c010520d>] sysenter_past_esp+0x52/0x71
> Last 100 trace entries:
> 00000001: zap_pmd_range+0xe/0x90 <= (unmap_page_range+0x55/0x80)
> 00000001: stop_trace+0x8/0x20 <= (bust_spinlocks+0x20/0x60)
> 00000001: bust_spinlocks+0xe/0x60 <= (die+0xbc/0x2a0)
> [... 97 more trace entries ...]
>
> Please capture and mail the first 'extended' oops that triggers
> (secondary followup traces are probably just side-effects of the crash),
> it could give us clues about where the bug is.
>
> Ingo
>
Building now.
kr
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
2004-09-03 17:02 ` K.R. Foley
@ 2004-09-03 20:40 ` Lee Revell
0 siblings, 0 replies; 100+ messages in thread
From: Lee Revell @ 2004-09-03 20:40 UTC (permalink / raw)
To: K.R. Foley
Cc: Ingo Molnar, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt,
Mark_H_Johnson, P.O. Gaillard
On Fri, 2004-09-03 at 13:02, K.R. Foley wrote:
> OK. I previously reported about the system hanging trying to boot this
> patch. It may have been a fluke. In any event, it doesn't hang
> consistently. I have rebooted several times now without problems. I do
> however still get some of these:
>
> (ksoftirqd/0/2): new 131 us maximum-latency critical section.
> => started at: <netif_receive_skb+0x82/0x280>
> => ended at: <netif_receive_skb+0x1d7/0x280>
This is actually a good sign if this is the most common one you see.
The netif_receive softirq is basically the longest code path that can't
easily be made preemptible; on my hardware about 150 usecs seems to be
the best we can do. This is really good; I can't imagine an application
that you would use PC hardware for where 150 usecs is not enough, though
I am sure the hard realtime crowd has a few.
Even fixing this one would not help much, because it seems there are
many other non-preemptible code paths that are only a little shorter
than this one.
Lee
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3
2004-09-03 19:30 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 Ingo Molnar
2004-09-03 19:49 ` K.R. Foley
@ 2004-09-04 3:39 ` K.R. Foley
2004-09-04 3:43 ` K.R. Foley
2 siblings, 0 replies; 100+ messages in thread
From: K.R. Foley @ 2004-09-04 3:39 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>
>>i'll add a new feature to debug this: when crashing on an assert and
>>tracing is enabled the trace leading up to the crash will be printed
>>to the console. [...]
>
>
> the -R3 patch has this feature:
>
After hammering the system for a little more than an hour it gave up. I
don't have the serial logging setup yet because I haven't had time this
evening. I will be glad to do whatever I can to try to help debug this,
but it will have to wait until tomorrow. The log is here:
http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt
kr
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3
2004-09-03 19:30 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 Ingo Molnar
2004-09-03 19:49 ` K.R. Foley
2004-09-04 3:39 ` K.R. Foley
@ 2004-09-04 3:43 ` K.R. Foley
2004-09-04 6:41 ` Ingo Molnar
2004-09-04 8:57 ` Ingo Molnar
2 siblings, 2 replies; 100+ messages in thread
From: K.R. Foley @ 2004-09-04 3:43 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>
>>i'll add a new feature to debug this: when crashing on an assert and
>>tracing is enabled the trace leading up to the crash will be printed
>>to the console. [...]
>
>
> the -R3 patch has this feature:
>
After hammering the system for a little more than an hour it gave up. I
don't have the serial logging setup yet because I haven't had time this
evening. I will be glad to do whatever I can to try to help debug this,
but it will have to wait until tomorrow. The log is here:
http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt
kr
Sorry, I forgot to mention that this was triggered by running the
stress-kernel package, minus the NFS-Compile test but including the
CRASHME test. In addition, amlat was running as well. The system was
pretty much 100% loaded.
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3
2004-09-04 3:43 ` K.R. Foley
@ 2004-09-04 6:41 ` Ingo Molnar
2004-09-04 12:28 ` K.R. Foley
2004-09-04 8:57 ` Ingo Molnar
1 sibling, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-04 6:41 UTC (permalink / raw)
To: K.R. Foley
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
* K.R. Foley <kr@cybsft.com> wrote:
> http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt
the first line seems partial - isn't the full oops in the log?
> Sorry I forgot to mention that this was triggered running the
> stress-kernel package, minus the NFS-Compile, but it does include the
> CRASHME test. In addition, amlat was running as well. The system was
> pretty much 100% loaded.
Have you run crashme as root? That would be unsafe.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3
2004-09-04 3:43 ` K.R. Foley
2004-09-04 6:41 ` Ingo Molnar
@ 2004-09-04 8:57 ` Ingo Molnar
2004-09-04 10:16 ` Lee Revell
2004-09-04 14:35 ` K.R. Foley
1 sibling, 2 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-04 8:57 UTC (permalink / raw)
To: K.R. Foley
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
* K.R. Foley <kr@cybsft.com> wrote:
> After hammering the system for a little more than an hour it gave up.
> I don't have the serial logging setup yet because I haven't had time
> this evening. I will be glad to do whatever I can to try to help debug
> this, but it will have to wait until tomorrow. The log is here:
>
> http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt
fyi, i have now triggered a similar crash on a testbox too. It takes
quite some time to trigger but it does.
since it happens with VP=0,KP=0,SP=0,HP=0 as well it should be one of
the cond_resched_lock() (or cond_resched()) additions.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3
2004-09-04 8:57 ` Ingo Molnar
@ 2004-09-04 10:16 ` Lee Revell
2004-09-04 14:35 ` K.R. Foley
1 sibling, 0 replies; 100+ messages in thread
From: Lee Revell @ 2004-09-04 10:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt,
Mark_H_Johnson, P.O. Gaillard
On Sat, 2004-09-04 at 04:57, Ingo Molnar wrote:
> * K.R. Foley <kr@cybsft.com> wrote:
>
> > After hammering the system for a little more than an hour it gave up.
> > I don't have the serial logging setup yet because I haven't had time
> > this evening. I will be glad to do whatever I can to try to help debug
> > this, but it will have to wait until tomorrow. The log is here:
> >
> > http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt
>
> fyi, i have now triggered a similar crash on a testbox too. It takes
> quite some time to trigger but it does.
>
> since it happens with VP=0,KP=0,SP=0,HP=0 as well it should be one of
> the cond_resched_lock() (or cond_resched()) additions.
>
Here are some results for R0 on another machine, a 1.2Ghz Athlon XP:
http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-R0
I have also changed the test again to be more accurate. Now the jackd
alsa driver separately keeps track of the time it spends in poll() and
the time it takes to run the jackd process cycle. The length of one
period, less the sum of the time we spent in poll() and the time it
took to run the process cycle, equals the amount of time we spent ready
to run but waiting to be scheduled - i.e. the scheduling latency.
The results are pretty amazing - out of a period time of 666 usecs,
most of the time we spend between 0 and 1 usec in this state. The worst
is 27 usecs or so.
These results are of course not directly comparable with previous tests,
but I believe this is the most accurate way to measure latency in jackd.
Lee
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3
2004-09-04 6:41 ` Ingo Molnar
@ 2004-09-04 12:28 ` K.R. Foley
0 siblings, 0 replies; 100+ messages in thread
From: K.R. Foley @ 2004-09-04 12:28 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
Ingo Molnar wrote:
> * K.R. Foley <kr@cybsft.com> wrote:
>
>
>>http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt
>
>
> the first line seems partial - isnt the full oops in the log?
Agreed. Unfortunately that is all there is.
>
>
>>Sorry I forgot to mention that this was triggered running the
>>stress-kernel package, minus the NFS-Compile, but it does include the
>>CRASHME test. In addition, amlat was running as well. The system was
>>pretty much 100% loaded.
>
>
> Have you run crashme as root? That would be unsafe.
Actually what happens is that it creates a "crashme" user (and group)
for running the test and then deletes the user after the test. In fact
the user is still in passwd; because of the crash it didn't get cleaned
up. So I don't think this SHOULD be a problem.
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3
2004-09-04 8:57 ` Ingo Molnar
2004-09-04 10:16 ` Lee Revell
@ 2004-09-04 14:35 ` K.R. Foley
2004-09-04 20:05 ` Ingo Molnar
1 sibling, 1 reply; 100+ messages in thread
From: K.R. Foley @ 2004-09-04 14:35 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
Ingo Molnar wrote:
> * K.R. Foley <kr@cybsft.com> wrote:
>
>
>>After hammering the system for a little more than an hour it gave up.
>>I don't have the serial logging setup yet because I haven't had time
>>this evening. I will be glad to do whatever I can to try to help debug
>>this, but it will have to wait until tomorrow. The log is here:
>>
>>http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt
>
>
> fyi, i have now triggered a similar crash on a testbox too. It takes
> quite some time to trigger but it does.
>
> since it happens with VP=0,KP=0,SP=0,HP=0 as well it should be one of
> the cond_resched_lock() (or cond_resched()) additions.
>
> Ingo
>
I am glad that it's reproducible for you as well. How did you trigger
it? It seems to crash only under heavy load for me. The system has
been up since I rebooted last night after the crash and I haven't seen
any problems; the same was true up until last night when I booted the
new patch. Even building the new patch didn't seem to be enough to
trigger it.
kr
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3
2004-09-04 14:35 ` K.R. Foley
@ 2004-09-04 20:05 ` Ingo Molnar
0 siblings, 0 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-04 20:05 UTC (permalink / raw)
To: K.R. Foley
Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell,
Mark_H_Johnson, P.O. Gaillard
* K.R. Foley <kr@cybsft.com> wrote:
> I am glad that it's reproducible for you as well. How did you trigger
> it? It seems to crash only under heavy load for me. The system has
> been up since I rebooted last night after the crash and I haven't seen
> any problems; the same was true up until last night when I booted the
> new patch. Even building the new patch didn't seem to be enough to
> trigger it.
i triggered it via two IO-intense scripts running on a 256 MB RAM
testbox:
cd /tmp; while true; do dd if=/dev/zero of=bigfile bs=1000000
count=500 >/dev/null 2>/dev/null; sync; date; sleep 60; done &
while true; do du /usr >/dev/null 2>/dev/null; date; done &
usually it triggers within half an hour or so of runtime. With -R4 it
didn't crash yet after ~1 hour of running, so i'm quite optimistic that
this particular bug has been fixed.
i think the key to reproduce was to use KP=0,SP=0,HP=0 - this
concentrates all preemption activity to the places the VP patch adds -
amongst them the one in the do_exit() path that is necessary to trigger
the race.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-07 16:54 Mark_H_Johnson
2004-09-08 18:42 ` Ingo Molnar
2004-09-09 16:02 ` Ingo Molnar
0 siblings, 2 replies; 100+ messages in thread
From: Mark_H_Johnson @ 2004-09-07 16:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
Results from this morning's test with -R1 and some follow up on
related messages.
Two runs with -R1 plus the following changes
- a change to the ensonic driver to reduce latency
- added mcount() calls in sched.c and timer_tsc.c
- disabled inline functions in sched.c
The last two so we can see the steps a little more clearly.
The differences between the runs were:
- first run had DRI and hardware acceleration enabled
- second run disabled DRI / hardware acceleration
as suggested by Lee Revell.
>From Lee Revell <rlrevell@joe-job.com>
>This is looking more and more like a video driver problem:
>...
>The easiest way to eliminate this possibility is to disable DRI and set
>'Option "NoAccel"' in your X config. Do you get the same mysterious
>latencies with this setting?
I am not sure I see any significant differences in latencies. Most of
the large delays occur during disk reads, and the number of traces >
500 usec was over 100 in both tests (about 30 minutes each). There were
also more latency traces per minute during disk reads when DRI was
disabled. However, none of the "find_next_bit" traces with significant
time delays appeared in the NoAccel run, so that may indicate some
display problem, but I am not sure. See below for details.
Side Comment
============
If you look at the date / time of the traces, you will notice that
most occur in the latter part of the test. This is during the
"disk copy" and "disk read" parts of the testing. To illustrate, the
first 10 traces take about 20 minutes to collect; the last 90 are
collected in 15 and 9 minutes (first and second test).
It is also encouraging that the longest trace is < 800 usec in these
two runs. This is far better than what I saw a couple weeks ago.
Shortest trace
==============
preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
--------------------------------------------------
latency: 550 us, entries: 6 (6)
-----------------
| task: cat/6771, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: kmap_atomic+0x23/0xe0
=> ended at: kunmap_atomic+0x7b/0xa0
=======>
00000001 0.000ms (+0.000ms): kmap_atomic (file_read_actor)
00000001 0.000ms (+0.000ms): page_address (file_read_actor)
00000001 0.000ms (+0.549ms): __copy_to_user_ll (file_read_actor)
00000001 0.550ms (+0.000ms): kunmap_atomic (file_read_actor)
00000001 0.550ms (+0.000ms): sub_preempt_count (kunmap_atomic)
00000001 0.550ms (+0.000ms): update_max_trace (check_preempt_timing)
The 1/2 msec latency in __copy_to_user_ll occurred a few more times
in both tests.
Find Next Bit
=============
It appears that -R1 has significantly fewer cases where the code in
find_next_bit is delayed. To summarize:
# grep find_next_bit lt040903/lt*/lt.* | grep '(+0.1' | wc -l
63
# grep find_next_bit lt040907/lt*/lt.* | grep '(+0.1' | wc -l
0
[counting number of trace entries w/ 100-199 usec latencies]
The runs on September 3 were with -Q9; today's (September 7) were
with -R1. Not sure why this changed, but it is encouraging. The
maximum with DRI / display acceleration was +0.069ms; the maximum
without DRI was +0.001ms.
Mark Offset TSC
===============
In a similar way, long latencies in mark_offset_tsc were significantly
reduced in -R1.
# grep _tsc lt040903/lt*/lt.* | grep '(+0.1' | wc -l
24
# grep _tsc lt040907/lt*/lt.* | grep '(+0.1' | wc -l
3
Two of the long delays were in the same trace in the following
sequence:
00010001 0.140ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event)
00010001 0.140ms (+0.000ms): spin_lock (timer_interrupt)
00010002 0.140ms (+0.000ms): spin_lock (<00000000>)
00010002 0.141ms (+0.000ms): mark_offset_tsc (timer_interrupt) [0]
00010002 0.141ms (+0.000ms): mark_offset_tsc (timer_interrupt) [1]
00010002 0.141ms (+0.000ms): spin_lock (mark_offset_tsc)
00010003 0.141ms (+0.000ms): spin_lock (<00000000>)
00010003 0.141ms (+0.131ms): mark_offset_tsc (timer_interrupt) [2]
00010003 0.273ms (+0.000ms): mark_offset_tsc (timer_interrupt) [3]
00010003 0.273ms (+0.000ms): spin_lock (mark_offset_tsc)
00010004 0.273ms (+0.000ms): spin_lock (<00000000>)
00010004 0.273ms (+0.145ms): mark_offset_tsc (timer_interrupt) [4]
00010004 0.419ms (+0.004ms): mark_offset_tsc (timer_interrupt) [5]
00010004 0.423ms (+0.002ms): mark_offset_tsc (timer_interrupt) [6]
00010004 0.425ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010004 0.425ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010003 0.426ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010003 0.426ms (+0.046ms): mark_offset_tsc (timer_interrupt)
00010003 0.472ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010002 0.472ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event)
For reference, the steps in the code read (w/o comments):
mcount(); [1]
write_seqlock(&monotonic_lock);
mcount(); [2]
last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
rdtsc(last_tsc_low, last_tsc_high);
mcount(); [3]
spin_lock(&i8253_lock);
mcount(); [4]
outb_p(0x00, PIT_MODE); /* latch the count ASAP */
mcount(); [5]
count = inb_p(PIT_CH0); /* read the latched count */
mcount(); [6]
count |= inb(PIT_CH0) << 8;
mcount(); [7]
I numbered the traces / corresponding mcount() calls. If I labeled this
right, it appears the delays were in the rdtsc call and the outb_p that
latches the count. Perhaps a hardware-level delay is taking place.
Clear Page Tables
=================
This is the longest single latency with the following traces:
# grep '(+0.6' lt040907/lt*/lt.*
lt040907/lt001.v3k1/lt.28:00000001 0.001ms (+0.635ms): clear_page_tables
(exit_mmap)
lt040907/lt002.v3k1/lt.75:00000001 0.001ms (+0.628ms): clear_page_tables
(exit_mmap)
__modify_IO_APIC_irq
====================
Not quite sure what is happening here, but this trace
preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
--------------------------------------------------
latency: 612 us, entries: 111 (111)
-----------------
| task: kblockd/0/10, uid:0 nice:-10 policy:0 rt_prio:0
-----------------
=> started at: __spin_lock_irqsave+0x4b/0xa0
=> ended at: as_work_handler+0x5c/0xa0
=======>
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (spin_lock_irq)
00000001 0.000ms (+0.000ms): generic_enable_irq (ide_do_request)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (generic_enable_irq)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000002 0.001ms (+0.000ms): unmask_IO_APIC_irq (generic_enable_irq)
00000002 0.001ms (+0.000ms): __spin_lock_irqsave (unmask_IO_APIC_irq)
00000002 0.001ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000003 0.001ms (+0.000ms): __unmask_IO_APIC_irq (unmask_IO_APIC_irq)
00000003 0.002ms (+0.567ms): __modify_IO_APIC_irq (__unmask_IO_APIC_irq)
00010001 0.569ms (+0.000ms): do_nmi (smp_apic_timer_interrupt)
00010001 0.569ms (+0.002ms): do_nmi (sched_clock)
00010001 0.572ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.572ms (+0.000ms): profile_hook (profile_tick)
00010001 0.573ms (+0.000ms): read_lock (profile_hook)
00010002 0.573ms (+0.000ms): read_lock (<00000000>)
00010002 0.573ms (+0.000ms): notifier_call_chain (profile_hook)
00010001 0.574ms (+0.000ms): profile_hit (nmi_watchdog_tick)
00000001 0.575ms (+0.000ms): smp_apic_timer_interrupt (as_work_handler)
is an example where __modify_IO_APIC_irq shows up with a long latency.
There were about a dozen long latencies (> 100 usec) in the two tests
run this morning, slightly fewer (6 vs 8) in the first test with DRI
and display acceleration turned on. This appears to be an increase
with -R1; tests over three days (-Q5 through -Q9) only had 4 long
latencies at this spot. [Hmm - that might be due to the change in
the way we compute / display the latencies]
Spin Lock
=========
We seem to have gotten stuck here in a spin lock...
=> started at: spin_lock+0x34/0x90
=> ended at: journal_commit_transaction+0xa25/0x1c30
=======>
00000001 0.000ms (+0.559ms): spin_lock (journal_commit_transaction)
and here...
=> started at: __read_lock_irqsave+0x93/0xa0
=> ended at: snd_pcm_lib_write1+0x3ba/0x600
=======>
00000001 0.000ms (+0.000ms): __read_lock_irqsave (read_lock_irq)
00000001 0.000ms (+0.000ms): spin_lock (snd_pcm_lib_write1)
00000002 0.000ms (+0.000ms): spin_lock (<00000000>)
00000002 0.000ms (+0.000ms): snd_pcm_update_hw_ptr (snd_pcm_lib_write1)
...
00000002 0.008ms (+0.000ms): snd_ensoniq_trigger (snd_pcm_do_stop)
00000002 0.009ms (+0.000ms): spin_lock (snd_ensoniq_trigger)
00000003 0.009ms (+0.549ms): spin_lock (<00000000>)
Overall, the traces referring to spin_lock were...
# grep 'spin_lock (' lt040907/lt*/* | grep -v '+0.0' | wc -l
35
# grep 'spin_lock (' lt040903/lt*/* | grep -v '+0.0' | wc -l
11
More of these traces in -R1 than -Q9, but I am not sure if this is
significant or not.
Context Switch
==============
With the inlined functions disabled, and some extra debug code,
I saw context_switch appear > 800 times in the long latency traces.
It is usually very fast (< 10 usec) but is sometimes listed with long
latencies (> 100 usec, in 45 trace lines).
For example:
=> started at: schedule+0x5b/0x610
=> ended at: schedule+0x385/0x610
...
00000002 0.004ms (+0.000ms): schedule (do_irqd)
00000002 0.005ms (+0.000ms): context_switch (schedule)
00000002 0.005ms (+0.000ms): dummy_cs_entry (context_switch)
00000002 0.005ms (+0.000ms): context_switch (schedule)
00000002 0.005ms (+0.000ms): dummy_cs_switch_mm (context_switch)
00000002 0.005ms (+0.111ms): context_switch (schedule)
00000002 0.116ms (+0.000ms): dummy_cs_unlikely_if (context_switch)
00000002 0.117ms (+0.000ms): context_switch (schedule)
00000002 0.117ms (+0.000ms): dummy_cs_switch_to (context_switch)
00000002 0.117ms (+0.029ms): context_switch (schedule)
00000002 0.147ms (+0.128ms): __switch_to (context_switch)
00000002 0.275ms (+0.001ms): dummy_cs_exit (context_switch)
00000002 0.277ms (+0.155ms): context_switch (schedule)
00000002 0.432ms (+0.000ms): finish_task_switch (schedule)
00000002 0.433ms (+0.000ms): smp_apic_timer_interrupt (finish_task_switch)
00010002 0.433ms (+0.137ms): profile_tick (smp_apic_timer_interrupt)
00010002 0.571ms (+0.000ms): profile_hook (profile_tick)
00010002 0.571ms (+0.000ms): read_lock (profile_hook)
00010003 0.571ms (+0.000ms): read_lock (<00000000>)
00010003 0.571ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.572ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
The dummy_cs lines refer to calls I added to context_switch at each
step to read...
	dummy_cs_entry();
	if (unlikely(!mm)) {
		next->active_mm = oldmm;
		atomic_inc(&oldmm->mm_count);
		dummy_cs_lazy_tlb();
		enter_lazy_tlb(oldmm, next);
	} else {
		dummy_cs_switch_mm();
		switch_mm(oldmm, mm, next);
	}
	dummy_cs_unlikely_if();
	if (unlikely(!prev->mm)) {
		prev->active_mm = NULL;
		WARN_ON(rq->prev_mm);
		rq->prev_mm = oldmm;
	}
	/* Here we just switch the register state and the stack. */
	dummy_cs_switch_to();
	switch_to(prev, next, prev);
	dummy_cs_exit();
	return prev;
The switch_mm path of the if appears to be more costly time-wise
than the lazy_tlb path. To see this from the detailed traces, try
something like...
grep -C2 'dummy_cs_switch_mm' lt040907/lt*/* | less -im
or
grep -C2 'dummy_cs_lazy_tlb' lt040907/lt*/* | less -im
Closing
=======
I will send copies of the traces (not to the mailing lists) in a few
minutes for review / analysis.
I also downloaded the -R8 patch and other associated patches and will
be building that soon.
--Mark
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-07 16:54 [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Mark_H_Johnson
@ 2004-09-08 18:42 ` Ingo Molnar
2004-09-10 2:31 ` K.R. Foley
2004-09-09 16:02 ` Ingo Molnar
1 sibling, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-08 18:42 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> If you look at the date / time of the traces, you will notice that
> most occur in the latter part of the test. This is during the "disk
> copy" and "disk read" parts of the testing. [...]
would it be possible to test with DMA disabled? (hdparm -d0 /dev/hda) It
might take some extra work to shun the extra latency reports from the
PIO IDE path (which is quite slow) but once that is done you should be
able to see whether these long 0.5 msec delays remain even if all (most)
DMA activity has been eliminated.
> preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
> --------------------------------------------------
> latency: 550 us, entries: 6 (6)
> -----------------
> | task: cat/6771, uid:0 nice:0 policy:0 rt_prio:0
> -----------------
> => started at: kmap_atomic+0x23/0xe0
> => ended at: kunmap_atomic+0x7b/0xa0
> =======>
> 00000001 0.000ms (+0.000ms): kmap_atomic (file_read_actor)
> 00000001 0.000ms (+0.000ms): page_address (file_read_actor)
> 00000001 0.000ms (+0.549ms): __copy_to_user_ll (file_read_actor)
> 00000001 0.550ms (+0.000ms): kunmap_atomic (file_read_actor)
> 00000001 0.550ms (+0.000ms): sub_preempt_count (kunmap_atomic)
> 00000001 0.550ms (+0.000ms): update_max_trace (check_preempt_timing)
this is a full page copy, from a kernelspace pagecache page into
userspace. This shouldn't take 500 usecs on any hardware. Since this is
a single instruction (memcpy's rep; movsl instruction) there's nothing
that Linux can do to avoid (or even to cause) such a situation.
> 00010002 0.141ms (+0.000ms): mark_offset_tsc (timer_interrupt) [0]
> 00010002 0.141ms (+0.000ms): mark_offset_tsc (timer_interrupt) [1]
> 00010002 0.141ms (+0.000ms): spin_lock (mark_offset_tsc)
> 00010003 0.141ms (+0.000ms): spin_lock (<00000000>)
> 00010003 0.141ms (+0.131ms): mark_offset_tsc (timer_interrupt) [2]
> 00010003 0.273ms (+0.000ms): mark_offset_tsc (timer_interrupt) [3]
note that there's no spinning on the spinlock; the (<00000000>) shows
that there was no contention at all.
> For reference, the steps in the code read (w/o comments):
>
> mcount(); [1]
> write_seqlock(&monotonic_lock);
> mcount(); [2]
> last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
> rdtsc(last_tsc_low, last_tsc_high);
> mcount(); [3]
the 131 usec delay occurred between [2] and [3]; if you check the
assembly, there are only 14 instructions between those two mcount()
calls:
1a0: 31 db xor %ebx,%ebx
1a2: 8b 0d 10 00 00 00 mov 0x10,%ecx
1a4: R_386_32 .bss
1a8: a1 14 00 00 00 mov 0x14,%eax
1a9: R_386_32 .bss
1ad: 89 c2 mov %eax,%edx
1af: 31 c0 xor %eax,%eax
1b1: 89 c7 mov %eax,%edi
1b3: 09 cf or %ecx,%edi
1b5: 89 7d e0 mov %edi,0xffffffe0(%ebp)
1b8: 89 d7 mov %edx,%edi
1ba: 09 df or %ebx,%edi
1bc: 89 7d e4 mov %edi,0xffffffe4(%ebp)
1bf: 0f 31 rdtsc
1c1: 89 15 14 00 00 00 mov %edx,0x14
1c3: R_386_32 .bss
1c7: a3 10 00 00 00 mov %eax,0x10
1c8: R_386_32 .bss
no loop, no nothing. No way can this take 131 usecs without hardware
effects.
> Clear Page Tables
> =================
>
> This is the longest single latency with the following traces:
>
> # grep '(+0.6' lt040907/lt*/lt.*
> lt040907/lt001.v3k1/lt.28:00000001 0.001ms (+0.635ms): clear_page_tables (exit_mmap)
> lt040907/lt002.v3k1/lt.75:00000001 0.001ms (+0.628ms): clear_page_tables (exit_mmap)
this one might be a real latency - but it's hard to tell while there are
random 500 usec latencies all over the place.
> __modify_IO_APIC_irq
> ====================
> 00000003 0.001ms (+0.000ms): __unmask_IO_APIC_irq (unmask_IO_APIC_irq)
> 00000003 0.002ms (+0.567ms): __modify_IO_APIC_irq (__unmask_IO_APIC_irq)
> 00010001 0.569ms (+0.000ms): do_nmi (smp_apic_timer_interrupt)
this too seems to be one of these random 500 usec latencies that have no
connection whatsoever to what is being done. It's just some unfortunate
piece of code that is more likely to access the memory bus or happens to
be on a page boundary or something like that.
> Spin Lock
> =========
>
> We seem to have gotten stuck here in a spin lock...
none of the spinlocks had a counter different from zero so there was no
contention. The extra trace entry after a spinlock:
> 00000002 0.000ms (+0.000ms): spin_lock (<00000000>)
shows the number of times the spinlock had to spin internally before it
got the lock. For real contention this should be some large nonzero
number.
> 00000002 0.008ms (+0.000ms): snd_ensoniq_trigger (snd_pcm_do_stop)
> 00000002 0.009ms (+0.000ms): spin_lock (snd_ensoniq_trigger)
> 00000003 0.009ms (+0.549ms): spin_lock (<00000000>)
this too seems to be caused by that 'magic' latency - a noncontended
spinlock cannot take 500 usecs to execute ...
> Context Switch
> ==============
same for the context-switch codepath. I'm very convinced that the 'magic
latencies' are distributed more or less randomly across kernel code.
Code that accesses the main memory bus is more likely to be affected by
DMA starvation - that's what makes some of these functions more
prominent than others.
> 00000002 0.005ms (+0.000ms): dummy_cs_switch_mm (context_switch)
> 00000002 0.005ms (+0.111ms): context_switch (schedule)
since this includes a cr3 flush it is very likely accessing main memory.
(it's possibly re-fetching lots of TLB entries from a new pagetable
which is likely cache-cold.)
so my main candidate is still IDE DMA. Please disable IDE DMA and see
what happens (after hiding the PIO IDE codepath via
touch_preempt_timing()).
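something like the following should do it (untested here - /dev/hda
taken from your earlier reports, adjust as needed):

```shell
# disable IDE DMA for the test run, verify, then restore afterwards
hdparm -d0 /dev/hda
hdparm /dev/hda | grep using_dma    # should now report 0 (off)
# ... run the disk read/write workloads ...
hdparm -d1 /dev/hda
```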
Ingo
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-08 20:33 Mark_H_Johnson
2004-09-08 22:31 ` Alan Cox
` (3 more replies)
0 siblings, 4 replies; 100+ messages in thread
From: Mark_H_Johnson @ 2004-09-08 20:33 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mark_H_Johnson, Lee Revell, Free Ekanayaka, Eric St-Laurent,
linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
P.O. Gaillard, nando, luke, free78
>.... Please disable IDE DMA and see
>what happens (after hiding the PIO IDE codepath via
>touch_preempt_timing()).
Not quite sure where to add touch_preempt_timing() calls - somewhere in the
loop in ide_outsl and ide_insl? [so we keep resetting the start/end
times?]
I did collect some initial data - most are of the form:
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8
-------------------------------------------------------
latency: 545 us, entries: 8 (8) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: IRQ 14/140, uid:0 nice:-10 policy:0 rt_prio:0
-----------------
=> started at: kmap_atomic+0x23/0xe0
=> ended at: kunmap_atomic+0x7b/0xa0
=======>
00000001 0.000ms (+0.000ms): kmap_atomic (ide_multwrite)
00000001 0.000ms (+0.001ms): page_address (ide_multwrite)
00000001 0.001ms (+0.000ms): taskfile_output_data (ide_multwrite)
00000001 0.002ms (+0.000ms): ata_output_data (taskfile_output_data)
00000001 0.003ms (+0.542ms): ide_outsl (ata_output_data)
00000001 0.545ms (+0.000ms): kunmap_atomic (ide_multwrite)
00000001 0.545ms (+0.000ms): sub_preempt_count (kunmap_atomic)
00000001 0.546ms (+0.000ms): update_max_trace (check_preempt_timing)
Which I am sure is what you expected.
I did get a couple that looked like this:
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8
-------------------------------------------------------
latency: 1261 us, entries: 73 (73) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: sleep/4859, uid:0 nice:-9 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (do_wp_page)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): kmap_atomic (do_wp_page)
00000002 0.000ms (+0.000ms): page_address (do_wp_page)
00000002 0.001ms (+0.000ms): page_remove_rmap (do_wp_page)
00000002 0.002ms (+0.001ms): flush_tlb_page (do_wp_page)
00000003 0.003ms (+0.000ms): flush_tlb_others (flush_tlb_page)
00000003 0.003ms (+0.000ms): spin_lock (flush_tlb_others)
00000004 0.004ms (+0.001ms): spin_lock (<00000000>)
00000004 0.005ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000004 0.005ms (+0.321ms): send_IPI_mask_bitmask (flush_tlb_others)
00010004 0.327ms (+0.000ms): do_nmi (flush_tlb_others)
00010004 0.327ms (+0.003ms): do_nmi (kallsyms_lookup)
00010004 0.331ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010004 0.331ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010004 0.331ms (+0.000ms): profile_hook (profile_tick)
00010004 0.332ms (+0.000ms): read_lock (profile_hook)
00010005 0.332ms (+0.000ms): read_lock (<00000000>)
00010005 0.332ms (+0.000ms): notifier_call_chain (profile_hook)
00010005 0.332ms (+0.000ms): _read_unlock (profile_tick)
00010004 0.333ms (+0.091ms): profile_hit (nmi_watchdog_tick)
00000004 0.424ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
...
00000005 0.431ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000005 0.431ms (+0.000ms): __do_softirq (do_softirq)
00000005 0.432ms (+0.000ms): wake_up_process (do_softirq)
00000005 0.432ms (+0.000ms): try_to_wake_up (wake_up_process)
00000005 0.433ms (+0.000ms): task_rq_lock (try_to_wake_up)
00000005 0.433ms (+0.000ms): spin_lock (task_rq_lock)
00000006 0.433ms (+0.000ms): spin_lock (<00000000>)
00000006 0.433ms (+0.000ms): wake_idle (try_to_wake_up)
00000006 0.434ms (+0.000ms): activate_task (try_to_wake_up)
00000006 0.434ms (+0.000ms): sched_clock (activate_task)
00000006 0.434ms (+0.000ms): recalc_task_prio (activate_task)
00000006 0.435ms (+0.000ms): effective_prio (recalc_task_prio)
00000006 0.435ms (+0.000ms): __activate_task (try_to_wake_up)
00000006 0.436ms (+0.000ms): enqueue_task (__activate_task)
00000006 0.436ms (+0.000ms): sched_info_queued (enqueue_task)
00000006 0.437ms (+0.000ms): resched_task (try_to_wake_up)
00000006 0.437ms (+0.000ms): task_rq_unlock (try_to_wake_up)
00000006 0.437ms (+0.000ms): _spin_unlock_irqrestore (try_to_wake_up)
00000005 0.437ms (+0.820ms): preempt_schedule (try_to_wake_up)
00000004 1.258ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000003 1.258ms (+0.000ms): preempt_schedule (flush_tlb_others)
00000002 1.258ms (+0.000ms): preempt_schedule (flush_tlb_page)
...
or this
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8
-------------------------------------------------------
latency: 1210 us, entries: 113 (113) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: get_ltrace.sh/3071, uid:0 nice:-9 policy:0 rt_prio:0
-----------------
=> started at: flush_tlb_mm+0x1d/0xd0
=> ended at: flush_tlb_mm+0x52/0xd0
=======>
00000001 0.000ms (+0.001ms): flush_tlb_mm (copy_mm)
00000001 0.001ms (+0.000ms): flush_tlb_others (flush_tlb_mm)
00000001 0.001ms (+0.000ms): spin_lock (flush_tlb_others)
00000002 0.002ms (+0.000ms): spin_lock (<00000000>)
00000002 0.003ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000002 0.003ms (+0.378ms): send_IPI_mask_bitmask (flush_tlb_others)
00000002 0.381ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
...
00010003 0.459ms (+0.000ms): preempt_schedule (timer_interrupt)
00010003 0.460ms (+0.000ms): do_timer (timer_interrupt)
00010003 0.460ms (+0.000ms): update_wall_time (do_timer)
00010003 0.460ms (+0.000ms): update_wall_time_one_tick (update_wall_time)
00010003 0.460ms (+0.000ms): _spin_unlock (timer_interrupt)
00010002 0.460ms (+0.000ms): preempt_schedule (timer_interrupt)
00010002 0.461ms (+0.000ms): spin_lock (do_IRQ)
00010003 0.461ms (+0.000ms): spin_lock (<00000000>)
00010003 0.461ms (+0.000ms): generic_note_interrupt (do_IRQ)
00010003 0.461ms (+0.000ms): end_edge_ioapic_irq (do_IRQ)
00010003 0.461ms (+0.000ms): _spin_unlock (do_IRQ)
00010002 0.462ms (+0.000ms): preempt_schedule (do_IRQ)
00000003 0.462ms (+0.000ms): do_softirq (do_IRQ)
00000003 0.462ms (+0.747ms): __do_softirq (do_softirq)
00000002 1.209ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000001 1.210ms (+0.000ms): preempt_schedule (flush_tlb_others)
00000001 1.210ms (+0.000ms): sub_preempt_count (flush_tlb_mm)
00000001 1.211ms (+0.000ms): update_max_trace (check_preempt_timing)
where I had > 1 msec latencies w/o any apparent references to the IDE.
--Mark
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-08 20:33 Mark_H_Johnson
@ 2004-09-08 22:31 ` Alan Cox
2004-09-09 16:44 ` Thomas Charbonnel
2004-09-09 6:27 ` Ingo Molnar
` (2 subsequent siblings)
3 siblings, 1 reply; 100+ messages in thread
From: Alan Cox @ 2004-09-08 22:31 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Ingo Molnar, Lee Revell, Free Ekanayaka, Eric St-Laurent,
Linux Kernel Mailing List, K.R. Foley, Felipe Alfaro Solana,
Daniel Schmitt, P.O. Gaillard, nando, luke, free78
On Mer, 2004-09-08 at 21:33, Mark_H_Johnson@raytheon.com wrote:
> >.... Please disable IDE DMA and see
> >what happens (after hiding the PIO IDE codepath via
> >touch_preempt_timing()).
>
> Not quite sure where to add touch_preempt_timing() calls - somewhere in the
> loop in ide_outsl and ide_insl? [so we keep resetting the start /end
> times?]
If you haven't done hdparm -u1, that may be a reason you want to touch
these. To defend against some very bad old h/w where a stall in the I/O
stream to the disk causes corruption, we disable IRQs across the
transfer in PIO mode by default.
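something like this (for /dev/hda - note -u1 is only safe on controllers
that don't have the stall problem mentioned above):

```shell
# let other IRQs be serviced during IDE PIO transfers
hdparm -u1 /dev/hda
hdparm /dev/hda | grep unmaskirq    # should report 1 (on)
```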
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-08 20:33 Mark_H_Johnson
2004-09-08 22:31 ` Alan Cox
@ 2004-09-09 6:27 ` Ingo Molnar
2004-09-09 15:12 ` Ingo Molnar
2004-09-09 15:33 ` Ingo Molnar
3 siblings, 0 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-09 6:27 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> Not quite sure where to add touch_preempt_timing() calls - somewhere in the
> loop in ide_outsl and ide_insl? [so we keep resetting the start /end
> times?]
> 00000001 0.002ms (+0.000ms): ata_output_data (taskfile_output_data)
> 00000001 0.003ms (+0.542ms): ide_outsl (ata_output_data)
> 00000001 0.545ms (+0.000ms): kunmap_atomic (ide_multwrite)
yeah, i'd add it to ide_outsl. Something like the patch below. (it
compiles but is untested otherwise.)
another possibility besides DMA starvation is starvation between CPUs.
The only way to exclude DMA as a source of CPU starvation would be to
try a maxcpus=1 test using the SMP kernel. You don't even have to run the
RT-latency tester for this - just the disk read/write workload should
trigger the latency traces!
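e.g. with grub, something like this (kernel image name and root device
are just placeholders - use whatever your existing boot entry has):

```shell
# /boot/grub/menu.lst entry - boot the SMP kernel but bring up only CPU#0
kernel /boot/vmlinuz-2.6.9-rc1 ro root=/dev/hda1 maxcpus=1
```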
Ingo
--- linux/drivers/ide/ide-iops.c.orig
+++ linux/drivers/ide/ide-iops.c
@@ -56,7 +56,15 @@ static u32 ide_inl (unsigned long port)
static void ide_insl (unsigned long port, void *addr, u32 count)
{
- insl(port, addr, count);
+ unsigned int chunk, offset = 0;
+
+ while (count) {
+ chunk = min(128U, count);
+ insl(port, addr + offset, chunk);
+ count -= chunk;
+ offset += chunk;
+ touch_preempt_timing();
+ }
}
static void ide_outb (u8 val, unsigned long port)
@@ -86,7 +94,15 @@ static void ide_outl (u32 val, unsigned
static void ide_outsl (unsigned long port, void *addr, u32 count)
{
- outsl(port, addr, count);
+ unsigned int chunk, offset = 0;
+
+ while (count) {
+ chunk = min(128U, count);
+ outsl(port, addr + offset, chunk);
+ count -= chunk;
+ offset += chunk;
+ touch_preempt_timing();
+ }
}
void default_hwif_iops (ide_hwif_t *hwif)
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-08 20:33 Mark_H_Johnson
2004-09-08 22:31 ` Alan Cox
2004-09-09 6:27 ` Ingo Molnar
@ 2004-09-09 15:12 ` Ingo Molnar
2004-09-09 15:33 ` Ingo Molnar
3 siblings, 0 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-09 15:12 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> 00000004 0.005ms (+0.321ms): send_IPI_mask_bitmask (flush_tlb_others)
> 00010004 0.327ms (+0.000ms): do_nmi (flush_tlb_others)
> 00010004 0.327ms (+0.003ms): do_nmi (kallsyms_lookup)
these traces happen if you forget to do 'dmesg -n 1' to turn off console
output. One CPU does a latency-printout with interrupts disabled, and
this CPU keeps waiting in flush_tlb_others(). The second do_nmi() entry
(which is a special one, showing the other CPU's activity) indeed shows
that the CPU is executing in kallsyms_lookup() - possibly it's in the
middle of a latency printout.
i haven't seen this type of trace with 'dmesg -n 1'.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-08 20:33 Mark_H_Johnson
` (2 preceding siblings ...)
2004-09-09 15:12 ` Ingo Molnar
@ 2004-09-09 15:33 ` Ingo Molnar
3 siblings, 0 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-09 15:33 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
> >.... Please disable IDE DMA and see
> >what happens (after hiding the PIO IDE codepath via
> >touch_preempt_timing()).
here is a list of a couple of more or less independent ideas regarding
your latency problems. (Some of these i might have suggested before.)
1)
we talked about external SMM events (handled by BIOS code and not
influenced/disabled by Linux) being a possible source of these 'mystic'
latencies. In SMP systems, there's a good likelyhood that SMM's are only
executed on CPU#0.
one good way to exclude SMM as a possibility would be to switch the CPUs
in the test. When you start an RT CPU-intense task to do the latency
test, is that task bound to any particular CPU? If yes and it's CPU#1,
then could you bind it to CPU#0? That would shift almost all kernel
activities to CPU#1. Likewise, could you set the
/proc/irq/*/smp_affinity masks to be '2' for all interrupts - to handle
all IRQs on CPU#1.
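something like this should do the switch (the latencytest invocation is
only a placeholder for however you start the RT task):

```shell
# pin the RT latency task to CPU#0 (taskset is from schedutils)
taskset -c 0 ./latencytest
# route all IRQs to CPU#1 (affinity mask 2)
for irq in /proc/irq/[0-9]*; do
        echo 2 > $irq/smp_affinity
done
```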
but i don't expect SMMs to be the source of these latencies. BIOS writers
are supposed to fix up the TSC in SMM handlers to make them as seamless
as possible. So seeing frequent 500+ usec SMMs with no TSC fixup seems
weird.
2)
the maxcpus=1 workload-only (non-latencytest) test should also give an
important datapoint: is this phenomenon caused by SMP?
3)
it would still be nice if you could try the tests on a different type of
SMP hardware, as i cannot reproduce your results on 3 different types of
SMP hardware. E.g. one of my testsystems is a dual 466 MHz Celeron
(Mendocino) box, and the biggest latencies very closely track the UP
results reported by others - the highest latency in an IO-intense
workload is 192 usecs. The box is using IDE DMA. This system is roughly
half as fast as your 800+ MHz P3 box. I also have tried tests on a dual
P4 and a dual P3 - both have maximum latencies below 100 usecs and never
produce traces in weird places like the scheduler or copy_page_ll.
4)
your previous experiments have shown that disabling DRI in X does not
eliminate these weird latencies. Another DMA agent is IDE, which you are
testing now via the PIO-mode experiment. There's a third DMA agent:
audio cards do DMA too - have you tried to eliminate the audio component
of the test? Isn't there a latencytest mode that doesn't use an audio card
but /dev/rtc? [or just running the workloads without running latencytest
should be ok too to trigger the latency traces.]
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-07 16:54 [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Mark_H_Johnson
2004-09-08 18:42 ` Ingo Molnar
@ 2004-09-09 16:02 ` Ingo Molnar
1 sibling, 0 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-09 16:02 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> Results from this morning's test with -R1 and some follow up on
> related messages.
lt002.v3k1/lt.10 is particularly interesting:
00000001 0.000ms (+0.000ms): spin_lock (rtl8139_poll)
00000001 0.000ms (+0.070ms): spin_lock (<00000000>)
00000001 0.070ms (+0.070ms): rtl8139_rx (rtl8139_poll)
00000001 0.140ms (+0.070ms): alloc_skb (rtl8139_rx)
00000001 0.210ms (+0.070ms): kmem_cache_alloc (alloc_skb)
00000001 0.280ms (+0.073ms): __kmalloc (alloc_skb)
00000001 0.354ms (+0.139ms): eth_type_trans (rtl8139_rx)
00000001 0.493ms (+0.076ms): netif_receive_skb (rtl8139_rx)
00000002 0.570ms (+0.001ms): packet_rcv_spkt (netif_receive_skb)
this too shows the CPU in 'slow motion' in a codepath that normally
executes 10 times faster than this on a 100 MHz Pentium Classic ...
another interesting thing is that the unit of delay seems to be around
70 usecs. As if under certain circumstances every main memory access
created a 70 usec hit. (a cachemiss perhaps?) The eth_type_trans entry
perhaps generated 2 main memory accesses (2 cachemisses?) or so.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-08 22:31 ` Alan Cox
@ 2004-09-09 16:44 ` Thomas Charbonnel
0 siblings, 0 replies; 100+ messages in thread
From: Thomas Charbonnel @ 2004-09-09 16:44 UTC (permalink / raw)
To: Alan Cox
Cc: Mark_H_Johnson, Ingo Molnar, Lee Revell, Free Ekanayaka,
Eric St-Laurent, Linux Kernel Mailing List, K.R. Foley,
Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard, nando, luke,
free78
Alan Cox wrote :
> On Mer, 2004-09-08 at 21:33, Mark_H_Johnson@raytheon.com wrote:
> > >.... Please disable IDE DMA and see
> > >what happens (after hiding the PIO IDE codepath via
> > >touch_preempt_timing()).
> >
> > Not quite sure where to add touch_preempt_timing() calls - somewhere in the
> > loop in ide_outsl and ide_insl? [so we keep resetting the start /end
> > times?]
>
> If you haven't done hdparm -u1 that may be a reason you want to touch
> these. To defend against some very bad old h/w where a stall in the I/O
> stream to the disk causes corruption we disable IRQ's across the
> transfer in PIO mode by default.
>
I had the exact same problem showing in the output of latencytest, and
enabling unmaskirq on the drive being stressed solved it, thanks !
See this for the problem :
http://www.undata.org/~thomas/unmaskirq_0/index.html
and this for the (impressive) results :
http://www.undata.org/~thomas/unmaskirq_1/index.html
Thomas
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-09 18:31 Mark_H_Johnson
0 siblings, 0 replies; 100+ messages in thread
From: Mark_H_Johnson @ 2004-09-09 18:31 UTC (permalink / raw)
To: Alan Cox
Cc: Mark_H_Johnson, Ingo Molnar, Lee Revell, Free Ekanayaka,
Eric St-Laurent, Linux Kernel Mailing List, K.R. Foley,
Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard, nando, luke,
free78
>If you haven't done hdparm -u1 that may be a reason you want to touch
>these.
Alas, but
# hdparm /dev/hda
/dev/hda:
multcount = 16 (on)
IO_support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 0 (off)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 64 (on)
geometry = 58168/16/63, sectors = 58633344, start = 0
so I already have IRQs unmasked. [this was during the no-DMA tests; I
usually run with DMA enabled]
I'll be commenting on the results of the no DMA tests shortly.
--Mark
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-09 19:47 ` Ingo Molnar
@ 2004-09-09 19:02 ` Alan Cox
0 siblings, 0 replies; 100+ messages in thread
From: Alan Cox @ 2004-09-09 19:02 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mark_H_Johnson, Lee Revell, Free Ekanayaka, Eric St-Laurent,
Linux Kernel Mailing List, K.R. Foley, Felipe Alfaro Solana,
Daniel Schmitt, P.O. Gaillard, nando, luke, free78
On Iau, 2004-09-09 at 20:47, Ingo Molnar wrote:
> there are also tools that tweak chipsets directly - powertweak and the
> PCI latency settings. Maybe something tweaks the IDE controller just the
> right way. Also, try disabling specific controller support in the
> .config (or turn it on) - by chance the generic IDE code could program
> the IDE controller in a way that generates nicer DMA.
Do not tweak your IDE controller's PCI latency if it's anything but the
base motherboard chipset - not unless you make backups first and test it
very hard. It's a particularly popular area for errata.
You could also try lower DMA modes. I don't know if that will help or
not, but UDMA133 is essentially one entire 32-bit/33MHz PCI bus worth of
traffic, so the effects are not too surprising. IDE control ports can
give you long stalls too, especially if IORDY is in use.
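e.g. (untested - hdparm -X takes 64 plus the UDMA mode number, so 66
selects UDMA2; check first which modes your drive/controller support):

```shell
# drop the drive to UDMA mode 2 for the test
hdparm -X 66 /dev/hda               # 64 + 2 = UDMA2
hdparm -i /dev/hda | grep -i udma   # active mode is marked with '*'
```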
Alan
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-09 19:23 Mark_H_Johnson
2004-09-09 19:47 ` Ingo Molnar
0 siblings, 1 reply; 100+ messages in thread
From: Mark_H_Johnson @ 2004-09-09 19:23 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
>would it be possible to test with DMA disabled? (hdparm -d0 /dev/hda) It
>might take some extra work to shun the extra latency reports from the
>PIO IDE path (which is quite slow) but once that is done you should be
>able to see whether these long 0.5 msec delays remain even if all (most)
>DMA activity has been eliminated.
OK. With new patches in hand, I have a set of latency results w/ IDE
DMA turned off. For reference, tests were run with:
# hdparm /dev/hda
/dev/hda:
multcount = 16 (on)
IO_support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 0 (off)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 64 (on)
geometry = 58168/16/63, sectors = 58633344, start = 0
# cat /sys/block/hda/queue/max_sectors_kb
32
# cat /sys/block/hda/queue/read_ahead_kb
32
# cat /proc/sys/net/core/netdev_max_backlog
8
# dmesg -n 1
and all tests run w/ a -R8 kernel plus patches for sched, timer_tsc,
and ide-iops (to add latency trace outputs or suppress known long paths).
No latency traces > 600 usec. These tests were run simply with
head -c 750000000 /dev/zero >tmpfile (disk writes)
cp tmpfile tmpfile2 (disk copy)
cat tmpfile tmpfile2 >/dev/null
while capturing the latency traces in another process. 72 total traces
captured in 15 minutes of tests.
For reference, the 750 meg file size is about 1.5 x physical memory.
No I/O to the audio card. X was running (to monitor the test) and a
network was connected (but generally idle).
PIO trace
=========
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 369 us, entries: 3 (3) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: IRQ 14/140, uid:0 nice:-10 policy:0 rt_prio:0
-----------------
=> started at: ide_outsl+0x44/0x50
=> ended at: ide_outsl+0x44/0x50
=======>
00000001 0.000ms (+0.370ms): touch_preempt_timing (ide_outsl)
00000001 0.370ms (+0.000ms): touch_preempt_timing (ide_outsl)
00000001 0.370ms (+0.000ms): update_max_trace (check_preempt_timing)
which appears to be the PIO path ide_outsl. I had a few similar traces
with ide_insl during the copy / read tests as well.
send_IPI_mask_bitmask
=====================
I have several traces where send_IPI_mask_bitmask (flush_tlb_others)
shows up. For example...
00000002 0.010ms (+0.001ms): kunmap_atomic (zap_pte_range)
00000001 0.011ms (+0.000ms): flush_tlb_mm (unmap_vmas)
00000002 0.012ms (+0.000ms): flush_tlb_others (flush_tlb_mm)
00000002 0.012ms (+0.000ms): spin_lock (flush_tlb_others)
00000003 0.013ms (+0.001ms): spin_lock (<00000000>)
00000003 0.014ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000003 0.014ms (+0.132ms): send_IPI_mask_bitmask (flush_tlb_others)
00010003 0.147ms (+0.000ms): do_nmi (flush_tlb_others)
00010003 0.147ms (+0.001ms): do_nmi (ide_outsl)
00010003 0.149ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010003 0.149ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010003 0.150ms (+0.000ms): profile_hook (profile_tick)
00010003 0.150ms (+0.000ms): read_lock (profile_hook)
00010004 0.150ms (+0.000ms): read_lock (<00000000>)
00010004 0.150ms (+0.000ms): notifier_call_chain (profile_hook)
00010004 0.151ms (+0.000ms): _read_unlock (profile_tick)
00010003 0.151ms (+0.250ms): profile_hit (nmi_watchdog_tick)
00000003 0.401ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000001 0.402ms (+0.000ms): free_pages_and_swap_cache (unmap_vmas)
00000001 0.402ms (+0.000ms): lru_add_drain (free_pages_and_swap_cache)
or this one...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 445 us, entries: 22 (22) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (page_referenced_file)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.001ms (+0.001ms): prio_tree_first (vma_prio_tree_next)
00000001 0.003ms (+0.001ms): prio_tree_left (prio_tree_first)
00000001 0.005ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.005ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.006ms (+0.001ms): spin_lock (<00000000>)
00000002 0.007ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.007ms (+0.001ms): page_address (page_referenced_one)
00000003 0.009ms (+0.000ms): flush_tlb_page (page_referenced_one)
00000004 0.010ms (+0.000ms): flush_tlb_others (flush_tlb_page)
00000004 0.010ms (+0.000ms): spin_lock (flush_tlb_others)
00000005 0.011ms (+0.001ms): spin_lock (<00000000>)
00000005 0.012ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000005 0.012ms (+0.431ms): send_IPI_mask_bitmask (flush_tlb_others)
00000005 0.444ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000003 0.445ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.445ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.445ms (+0.000ms): _spin_unlock (page_referenced_file)
00000001 0.446ms (+0.000ms): sub_preempt_count (_spin_unlock)
00000001 0.446ms (+0.000ms): update_max_trace (check_preempt_timing)
or this one...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 428 us, entries: 49 (49) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: get_ltrace.sh/5514, uid:0 nice:-9 policy:0 rt_prio:0
-----------------
=> started at: flush_tlb_mm+0x1d/0xd0
=> ended at: flush_tlb_mm+0x52/0xd0
=======>
00000001 0.000ms (+0.001ms): flush_tlb_mm (copy_mm)
00000001 0.001ms (+0.000ms): flush_tlb_others (flush_tlb_mm)
00000001 0.002ms (+0.000ms): spin_lock (flush_tlb_others)
00000002 0.002ms (+0.000ms): spin_lock (<00000000>)
00000002 0.003ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000002 0.003ms (+0.415ms): send_IPI_mask_bitmask (flush_tlb_others)
00000002 0.419ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
00010002 0.419ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010002 0.419ms (+0.000ms): profile_hook (profile_tick)
00010002 0.419ms (+0.000ms): read_lock (profile_hook)
00010003 0.420ms (+0.000ms): read_lock (<00000000>)
00010003 0.420ms (+0.000ms): notifier_call_chain (profile_hook)
00010003 0.420ms (+0.000ms): _read_unlock (profile_tick)
00010002 0.420ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010002 0.420ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
...
or this one...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 330 us, entries: 9 (9) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: get_ltrace.sh/5514, uid:0 nice:-9 policy:0 rt_prio:0
-----------------
=> started at: flush_tlb_mm+0x1d/0xd0
=> ended at: flush_tlb_mm+0x52/0xd0
=======>
00000001 0.000ms (+0.001ms): flush_tlb_mm (copy_mm)
00000001 0.001ms (+0.000ms): flush_tlb_others (flush_tlb_mm)
00000001 0.002ms (+0.000ms): spin_lock (flush_tlb_others)
00000002 0.002ms (+0.000ms): spin_lock (<00000000>)
00000002 0.003ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000002 0.004ms (+0.326ms): send_IPI_mask_bitmask (flush_tlb_others)
00000002 0.330ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000001 0.331ms (+0.000ms): sub_preempt_count (flush_tlb_mm)
00000001 0.331ms (+0.000ms): update_max_trace (check_preempt_timing)
try_to_wake_up
==============
Buried inside a pretty long trace in kswapd0, I saw the following...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 426 us, entries: 286 (286) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (page_referenced_file)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.000ms (+0.000ms): prio_tree_first (vma_prio_tree_next)
00000001 0.001ms (+0.000ms): prio_tree_left (prio_tree_first)
00000001 0.001ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.001ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.002ms (+0.000ms): spin_lock (<00000000>)
...
00000006 0.108ms (+0.000ms): task_rq_lock (try_to_wake_up)
00000006 0.108ms (+0.000ms): spin_lock (task_rq_lock)
00000007 0.108ms (+0.000ms): spin_lock (<00000000>)
00000007 0.109ms (+0.000ms): wake_idle (try_to_wake_up)
00000007 0.109ms (+0.000ms): activate_task (try_to_wake_up)
00000007 0.109ms (+0.000ms): sched_clock (activate_task)
00000007 0.109ms (+0.000ms): recalc_task_prio (activate_task)
00000007 0.109ms (+0.000ms): effective_prio (recalc_task_prio)
00000007 0.110ms (+0.000ms): __activate_task (try_to_wake_up)
00000007 0.110ms (+0.000ms): enqueue_task (__activate_task)
00000007 0.110ms (+0.000ms): sched_info_queued (enqueue_task)
00000007 0.110ms (+0.000ms): resched_task (try_to_wake_up)
00000007 0.111ms (+0.000ms): task_rq_unlock (try_to_wake_up)
00000007 0.111ms (+0.000ms): _spin_unlock_irqrestore (try_to_wake_up)
00000006 0.111ms (+0.298ms): preempt_schedule (try_to_wake_up)
00000005 0.409ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000004 0.409ms (+0.000ms): preempt_schedule (flush_tlb_others)
00000003 0.410ms (+0.000ms): preempt_schedule (flush_tlb_page)
00000003 0.410ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.410ms (+0.000ms): preempt_schedule (page_referenced_one)
00000002 0.410ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.410ms (+0.000ms): preempt_schedule (page_referenced_one)
There is an almost 300 usec hit there as we unwind the nested layers.
So the long waits on paths through sched and timer_tsc appear to be
eliminated with PIO to the disk.
Is there some "for sure" way to limit the size and/or duration of
DMA transfers so I get reasonable performance from the disk (and
other DMA devices) and reasonable latency?
--Mark
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-09 19:23 Mark_H_Johnson
@ 2004-09-09 19:47 ` Ingo Molnar
2004-09-09 19:02 ` Alan Cox
0 siblings, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-09 19:47 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> PIO trace
> =========
> 00000001 0.000ms (+0.370ms): touch_preempt_timing (ide_outsl)
> 00000001 0.370ms (+0.000ms): touch_preempt_timing (ide_outsl)
Please decrease the 128U chunking to 8U or so.
> I have several traces where send_IPI_mask_bitmask (flush_tlb_others)
> shows up. For example...
flush_tlb_others() is a good indicator of irqs-off sections on the other
CPU. The code does the following when it flushes TLBs: it sends an IPI
(inter-processor interrupt) to all other CPUs (one CPU in your case) and
waits for arrival of that IRQ and completion while spinning on a flag.
The IPI normally takes 10 usecs or so to process so this is not an
issue. BUT if the CPU has IRQs disabled then the IPI is delayed and the
IRQs-off latency shows up as flush_tlb_others() latency.
> 00000003 0.014ms (+0.132ms): send_IPI_mask_bitmask (flush_tlb_others)
> 00010003 0.147ms (+0.000ms): do_nmi (flush_tlb_others)
> 00010003 0.147ms (+0.001ms): do_nmi (ide_outsl)
Since the other CPU's do_nmi() implicates ide_outsl, could it be that we
are doing ide_outsl with IRQs disabled? Could you add something like
this to the ide_outsl code:
if (irqs_disabled() && printk_ratelimit())
        dump_stack();
(the most common irqs-off section is the latency printout itself - this
triggers if the latency message goes to the console - i.e. 'dmesg -n 1'
wasn't done.)
> Buried inside a pretty long trace in kswapd0, I saw the following...
> 00000007 0.111ms (+0.000ms): _spin_unlock_irqrestore (try_to_wake_up)
> 00000006 0.111ms (+0.298ms): preempt_schedule (try_to_wake_up)
> 00000005 0.409ms (+0.000ms): _spin_unlock (flush_tlb_others)
this too is flush_tlb_others() related.
> So the long waits on paths through sched and timer_tsc appear to be
> eliminated with PIO to the disk.
yeah, nice. I'd still like to make sure that we've not hidden latencies
by working down the ide_outsl() latency and its apparent IRQs-off
property.
> Is there some "for sure" way to limit the size and/or duration of DMA
> transfers so I get reasonable performance from the disk (and other DMA
> devices) and reasonable latency?
the 'unit' of the 'weird' delays seems to be around 70 usecs, the maximum
seems to be around 560 usecs. Note the 1:8 relationship between the two.
You have 32 KB as max_sectors, so the 70 usecs unit is for a single 4K
transfer which is a single scatter-gather entry: it all makes perfect
sense. 4K per 70 usecs means a DMA rate of ~57 MB/sec which sounds
reasonable.
so if these assumptions are true i'd suggest to decrease max_sectors
from 32K to 16K - that alone should halve these random latencies from
560 usecs to 280 usecs. Unless you see stability problems you might want
to try an 8K setting as well - this will likely decrease your IO rates
an 8K setting as well - this will likely decrease your IO rates
noticeably though. This would reduce the random delays to 140 usecs.
but the real fix would be to tweak the IDE controller to not do so
aggressive DMA! Are there any BIOS settings that somehow deal with it?
Try increasing the PCI latency value? Is the disk using UDMA - if yes,
could you downgrade it to normal IDE DMA? Perhaps that tweaks the
controller to be 'nice' to the CPU. Is your IDE chipset integrated on
the motherboard? Could you send me your full bootlog (off-list)?
there are also tools that tweak chipsets directly - powertweak and the
PCI latency settings. Maybe something tweaks the IDE controller just the
right way. Also, try disabling specific controller support in the
.config (or turn it on) - by chance the generic IDE code could program
the IDE controller in a way that generates nicer DMA.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-09 22:41 Mark_H_Johnson
0 siblings, 0 replies; 100+ messages in thread
From: Mark_H_Johnson @ 2004-09-09 22:41 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
>so if these assumptions are true i'd suggest to decrease max_sectors
>from 32K to 16K - that alone should halve these random latencies from
>560 usecs to 280 usecs. Unless you see stability problems you might want
>to try an 8K setting as well - this will likely decrease your IO rates
>an 8K setting as well - this will likely decrease your IO rates
>noticeably though. This would reduce the random delays to 140 usecs.
I tried both and still had some long delays. For example:
00000003 0.015ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000003 0.015ms (+0.137ms): send_IPI_mask_bitmask (flush_tlb_others)
00010003 0.153ms (+0.000ms): do_nmi (flush_tlb_others)
00010003 0.153ms (+0.001ms): do_nmi (ide_outsl)
00010003 0.155ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010003 0.155ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010003 0.155ms (+0.000ms): profile_hook (profile_tick)
00010003 0.156ms (+0.000ms): read_lock (profile_hook)
00010004 0.156ms (+0.000ms): read_lock (<00000000>)
00010004 0.156ms (+0.000ms): notifier_call_chain (profile_hook)
00010004 0.156ms (+0.000ms): _read_unlock (profile_tick)
00010003 0.157ms (+0.290ms): profile_hit (nmi_watchdog_tick)
00000003 0.447ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000001 0.447ms (+0.000ms): free_pages_and_swap_cache (unmap_vmas)
or this...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 440 us, entries: 34 (34) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000002 0.000ms (+0.410ms): spin_lock (<00000000>)
00010002 0.411ms (+0.000ms): do_IRQ (get_swap_page)
or this...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 548 us, entries: 8 (8) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000002 0.000ms (+0.547ms): spin_lock (<00000000>)
00000002 0.548ms (+0.000ms): _spin_unlock (get_swap_page)
00000001 0.548ms (+0.000ms): _spin_unlock (get_swap_page)
00000001 0.548ms (+0.000ms): sub_preempt_count (_spin_unlock)
00000001 0.549ms (+0.000ms): update_max_trace (check_preempt_timing)
or this...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 422 us, entries: 345 (345) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: kblockd/0/12, uid:0 nice:-10 policy:0 rt_prio:0
-----------------
=> started at: do_IRQ+0x19/0x1a0
=> ended at: do_IRQ+0x14a/0x1a0
=======>
00010000 0.000ms (+0.000ms): do_IRQ (common_interrupt)
00010000 0.000ms (+0.000ms): do_IRQ (kthread_should_stop)
00010000 0.000ms (+0.000ms): do_IRQ (<0000000a>)
00010000 0.000ms (+0.000ms): spin_lock (do_IRQ)
00010001 0.000ms (+0.052ms): spin_lock (<00000000>)
00010001 0.053ms (+0.000ms): mask_and_ack_level_ioapic_irq (do_IRQ)
00010001 0.053ms (+0.000ms): mask_IO_APIC_irq (mask_and_ack_level_ioapic_irq)
00010001 0.053ms (+0.000ms): __spin_lock_irqsave (mask_IO_APIC_irq)
00010002 0.053ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00010002 0.053ms (+0.000ms): __mask_IO_APIC_irq (mask_IO_APIC_irq)
00010002 0.054ms (+0.220ms): __modify_IO_APIC_irq (__mask_IO_APIC_irq)
00010002 0.274ms (+0.000ms): _spin_unlock_irqrestore (mask_IO_APIC_irq)
00010001 0.274ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
Kswapd0
=======
Not sure I saw this before - may be a side effect of the smaller
transfer sizes...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 500 us, entries: 608 (608) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (page_referenced_file)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.000ms (+0.000ms): prio_tree_first (vma_prio_tree_next)
00000001 0.000ms (+0.000ms): prio_tree_left (prio_tree_first)
00000001 0.000ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.001ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.001ms (+0.000ms): spin_lock (<00000000>)
00000002 0.001ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.001ms (+0.000ms): page_address (page_referenced_one)
00000003 0.002ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.002ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.002ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.002ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.003ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.003ms (+0.000ms): spin_lock (<00000000>)
...
00000001 0.480ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.481ms (+0.001ms): page_referenced_one (page_referenced_file)
00000001 0.483ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.484ms (+0.001ms): spin_lock (<00000000>)
00000002 0.485ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.486ms (+0.001ms): page_address (page_referenced_one)
00000003 0.487ms (+0.000ms): flush_tlb_page (page_referenced_one)
00000003 0.488ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.489ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.489ms (+0.001ms): vma_prio_tree_next (page_referenced_file)
00000001 0.491ms (+0.002ms): page_referenced_one (page_referenced_file)
00000001 0.494ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.494ms (+0.002ms): spin_lock (<00000000>)
00000002 0.497ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.497ms (+0.000ms): page_address (page_referenced_one)
00000003 0.498ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.499ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.499ms (+0.002ms): _spin_unlock (page_referenced_file)
00000001 0.501ms (+0.002ms): sub_preempt_count (_spin_unlock)
00000001 0.503ms (+0.000ms): update_max_trace (check_preempt_timing)
>but the real fix would be to tweak the IDE controller to not do so
>aggressive DMA! Are there any BIOS settings that somehow deal with it?
>Try increasing the PCI latency value? Is the disk using UDMA - if yes,
>could you downgrade it to normal IDE DMA? Perhaps that tweaks the
>controller to be 'nice' to the CPU.
I tried
hdparm -p -X udma2 /dev/hda
(since it was udma4 previously)
and reran the tests. As a side note, that command itself caused a long latency.
Something like...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 2019 us, entries: 2829 (2829) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: hdparm/13753, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: cond_resched+0xd/0x30
=> ended at: cond_resched+0xd/0x30
=======>
00000001 0.000ms (+0.000ms): touch_preempt_timing (cond_resched)
00000001 0.000ms (+0.007ms): ide_find_setting_by_ioctl (generic_ide_ioctl)
00000001 0.008ms (+0.000ms): ide_write_setting (generic_ide_ioctl)
00000001 0.009ms (+0.001ms): capable (ide_write_setting)
...
00000001 0.064ms (+0.001ms): ide_inb (ide_config_drive_speed)
00000001 0.066ms (+0.000ms): ide_inb (ide_config_drive_speed)
00000001 0.067ms (+0.000ms): ide_inb (ide_config_drive_speed)
...
00000001 1.899ms (+0.000ms): ide_inb (ide_config_drive_speed)
00000001 1.899ms (+0.000ms): ide_inb (ide_config_drive_speed)
00000001 1.900ms (+0.000ms): ide_inb (ide_config_drive_speed)
00000001 1.901ms (+0.000ms): __const_udelay (ide_config_drive_speed)
00000001 1.901ms (+0.000ms): __delay (ide_config_drive_speed)
00000001 1.901ms (+0.001ms): delay_tsc (__delay)
00000001 1.902ms (+0.000ms): ide_inb (ide_config_drive_speed)
...
The results after that change were like this...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 536 us, entries: 31 (31) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: IRQ 14/140, uid:0 nice:-10 policy:0 rt_prio:0
-----------------
=> started at: kmap_atomic+0x23/0xe0
=> ended at: kunmap_atomic+0x7b/0xa0
=======>
00000001 0.000ms (+0.000ms): kmap_atomic (__ide_do_rw_disk)
00000001 0.000ms (+0.000ms): page_address (__ide_do_rw_disk)
00000001 0.000ms (+0.000ms): ide_set_handler (__ide_do_rw_disk)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (ide_set_handler)
00000002 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000002 0.001ms (+0.000ms): __ide_set_handler (ide_set_handler)
00000002 0.001ms (+0.000ms): __mod_timer (ide_set_handler)
00000002 0.001ms (+0.000ms): __spin_lock_irqsave (__mod_timer)
00000003 0.001ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000003 0.002ms (+0.000ms): spin_lock (__mod_timer)
00000004 0.002ms (+0.000ms): spin_lock (<00000000>)
00000004 0.002ms (+0.000ms): internal_add_timer (__mod_timer)
00000004 0.002ms (+0.000ms): _spin_unlock (__mod_timer)
00000003 0.002ms (+0.000ms): _spin_unlock_irqrestore (__mod_timer)
00000002 0.003ms (+0.000ms): _spin_unlock_irqrestore (ide_set_handler)
00000001 0.003ms (+0.000ms): taskfile_output_data (__ide_do_rw_disk)
00000001 0.003ms (+0.000ms): ata_output_data (taskfile_output_data)
00000001 0.003ms (+0.496ms): ide_outsw (ata_output_data)
00010001 0.500ms (+0.000ms): do_nmi (ide_outsw)
00010001 0.501ms (+0.005ms): do_nmi (<0864d2b2>)
00010001 0.506ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010001 0.506ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.506ms (+0.000ms): profile_hook (profile_tick)
00010001 0.506ms (+0.000ms): read_lock (profile_hook)
00010002 0.507ms (+0.000ms): read_lock (<00000000>)
00010002 0.507ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.507ms (+0.000ms): _read_unlock (profile_tick)
00010001 0.507ms (+0.028ms): profile_hit (nmi_watchdog_tick)
00000001 0.536ms (+0.000ms): kunmap_atomic (__ide_do_rw_disk)
00000001 0.536ms (+0.000ms): sub_preempt_count (kunmap_atomic)
00000001 0.536ms (+0.000ms): sub_preempt_count (kunmap_atomic)
00000001 0.537ms (+0.000ms): update_max_trace (check_preempt_timing)
or this...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 497 us, entries: 18 (18) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: IRQ 14/140, uid:0 nice:-10 policy:0 rt_prio:0
-----------------
=> started at: kmap_atomic+0x23/0xe0
=> ended at: kunmap_atomic+0x7b/0xa0
=======>
00000001 0.000ms (+0.000ms): kmap_atomic (read_intr)
00000001 0.000ms (+0.000ms): page_address (read_intr)
00000001 0.000ms (+0.000ms): taskfile_input_data (read_intr)
00000001 0.000ms (+0.000ms): ata_input_data (taskfile_input_data)
00000001 0.000ms (+0.066ms): ide_insw (ata_input_data)
00010001 0.067ms (+0.000ms): do_nmi (ide_insw)
00010001 0.067ms (+0.003ms): do_nmi (<0815f06c>)
00010001 0.070ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010001 0.070ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.071ms (+0.000ms): profile_hook (profile_tick)
00010001 0.071ms (+0.000ms): read_lock (profile_hook)
00010002 0.071ms (+0.000ms): read_lock (<00000000>)
00010002 0.071ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.071ms (+0.000ms): _read_unlock (profile_tick)
00010001 0.072ms (+0.424ms): profile_hit (nmi_watchdog_tick)
00000001 0.497ms (+0.000ms): kunmap_atomic (read_intr)
00000001 0.497ms (+0.000ms): sub_preempt_count (kunmap_atomic)
00000001 0.498ms (+0.000ms): update_max_trace (check_preempt_timing)
or this...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 454 us, entries: 60 (60) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000002 0.001ms (+0.267ms): spin_lock (<00000000>)
00010002 0.268ms (+0.000ms): do_nmi (get_swap_page)
00010002 0.268ms (+0.003ms): do_nmi (ide_outsw)
00010002 0.272ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010002 0.272ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010002 0.272ms (+0.000ms): profile_hook (profile_tick)
00010002 0.272ms (+0.000ms): read_lock (profile_hook)
00010003 0.273ms (+0.000ms): read_lock (<00000000>)
00010003 0.273ms (+0.000ms): notifier_call_chain (profile_hook)
00010003 0.273ms (+0.000ms): _read_unlock (profile_tick)
00010002 0.274ms (+0.000ms): profile_hit (nmi_watchdog_tick)
00010002 0.274ms (+0.000ms): do_IRQ (get_swap_page)
00010002 0.275ms (+0.000ms): do_IRQ (<00000000>)
00010002 0.275ms (+0.000ms): spin_lock (do_IRQ)
00010003 0.275ms (+0.000ms): spin_lock (<00000000>)
00010003 0.275ms (+0.000ms): ack_edge_ioapic_irq (do_IRQ)
00010003 0.275ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
00010003 0.276ms (+0.000ms): _spin_unlock (do_IRQ)
or this...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 457 us, entries: 8 (8) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000002 0.000ms (+0.456ms): spin_lock (<00000000>)
00000002 0.457ms (+0.000ms): _spin_unlock (get_swap_page)
00000001 0.457ms (+0.000ms): _spin_unlock (get_swap_page)
00000001 0.458ms (+0.000ms): sub_preempt_count (_spin_unlock)
00000001 0.458ms (+0.000ms): update_max_trace (check_preempt_timing)
So I don't think that particular setting helps.
--Mark
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-09 22:56 Mark_H_Johnson
2004-09-10 12:42 ` Ingo Molnar
0 siblings, 1 reply; 100+ messages in thread
From: Mark_H_Johnson @ 2004-09-09 22:56 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric St-Laurent, Free Ekanayaka, free78, K.R. Foley, linux-kernel,
Felipe Alfaro Solana, luke, nando, P.O. Gaillard, Daniel Schmitt,
Lee Revell
>I tried
> hdparm -p -X udma2 /dev/hda
>(since it was udma4 previously)
>and reran the tests.
Not quite sure this was what I wanted - appears to turn on PIO modes
exclusively.
What is the right incantation for this?
Thanks.
--Mark H Johnson
<mailto:Mark_H_Johnson@raytheon.com>
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-08 18:42 ` Ingo Molnar
@ 2004-09-10 2:31 ` K.R. Foley
2004-09-10 17:56 ` K.R. Foley
0 siblings, 1 reply; 100+ messages in thread
From: K.R. Foley @ 2004-09-10 2:31 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mark_H_Johnson, Lee Revell, Free Ekanayaka, Eric St-Laurent,
linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
Ingo Molnar wrote:
> * Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
>
>
>>If you look at the date / time of the traces, you will notice that
>>most occur in the latter part of the test. This is during the "disk
>>copy" and "disk read" parts of the testing. [...]
>
>
> would it be possible to test with DMA disabled? (hdparm -d0 /dev/hda) It
> might take some extra work to shun the extra latency reports from the
> PIO IDE path (which is quite slow) but once that is done you should be
> able to see whether these long 0.5 msec delays remain even if all (most)
> DMA activity has been eliminated.
>
>
>>preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
>>--------------------------------------------------
>> latency: 550 us, entries: 6 (6)
>> -----------------
>> | task: cat/6771, uid:0 nice:0 policy:0 rt_prio:0
>> -----------------
>> => started at: kmap_atomic+0x23/0xe0
>> => ended at: kunmap_atomic+0x7b/0xa0
>>=======>
>>00000001 0.000ms (+0.000ms): kmap_atomic (file_read_actor)
>>00000001 0.000ms (+0.000ms): page_address (file_read_actor)
>>00000001 0.000ms (+0.549ms): __copy_to_user_ll (file_read_actor)
>>00000001 0.550ms (+0.000ms): kunmap_atomic (file_read_actor)
>>00000001 0.550ms (+0.000ms): sub_preempt_count (kunmap_atomic)
>>00000001 0.550ms (+0.000ms): update_max_trace (check_preempt_timing)
>
>
> this is a full page copy, from userspace into a kernelspace pagecache
> page. This shouldn't take 500 usecs on any hardware. Since this is a
> single instruction (memcpy's rep; movsl instruction) there's nothing
> that Linux can do to avoid (or even to cause) such a situation.
I saw this one (or one very similar) on a system that I just started
testing on some today. Not quite as high (~219 usec if I remember
correctly). I don't have access to the system from here. I will forward
the trace tomorrow when I'm there. However, I haven't seen this
on my slower system running the same stress tests. There are several
possible points of interest:
System I saw this on:
P4 2.4GHz or 3.0GHz
2GB memory
2.6.9-rc1-bk12-S0 built for SMP (even though hyperthreading is off
currently)
System I haven't seen this on:
PII 450
256MB memory
2.6.9-rc1-bk12-R6 built for UP
Sorry I don't have more complete data in front of me. I will send the
concrete info tomorrow with the trace.
kr
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-09 22:56 Mark_H_Johnson
@ 2004-09-10 12:42 ` Ingo Molnar
0 siblings, 0 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-10 12:42 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Eric St-Laurent, Free Ekanayaka, free78, K.R. Foley, linux-kernel,
Felipe Alfaro Solana, luke, nando, P.O. Gaillard, Daniel Schmitt,
Lee Revell
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> >I tried
> > hdparm -p -X udma2 /dev/hda
> >(since it was udma4 previously)
> >and reran the tests.
> Not quite sure this was what I wanted - appears to turn on PIO modes
> exclusively.
>
> What is the right incantation for this?
does 'hdparm -X udma2 /dev/hda' work? -p seems to be some PIO auto-tune
option that you probably don't need.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-10 12:57 Mark_H_Johnson
2004-09-10 13:10 ` Ingo Molnar
0 siblings, 1 reply; 100+ messages in thread
From: Mark_H_Johnson @ 2004-09-10 12:57 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
>You could also try lower DMA modes, I don't know if that will help or
>not but UDMA133 is essentially "one entire PCI 32/33Mhz bus worth of
>traffic so the effects are not too suprising. IDE control ports can give
>you long stalls too especially if IORDY is in use.
It appears I finally figured it out and have the drive running with udma2
instead of udma4. For reference:
# hdparm /dev/hda
/dev/hda:
multcount = 16 (on)
IO_support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 64 (on)
geometry = 58168/16/63, sectors = 58633344, start = 0
# hdparm -i /dev/hda
/dev/hda:
Model=WDC WD300BB-00AUA1, FwRev=18.20D18, SerialNo=WD-WMA6W1764218
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=58633344
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 *udma2 udma3 udma4 udma5
AdvancedPM=no WriteCache=enabled
Drive conforms to: device does not report version:
* signifies the current active mode
The same test I ran yesterday
head -c 750000000 /dev/zero >tmpfile (disk writes)
cp tmpfile tmpfile2 (disk copy)
cat tmpfile tmpfile2 >/dev/null
ran with the following results late last night. All are traces > 200 usec
[a much smaller limit than before]. No traces > 600 usec.
Exit Mmap - Traces 00 01 02 03 04 05 10 13 24 26
Kswapd0 - Traces 06 07 08 09 11 12 14 16..23 25 27
Spin lock - Trace 15
Exit Mmap
=========
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 209 us, entries: 34 (34) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: sleep/4436, uid:0 nice:-9 policy:0 rt_prio:0
-----------------
=> started at: cond_resched_lock+0x1d/0xa0
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): touch_preempt_timing (cond_resched_lock)
00000001 0.000ms (+0.001ms): __bitmap_weight (unmap_vmas)
00000001 0.001ms (+0.002ms): vm_acct_memory (exit_mmap)
00000001 0.003ms (+0.113ms): clear_page_tables (exit_mmap)
00010001 0.117ms (+0.000ms): do_nmi (clear_page_tables)
00010001 0.117ms (+0.003ms): do_nmi (avc_has_perm)
00010001 0.121ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010001 0.121ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.121ms (+0.000ms): profile_hook (profile_tick)
00010001 0.121ms (+0.000ms): read_lock (profile_hook)
00010002 0.121ms (+0.000ms): read_lock (<00000000>)
00010002 0.122ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.122ms (+0.000ms): _read_unlock (profile_tick)
00010001 0.122ms (+0.078ms): profile_hit (nmi_watchdog_tick)
00000001 0.200ms (+0.002ms): flush_tlb_mm (exit_mmap)
00000001 0.202ms (+0.000ms): free_pages_and_swap_cache (exit_mmap)
Kswapd0
=======
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 539 us, entries: 719 (719) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (page_referenced_file)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.000ms (+0.000ms): prio_tree_first (vma_prio_tree_next)
00000001 0.001ms (+0.001ms): prio_tree_left (prio_tree_first)
00000001 0.003ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.004ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.005ms (+0.001ms): spin_lock (<00000000>)
00000002 0.006ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.006ms (+0.000ms): page_address (page_referenced_one)
00000003 0.007ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.007ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.007ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.008ms (+0.001ms): page_referenced_one (page_referenced_file)
00000001 0.009ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.009ms (+0.002ms): spin_lock (<00000000>)
... cycle repeats a number of times ...
00000001 0.535ms (+0.000ms): preempt_schedule (page_referenced_one)
00000001 0.536ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.536ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.537ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.537ms (+0.000ms): spin_lock (<00000000>)
00000002 0.538ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.538ms (+0.000ms): page_address (page_referenced_one)
00000003 0.538ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.538ms (+0.000ms): preempt_schedule (page_referenced_one)
00000002 0.539ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.539ms (+0.000ms): preempt_schedule (page_referenced_one)
00000001 0.539ms (+0.000ms): _spin_unlock (page_referenced_file)
00000001 0.540ms (+0.000ms): sub_preempt_count (_spin_unlock)
00000001 0.541ms (+0.000ms): update_max_trace (check_preempt_timing)
Spin Lock
=========
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 235 us, entries: 18 (18) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000002 0.000ms (+0.113ms): spin_lock (<00000000>)
00010002 0.113ms (+0.000ms): do_nmi (get_swap_page)
00010002 0.114ms (+0.002ms): do_nmi (default_idle)
00010002 0.116ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010002 0.116ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010002 0.117ms (+0.000ms): profile_hook (profile_tick)
00010002 0.117ms (+0.000ms): read_lock (profile_hook)
00010003 0.117ms (+0.000ms): read_lock (<00000000>)
00010003 0.117ms (+0.000ms): notifier_call_chain (profile_hook)
00010003 0.117ms (+0.000ms): _read_unlock (profile_tick)
00010002 0.118ms (+0.116ms): profile_hit (nmi_watchdog_tick)
00000002 0.235ms (+0.000ms): _spin_unlock (get_swap_page)
00000001 0.235ms (+0.000ms): _spin_unlock (get_swap_page)
00000001 0.235ms (+0.001ms): sub_preempt_count (_spin_unlock)
00000001 0.237ms (+0.000ms): update_max_trace (check_preempt_timing)
Are you SURE the spin lock counter works properly on SMP systems?
I did a quick check of yesterday's results:
# grep -ir '<.*>' latencytest0.42-png/lt040909 | wc -l
6978
# grep -ir '<.*>' latencytest0.42-png/lt040909 | grep -v '<00000000>' | less -im
...
latencytest0.42-png/lt040909/lt004.v3k1/lt.10:00010002 0.382ms (+0.000ms): do_IRQ (<0000000e>)
latencytest0.42-png/lt040909/lt004.v3k1/lt.11:00010001 0.003ms (+0.000ms): do_IRQ (<0000000a>)
latencytest0.42-png/lt040909/lt004.v3k1/lt.11:00020001 0.308ms (+0.000ms): do_IRQ (<0000000e>)
latencytest0.42-png/lt040909/lt005.v3k1/lt.00:00010002 0.263ms (+0.005ms): do_nmi (<0804bd56>)
latencytest0.42-png/lt040909/lt006.v3k1/lt.00:00010001 0.501ms (+0.005ms): do_nmi (<0864d2b2>)
latencytest0.42-png/lt040909/lt006.v3k1/lt.01:00010001 0.067ms (+0.003ms): do_nmi (<0815f06c>)
...
No non-zero, lock-related entries.
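The filter above can be sketched end-to-end against a tiny fabricated sample (the four trace lines below are made up for illustration; the real input is the lt040909 trace directory):

```shell
# Reproduce the check above on a fabricated four-line sample trace.
sample=$(mktemp)
cat > "$sample" <<'EOF'
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00010002 0.382ms (+0.000ms): do_IRQ (<0000000e>)
00010002 0.263ms (+0.005ms): do_nmi (<0804bd56>)
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
EOF

total=$(grep -c '<.*>' "$sample")                         # every bracketed entry
nonzero=$(grep '<.*>' "$sample" | grep -vc '<00000000>')  # non-zero ones only
# Of the non-zero entries, count lock acquisitions (spin/read/write locks);
# '|| true' keeps a zero count from aborting under 'set -e'.
locks=$(grep '<.*>' "$sample" | grep -v '<00000000>' | grep -c '_lock' || true)
echo "total=$total nonzero=$nonzero locks=$locks"
rm -f "$sample"
```

On the real trace directory the lock count comes out zero, which is the observation above: the non-zero bracketed entries all belong to do_IRQ/do_nmi, never to a lock primitive.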
--Mark
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-10 12:57 Mark_H_Johnson
@ 2004-09-10 13:10 ` Ingo Molnar
0 siblings, 0 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-10 13:10 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> 00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
> 00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
> 00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
> 00000002 0.000ms (+0.113ms): spin_lock (<00000000>)
> 00010002 0.113ms (+0.000ms): do_nmi (get_swap_page)
> Are you SURE the spin lock counter works properly on SMP systems?
> I did a quick check of yesterday's results:
> # grep -ir '<.*>' latencytest0.42-png/lt040909 | wc -l
> 6978
> # grep -ir '<.*>' latencytest0.42-png/lt040909 | grep -v '<00000000>' |
> less -im
> ...
> No entries that are non zero and lock related.
it works fine here. To double-check i've created a contention testcase:
00000001 0.000ms (+0.000ms): spin_lock (sys_gettimeofday)
00000001 0.000ms (+0.000ms): spin_lock (<000012ce>)
this spin_lock() spun 4814 times before it got the lock.
Linux locking is pretty uncontended on 2-way boxes.
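For what it's worth, the bracketed field in that second trace line appears to be the spin count in hex; decoding it matches the number quoted:

```shell
# '<000012ce>' from the trace above, decoded to decimal.
printf '%d\n' 0x12ce   # prints 4814
```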
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-10 14:28 Mark_H_Johnson
2004-09-10 19:29 ` Ingo Molnar
0 siblings, 1 reply; 100+ messages in thread
From: Mark_H_Johnson @ 2004-09-10 14:28 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric St-Laurent, Free Ekanayaka, free78, K.R. Foley, linux-kernel,
Felipe Alfaro Solana, luke, nando, P.O. Gaillard, Daniel Schmitt,
Lee Revell
>does 'hdparm -X udma2 /dev/hda' work?
Yes, it does, and quite well actually. For comparison:
-R1 on September 7 - over 100 traces > 500 usec
-R8 on September 10 - 38 traces > 200 usec, only 3 > 500 usec
This was with the full test suite (latencytest active, all different
types of operations).
In addition to what I already reported today, there were a few additional
types of traces:
network poll - Traces 00 01 02 03 04 06 07 08 16 17 29 32
twkill_work - Trace 05
VL kswapd0 - Trace 12
spin lock/nmi - Trace 15
do_IRQ - Trace 25 26 27
plus the previously reported symptoms:
exit mmap - Traces 09 18 19 20 21 22 23 24 28 30 31 33 34 35 36 37
kswapd0 - Traces 10 11 13 14
Network Poll
============
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 332 us, entries: 198 (198) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: ksoftirqd/1/5, uid:0 nice:-10 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: rtl8139_rx+0x219/0x340
=======>
00000001 0.000ms (+0.000ms): spin_lock (rtl8139_poll)
00000001 0.000ms (+0.001ms): spin_lock (<00000000>)
00000001 0.001ms (+0.003ms): rtl8139_rx (rtl8139_poll)
... we've been down this path before ...
No individual entries > 10 usec; the latency comes from the sheer number of entries involved.
twkill_work
===========
I have not seen this one before...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 499 us, entries: 1858 (1858) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: events/0/6, uid:0 nice:-10 policy:0 rt_prio:0
-----------------
=> started at: twkill_work+0xb5/0xe0
=> ended at: twkill_work+0xb5/0xe0
=======>
00000101 0.000ms (+0.000ms): touch_preempt_timing (twkill_work)
00000101 0.000ms (+0.000ms): smp_apic_timer_interrupt (touch_preempt_timing)
00010101 0.000ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010101 0.000ms (+0.000ms): profile_hook (profile_tick)
00010101 0.000ms (+0.000ms): read_lock (profile_hook)
00010102 0.001ms (+0.000ms): read_lock (<00000000>)
00010102 0.001ms (+0.000ms): notifier_call_chain (profile_hook)
00010102 0.001ms (+0.000ms): _read_unlock (profile_tick)
00010101 0.002ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010101 0.002ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010101 0.002ms (+0.000ms): update_one_process (update_process_times)
00010101 0.003ms (+0.000ms): run_local_timers (update_process_times)
00010101 0.003ms (+0.000ms): raise_softirq (update_process_times)
00010101 0.003ms (+0.000ms): scheduler_tick (update_process_times)
00010101 0.004ms (+0.000ms): sched_clock (scheduler_tick)
00010101 0.005ms (+0.000ms): rcu_check_callbacks (scheduler_tick)
00010101 0.005ms (+0.001ms): idle_cpu (rcu_check_callbacks)
00010101 0.006ms (+0.000ms): spin_lock (scheduler_tick)
00010102 0.006ms (+0.000ms): spin_lock (<00000000>)
00010102 0.007ms (+0.000ms): task_timeslice (scheduler_tick)
00010102 0.007ms (+0.001ms): __bitmap_weight (scheduler_tick)
00010102 0.009ms (+0.000ms): _spin_unlock (scheduler_tick)
00010101 0.010ms (+0.001ms): rebalance_tick (scheduler_tick)
00010101 0.011ms (+0.000ms): do_IRQ (touch_preempt_timing)
00010101 0.011ms (+0.000ms): do_IRQ (<00000000>)
00010101 0.012ms (+0.000ms): spin_lock (do_IRQ)
00010102 0.012ms (+0.000ms): spin_lock (<00000000>)
00010102 0.012ms (+0.000ms): ack_edge_ioapic_irq (do_IRQ)
00010102 0.013ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
00010102 0.013ms (+0.000ms): _spin_unlock (do_IRQ)
... Look at this next sequence ...
00010102 0.033ms (+0.000ms): end_edge_ioapic_irq (do_IRQ)
00010102 0.033ms (+0.000ms): _spin_unlock (do_IRQ)
00000101 0.034ms (+0.000ms): tcp_do_twkill_work (twkill_work)
00000101 0.035ms (+0.000ms): _spin_unlock (tcp_do_twkill_work)
00000100 0.035ms (+0.000ms): tcp_timewait_kill (tcp_do_twkill_work)
00000100 0.035ms (+0.000ms): write_lock (tcp_timewait_kill)
00000101 0.036ms (+0.000ms): write_lock (<00000000>)
00000101 0.037ms (+0.000ms): _write_unlock (tcp_timewait_kill)
00000100 0.037ms (+0.000ms): spin_lock (tcp_timewait_kill)
00000101 0.037ms (+0.000ms): spin_lock (<00000000>)
00000101 0.038ms (+0.000ms): tcp_bucket_destroy (tcp_timewait_kill)
00000101 0.038ms (+0.000ms): kmem_cache_free (tcp_bucket_destroy)
00000101 0.038ms (+0.000ms): cache_flusharray (kmem_cache_free)
... or this one ...
00000100 0.050ms (+0.000ms): spin_lock (tcp_do_twkill_work)
00000101 0.051ms (+0.000ms): spin_lock (<00000000>)
00000101 0.051ms (+0.000ms): _spin_unlock (tcp_do_twkill_work)
00000100 0.051ms (+0.000ms): tcp_timewait_kill (tcp_do_twkill_work)
00000100 0.051ms (+0.000ms): write_lock (tcp_timewait_kill)
00000101 0.052ms (+0.000ms): write_lock (<00000000>)
00000101 0.052ms (+0.000ms): _write_unlock (tcp_timewait_kill)
00000100 0.053ms (+0.000ms): spin_lock (tcp_timewait_kill)
00000101 0.053ms (+0.000ms): spin_lock (<00000000>)
00000101 0.053ms (+0.000ms): tcp_bucket_destroy (tcp_timewait_kill)
00000101 0.053ms (+0.000ms): kmem_cache_free (tcp_bucket_destroy)
00000101 0.054ms (+0.000ms): _spin_unlock (tcp_timewait_kill)
00000100 0.054ms (+0.000ms): kmem_cache_free (tcp_do_twkill_work)
00000100 0.054ms (+0.000ms): spin_lock (tcp_do_twkill_work)
00000101 0.054ms (+0.000ms): spin_lock (<00000000>)
00000101 0.054ms (+0.000ms): _spin_unlock (tcp_do_twkill_work)
00000100 0.055ms (+0.000ms): tcp_timewait_kill (tcp_do_twkill_work)
... and ends like this ...
00000101 0.496ms (+0.000ms): _write_unlock (tcp_timewait_kill)
00000100 0.496ms (+0.000ms): preempt_schedule (tcp_timewait_kill)
00000100 0.496ms (+0.000ms): spin_lock (tcp_timewait_kill)
00000101 0.496ms (+0.000ms): spin_lock (<00000000>)
00000101 0.497ms (+0.000ms): tcp_bucket_destroy (tcp_timewait_kill)
00000101 0.497ms (+0.000ms): kmem_cache_free (tcp_bucket_destroy)
00000101 0.497ms (+0.000ms): _spin_unlock (tcp_timewait_kill)
00000100 0.497ms (+0.000ms): preempt_schedule (tcp_timewait_kill)
00000100 0.498ms (+0.000ms): kmem_cache_free (tcp_do_twkill_work)
00000100 0.498ms (+0.000ms): spin_lock (tcp_do_twkill_work)
00000101 0.498ms (+0.000ms): spin_lock (<00000000>)
00000101 0.499ms (+0.000ms): touch_preempt_timing (twkill_work)
00000101 0.499ms (+0.000ms): update_max_trace (check_preempt_timing)
I looked briefly at the -R8 patch and see the change in condition for
breaking the lock, but from this trace, it may not be enough.
VL kswapd0
==========
An extremely long latency (> 6 msec) appears to be a combination of
effects, perhaps tracing related. It starts like this...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 6776 us, entries: 548 (548) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (page_referenced_file)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.001ms (+0.001ms): prio_tree_first (vma_prio_tree_next)
00000001 0.002ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.002ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.003ms (+0.001ms): spin_lock (<00000000>)
00000002 0.004ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.004ms (+0.000ms): page_address (page_referenced_one)
00000003 0.005ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.005ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.005ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.006ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.006ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.007ms (+0.000ms): spin_lock (<00000000>)
... looks like the typical kswapd0 cycle until this point ...
00000001 0.126ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.127ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.127ms (+0.000ms): spin_lock (<00000000>)
00000002 0.128ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.128ms (+0.000ms): page_address (page_referenced_one)
00000003 0.129ms (+0.000ms): flush_tlb_page (page_referenced_one)
00000004 0.129ms (+0.000ms): flush_tlb_others (flush_tlb_page)
00000004 0.130ms (+0.000ms): spin_lock (flush_tlb_others)
00000005 0.130ms (+0.000ms): spin_lock (<00000000>)
00000005 0.131ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000005 0.132ms (+0.790ms): send_IPI_mask_bitmask (flush_tlb_others)
00010005 0.922ms (+0.000ms): do_nmi (flush_tlb_others)
00010005 0.923ms (+0.003ms): do_nmi (touch_preempt_timing)
00010005 0.926ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010005 0.926ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010005 0.926ms (+0.000ms): profile_hook (profile_tick)
00010005 0.927ms (+0.000ms): read_lock (profile_hook)
00010006 0.927ms (+0.000ms): read_lock (<00000000>)
00010006 0.927ms (+0.000ms): notifier_call_chain (profile_hook)
00010006 0.928ms (+0.000ms): _read_unlock (profile_tick)
00010005 0.928ms (+0.027ms): profile_hit (nmi_watchdog_tick)
00000005 0.955ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
00010005 0.955ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010005 0.956ms (+0.000ms): profile_hook (profile_tick)
00010005 0.956ms (+0.000ms): read_lock (profile_hook)
00010006 0.956ms (+0.000ms): read_lock (<00000000>)
00010006 0.956ms (+0.000ms): notifier_call_chain (profile_hook)
00010006 0.956ms (+0.000ms): _read_unlock (profile_tick)
00010005 0.956ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010005 0.957ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010005 0.957ms (+0.000ms): update_one_process (update_process_times)
00010005 0.957ms (+0.000ms): run_local_timers (update_process_times)
00010005 0.957ms (+0.000ms): raise_softirq (update_process_times)
00010005 0.958ms (+0.000ms): scheduler_tick (update_process_times)
00010005 0.958ms (+0.000ms): sched_clock (scheduler_tick)
00010005 0.958ms (+0.000ms): spin_lock (scheduler_tick)
00010006 0.958ms (+0.000ms): spin_lock (<00000000>)
00010006 0.959ms (+0.000ms): task_timeslice (scheduler_tick)
00010006 0.959ms (+0.000ms): __bitmap_weight (scheduler_tick)
00010006 0.959ms (+0.000ms): _spin_unlock (scheduler_tick)
00010005 0.959ms (+0.000ms): rebalance_tick (scheduler_tick)
00000006 0.960ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000006 0.960ms (+0.000ms): __do_softirq (do_softirq)
00000006 0.960ms (+0.000ms): wake_up_process (do_softirq)
00000006 0.960ms (+0.000ms): try_to_wake_up (wake_up_process)
00000006 0.960ms (+0.000ms): task_rq_lock (try_to_wake_up)
00000006 0.961ms (+0.000ms): spin_lock (task_rq_lock)
00000007 0.961ms (+0.000ms): spin_lock (<00000000>)
00000007 0.961ms (+0.000ms): wake_idle (try_to_wake_up)
00000007 0.961ms (+0.000ms): activate_task (try_to_wake_up)
00000007 0.961ms (+0.000ms): sched_clock (activate_task)
00000007 0.962ms (+0.000ms): recalc_task_prio (activate_task)
00000007 0.962ms (+0.000ms): effective_prio (recalc_task_prio)
00000007 0.962ms (+0.000ms): __activate_task (try_to_wake_up)
00000007 0.962ms (+0.000ms): enqueue_task (__activate_task)
00000007 0.962ms (+0.000ms): sched_info_queued (enqueue_task)
00000007 0.963ms (+0.000ms): resched_task (try_to_wake_up)
00000007 0.963ms (+0.000ms): task_rq_unlock (try_to_wake_up)
00000007 0.963ms (+0.000ms): _spin_unlock_irqrestore (try_to_wake_up)
00000006 0.964ms (+0.958ms): preempt_schedule (try_to_wake_up)
00010005 1.922ms (+0.000ms): do_nmi (flush_tlb_others)
00010005 1.922ms (+0.003ms): do_nmi (check_preempt_timing)
00010005 1.926ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010005 1.926ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010005 1.926ms (+0.000ms): profile_hook (profile_tick)
00010005 1.926ms (+0.000ms): read_lock (profile_hook)
00010006 1.927ms (+0.000ms): read_lock (<00000000>)
00010006 1.927ms (+0.000ms): notifier_call_chain (profile_hook)
00010006 1.927ms (+0.000ms): _read_unlock (profile_tick)
00010005 1.927ms (+0.000ms): preempt_schedule (profile_tick)
00010005 1.928ms (+0.027ms): profile_hit (nmi_watchdog_tick)
00000005 1.955ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
00010005 1.955ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010005 1.955ms (+0.000ms): profile_hook (profile_tick)
00010005 1.955ms (+0.000ms): read_lock (profile_hook)
00010006 1.955ms (+0.000ms): read_lock (<00000000>)
00010006 1.956ms (+0.000ms): notifier_call_chain (profile_hook)
00010006 1.956ms (+0.000ms): _read_unlock (profile_tick)
00010005 1.956ms (+0.000ms): preempt_schedule (profile_tick)
00010005 1.956ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010005 1.956ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010005 1.956ms (+0.000ms): update_one_process (update_process_times)
00010005 1.957ms (+0.000ms): run_local_timers (update_process_times)
00010005 1.957ms (+0.000ms): raise_softirq (update_process_times)
00010005 1.957ms (+0.000ms): scheduler_tick (update_process_times)
00010005 1.957ms (+0.000ms): sched_clock (scheduler_tick)
00010005 1.958ms (+0.000ms): spin_lock (scheduler_tick)
00010006 1.958ms (+0.000ms): spin_lock (<00000000>)
00010006 1.958ms (+0.000ms): task_timeslice (scheduler_tick)
00010006 1.958ms (+0.000ms): __bitmap_weight (scheduler_tick)
00010006 1.958ms (+0.000ms): _spin_unlock (scheduler_tick)
00010005 1.959ms (+0.000ms): preempt_schedule (scheduler_tick)
00010005 1.959ms (+0.000ms): rebalance_tick (scheduler_tick)
00000006 1.959ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000006 1.959ms (+0.961ms): __do_softirq (do_softirq)
00010005 2.921ms (+0.000ms): do_nmi (flush_tlb_others)
00010005 2.922ms (+0.001ms): do_nmi (check_preempt_timing)
... looks like we got stuck for milliseconds but finally finished ...
00000006 4.959ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000006 4.959ms (+0.960ms): __do_softirq (do_softirq)
00010005 5.919ms (+0.000ms): do_nmi (flush_tlb_others)
00010005 5.920ms (+0.003ms): do_nmi (__trace)
00010005 5.924ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010005 5.924ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010005 5.924ms (+0.000ms): profile_hook (profile_tick)
00010005 5.924ms (+0.000ms): read_lock (profile_hook)
00010006 5.924ms (+0.000ms): read_lock (<00000000>)
00010006 5.925ms (+0.000ms): notifier_call_chain (profile_hook)
00010006 5.925ms (+0.000ms): _read_unlock (profile_tick)
00010005 5.925ms (+0.000ms): preempt_schedule (profile_tick)
00010005 5.925ms (+0.027ms): profile_hit (nmi_watchdog_tick)
00000005 5.952ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
00010005 5.953ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010005 5.953ms (+0.000ms): profile_hook (profile_tick)
00010005 5.953ms (+0.000ms): read_lock (profile_hook)
00010006 5.953ms (+0.000ms): read_lock (<00000000>)
00010006 5.953ms (+0.000ms): notifier_call_chain (profile_hook)
00010006 5.953ms (+0.000ms): _read_unlock (profile_tick)
00010005 5.954ms (+0.000ms): preempt_schedule (profile_tick)
00010005 5.954ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010005 5.954ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010005 5.954ms (+0.000ms): update_one_process (update_process_times)
00010005 5.954ms (+0.000ms): run_local_timers (update_process_times)
00010005 5.955ms (+0.000ms): raise_softirq (update_process_times)
00010005 5.955ms (+0.000ms): scheduler_tick (update_process_times)
00010005 5.955ms (+0.000ms): sched_clock (scheduler_tick)
00010005 5.955ms (+0.000ms): spin_lock (scheduler_tick)
00010006 5.956ms (+0.000ms): spin_lock (<00000000>)
00010006 5.956ms (+0.000ms): task_timeslice (scheduler_tick)
00010006 5.956ms (+0.000ms): __bitmap_weight (scheduler_tick)
00010006 5.956ms (+0.000ms): _spin_unlock (scheduler_tick)
00010005 5.956ms (+0.000ms): preempt_schedule (scheduler_tick)
00010005 5.957ms (+0.000ms): rebalance_tick (scheduler_tick)
00000006 5.957ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000006 5.957ms (+0.817ms): __do_softirq (do_softirq)
00000005 6.774ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000004 6.775ms (+0.000ms): preempt_schedule (flush_tlb_others)
00000003 6.775ms (+0.000ms): preempt_schedule (flush_tlb_page)
00000003 6.775ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 6.775ms (+0.000ms): preempt_schedule (page_referenced_one)
00000002 6.776ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 6.776ms (+0.000ms): preempt_schedule (page_referenced_one)
00000001 6.776ms (+0.000ms): _spin_unlock (page_referenced_file)
00000001 6.777ms (+0.000ms): sub_preempt_count (_spin_unlock)
00000001 6.777ms (+0.000ms): update_max_trace (check_preempt_timing)
Spin Lock/nmi
=============
This may be an artifact of the test configuration.
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 244 us, entries: 17 (17) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: fam/3212, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: __spin_lock_irqsave+0x2b/0xb0
=> ended at: _spin_unlock_irqrestore+0x32/0x70
=======>
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (avc_has_perm_noaudit)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000001 0.000ms (+0.069ms): avc_insert (avc_has_perm_noaudit)
00010001 0.069ms (+0.000ms): do_nmi (avc_insert)
00010001 0.070ms (+0.002ms): do_nmi (__mcount)
00010001 0.072ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010001 0.073ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.073ms (+0.000ms): profile_hook (profile_tick)
00010001 0.073ms (+0.000ms): read_lock (profile_hook)
00010002 0.073ms (+0.000ms): read_lock (<00000000>)
00010002 0.073ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.074ms (+0.000ms): _read_unlock (profile_tick)
00010001 0.074ms (+0.168ms): profile_hit (nmi_watchdog_tick)
00000001 0.242ms (+0.000ms): memcpy (avc_has_perm_noaudit)
00000001 0.243ms (+0.002ms): _spin_unlock_irqrestore (avc_has_perm_noaudit)
00000001 0.245ms (+0.000ms): sub_preempt_count (_spin_unlock_irqrestore)
00000001 0.245ms (+0.000ms): update_max_trace (check_preempt_timing)
Do_IRQ
======
Perhaps an unfortunate series of nested IRQs?
Starts like this...
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
latency: 240 us, entries: 380 (380) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: cpu_burn/14061, uid:0 nice:10 policy:0 rt_prio:0
-----------------
=> started at: do_IRQ+0x19/0x1a0
=> ended at: do_IRQ+0x14a/0x1a0
=======>
00010000 0.000ms (+0.000ms): do_IRQ (common_interrupt)
00010000 0.000ms (+0.000ms): do_IRQ (<08048340>)
00010000 0.000ms (+0.000ms): do_IRQ (<0000000a>)
00010000 0.000ms (+0.000ms): spin_lock (do_IRQ)
00010001 0.000ms (+0.000ms): spin_lock (<00000000>)
00010001 0.001ms (+0.000ms): mask_and_ack_level_ioapic_irq (do_IRQ)
00010001 0.001ms (+0.000ms): mask_IO_APIC_irq (mask_and_ack_level_ioapic_irq)
00010001 0.001ms (+0.000ms): __spin_lock_irqsave (mask_IO_APIC_irq)
00010002 0.002ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00010002 0.002ms (+0.000ms): __mask_IO_APIC_irq (mask_IO_APIC_irq)
00010002 0.002ms (+0.013ms): __modify_IO_APIC_irq (__mask_IO_APIC_irq)
00010002 0.016ms (+0.000ms): _spin_unlock_irqrestore (mask_IO_APIC_irq)
00010001 0.016ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
00010001 0.016ms (+0.000ms): _spin_unlock (do_IRQ)
00010000 0.017ms (+0.000ms): generic_handle_IRQ_event (do_IRQ)
00010000 0.017ms (+0.001ms): usb_hcd_irq (generic_handle_IRQ_event)
00010000 0.019ms (+0.001ms): uhci_irq (usb_hcd_irq)
00010000 0.021ms (+0.000ms): spin_lock (uhci_irq)
00010001 0.021ms (+0.000ms): spin_lock (<00000000>)
00010001 0.021ms (+0.001ms): uhci_get_current_frame_number (uhci_irq)
00010001 0.023ms (+0.000ms): uhci_free_pending_qhs (uhci_irq)
00010001 0.023ms (+0.001ms): uhci_free_pending_tds (uhci_irq)
00010001 0.024ms (+0.001ms): uhci_remove_pending_urbps (uhci_irq)
00010001 0.025ms (+0.000ms): uhci_transfer_result (uhci_irq)
00010001 0.026ms (+0.000ms): spin_lock (uhci_transfer_result)
...
00020005 0.134ms (+0.000ms): do_IRQ (_spin_unlock_irqrestore)
00020005 0.134ms (+0.000ms): do_IRQ (<00000000>)
00020005 0.134ms (+0.000ms): spin_lock (do_IRQ)
00020006 0.134ms (+0.000ms): spin_lock (<00000000>)
00020006 0.134ms (+0.000ms): ack_edge_ioapic_irq (do_IRQ)
00020006 0.135ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
00020006 0.135ms (+0.000ms): _spin_unlock (do_IRQ)
00020005 0.135ms (+0.000ms): preempt_schedule (do_IRQ)
00020005 0.135ms (+0.000ms): generic_handle_IRQ_event (do_IRQ)
00020005 0.136ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event)
... exits like this ...
00010001 0.220ms (+0.000ms): generic_note_interrupt (do_IRQ)
00010001 0.220ms (+0.000ms): end_level_ioapic_irq (do_IRQ)
00010001 0.220ms (+0.000ms): unmask_IO_APIC_irq (do_IRQ)
00010001 0.221ms (+0.000ms): __spin_lock_irqsave (unmask_IO_APIC_irq)
00010002 0.221ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00010002 0.221ms (+0.000ms): __unmask_IO_APIC_irq (unmask_IO_APIC_irq)
00010002 0.221ms (+0.013ms): __modify_IO_APIC_irq (__unmask_IO_APIC_irq)
00010002 0.235ms (+0.000ms): _spin_unlock_irqrestore (unmask_IO_APIC_irq)
00010001 0.235ms (+0.000ms): preempt_schedule (unmask_IO_APIC_irq)
00010001 0.235ms (+0.000ms): _spin_unlock (do_IRQ)
00010000 0.236ms (+0.000ms): preempt_schedule (do_IRQ)
00000001 0.236ms (+0.000ms): do_softirq (do_IRQ)
00000001 0.236ms (+0.000ms): __do_softirq (do_softirq)
00000001 0.236ms (+0.000ms): wake_up_process (do_softirq)
00000001 0.237ms (+0.000ms): try_to_wake_up (wake_up_process)
00000001 0.237ms (+0.000ms): task_rq_lock (try_to_wake_up)
00000001 0.237ms (+0.000ms): spin_lock (task_rq_lock)
00000002 0.237ms (+0.000ms): spin_lock (<00000000>)
00000002 0.237ms (+0.000ms): wake_idle (try_to_wake_up)
00000002 0.238ms (+0.000ms): activate_task (try_to_wake_up)
00000002 0.238ms (+0.000ms): sched_clock (activate_task)
00000002 0.238ms (+0.000ms): recalc_task_prio (activate_task)
00000002 0.238ms (+0.000ms): effective_prio (recalc_task_prio)
00000002 0.239ms (+0.000ms): __activate_task (try_to_wake_up)
00000002 0.239ms (+0.000ms): enqueue_task (__activate_task)
00000002 0.239ms (+0.000ms): sched_info_queued (enqueue_task)
00000002 0.239ms (+0.000ms): resched_task (try_to_wake_up)
00000002 0.240ms (+0.000ms): task_rq_unlock (try_to_wake_up)
00000002 0.240ms (+0.000ms): _spin_unlock_irqrestore (try_to_wake_up)
00000001 0.240ms (+0.000ms): preempt_schedule (try_to_wake_up)
00000001 0.241ms (+0.000ms): sub_preempt_count (do_IRQ)
00000001 0.242ms (+0.000ms): update_max_trace (check_preempt_timing)
I will send the full traces separately (off list).
--Mark
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-10 2:31 ` K.R. Foley
@ 2004-09-10 17:56 ` K.R. Foley
2004-09-10 19:26 ` Ingo Molnar
0 siblings, 1 reply; 100+ messages in thread
From: K.R. Foley @ 2004-09-10 17:56 UTC (permalink / raw)
To: K.R. Foley
Cc: Ingo Molnar, Mark_H_Johnson, Lee Revell, Free Ekanayaka,
Eric St-Laurent, linux-kernel, Felipe Alfaro Solana,
Daniel Schmitt, P.O. Gaillard, nando, luke, free78
K.R. Foley wrote:
> Ingo Molnar wrote:
>
>> * Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
>>
>>
>>> If you look at the date / time of the traces, you will notice that
>>> most occur in the latter part of the test. This is during the "disk
>>> copy" and "disk read" parts of the testing. [...]
>>
>>
>>
>> would it be possible to test with DMA disabled? (hdparm -d0 /dev/hda) It
>> might take some extra work to shun the extra latency reports from the
>> PIO IDE path (which is quite slow) but once that is done you should be
>> able to see whether these long 0.5 msec delays remain even if all (most)
>> DMA activity has been eliminated.
>>
>>
>>> preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
>>> --------------------------------------------------
>>> latency: 550 us, entries: 6 (6)
>>> -----------------
>>> | task: cat/6771, uid:0 nice:0 policy:0 rt_prio:0
>>> -----------------
>>> => started at: kmap_atomic+0x23/0xe0
>>> => ended at: kunmap_atomic+0x7b/0xa0
>>> =======>
>>> 00000001 0.000ms (+0.000ms): kmap_atomic (file_read_actor)
>>> 00000001 0.000ms (+0.000ms): page_address (file_read_actor)
>>> 00000001 0.000ms (+0.549ms): __copy_to_user_ll (file_read_actor)
>>> 00000001 0.550ms (+0.000ms): kunmap_atomic (file_read_actor)
>>> 00000001 0.550ms (+0.000ms): sub_preempt_count (kunmap_atomic)
>>> 00000001 0.550ms (+0.000ms): update_max_trace (check_preempt_timing)
>>
>>
>>
>> this is a full page copy, from a kernelspace pagecache page into
>> userspace. This shouldn't take 500 usecs on any hardware. Since this is
>> a single instruction (memcpy's rep; movsl instruction) there's nothing
>> that Linux can do to avoid (or even to cause) such a situation.
>
>
> I saw this one (or one very similar) on a system that I just started
> testing on today. Not quite as high (~219 usec if I remember
> correctly). I don't have access to the system from here. I will forward
> the trace tomorrow when I'm there. However, I haven't seen this
> on my slower system running the same stress tests. There are several
> possible points of interest:
>
> System I saw this on:
> P4 2.4GHz or 3.0GHz
> 2GB memory
> 2.6.9-rc1-bk12-S0 built for SMP (even though hyperthreading is off
> currently)
Actual system info for the above is 2.4GHz with 512MB of memory.
>
> System I haven't seen this on:
> PII 450
> 256MB memory
> 2.6.9-rc1-bk12-R6 built for UP
>
> Sorry I don't have more complete data in front of me. I will send the
> concrete info tomorrow with the trace.
>
As promised here is the trace:
http://www.cybsft.com/testresults/2.6.9-rc1-bk12-S0/trace2.txt
I also have some other traces from this system that I have not seen
before on my slower system. For instance this one where we spend ~204
usec in __spin_lock_irqsave:
http://www.cybsft.com/testresults/2.6.9-rc1-bk12-S0/trace1.txt
Or this one where we spend ~203 usec in sched_clock. That just doesn't
seem possible.
http://www.cybsft.com/testresults/2.6.9-rc1-bk12-S0/trace4.txt
kr
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-10 17:56 ` K.R. Foley
@ 2004-09-10 19:26 ` Ingo Molnar
2004-09-10 19:28 ` K.R. Foley
0 siblings, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-10 19:26 UTC (permalink / raw)
To: K.R. Foley
Cc: Mark_H_Johnson, Lee Revell, Free Ekanayaka, Eric St-Laurent,
linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
* K.R. Foley <kr@cybsft.com> wrote:
> I also have some other traces from this system that I have not seen
> before on my slower system. For instance this one where we spend ~204
> usec in __spin_lock_irqsave:
>
> http://www.cybsft.com/testresults/2.6.9-rc1-bk12-S0/trace1.txt
>
> Or this one where we spend ~203 usec in sched_clock. That just doesn't
> seem possible.
>
> http://www.cybsft.com/testresults/2.6.9-rc1-bk12-S0/trace4.txt
seems quite similar to Mark's IDE-DMA related hardware latencies. Does
reducing the DMA mode via hdparm reduce these latencies?
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-10 19:26 ` Ingo Molnar
@ 2004-09-10 19:28 ` K.R. Foley
0 siblings, 0 replies; 100+ messages in thread
From: K.R. Foley @ 2004-09-10 19:28 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mark_H_Johnson, Lee Revell, Free Ekanayaka, Eric St-Laurent,
linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
nando, luke, free78
Ingo Molnar wrote:
> * K.R. Foley <kr@cybsft.com> wrote:
>
>
>>I also have some other traces from this system that I have not seen
>>before on my slower system. For instance this one where we spend ~204
>>usec in __spin_lock_irqsave:
>>
>>http://www.cybsft.com/testresults/2.6.9-rc1-bk12-S0/trace1.txt
>>
>>Or this one where we spend ~203 usec in sched_clock. That just doesn't
>>seem possible.
>>
>>http://www.cybsft.com/testresults/2.6.9-rc1-bk12-S0/trace4.txt
>
>
> seems quite similar to Mark's IDE-DMA related hardware latencies. Does
> reducing the DMA mode via hdparm reduce these latencies?
>
> Ingo
>
I will give that a try shortly.
kr
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-10 14:28 Mark_H_Johnson
@ 2004-09-10 19:29 ` Ingo Molnar
0 siblings, 0 replies; 100+ messages in thread
From: Ingo Molnar @ 2004-09-10 19:29 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Eric St-Laurent, Free Ekanayaka, free78, K.R. Foley, linux-kernel,
Felipe Alfaro Solana, luke, nando, P.O. Gaillard, Daniel Schmitt,
Lee Revell
[-- Attachment #1: Type: text/plain, Size: 668 bytes --]
* Mark_H_Johnson@Raytheon.com <Mark_H_Johnson@Raytheon.com> wrote:
> >does 'hdparm -X udma2 /dev/hda' work?
>
> Yes it does and quite well actually. For comparison
> -R1 on September 7 - over 100 traces > 500 usec
> -R8 on September 10 - 38 traces > 200 usec, only 3 > 500 usec
>
> This was with the full test suite (latencytest active, all different
> types of operations).
cool. Perhaps further reducing the DMA mode (to udma0, or even mdma
modes?) will further reduce these latencies?
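A dry-run sketch of that stepping (the device name /dev/hda follows the thread; these are the standard hdparm -X/-d flags — remove the echo to actually apply each mode):

```shell
# Step the IDE transfer mode down one notch at a time, re-running the
# latency test after each step; the echo makes this a dry run.
for mode in udma2 udma0 mdma2; do
    echo hdparm -X "$mode" /dev/hda
done
# Disabling DMA entirely is the end point for comparison:
echo hdparm -d0 /dev/hda
```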
most of your remaining latencies seem to be get_swap_page() related -
the attached (highly experimental) patch might fix that particular
latency (on top of -S0).
Ingo
[-- Attachment #2: swapspace-layout-improvements-2.6.9-rc1-bk12-A1 --]
[-- Type: text/plain, Size: 8058 bytes --]
--- linux/include/linux/swap.h.orig
+++ linux/include/linux/swap.h
@@ -193,7 +193,7 @@ extern int rw_swap_page_sync(int, swp_en
extern struct address_space swapper_space;
#define total_swapcache_pages swapper_space.nrpages
extern void show_swap_cache_info(void);
-extern int add_to_swap(struct page *);
+extern int add_to_swap(struct page *page, void *cookie, pgoff_t index);
extern void __delete_from_swap_cache(struct page *);
extern void delete_from_swap_cache(struct page *);
extern int move_to_swap_cache(struct page *, swp_entry_t);
@@ -209,7 +209,7 @@ extern long total_swap_pages;
extern unsigned int nr_swapfiles;
extern struct swap_info_struct swap_info[];
extern void si_swapinfo(struct sysinfo *);
-extern swp_entry_t get_swap_page(void);
+extern swp_entry_t get_swap_page(void *cookie, pgoff_t index);
extern int swap_duplicate(swp_entry_t);
extern int valid_swaphandles(swp_entry_t, unsigned long *);
extern void swap_free(swp_entry_t);
@@ -274,7 +274,7 @@ static inline int remove_exclusive_swap_
return 0;
}
-static inline swp_entry_t get_swap_page(void)
+static inline swp_entry_t get_swap_page(void *cookie, pgoff_t index)
{
swp_entry_t entry;
entry.val = 0;
--- linux/kernel/power/swsusp.c.orig
+++ linux/kernel/power/swsusp.c
@@ -317,7 +317,7 @@ static int write_suspend_image(void)
for (i=0; i<nr_copy_pages; i++) {
if (!(i%100))
printk( "." );
- entry = get_swap_page();
+ entry = get_swap_page(NULL, i);
if (!entry.val)
panic("\nNot enough swapspace when writing data" );
@@ -335,7 +335,7 @@ static int write_suspend_image(void)
cur = (union diskpage *)((char *) pagedir_nosave)+i;
BUG_ON ((char *) cur != (((char *) pagedir_nosave) + i*PAGE_SIZE));
printk( "." );
- entry = get_swap_page();
+ entry = get_swap_page(NULL, i);
if (!entry.val) {
printk(KERN_CRIT "Not enough swapspace when writing pgdir\n" );
panic("Don't know how to recover");
@@ -358,7 +358,7 @@ static int write_suspend_image(void)
BUG_ON (sizeof(struct suspend_header) > PAGE_SIZE-sizeof(swp_entry_t));
BUG_ON (sizeof(union diskpage) != PAGE_SIZE);
BUG_ON (sizeof(struct link) != PAGE_SIZE);
- entry = get_swap_page();
+ entry = get_swap_page(NULL, i);
if (!entry.val)
panic( "\nNot enough swapspace when writing header" );
if (swapfile_used[swp_type(entry)] != SWAPFILE_SUSPEND)
--- linux/kernel/power/pmdisk.c.orig
+++ linux/kernel/power/pmdisk.c
@@ -206,7 +206,7 @@ static int write_swap_page(unsigned long
swp_entry_t entry;
int error = 0;
- entry = get_swap_page();
+ entry = get_swap_page(NULL, addr >> PAGE_SHIFT);
if (swp_offset(entry) &&
swapfile_used[swp_type(entry)] == SWAPFILE_SUSPEND) {
error = rw_swap_page_sync(WRITE, entry,
--- linux/mm/vmscan.c.orig
+++ linux/mm/vmscan.c
@@ -390,7 +390,10 @@ static int shrink_list(struct list_head
* Try to allocate it some swap space here.
*/
if (PageAnon(page) && !PageSwapCache(page)) {
- if (!add_to_swap(page))
+ void *cookie = page->mapping;
+ pgoff_t index = page->index;
+
+ if (!add_to_swap(page, cookie, index))
goto activate_locked;
}
#endif /* CONFIG_SWAP */
--- linux/mm/swap_state.c.orig
+++ linux/mm/swap_state.c
@@ -137,8 +137,12 @@ void __delete_from_swap_cache(struct pag
*
* Allocate swap space for the page and add the page to the
* swap cache. Caller needs to hold the page lock.
+ *
+ * We attempt to lay pages out on swap so that virtually-contiguous pages are
+ * contiguous on-disk. To do this we utilise page->index (offset into vma) and
+ * page->mapping (the anon_vma's address).
*/
-int add_to_swap(struct page * page)
+int add_to_swap(struct page *page, void *cookie, pgoff_t index)
{
swp_entry_t entry;
int pf_flags;
@@ -148,7 +152,7 @@ int add_to_swap(struct page * page)
BUG();
for (;;) {
- entry = get_swap_page();
+ entry = get_swap_page(cookie, index);
if (!entry.val)
return 0;
--- linux/mm/shmem.c.orig
+++ linux/mm/shmem.c
@@ -740,7 +740,7 @@ static int shmem_writepage(struct page *
struct shmem_inode_info *info;
swp_entry_t *entry, swap;
struct address_space *mapping;
- unsigned long index;
+ pgoff_t index;
struct inode *inode;
BUG_ON(!PageLocked(page));
@@ -752,7 +752,7 @@ static int shmem_writepage(struct page *
info = SHMEM_I(inode);
if (info->flags & VM_LOCKED)
goto redirty;
- swap = get_swap_page();
+ swap = get_swap_page(mapping, index);
if (!swap.val)
goto redirty;
--- linux/mm/swapfile.c.orig
+++ linux/mm/swapfile.c
@@ -25,6 +25,7 @@
#include <linux/rmap.h>
#include <linux/security.h>
#include <linux/backing-dev.h>
+#include <linux/hash.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
@@ -83,71 +84,51 @@ void swap_unplug_io_fn(struct backing_de
up_read(&swap_unplug_sem);
}
-static inline int scan_swap_map(struct swap_info_struct *si)
-{
- unsigned long offset;
- /*
- * We try to cluster swap pages by allocating them
- * sequentially in swap. Once we've allocated
- * SWAPFILE_CLUSTER pages this way, however, we resort to
- * first-free allocation, starting a new cluster. This
- * prevents us from scattering swap pages all over the entire
- * swap partition, so that we reduce overall disk seek times
- * between swap pages. -- sct */
- if (si->cluster_nr) {
- while (si->cluster_next <= si->highest_bit) {
- offset = si->cluster_next++;
- if (si->swap_map[offset])
- continue;
- si->cluster_nr--;
- goto got_page;
- }
- }
- si->cluster_nr = SWAPFILE_CLUSTER;
+int akpm;
- /* try to find an empty (even not aligned) cluster. */
- offset = si->lowest_bit;
- check_next_cluster:
- if (offset+SWAPFILE_CLUSTER-1 <= si->highest_bit)
- {
- unsigned long nr;
- for (nr = offset; nr < offset+SWAPFILE_CLUSTER; nr++)
- if (si->swap_map[nr])
- {
- offset = nr+1;
- goto check_next_cluster;
- }
- /* We found a completly empty cluster, so start
- * using it.
- */
- goto got_page;
- }
- /* No luck, so now go finegrined as usual. -Andrea */
- for (offset = si->lowest_bit; offset <= si->highest_bit ; offset++) {
- if (si->swap_map[offset])
+/*
+ * We divide the swapdev into 1024 kilobyte chunks. We use the cookie and the
+ * upper bits of the index to select a chunk and the rest of the index as the
+ * offset into the selected chunk.
+ */
+#define CHUNK_SHIFT (20 - PAGE_SHIFT)
+#define CHUNK_MASK (-1UL << CHUNK_SHIFT)
+
+static int
+scan_swap_map(struct swap_info_struct *si, void *cookie, pgoff_t index)
+{
+ unsigned long chunk;
+ unsigned long nchunks;
+ unsigned long block;
+ unsigned long scan;
+
+ nchunks = si->max >> CHUNK_SHIFT;
+ chunk = 0;
+ if (nchunks)
+ chunk = hash_long((unsigned long)cookie + (index & CHUNK_MASK),
+ BITS_PER_LONG) % nchunks;
+
+ block = (chunk << CHUNK_SHIFT) + (index & ~CHUNK_MASK);
+
+ for (scan = 0; scan < si->max; scan++, block++) {
+ if (block == si->max)
+ block = 0;
+ if (block == 0)
continue;
- si->lowest_bit = offset+1;
- got_page:
- if (offset == si->lowest_bit)
- si->lowest_bit++;
- if (offset == si->highest_bit)
- si->highest_bit--;
- if (si->lowest_bit > si->highest_bit) {
- si->lowest_bit = si->max;
- si->highest_bit = 0;
- }
- si->swap_map[offset] = 1;
- si->inuse_pages++;
+ if (si->swap_map[block])
+ continue;
+ si->swap_map[block] = 1;
nr_swap_pages--;
- si->cluster_next = offset+1;
- return offset;
+ if (akpm)
+ printk("cookie:%p, index:%lu, chunk:%lu nchunks:%lu "
+ "block:%lu\n",
+ cookie, index, chunk, nchunks, block);
+ return block;
}
- si->lowest_bit = si->max;
- si->highest_bit = 0;
return 0;
}
-swp_entry_t get_swap_page(void)
+swp_entry_t get_swap_page(void *cookie, pgoff_t index)
{
struct swap_info_struct * p;
unsigned long offset;
@@ -166,7 +147,7 @@ swp_entry_t get_swap_page(void)
p = &swap_info[type];
if ((p->flags & SWP_ACTIVE) == SWP_ACTIVE) {
swap_device_lock(p);
- offset = scan_swap_map(p);
+ offset = scan_swap_map(p, cookie, index);
swap_device_unlock(p);
if (offset) {
entry = swp_entry(type,offset);
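For what it's worth, the chunk-selection arithmetic in the patched
scan_swap_map() can be modeled in isolation. This is a minimal Python
sketch, not the kernel code: it assumes 4 KB pages (PAGE_SHIFT = 12) and
stands in Python's integer hash for the kernel's hash_long(); it also only
computes the first slot probed, not the fallback scan.

```python
# Model of the chunk selection in the patched scan_swap_map().
PAGE_SHIFT = 12
CHUNK_SHIFT = 20 - PAGE_SHIFT        # 2^8 = 256 pages per 1 MB chunk
CHUNK_PAGES = 1 << CHUNK_SHIFT

def preferred_block(cookie, index, si_max):
    """First swap slot the allocator probes for (cookie, index)."""
    nchunks = si_max >> CHUNK_SHIFT
    chunk = 0
    if nchunks:
        # hash the (mapping cookie, chunk-aligned index) pair to a chunk
        chunk = hash(cookie + (index & ~(CHUNK_PAGES - 1))) % nchunks
    # the low index bits become the offset inside the chunk, so that
    # virtually-contiguous pages land contiguously on the swap device
    return (chunk << CHUNK_SHIFT) + (index & (CHUNK_PAGES - 1))
```

The point of the layout is visible in the model: two pages with the same
cookie and adjacent indices within one 1 MB window hash to the same chunk
and get adjacent on-disk slots.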
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-13 14:44 Mark_H_Johnson
2004-09-14 18:32 ` Ingo Molnar
0 siblings, 1 reply; 100+ messages in thread
From: Mark_H_Johnson @ 2004-09-13 14:44 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric St-Laurent, Free Ekanayaka, free78, K.R. Foley, linux-kernel,
Felipe Alfaro Solana, luke, nando, P.O. Gaillard, Daniel Schmitt,
Lee Revell
>most of your remaining latencies seem to be get_swap_page() related -
>the attached (highly experimental) patch might fix that particular
>latency (on top of -S0).
Qualitatively, -S0 with this patch is a little worse than what I saw with
-R8.
From the last message I sent...
> -R8 on September 10 - 38 traces > 200 usec, only 3 > 500 usec
-S0 on September 13 - 78 traces > 200 usec, only 3 > 500 usec
So I have more relatively small delays in the same test period - almost
twice as many - and few of them, as far as I can see, are swap related.
Most are network related. There was also one very long trace > 70 msec -
this is the third day in a row I've had a single multi-millisecond trace
in a run.
Key contributors seem to be:
- IRQ sequence - 00 02 50 64 73 75 76 77
- modprobe - 01
- rtl8139_poll - 03 04 07 08 09 10 12 13 15 16 17 18 19 21 24 47
- 49 72 74
- tcp_prequeue_process - 05
- release_sock - 06 14
- tcp_copy_to_iovec - 11
- do_munmap - 20 [very long trace]
- twkill_work - 22
- exit_mmap - 23 25 26 27 28 29 30 31 32 36 37 38 39 40 41 44
- 46 48 51 52 53 54 55 57 58 59 60 61 62 66 67 68
- 69 70 71
- kswapd - 33 34 35 42 63
- avc_insert - 43 45 65
- schedule - 56
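A summary like the list above can be produced mechanically by bucketing
the trace files on the symbol in their "=> started at:" line. This is a
hypothetical helper (the trace ids and inputs are illustrative; the
categories above were presumably assigned by hand from the most telling
function in each trace, which is not always the "started at" symbol):

```python
import re
from collections import defaultdict

def bucket_traces(texts):
    """Map the 'started at' symbol of each trace to the trace ids."""
    buckets = defaultdict(list)
    for trace_id, text in texts.items():
        # e.g. "=> started at: __spin_lock_irqsave+0x2b/0xb0"
        m = re.search(r"=> started at: (\w+)", text)
        if m:
            buckets[m.group(1)].append(trace_id)
    return buckets
```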
IRQ Sequence
============
Hard to describe - no specific step takes a long time, just a long
series of activities, apparently IRQ related, that add up to a > 300 usec
delay. Many of the ones I noted have mouse / signal related activity;
see also the "schedule" trace near the end for a similar condition.
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
latency: 342 us, entries: 280 (280) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: kjournald/191, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: __spin_lock_irqsave+0x2b/0xb0
=> ended at: _spin_unlock_irq+0x2e/0x60
=======>
[should look at the full traces for details]
modprobe
========
The shortest trace. I had run
system-config-soundcard
to play the test sound before starting the real time test program, to
make sure the sound system was working. Not something I would expect to do
on a real time system, but something to fix (or document as a limitation).
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
latency: 460 us, entries: 9 (9) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: modprobe/2995, uid:0 nice:-10 policy:0 rt_prio:0
-----------------
=> started at: __spin_lock_irqsave+0x2b/0xb0
=> ended at: _spin_unlock_irq+0x2e/0x60
=======>
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (spin_lock_irq)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000001 0.000ms (+0.454ms): __find_symbol (resolve_symbol)
00000001 0.455ms (+0.000ms): use_module (resolve_symbol)
00000001 0.455ms (+0.003ms): already_uses (use_module)
00000001 0.459ms (+0.001ms): kmem_cache_alloc (use_module)
00000001 0.460ms (+0.000ms): _spin_unlock_irq (resolve_symbol)
00000001 0.460ms (+0.000ms): sub_preempt_count (_spin_unlock_irq)
00000001 0.461ms (+0.000ms): update_max_trace (check_preempt_timing)
rtl8139_poll
============
We've talked about this one before, but it still shows up on a regular basis.
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
latency: 347 us, entries: 245 (245) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: ksoftirqd/0/3, uid:0 nice:-10 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: rtl8139_rx+0x219/0x340
=======>
[again - no big steps, just a large number w/o reschedule opportunity]
tcp_prequeue_process
====================
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
latency: 271 us, entries: 294 (294) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: rcp/3591, uid:2711 nice:0 policy:0 rt_prio:0
-----------------
=> started at: tcp_prequeue_process+0x49/0xb0
=> ended at: tcp_write_xmit+0x249/0x330
=======>
[appears similar to the others - many steps, each one very short]
release_sock
============
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
latency: 209 us, entries: 275 (275) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: rcp/3705, uid:2711 nice:0 policy:0 rt_prio:0
-----------------
=> started at: __spin_lock_irqsave+0x2b/0xb0
=> ended at: cond_resched_softirq+0x65/0x90
=======>
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (spin_lock_bh)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000101 0.000ms (+0.000ms): __release_sock (release_sock)
00000101 0.000ms (+0.000ms): _spin_unlock (__release_sock)
... long sequence, not long in each step ...
tcp_copy_to_iovec
=================
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
latency: 210 us, entries: 243 (243) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: rcp/3894, uid:2711 nice:0 policy:0 rt_prio:0
-----------------
=> started at: tcp_copy_to_iovec+0x5b/0xb0
=> ended at: local_bh_enable+0x17/0xa0
=======>
00000100 0.000ms (+0.002ms): tcp_copy_to_iovec (tcp_rcv_established)
00000100 0.002ms (+0.005ms): tcp_event_data_recv (tcp_rcv_established)
00000100 0.007ms (+0.000ms): tcp_send_ack (tcp_rcv_established)
... large number of steps, each one very short in duration ...
do_munmap
=========
The very long trace (> 4000 entries, > 70 msec). The following lists all
steps until we get "stuck" in roughly 1 msec pauses. Note - I don't have
the "how we got out" part of this trace, but see previous messages for
that kind of detail.
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
latency: 71065 us, entries: 4000 (5343) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: latencytest/4408, uid:0 nice:0 policy:1 rt_prio:99
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: cond_resched_lock+0x1d/0xa0
=======>
00000001 0.000ms (+0.000ms): spin_lock (do_munmap)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): unmap_region (do_munmap)
00000001 0.001ms (+0.000ms): lru_add_drain (unmap_region)
00000002 0.001ms (+0.000ms): __pagevec_lru_add_active (lru_add_drain)
00000002 0.002ms (+0.000ms): spin_lock_irq (__pagevec_lru_add_active)
00000002 0.002ms (+0.000ms): __spin_lock_irqsave (spin_lock_irq)
00000003 0.002ms (+0.001ms): __spin_lock_irqsave (<00000000>)
00000003 0.003ms (+0.000ms): _spin_unlock_irq (__pagevec_lru_add_active)
00000002 0.003ms (+0.001ms): release_pages (__pagevec_lru_add_active)
00000001 0.004ms (+0.001ms): __bitmap_weight (unmap_region)
00000001 0.006ms (+0.001ms): unmap_vmas (unmap_region)
00000001 0.007ms (+0.000ms): unmap_page_range (unmap_vmas)
00000001 0.007ms (+0.000ms): zap_pmd_range (unmap_page_range)
00000001 0.008ms (+0.000ms): zap_pte_range (zap_pmd_range)
00000001 0.009ms (+0.000ms): kmap_atomic (zap_pte_range)
00000002 0.009ms (+0.001ms): page_address (zap_pte_range)
00000002 0.011ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.012ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.013ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.013ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.013ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.013ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.014ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.014ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.015ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.015ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.015ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.016ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.016ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.016ms (+0.001ms): page_remove_rmap (zap_pte_range)
00000002 0.017ms (+0.000ms): smp_apic_timer_interrupt (zap_pte_range)
00010002 0.018ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010002 0.018ms (+0.000ms): profile_hook (profile_tick)
00010002 0.018ms (+0.000ms): read_lock (profile_hook)
00010003 0.019ms (+0.000ms): read_lock (<00000000>)
00010003 0.019ms (+0.000ms): notifier_call_chain (profile_hook)
00010003 0.019ms (+0.000ms): _read_unlock (profile_tick)
00010002 0.020ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010002 0.021ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010002 0.021ms (+0.000ms): update_one_process (update_process_times)
00010002 0.022ms (+0.000ms): run_local_timers (update_process_times)
00010002 0.022ms (+0.000ms): raise_softirq (update_process_times)
00010002 0.023ms (+0.000ms): scheduler_tick (update_process_times)
00010002 0.023ms (+0.002ms): sched_clock (scheduler_tick)
00010002 0.026ms (+0.000ms): spin_lock (scheduler_tick)
00010003 0.027ms (+0.000ms): spin_lock (<00000000>)
00010003 0.027ms (+0.000ms): _spin_unlock (scheduler_tick)
00010002 0.027ms (+0.001ms): rebalance_tick (scheduler_tick)
00000003 0.028ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000003 0.029ms (+0.001ms): __do_softirq (do_softirq)
00000002 0.030ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.031ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.031ms (+0.001ms): kunmap_atomic (zap_pte_range)
00000001 0.032ms (+0.002ms): flush_tlb_mm (unmap_vmas)
00000002 0.034ms (+0.000ms): flush_tlb_others (flush_tlb_mm)
00000002 0.035ms (+0.000ms): spin_lock (flush_tlb_others)
00000003 0.035ms (+0.000ms): spin_lock (<00000000>)
00000003 0.036ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000003 0.037ms (+0.158ms): send_IPI_mask_bitmask (flush_tlb_others)
00010003 0.195ms (+0.000ms): do_nmi (flush_tlb_others)
00010003 0.195ms (+0.002ms): do_nmi (__trace)
00010003 0.198ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010003 0.199ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010003 0.199ms (+0.000ms): profile_hook (profile_tick)
00010003 0.199ms (+0.000ms): read_lock (profile_hook)
00010004 0.199ms (+0.000ms): read_lock (<00000000>)
00010004 0.199ms (+0.000ms): notifier_call_chain (profile_hook)
00010004 0.200ms (+0.000ms): _read_unlock (profile_tick)
00010003 0.200ms (+0.001ms): profile_hit (nmi_watchdog_tick)
00010003 0.202ms (+0.000ms): do_IRQ (flush_tlb_others)
00010003 0.202ms (+0.000ms): do_IRQ (<00000000>)
00010003 0.202ms (+0.000ms): spin_lock (do_IRQ)
00010004 0.203ms (+0.000ms): spin_lock (<00000000>)
00010004 0.203ms (+0.000ms): ack_edge_ioapic_irq (do_IRQ)
00010004 0.203ms (+0.000ms): redirect_hardirq (do_IRQ)
00010004 0.204ms (+0.000ms): _spin_unlock (do_IRQ)
00010003 0.204ms (+0.000ms): handle_IRQ_event (do_IRQ)
00010003 0.204ms (+0.000ms): timer_interrupt (handle_IRQ_event)
00010003 0.204ms (+0.000ms): spin_lock (timer_interrupt)
00010004 0.205ms (+0.000ms): spin_lock (<00000000>)
00010004 0.205ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010004 0.205ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010004 0.206ms (+0.000ms): spin_lock (mark_offset_tsc)
00010005 0.206ms (+0.000ms): spin_lock (<00000000>)
00010005 0.206ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 0.206ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 0.206ms (+0.000ms): spin_lock (mark_offset_tsc)
00010006 0.207ms (+0.000ms): spin_lock (<00000000>)
00010006 0.207ms (+0.003ms): mark_offset_tsc (timer_interrupt)
00010006 0.211ms (+0.004ms): mark_offset_tsc (timer_interrupt)
00010006 0.215ms (+0.002ms): mark_offset_tsc (timer_interrupt)
00010006 0.217ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010006 0.217ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010006 0.217ms (+0.000ms): _spin_unlock (mark_offset_tsc)
00010005 0.218ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 0.218ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 0.218ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 0.219ms (+0.000ms): _spin_unlock (mark_offset_tsc)
00010004 0.219ms (+0.000ms): timer_interrupt (handle_IRQ_event)
00010004 0.219ms (+0.000ms): spin_lock (timer_interrupt)
00010005 0.220ms (+0.003ms): spin_lock (<00000000>)
00010005 0.223ms (+0.000ms): _spin_unlock (timer_interrupt)
00010004 0.224ms (+0.000ms): do_timer (timer_interrupt)
00010004 0.224ms (+0.000ms): update_wall_time (do_timer)
00010004 0.225ms (+0.000ms): update_wall_time_one_tick (update_wall_time)
00010004 0.225ms (+0.000ms): _spin_unlock (timer_interrupt)
00010003 0.226ms (+0.000ms): spin_lock (do_IRQ)
00010004 0.226ms (+0.000ms): spin_lock (<00000000>)
00010004 0.226ms (+0.000ms): note_interrupt (do_IRQ)
00010004 0.227ms (+0.000ms): end_edge_ioapic_irq (do_IRQ)
00010004 0.227ms (+0.000ms): _spin_unlock (do_IRQ)
00000004 0.227ms (+0.000ms): do_softirq (do_IRQ)
00000004 0.227ms (+0.788ms): __do_softirq (do_softirq)
00000003 1.016ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
00010003 1.016ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010003 1.017ms (+0.000ms): profile_hook (profile_tick)
00010003 1.017ms (+0.000ms): read_lock (profile_hook)
00010004 1.017ms (+0.000ms): read_lock (<00000000>)
00010004 1.017ms (+0.000ms): notifier_call_chain (profile_hook)
00010004 1.017ms (+0.000ms): _read_unlock (profile_tick)
00010003 1.018ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010003 1.018ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010003 1.018ms (+0.000ms): update_one_process (update_process_times)
00010003 1.018ms (+0.000ms): run_local_timers (update_process_times)
00010003 1.018ms (+0.000ms): raise_softirq (update_process_times)
00010003 1.019ms (+0.000ms): scheduler_tick (update_process_times)
00010003 1.019ms (+0.000ms): sched_clock (scheduler_tick)
00010003 1.019ms (+0.000ms): spin_lock (scheduler_tick)
00010004 1.019ms (+0.000ms): spin_lock (<00000000>)
00010004 1.019ms (+0.000ms): _spin_unlock (scheduler_tick)
00010003 1.020ms (+0.000ms): rebalance_tick (scheduler_tick)
00000004 1.020ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000004 1.020ms (+0.173ms): __do_softirq (do_softirq)
00010003 1.193ms (+0.000ms): do_nmi (flush_tlb_others)
00010003 1.194ms (+0.001ms): do_nmi (__trace)
00010003 1.195ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010003 1.196ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010003 1.196ms (+0.000ms): profile_hook (profile_tick)
00010003 1.196ms (+0.000ms): read_lock (profile_hook)
00010004 1.196ms (+0.000ms): read_lock (<00000000>)
00010004 1.196ms (+0.000ms): notifier_call_chain (profile_hook)
00010004 1.197ms (+0.000ms): _read_unlock (profile_tick)
00010003 1.197ms (+0.000ms): profile_hit (nmi_watchdog_tick)
00010003 1.198ms (+0.000ms): do_IRQ (flush_tlb_others)
00010003 1.198ms (+0.000ms): do_IRQ (<00000000>)
00010003 1.198ms (+0.000ms): spin_lock (do_IRQ)
00010004 1.198ms (+0.000ms): spin_lock (<00000000>)
00010004 1.198ms (+0.000ms): ack_edge_ioapic_irq (do_IRQ)
00010004 1.199ms (+0.000ms): redirect_hardirq (do_IRQ)
00010004 1.199ms (+0.000ms): _spin_unlock (do_IRQ)
00010003 1.199ms (+0.000ms): handle_IRQ_event (do_IRQ)
00010003 1.199ms (+0.000ms): timer_interrupt (handle_IRQ_event)
00010003 1.199ms (+0.000ms): spin_lock (timer_interrupt)
00010004 1.200ms (+0.000ms): spin_lock (<00000000>)
00010004 1.200ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010004 1.200ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010004 1.200ms (+0.000ms): spin_lock (mark_offset_tsc)
00010005 1.200ms (+0.000ms): spin_lock (<00000000>)
00010005 1.201ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 1.201ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 1.201ms (+0.000ms): spin_lock (mark_offset_tsc)
00010006 1.201ms (+0.000ms): spin_lock (<00000000>)
00010006 1.201ms (+0.003ms): mark_offset_tsc (timer_interrupt)
00010006 1.205ms (+0.004ms): mark_offset_tsc (timer_interrupt)
00010006 1.209ms (+0.002ms): mark_offset_tsc (timer_interrupt)
00010006 1.212ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010006 1.212ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010006 1.212ms (+0.000ms): _spin_unlock (mark_offset_tsc)
00010005 1.212ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 1.213ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 1.213ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 1.213ms (+0.000ms): _spin_unlock (mark_offset_tsc)
00010004 1.213ms (+0.000ms): timer_interrupt (handle_IRQ_event)
00010004 1.213ms (+0.000ms): spin_lock (timer_interrupt)
00010005 1.214ms (+0.003ms): spin_lock (<00000000>)
00010005 1.217ms (+0.000ms): _spin_unlock (timer_interrupt)
00010004 1.218ms (+0.000ms): do_timer (timer_interrupt)
00010004 1.218ms (+0.000ms): update_wall_time (do_timer)
00010004 1.218ms (+0.000ms): update_wall_time_one_tick (update_wall_time)
00010004 1.218ms (+0.000ms): _spin_unlock (timer_interrupt)
00010003 1.219ms (+0.000ms): spin_lock (do_IRQ)
00010004 1.219ms (+0.000ms): spin_lock (<00000000>)
00010004 1.219ms (+0.000ms): note_interrupt (do_IRQ)
00010004 1.219ms (+0.000ms): end_edge_ioapic_irq (do_IRQ)
00010004 1.219ms (+0.000ms): _spin_unlock (do_IRQ)
00000004 1.220ms (+0.000ms): do_softirq (do_IRQ)
00000004 1.220ms (+0.795ms): __do_softirq (do_softirq)
00000003 2.016ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
... the steps over the last millisecond get repeated for several cycles ...
twkill_work
===========
A very long sequence (> 1500 entries) with nonzero preempt-count values
that end in "00". Does that mean we did not hold any locks but could
not schedule for some other reason?
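One way to read that column is to decode it by bit field. This is a rough
Python decoder assuming the 2.6-era preempt_count layout (preempt count in
bits 0-7, softirq count in bits 8-15, hardirq count above that); under
that reading, a value like 00000100 would mean no spinlock/preempt-disable
nesting, but still inside softirq context - which is itself enough to keep
the task from scheduling.

```python
def decode_preempt_count(value):
    """Decode a hex preempt-count column from these traces.

    Assumes the 2.6-era field layout: preempt count in bits 0-7,
    softirq count in bits 8-15, hardirq count starting at bit 16.
    """
    pc = int(value, 16)
    return {
        "preempt": pc & 0xFF,          # spinlocks / preempt_disable()
        "softirq": (pc >> 8) & 0xFF,   # softirq nesting
        "hardirq": (pc >> 16) & 0xFFF, # hardirq nesting
    }
```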
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
latency: 429 us, entries: 1531 (1531) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: events/0/6, uid:0 nice:-10 policy:0 rt_prio:0
-----------------
=> started at: __spin_lock_irqsave+0x2b/0xb0
=> ended at: twkill_work+0xb5/0xe0
=======>
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (spin_lock_bh)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000101 0.000ms (+0.000ms): tcp_do_twkill_work (twkill_work)
00000101 0.000ms (+0.000ms): _spin_unlock (tcp_do_twkill_work)
00000100 0.001ms (+0.000ms): tcp_timewait_kill (tcp_do_twkill_work)
00000100 0.001ms (+0.000ms): write_lock (tcp_timewait_kill)
00000101 0.001ms (+0.000ms): write_lock (<00000000>)
00000101 0.002ms (+0.000ms): _write_unlock (tcp_timewait_kill)
00000100 0.002ms (+0.000ms): spin_lock (tcp_timewait_kill)
00000101 0.002ms (+0.000ms): spin_lock (<00000000>)
00000101 0.002ms (+0.000ms): tcp_bucket_destroy (tcp_timewait_kill)
00000101 0.003ms (+0.000ms): kmem_cache_free (tcp_bucket_destroy)
00000101 0.003ms (+0.000ms): _spin_unlock (tcp_timewait_kill)
00000100 0.003ms (+0.000ms): kmem_cache_free (tcp_do_twkill_work)
00000100 0.003ms (+0.000ms): spin_lock (tcp_do_twkill_work)
00000101 0.004ms (+0.000ms): spin_lock (<00000000>)
00000101 0.004ms (+0.000ms): _spin_unlock (tcp_do_twkill_work)
00000100 0.004ms (+0.000ms): tcp_timewait_kill (tcp_do_twkill_work)
00000100 0.004ms (+0.000ms): write_lock (tcp_timewait_kill)
...
exit_mmap
=========
This one has a relatively low delay for clear_page_tables, but
several other traces show 100 to 200 usec delays at this step. [though
if you add the time spent handling the nmi_watchdog, this one falls into
the 100-200 usec range as well]
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
latency: 266 us, entries: 146 (146) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: get_ltrace.sh/5099, uid:0 nice:-9 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (cond_resched_lock)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): __bitmap_weight (unmap_vmas)
00000001 0.000ms (+0.001ms): vm_acct_memory (exit_mmap)
00000001 0.001ms (+0.056ms): clear_page_tables (exit_mmap)
00010001 0.058ms (+0.000ms): do_nmi (clear_page_tables)
00010001 0.058ms (+0.002ms): do_nmi (__mcount)
00010001 0.060ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010001 0.061ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.061ms (+0.000ms): profile_hook (profile_tick)
00010001 0.061ms (+0.000ms): read_lock (profile_hook)
00010002 0.061ms (+0.000ms): read_lock (<00000000>)
00010002 0.061ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.062ms (+0.000ms): _read_unlock (profile_tick)
00010001 0.062ms (+0.091ms): profile_hit (nmi_watchdog_tick)
00010001 0.153ms (+0.000ms): do_IRQ (clear_page_tables)
00010001 0.153ms (+0.000ms): do_IRQ (<0000000a>)
00010001 0.153ms (+0.000ms): spin_lock (do_IRQ)
00010002 0.154ms (+0.000ms): spin_lock (<00000000>)
00010002 0.154ms (+0.000ms): mask_and_ack_level_ioapic_irq (do_IRQ)
00010002 0.154ms (+0.000ms): mask_IO_APIC_irq (mask_and_ack_level_ioapic_irq)
00010002 0.154ms (+0.000ms): __spin_lock_irqsave (mask_IO_APIC_irq)
00010003 0.154ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00010003 0.154ms (+0.000ms): __mask_IO_APIC_irq (mask_IO_APIC_irq)
00010003 0.155ms (+0.018ms): __modify_IO_APIC_irq (__mask_IO_APIC_irq)
00010003 0.173ms (+0.000ms): _spin_unlock_irqrestore (mask_IO_APIC_irq)
00010002 0.174ms (+0.000ms): redirect_hardirq (do_IRQ)
00010002 0.174ms (+0.000ms): _spin_unlock (do_IRQ)
00010001 0.174ms (+0.000ms): handle_IRQ_event (do_IRQ)
00010001 0.174ms (+0.000ms): usb_hcd_irq (handle_IRQ_event)
...
kswapd
======
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
latency: 557 us, entries: 840 (840) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: spin_lock+0x24/0x90
=> ended at: _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (page_referenced_file)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.001ms (+0.002ms): prio_tree_first (vma_prio_tree_next)
00000001 0.004ms (+0.002ms): prio_tree_left (prio_tree_first)
00000001 0.006ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.007ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.007ms (+0.001ms): spin_lock (<00000000>)
00000002 0.009ms (+0.001ms): kmap_atomic (page_referenced_one)
00000003 0.011ms (+0.000ms): page_address (page_referenced_one)
00000003 0.012ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.012ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.012ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.013ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.013ms (+0.003ms): spin_lock (page_referenced_one)
00000002 0.017ms (+0.003ms): spin_lock (<00000000>)
00000002 0.020ms (+0.000ms): kmap_atomic (page_referenced_one)
avc_insert
==========
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
latency: 266 us, entries: 80 (80) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: fam/2565, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: __spin_lock_irqsave+0x2b/0xb0
=> ended at: _spin_unlock_irqrestore+0x32/0x70
=======>
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (avc_has_perm_noaudit)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000001 0.000ms (+0.204ms): avc_insert (avc_has_perm_noaudit)
00000001 0.204ms (+0.000ms): memcpy (avc_has_perm_noaudit)
00000001 0.204ms (+0.001ms): _spin_unlock_irqrestore (avc_has_perm_noaudit)
00010001 0.205ms (+0.000ms): do_IRQ (_spin_unlock_irqrestore)
00010001 0.206ms (+0.000ms): do_IRQ (<0000000a>)
00010001 0.206ms (+0.000ms): spin_lock (do_IRQ)
00010002 0.206ms (+0.000ms): spin_lock (<00000000>)
...
schedule
========
Have not talked about schedule in a while - this looks like something
different from before. It appears to be some deep nesting of
preempt-disabling activities.
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
latency: 237 us, entries: 426 (426) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
-----------------
| task: latencytest/5314, uid:0 nice:0 policy:1 rt_prio:99
-----------------
=> started at: schedule+0x59/0x5b0
=> ended at: schedule+0x39d/0x5b0
=======>
00000001 0.000ms (+0.000ms): schedule (ksoftirqd)
00000001 0.000ms (+0.000ms): schedule (ksoftirqd)
00000001 0.000ms (+0.000ms): schedule (ksoftirqd)
00000001 0.000ms (+0.000ms): schedule (ksoftirqd)
00000001 0.000ms (+0.000ms): schedule (ksoftirqd)
00000001 0.000ms (+0.000ms): schedule (ksoftirqd)
00000001 0.000ms (+0.000ms): sched_clock (schedule)
00000001 0.001ms (+0.000ms): schedule (ksoftirqd)
00000001 0.001ms (+0.000ms): schedule (ksoftirqd)
00000001 0.001ms (+0.000ms): schedule (ksoftirqd)
00000001 0.001ms (+0.000ms): spin_lock_irq (schedule)
00000001 0.001ms (+0.000ms): __spin_lock_irqsave (spin_lock_irq)
00000002 0.002ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000002 0.002ms (+0.000ms): schedule (ksoftirqd)
00000002 0.002ms (+0.000ms): deactivate_task (schedule)
00000002 0.002ms (+0.000ms): dequeue_task (deactivate_task)
00000002 0.002ms (+0.000ms): schedule (ksoftirqd)
00000002 0.002ms (+0.000ms): schedule (ksoftirqd)
00000002 0.003ms (+0.000ms): dependent_sleeper (schedule)
00000002 0.003ms (+0.000ms): schedule (ksoftirqd)
00000002 0.003ms (+0.000ms): schedule (ksoftirqd)
00000002 0.003ms (+0.000ms): schedule (ksoftirqd)
00000002 0.003ms (+0.000ms): schedule (ksoftirqd)
00000002 0.003ms (+0.000ms): dummy_switch_tasks (schedule)
00000002 0.003ms (+0.000ms): schedule (ksoftirqd)
00000002 0.004ms (+0.000ms): schedule (ksoftirqd)
00000002 0.004ms (+0.000ms): schedule (ksoftirqd)
00000002 0.004ms (+0.000ms): schedule (ksoftirqd)
00000002 0.004ms (+0.000ms): schedule (ksoftirqd)
00000002 0.004ms (+0.000ms): sched_info_switch (schedule)
00000002 0.004ms (+0.000ms): sched_info_depart (sched_info_switch)
00000002 0.005ms (+0.000ms): sched_info_arrive (schedule)
00000002 0.005ms (+0.000ms): sched_info_dequeued (sched_info_arrive)
00000002 0.005ms (+0.000ms): schedule (ksoftirqd)
00000002 0.005ms (+0.000ms): context_switch (schedule)
00000002 0.005ms (+0.000ms): dummy_cs_entry (context_switch)
00000002 0.005ms (+0.000ms): context_switch (schedule)
00000002 0.006ms (+0.000ms): dummy_cs_switch_mm (context_switch)
00000002 0.006ms (+0.000ms): context_switch (schedule)
00000002 0.006ms (+0.000ms): dummy_cs_unlikely_if (context_switch)
00000002 0.006ms (+0.000ms): context_switch (schedule)
00000002 0.006ms (+0.000ms): dummy_cs_switch_to (context_switch)
00000002 0.006ms (+0.000ms): context_switch (schedule)
04000002 0.007ms (+0.000ms): __switch_to (context_switch)
04000002 0.007ms (+0.000ms): dummy_cs_exit (context_switch)
04000002 0.007ms (+0.000ms): context_switch (schedule)
04000002 0.007ms (+0.000ms): finish_task_switch (schedule)
04000002 0.008ms (+0.000ms): _spin_unlock_irq (finish_task_switch)
04010002 0.008ms (+0.000ms): do_IRQ (_spin_unlock_irq)
04010002 0.008ms (+0.000ms): do_IRQ (<0000000a>)
04010002 0.008ms (+0.000ms): spin_lock (do_IRQ)
04010003 0.009ms (+0.000ms): spin_lock (<00000000>)
04010003 0.009ms (+0.000ms): mask_and_ack_level_ioapic_irq (do_IRQ)
04010003 0.009ms (+0.000ms): mask_IO_APIC_irq (mask_and_ack_level_ioapic_irq)
04010003 0.009ms (+0.000ms): __spin_lock_irqsave (mask_IO_APIC_irq)
04010004 0.009ms (+0.000ms): __spin_lock_irqsave (<00000000>)
04010004 0.009ms (+0.000ms): __mask_IO_APIC_irq (mask_IO_APIC_irq)
04010004 0.010ms (+0.020ms): __modify_IO_APIC_irq (__mask_IO_APIC_irq)
04010004 0.030ms (+0.000ms): _spin_unlock_irqrestore (mask_IO_APIC_irq)
04010003 0.030ms (+0.000ms): redirect_hardirq (do_IRQ)
04010003 0.030ms (+0.000ms): _spin_unlock (do_IRQ)
04010002 0.031ms (+0.000ms): handle_IRQ_event (do_IRQ)
04010002 0.031ms (+0.000ms): usb_hcd_irq (handle_IRQ_event)
04010002 0.031ms (+0.001ms): uhci_irq (usb_hcd_irq)
04010002 0.032ms (+0.000ms): spin_lock (uhci_irq)
04010003 0.032ms (+0.000ms): spin_lock (<00000000>)
04010003 0.032ms (+0.000ms): uhci_get_current_frame_number (uhci_irq)
04010003 0.033ms (+0.000ms): uhci_free_pending_qhs (uhci_irq)
04010003 0.033ms (+0.000ms): uhci_free_pending_tds (uhci_irq)
...
gets even deeper nesting with the first value going up to 04010008
while the system appears to send signals based on mouse input. None
of the steps take very long, but with over 400 steps in the sequence
it shows up in the trace record.
--Mark
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-13 14:44 Mark_H_Johnson
@ 2004-09-14 18:32 ` Ingo Molnar
2004-09-15 15:26 ` Stephen Smalley
0 siblings, 1 reply; 100+ messages in thread
From: Ingo Molnar @ 2004-09-14 18:32 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Eric St-Laurent, Free Ekanayaka, free78, K.R. Foley, linux-kernel,
Felipe Alfaro Solana, luke, nando, P.O. Gaillard, Daniel Schmitt,
Lee Revell
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> So I have more relatively small delays in the same test period -
> almost twice as much with few that I can see are swap related. Most
> are network related. One very long trace > 70 msec - this is the third
> day I've had a single multiple millisecond trace in a run.
the big one is the flush_tlb_others() latency:
> 00010003 0.195ms (+0.000ms): do_nmi (flush_tlb_others)
> 00010003 0.195ms (+0.002ms): do_nmi (__trace)
> 00010003 1.193ms (+0.000ms): do_nmi (flush_tlb_others)
> 00010003 1.194ms (+0.001ms): do_nmi (__trace)
> ... the steps over the last millisecond get repeated for several cycles ...
this is similar to what i described in earlier emails - IPI passing
getting held up by some IRQs-off activity on the other CPU. You can see
what happened on the other CPU by doing this:
grep ': do_nmi' lt.20 | grep -v flush_tlb_other
there's only one suspicious thing, roughly half into the trace:
00010003 30.178ms (+0.004ms): do_nmi (run_timer_softirq)
00010003 31.178ms (+0.001ms): do_nmi (run_timer_softirq)
so run_timer_softirq() was done - but we don't normally run that with
interrupts disabled ...
so maybe something else is holding up the IPI? Eventually the IPI
arrived so it could very well be just a 70 msec delay due to printing a
latency trace to the console. But ... maybe something odd is going on
that delays IPIs. To debug this further, could you do something like
this to kernel/latency.c's nmi_trace() function:
	if (cpu_online(cpu) && cpu != this_cpu) {
		__trace(eip, nmi_eips[cpu]);
		__trace(eip, irq_stat[cpu].apic_timer_irqs);
	}
this will add a third do_nmi entry to the trace, showing the APIC timer
interrupt counter on that other CPU. If this counter doesn't increase
during the ~70 ticks then that CPU had interrupts disabled permanently
and the IPI latency is normal. If the counter increases then the IPI
delay is unjustified and there's something bad going on ...
> A very long sequence (> 1500 traces) with non zero preempt values that
> end in "00". Does that mean we did not have any locks but could not
> schedule for other reasons?
no - it's 0x00000100 that is SOFTIRQ_OFFSET, so it's normal behavior.
The design is like this: the softirq and hardirq counts are 'merged'
into preempt_count(). spin_lock/unlock increases/decreases the
preempt_count() by 1. Softirq disabling increases the preempt_count() by
0x100 - hardirq entry increases the preempt_count() by 0x10000. This way
we have a nesting up to 255 and we are still able to tell from the
preempt_count() alone whether we are in a hardirq, or whether softirqs
are disabled.
For scheduling purposes all that matters is whether the preempt_count is
nonzero (in which case the kernel must not be preempted). But there are
other contexts too that need exclusion: softirqs will run asynchronously
even if the preempt_count is nonzero - as long as the bits covered by
SOFTIRQ_MASK (0xff00) are not set.
the selinux overhead looks interesting:
> avc_insert
> ==========
>
> preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
> -------------------------------------------------------
> latency: 266 us, entries: 80 (80) | [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
> -----------------
> | task: fam/2565, uid:0 nice:0 policy:0 rt_prio:0
> -----------------
> => started at: __spin_lock_irqsave+0x2b/0xb0
> => ended at: _spin_unlock_irqrestore+0x32/0x70
> =======>
> 00000001 0.000ms (+0.000ms): __spin_lock_irqsave (avc_has_perm_noaudit)
> 00000001 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
> 00000001 0.000ms (+0.204ms): avc_insert (avc_has_perm_noaudit)
> 00000001 0.204ms (+0.000ms): memcpy (avc_has_perm_noaudit)
> 00000001 0.204ms (+0.001ms): _spin_unlock_irqrestore (avc_has_perm_noaudit)
it seems a bit odd. avc_has_perm_noaudit() calls avc_insert(), and from
the entry to avc_insert(), to the memcpy() call done by
avc_has_perm_noaudit(), there were 204 usecs spent. That's a lot of time -
175 thousand cycles! Now if you check out the code between line 1009 and
1019 in security/selinux/avc.c, there's little that could cause this
amount of overhead. avc_insert() itself is rather simple - it does an
avc_claim_node() call which is an inlined function so it doesn't show up
in the trace. That function _might_ have called avc_reclaim_node() which
seems to do a linear scan over a list - I don't know how long that list
is typically but it could be long. Could you add "#define inline" near
the top of avc.c (but below the #include lines) so that we can see how
the overhead is distributed in the future?
> schedule
> ========
>
> Have not talked about schedule in a while - this looks like something
> different than before. Appears to be some deep nesting of preempt
> disabling activities.
> ... gets even deeper nesting with the first value going up to 04010008
> while the system appears to send signals based on mouse input. None of
> the steps take very long, but with over 400 steps in the sequence it
> shows up in the trace record.
it could be tracing overhead - you've got inlining disabled in sched.c
(upon my request) and you've got the extra lines you added to find out
where the overhead is. Now that we know that DMA was (is?) a big
contributor to the mystic latencies you might want to remove those.
Ingo
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
2004-09-14 18:32 ` Ingo Molnar
@ 2004-09-15 15:26 ` Stephen Smalley
0 siblings, 0 replies; 100+ messages in thread
From: Stephen Smalley @ 2004-09-15 15:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mark_H_Johnson, Eric St-Laurent, Free Ekanayaka, free78,
K.R. Foley, linux-kernel, Felipe Alfaro Solana, luke, nando,
P.O. Gaillard, Daniel Schmitt, Lee Revell
On Tue, 2004-09-14 at 14:32, Ingo Molnar wrote:
> it seems a bit odd. avc_has_perm_noaudit() calls avc_insert(), and from
> the entry to avc_insert(), to the memcpy() call done by
> avc_has_perm_noaudit(), there were 204 usecs spent. That's a lot of time -
> 175 thousand cycles! Now if you check out the code between line 1009 and
> 1019 in security/selinux/avc.c, there's little that could cause this
> amount of overhead. avc_insert() itself is rather simple - it does an
> avc_claim_node() call which is an inlined function so it doesn't show up
> in the trace. That function _might_ have called avc_reclaim_node() which
> seems to do a linear scan over a list - I don't know how long that list
> is typically but it could be long. Could you add "#define inline" near
> the top of avc.c (but below the #include lines) so that we can see how
> the overhead is distributed in the future?
May be rendered moot by the RCU patches for SELinux, see
http://marc.theaimsgroup.com/?l=linux-kernel&m=109386501021596&w=2
--
Stephen Smalley <sds@epoch.ncsc.mil>
National Security Agency
end of thread, other threads:[~2004-09-15 15:31 UTC | newest]
Thread overview: 100+ messages
2004-08-30 19:13 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Mark_H_Johnson
2004-08-30 19:21 ` Ingo Molnar
2004-09-01 12:31 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 - netdev_max_back_log is too small P.O. Gaillard
2004-09-01 13:05 ` Ingo Molnar
2004-09-02 11:24 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 - network is no longer smooth P.O. Gaillard
2004-09-02 11:28 ` Ingo Molnar
2004-09-02 15:26 ` P.O. Gaillard
2004-08-31 8:49 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Ingo Molnar
2004-09-02 6:33 ` Ingo Molnar
2004-09-02 6:55 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 Ingo Molnar
2004-09-02 7:04 ` Lee Revell
2004-09-02 7:15 ` Ingo Molnar
2004-09-02 7:31 ` Lee Revell
2004-09-02 7:46 ` Ingo Molnar
2004-09-03 1:10 ` Rusty Russell
2004-09-02 23:25 ` Lee Revell
2004-09-02 23:28 ` Ingo Molnar
2004-09-02 23:32 ` Lee Revell
2004-09-02 7:17 ` Ingo Molnar
2004-09-02 8:23 ` Ingo Molnar
2004-09-02 11:10 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Ingo Molnar
2004-09-02 12:14 ` Thomas Charbonnel
2004-09-02 13:16 ` Thomas Charbonnel
2004-09-02 13:23 ` Ingo Molnar
2004-09-02 14:38 ` Thomas Charbonnel
2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
2004-09-02 22:06 ` Lee Revell
2004-09-02 22:14 ` Ingo Molnar
2004-09-02 22:15 ` Lee Revell
2004-09-03 0:24 ` Lee Revell
2004-09-03 3:17 ` Eric St-Laurent
2004-09-03 6:26 ` Lee Revell
2004-09-03 6:36 ` Ingo Molnar
2004-09-03 6:49 ` Lee Revell
2004-09-03 7:01 ` Ingo Molnar
2004-09-03 7:05 ` Ingo Molnar
2004-09-03 7:40 ` Lee Revell
2004-09-03 7:50 ` Free Ekanayaka
2004-09-03 8:05 ` Lee Revell
2004-09-03 9:05 ` Free Ekanayaka
2004-09-03 9:25 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar
2004-09-03 9:50 ` Luke Yelavich
2004-09-03 10:29 ` Ingo Molnar
2004-09-03 10:43 ` Luke Yelavich
2004-09-03 11:33 ` Thomas Charbonnel
2004-09-03 11:49 ` Ingo Molnar
2004-09-03 12:05 ` Thomas Charbonnel
2004-09-03 16:14 ` Thomas Charbonnel
2004-09-03 17:36 ` Thomas Charbonnel
2004-09-03 11:36 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R2 Ingo Molnar
2004-09-03 8:09 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Luke Yelavich
2004-09-03 8:13 ` Lee Revell
2004-09-03 8:21 ` Luke Yelavich
2004-09-03 12:52 ` Luke Yelavich
2004-09-03 18:09 ` K.R. Foley
2004-09-03 11:04 ` K.R. Foley
2004-09-03 17:02 ` K.R. Foley
2004-09-03 20:40 ` Lee Revell
2004-09-03 17:10 ` K.R. Foley
2004-09-03 18:17 ` Ingo Molnar
2004-09-03 18:36 ` K.R. Foley
2004-09-03 19:30 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 Ingo Molnar
2004-09-03 19:49 ` K.R. Foley
2004-09-04 3:39 ` K.R. Foley
2004-09-04 3:43 ` K.R. Foley
2004-09-04 6:41 ` Ingo Molnar
2004-09-04 12:28 ` K.R. Foley
2004-09-04 8:57 ` Ingo Molnar
2004-09-04 10:16 ` Lee Revell
2004-09-04 14:35 ` K.R. Foley
2004-09-04 20:05 ` Ingo Molnar
2004-09-03 18:39 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
2004-09-03 18:41 ` K.R. Foley
-- strict thread matches above, loose matches on Subject: below --
2004-09-07 16:54 [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Mark_H_Johnson
2004-09-08 18:42 ` Ingo Molnar
2004-09-10 2:31 ` K.R. Foley
2004-09-10 17:56 ` K.R. Foley
2004-09-10 19:26 ` Ingo Molnar
2004-09-10 19:28 ` K.R. Foley
2004-09-09 16:02 ` Ingo Molnar
2004-09-08 20:33 Mark_H_Johnson
2004-09-08 22:31 ` Alan Cox
2004-09-09 16:44 ` Thomas Charbonnel
2004-09-09 6:27 ` Ingo Molnar
2004-09-09 15:12 ` Ingo Molnar
2004-09-09 15:33 ` Ingo Molnar
2004-09-09 18:31 Mark_H_Johnson
2004-09-09 19:23 Mark_H_Johnson
2004-09-09 19:47 ` Ingo Molnar
2004-09-09 19:02 ` Alan Cox
2004-09-09 22:41 Mark_H_Johnson
2004-09-09 22:56 Mark_H_Johnson
2004-09-10 12:42 ` Ingo Molnar
2004-09-10 12:57 Mark_H_Johnson
2004-09-10 13:10 ` Ingo Molnar
2004-09-10 14:28 Mark_H_Johnson
2004-09-10 19:29 ` Ingo Molnar
2004-09-13 14:44 Mark_H_Johnson
2004-09-14 18:32 ` Ingo Molnar
2004-09-15 15:26 ` Stephen Smalley