public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
@ 2004-11-26 12:12 Rui Nuno Capela
  2004-11-29 11:05 ` Ingo Molnar
  2004-11-29 11:16 ` Ingo Molnar
  0 siblings, 2 replies; 59+ messages in thread
From: Rui Nuno Capela @ 2004-11-26 12:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen

Hi Ingo et al.

(Sorry on the previous post, touched the wrong button :)


I'm sending (sent) some XRUN latencies captured just a minute ago.

  xruntrace1.0.tar.gz
     - scripts used for xrun trace capture automation

  xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-7.tar.gz
     - actual xrun traces captured while running jackd on RT-V0.7.31-7

  config-2.6.10-rc2-mm3-RT-V0.7.31-7.gz
     - kernel configuration as of my laptop, taken from /proc/config.gz

Cheers.
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-26 12:12 Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7 Rui Nuno Capela
@ 2004-11-29 11:05 ` Ingo Molnar
  2004-11-29 11:16 ` Ingo Molnar
  1 sibling, 0 replies; 59+ messages in thread
From: Ingo Molnar @ 2004-11-29 11:05 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen


* Rui Nuno Capela <rncbc@rncbc.org> wrote:

>   config-2.6.10-rc2-mm3-RT-V0.7.31-7.gz
>      - kernel configuration as of my laptop, taken from /proc/config.gz

Here are some .config suggestion to reduce tracing costs. Here's the
relevant portion of your .config:

 CONFIG_DEBUG_KERNEL=y
 CONFIG_MAGIC_SYSRQ=y
 # CONFIG_SCHEDSTATS is not set
 # CONFIG_DEBUG_SLAB is not set
 CONFIG_DEBUG_PREEMPT=y
 CONFIG_WAKEUP_TIMING=y
 CONFIG_PREEMPT_TRACE=y
 CONFIG_CRITICAL_PREEMPT_TIMING=y
 CONFIG_CRITICAL_IRQSOFF_TIMING=y
 CONFIG_CRITICAL_TIMING=y
 CONFIG_LATENCY_TIMING=y
 CONFIG_LATENCY_TRACE=y
 CONFIG_MCOUNT=y
 CONFIG_RT_DEADLOCK_DETECT=y
 # CONFIG_DEBUG_KOBJECT is not set
 CONFIG_DEBUG_BUGVERBOSE=y
 # CONFIG_DEBUG_INFO is not set
 CONFIG_FRAME_POINTER=y
 CONFIG_EARLY_PRINTK=y
 CONFIG_DEBUG_STACKOVERFLOW=y

i'd suggest to use the following one:

 CONFIG_DEBUG_KERNEL=y
 CONFIG_MAGIC_SYSRQ=y
 # CONFIG_SCHEDSTATS is not set
 # CONFIG_DEBUG_SLAB is not set
 # CONFIG_DEBUG_PREEMPT is not set
 # CONFIG_DEBUG_SPINLOCK_SLEEP is not set
 CONFIG_WAKEUP_TIMING=y
 # CONFIG_CRITICAL_PREEMPT_TIMING is not set
 # CONFIG_CRITICAL_IRQSOFF_TIMING is not set
 CONFIG_LATENCY_TIMING=y
 CONFIG_LATENCY_TRACE=y
 CONFIG_MCOUNT=y
 # CONFIG_RT_DEADLOCK_DETECT is not set
 # CONFIG_DEBUG_KOBJECT is not set
 CONFIG_DEBUG_BUGVERBOSE=y
 # CONFIG_DEBUG_INFO is not set
 CONFIG_FRAME_POINTER=y
 CONFIG_EARLY_PRINTK=y
 # CONFIG_DEBUG_STACKOVERFLOW is not set

especially PREEMPT_TIMING, IRQSOFF_TIMING and RT_DEADLOCK_DETECT are
quite expensive options.

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-26 12:12 Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7 Rui Nuno Capela
  2004-11-29 11:05 ` Ingo Molnar
@ 2004-11-29 11:16 ` Ingo Molnar
  2004-11-29 11:24   ` Ingo Molnar
                     ` (2 more replies)
  1 sibling, 3 replies; 59+ messages in thread
From: Ingo Molnar @ 2004-11-29 11:16 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen


* Rui Nuno Capela <rncbc@rncbc.org> wrote:

>   xruntrace1.0.tar.gz
>      - scripts used for xrun trace capture automation
>
>   xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-7.tar.gz
>      - actual xrun traces captured while running jackd on RT-V0.7.31-7

the trace buffer is too small to capture a significant portion of the
xrun - i'd suggest for you to increase it from 4096-1 to 4096*16-1, to
be able to capture a couple of hundreds of millisecs worth of traces.

also, i think it produces more stable results if the tracing is
activated and deactivated from userspace - i.e. the patch below will
measure the latency of poll() executed by the ALSA glue code within
JACK. (i also removed the xrun printf, because it can cause delays.)

this patch can be used the following way: do not activate xrun_debug via
/proc (to not create additional latencies that make xruns worse), but
otherwise set the /proc/sys/kernel/ flags the same way you do already. 
If the patch is applied to jackd then /proc/latency_trace will show the
latest and highest latency that was measured.

note that this captures latencies visible in the ALSA glue, which doesnt
cover the whole jackd critical path - but it should be pretty good as an
initial start. It will definitely show all jackd-unrelated delay
sources.

to cover all latency paths, the last chunk of the patch could perhaps be
modified to do:

+		gettimeofday(1,0);
+		gettimeofday(1,1);

this will also capture latencies within jackd itself. But i think it is
better to separate 'external' from 'internal' latencies, because the two
situations are quite different and it might be hard to identify which
particular latency we are seeing in /proc/latency_trace.

	Ingo

--- ./drivers/alsa/alsa_driver.c.orig	2004-11-26 14:11:26.000000000 +0100
+++ ./drivers/alsa/alsa_driver.c	2004-11-26 14:24:41.000000000 +0100
@@ -1081,9 +1081,11 @@ alsa_driver_xrun_recovery (alsa_driver_t
 		snd_pcm_status_get_trigger_tstamp(status, &tstamp);
 		timersub(&now, &tstamp, &diff);
 		*delayed_usecs = diff.tv_sec * 1000000.0 + diff.tv_usec;
+#if 0
 		fprintf(stderr, "\n\n**** alsa_pcm: xrun of at least %.3f "
 			"msecs\n\n",
 			*delayed_usecs / 1000.0);
+#endif
 	}
 
 	if (alsa_driver_stop (driver) ||
@@ -1187,6 +1189,7 @@ alsa_driver_wait (alsa_driver_t *driver,
 
 		poll_enter = jack_get_microseconds ();
 
+		gettimeofday(1,1);
 		if (poll (driver->pfd, nfds, driver->poll_timeout) < 0) {
 
 			if (errno == EINTR) {
@@ -1206,6 +1209,7 @@ alsa_driver_wait (alsa_driver_t *driver,
 			return 0;
 			
 		}
+		gettimeofday(1,0);
 
 		poll_ret = jack_get_microseconds ();
 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-29 11:16 ` Ingo Molnar
@ 2004-11-29 11:24   ` Ingo Molnar
  2004-11-29 15:42     ` Ingo Molnar
  2004-11-29 13:13   ` Rui Nuno Capela
  2004-11-30 18:13   ` Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7 Remi Colinet
  2 siblings, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-11-29 11:24 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen


* Ingo Molnar <mingo@elte.hu> wrote:

> --- ./drivers/alsa/alsa_driver.c.orig	2004-11-26 14:11:26.000000000 +0100
> +++ ./drivers/alsa/alsa_driver.c	2004-11-26 14:24:41.000000000 +0100

i think the one below is a better approach - it will only trace events
that the ALSA driver has reported to be an xrun. I.e. the full latency
path from the point where poll() is called, up to the point where jackd
[after the latency has occured] considers it an xrun worth
counting/reporting. The tracing restarts at every poll(), so only the 
latency is captured. If you up your tracebuffer to 40K+ entries then i'd 
suggest to use the following trace settings:

 echo 0 > /proc/asound/card0/pcm0p/xrun_debug

 echo 1 > /proc/sys/kernel/trace_user_triggered
 echo 0 > /proc/sys/kernel/trace_freerunning
 echo 0 > /proc/sys/kernel/preempt_max_latency
 echo 0 > /proc/sys/kernel/preempt_thresh
 echo 0 > /proc/sys/kernel/preempt_wakeup_timing

i.e. dont use trace_freerunning - this will give much easier to parse
traces.

a suggestion wrt. the format of the .trc files: it would be nice if you
could dump the PIDs of all relevant tasks into it too, to make it easier
to identify who causes what latency. Ideally it would be useful to 
have a more symbolic trace - i.e. instead of:

 3570 00000000 254981.991ms (+0.000ms): up (ext3_orphan_del)

it would be:

 jackd-3570 00000000 254981.991ms (+0.000ms): up (ext3_orphan_del)

but this is easier done in the kernel - some of the tasks involved in a
latency might be long gone by the time you detect the xrun.

	Ingo

--- ./drivers/alsa/alsa_driver.c.orig	2004-11-26 14:11:26.000000000 +0100
+++ ./drivers/alsa/alsa_driver.c	2004-11-26 15:31:37.000000000 +0100
@@ -1077,13 +1077,16 @@ alsa_driver_xrun_recovery (alsa_driver_t
 	    && driver->process_count > XRUN_REPORT_DELAY) {
 		struct timeval now, diff, tstamp;
 		driver->xrun_count++;
+		gettimeofday(0,0);
 		gettimeofday(&now, 0);
 		snd_pcm_status_get_trigger_tstamp(status, &tstamp);
 		timersub(&now, &tstamp, &diff);
 		*delayed_usecs = diff.tv_sec * 1000000.0 + diff.tv_usec;
+#if 0
 		fprintf(stderr, "\n\n**** alsa_pcm: xrun of at least %.3f "
 			"msecs\n\n",
 			*delayed_usecs / 1000.0);
+#endif
 	}
 
 	if (alsa_driver_stop (driver) ||
@@ -1185,6 +1188,7 @@ alsa_driver_wait (alsa_driver_t *driver,
 			nfds++;
 		}
 
+		gettimeofday(0,1);
 		poll_enter = jack_get_microseconds ();
 
 		if (poll (driver->pfd, nfds, driver->poll_timeout) < 0) {

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-29 11:16 ` Ingo Molnar
  2004-11-29 11:24   ` Ingo Molnar
@ 2004-11-29 13:13   ` Rui Nuno Capela
  2004-11-29 14:33     ` Ingo Molnar
  2004-11-30 18:13   ` Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7 Remi Colinet
  2 siblings, 1 reply; 59+ messages in thread
From: Rui Nuno Capela @ 2004-11-29 13:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen

Ingo Molnar wrote:
>
> * Rui Nuno Capela wrote:
>
>>   xruntrace1.0.tar.gz
>>      - scripts used for xrun trace capture automation
>>
>>   xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-7.tar.gz
>>      - actual xrun traces captured while running jackd on RT-V0.7.31-7
>
> the trace buffer is too small to capture a significant portion of the
> xrun - i'd suggest for you to increase it from 4096-1 to 4096*16-1, to
> be able to capture a couple of hundreds of millisecs worth of traces.
>

and how I do that? Is it some /proc magic or its in the kernel configuration?

please forgive my ignorance:)

bye.
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-29 13:13   ` Rui Nuno Capela
@ 2004-11-29 14:33     ` Ingo Molnar
  2004-11-29 15:23       ` Ingo Molnar
  0 siblings, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-11-29 14:33 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen


* Rui Nuno Capela <rncbc@rncbc.org> wrote:

> > the trace buffer is too small to capture a significant portion of the
> > xrun - i'd suggest for you to increase it from 4096-1 to 4096*16-1, to
> > be able to capture a couple of hundreds of millisecs worth of traces.
> >
> 
> and how I do that? Is it some /proc magic or its in the kernel configuration?

it's the MAX_TRACE define in kernel/latency.c.

but please try to the -31-10 kernel that i've just uploaded, it has a
number of tracer enhancements:

 - process name tracking. The new format is:

    bash- 3633 80000003 0.264ms (+0.000ms): idle_cpu (wake_idle)
    bash- 3633 80000003 0.264ms (+0.000ms): idle_cpu (wake_idle)
    bash- 3633 80000003 0.264ms (+0.000ms): find_next_bit (wake_idle)

  this makes it easier to identify which process does what. This feature 
  has no significant overhead in the tracer itself, all the hard work is
  done when /proc/latency_trace is read by the user.

 - /proc/sys/kernel/mcount_enabled flag: if disabled then 
   /proc/latency_trace will only contain 'custom' events, but no 
   per-function entries. This can be useful to trace really long 
   latencies, to get a rough idea of what is going on.

 - /proc/latency_trace atomicity. It was fundamentally non-atomic, due 
   to it being a 4K-granular file interface. Now the kernel has a second
   layer of saved-trace logic, which makes sure that the only time the
   trace is switched is when the header of the trace is read. I.e. when
   cp-ing or cat-ing /proc/latency_trace, no new trace info will be
   used. This change could solve some of the 'truncated traces'
   weirdnesses reported.

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-29 14:33     ` Ingo Molnar
@ 2004-11-29 15:23       ` Ingo Molnar
  2004-11-29 23:16         ` Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-13 Gene Heskett
  2004-11-30 10:29         ` Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7 Rui Nuno Capela
  0 siblings, 2 replies; 59+ messages in thread
From: Ingo Molnar @ 2004-11-29 15:23 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen


* Ingo Molnar <mingo@elte.hu> wrote:

> but please try to the -31-10 kernel that i've just uploaded, it has a
> number of tracer enhancements:

make that -31-13 (or later). Earlier kernels had a bug in where the
process name tracking only worked for the first latency trace saved,
subsequent traces showed 'unknown' for the process name. In -13 i've
also added a printk that shows the latest user latency in a one-line
printk - just like the built-in latency tracing modes do:

 (gettimeofday/3671/CPU#0): new 3068 us user-latency.
 (gettimeofday/3784/CPU#0): new 1008627 us user-latency.

(this should also make it easier for helper scripts to save the traces,
whenever they happen.)

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-29 11:24   ` Ingo Molnar
@ 2004-11-29 15:42     ` Ingo Molnar
  0 siblings, 0 replies; 59+ messages in thread
From: Ingo Molnar @ 2004-11-29 15:42 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen


> > --- ./drivers/alsa/alsa_driver.c.orig	2004-11-26 14:11:26.000000000 +0100
> > +++ ./drivers/alsa/alsa_driver.c	2004-11-26 14:24:41.000000000 +0100

find yet another approach below - it aborts jackd when the first xrun
happens. Since an xrun can trigger other xruns, the most reliable trace
would be the first xrun that triggers.

	Ingo

--- ./drivers/alsa/alsa_driver.c.orig	2004-11-26 14:11:26.000000000 +0100
+++ ./drivers/alsa/alsa_driver.c	2004-11-29 17:39:15.985918576 +0100
@@ -1077,13 +1077,17 @@ alsa_driver_xrun_recovery (alsa_driver_t
 	    && driver->process_count > XRUN_REPORT_DELAY) {
 		struct timeval now, diff, tstamp;
 		driver->xrun_count++;
+		gettimeofday(0,0);
 		gettimeofday(&now, 0);
 		snd_pcm_status_get_trigger_tstamp(status, &tstamp);
 		timersub(&now, &tstamp, &diff);
 		*delayed_usecs = diff.tv_sec * 1000000.0 + diff.tv_usec;
+#if 1
 		fprintf(stderr, "\n\n**** alsa_pcm: xrun of at least %.3f "
 			"msecs\n\n",
 			*delayed_usecs / 1000.0);
+#endif
+		exit(-1);
 	}
 
 	if (alsa_driver_stop (driver) ||
@@ -1185,6 +1189,7 @@ alsa_driver_wait (alsa_driver_t *driver,
 			nfds++;
 		}
 
+		gettimeofday(0,1);
 		poll_enter = jack_get_microseconds ();
 
 		if (poll (driver->pfd, nfds, driver->poll_timeout) < 0) {

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-13
  2004-11-29 15:23       ` Ingo Molnar
@ 2004-11-29 23:16         ` Gene Heskett
  2004-11-30  1:50           ` K.R. Foley
  2004-11-30 10:29         ` Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7 Rui Nuno Capela
  1 sibling, 1 reply; 59+ messages in thread
From: Gene Heskett @ 2004-11-29 23:16 UTC (permalink / raw)
  To: linux-kernel

On Monday 29 November 2004 10:23, Ingo Molnar wrote:
>* Ingo Molnar <mingo@elte.hu> wrote:
>> but please try to the -31-10 kernel that i've just uploaded, it
>> has a number of tracer enhancements:
>
>make that -31-13 (or later). Earlier kernels had a bug in where the
>process name tracking only worked for the first latency trace saved,
>subsequent traces showed 'unknown' for the process name. In -13 i've
>also added a printk that shows the latest user latency in a one-line
>printk - just like the built-in latency tracing modes do:
>
> (gettimeofday/3671/CPU#0): new 3068 us user-latency.
> (gettimeofday/3784/CPU#0): new 1008627 us user-latency.
>
>(this should also make it easier for helper scripts to save the
> traces, whenever they happen.)
>
> Ingo

I just built this to see how much blood it would draw, which isn't 
much.  I don't have jack here, so I don't have your standard torture 
test.  Instead, I run tvtime, which runs at a -19 priority.

I let it run about 30 seconds (untimed), noted that the frame error 
slippage wasn't improved, and got this output histogram when I quit 
it.

Its (tvtime) is running here of course.
--------------------
Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
Nov 29 18:05:45 coyote kernel: wow!  That was a 22 millisec bump
Nov 29 18:05:45 coyote kernel: `IRQ 8'[846] is being piggy. 
need_resched=0, cpu=0
Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
Nov 29 18:05:45 coyote kernel: wow!  That was a 22 millisec bump
Nov 29 18:05:45 coyote kernel: `IRQ 8'[846] is being piggy. 
need_resched=0, cpu=0
Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
Nov 29 18:05:45 coyote kernel: wow!  That was a 21 millisec bump
Nov 29 18:05:45 coyote kernel: `IRQ 8'[846] is being piggy. 
need_resched=0, cpu=0
Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
Nov 29 18:05:45 coyote kernel: wow!  That was a 21 millisec bump
Nov 29 18:05:45 coyote kernel: `IRQ 8'[846] is being piggy. 
need_resched=0, cpu=0
Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
Nov 29 18:05:45 coyote kernel:

And was stopped here. 

Nov 29 18:05:45 coyote kernel: rtc latency histogram of {tvtime/3398, 
10609 samples}:
Nov 29 18:05:45 coyote kernel: 4 11
Nov 29 18:05:45 coyote kernel: 5 1716
Nov 29 18:05:45 coyote kernel: 6 4827
Nov 29 18:05:45 coyote kernel: 7 1495
Nov 29 18:05:45 coyote kernel: 8 382
Nov 29 18:05:45 coyote kernel: 9 193
Nov 29 18:05:45 coyote kernel: 10 206
Nov 29 18:05:45 coyote kernel: 11 188
Nov 29 18:05:45 coyote kernel: 12 148
Nov 29 18:05:45 coyote kernel: 13 202
Nov 29 18:05:45 coyote kernel: 14 195
Nov 29 18:05:45 coyote kernel: 15 95
Nov 29 18:05:45 coyote kernel: 16 70
Nov 29 18:05:45 coyote kernel: 17 23
Nov 29 18:05:45 coyote kernel: 18 18
Nov 29 18:05:45 coyote kernel: 19 8
Nov 29 18:05:45 coyote kernel: 20 9
Nov 29 18:05:45 coyote kernel: 21 1
Nov 29 18:05:45 coyote kernel: 22 1
Nov 29 18:05:45 coyote kernel: 26 1
--------------------
And I note that the 1-26 column of numbers does not seem to add up to 
whats being logged above there, which are all 21 and 22 ms bumps 
(whatever a bump is)

Is this a helpfull report, or just noise?  Subjectively, tvtime is 
running with far fewer visible frame glitches than before I started 
playing with these patches.  A marked improvement IMO.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.29% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-13
  2004-11-29 23:16         ` Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-13 Gene Heskett
@ 2004-11-30  1:50           ` K.R. Foley
  2004-11-30  3:19             ` Gene Heskett
  2004-11-30  4:54             ` Gene Heskett
  0 siblings, 2 replies; 59+ messages in thread
From: K.R. Foley @ 2004-11-30  1:50 UTC (permalink / raw)
  To: gene.heskett; +Cc: linux-kernel

Gene Heskett wrote:
> On Monday 29 November 2004 10:23, Ingo Molnar wrote:
> 
>>* Ingo Molnar <mingo@elte.hu> wrote:
>>
>>>but please try to the -31-10 kernel that i've just uploaded, it
>>>has a number of tracer enhancements:
>>
>>make that -31-13 (or later). Earlier kernels had a bug in where the
>>process name tracking only worked for the first latency trace saved,
>>subsequent traces showed 'unknown' for the process name. In -13 i've
>>also added a printk that shows the latest user latency in a one-line
>>printk - just like the built-in latency tracing modes do:
>>
>>(gettimeofday/3671/CPU#0): new 3068 us user-latency.
>>(gettimeofday/3784/CPU#0): new 1008627 us user-latency.
>>
>>(this should also make it easier for helper scripts to save the
>>traces, whenever they happen.)
>>
>>Ingo
> 
> 
> I just built this to see how much blood it would draw, which isn't 
> much.  I don't have jack here, so I don't have your standard torture 
> test.  Instead, I run tvtime, which runs at a -19 priority.
> 
> I let it run about 30 seconds (untimed), noted that the frame error 
> slippage wasn't improved, and got this output histogram when I quit 
> it.
> 
> Its (tvtime) is running here of course.
> --------------------
> Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
> Nov 29 18:05:45 coyote kernel: wow!  That was a 22 millisec bump
> Nov 29 18:05:45 coyote kernel: `IRQ 8'[846] is being piggy. 
> need_resched=0, cpu=0
> Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
> Nov 29 18:05:45 coyote kernel: wow!  That was a 22 millisec bump
> Nov 29 18:05:45 coyote kernel: `IRQ 8'[846] is being piggy. 
> need_resched=0, cpu=0
> Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
> Nov 29 18:05:45 coyote kernel: wow!  That was a 21 millisec bump
> Nov 29 18:05:45 coyote kernel: `IRQ 8'[846] is being piggy. 
> need_resched=0, cpu=0
> Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
> Nov 29 18:05:45 coyote kernel: wow!  That was a 21 millisec bump
> Nov 29 18:05:45 coyote kernel: `IRQ 8'[846] is being piggy. 
> need_resched=0, cpu=0
> Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
> Nov 29 18:05:45 coyote kernel:
> 
> And was stopped here. 
> 
> Nov 29 18:05:45 coyote kernel: rtc latency histogram of {tvtime/3398, 
> 10609 samples}:
> Nov 29 18:05:45 coyote kernel: 4 11
> Nov 29 18:05:45 coyote kernel: 5 1716
> Nov 29 18:05:45 coyote kernel: 6 4827
> Nov 29 18:05:45 coyote kernel: 7 1495
> Nov 29 18:05:45 coyote kernel: 8 382
> Nov 29 18:05:45 coyote kernel: 9 193
> Nov 29 18:05:45 coyote kernel: 10 206
> Nov 29 18:05:45 coyote kernel: 11 188
> Nov 29 18:05:45 coyote kernel: 12 148
> Nov 29 18:05:45 coyote kernel: 13 202
> Nov 29 18:05:45 coyote kernel: 14 195
> Nov 29 18:05:45 coyote kernel: 15 95
> Nov 29 18:05:45 coyote kernel: 16 70
> Nov 29 18:05:45 coyote kernel: 17 23
> Nov 29 18:05:45 coyote kernel: 18 18
> Nov 29 18:05:45 coyote kernel: 19 8
> Nov 29 18:05:45 coyote kernel: 20 9
> Nov 29 18:05:45 coyote kernel: 21 1
> Nov 29 18:05:45 coyote kernel: 22 1
> Nov 29 18:05:45 coyote kernel: 26 1
> --------------------
> And I note that the 1-26 column of numbers does not seem to add up to 
> whats being logged above there, which are all 21 and 22 ms bumps 
> (whatever a bump is)

Is this all that is in the log? For some reason there are 820 samples 
not represented in the output above. The ms+ hits would have been 
represented by something like:

Nov 29 18:05:45 coyote kernel: 9999 4

Not that it is related to the missing output, but does tvtime use 
polling on the rtc?

kr

> 
> Is this a helpfull report, or just noise?  Subjectively, tvtime is 
> running with far fewer visible frame glitches than before I started 
> playing with these patches.  A marked improvement IMO.
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-13
  2004-11-30  1:50           ` K.R. Foley
@ 2004-11-30  3:19             ` Gene Heskett
  2004-11-30  4:54             ` Gene Heskett
  1 sibling, 0 replies; 59+ messages in thread
From: Gene Heskett @ 2004-11-30  3:19 UTC (permalink / raw)
  To: linux-kernel

On Monday 29 November 2004 20:50, K.R. Foley wrote:
>Gene Heskett wrote:
>> On Monday 29 November 2004 10:23, Ingo Molnar wrote:
>>>* Ingo Molnar <mingo@elte.hu> wrote:
>>>>but please try to the -31-10 kernel that i've just uploaded, it
>>>>has a number of tracer enhancements:
>>>
>>>make that -31-13 (or later). Earlier kernels had a bug in where
>>> the process name tracking only worked for the first latency trace
>>> saved, subsequent traces showed 'unknown' for the process name.
>>> In -13 i've also added a printk that shows the latest user
>>> latency in a one-line printk - just like the built-in latency
>>> tracing modes do:
>>>
>>>(gettimeofday/3671/CPU#0): new 3068 us user-latency.
>>>(gettimeofday/3784/CPU#0): new 1008627 us user-latency.
>>>
>>>(this should also make it easier for helper scripts to save the
>>>traces, whenever they happen.)
>>>
>>>Ingo
>>
>> I just built this to see how much blood it would draw, which isn't
>> much.  I don't have jack here, so I don't have your standard
>> torture test.  Instead, I run tvtime, which runs at a -19
>> priority.
>>
>> I let it run about 30 seconds (untimed), noted that the frame
>> error slippage wasn't improved, and got this output histogram when
>> I quit it.
>>
>> Its (tvtime) is running here of course.
>> --------------------
>> Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
>> Nov 29 18:05:45 coyote kernel: wow!  That was a 22 millisec bump
>> Nov 29 18:05:45 coyote kernel: `IRQ 8'[846] is being piggy.
>> need_resched=0, cpu=0
>> Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
>> Nov 29 18:05:45 coyote kernel: wow!  That was a 22 millisec bump
>> Nov 29 18:05:45 coyote kernel: `IRQ 8'[846] is being piggy.
>> need_resched=0, cpu=0
>> Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
>> Nov 29 18:05:45 coyote kernel: wow!  That was a 21 millisec bump
>> Nov 29 18:05:45 coyote kernel: `IRQ 8'[846] is being piggy.
>> need_resched=0, cpu=0
>> Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
>> Nov 29 18:05:45 coyote kernel: wow!  That was a 21 millisec bump
>> Nov 29 18:05:45 coyote kernel: `IRQ 8'[846] is being piggy.
>> need_resched=0, cpu=0
>> Nov 29 18:05:45 coyote kernel: Read missed before next interrupt
>> Nov 29 18:05:45 coyote kernel:
>>
>> And was stopped here.
>>
>> Nov 29 18:05:45 coyote kernel: rtc latency histogram of
>> {tvtime/3398, 10609 samples}:
>> Nov 29 18:05:45 coyote kernel: 4 11
>> Nov 29 18:05:45 coyote kernel: 5 1716
>> Nov 29 18:05:45 coyote kernel: 6 4827
>> Nov 29 18:05:45 coyote kernel: 7 1495
>> Nov 29 18:05:45 coyote kernel: 8 382
>> Nov 29 18:05:45 coyote kernel: 9 193
>> Nov 29 18:05:45 coyote kernel: 10 206
>> Nov 29 18:05:45 coyote kernel: 11 188
>> Nov 29 18:05:45 coyote kernel: 12 148
>> Nov 29 18:05:45 coyote kernel: 13 202
>> Nov 29 18:05:45 coyote kernel: 14 195
>> Nov 29 18:05:45 coyote kernel: 15 95
>> Nov 29 18:05:45 coyote kernel: 16 70
>> Nov 29 18:05:45 coyote kernel: 17 23
>> Nov 29 18:05:45 coyote kernel: 18 18
>> Nov 29 18:05:45 coyote kernel: 19 8
>> Nov 29 18:05:45 coyote kernel: 20 9
>> Nov 29 18:05:45 coyote kernel: 21 1
>> Nov 29 18:05:45 coyote kernel: 22 1
>> Nov 29 18:05:45 coyote kernel: 26 1
>> --------------------
>> And I note that the 1-26 column of numbers does not seem to add up
>> to whats being logged above there, which are all 21 and 22 ms
>> bumps (whatever a bump is)
>
>Is this all that is in the log? For some reason there are 820
> samples not represented in the output above. The ms+ hits would
> have been represented by something like:
>
>Nov 29 18:05:45 coyote kernel: 9999 4

There are NO lines in the log that look like that.  There are many 
hundred duplicates of the 3 line pattern you see at the top of the 
snip, but thats it.

AND I just discovered that I am not running the -13 kernel, but the -7 
version as I had that entry set as the fallback in grub.conf.  I've 
since reset that to the -9 kernel as fallback and thats whats running 
right now.

Which brings up something thats been an excedrin headache here since I 
bought this #*&^^$^% ati 9200-SE video card to replace a failed 
nvidia that took the motherboard with it when it failed.  Thats the 
fact that I boot blind.  From the grub screen to the start of the 
rc.sysinit script, this machine has no fonts!  Sure, I see the curser 
dancing back and forth over the bottom line of the screen, but 
nothing is displayed, and this is true whether the fonts are compiled 
in or as modules, whether the radeonfb is compiled in or as a module, 
and with or without the VGA(16) or vesafb, compiled in or as a 
module.

So its doing an automatic reboot to the kernel set as default, and 
doing it during the time that I am blind from a lack of fonts.

FWIW, I've mentioned this on a couple of occasions but no one has 
commented one way or the other.

One thing I haven't tried is to take all frame buffers out of the 
kernel, not even as available modules, in order to force it to use 
the cards own display.  That would effectively convert this machine 
into a 66 mhz 486, but I guess its worth the test.  I'll be back 
after trying that.

>
>Not that it is related to the missing output, but does tvtime use
>polling on the rtc?

Sorry.  All I can tell you is that I did have to put the rtc into the 
kernel before tvtime would run well at all.

>kr

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.29% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-13
  2004-11-30  1:50           ` K.R. Foley
  2004-11-30  3:19             ` Gene Heskett
@ 2004-11-30  4:54             ` Gene Heskett
  2004-11-30 15:26               ` K.R. Foley
  1 sibling, 1 reply; 59+ messages in thread
From: Gene Heskett @ 2004-11-30  4:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: K.R. Foley

On Monday 29 November 2004 20:50, K.R. Foley wrote:
>Gene Heskett wrote:

>>>make that -31-13 (or later). Earlier kernels had a bug in where

[...]

>Is this all that is in the log? For some reason there are 820
> samples not represented in the output above. The ms+ hits would
> have been represented by something like:
>
>Nov 29 18:05:45 coyote kernel: 9999 4

Ok, I finally got -13 to run (typo in grub), and you are now correct 
in that the final entry in the log after I shut tvtime down is like 
this:

Nov 29 23:43:40 coyote kernel: rtc latency histogram of {tvtime/3911, 
10430 samples}:
Nov 29 23:43:40 coyote kernel: 4 51
Nov 29 23:43:40 coyote kernel: 5 2058
Nov 29 23:43:40 coyote kernel: 6 3594
Nov 29 23:43:40 coyote kernel: 7 1270
Nov 29 23:43:40 coyote kernel: 8 473
Nov 29 23:43:40 coyote kernel: 9 299
Nov 29 23:43:40 coyote kernel: 10 252
Nov 29 23:43:40 coyote kernel: 11 209
Nov 29 23:43:40 coyote kernel: 12 215
Nov 29 23:43:40 coyote kernel: 13 345
Nov 29 23:43:40 coyote kernel: 14 391
Nov 29 23:43:40 coyote kernel: 15 248
Nov 29 23:43:40 coyote kernel: 16 113
Nov 29 23:43:40 coyote kernel: 17 55
Nov 29 23:43:40 coyote kernel: 18 17
Nov 29 23:43:40 coyote kernel: 19 11
Nov 29 23:43:40 coyote kernel: 20 4
Nov 29 23:43:40 coyote kernel: 21 1
Nov 29 23:43:40 coyote kernel: 23 2
Nov 29 23:43:40 coyote kernel: 28 1
Nov 29 23:43:40 coyote kernel: 4612 1
Nov 29 23:43:40 coyote kernel: 9999 820

What does this tell you now?  The last 2 lines look a bit strange to
me.  Particularly since the runtime was random enough that your 
previous comment about the number 820 was a mssing count, and what 
came out of a seperate run IS an 820.

I find that a bit hard to believe that I timed those two runs 
identically.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.29% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-30 18:13   ` Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7 Remi Colinet
@ 2004-11-30  8:15     ` Ingo Molnar
  2004-12-01  8:30       ` Eran Mann
  0 siblings, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-11-30  8:15 UTC (permalink / raw)
  To: Remi Colinet
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Florian Schmidt,
	Thomas Gleixner, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	Karsten Wiese, Gunther Persoons, emann, Shane Shrybman, Amit Shah,
	Esben Nielsen


* Remi Colinet <remi.colinet@free.fr> wrote:

> Hi Ingo,
> 
> I'm getting this error with V0.7.31-13

> CC kernel/latency.o
> kernel/latency.c: In function `check_critical_timing':
> kernel/latency.c:730: too few arguments to function `___trace'
> kernel/latency.c:730: warning: too few arguments passed to inline 

fixed this in -V0.7.31-14.

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-29 15:23       ` Ingo Molnar
  2004-11-29 23:16         ` Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-13 Gene Heskett
@ 2004-11-30 10:29         ` Rui Nuno Capela
  2004-11-30 13:19           ` Ingo Molnar
  1 sibling, 1 reply; 59+ messages in thread
From: Rui Nuno Capela @ 2004-11-30 10:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen

[-- Attachment #1: Type: text/plain, Size: 4225 bytes --]

Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>> but please try to the -31-10 kernel that i've just uploaded, it has a
>> number of tracer enhancements:
>
> make that -31-13 (or later). Earlier kernels had a bug in where the
> process name tracking only worked for the first latency trace saved,
> subsequent traces showed 'unknown' for the process name. In -13 i've
> also added a printk that shows the latest user latency in a one-line
> printk - just like the built-in latency tracing modes do:
>
>  (gettimeofday/3671/CPU#0): new 3068 us user-latency.
>  (gettimeofday/3784/CPU#0): new 1008627 us user-latency.
>
> (this should also make it easier for helper scripts to save the traces,
> whenever they happen.)
>

OK.

I did my homework and managed to grab some latency traces for you to take
a look and see if it's alright. Please.

On attach you may find the files:

    xruntrace1.1.tar.gz
        - scripts used to arm and capture the latency traces;

    jack-0.99.14.0xruntrace-alsa.patch.gz
        - the actual patch applied to jack for trace auto-triggering;

    config-2.6.10-rc2-mm3-RT-V0.7.31-13.gz
        - the running kernel configuration, as taken from /proc/config.gz

    xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-13-*.trc.gz
        - the captured traces, as dumped with xruntrace1_watch.sh script.

    jack_test3.2.tar.gz
        - updated my jack_test3 benchmarking suite.


The traces were captured, while running a usual KDE desktop session (from
qjackctl) on my P4@2.53Ghz laptop (Mandrake 10.1):

    jackd -R -P60 -dalsa -dhw:0 -r44100 -p64 -n2 -S -P

Each trace shows only the first XRUN occurrence on a distinct jackd
session. Every other trace were triggered and captured after restarting
jackd.

Must say that XRUNs are rare but do occurr. Using the same jackd command
line, my jack_test3 suite is showing these figures by now:

    *********** CONSOLIDATED RESULTS ************
    Total seconds ran . . . . . . :  7200
    Number of clients . . . . . . :    16
    Ports per client  . . . . . . :     4
    *********************************************
    Summary Result Count  . . . . :     2 /     2
    *********************************************
    Timeout Rate  . . . . . . . . :(    0.0)/hour
    XRUN Rate . . . . . . . . . . :    13.8 /hour
    Delay Rate (>spare time)  . . :    13.8 /hour
    Delay Rate (>1000 usecs)  . . :     7.0 /hour
    Delay Maximum . . . . . . . . :  1022   usecs
    Cycle Maximum . . . . . . . . :  1048   usecs
    Average DSP Load. . . . . . . :    62.2 %
    Average CPU System Load . . . :    20.6 %
    Average CPU User Load . . . . :    48.5 %
    Average CPU Nice Load . . . . :     0.0 %
    Average CPU I/O Wait Load . . :     0.0 %
    Average CPU IRQ Load  . . . . :     0.0 %
    Average CPU Soft-IRQ Load . . :     0.0 %
    Average Interrupt Rate  . . . :  1689.2 /sec
    Average Context-Switch Rate . : 19778.2 /sec
    *********************************************

The IRQ process/threading priority status has been fixed as shown below:

  PID CLS RTPRIO  NI PRI %CPU STAT COMMAND
    2 FF      90   - 130  0.0 S    IRQ 0
  728 FF      80   - 120  0.0 S<   IRQ 8
 2943 FF      70   - 110  0.0 S<   IRQ 5
 1500 FF      60   - 100  0.0 S<   IRQ 10
  841 FF      50   -  90  0.0 S<   IRQ 1
  736 FF      49   -  89  0.0 S<   IRQ 12
   20 FF      48   -  88  0.0 S<   IRQ 9
  814 FF      45   -  85  0.0 S<   IRQ 6
  821 FF      44   -  84  0.0 S<   IRQ 14
  823 FF      43   -  83  0.0 S<   IRQ 15
 2227 FF      40   -  80  0.0 S<   IRQ 11

# cat /proc/interrupts

           CPU0
  0:    6435186          XT-PIC  timer  0/35186
  1:       1164          XT-PIC  i8042  1/1164
  2:          0          XT-PIC  cascade  0/0
  5:         73          XT-PIC  ALI 5451  0/72
  8:          1          XT-PIC  rtc  0/1
  9:      20602          XT-PIC  acpi  0/20602
 10:          0          XT-PIC  ohci_hcd, ohci_hcd  0/0
 11:          5          XT-PIC  yenta  5/5
 12:      64317          XT-PIC  i8042  0/64317
 14:      11772          XT-PIC  ide0  0/11772
 15:         19          XT-PIC  ide1  0/19
NMI:          0
LOC:          0
ERR:          0
MIS:          0


Cheers.
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org

[-- Attachment #2: xruntrace1.1.tar.gz --]
[-- Type: application/x-gzip-compressed, Size: 1688 bytes --]

[-- Attachment #3: jack-0.99.14.0xruntrace-alsa.patch.gz --]
[-- Type: application/x-gzip-compressed, Size: 548 bytes --]

[-- Attachment #4: config-2.6.10-rc2-mm3-RT-V0.7.31-13.gz --]
[-- Type: application/x-gzip-compressed, Size: 8248 bytes --]

[-- Attachment #5: xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-13-20041129154425.trc.gz --]
[-- Type: application/x-gzip-compressed, Size: 1695 bytes --]

[-- Attachment #6: xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-13-20041129184639.trc.gz --]
[-- Type: application/x-gzip-compressed, Size: 1659 bytes --]

[-- Attachment #7: xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-13-20041129195721.trc.gz --]
[-- Type: application/x-gzip-compressed, Size: 1667 bytes --]

[-- Attachment #8: jack_test3.2.tar.gz --]
[-- Type: application/x-gzip-compressed, Size: 10159 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-30 10:29         ` Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7 Rui Nuno Capela
@ 2004-11-30 13:19           ` Ingo Molnar
  2004-11-30 15:39             ` Rui Nuno Capela
  0 siblings, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-11-30 13:19 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen


* Rui Nuno Capela <rncbc@rncbc.org> wrote:

>     xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-13-*.trc.gz
>         - the captured traces, as dumped with xruntrace1_watch.sh script.

> Each trace shows only the first XRUN occurrence on a distinct jackd
> session. Every other trace were triggered and captured after
> restarting jackd.

interesting - the trace shows only a latency of 20-40 usecs. Perhaps the
tracing should be done over two periods?

Could you try to hack alsa_driver.c to do the gettimeofday(1,1) call for
only every second case. Something like this ought to be enough:

	{
		static count = 0;

		if (!(count++ & 1))
			gettimeofday(0,1);
	}

(only one thread accesses this particular variable so there are no
threading issues.)

with this variant there's a 50% chance that we get the trace of the last
2 poll() instances. (and there's a 50% chance that we get only 1 period
covered.)

in any case, the scripts & approach seems to be almost there, i hope we
only need the above change to see the true source of the xruns.

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-13
  2004-11-30  4:54             ` Gene Heskett
@ 2004-11-30 15:26               ` K.R. Foley
  2004-11-30 16:24                 ` Gene Heskett
  0 siblings, 1 reply; 59+ messages in thread
From: K.R. Foley @ 2004-11-30 15:26 UTC (permalink / raw)
  To: gene.heskett; +Cc: linux-kernel

Gene Heskett wrote:
> On Monday 29 November 2004 20:50, K.R. Foley wrote:
> 
>>Gene Heskett wrote:
> 
> 
>>>>make that -31-13 (or later). Earlier kernels had a bug in where
> 
> 
> [...]
> 
> 
>>Is this all that is in the log? For some reason there are 820
>>samples not represented in the output above. The ms+ hits would
>>have been represented by something like:
>>
>>Nov 29 18:05:45 coyote kernel: 9999 4
> 
> 
> Ok, I finally got -13 to run (typo in grub), and you are now correct 
> in that the final entry in the log after I shut tvtime down is like 
> this:
> 
> Nov 29 23:43:40 coyote kernel: rtc latency histogram of {tvtime/3911, 
> 10430 samples}:
> Nov 29 23:43:40 coyote kernel: 4 51
> Nov 29 23:43:40 coyote kernel: 5 2058
> Nov 29 23:43:40 coyote kernel: 6 3594
> Nov 29 23:43:40 coyote kernel: 7 1270
> Nov 29 23:43:40 coyote kernel: 8 473
> Nov 29 23:43:40 coyote kernel: 9 299
> Nov 29 23:43:40 coyote kernel: 10 252
> Nov 29 23:43:40 coyote kernel: 11 209
> Nov 29 23:43:40 coyote kernel: 12 215
> Nov 29 23:43:40 coyote kernel: 13 345
> Nov 29 23:43:40 coyote kernel: 14 391
> Nov 29 23:43:40 coyote kernel: 15 248
> Nov 29 23:43:40 coyote kernel: 16 113
> Nov 29 23:43:40 coyote kernel: 17 55
> Nov 29 23:43:40 coyote kernel: 18 17
> Nov 29 23:43:40 coyote kernel: 19 11
> Nov 29 23:43:40 coyote kernel: 20 4
> Nov 29 23:43:40 coyote kernel: 21 1
> Nov 29 23:43:40 coyote kernel: 23 2
> Nov 29 23:43:40 coyote kernel: 28 1
> Nov 29 23:43:40 coyote kernel: 4612 1
> Nov 29 23:43:40 coyote kernel: 9999 820
> 
> What does this tell you now?  The last 2 lines look a bit strange to
> me.  Particularly since the runtime was random enough that your 
> previous comment about the number 820 was a mssing count, and what 
> came out of a seperate run IS an 820.
> 
> I find that a bit hard to believe that I timed those two runs 
> identically.
> 

When I mentioned 820 that was just what was missing from the histogram 
data vs. the total samples. As for why this number showed up again, dumb 
luck? What the histogram represents is usecs (left column) between 
interrupt and read. The right column is the number of occurrences. What 
this tells me is that you had 820 samples that were greater than 9.999ms.

I took a brief look at the tvtime code. I noticed several things:
1) tvtime sets the interrupt frequency for the rtc to 1024 Hz which 
would imply that you should have had roughly 30000 samples in a 30 
second run, minus initial setup time. The histogram only shows 10430 
samples. I have no way of knowing if the setup time is really that long 
or if you are missing some interrupts. Unfortunately the rtc histogram 
stuff doesn't currently track missed interrupts except to print out the 
messages you were seeing

"<some process> is being piggy... Read missed before next interrupt"

2) tvtime is probably running at a RT priority of 99. The IRQ handler 
for the rtc defaults to 48-49 (I think). If you didn't already do so, 
you should bump the priority up as in:

chrt -f -p 99 `/sbin/pidof 'IRQ 8'`

Try that and see if it helps at all.

kr

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-30 13:19           ` Ingo Molnar
@ 2004-11-30 15:39             ` Rui Nuno Capela
  2004-11-30 16:42               ` Ingo Molnar
  2004-12-01 10:32               ` Ingo Molnar
  0 siblings, 2 replies; 59+ messages in thread
From: Rui Nuno Capela @ 2004-11-30 15:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen

[-- Attachment #1: Type: text/plain, Size: 1351 bytes --]

> Ingo Molnar
>
> * Rui Nuno Capela wrote:
>
>>     xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-13-*.trc.gz
>>         - the captured traces, as dumped with xruntrace1_watch.sh
>>           script.
>
>> Each trace shows only the first XRUN occurrence on a distinct jackd
>> session. Every other trace were triggered and captured after
>> restarting jackd.
>
> interesting - the trace shows only a latency of 20-40 usecs. Perhaps the
> tracing should be done over two periods?
>
> Could you try to hack alsa_driver.c to do the gettimeofday(1,1) call for
> only every second case. Something like this ought to be enough:
>
> 	{
> 		static count = 0;
>
> 		if (!(count++ & 1))
> 			gettimeofday(0,1);
> 	}
>
> (only one thread accesses this particular variable so there are no
> threading issues.)
>
> with this variant there's a 50% chance that we get the trace of the last
> 2 poll() instances. (and there's a 50% chance that we get only 1 period
> covered.)
>
> in any case, the scripts & approach seems to be almost there, i hope we
> only need the above change to see the true source of the xruns.
>

Done that.

New XRUN traces are attached, while running RT-V0.7.31-15 now. However, I
don't seem to get any notorious difference on the results, since previous
ones. All latencies traced ca. 26-27 usecs.

Bye.
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org

[-- Attachment #2: config-2.6.10-rc2-mm3-RT-V0.7.31-15.gz --]
[-- Type: application/x-gzip-compressed, Size: 8232 bytes --]

[-- Attachment #3: xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-15-20041130140917.trc.gz --]
[-- Type: application/x-gzip-compressed, Size: 1670 bytes --]

[-- Attachment #4: xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-15-20041130141834.trc.gz --]
[-- Type: application/x-gzip-compressed, Size: 1648 bytes --]

[-- Attachment #5: xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-15-20041130142405.trc.gz --]
[-- Type: application/x-gzip-compressed, Size: 1654 bytes --]

[-- Attachment #6: jackit-0.99.15.1xruntrace-alsa.patch.gz --]
[-- Type: application/x-gzip-compressed, Size: 626 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-13
  2004-11-30 15:26               ` K.R. Foley
@ 2004-11-30 16:24                 ` Gene Heskett
  2004-11-30 16:52                   ` Zwane Mwaikambo
  2004-11-30 16:57                   ` K.R. Foley
  0 siblings, 2 replies; 59+ messages in thread
From: Gene Heskett @ 2004-11-30 16:24 UTC (permalink / raw)
  To: linux-kernel

On Tuesday 30 November 2004 10:26, K.R. Foley wrote:
>
>"<some process> is being piggy... Read missed before next interrupt"
>
>2) tvtime is probably running at a RT priority of 99. The IRQ
> handler for the rtc defaults to 48-49 (I think). If you didn't
> already do so, you should bump the priority up as in:
>
>chrt -f -p 99 `/sbin/pidof 'IRQ 8'`

[root@coyote root]# chrt -f -p 99 `/sbin/pidof 'IRQ 8'`
bash: chrt: command not found

chrt is an unknown command here. WTH?  Basicly an FC2 system.

>Try that and see if it helps at all.
>
>kr

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.29% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-30 15:39             ` Rui Nuno Capela
@ 2004-11-30 16:42               ` Ingo Molnar
  2004-12-01 10:32               ` Ingo Molnar
  1 sibling, 0 replies; 59+ messages in thread
From: Ingo Molnar @ 2004-11-30 16:42 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen


* Rui Nuno Capela <rncbc@rncbc.org> wrote:

> Done that.
> 
> New XRUN traces are attached, while running RT-V0.7.31-15 now.
> However, I don't seem to get any notorious difference on the results,
> since previous ones. All latencies traced ca. 26-27 usecs.

hm, weird. Next idea: could you activate trace_freerunning and change
the alsa_driver.c count hack to do:

	if (!count) {
		gettimeofday(0,1);
		count = 1;
	}

this means that tracing will only be activated once, and the tracing
goes on non-stop from that point on, up until the first xrun, at which
point the full tracebuffer is saved. Ok?

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-13
  2004-11-30 16:24                 ` Gene Heskett
@ 2004-11-30 16:52                   ` Zwane Mwaikambo
  2004-12-01  7:16                     ` Gene Heskett
  2004-11-30 16:57                   ` K.R. Foley
  1 sibling, 1 reply; 59+ messages in thread
From: Zwane Mwaikambo @ 2004-11-30 16:52 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel

On Tue, 30 Nov 2004, Gene Heskett wrote:

> On Tuesday 30 November 2004 10:26, K.R. Foley wrote:
> >
> >"<some process> is being piggy... Read missed before next interrupt"
> >
> >2) tvtime is probably running at a RT priority of 99. The IRQ
> > handler for the rtc defaults to 48-49 (I think). If you didn't
> > already do so, you should bump the priority up as in:
> >
> >chrt -f -p 99 `/sbin/pidof 'IRQ 8'`
> 
> [root@coyote root]# chrt -f -p 99 `/sbin/pidof 'IRQ 8'`
> bash: chrt: command not found
> 
> chrt is an unknown command here. WTH?  Basicly an FC2 system.

Install the package first (from an FC2 system)

zwane@r3000 ~ {0:1} rpm -qif `which chrt`
Name        : schedutils                   Relocations: (not relocatable)
Version     : 1.3.0                             Vendor: Red Hat, Inc.
Release     : 6                             Build Date: Tue 17 Feb 2004 
10:16:15 MST
Install Date: Tue 13 Jul 2004 11:13:52 MDT      Build Host: tweety.devel.redhat.com
Group       : Applications/System           Source RPM: schedutils-1.3.0-6.src.rpm
Size        : 39412                            License: GPL
Signature   : DSA/SHA1, Thu 06 May 2004 16:36:57 MDT, Key ID b44269d04f2a6fd2
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : Utilities for manipulating process scheduler attributes
Description : schedutils is a set of utilities for retrieving and 
manipulating process scheduler-related attributes, such as real-time 
parameters and CPU affinity.

This package includes the chrt and taskset utilities.

Install this package if you need to set or get scheduler-related 
attributes.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-13
  2004-11-30 16:24                 ` Gene Heskett
  2004-11-30 16:52                   ` Zwane Mwaikambo
@ 2004-11-30 16:57                   ` K.R. Foley
  1 sibling, 0 replies; 59+ messages in thread
From: K.R. Foley @ 2004-11-30 16:57 UTC (permalink / raw)
  To: gene.heskett; +Cc: linux-kernel

Gene Heskett wrote:
> On Tuesday 30 November 2004 10:26, K.R. Foley wrote:
> 
>>"<some process> is being piggy... Read missed before next interrupt"
>>
>>2) tvtime is probably running at a RT priority of 99. The IRQ
>>handler for the rtc defaults to 48-49 (I think). If you didn't
>>already do so, you should bump the priority up as in:
>>
>>chrt -f -p 99 `/sbin/pidof 'IRQ 8'`
> 
> 
> [root@coyote root]# chrt -f -p 99 `/sbin/pidof 'IRQ 8'`
> bash: chrt: command not found
> 
> chrt is an unknown command here. WTH?  Basicly an FC2 system.

It's part of schedutils pkg.

kr

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-29 11:16 ` Ingo Molnar
  2004-11-29 11:24   ` Ingo Molnar
  2004-11-29 13:13   ` Rui Nuno Capela
@ 2004-11-30 18:13   ` Remi Colinet
  2004-11-30  8:15     ` Ingo Molnar
  2 siblings, 1 reply; 59+ messages in thread
From: Remi Colinet @ 2004-11-30 18:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Florian Schmidt,
	Thomas Gleixner, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	Karsten Wiese, Gunther Persoons, emann, Shane Shrybman, Amit Shah,
	Esben Nielsen

Hi Ingo,

I'm getting this error with V0.7.31-13

CC kernel/wait.o
CC kernel/kfifo.o
CC kernel/sys_ni.o
CC kernel/rt.o
CC kernel/latency.o
kernel/latency.c: In function `check_critical_timing':
kernel/latency.c:730: too few arguments to function `___trace'
kernel/latency.c:730: warning: too few arguments passed to inline 
function, suppressing inlining
kernel/latency.c: In function `__start_critical_timing':
kernel/latency.c:810: incompatible type for argument 1 of `____trace'
kernel/latency.c:810: warning: passing arg 2 of `____trace' makes 
pointer from integer without a cast
kernel/latency.c:810: too few arguments to function `____trace'
kernel/latency.c: In function `trace_irqs_off':
kernel/latency.c:810: warning: too few arguments passed to inline 
function, suppressing inlining
kernel/latency.c: In function `add_preempt_count':
kernel/latency.c:810: warning: too few arguments passed to inline 
function, suppressing inlining
kernel/latency.c: At top level:
kernel/latency.c:810: warning: too few arguments passed to inline 
function, suppressing inlining
make[1]: *** [kernel/latency.o] Error 1
make: *** [kernel] Error 2
[root@tigre01 im]#

Regards
Remi



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-13
  2004-11-30 16:52                   ` Zwane Mwaikambo
@ 2004-12-01  7:16                     ` Gene Heskett
  0 siblings, 0 replies; 59+ messages in thread
From: Gene Heskett @ 2004-12-01  7:16 UTC (permalink / raw)
  To: linux-kernel

On Tuesday 30 November 2004 11:52, Zwane Mwaikambo wrote:
>On Tue, 30 Nov 2004, Gene Heskett wrote:
>> On Tuesday 30 November 2004 10:26, K.R. Foley wrote:
>> >"<some process> is being piggy... Read missed before next
>> > interrupt"
>> >
>> >2) tvtime is probably running at a RT priority of 99. The IRQ
>> > handler for the rtc defaults to 48-49 (I think). If you didn't
>> > already do so, you should bump the priority up as in:
>> >
>> >chrt -f -p 99 `/sbin/pidof 'IRQ 8'`
>>
>> [root@coyote root]# chrt -f -p 99 `/sbin/pidof 'IRQ 8'`
>> bash: chrt: command not found
>>
>> chrt is an unknown command here. WTH?  Basicly an FC2 system.
>
>Install the package first (from an FC2 system)
>
>zwane@r3000 ~ {0:1} rpm -qif `which chrt`
>Name        : schedutils                   Relocations: (not
> relocatable) Version     : 1.3.0                            
> Vendor: Red Hat, Inc. Release     : 6                            
> Build Date: Tue 17 Feb 2004 10:16:15 MST
>Install Date: Tue 13 Jul 2004 11:13:52 MDT      Build Host:
> tweety.devel.redhat.com Group       : Applications/System          
> Source RPM: schedutils-1.3.0-6.src.rpm Size        : 39412         
>                   License: GPL Signature   : DSA/SHA1, Thu 06 May
> 2004 16:36:57 MDT, Key ID b44269d04f2a6fd2 Packager    : Red Hat,
> Inc. <http://bugzilla.redhat.com/bugzilla> Summary     : Utilities
> for manipulating process scheduler attributes Description :
> schedutils is a set of utilities for retrieving and manipulating
> process scheduler-related attributes, such as real-time parameters
> and CPU affinity.
>
>This package includes the chrt and taskset utilities.
>
>Install this package if you need to set or get scheduler-related
>attributes.

Ok, I installed it just now, and ran it twice, once before running 
tvtime, and once after, but it made no visible difference in the log 
input, which is still being very verbose and the delays it reports 
also remain the same.

So it would appear that this isn't the nail my hammer needs. :-)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.29% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-30  8:15     ` Ingo Molnar
@ 2004-12-01  8:30       ` Eran Mann
  2004-12-01  8:53         ` Ingo Molnar
  0 siblings, 1 reply; 59+ messages in thread
From: Eran Mann @ 2004-12-01  8:30 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar wrote:
> 
> 
> fixed this in -V0.7.31-14.
> 
> 	Ingo
>  This mail arrived via mail.mrv.com
> 
> 

Compiling -V0.7.31-15 without CONFIG_LATENCY_TRACE results in:
...
   CC      kernel/latency.o
kernel/latency.c: In function `check_critical_timing':
kernel/latency.c:765: warning: implicit declaration of function `__trace'
...
   LD      .tmp_vmlinux1
kernel/built-in.o(.text+0x1e407): In function `check_critical_timing':
: undefined reference to `__trace'
make: *** [.tmp_vmlinux1] Error 1

Seems to be fixed by the patch below:

--- kernel/latency.c.orig       2004-12-01 10:21:45.000000000 +0200
+++ kernel/latency.c    2004-12-01 10:11:37.000000000 +0200
@@ -762,7 +762,9 @@
         tr->critical_sequence = max_sequence;
         tr->preempt_timestamp = cycles();
         tr->early_warning = 0;
+#ifdef CONFIG_LATENCY_TRACE
         __trace(CALLER_ADDR0, parent_eip);
+#endif
  }

  void notrace touch_critical_timing(void)
-- 
Eran Mann

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-12-01  8:30       ` Eran Mann
@ 2004-12-01  8:53         ` Ingo Molnar
  2004-12-01 18:19           ` Adam Heath
  0 siblings, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-12-01  8:53 UTC (permalink / raw)
  To: Eran Mann; +Cc: linux-kernel


* Eran Mann <emann@mrv.com> wrote:

> Seems to be fixed by the patch below:
> 
> --- kernel/latency.c.orig       2004-12-01 10:21:45.000000000 +0200
> +++ kernel/latency.c    2004-12-01 10:11:37.000000000 +0200
> @@ -762,7 +762,9 @@
>         tr->critical_sequence = max_sequence;
>         tr->preempt_timestamp = cycles();
>         tr->early_warning = 0;
> +#ifdef CONFIG_LATENCY_TRACE
>         __trace(CALLER_ADDR0, parent_eip);
> +#endif

thanks, applied it and uploaded -V0.7.31-16.

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-11-30 15:39             ` Rui Nuno Capela
  2004-11-30 16:42               ` Ingo Molnar
@ 2004-12-01 10:32               ` Ingo Molnar
  2004-12-01 11:25                 ` Ingo Molnar
  2004-12-01 12:47                 ` Rui Nuno Capela
  1 sibling, 2 replies; 59+ messages in thread
From: Ingo Molnar @ 2004-12-01 10:32 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen


* Rui Nuno Capela <rncbc@rncbc.org> wrote:

> > 		if (!(count++ & 1))
> > 			gettimeofday(0,1);
> > 	}

> Done that.
> 
> New XRUN traces are attached, while running RT-V0.7.31-15 now.
> However, I don't seem to get any notorious difference on the results,
> since previous ones. All latencies traced ca. 26-27 usecs.

ah, found the reason why the trace length didnt improve:

+               {       // Trigger off trace every other poll...
+                       static unsigned int xruntrace_count = 0;
+                       if ((xruntrace_count++ % 1) == 0)
+                               gettimeofday(0,1);
+               }

this should either be '& 1' as i originally suggested, or '% 2'. The way 
it is right now it will stop the trace every time - i.e. what we had 
before.

so please disregard my freerunning suggestionsand try the two-periods
solution first.

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-12-01 10:32               ` Ingo Molnar
@ 2004-12-01 11:25                 ` Ingo Molnar
  2004-12-01 12:49                   ` Rui Nuno Capela
  2004-12-01 12:47                 ` Rui Nuno Capela
  1 sibling, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-12-01 11:25 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen


> New XRUN traces are attached, while running RT-V0.7.31-15 now.
> However, I don't seem to get any notorious difference on the results,
> since previous ones. All latencies traced ca. 26-27 usecs.

another thing: could you try the latest jackd CVS tree too? I found
spurious xruns in 99-0 (the last official release), while with 99-11 i
dont see any xruns on my box, up until i completely saturate CPU time.

I.e. both latest -RT kernels are needed for this (earlier RT kernels,
prior the PI fixes still generated lots of xruns even with latest jack
CVS), and latest jack CVS is needed.

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-12-01 10:32               ` Ingo Molnar
  2004-12-01 11:25                 ` Ingo Molnar
@ 2004-12-01 12:47                 ` Rui Nuno Capela
  2004-12-01 15:40                   ` Ingo Molnar
  1 sibling, 1 reply; 59+ messages in thread
From: Rui Nuno Capela @ 2004-12-01 12:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen

Ingo Molnar wrote:
>
> * Rui Nuno Capela <rncbc@rncbc.org> wrote:
>
>> > 		if (!(count++ & 1))
>> > 			gettimeofday(0,1);
>> > 	}
>
>> Done that.
>>
>> New XRUN traces are attached, while running RT-V0.7.31-15 now.
>> However, I don't seem to get any notorious difference on the results,
>> since previous ones. All latencies traced ca. 26-27 usecs.
>
> ah, found the reason why the trace length didnt improve:
>
> +               {       // Trigger off trace every other poll...
> +                       static unsigned int xruntrace_count = 0;
> +                       if ((xruntrace_count++ % 1) == 0)
> +                               gettimeofday(0,1);
> +               }
>
> this should either be '& 1' as i originally suggested, or '% 2'. The way
> it is right now it will stop the trace every time - i.e. what we had
> before.
>
> so please disregard my freerunning suggestionsand try the two-periods
> solution first.
>

Very good eyes! It was all my mistake. Sorry, sorry. It was a sure typo
after all, the '%' key is right next to '&', at least on my on my keyboard
:)

Will correct it and bbl...
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-12-01 11:25                 ` Ingo Molnar
@ 2004-12-01 12:49                   ` Rui Nuno Capela
  0 siblings, 0 replies; 59+ messages in thread
From: Rui Nuno Capela @ 2004-12-01 12:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen

Ingo Molnar wrote:
>
>> New XRUN traces are attached, while running RT-V0.7.31-15 now.
>> However, I don't seem to get any notorious difference on the results,
>> since previous ones. All latencies traced ca. 26-27 usecs.
>
> another thing: could you try the latest jackd CVS tree too? I found
> spurious xruns in 99-0 (the last official release), while with 99-11 i
> dont see any xruns on my box, up until i completely saturate CPU time.
>
> I.e. both latest -RT kernels are needed for this (earlier RT kernels,
> prior the PI fixes still generated lots of xruns even with latest jack
> CVS), and latest jack CVS is needed.
>

I am using the latest-as-it-could-be jack CVS snapshot, as handed by Jack
O'Quin, since sourceforge's CVS service has been broken for a while.

The latest of my last report is 0.99.15.
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-12-01 12:47                 ` Rui Nuno Capela
@ 2004-12-01 15:40                   ` Ingo Molnar
  2004-12-01 16:06                     ` Ingo Molnar
  0 siblings, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-12-01 15:40 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen


ok, got the latest traces from Rui offlist.

here's a summary of an xrun that went like this:

   jackd-4711  00000000 0.002ms (+0.000ms): do_poll (sys_poll)
   ...
   jackd-4711  80000002 0.012ms (+0.000ms): dequeue_task (deactivate_task)
   ...
   IRQ 5-3052  ........ 1.946ms (+0.000ms): ->    jackd-4711
                     [ 00000027 00000001 ]: try_to_wake_up
   ...
qjackctl-4719  ........ 2.362ms (+0.000ms): common_interrupt:
                     [ b75e9090 00000005 00000000 ]
   ...
   IRQ 5-3052  00000000 2.473ms (+0.000ms): xrun (snd_pcm_period_elapsed)
   ...

this shows an interesting phenomenon: for whatever reason IRQ5's thread
didnt run until timestamp 1.946ms. There is lots of idle time between
timestamps 0.012ms and 1.946ms, so this must be some weird condition. 

Rui, could you activate trace_freerunning and send me a couple of 'big'
traces that result out of it? Apparently there is some other, earlier
activity that caused this to happen to IRQ5. Nothing seems to have woken
up IRQ5 when it ran, which is weird by itself too.

if IRQ thread 5 got delayed somehow that might explain the xrun: the
hardware interrupt had no chance to run. But how IRQ5 could stay on the
runqueue while the idle thread managed to run is strange.

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-12-01 15:40                   ` Ingo Molnar
@ 2004-12-01 16:06                     ` Ingo Molnar
  2004-12-01 16:20                       ` [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19 Ingo Molnar
  0 siblings, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-12-01 16:06 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen


* Ingo Molnar <mingo@elte.hu> wrote:

> this shows an interesting phenomenon: for whatever reason IRQ5's
> thread didnt run until timestamp 1.946ms. There is lots of idle time
> between timestamps 0.012ms and 1.946ms, so this must be some weird
> condition. 

ok, this could be ACPI CPU-sleep related. Could you disable all ACPI
options in your .config (as a workaround), and re-check whether the
xruns still occur?

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-01 16:06                     ` Ingo Molnar
@ 2004-12-01 16:20                       ` Ingo Molnar
  2004-12-01 16:31                         ` Ingo Molnar
                                           ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Ingo Molnar @ 2004-12-01 16:20 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton


* Ingo Molnar <mingo@elte.hu> wrote:

> ok, this could be ACPI CPU-sleep related. Could you disable all ACPI
> options in your .config (as a workaround), and re-check whether the
> xruns still occur?

i think i found the bug - it's an upstream ACPI bug. Does the patch
below (or the -31-19 kernel, which i've just uploaded) fix the xruns?

the upstream ACPI bug is this: we check need_resched() _before_
disabling preemption. This opens up the following scenario:

 swapper:	!need_resched()
 [IRQ context]
		wakes up a task
		marks idle task as need-resched

 swapper:	acpi_processor_idle(); // sleeps until next irq

instant 1msec latency introduced...

normally default_idle() is safe because it re-checks need_resched with
interrupts disabled before it truly halts the CPU. But
acpi_processor_idle() doesnt seem to be doing this! Your trace clearly
shows a missed preemption due to ACPI. I'm wondering why no-one has
triggered this before, it's a really bad bug that should be fixed in
2.6.10.

	Ingo

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/drivers/acpi/processor.c.orig
+++ linux/drivers/acpi/processor.c
@@ -370,6 +370,15 @@ acpi_processor_idle (void)
 	 */
 	local_irq_disable();
 
+	/*
+	 * Check whether we truly need to go idle, or should
+	 * reschedule:
+	 */
+	if (unlikely(need_resched())) {
+		local_irq_enable();
+		return;
+	}
+
 	cx = &(pr->power.states[pr->power.state]);
 
 	/*

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-01 16:20                       ` [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19 Ingo Molnar
@ 2004-12-01 16:31                         ` Ingo Molnar
  2004-12-01 18:59                         ` Rui Nuno Capela
  2004-12-03  1:41                         ` Fernando Lopez-Lezcano
  2 siblings, 0 replies; 59+ messages in thread
From: Ingo Molnar @ 2004-12-01 16:31 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1925 bytes --]


* Ingo Molnar <mingo@elte.hu> wrote:

> i think i found the bug - it's an upstream ACPI bug. Does the patch
> below (or the -31-19 kernel, which i've just uploaded) fix the xruns?
>
> the upstream ACPI bug is this: we check need_resched() _before_
> disabling preemption. This opens up the following scenario:
> 
>  swapper:	!need_resched()
>  [IRQ context]
> 		wakes up a task
> 		marks idle task as need-resched
> 
>  swapper:	acpi_processor_idle(); // sleeps until next irq
> 
> instant 1msec latency introduced...
>
> normally default_idle() is safe because it re-checks need_resched with
> interrupts disabled before it truly halts the CPU. But
> acpi_processor_idle() doesnt seem to be doing this! Your trace clearly
> shows a missed preemption due to ACPI. I'm wondering why no-one has
> triggered this before, it's a really bad bug that should be fixed in
> 2.6.10.

i've attached the particular trace from Rui's collection of traces which
nicely shows the ACPI race in action. Here are the relevant events:

 swapper-0     08000000 0.840ms (+0.000ms): preempt_schedule (cpu_idle)
 swapper-0     ........ 0.913ms (+0.000ms): ->    IRQ 5-3052 [ 0000001d 0000008c ]: wake_up_process
 swapper-0     08000001 0.913ms (+0.920ms): acpi_processor_idle (cpu_idle)
 swapper-0     ........ 1.834ms (+0.000ms): common_interrupt:  [ e0022266 00000000 00000000 ]
 swapper-0     88010001 1.834ms (+0.000ms): direct_timer_interrupt (do_IRQ)
   IRQ 5-3052  80000002 1.848ms (+0.000ms): __switch_to (__sched_text_start)

i.e. at timestamp 0.840ms cpu_idle() got preempted (it was after the
need_resched() check but before the rcu_read_lock() which disables
preemption), then at 0.913ms a task was woken up by the hardirq context,
but despite this we called into acpi_processor_idle() which slept
dutifully until the next timer interrupt hit us. Then did only the woken
up task schedule. (which caused an xrun in Rui's setup.)

	Ingo

[-- Attachment #2: xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-15-20041201143614.trc.gz --]
[-- Type: application/x-gzip, Size: 30870 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7
  2004-12-01  8:53         ` Ingo Molnar
@ 2004-12-01 18:19           ` Adam Heath
  0 siblings, 0 replies; 59+ messages in thread
From: Adam Heath @ 2004-12-01 18:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Eran Mann, linux-kernel

On Wed, 1 Dec 2004, Ingo Molnar wrote:

>
> * Eran Mann <emann@mrv.com> wrote:
>
> > Seems to be fixed by the patch below:
> >
> > --- kernel/latency.c.orig       2004-12-01 10:21:45.000000000 +0200
> > +++ kernel/latency.c    2004-12-01 10:11:37.000000000 +0200
> > @@ -762,7 +762,9 @@
> >         tr->critical_sequence = max_sequence;
> >         tr->preempt_timestamp = cycles();
> >         tr->early_warning = 0;
> > +#ifdef CONFIG_LATENCY_TRACE
> >         __trace(CALLER_ADDR0, parent_eip);
> > +#endif
>
> thanks, applied it and uploaded -V0.7.31-16.

Wouldn't it be better to change the definition of __trace()?  Always have it
defined, but make it empty if CONFIG_LATENCY_TRACE isn't set?

It'd keep the code from being littered with ifdef.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-01 16:20                       ` [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19 Ingo Molnar
  2004-12-01 16:31                         ` Ingo Molnar
@ 2004-12-01 18:59                         ` Rui Nuno Capela
  2004-12-01 21:29                           ` Ingo Molnar
  2004-12-03  1:41                         ` Fernando Lopez-Lezcano
  2 siblings, 1 reply; 59+ messages in thread
From: Rui Nuno Capela @ 2004-12-01 18:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton

Ingo Molnar wrote:
>
>> ok, this could be ACPI CPU-sleep related. Could you disable all ACPI
>> options in your .config (as a workaround), and re-check whether the
>> xruns still occur?
>
> i think i found the bug - it's an upstream ACPI bug. Does the patch
> below (or the -31-19 kernel, which i've just uploaded) fix the xruns?
>

So far so good.

I have now an armed RT-V0.31-19 on my laptop, running for about 30 minutes
already, running the usual 'jackd -R -dalsa -dhw:0 -r44100 -p64 -n2 -S -P'
from qjackctl, several qsynth (fluidsynth) engines are up and running,
normal desktop janitor work (KDE) and yes, ACPI is on. Oh, wi-fi is also
pumping nice :)

Guess what? No XRUNs, not even one to show you a single latency trace.

I'll keep you posted, if anything goes less right than it is :).
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-01 18:59                         ` Rui Nuno Capela
@ 2004-12-01 21:29                           ` Ingo Molnar
  2004-12-01 21:30                             ` Ingo Molnar
  0 siblings, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-12-01 21:29 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton


* Rui Nuno Capela <rncbc@rncbc.org> wrote:

> > i think i found the bug - it's an upstream ACPI bug. Does the patch
> > below (or the -31-19 kernel, which i've just uploaded) fix the xruns?
> >
> 
> So far so good.
> 
> I have now an armed RT-V0.31-19 on my laptop, running for about 30
> minutes already, running the usual 'jackd -R -dalsa -dhw:0 -r44100
> -p64 -n2 -S -P' from qjackctl, several qsynth (fluidsynth) engines are
> up and running, normal desktop janitor work (KDE) and yes, ACPI is on.
> Oh, wi-fi is also pumping nice :)
> 
> Guess what? No XRUNs, not even one to show you a single latency trace.
> 
> I'll keep you posted, if anything goes less right than it is :).

great - it seems the ACPI fix made a difference. If there are no xruns
then you might want to try the # of engines at which point xruns show
up. (can the soundcard period size / buffering be reduced further, to
make it more sensitive to scheduling latencies?)

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-01 21:29                           ` Ingo Molnar
@ 2004-12-01 21:30                             ` Ingo Molnar
       [not found]                               ` <32788.192.168.1.8.1101938057.squirrel@192.168.1.8>
  0 siblings, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-12-01 21:30 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton


* Ingo Molnar <mingo@elte.hu> wrote:

> > Guess what? No XRUNs, not even one to show you a single latency trace.
> > 
> > I'll keep you posted, if anything goes less right than it is :).
> 
> great - it seems the ACPI fix made a difference. If there are no xruns
> then you might want to try the # of engines at which point xruns show
                            ^-- to increase

> up. (can the soundcard period size / buffering be reduced further, to
> make it more sensitive to scheduling latencies?)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
       [not found]                               ` <32788.192.168.1.8.1101938057.squirrel@192.168.1.8>
@ 2004-12-01 21:58                                 ` Ingo Molnar
  2004-12-01 22:04                                   ` Rui Nuno Capela
  2004-12-01 22:09                                 ` Ingo Molnar
  1 sibling, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-12-01 21:58 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton


* Rui Nuno Capela <rncbc@rncbc.org> wrote:

> >> up. (can the soundcard period size / buffering be reduced further, to
> >> make it more sensitive to scheduling latencies?)
> >
> 
> OK.
> 
> A couple of hours later, as I thrown in some more jack clients into
> the picture, the XRUNs started to appear, but very discrete still.

just curious, what type of CPU load did this create - over 50%?

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-01 21:58                                 ` Ingo Molnar
@ 2004-12-01 22:04                                   ` Rui Nuno Capela
  0 siblings, 0 replies; 59+ messages in thread
From: Rui Nuno Capela @ 2004-12-01 22:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton

Ingo Molnar wrote:
>
> * Rui Nuno Capela <rncbc@rncbc.org> wrote:
>
>> >> up. (can the soundcard period size / buffering be reduced further, to
>> >> make it more sensitive to scheduling latencies?)
>> >
>>
>> OK.
>>
>> A couple of hours later, as I thrown in some more jack clients into
>> the picture, the XRUNs started to appear, but very discrete still.
>
> just curious, what type of CPU load did this create - over 50%?
>

At this very moment? About 75%. Here's a top snapshot:

top - 22:03:48 up  3:36,  4 users,  load average: 1.91, 2.73, 2.16
Tasks:  86 total,   1 running,  85 sleeping,   0 stopped,   0 zombie
Cpu(s): 33.9% us, 39.7% sy,  0.0% ni, 26.5% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:    500900k total,   411308k used,    89592k free,    12680k buffers
Swap:   506008k total,        0k used,   506008k free,   209300k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18300 rncbc     16   0 89128  87m  14m S 30.8 17.8  13:02.31 qsynth
18288 rncbc     20   0 28148  27m 2316 S 10.8  5.6   4:29.60 jackd
 3078 root     -71  -5     0    0    0 S  8.9  0.0  16:33.69 IRQ 5
16701 rncbc     15   0 26988  26m  14m S  7.3  5.4   4:05.01 qjackctl
 3587 root      15   0 24288  13m 2696 S  3.8  2.8   9:27.28 X
 4291 rncbc     15   0 28240  15m  12m S  1.9  3.1   1:44.69 kdeinit
 4356 rncbc     15   0 21312 7832 5952 S  1.6  1.6   2:45.78 gkrellm
 4346 rncbc     15   0 23744  12m 9.9m S  0.6  2.5   0:19.92 krandrtray
 4347 rncbc     15   0 24260  12m   9m S  0.6  2.5   0:19.62 kdeinit
 4355 rncbc     15   0 23684  11m 9504 S  0.6  2.3   0:16.15 kdeinit
 4361 rncbc     15   0 24948  13m  11m S  0.6  2.8   0:38.53 kscd
 4453 rncbc     16   0 27168  15m  12m S  0.6  3.1   0:28.73 kdeinit
24504 rncbc     17   0  2064  996  780 R  0.6  0.2   0:00.25 top
 4323 rncbc     15   0 29280  13m  11m S  0.3  2.8   0:14.93 kdeinit
 4330 rncbc     16   0 32296  19m  15m S  0.3  3.9   0:30.49 kdeinit
 4333 rncbc     15   0 26880  14m  12m S  0.3  3.1   0:25.56 kdeinit
 4337 rncbc     15   0 24680  13m  10m S  0.3  2.7   0:25.52 kamix
 4345 rncbc     15   0 26500  14m  12m S  0.3  2.9   0:26.77 korgac

Bye.
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
       [not found]                               ` <32788.192.168.1.8.1101938057.squirrel@192.168.1.8>
  2004-12-01 21:58                                 ` Ingo Molnar
@ 2004-12-01 22:09                                 ` Ingo Molnar
  2004-12-01 22:31                                   ` Rui Nuno Capela
  2004-12-01 22:43                                   ` Florian Schmidt
  1 sibling, 2 replies; 59+ messages in thread
From: Ingo Molnar @ 2004-12-01 22:09 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton


* Rui Nuno Capela <rncbc@rncbc.org> wrote:

> Here goes attached the captured latency traces for you to check. I
> guess this time xmms-jack its at stake 'coz it was the cherry on top
> of the cake, for three times in a row :)

the latency does seem to be related to clients, as in the following
trace:

  xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-19-20041201202926.trc

jackd-16714 goes into sys_poll() at timestamp 0.850ms, and does a pipe
related wait:

   jackd-16714 00000000 0.853ms (+0.000ms): pipe_poll (do_pollfd)

and goes to sleep:

   jackd-16714 00000000 0.857ms (+0.000ms): schedule_timeout (do_poll)

almost 3 milliseconds later, at timestamp 3.404ms, the pipe related
timer expires and jackd gets scheduled again:

ksoftirq-3     00000000 3.403ms (+0.000ms): wake_up_process (run_timer_softirq)
ksoftirq-3     ........ 3.404ms (+0.000ms): ->    jackd-16714
                     [ 00000027 00000003 ]: try_to_wake_up

obviously an xrun has happened during this 2.6 msec wait:

   IRQ 5-3078  00000000 2.277ms (+0.000ms): xrun (snd_pcm_period_elapsed)

but ... this brings up the question, is this a valid scenario? How can
jackd block on clients for so long? Perhaps this means that every audio
buffer has run empty and jackd desperately needed some new audio input,
which it didnt get from the clients, up until after the xrun? In theory
this should only be possible if there's CPU saturation (that's why i
asked about how much CPU% time there was in use).

One indication that this might have been the case is that in the full
3.5 msecs trace there's not a single cycle spent idle. But, lots of time
was spent by e.g. X or gkrellm-4356, which are not RT tasks - so from
the RT task POV i think there were cycles left to be utilized. Could
this be a client bug? That seems a bit unlikely because to let jackd
'run empty', each and every client would have to starve it, correct?

anyway, your plan to try jack_test3.1 again looks like the correct
approach: we first need to check the behavior of 'trivial clients',
before more complex scenarios with real clients can be considered.

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-01 22:09                                 ` Ingo Molnar
@ 2004-12-01 22:31                                   ` Rui Nuno Capela
  2004-12-02  9:12                                     ` Rui Nuno Capela
  2004-12-01 22:43                                   ` Florian Schmidt
  1 sibling, 1 reply; 59+ messages in thread
From: Rui Nuno Capela @ 2004-12-01 22:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton

Ingo Molnar wrote:
>
> the latency does seem to be related to clients, as in the following
> trace:
>
>   xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-19-20041201202926.trc
>
> jackd-16714 goes into sys_poll() at timestamp 0.850ms, and does a pipe
> related wait:
>
>    jackd-16714 00000000 0.853ms (+0.000ms): pipe_poll (do_pollfd)
>
> and goes to sleep:
>
>    jackd-16714 00000000 0.857ms (+0.000ms): schedule_timeout (do_poll)
>
> almost 3 milliseconds later, at timestamp 3.404ms, the pipe related
> timer expires and jackd gets scheduled again:
>
> ksoftirq-3     00000000 3.403ms (+0.000ms): wake_up_process
> (run_timer_softirq)
> ksoftirq-3     ........ 3.404ms (+0.000ms): ->    jackd-16714
>                      [ 00000027 00000003 ]: try_to_wake_up
>
> obviously an xrun has happened during this 2.6 msec wait:
>
>    IRQ 5-3078  00000000 2.277ms (+0.000ms): xrun (snd_pcm_period_elapsed)
>
> but ... this brings up the question, is this a valid scenario? How can
> jackd block on clients for so long? Perhaps this means that every audio
> buffer has run empty and jackd desperately needed some new audio input,
> which it didnt get from the clients, up until after the xrun? In theory
> this should only be possible if there's CPU saturation (that's why i
> asked about how much CPU% time there was in use).
>
> One indication that this might have been the case is that in the full
> 3.5 msecs trace there's not a single cycle spent idle. But, lots of time
> was spent by e.g. X or gkrellm-4356, which are not RT tasks - so from
> the RT task POV i think there were cycles left to be utilized. Could
> this be a client bug? That seems a bit unlikely because to let jackd
> 'run empty', each and every client would have to starve it, correct?
>

Did I mention that the XRUNs seem to occur only when xmms-jack is in the
graph? AFAICT xmms-jack uses a sort of double-buffering wrapper
(bio2jack), or in other words, its not a "native" jack client.

>
> anyway, your plan to try jack_test3.1 again looks like the correct
> approach: we first need to check the behavior of 'trivial clients',
> before more complex scenarios with real clients can be considered.
>

Yes, and I'll have to rebuild a new RT-V0.7.31-19 kernel without latency
timing stuff, just to make sure we'll compare apples to apples ;)

Bye now.
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-01 22:09                                 ` Ingo Molnar
  2004-12-01 22:31                                   ` Rui Nuno Capela
@ 2004-12-01 22:43                                   ` Florian Schmidt
  2004-12-02  8:40                                     ` Ingo Molnar
  1 sibling, 1 reply; 59+ messages in thread
From: Florian Schmidt @ 2004-12-01 22:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Thomas Gleixner,
	Michal Schmidt, Fernando Pablo Lopez-Lezcano, Karsten Wiese,
	Gunther Persoons, emann, Shane Shrybman, Amit Shah, Esben Nielsen,
	Andrew Morton

On Wed, 1 Dec 2004 23:09:16 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> but ... this brings up the question, is this a valid scenario? How can
> jackd block on clients for so long? Perhaps this means that every audio
> buffer has run empty and jackd desperately needed some new audio input,
> which it didnt get from the clients, up until after the xrun? In theory
> this should only be possible if there's CPU saturation (that's why i
> asked about how much CPU% time there was in use).
> 
> One indication that this might have been the case is that in the full
> 3.5 msecs trace there's not a single cycle spent idle. But, lots of time
> was spent by e.g. X or gkrellm-4356, which are not RT tasks - so from
> the RT task POV i think there were cycles left to be utilized. Could
> this be a client bug? That seems a bit unlikely because to let jackd
> 'run empty', each and every client would have to starve it, correct?

actually a single client doing nasty (non RT) stuff in its process()
callback can "starve" jackd. AFAIK jackd waits until the last client has
finished its process callback. So, if some client's process callback
decides to use (for example) some blocking system call (big no no) and
consequently falls asleep for a relatively long time, then it can cause
jackd to miss its deadline. I'm not sure though wether this triggers an
xrun in jackd or just a delay exceeded message.

flo


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-01 22:43                                   ` Florian Schmidt
@ 2004-12-02  8:40                                     ` Ingo Molnar
  2004-12-02 12:22                                       ` Florian Schmidt
  0 siblings, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-12-02  8:40 UTC (permalink / raw)
  To: Florian Schmidt
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Thomas Gleixner,
	Michal Schmidt, Fernando Pablo Lopez-Lezcano, Karsten Wiese,
	Gunther Persoons, emann, Shane Shrybman, Amit Shah, Esben Nielsen,
	Andrew Morton


* Florian Schmidt <mista.tapas@gmx.net> wrote:

> On Wed, 1 Dec 2004 23:09:16 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > but ... this brings up the question, is this a valid scenario? How can
> > jackd block on clients for so long? Perhaps this means that every audio
> > buffer has run empty and jackd desperately needed some new audio input,
> > which it didnt get from the clients, up until after the xrun? In theory
> > this should only be possible if there's CPU saturation (that's why i
> > asked about how much CPU% time there was in use).
> > 
> > One indication that this might have been the case is that in the full
> > 3.5 msecs trace there's not a single cycle spent idle. But, lots of time
> > was spent by e.g. X or gkrellm-4356, which are not RT tasks - so from
> > the RT task POV i think there were cycles left to be utilized. Could
> > this be a client bug? That seems a bit unlikely because to let jackd
> > 'run empty', each and every client would have to starve it, correct?
> 
> actually a single client doing nasty (non RT) stuff in its process()
> callback can "starve" jackd. AFAIK jackd waits until the last client
> has finished its process callback. So, if some client's process
> callback decides to use (for example) some blocking system call (big
> no no) and consequently falls asleep for a relatively long time, then
> it can cause jackd to miss its deadline. I'm not sure though wether
> this triggers an xrun in jackd or just a delay exceeded message.

since the period size in Rui's test is so small (-p64 is 1.6 msecs?),
the 2.6 msec timeout in pipe_poll() already generates an xrun.

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-01 22:31                                   ` Rui Nuno Capela
@ 2004-12-02  9:12                                     ` Rui Nuno Capela
  2004-12-02 12:59                                       ` Rui Nuno Capela
  0 siblings, 1 reply; 59+ messages in thread
From: Rui Nuno Capela @ 2004-12-02  9:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton

>
> ... I'll have to rebuild a new RT-V0.7.31-19 kernel without latency
> timing stuff, just to make sure we'll compare apples to apples ;)
>

Oh, and running also an jackd 0.99.17cvs _without_ the xrun trace
auto-trigger patch, so to end up with a production-like system.

And that was it. After a total of 4 hours running free, the jack_test3
suite spitted out these results, which speak from themselves:

    *********** CONSOLIDATED RESULTS ************
    Total seconds ran . . . . . . : 14400
    Number of clients . . . . . . :    16
    Ports per client  . . . . . . :     4
    *********************************************
    Summary Result Count  . . . . :     4 /     4
    *********************************************
    Timeout Rate  . . . . . . . . :(    0.0)/hour
    XRUN Rate . . . . . . . . . . :     0.0 /hour
    Delay Rate (>spare time)  . . :     0.0 /hour
    Delay Rate (>1000 usecs)  . . :     0.0 /hour
    Delay Maximum . . . . . . . . :    77   usecs
    Cycle Maximum . . . . . . . . :  1029   usecs
    Average DSP Load. . . . . . . :    60.0 %
    Average CPU System Load . . . :    19.6 %
    Average CPU User Load . . . . :    47.7 %
    Average CPU Nice Load . . . . :     0.0 %
    Average CPU I/O Wait Load . . :     0.0 %
    Average CPU IRQ Load  . . . . :     0.0 %
    Average CPU Soft-IRQ Load . . :     0.0 %
    Average Interrupt Rate  . . . :  1689.1 /sec
    Average Context-Switch Rate . : 19781.4 /sec
    *********************************************

This is frankly the best performance EVER on this laptop machine, given
these jackd parameters (-R -P60 -dalsa -dhw:0 -r44100 -p64 -n2 -S -P).

Next step is really trying to increase the stress and look after when it
will break apart. It will not take too long...

First attempts, by just increasing the client count from 16 to 20, are
leading to what will be the next "horror show" :) CPU tops above 90%, and
after a couple of minutes running steadly it enters into some kind of
turbulence and hangs. Yeah, it just freezes completely.

So I guess we're having a lot more food to mangle ;)

Stay tuned.
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-02  8:40                                     ` Ingo Molnar
@ 2004-12-02 12:22                                       ` Florian Schmidt
  2004-12-02 12:29                                         ` Ingo Molnar
  0 siblings, 1 reply; 59+ messages in thread
From: Florian Schmidt @ 2004-12-02 12:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Thomas Gleixner,
	Michal Schmidt, Fernando Pablo Lopez-Lezcano, Karsten Wiese,
	Gunther Persoons, emann, Shane Shrybman, Amit Shah, Esben Nielsen,
	Andrew Morton

On Thu, 2 Dec 2004 09:40:40 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> > actually a single client doing nasty (non RT) stuff in its process()
> > callback can "starve" jackd. AFAIK jackd waits until the last client
> > has finished its process callback. So, if some client's process
> > callback decides to use (for example) some blocking system call (big
> > no no) and consequently falls asleep for a relatively long time, then
> > it can cause jackd to miss its deadline. I'm not sure though wether
> > this triggers an xrun in jackd or just a delay exceeded message.
> 
> since the period size in Rui's test is so small (-p64 is 1.6 msecs?),

ca. 1.3ms

> the 2.6 msec timeout in pipe_poll() already generates an xrun.

yep. now the question is: why did jackd have to wait so long for the
client? what was the client doing? was it sleeping? what client was it?
probably not the simple jack_test client, right?

Short tests ((20 minutes, due to lack of time) with cvs jackd and a
number of modified jack_test clients (added a for loop in the process()
callback to burn cycles to increase the cpu load) show no xruns here
even with relatively high cpu load (around 70%) and a periodsize of 64
frames. Of course i had the mandatory additional kernel compile running
in the background :)

flo

Oh wow. Just before hitting send i got three xruns of around
0.020-0.050msec. Ok, will read up on recent emails to see what to do to
debug these.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-02 12:22                                       ` Florian Schmidt
@ 2004-12-02 12:29                                         ` Ingo Molnar
  2004-12-02 13:06                                           ` Florian Schmidt
  0 siblings, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-12-02 12:29 UTC (permalink / raw)
  To: Florian Schmidt
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Thomas Gleixner,
	Michal Schmidt, Fernando Pablo Lopez-Lezcano, Karsten Wiese,
	Gunther Persoons, emann, Shane Shrybman, Amit Shah, Esben Nielsen,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1017 bytes --]


* Florian Schmidt <mista.tapas@gmx.net> wrote:

> > the 2.6 msec timeout in pipe_poll() already generates an xrun.
> 
> yep. now the question is: why did jackd have to wait so long for the
> client? what was the client doing? was it sleeping? what client was
> it? probably not the simple jack_test client, right?

it's very likely not the simple jack_test client. I've attached the
trace in question. Here are the tasks that were running:

 gkrellm
   IRQ 0
  IRQ 14
   IRQ 5
   jackd
 kblockd
  korgac
ksoftirq
qjackctl
  qsynth
       X
    xmms

the trace doesnt show what task jackd was waiting on, and it would be
hard to establish it, the tracepoint would have to 'discover' all other
holders of the pipe fd, which is quite complex.

> Oh wow. Just before hitting send i got three xruns of around
> 0.020-0.050msec. Ok, will read up on recent emails to see what to do
> to debug these.

which jackd version is this? I saw similar small spurious xruns with
99.0, those went away in recent CVS versions.

	Ingo

[-- Attachment #2: xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-19-20041201202926.trc.gz --]
[-- Type: application/x-gzip, Size: 51588 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-02  9:12                                     ` Rui Nuno Capela
@ 2004-12-02 12:59                                       ` Rui Nuno Capela
  2004-12-02 16:38                                         ` Fernando Lopez-Lezcano
  0 siblings, 1 reply; 59+ messages in thread
From: Rui Nuno Capela @ 2004-12-02 12:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley, Bill Huey,
	Adam Heath, Florian Schmidt, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1525 bytes --]

>
> Next step is really trying to increase the stress and look after when it
> will break apart. It will not take too long...
>
> First attempts, by just increasing the client count from 16 to 20, are
> leading to what will be the next "horror show" :) CPU tops above 90%, and
> after a couple of minutes running steadly it enters into some kind of
> turbulence and hangs. Yeah, it just freezes completely.
>
> So I guess we're having a lot more food to mangle ;)
>

After a bit of investigation, I've found some evidence that the "horror
show" is primarily due to XRUN fprintf stderr storm. If I simply remove
that fprintf line from jack/drivers/alsa/alsa_driver.c (around line 1084),
I can run more than 20 jack_test3_client's without hosing the system.

Most probably, the main issue is about fprintf(stderr) being called in a
RT thread context. This is a jackd issue, not of RT kernel's. I remember
there's a jack patch, somewhere in the limbo, for queing all printable
messages out of the jackd RT threads.

I think I'm urging for that patch right now, even though I'm probably
pushing all this pressure into real/physical/hard limits ;) OK. I'll look
if I can take that back from the jackit-devel archives and see what I can
do about it.


Back to a latency-trace enabled RT-0V0.7.31-19, and a patched jackd
0.99.17cvs, I've trapped some more traces (on attachment). These were
taken while running some jack_test3 with light stress (10~12 clients, 4
ports each).

Bye now.
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org

[-- Attachment #2: xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-19-20041202095337.trc.gz --]
[-- Type: application/x-gzip-compressed, Size: 1675 bytes --]

[-- Attachment #3: xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-19-20041202100140.trc.gz --]
[-- Type: application/x-gzip-compressed, Size: 1671 bytes --]

[-- Attachment #4: xruntrace1-2.6.10-rc2-mm3-RT-V0.7.31-19-20041202101633.trc.gz --]
[-- Type: application/x-gzip-compressed, Size: 1658 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-02 12:29                                         ` Ingo Molnar
@ 2004-12-02 13:06                                           ` Florian Schmidt
  2004-12-02 13:10                                             ` Ingo Molnar
  2004-12-02 13:18                                             ` Rui Nuno Capela
  0 siblings, 2 replies; 59+ messages in thread
From: Florian Schmidt @ 2004-12-02 13:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Thomas Gleixner,
	Michal Schmidt, Fernando Pablo Lopez-Lezcano, Karsten Wiese,
	Gunther Persoons, emann, Shane Shrybman, Amit Shah, Esben Nielsen,
	Andrew Morton

On Thu, 2 Dec 2004 13:29:31 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> it's very likely not the simple jack_test client. I've attached the
> trace in question. Here are the tasks that were running:
> 
>  gkrellm
>    IRQ 0
>   IRQ 14
>    IRQ 5
>    jackd
>  kblockd
>   korgac
> ksoftirq
> qjackctl
>   qsynth
>        X
>     xmms
> 
> the trace doesnt show what task jackd was waiting on, and it would be
> hard to establish it, the tracepoint would have to 'discover' all other
> holders of the pipe fd, which is quite complex.

I'm not knowledgable enough to read the trace, but what was for example
the last thing qsynth was doing? Did it go to sleep? I suppose this was
Rui's 9 qsynth's test, right?

Hmm, i wonder if there's a way to detect non RT behaviour in jackd
clients. I mean AFAIK the only thing allowed for the process callback of
on is the FIFO it waits on to be woken, right? Every other sleeping is
to be considered a bug. 

> 
> > Oh wow. Just before hitting send i got three xruns of around
> > 0.020-0.050msec. Ok, will read up on recent emails to see what to do
> > to debug these.
> 
> which jackd version is this? I saw similar small spurious xruns with
> 99.0, those went away in recent CVS versions.

For this test i used jackd from yesterdays CVS. With the 0.99 version i
think i saw more of these xruns, but cannot tell (still running the CVS
version right now).. They all had in common though that they are in the
0.020-0.040msec range.

Flo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-02 13:06                                           ` Florian Schmidt
@ 2004-12-02 13:10                                             ` Ingo Molnar
  2004-12-02 13:40                                               ` Florian Schmidt
  2004-12-02 13:18                                             ` Rui Nuno Capela
  1 sibling, 1 reply; 59+ messages in thread
From: Ingo Molnar @ 2004-12-02 13:10 UTC (permalink / raw)
  To: Florian Schmidt
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Thomas Gleixner,
	Michal Schmidt, Fernando Pablo Lopez-Lezcano, Karsten Wiese,
	Gunther Persoons, emann, Shane Shrybman, Amit Shah, Esben Nielsen,
	Andrew Morton


* Florian Schmidt <mista.tapas@gmx.net> wrote:

> Hmm, i wonder if there's a way to detect non RT behaviour in jackd
> clients. I mean AFAIK the only thing allowed for the process callback
> of on is the FIFO it waits on to be woken, right? Every other sleeping
> is to be considered a bug. 

there's such a feature in -RT kernels. If a user process calls:

	gettimeofday(1,1);

then the kernel turns 'atomic mode' on. To turn it off, call:

	gettimeofday(1,0);

while in atomic-mode, any non-atomic activity (scheduling) will produce
a kernel message and a SIGUSR2 sent to the offending process (once,
atomic mode has to be re-enabled again for the next message). Preemption
by a higher-prio task does not trigger a message/signal.

If you run the client under gdb you should be able to catch the SIGUSR2
signal and then you can see the offending code's backtrace via 'bt'.

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-02 13:06                                           ` Florian Schmidt
  2004-12-02 13:10                                             ` Ingo Molnar
@ 2004-12-02 13:18                                             ` Rui Nuno Capela
  1 sibling, 0 replies; 59+ messages in thread
From: Rui Nuno Capela @ 2004-12-02 13:18 UTC (permalink / raw)
  To: Florian Schmidt
  Cc: Ingo Molnar, linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley,
	Bill Huey, Adam Heath, Thomas Gleixner, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton

Florian Schmidt wrote:
>
> Ingo Molnar wrote:
>
>> it's very likely not the simple jack_test client. I've attached the
>> trace in question. Here are the tasks that were running:
>>
>>  gkrellm
>>    IRQ 0
>>   IRQ 14
>>    IRQ 5
>>    jackd
>>  kblockd
>>   korgac
>> ksoftirq
>> qjackctl
>>   qsynth
>>        X
>>     xmms
>>
>> the trace doesnt show what task jackd was waiting on, and it would be
>> hard to establish it, the tracepoint would have to 'discover' all other
>> holders of the pipe fd, which is quite complex.
>
> I'm not knowledgable enough to read the trace, but what was for example
> the last thing qsynth was doing? Did it go to sleep? I suppose this was
> Rui's 9 qsynth's test, right?
>

No, it wasn't my old 8 fluidsynths test. It was just one normal desktop
work session. At the time, IIRC, the only jack clients that were running
were:

    qjackctl (obviously :)
    qsynth   (3 fluidsynth engines, many soundfonts loaded).
    xmms     (via xmms-jack output plugin)

By just taking xmms out from the graph, I can run my desktop environment
(KDE 3.2) for hours without a single XRUN or noticeable delay. And just
before RT-V0.9.31-19 had arrived, that was just a dream ;)

Bye.
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-02 13:10                                             ` Ingo Molnar
@ 2004-12-02 13:40                                               ` Florian Schmidt
  2004-12-02 13:49                                                 ` Ingo Molnar
  0 siblings, 1 reply; 59+ messages in thread
From: Florian Schmidt @ 2004-12-02 13:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Thomas Gleixner,
	Michal Schmidt, Fernando Pablo Lopez-Lezcano, Karsten Wiese,
	Gunther Persoons, emann, Shane Shrybman, Amit Shah, Esben Nielsen,
	Andrew Morton

On Thu, 2 Dec 2004 14:10:02 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Florian Schmidt <mista.tapas@gmx.net> wrote:
> 
> > Hmm, i wonder if there's a way to detect non RT behaviour in jackd
> > clients. I mean AFAIK the only thing allowed for the process callback
> > of on is the FIFO it waits on to be woken, right? Every other sleeping
> > is to be considered a bug. 
> 
> there's such a feature in -RT kernels. If a user process calls:
> 
> 	gettimeofday(1,1);
> 
> then the kernel turns 'atomic mode' on. To turn it off, call:
> 
> 	gettimeofday(1,0);
> 
> while in atomic-mode, any non-atomic activity (scheduling) will produce
> a kernel message and a SIGUSR2 sent to the offending process (once,
> atomic mode has to be re-enabled again for the next message). Preemption
> by a higher-prio task does not trigger a message/signal.
> 
> If you run the client under gdb you should be able to catch the SIGUSR2
> signal and then you can see the offending code's backtrace via 'bt'.

Ok, so if i want to find out whether a client violates the RT
constraints for its process callback i would have to add a call to
gettimeofday(1,1) at the start of the process callback and
gettimeofday(1,0) at the end.

Everything which causes a reschedule inbetween will then cause SIGUSR2
to be sent to the client for which i could either add a signal handler
in the client or just use gdb to get notified of it. 

Cool!

Flo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-02 13:40                                               ` Florian Schmidt
@ 2004-12-02 13:49                                                 ` Ingo Molnar
  2004-12-02 16:08                                                   ` Florian Schmidt
  2004-12-02 17:44                                                   ` Florian Schmidt
  0 siblings, 2 replies; 59+ messages in thread
From: Ingo Molnar @ 2004-12-02 13:49 UTC (permalink / raw)
  To: Florian Schmidt
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Thomas Gleixner,
	Michal Schmidt, Fernando Pablo Lopez-Lezcano, Karsten Wiese,
	Gunther Persoons, emann, Shane Shrybman, Amit Shah, Esben Nielsen,
	Andrew Morton


* Florian Schmidt <mista.tapas@gmx.net> wrote:

> Ok, so if i want to find out whether a client violates the RT
> constraints for its process callback i would have to add a call to
> gettimeofday(1,1) at the start of the process callback and
> gettimeofday(1,0) at the end.
> 
> Everything which causes a reschedule inbetween will then cause SIGUSR2
> to be sent to the client for which i could either add a signal handler
> in the client or just use gdb to get notified of it. 

correct. I'd expect there to be a number of less critical reschedules
happening around startup/shutdown of a client, which one could consider
a false positive, but there should be no unexpected rescheduling while
the client is up and running.

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-02 13:49                                                 ` Ingo Molnar
@ 2004-12-02 16:08                                                   ` Florian Schmidt
  2004-12-02 17:44                                                   ` Florian Schmidt
  1 sibling, 0 replies; 59+ messages in thread
From: Florian Schmidt @ 2004-12-02 16:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Thomas Gleixner,
	Michal Schmidt, Fernando Pablo Lopez-Lezcano, Karsten Wiese,
	Gunther Persoons, emann, Shane Shrybman, Amit Shah, Esben Nielsen,
	Andrew Morton

On Thu, 2 Dec 2004 14:49:34 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Florian Schmidt <mista.tapas@gmx.net> wrote:
> 
> > Ok, so if i want to find out whether a client violates the RT
> > constraints for its process callback i would have to add a call to
> > gettimeofday(1,1) at the start of the process callback and
> > gettimeofday(1,0) at the end.
> > 
> > Everything which causes a reschedule inbetween will then cause SIGUSR2
> > to be sent to the client for which i could either add a signal handler
> > in the client or just use gdb to get notified of it. 
> 
> correct. I'd expect there to be a number of less critical reschedules
> happening around startup/shutdown of a client, which one could consider
> a false positive, but there should be no unexpected rescheduling while
> the client is up and running.

cool, we're discussing on jackit-devel if this could be added to jackd's
libjack which would do the gettimeofday calls right before and after
calling the process() callback. This might indeed be very very very very
useful as in this case the clients themselfes wouldn't need to be
changed and jackd might then print a console message for example when a
client does something nasty.

Flo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-02 12:59                                       ` Rui Nuno Capela
@ 2004-12-02 16:38                                         ` Fernando Lopez-Lezcano
  0 siblings, 0 replies; 59+ messages in thread
From: Fernando Lopez-Lezcano @ 2004-12-02 16:38 UTC (permalink / raw)
  To: Rui Nuno Capela
  Cc: Ingo Molnar, linux-kernel, Lee Revell, mark_h_johnson, K.R. Foley,
	Bill Huey, Adam Heath, Florian Schmidt, Thomas Gleixner,
	Michal Schmidt, Karsten Wiese, Gunther Persoons, emann,
	Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2138 bytes --]

On Thu, 2004-12-02 at 04:59, Rui Nuno Capela wrote:
> >
> > Next step is really trying to increase the stress and look after when it
> > will break apart. It will not take too long...
> >
> > First attempts, by just increasing the client count from 16 to 20, are
> > leading to what will be the next "horror show" :) CPU tops above 90%, and
> > after a couple of minutes running steadly it enters into some kind of
> > turbulence and hangs. Yeah, it just freezes completely.
> >
> > So I guess we're having a lot more food to mangle ;)
> >
> 
> After a bit of investigation, I've found some evidence that the "horror
> show" is primarily due to XRUN fprintf stderr storm. If I simply remove
> that fprintf line from jack/drivers/alsa/alsa_driver.c (around line 1084),
> I can run more than 20 jack_test3_client's without hosing the system.
> 
> Most probably, the main issue is about fprintf(stderr) being called in a
> RT thread context. 

Yes, that's what's happening. I've been seeing this for a while under
very high loads. 

> This is a jackd issue, not of RT kernel's. I remember
> there's a jack patch, somewhere in the limbo, for queing all printable
> messages out of the jackd RT threads.

There is one I created a while ago, I'm attaching it. It has not been
made part of jack because a proper patch is needed (this one does not
cover all possible uses of printf, the VERBOSE macro itself should be
changed, and apparently it should be made thread safe - but I'm using
this in production machines without problems and it covers the most
obnoxious uses of printf). 

> I think I'm urging for that patch right now, even though I'm probably
> pushing all this pressure into real/physical/hard limits ;) OK. I'll look
> if I can take that back from the jackit-devel archives and see what I can
> do about it.
>
> Back to a latency-trace enabled RT-0V0.7.31-19, and a patched jackd
> 0.99.17cvs, I've trapped some more traces (on attachment). These were
> taken while running some jack_test3 with light stress (10~12 clients, 4
> ports each).

Rui: [MANY] thanks for the amazing job of tracking stuff you are
doing...

-- Fernando


[-- Attachment #2: jack-0.99.0-messagebuffer.patch --]
[-- Type: text/x-patch, Size: 12359 bytes --]

diff -uNr jack-audio-connection-kit-0.98.8-orig/jackd/engine.c jack-audio-connection-kit-0.98.8/jackd/engine.c
--- jack-audio-connection-kit-0.98.8-orig/jackd/engine.c	2004-08-11 12:57:49.175398640 -0700
+++ jack-audio-connection-kit-0.98.8/jackd/engine.c	2004-08-11 13:46:26.275931952 -0700
@@ -58,6 +58,7 @@
 #endif
 
 #include "transengine.h"
+#include "message_buffer.h"
 
 #define JACK_ERROR_WITH_SOCKETS 10000000
 
@@ -687,20 +687,20 @@
 	now = jack_get_microseconds ();
 
 	if (status != 0) {
-		VERBOSE (engine, "at %" PRIu64
-			 " waiting on %d for %" PRIu64
-			 " usecs, status = %d sig = %" PRIu64
-			 " awa = %" PRIu64 " fin = %" PRIu64
-			 " dur=%" PRIu64 "\n",
-			 now,
-			 client->subgraph_wait_fd,
-			 now - then,
-			 status,
-			 ctl->signalled_at,
-			 ctl->awake_at,
-			 ctl->finished_at,
-			 ctl->finished_at? (ctl->finished_at -
-					    ctl->signalled_at): 0);
+		MB_VERBOSE (engine, "at %" PRIu64
+			    " waiting on %d for %" PRIu64
+			    " usecs, status = %d sig = %" PRIu64
+			    " awa = %" PRIu64 " fin = %" PRIu64
+			    " dur=%" PRIu64 "\n",
+			    now,
+			    client->subgraph_wait_fd,
+			    now - then,
+			    status,
+			    ctl->signalled_at,
+			    ctl->awake_at,
+			    ctl->finished_at,
+			    ctl->finished_at? (ctl->finished_at -
+					       ctl->signalled_at): 0);
 
 		/* we can only consider the timeout a client error if
 		 * it actually woke up.  its possible that the kernel
@@ -819,9 +820,9 @@
 				 engine->driver->period_usecs)) * 50.0f
 			+ (engine->control->cpu_load * 0.5f);
 
-		VERBOSE (engine, "load = %.4f max usecs: %.3f, "
-			 "spare = %.3f\n", engine->control->cpu_load,
-			 max_usecs, engine->spare_usecs);
+		MB_VERBOSE (engine, "load = %.4f max usecs: %.3f, "
+			    "spare = %.3f\n", engine->control->cpu_load,
+			    max_usecs, engine->spare_usecs);
 	}
 
 }
@@ -921,13 +921,13 @@
 			    ctl->state > NotTriggered &&
 			    ctl->state != Finished &&
 			    ctl->timed_out++) {
-				VERBOSE(engine, "client %s error: awake_at = %"
-					 PRIu64
-					 " state = %d timed_out = %d\n",
-					 ctl->name,
-					 ctl->awake_at,
-					 ctl->state,
-					 ctl->timed_out);
+				MB_VERBOSE(engine, "client %s error: awake_at = %"
+					   PRIu64
+					   " state = %d timed_out = %d\n",
+					   ctl->name,
+					   ctl->awake_at,
+					   ctl->state,
+					   ctl->timed_out);
 				client->error++;
 			}
 		}
@@ -2374,9 +2374,9 @@
 	    engine->spare_usecs &&
 	    ((WORK_SCALE * engine->spare_usecs) <= delayed_usecs)) {
 
-		fprintf (stderr, "delay of %.3f usecs exceeds estimated spare"
-			 " time of %.3f; restart ...\n",
-			 delayed_usecs, WORK_SCALE * engine->spare_usecs);
+		MB_MESSAGE ("delay of %.3f usecs exceeds estimated spare"
+			    " time of %.3f; restart ...\n",
+			    delayed_usecs, WORK_SCALE * engine->spare_usecs);
 		
 		if (++consecutive_excessive_delays > 10) {
 			jack_error ("too many consecutive interrupt delays "
@@ -2448,9 +2448,9 @@
 
 	if (engine->verbose) {
 		if (nframes != b_size) { 
-			VERBOSE (engine, 
-				"late driver wakeup: nframes to process = %"
-				PRIu32 ".\n", nframes);
+			MB_VERBOSE (engine, 
+				    "late driver wakeup: nframes to process = %"
+				    PRIu32 ".\n", nframes);
 		}
 	}
 
@@ -2677,8 +2677,8 @@
 static void
 jack_zombify_client (jack_engine_t *engine, jack_client_internal_t *client)
 {
-	VERBOSE (engine, "removing client \"%s\" from the processing chain\n",
-		 client->control->name);
+	MB_VERBOSE (engine, "removing client \"%s\" from the processing chain\n",
+		    client->control->name);
 
 	/* caller must hold the client_lock */
 
@@ -2698,7 +2698,7 @@
 
 	/* caller must hold the client_lock */
 
-	VERBOSE (engine, "removing client \"%s\"\n", client->control->name);
+	MB_VERBOSE (engine, "removing client \"%s\"\n", client->control->name);
 
 	/* if its not already a zombie, make it so */
 
@@ -2956,7 +2956,7 @@
 
 	subgraph_client = 0;
 
-	VERBOSE(engine, "++ jack_rechain_graph():\n");
+	MB_VERBOSE(engine, "++ jack_rechain_graph():\n");
 
 	event.type = GraphReordered;
 
@@ -3000,20 +3000,20 @@
 				if (subgraph_client) {
 					subgraph_client->subgraph_wait_fd =
 						jack_get_fifo_fd (engine, n);
-					VERBOSE (engine, "client %s: wait_fd="
-						 "%d, execution_order="
-						 "%lu.\n", 
-						 subgraph_client->
-						 control->name,
-						 subgraph_client->
-						 subgraph_wait_fd, n);
+					MB_VERBOSE (engine, "client %s: wait_fd="
+						    "%d, execution_order="
+						    "%lu.\n", 
+						    subgraph_client->
+						    control->name,
+						    subgraph_client->
+						    subgraph_wait_fd, n);
 					n++;
 				}
 
-				VERBOSE (engine, "client %s: internal "
-					 "client, execution_order="
-					 "%lu.\n", 
-					 client->control->name, n);
+				MB_VERBOSE (engine, "client %s: internal "
+					    "client, execution_order="
+					    "%lu.\n", 
+					    client->control->name, n);
 
 				/* this does the right thing for
 				 * internal clients too 
@@ -3036,13 +3036,13 @@
 					subgraph_client = client;
 					subgraph_client->subgraph_start_fd =
 						jack_get_fifo_fd (engine, n);
-					VERBOSE (engine, "client %s: "
-						 "start_fd=%d, execution"
-						 "_order=%lu.\n",
-						 subgraph_client->
-						 control->name,
-						 subgraph_client->
-						 subgraph_start_fd, n);
+					MB_VERBOSE (engine, "client %s: "
+						    "start_fd=%d, execution"
+						    "_order=%lu.\n",
+						    subgraph_client->
+						    control->name,
+						    subgraph_client->
+						    subgraph_start_fd, n);
 					
 					/* this external client after this will have
 					   jackd as its upstream connection.
@@ -3052,13 +3052,13 @@
 
 				} 
 				else {
-					VERBOSE (engine, "client %s: in"
-						 " subgraph after %s, "
-						 "execution_order="
-						 "%lu.\n",
-						 client->control->name,
-						 subgraph_client->
-						 control->name, n);
+					MB_VERBOSE (engine, "client %s: in"
+						    " subgraph after %s, "
+						    "execution_order="
+						    "%lu.\n",
+						    client->control->name,
+						    subgraph_client->
+						    control->name, n);
 					subgraph_client->subgraph_wait_fd = -1;
 					
 					/* this external client after this will have
@@ -3084,7 +3084,7 @@
 	if (subgraph_client) {
 		subgraph_client->subgraph_wait_fd =
 			jack_get_fifo_fd (engine, n);
-		VERBOSE (engine, "client %s: wait_fd=%d, "
+		MB_VERBOSE (engine, "client %s: wait_fd=%d, "
 			 "execution_order=%lu (last client).\n", 
 			 subgraph_client->control->name,
 			 subgraph_client->subgraph_wait_fd, n);
@@ -3513,9 +3513,9 @@
 		jack_unlock_graph (engine);
 		return -1;
 	} else {
-		VERBOSE (engine, "connect %s and %s\n",
-			 srcport->shared->name,
-			 dstport->shared->name);
+		MB_VERBOSE (engine, "connect %s and %s\n",
+			    srcport->shared->name,
+			    dstport->shared->name);
 
 		dstport->connections =
 			jack_slist_prepend (dstport->connections, connection);
@@ -3560,9 +3560,9 @@
 		if (connect->source == srcport &&
 		    connect->destination == dstport) {
 
-			VERBOSE (engine, "DIS-connect %s and %s\n",
-				 srcport->shared->name,
-				 dstport->shared->name);
+			MB_VERBOSE (engine, "DIS-connect %s and %s\n",
+				    srcport->shared->name,
+				    dstport->shared->name);
 			
 			srcport->connections =
 				jack_slist_remove (srcport->connections,
@@ -3616,8 +3616,8 @@
 		return -1;
 	}
 
-	VERBOSE (engine, "clear connections for %s\n",
-		 engine->internal_ports[port_id].shared->name);
+	MB_VERBOSE (engine, "clear connections for %s\n",
+		    engine->internal_ports[port_id].shared->name);
 
 	jack_lock_graph (engine);
 	jack_port_clear_connections (engine, &engine->internal_ports[port_id]);
@@ -3894,8 +3894,8 @@
 	jack_port_registration_notify (engine, port_id, TRUE);
 	jack_unlock_graph (engine);
 
-	VERBOSE (engine, "registered port %s, offset = %u\n",
-		 shared->name, (unsigned int)shared->offset);
+	MB_VERBOSE (engine, "registered port %s, offset = %u\n",
+		    shared->name, (unsigned int)shared->offset);
 
 	req->x.port_info.port_id = port_id;
 
diff -uNr jack-audio-connection-kit-0.98.8-orig/jackd/jackd.c jack-audio-connection-kit-0.98.8/jackd/jackd.c
--- jack-audio-connection-kit-0.98.8-orig/jackd/jackd.c	2004-08-11 12:57:49.251387088 -0700
+++ jack-audio-connection-kit-0.98.8/jackd/jackd.c	2004-08-11 13:17:25.632549968 -0700
@@ -39,6 +39,8 @@
 #include <jack/shm.h>
 #include <jack/driver_parse.h>
 
+#include "message_buffer.h"
+
 #ifdef USE_CAPABILITIES
 
 #include <sys/stat.h>
@@ -170,6 +172,9 @@
 			sigaction (i, &action, 0);
 		} 
 	}
+
+	/* start a thread to display messages from realtime threads */
+	mb_init("");
 	
 	if (verbose) {
 		fprintf (stderr, "%d waiting for signals\n", getpid());
diff -uNr jack-audio-connection-kit-0.98.8-orig/jackd/Makefile.am jack-audio-connection-kit-0.98.8/jackd/Makefile.am
--- jack-audio-connection-kit-0.98.8-orig/jackd/Makefile.in	2004-04-06 21:52:58.000000000 -0700
+++ jack-audio-connection-kit-0.98.8/jackd/Makefile.in	2004-08-11 13:17:25.843517896 -0700
@@ -55,7 +55,7 @@
 binPROGRAMS_INSTALL = $(INSTALL_PROGRAM)
 PROGRAMS = $(bin_PROGRAMS)
 am_jackd_OBJECTS = jackd.$(OBJEXT) engine.$(OBJEXT) \
-	transengine.$(OBJEXT)
+	transengine.$(OBJEXT) message_buffer.$(OBJEXT)
 jackd_OBJECTS = $(am_jackd_OBJECTS)
 am__DEPENDENCIES_1 =
 jackd_DEPENDENCIES = ../libjack/libjack.la $(am__DEPENDENCIES_1)
@@ -236,9 +236,9 @@
 @USE_CAPABILITIES_FALSE@CAP_LIBS = 
 @USE_CAPABILITIES_TRUE@CAP_LIBS = -lcap
 AM_CFLAGS = $(JACK_CFLAGS) -DJACK_LOCATION=\"$(bindir)\"
-jackd_SOURCES = jackd.c engine.c transengine.c
+jackd_SOURCES = jackd.c engine.c transengine.c message_buffer.c
 jackd_LDADD = ../libjack/libjack.la $(CAP_LIBS) @OS_LDFLAGS@
-noinst_HEADERS = jack_md5.h md5.h md5_loc.h transengine.h
+noinst_HEADERS = jack_md5.h md5.h md5_loc.h transengine.h message_buffer.h
 BUILT_SOURCES = jack_md5.h
 jackstart_SOURCES = jackstart.c md5.c
 jackstart_LDFLAGS = -lcap
diff -uNr jack-audio-connection-kit-0.98.8-orig/jackd/message_buffer.c jack-audio-connection-kit-0.98.8/jackd/message_buffer.c
--- jack-audio-connection-kit-0.98.8-orig/jackd/message_buffer.c	1969-12-31 16:00:00.000000000 -0800
+++ jack-audio-connection-kit-0.98.8/jackd/message_buffer.c	2004-06-21 17:47:26.000000000 -0700
@@ -0,0 +1,56 @@
+/* -*- c-basic-offset: 4 -*-  vi:set ts=8 sts=4 sw=4: */
+
+/* message_buffer.c
+
+   This file is in the public domain.
+*/
+
+#include <unistd.h>
+#include <string.h>
+#include <pthread.h>
+#include <stdio.h>
+
+#define BUFFERS      16		/* must be 2^n */
+#define BUFFER_SIZE 256
+
+static char buffer[BUFFERS][BUFFER_SIZE];
+static const char *mb_prefix;
+static unsigned int initialised = 0;
+static unsigned int in_buffer = 0;
+static unsigned int out_buffer = 0;
+static pthread_t writer_thread;
+
+void *mb_thread_func(void *arg);
+
+void add_message(const char *msg)
+{
+    strncpy(buffer[in_buffer], msg, BUFFER_SIZE - 1);
+    in_buffer = (in_buffer + 1) & (BUFFERS - 1);
+}
+
+void mb_init(const char *prefix)
+{
+    if (initialised) {
+	return;
+    }
+    mb_prefix = prefix;
+
+    pthread_create(&writer_thread, NULL, &mb_thread_func, NULL);
+
+    initialised = 1;
+}
+
+void *mb_thread_func(void *arg)
+{
+    while (1) {
+	while (out_buffer != in_buffer) {
+	    fprintf(stderr, "%s%s", mb_prefix, buffer[out_buffer]);
+	    out_buffer = (out_buffer + 1) & (BUFFERS - 1);
+	}
+	usleep(1000);
+    }
+
+    return NULL;
+}
+
+/* vi:set ts=8 sts=4 sw=4: */
diff -uNr jack-audio-connection-kit-0.98.8-orig/jackd/message_buffer.h jack-audio-connection-kit-0.98.8/jackd/message_buffer.h
--- jack-audio-connection-kit-0.98.8-orig/jackd/message_buffer.h	1969-12-31 16:00:00.000000000 -0800
+++ jack-audio-connection-kit-0.98.8/jackd/message_buffer.h	2004-08-11 13:41:21.696235104 -0700
@@ -0,0 +1,30 @@
+/* -*- c-basic-offset: 4 -*-  vi:set ts=8 sts=4 sw=4: */
+
+/* message_buffer.h
+
+   This file is in the public domain.
+*/
+
+#ifndef MESSAGE_BUFFER_H
+#define MESSAGE_BUFFER_H
+
+#define MB_MESSAGE(fmt...) { \
+	char _m[256]; \
+	snprintf(_m, 255, fmt); \
+	add_message(_m); \
+}
+
+#define MB_VERBOSE(engine,format,args...)   \
+    if ((engine)->verbose) {		    \
+	char _m[256];			    \
+	snprintf(_m, 255, format, ## args); \
+	add_message(_m);		    \
+    }
+
+void mb_init(const char *prefix);
+
+void add_message(const char *msg);
+
+#endif
+
+/* vi:set ts=8 sts=4 sw=4: */

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-02 13:49                                                 ` Ingo Molnar
  2004-12-02 16:08                                                   ` Florian Schmidt
@ 2004-12-02 17:44                                                   ` Florian Schmidt
  2004-12-02 21:12                                                     ` Florian Schmidt
  1 sibling, 1 reply; 59+ messages in thread
From: Florian Schmidt @ 2004-12-02 17:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Thomas Gleixner,
	Michal Schmidt, Fernando Pablo Lopez-Lezcano, Karsten Wiese,
	Gunther Persoons, emann, Shane Shrybman, Amit Shah, Esben Nielsen,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2467 bytes --]

On Thu, 2 Dec 2004 14:49:34 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Florian Schmidt <mista.tapas@gmx.net> wrote:
> 
> > Ok, so if i want to find out whether a client violates the RT
> > constraints for its process callback i would have to add a call to
> > gettimeofday(1,1) at the start of the process callback and
> > gettimeofday(1,0) at the end.
> > 
> > Everything which causes a reschedule inbetween will then cause SIGUSR2
> > to be sent to the client for which i could either add a signal handler
> > in the client or just use gdb to get notified of it. 
> 
> correct. I'd expect there to be a number of less critical reschedules
> happening around startup/shutdown of a client, which one could consider
> a false positive, but there should be no unexpected rescheduling while
> the client is up and running.

Ok,

this simple patch adds the gettimeofday calls around the calling of the
process callback:

--- libjack/client.c.orig	2004-12-02 17:55:04.000000000 +0100
+++ libjack/client.c	2004-12-02 17:56:23.000000000 +0100
@@ -1238,6 +1238,9 @@
 			if (control->sync_cb)
 				jack_call_sync_client (client);
 
+			// enable atomicity check for RP kernels
+			gettimeofday(1,1);
+			
 			if (control->process) {
 				if (control->process (control->nframes,
 						      control->process_arg)
@@ -1247,7 +1250,10 @@
 			} else {
 				control->state = Finished;
 			}
-
+			
+			// disable atomicity check
+			gettimeofday(0,1);
+			
 			if (control->timebase_cb)
 				jack_call_timebase_master (client);
 

The results i see are rather interesting though. Even with a noop jack
client (which does nothing but return 0 in the process callback) i get a
syslog report everytime i start the client. Client source attached.

Dec  2 18:39:06 mango kernel: jack_test:22743 userspace BUG: scheduling in user-atomic context!
Dec  2 18:39:06 mango kernel:  [<c02a38b6>] schedule+0x76/0x130 (8)
Dec  2 18:39:06 mango kernel:  [<c02a44c5>] schedule_timeout+0x85/0xe0 (36)
Dec  2 18:39:06 mango kernel:  [<c016677f>] do_pollfd+0x4f/0x90 (48)
Dec  2 18:39:06 mango kernel:  [<c011ceb0>] process_timeout+0x0/0x10 (8)
Dec  2 18:39:06 mango kernel:  [<c016686a>] do_poll+0xaa/0xd0 (20)
Dec  2 18:39:06 mango kernel:  [<c01669e2>] sys_poll+0x152/0x230 (48)
Dec  2 18:39:06 mango kernel:  [<c0165db0>] __pollwait+0x0/0xd0 (36)
Dec  2 18:39:06 mango kernel:  [<c01025cb>] syscall_call+0x7/0xb (32)


The atomicity check operates on a per task (thread) basis right?

Flo

[-- Attachment #2: jack_test.cc --]
[-- Type: text/x-c++src, Size: 2068 bytes --]

#include <jack/jack.h>
#include <iostream>
#include <sstream>
#include <unistd.h>

jack_client_t *client;
jack_port_t *iport;
jack_port_t *oport;

int wasted_loops = 0;

int sleep_seconds = 1;
int sleep_in_period = 2000;
int counter = 0;

int process(jack_nframes_t frames, void *arg) {
	/*
	// std::cout << "process callback" << std::endl;
	jack_default_audio_sample_t *ibuf;
	ibuf = (jack_default_audio_sample_t*)jack_port_get_buffer(iport, frames);

	jack_default_audio_sample_t *obuf;
	obuf = (jack_default_audio_sample_t*)jack_port_get_buffer(oport, frames);

	for (jack_nframes_t frame = 0; frame < frames; frame++) {
		for (int i = 0; i < wasted_loops; ++i) {
			// do nothing
		}
		obuf[frame] = ibuf[frame];
	}
	counter++;
	if (counter == sleep_in_period) {
	  //sleep(sleep_seconds);
	}
*/
        return 0;
}

int main(int argc, char *argv[]) {
	// default = 60 seconds
	unsigned int seconds_to_run = 60;
	if (argc > 1) {
		std::stringstream sec_stream;
		sec_stream << argv[1];
		sec_stream >> seconds_to_run;
		if (argc > 2) {
			std::stringstream waste_stream;
			waste_stream << argv[2];
			waste_stream >> wasted_loops;
			std::cout << "wasted loops: " << wasted_loops << std::endl;
		}
	}
	std::cout << "seconds to run: " << seconds_to_run << std::endl;
	
	std::stringstream pid_stream;
	pid_stream << getpid();
	
        std::cout << "client_new" << std::endl;
        client = jack_client_new(pid_stream.str().c_str());

        std::cout << "port_register." << std::endl;
        iport = jack_port_register(client, "in", JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput|JackPortIsTerminal, 0);
	oport = jack_port_register(client, "out", JACK_DEFAULT_AUDIO_TYPE, JackPortIsTerminal|JackPortIsOutput, 0);

        std::cout << "set_process_callback" << std::endl;
        jack_set_process_callback(client, process, 0);

        std::cout << "activate" << std::endl;
        jack_activate(client);

        std::cout << "running" << std::endl;

        // while(1) {sleep(1);};
	sleep(seconds_to_run);

	jack_deactivate(client);
	jack_client_close(client);
}

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
@ 2004-12-02 21:01 Mark_H_Johnson
  0 siblings, 0 replies; 59+ messages in thread
From: Mark_H_Johnson @ 2004-12-02 21:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Florian Schmidt,
	Thomas Gleixner, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	Karsten Wiese, Gunther Persoons, emann, Shane Shrybman, Amit Shah,
	Esben Nielsen, Andrew Morton

Some results with -20.

[1] Had another crash where the serial console had incomplete output.
Unfortunately, I still had
  dmesg -n 0
so there were no messages prior to the failure. This was with -20PK
(and not RT) so CONFIG_RT does not seem to be the culprit. Will send
the serial console output separately. Did not stream messages, just the
one incomplete one.

[2] Still getting the odd behavior of the audio output. Seeing both
  /dev/dsp is busy
messages as well as the
  - "too short" duration alternating with
  - "wide variation" in duration
symptoms. No apparent pattern that I can see to cause these symptoms.
May have to resurrect the latencytest variant that I can trigger the
user tracing to track this down.

[3] The cpu_delay program is still getting > 1000 usec delays in the
CPU loop. The latency traces are not consistent either (though I have not
had any of the truncated ones either...). The following summarizes one
set of results with the -RT kernel.
   cpu_delay    latency_trace
     usec          usec
00   1855          2208
01   1790          1792 (OK)
02   2795          2785 (OK)
03   1543          1544 (OK)
04   1907          3078
05   1858         11322
06   1974          2519
07   1065          8411

The traces where the times don't match up also seem to have non RT
tasks included in the traces. I assume this means the trace did not
follow the task to the other CPU. It also is really annoying that the
non RT task was activated far faster than the RT one. Here's an
example trace (00 - 1855 usec; note non RT tasks in this trace)...

preemption latency trace v1.1.1 on 2.6.10-rc2-mm3-V0.7.31-20RT
-------------------------------------------------------
 latency: 2208 us, entries: 3738 (3738)   |   [VP:0 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: cpu_delay/8656, uid:0 nice:0 policy:1 rt_prio:30
    -----------------
 => started at: <00000000>
 => ended at:   __up_mutex+0x495/0x500 <c013b705>
=======>
cpu_dela-8656  ........ 0.000ms (+0.000ms): (0): [ 00000000 00000000
00000000 ]
cpu_dela-8656  08000001 0.000ms (+0.000ms): user_trace_start
(sys_gettimeofday)
cpu_dela-8656  08000000 0.000ms (+0.000ms): preempt_schedule
(user_trace_start)
cpu_dela-8656  98000000 0.001ms (+0.000ms): __sched_text_start
(preempt_schedule)
cpu_dela-8656  98000001 0.001ms (+0.000ms): sched_clock
(__sched_text_start)
cpu_dela-8656  98000001 0.002ms (+0.000ms): _raw_spin_lock_irq
(__sched_text_start)
cpu_dela-8656  98000001 0.002ms (+0.001ms): _raw_spin_lock_irqsave
(__sched_text_start)
cpu_dela-8656  80000002 0.004ms (+0.001ms): trace_array
(__sched_text_start)
cpu_dela-8656  ........ 0.005ms (+0.001ms): ->    IRQ 0-2 [ 00000000
00000001 ]: __sched_text_start
cpu_dela-8656  ........ 0.007ms (+0.001ms): -> cpu_dela-8656 [ 00000045
00000046 ]: __sched_text_start
cpu_dela-8656  ........ 0.008ms (+0.000ms): ->        X-2845 [ 00000074
00000078 ]: __sched_text_start
cpu_dela-8656  ........ 0.008ms (+0.000ms): ->  kdeinit-3207 [ 00000074
00000078 ]: __sched_text_start
cpu_dela-8656  ........ 0.008ms (+0.000ms): ->  kdeinit-3215 [ 00000074
00000078 ]: __sched_text_start
cpu_dela-8656  80000002 0.009ms (+0.003ms): trace_array
(__sched_text_start)
   IRQ 0-2     80000002 0.013ms (+0.001ms): __switch_to
(__sched_text_start)
   IRQ 0-2     ........ 0.014ms (+0.000ms): -> cpu_dela-8656 [ 00000045
00000000 ]: schedule
   IRQ 0-2     80000002 0.015ms (+0.000ms): finish_task_switch
(__sched_text_start)
   IRQ 0-2     80000002 0.015ms (+0.000ms): get_next_rt_task
(finish_task_switch)
   IRQ 0-2     80000002 0.015ms (+0.000ms): find_next_bit
(get_next_rt_task)
   IRQ 0-2     80000002 0.016ms (+0.000ms): get_task_struct
(get_next_rt_task)
   IRQ 0-2     80000002 0.016ms (+0.000ms): _raw_spin_unlock
(finish_task_switch)
   IRQ 0-2     80000001 0.017ms (+0.000ms): trace_stop_sched_switched
(finish_task_switch)
   IRQ 0-2     80000001 0.018ms (+0.000ms): wake_rt (finish_task_switch)
   IRQ 0-2     ........ 0.018ms (+0.000ms): ->    IRQ 0-2 [ 00000000
00000005 ]: finish_task_switch
   IRQ 0-2     80000001 0.019ms (+0.001ms): find_next_bit (wake_rt)
   IRQ 0-2     ........ 0.020ms (+0.000ms): -> <unknown-9714 [ 00000076
00000001 ]: finish_task_switch
   IRQ 0-2     80000001 0.020ms (+0.001ms): __migrate_task
(finish_task_switch)
   IRQ 0-2     80000001 0.021ms (+0.000ms): double_rq_lock (__migrate_task)
   IRQ 0-2     80000001 0.021ms (+0.000ms): _raw_spin_lock (double_rq_lock)
   IRQ 0-2     80000002 0.022ms (+0.000ms): _raw_spin_lock (__migrate_task)
   IRQ 0-2     80000003 0.023ms (+0.000ms): trace_change_sched_cpu
(__migrate_task)
   IRQ 0-2     80000003 0.023ms (+0.000ms): deactivate_task
(__migrate_task)
   IRQ 0-2     80000003 0.023ms (+0.000ms): dequeue_task (deactivate_task)
   IRQ 0-2     80000003 0.024ms (+0.000ms): activate_task (__migrate_task)
   IRQ 0-2     80000003 0.024ms (+0.000ms): sched_clock (activate_task)
   IRQ 0-2     80000003 0.024ms (+0.000ms): recalc_task_prio
(activate_task)
   IRQ 0-2     80000003 0.025ms (+0.000ms): effective_prio
(recalc_task_prio)
   IRQ 0-2     ........ 0.025ms (+0.000ms): -> cpu_dela-8656 [ 00000045
00000001 ]: __migrate_task
   IRQ 0-2     80000003 0.026ms (+0.001ms): enqueue_task (activate_task)
   IRQ 0-2     80000003 0.027ms (+0.001ms): resched_task (__migrate_task)
   IRQ 0-2     80000003 0.028ms (+0.001ms): smp_send_reschedule
(__migrate_task)
   IRQ 0-2     80000003 0.030ms (+0.000ms): send_IPI_mask
(smp_send_reschedule)
   IRQ 0-2     80000003 0.030ms (+0.001ms): send_IPI_mask_bitmask
(smp_send_reschedule)
   IRQ 0-2     80000003 0.032ms (+0.000ms): double_rq_unlock
(__migrate_task)
   IRQ 0-2     80000003 0.032ms (+0.000ms): _raw_spin_unlock
(double_rq_unlock)
   IRQ 0-2     80000002 0.032ms (+0.001ms): _raw_spin_unlock
(__migrate_task)
   IRQ 0-2     80000001 0.033ms (+0.000ms): put_task_struct_delayed
(finish_task_switch)
   IRQ 0-2     ........ 0.034ms (+0.001ms): reschedule_interrupt: [
c032d150 00000000 00000000 ]
   IRQ 0-2     00000000 0.036ms (+0.000ms): kthread_should_stop (do_irqd)
   IRQ 0-2     00000000 0.036ms (+0.000ms): do_hardirq (do_irqd)
   IRQ 0-2     80000000 0.037ms (+0.000ms): _raw_spin_lock (do_hardirq)
   IRQ 0-2     80000001 0.038ms (+0.000ms): _raw_spin_unlock (do_hardirq)
   IRQ 0-2     80000000 0.038ms (+0.000ms): handle_IRQ_event (do_hardirq)
   IRQ 0-2     00000000 0.039ms (+0.000ms): timer_interrupt
(handle_IRQ_event)
   IRQ 0-2     00000000 0.039ms (+0.000ms): _spin_lock (timer_interrupt)
   IRQ 0-2     00000000 0.039ms (+0.000ms): __spin_lock (_spin_lock)
   IRQ 0-2     00000000 0.039ms (+0.000ms): __might_sleep (__spin_lock)
   IRQ 0-2     00000000 0.040ms (+0.000ms): _down_mutex (__spin_lock)
   IRQ 0-2     00000000 0.040ms (+0.000ms): __down_mutex (__spin_lock)
   IRQ 0-2     00000000 0.041ms (+0.000ms): __might_sleep (__down_mutex)
   IRQ 0-2     80000000 0.042ms (+0.001ms): _raw_spin_lock (__down_mutex)
   IRQ 0-2     80000001 0.043ms (+0.000ms): _raw_spin_lock (__down_mutex)
   IRQ 0-2     80000002 0.044ms (+0.000ms): _raw_spin_lock (__down_mutex)
   IRQ 0-2     80000003 0.044ms (+0.000ms): set_new_owner (__down_mutex)
   IRQ 0-2     ........ 0.044ms (+0.000ms): ->    IRQ 0-2 [ 00000000
00000000 ]: __down_mutex
   IRQ 0-2     80000003 0.045ms (+0.000ms): _raw_spin_unlock (__down_mutex)
   IRQ 0-2     80000002 0.046ms (+0.000ms): _raw_spin_unlock (__down_mutex)
   IRQ 0-2     80000001 0.046ms (+0.000ms): _raw_spin_unlock (__down_mutex)
   IRQ 0-2     00000000 0.047ms (+0.000ms): mark_offset_tsc
(timer_interrupt)
   IRQ 0-2     00000000 0.047ms (+0.000ms): _spin_lock (mark_offset_tsc)
   IRQ 0-2     00000000 0.048ms (+0.000ms): __spin_lock (_spin_lock)
   IRQ 0-2     00000000 0.048ms (+0.000ms): __might_sleep (__spin_lock)
   IRQ 0-2     00000000 0.048ms (+0.000ms): _down_mutex (__spin_lock)
   IRQ 0-2     00000000 0.049ms (+0.000ms): __down_mutex (__spin_lock)
   IRQ 0-2     00000000 0.049ms (+0.000ms): __might_sleep (__down_mutex)
   IRQ 0-2     80000000 0.050ms (+0.000ms): _raw_spin_lock (__down_mutex)
   IRQ 0-2     80000001 0.050ms (+0.000ms): _raw_spin_lock (__down_mutex)
   IRQ 0-2     80000002 0.051ms (+0.000ms): _raw_spin_lock (__down_mutex)
   IRQ 0-2     80000003 0.051ms (+0.000ms): set_new_owner (__down_mutex)
   IRQ 0-2     ........ 0.051ms (+0.000ms): ->    IRQ 0-2 [ 00000000
00000000 ]: __down_mutex
   IRQ 0-2     80000003 0.052ms (+0.000ms): _raw_spin_unlock (__down_mutex)
   IRQ 0-2     80000002 0.052ms (+0.000ms): _raw_spin_unlock (__down_mutex)
   IRQ 0-2     80000001 0.053ms (+0.000ms): _raw_spin_unlock (__down_mutex)
   IRQ 0-2     00000000 0.053ms (+0.006ms): _raw_spin_lock_irqsave
(mark_offset_tsc)
   IRQ 0-2     80000001 0.060ms (+0.000ms): _raw_spin_unlock_irqrestore
(mark_offset_tsc)
   IRQ 0-2     00000000 0.061ms (+0.000ms): _spin_unlock (mark_offset_tsc)
   IRQ 0-2     00000000 0.061ms (+0.000ms): up_mutex (mark_offset_tsc)
   IRQ 0-2     00000000 0.062ms (+0.000ms): __up_mutex (up_mutex)
   IRQ 0-2     80000000 0.062ms (+0.000ms): _raw_spin_lock (__up_mutex)
   IRQ 0-2     80000001 0.062ms (+0.000ms): _raw_spin_lock (__up_mutex)
   IRQ 0-2     80000002 0.063ms (+0.000ms): _raw_spin_lock (__up_mutex)
   IRQ 0-2     80000003 0.063ms (+0.000ms): mutex_getprio (__up_mutex)
   IRQ 0-2     ........ 0.063ms (+0.000ms): ->    IRQ 0-2 [ 00000000
00000000 ]: __up_mutex
   IRQ 0-2     80000003 0.064ms (+0.000ms): _raw_spin_unlock (__up_mutex)
   IRQ 0-2     80000002 0.064ms (+0.000ms): _raw_spin_unlock (__up_mutex)
   IRQ 0-2     80000001 0.065ms (+0.000ms): _raw_spin_unlock (__up_mutex)
   IRQ 0-2     00000000 0.065ms (+0.004ms): _raw_spin_lock_irqsave
(timer_interrupt)
   IRQ 0-2     80000001 0.070ms (+0.000ms): _raw_spin_unlock_irqrestore
(timer_interrupt)
   IRQ 0-2     00000000 0.070ms (+0.000ms): _spin_unlock (timer_interrupt)
   IRQ 0-2     00000000 0.070ms (+0.000ms): up_mutex (timer_interrupt)
   IRQ 0-2     00000000 0.071ms (+0.000ms): __up_mutex (up_mutex)
   IRQ 0-2     80000000 0.071ms (+0.000ms): _raw_spin_lock (__up_mutex)
   IRQ 0-2     80000001 0.072ms (+0.000ms): _raw_spin_lock (__up_mutex)
   IRQ 0-2     80000002 0.072ms (+0.000ms): _raw_spin_lock (__up_mutex)
   IRQ 0-2     80000003 0.073ms (+0.000ms): mutex_getprio (__up_mutex)
   IRQ 0-2     ........ 0.073ms (+0.000ms): ->    IRQ 0-2 [ 00000000
00000000 ]: __up_mutex
   IRQ 0-2     80000003 0.073ms (+0.000ms): _raw_spin_unlock (__up_mutex)
   IRQ 0-2     80000002 0.074ms (+0.000ms): _raw_spin_unlock (__up_mutex)
   IRQ 0-2     80000001 0.074ms (+0.000ms): _raw_spin_unlock (__up_mutex)
   IRQ 0-2     00000000 0.075ms (+0.000ms): cond_resched_all (do_hardirq)
   IRQ 0-2     00000000 0.075ms (+0.000ms): cond_resched (do_hardirq)
   IRQ 0-2     00000000 0.075ms (+0.000ms): _raw_spin_lock_irq (do_hardirq)
   IRQ 0-2     00000000 0.076ms (+0.000ms): _raw_spin_lock_irqsave
(do_hardirq)
   IRQ 0-2     80000001 0.076ms (+0.000ms): note_interrupt (do_hardirq)
   IRQ 0-2     80000001 0.077ms (+0.000ms): end_edge_ioapic_irq
(do_hardirq)
   IRQ 0-2     80000001 0.077ms (+0.000ms): _raw_spin_unlock (do_hardirq)
   IRQ 0-2     00000000 0.078ms (+0.000ms): cond_resched_all (do_irqd)
   IRQ 0-2     00000000 0.078ms (+0.000ms): cond_resched (do_irqd)
   IRQ 0-2     00000000 0.078ms (+0.000ms): __do_softirq (do_irqd)
   IRQ 0-2     00000000 0.079ms (+0.000ms): schedule (do_irqd)
   IRQ 0-2     00000000 0.079ms (+0.000ms): __sched_text_start (schedule)
   IRQ 0-2     00000001 0.080ms (+0.000ms): sched_clock
(__sched_text_start)
   IRQ 0-2     00000001 0.081ms (+0.000ms): _raw_spin_lock_irq
(__sched_text_start)
   IRQ 0-2     00000001 0.081ms (+0.000ms): _raw_spin_lock_irqsave
(__sched_text_start)
   IRQ 0-2     80000002 0.082ms (+0.000ms): deactivate_task
(__sched_text_start)
   IRQ 0-2     80000002 0.082ms (+0.000ms): dequeue_task (deactivate_task)
   IRQ 0-2     80000002 0.083ms (+0.002ms): trace_array
(__sched_text_start)
   IRQ 0-2     ........ 0.085ms (+0.000ms): ->        X-2845 [ 00000074
00000078 ]: __sched_text_start
   IRQ 0-2     ........ 0.085ms (+0.000ms): ->  kdeinit-3207 [ 00000074
00000078 ]: __sched_text_start
   IRQ 0-2     ........ 0.086ms (+0.000ms): ->  kdeinit-3215 [ 00000074
00000078 ]: __sched_text_start
   IRQ 0-2     80000002 0.086ms (+0.003ms): trace_array
(__sched_text_start)
       X-2845  80000002 0.090ms (+0.000ms): __switch_to
(__sched_text_start)
       X-2845  ........ 0.091ms (+0.000ms): ->    IRQ 0-2 [ 00000000
00000074 ]: preempt_schedule_irq
       X-2845  80000002 0.091ms (+0.000ms): finish_task_switch
(__sched_text_start)
       X-2845  80000002 0.091ms (+0.000ms): _raw_spin_unlock
(finish_task_switch)
       X-2845  80000001 0.092ms (+0.000ms): trace_stop_sched_switched
(finish_task_switch)
       X-2845  00000000 0.093ms (+0.000ms): tty_ldisc_deref (tty_poll)
       X-2845  00000000 0.094ms (+0.000ms): _spin_unlock_irqrestore
(tty_ldisc_deref)
       X-2845  00000000 0.094ms (+0.000ms): up_mutex (tty_ldisc_deref)
       X-2845  00000000 0.095ms (+0.000ms): __up_mutex (up_mutex)
       X-2845  80000000 0.095ms (+0.000ms): _raw_spin_lock (__up_mutex)
       X-2845  80000001 0.095ms (+0.000ms): _raw_spin_lock (__up_mutex)
       X-2845  80000002 0.096ms (+0.000ms): _raw_spin_lock (__up_mutex)
       X-2845  80000003 0.097ms (+0.000ms): mutex_getprio (__up_mutex)
       X-2845  ........ 0.097ms (+0.000ms): ->        X-2845 [ 00000074
00000074 ]: __up_mutex
       X-2845  80000003 0.097ms (+0.000ms): _raw_spin_unlock (__up_mutex)
       X-2845  80000002 0.098ms (+0.000ms): _raw_spin_unlock (__up_mutex)
       X-2845  80000001 0.098ms (+0.000ms): _raw_spin_unlock (__up_mutex)
       X-2845  00000000 0.099ms (+0.000ms): fput (do_select)
       X-2845  00000000 0.099ms (+0.000ms): cond_resched (do_select)
       X-2845  00000000 0.100ms (+0.000ms): fget (do_select)
... traces of X, ksoftrirq (RT), X, IRQ 14 (RT), X, IRQ 0 (RT), X,
    kdeinit, ksoftirq (RT), kdeinit [cpu_delay (RT) starts on untraced
CPU],
    IRQ 0 (RT), kdeinit, cpu_delay (RT) back in this trace ...

It may be annoying, but it might be good on a small SMP system like mine
to trace what's going on both CPU's during a user trace. Add a column for
the CPU number. That would show the interaction between the CPU's and
see why cpu_delay got stuck for so long in this instance. If you wanted
to follow a single CPU, just grep for that column. Would probably have to
increase the max trace table size (16K traces?). Not sure how to do that
though.

I'll send the traces separately for the cpu_delay symptoms and see if I
can somehow recreate the crash (w/ dmesg -n 8 active).
  --Mark


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-02 17:44                                                   ` Florian Schmidt
@ 2004-12-02 21:12                                                     ` Florian Schmidt
  0 siblings, 0 replies; 59+ messages in thread
From: Florian Schmidt @ 2004-12-02 21:12 UTC (permalink / raw)
  To: Florian Schmidt
  Cc: Ingo Molnar, Rui Nuno Capela, linux-kernel, Lee Revell,
	mark_h_johnson, K.R. Foley, Bill Huey, Adam Heath,
	Thomas Gleixner, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	Karsten Wiese, Gunther Persoons, emann, Shane Shrybman, Amit Shah,
	Esben Nielsen, Andrew Morton

On Thu, 2 Dec 2004 18:44:21 +0100
Florian Schmidt <mista.tapas@gmx.net> wrote:

> this simple patch adds the gettimeofday calls around the calling of the
> process callback:
> 

[snip]

> 
> The results i see are rather interesting though. Even with a noop jack
> client (which does nothing but return 0 in the process callback) i get a
> syslog report everytime i start the client. Client source attached.
> 

[snip]

> 
> The atomicity check operates on a per task (thread) basis right?
> 
> Flo
> 

Well, my error. gettimeofday(1,0) != gettimeofday(0,1) :)

flo

-- 
Palimm Palimm!
Sounds/Ware:
http://affenbande.org/~tapas/

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-01 16:20                       ` [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19 Ingo Molnar
  2004-12-01 16:31                         ` Ingo Molnar
  2004-12-01 18:59                         ` Rui Nuno Capela
@ 2004-12-03  1:41                         ` Fernando Lopez-Lezcano
  2004-12-03  2:23                           ` Fernando Lopez-Lezcano
  2 siblings, 1 reply; 59+ messages in thread
From: Fernando Lopez-Lezcano @ 2004-12-03  1:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Florian Schmidt,
	Thomas Gleixner, Michal Schmidt, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton

On Wed, 2004-12-01 at 08:20, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > ok, this could be ACPI CPU-sleep related. Could you disable all ACPI
> > options in your .config (as a workaround), and re-check whether the
> > xruns still occur?
> 
> i think i found the bug - it's an upstream ACPI bug. Does the patch
> below (or the -31-19 kernel, which i've just uploaded) fix the xruns?

I'm trying unsuccessfully to boot my dual processor athlon into an smp
V0.7.31-20 patched kernel, with PREEMPT_DESKTOP (has not booted
correctly with realtime preemption compiled in for a long time):

(copied from the screen)

BUG: scheduling while atomic: swapper/0x10000000/1
caller is preempt_schedule_irq+0x34/0x50
[<c0105303>] dump_stack+-x23/0x30 (20)
[<c03726dc>] __schedule+0xd5c/0xdb0 (124)
[<c0372904>] preempt_schedule_irq+0x34/0x50 (12)
[<c010429b>] need_resched+0x20/0x29 (-7256)
-------
|preempt_count: 10000001
1-level deep critical section nesting
.. [<c014610d>] .... preint_tracs+0x1d/0x60
.....[<c0105303>] .. ( <= dump_stack+0x23/0x30)
BUG at kernel/latency.c: 918!
--------
invalid operand: 0000 [#1]
PREEMPT SMP

Tried booting with softirq-preempt=1 hardirq-preempt=0 with no change. 

The matching uniprocessor kernel boots fine.
-- Fernando



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19
  2004-12-03  1:41                         ` Fernando Lopez-Lezcano
@ 2004-12-03  2:23                           ` Fernando Lopez-Lezcano
  0 siblings, 0 replies; 59+ messages in thread
From: Fernando Lopez-Lezcano @ 2004-12-03  2:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, linux-kernel, Lee Revell, mark_h_johnson,
	K.R. Foley, Bill Huey, Adam Heath, Florian Schmidt,
	Thomas Gleixner, Michal Schmidt, Karsten Wiese, Gunther Persoons,
	emann, Shane Shrybman, Amit Shah, Esben Nielsen, Andrew Morton

On Thu, 2004-12-02 at 17:41, Fernando Lopez-Lezcano wrote:
> On Wed, 2004-12-01 at 08:20, Ingo Molnar wrote:
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > ok, this could be ACPI CPU-sleep related. Could you disable all ACPI
> > > options in your .config (as a workaround), and re-check whether the
> > > xruns still occur?
> > 
> > i think i found the bug - it's an upstream ACPI bug. Does the patch
> > below (or the -31-19 kernel, which i've just uploaded) fix the xruns?
> 
> I'm trying unsuccessfully to boot my dual processor athlon into an smp
> V0.7.31-20 patched kernel, with PREEMPT_DESKTOP (has not booted
> correctly with realtime preemption compiled in for a long time):
> 
> (copied from the screen)
> 
> BUG: scheduling while atomic: swapper/0x10000000/1
> caller is preempt_schedule_irq+0x34/0x50
> [<c0105303>] dump_stack+-x23/0x30 (20)
> [<c03726dc>] __schedule+0xd5c/0xdb0 (124)
> [<c0372904>] preempt_schedule_irq+0x34/0x50 (12)
> [<c010429b>] need_resched+0x20/0x29 (-7256)
> -------
> |preempt_count: 10000001
> 1-level deep critical section nesting
> .. [<c014610d>] .... preint_tracs+0x1d/0x60
> .....[<c0105303>] .. ( <= dump_stack+0x23/0x30)
> BUG at kernel/latency.c: 918!
> --------
> invalid operand: 0000 [#1]
> PREEMPT SMP
> 
> Tried booting with softirq-preempt=1 hardirq-preempt=0 with no change. 
> 
> The matching uniprocessor kernel boots fine.

But on shutdown (up kernel):

BUG at mm/truncate.c:48!
---- ...
kernel BUG at mm/truncate.c:48!
invalid operand: 0000 [#1]
PREEMPT

-- Fernando



^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2004-12-03  2:23 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-11-26 12:12 Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7 Rui Nuno Capela
2004-11-29 11:05 ` Ingo Molnar
2004-11-29 11:16 ` Ingo Molnar
2004-11-29 11:24   ` Ingo Molnar
2004-11-29 15:42     ` Ingo Molnar
2004-11-29 13:13   ` Rui Nuno Capela
2004-11-29 14:33     ` Ingo Molnar
2004-11-29 15:23       ` Ingo Molnar
2004-11-29 23:16         ` Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-13 Gene Heskett
2004-11-30  1:50           ` K.R. Foley
2004-11-30  3:19             ` Gene Heskett
2004-11-30  4:54             ` Gene Heskett
2004-11-30 15:26               ` K.R. Foley
2004-11-30 16:24                 ` Gene Heskett
2004-11-30 16:52                   ` Zwane Mwaikambo
2004-12-01  7:16                     ` Gene Heskett
2004-11-30 16:57                   ` K.R. Foley
2004-11-30 10:29         ` Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7 Rui Nuno Capela
2004-11-30 13:19           ` Ingo Molnar
2004-11-30 15:39             ` Rui Nuno Capela
2004-11-30 16:42               ` Ingo Molnar
2004-12-01 10:32               ` Ingo Molnar
2004-12-01 11:25                 ` Ingo Molnar
2004-12-01 12:49                   ` Rui Nuno Capela
2004-12-01 12:47                 ` Rui Nuno Capela
2004-12-01 15:40                   ` Ingo Molnar
2004-12-01 16:06                     ` Ingo Molnar
2004-12-01 16:20                       ` [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19 Ingo Molnar
2004-12-01 16:31                         ` Ingo Molnar
2004-12-01 18:59                         ` Rui Nuno Capela
2004-12-01 21:29                           ` Ingo Molnar
2004-12-01 21:30                             ` Ingo Molnar
     [not found]                               ` <32788.192.168.1.8.1101938057.squirrel@192.168.1.8>
2004-12-01 21:58                                 ` Ingo Molnar
2004-12-01 22:04                                   ` Rui Nuno Capela
2004-12-01 22:09                                 ` Ingo Molnar
2004-12-01 22:31                                   ` Rui Nuno Capela
2004-12-02  9:12                                     ` Rui Nuno Capela
2004-12-02 12:59                                       ` Rui Nuno Capela
2004-12-02 16:38                                         ` Fernando Lopez-Lezcano
2004-12-01 22:43                                   ` Florian Schmidt
2004-12-02  8:40                                     ` Ingo Molnar
2004-12-02 12:22                                       ` Florian Schmidt
2004-12-02 12:29                                         ` Ingo Molnar
2004-12-02 13:06                                           ` Florian Schmidt
2004-12-02 13:10                                             ` Ingo Molnar
2004-12-02 13:40                                               ` Florian Schmidt
2004-12-02 13:49                                                 ` Ingo Molnar
2004-12-02 16:08                                                   ` Florian Schmidt
2004-12-02 17:44                                                   ` Florian Schmidt
2004-12-02 21:12                                                     ` Florian Schmidt
2004-12-02 13:18                                             ` Rui Nuno Capela
2004-12-03  1:41                         ` Fernando Lopez-Lezcano
2004-12-03  2:23                           ` Fernando Lopez-Lezcano
2004-11-30 18:13   ` Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-7 Remi Colinet
2004-11-30  8:15     ` Ingo Molnar
2004-12-01  8:30       ` Eran Mann
2004-12-01  8:53         ` Ingo Molnar
2004-12-01 18:19           ` Adam Heath
  -- strict thread matches above, loose matches on Subject: below --
2004-12-02 21:01 [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.31-19 Mark_H_Johnson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox