* [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-09 10:46 ` [patch] voluntary-preempt-2.6.8-rc3-O4 Ingo Molnar
@ 2004-08-10 13:26 ` Ingo Molnar
2004-08-10 18:25 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Ingo Molnar @ 2004-08-10 13:26 UTC (permalink / raw)
To: linux-kernel; +Cc: Lee Revell, Felipe Alfaro Solana, Florian Schmidt
i've uploaded the latest version of the voluntary-preempt patch:
http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.8-rc3-O5
-O5 fixes the APIC lockup issues. The bug was primarily caused by PCI
POST delays causing IRQ storms of level-triggered IRQ sources that were
hardirq-redirected. Also found some bugs in delayed-IRQ masking and
unmasking. SMP should thus work again too.
Ingo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
@ 2004-08-10 15:46 Shane Shrybman
0 siblings, 0 replies; 30+ messages in thread
From: Shane Shrybman @ 2004-08-10 15:46 UTC (permalink / raw)
To: mingo, linux-kernel
Hi,
[ Ingo Molnar wrote ]
-O5 fixes the APIC lockup issues. The bug was primarily caused by PCI
POST delays causing IRQ storms of level-triggered IRQ sources that were
hardirq-redirected. Also found some bugs in delayed-IRQ masking and
unmasking. SMP should thus work again too.
Indeed, -O5 is running fine here with all APIC options enabled. Great
work, thanks!
--- linux-2.6.8-rc3-O5/Makefile.orig 2004-08-10 11:17:29.000000000 -0400
+++ linux-2.6.8-rc3-O5/Makefile 2004-08-10 11:14:55.000000000 -0400
@@ -1,8 +1,8 @@
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 8
-EXTRAVERSION =-rc3
-NAME=Zonked Quokka
+EXTRAVERSION =-rc3-O5
+NAME=Smooth as Ex-Lax
# *DOCUMENTATION*
# To see a list of typical targets execute "make help"
;)
Shane
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-10 13:26 ` [patch] voluntary-preempt-2.6.8-rc3-O5 Ingo Molnar
@ 2004-08-10 18:25 ` Peter Zijlstra
2004-08-10 21:56 ` Lee Revell
2004-08-10 21:59 ` Lee Revell
2 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2004-08-10 18:25 UTC (permalink / raw)
To: Ingo Molnar, LKML
[-- Attachment #1: Type: text/plain, Size: 870 bytes --]
On Tue, 2004-08-10 at 15:26 +0200, Ingo Molnar wrote:
> i've uploaded the latest version of the voluntary-preempt patch:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.8-rc3-O5
>
> -O5 fixes the APIC lockup issues. The bug was primarily caused by PCI
> POST delays causing IRQ storms of level-triggered IRQ sources that were
> hardirq-redirected. Also found some bugs in delayed-IRQ masking and
> unmasking. SMP should thus work again too.
>
> Ingo
Hi Ingo,
I've rediffed O5 agains 2.6.8-rc4-mm1-staircase. It compiles and boots
for me when using voluntary-preempt=1; 2 and 3 lock up the system right
after the line: 'Probing IDE interface ide1...' (dmesg & .config
included).
staircase patch used:
http://ck.kolivas.org/patches/2.6/2.6.8/2.6.8-rc4-mm1/from_2.6.8-rc4-
mm1_to_staircase7.I
Thanks for all the effort,
Peter Zijlstra
[-- Attachment #2: voluntary-preempt-2.6.8-rc4-mm1-O5.patch.gz --]
[-- Type: application/x-gzip, Size: 19704 bytes --]
[-- Attachment #3: dmesg-2.6.8-rc4-mm1-staircase-O5.gz --]
[-- Type: application/x-gzip, Size: 5366 bytes --]
[-- Attachment #4: config-2.6.8-rc4-mm1-staircase-O5.gz --]
[-- Type: application/x-gzip, Size: 7692 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-10 13:26 ` [patch] voluntary-preempt-2.6.8-rc3-O5 Ingo Molnar
2004-08-10 18:25 ` Peter Zijlstra
@ 2004-08-10 21:56 ` Lee Revell
2004-08-10 23:18 ` Florian Schmidt
2004-08-11 7:31 ` Ingo Molnar
2004-08-10 21:59 ` Lee Revell
2 siblings, 2 replies; 30+ messages in thread
From: Lee Revell @ 2004-08-10 21:56 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Felipe Alfaro Solana, Florian Schmidt
On Tue, 2004-08-10 at 09:26, Ingo Molnar wrote:
> i've uploaded the latest version of the voluntary-preempt patch:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.8-rc3-O5
>
> -O5 fixes the APIC lockup issues. The bug was primarily caused by PCI
> POST delays causing IRQ storms of level-triggered IRQ sources that were
> hardirq-redirected. Also found some bugs in delayed-IRQ masking and
> unmasking. SMP should thus work again too.
>
The mlockall() issue seems to be fixed. Now I get this one when
starting jackd:
(jackd/12427): 10882us non-preemptible critical section violated 400 us preempt threshold starting at kernel_fpu_begin+0x10/0x60 and ending at fast_clear_page+0x75/0xa0
[<c0106777>] dump_stack+0x17/0x20
[<c01140eb>] sub_preempt_count+0x4b/0x60
[<c01d1585>] fast_clear_page+0x75/0xa0
[<c013ea86>] do_anonymous_page+0x86/0x180
[<c013ebd0>] do_no_page+0x50/0x300
[<c013f041>] handle_mm_fault+0xc1/0x170
[<c013da33>] get_user_pages+0x133/0x3d0
[<c013f198>] make_pages_present+0x68/0x90
[<c0140948>] do_mmap_pgoff+0x3f8/0x640
[<c010b7f6>] sys_mmap2+0x76/0xb0
[<c0106117>] syscall_call+0x7/0xb
Lee
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-10 13:26 ` [patch] voluntary-preempt-2.6.8-rc3-O5 Ingo Molnar
2004-08-10 18:25 ` Peter Zijlstra
2004-08-10 21:56 ` Lee Revell
@ 2004-08-10 21:59 ` Lee Revell
2 siblings, 0 replies; 30+ messages in thread
From: Lee Revell @ 2004-08-10 21:59 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Felipe Alfaro Solana, Florian Schmidt
On Tue, 2004-08-10 at 09:26, Ingo Molnar wrote:
> i've uploaded the latest version of the voluntary-preempt patch:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.8-rc3-O5
>
> -O5 fixes the APIC lockup issues. The bug was primarily caused by PCI
> POST delays causing IRQ storms of level-triggered IRQ sources that were
> hardirq-redirected. Also found some bugs in delayed-IRQ masking and
> unmasking. SMP should thus work again too.
>
Correction, mlockall-test *does* cause xruns. Here is the trace:
(mlockall-test/12466): 10303us non-preemptible critical section violated 400 us preempt threshold starting at get_user_pages+0xa5/0x3d0 and ending at get_user_pages+0x2e7/0x3d0
[<c0106777>] dump_stack+0x17/0x20
[<c01140eb>] sub_preempt_count+0x4b/0x60
[<c013dbe7>] get_user_pages+0x2e7/0x3d0
[<c013f198>] make_pages_present+0x68/0x90
[<c013f5ed>] mlock_fixup+0x8d/0xb0
[<c013f8d0>] do_mlockall+0x70/0x90
[<c013f989>] sys_mlockall+0x99/0xa0
[<c0106117>] syscall_call+0x7/0xb
ALSA /home/rlrevell/cvs/alsa/alsa-driver/alsa-kernel/core/pcm_lib.c:139: XRUN: pcmC0D2c
[<c0106777>] dump_stack+0x17/0x20
[<de92f79b>] snd_pcm_period_elapsed+0x28b/0x3f0 [snd_pcm]
[<de96c1d1>] snd_emu10k1_interrupt+0xd1/0x3c0 [snd_emu10k1]
[<c011ac63>] generic_handle_IRQ_event+0x33/0x60
[<c0107a27>] do_IRQ+0xb7/0x190
[<c0106338>] common_interrupt+0x18/0x20
[<c0114008>] __touch_preempt_timing+0x8/0x20
[<c013dbf4>] get_user_pages+0x2f4/0x3d0
[<c013f198>] make_pages_present+0x68/0x90
[<c013f5ed>] mlock_fixup+0x8d/0xb0
[<c013f8d0>] do_mlockall+0x70/0x90
[<c013f989>] sys_mlockall+0x99/0xa0
[<c0106117>] syscall_call+0x7/0xb
Here is a different one:
ALSA /home/rlrevell/cvs/alsa/alsa-driver/alsa-kernel/core/pcm_lib.c:139: XRUN: pcmC0D0p
[<c0106777>] dump_stack+0x17/0x20
[<de92f79b>] snd_pcm_period_elapsed+0x28b/0x3f0 [snd_pcm]
[<de96c437>] snd_emu10k1_interrupt+0x337/0x3c0 [snd_emu10k1]
[<c011ac63>] generic_handle_IRQ_event+0x33/0x60
[<c0107a27>] do_IRQ+0xb7/0x190
[<c0106338>] common_interrupt+0x18/0x20
[<c0266777>] schedule+0x2d7/0x5b0
[<c0266fa7>] schedule_timeout+0x57/0xa0
[<c015fef1>] do_poll+0xa1/0xc0
[<c0160041>] sys_poll+0x131/0x220
[<c0106117>] syscall_call+0x7/0xb
(jackd/12430): 17135us non-preemptible critical section violated 400 us preempt threshold starting at schedule+0x57/0x5b0 and ending at schedule+0x2ef/0x5b0
[<c0106777>] dump_stack+0x17/0x20
[<c01140eb>] sub_preempt_count+0x4b/0x60
[<c026678f>] schedule+0x2ef/0x5b0
[<c0266fa7>] schedule_timeout+0x57/0xa0
[<c015fef1>] do_poll+0xa1/0xc0
[<c0160041>] sys_poll+0x131/0x220
[<c0106117>] syscall_call+0x7/0xb
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-10 21:56 ` Lee Revell
@ 2004-08-10 23:18 ` Florian Schmidt
2004-08-11 7:31 ` Ingo Molnar
1 sibling, 0 replies; 30+ messages in thread
From: Florian Schmidt @ 2004-08-10 23:18 UTC (permalink / raw)
To: Lee Revell; +Cc: Ingo Molnar, linux-kernel, Felipe Alfaro Solana
On Tue, 10 Aug 2004 17:56:00 -0400
Lee Revell <rlrevell@joe-job.com> wrote:
> The mlockall() issue seems to be fixed. Now I get this one when
> starting jackd:
>
> (jackd/12427): 10882us non-preemptible critical section violated 400
> us preempt threshold starting at kernel_fpu_begin+0x10/0x60 and ending
> at fast_clear_page+0x75/0xa0
> [<c0106777>] dump_stack+0x17/0x20
> [<c01140eb>] sub_preempt_count+0x4b/0x60
> [<c01d1585>] fast_clear_page+0x75/0xa0
> [<c013ea86>] do_anonymous_page+0x86/0x180
> [<c013ebd0>] do_no_page+0x50/0x300
> [<c013f041>] handle_mm_fault+0xc1/0x170
> [<c013da33>] get_user_pages+0x133/0x3d0
> [<c013f198>] make_pages_present+0x68/0x90
> [<c0140948>] do_mmap_pgoff+0x3f8/0x640
> [<c010b7f6>] sys_mmap2+0x76/0xb0
> [<c0106117>] syscall_call+0x7/0xb
I see these, too [at jackd startup and at client startup]:
Aug 11 01:15:21 mango kernel: (jackd/843): 3215us non-preemptible
critical secti
on violated 1000 us preempt threshold starting at
kernel_fpu_begin+0x1a/0x60 and
ending at fast_clear_page+0x54/0x80
Aug 11 01:15:21 mango kernel: [<c01064ae>] dump_stack+0x1e/0x20
Aug 11 01:15:21 mango kernel: [<c0114735>] sub_preempt_count+0x45/0x60
Aug 11 01:15:21 mango kernel: [<c01daa94>] fast_clear_page+0x54/0x80
Aug 11 01:15:21 mango kernel: [<c0140150>] do_anonymous_page+0x90/0x190
Aug 11 01:15:21 mango kernel: [<c01402b1>] do_no_page+0x61/0x320
Aug 11 01:15:21 mango kernel: [<c0140763>] handle_mm_fault+0xe3/0x1a0
Aug 11 01:15:21 mango kernel: [<c013f0e7>] get_user_pages+0x147/0x3d0
Aug 11 01:15:21 mango kernel: [<c01408fd>] make_pages_present+0x8d/0xb0
Aug 11 01:15:21 mango kernel: [<c0142255>] do_mmap_pgoff+0x445/0x6b0
Aug 11 01:15:21 mango kernel: [<c010b800>] old_mmap+0xe0/0x120
Aug 11 01:15:21 mango kernel: [<c0105f2b>] syscall_call+0x7/0xb
Aug 11 01:15:21 mango kernel: ALSA sound/core/pcm_lib.c:169: XRUN:
pcmC0D0p
Aug 11 01:15:21 mango kernel: [<c01064ae>] dump_stack+0x1e/0x20
Aug 11 01:15:21 mango kernel: [<f08c2431>]
snd_pcm_period_elapsed+0x311/0x480 [
snd_pcm]
Aug 11 01:15:21 mango kernel: [<f089ebce>]
snd_cs46xx_interrupt+0x1be/0x1f0 [sn
d_cs46xx]
Aug 11 01:15:21 mango kernel: [<c011b55b>]
generic_handle_IRQ_event+0x3b/0x70
Aug 11 01:15:21 mango kernel: [<c01078c6>] do_IRQ+0xb6/0x170
Aug 11 01:15:21 mango kernel: [<c0106098>] common_interrupt+0x18/0x20
Aug 11 01:15:40 mango gconfd (tapas-803): GConf server is not in use,
shutting d
own.
Aug 11 01:15:40 mango gconfd (tapas-803): Exiting
Aug 11 01:16:01 mango kernel: ALSA sound/core/pcm_lib.c:169: XRUN:
pcmC0D0p
Aug 11 01:16:01 mango kernel: [<c01064ae>] dump_stack+0x1e/0x20
Aug 11 01:16:01 mango kernel: [<f08c2431>]
snd_pcm_period_elapsed+0x311/0x480 [
snd_pcm]
Aug 11 01:16:01 mango kernel: [<f089ebce>]
snd_cs46xx_interrupt+0x1be/0x1f0 [sn
d_cs46xx]
Aug 11 01:16:01 mango kernel: [<c011b55b>]
generic_handle_IRQ_event+0x3b/0x70
Aug 11 01:16:01 mango kernel: [<c01078c6>] do_IRQ+0xb6/0x170
Aug 11 01:16:01 mango kernel: [<c0106098>] common_interrupt+0x18/0x20
Aug 11 01:16:01 mango kernel: [<c0140150>] do_anonymous_page+0x90/0x190
Aug 11 01:16:01 mango kernel: [<c01402b1>] do_no_page+0x61/0x320
Aug 11 01:16:01 mango kernel: [<c0140763>] handle_mm_fault+0xe3/0x1a0
Aug 11 01:16:01 mango kernel: [<c013f0e7>] get_user_pages+0x147/0x3d0
Aug 11 01:16:01 mango kernel: [<c01408fd>] make_pages_present+0x8d/0xb0
Aug 11 01:16:01 mango kernel: [<c0142255>] do_mmap_pgoff+0x445/0x6b0
Aug 11 01:16:01 mango kernel: [<c010b800>] old_mmap+0xe0/0x120
Aug 11 01:16:01 mango kernel: [<c0105f2b>] syscall_call+0x7/0xb
Aug 11 01:16:01 mango kernel: (scsynth/870): 3230us non-preemptible
critical sec
tion violated 1000 us preempt threshold starting at
kernel_fpu_begin+0x1a/0x60 a
nd ending at fast_clear_page+0x54/0x80
Aug 11 01:16:01 mango kernel: [<c01064ae>] dump_stack+0x1e/0x20
Aug 11 01:16:01 mango kernel: [<c0114735>] sub_preempt_count+0x45/0x60
Aug 11 01:16:01 mango kernel: [<c01daa94>] fast_clear_page+0x54/0x80
Aug 11 01:16:01 mango kernel: [<c0140150>] do_anonymous_page+0x90/0x190
Aug 11 01:16:01 mango kernel: [<c01402b1>] do_no_page+0x61/0x320
Aug 11 01:16:01 mango kernel: [<c0140763>] handle_mm_fault+0xe3/0x1a0
Aug 11 01:16:01 mango kernel: [<c013f0e7>] get_user_pages+0x147/0x3d0
Aug 11 01:16:01 mango kernel: [<c01408fd>] make_pages_present+0x8d/0xb0
Aug 11 01:16:01 mango kernel: [<c0142255>] do_mmap_pgoff+0x445/0x6b0
Aug 11 01:16:01 mango kernel: [<c010b800>] old_mmap+0xe0/0x120
Aug 11 01:16:01 mango kernel: [<c0105f2b>] syscall_call+0x7/0xb
Flo
--
Palimm Palimm!
http://affenbande.org/~tapas/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-10 21:56 ` Lee Revell
2004-08-10 23:18 ` Florian Schmidt
@ 2004-08-11 7:31 ` Ingo Molnar
2004-08-11 7:42 ` Ingo Molnar
2004-08-12 22:16 ` Lee Revell
1 sibling, 2 replies; 30+ messages in thread
From: Ingo Molnar @ 2004-08-11 7:31 UTC (permalink / raw)
To: Lee Revell; +Cc: linux-kernel, Felipe Alfaro Solana, Florian Schmidt
* Lee Revell <rlrevell@joe-job.com> wrote:
> (jackd/12427): 10882us non-preemptible critical section violated 400
> us preempt threshold starting at kernel_fpu_begin+0x10/0x60 and ending
> at fast_clear_page+0x75/0xa0
to make sure this is a real latency and not some rdtsc weirdness, could
you try the latest version of preempt-timing:
http://redhat.com/~mingo/voluntary-preempt/preempt-timing-on-2.6.8-rc3-O5-A2
this adds jiffies-based latency values to the printout, e.g.:
(ksoftirqd/0/2): 3860us [3 jiffy] non-preemptible critical section
violated 100 us preempt threshold starting at ___do_softirq+0x1b/0x90
and ending at cond_resched_softirq+0x57/0x70
shows that a 10 jiffy (10 msec) latency happened - which matches the
rdtsc-based 3860 usecs value.
Ingo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 7:31 ` Ingo Molnar
@ 2004-08-11 7:42 ` Ingo Molnar
2004-08-11 7:52 ` Lee Revell
2004-08-12 22:16 ` Lee Revell
1 sibling, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2004-08-11 7:42 UTC (permalink / raw)
To: Lee Revell; +Cc: linux-kernel, Felipe Alfaro Solana, Florian Schmidt
* Ingo Molnar <mingo@elte.hu> wrote:
> > (jackd/12427): 10882us non-preemptible critical section violated 400
> > us preempt threshold starting at kernel_fpu_begin+0x10/0x60 and ending
> > at fast_clear_page+0x75/0xa0
>
> to make sure this is a real latency and not some rdtsc weirdness, could
> you try the latest version of preempt-timing:
>
> http://redhat.com/~mingo/voluntary-preempt/preempt-timing-on-2.6.8-rc3-O5-A2
>
> this adds jiffies-based latency values to the printout, e.g.:
could you also apply the patch below? This splits up fast_clear_page()
overhead into 3 pieces to find out where the overhead comes from. Note
that this will likely cause 'reduced' latency printouts, but those are
not real because the touch_preempt_timing() lines dont show real
preemptible points. You should still get a larger latency in one of
those regions. The kernel-FPU code (which is triggered by the MMX ops)
could be another source of overhead - but 10 msecs is so large ...
(another thing: could you turn on CONFIG_DEBUG_HIGHMEM,
CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP? Lets make sure
that what we are seeing here is not somehow caused by atomicity
violations.)
Ingo
--- linux/arch/i386/lib/mmx.c.orig
+++ linux/arch/i386/lib/mmx.c
@@ -132,7 +132,9 @@ static void fast_clear_page(void *page)
{
int i;
+ touch_preempt_timing();
kernel_fpu_begin();
+ touch_preempt_timing();
__asm__ __volatile__ (
" pxor %%mm0, %%mm0\n" : :
@@ -158,7 +160,9 @@ static void fast_clear_page(void *page)
__asm__ __volatile__ (
" sfence \n" : :
);
+ touch_preempt_timing();
kernel_fpu_end();
+ touch_preempt_timing();
}
static void fast_copy_page(void *to, void *from)
@@ -260,8 +264,10 @@ static void fast_copy_page(void *to, voi
static void fast_clear_page(void *page)
{
int i;
-
+
+ touch_preempt_timing();
kernel_fpu_begin();
+ touch_preempt_timing();
__asm__ __volatile__ (
" pxor %%mm0, %%mm0\n" : :
@@ -290,7 +296,9 @@ static void fast_clear_page(void *page)
page+=128;
}
+ touch_preempt_timing();
kernel_fpu_end();
+ touch_preempt_timing();
}
static void fast_copy_page(void *to, void *from)
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 7:42 ` Ingo Molnar
@ 2004-08-11 7:52 ` Lee Revell
2004-08-11 8:25 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Lee Revell @ 2004-08-11 7:52 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Felipe Alfaro Solana, Florian Schmidt
On Wed, 2004-08-11 at 03:42, Ingo Molnar wrote:
> (another thing: could you turn on CONFIG_DEBUG_HIGHMEM,
> CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP? Lets make sure
> that what we are seeing here is not somehow caused by atomicity
> violations.)
I have highmem disabled per your previous request. Should I turn it
back on?
Lee
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 7:52 ` Lee Revell
@ 2004-08-11 8:25 ` Ingo Molnar
2004-08-11 8:27 ` Ingo Molnar
2004-08-11 9:06 ` Ingo Molnar
2 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2004-08-11 8:25 UTC (permalink / raw)
To: Lee Revell; +Cc: linux-kernel, Felipe Alfaro Solana, Florian Schmidt
* Lee Revell <rlrevell@joe-job.com> wrote:
> > (another thing: could you turn on CONFIG_DEBUG_HIGHMEM,
> > CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP? Lets make sure
> > that what we are seeing here is not somehow caused by atomicity
> > violations.)
>
> I have highmem disabled per your previous request. Should I turn it
> back on?
nope, please keep it disabled still. (You can turn the C3 CPU option
back on though, it doesnt seem to have any impact on latency.)
then it's just the spinlock debugging options that should be enabled.
(they make sense on UP kernels too, if CONFIG_PREEMPT is enabled.)
Ingo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 7:52 ` Lee Revell
2004-08-11 8:25 ` Ingo Molnar
@ 2004-08-11 8:27 ` Ingo Molnar
2004-08-11 11:48 ` Linh Dang
2004-08-11 9:06 ` Ingo Molnar
2 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2004-08-11 8:27 UTC (permalink / raw)
To: Lee Revell; +Cc: linux-kernel, Felipe Alfaro Solana, Florian Schmidt
another thing: do you see the 10 msec latency every time you run
mlockall-test, or does it only happen sporadically?
Ingo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 7:52 ` Lee Revell
2004-08-11 8:25 ` Ingo Molnar
2004-08-11 8:27 ` Ingo Molnar
@ 2004-08-11 9:06 ` Ingo Molnar
2004-08-11 12:16 ` Florian Schmidt
2 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2004-08-11 9:06 UTC (permalink / raw)
To: Lee Revell; +Cc: linux-kernel, Felipe Alfaro Solana, Florian Schmidt
[-- Attachment #1: Type: text/plain, Size: 1882 bytes --]
most of the remaining latencies look quite suspect. E.g. the
select()/tty_poll() ones:
(gnome-terminal/826): 15491us non-preemptible critical section
violated 1100 us preempt threshold starting at
add_wait_queue+0x15/0x50 and ending at add_wait_queue+0x2c/0x50
[dump_stack+23/32] dump_stack+0x17/0x20
[dec_preempt_count+60/80] dec_preempt_count+0x3c/0x50
[add_wait_queue+44/80] add_wait_queue+0x2c/0x50
[normal_poll+61/375] normal_poll+0x3d/0x177
[tty_poll+97/128] tty_poll+0x61/0x80
[do_pollfd+145/160] do_pollfd+0x91/0xa0
[do_poll+95/192] do_poll+0x5f/0xc0
[sys_poll+305/544] sys_poll+0x131/0x220
[syscall_call+7/11] syscall_call+0x7/0xb
according to the trace this latency happened in a point where it's near
impossible to happen. add_wait_queue() is just a couple of straight
instructions on UP.
do you have any powersaving mode enabled in the BIOS? SMM handlers can
introduce such latencies (low probability).
the only other possibility is either a measurement error, or some mystic
IRQ overhead. But almost all IRQs are redirected so the IRQ overhead can
be eliminated almost completely. Plus direct-IRQ overhead should also
show up via the latest preempt-timing patch. Wrt. measurement error, the
jiffies based printout ought to help somewhat.
i'm currently running a loop of mlockall-test 100MB on a 466 MHz
Celeron, and not a single blip on the radar with a 1000 usecs threshold,
after 1 hour of runtime ...
i've previously seen RDTSC (cycle-counter) weirdnesses on another box,
in userspace. To exclude this possibility i've attached yet another
patch, it tries to make all the kernel rdtsc variants more robust. Does
this patch make any difference to the latency printouts? [this patch
doesnt handle cases where the rdtsc value jumps forward in time
permanently, but it handles the occasional blips i've seen on the other
box.]
Ingo
[-- Attachment #2: rdtsc-robust-2.6.8-rc3-A0 --]
[-- Type: text/plain, Size: 1427 bytes --]
--- linux/include/asm-i386/msr.h.orig3
+++ linux/include/asm-i386/msr.h
@@ -32,15 +32,50 @@ static inline void wrmsrl (unsigned long
wrmsr (msr, lo, hi);
}
-#define rdtsc(low,high) \
+#define __rdtsc(low,high) \
__asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high))
-#define rdtscl(low) \
+#define rdtsc(low,high) do { \
+ unsigned int __low1, __high1, __low2, __high2; \
+ for (;;) { \
+ __rdtsc(__low1,__high1); \
+ __rdtsc(__low2,__high2); \
+ if (__high1 == __high2 && __low2 - __low1 < 1000) \
+ break; \
+ } \
+ low = __low2; \
+ high = __high2; \
+} while (0)
+
+#define __rdtscl(low) \
__asm__ __volatile__("rdtsc" : "=a" (low) : : "edx")
-#define rdtscll(val) \
+#define rdtscl(low) do { \
+ unsigned int __low1, __low2; \
+ for (;;) { \
+ __rdtscl(__low1); \
+ __rdtscl(__low2); \
+ if (__low2 - __low1 < 1000) \
+ break; \
+ } \
+ low = __low2; \
+} while (0)
+
+#define __rdtscll(val) \
__asm__ __volatile__("rdtsc" : "=A" (val))
+#define rdtscll(val) do { \
+ unsigned long long __val1, __val2; \
+ for (;;) { \
+ __rdtscll(__val1); \
+ __rdtscll(__val2); \
+ if (__val2 - __val1 < 1000ULL) \
+ break; \
+ } \
+ val = __val2; \
+} while (0)
+
+
#define write_tsc(val1,val2) wrmsr(0x10, val1, val2)
#define rdpmc(counter,low,high) \
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 8:27 ` Ingo Molnar
@ 2004-08-11 11:48 ` Linh Dang
2004-08-12 0:04 ` Lee Revell
0 siblings, 1 reply; 30+ messages in thread
From: Linh Dang @ 2004-08-11 11:48 UTC (permalink / raw)
To: linux-kernel
Hi,
I'm not running the voluntary-preempt-* patches. but I do see some
long latencies with:
vanilla 2.6.7+preempt-timing+defer-softirq
which were NOT reported here. Is it useful the report them?
Regards
--
Linh Dang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 9:06 ` Ingo Molnar
@ 2004-08-11 12:16 ` Florian Schmidt
2004-08-11 12:43 ` Ingo Molnar
0 siblings, 1 reply; 30+ messages in thread
From: Florian Schmidt @ 2004-08-11 12:16 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Lee Revell, linux-kernel, Felipe Alfaro Solana
On Wed, 11 Aug 2004 11:06:39 +0200
Ingo Molnar <mingo@elte.hu> wrote:
> i'm currently running a loop of mlockall-test 100MB on a 466 MHz
> Celeron, and not a single blip on the radar with a 1000 usecs
> threshold, after 1 hour of runtime ...
I suppose you're not using jackd. As i have noticed that these critical
sections only get reported when jackd is running. It seems jackd is
producing a certain kind of load which exposes them..
Flo
--
Palimm Palimm!
http://affenbande.org/~tapas/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 12:16 ` Florian Schmidt
@ 2004-08-11 12:43 ` Ingo Molnar
2004-08-11 13:44 ` Florian Schmidt
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Ingo Molnar @ 2004-08-11 12:43 UTC (permalink / raw)
To: Florian Schmidt; +Cc: Lee Revell, linux-kernel, Felipe Alfaro Solana
* Florian Schmidt <mista.tapas@gmx.net> wrote:
> > i'm currently running a loop of mlockall-test 100MB on a 466 MHz
> > Celeron, and not a single blip on the radar with a 1000 usecs
> > threshold, after 1 hour of runtime ...
>
> I suppose you're not using jackd. As i have noticed that these
> critical sections only get reported when jackd is running. It seems
> jackd is producing a certain kind of load which exposes them..
so you can only trigger the latencies via mlockall-test if jackd is also
running? Or do the latencies only trigger in jackd (and related
programs)?
if the later, then i'm wondering whether any of the audio code turns off
caching for specific pages or does DMA to user pages, or mmap()s device
(PCI) memory?
Ingo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 12:43 ` Ingo Molnar
@ 2004-08-11 13:44 ` Florian Schmidt
2004-08-11 19:18 ` Florian Schmidt
2004-08-11 23:55 ` Lee Revell
2 siblings, 0 replies; 30+ messages in thread
From: Florian Schmidt @ 2004-08-11 13:44 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Lee Revell, linux-kernel, Felipe Alfaro Solana
On Wed, 11 Aug 2004 14:43:42 +0200
Ingo Molnar <mingo@elte.hu> wrote:
>
> * Florian Schmidt <mista.tapas@gmx.net> wrote:
>
> > > i'm currently running a loop of mlockall-test 100MB on a 466 MHz
> > > Celeron, and not a single blip on the radar with a 1000 usecs
> > > threshold, after 1 hour of runtime ...
> >
> > I suppose you're not using jackd. As i have noticed that these
> > critical sections only get reported when jackd is running. It seems
> > jackd is producing a certain kind of load which exposes them..
>
> so you can only trigger the latencies via mlockall-test if jackd is
> also running?
Yes. I can happily mlockall 500 megs of ram when jackd is not running
and i do not get any preempt-timing reports.. As soon as jackd is
running even mlockall'ing only 20megs triggers a preempt-timing report
and also a alsa xrun report. Hmm, let me try turning off xrun_debug
traces while leaving preempt timing reports enabled...
Ok, when xrun_debug traces are disabled [but preempt timing is on].
mlocking 20 megs of memory triggers an xrun [now only reported by jackd
in it's stdout, not in syslog] but not a preempt timing report. Lee, can
you verify?
So it seems that the xrun_debug reports from the alsa interface jackd is
using triggers the preempt timing reports. So these are really
suspicious...
So i'm asking myself: How can jackd experience an xrun, when there's no
preempt-timing report showing a kernel latency. Jackd is running
SCHED_FIFO, so the mlockall_test program should not take away the cpu
from jackd. Is mlockall() special in another way? Sorry, i know too
little about the kernel internals to ask the right question. Maybe the
alsa driver is reacting to the mlockall_test..
Another piece of info which might be valuable (i cannot judge this): I
use jackd and mlockall not as root, but as normal user utilizing the
realtime lsm which allows non root users to mlock, mlockall and change
the scheduling class (via sched_setsched()?). I'll try runnign jackd and
the mlockall_test as root to see if the results differ..
> Or do the latencies only trigger in jackd (and related
> programs)?
Sorry, i don't understand that question..
>
> if the later, then i'm wondering whether any of the audio code turns
> off caching for specific pages or does DMA to user pages, or mmap()s
> device(PCI) memory?
Flo
--
Palimm Palimm!
http://affenbande.org/~tapas/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 12:43 ` Ingo Molnar
2004-08-11 13:44 ` Florian Schmidt
@ 2004-08-11 19:18 ` Florian Schmidt
2004-08-11 23:55 ` Lee Revell
2 siblings, 0 replies; 30+ messages in thread
From: Florian Schmidt @ 2004-08-11 19:18 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Lee Revell, linux-kernel, Felipe Alfaro Solana
On Wed, 11 Aug 2004 14:43:42 +0200
Ingo Molnar <mingo@elte.hu> wrote:
> so you can only trigger the latencies via mlockall-test if jackd is
> also running? Or do the latencies only trigger in jackd (and related
> programs)?
Hi,
i also want to say that some of the sprocadic xruns i see do also not
seem to be cause by latencies. The preempt timing reports i saw where
rather results of the xrun_debug reports i think..
How do i come to this conclusion? Well i have jack running w/o ALSA
xrun_debug reports but with preempt timing running for quite a while and
i see about the same number of xruns as before, but i don't see any
preempt timing reports anymore...
So i think jackd itself is doing some weird stuff. Or the driver of my
soundcard.
Flo
--
Palimm Palimm!
http://affenbande.org/~tapas/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 12:43 ` Ingo Molnar
2004-08-11 13:44 ` Florian Schmidt
2004-08-11 19:18 ` Florian Schmidt
@ 2004-08-11 23:55 ` Lee Revell
2004-08-12 7:21 ` Ingo Molnar
2 siblings, 1 reply; 30+ messages in thread
From: Lee Revell @ 2004-08-11 23:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: Florian Schmidt, linux-kernel, Felipe Alfaro Solana, jackit-devel,
Paul Davis
On Wed, 2004-08-11 at 08:43, Ingo Molnar wrote:
> * Florian Schmidt <mista.tapas@gmx.net> wrote:
>
> > > i'm currently running a loop of mlockall-test 100MB on a 466 MHz
> > > Celeron, and not a single blip on the radar with a 1000 usecs
> > > threshold, after 1 hour of runtime ...
> >
> > I suppose you're not using jackd. As i have noticed that these
> > critical sections only get reported when jackd is running. It seems
> > jackd is producing a certain kind of load which exposes them..
>
> so you can only trigger the latencies via mlockall-test if jackd is also
> running? Or do the latencies only trigger in jackd (and related
> programs)?
>
> if the later, then i'm wondering whether any of the audio code turns off
> caching for specific pages or does DMA to user pages, or mmap()s device
> (PCI) memory?
>
I believe that jackd may do all of these. I am adding Paul Davis to the
cc: list as he would know better.
Whatever is going on, it only happens at jackd startup. Jackd does not
report an xrun because the developers added code not to report an xrun
within the first 64 frames, so this message would not confuse users.
There is definitely some subtle bug in the preempt-timing patch, because
I am getting reports of long non-preemptible sections, which do not
correspond to an xrun in jackd - if these were real then even a 400usec
non-preemptible section would cause an xrun. I do not seem to get many
xruns during normal jackd operation.
If, during initialization, jackd called some function that could sleep
*after* starting the PCM, for example if it tried to allocate memory in
the same thread as the audio, this would cause an xrun, because the
soundcard interrupt would occur, but jackd would not be woken up because
it is sleeping on some other resource, correct? Then, when jackd
eventually woke up, it would see that a lot of time had passed, and
report an xrun. This would look the same as if jackd had been ready to
run and had not been scheduled in time.
Lee
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 11:48 ` Linh Dang
@ 2004-08-12 0:04 ` Lee Revell
2004-08-12 2:55 ` Linh Dang
0 siblings, 1 reply; 30+ messages in thread
From: Lee Revell @ 2004-08-12 0:04 UTC (permalink / raw)
To: Linh Dang; +Cc: linux-kernel
On Wed, 2004-08-11 at 07:48, Linh Dang wrote:
> Hi,
>
> I'm not running the voluntary-preempt-* patches. but I do see some
> long latencies with:
>
> vanilla 2.6.7+preempt-timing+defer-softirq
>
> which were NOT reported here. Is it useful the report them?
>
Probably not. Many latency issues as well as bugs in the preempt timing
patch have been fixed since then. You should try the latest version.
Lee
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-12 0:04 ` Lee Revell
@ 2004-08-12 2:55 ` Linh Dang
0 siblings, 0 replies; 30+ messages in thread
From: Linh Dang @ 2004-08-12 2:55 UTC (permalink / raw)
To: linux-kernel
Lee Revell <rlrevell@joe-job.com> wrote:
> On Wed, 2004-08-11 at 07:48, Linh Dang wrote:
>> Hi,
>>
>> I'm not running the voluntary-preempt-* patches. but I do see some
>> long latencies with:
>>
>> vanilla 2.6.7+preempt-timing+defer-softirq
>>
>> which were NOT reported here. Is it useful the report them?
>>
>
> Probably not. Many latency issues as well as bugs in the preempt
> timing patch have been fixed since then. You should try the latest
> version.
which "latest"? Linus's 2.6.8-rcX or Andrew's -mm's or Ingo's patches.
I've looked at Ingo patches but didn't see fixes for the followings:
1. In sys_mount(): do_mount() is called with the BKL held. on jffs2
system, jffs2 might for a big media-scan and lock preemption for
several msecs. even if jffs2_scan_eraseblock() calls cond_resched()
for every flash sector, scanning one sector is still very long.
2. netif_receive_skb(): the rcu_read_lock() is too broad. IMHO, it's
only needed around the 2 list_for_each_entry_rcu()s. There's no
reason why rcu_read_lock() is needed when calling ip_recv on the
skb.
3. with Ingo's patches, if all softirqs are done by the daemon then
there's should be NO need to call local_bh_disable()/enable()
around the processing loop in ___do_softirq().
4. in cfi_cmdset_0002.c, using spin_lock_bh() to guard struct flchip
seems to be an overkill. why a semaphore is NOT sufficient?
I'm a total newbie so it's possible that I'm totally wrong about the
aboves.
Regards
--
Linh Dang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 23:55 ` Lee Revell
@ 2004-08-12 7:21 ` Ingo Molnar
2004-08-12 21:54 ` Lee Revell
0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2004-08-12 7:21 UTC (permalink / raw)
To: Lee Revell
Cc: Florian Schmidt, linux-kernel, Felipe Alfaro Solana, jackit-devel,
Paul Davis
* Lee Revell <rlrevell@joe-job.com> wrote:
> There is definitely some subtle bug in the preempt-timing patch,
> because I am getting reports of long non-preemptible sections, which
> do not correspond to an xrun in jackd - if these were real then even a
> 400usec non-preemptible section would cause an xrun. I do not seem to
> get many xruns during normal jackd operation.
could you send me these latest preempt-timing warnings?
Ingo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-12 7:21 ` Ingo Molnar
@ 2004-08-12 21:54 ` Lee Revell
2004-08-13 0:04 ` Lee Revell
0 siblings, 1 reply; 30+ messages in thread
From: Lee Revell @ 2004-08-12 21:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: Florian Schmidt, linux-kernel, Felipe Alfaro Solana, jackit-devel,
Paul Davis
On Thu, 2004-08-12 at 03:21, Ingo Molnar wrote:
> * Lee Revell <rlrevell@joe-job.com> wrote:
>
> > There is definitely some subtle bug in the preempt-timing patch,
> > because I am getting reports of long non-preemptible sections, which
> > do not correspond to an xrun in jackd - if these were real then even a
> > 400usec non-preemptible section would cause an xrun. I do not seem to
> > get many xruns during normal jackd operation.
>
> could you send me these latest preempt-timing warnings?
>
Here are all the XRUN traces and preempt-timing warnings for the last
~24 hours. I still have not had a chance to test with the latest
patches & config changes.
http://krustophenia.net/2.6.8-rc3-O5/kern.log.txt
Here are some graphs from other test runs:
http://krustophenia.net/testresults.php?dataset=2.6.8-rc3-O5
http://krustophenia.net/testresults.php?dataset=2.6.8-rc2-mm2-O3
For reference here's an -mm kernel without the voluntary-preempt
patches:
http://krustophenia.net/testresults.php?dataset=2.6.8-rc3-mm2
Source code for the php script:
http://krustophenia.net/testresults.phps
I based the php script on this perl script:
http://www.oddmuse.org/cgi-bin/oddmuse/GnuPlot_Extension
The web server is my desktop machine, so please don't complain if it
becomes unavailable without warning. I will try to keep the links
working.
Lee
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-11 7:31 ` Ingo Molnar
2004-08-11 7:42 ` Ingo Molnar
@ 2004-08-12 22:16 ` Lee Revell
1 sibling, 0 replies; 30+ messages in thread
From: Lee Revell @ 2004-08-12 22:16 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Felipe Alfaro Solana, Florian Schmidt
On Wed, 2004-08-11 at 03:31, Ingo Molnar wrote:
> * Lee Revell <rlrevell@joe-job.com> wrote:
>
> > (jackd/12427): 10882us non-preemptible critical section violated 400
> > us preempt threshold starting at kernel_fpu_begin+0x10/0x60 and ending
> > at fast_clear_page+0x75/0xa0
>
> to make sure this is a real latency and not some rdtsc weirdness, could
> you try the latest version of preempt-timing:
>
> http://redhat.com/~mingo/voluntary-preempt/preempt-timing-on-2.6.8-rc3-O5-A2
>
> this adds jiffies-based latency values to the printout, e.g.:
>
> (ksoftirqd/0/2): 3860us [3 jiffy] non-preemptible critical section
> violated 100 us preempt threshold starting at ___do_softirq+0x1b/0x90
> and ending at cond_resched_softirq+0x57/0x70
>
> shows that a 10 jiffy (10 msec) latency happened - which matches the
> rdtsc-based 3860 usecs value.
>
Here is the preempt-timing violation and accompanying xrun I got when
starting jackd. It looks like the jiffies value matches the rdtsc:
ALSA /home/rlrevell/cvs/alsa/alsa-driver/alsa-kernel/core/pcm_lib.c:139:
XRUN: pcmC0D2c
[<c0106497>] dump_stack+0x17/0x20
[<de92714b>] snd_pcm_period_elapsed+0x27b/0x3e0 [snd_pcm]
[<de963171>] snd_emu10k1_interrupt+0xd1/0x380 [snd_emu10k1]
[<c011a693>] generic_handle_IRQ_event+0x33/0x60
[<c010765e>] do_IRQ+0xbe/0x180
[<c0106078>] common_interrupt+0x18/0x20
[<c013d28e>] do_no_page+0x4e/0x300
[<c013d6e1>] handle_mm_fault+0xb1/0x150
[<c013c118>] get_user_pages+0x138/0x3e0
[<c013d828>] make_pages_present+0x68/0x90
[<c013ef5f>] do_mmap_pgoff+0x40f/0x670
[<c010b245>] sys_mmap2+0x75/0xb0
[<c0105e57>] syscall_call+0x7/0xb
(jackd/1359): 9174us [10 jiffy] non-preemptible critical section
violated 400 us preempt threshold starting at do_IRQ+0x19/0x180 and
ending at do_IRQ+0x121/0x180 [<c0106497>] dump_stack+0x17/0x20
[<c0113e5b>] sub_preempt_count+0x4b/0x60
[<c01076c1>] do_IRQ+0x121/0x180
[<c0106078>] common_interrupt+0x18/0x20
[<c013d28e>] do_no_page+0x4e/0x300
[<c013d6e1>] handle_mm_fault+0xb1/0x150
[<c013c118>] get_user_pages+0x138/0x3e0
[<c013d828>] make_pages_present+0x68/0x90
[<c013ef5f>] do_mmap_pgoff+0x40f/0x670
[<c010b245>] sys_mmap2+0x75/0xb0
[<c0105e57>] syscall_call+0x7/0xb
Lee
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-12 21:54 ` Lee Revell
@ 2004-08-13 0:04 ` Lee Revell
2004-08-13 0:27 ` Lee Revell
0 siblings, 1 reply; 30+ messages in thread
From: Lee Revell @ 2004-08-13 0:04 UTC (permalink / raw)
To: Ingo Molnar
Cc: Florian Schmidt, linux-kernel, Felipe Alfaro Solana, jackit-devel,
Paul Davis
On Thu, 2004-08-12 at 17:54, Lee Revell wrote:
> I still have not had a chance to test with the latest
> patches & config changes.
Here are all the log entries produced during this test:
rlrevell@mindpipe:~$ for test in 1 2 3 4 5; do echo; echo "Test $test";
jackd -n 100000 --realtime -d alsa --rate 48000 -D -P hw:0,0 -C hw:0,2
-p 32 -S >> mm2; done
http://krustophenia.net/2.6.8-rc3-O5-A2/kern.log.txt
When one jackd process is running, starting another jackd process,
using a different device produces an xrun in the first. If mlockall-test
is run while a jackd process is running, this also produces an xrun in
the jackd process.
So, it seems that if a SCHED_FIFO process opens a PCM device using mmap,
then mlockall's the memory, then another process mlockall's memory, the
result is an xrun 100% of the time.
Lee
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-13 0:04 ` Lee Revell
@ 2004-08-13 0:27 ` Lee Revell
2004-08-13 0:55 ` Florian Schmidt
0 siblings, 1 reply; 30+ messages in thread
From: Lee Revell @ 2004-08-13 0:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: Florian Schmidt, linux-kernel, Felipe Alfaro Solana, jackit-devel,
Paul Davis
On Thu, 2004-08-12 at 20:04, Lee Revell wrote:
> So, it seems that if a SCHED_FIFO process opens a PCM device using mmap,
> then mlockall's the memory, then another process mlockall's memory, the
> result is an xrun 100% of the time.
>
I have found that around 1400 KB is a magic number on my system, this
triggers the preempt violation/xrun about 50% of the time. 1300 never
triggers it, 1500 always triggers it.
Also the amount of memory being mlockall'ed does not affect the length
of the preemption violation - if we hit it at all, there's a 10ms
latency, whether we lock 1400KB or 100MB.
Hopefully O6 will give enough info to track this down.
Lee
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-13 0:55 ` Florian Schmidt
@ 2004-08-13 0:55 ` Lee Revell
2004-08-13 1:18 ` Florian Schmidt
2004-08-13 10:07 ` Ingo Molnar
1 sibling, 1 reply; 30+ messages in thread
From: Lee Revell @ 2004-08-13 0:55 UTC (permalink / raw)
To: Florian Schmidt
Cc: Ingo Molnar, linux-kernel, Felipe Alfaro Solana, jackit-devel,
Paul Davis
On Thu, 2004-08-12 at 20:55, Florian Schmidt wrote:
> i think that the mlockall and client/jackd startup xruns often do not
> seem to correspond to a critical timing report.. Try the following: turn
> off xrun_debug but leave the preempt-timing stuff on. On my system, the
> mlockall_test provokes an xrun in jackd's output but i do not get a
> preempt-timing report (thresh = 500).
>
> OTOH when the xrun_debug is on, the xrun_debug report actually seems to
> trigger the preempt-timing report.
>
Even if it is not a long non-preemptible section, mlockall-test is still
doing *something* that causes an xrun in jackd. Maybe interrupts are
being disabled for a long time. Whatever it is, it cannot possibly be a
jackd bug, because mlockall by an unrelated, normal-priority process
causes an xrun in the SCHED_FIFO jackd process.
Lee
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-13 0:27 ` Lee Revell
@ 2004-08-13 0:55 ` Florian Schmidt
2004-08-13 0:55 ` Lee Revell
2004-08-13 10:07 ` Ingo Molnar
0 siblings, 2 replies; 30+ messages in thread
From: Florian Schmidt @ 2004-08-13 0:55 UTC (permalink / raw)
To: Lee Revell
Cc: Ingo Molnar, linux-kernel, Felipe Alfaro Solana, jackit-devel,
Paul Davis
On Thu, 12 Aug 2004 20:27:57 -0400
Lee Revell <rlrevell@joe-job.com> wrote:
> On Thu, 2004-08-12 at 20:04, Lee Revell wrote:
>
> > So, it seems that if a SCHED_FIFO process opens a PCM device using
> > mmap, then mlockall's the memory, then another process mlockall's
> > memory, the result is an xrun 100% of the time.
> >
>
> I have found that around 1400 KB is a magic number on my system, this
> triggers the preempt violation/xrun about 50% of the time. 1300 never
> triggers it, 1500 always triggers it.
>
> Also the amount of memory being mlockall'ed does not affect the length
> of the preemption violation - if we hit it at all, there's a 10ms
> latency, whether we lock 1400KB or 100MB.
>
> Hopefully O6 will give enough info to track this down.
Hi,
i think that the mlockall and client/jackd startup xruns often do not
seem to correspond to a critical timing report.. Try the following: turn
off xrun_debug but leave the preempt-timing stuff on. On my system, the
mlockall_test provokes an xrun in jackd's output but i do not get a
preempt-timing report (thresh = 500).
OTOH when the xrun_debug is on, the xrun_debug report actually seems to
trigger the preempt-timing report.
I think many of the jackd xruns are really jacks business. But maybe i
misinterpret the symptom.
BTW: on my system with 2*64 frames the magic mlockall number seems to be
around 15megs.. mlockall'ing only 10megs is very unlikely to trigger an
xrun in jackd And even mlockall'ing 100 megs, while always producing an
xrun in jackd, doesn't show a corresponding preempt timing report either
[with preempt_thresh = 500] as long as the xrun_debug is off.
Flo
--
Palimm Palimm!
http://affenbande.org/~tapas/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-13 1:18 ` Florian Schmidt
@ 2004-08-13 1:12 ` Lee Revell
0 siblings, 0 replies; 30+ messages in thread
From: Lee Revell @ 2004-08-13 1:12 UTC (permalink / raw)
To: Florian Schmidt
Cc: Ingo Molnar, linux-kernel, Felipe Alfaro Solana, jackit-devel,
Paul Davis
On Thu, 2004-08-12 at 21:18, Florian Schmidt wrote:
> On Thu, 12 Aug 2004 20:55:30 -0400
> Lee Revell <rlrevell@joe-job.com> wrote:
>
> > Even if it is not a long non-preemptible section, mlockall-test is
> > still doing *something* that causes an xrun in jackd. Maybe
> > interrupts are being disabled for a long time. Whatever it is, it
> > cannot possibly be a jackd bug, because mlockall by an unrelated,
> > normal-priority process causes an xrun in the SCHED_FIFO jackd
> > process.
>
> Hmm, yes, that makes sense. I wonder: does mlockall have any influence
> on IPC?
>
Good question. I did repeat your test (disabling xrun_debug) and I can
confirm that I no longer get a preempt timing violation, but
mlockall-test still triggers xruns in jackd every time. So there is
something else going on here.
Lee
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-13 0:55 ` Lee Revell
@ 2004-08-13 1:18 ` Florian Schmidt
2004-08-13 1:12 ` Lee Revell
0 siblings, 1 reply; 30+ messages in thread
From: Florian Schmidt @ 2004-08-13 1:18 UTC (permalink / raw)
To: Lee Revell
Cc: Ingo Molnar, linux-kernel, Felipe Alfaro Solana, jackit-devel,
Paul Davis
On Thu, 12 Aug 2004 20:55:30 -0400
Lee Revell <rlrevell@joe-job.com> wrote:
> Even if it is not a long non-preemptible section, mlockall-test is
> still doing *something* that causes an xrun in jackd. Maybe
> interrupts are being disabled for a long time. Whatever it is, it
> cannot possibly be a jackd bug, because mlockall by an unrelated,
> normal-priority process causes an xrun in the SCHED_FIFO jackd
> process.
Hmm, yes, that makes sense. I wonder: does mlockall have any influence
on IPC?
Flo
--
Palimm Palimm!
http://affenbande.org/~tapas/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] voluntary-preempt-2.6.8-rc3-O5
2004-08-13 0:55 ` Florian Schmidt
2004-08-13 0:55 ` Lee Revell
@ 2004-08-13 10:07 ` Ingo Molnar
1 sibling, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2004-08-13 10:07 UTC (permalink / raw)
To: Florian Schmidt
Cc: Lee Revell, linux-kernel, Felipe Alfaro Solana, jackit-devel,
Paul Davis
* Florian Schmidt <mista.tapas@gmx.net> wrote:
> i think that the mlockall and client/jackd startup xruns often do not
> seem to correspond to a critical timing report.. Try the following:
> turn off xrun_debug but leave the preempt-timing stuff on. On my
> system, the mlockall_test provokes an xrun in jackd's output but i do
> not get a preempt-timing report (thresh = 500).
>
> OTOH when the xrun_debug is on, the xrun_debug report actually seems
> to trigger the preempt-timing report.
this later phenomenon is expected and unrelated: printk-ing to the
console (as the ALSA xrun kernel message does) is quite expensive,
especially with a full stack dump included. So this fact alone doesnt
tell us much about why the xrun itself happened.
if there is no preempt-timing report when the ALSA xrun debugging is
disabled it strongly suggests that whatever causes the xrun, it's not
due to the length of a non-preemptible critical section, but some other
phenomenon (either in userspace or in kernelspace) that causes an xrun.
irqs being left disabled by accident is one such possibility - the
preempt-timing patch does not (yet) track irqs-off sections.
> I think many of the jackd xruns are really jacks business. But maybe i
> misinterpret the symptom.
either jack's or the kernel's - but it's less likely to be the classical
non-preemptible sections we were focused on so far. (it could still be a
bug in the preempt-timing patch producing a false negative - but the
likelyhood is low i'd say.)
Ingo
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2004-08-13 10:07 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-08-10 15:46 [patch] voluntary-preempt-2.6.8-rc3-O5 Shane Shrybman
-- strict thread matches above, loose matches on Subject: below --
2004-07-25 22:49 preempt-timing-2.6.8-rc1 Lee Revell
2004-07-26 8:23 ` preempt-timing-2.6.8-rc1 Ingo Molnar
2004-07-26 8:29 ` preempt-timing-2.6.8-rc1 Lee Revell
2004-07-26 8:35 ` [patch] voluntary-preempt-2.6.8-rc2-J3 Ingo Molnar
2004-07-26 9:00 ` Lee Revell
2004-07-26 12:40 ` Ingo Molnar
2004-07-26 20:47 ` [patch] voluntary-preempt-2.6.8-rc2-J7 Ingo Molnar
2004-07-29 22:26 ` [patch] voluntary-preempt-2.6.8-rc2-M5 Ingo Molnar
2004-08-01 19:30 ` [patch] voluntary-preempt-2.6.8-rc2-O2 Ingo Molnar
2004-08-09 10:46 ` [patch] voluntary-preempt-2.6.8-rc3-O4 Ingo Molnar
2004-08-10 13:26 ` [patch] voluntary-preempt-2.6.8-rc3-O5 Ingo Molnar
2004-08-10 18:25 ` Peter Zijlstra
2004-08-10 21:56 ` Lee Revell
2004-08-10 23:18 ` Florian Schmidt
2004-08-11 7:31 ` Ingo Molnar
2004-08-11 7:42 ` Ingo Molnar
2004-08-11 7:52 ` Lee Revell
2004-08-11 8:25 ` Ingo Molnar
2004-08-11 8:27 ` Ingo Molnar
2004-08-11 11:48 ` Linh Dang
2004-08-12 0:04 ` Lee Revell
2004-08-12 2:55 ` Linh Dang
2004-08-11 9:06 ` Ingo Molnar
2004-08-11 12:16 ` Florian Schmidt
2004-08-11 12:43 ` Ingo Molnar
2004-08-11 13:44 ` Florian Schmidt
2004-08-11 19:18 ` Florian Schmidt
2004-08-11 23:55 ` Lee Revell
2004-08-12 7:21 ` Ingo Molnar
2004-08-12 21:54 ` Lee Revell
2004-08-13 0:04 ` Lee Revell
2004-08-13 0:27 ` Lee Revell
2004-08-13 0:55 ` Florian Schmidt
2004-08-13 0:55 ` Lee Revell
2004-08-13 1:18 ` Florian Schmidt
2004-08-13 1:12 ` Lee Revell
2004-08-13 10:07 ` Ingo Molnar
2004-08-12 22:16 ` Lee Revell
2004-08-10 21:59 ` Lee Revell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox