* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
@ 2004-09-01 22:56 Mark_H_Johnson
2004-09-02 5:34 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Mark_H_Johnson @ 2004-09-01 22:56 UTC (permalink / raw)
To: Thomas Charbonnel; +Cc: K.R. Foley, linux-kernel, Ingo Molnar, Lee Revell
>With Q7 I still get rx latency issues (> 130 us non-preemptible section
>from rtl8139_poll). Moreover network connections were extremely slow
>(almost hung) until I set /proc/sys/net/core/netdev_backlog_granularity
>to 2.
The default of 1 caused a couple services to fail start up - most
annoying failure was NIS. I changed netdev_backlog_granularity to
eight (8) in /etc/sysctl.conf and came up fine. The system is under
test right now, though it will probably be tomorrow before I get full
results.
It appears to have fewer > 500 usec traces than previous
tests so the -Q7 stuff appears to work (though it has not yet reached
the disk tests, where I generally have more problems).
One place where we may need to consider more mcount() calls is in
the scheduler. I got another 500+ msec trace going from dequeue_task
to __switch_to.
I also looked briefly at find_first_bit since it appears in a number
of traces. Just curious, but the coding for the i386 version is MUCH
different in style than several other architectures (e.g., PPC64, SPARC).
Is there some reason why it is recursive on the x86 and a loop in the
others?
--Mark H Johnson
<mailto:Mark_H_Johnson@raytheon.com>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
2004-09-01 22:56 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7 Mark_H_Johnson
@ 2004-09-02 5:34 ` Ingo Molnar
0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2004-09-02 5:34 UTC (permalink / raw)
To: Mark_H_Johnson; +Cc: Thomas Charbonnel, K.R. Foley, linux-kernel, Lee Revell
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> One place where we may need to consider more mcount() calls is in the
> scheduler. I got another 500+ msec trace going from dequeue_task to
> __switch_to.
(you mean 500+ usec, correct?)
there's no way the scheduler can have 500 usecs of overhead going from
dequeue_task() to __switch_to(): we have all interrupts disabled and
take zero locks! This is almost certainly some hardware effect (i
described some possibilities and tests a couple of mails earlier).
In any case, please enable nmi_watchdog=1 so that we can see (in -Q7)
what happens on the other CPUs during such long delays.
> I also looked briefly at find_first_bit since it appears in a number
> of traces. Just curious, but the coding for the i386 version is MUCH
> different in style than several other architectures (e.g., PPC64,
> SPARC). Is there some reason why it is recursive on the x86 and a loop
> in the others?
what do you mean by recursive? It uses the SCAS (scan string) x86
instruction.
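For comparison, the loop style used on the other architectures can be sketched as below. This is an illustrative userspace version, not the kernel's code: the name sketch_find_first_bit and the BITS_PER_WORD macro are made up for the example. It skips all-zero words first (the scanning step) and then locates the lowest set bit inside the first non-zero word.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Illustrative sketch only (hypothetical name, not the kernel function):
 * return the index of the first set bit in a bitmap of nbits bits,
 * or nbits if no bit is set.
 */
static size_t sketch_find_first_bit(const unsigned long *map, size_t nbits)
{
    size_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;

    for (size_t w = 0; w < nwords; w++) {
        unsigned long word = map[w];

        if (word == 0)
            continue;           /* scanning step: skip all-zero words */

        /* first non-zero word found: locate its lowest set bit */
        size_t bit = w * BITS_PER_WORD;
        while ((word & 1UL) == 0) {
            word >>= 1;
            bit++;
        }
        /* bits beyond nbits in the last word do not count */
        return bit < nbits ? bit : nbits;
    }
    return nbits;
}
```

There is no recursion in either style; both are a linear scan over the words of the bitmap.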
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <OF3E3C1690.FD6E285E-ON86256F03.004CDD15-86256F03.004CDD4F@raytheon.com>]
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
[not found] <OF3E3C1690.FD6E285E-ON86256F03.004CDD15-86256F03.004CDD4F@raytheon.com>
@ 2004-09-02 14:43 ` Ingo Molnar
0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2004-09-02 14:43 UTC (permalink / raw)
To: Mark_H_Johnson; +Cc: K.R. Foley, Lee Revell, Thomas Charbonnel, linux-kernel
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> The test just completed. Over 100 traces (>500 usec) in 25 minutes
> of test runs.
>
> To recap - this kernel has:
>
> Downloaded linux-2.6.8.1 from
> http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.8.1.tar.bz2
> Downloaded patches from
> http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
> http://people.redhat.com/mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q7
> ... email saved into mark-offset-tsc-mcount.patch ...
> ... email saved into ens001.patch ...
thanks for the data. There are dozens of traces that show a big latency
for no algorithmic reason, in completely unlocked codepaths. In these
places the CPU seems to have an inexplicable inability to run simple
sequential code that has no looping potential at all.
the NMI samples show that just about any kernel code can be delayed by
this phenomenon - the kernel functions that have critical sections show
up by their likely frequency of use. There doesnt seem to be anything
common to the functions that show these delays, other than that they
have a critical section and that they are running in your workload.
so the remaining theories are:
- DMA starvation. I've never seen anything on this scale but it's
pretty much the only thing interacting with a CPU's ability to
execute code - besides the other CPU running in the system.
i'd not be surprised if some audio cards tried tricks to do as
aggressive DMA as physically possible, even violating hw
specifications - for the purpose of producing skip-free audio output.
Do you have another soundcard for testing by any chance? Another
option would be to try latencytest driven not by the soundcard IRQ
but by /dev/rtc.
- some sort of SMM handler that is triggered on I/O ops or something.
But a number of functions in the traces dont do any I/O ops (port
instructions like IN or OUT) so it's hard to imagine this to be the
case. An externally triggered SMM is possible too, perhaps some
independent timer triggers a watchdog SMM?
it is nearly impossible for these traces to be caused by the kernel. It
really has to be some hardware effect, based on the data we have so far.
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
@ 2004-09-02 13:33 Mark_H_Johnson
0 siblings, 0 replies; 15+ messages in thread
From: Mark_H_Johnson @ 2004-09-02 13:33 UTC (permalink / raw)
To: Ingo Molnar; +Cc: K.R. Foley, linux-kernel, Lee Revell, Thomas Charbonnel
>> I also looked briefly at find_first_bit since it appears in a number
>> of traces. Just curious, but the coding for the i386 version is MUCH
>> different in style than several other architectures (e.g., PPC64,
>> SPARC). Is there some reason why it is recursive on the x86 and a loop
>> in the others?
>
>what do you mean by recursive? It uses the SCAS (scan string) x86
>instruction.
Never mind. In bitops.c I misread "find_first_bit" (the call near the end)
as "find_next_bit" and thought there was recursion here.
--Mark H Johnson
<mailto:Mark_H_Johnson@raytheon.com>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
@ 2004-09-02 13:18 Mark_H_Johnson
2004-09-02 13:37 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Mark_H_Johnson @ 2004-09-02 13:18 UTC (permalink / raw)
To: Ingo Molnar; +Cc: K.R. Foley, linux-kernel, Lee Revell, Thomas Charbonnel
>(you mean 500+ usec, correct?)
>
>there's no way the scheduler can have 500 usecs of overhead going from
>dequeue_task() to __switch_to(): we have all interrupts disabled and
>take zero locks! This is almost certainly some hardware effect (i
>described some possibilities and tests a couple of mails earlier).
>
>In any case, please enable nmi_watchdog=1 so that we can see (in -Q7)
>what happens on the other CPUs during such long delays.
Booted with nmi_watchdog=1, saw the kernel message indicating that
NMI was checked OK.
The first trace looks something like this...
latency 518 us, entries: 79
...
started at schedule+0x51/0x740
ended at schedule+0x337/0x740
00000001 0.000ms (+0.000ms): schedule (io_schedule)
00000001 0.000ms (+0.000ms): sched_clock (schedule)
00010001 0.478ms (+0.478ms): do_nmi (sched_clock)
00010001 0.478ms (+0.000ms): do_nmi (<08049b21>)
00010001 0.482ms (+0.003ms): profile_tick (nmi_watchdog_tick)
...
and a few entries later ends up at do_IRQ (sched_clock).
The second trace goes from dequeue_task to __switch_to with a
similar pattern - the line with do_nmi has +0.282ms duration and
the line notifier_call_chain (profile_hook) has +0.135ms duration.
I don't see how this provides any additional information but will
provide several additional traces when the test gets done in a
few minutes.
--Mark H Johnson
<mailto:Mark_H_Johnson@raytheon.com>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
2004-09-02 13:18 Mark_H_Johnson
@ 2004-09-02 13:37 ` Ingo Molnar
2004-09-02 18:01 ` Lee Revell
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2004-09-02 13:37 UTC (permalink / raw)
To: Mark_H_Johnson; +Cc: K.R. Foley, linux-kernel, Lee Revell, Thomas Charbonnel
* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:
> >In any case, please enable nmi_watchdog=1 so that we can see (in -Q7)
> >what happens on the other CPUs during such long delays.
>
> Booted with nmi_watchdog=1, saw the kernel message indicating that
> NMI was checked OK.
>
> The first trace looks something like this...
>
> latency 518 us, entries: 79
> ...
> started at schedule+0x51/0x740
> ended at schedule+0x337/0x740
>
> 00000001 0.000ms (+0.000ms): schedule (io_schedule)
> 00000001 0.000ms (+0.000ms): sched_clock (schedule)
> 00010001 0.478ms (+0.478ms): do_nmi (sched_clock)
> 00010001 0.478ms (+0.000ms): do_nmi (<08049b21>)
> 00010001 0.482ms (+0.003ms): profile_tick (nmi_watchdog_tick)
> ...
> and a few entries later ends up at do_IRQ (sched_clock).
>
> The second trace goes from dequeue_task to __switch_to with a
> similar pattern - the line with do_nmi has +0.282ms duration and
> the line notifier_call_chain (profile_hook) has +0.135ms duration.
>
> I don't see how this provides any additional information but will
> provide several additional traces when the test gets done in a few
> minutes.
thanks. The NMI gives us two kinds of information, both useful:
- if the ratio of do_nmi()'s within such a section roughly matches the
number of NMIs we'd expect during the sum of these sections then it
means that the delay is most likely wall-clock time and not some
measurement artifact (RDTSC artifact or tracing bug). The NMI's are
triggered (indirectly) by the PIT and the PIT is an independent clock
that has a frequency that is independent of the rest of the system
(independent of the CPU clock, DMA activities, IRQ load, etc.)
since most of the codepaths in question (the scheduler's
dequeue_task(), etc.) run with interrupts disabled the normal timer
interrupts (smp_apic_timer_interrupt() and do_IRQ(00000000)) cannot
'sample' this codepath. Only the NMI can interrupt these codepaths.
- the NMIs also sample what happens on the other CPU. In your above
trace this gives:
> 00010001 0.478ms (+0.478ms): do_nmi (sched_clock)
> 00010001 0.478ms (+0.000ms): do_nmi (<08049b21>)
the other CPU was executing userspace code during the last NMI tick -
i.e. nothing that could be suspect. 'suspect' code would be some sort
of kernel code that could in theory interact with this CPU's scheduler
code.
this too is statistical sampling so we'll need as many of these
traces as possible.
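The first check above comes down to simple arithmetic, sketched here. The function names are hypothetical and the NMI rate is an input the reader must supply; no NMI frequency is stated in this thread, so any value plugged in is an assumption.

```c
/*
 * Illustrative sketch only: how many NMI samples would we expect to land
 * inside the traced sections if the measured time is real wall-clock time?
 *
 * nmi_hz      - assumed NMI rate in samples per second (an assumption)
 * total_usecs - sum of the lengths of the traced sections, in microseconds
 */
static double sketch_expected_nmi_hits(double nmi_hz, double total_usecs)
{
    return nmi_hz * total_usecs / 1e6;
}

/*
 * Ratio of observed to expected NMI samples. A value near 1 is consistent
 * with real wall-clock delays; a value far from 1 would point towards a
 * measurement artifact. Returns 0 when nothing is expected.
 */
static double sketch_nmi_hit_ratio(unsigned int observed, double nmi_hz,
                                   double total_usecs)
{
    double expected = sketch_expected_nmi_hits(nmi_hz, total_usecs);

    return expected > 0.0 ? observed / expected : 0.0;
}
```

For example, 100 traces of roughly 500 usecs each sum to about 50,000 usecs; with an assumed rate of 1000 NMIs per second that gives about 50 expected do_nmi() entries across those traces.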
some wacky guess based on the above single sampling point: it seems the
delays are real wall-clock delays, and the only thing matching the
theory so far is that DMA traffic on the memory bus somehow stalls this
CPU's memory traffic for up to 500 usecs. How could userspace running on
CPU#0 impact the kernel's scheduler code on CPU#1?
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
2004-09-02 13:37 ` Ingo Molnar
@ 2004-09-02 18:01 ` Lee Revell
0 siblings, 0 replies; 15+ messages in thread
From: Lee Revell @ 2004-09-02 18:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Thomas Charbonnel
On Thu, 2004-09-02 at 09:37, Ingo Molnar wrote:
> - the NMIs also sample what happens on the other CPU. In your above
> trace this gives:
>
> > 00010001 0.478ms (+0.478ms): do_nmi (sched_clock)
> > 00010001 0.478ms (+0.000ms): do_nmi (<08049b21>)
>
> the other CPU was executing userspace code during the last NMI tick -
> i.e. nothing that could be suspect. 'suspect' code would be some sort
> of kernel code that could in theory interact with this CPU's scheduler
> code.
>
> this too is statistical sampling so we'll need as many of these
> traces as possible.
>
> some wacky guess based on the above single sampling point: it seems the
> delays are real wall-clock delays, and the only thing matching the
> theory so far is that DMA traffic on the memory bus somehow stalls this
> CPU's memory traffic for up to 500 usecs. How could userspace running on
> CPU#0 impact the kernel's scheduler code on CPU#1?
>
This is not wacky at all. For example the 2D acceleration driver for my
Via Unichrome chipset will completely stall the PCI bus and processor if
the 2D engine is overloaded and the command FIFO is written to without
checking whether it is full. This can be triggered easily, all you have
to do is enable 'display window contents while dragging' and drag a busy
window around slowly.
Many vendor-supplied drivers (including the open source via_accel.c)
don't bother to check whether this FIFO is full before writing to it,
because it increases benchmark scores slightly, at the expense of ruining
audio performance. You can thank Matrox for this "innovation", though
to be fair, every other vendor seems to have followed suit until they
were busted. Only setting 'Options "NoAccel"' in my X config fixes it.
This is not even a kernel driver, it's part of XFree86!
This paper describes the problem in more detail:
http://research.microsoft.com/~mbj/papers/tr-98-29.html
Since they don't provide a deep link and the interesting content is
buried, here is the excerpt:
Misbehaving video card drivers are another source of significant delays
in scheduling user code. A number of video cards manufacturers recently
began employing a hack to save a PCI bus transaction for each display
operation in order to gain a few percentage points on their WinBench
[Ziff-Davis 98] Graphics WinMark performance.
The video cards have a command FIFO that is written to via the PCI bus.
They also have a status register, read via the PCI bus, which says
whether the command FIFO is full or not. The hack is to not check
whether the command FIFO is full before attempting to write to it, thus
saving a PCI bus read.
The problem with this is that the result of attempting to write to the
FIFO when it is full is to stall the CPU waiting on the PCI bus write
until a command has been completed and space becomes available to accept
the new command. In fact, this not only causes the CPU to stall waiting
on the PCI bus, but since the PCI controller chip also controls the ISA
bus and mediates interrupts, ISA traffic and interrupt requests are
stalled as well. Even the clock interrupts stop.
These video cards will stall the machine, for instance, when the user
drags a window. For windows occupying most of a 1024x768 screen on a
333MHz Pentium II with an AccelStar II AGP video board (which is based
on the 3D Labs Permedia 2 chip set) this will stall the machine for
25-30ms at a time!
This may marginally improve the graphics performance under some
circumstances, but it wreaks havoc on any other devices expecting timely
response from the machine. For instance, this causes severe problems
with USB and IEEE 1394 video and audio streams, as well as standard
sound cards.
Some manufacturers, such as 3D Labs, do provide a registry key that can
be set to disable this anti-social behavior. For instance, [Hanssen 98]
describes this behavior and lists the registry keys to fix several
common graphics cards, including some by Matrox, Tseng Labs, Hercules,
and S3. However as of this writing, there were still drivers, including
some from Number 9 and ATI, for which this behavior could not be
disabled.
This hack, and the problems it causes, has recently started to receive
attention in the trade press [PC Magazine 98]. We hope that pressures
can soon be brought to bear on the vendors to cease this antisocial
behavior. At the very least, should they persist in writing drivers that
can stall the machine, this behavior should no longer be the default.
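The status check that the excerpt says these drivers skip can be sketched as follows. The register layout, the bit meaning and all sketch_ names are hypothetical and chosen only for illustration; real cards differ, and this is not taken from via_accel.c or any vendor driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register block; real hardware layouts differ. */
struct sketch_gfx_regs {
    volatile uint32_t status;   /* assumed: bit 0 set means command FIFO full */
    volatile uint32_t fifo;     /* assumed: write port for commands */
};

#define SKETCH_FIFO_FULL 0x1u

/*
 * Polite write: spend one extra bus read on the status register before
 * touching the FIFO. When the FIFO is full, return false without writing,
 * so the caller can back off instead of stalling on the bus write.
 */
static bool sketch_fifo_try_write(struct sketch_gfx_regs *regs, uint32_t cmd)
{
    if (regs->status & SKETCH_FIFO_FULL)
        return false;

    regs->fifo = cmd;
    return true;
}
```

The hack described above amounts to dropping the status read and doing the FIFO write unconditionally, which saves one bus transaction per command at the cost of a possible stall.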
Lee
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q2
@ 2004-08-28 20:10 Daniel Schmitt
2004-08-28 20:31 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q3 Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Daniel Schmitt @ 2004-08-28 20:10 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, K.R. Foley, Felipe Alfaro Solana, linux-kernel,
Mark_H_Johnson
On Saturday 28 August 2004 21:44, Ingo Molnar wrote:
>
> there's a Kconfig chunk missing from the Q0/Q1 patches, i've uploaded Q2
> that fixes this:
>
This breaks here unless CONFIG_SMP is defined, with the following error:
CC arch/i386/kernel/asm-offsets.s
In file included from arch/i386/kernel/asm-offsets.c:7:
include/linux/sched.h: In function `lock_need_resched':
include/linux/sched.h:983: error: structure has no member named `break_lock'
Probably missing a check for CONFIG_SMP around the need_lockbreak defines in
sched.h, and maybe also in cond_resched_lock().
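The suggested guard might look roughly like this. It is a sketch only: the struct layout and the sketch_ names are made up for illustration, and only break_lock, need_lockbreak and CONFIG_SMP come from the error message and the report above. The actual fix in the patch may differ.

```c
/* Hypothetical stand-in for the real lock type, for illustration only. */
typedef struct {
    volatile unsigned int lock;
#ifdef CONFIG_SMP
    unsigned int break_lock;    /* assumed to exist only on SMP builds */
#endif
} sketch_spinlock_t;

/*
 * On uniprocessor builds there is no break_lock member, so the check
 * must compile down to a constant instead of touching the field.
 */
#ifdef CONFIG_SMP
# define sketch_need_lockbreak(l) ((l)->break_lock)
#else
# define sketch_need_lockbreak(l) 0
#endif

/* Returns non-zero when the lock holder should drop the lock and resched. */
static int sketch_lock_need_resched(sketch_spinlock_t *l, int need_resched)
{
    (void)l;    /* unused when CONFIG_SMP is not defined */
    return sketch_need_lockbreak(l) || need_resched;
}
```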
Daniel.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [patch] voluntary-preempt-2.6.9-rc1-bk4-Q3
2004-08-28 20:10 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q2 Daniel Schmitt
@ 2004-08-28 20:31 ` Ingo Molnar
2004-08-28 21:10 ` Lee Revell
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2004-08-28 20:31 UTC (permalink / raw)
To: Daniel Schmitt
Cc: Lee Revell, K.R. Foley, Felipe Alfaro Solana, linux-kernel,
Mark_H_Johnson
* Daniel Schmitt <pnambic@unu.nu> wrote:
> > there's a Kconfig chunk missing from the Q0/Q1 patches, i've uploaded Q2
> > that fixes this:
> >
> This breaks here unless CONFIG_SMP is defined, with the following error:
>
> CC arch/i386/kernel/asm-offsets.s
> In file included from arch/i386/kernel/asm-offsets.c:7:
> include/linux/sched.h: In function `lock_need_resched':
> include/linux/sched.h:983: error: structure has no member named `break_lock'
>
> Probably missing a check for CONFIG_SMP around the need_lockbreak
> defines in sched.h, and maybe also in cond_resched_lock().
doh - right indeed. -Q3 has this fixed, it is at:
http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q3
ontop of the usual:
http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q3
2004-08-28 20:31 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q3 Ingo Molnar
@ 2004-08-28 21:10 ` Lee Revell
2004-08-28 21:13 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Lee Revell @ 2004-08-28 21:10 UTC (permalink / raw)
To: Ingo Molnar
Cc: Daniel Schmitt, K.R. Foley, Felipe Alfaro Solana, linux-kernel,
Mark_H_Johnson
On Sat, 2004-08-28 at 16:31, Ingo Molnar wrote:
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q3
>
I get this error:
WARNING: /lib/modules/2.6.9-rc1-Q3/kernel/fs/ntfs/ntfs.ko needs unknown symbol unlock_kernel
WARNING: /lib/modules/2.6.9-rc1-Q3/kernel/fs/ntfs/ntfs.ko needs unknown symbol lock_kernel
I believe this is the correct fix:
--- fs/ntfs/super.c~ 2004-08-28 16:31:33.000000000 -0400
+++ fs/ntfs/super.c 2004-08-28 17:08:11.000000000 -0400
@@ -29,6 +29,7 @@
#include <linux/buffer_head.h>
#include <linux/vfs.h>
#include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
#include "ntfs.h"
#include "sysctl.h"
Lee
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q3
2004-08-28 21:10 ` Lee Revell
@ 2004-08-28 21:13 ` Ingo Molnar
2004-08-28 21:16 ` Lee Revell
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2004-08-28 21:13 UTC (permalink / raw)
To: Lee Revell
Cc: Daniel Schmitt, K.R. Foley, Felipe Alfaro Solana, linux-kernel,
Mark_H_Johnson
* Lee Revell <rlrevell@joe-job.com> wrote:
> On Sat, 2004-08-28 at 16:31, Ingo Molnar wrote:
> > http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q3
> >
>
> I get this error:
>
> WARNING: /lib/modules/2.6.9-rc1-Q3/kernel/fs/ntfs/ntfs.ko needs unknown symbol unlock_kernel
> WARNING: /lib/modules/2.6.9-rc1-Q3/kernel/fs/ntfs/ntfs.ko needs unknown symbol lock_kernel
>
> I believe this is the correct fix:
>
> --- fs/ntfs/super.c~ 2004-08-28 16:31:33.000000000 -0400
> +++ fs/ntfs/super.c 2004-08-28 17:08:11.000000000 -0400
> @@ -29,6 +29,7 @@
> #include <linux/buffer_head.h>
> #include <linux/vfs.h>
> #include <linux/moduleparam.h>
> +#include <linux/smp_lock.h>
>
> #include "ntfs.h"
> #include "sysctl.h"
ok, will add this to -Q4.
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q3
2004-08-28 21:13 ` Ingo Molnar
@ 2004-08-28 21:16 ` Lee Revell
2004-08-28 23:51 ` Lee Revell
0 siblings, 1 reply; 15+ messages in thread
From: Lee Revell @ 2004-08-28 21:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: Daniel Schmitt, K.R. Foley, Felipe Alfaro Solana, linux-kernel,
Mark_H_Johnson
On Sat, 2004-08-28 at 17:13, Ingo Molnar wrote:
> ok, will add this to -Q4.
>
Hrm, Q3 broke my PS/2 keyboard.
Lee
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q3
2004-08-28 21:16 ` Lee Revell
@ 2004-08-28 23:51 ` Lee Revell
2004-08-29 2:35 ` Lee Revell
0 siblings, 1 reply; 15+ messages in thread
From: Lee Revell @ 2004-08-28 23:51 UTC (permalink / raw)
To: Ingo Molnar
Cc: Daniel Schmitt, K.R. Foley, Felipe Alfaro Solana, linux-kernel,
Mark_H_Johnson
On Sat, 2004-08-28 at 17:16, Lee Revell wrote:
> On Sat, 2004-08-28 at 17:13, Ingo Molnar wrote:
> > ok, will add this to -Q4.
> >
>
> Hrm, Q3 broke my PS/2 keyboard.
>
The problem goes away when I disable CONFIG_PREEMPT_HARDIRQS. In both
cases CONFIG_PREEMPT_SOFTIRQS was enabled.
Lee
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q3
2004-08-28 23:51 ` Lee Revell
@ 2004-08-29 2:35 ` Lee Revell
2004-08-29 5:43 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q4 Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Lee Revell @ 2004-08-29 2:35 UTC (permalink / raw)
To: Ingo Molnar
Cc: Daniel Schmitt, K.R. Foley, Felipe Alfaro Solana, linux-kernel,
Mark_H_Johnson
On Sat, 2004-08-28 at 19:51, Lee Revell wrote:
> On Sat, 2004-08-28 at 17:16, Lee Revell wrote:
> > On Sat, 2004-08-28 at 17:13, Ingo Molnar wrote:
> > > ok, will add this to -Q4.
> > >
> >
> > Hrm, Q3 broke my PS/2 keyboard.
> >
Some more info:
This bug is 100% reproducible. During boot, as soon as the i8042 driver
is loaded:
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
input: AT Translated Set 2 keyboard on isa0060/serio0
the keyboard freezes, with 'Num Lock' stuck on.
The problem only occurs when CONFIG_PREEMPT_HARDIRQS=y. Works fine
otherwise.
/proc/interrupts:
CPU0
0: 509819 XT-PIC timer
1: 1649 XT-PIC i8042
2: 0 XT-PIC cascade
8: 4 XT-PIC rtc
10: 0 XT-PIC uhci_hcd, EMU10K1
11: 24394 XT-PIC uhci_hcd, eth0
12: 0 XT-PIC uhci_hcd
14: 1 XT-PIC ide0
15: 12864 XT-PIC ide1
NMI: 0
ERR: 0
Lee
^ permalink raw reply [flat|nested] 15+ messages in thread
* [patch] voluntary-preempt-2.6.9-rc1-bk4-Q4
2004-08-29 2:35 ` Lee Revell
@ 2004-08-29 5:43 ` Ingo Molnar
2004-08-30 9:06 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2004-08-29 5:43 UTC (permalink / raw)
To: Lee Revell
Cc: Daniel Schmitt, K.R. Foley, Felipe Alfaro Solana, linux-kernel,
Mark_H_Johnson
* Lee Revell <rlrevell@joe-job.com> wrote:
> Some more info:
>
> This bug is 100% reproducible. During boot, as soon as the i8042 driver
> is loaded:
>
> serio: i8042 AUX port at 0x60,0x64 irq 12
> serio: i8042 KBD port at 0x60,0x64 irq 1
> input: AT Translated Set 2 keyboard on isa0060/serio0
>
> the keyboard freezes, with 'Num Lock' stuck on.
>
> The problem only occurs when CONFIG_PREEMPT_HARDIRQS=y. Works fine
> otherwise.
i suspect it's the generic_synchronize_irq() change. Does -Q4 boot?:
http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q4
-Q4 reverts this change. (this doesnt solve the problems Scott noticed
though.)
another solution would be to boot Q3 with preempt_hardirqs=0 and then
turn on threading for all IRQs but the keyboard.
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5
2004-08-29 5:43 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q4 Ingo Molnar
@ 2004-08-30 9:06 ` Ingo Molnar
2004-09-01 8:29 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2004-08-30 9:06 UTC (permalink / raw)
To: Lee Revell
Cc: Daniel Schmitt, K.R. Foley, Felipe Alfaro Solana, linux-kernel,
Mark_H_Johnson
i've uploaded -Q5 to:
http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q5
ontop of:
http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
-Q5 should fix the PS2 problems and the early boot problems, and it
might even fix the USB, ACPI and APIC problems some people were
reporting.
There were a number of bugs that led to the PS2 problems:
- a change to __cond_resched() in the -Q series caused the starvation
of the IRQ1 and IRQ12 threads during init - causing a silent timeout
and misdetection in the ps2 driver(s).
- even with the starvation bug fixed, we must set system_state to
SCHEDULER_OK only once the init thread has started - otherwise the
idle thread might hang during bootup.
- the redirected IRQ handling now matches that of non-redirected IRQs
better, the outer loop in generic_handle_IRQ has been flattened.
i also re-added the synchronize_irq() fix, it was not causing the PS2
problems.
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6
2004-08-30 9:06 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Ingo Molnar
@ 2004-09-01 8:29 ` Ingo Molnar
2004-09-01 13:51 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7 Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2004-09-01 8:29 UTC (permalink / raw)
To: linux-kernel; +Cc: K.R. Foley, Mark_H_Johnson, Lee Revell
i've released the -Q6 patch:
http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q6
ontop of:
http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
this patch includes two changes that should shorten the networking
latencies reported. There's a new 'RX granularity' sysctl now:
/proc/sys/net/core/netdev_backlog_granularity
It defaults to the most finegrained value, 1.
netdev_max_backlog has been moved back to the upstream value of 300.
Also, the backlog processing is now sensitive to preemption requests and
will break out early in that case.
(This should not result in TCP connection quality issues (all processing
is restarted after such a breakout), but nevertheless i'd suggest
that everyone keep an eye on lost packets and seemingly hung TCP
connections.)
other changes since -Q5:
- mtrr simplifications and IRQ-disabling. (reported & tested by Lee
Revell) Still under discussion though.
- fix /dev/random driver latency (reported & tested by Lee Revell)
- move vgacon_do_font_op out of the BKL (reported by P.O. Gaillard)
- increase percpu space for tracing (by Mark H Johnson)
- added user-triggerable generic kernel tracing enabled via
tracing_enabled=2 and turned on via gettimeofday(0,1) and turned off
via gettimeofday(0,0).
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
2004-09-01 8:29 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 Ingo Molnar
@ 2004-09-01 13:51 ` Ingo Molnar
2004-09-01 17:09 ` Thomas Charbonnel
[not found] ` <41367E5D.3040605@cybsft.com>
0 siblings, 2 replies; 15+ messages in thread
From: Ingo Molnar @ 2004-09-01 13:51 UTC (permalink / raw)
To: linux-kernel; +Cc: K.R. Foley, Mark_H_Johnson, Lee Revell
i've released the -Q7 patch:
http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q7
ontop of:
http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2
the main change in this patch is more SMP latency fixes. The stock
kernel, even with CONFIG_PREEMPT enabled, didnt have any spin-nicely
preemption logic for the following, commonly used SMP locking
primitives: read_lock(), spin_lock_irqsave(), spin_lock_irq(),
spin_lock_bh(), read_lock_irqsave(), read_lock_irq(), read_lock_bh(),
write_lock_irqsave(), write_lock_irq(), write_lock_bh(). Only
spin_lock() and write_lock() [the two simplest cases] were covered.
In addition to the preemption latency problems, the _irq() variants in
the above list didnt do any IRQ-enabling while spinning - possibly
resulting in excessive irqs-off sections of code!
-Q7 fixes all of these latency problems: we now re-enable interrupts
while spinning in all possible cases, and a spinning op stays
preemptible if this is a beginning of a new critical section.
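The idea of not waiting with interrupts off can be sketched in userspace as follows. This is an analogy under stated assumptions, not the patch's code: the sketch_ names are hypothetical, the irq helpers only flip a flag standing in for local interrupt disabling, and sched_yield() stands in for a preemptible busy-wait.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int held;            /* 0 = free, 1 = taken */
} sketch_lock_t;

/* Stand-ins for interrupt disabling; here they only record the state. */
static int sketch_irqs_off;
static void sketch_irq_disable(void) { sketch_irqs_off = 1; }
static void sketch_irq_enable(void)  { sketch_irqs_off = 0; }

static bool sketch_trylock(sketch_lock_t *l)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&l->held, &expected, 1);
}

/*
 * "Spin nicely": interrupts are only off for the short trylock attempt.
 * On contention they are turned back on before waiting, so the wait
 * itself never becomes a long irqs-off section.
 */
static void sketch_lock_irq(sketch_lock_t *l)
{
    for (;;) {
        sketch_irq_disable();
        if (sketch_trylock(l))
            return;             /* acquired: critical section runs irqs-off */
        sketch_irq_enable();    /* contended: wait with irqs enabled */
        while (atomic_load(&l->held))
            sched_yield();      /* stand-in for a preemptible spin */
    }
}

static void sketch_unlock_irq(sketch_lock_t *l)
{
    atomic_store(&l->held, 0);
    sketch_irq_enable();
}
```

The naive alternative disables interrupts once and then spins until the lock is free, which is the excessive irqs-off case described above.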
there's also an SMP related tracing improvement in -Q7: the NMI tracing
code now traces the other CPUs too - this way if an NMI hits a
particularly long section, we'll have a chance to see what the other CPU
was doing. These show up as double do_nmi() trace entries on a 2-CPU x86
box. The first one is the current CPU, subsequent entries are the other
CPUs in the system.
(-Q7 is not that interesting to uniprocessor kernel users, but it would
still be useful to test it, just to see nothing broke (on the
compilation side), lots of spinlock code had to be changed.)
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
2004-09-01 13:51 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7 Ingo Molnar
@ 2004-09-01 17:09 ` Thomas Charbonnel
2004-09-01 19:03 ` K.R. Foley
2004-09-01 20:11 ` Peter Zijlstra
[not found] ` <41367E5D.3040605@cybsft.com>
1 sibling, 2 replies; 15+ messages in thread
From: Thomas Charbonnel @ 2004-09-01 17:09 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, K.R. Foley, Mark_H_Johnson, Lee Revell
Ingo Molnar wrote :
> i've released the -Q7 patch:
>
> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q7
With Q7 I still get rx latency issues (> 130 us non-preemptible section
from rtl8139_poll). Moreover network connections were extremely slow
(almost hung) until I set /proc/sys/net/core/netdev_backlog_granularity
to 2.
Thomas
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
2004-09-01 17:09 ` Thomas Charbonnel
@ 2004-09-01 19:03 ` K.R. Foley
2004-09-01 20:11 ` Peter Zijlstra
1 sibling, 0 replies; 15+ messages in thread
From: K.R. Foley @ 2004-09-01 19:03 UTC (permalink / raw)
To: Thomas Charbonnel; +Cc: Ingo Molnar, linux-kernel, Mark_H_Johnson, Lee Revell
Thomas Charbonnel wrote:
> Ingo Molnar wrote :
>
>>i've released the -Q7 patch:
>>
>> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q7
>
>
> With Q7 I still get rx latency issues (> 130 us non-preemptible section
> from rtl8139_poll). Moreover network connections were extremely slow
> (almost hung) until I set /proc/sys/net/core/netdev_backlog_granularity
> to 2.
>
> Thomas
>
>
>
I too am still getting these latencies, although not as often (maybe?).
I on the other hand am having no problems with slow connections.
However, this is with very little load on the system. Here is one such
trace:
http://www.cybsft.com/testresults/2.6.9-rc1-bk4-Q7/latencytrace4.txt
I do have a couple of new traces that seem to be related to transmitting
data, I think. They are here:
http://www.cybsft.com/testresults/2.6.9-rc1-bk4-Q7/latencytrace2.txt
http://www.cybsft.com/testresults/2.6.9-rc1-bk4-Q7/latencytrace3.txt
kr
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
2004-09-01 17:09 ` Thomas Charbonnel
2004-09-01 19:03 ` K.R. Foley
@ 2004-09-01 20:11 ` Peter Zijlstra
2004-09-01 20:16 ` Lee Revell
2004-09-01 20:53 ` K.R. Foley
1 sibling, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2004-09-01 20:11 UTC (permalink / raw)
To: Thomas Charbonnel
Cc: Ingo Molnar, LKML, K.R. Foley, Mark_H_Johnson, Lee Revell
On Wed, 2004-09-01 at 19:09 +0200, Thomas Charbonnel wrote:
> Ingo Molnar wrote :
> > i've released the -Q7 patch:
> >
> > http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q7
>
> With Q7 I still get rx latency issues (> 130 us non-preemptible section
> from rtl8139_poll). Moreover network connections were extremely slow
> (almost hung) until I set /proc/sys/net/core/netdev_backlog_granularity
> to 2.
>
> Thomas
>
Me too!
I too have a rtl8139 network card.
kr, what kind of nic do you have since this does not occur on your
machine?
--
Peter Zijlstra <a.p.zijlstra@chello.nl>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
2004-09-01 20:11 ` Peter Zijlstra
@ 2004-09-01 20:16 ` Lee Revell
2004-09-01 20:53 ` K.R. Foley
1 sibling, 0 replies; 15+ messages in thread
From: Lee Revell @ 2004-09-01 20:16 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Charbonnel, Ingo Molnar, LKML, K.R. Foley, Mark_H_Johnson
On Wed, 2004-09-01 at 16:11, Peter Zijlstra wrote:
> On Wed, 2004-09-01 at 19:09 +0200, Thomas Charbonnel wrote:
> > Ingo Molnar wrote :
> > > i've released the -Q7 patch:
> > >
> > > http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q7
> >
> > With Q7 I still get rx latency issues (> 130 us non-preemptible section
> > from rtl8139_poll). Moreover network connections were extremely slow
> > (almost hung) until I set /proc/sys/net/core/netdev_backlog_granularity
> > to 2.
> >
> > Thomas
> >
>
> Me too!
> I too have a rtl8139 network card.
>
> kr, what kind of nic do you have since this does not occur on your
> machine?
Hmm, I am not a network driver expert, and this is just a guess, but if
they work anything like sound cards, I would say that that hardware
will only generate an interrupt when there are 2 packets in its queue.
Lee
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
2004-09-01 20:11 ` Peter Zijlstra
2004-09-01 20:16 ` Lee Revell
@ 2004-09-01 20:53 ` K.R. Foley
1 sibling, 0 replies; 15+ messages in thread
From: K.R. Foley @ 2004-09-01 20:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Charbonnel, Ingo Molnar, LKML, Mark_H_Johnson, Lee Revell
Peter Zijlstra wrote:
> On Wed, 2004-09-01 at 19:09 +0200, Thomas Charbonnel wrote:
>
>>Ingo Molnar wrote :
>>
>>>i've released the -Q7 patch:
>>>
>>> http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q7
>>
>>With Q7 I still get rx latency issues (> 130 us non-preemptible section
>>from rtl8139_poll). Moreover network connections were extremely slow
>>(almost hung) until I set /proc/sys/net/core/netdev_backlog_granularity
>>to 2.
>>
>>Thomas
>>
>
>
> Me too!
> I too have a rtl8139 network card.
>
> kr, what kind of nic do you have since this does not occur on your
> machine?
>
Ethernet Pro 100.
^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <41367E5D.3040605@cybsft.com>]
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
[not found] ` <41367E5D.3040605@cybsft.com>
@ 2004-09-02 5:37 ` Ingo Molnar
2004-09-02 5:40 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2004-09-02 5:37 UTC (permalink / raw)
To: K.R. Foley; +Cc: linux-kernel, Mark_H_Johnson, Lee Revell
* K.R. Foley <kr@cybsft.com> wrote:
> This is an interesting one. ~3.9ms generated here by amlat in do_IRQ:
the overhead is not in do_IRQ():
> 00000001 0.000ms (+0.000ms): n_tty_receive_buf (pty_write)
> 00010001 3.992ms (+3.992ms): do_IRQ (n_tty_receive_buf)
the overhead is always relative to the previous entry - so the overhead
was in n_tty_receive_buf() [that is the function that was interrupted by
do_IRQ()]. But it's a bit weird - you should have gotten timer IRQs
every 1 msec. Does n_tty_receive_buf() run with irqs disabled perhaps?
Ingo
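
[Editorial sketch, not part of the original message.] The reading rule described above, where each "(+N ms)" delta is charged to the previous entry, can be illustrated with a small Python script. It is only a sketch: the regular expression is an assumption inferred from the two trace lines quoted in this message, and the names LINE_RE, parse_trace and attribute_to_previous are hypothetical helpers, not part of any kernel tool.

```python
import re

# Assumed layout, inferred from the two quoted lines only:
#   <state> <timestamp>ms (+<delta>ms): <function> (<parent>)
LINE_RE = re.compile(
    r"^(\S+)\s+([\d.]+)ms\s+\(\+([\d.]+)ms\):\s+(\S+)\s+\((\S+)\)$"
)

def parse_trace(text):
    """Turn trace text into a list of dicts, skipping lines that do not match."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            state, ts, delta, func, parent = m.groups()
            entries.append({
                "state": state,
                "ts": float(ts),
                "delta": float(delta),
                "func": func,
                "parent": parent,
            })
    return entries

def attribute_to_previous(entries):
    """Charge the delta printed on each entry to the entry before it."""
    return [(prev["func"], cur["delta"])
            for prev, cur in zip(entries, entries[1:])]

TRACE = """\
00000001 0.000ms (+0.000ms): n_tty_receive_buf (pty_write)
00010001 3.992ms (+3.992ms): do_IRQ (n_tty_receive_buf)
"""

for func, ms in attribute_to_previous(parse_trace(TRACE)):
    print(f"{func}: {ms:.3f} ms")
```

Run on the two quoted lines, the script charges the 3.992 ms to n_tty_receive_buf rather than to do_IRQ, which matches the explanation above.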
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
2004-09-02 5:37 ` Ingo Molnar
@ 2004-09-02 5:40 ` Ingo Molnar
0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2004-09-02 5:40 UTC (permalink / raw)
To: K.R. Foley; +Cc: linux-kernel, Mark_H_Johnson, Lee Revell
* Ingo Molnar <mingo@elte.hu> wrote:
> > 00000001 0.000ms (+0.000ms): n_tty_receive_buf (pty_write)
> > 00010001 3.992ms (+3.992ms): do_IRQ (n_tty_receive_buf)
>
> the overhead is always relative to the previous entry [...]
i've changed the /proc/latency_trace output in my tree to print the
latency of this entry relative to the next entry, not the previous
entry. This should be more intuitive than using the previous entry.
Ingo
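
[Editorial sketch, not part of the original message.] To show what "relative to the next entry" could look like, the Python snippet below re-prints the two quoted entries with each delta computed as the next timestamp minus the current one. The exact format of the changed /proc/latency_trace output is not shown in this thread, so the output layout, the zero delta on the final entry, and the helper name relabel_next_relative are all assumptions made for illustration.

```python
def relabel_next_relative(entries):
    """Re-print entries so each delta is (next timestamp - this timestamp).

    entries: list of (state, timestamp_ms, function, parent) tuples.
    The last entry has no successor, so it gets a delta of 0 (an assumption).
    """
    lines = []
    for i, (state, ts, func, parent) in enumerate(entries):
        next_ts = entries[i + 1][1] if i + 1 < len(entries) else ts
        lines.append(
            f"{state} {ts:.3f}ms (+{next_ts - ts:.3f}ms): {func} ({parent})"
        )
    return lines

# The two entries quoted in this message, with timestamps in milliseconds.
ENTRIES = [
    ("00000001", 0.000, "n_tty_receive_buf", "pty_write"),
    ("00010001", 3.992, "do_IRQ", "n_tty_receive_buf"),
]

for line in relabel_next_relative(ENTRIES):
    print(line)
```

With this convention the 3.992 ms appears on the n_tty_receive_buf line itself, so the slow function can be read off without looking one entry back.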
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2004-09-02 18:03 UTC | newest]
Thread overview: 15+ messages
2004-09-01 22:56 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7 Mark_H_Johnson
2004-09-02 5:34 ` Ingo Molnar
[not found] <OF3E3C1690.FD6E285E-ON86256F03.004CDD15-86256F03.004CDD4F@raytheon.com>
2004-09-02 14:43 ` Ingo Molnar
-- strict thread matches above, loose matches on Subject: below --
2004-09-02 13:33 Mark_H_Johnson
2004-09-02 13:18 Mark_H_Johnson
2004-09-02 13:37 ` Ingo Molnar
2004-09-02 18:01 ` Lee Revell
2004-08-28 20:10 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q2 Daniel Schmitt
2004-08-28 20:31 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q3 Ingo Molnar
2004-08-28 21:10 ` Lee Revell
2004-08-28 21:13 ` Ingo Molnar
2004-08-28 21:16 ` Lee Revell
2004-08-28 23:51 ` Lee Revell
2004-08-29 2:35 ` Lee Revell
2004-08-29 5:43 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q4 Ingo Molnar
2004-08-30 9:06 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Ingo Molnar
2004-09-01 8:29 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 Ingo Molnar
2004-09-01 13:51 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7 Ingo Molnar
2004-09-01 17:09 ` Thomas Charbonnel
2004-09-01 19:03 ` K.R. Foley
2004-09-01 20:11 ` Peter Zijlstra
2004-09-01 20:16 ` Lee Revell
2004-09-01 20:53 ` K.R. Foley
[not found] ` <41367E5D.3040605@cybsft.com>
2004-09-02 5:37 ` Ingo Molnar
2004-09-02 5:40 ` Ingo Molnar