* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5
@ 2004-08-30 19:13 Mark_H_Johnson
2004-08-30 19:21 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 73+ messages in thread
From: Mark_H_Johnson @ 2004-08-30 19:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt,
Lee Revell
>i've uploaded -Q5 to:
> [snip the rest...]
Thanks.
This appears to be the first 2.6.x kernel I've run that has results
comparable to 2.4.x kernels with low latency patches and kernel preemption.
The few remaining symptoms I see include:
- a few long (> 1 msec) delays in the real time CPU loop (no system calls)
- varying time to complete the write system call (for audio) - much
different than 2.4
- a couple latency traces (> 700 usec) in the network driver
For reference, these tests were performed on the following SMP system:
Dual 866 Mhz Pentium III
512 Mbyte memory
IDE system disk (DMA enabled)
The basic test is Benno's latency test
(http://www.gardena.net/benno/linux/audio) with some slight modifications
to the tests to keep the second CPU busy (non real time CPU burner) and to
add network I/O tests. The 2.4 tests were run with 2.4.20, the 2.6 tests
were run with 2.4.9-rc1-Q5. On 2.6, voluntary_preemption,
kernel_preemption, hardirq_preemption, and softirq_preemption are all 1. I
also set
/sys/block/hda/queue/max_sectors_kb = 32
/sys/block/hda/queue/read_ahead_kb = 32
/proc/sys/net/core/netdev_max_backlog = 8
and the audio driver was set to be non-threaded.
BASIC RESULTS
=============
Comparison of results between 2.6.x and 2.4.x; values in milliseconds.
Nominal values for the write operation is 1.45 msec; the CPU loop is 1.16
msec.
Max CPU Delta Max Write Delta
Test 2.4.x 2.6.x 2.4.x 2.6.x
X11 0.10 0.16 0.05 0.65
/proc 0.07 0.17 0.05 0.65
net out 0.15 0.19 0.05 0.75
net in 0.17 0.23 0.05 0.95
dsk wrt 0.49 0.18 0.25 1.05
dsk copy 2.48 0.68 2.25 1.25
disk rd 3.03 1.61 2.75 1.35
LONG DELAYS
===========
Note I still see over 110% worst case overhead on a max priority real time
CPU task (no system calls) when doing heavy disk I/O on 2.6. It is much
better than 2.4, but still disturbing. What I would hope would happen on a
dual CPU system like mine, is that the real time task tends to be on one
CPU and the other system activity would tend to stay on the other CPU.
However, the results do not seem to indicate that behavior.
VARYING SYSTEM CALL TIMES
=========================
In 2.4, it appears that the duration of the write system call is basically
fixed and dependent on the duration of the audio fragment. In 2.6, this
behavior is now different. If I look at the chart in detail, it appears the
system is queueing up several write operations during the first few seconds
of testing. You can see this by consistently low elapsed times for the
write system call. Then the elapsed time for the write bounces up / down in
a sawtooth pattern over a 1 msec range. Could someone explain the cause of
this new behavior and if there is a setting to restore the old behavior? I
am concerned that this queueing adds latency to audio operations (when
trying to synchronize audio with other real time behavior).
LONG NETWORK LATENCIES
======================
In about 25 minutes of heavy testing, I had two latency traces with
/proc/sys/kernel/preempt_max_latency set to 700. They had the same start /
end location with the long delay as follows:
730 us, entries: 361
...
started at rtl8139_poll+0x3c/0x160
ended at rtl8139_poll+0x100/0x160
00000001 0.000ms (+0.000ms): rtl8139_poll (net_rx_action)
00000001 0.140ms (+0.140ms): rtl8139_rx (rtl8139_poll)
00000001 0.556ms (+0.416ms): alloc_skb (rtl8139_rx)
... remaining items all > +0.005ms ...
731 us, entries: 360
...
started at rtl8139_poll+0x3c/0x160
ended at rtl8139_poll+0x100/0x160
00000001 0.000ms (+0.000ms): rtl8139_poll (net_rx_action)
00000001 0.000ms (+0.000ms): rtl8139_rx (rtl8139_poll)
00000001 0.002ms (+0.001ms): alloc_skb (rtl8139_rx)
00000001 0.141ms (+0.139ms): kmem_cache_alloc (alloc_skb)
00000001 0.211ms (+0.070ms): __kmalloc (alloc_skb)
00000001 0.496ms (+0.284ms): eth_type_trans (rtl8139_rx)
00000001 0.565ms (+0.068ms): netif_receive_skb (rtl8139_rx)
... remaining items all > +0.005ms ...
Still much better than my previous results (before setting
netdev_max_backlog).
I will be running some additional tests
- reducing preempt_max_latency
- running with sortirq and hardirq_preemption=0
to see if these uncover any further problems.
Thanks again for the good work.
--Mark H Johnson
<mailto:Mark_H_Johnson@raytheon.com>
^ permalink raw reply [flat|nested] 73+ messages in thread* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 2004-08-30 19:13 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Mark_H_Johnson @ 2004-08-30 19:21 ` Ingo Molnar 2004-09-01 12:31 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 - netdev_max_back_log is too small P.O. Gaillard 2004-08-31 8:49 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Ingo Molnar 2004-09-02 6:33 ` Ingo Molnar 2 siblings, 1 reply; 73+ messages in thread From: Ingo Molnar @ 2004-08-30 19:21 UTC (permalink / raw) To: Mark_H_Johnson Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell [-- Attachment #1: Type: text/plain, Size: 728 bytes --] * Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote: > LONG NETWORK LATENCIES > ====================== > > In about 25 minutes of heavy testing, I had two latency traces with > /proc/sys/kernel/preempt_max_latency set to 700. They had the same start / > end location with the long delay as follows: > 730 us, entries: 361 > ... > started at rtl8139_poll+0x3c/0x160 > ended at rtl8139_poll+0x100/0x160 regarding this particular latency, could you try the attached patch ontop of -Q5? It turns the ->poll() loop into separate, individually preemptable iterations instead of one batch of processing. In theory this should result in latency being lower regardless of the netdev_max_backlog value. Ingo [-- Attachment #2: 2 --] [-- Type: text/plain, Size: 738 bytes --] --- linux/net/core/dev.c.orig2 +++ linux/net/core/dev.c @@ -1903,7 +1903,7 @@ static void net_rx_action(struct softirq { struct softnet_data *queue = &__get_cpu_var(softnet_data); unsigned long start_time = jiffies; - int budget = netdev_max_backlog; + int budget = netdev_max_backlog, loops; local_irq_disable(); @@ -1926,7 +1926,10 @@ static void net_rx_action(struct softirq dev = list_entry(queue->poll_list.next, struct net_device, poll_list); - if (dev->quota <= 0 || dev->poll(dev, &budget)) { + loops = 1; + if (dev->quota <= 0 || dev->poll(dev, &loops)) { + if (loops < 1) + budget--; local_irq_disable(); list_del(&dev->poll_list); list_add_tail(&dev->poll_list, &queue->poll_list); ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 - netdev_max_back_log is too small 2004-08-30 19:21 ` Ingo Molnar @ 2004-09-01 12:31 ` P.O. Gaillard 2004-09-01 13:05 ` Ingo Molnar 0 siblings, 1 reply; 73+ messages in thread From: P.O. Gaillard @ 2004-09-01 12:31 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel Hello, I know that you have reduced netdev_max_back_log to 8 to reduce the latency. But I think that you should know that I had to set it back to 32 to avoid ethernet frame losses on a 3GHz P4 using e1000 (eth0/threaded=1). My app is still the one with a 20Mbytes/s flow of data and some signal processing work. That's only 20% of max Gigabit Ethernet load and 40% of CPU time. Not really hot stuff. So, this were my 2 cents to help you decide how to balance latency and throughput. I hope that you can find a good balance or find a way to allow people to easily choose their own balance (e.g. maybe there should be a list of parameters impacting latency in the Documentation directory). please keep on the good work and thanks a lot ! P.O. Gaillard ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 - netdev_max_back_log is too small 2004-09-01 12:31 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 - netdev_max_back_log is too small P.O. Gaillard @ 2004-09-01 13:05 ` Ingo Molnar 2004-09-02 11:24 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 - network is no longer smooth P.O. Gaillard 0 siblings, 1 reply; 73+ messages in thread From: Ingo Molnar @ 2004-09-01 13:05 UTC (permalink / raw) To: P.O. Gaillard; +Cc: linux-kernel * P.O. Gaillard <pierre-olivier.gaillard@fr.thalesgroup.com> wrote: > I know that you have reduced netdev_max_back_log to 8 to reduce the > latency. But I think that you should know that I had to set it back to > 32 to avoid ethernet frame losses on a 3GHz P4 using e1000 > (eth0/threaded=1). could you try the -Q6 patch? That leaves netdev_max_backlog at the default value of 300 and uses another method to break up the networking codepaths, without any bad tradeoffs. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 - network is no longer smooth 2004-09-01 13:05 ` Ingo Molnar @ 2004-09-02 11:24 ` P.O. Gaillard 2004-09-02 11:28 ` Ingo Molnar 0 siblings, 1 reply; 73+ messages in thread From: P.O. Gaillard @ 2004-09-02 11:24 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel Hello, I just tried with Q6 and the network behaviour seems odd : - my ssh connection freezes a bit, - communication between the two PCs sharing the GigE connection is difficult to start : 13:17:37.965962 arp who-has djerba-gig tell canopus-gig 13:17:37.966220 IP djerba-gig > canopus-gig: udp 13:17:37.966230 IP djerba-gig > canopus-gig: udp 13:17:37.966313 IP djerba-gig > canopus-gig: udp 13:17:38.965906 arp who-has djerba-gig tell canopus-gig 13:17:38.966040 IP djerba-gig > canopus-gig: udp This is strange since canopus is the one with Q6 - ping does not start for a moment since all packets are RX-ERROR, then they start being received. Here is the output of ethtool -S eth1 NIC statistics: rx_packets: 581 tx_packets: 552 rx_bytes: 545746 tx_bytes: 66508 rx_errors: 112950 tx_errors: 0 rx_dropped: 90 tx_dropped: 0 multicast: 0 collisions: 0 rx_length_errors: 0 rx_over_errors: 0 rx_crc_errors: 0 rx_frame_errors: 0 rx_fifo_errors: 112860 rx_missed_errors: 112860 tx_aborted_errors: 0 tx_carrier_errors: 0 tx_fifo_errors: 0 tx_heartbeat_errors: 0 tx_window_errors: 0 tx_abort_late_coll: 0 tx_deferred_ok: 0 tx_single_coll_ok: 0 tx_multi_coll_ok: 0 rx_long_length_errors: 0 rx_short_length_errors: 0 rx_align_errors: 0 tx_tcp_seg_good: 0 tx_tcp_seg_failed: 0 rx_flow_control_xon: 0 rx_flow_control_xoff: 0 tx_flow_control_xon: 45 tx_flow_control_xoff: 112905 rx_long_byte_count: 545746 rx_csum_offload_good: 1 rx_csum_offload_errors: 0 I rebooted with Q5 (+ netdev_max_backlog=32) and everything was fine. Is there anything I can do to make my report more useful for you ? P.O. Gaillard ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 - network is no longer smooth 2004-09-02 11:24 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 - network is no longer smooth P.O. Gaillard @ 2004-09-02 11:28 ` Ingo Molnar 2004-09-02 15:26 ` P.O. Gaillard 0 siblings, 1 reply; 73+ messages in thread From: Ingo Molnar @ 2004-09-02 11:28 UTC (permalink / raw) To: P.O. Gaillard; +Cc: linux-kernel * P.O. Gaillard <pierre-olivier.gaillard@fr.thalesgroup.com> wrote: > Hello, > > I just tried with Q6 and the network behaviour seems odd : > - my ssh connection freezes a bit, > - communication between the two PCs sharing the GigE connection is > difficult to start : please try the -Q9 patch i just released, and if you still see any problems does the following: echo 300 > /proc/sys/net/core/netdev_backlog_granularity help? Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 - network is no longer smooth 2004-09-02 11:28 ` Ingo Molnar @ 2004-09-02 15:26 ` P.O. Gaillard 0 siblings, 0 replies; 73+ messages in thread From: P.O. Gaillard @ 2004-09-02 15:26 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel Ingo Molnar wrote:> * P.O. Gaillard <pierre-olivier.gaillard@fr.thalesgroup.com> wrote: > > >>Hello, >> >>I just tried with Q6 and the network behaviour seems odd : >> - my ssh connection freezes a bit, >> - communication between the two PCs sharing the GigE connection is >> difficult to start : > > > please try the -Q9 patch i just released, and if you still see any > problems does the following: > > echo 300 > /proc/sys/net/core/netdev_backlog_granularity > > help? > Q9 just crashed after showing the login prompt. It says there is a kernel bug in netdevice.h at line 874 The backtrace is : finegrained_poll net_rx_action ___do_soft_irq _do_soft_irq ksoftirqd kthread ksoftirqd kthread kernel_thread_helper I have the following modules : e1000 i2c_isa i2c_sensor w83627hf asus_acpi P.O. Gaillard ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 2004-08-30 19:13 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Mark_H_Johnson 2004-08-30 19:21 ` Ingo Molnar @ 2004-08-31 8:49 ` Ingo Molnar 2004-09-02 6:33 ` Ingo Molnar 2 siblings, 0 replies; 73+ messages in thread From: Ingo Molnar @ 2004-08-31 8:49 UTC (permalink / raw) To: Mark_H_Johnson Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell [-- Attachment #1: Type: text/plain, Size: 2144 bytes --] * Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote: > VARYING SYSTEM CALL TIMES > ========================= > > In 2.4, it appears that the duration of the write system call is > basically fixed and dependent on the duration of the audio fragment. > In 2.6, this behavior is now different. If I look at the chart in > detail, it appears the system is queueing up several write operations > during the first few seconds of testing. You can see this by > consistently low elapsed times for the write system call. Then the > elapsed time for the write bounces up / down in a sawtooth pattern > over a 1 msec range. Could someone explain the cause of this new > behavior and if there is a setting to restore the old behavior? I am > concerned that this queueing adds latency to audio operations (when > trying to synchronize audio with other real time behavior). since the latency tracer does not trigger, we need a modified tracer to find out what's happening during such long delays. I've attached the 'user-latency-tracer' patch ontop of -Q5, which is a modification of the latency tracer. This patch enables free-running tracing which includes all kernel functions not just critical sections. To activate this, two things have to be done. Firstly: echo 2 > /proc/sys/kernel/trace_enabled this turns off the normal latency tracer and turns on the 'user tracer'. Traces can be generated by userspace via a hack done to sys_gettimeofday(): gettimeofday(0,1); // enable the tracer gettimeofday(0,0); // save current trace and disable the tracer this way the tracing can be limited to the suspected codepaths only. could you try to insert gettimeofday(0,1) into your testsuite just before the write() call is done, and right after the write() call, and save a couple of representative traces? The patch also ups the # of latency entries to 8000 - if that is still insufficient then please increase it as needed. NOTE: on SMP the tracing on/off is strictly per-CPU. So do the enabling and disabling of the trace on the same CPU. (doing otherwise wont cause problems, but the generated traces will be less useful.) Ingo [-- Attachment #2: user-latency-2.6.9-rc1-bk4-Q5-A0 --] [-- Type: text/plain, Size: 3598 bytes --] --- linux/kernel/latency.c.orig +++ linux/kernel/latency.c @@ -27,7 +27,7 @@ static DECLARE_MUTEX(max_mutex); #ifdef CONFIG_LATENCY_TRACE -#define MAX_TRACE 4000UL +#define MAX_TRACE 8000UL struct trace_entry { unsigned long preempt_count; @@ -63,14 +63,13 @@ static unsigned long max_nice; static unsigned long max_policy; static unsigned long max_rt_priority; -inline void notrace +static inline void notrace ____trace(struct cpu_trace *tr, unsigned long eip, unsigned long parent_eip) { struct trace_entry *entry; - BUG_ON(!irqs_disabled()); - - if (tr->trace_idx < MAX_TRACE) { + if ((tr->critical_start || (trace_enabled == 2)) && + (tr->trace_idx < MAX_TRACE)) { entry = tr->trace + tr->trace_idx; entry->eip = eip; entry->parent_eip = parent_eip; @@ -80,7 +79,7 @@ ____trace(struct cpu_trace *tr, unsigned tr->trace_idx++; } -inline void notrace +static inline void notrace ___trace(unsigned long eip, unsigned long parent_eip) { unsigned long flags; @@ -266,6 +265,18 @@ static int setup_preempt_thresh(char *s) } __setup("preempt_thresh=", setup_preempt_thresh); +static void update_max_trace(struct cpu_trace *tr) +{ + memcpy(&max_trace, tr, sizeof (max_trace)); + + memcpy(max_comm, current->comm, 16); + max_pid = current->pid; + max_uid = current->uid; + max_nice = current->static_prio - 20 - MAX_RT_PRIO; + max_policy = current->policy; + max_rt_priority = current->rt_priority; +} + static void notrace check_preempt_timing(struct cpu_trace *tr) { #ifdef CONFIG_LATENCY_TRACE @@ -274,6 +285,10 @@ static void notrace check_preempt_timing unsigned long parent_eip = (unsigned long)__builtin_return_address(1); unsigned long latency; +#ifdef CONFIG_LATENCY_TRACE + if (trace_enabled == 2) + return; +#endif atomic_inc(&tr->disabled); latency = cycles_to_usecs(get_cycles() - tr->preempt_timestamp); @@ -293,14 +308,7 @@ static void notrace check_preempt_timing #ifdef CONFIG_LATENCY_TRACE ____trace(tr, eip, parent_eip); - memcpy(&max_trace, tr, sizeof (max_trace)); - - memcpy(max_comm, current->comm, 16); - max_pid = current->pid; - max_uid = current->uid; - max_nice = current->static_prio - 20 - MAX_RT_PRIO; - max_policy = current->policy; - max_rt_priority = current->rt_priority; + update_max_trace(tr); #endif if (preempt_thresh) @@ -354,6 +362,10 @@ void notrace add_preempt_count(int val) #endif preempt_count() += val; +#ifdef CONFIG_LATENCY_TRACE + if (trace_enabled == 2) + return; +#endif if (preempt_count() == val) { struct cpu_trace *tr = &__get_cpu_var(trace); @@ -383,3 +395,27 @@ void notrace sub_preempt_count(int val) preempt_count() -= val; } EXPORT_SYMBOL(sub_preempt_count); + +void user_trace_start(void) +{ + struct cpu_trace *tr; + + if (trace_enabled != 2) + return; + tr = &get_cpu_var(trace); + tr->trace_idx = 0; + mcount(); + put_cpu_var(trace); +} + +void user_trace_stop(void) +{ + struct cpu_trace *tr; + + if (trace_enabled != 2) + return; + tr = &get_cpu_var(trace); + mcount(); + update_max_trace(tr); + put_cpu_var(trace); +} --- linux/kernel/time.c.orig2 +++ linux/kernel/time.c @@ -90,8 +90,17 @@ asmlinkage long sys_stime(time_t __user #endif /* __ARCH_WANT_SYS_TIME */ +extern void user_trace_start(void); +extern void user_trace_stop(void); + asmlinkage long sys_gettimeofday(struct timeval __user *tv, struct timezone __user *tz) { +#ifdef CONFIG_LATENCY_TRACE + if (!tv && ((int)tz == 1)) + user_trace_start(); + if (!tv && !tz) + user_trace_stop(); +#endif if (likely(tv != NULL)) { struct timeval ktv; do_gettimeofday(&ktv); ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 2004-08-30 19:13 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Mark_H_Johnson 2004-08-30 19:21 ` Ingo Molnar 2004-08-31 8:49 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Ingo Molnar @ 2004-09-02 6:33 ` Ingo Molnar 2004-09-02 6:55 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 Ingo Molnar 2 siblings, 1 reply; 73+ messages in thread From: Ingo Molnar @ 2004-09-02 6:33 UTC (permalink / raw) To: Mark_H_Johnson Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell * Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote: > In 2.4, it appears that the duration of the write system call is > basically fixed and dependent on the duration of the audio fragment. > In 2.6, this behavior is now different. If I look at the chart in > detail, it appears the system is queueing up several write operations > during the first few seconds of testing. You can see this by > consistently low elapsed times for the write system call. Then the > elapsed time for the write bounces up / down in a sawtooth pattern > over a 1 msec range. Could someone explain the cause of this new > behavior and if there is a setting to restore the old behavior? I am > concerned that this queueing adds latency to audio operations (when > trying to synchronize audio with other real time behavior). i think i found the reason for the sawtooth: it's a bug in hardirq redirection. In certain situations we can end up not waking up softirqd, resulting in a random 0-1msec latency between hardirq arrival and softirq execution. We dont see higher latencies because timer IRQs always wake up softirqd which hides the bug to a certain degree. I'll fix this in -Q8. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 2004-09-02 6:33 ` Ingo Molnar @ 2004-09-02 6:55 ` Ingo Molnar 2004-09-02 7:04 ` Lee Revell ` (2 more replies) 0 siblings, 3 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-02 6:55 UTC (permalink / raw) To: Mark_H_Johnson Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, alsa-devel i've released the -Q8 patch: http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q8 ontop of: http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2 this release fixes an artificial 0-1msec delay between hardirq arrival and softirq invocation. This should solve some of the ALSA artifacts reported by Mark H Johnson. It should also solve the rtl8139 problems - i've put such a card into a testbox and with -Q7 i had similar packet latency problems while with -Q8 it works just fine. So netdev_backlog_granularity is still a value of 1 in -Q8, please check whether the networking problems (bootup and service startup and latency) problems are resolved. (and increase this value in case there are still problems.) Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 2004-09-02 6:55 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 Ingo Molnar @ 2004-09-02 7:04 ` Lee Revell 2004-09-02 7:15 ` Ingo Molnar 2004-09-02 7:17 ` Ingo Molnar 2004-09-02 8:23 ` Ingo Molnar 2004-09-02 11:10 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Ingo Molnar 2 siblings, 2 replies; 73+ messages in thread From: Lee Revell @ 2004-09-02 7:04 UTC (permalink / raw) To: Ingo Molnar Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, alsa-devel On Thu, 2004-09-02 at 02:55, Ingo Molnar wrote: > i've released the -Q8 patch: > > http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q8 > > ontop of: > > http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2 > Here are traces of a 145, 190, and 217 usec latencies in netif_receive_skb: http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace2.txt http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace3.txt http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace4.txt Some of these are with ip_conntrack enabled, at the request of another poster, this does not make much of a difference, it increases the worst case latency by 20 usec or so. Also there is the rt_garbage_collect issue, previously reported. I have not seen this lately but I do not remember seeing that it was fixed. Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 2004-09-02 7:04 ` Lee Revell @ 2004-09-02 7:15 ` Ingo Molnar 2004-09-02 7:31 ` Lee Revell 2004-09-02 23:25 ` Lee Revell 2004-09-02 7:17 ` Ingo Molnar 1 sibling, 2 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-02 7:15 UTC (permalink / raw) To: Lee Revell Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, alsa-devel * Lee Revell <rlrevell@joe-job.com> wrote: > Here are traces of a 145, 190, and 217 usec latencies in > netif_receive_skb: > > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace2.txt > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace3.txt > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace4.txt these all seem to be single-packet processing latencies - it would be quite hard to make those codepaths preemptible. i'd suggest to turn off things like netfilter and ip_conntrack (and other optional networking features that show up in the trace), they can only increase latency: 00000001 0.016ms (+0.000ms): ip_rcv (netif_receive_skb) 00000001 0.019ms (+0.002ms): nf_hook_slow (ip_rcv) 00000002 0.019ms (+0.000ms): nf_iterate (nf_hook_slow) 00000002 0.021ms (+0.001ms): ip_conntrack_defrag (nf_iterate) 00000002 0.022ms (+0.000ms): ip_conntrack_in (nf_iterate) 00000002 0.022ms (+0.000ms): ip_ct_find_proto (ip_conntrack_in) 00000103 0.023ms (+0.000ms): __ip_ct_find_proto (ip_ct_find_proto) 00000102 0.024ms (+0.000ms): local_bh_enable (ip_ct_find_proto) 00000002 0.025ms (+0.001ms): tcp_error (ip_conntrack_in) Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 2004-09-02 7:15 ` Ingo Molnar @ 2004-09-02 7:31 ` Lee Revell 2004-09-02 7:46 ` Ingo Molnar 2004-09-02 23:25 ` Lee Revell 1 sibling, 1 reply; 73+ messages in thread From: Lee Revell @ 2004-09-02 7:31 UTC (permalink / raw) To: Ingo Molnar Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, alsa-devel On Thu, 2004-09-02 at 03:15, Ingo Molnar wrote: > * Lee Revell <rlrevell@joe-job.com> wrote: > > > Here are traces of a 145, 190, and 217 usec latencies in > > netif_receive_skb: > > > > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace2.txt > > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace3.txt > > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace4.txt > > these all seem to be single-packet processing latencies - it would be > quite hard to make those codepaths preemptible. > I suspected as much, these are not a problem. The large latencies from reading the /proc filesystem are a bit worrisome (trace1.txt), I will report these again if they still happen with Q8. Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 2004-09-02 7:31 ` Lee Revell @ 2004-09-02 7:46 ` Ingo Molnar 2004-09-03 1:10 ` Rusty Russell 0 siblings, 1 reply; 73+ messages in thread From: Ingo Molnar @ 2004-09-02 7:46 UTC (permalink / raw) To: Lee Revell Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, alsa-devel, Rusty Russell, netfilter-devel * Lee Revell <rlrevell@joe-job.com> wrote: > > these all seem to be single-packet processing latencies - it would be > > quite hard to make those codepaths preemptible. > > I suspected as much, these are not a problem. The large latencies > from reading the /proc filesystem are a bit worrisome (trace1.txt), I > will report these again if they still happen with Q8. conntrack's ct_seq ops indeed seems to have latency problems - the quick workaround is to disable conntrack. The reason for the latency is that ct_seq_start() does a read_lock() on ip_conntrack_lock and only ct_seq_stop() releases it - possibly milliseconds later. But the whole conntrack /proc code is quite flawed: READ_LOCK(&ip_conntrack_lock); if (*pos >= ip_conntrack_htable_size) return NULL; bucket = kmalloc(sizeof(unsigned int), GFP_KERNEL); if (!bucket) { return ERR_PTR(-ENOMEM); } *bucket = *pos; return bucket; #1: we kmalloc(GFP_KERNEL) with a spinlock held and softirqs off - ouch! #2: why does it do the kmalloc() anyway? It could store the position in the seq pointer just fine. No need to alloc an integer pointer to store the value in ... #3: to fix the latency, ct_seq_show() could take the ip_conntrack_lock and could check the current index against ip_conntrack_htable_size. There's not much point in making this non-preemptible, there's a 4K granularity anyway. Rusty, what's going on in this code? Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 2004-09-02 7:46 ` Ingo Molnar @ 2004-09-03 1:10 ` Rusty Russell 0 siblings, 0 replies; 73+ messages in thread From: Rusty Russell @ 2004-09-03 1:10 UTC (permalink / raw) To: Ingo Molnar Cc: Lee Revell, Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, alsa-devel, netfilter-devel On Thu, 2004-09-02 at 17:46, Ingo Molnar wrote: > Rusty, what's going on in this code? Good question! Not my code, fortunately... > #1: we kmalloc(GFP_KERNEL) with a spinlock held and softirqs off - ouch! > > #2: why does it do the kmalloc() anyway? It could store the position in > the seq pointer just fine. No need to alloc an integer pointer to > store the value in ... > > #3: to fix the latency, ct_seq_show() could take the ip_conntrack_lock > and could check the current index against ip_conntrack_htable_size. > There's not much point in making this non-preemptible, there's > a 4K granularity anyway. The code tries to put an entire hash bucket into a single seq_read(). That's not going to work if the hash is really deep. On the other hand, not much will, and it's simple. The lock is only needed on traversing: htable_size can't change after init anyway, so it should be done in ct_seq_show. Fix should be fairly simple... Rusty. -- Anyone who quotes me in their signature is an idiot -- Rusty Russell ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 2004-09-02 7:15 ` Ingo Molnar 2004-09-02 7:31 ` Lee Revell @ 2004-09-02 23:25 ` Lee Revell 2004-09-02 23:28 ` Ingo Molnar 1 sibling, 1 reply; 73+ messages in thread From: Lee Revell @ 2004-09-02 23:25 UTC (permalink / raw) To: Ingo Molnar Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, alsa-devel On Thu, 2004-09-02 at 03:15, Ingo Molnar wrote: > * Lee Revell <rlrevell@joe-job.com> wrote: > > > Here are traces of a 145, 190, and 217 usec latencies in > > netif_receive_skb: > > > > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace2.txt > > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace3.txt > > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/trace4.txt > > these all seem to be single-packet processing latencies - it would be > quite hard to make those codepaths preemptible. > > i'd suggest to turn off things like netfilter and ip_conntrack (and > other optional networking features that show up in the trace), they can > only increase latency: > Do you see any optional networking features in the trace (other than ip_conntrack)? I was under the impression that I had everything optional disabled. Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 2004-09-02 23:25 ` Lee Revell @ 2004-09-02 23:28 ` Ingo Molnar 2004-09-02 23:32 ` Lee Revell 0 siblings, 1 reply; 73+ messages in thread From: Ingo Molnar @ 2004-09-02 23:28 UTC (permalink / raw) To: Lee Revell Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, alsa-devel * Lee Revell <rlrevell@joe-job.com> wrote: > Do you see any optional networking features in the trace (other than > ip_conntrack)? I was under the impression that I had everything > optional disabled. yeah, it seems to be only ip_conntrack and netfilter (which conntrack relies on). Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 2004-09-02 23:28 ` Ingo Molnar @ 2004-09-02 23:32 ` Lee Revell 0 siblings, 0 replies; 73+ messages in thread From: Lee Revell @ 2004-09-02 23:32 UTC (permalink / raw) To: Ingo Molnar Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, alsa-devel On Thu, 2004-09-02 at 19:28, Ingo Molnar wrote: > * Lee Revell <rlrevell@joe-job.com> wrote: > > > Do you see any optional networking features in the trace (other than > > ip_conntrack)? I was under the impression that I had everything > > optional disabled. > > yeah, it seems to be only ip_conntrack and netfilter (which conntrack > relies on). > FWIW these seem to only slow down the single packet path by about 10%. This is pretty good. Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 2004-09-02 7:04 ` Lee Revell 2004-09-02 7:15 ` Ingo Molnar @ 2004-09-02 7:17 ` Ingo Molnar 1 sibling, 0 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-02 7:17 UTC (permalink / raw) To: Lee Revell Cc: Mark_H_Johnson, K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt * Lee Revell <rlrevell@joe-job.com> wrote: > Also there is the rt_garbage_collect issue, previously reported. I > have not seen this lately but I do not remember seeing that it was > fixed. i dont think it's fixed, please re-report it if it occurs again, there have been many changes. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 2004-09-02 6:55 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 Ingo Molnar 2004-09-02 7:04 ` Lee Revell @ 2004-09-02 8:23 ` Ingo Molnar 2004-09-02 11:10 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Ingo Molnar 2 siblings, 0 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-02 8:23 UTC (permalink / raw) To: Mark_H_Johnson Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, alsa-devel * Ingo Molnar <mingo@elte.hu> wrote: > this release fixes an artificial 0-1msec delay between hardirq arrival > and softirq invocation. This should solve some of the ALSA artifacts > reported by Mark H Johnson. It should also solve the rtl8139 problems > - i've put such a card into a testbox and with -Q7 i had similar > packet latency problems while with -Q8 it works just fine. the rtl8139 problems are not fixed yet - i can still reproduce the delayed packet issues. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 2004-09-02 6:55 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 Ingo Molnar 2004-09-02 7:04 ` Lee Revell 2004-09-02 8:23 ` Ingo Molnar @ 2004-09-02 11:10 ` Ingo Molnar 2004-09-02 12:14 ` Thomas Charbonnel ` (2 more replies) 2 siblings, 3 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-02 11:10 UTC (permalink / raw) To: linux-kernel Cc: K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson i've released the -Q9 patch: http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q9 ontop of: http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2 Changes: - fixed the cond_resched_softirq() bug noticed by Mika Penttila. - updated the preemption-friendly network-RX code but 8193too.c still produces delayed packets so netdev_backlog_granularity now defaults to 2, which seems to be working fine on my testbox. - the latency_trace output now includes the kernel and patch version, for easier sorting of reports. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 2004-09-02 11:10 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Ingo Molnar @ 2004-09-02 12:14 ` Thomas Charbonnel 2004-09-02 13:16 ` Thomas Charbonnel 2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar 2 siblings, 0 replies; 73+ messages in thread From: Thomas Charbonnel @ 2004-09-02 12:14 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson Ingo Molnar wrote : > i've released the -Q9 patch: > > http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q9 > > ontop of: > > http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2 > > Changes: > > - fixed the cond_resched_softirq() bug noticed by Mika Penttila. > > - updated the preemption-friendly network-RX code but 8193too.c still > produces delayed packets so netdev_backlog_granularity now defaults > to 2, which seems to be working fine on my testbox. Thanks, network seems to be working fine with the new default value. I still get > 140 us non-preemptible sections in rtl8139_pol, though : http://www.undata.org/~thomas/q9_rtl8139.trace Thomas ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 2004-09-02 11:10 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Ingo Molnar 2004-09-02 12:14 ` Thomas Charbonnel @ 2004-09-02 13:16 ` Thomas Charbonnel 2004-09-02 13:23 ` Ingo Molnar 2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar 2 siblings, 1 reply; 73+ messages in thread From: Thomas Charbonnel @ 2004-09-02 13:16 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson Ingo Molnar wrote : > i've released the -Q9 patch: > > http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-Q9 With ACPI compiled in and booting with acpi=off (which again doesn't seem to be honoured), here's another weird one : preemption latency trace v1.0.5 on 2.6.9-rc1-VP-Q9 -------------------------------------------------- latency: 2888 us, entries: 37 (37) ----------------- | task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0 ----------------- => started at: do_IRQ+0x19/0x1a0 => ended at: do_IRQ+0x14d/0x1a0 =======> 00010000 0.000ms (+0.000ms): do_IRQ (common_interrupt) 00010000 0.000ms (+0.000ms): do_IRQ (default_idle) 00010000 0.000ms (+0.000ms): do_IRQ (<00000000>) 00010001 0.000ms (+0.002ms): mask_and_ack_8259A (do_IRQ) 00010001 0.002ms (+0.000ms): generic_redirect_hardirq (do_IRQ) 00010000 0.002ms (+0.000ms): generic_handle_IRQ_event (do_IRQ) 00010000 0.003ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event) 00010001 0.003ms (+2.878ms): mark_offset_tsc (timer_interrupt) 00010001 2.882ms (+0.000ms): do_timer (timer_interrupt) 00010001 2.882ms (+0.000ms): update_process_times (do_timer) 00010001 2.882ms (+0.000ms): update_one_process (update_process_times) 00010001 2.882ms (+0.000ms): run_local_timers (update_process_times) 00010001 2.882ms (+0.000ms): raise_softirq (update_process_times) 00010001 2.882ms (+0.000ms): scheduler_tick (update_process_times) 00010001 2.883ms (+0.000ms): sched_clock (scheduler_tick) 00010001 2.883ms (+0.000ms): update_wall_time (do_timer) 00010001 2.883ms (+0.000ms): update_wall_time_one_tick (update_wall_time) 00010001 2.883ms (+0.000ms): profile_tick (timer_interrupt) 00010001 2.884ms (+0.000ms): profile_hook (profile_tick) 00010002 2.884ms (+0.000ms): notifier_call_chain (profile_hook) 00010001 2.884ms (+0.000ms): profile_hit (timer_interrupt) 00010001 2.884ms (+0.000ms): generic_note_interrupt (do_IRQ) 00010001 2.884ms (+0.000ms): end_8259A_irq (do_IRQ) 00010001 2.885ms (+0.000ms): enable_8259A_irq (do_IRQ) 00000001 2.886ms (+0.000ms): do_softirq (do_IRQ) 00000001 2.886ms (+0.000ms): __do_softirq (do_softirq) 00000001 2.886ms (+0.000ms): wake_up_process (do_softirq) 00000001 2.886ms (+0.000ms): try_to_wake_up (wake_up_process) 00000001 2.886ms (+0.000ms): task_rq_lock (try_to_wake_up) 00000002 2.886ms (+0.000ms): activate_task (try_to_wake_up) 00000002 2.886ms (+0.000ms): sched_clock (activate_task) 00000002 2.887ms (+0.000ms): recalc_task_prio (activate_task) 00000002 2.887ms (+0.000ms): effective_prio (recalc_task_prio) 00000002 2.887ms (+0.000ms): enqueue_task (activate_task) 00000001 2.887ms (+0.000ms): preempt_schedule (try_to_wake_up) 00000001 2.888ms (+0.000ms): sub_preempt_count (do_IRQ) 00000001 2.888ms (+0.000ms): update_max_trace (check_preempt_timing) Thomas ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 2004-09-02 13:16 ` Thomas Charbonnel @ 2004-09-02 13:23 ` Ingo Molnar 2004-09-02 14:38 ` Thomas Charbonnel 0 siblings, 1 reply; 73+ messages in thread From: Ingo Molnar @ 2004-09-02 13:23 UTC (permalink / raw) To: Thomas Charbonnel Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson * Thomas Charbonnel <thomas@undata.org> wrote: > With ACPI compiled in and booting with acpi=off (which again doesn't > seem to be honoured), here's another weird one : > 00010000 0.003ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event) > 00010001 0.003ms (+2.878ms): mark_offset_tsc (timer_interrupt) > 00010001 2.882ms (+0.000ms): do_timer (timer_interrupt) do you have the NMI watchdog enabled? That could help us debugging this. Enabling the APIC/IO_APIC and using nmi_watchdog=1 would be the ideal solution - if that doesnt work then nmi_watchdog=2 would be fine too, but remove this code from arch/i386/kernel/nmi.c: if (nmi_watchdog == NMI_LOCAL_APIC) nmi_hz = 1; (otherwise nmi_watchdog=2 would result in one NMI per second, not enough to shed light on the above 2.8 msec latency.) Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 2004-09-02 13:23 ` Ingo Molnar @ 2004-09-02 14:38 ` Thomas Charbonnel 0 siblings, 0 replies; 73+ messages in thread From: Thomas Charbonnel @ 2004-09-02 14:38 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson Ingo Molnar wrote : > do you have the NMI watchdog enabled? That could help us debugging this. > Enabling the APIC/IO_APIC and using nmi_watchdog=1 would be the ideal > solution - if that doesnt work then nmi_watchdog=2 would be fine too, > but remove this code from arch/i386/kernel/nmi.c: > > if (nmi_watchdog == NMI_LOCAL_APIC) > nmi_hz = 1; > > (otherwise nmi_watchdog=2 would result in one NMI per second, not enough > to shed light on the above 2.8 msec latency.) > > Ingo nmi_watchdog=1 worked fine. Here's a trace (there seem to be a counter overflow) : preemption latency trace v1.0.5 on 2.6.9-rc1-VP-Q9 -------------------------------------------------- latency: 539 us, entries: 38 (38) ----------------- | task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0 ----------------- => started at: do_IRQ+0x19/0x190 => ended at: do_IRQ+0x14a/0x190 =======> 00010000 0.000ms (+0.000ms): do_IRQ (common_interrupt) 00010000 0.000ms (+0.000ms): do_IRQ (default_idle) 00010000 0.000ms (+0.000ms): do_IRQ (<00000000>) 00010001 0.000ms (+0.002ms): mask_and_ack_8259A (do_IRQ) 00010001 0.002ms (+0.000ms): generic_redirect_hardirq (do_IRQ) 00010000 0.002ms (+0.000ms): generic_handle_IRQ_event (do_IRQ) 00010000 0.002ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event) 00010001 0.003ms (+0.005ms): mark_offset_tsc (timer_interrupt) 00010001 0.008ms (+0.000ms): do_timer (timer_interrupt) 00010001 0.008ms (+0.000ms): update_process_times (do_timer) 00010001 0.008ms (+0.000ms): update_one_process (update_process_times) 00010001 0.009ms (+0.000ms): run_local_timers (update_process_times) 00010001 0.009ms (+0.000ms): raise_softirq (update_process_times) 00010001 0.009ms (+0.000ms): scheduler_tick (update_process_times) 00010001 0.009ms (+0.000ms): sched_clock (scheduler_tick) 00010001 0.009ms (+0.000ms): update_wall_time (do_timer) 00010001 0.009ms (+0.000ms): update_wall_time_one_tick (update_wall_time) 00010001 0.010ms (+0.000ms): generic_note_interrupt (do_IRQ) 00010001 0.010ms (+0.000ms): end_8259A_irq (do_IRQ) 00010001 0.010ms (+0.000ms): enable_8259A_irq (do_IRQ) 00000001 0.011ms (+0.524ms): do_softirq (do_IRQ) 00000001 0.535ms (+0.000ms): do_nmi (mcount) 00010001 0.536ms (+0.000ms): profile_tick (nmi_watchdog_tick) 00010001 0.536ms (+0.000ms): profile_hook (profile_tick) 00010002 0.536ms (+0.000ms): notifier_call_chain (profile_hook) 00010001 0.537ms (+689953.062ms): profile_hit (nmi_watchdog_tick) 00000002 689953.599ms (+1.523ms): sched_clock (activate_task) 00000001 0.537ms (+0.000ms): wake_up_process (do_softirq) 00000001 0.537ms (+0.000ms): try_to_wake_up (wake_up_process) 00000001 0.537ms (+0.000ms): task_rq_lock (try_to_wake_up) 00000002 0.537ms (+0.000ms): activate_task (try_to_wake_up) 00000002 0.538ms (+0.000ms): sched_clock (activate_task) 00000002 0.538ms (+0.000ms): recalc_task_prio (activate_task) 00000002 0.538ms (+0.000ms): effective_prio (recalc_task_prio) 00000002 0.538ms (+0.000ms): enqueue_task (activate_task) 00000001 0.538ms (+0.000ms): preempt_schedule (try_to_wake_up) 00000001 0.539ms (+0.000ms): sub_preempt_count (do_IRQ) 00000001 0.539ms (+0.000ms): update_max_trace (check_preempt_timing) Thomas ^ permalink raw reply [flat|nested] 73+ messages in thread
* [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-02 11:10 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Ingo Molnar 2004-09-02 12:14 ` Thomas Charbonnel 2004-09-02 13:16 ` Thomas Charbonnel @ 2004-09-02 21:57 ` Ingo Molnar 2004-09-02 22:06 ` Lee Revell ` (3 more replies) 2 siblings, 4 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-02 21:57 UTC (permalink / raw) To: linux-kernel Cc: K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard i've released the -R0 patch: http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R0 ontop of: http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2 i've given up on the netdev_backlog_granularity approach, and as a replacement i've modified specific network drivers to return at a safe point if softirq preemption is requested. This gives the same end result but is more robust. For the time being i've fixed 8193too.c and e100.c. (will fix up other drivers too as latencies get reported) this should fix the crash reported by P.O. Gaillard, and it should solve the packet delay/loss issues reported by Mark H Johnson. I cannot see any problems on my rtl8193 testbox anymore. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar @ 2004-09-02 22:06 ` Lee Revell 2004-09-02 22:14 ` Ingo Molnar 2004-09-03 11:04 ` K.R. Foley ` (2 subsequent siblings) 3 siblings, 1 reply; 73+ messages in thread From: Lee Revell @ 2004-09-02 22:06 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard On Thu, 2004-09-02 at 17:57, Ingo Molnar wrote: > i've released the -R0 patch: > > http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R0 > > ontop of: > > http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2 > > i've given up on the netdev_backlog_granularity approach, and as a > replacement i've modified specific network drivers to return at a safe > point if softirq preemption is requested. Makes sense, netdev_max_backlog never made a difference on my system (via-rhine driver). If you read that Microsoft paper I posted a link to earlier, they describe all of the main categories of latencies we have dealt with. They give special mention to network driver DPCs/softirqs. The worst offender was a driver that used a DPC/softirq to reset the card if there was no traffic in 10 seconds, to work around a hardware lockup bug in some versions. Since the reset path was never designed to be fast this caused problems. They also mention a similar problem to the one that still exists with the via-rhine driver (fixed in -mm) where it uses a DPC to poll for link status. Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-02 22:06 ` Lee Revell @ 2004-09-02 22:14 ` Ingo Molnar 2004-09-02 22:15 ` Lee Revell 2004-09-03 0:24 ` Lee Revell 0 siblings, 2 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-02 22:14 UTC (permalink / raw) To: Lee Revell Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard * Lee Revell <rlrevell@joe-job.com> wrote: > > i've given up on the netdev_backlog_granularity approach, and as a > > replacement i've modified specific network drivers to return at a safe > > point if softirq preemption is requested. > > Makes sense, netdev_max_backlog never made a difference on my system > (via-rhine driver). via-rhine does RX processing from the hardirq handler, this codepath is harder to break up. The NAPI ->poll functions used by e100 and 8193too are much easier to break up because RX throttling and re-trying is a basic property of NAPI. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-02 22:14 ` Ingo Molnar @ 2004-09-02 22:15 ` Lee Revell 2004-09-03 0:24 ` Lee Revell 1 sibling, 0 replies; 73+ messages in thread From: Lee Revell @ 2004-09-02 22:15 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard On Thu, 2004-09-02 at 18:14, Ingo Molnar wrote: > * Lee Revell <rlrevell@joe-job.com> wrote: > > > > i've given up on the netdev_backlog_granularity approach, and as a > > > replacement i've modified specific network drivers to return at a safe > > > point if softirq preemption is requested. > > > > Makes sense, netdev_max_backlog never made a difference on my system > > (via-rhine driver). > > via-rhine does RX processing from the hardirq handler, this codepath is > harder to break up. The NAPI ->poll functions used by e100 and 8193too > are much easier to break up because RX throttling and re-trying is a > basic property of NAPI. > What I meant was, I did not consider the latencies from via-rhine to be a problem. The other posters reported 300-500 usec latencies in the network driver, I did not see this. Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-02 22:14 ` Ingo Molnar 2004-09-02 22:15 ` Lee Revell @ 2004-09-03 0:24 ` Lee Revell 2004-09-03 3:17 ` Eric St-Laurent 1 sibling, 1 reply; 73+ messages in thread From: Lee Revell @ 2004-09-03 0:24 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard On Thu, 2004-09-02 at 18:14, Ingo Molnar wrote: > * Lee Revell <rlrevell@joe-job.com> wrote: > > > > i've given up on the netdev_backlog_granularity approach, and as a > > > replacement i've modified specific network drivers to return at a safe > > > point if softirq preemption is requested. > > > > Makes sense, netdev_max_backlog never made a difference on my system > > (via-rhine driver). > > via-rhine does RX processing from the hardirq handler, this codepath is > harder to break up. The NAPI ->poll functions used by e100 and 8193too > are much easier to break up because RX throttling and re-trying is a > basic property of NAPI. > Judging from these graphs, all of the latency issues are solved, at least on my UP hardware, and the latencies seem to be getting very close to the limits of what the hardware can do: http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/jack-test-1 The worst case latency is only 160 usecs, and the vast majority fall into the pattern from 0 to 120 usecs. All of the spikes above 120 are almost certainly caused by netdev_max_backlog. However these are not long enough to cause any problems with my workload; the lowest practical latency for audio work is around 0.66 ms (32 frames at 48khz). Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 0:24 ` Lee Revell @ 2004-09-03 3:17 ` Eric St-Laurent 2004-09-03 6:26 ` Lee Revell 0 siblings, 1 reply; 73+ messages in thread From: Eric St-Laurent @ 2004-09-03 3:17 UTC (permalink / raw) To: Lee Revell Cc: Ingo Molnar, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard > Judging from these graphs, all of the latency issues are solved, at > least on my UP hardware, and the latencies seem to be getting very close > to the limits of what the hardware can do: > > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/jack-test-1 > > The worst case latency is only 160 usecs, and the vast majority fall > into the pattern from 0 to 120 usecs. All of the spikes above 120 are > almost certainly caused by netdev_max_backlog. However these are not > long enough to cause any problems with my workload; the lowest practical > latency for audio work is around 0.66 ms (32 frames at 48khz). Lee, A few weeks ago you wrote that "the worst latency I was able to trigger was 46 usecs", now it's 160 usecs. Ingo has done much work on his patches since then. Why the worst latency is higher now? I presume that the latency measurements technique are more accurate and the 46 usecs was inaccurate? Ref: http://uwsg.indiana.edu/hypermail/linux/kernel/0407.3/0994.html Best regards, Eric St-Laurent ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 3:17 ` Eric St-Laurent @ 2004-09-03 6:26 ` Lee Revell 2004-09-03 6:36 ` Ingo Molnar 0 siblings, 1 reply; 73+ messages in thread From: Lee Revell @ 2004-09-03 6:26 UTC (permalink / raw) To: Eric St-Laurent Cc: Ingo Molnar, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard On Thu, 2004-09-02 at 23:17, Eric St-Laurent wrote: > > Judging from these graphs, all of the latency issues are solved, at > > least on my UP hardware, and the latencies seem to be getting very close > > to the limits of what the hardware can do: > > > > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6#/var/www/2.6.9-rc1-Q6/jack-test-1 > > > > The worst case latency is only 160 usecs, and the vast majority fall > > into the pattern from 0 to 120 usecs. All of the spikes above 120 are > > almost certainly caused by netdev_max_backlog. However these are not > > long enough to cause any problems with my workload; the lowest practical > > latency for audio work is around 0.66 ms (32 frames at 48khz). > > Lee, > > A few weeks ago you wrote that "the worst latency I was able to trigger > was 46 usecs", now it's 160 usecs. > > Ingo has done much work on his patches since then. > > Why the worst latency is higher now? I presume that the latency > measurements technique are more accurate and the 46 usecs was > inaccurate? > > Ref: http://uwsg.indiana.edu/hypermail/linux/kernel/0407.3/0994.html > Yup, due to my incomplete understanding of the jackd code, my initial measurements were measuring the time it took to run one process cycle (basically a NOOP if there are no clients), rather than the actual time jackd spent in poll() between cycles. This did have the effect of measuring the scueduler latency, but I believe it was being measured indirectly via cache effects - the longer it had been since jackd last ran, the colder the cachelines touched by the last cycle. Since I am using a patch to jackd to measure these latencies, which will be merged in the near future, it's more important for me that the patch accurately reflect the latencies jackd users will see than it is for the new, accurate results to be compatible with the old. All datasets for -O and earlier use the old code: http://krustophenia.net/testresults.php?dataset=2.6.8-rc3-O5 This one uses the initial version of the new code, measuring the time jackd spends in poll(). The bimodal distribution is due to my sound card having two different interrupt sources for capture and playback. So one of the spikes represent the elapsed time between the capture interrupt and the playback interrupt, and the other the time between the playback interrupt and the next capture interrupt: http://krustophenia.net/testresults.php?dataset=2.6.8.1-P0 -Q and later use the current method, which is like the above except the second hump is discarded, as it is a function of the scheduling latency and the period size rather than just the scheduling latency: http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6 So, don't be fooled by the numbers, the newest version of the patch is in fact the best. I have been meaning to go back and measure the current patches with the old code but it's pretty low priority... Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 6:26 ` Lee Revell @ 2004-09-03 6:36 ` Ingo Molnar 2004-09-03 6:49 ` Lee Revell 0 siblings, 1 reply; 73+ messages in thread From: Ingo Molnar @ 2004-09-03 6:36 UTC (permalink / raw) To: Lee Revell Cc: Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard * Lee Revell <rlrevell@joe-job.com> wrote: > -Q and later use the current method, which is like the above except > the second hump is discarded, as it is a function of the scheduling > latency and the period size rather than just the scheduling latency: > > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6 > > So, don't be fooled by the numbers, the newest version of the patch is > in fact the best. I have been meaning to go back and measure the > current patches with the old code but it's pretty low priority... vanilla kernel 2.6.8.1 would be quite interesting to get a few charts of - especially if your measurement methodology has changed. There's not much sense in re-testing older VP patches. also, has the userspace workload you are using stayed constant during all these tests? Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 6:36 ` Ingo Molnar @ 2004-09-03 6:49 ` Lee Revell 2004-09-03 7:01 ` Ingo Molnar 2004-09-03 7:05 ` Ingo Molnar 0 siblings, 2 replies; 73+ messages in thread From: Lee Revell @ 2004-09-03 6:49 UTC (permalink / raw) To: Ingo Molnar Cc: Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard On Fri, 2004-09-03 at 02:36, Ingo Molnar wrote: > * Lee Revell <rlrevell@joe-job.com> wrote: > > > -Q and later use the current method, which is like the above except > > the second hump is discarded, as it is a function of the scheduling > > latency and the period size rather than just the scheduling latency: > > > > http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-Q6 > > > > So, don't be fooled by the numbers, the newest version of the patch is > > in fact the best. I have been meaning to go back and measure the > > current patches with the old code but it's pretty low priority... > > vanilla kernel 2.6.8.1 would be quite interesting to get a few charts of > - especially if your measurement methodology has changed. OK, I will give this a shot. Now that the VP patches are stabilizing I will be doing more profiling. I also want to try the -mm kernel, this has some interesting differences from the stock kernel. For example I measured about a 10% improvement with the old method, which implies a big performance gain. > There's not > much sense in re-testing older VP patches. > Yup, my thoughts exactly, this would just tell us what we already know, that the latency gets better with each version. > also, has the userspace workload you are using stayed constant during > all these tests? > I am mostly just using normal desktop hacker workloads, web browsing, email, builds. Lately I am using the box as a Samba server. At first, I was stressing the system using every disk benchmark I could think of, but it never seemed to affect the worst case and did not even change the shape of the distribution much, so I don't bother. For all practical purposes, it's impossible to change the shape of these graphs much by stressing the system. I am able to induce large latencies by using up all available swap with make -j12 on a KDE program, and by pingflooding the broadcast address, but these are pathological enough that I have not worried about them. Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 6:49 ` Lee Revell @ 2004-09-03 7:01 ` Ingo Molnar 2004-09-03 7:05 ` Ingo Molnar 1 sibling, 0 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-03 7:01 UTC (permalink / raw) To: Lee Revell Cc: Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard * Lee Revell <rlrevell@joe-job.com> wrote: > > vanilla kernel 2.6.8.1 would be quite interesting to get a few charts of > > - especially if your measurement methodology has changed. > > OK, I will give this a shot. Now that the VP patches are stabilizing > I will be doing more profiling. I also want to try the -mm kernel, > this has some interesting differences from the stock kernel. For > example I measured about a 10% improvement with the old method, which > implies a big performance gain. the -mm kernel used to have additional *-latency-fix patches that were done based on the initial preemption-timing patch in -mm and partly based on early VP discussions and findings. I recently reviewed and merged the 2-3 missing ones into VP. Andrew has dropped these patches meanwhile and i expect to submit the cleaner and more complete solution that is in VP. So i'd expect -mm to still perform better than vanilla (it usually does), but if that big 10% difference in latencies doesnt show up anymore i'd attribute it to the shuffling around of latency related patches, not some genuine deficiency in -mm. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 6:49 ` Lee Revell 2004-09-03 7:01 ` Ingo Molnar @ 2004-09-03 7:05 ` Ingo Molnar 2004-09-03 7:40 ` Lee Revell 2004-09-03 18:09 ` K.R. Foley 1 sibling, 2 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-03 7:05 UTC (permalink / raw) To: Lee Revell Cc: Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard another question - any objections against me rebasing the VP patch to the current -mm tree and keeping it there exclusively until all possible merges are done? It would probably quite some work to create a complete patch for the upstream or BK tree during that process, as small patches start to flow in the VP => -mm => BK direction. Would any of the regular VP users/testers be wary to use the -mm tree? Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 7:05 ` Ingo Molnar @ 2004-09-03 7:40 ` Lee Revell 2004-09-03 7:50 ` Free Ekanayaka 2004-09-03 8:09 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Luke Yelavich 2004-09-03 18:09 ` K.R. Foley 1 sibling, 2 replies; 73+ messages in thread From: Lee Revell @ 2004-09-03 7:40 UTC (permalink / raw) To: Ingo Molnar Cc: Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, luke, free78 On Fri, 2004-09-03 at 03:05, Ingo Molnar wrote: > another question - any objections against me rebasing the VP patch to > the current -mm tree and keeping it there exclusively until all possible > merges are done? It would probably quite some work to create a complete > patch for the upstream or BK tree during that process, as small patches > start to flow in the VP => -mm => BK direction. Would any of the regular > VP users/testers be wary to use the -mm tree? > None here. Assuming the SMP issues are resolved, then the only remaining issues on UP will probably be with various drivers, those fixes should apply just as easily to -mm as vanilla. As far as I am concerned the VP patches are stable enough for the audio-centric distros to start distributing VP kernel packages, these will certainly be using the vanilla kernel. I think the PlanetCCRMA and AGNULA people are planning to start distributing test VP-kernel packages as soon as the patches stabilize. IIRC Nando is on vacation this week. I will make an announcement on LAD that as of R0 the VP patches should be stable and are ready for wider testing. You may want to wait until after the initial slew of bug reports before rebasing VP against MM. I suspect most of the problems with be driver specific, and most of the fixes will apply equally to -mm and vanilla. I have added Luke (AudioSlack), Free (AGNULA), and Nando (CCRMA) to the cc: list. They would be in the best position to answer your question. Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 7:40 ` Lee Revell @ 2004-09-03 7:50 ` Free Ekanayaka 2004-09-03 8:05 ` Lee Revell 2004-09-03 8:09 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Luke Yelavich 1 sibling, 1 reply; 73+ messages in thread From: Free Ekanayaka @ 2004-09-03 7:50 UTC (permalink / raw) To: Lee Revell Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, luke, free78 |--==> "LR" == Lee Revell <rlrevell@joe-job.com> writes: LR> On Fri, 2004-09-03 at 03:05, Ingo Molnar wrote: >>another question - any objections against me rebasing the VP patch to >>the current -mm tree and keeping it there exclusively until all possible >>merges are done? It would probably quite some work to create a complete >>patch for the upstream or BK tree during that process, as small patches >>start to flow in the VP => -mm => BK direction. Would any of the regular >>VP users/testers be wary to use the -mm tree? >> LR> None here. Assuming the SMP issues are resolved, then the only LR> remaining issues on UP will probably be with various drivers, those LR> fixes should apply just as easily to -mm as vanilla. LR> As far as I am concerned the VP patches are stable enough for the LR> audio-centric distros to start distributing VP kernel packages, these LR> will certainly be using the vanilla kernel. I think the PlanetCCRMA and LR> AGNULA people are planning to start distributing test VP-kernel packages LR> as soon as the patches stabilize. IIRC Nando is on vacation this week. LR> I will make an announcement on LAD that as of R0 the VP patches should LR> be stable and are ready for wider testing. You may want to wait until LR> after the initial slew of bug reports before rebasing VP against MM. I LR> suspect most of the problems with be driver specific, and most of the LR> fixes will apply equally to -mm and vanilla. LR> I have added Luke (AudioSlack), Free (AGNULA), and Nando (CCRMA) to the LR> cc: list. They would be in the best position to answer your question. Yes, you're right. We plan to provide test 2.6.x packages as soon as patches stabilise. Please let me know if you have some recommendation (configuration flags, additional patches, etc.). Cheers, Free ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 7:50 ` Free Ekanayaka @ 2004-09-03 8:05 ` Lee Revell 2004-09-03 9:05 ` Free Ekanayaka 2004-09-03 9:25 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar 0 siblings, 2 replies; 73+ messages in thread From: Lee Revell @ 2004-09-03 8:05 UTC (permalink / raw) To: Free Ekanayaka Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, luke, free78 On Fri, 2004-09-03 at 03:50, Free Ekanayaka wrote: > LR> As far as I am concerned the VP patches are stable enough for the > LR> audio-centric distros to start distributing VP kernel packages, these > LR> will certainly be using the vanilla kernel. I think the PlanetCCRMA and > LR> AGNULA people are planning to start distributing test VP-kernel packages > LR> as soon as the patches stabilize. IIRC Nando is on vacation this week. > > LR> I will make an announcement on LAD that as of R0 the VP patches should > LR> be stable and are ready for wider testing. You may want to wait until > LR> after the initial slew of bug reports before rebasing VP against MM. I > LR> suspect most of the problems with be driver specific, and most of the > LR> fixes will apply equally to -mm and vanilla. > > LR> I have added Luke (AudioSlack), Free (AGNULA), and Nando (CCRMA) to the > LR> cc: list. They would be in the best position to answer your question. > > Yes, you're right. We plan to provide test 2.6.x packages as soon as > patches stabilise. Please let me know if you have some recommendation > (configuration flags, additional patches, etc.). > As of -R0 it's definitely stable on UP and SMP users are reporting the same. All known problems should be fixed, and there are no known regressions. You should probably post a UP version and have your users test that before posting SMP packages, the latter are not quite as well tested. No other patches (ie various scheduler tweaks, CK) should be necessary, and in fact are not recommended because they might mask latency issues that we would rather fix. I use Debian unstable which should be pretty close to AGNULA, and for several weeks now I have been unable to produce an xrun in jack at 32 frames no matter what I throw at the machine. I actually have not had xrun debugging enabled in weeks because I don't get any xruns. Any problems at this point are most likely going to involve less common hardware, stuff the LKML testers don't have. Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 8:05 ` Lee Revell @ 2004-09-03 9:05 ` Free Ekanayaka 2004-09-03 9:25 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar 1 sibling, 0 replies; 73+ messages in thread From: Free Ekanayaka @ 2004-09-03 9:05 UTC (permalink / raw) To: Lee Revell Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, luke |--==> "LR" == Lee Revell <rlrevell@joe-job.com> writes: LR> On Fri, 2004-09-03 at 03:50, Free Ekanayaka wrote: LR> As far as I am concerned the VP patches are stable enough for the LR> audio-centric distros to start distributing VP kernel packages, these LR> will certainly be using the vanilla kernel. I think the PlanetCCRMA and LR> AGNULA people are planning to start distributing test VP-kernel packages LR> as soon as the patches stabilize. IIRC Nando is on vacation this week. >> LR> I will make an announcement on LAD that as of R0 the VP patches should LR> be stable and are ready for wider testing. You may want to wait until LR> after the initial slew of bug reports before rebasing VP against MM. I LR> suspect most of the problems with be driver specific, and most of the LR> fixes will apply equally to -mm and vanilla. >> LR> I have added Luke (AudioSlack), Free (AGNULA), and Nando (CCRMA) to the LR> cc: list. They would be in the best position to answer your question. >> >>Yes, you're right. We plan to provide test 2.6.x packages as soon as >>patches stabilise. Please let me know if you have some recommendation >>(configuration flags, additional patches, etc.). >> LR> As of -R0 it's definitely stable on UP and SMP users are reporting the LR> same. All known problems should be fixed, and there are no known LR> regressions. You should probably post a UP version and have your users LR> test that before posting SMP packages, the latter are not quite as well LR> tested. LR> No other patches (ie various scheduler tweaks, CK) should be necessary, LR> and in fact are not recommended because they might mask latency issues LR> that we would rather fix. LR> I use Debian unstable which should be pretty close to AGNULA, and for LR> several weeks now I have been unable to produce an xrun in jack at 32 LR> frames no matter what I throw at the machine. I actually have not had LR> xrun debugging enabled in weeks because I don't get any xruns. LR> Any problems at this point are most likely going to involve less common LR> hardware, stuff the LKML testers don't have. A/DeMuDi is based on a sarge snapshot (currently 17 July.. time to shift forward!), plus some bits of sid (especially up to date audio packages). Which kernel tree are you using? Have you built your kernel starting from the stock kernel-source-2.6.8 package along with the relevant debian patch? Cheers, Free ^ permalink raw reply [flat|nested] 73+ messages in thread
* [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 2004-09-03 8:05 ` Lee Revell 2004-09-03 9:05 ` Free Ekanayaka @ 2004-09-03 9:25 ` Ingo Molnar 2004-09-03 9:50 ` Luke Yelavich ` (2 more replies) 1 sibling, 3 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-03 9:25 UTC (permalink / raw) To: Lee Revell Cc: Free Ekanayaka, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, luke, free78 * Lee Revell <rlrevell@joe-job.com> wrote: > As of -R0 it's definitely stable on UP and SMP users are reporting the > same. All known problems should be fixed, and there are no known > regressions. You should probably post a UP version and have your > users test that before posting SMP packages, the latter are not quite > as well tested. Florian Schmidt reported a minor bug that prevents a successful build if !CONFIG_LATENCY_TRACE - i've uploaded -R1 that fixes this: http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R1 there are no other changes in -R1, and there are no known pending bugs currently. for a packaged kernel i'd suggest to enable all the CONFIG_PREEMPT_* values in the .config, including CONFIG_PREEMPT_TIMING. CONFIG_LATENCY_TRACE can add overhead if active, so while it would be useful for initial packages to enable this .config option, i'd suggest to turn it off by default by changing 'tracing_enabled = 1' to 'tracing_enabled = 0' in the patch. Then people can enable it and do precise tracing whenever they encounter a particular high latency on their system. configuring the threaded/nonthreaded properties of IRQ handlers can be tricky. Perhaps a script could scan /proc (and/or /sys) for audio interrupts and make them directly executed? Unfortunately audio interrupt handler requests dont seem to be going through some central ALSA function so i see no easy way to somehow tag audio interrupts automatically and provide some /proc flag to make audio interrupts non-threaded by default. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 2004-09-03 9:25 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar @ 2004-09-03 9:50 ` Luke Yelavich 2004-09-03 10:29 ` Ingo Molnar 2004-09-03 11:33 ` Thomas Charbonnel 2004-09-03 11:36 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R2 Ingo Molnar 2 siblings, 1 reply; 73+ messages in thread From: Luke Yelavich @ 2004-09-03 9:50 UTC (permalink / raw) To: Ingo Molnar Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, free78 On Fri, Sep 03, 2004 at 07:25:47PM EST, Ingo Molnar wrote: > > * Lee Revell <rlrevell@joe-job.com> wrote: > > > As of -R0 it's definitely stable on UP and SMP users are reporting the > > same. All known problems should be fixed, and there are no known > > regressions. You should probably post a UP version and have your > > users test that before posting SMP packages, the latter are not quite > > as well tested. > > Florian Schmidt reported a minor bug that prevents a successful build if > !CONFIG_LATENCY_TRACE - i've uploaded -R1 that fixes this: > > http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R1 > > there are no other changes in -R1, and there are no known pending bugs > currently. I think I might be having a problem here, that I didn't have with previous patches. Bare in mind that the previous patch I tested was against 2.6.8.1 vanilla. I am using a D-Link KVM here between my notebook and my desktop machine. It is the desktop I am currently testing these patches on, and the KVM requires a double-tap of the scroll lock key to switch between machines. With the latest R0 patch, something is not working when I attempt to change from my desktop to my notebook. The KVM usually lets out a beep when I can use the arrow keys to switch, but it isn't here. Adding to that, my console locks up totally for about 10 seconds, before allowing me to go on and type commands. I also seem to get some debugging output or a trace of some sort when rebooting, and the kernel panics with the message: (0)Kernel Panic - not syncing: Failed exception in interrupt Included is some dmesg output from beginning of bootup. Let me know if any more info is needed. -- Luke Yelavich http://www.audioslack.com luke@audioslack.com CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 CPU: Trace cache: 12K uops, L1 D cache: 8K CPU: L2 cache: 512K CPU: After all inits, caps: bfebfbff 00000000 00000000 00000080 CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz stepping 07 Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Checking 'hlt' instruction... OK. Badness in schedule at kernel/sched.c:2637 [<c0302fb0>] schedule+0x558/0x55d [<c0302ff5>] preempt_schedule+0x40/0x5f [<c010041f>] rest_init+0x67/0x73 [<c03da780>] start_kernel+0x195/0x1d4 [<c03da38c>] unknown_bootoption+0x0/0x147 (swapper/1): new 666 us maximum-latency critical section. => started at: <preempt_schedule+0x3b/0x5f> => ended at: <finish_task_switch+0x37/0x93> [<c012f01f>] check_preempt_timing+0x11b/0x1da [<c01103df>] finish_task_switch+0x37/0x93 [<c01103df>] finish_task_switch+0x37/0x93 [<c012f212>] sub_preempt_count+0x4d/0x65 [<c012f212>] sub_preempt_count+0x4d/0x65 [<c01103df>] finish_task_switch+0x37/0x93 [<c011045e>] schedule_tail+0x23/0x5c [<c0105f5e>] ret_from_fork+0x6/0x14 [<c010046b>] init+0x0/0x179 [<c0104210>] kernel_thread_helper+0x0/0xb Badness in schedule at kernel/sched.c:2637 [<c0302fb0>] schedule+0x558/0x55d [<c0302ff5>] preempt_schedule+0x40/0x5f [<c0110439>] finish_task_switch+0x91/0x93 [<c011045e>] schedule_tail+0x23/0x5c [<c0105f5e>] ret_from_fork+0x6/0x14 [<c010046b>] init+0x0/0x179 [<c0104210>] kernel_thread_helper+0x0/0xb NET: Registered protocol family 16 PCI: PCI BIOS revision 2.10 entry at 0xf1e60, last bus=2 (swapper/1): new 904 us maximum-latency critical section. => started at: <cond_resched+0xd/0x41> => ended at: <cond_resched+0xd/0x41> [<c012f01f>] check_preempt_timing+0x11b/0x1da [<c0303417>] cond_resched+0xd/0x41 [<c0303417>] cond_resched+0xd/0x41 [<c012f114>] touch_preempt_timing+0x36/0x3a [<c012f114>] touch_preempt_timing+0x36/0x3a [<c0303417>] cond_resched+0xd/0x41 [<c0138b6b>] kmem_cache_alloc+0x5f/0x61 [<c0119fe8>] __request_region+0x26/0xbd [<c03f1336>] pci_direct_init+0x3e/0x10e [<c03da807>] do_initcalls+0x2f/0xbc [<c01004c4>] init+0x59/0x179 [<c010046b>] init+0x0/0x179 [<c0104215>] kernel_thread_helper+0x5/0xb PCI: Using configuration type 1 Linux Plug and Play Support v0.97 (c) Adam Belay SCSI subsystem initialized usbcore: registered new driver usbfs usbcore: registered new driver hub <snip> get_random_bytes called before random driver initialization (swapper/1): new 6605 us maximum-latency critical section. => started at: <cond_resched+0xd/0x41> => ended at: <cond_resched+0xd/0x41> [<c012f01f>] check_preempt_timing+0x11b/0x1da [<c0303417>] cond_resched+0xd/0x41 [<c0303417>] cond_resched+0xd/0x41 [<c012f114>] touch_preempt_timing+0x36/0x3a [<c012f114>] touch_preempt_timing+0x36/0x3a [<c0303417>] cond_resched+0xd/0x41 [<c0138da7>] __kmalloc+0x89/0x90 [<c0183f6c>] proc_create+0x86/0xd9 [<c018416b>] create_proc_entry+0x66/0xc9 [<c03f1fcc>] dev_proc_init+0x2b/0xa3 [<c03f206c>] net_dev_init+0x28/0x17c [<c03f1ce4>] pcibios_init+0x65/0x7e [<c03da807>] do_initcalls+0x2f/0xbc [<c01004c4>] init+0x59/0x179 [<c010046b>] init+0x0/0x179 [<c0104215>] kernel_thread_helper+0x5/0xb (swapper/1): new 43385 us maximum-latency critical section. => started at: <cond_resched+0xd/0x41> => ended at: <cond_resched+0xd/0x41> [<c012f01f>] check_preempt_timing+0x11b/0x1da [<c0303417>] cond_resched+0xd/0x41 [<c0107b1d>] do_IRQ+0x14a/0x18e [<c0303417>] cond_resched+0xd/0x41 [<c012f114>] touch_preempt_timing+0x36/0x3a [<c012f114>] touch_preempt_timing+0x36/0x3a [<c0303417>] cond_resched+0xd/0x41 [<c0138b6b>] kmem_cache_alloc+0x5f/0x61 [<c0119fe8>] __request_region+0x26/0xbd [<c01efb09>] isapnp_next_rdp+0x66/0xa1 [<c03e6d2d>] isapnp_isolate_rdp_select+0x5b/0xc5 [<c03e6eda>] isapnp_isolate+0x143/0x1f8 [<c03e7c15>] isapnp_init+0xbb/0x2da [<c03da807>] do_initcalls+0x2f/0xbc [<c01004c4>] init+0x59/0x179 [<c010046b>] init+0x0/0x179 [<c0104215>] kernel_thread_helper+0x5/0xb (swapper/1): new 43492 us maximum-latency critical section. => started at: <cond_resched+0xd/0x41> => ended at: <cond_resched+0xd/0x41> [<c012f01f>] check_preempt_timing+0x11b/0x1da [<c0303417>] cond_resched+0xd/0x41 [<c0303417>] cond_resched+0xd/0x41 [<c012f114>] touch_preempt_timing+0x36/0x3a [<c012f114>] touch_preempt_timing+0x36/0x3a [<c0303417>] cond_resched+0xd/0x41 [<c0138b6b>] kmem_cache_alloc+0x5f/0x61 [<c0119fe8>] __request_region+0x26/0xbd [<c01efb09>] isapnp_next_rdp+0x66/0xa1 [<c03e6d2d>] isapnp_isolate_rdp_select+0x5b/0xc5 [<c03e6eda>] isapnp_isolate+0x143/0x1f8 [<c03e7c15>] isapnp_init+0xbb/0x2da [<c03da807>] do_initcalls+0x2f/0xbc [<c01004c4>] init+0x59/0x179 [<c010046b>] init+0x0/0x179 [<c0104215>] kernel_thread_helper+0x5/0xb (swapper/1): new 43561 us maximum-latency critical section. => started at: <cond_resched+0xd/0x41> => ended at: <cond_resched+0xd/0x41> [<c012f01f>] check_preempt_timing+0x11b/0x1da [<c0303417>] cond_resched+0xd/0x41 [<c0107b1d>] do_IRQ+0x14a/0x18e [<c0303417>] cond_resched+0xd/0x41 [<c012f114>] touch_preempt_timing+0x36/0x3a [<c012f114>] touch_preempt_timing+0x36/0x3a [<c0303417>] cond_resched+0xd/0x41 [<c0138b6b>] kmem_cache_alloc+0x5f/0x61 [<c0119fe8>] __request_region+0x26/0xbd [<c01efb09>] isapnp_next_rdp+0x66/0xa1 [<c03e6d2d>] isapnp_isolate_rdp_select+0x5b/0xc5 [<c03e6eda>] isapnp_isolate+0x143/0x1f8 [<c03e7c15>] isapnp_init+0xbb/0x2da [<c03da807>] do_initcalls+0x2f/0xbc [<c01004c4>] init+0x59/0x179 [<c010046b>] init+0x0/0x179 [<c0104215>] kernel_thread_helper+0x5/0xb isapnp: No Plug & Play device found requesting new irq thread for IRQ8... Real Time Clock Driver v1.12 Using anticipatory io scheduler <snip> requesting new irq thread for IRQ6... <snip> IRQ#6 thread started up. eth0: Tigon3 [partno(BCM95702A20) rev 1002 PHY(5703)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:e0:18:df:d5:b6 eth0: HostTXDS[1] RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1] requesting new irq thread for IRQ3... (dhcpcd/2226): new 115753 us maximum-latency critical section. => started at: <tg3_open+0xc9/0x263 [tg3]> => ended at: <tg3_open+0x122/0x263 [tg3]> [<c012f01f>] check_preempt_timing+0x11b/0x1da [<f88c805a>] tg3_open+0x122/0x263 [tg3] [<f88c805a>] tg3_open+0x122/0x263 [tg3] [<c012f212>] sub_preempt_count+0x4d/0x65 [<c012f212>] sub_preempt_count+0x4d/0x65 [<f88c805a>] tg3_open+0x122/0x263 [tg3] [<c029f2ad>] dev_open+0xcf/0xfe [<c02a30c1>] dev_mc_upload+0x3d/0x5d [<c02a0987>] dev_change_flags+0x5b/0x12c [<c029e930>] __dev_get_by_name+0xe/0xb9 [<c02e2d4b>] devinet_ioctl+0x247/0x5c1 [<c02e5066>] inet_ioctl+0x5f/0x9f [<c0301f9d>] packet_ioctl+0x14d/0x177 [<c0296a96>] sock_ioctl+0x112/0x2df [<c0161e06>] sys_ioctl+0x13a/0x2a0 [<c0106087>] syscall_call+0x7/0xb IRQ#3 thread started up. divert: not allocating divert_blk for non-ethernet device sit0 (IRQ 1/716): new 204218 us maximum-latency critical section. => started at: <__do_softirq+0x39/0x59> => ended at: <__do_softirq+0x4a/0x59> [<c012f01f>] check_preempt_timing+0x11b/0x1da [<c01186d0>] __do_softirq+0x4a/0x59 [<c01186d0>] __do_softirq+0x4a/0x59 [<c012f212>] sub_preempt_count+0x4d/0x65 [<c012f212>] sub_preempt_count+0x4d/0x65 [<c01186d0>] __do_softirq+0x4a/0x59 [<c01194d6>] do_irqd+0x71/0x91 [<c012819f>] kthread+0xaa/0xaf [<c0119465>] do_irqd+0x0/0x91 [<c01280f5>] kthread+0x0/0xaf [<c0104215>] kernel_thread_helper+0x5/0xb ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 2004-09-03 9:50 ` Luke Yelavich @ 2004-09-03 10:29 ` Ingo Molnar 2004-09-03 10:43 ` Luke Yelavich 0 siblings, 1 reply; 73+ messages in thread From: Ingo Molnar @ 2004-09-03 10:29 UTC (permalink / raw) To: Luke Yelavich Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, free78 * Luke Yelavich <luke@audioslack.com> wrote: > I think I might be having a problem here, that I didn't have with > previous patches. Bare in mind that the previous patch I tested was > against 2.6.8.1 vanilla. > > I am using a D-Link KVM here between my notebook and my desktop > machine. It is the desktop I am currently testing these patches on, > and the KVM requires a double-tap of the scroll lock key to switch > between machines. With the latest R0 patch, something is not working > when I attempt to change from my desktop to my notebook. The KVM > usually lets out a beep when I can use the arrow keys to switch, but > it isn't here. Adding to that, my console locks up totally for about > 10 seconds, before allowing me to go on and type commands. [...] i have a KVM too to two testsystems and unfortunately i cannot reproduce your problems neither with KVM (key-based-)switching nor with scroll lock. But this KVM uses a triple-key combination to switch, not scroll-lock. it's a PS2 keyboard, right? If yes then does: echo 0 > /proc/irq/1/i8042/threaded ( maybe also: echo 0 > /proc/irq/12/i8042/threaded ) fix the problem? The PS2 driver has been a bit unrobust when hardirq redirection is enabled. > [...] I also seem to get some debugging output or a trace of some sort > when rebooting, and the kernel panics with the message: (0)Kernel > Panic - not syncing: Failed exception in interrupt hm. Would be nice to get a serial console capture of this, if possible. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 2004-09-03 10:29 ` Ingo Molnar @ 2004-09-03 10:43 ` Luke Yelavich 0 siblings, 0 replies; 73+ messages in thread From: Luke Yelavich @ 2004-09-03 10:43 UTC (permalink / raw) To: Ingo Molnar Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, free78 On Fri, Sep 03, 2004 at 08:29:48PM EST, Ingo Molnar wrote: > * Luke Yelavich <luke@audioslack.com> wrote: > > I am using a D-Link KVM here between my notebook and my desktop > > machine. It is the desktop I am currently testing these patches on, > > and the KVM requires a double-tap of the scroll lock key to switch > > between machines. With the latest R0 patch, something is not working > > when I attempt to change from my desktop to my notebook. The KVM > > usually lets out a beep when I can use the arrow keys to switch, but > > it isn't here. Adding to that, my console locks up totally for about > > 10 seconds, before allowing me to go on and type commands. [...] > > i have a KVM too to two testsystems and unfortunately i cannot reproduce > your problems neither with KVM (key-based-)switching nor with scroll > lock. But this KVM uses a triple-key combination to switch, not > scroll-lock. > > it's a PS2 keyboard, right? Yes, that is correct. > If yes then does: > > echo 0 > /proc/irq/1/i8042/threaded > ( maybe also: echo 0 > /proc/irq/12/i8042/threaded ) > > fix the problem? The PS2 driver has been a bit unrobust when hardirq > redirection is enabled. I only had to turn off IRQ threading for 1, and all is well again. > > > [...] I also seem to get some debugging output or a trace of some sort > > when rebooting, and the kernel panics with the message: (0)Kernel > > Panic - not syncing: Failed exception in interrupt > > hm. Would be nice to get a serial console capture of this, if possible. I am pretty sure I have a cable, however my desktop is the only computer in this room that has serial ports, and I also don't know how to set up a serial console. If you feel that this is important, I will see what I can do. -- Luke Yelavich http://www.audioslack.com luke@audioslack.com ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 2004-09-03 9:25 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar 2004-09-03 9:50 ` Luke Yelavich @ 2004-09-03 11:33 ` Thomas Charbonnel 2004-09-03 11:49 ` Ingo Molnar 2004-09-03 11:36 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R2 Ingo Molnar 2 siblings, 1 reply; 73+ messages in thread From: Thomas Charbonnel @ 2004-09-03 11:33 UTC (permalink / raw) To: Ingo Molnar Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, luke, free78 Ingo Molnar wrote : > Florian Schmidt reported a minor bug that prevents a successful build if > !CONFIG_LATENCY_TRACE - i've uploaded -R1 that fixes this: > > http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R1 > I still get > 170 us latency from rtl8139 : http://www.undata.org/~thomas/R1_rtl8139.trace And again this one : preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1 -------------------------------------------------- latency: 597 us, entries: 12 (12) ----------------- | task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0 ----------------- => started at: smp_apic_timer_interrupt+0x32/0xd0 => ended at: smp_apic_timer_interrupt+0x86/0xd0 =======> 00010000 0.000ms (+0.000ms): smp_apic_timer_interrupt (apic_timer_interrupt) 00010000 0.000ms (+0.000ms): profile_tick (smp_apic_timer_interrupt) 00010000 0.000ms (+0.000ms): profile_hook (profile_tick) 00010001 0.000ms (+0.595ms): notifier_call_chain (profile_hook) 00010000 0.595ms (+0.000ms): do_nmi (mcount) 00020000 0.596ms (+0.000ms): profile_tick (nmi_watchdog_tick) 00020000 0.596ms (+0.000ms): profile_hook (profile_tick) 00020001 0.597ms (+0.000ms): notifier_call_chain (profile_hook) 00020000 0.597ms (+689953.444ms): profile_hit (nmi_watchdog_tick) 00010001 689954.042ms (+1.141ms): update_process_times (do_timer) 00000001 0.597ms (+0.000ms): sub_preempt_count (smp_apic_timer_interrupt) 00000001 0.598ms (+0.000ms): update_max_trace (check_preempt_timing) Thomas ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 2004-09-03 11:33 ` Thomas Charbonnel @ 2004-09-03 11:49 ` Ingo Molnar 2004-09-03 12:05 ` Thomas Charbonnel 2004-09-03 16:14 ` Thomas Charbonnel 0 siblings, 2 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-03 11:49 UTC (permalink / raw) To: Thomas Charbonnel Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, luke, free78 [-- Attachment #1: Type: text/plain, Size: 2395 bytes --] * Thomas Charbonnel <thomas@undata.org> wrote: > I still get > 170 us latency from rtl8139 : > http://www.undata.org/~thomas/R1_rtl8139.trace this is a single-packet latency, we wont get much lower than this with the current techniques. Disabling ip_conntrack and tracing ought to lower the real latency somewhat. > And again this one : > preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1 > -------------------------------------------------- > latency: 597 us, entries: 12 (12) > ----------------- > | task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0 > ----------------- > => started at: smp_apic_timer_interrupt+0x32/0xd0 > => ended at: smp_apic_timer_interrupt+0x86/0xd0 > =======> > 00010000 0.000ms (+0.000ms): smp_apic_timer_interrupt (apic_timer_interrupt) > 00010000 0.000ms (+0.000ms): profile_tick (smp_apic_timer_interrupt) > 00010000 0.000ms (+0.000ms): profile_hook (profile_tick) > 00010001 0.000ms (+0.595ms): notifier_call_chain (profile_hook) > 00010000 0.595ms (+0.000ms): do_nmi (mcount) > 00020000 0.596ms (+0.000ms): profile_tick (nmi_watchdog_tick) > 00020000 0.596ms (+0.000ms): profile_hook (profile_tick) > 00020001 0.597ms (+0.000ms): notifier_call_chain (profile_hook) > 00020000 0.597ms (+689953.444ms): profile_hit (nmi_watchdog_tick) > 00010001 689954.042ms (+1.141ms): update_process_times (do_timer) > 00000001 0.597ms (+0.000ms): sub_preempt_count (smp_apic_timer_interrupt) > 00000001 0.598ms (+0.000ms): update_max_trace (check_preempt_timing) this is a pretty weird one. First it shows an apparently non-monotonic RDTSC: the jump forward and backward in time around profile_hit. I suspect the real RDTSC value was lower than the previous one and caused an underflow. What is your cpu_khz in /proc/cpuinfo? the other weird one is the +0.595 usec entry at notifier_call_chain(). That code is just a couple of instructions, so no real for any overhead there. could you try the attached robust-get-cycles.patch ontop of your current tree and see whether it impacts these weirdnesses? The patch makes sure that the cycle counter is sane: two subsequent readings of it were monotonic and less than 1000 cycles apart. this patch probably wont remove the +0.595 msec latency though - the RDTSC value jumped forward there permanently. Maybe the RDTSC value is somehow corrupted by NMIs - could you turn off the NMI watchdog to check? Ingo [-- Attachment #2: robust-get-cycles.patch --] [-- Type: text/plain, Size: 1594 bytes --] --- linux/kernel/latency.c.orig2 +++ linux/kernel/latency.c @@ -66,6 +66,18 @@ static unsigned long notrace cycles_to_u return (unsigned long) delta; } +static cycles_t notrace robust_get_cycles(void) +{ + cycles_t t0 = get_cycles(), t1; + + for (;;) { + t1 = get_cycles(); + if (t1 - t0 < 1000) + return t1; + t0 = t1; + } +} + #ifdef CONFIG_LATENCY_TRACE unsigned int trace_enabled = 1; @@ -89,7 +101,7 @@ ____trace(struct cpu_trace *tr, unsigned entry = tr->trace + tr->trace_idx; entry->eip = eip; entry->parent_eip = parent_eip; - entry->timestamp = get_cycles(); + entry->timestamp = robust_get_cycles(); entry->preempt_count = preempt_count(); } tr->trace_idx++; @@ -295,7 +307,7 @@ check_preempt_timing(struct cpu_trace *t return; #endif atomic_inc(&tr->disabled); - latency = cycles_to_usecs(get_cycles() - tr->preempt_timestamp); + latency = cycles_to_usecs(robust_get_cycles() - tr->preempt_timestamp); if (preempt_thresh) { if (latency < preempt_thresh) @@ -337,7 +349,7 @@ check_preempt_timing(struct cpu_trace *t out: #ifdef CONFIG_LATENCY_TRACE tr->trace_idx = 0; - tr->preempt_timestamp = get_cycles(); + tr->preempt_timestamp = robust_get_cycles(); #endif tr->critical_start = parent_eip; __trace(eip, parent_eip); @@ -376,7 +388,7 @@ void notrace add_preempt_count(int val) struct cpu_trace *tr = &__get_cpu_var(trace); local_irq_save(flags); - tr->preempt_timestamp = get_cycles(); + tr->preempt_timestamp = robust_get_cycles(); tr->critical_start = eip; #ifdef CONFIG_LATENCY_TRACE tr->trace_idx = 0; ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 2004-09-03 11:49 ` Ingo Molnar @ 2004-09-03 12:05 ` Thomas Charbonnel 2004-09-03 16:14 ` Thomas Charbonnel 1 sibling, 0 replies; 73+ messages in thread From: Thomas Charbonnel @ 2004-09-03 12:05 UTC (permalink / raw) To: Ingo Molnar Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, luke, free78 Ingo Molnar wrote : > * Thomas Charbonnel <thomas@undata.org> wrote: > > > I still get > 170 us latency from rtl8139 : > > http://www.undata.org/~thomas/R1_rtl8139.trace > > this is a single-packet latency, we wont get much lower than this with > the current techniques. Disabling ip_conntrack and tracing ought to > lower the real latency somewhat. > Ok, I'll do that. > > And again this one : > > preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1 > > -------------------------------------------------- > > latency: 597 us, entries: 12 (12) > > ----------------- > > | task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0 > > ----------------- > > => started at: smp_apic_timer_interrupt+0x32/0xd0 > > => ended at: smp_apic_timer_interrupt+0x86/0xd0 > > =======> > > 00010000 0.000ms (+0.000ms): smp_apic_timer_interrupt (apic_timer_interrupt) > > 00010000 0.000ms (+0.000ms): profile_tick (smp_apic_timer_interrupt) > > 00010000 0.000ms (+0.000ms): profile_hook (profile_tick) > > 00010001 0.000ms (+0.595ms): notifier_call_chain (profile_hook) > > 00010000 0.595ms (+0.000ms): do_nmi (mcount) > > 00020000 0.596ms (+0.000ms): profile_tick (nmi_watchdog_tick) > > 00020000 0.596ms (+0.000ms): profile_hook (profile_tick) > > 00020001 0.597ms (+0.000ms): notifier_call_chain (profile_hook) > > 00020000 0.597ms (+689953.444ms): profile_hit (nmi_watchdog_tick) > > 00010001 689954.042ms (+1.141ms): update_process_times (do_timer) > > 00000001 0.597ms (+0.000ms): sub_preempt_count (smp_apic_timer_interrupt) > > 00000001 0.598ms (+0.000ms): update_max_trace (check_preempt_timing) > > this is a pretty weird one. First it shows an apparently non-monotonic > RDTSC: the jump forward and backward in time around profile_hit. I > suspect the real RDTSC value was lower than the previous one and caused > an underflow. What is your cpu_khz in /proc/cpuinfo? > root@satellite thomas # cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 11 model name : Intel(R) Pentium(R) III Mobile CPU 1000MHz stepping : 1 cpu MHz : 996.879 cache size : 512 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse bogomips : 1966.08 > the other weird one is the +0.595 usec entry at notifier_call_chain(). > That code is just a couple of instructions, so no real for any overhead > there. > > could you try the attached robust-get-cycles.patch ontop of your current > tree and see whether it impacts these weirdnesses? The patch makes sure > that the cycle counter is sane: two subsequent readings of it were > monotonic and less than 1000 cycles apart. > > this patch probably wont remove the +0.595 msec latency though - the > RDTSC value jumped forward there permanently. Maybe the RDTSC value is > somehow corrupted by NMIs - could you turn off the NMI watchdog to > check? > > Ingo I precisely enabled the NMI watchdog to track those weird latencies down. My guess is still that when ACPI is enabled my bios does something funky with SMM/SMI that increments the TSC. I'll try the patch and let you know. Thomas ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 2004-09-03 11:49 ` Ingo Molnar 2004-09-03 12:05 ` Thomas Charbonnel @ 2004-09-03 16:14 ` Thomas Charbonnel 2004-09-03 17:36 ` Thomas Charbonnel 1 sibling, 1 reply; 73+ messages in thread From: Thomas Charbonnel @ 2004-09-03 16:14 UTC (permalink / raw) To: Ingo Molnar Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, luke, free78 Ingo Molnar wrote : > this is a pretty weird one. First it shows an apparently non-monotonic > RDTSC: the jump forward and backward in time around profile_hit. I > suspect the real RDTSC value was lower than the previous one and caused > an underflow. What is your cpu_khz in /proc/cpuinfo? > > the other weird one is the +0.595 usec entry at notifier_call_chain(). > That code is just a couple of instructions, so no real for any overhead > there. > > could you try the attached robust-get-cycles.patch ontop of your current > tree and see whether it impacts these weirdnesses? The patch makes sure > that the cycle counter is sane: two subsequent readings of it were > monotonic and less than 1000 cycles apart. > > this patch probably wont remove the +0.595 msec latency though - the > RDTSC value jumped forward there permanently. Maybe the RDTSC value is > somehow corrupted by NMIs - could you turn off the NMI watchdog to > check? Here are more traces with robust-get-cycles applied. So far no non-monotonic issue. preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1 -------------------------------------------------- latency: 626 us, entries: 25 (25) ----------------- | task: ksoftirqd/0/2, uid:0 nice:-10 policy:0 rt_prio:0 ----------------- => started at: preempt_schedule+0x3b/0x60 => ended at: schedule+0x2e4/0x570 =======> 04000000 0.000ms (+0.000ms): preempt_schedule (need_resched) 04000000 0.000ms (+0.000ms): schedule (preempt_schedule) 04000001 0.000ms (+0.620ms): sched_clock (schedule) 04010001 0.620ms (+0.000ms): do_nmi (sched_clock) 04010001 0.621ms (+0.000ms): profile_tick (nmi_watchdog_tick) 04010001 0.622ms (+0.000ms): profile_hook (profile_tick) 04010002 0.622ms (+0.000ms): notifier_call_chain (profile_hook) 04010001 0.622ms (+0.000ms): preempt_schedule (profile_tick) 04010001 0.622ms (+0.000ms): profile_hit (nmi_watchdog_tick) 04000001 0.623ms (+0.000ms): smp_apic_timer_interrupt (sched_clock) 04010001 0.623ms (+0.000ms): profile_tick (smp_apic_timer_interrupt) 04010001 0.623ms (+0.000ms): profile_hook (profile_tick) 04010002 0.623ms (+0.000ms): notifier_call_chain (profile_hook) 04010001 0.624ms (+0.000ms): preempt_schedule (profile_tick) 04010001 0.624ms (+0.000ms): profile_hit (smp_apic_timer_interrupt) 04000002 0.624ms (+0.000ms): do_softirq (smp_apic_timer_interrupt) 04000002 0.624ms (+0.000ms): __do_softirq (do_softirq) 04000002 0.625ms (+0.000ms): dequeue_task (schedule) 04000002 0.625ms (+0.000ms): recalc_task_prio (schedule) 04000002 0.625ms (+0.000ms): effective_prio (recalc_task_prio) 04000002 0.625ms (+0.000ms): enqueue_task (schedule) 00000002 0.626ms (+0.000ms): __switch_to (schedule) 00000002 0.626ms (+0.000ms): finish_task_switch (schedule) 00000001 0.626ms (+0.000ms): sub_preempt_count (schedule) 00000001 0.627ms (+0.000ms): update_max_trace (check_preempt_timing) preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1 -------------------------------------------------- latency: 460 us, entries: 38 (38) ----------------- | task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0 ----------------- => started at: do_IRQ+0x19/0x190 => ended at: do_IRQ+0x14a/0x190 =======> 00010000 0.000ms (+0.000ms): do_IRQ (common_interrupt) 00010000 0.000ms (+0.000ms): do_IRQ (default_idle) 00010000 0.000ms (+0.000ms): do_IRQ (<00000000>) 00010001 0.000ms (+0.444ms): mask_and_ack_8259A (do_IRQ) 00020002 0.444ms (+0.000ms): do_nmi (mask_and_ack_8259A) 00020002 0.445ms (+0.000ms): profile_tick (nmi_watchdog_tick) 00020002 0.446ms (+0.000ms): profile_hook (profile_tick) 00020003 0.446ms (+0.000ms): notifier_call_chain (profile_hook) 00020002 0.446ms (+0.001ms): profile_hit (nmi_watchdog_tick) 00010001 0.448ms (+0.000ms): generic_redirect_hardirq (do_IRQ) 00010000 0.448ms (+0.000ms): generic_handle_IRQ_event (do_IRQ) 00010000 0.448ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event) 00010001 0.448ms (+0.005ms): mark_offset_tsc (timer_interrupt) 00010001 0.454ms (+0.000ms): do_timer (timer_interrupt) 00010001 0.454ms (+0.000ms): update_process_times (do_timer) 00010001 0.454ms (+0.000ms): update_one_process (update_process_times) 00010001 0.455ms (+0.000ms): run_local_timers (update_process_times) 00010001 0.455ms (+0.000ms): raise_softirq (update_process_times) 00010001 0.455ms (+0.000ms): scheduler_tick (update_process_times) 00010001 0.455ms (+0.000ms): sched_clock (scheduler_tick) 00010001 0.456ms (+0.000ms): update_wall_time (do_timer) 00010001 0.456ms (+0.000ms): update_wall_time_one_tick (update_wall_time) 00010001 0.456ms (+0.000ms): generic_note_interrupt (do_IRQ) 00010001 0.456ms (+0.000ms): end_8259A_irq (do_IRQ) 00010001 0.457ms (+0.001ms): enable_8259A_irq (do_IRQ) 00000001 0.458ms (+0.000ms): do_softirq (do_IRQ) 00000001 0.458ms (+0.000ms): __do_softirq (do_softirq) 00000001 0.458ms (+0.000ms): wake_up_process (do_softirq) 00000001 0.458ms (+0.000ms): try_to_wake_up (wake_up_process) 00000001 0.458ms (+0.000ms): task_rq_lock (try_to_wake_up) 00000002 0.459ms (+0.000ms): activate_task (try_to_wake_up) 00000002 0.459ms (+0.000ms): sched_clock (activate_task) 00000002 0.459ms (+0.000ms): recalc_task_prio (activate_task) 00000002 0.459ms (+0.000ms): effective_prio (recalc_task_prio) 00000002 0.459ms (+0.000ms): enqueue_task (activate_task) 00000001 0.460ms (+0.000ms): preempt_schedule (try_to_wake_up) 00000001 0.460ms (+0.000ms): sub_preempt_count (do_IRQ) 00000001 0.460ms (+0.000ms): update_max_trace (check_preempt_timing) preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1 -------------------------------------------------- latency: 557 us, entries: 39 (39) ----------------- | task: evolution-1.4/5492, uid:1000 nice:0 policy:0 rt_prio:0 ----------------- => started at: fget+0x32/0x80 => ended at: fget+0x58/0x80 =======> 00000001 0.000ms (+0.537ms): fget (do_pollfd) 00010001 0.537ms (+0.000ms): do_nmi (sub_preempt_count) 00010001 0.538ms (+0.000ms): profile_tick (nmi_watchdog_tick) 00010001 0.538ms (+0.000ms): profile_hook (profile_tick) 00010002 0.539ms (+0.000ms): notifier_call_chain (profile_hook) 00010001 0.539ms (+0.001ms): profile_hit (nmi_watchdog_tick) 00010001 0.540ms (+0.000ms): do_IRQ (sub_preempt_count) 00010001 0.540ms (+0.000ms): do_IRQ (<00000000>) 00010002 0.541ms (+0.002ms): mask_and_ack_8259A (do_IRQ) 00010002 0.543ms (+0.000ms): generic_redirect_hardirq (do_IRQ) 00010001 0.543ms (+0.000ms): generic_handle_IRQ_event (do_IRQ) 00010001 0.543ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event) 00010002 0.544ms (+0.005ms): mark_offset_tsc (timer_interrupt) 00010002 0.549ms (+0.000ms): do_timer (timer_interrupt) 00010002 0.550ms (+0.000ms): update_process_times (do_timer) 00010002 0.550ms (+0.000ms): update_one_process (update_process_times) 00010002 0.550ms (+0.000ms): run_local_timers (update_process_times) 00010002 0.550ms (+0.000ms): raise_softirq (update_process_times) 00010002 0.551ms (+0.000ms): scheduler_tick (update_process_times) 00010002 0.551ms (+0.000ms): sched_clock (scheduler_tick) 00010003 0.551ms (+0.000ms): task_timeslice (scheduler_tick) 00010002 0.552ms (+0.000ms): update_wall_time (do_timer) 00010002 0.552ms (+0.000ms): update_wall_time_one_tick (update_wall_time) 00010002 0.552ms (+0.000ms): generic_note_interrupt (do_IRQ) 00010002 0.553ms (+0.000ms): end_8259A_irq (do_IRQ) 00010002 0.553ms (+0.001ms): enable_8259A_irq (do_IRQ) 00000002 0.554ms (+0.000ms): do_softirq (do_IRQ) 00000002 0.554ms (+0.000ms): __do_softirq (do_softirq) 00000002 0.554ms (+0.000ms): wake_up_process (do_softirq) 00000002 0.555ms (+0.000ms): try_to_wake_up (wake_up_process) 00000002 0.555ms (+0.000ms): task_rq_lock (try_to_wake_up) 00000003 0.555ms (+0.000ms): activate_task (try_to_wake_up) 00000003 0.555ms (+0.000ms): sched_clock (activate_task) 00000003 0.555ms (+0.000ms): recalc_task_prio (activate_task) 00000003 0.556ms (+0.000ms): effective_prio (recalc_task_prio) 00000003 0.556ms (+0.000ms): enqueue_task (activate_task) 00000002 0.556ms (+0.000ms): preempt_schedule (try_to_wake_up) 00000001 0.557ms (+0.000ms): sub_preempt_count (fget) 00000001 0.557ms (+0.000ms): update_max_trace (check_preempt_timing) I had similar problems affecting apparently random code paths before enabling nmi_watchdog, but if you feel it can be useful, I can do some more tests with robust-get-cycles and without nmi. Thomas ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 2004-09-03 16:14 ` Thomas Charbonnel @ 2004-09-03 17:36 ` Thomas Charbonnel 0 siblings, 0 replies; 73+ messages in thread From: Thomas Charbonnel @ 2004-09-03 17:36 UTC (permalink / raw) To: Ingo Molnar Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, luke, free78 I wrote : > Ingo Molnar wrote : > > this is a pretty weird one. First it shows an apparently non-monotonic > > RDTSC: the jump forward and backward in time around profile_hit. I > > suspect the real RDTSC value was lower than the previous one and caused > > an underflow. What is your cpu_khz in /proc/cpuinfo? > > > > the other weird one is the +0.595 usec entry at notifier_call_chain(). > > That code is just a couple of instructions, so no real for any overhead > > there. > > > > could you try the attached robust-get-cycles.patch ontop of your current > > tree and see whether it impacts these weirdnesses? The patch makes sure > > that the cycle counter is sane: two subsequent readings of it were > > monotonic and less than 1000 cycles apart. > > > > this patch probably wont remove the +0.595 msec latency though - the > > RDTSC value jumped forward there permanently. Maybe the RDTSC value is > > somehow corrupted by NMIs - could you turn off the NMI watchdog to > > check? > > Here are more traces with robust-get-cycles applied. So far no > non-monotonic issue. In the end here's one : preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1 -------------------------------------------------- latency: 891 us, entries: 38 (38) ----------------- | task: swapper/0, uid:0 nice:0 policy:0 rt_prio:0 ----------------- => started at: do_IRQ+0x19/0x190 => ended at: do_IRQ+0x14a/0x190 =======> 00010000 0.000ms (+0.876ms): do_nmi (robust_get_cycles) 00020000 0.876ms (+0.000ms): profile_tick (nmi_watchdog_tick) 00020000 0.876ms (+0.000ms): profile_hook (profile_tick) 00020001 0.876ms (+0.000ms): notifier_call_chain (profile_hook) 00020000 0.876ms (+689952.730ms): profile_hit (nmi_watchdog_tick) 04000002 689953.607ms (+1.855ms): finish_task_switch (schedule) 00010000 0.877ms (+0.000ms): do_IRQ (default_idle) 00010000 0.877ms (+0.000ms): do_IRQ (<00000000>) 00010001 0.877ms (+0.002ms): mask_and_ack_8259A (do_IRQ) 00010001 0.879ms (+0.000ms): generic_redirect_hardirq (do_IRQ) 00010000 0.880ms (+0.000ms): generic_handle_IRQ_event (do_IRQ) 00010000 0.880ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event) 00010001 0.880ms (+0.005ms): mark_offset_tsc (timer_interrupt) 00010001 0.886ms (+0.000ms): do_timer (timer_interrupt) 00010001 0.886ms (+0.000ms): update_process_times (do_timer) 00010001 0.886ms (+0.000ms): update_one_process (update_process_times) 00010001 0.886ms (+0.000ms): run_local_timers (update_process_times) 00010001 0.886ms (+0.000ms): raise_softirq (update_process_times) 00010001 0.887ms (+0.000ms): scheduler_tick (update_process_times) 00010001 0.887ms (+0.000ms): sched_clock (scheduler_tick) 00010001 0.887ms (+0.000ms): update_wall_time (do_timer) 00010001 0.887ms (+0.000ms): update_wall_time_one_tick (update_wall_time) 00010001 0.888ms (+0.000ms): generic_note_interrupt (do_IRQ) 00010001 0.888ms (+0.000ms): end_8259A_irq (do_IRQ) 00010001 0.888ms (+0.001ms): enable_8259A_irq (do_IRQ) 00000001 0.889ms (+0.000ms): do_softirq (do_IRQ) 00000001 0.889ms (+0.000ms): __do_softirq (do_softirq) 00000001 0.889ms (+0.000ms): wake_up_process (do_softirq) 00000001 0.890ms (+0.000ms): try_to_wake_up (wake_up_process) 00000001 0.890ms (+0.000ms): task_rq_lock (try_to_wake_up) 00000002 0.890ms (+0.000ms): activate_task (try_to_wake_up) 00000002 0.890ms (+0.000ms): sched_clock (activate_task) 00000002 0.890ms (+0.000ms): recalc_task_prio (activate_task) 00000002 0.891ms (+0.000ms): effective_prio (recalc_task_prio) 00000002 0.891ms (+0.000ms): enqueue_task (activate_task) 00000001 0.891ms (+0.000ms): preempt_schedule (try_to_wake_up) 00000001 0.892ms (+0.000ms): sub_preempt_count (do_IRQ) 00000001 0.892ms (+0.000ms): update_max_trace (check_preempt_timing) ^ permalink raw reply [flat|nested] 73+ messages in thread
* [patch] voluntary-preempt-2.6.9-rc1-bk4-R2 2004-09-03 9:25 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar 2004-09-03 9:50 ` Luke Yelavich 2004-09-03 11:33 ` Thomas Charbonnel @ 2004-09-03 11:36 ` Ingo Molnar 2 siblings, 0 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-03 11:36 UTC (permalink / raw) To: linux-kernel Cc: Free Ekanayaka, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, luke, free78, Lee Revell i've uploaded -R2: http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R2 no function changes, only fixing some rough edges: it fixes an UP boot-time assert that occurs right after 'checking hlt'. I also fixed PREEMPT_TIMING + !LATENCY_TRACE build bugs, and another boot bug that occurs when PREEMPT_TIMING + !FRAME_POINTERS is used. (the reboot assert i'm not sure about - tried to reproduce but here reboot works fine. Will need some sort of serial log to debug this.) Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 7:40 ` Lee Revell 2004-09-03 7:50 ` Free Ekanayaka @ 2004-09-03 8:09 ` Luke Yelavich 2004-09-03 8:13 ` Lee Revell 1 sibling, 1 reply; 73+ messages in thread From: Luke Yelavich @ 2004-09-03 8:09 UTC (permalink / raw) To: Lee Revell Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, free78 On Fri, Sep 03, 2004 at 05:40:34PM EST, Lee Revell wrote: > On Fri, 2004-09-03 at 03:05, Ingo Molnar wrote: > > another question - any objections against me rebasing the VP patch to > > the current -mm tree and keeping it there exclusively until all possible > > merges are done? It would probably quite some work to create a complete > > patch for the upstream or BK tree during that process, as small patches > > start to flow in the VP => -mm => BK direction. Would any of the regular > > VP users/testers be wary to use the -mm tree? > > > > None here. Assuming the SMP issues are resolved, then the only > remaining issues on UP will probably be with various drivers, those > fixes should apply just as easily to -mm as vanilla. > > As far as I am concerned the VP patches are stable enough for the > audio-centric distros to start distributing VP kernel packages, these > will certainly be using the vanilla kernel. I think the PlanetCCRMA and > AGNULA people are planning to start distributing test VP-kernel packages > as soon as the patches stabilize. IIRC Nando is on vacation this week. I certainly have intensions of making test kernels available in my repository in a separate testing branch. I don't really mind for myself what tree the patch is based on, but I feel that the patches may get more testing if they are against vanilla, rather than against another tree, as users may consider the patch to be more stable. On the other hand, it is not hard for packagers like myself to do the leg work, and we can also provide a pre-patched kernel source for those who still want to roll their own. > I will make an announcement on LAD that as of R0 the VP patches should > be stable and are ready for wider testing. You may want to wait until > after the initial slew of bug reports before rebasing VP against MM. I > suspect most of the problems with be driver specific, and most of the > fixes will apply equally to -mm and vanilla. Well with Lee's help, I think I have identified an ICE1712 sound driver issue, but this is yet to be determined. > I have added Luke (AudioSlack), Free (AGNULA), and Nando (CCRMA) to the > cc: list. They would be in the best position to answer your question. I will go with whatever is decided, as I am prepared to do the legwork for my repository users. -- Luke Yelavich http://www.audioslack.com luke@audioslack.com ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 8:09 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Luke Yelavich @ 2004-09-03 8:13 ` Lee Revell 2004-09-03 8:21 ` Luke Yelavich 2004-09-03 12:52 ` Luke Yelavich 0 siblings, 2 replies; 73+ messages in thread From: Lee Revell @ 2004-09-03 8:13 UTC (permalink / raw) To: Luke Yelavich Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, free78 On Fri, 2004-09-03 at 04:09, Luke Yelavich wrote: > On Fri, Sep 03, 2004 at 05:40:34PM EST, Lee Revell wrote: > > On Fri, 2004-09-03 at 03:05, Ingo Molnar wrote: > Well with Lee's help, I think I have identified an ICE1712 sound driver issue, > but this is yet to be determined. Hmm, this one is still not fixed, using the latest VP patches? What are the symptoms again? Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 8:13 ` Lee Revell @ 2004-09-03 8:21 ` Luke Yelavich 2004-09-03 12:52 ` Luke Yelavich 1 sibling, 0 replies; 73+ messages in thread From: Luke Yelavich @ 2004-09-03 8:21 UTC (permalink / raw) To: Lee Revell Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, free78 On Fri, Sep 03, 2004 at 06:13:16PM EST, Lee Revell wrote: > > On Fri, Sep 03, 2004 at 05:40:34PM EST, Lee Revell wrote: > > > On Fri, 2004-09-03 at 03:05, Ingo Molnar wrote: > > Well with Lee's help, I think I have identified an ICE1712 sound driver issue, > > but this is yet to be determined. > > Hmm, this one is still not fixed, using the latest VP patches? I have been keeping up with the patches, but haven't had a chance to try it out with them lately. Currently building the latest one, and will be able to test again when done. -- Luke Yelavich http://www.audioslack.com luke@audioslack.com ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 8:13 ` Lee Revell 2004-09-03 8:21 ` Luke Yelavich @ 2004-09-03 12:52 ` Luke Yelavich 1 sibling, 0 replies; 73+ messages in thread From: Luke Yelavich @ 2004-09-03 12:52 UTC (permalink / raw) To: Lee Revell Cc: Ingo Molnar, Eric St-Laurent, linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard, nando, free78 On Fri, Sep 03, 2004 at 06:13:16PM EST, Lee Revell wrote: > On Fri, 2004-09-03 at 04:09, Luke Yelavich wrote: > > On Fri, Sep 03, 2004 at 05:40:34PM EST, Lee Revell wrote: > > > On Fri, 2004-09-03 at 03:05, Ingo Molnar wrote: > > Well with Lee's help, I think I have identified an ICE1712 sound driver issue, > > but this is yet to be determined. > > Hmm, this one is still not fixed, using the latest VP patches? With R2, it turns out that this problem still exists. If I turn off threading for either ICE1712 soundcard when it is being used by JACK, I get xruns of around 3 msecs. I can't remember the info you needed. What is needed to debug this problem? -- Luke Yelavich http://www.audioslack.com luke@audioslack.com ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 7:05 ` Ingo Molnar 2004-09-03 7:40 ` Lee Revell @ 2004-09-03 18:09 ` K.R. Foley 1 sibling, 0 replies; 73+ messages in thread From: K.R. Foley @ 2004-09-03 18:09 UTC (permalink / raw) To: Ingo Molnar Cc: Lee Revell, Eric St-Laurent, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard Ingo Molnar wrote: > another question - any objections against me rebasing the VP patch to > the current -mm tree and keeping it there exclusively until all possible > merges are done? It would probably quite some work to create a complete > patch for the upstream or BK tree during that process, as small patches > start to flow in the VP => -mm => BK direction. Would any of the regular > VP users/testers be wary to use the -mm tree? > > Ingo > I don't have any objections. Should I be wary of running the -mm tree? kr ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar 2004-09-02 22:06 ` Lee Revell @ 2004-09-03 11:04 ` K.R. Foley 2004-09-03 17:02 ` K.R. Foley 2004-09-03 17:10 ` K.R. Foley 3 siblings, 0 replies; 73+ messages in thread From: K.R. Foley @ 2004-09-03 11:04 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard Ingo Molnar wrote: > i've released the -R0 patch: > > http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R0 > > ontop of: > > http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2 > > i've given up on the netdev_backlog_granularity approach, and as a > replacement i've modified specific network drivers to return at a safe > point if softirq preemption is requested. This gives the same end result > but is more robust. For the time being i've fixed 8193too.c and e100.c. > (will fix up other drivers too as latencies get reported) > > this should fix the crash reported by P.O. Gaillard, and it should solve > the packet delay/loss issues reported by Mark H Johnson. I cannot see > any problems on my rtl8193 testbox anymore. > > Ingo > Last night when rebooting on R0 for the first time, my system locked up. It appears to have happened the first time trying to send/recv any real data over the e100 interface when ntp was starting up. Will try to investigate further a little later. kr ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar 2004-09-02 22:06 ` Lee Revell 2004-09-03 11:04 ` K.R. Foley @ 2004-09-03 17:02 ` K.R. Foley 2004-09-03 20:40 ` Lee Revell 2004-09-03 17:10 ` K.R. Foley 3 siblings, 1 reply; 73+ messages in thread From: K.R. Foley @ 2004-09-03 17:02 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard Ingo Molnar wrote: > i've released the -R0 patch: > > http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R0 > > ontop of: > > http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2 > > i've given up on the netdev_backlog_granularity approach, and as a > replacement i've modified specific network drivers to return at a safe > point if softirq preemption is requested. This gives the same end result > but is more robust. For the time being i've fixed 8193too.c and e100.c. > (will fix up other drivers too as latencies get reported) > > this should fix the crash reported by P.O. Gaillard, and it should solve > the packet delay/loss issues reported by Mark H Johnson. I cannot see > any problems on my rtl8193 testbox anymore. > > Ingo > OK. I previously reported about the system hanging trying to boot this patch. It may have been a fluke. In any event, it doesn't hang consistently. I have rebooted several times now without problems. I do however still get some of these: (ksoftirqd/0/2): new 131 us maximum-latency critical section. => started at: <netif_receive_skb+0x82/0x280> => ended at: <netif_receive_skb+0x1d7/0x280> [<c0136789>] check_preempt_timing+0x119/0x1f0 [<c0255017>] netif_receive_skb+0x1d7/0x280 [<c0255017>] netif_receive_skb+0x1d7/0x280 [<c013699d>] sub_preempt_count+0x4d/0x70 [<c013699d>] sub_preempt_count+0x4d/0x70 [<c0255017>] netif_receive_skb+0x1d7/0x280 [<d08d40b7>] e100_poll+0x5b7/0x630 [e100] [<c013699d>] sub_preempt_count+0x4d/0x70 [<c025527f>] net_rx_action+0x7f/0x110 [<c011eb55>] ___do_softirq+0x75/0x90 [<c011ec1e>] _do_softirq+0xe/0x20 [<c011f034>] ksoftirqd+0x94/0xe0 [<c012f8aa>] kthread+0xaa/0xb0 [<c011efa0>] ksoftirqd+0x0/0xe0 [<c012f800>] kthread+0x0/0xb0 [<c0104575>] kernel_thread_helper+0x5/0x10 ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 17:02 ` K.R. Foley @ 2004-09-03 20:40 ` Lee Revell 0 siblings, 0 replies; 73+ messages in thread From: Lee Revell @ 2004-09-03 20:40 UTC (permalink / raw) To: K.R. Foley Cc: Ingo Molnar, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard On Fri, 2004-09-03 at 13:02, K.R. Foley wrote: > OK. I previously reported about the system hanging trying to boot this > patch. It may have been a fluke. In any event, it doesn't hang > consistently. I have rebooted several times now without problems. I do > however still get some of these: > > (ksoftirqd/0/2): new 131 us maximum-latency critical section. > => started at: <netif_receive_skb+0x82/0x280> > => ended at: <netif_receive_skb+0x1d7/0x280> This is actually a good sign if this is the most common one you see. The netif_receive softirq is basically the longest code path that can't easily be made preemptible; on my hardware about 150 usecs seems to be the best we can do. This is really good, I can't imagine an application that you would use PC hardware for where 150 usecs is not enough, though I am sure the hard realtime crowd has a few. Even fixing this one would not help much because it seems there are many other non-preemtible code paths that are only a little shorter than this one. Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar ` (2 preceding siblings ...) 2004-09-03 17:02 ` K.R. Foley @ 2004-09-03 17:10 ` K.R. Foley 2004-09-03 18:17 ` Ingo Molnar 2004-09-03 18:39 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar 3 siblings, 2 replies; 73+ messages in thread From: K.R. Foley @ 2004-09-03 17:10 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard Ingo Molnar wrote: > i've released the -R0 patch: > > http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R0 > > ontop of: > > http://redhat.com/~mingo/voluntary-preempt/diff-bk-040828-2.6.8.1.bz2 > Managed to hang the system again under heavy load. This time with the above patch: http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-R0.txt Last time was with Q7: http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-Q7.txt kr ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 17:10 ` K.R. Foley @ 2004-09-03 18:17 ` Ingo Molnar 2004-09-03 18:36 ` K.R. Foley 2004-09-03 19:30 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 Ingo Molnar 2004-09-03 18:39 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar 1 sibling, 2 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-03 18:17 UTC (permalink / raw) To: K.R. Foley Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard * K.R. Foley <kr@cybsft.com> wrote: > Managed to hang the system again under heavy load. This time with the > above patch: > > http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-R0.txt > > Last time was with Q7: > > http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-Q7.txt seems to be the same thing - an unbalanced preemption count, possibly due to some locking error. Unfortunately the assert catches the imbalance only at exit time. (it's unlikely that the do_exit() code is buggy.) i'll add a new feature to debug this: when crashing on an assert and tracing is enabled the trace leading up to the crash will be printed to the console. How did you capture the crash - was it in the syslog or do you have serial logging? Maybe this is not the real crash but only a followup crash? Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 18:17 ` Ingo Molnar @ 2004-09-03 18:36 ` K.R. Foley 2004-09-03 19:30 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 Ingo Molnar 1 sibling, 0 replies; 73+ messages in thread From: K.R. Foley @ 2004-09-03 18:36 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard Ingo Molnar wrote: > * K.R. Foley <kr@cybsft.com> wrote: > > >>Managed to hang the system again under heavy load. This time with the >>above patch: >> >>http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-R0.txt >> >>Last time was with Q7: >> >>http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-Q7.txt > > > seems to be the same thing - an unbalanced preemption count, possibly > due to some locking error. Unfortunately the assert catches the > imbalance only at exit time. (it's unlikely that the do_exit() code is > buggy.) > > i'll add a new feature to debug this: when crashing on an assert and > tracing is enabled the trace leading up to the crash will be printed to > the console. How did you capture the crash - was it in the syslog or do > you have serial logging? Maybe this is not the real crash but only a > followup crash? > > Ingo > This actually came from syslog. When it happened it was completely locked, so I have no idea if there was more that didn't make it to disk. Of course the monitor was blanked because most of the access is via the network. I could probably enable serial logging if necessary because it's sitting next to a couple of other machines. kr ^ permalink raw reply [flat|nested] 73+ messages in thread
* [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 2004-09-03 18:17 ` Ingo Molnar 2004-09-03 18:36 ` K.R. Foley @ 2004-09-03 19:30 ` Ingo Molnar 2004-09-03 19:49 ` K.R. Foley ` (2 more replies) 1 sibling, 3 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-03 19:30 UTC (permalink / raw) To: K.R. Foley Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard * Ingo Molnar <mingo@elte.hu> wrote: > i'll add a new feature to debug this: when crashing on an assert and > tracing is enabled the trace leading up to the crash will be printed > to the console. [...] the -R3 patch has this feature: http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R3 you can enable it via enabling CONFIG_LATENCY_TRACE and doing: echo 3 > /proc/sys/kernel/trace_enabled it's all automatic from this point on: tracing will happen nonstop and any assert or crash that prints the process stack will also print the last 100 trace entries. Sample output: Call Trace: [<c0160401>] sys_munmap+0x61/0x80 [<c010520d>] sysenter_past_esp+0x52/0x71 Last 100 trace entries: 00000001: zap_pmd_range+0xe/0x90 <= (unmap_page_range+0x55/0x80) 00000001: stop_trace+0x8/0x20 <= (bust_spinlocks+0x20/0x60) 00000001: bust_spinlocks+0xe/0x60 <= (die+0xbc/0x2a0) [... 97 more trace entries ...] Please capture and mail the first 'extended' oops that triggers (secondary followup traces are probably just side-effects of the crash), it could give us clues about where the bug is. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 2004-09-03 19:30 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 Ingo Molnar @ 2004-09-03 19:49 ` K.R. Foley 2004-09-04 3:39 ` K.R. Foley 2004-09-04 3:43 ` K.R. Foley 2 siblings, 0 replies; 73+ messages in thread From: K.R. Foley @ 2004-09-03 19:49 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard Ingo Molnar wrote: > * Ingo Molnar <mingo@elte.hu> wrote: > > >>i'll add a new feature to debug this: when crashing on an assert and >>tracing is enabled the trace leading up to the crash will be printed >>to the console. [...] > > > the -R3 patch has this feature: > > http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc1-bk4-R3 > > you can enable it via enabling CONFIG_LATENCY_TRACE and doing: > > echo 3 > /proc/sys/kernel/trace_enabled > > it's all automatic from this point on: tracing will happen nonstop and > any assert or crash that prints the process stack will also print the > last 100 trace entries. Sample output: > > Call Trace: > [<c0160401>] sys_munmap+0x61/0x80 > [<c010520d>] sysenter_past_esp+0x52/0x71 > Last 100 trace entries: > 00000001: zap_pmd_range+0xe/0x90 <= (unmap_page_range+0x55/0x80) > 00000001: stop_trace+0x8/0x20 <= (bust_spinlocks+0x20/0x60) > 00000001: bust_spinlocks+0xe/0x60 <= (die+0xbc/0x2a0) > [... 97 more trace entries ...] > > Please capture and mail the first 'extended' oops that triggers > (secondary followup traces are probably just side-effects of the crash), > it could give us clues about where the bug is. > > Ingo > Building now. kr ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 2004-09-03 19:30 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 Ingo Molnar 2004-09-03 19:49 ` K.R. Foley @ 2004-09-04 3:39 ` K.R. Foley 2004-09-04 3:43 ` K.R. Foley 2 siblings, 0 replies; 73+ messages in thread From: K.R. Foley @ 2004-09-04 3:39 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard Ingo Molnar wrote: > * Ingo Molnar <mingo@elte.hu> wrote: > > >>i'll add a new feature to debug this: when crashing on an assert and >>tracing is enabled the trace leading up to the crash will be printed >>to the console. [...] > > > the -R3 patch has this feature: > After hammering the system for a little more than an hour it gave up. I don't have the serial logging setup yet because I haven't had time this evening. I will be glad to do whatever I can to try to help debug this, but it will have to wait until tomorrow. The log is here: http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt kr ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 2004-09-03 19:30 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 Ingo Molnar 2004-09-03 19:49 ` K.R. Foley 2004-09-04 3:39 ` K.R. Foley @ 2004-09-04 3:43 ` K.R. Foley 2004-09-04 6:41 ` Ingo Molnar 2004-09-04 8:57 ` Ingo Molnar 2 siblings, 2 replies; 73+ messages in thread From: K.R. Foley @ 2004-09-04 3:43 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard Ingo Molnar wrote: > * Ingo Molnar <mingo@elte.hu> wrote: > > >>i'll add a new feature to debug this: when crashing on an assert and >>tracing is enabled the trace leading up to the crash will be printed >>to the console. [...] > > > the -R3 patch has this feature: > After hammering the system for a little more than an hour it gave up. I don't have the serial logging setup yet because I haven't had time this evening. I will be glad to do whatever I can to try to help debug this, but it will have to wait until tomorrow. The log is here: http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt kr Sorry I forgot to mention that this was triggered running the stress-kernel package, minus the NFS-Compile, but it does include the CRASHME test. In addition, amlat was running as well. The system was pretty much 100% loaded. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 2004-09-04 3:43 ` K.R. Foley @ 2004-09-04 6:41 ` Ingo Molnar 2004-09-04 12:28 ` K.R. Foley 2004-09-04 8:57 ` Ingo Molnar 1 sibling, 1 reply; 73+ messages in thread From: Ingo Molnar @ 2004-09-04 6:41 UTC (permalink / raw) To: K.R. Foley Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard * K.R. Foley <kr@cybsft.com> wrote: > http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt the first line seems partial - isnt the full oops in the log? > Sorry I forgot to mention that this was triggered running the > stress-kernel package, minus the NFS-Compile, but it does include the > CRASHME test. In addition, amlat was running as well. The system was > pretty much 100% loaded. Have you run crashme as root? That would be unsafe. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 2004-09-04 6:41 ` Ingo Molnar @ 2004-09-04 12:28 ` K.R. Foley 0 siblings, 0 replies; 73+ messages in thread From: K.R. Foley @ 2004-09-04 12:28 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard Ingo Molnar wrote: > * K.R. Foley <kr@cybsft.com> wrote: > > >>http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt > > > the first line seems partial - isnt the full oops in the log? Agreed. Unfortunately that is all there is. > > >>Sorry I forgot to mention that this was triggered running the >>stress-kernel package, minus the NFS-Compile, but it does include the >>CRASHME test. In addition, amlat was running as well. The system was >>pretty much 100% loaded. > > > Have you run crashme as root? That would be unsafe. Actually what happens is that it creates a "crashme" user (and group) for running the test and then deletes the user after the test. In fact the user is still in passwd, because of the crash it didn't get cleaned up. So I don't think this SHOULD be a problem. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 2004-09-04 3:43 ` K.R. Foley 2004-09-04 6:41 ` Ingo Molnar @ 2004-09-04 8:57 ` Ingo Molnar 2004-09-04 10:16 ` Lee Revell 2004-09-04 14:35 ` K.R. Foley 1 sibling, 2 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-04 8:57 UTC (permalink / raw) To: K.R. Foley Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard * K.R. Foley <kr@cybsft.com> wrote: > After hammering the system for a little more than an hour it gave up. > I don't have the serial logging setup yet because I haven't had time > this evening. I will be glad to do whatever I can to try to help debug > this, but it will have to wait until tomorrow. The log is here: > > http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt fyi, i have now triggered a similar crash on a testbox too. It takes quite some time to trigger but it does. since it happens with VP=0,KP=0,SP=0,HP=0 as well it should be one of the cond_resched_lock() (or cond_resched()) additions. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 2004-09-04 8:57 ` Ingo Molnar @ 2004-09-04 10:16 ` Lee Revell 2004-09-04 14:35 ` K.R. Foley 1 sibling, 0 replies; 73+ messages in thread From: Lee Revell @ 2004-09-04 10:16 UTC (permalink / raw) To: Ingo Molnar Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Mark_H_Johnson, P.O. Gaillard On Sat, 2004-09-04 at 04:57, Ingo Molnar wrote: > * K.R. Foley <kr@cybsft.com> wrote: > > > After hammering the system for a little more than an hour it gave up. > > I don't have the serial logging setup yet because I haven't had time > > this evening. I will be glad to do whatever I can to try to help debug > > this, but it will have to wait until tomorrow. The log is here: > > > > http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt > > fyi, i have now triggered a similar crash on a testbox too. It takes > quite some time to trigger but it does. > > since it happens with VP=0,KP=0,SP=0,HP=0 as well it should be one of > the cond_resched_lock() (or cond_resched()) additions. > Here are some results for R0 on another machine, a 1.2Ghz Athlon XP: http://krustophenia.net/testresults.php?dataset=2.6.9-rc1-R0 I have also changed the test again, to be more accurate. Now the jackd alsa driver separately keeps track of the time it spends in poll(), and the time it takes running the jackd process cycle. The length of one period less the sum of the time we spent in poll() and the time it took to run the process cycle equals the amount of time we spent ready to run, and waiting to be scheduled AKA latency. The results are pretty amazing - out of a period time of 666 usecs most of the time we spend between 0 and 1 usec in this state, The worst is 27 usecs or so. These results are of course not directly comparable with previous tests, but I believe this is the most accurate way to measure latency in jackd. Lee ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 2004-09-04 8:57 ` Ingo Molnar 2004-09-04 10:16 ` Lee Revell @ 2004-09-04 14:35 ` K.R. Foley 2004-09-04 20:05 ` Ingo Molnar 1 sibling, 1 reply; 73+ messages in thread From: K.R. Foley @ 2004-09-04 14:35 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard Ingo Molnar wrote: > * K.R. Foley <kr@cybsft.com> wrote: > > >>After hammering the system for a little more than an hour it gave up. >>I don't have the serial logging setup yet because I haven't had time >>this evening. I will be glad to do whatever I can to try to help debug >>this, but it will have to wait until tomorrow. The log is here: >> >>http://www.cybsft.com/testresults/crashes/2.6.9-rc1-vo-R3.txt > > > fyi, i have now triggered a similar crash on a testbox too. It takes > quite some time to trigger but it does. > > since it happens with VP=0,KP=0,SP=0,HP=0 as well it should be one of > the cond_resched_lock() (or cond_resched()) additions. > > Ingo > I am glad that it's reproducible for you as well. How did you trigger it? Because it seems to only crash under heavy load for me. The system has been up since I rebooted last night after the crash and I haven't seen any problems. Same thing goes for up until last night when I booted the new patch. Even building the new patch didn't seem to be enough to trigger it. kr ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 2004-09-04 14:35 ` K.R. Foley @ 2004-09-04 20:05 ` Ingo Molnar 0 siblings, 0 replies; 73+ messages in thread From: Ingo Molnar @ 2004-09-04 20:05 UTC (permalink / raw) To: K.R. Foley Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard * K.R. Foley <kr@cybsft.com> wrote: > I am glad that it's reproducible for you as well. How did you trigger > it? Because it seems to only crash under heavy load for me. The system > has been up since I rebooted last night after the crash and I haven't > seen any problems. Same thing goes for up until last night when I > booted the new patch. Even building the new patch didn't seem to be > enough to trigger it. i triggered it via two IO-intense scripts running on a 256 MB RAM testbox: cd /tmp; while true; do dd if=/dev/zero of=bigfile bs=1000000 count=500 >/dev/null 2>/dev/null; sync; date; sleep 60; done & while true; do du /usr >/dev/null 2>/dev/null; date; done & usually it triggers within half an hour or so of runtime. With -R4 it didnt crash yet after ~1 hour of running so i'm quite optimistic that this particular bug has been fixed. i think the key to reproduce was to use KP=0,SP=0,HP=0 - this concentrates all preemption activity to the places the VP patch adds - amongst them the one in the do_exit() path that is necessary to trigger the race. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 17:10 ` K.R. Foley 2004-09-03 18:17 ` Ingo Molnar @ 2004-09-03 18:39 ` Ingo Molnar 2004-09-03 18:41 ` K.R. Foley 1 sibling, 1 reply; 73+ messages in thread From: Ingo Molnar @ 2004-09-03 18:39 UTC (permalink / raw) To: K.R. Foley Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard * K.R. Foley <kr@cybsft.com> wrote: > Managed to hang the system again under heavy load. This time with the > above patch: > > http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-R0.txt do you have CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP enabled? These are pretty good at e.g. catching illegal scheduling in critical sections much earlier. You can enable them on !SMP kernels as well. Ingo ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 2004-09-03 18:39 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar @ 2004-09-03 18:41 ` K.R. Foley 0 siblings, 0 replies; 73+ messages in thread From: K.R. Foley @ 2004-09-03 18:41 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Felipe Alfaro Solana, Daniel Schmitt, Lee Revell, Mark_H_Johnson, P.O. Gaillard Ingo Molnar wrote: > * K.R. Foley <kr@cybsft.com> wrote: > > >>Managed to hang the system again under heavy load. This time with the >>above patch: >> >>http://www.cybsft.com/testresults/crashes/2.6.9-rc1-bk4-R0.txt > > > do you have CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP > enabled? These are pretty good at e.g. catching illegal scheduling in > critical sections much earlier. You can enable them on !SMP kernels as > well. > > Ingo Not currently but I am enabling now. kr ^ permalink raw reply [flat|nested] 73+ messages in thread
end of thread, other threads:[~2004-09-04 20:04 UTC | newest] Thread overview: 73+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-08-30 19:13 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Mark_H_Johnson 2004-08-30 19:21 ` Ingo Molnar 2004-09-01 12:31 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 - netdev_max_back_log is too small P.O. Gaillard 2004-09-01 13:05 ` Ingo Molnar 2004-09-02 11:24 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 - network is no longer smooth P.O. Gaillard 2004-09-02 11:28 ` Ingo Molnar 2004-09-02 15:26 ` P.O. Gaillard 2004-08-31 8:49 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Ingo Molnar 2004-09-02 6:33 ` Ingo Molnar 2004-09-02 6:55 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 Ingo Molnar 2004-09-02 7:04 ` Lee Revell 2004-09-02 7:15 ` Ingo Molnar 2004-09-02 7:31 ` Lee Revell 2004-09-02 7:46 ` Ingo Molnar 2004-09-03 1:10 ` Rusty Russell 2004-09-02 23:25 ` Lee Revell 2004-09-02 23:28 ` Ingo Molnar 2004-09-02 23:32 ` Lee Revell 2004-09-02 7:17 ` Ingo Molnar 2004-09-02 8:23 ` Ingo Molnar 2004-09-02 11:10 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Ingo Molnar 2004-09-02 12:14 ` Thomas Charbonnel 2004-09-02 13:16 ` Thomas Charbonnel 2004-09-02 13:23 ` Ingo Molnar 2004-09-02 14:38 ` Thomas Charbonnel 2004-09-02 21:57 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar 2004-09-02 22:06 ` Lee Revell 2004-09-02 22:14 ` Ingo Molnar 2004-09-02 22:15 ` Lee Revell 2004-09-03 0:24 ` Lee Revell 2004-09-03 3:17 ` Eric St-Laurent 2004-09-03 6:26 ` Lee Revell 2004-09-03 6:36 ` Ingo Molnar 2004-09-03 6:49 ` Lee Revell 2004-09-03 7:01 ` Ingo Molnar 2004-09-03 7:05 ` Ingo Molnar 2004-09-03 7:40 ` Lee Revell 2004-09-03 7:50 ` Free Ekanayaka 2004-09-03 8:05 ` Lee Revell 2004-09-03 9:05 ` Free Ekanayaka 2004-09-03 9:25 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar 2004-09-03 9:50 ` Luke Yelavich 2004-09-03 10:29 ` Ingo Molnar 2004-09-03 10:43 ` Luke Yelavich 2004-09-03 11:33 ` Thomas Charbonnel 2004-09-03 11:49 ` Ingo Molnar 2004-09-03 12:05 ` Thomas Charbonnel 2004-09-03 16:14 ` Thomas Charbonnel 2004-09-03 17:36 ` Thomas Charbonnel 2004-09-03 11:36 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R2 Ingo Molnar 2004-09-03 8:09 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Luke Yelavich 2004-09-03 8:13 ` Lee Revell 2004-09-03 8:21 ` Luke Yelavich 2004-09-03 12:52 ` Luke Yelavich 2004-09-03 18:09 ` K.R. Foley 2004-09-03 11:04 ` K.R. Foley 2004-09-03 17:02 ` K.R. Foley 2004-09-03 20:40 ` Lee Revell 2004-09-03 17:10 ` K.R. Foley 2004-09-03 18:17 ` Ingo Molnar 2004-09-03 18:36 ` K.R. Foley 2004-09-03 19:30 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 Ingo Molnar 2004-09-03 19:49 ` K.R. Foley 2004-09-04 3:39 ` K.R. Foley 2004-09-04 3:43 ` K.R. Foley 2004-09-04 6:41 ` Ingo Molnar 2004-09-04 12:28 ` K.R. Foley 2004-09-04 8:57 ` Ingo Molnar 2004-09-04 10:16 ` Lee Revell 2004-09-04 14:35 ` K.R. Foley 2004-09-04 20:05 ` Ingo Molnar 2004-09-03 18:39 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar 2004-09-03 18:41 ` K.R. Foley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox