public inbox for linux-kernel@vger.kernel.org
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5
@ 2004-08-30 19:13 Mark_H_Johnson
  2004-08-30 19:21 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Mark_H_Johnson @ 2004-08-30 19:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt,
	Lee Revell

>i've uploaded -Q5 to:
> [snip the rest...]

Thanks.

This appears to be the first 2.6.x kernel I've run that has results
comparable to 2.4.x kernels with low latency patches and kernel preemption.
The few remaining symptoms I see include:

 - a few long (> 1 msec) delays in the real time CPU loop (no system calls)
 - varying time to complete the write system call (for audio) - quite
different from 2.4
 - a couple latency traces (> 700 usec) in the network driver

For reference, these tests were performed on the following SMP system:
  Dual 866 MHz Pentium III
  512 Mbyte memory
  IDE system disk (DMA enabled)
The basic test is Benno's latency test
(http://www.gardena.net/benno/linux/audio) with some slight modifications
to the tests to keep the second CPU busy (non real time CPU burner) and to
add network I/O tests. The 2.4 tests were run with 2.4.20, the 2.6 tests
were run with 2.6.9-rc1-Q5. On 2.6, voluntary_preemption,
kernel_preemption, hardirq_preemption, and softirq_preemption are all 1. I
also set
  /sys/block/hda/queue/max_sectors_kb = 32
  /sys/block/hda/queue/read_ahead_kb = 32
  /proc/sys/net/core/netdev_max_backlog = 8
and the audio driver was set to be non-threaded.
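For convenience, those settings can be applied in one small script. This is
just my own sketch (the set_val helper and the SYSROOT override are mine, so
the same script can be dry-run against a scratch tree instead of the live
/sys and /proc):

```shell
#!/bin/sh
# Sketch: apply the low-latency tuning used in these tests.
# SYSROOT is a hypothetical override; point it at a scratch tree to
# dry-run, or leave it empty to write to the live /sys and /proc.
SYSROOT="${SYSROOT:-}"

set_val() {
    f="$SYSROOT$1"
    if [ -w "$f" ]; then
        echo "$2" > "$f"
        echo "set $f = $(cat "$f")"
    else
        echo "skip $f (not present or not writable)"
    fi
}

set_val /sys/block/hda/queue/max_sectors_kb 32
set_val /sys/block/hda/queue/read_ahead_kb 32
set_val /proc/sys/net/core/netdev_max_backlog 8
```

Run as root with SYSROOT unset to apply for real; the skip branch keeps it
from erroring out on boxes without an hda device.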

BASIC RESULTS
=============
Comparison of results between 2.6.x and 2.4.x; values in milliseconds.
The nominal value for the write operation is 1.45 msec; for the CPU loop,
1.16 msec.

          Max CPU Delta     Max Write Delta
Test     2.4.x     2.6.x    2.4.x     2.6.x
X11       0.10      0.16     0.05      0.65
/proc     0.07      0.17     0.05      0.65
net out   0.15      0.19     0.05      0.75
net in    0.17      0.23     0.05      0.95
dsk wrt   0.49      0.18     0.25      1.05
dsk copy  2.48      0.68     2.25      1.25
disk rd   3.03      1.61     2.75      1.35

LONG DELAYS
===========

Note that I still see over 110% worst-case overhead on a max priority real
time CPU task (no system calls) when doing heavy disk I/O on 2.6. It is much
better than 2.4, but still disturbing. What I would hope to happen on a dual
CPU system like mine is that the real time task tends to stay on one CPU
while the other system activity stays on the other CPU. However, the results
do not seem to indicate that behavior.

VARYING SYSTEM CALL TIMES
=========================

In 2.4, it appears that the duration of the write system call is basically
fixed and dependent on the duration of the audio fragment. In 2.6, this
behavior is now different. If I look at the chart in detail, it appears the
system is queueing up several write operations during the first few seconds
of testing. You can see this by consistently low elapsed times for the
write system call. Then the elapsed time for the write bounces up / down in
a sawtooth pattern over a 1 msec range. Could someone explain the cause of
this new behavior and if there is a setting to restore the old behavior? I
am concerned that this queueing adds latency to audio operations (when
trying to synchronize audio with other real time behavior).

LONG NETWORK LATENCIES
======================

In about 25 minutes of heavy testing, I had two latency traces with
/proc/sys/kernel/preempt_max_latency set to 700. They had the same start /
end location with the long delay as follows:
  730 us, entries: 361
  ...
  started at rtl8139_poll+0x3c/0x160
  ended at   rtl8139_poll+0x100/0x160
  00000001 0.000ms (+0.000ms): rtl8139_poll (net_rx_action)
  00000001 0.140ms (+0.140ms): rtl8139_rx (rtl8139_poll)
  00000001 0.556ms (+0.416ms): alloc_skb (rtl8139_rx)
  ... remaining items all > +0.005ms ...

  731 us, entries: 360
  ...
  started at rtl8139_poll+0x3c/0x160
  ended at   rtl8139_poll+0x100/0x160
  00000001 0.000ms (+0.000ms): rtl8139_poll (net_rx_action)
  00000001 0.000ms (+0.000ms): rtl8139_rx (rtl8139_poll)
  00000001 0.002ms (+0.001ms): alloc_skb (rtl8139_rx)
  00000001 0.141ms (+0.139ms): kmem_cache_alloc (alloc_skb)
  00000001 0.211ms (+0.070ms): __kmalloc (alloc_skb)
  00000001 0.496ms (+0.284ms): eth_type_trans (rtl8139_rx)
  00000001 0.565ms (+0.068ms): netif_receive_skb (rtl8139_rx)
  ... remaining items all > +0.005ms ...

Still much better than my previous results (before setting
netdev_max_backlog).

I will be running some additional tests
 - reducing preempt_max_latency
 - running with softirq_preemption=0 and hardirq_preemption=0
to see if these uncover any further problems.
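For reference, those follow-up runs amount to flipping these knobs (the
threshold value is just an example, and I am assuming the /proc/sys/kernel
paths the voluntary-preempt patches use for the preemption switches):

```shell
# Sketch of the planned follow-up settings (run as root).
echo 500 > /proc/sys/kernel/preempt_max_latency   # lower the trace threshold
echo 0 > /proc/sys/kernel/softirq_preemption      # softirqs back out of threads
echo 0 > /proc/sys/kernel/hardirq_preemption      # hardirqs back out of threads
```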

Thanks again for the good work.
--Mark H Johnson
  <mailto:Mark_H_Johnson@raytheon.com>


* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9
@ 2004-09-02 20:21 Mark_H_Johnson
  2004-09-02 20:38 ` Ingo Molnar
  2004-09-02 22:43 ` Ingo Molnar
  0 siblings, 2 replies; 84+ messages in thread
From: Mark_H_Johnson @ 2004-09-02 20:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
	Lee Revell

>I've released the -Q9 patch:
>...

Built a kernel with -Q9 plus the following additional patches:
... email saved into mark-offset-tsc-mcount.patch ...
... email saved into ens001.patch ...
... own patch to add mcount() calls to sched.c ...

Ran my 25 minute test and collected 71 traces > 500 usec. The following
is a summary of the results.

[1] No cascade traces - several runs with this result - we should be OK
here.

[2] rtl8139 is still showing some traces due to my use of
netdev_backlog_granularity = 8. I had severe problems when it was 1.

[3] I read what you said about hardware overhead, but the data seems
to show that the latencies on the way to "__switch_to" may be buried in
some inline functions. A typical trace looks something like this...

00000002 0.003ms (+0.000ms): dummy_switch_tasks (schedule)
00000002 0.003ms (+0.000ms): schedule (worker_thread)
00000002 0.003ms (+0.000ms): schedule (worker_thread)
00000002 0.004ms (+0.000ms): schedule (worker_thread)
00000002 0.004ms (+0.000ms): schedule (worker_thread)
00000002 0.004ms (+0.000ms): schedule (worker_thread)
00000002 0.004ms (+0.274ms): schedule (worker_thread)
04000002 0.279ms (+0.000ms): __switch_to (schedule)

dummy_switch_tasks refers to a function I added and called right after
the label switch_tasks (in sched.c). The mcount() traces that follow
are basically at each step leading up to the call to context_switch.
Since context_switch is static inline, I assume it is not traced -
please confirm. I am considering adding mcount() calls inside
context_switch to see if there is a step with some long duration.

[4] The go_idle path is also one that appears to have some long
latencies. The following is an example - the trace of dummy_go_idle
refers to the label named go_idle.

00000002 0.002ms (+0.000ms): dummy_go_idle (schedule)
00000002 0.002ms (+0.060ms): schedule (io_schedule)
00000002 0.063ms (+0.069ms): load_balance_newidle (schedule)
00000002 0.133ms (+0.074ms): find_busiest_group (load_balance_newidle)
00000002 0.207ms (+0.034ms): find_next_bit (find_busiest_group)
00000002 0.242ms (+0.039ms): find_next_bit (find_busiest_group)
00000002 0.281ms (+0.070ms): find_busiest_queue (load_balance_newidle)
00000002 0.351ms (+0.071ms): find_next_bit (find_busiest_queue)
00000002 0.422ms (+0.069ms): double_lock_balance (load_balance_newidle)
00000003 0.492ms (+0.070ms): move_tasks (load_balance_newidle)
00010003 0.563ms (+0.000ms): do_nmi (move_tasks)
00010003 0.563ms (+0.002ms): do_nmi (del_timer_sync)
00010003 0.566ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010003 0.566ms (+0.000ms): profile_hook (profile_tick)
00010003 0.567ms (+0.000ms): read_lock (profile_hook)
00010004 0.567ms (+0.000ms): notifier_call_chain (profile_hook)
00010003 0.567ms (+0.001ms): profile_hit (nmi_watchdog_tick)
00000003 0.569ms (+0.000ms): find_next_bit (move_tasks)
00000003 0.569ms (+0.000ms): dequeue_task (move_tasks)
00000003 0.570ms (+0.000ms): enqueue_task (move_tasks)
00000003 0.570ms (+0.000ms): resched_task (move_tasks)
00000003 0.571ms (+0.000ms): find_next_bit (move_tasks)
00000003 0.571ms (+0.000ms): dequeue_task (move_tasks)
00000003 0.571ms (+0.000ms): enqueue_task (move_tasks)
00000003 0.572ms (+0.000ms): resched_task (move_tasks)
00000002 0.572ms (+0.000ms): preempt_schedule (load_balance_newidle)

We accumulated about 500 usec of latency across several function calls.

[5] mark_offset_tsc - some steps in that sequence are generating
some long latencies. For example:

04010003 0.011ms (+0.000ms): mark_offset_tsc (timer_interrupt)
04010003 0.011ms (+0.000ms): mark_offset_tsc (timer_interrupt)
04010003 0.011ms (+0.000ms): spin_lock (mark_offset_tsc)
04010004 0.011ms (+0.137ms): mark_offset_tsc (timer_interrupt)
04010004 0.149ms (+0.000ms): mark_offset_tsc (timer_interrupt)
04010004 0.149ms (+0.000ms): spin_lock (mark_offset_tsc)
04010005 0.149ms (+0.144ms): mark_offset_tsc (timer_interrupt)
04010005 0.294ms (+0.004ms): mark_offset_tsc (timer_interrupt)
04010005 0.298ms (+0.003ms): mark_offset_tsc (timer_interrupt)
04010005 0.301ms (+0.000ms): mark_offset_tsc (timer_interrupt)
04010005 0.301ms (+0.000ms): mark_offset_tsc (timer_interrupt)
04010004 0.301ms (+0.000ms): mark_offset_tsc (timer_interrupt)
04010004 0.301ms (+0.073ms): mark_offset_tsc (timer_interrupt)
04010004 0.375ms (+0.000ms): mark_offset_tsc (timer_interrupt)
04010003 0.375ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event)

I didn't see any feedback from the table I provided previously.
Is this data helpful or should I take out the patch?

[6] spin_lock - may just be an SMP lock problem but here's a trace
I don't recall seeing previously.

00000002 0.008ms (+0.000ms): snd_ensoniq_trigger (snd_pcm_do_stop)
00000002 0.008ms (+0.344ms): spin_lock (snd_ensoniq_trigger)
00010003 0.353ms (+0.015ms): do_nmi (snd_ensoniq_trigger)
00010003 0.368ms (+0.006ms): do_nmi (update_one_process)
00010003 0.375ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010003 0.375ms (+0.000ms): profile_hook (profile_tick)
00010003 0.375ms (+0.121ms): read_lock (profile_hook)
00010004 0.496ms (+0.000ms): notifier_call_chain (profile_hook)
00010003 0.497ms (+0.068ms): profile_hit (nmi_watchdog_tick)
00000002 0.566ms (+0.000ms): snd_pcm_post_stop (snd_pcm_action_single)

[7] Not sure what to call it, I don't recall seeing this type of trace
before either.

 => started at: __spin_lock_irqsave+0x39/0x90
 => ended at:   as_work_handler+0x5c/0xa0
=======>
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (spin_lock_irq)
00000001 0.000ms (+0.000ms): generic_enable_irq (ide_do_request)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (generic_enable_irq)
00000002 0.000ms (+0.000ms): unmask_IO_APIC_irq (generic_enable_irq)
00000002 0.000ms (+0.000ms): __spin_lock_irqsave (unmask_IO_APIC_irq)
00000003 0.001ms (+0.000ms): __unmask_IO_APIC_irq (unmask_IO_APIC_irq)
00000003 0.001ms (+0.066ms): __modify_IO_APIC_irq (__unmask_IO_APIC_irq)
00000001 0.067ms (+0.001ms): smp_apic_timer_interrupt (as_work_handler)
00010001 0.069ms (+0.087ms): profile_tick (smp_apic_timer_interrupt)
00010001 0.157ms (+0.000ms): profile_hook (profile_tick)
00010001 0.157ms (+0.069ms): read_lock (profile_hook)
00010002 0.227ms (+0.000ms): notifier_call_chain (profile_hook)
00010001 0.227ms (+0.069ms): profile_hit (smp_apic_timer_interrupt)
00010001 0.297ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010001 0.297ms (+0.069ms): update_one_process (update_process_times)
00010001 0.367ms (+0.000ms): run_local_timers (update_process_times)
00010001 0.367ms (+0.069ms): raise_softirq (update_process_times)
00010001 0.437ms (+0.000ms): scheduler_tick (update_process_times)
00010001 0.437ms (+0.070ms): sched_clock (scheduler_tick)
00020001 0.507ms (+0.000ms): do_nmi (scheduler_tick)
00020001 0.508ms (+0.002ms): do_nmi (del_timer_sync)
00020001 0.511ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00020001 0.511ms (+0.000ms): profile_hook (profile_tick)
00020001 0.511ms (+0.065ms): read_lock (profile_hook)
00020002 0.577ms (+0.000ms): notifier_call_chain (profile_hook)
00020001 0.577ms (+0.000ms): profile_hit (nmi_watchdog_tick)
00010001 0.578ms (+0.000ms): spin_lock (scheduler_tick)

[8] exit_mmap - there are a few traces referring to code in or
called by exit_mmap. Here's an example.

 => started at: cond_resched_lock+0x6b/0x110
 => ended at:   exit_mmap+0x155/0x1f0
=======>
00000001 0.000ms (+0.000ms): touch_preempt_timing (cond_resched_lock)
00000001 0.000ms (+0.000ms): __bitmap_weight (unmap_vmas)
00000001 0.000ms (+0.000ms): vm_acct_memory (exit_mmap)
00000001 0.001ms (+0.629ms): clear_page_tables (exit_mmap)
00000001 0.631ms (+0.000ms): flush_tlb_mm (exit_mmap)
00000001 0.631ms (+0.000ms): free_pages_and_swap_cache (exit_mmap)

I will send the traces separately (not to linux-kernel) for analysis.

  --Mark


* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9
@ 2004-09-03 13:42 Mark_H_Johnson
  0 siblings, 0 replies; 84+ messages in thread
From: Mark_H_Johnson @ 2004-09-03 13:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
	Lee Revell

>> this is as if the CPU executed everything in 'slow motion'. E.g. the
>> cache being disabled could be one such reason - or some severe DMA or
>> other bus traffic.
>
>another thing: could you try maxcpus=1 again and see whether 1 CPU
>produces similar 'slow motion' traces?

OK - second try to send this message. I've had two complete system lockups
now with -Q9: the first during a kernel build, the second while trying
to send a message explaining that plus the results I saw when trying
to test with maxcpus=1.

Booted w/ maxcpus=1 nmi_watchdog=1. Saw information messages confirming
both were set and acted on.

The first major problem was that the audio came out "too fast". The test
repeats a pattern and you could hear it repeating faster than it should
have. Needless to say, this meant there was no time available to run
any non real time tasks, so the application timing results were worthless
and the latency traces were separated by several minutes in some cases
[before I could force a SIGINT to the real time task].

I don't know if the traces I did get are helpful or not, but here is a
summary. The numbers after each title refer to the trace file numbers,
so if you count them you get an idea how frequent each symptom was.
I will send the full traces only if you ask for them.

Slow getting to "switch_tasks" (00, 01, 10, 12, 13, 16)

00000002 0.002ms (+0.000ms): dequeue_task (deactivate_task)
00000002 0.002ms (+0.000ms): schedule (io_schedule)
00000002 0.002ms (+0.000ms): schedule (io_schedule)
00000002 0.002ms (+0.000ms): schedule (io_schedule)
00000002 0.003ms (+0.000ms): schedule (io_schedule)
00000002 0.003ms (+0.000ms): schedule (io_schedule)
00000002 0.003ms (+0.303ms): schedule (io_schedule)
00010002 0.306ms (+0.016ms): do_nmi (schedule)
00010002 0.323ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010002 0.323ms (+0.000ms): profile_hook (profile_tick)
00010002 0.323ms (+0.100ms): read_lock (profile_hook)
00010003 0.424ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.424ms (+0.070ms): profile_hit (nmi_watchdog_tick)
00000002 0.494ms (+0.000ms): dummy_switch_tasks (schedule)

Slow getting to "__switch_to" (02, 03, 05, 09)

00000002 0.003ms (+0.000ms): dummy_switch_tasks (schedule)
00000002 0.003ms (+0.000ms): schedule (io_schedule)
00000002 0.003ms (+0.327ms): schedule (io_schedule)
00000002 0.331ms (+0.000ms): schedule (io_schedule)
00000002 0.331ms (+0.000ms): schedule (io_schedule)
00000002 0.331ms (+0.000ms): schedule (io_schedule)
00000002 0.332ms (+0.236ms): schedule (io_schedule)
00010002 0.568ms (+0.001ms): do_nmi (mcount)
00010002 0.570ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010002 0.570ms (+0.000ms): profile_hook (profile_tick)
00010002 0.570ms (+0.000ms): read_lock (profile_hook)
00010003 0.571ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.571ms (+0.000ms): profile_hit (nmi_watchdog_tick)
00000002 0.572ms (+0.000ms): __switch_to (schedule)

Slow to both switch_tasks and __switch_to (04, 06)

00000002 0.002ms (+0.000ms): dequeue_task (deactivate_task)
00000002 0.002ms (+0.000ms): schedule (io_schedule)
00000002 0.002ms (+0.000ms): schedule (io_schedule)
00000002 0.002ms (+0.000ms): schedule (io_schedule)
00000002 0.003ms (+0.000ms): schedule (io_schedule)
00000002 0.003ms (+0.000ms): schedule (io_schedule)
00000002 0.003ms (+0.131ms): schedule (io_schedule)
00000002 0.134ms (+0.000ms): dummy_switch_tasks (schedule)
00000002 0.134ms (+0.000ms): schedule (io_schedule)
00000002 0.134ms (+0.000ms): schedule (io_schedule)
00000002 0.134ms (+0.000ms): schedule (io_schedule)
00000002 0.135ms (+0.000ms): schedule (io_schedule)
00000002 0.135ms (+0.000ms): schedule (io_schedule)
00000002 0.135ms (+0.139ms): schedule (io_schedule)
00010002 0.275ms (+0.279ms): do_nmi (schedule)
00010002 0.554ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010002 0.554ms (+0.000ms): profile_hook (profile_tick)
00010002 0.554ms (+0.000ms): read_lock (profile_hook)
00010003 0.555ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.555ms (+0.001ms): profile_hit (nmi_watchdog_tick)
00000002 0.557ms (+0.000ms): __switch_to (schedule)

sched_clock (08, 11, 14)

00000001 0.000ms (+0.000ms): schedule (worker_thread)
00000001 0.000ms (+0.000ms): schedule (worker_thread)
00000001 0.000ms (+0.000ms): schedule (worker_thread)
00000001 0.000ms (+0.000ms): schedule (worker_thread)
00000001 0.000ms (+0.000ms): schedule (worker_thread)
00000001 0.000ms (+0.000ms): schedule (worker_thread)
00000001 0.000ms (+0.553ms): sched_clock (schedule)
00010001 0.554ms (+0.001ms): do_nmi (sched_clock)
00010001 0.556ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.556ms (+0.000ms): profile_hook (profile_tick)
00010001 0.556ms (+0.000ms): read_lock (profile_hook)
00010002 0.556ms (+0.000ms): notifier_call_chain (profile_hook)
00010001 0.557ms (+0.000ms): profile_hit (nmi_watchdog_tick)
00010001 0.557ms (+0.000ms): do_IRQ (sched_clock)
00010001 0.558ms (+0.000ms): do_IRQ (<00000000>)
00010001 0.558ms (+0.000ms): spin_lock (do_IRQ)

__modify_IO_APIC_irq (15)

00000001 0.000ms (+0.000ms): __spin_lock_irqsave (spin_lock_irq)
00000001 0.000ms (+0.000ms): generic_enable_irq (ide_do_request)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (generic_enable_irq)
00000002 0.001ms (+0.000ms): unmask_IO_APIC_irq (generic_enable_irq)
00000002 0.001ms (+0.000ms): __spin_lock_irqsave (unmask_IO_APIC_irq)
00000003 0.002ms (+0.000ms): __unmask_IO_APIC_irq (unmask_IO_APIC_irq)
00000003 0.002ms (+0.565ms): __modify_IO_APIC_irq (__unmask_IO_APIC_irq)
00010001 0.567ms (+0.001ms): do_nmi (apic_timer_interrupt)
00010001 0.569ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.569ms (+0.000ms): profile_hook (profile_tick)
00010001 0.569ms (+0.000ms): read_lock (profile_hook)
00010002 0.570ms (+0.000ms): notifier_call_chain (profile_hook)
00010001 0.570ms (+0.000ms): profile_hit (nmi_watchdog_tick)
00000001 0.571ms (+0.000ms): smp_apic_timer_interrupt (as_work_handler)

In the meantime, I will build a kernel with -R1 and leave the latency trace
on (now running a -Q7 kernel with both CPUs active) to see if it shows
anything interesting before attempting any more tests.

  --Mark


* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9
@ 2004-09-03 15:33 Mark_H_Johnson
  2004-09-04  0:04 ` Lee Revell
  0 siblings, 1 reply; 84+ messages in thread
From: Mark_H_Johnson @ 2004-09-03 15:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
	Lee Revell

>In the meantime, I will build a kernel with -R1 and leave the latency
trace
>on (now running a -Q7 kernel with both CPUs active) to see if it shows
>anything interesting before attempting any more tests.

Well - THAT was special. Another crash, but I may have a clue about both
that and the "general slow down" in sched.c.

The crash is likely due to a problem with X. I triggered it this time while
trying to open and hide Mozilla windows (again preparing to send email
with a web-based client). The last operation that worked was the window
hide; the one that locked everything up was a click to restore a Mozilla
window onto the screen. I don't know if this is relevant, but the last
trace in /var/log/messages was a latency trace caused by X.

Sep  3 09:57:11 dws77 kernel: (X/2382): new 329 us maximum-latency critical
section.
Sep  3 09:57:11 dws77 kernel:  => started at: <kmap_atomic+0x23/0xe0>
Sep  3 09:57:11 dws77 kernel:  => ended at:   <kunmap_atomic+0x7b/0xa0>

I am not sure this is relevant since all the data for it was written
to disk (my script picked up the latency trace as well). Let me know
if you want the trace data.

The slowdown in sched.c may be due to disk DMA activity.
When I started the kernel build, I forgot to run my script that sets
a number of /proc values. Specifically, the values were:
cat /sys/block/hda/queue/max_sectors_kb
128
cat /sys/block/hda/queue/read_ahead_kb
128
cat /proc/sys/net/core/netdev_max_backlog
300
cat '/proc/irq/10/Ensoniq AudioPCI/threaded'
1

and in my other tests, they are:
echo 32 > /sys/block/hda/queue/max_sectors_kb
echo 32 > /sys/block/hda/queue/read_ahead_kb
echo 8 > /proc/sys/net/core/netdev_max_backlog
echo 0 > '/proc/irq/10/Ensoniq AudioPCI/threaded'

Note - no audio and only light network activity during the kernel
build.

With the default settings, I had latencies in sched.c of over 2 msec
(the longest was 2305 us). For example:

preemption latency trace v1.0.3
-------------------------------
 latency: 2305 us, entries: 137 (137)
    -----------------
    | task: ksoftirqd/0/3, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: schedule+0x51/0x740
 => ended at:   schedule+0x337/0x740
=======>
00000001 0.000ms (+0.000ms): schedule (worker_thread)
00000001 0.000ms (+0.000ms): sched_clock (schedule)
00000001 0.000ms (+0.000ms): spin_lock_irq (schedule)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (spin_lock_irq)
00000002 0.001ms (+0.000ms): deactivate_task (schedule)
00000002 0.001ms (+0.000ms): dequeue_task (deactivate_task)
04000002 0.765ms (+0.764ms): __switch_to (schedule)
04000002 0.800ms (+0.034ms): finish_task_switch (schedule)
04010002 1.116ms (+0.316ms): do_IRQ (finish_task_switch)
04010002 1.116ms (+0.000ms): spin_lock (do_IRQ)
04010003 1.117ms (+0.000ms): ack_edge_ioapic_irq (do_IRQ)
04010003 1.118ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
04010002 1.328ms (+0.210ms): generic_handle_IRQ_event (do_IRQ)
04010002 1.797ms (+0.468ms): timer_interrupt (generic_handle_IRQ_event)
04010002 1.840ms (+0.043ms): spin_lock (timer_interrupt)
04010003 1.842ms (+0.001ms): mark_offset_tsc (timer_interrupt)
04010003 1.842ms (+0.000ms): mark_offset_tsc (timer_interrupt)
04010003 1.842ms (+0.000ms): spin_lock (mark_offset_tsc)
04010004 1.890ms (+0.048ms): mark_offset_tsc (timer_interrupt)
04010004 2.236ms (+0.345ms): mark_offset_tsc (timer_interrupt)
04010004 2.236ms (+0.000ms): spin_lock (mark_offset_tsc)
04010005 2.236ms (+0.000ms): mark_offset_tsc (timer_interrupt)
04010005 2.240ms (+0.003ms): mark_offset_tsc (timer_interrupt)
04010005 2.244ms (+0.004ms): mark_offset_tsc (timer_interrupt)
04010005 2.246ms (+0.002ms): mark_offset_tsc (timer_interrupt)
04010005 2.247ms (+0.000ms): mark_offset_tsc (timer_interrupt)
04010004 2.247ms (+0.000ms): mark_offset_tsc (timer_interrupt)
04010004 2.247ms (+0.000ms): mark_offset_tsc (timer_interrupt)
04010004 2.248ms (+0.000ms): mark_offset_tsc (timer_interrupt)
04010003 2.249ms (+0.001ms): timer_interrupt (generic_handle_IRQ_event)
...

So this trace shows a number of the symptoms I have previously
reported.

When I used the lower disk settings, the latencies went way
down [at least up to the crash...] with a maximum of 618 us.
I looked through several traces and could not find a similar
sequence for comparison.

So we may both be right.
 - there is hardware overhead that needs to be accounted for; this
pair of runs points to disk I/O, which we previously confirmed is
DMA to/from an IDE disk.
 - there may be steps in the code that cause longer latency which
we want to avoid if possible.

It may be a combination of effects. A question for others doing
testing (like Lee): have you been doing any other activity in
the background when doing your tests? For example, I have found
that something as simple as
  head -c $1 /dev/zero >tmpfile  [where $1 is larger than physical memory]
or
  cat tmpfile > /dev/null
can cause significantly increased latencies in the 2.6 kernels.

  --Mark


* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9
@ 2004-09-03 18:28 Mark_H_Johnson
  2004-09-03 18:33 ` Ingo Molnar
  0 siblings, 1 reply; 84+ messages in thread
From: Mark_H_Johnson @ 2004-09-03 18:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: K.R. Foley, linux-kernel, Felipe Alfaro Solana, Daniel Schmitt,
	Lee Revell

>a quick suggestion: could you add this near the top of sched.c (below
>the #include lines):
>
> #define static
>
>this will turn off all inlining and makes the scheduler-internal
>functions visible. ....

I did this and built a broken kernel. It panics very early in startup.
I also got warnings like
kernel/sched.c: In function __might_sleep
kernel/sched.c:4974: warning: prev_jiffy might be used uninitialized in
this function
which, if I read the code right, should be a local static variable.

I suppose you meant
 #define inline
instead (which throws a warning about a duplicate definition; can I
ignore it?)

--Mark H Johnson
  <mailto:Mark_H_Johnson@raytheon.com>




Thread overview: 84+ messages
2004-08-30 19:13 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Mark_H_Johnson
2004-08-30 19:21 ` Ingo Molnar
2004-09-01 12:31   ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 - netdev_max_back_log is too small P.O. Gaillard
2004-09-01 13:05     ` Ingo Molnar
2004-09-02 11:24       ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q6 - network is no longer smooth P.O. Gaillard
2004-09-02 11:28         ` Ingo Molnar
2004-09-02 15:26           ` P.O. Gaillard
2004-08-31  8:49 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q5 Ingo Molnar
2004-09-02  6:33 ` Ingo Molnar
2004-09-02  6:55   ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q8 Ingo Molnar
2004-09-02  7:04     ` Lee Revell
2004-09-02  7:15       ` Ingo Molnar
2004-09-02  7:31         ` Lee Revell
2004-09-02  7:46           ` Ingo Molnar
2004-09-03  1:10             ` Rusty Russell
2004-09-02 23:25         ` Lee Revell
2004-09-02 23:28           ` Ingo Molnar
2004-09-02 23:32             ` Lee Revell
2004-09-02  7:17       ` Ingo Molnar
2004-09-02  8:23     ` Ingo Molnar
2004-09-02 11:10     ` [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Ingo Molnar
2004-09-02 12:14       ` Thomas Charbonnel
2004-09-02 13:16       ` Thomas Charbonnel
2004-09-02 13:23         ` Ingo Molnar
2004-09-02 14:38           ` Thomas Charbonnel
2004-09-02 21:57       ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
2004-09-02 22:06         ` Lee Revell
2004-09-02 22:14           ` Ingo Molnar
2004-09-02 22:15             ` Lee Revell
2004-09-03  0:24             ` Lee Revell
2004-09-03  3:17               ` Eric St-Laurent
2004-09-03  6:26                 ` Lee Revell
2004-09-03  6:36                   ` Ingo Molnar
2004-09-03  6:49                     ` Lee Revell
2004-09-03  7:01                       ` Ingo Molnar
2004-09-03  7:05                       ` Ingo Molnar
2004-09-03  7:40                         ` Lee Revell
2004-09-03  7:50                           ` Free Ekanayaka
2004-09-03  8:05                             ` Lee Revell
2004-09-03  9:05                               ` Free Ekanayaka
2004-09-03  9:25                               ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar
2004-09-03  9:50                                 ` Luke Yelavich
2004-09-03 10:29                                   ` Ingo Molnar
2004-09-03 10:43                                     ` Luke Yelavich
2004-09-03 11:33                                 ` Thomas Charbonnel
2004-09-03 11:49                                   ` Ingo Molnar
2004-09-03 12:05                                     ` Thomas Charbonnel
2004-09-03 16:14                                     ` Thomas Charbonnel
2004-09-03 17:36                                       ` Thomas Charbonnel
2004-09-03 11:36                                 ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R2 Ingo Molnar
2004-09-03  8:09                           ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Luke Yelavich
2004-09-03  8:13                             ` Lee Revell
2004-09-03  8:21                               ` Luke Yelavich
2004-09-03 12:52                               ` Luke Yelavich
2004-09-03 18:09                         ` K.R. Foley
2004-09-03 11:04         ` K.R. Foley
2004-09-03 17:02         ` K.R. Foley
2004-09-03 20:40           ` Lee Revell
2004-09-03 17:10         ` K.R. Foley
2004-09-03 18:17           ` Ingo Molnar
2004-09-03 18:36             ` K.R. Foley
2004-09-03 19:30             ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R3 Ingo Molnar
2004-09-03 19:49               ` K.R. Foley
2004-09-04  3:39               ` K.R. Foley
2004-09-04  3:43               ` K.R. Foley
2004-09-04  6:41                 ` Ingo Molnar
2004-09-04 12:28                   ` K.R. Foley
2004-09-04  8:57                 ` Ingo Molnar
2004-09-04 10:16                   ` Lee Revell
2004-09-04 14:35                   ` K.R. Foley
2004-09-04 20:05                     ` Ingo Molnar
2004-09-03 18:39           ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
2004-09-03 18:41             ` K.R. Foley
  -- strict thread matches above, loose matches on Subject: below --
2004-09-02 20:21 [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9 Mark_H_Johnson
2004-09-02 20:38 ` Ingo Molnar
2004-09-02 22:43 ` Ingo Molnar
2004-09-03  9:09   ` Ingo Molnar
2004-09-03 13:42 Mark_H_Johnson
2004-09-03 15:33 Mark_H_Johnson
2004-09-04  0:04 ` Lee Revell
2004-09-04 16:52   ` Alan Cox
2004-09-04 18:05     ` Lee Revell
2004-09-03 18:28 Mark_H_Johnson
2004-09-03 18:33 ` Ingo Molnar
