public inbox for linux-kernel@vger.kernel.org
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-07 16:54 Mark_H_Johnson
  2004-09-08 18:42 ` Ingo Molnar
  2004-09-09 16:02 ` Ingo Molnar
  0 siblings, 2 replies; 36+ messages in thread
From: Mark_H_Johnson @ 2004-09-07 16:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
	K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
	nando, luke, free78

Results from this morning's test with -R1 and some follow up on
related messages.

Two runs with -R1 plus the following changes
 - a change to the ensonic driver to reduce latency
 - added mcount() calls in sched.c and timer_tsc.c
 - disabled inline functions in sched.c
The last two so we can see the steps a little more clearly.

The differences between the runs were:
 - first run had DRI and hardware acceleration enabled
 - second run disabled DRI / hardware acceleration
as suggested by Lee Revell.

From Lee Revell <rlrevell@joe-job.com>:
>This is looking more and more like a video driver problem:
>...
>The easiest way to eliminate this possibility is to disable DRI and set
>'Option "NoAccel"' in your X config.  Do you get the same mysterious
>latencies with this setting?

I am not sure I see any significant differences in latencies. Most of
the large delays occur during disk reads, and the number of traces >
500 usec was over 100 in both tests (about 30 minutes each). There were
also more latency traces per minute during disk reads when DRI was
disabled. However, there were none of the "find_next_bit" traces with
significant time delays in the NoAccel run, so that may indicate a
display-related problem, but I am not sure. See below for details.

Side Comment
============

If you look at the date / time of the traces, you will notice that
most occur in the latter part of the test. This is during the
"disk copy" and "disk read" parts of the testing. To illustrate, the
first 10 traces take about 20 minutes to collect; the last 90 are
collected in 15 minutes (first test) and 9 minutes (second test).

It is also encouraging that the longest trace is < 800 usec in these
two runs. This is far better than what I saw a couple weeks ago.

Shortest trace
==============

preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
--------------------------------------------------
 latency: 550 us, entries: 6 (6)
    -----------------
    | task: cat/6771, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: kmap_atomic+0x23/0xe0
 => ended at:   kunmap_atomic+0x7b/0xa0
=======>
00000001 0.000ms (+0.000ms): kmap_atomic (file_read_actor)
00000001 0.000ms (+0.000ms): page_address (file_read_actor)
00000001 0.000ms (+0.549ms): __copy_to_user_ll (file_read_actor)
00000001 0.550ms (+0.000ms): kunmap_atomic (file_read_actor)
00000001 0.550ms (+0.000ms): sub_preempt_count (kunmap_atomic)
00000001 0.550ms (+0.000ms): update_max_trace (check_preempt_timing)

The 1/2 msec latency in __copy_to_user_ll occurred a few more times
in both tests.

Find Next Bit
=============

It appears that -R1 has significantly fewer cases where the code in
find_next_bit is delayed. To summarize:

# grep find_next_bit lt040903/lt*/lt.* | grep '(+0.1' | wc -l
63
# grep find_next_bit lt040907/lt*/lt.* | grep '(+0.1' | wc -l
0

[counting number of trace entries w/ 100-199 usec latencies]

The runs on September 3 were with -Q9; today's (September 7) were
with -R1. I am not sure why this changed, but it is encouraging. The
maximum find_next_bit delay with DRI / display acceleration was
+0.069ms; the maximum without DRI was +0.001ms.
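The per-threshold grep | wc pipelines above can be generalized with a
short awk histogram over the delay field. A minimal sketch, assuming
the trace line format shown in this message (the delay is the third
field, e.g. "(+0.131ms):", and the called function is the fourth);
the sample file here stands in for the real lt040907/lt*/lt.* trees:

```shell
# Count, per function, the trace entries with a delay of 100 usec or
# more. Sample lines copied from traces in this message stand in for
# the real trace files.
cat > /tmp/lt.sample <<'EOF'
00000001 0.000ms (+0.000ms): kmap_atomic (file_read_actor)
00000001 0.000ms (+0.549ms): __copy_to_user_ll (file_read_actor)
00010003 0.141ms (+0.131ms): mark_offset_tsc (timer_interrupt)
00010004 0.273ms (+0.145ms): mark_offset_tsc (timer_interrupt)
EOF
out_a=$(awk '{
    d = $3                     # e.g. "(+0.131ms):"
    gsub(/[(+):ms]/, "", d)    # strip down to the bare number
    if (d + 0 >= 0.100)        # keep delays of 100 usec and longer
        count[$4]++
}
END { for (f in count) print count[f], f }' /tmp/lt.sample | sort -rn)
echo "$out_a"
```

Run over a whole lt* tree, this gives all the per-function counts in
one pass instead of one grep pipeline per threshold.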

Mark Offset TSC
===============

In a similar way, long latencies in mark_offset_tsc were significantly
reduced in -R1.
# grep _tsc lt040903/lt*/lt.* | grep '(+0.1' | wc -l
24
# grep _tsc lt040907/lt*/lt.* | grep '(+0.1' | wc -l
3

Two of the long delays were in the same trace in the following
sequence:

00010001 0.140ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event)
00010001 0.140ms (+0.000ms): spin_lock (timer_interrupt)
00010002 0.140ms (+0.000ms): spin_lock (<00000000>)
00010002 0.141ms (+0.000ms): mark_offset_tsc (timer_interrupt) [0]
00010002 0.141ms (+0.000ms): mark_offset_tsc (timer_interrupt) [1]
00010002 0.141ms (+0.000ms): spin_lock (mark_offset_tsc)
00010003 0.141ms (+0.000ms): spin_lock (<00000000>)
00010003 0.141ms (+0.131ms): mark_offset_tsc (timer_interrupt) [2]
00010003 0.273ms (+0.000ms): mark_offset_tsc (timer_interrupt) [3]
00010003 0.273ms (+0.000ms): spin_lock (mark_offset_tsc)
00010004 0.273ms (+0.000ms): spin_lock (<00000000>)
00010004 0.273ms (+0.145ms): mark_offset_tsc (timer_interrupt) [4]
00010004 0.419ms (+0.004ms): mark_offset_tsc (timer_interrupt) [5]
00010004 0.423ms (+0.002ms): mark_offset_tsc (timer_interrupt) [6]
00010004 0.425ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010004 0.425ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010003 0.426ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010003 0.426ms (+0.046ms): mark_offset_tsc (timer_interrupt)
00010003 0.472ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010002 0.472ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event)

For reference, the steps in the code read (w/o comments):

        mcount();                   /* [1] */
        write_seqlock(&monotonic_lock);
        mcount();                   /* [2] */
        last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
        rdtsc(last_tsc_low, last_tsc_high);

        mcount();                   /* [3] */
        spin_lock(&i8253_lock);
        mcount();                   /* [4] */
        outb_p(0x00, PIT_MODE);     /* latch the count ASAP */
        mcount();                   /* [5] */

        count = inb_p(PIT_CH0);     /* read the latched count */
        mcount();                   /* [6] */
        count |= inb(PIT_CH0) << 8;
        mcount();                   /* [7] */

I numbered the traces to match the corresponding mcount() calls. If I
labeled this right, the delays were in the rdtsc call and in the outb_p
that latches the count. Perhaps a hardware-level delay is taking place.
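To tie each large gap back to a numbered region without reading the
whole trace by hand, the bracketed markers can be pulled out
mechanically. A small sketch (the sample lines are copied from the
trace above; the 100 usec threshold is arbitrary):

```shell
# Print "marker delay" for every bracketed mcount entry whose
# following gap is 100 usec or more.
cat > /tmp/mark_offset.trace <<'EOF'
00010003 0.141ms (+0.131ms): mark_offset_tsc (timer_interrupt) [2]
00010003 0.273ms (+0.000ms): mark_offset_tsc (timer_interrupt) [3]
00010004 0.273ms (+0.145ms): mark_offset_tsc (timer_interrupt) [4]
00010004 0.419ms (+0.004ms): mark_offset_tsc (timer_interrupt) [5]
EOF
out_b=$(awk '/\[[0-9]+\]$/ {
    d = $3; gsub(/[(+):ms]/, "", d)   # "(+0.131ms):" -> "0.131"
    if (d + 0 >= 0.100) print $NF, d  # marker label and its gap
}' /tmp/mark_offset.trace)
echo "$out_b"
```

The two lines printed here point at the gaps following markers [2]
and [4], i.e. the same two regions called out above.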

Clear Page Tables
=================

This is the longest single latency with the following traces:

# grep '(+0.6' lt040907/lt*/lt.*
lt040907/lt001.v3k1/lt.28:00000001 0.001ms (+0.635ms): clear_page_tables (exit_mmap)
lt040907/lt002.v3k1/lt.75:00000001 0.001ms (+0.628ms): clear_page_tables (exit_mmap)

__modify_IO_APIC_irq
====================

Not quite sure what is happening here, but this trace

preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
--------------------------------------------------
 latency: 612 us, entries: 111 (111)
    -----------------
    | task: kblockd/0/10, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: __spin_lock_irqsave+0x4b/0xa0
 => ended at:   as_work_handler+0x5c/0xa0
=======>
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (spin_lock_irq)
00000001 0.000ms (+0.000ms): generic_enable_irq (ide_do_request)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (generic_enable_irq)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000002 0.001ms (+0.000ms): unmask_IO_APIC_irq (generic_enable_irq)
00000002 0.001ms (+0.000ms): __spin_lock_irqsave (unmask_IO_APIC_irq)
00000002 0.001ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000003 0.001ms (+0.000ms): __unmask_IO_APIC_irq (unmask_IO_APIC_irq)
00000003 0.002ms (+0.567ms): __modify_IO_APIC_irq (__unmask_IO_APIC_irq)
00010001 0.569ms (+0.000ms): do_nmi (smp_apic_timer_interrupt)
00010001 0.569ms (+0.002ms): do_nmi (sched_clock)
00010001 0.572ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.572ms (+0.000ms): profile_hook (profile_tick)
00010001 0.573ms (+0.000ms): read_lock (profile_hook)
00010002 0.573ms (+0.000ms): read_lock (<00000000>)
00010002 0.573ms (+0.000ms): notifier_call_chain (profile_hook)
00010001 0.574ms (+0.000ms): profile_hit (nmi_watchdog_tick)
00000001 0.575ms (+0.000ms): smp_apic_timer_interrupt (as_work_handler)

is an example where __modify_IO_APIC_irq shows up with a long latency.
There were about a dozen long latencies (> 100 usec) in the two tests
run this morning, slightly fewer (6 vs 8) in the first test with DRI
and display acceleration turned on. This appears to be an increase
with -R1; tests over three days (-Q5 through -Q9) only had 4 long
latencies at this spot. [Hmm - that might be due to the change in
the way we compute / display the latencies.]

Spin Lock
=========

We seem to have gotten stuck here in a spin lock...

 => started at: spin_lock+0x34/0x90
 => ended at:   journal_commit_transaction+0xa25/0x1c30
=======>
00000001 0.000ms (+0.559ms): spin_lock (journal_commit_transaction)

and here...

 => started at: __read_lock_irqsave+0x93/0xa0
 => ended at:   snd_pcm_lib_write1+0x3ba/0x600
=======>
00000001 0.000ms (+0.000ms): __read_lock_irqsave (read_lock_irq)
00000001 0.000ms (+0.000ms): spin_lock (snd_pcm_lib_write1)
00000002 0.000ms (+0.000ms): spin_lock (<00000000>)
00000002 0.000ms (+0.000ms): snd_pcm_update_hw_ptr (snd_pcm_lib_write1)
...
00000002 0.008ms (+0.000ms): snd_ensoniq_trigger (snd_pcm_do_stop)
00000002 0.009ms (+0.000ms): spin_lock (snd_ensoniq_trigger)
00000003 0.009ms (+0.549ms): spin_lock (<00000000>)

Overall, the counts of traces referring to spin_lock were...

# grep 'spin_lock (' lt040907/lt*/* | grep -v '+0.0' | wc -l
35
# grep 'spin_lock (' lt040903/lt*/* | grep -v '+0.0' | wc -l
11

More of these traces in -R1 than -Q9, but I am not sure if this is
significant or not.

Context Switch
==============

With the inlined functions disabled, and some extra debug code,
I saw context_switch appear > 800 times in the long latency traces.
It is usually very fast (< 10 usec) but sometimes listed with some
long latencies (> 100 usec, in 45 trace lines).
For example:

 => started at: schedule+0x5b/0x610
 => ended at:   schedule+0x385/0x610
...
00000002 0.004ms (+0.000ms): schedule (do_irqd)
00000002 0.005ms (+0.000ms): context_switch (schedule)
00000002 0.005ms (+0.000ms): dummy_cs_entry (context_switch)
00000002 0.005ms (+0.000ms): context_switch (schedule)
00000002 0.005ms (+0.000ms): dummy_cs_switch_mm (context_switch)
00000002 0.005ms (+0.111ms): context_switch (schedule)
00000002 0.116ms (+0.000ms): dummy_cs_unlikely_if (context_switch)
00000002 0.117ms (+0.000ms): context_switch (schedule)
00000002 0.117ms (+0.000ms): dummy_cs_switch_to (context_switch)
00000002 0.117ms (+0.029ms): context_switch (schedule)
00000002 0.147ms (+0.128ms): __switch_to (context_switch)
00000002 0.275ms (+0.001ms): dummy_cs_exit (context_switch)
00000002 0.277ms (+0.155ms): context_switch (schedule)
00000002 0.432ms (+0.000ms): finish_task_switch (schedule)
00000002 0.433ms (+0.000ms): smp_apic_timer_interrupt (finish_task_switch)
00010002 0.433ms (+0.137ms): profile_tick (smp_apic_timer_interrupt)
00010002 0.571ms (+0.000ms): profile_hook (profile_tick)
00010002 0.571ms (+0.000ms): read_lock (profile_hook)
00010003 0.571ms (+0.000ms): read_lock (<00000000>)
00010003 0.571ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.572ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)

The dummy_cs lines refer to calls I added to context_switch at each
step to read...

        dummy_cs_entry();
        if (unlikely(!mm)) {
                next->active_mm = oldmm;
                atomic_inc(&oldmm->mm_count);
                dummy_cs_lazy_tlb();
                enter_lazy_tlb(oldmm, next);
        } else {
                dummy_cs_switch_mm();
                switch_mm(oldmm, mm, next);
        }

        dummy_cs_unlikely_if();
        if (unlikely(!prev->mm)) {
                prev->active_mm = NULL;
                WARN_ON(rq->prev_mm);
                rq->prev_mm = oldmm;
        }

        /* Here we just switch the register state and the stack. */
        dummy_cs_switch_to();
        switch_to(prev, next, prev);

        dummy_cs_exit();
        return prev;

The switch_mm path of the if appears to be more costly time-wise
than the lazy_tlb path. To see this in the detailed traces, try
something like...

grep -C2 'dummy_cs_switch_mm' lt040907/lt*/* | less -im
or
grep -C2 'dummy_cs_lazy_tlb' lt040907/lt*/* | less -im
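To put numbers on that impression instead of eyeballing the grep -C2
output, one could attribute the delay on the line following each
dummy_cs marker to that marker (the delay reported for an entry is the
time until the next entry). A sketch under that assumption; the sample
lines are from the context_switch trace above:

```shell
# Average, per dummy_cs marker, the delay reported on the line that
# follows it. Sample lines from the trace above stand in for the
# real files.
cat > /tmp/cs.trace <<'EOF'
00000002 0.005ms (+0.000ms): dummy_cs_switch_mm (context_switch)
00000002 0.005ms (+0.111ms): context_switch (schedule)
00000002 0.116ms (+0.000ms): dummy_cs_unlikely_if (context_switch)
00000002 0.117ms (+0.000ms): context_switch (schedule)
EOF
out_c=$(awk '
    marker != "" { d = $3; gsub(/[(+):ms]/, "", d)
                   sum[marker] += d; n[marker]++; marker = "" }
    $4 ~ /^dummy_cs_(switch_mm|lazy_tlb)$/ { marker = $4 }
    END { for (m in sum)
              printf "%s %.3f avg over %d\n", m, sum[m]/n[m], n[m] }
' /tmp/cs.trace)
echo "$out_c"
```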

Closing
=======

I will send copies of the traces (not to the mailing lists) in a few
minutes for review / analysis.

I also downloaded the -R8 patch and other associated patches and will
be building that soon.
  --Mark


^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-13 14:44 Mark_H_Johnson
  2004-09-14 18:32 ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Mark_H_Johnson @ 2004-09-13 14:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric St-Laurent, Free Ekanayaka, free78, K.R. Foley, linux-kernel,
	Felipe Alfaro Solana, luke, nando, P.O. Gaillard, Daniel Schmitt,
	Lee Revell

>most of your remaining latencies seem to be get_swap_page() related -
>the attached (highly experimental) patch might fix that particular
>latency. (ontop of -S0).

Qualitatively, -S0 with this patch is a little worse than what I saw with
-R8.
From the last message I sent...

> -R8 on September 10 - 38 traces > 200 usec, only 3 > 500 usec
-S0 on September 13 - 78 traces > 200 usec, only 3 > 500 usec

So I have more relatively small delays in the same test period - almost
twice as many - with few that I can see being swap related. Most are
network related. There was one very long trace > 70 msec; this is the
third day I've had a single multiple-millisecond trace in a run.

Key contributors seem to be:
 - IRQ sequence         - 00 02 50 64 73 75 76 77
 - modprobe             - 01
 - rtl8139_poll         - 03 04 07 08 09 10 12 13 15 16 17 18 19 21 24 47
                        - 49 72 74
 - tcp_prequeue_process - 05
 - release_sock         - 06 14
 - tcp_copy_to_iovec    - 11
 - do_munmap            - 20 [very long trace]
 - twkill_work          - 22
 - exit_mmap            - 23 25 26 27 28 29 30 31 32 36 37 38 39 40 41 44
                        - 46 48 51 52 53 54 55 57 58 59 60 61 62 66 67 68
                        - 69 70 71
 - kswapd               - 33 34 35 42 63
 - avc_insert           - 43 45 65
 - schedule             - 56

IRQ Sequence
============

Hard to describe - no specific steps that take a long time, just a
long series of activities, apparently IRQ related, that add up to a
> 300 usec time delay. Many of the ones I noted have mouse / signal
related activity; see also the "schedule" trace near the end for a
similar condition.

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
 latency: 342 us, entries: 280 (280)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: kjournald/191, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: __spin_lock_irqsave+0x2b/0xb0
 => ended at:   _spin_unlock_irq+0x2e/0x60
=======>
[should look at the full traces for details]

modprobe
========

The shortest trace. I had run
  system-config-soundcard
to play the test sound before starting the real time test program to
make sure the sound system ran OK. Not something I would expect to do
on a real time system, but something to fix (or document as a limitation).

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
 latency: 460 us, entries: 9 (9)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: modprobe/2995, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: __spin_lock_irqsave+0x2b/0xb0
 => ended at:   _spin_unlock_irq+0x2e/0x60
=======>
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (spin_lock_irq)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000001 0.000ms (+0.454ms): __find_symbol (resolve_symbol)
00000001 0.455ms (+0.000ms): use_module (resolve_symbol)
00000001 0.455ms (+0.003ms): already_uses (use_module)
00000001 0.459ms (+0.001ms): kmem_cache_alloc (use_module)
00000001 0.460ms (+0.000ms): _spin_unlock_irq (resolve_symbol)
00000001 0.460ms (+0.000ms): sub_preempt_count (_spin_unlock_irq)
00000001 0.461ms (+0.000ms): update_max_trace (check_preempt_timing)

rtl8139_poll
============

We've talked about this one before, but it still shows up on a regular basis.

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
 latency: 347 us, entries: 245 (245)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: ksoftirqd/0/3, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   rtl8139_rx+0x219/0x340
=======>
[again - no big steps, just a large number w/o reschedule opportunity]

tcp_prequeue_process
====================

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
 latency: 271 us, entries: 294 (294)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: rcp/3591, uid:2711 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: tcp_prequeue_process+0x49/0xb0
 => ended at:   tcp_write_xmit+0x249/0x330
=======>
[appears similar to the others - many steps, each one very short]

release_sock
============

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
 latency: 209 us, entries: 275 (275)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: rcp/3705, uid:2711 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: __spin_lock_irqsave+0x2b/0xb0
 => ended at:   cond_resched_softirq+0x65/0x90
=======>
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (spin_lock_bh)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000101 0.000ms (+0.000ms): __release_sock (release_sock)
00000101 0.000ms (+0.000ms): _spin_unlock (__release_sock)
... long sequence, not long in each step ...

tcp_copy_to_iovec
=================

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
 latency: 210 us, entries: 243 (243)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: rcp/3894, uid:2711 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: tcp_copy_to_iovec+0x5b/0xb0
 => ended at:   local_bh_enable+0x17/0xa0
=======>
00000100 0.000ms (+0.002ms): tcp_copy_to_iovec (tcp_rcv_established)
00000100 0.002ms (+0.005ms): tcp_event_data_recv (tcp_rcv_established)
00000100 0.007ms (+0.000ms): tcp_send_ack (tcp_rcv_established)
... large number of steps, each one very short in duration ...

do_munmap
=========

The very long trace (> 4000 trace entries, > 70 msec). The following
lists all steps until we get "stuck" with roughly 1 msec cycles. Note:
I don't have the "how we got out" part of this trace, but see previous
messages for that kind of detail.

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
 latency: 71065 us, entries: 4000 (5343)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: latencytest/4408, uid:0 nice:0 policy:1 rt_prio:99
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   cond_resched_lock+0x1d/0xa0
=======>
00000001 0.000ms (+0.000ms): spin_lock (do_munmap)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): unmap_region (do_munmap)
00000001 0.001ms (+0.000ms): lru_add_drain (unmap_region)
00000002 0.001ms (+0.000ms): __pagevec_lru_add_active (lru_add_drain)
00000002 0.002ms (+0.000ms): spin_lock_irq (__pagevec_lru_add_active)
00000002 0.002ms (+0.000ms): __spin_lock_irqsave (spin_lock_irq)
00000003 0.002ms (+0.001ms): __spin_lock_irqsave (<00000000>)
00000003 0.003ms (+0.000ms): _spin_unlock_irq (__pagevec_lru_add_active)
00000002 0.003ms (+0.001ms): release_pages (__pagevec_lru_add_active)
00000001 0.004ms (+0.001ms): __bitmap_weight (unmap_region)
00000001 0.006ms (+0.001ms): unmap_vmas (unmap_region)
00000001 0.007ms (+0.000ms): unmap_page_range (unmap_vmas)
00000001 0.007ms (+0.000ms): zap_pmd_range (unmap_page_range)
00000001 0.008ms (+0.000ms): zap_pte_range (zap_pmd_range)
00000001 0.009ms (+0.000ms): kmap_atomic (zap_pte_range)
00000002 0.009ms (+0.001ms): page_address (zap_pte_range)
00000002 0.011ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.012ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.013ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.013ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.013ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.013ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.014ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.014ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.015ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.015ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.015ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.016ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.016ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.016ms (+0.001ms): page_remove_rmap (zap_pte_range)
00000002 0.017ms (+0.000ms): smp_apic_timer_interrupt (zap_pte_range)
00010002 0.018ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010002 0.018ms (+0.000ms): profile_hook (profile_tick)
00010002 0.018ms (+0.000ms): read_lock (profile_hook)
00010003 0.019ms (+0.000ms): read_lock (<00000000>)
00010003 0.019ms (+0.000ms): notifier_call_chain (profile_hook)
00010003 0.019ms (+0.000ms): _read_unlock (profile_tick)
00010002 0.020ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010002 0.021ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010002 0.021ms (+0.000ms): update_one_process (update_process_times)
00010002 0.022ms (+0.000ms): run_local_timers (update_process_times)
00010002 0.022ms (+0.000ms): raise_softirq (update_process_times)
00010002 0.023ms (+0.000ms): scheduler_tick (update_process_times)
00010002 0.023ms (+0.002ms): sched_clock (scheduler_tick)
00010002 0.026ms (+0.000ms): spin_lock (scheduler_tick)
00010003 0.027ms (+0.000ms): spin_lock (<00000000>)
00010003 0.027ms (+0.000ms): _spin_unlock (scheduler_tick)
00010002 0.027ms (+0.001ms): rebalance_tick (scheduler_tick)
00000003 0.028ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000003 0.029ms (+0.001ms): __do_softirq (do_softirq)
00000002 0.030ms (+0.000ms): set_page_dirty (zap_pte_range)
00000002 0.031ms (+0.000ms): page_remove_rmap (zap_pte_range)
00000002 0.031ms (+0.001ms): kunmap_atomic (zap_pte_range)
00000001 0.032ms (+0.002ms): flush_tlb_mm (unmap_vmas)
00000002 0.034ms (+0.000ms): flush_tlb_others (flush_tlb_mm)
00000002 0.035ms (+0.000ms): spin_lock (flush_tlb_others)
00000003 0.035ms (+0.000ms): spin_lock (<00000000>)
00000003 0.036ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000003 0.037ms (+0.158ms): send_IPI_mask_bitmask (flush_tlb_others)
00010003 0.195ms (+0.000ms): do_nmi (flush_tlb_others)
00010003 0.195ms (+0.002ms): do_nmi (__trace)
00010003 0.198ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010003 0.199ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010003 0.199ms (+0.000ms): profile_hook (profile_tick)
00010003 0.199ms (+0.000ms): read_lock (profile_hook)
00010004 0.199ms (+0.000ms): read_lock (<00000000>)
00010004 0.199ms (+0.000ms): notifier_call_chain (profile_hook)
00010004 0.200ms (+0.000ms): _read_unlock (profile_tick)
00010003 0.200ms (+0.001ms): profile_hit (nmi_watchdog_tick)
00010003 0.202ms (+0.000ms): do_IRQ (flush_tlb_others)
00010003 0.202ms (+0.000ms): do_IRQ (<00000000>)
00010003 0.202ms (+0.000ms): spin_lock (do_IRQ)
00010004 0.203ms (+0.000ms): spin_lock (<00000000>)
00010004 0.203ms (+0.000ms): ack_edge_ioapic_irq (do_IRQ)
00010004 0.203ms (+0.000ms): redirect_hardirq (do_IRQ)
00010004 0.204ms (+0.000ms): _spin_unlock (do_IRQ)
00010003 0.204ms (+0.000ms): handle_IRQ_event (do_IRQ)
00010003 0.204ms (+0.000ms): timer_interrupt (handle_IRQ_event)
00010003 0.204ms (+0.000ms): spin_lock (timer_interrupt)
00010004 0.205ms (+0.000ms): spin_lock (<00000000>)
00010004 0.205ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010004 0.205ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010004 0.206ms (+0.000ms): spin_lock (mark_offset_tsc)
00010005 0.206ms (+0.000ms): spin_lock (<00000000>)
00010005 0.206ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 0.206ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 0.206ms (+0.000ms): spin_lock (mark_offset_tsc)
00010006 0.207ms (+0.000ms): spin_lock (<00000000>)
00010006 0.207ms (+0.003ms): mark_offset_tsc (timer_interrupt)
00010006 0.211ms (+0.004ms): mark_offset_tsc (timer_interrupt)
00010006 0.215ms (+0.002ms): mark_offset_tsc (timer_interrupt)
00010006 0.217ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010006 0.217ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010006 0.217ms (+0.000ms): _spin_unlock (mark_offset_tsc)
00010005 0.218ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 0.218ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 0.218ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 0.219ms (+0.000ms): _spin_unlock (mark_offset_tsc)
00010004 0.219ms (+0.000ms): timer_interrupt (handle_IRQ_event)
00010004 0.219ms (+0.000ms): spin_lock (timer_interrupt)
00010005 0.220ms (+0.003ms): spin_lock (<00000000>)
00010005 0.223ms (+0.000ms): _spin_unlock (timer_interrupt)
00010004 0.224ms (+0.000ms): do_timer (timer_interrupt)
00010004 0.224ms (+0.000ms): update_wall_time (do_timer)
00010004 0.225ms (+0.000ms): update_wall_time_one_tick (update_wall_time)
00010004 0.225ms (+0.000ms): _spin_unlock (timer_interrupt)
00010003 0.226ms (+0.000ms): spin_lock (do_IRQ)
00010004 0.226ms (+0.000ms): spin_lock (<00000000>)
00010004 0.226ms (+0.000ms): note_interrupt (do_IRQ)
00010004 0.227ms (+0.000ms): end_edge_ioapic_irq (do_IRQ)
00010004 0.227ms (+0.000ms): _spin_unlock (do_IRQ)
00000004 0.227ms (+0.000ms): do_softirq (do_IRQ)
00000004 0.227ms (+0.788ms): __do_softirq (do_softirq)
00000003 1.016ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
00010003 1.016ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010003 1.017ms (+0.000ms): profile_hook (profile_tick)
00010003 1.017ms (+0.000ms): read_lock (profile_hook)
00010004 1.017ms (+0.000ms): read_lock (<00000000>)
00010004 1.017ms (+0.000ms): notifier_call_chain (profile_hook)
00010004 1.017ms (+0.000ms): _read_unlock (profile_tick)
00010003 1.018ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010003 1.018ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010003 1.018ms (+0.000ms): update_one_process (update_process_times)
00010003 1.018ms (+0.000ms): run_local_timers (update_process_times)
00010003 1.018ms (+0.000ms): raise_softirq (update_process_times)
00010003 1.019ms (+0.000ms): scheduler_tick (update_process_times)
00010003 1.019ms (+0.000ms): sched_clock (scheduler_tick)
00010003 1.019ms (+0.000ms): spin_lock (scheduler_tick)
00010004 1.019ms (+0.000ms): spin_lock (<00000000>)
00010004 1.019ms (+0.000ms): _spin_unlock (scheduler_tick)
00010003 1.020ms (+0.000ms): rebalance_tick (scheduler_tick)
00000004 1.020ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000004 1.020ms (+0.173ms): __do_softirq (do_softirq)
00010003 1.193ms (+0.000ms): do_nmi (flush_tlb_others)
00010003 1.194ms (+0.001ms): do_nmi (__trace)
00010003 1.195ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010003 1.196ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010003 1.196ms (+0.000ms): profile_hook (profile_tick)
00010003 1.196ms (+0.000ms): read_lock (profile_hook)
00010004 1.196ms (+0.000ms): read_lock (<00000000>)
00010004 1.196ms (+0.000ms): notifier_call_chain (profile_hook)
00010004 1.197ms (+0.000ms): _read_unlock (profile_tick)
00010003 1.197ms (+0.000ms): profile_hit (nmi_watchdog_tick)
00010003 1.198ms (+0.000ms): do_IRQ (flush_tlb_others)
00010003 1.198ms (+0.000ms): do_IRQ (<00000000>)
00010003 1.198ms (+0.000ms): spin_lock (do_IRQ)
00010004 1.198ms (+0.000ms): spin_lock (<00000000>)
00010004 1.198ms (+0.000ms): ack_edge_ioapic_irq (do_IRQ)
00010004 1.199ms (+0.000ms): redirect_hardirq (do_IRQ)
00010004 1.199ms (+0.000ms): _spin_unlock (do_IRQ)
00010003 1.199ms (+0.000ms): handle_IRQ_event (do_IRQ)
00010003 1.199ms (+0.000ms): timer_interrupt (handle_IRQ_event)
00010003 1.199ms (+0.000ms): spin_lock (timer_interrupt)
00010004 1.200ms (+0.000ms): spin_lock (<00000000>)
00010004 1.200ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010004 1.200ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010004 1.200ms (+0.000ms): spin_lock (mark_offset_tsc)
00010005 1.200ms (+0.000ms): spin_lock (<00000000>)
00010005 1.201ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 1.201ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 1.201ms (+0.000ms): spin_lock (mark_offset_tsc)
00010006 1.201ms (+0.000ms): spin_lock (<00000000>)
00010006 1.201ms (+0.003ms): mark_offset_tsc (timer_interrupt)
00010006 1.205ms (+0.004ms): mark_offset_tsc (timer_interrupt)
00010006 1.209ms (+0.002ms): mark_offset_tsc (timer_interrupt)
00010006 1.212ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010006 1.212ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010006 1.212ms (+0.000ms): _spin_unlock (mark_offset_tsc)
00010005 1.212ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 1.213ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 1.213ms (+0.000ms): mark_offset_tsc (timer_interrupt)
00010005 1.213ms (+0.000ms): _spin_unlock (mark_offset_tsc)
00010004 1.213ms (+0.000ms): timer_interrupt (handle_IRQ_event)
00010004 1.213ms (+0.000ms): spin_lock (timer_interrupt)
00010005 1.214ms (+0.003ms): spin_lock (<00000000>)
00010005 1.217ms (+0.000ms): _spin_unlock (timer_interrupt)
00010004 1.218ms (+0.000ms): do_timer (timer_interrupt)
00010004 1.218ms (+0.000ms): update_wall_time (do_timer)
00010004 1.218ms (+0.000ms): update_wall_time_one_tick (update_wall_time)
00010004 1.218ms (+0.000ms): _spin_unlock (timer_interrupt)
00010003 1.219ms (+0.000ms): spin_lock (do_IRQ)
00010004 1.219ms (+0.000ms): spin_lock (<00000000>)
00010004 1.219ms (+0.000ms): note_interrupt (do_IRQ)
00010004 1.219ms (+0.000ms): end_edge_ioapic_irq (do_IRQ)
00010004 1.219ms (+0.000ms): _spin_unlock (do_IRQ)
00000004 1.220ms (+0.000ms): do_softirq (do_IRQ)
00000004 1.220ms (+0.795ms): __do_softirq (do_softirq)
00000003 2.016ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
... the steps over the last millisecond get repeated for several cycles ...

twkill_work
===========

A very long sequence (> 1500 trace entries) with non-zero preempt
values that end in "00". Does that mean we did not hold any locks but
could not schedule for other reasons?
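If the leading hex field is the task's preempt_count in the usual 2.6
layout - low byte the preemption depth, next byte the softirq count,
hardirq count above that; an assumption worth checking against the
tracer source - then a value like 00000100 would mean softirq context
with no spinlocks held, which by itself is enough to prevent
rescheduling. A small decoder sketch under that assumption:

```shell
# Decode a trace preempt field, assuming the 2.6 preempt_count layout
# (bits 0-7 preempt depth, bits 8-15 softirq count, bits 16 and up
# hardirq count). The layout is an assumption; verify it against the
# tracer source before trusting the split.
decode() {
    v=$(printf '%d' "0x$1")
    echo "preempt=$((v & 0xff)) softirq=$(((v >> 8) & 0xff)) hardirq=$((v >> 16))"
}
decode 00000100   # softirq context, no locks held
decode 00010002   # inside a hardirq, preempt depth 2
```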

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
 latency: 429 us, entries: 1531 (1531)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: events/0/6, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: __spin_lock_irqsave+0x2b/0xb0
 => ended at:   twkill_work+0xb5/0xe0
=======>
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (spin_lock_bh)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000101 0.000ms (+0.000ms): tcp_do_twkill_work (twkill_work)
00000101 0.000ms (+0.000ms): _spin_unlock (tcp_do_twkill_work)
00000100 0.001ms (+0.000ms): tcp_timewait_kill (tcp_do_twkill_work)
00000100 0.001ms (+0.000ms): write_lock (tcp_timewait_kill)
00000101 0.001ms (+0.000ms): write_lock (<00000000>)
00000101 0.002ms (+0.000ms): _write_unlock (tcp_timewait_kill)
00000100 0.002ms (+0.000ms): spin_lock (tcp_timewait_kill)
00000101 0.002ms (+0.000ms): spin_lock (<00000000>)
00000101 0.002ms (+0.000ms): tcp_bucket_destroy (tcp_timewait_kill)
00000101 0.003ms (+0.000ms): kmem_cache_free (tcp_bucket_destroy)
00000101 0.003ms (+0.000ms): _spin_unlock (tcp_timewait_kill)
00000100 0.003ms (+0.000ms): kmem_cache_free (tcp_do_twkill_work)
00000100 0.003ms (+0.000ms): spin_lock (tcp_do_twkill_work)
00000101 0.004ms (+0.000ms): spin_lock (<00000000>)
00000101 0.004ms (+0.000ms): _spin_unlock (tcp_do_twkill_work)
00000100 0.004ms (+0.000ms): tcp_timewait_kill (tcp_do_twkill_work)
00000100 0.004ms (+0.000ms): write_lock (tcp_timewait_kill)
...

exit_mmap
=========

This one has a relatively low delay for clear_page_tables, but
several other traces show 100 to 200 usec delays at this step. [Though
if you add the delay spent handling the nmi_watchdog, this one falls
in the 100-200 usec range as well.]

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
 latency: 266 us, entries: 146 (146)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: get_ltrace.sh/5099, uid:0 nice:-9 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (cond_resched_lock)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): __bitmap_weight (unmap_vmas)
00000001 0.000ms (+0.001ms): vm_acct_memory (exit_mmap)
00000001 0.001ms (+0.056ms): clear_page_tables (exit_mmap)
00010001 0.058ms (+0.000ms): do_nmi (clear_page_tables)
00010001 0.058ms (+0.002ms): do_nmi (__mcount)
00010001 0.060ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010001 0.061ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.061ms (+0.000ms): profile_hook (profile_tick)
00010001 0.061ms (+0.000ms): read_lock (profile_hook)
00010002 0.061ms (+0.000ms): read_lock (<00000000>)
00010002 0.061ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.062ms (+0.000ms): _read_unlock (profile_tick)
00010001 0.062ms (+0.091ms): profile_hit (nmi_watchdog_tick)
00010001 0.153ms (+0.000ms): do_IRQ (clear_page_tables)
00010001 0.153ms (+0.000ms): do_IRQ (<0000000a>)
00010001 0.153ms (+0.000ms): spin_lock (do_IRQ)
00010002 0.154ms (+0.000ms): spin_lock (<00000000>)
00010002 0.154ms (+0.000ms): mask_and_ack_level_ioapic_irq (do_IRQ)
00010002 0.154ms (+0.000ms): mask_IO_APIC_irq (mask_and_ack_level_ioapic_irq)
00010002 0.154ms (+0.000ms): __spin_lock_irqsave (mask_IO_APIC_irq)
00010003 0.154ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00010003 0.154ms (+0.000ms): __mask_IO_APIC_irq (mask_IO_APIC_irq)
00010003 0.155ms (+0.018ms): __modify_IO_APIC_irq (__mask_IO_APIC_irq)
00010003 0.173ms (+0.000ms): _spin_unlock_irqrestore (mask_IO_APIC_irq)
00010002 0.174ms (+0.000ms): redirect_hardirq (do_IRQ)
00010002 0.174ms (+0.000ms): _spin_unlock (do_IRQ)
00010001 0.174ms (+0.000ms): handle_IRQ_event (do_IRQ)
00010001 0.174ms (+0.000ms): usb_hcd_irq (handle_IRQ_event)
...

kswapd
======

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
 latency: 557 us, entries: 840 (840)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (page_referenced_file)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.001ms (+0.002ms): prio_tree_first (vma_prio_tree_next)
00000001 0.004ms (+0.002ms): prio_tree_left (prio_tree_first)
00000001 0.006ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.007ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.007ms (+0.001ms): spin_lock (<00000000>)
00000002 0.009ms (+0.001ms): kmap_atomic (page_referenced_one)
00000003 0.011ms (+0.000ms): page_address (page_referenced_one)
00000003 0.012ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.012ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.012ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.013ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.013ms (+0.003ms): spin_lock (page_referenced_one)
00000002 0.017ms (+0.003ms): spin_lock (<00000000>)
00000002 0.020ms (+0.000ms): kmap_atomic (page_referenced_one)

avc_insert
==========

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
 latency: 266 us, entries: 80 (80)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: fam/2565, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: __spin_lock_irqsave+0x2b/0xb0
 => ended at:   _spin_unlock_irqrestore+0x32/0x70
=======>
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (avc_has_perm_noaudit)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000001 0.000ms (+0.204ms): avc_insert (avc_has_perm_noaudit)
00000001 0.204ms (+0.000ms): memcpy (avc_has_perm_noaudit)
00000001 0.204ms (+0.001ms): _spin_unlock_irqrestore (avc_has_perm_noaudit)
00010001 0.205ms (+0.000ms): do_IRQ (_spin_unlock_irqrestore)
00010001 0.206ms (+0.000ms): do_IRQ (<0000000a>)
00010001 0.206ms (+0.000ms): spin_lock (do_IRQ)
00010002 0.206ms (+0.000ms): spin_lock (<00000000>)
...

schedule
========

I have not talked about schedule in a while - this looks like
something different than before. It appears to involve some deep
nesting of preempt-disabling activity.

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-S0
-------------------------------------------------------
 latency: 237 us, entries: 426 (426)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: latencytest/5314, uid:0 nice:0 policy:1 rt_prio:99
    -----------------
 => started at: schedule+0x59/0x5b0
 => ended at:   schedule+0x39d/0x5b0
=======>
00000001 0.000ms (+0.000ms): schedule (ksoftirqd)
00000001 0.000ms (+0.000ms): schedule (ksoftirqd)
00000001 0.000ms (+0.000ms): schedule (ksoftirqd)
00000001 0.000ms (+0.000ms): schedule (ksoftirqd)
00000001 0.000ms (+0.000ms): schedule (ksoftirqd)
00000001 0.000ms (+0.000ms): schedule (ksoftirqd)
00000001 0.000ms (+0.000ms): sched_clock (schedule)
00000001 0.001ms (+0.000ms): schedule (ksoftirqd)
00000001 0.001ms (+0.000ms): schedule (ksoftirqd)
00000001 0.001ms (+0.000ms): schedule (ksoftirqd)
00000001 0.001ms (+0.000ms): spin_lock_irq (schedule)
00000001 0.001ms (+0.000ms): __spin_lock_irqsave (spin_lock_irq)
00000002 0.002ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000002 0.002ms (+0.000ms): schedule (ksoftirqd)
00000002 0.002ms (+0.000ms): deactivate_task (schedule)
00000002 0.002ms (+0.000ms): dequeue_task (deactivate_task)
00000002 0.002ms (+0.000ms): schedule (ksoftirqd)
00000002 0.002ms (+0.000ms): schedule (ksoftirqd)
00000002 0.003ms (+0.000ms): dependent_sleeper (schedule)
00000002 0.003ms (+0.000ms): schedule (ksoftirqd)
00000002 0.003ms (+0.000ms): schedule (ksoftirqd)
00000002 0.003ms (+0.000ms): schedule (ksoftirqd)
00000002 0.003ms (+0.000ms): schedule (ksoftirqd)
00000002 0.003ms (+0.000ms): dummy_switch_tasks (schedule)
00000002 0.003ms (+0.000ms): schedule (ksoftirqd)
00000002 0.004ms (+0.000ms): schedule (ksoftirqd)
00000002 0.004ms (+0.000ms): schedule (ksoftirqd)
00000002 0.004ms (+0.000ms): schedule (ksoftirqd)
00000002 0.004ms (+0.000ms): schedule (ksoftirqd)
00000002 0.004ms (+0.000ms): sched_info_switch (schedule)
00000002 0.004ms (+0.000ms): sched_info_depart (sched_info_switch)
00000002 0.005ms (+0.000ms): sched_info_arrive (schedule)
00000002 0.005ms (+0.000ms): sched_info_dequeued (sched_info_arrive)
00000002 0.005ms (+0.000ms): schedule (ksoftirqd)
00000002 0.005ms (+0.000ms): context_switch (schedule)
00000002 0.005ms (+0.000ms): dummy_cs_entry (context_switch)
00000002 0.005ms (+0.000ms): context_switch (schedule)
00000002 0.006ms (+0.000ms): dummy_cs_switch_mm (context_switch)
00000002 0.006ms (+0.000ms): context_switch (schedule)
00000002 0.006ms (+0.000ms): dummy_cs_unlikely_if (context_switch)
00000002 0.006ms (+0.000ms): context_switch (schedule)
00000002 0.006ms (+0.000ms): dummy_cs_switch_to (context_switch)
00000002 0.006ms (+0.000ms): context_switch (schedule)
04000002 0.007ms (+0.000ms): __switch_to (context_switch)
04000002 0.007ms (+0.000ms): dummy_cs_exit (context_switch)
04000002 0.007ms (+0.000ms): context_switch (schedule)
04000002 0.007ms (+0.000ms): finish_task_switch (schedule)
04000002 0.008ms (+0.000ms): _spin_unlock_irq (finish_task_switch)
04010002 0.008ms (+0.000ms): do_IRQ (_spin_unlock_irq)
04010002 0.008ms (+0.000ms): do_IRQ (<0000000a>)
04010002 0.008ms (+0.000ms): spin_lock (do_IRQ)
04010003 0.009ms (+0.000ms): spin_lock (<00000000>)
04010003 0.009ms (+0.000ms): mask_and_ack_level_ioapic_irq (do_IRQ)
04010003 0.009ms (+0.000ms): mask_IO_APIC_irq (mask_and_ack_level_ioapic_irq)
04010003 0.009ms (+0.000ms): __spin_lock_irqsave (mask_IO_APIC_irq)
04010004 0.009ms (+0.000ms): __spin_lock_irqsave (<00000000>)
04010004 0.009ms (+0.000ms): __mask_IO_APIC_irq (mask_IO_APIC_irq)
04010004 0.010ms (+0.020ms): __modify_IO_APIC_irq (__mask_IO_APIC_irq)
04010004 0.030ms (+0.000ms): _spin_unlock_irqrestore (mask_IO_APIC_irq)
04010003 0.030ms (+0.000ms): redirect_hardirq (do_IRQ)
04010003 0.030ms (+0.000ms): _spin_unlock (do_IRQ)
04010002 0.031ms (+0.000ms): handle_IRQ_event (do_IRQ)
04010002 0.031ms (+0.000ms): usb_hcd_irq (handle_IRQ_event)
04010002 0.031ms (+0.001ms): uhci_irq (usb_hcd_irq)
04010002 0.032ms (+0.000ms): spin_lock (uhci_irq)
04010003 0.032ms (+0.000ms): spin_lock (<00000000>)
04010003 0.032ms (+0.000ms): uhci_get_current_frame_number (uhci_irq)
04010003 0.033ms (+0.000ms): uhci_free_pending_qhs (uhci_irq)
04010003 0.033ms (+0.000ms): uhci_free_pending_tds (uhci_irq)
...
The nesting gets even deeper, with the preempt-count column going up
to 04010008 while the system appears to be delivering signals based on
mouse input. None of the individual steps takes very long, but with
over 400 steps in the sequence it shows up in the trace record.

  --Mark


^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-10 14:28 Mark_H_Johnson
  2004-09-10 19:29 ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Mark_H_Johnson @ 2004-09-10 14:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric St-Laurent, Free Ekanayaka, free78, K.R. Foley, linux-kernel,
	Felipe Alfaro Solana, luke, nando, P.O. Gaillard, Daniel Schmitt,
	Lee Revell

>does 'hdparm -X udma2 /dev/hda' work?

Yes it does, and quite well actually. For comparison:
  -R1 on September 7 - over 100 traces > 500 usec
  -R8 on September 10 - 38 traces > 200 usec, only 3 > 500 usec

This was with the full test suite (latencytest active, all different
types of operations).

Besides what I already reported today, there were a few additional
types of traces:

network poll - Traces 00 01 02 03 04 06 07 08 16 17 29 32
twkill_work  - Trace  05
VL kswapd0   - Trace  12
spin lock/nmi- Trace  15
do_IRQ       - Trace  25 26 27
the previously reported symptoms...
exit mmap    - Traces 09 18 19 20 21 22 23 24 28 30 31 33 34 35 36 37
kswapd0      - Traces 10 11 13 14

Network Poll
============
preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 332 us, entries: 198 (198)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: ksoftirqd/1/5, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   rtl8139_rx+0x219/0x340
=======>
00000001 0.000ms (+0.000ms): spin_lock (rtl8139_poll)
00000001 0.000ms (+0.001ms): spin_lock (<00000000>)
00000001 0.001ms (+0.003ms): rtl8139_rx (rtl8139_poll)
... we've been down this path before ...
No individual step takes > 10 usec; the latency comes from the sheer
number of entries involved.

twkill_work
===========

I have not seen this one before....

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 499 us, entries: 1858 (1858)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: events/0/6, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: twkill_work+0xb5/0xe0
 => ended at:   twkill_work+0xb5/0xe0
=======>
00000101 0.000ms (+0.000ms): touch_preempt_timing (twkill_work)
00000101 0.000ms (+0.000ms): smp_apic_timer_interrupt (touch_preempt_timing)
00010101 0.000ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010101 0.000ms (+0.000ms): profile_hook (profile_tick)
00010101 0.000ms (+0.000ms): read_lock (profile_hook)
00010102 0.001ms (+0.000ms): read_lock (<00000000>)
00010102 0.001ms (+0.000ms): notifier_call_chain (profile_hook)
00010102 0.001ms (+0.000ms): _read_unlock (profile_tick)
00010101 0.002ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010101 0.002ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010101 0.002ms (+0.000ms): update_one_process (update_process_times)
00010101 0.003ms (+0.000ms): run_local_timers (update_process_times)
00010101 0.003ms (+0.000ms): raise_softirq (update_process_times)
00010101 0.003ms (+0.000ms): scheduler_tick (update_process_times)
00010101 0.004ms (+0.000ms): sched_clock (scheduler_tick)
00010101 0.005ms (+0.000ms): rcu_check_callbacks (scheduler_tick)
00010101 0.005ms (+0.001ms): idle_cpu (rcu_check_callbacks)
00010101 0.006ms (+0.000ms): spin_lock (scheduler_tick)
00010102 0.006ms (+0.000ms): spin_lock (<00000000>)
00010102 0.007ms (+0.000ms): task_timeslice (scheduler_tick)
00010102 0.007ms (+0.001ms): __bitmap_weight (scheduler_tick)
00010102 0.009ms (+0.000ms): _spin_unlock (scheduler_tick)
00010101 0.010ms (+0.001ms): rebalance_tick (scheduler_tick)
00010101 0.011ms (+0.000ms): do_IRQ (touch_preempt_timing)
00010101 0.011ms (+0.000ms): do_IRQ (<00000000>)
00010101 0.012ms (+0.000ms): spin_lock (do_IRQ)
00010102 0.012ms (+0.000ms): spin_lock (<00000000>)
00010102 0.012ms (+0.000ms): ack_edge_ioapic_irq (do_IRQ)
00010102 0.013ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
00010102 0.013ms (+0.000ms): _spin_unlock (do_IRQ)
... Look at this next sequence ...
00010102 0.033ms (+0.000ms): end_edge_ioapic_irq (do_IRQ)
00010102 0.033ms (+0.000ms): _spin_unlock (do_IRQ)
00000101 0.034ms (+0.000ms): tcp_do_twkill_work (twkill_work)
00000101 0.035ms (+0.000ms): _spin_unlock (tcp_do_twkill_work)
00000100 0.035ms (+0.000ms): tcp_timewait_kill (tcp_do_twkill_work)
00000100 0.035ms (+0.000ms): write_lock (tcp_timewait_kill)
00000101 0.036ms (+0.000ms): write_lock (<00000000>)
00000101 0.037ms (+0.000ms): _write_unlock (tcp_timewait_kill)
00000100 0.037ms (+0.000ms): spin_lock (tcp_timewait_kill)
00000101 0.037ms (+0.000ms): spin_lock (<00000000>)
00000101 0.038ms (+0.000ms): tcp_bucket_destroy (tcp_timewait_kill)
00000101 0.038ms (+0.000ms): kmem_cache_free (tcp_bucket_destroy)
00000101 0.038ms (+0.000ms): cache_flusharray (kmem_cache_free)
... or this one ...
00000100 0.050ms (+0.000ms): spin_lock (tcp_do_twkill_work)
00000101 0.051ms (+0.000ms): spin_lock (<00000000>)
00000101 0.051ms (+0.000ms): _spin_unlock (tcp_do_twkill_work)
00000100 0.051ms (+0.000ms): tcp_timewait_kill (tcp_do_twkill_work)
00000100 0.051ms (+0.000ms): write_lock (tcp_timewait_kill)
00000101 0.052ms (+0.000ms): write_lock (<00000000>)
00000101 0.052ms (+0.000ms): _write_unlock (tcp_timewait_kill)
00000100 0.053ms (+0.000ms): spin_lock (tcp_timewait_kill)
00000101 0.053ms (+0.000ms): spin_lock (<00000000>)
00000101 0.053ms (+0.000ms): tcp_bucket_destroy (tcp_timewait_kill)
00000101 0.053ms (+0.000ms): kmem_cache_free (tcp_bucket_destroy)
00000101 0.054ms (+0.000ms): _spin_unlock (tcp_timewait_kill)
00000100 0.054ms (+0.000ms): kmem_cache_free (tcp_do_twkill_work)
00000100 0.054ms (+0.000ms): spin_lock (tcp_do_twkill_work)
00000101 0.054ms (+0.000ms): spin_lock (<00000000>)
00000101 0.054ms (+0.000ms): _spin_unlock (tcp_do_twkill_work)
00000100 0.055ms (+0.000ms): tcp_timewait_kill (tcp_do_twkill_work)
... and ends like this ...
00000101 0.496ms (+0.000ms): _write_unlock (tcp_timewait_kill)
00000100 0.496ms (+0.000ms): preempt_schedule (tcp_timewait_kill)
00000100 0.496ms (+0.000ms): spin_lock (tcp_timewait_kill)
00000101 0.496ms (+0.000ms): spin_lock (<00000000>)
00000101 0.497ms (+0.000ms): tcp_bucket_destroy (tcp_timewait_kill)
00000101 0.497ms (+0.000ms): kmem_cache_free (tcp_bucket_destroy)
00000101 0.497ms (+0.000ms): _spin_unlock (tcp_timewait_kill)
00000100 0.497ms (+0.000ms): preempt_schedule (tcp_timewait_kill)
00000100 0.498ms (+0.000ms): kmem_cache_free (tcp_do_twkill_work)
00000100 0.498ms (+0.000ms): spin_lock (tcp_do_twkill_work)
00000101 0.498ms (+0.000ms): spin_lock (<00000000>)
00000101 0.499ms (+0.000ms): touch_preempt_timing (twkill_work)
00000101 0.499ms (+0.000ms): update_max_trace (check_preempt_timing)

I looked briefly at the -R8 patch and saw the changed condition for
breaking the lock, but judging from this trace, it may not be enough.

VL kswapd0
==========

An extremely long latency (> 6 msec) appears to be a combination of
effects, perhaps tracing-related. It starts like this...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 6776 us, entries: 548 (548)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (page_referenced_file)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.001ms (+0.001ms): prio_tree_first (vma_prio_tree_next)
00000001 0.002ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.002ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.003ms (+0.001ms): spin_lock (<00000000>)
00000002 0.004ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.004ms (+0.000ms): page_address (page_referenced_one)
00000003 0.005ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.005ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.005ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.006ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.006ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.007ms (+0.000ms): spin_lock (<00000000>)
... looks like the typical kswapd0 cycle until this point ...
00000001 0.126ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.127ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.127ms (+0.000ms): spin_lock (<00000000>)
00000002 0.128ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.128ms (+0.000ms): page_address (page_referenced_one)
00000003 0.129ms (+0.000ms): flush_tlb_page (page_referenced_one)
00000004 0.129ms (+0.000ms): flush_tlb_others (flush_tlb_page)
00000004 0.130ms (+0.000ms): spin_lock (flush_tlb_others)
00000005 0.130ms (+0.000ms): spin_lock (<00000000>)
00000005 0.131ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000005 0.132ms (+0.790ms): send_IPI_mask_bitmask (flush_tlb_others)
00010005 0.922ms (+0.000ms): do_nmi (flush_tlb_others)
00010005 0.923ms (+0.003ms): do_nmi (touch_preempt_timing)
00010005 0.926ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010005 0.926ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010005 0.926ms (+0.000ms): profile_hook (profile_tick)
00010005 0.927ms (+0.000ms): read_lock (profile_hook)
00010006 0.927ms (+0.000ms): read_lock (<00000000>)
00010006 0.927ms (+0.000ms): notifier_call_chain (profile_hook)
00010006 0.928ms (+0.000ms): _read_unlock (profile_tick)
00010005 0.928ms (+0.027ms): profile_hit (nmi_watchdog_tick)
00000005 0.955ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
00010005 0.955ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010005 0.956ms (+0.000ms): profile_hook (profile_tick)
00010005 0.956ms (+0.000ms): read_lock (profile_hook)
00010006 0.956ms (+0.000ms): read_lock (<00000000>)
00010006 0.956ms (+0.000ms): notifier_call_chain (profile_hook)
00010006 0.956ms (+0.000ms): _read_unlock (profile_tick)
00010005 0.956ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010005 0.957ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010005 0.957ms (+0.000ms): update_one_process (update_process_times)
00010005 0.957ms (+0.000ms): run_local_timers (update_process_times)
00010005 0.957ms (+0.000ms): raise_softirq (update_process_times)
00010005 0.958ms (+0.000ms): scheduler_tick (update_process_times)
00010005 0.958ms (+0.000ms): sched_clock (scheduler_tick)
00010005 0.958ms (+0.000ms): spin_lock (scheduler_tick)
00010006 0.958ms (+0.000ms): spin_lock (<00000000>)
00010006 0.959ms (+0.000ms): task_timeslice (scheduler_tick)
00010006 0.959ms (+0.000ms): __bitmap_weight (scheduler_tick)
00010006 0.959ms (+0.000ms): _spin_unlock (scheduler_tick)
00010005 0.959ms (+0.000ms): rebalance_tick (scheduler_tick)
00000006 0.960ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000006 0.960ms (+0.000ms): __do_softirq (do_softirq)
00000006 0.960ms (+0.000ms): wake_up_process (do_softirq)
00000006 0.960ms (+0.000ms): try_to_wake_up (wake_up_process)
00000006 0.960ms (+0.000ms): task_rq_lock (try_to_wake_up)
00000006 0.961ms (+0.000ms): spin_lock (task_rq_lock)
00000007 0.961ms (+0.000ms): spin_lock (<00000000>)
00000007 0.961ms (+0.000ms): wake_idle (try_to_wake_up)
00000007 0.961ms (+0.000ms): activate_task (try_to_wake_up)
00000007 0.961ms (+0.000ms): sched_clock (activate_task)
00000007 0.962ms (+0.000ms): recalc_task_prio (activate_task)
00000007 0.962ms (+0.000ms): effective_prio (recalc_task_prio)
00000007 0.962ms (+0.000ms): __activate_task (try_to_wake_up)
00000007 0.962ms (+0.000ms): enqueue_task (__activate_task)
00000007 0.962ms (+0.000ms): sched_info_queued (enqueue_task)
00000007 0.963ms (+0.000ms): resched_task (try_to_wake_up)
00000007 0.963ms (+0.000ms): task_rq_unlock (try_to_wake_up)
00000007 0.963ms (+0.000ms): _spin_unlock_irqrestore (try_to_wake_up)
00000006 0.964ms (+0.958ms): preempt_schedule (try_to_wake_up)
00010005 1.922ms (+0.000ms): do_nmi (flush_tlb_others)
00010005 1.922ms (+0.003ms): do_nmi (check_preempt_timing)
00010005 1.926ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010005 1.926ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010005 1.926ms (+0.000ms): profile_hook (profile_tick)
00010005 1.926ms (+0.000ms): read_lock (profile_hook)
00010006 1.927ms (+0.000ms): read_lock (<00000000>)
00010006 1.927ms (+0.000ms): notifier_call_chain (profile_hook)
00010006 1.927ms (+0.000ms): _read_unlock (profile_tick)
00010005 1.927ms (+0.000ms): preempt_schedule (profile_tick)
00010005 1.928ms (+0.027ms): profile_hit (nmi_watchdog_tick)
00000005 1.955ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
00010005 1.955ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010005 1.955ms (+0.000ms): profile_hook (profile_tick)
00010005 1.955ms (+0.000ms): read_lock (profile_hook)
00010006 1.955ms (+0.000ms): read_lock (<00000000>)
00010006 1.956ms (+0.000ms): notifier_call_chain (profile_hook)
00010006 1.956ms (+0.000ms): _read_unlock (profile_tick)
00010005 1.956ms (+0.000ms): preempt_schedule (profile_tick)
00010005 1.956ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010005 1.956ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010005 1.956ms (+0.000ms): update_one_process (update_process_times)
00010005 1.957ms (+0.000ms): run_local_timers (update_process_times)
00010005 1.957ms (+0.000ms): raise_softirq (update_process_times)
00010005 1.957ms (+0.000ms): scheduler_tick (update_process_times)
00010005 1.957ms (+0.000ms): sched_clock (scheduler_tick)
00010005 1.958ms (+0.000ms): spin_lock (scheduler_tick)
00010006 1.958ms (+0.000ms): spin_lock (<00000000>)
00010006 1.958ms (+0.000ms): task_timeslice (scheduler_tick)
00010006 1.958ms (+0.000ms): __bitmap_weight (scheduler_tick)
00010006 1.958ms (+0.000ms): _spin_unlock (scheduler_tick)
00010005 1.959ms (+0.000ms): preempt_schedule (scheduler_tick)
00010005 1.959ms (+0.000ms): rebalance_tick (scheduler_tick)
00000006 1.959ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000006 1.959ms (+0.961ms): __do_softirq (do_softirq)
00010005 2.921ms (+0.000ms): do_nmi (flush_tlb_others)
00010005 2.922ms (+0.001ms): do_nmi (check_preempt_timing)
... looks like we got stuck for milliseconds but finally finished ...
00000006 4.959ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000006 4.959ms (+0.960ms): __do_softirq (do_softirq)
00010005 5.919ms (+0.000ms): do_nmi (flush_tlb_others)
00010005 5.920ms (+0.003ms): do_nmi (__trace)
00010005 5.924ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010005 5.924ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010005 5.924ms (+0.000ms): profile_hook (profile_tick)
00010005 5.924ms (+0.000ms): read_lock (profile_hook)
00010006 5.924ms (+0.000ms): read_lock (<00000000>)
00010006 5.925ms (+0.000ms): notifier_call_chain (profile_hook)
00010006 5.925ms (+0.000ms): _read_unlock (profile_tick)
00010005 5.925ms (+0.000ms): preempt_schedule (profile_tick)
00010005 5.925ms (+0.027ms): profile_hit (nmi_watchdog_tick)
00000005 5.952ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
00010005 5.953ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010005 5.953ms (+0.000ms): profile_hook (profile_tick)
00010005 5.953ms (+0.000ms): read_lock (profile_hook)
00010006 5.953ms (+0.000ms): read_lock (<00000000>)
00010006 5.953ms (+0.000ms): notifier_call_chain (profile_hook)
00010006 5.953ms (+0.000ms): _read_unlock (profile_tick)
00010005 5.954ms (+0.000ms): preempt_schedule (profile_tick)
00010005 5.954ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010005 5.954ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010005 5.954ms (+0.000ms): update_one_process (update_process_times)
00010005 5.954ms (+0.000ms): run_local_timers (update_process_times)
00010005 5.955ms (+0.000ms): raise_softirq (update_process_times)
00010005 5.955ms (+0.000ms): scheduler_tick (update_process_times)
00010005 5.955ms (+0.000ms): sched_clock (scheduler_tick)
00010005 5.955ms (+0.000ms): spin_lock (scheduler_tick)
00010006 5.956ms (+0.000ms): spin_lock (<00000000>)
00010006 5.956ms (+0.000ms): task_timeslice (scheduler_tick)
00010006 5.956ms (+0.000ms): __bitmap_weight (scheduler_tick)
00010006 5.956ms (+0.000ms): _spin_unlock (scheduler_tick)
00010005 5.956ms (+0.000ms): preempt_schedule (scheduler_tick)
00010005 5.957ms (+0.000ms): rebalance_tick (scheduler_tick)
00000006 5.957ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000006 5.957ms (+0.817ms): __do_softirq (do_softirq)
00000005 6.774ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000004 6.775ms (+0.000ms): preempt_schedule (flush_tlb_others)
00000003 6.775ms (+0.000ms): preempt_schedule (flush_tlb_page)
00000003 6.775ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 6.775ms (+0.000ms): preempt_schedule (page_referenced_one)
00000002 6.776ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 6.776ms (+0.000ms): preempt_schedule (page_referenced_one)
00000001 6.776ms (+0.000ms): _spin_unlock (page_referenced_file)
00000001 6.777ms (+0.000ms): sub_preempt_count (_spin_unlock)
00000001 6.777ms (+0.000ms): update_max_trace (check_preempt_timing)

Spin Lock/nmi
=============

This may be an artifact of the test configuration.

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 244 us, entries: 17 (17)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: fam/3212, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: __spin_lock_irqsave+0x2b/0xb0
 => ended at:   _spin_unlock_irqrestore+0x32/0x70
=======>
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (avc_has_perm_noaudit)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000001 0.000ms (+0.069ms): avc_insert (avc_has_perm_noaudit)
00010001 0.069ms (+0.000ms): do_nmi (avc_insert)
00010001 0.070ms (+0.002ms): do_nmi (__mcount)
00010001 0.072ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010001 0.073ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.073ms (+0.000ms): profile_hook (profile_tick)
00010001 0.073ms (+0.000ms): read_lock (profile_hook)
00010002 0.073ms (+0.000ms): read_lock (<00000000>)
00010002 0.073ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.074ms (+0.000ms): _read_unlock (profile_tick)
00010001 0.074ms (+0.168ms): profile_hit (nmi_watchdog_tick)
00000001 0.242ms (+0.000ms): memcpy (avc_has_perm_noaudit)
00000001 0.243ms (+0.002ms): _spin_unlock_irqrestore (avc_has_perm_noaudit)
00000001 0.245ms (+0.000ms): sub_preempt_count (_spin_unlock_irqrestore)
00000001 0.245ms (+0.000ms): update_max_trace (check_preempt_timing)

Do_IRQ
======

Perhaps an unfortunate series of nested IRQs?
It starts like this...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 240 us, entries: 380 (380)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: cpu_burn/14061, uid:0 nice:10 policy:0 rt_prio:0
    -----------------
 => started at: do_IRQ+0x19/0x1a0
 => ended at:   do_IRQ+0x14a/0x1a0
=======>
00010000 0.000ms (+0.000ms): do_IRQ (common_interrupt)
00010000 0.000ms (+0.000ms): do_IRQ (<08048340>)
00010000 0.000ms (+0.000ms): do_IRQ (<0000000a>)
00010000 0.000ms (+0.000ms): spin_lock (do_IRQ)
00010001 0.000ms (+0.000ms): spin_lock (<00000000>)
00010001 0.001ms (+0.000ms): mask_and_ack_level_ioapic_irq (do_IRQ)
00010001 0.001ms (+0.000ms): mask_IO_APIC_irq (mask_and_ack_level_ioapic_irq)
00010001 0.001ms (+0.000ms): __spin_lock_irqsave (mask_IO_APIC_irq)
00010002 0.002ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00010002 0.002ms (+0.000ms): __mask_IO_APIC_irq (mask_IO_APIC_irq)
00010002 0.002ms (+0.013ms): __modify_IO_APIC_irq (__mask_IO_APIC_irq)
00010002 0.016ms (+0.000ms): _spin_unlock_irqrestore (mask_IO_APIC_irq)
00010001 0.016ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
00010001 0.016ms (+0.000ms): _spin_unlock (do_IRQ)
00010000 0.017ms (+0.000ms): generic_handle_IRQ_event (do_IRQ)
00010000 0.017ms (+0.001ms): usb_hcd_irq (generic_handle_IRQ_event)
00010000 0.019ms (+0.001ms): uhci_irq (usb_hcd_irq)
00010000 0.021ms (+0.000ms): spin_lock (uhci_irq)
00010001 0.021ms (+0.000ms): spin_lock (<00000000>)
00010001 0.021ms (+0.001ms): uhci_get_current_frame_number (uhci_irq)
00010001 0.023ms (+0.000ms): uhci_free_pending_qhs (uhci_irq)
00010001 0.023ms (+0.001ms): uhci_free_pending_tds (uhci_irq)
00010001 0.024ms (+0.001ms): uhci_remove_pending_urbps (uhci_irq)
00010001 0.025ms (+0.000ms): uhci_transfer_result (uhci_irq)
00010001 0.026ms (+0.000ms): spin_lock (uhci_transfer_result)
...
00020005 0.134ms (+0.000ms): do_IRQ (_spin_unlock_irqrestore)
00020005 0.134ms (+0.000ms): do_IRQ (<00000000>)
00020005 0.134ms (+0.000ms): spin_lock (do_IRQ)
00020006 0.134ms (+0.000ms): spin_lock (<00000000>)
00020006 0.134ms (+0.000ms): ack_edge_ioapic_irq (do_IRQ)
00020006 0.135ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
00020006 0.135ms (+0.000ms): _spin_unlock (do_IRQ)
00020005 0.135ms (+0.000ms): preempt_schedule (do_IRQ)
00020005 0.135ms (+0.000ms): generic_handle_IRQ_event (do_IRQ)
00020005 0.136ms (+0.000ms): timer_interrupt (generic_handle_IRQ_event)
... exits like this ...
00010001 0.220ms (+0.000ms): generic_note_interrupt (do_IRQ)
00010001 0.220ms (+0.000ms): end_level_ioapic_irq (do_IRQ)
00010001 0.220ms (+0.000ms): unmask_IO_APIC_irq (do_IRQ)
00010001 0.221ms (+0.000ms): __spin_lock_irqsave (unmask_IO_APIC_irq)
00010002 0.221ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00010002 0.221ms (+0.000ms): __unmask_IO_APIC_irq (unmask_IO_APIC_irq)
00010002 0.221ms (+0.013ms): __modify_IO_APIC_irq (__unmask_IO_APIC_irq)
00010002 0.235ms (+0.000ms): _spin_unlock_irqrestore (unmask_IO_APIC_irq)
00010001 0.235ms (+0.000ms): preempt_schedule (unmask_IO_APIC_irq)
00010001 0.235ms (+0.000ms): _spin_unlock (do_IRQ)
00010000 0.236ms (+0.000ms): preempt_schedule (do_IRQ)
00000001 0.236ms (+0.000ms): do_softirq (do_IRQ)
00000001 0.236ms (+0.000ms): __do_softirq (do_softirq)
00000001 0.236ms (+0.000ms): wake_up_process (do_softirq)
00000001 0.237ms (+0.000ms): try_to_wake_up (wake_up_process)
00000001 0.237ms (+0.000ms): task_rq_lock (try_to_wake_up)
00000001 0.237ms (+0.000ms): spin_lock (task_rq_lock)
00000002 0.237ms (+0.000ms): spin_lock (<00000000>)
00000002 0.237ms (+0.000ms): wake_idle (try_to_wake_up)
00000002 0.238ms (+0.000ms): activate_task (try_to_wake_up)
00000002 0.238ms (+0.000ms): sched_clock (activate_task)
00000002 0.238ms (+0.000ms): recalc_task_prio (activate_task)
00000002 0.238ms (+0.000ms): effective_prio (recalc_task_prio)
00000002 0.239ms (+0.000ms): __activate_task (try_to_wake_up)
00000002 0.239ms (+0.000ms): enqueue_task (__activate_task)
00000002 0.239ms (+0.000ms): sched_info_queued (enqueue_task)
00000002 0.239ms (+0.000ms): resched_task (try_to_wake_up)
00000002 0.240ms (+0.000ms): task_rq_unlock (try_to_wake_up)
00000002 0.240ms (+0.000ms): _spin_unlock_irqrestore (try_to_wake_up)
00000001 0.240ms (+0.000ms): preempt_schedule (try_to_wake_up)
00000001 0.241ms (+0.000ms): sub_preempt_count (do_IRQ)
00000001 0.242ms (+0.000ms): update_max_trace (check_preempt_timing)

I will send the full traces separately (off list).

  --Mark


* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-10 12:57 Mark_H_Johnson
  2004-09-10 13:10 ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Mark_H_Johnson @ 2004-09-10 12:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
	K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
	nando, luke, free78

>You could also try lower DMA modes; I don't know if that will help or
>not, but UDMA133 is essentially one entire PCI 32/33MHz bus worth of
>traffic, so the effects are not too surprising. IDE control ports can
>give you long stalls too, especially if IORDY is in use.

It appears I finally figured it out and have the drive running with udma2
instead of udma4. For reference:

# hdparm /dev/hda

/dev/hda:
 multcount    = 16 (on)
 IO_support   =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 64 (on)
 geometry     = 58168/16/63, sectors = 58633344, start = 0
# hdparm -i /dev/hda

/dev/hda:

 Model=WDC WD300BB-00AUA1, FwRev=18.20D18, SerialNo=WD-WMA6W1764218
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
 BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=58633344
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 *udma2 udma3 udma4 udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: device does not report version:

 * signifies the current active mode
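For reference, hdparm(8) encodes the -X transfer mode numerically (UDMA modes are 64 + mode, MDMA 32 + mode, PIO 8 + mode), so a mode change that leaves the PIO flag alone can be sketched as below. This is only a sketch; the /dev/hda path comes from the listing above, and the hdparm call itself is shown as a comment since it needs the real hardware.

```shell
#!/bin/sh
# Map a UDMA mode number to hdparm's numeric -X argument.
# Per hdparm(8): PIO modes are 8+mode, MDMA 32+mode, UDMA 64+mode.
udma_xfer_val() {
    echo $((64 + $1))
}

udma_xfer_val 2   # udma2 -> 66

# A change that touches only the transfer mode (note: no -p) would then be:
#   hdparm -X "$(udma_xfer_val 2)" /dev/hda
```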

The same test I ran yesterday:
 head -c 750000000 /dev/zero >tmpfile (disk writes)
 cp tmpfile tmpfile2 (disk copy)
 cat tmpfile tmpfile2 >/dev/null (disk read)
ran with the following results late last night. All are traces > 200 usec
[a much smaller limit than before]; there were no traces > 600 usec.

Exit Mmap - Traces 00 01 02 03 04 05 10 13 24 26
Kswapd0   - Traces 06 07 08 09 11 12 14 16..23 25 27
Spin lock - Trace  15
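The three-phase load above can be wrapped in a small timing script; the sketch below uses a hypothetical SIZE parameter defaulting to 1 MB (the actual runs used 750000000 bytes, about 1.5x physical memory).

```shell
#!/bin/sh
# Sketch of the write/copy/read disk load, with per-phase wall-clock timing.
# SIZE defaults to 1 MB here; the runs described above used 750000000 bytes.
SIZE=${SIZE:-1000000}

phase() {
    # run a command, then report elapsed seconds under the given label
    label=$1; shift
    start=$(date +%s)
    "$@"
    echo "$label took $(( $(date +%s) - start ))s"
}

phase "disk writes" sh -c "head -c $SIZE /dev/zero > tmpfile"
phase "disk copy"   cp tmpfile tmpfile2
phase "disk read"   sh -c "cat tmpfile tmpfile2 > /dev/null"
```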

Exit Mmap
=========

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 209 us, entries: 34 (34)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: sleep/4436, uid:0 nice:-9 policy:0 rt_prio:0
    -----------------
 => started at: cond_resched_lock+0x1d/0xa0
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): touch_preempt_timing (cond_resched_lock)
00000001 0.000ms (+0.001ms): __bitmap_weight (unmap_vmas)
00000001 0.001ms (+0.002ms): vm_acct_memory (exit_mmap)
00000001 0.003ms (+0.113ms): clear_page_tables (exit_mmap)
00010001 0.117ms (+0.000ms): do_nmi (clear_page_tables)
00010001 0.117ms (+0.003ms): do_nmi (avc_has_perm)
00010001 0.121ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010001 0.121ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.121ms (+0.000ms): profile_hook (profile_tick)
00010001 0.121ms (+0.000ms): read_lock (profile_hook)
00010002 0.121ms (+0.000ms): read_lock (<00000000>)
00010002 0.122ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.122ms (+0.000ms): _read_unlock (profile_tick)
00010001 0.122ms (+0.078ms): profile_hit (nmi_watchdog_tick)
00000001 0.200ms (+0.002ms): flush_tlb_mm (exit_mmap)
00000001 0.202ms (+0.000ms): free_pages_and_swap_cache (exit_mmap)

Kswapd0
=======

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 539 us, entries: 719 (719)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (page_referenced_file)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.000ms (+0.000ms): prio_tree_first (vma_prio_tree_next)
00000001 0.001ms (+0.001ms): prio_tree_left (prio_tree_first)
00000001 0.003ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.004ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.005ms (+0.001ms): spin_lock (<00000000>)
00000002 0.006ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.006ms (+0.000ms): page_address (page_referenced_one)
00000003 0.007ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.007ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.007ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.008ms (+0.001ms): page_referenced_one (page_referenced_file)
00000001 0.009ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.009ms (+0.002ms): spin_lock (<00000000>)
... cycle repeats a number of times ...
00000001 0.535ms (+0.000ms): preempt_schedule (page_referenced_one)
00000001 0.536ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.536ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.537ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.537ms (+0.000ms): spin_lock (<00000000>)
00000002 0.538ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.538ms (+0.000ms): page_address (page_referenced_one)
00000003 0.538ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.538ms (+0.000ms): preempt_schedule (page_referenced_one)
00000002 0.539ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.539ms (+0.000ms): preempt_schedule (page_referenced_one)
00000001 0.539ms (+0.000ms): _spin_unlock (page_referenced_file)
00000001 0.540ms (+0.000ms): sub_preempt_count (_spin_unlock)
00000001 0.541ms (+0.000ms): update_max_trace (check_preempt_timing)

Spin Lock
=========

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 235 us, entries: 18 (18)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000002 0.000ms (+0.113ms): spin_lock (<00000000>)
00010002 0.113ms (+0.000ms): do_nmi (get_swap_page)
00010002 0.114ms (+0.002ms): do_nmi (default_idle)
00010002 0.116ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010002 0.116ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010002 0.117ms (+0.000ms): profile_hook (profile_tick)
00010002 0.117ms (+0.000ms): read_lock (profile_hook)
00010003 0.117ms (+0.000ms): read_lock (<00000000>)
00010003 0.117ms (+0.000ms): notifier_call_chain (profile_hook)
00010003 0.117ms (+0.000ms): _read_unlock (profile_tick)
00010002 0.118ms (+0.116ms): profile_hit (nmi_watchdog_tick)
00000002 0.235ms (+0.000ms): _spin_unlock (get_swap_page)
00000001 0.235ms (+0.000ms): _spin_unlock (get_swap_page)
00000001 0.235ms (+0.001ms): sub_preempt_count (_spin_unlock)
00000001 0.237ms (+0.000ms): update_max_trace (check_preempt_timing)

Are you SURE the spin lock counter works properly on SMP systems?
I did a quick check of yesterday's results:
  # grep -ir '<.*>' latencytest0.42-png/lt040909 | wc -l
  6978
  # grep -ir '<.*>' latencytest0.42-png/lt040909 | grep -v '<00000000>' | less -im
  ...
  latencytest0.42-png/lt040909/lt004.v3k1/lt.10:00010002 0.382ms (+0.000ms): do_IRQ (<0000000e>)
  latencytest0.42-png/lt040909/lt004.v3k1/lt.11:00010001 0.003ms (+0.000ms): do_IRQ (<0000000a>)
  latencytest0.42-png/lt040909/lt004.v3k1/lt.11:00020001 0.308ms (+0.000ms): do_IRQ (<0000000e>)
  latencytest0.42-png/lt040909/lt005.v3k1/lt.00:00010002 0.263ms (+0.005ms): do_nmi (<0804bd56>)
  latencytest0.42-png/lt040909/lt006.v3k1/lt.00:00010001 0.501ms (+0.005ms): do_nmi (<0864d2b2>)
  latencytest0.42-png/lt040909/lt006.v3k1/lt.01:00010001 0.067ms (+0.003ms): do_nmi (<0815f06c>)
  ...
There were no entries that were both non-zero and lock-related.
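That check can be reproduced on a small sample; the lines below are hypothetical stand-ins for the lt* trace files, and do_IRQ/do_nmi are filtered out as the known non-lock sources of non-zero caller addresses.

```shell
#!/bin/sh
# Re-run the caller-address check on sample trace lines: list entries whose
# caller field is a raw non-zero address, then drop the known IRQ/NMI sources.
cat > sample.txt <<'EOF'
00000002 0.005ms (+0.001ms): spin_lock (<00000000>)
00010002 0.382ms (+0.000ms): do_IRQ (<0000000e>)
00010001 0.501ms (+0.005ms): do_nmi (<0864d2b2>)
EOF
grep -i '<.*>' sample.txt | grep -v '<00000000>' | grep -vE 'do_IRQ|do_nmi' \
    || echo "no non-zero, lock-related entries"
```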

  --Mark


* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-09 22:56 Mark_H_Johnson
  2004-09-10 12:42 ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Mark_H_Johnson @ 2004-09-09 22:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric St-Laurent, Free Ekanayaka, free78, K.R. Foley, linux-kernel,
	Felipe Alfaro Solana, luke, nando, P.O. Gaillard, Daniel Schmitt,
	Lee Revell

>I tried
>  hdparm -p -X udma2 /dev/hda
>(since it was udma4 previously)
>and reran the tests.
I am not quite sure this was what I wanted; it appears to turn on PIO
modes exclusively.

What is the right incantation for this?
Thanks.
--Mark H Johnson
  <mailto:Mark_H_Johnson@raytheon.com>


* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-09 22:41 Mark_H_Johnson
  0 siblings, 0 replies; 36+ messages in thread
From: Mark_H_Johnson @ 2004-09-09 22:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
	K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
	nando, luke, free78

>so if these assumptions are true i'd suggest to decrease max_sectors
>from 32K to 16K - that alone should halve these random latencies from
>560 usecs to 280 usecs. Unless you see stability you might want to try
>an 8K setting as well - this will likely decrease your IO rates
>noticeably though. This would reduce the random delays to 140 usecs.
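The suggested tuning maps onto sysfs roughly as sketched below (run as root; the /sys/block/hda path assumes the /dev/hda used in these tests):

```shell
# Halve the maximum request size from 32K to 16K, as suggested:
echo 16 > /sys/block/hda/queue/max_sectors_kb
# and, if latencies are still too long, try 8K (at a noticeable IO-rate cost):
echo 8 > /sys/block/hda/queue/max_sectors_kb
cat /sys/block/hda/queue/max_sectors_kb    # verify the active setting
```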

I tried both and still had some long delays. For example:

00000003 0.015ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000003 0.015ms (+0.137ms): send_IPI_mask_bitmask (flush_tlb_others)
00010003 0.153ms (+0.000ms): do_nmi (flush_tlb_others)
00010003 0.153ms (+0.001ms): do_nmi (ide_outsl)
00010003 0.155ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010003 0.155ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010003 0.155ms (+0.000ms): profile_hook (profile_tick)
00010003 0.156ms (+0.000ms): read_lock (profile_hook)
00010004 0.156ms (+0.000ms): read_lock (<00000000>)
00010004 0.156ms (+0.000ms): notifier_call_chain (profile_hook)
00010004 0.156ms (+0.000ms): _read_unlock (profile_tick)
00010003 0.157ms (+0.290ms): profile_hit (nmi_watchdog_tick)
00000003 0.447ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000001 0.447ms (+0.000ms): free_pages_and_swap_cache (unmap_vmas)

or this...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 440 us, entries: 34 (34)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000002 0.000ms (+0.410ms): spin_lock (<00000000>)
00010002 0.411ms (+0.000ms): do_IRQ (get_swap_page)

or this...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 548 us, entries: 8 (8)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000002 0.000ms (+0.547ms): spin_lock (<00000000>)
00000002 0.548ms (+0.000ms): _spin_unlock (get_swap_page)
00000001 0.548ms (+0.000ms): _spin_unlock (get_swap_page)
00000001 0.548ms (+0.000ms): sub_preempt_count (_spin_unlock)
00000001 0.549ms (+0.000ms): update_max_trace (check_preempt_timing)

or this...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 422 us, entries: 345 (345)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: kblockd/0/12, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: do_IRQ+0x19/0x1a0
 => ended at:   do_IRQ+0x14a/0x1a0
=======>
00010000 0.000ms (+0.000ms): do_IRQ (common_interrupt)
00010000 0.000ms (+0.000ms): do_IRQ (kthread_should_stop)
00010000 0.000ms (+0.000ms): do_IRQ (<0000000a>)
00010000 0.000ms (+0.000ms): spin_lock (do_IRQ)
00010001 0.000ms (+0.052ms): spin_lock (<00000000>)
00010001 0.053ms (+0.000ms): mask_and_ack_level_ioapic_irq (do_IRQ)
00010001 0.053ms (+0.000ms): mask_IO_APIC_irq (mask_and_ack_level_ioapic_irq)
00010001 0.053ms (+0.000ms): __spin_lock_irqsave (mask_IO_APIC_irq)
00010002 0.053ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00010002 0.053ms (+0.000ms): __mask_IO_APIC_irq (mask_IO_APIC_irq)
00010002 0.054ms (+0.220ms): __modify_IO_APIC_irq (__mask_IO_APIC_irq)
00010002 0.274ms (+0.000ms): _spin_unlock_irqrestore (mask_IO_APIC_irq)
00010001 0.274ms (+0.000ms): generic_redirect_hardirq (do_IRQ)

Kswapd0
=======

I am not sure I saw this before; it may be a side effect of the smaller
transfer sizes...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 500 us, entries: 608 (608)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (page_referenced_file)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.000ms (+0.000ms): prio_tree_first (vma_prio_tree_next)
00000001 0.000ms (+0.000ms): prio_tree_left (prio_tree_first)
00000001 0.000ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.001ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.001ms (+0.000ms): spin_lock (<00000000>)
00000002 0.001ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.001ms (+0.000ms): page_address (page_referenced_one)
00000003 0.002ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.002ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.002ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.002ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.003ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.003ms (+0.000ms): spin_lock (<00000000>)
...
00000001 0.480ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.481ms (+0.001ms): page_referenced_one (page_referenced_file)
00000001 0.483ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.484ms (+0.001ms): spin_lock (<00000000>)
00000002 0.485ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.486ms (+0.001ms): page_address (page_referenced_one)
00000003 0.487ms (+0.000ms): flush_tlb_page (page_referenced_one)
00000003 0.488ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.489ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.489ms (+0.001ms): vma_prio_tree_next (page_referenced_file)
00000001 0.491ms (+0.002ms): page_referenced_one (page_referenced_file)
00000001 0.494ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.494ms (+0.002ms): spin_lock (<00000000>)
00000002 0.497ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.497ms (+0.000ms): page_address (page_referenced_one)
00000003 0.498ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.499ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.499ms (+0.002ms): _spin_unlock (page_referenced_file)
00000001 0.501ms (+0.002ms): sub_preempt_count (_spin_unlock)
00000001 0.503ms (+0.000ms): update_max_trace (check_preempt_timing)

>but the real fix would be to tweak the IDE controller to not do so
>agressive DMA! Are there any BIOS settings that somehow deal with it?
>Try increasing the PCI latency value? Is the disk using UDMA - if yes,
>could you downgrade it to normal IDE DMA? Perhaps that tweaks the
>controller to be 'nice' to the CPU.

I tried
  hdparm -p -X udma2 /dev/hda
(since it was udma4 previously)
and reran the tests. As a side note, that command itself caused a long
latency, something like...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 2019 us, entries: 2829 (2829)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: hdparm/13753, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: cond_resched+0xd/0x30
 => ended at:   cond_resched+0xd/0x30
=======>
00000001 0.000ms (+0.000ms): touch_preempt_timing (cond_resched)
00000001 0.000ms (+0.007ms): ide_find_setting_by_ioctl (generic_ide_ioctl)
00000001 0.008ms (+0.000ms): ide_write_setting (generic_ide_ioctl)
00000001 0.009ms (+0.001ms): capable (ide_write_setting)
...
00000001 0.064ms (+0.001ms): ide_inb (ide_config_drive_speed)
00000001 0.066ms (+0.000ms): ide_inb (ide_config_drive_speed)
00000001 0.067ms (+0.000ms): ide_inb (ide_config_drive_speed)
...
00000001 1.899ms (+0.000ms): ide_inb (ide_config_drive_speed)
00000001 1.899ms (+0.000ms): ide_inb (ide_config_drive_speed)
00000001 1.900ms (+0.000ms): ide_inb (ide_config_drive_speed)
00000001 1.901ms (+0.000ms): __const_udelay (ide_config_drive_speed)
00000001 1.901ms (+0.000ms): __delay (ide_config_drive_speed)
00000001 1.901ms (+0.001ms): delay_tsc (__delay)
00000001 1.902ms (+0.000ms): ide_inb (ide_config_drive_speed)
...

The results after that change were like this...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 536 us, entries: 31 (31)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: IRQ 14/140, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: kmap_atomic+0x23/0xe0
 => ended at:   kunmap_atomic+0x7b/0xa0
=======>
00000001 0.000ms (+0.000ms): kmap_atomic (__ide_do_rw_disk)
00000001 0.000ms (+0.000ms): page_address (__ide_do_rw_disk)
00000001 0.000ms (+0.000ms): ide_set_handler (__ide_do_rw_disk)
00000001 0.000ms (+0.000ms): __spin_lock_irqsave (ide_set_handler)
00000002 0.000ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000002 0.001ms (+0.000ms): __ide_set_handler (ide_set_handler)
00000002 0.001ms (+0.000ms): __mod_timer (ide_set_handler)
00000002 0.001ms (+0.000ms): __spin_lock_irqsave (__mod_timer)
00000003 0.001ms (+0.000ms): __spin_lock_irqsave (<00000000>)
00000003 0.002ms (+0.000ms): spin_lock (__mod_timer)
00000004 0.002ms (+0.000ms): spin_lock (<00000000>)
00000004 0.002ms (+0.000ms): internal_add_timer (__mod_timer)
00000004 0.002ms (+0.000ms): _spin_unlock (__mod_timer)
00000003 0.002ms (+0.000ms): _spin_unlock_irqrestore (__mod_timer)
00000002 0.003ms (+0.000ms): _spin_unlock_irqrestore (ide_set_handler)
00000001 0.003ms (+0.000ms): taskfile_output_data (__ide_do_rw_disk)
00000001 0.003ms (+0.000ms): ata_output_data (taskfile_output_data)
00000001 0.003ms (+0.496ms): ide_outsw (ata_output_data)
00010001 0.500ms (+0.000ms): do_nmi (ide_outsw)
00010001 0.501ms (+0.005ms): do_nmi (<0864d2b2>)
00010001 0.506ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010001 0.506ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.506ms (+0.000ms): profile_hook (profile_tick)
00010001 0.506ms (+0.000ms): read_lock (profile_hook)
00010002 0.507ms (+0.000ms): read_lock (<00000000>)
00010002 0.507ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.507ms (+0.000ms): _read_unlock (profile_tick)
00010001 0.507ms (+0.028ms): profile_hit (nmi_watchdog_tick)
00000001 0.536ms (+0.000ms): kunmap_atomic (__ide_do_rw_disk)
00000001 0.536ms (+0.000ms): sub_preempt_count (kunmap_atomic)
00000001 0.536ms (+0.000ms): sub_preempt_count (kunmap_atomic)
00000001 0.537ms (+0.000ms): update_max_trace (check_preempt_timing)

or this...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 497 us, entries: 18 (18)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: IRQ 14/140, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: kmap_atomic+0x23/0xe0
 => ended at:   kunmap_atomic+0x7b/0xa0
=======>
00000001 0.000ms (+0.000ms): kmap_atomic (read_intr)
00000001 0.000ms (+0.000ms): page_address (read_intr)
00000001 0.000ms (+0.000ms): taskfile_input_data (read_intr)
00000001 0.000ms (+0.000ms): ata_input_data (taskfile_input_data)
00000001 0.000ms (+0.066ms): ide_insw (ata_input_data)
00010001 0.067ms (+0.000ms): do_nmi (ide_insw)
00010001 0.067ms (+0.003ms): do_nmi (<0815f06c>)
00010001 0.070ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010001 0.070ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010001 0.071ms (+0.000ms): profile_hook (profile_tick)
00010001 0.071ms (+0.000ms): read_lock (profile_hook)
00010002 0.071ms (+0.000ms): read_lock (<00000000>)
00010002 0.071ms (+0.000ms): notifier_call_chain (profile_hook)
00010002 0.071ms (+0.000ms): _read_unlock (profile_tick)
00010001 0.072ms (+0.424ms): profile_hit (nmi_watchdog_tick)
00000001 0.497ms (+0.000ms): kunmap_atomic (read_intr)
00000001 0.497ms (+0.000ms): sub_preempt_count (kunmap_atomic)
00000001 0.498ms (+0.000ms): update_max_trace (check_preempt_timing)

or this...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 454 us, entries: 60 (60)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000002 0.001ms (+0.267ms): spin_lock (<00000000>)
00010002 0.268ms (+0.000ms): do_nmi (get_swap_page)
00010002 0.268ms (+0.003ms): do_nmi (ide_outsw)
00010002 0.272ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010002 0.272ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010002 0.272ms (+0.000ms): profile_hook (profile_tick)
00010002 0.272ms (+0.000ms): read_lock (profile_hook)
00010003 0.273ms (+0.000ms): read_lock (<00000000>)
00010003 0.273ms (+0.000ms): notifier_call_chain (profile_hook)
00010003 0.273ms (+0.000ms): _read_unlock (profile_tick)
00010002 0.274ms (+0.000ms): profile_hit (nmi_watchdog_tick)
00010002 0.274ms (+0.000ms): do_IRQ (get_swap_page)
00010002 0.275ms (+0.000ms): do_IRQ (<00000000>)
00010002 0.275ms (+0.000ms): spin_lock (do_IRQ)
00010003 0.275ms (+0.000ms): spin_lock (<00000000>)
00010003 0.275ms (+0.000ms): ack_edge_ioapic_irq (do_IRQ)
00010003 0.275ms (+0.000ms): generic_redirect_hardirq (do_IRQ)
00010003 0.276ms (+0.000ms): _spin_unlock (do_IRQ)

or this...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 457 us, entries: 8 (8)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): spin_lock (get_swap_page)
00000002 0.000ms (+0.456ms): spin_lock (<00000000>)
00000002 0.457ms (+0.000ms): _spin_unlock (get_swap_page)
00000001 0.457ms (+0.000ms): _spin_unlock (get_swap_page)
00000001 0.458ms (+0.000ms): sub_preempt_count (_spin_unlock)
00000001 0.458ms (+0.000ms): update_max_trace (check_preempt_timing)

So I don't think that particular setting helps.

  --Mark



* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-09 19:23 Mark_H_Johnson
  2004-09-09 19:47 ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Mark_H_Johnson @ 2004-09-09 19:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Lee Revell, Free Ekanayaka, Eric St-Laurent, linux-kernel,
	K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard,
	nando, luke, free78

>would it be possible to test with DMA disabled? (hdparm -d0 /dev/hda) It
>might take some extra work to shun the extra latency reports from the
>PIO IDE path (which is quite slow) but once that is done you should be
>able to see whether these long 0.5 msec delays remain even if all (most)
>DMA activity has been eliminated.

OK. With new patches in hand, I have a set of latency results w/ IDE
DMA turned off. For reference, tests were run with:

# hdparm /dev/hda

/dev/hda:
 multcount    = 16 (on)
 IO_support   =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  0 (off)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 64 (on)
 geometry     = 58168/16/63, sectors = 58633344, start = 0

# cat /sys/block/hda/queue/max_sectors_kb
32
# cat /sys/block/hda/queue/read_ahead_kb
32
# cat /proc/sys/net/core/netdev_max_backlog
8
# dmesg -n 1

and all tests run w/ a -R8 kernel plus patches for sched, timer_tsc,
and ide-iops (to add latency trace outputs or suppress known long paths).
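The setup above can be collected into one sketch (root required; /dev/hda and the sysfs/proc paths come from the listings above, and the values are the ones shown there):

```shell
#!/bin/sh
# Pre-test tuning for the DMA-off run (values from the listings above).
hdparm -d0 /dev/hda                              # using_dma = 0 (off)
echo 32 > /sys/block/hda/queue/max_sectors_kb    # cap request size at 32K
echo 32 > /sys/block/hda/queue/read_ahead_kb     # cap readahead at 32K
echo 8  > /proc/sys/net/core/netdev_max_backlog  # shrink net softirq batches
dmesg -n 1                                       # console printk to minimum
```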

No latency traces > 600 usec. These tests were run simply with
  head -c 750000000 /dev/zero >tmpfile  (disk writes)
  cp tmpfile tmpfile2 (disk copy)
  cat tmpfile tmpfile2 >/dev/null (disk read)
while capturing the latency traces in another process. 72 total traces
captured in 15 minutes of tests.

For reference, the 750 meg file size is about 1.5 x physical memory.
No I/O to the audio card. X was running (to monitor the test) and a
network was connected (but generally idle).

PIO trace
=========

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 369 us, entries: 3 (3)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: IRQ 14/140, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: ide_outsl+0x44/0x50
 => ended at:   ide_outsl+0x44/0x50
=======>
00000001 0.000ms (+0.370ms): touch_preempt_timing (ide_outsl)
00000001 0.370ms (+0.000ms): touch_preempt_timing (ide_outsl)
00000001 0.370ms (+0.000ms): update_max_trace (check_preempt_timing)

which appears to be the PIO path ide_outsl. I had a few similar traces
with ide_insl during the copy / read tests as well.

send_IPI_mask_bitmask
=====================

I have several traces where send_IPI_mask_bitmask (flush_tlb_others)
shows up. For example...

00000002 0.010ms (+0.001ms): kunmap_atomic (zap_pte_range)
00000001 0.011ms (+0.000ms): flush_tlb_mm (unmap_vmas)
00000002 0.012ms (+0.000ms): flush_tlb_others (flush_tlb_mm)
00000002 0.012ms (+0.000ms): spin_lock (flush_tlb_others)
00000003 0.013ms (+0.001ms): spin_lock (<00000000>)
00000003 0.014ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000003 0.014ms (+0.132ms): send_IPI_mask_bitmask (flush_tlb_others)
00010003 0.147ms (+0.000ms): do_nmi (flush_tlb_others)
00010003 0.147ms (+0.001ms): do_nmi (ide_outsl)
00010003 0.149ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010003 0.149ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010003 0.150ms (+0.000ms): profile_hook (profile_tick)
00010003 0.150ms (+0.000ms): read_lock (profile_hook)
00010004 0.150ms (+0.000ms): read_lock (<00000000>)
00010004 0.150ms (+0.000ms): notifier_call_chain (profile_hook)
00010004 0.151ms (+0.000ms): _read_unlock (profile_tick)
00010003 0.151ms (+0.250ms): profile_hit (nmi_watchdog_tick)
00000003 0.401ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000001 0.402ms (+0.000ms): free_pages_and_swap_cache (unmap_vmas)
00000001 0.402ms (+0.000ms): lru_add_drain (free_pages_and_swap_cache)

or this one...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 445 us, entries: 22 (22)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (page_referenced_file)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.001ms (+0.001ms): prio_tree_first (vma_prio_tree_next)
00000001 0.003ms (+0.001ms): prio_tree_left (prio_tree_first)
00000001 0.005ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.005ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.006ms (+0.001ms): spin_lock (<00000000>)
00000002 0.007ms (+0.000ms): kmap_atomic (page_referenced_one)
00000003 0.007ms (+0.001ms): page_address (page_referenced_one)
00000003 0.009ms (+0.000ms): flush_tlb_page (page_referenced_one)
00000004 0.010ms (+0.000ms): flush_tlb_others (flush_tlb_page)
00000004 0.010ms (+0.000ms): spin_lock (flush_tlb_others)
00000005 0.011ms (+0.001ms): spin_lock (<00000000>)
00000005 0.012ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000005 0.012ms (+0.431ms): send_IPI_mask_bitmask (flush_tlb_others)
00000005 0.444ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000003 0.445ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.445ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.445ms (+0.000ms): _spin_unlock (page_referenced_file)
00000001 0.446ms (+0.000ms): sub_preempt_count (_spin_unlock)
00000001 0.446ms (+0.000ms): update_max_trace (check_preempt_timing)

or this one...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 428 us, entries: 49 (49)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: get_ltrace.sh/5514, uid:0 nice:-9 policy:0 rt_prio:0
    -----------------
 => started at: flush_tlb_mm+0x1d/0xd0
 => ended at:   flush_tlb_mm+0x52/0xd0
=======>
00000001 0.000ms (+0.001ms): flush_tlb_mm (copy_mm)
00000001 0.001ms (+0.000ms): flush_tlb_others (flush_tlb_mm)
00000001 0.002ms (+0.000ms): spin_lock (flush_tlb_others)
00000002 0.002ms (+0.000ms): spin_lock (<00000000>)
00000002 0.003ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000002 0.003ms (+0.415ms): send_IPI_mask_bitmask (flush_tlb_others)
00000002 0.419ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
00010002 0.419ms (+0.000ms): profile_tick (smp_apic_timer_interrupt)
00010002 0.419ms (+0.000ms): profile_hook (profile_tick)
00010002 0.419ms (+0.000ms): read_lock (profile_hook)
00010003 0.420ms (+0.000ms): read_lock (<00000000>)
00010003 0.420ms (+0.000ms): notifier_call_chain (profile_hook)
00010003 0.420ms (+0.000ms): _read_unlock (profile_tick)
00010002 0.420ms (+0.000ms): profile_hit (smp_apic_timer_interrupt)
00010002 0.420ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
...

or this one...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 330 us, entries: 9 (9)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: get_ltrace.sh/5514, uid:0 nice:-9 policy:0 rt_prio:0
    -----------------
 => started at: flush_tlb_mm+0x1d/0xd0
 => ended at:   flush_tlb_mm+0x52/0xd0
=======>
00000001 0.000ms (+0.001ms): flush_tlb_mm (copy_mm)
00000001 0.001ms (+0.000ms): flush_tlb_others (flush_tlb_mm)
00000001 0.002ms (+0.000ms): spin_lock (flush_tlb_others)
00000002 0.002ms (+0.000ms): spin_lock (<00000000>)
00000002 0.003ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000002 0.004ms (+0.326ms): send_IPI_mask_bitmask (flush_tlb_others)
00000002 0.330ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000001 0.331ms (+0.000ms): sub_preempt_count (flush_tlb_mm)
00000001 0.331ms (+0.000ms): update_max_trace (check_preempt_timing)

try_to_wake_up
==============

Buried inside a pretty long trace in kswapd0, I saw the following...

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8-nd
-------------------------------------------------------
 latency: 426 us, entries: 286 (286)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: kswapd0/40, uid:0 nice:0 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (page_referenced_file)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): vma_prio_tree_next (page_referenced_file)
00000001 0.000ms (+0.000ms): prio_tree_first (vma_prio_tree_next)
00000001 0.001ms (+0.000ms): prio_tree_left (prio_tree_first)
00000001 0.001ms (+0.000ms): page_referenced_one (page_referenced_file)
00000001 0.001ms (+0.000ms): spin_lock (page_referenced_one)
00000002 0.002ms (+0.000ms): spin_lock (<00000000>)
...
00000006 0.108ms (+0.000ms): task_rq_lock (try_to_wake_up)
00000006 0.108ms (+0.000ms): spin_lock (task_rq_lock)
00000007 0.108ms (+0.000ms): spin_lock (<00000000>)
00000007 0.109ms (+0.000ms): wake_idle (try_to_wake_up)
00000007 0.109ms (+0.000ms): activate_task (try_to_wake_up)
00000007 0.109ms (+0.000ms): sched_clock (activate_task)
00000007 0.109ms (+0.000ms): recalc_task_prio (activate_task)
00000007 0.109ms (+0.000ms): effective_prio (recalc_task_prio)
00000007 0.110ms (+0.000ms): __activate_task (try_to_wake_up)
00000007 0.110ms (+0.000ms): enqueue_task (__activate_task)
00000007 0.110ms (+0.000ms): sched_info_queued (enqueue_task)
00000007 0.110ms (+0.000ms): resched_task (try_to_wake_up)
00000007 0.111ms (+0.000ms): task_rq_unlock (try_to_wake_up)
00000007 0.111ms (+0.000ms): _spin_unlock_irqrestore (try_to_wake_up)
00000006 0.111ms (+0.298ms): preempt_schedule (try_to_wake_up)
00000005 0.409ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000004 0.409ms (+0.000ms): preempt_schedule (flush_tlb_others)
00000003 0.410ms (+0.000ms): preempt_schedule (flush_tlb_page)
00000003 0.410ms (+0.000ms): kunmap_atomic (page_referenced_one)
00000002 0.410ms (+0.000ms): preempt_schedule (page_referenced_one)
00000002 0.410ms (+0.000ms): _spin_unlock (page_referenced_one)
00000001 0.410ms (+0.000ms): preempt_schedule (page_referenced_one)

There is an almost 300 usec hit there as we unwind the nested layers.

So the long waits on paths through sched and timer_tsc appear to be
eliminated when using PIO to the disk.

Is there some "for sure" way to limit the size and/or duration of
DMA transfers so I get reasonable performance from the disk (and
other DMA devices) and reasonable latency?
  --Mark


^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-09 18:31 Mark_H_Johnson
  0 siblings, 0 replies; 36+ messages in thread
From: Mark_H_Johnson @ 2004-09-09 18:31 UTC (permalink / raw)
  To: Alan Cox
  Cc: Mark_H_Johnson, Ingo Molnar, Lee Revell, Free Ekanayaka,
	Eric St-Laurent, Linux Kernel Mailing List, K.R. Foley,
	Felipe Alfaro Solana, Daniel Schmitt, P.O. Gaillard, nando, luke,
	free78

>If you haven't done hdparm -u1 that may be a reason you want to touch
>these.

Alas, but
# hdparm /dev/hda
/dev/hda:
 multcount    = 16 (on)
 IO_support   =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  0 (off)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 64 (on)
 geometry     = 58168/16/63, sectors = 58633344, start = 0

so I already have IRQs unmasked. [This was during the no-DMA tests; I
usually run with DMA enabled.]

I'll be commenting on the results of the no DMA tests shortly.

  --Mark


^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1
@ 2004-09-08 20:33 Mark_H_Johnson
  2004-09-08 22:31 ` Alan Cox
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Mark_H_Johnson @ 2004-09-08 20:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mark_H_Johnson, Lee Revell, Free Ekanayaka, Eric St-Laurent,
	linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
	P.O. Gaillard, nando, luke, free78

>.... Please disable IDE DMA and see
>what happens (after hiding the PIO IDE codepath via
>touch_preempt_timing()).

Not quite sure where to add the touch_preempt_timing() calls - somewhere
in the loop in ide_outsl and ide_insl? [so we keep resetting the
start/end times?]

I did collect some initial data - most are of the form:

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8
-------------------------------------------------------
 latency: 545 us, entries: 8 (8)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: IRQ 14/140, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: kmap_atomic+0x23/0xe0
 => ended at:   kunmap_atomic+0x7b/0xa0
=======>
00000001 0.000ms (+0.000ms): kmap_atomic (ide_multwrite)
00000001 0.000ms (+0.001ms): page_address (ide_multwrite)
00000001 0.001ms (+0.000ms): taskfile_output_data (ide_multwrite)
00000001 0.002ms (+0.000ms): ata_output_data (taskfile_output_data)
00000001 0.003ms (+0.542ms): ide_outsl (ata_output_data)
00000001 0.545ms (+0.000ms): kunmap_atomic (ide_multwrite)
00000001 0.545ms (+0.000ms): sub_preempt_count (kunmap_atomic)
00000001 0.546ms (+0.000ms): update_max_trace (check_preempt_timing)

Which I am sure is what you expected.

I did get a couple that looked like this:

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8
-------------------------------------------------------
 latency: 1261 us, entries: 73 (73)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: sleep/4859, uid:0 nice:-9 policy:0 rt_prio:0
    -----------------
 => started at: spin_lock+0x24/0x90
 => ended at:   _spin_unlock+0x2d/0x60
=======>
00000001 0.000ms (+0.000ms): spin_lock (do_wp_page)
00000001 0.000ms (+0.000ms): spin_lock (<00000000>)
00000001 0.000ms (+0.000ms): kmap_atomic (do_wp_page)
00000002 0.000ms (+0.000ms): page_address (do_wp_page)
00000002 0.001ms (+0.000ms): page_remove_rmap (do_wp_page)
00000002 0.002ms (+0.001ms): flush_tlb_page (do_wp_page)
00000003 0.003ms (+0.000ms): flush_tlb_others (flush_tlb_page)
00000003 0.003ms (+0.000ms): spin_lock (flush_tlb_others)
00000004 0.004ms (+0.001ms): spin_lock (<00000000>)
00000004 0.005ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000004 0.005ms (+0.321ms): send_IPI_mask_bitmask (flush_tlb_others)
00010004 0.327ms (+0.000ms): do_nmi (flush_tlb_others)
00010004 0.327ms (+0.003ms): do_nmi (kallsyms_lookup)
00010004 0.331ms (+0.000ms): notifier_call_chain (default_do_nmi)
00010004 0.331ms (+0.000ms): profile_tick (nmi_watchdog_tick)
00010004 0.331ms (+0.000ms): profile_hook (profile_tick)
00010004 0.332ms (+0.000ms): read_lock (profile_hook)
00010005 0.332ms (+0.000ms): read_lock (<00000000>)
00010005 0.332ms (+0.000ms): notifier_call_chain (profile_hook)
00010005 0.332ms (+0.000ms): _read_unlock (profile_tick)
00010004 0.333ms (+0.091ms): profile_hit (nmi_watchdog_tick)
00000004 0.424ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
...
00000005 0.431ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000005 0.431ms (+0.000ms): __do_softirq (do_softirq)
00000005 0.432ms (+0.000ms): wake_up_process (do_softirq)
00000005 0.432ms (+0.000ms): try_to_wake_up (wake_up_process)
00000005 0.433ms (+0.000ms): task_rq_lock (try_to_wake_up)
00000005 0.433ms (+0.000ms): spin_lock (task_rq_lock)
00000006 0.433ms (+0.000ms): spin_lock (<00000000>)
00000006 0.433ms (+0.000ms): wake_idle (try_to_wake_up)
00000006 0.434ms (+0.000ms): activate_task (try_to_wake_up)
00000006 0.434ms (+0.000ms): sched_clock (activate_task)
00000006 0.434ms (+0.000ms): recalc_task_prio (activate_task)
00000006 0.435ms (+0.000ms): effective_prio (recalc_task_prio)
00000006 0.435ms (+0.000ms): __activate_task (try_to_wake_up)
00000006 0.436ms (+0.000ms): enqueue_task (__activate_task)
00000006 0.436ms (+0.000ms): sched_info_queued (enqueue_task)
00000006 0.437ms (+0.000ms): resched_task (try_to_wake_up)
00000006 0.437ms (+0.000ms): task_rq_unlock (try_to_wake_up)
00000006 0.437ms (+0.000ms): _spin_unlock_irqrestore (try_to_wake_up)
00000005 0.437ms (+0.820ms): preempt_schedule (try_to_wake_up)
00000004 1.258ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000003 1.258ms (+0.000ms): preempt_schedule (flush_tlb_others)
00000002 1.258ms (+0.000ms): preempt_schedule (flush_tlb_page)
...

or this

preemption latency trace v1.0.7 on 2.6.9-rc1-bk12-VP-R8
-------------------------------------------------------
 latency: 1210 us, entries: 113 (113)   |   [VP:1 KP:1 SP:1 HP:1 #CPUS:2]
    -----------------
    | task: get_ltrace.sh/3071, uid:0 nice:-9 policy:0 rt_prio:0
    -----------------
 => started at: flush_tlb_mm+0x1d/0xd0
 => ended at:   flush_tlb_mm+0x52/0xd0
=======>
00000001 0.000ms (+0.001ms): flush_tlb_mm (copy_mm)
00000001 0.001ms (+0.000ms): flush_tlb_others (flush_tlb_mm)
00000001 0.001ms (+0.000ms): spin_lock (flush_tlb_others)
00000002 0.002ms (+0.000ms): spin_lock (<00000000>)
00000002 0.003ms (+0.000ms): send_IPI_mask (flush_tlb_others)
00000002 0.003ms (+0.378ms): send_IPI_mask_bitmask (flush_tlb_others)
00000002 0.381ms (+0.000ms): smp_apic_timer_interrupt (flush_tlb_others)
...
00010003 0.459ms (+0.000ms): preempt_schedule (timer_interrupt)
00010003 0.460ms (+0.000ms): do_timer (timer_interrupt)
00010003 0.460ms (+0.000ms): update_wall_time (do_timer)
00010003 0.460ms (+0.000ms): update_wall_time_one_tick (update_wall_time)
00010003 0.460ms (+0.000ms): _spin_unlock (timer_interrupt)
00010002 0.460ms (+0.000ms): preempt_schedule (timer_interrupt)
00010002 0.461ms (+0.000ms): spin_lock (do_IRQ)
00010003 0.461ms (+0.000ms): spin_lock (<00000000>)
00010003 0.461ms (+0.000ms): generic_note_interrupt (do_IRQ)
00010003 0.461ms (+0.000ms): end_edge_ioapic_irq (do_IRQ)
00010003 0.461ms (+0.000ms): _spin_unlock (do_IRQ)
00010002 0.462ms (+0.000ms): preempt_schedule (do_IRQ)
00000003 0.462ms (+0.000ms): do_softirq (do_IRQ)
00000003 0.462ms (+0.747ms): __do_softirq (do_softirq)
00000002 1.209ms (+0.000ms): _spin_unlock (flush_tlb_others)
00000001 1.210ms (+0.000ms): preempt_schedule (flush_tlb_others)
00000001 1.210ms (+0.000ms): sub_preempt_count (flush_tlb_mm)
00000001 1.211ms (+0.000ms): update_max_trace (check_preempt_timing)

where I had > 1 msec latencies w/o any apparent references to the IDE.

  --Mark


^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R0
@ 2004-09-02 22:14 Ingo Molnar
  2004-09-03  0:24 ` Lee Revell
  0 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2004-09-02 22:14 UTC (permalink / raw)
  To: Lee Revell
  Cc: linux-kernel, K.R. Foley, Felipe Alfaro Solana, Daniel Schmitt,
	Mark_H_Johnson, P.O. Gaillard


* Lee Revell <rlrevell@joe-job.com> wrote:

> > i've given up on the netdev_backlog_granularity approach, and as a
> > replacement i've modified specific network drivers to return at a safe
> > point if softirq preemption is requested.
> 
> Makes sense, netdev_max_backlog never made a difference on my system
> (via-rhine driver).

via-rhine does RX processing from the hardirq handler, this codepath is
harder to break up. The NAPI ->poll functions used by e100 and 8139too
are much easier to break up because RX throttling and re-trying is a
basic property of NAPI.

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2004-09-15 15:31 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-09-07 16:54 [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Mark_H_Johnson
2004-09-08 18:42 ` Ingo Molnar
2004-09-10  2:31   ` K.R. Foley
2004-09-10 17:56     ` K.R. Foley
2004-09-10 19:26       ` Ingo Molnar
2004-09-10 19:28         ` K.R. Foley
2004-09-09 16:02 ` Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2004-09-13 14:44 Mark_H_Johnson
2004-09-14 18:32 ` Ingo Molnar
2004-09-15 15:26   ` Stephen Smalley
2004-09-10 14:28 Mark_H_Johnson
2004-09-10 19:29 ` Ingo Molnar
2004-09-10 12:57 Mark_H_Johnson
2004-09-10 13:10 ` Ingo Molnar
2004-09-09 22:56 Mark_H_Johnson
2004-09-10 12:42 ` Ingo Molnar
2004-09-09 22:41 Mark_H_Johnson
2004-09-09 19:23 Mark_H_Johnson
2004-09-09 19:47 ` Ingo Molnar
2004-09-09 19:02   ` Alan Cox
2004-09-09 18:31 Mark_H_Johnson
2004-09-08 20:33 Mark_H_Johnson
2004-09-08 22:31 ` Alan Cox
2004-09-09 16:44   ` Thomas Charbonnel
2004-09-09  6:27 ` Ingo Molnar
2004-09-09 15:12 ` Ingo Molnar
2004-09-09 15:33 ` Ingo Molnar
2004-09-02 22:14 [patch] voluntary-preempt-2.6.9-rc1-bk4-R0 Ingo Molnar
2004-09-03  0:24 ` Lee Revell
2004-09-03  3:17   ` Eric St-Laurent
2004-09-03  6:26     ` Lee Revell
2004-09-03  6:36       ` Ingo Molnar
2004-09-03  6:49         ` Lee Revell
2004-09-03  7:05           ` Ingo Molnar
2004-09-03  7:40             ` Lee Revell
2004-09-03  7:50               ` Free Ekanayaka
2004-09-03  8:05                 ` Lee Revell
2004-09-03  9:25                   ` [patch] voluntary-preempt-2.6.9-rc1-bk4-R1 Ingo Molnar
2004-09-03  9:50                     ` Luke Yelavich
2004-09-03 10:29                       ` Ingo Molnar
2004-09-03 10:43                         ` Luke Yelavich
2004-09-03 11:33                     ` Thomas Charbonnel
2004-09-03 11:49                       ` Ingo Molnar
2004-09-03 12:05                         ` Thomas Charbonnel
2004-09-03 16:14                         ` Thomas Charbonnel
2004-09-03 17:36                           ` Thomas Charbonnel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox