From: Eric Dumazet <dada1@cosmosbay.com>
To: Kenny Chang <kchang@athenacr.com>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
Christoph Lameter <cl@linux-foundation.org>
Subject: Re: Multicast packet loss
Date: Sun, 01 Mar 2009 18:03:12 +0100
Message-ID: <49AABFD0.5090204@cosmosbay.com>
In-Reply-To: <49A8FAFF.7060104@cosmosbay.com>
Eric Dumazet wrote:
> Kenny Chang wrote:
>> It's been a while since I updated this thread. We've been running
>> through the different suggestions and tabulating their effects, as well
>> as trying out an Intel card. The short story is that setting affinity
>> and MSI works to some extent, and the Intel card doesn't seem to change
>> things significantly. The results don't seem consistent enough for us
>> to be able to point to a smoking gun.
>>
>> It does look like the 2.6.29-rc4 kernel performs okay with the Intel
>> card, but this is not a real-time build and it's not likely to be in a
>> supported Ubuntu distribution real soon. We've reached the point where
>> we'd like to look for an expert dedicated to work on this problem for a
>> period of time. The final result being some sort of solution to produce
>> a realtime configuration with a reasonably "aged" kernel (.24~.28) that
>> has multicast performance greater than or equal to that of 2.6.15.
>>
>> If anybody is interested in devoting some compensated time to this
>> issue, we're offering up a bounty:
>> http://www.athenacr.com/bounties/multicast-performance/
>>
>> For completeness, here's the table of our experiment results:
>>
>> ====================== ================== ========= ========== =============== ============== ============== =================
>> Kernel flavor IRQ affinity *4x mcasttest* *5x mcasttest* *6x mcasttest* *Mtools2* [4]_
>> ====================== ================== ========= ========== =============== ============== ============== =================
>> Intel e1000e
>> -----------------------------------------+---------+----------+---------------+--------------+--------------+-----------------
>> 2.6.24.19 rt | any | OK Maybe X
>> 2.6.24.19 rt | CPU0 | OK OK X
>> 2.6.24.19 generic | any | X
>> 2.6.24.19 generic | CPU0 | OK
>> 2.6.29-rc3 vanilla-server | any | X
>> 2.6.29-rc3 vanilla-server | CPU0 | OK
>> 2.6.29-rc4 vanilla-generic | any | X OK
>> 2.6.29-rc4 vanilla-generic | CPU0 | OK OK OK [5]_ OK
>> -----------------------------------------+---------+----------+---------------+--------------+--------------+-----------------
>> Broadcom BNX2
>> -----------------------------------------+---------+----------+---------------+--------------+--------------+-----------------
>> 2.6.24-19 rt | MSI any | OK OK X
>> 2.6.24-19 rt | MSI CPU0 | OK Maybe X
>> 2.6.24-19 rt | APIC any | OK OK X
>> 2.6.24-19 rt | APIC CPU0 | OK Maybe X
>> 2.6.24-19-bnx-latest rt | APIC CPU0 | OK X
>> 2.6.24-19 server | MSI any | X
>> 2.6.24-19 server | MSI CPU0 | OK
>> 2.6.24-19 generic | APIC any | X
>> 2.6.24-19 generic | APIC CPU0 | OK
>> 2.6.27-11 generic | APIC any | X
>> 2.6.27-11 generic | APIC CPU0 | OK 10% drop
>> 2.6.28-8 generic | APIC any | OK X
>> 2.6.28-8 generic | APIC CPU0 | OK OK 0.5% drop
>> 2.6.29-rc3 vanilla-server | MSI any | X
>> 2.6.29-rc3 vanilla-server | MSI CPU0 | X
>> 2.6.29-rc3 vanilla-server | APIC any | OK X
>> 2.6.29-rc3 vanilla-server | APIC CPU0 | OK OK
>> 2.6.29-rc4 vanilla-generic | APIC any | X
>> 2.6.29-rc4 vanilla-generic | APIC CPU0 | OK 3% drop 10% drop X
>> ====================== ==================+=========+==========+===============+==============+==============+=================
>>
>> * [4] MTools2 is a test from 29West: http://www.29west.com/docs/TestNet/
>> * [5] In 5 trials, 1 of the trials dropped 2%, 4 of the trials dropped
>> nothing.
>>
>> Kenny
>>
>
> Hi Kenny
>
> I am investigating how to reduce contention (and schedule() calls) on this workload.
>
I bound the NIC (gigabit BNX2) IRQ to CPU 0, so that the oprofile results for this CPU
show us where ksoftirqd is spending its time.
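
For readers reproducing this setup: on Linux, IRQ affinity is normally set by writing a hexadecimal CPU bitmask to /proc/irq/<irq>/smp_affinity. The sketch below is not part of the original mail. The helper names `cpu_mask_hex` and `bind_irq` are hypothetical, the write needs root, and the plain hex string assumes fewer than 32 CPUs (the machine profiled here has 8).

```python
from pathlib import Path


def cpu_mask_hex(cpus):
    """Return the hex bitmask string that selects the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def bind_irq(irq, cpus, proc_root="/proc"):
    """Write the affinity mask for `irq` (hypothetical helper, needs root)."""
    path = Path(proc_root) / "irq" / str(irq) / "smp_affinity"
    path.write_text(cpu_mask_hex(cpus) + "\n")
    return path
```

With a made-up IRQ number 16, `bind_irq(16, [0])` would write the mask for CPU 0 only to /proc/irq/16/smp_affinity. The real IRQ number of the NIC has to be looked up in /proc/interrupts.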
We can see the scheduler at work :)
Also worth noting is __copy_skb_header(): 9.49 % of CPU 0 time.
The cost comes from dst_clone() (6.05 % of the total, so about 2/3 of __copy_skb_header()),
which touches a highly contended cache line (the other CPUs are decrementing the
dst refcount).
CPU: Core 2, speed 3000.05 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted)
with a unit mask of 0x00 (Unhalted core cycles) count 100000
Samples on CPU 0
(samples for other cpus 1..7 omitted)
samples cum. samples % cum. % symbol name
23750 23750 9.8159 9.8159 try_to_wake_up
22972 46722 9.4944 19.3103 __copy_skb_header
20217 66939 8.3557 27.6660 enqueue_task_fair
14565 81504 6.0197 33.6857 sock_def_readable
13454 94958 5.5606 39.2463 task_rq_lock
13381 108339 5.5304 44.7767 resched_task
13090 121429 5.4101 50.1868 udp_queue_rcv_skb
11441 132870 4.7286 54.9154 skb_queue_tail
10109 142979 4.1781 59.0935 sock_queue_rcv_skb
10024 153003 4.1429 63.2364 __wake_up_sync
9952 162955 4.1132 67.3496 update_curr
8761 171716 3.6209 70.9705 sched_clock_cpu
7414 179130 3.0642 74.0347 rb_insert_color
7381 186511 3.0506 77.0853 select_task_rq_fair
6749 193260 2.7894 79.8747 __slab_alloc
5881 199141 2.4306 82.3053 __wake_up_common
5432 204573 2.2451 84.5504 __skb_clone
4306 208879 1.7797 86.3300 kmem_cache_alloc
3524 212403 1.4565 87.7865 place_entity
2783 215186 1.1502 88.9367 skb_clone
2576 217762 1.0647 90.0014 __udp4_lib_rcv
2430 220192 1.0043 91.0057 bnx2_poll_work
2184 222376 0.9027 91.9084 ipt_do_table
2090 224466 0.8638 92.7722 ip_route_input
1877 226343 0.7758 93.5479 __alloc_skb
1495 227838 0.6179 94.1658 native_sched_clock
1166 229004 0.4819 94.6477 __update_sched_clock
1083 230087 0.4476 95.0953 netif_receive_skb
1062 231149 0.4389 95.5343 activate_task
644 231793 0.2662 95.8004 __kmalloc_track_caller
638 232431 0.2637 96.0641 nf_iterate
549 232980 0.2269 96.2910 skb_put
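
To put a number on "scheduler at work", the listing above can be summed by category. The sketch below is only an illustration: which symbols count as scheduler or wakeup work is my own assumption, not something stated in this mail, and rb_insert_color is deliberately left out because it is a generic rbtree helper.

```python
# Symbols treated as scheduler/wakeup work. This grouping is an assumption
# made for illustration; it does not come from the original mail.
SCHED_SYMBOLS = {
    "try_to_wake_up", "enqueue_task_fair", "task_rq_lock", "resched_task",
    "__wake_up_sync", "update_curr", "sched_clock_cpu", "select_task_rq_fair",
    "__wake_up_common", "place_entity", "native_sched_clock",
    "__update_sched_clock", "activate_task",
}

# Rows copied from the CPU 0 listing: samples, cum. samples, %, cum. %, symbol.
PROFILE = """\
23750 23750 9.8159 9.8159 try_to_wake_up
22972 46722 9.4944 19.3103 __copy_skb_header
20217 66939 8.3557 27.6660 enqueue_task_fair
14565 81504 6.0197 33.6857 sock_def_readable
13454 94958 5.5606 39.2463 task_rq_lock
13381 108339 5.5304 44.7767 resched_task
13090 121429 5.4101 50.1868 udp_queue_rcv_skb
11441 132870 4.7286 54.9154 skb_queue_tail
10109 142979 4.1781 59.0935 sock_queue_rcv_skb
10024 153003 4.1429 63.2364 __wake_up_sync
9952 162955 4.1132 67.3496 update_curr
8761 171716 3.6209 70.9705 sched_clock_cpu
7414 179130 3.0642 74.0347 rb_insert_color
7381 186511 3.0506 77.0853 select_task_rq_fair
6749 193260 2.7894 79.8747 __slab_alloc
5881 199141 2.4306 82.3053 __wake_up_common
5432 204573 2.2451 84.5504 __skb_clone
4306 208879 1.7797 86.3300 kmem_cache_alloc
3524 212403 1.4565 87.7865 place_entity
2783 215186 1.1502 88.9367 skb_clone
2576 217762 1.0647 90.0014 __udp4_lib_rcv
2430 220192 1.0043 91.0057 bnx2_poll_work
2184 222376 0.9027 91.9084 ipt_do_table
2090 224466 0.8638 92.7722 ip_route_input
1877 226343 0.7758 93.5479 __alloc_skb
1495 227838 0.6179 94.1658 native_sched_clock
1166 229004 0.4819 94.6477 __update_sched_clock
1083 230087 0.4476 95.0953 netif_receive_skb
1062 231149 0.4389 95.5343 activate_task
644 231793 0.2662 95.8004 __kmalloc_track_caller
638 232431 0.2637 96.0641 nf_iterate
549 232980 0.2269 96.2910 skb_put
"""


def parse_profile(text):
    """Return (symbol, samples, percent) tuples for each five-field row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip anything that is not a data row
        samples, _cum, pct, _cum_pct, symbol = fields
        rows.append((symbol, int(samples), float(pct)))
    return rows


def share(text, symbols):
    """Sum the per-symbol percentages for the symbols in the given set."""
    return sum(pct for sym, _n, pct in parse_profile(text) if sym in symbols)
```

Calling `share(PROFILE, SCHED_SYMBOLS)` gives the combined percentage for that group. Under this grouping it comes to roughly half of CPU 0 time, and moving a symbol between groups shifts the figure accordingly.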