From: Eric Dumazet <eric.dumazet@gmail.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: netdev@vger.kernel.org,
virtualization@lists.linux-foundation.org,
Divy Le Ray <divy@chelsio.com>, Roland Dreier <rolandd@cisco.com>,
Pavel Emelianov <xemul@openvz.org>,
Dan Williams <dcbw@redhat.com>,
libertas-dev@lists.infradead.org
Subject: Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
Date: Wed, 03 Jun 2009 23:02:53 +0200
Message-ID: <4A26E4FD.5010405@gmail.com>
In-Reply-To: <200906012157.29465.rusty@rustcorp.com.au>
Rusty Russell wrote:
> On Sat, 30 May 2009 12:41:00 am Eric Dumazet wrote:
>> Rusty Russell wrote:
>>> DaveM points out that there are advantages to doing it generally (it's
>>> more likely to be on same CPU than after xmit), and I couldn't find
>>> any new starvation issues in simple benchmarking here.
>> If really no starvations are possible at all, I really wonder why some
>> guys added memory accounting to UDP flows. Maybe they dont run "simple
>> benchmarks" but real apps ? :)
>
> Well, without any accounting at all you could use quite a lot of memory as
> there are many places packets can be queued.
>
>> For TCP, I agree your patch is a huge benefit, since its paced by remote
>> ACKS and window control
>
> I doubt that. There'll be some cache friendliness, but I'm not sure it'll be
> measurable, let alone "huge". It's the win to drivers which don't have a
> timely and batching tx free mechanism which I aim for.
At 250,000 packets/second on a Gigabit link, this is huge, I can tell you.
(250,000 incoming packets and 250,000 outgoing packets per second, 700 Mbit/s)

According to this oprofile on CPU0 (dedicated to softirqs on one bnx2 eth adapter),
sock_wfree() is number 2 in the profile, because it touches three cache lines per socket and
transmitted packet in the TX completion handler.

Also, taking a reference on the socket for each xmit packet in flight is very expensive, since it slows
down the receiver in __udp4_lib_lookup(). Several CPUs are fighting for the sk->refcnt cache line.

CPU: Core 2, speed 3000.24 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  cum. samples  %        cum. %     symbol name
21215    21215         11.8847  11.8847    bnx2_poll_work
17239    38454          9.6573  21.5420    sock_wfree << effect of udp memory accounting >>
14817    53271          8.3005  29.8425    __slab_free
14635    67906          8.1986  38.0411    __udp4_lib_lookup
11425    79331          6.4003  44.4414    __alloc_skb
9710     89041          5.4396  49.8810    __slab_alloc
8095     97136          4.5348  54.4158    __udp4_lib_rcv
7831     104967         4.3869  58.8027    sock_def_write_space
7586     112553         4.2497  63.0524    ip_rcv
7518     120071         4.2116  67.2640    skb_dma_unmap
6711     126782         3.7595  71.0235    netif_receive_skb
6272     133054         3.5136  74.5371    udp_queue_rcv_skb
5262     138316         2.9478  77.4849    skb_release_data
5023     143339         2.8139  80.2988    __kmalloc_track_caller
4070     147409         2.2800  82.5788    kmem_cache_alloc
3216     150625         1.8016  84.3804    ipt_do_table
2576     153201         1.4431  85.8235    skb_queue_tail
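
To make the cost discussed above concrete, here is a small userspace sketch of the
ownership pattern, not kernel code. All toy_* names are hypothetical stand-ins: toy_sock
and toy_skb only model the fields mentioned in this thread (sk->refcnt, the send memory
charge, the skb destructor), and plain ints replace the atomics the kernel would use.
It assumes the per-packet reference and charge are taken when the skb is attached to
the socket and released by the destructor, and shows that orphaning before transmit
leaves the TX completion path with no socket cache lines to touch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a socket: only the fields this thread talks about. */
struct toy_sock {
	int refcnt;      /* models sk->refcnt */
	int wmem_alloc;  /* models bytes charged to the sender */
	int wakeups;     /* counts write-space notifications */
};

struct toy_skb {
	struct toy_sock *sk;
	int truesize;
	void (*destructor)(struct toy_skb *skb);
};

/* Models the sock_wfree() role: uncharge, notify the writer, drop the ref. */
static void toy_sock_wfree(struct toy_skb *skb)
{
	struct toy_sock *sk = skb->sk;

	sk->wmem_alloc -= skb->truesize;
	sk->wakeups++;
	sk->refcnt--;
}

/* Models attaching a packet to its sending socket (one ref per packet). */
static void toy_set_owner_w(struct toy_skb *skb, struct toy_sock *sk)
{
	sk->refcnt++;
	sk->wmem_alloc += skb->truesize;
	skb->sk = sk;
	skb->destructor = toy_sock_wfree;
}

/* Models skb_orphan(): run the destructor now and detach from the socket. */
static void toy_skb_orphan(struct toy_skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	skb->destructor = NULL;
	skb->sk = NULL;
}

/* Models TX completion; returns 1 if it had to touch the socket. */
static int toy_tx_complete(struct toy_skb *skb)
{
	int touched = skb->destructor != NULL;

	toy_skb_orphan(skb);
	return touched;
}

/* Sends one packet; orphan_before_xmit selects the behaviour under test. */
static int toy_demo(int orphan_before_xmit)
{
	struct toy_sock sk = { .refcnt = 1, .wmem_alloc = 0, .wakeups = 0 };
	struct toy_skb skb = { .sk = NULL, .truesize = 512, .destructor = NULL };
	int touched;

	toy_set_owner_w(&skb, &sk);
	assert(sk.refcnt == 2 && sk.wmem_alloc == 512);

	if (orphan_before_xmit)
		toy_skb_orphan(&skb);   /* socket released on the sending CPU */

	touched = toy_tx_complete(&skb);
	assert(sk.refcnt == 1 && sk.wmem_alloc == 0);
	return touched;
}
```

Calling toy_demo(0) reports that completion touched the socket, while toy_demo(1) reports
that it did not. The trade-off is the one raised earlier in the thread: once the packet is
orphaned, the socket is no longer charged for it while it sits in the device queue.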