public inbox for netdev@vger.kernel.org
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	Michael Chan <mchan@broadcom.com>,
	Eilon Greenstein <eilong@broadcom.com>,
	Christoph Hellwig <hch@lst.de>,
	Christoph Lameter <cl@linux-foundation.org>
Subject: Re: [PATCH net-next] net:  allocate skbs on local node
Date: Tue, 12 Oct 2010 08:58:19 +0200	[thread overview]
Message-ID: <1286866699.30423.234.camel@edumazet-laptop> (raw)
In-Reply-To: <20101011230322.f0f6dd47.akpm@linux-foundation.org>

On Monday, 11 October 2010 at 23:03 -0700, Andrew Morton wrote:
> On Tue, 12 Oct 2010 07:05:25 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > [PATCH net-next] net: allocate skbs on local node
> > 
> > commit b30973f877 (node-aware skb allocation) spread a wrong habit of
> > allocating net drivers' skbs on a given memory node: the one closest to
> > the NIC hardware. This is wrong because, as soon as we try to scale the
> > network stack, we need many cpus to handle traffic, and we hit
> > slub/slab management overhead on cross-node allocations/frees when
> > these cpus have to alloc/free skbs bound to a central node.
> > 
> > skbs allocated in the RX path are ephemeral; they have a very short
> > lifetime. The extra cost of maintaining NUMA affinity is too expensive.
> > What appeared to be a nice idea four years ago is in fact a bad one.
> > 
> > In 2010, NIC hardware is multiqueue, or we use RPS to spread the load,
> > and two 10Gb NICs might deliver more than 28 million packets per
> > second, needing all the available cpus.
> > 
> > The cost of cross-node handling in the network and vm stacks outweighs
> > the small benefit the hardware had when doing its DMA transfer into its
> > 'local' memory node at RX time. Even differentiating the two
> > allocations done for one skb (the sk_buff on the local node, the data
> > part on the NIC hardware node) is not enough to bring good performance.
> > 
> 
> This is all conspicuously hand-wavy and unquantified.  (IOW: prove it!)
> 

I would say _you_ should prove that the original patch was good. It
seems no network people were really part of that discussion?
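
For readers following along, the direction of the patch can be sketched
like this (a non-compilable sketch against the mainline net core, not
the committed diff; the real change also covers the data-area
allocation):

```c
/* Sketch only: driver RX skbs used to be pinned to the NIC's home
 * node via dev_to_node(); the patch lets the slab allocator pick the
 * node local to the allocating cpu instead.
 */
struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
				   unsigned int length, gfp_t gfp_mask)
{
	/* old behaviour: int node = dev_to_node(dev);  (NIC's node) */
	int node = NUMA_NO_NODE;	/* let slab use the local node */
	struct sk_buff *skb;

	skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, node);
	if (likely(skb))
		skb_reserve(skb, NET_SKB_PAD);
	return skb;
}
```

Since RX skbs are freed almost immediately (often on yet another cpu),
local-node allocation keeps both the alloc and the free on fast
per-node slab paths instead of the cross-node remote-free paths.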

Just run a test on a bnx2x or ixgbe multiqueue 10Gb adapter and see the
difference. That's about a 40% slowdown at high packet rates on a
dual-socket machine (dual X5570 @ 2.93GHz). You can expect higher values
on four nodes (I don't have such hardware to do the test).


> The mooted effects should be tested for on both slab and slub, I
> suggest.  They're pretty different beasts.

SLAB is so slow on NUMA these days, you can forget it for good.

It's about 40% slower in some tests I did this week on net-next to
speed up output (and routing) performance; and those were with normal
(local) allocations, not even cross-node ones.

Once you remove the network bottlenecks, you badly hit contention in
SLAB and are forced to switch to SLUB ;)

Sending 160,000,000 UDP frames to the same neighbour/destination, with
the IP route cache disabled (to mimic a DDoS on a router).
16 threads, 16 logical cpus, 32-bit kernel (dual E5540 @ 2.53GHz).


(It takes more than 2 minutes with linux-2.6, so use net-next-2.6 if you
really want to reproduce these numbers)

SLUB :

real	0m50.661s
user	0m15.973s
sys	11m42.548s


18348.00 21.4% dst_destroy             vmlinux
 5674.00  6.6% fib_table_lookup        vmlinux
 5563.00  6.5% dst_alloc               vmlinux
 5226.00  6.1% neigh_lookup            vmlinux
 3590.00  4.2% __ip_route_output_key   vmlinux
 2712.00  3.2% neigh_resolve_output    vmlinux
 2511.00  2.9% fib_semantic_match      vmlinux
 2488.00  2.9% ipv4_dst_destroy        vmlinux
 2206.00  2.6% __xfrm_lookup           vmlinux
 2119.00  2.5% memset                  vmlinux
 2015.00  2.4% __copy_from_user_ll     vmlinux
 1722.00  2.0% udp_sendmsg             vmlinux
 1679.00  2.0% __slab_free             vmlinux
 1152.00  1.3% ip_append_data          vmlinux
 1044.00  1.2% __alloc_skb             vmlinux
  952.00  1.1% kmem_cache_free         vmlinux
  942.00  1.1% udp_push_pending_frames vmlinux
  877.00  1.0% kfree                   vmlinux
  870.00  1.0% __call_rcu              vmlinux
  829.00  1.0% ip_push_pending_frames  vmlinux
  799.00  0.9% _raw_spin_lock_bh       vmlinux

SLAB:

real	1m10.771s
user	0m13.941s
sys	12m42.188s


22734.00 26.0% _raw_spin_lock          vmlinux
 8238.00  9.4% dst_destroy             vmlinux
 4393.00  5.0% fib_table_lookup        vmlinux
 3652.00  4.2% dst_alloc               vmlinux
 3335.00  3.8% neigh_lookup            vmlinux
 2444.00  2.8% memset                  vmlinux
 2443.00  2.8% __ip_route_output_key   vmlinux
 1916.00  2.2% fib_semantic_match      vmlinux
 1708.00  2.0% __copy_from_user_ll     vmlinux
 1669.00  1.9% __xfrm_lookup           vmlinux
 1642.00  1.9% free_block              vmlinux
 1554.00  1.8% neigh_resolve_output    vmlinux
 1388.00  1.6% ipv4_dst_destroy        vmlinux
 1335.00  1.5% udp_sendmsg             vmlinux
 1109.00  1.3% kmem_cache_free         vmlinux
 1007.00  1.2% __alloc_skb             vmlinux
 1004.00  1.1% kfree                   vmlinux
 1002.00  1.1% ip_append_data          vmlinux
  975.00  1.1% cache_grow              vmlinux
  936.00  1.1% ____cache_alloc_node    vmlinux
  925.00  1.1% udp_push_pending_frames vmlinux


All this raw_spin_lock overhead comes from SLAB.




Thread overview: 40+ messages
2010-10-11 23:03 [PATCH net-next] bnx2x: dont use netdev_alloc_skb() Eric Dumazet
2010-10-11 23:22 ` Eric Dumazet
2010-10-12  5:03   ` Tom Herbert
2010-10-12  5:16     ` Eric Dumazet
2010-10-12  9:12       ` Vladislav Zolotarov
2010-10-14 17:39         ` David Miller
2010-10-14 18:17           ` Eilon Greenstein
2010-10-14 18:20             ` Eric Dumazet
2010-10-14 18:25               ` David Miller
2010-10-14 18:17           ` Tom Herbert
2010-10-12  5:05   ` [PATCH net-next] net: allocate skbs on local node Eric Dumazet
2010-10-12  5:35     ` Tom Herbert
2010-10-12  6:03     ` Andrew Morton
2010-10-12  6:58       ` Eric Dumazet [this message]
2010-10-12  7:24         ` Andrew Morton
2010-10-12  7:49           ` Eric Dumazet
2010-10-12  7:58             ` Andrew Morton
2010-10-12 11:08               ` Pekka Enberg
2010-10-12 12:50                 ` Christoph Lameter
2010-10-12 19:43                   ` David Rientjes
2010-10-13  6:17                     ` Pekka Enberg
2010-10-13  6:31                       ` David Rientjes
2010-10-13  6:36                         ` Pekka Enberg
2010-10-13 16:00                     ` Christoph Lameter
2010-10-13 20:48                       ` David Rientjes
2010-10-13 21:43                         ` Christoph Lameter
2010-10-13 22:41                           ` David Rientjes
2010-10-14  6:22                             ` Pekka Enberg
2010-10-14  7:23                               ` David Rientjes
2010-10-15 14:23                             ` Christoph Lameter
2010-10-14 15:31       ` Tom Herbert
2010-10-14 16:05         ` Eric Dumazet
2010-10-15 16:57           ` Christoph Lameter
2010-10-14 19:27         ` Andrew Morton
2010-10-14 19:59           ` Eric Dumazet
2010-10-16 18:54     ` David Miller
2010-10-12 16:07 ` [BUG net-next] bnx2x: all traffic comes to RX queue 0 Eric Dumazet
2010-10-12 16:20   ` Dmitry Kravkov
2010-10-12 18:11     ` Eric Dumazet
2010-10-12 18:18       ` Vladislav Zolotarov
