From: Eric Dumazet <eric.dumazet@gmail.com>
To: Ben Hutchings <bhutchings@solarflare.com>
Cc: Tom Herbert <therbert@google.com>,
David Miller <davem@davemloft.net>,
netdev@vger.kernel.org
Subject: Re: [PATCH net-next-2.6] sched: use xps information for qdisc NUMA affinity
Date: Tue, 30 Nov 2010 19:52:07 +0100 [thread overview]
Message-ID: <1291143127.2904.192.camel@edumazet-laptop> (raw)
In-Reply-To: <1291142762.21077.47.camel@bwh-desktop>
Le mardi 30 novembre 2010 à 18:46 +0000, Ben Hutchings a écrit :
> Yes, that's why I proposed an ethtool interface for reconfiguring this.
> Although to be honest I haven't yet constructed a case where it made a
> difference. I think the most important objects to be allocated on the
> right node are RX buffers, and as long as refill is scheduled on the
> same CPU as the IRQ this already happens.
>
Hmm, right now RX skbs are allocated on the right node, since they are
allocated on the node of the cpu handling the {soft}irq.
commit 564824b0c52c346 ("net: allocate skbs on local node")
commit b30973f877 (node-aware skb allocation) spread a wrong habit of
allocating net driver skbs on a given memory node: the one closest to
the NIC hardware. This is wrong because as soon as we try to scale the
network stack, we need many cpus to handle traffic, and we hit
slub/slab management costs on cross-node allocations/frees when these
cpus have to alloc/free skbs bound to a central node.
skbs allocated in the RX path are ephemeral, they have a very short
lifetime: the extra cost of maintaining NUMA affinity is too expensive.
What appeared to be a nice idea four years ago is in fact a bad one.
In 2010, NIC hardware is multiqueue, or we use RPS to spread the load,
and two 10Gb NICs might deliver more than 28 million packets per second,
needing all the available cpus.
The cost of cross-node handling in the network and vm stacks outweighs
the small benefit the hardware had when doing its DMA transfer into its
'local' memory node at RX time. Even trying to differentiate the two
allocations done for one skb (the sk_buff on the local node, the data
part on the NIC's node) is not enough to bring good performance.
Thread overview: 21+ messages
2010-11-21 23:17 [PATCH 2/2 v7] xps: Transmit Packet Steering Tom Herbert
2010-11-22 11:42 ` Changli Gao
2010-11-22 13:33 ` Eric Dumazet
2010-11-24 19:45 ` David Miller
2010-11-26 17:13 ` Tom Herbert
2010-11-26 17:17 ` Eric Dumazet
2010-11-28 15:43 ` [PATCH net-next-2.6] xps: NUMA allocations for per cpu data Eric Dumazet
2010-11-29 17:43 ` David Miller
2010-11-25 17:12 ` [PATCH 2/2 v7] xps: Transmit Packet Steering Ben Hutchings
2010-11-29 18:14 ` [PATCH net-next-2.6] sched: use xps information for qdisc NUMA affinity Eric Dumazet
2010-11-30 18:31 ` Tom Herbert
2010-11-30 18:39 ` Eric Dumazet
2010-11-30 18:46 ` Ben Hutchings
2010-11-30 18:52 ` Eric Dumazet [this message]
2010-11-30 18:48 ` David Miller
2010-11-30 19:07 ` Eric Dumazet
2010-11-30 19:19 ` Ben Hutchings
2010-11-30 19:21 ` David Miller
2010-11-30 20:01 ` Brandeburg, Jesse
2010-12-01 20:49 ` David Miller
2010-12-01 20:55 ` Eric Dumazet