From: Eric Dumazet <eric.dumazet@gmail.com>
To: David Miller <davem@davemloft.net>
Cc: netdev <netdev@vger.kernel.org>,
	Michael Chan <mchan@broadcom.com>,
	Eilon Greenstein <eilong@broadcom.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@lst.de>,
	Christoph Lameter <cl@linux-foundation.org>
Subject: [PATCH net-next] net: allocate skbs on local node
Date: Tue, 12 Oct 2010 07:05:25 +0200
Message-ID: <1286859925.30423.184.camel@edumazet-laptop>
In-Reply-To: <1286839363.30423.130.camel@edumazet-laptop>

On Tuesday, 12 October 2010 at 01:22 +0200, Eric Dumazet wrote:
> On Tuesday, 12 October 2010 at 01:03 +0200, Eric Dumazet wrote:
> >
> > For multi queue devices, it makes more sense to allocate skbs on the local
> > node of the cpu handling RX interrupts. This allows each cpu to
> > manipulate its own slub/slab queues/structures without doing expensive
> > cross-node business.
> >
> > For non multi queue devices, IRQ affinity should be set so that a cpu
> > close to the device services interrupts. Even if not set, using
> > dev_alloc_skb() is faster.
> >
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> Or maybe revert:
>
> commit b30973f877fea1a3fb84e05599890fcc082a88e5
> Author: Christoph Hellwig <hch@lst.de>
> Date: Wed Dec 6 20:32:36 2006 -0800
>
> [PATCH] node-aware skb allocation
>
> Node-aware allocation of skbs for the receive path.
>
> Details:
>
> - __alloc_skb gets a new node argument and calls the node-aware
> slab functions with it.
> - netdev_alloc_skb passes the node number it gets from dev_to_node
> to it, everyone else passes -1 (any node)
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Cc: Christoph Lameter <clameter@engr.sgi.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
>
>
> Apparently, only Christoph and Andrew signed it.
>
>
[PATCH net-next] net: allocate skbs on local node

Commit b30973f877 (node-aware skb allocation) spread the wrong habit of
allocating net driver skbs on one given memory node: the one closest to
the NIC hardware. This is wrong because as soon as we try to scale the
network stack, we need many cpus to handle traffic, and we hit
slub/slab management on cross-node allocations/frees when these cpus
have to alloc/free skbs bound to a central node.

skbs allocated in the RX path are ephemeral; they have a very short
lifetime, so the extra cost of maintaining NUMA affinity is too high. What
appeared to be a nice idea four years ago is in fact a bad one.

In 2010, NIC hardware is multiqueue, or we use RPS to spread the load,
and two 10Gb NICs might deliver more than 28 million packets per second,
needing all the available cpus.

The cost of cross-node handling in the network and vm stacks outweighs the
small benefit the hardware had when doing its DMA transfer to its 'local'
memory node at RX time. Even trying to differentiate the two allocations
done for one skb (the sk_buff on the local node, the data part on the NIC
hardware node) is not enough to bring good performance.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/skbuff.h |   20 ++++++++++++++++----
 net/core/skbuff.c      |   13 +------------
 2 files changed, 17 insertions(+), 16 deletions(-)
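
Note on the "more than 28 million packets per second" figure in the commit message: the message does not say how it is derived. One plausible reading, offered here only as an assumption, is line rate for minimum-size Ethernet frames: 64 bytes per frame plus 20 bytes of preamble, start delimiter and inter-frame gap on the wire. The helper below (`max_pps` and its constants are hypothetical names, not part of the patch) is a small sketch of that arithmetic.

```c
#include <stdint.h>

/*
 * Assumed Ethernet framing constants. These are NOT taken from the
 * patch; they are the usual worst-case (smallest frame) values.
 */
enum {
	MIN_FRAME_BYTES     = 64,	/* minimum frame, FCS included */
	WIRE_OVERHEAD_BYTES = 20	/* preamble + SFD (8) + inter-frame gap (12) */
};

/*
 * Upper bound on packets per second for one link running at link_bps
 * bits per second when every frame is frame_bytes long.
 * Integer division truncates, so the result is rounded down.
 */
static uint64_t max_pps(uint64_t link_bps, unsigned int frame_bytes)
{
	uint64_t bits_per_frame =
		(uint64_t)(frame_bytes + WIRE_OVERHEAD_BYTES) * 8;

	return link_bps / bits_per_frame;
}
```

Under these assumptions, `max_pps(10000000000ULL, MIN_FRAME_BYTES)` evaluates to 14880952 for one 10Gb port, so two ports come to 29761904 packets per second, which is consistent with the figure quoted above.
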
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0b53c43..05a358f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -496,13 +496,13 @@ extern struct sk_buff *__alloc_skb(unsigned int size,
 static inline struct sk_buff *alloc_skb(unsigned int size,
 					gfp_t priority)
 {
-	return __alloc_skb(size, priority, 0, -1);
+	return __alloc_skb(size, priority, 0, NUMA_NO_NODE);
 }
 
 static inline struct sk_buff *alloc_skb_fclone(unsigned int size,
 					       gfp_t priority)
 {
-	return __alloc_skb(size, priority, 1, -1);
+	return __alloc_skb(size, priority, 1, NUMA_NO_NODE);
 }
 
 extern bool skb_recycle_check(struct sk_buff *skb, int skb_size);
@@ -1563,13 +1563,25 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
 	return skb;
 }
 
-extern struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask);
+/**
+ *	__netdev_alloc_page - allocate a page for ps-rx on a specific device
+ *	@dev: network device to receive on
+ *	@gfp_mask: alloc_pages_node mask
+ *
+ *	Allocate a new page. dev currently unused.
+ *
+ *	%NULL is returned if there is no free memory.
+ */
+static inline struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
+{
+	return alloc_pages_node(NUMA_NO_NODE, gfp_mask, 0);
+}
 
 /**
  *	netdev_alloc_page - allocate a page for ps-rx on a specific device
  *	@dev: network device to receive on
  *
- *	Allocate a new page node local to the specified device.
+ *	Allocate a new page. dev currently unused.
  *
  *	%NULL is returned if there is no free memory.
  */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 752c197..4e8b82e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -247,10 +247,9 @@ EXPORT_SYMBOL(__alloc_skb);
 struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 		unsigned int length, gfp_t gfp_mask)
 {
-	int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
 	struct sk_buff *skb;
 
-	skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, node);
+	skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, NUMA_NO_NODE);
 	if (likely(skb)) {
 		skb_reserve(skb, NET_SKB_PAD);
 		skb->dev = dev;
@@ -259,16 +258,6 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 }
 EXPORT_SYMBOL(__netdev_alloc_skb);
 
-struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
-{
-	int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
-	struct page *page;
-
-	page = alloc_pages_node(node, gfp_mask, 0);
-	return page;
-}
-EXPORT_SYMBOL(__netdev_alloc_page);
-
 void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
 		int size)
 {
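
For readers who want the policy change in the diff above in isolation, here is a userspace sketch, not kernel code. It models only the node selection: before the patch the RX path asked for the node of the device's parent, afterwards it asks for no particular node. The names `model_dev`, `rx_node_before`, `rx_node_after` and `resolve_node` are hypothetical. `resolve_node` encodes the assumption, implied by the patch subject, that a request for NUMA_NO_NODE is served from the node of the cpu doing the allocation.

```c
#include <stddef.h>

#define NUMA_NO_NODE (-1)	/* "no preference", the -1 the diff replaces */

/* Hypothetical stand-in for a device and its parent (bus) device. */
struct model_dev {
	const struct model_dev *parent;
	int numa_node;		/* node the hardware is attached to */
};

/* Before the patch: prefer the node closest to the NIC, if known. */
static int rx_node_before(const struct model_dev *dev)
{
	return dev->parent ? dev->parent->numa_node : NUMA_NO_NODE;
}

/* After the patch: the device is ignored, no node is requested. */
static int rx_node_after(const struct model_dev *dev)
{
	(void)dev;
	return NUMA_NO_NODE;
}

/*
 * Assumption for this model: with no requested node, memory comes
 * from the node of the cpu that performs the allocation.
 */
static int resolve_node(int requested, int cpu_node)
{
	return requested == NUMA_NO_NODE ? cpu_node : requested;
}
```

In this model, a NIC attached to node 0 whose RX interrupt is handled by a cpu on node 1 gets its skb from node 0 before the patch (a cross-node allocation for that cpu) and from node 1 after it.
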