Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] net: allocate skbs on local node
From: Tom Herbert @ 2010-10-14 15:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric Dumazet, David Miller, netdev, Michael Chan,
	Eilon Greenstein, Christoph Hellwig, Christoph Lameter
In-Reply-To: <20101011230322.f0f6dd47.akpm@linux-foundation.org>

> This is all conspicuously hand-wavy and unquantified.  (IOW: prove it!)
>
> The mooted effects should be tested for on both slab and slub, I
> suggest.  They're pretty different beasts.
> --

Some results running netper TCP_RR test with 200 instances, 1 byte
request and response on 16 core AMD using bnx2x with one 16 queues,
one for each CPU.

SLAB

Without patch 553570 tps at 86% CPU
With patch 791883 tps at 93% CPU

SLUB

Without patch 704879 tps at 95% CPU
With patch 775278 tps at 92% CPU

I believe both show good benfits with patch, and it actually looks
like the impact is more pronounced for SLAB.  I would also note, that
we have actually already internally patched __netdev_alloc_skb to do
local node allocation which we have been running in production for
quite some time.

Tom

^ permalink raw reply

* [PATCH net-next] netns: reorder fields in struct net
From: Eric Dumazet @ 2010-10-14 15:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

In a network bench, I noticed an unfortunate false sharing between
'loopback_dev' and 'count' fields in "struct net".

'count' is written each time a socket is created or destroyed, while
loopback_dev might be often read in routing code.

Move loopback_dev in a read mostly section of "struct net"

Note: struct netns_xfrm is cache line aligned on SMP.
(It contains a "struct dst_ops")
Move it at the end to avoid holes, and reduce sizeof(struct net) by 128
bytes on ia32.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/net_namespace.h |   17 ++++++++++-------
 include/net/netns/xfrm.h    |    9 +++++----
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index bd10a79..65af9a0 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -41,6 +41,8 @@ struct net {
 						 * destroy on demand
 						 */
 #endif
+	spinlock_t		rules_mod_lock;
+
 	struct list_head	list;		/* list of network namespaces */
 	struct list_head	cleanup_list;	/* namespaces on death row */
 	struct list_head	exit_list;	/* Use only net_mutex */
@@ -52,7 +54,8 @@ struct net {
 	struct ctl_table_set	sysctls;
 #endif
 
-	struct net_device       *loopback_dev;          /* The loopback */
+	struct sock 		*rtnl;			/* rtnetlink socket */
+	struct sock		*genl_sock;
 
 	struct list_head 	dev_base_head;
 	struct hlist_head 	*dev_name_head;
@@ -60,11 +63,9 @@ struct net {
 
 	/* core fib_rules */
 	struct list_head	rules_ops;
-	spinlock_t		rules_mod_lock;
 
-	struct sock 		*rtnl;			/* rtnetlink socket */
-	struct sock		*genl_sock;
 
+	struct net_device       *loopback_dev;          /* The loopback */
 	struct netns_core	core;
 	struct netns_mib	mib;
 	struct netns_packet	packet;
@@ -84,13 +85,15 @@ struct net {
 	struct sock		*nfnl;
 	struct sock		*nfnl_stash;
 #endif
-#ifdef CONFIG_XFRM
-	struct netns_xfrm	xfrm;
-#endif
 #ifdef CONFIG_WEXT_CORE
 	struct sk_buff_head	wext_nlevents;
 #endif
 	struct net_generic	*gen;
+
+	/* Note : following structs are cache line aligned */
+#ifdef CONFIG_XFRM
+	struct netns_xfrm	xfrm;
+#endif
 };
 
 
diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index 74f119a..748f91f 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -43,10 +43,6 @@ struct netns_xfrm {
 	unsigned int		policy_count[XFRM_POLICY_MAX * 2];
 	struct work_struct	policy_hash_work;
 
-	struct dst_ops		xfrm4_dst_ops;
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
-	struct dst_ops		xfrm6_dst_ops;
-#endif
 
 	struct sock		*nlsk;
 	struct sock		*nlsk_stash;
@@ -58,6 +54,11 @@ struct netns_xfrm {
 #ifdef CONFIG_SYSCTL
 	struct ctl_table_header	*sysctl_hdr;
 #endif
+
+	struct dst_ops		xfrm4_dst_ops;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	struct dst_ops		xfrm6_dst_ops;
+#endif
 };
 
 #endif



^ permalink raw reply related

* Re: [PATCH net-next] net: allocate skbs on local node
From: Eric Dumazet @ 2010-10-14 16:05 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Andrew Morton, David Miller, netdev, Michael Chan,
	Eilon Greenstein, Christoph Hellwig, Christoph Lameter
In-Reply-To: <AANLkTin8FczYpPvi79w7iuKOZz3uZtYFfTPitAgQFxzD@mail.gmail.com>

Le jeudi 14 octobre 2010 à 08:31 -0700, Tom Herbert a écrit :
> > This is all conspicuously hand-wavy and unquantified.  (IOW: prove it!)
> >
> > The mooted effects should be tested for on both slab and slub, I
> > suggest.  They're pretty different beasts.
> > --
> 
> Some results running netper TCP_RR test with 200 instances, 1 byte
> request and response on 16 core AMD using bnx2x with one 16 queues,
> one for each CPU.
> 
> SLAB
> 
> Without patch 553570 tps at 86% CPU
> With patch 791883 tps at 93% CPU
> 
> SLUB
> 
> Without patch 704879 tps at 95% CPU
> With patch 775278 tps at 92% CPU
> 
> I believe both show good benfits with patch, and it actually looks
> like the impact is more pronounced for SLAB.  I would also note, that
> we have actually already internally patched __netdev_alloc_skb to do
> local node allocation which we have been running in production for
> quite some time.

Excellent ! I was not sure I could do this before NFWS...

Thanks Tom !

Small note : last user of 'allocate skb on a given node, not local one'
is pktgen.

[PATCH net-next] net: allocate skbs on local node

commit b30973f877 (node-aware skb allocation) spread a wrong habit of
allocating net drivers skbs on a given memory node : The one closest to
the NIC hardware. This is wrong because as soon as we try to scale
network stack, we need to use many cpus to handle traffic and hit
slub/slab management on cross-node allocations/frees when these cpus
have to alloc/free skbs bound to a central node.

skb allocated in RX path are ephemeral, they have a very short
lifetime : Extra cost to maintain NUMA affinity is too expensive. What
appeared as a nice idea four years ago is in fact a bad one.

In 2010, NIC hardwares are multiqueue, or we use RPS to spread the load,
and two 10Gb NIC might deliver more than 28 million packets per second,
needing all the available cpus.

Cost of cross-node handling in network and vm stacks outperforms the
small benefit hardware had when doing its DMA transfert in its 'local'
memory node at RX time. Even trying to differentiate the two allocations
done for one skb (the sk_buff on local node, the data part on NIC
hardware node) is not enough to bring good performance.

Some numbers, courtesy from Tom Herbert :

Some results running netper TCP_RR test with 200 instances, 1 byte
request and response on 16 core AMD using bnx2x with one 16 queues,
one for each CPU.

SLAB

Without patch 553570 tps at 86% CPU
With patch 791883 tps at 93% CPU

SLUB

Without patch 704879 tps at 95% CPU
With patch 775278 tps at 92% CPU

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tom Herbert <therbert@google.com>
---
 include/linux/skbuff.h |   20 ++++++++++++++++----
 net/core/skbuff.c      |   13 +------------
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0b53c43..05a358f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -496,13 +496,13 @@ extern struct sk_buff *__alloc_skb(unsigned int size,
 static inline struct sk_buff *alloc_skb(unsigned int size,
 					gfp_t priority)
 {
-	return __alloc_skb(size, priority, 0, -1);
+	return __alloc_skb(size, priority, 0, NUMA_NO_NODE);
 }
 
 static inline struct sk_buff *alloc_skb_fclone(unsigned int size,
 					       gfp_t priority)
 {
-	return __alloc_skb(size, priority, 1, -1);
+	return __alloc_skb(size, priority, 1, NUMA_NO_NODE);
 }
 
 extern bool skb_recycle_check(struct sk_buff *skb, int skb_size);
@@ -1563,13 +1563,25 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
 	return skb;
 }
 
-extern struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask);
+/**
+ *	__netdev_alloc_page - allocate a page for ps-rx on a specific device
+ *	@dev: network device to receive on
+ *	@gfp_mask: alloc_pages_node mask
+ *
+ * 	Allocate a new page. dev currently unused.
+ *
+ * 	%NULL is returned if there is no free memory.
+ */
+static inline struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
+{
+	return alloc_pages_node(NUMA_NO_NODE, gfp_mask, 0);
+}
 
 /**
  *	netdev_alloc_page - allocate a page for ps-rx on a specific device
  *	@dev: network device to receive on
  *
- * 	Allocate a new page node local to the specified device.
+ * 	Allocate a new page. dev currently unused.
  *
  * 	%NULL is returned if there is no free memory.
  */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 752c197..4e8b82e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -247,10 +247,9 @@ EXPORT_SYMBOL(__alloc_skb);
 struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 		unsigned int length, gfp_t gfp_mask)
 {
-	int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
 	struct sk_buff *skb;
 
-	skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, node);
+	skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, NUMA_NO_NODE);
 	if (likely(skb)) {
 		skb_reserve(skb, NET_SKB_PAD);
 		skb->dev = dev;
@@ -259,16 +258,6 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 }
 EXPORT_SYMBOL(__netdev_alloc_skb);
 
-struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
-{
-	int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
-	struct page *page;
-
-	page = alloc_pages_node(node, gfp_mask, 0);
-	return page;
-}
-EXPORT_SYMBOL(__netdev_alloc_page);
-
 void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
 		int size)
 {



^ permalink raw reply related

* [PATCH net-next] fib_hash: embed initial hash table in fn_zone
From: Eric Dumazet @ 2010-10-14 16:24 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

While looking for false sharing problems, I noticed 
sizeof(struct fn_zone) was small (28 bytes) and possibly sharing a cache
line with an often written kernel structure.

Most of the time, fn_zone uses its initial hash table of 16 slots.

We can avoid the false sharing problem by embedding this initial hash
table in fn_zone itself, so that sizeof(fn_zone) > L1_CACHE_BYTES

We did a similar optimization in commit a6501e080c (Reduce memory needs
and speedup lookups)

Add a fz_revorder field to speedup fn_hash() a bit.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/fib_hash.c |   50 ++++++++++++++++++------------------------
 1 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
index 83cca68..0e14dbd 100644
--- a/net/ipv4/fib_hash.c
+++ b/net/ipv4/fib_hash.c
@@ -54,23 +54,23 @@ struct fib_node {
 	struct fib_alias        fn_embedded_alias;
 };
 
+#define EMBEDDED_HASH_SIZE (L1_CACHE_BYTES / sizeof(struct hlist_head))
+
 struct fn_zone {
 	struct fn_zone		*fz_next;	/* Next not empty zone	*/
 	struct hlist_head	*fz_hash;	/* Hash table pointer	*/
-	int			fz_nent;	/* Number of entries	*/
-
-	int			fz_divisor;	/* Hash divisor		*/
 	u32			fz_hashmask;	/* (fz_divisor - 1)	*/
-#define FZ_HASHMASK(fz)		((fz)->fz_hashmask)
 
-	int			fz_order;	/* Zone order		*/
-	__be32			fz_mask;
+	u8			fz_order;	/* Zone order (0..32)	*/
+	u8			fz_revorder;	/* 32 - fz_order	*/
+	__be32			fz_mask;	/* inet_make_mask(order) */
 #define FZ_MASK(fz)		((fz)->fz_mask)
-};
 
-/* NOTE. On fast computers evaluation of fz_hashmask and fz_mask
- * can be cheaper than memory lookup, so that FZ_* macros are used.
- */
+	struct hlist_head	fz_embedded_hash[EMBEDDED_HASH_SIZE];
+
+	int			fz_nent;	/* Number of entries	*/
+	int			fz_divisor;	/* Hash size (mask+1)	*/
+};
 
 struct fn_hash {
 	struct fn_zone	*fn_zones[33];
@@ -79,11 +79,11 @@ struct fn_hash {
 
 static inline u32 fn_hash(__be32 key, struct fn_zone *fz)
 {
-	u32 h = ntohl(key)>>(32 - fz->fz_order);
+	u32 h = ntohl(key) >> fz->fz_revorder;
 	h ^= (h>>20);
 	h ^= (h>>10);
 	h ^= (h>>5);
-	h &= FZ_HASHMASK(fz);
+	h &= fz->fz_hashmask;
 	return h;
 }
 
@@ -150,11 +150,11 @@ static void fn_rehash_zone(struct fn_zone *fz)
 	old_divisor = fz->fz_divisor;
 
 	switch (old_divisor) {
-	case 16:
-		new_divisor = 256;
+	case EMBEDDED_HASH_SIZE:
+		new_divisor *= EMBEDDED_HASH_SIZE;
 		break;
-	case 256:
-		new_divisor = 1024;
+	case EMBEDDED_HASH_SIZE*EMBEDDED_HASH_SIZE:
+		new_divisor *= (EMBEDDED_HASH_SIZE/2);
 		break;
 	default:
 		if ((old_divisor << 1) > FZ_MAX_DIVISOR) {
@@ -184,7 +184,8 @@ static void fn_rehash_zone(struct fn_zone *fz)
 		fib_hash_genid++;
 		write_unlock_bh(&fib_hash_lock);
 
-		fz_hash_free(old_ht, old_divisor);
+		if (old_ht != fz->fz_embedded_hash)
+			fz_hash_free(old_ht, old_divisor);
 	}
 }
 
@@ -210,18 +211,11 @@ fn_new_zone(struct fn_hash *table, int z)
 	if (!fz)
 		return NULL;
 
-	if (z) {
-		fz->fz_divisor = 16;
-	} else {
-		fz->fz_divisor = 1;
-	}
-	fz->fz_hashmask = (fz->fz_divisor - 1);
-	fz->fz_hash = fz_hash_alloc(fz->fz_divisor);
-	if (!fz->fz_hash) {
-		kfree(fz);
-		return NULL;
-	}
+	fz->fz_divisor = z ? EMBEDDED_HASH_SIZE : 1;
+	fz->fz_hashmask = fz->fz_divisor - 1;
+	fz->fz_hash = fz->fz_embedded_hash;
 	fz->fz_order = z;
+	fz->fz_revorder = 32 - z;
 	fz->fz_mask = inet_make_mask(z);
 
 	/* Find the first not empty zone with more specific mask */



^ permalink raw reply related

* Re: [PATCH 2.6.35-rc6] net-next: Add multiqueue support to vmxnet3 driver
From: Ben Hutchings @ 2010-10-14 16:31 UTC (permalink / raw)
  To: Stephen Hemminger, Shreyas Bhatewara; +Cc: netdev, pv-drivers, linux-kernel
In-Reply-To: <20101013145732.7d69d0f3@nehalam>

On Wed, 2010-10-13 at 14:57 -0700, Stephen Hemminger wrote:
> On Wed, 13 Oct 2010 14:47:05 -0700 (PDT)
> Shreyas Bhatewara <sbhatewara@vmware.com> wrote:
> 
> > #ifdef VMXNET3_RSS
> > +static unsigned int num_rss_entries;
> > +#define VMXNET3_MAX_DEVICES 10
> > +
> > +static int rss_ind_table[VMXNET3_MAX_DEVICES *
> > +			 VMXNET3_RSS_IND_TABLE_SIZE + 1] = {
> > +	[0 ... VMXNET3_MAX_DEVICES * VMXNET3_RSS_IND_TABLE_SIZE] = -1 };
> > +#endif
> > +static int num_tqs[VMXNET3_MAX_DEVICES + 1] = {
> > +	[0 ... VMXNET3_MAX_DEVICES] = 1 };
> > +static int num_rqs[VMXNET3_MAX_DEVICES + 1] = {
> > +	[0 ... VMXNET3_MAX_DEVICES] = 1 };
> > +static int share_tx_intr[VMXNET3_MAX_DEVICES + 1] = {
> > +	[0 ... VMXNET3_MAX_DEVICES] = 0 };
> > +static int buddy_intr[VMXNET3_MAX_DEVICES + 1] = {
> > +	[0 ... VMXNET3_MAX_DEVICES] = 1 };
> > +
> > +static unsigned int num_adapters;
> > +module_param_array(share_tx_intr, int, &num_adapters, 0400);
> > +MODULE_PARM_DESC(share_tx_intr, "Share one IRQ among all tx queue completions. "
> > +		 "Comma separated list of 1s and 0s - one for each NIC. "
> > +		 "1 to share, 0 to not, default is 0");
> > +module_param_array(buddy_intr, int, &num_adapters, 0400);
> > +MODULE_PARM_DESC(buddy_intr, "Share one IRQ among corresponding tx and rx "
> > +		 "queues. Comma separated list of 1s and 0s - one for each "
> > +		 "NIC. 1 to share, 0 to not, default is 1");
> > +module_param_array(num_tqs, int, &num_adapters, 0400);
> > +MODULE_PARM_DESC(num_tqs, "Number of transmit queues in each adapter. Comma "
> > +		 "separated list of integers. Setting this to 0 makes number"
> > +		 " of queues same as number of CPUs. Default is 1.");
> > +
> > +#ifdef VMXNET3_RSS
> > +module_param_array(rss_ind_table, int, &num_rss_entries, 0400);
> > +MODULE_PARM_DESC(rss_ind_table, "RSS Indirection table. Number of entries "
> > +		 "per NIC should be 32. Each integer in a comma separated list"
> > +		 " is an rx queue number starting with 0. Repeat the same for"
> > +		 " all NICs.");
> > +module_param_array(num_rqs, int, &num_adapters, 0400);
> > +MODULE_PARM_DESC(num_rqs, "Number of receive queues in each adapter. Comma "
> > +		 " separated list of integers. Setting this to 0 makes number"
> > +		 " of queues same as number of CPUs. Default is 1.");
> 
> Module parameters are not right for this. They lead to different API
> for interacting with each driver vendor. Is there a another better API?
> Does it have to be this tweakable in a production environment.

The ethtool commands ETHTOOL_{G,S}RXFHINDIR cover the RSS indirection
table.  These are new in 2.6.36 but already supported in the ethtool
utility.

As for numbers of queues and association of their completions with
interrupts, we currently have nothing except ETHTOOL_GRXRINGS to get the
number of RX queues.  I did post a tentative definition of an ethtool
interface for this in
<http://article.gmane.org/gmane.linux.network/172386> though it wouldn't
provide quite as much control as these module parameters.  It is also
significantly more difficult to support changing numbers of queues after
an interface has been created, and I have not yet attempted to implement
the 'set' command myself.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2010-10-14 17:14 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


One last push before heading off to Spain for the netfilter workshop:

1) r8169 does 16K allocations using GFP_ATOMIC during resume,
   which makes resume fail frequently.  Use dma_*() interfaces
   and GFP_KERNEL where possible.

   From Stanislaw Gruszka.

2) b44 and fec driver mis-manage initial carrier state, breaking NFS
   root mounts and similar, from Paul Fertser and Oskar Schirmer.

3) Kernel heap leak to userspace in privileged ethtool ops, fix from
   Kees Cook.

4) Several use-after free fixes in ATM and wimax from Jiri Slaby.

5) Fixed PHY support regressed upon conversion to phylib in FEC
   driver, fix from Greg Ungerer.

6) ehea sets CHECKSUM_UNNECESSARY erroneously on non-TCP/UDP packets,
   fix from Breno Leitao.

7) TG3 rx_dropped accounting regressed unintentionally, fix from Eric
   Dumazet.

Please pull, thanks a lot!

The following changes since commit 53eeb64e808971207350386121f4bab12fa2f45f:
  Linus Torvalds (1):
        Merge branch 'fixes' of git://git.kernel.org/.../djbw/async_tx

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Breno Leitao (1):
      ehea: Fix a checksum issue on the receive path

Eric Dumazet (1):
      tg3: restore rx_dropped accounting

Greg Ungerer (1):
      net: allow FEC driver to use fixed PHY support

Jiri Slaby (4):
      ATM: solos-pci, remove use after free
      ATM: mpc, fix use after free
      ATM: iphase, remove sleep-inside-atomic
      NET: wimax, fix use after free

Kees Cook (1):
      net: clear heap allocations for privileged ethtool actions

Oskar Schirmer (1):
      net/fec: carrier off initially to avoid root mount failure

Paul Fertser (1):
      b44: fix carrier detection on bind

Stanislaw Gruszka (2):
      r8169: allocate with GFP_KERNEL flag when able to sleep
      r8169: use device model DMA API

 drivers/atm/iphase.c          |    6 ----
 drivers/atm/iphase.h          |    2 +-
 drivers/atm/solos-pci.c       |    8 +++--
 drivers/net/b44.c             |    4 +-
 drivers/net/ehea/ehea_main.c  |    9 +++++-
 drivers/net/ehea/ehea_qmr.h   |    1 +
 drivers/net/fec.c             |   44 +++++++++++++++++++---------
 drivers/net/r8169.c           |   65 ++++++++++++++++++++++-------------------
 drivers/net/tg3.c             |    6 ++-
 drivers/net/tg3.h             |    2 +-
 drivers/net/wimax/i2400m/rx.c |   26 ++++++++--------
 net/atm/mpc.c                 |    2 +-
 net/core/ethtool.c            |    6 ++--
 13 files changed, 104 insertions(+), 77 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next] bnx2x: dont use netdev_alloc_skb()
From: David Miller @ 2010-10-14 17:39 UTC (permalink / raw)
  To: vladz; +Cc: eric.dumazet, therbert, netdev, mchan, eilong
In-Reply-To: <8628FE4E7912BF47A96AE7DD7BAC0AADDDEE4282DD@SJEXCHCCR02.corp.ad.broadcom.com>

From: "Vladislav Zolotarov" <vladz@broadcom.com>
Date: Tue, 12 Oct 2010 02:12:41 -0700

> Dave, if it can wait till tomorrow, I'd like to ask u to wait for Eilon's
> final ACK.

Ok, but it is two days later now :-)

^ permalink raw reply

* Re: [PATCH 0/8 net-next] cnic: Bug fixes and 57712 support
From: David Miller @ 2010-10-14 17:46 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1287014811-10979-1-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Wed, 13 Oct 2010 17:06:43 -0700

> 
> David, please consider applying these patches to net-next.

Looks good, applied, thanks.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 1/2] ixgbe: remove unused functions
From: David Miller @ 2010-10-14 17:49 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, shemminger, emil.s.tantilov
In-Reply-To: <20101013081759.24491.46846.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 13 Oct 2010 01:20:34 -0700

> From: Emil Tantilov <emil.s.tantilov@intel.com>
> 
> Remove functions that are declared, but not used in the driver.
> This patch fixes warnings reported by `make namespacecheck`
> 
> Reported by Stephen Hemminger
> 
> CC: Stephen Hemminger <shemminger@vyatta.com>
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Tested-by: Stephen Ko <stephen.s.ko@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 2/2] ixgbe: declare functions as static
From: David Miller @ 2010-10-14 17:50 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, shemminger, emil.s.tantilov
In-Reply-To: <20101013082056.24491.93910.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 13 Oct 2010 01:20:59 -0700

> From: Emil Tantilov <emil.s.tantilov@intel.com>
> 
> Following patch fixes warnings reported by `make namespacecheck`
> 
> Reported by Stephen Hemminger
> 
> CC: Stephen Hemminger <shemminger@vyatta.com>
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Tested-by: Stephen Ko <stephen.s.ko@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] igb: add check for fiber/serdes devices to igb_set_spd_dplx;
From: David Miller @ 2010-10-14 17:50 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, carolyn.wyborny
In-Reply-To: <20101013082611.24740.80297.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 13 Oct 2010 01:27:02 -0700

> From: Carolyn Wyborny <carolyn.wyborny@intel.com>
> 
> Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
> Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] tipc: cleanup function namespace
From: Paul Gortmaker @ 2010-10-14 17:53 UTC (permalink / raw)
  To: Neil Horman
  Cc: Stephen Hemminger, David Miller, netdev, allan.stephens,
	Jon Maloy
In-Reply-To: <20101014012951.GA2186@localhost.localdomain>

On 10-10-13 09:29 PM, Neil Horman wrote:
> On Wed, Oct 13, 2010 at 08:23:24PM -0400, Paul Gortmaker wrote:
>> On 10-10-13 07:20 PM, Stephen Hemminger wrote:
>>> Do some cleanups of TIPC based on make namespacecheck
>>>     1. Don't export unused symbols
>>>     2. Eliminate dead code
>>>     3. Make functions and variables local
>>>     4. Rename buf_acquire to tipc_buf_acquire since it is used in several files
>>>
>>> Compile tested only.
>>> This make break out of tree kernel modules that depend on TIPC routines.
>>
>> Hi Stephen,
>>
>> When I first started looking at TIPC code, I too came to the
>> same conclusion as you did and was about to do #1,2,3 -- but
>> then I was told that the exported symbols were part of an API
>> and might be in use by folks here and there as per this thread:
>>
>> http://www.mail-archive.com/netdev@vger.kernel.org/msg30208.html
>>
> I think its telling the the argument in the above thread for keeping the API
> were that users of it were out there and 'likely to contribute' in the future.
> That thread was 3 years ago.  They might be using the API from outside the
> kernel tree, but they're not planning on contributing.  As Christoph noted,
> they're freeloaders.  The community really doesn't need or want to maintain an
> API like that.  If these users are your customers, and removing the API is
> unacceptable, perhaps its time to move the entire TIPC module out of tree.

As I'd said -- I don't know what the use cases of these API users are,
and so as far as I know they aren't customers either.  For what it is
worth, know that I personally wouldn't try and use a business case to
justify a technically wrong decision here on netdev anyway.

I was just describing the history of the situation, and suggesting
one possible slower approach of phasing it out as a courtesy to those
users, in the same way that the kernel community has extended that
same courtesy with other things in feature-removal.txt

In the end, since Jon is OK with the removal, and is in the process of
communicating this to the API users he is aware of, I sure don't have
any reason to try and save the API.  If folks are good with having it
just go away overnight, then great -- I'll be just as happy to see it
disappear as you and Stephen.  So, a long winded way of saying...

Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Paul.

> 
> Neil
> 


^ permalink raw reply

* Re: [PATCH net-next] bnx2x: dont use netdev_alloc_skb()
From: Eilon Greenstein @ 2010-10-14 18:17 UTC (permalink / raw)
  To: David Miller, eric.dumazet@gmail.com
  Cc: Vladislav Zolotarov, therbert@google.com, netdev@vger.kernel.org,
	Michael Chan
In-Reply-To: <20101014.103914.193713537.davem@davemloft.net>

On Thu, 2010-10-14 at 10:39 -0700, David Miller wrote:
> From: "Vladislav Zolotarov" <vladz@broadcom.com>
> Date: Tue, 12 Oct 2010 02:12:41 -0700
> 
> > Dave, if it can wait till tomorrow, I'd like to ask u to wait for Eilon's
> > final ACK.
> 
> Ok, but it is two days later now :-)
> 

I was under the impression that the current path is to revert
b30973f877fea1a3fb84e05599890fcc082a88e5 instead of the original thread.
Eric - do you still want to apply the original patch from this thread?

Thanks,
Eilon



^ permalink raw reply

* Re: [PATCH net-next] bnx2x: dont use netdev_alloc_skb()
From: Tom Herbert @ 2010-10-14 18:17 UTC (permalink / raw)
  To: David Miller; +Cc: vladz, eric.dumazet, netdev, mchan, eilong
In-Reply-To: <20101014.103914.193713537.davem@davemloft.net>

On Thu, Oct 14, 2010 at 10:39 AM, David Miller <davem@davemloft.net> wrote:
> From: "Vladislav Zolotarov" <vladz@broadcom.com>
> Date: Tue, 12 Oct 2010 02:12:41 -0700
>
>> Dave, if it can wait till tomorrow, I'd like to ask u to wait for Eilon's
>> final ACK.
>
> Ok, but it is two days later now :-)

If the netdev_alloc_skb is reverted to do local node allocation as
Eric suggested, then this patch against bnx2x (and presumably similar
changes to other drivers) is no longer needed.

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [PATCH net-next] bnx2x: dont use netdev_alloc_skb()
From: Eric Dumazet @ 2010-10-14 18:20 UTC (permalink / raw)
  To: eilong
  Cc: David Miller, Vladislav Zolotarov, therbert@google.com,
	netdev@vger.kernel.org, Michael Chan
In-Reply-To: <1287080255.22035.2.camel@lb-tlvb-eilong.il.broadcom.com>

Le jeudi 14 octobre 2010 à 20:17 +0200, Eilon Greenstein a écrit :
> On Thu, 2010-10-14 at 10:39 -0700, David Miller wrote:
> > From: "Vladislav Zolotarov" <vladz@broadcom.com>
> > Date: Tue, 12 Oct 2010 02:12:41 -0700
> > 
> > > Dave, if it can wait till tomorrow, I'd like to ask u to wait for Eilon's
> > > final ACK.
> > 
> > Ok, but it is two days later now :-)
> > 
> 
> I was under the impression that the current path is to revert
> b30973f877fea1a3fb84e05599890fcc082a88e5 instead of the original thread.
> Eric - do you still want to apply the original patch from this thread?

No, this patch is not needed, if the generic one is considered.

We might change copybreak stuff, but not all skb allocations in bnx2x

Thanks



^ permalink raw reply

* Re: [PATCH net-next] bnx2x: dont use netdev_alloc_skb()
From: David Miller @ 2010-10-14 18:25 UTC (permalink / raw)
  To: eric.dumazet; +Cc: eilong, vladz, therbert, netdev, mchan
In-Reply-To: <1287080420.2712.102.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 14 Oct 2010 20:20:20 +0200

> Le jeudi 14 octobre 2010 à 20:17 +0200, Eilon Greenstein a écrit :
>> On Thu, 2010-10-14 at 10:39 -0700, David Miller wrote:
>> > From: "Vladislav Zolotarov" <vladz@broadcom.com>
>> > Date: Tue, 12 Oct 2010 02:12:41 -0700
>> > 
>> > > Dave, if it can wait till tomorrow, I'd like to ask u to wait for Eilon's
>> > > final ACK.
>> > 
>> > Ok, but it is two days later now :-)
>> > 
>> 
>> I was under the impression that the current path is to revert
>> b30973f877fea1a3fb84e05599890fcc082a88e5 instead of the original thread.
>> Eric - do you still want to apply the original patch from this thread?
> 
> No, this patch is not needed, if the generic one is considered.

Ok.

^ permalink raw reply

* Re: [PATCH net-next] tipc: cleanup function namespace
From: Stephen Hemminger @ 2010-10-14 18:33 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: Neil Horman, David Miller, netdev, allan.stephens, Jon Maloy
In-Reply-To: <4CB74391.9020400@windriver.com>

On Thu, 14 Oct 2010 13:53:21 -0400
Paul Gortmaker <paul.gortmaker@windriver.com> wrote:

> On 10-10-13 09:29 PM, Neil Horman wrote:
> > On Wed, Oct 13, 2010 at 08:23:24PM -0400, Paul Gortmaker wrote:
> >> On 10-10-13 07:20 PM, Stephen Hemminger wrote:
> >>> Do some cleanups of TIPC based on make namespacecheck
> >>>     1. Don't export unused symbols
> >>>     2. Eliminate dead code
> >>>     3. Make functions and variables local
> >>>     4. Rename buf_acquire to tipc_buf_acquire since it is used in several files
> >>>
> >>> Compile tested only.
> >>> This make break out of tree kernel modules that depend on TIPC routines.
> >>
> >> Hi Stephen,
> >>
> >> When I first started looking at TIPC code, I too came to the
> >> same conclusion as you did and was about to do #1,2,3 -- but
> >> then I was told that the exported symbols were part of an API
> >> and might be in use by folks here and there as per this thread:
> >>
> >> http://www.mail-archive.com/netdev@vger.kernel.org/msg30208.html
> >>
> > I think its telling the the argument in the above thread for keeping the API
> > were that users of it were out there and 'likely to contribute' in the future.
> > That thread was 3 years ago.  They might be using the API from outside the
> > kernel tree, but they're not planning on contributing.  As Christoph noted,
> > they're freeloaders.  The community really doesn't need or want to maintain an
> > API like that.  If these users are your customers, and removing the API is
> > unacceptable, perhaps its time to move the entire TIPC module out of tree.
> 
> As I'd said -- I don't know what the use cases of these API users are,
> and so as far as I know they aren't customers either.  For what it is
> worth, know that I personally wouldn't try and use a business case to
> justify a technically wrong decision here on netdev anyway.
> 
> I was just describing the history of the situation, and suggesting
> one possible slower approach of phasing it out as a courtesy to those
> users, in the same way that the kernel community has extended that
> same courtesy with other things in feature-removal.txt
> 
> In the end, since Jon is OK with the removal, and is in the process of
> communicating this to the API users he is aware of, I sure don't have
> any reason to try and save the API.  If folks are good with having it
> just go away overnight, then great -- I'll be just as happy to see it
> disappear as you and Stephen.  So, a long winded way of saying...
> 
> Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>

How about putting an entry in feature-removal.txt with a short (6 month)
window?


-- 

^ permalink raw reply

* Re: BUG ? ipip unregister_netdevice_many()
From: Eric W. Biederman @ 2010-10-14 18:35 UTC (permalink / raw)
  To: David Miller; +Cc: hans.schillstrom, daniel.lezcano, netdev, Octavian Purdila
In-Reply-To: <20101014.080907.189690627.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Wed, 13 Oct 2010 22:20:28 -0700
>
>> With the network namespace support we limit the scope of the test of
>> the invalidate to just a single network namespace, and as such
>> rt_is_expired stops being true for every cache entry.  So we cannot
>> unconditionally throw away entire chains.
>> 
>> All of which can be either done by network namespace equality or by
>> rt_is_expired().  Although Denis picked rt_is_expired() when he made
>> his change.
>
> Right, and I choose to use namespace equality which will completely
> compile into no code at all when namespace support is not in the
> kernel.
>
> Therefore, making the non-namespace case equivalent and as efficient
> as it always was.

Almost you still have the hash list inversion, which means you have
to at look at the rtable entry even on a one list long hash chain.
Perhaps I am looking at it wrong but once you look at the entries
I don't see the difference in the number of cache line faults
between one variant of the code and the other.

>> The only place it makes a noticable difference in practice is what
>> happens when we do batched deleletes of lots of network devices in
>> different network namespaces.
>> 
>> During batched network device deletes in fib_netdev_event we do
>> rt_cache_flush(dev_net(dev), -1) for each network device.  and then a
>> final rt_cache_flush_batch() to remove the invalidated entries.  These
>> devices can be from multiple network namespaces, so I suspect that is
>> a savings worth having.
>
> How can it make a real difference even in this case?  We'll obliterate
> all the entries, and then on subsequent passes we'll find nothing
> matching that namespace any more.
>
> Show me performance tests that show it makes any difference, please.

Octavian did you happen to measure the performance difference when you
added batching of routing table flushes?

>> So if we are going to change the tests we need to do something with
>> rt_cache_flush_batch().  Further I do not see what is confusing about
>> a test that asks if the routing cache entry is unusable.  Is
>> rt_cache_expired() a bad name?
>
> It's not a bad name, it's just an unnecessary test that we don't need
> to even make in this specific place.

As long as we do something that is correct in the batched flush case
I am happy either way.

Eric

^ permalink raw reply

* Re: BUG ? ipip unregister_netdevice_many()
From: Octavian Purdila @ 2010-10-14 19:21 UTC (permalink / raw)
  To: Eric W. Biederman, David Miller; +Cc: hans.schillstrom, daniel.lezcano, netdev


> > How can it make a real difference even in this case? We'll obliterate
> > all the entries, and then on subsequent passes we'll find nothing
> > matching that namespace any more.
> > 
> > Show me performance tests that show it makes any difference, please.
> 
> Octavian did you happen to measure the performance difference when you
> added batching of routing table flushes?
> 

Unfortunatelly I dont't have the numbers anymore, but I remember it was noticeable when using a large number of interfaces (10K) - if I remember correctly around 1 second out of 10 for the whole unregister process.

BTW, another bottleneck for mass unregister while interfaces are up is dev_deactivate / dev_close. I experimented with "batchifying" it and for 32K interfaces the time went down from 5mins to 30s.

The patch I have is not pretty, it basically creates another 2 functions for each of dev_close and dev_deactivate for pre and post synchronise_rcu barrier.

^ permalink raw reply

* Re: [PATCH net-next] net: allocate skbs on local node
From: Andrew Morton @ 2010-10-14 19:27 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Eric Dumazet, David Miller, netdev, Michael Chan,
	Eilon Greenstein, Christoph Hellwig, Christoph Lameter
In-Reply-To: <AANLkTin8FczYpPvi79w7iuKOZz3uZtYFfTPitAgQFxzD@mail.gmail.com>

On Thu, 14 Oct 2010 08:31:01 -0700
Tom Herbert <therbert@google.com> wrote:

> > This is all conspicuously hand-wavy and unquantified. __(IOW: prove it!)
> >
> > The mooted effects should be tested for on both slab and slub, I
> > suggest. __They're pretty different beasts.
> > --
> 
> Some results running netper TCP_RR test with 200 instances, 1 byte
> request and response on 16 core AMD using bnx2x with one 16 queues,
> one for each CPU.
> 
> SLAB
> 
> Without patch 553570 tps at 86% CPU
> With patch 791883 tps at 93% CPU
> 
> SLUB
> 
> Without patch 704879 tps at 95% CPU
> With patch 775278 tps at 92% CPU
> 
> I believe both show good benfits with patch, and it actually looks
> like the impact is more pronounced for SLAB.  I would also note, that
> we have actually already internally patched __netdev_alloc_skb to do
> local node allocation which we have been running in production for
> quite some time.
> 

Yes, that's a solid gain.

Can we think of any hardware configuration for which the change would
be harmful?  Something with really expensive cross-node DMA maybe?


^ permalink raw reply

* Re: [PATCH net-next] fib_hash: embed initial hash table in fn_zone
From: Eric Dumazet @ 2010-10-14 19:22 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1287073458.2712.100.camel@edumazet-laptop>

Le jeudi 14 octobre 2010 à 18:24 +0200, Eric Dumazet a écrit :
> While looking for false sharing problems, I noticed 
> sizeof(struct fn_zone) was small (28 bytes) and possibly sharing a cache
> line with an often written kernel structure.
> 
> Most of the time, fn_zone uses its initial hash table of 16 slots.
> 
> We can avoid the false sharing problem by embedding this initial hash
> table in fn_zone itself, so that sizeof(fn_zone) > L1_CACHE_BYTES
> 
> We did a similar optimization in commit a6501e080c (Reduce memory needs
> and speedup lookups)
> 
> Add a fz_revorder field to speedup fn_hash() a bit.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Oops, a last minute change was wrong, I'll resend a V2.

Thanks



^ permalink raw reply

* Re: [RFC PATCH net-next] drivers/net Documentation/networking: Create directory intel_wired_lan
From: Joe Perches @ 2010-10-14 19:30 UTC (permalink / raw)
  To: jeffrey.t.kirsher, Michal Marek, Sam Ravnborg, linux-kbuild
  Cc: Brandeburg, Jesse, Allan, Bruce W, Wyborny, Carolyn,
	Skidmore, Donald C, Rose, Gregory V, Waskiewicz Jr, Peter P,
	Duyck, Alexander H, Ronciak, John, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, e1000-devel
In-Reply-To: <1287048849.3319.20.camel@jtkirshe-MOBL1>

On Thu, 2010-10-14 at 02:34 -0700, Jeff Kirsher wrote:
> On Wed, 2010-10-13 at 22:57 -0700, Joe Perches wrote:
> > On Wed, 2010-10-13 at 21:57 -0700, Jeff Kirsher wrote:
> > > On Wed, 2010-10-13 at 15:28 -0700, Joe Perches wrote:
> > > Sorry I am not ignoring you, I was taking a closer look at your patch.
> > > > What regression testing would actually be done?
> > > The Makefile and Kconfig needs more work.  I applied your patch and none
> > > of the Intel Wired drivers build.
> > Care to describe the Makefile/Kconfig issues you have seen?
> > I built it allyesconfig, defconfig, allmodconfig and allnoconfig.
> Yeah, I found all of those built without errors, but if you build the
> Intel Wired LAN drivers as modules, you will not find the *.ko files
> after the build.  The Kconfig files look fine, the problem was with the
> Makefiles.  Instead of creating a drivers/net/intel_wired_lan/Makefile,
> I simply changed the path in drivers/net/Makefile to the updated path
> and that resolved the issue.

(adding a few cc's and a link for history)

http://lkml.org/lkml/2010/10/10/207

That's the way I had done it originally as well, but I found
you couldn't build the directory with:

	make drivers/net/intel_wired_lan/

so I created a Makefile in the new directory below with
the elements necessary.

Perhaps there's some missing functionality in the build system
when the Kconfig file resides in a higher directory and the
directory being built doesn't have a Kconfig file?

I think it'd wrong to duplicate the makefile components in
2 places to allow "make subdir/" and I wonder if there's a
good solution for this.

> As far as the sub-directory name "intel_wired_lan", what about "intel"
> or "intel_wired"?  Just a thought...

Using "intel" seemed too sweeping because of the wireless drivers.
I think intel_wired_lan isn't overly long, but your choice...

Should the new (OKI?/intel) pch_gbe directory be moved as well?
It's using a PCI_VENDOR_ID_INTEL.



^ permalink raw reply

* RE: [PATCH net-next] tipc: cleanup function namespace
From: Jon Maloy @ 2010-10-14 19:49 UTC (permalink / raw)
  To: Stephen Hemminger, Paul Gortmaker
  Cc: Neil Horman, David Miller, netdev@vger.kernel.org,
	allan.stephens@windriver.com
In-Reply-To: <20101014113320.7720828f@nehalam>


<...> 
> > justify a technically wrong decision here on netdev anyway.
> > 
> > I was just describing the history of the situation, and 
> suggesting one 
> > possible slower approach of phasing it out as a courtesy to those 
> > users, in the same way that the kernel community has extended that 
> > same courtesy with other things in feature-removal.txt
> > 
> > In the end, since Jon is OK with the removal, and is in the 
> process of 
> > communicating this to the API users he is aware of, I sure 
> don't have 
> > any reason to try and save the API.  If folks are good with 
> having it 
> > just go away overnight, then great -- I'll be just as happy 
> to see it 
> > disappear as you and Stephen.  So, a long winded way of saying...
> > 
> > Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> 
> How about putting an entry in feature-removal.txt with a 
> short (6 month) window?

I think that is a very good idea. That would give the users a reasonable time
to find other solutions.

> 
> 
> --
> --
> To unsubscribe from this list: send the line "unsubscribe 
> netdev" in the body of a message to majordomo@vger.kernel.org 
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* [PATCH net-next] drivers/net/pch_gbe: Use DEFINE_PCI_DEVICE_TABLE
From: Joe Perches @ 2010-10-14 19:55 UTC (permalink / raw)
  To: Masayuki Ohtake; +Cc: netdev, LKML

Use the standard macro to put this table in __devinitconst.

Compiled, untested.

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/net/pch_gbe/pch_gbe_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/pch_gbe/pch_gbe_main.c b/drivers/net/pch_gbe/pch_gbe_main.c
index e44644f..cf4b49d 100644
--- a/drivers/net/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/pch_gbe/pch_gbe_main.c
@@ -2394,7 +2394,7 @@ err_disable_device:
 	return ret;
 }
 
-static const struct pci_device_id pch_gbe_pcidev_id[] = {
+static DEFINE_PCI_DEVICE_TABLE(pch_gbe_pcidev_id) = {
 	{.vendor = PCI_VENDOR_ID_INTEL,
 	 .device = PCI_DEVICE_ID_INTEL_IOH1_GBE,
 	 .subvendor = PCI_ANY_ID,



^ permalink raw reply related

* Re: [PATCH net-next] net: allocate skbs on local node
From: Eric Dumazet @ 2010-10-14 19:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tom Herbert, David Miller, netdev, Michael Chan, Eilon Greenstein,
	Christoph Hellwig, Christoph Lameter
In-Reply-To: <20101014122752.21dd4eaf.akpm@linux-foundation.org>

Le jeudi 14 octobre 2010 à 12:27 -0700, Andrew Morton a écrit :

> Can we think of any hardware configuration for which the change would
> be harmful?  Something with really expensive cross-node DMA maybe?
> 

If such hardware exists (yes it does, but not close my hands...), then
NIC IRQS probably should be handled by cpus close to the device, or it
might be very slow. This has nothing to do with skb allocation layer.

I believe we should not try to correct wrong IRQ affinities with NUMA
games. Network stack will wakeup threads and scheduler will run them on
same cpu, if possible.

If skb stay long enough on socket receive queue, application will need
to fetch again remote numa node when reading socket a few milliseconds
later, because cache will be cold : Total : two cross node transferts
per incoming frame.

The more node distances are big, the more speedup we can expect from
this patch.

Thanks



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox