Netdev List
* Re: [PATCH v3 net-next] net: move inet_dport/inet_num in sock_common
From: David Miller @ 2012-11-30 20:03 UTC
  To: eric.dumazet; +Cc: netdev, ling.ma.program, bhutchings, joe
In-Reply-To: <1354304967.20109.10.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 30 Nov 2012 11:49:27 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> commit 68835aba4d9b (net: optimize INET input path further)
> moved some fields used for tcp/udp sockets lookup in the first cache
> line of struct sock_common.
> 
> This patch moves inet_dport/inet_num as well, filling a 32-bit hole
> on 64-bit arches and reducing the number of cache line misses in lookups.
> 
> Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
> before the addresses match, as this check is more discriminating.
> 
> Remove the hash check from the MATCH() macros, because we don't need to
> revalidate the hash value after taking a refcount on the socket, and
> use likely()/unlikely() compiler hints, as the sk_hash/hash check
> makes the following conditional tests 100% predicted by the CPU.
> 
> Introduce skc_addrpair/skc_portpair pair values to better
> document the alignment requirements of the port/addr pairs
> used in the various MATCH() macros, and remove some casts.
> 
> The namespace check can also be done last.
> 
> This slightly improves TCP/UDP lookup times.
> 
> IP/TCP early demux needs inet->rx_dst_ifindex and
> TCP needs inet->min_ttl, so let's group them together in the same cache line.
> 
> With help from Ben Hutchings & Joe Perches.
> 
> The idea for this patch came from Ling Ma's proposal to move skc_hash
> to the beginning of struct sock_common, and it should allow him
> to submit a final version of his patch. My tests show an improvement
> from doing so.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks for fixing this up.
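A minimal C sketch of the "pair" idea described in the commit message, assuming a heavily simplified, hypothetical struct (demo_sock_common and the demo_* helpers are illustrative names, not the real struct sock_common or INET_MATCH()). A union lets one 64-bit compare cover both addresses and one 32-bit compare cover both ports; the ports are checked before the addresses, and the hash comparison stands in for the separate sk_hash check done by the lookup loop. The field order (address pair, hash, port pair) packs into 16 bytes with no hole on 64-bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the lookup keys in sock_common. */
struct demo_sock_common {
	union {
		uint64_t addrpair;		/* both addresses, one compare */
		struct {
			uint32_t daddr;
			uint32_t rcv_saddr;
		};
	};
	uint32_t hash;
	union {
		uint32_t portpair;		/* both ports, one compare */
		struct {
			uint16_t dport;
			uint16_t num;
		};
	};
};

/* Build a key with the same layout so the pair values line up on any host. */
struct demo_sock_common demo_make_key(uint32_t hash, uint32_t daddr,
				      uint32_t rcv_saddr, uint16_t dport,
				      uint16_t num)
{
	struct demo_sock_common k;

	k.daddr = daddr;
	k.rcv_saddr = rcv_saddr;
	k.hash = hash;
	k.dport = dport;
	k.num = num;
	return k;
}

bool demo_match(const struct demo_sock_common *sk,
		const struct demo_sock_common *key)
{
	if (sk->hash != key->hash)		/* cheap filter first */
		return false;
	if (sk->portpair != key->portpair)	/* ports: more discriminating */
		return false;
	return sk->addrpair == key->addrpair;
}
```

Anonymous structs inside unions need C11 (or GNU C), which is also what the kernel relies on for this layout.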

* [PATCH net-next] tcp: change default tcp hash size
From: Eric Dumazet @ 2012-11-30 20:08 UTC
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

As time passed, available memory increased faster than the number of
concurrent TCP sockets.

As a result, a machine with 4GB of RAM gets a hash table
with 524288 slots, using 8388608 bytes of memory.

Let's reduce that by a factor of 16 (one slot per 128 KB of RAM).

Even if a small machine needs a _lot_ of sockets, TCP lookups are now
very efficient, using one cache line per socket.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e6eace1..1aca02c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3590,8 +3590,7 @@ void __init tcp_init(void)
 		alloc_large_system_hash("TCP established",
 					sizeof(struct inet_ehash_bucket),
 					thash_entries,
-					(totalram_pages >= 128 * 1024) ?
-					13 : 15,
+					17, /* one slot per 128 KB of memory */
 					0,
 					NULL,
 					&tcp_hashinfo.ehash_mask,
@@ -3607,8 +3606,7 @@ void __init tcp_init(void)
 		alloc_large_system_hash("TCP bind",
 					sizeof(struct inet_bind_hashbucket),
 					tcp_hashinfo.ehash_mask + 1,
-					(totalram_pages >= 128 * 1024) ?
-					13 : 15,
+					17, /* one slot per 128 KB of memory */
 					0,
 					&tcp_hashinfo.bhash_size,
 					NULL,

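The sizing arithmetic in the commit message can be checked with a simplified C model, assuming one slot per 2^scale bytes of RAM rounded up to a power of two and 16 bytes per bucket (the size implied by 8388608 / 524288). The demo_* functions are hypothetical; the real alloc_large_system_hash() also applies further limits and adjustments not modelled here.

```c
#include <stdint.h>

/* Simplified model: one slot per 2^scale bytes, rounded up to a power of two. */
uint64_t demo_hash_slots(uint64_t ram_bytes, unsigned int scale)
{
	uint64_t want = ram_bytes >> scale;
	uint64_t slots = 1;

	while (slots < want)
		slots <<= 1;
	return slots;
}

/* The scale chosen by the old code, from the number of 4 KB pages. */
unsigned int demo_old_scale(uint64_t totalram_pages)
{
	return totalram_pages >= 128 * 1024 ? 13 : 15;
}
```

For 4GB this gives 2^32 >> 13 = 524288 slots (8 MB of buckets) with the old scale and 2^32 >> 17 = 32768 slots with scale 17, the 16x reduction the patch describes.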
* Re: [PATCH 00/17] ATM fixes for pppoatm/br2684
From: David Woodhouse @ 2012-11-30 20:22 UTC
  To: davem; +Cc: netdev, chas, Krzysztof Mazur
In-Reply-To: <20121130104411.GA16410@shrek.podlesie.net>

On Fri, 2012-11-30 at 11:44 +0100, Krzysztof Mazur wrote:
> > The patch series can be pulled from
> >       git://git.infradead.org/users/dwmw2/atm.git
> > or viewed at 
> >       http://git.infradead.org/users/dwmw2/atm.git
> > 
> > DaveM, please wait for an ack from Krzysztof and Chas before pulling this.
> 
> looks good to me, except [<fixed>]

On Fri, 2012-11-30 at 12:12 -0500, chas williams - CONTRACTOR wrote:
> no objections.  i think this deals with my concerns. 

Dave, if you're not now ignoring this thread entirely, please pull into
net-next from
	git://git.infradead.org/users/dwmw2/atm.git

David Woodhouse (9):
      solos-pci: wait for pending TX to complete when releasing vcc
      atm: add release_cb() callback to vcc
      br2684: don't send frames on not-ready vcc
      pppoatm: fix missing wakeup in pppoatm_send()
      br2684: fix module_put() race
      pppoatm: optimise PPP channel wakeups after sock_owned_by_user()
      solos-pci: clean up pclose() function
      solos-pci: use GFP_KERNEL where possible, not GFP_ATOMIC
      solos-pci: remove list_vccs() debugging function

Krzysztof Mazur (7):
      atm: add owner of push() callback to atmvcc
      pppoatm: allow assign only on a connected socket
      pppoatm: fix module_put() race
      pppoatm: take ATM socket lock in pppoatm_send()
      pppoatm: drop frames to not-ready vcc
      pppoatm: do not inline pppoatm_may_send()
      br2684: allow assign only on a connected socket

Nathan Williams (1):
      solos-pci: Fix leak of skb received for unknown vcc

 drivers/atm/solos-pci.c | 85 ++++++++++++++++++++-----------------------------
 include/linux/atmdev.h  |  2 ++
 net/atm/br2684.c        | 55 ++++++++++++++++++++++++++++----
 net/atm/common.c        | 12 +++++++
 net/atm/pppoatm.c       | 68 ++++++++++++++++++++++++++++++++++++---
 5 files changed, 160 insertions(+), 62 deletions(-)

-- 
dwmw2

* [PATCH v2] ipv6: unify logic evaluating inet6_dev's accept_ra property
From: Shmulik Ladkani @ 2012-11-30 20:25 UTC
  To: David S. Miller
  Cc: netdev, Hideaki YOSHIFUJI, Thomas Graf, Tore Anderson,
	shmulik.ladkani

As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted], the
logic determining whether to send Router Solicitations is identical
to the logic determining whether the kernel accepts Router Advertisements.

However, the condition itself is repeated in several code locations.

Unify it by introducing an 'ipv6_accept_ra()' accessor.

Also, simplify the condition expression, making it more readable.
No semantic change.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---

v2: proper comment styling

 include/net/ipv6.h  |    9 +++++++++
 net/ipv6/addrconf.c |    3 +--
 net/ipv6/ndisc.c    |   16 ++--------------
 3 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 979bf6c..985c6fa 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -271,6 +271,15 @@ struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space,
 
 extern bool ipv6_opt_accepted(const struct sock *sk, const struct sk_buff *skb);
 
+static inline bool ipv6_accept_ra(struct inet6_dev *idev)
+{
+	/* If forwarding is enabled, RA are not accepted unless the special
+	 * hybrid mode (accept_ra=2) is enabled.
+	 */
+	return idev->cnf.forwarding ? idev->cnf.accept_ra == 2 :
+	    idev->cnf.accept_ra;
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static inline int ip6_frag_nqueues(struct net *net)
 {
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 0424e4e..ca1ed8a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3005,8 +3005,7 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
 	   router advertisements, start sending router solicitations.
 	 */
 
-	if (((ifp->idev->cnf.accept_ra == 1 && !ifp->idev->cnf.forwarding) ||
-	     ifp->idev->cnf.accept_ra == 2) &&
+	if (ipv6_accept_ra(ifp->idev) &&
 	    ifp->idev->cnf.rtr_solicits > 0 &&
 	    (dev->flags&IFF_LOOPBACK) == 0 &&
 	    (ipv6_addr_type(&ifp->addr) & IPV6_ADDR_LINKLOCAL)) {
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 2edce30..980cdc3 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1033,18 +1033,6 @@ errout:
 	rtnl_set_sk_err(net, RTNLGRP_ND_USEROPT, err);
 }
 
-static inline int accept_ra(struct inet6_dev *in6_dev)
-{
-	/*
-	 * If forwarding is enabled, RA are not accepted unless the special
-	 * hybrid mode (accept_ra=2) is enabled.
-	 */
-	if (in6_dev->cnf.forwarding && in6_dev->cnf.accept_ra < 2)
-		return 0;
-
-	return in6_dev->cnf.accept_ra;
-}
-
 static void ndisc_router_discovery(struct sk_buff *skb)
 {
 	struct ra_msg *ra_msg = (struct ra_msg *)skb_transport_header(skb);
@@ -1092,7 +1080,7 @@ static void ndisc_router_discovery(struct sk_buff *skb)
 		return;
 	}
 
-	if (!accept_ra(in6_dev))
+	if (!ipv6_accept_ra(in6_dev))
 		goto skip_linkparms;
 
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
@@ -1248,7 +1236,7 @@ skip_linkparms:
 			     NEIGH_UPDATE_F_ISROUTER);
 	}
 
-	if (!accept_ra(in6_dev))
+	if (!ipv6_accept_ra(in6_dev))
 		goto out;
 
 #ifdef CONFIG_IPV6_ROUTE_INFO
-- 
1.7.9

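The "no semantic change" claim can be checked exhaustively with a small C sketch that compares the new helper against the two old formulations. It assumes a hypothetical demo_cnf struct holding only the two relevant fields, and only the documented accept_ra values 0, 1 and 2 with forwarding 0 or 1; outside those values the three forms can differ, as the test shows for accept_ra = 3.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two idev->cnf fields involved. */
struct demo_cnf {
	int forwarding;
	int accept_ra;
};

/* New helper, mirroring ipv6_accept_ra() from the patch. */
bool demo_new_accept_ra(const struct demo_cnf *cnf)
{
	return cnf->forwarding ? cnf->accept_ra == 2 : cnf->accept_ra;
}

/* Old ndisc.c accept_ra(). */
int demo_old_ndisc_accept_ra(const struct demo_cnf *cnf)
{
	if (cnf->forwarding && cnf->accept_ra < 2)
		return 0;
	return cnf->accept_ra;
}

/* Old condition open-coded in addrconf_dad_completed(). */
bool demo_old_addrconf_accept_ra(const struct demo_cnf *cnf)
{
	return (cnf->accept_ra == 1 && !cnf->forwarding) ||
	       cnf->accept_ra == 2;
}

/* True if all three agree for forwarding in {0,1}, accept_ra in {0,1,2}. */
bool demo_all_agree(void)
{
	for (int fwd = 0; fwd <= 1; fwd++) {
		for (int ra = 0; ra <= 2; ra++) {
			struct demo_cnf cnf = { fwd, ra };
			bool n = demo_new_accept_ra(&cnf);

			if (n != (demo_old_ndisc_accept_ra(&cnf) != 0) ||
			    n != demo_old_addrconf_accept_ra(&cnf))
				return false;
		}
	}
	return true;
}
```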
* [RFT PATCH] 8139cp: properly support change of MTU values
From: John Greene @ 2012-11-30 20:51 UTC
  To: netdev; +Cc: John Greene, David S. Miller

The 8139cp driver has a change_mtu function that has not been
enabled since the dawn of the git repository. However, the
generic eth_change_mtu is not used in its place either, so
invalid MTU values can be set on the interface.

The original patch salvaged the broken code for the single case of
setting the MTU while the interface is down, which is safe
and also includes the range check. It is now enhanced to support
both an up and a down interface.

Original patch from
http://lkml.indiana.edu/hypermail/linux/kernel/1202.2/00770.html

Testing: this has been tested on a virtual 8139cp setup without issue.
I have no access to real 8139cp hardware, so testing help is needed.

Signed-off-by: "John Greene" <jogreene@redhat.com>
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/ethernet/realtek/8139cp.c | 22 +++-------------------
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
index 6cb96b4..7847c83 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -1226,12 +1226,9 @@ static void cp_tx_timeout(struct net_device *dev)
 	spin_unlock_irqrestore(&cp->lock, flags);
 }
 
-#ifdef BROKEN
 static int cp_change_mtu(struct net_device *dev, int new_mtu)
 {
 	struct cp_private *cp = netdev_priv(dev);
-	int rc;
-	unsigned long flags;
 
 	/* check for invalid MTU, according to hardware limits */
 	if (new_mtu < CP_MIN_MTU || new_mtu > CP_MAX_MTU)
@@ -1244,22 +1241,11 @@ static int cp_change_mtu(struct net_device *dev, int new_mtu)
 		return 0;
 	}
 
-	spin_lock_irqsave(&cp->lock, flags);
-
-	cp_stop_hw(cp);			/* stop h/w and free rings */
-	cp_clean_rings(cp);
-
+	/* network IS up, close it, reset MTU, and come up again. */
+	cp_close(dev);
 	dev->mtu = new_mtu;
-	cp_set_rxbufsize(cp);		/* set new rx buf size */
-
-	rc = cp_init_rings(cp);		/* realloc and restart h/w */
-	cp_start_hw(cp);
-
-	spin_unlock_irqrestore(&cp->lock, flags);
-
-	return rc;
+	return cp_open(dev);
 }
-#endif /* BROKEN */
 
 static const char mii_2_8139_map[8] = {
 	BasicModeCtrl,
@@ -1835,9 +1821,7 @@ static const struct net_device_ops cp_netdev_ops = {
 	.ndo_start_xmit		= cp_start_xmit,
 	.ndo_tx_timeout		= cp_tx_timeout,
 	.ndo_set_features	= cp_set_features,
-#ifdef BROKEN
 	.ndo_change_mtu		= cp_change_mtu,
-#endif
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= cp_poll_controller,
-- 
1.7.11.7

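A userspace C sketch of the control flow the patch gives cp_change_mtu(): range-check first, set the MTU directly if the interface is down, otherwise close, update and reopen. Everything here is hypothetical (demo_dev, demo_open(), demo_close()), and the limits 60 and 4096 are assumed for illustration rather than taken from CP_MIN_MTU/CP_MAX_MTU; the real down path also recomputes the RX buffer size.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_MIN_MTU 60		/* assumed limits, for illustration only */
#define DEMO_MAX_MTU 4096

struct demo_dev {
	int mtu;
	bool running;
	int open_count;		/* counts reopen cycles for the example */
};

int demo_open(struct demo_dev *dev)
{
	dev->running = true;
	dev->open_count++;
	return 0;
}

void demo_close(struct demo_dev *dev)
{
	dev->running = false;
}

int demo_change_mtu(struct demo_dev *dev, int new_mtu)
{
	/* reject values outside the (assumed) hardware limits */
	if (new_mtu < DEMO_MIN_MTU || new_mtu > DEMO_MAX_MTU)
		return -EINVAL;

	/* interface down: just record the new MTU */
	if (!dev->running) {
		dev->mtu = new_mtu;
		return 0;
	}

	/* interface up: close it, reset the MTU, and come up again */
	demo_close(dev);
	dev->mtu = new_mtu;
	return demo_open(dev);
}
```

The close/reopen approach trades a full down/up cycle for not having to reinitialize rings under the spinlock; if the reopen fails, the interface is left down.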
* [net-next:master 98/98] drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
From: kbuild test robot @ 2012-11-30 21:02 UTC
  To: Andrew Gallatin; +Cc: netdev

tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   1b4c44e6369dbbafd113f1e00b406f1eda5ab5b2
commit: 1b4c44e6369dbbafd113f1e00b406f1eda5ab5b2 [98/98] myri10ge: Add vlan rx for better GRO perf.


sparse warnings:

+ drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
+ drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
+ drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
+ drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
+ drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:16: sparse: restricted __be16 degrades to integer
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1888:16: sparse: incorrect type in argument 1 (different base types)
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1888:16:    expected unsigned int [unsigned] val
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1888:16:    got restricted __be32 [usertype] <noident>
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:2334:24: sparse: incorrect type in assignment (different address spaces)
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:2334:24:    expected unsigned char [usertype] *itable
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:2334:24:    got unsigned char [noderef] [usertype] <asn:2>*
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:2336:48: sparse: incorrect type in argument 2 (different address spaces)
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:2336:48:    expected void volatile [noderef] <asn:2>*addr
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:2336:48:    got unsigned char [usertype] *
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:2760:60: sparse: dubious: x & !y
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3839:13: sparse: cast to restricted __be32
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3839:13: sparse: cast to restricted __be32
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3839:13: sparse: cast to restricted __be32
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3839:13: sparse: cast to restricted __be32
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3839:13: sparse: cast to restricted __be32
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3839:13: sparse: cast to restricted __be32
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3841:26: sparse: cast to restricted __be32
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3841:26: sparse: cast to restricted __be32
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3841:26: sparse: cast to restricted __be32
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3841:26: sparse: cast to restricted __be32
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3841:26: sparse: cast to restricted __be32
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3841:26: sparse: cast to restricted __be32
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1434:35: sparse: context imbalance in 'myri10ge_intr' - different lock contexts for basic block

vim +1286 drivers/net/ethernet/myricom/myri10ge/myri10ge.c

1b4c44e6 Andrew Gallatin 2012-11-30  1270   * the tag ourselves to be able to achieve GRO performance that
1b4c44e6 Andrew Gallatin 2012-11-30  1271   * is comparable to LRO.
1b4c44e6 Andrew Gallatin 2012-11-30  1272   */
1b4c44e6 Andrew Gallatin 2012-11-30  1273  
1b4c44e6 Andrew Gallatin 2012-11-30  1274  static inline void
1b4c44e6 Andrew Gallatin 2012-11-30  1275  myri10ge_vlan_rx(struct net_device *dev, void *addr, struct sk_buff *skb)
1b4c44e6 Andrew Gallatin 2012-11-30  1276  {
1b4c44e6 Andrew Gallatin 2012-11-30  1277  	u8 *va;
1b4c44e6 Andrew Gallatin 2012-11-30  1278  	struct vlan_ethhdr *veh;
1b4c44e6 Andrew Gallatin 2012-11-30  1279  	struct skb_frag_struct *frag;
1b4c44e6 Andrew Gallatin 2012-11-30  1280  	__wsum vsum;
1b4c44e6 Andrew Gallatin 2012-11-30  1281  
1b4c44e6 Andrew Gallatin 2012-11-30  1282  	va = addr;
1b4c44e6 Andrew Gallatin 2012-11-30  1283  	va += MXGEFW_PAD;
1b4c44e6 Andrew Gallatin 2012-11-30  1284  	veh = (struct vlan_ethhdr *)va;
1b4c44e6 Andrew Gallatin 2012-11-30  1285  	if ((dev->features & NETIF_F_HW_VLAN_RX) == NETIF_F_HW_VLAN_RX &&
1b4c44e6 Andrew Gallatin 2012-11-30 @1286  	    veh->h_vlan_proto == ntohs(ETH_P_8021Q)) {
1b4c44e6 Andrew Gallatin 2012-11-30  1287  		/* fixup csum if needed */
1b4c44e6 Andrew Gallatin 2012-11-30  1288  		if (skb->ip_summed == CHECKSUM_COMPLETE) {
1b4c44e6 Andrew Gallatin 2012-11-30  1289  			vsum = csum_partial(va + ETH_HLEN, VLAN_HLEN, 0);
1b4c44e6 Andrew Gallatin 2012-11-30  1290  			skb->csum = csum_sub(skb->csum, vsum);
1b4c44e6 Andrew Gallatin 2012-11-30  1291  		}
1b4c44e6 Andrew Gallatin 2012-11-30  1292  		/* pop tag */
1b4c44e6 Andrew Gallatin 2012-11-30  1293  		__vlan_hwaccel_put_tag(skb, ntohs(veh->h_vlan_TCI));
1b4c44e6 Andrew Gallatin 2012-11-30  1294  		memmove(va + VLAN_HLEN, va, 2 * ETH_ALEN);

---
0-DAY kernel build testing backend         Open Source Technology Center
Fengguang Wu, Yuanhan Liu                              Intel Corporation

* Re: Wireless regression in workqueue: use mod_delayed_work() instead of __cancel + queue
From: Tejun Heo @ 2012-11-30 21:14 UTC
  To: Anders Kaseorg
  Cc: Herbert Xu, John W. Linville, netdev, linux-wireless,
	linux-kernel
In-Reply-To: <alpine.DEB.2.00.1211281016320.26602@dr-wily.mit.edu>

Hello, Anders.

Sorry about the delay.

On Wed, Nov 28, 2012 at 10:17:28AM -0500, Anders Kaseorg wrote:
> On Wed, 28 Nov 2012, Anders Kaseorg wrote:
> > My Intel 6250 wireless card (iwldvm) can no longer associate with a 
> > WPA-Enterprise network (PEAP-MSCHAPv2).  To my surprise, I bisected this 
> > regression to commit e7c2f967445dd2041f0f8e3179cca22bb8bb7f79, 
> > workqueue: use mod_delayed_work() instead of __cancel + queue.

I see.

> > A bunch of logs collected by Ubuntu apport are in this bug report: 
> >   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1083980
> > 
> > How can I help to debug this?
> > 
> > I see that someone else reported another regression with the same commit 
> > last week, although this looks unrelated at first glance: 
> >   http://thread.gmane.org/gmane.linux.kernel/1395938

Urgh... that one was in my spam folder probably due to the mimed
content.  Nothing rings a bell yet.  Will keep looking into it.

Thanks.

-- 
tejun

* Re: [net-next PATCH V2 1/9] net: frag evictor, avoid killing warm frag queues
From: Jesper Dangaard Brouer @ 2012-11-30 21:37 UTC
  To: Eric Dumazet
  Cc: David Miller, fw, netdev, pablo, tgraf, amwang, kaber, paulmck,
	herbert
In-Reply-To: <1354293469.3299.81.camel@edumazet-glaptop>

On Fri, 2012-11-30 at 08:37 -0800, Eric Dumazet wrote:
> On Fri, 2012-11-30 at 16:45 +0100, Jesper Dangaard Brouer wrote:
> > On Fri, 2012-11-30 at 06:52 -0800, Eric Dumazet wrote:
> 
> > 
> > > I dont know how you expect that many
> > > datagrams being correctly reassembled with ipfrag_high_thresh=262144 
> > 
> > That's my point... I'm showing that it's not possible with our current
> > implementation!
> 
> What I was saying is that the limits are too small, and we should
> increase them for this particular need.
> 
> This has little to do with the underlying algo.

Actual data is an engineer's best friend.

[root@dragon ~]# sysctl -w net/ipv4/ipfrag_high_thresh=$((4<<20))
net.ipv4.ipfrag_high_thresh = 4194304
[root@dragon ~]# sysctl -w net/ipv4/ipfrag_low_thresh=$((3<<20))
net.ipv4.ipfrag_low_thresh = 3145728


[jbrouer@firesoul ~]$ netperf -H 192.168.51.2 -T0,0 -t UDP_STREAM -l 20 &\
 netperf -p 1337 -H 192.168.31.2 -T7,7 -t UDP_STREAM -l 20
[1] 18573
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.51.2 (192.168.51.2) port 0 AF_INET : cpu bind
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.31.2 (192.168.31.2) port 0 AF_INET : cpu bind
Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

229376   65507   20.00      363315      0    9519.86
212992           20.00        7297            191.20

Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

229376   65507   20.00      366927      0    9614.48
212992           20.00       10437            273.48


This test is 2x10G with straight NUMA nodes (meaning optimal NUMA
allocation where the incoming netperf packets are received by kernel and
delivered to netserver on the same NUMA node).


Come on, Eric, you are smarter than this. When will you realize that
dropping partly completed fragment queues is bad for performance? (And
thus a bad algorithmic choice in the evictor.)


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

* Re: [net-next:master 98/98] drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
From: Andrew Gallatin @ 2012-11-30 21:51 UTC
  To: kbuild test robot; +Cc: netdev
In-Reply-To: <50b91efa.B0WbOtcWMs7eOSaC%fengguang.wu@intel.com>

On 11/30/12 16:02, kbuild test robot wrote:
> tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
> head:   1b4c44e6369dbbafd113f1e00b406f1eda5ab5b2
> commit: 1b4c44e6369dbbafd113f1e00b406f1eda5ab5b2 [98/98] myri10ge: Add vlan rx for better GRO perf.
> 
> 
> sparse warnings:
> 
> + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
> + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
> + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
> + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
> + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:16: sparse: restricted __be16 degrades to integer


OK, maybe a dumb question again, but how do I get sparse to produce
the 'cast to restricted' warnings?  I ran sparse before submission,
but it only showed the pre-existing, non "cast to restricted"
warnings, so I did not know I was introducing a new warning.
Do I need to use a different architecture? (I was using x86_64).

Also, the line it is warning about is this:

> 1b4c44e6 Andrew Gallatin 2012-11-30 @1286  	    veh->h_vlan_proto == ntohs(ETH_P_8021Q)) {


Which seems to be nearly identical to the usage in
if_vlan.h:__vlan_get_tag, which I was treating as canonical.
So I'm a bit confused as to how to fix it.


Thanks,

Drew

* Re: [net-next:master 98/98] drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
From: Stephen Hemminger @ 2012-11-30 21:53 UTC
  To: Andrew Gallatin; +Cc: kbuild test robot, netdev
In-Reply-To: <50B92A6D.8000600@myri.com>

On Fri, 30 Nov 2012 16:51:41 -0500
Andrew Gallatin <gallatin@myri.com> wrote:

> On 11/30/12 16:02, kbuild test robot wrote:
> > tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
> > head:   1b4c44e6369dbbafd113f1e00b406f1eda5ab5b2
> > commit: 1b4c44e6369dbbafd113f1e00b406f1eda5ab5b2 [98/98] myri10ge: Add vlan rx for better GRO perf.
> > 
> > 
> > sparse warnings:
> > 
> > + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
> > + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
> > + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
> > + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
> > + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:16: sparse: restricted __be16 degrades to integer
> 
> 
> OK, maybe a dumb question again, but how do I get sparse to produce
> the 'cast to restricted' warnings?  I ran sparse before submission,
> but it only showed the pre-existing, non "cast to restricted"
> warnings, so I did not know I was introducing a new warning.
> Do I need to use a different architecture? (I was using x86_64).

See Documentation/sparse.txt
  
  The optional make variable CF can be used to pass arguments to sparse.  The
  build system passes -Wbitwise to sparse automatically.  To perform endianness
  checks, you may define __CHECK_ENDIAN__:

        make C=2 CF="-D__CHECK_ENDIAN__"

  These checks are disabled by default as they generate a host of warnings.

* Re: [PATCH] Smack: Add missing depends on INET in Kconfig
From: Eric Paris @ 2012-11-30 22:01 UTC
  To: Casey Schaufler
  Cc: Randy Dunlap, Paul Moore, Stephen Rothwell, linux-next,
	Linux Kernel Mailing List, netdev@vger.kernel.org, LSM List
In-Reply-To: <50B8ECB3.2090801@schaufler-ca.com>

Do other LSMs need this too, Casey? I remember we mentioned how select
was dangerous :-(

On Fri, Nov 30, 2012 at 12:28 PM, Casey Schaufler
<casey@schaufler-ca.com> wrote:
> Because NETLABEL depends on INET, SECURITY_SMACK
> has to explicitly call out the dependency.
>
> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
> ---
>  security/smack/Kconfig |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/security/smack/Kconfig b/security/smack/Kconfig
> index 9fb14ef..1be1088 100644
> --- a/security/smack/Kconfig
> +++ b/security/smack/Kconfig
> @@ -1,5 +1,6 @@
>  config SECURITY_SMACK
>         bool "Simplified Mandatory Access Control Kernel Support"
> +       depends on INET
>         depends on NET
>         depends on SECURITY
>         select NETLABEL
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

* Re: [net-next:master 98/98] drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
From: Fengguang Wu @ 2012-11-30 22:02 UTC
  To: Andrew Gallatin; +Cc: netdev, Christopher Li, Stephen Hemminger
In-Reply-To: <50B92A6D.8000600@myri.com>

On Fri, Nov 30, 2012 at 04:51:41PM -0500, Andrew Gallatin wrote:
> On 11/30/12 16:02, kbuild test robot wrote:
> > tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
> > head:   1b4c44e6369dbbafd113f1e00b406f1eda5ab5b2
> > commit: 1b4c44e6369dbbafd113f1e00b406f1eda5ab5b2 [98/98] myri10ge: Add vlan rx for better GRO perf.
> > 
> > 
> > sparse warnings:
> > 
> > + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
> > + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
> > + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
> > + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
> > + drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:16: sparse: restricted __be16 degrades to integer
> 
> 
> OK, maybe a dumb question again, but how do I get sparse to produce
> the 'cast to restricted' warnings?
[snip]
> Also, the line it is warning about is this:
> 
> > 1b4c44e6 Andrew Gallatin 2012-11-30 @1286  	    veh->h_vlan_proto == ntohs(ETH_P_8021Q)) {
> 
> 
> Which seems to be nearly identical to the usage in
> if_vlan.h:__vlan_get_tag, which I was treating as canonical..
> So I'm a bit confused as to how to fix it.
 
Andrew, here is the explanations from Christopher Li:

On Thu, Nov 29, 2012 at 09:58:58AM -0800, Christopher Li wrote:
> On Wed, Nov 28, 2012 at 2:42 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > On Wed, 28 Nov 2012 17:23:47 +0800
> >>
> >> + fs/hfsplus/xattr.c:363:23: sparse: cast to restricted __be32
> 
> >> 54d776ef Vyacheslav Dubeyko 2012-11-27  361   hfs_bnode_read(fd.bnode, &record_type,
> >> 54d776ef Vyacheslav Dubeyko 2012-11-27  362                   fd.entryoffset, sizeof(record_type));
> >> 54d776ef Vyacheslav Dubeyko 2012-11-27 @363   record_type = be32_to_cpu(record_type);
> >> 54d776ef Vyacheslav Dubeyko 2012-11-27  364   if (record_type == HFSPLUS_ATTR_INLINE_DATA) {
> >> 54d776ef Vyacheslav Dubeyko 2012-11-27  365           record_length = hfs_bnode_read_u16(fd.bnode,
> >> 54d776ef Vyacheslav Dubeyko 2012-11-27  366                           fd.entryoffset +
> >> 54d776ef Vyacheslav Dubeyko 2012-11-27  367                           offsetof(struct hfsplus_attr_inline_data,
> >> 54d776ef Vyacheslav Dubeyko 2012-11-27  368                           length));
> >> 54d776ef Vyacheslav Dubeyko 2012-11-27  369           if (record_length > HFSPLUS_MAX_INLINE_DATA_SIZE) {
> >> 54d776ef Vyacheslav Dubeyko 2012-11-27  370                   printk(KERN_ERR "hfs: invalid xattr record size\n");
> >> 54d776ef Vyacheslav Dubeyko 2012-11-27  371                   res = -EIO;
> >>
> >
> > I don't know what that warning means :(
> 
> Who does anyway :-).
> 
> >
> > Chris, can you shed some light here?
> 
> I think it is likely caused by record_type getting a value assigned.
> What you want here is one variable that stores record_type in big-endian
> byte order, and a different variable that stores record_type in CPU byte
> order. It is a bad idea to store both byte orders in the same variable;
> that is what sparse is complaining about right now.
> 
> The detailed cause of the complaint is that record_type has type __be32
> (a restricted __u32). be32_to_cpu() returns a plain __u32.
> When you assign a __u32 value to a __be32 variable, sparse finds a type
> mismatch, so it does an implicit cast. Think about assigning a char
> variable to an int: the compiler needs to insert a cast to do the sign
> extension. That implicit cast is what causes the error message,
> because a value cannot be cast to __be32 implicitly.
> 
> Anyway, it seems sparse is doing what it is supposed to do here. The
> suggested way to fix the warning is to use different variables for the
> big-endian and CPU-order values. That should get rid of the warning.
> 
> Chris
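A small C sketch of the fix Christopher Li suggests: one variable for the on-disk big-endian value and a separate one for the CPU-order value. It mimics the kernel's __bitwise/__force annotations with hypothetical demo_bitwise/demo_force macros that expand to sparse's bitwise and force attributes only when __CHECKER__ is defined (as it is under sparse), and to nothing under a normal compiler; ntohl() stands in for be32_to_cpu().

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#ifdef __CHECKER__
#define demo_bitwise	__attribute__((bitwise))
#define demo_force	__attribute__((force))
#else
#define demo_bitwise
#define demo_force
#endif

/* Restricted big-endian type, in the spirit of the kernel's __be32. */
typedef uint32_t demo_bitwise demo_be32;

static uint32_t demo_be32_to_cpu(demo_be32 v)
{
	/* __force-style cast: deliberately strip the restriction once */
	return ntohl((demo_force uint32_t)v);
}

uint32_t demo_read_record_type(const unsigned char *buf)
{
	demo_be32 raw;		/* on-disk value, big-endian */
	uint32_t record_type;	/* separate variable, CPU byte order */

	memcpy(&raw, buf, sizeof(raw));
	record_type = demo_be32_to_cpu(raw);
	return record_type;
}
```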

* respin of __dev* removal patches
From: Bill Pemberton @ 2012-11-30 22:15 UTC
  To: netdev; +Cc: Greg KH

I've got a respin of the hotplug removal patches for the networking
subsystem.  I don't want to irritate you like the big patch set did,
so before I submit them, what do you want to see that will keep your
pain to a minimum?

I've redone them so that all the __dev* removals are done at once so
there won't be one patch for __devinit, one for __devexit, etc.  I've
also broken them down into chunks following what's in the MAINTAINERS
file.  The result is 103 patches.

-- 
Bill

* Re: [PATCH] Smack: Add missing depends on INET in Kconfig
From: Casey Schaufler @ 2012-11-30 22:18 UTC
  To: Eric Paris
  Cc: Randy Dunlap, Paul Moore, Stephen Rothwell, linux-next,
	Linux Kernel Mailing List, netdev@vger.kernel.org, LSM List,
	Casey Schaufler
In-Reply-To: <CACLa4ptazSZCpk5Ug0mZLZRS1PxZ7KnwYxEXsXFwn3zG90YzNg@mail.gmail.com>

On 11/30/2012 2:01 PM, Eric Paris wrote:
> Do other LSMs need this too Casey?  I remember we mentioned how select
> was dangerous  :-(

I don't see any missing dependencies, but then, I missed INET.
Yes, you mentioned that it was dangerous.

>
> On Fri, Nov 30, 2012 at 12:28 PM, Casey Schaufler
> <casey@schaufler-ca.com> wrote:
>> Because NETLABEL depends on INET, SECURITY_SMACK
>> has to explicitly call out the dependency.
>>
>> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
>> ---
>>  security/smack/Kconfig |    1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/security/smack/Kconfig b/security/smack/Kconfig
>> index 9fb14ef..1be1088 100644
>> --- a/security/smack/Kconfig
>> +++ b/security/smack/Kconfig
>> @@ -1,5 +1,6 @@
>>  config SECURITY_SMACK
>>         bool "Simplified Mandatory Access Control Kernel Support"
>> +       depends on INET
>>         depends on NET
>>         depends on SECURITY
>>         select NETLABEL
>>

* Re: [net-next:master 98/98] drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1286:34: sparse: cast to restricted __be16
From: Andrew Gallatin @ 2012-11-30 22:19 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: netdev, Christopher Li, Stephen Hemminger
In-Reply-To: <20121130220228.GA22050@localhost>

Thanks guys.

In this case, it found a real typo (use of ntohs() rather than htons()).
If this tool had not made me stare at it, I never
would have seen this.

I need to audit the rest of the warnings in the driver.

Sorry for the noise & thanks for this service!

Drew

* Re: [net-next PATCH V2 1/9] net: frag evictor, avoid killing warm frag queues
From: Eric Dumazet @ 2012-11-30 22:25 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David Miller, fw, netdev, pablo, tgraf, amwang, kaber, paulmck,
	herbert
In-Reply-To: <1354311437.11754.459.camel@localhost>

On Fri, 2012-11-30 at 22:37 +0100, Jesper Dangaard Brouer wrote:

> 
> Come on Eric, you are smarter than this.  When will you realize that
> dropping partly completed fragment queues is bad for performance? (And
> thus a bad algorithmic choice in the evictor.)

Sorry I must be dumb, so I'll stop commenting.

* [PATCH net-next] myri10ge: fix incorrect use of ntohs()
From: Andrew Gallatin @ 2012-11-30 22:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, Andrew Gallatin

1b4c44e6369dbbafd113f1e00b406f1eda5ab5b2 incorrectly used
ntohs() rather than htons() in myri10ge_vlan_rx().

Thanks to Fengguang Wu and Yuanhan Liu's kernel-build tester
for pointing out this bug.

Signed-off-by: Andrew Gallatin <gallatin@myri.com>
---
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index 2fc984a..a40234e 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -1283,7 +1283,7 @@ myri10ge_vlan_rx(struct net_device *dev, void *addr, struct sk_buff *skb)
 	va += MXGEFW_PAD;
 	veh = (struct vlan_ethhdr *)va;
 	if ((dev->features & NETIF_F_HW_VLAN_RX) == NETIF_F_HW_VLAN_RX &&
-	    veh->h_vlan_proto == ntohs(ETH_P_8021Q)) {
+	    veh->h_vlan_proto == htons(ETH_P_8021Q)) {
 		/* fixup csum if needed */
 		if (skb->ip_summed == CHECKSUM_COMPLETE) {
 			vsum = csum_partial(va + ETH_HLEN, VLAN_HLEN, 0);
-- 
1.7.9.5
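A minimal userspace sketch of the comparison this patch fixes, assuming glibc's <arpa/inet.h>; is_8021q() is a hypothetical helper, not part of the driver. Because htons() and ntohs() perform the same 16-bit operation (a byte swap on little-endian hosts, a no-op on big-endian ones), the old code produced the same value at runtime; what sparse flags is the type mismatch, since h_vlan_proto holds a network-order (__be16) value and should be compared against htons() of the host-order constant.

```c
#include <arpa/inet.h> /* htons(), ntohs() */
#include <stdint.h>

/* 802.1Q VLAN TPID, same value as ETH_P_8021Q in <linux/if_ether.h>. */
#define ETH_P_8021Q 0x8100

/*
 * Hypothetical helper: h_vlan_proto_be is the tag protocol field exactly
 * as it sits in the frame, i.e. in network byte order. The host-order
 * constant is converted to network order with htons() before comparing.
 */
static int is_8021q(uint16_t h_vlan_proto_be)
{
	return h_vlan_proto_be == htons(ETH_P_8021Q);
}
```

In the kernel, sparse treats __be16 and u16 as distinct restricted types, which is why the htons() form is the one that type-checks for this comparison.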

* Re: Wireless regression in workqueue: use mod_delayed_work() instead of __cancel + queue
From: Tejun Heo @ 2012-11-30 22:56 UTC (permalink / raw)
  To: Anders Kaseorg
  Cc: Herbert Xu, John W. Linville, netdev, linux-wireless,
	linux-kernel
In-Reply-To: <20121130211435.GJ3873@htj.dyndns.org>

Hey, again.

Can you please test whether the following patch makes any difference?

Thanks!

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 042d221..26368ef 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1477,7 +1477,10 @@ bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
 	} while (unlikely(ret == -EAGAIN));
 
 	if (likely(ret >= 0)) {
-		__queue_delayed_work(cpu, wq, dwork, delay);
+		if (!delay)
+			__queue_work(cpu, wq, &dwork->work);
+		else
+			__queue_delayed_work(cpu, wq, dwork, delay);
 		local_irq_restore(flags);
 	}
 

* [PATCH v3]realtek:r8169: Bugfix or workaround for missing extended GigaMAC registers settings
From: Wang YanQing @ 2012-11-30 23:21 UTC (permalink / raw)
  To: nic_swsd; +Cc: romieu, netdev, linux-kernel

I have a board with an 8168e-vl (10ec:8168 with RTL_GIGA_MAC_VER_34).
At first everything looks fine: I can use ifconfig to set the IP address,
netmask, etc., and the RX/TX statistics shown by ifconfig look good when I
ping another host or ping it from another host. But it does not work:
I never get an ICMP REPLY on either side, even though the RX/TX statistics
look good.

After adding some debug code, I found that this NIC only accepts Ethernet
broadcast packets; it does not accept packets sent to its own MAC
address, although transmitting works fine. So the RX/TX statistics shown
by ifconfig only mean that it can receive ARP packets. (It does not know
its own MAC address; see below.)

I have tried the driver provided on Realtek's website, and it had the
same problem at first. BUT IT WORKS AFTER I REBOOT with CTRL-ALT-DEL.
The reason is that Realtek's driver calls rtl8168_rar_set in the
.shutdown function registered with pci_register_driver. What really
makes it work is rtl8168_rar_set: this function sets the extended
GigaMAC registers, so after a reboot without losing power, the NIC
keeps the state it had before the reboot.

I haven't seen any code in the kernel that sets the GigaMAC registers at
boot, so I guess the BIOS or the NIC's circuitry does it, but in this
case whoever does it misses the extended GigaMAC registers. The probe
code reads the MAC address correctly, so MAC{0,4} must have been set,
but the extended GigaMAC registers were forgotten.

This patch fixes it.

[ I don't know whether other Realtek NICs with extended GigaMAC
registers have the same problem. I hit it on an 8168e-vl with
RTL_GIGA_MAC_VER_34, so I made this patch just for it.]

Changes:
V1-V2:
I followed Francois Romieu's suggestion below to make this patch a one-liner:

I'd rather see the GigaMAC registers written through a call to
rtl_rar_set when the mac address is read in rtl_init_one instead of
duplicating most of rtl_rar_set in a quite different place.

V2-V3:
1: Add a condition around this fix, because it makes no sense for
most NICs
2:Add comment in code

Signed-off-by: Wang YanQing <udknight@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 927aa33..5d98296 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6903,6 +6903,14 @@ rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		dev->dev_addr[i] = RTL_R8(MAC0 + i);
 	memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
 
+	/*
+	 *This is a fix for BIOS forget to set
+	 *extend GigaMAC registers
+	 *Wang YanQing 12/1/2012
+	 */
+	if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
+	    rtl_rar_set(tp, dev->dev_addr);
+	}
 	SET_ETHTOOL_OPS(dev, &rtl8169_ethtool_ops);
 	dev->watchdog_timeo = RTL8169_TX_TIMEOUT;
 
-- 
1.7.11.1.116.g8228a23
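The probe loop in this patch reads the address byte by byte from MAC0 + i, so byte i lives at register offset i. A minimal sketch, assuming that little-endian layout, of how the six bytes map onto the two 32-bit words formed by MAC0..MAC3 and MAC4..MAC5; mac_to_words() and words_to_mac() are hypothetical helpers, and the extended GigaMAC register layout is not modeled here.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack a 6-byte MAC address into the two
 * little-endian words formed by registers MAC0..MAC3 and MAC4..MAC5.
 */
static void mac_to_words(const uint8_t addr[6], uint32_t *low, uint32_t *high)
{
	*low = (uint32_t)addr[0] | ((uint32_t)addr[1] << 8) |
	       ((uint32_t)addr[2] << 16) | ((uint32_t)addr[3] << 24);
	*high = (uint32_t)addr[4] | ((uint32_t)addr[5] << 8);
}

/* Inverse mapping: byte i is what a byte read of MAC0 + i would return. */
static void words_to_mac(uint32_t low, uint32_t high, uint8_t addr[6])
{
	for (int i = 0; i < 4; i++)
		addr[i] = (uint8_t)(low >> (8 * i));
	addr[4] = (uint8_t)high;
	addr[5] = (uint8_t)(high >> 8);
}
```

For example, 00:e0:4c:68:01:02 packs to low = 0x684ce000 and high = 0x0201, and words_to_mac() recovers the original bytes.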

* Re: [net-next PATCH V2 1/9] net: frag evictor, avoid killing warm frag queues
From: Jesper Dangaard Brouer @ 2012-11-30 23:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, fw, netdev, pablo, tgraf, amwang, kaber, paulmck,
	herbert
In-Reply-To: <1354314319.20109.179.camel@edumazet-glaptop>

On Fri, 2012-11-30 at 14:25 -0800, Eric Dumazet wrote:
> On Fri, 2012-11-30 at 22:37 +0100, Jesper Dangaard Brouer wrote:
>
> > Come on Eric, you are smarter than this.  When will you realize that
> > dropping partly completed fragment queues is bad for performance? (And
> > thus a bad algorithmic choice in the evictor.)
> 
> Sorry I must be dumb, so I'll stop commenting.

Come on Eric, you are one of the smartest and most enlightened persons I
know.

I'm just a little puzzled (and perhaps annoyed) that you don't agree
that the evictor code is a problem, given the tests I have provided and
the discussion we have had.

On this mailing list we challenge and give each other a hard time on the
technical side, as it should be.  This is nothing personal -- I don't
take it personally; I just believe this patch is important and makes a
difference.


I want us to discuss the evictor code as such, not try to come up
with workarounds that avoid the evictor code.

The dropping choice in the evictor code is not sound.

We are dealing with assembling fragments.  If a single fragment is lost,
the complete datagram is lost.  The evictor code will kill off one or
several fragments, knowing that this will invalidate the remaining
fragments.  Under high load, the LRU list has no effect and cannot
guide the drop choice.  The result is dropping on an "even"/fair basis,
which will basically kill all fragment queues, letting none complete.  As
my tests indicate, this severely affects performance, leaving nearly no
throughput.
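A back-of-the-envelope sketch of why losing any one fragment is so costly, under a simplifying assumption that is not the test setup described above: if each fragment were dropped independently with probability p, a datagram of k fragments would complete only with probability (1 - p)^k. datagram_survival() is a hypothetical helper.

```c
/*
 * Hypothetical helper: probability that all k fragments of a datagram
 * survive when each is dropped independently with probability p,
 * i.e. (1 - p)^k.
 */
static double datagram_survival(double p, unsigned int k)
{
	double s = 1.0;

	for (unsigned int i = 0; i < k; i++)
		s *= 1.0 - p;
	return s;
}
```

With p = 0.5 and k = 2 the result is 0.25. In this model, the same number of drops concentrated on fewer queues leaves more datagrams intact than drops spread evenly across all queues, which is the intuition behind not dropping on an even basis.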


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

* Re: respin of __dev* removal patches
From: Greg KH @ 2012-11-30 23:39 UTC (permalink / raw)
  To: Bill Pemberton; +Cc: netdev
In-Reply-To: <20121130221555.2AA738019A@viridian.itc.virginia.edu>

On Fri, Nov 30, 2012 at 05:15:54PM -0500, Bill Pemberton wrote:
> I've got a respin of the hotplug removal patches for the networking
> subsystem.  I don't want to irritate you like the big patch set did,
> so before I submit them, what do you want to see that will keep your
> pain to a minimum?
> 
> I've redone them so that all the __dev* removals are done at once so
> there won't be one patch for __devinit, one for __devexit, etc.  I've
> also broken them down into chunks following what's in the MAINTAINERS
> file.  The result is 103 patches.

As that's a lot of patches to handle through patchwork, would it be
easier for the network maintainers for me to just put these in a tree
they can pull from?  I will base it off of net-next.

thanks,

greg k-h

* Re: [net-next PATCH V2 1/9] net: frag evictor, avoid killing warm frag queues
From: Stephen Hemminger @ 2012-11-30 23:47 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, David Miller, fw, netdev, pablo, tgraf, amwang,
	kaber, paulmck, herbert
In-Reply-To: <1354317815.11754.498.camel@localhost>

My $.02 is that this would be a good place to introduce lock-free auto
resizing hash lists that are in the userspace RCU library.

It would be a non-trivial effort to put this in the kernel, but
it would let the table grow transparently.

* Re: [PATCH net-next v1 3/3] net/mlx4_en: Set number of rx/tx channels using ethtool
From: Ben Hutchings @ 2012-11-30 23:58 UTC (permalink / raw)
  To: Amir Vadai; +Cc: David S. Miller, Or Gerlitz, Oren Duer, netdev
In-Reply-To: <1354216903-830-4-git-send-email-amirv@mellanox.com>

On Thu, 2012-11-29 at 21:21 +0200, Amir Vadai wrote:
> Add support for changing the number of rx/tx channels using
> ethtool ('ethtool -[lL]'). The number of tx channels specified in ethtool
> is the number of rings per user priority, not the total number of tx rings.
> 
> Signed-off-by: Amir Vadai <amirv@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |   69 +++++++++++++++++++++++
>  drivers/net/ethernet/mellanox/mlx4/en_main.c    |    2 +-
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |   26 +++++----
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c      |    2 +-
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |    8 ++-
>  5 files changed, 93 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> index dc8ccb4..681bc1b 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> @@ -999,6 +999,73 @@ static int mlx4_en_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
[...]
> +static int mlx4_en_set_channels(struct net_device *dev,
> +		struct ethtool_channels *channel)
> +{
> +	struct mlx4_en_priv *priv = netdev_priv(dev);
> +	struct mlx4_en_dev *mdev = priv->mdev;
> +	int port_up;
> +	int err = 0;
> +
> +	if (channel->other_count || channel->combined_count ||
> +	    channel->tx_count > channel->max_tx ||
> +	    channel->rx_count > channel->max_rx ||

The values of max_tx and max_rx are passed in from userland, so you
can't trust them.

> +	    !channel->tx_count || !channel->rx_count)
> +		return -EINVAL;
[...]
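A minimal sketch of the point above, assuming a hypothetical request struct that mirrors a subset of struct ethtool_channels and hypothetical driver limits DRV_MAX_RX and DRV_MAX_TX: the bounds check uses values the driver owns, not the max_rx/max_tx fields that arrive from userland.

```c
#include <errno.h>

/* Hypothetical subset of an ethtool channel request. */
struct channel_req {
	unsigned int rx_count, tx_count, other_count, combined_count;
	unsigned int max_rx, max_tx; /* supplied by the caller: untrusted */
};

/* Hypothetical driver-owned limits. */
#define DRV_MAX_RX 16u
#define DRV_MAX_TX 8u

static int check_channels(const struct channel_req *req)
{
	/* Compare against the driver's own maxima, never req->max_rx/max_tx. */
	if (req->other_count || req->combined_count ||
	    req->rx_count == 0 || req->tx_count == 0 ||
	    req->rx_count > DRV_MAX_RX || req->tx_count > DRV_MAX_TX)
		return -EINVAL;
	return 0;
}
```

One option in a real driver would be to derive those limits from the same source its get_channels handler reports, so both paths agree.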

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: [net-next PATCH V2 1/9] net: frag evictor, avoid killing warm frag queues
From: Eric Dumazet @ 2012-11-30 23:58 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David Miller, fw, netdev, pablo, tgraf, amwang, kaber, paulmck,
	herbert
In-Reply-To: <1354317815.11754.498.camel@localhost>

On Sat, 2012-12-01 at 00:23 +0100, Jesper Dangaard Brouer wrote:


> I'm just a little puzzled (and perhaps annoyed) that you don't agree
> that the evictor code is a problem, given the tests I have provided and
> the discussion we have had.
> 
> On this mailing list we challenge and give each other a hard time on the
> technical side, as it should be.  This is nothing personal -- I don't
> take it personally; I just believe this patch is important and makes a
> difference.
> 
> 
> I want us to discuss the evictor code as such, not try to come up
> with workarounds that avoid the evictor code.
> 
> The dropping choice in the evictor code is not sound.
> 
> We are dealing with assembling fragments.  If a single fragment is lost,
> the complete datagram is lost.  The evictor code will kill off one or
> several fragments, knowing that this will invalidate the remaining
> fragments.  Under high load, the LRU list has no effect and cannot
> guide the drop choice.  The result is dropping on an "even"/fair basis,
> which will basically kill all fragment queues, letting none complete.  As
> my tests indicate, this severely affects performance, leaving nearly no
> throughput.

Give me an alternative, I'll tell you how an attacker can hurt you,
knowing the strategy you use.

Keeping around old frags is not good. After a burst of frags, you'll be
unable to recover until they are purged.

Purging old frags is the most natural way to evict incomplete messages.

(This works if your mem limits are high enough to absorb the expected
workload plus a fair amount of extra space; your results are biased by
wrong thresholds.)

Or else, an attacker only has to send incomplete messages, and your host
will fill its table and refuse your messages.
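A simplified model of the oldest-first purge described above, assuming the LRU list is an array ordered oldest to newest and that eviction stops once memory falls to a low threshold; struct frag_queue and evict_oldest() are hypothetical and are not the kernel's evictor code.

```c
#include <stddef.h>

/* Hypothetical, simplified fragment queue. */
struct frag_queue {
	size_t mem; /* bytes held by this queue */
	int dead;   /* set once the queue has been evicted */
};

/*
 * Evict queues oldest-first (index 0 is the oldest LRU entry) until the
 * total memory drops to low_thresh or below. Returns the number evicted.
 */
static size_t evict_oldest(struct frag_queue *lru, size_t n,
			   size_t *total_mem, size_t low_thresh)
{
	size_t evicted = 0;

	for (size_t i = 0; i < n && *total_mem > low_thresh; i++) {
		if (lru[i].dead)
			continue;
		*total_mem -= lru[i].mem;
		lru[i].dead = 1;
		evicted++;
	}
	return evicted;
}
```

With queues of 100, 200 and 300 bytes (600 total) and a low threshold of 350, the two oldest queues are purged and the newest one survives with 300 bytes in use.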

* Re: [net-next PATCH V2 1/9] net: frag evictor, avoid killing warm frag queues
From: Eric Dumazet @ 2012-12-01  0:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jesper Dangaard Brouer, David Miller, fw, netdev, pablo, tgraf,
	amwang, kaber, paulmck, herbert
In-Reply-To: <20121130154751.06f7d3f7@nehalam.linuxnetplumber.net>

On Fri, 2012-11-30 at 15:47 -0800, Stephen Hemminger wrote:
> My $.02 is that this would be a good place to introduce lock-free auto
> resizing hash lists that are in the userspace RCU library.
> 
> It would be a non-trivial effort to put this in the kernel, but
> it would let the table grow transparently.

Yes, but no ;)

The current hash is 64 slots; it's not like anybody wants it to be one
million slots.
