Netdev List


* Re: [PATCH] r6040: disable pci device if the subsequent calls (after pci_enable_device) fails
From: Florian Fainelli @ 2012-05-29 10:06 UTC (permalink / raw)
  To: devendra.aaru; +Cc: netdev, linux-kernel
In-Reply-To: <CAHdPZaPWM3MfyARfwOm2KkKPPvm4dEfncF47N2xASyp27HdzsA@mail.gmail.com>

On Tuesday 29 May 2012 15:28:51 devendra.aaru wrote:
> Hello Florian,
> 
> On Tue, May 29, 2012 at 2:50 PM, Florian Fainelli <florian@openwrt.org> wrote:
>
> Thanks for the Ack.
> I found one more problem: when mdiobus_alloc fails in
> r6040_init_one, we need to call netif_napi_del and pass
> NULL to pci_set_drvdata at err_out_unmap.

Ok, can you please submit a patch to fix this issue as well? Thanks!
--
Florian
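
The cleanup discussed above follows the usual goto-unwind pattern: an error label undoes, in reverse order, everything that succeeded before the failing step. Below is a minimal userspace sketch of that idea, assuming a simplified probe routine. fake_init_one, log_step and cleanup_log are hypothetical names, and the logged strings only stand in for the real calls; this is not the r6040 driver code.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Records the cleanup steps in the order they run (illustration only). */
static char cleanup_log[128];

static void log_step(const char *step)
{
	strncat(cleanup_log, step, sizeof(cleanup_log) - strlen(cleanup_log) - 1);
	strncat(cleanup_log, ";", sizeof(cleanup_log) - strlen(cleanup_log) - 1);
}

/*
 * Hypothetical probe routine. When mdio_alloc_ok is false it models
 * mdiobus_alloc() failing after NAPI and drvdata were already set up.
 */
static int fake_init_one(bool mdio_alloc_ok)
{
	int err;

	cleanup_log[0] = '\0';

	/* Earlier steps (enable, map, drvdata, NAPI) are assumed to have succeeded. */

	if (!mdio_alloc_ok) {
		err = -ENOMEM;
		goto err_out_unmap;
	}
	return 0;

err_out_unmap:
	/* The two steps the thread says are missing on this path. */
	log_step("netif_napi_del");
	log_step("pci_set_drvdata(NULL)");
	/* Remaining unwind steps, simplified. */
	log_step("unmap");
	log_step("disable");
	return err;
}
```

Calling fake_init_one(false) returns -ENOMEM and leaves the four cleanup steps in cleanup_log in the order shown; fake_init_one(true) returns 0 with an empty log.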


* Re: [PATCH] r6040: disable pci device if the subsequent calls (after pci_enable_device) fails
From: devendra.aaru @ 2012-05-29 10:13 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, linux-kernel
In-Reply-To: <3646407.261CWqact8@flexo>

Hi Florian,

>> On Tue, May 29, 2012 at 2:50 PM, Florian Fainelli <florian@openwrt.org> wrote:
>>
>> Thanks for the Ack.
>> I found one more problem: when mdiobus_alloc fails in
>> r6040_init_one, we need to call netif_napi_del and pass
>> NULL to pci_set_drvdata at err_out_unmap.
>
> Ok, can you please submit a patch to fix this issue as well? Thanks!
> --
> Florian

OK, sure. I will do it shortly.

Thanks,
Devendra.



* Re: [RFC] skb: avoid unnecessary reallocations in __skb_cow
From: Eric Dumazet @ 2012-05-29 12:34 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: netdev
In-Reply-To: <1338132370-88299-1-git-send-email-nbd@openwrt.org>

On Sun, 2012-05-27 at 17:26 +0200, Felix Fietkau wrote:
> At the beginning of __skb_cow, headroom gets set to a minimum of
> NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not
> cloned and the headroom is just below NET_SKB_PAD, but still more than the
> amount requested by the caller.
> This was showing up frequently in my tests on VLAN tx, where
> vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN).
> 
> Fix this by only setting the headroom delta if either there is less
> headroom than specified by the caller, or if reallocation has to be done
> anyway because the skb was cloned.
> 
> Signed-off-by: Felix Fietkau <nbd@openwrt.org>
> ---
>  include/linux/skbuff.h |    9 ++++++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 0e50171..1898471 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1894,12 +1894,15 @@ static inline int skb_clone_writable(const struct sk_buff *skb, unsigned int len
>  static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
>  			    int cloned)
>  {
> +	unsigned int alloc_headroom = headroom;
>  	int delta = 0;
>  
>  	if (headroom < NET_SKB_PAD)
> -		headroom = NET_SKB_PAD;
> -	if (headroom > skb_headroom(skb))
> -		delta = headroom - skb_headroom(skb);
> +		alloc_headroom = NET_SKB_PAD;
> +	if (headroom > skb_headroom(skb) ||
> +	    (cloned && alloc_headroom > skb_headroom(skb))) {
> +		delta = alloc_headroom - skb_headroom(skb);
> +	}
>  
>  	if (delta || cloned)
>  		return pskb_expand_head(skb, ALIGN(delta, NET_SKB_PAD), 0,

Nice catch.

Scratching my head on this one. Why not the obvious fix ?

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0e50171..b534a1b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1896,8 +1896,6 @@ static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
 {
 	int delta = 0;
 
-	if (headroom < NET_SKB_PAD)
-		headroom = NET_SKB_PAD;
 	if (headroom > skb_headroom(skb))
 		delta = headroom - skb_headroom(skb);
 


* Re: [RFC] skb: avoid unnecessary reallocations in __skb_cow
From: Felix Fietkau @ 2012-05-29 12:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1338294848.2840.15.camel@edumazet-glaptop>

On 2012-05-29 2:34 PM, Eric Dumazet wrote:
> On Sun, 2012-05-27 at 17:26 +0200, Felix Fietkau wrote:
>> At the beginning of __skb_cow, headroom gets set to a minimum of
>> NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not
>> cloned and the headroom is just below NET_SKB_PAD, but still more than the
>> amount requested by the caller.
>> This was showing up frequently in my tests on VLAN tx, where
>> vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN).
>> 
>> Fix this by only setting the headroom delta if either there is less
>> headroom than specified by the caller, or if reallocation has to be done
>> anyway because the skb was cloned.
>> 
>> Signed-off-by: Felix Fietkau <nbd@openwrt.org>
>> ---
>>  include/linux/skbuff.h |    9 ++++++---
>>  1 files changed, 6 insertions(+), 3 deletions(-)
>> 
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 0e50171..1898471 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -1894,12 +1894,15 @@ static inline int skb_clone_writable(const struct sk_buff *skb, unsigned int len
>>  static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
>>  			    int cloned)
>>  {
>> +	unsigned int alloc_headroom = headroom;
>>  	int delta = 0;
>>  
>>  	if (headroom < NET_SKB_PAD)
>> -		headroom = NET_SKB_PAD;
>> -	if (headroom > skb_headroom(skb))
>> -		delta = headroom - skb_headroom(skb);
>> +		alloc_headroom = NET_SKB_PAD;
>> +	if (headroom > skb_headroom(skb) ||
>> +	    (cloned && alloc_headroom > skb_headroom(skb))) {
>> +		delta = alloc_headroom - skb_headroom(skb);
>> +	}
>>  
>>  	if (delta || cloned)
>>  		return pskb_expand_head(skb, ALIGN(delta, NET_SKB_PAD), 0,
> 
> Nice catch.
> 
> Scratching my head on this one. Why not the obvious fix ?
If we're reallocating anyway, we might as well put in more headroom than
requested, in case something else needs even more than that.

- Felix


* Re: [RFC] skb: avoid unnecessary reallocations in __skb_cow
From: Eric Dumazet @ 2012-05-29 12:59 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: netdev
In-Reply-To: <4FC4C3E8.6080206@openwrt.org>

On Tue, 2012-05-29 at 14:41 +0200, Felix Fietkau wrote:
> On 2012-05-29 2:34 PM, Eric Dumazet wrote:
> > On Sun, 2012-05-27 at 17:26 +0200, Felix Fietkau wrote:
> >> At the beginning of __skb_cow, headroom gets set to a minimum of
> >> NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not
> >> cloned and the headroom is just below NET_SKB_PAD, but still more than the
> >> amount requested by the caller.
> >> This was showing up frequently in my tests on VLAN tx, where
> >> vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN).
> >> 
> >> Fix this by only setting the headroom delta if either there is less
> >> headroom than specified by the caller, or if reallocation has to be done
> >> anyway because the skb was cloned.
> >> 
> >> Signed-off-by: Felix Fietkau <nbd@openwrt.org>
> >> ---
> >>  include/linux/skbuff.h |    9 ++++++---
> >>  1 files changed, 6 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >> index 0e50171..1898471 100644
> >> --- a/include/linux/skbuff.h
> >> +++ b/include/linux/skbuff.h
> >> @@ -1894,12 +1894,15 @@ static inline int skb_clone_writable(const struct sk_buff *skb, unsigned int len
> >>  static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
> >>  			    int cloned)
> >>  {
> >> +	unsigned int alloc_headroom = headroom;
> >>  	int delta = 0;
> >>  
> >>  	if (headroom < NET_SKB_PAD)
> >> -		headroom = NET_SKB_PAD;
> >> -	if (headroom > skb_headroom(skb))
> >> -		delta = headroom - skb_headroom(skb);
> >> +		alloc_headroom = NET_SKB_PAD;
> >> +	if (headroom > skb_headroom(skb) ||
> >> +	    (cloned && alloc_headroom > skb_headroom(skb))) {
> >> +		delta = alloc_headroom - skb_headroom(skb);
> >> +	}
> >>  
> >>  	if (delta || cloned)
> >>  		return pskb_expand_head(skb, ALIGN(delta, NET_SKB_PAD), 0,
> > 
> > Nice catch.
> > 
> > Scratching my head on this one. Why not the obvious fix ?
> If we're reallocating anyway, we might as well put in more headroom than
> requested, in case something else needs even more than that.


Locally generated packets should have enough headroom, and for forward
paths, we already have NET_SKB_PAD bytes of headroom.

Adding yet another NET_SKB_PAD extra space is overkill, unless you have
a real use case in mind ?


* Re: [RFC] skb: avoid unnecessary reallocations in __skb_cow
From: Felix Fietkau @ 2012-05-29 13:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1338296361.2840.23.camel@edumazet-glaptop>

On 2012-05-29 2:59 PM, Eric Dumazet wrote:
> On Tue, 2012-05-29 at 14:41 +0200, Felix Fietkau wrote:
>> On 2012-05-29 2:34 PM, Eric Dumazet wrote:
>> > On Sun, 2012-05-27 at 17:26 +0200, Felix Fietkau wrote:
>> >> At the beginning of __skb_cow, headroom gets set to a minimum of
>> >> NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not
>> >> cloned and the headroom is just below NET_SKB_PAD, but still more than the
>> >> amount requested by the caller.
>> >> This was showing up frequently in my tests on VLAN tx, where
>> >> vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN).
>> >> 
>> >> Fix this by only setting the headroom delta if either there is less
>> >> headroom than specified by the caller, or if reallocation has to be done
>> >> anyway because the skb was cloned.
>> >> 
>> >> Signed-off-by: Felix Fietkau <nbd@openwrt.org>
>> >> ---
>> >>  include/linux/skbuff.h |    9 ++++++---
>> >>  1 files changed, 6 insertions(+), 3 deletions(-)
>> >> 
>> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> >> index 0e50171..1898471 100644
>> >> --- a/include/linux/skbuff.h
>> >> +++ b/include/linux/skbuff.h
>> >> @@ -1894,12 +1894,15 @@ static inline int skb_clone_writable(const struct sk_buff *skb, unsigned int len
>> >>  static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
>> >>  			    int cloned)
>> >>  {
>> >> +	unsigned int alloc_headroom = headroom;
>> >>  	int delta = 0;
>> >>  
>> >>  	if (headroom < NET_SKB_PAD)
>> >> -		headroom = NET_SKB_PAD;
>> >> -	if (headroom > skb_headroom(skb))
>> >> -		delta = headroom - skb_headroom(skb);
>> >> +		alloc_headroom = NET_SKB_PAD;
>> >> +	if (headroom > skb_headroom(skb) ||
>> >> +	    (cloned && alloc_headroom > skb_headroom(skb))) {
>> >> +		delta = alloc_headroom - skb_headroom(skb);
>> >> +	}
>> >>  
>> >>  	if (delta || cloned)
>> >>  		return pskb_expand_head(skb, ALIGN(delta, NET_SKB_PAD), 0,
>> > 
>> > Nice catch.
>> > 
>> > Scratching my head on this one. Why not the obvious fix ?
>> If we're reallocating anyway, we might as well put in more headroom than
>> requested, in case something else needs even more than that.
> 
> 
> Locally generated packets should have enough headroom, and for forward
> paths, we already have NET_SKB_PAD bytes of headroom.
> 
> Adding yet another NET_SKB_PAD extra space is overkill, unless you have
> a real use case in mind ?
I don't have any real use case in mind, but it's not really adding an
extra NET_SKB_PAD, it simply fills up the headroom to NET_SKB_PAD, but I
guess that's probably unnecessary as well.
I'll resend the patch without the extra padding later.

- Felix


* Re: [RFC] skb: avoid unnecessary reallocations in __skb_cow
From: Eric Dumazet @ 2012-05-29 13:26 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: netdev
In-Reply-To: <4FC4CAB1.8010000@openwrt.org>

On Tue, 2012-05-29 at 15:10 +0200, Felix Fietkau wrote:

> I don't have any real use case in mind, but it's not really adding an
> extra NET_SKB_PAD, it simply fills up the headroom to NET_SKB_PAD,

This is not what your patch does.

If cloned is true and the current skb headroom is less than 64, you add an
extra 64 bytes of headroom.

Just keep it simple, this is inline code and should be kept as small as
possible.
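
To make the difference concrete, here is a small userspace sketch of the three delta computations discussed in this thread: the code before any patch, the RFC version, and the simplified version. It assumes NET_SKB_PAD is 64, as in the message above; the cow_* functions, finish() and struct cow_result are hypothetical names that only model which arguments pskb_expand_head() would receive.

```c
#define NET_SKB_PAD 64u /* assumed, matching the 64 used in this thread */
#define VLAN_HLEN   4u

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

struct cow_result {
	int expand;         /* 1 if pskb_expand_head() would be called */
	unsigned int nhead; /* extra headroom that would be requested */
};

/* Common tail of __skb_cow: expand if delta is non-zero or the skb is cloned. */
static struct cow_result finish(unsigned int delta, int cloned)
{
	struct cow_result r = { 0, 0 };

	if (delta || cloned) {
		r.expand = 1;
		r.nhead = ALIGN(delta, NET_SKB_PAD);
	}
	return r;
}

/* Code before any patch: the request is first raised to NET_SKB_PAD. */
static struct cow_result cow_original(unsigned int headroom,
				      unsigned int have, int cloned)
{
	unsigned int delta = 0;

	if (headroom < NET_SKB_PAD)
		headroom = NET_SKB_PAD;
	if (headroom > have)
		delta = headroom - have;
	return finish(delta, cloned);
}

/* RFC version: pad up to NET_SKB_PAD only when reallocating anyway. */
static struct cow_result cow_rfc(unsigned int headroom,
				 unsigned int have, int cloned)
{
	unsigned int alloc_headroom = headroom;
	unsigned int delta = 0;

	if (headroom < NET_SKB_PAD)
		alloc_headroom = NET_SKB_PAD;
	if (headroom > have || (cloned && alloc_headroom > have))
		delta = alloc_headroom - have;
	return finish(delta, cloned);
}

/* Simplified version: no NET_SKB_PAD minimum at all. */
static struct cow_result cow_simple(unsigned int headroom,
				    unsigned int have, int cloned)
{
	unsigned int delta = 0;

	if (headroom > have)
		delta = headroom - have;
	return finish(delta, cloned);
}
```

With a request of VLAN_HLEN and 60 bytes of existing headroom, cow_original() expands by 64 bytes even when the skb is not cloned, while the other two do nothing. For a cloned skb, cow_rfc() still asks for 64 extra bytes because the delta of 4 is rounded up by ALIGN(), whereas cow_simple() reallocates with no extra headroom.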


* [PATCH] l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case
From: James Chapman @ 2012-05-29 13:30 UTC (permalink / raw)
  To: netdev; +Cc: levinsasha928, James Chapman

An application may call connect() to disconnect a socket using an
address with family AF_UNSPEC. The L2TP IP sockets were not handling
this case when the socket is not bound and an attempt to connect()
using AF_UNSPEC in such cases would result in an oops. This patch
addresses the problem by protecting the sk_prot->disconnect() call
against trying to unhash the socket before it is bound.

The L2TP IPv4 and IPv6 sockets have the same problem. Both are fixed
by this patch.

The patch also adds more checks that the sockaddr supplied to bind()
and connect() calls is valid.

 RIP: 0010:[<ffffffff82e133b0>]  [<ffffffff82e133b0>] inet_unhash+0x50/0xd0
 RSP: 0018:ffff88001989be28  EFLAGS: 00010293
 Stack:
  ffff8800407a8000 0000000000000000 ffff88001989be78 ffffffff82e3a249
  ffffffff82e3a050 ffff88001989bec8 ffff88001989be88 ffff8800407a8000
  0000000000000010 ffff88001989bec8 ffff88001989bea8 ffffffff82e42639
 Call Trace:
 [<ffffffff82e3a249>] udp_disconnect+0x1f9/0x290
 [<ffffffff82e42639>] inet_dgram_connect+0x29/0x80
 [<ffffffff82d012fc>] sys_connect+0x9c/0x100
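
For reference, the triggering sequence can be sketched from userspace as below: open an L2TP/IPv4 socket, do not bind it, and call connect() with an AF_UNSPEC address. This is a minimal sketch, not the reporter's test case. connect_unspec_unbound is a hypothetical helper name, and the fallback definition of IPPROTO_L2TP (115) is only used if the system headers do not provide it. On a kernel without this fix the call may oops, so it should only be run in a disposable test environment.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

#ifndef IPPROTO_L2TP
#define IPPROTO_L2TP 115 /* IP protocol number for L2TP */
#endif

/*
 * Open an unbound L2TP/IPv4 socket and call connect() with AF_UNSPEC.
 * Returns 0 if connect() succeeded, or a negative errno value if the
 * socket could not be created or connect() was rejected.
 */
static int connect_unspec_unbound(void)
{
	struct sockaddr sa;
	int fd, rc;

	fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_L2TP);
	if (fd < 0)
		return -errno; /* e.g. L2TP IP sockets not available */

	memset(&sa, 0, sizeof(sa));
	sa.sa_family = AF_UNSPEC; /* asks the kernel to disconnect */

	rc = connect(fd, &sa, sizeof(sa));
	if (rc < 0)
		rc = -errno; /* save errno before close() can change it */

	close(fd);
	return rc;
}
```

With the patch applied, the disconnect handler returns 0 for a socket that is still SOCK_ZAPPED, so the call is expected to complete without touching the hash tables.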

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
---
 net/l2tp/l2tp_ip.c  |   24 ++++++++++++++++++++++--
 net/l2tp/l2tp_ip6.c |   18 +++++++++++++++++-
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 889f5d1..70614e7 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -239,9 +239,16 @@ static int l2tp_ip_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct sockaddr_l2tpip *addr = (struct sockaddr_l2tpip *) uaddr;
-	int ret = -EINVAL;
+	int ret;
 	int chk_addr_ret;
 
+	if (!sock_flag(sk, SOCK_ZAPPED))
+		return -EINVAL;
+	if (addr_len < sizeof(struct sockaddr_l2tpip))
+		return -EINVAL;
+	if (addr->l2tp_family != AF_INET)
+		return -EINVAL;
+
 	ret = -EADDRINUSE;
 	read_lock_bh(&l2tp_ip_lock);
 	if (__l2tp_ip_bind_lookup(&init_net, addr->l2tp_addr.s_addr, sk->sk_bound_dev_if, addr->l2tp_conn_id))
@@ -272,6 +279,8 @@ static int l2tp_ip_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	sk_del_node_init(sk);
 	write_unlock_bh(&l2tp_ip_lock);
 	ret = 0;
+	sock_reset_flag(sk, SOCK_ZAPPED);
+
 out:
 	release_sock(sk);
 
@@ -288,6 +297,9 @@ static int l2tp_ip_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	struct sockaddr_l2tpip *lsa = (struct sockaddr_l2tpip *) uaddr;
 	int rc;
 
+	if (sock_flag(sk, SOCK_ZAPPED)) /* Must bind first - autobinding does not work */
+		return -EINVAL;
+
 	if (addr_len < sizeof(*lsa))
 		return -EINVAL;
 
@@ -311,6 +323,14 @@ static int l2tp_ip_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	return rc;
 }
 
+static int l2tp_ip_disconnect(struct sock *sk, int flags)
+{
+	if (sock_flag(sk, SOCK_ZAPPED))
+		return 0;
+
+	return udp_disconnect(sk, flags);
+}
+
 static int l2tp_ip_getname(struct socket *sock, struct sockaddr *uaddr,
 			   int *uaddr_len, int peer)
 {
@@ -530,7 +550,7 @@ static struct proto l2tp_ip_prot = {
 	.close		   = l2tp_ip_close,
 	.bind		   = l2tp_ip_bind,
 	.connect	   = l2tp_ip_connect,
-	.disconnect	   = udp_disconnect,
+	.disconnect	   = l2tp_ip_disconnect,
 	.ioctl		   = udp_ioctl,
 	.destroy	   = l2tp_ip_destroy_sock,
 	.setsockopt	   = ip_setsockopt,
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 0291d8d..35e1e4b 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -258,6 +258,10 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	int addr_type;
 	int err;
 
+	if (!sock_flag(sk, SOCK_ZAPPED))
+		return -EINVAL;
+	if (addr->l2tp_family != AF_INET6)
+		return -EINVAL;
 	if (addr_len < sizeof(*addr))
 		return -EINVAL;
 
@@ -331,6 +335,7 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	sk_del_node_init(sk);
 	write_unlock_bh(&l2tp_ip6_lock);
 
+	sock_reset_flag(sk, SOCK_ZAPPED);
 	release_sock(sk);
 	return 0;
 
@@ -354,6 +359,9 @@ static int l2tp_ip6_connect(struct sock *sk, struct sockaddr *uaddr,
 	int	addr_type;
 	int rc;
 
+	if (sock_flag(sk, SOCK_ZAPPED)) /* Must bind first - autobinding does not work */
+		return -EINVAL;
+
 	if (addr_len < sizeof(*lsa))
 		return -EINVAL;
 
@@ -383,6 +391,14 @@ static int l2tp_ip6_connect(struct sock *sk, struct sockaddr *uaddr,
 	return rc;
 }
 
+static int l2tp_ip6_disconnect(struct sock *sk, int flags)
+{
+	if (sock_flag(sk, SOCK_ZAPPED))
+		return 0;
+
+	return udp_disconnect(sk, flags);
+}
+
 static int l2tp_ip6_getname(struct socket *sock, struct sockaddr *uaddr,
 			    int *uaddr_len, int peer)
 {
@@ -689,7 +705,7 @@ static struct proto l2tp_ip6_prot = {
 	.close		   = l2tp_ip6_close,
 	.bind		   = l2tp_ip6_bind,
 	.connect	   = l2tp_ip6_connect,
-	.disconnect	   = udp_disconnect,
+	.disconnect	   = l2tp_ip6_disconnect,
 	.ioctl		   = udp_ioctl,
 	.destroy	   = l2tp_ip6_destroy_sock,
 	.setsockopt	   = ipv6_setsockopt,
-- 
1.7.0.4


* [PATCH] skb: avoid unnecessary reallocations in __skb_cow
From: Felix Fietkau @ 2012-05-29 13:35 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet

At the beginning of __skb_cow, headroom gets set to a minimum of
NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not
cloned and the headroom is just below NET_SKB_PAD, but still more than the
amount requested by the caller.
This was showing up frequently in my tests on VLAN tx, where
vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN).

Locally generated packets should have enough headroom, and for forward
paths, we already have NET_SKB_PAD bytes of headroom, so we don't need to
add any extra space here.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
---
 include/linux/skbuff.h |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0e50171..b534a1b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1896,8 +1896,6 @@ static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
 {
 	int delta = 0;

-	if (headroom < NET_SKB_PAD)
-		headroom = NET_SKB_PAD;
 	if (headroom > skb_headroom(skb))
 		delta = headroom - skb_headroom(skb);

-- 
1.7.3.2


* Re: [PATCH] skb: avoid unnecessary reallocations in __skb_cow
From: Eric Dumazet @ 2012-05-29 13:43 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: netdev
In-Reply-To: <1338298508-40376-1-git-send-email-nbd@openwrt.org>

On Tue, 2012-05-29 at 15:35 +0200, Felix Fietkau wrote:

> 
> Signed-off-by: Felix Fietkau <nbd@openwrt.org>
> ---
>  include/linux/skbuff.h |    2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
> 

 

Signed-off-by: Eric Dumazet <edumazet@google.com>

Thanks !


* Re: Strange latency spikes/TX network stalls on Sun Fire X4150(x86) and e1000e
From: Hiroaki SHIMODA @ 2012-05-29 14:25 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Denys Fedoryshchenko, netdev, e1000-devel, jeffrey.t.kirsher,
	jesse.brandeburg, eric.dumazet, davem
In-Reply-To: <CA+mtBx_sF5GCMRpLQuTruZ=xpFTFpd5z8SZJaG_dBqf4oCXpwg@mail.gmail.com>

On Sun, 20 May 2012 10:40:41 -0700
Tom Herbert <therbert@google.com> wrote:

> Tried to reproduce:
> 
> May 20 10:08:30 test kernel: [    6.168240] e1000e 0000:06:00.0:
> (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to
> dynamic conservative mode
> May 20 10:08:30 test kernel: [    6.221591] e1000e 0000:06:00.1:
> (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to
> dynamic conservative mode
> 
> 06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit
> Ethernet Controller (Copper) (rev 01)
> 06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit
> Ethernet Controller (Copper) (rev 01)
> 
> Following above instructions to repro gives:
> 
> 1480 bytes from test2 (192.168.2.49): icmp_req=5875 ttl=64 time=0.358 ms
> 1480 bytes from test2 (192.168.2.49): icmp_req=5876 ttl=64 time=0.330 ms
> 1480 bytes from test2 (192.168.2.49): icmp_req=5877 ttl=64 time=0.337 ms
> 1480 bytes from test2 (192.168.2.49): icmp_req=5878 ttl=64 time=0.375 ms
> 1480 bytes from test2 (192.168.2.49): icmp_req=5879 ttl=64 time=0.359 ms
> 1480 bytes from lpb49.prod.google.com (192.168.2.49): icmp_req=5880
> ttl=64 time=0.380 ms
> 
> And I didn't see the stalls. This was on an Intel machine.  The limit
> was stable, went up to around 28K when opened large file and tended to
> stay between 15-28K.
> 
> The described problem seems to have the characteristic that transmit
> interrupts are not at all periodic, and it would seem that some are
> taking hundreds of milliseconds to pop.  I don't see anything that
> would cause that in the NIC. Is it possible that there is some activity on
> the machines periodically and often holding down interrupts for long
> periods of time?  Are there any peculiarities on Sun Fire in interrupt
> handling?
> 
> Can you also provide an 'ethtool -c eth0'
> 
> Thanks,
> Tom

I also observed similar behaviour in the following environment.

03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

[    2.962119] e1000e: Intel(R) PRO/1000 Network Driver - 2.0.0-k
[    2.968095] e1000e: Copyright(c) 1999 - 2012 Intel Corporation.
[    2.974251] e1000e 0000:03:00.0: Disabling ASPM L0s L1
[    2.979653] e1000e 0000:03:00.0: (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    2.991599] e1000e 0000:03:00.0: irq 72 for MSI/MSI-X
[    2.991606] e1000e 0000:03:00.0: irq 73 for MSI/MSI-X
[    2.991611] e1000e 0000:03:00.0: irq 74 for MSI/MSI-X
[    3.092768] e1000e 0000:03:00.0: eth0: (PCI Express:2.5GT/s:Width x1) 48:5b:39:75:91:bd
[    3.100992] e1000e 0000:03:00.0: eth0: Intel(R) PRO/1000 Network Connection
[    3.108173] e1000e 0000:03:00.0: eth0: MAC: 3, PHY: 8, PBA No: FFFFFF-0FF

I tried several coalescing options with 'ethtool -C eth0', but
none of them helped.

If I understand the code and spec correctly, TX interrupts are
generated when TXDCTL.WTHRESH descriptors have been accumulated
and written back.

I tentatively changed TXDCTL.WTHRESH to 1, and the latency
spikes seem to disappear.

drivers/net/ethernet/intel/e1000e/e1000.h
@@ -181,7 +181,7 @@ struct e1000_info;
 #define E1000_TXDCTL_DMA_BURST_ENABLE                          \
        (E1000_TXDCTL_GRAN | /* set descriptor granularity */  \
         E1000_TXDCTL_COUNT_DESC |                             \
-        (5 << 16) | /* wthresh must be +1 more than desired */\
+        (1 << 16) | /* wthresh must be +1 more than desired */\
         (1 << 8)  | /* hthresh */                             \
         0x1f)       /* pthresh */
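
As a sketch of what the one-line change does to the register value, the helpers below build the burst-enable value the same way the macro does and read the WTHRESH field back. The shifts (16 and 8) and the 0x1f prefetch threshold come from the macro above; the values used for the GRAN and COUNT_DESC bits and the 6-bit field width are assumptions here and should be checked against the driver headers and the controller datasheet. txdctl_burst_value and txdctl_wthresh are hypothetical names.

```c
#include <stdint.h>

#define TXDCTL_GRAN          0x01000000u /* assumed bit value */
#define TXDCTL_COUNT_DESC    0x00400000u /* assumed bit value */
#define TXDCTL_WTHRESH_SHIFT 16
#define TXDCTL_HTHRESH_SHIFT 8
#define TXDCTL_FIELD_MASK    0x3fu       /* assumed 6-bit threshold fields */

/* Compose the value the same way E1000_TXDCTL_DMA_BURST_ENABLE does. */
static uint32_t txdctl_burst_value(uint32_t wthresh)
{
	return TXDCTL_GRAN | TXDCTL_COUNT_DESC |
	       (wthresh << TXDCTL_WTHRESH_SHIFT) |
	       (1u << TXDCTL_HTHRESH_SHIFT) |
	       0x1fu;
}

/* Extract the write-back threshold field from a TXDCTL value. */
static uint32_t txdctl_wthresh(uint32_t reg)
{
	return (reg >> TXDCTL_WTHRESH_SHIFT) & TXDCTL_FIELD_MASK;
}
```

Under these assumptions, txdctl_wthresh(txdctl_burst_value(5)) gives 5 and txdctl_wthresh(txdctl_burst_value(1)) gives 1, and the two register values differ only in the WTHRESH field.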

(before) $ ping -i0.2 192.168.11.2
PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
64 bytes from 192.168.11.2: icmp_req=1 ttl=64 time=0.191 ms
64 bytes from 192.168.11.2: icmp_req=2 ttl=64 time=0.179 ms
64 bytes from 192.168.11.2: icmp_req=3 ttl=64 time=0.199 ms
64 bytes from 192.168.11.2: icmp_req=4 ttl=64 time=0.143 ms
64 bytes from 192.168.11.2: icmp_req=5 ttl=64 time=0.193 ms
64 bytes from 192.168.11.2: icmp_req=6 ttl=64 time=0.150 ms
64 bytes from 192.168.11.2: icmp_req=7 ttl=64 time=0.186 ms
64 bytes from 192.168.11.2: icmp_req=8 ttl=64 time=0.198 ms
64 bytes from 192.168.11.2: icmp_req=9 ttl=64 time=0.195 ms
64 bytes from 192.168.11.2: icmp_req=10 ttl=64 time=0.194 ms
64 bytes from 192.168.11.2: icmp_req=11 ttl=64 time=0.196 ms
64 bytes from 192.168.11.2: icmp_req=12 ttl=64 time=0.200 ms
64 bytes from 192.168.11.2: icmp_req=13 ttl=64 time=651 ms
64 bytes from 192.168.11.2: icmp_req=14 ttl=64 time=451 ms
64 bytes from 192.168.11.2: icmp_req=15 ttl=64 time=241 ms
64 bytes from 192.168.11.2: icmp_req=16 ttl=64 time=31.3 ms
64 bytes from 192.168.11.2: icmp_req=17 ttl=64 time=0.184 ms
64 bytes from 192.168.11.2: icmp_req=18 ttl=64 time=0.199 ms
64 bytes from 192.168.11.2: icmp_req=19 ttl=64 time=0.197 ms
64 bytes from 192.168.11.2: icmp_req=20 ttl=64 time=0.196 ms
64 bytes from 192.168.11.2: icmp_req=21 ttl=64 time=0.192 ms
64 bytes from 192.168.11.2: icmp_req=22 ttl=64 time=0.205 ms
64 bytes from 192.168.11.2: icmp_req=23 ttl=64 time=629 ms
64 bytes from 192.168.11.2: icmp_req=24 ttl=64 time=419 ms
64 bytes from 192.168.11.2: icmp_req=25 ttl=64 time=209 ms
64 bytes from 192.168.11.2: icmp_req=26 ttl=64 time=0.280 ms
64 bytes from 192.168.11.2: icmp_req=27 ttl=64 time=0.193 ms
64 bytes from 192.168.11.2: icmp_req=28 ttl=64 time=0.194 ms
64 bytes from 192.168.11.2: icmp_req=29 ttl=64 time=0.143 ms
64 bytes from 192.168.11.2: icmp_req=30 ttl=64 time=0.191 ms
64 bytes from 192.168.11.2: icmp_req=31 ttl=64 time=0.144 ms
64 bytes from 192.168.11.2: icmp_req=32 ttl=64 time=0.192 ms
64 bytes from 192.168.11.2: icmp_req=33 ttl=64 time=0.199 ms
64 bytes from 192.168.11.2: icmp_req=34 ttl=64 time=0.193 ms
64 bytes from 192.168.11.2: icmp_req=35 ttl=64 time=0.196 ms
64 bytes from 192.168.11.2: icmp_req=36 ttl=64 time=0.196 ms
64 bytes from 192.168.11.2: icmp_req=37 ttl=64 time=0.196 ms
64 bytes from 192.168.11.2: icmp_req=38 ttl=64 time=1600 ms
64 bytes from 192.168.11.2: icmp_req=39 ttl=64 time=1390 ms
64 bytes from 192.168.11.2: icmp_req=40 ttl=64 time=1180 ms
64 bytes from 192.168.11.2: icmp_req=41 ttl=64 time=980 ms
64 bytes from 192.168.11.2: icmp_req=42 ttl=64 time=780 ms
64 bytes from 192.168.11.2: icmp_req=43 ttl=64 time=570 ms
64 bytes from 192.168.11.2: icmp_req=44 ttl=64 time=0.151 ms
64 bytes from 192.168.11.2: icmp_req=45 ttl=64 time=0.189 ms
64 bytes from 192.168.11.2: icmp_req=46 ttl=64 time=0.203 ms
64 bytes from 192.168.11.2: icmp_req=47 ttl=64 time=0.185 ms
64 bytes from 192.168.11.2: icmp_req=48 ttl=64 time=0.189 ms
64 bytes from 192.168.11.2: icmp_req=49 ttl=64 time=0.204 ms
64 bytes from 192.168.11.2: icmp_req=50 ttl=64 time=0.198 ms

I think the 1000 ms - 2000 ms delay comes from e1000_watchdog_task().
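
One possible reading of the (before) trace, offered as an interpretation rather than a measured fact: the delayed packets are not lost but held and then released together. If that is the case, replies within one burst should show RTTs that drop by roughly the send interval (200 ms with ping -i0.2) from one reply to the next. The hypothetical helper below computes that average step for a burst.

```c
#include <stddef.h>

/*
 * Average drop in RTT between consecutive replies of one burst.
 * Returns 0 for bursts with fewer than two samples.
 */
static double mean_rtt_step_ms(const double *rtt_ms, size_t n)
{
	if (n < 2)
		return 0.0;
	return (rtt_ms[0] - rtt_ms[n - 1]) / (double)(n - 1);
}
```

For the first burst (651, 451, 241, 31.3 ms) this gives about 207 ms per step, and for the third burst (1600, 1390, 1180, 980, 780, 570 ms) it gives 206 ms, both close to the 200 ms send interval.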

(after) $ ping -i0.2 192.168.11.2
64 bytes from 192.168.11.2: icmp_req=1 ttl=64 time=0.175 ms
64 bytes from 192.168.11.2: icmp_req=2 ttl=64 time=0.203 ms
64 bytes from 192.168.11.2: icmp_req=3 ttl=64 time=0.196 ms
64 bytes from 192.168.11.2: icmp_req=4 ttl=64 time=0.197 ms
64 bytes from 192.168.11.2: icmp_req=5 ttl=64 time=0.186 ms
64 bytes from 192.168.11.2: icmp_req=6 ttl=64 time=0.197 ms
64 bytes from 192.168.11.2: icmp_req=7 ttl=64 time=0.189 ms
64 bytes from 192.168.11.2: icmp_req=8 ttl=64 time=0.146 ms
64 bytes from 192.168.11.2: icmp_req=9 ttl=64 time=0.193 ms
64 bytes from 192.168.11.2: icmp_req=10 ttl=64 time=0.194 ms
64 bytes from 192.168.11.2: icmp_req=11 ttl=64 time=0.195 ms
64 bytes from 192.168.11.2: icmp_req=12 ttl=64 time=0.190 ms
64 bytes from 192.168.11.2: icmp_req=13 ttl=64 time=0.204 ms
64 bytes from 192.168.11.2: icmp_req=14 ttl=64 time=0.201 ms
64 bytes from 192.168.11.2: icmp_req=15 ttl=64 time=0.189 ms
64 bytes from 192.168.11.2: icmp_req=16 ttl=64 time=0.193 ms
64 bytes from 192.168.11.2: icmp_req=17 ttl=64 time=0.190 ms
64 bytes from 192.168.11.2: icmp_req=18 ttl=64 time=0.143 ms
64 bytes from 192.168.11.2: icmp_req=19 ttl=64 time=0.191 ms
64 bytes from 192.168.11.2: icmp_req=20 ttl=64 time=0.190 ms


* RE: [PATCH net-next] iwlwifi: dont pull too much payload in skb head
From: Berg, Johannes @ 2012-05-29 14:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Guy, Wey-Yi W
In-Reply-To: <1337354484.7029.42.camel@edumazet-glaptop>

> > We may want to move this code into mac80211 later though since it also
> > has an if (pull in everything, even reallocating if necessary, if it's
> > a management frame), but that can wait, I think we're the only driver
> > using paged RX.
> 
> This is OK, these frames wont be injected in linux IP/TCP stack.

Right.

> Or maybe you would like an optimized version of skb_header_pointer(),
> avoiding the copy if the whole blob can be part of _one_ fragment ?

Hmm. I guess that would work, but I'm not sure it's worth the effort since there typically aren't many management frames. We'd have to replace all skb->data, the entire mac80211 assumes that management frames are linear. 

johannes
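
To illustrate the idea of a header-pointer helper that avoids the copy when the requested bytes sit entirely in one fragment, here is a small userspace model. It is a sketch under simplifying assumptions: struct pkt and pkt_header_pointer are hypothetical, the packet has one linear head and a single fragment, and kernel concerns such as mapping fragment pages are ignored. It is not the skb_header_pointer() implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical two-part packet: a linear head plus one paged fragment. */
struct pkt {
	const unsigned char *head;
	size_t head_len;
	const unsigned char *frag;
	size_t frag_len;
};

/*
 * Return a pointer to len bytes starting at offset. No copy is made when
 * the range lies entirely in the head or entirely in the fragment;
 * otherwise the bytes are gathered into buf (at least len bytes long).
 * Returns NULL if the range is out of bounds.
 */
static const void *pkt_header_pointer(const struct pkt *p, size_t offset,
				      size_t len, void *buf)
{
	size_t total = p->head_len + p->frag_len;
	size_t in_head;

	if (offset > total || len > total - offset)
		return NULL;
	if (offset + len <= p->head_len)
		return p->head + offset;                 /* fully linear */
	if (offset >= p->head_len)
		return p->frag + (offset - p->head_len); /* fully in fragment */

	/* The range straddles the boundary: copy both pieces. */
	in_head = p->head_len - offset;
	memcpy(buf, p->head + offset, in_head);
	memcpy((unsigned char *)buf + in_head, p->frag, len - in_head);
	return buf;
}
```

Only the straddling case pays for a copy. Callers still have to go through the returned pointer rather than assuming the data is linear, which is the mac80211 assumption mentioned above.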


* Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-05-29 14:46 UTC (permalink / raw)
  To: netdev

Hi list,

I am using a NC553i ethernet card connected on a HP 10GbE Flex-10.
I am sending UDP multicast packets from one blade to another (HP
ProLiant BL460c G7) which has exactly the same hardware.

I have lots of packet loss from Tx to Rx, and I can't understand why.
I suspected TX coalescing but since 3.4 I can't set this parameter
(and adaptive-tx is on by default).
I have tried the same test with a debian lenny (2.6.26 kernel and HP
drivers) and it works very well (adaptive-tx is off).

Here is the netstat output (from the Tx point of view):

$> netstat -s eth1 > before ; sleep 10 ; netstat -s eth1 > after
$> beforeafter before after
Ip:
    280769 total packets received
    4 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    275063 incoming packets delivered
    305430 requests sent out
    0 dropped because of missing route
Icmp:
    0 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 0
        echo requests: 0
    0 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 0
        echo replies: 0
IcmpMsg:
        InType3: 0
        InType8: 0
        OutType0: 0
        OutType3: 0
Tcp:
    18 active connections openings
    18 passive connection openings
    0 failed connection attempts
    0 connection resets received
    0 connections established
    3681 segments received
    3650 segments send out
    0 segments retransmited
    0 bad segments received.
    0 resets sent
Udp:
    12626 packets received
    0 packets to unknown port received.
    0 packet receive errors
    259025 packets sent
UdpLite:
TcpExt:
    0 invalid SYN cookies received
    0 packets pruned from receive queue because of socket buffer overrun
    14 TCP sockets finished time wait in fast timer
    0 packets rejects in established connections because of timestamp
    61 delayed acks sent
    0 delayed acks further delayed because of locked socket
    Quick ack mode was activated 0 times
    2924 packets directly queued to recvmsg prequeue.
    32 bytes directly in process context from backlog
    48684 bytes directly received in process context from prequeue
    232 packet headers predicted
    1991 packets header predicted and directly queued to user
    132 acknowledgments not containing data payload received
    2230 predicted acknowledgments
    0 times recovered from packet loss by selective acknowledgements
    0 congestion windows recovered without slow start after partial ack
    0 TCP data loss events
    0 timeouts after SACK recovery
    0 fast retransmits
    0 forward retransmits
    0 retransmits in slow start
    0 other TCP timeouts
    1 times receiver scheduled too late for direct processing
    0 packets collapsed in receive queue due to low socket buffer
    0 DSACKs sent for old packets
    0 DSACKs received
    0 connections reset due to unexpected data
    0 connections reset due to early user close
    0 connections aborted due to timeout
    0 times unabled to send RST due to no memory
    TCPSackShifted: 0
    TCPSackMerged: 0
    TCPSackShiftFallback: 0
    TCPBacklogDrop: 0
    TCPDeferAcceptDrop: 0
IpExt:
    InMcastPkts: -652745397
    OutMcastPkts: 301498
    InBcastPkts: 13
    InOctets: -2004227752
    OutOctets: -2096666083
    InMcastOctets: 1058181285
    OutMcastOctets: -1510963815
    InBcastOctets: 1014

And ethtool diff :
$> ethtool -S eth1 > before ; sleep 10 ; ethtool -S eth1 > after
$> beforeafter before after
NIC statistics:
     rx_crc_errors: 0
     rx_alignment_symbol_errors: 0
     rx_pause_frames: 0
     rx_control_frames: 0
     rx_in_range_errors: 0
     rx_out_range_errors: 0
     rx_frame_too_long: 0
     rx_address_mismatch_drops: 6
     rx_dropped_too_small: 0
     rx_dropped_too_short: 0
     rx_dropped_header_too_small: 0
     rx_dropped_tcp_length: 0
     rx_dropped_runt: 0
     rxpp_fifo_overflow_drop: 0
     rx_input_fifo_overflow_drop: 0
     rx_ip_checksum_errs: 0
     rx_tcp_checksum_errs: 0
     rx_udp_checksum_errs: 0
     tx_pauseframes: 0
     tx_controlframes: 0
     rx_priority_pause_frames: 0
     pmem_fifo_overflow_drop: 0
     jabber_events: 0
     rx_drops_no_pbuf: 0
     rx_drops_no_erx_descr: 0
     rx_drops_no_tpre_descr: 0
     rx_drops_too_many_frags: 0
     forwarded_packets: 0
     rx_drops_mtu: 0
     eth_red_drops: 0
     be_on_die_temperature: 0
     rxq0: rx_bytes: 0
     rxq0: rx_pkts: 0
     rxq0: rx_compl: 0
     rxq0: rx_mcast_pkts: 0
     rxq0: rx_post_fail: 0
     rxq0: rx_drops_no_skbs: 0
     rxq0: rx_drops_no_frags: 0
     txq0: tx_compl: 257113
     txq0: tx_bytes: 1038623935
     txq0: tx_pkts: 257113
     txq0: tx_reqs: 257113
     txq0: tx_wrbs: 514226
     txq0: tx_stops: 10

As you can see, there are 10 tx_stops in 10 seconds (it varies, from 3 to 15).
Any thoughts?

Regards,
JM
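
For reference, a before/after counter diff of the kind shown above reduces to
one subtraction per counter. The sketch below is a minimal illustration,
assuming each counter is an unsigned 64-bit value that only increases;
counter_delta and counter_rate are hypothetical helper names, not part of
ethtool or of the beforeafter tool. Doing the subtraction in an unsigned type
keeps the delta positive across a single counter wrap, whereas negative deltas
like the IpExt values above may indicate a signed or narrower type somewhere
in the tooling.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Delta between two samples of a monotonically increasing counter.
 * Unsigned subtraction wraps modulo 2^64, so a single wrap between
 * the two samples still yields the correct positive delta.
 */
static uint64_t counter_delta(uint64_t before, uint64_t after)
{
	return after - before;
}

/* Average events per second over the sampling window (integer division). */
static uint64_t counter_rate(uint64_t delta, unsigned int seconds)
{
	assert(seconds > 0);
	return delta / seconds;
}
```

With the txq0 tx_pkts delta of 257113 over the 10 second window,
counter_rate(257113, 10) gives 25711 packets per second.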

^ permalink raw reply

* Re: Strange latency spikes/TX network stalls on Sun Fire X4150(x86) and e1000e
From: Tom Herbert @ 2012-05-29 14:54 UTC (permalink / raw)
  To: Hiroaki SHIMODA
  Cc: Denys Fedoryshchenko, netdev, e1000-devel, jeffrey.t.kirsher,
	jesse.brandeburg, eric.dumazet, davem
In-Reply-To: <20120529232518.e5b41759.shimoda.hiroaki@gmail.com>

Thanks, Hiroaki, for this description; it looks promising.  Denys, can
you test with his patch?

Tom

On Tue, May 29, 2012 at 7:25 AM, Hiroaki SHIMODA
<shimoda.hiroaki@gmail.com> wrote:
> On Sun, 20 May 2012 10:40:41 -0700
> Tom Herbert <therbert@google.com> wrote:
>
>> Tried to reproduce:
>>
>> May 20 10:08:30 test kernel: [    6.168240] e1000e 0000:06:00.0:
>> (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to
>> dynamic conservative mode
>> May 20 10:08:30 test kernel: [    6.221591] e1000e 0000:06:00.1:
>> (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to
>> dynamic conservative mode
>>
>> 06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit
>> Ethernet Controller (Copper) (rev 01)
>> 06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit
>> Ethernet Controller (Copper) (rev 01)
>>
>> Following above instructions to repro gives:
>>
>> 1480 bytes from test2 (192.168.2.49): icmp_req=5875 ttl=64 time=0.358 ms
>> 1480 bytes from test2 (192.168.2.49): icmp_req=5876 ttl=64 time=0.330 ms
>> 1480 bytes from test2 (192.168.2.49): icmp_req=5877 ttl=64 time=0.337 ms
>> 1480 bytes from test2 (192.168.2.49): icmp_req=5878 ttl=64 time=0.375 ms
>> 1480 bytes from test2 (192.168.2.49): icmp_req=5879 ttl=64 time=0.359 ms
>> 1480 bytes from lpb49.prod.google.com (192.168.2.49): icmp_req=5880
>> ttl=64 time=0.380 ms
>>
>> And I didn't see the stalls. This was on an Intel machine.  The limit
>> was stable, went up to around 28K when opened large file and tended to
>> stay between 15-28K.
>>
>> The described problem seems to have the characteristic that transmit
>> interrupts are not at all periodic, and it would seem that some are
>> taking hundreds of milliseconds to pop.  I don't see anything that
>> would cause that in the NIC.  Is it possible that some activity on
>> the machines is periodically, and often, holding off interrupts for
>> long periods of time?  Are there any peculiarities on Sun Fire in
>> interrupt handling?
>>
>> Can you also provide the output of 'ethtool -c eth0'?
>>
>> Thanks,
>> Tom
>
> I also observed similar behaviour in the following environment.
>
> 03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
>
> [    2.962119] e1000e: Intel(R) PRO/1000 Network Driver - 2.0.0-k
> [    2.968095] e1000e: Copyright(c) 1999 - 2012 Intel Corporation.
> [    2.974251] e1000e 0000:03:00.0: Disabling ASPM L0s L1
> [    2.979653] e1000e 0000:03:00.0: (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
> [    2.991599] e1000e 0000:03:00.0: irq 72 for MSI/MSI-X
> [    2.991606] e1000e 0000:03:00.0: irq 73 for MSI/MSI-X
> [    2.991611] e1000e 0000:03:00.0: irq 74 for MSI/MSI-X
> [    3.092768] e1000e 0000:03:00.0: eth0: (PCI Express:2.5GT/s:Width x1) 48:5b:39:75:91:bd
> [ 3.100992] e1000e 0000:03:00.0: eth0: Intel(R) PRO/1000 Network Connection
> [ 3.108173] e1000e 0000:03:00.0: eth0: MAC: 3, PHY: 8, PBA No: FFFFFF-0FF
>
> I tried some coalesce options with 'ethtool -C eth0', but
> nothing helped.
>
> If I understand the code and spec correctly, TX interrupts are
> generated when TXDCTL.WTHRESH descriptors have been accumulated
> and written back.
>
> I tentatively changed TXDCTL.WTHRESH to 1, and the latency
> spikes seem to disappear.
>
> drivers/net/ethernet/intel/e1000e/e1000.h
> @@ -181,7 +181,7 @@ struct e1000_info;
>  #define E1000_TXDCTL_DMA_BURST_ENABLE                          \
>        (E1000_TXDCTL_GRAN | /* set descriptor granularity */  \
>         E1000_TXDCTL_COUNT_DESC |                             \
> -        (5 << 16) | /* wthresh must be +1 more than desired */\
> +        (1 << 16) | /* wthresh must be +1 more than desired */\
>         (1 << 8)  | /* hthresh */                             \
>         0x1f)       /* pthresh */
>
> (before) $ ping -i0.2 192.168.11.2
> PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> 64 bytes from 192.168.11.2: icmp_req=1 ttl=64 time=0.191 ms
> 64 bytes from 192.168.11.2: icmp_req=2 ttl=64 time=0.179 ms
> 64 bytes from 192.168.11.2: icmp_req=3 ttl=64 time=0.199 ms
> 64 bytes from 192.168.11.2: icmp_req=4 ttl=64 time=0.143 ms
> 64 bytes from 192.168.11.2: icmp_req=5 ttl=64 time=0.193 ms
> 64 bytes from 192.168.11.2: icmp_req=6 ttl=64 time=0.150 ms
> 64 bytes from 192.168.11.2: icmp_req=7 ttl=64 time=0.186 ms
> 64 bytes from 192.168.11.2: icmp_req=8 ttl=64 time=0.198 ms
> 64 bytes from 192.168.11.2: icmp_req=9 ttl=64 time=0.195 ms
> 64 bytes from 192.168.11.2: icmp_req=10 ttl=64 time=0.194 ms
> 64 bytes from 192.168.11.2: icmp_req=11 ttl=64 time=0.196 ms
> 64 bytes from 192.168.11.2: icmp_req=12 ttl=64 time=0.200 ms
> 64 bytes from 192.168.11.2: icmp_req=13 ttl=64 time=651 ms
> 64 bytes from 192.168.11.2: icmp_req=14 ttl=64 time=451 ms
> 64 bytes from 192.168.11.2: icmp_req=15 ttl=64 time=241 ms
> 64 bytes from 192.168.11.2: icmp_req=16 ttl=64 time=31.3 ms
> 64 bytes from 192.168.11.2: icmp_req=17 ttl=64 time=0.184 ms
> 64 bytes from 192.168.11.2: icmp_req=18 ttl=64 time=0.199 ms
> 64 bytes from 192.168.11.2: icmp_req=19 ttl=64 time=0.197 ms
> 64 bytes from 192.168.11.2: icmp_req=20 ttl=64 time=0.196 ms
> 64 bytes from 192.168.11.2: icmp_req=21 ttl=64 time=0.192 ms
> 64 bytes from 192.168.11.2: icmp_req=22 ttl=64 time=0.205 ms
> 64 bytes from 192.168.11.2: icmp_req=23 ttl=64 time=629 ms
> 64 bytes from 192.168.11.2: icmp_req=24 ttl=64 time=419 ms
> 64 bytes from 192.168.11.2: icmp_req=25 ttl=64 time=209 ms
> 64 bytes from 192.168.11.2: icmp_req=26 ttl=64 time=0.280 ms
> 64 bytes from 192.168.11.2: icmp_req=27 ttl=64 time=0.193 ms
> 64 bytes from 192.168.11.2: icmp_req=28 ttl=64 time=0.194 ms
> 64 bytes from 192.168.11.2: icmp_req=29 ttl=64 time=0.143 ms
> 64 bytes from 192.168.11.2: icmp_req=30 ttl=64 time=0.191 ms
> 64 bytes from 192.168.11.2: icmp_req=31 ttl=64 time=0.144 ms
> 64 bytes from 192.168.11.2: icmp_req=32 ttl=64 time=0.192 ms
> 64 bytes from 192.168.11.2: icmp_req=33 ttl=64 time=0.199 ms
> 64 bytes from 192.168.11.2: icmp_req=34 ttl=64 time=0.193 ms
> 64 bytes from 192.168.11.2: icmp_req=35 ttl=64 time=0.196 ms
> 64 bytes from 192.168.11.2: icmp_req=36 ttl=64 time=0.196 ms
> 64 bytes from 192.168.11.2: icmp_req=37 ttl=64 time=0.196 ms
> 64 bytes from 192.168.11.2: icmp_req=38 ttl=64 time=1600 ms
> 64 bytes from 192.168.11.2: icmp_req=39 ttl=64 time=1390 ms
> 64 bytes from 192.168.11.2: icmp_req=40 ttl=64 time=1180 ms
> 64 bytes from 192.168.11.2: icmp_req=41 ttl=64 time=980 ms
> 64 bytes from 192.168.11.2: icmp_req=42 ttl=64 time=780 ms
> 64 bytes from 192.168.11.2: icmp_req=43 ttl=64 time=570 ms
> 64 bytes from 192.168.11.2: icmp_req=44 ttl=64 time=0.151 ms
> 64 bytes from 192.168.11.2: icmp_req=45 ttl=64 time=0.189 ms
> 64 bytes from 192.168.11.2: icmp_req=46 ttl=64 time=0.203 ms
> 64 bytes from 192.168.11.2: icmp_req=47 ttl=64 time=0.185 ms
> 64 bytes from 192.168.11.2: icmp_req=48 ttl=64 time=0.189 ms
> 64 bytes from 192.168.11.2: icmp_req=49 ttl=64 time=0.204 ms
> 64 bytes from 192.168.11.2: icmp_req=50 ttl=64 time=0.198 ms
>
> I think the 1000 ms - 2000 ms delay comes from e1000_watchdog_task().
>
> (after) $ ping -i0.2 192.168.11.2
> 64 bytes from 192.168.11.2: icmp_req=1 ttl=64 time=0.175 ms
> 64 bytes from 192.168.11.2: icmp_req=2 ttl=64 time=0.203 ms
> 64 bytes from 192.168.11.2: icmp_req=3 ttl=64 time=0.196 ms
> 64 bytes from 192.168.11.2: icmp_req=4 ttl=64 time=0.197 ms
> 64 bytes from 192.168.11.2: icmp_req=5 ttl=64 time=0.186 ms
> 64 bytes from 192.168.11.2: icmp_req=6 ttl=64 time=0.197 ms
> 64 bytes from 192.168.11.2: icmp_req=7 ttl=64 time=0.189 ms
> 64 bytes from 192.168.11.2: icmp_req=8 ttl=64 time=0.146 ms
> 64 bytes from 192.168.11.2: icmp_req=9 ttl=64 time=0.193 ms
> 64 bytes from 192.168.11.2: icmp_req=10 ttl=64 time=0.194 ms
> 64 bytes from 192.168.11.2: icmp_req=11 ttl=64 time=0.195 ms
> 64 bytes from 192.168.11.2: icmp_req=12 ttl=64 time=0.190 ms
> 64 bytes from 192.168.11.2: icmp_req=13 ttl=64 time=0.204 ms
> 64 bytes from 192.168.11.2: icmp_req=14 ttl=64 time=0.201 ms
> 64 bytes from 192.168.11.2: icmp_req=15 ttl=64 time=0.189 ms
> 64 bytes from 192.168.11.2: icmp_req=16 ttl=64 time=0.193 ms
> 64 bytes from 192.168.11.2: icmp_req=17 ttl=64 time=0.190 ms
> 64 bytes from 192.168.11.2: icmp_req=18 ttl=64 time=0.143 ms
> 64 bytes from 192.168.11.2: icmp_req=19 ttl=64 time=0.191 ms
> 64 bytes from 192.168.11.2: icmp_req=20 ttl=64 time=0.190 ms
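
To make the quoted WTHRESH change concrete, the sketch below packs and unpacks
the three threshold fields using the shifts from the diff (wthresh at bit 16,
hthresh at bit 8, pthresh at bit 0). It is an illustration only: the demo_
names are hypothetical, the 6-bit field width is an assumption not stated in
the thread, and the GRAN and COUNT_DESC flag bits are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions taken from the shifts in the quoted diff. */
#define DEMO_PTHRESH_SHIFT 0
#define DEMO_HTHRESH_SHIFT 8
#define DEMO_WTHRESH_SHIFT 16
/* Assumption for this sketch: each threshold field is 6 bits wide. */
#define DEMO_THRESH_MASK 0x3fu

/* Combine the three thresholds into one register-style value. */
static uint32_t demo_txdctl_thresholds(uint32_t wthresh, uint32_t hthresh,
				       uint32_t pthresh)
{
	assert(wthresh <= DEMO_THRESH_MASK);
	assert(hthresh <= DEMO_THRESH_MASK);
	assert(pthresh <= DEMO_THRESH_MASK);
	return (wthresh << DEMO_WTHRESH_SHIFT) |
	       (hthresh << DEMO_HTHRESH_SHIFT) |
	       (pthresh << DEMO_PTHRESH_SHIFT);
}

/* Extract the write-back threshold from a packed value. */
static uint32_t demo_txdctl_wthresh(uint32_t txdctl)
{
	return (txdctl >> DEMO_WTHRESH_SHIFT) & DEMO_THRESH_MASK;
}
```

demo_txdctl_thresholds(5, 1, 0x1f) yields 0x0005011f (the original setting)
and demo_txdctl_thresholds(1, 1, 0x1f) yields 0x0001011f (the tested change).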

^ permalink raw reply

* Re: Strange latency spikes/TX network stalls on Sun Fire X4150(x86) and e1000e
From: Eric Dumazet @ 2012-05-29 15:11 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Hiroaki SHIMODA, Denys Fedoryshchenko, netdev, e1000-devel,
	jeffrey.t.kirsher, jesse.brandeburg, davem
In-Reply-To: <CA+mtBx_uYy9XcRvpD2E46FuMBFu38iQvCiwFHWqbhPBmY=JfOg@mail.gmail.com>

On Tue, 2012-05-29 at 07:54 -0700, Tom Herbert wrote:
> Thanks, Hiroaki, for this description; it looks promising.  Denys, can
> you test with his patch?
> 
> Tom

Indeed this sounds good.

Hmm, I guess my e1000e has no FLAG2_DMA_BURST in adapter->flags2

^ permalink raw reply

* Re: [PATCH 2/2] tc-drr(8): tab unquoted in a argument to a macro
From: Stephen Hemminger @ 2012-05-29 15:18 UTC (permalink / raw)
  To: Andreas Henriksson; +Cc: netdev, Bjarni Ingi Gislason
In-Reply-To: <1338205565-11872-2-git-send-email-andreas@fatal.se>

On Mon, 28 May 2012 13:46:05 +0200
Andreas Henriksson <andreas@fatal.se> wrote:

> From: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
> 
> From "man ..." ("groff -ww -mandoc ..."):
> 
> <groff: tc-drr.8>:67: warning: tab character in unquoted macro argument
> <groff: tc-drr.8>:69: warning: tab character in unquoted macro argument
> 
> *********************
> 
> Originally filed at: http://bugs.debian.org/674706
> 
> Signed-off-by: Andreas Henriksson <andreas@fatal.se>
> ---
>  man/man8/tc-drr.8 |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/man/man8/tc-drr.8 b/man/man8/tc-drr.8
> index 16a8ec0..e25d6dd 100644
> --- a/man/man8/tc-drr.8
> +++ b/man/man8/tc-drr.8
> @@ -64,9 +64,9 @@ flow filter:
>  
>  .B for i in .. 1024;do
>  .br
> -.B \ttc class add dev ..  classid $handle:$(print %x $i)
> +.B "\ttc class add dev .. classid $handle:$(print %x $i)"
>  .br
> -.B \ttc qdisc add dev .. fifo limit 16
> +.B "\ttc qdisc add dev .. fifo limit 16"
>  .br
>  .B done
>  

Both applied, thanks.

^ permalink raw reply

* Fwd: Ethtool not displaying ntuple/nfc rule settings
From: TJ Johnson @ 2012-05-29 16:55 UTC (permalink / raw)
  To: netdev, bhutchings
In-Reply-To: <CAFZv140-yrrDmD_H3ySEfJOL7Yp3syGt-VCNXaOi64nzucp5xA@mail.gmail.com>

Sorry for the double send. Had to convert to plain text as I received
failure messages.

---------- Forwarded message ----------
From: TJ Johnson <tjjohnson10200@gmail.com>
Date: Tue, May 29, 2012 at 10:36 AM
Subject: Ethtool not displaying ntuple/nfc rule settings
To: netdev@vger.kernel.org, bhutchings@solarflare.com

Hi,

Not sure if this is even the right place to ask this, so feel free to push
me somewhere else if necessary. I am using ethtool's -U option to set up
rules for an ixgbe device. That part seems to work great. However I am
unable to check the currently configured rules once they are in place.

ethtool -u DEVNAME produces this:
Cannot get RX rings: Operation not supported
rxclass: Cannot get RX class rule count: Operation not supported
RX classification rule retrieval failed

ethtool -n DEVNAME rx-flow-hash udp4 produces this:
Cannot get RX network flow hashing options: Operation not supported

The ethtool version I am using is 3.2; however, I have also tried 2.6.36
through 2.6.39. I just used configure; make to build the tool.

ethtool -i gives this for the device version:

driver: ixgbe
version: 3.3.9-NAPI
firmware-version: 1.0-3
bus-info: 0000:0c:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

The OS is SUSE Linux Enterprise Server 11 using a custom 2.6.36 kernel.

Any ideas as to what the problem might be? Or how I can solve the issue?

Thanks,
TJ

^ permalink raw reply

* Re: Fwd: Ethtool not displaying ntuple/nfc rule settings
From: Ben Hutchings @ 2012-05-29 17:55 UTC (permalink / raw)
  To: TJ Johnson; +Cc: netdev
In-Reply-To: <CAFZv1415XPt+7v3xo6BR86cRKPju7QV7NQMAzS-O21TqvGBgBA@mail.gmail.com>

On Tue, 2012-05-29 at 10:55 -0600, TJ Johnson wrote:
> Sorry for the double send. Had to convert to plain text as I received
> failure messages.
> 
> ---------- Forwarded message ----------
> From: TJ Johnson <tjjohnson10200@gmail.com>
> Date: Tue, May 29, 2012 at 10:36 AM
> Subject: Ethtool not displaying ntuple/nfc rule settings
> To: netdev@vger.kernel.org, bhutchings@solarflare.com
> 
> 
> Hi,
> 
> Not sure if this is even the right place to ask this, so feel free to push
> me somewhere else if necessary. I am using ethtool's -U option to set up
> rules for an ixgbe device. That part seems to work great. However I am
> unable to check the currently configured rules once they are in place.
> 
> ethtool -u DEVNAME produces this:
> Cannot get RX rings: Operation not supported
> rxclass: Cannot get RX class rule count: Operation not supported
> RX classification rule retrieval failed
> 
> ethtool -n DEVNAME rx-flow-hash udp4 produces this:
> Cannot get RX network flow hashing options: Operation not supported
[...]
> The OS is SUSE Linux Enterprise Server 11 using a custom 2.6.36 kernel.
> 
> Any ideas as to what the problem might be? Or how I can solve the issue?

This version of the ixgbe driver implemented the n-tuple interface,
which has since been removed in favour of the NFC rules interface.  It
was switched to the new interface in Linux 3.1.

ethtool supports setting rules through either interface (automatically)
but can only read them back through the NFC rules interface.  The
n-tuple interface did support reading rules but the information returned
was unreliable: it would not necessarily match the hardware state.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH 0/3] net: implement auto-loading of generic netlink modules
From: Neil Horman @ 2012-05-29 19:30 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Eric Dumazet, James Chapman, David Miller

Eric D. recently noted that the drop_monitor module didn't autoload when the
dropwatch user space utility started.  Looking into this, I noted that there's
no formal macro set to define module aliases that can be used by a
request_module call in the generic netlink family lookup path.  Currently the
net-pf-*-proto-*-type-<n> format is used, but the macros which form this expect
<n> to be a well defined integer, which generic netlink doesn't use for family
definitions.  So this series creates a new macro that creates a
net-pf-*-proto-*-name format where name can be any arbitrary string, allowing us
to append family-<x> where x is a generic netlink family name.  With these
macros, we can easily autoload modules that register generic netlink families.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: James Chapman <jchapman@katalix.com>
CC: David Miller <davem@davemloft.net>

^ permalink raw reply

* [PATCH 1/3] net: add MODULE_ALIAS_NET_PF_PROTO_NAME
From: Neil Horman @ 2012-05-29 19:30 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Eric Dumazet, David Miller
In-Reply-To: <1338319842-18395-1-git-send-email-nhorman@tuxdriver.com>

The MODULE_ALIAS_NET_PF macro set is missing a variant that allows for the
appending of an arbitrary string to the net-pf-<x>-proto-<y> base.  While
MODULE_ALIAS_NET_PF_PROTO_TYPE allows appending a numerical type, we
need to be able to append a generic string to support generic netlink families
that have neither a fixed numerical protocol nor a type number.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Miller <davem@davemloft.net>
---
 include/linux/net.h |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 2d7510f..e9ac2df 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -313,5 +313,8 @@ extern int kernel_sock_shutdown(struct socket *sock,
 	MODULE_ALIAS("net-pf-" __stringify(pf) "-proto-" __stringify(proto) \
 		     "-type-" __stringify(type))
 
+#define MODULE_ALIAS_NET_PF_PROTO_NAME(pf, proto, name) \
+	MODULE_ALIAS("net-pf-" __stringify(pf) "-proto-" __stringify(proto) \
+		     name)
 #endif /* __KERNEL__ */
 #endif	/* _LINUX_NET_H */
-- 
1.7.7.6
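
The string pasting done by the new macro can be tried in user space. The
sketch below mirrors the macro's structure with hypothetical DEMO_ names; the
local two-level DEMO_STRINGIFY stands in for the kernel's __stringify, and the
value 16 for both the protocol family and the protocol is an assumption taken
from the net-pf-16-proto-16 examples elsewhere in this series.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringify so that macro arguments are expanded first. */
#define DEMO_STRINGIFY_1(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY_1(x)

/* Stand-ins for PF_NETLINK and NETLINK_GENERIC (assumed to be 16 here). */
#define DEMO_PF_NETLINK 16
#define DEMO_NETLINK_GENERIC 16

/* Same shape as the macro in the patch, minus the MODULE_ALIAS wrapper. */
#define DEMO_ALIAS_NET_PF_PROTO_NAME(pf, proto, name) \
	"net-pf-" DEMO_STRINGIFY(pf) "-proto-" DEMO_STRINGIFY(proto) name

/* Returns the alias string assembled at compile time. */
static const char *demo_alias_string(void)
{
	static const char alias[] = DEMO_ALIAS_NET_PF_PROTO_NAME(
		DEMO_PF_NETLINK, DEMO_NETLINK_GENERIC, "-family-NET_DM");
	return alias;
}

/* Returns 1 if alias ends with suffix, 0 otherwise. */
static int demo_alias_ends_with(const char *alias, const char *suffix)
{
	size_t alen, slen;

	assert(alias != NULL && suffix != NULL);
	alen = strlen(alias);
	slen = strlen(suffix);
	return alen >= slen && strcmp(alias + alen - slen, suffix) == 0;
}
```

Because the name argument is pasted as-is, the caller supplies the separator
("-family-" here) as part of the string.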

^ permalink raw reply related

* [PATCH 2/3] genetlink: Build a generic netlink family module alias
From: Neil Horman @ 2012-05-29 19:30 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Eric Dumazet, James Chapman, David Miller
In-Reply-To: <1338319842-18395-1-git-send-email-nhorman@tuxdriver.com>

Generic netlink searches for -type- formatted aliases when requesting a module to
fulfill a protocol request (i.e. net-pf-16-proto-16-type-<x>, where x is a type
value).  However, generic netlink protocols have no well defined type numbers;
they have string names.  Modify ctrl_getfamily to request an alias in the
format net-pf-16-proto-16-family-<x> instead, where x is a generic string, and
add a macro that builds on the previously added MODULE_ALIAS_NET_PF_PROTO_NAME
macro to allow modules to specify those generic strings.

Note, l2tp previously hacked together a net-pf-16-proto-16-type-l2tp alias
using the MODULE_ALIAS macro; with these updates we can convert that to use the
PROTO_NAME macro.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: James Chapman <jchapman@katalix.com>
CC: David Miller <davem@davemloft.net>
---
 include/linux/genetlink.h |    3 +++
 net/l2tp/l2tp_netlink.c   |    3 +--
 net/netlink/genetlink.c   |    2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/genetlink.h b/include/linux/genetlink.h
index 73c28de..7a11401 100644
--- a/include/linux/genetlink.h
+++ b/include/linux/genetlink.h
@@ -110,6 +110,9 @@ extern int lockdep_genl_is_held(void);
 #define genl_dereference(p)					\
 	rcu_dereference_protected(p, lockdep_genl_is_held())
 
+#define MODULE_ALIAS_GENL_FAMILY(family)\
+ MODULE_ALIAS_NET_PF_PROTO_NAME(PF_NETLINK, NETLINK_GENERIC, "-family-" family)
+
 #endif /* __KERNEL__ */
 
 #endif	/* __LINUX_GENERIC_NETLINK_H */
diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index 8577264..ddc553e 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -923,5 +923,4 @@ MODULE_AUTHOR("James Chapman <jchapman@katalix.com>");
 MODULE_DESCRIPTION("L2TP netlink");
 MODULE_LICENSE("GPL");
 MODULE_VERSION("1.0");
-MODULE_ALIAS("net-pf-" __stringify(PF_NETLINK) "-proto-" \
-	     __stringify(NETLINK_GENERIC) "-type-" "l2tp");
+MODULE_ALIAS_GENL_FAMILY("l2tp");
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 8340ace..2cc7c1e 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -836,7 +836,7 @@ static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info)
 #ifdef CONFIG_MODULES
 		if (res == NULL) {
 			genl_unlock();
-			request_module("net-pf-%d-proto-%d-type-%s",
+			request_module("net-pf-%d-proto-%d-family-%s",
 				       PF_NETLINK, NETLINK_GENERIC, name);
 			genl_lock();
 			res = genl_family_find_byname(name);
-- 
1.7.7.6
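
The effect of switching the request string from -type- to -family- can be
sketched in user space. The helpers below are hypothetical (demo_ prefix) and
compare strings exactly, which is a simplification of how module aliases are
actually resolved; they only show that a module still carrying the old -type-
alias would no longer match the new request.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the request string the same way as the patched request_module call. */
static int demo_build_request(char *buf, size_t len, int pf, int proto,
			      const char *family)
{
	assert(buf != NULL && family != NULL);
	return snprintf(buf, len, "net-pf-%d-proto-%d-family-%s",
			pf, proto, family);
}

/* Returns 1 if module_alias is exactly the string that would be requested. */
static int demo_request_matches(const char *module_alias, int pf, int proto,
				const char *family)
{
	char want[128];
	int n;

	assert(module_alias != NULL);
	n = demo_build_request(want, sizeof(want), pf, proto, family);
	if (n < 0 || (size_t)n >= sizeof(want))
		return 0; /* formatting failed or was truncated */
	return strcmp(want, module_alias) == 0;
}
```

For example, demo_request_matches("net-pf-16-proto-16-family-l2tp", 16, 16,
"l2tp") returns 1, while the old "net-pf-16-proto-16-type-l2tp" alias returns
0, which is why l2tp_netlink.c is converted in the same patch.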

^ permalink raw reply related

* [PATCH 3/3] drop_monitor: Add module alias to enable automatic module loading
From: Neil Horman @ 2012-05-29 19:30 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Eric Dumazet, David Miller
In-Reply-To: <1338319842-18395-1-git-send-email-nhorman@tuxdriver.com>

Now that we have module alias macros for generic netlink families, let's use
those to mark modules with the appropriate family names for loading.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Miller <davem@davemloft.net>
---
 net/core/drop_monitor.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 3252e7e..ea5fb9f 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -468,3 +468,4 @@ module_exit(exit_net_drop_monitor);
 
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Neil Horman <nhorman@tuxdriver.com>");
+MODULE_ALIAS_GENL_FAMILY("NET_DM");
-- 
1.7.7.6

^ permalink raw reply related

* Re: [PATCH v3] drop_monitor: convert to modular building
From: Neil Horman @ 2012-05-29 19:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, bhutchings
In-Reply-To: <1337691919.3361.189.camel@edumazet-glaptop>

On Tue, May 22, 2012 at 03:05:19PM +0200, Eric Dumazet wrote:
> On Thu, 2012-05-17 at 16:21 -0400, Neil Horman wrote:
> > On Thu, May 17, 2012 at 04:09:37PM -0400, David Miller wrote:
> 
> > > 
> > > Applied, although it didn't apply cleanly to net-next.
> > > 
> > 
> > Apologies Dave, should have told you that I was carrying Joe P.'s cleanup patch
> > in my net-next tree as well:
> > http://marc.info/?l=linux-netdev&m=133727344816140&w=2
> > 
> > Since you noted that you had applied it, I applied it myself here.
> > Neil
> > 
> 
> Any plan to autoload drop_monitor module from dropwatch,
> or issuing some advice ?
> 
> # dropwatch -l kas
> Unable to find NET_DM family, dropwatch can't work
> Cleanuing up on socket creation error
> 
> Thanks
> 
> 
> 

Eric,
	Just FYI, I sent a series upstream to implement autoloading of generic
netlink families.  Please be aware that I've tested these with a hacked
version of dropwatch, and it works great, but with the normal version of
dropwatch, the drop_monitor module still doesn't autoload.  This is due to libnl
not explicitly requesting a family when genl_ctrl_family_resolve is called.
Instead of trying to load the module, it dumps the existing registered families
via a NLM_F_DUMP message.  I'm working on updating libnl to correct this
currently and will cc you on the patch.
Neil

^ permalink raw reply

* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
From: Andi Kleen @ 2012-05-29 19:37 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Jesper Dangaard Brouer, netdev, Christoph Paasch, Eric Dumazet,
	David S. Miller, Martin Topholm, Florian Westphal, opurdila,
	Hans Schillstrom
In-Reply-To: <20120528115226.12068.31850.stgit@localhost.localdomain>

Jesper Dangaard Brouer <jbrouer@redhat.com> writes:

> TCP SYN handling is on the slow path via tcp_v4_rcv(), and is
> performed while holding spinlock bh_lock_sock().
>
> Real-life and testlab experiments show that the kernel chokes
> when reaching 130Kpps SYN floods (powerful Nehalem 16 cores).
> Measuring with perf reveals that it's caused by the
> bh_lock_sock_nested() call in tcp_v4_rcv().
>
> With this patch, the machine can handle 750Kpps (max of the SYN
> flood generator) with cycles to spare, CPU load on the big machine
> dropped to 1%, from 100%.
>
> Notice we only handle syn cookie early on, normal SYN packets
> are still processed under the bh_lock_sock().

So basically handling syncookie lockless? 

Makes sense. Syncookies are a bit obsolete these days, of course, due
to the lack of options, but may still be useful for this.

Obviously you'll need to clean up the patch and support IPv6,
but the basic idea looks good to me.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply
