Netdev List

* Re: [PATCH] fragment: add fast path
From: David Miller @ 2010-06-14  1:18 UTC (permalink / raw)
  To: xiaosuo; +Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <1276470995-21713-1-git-send-email-xiaosuo@gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Mon, 14 Jun 2010 07:16:35 +0800

> As the fragments are usually in order,

In what universe does this happen "usually"?

Linux has been outputting fragments in reverse order for more than 10
years.

I'm not applying this patch.


* Re: [net-2.6 PATCH] ixgbe: fix automatic LRO/RSC settings for low latency
From: David Miller @ 2010-06-14  1:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, andy, sgruszka, stable
In-Reply-To: <20100611224629.30958.22500.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 11 Jun 2010 15:47:03 -0700

> From: Andy Gospodarek <andy@greyhouse.net>
> 
> This patch added to 2.6.34:
> 
> 	commit f8d1dcaf88bddc7f282722ec1fdddbcb06a72f18
> 	Author: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 	Date:   Tue Apr 27 01:37:20 2010 +0000
> 
> 	    ixgbe: enable extremely low latency
> 
> introduced a feature where LRO (called RSC on the hardware) was disabled
> automatically when setting rx-usecs to 0 via ethtool.  Some might not
> like the fact that LRO was disabled automatically, but I'm fine with
> that.  What I don't like is that LRO/RSC is automatically enabled when
> rx-usecs is set >0 via ethtool.
> 
> This would certainly be a problem if the device was used for forwarding
> and it was determined that the low latency wasn't needed after the
> device was already forwarding.  I played around with saving the state of
> LRO in the driver, but it just didn't seem worthwhile and would require
> a small change to dev_disable_lro() that I did not like.
> 
> This patch simply leaves LRO disabled when setting rx-usecs >0 and
> requires that the user enable it again.  An extra informational message
> will also now appear in the log so users can understand why LRO isn't
> being enabled as they expect.
> 
> Inconsistency of LRO setting first noticed by Stanislaw Gruszka.
> 
> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> CC: Stanislaw Gruszka <sgruszka@redhat.com>
> CC: stable@kernel.org
> Tested-by: Stephen Ko <stephen.s.ko@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.


* Re: [net-2.6 PATCH] e1000: Fix message logging defect
From: David Miller @ 2010-06-14  1:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, joe
In-Reply-To: <20100611225148.31194.61945.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 11 Jun 2010 15:51:49 -0700

> From: Joe Perches <joe@perches.com>
> 
> commit 675ad47375c76a7c3be4ace9554d92cd55518ced
> removed the capability to use ethtool.set_msglevel to
> control the types of messages emitted by the driver.
> 
> That commit should probably be reverted.
> 
> If not, then this patch fixes a message logging defect
> introduced by converting a printk without KERN_<level>
> to e_info.
> 
> This also reduces text by about 200 bytes.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
> Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.


* Re: [net-2.6 PATCH] ixgbe: fix for race with 8259(8|9) during shutdown
From: David Miller @ 2010-06-14  1:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, asaitou, donald.c.skidmore
In-Reply-To: <20100611232029.31430.75582.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 11 Jun 2010 16:20:29 -0700

> From: Don Skidmore <donald.c.skidmore@intel.com>
> 
> There is a small window where the watchdog could be running as the
> interface is brought down on a NIC with two ports wired back to back.
> If ixgbe_update_status is then called, it can lead to a panic.  This patch
> allows the update to bail if we are in that condition.
> 
> This issue was originally reported and a fix proposed by Akihiko Saitou.
> 
> CC: Akihiko Saitou <asaitou@users.sourceforge.net>
> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.


* Re: [PATCH] fragment: add fast path
From: Changli Gao @ 2010-06-14  2:03 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <20100613.181857.189703148.davem@davemloft.net>

On Mon, Jun 14, 2010 at 9:18 AM, David Miller <davem@davemloft.net> wrote:
> From: Changli Gao <xiaosuo@gmail.com>
> Date: Mon, 14 Jun 2010 07:16:35 +0800
>
>> As the fragments are usually in order,
>
> In what universe does this happen "usually"?
>
> Linux has been outputting fragments in reverse order for more than 10
> years.
>

I have tested net-next-2.6 and Darwin, and found that both send
fragments in order:

Darwin:

Darwin localhost 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15
16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386 i386

09:53:26.891820 IP (tos 0x0, ttl 64, id 19628, offset 0, flags [+],
proto UDP (17), length 1500) 10.13.3.10.52189 > 10.13.150.1.8888: UDP,
length 8192
09:53:26.892048 IP (tos 0x0, ttl 64, id 19628, offset 1480, flags [+],
proto UDP (17), length 1500) 10.13.3.10 > 10.13.150.1: udp
09:53:26.892229 IP (tos 0x0, ttl 64, id 19628, offset 2960, flags [+],
proto UDP (17), length 1500) 10.13.3.10 > 10.13.150.1: udp
09:53:26.892397 IP (tos 0x0, ttl 64, id 19628, offset 4440, flags [+],
proto UDP (17), length 1500) 10.13.3.10 > 10.13.150.1: udp
09:53:26.892529 IP (tos 0x0, ttl 64, id 19628, offset 5920, flags [+],
proto UDP (17), length 1500) 10.13.3.10 > 10.13.150.1: udp
09:53:26.892670 IP (tos 0x0, ttl 64, id 19628, offset 7400, flags
[none], proto UDP (17), length 820) 10.13.3.10 > 10.13.150.1: udp

Linux:

Linux localhost 2.6.35-rc1 #88 SMP Sun Jun 13 14:25:07 CST 2010 x86_64
Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz GenuineIntel GNU/Linux

08:01:53.730902 IP (tos 0x0, ttl 64, id 1263, offset 0, flags [+],
proto UDP (17), length 1500) 10.13.150.50.45295 > 10.13.150.1.8888:
UDP, length 8192
08:01:53.730955 IP (tos 0x0, ttl 64, id 1263, offset 1480, flags [+],
proto UDP (17), length 1500) 10.13.150.50 > 10.13.150.1: udp
08:01:53.731113 IP (tos 0x0, ttl 64, id 1263, offset 2960, flags [+],
proto UDP (17), length 1500) 10.13.150.50 > 10.13.150.1: udp
08:01:53.731139 IP (tos 0x0, ttl 64, id 1263, offset 4440, flags [+],
proto UDP (17), length 1500) 10.13.150.50 > 10.13.150.1: udp
08:01:53.731280 IP (tos 0x0, ttl 64, id 1263, offset 5920, flags [+],
proto UDP (17), length 1500) 10.13.150.50 > 10.13.150.1: udp
08:01:53.731306 IP (tos 0x0, ttl 64, id 1263, offset 7400, flags
[none], proto UDP (17), length 820) 10.13.150.50 > 10.13.150.1: udp
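
The offsets and lengths in both traces above follow from simple arithmetic, which can be sketched as below. This is only an illustration, not kernel code: fragment_layout() and struct frag_info are hypothetical names, and the sketch assumes a 20-byte IPv4 header without options and in-order emission.

```c
#include <stddef.h>

#define IPV4_HDR_LEN 20u	/* assumption: IPv4 header without options */

/* Hypothetical description of one fragment as tcpdump would print it. */
struct frag_info {
	unsigned int offset;	/* byte offset of this fragment's data */
	unsigned int length;	/* total IP length, header included */
	int more_fragments;	/* 1 when the MF flag ("[+]") is set */
};

/*
 * Lay out the fragments of a datagram carrying `payload` bytes above the
 * IP header (for UDP: 8-byte UDP header plus data) over a link with the
 * given MTU. Fills at most `max` entries of out[] and returns the number
 * of fragments, or 0 if the MTU cannot carry any data.
 */
size_t fragment_layout(unsigned int payload, unsigned int mtu,
		       struct frag_info *out, size_t max)
{
	unsigned int per_frag, off = 0;
	size_t n = 0;

	if (mtu < IPV4_HDR_LEN + 8u)
		return 0;

	/* Every fragment except the last carries a multiple of 8 bytes. */
	per_frag = ((mtu - IPV4_HDR_LEN) / 8u) * 8u;

	while (off < payload) {
		unsigned int chunk = payload - off;

		if (chunk > per_frag)
			chunk = per_frag;
		if (n < max) {
			out[n].offset = off;
			out[n].length = chunk + IPV4_HDR_LEN;
			out[n].more_fragments = (off + chunk < payload);
		}
		n++;
		off += chunk;
	}
	return n;
}
```

For the 8192-byte UDP datagrams above, the payload is 8192 + 8 = 8200 bytes; with an MTU of 1500 this gives six fragments at offsets 0, 1480, ..., 7400, the last one 820 bytes long, which matches both traces.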

Later I'll test Windows.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH] fragment: add fast path
From: YOSHIFUJI Hideaki @ 2010-06-14  2:24 UTC (permalink / raw)
  To: Changli Gao
  Cc: David Miller, kuznet, pekkas, jmorris, kaber, netdev,
	YOSHIFUJI Hideaki
In-Reply-To: <AANLkTilf218t898mfK1jeIFtFBFE8OvAeoOIxQTnmEt3@mail.gmail.com>

NIC?

--yoshfuji

(2010/06/14 11:03), Changli Gao wrote:
> On Mon, Jun 14, 2010 at 9:18 AM, David Miller<davem@davemloft.net>  wrote:
>> From: Changli Gao<xiaosuo@gmail.com>
>> Date: Mon, 14 Jun 2010 07:16:35 +0800
>>
>>> As the fragments are usually in order,
>>
>> In what universe does this happen "usually"?
>>
>> Linux has been outputting fragments in reverse order for more than 10
>> years.
>>
>
> I have tested net-next-2.6 and Darwin, and found that both send
> fragments in order:
>
> Darwin:
>
> Darwin localhost 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15
> 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386 i386
>
> 09:53:26.891820 IP (tos 0x0, ttl 64, id 19628, offset 0, flags [+],
> proto UDP (17), length 1500) 10.13.3.10.52189>  10.13.150.1.8888: UDP,
> length 8192
> 09:53:26.892048 IP (tos 0x0, ttl 64, id 19628, offset 1480, flags [+],
> proto UDP (17), length 1500) 10.13.3.10>  10.13.150.1: udp
> 09:53:26.892229 IP (tos 0x0, ttl 64, id 19628, offset 2960, flags [+],
> proto UDP (17), length 1500) 10.13.3.10>  10.13.150.1: udp
> 09:53:26.892397 IP (tos 0x0, ttl 64, id 19628, offset 4440, flags [+],
> proto UDP (17), length 1500) 10.13.3.10>  10.13.150.1: udp
> 09:53:26.892529 IP (tos 0x0, ttl 64, id 19628, offset 5920, flags [+],
> proto UDP (17), length 1500) 10.13.3.10>  10.13.150.1: udp
> 09:53:26.892670 IP (tos 0x0, ttl 64, id 19628, offset 7400, flags
> [none], proto UDP (17), length 820) 10.13.3.10>  10.13.150.1: udp
>
> Linux:
>
> Linux localhost 2.6.35-rc1 #88 SMP Sun Jun 13 14:25:07 CST 2010 x86_64
> Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz GenuineIntel GNU/Linux
>
> 08:01:53.730902 IP (tos 0x0, ttl 64, id 1263, offset 0, flags [+],
> proto UDP (17), length 1500) 10.13.150.50.45295>  10.13.150.1.8888:
> UDP, length 8192
> 08:01:53.730955 IP (tos 0x0, ttl 64, id 1263, offset 1480, flags [+],
> proto UDP (17), length 1500) 10.13.150.50>  10.13.150.1: udp
> 08:01:53.731113 IP (tos 0x0, ttl 64, id 1263, offset 2960, flags [+],
> proto UDP (17), length 1500) 10.13.150.50>  10.13.150.1: udp
> 08:01:53.731139 IP (tos 0x0, ttl 64, id 1263, offset 4440, flags [+],
> proto UDP (17), length 1500) 10.13.150.50>  10.13.150.1: udp
> 08:01:53.731280 IP (tos 0x0, ttl 64, id 1263, offset 5920, flags [+],
> proto UDP (17), length 1500) 10.13.150.50>  10.13.150.1: udp
> 08:01:53.731306 IP (tos 0x0, ttl 64, id 1263, offset 7400, flags
> [none], proto UDP (17), length 820) 10.13.150.50>  10.13.150.1: udp
>
> Later I'll test Windows.
>



* Re: [PATCH] fragment: add fast path
From: Changli Gao @ 2010-06-14  2:52 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki; +Cc: David Miller, kuznet, pekkas, jmorris, kaber, netdev
In-Reply-To: <4C1592CE.5030501@linux-ipv6.org>

On Mon, Jun 14, 2010 at 10:24 AM, YOSHIFUJI Hideaki
<yoshfuji@linux-ipv6.org> wrote:
> NIC?
>

Linux: e1000
Darwin: tun

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH] fragment: add fast path
From: Changli Gao @ 2010-06-14  2:55 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <AANLkTilf218t898mfK1jeIFtFBFE8OvAeoOIxQTnmEt3@mail.gmail.com>

On Mon, Jun 14, 2010 at 10:03 AM, Changli Gao <xiaosuo@gmail.com> wrote:
>
> Later I'll test Windows.
>

Windows also sends fragments in order.

10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 0, flags [+],
proto UDP (17), length 1452) 221.238.33.71.4194 > 202.113.29.4.8888:
UDP, length 32768
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 1432, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 2864, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 4296, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 5728, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 7160, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 8592, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 10024, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 11456, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 12888, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 14320, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 15752, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 17184, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 18616, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 20048, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 21480, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 22912, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 24344, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 25776, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 27208, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 28640, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 30072, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 31504, flags
[none], proto UDP (17), length 1292) 221.238.33.71 > 202.113.29.4: udp

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH] fragment: add fast path
From: YOSHIFUJI Hideaki @ 2010-06-14  5:15 UTC (permalink / raw)
  To: David Miller
  Cc: xiaosuo, kuznet, pekkas, jmorris, kaber, netdev,
	YOSHIFUJI Hideaki
In-Reply-To: <20100613.181857.189703148.davem@davemloft.net>

David Miller wrote:
>> As the fragments are usually in order,
> 
> In what universe does this happen "usually"?
> 
> Linux has been outputting fragments in reverse order for more than 10
> years.
> 
> I'm not applying this patch.

Dave,  I know we've been sending in reverse order, of course.
And, as far as I know, it seems Linux is the only implementation
which sends fragments in reverse order.

This is the receiving side.  I think we should accept the fact
that Linux is not the only implementation, no?

--yoshfuji


* Re: [PATCH] fragment: add fast path
From: Eric Dumazet @ 2010-06-14  5:35 UTC (permalink / raw)
  To: Changli Gao
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1276470995-21713-1-git-send-email-xiaosuo@gmail.com>

On Monday 14 June 2010 at 07:16 +0800, Changli Gao wrote:
> add fast path
> 
> As the fragments are usually in order, it is likely the new fragments are
> at the end of the inet_frag_queue. In the fast path, we check if the skb at the
> end of the inet_frag_queue is the prev we expect.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
>  include/net/inet_frag.h |    1 +
>  net/ipv4/ip_fragment.c  |   17 +++++++++++++++++
>  net/ipv6/reassembly.c   |   16 ++++++++++++++++
>  3 files changed, 34 insertions(+)
> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> index 39f2dc9..16ff29a 100644
> --- a/include/net/inet_frag.h
> +++ b/include/net/inet_frag.h
> @@ -20,6 +20,7 @@ struct inet_frag_queue {
>  	atomic_t		refcnt;
>  	struct timer_list	timer;      /* when will this queue expire? */
>  	struct sk_buff		*fragments; /* list of received fragments */
> +	struct sk_buff		*fragments_tail;
>  	ktime_t			stamp;
>  	int			len;        /* total length of orig datagram */
>  	int			meat;
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index 75347ea..d8c36d4 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -317,6 +317,7 @@ static int ip_frag_reinit(struct ipq *qp)
>  	qp->q.len = 0;
>  	qp->q.meat = 0;
>  	qp->q.fragments = NULL;
> +	qp->q.fragments_tail = NULL;
>  	qp->iif = 0;
>  
>  	return 0;
> @@ -389,6 +390,16 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>  	 * in the chain of fragments so far.  We must know where to put
>  	 * this fragment, right?
>  	 */
> +	prev = qp->q.fragments_tail;
> +	if (prev) {
> +		if (FRAG_CB(prev)->offset < offset) {
> +			next = NULL;
> +			goto found;
> +		}
> +	} else {
> +		next = NULL;

How can this chunk be a win? The queue is empty anyway.
You add tests and slow the 'other path'.

> +		goto found;
> +	}

Quite frankly, one easy way to speed things up would be to move 'offset'
from ipfrag_skb_cb close to the skb->next field so that only one cache line
per frag is used during lookup.
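
The cache-line argument can be sketched with a toy structure. This is only an illustration: struct toy_buf is NOT the real struct sk_buff layout, the 64-byte line size is an assumption that varies by CPU, and lines_touched() is a hypothetical helper.

```c
#include <stddef.h>

/* Assumed cache line size in bytes; real CPUs differ. */
#define TOY_CACHE_LINE 64u

/* Toy buffer, not the real struct sk_buff layout. */
struct toy_buf {
	struct toy_buf *next;	/* list link, read on every step of the walk */
	int offset_near;	/* sort key stored right beside the link */
	char other[80];		/* unrelated per-buffer state */
	int offset_far;		/* same key stored behind that state */
};

/*
 * Number of cache lines touched when reading the fields at byte offsets
 * a and b of an object that starts on a cache-line boundary.
 */
unsigned int lines_touched(size_t a, size_t b)
{
	return (a / TOY_CACHE_LINE == b / TOY_CACHE_LINE) ? 1u : 2u;
}
```

Walking the list reads `next` and the key of every element; with the key beside the link each step touches one line, while the far placement touches two. Comparing offsetof(struct toy_buf, next) against offsetof(struct toy_buf, offset_near) and offsetof(struct toy_buf, offset_far) with lines_touched() shows the difference.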

I am not sure why we need the "struct inet_skb_parm h;" field in struct
ipfrag_skb_cb...

I probably need to wake up this Monday morning?

Untested patch follows, only compiled.

[PATCH] frags: Remove unnecessary bits

While trying to move 'offset' to the beginning of frag CB, I found
inet_skb_parm field was unused.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 75347ea..0f51ae0 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -55,7 +55,6 @@ static int sysctl_ipfrag_max_dist __read_mostly = 64;
 
 struct ipfrag_skb_cb
 {
-	struct inet_skb_parm	h;
 	int			offset;
 };
 
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 6d4292f..122e0be 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -57,7 +57,6 @@
 
 struct ip6frag_skb_cb
 {
-	struct inet6_skb_parm	h;
 	int			offset;
 };
 




* Re: [PATCH] fragment: add fast path
From: Changli Gao @ 2010-06-14  5:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1276493743.2448.41.camel@edumazet-laptop>

On Mon, Jun 14, 2010 at 1:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Monday 14 June 2010 at 07:16 +0800, Changli Gao wrote:
>> add fast path
>>
>> As the fragments are usually in order, it is likely the new fragments are
>> at the end of the inet_frag_queue. In the fast path, we check if the skb at the
>> end of the inet_frag_queue is the prev we expect.
>>
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>> ----
>>  include/net/inet_frag.h |    1 +
>>  net/ipv4/ip_fragment.c  |   17 +++++++++++++++++
>>  net/ipv6/reassembly.c   |   16 ++++++++++++++++
>>  3 files changed, 34 insertions(+)
>> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
>> index 39f2dc9..16ff29a 100644
>> --- a/include/net/inet_frag.h
>> +++ b/include/net/inet_frag.h
>> @@ -20,6 +20,7 @@ struct inet_frag_queue {
>>       atomic_t                refcnt;
>>       struct timer_list       timer;      /* when will this queue expire? */
>>       struct sk_buff          *fragments; /* list of received fragments */
>> +     struct sk_buff          *fragments_tail;
>>       ktime_t                 stamp;
>>       int                     len;        /* total length of orig datagram */
>>       int                     meat;
>> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
>> index 75347ea..d8c36d4 100644
>> --- a/net/ipv4/ip_fragment.c
>> +++ b/net/ipv4/ip_fragment.c
>> @@ -317,6 +317,7 @@ static int ip_frag_reinit(struct ipq *qp)
>>       qp->q.len = 0;
>>       qp->q.meat = 0;
>>       qp->q.fragments = NULL;
>> +     qp->q.fragments_tail = NULL;
>>       qp->iif = 0;
>>
>>       return 0;
>> @@ -389,6 +390,16 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>>        * in the chain of fragments so far.  We must know where to put
>>        * this fragment, right?
>>        */
>> +     prev = qp->q.fragments_tail;
>> +     if (prev) {
>> +             if (FRAG_CB(prev)->offset < offset) {
>> +                     next = NULL;
>> +                     goto found;
>> +             }
>> +     } else {
>> +             next = NULL;
>
> How can this chunk be a win? The queue is empty anyway.
> You add tests and slow the 'other path'.
>
>> +             goto found;
>> +     }

Without this branch, prev needs to be initialized to NULL again (of
course, we can avoid this by moving prev = NULL into the previous
branch), next needs an assignment, and the check for whether the
queue is empty, which is already known in the above branch, is
duplicated. Sorry, but I can't see which path I slow.

        prev = NULL;
        for (next = qp->q.fragments; next != NULL; next = next->next) {
                if (FRAG_CB(next)->offset >= offset)
                        break;  /* bingo! */
                prev = next;
        }

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH] fragment: add fast path
From: Mitchell Erblich @ 2010-06-14  6:19 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki
  Cc: David Miller, xiaosuo, kuznet, pekkas, jmorris, kaber, netdev
In-Reply-To: <4C15BAE5.6010900@linux-ipv6.org>

On Jun 13, 2010, at 10:15 PM, YOSHIFUJI Hideaki wrote:

> David Miller wrote:
>>> As the fragments are usually in order,
>> In what universe does this happen "usually"?
>> Linux has been outputting fragments in reverse order for more than 10
>> years.
>> I'm not applying this patch.
> 
> Dave,  I know we've been sending in reverse order, of course.
> And, as far as I know, it seems Linux is the only implementation
> which sends fragments in reverse order.
> 
> This is receiving side.  I think we should accept the fact
> that Linux is not the only implementation, no?
> 
> --yoshfuji

Group,

With respect to IPv4 and Path MTU, aren't we
unlikely to generate packet frags?

With respect to IPv6, aren't frags even less likely
(intermediate nodes are not allowed to fragment;
RFC 2460, section 4.5, Fragment Header), as I would
expect the source to use the Path MTU?

Thus, a faster path is to support the largest Jumbo
MTUs (MTU >=9000: 16,110 : 14,336 : 10,240)
where possible for bulk data transfers with
proper page orders, IMO.

Mitchell Erblich



* Re: [PATCH] fragment: add fast path
From: Eric Dumazet @ 2010-06-14  6:40 UTC (permalink / raw)
  To: Changli Gao
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1276493743.2448.41.camel@edumazet-laptop>


> I am not sure why we need "struct inet_skb_parm h;" field in struct
> ipfrag_skb_cb... 
> 
> I probably need to wakeup this monday morning ?
> 
> Untested patch follows, only compiled.
> 

Oh well, I see light now I woke up ;)

> [PATCH] frags: Remove unnecessary bits
> 
> While trying to move 'offset' to the beginning of frag CB, I found
> inet_skb_parm field was unused.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---

Wrong patch, of course.

We need a comment, or better documentation, explaining why it's there.





* Re: [PATCH] fragment: add fast path
From: Eric Dumazet @ 2010-06-14  7:04 UTC (permalink / raw)
  To: Changli Gao
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <AANLkTimN1-uyAvpxgpno55XoyPzBptdZimz-MA-0MCWZ@mail.gmail.com>

> 
> Without this branch, prev needs to be initialized to NULL again (of
> course, we can avoid this by moving prev = NULL into the previous
> branch), next needs an assignment, and the check for whether the
> queue is empty, which is already known in the above branch, is
> duplicated. Sorry, but I can't see which path I slow.
> 
>         prev = NULL;
>         for (next = qp->q.fragments; next != NULL; next = next->next) {
>                 if (FRAG_CB(next)->offset >= offset)
>                         break;  /* bingo! */
>                 prev = next;
>         }
> 

The concept of a 'fast path' has changed over the years. It used to be about
CPU instructions and cycles; it is now about the number of memory transactions.

The only things we need to address are the cache lines we must bring into
the CPU caches, and keeping the code short.

These days, one cache line miss stalls the CPU for as long as it would take
to execute more than one hundred instructions. CPU cycles are cheap if the
code is already in the instruction cache.

Adding a test to avoid entering a NULL loop (no fragment is stored yet)
just bloats the code, making it larger than necessary.

You don't need the else branch:

if (prev) {
	if (FRAG_CB(prev)->offset < offset) {
		next = NULL;
		goto found;
	}
} else {
	next = NULL;
	goto found;
}

Just write:

next = NULL;
if (prev && FRAG_CB(prev)->offset < offset)
	goto found;
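
The shape of that fast path can be sketched outside the kernel with a toy sorted list. This is only an illustration under stated assumptions: struct toy_frag, struct toy_queue and toy_queue_insert() are hypothetical stand-ins for the sk_buff chain and ip_frag_queue(), and the overlap and duplicate handling that the real reassembly code performs is left out.

```c
#include <stddef.h>

/* Toy fragment: a list link plus the fragment's byte offset. */
struct toy_frag {
	struct toy_frag *next;
	int offset;
};

/* Toy queue kept sorted by offset, with a cached tail pointer. */
struct toy_queue {
	struct toy_frag *head;
	struct toy_frag *tail;
};

/*
 * Insert f in offset order. Returns 1 when the tail check alone found
 * the insertion point (the fast path), 0 when the list had to be walked.
 */
int toy_queue_insert(struct toy_queue *q, struct toy_frag *f)
{
	struct toy_frag *prev = q->tail;
	struct toy_frag *next = NULL;
	int fast = (prev != NULL && prev->offset < f->offset);

	if (!fast) {
		/* Slow path; for an empty queue the loop body never runs. */
		prev = NULL;
		for (next = q->head; next != NULL; next = next->next) {
			if (next->offset >= f->offset)
				break;
			prev = next;
		}
	}

	/* Link f between prev and next, keeping head and tail current. */
	f->next = next;
	if (next == NULL)
		q->tail = f;
	if (prev != NULL)
		prev->next = f;
	else
		q->head = f;
	return fast;
}
```

With in-order arrival every fragment after the first takes the fast path and touches only the tail element; with reverse-order arrival each insert falls through to the walk, which stops at the head immediately.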


* Re: BUG: unable to handle kernel paging request at 000041ed00000001
From: Arturas @ 2010-06-14  7:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1276185609.2448.12.camel@edumazet-laptop>

Hi,

Your patch fixes the hangs, and I get a warning (see below if needed) when it triggers.
As I understand it, this is a workaround and the real fix should be different?
What about making bonding multiqueue aware?

I also have another issue with NMI. On an older machine with 5500 Xeons I
have almost no overhead with nmi_watchdog enabled, but on this one it is about double:
without NMI enabled the CPU peak average is 30%, and with NMI enabled I have 53%.
When no traffic is passing, all CPUs are 100% idle.
Maybe the overhead could be a little bit smaller? :-)

[ 8064.562106] WARNING: at net/core/dev.c:1964 dev_queue_xmit+0x504/0x520()
[ 8064.562108] Hardware name: S5520UR
[ 8064.562108] br0
[ 8064.562109] Modules linked in: ipt_REDIRECT xt_tcpudp ipt_set iptable_filter iptable_nat nf_nat ipt_addrtype xt_dscp xt_string xt_owner xt_multiport xt_mark xt_iprange xt_hashlimit xt_conntrack xt_connmark xt_DSCP xt_NFQUEUE xt_MARK xt_CONNMARK ip_tables x_tables ip_set_ipmap ip_set cls_u32 sch_htb ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack bonding ipv6 ixgbe igb mdio
[ 8064.562125] Pid: 8643, comm: lighttpd Not tainted 2.6.34-gentoo #6
[ 8064.562126] Call Trace:
[ 8064.562133]  [<ffffffff8103e463>] ? warn_slowpath_common+0x73/0xb0
[ 8064.562135]  [<ffffffff8103e500>] ? warn_slowpath_fmt+0x40/0x50
[ 8064.562137]  [<ffffffff812f47b4>] ? dev_queue_xmit+0x504/0x520
[ 8064.562141]  [<ffffffff813222d2>] ? ip_queue_xmit+0x182/0x3e0
[ 8064.562145]  [<ffffffff81335c1e>] ? tcp_init_tso_segs+0x2e/0x50
[ 8064.562147]  [<ffffffff81338bb5>] ? tcp_write_xmit+0x75/0xa00
[ 8064.562151]  [<ffffffff810494f3>] ? lock_timer_base+0x33/0x70
[ 8064.562153]  [<ffffffff8133665c>] ? tcp_transmit_skb+0x3ac/0x820
[ 8064.562155]  [<ffffffff8132bd76>] ? tcp_sendmsg+0x866/0xbf0
[ 8064.562156]  [<ffffffff81338d2c>] ? tcp_write_xmit+0x1ec/0xa00
[ 8064.562161]  [<ffffffff812e5f7d>] ? lock_sock_nested+0x3d/0xe0
[ 8064.562163]  [<ffffffff812e0ff0>] ? sock_aio_write+0x0/0x150
[ 8064.562166]  [<ffffffff81339599>] ? __tcp_push_pending_frames+0x19/0x80
[ 8064.562167]  [<ffffffff8132a3fa>] ? do_tcp_setsockopt+0x53a/0x690
[ 8064.562171]  [<ffffffff810bc439>] ? do_sync_readv_writev+0xa9/0xf0
[ 8064.562173]  [<ffffffff810494f3>] ? lock_timer_base+0x33/0x70
[ 8064.562174]  [<ffffffff810bc63f>] ? do_sync_read+0xbf/0x100
[ 8064.562176]  [<ffffffff810bcb82>] ? do_readv_writev+0x172/0x220
[ 8064.562179]  [<ffffffff810cdc3f>] ? d_kill+0x5f/0x80
[ 8064.562181]  [<ffffffff810ce3f8>] ? dput+0xb8/0x180
[ 8064.562183]  [<ffffffff812e1ef2>] ? sockfd_lookup_light+0x22/0x80
[ 8064.562185]  [<ffffffff812e248d>] ? sys_setsockopt+0x6d/0xd0
[ 8064.562188]  [<ffffffff81002502>] ? system_call_fastpath+0x16/0x1b


On Jun 10, 2010, at 7:00 PM, Eric Dumazet wrote:

> On Thursday 10 June 2010 at 16:45 +0300, Arturas wrote:
> 
> This is right mailing list :)
> 
> I would try following patch for 2.6.34,
> not blindly trusting sk_tx_queue_get(sk)
> 
> --- net/core/dev.c.orig	2010-06-10 17:52:17.000000000 +0200
> +++ net/core/dev.c	2010-06-10 17:54:56.000000000 +0200
> @@ -1958,12 +1958,10 @@
> static inline u16 dev_cap_txqueue(struct net_device *dev, u16 queue_index)
> {
> 	if (unlikely(queue_index >= dev->real_num_tx_queues)) {
> -		if (net_ratelimit()) {
> -			WARN(1, "%s selects TX queue %d, but "
> -			     "real number of TX queues is %d\n",
> -			     dev->name, queue_index,
> -			     dev->real_num_tx_queues);
> -		}
> +		WARN_ONCE("%s selects TX queue %d, but "
> +			  "real number of TX queues is %d\n",
> +			  dev->name, queue_index,
> +			  dev->real_num_tx_queues);
> 		return 0;
> 	}
> 	return queue_index;
> @@ -1977,6 +1975,7 @@
> 
> 	if (sk_tx_queue_recorded(sk)) {
> 		queue_index = sk_tx_queue_get(sk);
> +		queue_index = dev_cap_txqueue(dev, queue_index);
> 	} else {
> 		const struct net_device_ops *ops = dev->netdev_ops;
> 
> 
> 
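
The idea behind the quoted patch, not blindly trusting a queue index cached on the socket, can be sketched in isolation. This is only an illustration: struct toy_dev, struct toy_sock and toy_pick_tx_queue() are hypothetical names, not the kernel's net_device, sock or dev_pick_tx().

```c
/* Toy device: only the number of usable TX queues matters here. */
struct toy_dev {
	unsigned int real_num_tx_queues;
};

/* Toy socket: remembers a TX queue index, or -1 if none was recorded. */
struct toy_sock {
	int tx_queue;
};

/*
 * Pick a TX queue for a packet leaving through dev. The cached index may
 * have been recorded for a different device with more queues, so it is
 * capped against the device actually used and falls back to queue 0.
 */
unsigned int toy_pick_tx_queue(const struct toy_dev *dev,
			       const struct toy_sock *sk)
{
	unsigned int idx = 0;

	if (sk->tx_queue >= 0)
		idx = (unsigned int)sk->tx_queue;
	if (idx >= dev->real_num_tx_queues)
		idx = 0;
	return idx;
}
```

A socket that recorded queue 5 on a multiqueue NIC and then transmits through a single-queue device gets queue 0 instead of an out-of-range index.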



* [net-next PATCH] bnx2x: Fix link problem with some DACs
From: Yaniv Rosner @ 2010-06-14 10:26 UTC (permalink / raw)
  To: davem; +Cc: netdev

Change the 2-wire transfer rate of the SFP+ module EEPROM from 400Khz to 100Khz, since some DACs (direct attached cables) do not work at 400Khz.

Reported-by: Krzysztof Oldzki <ole@ans.pl>
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
index ff70be8..600bc44 100644
--- a/drivers/net/bnx2x_link.c
+++ b/drivers/net/bnx2x_link.c
@@ -4266,14 +4266,15 @@ static u8 bnx2x_ext_phy_init(struct link_params *params, struct link_vars *vars)
 					       MDIO_PMA_REG_10G_CTRL2, 0x0008);
 			}
 
-			/* Set 2-wire transfer rate to 400Khz since 100Khz
-			is not operational */
+			/* Set 2-wire transfer rate of SFP+ module EEPROM
+			to 100Khz since some DACs(direct attached cables) do
+			not work at 400Khz.*/
 			bnx2x_cl45_write(bp, params->port,
 				       ext_phy_type,
 				       ext_phy_addr,
 				       MDIO_PMA_DEVAD,
 				       MDIO_PMA_REG_8727_TWO_WIRE_SLAVE_ADDR,
-				       0xa101);
+				       0xa001);
 
 			/* Set TX PreEmphasis if needed */
 			if ((params->feature_config_flags &





* Re: BUG: unable to handle kernel paging request at 000041ed00000001
From: Eric Dumazet @ 2010-06-14  8:31 UTC (permalink / raw)
  To: Arturas; +Cc: netdev
In-Reply-To: <80C864F3-B075-4E3A-B72E-6FCD945A8058@res.lt>

On Monday 14 June 2010 at 10:05 +0300, Arturas wrote:
> Hi,
> 
> Your patch fixes the hangs, and I get a warning (see below if needed) when it triggers.
> As I understand it, this is a workaround and the real fix should be different?
> What about making bonding multiqueue aware?
> 

But your problem is with the bridge, not bonding (see trace).

And 2.6.34 won't accept such changes; it's already released.

> I also have another issue with NMI. On an older machine with 5500 Xeons I
> have almost no overhead with nmi_watchdog enabled, but on this one it is about double:
> without NMI enabled the CPU peak average is 30%, and with NMI enabled I have 53%.
> When no traffic is passing, all CPUs are 100% idle.
> Maybe the overhead could be a little bit smaller? :-)
> 

I am a bit lost here; NMI has little to do with the network stack ;)


> [ 8064.562106] WARNING: at net/core/dev.c:1964 dev_queue_xmit+0x504/0x520()
> [ 8064.562108] Hardware name: S5520UR
> [ 8064.562108] br0
> [ 8064.562109] Modules linked in: ipt_REDIRECT xt_tcpudp ipt_set iptable_filter iptable_nat nf_nat ipt_addrtype xt_dscp xt_string xt_owner xt_multiport xt_mark xt_iprange xt_hashlimit xt_conntrack xt_connmark xt_DSCP xt_NFQUEUE xt_MARK xt_CONNMARK ip_tables x_tables ip_set_ipmap ip_set cls_u32 sch_htb ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack bonding ipv6 ixgbe igb mdio
> [ 8064.562125] Pid: 8643, comm: lighttpd Not tainted 2.6.34-gentoo #6
> [ 8064.562126] Call Trace:
> [ 8064.562133]  [<ffffffff8103e463>] ? warn_slowpath_common+0x73/0xb0
> [ 8064.562135]  [<ffffffff8103e500>] ? warn_slowpath_fmt+0x40/0x50
> [ 8064.562137]  [<ffffffff812f47b4>] ? dev_queue_xmit+0x504/0x520
> [ 8064.562141]  [<ffffffff813222d2>] ? ip_queue_xmit+0x182/0x3e0
> [ 8064.562145]  [<ffffffff81335c1e>] ? tcp_init_tso_segs+0x2e/0x50
> [ 8064.562147]  [<ffffffff81338bb5>] ? tcp_write_xmit+0x75/0xa00
> [ 8064.562151]  [<ffffffff810494f3>] ? lock_timer_base+0x33/0x70
> [ 8064.562153]  [<ffffffff8133665c>] ? tcp_transmit_skb+0x3ac/0x820
> [ 8064.562155]  [<ffffffff8132bd76>] ? tcp_sendmsg+0x866/0xbf0
> [ 8064.562156]  [<ffffffff81338d2c>] ? tcp_write_xmit+0x1ec/0xa00
> [ 8064.562161]  [<ffffffff812e5f7d>] ? lock_sock_nested+0x3d/0xe0
> [ 8064.562163]  [<ffffffff812e0ff0>] ? sock_aio_write+0x0/0x150
> [ 8064.562166]  [<ffffffff81339599>] ? __tcp_push_pending_frames+0x19/0x80
> [ 8064.562167]  [<ffffffff8132a3fa>] ? do_tcp_setsockopt+0x53a/0x690
> [ 8064.562171]  [<ffffffff810bc439>] ? do_sync_readv_writev+0xa9/0xf0
> [ 8064.562173]  [<ffffffff810494f3>] ? lock_timer_base+0x33/0x70
> [ 8064.562174]  [<ffffffff810bc63f>] ? do_sync_read+0xbf/0x100
> [ 8064.562176]  [<ffffffff810bcb82>] ? do_readv_writev+0x172/0x220
> [ 8064.562179]  [<ffffffff810cdc3f>] ? d_kill+0x5f/0x80
> [ 8064.562181]  [<ffffffff810ce3f8>] ? dput+0xb8/0x180
> [ 8064.562183]  [<ffffffff812e1ef2>] ? sockfd_lookup_light+0x22/0x80
> [ 8064.562185]  [<ffffffff812e248d>] ? sys_setsockopt+0x6d/0xd0
> [ 8064.562188]  [<ffffffff81002502>] ? system_call_fastpath+0x16/0x1b
> 


Could you please test another patch ?

Before calling sk_tx_queue_set(sk, queue_index); we should check that
the dst device is the current device.

--- net/core/dev.c.orig	2010-06-10 17:52:17.000000000 +0200
+++ net/core/dev.c	2010-06-14 10:25:25.000000000 +0200
@@ -1958,12 +1958,10 @@
 static inline u16 dev_cap_txqueue(struct net_device *dev, u16 queue_index)
 {
 	if (unlikely(queue_index >= dev->real_num_tx_queues)) {
-		if (net_ratelimit()) {
-			WARN(1, "%s selects TX queue %d, but "
-			     "real number of TX queues is %d\n",
-			     dev->name, queue_index,
-			     dev->real_num_tx_queues);
-		}
+		WARN_ONCE(1, "%s selects TX queue %d, but "
+			  "real number of TX queues is %d\n",
+			  dev->name, queue_index,
+			  dev->real_num_tx_queues);
 		return 0;
 	}
 	return queue_index;
@@ -1977,6 +1975,7 @@
 
 	if (sk_tx_queue_recorded(sk)) {
 		queue_index = sk_tx_queue_get(sk);
+		queue_index = dev_cap_txqueue(dev, queue_index);
 	} else {
 		const struct net_device_ops *ops = dev->netdev_ops;
 
@@ -1991,7 +1990,7 @@
 			if (sk) {
 				struct dst_entry *dst = rcu_dereference_bh(sk->sk_dst_cache);
 
-				if (dst && skb_dst(skb) == dst)
+				if (dst && skb_dst(skb) == dst && dst->dev == dev)
 					sk_tx_queue_set(sk, queue_index);
 			}
 		}
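
A minimal userspace sketch of the two checks in the patch above, for readers
who want to see the logic outside the kernel. The structs are simplified
stand-ins and the names cap_txqueue() and maybe_cache_txqueue() are
hypothetical; this is not the kernel implementation. It assumes the failure
mode is a queue index chosen for one device being cached on the socket and
later reused on a device (such as br0 in the trace) with fewer TX queues.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields used by this sketch are modelled). */
struct net_device {
	const char *name;
	unsigned int real_num_tx_queues;
};

struct dst_entry {
	struct net_device *dev;
};

struct sock {
	struct dst_entry *sk_dst_cache;
	int sk_tx_queue_mapping;	/* -1 means "nothing recorded" */
};

/* Fall back to queue 0 when the index is out of range for this device,
 * mirroring what dev_cap_txqueue() does in the patch. */
static unsigned int cap_txqueue(const struct net_device *dev,
				unsigned int queue_index)
{
	if (queue_index >= dev->real_num_tx_queues) {
		fprintf(stderr, "%s selects TX queue %u, but "
			"real number of TX queues is %u\n",
			dev->name, queue_index, dev->real_num_tx_queues);
		return 0;
	}
	return queue_index;
}

/* Record the queue on the socket only if the socket's cached route is the
 * route of this packet AND that route leaves through 'dev'. */
static void maybe_cache_txqueue(struct sock *sk,
				const struct dst_entry *skb_dst,
				const struct net_device *dev,
				unsigned int queue_index)
{
	const struct dst_entry *dst = sk->sk_dst_cache;

	if (dst && skb_dst == dst && dst->dev == dev)
		sk->sk_tx_queue_mapping = (int)queue_index;
}
```

With the extra dst->dev == dev test, a transmit through a stacked device does
not overwrite the mapping recorded for the device the route actually points
at, and the cap on the recorded-index path catches any stale value.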



^ permalink raw reply

* Re: [net-next PATCH] bnx2x: Fix link problem with some DACs
From: Simon Horman @ 2010-06-14  8:49 UTC (permalink / raw)
  To: Yaniv Rosner; +Cc: davem, netdev
In-Reply-To: <1276511188.13056.26.camel@lb-tlvb-yanivr.il.broadcom.com>

On Mon, Jun 14, 2010 at 01:26:28PM +0300, Yaniv Rosner wrote:
> Change the 2-wire transfer rate of the SFP+ module EEPROM from 400 kHz to 100 kHz, since some DACs (direct attached cables) do not work at 400 kHz.
> 
> Reported-by: Krzysztof Oldzki <ole@ans.pl>
> Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> ---
> diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
> index ff70be8..600bc44 100644
> --- a/drivers/net/bnx2x_link.c
> +++ b/drivers/net/bnx2x_link.c
> @@ -4266,14 +4266,15 @@ static u8 bnx2x_ext_phy_init(struct link_params *params, struct link_vars *vars)
>  					       MDIO_PMA_REG_10G_CTRL2, 0x0008);
>  			}
>  
> -			/* Set 2-wire transfer rate to 400Khz since 100Khz
> -			is not operational */

Doesn't the above comment indicate there is some HW where 100Khz doesn't work?

> +			/* Set 2-wire transfer rate of SFP+ module EEPROM
> +			to 100Khz since some DACs(direct attached cables) do
> +			not work at 400Khz.*/
>  			bnx2x_cl45_write(bp, params->port,
>  				       ext_phy_type,
>  				       ext_phy_addr,
>  				       MDIO_PMA_DEVAD,
>  				       MDIO_PMA_REG_8727_TWO_WIRE_SLAVE_ADDR,
> -				       0xa101);
> +				       0xa001);
>  
>  			/* Set TX PreEmphasis if needed */
>  			if ((params->feature_config_flags &

^ permalink raw reply

* [PATCH net-next-2.6] ip_frag: Remove some atomic ops
From: Eric Dumazet @ 2010-06-14  9:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Instead of doing one atomic operation per fragment, accumulate the sizes
in head->truesize and do a single atomic_sub() after the loop.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/ip_fragment.c |    3 +--
 net/ipv6/reassembly.c  |    3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 75347ea..963c368 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -556,7 +556,6 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 
 	skb_shinfo(head)->frag_list = head->next;
 	skb_push(head, head->data - skb_network_header(head));
-	atomic_sub(head->truesize, &qp->q.net->mem);
 
 	for (fp=head->next; fp; fp = fp->next) {
 		head->data_len += fp->len;
@@ -566,8 +565,8 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 		else if (head->ip_summed == CHECKSUM_COMPLETE)
 			head->csum = csum_add(head->csum, fp->csum);
 		head->truesize += fp->truesize;
-		atomic_sub(fp->truesize, &qp->q.net->mem);
 	}
+	atomic_sub(head->truesize, &qp->q.net->mem);
 
 	head->next = NULL;
 	head->dev = dev;
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 6d4292f..a630506 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -524,7 +524,6 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 	skb_shinfo(head)->frag_list = head->next;
 	skb_reset_transport_header(head);
 	skb_push(head, head->data - skb_network_header(head));
-	atomic_sub(head->truesize, &fq->q.net->mem);
 
 	for (fp=head->next; fp; fp = fp->next) {
 		head->data_len += fp->len;
@@ -534,8 +533,8 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 		else if (head->ip_summed == CHECKSUM_COMPLETE)
 			head->csum = csum_add(head->csum, fp->csum);
 		head->truesize += fp->truesize;
-		atomic_sub(fp->truesize, &fq->q.net->mem);
 	}
+	atomic_sub(head->truesize, &fq->q.net->mem);
 
 	head->next = NULL;
 	head->dev = dev;
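
For illustration, a minimal userspace sketch of the same idea using C11
atomics instead of the kernel's atomic_t. The struct frag type and the two
function names are hypothetical, not taken from the kernel. Both variants
leave the counter at the same final value; the batched one touches the shared
counter once instead of once per fragment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a chain of fragments. */
struct frag {
	int truesize;
	struct frag *next;
};

/* Before: one atomic subtraction per fragment.
 * Returns the number of atomic operations performed. */
static int uncharge_per_frag(atomic_int *mem, const struct frag *head)
{
	int ops = 0;

	for (const struct frag *fp = head; fp; fp = fp->next) {
		atomic_fetch_sub(mem, fp->truesize);
		ops++;
	}
	return ops;
}

/* After: sum the sizes in a plain local variable, then subtract once. */
static int uncharge_batched(atomic_int *mem, const struct frag *head)
{
	int total = 0;

	for (const struct frag *fp = head; fp; fp = fp->next)
		total += fp->truesize;

	atomic_fetch_sub(mem, total);
	return 1;
}
```

The trade-off is that other CPUs see the counter drop in one step at the end
rather than gradually, which should be harmless for an accounting counter
like this one.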



^ permalink raw reply related

* RE: [net-next PATCH] bnx2x: Fix link problem with some DACs
From: Yaniv Rosner @ 2010-06-14  9:04 UTC (permalink / raw)
  To: Simon Horman
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Eilon Greenstein,
	ole@ans.pl
In-Reply-To: <20100614084956.GC25374@verge.net.au>

Simon,
The previous comment refers to a HW limitation of an older PHY version. It affects only write operations, which we don't do anyhow.

Regards,
Yaniv



^ permalink raw reply

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to classes
From: Kay Sievers @ 2010-06-14  9:13 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <1276250151.3640.11.camel@jlt3.sipsolutions.net>

On Fri, Jun 11, 2010 at 11:55, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Tue, 2010-06-08 at 18:39 +0200, Kay Sievers wrote:
>
>> That all works if you have two modules, like almost all buses have.
>> That's what I meant, that we need to add stuff to the core to be able
>> to cleanup bus devices internally too, if we use everything in a
>> single module, which is also supposed to cleanup on unload, like the
>> network devices like to do.
>
> Or some "wait for bus to be cleaned up" we can call in the module exit
> maybe?

That would block the rmmod process until the resources are cleaned up,
wouldn't it?

The network devices can do this because the cleanup code is always
compiled-in, for a module cleaning up itself, this is kind of
complicated, isn't it?

Kay

^ permalink raw reply

* Re: [net-next PATCH] bnx2x: Fix link problem with some DACs
From: Simon Horman @ 2010-06-14  9:18 UTC (permalink / raw)
  To: Yaniv Rosner
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Eilon Greenstein,
	ole@ans.pl
In-Reply-To: <41FDA5AFCC66094F83A3D9545495A2A4107119FE0A@SJEXCHCCR01.corp.ad.broadcom.com>

On Mon, Jun 14, 2010 at 02:04:17AM -0700, Yaniv Rosner wrote:
> Simon,
> The previous comment refers to a HW limitation of an older PHY version. It affects only write operations, which we don't do anyhow.

Thanks for the clarification. In that case I have no objection.


^ permalink raw reply

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to  classes
From: Johannes Berg @ 2010-06-14  9:20 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <AANLkTil4IvOiDHnLwAaeseEPVgYp-EF5TK1HfHitY3-H@mail.gmail.com>

On Mon, 2010-06-14 at 11:13 +0200, Kay Sievers wrote:

> That would block the rmmod process until the resources are cleaned up,
> wouldn't it?

Yes, would that be so bad?

> The network devices can do this because the cleanup code is always
> compiled-in, for a module cleaning up itself, this is kind of
> complicated, isn't it?

It just needs a wait_for_bus_exit() function that the module calls in
_exit?

johannes

^ permalink raw reply

* [PATCH net-next-2.6] ipfrag : frag_kfree_skb() cleanup
From: Eric Dumazet @ 2010-06-14  9:22 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Third param (work) is unused, remove it.

Remove __inline__ and inline qualifiers.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/ip_fragment.c |    9 +++------
 net/ipv6/reassembly.c  |    7 ++-----
 2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 75347ea..7e660dd 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -124,11 +124,8 @@ static int ip4_frag_match(struct inet_frag_queue *q, void *a)
 }
 
 /* Memory Tracking Functions. */
-static __inline__ void frag_kfree_skb(struct netns_frags *nf,
-		struct sk_buff *skb, int *work)
+static void frag_kfree_skb(struct netns_frags *nf, struct sk_buff *skb)
 {
-	if (work)
-		*work -= skb->truesize;
 	atomic_sub(skb->truesize, &nf->mem);
 	kfree_skb(skb);
 }
@@ -309,7 +306,7 @@ static int ip_frag_reinit(struct ipq *qp)
 	fp = qp->q.fragments;
 	do {
 		struct sk_buff *xp = fp->next;
-		frag_kfree_skb(qp->q.net, fp, NULL);
+		frag_kfree_skb(qp->q.net, fp);
 		fp = xp;
 	} while (fp);
 
@@ -446,7 +443,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 				qp->q.fragments = next;
 
 			qp->q.meat -= free_it->len;
-			frag_kfree_skb(qp->q.net, free_it, NULL);
+			frag_kfree_skb(qp->q.net, free_it);
 		}
 	}
 
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 6d4292f..f967978 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -150,11 +150,8 @@ int ip6_frag_match(struct inet_frag_queue *q, void *a)
 EXPORT_SYMBOL(ip6_frag_match);
 
 /* Memory Tracking Functions. */
-static inline void frag_kfree_skb(struct netns_frags *nf,
-		struct sk_buff *skb, int *work)
+static void frag_kfree_skb(struct netns_frags *nf, struct sk_buff *skb)
 {
-	if (work)
-		*work -= skb->truesize;
 	atomic_sub(skb->truesize, &nf->mem);
 	kfree_skb(skb);
 }
@@ -392,7 +389,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 				fq->q.fragments = next;
 
 			fq->q.meat -= free_it->len;
-			frag_kfree_skb(fq->q.net, free_it, NULL);
+			frag_kfree_skb(fq->q.net, free_it);
 		}
 	}
 



^ permalink raw reply related

* Re: BUG: unable to handle kernel paging request at 000041ed00000001
From: Arturas @ 2010-06-14  9:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1276504295.2478.35.camel@edumazet-laptop>


On Jun 14, 2010, at 11:31 AM, Eric Dumazet wrote:
> But your problem is about bridge, not bonding (see trace).
I want it for performance reasons, not because of this bug.
The bridge isn't a bottleneck for me, but bonding may be, and not only for me
but for many people. I believe the performance gain would be more
than 1% of CPU? :-)

> 
> And 2.6.34 won't accept such changes, it's already released.
It could be a separate patch, or I can test 2.6.35 if it would accept
such a change. I just need a stable kernel with good performance :-)

> 
>> I also have another issue with NMI. On an older machine with Xeon 5500s I
>> have almost no overhead with nmi_watchdog enabled, but on this one it is nearly double:
>> without NMI enabled the peak CPU average is 30%, and with NMI enabled it is 53%.
>> When no traffic is passing, all CPUs are 100% idle.
>> Maybe the overhead could be a little bit smaller? :-)
>> 
> 
> I am a bit lost here, NMI has little to do with the network stack ;)
Could this be related to the very recent CPU? As I understand it, NMI depends on the CPU.

> 
> 
> Could you please test another patch ?
Applied; it's working correctly so far. If I get a warning I'll write to you,
or should I not get one at all if the patch is correct?

> 
> Before calling sk_tx_queue_set(sk, queue_index); we should check that
> the dst device is the current device.


^ permalink raw reply
