* Re: Bug in skb_segment: fskb->len != len
From: Eric Dumazet @ 2013-10-30 4:03 UTC (permalink / raw)
To: Herbert Xu; +Cc: Christoph Paasch, netdev
In-Reply-To: <20131030015002.GB1347@gondor.apana.org.au>
On Wed, 2013-10-30 at 09:50 +0800, Herbert Xu wrote:
> On Tue, Oct 29, 2013 at 08:08:13AM -0700, Eric Dumazet wrote:
> >
> > GRO layer was updated to be able to stack two or three sk_buff,
> > fully populated with page frags.
> >
> > Thats quite mandatory to support line rate for 40Gb links.
>
> Indeed I missed this. Which commit introduced this change?
This was mentioned in the changelog :
<quote>
Christoph Paasch and Jerry Chu reported crashes in skb_segment() caused
by commit 8a29111c7ca6 ("net: gro: allow to build full sized skb")
</quote>
^ permalink raw reply
* Re: Bug in skb_segment: fskb->len != len
From: Herbert Xu @ 2013-10-30 4:06 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Christoph Paasch, netdev
In-Reply-To: <1383105785.4857.21.camel@edumazet-glaptop.roam.corp.google.com>
On Tue, Oct 29, 2013 at 09:03:05PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-30 at 09:50 +0800, Herbert Xu wrote:
> >
> > Indeed I missed this. Which commit introduced this change?
>
> This was mentioned in the changelog :
>
> <quote>
>
> Christoph Paasch and Jerry Chu reported crashes in skb_segment() caused
> by commit 8a29111c7ca6 ("net: gro: allow to build full sized skb")
>
> </quote>
Thanks.
In that case we should be able to segment these full-sized skbs
without linearising. What we should do is iterate through
each frag_list entry and apply the usual GSO code to them.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Eric Dumazet @ 2013-10-30 4:06 UTC (permalink / raw)
To: David Miller; +Cc: christoph.paasch, herbert, netdev, hkchu, mwdalton
In-Reply-To: <20131029.220253.1263087684709722001.davem@davemloft.net>
On Tue, 2013-10-29 at 22:02 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 29 Oct 2013 17:53:48 -0700
>
> > So should we apply the first fix to avoid the BUG_ON() ?
>
> Please be more specific, are you talking about splitting up
> this patch in some way?
I am referring to the first version I sent to Christoph :
http://www.spinics.net/lists/netdev/msg255452.html
Then I added the sysctl to avoid future packets to get a frag_list in
the first place.
Doing a smart skb_segment() is possible, but this function is already
complex.
I am not sure 64K GRO packets that must be segmented are going to be
faster than 22K packets without segmentation at all (TSO path on xmit)
^ permalink raw reply
* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Herbert Xu @ 2013-10-30 4:08 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, christoph.paasch, netdev, hkchu, mwdalton
In-Reply-To: <1383106012.4857.26.camel@edumazet-glaptop.roam.corp.google.com>
On Tue, Oct 29, 2013 at 09:06:52PM -0700, Eric Dumazet wrote:
>
> I am not sure 64K GRO packets that must be segmented are going to be
> faster than 22K packets without segmentation at all (TSO path on xmit)
Indeed that is a tough call, but I think conceptually the 64K
case is much nicer than a sysctl that gets magically turned off.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Herbert Xu @ 2013-10-30 4:09 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, christoph.paasch, netdev, hkchu, mwdalton
In-Reply-To: <20131030040818.GB2755@gondor.apana.org.au>
On Wed, Oct 30, 2013 at 12:08:18PM +0800, Herbert Xu wrote:
> On Tue, Oct 29, 2013 at 09:06:52PM -0700, Eric Dumazet wrote:
> >
> > I am not sure 64K GRO packets that must be segmented are going to be
> > faster than 22K packets without segmentation at all (TSO path on xmit)
>
> Indeed that is a tough call, but I think conceptually the 64K
> case is much nicer than a sysctl that gets magically turned off.
Also at some point we'll want to do >64K GRO/GSO too so we'll
have to face this complexity one day.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Jerry Chu @ 2013-10-30 4:15 UTC (permalink / raw)
To: Herbert Xu
Cc: Eric Dumazet, David Miller, Christoph Paasch,
netdev@vger.kernel.org, Michael Dalton
In-Reply-To: <20131030040949.GA2912@gondor.apana.org.au>
On Tue, Oct 29, 2013 at 9:09 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, Oct 30, 2013 at 12:08:18PM +0800, Herbert Xu wrote:
>> On Tue, Oct 29, 2013 at 09:06:52PM -0700, Eric Dumazet wrote:
>> >
>> > I am not sure 64K GRO packets that must be segmented are going to be
>> > faster than 22K packets without segmentation at all (TSO path on xmit)
>>
>> Indeed that is a tough call, but I think conceptually the 64K
>> case is much nicer than a sysctl that gets magically turned off.
>
> Also at some point we'll want to do >64K GRO/GSO too so we'll
> have to face this complexity one day.
Not sure how this is possible with the IP datagram length limit. (Am I
missing something, like pkt chaining?)
Jerry
>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Eric Dumazet @ 2013-10-30 4:16 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, christoph.paasch, netdev, hkchu, mwdalton
In-Reply-To: <20131030040818.GB2755@gondor.apana.org.au>
On Wed, 2013-10-30 at 12:08 +0800, Herbert Xu wrote:
> On Tue, Oct 29, 2013 at 09:06:52PM -0700, Eric Dumazet wrote:
> >
> > I am not sure 64K GRO packets that must be segmented are going to be
> > faster than 22K packets without segmentation at all (TSO path on xmit)
>
> Indeed that is a tough call, but I think conceptually the 64K
> case is much nicer than a sysctl that gets magically turned off.
The thing is this only matters for hosts receiving at line rate on few
TCP flows.
A router should not build too big GRO packets, as it adds latencies.
Really, we had to make TSO packets being auto sized, lets not add the
syndrome again.
So I do not really understand David concern about emitting a warning.
If a machine is used as a router, building GRO packets of 17 MSS is
absolutely fine.
^ permalink raw reply
* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Herbert Xu @ 2013-10-30 4:19 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, christoph.paasch, netdev, hkchu, mwdalton
In-Reply-To: <1383106577.4857.30.camel@edumazet-glaptop.roam.corp.google.com>
On Tue, Oct 29, 2013 at 09:16:17PM -0700, Eric Dumazet wrote:
>
> The thing is this only matters for hosts receiving at line rate on few
> TCP flows.
>
> A router should not build too big GRO packets, as it adds latencies.
>
> Really, we had to make TSO packets being auto sized, lets not add the
> syndrome again.
>
> So I do not really understand David concern about emitting a warning.
>
> If a machine is used as a router, building GRO packets of 17 MSS is
> absolutely fine.
It's not just routers you know, we use the same code on bridges
as part of virtualisation. So it absolutely does matter.
In fact this is why I wrote GRO in the first place, to make it
work for virtualisation.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: named network namespace -- setns() with Invalid argument (errno 22)
From: Eric W. Biederman @ 2013-10-30 4:33 UTC (permalink / raw)
To: dilip.daya; +Cc: netdev
In-Reply-To: <1383092184.12859.78.camel@dilip-laptop>
Dilip Daya <dilip.daya@hp.com> writes:
> Hi All,
>
> Is the following intended behavior for adding "nested" named network namespaces ?
Not exactly intended but this is not misbehavior either.
Mostly this is a don't do that then scenario.
Eric
> Steps to reproduce:
>
> # uname -r
> 3.10.1
>
>
> # /sbin/ip -V
> ip utility, iproute2-ss130903
>
>
> Existing network namespaces:
> # ip netns list
> NETNS0
> NETNS1
>
>
> List of named network namespace objects with inode/permissions:
> # ls -li /var/run/netns/
> total 0
> 4026532310 -r--r--r-- 1 root root 0 Oct 29 09:11 NETNS0
> 4026532366 -r--r--r-- 1 root root 0 Oct 29 09:13 NETNS1
>
> Enter existing named network namespace:
> # ip netns exec NETNS0 bash
>
> List network devices for named netns:
> # ls -l /sys/class/net/
> total 0
> lrwxrwxrwx 1 root root 0 Oct 29 12:25 lo -> ../../devices/virtual/net/lo/
>
> List of named network namespace objects with inode/permissions:
> # ls -li /var/run/netns/
> total 0
> 4026532310 -r--r--r-- 1 root root 0 Oct 29 09:11 NETNS0
> 4026532366 -r--r--r-- 1 root root 0 Oct 29 09:13 NETNS1
>
>
> # ip netns add NETNS0a <<< adding NETNS0a from within NETNS0
> # ls -li /var/run/netns/
> total 0
> 4026532310 -r--r--r-- 1 root root 0 Oct 29 09:11 NETNS0
> 4026532366 -r--r--r-- 1 root root 0 Oct 29 09:13 NETNS1
> 4026532423 -r--r--r-- 1 root root 0 Oct 29 12:28 NETNS0a
> ^^^^^^^^^^ ^^^^^^^^^^
> inode permissions
>
>
> # ip netns exec NETNS0a ls -l /sys/class/net/
> total 0
> lrwxrwxrwx 1 root root 0 Oct 29 12:28 lo -> ../../devices/virtual/net/lo
>
> # exit <<< exiting from NETNS0
>
> Listing from host/default namespace:
> # ls -li /var/run/netns/
> total 0
> 4026532310 -r--r--r-- 1 root root 0 Oct 29 09:11 NETNS0
> 4026532366 -r--r--r-- 1 root root 0 Oct 29 09:13 NETNS1
> 964863 ---------- 1 root root 0 Oct 29 12:28 NETNS0a <<< NULL permissions
> ^^^^^^ ^^^^^^^^^^
>
>
> Re-enter NETNS0:
> # ip netns exec NETNS0 bash
> # ls -li /var/run/netns/
> total 0
> 4026532310 -r--r--r-- 1 root root 0 Oct 29 09:11 NETNS0
> 4026532366 -r--r--r-- 1 root root 0 Oct 29 09:13 NETNS1
> 964863 ---------- 1 root root 0 Oct 29 12:28 NETNS0a <<< NULL permissions
> ^^^^^^^^^^
>
>
> # ip netns exec NETNS0a ls -l /sys/class/net/
> seting the network namespace "NETNS0a" failed: Invalid argument
>
> => It seems the bash shell that created the nested named netns is the only
> one that can view/enter the nested named netns. All other attempts from
> either another bash shell or host/default namespace will get a different
> inode with NULL permissions. Once the initial bash shell that created the
> nested named netns exists the nested netns is rendered unusable due to
> NULL permissions on its inode. setns() Invalid argument (errno 22) seems
> to be due to NULL permissions on /var/run/netns/<netnsName> object.
>
>
> Thanks.
^ permalink raw reply
* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Eric Dumazet @ 2013-10-30 4:34 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, christoph.paasch, netdev, hkchu, mwdalton
In-Reply-To: <20131030041959.GA3162@gondor.apana.org.au>
On Wed, 2013-10-30 at 12:19 +0800, Herbert Xu wrote:
> On Tue, Oct 29, 2013 at 09:16:17PM -0700, Eric Dumazet wrote:
> >
> > The thing is this only matters for hosts receiving at line rate on few
> > TCP flows.
> >
> > A router should not build too big GRO packets, as it adds latencies.
> >
> > Really, we had to make TSO packets being auto sized, lets not add the
> > syndrome again.
> >
> > So I do not really understand David concern about emitting a warning.
> >
> > If a machine is used as a router, building GRO packets of 17 MSS is
> > absolutely fine.
>
> It's not just routers you know, we use the same code on bridges
> as part of virtualisation. So it absolutely does matter.
>
What matters ?
GRO ?
Or making size of GRO packets not too big, or making them bigger ?
Before my patch, GRO packets were 17 MSS, and nobody complained packets
were too small, so what are you saying exactly ?
^ permalink raw reply
* Re: Bug in skb_segment: fskb->len != len
From: Eric Dumazet @ 2013-10-30 4:37 UTC (permalink / raw)
To: Herbert Xu; +Cc: Christoph Paasch, netdev
In-Reply-To: <20131030040644.GA2755@gondor.apana.org.au>
On Wed, 2013-10-30 at 12:06 +0800, Herbert Xu wrote:
> In that case we should be able to segment these full-sized skbs
> without linearising. What we should do is iterate through
> each frag_list entry and apply the usual GSO code to them.
Well, if you really want to be smart, we could build GSO packets of ~16
MSS, as most NIC support TSO.
So a 64K packet containing 44 MSS could be split into 3 TSO packets.
^ permalink raw reply
* Re: [PATCH net-next] virtio_net: migrate mergeable rx buffers to page frag allocators
From: Jason Wang @ 2013-10-30 4:41 UTC (permalink / raw)
To: Michael Dalton, Eric Northup
Cc: Michael S. Tsirkin, netdev, Eric Dumazet, lf-virt,
Daniel Borkmann, David S. Miller
In-Reply-To: <CANJ5vP+=wCrzJ3Fdg0_5ZTTT-V-fw2AXUG=EgwsUcxXO=UpMgw@mail.gmail.com>
On 10/30/2013 03:05 AM, Michael Dalton wrote:
> Agreed Eric, the buffer size should be increased so that we can accommodate a
> MTU-sized packet + mergeable virtio net header in a single buffer. I will send
> a patch to fix shortly cleaning up the #define headers as Rusty indicated and
> increasing the buffer size slightly by VirtioNet header size bytes per Eric.
>
> Jason, I'll followup with you directly - I'd like to know your exact workload
> (single steam or multi-stream netperf?), VM configuration, etc, and also see if
> the nit that Erichas pointed out affects your results.
It's just a single TCP_STREAM from local host to guest with vhost_net
enabled. The host and guest were connected with bridge.
> It is also
> worth noting that
> we may want to tune the queue sizes for your benchmarks, e.g, by reducing
> buffer size from 4KB to MTU-sized but keeping queue length constant, we're
> implicitly decreasing the number of bytes stored in the VirtioQueue for the
> VirtioNet device, so increasing the queue size may help.
The problem is the overhead of introduced by the frag refill and frag
list. Looking at tg3 and bnx2x, the frag were only used for small
packets not for jumbo frame which does not need a frag list. But your
patch tires to use it for GSO packets. So we'd better to solve it in
device level (e.g multiple rings with different size of buffers).
I try to change the queue size from 256 to 1024. Unfortunately, I didn't
see improvement. As a workaround, If you care only the MTU size packet
and do not want GSO packet to be received, you can just disable it by
specifying gso_tso{4|6}=off in qemu command line or let's make it
configurable through ethtool.
>
> Best,
>
> Mike
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Herbert Xu @ 2013-10-30 4:42 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, christoph.paasch, netdev, hkchu, mwdalton
In-Reply-To: <1383107681.4857.33.camel@edumazet-glaptop.roam.corp.google.com>
On Tue, Oct 29, 2013 at 09:34:41PM -0700, Eric Dumazet wrote:
>
> What matters ?
>
> GRO ?
What matters is that you should not treat the forwarding case
separately from the host case.
For virtualisation the host case looks exactly like the forwarding
case.
IOW, if having a 64KB packet matters for the host, then it matters
for forwarding as well.
> Before my patch, GRO packets were 17 MSS, and nobody complained packets
> were too small, so what are you saying exactly ?
I'm not criticsing your mega-GRO patch at all. That one is great
and means that we'll get aggregated packets up to 64K. What we need
to do is just to patch up the GSO code so that it can handle these
mega-packets properly.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: Bug in skb_segment: fskb->len != len
From: Herbert Xu @ 2013-10-30 4:47 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Christoph Paasch, netdev
In-Reply-To: <1383107858.4857.35.camel@edumazet-glaptop.roam.corp.google.com>
On Tue, Oct 29, 2013 at 09:37:38PM -0700, Eric Dumazet wrote:
>
> Well, if you really want to be smart, we could build GSO packets of ~16
> MSS, as most NIC support TSO.
>
> So a 64K packet containing 44 MSS could be split into 3 TSO packets.
Yes that would be nice for sure.
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Doug Ledford @ 2013-10-30 5:25 UTC (permalink / raw)
To: Neil Horman; +Cc: Ingo Molnar, Eric Dumazet, Doug Ledford, linux-kernel, netdev
* Neil Horman <nhorman@tuxdriver.com> wrote:
> 3) The run times are proportionally larger, but still indicate that Parallel ALU
> execution is hurting rather than helping, which is counter-intuitive. I'm
> looking into it, but thought you might want to see these results in case
> something jumped out at you
So here's my theory about all of this.
I think that the original observation some years back was a fluke caused by
either a buggy CPU or a CPU design that is no longer used.
The parallel ALU design of this patch seems OK at first glance, but it means
that two parallel operations are both trying to set/clear both the overflow
and carry flags of the EFLAGS register of the *CPU* (not the ALU). So, either
some CPU in the past had a set of overflow/carry flags per ALU and did some
sort of magic to make sure that the last state of those flags across multiple
ALUs that might have been used in parallelizing work were always in the CPU's
logical EFLAGS register, or the CPU has a buggy microcode that allowed two
ALUs to operate on data at the same time in situations where they would
potentially stomp on the carry/overflow flags of the other ALUs operations.
It's my theory that all modern CPUs have this behavior fixed, probably via a
microcode update, and so trying to do parallel ALU operations like this simply
has no effect because the CPU (rightly so) serializes the operations to keep
them from clobbering the overflow/carry flags of the other ALUs operations.
My additional theory then is that the reason you see a slowdown from this
patch is because the attempt to parallelize the ALU operation has caused
us to write a series of instructions that, once serialized, are non-optimal
and hinder smooth pipelining of the data (aka going 0*8, 2*8, 4*8, 6*8, 1*8,
3*8, 5*8, and 7*8 in terms of memory accesses is worse than doing them in
order, and since we aren't getting the parallel operation we want, this
is the net result of the patch).
It would explain things anyway.
^ permalink raw reply
* [PATCH -next] SUNRPC: remove duplicated include from clnt.c
From: Wei Yongjun @ 2013-10-30 5:32 UTC (permalink / raw)
To: Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA,
bfields-uC3wQj2KruNg9hUCZPvPmw, davem-fT/PcQaiUtIeIZ0/mPfg9Q
Cc: yongjun_wei-zrsr2BFq86L20UzCJQGyNP8+0UxHXcjY,
linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
From: Wei Yongjun <yongjun_wei-zrsr2BFq86L20UzCJQGyNP8+0UxHXcjY@public.gmane.org>
Remove duplicated include.
Signed-off-by: Wei Yongjun <yongjun_wei-zrsr2BFq86L20UzCJQGyNP8+0UxHXcjY@public.gmane.org>
---
net/sunrpc/clnt.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 759b78b..dab09da 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -31,7 +31,6 @@
#include <linux/in.h>
#include <linux/in6.h>
#include <linux/un.h>
-#include <linux/rcupdate.h>
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/addr.h>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* Re: [PATCH] bnx2: Use dev_kfree_skb_any() in bnx2_tx_int()
From: David Miller @ 2013-10-30 6:40 UTC (permalink / raw)
To: xiyou.wangcong; +Cc: tdmackey, mchan, netdev, linux-kernel
In-Reply-To: <CAM_iQpXzJOxZ3MFNthKw5ye=WcxDWbiPaHuL=pHMz7D1R4nJ+w@mail.gmail.com>
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue, 29 Oct 2013 20:50:08 -0700
> Normally ->poll() is called in softirq context, while netpoll could
> be called in any context depending on its caller.
It still makes amends to make the execution context still looks
"compatible" as far as locking et al. is concerned.
^ permalink raw reply
* RE: [PATCH net 0/2] qlcnic: Bug fixes
From: Shahed Shaikh @ 2013-10-30 6:44 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Dept-NX Linux NIC Driver
In-Reply-To: <20131026.000549.2135341232449385506.davem@davemloft.net>
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Saturday, October 26, 2013 9:36 AM
> To: Shahed Shaikh
> Cc: netdev; Dept-NX Linux NIC Driver
> Subject: Re: [PATCH net 0/2] qlcnic: Bug fixes
>
> From: Shahed Shaikh <shahed.shaikh@qlogic.com>
> Date: Fri, 25 Oct 2013 10:38:35 -0400
>
> > From: Shahed Shaikh <shahed.shaikh@qlogic.com>
> >
> > This patch series contains following fixes-
> > * Performace drop because driver was forcing adapter not to check
> > destination IP for LRO.
> > * driver was not issuing qlcnic_fw_cmd_set_drv_version() to 83xx adapter
> > becasue of improper handling of QLCNIC_FW_CAPABILITY_MORE_CAPS
> bit.
> >
> > Please apply to net.
>
> Applied, what exactly does that destination IP check do?
When a destination IP is programmed, adapter will perform LRO only on TCP packets destined to the programmed IP address.
When the adapter is programmed to skip destination IP check, adapter performs LRO on TCP packets destined to all IP addresses.
Thanks,
Shahed
^ permalink raw reply
* RE: [PATCH net 1/2] qlcnic: Do not force adapter to perform LRO without destination IP check
From: Shahed Shaikh @ 2013-10-30 6:46 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Dept-NX Linux NIC Driver
In-Reply-To: <1382761028.4032.36.camel@edumazet-glaptop.roam.corp.google.com>
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Saturday, October 26, 2013 9:47 AM
> To: Shahed Shaikh
> Cc: David Miller; netdev; Dept-NX Linux NIC Driver
> Subject: Re: [PATCH net 1/2] qlcnic: Do not force adapter to perform LRO
> without destination IP check
>
> On Fri, 2013-10-25 at 10:38 -0400, Shahed Shaikh wrote:
> > From: Shahed Shaikh <shahed.shaikh@qlogic.com>
> >
> > Forcing adapter to perform LRO without destination IP check degrades
> > the performance.
>
>
> Hmm... the performance, or does it allow two packets belonging to different
> flows being wrongly aggregated ?
>
> It looks a critical bug fix, not only a performance issue.
Hi Eric,
That's not the case. When destination IP check is skipped, due to a firmware bug, some packets take a slower path leading to performance issues.
Once the firmware issue is resolved we will revert this change in the driver to skip the des tip check.
Thanks,
Shahed
^ permalink raw reply
* [PATCH] bgmac: pass received packet to the netif instead of copying it
From: Rafał Miłecki @ 2013-10-30 7:00 UTC (permalink / raw)
To: netdev, David S. Miller; +Cc: openwrt-devel
Copying whole packets with skb_copy_from_linear_data_offset is a pretty
bad idea. CPU was spending time in __copy_user_common and network
performance was lower. With the new solution iperf-measured speed
increased from 116Mb/s to 134Mb/s.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
---
Changes since [RFC TRY#2]:
1) Fixed arguments alignment
2) Dropped code fixing old slot in case of bgmac_dma_rx_skb_for_slot
failure. Thanks to Nathan patch bgmac_dma_rx_skb_for_slot doesn't
change anything in slot in case it failed somewhere.
---
drivers/net/ethernet/broadcom/bgmac.c | 66 +++++++++++++++++++--------------
1 file changed, 39 insertions(+), 27 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index 9a3fc89..eeb651d 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -315,7 +315,6 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
struct device *dma_dev = bgmac->core->dma_dev;
struct bgmac_slot_info *slot = &ring->slots[ring->start];
struct sk_buff *skb = slot->skb;
- struct sk_buff *new_skb;
struct bgmac_rx_header *rx;
u16 len, flags;
@@ -328,38 +327,51 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
len = le16_to_cpu(rx->len);
flags = le16_to_cpu(rx->flags);
- /* Check for poison and drop or pass the packet */
- if (len == 0xdead && flags == 0xbeef) {
- bgmac_err(bgmac, "Found poisoned packet at slot %d, DMA issue!\n",
- ring->start);
- } else {
+ do {
+ dma_addr_t old_dma_addr = slot->dma_addr;
+ int err;
+
+ /* Check for poison and drop or pass the packet */
+ if (len == 0xdead && flags == 0xbeef) {
+ bgmac_err(bgmac, "Found poisoned packet at slot %d, DMA issue!\n",
+ ring->start);
+ dma_sync_single_for_device(dma_dev,
+ slot->dma_addr,
+ BGMAC_RX_BUF_SIZE,
+ DMA_FROM_DEVICE);
+ break;
+ }
+
/* Omit CRC. */
len -= ETH_FCS_LEN;
- new_skb = netdev_alloc_skb_ip_align(bgmac->net_dev, len);
- if (new_skb) {
- skb_put(new_skb, len);
- skb_copy_from_linear_data_offset(skb, BGMAC_RX_FRAME_OFFSET,
- new_skb->data,
- len);
- skb_checksum_none_assert(skb);
- new_skb->protocol =
- eth_type_trans(new_skb, bgmac->net_dev);
- netif_receive_skb(new_skb);
- handled++;
- } else {
- bgmac->net_dev->stats.rx_dropped++;
- bgmac_err(bgmac, "Allocation of skb for copying packet failed!\n");
+ /* Prepare new skb as replacement */
+ err = bgmac_dma_rx_skb_for_slot(bgmac, slot);
+ if (err) {
+ /* Poison the old skb */
+ rx->len = cpu_to_le16(0xdead);
+ rx->flags = cpu_to_le16(0xbeef);
+
+ dma_sync_single_for_device(dma_dev,
+ slot->dma_addr,
+ BGMAC_RX_BUF_SIZE,
+ DMA_FROM_DEVICE);
+ break;
}
+ bgmac_dma_rx_setup_desc(bgmac, ring, ring->start);
- /* Poison the old skb */
- rx->len = cpu_to_le16(0xdead);
- rx->flags = cpu_to_le16(0xbeef);
- }
+ /* Unmap old skb, we'll pass it to the netfif */
+ dma_unmap_single(dma_dev, old_dma_addr,
+ BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);
+
+ skb_put(skb, BGMAC_RX_FRAME_OFFSET + len);
+ skb_pull(skb, BGMAC_RX_FRAME_OFFSET);
- /* Make it back accessible to the hardware */
- dma_sync_single_for_device(dma_dev, slot->dma_addr,
- BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);
+ skb_checksum_none_assert(skb);
+ skb->protocol = eth_type_trans(skb, bgmac->net_dev);
+ netif_receive_skb(skb);
+ handled++;
+ } while (0);
if (++ring->start >= BGMAC_RX_RING_SLOTS)
ring->start = 0;
--
1.7.10.4
_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel
^ permalink raw reply related
* Re: [PATCH RESEND] packet: Deliver VLAN TPID to userspace
From: Atzm Watanabe @ 2013-10-30 7:03 UTC (permalink / raw)
To: Ben Hutchings; +Cc: David Miller, netdev, stephen
In-Reply-To: <1382991074.3779.22.camel@bwh-desktop.uk.level5networks.com>
At Mon, 28 Oct 2013 20:11:14 +0000,
Ben Hutchings wrote:
>
> On Fri, 2013-10-25 at 18:59 -0400, David Miller wrote:
> > From: Atzm Watanabe <atzm@stratosphere.co.jp>
> > Date: Tue, 22 Oct 2013 17:39:40 +0900
> >
> > > struct tpacket_hdr_variant1 {
> > > __u32 tp_rxhash;
> > > __u32 tp_vlan_tci;
> > > + __u32 tp_vlan_tpid;
> > > };
> > >
> >
> > You cannot do this, the header length is not variable.
>
> Well it is variable to an extent. This is used as a member of the final
> anonymous union member of struct tpacket3_hdr, and the latter is
> specified to be tail-padded to a multiple of 16 bytes
> (TPACKET_ALIGNMENT). Since its current size is 36 bytes, I think it can
> safely grow by 12 bytes, so long as userland doesn't depend on
> getsockopt(..., PACKET_HDRLEN, ...) returning only specific values.
>
> Possibly there should be a separate struct tpacket_hdr_variant2 which
> includes the extra member. Possibly there should also be a status flag
> to indicate that this member is present.
Should it be structures as below?
struct tpacket_hdr_variant1 {
__u32 tp_rxhash;
__u32 tp_vlan_tci;
};
struct tpacket_hdr_variant2 {
__u32 tp_rxhash;
__u32 tp_vlan_tci;
__u32 tp_vlan_tpid;
};
struct tpacket3_hdr {
__u32 tp_next_offset;
__u32 tp_sec;
__u32 tp_nsec;
__u32 tp_snaplen;
__u32 tp_len;
__u32 tp_status;
__u16 tp_mac;
__u16 tp_net;
/* pkt_hdr variants */
union {
struct tpacket_hdr_variant1 hv1;
struct tpacket_hdr_variant2 hv2;
};
};
If it's ok, I'd like to send the patch v2.
> > This patch has been submitted several times, each of which you
> > have been shown ways in which your changes break userspace in
> > one way or another.
>
> I think we established that struct tpacket3_hdr can't grow arbitrarily,
> but not that it can't grow at all.
I think so, too.
^ permalink raw reply
* [PATCH net v2 0/3] r8152 bug fixes
From: Hayes Wang @ 2013-10-30 7:13 UTC (permalink / raw)
To: netdev-u79uwXL29TY76Z2rM5mHXA
Cc: nic_swsd-Rasf1IRRPZFBDgjK7y7TUQ,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-usb-u79uwXL29TY76Z2rM5mHXA, Hayes Wang
In-Reply-To: <1383033377-1178-1-git-send-email-hayeswang-Rasf1IRRPZFBDgjK7y7TUQ@public.gmane.org>
I have update the commit messages. I hope they are clear enough.
I remove the stop/wake tx queue from the second patch first, until
I find a way to determine which value is suitable for TX_QLEN.
Hayes Wang (3):
r8152: fix tx/rx memory overflow
r8152: modify the tx flow
r8152: fix incorrect type in assignment
drivers/net/usb/r8152.c | 94 +++++++++++++++----------------------------------
1 file changed, 29 insertions(+), 65 deletions(-)
--
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH net v2 1/3] r8152: fix tx/rx memory overflow
From: Hayes Wang @ 2013-10-30 7:13 UTC (permalink / raw)
To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1383117220-893-1-git-send-email-hayeswang@realtek.com>
The tx/rx would access the memory which is out of the desired range.
Modify the method of checking the end of the memory to avoid it.
For r8152_tx_agg_fill(), the variable remain may become negative.
However, the declaration is unsigned, so the while loop wouldn't
break when reach the end of the desied memory. Although change the
declaration from unsigned to signed is enough to fix it, I also
modify the checking method for safe. Replace
remain = rx_buf_sz - sizeof(*tx_desc) -
(u32)((void *)tx_data - agg->head);
with
remain = rx_buf_sz - (int)(tx_agg_align(tx_data) - agg->head);
to make sure the variable remain is always positive. Then, the
overflow wouldn't happen.
For rx_bottom(), the rx_desc should not be used to calculate the
packet length before making sure the rx_desc is in the desired range.
Change the checking to two parts. First, check the descriptor is in
the memory. The other, using the descriptor to find out the packet
length and check if the packet is in the memory.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
drivers/net/usb/r8152.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index f3fce41..5dbfe50 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -24,7 +24,7 @@
#include <linux/ipv6.h>
/* Version Information */
-#define DRIVER_VERSION "v1.01.0 (2013/08/12)"
+#define DRIVER_VERSION "v1.02.0 (2013/10/28)"
#define DRIVER_AUTHOR "Realtek linux nic maintainers <nic_swsd@realtek.com>"
#define DRIVER_DESC "Realtek RTL8152 Based USB 2.0 Ethernet Adapters"
#define MODULENAME "r8152"
@@ -1136,14 +1136,14 @@ r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc, struct sk_buff *skb)
static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
{
- u32 remain;
+ int remain;
u8 *tx_data;
tx_data = agg->head;
agg->skb_num = agg->skb_len = 0;
- remain = rx_buf_sz - sizeof(struct tx_desc);
+ remain = rx_buf_sz;
- while (remain >= ETH_ZLEN) {
+ while (remain >= ETH_ZLEN + sizeof(struct tx_desc)) {
struct tx_desc *tx_desc;
struct sk_buff *skb;
unsigned int len;
@@ -1152,12 +1152,14 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
if (!skb)
break;
+ remain -= sizeof(*tx_desc);
len = skb->len;
if (remain < len) {
skb_queue_head(&tp->tx_queue, skb);
break;
}
+ tx_data = tx_agg_align(tx_data);
tx_desc = (struct tx_desc *)tx_data;
tx_data += sizeof(*tx_desc);
@@ -1167,9 +1169,8 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
agg->skb_len += len;
dev_kfree_skb_any(skb);
- tx_data = tx_agg_align(tx_data + len);
- remain = rx_buf_sz - sizeof(*tx_desc) -
- (u32)((void *)tx_data - agg->head);
+ tx_data += len;
+ remain = rx_buf_sz - (int)(tx_agg_align(tx_data) - agg->head);
}
usb_fill_bulk_urb(agg->urb, tp->udev, usb_sndbulkpipe(tp->udev, 2),
@@ -1188,7 +1189,6 @@ static void rx_bottom(struct r8152 *tp)
list_for_each_safe(cursor, next, &tp->rx_done) {
struct rx_desc *rx_desc;
struct rx_agg *agg;
- unsigned pkt_len;
int len_used = 0;
struct urb *urb;
u8 *rx_data;
@@ -1204,17 +1204,22 @@ static void rx_bottom(struct r8152 *tp)
rx_desc = agg->head;
rx_data = agg->head;
- pkt_len = le32_to_cpu(rx_desc->opts1) & RX_LEN_MASK;
- len_used += sizeof(struct rx_desc) + pkt_len;
+ len_used += sizeof(struct rx_desc);
- while (urb->actual_length >= len_used) {
+ while (urb->actual_length > len_used) {
struct net_device *netdev = tp->netdev;
struct net_device_stats *stats;
+ unsigned pkt_len;
struct sk_buff *skb;
+ pkt_len = le32_to_cpu(rx_desc->opts1) & RX_LEN_MASK;
if (pkt_len < ETH_ZLEN)
break;
+ len_used += pkt_len;
+ if (urb->actual_length < len_used)
+ break;
+
stats = rtl8152_get_stats(netdev);
pkt_len -= 4; /* CRC */
@@ -1234,9 +1239,8 @@ static void rx_bottom(struct r8152 *tp)
rx_data = rx_agg_align(rx_data + pkt_len + 4);
rx_desc = (struct rx_desc *)rx_data;
- pkt_len = le32_to_cpu(rx_desc->opts1) & RX_LEN_MASK;
len_used = (int)(rx_data - (u8 *)agg->head);
- len_used += sizeof(struct rx_desc) + pkt_len;
+ len_used += sizeof(struct rx_desc);
}
submit:
--
1.8.3.1
^ permalink raw reply related
* [PATCH net v2 2/3] r8152: modify the tx flow
From: Hayes Wang @ 2013-10-30 7:13 UTC (permalink / raw)
To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1383117220-893-1-git-send-email-hayeswang@realtek.com>
Remove the code for sending the packet in the rtl8152_start_xmit().
Let rtl8152_start_xmit() to queue the packet only, and schedule a
tasklet to send the queued packets. This simplify the code and make
sure all the packet would be sent by the original order.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
drivers/net/usb/r8152.c | 46 +++-------------------------------------------
1 file changed, 3 insertions(+), 43 deletions(-)
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 5dbfe50..763234d 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -1388,53 +1388,13 @@ static netdev_tx_t rtl8152_start_xmit(struct sk_buff *skb,
struct net_device *netdev)
{
struct r8152 *tp = netdev_priv(netdev);
- struct net_device_stats *stats = rtl8152_get_stats(netdev);
- unsigned long flags;
- struct tx_agg *agg = NULL;
- struct tx_desc *tx_desc;
- unsigned int len;
- u8 *tx_data;
- int res;
skb_tx_timestamp(skb);
- /* If tx_queue is not empty, it means at least one previous packt */
- /* is waiting for sending. Don't send current one before it. */
- if (skb_queue_empty(&tp->tx_queue))
- agg = r8152_get_tx_agg(tp);
-
- if (!agg) {
- skb_queue_tail(&tp->tx_queue, skb);
- return NETDEV_TX_OK;
- }
-
- tx_desc = (struct tx_desc *)agg->head;
- tx_data = agg->head + sizeof(*tx_desc);
- agg->skb_num = agg->skb_len = 0;
+ skb_queue_tail(&tp->tx_queue, skb);
- len = skb->len;
- r8152_tx_csum(tp, tx_desc, skb);
- memcpy(tx_data, skb->data, len);
- dev_kfree_skb_any(skb);
- agg->skb_num++;
- agg->skb_len += len;
- usb_fill_bulk_urb(agg->urb, tp->udev, usb_sndbulkpipe(tp->udev, 2),
- agg->head, len + sizeof(*tx_desc),
- (usb_complete_t)write_bulk_callback, agg);
- res = usb_submit_urb(agg->urb, GFP_ATOMIC);
- if (res) {
- /* Can we get/handle EPIPE here? */
- if (res == -ENODEV) {
- netif_device_detach(tp->netdev);
- } else {
- netif_warn(tp, tx_err, netdev,
- "failed tx_urb %d\n", res);
- stats->tx_dropped++;
- spin_lock_irqsave(&tp->tx_lock, flags);
- list_add_tail(&agg->list, &tp->tx_free);
- spin_unlock_irqrestore(&tp->tx_lock, flags);
- }
- }
+ if (!list_empty(&tp->tx_free))
+ tasklet_schedule(&tp->tl);
return NETDEV_TX_OK;
}
--
1.8.3.1
^ permalink raw reply related
* [PATCH net v2 3/3] r8152: fix incorrect type in assignment
From: Hayes Wang @ 2013-10-30 7:13 UTC (permalink / raw)
To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1383117220-893-1-git-send-email-hayeswang@realtek.com>
The data from the hardware should be little endian. Correct the
declaration.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
drivers/net/usb/r8152.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 763234d..a77bdb8 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -307,22 +307,22 @@ enum rtl8152_flags {
#define MCU_TYPE_USB 0x0000
struct rx_desc {
- u32 opts1;
+ __le32 opts1;
#define RX_LEN_MASK 0x7fff
- u32 opts2;
- u32 opts3;
- u32 opts4;
- u32 opts5;
- u32 opts6;
+ __le32 opts2;
+ __le32 opts3;
+ __le32 opts4;
+ __le32 opts5;
+ __le32 opts6;
};
struct tx_desc {
- u32 opts1;
+ __le32 opts1;
#define TX_FS (1 << 31) /* First segment of a packet */
#define TX_LS (1 << 30) /* Final segment of a packet */
#define TX_LEN_MASK 0x3ffff
- u32 opts2;
+ __le32 opts2;
#define UDP_CS (1 << 31) /* Calculate UDP/IP checksum */
#define TCP_CS (1 << 30) /* Calculate TCP/IP checksum */
#define IPV4_CS (1 << 29) /* Calculate IPv4 checksum */
@@ -876,7 +876,7 @@ static void write_bulk_callback(struct urb *urb)
static void intr_callback(struct urb *urb)
{
struct r8152 *tp;
- __u16 *d;
+ __le16 *d;
int status = urb->status;
int res;
--
1.8.3.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox