* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
[not found] <bug-11316-10286@http.bugzilla.kernel.org/>
@ 2008-08-13 5:12 ` Andrew Morton
2008-08-14 2:08 ` Alex Williamson
0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2008-08-13 5:12 UTC (permalink / raw)
To: netdev; +Cc: bugme-daemon, alex.williamson
(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).
On Tue, 12 Aug 2008 22:04:41 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=11316
>
> Summary: severe performance regression for iptables nat routing
> Product: Networking
> Version: 2.5
> KernelVersion: 2.6.27-rc3
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: high
> Priority: P1
> Component: Netfilter/Iptables
> AssignedTo: networking_netfilter-iptables@kernel-bugs.osdl.org
> ReportedBy: alex.williamson@hp.com
>
>
> Latest working kernel version: 2.6.26.2
> Earliest failing kernel version: 2.6.27-rc2 (maybe earlier)
> Distribution: Ubuntu
> Hardware Environment: x86_64
> Software Environment: 32bit userspace/64bit kernel
> Problem Description: When using iptables to intercept addr:port and reroute
> through an ssh tunnel, I see a huge performance hit on the 2.6.27-rc series
> relative to 2.6.26 (34KB/s vs 1+MB/s).
>
> Steps to reproduce:
>
> Set up an ssh tunnel to one of the kernel.org servers using a system on your
> local network:
>
> ssh -L 8888:204.152.191.37:80 <local system>
>
> Leave the ssh session running. In a new terminal (on your local system),
> verify performance of direct access versus the tunnel:
>
> wget -O /dev/null http://204.152.191.37/pub/linux/kernel/v2.6/linux-2.6.26.2.tar.bz2
> wget -O /dev/null http://127.0.0.1:8888/pub/linux/kernel/v2.6/linux-2.6.26.2.tar.bz2
>
> These should be roughly the same. Now set up iptables so that when you try to
> access 204.152.191.37:80 you'll automatically be redirected to the ssh tunnel:
>
> sudo iptables -t nat -N bug
> sudo iptables -t nat -I OUTPUT 1 -j bug
> sudo iptables -t nat -A bug -d 204.152.191.37 -p tcp --dport 80 -j DNAT --to-destination 127.0.0.1:8888
>
> Repeat the performance test:
>
> wget -O /dev/null http://204.152.191.37/pub/linux/kernel/v2.6/linux-2.6.26.2.tar.bz2
> wget -O /dev/null http://127.0.0.1:8888/pub/linux/kernel/v2.6/linux-2.6.26.2.tar.bz2
>
> On 2.6.27-rc2+ my rate quickly drops down to ~34KB/s using the iptables NAT'd
> wget (204.152.191.37) while the ssh tunnel still runs 1+MB/s. On 2.6.26 I get
> similar performance for both paths.
>
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-13 5:12 ` [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing Andrew Morton
@ 2008-08-14 2:08 ` Alex Williamson
2008-08-14 2:21 ` David Miller
0 siblings, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2008-08-14 2:08 UTC (permalink / raw)
To: Andrew Morton, buytenh; +Cc: netdev, bugme-daemon
git bisect traced the problem back to this changeset:
commit e5a4a72d4f88f4389e9340d383ca67031d1b8536
Author: Lennert Buytenhek <buytenh@marvell.com>
Date: Sun Aug 3 01:23:10 2008 -0700
net: use software GSO for SG+CSUM capable netdevices
I've verified that I can toggle the slowness by reverting this patch on
top of 8d0968ab (current head). The problem is readily reproducible
using Ubuntu Hardy in a KVM VM with upstream, defconfig kernel.
--
Alex Williamson HP Open Source & Linux Org.
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-14 2:08 ` Alex Williamson
@ 2008-08-14 2:21 ` David Miller
2008-08-14 11:04 ` Patrick McHardy
2008-08-15 4:34 ` Herbert Xu
0 siblings, 2 replies; 22+ messages in thread
From: David Miller @ 2008-08-14 2:21 UTC (permalink / raw)
To: alex.williamson; +Cc: akpm, buytenh, netdev, bugme-daemon, kaber
From: Alex Williamson <alex.williamson@hp.com>
Date: Wed, 13 Aug 2008 20:08:20 -0600
> git bisect traced the problem back to this changeset:
>
> commit e5a4a72d4f88f4389e9340d383ca67031d1b8536
> Author: Lennert Buytenhek <buytenh@marvell.com>
> Date: Sun Aug 3 01:23:10 2008 -0700
>
> net: use software GSO for SG+CSUM capable netdevices
>
> I've verified that I can toggle the slowness by reverting this patch on
> top of 8d0968ab (current head). The problem is readily reproducible
> using Ubuntu Hardy in a KVM VM with upstream, defconfig kernel.
Patrick, I wonder if there is a case where iptables NAT will COW the packet
when it really doesn't need to.
It seems, if anything, using GSO should make things go a little bit
faster not slower... Hmmm...
Anyways, if we can't figure this one out soon we can easily revert.
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-14 2:21 ` David Miller
@ 2008-08-14 11:04 ` Patrick McHardy
2008-08-14 15:08 ` Alex Williamson
2008-08-14 22:00 ` David Miller
2008-08-15 4:34 ` Herbert Xu
1 sibling, 2 replies; 22+ messages in thread
From: Patrick McHardy @ 2008-08-14 11:04 UTC (permalink / raw)
To: David Miller; +Cc: alex.williamson, akpm, buytenh, netdev, bugme-daemon
David Miller wrote:
> From: Alex Williamson <alex.williamson@hp.com>
> Date: Wed, 13 Aug 2008 20:08:20 -0600
>
>> git bisect traced the problem back to this changeset:
>>
>> commit e5a4a72d4f88f4389e9340d383ca67031d1b8536
>> Author: Lennert Buytenhek <buytenh@marvell.com>
>> Date: Sun Aug 3 01:23:10 2008 -0700
>>
>> net: use software GSO for SG+CSUM capable netdevices
>>
>> I've verified that I can toggle the slowness by reverting this patch on
>> top of 8d0968ab (current head). The problem is readily reproducible
>> using Ubuntu Hardy in a KVM VM with upstream, defconfig kernel.
>
> Patrick, I wonder if there is a case where iptables NAT will COW the packet
> when it really doesn't need to.
I don't think so, it's using skb_make_writable everywhere, which checks
for skb_clone_writable, which should usually avoid COWing local TCP
packets. It would also be unlikely to have that much of a performance
impact (1MB/s -> 34KB/s).
>
> It seems, if anything, using GSO should make things go a little bit
> faster not slower... Hmmm...
Alex, could you post a tcpdump from both loopback and the outgoing
device from the machine you're doing NAT on?
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-14 11:04 ` Patrick McHardy
@ 2008-08-14 15:08 ` Alex Williamson
2008-08-15 4:44 ` Herbert Xu
2008-08-14 22:00 ` David Miller
1 sibling, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2008-08-14 15:08 UTC (permalink / raw)
To: Patrick McHardy; +Cc: David Miller, akpm, buytenh, netdev, bugme-daemon
[-- Attachment #1: Type: text/plain, Size: 773 bytes --]
On Thu, 2008-08-14 at 13:04 +0200, Patrick McHardy wrote:
> I don't think so, it's using skb_make_writable everywhere, which checks
> for skb_clone_writable, which should usually avoid COWing local TCP
> packets. It would also be unlikely to have that much of a performance
> impact (1MB/s -> 34KB/s).
>
> >
> > It seems, if anything, using GSO should make things go a little bit
> > faster not slower... Hmmm...
>
> Alex, could you post a tcpdump from both loopback and the outgoing
> device from the machine you're doing NAT on?
Attached, let me know if you want more options, this is just -vv -n.
The NAT'ing system is at 10.0.2.15 and the ssh tunnel target is
192.168.1.60. Thanks,
Alex
--
Alex Williamson HP Open Source & Linux Org.
[-- Attachment #2: eth0.log.bz2 --]
[-- Type: application/x-bzip, Size: 34521 bytes --]
[-- Attachment #3: lo.log.bz2 --]
[-- Type: application/x-bzip, Size: 10538 bytes --]
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-14 11:04 ` Patrick McHardy
2008-08-14 15:08 ` Alex Williamson
@ 2008-08-14 22:00 ` David Miller
1 sibling, 0 replies; 22+ messages in thread
From: David Miller @ 2008-08-14 22:00 UTC (permalink / raw)
To: kaber; +Cc: alex.williamson, akpm, buytenh, netdev, bugme-daemon
From: Patrick McHardy <kaber@trash.net>
Date: Thu, 14 Aug 2008 13:04:25 +0200
> David Miller wrote:
> > Patrick, I wonder if there is a case where iptables NAT will COW the packet
> > when it really doesn't need to.
>
> I don't think so, it's using skb_make_writable everywhere, which checks
> for skb_clone_writable, which should usually avoid COWing local TCP
> packets. It would also be unlikely to have that much of a performance
> impact (1MB/s -> 34KB/s).
I think he is NAT'ing locally generated traffic, look at the bugzilla
entry.
He has two cases of the same wget transfer, one is direct and another
uses a 127.0.0.1:XXXX URL that does the transfer over an SSH tunnel.
Normally they go roughly at the same rate.
Then he adds iptables NAT entries that redirect the first transfer
case over the SSH tunnel addr/port. And it is this case that degrades
in performance with the GSO changeset.
So it is locally generated TCP traffic, NAT'd to another port and IP
address (specifically, redirected to 127.0.0.1:8888).
Perhaps the problem has something to do with the fact that as far as
TCP is concerned, the destination device can do SG and CSUM and thus
GSO. But then iptables NATs this traffic to loopback. I think that
is what leads to some kind of slowpath.
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-14 2:21 ` David Miller
2008-08-14 11:04 ` Patrick McHardy
@ 2008-08-15 4:34 ` Herbert Xu
1 sibling, 0 replies; 22+ messages in thread
From: Herbert Xu @ 2008-08-15 4:34 UTC (permalink / raw)
To: David Miller; +Cc: alex.williamson, akpm, buytenh, netdev, bugme-daemon, kaber
David Miller <davem@davemloft.net> wrote:
>
> Patrick, I wonder if there is a case where iptables NAT will COW the packet
> when it really doesn't need to.
This doesn't make sense. He's downloading from a remote host, so
GSO shouldn't even come into play.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-14 15:08 ` Alex Williamson
@ 2008-08-15 4:44 ` Herbert Xu
2008-08-15 5:35 ` Herbert Xu
[not found] ` <1218778238.23510.22.camel@2710p.home>
0 siblings, 2 replies; 22+ messages in thread
From: Herbert Xu @ 2008-08-15 4:44 UTC (permalink / raw)
To: Alex Williamson; +Cc: kaber, davem, akpm, buytenh, netdev, bugme-daemon
Alex Williamson <alex.williamson@hp.com> wrote:
>
> Attached, let me know if you want more options, this is just -vv -n.
> The NAT'ing system is at 10.0.2.15 and the ssh tunnel target is
> 192.168.1.60. Thanks,
Right, the underlying TCP connection is going well, but the NATed
connection is getting checksum errors. Please send us the raw
packet dump on lo (tcpdump -s 1600 -w file) so we can see what's
wrong.
Actually, I think I know what's going on but a raw packet dump should
confirm whether we're getting a partial checksum.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-15 4:44 ` Herbert Xu
@ 2008-08-15 5:35 ` Herbert Xu
2008-08-15 5:49 ` Alex Williamson
2008-08-15 21:56 ` David Miller
[not found] ` <1218778238.23510.22.camel@2710p.home>
1 sibling, 2 replies; 22+ messages in thread
From: Herbert Xu @ 2008-08-15 5:35 UTC (permalink / raw)
To: Alex Williamson; +Cc: kaber, davem, akpm, buytenh, netdev, bugme-daemon
On Fri, Aug 15, 2008 at 02:44:26PM +1000, Herbert Xu wrote:
>
> Actually, I think I know what's going on but a raw packet dump should
> confirm whether we're getting a partial checksum.
Never mind, I think I've found the problem.
loopback: Drop obsolete ip_summed setting
Now that the network stack can handle inbound packets with partial
checksums, we should no longer clobber the ip_summed field in the
loopback driver. This is because CHECKSUM_UNNECESSARY implies that
the checksum field is actually valid which is not true for loopback
packets since it's only partial (and thus complemented).
This allows packets from lo to then be SNATed to an external source
while still preserving the checksum's validity.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 49f6bc0..810e292 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -137,9 +137,6 @@ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
skb_orphan(skb);
skb->protocol = eth_type_trans(skb,dev);
-#ifndef LOOPBACK_MUST_CHECKSUM
- skb->ip_summed = CHECKSUM_UNNECESSARY;
-#endif
#ifdef LOOPBACK_TSO
if (skb_is_gso(skb)) {
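
For readers following the checksum argument in the commit message above, here is a minimal userspace sketch (not kernel code) of why a partially checksummed TCP segment cannot be reported as already verified. All function names and the example layout are hypothetical. The sketch assumes that the "partial" state means the checksum field holds only the folded, uncomplemented sum of the pseudo-header, which is how the commit message's "only partial (and thus complemented)" is read here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TCP_CHECK_OFFSET = 16 };	/* byte offset of the checksum in a TCP header */

/* Add len bytes to a running ones' complement sum, as 16-bit big-endian words. */
uint32_t csum_add(uint32_t sum, const uint8_t *data, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (i < len)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)data[i] << 8;
	return sum;
}

/* Fold the carries back in; the result is NOT complemented. */
uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Receiver-side check: pseudo-header plus segment must sum to 0xffff. */
bool tcp_csum_ok(const uint8_t *pseudo, size_t pseudo_len,
		 const uint8_t *seg, size_t seg_len)
{
	uint32_t sum = csum_add(0, pseudo, pseudo_len);

	return csum_fold(csum_add(sum, seg, seg_len)) == 0xffff;
}

/* Sender side, "partial" state: seed the field with the pseudo-header sum. */
void store_partial(const uint8_t *pseudo, size_t pseudo_len,
		   uint8_t *seg, size_t seg_len)
{
	uint16_t seed = csum_fold(csum_add(0, pseudo, pseudo_len));

	assert(seg_len >= TCP_CHECK_OFFSET + 2);
	seg[TCP_CHECK_OFFSET] = (uint8_t)(seed >> 8);
	seg[TCP_CHECK_OFFSET + 1] = (uint8_t)(seed & 0xff);
}

/* What a device (or the software fallback) does to finish the job:
 * sum the whole segment, seed included, and store the complement. */
void finish_partial(uint8_t *seg, size_t seg_len)
{
	uint16_t full;

	assert(seg_len >= TCP_CHECK_OFFSET + 2);
	full = (uint16_t)~csum_fold(csum_add(0, seg, seg_len));
	seg[TCP_CHECK_OFFSET] = (uint8_t)(full >> 8);
	seg[TCP_CHECK_OFFSET + 1] = (uint8_t)(full & 0xff);
}
```

A segment that has only been through store_partial() fails tcp_csum_ok(); the same segment passes once finish_partial() has run. Marking the first kind of packet as "checksum unnecessary" hides that difference from later code, such as NAT, that adjusts the field incrementally.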
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-15 5:35 ` Herbert Xu
@ 2008-08-15 5:49 ` Alex Williamson
2008-08-15 6:17 ` Herbert Xu
2008-08-15 21:56 ` David Miller
1 sibling, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2008-08-15 5:49 UTC (permalink / raw)
To: Herbert Xu; +Cc: kaber, davem, akpm, buytenh, netdev, bugme-daemon
On Fri, 2008-08-15 at 15:35 +1000, Herbert Xu wrote:
> On Fri, Aug 15, 2008 at 02:44:26PM +1000, Herbert Xu wrote:
> >
> > Actually, I think I know what's going on but a raw packet dump should
> > confirm whether we're getting a partial checksum.
>
> Never mind, I think I've found the problem.
>
> loopback: Drop obsolete ip_summed setting
>
> Now that the network stack can handle inbound packets with partial
> checksums, we should no longer clobber the ip_summed field in the
> loopback driver. This is because CHECKSUM_UNNECESSARY implies that
> the checksum field is actually valid which is not true for loopback
> packets since it's only partial (and thus complemented).
>
> This allows packets from lo to then be SNATed to an external source
> while still preserving the checksum's validity.
Nope, that doesn't fix it. NAT'd throughput remains about the same.
Thanks,
Alex
--
Alex Williamson HP Open Source & Linux Org.
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-15 5:49 ` Alex Williamson
@ 2008-08-15 6:17 ` Herbert Xu
0 siblings, 0 replies; 22+ messages in thread
From: Herbert Xu @ 2008-08-15 6:17 UTC (permalink / raw)
To: Alex Williamson
Cc: herbert, kaber, davem, akpm, buytenh, netdev, bugme-daemon
Alex Williamson <alex.williamson@hp.com> wrote:
>
> Nope, that doesn't fix it. NAT'd throughput remains about the same.
Please take the raw packet dump on lo then.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
[not found] ` <1218778238.23510.22.camel@2710p.home>
@ 2008-08-15 7:33 ` Herbert Xu
2008-08-15 8:14 ` Herbert Xu
0 siblings, 1 reply; 22+ messages in thread
From: Herbert Xu @ 2008-08-15 7:33 UTC (permalink / raw)
To: Alex Williamson; +Cc: kaber, davem, akpm, buytenh, netdev, bugme-daemon
On Thu, Aug 14, 2008 at 11:30:37PM -0600, Alex Williamson wrote:
>
> Here it is. Thanks,
Can you also post all your netfilter rules (filter + NAT) please?
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-15 7:33 ` Herbert Xu
@ 2008-08-15 8:14 ` Herbert Xu
2008-08-15 10:32 ` Herbert Xu
0 siblings, 1 reply; 22+ messages in thread
From: Herbert Xu @ 2008-08-15 8:14 UTC (permalink / raw)
To: Alex Williamson; +Cc: kaber, davem, akpm, buytenh, netdev, bugme-daemon
On Fri, Aug 15, 2008 at 05:33:43PM +1000, Herbert Xu wrote:
> On Thu, Aug 14, 2008 at 11:30:37PM -0600, Alex Williamson wrote:
> >
> > Here it is. Thanks,
>
> Can you also post all your netfilter rules (filter + NAT) please?
It's OK, I can reproduce it now.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-15 8:14 ` Herbert Xu
@ 2008-08-15 10:32 ` Herbert Xu
2008-08-15 10:53 ` Herbert Xu
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Herbert Xu @ 2008-08-15 10:32 UTC (permalink / raw)
To: Alex Williamson; +Cc: kaber, davem, akpm, buytenh, netdev, bugme-daemon
On Fri, Aug 15, 2008 at 06:14:42PM +1000, Herbert Xu wrote:
>
> It's OK, I can reproduce it now.
This fixes it for me.
loopback: Enable TSO
This patch enables TSO since the loopback device is naturally
capable of handling packets of any size. This also means that
we won't enable GSO on lo which is good until GSO is fixed to
preserve netfilter state as netfilter treats loopback packets
in a special way.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
I'll work on the netfilter state preservation next.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 49f6bc0..c11e621 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -234,9 +231,7 @@ static void loopback_setup(struct net_device *dev)
dev->type = ARPHRD_LOOPBACK; /* 0x0001*/
dev->flags = IFF_LOOPBACK;
dev->features = NETIF_F_SG | NETIF_F_FRAGLIST
-#ifdef LOOPBACK_TSO
| NETIF_F_TSO
-#endif
| NETIF_F_NO_CSUM
| NETIF_F_HIGHDMA
| NETIF_F_LLTX
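
The effect described in the commit message above (advertising TSO keeps the stack from running software GSO on lo) can be sketched in userspace as follows. This is only an illustration under stated assumptions: the feature bits, the struct and the function names are hypothetical and do not match the kernel's definitions, and the real decision in the stack involves more conditions than the single flag modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits for this sketch; not the kernel's values. */
enum {
	FEAT_SG   = 1u << 0,	/* scatter/gather */
	FEAT_CSUM = 1u << 1,	/* checksum offload (or none needed) */
	FEAT_TSO  = 1u << 2,	/* device accepts oversized TCP packets */
};

/* Hypothetical packet descriptor: gso_size != 0 marks a large send. */
struct pkt {
	uint32_t len;
	uint32_t gso_size;
};

bool pkt_is_gso(const struct pkt *p)
{
	assert(p != NULL);
	return p->gso_size != 0;
}

/* Software segmentation is needed only when the packet is a large send
 * and the device does not advertise that it can take it whole. */
bool needs_software_gso(uint32_t dev_features, const struct pkt *p)
{
	return pkt_is_gso(p) && !(dev_features & FEAT_TSO);
}
```

With FEAT_TSO set, as the patch does for loopback, a large packet is handed to the device unsegmented, so the segmentation path that loses the netfilter state is not taken.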
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-15 10:32 ` Herbert Xu
@ 2008-08-15 10:53 ` Herbert Xu
2008-08-15 15:34 ` Alex Williamson
2008-08-15 21:55 ` David Miller
2008-08-15 20:58 ` David Miller
2008-08-15 21:54 ` David Miller
2 siblings, 2 replies; 22+ messages in thread
From: Herbert Xu @ 2008-08-15 10:53 UTC (permalink / raw)
To: Alex Williamson; +Cc: kaber, davem, akpm, buytenh, netdev, bugme-daemon
On Fri, Aug 15, 2008 at 08:32:35PM +1000, Herbert Xu wrote:
>
> I'll work on the netfilter state preservation next.
Here it is:
net: Preserve netfilter attributes in skb_gso_segment using __copy_skb_header
skb_gso_segment didn't preserve some attributes in the original skb
such as the netfilter fields. This was harmless until they were used
which is the case for packets going through lo.
This patch makes it call __copy_skb_header which also picks up some
other missing attributes.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 8464017..ca1ccdf 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2256,14 +2256,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, int features)
segs = nskb;
tail = nskb;
- nskb->dev = skb->dev;
- skb_copy_queue_mapping(nskb, skb);
- nskb->priority = skb->priority;
- nskb->protocol = skb->protocol;
- nskb->vlan_tci = skb->vlan_tci;
- nskb->dst = dst_clone(skb->dst);
- memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
- nskb->pkt_type = skb->pkt_type;
+ __copy_skb_header(nskb, skb);
nskb->mac_len = skb->mac_len;
skb_reserve(nskb, headroom);
@@ -2274,6 +2267,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, int features)
skb_copy_from_linear_data(skb, skb_put(nskb, doffset),
doffset);
if (!sg) {
+ nskb->ip_summed = CHECKSUM_NONE;
nskb->csum = skb_copy_and_csum_bits(skb, offset,
skb_put(nskb, len),
len, 0);
@@ -2283,8 +2277,6 @@ struct sk_buff *skb_segment(struct sk_buff *skb, int features)
frag = skb_shinfo(nskb)->frags;
k = 0;
- nskb->ip_summed = CHECKSUM_PARTIAL;
- nskb->csum = skb->csum;
skb_copy_from_linear_data_offset(skb, offset,
skb_put(nskb, hsize), hsize);
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
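
The pattern behind the patch above (copy the whole header with one helper instead of copying selected fields) can be illustrated with a small userspace sketch. It is not the kernel's skb_segment: struct meta, struct seg, copy_meta() and segment() are hypothetical names, and the two nf_* fields merely stand in for the netfilter state the commit message mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet metadata. */
struct meta {
	int priority;
	int protocol;
	uint32_t nf_mark;	/* stand-in for netfilter state */
	int nf_tracked;		/* stand-in for conntrack attachment */
};

struct seg {
	struct meta meta;
	size_t offset;		/* payload offset within the original packet */
	size_t len;		/* payload bytes carried by this segment */
};

/* Single place that knows how to copy every metadata field. */
void copy_meta(struct meta *dst, const struct meta *src)
{
	*dst = *src;
}

/* Split total payload bytes into segments of at most mss bytes.
 * whole_header == 0 models the old field-by-field copy, which silently
 * misses any field nobody remembered to list. Returns the segment count. */
size_t segment(const struct meta *orig, size_t total, size_t mss,
	       struct seg *out, size_t max_out, int whole_header)
{
	size_t n = 0, off = 0;

	assert(mss > 0);
	while (off < total && n < max_out) {
		struct seg *s = &out[n];
		size_t len = total - off < mss ? total - off : mss;

		if (whole_header) {
			copy_meta(&s->meta, orig);
		} else {
			s->meta = (struct meta){ 0 };
			s->meta.priority = orig->priority;
			s->meta.protocol = orig->protocol;
		}
		s->offset = off;
		s->len = len;
		off += len;
		n++;
	}
	return n;
}
```

In the field-by-field mode every segment comes out with nf_mark and nf_tracked cleared, which is harmless until something downstream depends on them; in the whole-header mode the segments carry the same metadata as the original.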
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-15 10:53 ` Herbert Xu
@ 2008-08-15 15:34 ` Alex Williamson
2008-08-15 21:55 ` David Miller
2008-08-15 21:55 ` David Miller
1 sibling, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2008-08-15 15:34 UTC (permalink / raw)
To: Herbert Xu; +Cc: kaber, davem, akpm, buytenh, netdev, bugme-daemon
On Fri, 2008-08-15 at 20:53 +1000, Herbert Xu wrote:
> On Fri, Aug 15, 2008 at 08:32:35PM +1000, Herbert Xu wrote:
> >
> > I'll work on the netfilter state preservation next.
>
> Here it is:
Confirmed, these patches solve the problem. Thanks Herbert.
--
Alex Williamson HP Open Source & Linux Org.
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-15 10:32 ` Herbert Xu
2008-08-15 10:53 ` Herbert Xu
@ 2008-08-15 20:58 ` David Miller
2008-08-16 0:25 ` Herbert Xu
2008-08-15 21:54 ` David Miller
2 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2008-08-15 20:58 UTC (permalink / raw)
To: herbert; +Cc: alex.williamson, kaber, akpm, buytenh, netdev, bugme-daemon
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 15 Aug 2008 20:32:35 +1000
> loopback: Enable TSO
>
> This patch enables TSO since the loopback device is naturally
> capable of handling packets of any size. This also means that
> we won't enable GSO on lo which is good until GSO is fixed to
> preserve netfilter state as netfilter treats loopback packets
> in a special way.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This, effectively, "enables" LRO on loopback.
And sure it's pretty obscure to shape, NAT, and end up forwarding
loopback received packets, but do you want to be the user trying to do
something like that and trying to find this particular patch which is
causing it to not work? :-)
I really don't know whether it's worth worrying about, I just wanted
to mention it.
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-15 10:32 ` Herbert Xu
2008-08-15 10:53 ` Herbert Xu
2008-08-15 20:58 ` David Miller
@ 2008-08-15 21:54 ` David Miller
2 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2008-08-15 21:54 UTC (permalink / raw)
To: herbert; +Cc: alex.williamson, kaber, akpm, buytenh, netdev, bugme-daemon
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 15 Aug 2008 20:32:35 +1000
> loopback: Enable TSO
>
> This patch enables TSO since the loopback device is naturally
> capable of handling packets of any size. This also means that
> we won't enable GSO on lo which is good until GSO is fixed to
> preserve netfilter state as netfilter treats loopback packets
> in a special way.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Meanwhile I applied this and I took the liberty of applying
the following right afterwards:
loopback: Remove rest of LOOPBACK_TSO code.
It hasn't been enabled for a long time and the generic GSO
engine is better documentation of what is expected of a
device implementing TSO.
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/loopback.c | 62 ------------------------------------------------
1 files changed, 0 insertions(+), 62 deletions(-)
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 46e87cc..489d53b 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -64,68 +64,6 @@ struct pcpu_lstats {
unsigned long bytes;
};
-/* KISS: just allocate small chunks and copy bits.
- *
- * So, in fact, this is documentation, explaining what we expect
- * of largesending device modulo TCP checksum, which is ignored for loopback.
- */
-
-#ifdef LOOPBACK_TSO
-static void emulate_large_send_offload(struct sk_buff *skb)
-{
- struct iphdr *iph = ip_hdr(skb);
- struct tcphdr *th = (struct tcphdr *)(skb_network_header(skb) +
- (iph->ihl * 4));
- unsigned int doffset = (iph->ihl + th->doff) * 4;
- unsigned int mtu = skb_shinfo(skb)->gso_size + doffset;
- unsigned int offset = 0;
- u32 seq = ntohl(th->seq);
- u16 id = ntohs(iph->id);
-
- while (offset + doffset < skb->len) {
- unsigned int frag_size = min(mtu, skb->len - offset) - doffset;
- struct sk_buff *nskb = alloc_skb(mtu + 32, GFP_ATOMIC);
-
- if (!nskb)
- break;
- skb_reserve(nskb, 32);
- skb_set_mac_header(nskb, -ETH_HLEN);
- skb_reset_network_header(nskb);
- iph = ip_hdr(nskb);
- skb_copy_to_linear_data(nskb, skb_network_header(skb),
- doffset);
- if (skb_copy_bits(skb,
- doffset + offset,
- nskb->data + doffset,
- frag_size))
- BUG();
- skb_put(nskb, doffset + frag_size);
- nskb->ip_summed = CHECKSUM_UNNECESSARY;
- nskb->dev = skb->dev;
- nskb->priority = skb->priority;
- nskb->protocol = skb->protocol;
- nskb->dst = dst_clone(skb->dst);
- memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
- nskb->pkt_type = skb->pkt_type;
-
- th = (struct tcphdr *)(skb_network_header(nskb) + iph->ihl * 4);
- iph->tot_len = htons(frag_size + doffset);
- iph->id = htons(id);
- iph->check = 0;
- iph->check = ip_fast_csum((unsigned char *) iph, iph->ihl);
- th->seq = htonl(seq);
- if (offset + doffset + frag_size < skb->len)
- th->fin = th->psh = 0;
- netif_rx(nskb);
- offset += frag_size;
- seq += frag_size;
- id++;
- }
-
- dev_kfree_skb(skb);
-}
-#endif /* LOOPBACK_TSO */
-
/*
* The higher levels take care of making this non-reentrant (it's
* called with bh's disabled).
--
1.5.6.5.GIT
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-15 10:53 ` Herbert Xu
2008-08-15 15:34 ` Alex Williamson
@ 2008-08-15 21:55 ` David Miller
1 sibling, 0 replies; 22+ messages in thread
From: David Miller @ 2008-08-15 21:55 UTC (permalink / raw)
To: herbert; +Cc: alex.williamson, kaber, akpm, buytenh, netdev, bugme-daemon
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 15 Aug 2008 20:53:18 +1000
> net: Preserve netfilter attributes in skb_gso_segment using __copy_skb_header
>
> skb_gso_segment didn't preserve some attributes in the original skb
> such as the netfilter fields. This was harmless until they were used
> which is the case for packets going through lo.
>
> This patch makes it call __copy_skb_header which also picks up some
> other missing attributes.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Applied, thanks Herbert.
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-15 15:34 ` Alex Williamson
@ 2008-08-15 21:55 ` David Miller
0 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2008-08-15 21:55 UTC (permalink / raw)
To: alex.williamson; +Cc: herbert, kaber, akpm, buytenh, netdev, bugme-daemon
From: Alex Williamson <alex.williamson@hp.com>
Date: Fri, 15 Aug 2008 09:34:47 -0600
> On Fri, 2008-08-15 at 20:53 +1000, Herbert Xu wrote:
> > On Fri, Aug 15, 2008 at 08:32:35PM +1000, Herbert Xu wrote:
> > >
> > > I'll work on the netfilter state preservation next.
> >
> > Here it is:
>
> Confirmed, these patches solve the problem. Thanks Herbert.
Thanks for your report and testing the fix Alex.
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-15 5:35 ` Herbert Xu
2008-08-15 5:49 ` Alex Williamson
@ 2008-08-15 21:56 ` David Miller
1 sibling, 0 replies; 22+ messages in thread
From: David Miller @ 2008-08-15 21:56 UTC (permalink / raw)
To: herbert; +Cc: alex.williamson, kaber, akpm, buytenh, netdev, bugme-daemon
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 15 Aug 2008 15:35:48 +1000
> loopback: Drop obsolete ip_summed setting
>
> Now that the network stack can handle inbound packets with partial
> checksums, we should no longer clobber the ip_summed field in the
> loopback driver. This is because CHECKSUM_UNNECESSARY implies that
> the checksum field is actually valid which is not true for loopback
> packets since it's only partial (and thus complemented).
>
> This allows packets from lo to then be SNATed to an external source
> while still preserving the checksum's validity.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
I've applied this one too, let me know if I should not have :)
* Re: [Bugme-new] [Bug 11316] New: severe performance regression for iptables nat routing
2008-08-15 20:58 ` David Miller
@ 2008-08-16 0:25 ` Herbert Xu
0 siblings, 0 replies; 22+ messages in thread
From: Herbert Xu @ 2008-08-16 0:25 UTC (permalink / raw)
To: David Miller; +Cc: alex.williamson, kaber, akpm, buytenh, netdev, bugme-daemon
On Fri, Aug 15, 2008 at 01:58:51PM -0700, David Miller wrote:
>
> This, effectively, "enables" LRO on loopback.
>
> And sure it's pretty obscure to shape, NAT, and end up forwarding
> loopback received packets, but do you want to be the user trying to do
> something like that and trying to find this particular patch which is
> causing it to not work? :-)
>
> I really don't know whether it's worth worrying about, I just wanted
> to mention it.
Well the same code path is also used by Xen and virtio (apart
from the netfilter bits which caused this particular bug), so
we should be pretty safe here.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt