* LRO restructuring?
From: Andrew Gallatin @ 2008-08-11 13:30 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev, Brice Goglin
Hi,
You mentioned in the recent "Re: [RFC][PATCH 1/1] cxgb3i: cxgb3 iSCSI
initiator" thread that you were planning to restructure LRO to
preserve headers so as to make forwarding possible without totally
disabling LRO.
For lro_receive_frags() based LRO, it would be ideal to locate the
header in place in the frag via the mac_hdr argument to the
get_frag_header() callback. E.g., I'm hoping that neither the driver
nor the LRO module will need to allocate extra memory per frame and
copy the headers to it in the common case when forwarding is
not enabled. That would add quite a bit of overhead.
With respect to hardware LRO and headers: Would it be possible
to notify the driver via some sort of callback whether the headers
are required? I think most hardware LRO implementations are going
to collapse the headers, and having the option to fall back to software
LRO for forwarding might be needed for those devices which will throw
away the intermediate headers.
Last, have you considered simply allowing "inexact" forwarding, where
the ingress NIC is doing LRO and the egress NIC is doing TSO? You
lose exact framing information (e.g., what you emit might not be framed
exactly as you receive it), but you can still do filtering, and the
host overhead is very low.
Thanks,
Drew
* Re: LRO restructuring?
From: David Miller @ 2008-08-11 21:03 UTC (permalink / raw)
To: gallatin; +Cc: herbert, netdev, brice
From: Andrew Gallatin <gallatin@myri.com>
Date: Mon, 11 Aug 2008 09:30:33 -0400
> Last, have you considered simply allowing "inexact" forwarding, where
> the ingress NIC is doing LRO and the egress NIC is doing TSO? You
> lose exact framing information (e.g., what you emit might not be framed
> exactly as you receive it), but you can still do filtering, and the
> host overhead is very low.
Intermediate nodes are not supposed to change the transport layer
checksum if at all possible, especially on routers.
Otherwise it is much more difficult to diagnose checksum errors,
and figure out what caused such an error.
When the router doesn't modify the checksum, we know it's an end-node.
Even a firewall only "adjusts" checksums based upon packet
modifications for NAT and such, which will preserve end-node created
errors.
So no this isn't really an option.
This is why Herbert wants to preserve the original headers,
we're not supposed to change them.
* Re: LRO restructuring?
From: Herbert Xu @ 2008-08-12 0:50 UTC (permalink / raw)
To: Andrew Gallatin; +Cc: netdev, Brice Goglin
On Mon, Aug 11, 2008 at 09:30:33AM -0400, Andrew Gallatin wrote:
>
> For lro_receive_frags() based LRO, it would be ideal to locate the
> header in place in the frag via the mac_hdr argument to the
> get_frag_header() callback. E.g., I'm hoping that neither the driver
> nor the LRO module will need to allocate extra memory per frame and
> copy the headers to it in the common case when forwarding is
> not enabled. That would add quite a bit of overhead.
You don't have to save the whole thing, just save enough so we
can easily/exactly reconstruct it on output, i.e., save the lengths.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: LRO restructuring?
From: David Miller @ 2008-08-12 0:54 UTC (permalink / raw)
To: herbert; +Cc: gallatin, netdev, brice
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 12 Aug 2008 08:50:33 +0800
> On Mon, Aug 11, 2008 at 09:30:33AM -0400, Andrew Gallatin wrote:
> >
> > For lro_receive_frags() based LRO, it would be ideal to locate the
> > header in place in the frag via the mac_hdr argument to the
> > get_frag_header() callback. E.g., I'm hoping that neither the driver
> > nor the LRO module will need to allocate extra memory per frame and
> > copy the headers to it in the common case when forwarding is
> > not enabled. That would add quite a bit of overhead.
>
> You don't have to save the whole thing, just save enough so we
> can easily/exactly reconstruct it on output, i.e., save the lengths.
And the checksums :-) As an intermediate node we don't want
to touch the checksum.
The length and the checksum are two u16 values, which would be
able to fit in a single 32-bit descriptor or something like
that.
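
As a rough illustration of the "two u16 values in one 32-bit descriptor"
idea, here is a minimal user-space sketch in Python. It is not kernel code;
the function names and the field order (length in the high half, checksum
in the low half) are assumptions made for the example.

```python
def pack_seg_desc(length, checksum):
    """Pack a segment's length and its original TCP checksum into 32 bits.

    Hypothetical layout: length in bits 31..16, checksum in bits 15..0.
    """
    if not (0 <= length <= 0xFFFF and 0 <= checksum <= 0xFFFF):
        raise ValueError("length and checksum must each fit in 16 bits")
    return (length << 16) | checksum


def unpack_seg_desc(desc):
    """Recover (length, checksum) from a packed 32-bit descriptor."""
    return desc >> 16, desc & 0xFFFF
```

One such descriptor per coalesced segment would be enough to restore the
original length and checksum of that segment on output.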
* Re: LRO restructuring?
From: Herbert Xu @ 2008-08-12 1:00 UTC (permalink / raw)
To: David Miller; +Cc: gallatin, netdev, brice
On Mon, Aug 11, 2008 at 05:54:34PM -0700, David Miller wrote:
>
> And the checksums :-) As an intermediate node we don't want
> to touch the checksum.
Yeah if it wasn't verified then we must store this as well.
> The length and the checksum are two u16 values, which would be
> able to fit in a single 32-bit descriptor or something like
> that.
Yep.
Cheers,
* Re: LRO restructuring?
From: Rick Jones @ 2008-08-12 1:30 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, gallatin, netdev, brice
Herbert Xu wrote:
> On Mon, Aug 11, 2008 at 05:54:34PM -0700, David Miller wrote:
>
>>And the checksums :-) As an intermediate node we don't want
>>to touch the checksum.
>
>
> Yeah if it wasn't verified then we must store this as well.
Even if it was verified, I think you want to keep the checksums from the
header. Since an intermediate device isn't supposed to be peeking at
the TCP part anyway, it wouldn't do to drop the segment ourselves; pass
it along to be dropped by the ultimate receiver. And if there is
something amiss in the verification or the regeneration, we don't want to
introduce silent data corruption.
Likely that also goes for the IP header checksum...
rick jones
* Re: LRO restructuring?
From: David Miller @ 2008-08-12 1:39 UTC (permalink / raw)
To: rick.jones2; +Cc: herbert, gallatin, netdev, brice
From: Rick Jones <rick.jones2@hp.com>
Date: Mon, 11 Aug 2008 18:30:11 -0700
> Even if it was verified, I think you want to keep the checksums from the
> header. Since an intermediate device isn't supposed to be peeking at
> the TCP part anyway, it wouldn't do to drop the segment ourselves; pass
> it along to be dropped by the ultimate receiver. And if there is
> something amiss in the verification or the regeneration, we don't want to
> introduce silent data corruption.
>
> Likely that also goes for the IP header checksum...
The IP header is a little different: intermediate nodes should verify it
(and we do adjust it when decrementing TTL).
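
To make the TTL adjustment concrete, here is a small user-space sketch in
Python (not kernel code) of the standard incremental update from RFC 1624:
only the 16-bit word holding TTL and protocol changes, so the header
checksum can be patched without re-summing the whole header. The function
names are made up for the example, and it assumes an IPv4 header without
options handling beyond treating the header as 16-bit words.

```python
def _fold(total):
    """Fold carries back in to get a 16-bit one's-complement sum."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ip_checksum(header):
    """Full IPv4 header checksum; the checksum field must be zero on input."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    return ~_fold(total) & 0xFFFF


def decrement_ttl(header):
    """Decrement TTL in place and patch the checksum incrementally.

    Uses HC' = ~(~HC + ~m + m') from RFC 1624, where m is the old
    TTL/protocol word and m' is the new one.
    """
    if header[8] == 0:
        raise ValueError("TTL is already zero")
    old_word = (header[8] << 8) | header[9]
    header[8] -= 1
    new_word = (header[8] << 8) | header[9]
    old_csum = (header[10] << 8) | header[11]
    total = (~old_csum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    new_csum = ~_fold(total) & 0xFFFF
    header[10] = new_csum >> 8
    header[11] = new_csum & 0xFF
```

Note that this touches only the IP header checksum; the TCP checksum, which
does not cover the TTL, is left exactly as received.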
* Re: LRO restructuring?
From: Herbert Xu @ 2008-08-12 1:53 UTC (permalink / raw)
To: David Miller; +Cc: rick.jones2, gallatin, netdev, brice
On Mon, Aug 11, 2008 at 06:39:13PM -0700, David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Mon, 11 Aug 2008 18:30:11 -0700
>
> > Even if it was verified, I think you want to keep the checksums from the
> > header. Since an intermediate device isn't supposed to be peeking at
> > the TCP part anyway, it wouldn't do to drop the segment ourselves; pass
> > it along to be dropped by the ultimate receiver. And if there is
> > something amiss in the verification or the regeneration, we don't want to
> > introduce silent data corruption.
Well, I wasn't suggesting that it be dropped, but simply that we skip LRO
if the inbound packet fails the checksum check.
But yeah, it's only two bytes so we might as well always have it.
Cheers,
* Re: LRO restructuring?
From: Andrew Gallatin @ 2008-08-12 11:50 UTC (permalink / raw)
To: David Miller; +Cc: herbert, netdev, brice
David Miller wrote:
> From: Andrew Gallatin <gallatin@myri.com>
> Date: Mon, 11 Aug 2008 09:30:33 -0400
>
>> Last, have you considered simply allowing "inexact" forwarding, where
>> the ingress NIC is doing LRO and the egress NIC is doing TSO? You
>> lose exact framing information (e.g., what you emit might not be framed
>> exactly as you receive it), but you can still do filtering, and the
>> host overhead is very low.
>
> Intermediate nodes are not supposed to change the transport layer
> checksum if at all possible, especially on routers.
Indeed. Nor should they change lengths, or anything else.
Everything about this "inexact" forwarding is illegal as
hell. However, you have to admit that it is an interesting hack :)
Drew
* Re: LRO restructuring?
From: Herbert Xu @ 2008-08-13 2:14 UTC (permalink / raw)
To: Andrew Gallatin; +Cc: davem, herbert, netdev, brice
Andrew Gallatin <gallatin@myri.com> wrote:
>
> Indeed. Nor should they change lengths, or anything else.
> Everything about this "inexact" forwarding is illegal as
> hell. However, you have to admit that it is an interesting hack :)
Solutions like this have been deployed. For instance, many satellite
networks use transparent TCP proxies to mitigate the effect of large
latencies on older TCP stacks that don't have modern congestion
control algorithms.
Surprisingly there are actually very few problems. The biggest
one (apart from scalability) is with non-TCP traffic masquerading
as TCP such as Cisco's VPN solution.
Cheers,
* Re: LRO restructuring?
From: James Huang @ 2009-02-18 19:25 UTC (permalink / raw)
To: netdev
Hi Herbert,
Any idea when this LRO restructuring work will be done?
Making LRO available even when IP forwarding is enabled will significantly
improve performance of network appliances in the data path.
I have some questions on this:
(1) Based on the emails in this thread, I suppose you are going to keep the
original length of each segment you coalesced into the big packet and use that
info to segment the big packet on the output path. In case the packet was
modified by an appliance in the path and the total length is changed (e.g. NAT
on ftp control packets), should the corresponding segment length info also get
updated? This same question also applies to the checksums.
(2) Do you make sure all of the segments to be coalesced have the same DF bit?
(3) I think bridged packets should not be LROed. Whether a packet is bridged
or not can be based on the L2 MAC destination address. Is this how it is done?
(4) Does LRO work only for IPv4? Any plan to extend it to support IPv6?
Thanks,
James Huang
* Re: LRO restructuring?
From: Ben Hutchings @ 2009-02-18 19:42 UTC (permalink / raw)
To: James Huang; +Cc: netdev
On Wed, 2009-02-18 at 19:25 +0000, James Huang wrote:
> Hi Herbert,
>
> Any idea when this LRO restructuring work will be done?
> Making LRO available even when IP forwarding is enabled will significantly
> improve performance of network appliances in the data path.
Herbert has added "GRO" rather than immediately replacing the inet_lro
code. An early version of this is in 2.6.29 and there is more in
net-next-2.6 destined for 2.6.30.
> I have some questions on this:
[...]
> (3) I think bridged packets should not be LROed. Whether a packet is bridged
> or not can be based on the L2 MAC destination address. Is this how it is done?
GRO preserves enough information to reconstruct the original frames on
output, so there is no specific check for bridging. Presumably it would
be cheaper not to use GRO if the frames are not going to hit the
TCP/IP stack though.
> (4) Does LRO work only for IPv4? Any plan to extend it to support IPv6?
IPv6 is covered.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: LRO restructuring?
From: Stephen Hemminger @ 2009-02-18 19:46 UTC (permalink / raw)
To: James Huang; +Cc: netdev
On Wed, 18 Feb 2009 19:25:49 +0000 (UTC)
James Huang <jamesclhuang@gmail.com> wrote:
> Hi Herbert,
>
> Any idea when this LRO restructuring work will be done?
> Making LRO available even when IP forwarding is enabled will significantly
> improve performance of network appliances in the data path.
LRO with IP forwarding is a bad idea since it violates the end-to-end
principle. It might even be a DO NOT in host requirements RFC.
* Re: LRO restructuring?
From: Herbert Xu @ 2009-02-19 13:53 UTC (permalink / raw)
To: James Huang; +Cc: netdev
James Huang <jamesclhuang@gmail.com> wrote:
> Hi Herbert,
Please cc me if you're going to address the email to me :)
> Any idea when this LRO restructuring work will be done?
The target is to get GRO into 2.6.30, and have all LRO drivers
converted by 2.6.31.
> Making LRO available even when IP forwarding is enabled will significantly
> improve performance of network appliances in the data path.
I'd like to see more performance data on the forwarding side.
The act of aggregation isn't exactly free so it very much depends
on what you're forwarding.
The driver for all this work is bridging used for virtualisation
where the data is very likely to be mergeable.
> I have some questions on this:
> (1) Based on the emails in this thread, I suppose you are going to keep the
> original length of each segment you coalesced into the big packet and use that
> info to segment the big packet on the output path. In case the packet was
> modified by an appliance in the path and the total length is changed (e.g. NAT
> on ftp control packets), should the corresponding segment length info also get
> updated? This same question also applies to the checksums.
If packet length changes due to NAT then we're no longer bound
by the end-to-end restriction so we can do whatever we want.
For GRO we treat it in exactly the same way as a GSO packet that
undergoes NAT. That is, NAT just sees a very large IP packet and
does what it has to do, and on final output we resegment the packet.
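
The "aggregate on input, resegment on output" idea can be modelled in a few
lines of user-space Python. This is only a sketch of the scheme discussed
earlier in the thread (save each segment's length and checksum), not how the
GRO code actually stores its state; `coalesce` and `resegment` are
hypothetical names.

```python
def coalesce(segments):
    """Merge (payload, tcp_checksum) pairs into one large payload.

    Returns the merged payload and per-segment (length, checksum) metadata.
    """
    merged = b"".join(payload for payload, _ in segments)
    meta = [(len(payload), csum) for payload, csum in segments]
    return merged, meta


def resegment(merged, meta):
    """Split the merged payload back into the original segments."""
    if sum(length for length, _ in meta) != len(merged):
        # E.g. NAT changed the payload length: exact reconstruction is no
        # longer possible and the caller must segment some other way.
        raise ValueError("saved lengths do not match the merged payload")
    out = []
    offset = 0
    for length, csum in meta:
        out.append((merged[offset:offset + length], csum))
        offset += length
    return out
```

When nothing in between modifies the payload, `resegment(*coalesce(segs))`
returns exactly the segments that came in, with their original checksums.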
> (2) Do you make sure all of the segments to be coalesced have the same DF bit?
Yes.
> (3) I think bridged packets should not be LROed. Whether a packet is bridged
> or not can be based on the L2 MAC destination address. Is this how it is done?
We currently require the Ethernet header to be identical.
> (4) Does LRO work only for IPv4? Any plan to extend it to support IPv6?
IPv6 is supported.
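
The answers to (2) and (3) amount to a merge-eligibility test. A minimal
Python sketch of just those two conditions follows; the `Frame` type and
`can_merge` are hypothetical, and the real code checks considerably more
(flow identity, sequence numbers and so on) that is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    eth_header: bytes  # 14-byte Ethernet header: dst MAC, src MAC, EtherType
    df: bool           # IPv4 Don't Fragment bit
    payload: bytes


def can_merge(held, new):
    """True if `new` may be coalesced with `held` under the two rules above."""
    return held.eth_header == new.eth_header and held.df == new.df
```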
Cheers,
* Re: LRO restructuring?
From: James Huang @ 2009-02-19 22:20 UTC (permalink / raw)
To: netdev
Hi all,
One question about inet_lro.c (linux 2.6.28.6)
In lro_proc_segment(), it does not check the number of available frags entries
in lro_desc->parent against the number of new fragments to be added before
calling lro_add_frags(). Is that a bug? I see that lro_proc_segment() does
some type of checking at the end, but could that be too late (i.e. overflow
the frags[])??
Also I think some obvious optimization in lro_get_desc() can be done by
combining the two for loops into one.
-- James Huang
* Re: LRO restructuring?
From: Herbert Xu @ 2009-02-20 0:37 UTC (permalink / raw)
To: James Huang; +Cc: Jan-Bernd Themann, Christoph Raisch, netdev
On Thu, Feb 19, 2009 at 02:17:11PM -0800, James Huang wrote:
> Hi all,
>
> One question about inet_lro.c (linux 2.6.28.6)
> In lro_proc_segment(), it does not check the number of available frags
> entries in lro_desc->parent against the number of new fragments to be added
> before calling lro_add_frags(). Is that a bug? I see that
> lro_proc_segment() does some type of checking at the end, but could that be
> too late (i.e. overflow the frags[])??
>
> Also I think some obvious optimization in lro_get_desc() can be done by
> combining the two for loops into one.
That file is obsolete. The GRO code that will replace it lives
in various places:
net/core/dev.c
net/core/skbuff.c
net/ipv4/af_inet.c
net/ipv6/af_inet6.c
net/ipv4/tcp_ipv4.c
net/ipv6/tcp_ipv6.c
net/ipv4/tcp.c
net/8021q/vlan_core.c
Cheers,