All of lore.kernel.org
 help / color / mirror / Atom feed
From: Edward Cree <ecree@solarflare.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Or Gerlitz <gerlitz.or@gmail.com>,
	Alexander Duyck <aduyck@mirantis.com>,
	Netdev <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	Tom Herbert <tom@herbertland.com>
Subject: Re: [RFC PATCH 7/9] GSO: Support partial segmentation offload
Date: Thu, 24 Mar 2016 20:17:26 +0000	[thread overview]
Message-ID: <56F44B56.9070206@solarflare.com> (raw)
In-Reply-To: <CAKgT0UeW7A6cMRHm8_nEX=op_VAK5wcoYYRTNQVmMQWe02HdfA@mail.gmail.com>

On 24/03/16 18:43, Alexander Duyck wrote:
> On Thu, Mar 24, 2016 at 10:12 AM, Edward Cree <ecree@solarflare.com> wrote:
>> For UDP header, we look to see if the current checksum field is zero.  If
>> so, we leave it as zero, fold our edits into csum_edit and return the
>> result.  Otherwise, we fold our edits and csum_edit into our checksum
>> field, and return zero.
> This would require changing how we handle partial checksums so that in
> the case of UDP we don't allow 0 as a valid value.  Currently we do.
> It isn't till we get to the final checksum that we take care of the
> bit flip in the case of 0.
No, the UDP checksum will have been filled in by LCO and thus have been bit-
flipped already if it was zero.  Only the innermost L4 header will have a
partial checksum, and that's TCP so the checksum is required.  (Alternatively:
whichever header has the partial checksum - and there is at most one - is
identified by skb->csum_start, and by definition the checksum must be enabled
for that header, so we can skip the 'check for zero' heuristic there.)
(Besides, I thought it was impossible for the partial checksum to be zero
anyway because at least one of the inputs must be nonzero, and the end-
around carry can never produce a zero.  But maybe I'm getting confused here.)
>> This should even be a worthwhile simplification of the non-nested case,
>> because (if I understand correctly) it means GSO partial doesn't need the
>> various gso_type flags we already have to specify tunnel type and checksum
>> status; it just figures it out as it goes.
> Yes, but doing packet inspection can get to be expensive as we crawl
> through the headers.  In addition it gets into the whole software
> versus hardware offloads thing.
The headers should already be in cache, I thought, and this is only happening
once per superframe.  We're already going to have to crawl through the headers
anyway to edit the lengths, I don't think it should cost much more to also
inspect things like GRE csum bit or the UDP checksum field.  And by
identifying the 'next header' from this method as well, we don't need to add a
new SKB_GSO_FOO bit or two every time we add another kind of encapsulation to
the kernel.

(And remember that this is separate from - and doesn't impact - the existing
GSO code; this is a bunch of new foo_gso_partial() callbacks, distinct from
the foo_gso_segment() ones; and it's about preparing a superframe for hardware
offload rather than doing the segmentation in software.  I think it's
preferable to have some preparation happen in software if that allows hardware
to be totally dumb.  (Apologies if that was already all obvious.))
> Honestly I think tunnel-in-tunnel is not going to be doable due to the
> fact that we would have to increment multiple layers of IP ID in order
> to do it correctly.  The more I look into the whole DF on outer
> headers thing the more I see RFCs such as RFC 2784 that say not to do
> it unless you want to implement PMTU discovery, and PMTU discovery is
> inherently flawed since it would require ICMP messages to be passed
> which may be filtered out by firewalls.
If PMTU discovery really is "inherently flawed", then TCP is broken and so
is IPv6.  I think the Right Thing here is for us to translate and forward
ICMP errors to the originator of the traffic, as RFC 2784 vaguely suggests.
It should also be possible to configure the tunnel endpoint's MTU to a
sufficiently low value that in practice it will fit within the path MTU;
then the sender will discover the tunnel's MTU restriction and stay within
that.  (I am assuming here that ICMP won't be filtered on the overlay
network - which should be under the control of the tunnel administrator -
even if it might be on the underlay network.)
> On top of that it occurred to me that GRE cannot be screened in GRO
> for the outer-IP-ID check.  Basically what can happen is on devices
> that don't parse inner headers for GRE we can end up with two TCP
> flows from the same tunnel essentially stomping on each other and
> causing one another to get evicted for having an outer IP-ID that
> doesn't match up with the expected value.
Yes, that does seem problematic - you'd need to save away the IPID check
result and only evict once you'd ascertained they were the same flow.  All
the more reason to try to make your tunnels use DF *grin*.

On the subject of GRO, I was wondering - it seems like the main reason why
drivers have LRO is so they can be more permissive than GRO's "must be
reversible" rules.  (At least, that seems to be the case for sfc's LRO,
which is only in our out-of-tree driver.)  Maybe instead of each driver
having its own LRO code (with hacks to avoid breaking Slow Start and
suchlike by breaking ACK stats), there should be per-device options to
control how permissive GRO is (e.g. don't check IP IDs), so that a user who
wants LRO-like behaviour can get it from GRO.
Any obvious reason why I couldn't do such a thing?

(I realise LRO might not go away entirely, if other drivers are doing
various hardware-assisted things.  But in our case it really is all in
software.)

-Ed

  reply	other threads:[~2016-03-24 20:17 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-18 23:24 [RFC PATCH 0/9] RFC6864 compliant GRO and GSO partial offload Alexander Duyck
2016-03-18 23:24 ` [RFC PATCH 1/9] ipv4/GRO: Allow multiple frames to use the same IP ID Alexander Duyck
2016-03-24  1:43   ` Jesse Gross
2016-03-24  2:21     ` Alexander Duyck
2016-03-28  4:57       ` Jesse Gross
2016-03-18 23:24 ` [RFC PATCH 2/9] gre: Enforce IP ID verification on outer headers Alexander Duyck
2016-03-18 23:24 ` [RFC PATCH 3/9] geneve: " Alexander Duyck
2016-03-18 23:25 ` [RFC PATCH 4/9] vxlan: " Alexander Duyck
2016-03-18 23:25 ` [RFC PATCH 5/9] gue: " Alexander Duyck
2016-03-18 23:25 ` [RFC PATCH 6/9] ethtool: Add support for toggling any of the GSO offloads Alexander Duyck
2016-03-19  0:18   ` Ben Hutchings
2016-03-19  0:30     ` Alexander Duyck
2016-03-19  1:42       ` Ben Hutchings
2016-03-19  2:01         ` Jesse Gross
2016-03-19  2:43           ` Alexander Duyck
2016-03-18 23:25 ` [RFC PATCH 7/9] GSO: Support partial segmentation offload Alexander Duyck
2016-03-22 17:00   ` Edward Cree
2016-03-22 17:47     ` Alexander Duyck
2016-03-22 19:40       ` Edward Cree
2016-03-22 20:11         ` Jesse Gross
2016-03-22 20:17           ` David Miller
2016-03-22 21:38         ` Alexander Duyck
2016-03-23 16:27           ` Edward Cree
2016-03-23 18:06             ` Alexander Duyck
2016-03-23 21:05               ` Edward Cree
2016-03-23 22:36                 ` Alexander Duyck
2016-03-23 23:00                   ` Edward Cree
2016-03-23 23:15                     ` Alexander Duyck
2016-03-24 17:12                       ` Edward Cree
2016-03-24 18:43                         ` Alexander Duyck
2016-03-24 20:17                           ` Edward Cree [this message]
2016-03-24 21:50                             ` Alexander Duyck
2016-03-24 23:00                               ` Edward Cree
2016-03-24 23:35                                 ` Alexander Duyck
2016-03-25  0:37                                   ` Edward Cree
2016-03-23 17:09   ` Tom Herbert
2016-03-23 18:19     ` Alexander Duyck
2016-03-24  1:37       ` Jesse Gross
2016-03-24  2:53         ` Alexander Duyck
2016-03-28  5:35           ` Jesse Gross
2016-03-28  5:36   ` Jesse Gross
2016-03-28 16:25     ` Alexander Duyck
2016-03-18 23:25 ` [RFC PATCH 8/9] i40e/i40evf: Add support for GSO partial with UDP_TUNNEL_CSUM and GRE_CSUM Alexander Duyck
2016-03-23 19:35   ` Jesse Gross
2016-03-23 20:21     ` Alexander Duyck
2016-03-18 23:25 ` [RFC PATCH 9/9] ixgbe/ixgbevf: Add support for GSO partial Alexander Duyck
2016-03-19  2:05   ` Jesse Gross
2016-03-19  2:42     ` Alexander Duyck
2016-03-21 18:50 ` [RFC PATCH 0/9] RFC6864 compliant GRO and GSO partial offload David Miller
2016-03-21 19:46   ` Alexander Duyck
2016-03-21 20:10     ` Jesse Gross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56F44B56.9070206@solarflare.com \
    --to=ecree@solarflare.com \
    --cc=aduyck@mirantis.com \
    --cc=alexander.duyck@gmail.com \
    --cc=davem@davemloft.net \
    --cc=gerlitz.or@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=tom@herbertland.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.