All of lore.kernel.org
 help / color / mirror / Atom feed
From: Florian Westphal <fw@strlen.de>
To: Florian Westphal <fw@strlen.de>
Cc: netdev@vger.kernel.org, kaber@trash.net, jesse@nicira.com
Subject: Re: [PATCH -next 3/3] ipv4: don't remove df bit when refragmenting
Date: Sun, 12 Apr 2015 10:51:22 +0200	[thread overview]
Message-ID: <20150412085122.GA14720@breakpoint.cc> (raw)
In-Reply-To: <1428704189-31247-4-git-send-email-fw@strlen.de>

Florian Westphal <fw@strlen.de> wrote:
> We always send fragments without DF bit set.
> 
> Thus, given following setup:
> 
> mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280
>    A           R1               R2         B
> 
> Where R1 and R2 run linux with netfilter defragmentation/conntrack
> enabled, then if Host A sent a fragmented packet _with_ DF set to B, R1
> will respond with icmp too big error if one of these fragments exceeded
> 1400 bytes.  So far, so good.
> 
> However, the host A will never learn about the lower 1280 link.
> The next packet presumably sent by A would use 1400 as the new fragment
> size, but R1 will strip DF bit when refragmenting.
> 
> Whats worse: If R1 receives fragment sizes 1200 and 100, it would
> forward the reassembled packet without refragmenting, i.e.
> R2 would send an icmp error in response to a packet that was never sent,
> citing mtu that the original sender never exceeded.
> 
> In order to 'replay' the original fragments to preserve their semantics,
> one solution is to
> 
>  1. set DF bit on the new fragments if it was set on original ones.
>  2. set the size of the new fragments generated by R1 during
>     refragmentation to the largest size seen when defragmenting.
> 
> R2 will then notice the problem and will send the expected
> 'too big, use 1280' icmp error, and further fragments of this size
> would not grow anymore to 1400 link mtu when R1 refragments.
> 
> There is however, one important caveat. We cannot just use existing
> IPCB(skb)->frag_max_size as upper boundary for refragmentation.
> 
> We have to consider a case where we receive a large fragment without DF,
> followed by a small fragment with DF set.
> 
> In such scenario we must not generate a large spew of small DF-fragments
> (else we induce packet/traffic amplification).
> 
> This modifies ip_fragment so that we track largest fragment size seen
> both for DF and non-DF packets.
> 
> Then, when we find that we had at least one DF fragment AND the largest
> non-DF fragment did not exceed one with DF set, let ip_fragment know that
> it should refragment using original frag size and also set DF bit on the
> newly created fragments.

Seems Jesse would prefer if we'd set max_frag_size unconditionally.

The advantage of doing that would be that we can easily get rid of the
nf_bridge_mtu_reduction() kludge we have right now since we'd just pick
the largest original fragment size.

Only caveat is that we'd have to check IPSKB_FRAG_PMTU flag too before
deciding to reject if out mtu is too small, which is easy to do.

So, before I respin patch #3: What do others think?

  reply	other threads:[~2015-04-12  8:51 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-10 22:16 [PATCH -next 0/3] net: cap size to original frag size when refragmenting Florian Westphal
2015-04-10 22:16 ` [PATCH -next 1/3] ipv4: reject too-big defragmented DF-skb when forwarding Florian Westphal
2015-04-10 22:51   ` Hannes Frederic Sowa
2015-04-10 22:16 ` [PATCH -next 2/3] ipv6: don't increase size when refragmenting forwarded skbs Florian Westphal
2015-04-10 22:16 ` [PATCH -next 3/3] ipv4: don't remove df bit when refragmenting Florian Westphal
2015-04-12  8:51   ` Florian Westphal [this message]
2015-04-13 17:53 ` [PATCH -next 0/3] net: cap size to original frag size " David Miller
2015-04-13 18:40   ` Hannes Frederic Sowa
2015-04-13 18:56     ` Hannes Frederic Sowa
2015-04-13 20:16     ` David Miller
2015-04-13 20:33       ` Hannes Frederic Sowa
2015-04-16  4:56   ` Herbert Xu
2015-04-16  5:24     ` Patrick McHardy
2015-04-16  5:29       ` Herbert Xu
2015-04-16  5:42         ` Patrick McHardy
2015-04-16 12:11         ` Hannes Frederic Sowa
2015-04-16 15:43           ` David Miller
2015-04-16 16:13             ` Hannes Frederic Sowa
2015-04-16 20:47               ` Herbert Xu
2015-04-16 20:56                 ` Patrick McHardy
2015-04-16 23:16                   ` Hannes Frederic Sowa
2015-04-16 23:44                     ` Patrick McHardy
2015-04-16 20:32             ` Patrick McHardy
2015-04-16 15:32       ` David Miller
2015-04-16 20:55         ` Patrick McHardy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150412085122.GA14720@breakpoint.cc \
    --to=fw@strlen.de \
    --cc=jesse@nicira.com \
    --cc=kaber@trash.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.