From: Bram Yvahk <bram-yvahk@mail.wizbit.be>
To: Steffen Klassert <steffen.klassert@secunet.com>
Cc: herbert@gondor.apana.org.au, davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH ipsec/vti 0/2] Fragmentation of IPv4 in VTI
Date: Thu, 21 Mar 2019 19:33:52 +0100
Message-ID: <5C93D910.1080008@mail.wizbit.be>
In-Reply-To: <20190321151630.GA25855@gauss3.secunet.de>

Steffen Klassert wrote:
> On Sun, Mar 17, 2019 at 11:37:55PM +0000, Bram Yvahk wrote:
>> We've experienced an issue with VTI when the path-mtu is smaller than the size
>> of the "client" packet.
>>
>> What happens: an IPv4 packet from the client (i.e. another system in the LAN)
>> attempts to transmit some data; the IPv4 header shows that the 'DF' bit is not
>> set, but the client still receives an ICMPv4 "need-to-frag" message [which the
>> client does not expect and ignores].
>>
>> Example: $ ping -s 1300 -M dont -c5 192.168.235.2
>>     PING 192.168.235.3 (192.168.235.3) 1300(1328) bytes of data.
>>     From 192.168.236.254 icmp_seq=1 Frag needed and DF set (mtu = 1214)
>>     From 192.168.236.254 icmp_seq=2 Frag needed and DF set (mtu = 1214)
>>     From 192.168.236.254 icmp_seq=3 Frag needed and DF set (mtu = 1214)
>>     From 192.168.236.254 icmp_seq=4 Frag needed and DF set (mtu = 1214)
>>     From 192.168.236.254 icmp_seq=5 Frag needed and DF set (mtu = 1214)
>>
>>     --- 192.168.235.3 ping statistics ---
>>     5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 3999ms
>
> Hm, this works here. Can you show how you setup the vti device?
> Some tunnel configuration options (set ttl etc.) force to have
> the DF bit set.

I will provide these details tomorrow.
What I can say is that ttl was set to inherit.

When testing this, there is one important detail (which, in hindsight, I
should have included in my previous message): the (IPsec) Gateway A
needs to know the path-mtu to (IPsec) Gateway B.

Some ways to accomplish this:
- transmit an ICMP packet with the DF bit set and a size larger than
  the path-mtu from Gateway A to Gateway B
- ensure the "nopmtudisc" option is *not* set in the xfrm state
  and then let client A transmit an ICMP packet *with* the DF bit set
  to client B. [When "nopmtudisc" is set, all outgoing IPv4 ESP
  packets have the DF bit cleared; when "nopmtudisc" is not set, the
  DF bit is copied from the client packet.]
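To make the DF rule in the bracketed note concrete, here is a minimal Python sketch. It is a hypothetical model of the behaviour as described in this mail, not kernel code; `outer_df` is an illustrative name only.

```python
# Hypothetical model (not kernel code) of how the DF bit of the outgoing
# IPv4 ESP packet depends on the xfrm state "nopmtudisc" option.

def outer_df(inner_df: bool, nopmtudisc: bool) -> bool:
    """Return the DF bit Gateway A would set on the outgoing ESP packet."""
    if nopmtudisc:
        # "nopmtudisc" set: DF is always cleared on outgoing IPv4 ESP packets
        return False
    # "nopmtudisc" not set: DF is copied from the client packet
    return inner_df


if __name__ == "__main__":
    for nopmtudisc in (True, False):
        for inner_df in (True, False):
            print(f"nopmtudisc={nopmtudisc} inner DF={inner_df} "
                  f"-> outer DF={outer_df(inner_df, nopmtudisc)}")
```

Only the combination "nopmtudisc not set, client DF set" produces an outer ESP packet with DF set, which is why the second method above lets Gateway A learn the path-mtu from client traffic.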
 
For testing purposes I recommend doing the ping from Gateway A to
Gateway B (otherwise the tcpdumps/traffic get a bit more confusing).

A more in-depth description of what happens:

Setup:
======

|----------|   |-----------|   |-------|   |-----------|   |----------|
| client A |---| Gateway A |---| Hop H |---| Gateway B |---| client B |
|----------|   |-----------|   |-------|   |-----------|   |----------|

- testing with Linux 4.14.95 (a setup with a more recent kernel is WIP)
- link mtu between client A and Gateway A: 1500
- link mtu between Gateway A and Hop H: 1500
- link mtu between Hop H and Gateway B: 1280
- link mtu between Gateway B and client B: 1500
- path-mtu between Gateway A and Gateway B: 1280
- IPsec tunnel over *IPv4* between Gateway A and Gateway B
- tunneling IPv4 over the IPsec tunnel
- testing with VTI
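The MTUs in this setup determine where fragmentation happens. The Python sketch below shows the arithmetic: `ping -s 1300` yields a 1328-byte IPv4 packet (20-byte IPv4 header + 8-byte ICMP header + 1300 bytes, as in the quoted ping output). The helper names are hypothetical, and `ASSUMED_ESP_OVERHEAD = 66` is only an illustrative guess (the real value depends on cipher, mode and padding; 66 is simply 1280 minus the mtu = 1214 in the quoted output, which is not confirmed to come from this exact setup).

```python
# Size arithmetic for the setup above. Helper names are hypothetical.
# ASSUMED_ESP_OVERHEAD is an illustrative guess, not a measured value.

IPV4_HDR = 20              # IPv4 header without options
ICMP_HDR = 8               # ICMP echo request header
ASSUMED_ESP_OVERHEAD = 66  # ASSUMPTION: depends on cipher, mode and padding


def ping_packet_len(payload: int) -> int:
    """IPv4 total length produced by `ping -s <payload>`."""
    return IPV4_HDR + ICMP_HDR + payload


def fragment_count(total_len: int, mtu: int) -> int:
    """Fragments needed to carry an IPv4 packet of total_len over mtu (DF clear)."""
    if total_len <= mtu:
        return 1
    data = total_len - IPV4_HDR
    # every fragment except the last must carry a multiple of 8 data bytes
    per_frag = (mtu - IPV4_HDR) // 8 * 8
    return -(-data // per_frag)  # ceiling division


if __name__ == "__main__":
    inner = ping_packet_len(1300)          # client packet
    esp = inner + ASSUMED_ESP_OVERHEAD     # outer ESP packet from Gateway A
    print("client packet:", inner)
    print("Gateway A -> Hop H (mtu 1500):", fragment_count(esp, 1500), "fragment(s)")
    print("Hop H -> Gateway B (mtu 1280):", fragment_count(esp, 1280), "fragment(s)")
```

Since the 1328-byte client packet alone already exceeds 1280 bytes, Hop H has to split the ESP packet for any realistic overhead, which is consistent with the two fragments Gateway B receives in Step 1.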

Scenario:
==========

Before starting it's important to ensure that:
- Gateway A does *not* know the path-mtu to Gateway B
- Client A does *not* know the path-mtu to Gateway B

* Step 1: client A: $ ping -M dont -s 1300 ip_of_client_B
  - IPv4 ICMP packet of client A does not have the DF bit set
  - IPv4 ESP packet of Gateway A does not have the DF bit set
  - Hop H receives an IPv4 ESP packet that is too large for the link-mtu
    between Hop H and Gateway B: it fragments the IPv4 ESP packet.
  - Gateway B receives 2 IPv4 fragments
  - (Client B receives one IPv4 ICMP packet from client A)

* Step 2: Gateway A: $ ping -M do -s 1300 ip_of_gateway_B
  - IPv4 ICMP packet of Gateway A has the DF bit set
  - Gateway A receives a 'need to frag' ICMP from Hop H
 
* Step 3: client A: $ ping -M dont -s 1300 ip_of_client_B
  - IPv4 ICMP packet of client A does not have the DF bit set
  - Gateway A processes this packet in the VTI module, detects that the
    packet size > path-mtu and then sends a 'need to frag' ICMP to
    client A, even though the DF bit is not set. [this is the code I patched]
     
=> the critical part of the above is that Gateway A learns
   the path-mtu to Gateway B. If it doesn't, it keeps
   assuming the path-mtu is 1500 and the check in VTI does not
   trigger (since the path-mtu of 1500 > packet size).
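The three steps can be condensed into a small Python model of Gateway A. This is a hypothetical sketch, not the kernel's VTI code: the class and method names are illustrative, the 1500-byte default follows the text above, and `ESP_OVERHEAD = 66` is an assumption chosen so that 1280 - 66 matches the mtu = 1214 in the quoted ping output. The `patched` flag models the fragmentation behaviour this patch series is about (fragment instead of replying with ICMP when DF is clear).

```python
from dataclasses import dataclass

DEFAULT_PMTU = 1500  # what Gateway A assumes before learning the path-mtu
ESP_OVERHEAD = 66    # ASSUMPTION: cipher dependent, illustrative only


@dataclass
class GatewayA:
    # cached path-mtu towards Gateway B (hypothetical model state)
    pmtu_to_b: int = DEFAULT_PMTU

    def on_frag_needed(self, mtu: int) -> None:
        """Step 2: an ICMP 'need to frag' from Hop H lowers the cached path-mtu."""
        self.pmtu_to_b = min(self.pmtu_to_b, mtu)

    def handle_client_packet(self, pkt_len: int, df: bool, patched: bool) -> str:
        """Step 1 / Step 3: size check for a client packet entering the VTI device."""
        mtu = self.pmtu_to_b - ESP_OVERHEAD
        if pkt_len <= mtu:
            return "encrypt and send"
        if patched and not df:
            # proposed behaviour: fragment when the DF bit is clear
            return "fragment"
        # behaviour described in Step 3: ICMP is sent even if DF is clear
        return f"ICMP frag needed (mtu = {mtu})"


if __name__ == "__main__":
    gw = GatewayA()
    print(gw.handle_client_packet(1328, df=False, patched=False))  # Step 1
    gw.on_frag_needed(1280)                                        # Step 2
    print(gw.handle_client_packet(1328, df=False, patched=False))  # Step 3
    print(gw.handle_client_packet(1328, df=False, patched=True))
```

The model makes the point of the "=>" note explicit: until `on_frag_needed` lowers the cached path-mtu, the size check never fires, so the problem only reproduces after Gateway A has learned the path-mtu to Gateway B.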

