Re: [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, do segmentation even for non IPSKB_FORWARDED skbs

Netdev List
 help / color / mirror / Atom feed

From: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
To: Florian Westphal <fw@strlen.de>
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	shmulik.ladkani@gmail.com, netdev@vger.kernel.org
Subject: Re: [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, do segmentation even for non IPSKB_FORWARDED skbs
Date: Tue, 5 Jul 2016 17:05:41 +0300	[thread overview]
Message-ID: <20160705170541.3f210675@pixies> (raw)
In-Reply-To: <20160705130327.GA10737@breakpoint.cc>

On Tue, 5 Jul 2016 15:03:27 +0200, fw@strlen.de wrote:
> > The expected behavior in such a setup would be segmenting the skb first,
> > and then fragmenting each segment according to dst mtu, and finally
> > passing the resulting fragments to ip_finish_output2.
> > 
> > 'ip_finish_output_gso' already supports this "Slowpath" behavior,
> > but it is only considered if IPSKB_FORWARDED is set.
> > 
> > However in the bridged case, IPSKB_FORWARDED is off, and the "Slowpath"
> > behavior is not considered.
> 
> I placed this test there under the assumption that L2 bridges have
> the same MTU on all bridge ports, so we'd only need to consider routing
> case.

In our setups we have no control of VM mtu (which affects gso_size of
packets arriving from tap), and no control of vxlan's underlay mtu.

> How does work if e.g. 1460-sized udp packet arrives on tap0?
> Do we fragment (possibly ignoring DF?)

A *non* gso udp packet arriving on tap0 is bridged to vxlan0 (assuming
vxlan mtu is sufficient), and the original DF of the inner packet is
preserved.

The skb gets vxlan-udp encapsulated, with outer IP header having DF=0
(unless tun_flags & TUNNEL_DONT_FRAGMENT), and then, if skb->len > mtu,
fragmented normally at the ip_finish_output --> ip_fragment code path.

So on wire we have 2 frags of the vxlan datagram; they are reassembled
at recepient ip stack of vxlan termination. Inner packet preserved.
Not ideal, but works.

The issue is with GSO skbs arriving from tap, which eventually generates
segments larger then the mtu, which are not transmitted on eth0:

  tap0 rx:  super packet, gso_size from user's virtio_net_hdr
    ...
    vxlan0 tx:  encaps the super packet
      ...
      ip_finish_output
        ip_finish_output_gso
          *NO* skb_gso_validate_mtu()     <--- problem here
            ip_finish_output2:  tx the encapsulated super packet on eth0
              ...
              validate_xmit_skb
                netif_needs_gso
                  skb_gso_segment: segments inner payload according to
                                   original gso_size,
                                   leads to vxlan datagrams larger than mtu

> How does it work for non-ip protocols?

The specific problem is with vxlan (or any other udp based tunnel)
encapsulated GSOed packets.

> (Or did I misunderstand this setup...?)

tap0 bridged with vxlan0.
route to vxlan0's remote peer is via eth0, configured with small mtu.

Regards,
Shmulik

next prev parent reply	other threads:[~2016-07-05 14:05 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-05 12:35 [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, do segmentation even for non IPSKB_FORWARDED skbs Shmulik Ladkani
2016-07-05 13:03 ` Florian Westphal
2016-07-05 14:05   ` Shmulik Ladkani [this message]
2016-07-09  3:12     ` David Miller
2016-07-09  9:06       ` Florian Westphal
2016-07-09  9:00     ` Florian Westphal
2016-07-09 12:30       ` Shmulik Ladkani
2016-07-09 13:22         ` Florian Westphal
2016-07-10  7:51           ` Shmulik Ladkani
2016-07-11  8:15             ` Florian Westphal
2016-07-11 13:32               ` Hannes Frederic Sowa
2016-07-12  5:56           ` Shmulik Ladkani
2016-07-13 14:00             ` Shmulik Ladkani
2016-07-14 13:12               ` Hannes Frederic Sowa
2016-07-14 14:13                 ` Shmulik Ladkani
2016-07-14 23:32                   ` Hannes Frederic Sowa
2016-07-10 20:14         ` Shmulik Ladkani
2016-07-11  8:13           ` Florian Westphal
2016-07-09 15:10       ` Hannes Frederic Sowa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160705170541.3f210675@pixies \
    --to=shmulik.ladkani@ravellosystems.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fw@strlen.de \
    --cc=hannes@stressinduktion.org \
    --cc=netdev@vger.kernel.org \
    --cc=shmulik.ladkani@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox