a vxlan question.

* a vxlan question.
@ 2014-01-12 17:25 sowmini varadhan
  2014-01-12 18:25 ` Stephen Hemminger
  0 siblings, 1 reply; 3+ messages in thread
From: sowmini varadhan @ 2014-01-12 17:25 UTC (permalink / raw)
  To: netdev; +Cc: sowmini.varadhan

A question about the vxlan implementation in linux:

If the inner packet (the frame that is VXLAN-encapsulated) is an IP
packet that has the DF bit set, i.e., it is a PMTU discovery packet, and
the subsequent VXLAN encapsulation results in an ICMP packet-too-big
error, then does the VTEP relay that error back to the originator of
the PMTU packet?

AFAICT, the current Linux code in drivers/net/vxlan.c
does not handle any ICMP errors (though it sets the DF bit of the outer
header based on the inner header). From my reading of the code,
we'd end up in __udp4_lib_err for the VXLAN-encapsulated packet, but
there's nothing in there that recognizes that the UDP payload is
itself an Ethernet+IP frame and relays the PMTU back to the (inner) IP src.
Did I miss something?

--Sowmini

* Re: a vxlan question.
  2014-01-12 17:25 a vxlan question sowmini varadhan
@ 2014-01-12 18:25 ` Stephen Hemminger
  2014-01-12 18:38   ` sowmini varadhan
  0 siblings, 1 reply; 3+ messages in thread
From: Stephen Hemminger @ 2014-01-12 18:25 UTC (permalink / raw)
  To: sowmini varadhan; +Cc: netdev, sowmini.varadhan

On Sun, 12 Jan 2014 12:25:51 -0500
sowmini varadhan <sowmini05@gmail.com> wrote:

> A question about the vxlan implementation in linux:
> 
> if the inner packet (the frame that is vxlan encapsulated) is an IP
> packet that has the DF bit set, i.e., it is a PMTU discovery packet, and
> the subsequent vxlan encapsulation results in an ICMP packet-too-big
> error, then does the VTEP relay that error back to the originator of
> the PMTU packet?
> 
> AFAICT, the current linux code in drivers/net/vxlan.c
> does not address any icmp errors (though it sets the DF of the outer
> header based on the inner header). From my reading of the code,
> we'd end up in __udp4_lib_err for the vxlan-encaps packet, but
> there's nothing in there that recognizes that the udp payload is
> itself an ethernet+IP frame and relays pmtu back to the (inner) ip src?
> Did I miss something?
> 
> --Sowmini
> --

VXLAN, like all layer 2 tunnels, is not allowed to respond to IP packets
in the inner header. One of the principles of network virtualization
is that the inner network IP space may overlap or be invalid in the
outer IP domain. Therefore an intermediate system (like VXLAN) does
not have a valid IP in the inner domain to send a response.

Another way to look at it is that VXLAN is more of an L2 bridge
than an L3 router.

* Re: a vxlan question.
  2014-01-12 18:25 ` Stephen Hemminger
@ 2014-01-12 18:38   ` sowmini varadhan
  0 siblings, 0 replies; 3+ messages in thread
From: sowmini varadhan @ 2014-01-12 18:38 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, sowmini.varadhan

On Sun, Jan 12, 2014 at 1:25 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:

> VXLAN, like all layer 2 tunnels, is not allowed to respond to IP packets
> in the inner header.

OK, so I see that the other tunnels don't relay packets, but
I noticed that they do take note of FRAG_NEEDED and
update the soft state (e.g., ipip_err).

> One of the principles of network virtualization
> is that the inner network IP space may overlap or be invalid in the
> outer IP domain. Therefore an intermediate system (like VXLAN) does
> not have a valid IP in the inner domain to send a response.

Understood. And I recognize that the other tunnel drivers
don't relay pmtu today.

But if you have the RFC 1812-conformant quote (up to 576 bytes) of the
offending IP packet in your error, you should have enough information
(VNI, MAC addrs, VLANs, IP addrs) to figure out who generated this packet?
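
To make the point above concrete, here is a minimal Python sketch of pulling those identifiers out of the packet quoted inside an ICMP error. It is only an illustration, not what drivers/net/vxlan.c does. It assumes an IPv4 outer header, the standard 8-byte VXLAN header with a 24-bit VNI, an IPv4 inner packet, and that the quote is long enough to reach the inner IP header. The function name `parse_quoted_vxlan` and the returned keys are hypothetical.

```python
import socket
import struct


def _mac(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_quoted_vxlan(quoted: bytes) -> dict:
    """Extract inner-flow identifiers from the packet quoted in an ICMP error.

    `quoted` is the data that follows the 8-byte ICMP header: the outer
    IPv4 header plus as much of the original datagram as the sender included.
    """
    # Outer IPv4: low nibble of byte 0 is the header length in 32-bit words.
    ihl = (quoted[0] & 0x0F) * 4
    if quoted[9] != 17:
        raise ValueError("outer packet is not UDP")
    off = ihl + 8  # skip the outer UDP header

    # VXLAN header: flags(1) reserved(3) VNI(3) reserved(1)
    if not quoted[off] & 0x08:
        raise ValueError("VXLAN 'VNI valid' flag is not set")
    vni = int.from_bytes(quoted[off + 4:off + 7], "big")
    off += 8

    # Inner Ethernet header, optionally followed by one 802.1Q tag.
    dst_mac = _mac(quoted[off:off + 6])
    src_mac = _mac(quoted[off + 6:off + 12])
    (ethertype,) = struct.unpack_from("!H", quoted, off + 12)
    off += 14
    vlan = None
    if ethertype == 0x8100:
        tci, ethertype = struct.unpack_from("!HH", quoted, off)
        vlan = tci & 0x0FFF
        off += 4
    if ethertype != 0x0800:
        raise ValueError("inner frame is not IPv4")

    # Inner IPv4 source and destination sit at offsets 12 and 16.
    return {
        "vni": vni,
        "src_mac": src_mac,
        "dst_mac": dst_mac,
        "vlan": vlan,
        "inner_src": socket.inet_ntoa(quoted[off + 12:off + 16]),
        "inner_dst": socket.inet_ntoa(quoted[off + 16:off + 20]),
    }
```

Whether a VTEP should act on this information is exactly the policy question being debated; the sketch only shows that the quoted bytes are enough to identify the inner sender.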

>
> Another way to look at it is that VXLAN is more of an L2 bridge
> than an L3 router.

Well, I guess the difference is that this flavor of L2 bridge
does in fact reduce the MTU by adding the UDP/IP header.
Thus if I were a VM talking to another VM on my own pod
(i.e., no VXLAN encaps in the way), I'd have a different MTU than if
I migrated to another ToR/pod and ended up with a reduced
MTU (I'd need to either bump up to 1600-byte MTUs so that the
VM could continue to send 1500-byte pre-encaps frames, or
figure out the reduced MTU via PMTU, no?)
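
The MTU arithmetic behind this paragraph can be sketched as follows. The header sizes assume an IPv4 underlay with no IP options and count the inner Ethernet header as payload of the outer packet; the function names are hypothetical and exist only for this illustration.

```python
# Bytes added in front of the inner IP packet (IPv4 underlay, no options).
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HDR = 8
INNER_ETH = 14
VLAN_TAG = 4


def vxlan_overhead(inner_vlan: bool = False) -> int:
    """Bytes of the underlay IP MTU consumed by encapsulation."""
    overhead = OUTER_IPV4 + OUTER_UDP + VXLAN_HDR + INNER_ETH
    return overhead + (VLAN_TAG if inner_vlan else 0)


def inner_mtu(underlay_mtu: int, inner_vlan: bool = False) -> int:
    """Largest inner IP packet that fits without fragmenting the outer one."""
    return underlay_mtu - vxlan_overhead(inner_vlan)


def required_underlay_mtu(wanted_inner_mtu: int, inner_vlan: bool = False) -> int:
    """Underlay IP MTU needed so the VM can keep its current MTU."""
    return wanted_inner_mtu + vxlan_overhead(inner_vlan)


print(inner_mtu(1500))              # 1450
print(required_underlay_mtu(1500))  # 1550
print(inner_mtu(1600))              # 1550
```

Under these assumptions a 1600-byte underlay MTU leaves comfortable headroom for 1500-byte inner packets, while a 1500-byte underlay forces the VM down to 1450.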

At the very least, shouldn't the VTEP track the reduced mtu
(in the same way that other tunnel drivers do)?
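
As a rough illustration of what "tracking the reduced MTU" could mean, here is a hypothetical per-remote-VTEP soft-state cache in Python. It is a sketch of the idea only and does not mirror ipip_err or any kernel data structure; the class name, method names, and the fixed 50-byte overhead (IPv4 underlay, untagged inner frame) are all assumptions.

```python
VXLAN_OVERHEAD = 50   # assumed: outer IPv4 + UDP + VXLAN + inner Ethernet
MIN_IPV4_MTU = 68     # smallest MTU an IPv4 link may have


class VtepPmtuCache:
    """Hypothetical soft state: learned path MTU per remote VTEP address."""

    def __init__(self, default_mtu: int = 1500):
        self.default_mtu = default_mtu
        self._pmtu = {}

    def on_frag_needed(self, remote_vtep: str, next_hop_mtu: int) -> None:
        """Record the MTU reported by an ICMP fragmentation-needed error."""
        if next_hop_mtu < MIN_IPV4_MTU:
            return  # ignore implausible values
        current = self._pmtu.get(remote_vtep, self.default_mtu)
        # Only ever lower the estimate; expiry/aging is left out of this sketch.
        self._pmtu[remote_vtep] = min(current, next_hop_mtu)

    def inner_mtu(self, remote_vtep: str) -> int:
        """Largest inner IP packet currently believed to fit toward this VTEP."""
        return self._pmtu.get(remote_vtep, self.default_mtu) - VXLAN_OVERHEAD


cache = VtepPmtuCache()
cache.on_frag_needed("192.0.2.2", 1400)
print(cache.inner_mtu("192.0.2.2"))  # 1350
print(cache.inner_mtu("192.0.2.9"))  # 1450 (nothing learned yet)
```

A real implementation would also need to age entries out so the estimate can recover when the path changes.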

--Sowmini
