From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stephen Hemminger
Subject: Re: [PATCHv4 net-next] vxlan: virtual extensible lan
Date: Tue, 25 Sep 2012 21:36:23 -0700
Message-ID: <20120925213623.39ee67d1@nehalam.linuxnetplumber.net>
References: <20120924184304.727711327@vyatta.com>
	<20120924185050.162920909@vyatta.com>
	<20120924205822.GI26494@x200.localdomain>
	<20120924145031.3b0122e6@nehalam.linuxnetplumber.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Chris Wright , David Miller , netdev@vger.kernel.org
To: Jesse Gross
Return-path:
Received: from mail.vyatta.com ([76.74.103.46]:45591 "EHLO mail.vyatta.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750772Ab2IZEg6 (ORCPT ); Wed, 26 Sep 2012 00:36:58 -0400
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-ID:

On Tue, 25 Sep 2012 14:55:13 -0700
Jesse Gross wrote:

> On Mon, Sep 24, 2012 at 2:50 PM, Stephen Hemminger
> wrote:
> > +static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
> [...]
> > +	/* Do PMTU */
> > +	if (skb->protocol == htons(ETH_P_IP)) {
> > +		df |= old_iph->frag_off & htons(IP_DF);
> > +		if (df && mtu < pkt_len) {
> > +			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
> > +				  htonl(mtu));
> > +			ip_rt_put(rt);
> > +			goto tx_error;
> > +		}
> > +	}
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +	else if (skb->protocol == htons(ETH_P_IPV6)) {
> > +		if (mtu >= IPV6_MIN_MTU && mtu < pkt_len) {
> > +			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
> > +			ip_rt_put(rt);
> > +			goto tx_error;
> > +		}
> > +	}
> > +#endif
>
> Won't this black hole packets if we need to generate ICMP messages?
> Since we're doing switching and not routing here icmp_send() doesn't
> necessarily have a route to the relevant endpoint. It looks like
> Ethernet over GRE has this issue as well.

It is an interesting question about what is the correct way to handle
packets where the inner header is IPv6 or IPv4 with Don't Fragment set.
As you mention, sending an ICMP response won't work because the tunnel
endpoint is not part of that IP network.

The simple option is to fragment the encapsulated packet in the tunnel;
since that fragmentation is not visible to the overlay network, that is
okay. But for PMTU discovery it might be better to just drop the packet
and not send a fragmented payload. Some backbone networks don't allow
fragmentation at all (in a futile attempt to block DoS attacks and
protect fragile Windows hosts).

Fragmentation brings all sorts of evil problems, like the potential for
corrupted reassembly because of IP ID sequence wrap; the checksum in the
inner packet will defend against that, but tunnels are not supposed to
rely on inner protocol data protection.

Or you can just do what Cisco and Microsoft do and tell everyone to
set a larger MTU on the backbone.
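
To make the options above concrete, here is a rough userspace sketch
of the decision, not code from the patch. All names in it
(VXLAN_OVERHEAD_V4, tunnel_pmtu_action, backbone_mtu_needed and the
enums) are hypothetical, and it assumes VXLAN over IPv4 with no IP
options, which gives 50 bytes of encapsulation overhead per inner IP
packet.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is an illustration only. */

/*
 * Assumed per-packet overhead for VXLAN over IPv4 without options:
 * outer IPv4 (20) + UDP (8) + VXLAN header (8) + inner Ethernet (14).
 */
enum { VXLAN_OVERHEAD_V4 = 20 + 8 + 8 + 14 };

/* What the tunnel endpoint is configured to do with oversized packets. */
enum tunnel_policy {
	POLICY_FRAGMENT,	/* fragment the outer packet in the tunnel */
	POLICY_DROP		/* silently drop, never emit fragments */
};

enum tunnel_action {
	ACT_SEND,		/* fits, send as a single outer packet */
	ACT_FRAGMENT_OUTER,	/* send as outer IP fragments */
	ACT_DROP
};

/* Backbone MTU needed to carry an inner IP packet of inner_mtu bytes whole. */
static size_t backbone_mtu_needed(size_t inner_mtu)
{
	return inner_mtu + VXLAN_OVERHEAD_V4;
}

/*
 * Decide what to do with an inner IP packet of inner_len bytes.
 * The inner DF bit is deliberately not consulted: outer fragmentation
 * is invisible to the overlay, and no ICMP error is generated because
 * the tunnel endpoint may have no route back to the inner sender.
 */
static enum tunnel_action tunnel_pmtu_action(size_t inner_len,
					     size_t backbone_mtu,
					     enum tunnel_policy policy)
{
	if (inner_len + VXLAN_OVERHEAD_V4 <= backbone_mtu)
		return ACT_SEND;

	return policy == POLICY_FRAGMENT ? ACT_FRAGMENT_OUTER : ACT_DROP;
}
```

Under those assumptions, a full 1500 byte inner packet on a 1500 byte
backbone is either fragmented or dropped depending on policy, while
raising the backbone MTU to backbone_mtu_needed(1500) lets it through
in one piece.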