* [PATCH net v2 0/3] Set a large MTU on ovs-created tunnel devices @ 2016-02-09 16:47 David Wragg 2016-02-09 16:47 ` [PATCH net v2 1/3] vxlan: Relax the MTU constraints David Wragg ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: David Wragg @ 2016-02-09 16:47 UTC (permalink / raw) To: netdev-u79uwXL29TY76Z2rM5mHXA, dev-yBygre7rU0TnMu66kgdUjQ Cc: David Wragg, Roopa Prabhu, Hannes Frederic Sowa, David Miller Prior to 4.3, tunnel vports (vxlan, gre and geneve) could transmit vxlan packets of any size, constrained only by the ability to send out the resulting packets. 4.3 introduced netdevs corresponding to tunnel vports. These netdevs have an MTU, which limits the size of a packet that can be successfully encapsulated. The default value for the MTUs are low (1500 or less), which is awkwardly small in the context of physical networks supporting jumbo frames, and leads to a conspicuous change in behaviour for userspace. This patch series sets the MTU on openvswitch-created netdevs to be the relevant maximum (i.e. the maximum IP packet size minus any relevant overhead), effectively restoring the behaviour prior to 4.3. Where appropriate, the limits on MTU values when set on the netdevs directly are also relaxed. Changes in v2: * Extend to all openvswitch tunnel types, i.e. gre and geneve as well * Use IP_MAX_MTU David Wragg (3): vxlan: Relax the MTU constraints geneve: Relax MTU constraints vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices drivers/net/geneve.c | 29 +++++++++++++++++++++----- drivers/net/vxlan.c | 47 ++++++++++++++++++++++++++++++------------- include/net/ip_tunnels.h | 1 + net/ipv4/ip_gre.c | 7 +++++++ net/ipv4/ip_tunnel.c | 21 ++++++++++++++++--- net/openvswitch/vport-vxlan.c | 2 ++ 6 files changed, 85 insertions(+), 22 deletions(-) -- 2.5.0 _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH net v2 1/3] vxlan: Relax the MTU constraints 2016-02-09 16:47 [PATCH net v2 0/3] Set a large MTU on ovs-created tunnel devices David Wragg @ 2016-02-09 16:47 ` David Wragg [not found] ` <1455036424-6403-1-git-send-email-david-1SEAoVOfG6VEzL6FDj/jAg@public.gmane.org> 2016-02-09 16:47 ` [PATCH net v2 3/3] vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices David Wragg 2 siblings, 0 replies; 16+ messages in thread From: David Wragg @ 2016-02-09 16:47 UTC (permalink / raw) To: netdev, dev Cc: Jesse Gross, David Miller, Hannes Frederic Sowa, Thomas Graf, Roopa Prabhu, David Wragg Allow the MTU of vxlan devices without an underlying device to be set to larger values (up to a maximum based on IP packet limits and vxlan overhead). Previously, their MTUs could not be set to higher than the conventional ethernet value of 1500. This is a very arbitrary value in the context of vxlan, and prevented vxlan devices from being able to take advantage of jumbo frames etc. The default MTU remains 1500, for compatibility. Signed-off-by: David Wragg <david@weave.works> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> --- drivers/net/vxlan.c | 36 +++++++++++++++++++++++++----------- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 6543918..e992c6a 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -2367,29 +2367,43 @@ static void vxlan_set_multicast_list(struct net_device *dev) { } -static int vxlan_change_mtu(struct net_device *dev, int new_mtu) +static int __vxlan_change_mtu(struct net_device *dev, + struct net_device *lowerdev, + struct vxlan_rdst *dst, int new_mtu, bool strict) { - struct vxlan_dev *vxlan = netdev_priv(dev); - struct vxlan_rdst *dst = &vxlan->default_dst; - struct net_device *lowerdev; - int max_mtu; + int max_mtu = IP_MAX_MTU; - lowerdev = __dev_get_by_index(vxlan->net, dst->remote_ifindex); - if (lowerdev == NULL) - return eth_change_mtu(dev, new_mtu); + if (lowerdev) + max_mtu = lowerdev->mtu; if (dst->remote_ip.sa.sa_family == AF_INET6) - max_mtu = lowerdev->mtu - VXLAN6_HEADROOM; + max_mtu -= VXLAN6_HEADROOM; else - max_mtu = lowerdev->mtu - VXLAN_HEADROOM; + max_mtu -= VXLAN_HEADROOM; - if (new_mtu < 68 || new_mtu > max_mtu) + if (new_mtu < 68) return -EINVAL; + if (new_mtu > max_mtu) { + if (strict) + return -EINVAL; + + new_mtu = max_mtu; + } + dev->mtu = new_mtu; return 0; } +static int vxlan_change_mtu(struct net_device *dev, int new_mtu) +{ + struct vxlan_dev *vxlan = netdev_priv(dev); + struct vxlan_rdst *dst = &vxlan->default_dst; + struct net_device *lowerdev = __dev_get_by_index(vxlan->net, + dst->remote_ifindex); + return __vxlan_change_mtu(dev, lowerdev, dst, new_mtu, true); +} + static int egress_ipv4_tun_info(struct net_device *dev, struct sk_buff *skb, struct ip_tunnel_info *info, __be16 sport, __be16 dport) -- 2.5.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
[parent not found: <1455036424-6403-1-git-send-email-david-1SEAoVOfG6VEzL6FDj/jAg@public.gmane.org>]
* [PATCH net v2 2/3] geneve: Relax MTU constraints [not found] ` <1455036424-6403-1-git-send-email-david-1SEAoVOfG6VEzL6FDj/jAg@public.gmane.org> @ 2016-02-09 16:47 ` David Wragg 2016-02-09 18:18 ` Sergei Shtylyov 2016-02-10 7:40 ` Tom Herbert 0 siblings, 2 replies; 16+ messages in thread From: David Wragg @ 2016-02-09 16:47 UTC (permalink / raw) To: netdev-u79uwXL29TY76Z2rM5mHXA, dev-yBygre7rU0TnMu66kgdUjQ Cc: David Wragg, Roopa Prabhu, Hannes Frederic Sowa, David Miller Allow the MTU of geneve devices to be set to large values, in order to exploit underlying networks with larger frame sizes. GENEVE does not have a fixed encapsulation overhead (an openvswitch rule can add variable length options), so there is no relevant maximum MTU to enforce. A maximum of IP_MAX_MTU is used instead. Encapsulated packets that are too big for the underlying network will get dropped on the floor. Signed-off-by: David Wragg <david@weave.works> --- drivers/net/geneve.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index 0b14ac3..05cef11 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -1039,6 +1039,16 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev) return geneve_xmit_skb(skb, dev, info); } +static int geneve_change_mtu(struct net_device *dev, int new_mtu) +{ + /* GENEVE overhead is not fixed, so we can't enforce a more + precise max MTU. */ + if (new_mtu < 68 || new_mtu > IP_MAX_MTU) + return -EINVAL; + dev->mtu = new_mtu; + return 0; +} + static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb) { struct ip_tunnel_info *info = skb_tunnel_info(skb); @@ -1083,7 +1093,7 @@ static const struct net_device_ops geneve_netdev_ops = { .ndo_stop = geneve_stop, .ndo_start_xmit = geneve_xmit, .ndo_get_stats64 = ip_tunnel_get_stats64, - .ndo_change_mtu = eth_change_mtu, + .ndo_change_mtu = geneve_change_mtu, .ndo_validate_addr = eth_validate_addr, .ndo_set_mac_address = eth_mac_addr, .ndo_fill_metadata_dst = geneve_fill_metadata_dst, -- 2.5.0 _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH net v2 2/3] geneve: Relax MTU constraints 2016-02-09 16:47 ` [PATCH net v2 2/3] geneve: Relax " David Wragg @ 2016-02-09 18:18 ` Sergei Shtylyov [not found] ` <56BA2D6D.6090408-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8@public.gmane.org> 2016-02-10 7:40 ` Tom Herbert 1 sibling, 1 reply; 16+ messages in thread From: Sergei Shtylyov @ 2016-02-09 18:18 UTC (permalink / raw) To: David Wragg, netdev, dev Cc: Jesse Gross, David Miller, Hannes Frederic Sowa, Thomas Graf, Roopa Prabhu On 02/09/2016 07:47 PM, David Wragg wrote: > Allow the MTU of geneve devices to be set to large values, in order to > exploit underlying networks with larger frame sizes. > > GENEVE does not have a fixed encapsulation overhead (an openvswitch > rule can add variable length options), so there is no relevant maximum > MTU to enforce. A maximum of IP_MAX_MTU is used instead. > Encapsulated packets that are too big for the underlying network will > get dropped on the floor. > > Signed-off-by: David Wragg <david@weave.works> > --- > drivers/net/geneve.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c > index 0b14ac3..05cef11 100644 > --- a/drivers/net/geneve.c > +++ b/drivers/net/geneve.c > @@ -1039,6 +1039,16 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev) > return geneve_xmit_skb(skb, dev, info); > } > > +static int geneve_change_mtu(struct net_device *dev, int new_mtu) > +{ > + /* GENEVE overhead is not fixed, so we can't enforce a more > + precise max MTU. */ The networking code formats comments: /* Like * this. */ > + if (new_mtu < 68 || new_mtu > IP_MAX_MTU) > + return -EINVAL; > + dev->mtu = new_mtu; > + return 0; > +} > + > static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb) > { > struct ip_tunnel_info *info = skb_tunnel_info(skb); [...] MBR, Sergei ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <56BA2D6D.6090408-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH net v2 2/3] geneve: Relax MTU constraints [not found] ` <56BA2D6D.6090408-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8@public.gmane.org> @ 2016-02-09 18:22 ` David Wragg 0 siblings, 0 replies; 16+ messages in thread From: David Wragg @ 2016-02-09 18:22 UTC (permalink / raw) To: Sergei Shtylyov Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA, Roopa Prabhu, Hannes Frederic Sowa, David Miller Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> writes: > The networking code formats comments: > > /* Like > * this. > */ Thanks. And I noticed another silly mistake. Will respin. David _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v2 2/3] geneve: Relax MTU constraints 2016-02-09 16:47 ` [PATCH net v2 2/3] geneve: Relax " David Wragg 2016-02-09 18:18 ` Sergei Shtylyov @ 2016-02-10 7:40 ` Tom Herbert 2016-02-10 11:41 ` David Wragg 1 sibling, 1 reply; 16+ messages in thread From: Tom Herbert @ 2016-02-10 7:40 UTC (permalink / raw) To: David Wragg Cc: Linux Kernel Network Developers, dev, Jesse Gross, David Miller, Hannes Frederic Sowa, Thomas Graf, Roopa Prabhu On Tue, Feb 9, 2016 at 5:47 PM, David Wragg <david@weave.works> wrote: > Allow the MTU of geneve devices to be set to large values, in order to > exploit underlying networks with larger frame sizes. > > GENEVE does not have a fixed encapsulation overhead (an openvswitch > rule can add variable length options), so there is no relevant maximum > MTU to enforce. A maximum of IP_MAX_MTU is used instead. > Encapsulated packets that are too big for the underlying network will > get dropped on the floor. > This defeats the purpose of having an MTU. The advertised MTU indicates how large a packet that the interface if willing to handle. Upper layers use this information, and if they need to send a packet larger than MTU they take appropriate action such as fragmenting packet or sending an ICMP PTB error. If a packets are sent on the interface that are smaller than MTU and are being "dropped on the floor" for being too big, a sender would get no indication at all why its packets are being dropped. The correct thing to do is determine the maximum amount of encapsulation overhead that can ever be set in a packet and use for setting the MTU. For instance, when RCO is enable in GUE, the size of the option is included in tunnel->encap_hlen even though it will not be used in all packets (via ip_tunnel_change_mtu). If there is no way to determine a maximum overhead a priori from configuration, then maximum overhead could be assumed to be maximum possible encapsulation header size which for Geneve is 132 bytes IIRC. Tom > Signed-off-by: David Wragg <david@weave.works> > --- > drivers/net/geneve.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c > index 0b14ac3..05cef11 100644 > --- a/drivers/net/geneve.c > +++ b/drivers/net/geneve.c > @@ -1039,6 +1039,16 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev) > return geneve_xmit_skb(skb, dev, info); > } > > +static int geneve_change_mtu(struct net_device *dev, int new_mtu) > +{ > + /* GENEVE overhead is not fixed, so we can't enforce a more > + precise max MTU. */ > + if (new_mtu < 68 || new_mtu > IP_MAX_MTU) > + return -EINVAL; > + dev->mtu = new_mtu; > + return 0; > +} > + > static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb) > { > struct ip_tunnel_info *info = skb_tunnel_info(skb); > @@ -1083,7 +1093,7 @@ static const struct net_device_ops geneve_netdev_ops = { > .ndo_stop = geneve_stop, > .ndo_start_xmit = geneve_xmit, > .ndo_get_stats64 = ip_tunnel_get_stats64, > - .ndo_change_mtu = eth_change_mtu, > + .ndo_change_mtu = geneve_change_mtu, > .ndo_validate_addr = eth_validate_addr, > .ndo_set_mac_address = eth_mac_addr, > .ndo_fill_metadata_dst = geneve_fill_metadata_dst, > -- > 2.5.0 > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v2 2/3] geneve: Relax MTU constraints 2016-02-10 7:40 ` Tom Herbert @ 2016-02-10 11:41 ` David Wragg 2016-02-10 11:59 ` Jesse Gross 0 siblings, 1 reply; 16+ messages in thread From: David Wragg @ 2016-02-10 11:41 UTC (permalink / raw) To: Tom Herbert Cc: Linux Kernel Network Developers, dev, Jesse Gross, David Miller, Hannes Frederic Sowa, Thomas Graf, Roopa Prabhu Tom Herbert <tom@herbertland.com> writes: > The correct thing to do is determine the maximum amount of > encapsulation overhead that can ever be set in a packet and use for > setting the MTU. For instance, when RCO is enable in GUE, the size of > the option is included in tunnel->encap_hlen even though it will not > be used in all packets (via ip_tunnel_change_mtu). If there is no way > to determine a maximum overhead a priori from configuration, then > maximum overhead could be assumed to be maximum possible encapsulation > header size which for Geneve is 132 bytes IIRC. Ok, I'll come up with a patch to address this. David ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v2 2/3] geneve: Relax MTU constraints 2016-02-10 11:41 ` David Wragg @ 2016-02-10 11:59 ` Jesse Gross 2016-02-10 14:21 ` Tom Herbert 0 siblings, 1 reply; 16+ messages in thread From: Jesse Gross @ 2016-02-10 11:59 UTC (permalink / raw) To: David Wragg Cc: Tom Herbert, Linux Kernel Network Developers, ovs dev, David Miller, Hannes Frederic Sowa, Thomas Graf, Roopa Prabhu On Wed, Feb 10, 2016 at 12:41 PM, David Wragg <david@weave.works> wrote: > Tom Herbert <tom@herbertland.com> writes: >> The correct thing to do is determine the maximum amount of >> encapsulation overhead that can ever be set in a packet and use for >> setting the MTU. For instance, when RCO is enable in GUE, the size of >> the option is included in tunnel->encap_hlen even though it will not >> be used in all packets (via ip_tunnel_change_mtu). If there is no way >> to determine a maximum overhead a priori from configuration, then >> maximum overhead could be assumed to be maximum possible encapsulation >> header size which for Geneve is 132 bytes IIRC. > > Ok, I'll come up with a patch to address this. I don't think that this really applies in this situation. The concerns here relate to what the MTU is actually set to but this patch affects the range of MTUs allowed to be set by the user. I don't see a reason to disallow the user from setting a precise value if they know what it should be. In any case, I don't think it is likely to have much impact. By default with tunnels the output device is not fixed and therefore the base MTU that is used is IP_MAX_MTU. Subtracting some tunnel overhead amount from this is still likely quite a bit higher than any physical MTU. If you really want, I would subtract the base Geneve header size from IP_MAX_MTU to get the true max but it's probably not a big deal in any case. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v2 2/3] geneve: Relax MTU constraints 2016-02-10 11:59 ` Jesse Gross @ 2016-02-10 14:21 ` Tom Herbert [not found] ` <CALx6S36kvXJUZM8_vEmF8L3bxnRiBh98Rs0wAd-HGBJ9Yyde_Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Tom Herbert @ 2016-02-10 14:21 UTC (permalink / raw) To: Jesse Gross Cc: David Wragg, Linux Kernel Network Developers, ovs dev, David Miller, Hannes Frederic Sowa, Thomas Graf, Roopa Prabhu On Wed, Feb 10, 2016 at 12:59 PM, Jesse Gross <jesse@kernel.org> wrote: > On Wed, Feb 10, 2016 at 12:41 PM, David Wragg <david@weave.works> wrote: >> Tom Herbert <tom@herbertland.com> writes: >>> The correct thing to do is determine the maximum amount of >>> encapsulation overhead that can ever be set in a packet and use for >>> setting the MTU. For instance, when RCO is enable in GUE, the size of >>> the option is included in tunnel->encap_hlen even though it will not >>> be used in all packets (via ip_tunnel_change_mtu). If there is no way >>> to determine a maximum overhead a priori from configuration, then >>> maximum overhead could be assumed to be maximum possible encapsulation >>> header size which for Geneve is 132 bytes IIRC. >> >> Ok, I'll come up with a patch to address this. > > I don't think that this really applies in this situation. The concerns > here relate to what the MTU is actually set to but this patch affects > the range of MTUs allowed to be set by the user. I don't see a reason > to disallow the user from setting a precise value if they know what it > should be. > Right, but if the user sets a bad value and packets are silently dropped on the floor then that seems like a bad result that could have easily been prevented. > In any case, I don't think it is likely to have much impact. By > default with tunnels the output device is not fixed and therefore the > base MTU that is used is IP_MAX_MTU. Subtracting some tunnel overhead > amount from this is still likely quite a bit higher than any physical > MTU. > > If you really want, I would subtract the base Geneve header size from > IP_MAX_MTU to get the true max but it's probably not a big deal in any > case. ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <CALx6S36kvXJUZM8_vEmF8L3bxnRiBh98Rs0wAd-HGBJ9Yyde_Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH net v2 2/3] geneve: Relax MTU constraints [not found] ` <CALx6S36kvXJUZM8_vEmF8L3bxnRiBh98Rs0wAd-HGBJ9Yyde_Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2016-02-10 14:52 ` Jesse Gross 2016-02-16 12:33 ` David Wragg 0 siblings, 1 reply; 16+ messages in thread From: Jesse Gross @ 2016-02-10 14:52 UTC (permalink / raw) To: Tom Herbert Cc: ovs dev, David Wragg, Linux Kernel Network Developers, Roopa Prabhu, Hannes Frederic Sowa, David Miller On Wed, Feb 10, 2016 at 3:21 PM, Tom Herbert <tom@herbertland.com> wrote: > On Wed, Feb 10, 2016 at 12:59 PM, Jesse Gross <jesse@kernel.org> wrote: >> On Wed, Feb 10, 2016 at 12:41 PM, David Wragg <david@weave.works> wrote: >>> Tom Herbert <tom@herbertland.com> writes: >>>> The correct thing to do is determine the maximum amount of >>>> encapsulation overhead that can ever be set in a packet and use for >>>> setting the MTU. For instance, when RCO is enable in GUE, the size of >>>> the option is included in tunnel->encap_hlen even though it will not >>>> be used in all packets (via ip_tunnel_change_mtu). If there is no way >>>> to determine a maximum overhead a priori from configuration, then >>>> maximum overhead could be assumed to be maximum possible encapsulation >>>> header size which for Geneve is 132 bytes IIRC. >>> >>> Ok, I'll come up with a patch to address this. >> >> I don't think that this really applies in this situation. The concerns >> here relate to what the MTU is actually set to but this patch affects >> the range of MTUs allowed to be set by the user. I don't see a reason >> to disallow the user from setting a precise value if they know what it >> should be. >> > Right, but if the user sets a bad value and packets are silently > dropped on the floor then that seems like a bad result that could have > easily been prevented. Sure, we might as well prevent the extreme edge cases that can never be valid. In the case of Geneve though, this would be the minimum header size, not the maximum, since it's possible that the user actually knows how big the options are that they want to use. But as I said, the practical impact is low because IP_MAX_MTU is so much larger than the MTU of real devices. There will always be many values that the MTU can be set to that result in dropped packets. This is true of all tunnel types. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v2 2/3] geneve: Relax MTU constraints 2016-02-10 14:52 ` Jesse Gross @ 2016-02-16 12:33 ` David Wragg 2016-02-16 16:44 ` Tom Herbert 0 siblings, 1 reply; 16+ messages in thread From: David Wragg @ 2016-02-16 12:33 UTC (permalink / raw) To: Jesse Gross, Tom Herbert Cc: Linux Kernel Network Developers, ovs dev, David Miller, Hannes Frederic Sowa, Thomas Graf, Roopa Prabhu Jesse Gross <jesse@kernel.org> writes: > On Wed, Feb 10, 2016 at 3:21 PM, Tom Herbert <tom@herbertland.com> wrote: >> On Wed, Feb 10, 2016 at 12:59 PM, Jesse Gross <jesse@kernel.org> wrote: >>> On Wed, Feb 10, 2016 at 12:41 PM, David Wragg <david@weave.works> wrote: >>>> Tom Herbert <tom@herbertland.com> writes: >>>>> The correct thing to do is determine the maximum amount of >>>>> encapsulation overhead that can ever be set in a packet and use for >>>>> setting the MTU. For instance, when RCO is enable in GUE, the size of >>>>> the option is included in tunnel->encap_hlen even though it will not >>>>> be used in all packets (via ip_tunnel_change_mtu). If there is no way >>>>> to determine a maximum overhead a priori from configuration, then >>>>> maximum overhead could be assumed to be maximum possible encapsulation >>>>> header size which for Geneve is 132 bytes IIRC. >>>> >>>> Ok, I'll come up with a patch to address this. >>> >>> I don't think that this really applies in this situation. The concerns >>> here relate to what the MTU is actually set to but this patch affects >>> the range of MTUs allowed to be set by the user. I don't see a reason >>> to disallow the user from setting a precise value if they know what it >>> should be. >>> >> Right, but if the user sets a bad value and packets are silently >> dropped on the floor then that seems like a bad result that could have >> easily been prevented. > > Sure, we might as well prevent the extreme edge cases that can never > be valid. In the case of Geneve though, this would be the minimum > header size, not the maximum, since it's possible that the user > actually knows how big the options are that they want to use. > > But as I said, the practical impact is low because IP_MAX_MTU is so > much larger than the MTU of real devices. There will always be many > values that the MTU can be set to that result in dropped packets. This > is true of all tunnel types. I agree with Jesse, and if there was a debate about lifting the MTU limit on all tunnel types to IP_MAX_MTU, I'd be for that. But for the other tunnel types, I followed the precedent of the code that was already there, and geneve might as well be consistent. Patch sent to the lists. David ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v2 2/3] geneve: Relax MTU constraints 2016-02-16 12:33 ` David Wragg @ 2016-02-16 16:44 ` Tom Herbert 2016-02-18 16:54 ` David Wragg 0 siblings, 1 reply; 16+ messages in thread From: Tom Herbert @ 2016-02-16 16:44 UTC (permalink / raw) To: David Wragg Cc: Jesse Gross, Linux Kernel Network Developers, ovs dev, David Miller, Hannes Frederic Sowa, Thomas Graf, Roopa Prabhu On Tue, Feb 16, 2016 at 4:33 AM, David Wragg <david@weave.works> wrote: > Jesse Gross <jesse@kernel.org> writes: >> On Wed, Feb 10, 2016 at 3:21 PM, Tom Herbert <tom@herbertland.com> wrote: >>> On Wed, Feb 10, 2016 at 12:59 PM, Jesse Gross <jesse@kernel.org> wrote: >>>> On Wed, Feb 10, 2016 at 12:41 PM, David Wragg <david@weave.works> wrote: >>>>> Tom Herbert <tom@herbertland.com> writes: >>>>>> The correct thing to do is determine the maximum amount of >>>>>> encapsulation overhead that can ever be set in a packet and use for >>>>>> setting the MTU. For instance, when RCO is enable in GUE, the size of >>>>>> the option is included in tunnel->encap_hlen even though it will not >>>>>> be used in all packets (via ip_tunnel_change_mtu). If there is no way >>>>>> to determine a maximum overhead a priori from configuration, then >>>>>> maximum overhead could be assumed to be maximum possible encapsulation >>>>>> header size which for Geneve is 132 bytes IIRC. >>>>> >>>>> Ok, I'll come up with a patch to address this. >>>> >>>> I don't think that this really applies in this situation. The concerns >>>> here relate to what the MTU is actually set to but this patch affects >>>> the range of MTUs allowed to be set by the user. I don't see a reason >>>> to disallow the user from setting a precise value if they know what it >>>> should be. >>>> >>> Right, but if the user sets a bad value and packets are silently >>> dropped on the floor then that seems like a bad result that could have >>> easily been prevented. >> >> Sure, we might as well prevent the extreme edge cases that can never >> be valid. In the case of Geneve though, this would be the minimum >> header size, not the maximum, since it's possible that the user >> actually knows how big the options are that they want to use. >> >> But as I said, the practical impact is low because IP_MAX_MTU is so >> much larger than the MTU of real devices. There will always be many >> values that the MTU can be set to that result in dropped packets. This >> is true of all tunnel types. > > I agree with Jesse, and if there was a debate about lifting the MTU > limit on all tunnel types to IP_MAX_MTU, I'd be for that. > > But for the other tunnel types, I followed the precedent of the code > that was already there, and geneve might as well be consistent. > Please implement like in ip_tunnel_change_mtu (or better yet call it), that is the precedent for tunnels. Tom > Patch sent to the lists. > > David ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v2 2/3] geneve: Relax MTU constraints 2016-02-16 16:44 ` Tom Herbert @ 2016-02-18 16:54 ` David Wragg 2016-02-18 20:07 ` David Miller [not found] ` <86bn7e2c3t.fsf-1SEAoVOfG6VEzL6FDj/jAg@public.gmane.org> 0 siblings, 2 replies; 16+ messages in thread From: David Wragg @ 2016-02-18 16:54 UTC (permalink / raw) To: Tom Herbert Cc: Jesse Gross, Linux Kernel Network Developers, ovs dev, David Miller, Hannes Frederic Sowa, Thomas Graf, Roopa Prabhu Tom Herbert <tom@herbertland.com> writes: > Please implement like in ip_tunnel_change_mtu (or better yet call it), > that is the precedent for tunnels. I've made geneve_change_mtu follow ip_tunnel_change_mtu in v2. If it were to call it instead, are you suggesting just passing in t_hlen? Or restructuring geneve.c to re-use the whole ip_tunnel infrastructure? Also, I'm not sure where the 0xFFF8 comes from in __ip_tunnel_change_mtu. Any ideas why 0xFFF8 rather than 0xffff? It goes all the way back to the inital import of the kernel into git. David ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v2 2/3] geneve: Relax MTU constraints 2016-02-18 16:54 ` David Wragg @ 2016-02-18 20:07 ` David Miller [not found] ` <86bn7e2c3t.fsf-1SEAoVOfG6VEzL6FDj/jAg@public.gmane.org> 1 sibling, 0 replies; 16+ messages in thread From: David Miller @ 2016-02-18 20:07 UTC (permalink / raw) To: david; +Cc: tom, jesse, netdev, dev, hannes, tgraf, roopa From: David Wragg <david@weave.works> Date: Thu, 18 Feb 2016 16:54:14 +0000 > Tom Herbert <tom@herbertland.com> writes: >> Please implement like in ip_tunnel_change_mtu (or better yet call it), >> that is the precedent for tunnels. > > I've made geneve_change_mtu follow ip_tunnel_change_mtu in v2. > > If it were to call it instead, are you suggesting just passing in > t_hlen? Or restructuring geneve.c to re-use the whole ip_tunnel > infrastructure? > > Also, I'm not sure where the 0xFFF8 comes from in > __ip_tunnel_change_mtu. Any ideas why 0xFFF8 rather than 0xffff? It > goes all the way back to the inital import of the kernel into git. Some 8 byte multiple requirement, perhaps to do with fragmentation. ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <86bn7e2c3t.fsf-1SEAoVOfG6VEzL6FDj/jAg@public.gmane.org>]
* Re: [PATCH net v2 2/3] geneve: Relax MTU constraints [not found] ` <86bn7e2c3t.fsf-1SEAoVOfG6VEzL6FDj/jAg@public.gmane.org> @ 2016-02-19 2:11 ` Tom Herbert 0 siblings, 0 replies; 16+ messages in thread From: Tom Herbert @ 2016-02-19 2:11 UTC (permalink / raw) To: David Wragg Cc: ovs dev, Roopa Prabhu, Hannes Frederic Sowa, Linux Kernel Network Developers, David Miller On Thu, Feb 18, 2016 at 8:54 AM, David Wragg <david@weave.works> wrote: > Tom Herbert <tom@herbertland.com> writes: >> Please implement like in ip_tunnel_change_mtu (or better yet call it), >> that is the precedent for tunnels. > > I've made geneve_change_mtu follow ip_tunnel_change_mtu in v2. > > If it were to call it instead, are you suggesting just passing in > t_hlen? Or restructuring geneve.c to re-use the whole ip_tunnel > infrastructure? > I'll leave that to you to decide if that is feasible or makes sense, but ip_tunnel does do some other interesting things. Support for geneve could easily be implemented using ip_tunnel_encap facility. The default MTU on the device is set based on the MTU of the outgoing interface and tunnel overhead-- this should mitigate the possibility of a lot of fragmentation happening within the tunnel. Also, the output infrastructure caches the route for the tunnel which is a nice performance win. > Also, I'm not sure where the 0xFFF8 comes from in > __ip_tunnel_change_mtu. Any ideas why 0xFFF8 rather than 0xffff? It > goes all the way back to the inital import of the kernel into git. > Yes, that's pretty ugly. Feel free to replace that with a #define or at least put a comment about it for the benefit of future generations. Thanks, Tom > David _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH net v2 3/3] vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices 2016-02-09 16:47 [PATCH net v2 0/3] Set a large MTU on ovs-created tunnel devices David Wragg 2016-02-09 16:47 ` [PATCH net v2 1/3] vxlan: Relax the MTU constraints David Wragg [not found] ` <1455036424-6403-1-git-send-email-david-1SEAoVOfG6VEzL6FDj/jAg@public.gmane.org> @ 2016-02-09 16:47 ` David Wragg 2 siblings, 0 replies; 16+ messages in thread From: David Wragg @ 2016-02-09 16:47 UTC (permalink / raw) To: netdev, dev Cc: Jesse Gross, David Miller, Hannes Frederic Sowa, Thomas Graf, Roopa Prabhu, David Wragg Prior to 4.3, tunnel vports (vxlan, gre and geneve) could transmit vxlan packets of any size, constrained only by the ability to send out the resulting packets. 4.3 introduced netdevs corresponding to tunnel vports. These netdevs have an MTU, which limits the size of a packet that can be successfully encapsulated. The default value for the MTUs are low (1500 or less), which is awkwardly small in the context of physical networks supporting jumbo frames, and leads to a conspicuous change in behaviour for userspace. Instead, set the MTU on openvswitch-created netdevs to be the relevant maximum (i.e. the maximum IP packet size minus any relevant overhead), effectively restoring the behaviour prior to 4.3. Signed-off-by: David Wragg <david@weave.works> --- drivers/net/geneve.c | 17 +++++++++++++---- drivers/net/vxlan.c | 11 ++++++++--- include/net/ip_tunnels.h | 1 + net/ipv4/ip_gre.c | 7 +++++++ net/ipv4/ip_tunnel.c | 21 ++++++++++++++++++--- net/openvswitch/vport-vxlan.c | 2 ++ 6 files changed, 49 insertions(+), 10 deletions(-) diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index 05cef11..9965714 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -1452,11 +1452,20 @@ struct net_device *geneve_dev_create_fb(struct net *net, const char *name, err = geneve_configure(net, dev, &geneve_remote_unspec, 0, 0, 0, htons(dst_port), true, 0); - if (err) { - free_netdev(dev); - return ERR_PTR(err); - } + if (err) + goto err; + + /* openvswitch users expect packet sizes to be unrestricted, + so set the largest MTU we can. */ + err = geneve_change_mtu(dev, IP_MAX_MTU); + if (err) + goto err; + return dev; + + err: + free_netdev(dev); + return ERR_PTR(err); } EXPORT_SYMBOL_GPL(geneve_dev_create_fb); diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index e992c6a..a31cd95 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -2779,6 +2779,7 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev, int err; bool use_ipv6 = false; __be16 default_port = vxlan->cfg.dst_port; + struct net_device *lowerdev = NULL; vxlan->net = src_net; @@ -2799,9 +2800,7 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev, } if (conf->remote_ifindex) { - struct net_device *lowerdev - = __dev_get_by_index(src_net, conf->remote_ifindex); - + lowerdev = __dev_get_by_index(src_net, conf->remote_ifindex); dst->remote_ifindex = conf->remote_ifindex; if (!lowerdev) { @@ -2825,6 +2824,12 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev, needed_headroom = lowerdev->hard_header_len; } + if (conf->mtu) { + err = __vxlan_change_mtu(dev, lowerdev, dst, conf->mtu, false); + if (err) + return err; + } + if (use_ipv6 || conf->flags & VXLAN_F_COLLECT_METADATA) needed_headroom += VXLAN6_HEADROOM; else diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index 6db96ea..dda9abf 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -230,6 +230,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd); int ip_tunnel_encap(struct sk_buff *skb, struct ip_tunnel *t, u8 *protocol, struct flowi4 *fl4); +int __ip_tunnel_change_mtu(struct net_device *dev, int new_mtu, bool strict); int ip_tunnel_change_mtu(struct net_device *dev, int new_mtu); struct rtnl_link_stats64 *ip_tunnel_get_stats64(struct net_device *dev, diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 7c51c4e..057806d 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -1240,6 +1240,13 @@ struct net_device *gretap_fb_dev_create(struct net *net, const char *name, err = ipgre_newlink(net, dev, tb, NULL); if (err < 0) goto out; + + /* openvswitch users expect packet sizes to be unrestricted, + so set the largest MTU we can. */ + err = __ip_tunnel_change_mtu(dev, IP_MAX_MTU, false); + if (err) + goto out; + return dev; out: free_netdev(dev); diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index c7bd72e..49504ed 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -943,17 +943,32 @@ done: } EXPORT_SYMBOL_GPL(ip_tunnel_ioctl); -int ip_tunnel_change_mtu(struct net_device *dev, int new_mtu) +int __ip_tunnel_change_mtu(struct net_device *dev, int new_mtu, bool strict) { struct ip_tunnel *tunnel = netdev_priv(dev); int t_hlen = tunnel->hlen + sizeof(struct iphdr); + int max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen; - if (new_mtu < 68 || - new_mtu > 0xFFF8 - dev->hard_header_len - t_hlen) + if (new_mtu < 68) return -EINVAL; + + printk("XXX %d %d\n", new_mtu, max_mtu); + if (new_mtu > max_mtu) { + if (strict) + return -EINVAL; + + new_mtu = max_mtu; + } + dev->mtu = new_mtu; return 0; } +EXPORT_SYMBOL_GPL(__ip_tunnel_change_mtu); + +int ip_tunnel_change_mtu(struct net_device *dev, int new_mtu) +{ + return __ip_tunnel_change_mtu(dev, new_mtu, true); +} EXPORT_SYMBOL_GPL(ip_tunnel_change_mtu); static void ip_tunnel_dev_free(struct net_device *dev) diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c index 1605691..de9cb19 100644 --- a/net/openvswitch/vport-vxlan.c +++ b/net/openvswitch/vport-vxlan.c @@ -91,6 +91,8 @@ static struct vport *vxlan_tnl_create(const struct vport_parms *parms) struct vxlan_config conf = { .no_share = true, .flags = VXLAN_F_COLLECT_METADATA, + /* Don't restrict the packets that can be sent by MTU */ + .mtu = IP_MAX_MTU, }; if (!options) { -- 2.5.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
end of thread, other threads:[~2016-02-19 2:11 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-02-09 16:47 [PATCH net v2 0/3] Set a large MTU on ovs-created tunnel devices David Wragg 2016-02-09 16:47 ` [PATCH net v2 1/3] vxlan: Relax the MTU constraints David Wragg [not found] ` <1455036424-6403-1-git-send-email-david-1SEAoVOfG6VEzL6FDj/jAg@public.gmane.org> 2016-02-09 16:47 ` [PATCH net v2 2/3] geneve: Relax " David Wragg 2016-02-09 18:18 ` Sergei Shtylyov [not found] ` <56BA2D6D.6090408-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8@public.gmane.org> 2016-02-09 18:22 ` David Wragg 2016-02-10 7:40 ` Tom Herbert 2016-02-10 11:41 ` David Wragg 2016-02-10 11:59 ` Jesse Gross 2016-02-10 14:21 ` Tom Herbert [not found] ` <CALx6S36kvXJUZM8_vEmF8L3bxnRiBh98Rs0wAd-HGBJ9Yyde_Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2016-02-10 14:52 ` Jesse Gross 2016-02-16 12:33 ` David Wragg 2016-02-16 16:44 ` Tom Herbert 2016-02-18 16:54 ` David Wragg 2016-02-18 20:07 ` David Miller [not found] ` <86bn7e2c3t.fsf-1SEAoVOfG6VEzL6FDj/jAg@public.gmane.org> 2016-02-19 2:11 ` Tom Herbert 2016-02-09 16:47 ` [PATCH net v2 3/3] vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices David Wragg
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).