From: Stefano Brivio <sbrivio@redhat.com>
To: "David S. Miller" <davem@davemloft.net>
Cc: Sabrina Dubroca <sd@queasysnail.net>,
Xin Long <lucien.xin@gmail.com>,
Stephen Hemminger <stephen@networkplumber.org>,
Jiri Benc <jbenc@redhat.com>, David Ahern <dsahern@gmail.com>,
netdev@vger.kernel.org
Subject: [PATCH net-next v2 03/11] vxlan: Allow configuration of DF behaviour
Date: Thu, 8 Nov 2018 12:19:16 +0100 [thread overview]
Message-ID: <37169eb16a5d7c240f713b743b7ed726bb88decc.1541675666.git.sbrivio@redhat.com> (raw)
In-Reply-To: <cover.1541675666.git.sbrivio@redhat.com>
Allow users to set the IPv4 DF bit in outgoing packets, or to inherit its
value from the IPv4 inner header. If the encapsulated protocol is IPv6 and
DF is configured to be inherited, always set it.
For IPv4, inheriting DF from the inner header was probably intended from
the very beginning judging by the comment to vxlan_xmit(), but it wasn't
actually implemented -- also because it would have done more harm than
good, without handling for ICMP Fragmentation Needed messages.
According to RFC 7348, "Path MTU discovery MAY be used". An expired RFC
draft, draft-saum-nvo3-pmtud-over-vxlan-05, whose purpose was to describe
PMTUD implementation, says that "is a MUST that Vxlan gateways [...]
SHOULD set the DF-bit [...]", whatever that means.
Given this background, the only sane option is probably to let the user
decide, and keep the current behaviour as default.
This only applies to non-lwt tunnels: if an external control plane is
used, tunnel key will still control the DF flag.
v2:
- DF behaviour configuration only applies for non-lwt tunnels, move DF
setting to if (!info) block in vxlan_xmit_one() (Stephen Hemminger)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
---
drivers/net/vxlan.c | 29 ++++++++++++++++++++++++++++-
include/net/vxlan.h | 1 +
include/uapi/linux/if_link.h | 9 +++++++++
3 files changed, 38 insertions(+), 1 deletion(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 0851af6733f3..c3e65e78f015 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2278,13 +2278,24 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
goto tx_error;
}
- /* Bypass encapsulation if the destination is local */
if (!info) {
+ /* Bypass encapsulation if the destination is local */
err = encap_bypass_if_local(skb, dev, vxlan, dst,
dst_port, ifindex, vni,
&rt->dst, rt->rt_flags);
if (err)
goto out_unlock;
+
+ if (vxlan->cfg.df == VXLAN_DF_SET) {
+ df = htons(IP_DF);
+ } else if (vxlan->cfg.df == VXLAN_DF_INHERIT) {
+ struct ethhdr *eth = eth_hdr(skb);
+
+ if (ntohs(eth->h_proto) == ETH_P_IPV6 ||
+ (ntohs(eth->h_proto) == ETH_P_IP &&
+ old_iph->frag_off & htons(IP_DF)))
+ df = htons(IP_DF);
+ }
} else if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT) {
df = htons(IP_DF);
}
@@ -2837,6 +2848,7 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
[IFLA_VXLAN_GPE] = { .type = NLA_FLAG, },
[IFLA_VXLAN_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG },
[IFLA_VXLAN_TTL_INHERIT] = { .type = NLA_FLAG },
+ [IFLA_VXLAN_DF] = { .type = NLA_U8 },
};
static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[],
@@ -2893,6 +2905,16 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[],
}
}
+ if (data[IFLA_VXLAN_DF]) {
+ enum ifla_vxlan_df df = nla_get_u8(data[IFLA_VXLAN_DF]);
+
+ if (df < 0 || df > VXLAN_DF_MAX) {
+ NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_DF],
+ "Invalid DF attribute");
+ return -EINVAL;
+ }
+ }
+
return 0;
}
@@ -3538,6 +3560,9 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[],
conf->mtu = nla_get_u32(tb[IFLA_MTU]);
}
+ if (data[IFLA_VXLAN_DF])
+ conf->df = nla_get_u8(data[IFLA_VXLAN_DF]);
+
return 0;
}
@@ -3630,6 +3655,7 @@ static size_t vxlan_get_size(const struct net_device *dev)
nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_TTL */
nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_TTL_INHERIT */
nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_TOS */
+ nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_DF */
nla_total_size(sizeof(__be32)) + /* IFLA_VXLAN_LABEL */
nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_LEARNING */
nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_PROXY */
@@ -3696,6 +3722,7 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
nla_put_u8(skb, IFLA_VXLAN_TTL_INHERIT,
!!(vxlan->cfg.flags & VXLAN_F_TTL_INHERIT)) ||
nla_put_u8(skb, IFLA_VXLAN_TOS, vxlan->cfg.tos) ||
+ nla_put_u8(skb, IFLA_VXLAN_DF, vxlan->cfg.df) ||
nla_put_be32(skb, IFLA_VXLAN_LABEL, vxlan->cfg.label) ||
nla_put_u8(skb, IFLA_VXLAN_LEARNING,
!!(vxlan->cfg.flags & VXLAN_F_LEARN)) ||
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 03431c148e16..ec999c49df1f 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -216,6 +216,7 @@ struct vxlan_config {
unsigned long age_interval;
unsigned int addrmax;
bool no_share;
+ enum ifla_vxlan_df df;
};
struct vxlan_dev_node {
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 1debfa42cba1..efc588949431 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -533,6 +533,7 @@ enum {
IFLA_VXLAN_LABEL,
IFLA_VXLAN_GPE,
IFLA_VXLAN_TTL_INHERIT,
+ IFLA_VXLAN_DF,
__IFLA_VXLAN_MAX
};
#define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
@@ -542,6 +543,14 @@ struct ifla_vxlan_port_range {
__be16 high;
};
+enum ifla_vxlan_df {
+ VXLAN_DF_UNSET = 0,
+ VXLAN_DF_SET,
+ VXLAN_DF_INHERIT,
+ __VXLAN_DF_END,
+ VXLAN_DF_MAX = __VXLAN_DF_END - 1,
+};
+
/* GENEVE section */
enum {
IFLA_GENEVE_UNSPEC,
--
2.19.1
next prev parent reply other threads:[~2018-11-08 20:54 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-11-08 11:19 [PATCH net-next v2 00/11] ICMP error handling for UDP tunnels Stefano Brivio
2018-11-08 11:19 ` [PATCH net-next v2 01/11] udp: Handle ICMP errors for tunnels with same destination port on both endpoints Stefano Brivio
2018-11-08 11:19 ` [PATCH net-next v2 02/11] vxlan: ICMP error lookup handler Stefano Brivio
2018-11-08 11:19 ` Stefano Brivio [this message]
2018-11-08 11:19 ` [PATCH net-next v2 04/11] selftests: pmtu: Introduce tests for IPv4/IPv6 over VXLAN over IPv4/IPv6 Stefano Brivio
2018-11-08 11:19 ` [PATCH net-next v2 05/11] geneve: ICMP error lookup handler Stefano Brivio
2018-11-08 11:19 ` [PATCH net-next v2 06/11] geneve: Allow configuration of DF behaviour Stefano Brivio
2018-11-08 11:19 ` [PATCH net-next v2 07/11] selftests: pmtu: Introduce tests for IPv4/IPv6 over GENEVE over IPv4/IPv6 Stefano Brivio
2018-11-08 11:19 ` [PATCH net-next v2 08/11] net: Convert protocol error handlers from void to int Stefano Brivio
2018-11-08 11:19 ` [PATCH net-next v2 09/11] udp: Support for error handlers of tunnels with arbitrary destination port Stefano Brivio
2018-11-08 11:19 ` [PATCH net-next v2 10/11] fou, fou6: ICMP error handlers for FoU and GUE Stefano Brivio
2018-11-08 11:19 ` [PATCH net-next v2 11/11] selftests: pmtu: Introduce FoU and GUE PMTU exceptions tests Stefano Brivio
2018-11-09 1:13 ` [PATCH net-next v2 00/11] ICMP error handling for UDP tunnels David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=37169eb16a5d7c240f713b743b7ed726bb88decc.1541675666.git.sbrivio@redhat.com \
--to=sbrivio@redhat.com \
--cc=davem@davemloft.net \
--cc=dsahern@gmail.com \
--cc=jbenc@redhat.com \
--cc=lucien.xin@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=sd@queasysnail.net \
--cc=stephen@networkplumber.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).