* [PATCH net-next v3 0/4] vxlan: implement Generic Protocol Extension (GPE)
@ 2016-04-05 12:47 Jiri Benc
2016-04-05 12:47 ` [PATCH net-next v3 1/4] vxlan: move Ethernet initialization to a separate function Jiri Benc
` (4 more replies)
0 siblings, 5 replies; 8+ messages in thread
From: Jiri Benc @ 2016-04-05 12:47 UTC
To: netdev; +Cc: Tom Herbert, Jesse Gross
v3: just rebased on top of the current net-next, no changes
This patchset implements VXLAN-GPE. It follows the same model as the tun/tap
driver: depending on the chosen mode, the vxlan interface is created either
as ARPHRD_ETHER (non-GPE) or ARPHRD_NONE (GPE).
Note that the internal fdb control plane cannot be used together with
VXLAN-GPE, and any attempt to configure it will be rejected by the driver. In
fact, COLLECT_METADATA is required to be set for now. This can be relaxed in
the future by adding support for static PtP configuration; it will be
backward compatible and won't affect existing users.
The previous version of the patchset supported two GPE modes, L2 and L3. The
L2 mode (now called "ether mode" in the code) was removed from this version.
It can be easily added later if there's demand. The L3 mode is now called
"raw mode" and also supports encapsulated Ethernet headers (via ETH_P_TEB).
The only limitation of not having "ether mode" for GPE concerns ip route based
encapsulation: with such a setup, only IP packets can be encapsulated, not
Ethernet frames. It seems there's not much use for that, though.
If it turns out to be useful, we'll add it.
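The mode selection described above can be illustrated with a small userspace sketch. It is not the driver code: the helper name and the SKETCH_ constants are hypothetical, the flag values are copied from the include/net/vxlan.h hunk in patch 4/4, and the link type values are the ones from <linux/if_arp.h>. The additional check against VXLAN_F_ALLOWED_GPE is left out for brevity.

```c
#include <assert.h>
#include <errno.h>

/* Flag values as they appear in the include/net/vxlan.h hunk of patch 4/4. */
#define VXLAN_F_COLLECT_METADATA 0x2000u
#define VXLAN_F_GPE              0x4000u

/* Link types as defined in <linux/if_arp.h>, redefined locally so the
 * sketch builds without kernel headers.
 */
#define SKETCH_ARPHRD_ETHER 1
#define SKETCH_ARPHRD_NONE  0xFFFE

/* Hypothetical helper: returns the link type the interface would get,
 * or -EINVAL when GPE is requested without COLLECT_METADATA.
 */
static int sketch_vxlan_link_type(unsigned int flags)
{
	if (!(flags & VXLAN_F_GPE))
		return SKETCH_ARPHRD_ETHER;	/* classic VXLAN, Ethernet device */
	if (!(flags & VXLAN_F_COLLECT_METADATA))
		return -EINVAL;			/* GPE needs metadata mode for now */
	return SKETCH_ARPHRD_NONE;		/* GPE "raw mode" device */
}
```

For example, sketch_vxlan_link_type(VXLAN_F_GPE) yields -EINVAL, while adding VXLAN_F_COLLECT_METADATA yields the ARPHRD_NONE value.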
Jiri Benc (4):
vxlan: move Ethernet initialization to a separate function
vxlan: move fdb code to common location in vxlan_xmit
ip_tunnel: implement __iptunnel_pull_header
vxlan: implement GPE
drivers/net/vxlan.c | 210 ++++++++++++++++++++++++++++++++++++-------
include/net/ip_tunnels.h | 11 ++-
include/net/vxlan.h | 68 ++++++++++++++
include/uapi/linux/if_link.h | 1 +
net/ipv4/ip_tunnel_core.c | 8 +-
5 files changed, 258 insertions(+), 40 deletions(-)
--
1.8.3.1
* [PATCH net-next v3 1/4] vxlan: move Ethernet initialization to a separate function
2016-04-05 12:47 [PATCH net-next v3 0/4] vxlan: implement Generic Protocol Extension (GPE) Jiri Benc
@ 2016-04-05 12:47 ` Jiri Benc
2016-04-05 12:47 ` [PATCH net-next v3 2/4] vxlan: move fdb code to common location in vxlan_xmit Jiri Benc
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Jiri Benc @ 2016-04-05 12:47 UTC
To: netdev; +Cc: Tom Herbert, Jesse Gross
This will allow vxlan to be initialized in ARPHRD_NONE mode based on the passed
rtnl attributes.
v2: renamed "l2mode" to "ether".
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
drivers/net/vxlan.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 1c0fa364323e..6bd5b874ead7 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2404,7 +2404,7 @@ static int vxlan_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
return 0;
}
-static const struct net_device_ops vxlan_netdev_ops = {
+static const struct net_device_ops vxlan_netdev_ether_ops = {
.ndo_init = vxlan_init,
.ndo_uninit = vxlan_uninit,
.ndo_open = vxlan_open,
@@ -2458,10 +2458,6 @@ static void vxlan_setup(struct net_device *dev)
struct vxlan_dev *vxlan = netdev_priv(dev);
unsigned int h;
- eth_hw_addr_random(dev);
- ether_setup(dev);
-
- dev->netdev_ops = &vxlan_netdev_ops;
dev->destructor = free_netdev;
SET_NETDEV_DEVTYPE(dev, &vxlan_type);
@@ -2476,8 +2472,7 @@ static void vxlan_setup(struct net_device *dev)
dev->hw_features |= NETIF_F_GSO_SOFTWARE;
dev->hw_features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
netif_keep_dst(dev);
- dev->priv_flags &= ~IFF_TX_SKB_SHARING;
- dev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_NO_QUEUE;
+ dev->priv_flags |= IFF_NO_QUEUE;
INIT_LIST_HEAD(&vxlan->next);
spin_lock_init(&vxlan->hash_lock);
@@ -2496,6 +2491,15 @@ static void vxlan_setup(struct net_device *dev)
INIT_HLIST_HEAD(&vxlan->fdb_head[h]);
}
+static void vxlan_ether_setup(struct net_device *dev)
+{
+ eth_hw_addr_random(dev);
+ ether_setup(dev);
+ dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+ dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+ dev->netdev_ops = &vxlan_netdev_ether_ops;
+}
+
static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
[IFLA_VXLAN_ID] = { .type = NLA_U32 },
[IFLA_VXLAN_GROUP] = { .len = FIELD_SIZEOF(struct iphdr, daddr) },
@@ -2722,6 +2726,8 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
__be16 default_port = vxlan->cfg.dst_port;
struct net_device *lowerdev = NULL;
+ vxlan_ether_setup(dev);
+
vxlan->net = src_net;
dst->remote_vni = conf->vni;
--
1.8.3.1
* [PATCH net-next v3 2/4] vxlan: move fdb code to common location in vxlan_xmit
2016-04-05 12:47 [PATCH net-next v3 0/4] vxlan: implement Generic Protocol Extension (GPE) Jiri Benc
2016-04-05 12:47 ` [PATCH net-next v3 1/4] vxlan: move Ethernet initialization to a separate function Jiri Benc
@ 2016-04-05 12:47 ` Jiri Benc
2016-04-05 12:47 ` [PATCH net-next v3 3/4] ip_tunnel: implement __iptunnel_pull_header Jiri Benc
` (2 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Jiri Benc @ 2016-04-05 12:47 UTC
To: netdev; +Cc: Tom Herbert, Jesse Gross
Handle VXLAN_F_COLLECT_METADATA before VXLAN_F_PROXY. The latter does not
make sense with the former, as it needs a populated fdb, which does not happen
in metadata mode.
After this cleanup, the fdb code in vxlan_xmit is moved to a common location
and can be later skipped for VXLAN-GPE, which does not necessarily carry an
inner Ethernet header.
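The resulting order of checks in vxlan_xmit can be modelled roughly as follows. This is a simplified userspace sketch, not the driver code: the enum, the helper name and SKETCH_F_PROXY (with an illustrative value) are hypothetical, and the proxy branch in the real function may still fall through to the fdb lookup.

```c
#include <assert.h>
#include <stdbool.h>

/* VXLAN_F_COLLECT_METADATA as in include/net/vxlan.h; the proxy flag value
 * below is only a placeholder chosen for this illustration.
 */
#define VXLAN_F_COLLECT_METADATA 0x2000u
#define SKETCH_F_PROXY           0x02u

enum sketch_xmit_step {
	SKETCH_XMIT_METADATA,	/* vxlan_xmit_one() with tunnel info */
	SKETCH_XMIT_DROP,	/* metadata mode but no TX tunnel info */
	SKETCH_XMIT_PROXY,	/* ARP/ND reduction is considered first */
	SKETCH_XMIT_FDB,	/* plain fdb lookup on the inner dst MAC */
};

/* Hypothetical helper: which branch is taken first after this patch. */
static enum sketch_xmit_step sketch_first_step(unsigned int flags,
					       bool has_tx_info)
{
	/* Metadata mode is decided before the Ethernet header is touched. */
	if (flags & VXLAN_F_COLLECT_METADATA)
		return has_tx_info ? SKETCH_XMIT_METADATA : SKETCH_XMIT_DROP;
	if (flags & SKETCH_F_PROXY)
		return SKETCH_XMIT_PROXY;
	return SKETCH_XMIT_FDB;
}
```

The point of the ordering is that nothing on the metadata path reads eth_hdr(skb), so that path stays usable when no inner Ethernet header is present.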
v2: changed commit description to not reference L3 mode
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
drivers/net/vxlan.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 6bd5b874ead7..d62eebaa9720 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2106,9 +2106,17 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
info = skb_tunnel_info(skb);
skb_reset_mac_header(skb);
- eth = eth_hdr(skb);
- if ((vxlan->flags & VXLAN_F_PROXY)) {
+ if (vxlan->flags & VXLAN_F_COLLECT_METADATA) {
+ if (info && info->mode & IP_TUNNEL_INFO_TX)
+ vxlan_xmit_one(skb, dev, NULL, false);
+ else
+ kfree_skb(skb);
+ return NETDEV_TX_OK;
+ }
+
+ if (vxlan->flags & VXLAN_F_PROXY) {
+ eth = eth_hdr(skb);
if (ntohs(eth->h_proto) == ETH_P_ARP)
return arp_reduce(dev, skb);
#if IS_ENABLED(CONFIG_IPV6)
@@ -2123,18 +2131,10 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
msg->icmph.icmp6_type == NDISC_NEIGHBOUR_SOLICITATION)
return neigh_reduce(dev, skb);
}
- eth = eth_hdr(skb);
#endif
}
- if (vxlan->flags & VXLAN_F_COLLECT_METADATA) {
- if (info && info->mode & IP_TUNNEL_INFO_TX)
- vxlan_xmit_one(skb, dev, NULL, false);
- else
- kfree_skb(skb);
- return NETDEV_TX_OK;
- }
-
+ eth = eth_hdr(skb);
f = vxlan_find_mac(vxlan, eth->h_dest);
did_rsc = false;
--
1.8.3.1
* [PATCH net-next v3 3/4] ip_tunnel: implement __iptunnel_pull_header
2016-04-05 12:47 [PATCH net-next v3 0/4] vxlan: implement Generic Protocol Extension (GPE) Jiri Benc
2016-04-05 12:47 ` [PATCH net-next v3 1/4] vxlan: move Ethernet initialization to a separate function Jiri Benc
2016-04-05 12:47 ` [PATCH net-next v3 2/4] vxlan: move fdb code to common location in vxlan_xmit Jiri Benc
@ 2016-04-05 12:47 ` Jiri Benc
2016-04-05 12:47 ` [PATCH net-next v3 4/4] vxlan: implement GPE Jiri Benc
2016-04-06 20:50 ` [PATCH net-next v3 0/4] vxlan: implement Generic Protocol Extension (GPE) David Miller
4 siblings, 0 replies; 8+ messages in thread
From: Jiri Benc @ 2016-04-05 12:47 UTC
To: netdev; +Cc: Tom Herbert, Jesse Gross
Allow calling iptunnel_pull_header without special-casing the ETH_P_TEB inner
protocol.
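The effect of the new raw_proto argument can be shown with a small userspace sketch. The function names and the SKETCH_ constant are hypothetical, and the protocol is kept in host byte order for simplicity; the sketch only mirrors the condition changed in net/ipv4/ip_tunnel_core.c and the wrapper added to include/net/ip_tunnels.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ETH_P_TEB 0x6558	/* Transparent Ethernet Bridging EtherType */

/* Models the condition in __iptunnel_pull_header: the inner Ethernet
 * header is only parsed when the caller did not ask for raw handling.
 */
static bool sketch_parses_inner_eth(uint16_t inner_proto, bool raw_proto)
{
	return !raw_proto && inner_proto == SKETCH_ETH_P_TEB;
}

/* Models the iptunnel_pull_header wrapper: existing callers keep the old
 * behaviour because raw_proto is fixed to false.
 */
static bool sketch_parses_inner_eth_legacy(uint16_t inner_proto)
{
	return sketch_parses_inner_eth(inner_proto, false);
}
```

With raw_proto set, an ETH_P_TEB payload is passed through without the Ethernet-specific handling, which is what a GPE receiver needs.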
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
New in v2.
---
include/net/ip_tunnels.h | 11 +++++++++--
net/ipv4/ip_tunnel_core.c | 8 ++++----
2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 56050f913339..16435d8b1f93 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -295,8 +295,15 @@ static inline u8 ip_tunnel_ecn_encap(u8 tos, const struct iphdr *iph,
return INET_ECN_encapsulate(tos, inner);
}
-int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto,
- bool xnet);
+int __iptunnel_pull_header(struct sk_buff *skb, int hdr_len,
+ __be16 inner_proto, bool raw_proto, bool xnet);
+
+static inline int iptunnel_pull_header(struct sk_buff *skb, int hdr_len,
+ __be16 inner_proto, bool xnet)
+{
+ return __iptunnel_pull_header(skb, hdr_len, inner_proto, false, xnet);
+}
+
void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
__be32 src, __be32 dst, u8 proto,
u8 tos, u8 ttl, __be16 df, bool xnet);
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index b3ab1205dfdf..43445df61efd 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -86,15 +86,15 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
}
EXPORT_SYMBOL_GPL(iptunnel_xmit);
-int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto,
- bool xnet)
+int __iptunnel_pull_header(struct sk_buff *skb, int hdr_len,
+ __be16 inner_proto, bool raw_proto, bool xnet)
{
if (unlikely(!pskb_may_pull(skb, hdr_len)))
return -ENOMEM;
skb_pull_rcsum(skb, hdr_len);
- if (inner_proto == htons(ETH_P_TEB)) {
+ if (!raw_proto && inner_proto == htons(ETH_P_TEB)) {
struct ethhdr *eh;
if (unlikely(!pskb_may_pull(skb, ETH_HLEN)))
@@ -117,7 +117,7 @@ int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto,
return iptunnel_pull_offloads(skb);
}
-EXPORT_SYMBOL_GPL(iptunnel_pull_header);
+EXPORT_SYMBOL_GPL(__iptunnel_pull_header);
struct metadata_dst *iptunnel_metadata_reply(struct metadata_dst *md,
gfp_t flags)
--
1.8.3.1
* [PATCH net-next v3 4/4] vxlan: implement GPE
2016-04-05 12:47 [PATCH net-next v3 0/4] vxlan: implement Generic Protocol Extension (GPE) Jiri Benc
` (2 preceding siblings ...)
2016-04-05 12:47 ` [PATCH net-next v3 3/4] ip_tunnel: implement __iptunnel_pull_header Jiri Benc
@ 2016-04-05 12:47 ` Jiri Benc
2016-04-05 13:50 ` Tom Herbert
2016-04-06 20:50 ` [PATCH net-next v3 0/4] vxlan: implement Generic Protocol Extension (GPE) David Miller
4 siblings, 1 reply; 8+ messages in thread
From: Jiri Benc @ 2016-04-05 12:47 UTC
To: netdev; +Cc: Tom Herbert, Jesse Gross
Implement VXLAN-GPE. Only COLLECT_METADATA is supported for now (it is
possible to support static configuration, too, if there is demand for it).
The GPE header parsing has to be moved before iptunnel_pull_header, as we
need to know the protocol.
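As an illustration of the receive-side checks, here is a userspace sketch that works on the first header byte and the Next Protocol byte with plain masks instead of the bitfield struct used in the patch. The helper name and the SKETCH_ masks are hypothetical; the masks follow the |R|R|Ver|I|P|R|O| layout documented in the include/net/vxlan.h hunk, and the result is the EtherType in host byte order, or 0 when the packet would be dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Masks for the first header byte, following |R|R|Ver|I|P|R|O|. */
#define SKETCH_GPE_VER_MASK 0x30u
#define SKETCH_GPE_P_BIT    0x04u
#define SKETCH_GPE_O_BIT    0x01u

/* Hypothetical helper: maps a GPE header to the inner EtherType, or
 * returns 0 where vxlan_parse_gpe_hdr would return false.
 */
static uint16_t sketch_gpe_ethertype(uint8_t flags, uint8_t next_protocol)
{
	if (!(flags & SKETCH_GPE_P_BIT))
		return 0;	/* Next Protocol must be present */
	if (flags & SKETCH_GPE_VER_MASK)
		return 0;	/* only version 0 is supported */
	if (flags & SKETCH_GPE_O_BIT)
		return 0;	/* OAM processing is not implemented */

	switch (next_protocol) {
	case 0x01:		/* VXLAN_GPE_NP_IPV4 */
		return 0x0800;	/* ETH_P_IP */
	case 0x02:		/* VXLAN_GPE_NP_IPV6 */
		return 0x86DD;	/* ETH_P_IPV6 */
	case 0x03:		/* VXLAN_GPE_NP_ETHERNET */
		return 0x6558;	/* ETH_P_TEB */
	default:
		return 0;	/* e.g. NSH (0x04) is not handled */
	}
}
```

Because the EtherType comes out of the header itself, this step has to run before the header is pulled, which is the reordering the description refers to.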
v2: Removed what was called "L2 mode" in v1 of the patchset. Only "L3 mode"
(now called "raw mode") is added by this patch. This mode does not allow
an Ethernet header to be encapsulated in VXLAN-GPE when using ip route to
specify the encapsulation; an IP header is encapsulated instead. The patch
does support Ethernet encapsulation, though, using ETH_P_TEB in
skb->protocol. This will be utilized by other COLLECT_METADATA users
(openvswitch in particular).
If there is ever demand for Ethernet encapsulation with VXLAN-GPE using
ip route, it's easy to add a new flag switching the interface to
"Ethernet mode" (called "L2 mode" in v1 of this patchset). For now,
leave this out, it seems we don't need it.
Disallowed more flag combinations, especially RCO with GPE.
Added comment explaining that GBP and GPE cannot be set together.
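On the transmit side, the Next Protocol value is derived from skb->protocol. A minimal userspace sketch of that mapping, with a hypothetical helper name and EtherTypes in host byte order, could look like this. It returns -1 for unsupported protocols, where the patch itself returns -EPFNOSUPPORT and frees the skb.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: maps an EtherType to the GPE Next Protocol value
 * (VXLAN_GPE_NP_* in the patch), or -1 when no mapping exists.
 */
static int sketch_gpe_next_protocol(uint16_t ethertype)
{
	switch (ethertype) {
	case 0x0800:	/* ETH_P_IP */
		return 0x01;	/* VXLAN_GPE_NP_IPV4 */
	case 0x86DD:	/* ETH_P_IPV6 */
		return 0x02;	/* VXLAN_GPE_NP_IPV6 */
	case 0x6558:	/* ETH_P_TEB */
		return 0x03;	/* VXLAN_GPE_NP_ETHERNET */
	default:
		return -1;	/* anything else cannot be sent over GPE */
	}
}
```

This is why a COLLECT_METADATA user that wants Ethernet frames encapsulated has to hand over packets with ETH_P_TEB as the protocol.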
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
drivers/net/vxlan.c | 170 ++++++++++++++++++++++++++++++++++++++-----
include/net/vxlan.h | 68 +++++++++++++++++
include/uapi/linux/if_link.h | 1 +
3 files changed, 222 insertions(+), 17 deletions(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d62eebaa9720..51cccddfe403 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1192,6 +1192,45 @@ out:
unparsed->vx_flags &= ~VXLAN_GBP_USED_BITS;
}
+static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
+ __be32 *protocol,
+ struct sk_buff *skb, u32 vxflags)
+{
+ struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)unparsed;
+
+ /* Need to have Next Protocol set for interfaces in GPE mode. */
+ if (!gpe->np_applied)
+ return false;
+ /* "The initial version is 0. If a receiver does not support the
+ * version indicated it MUST drop the packet."
+ */
+ if (gpe->version != 0)
+ return false;
+ /* "When the O bit is set to 1, the packet is an OAM packet and OAM
+ * processing MUST occur." However, we don't implement OAM
+ * processing, thus drop the packet.
+ */
+ if (gpe->oam_flag)
+ return false;
+
+ switch (gpe->next_protocol) {
+ case VXLAN_GPE_NP_IPV4:
+ *protocol = htons(ETH_P_IP);
+ break;
+ case VXLAN_GPE_NP_IPV6:
+ *protocol = htons(ETH_P_IPV6);
+ break;
+ case VXLAN_GPE_NP_ETHERNET:
+ *protocol = htons(ETH_P_TEB);
+ break;
+ default:
+ return false;
+ }
+
+ unparsed->vx_flags &= ~VXLAN_GPE_USED_BITS;
+ return true;
+}
+
static bool vxlan_set_mac(struct vxlan_dev *vxlan,
struct vxlan_sock *vs,
struct sk_buff *skb)
@@ -1257,9 +1296,11 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
struct vxlanhdr unparsed;
struct vxlan_metadata _md;
struct vxlan_metadata *md = &_md;
+ __be32 protocol = htons(ETH_P_TEB);
+ bool raw_proto = false;
void *oiph;
- /* Need Vxlan and inner Ethernet header to be present */
+ /* Need UDP and VXLAN header to be present */
if (!pskb_may_pull(skb, VXLAN_HLEN))
return 1;
@@ -1283,9 +1324,18 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
if (!vxlan)
goto drop;
- if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB),
- !net_eq(vxlan->net, dev_net(vxlan->dev))))
- goto drop;
+ /* For backwards compatibility, only allow reserved fields to be
+ * used by VXLAN extensions if explicitly requested.
+ */
+ if (vs->flags & VXLAN_F_GPE) {
+ if (!vxlan_parse_gpe_hdr(&unparsed, &protocol, skb, vs->flags))
+ goto drop;
+ raw_proto = true;
+ }
+
+ if (__iptunnel_pull_header(skb, VXLAN_HLEN, protocol, raw_proto,
+ !net_eq(vxlan->net, dev_net(vxlan->dev))))
+ goto drop;
if (vxlan_collect_metadata(vs)) {
__be32 vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);
@@ -1304,14 +1354,14 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
memset(md, 0, sizeof(*md));
}
- /* For backwards compatibility, only allow reserved fields to be
- * used by VXLAN extensions if explicitly requested.
- */
if (vs->flags & VXLAN_F_REMCSUM_RX)
if (!vxlan_remcsum(&unparsed, skb, vs->flags))
goto drop;
if (vs->flags & VXLAN_F_GBP)
vxlan_parse_gbp_hdr(&unparsed, skb, vs->flags, md);
+ /* Note that GBP and GPE can never be active together. This is
+ * ensured in vxlan_dev_configure.
+ */
if (unparsed.vx_flags || unparsed.vx_vni) {
/* If there are any unprocessed flags remaining treat
@@ -1325,8 +1375,13 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
goto drop;
}
- if (!vxlan_set_mac(vxlan, vs, skb))
- goto drop;
+ if (!raw_proto) {
+ if (!vxlan_set_mac(vxlan, vs, skb))
+ goto drop;
+ } else {
+ skb->dev = vxlan->dev;
+ skb->pkt_type = PACKET_HOST;
+ }
oiph = skb_network_header(skb);
skb_reset_network_header(skb);
@@ -1685,6 +1740,27 @@ static void vxlan_build_gbp_hdr(struct vxlanhdr *vxh, u32 vxflags,
gbp->policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
}
+static int vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 vxflags,
+ __be16 protocol)
+{
+ struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)vxh;
+
+ gpe->np_applied = 1;
+
+ switch (protocol) {
+ case htons(ETH_P_IP):
+ gpe->next_protocol = VXLAN_GPE_NP_IPV4;
+ return 0;
+ case htons(ETH_P_IPV6):
+ gpe->next_protocol = VXLAN_GPE_NP_IPV6;
+ return 0;
+ case htons(ETH_P_TEB):
+ gpe->next_protocol = VXLAN_GPE_NP_ETHERNET;
+ return 0;
+ }
+ return -EPFNOSUPPORT;
+}
+
static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
int iphdr_len, __be32 vni,
struct vxlan_metadata *md, u32 vxflags,
@@ -1694,6 +1770,7 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
int min_headroom;
int err;
int type = udp_sum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
+ __be16 inner_protocol = htons(ETH_P_TEB);
if ((vxflags & VXLAN_F_REMCSUM_TX) &&
skb->ip_summed == CHECKSUM_PARTIAL) {
@@ -1712,10 +1789,8 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
/* Need space for new headers (invalidates iph ptr) */
err = skb_cow_head(skb, min_headroom);
- if (unlikely(err)) {
- kfree_skb(skb);
- return err;
- }
+ if (unlikely(err))
+ goto out_free;
skb = vlan_hwaccel_push_inside(skb);
if (WARN_ON(!skb))
@@ -1744,9 +1819,19 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
if (vxflags & VXLAN_F_GBP)
vxlan_build_gbp_hdr(vxh, vxflags, md);
+ if (vxflags & VXLAN_F_GPE) {
+ err = vxlan_build_gpe_hdr(vxh, vxflags, skb->protocol);
+ if (err < 0)
+ goto out_free;
+ inner_protocol = skb->protocol;
+ }
- skb_set_inner_protocol(skb, htons(ETH_P_TEB));
+ skb_set_inner_protocol(skb, inner_protocol);
return 0;
+
+out_free:
+ kfree_skb(skb);
+ return err;
}
static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan,
@@ -2421,6 +2506,17 @@ static const struct net_device_ops vxlan_netdev_ether_ops = {
.ndo_fill_metadata_dst = vxlan_fill_metadata_dst,
};
+static const struct net_device_ops vxlan_netdev_raw_ops = {
+ .ndo_init = vxlan_init,
+ .ndo_uninit = vxlan_uninit,
+ .ndo_open = vxlan_open,
+ .ndo_stop = vxlan_stop,
+ .ndo_start_xmit = vxlan_xmit,
+ .ndo_get_stats64 = ip_tunnel_get_stats64,
+ .ndo_change_mtu = vxlan_change_mtu,
+ .ndo_fill_metadata_dst = vxlan_fill_metadata_dst,
+};
+
/* Info for udev, that this is a virtual tunnel endpoint */
static struct device_type vxlan_type = {
.name = "vxlan",
@@ -2500,6 +2596,17 @@ static void vxlan_ether_setup(struct net_device *dev)
dev->netdev_ops = &vxlan_netdev_ether_ops;
}
+static void vxlan_raw_setup(struct net_device *dev)
+{
+ dev->type = ARPHRD_NONE;
+ dev->hard_header_len = 0;
+ dev->addr_len = 0;
+ dev->mtu = ETH_DATA_LEN;
+ dev->tx_queue_len = 1000;
+ dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
+ dev->netdev_ops = &vxlan_netdev_raw_ops;
+}
+
static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
[IFLA_VXLAN_ID] = { .type = NLA_U32 },
[IFLA_VXLAN_GROUP] = { .len = FIELD_SIZEOF(struct iphdr, daddr) },
@@ -2526,6 +2633,7 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
[IFLA_VXLAN_REMCSUM_TX] = { .type = NLA_U8 },
[IFLA_VXLAN_REMCSUM_RX] = { .type = NLA_U8 },
[IFLA_VXLAN_GBP] = { .type = NLA_FLAG, },
+ [IFLA_VXLAN_GPE] = { .type = NLA_FLAG, },
[IFLA_VXLAN_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG },
};
@@ -2726,7 +2834,20 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
__be16 default_port = vxlan->cfg.dst_port;
struct net_device *lowerdev = NULL;
- vxlan_ether_setup(dev);
+ if (conf->flags & VXLAN_F_GPE) {
+ if (conf->flags & ~VXLAN_F_ALLOWED_GPE)
+ return -EINVAL;
+ /* For now, allow GPE only together with COLLECT_METADATA.
+ * This can be relaxed later; in such case, the other side
+ * of the PtP link will have to be provided.
+ */
+ if (!(conf->flags & VXLAN_F_COLLECT_METADATA))
+ return -EINVAL;
+
+ vxlan_raw_setup(dev);
+ } else {
+ vxlan_ether_setup(dev);
+ }
vxlan->net = src_net;
@@ -2789,8 +2910,12 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
dev->needed_headroom = needed_headroom;
memcpy(&vxlan->cfg, conf, sizeof(*conf));
- if (!vxlan->cfg.dst_port)
- vxlan->cfg.dst_port = default_port;
+ if (!vxlan->cfg.dst_port) {
+ if (conf->flags & VXLAN_F_GPE)
+ vxlan->cfg.dst_port = htons(4790); /* IANA assigned VXLAN-GPE port */
+ else
+ vxlan->cfg.dst_port = default_port;
+ }
vxlan->flags |= conf->flags;
if (!vxlan->cfg.age_interval)
@@ -2961,6 +3086,9 @@ static int vxlan_newlink(struct net *src_net, struct net_device *dev,
if (data[IFLA_VXLAN_GBP])
conf.flags |= VXLAN_F_GBP;
+ if (data[IFLA_VXLAN_GPE])
+ conf.flags |= VXLAN_F_GPE;
+
if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL])
conf.flags |= VXLAN_F_REMCSUM_NOPARTIAL;
@@ -2977,6 +3105,10 @@ static int vxlan_newlink(struct net *src_net, struct net_device *dev,
case -EEXIST:
pr_info("duplicate VNI %u\n", be32_to_cpu(conf.vni));
break;
+
+ case -EINVAL:
+ pr_info("unsupported combination of extensions\n");
+ break;
}
return err;
@@ -3104,6 +3236,10 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
nla_put_flag(skb, IFLA_VXLAN_GBP))
goto nla_put_failure;
+ if (vxlan->flags & VXLAN_F_GPE &&
+ nla_put_flag(skb, IFLA_VXLAN_GPE))
+ goto nla_put_failure;
+
if (vxlan->flags & VXLAN_F_REMCSUM_NOPARTIAL &&
nla_put_flag(skb, IFLA_VXLAN_REMCSUM_NOPARTIAL))
goto nla_put_failure;
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 73ed2e951c02..dcc6f4057115 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -119,6 +119,64 @@ struct vxlanhdr_gbp {
#define VXLAN_GBP_POLICY_APPLIED (BIT(3) << 16)
#define VXLAN_GBP_ID_MASK (0xFFFF)
+/*
+ * VXLAN Generic Protocol Extension (VXLAN_F_GPE):
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |R|R|Ver|I|P|R|O| Reserved |Next Protocol |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * | VXLAN Network Identifier (VNI) | Reserved |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * Ver = Version. Indicates VXLAN GPE protocol version.
+ *
+ * P = Next Protocol Bit. The P bit is set to indicate that the
+ * Next Protocol field is present.
+ *
+ * O = OAM Flag Bit. The O bit is set to indicate that the packet
+ * is an OAM packet.
+ *
+ * Next Protocol = This 8 bit field indicates the protocol header
+ * immediately following the VXLAN GPE header.
+ *
+ * https://tools.ietf.org/html/draft-ietf-nvo3-vxlan-gpe-01
+ */
+
+struct vxlanhdr_gpe {
+#if defined(__LITTLE_ENDIAN_BITFIELD)
+ u8 oam_flag:1,
+ reserved_flags1:1,
+ np_applied:1,
+ instance_applied:1,
+ version:2,
+ reserved_flags2:2;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+ u8 reserved_flags2:2,
+ version:2,
+ instance_applied:1,
+ np_applied:1,
+ reserved_flags1:1,
+ oam_flag:1;
+#endif
+ u8 reserved_flags3;
+ u8 reserved_flags4;
+ u8 next_protocol;
+ __be32 vx_vni;
+};
+
+/* VXLAN-GPE header flags. */
+#define VXLAN_HF_VER cpu_to_be32(BIT(29) | BIT(28))
+#define VXLAN_HF_NP cpu_to_be32(BIT(26))
+#define VXLAN_HF_OAM cpu_to_be32(BIT(24))
+
+#define VXLAN_GPE_USED_BITS (VXLAN_HF_VER | VXLAN_HF_NP | VXLAN_HF_OAM | \
+ cpu_to_be32(0xff))
+
+/* VXLAN-GPE header Next Protocol. */
+#define VXLAN_GPE_NP_IPV4 0x01
+#define VXLAN_GPE_NP_IPV6 0x02
+#define VXLAN_GPE_NP_ETHERNET 0x03
+#define VXLAN_GPE_NP_NSH 0x04
+
struct vxlan_metadata {
u32 gbp;
};
@@ -206,16 +264,26 @@ struct vxlan_dev {
#define VXLAN_F_GBP 0x800
#define VXLAN_F_REMCSUM_NOPARTIAL 0x1000
#define VXLAN_F_COLLECT_METADATA 0x2000
+#define VXLAN_F_GPE 0x4000
/* Flags that are used in the receive path. These flags must match in
* order for a socket to be shareable
*/
#define VXLAN_F_RCV_FLAGS (VXLAN_F_GBP | \
+ VXLAN_F_GPE | \
VXLAN_F_UDP_ZERO_CSUM6_RX | \
VXLAN_F_REMCSUM_RX | \
VXLAN_F_REMCSUM_NOPARTIAL | \
VXLAN_F_COLLECT_METADATA)
+/* Flags that can be set together with VXLAN_F_GPE. */
+#define VXLAN_F_ALLOWED_GPE (VXLAN_F_GPE | \
+ VXLAN_F_IPV6 | \
+ VXLAN_F_UDP_ZERO_CSUM_TX | \
+ VXLAN_F_UDP_ZERO_CSUM6_TX | \
+ VXLAN_F_UDP_ZERO_CSUM6_RX | \
+ VXLAN_F_COLLECT_METADATA)
+
struct net_device *vxlan_dev_create(struct net *net, const char *name,
u8 name_assign_type, struct vxlan_config *conf);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index c488066fb53a..9427f17d06d6 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -488,6 +488,7 @@ enum {
IFLA_VXLAN_REMCSUM_NOPARTIAL,
IFLA_VXLAN_COLLECT_METADATA,
IFLA_VXLAN_LABEL,
+ IFLA_VXLAN_GPE,
__IFLA_VXLAN_MAX
};
#define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
--
1.8.3.1
* Re: [PATCH net-next v3 4/4] vxlan: implement GPE
2016-04-05 12:47 ` [PATCH net-next v3 4/4] vxlan: implement GPE Jiri Benc
@ 2016-04-05 13:50 ` Tom Herbert
2016-04-05 13:57 ` Jiri Benc
0 siblings, 1 reply; 8+ messages in thread
From: Tom Herbert @ 2016-04-05 13:50 UTC
To: Jiri Benc; +Cc: Linux Kernel Network Developers, Jesse Gross
On Tue, Apr 5, 2016 at 9:47 AM, Jiri Benc <jbenc@redhat.com> wrote:
> Implement VXLAN-GPE. Only COLLECT_METADATA is supported for now (it is
> possible to support static configuration, too, if there is demand for it).
>
> The GPE header parsing has to be moved before iptunnel_pull_header, as we
> need to know the protocol.
>
> v2: Removed what was called "L2 mode" in v1 of the patchset. Only "L3 mode"
> (now called "raw mode") is added by this patch. This mode does not allow
> an Ethernet header to be encapsulated in VXLAN-GPE when using ip route to
> specify the encapsulation; an IP header is encapsulated instead. The patch
> does support Ethernet encapsulation, though, using ETH_P_TEB in
> skb->protocol. This will be utilized by other COLLECT_METADATA users
> (openvswitch in particular).
>
> If there is ever demand for Ethernet encapsulation with VXLAN-GPE using
> ip route, it's easy to add a new flag switching the interface to
> "Ethernet mode" (called "L2 mode" in v1 of this patchset). For now,
> leave this out, it seems we don't need it.
>
> Disallowed more flag combinations, especially RCO with GPE.
> Added comment explaining that GBP and GPE cannot be set together.
>
I requested input from VXLAN protocol experts on whether RCO is
architecturally correct in VXLAN and whether we are using the right
fields, both on the mailing list and in the WG meeting yesterday @IETF.
I have not gotten any response, so I am going to assume all this is
reasonable. I will add explicit support for VXLAN-GPE to the VXLAN-RCO
draft. The configuration option for RX RCO can be removed and RCO
can be supported for VXLAN/VXLAN-GPE in the same way. Presumably, GBP
might make the same assumptions, but the GBP format as defined for VXLAN
isn't compatible with VXLAN-GPE.
Tom
> Signed-off-by: Jiri Benc <jbenc@redhat.com>
> ---
> drivers/net/vxlan.c | 170 ++++++++++++++++++++++++++++++++++++++-----
> include/net/vxlan.h | 68 +++++++++++++++++
> include/uapi/linux/if_link.h | 1 +
> 3 files changed, 222 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index d62eebaa9720..51cccddfe403 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1192,6 +1192,45 @@ out:
> unparsed->vx_flags &= ~VXLAN_GBP_USED_BITS;
> }
>
> +static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
> + __be32 *protocol,
> + struct sk_buff *skb, u32 vxflags)
> +{
> + struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)unparsed;
> +
> + /* Need to have Next Protocol set for interfaces in GPE mode. */
> + if (!gpe->np_applied)
> + return false;
> + /* "The initial version is 0. If a receiver does not support the
> + * version indicated it MUST drop the packet."
> + */
> + if (gpe->version != 0)
> + return false;
> + /* "When the O bit is set to 1, the packet is an OAM packet and OAM
> + * processing MUST occur." However, we don't implement OAM
> + * processing, thus drop the packet.
> + */
> + if (gpe->oam_flag)
> + return false;
> +
> + switch (gpe->next_protocol) {
> + case VXLAN_GPE_NP_IPV4:
> + *protocol = htons(ETH_P_IP);
> + break;
> + case VXLAN_GPE_NP_IPV6:
> + *protocol = htons(ETH_P_IPV6);
> + break;
> + case VXLAN_GPE_NP_ETHERNET:
> + *protocol = htons(ETH_P_TEB);
> + break;
> + default:
> + return false;
> + }
> +
> + unparsed->vx_flags &= ~VXLAN_GPE_USED_BITS;
> + return true;
> +}
> +
> static bool vxlan_set_mac(struct vxlan_dev *vxlan,
> struct vxlan_sock *vs,
> struct sk_buff *skb)
> @@ -1257,9 +1296,11 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
> struct vxlanhdr unparsed;
> struct vxlan_metadata _md;
> struct vxlan_metadata *md = &_md;
> + __be32 protocol = htons(ETH_P_TEB);
> + bool raw_proto = false;
> void *oiph;
>
> - /* Need Vxlan and inner Ethernet header to be present */
> + /* Need UDP and VXLAN header to be present */
> if (!pskb_may_pull(skb, VXLAN_HLEN))
> return 1;
>
> @@ -1283,9 +1324,18 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
> if (!vxlan)
> goto drop;
>
> - if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB),
> - !net_eq(vxlan->net, dev_net(vxlan->dev))))
> - goto drop;
> + /* For backwards compatibility, only allow reserved fields to be
> + * used by VXLAN extensions if explicitly requested.
> + */
> + if (vs->flags & VXLAN_F_GPE) {
> + if (!vxlan_parse_gpe_hdr(&unparsed, &protocol, skb, vs->flags))
> + goto drop;
> + raw_proto = true;
> + }
> +
> + if (__iptunnel_pull_header(skb, VXLAN_HLEN, protocol, raw_proto,
> + !net_eq(vxlan->net, dev_net(vxlan->dev))))
> + goto drop;
>
> if (vxlan_collect_metadata(vs)) {
> __be32 vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);
> @@ -1304,14 +1354,14 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
> memset(md, 0, sizeof(*md));
> }
>
> - /* For backwards compatibility, only allow reserved fields to be
> - * used by VXLAN extensions if explicitly requested.
> - */
> if (vs->flags & VXLAN_F_REMCSUM_RX)
> if (!vxlan_remcsum(&unparsed, skb, vs->flags))
> goto drop;
> if (vs->flags & VXLAN_F_GBP)
> vxlan_parse_gbp_hdr(&unparsed, skb, vs->flags, md);
> + /* Note that GBP and GPE can never be active together. This is
> + * ensured in vxlan_dev_configure.
> + */
>
> if (unparsed.vx_flags || unparsed.vx_vni) {
> /* If there are any unprocessed flags remaining treat
> @@ -1325,8 +1375,13 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
> goto drop;
> }
>
> - if (!vxlan_set_mac(vxlan, vs, skb))
> - goto drop;
> + if (!raw_proto) {
> + if (!vxlan_set_mac(vxlan, vs, skb))
> + goto drop;
> + } else {
> + skb->dev = vxlan->dev;
> + skb->pkt_type = PACKET_HOST;
> + }
>
> oiph = skb_network_header(skb);
> skb_reset_network_header(skb);
> @@ -1685,6 +1740,27 @@ static void vxlan_build_gbp_hdr(struct vxlanhdr *vxh, u32 vxflags,
> gbp->policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
> }
>
> +static int vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 vxflags,
> + __be16 protocol)
> +{
> + struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)vxh;
> +
> + gpe->np_applied = 1;
> +
> + switch (protocol) {
> + case htons(ETH_P_IP):
> + gpe->next_protocol = VXLAN_GPE_NP_IPV4;
> + return 0;
> + case htons(ETH_P_IPV6):
> + gpe->next_protocol = VXLAN_GPE_NP_IPV6;
> + return 0;
> + case htons(ETH_P_TEB):
> + gpe->next_protocol = VXLAN_GPE_NP_ETHERNET;
> + return 0;
> + }
> + return -EPFNOSUPPORT;
> +}
> +
> static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
> int iphdr_len, __be32 vni,
> struct vxlan_metadata *md, u32 vxflags,
> @@ -1694,6 +1770,7 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
> int min_headroom;
> int err;
> int type = udp_sum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
> + __be16 inner_protocol = htons(ETH_P_TEB);
>
> if ((vxflags & VXLAN_F_REMCSUM_TX) &&
> skb->ip_summed == CHECKSUM_PARTIAL) {
> @@ -1712,10 +1789,8 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
>
> /* Need space for new headers (invalidates iph ptr) */
> err = skb_cow_head(skb, min_headroom);
> - if (unlikely(err)) {
> - kfree_skb(skb);
> - return err;
> - }
> + if (unlikely(err))
> + goto out_free;
>
> skb = vlan_hwaccel_push_inside(skb);
> if (WARN_ON(!skb))
> @@ -1744,9 +1819,19 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
>
> if (vxflags & VXLAN_F_GBP)
> vxlan_build_gbp_hdr(vxh, vxflags, md);
> + if (vxflags & VXLAN_F_GPE) {
> + err = vxlan_build_gpe_hdr(vxh, vxflags, skb->protocol);
> + if (err < 0)
> + goto out_free;
> + inner_protocol = skb->protocol;
> + }
>
> - skb_set_inner_protocol(skb, htons(ETH_P_TEB));
> + skb_set_inner_protocol(skb, inner_protocol);
> return 0;
> +
> +out_free:
> + kfree_skb(skb);
> + return err;
> }
>
> static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan,
> @@ -2421,6 +2506,17 @@ static const struct net_device_ops vxlan_netdev_ether_ops = {
> .ndo_fill_metadata_dst = vxlan_fill_metadata_dst,
> };
>
> +static const struct net_device_ops vxlan_netdev_raw_ops = {
> + .ndo_init = vxlan_init,
> + .ndo_uninit = vxlan_uninit,
> + .ndo_open = vxlan_open,
> + .ndo_stop = vxlan_stop,
> + .ndo_start_xmit = vxlan_xmit,
> + .ndo_get_stats64 = ip_tunnel_get_stats64,
> + .ndo_change_mtu = vxlan_change_mtu,
> + .ndo_fill_metadata_dst = vxlan_fill_metadata_dst,
> +};
> +
> /* Info for udev, that this is a virtual tunnel endpoint */
> static struct device_type vxlan_type = {
> .name = "vxlan",
> @@ -2500,6 +2596,17 @@ static void vxlan_ether_setup(struct net_device *dev)
> dev->netdev_ops = &vxlan_netdev_ether_ops;
> }
>
> +static void vxlan_raw_setup(struct net_device *dev)
> +{
> + dev->type = ARPHRD_NONE;
> + dev->hard_header_len = 0;
> + dev->addr_len = 0;
> + dev->mtu = ETH_DATA_LEN;
> + dev->tx_queue_len = 1000;
> + dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
> + dev->netdev_ops = &vxlan_netdev_raw_ops;
> +}
> +
> static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
> [IFLA_VXLAN_ID] = { .type = NLA_U32 },
> [IFLA_VXLAN_GROUP] = { .len = FIELD_SIZEOF(struct iphdr, daddr) },
> @@ -2526,6 +2633,7 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
> [IFLA_VXLAN_REMCSUM_TX] = { .type = NLA_U8 },
> [IFLA_VXLAN_REMCSUM_RX] = { .type = NLA_U8 },
> [IFLA_VXLAN_GBP] = { .type = NLA_FLAG, },
> + [IFLA_VXLAN_GPE] = { .type = NLA_FLAG, },
> [IFLA_VXLAN_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG },
> };
>
> @@ -2726,7 +2834,20 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
> __be16 default_port = vxlan->cfg.dst_port;
> struct net_device *lowerdev = NULL;
>
> - vxlan_ether_setup(dev);
> + if (conf->flags & VXLAN_F_GPE) {
> + if (conf->flags & ~VXLAN_F_ALLOWED_GPE)
> + return -EINVAL;
> + /* For now, allow GPE only together with COLLECT_METADATA.
> + * This can be relaxed later; in such case, the other side
> + * of the PtP link will have to be provided.
> + */
> + if (!(conf->flags & VXLAN_F_COLLECT_METADATA))
> + return -EINVAL;
> +
> + vxlan_raw_setup(dev);
> + } else {
> + vxlan_ether_setup(dev);
> + }
>
> vxlan->net = src_net;
>
> @@ -2789,8 +2910,12 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
> dev->needed_headroom = needed_headroom;
>
> memcpy(&vxlan->cfg, conf, sizeof(*conf));
> - if (!vxlan->cfg.dst_port)
> - vxlan->cfg.dst_port = default_port;
> + if (!vxlan->cfg.dst_port) {
> + if (conf->flags & VXLAN_F_GPE)
> + vxlan->cfg.dst_port = htons(4790); /* IANA assigned VXLAN-GPE port */
> + else
> + vxlan->cfg.dst_port = default_port;
> + }
> vxlan->flags |= conf->flags;
>
> if (!vxlan->cfg.age_interval)
> @@ -2961,6 +3086,9 @@ static int vxlan_newlink(struct net *src_net, struct net_device *dev,
> if (data[IFLA_VXLAN_GBP])
> conf.flags |= VXLAN_F_GBP;
>
> + if (data[IFLA_VXLAN_GPE])
> + conf.flags |= VXLAN_F_GPE;
> +
> if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL])
> conf.flags |= VXLAN_F_REMCSUM_NOPARTIAL;
>
> @@ -2977,6 +3105,10 @@ static int vxlan_newlink(struct net *src_net, struct net_device *dev,
> case -EEXIST:
> pr_info("duplicate VNI %u\n", be32_to_cpu(conf.vni));
> break;
> +
> + case -EINVAL:
> + pr_info("unsupported combination of extensions\n");
> + break;
> }
>
> return err;
> @@ -3104,6 +3236,10 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
> nla_put_flag(skb, IFLA_VXLAN_GBP))
> goto nla_put_failure;
>
> + if (vxlan->flags & VXLAN_F_GPE &&
> + nla_put_flag(skb, IFLA_VXLAN_GPE))
> + goto nla_put_failure;
> +
> if (vxlan->flags & VXLAN_F_REMCSUM_NOPARTIAL &&
> nla_put_flag(skb, IFLA_VXLAN_REMCSUM_NOPARTIAL))
> goto nla_put_failure;
> diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> index 73ed2e951c02..dcc6f4057115 100644
> --- a/include/net/vxlan.h
> +++ b/include/net/vxlan.h
> @@ -119,6 +119,64 @@ struct vxlanhdr_gbp {
> #define VXLAN_GBP_POLICY_APPLIED (BIT(3) << 16)
> #define VXLAN_GBP_ID_MASK (0xFFFF)
>
> +/*
> + * VXLAN Generic Protocol Extension (VXLAN_F_GPE):
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |R|R|Ver|I|P|R|O| Reserved |Next Protocol |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * | VXLAN Network Identifier (VNI) | Reserved |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + *
> + * Ver = Version. Indicates VXLAN GPE protocol version.
> + *
> + * P = Next Protocol Bit. The P bit is set to indicate that the
> + * Next Protocol field is present.
> + *
> + * O = OAM Flag Bit. The O bit is set to indicate that the packet
> + * is an OAM packet.
> + *
> + * Next Protocol = This 8 bit field indicates the protocol header
> + * immediately following the VXLAN GPE header.
> + *
> + * https://tools.ietf.org/html/draft-ietf-nvo3-vxlan-gpe-01
> + */
> +
> +struct vxlanhdr_gpe {
> +#if defined(__LITTLE_ENDIAN_BITFIELD)
> + u8 oam_flag:1,
> + reserved_flags1:1,
> + np_applied:1,
> + instance_applied:1,
> + version:2,
> + reserved_flags2:2;
> +#elif defined(__BIG_ENDIAN_BITFIELD)
> + u8 reserved_flags2:2,
> + version:2,
> + instance_applied:1,
> + np_applied:1,
> + reserved_flags1:1,
> + oam_flag:1;
> +#endif
> + u8 reserved_flags3;
> + u8 reserved_flags4;
> + u8 next_protocol;
> + __be32 vx_vni;
> +};
> +
> +/* VXLAN-GPE header flags. */
> +#define VXLAN_HF_VER cpu_to_be32(BIT(29) | BIT(28))
> +#define VXLAN_HF_NP cpu_to_be32(BIT(26))
> +#define VXLAN_HF_OAM cpu_to_be32(BIT(24))
> +
> +#define VXLAN_GPE_USED_BITS (VXLAN_HF_VER | VXLAN_HF_NP | VXLAN_HF_OAM | \
> + cpu_to_be32(0xff))
> +
> +/* VXLAN-GPE header Next Protocol. */
> +#define VXLAN_GPE_NP_IPV4 0x01
> +#define VXLAN_GPE_NP_IPV6 0x02
> +#define VXLAN_GPE_NP_ETHERNET 0x03
> +#define VXLAN_GPE_NP_NSH 0x04
> +
> struct vxlan_metadata {
> u32 gbp;
> };
> @@ -206,16 +264,26 @@ struct vxlan_dev {
> #define VXLAN_F_GBP 0x800
> #define VXLAN_F_REMCSUM_NOPARTIAL 0x1000
> #define VXLAN_F_COLLECT_METADATA 0x2000
> +#define VXLAN_F_GPE 0x4000
>
> /* Flags that are used in the receive path. These flags must match in
> * order for a socket to be shareable
> */
> #define VXLAN_F_RCV_FLAGS (VXLAN_F_GBP | \
> + VXLAN_F_GPE | \
> VXLAN_F_UDP_ZERO_CSUM6_RX | \
> VXLAN_F_REMCSUM_RX | \
> VXLAN_F_REMCSUM_NOPARTIAL | \
> VXLAN_F_COLLECT_METADATA)
>
> +/* Flags that can be set together with VXLAN_F_GPE. */
> +#define VXLAN_F_ALLOWED_GPE (VXLAN_F_GPE | \
> + VXLAN_F_IPV6 | \
> + VXLAN_F_UDP_ZERO_CSUM_TX | \
> + VXLAN_F_UDP_ZERO_CSUM6_TX | \
> + VXLAN_F_UDP_ZERO_CSUM6_RX | \
> + VXLAN_F_COLLECT_METADATA)
> +
> struct net_device *vxlan_dev_create(struct net *net, const char *name,
> u8 name_assign_type, struct vxlan_config *conf);
>
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index c488066fb53a..9427f17d06d6 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -488,6 +488,7 @@ enum {
> IFLA_VXLAN_REMCSUM_NOPARTIAL,
> IFLA_VXLAN_COLLECT_METADATA,
> IFLA_VXLAN_LABEL,
> + IFLA_VXLAN_GPE,
> __IFLA_VXLAN_MAX
> };
> #define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
> --
> 1.8.3.1
>
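For anyone reading the header layout in the quoted patch, here is a minimal userspace sketch of what vxlan_build_gpe_hdr() amounts to on the wire. It is not the kernel code: gpe_build_hdr() and the GPE_* names are hypothetical, the I and P bit positions are read off the diagram in the quoted comment, and the Next Protocol values are the ones the patch defines. The EtherType numbers are the standard ones.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Next Protocol values as defined in the quoted patch. */
#define GPE_NP_IPV4     0x01
#define GPE_NP_IPV6     0x02
#define GPE_NP_ETHERNET 0x03

/* Host-order bit positions in the first 32-bit word, per the diagram:
 * |R|R|Ver|I|P|R|O| Reserved | Next Protocol |
 */
#define GPE_HF_INSTANCE (UINT32_C(1) << 27) /* I bit: VNI is valid */
#define GPE_HF_NP       (UINT32_C(1) << 26) /* P bit: Next Protocol present */

/* Hypothetical helper: fill an 8-byte VXLAN-GPE header for the given
 * inner EtherType (host order) and 24-bit VNI.
 * Returns 0 on success, -1 if the inner protocol has no GPE mapping.
 */
int gpe_build_hdr(uint8_t buf[8], uint16_t ethertype, uint32_t vni)
{
	uint32_t word0 = GPE_HF_INSTANCE | GPE_HF_NP;
	uint32_t word1 = (vni & 0xffffff) << 8; /* low byte is reserved */

	switch (ethertype) {
	case 0x0800: /* IPv4 */
		word0 |= GPE_NP_IPV4;
		break;
	case 0x86DD: /* IPv6 */
		word0 |= GPE_NP_IPV6;
		break;
	case 0x6558: /* Transparent Ethernet Bridging */
		word0 |= GPE_NP_ETHERNET;
		break;
	default:
		return -1;
	}

	/* Both words go out in network byte order. */
	word0 = htonl(word0);
	word1 = htonl(word1);
	memcpy(buf, &word0, sizeof(word0));
	memcpy(buf + 4, &word1, sizeof(word1));
	return 0;
}
```

Under these assumptions, an IPv4 payload on VNI 42 gives the bytes 0c 00 00 01 00 00 2a 00, and an unmapped EtherType such as ARP (0x0806) is refused, which matches the -EPFNOSUPPORT path in the patch.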
* Re: [PATCH net-next v3 4/4] vxlan: implement GPE
2016-04-05 13:50 ` Tom Herbert
@ 2016-04-05 13:57 ` Jiri Benc
0 siblings, 0 replies; 8+ messages in thread
From: Jiri Benc @ 2016-04-05 13:57 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers, Jesse Gross
On Tue, 5 Apr 2016 10:50:57 -0300, Tom Herbert wrote:
> I requested input from VXLAN protocol experts on whether RCO is
> architecturally correct in VXLAN and whether we are using the right
> fields both on the mailing list and in WG meeting yesterday @IETF.
> Have not gotten any response, so I am going to assume all this is
> reasonable. I will add explicit support to VXLAN-RCO draft for
> VXLAN-GPE. The configuration option for RX RCO can be removed and RCO
> can be supported for VXLAN/VXLAN-GPE in same way. Presumably, GBP
> might make same assumptions but GBP format as defined for VXLAN isn't
> compatible with VXLAN-GPE.
Cool, thanks! We'll loosen the restriction after it's added to the RFC;
it should be a simple one-line patch.
Jiri
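
To illustrate why loosening the restriction would likely be a one-line change, here is a rough userspace sketch of the flag check in vxlan_dev_configure(). The function gpe_check_flags() and the F_* names are hypothetical. The values for GBP, REMCSUM_NOPARTIAL, COLLECT_METADATA and GPE come from the quoted hunk; the remaining values are assumed for illustration only. Passing the allowed mask as a parameter is a choice made for this sketch, not something the driver does.

```c
#include <stdint.h>
#include <errno.h>

/* Values visible in the quoted include/net/vxlan.h hunk. */
#define F_GBP               0x800
#define F_REMCSUM_NOPARTIAL 0x1000
#define F_COLLECT_METADATA  0x2000
#define F_GPE               0x4000

/* Not shown in the quoted hunk: values assumed for illustration. */
#define F_IPV6              0x20
#define F_UDP_ZERO_CSUM_TX  0x40
#define F_UDP_ZERO_CSUM6_TX 0x80
#define F_UDP_ZERO_CSUM6_RX 0x100
#define F_REMCSUM_RX        0x400

/* Mirrors VXLAN_F_ALLOWED_GPE from the patch. */
#define F_ALLOWED_GPE (F_GPE | F_IPV6 | F_UDP_ZERO_CSUM_TX | \
		       F_UDP_ZERO_CSUM6_TX | F_UDP_ZERO_CSUM6_RX | \
		       F_COLLECT_METADATA)

/* Hypothetical helper: returns 0 if the flag combination is acceptable,
 * -EINVAL otherwise. Non-GPE configurations are not restricted here.
 */
int gpe_check_flags(uint32_t flags, uint32_t allowed)
{
	if (!(flags & F_GPE))
		return 0;
	/* Any flag outside the allowed mask is rejected. */
	if (flags & ~allowed)
		return -EINVAL;
	/* For now GPE only works together with COLLECT_METADATA. */
	if (!(flags & F_COLLECT_METADATA))
		return -EINVAL;
	return 0;
}
```

With the mask as posted, GPE plus COLLECT_METADATA passes and adding an RCO flag fails. Extending the mask by that flag (for example F_ALLOWED_GPE | F_REMCSUM_RX) makes the same configuration pass, so the loosening would only touch the mask definition. Which flags would actually be added is not decided in this thread.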
* Re: [PATCH net-next v3 0/4] vxlan: implement Generic Protocol Extension (GPE)
2016-04-05 12:47 [PATCH net-next v3 0/4] vxlan: implement Generic Protocol Extension (GPE) Jiri Benc
` (3 preceding siblings ...)
2016-04-05 12:47 ` [PATCH net-next v3 4/4] vxlan: implement GPE Jiri Benc
@ 2016-04-06 20:50 ` David Miller
4 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2016-04-06 20:50 UTC (permalink / raw)
To: jbenc; +Cc: netdev, tom, jesse
From: Jiri Benc <jbenc@redhat.com>
Date: Tue, 5 Apr 2016 14:47:09 +0200
> v3: just rebased on top of the current net-next, no changes
>
> This patchset implements VXLAN-GPE. It follows the same model as the tun/tap
> driver: depending on the chosen mode, the vxlan interface is created either
> as ARPHRD_ETHER (non-GPE) or ARPHRD_NONE (GPE).
>
> Note that the internal fdb control plane cannot be used together with
> VXLAN-GPE and attempt to configure it will be rejected by the driver. In
> fact, COLLECT_METADATA is required to be set for now. This can be relaxed in
> the future by adding support for static PtP configuration; it will be
> backward compatible and won't affect existing users.
>
> The previous version of the patchset supported two GPE modes, L2 and L3. The
> L2 mode (now called "ether mode" in the code) was removed from this version.
> It can be easily added later if there's demand. The L3 mode is now called
> "raw mode" and supports also encapsulated Ethernet headers (via ETH_P_TEB).
>
> The only limitation of not having "ether mode" for GPE is for ip route based
> encapsulation: with such setup, only IP packets can be encapsulated. Meaning
> no Ethernet encapsulation. It seems there's not much use for this, though.
> If it turns out to be useful, we'll add it.
Series applied, thanks Jiri.