[PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template
@ 2018-10-04  0:03 Pablo Neira Ayuso
  2018-10-04  0:03 ` [PATCH RFC,net-next 1/3] ip_tunnel: add type field to struct ip_tunnel_info Pablo Neira Ayuso
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Pablo Neira Ayuso @ 2018-10-04  0:03 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, roopa, amir, pshelar, u9012063

Hi,

The following patchset adds a new field to the tunnel metadata template
to restrict the configuration to a given tunnel driver. Currently, a
misconfiguration may result in packets going to the wrong tunnel driver.

Although we have the tunnel option flags, they are not mandatory for
some tunnel drivers, eg. vxlan, which may use it or not; and gre which
does not use them.

This patch updates tc's tunnel action and netfilter's tunnel extension
to use this new field. OVS netlink interface has been left unset, although they
could be updated to use this.

By extending the existing tc action to support the IP_TUNNEL_INFO_BRIDGE
mode, I think it should be possible to expose IP_TUNNEL_TYPE_VLAN too,
although this patchset doesn't address this scenario.

The field is initialized to zero, which maps to IP_TUNNEL_TYPE_UNSPEC to
retain the existing behaviour, so the existing flexibility is still in
place while this new feature is added.

Cc'ing people that git annotate show as dealing with these bits more
recently.

Compile tested only.

Comments welcome, thanks.

Pablo Neira Ayuso (3):
  ip_tunnel: add type field to struct ip_tunnel_info
  net: act_tunnel_key: support for tunnel type
  netfilter: nft_tunnel: support for tunnel type

 drivers/net/geneve.c                      |  3 ++-
 drivers/net/vxlan.c                       | 13 +++++++------
 include/net/dst_metadata.h                |  1 +
 include/net/ip_tunnels.h                  | 16 ++++++++++++++++
 include/uapi/linux/netfilter/nf_tables.h  | 10 ++++++++++
 include/uapi/linux/tc_act/tc_tunnel_key.h | 10 ++++++++++
 net/ipv4/ip_gre.c                         |  2 ++
 net/ipv6/ip6_gre.c                        |  2 ++
 net/netfilter/nft_tunnel.c                |  9 ++++++++-
 net/openvswitch/flow_netlink.c            |  1 +
 net/sched/act_tunnel_key.c                |  9 +++++++++
 11 files changed, 68 insertions(+), 8 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFC,net-next 1/3] ip_tunnel: add type field to struct ip_tunnel_info
  2018-10-04  0:03 [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template Pablo Neira Ayuso
@ 2018-10-04  0:03 ` Pablo Neira Ayuso
  2018-10-04  9:25   ` Daniel Borkmann
  2018-10-04  0:03 ` [PATCH RFC,net-next 2/3] net: act_tunnel_key: support for tunnel type Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2018-10-04  0:03 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, roopa, amir, pshelar, u9012063

This new field allows you to restrict the metadata template for a given
tunnel driver. This is convenient in scenarios that combine different
tunneling drivers, to deal with possible misconfigurations given that
the template can be interpreted by any target tunnel driver. Default
value is IP_TUNNEL_TYPE_UNSPEC, to retain the existing behaviour. This
also implicitly exposes what drivers are currently supported in the
IP_TUNNEL_INFO_TX mode.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 drivers/net/geneve.c           |  3 ++-
 drivers/net/vxlan.c            | 13 +++++++------
 include/net/dst_metadata.h     |  1 +
 include/net/ip_tunnels.h       | 16 ++++++++++++++++
 net/ipv4/ip_gre.c              |  2 ++
 net/ipv6/ip6_gre.c             |  2 ++
 net/openvswitch/flow_netlink.c |  1 +
 7 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 6625fabe2c88..c383c394f0d2 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -920,7 +920,8 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if (geneve->collect_md) {
 		info = skb_tunnel_info(skb);
-		if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX))) {
+		if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX) ||
+			     !ip_tunnel_type(info, IP_TUNNEL_TYPE_GENEVE))) {
 			err = -EINVAL;
 			netdev_dbg(dev, "no tunnel metadata\n");
 			goto tx_error;
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index e5d236595206..8cca91b572bd 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2296,14 +2296,15 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_reset_mac_header(skb);
 
 	if (vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA) {
-		if (info && info->mode & IP_TUNNEL_INFO_BRIDGE &&
-		    info->mode & IP_TUNNEL_INFO_TX) {
+		if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX) ||
+			     !ip_tunnel_type(info, IP_TUNNEL_TYPE_VXLAN))) {
+			kfree_skb(skb);
+			return NETDEV_TX_OK;
+		}
+		if (info->mode & IP_TUNNEL_INFO_BRIDGE) {
 			vni = tunnel_id_to_key32(info->key.tun_id);
 		} else {
-			if (info && info->mode & IP_TUNNEL_INFO_TX)
-				vxlan_xmit_one(skb, dev, vni, NULL, false);
-			else
-				kfree_skb(skb);
+			vxlan_xmit_one(skb, dev, vni, NULL, false);
 			return NETDEV_TX_OK;
 		}
 	}
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 56cb3c38569a..674116f7fa4a 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -100,6 +100,7 @@ static inline struct metadata_dst *tun_rx_dst(int md_size)
 	if (!tun_dst)
 		return NULL;
 
+	tun_dst->u.tun_info.type = IP_TUNNEL_TYPE_UNSPEC;
 	tun_dst->u.tun_info.options_len = 0;
 	tun_dst->u.tun_info.mode = 0;
 	return tun_dst;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index b0d022ff6ea1..985d24b6a102 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -66,7 +66,16 @@ struct ip_tunnel_key {
 	GENMASK((FIELD_SIZEOF(struct ip_tunnel_info,		\
 			      options_len) * BITS_PER_BYTE) - 1, 0)
 
+enum ip_tunnel_type {
+	IP_TUNNEL_TYPE_UNSPEC	= 0,
+	IP_TUNNEL_TYPE_GRE,
+	IP_TUNNEL_TYPE_VXLAN,
+	IP_TUNNEL_TYPE_GENEVE,
+	IP_TUNNEL_TYPE_ERSPAN,
+};
+
 struct ip_tunnel_info {
+	enum ip_tunnel_type	type;
 	struct ip_tunnel_key	key;
 #ifdef CONFIG_DST_CACHE
 	struct dst_cache	dst_cache;
@@ -75,6 +84,13 @@ struct ip_tunnel_info {
 	u8			mode;
 };
 
+static inline bool ip_tunnel_type(const struct ip_tunnel_info *tun_info,
+				  enum ip_tunnel_type type)
+{
+	return tun_info->type == IP_TUNNEL_TYPE_UNSPEC ||
+	       tun_info->type == type;
+}
+
 /* 6rd prefix/relay information */
 #ifdef CONFIG_IPV6_SIT_6RD
 struct ip_tunnel_6rd_parm {
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index c3385a84f8ff..3ad12135c3d3 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -534,6 +534,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	tun_info = skb_tunnel_info(skb);
 	if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
+		     !ip_tunnel_type(tun_info, IP_TUNNEL_TYPE_GRE) ||
 		     ip_tunnel_info_af(tun_info) != AF_INET))
 		goto err_free_skb;
 
@@ -585,6 +586,7 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	tun_info = skb_tunnel_info(skb);
 	if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
+		     !ip_tunnel_type(tun_info, IP_TUNNEL_TYPE_ERSPAN) ||
 		     ip_tunnel_info_af(tun_info) != AF_INET))
 		goto err_free_skb;
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 515adbdba1d2..675f373809ee 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -732,6 +732,7 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
 		tun_info = skb_tunnel_info(skb);
 		if (unlikely(!tun_info ||
 			     !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
+			     !ip_tunnel_type(tun_info, IP_TUNNEL_TYPE_GRE) ||
 			     ip_tunnel_info_af(tun_info) != AF_INET6))
 			return -EINVAL;
 
@@ -960,6 +961,7 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 		tun_info = skb_tunnel_info(skb);
 		if (unlikely(!tun_info ||
 			     !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
+			     !ip_tunnel_type(tun_info, IP_TUNNEL_TYPE_ERSPAN) ||
 			     ip_tunnel_info_af(tun_info) != AF_INET6))
 			return -EINVAL;
 
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index a70097ecf33c..1ee2509534df 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2602,6 +2602,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 	ovs_tun->tun_dst = tun_dst;
 
 	tun_info = &tun_dst->u.tun_info;
+	tun_info->type = IP_TUNNEL_TYPE_UNSPEC;
 	tun_info->mode = IP_TUNNEL_INFO_TX;
 	if (key.tun_proto == AF_INET6)
 		tun_info->mode |= IP_TUNNEL_INFO_IPV6;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC,net-next 2/3] net: act_tunnel_key: support for tunnel type
  2018-10-04  0:03 [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template Pablo Neira Ayuso
  2018-10-04  0:03 ` [PATCH RFC,net-next 1/3] ip_tunnel: add type field to struct ip_tunnel_info Pablo Neira Ayuso
@ 2018-10-04  0:03 ` Pablo Neira Ayuso
  2018-10-04  0:03 ` [PATCH RFC,net-next 3/3] netfilter: nft_tunnel: " Pablo Neira Ayuso
  2018-10-04 19:13 ` [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template Jakub Kicinski
  3 siblings, 0 replies; 13+ messages in thread
From: Pablo Neira Ayuso @ 2018-10-04  0:03 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, roopa, amir, pshelar, u9012063

This patch allows you to set an explicit tunnel driver type in the
metadata template. In case of misconfiguration, ie. if the packets ends
up in the wrong tunnel device, the packet is dropped.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/tc_act/tc_tunnel_key.h | 10 ++++++++++
 net/sched/act_tunnel_key.c                |  9 +++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/uapi/linux/tc_act/tc_tunnel_key.h b/include/uapi/linux/tc_act/tc_tunnel_key.h
index be384d63e1b5..f6471549a085 100644
--- a/include/uapi/linux/tc_act/tc_tunnel_key.h
+++ b/include/uapi/linux/tc_act/tc_tunnel_key.h
@@ -41,9 +41,19 @@ enum {
 					 */
 	TCA_TUNNEL_KEY_ENC_TOS,		/* u8 */
 	TCA_TUNNEL_KEY_ENC_TTL,		/* u8 */
+	TCA_TUNNEL_KEY_ENC_TYPE,	/* u32 */
 	__TCA_TUNNEL_KEY_MAX,
 };
 
+/* 1:1 mapping with the internal enum ip_tunnel_type. */
+enum tc_tunnel_type {
+	TC_TUNNEL_TYPE_UNSPEC   = 0,
+	TC_TUNNEL_TYPE_GRE,
+	TC_TUNNEL_TYPE_VXLAN,
+	TC_TUNNEL_TYPE_GENEVE,
+	TC_TUNNEL_TYPE_ERSPAN,
+};
+
 #define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1)
 
 enum {
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 4cca8f274662..e3a1306c7ea8 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -195,6 +195,7 @@ static const struct nla_policy tunnel_key_policy[TCA_TUNNEL_KEY_MAX + 1] = {
 	[TCA_TUNNEL_KEY_ENC_OPTS]     = { .type = NLA_NESTED },
 	[TCA_TUNNEL_KEY_ENC_TOS]      = { .type = NLA_U8 },
 	[TCA_TUNNEL_KEY_ENC_TTL]      = { .type = NLA_U8 },
+	[TCA_TUNNEL_KEY_ENC_TYPE]     = { .type = NLA_U32 },
 };
 
 static int tunnel_key_init(struct net *net, struct nlattr *nla,
@@ -215,6 +216,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 	__be16 flags;
 	u8 tos, ttl;
 	int ret = 0;
+	u32 type;
 	int err;
 
 	if (!nla) {
@@ -278,6 +280,10 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 		if (tb[TCA_TUNNEL_KEY_ENC_TTL])
 			ttl = nla_get_u8(tb[TCA_TUNNEL_KEY_ENC_TTL]);
 
+		type = TC_TUNNEL_TYPE_UNSPEC;
+		if (tb[TCA_TUNNEL_KEY_ENC_TYPE])
+			type = nla_get_u32(tb[TCA_TUNNEL_KEY_ENC_TYPE]);
+
 		if (tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC] &&
 		    tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]) {
 			__be32 saddr;
@@ -320,6 +326,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 				goto release_tun_meta;
 		}
 
+		metadata->u.tun_info.type = type;
 		metadata->u.tun_info.mode |= IP_TUNNEL_INFO_TX;
 		break;
 	default:
@@ -522,6 +529,8 @@ static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a,
 
 		if (key->ttl && nla_put_u8(skb, TCA_TUNNEL_KEY_ENC_TTL, key->ttl))
 			goto nla_put_failure;
+		if (nla_put_u32(skb, TCA_TUNNEL_KEY_ENC_TYPE, info->type))
+			goto nla_put_failure;
 	}
 
 	tcf_tm_dump(&tm, &t->tcf_tm);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC,net-next 3/3] netfilter: nft_tunnel: support for tunnel type
  2018-10-04  0:03 [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template Pablo Neira Ayuso
  2018-10-04  0:03 ` [PATCH RFC,net-next 1/3] ip_tunnel: add type field to struct ip_tunnel_info Pablo Neira Ayuso
  2018-10-04  0:03 ` [PATCH RFC,net-next 2/3] net: act_tunnel_key: support for tunnel type Pablo Neira Ayuso
@ 2018-10-04  0:03 ` Pablo Neira Ayuso
  2018-10-04 19:13 ` [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template Jakub Kicinski
  3 siblings, 0 replies; 13+ messages in thread
From: Pablo Neira Ayuso @ 2018-10-04  0:03 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, roopa, amir, pshelar, u9012063

This patch allows you to set an explicit tunnel driver type in the
metadata template. In case of misconfiguration, ie. if the packets ends
up in the wrong tunnel device, the packet is dropped.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_tables.h | 10 ++++++++++
 net/netfilter/nft_tunnel.c               |  9 ++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 5444e76870bb..b36acb52eb50 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -1692,6 +1692,15 @@ enum nft_tunnel_flags {
 				 NFT_TUNNEL_F_DONT_FRAGMENT | \
 				 NFT_TUNNEL_F_SEQ_NUMBER)
 
+/* 1:1 mapping with the internal enum ip_tunnel_type. */
+enum nft_tunnel_type {
+	NFT_TUNNEL_TYPE_UNSPEC   = 0,
+	NFT_TUNNEL_TYPE_GRE,
+	NFT_TUNNEL_TYPE_VXLAN,
+	NFT_TUNNEL_TYPE_GENEVE,
+	NFT_TUNNEL_TYPE_ERSPAN,
+};
+
 enum nft_tunnel_key_attributes {
 	NFTA_TUNNEL_KEY_UNSPEC,
 	NFTA_TUNNEL_KEY_ID,
@@ -1703,6 +1712,7 @@ enum nft_tunnel_key_attributes {
 	NFTA_TUNNEL_KEY_SPORT,
 	NFTA_TUNNEL_KEY_DPORT,
 	NFTA_TUNNEL_KEY_OPTS,
+	NFTA_TUNNEL_KEY_TYPE,
 	__NFTA_TUNNEL_KEY_MAX
 };
 #define NFTA_TUNNEL_KEY_MAX	(__NFTA_TUNNEL_KEY_MAX - 1)
diff --git a/net/netfilter/nft_tunnel.c b/net/netfilter/nft_tunnel.c
index 3a15f219e4e7..3cc54bc4ce31 100644
--- a/net/netfilter/nft_tunnel.c
+++ b/net/netfilter/nft_tunnel.c
@@ -305,6 +305,7 @@ static const struct nla_policy nft_tunnel_key_policy[NFTA_TUNNEL_KEY_MAX + 1] =
 	[NFTA_TUNNEL_KEY_TOS]	= { .type = NLA_U8, },
 	[NFTA_TUNNEL_KEY_TTL]	= { .type = NLA_U8, },
 	[NFTA_TUNNEL_KEY_OPTS]	= { .type = NLA_NESTED, },
+	[NFTA_TUNNEL_KEY_TYPE]	= { .type = NLA_U32, },
 };
 
 static int nft_tunnel_obj_init(const struct nft_ctx *ctx,
@@ -312,6 +313,7 @@ static int nft_tunnel_obj_init(const struct nft_ctx *ctx,
 			       struct nft_object *obj)
 {
 	struct nft_tunnel_obj *priv = nft_obj_data(obj);
+	u32 type = NFT_TUNNEL_TYPE_UNSPEC;
 	struct ip_tunnel_info info;
 	struct metadata_dst *md;
 	int err;
@@ -319,7 +321,11 @@ static int nft_tunnel_obj_init(const struct nft_ctx *ctx,
 	if (!tb[NFTA_TUNNEL_KEY_ID])
 		return -EINVAL;
 
+	if (tb[NFTA_TUNNEL_KEY_TYPE])
+		type = ntohl(nla_get_be32(tb[NFTA_TUNNEL_KEY_TYPE]));
+
 	memset(&info, 0, sizeof(info));
+	info.type		= type;
 	info.mode		= IP_TUNNEL_INFO_TX;
 	info.key.tun_id		= key32_to_tunnel_id(nla_get_be32(tb[NFTA_TUNNEL_KEY_ID]));
 	info.key.tun_flags	= TUNNEL_KEY | TUNNEL_CSUM | TUNNEL_NOCACHE;
@@ -494,7 +500,8 @@ static int nft_tunnel_obj_dump(struct sk_buff *skb,
 	struct nft_tunnel_obj *priv = nft_obj_data(obj);
 	struct ip_tunnel_info *info = &priv->md->u.tun_info;
 
-	if (nla_put_be32(skb, NFTA_TUNNEL_KEY_ID,
+	if (nla_put_be32(skb, NFTA_TUNNEL_KEY_TYPE, htonl(info->type)) ||
+	    nla_put_be32(skb, NFTA_TUNNEL_KEY_ID,
 			 tunnel_id_to_key32(info->key.tun_id)) ||
 	    nft_tunnel_ip_dump(skb, info) < 0 ||
 	    nft_tunnel_ports_dump(skb, info) < 0 ||
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC,net-next 1/3] ip_tunnel: add type field to struct ip_tunnel_info
  2018-10-04  0:03 ` [PATCH RFC,net-next 1/3] ip_tunnel: add type field to struct ip_tunnel_info Pablo Neira Ayuso
@ 2018-10-04  9:25   ` Daniel Borkmann
  2018-10-04 10:56     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Borkmann @ 2018-10-04  9:25 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netdev
  Cc: netfilter-devel, roopa, amir, pshelar, u9012063,
	alexei.starovoitov

On 10/04/2018 02:03 AM, Pablo Neira Ayuso wrote:
> This new field allows you to restrict the metadata template for a given
> tunnel driver. This is convenient in scenarios that combine different
> tunneling drivers, to deal with possible misconfigurations given that
> the template can be interpreted by any target tunnel driver. Default
> value is IP_TUNNEL_TYPE_UNSPEC, to retain the existing behaviour. This
> also implicitly exposes what drivers are currently supported in the
> IP_TUNNEL_INFO_TX mode.
> 
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
>  drivers/net/geneve.c           |  3 ++-
>  drivers/net/vxlan.c            | 13 +++++++------
>  include/net/dst_metadata.h     |  1 +
>  include/net/ip_tunnels.h       | 16 ++++++++++++++++
>  net/ipv4/ip_gre.c              |  2 ++
>  net/ipv6/ip6_gre.c             |  2 ++
>  net/openvswitch/flow_netlink.c |  1 +
>  7 files changed, 31 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index 6625fabe2c88..c383c394f0d2 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> @@ -920,7 +920,8 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  	if (geneve->collect_md) {
>  		info = skb_tunnel_info(skb);
> -		if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX))) {
> +		if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX) ||
> +			     !ip_tunnel_type(info, IP_TUNNEL_TYPE_GENEVE))) {
>  			err = -EINVAL;
>  			netdev_dbg(dev, "no tunnel metadata\n");
>  			goto tx_error;
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index e5d236595206..8cca91b572bd 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -2296,14 +2296,15 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
>  	skb_reset_mac_header(skb);
>  
>  	if (vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA) {
> -		if (info && info->mode & IP_TUNNEL_INFO_BRIDGE &&
> -		    info->mode & IP_TUNNEL_INFO_TX) {
> +		if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX) ||
> +			     !ip_tunnel_type(info, IP_TUNNEL_TYPE_VXLAN))) {
> +			kfree_skb(skb);
> +			return NETDEV_TX_OK;
> +		}
> +		if (info->mode & IP_TUNNEL_INFO_BRIDGE) {
>  			vni = tunnel_id_to_key32(info->key.tun_id);
>  		} else {
> -			if (info && info->mode & IP_TUNNEL_INFO_TX)
> -				vxlan_xmit_one(skb, dev, vni, NULL, false);
> -			else
> -				kfree_skb(skb);
> +			vxlan_xmit_one(skb, dev, vni, NULL, false);
>  			return NETDEV_TX_OK;
>  		}
>  	}
> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> index 56cb3c38569a..674116f7fa4a 100644
> --- a/include/net/dst_metadata.h
> +++ b/include/net/dst_metadata.h
> @@ -100,6 +100,7 @@ static inline struct metadata_dst *tun_rx_dst(int md_size)
>  	if (!tun_dst)
>  		return NULL;
>  
> +	tun_dst->u.tun_info.type = IP_TUNNEL_TYPE_UNSPEC;
>  	tun_dst->u.tun_info.options_len = 0;
>  	tun_dst->u.tun_info.mode = 0;
>  	return tun_dst;
> diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
> index b0d022ff6ea1..985d24b6a102 100644
> --- a/include/net/ip_tunnels.h
> +++ b/include/net/ip_tunnels.h
> @@ -66,7 +66,16 @@ struct ip_tunnel_key {
>  	GENMASK((FIELD_SIZEOF(struct ip_tunnel_info,		\
>  			      options_len) * BITS_PER_BYTE) - 1, 0)
>  
> +enum ip_tunnel_type {
> +	IP_TUNNEL_TYPE_UNSPEC	= 0,
> +	IP_TUNNEL_TYPE_GRE,
> +	IP_TUNNEL_TYPE_VXLAN,
> +	IP_TUNNEL_TYPE_GENEVE,
> +	IP_TUNNEL_TYPE_ERSPAN,
> +};
> +
>  struct ip_tunnel_info {
> +	enum ip_tunnel_type	type;
>  	struct ip_tunnel_key	key;
>  #ifdef CONFIG_DST_CACHE
>  	struct dst_cache	dst_cache;
> @@ -75,6 +84,13 @@ struct ip_tunnel_info {
>  	u8			mode;
>  };
>  
> +static inline bool ip_tunnel_type(const struct ip_tunnel_info *tun_info,
> +				  enum ip_tunnel_type type)
> +{
> +	return tun_info->type == IP_TUNNEL_TYPE_UNSPEC ||
> +	       tun_info->type == type;
> +}
> +
>  /* 6rd prefix/relay information */
>  #ifdef CONFIG_IPV6_SIT_6RD
>  struct ip_tunnel_6rd_parm {
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index c3385a84f8ff..3ad12135c3d3 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -534,6 +534,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
>  
>  	tun_info = skb_tunnel_info(skb);
>  	if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
> +		     !ip_tunnel_type(tun_info, IP_TUNNEL_TYPE_GRE) ||
>  		     ip_tunnel_info_af(tun_info) != AF_INET))
>  		goto err_free_skb;
>  
> @@ -585,6 +586,7 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev,
>  
>  	tun_info = skb_tunnel_info(skb);
>  	if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
> +		     !ip_tunnel_type(tun_info, IP_TUNNEL_TYPE_ERSPAN) ||
>  		     ip_tunnel_info_af(tun_info) != AF_INET))
>  		goto err_free_skb;
>  
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index 515adbdba1d2..675f373809ee 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -732,6 +732,7 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
>  		tun_info = skb_tunnel_info(skb);
>  		if (unlikely(!tun_info ||
>  			     !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
> +			     !ip_tunnel_type(tun_info, IP_TUNNEL_TYPE_GRE) ||
>  			     ip_tunnel_info_af(tun_info) != AF_INET6))
>  			return -EINVAL;
>  
> @@ -960,6 +961,7 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
>  		tun_info = skb_tunnel_info(skb);
>  		if (unlikely(!tun_info ||
>  			     !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
> +			     !ip_tunnel_type(tun_info, IP_TUNNEL_TYPE_ERSPAN) ||
>  			     ip_tunnel_info_af(tun_info) != AF_INET6))
>  			return -EINVAL;
>  
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index a70097ecf33c..1ee2509534df 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -2602,6 +2602,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
>  	ovs_tun->tun_dst = tun_dst;
>  
>  	tun_info = &tun_dst->u.tun_info;
> +	tun_info->type = IP_TUNNEL_TYPE_UNSPEC;
>  	tun_info->mode = IP_TUNNEL_INFO_TX;
>  	if (key.tun_proto == AF_INET6)
>  		tun_info->mode |= IP_TUNNEL_INFO_IPV6;
> 

If so then this should also be made explicit IP_TUNNEL_TYPE_UNSPEC in BPF code
since all these tunnel types are supported there as well.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC,net-next 1/3] ip_tunnel: add type field to struct ip_tunnel_info
  2018-10-04  9:25   ` Daniel Borkmann
@ 2018-10-04 10:56     ` Pablo Neira Ayuso
  2018-10-04 12:00       ` Daniel Borkmann
  0 siblings, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2018-10-04 10:56 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: netdev, netfilter-devel, roopa, amir, pshelar, u9012063,
	alexei.starovoitov

On Thu, Oct 04, 2018 at 11:25:33AM +0200, Daniel Borkmann wrote:
> On 10/04/2018 02:03 AM, Pablo Neira Ayuso wrote:
> > This new field allows you to restrict the metadata template for a given
> > tunnel driver. This is convenient in scenarios that combine different
> > tunneling drivers, to deal with possible misconfigurations given that
> > the template can be interpreted by any target tunnel driver. Default
> > value is IP_TUNNEL_TYPE_UNSPEC, to retain the existing behaviour. This
> > also implicitly exposes what drivers are currently supported in the
> > IP_TUNNEL_INFO_TX mode.
> > 
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> >  drivers/net/geneve.c           |  3 ++-
> >  drivers/net/vxlan.c            | 13 +++++++------
> >  include/net/dst_metadata.h     |  1 +
> >  include/net/ip_tunnels.h       | 16 ++++++++++++++++
> >  net/ipv4/ip_gre.c              |  2 ++
> >  net/ipv6/ip6_gre.c             |  2 ++
> >  net/openvswitch/flow_netlink.c |  1 +
> >  7 files changed, 31 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> > index 6625fabe2c88..c383c394f0d2 100644
> > --- a/drivers/net/geneve.c
> > +++ b/drivers/net/geneve.c
> > @@ -920,7 +920,8 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
> >  
> >  	if (geneve->collect_md) {
> >  		info = skb_tunnel_info(skb);
> > -		if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX))) {
> > +		if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX) ||
> > +			     !ip_tunnel_type(info, IP_TUNNEL_TYPE_GENEVE))) {
> >  			err = -EINVAL;
> >  			netdev_dbg(dev, "no tunnel metadata\n");
> >  			goto tx_error;
> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> > index e5d236595206..8cca91b572bd 100644
> > --- a/drivers/net/vxlan.c
> > +++ b/drivers/net/vxlan.c
> > @@ -2296,14 +2296,15 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	skb_reset_mac_header(skb);
> >  
> >  	if (vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA) {
> > -		if (info && info->mode & IP_TUNNEL_INFO_BRIDGE &&
> > -		    info->mode & IP_TUNNEL_INFO_TX) {
> > +		if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX) ||
> > +			     !ip_tunnel_type(info, IP_TUNNEL_TYPE_VXLAN))) {
> > +			kfree_skb(skb);
> > +			return NETDEV_TX_OK;
> > +		}
> > +		if (info->mode & IP_TUNNEL_INFO_BRIDGE) {
> >  			vni = tunnel_id_to_key32(info->key.tun_id);
> >  		} else {
> > -			if (info && info->mode & IP_TUNNEL_INFO_TX)
> > -				vxlan_xmit_one(skb, dev, vni, NULL, false);
> > -			else
> > -				kfree_skb(skb);
> > +			vxlan_xmit_one(skb, dev, vni, NULL, false);
> >  			return NETDEV_TX_OK;
> >  		}
> >  	}
> > diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> > index 56cb3c38569a..674116f7fa4a 100644
> > --- a/include/net/dst_metadata.h
> > +++ b/include/net/dst_metadata.h
> > @@ -100,6 +100,7 @@ static inline struct metadata_dst *tun_rx_dst(int md_size)
> >  	if (!tun_dst)
> >  		return NULL;
> >  
> > +	tun_dst->u.tun_info.type = IP_TUNNEL_TYPE_UNSPEC;
> >  	tun_dst->u.tun_info.options_len = 0;
> >  	tun_dst->u.tun_info.mode = 0;
> >  	return tun_dst;
> > diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
> > index b0d022ff6ea1..985d24b6a102 100644
> > --- a/include/net/ip_tunnels.h
> > +++ b/include/net/ip_tunnels.h
> > @@ -66,7 +66,16 @@ struct ip_tunnel_key {
> >  	GENMASK((FIELD_SIZEOF(struct ip_tunnel_info,		\
> >  			      options_len) * BITS_PER_BYTE) - 1, 0)
> >  
> > +enum ip_tunnel_type {
> > +	IP_TUNNEL_TYPE_UNSPEC	= 0,
> > +	IP_TUNNEL_TYPE_GRE,
> > +	IP_TUNNEL_TYPE_VXLAN,
> > +	IP_TUNNEL_TYPE_GENEVE,
> > +	IP_TUNNEL_TYPE_ERSPAN,
> > +};
> > +
> >  struct ip_tunnel_info {
> > +	enum ip_tunnel_type	type;
> >  	struct ip_tunnel_key	key;
> >  #ifdef CONFIG_DST_CACHE
> >  	struct dst_cache	dst_cache;
> > @@ -75,6 +84,13 @@ struct ip_tunnel_info {
> >  	u8			mode;
> >  };
> >  
> > +static inline bool ip_tunnel_type(const struct ip_tunnel_info *tun_info,
> > +				  enum ip_tunnel_type type)
> > +{
> > +	return tun_info->type == IP_TUNNEL_TYPE_UNSPEC ||
> > +	       tun_info->type == type;
> > +}
> > +
> >  /* 6rd prefix/relay information */
> >  #ifdef CONFIG_IPV6_SIT_6RD
> >  struct ip_tunnel_6rd_parm {
> > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > index c3385a84f8ff..3ad12135c3d3 100644
> > --- a/net/ipv4/ip_gre.c
> > +++ b/net/ipv4/ip_gre.c
> > @@ -534,6 +534,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
> >  
> >  	tun_info = skb_tunnel_info(skb);
> >  	if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
> > +		     !ip_tunnel_type(tun_info, IP_TUNNEL_TYPE_GRE) ||
> >  		     ip_tunnel_info_af(tun_info) != AF_INET))
> >  		goto err_free_skb;
> >  
> > @@ -585,6 +586,7 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev,
> >  
> >  	tun_info = skb_tunnel_info(skb);
> >  	if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
> > +		     !ip_tunnel_type(tun_info, IP_TUNNEL_TYPE_ERSPAN) ||
> >  		     ip_tunnel_info_af(tun_info) != AF_INET))
> >  		goto err_free_skb;
> >  
> > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> > index 515adbdba1d2..675f373809ee 100644
> > --- a/net/ipv6/ip6_gre.c
> > +++ b/net/ipv6/ip6_gre.c
> > @@ -732,6 +732,7 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
> >  		tun_info = skb_tunnel_info(skb);
> >  		if (unlikely(!tun_info ||
> >  			     !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
> > +			     !ip_tunnel_type(tun_info, IP_TUNNEL_TYPE_GRE) ||
> >  			     ip_tunnel_info_af(tun_info) != AF_INET6))
> >  			return -EINVAL;
> >  
> > @@ -960,6 +961,7 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
> >  		tun_info = skb_tunnel_info(skb);
> >  		if (unlikely(!tun_info ||
> >  			     !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
> > +			     !ip_tunnel_type(tun_info, IP_TUNNEL_TYPE_ERSPAN) ||
> >  			     ip_tunnel_info_af(tun_info) != AF_INET6))
> >  			return -EINVAL;
> >  
> > diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> > index a70097ecf33c..1ee2509534df 100644
> > --- a/net/openvswitch/flow_netlink.c
> > +++ b/net/openvswitch/flow_netlink.c
> > @@ -2602,6 +2602,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
> >  	ovs_tun->tun_dst = tun_dst;
> >  
> >  	tun_info = &tun_dst->u.tun_info;
> > +	tun_info->type = IP_TUNNEL_TYPE_UNSPEC;
> >  	tun_info->mode = IP_TUNNEL_INFO_TX;
> >  	if (key.tun_proto == AF_INET6)
> >  		tun_info->mode |= IP_TUNNEL_INFO_IPV6;
> > 
> 
> If so then this should also be made explicit IP_TUNNEL_TYPE_UNSPEC in BPF code
> since all these tunnel types are supported there as well.

Are you refering to proper initialization? I can see a memset() there
for the ip_tunnel_info structure, which is implicitly setting
tun_info->type to zero, ie. IP_TUNNEL_TYPE_UNSPEC.

I can also make it explicit there if you prefer.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC,net-next 1/3] ip_tunnel: add type field to struct ip_tunnel_info
  2018-10-04 10:56     ` Pablo Neira Ayuso
@ 2018-10-04 12:00       ` Daniel Borkmann
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Borkmann @ 2018-10-04 12:00 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netdev, netfilter-devel, roopa, amir, pshelar, u9012063,
	alexei.starovoitov

On 10/04/2018 12:56 PM, Pablo Neira Ayuso wrote:
> On Thu, Oct 04, 2018 at 11:25:33AM +0200, Daniel Borkmann wrote:
>> On 10/04/2018 02:03 AM, Pablo Neira Ayuso wrote:
[...]
>>> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
>>> index a70097ecf33c..1ee2509534df 100644
>>> --- a/net/openvswitch/flow_netlink.c
>>> +++ b/net/openvswitch/flow_netlink.c
>>> @@ -2602,6 +2602,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
>>>  	ovs_tun->tun_dst = tun_dst;
>>>  
>>>  	tun_info = &tun_dst->u.tun_info;
>>> +	tun_info->type = IP_TUNNEL_TYPE_UNSPEC;
>>>  	tun_info->mode = IP_TUNNEL_INFO_TX;
>>>  	if (key.tun_proto == AF_INET6)
>>>  		tun_info->mode |= IP_TUNNEL_INFO_IPV6;
>>>
>>
>> If so then this should also be made explicit IP_TUNNEL_TYPE_UNSPEC in BPF code
>> since all these tunnel types are supported there as well.
> 
> Are you refering to proper initialization? I can see a memset() there
> for the ip_tunnel_info structure, which is implicitly setting
> tun_info->type to zero, ie. IP_TUNNEL_TYPE_UNSPEC.
> 
> I can also make it explicit there if you prefer.

Yeah that would be my preference as otherwise we might miss future changes there.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template
  2018-10-04  0:03 [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template Pablo Neira Ayuso
                   ` (2 preceding siblings ...)
  2018-10-04  0:03 ` [PATCH RFC,net-next 3/3] netfilter: nft_tunnel: " Pablo Neira Ayuso
@ 2018-10-04 19:13 ` Jakub Kicinski
  2018-10-04 21:58   ` Pablo Neira Ayuso
  3 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2018-10-04 19:13 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, netfilter-devel, roopa, amir, pshelar, u9012063

On Thu,  4 Oct 2018 02:03:42 +0200, Pablo Neira Ayuso wrote:
> Hi,
> 
> The following patchset adds a new field to the tunnel metadata template
> to restrict the configuration to a given tunnel driver. Currently, a
> misconfiguration may result in packets going to the wrong tunnel driver.
> 
> Although we have the tunnel option flags, they are not mandatory for
> some tunnel drivers, eg. vxlan, which may use it or not; and gre which
> does not use them.

Option flags are necessary because interpretation of option blob is
entirely protocol-specific.

> This patch updates tc's tunnel action and netfilter's tunnel extension
> to use this new field. OVS netlink interface has been left unset, although they
> could be updated to use this.
> 
> By extending the existing tc action to support the IP_TUNNEL_INFO_BRIDGE
> mode, I think it should be possible to expose IP_TUNNEL_TYPE_VLAN too,
> although this patchset doesn't address this scenario.
> 
> The field is initialized to zero, which maps to IP_TUNNEL_TYPE_UNSPEC to
> retain the existing behaviour, so the existing flexibility is still in
> place while this new feature is added.
> 
> Cc'ing people that git annotate show as dealing with these bits more
> recently.

What practical scenario are you trying to address here?  Have you seen
https://www.mail-archive.com/netdev@vger.kernel.org/msg250705.html
?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template
  2018-10-04 19:13 ` [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template Jakub Kicinski
@ 2018-10-04 21:58   ` Pablo Neira Ayuso
  2018-10-14  6:42     ` Or Gerlitz
  0 siblings, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2018-10-04 21:58 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, netfilter-devel, roopa, amir, pshelar, u9012063

Hi Jakub,

On Thu, Oct 04, 2018 at 12:13:57PM -0700, Jakub Kicinski wrote:
> On Thu,  4 Oct 2018 02:03:42 +0200, Pablo Neira Ayuso wrote:
> > Hi,
> > 
> > The following patchset adds a new field to the tunnel metadata template
> > to restrict the configuration to a given tunnel driver. Currently, a
> > misconfiguration may result in packets going to the wrong tunnel driver.
> > 
> > Although we have the tunnel option flags, they are not mandatory for
> > some tunnel drivers, eg. vxlan, which may use it or not; and gre which
> > does not use them.
> 
> Option flags are necessary because interpretation of option blob is
> entirely protocol-specific.
> 
> > This patch updates tc's tunnel action and netfilter's tunnel extension
> > to use this new field. OVS netlink interface has been left unset, although they
> > could be updated to use this.
> > 
> > By extending the existing tc action to support the IP_TUNNEL_INFO_BRIDGE
> > mode, I think it should be possible to expose IP_TUNNEL_TYPE_VLAN too,
> > although this patchset doesn't address this scenario.
> > 
> > The field is initialized to zero, which maps to IP_TUNNEL_TYPE_UNSPEC to
> > retain the existing behaviour, so the existing flexibility is still in
> > place while this new feature is added.
> > 
> > Cc'ing people that git annotate show as dealing with these bits more
> > recently.
> 
> What practical scenario are you trying to address here?

Incorrect configuration. The tunnel template defines an ID field, this
ID means vni in vxlan, but it also means session in erspan. If a
packet that should go to vxlan tunnel device (to be encapsulated using
vni 5) ends up in a gre/erspan device, you will get an erspan packet
with session 5. With this new tunnel type field, you can restrict
the tunnel template to work _only_ for a given tunnel device driver.
Hence, if the packet ends up in the wrong tunnel device driver due to
incorrect configuration, packet gets dropped.

> Have you seen
> https://www.mail-archive.com/netdev@vger.kernel.org/msg250705.html
> ?

Hm, I remember to have seen some hw offload driver code that is making
assumptions on the destination ports to pick the tunnel protocol,
based on what the act_key_tunnel is passing.

This patchset may probably help there too since act_key_tunnel will
convey the tunnel type, given this can now be made explicit. The tc
action parsing from the driver can annotate the tunnel type that has
been set in this rule via act_tunnel_key, then validate that follow up
mirred action points to a tunnel device of the same type.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template
  2018-10-04 21:58   ` Pablo Neira Ayuso
@ 2018-10-14  6:42     ` Or Gerlitz
  2018-10-14  9:24       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 13+ messages in thread
From: Or Gerlitz @ 2018-10-14  6:42 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Jakub Kicinski, Linux Netdev List, netfilter-devel, Roopa Prabhu,
	Amir Vadai, pravin shelar, u9012063, Jiri Pirko, Oz Shlomo

On Fri, Oct 5, 2018 at 12:58 AM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> Hi Jakub,
>
> On Thu, Oct 04, 2018 at 12:13:57PM -0700, Jakub Kicinski wrote:
> > On Thu,  4 Oct 2018 02:03:42 +0200, Pablo Neira Ayuso wrote:
> > > Hi,
> > >
> > > The following patchset adds a new field to the tunnel metadata template
> > > to restrict the configuration to a given tunnel driver. Currently, a
> > > misconfiguration may result in packets going to the wrong tunnel driver.
> > >
> > > Although we have the tunnel option flags, they are not mandatory for
> > > some tunnel drivers, eg. vxlan, which may use it or not; and gre which
> > > does not use them.
> >
> > Option flags are necessary because interpretation of option blob is
> > entirely protocol-specific.
> >
> > > This patch updates tc's tunnel action and netfilter's tunnel extension
> > > to use this new field. OVS netlink interface has been left unset, although they
> > > could be updated to use this.
> > >
> > > By extending the existing tc action to support the IP_TUNNEL_INFO_BRIDGE
> > > mode, I think it should be possible to expose IP_TUNNEL_TYPE_VLAN too,
> > > although this patchset doesn't address this scenario.

not following... can you elaborate further please?

> > > The field is initialized to zero, which maps to IP_TUNNEL_TYPE_UNSPEC to
> > > retain the existing behaviour, so the existing flexibility is still in
> > > place while this new feature is added.
> > >
> > > Cc'ing people that git annotate show as dealing with these bits more
> > > recently.
> >
> > What practical scenario are you trying to address here?
>
> Incorrect configuration. The tunnel template defines an ID field, this
> ID means vni in vxlan, but it also means session in erspan. If a
> packet that should go to vxlan tunnel device (to be encapsulated using
> vni 5) ends up in a gre/erspan device, you will get an erspan packet
> with session 5. With this new tunnel type field, you can restrict
> the tunnel template to work _only_ for a given tunnel device driver.
> Hence, if the packet ends up in the wrong tunnel device driver due to
> incorrect configuration, packet gets dropped.
>
>> Have you seen
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg250705.html ?

> Hm, I remember to have seen some hw offload driver code that is making
> assumptions on the destination ports to pick the tunnel protocol,
> based on what the act_key_tunnel is passing.
[..]

for udp based tunneling this is valid practice, b/c a driver can get the tunnel
type <--> udp dest port relation from the stack through the udp tunnel ndo.

HW offloading wise, I think it would be better first pursue Januk and Co
proposal which deals with the problematic part of tc/tunnels -- ingress rules
set on tunnel devices (decap rules). The RFC looks very promising and
I understand
this is going to be reposed as non-rfc early in the next cycle

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template
  2018-10-14  6:42     ` Or Gerlitz
@ 2018-10-14  9:24       ` Pablo Neira Ayuso
  2018-10-14 11:24         ` Or Gerlitz
  0 siblings, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2018-10-14  9:24 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jakub Kicinski, Linux Netdev List, netfilter-devel, Roopa Prabhu,
	Amir Vadai, pravin shelar, u9012063, Jiri Pirko, Oz Shlomo

Hi Or,

On Sun, Oct 14, 2018 at 09:42:42AM +0300, Or Gerlitz wrote:
> On Fri, Oct 5, 2018 at 12:58 AM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
[...]
> > On Thu, Oct 04, 2018 at 12:13:57PM -0700, Jakub Kicinski wrote:
> > > On Thu,  4 Oct 2018 02:03:42 +0200, Pablo Neira Ayuso wrote:
> > > > Hi,
> > > >
> > > > The following patchset adds a new field to the tunnel metadata template
> > > > to restrict the configuration to a given tunnel driver. Currently, a
> > > > misconfiguration may result in packets going to the wrong tunnel driver.
> > > >
> > > > Although we have the tunnel option flags, they are not mandatory for
> > > > some tunnel drivers, eg. vxlan, which may use it or not; and gre which
> > > > does not use them.
> > >
> > > Option flags are necessary because interpretation of option blob is
> > > entirely protocol-specific.
> > >
> > > > This patch updates tc's tunnel action and netfilter's tunnel extension
> > > > to use this new field. OVS netlink interface has been left unset, although they
> > > > could be updated to use this.
> > > >
> > > > By extending the existing tc action to support the IP_TUNNEL_INFO_BRIDGE
> > > > mode, I think it should be possible to expose IP_TUNNEL_TYPE_VLAN too,
> > > > although this patchset doesn't address this scenario.
> 
> not following... can you elaborate further please?

It should be possible to extend act_key_tunnel to support the
IP_TUNNEL_INFO_BRIDGE flag, but this is just a suggestion.

> > > > The field is initialized to zero, which maps to IP_TUNNEL_TYPE_UNSPEC to
> > > > retain the existing behaviour, so the existing flexibility is still in
> > > > place while this new feature is added.
> > > >
> > > > Cc'ing people that git annotate show as dealing with these bits more
> > > > recently.
> > >
> > > What practical scenario are you trying to address here?
> >
> > Incorrect configuration. The tunnel template defines an ID field, this
> > ID means vni in vxlan, but it also means session in erspan. If a
> > packet that should go to vxlan tunnel device (to be encapsulated using
> > vni 5) ends up in a gre/erspan device, you will get an erspan packet
> > with session 5. With this new tunnel type field, you can restrict
> > the tunnel template to work _only_ for a given tunnel device driver.
> > Hence, if the packet ends up in the wrong tunnel device driver due to
> > incorrect configuration, packet gets dropped.
> >
> >> Have you seen
> >> https://www.mail-archive.com/netdev@vger.kernel.org/msg250705.html ?
> 
> > Hm, I remember to have seen some hw offload driver code that is making
> > assumptions on the destination ports to pick the tunnel protocol,
> > based on what the act_key_tunnel is passing.
> [..]
> 
> for udp based tunneling this is valid practice, b/c a driver can get the tunnel
> type <--> udp dest port relation from the stack through the udp tunnel ndo.
> 
> HW offloading wise, I think it would be better first pursue Januk and Co
> proposal which deals with the problematic part of tc/tunnels -- ingress rules
> set on tunnel devices (decap rules). The RFC looks very promising and
> I understand this is going to be reposed as non-rfc early in the next cycle

Sorry, I think there's a misunderstanding here.

My patchset is of benefit for pure software/control plane approach:
This allows users to narrow down tunnel configurations, existing
approach is very flexible - probably a bit too much for people willing
to validate things are correct - since it is allowing too loose
configurations. This is leaving room to the user to make configuration
mistakes that result in bad things, such as packets being tunneled
through the wrong encapsulation type. With this patchset, packets
going to the wrong tunnel devices will be simply dropped - and we
could even do more specific validation from control plane after this.
This patchset is backward compatible, so it doesn't restrict for the
existing flexibility that users may want for this.

This patchset _never_ meant to replace Jakub's work nor any HW offload
infrastructure. After reading Jakub's email, I was just suggesting
that this may (probably) still help drivers too, since this would
provide more hints to the driver. Please, let me know if this is
causing any interference with your ongoing HW driver development in
some way, and if so, in what way.

Let me know,
Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template
  2018-10-14  9:24       ` Pablo Neira Ayuso
@ 2018-10-14 11:24         ` Or Gerlitz
  2018-10-14 17:59           ` Pablo Neira Ayuso
  0 siblings, 1 reply; 13+ messages in thread
From: Or Gerlitz @ 2018-10-14 11:24 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Jakub Kicinski, Linux Netdev List, netfilter-devel, Roopa Prabhu,
	Amir Vadai, pravin shelar, u9012063, Jiri Pirko, Oz Shlomo

On Sun, Oct 14, 2018 at 12:24 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Sun, Oct 14, 2018 at 09:42:42AM +0300, Or Gerlitz wrote:
> > On Fri, Oct 5, 2018 at 12:58 AM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> [...]
> > > On Thu, Oct 04, 2018 at 12:13:57PM -0700, Jakub Kicinski wrote:
> > > > On Thu,  4 Oct 2018 02:03:42 +0200, Pablo Neira Ayuso wrote:
> > > > > Hi,
> > > > >
> > > > > The following patchset adds a new field to the tunnel metadata template
> > > > > to restrict the configuration to a given tunnel driver. Currently, a
> > > > > misconfiguration may result in packets going to the wrong tunnel driver.
> > > > >
> > > > > Although we have the tunnel option flags, they are not mandatory for
> > > > > some tunnel drivers, eg. vxlan, which may use it or not; and gre which
> > > > > does not use them.
> > > >
> > > > Option flags are necessary because interpretation of option blob is
> > > > entirely protocol-specific.
> > > >
> > > > > This patch updates tc's tunnel action and netfilter's tunnel extension
> > > > > to use this new field. OVS netlink interface has been left unset, although they
> > > > > could be updated to use this.
> > > > >
> > > > > By extending the existing tc action to support the IP_TUNNEL_INFO_BRIDGE
> > > > > mode, I think it should be possible to expose IP_TUNNEL_TYPE_VLAN too,
> > > > > although this patchset doesn't address this scenario.
> >
> > not following... can you elaborate further please?
>
> It should be possible to extend act_key_tunnel to support the
> IP_TUNNEL_INFO_BRIDGE flag, but this is just a suggestion.
>
> > > > > The field is initialized to zero, which maps to IP_TUNNEL_TYPE_UNSPEC to
> > > > > retain the existing behaviour, so the existing flexibility is still in
> > > > > place while this new feature is added.
> > > > >
> > > > > Cc'ing people that git annotate show as dealing with these bits more
> > > > > recently.
> > > >
> > > > What practical scenario are you trying to address here?
> > >
> > > Incorrect configuration. The tunnel template defines an ID field, this
> > > ID means vni in vxlan, but it also means session in erspan. If a
> > > packet that should go to vxlan tunnel device (to be encapsulated using
> > > vni 5) ends up in a gre/erspan device, you will get an erspan packet
> > > with session 5. With this new tunnel type field, you can restrict
> > > the tunnel template to work _only_ for a given tunnel device driver.
> > > Hence, if the packet ends up in the wrong tunnel device driver due to
> > > incorrect configuration, packet gets dropped.
> > >
> > >> Have you seen
> > >> https://www.mail-archive.com/netdev@vger.kernel.org/msg250705.html ?
> >
> > > Hm, I remember to have seen some hw offload driver code that is making
> > > assumptions on the destination ports to pick the tunnel protocol,
> > > based on what the act_key_tunnel is passing.
> > [..]
> >
> > for udp based tunneling this is valid practice, b/c a driver can get the tunnel
> > type <--> udp dest port relation from the stack through the udp tunnel ndo.
> >
> > HW offloading wise, I think it would be better first pursue Januk and Co
> > proposal which deals with the problematic part of tc/tunnels -- ingress rules
> > set on tunnel devices (decap rules). The RFC looks very promising and
> > I understand this is going to be reposed as non-rfc early in the next cycle
>
> Sorry, I think there's a misunderstanding here.
>
> My patchset is of benefit for pure software/control plane approach:
> This allows users to narrow down tunnel configurations, existing
> approach is very flexible - probably a bit too much for people willing
> to validate things are correct - since it is allowing too loose
> configurations. This is leaving room to the user to make configuration
> mistakes that result in bad things, such as packets being tunneled
> through the wrong encapsulation type.

I see your point, but I think that the fact that effectively all or most
of the IP based tunnels in the kernel conform to struct ip_tunnel_info
is a plus and not a minus.. control planes can do their mistakes if they
don't read the man, this wrong tunneling that you mentioned is not going
to live far beyond the next networking hop or even dropped earlier, within the
stack of this node.

> through the wrong encapsulation type. With this patchset, packets
> going to the wrong tunnel devices will be simply dropped - and we
> could even do more specific validation from control plane after this.
> This patchset is backward compatible, so it doesn't restrict for the
> existing flexibility that users may want for this.
>
> This patchset _never_ meant to replace Jakub's work nor any HW offload
> infrastructure. After reading Jakub's email, I was just suggesting
> that this may (probably) still help drivers too, since this would
> provide more hints to the driver.

well, yes, more hints and sort of no (see next)

> provide more hints to the driver. Please, let me know if this is
> causing any interference with your ongoing HW driver development in
> some way, and if so, in what way.

for example, if newer controller wants to work over older kernel that doesn't
have the new flag, they have to write code that can go both ways, this is doable
-- but do we want to do that just for the sake of reacting to user
mis-configurations?

I am open  / will be happy to hear more opinions here.

Or.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template
  2018-10-14 11:24         ` Or Gerlitz
@ 2018-10-14 17:59           ` Pablo Neira Ayuso
  0 siblings, 0 replies; 13+ messages in thread
From: Pablo Neira Ayuso @ 2018-10-14 17:59 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jakub Kicinski, Linux Netdev List, netfilter-devel, Roopa Prabhu,
	Amir Vadai, pravin shelar, u9012063, Jiri Pirko, Oz Shlomo

Hi Or,

On Sun, Oct 14, 2018 at 02:24:42PM +0300, Or Gerlitz wrote:
> On Sun, Oct 14, 2018 at 12:24 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Sun, Oct 14, 2018 at 09:42:42AM +0300, Or Gerlitz wrote:
[...]
> > > for udp based tunneling this is valid practice, b/c a driver can get the tunnel
> > > type <--> udp dest port relation from the stack through the udp tunnel ndo.
> > >
> > > HW offloading wise, I think it would be better first pursue Januk and Co
> > > proposal which deals with the problematic part of tc/tunnels -- ingress rules
> > > set on tunnel devices (decap rules). The RFC looks very promising and
> > > I understand this is going to be reposed as non-rfc early in the next cycle
> >
> > Sorry, I think there's a misunderstanding here.
> >
> > My patchset is of benefit for pure software/control plane approach:
> > This allows users to narrow down tunnel configurations, existing
> > approach is very flexible - probably a bit too much for people willing
> > to validate things are correct - since it is allowing too loose
> > configurations. This is leaving room to the user to make configuration
> > mistakes that result in bad things, such as packets being tunneled
> > through the wrong encapsulation type.
> 
> I see your point, but I think that the fact that effectively all or most
> of the IP based tunnels in the kernel conform to struct ip_tunnel_info
> is a plus and not a minus.. control planes can do their mistakes if they
> don't read the man, this wrong tunneling that you mentioned is not going
> to live far beyond the next networking hop or even dropped earlier, within the
> stack of this node.

I think it is worse that just 'please read the manpage', look:

The tunnel ID field is 64 bits, but it is trimmed down to 32 bits for
VxLAN, and then it is 8 bits in case of ERSPAN for the session field.
Semantics depend on the tunnel protocol. Depending on the tunneling
protocol, the ID have different tunnel header field length. The
control plane cannot reject a 0xff00ffff in ERSPAN because the
ip_tunnel_info structure has no tunnel type semantics. I think it
would be good if users get some sort of 'sorry, you cannot specify
tunnel ID larger that 8 bits in ERSPAN'.

Another example, you can also specify via tc/ingress a rule like:

        act_tunnel_key id 50 src-addr 192.168.2.1

with no fwd/mirred action in a bridge setup. The tunnel device is part
of the bridge in this case, and the template passes the configuration
to the tunnel device that is part of the bridge. So we cannot assume
the fwd/mirred action always follow act_tunnel_key in software.
Probably in HW offload it makes sense to always follow it up with
fwd/mirred to keep things simple, but we already have scenarios in
software where this is not the case.

> > through the wrong encapsulation type. With this patchset, packets
> > going to the wrong tunnel devices will be simply dropped - and we
> > could even do more specific validation from control plane after this.
> > This patchset is backward compatible, so it doesn't restrict for the
> > existing flexibility that users may want for this.
> >
> > This patchset _never_ meant to replace Jakub's work nor any HW offload
> > infrastructure. After reading Jakub's email, I was just suggesting
> > that this may (probably) still help drivers too, since this would
> > provide more hints to the driver.
> 
> well, yes, more hints and sort of no (see next)
> 
> > provide more hints to the driver. Please, let me know if this is
> > causing any interference with your ongoing HW driver development in
> > some way, and if so, in what way.
> 
> for example, if newer controller wants to work over older kernel that doesn't
> have the new flag, they have to write code that can go both ways

This is backward compatible, your controller can keep using the
existing approach forever.

Anyway, I think the problem you're refering about older kernel and new
controller is an interesting one but a different problem: This is an
existing limitation in netlink. Currently, there is not way to know
what the API supports other than doing probing, and probing is
something may not even help. I already proposed something to address
this problem in case you're interested [1].

> , this is doable
> -- but do we want to do that just for the sake of reacting to user
> mis-configurations?
> 
> I am open  / will be happy to hear more opinions here.

Please note that this new feature is optional, for people willing to
have an interface/control plane that can validate what they are
configuring can be good. Users tend to make mistakes, manpage is last
resort, well if control plane can help / provide hints on what is
wrong, things become easier for users.

Thanks for your feedback!

[1] https://lwn.net/Articles/746776/

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-10-15  1:41 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-10-04  0:03 [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template Pablo Neira Ayuso
2018-10-04  0:03 ` [PATCH RFC,net-next 1/3] ip_tunnel: add type field to struct ip_tunnel_info Pablo Neira Ayuso
2018-10-04  9:25   ` Daniel Borkmann
2018-10-04 10:56     ` Pablo Neira Ayuso
2018-10-04 12:00       ` Daniel Borkmann
2018-10-04  0:03 ` [PATCH RFC,net-next 2/3] net: act_tunnel_key: support for tunnel type Pablo Neira Ayuso
2018-10-04  0:03 ` [PATCH RFC,net-next 3/3] netfilter: nft_tunnel: " Pablo Neira Ayuso
2018-10-04 19:13 ` [PATCH RFC,net-next 0/3] ip_tunnel: specify tunnel type via template Jakub Kicinski
2018-10-04 21:58   ` Pablo Neira Ayuso
2018-10-14  6:42     ` Or Gerlitz
2018-10-14  9:24       ` Pablo Neira Ayuso
2018-10-14 11:24         ` Or Gerlitz
2018-10-14 17:59           ` Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).