[PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair
@ 2017-12-20 17:06 Lorenzo Colitti
  2017-12-20 17:06 ` [PATCH ipsec-next 1/7] net: xfrm: Don't check for TUNNEL_KEY when hashing VTI tunnels Lorenzo Colitti
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Lorenzo Colitti @ 2017-12-20 17:06 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, davem

When using IPsec tunnel mode, VTIs provide many benefits compared
to direct configuration of xfrm policies / states. However, one
limitation is that there can only be one VTI between a given pair
of IP addresses. This does not allow configuring multiple IPsec
tunnels to the same security gateway. This is required by some
deployments, for example I-WLAN [3GPP TS 24.327].

This patchset introduces a new VTI_KEYED flag that allows
configuration of multiple VTIs between the same IP address
pairs. The semantics are as follows:

- The output path is the same as current VTI behaviour, where a
  routing lookup selects a VTI interface, and the VTI's okey
  specifies the mark to use in the XFRM lookup.
- The input and ICMP error paths instead work by first looking up
  an SA with a loose match that ignores the mark. That mark is
  then used to find the tunnel by ikey (for input packets) or
  okey (for ICMP errors).

In order for ICMP errors to work, flags are added to the common
IP lookup functions to ignore the tunnel ikey and to look up
tunnels by okey instead of ikey.

On the same IP address pair, keyed VTIs can coexist with each
other (as long as the ikeys are different), but cannot coexist
with keyless VTIs. This is because the existing keyless VTI
behaviour (which this series does not change) is to always accept
packets matching an input policy, regardless of whether there is
any matching XFRM state. Thus, the keyless VTI would accept the
traffic for the keyed tunnel and drop it because it would not
match the keyed tunnel's state.

Changes from RFC series:
- Processing of ICMP errors now works when ikey != okey.
- Series now contains changes to the common tunnel lookup
  functions to match tunnels by okey and to ignore ikey when
  matching.
- Fixed missing EXPORT_SYMBOL for xfrm_state_lookup_loose.
- Made vti6_lookup static as it should have been.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH ipsec-next 1/7] net: xfrm: Don't check for TUNNEL_KEY when hashing VTI tunnels.
  2017-12-20 17:06 [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Lorenzo Colitti
@ 2017-12-20 17:06 ` Lorenzo Colitti
  2017-12-20 17:06 ` [PATCH ipsec-next 2/7] net: ipv4: Add new flags to tunnel lookup Lorenzo Colitti
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Colitti @ 2017-12-20 17:06 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, davem, Lorenzo Colitti

Currently, ip_bucket sets the lookup i_key to 0 if the tunnel's
i_flags have VTI_ISVTI flag set but not TUNNEL_KEY. However, it
can can never happen that TUNNEL_KEY is set if VTI_ISVTI is also
set (see below). Therefore, just drop the check for TUNNEL_KEY
and only set i_key to 0 on VTI_ISVTI.

This will allow the VTI code to set TUNNEL_KEY on certain
tunnels in a future change.

None of the callers of ip_bucket pass in TUNNEL_KEY | VTI_ISVTI.
The call graph is as follows:

- ip_tunnel_add
  - ip_tunnel_create
    - ip_tunnel_ioctl
      - ipgre_tunnel_ioctl: can set TUNNEL_KEY but not VTI_ISVTI
      - ipip_tunnel_ioctl: hardcodes i_flags to 0
      - vti_tunnel_ioctl: hardcodes i_flags to VTI_ISVTI
  - ip_tunnel_update: doesn't touch i_flags
  - ip_tunnel_init_net: memsets flags to 0
  - ip_tunnel_newlink
    - ipgre_newlink
      - ipgre_netlink_parms: can set TUNNEL_KEY but not VTI_ISVTI
    - vti_newlink: hardcodes i_flags to VTI_ISVTI
  - ip_tunnel_changelink: doesn't set flags
- ip_tunnel_find
  - ip_tunnel_ioctl (see above)
  - ip_tunnel_newlink (see above)
  - ip_tunnel_changelink (see above)

VTI_ISVTI has the same value as TUNNEL_DONT_FRAGMENT, but that
is never set into tunnel parameters.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 net/ipv4/ip_tunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 5ddb1cb52b..539c8f22c4 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -198,7 +198,7 @@ static struct hlist_head *ip_bucket(struct ip_tunnel_net *itn,
 	else
 		remote = 0;
 
-	if (!(parms->i_flags & TUNNEL_KEY) && (parms->i_flags & VTI_ISVTI))
+	if (parms->i_flags & VTI_ISVTI)
 		i_key = 0;
 
 	h = ip_tunnel_hash(i_key, remote);
-- 
2.15.1.620.gb9897f4670-goog

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH ipsec-next 2/7] net: ipv4: Add new flags to tunnel lookup.
  2017-12-20 17:06 [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Lorenzo Colitti
  2017-12-20 17:06 ` [PATCH ipsec-next 1/7] net: xfrm: Don't check for TUNNEL_KEY when hashing VTI tunnels Lorenzo Colitti
@ 2017-12-20 17:06 ` Lorenzo Colitti
  2017-12-20 17:06 ` [PATCH ipsec-next 3/7] net: xfrm: Add an xfrm lookup that ignores the mark Lorenzo Colitti
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Colitti @ 2017-12-20 17:06 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, davem, Lorenzo Colitti

This patch adds support for two new flags to ip_tunnel_lookup.

- TUNNEL_LOOKUP_NO_KEY: ignores the tunnel's i_key when
  determining which hash bucket to look up the tunnel in. This is
  useful for tunnels such as VTI, which have i_keys, but are
  always hashed into bucket 0.
- TUNNEL_LOOKUP_OKEY: finds the tunnel by o_key instead of i_key.

Together, these flags allow processing ICMP errors correctly on
keyed tunnels where i_key != o_key. If such tunnels receive an
ICMP error, the only information available in the packet is the
o_key, so we must be able to find a tunnel by o_key alone. For
that to work, the tunnel hash must not depend on the i_key,
because if it does, we won't be able to find it by o_key alone.

TUNNEL_LOOKUP_NO_KEY is very similar to TUNNEL_NO_KEY so it might
be possible just to use TUNNEL_NO_KEY instead. However, it might
be confusing to see code simultaneously pass in both TUNNEL_NO_KEY
and TUNNEL_KEY.

These flags are numbered separately from tunnel flags because
they are not tunnel properties but properties of tunnel lookups.
(Also, the tunnel flags are 16 bits and all but one is unused.)

The flags are passed into ip_lookup by adding a new parameter.
This could also be done by expanding the existing flags parameter
from __be16 to __be32 and ensuring that the new flags are all
above the 16-bit boundary.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/net/ip_tunnels.h |  6 +++++-
 net/ipv4/ip_gre.c        |  6 +++---
 net/ipv4/ip_tunnel.c     | 22 +++++++++++++---------
 net/ipv4/ip_vti.c        |  4 ++--
 net/ipv4/ipip.c          |  6 +++---
 5 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 1f16773cfd..19d97b993a 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -163,6 +163,10 @@ struct ip_tunnel {
 #define TUNNEL_OPTIONS_PRESENT \
 		(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
 
+/* Flags for ip_tunnel_lookup. */
+#define TUNNEL_LOOKUP_NO_KEY	0x01
+#define TUNNEL_LOOKUP_OKEY	0x02
+
 struct tnl_ptk_info {
 	__be16 flags;
 	__be16 proto;
@@ -276,7 +280,7 @@ int ip_tunnel_change_mtu(struct net_device *dev, int new_mtu);
 void ip_tunnel_get_stats64(struct net_device *dev,
 			   struct rtnl_link_stats64 *tot);
 struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
-				   int link, __be16 flags,
+				   int link, __be16 flags, u8 lookup_flags,
 				   __be32 remote, __be32 local,
 				   __be32 key);
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index fd4d6e96da..f16a46cb19 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -182,7 +182,7 @@ static void ipgre_err(struct sk_buff *skb, u32 info,
 		itn = net_generic(net, ipgre_net_id);
 
 	iph = (const struct iphdr *)(icmp_hdr(skb) + 1);
-	t = ip_tunnel_lookup(itn, skb->dev->ifindex, tpi->flags,
+	t = ip_tunnel_lookup(itn, skb->dev->ifindex, tpi->flags, 0,
 			     iph->daddr, iph->saddr, tpi->key);
 
 	if (!t)
@@ -280,7 +280,7 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 	 */
 	tpi->key = cpu_to_be32(ntohs(ershdr->session_id) & ID_MASK);
 	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex,
-				  tpi->flags | TUNNEL_KEY,
+				  tpi->flags | TUNNEL_KEY, 0,
 				  iph->saddr, iph->daddr, tpi->key);
 
 	if (tunnel) {
@@ -356,7 +356,7 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
 	struct ip_tunnel *tunnel;
 
 	iph = ip_hdr(skb);
-	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, tpi->flags,
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, tpi->flags, 0,
 				  iph->saddr, iph->daddr, tpi->key);
 
 	if (tunnel) {
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 539c8f22c4..f45968bb81 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -70,11 +70,13 @@ static unsigned int ip_tunnel_hash(__be32 key, __be32 remote)
 }
 
 static bool ip_tunnel_key_match(const struct ip_tunnel_parm *p,
-				__be16 flags, __be32 key)
+				__be32 flags, u8 lookup_flags, __be32 key)
 {
+	__be32 tunnel_key = (lookup_flags & TUNNEL_LOOKUP_OKEY) ? p->o_key :
+								  p->i_key;
 	if (p->i_flags & TUNNEL_KEY) {
 		if (flags & TUNNEL_KEY)
-			return key == p->i_key;
+			return key == tunnel_key;
 		else
 			/* key expected, none present */
 			return false;
@@ -94,15 +96,17 @@ static bool ip_tunnel_key_match(const struct ip_tunnel_parm *p,
    Given src, dst and key, find appropriate for input tunnel.
 */
 struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
-				   int link, __be16 flags,
+				   int link, __be16 flags, u8 lookup_flags,
 				   __be32 remote, __be32 local,
 				   __be32 key)
 {
 	unsigned int hash;
 	struct ip_tunnel *t, *cand = NULL;
 	struct hlist_head *head;
+	__be32 hash_key;
 
-	hash = ip_tunnel_hash(key, remote);
+	hash_key = (lookup_flags & TUNNEL_LOOKUP_NO_KEY) ? 0 : key;
+	hash = ip_tunnel_hash(hash_key, remote);
 	head = &itn->tunnels[hash];
 
 	hlist_for_each_entry_rcu(t, head, hash_node) {
@@ -111,7 +115,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
 		    !(t->dev->flags & IFF_UP))
 			continue;
 
-		if (!ip_tunnel_key_match(&t->parms, flags, key))
+		if (!ip_tunnel_key_match(&t->parms, flags, lookup_flags, key))
 			continue;
 
 		if (t->parms.link == link)
@@ -126,7 +130,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
 		    !(t->dev->flags & IFF_UP))
 			continue;
 
-		if (!ip_tunnel_key_match(&t->parms, flags, key))
+		if (!ip_tunnel_key_match(&t->parms, flags, lookup_flags, key))
 			continue;
 
 		if (t->parms.link == link)
@@ -135,7 +139,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
 			cand = t;
 	}
 
-	hash = ip_tunnel_hash(key, 0);
+	hash = ip_tunnel_hash(hash_key, 0);
 	head = &itn->tunnels[hash];
 
 	hlist_for_each_entry_rcu(t, head, hash_node) {
@@ -146,7 +150,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
 		if (!(t->dev->flags & IFF_UP))
 			continue;
 
-		if (!ip_tunnel_key_match(&t->parms, flags, key))
+		if (!ip_tunnel_key_match(&t->parms, flags, lookup_flags, key))
 			continue;
 
 		if (t->parms.link == link)
@@ -238,7 +242,7 @@ static struct ip_tunnel *ip_tunnel_find(struct ip_tunnel_net *itn,
 		    remote == t->parms.iph.daddr &&
 		    link == t->parms.link &&
 		    type == t->dev->type &&
-		    ip_tunnel_key_match(&t->parms, flags, key))
+		    ip_tunnel_key_match(&t->parms, flags, 0, key))
 			break;
 	}
 	return t;
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 949f432a5f..804cee8126 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -57,7 +57,7 @@ static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
 	struct net *net = dev_net(skb->dev);
 	struct ip_tunnel_net *itn = net_generic(net, vti_net_id);
 
-	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY, 0,
 				  iph->saddr, iph->daddr, 0);
 	if (tunnel) {
 		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
@@ -278,7 +278,7 @@ static int vti4_err(struct sk_buff *skb, u32 info)
 	int protocol = iph->protocol;
 	struct ip_tunnel_net *itn = net_generic(net, vti_net_id);
 
-	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY, 0,
 				  iph->daddr, iph->saddr, 0);
 	if (!tunnel)
 		return -1;
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index c891235b49..81f94ffb92 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -167,7 +167,7 @@ static int ipip_err(struct sk_buff *skb, u32 info)
 		goto out;
 	}
 
-	t = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
+	t = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY, 0,
 			     iph->daddr, iph->saddr, 0);
 	if (!t) {
 		err = -ENOENT;
@@ -224,8 +224,8 @@ static int ipip_tunnel_rcv(struct sk_buff *skb, u8 ipproto)
 	const struct iphdr *iph;
 
 	iph = ip_hdr(skb);
-	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
-			iph->saddr, iph->daddr, 0);
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY, 0,
+				  iph->saddr, iph->daddr, 0);
 	if (tunnel) {
 		const struct tnl_ptk_info *tpi;
 
-- 
2.15.1.620.gb9897f4670-goog

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH ipsec-next 3/7] net: xfrm: Add an xfrm lookup that ignores the mark.
  2017-12-20 17:06 [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Lorenzo Colitti
  2017-12-20 17:06 ` [PATCH ipsec-next 1/7] net: xfrm: Don't check for TUNNEL_KEY when hashing VTI tunnels Lorenzo Colitti
  2017-12-20 17:06 ` [PATCH ipsec-next 2/7] net: ipv4: Add new flags to tunnel lookup Lorenzo Colitti
@ 2017-12-20 17:06 ` Lorenzo Colitti
  2017-12-20 17:06 ` [PATCH ipsec-next 4/7] net: xfrm: Find VTI interfaces from xfrm_input Lorenzo Colitti
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Colitti @ 2017-12-20 17:06 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, davem, Lorenzo Colitti

The xfrm inbound and ICMP error paths can match inbound XFRM states
that have a mark, but only if the skb mark is already correctly set
to match the state mark. This typically requires iptables rules
(potentially even per SA iptables rules), which impose configuration
complexity.

In some cases, it may be useful to match such an SA anyway. An example
is when processing an ICMP error to an ESP packet that we previously
sent. In this case, the only information available to match the SA are
the IP addresses and the outbound SPI. Therefore, if the output SA has
a mark, the lookup will fail and the ICMP packet cannot be processed
unless the packet is somehow already marked.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/net/xfrm.h    |  4 ++++
 net/xfrm/xfrm_state.c | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 1ec0c47606..9d3b7c0ac6 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1550,6 +1550,10 @@ struct xfrm_state *xfrm_state_lookup_byaddr(struct net *net, u32 mark,
 					    const xfrm_address_t *saddr,
 					    u8 proto,
 					    unsigned short family);
+struct xfrm_state *xfrm_state_lookup_loose(struct net *net, u32 mark,
+					   const xfrm_address_t *daddr,
+					   __be32 spi, u8 proto,
+					   unsigned short family);
 #ifdef CONFIG_XFRM_SUB_POLICY
 int xfrm_tmpl_sort(struct xfrm_tmpl **dst, struct xfrm_tmpl **src, int n,
 		   unsigned short family, struct net *net);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 1b7856be3e..cff151c714 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -839,6 +839,39 @@ static struct xfrm_state *__xfrm_state_lookup(struct net *net, u32 mark,
 	return NULL;
 }
 
+struct xfrm_state *xfrm_state_lookup_loose(struct net *net, u32 mark,
+					   const xfrm_address_t *daddr,
+					   __be32 spi, u8 proto,
+					   unsigned short family)
+{
+	unsigned int h = xfrm_spi_hash(net, daddr, spi, proto, family);
+	struct xfrm_state *x, *cand = NULL;
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(x, net->xfrm.state_byspi + h, byspi) {
+		if (x->props.family != family ||
+		    x->id.spi       != spi ||
+		    x->id.proto     != proto ||
+		    !xfrm_addr_equal(&x->id.daddr, daddr, family))
+			continue;
+
+		if (((mark & x->mark.m) == x->mark.v) &&
+		    xfrm_state_hold_rcu(x)) {
+			if (cand)
+				xfrm_state_put(cand);
+			rcu_read_unlock();
+			return x;
+		}
+
+		if (!cand && xfrm_state_hold_rcu(x))
+			cand = x;
+	}
+
+	rcu_read_unlock();
+	return cand;
+}
+EXPORT_SYMBOL(xfrm_state_lookup_loose);
+
 static struct xfrm_state *__xfrm_state_lookup_byaddr(struct net *net, u32 mark,
 						     const xfrm_address_t *daddr,
 						     const xfrm_address_t *saddr,
-- 
2.15.1.620.gb9897f4670-goog

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH ipsec-next 4/7] net: xfrm: Find VTI interfaces from xfrm_input.
  2017-12-20 17:06 [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Lorenzo Colitti
                   ` (2 preceding siblings ...)
  2017-12-20 17:06 ` [PATCH ipsec-next 3/7] net: xfrm: Add an xfrm lookup that ignores the mark Lorenzo Colitti
@ 2017-12-20 17:06 ` Lorenzo Colitti
  2017-12-20 17:06 ` [PATCH ipsec-next 5/7] net: xfrm: Deliver packets to keyed VTI tunnels Lorenzo Colitti
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Colitti @ 2017-12-20 17:06 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, davem, Lorenzo Colitti

Currently, the VTI input path works by first looking up the VTI
by its IP addresses, then setting the tunnel pointer in the
XFRM_TUNNEL_SKB_CB, and then having xfrm_input override the mark
with the mark in the tunnel.

This patch changes the order so that the tunnel is found by a
callback from xfrm_input. Each tunnel type (currently only ip_vti
and ip6_vti) implements a lookup function pointer that finds the
tunnel and sets it in the CB, and also does a state lookup.

This has the advantage that much more information is available to
the tunnel lookup function, including the looked-up XFRM state.
This will be used in a future change to allow finding the tunnel
based on the result of the xfrm lookup and not just on IP
addresses, which will allow multiple tunnels on the same IP
address pair.

The lookup function pointer occupies the same space in the
XFRM_TUNNEL_SKB_CB as the IPv4/IPv6 tunnel pointer. The semantics
of the field are:
- When not running a handler that uses tunnels: always null.
- At the beginning of xfrm_input: lookup function pointer.
- After xfrm_input calls the lookup function: tunnel if found,
  else null.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/net/xfrm.h     |  2 ++
 net/ipv4/ip_vti.c      | 43 ++++++++++++++++++++++++++++++++++++----
 net/ipv6/ip6_vti.c     | 53 +++++++++++++++++++++++++++++++++++++++++++++-----
 net/ipv6/xfrm6_input.c |  1 -
 net/xfrm/xfrm_input.c  | 34 +++++++++++++++++++-------------
 5 files changed, 109 insertions(+), 24 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 9d3b7c0ac6..3d245f2f6f 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -653,6 +653,8 @@ struct xfrm_tunnel_skb_cb {
 	} header;
 
 	union {
+		int (*lookup)(struct sk_buff *skb, int nexthdr, __be32 spi,
+			      __be32 seq, struct xfrm_state **x);
 		struct ip_tunnel *ip4;
 		struct ip6_tnl *ip6;
 	} tunnel;
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 804cee8126..21f93e398e 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -49,8 +49,8 @@ static struct rtnl_link_ops vti_link_ops __read_mostly;
 static unsigned int vti_net_id __read_mostly;
 static int vti_tunnel_init(struct net_device *dev);
 
-static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
-		     int encap_type)
+static struct ip_tunnel *
+vti4_find_tunnel(struct sk_buff *skb, __be32 spi, struct xfrm_state **x)
 {
 	struct ip_tunnel *tunnel;
 	const struct iphdr *iph = ip_hdr(skb);
@@ -59,19 +59,52 @@ static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
 
 	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY, 0,
 				  iph->saddr, iph->daddr, 0);
+	if (tunnel) {
+		*x = xfrm_state_lookup(net, be32_to_cpu(tunnel->parms.i_key),
+				       (xfrm_address_t *)&iph->daddr,
+				       spi, iph->protocol, AF_INET);
+	}
+
+	return tunnel;
+}
+
+static int vti_lookup(struct sk_buff *skb, int nexthdr, __be32 spi, __be32 seq,
+		      struct xfrm_state **x)
+{
+	struct net *net = dev_net(skb->dev);
+	struct ip_tunnel *tunnel;
+
+	tunnel = vti4_find_tunnel(skb, spi, x);
 	if (tunnel) {
 		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
 			goto drop;
 
+		if (!*x) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
+			xfrm_audit_state_notfound(skb, AF_INET, spi, seq);
+			tunnel->dev->stats.rx_errors++;
+			tunnel->dev->stats.rx_dropped++;
+			goto drop;
+		}
+
 		XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4 = tunnel;
 
-		return xfrm_input(skb, nexthdr, spi, encap_type);
+		return 0;
 	}
 
 	return -EINVAL;
 drop:
+	if (*x)
+		xfrm_state_put(*x);
 	kfree_skb(skb);
-	return 0;
+	return -ESRCH;
+}
+
+static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
+		     int encap_type)
+{
+	XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup = vti_lookup;
+	return xfrm_input(skb, nexthdr, spi, encap_type);
 }
 
 static int vti_rcv(struct sk_buff *skb)
@@ -93,6 +126,8 @@ static int vti_rcv_cb(struct sk_buff *skb, int err)
 	u32 orig_mark = skb->mark;
 	int ret;
 
+	XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4 = NULL;
+
 	if (!tunnel)
 		return 1;
 
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index dbb74f3c57..5994fedd19 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -297,13 +297,33 @@ static void vti6_dev_uninit(struct net_device *dev)
 	dev_put(dev);
 }
 
-static int vti6_rcv(struct sk_buff *skb)
+static struct ip6_tnl *
+vti6_find_tunnel(struct sk_buff *skb, __be32 spi, struct xfrm_state **x)
 {
+	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+	struct net *net = dev_net(skb->dev);
 	struct ip6_tnl *t;
+
+	t = vti6_tnl_lookup(net, &ipv6h->saddr, &ipv6h->daddr);
+	if (t) {
+		*x = xfrm_state_lookup(net, be32_to_cpu(t->parms.i_key),
+				       (xfrm_address_t *)&ipv6h->daddr,
+				       spi, ipv6h->nexthdr, AF_INET6);
+	}
+
+	return t;
+}
+
+static int
+vti6_lookup(struct sk_buff *skb, int nexthdr, __be32 spi, __be32 seq,
+	    struct xfrm_state **x)
+{
 	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+	struct net *net = dev_net(skb->dev);
+	struct ip6_tnl *t;
 
 	rcu_read_lock();
-	t = vti6_tnl_lookup(dev_net(skb->dev), &ipv6h->saddr, &ipv6h->daddr);
+	t = vti6_find_tunnel(skb, spi, x);
 	if (t) {
 		if (t->parms.proto != IPPROTO_IPV6 && t->parms.proto != 0) {
 			rcu_read_unlock();
@@ -312,7 +332,7 @@ static int vti6_rcv(struct sk_buff *skb)
 
 		if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
 			rcu_read_unlock();
-			return 0;
+			goto discard;
 		}
 
 		if (!ip6_tnl_rcv_ctl(t, &ipv6h->daddr, &ipv6h->saddr)) {
@@ -321,15 +341,36 @@ static int vti6_rcv(struct sk_buff *skb)
 			goto discard;
 		}
 
+		if (!*x) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
+			xfrm_audit_state_notfound(skb, AF_INET6, spi, seq);
+			t->dev->stats.rx_errors++;
+			t->dev->stats.rx_dropped++;
+			rcu_read_unlock();
+			goto discard;
+		}
+
 		rcu_read_unlock();
 
-		return xfrm6_rcv_tnl(skb, t);
+		XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = t;
+
+		return 0;
 	}
 	rcu_read_unlock();
 	return -EINVAL;
 discard:
+	if (*x)
+		xfrm_state_put(*x);
 	kfree_skb(skb);
-	return 0;
+	return -ESRCH;
+}
+
+static int vti6_rcv(struct sk_buff *skb)
+{
+	int nexthdr = skb_network_header(skb)[IP6CB(skb)->nhoff];
+
+	XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup = vti6_lookup;
+	return xfrm6_rcv_spi(skb, nexthdr, 0, NULL);
 }
 
 static int vti6_rcv_cb(struct sk_buff *skb, int err)
@@ -343,6 +384,8 @@ static int vti6_rcv_cb(struct sk_buff *skb, int err)
 	u32 orig_mark = skb->mark;
 	int ret;
 
+	XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = NULL;
+
 	if (!t)
 		return 1;
 
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index fe04e23af9..6d1b734fef 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -25,7 +25,6 @@ int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb)
 int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
 		  struct ip6_tnl *t)
 {
-	XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = t;
 	XFRM_SPI_SKB_CB(skb)->family = AF_INET6;
 	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
 	return xfrm_input(skb, nexthdr, spi, 0);
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index ac277b97e0..7b54f58454 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -267,18 +267,6 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
 	family = XFRM_SPI_SKB_CB(skb)->family;
 
-	/* if tunnel is present override skb->mark value with tunnel i_key */
-	switch (family) {
-	case AF_INET:
-		if (XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4)
-			mark = be32_to_cpu(XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4->parms.i_key);
-		break;
-	case AF_INET6:
-		if (XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6)
-			mark = be32_to_cpu(XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6->parms.i_key);
-		break;
-	}
-
 	err = secpath_set(skb);
 	if (err) {
 		XFRM_INC_STATS(net, LINUX_MIB_XFRMINERROR);
@@ -293,14 +281,29 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
 	daddr = (xfrm_address_t *)(skb_network_header(skb) +
 				   XFRM_SPI_SKB_CB(skb)->daddroff);
+
+	if (XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup) {
+		err = XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup(skb, nexthdr,
+							     spi, seq, &x);
+		if (err) {
+			XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup = NULL;
+			return err;
+		}
+	}
+
 	do {
 		if (skb->sp->len == XFRM_MAX_DEPTH) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINBUFFERERROR);
+			if (x)
+				xfrm_state_put(x);
 			goto drop;
 		}
 
-		x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr, family);
-		if (x == NULL) {
+		if (!x)
+			x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr,
+					      family);
+
+		if (!x) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
 			xfrm_audit_state_notfound(skb, family, spi, seq);
 			goto drop;
@@ -420,6 +423,9 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINHDRERROR);
 			goto drop;
 		}
+
+		if (!err)
+			x = NULL;
 	} while (!err);
 
 	err = xfrm_rcv_cb(skb, family, x->type->proto, 0);
-- 
2.15.1.620.gb9897f4670-goog

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH ipsec-next 5/7] net: xfrm: Deliver packets to keyed VTI tunnels.
  2017-12-20 17:06 [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Lorenzo Colitti
                   ` (3 preceding siblings ...)
  2017-12-20 17:06 ` [PATCH ipsec-next 4/7] net: xfrm: Find VTI interfaces from xfrm_input Lorenzo Colitti
@ 2017-12-20 17:06 ` Lorenzo Colitti
  2017-12-20 17:06 ` [PATCH ipsec-next 6/7] net: xfrm: Allow userspace to configure " Lorenzo Colitti
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Colitti @ 2017-12-20 17:06 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, davem, Lorenzo Colitti

- Input works as follows:
  1. Attempt to match a regular VTI by IP addresses only. If that
     succeeds, use the i_key as the mark to look up the xfrm
     state.
  2. If the match failed, do an XFRM state lookup that ignores
     the mark. If that finds an state, then use the state match's
     mark to find the tunnel by its i_key.
- ICMP errors: similar to input, except the search is for the
  outbound XFRM state and the tunnel is found by o_key instead of
  by i_key.
- The output path is the same as existing VTIs. A routing lookup
  matches a VTI interface. The VTI uses its o_key as the mark to
  select an XFRM state. The state transforms the packet.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 net/ipv4/ip_vti.c  | 52 ++++++++++++++++++++++++++++++------------
 net/ipv6/ip6_vti.c | 67 ++++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 88 insertions(+), 31 deletions(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 21f93e398e..9d28433a60 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -63,6 +63,18 @@ vti4_find_tunnel(struct sk_buff *skb, __be32 spi, struct xfrm_state **x)
 		*x = xfrm_state_lookup(net, be32_to_cpu(tunnel->parms.i_key),
 				       (xfrm_address_t *)&iph->daddr,
 				       spi, iph->protocol, AF_INET);
+	} else {
+		*x = xfrm_state_lookup_loose(net, skb->mark,
+					     (xfrm_address_t *)&iph->daddr,
+					     spi, iph->protocol, AF_INET);
+		if (!*x)
+			return NULL;
+		tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_KEY,
+					  TUNNEL_LOOKUP_NO_KEY,
+					  iph->saddr, iph->daddr,
+					  cpu_to_be32((*x)->mark.v));
+		if (!tunnel)
+			xfrm_state_put(*x);
 	}
 
 	return tunnel;
@@ -302,7 +314,6 @@ static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 static int vti4_err(struct sk_buff *skb, u32 info)
 {
 	__be32 spi;
-	__u32 mark;
 	struct xfrm_state *x;
 	struct ip_tunnel *tunnel;
 	struct ip_esp_hdr *esph;
@@ -313,13 +324,6 @@ static int vti4_err(struct sk_buff *skb, u32 info)
 	int protocol = iph->protocol;
 	struct ip_tunnel_net *itn = net_generic(net, vti_net_id);
 
-	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY, 0,
-				  iph->daddr, iph->saddr, 0);
-	if (!tunnel)
-		return -1;
-
-	mark = be32_to_cpu(tunnel->parms.o_key);
-
 	switch (protocol) {
 	case IPPROTO_ESP:
 		esph = (struct ip_esp_hdr *)(skb->data+(iph->ihl<<2));
@@ -347,18 +351,38 @@ static int vti4_err(struct sk_buff *skb, u32 info)
 		return 0;
 	}
 
-	x = xfrm_state_lookup(net, mark, (const xfrm_address_t *)&iph->daddr,
-			      spi, protocol, AF_INET);
-	if (!x)
-		return 0;
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY, 0,
+				  iph->daddr, iph->saddr, 0);
+	if (tunnel) {
+		x = xfrm_state_lookup(net, be32_to_cpu(tunnel->parms.o_key),
+				      (xfrm_address_t *)&iph->daddr,
+				      spi, iph->protocol, AF_INET);
+	} else {
+		x = xfrm_state_lookup_loose(net, skb->mark,
+					    (xfrm_address_t *)&iph->daddr,
+					    spi, iph->protocol, AF_INET);
+		if (!x)
+			goto out;
+		tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_KEY,
+					  TUNNEL_LOOKUP_NO_KEY |
+					  TUNNEL_LOOKUP_OKEY,
+					  iph->daddr, iph->saddr,
+					  cpu_to_be32(x->mark.v));
+	}
+
+	if (!tunnel || !x)
+		goto out;
 
 	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
 		ipv4_update_pmtu(skb, net, info, 0, 0, protocol, 0);
 	else
 		ipv4_redirect(skb, net, 0, 0, protocol, 0);
-	xfrm_state_put(x);
 
-	return 0;
+out:
+	if (x)
+		xfrm_state_put(x);
+
+	return tunnel ? 0 : -1;
 }
 
 static int
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 5994fedd19..bf64821b8a 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -78,11 +78,21 @@ struct vti6_net {
 #define for_each_vti6_tunnel_rcu(start) \
 	for (t = rcu_dereference(start); t; t = rcu_dereference(t->next))
 
+static bool vti6_match_key(const struct ip6_tnl *t, __be32 key, bool in)
+{
+	__be16 tunnel_key = in ? t->parms.i_key : t->parms.o_key;
+	__be16 flags = in ? t->parms.i_flags : t->parms.o_flags;
+
+	return !(flags & TUNNEL_KEY) || tunnel_key == key;
+}
+
 /**
- * vti6_tnl_lookup - fetch tunnel matching the end-point addresses
+ * vti6_tnl_lookup - fetch tunnel matching the end-point addresses and key
  *   @net: network namespace
  *   @remote: the address of the tunnel exit-point
  *   @local: the address of the tunnel entry-point
+ *   @key: the key of the tunnel
+ *   @in: whether to match i_key or i_key
  *
  * Return:
  *   tunnel matching given end-points if found,
@@ -91,7 +101,7 @@ struct vti6_net {
  **/
 static struct ip6_tnl *
 vti6_tnl_lookup(struct net *net, const struct in6_addr *remote,
-		const struct in6_addr *local)
+		const struct in6_addr *local, __be32 key, bool in)
 {
 	unsigned int hash = HASH(remote, local);
 	struct ip6_tnl *t;
@@ -101,6 +111,7 @@ vti6_tnl_lookup(struct net *net, const struct in6_addr *remote,
 	for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
 		    ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    vti6_match_key(t, key, in) &&
 		    (t->dev->flags & IFF_UP))
 			return t;
 	}
@@ -109,6 +120,7 @@ vti6_tnl_lookup(struct net *net, const struct in6_addr *remote,
 	hash = HASH(&any, local);
 	for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
+		    vti6_match_key(t, key, in) &&
 		    (t->dev->flags & IFF_UP))
 			return t;
 	}
@@ -116,6 +128,7 @@ vti6_tnl_lookup(struct net *net, const struct in6_addr *remote,
 	hash = HASH(remote, &any);
 	for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
 		if (ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    vti6_match_key(t, key, in) &&
 		    (t->dev->flags & IFF_UP))
 			return t;
 	}
@@ -266,7 +279,8 @@ static struct ip6_tnl *vti6_locate(struct net *net, struct __ip6_tnl_parm *p,
 	     (t = rtnl_dereference(*tp)) != NULL;
 	     tp = &t->next) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_equal(remote, &t->parms.raddr)) {
+		    ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    vti6_match_key(t, p->i_key, true)) {
 			if (create)
 				return NULL;
 
@@ -304,11 +318,21 @@ vti6_find_tunnel(struct sk_buff *skb, __be32 spi, struct xfrm_state **x)
 	struct net *net = dev_net(skb->dev);
 	struct ip6_tnl *t;
 
-	t = vti6_tnl_lookup(net, &ipv6h->saddr, &ipv6h->daddr);
+	t = vti6_tnl_lookup(net, &ipv6h->saddr, &ipv6h->daddr, 0, true);
 	if (t) {
 		*x = xfrm_state_lookup(net, be32_to_cpu(t->parms.i_key),
 				       (xfrm_address_t *)&ipv6h->daddr,
 				       spi, ipv6h->nexthdr, AF_INET6);
+	} else {
+		*x = xfrm_state_lookup_loose(net, skb->mark,
+					     (xfrm_address_t *)&ipv6h->daddr,
+					     spi, ipv6h->nexthdr, AF_INET6);
+		if (!*x)
+			return NULL;
+		t =  vti6_tnl_lookup(net, &ipv6h->saddr, &ipv6h->daddr,
+				     cpu_to_be32((*x)->mark.v), true);
+		if (!t)
+			xfrm_state_put(*x);
 	}
 
 	return t;
@@ -613,7 +637,6 @@ static int vti6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		    u8 type, u8 code, int offset, __be32 info)
 {
 	__be32 spi;
-	__u32 mark;
 	struct xfrm_state *x;
 	struct ip6_tnl *t;
 	struct ip_esp_hdr *esph;
@@ -623,12 +646,6 @@ static int vti6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	const struct ipv6hdr *iph = (const struct ipv6hdr *)skb->data;
 	int protocol = iph->nexthdr;
 
-	t = vti6_tnl_lookup(dev_net(skb->dev), &iph->daddr, &iph->saddr);
-	if (!t)
-		return -1;
-
-	mark = be32_to_cpu(t->parms.o_key);
-
 	switch (protocol) {
 	case IPPROTO_ESP:
 		esph = (struct ip_esp_hdr *)(skb->data + offset);
@@ -650,19 +667,35 @@ static int vti6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	    type != NDISC_REDIRECT)
 		return 0;
 
-	x = xfrm_state_lookup(net, mark, (const xfrm_address_t *)&iph->daddr,
-			      spi, protocol, AF_INET6);
-	if (!x)
-		return 0;
+	t = vti6_tnl_lookup(net, &iph->daddr, &iph->saddr, 0, false);
+	if (t) {
+		x = xfrm_state_lookup(net, be32_to_cpu(t->parms.o_key),
+				      (xfrm_address_t *)&iph->daddr,
+				      spi, protocol, AF_INET6);
+	} else {
+		x = xfrm_state_lookup_loose(net, skb->mark,
+					    (xfrm_address_t *)&iph->daddr,
+					    spi, protocol, AF_INET6);
+		if (!x)
+			goto out;
+		t = vti6_tnl_lookup(net, &iph->daddr, &iph->saddr,
+				    cpu_to_be32(x->mark.v), false);
+	}
+
+	if (!t || !x)
+		goto out;
 
 	if (type == NDISC_REDIRECT)
 		ip6_redirect(skb, net, skb->dev->ifindex, 0,
 			     sock_net_uid(net, NULL));
 	else
 		ip6_update_pmtu(skb, net, info, 0, 0, sock_net_uid(net, NULL));
-	xfrm_state_put(x);
 
-	return 0;
+out:
+	if (x)
+		xfrm_state_put(x);
+
+	return t ? 0 : -1;
 }
 
 static void vti6_link_config(struct ip6_tnl *t)
-- 
2.15.1.620.gb9897f4670-goog

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH ipsec-next 6/7] net: xfrm: Allow userspace to configure keyed VTI tunnels.
  2017-12-20 17:06 [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Lorenzo Colitti
                   ` (4 preceding siblings ...)
  2017-12-20 17:06 ` [PATCH ipsec-next 5/7] net: xfrm: Deliver packets to keyed VTI tunnels Lorenzo Colitti
@ 2017-12-20 17:06 ` Lorenzo Colitti
  2017-12-20 17:06 ` [PATCH ipsec-next 7/7] net: xfrm: Don't pass tunnel objects to xfrm6_rcv_spi Lorenzo Colitti
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Colitti @ 2017-12-20 17:06 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, davem, Lorenzo Colitti

This commit allows userspace to configure keyed VTI tunnels by
adding a IFLA_VTI_FLAGS attribute and a VTI_KEYED flag. When set,
the flag causes the tunnel parameter i_flags to be set to
TUNNEL_KEY.

Creating both a non-keyed VTI and a keyed VTI on the same IP
src+dst pair is not useful. Because non-keyed VTIs always accept
packets, in such a configuration the keyed VTI would not receive
any traffic. This is disallowed by modifying the ip_tunnel_find
and vti6_locate functions to treat VTIs on the same src+dst pair
as identical unless they are both keyed (in which case they can
coexist, by design). So attempts to create such duplicate tunnels
- or to change one tunnel in such a way that it would duplicate
another - will fail with EEXIST.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/uapi/linux/if_tunnel.h |  4 ++++
 net/ipv4/ip_tunnel.c           | 10 +++++++++-
 net/ipv4/ip_vti.c              | 26 +++++++++++++++++++++++---
 net/ipv6/ip6_vti.c             | 33 +++++++++++++++++++++++++++++++--
 4 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 1b3d148c45..b431b1c209 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -148,6 +148,9 @@ enum {
 /* VTI-mode i_flags */
 #define VTI_ISVTI ((__force __be16)0x0001)
 
+/* VTI netlink iflags. */
+#define VTI_KEYED 0x0001
+
 enum {
 	IFLA_VTI_UNSPEC,
 	IFLA_VTI_LINK,
@@ -156,6 +159,7 @@ enum {
 	IFLA_VTI_LOCAL,
 	IFLA_VTI_REMOTE,
 	IFLA_VTI_FWMARK,
+	IFLA_VTI_FLAGS,
 	__IFLA_VTI_MAX,
 };
 
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index f45968bb81..9a0a56b491 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -84,6 +84,14 @@ static bool ip_tunnel_key_match(const struct ip_tunnel_parm *p,
 		return !(flags & TUNNEL_KEY);
 }
 
+static bool ip_tunnel_match(const struct ip_tunnel_parm *p,
+			    __be32 flags, u8 lookup_flags, __be32 key)
+{
+	return ip_tunnel_key_match(p, flags, lookup_flags, key) ||
+	       ((p->i_flags & flags & VTI_ISVTI) &&
+		!(p->i_flags & flags & TUNNEL_KEY));
+}
+
 /* Fallback tunnel: no source, no destination, no key, no options
 
    Tunnel hash table:
@@ -242,7 +250,7 @@ static struct ip_tunnel *ip_tunnel_find(struct ip_tunnel_net *itn,
 		    remote == t->parms.iph.daddr &&
 		    link == t->parms.link &&
 		    type == t->dev->type &&
-		    ip_tunnel_key_match(&t->parms, flags, 0, key))
+		    ip_tunnel_match(&t->parms, flags, 0, key))
 			break;
 	}
 	return t;
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 9d28433a60..1f52719228 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -385,6 +385,16 @@ static int vti4_err(struct sk_buff *skb, u32 info)
 	return tunnel ? 0 : -1;
 }
 
+static __be16 vti_flags_to_tnl_flags(__u16 flags)
+{
+	return VTI_ISVTI | ((flags & VTI_KEYED) ? TUNNEL_KEY : 0);
+}
+
+static __u16 tnl_flags_to_vti_flags(__be16 i_flags)
+{
+	return (i_flags & TUNNEL_KEY) ? VTI_KEYED : 0;
+}
+
 static int
 vti_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 {
@@ -525,6 +535,8 @@ static void vti_netlink_parms(struct nlattr *data[],
 			      struct ip_tunnel_parm *parms,
 			      __u32 *fwmark)
 {
+	__u16 flags = 0;
+
 	memset(parms, 0, sizeof(*parms));
 
 	parms->iph.protocol = IPPROTO_IPIP;
@@ -532,8 +544,6 @@ static void vti_netlink_parms(struct nlattr *data[],
 	if (!data)
 		return;
 
-	parms->i_flags = VTI_ISVTI;
-
 	if (data[IFLA_VTI_LINK])
 		parms->link = nla_get_u32(data[IFLA_VTI_LINK]);
 
@@ -551,6 +561,11 @@ static void vti_netlink_parms(struct nlattr *data[],
 
 	if (data[IFLA_VTI_FWMARK])
 		*fwmark = nla_get_u32(data[IFLA_VTI_FWMARK]);
+
+	if (data[IFLA_VTI_FLAGS])
+		flags = nla_get_u16(data[IFLA_VTI_FLAGS]);
+
+	parms->i_flags = vti_flags_to_tnl_flags(flags);
 }
 
 static int vti_newlink(struct net *src_net, struct net_device *dev,
@@ -591,6 +606,8 @@ static size_t vti_get_size(const struct net_device *dev)
 		nla_total_size(4) +
 		/* IFLA_VTI_FWMARK */
 		nla_total_size(4) +
+		/* IFLA_VTI_FLAGS */
+		nla_total_size(2) +
 		0;
 }
 
@@ -604,7 +621,9 @@ static int vti_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_be32(skb, IFLA_VTI_OKEY, p->o_key) ||
 	    nla_put_in_addr(skb, IFLA_VTI_LOCAL, p->iph.saddr) ||
 	    nla_put_in_addr(skb, IFLA_VTI_REMOTE, p->iph.daddr) ||
-	    nla_put_u32(skb, IFLA_VTI_FWMARK, t->fwmark))
+	    nla_put_u32(skb, IFLA_VTI_FWMARK, t->fwmark) ||
+	    nla_put_u16(skb, IFLA_VTI_FLAGS,
+			tnl_flags_to_vti_flags(p->i_flags)))
 		return -EMSGSIZE;
 
 	return 0;
@@ -617,6 +636,7 @@ static const struct nla_policy vti_policy[IFLA_VTI_MAX + 1] = {
 	[IFLA_VTI_LOCAL]	= { .len = FIELD_SIZEOF(struct iphdr, saddr) },
 	[IFLA_VTI_REMOTE]	= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
 	[IFLA_VTI_FWMARK]	= { .type = NLA_U32 },
+	[IFLA_VTI_FLAGS]	= { .type = NLA_U16 },
 };
 
 static struct rtnl_link_ops vti_link_ops __read_mostly = {
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index bf64821b8a..18c2695dc3 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -86,6 +86,13 @@ static bool vti6_match_key(const struct ip6_tnl *t, __be32 key, bool in)
 	return !(flags & TUNNEL_KEY) || tunnel_key == key;
 }
 
+static bool vti6_match_tunnel(const struct ip6_tnl *t, struct __ip6_tnl_parm *p)
+{
+	return !(t->parms.i_flags & TUNNEL_KEY) ||
+	       !(p->i_flags & TUNNEL_KEY) ||
+	       vti6_match_key(t, p->i_key, true);
+}
+
 /**
  * vti6_tnl_lookup - fetch tunnel matching the end-point addresses and key
  *   @net: network namespace
@@ -280,7 +287,7 @@ static struct ip6_tnl *vti6_locate(struct net *net, struct __ip6_tnl_parm *p,
 	     tp = &t->next) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
 		    ipv6_addr_equal(remote, &t->parms.raddr) &&
-		    vti6_match_key(t, p->i_key, true)) {
+		    vti6_match_tunnel(t, p)) {
 			if (create)
 				return NULL;
 
@@ -990,9 +997,21 @@ static int vti6_validate(struct nlattr *tb[], struct nlattr *data[],
 	return 0;
 }
 
+static __be16 vti_flags_to_tnl_flags(__u16 i_flags)
+{
+	return VTI_ISVTI | ((i_flags & VTI_KEYED) ? TUNNEL_KEY : 0);
+}
+
+static __u16 tnl_flags_to_vti_flags(__be16 i_flags)
+{
+	return (i_flags & TUNNEL_KEY) ? VTI_KEYED : 0;
+}
+
 static void vti6_netlink_parms(struct nlattr *data[],
 			       struct __ip6_tnl_parm *parms)
 {
+	__u16 flags = 0;
+
 	memset(parms, 0, sizeof(*parms));
 
 	if (!data)
@@ -1015,6 +1034,11 @@ static void vti6_netlink_parms(struct nlattr *data[],
 
 	if (data[IFLA_VTI_FWMARK])
 		parms->fwmark = nla_get_u32(data[IFLA_VTI_FWMARK]);
+
+	if (data[IFLA_VTI_FLAGS])
+		flags = nla_get_u16(data[IFLA_VTI_FLAGS]);
+
+	parms->i_flags = vti_flags_to_tnl_flags(flags);
 }
 
 static int vti6_newlink(struct net *src_net, struct net_device *dev,
@@ -1084,6 +1108,8 @@ static size_t vti6_get_size(const struct net_device *dev)
 		nla_total_size(4) +
 		/* IFLA_VTI_FWMARK */
 		nla_total_size(4) +
+		/* IFLA_VTI_FLAGS */
+		nla_total_size(2) +
 		0;
 }
 
@@ -1097,7 +1123,9 @@ static int vti6_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_in6_addr(skb, IFLA_VTI_REMOTE, &parm->raddr) ||
 	    nla_put_be32(skb, IFLA_VTI_IKEY, parm->i_key) ||
 	    nla_put_be32(skb, IFLA_VTI_OKEY, parm->o_key) ||
-	    nla_put_u32(skb, IFLA_VTI_FWMARK, parm->fwmark))
+	    nla_put_u32(skb, IFLA_VTI_FWMARK, parm->fwmark) ||
+	    nla_put_u16(skb, IFLA_VTI_FLAGS,
+			tnl_flags_to_vti_flags(parm->i_flags)))
 		goto nla_put_failure;
 	return 0;
 
@@ -1112,6 +1140,7 @@ static const struct nla_policy vti6_policy[IFLA_VTI_MAX + 1] = {
 	[IFLA_VTI_IKEY]		= { .type = NLA_U32 },
 	[IFLA_VTI_OKEY]		= { .type = NLA_U32 },
 	[IFLA_VTI_FWMARK]	= { .type = NLA_U32 },
+	[IFLA_VTI_FLAGS]	= { .type = NLA_U16 },
 };
 
 static struct rtnl_link_ops vti6_link_ops __read_mostly = {
-- 
2.15.1.620.gb9897f4670-goog

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH ipsec-next 7/7] net: xfrm: Don't pass tunnel objects to xfrm6_rcv_spi.
  2017-12-20 17:06 [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Lorenzo Colitti
                   ` (5 preceding siblings ...)
  2017-12-20 17:06 ` [PATCH ipsec-next 6/7] net: xfrm: Allow userspace to configure " Lorenzo Colitti
@ 2017-12-20 17:06 ` Lorenzo Colitti
  2018-01-03 12:10 ` [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Steffen Klassert
  2018-01-05 14:32 ` [ipsec-next,0/7] : " Antony Antony
  8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Colitti @ 2017-12-20 17:06 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, davem, Lorenzo Colitti

This change removes the tunnel parameter from xfrm6_rcv_spi and
deletes xfrm6_rcv_tnl. These were only used by the VTI code and
are now unused.

Tested: https://android-review.googlesource.com/571524
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/net/xfrm.h      |  4 +---
 net/ipv6/ip6_vti.c      |  2 +-
 net/ipv6/xfrm6_input.c  | 13 +++----------
 net/ipv6/xfrm6_tunnel.c |  2 +-
 4 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 3d245f2f6f..fc19dda73c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1638,10 +1638,8 @@ int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
 void xfrm4_local_error(struct sk_buff *skb, u32 mtu);
 int xfrm6_extract_header(struct sk_buff *skb);
 int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
-int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
-		  struct ip6_tnl *t);
+int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
 int xfrm6_transport_finish(struct sk_buff *skb, int async);
-int xfrm6_rcv_tnl(struct sk_buff *skb, struct ip6_tnl *t);
 int xfrm6_rcv(struct sk_buff *skb);
 int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 		     xfrm_address_t *saddr, u8 proto);
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 18c2695dc3..2ac0bfff0f 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -401,7 +401,7 @@ static int vti6_rcv(struct sk_buff *skb)
 	int nexthdr = skb_network_header(skb)[IP6CB(skb)->nhoff];
 
 	XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup = vti6_lookup;
-	return xfrm6_rcv_spi(skb, nexthdr, 0, NULL);
+	return xfrm6_rcv_spi(skb, nexthdr, 0);
 }
 
 static int vti6_rcv_cb(struct sk_buff *skb, int err)
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 6d1b734fef..5f20e30926 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -22,8 +22,7 @@ int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb)
 	return xfrm6_extract_header(skb);
 }
 
-int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
-		  struct ip6_tnl *t)
+int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 {
 	XFRM_SPI_SKB_CB(skb)->family = AF_INET6;
 	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
@@ -59,16 +58,10 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
 	return -1;
 }
 
-int xfrm6_rcv_tnl(struct sk_buff *skb, struct ip6_tnl *t)
-{
-	return xfrm6_rcv_spi(skb, skb_network_header(skb)[IP6CB(skb)->nhoff],
-			     0, t);
-}
-EXPORT_SYMBOL(xfrm6_rcv_tnl);
-
 int xfrm6_rcv(struct sk_buff *skb)
 {
-	return xfrm6_rcv_tnl(skb, NULL);
+	return xfrm6_rcv_spi(skb, skb_network_header(skb)[IP6CB(skb)->nhoff],
+			     0);
 }
 EXPORT_SYMBOL(xfrm6_rcv);
 int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index f85f0d7480..02161543a9 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -236,7 +236,7 @@ static int xfrm6_tunnel_rcv(struct sk_buff *skb)
 	__be32 spi;
 
 	spi = xfrm6_tunnel_spi_lookup(net, (const xfrm_address_t *)&iph->saddr);
-	return xfrm6_rcv_spi(skb, IPPROTO_IPV6, spi, NULL);
+	return xfrm6_rcv_spi(skb, IPPROTO_IPV6, spi);
 }
 
 static int xfrm6_tunnel_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
-- 
2.15.1.620.gb9897f4670-goog

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair
  2017-12-20 17:06 [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Lorenzo Colitti
                   ` (6 preceding siblings ...)
  2017-12-20 17:06 ` [PATCH ipsec-next 7/7] net: xfrm: Don't pass tunnel objects to xfrm6_rcv_spi Lorenzo Colitti
@ 2018-01-03 12:10 ` Steffen Klassert
  2018-01-04 16:41   ` Lorenzo Colitti
  2018-01-05 14:32 ` [ipsec-next,0/7] : " Antony Antony
  8 siblings, 1 reply; 12+ messages in thread
From: Steffen Klassert @ 2018-01-03 12:10 UTC (permalink / raw)
  To: Lorenzo Colitti; +Cc: netdev, subashab, nharold, davem

On Thu, Dec 21, 2017 at 02:06:00AM +0900, Lorenzo Colitti wrote:
> When using IPsec tunnel mode, VTIs provide many benefits compared
> to direct configuration of xfrm policies / states. However, one
> limitation is that there can only be one VTI between a given pair
> of IP addresses. This does not allow configuring multiple IPsec
> tunnels to the same security gateway. This is required by some
> deployments, for example I-WLAN [3GPP TS 24.327].
> 
> This patchset introduces a new VTI_KEYED flag that allows
> configuration of multiple VTIs between the same IP address
> pairs. The semantics are as follows:
> 
> - The output path is the same as current VTI behaviour, where a
>   routing lookup selects a VTI interface, and the VTI's okey
>   specifies the mark to use in the XFRM lookup.
> - The input and ICMP error paths instead work by first looking up
>   an SA with a loose match that ignores the mark. That mark is
>   then used to find the tunnel by ikey (for input packets) or
>   okey (for ICMP errors).
> 
> In order for ICMP errors to work, flags are added to the common
> IP lookup functions to ignore the tunnel ikey and to look up
> tunnels by okey instead of ikey.
> 
> On the same IP address pair, keyed VTIs can coexist with each
> other (as long as the ikeys are different), but cannot coexist
> with keyless VTIs. This is because the existing keyless VTI
> behaviour (which this series does not change) is to always accept
> packets matching an input policy, regardless of whether there is
> any matching XFRM state. Thus, the keyless VTI would accept the
> traffic for the keyed tunnel and drop it because it would not
> match the keyed tunnel's state.

I think this is not the way we should go. Instead of creating some
new sort of VTI interfaces that have incompatibilities with the old
ones, we should create something new.

The fact that you need new keyed VTIs looks a bit like a workaround
of the design limitations the VTI interfaces have. Unfortunately
this is not the only limitation of VTI and I think we don't get what
we really want by changing VTI without breaking existing userspace.

The problem is that VTI interfaces are IP tunnels, and this is
not the thing we need. The tunnel is already implemented in the
generic xfrm code. All we need is some interface we can route
through. In particular we need something that can work with
transport mode too.

I showed already some ideas on creating xfrm interfaces at the
IPsec workshop in Seoul and my plan is to discuss this at the
upcomming IPsec workshop, so that we get something everybody is
happy with. In particular I want to have feedback from the
userspace IPsec IKE developers before we change/create something.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair
  2018-01-03 12:10 ` [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Steffen Klassert
@ 2018-01-04 16:41   ` Lorenzo Colitti
  2018-01-05  7:16     ` Steffen Klassert
  0 siblings, 1 reply; 12+ messages in thread
From: Lorenzo Colitti @ 2018-01-04 16:41 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: netdev, Subash Abhinov Kasiviswanathan, Nathan Harold,
	David Miller

On Wed, Jan 3, 2018 at 9:10 PM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> The fact that you need new keyed VTIs looks a bit like a workaround
> of the design limitations the VTI interfaces have. Unfortunately
> this is not the only limitation of VTI and I think we don't get what
> we really want by changing VTI without breaking existing userspace.

Actually, I added the flag mostly to ensure that there would be no
changes in behaviour at all - so, for example, to return EEXIST if
someone tried to create two VTIs on the same IP address pair without
the flag. But perhaps that's not important. It's unlikely that anyone
would be trying to do this, since it has always returned an error.

If it's indeed not important, then I think it may be possible to fix
the limitations that stop there from being two VTIs with the same IP
address pair without introducing a new flag or userspace-visible
changes. (I don't think it's too far off what this patch series does
today.) If existing setups that only have one VTI per IP address pair
continue to work as before, but setups where there is more than one
VTI per IP address pair now work in some way, would that be
acceptable?

> The problem is that VTI interfaces are IP tunnels, and this is
> not the thing we need. The tunnel is already implemented in the
> generic xfrm code. All we need is some interface we can route
> through. In particular we need something that can work with
> transport mode too.

Well, I'm not sure. Personally I think VTIs are a pretty natural fit
for tunnel mode IPsec. For example, they provide an easy way to assign
an IP address to an IPsec tunnel which is then used for packets
originated on that tunnel. That doesn't really make much sense in
transport mode, because in transport mode the IP addresses used are
the ones of the physical interfaces that send the packets.

> I showed already some ideas on creating xfrm interfaces at the
> IPsec workshop in Seoul and my plan is to discuss this at the
> upcomming IPsec workshop, so that we get something everybody is
> happy with. In particular I want to have feedback from the
> userspace IPsec IKE developers before we change/create something.

I did look at the code in the ipsec-next-xfrmi tree for a while -
wrote some tests for it, etc. The main reason I didn't pursue it is
that, as written, it couldn't support our use cases. The main reasons
were:

1. It needs to be bound to a specific underlying interface. It looks
like that interface must have a 6-byte hardware address (and thus
can't be a cellular interface), but I'm not 100% sure. By contrast,
the VTI supports an optional underlying link index, and doesn't pose
any requirements on hardware addresses. If it's possible to make the
underlying interface optional, by storing the underlying ifindex
instead of the dev (like tunnels do) then that might work.
2. It cannot use the output mark to influence routing of the
transformed packets, because it uses the output mark/mask for its own
purposes. Unfortunately, influencing routing of the transformed
packets was the reason we proposed XFRMA_OUTPUT_MARK in the first
place, so this is a showstopper :-(. Do you recall why you used the
output mark for this, as opposed to the SA mark? If it's possible to
use the SA mark instead, that might work.

If you're willing to evolve the xfrmi design in response to our
feedback, I can try to make the xfrmi code fit our use cases and send
patches over the next couple of weeks. But I don't think we can wait
until the discussion at the ipsec workshop to discover whether xfrmi
might be a feasible solution for us. By then we'll either have had to
do something out of tree (likely the keyed VTI patches, or something
like them) or postponed this work to a future release.

Thoughts?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair
  2018-01-04 16:41   ` Lorenzo Colitti
@ 2018-01-05  7:16     ` Steffen Klassert
  0 siblings, 0 replies; 12+ messages in thread
From: Steffen Klassert @ 2018-01-05  7:16 UTC (permalink / raw)
  To: Lorenzo Colitti
  Cc: netdev, Subash Abhinov Kasiviswanathan, Nathan Harold,
	David Miller

On Fri, Jan 05, 2018 at 01:41:46AM +0900, Lorenzo Colitti wrote:
> On Wed, Jan 3, 2018 at 9:10 PM, Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> > The fact that you need new keyed VTIs looks a bit like a workaround
> > of the design limitations the VTI interfaces have. Unfortunately
> > this is not the only limitation of VTI and I think we don't get what
> > we really want by changing VTI without breaking existing userspace.
> 
> Actually, I added the flag mostly to ensure that there would be no
> changes in behaviour at all - so, for example, to return EEXIST if
> someone tried to create two VTIs on the same IP address pair without
> the flag. But perhaps that's not important. It's unlikely that anyone
> would be trying to do this, since it has always returned an error.
> 
> If it's indeed not important, then I think it may be possible to fix
> the limitations that stop there from being two VTIs with the same IP
> address pair without introducing a new flag or userspace-visible
> changes. (I don't think it's too far off what this patch series does
> today.) If existing setups that only have one VTI per IP address pair
> continue to work as before, but setups where there is more than one
> VTI per IP address pair now work in some way, would that be
> acceptable?

Well, it would be acceptable to support this. The reason why I don't
want it is that we already had problems with such extensions in the
past. The VTI interfaces were developed for one special usecase,
then extended to try to support more. The result was that some user
configurations did not work anymore. And still it has design
limitations that we can't work around.

Also we changed generic tunnel lookup code to flag that we use GRE
keys for something that is not a GRE key but a VTI interface marker.
Your patches need to extend the generic tunnel lookup code again
for this very special usecase. I just don't want to mess around
too much with this.

> 
> > The problem is that VTI interfaces are IP tunnels, and this is
> > not the thing we need. The tunnel is already implemented in the
> > generic xfrm code. All we need is some interface we can route
> > through. In particular we need something that can work with
> > transport mode too.
> 
> Well, I'm not sure. Personally I think VTIs are a pretty natural fit
> for tunnel mode IPsec. For example, they provide an easy way to assign
> an IP address to an IPsec tunnel which is then used for packets
> originated on that tunnel.

The IPsec tunnel endpoint addresses are already defined by the SA,
so there is no need to define them again at the interface. All we
need is some marker (maybe a new lookup key) to assign the SAs
to a certain VTI interface.

> That doesn't really make much sense in
> transport mode, because in transport mode the IP addresses used are
> the ones of the physical interfaces that send the packets.

Right, and this is one of the limitations we can't overcome with
a VTI. So we need to find a new solution for transport mode anyway.

> 
> > I showed already some ideas on creating xfrm interfaces at the
> > IPsec workshop in Seoul and my plan is to discuss this at the
> > upcomming IPsec workshop, so that we get something everybody is
> > happy with. In particular I want to have feedback from the
> > userspace IPsec IKE developers before we change/create something.
> 
> I did look at the code in the ipsec-next-xfrmi tree for a while -
> wrote some tests for it, etc.

To give people on the list a chance to follow what we are talking about,
here is the link to the code:

https://git.kernel.org/pub/scm/linux/kernel/git/klassert/linux-stk.git/log/?h=ipsec-next-xfrmi

> The main reason I didn't pursue it is
> that, as written, it couldn't support our use cases. The main reasons
> were:
> 
> 1. It needs to be bound to a specific underlying interface. It looks
> like that interface must have a 6-byte hardware address (and thus
> can't be a cellular interface), but I'm not 100% sure. By contrast,
> the VTI supports an optional underlying link index, and doesn't pose
> any requirements on hardware addresses. If it's possible to make the
> underlying interface optional, by storing the underlying ifindex
> instead of the dev (like tunnels do) then that might work.

This should be possible.

> 2. It cannot use the output mark to influence routing of the
> transformed packets, because it uses the output mark/mask for its own
> purposes. Unfortunately, influencing routing of the transformed
> packets was the reason we proposed XFRMA_OUTPUT_MARK in the first
> place, so this is a showstopper :-(. Do you recall why you used the
> output mark for this, as opposed to the SA mark? If it's possible to
> use the SA mark instead, that might work.

Well, using the output mark was an easy way to get a lookup working
for the first version. I already noticed that it was not a so good
idea. Maybe we need some new lookup key for IPsec lookups...

> 
> If you're willing to evolve the xfrmi design in response to our
> feedback, I can try to make the xfrmi code fit our use cases and send
> patches over the next couple of weeks.

Yes, sure :-)

The current code is just a discussion base, it was likely that
it changes.

> But I don't think we can wait
> until the discussion at the ipsec workshop to discover whether xfrmi
> might be a feasible solution for us. By then we'll either have had to
> do something out of tree (likely the keyed VTI patches, or something
> like them) or postponed this work to a future release.

I don't want to rush with this, I want to have feedback from as many
potential users as possible to be sure to end up with the right thing.
I really want to avoid to have yet another inerface that is almost what
we need, like VTI is.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [ipsec-next,0/7] : Support multiple VTIs with the same src+dst pair
  2017-12-20 17:06 [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Lorenzo Colitti
                   ` (7 preceding siblings ...)
  2018-01-03 12:10 ` [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Steffen Klassert
@ 2018-01-05 14:32 ` Antony Antony
  8 siblings, 0 replies; 12+ messages in thread
From: Antony Antony @ 2018-01-05 14:32 UTC (permalink / raw)
  To: Lorenzo Colitti; +Cc: netdev, steffen.klassert, subashab, nharold, davem

Hi Lorenzo,
I agree vti is very limiting! I am glad to hear about xfrmi.

I saw two tunnels between gateways send traffic using VTI. So I am curious 
what is different in your case. Or are you dealing with something else?

Here are a couple of outputs from libreswan testing 

this is the verbose output.
https://swantest.libreswan.fi/results/blackswan/2018-01-05-swantest-3.22-596-g6bac0de35-master/netkey-vti-09/OUTPUT/west.console.verbose.txt

[root@west netkey-vti-09]# ip xfrm state
src 192.1.2.23 dst 192.1.2.45
	proto esp spi 0x0eb97a56 reqid 16393 mode tunnel
	replay-window 32 flag af-unspec
	aead rfc4106(gcm(aes)) 
0xcf25becdc6c196ff04b9e893d2caa387ed63bc96cf66ab7c1e5c74d32b9eeb801f06fbf3 
128
src 192.1.2.45 dst 192.1.2.23
	proto esp spi 0x507929e8 reqid 16393 mode tunnel
	replay-window 32 flag af-unspec
	aead rfc4106(gcm(aes)) 
0x8b5f8143cb6f361e57b0096bc1a444a65e244f97ba76e8779b968bcae54b78cea7c17d38 
128
src 192.1.2.23 dst 192.1.2.45
	proto esp spi 0x73eab5cf reqid 16389 mode tunnel
	replay-window 32 flag af-unspec
	aead rfc4106(gcm(aes)) 
0xb00cc4d0da414cba35c9a41e176fdafb92af0a7912d633c1c3f4623bee2d090173e3fca3 
128
src 192.1.2.45 dst 192.1.2.23
	proto esp spi 0x52581beb reqid 16389 mode tunnel
	replay-window 32 flag af-unspec
	aead rfc4106(gcm(aes)) 
0x01b906499806ec2436ffe125f54ce297c3a3a14bb19fcbf51f16888c6927a20cd0dc5e67 
128

a santized output 
https://swantest.libreswan.fi/results/blackswan/2018-01-05-swantest-3.22-596-g6bac0de35-master/netkey-vti-09/OUTPUT/west.console.txt

Also from an older test:
https://swantest.libreswan.fi/results/blackswan/2018-01-04-swantest-3.22-596-g6bac0de35-master/netkey-vti-02/OUTPUT/west.console.txt

regards,
-antony

On Thu, Dec 21, 2017 at 02:06:00AM +0900, Lorenzo Colitti wrote:
> When using IPsec tunnel mode, VTIs provide many benefits compared
> to direct configuration of xfrm policies / states. However, one
> limitation is that there can only be one VTI between a given pair
> of IP addresses. This does not allow configuring multiple IPsec
> tunnels to the same security gateway. This is required by some
> deployments, for example I-WLAN [3GPP TS 24.327].
> 
> This patchset introduces a new VTI_KEYED flag that allows
> configuration of multiple VTIs between the same IP address
> pairs. The semantics are as follows:
> 
> - The output path is the same as current VTI behaviour, where a
>   routing lookup selects a VTI interface, and the VTI's okey
>   specifies the mark to use in the XFRM lookup.
> - The input and ICMP error paths instead work by first looking up
>   an SA with a loose match that ignores the mark. That mark is
>   then used to find the tunnel by ikey (for input packets) or
>   okey (for ICMP errors).
> 
> In order for ICMP errors to work, flags are added to the common
> IP lookup functions to ignore the tunnel ikey and to look up
> tunnels by okey instead of ikey.
> 
> On the same IP address pair, keyed VTIs can coexist with each
> other (as long as the ikeys are different), but cannot coexist
> with keyless VTIs. This is because the existing keyless VTI
> behaviour (which this series does not change) is to always accept
> packets matching an input policy, regardless of whether there is
> any matching XFRM state. Thus, the keyless VTI would accept the
> traffic for the keyed tunnel and drop it because it would not
> match the keyed tunnel's state.
> 
> Changes from RFC series:
> - Processing of ICMP errors now works when ikey != okey.
> - Series now contains changes to the common tunnel lookup
>   functions to match tunnels by okey and to ignore ikey when
>   matching.
> - Fixed missing EXPORT_SYMBOL for xfrm_state_lookup_loose.
> - Made vti6_lookup static as it should have been.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-01-05 14:32 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-12-20 17:06 [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Lorenzo Colitti
2017-12-20 17:06 ` [PATCH ipsec-next 1/7] net: xfrm: Don't check for TUNNEL_KEY when hashing VTI tunnels Lorenzo Colitti
2017-12-20 17:06 ` [PATCH ipsec-next 2/7] net: ipv4: Add new flags to tunnel lookup Lorenzo Colitti
2017-12-20 17:06 ` [PATCH ipsec-next 3/7] net: xfrm: Add an xfrm lookup that ignores the mark Lorenzo Colitti
2017-12-20 17:06 ` [PATCH ipsec-next 4/7] net: xfrm: Find VTI interfaces from xfrm_input Lorenzo Colitti
2017-12-20 17:06 ` [PATCH ipsec-next 5/7] net: xfrm: Deliver packets to keyed VTI tunnels Lorenzo Colitti
2017-12-20 17:06 ` [PATCH ipsec-next 6/7] net: xfrm: Allow userspace to configure " Lorenzo Colitti
2017-12-20 17:06 ` [PATCH ipsec-next 7/7] net: xfrm: Don't pass tunnel objects to xfrm6_rcv_spi Lorenzo Colitti
2018-01-03 12:10 ` [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair Steffen Klassert
2018-01-04 16:41   ` Lorenzo Colitti
2018-01-05  7:16     ` Steffen Klassert
2018-01-05 14:32 ` [ipsec-next,0/7] : " Antony Antony

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).