Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH RFC 3/5] net:stmmac: ensure we reclaim all dirty descriptors.
From: Jimmy PERCHET @ 2013-10-18  8:32 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: peppe.cavallaro, netdev
In-Reply-To: <525ED0DD.9070201@cogentembedded.com>

Hello,

> 
>   The continuation line should stat right under &, according to thenetworking coding style.

Thanks for your comment,
I realized that there is several other issues regarding continuation line in my patch series.
I'll fix them in v2.

Best Regards,
Jimmy

^ permalink raw reply

* [PATCH 1/4] xfrm: Force SA to be lookup again if SA in acquire state
From: Steffen Klassert @ 2013-10-18  8:42 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1382085730-3415-1-git-send-email-steffen.klassert@secunet.com>

From: Fan Du <fan.du@windriver.com>

If SA is in the process of acquiring, which indicates this SA is more
promising and precise than the fall back option, i.e. using wild card
source address for searching less suitable SA.

So, here bail out, and try again.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_state.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index d6e7f98..b2117a1 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -815,7 +815,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 			xfrm_state_look_at(pol, x, fl, encap_family,
 					   &best, &acquire_in_progress, &error);
 	}
-	if (best)
+	if (best || acquire_in_progress)
 		goto found;
 
 	h_wildcard = xfrm_dst_hash(net, daddr, &saddr_wildcard, tmpl->reqid, encap_family);
-- 
1.7.9.5

^ permalink raw reply related

* pull request (net-next): ipsec-next 2013-10-18
From: Steffen Klassert @ 2013-10-18  8:42 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

1) Don't use a wildcard SA if a more precise one is in acquire state,
   from Fan Du.

2) Simplify the SA lookup when using wildcard source. We need to check
   only the destination in this case, from Fan Du.

3) Add a receive path hook for IPsec virtual tunnel interfaces
   to xfrm6_mode_tunnel.

4) Add support for IPsec virtual tunnel interfaces to ipv6.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit b32418705107265dfca5edfe2b547643e53a732e:

  bonding: RCUify bond_set_rx_mode() (2013-09-30 22:26:41 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master

for you to fetch changes up to ed1efb2aefbbc6f5a3da5b42158bfb753ba6fe82:

  ipv6: Add support for IPsec virtual tunnel interfaces (2013-10-10 12:00:01 +0200)

----------------------------------------------------------------
Fan Du (2):
      xfrm: Force SA to be lookup again if SA in acquire state
      xfrm: Simplify SA looking up when using wildcard source

Steffen Klassert (2):
      ipv6: Add a receive path hook for vti6 in xfrm6_mode_tunnel.
      ipv6: Add support for IPsec virtual tunnel interfaces

 include/net/xfrm.h           |    2 +
 net/ipv6/Kconfig             |   11 +
 net/ipv6/Makefile            |    1 +
 net/ipv6/ip6_vti.c           | 1056 ++++++++++++++++++++++++++++++++++++++++++
 net/ipv6/xfrm6_mode_tunnel.c |   69 +++
 net/xfrm/xfrm_state.c        |    4 +-
 6 files changed, 1141 insertions(+), 2 deletions(-)
 create mode 100644 net/ipv6/ip6_vti.c

^ permalink raw reply

* [PATCH 2/4] xfrm: Simplify SA looking up when using wildcard source
From: Steffen Klassert @ 2013-10-18  8:42 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1382085730-3415-1-git-send-email-steffen.klassert@secunet.com>

From: Fan Du <fan.du@windriver.com>

__xfrm4/6_state_addr_check is a four steps check, all we need to do
is checking whether the destination address match when looking SA
using wildcard source address. Passing saddr from flow is worst option,
as the checking needs to reach the fourth step while actually only
one time checking will do the work.

So, simplify this process by only checking destination address when
using wildcard source address for looking up SAs.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_state.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index b2117a1..68c2f35 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -824,7 +824,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 		    x->props.reqid == tmpl->reqid &&
 		    (mark & x->mark.m) == x->mark.v &&
 		    !(x->props.flags & XFRM_STATE_WILDRECV) &&
-		    xfrm_state_addr_check(x, daddr, saddr, encap_family) &&
+		    xfrm_addr_equal(&x->id.daddr, daddr, encap_family) &&
 		    tmpl->mode == x->props.mode &&
 		    tmpl->id.proto == x->id.proto &&
 		    (tmpl->id.spi == x->id.spi || !tmpl->id.spi))
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 3/4] ipv6: Add a receive path hook for vti6 in xfrm6_mode_tunnel.
From: Steffen Klassert @ 2013-10-18  8:42 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1382085730-3415-1-git-send-email-steffen.klassert@secunet.com>

Add a receive path hook for the IPsec vritual tunnel interface.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h           |    2 ++
 net/ipv6/xfrm6_mode_tunnel.c |   69 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index b8a9ed8..6b82fdf 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1508,6 +1508,8 @@ int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
 void xfrm4_local_error(struct sk_buff *skb, u32 mtu);
 int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel_notifier *handler);
 int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel_notifier *handler);
+int xfrm6_mode_tunnel_input_register(struct xfrm_tunnel_notifier *handler);
+int xfrm6_mode_tunnel_input_deregister(struct xfrm_tunnel_notifier *handler);
 int xfrm6_extract_header(struct sk_buff *skb);
 int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
 int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c
index 4770d51..cb04f7a 100644
--- a/net/ipv6/xfrm6_mode_tunnel.c
+++ b/net/ipv6/xfrm6_mode_tunnel.c
@@ -18,6 +18,65 @@
 #include <net/ipv6.h>
 #include <net/xfrm.h>
 
+/* Informational hook. The decap is still done here. */
+static struct xfrm_tunnel_notifier __rcu *rcv_notify_handlers __read_mostly;
+static DEFINE_MUTEX(xfrm6_mode_tunnel_input_mutex);
+
+int xfrm6_mode_tunnel_input_register(struct xfrm_tunnel_notifier *handler)
+{
+	struct xfrm_tunnel_notifier __rcu **pprev;
+	struct xfrm_tunnel_notifier *t;
+	int ret = -EEXIST;
+	int priority = handler->priority;
+
+	mutex_lock(&xfrm6_mode_tunnel_input_mutex);
+
+	for (pprev = &rcv_notify_handlers;
+	     (t = rcu_dereference_protected(*pprev,
+	     lockdep_is_held(&xfrm6_mode_tunnel_input_mutex))) != NULL;
+	     pprev = &t->next) {
+		if (t->priority > priority)
+			break;
+		if (t->priority == priority)
+			goto err;
+
+	}
+
+	handler->next = *pprev;
+	rcu_assign_pointer(*pprev, handler);
+
+	ret = 0;
+
+err:
+	mutex_unlock(&xfrm6_mode_tunnel_input_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(xfrm6_mode_tunnel_input_register);
+
+int xfrm6_mode_tunnel_input_deregister(struct xfrm_tunnel_notifier *handler)
+{
+	struct xfrm_tunnel_notifier __rcu **pprev;
+	struct xfrm_tunnel_notifier *t;
+	int ret = -ENOENT;
+
+	mutex_lock(&xfrm6_mode_tunnel_input_mutex);
+	for (pprev = &rcv_notify_handlers;
+	     (t = rcu_dereference_protected(*pprev,
+	     lockdep_is_held(&xfrm6_mode_tunnel_input_mutex))) != NULL;
+	     pprev = &t->next) {
+		if (t == handler) {
+			*pprev = handler->next;
+			ret = 0;
+			break;
+		}
+	}
+	mutex_unlock(&xfrm6_mode_tunnel_input_mutex);
+	synchronize_net();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(xfrm6_mode_tunnel_input_deregister);
+
 static inline void ipip6_ecn_decapsulate(struct sk_buff *skb)
 {
 	const struct ipv6hdr *outer_iph = ipv6_hdr(skb);
@@ -63,8 +122,15 @@ static int xfrm6_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 	return 0;
 }
 
+#define for_each_input_rcu(head, handler)	\
+	for (handler = rcu_dereference(head);	\
+	     handler != NULL;			\
+	     handler = rcu_dereference(handler->next))
+
+
 static int xfrm6_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 {
+	struct xfrm_tunnel_notifier *handler;
 	int err = -EINVAL;
 
 	if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPV6)
@@ -72,6 +138,9 @@ static int xfrm6_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 	if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
 		goto out;
 
+	for_each_input_rcu(rcv_notify_handlers, handler)
+		handler->handler(skb);
+
 	err = skb_unclone(skb, GFP_ATOMIC);
 	if (err)
 		goto out;
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 4/4] ipv6: Add support for IPsec virtual tunnel interfaces
From: Steffen Klassert @ 2013-10-18  8:42 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1382085730-3415-1-git-send-email-steffen.klassert@secunet.com>

This patch adds IPv6  support for IPsec virtual tunnel interfaces
(vti). IPsec virtual tunnel interfaces provide a routable interface
for IPsec tunnel endpoints.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/Kconfig   |   11 +
 net/ipv6/Makefile  |    1 +
 net/ipv6/ip6_vti.c | 1056 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1068 insertions(+)
 create mode 100644 net/ipv6/ip6_vti.c

diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 11b13ea..e1a8d90 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -153,6 +153,17 @@ config INET6_XFRM_MODE_ROUTEOPTIMIZATION
 	---help---
 	  Support for MIPv6 route optimization mode.
 
+config IPV6_VTI
+tristate "Virtual (secure) IPv6: tunneling"
+	select IPV6_TUNNEL
+	depends on INET6_XFRM_MODE_TUNNEL
+	---help---
+	Tunneling means encapsulating data of one protocol type within
+	another protocol and sending it over a channel that understands the
+	encapsulating protocol. This can be used with xfrm mode tunnel to give
+	the notion of a secure tunnel for IPSEC and then use routing protocol
+	on top.
+
 config IPV6_SIT
 	tristate "IPv6: IPv6-in-IPv4 tunnel (SIT driver)"
 	select INET_TUNNEL
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 470a9c0..17bb830 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -36,6 +36,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o
 obj-$(CONFIG_IPV6_MIP6) += mip6.o
 obj-$(CONFIG_NETFILTER)	+= netfilter/
 
+obj-$(CONFIG_IPV6_VTI) += ip6_vti.o
 obj-$(CONFIG_IPV6_SIT) += sit.o
 obj-$(CONFIG_IPV6_TUNNEL) += ip6_tunnel.o
 obj-$(CONFIG_IPV6_GRE) += ip6_gre.o
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
new file mode 100644
index 0000000..ed94ba6
--- /dev/null
+++ b/net/ipv6/ip6_vti.c
@@ -0,0 +1,1056 @@
+/*
+ *	IPv6 virtual tunneling interface
+ *
+ *	Copyright (C) 2013 secunet Security Networks AG
+ *
+ *	Author:
+ *	Steffen Klassert <steffen.klassert@secunet.com>
+ *
+ *	Based on:
+ *	net/ipv6/ip6_tunnel.c
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/capability.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/sockios.h>
+#include <linux/icmp.h>
+#include <linux/if.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/if_tunnel.h>
+#include <linux/net.h>
+#include <linux/in6.h>
+#include <linux/netdevice.h>
+#include <linux/if_arp.h>
+#include <linux/icmpv6.h>
+#include <linux/init.h>
+#include <linux/route.h>
+#include <linux/rtnetlink.h>
+#include <linux/netfilter_ipv6.h>
+#include <linux/slab.h>
+#include <linux/hash.h>
+
+#include <linux/uaccess.h>
+#include <linux/atomic.h>
+
+#include <net/icmp.h>
+#include <net/ip.h>
+#include <net/ip_tunnels.h>
+#include <net/ipv6.h>
+#include <net/ip6_route.h>
+#include <net/addrconf.h>
+#include <net/ip6_tunnel.h>
+#include <net/xfrm.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
+
+#define HASH_SIZE_SHIFT  5
+#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+
+static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
+{
+	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);
+
+	return hash_32(hash, HASH_SIZE_SHIFT);
+}
+
+static int vti6_dev_init(struct net_device *dev);
+static void vti6_dev_setup(struct net_device *dev);
+static struct rtnl_link_ops vti6_link_ops __read_mostly;
+
+static int vti6_net_id __read_mostly;
+struct vti6_net {
+	/* the vti6 tunnel fallback device */
+	struct net_device *fb_tnl_dev;
+	/* lists for storing tunnels in use */
+	struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+	struct ip6_tnl __rcu *tnls_wc[1];
+	struct ip6_tnl __rcu **tnls[2];
+};
+
+static struct net_device_stats *vti6_get_stats(struct net_device *dev)
+{
+	struct pcpu_tstats sum = { 0 };
+	int i;
+
+	for_each_possible_cpu(i) {
+		const struct pcpu_tstats *tstats = per_cpu_ptr(dev->tstats, i);
+
+		sum.rx_packets += tstats->rx_packets;
+		sum.rx_bytes   += tstats->rx_bytes;
+		sum.tx_packets += tstats->tx_packets;
+		sum.tx_bytes   += tstats->tx_bytes;
+	}
+	dev->stats.rx_packets = sum.rx_packets;
+	dev->stats.rx_bytes   = sum.rx_bytes;
+	dev->stats.tx_packets = sum.tx_packets;
+	dev->stats.tx_bytes   = sum.tx_bytes;
+	return &dev->stats;
+}
+
+#define for_each_vti6_tunnel_rcu(start) \
+	for (t = rcu_dereference(start); t; t = rcu_dereference(t->next))
+
+/**
+ * vti6_tnl_lookup - fetch tunnel matching the end-point addresses
+ *   @net: network namespace
+ *   @remote: the address of the tunnel exit-point
+ *   @local: the address of the tunnel entry-point
+ *
+ * Return:
+ *   tunnel matching given end-points if found,
+ *   else fallback tunnel if its device is up,
+ *   else %NULL
+ **/
+static struct ip6_tnl *
+vti6_tnl_lookup(struct net *net, const struct in6_addr *remote,
+		const struct in6_addr *local)
+{
+	unsigned int hash = HASH(remote, local);
+	struct ip6_tnl *t;
+	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+
+	for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
+		if (ipv6_addr_equal(local, &t->parms.laddr) &&
+		    ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    (t->dev->flags & IFF_UP))
+			return t;
+	}
+	t = rcu_dereference(ip6n->tnls_wc[0]);
+	if (t && (t->dev->flags & IFF_UP))
+		return t;
+
+	return NULL;
+}
+
+/**
+ * vti6_tnl_bucket - get head of list matching given tunnel parameters
+ *   @p: parameters containing tunnel end-points
+ *
+ * Description:
+ *   vti6_tnl_bucket() returns the head of the list matching the
+ *   &struct in6_addr entries laddr and raddr in @p.
+ *
+ * Return: head of IPv6 tunnel list
+ **/
+static struct ip6_tnl __rcu **
+vti6_tnl_bucket(struct vti6_net *ip6n, const struct __ip6_tnl_parm *p)
+{
+	const struct in6_addr *remote = &p->raddr;
+	const struct in6_addr *local = &p->laddr;
+	unsigned int h = 0;
+	int prio = 0;
+
+	if (!ipv6_addr_any(remote) || !ipv6_addr_any(local)) {
+		prio = 1;
+		h = HASH(remote, local);
+	}
+	return &ip6n->tnls[prio][h];
+}
+
+static void
+vti6_tnl_link(struct vti6_net *ip6n, struct ip6_tnl *t)
+{
+	struct ip6_tnl __rcu **tp = vti6_tnl_bucket(ip6n, &t->parms);
+
+	rcu_assign_pointer(t->next , rtnl_dereference(*tp));
+	rcu_assign_pointer(*tp, t);
+}
+
+static void
+vti6_tnl_unlink(struct vti6_net *ip6n, struct ip6_tnl *t)
+{
+	struct ip6_tnl __rcu **tp;
+	struct ip6_tnl *iter;
+
+	for (tp = vti6_tnl_bucket(ip6n, &t->parms);
+	     (iter = rtnl_dereference(*tp)) != NULL;
+	     tp = &iter->next) {
+		if (t == iter) {
+			rcu_assign_pointer(*tp, t->next);
+			break;
+		}
+	}
+}
+
+static void vti6_dev_free(struct net_device *dev)
+{
+	free_percpu(dev->tstats);
+	free_netdev(dev);
+}
+
+static int vti6_tnl_create2(struct net_device *dev)
+{
+	struct ip6_tnl *t = netdev_priv(dev);
+	struct net *net = dev_net(dev);
+	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+	int err;
+
+	err = vti6_dev_init(dev);
+	if (err < 0)
+		goto out;
+
+	err = register_netdevice(dev);
+	if (err < 0)
+		goto out;
+
+	strcpy(t->parms.name, dev->name);
+	dev->rtnl_link_ops = &vti6_link_ops;
+
+	dev_hold(dev);
+	vti6_tnl_link(ip6n, t);
+
+	return 0;
+
+out:
+	return err;
+}
+
+static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
+{
+	struct net_device *dev;
+	struct ip6_tnl *t;
+	char name[IFNAMSIZ];
+	int err;
+
+	if (p->name[0])
+		strlcpy(name, p->name, IFNAMSIZ);
+	else
+		sprintf(name, "ip6_vti%%d");
+
+	dev = alloc_netdev(sizeof(*t), name, vti6_dev_setup);
+	if (dev == NULL)
+		goto failed;
+
+	dev_net_set(dev, net);
+
+	t = netdev_priv(dev);
+	t->parms = *p;
+	t->net = dev_net(dev);
+
+	err = vti6_tnl_create2(dev);
+	if (err < 0)
+		goto failed_free;
+
+	return t;
+
+failed_free:
+	vti6_dev_free(dev);
+failed:
+	return NULL;
+}
+
+/**
+ * vti6_locate - find or create tunnel matching given parameters
+ *   @net: network namespace
+ *   @p: tunnel parameters
+ *   @create: != 0 if allowed to create new tunnel if no match found
+ *
+ * Description:
+ *   vti6_locate() first tries to locate an existing tunnel
+ *   based on @parms. If this is unsuccessful, but @create is set a new
+ *   tunnel device is created and registered for use.
+ *
+ * Return:
+ *   matching tunnel or NULL
+ **/
+static struct ip6_tnl *vti6_locate(struct net *net, struct __ip6_tnl_parm *p,
+				   int create)
+{
+	const struct in6_addr *remote = &p->raddr;
+	const struct in6_addr *local = &p->laddr;
+	struct ip6_tnl __rcu **tp;
+	struct ip6_tnl *t;
+	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+
+	for (tp = vti6_tnl_bucket(ip6n, p);
+	     (t = rtnl_dereference(*tp)) != NULL;
+	     tp = &t->next) {
+		if (ipv6_addr_equal(local, &t->parms.laddr) &&
+		    ipv6_addr_equal(remote, &t->parms.raddr))
+			return t;
+	}
+	if (!create)
+		return NULL;
+	return vti6_tnl_create(net, p);
+}
+
+/**
+ * vti6_dev_uninit - tunnel device uninitializer
+ *   @dev: the device to be destroyed
+ *
+ * Description:
+ *   vti6_dev_uninit() removes tunnel from its list
+ **/
+static void vti6_dev_uninit(struct net_device *dev)
+{
+	struct ip6_tnl *t = netdev_priv(dev);
+	struct net *net = dev_net(dev);
+	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+
+	if (dev == ip6n->fb_tnl_dev)
+		RCU_INIT_POINTER(ip6n->tnls_wc[0], NULL);
+	else
+		vti6_tnl_unlink(ip6n, t);
+	ip6_tnl_dst_reset(t);
+	dev_put(dev);
+}
+
+static int vti6_rcv(struct sk_buff *skb)
+{
+	struct ip6_tnl *t;
+	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+
+	rcu_read_lock();
+
+	if ((t = vti6_tnl_lookup(dev_net(skb->dev), &ipv6h->saddr,
+				 &ipv6h->daddr)) != NULL) {
+		struct pcpu_tstats *tstats;
+
+		if (t->parms.proto != IPPROTO_IPV6 && t->parms.proto != 0) {
+			rcu_read_unlock();
+			goto discard;
+		}
+
+		if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+			rcu_read_unlock();
+			return 0;
+		}
+
+		if (!ip6_tnl_rcv_ctl(t, &ipv6h->daddr, &ipv6h->saddr)) {
+			t->dev->stats.rx_dropped++;
+			rcu_read_unlock();
+			goto discard;
+		}
+
+		tstats = this_cpu_ptr(t->dev->tstats);
+		tstats->rx_packets++;
+		tstats->rx_bytes += skb->len;
+
+		skb->mark = 0;
+		secpath_reset(skb);
+		skb->dev = t->dev;
+
+		rcu_read_unlock();
+		return 0;
+	}
+	rcu_read_unlock();
+	return 1;
+
+discard:
+	kfree_skb(skb);
+	return 0;
+}
+
+/**
+ * vti6_addr_conflict - compare packet addresses to tunnel's own
+ *   @t: the outgoing tunnel device
+ *   @hdr: IPv6 header from the incoming packet
+ *
+ * Description:
+ *   Avoid trivial tunneling loop by checking that tunnel exit-point
+ *   doesn't match source of incoming packet.
+ *
+ * Return:
+ *   1 if conflict,
+ *   0 else
+ **/
+static inline bool
+vti6_addr_conflict(const struct ip6_tnl *t, const struct ipv6hdr *hdr)
+{
+	return ipv6_addr_equal(&t->parms.raddr, &hdr->saddr);
+}
+
+/**
+ * vti6_xmit - send a packet
+ *   @skb: the outgoing socket buffer
+ *   @dev: the outgoing tunnel device
+ **/
+static int vti6_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct net *net = dev_net(dev);
+	struct ip6_tnl *t = netdev_priv(dev);
+	struct net_device_stats *stats = &t->dev->stats;
+	struct dst_entry *dst = NULL, *ndst = NULL;
+	struct flowi6 fl6;
+	struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+	struct net_device *tdev;
+	int err = -1;
+
+	if ((t->parms.proto != IPPROTO_IPV6 && t->parms.proto != 0) ||
+	    !ip6_tnl_xmit_ctl(t) || vti6_addr_conflict(t, ipv6h))
+		return err;
+
+	dst = ip6_tnl_dst_check(t);
+	if (!dst) {
+		memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
+
+		ndst = ip6_route_output(net, NULL, &fl6);
+
+		if (ndst->error)
+			goto tx_err_link_failure;
+		ndst = xfrm_lookup(net, ndst, flowi6_to_flowi(&fl6), NULL, 0);
+		if (IS_ERR(ndst)) {
+			err = PTR_ERR(ndst);
+			ndst = NULL;
+			goto tx_err_link_failure;
+		}
+		dst = ndst;
+	}
+
+	if (!dst->xfrm || dst->xfrm->props.mode != XFRM_MODE_TUNNEL)
+		goto tx_err_link_failure;
+
+	tdev = dst->dev;
+
+	if (tdev == dev) {
+		stats->collisions++;
+		net_warn_ratelimited("%s: Local routing loop detected!\n",
+				     t->parms.name);
+		goto tx_err_dst_release;
+	}
+
+
+	skb_dst_drop(skb);
+	skb_dst_set_noref(skb, dst);
+
+	ip6tunnel_xmit(skb, dev);
+	if (ndst) {
+		dev->mtu = dst_mtu(ndst);
+		ip6_tnl_dst_store(t, ndst);
+	}
+
+	return 0;
+tx_err_link_failure:
+	stats->tx_carrier_errors++;
+	dst_link_failure(skb);
+tx_err_dst_release:
+	dst_release(ndst);
+	return err;
+}
+
+static netdev_tx_t
+vti6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct ip6_tnl *t = netdev_priv(dev);
+	struct net_device_stats *stats = &t->dev->stats;
+	int ret;
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IPV6):
+		ret = vti6_xmit(skb, dev);
+		break;
+	default:
+		goto tx_err;
+	}
+
+	if (ret < 0)
+		goto tx_err;
+
+	return NETDEV_TX_OK;
+
+tx_err:
+	stats->tx_errors++;
+	stats->tx_dropped++;
+	kfree_skb(skb);
+	return NETDEV_TX_OK;
+}
+
+static void vti6_link_config(struct ip6_tnl *t)
+{
+	struct dst_entry *dst;
+	struct net_device *dev = t->dev;
+	struct __ip6_tnl_parm *p = &t->parms;
+	struct flowi6 *fl6 = &t->fl.u.ip6;
+
+	memcpy(dev->dev_addr, &p->laddr, sizeof(struct in6_addr));
+	memcpy(dev->broadcast, &p->raddr, sizeof(struct in6_addr));
+
+	/* Set up flowi template */
+	fl6->saddr = p->laddr;
+	fl6->daddr = p->raddr;
+	fl6->flowi6_oif = p->link;
+	fl6->flowi6_mark = be32_to_cpu(p->i_key);
+	fl6->flowi6_proto = p->proto;
+	fl6->flowlabel = 0;
+
+	p->flags &= ~(IP6_TNL_F_CAP_XMIT | IP6_TNL_F_CAP_RCV |
+		      IP6_TNL_F_CAP_PER_PACKET);
+	p->flags |= ip6_tnl_get_cap(t, &p->laddr, &p->raddr);
+
+	if (p->flags & IP6_TNL_F_CAP_XMIT && p->flags & IP6_TNL_F_CAP_RCV)
+		dev->flags |= IFF_POINTOPOINT;
+	else
+		dev->flags &= ~IFF_POINTOPOINT;
+
+	dev->iflink = p->link;
+
+	if (p->flags & IP6_TNL_F_CAP_XMIT) {
+
+		dst = ip6_route_output(dev_net(dev), NULL, fl6);
+		if (dst->error)
+			return;
+
+		dst = xfrm_lookup(dev_net(dev), dst, flowi6_to_flowi(fl6),
+				  NULL, 0);
+		if (IS_ERR(dst))
+			return;
+
+		if (dst->dev) {
+			dev->hard_header_len = dst->dev->hard_header_len;
+
+			dev->mtu = dst_mtu(dst);
+
+			if (dev->mtu < IPV6_MIN_MTU)
+				dev->mtu = IPV6_MIN_MTU;
+		}
+		dst_release(dst);
+	}
+}
+
+/**
+ * vti6_tnl_change - update the tunnel parameters
+ *   @t: tunnel to be changed
+ *   @p: tunnel configuration parameters
+ *
+ * Description:
+ *   vti6_tnl_change() updates the tunnel parameters
+ **/
+static int
+vti6_tnl_change(struct ip6_tnl *t, const struct __ip6_tnl_parm *p)
+{
+	t->parms.laddr = p->laddr;
+	t->parms.raddr = p->raddr;
+	t->parms.link = p->link;
+	t->parms.i_key = p->i_key;
+	t->parms.o_key = p->o_key;
+	t->parms.proto = p->proto;
+	ip6_tnl_dst_reset(t);
+	vti6_link_config(t);
+	return 0;
+}
+
+static int vti6_update(struct ip6_tnl *t, struct __ip6_tnl_parm *p)
+{
+	struct net *net = dev_net(t->dev);
+	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+	int err;
+
+	vti6_tnl_unlink(ip6n, t);
+	synchronize_net();
+	err = vti6_tnl_change(t, p);
+	vti6_tnl_link(ip6n, t);
+	netdev_state_change(t->dev);
+	return err;
+}
+
+static void
+vti6_parm_from_user(struct __ip6_tnl_parm *p, const struct ip6_tnl_parm2 *u)
+{
+	p->laddr = u->laddr;
+	p->raddr = u->raddr;
+	p->link = u->link;
+	p->i_key = u->i_key;
+	p->o_key = u->o_key;
+	p->proto = u->proto;
+
+	memcpy(p->name, u->name, sizeof(u->name));
+}
+
+static void
+vti6_parm_to_user(struct ip6_tnl_parm2 *u, const struct __ip6_tnl_parm *p)
+{
+	u->laddr = p->laddr;
+	u->raddr = p->raddr;
+	u->link = p->link;
+	u->i_key = p->i_key;
+	u->o_key = p->o_key;
+	u->proto = p->proto;
+
+	memcpy(u->name, p->name, sizeof(u->name));
+}
+
+/**
+ * vti6_tnl_ioctl - configure vti6 tunnels from userspace
+ *   @dev: virtual device associated with tunnel
+ *   @ifr: parameters passed from userspace
+ *   @cmd: command to be performed
+ *
+ * Description:
+ *   vti6_ioctl() is used for managing vti6 tunnels
+ *   from userspace.
+ *
+ *   The possible commands are the following:
+ *     %SIOCGETTUNNEL: get tunnel parameters for device
+ *     %SIOCADDTUNNEL: add tunnel matching given tunnel parameters
+ *     %SIOCCHGTUNNEL: change tunnel parameters to those given
+ *     %SIOCDELTUNNEL: delete tunnel
+ *
+ *   The fallback device "ip6_vti0", created during module
+ *   initialization, can be used for creating other tunnel devices.
+ *
+ * Return:
+ *   0 on success,
+ *   %-EFAULT if unable to copy data to or from userspace,
+ *   %-EPERM if current process hasn't %CAP_NET_ADMIN set
+ *   %-EINVAL if passed tunnel parameters are invalid,
+ *   %-EEXIST if changing a tunnel's parameters would cause a conflict
+ *   %-ENODEV if attempting to change or delete a nonexisting device
+ **/
+static int
+vti6_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
+{
+	int err = 0;
+	struct ip6_tnl_parm2 p;
+	struct __ip6_tnl_parm p1;
+	struct ip6_tnl *t = NULL;
+	struct net *net = dev_net(dev);
+	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+
+	switch (cmd) {
+	case SIOCGETTUNNEL:
+		if (dev == ip6n->fb_tnl_dev) {
+			if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) {
+				err = -EFAULT;
+				break;
+			}
+			vti6_parm_from_user(&p1, &p);
+			t = vti6_locate(net, &p1, 0);
+		} else {
+			memset(&p, 0, sizeof(p));
+		}
+		if (t == NULL)
+			t = netdev_priv(dev);
+		vti6_parm_to_user(&p, &t->parms);
+		if (copy_to_user(ifr->ifr_ifru.ifru_data, &p, sizeof(p)))
+			err = -EFAULT;
+		break;
+	case SIOCADDTUNNEL:
+	case SIOCCHGTUNNEL:
+		err = -EPERM;
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+			break;
+		err = -EFAULT;
+		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
+			break;
+		err = -EINVAL;
+		if (p.proto != IPPROTO_IPV6  && p.proto != 0)
+			break;
+		vti6_parm_from_user(&p1, &p);
+		t = vti6_locate(net, &p1, cmd == SIOCADDTUNNEL);
+		if (dev != ip6n->fb_tnl_dev && cmd == SIOCCHGTUNNEL) {
+			if (t != NULL) {
+				if (t->dev != dev) {
+					err = -EEXIST;
+					break;
+				}
+			} else
+				t = netdev_priv(dev);
+
+			err = vti6_update(t, &p1);
+		}
+		if (t) {
+			err = 0;
+			vti6_parm_to_user(&p, &t->parms);
+			if (copy_to_user(ifr->ifr_ifru.ifru_data, &p, sizeof(p)))
+				err = -EFAULT;
+
+		} else
+			err = (cmd == SIOCADDTUNNEL ? -ENOBUFS : -ENOENT);
+		break;
+	case SIOCDELTUNNEL:
+		err = -EPERM;
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+			break;
+
+		if (dev == ip6n->fb_tnl_dev) {
+			err = -EFAULT;
+			if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
+				break;
+			err = -ENOENT;
+			vti6_parm_from_user(&p1, &p);
+			t = vti6_locate(net, &p1, 0);
+			if (t == NULL)
+				break;
+			err = -EPERM;
+			if (t->dev == ip6n->fb_tnl_dev)
+				break;
+			dev = t->dev;
+		}
+		err = 0;
+		unregister_netdevice(dev);
+		break;
+	default:
+		err = -EINVAL;
+	}
+	return err;
+}
+
+/**
+ * vti6_tnl_change_mtu - change mtu manually for tunnel device
+ *   @dev: virtual device associated with tunnel
+ *   @new_mtu: the new mtu
+ *
+ * Return:
+ *   0 on success,
+ *   %-EINVAL if mtu too small
+ **/
+static int vti6_change_mtu(struct net_device *dev, int new_mtu)
+{
+	if (new_mtu < IPV6_MIN_MTU)
+		return -EINVAL;
+
+	dev->mtu = new_mtu;
+	return 0;
+}
+
+static const struct net_device_ops vti6_netdev_ops = {
+	.ndo_uninit	= vti6_dev_uninit,
+	.ndo_start_xmit = vti6_tnl_xmit,
+	.ndo_do_ioctl	= vti6_ioctl,
+	.ndo_change_mtu = vti6_change_mtu,
+	.ndo_get_stats	= vti6_get_stats,
+};
+
+/**
+ * vti6_dev_setup - setup virtual tunnel device
+ *   @dev: virtual device associated with tunnel
+ *
+ * Description:
+ *   Initialize function pointers and device parameters
+ **/
+static void vti6_dev_setup(struct net_device *dev)
+{
+	struct ip6_tnl *t;
+
+	dev->netdev_ops = &vti6_netdev_ops;
+	dev->destructor = vti6_dev_free;
+
+	dev->type = ARPHRD_TUNNEL6;
+	dev->hard_header_len = LL_MAX_HEADER + sizeof(struct ipv6hdr);
+	dev->mtu = ETH_DATA_LEN;
+	t = netdev_priv(dev);
+	dev->flags |= IFF_NOARP;
+	dev->addr_len = sizeof(struct in6_addr);
+	dev->features |= NETIF_F_NETNS_LOCAL;
+	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
+}
+
+/**
+ * vti6_dev_init_gen - general initializer for all tunnel devices
+ *   @dev: virtual device associated with tunnel
+ **/
+static inline int vti6_dev_init_gen(struct net_device *dev)
+{
+	struct ip6_tnl *t = netdev_priv(dev);
+
+	t->dev = dev;
+	t->net = dev_net(dev);
+	dev->tstats = alloc_percpu(struct pcpu_tstats);
+	if (!dev->tstats)
+		return -ENOMEM;
+	return 0;
+}
+
+/**
+ * vti6_dev_init - initializer for all non fallback tunnel devices
+ *   @dev: virtual device associated with tunnel
+ **/
+static int vti6_dev_init(struct net_device *dev)
+{
+	struct ip6_tnl *t = netdev_priv(dev);
+	int err = vti6_dev_init_gen(dev);
+
+	if (err)
+		return err;
+	vti6_link_config(t);
+	return 0;
+}
+
+/**
+ * vti6_fb_tnl_dev_init - initializer for fallback tunnel device
+ *   @dev: fallback device
+ *
+ * Return: 0
+ **/
+static int __net_init vti6_fb_tnl_dev_init(struct net_device *dev)
+{
+	struct ip6_tnl *t = netdev_priv(dev);
+	struct net *net = dev_net(dev);
+	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+	int err = vti6_dev_init_gen(dev);
+
+	if (err)
+		return err;
+
+	t->parms.proto = IPPROTO_IPV6;
+	dev_hold(dev);
+
+	vti6_link_config(t);
+
+	rcu_assign_pointer(ip6n->tnls_wc[0], t);
+	return 0;
+}
+
+static int vti6_validate(struct nlattr *tb[], struct nlattr *data[])
+{
+	return 0;
+}
+
+static void vti6_netlink_parms(struct nlattr *data[],
+			       struct __ip6_tnl_parm *parms)
+{
+	memset(parms, 0, sizeof(*parms));
+
+	if (!data)
+		return;
+
+	if (data[IFLA_VTI_LINK])
+		parms->link = nla_get_u32(data[IFLA_VTI_LINK]);
+
+	if (data[IFLA_VTI_LOCAL])
+		nla_memcpy(&parms->laddr, data[IFLA_VTI_LOCAL],
+			   sizeof(struct in6_addr));
+
+	if (data[IFLA_VTI_REMOTE])
+		nla_memcpy(&parms->raddr, data[IFLA_VTI_REMOTE],
+			   sizeof(struct in6_addr));
+
+	if (data[IFLA_VTI_IKEY])
+		parms->i_key = nla_get_be32(data[IFLA_VTI_IKEY]);
+
+	if (data[IFLA_VTI_OKEY])
+		parms->o_key = nla_get_be32(data[IFLA_VTI_OKEY]);
+}
+
+static int vti6_newlink(struct net *src_net, struct net_device *dev,
+			struct nlattr *tb[], struct nlattr *data[])
+{
+	struct net *net = dev_net(dev);
+	struct ip6_tnl *nt;
+
+	nt = netdev_priv(dev);
+	vti6_netlink_parms(data, &nt->parms);
+
+	nt->parms.proto = IPPROTO_IPV6;
+
+	if (vti6_locate(net, &nt->parms, 0))
+		return -EEXIST;
+
+	return vti6_tnl_create2(dev);
+}
+
+static int vti6_changelink(struct net_device *dev, struct nlattr *tb[],
+			   struct nlattr *data[])
+{
+	struct ip6_tnl *t;
+	struct __ip6_tnl_parm p;
+	struct net *net = dev_net(dev);
+	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+
+	if (dev == ip6n->fb_tnl_dev)
+		return -EINVAL;
+
+	vti6_netlink_parms(data, &p);
+
+	t = vti6_locate(net, &p, 0);
+
+	if (t) {
+		if (t->dev != dev)
+			return -EEXIST;
+	} else
+		t = netdev_priv(dev);
+
+	return vti6_update(t, &p);
+}
+
+static size_t vti6_get_size(const struct net_device *dev)
+{
+	return
+		/* IFLA_VTI_LINK */
+		nla_total_size(4) +
+		/* IFLA_VTI_LOCAL */
+		nla_total_size(sizeof(struct in6_addr)) +
+		/* IFLA_VTI_REMOTE */
+		nla_total_size(sizeof(struct in6_addr)) +
+		/* IFLA_VTI_IKEY */
+		nla_total_size(4) +
+		/* IFLA_VTI_OKEY */
+		nla_total_size(4) +
+		0;
+}
+
+static int vti6_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+	struct ip6_tnl *tunnel = netdev_priv(dev);
+	struct __ip6_tnl_parm *parm = &tunnel->parms;
+
+	if (nla_put_u32(skb, IFLA_VTI_LINK, parm->link) ||
+	    nla_put(skb, IFLA_VTI_LOCAL, sizeof(struct in6_addr),
+		    &parm->laddr) ||
+	    nla_put(skb, IFLA_VTI_REMOTE, sizeof(struct in6_addr),
+		    &parm->raddr) ||
+	    nla_put_be32(skb, IFLA_VTI_IKEY, parm->i_key) ||
+	    nla_put_be32(skb, IFLA_VTI_OKEY, parm->o_key))
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+static const struct nla_policy vti6_policy[IFLA_VTI_MAX + 1] = {
+	[IFLA_VTI_LINK]		= { .type = NLA_U32 },
+	[IFLA_VTI_LOCAL]	= { .len = sizeof(struct in6_addr) },
+	[IFLA_VTI_REMOTE]	= { .len = sizeof(struct in6_addr) },
+	[IFLA_VTI_IKEY]		= { .type = NLA_U32 },
+	[IFLA_VTI_OKEY]		= { .type = NLA_U32 },
+};
+
+static struct rtnl_link_ops vti6_link_ops __read_mostly = {
+	.kind		= "vti6",
+	.maxtype	= IFLA_VTI_MAX,
+	.policy		= vti6_policy,
+	.priv_size	= sizeof(struct ip6_tnl),
+	.setup		= vti6_dev_setup,
+	.validate	= vti6_validate,
+	.newlink	= vti6_newlink,
+	.changelink	= vti6_changelink,
+	.get_size	= vti6_get_size,
+	.fill_info	= vti6_fill_info,
+};
+
+static struct xfrm_tunnel_notifier vti6_handler __read_mostly = {
+	.handler	= vti6_rcv,
+	.priority	=	1,
+};
+
+static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
+{
+	int h;
+	struct ip6_tnl *t;
+	LIST_HEAD(list);
+
+	for (h = 0; h < HASH_SIZE; h++) {
+		t = rtnl_dereference(ip6n->tnls_r_l[h]);
+		while (t != NULL) {
+			unregister_netdevice_queue(t->dev, &list);
+			t = rtnl_dereference(t->next);
+		}
+	}
+
+	t = rtnl_dereference(ip6n->tnls_wc[0]);
+	unregister_netdevice_queue(t->dev, &list);
+	unregister_netdevice_many(&list);
+}
+
+static int __net_init vti6_init_net(struct net *net)
+{
+	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+	struct ip6_tnl *t = NULL;
+	int err;
+
+	ip6n->tnls[0] = ip6n->tnls_wc;
+	ip6n->tnls[1] = ip6n->tnls_r_l;
+
+	err = -ENOMEM;
+	ip6n->fb_tnl_dev = alloc_netdev(sizeof(struct ip6_tnl), "ip6_vti0",
+					vti6_dev_setup);
+
+	if (!ip6n->fb_tnl_dev)
+		goto err_alloc_dev;
+	dev_net_set(ip6n->fb_tnl_dev, net);
+
+	err = vti6_fb_tnl_dev_init(ip6n->fb_tnl_dev);
+	if (err < 0)
+		goto err_register;
+
+	err = register_netdev(ip6n->fb_tnl_dev);
+	if (err < 0)
+		goto err_register;
+
+	t = netdev_priv(ip6n->fb_tnl_dev);
+
+	strcpy(t->parms.name, ip6n->fb_tnl_dev->name);
+	return 0;
+
+err_register:
+	vti6_dev_free(ip6n->fb_tnl_dev);
+err_alloc_dev:
+	return err;
+}
+
+static void __net_exit vti6_exit_net(struct net *net)
+{
+	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+
+	rtnl_lock();
+	vti6_destroy_tunnels(ip6n);
+	rtnl_unlock();
+}
+
+static struct pernet_operations vti6_net_ops = {
+	.init = vti6_init_net,
+	.exit = vti6_exit_net,
+	.id   = &vti6_net_id,
+	.size = sizeof(struct vti6_net),
+};
+
+/**
+ * vti6_tunnel_init - register protocol and reserve needed resources
+ *
+ * Return: 0 on success
+ **/
+static int __init vti6_tunnel_init(void)
+{
+	int  err;
+
+	err = register_pernet_device(&vti6_net_ops);
+	if (err < 0)
+		goto out_pernet;
+
+	err = xfrm6_mode_tunnel_input_register(&vti6_handler);
+	if (err < 0) {
+		pr_err("%s: can't register vti6\n", __func__);
+		goto out;
+	}
+	err = rtnl_link_register(&vti6_link_ops);
+	if (err < 0)
+		goto rtnl_link_failed;
+
+	return 0;
+
+rtnl_link_failed:
+	xfrm6_mode_tunnel_input_deregister(&vti6_handler);
+out:
+	unregister_pernet_device(&vti6_net_ops);
+out_pernet:
+	return err;
+}
+
+/**
+ * vti6_tunnel_cleanup - free resources and unregister protocol
+ **/
+static void __exit vti6_tunnel_cleanup(void)
+{
+	rtnl_link_unregister(&vti6_link_ops);
+	if (xfrm6_mode_tunnel_input_deregister(&vti6_handler))
+		pr_info("%s: can't deregister vti6\n", __func__);
+
+	unregister_pernet_device(&vti6_net_ops);
+}
+
+module_init(vti6_tunnel_init);
+module_exit(vti6_tunnel_cleanup);
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_RTNL_LINK("vti6");
+MODULE_ALIAS_NETDEV("ip6_vti0");
+MODULE_AUTHOR("Steffen Klassert");
+MODULE_DESCRIPTION("IPv6 virtual tunnel interface");
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH net] net: unix: inherit SOCK_PASS{CRED,SEC} flags from socket to fix race
From: Daniel Borkmann @ 2013-10-18  8:42 UTC (permalink / raw)
  To: David Laight; +Cc: davem, netdev, Eric Dumazet, Eric W. Biederman
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B7395@saturn3.aculab.com>

On 10/18/2013 10:26 AM, David Laight wrote:
>> Subject: [PATCH net] net: unix: inherit SOCK_PASS{CRED,SEC} flags from socket to fix race
>>
>> In the case of credentials passing in unix stream sockets (dgram
>> sockets seem not affected), we get a rather sparse race after
>> commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default").
> ...
>> +static void unix_sock_inherit_flags(const struct socket *old,
>> +				    struct socket *new)
>> +{
>> +	if (test_bit(SOCK_PASSCRED, &old->flags))
>> +		set_bit(SOCK_PASSCRED, &new->flags);
>> +	if (test_bit(SOCK_PASSSEC, &old->flags))
>> +		set_bit(SOCK_PASSSEC, &new->flags);
>> +}
>> +
>
> Isn't that just:
> 	new->flags |= old->flags & (PASSCRED | SOCK_PASSSEC);

Nope, please have a look at the individual test_bit() etc
implementations under arch/, and the definitions of
SOCK_PASSCRED and SOCK_PASSSEC.

I though about just setting new->flags = old->flags, but that
would probably be _not_ a good idea, as we actually do not want
to pass other flags than these two relevant ones onwards.

> 	David

^ permalink raw reply

* RE: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: David Laight @ 2013-10-18  8:40 UTC (permalink / raw)
  To: Jan Beulich, annie li
  Cc: david.vrabel, ian.campbell, wei.liu2, xen-devel, jianhai luan,
	netdev
In-Reply-To: <52610CD202000078000FC00B@nat28.tlf.novell.com>

> > My understanding is this patch does not simply double the span, it is
> > just stricter than the original one. Please check my previous comments,
> > I paste it here.
> 
> No, the code (on a 32-bit arch) just _can't_ handle jiffies differences
> beyond 2^32, no matter how cleverly you use the respective macros.
> All arithmetic there is done modulo 2^32.

I haven't followed this discussion very closely but it might be possible
to arrange that the 'incorrect lack of credit' only occurs for a few
seconds every time 'jiffies' wraps - instead of half of the time.
Then you'd have to be extremely unlucky to hit the timing window.

	David

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: annie li @ 2013-10-18  8:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, ian.campbell, netdev, david.vrabel, xen-devel,
	jianhai luan
In-Reply-To: <52610CD202000078000FC00B@nat28.tlf.novell.com>


On 2013-10-18 16:26, Jan Beulich wrote:
>>>> On 18.10.13 at 10:14, annie li <annie.li@oracle.com> wrote:
>> On 2013-10-18 15:43, Jan Beulich wrote:
>>>>>> On 17.10.13 at 18:38, annie li <annie.li@oracle.com> wrote:
>>>> On 2013-10-17 17:26, Jan Beulich wrote:
>>>>>> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond
>>>>>> MAX_ULONG/2 in 64-bit will need long long time)
>>>>>>
>>>>>> I think the gap should be think all environment even now extending 480+.
>>>>>> if now fall in the gap,  one timer will be pending and replenish will be
>>>>>> in time.  Please run the attachment test program.
>>>>> Not sure what this is supposed to tell me. I recognize that there
>>>>> are overflow conditions not handled properly, but (a) I have a
>>>>> hard time thinking of a sensible guest that sits idle for over 240
>>>>> days (host uptime usually isn't even coming close to that due to
>>>>> maintenance requirements) and (b) if there is such a sensible
>>>>> guest, then I can't see why dealing with one being idle for over
>>>>> 480 days should be required too.
>>>>>
>>>> If the guest contains multiple NICs, that situation probably happens
>>>> when one NIC keeps idle and others work under load. BTW, how do you get
>>>> the 240?
>>> 2^31 / 100 / 60 / 60 / 24
>>>
>>> Obviously with HZ=1000 the span would be smaller by a factor
>>> of 10, which would make it even more clear that doubling the
>>> span doesn't really help.
>> My understanding is this patch does not simply double the span, it is
>> just stricter than the original one. Please check my previous comments,
>> I paste it here.
> No, the code (on a 32-bit arch) just _can't_ handle jiffies differences
> beyond 2^32, no matter how cleverly you use the respective macros.
> All arithmetic there is done modulo 2^32.

On 32-bit arch, the jiffies difference beyond 2^32 is only in theory, 
and the real value of jiffies would wrap around after 2^32. Then it will 
still be verified by time_in_range_open, the code will either replenish 
the credit or hit the timer soon.

Thanks
Annie

^ permalink raw reply

* [patch] ax25: cleanup a range test
From: Dan Carpenter @ 2013-10-18  9:06 UTC (permalink / raw)
  To: Joerg Reuter
  Cc: Ralf Baechle, David S. Miller, linux-hams, netdev,
	kernel-janitors

The current test works fine in practice.  The "amount" variable is
actually used as a boolean so negative values or any non-zero values
count as "true".  However since we don't allow numbers greater than one,
let's not allow negative numbers either.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 4b4d2b7..a00123e 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1735,7 +1735,7 @@ static int ax25_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 			res = -EFAULT;
 			break;
 		}
-		if (amount > AX25_NOUID_BLOCK) {
+		if (amount < 0 || amount > AX25_NOUID_BLOCK) {
 			res = -EINVAL;
 			break;
 		}

^ permalink raw reply related

* Re: [PATCH ipsec v3] xfrm: prevent ipcomp scratch buffer race condition
From: Steffen Klassert @ 2013-10-18  9:25 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: Herbert Xu, David S. Miller, netdev, Eric Dumazet
In-Reply-To: <20131017130740.D8741E8A5E@unicorn.suse.cz>

On Thu, Oct 17, 2013 at 03:07:40PM +0200, Michal Kubecek wrote:
> In ipcomp_compress(), sortirq is enabled too early, allowing the
> per-cpu scratch buffer to be rewritten by ipcomp_decompress()
> (called on the same CPU in softirq context) between populating
> the buffer and copying the compressed data to the skb.
> 
> v2: as pointed out by Steffen Klassert, if we also move the
> local_bh_disable() before reading the per-cpu pointers, we can
> get rid of get_cpu()/put_cpu().
> 
> v3: removed ipcomp_decompress part (as explained by Herbert Xu,
> it cannot be called from process context), get rid of cpu
> variable (thanks to Eric Dumazet)
> 
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>

Applied to the ipsec tree, thanks everyone!

^ permalink raw reply

* [PATCH net-next 0/2] Removal of struct esp_data
From: Mathias Krause @ 2013-10-18 10:09 UTC (permalink / raw)
  To: netdev; +Cc: Mathias Krause, Steffen Klassert, Herbert Xu, David S. Miller

This series removes one level of indirection when accessing the aead
crypto algorithm in ESP transforms by simply removing struct esp_data.
This results in smaller code and less memory usage per xfrm state.

Please apply!

Mathias Krause (2):
  net: esp{4,6}: remove padlen from struct esp_data
  net: esp{4,6}: get rid of struct esp_data

 include/net/esp.h |   10 ----------
 net/ipv4/esp4.c   |   49 +++++++++++++++----------------------------------
 net/ipv6/esp6.c   |   48 +++++++++++++++---------------------------------
 3 files changed, 30 insertions(+), 77 deletions(-)

-- 
1.7.10.4

^ permalink raw reply

* [PATCH net-next 1/2] net: esp{4,6}: remove padlen from struct esp_data
From: Mathias Krause @ 2013-10-18 10:09 UTC (permalink / raw)
  To: netdev; +Cc: Mathias Krause, Steffen Klassert, Herbert Xu, David S. Miller
In-Reply-To: <1382090945-20860-1-git-send-email-mathias.krause@secunet.com>

The padlen member of struct esp_data is always zero. Get rid of it.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
---
 include/net/esp.h |    3 ---
 net/ipv4/esp4.c   |    9 +--------
 net/ipv6/esp6.c   |    9 +--------
 3 files changed, 2 insertions(+), 19 deletions(-)

diff --git a/include/net/esp.h b/include/net/esp.h
index 1356dda..706b740 100644
--- a/include/net/esp.h
+++ b/include/net/esp.h
@@ -6,9 +6,6 @@
 struct crypto_aead;
 
 struct esp_data {
-	/* 0..255 */
-	int padlen;
-
 	/* Confidentiality & Integrity */
 	struct crypto_aead *aead;
 };
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 109ee89..8b5386a 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -154,8 +154,6 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 	}
 	blksize = ALIGN(crypto_aead_blocksize(aead), 4);
 	clen = ALIGN(skb->len + 2 + tfclen, blksize);
-	if (esp->padlen)
-		clen = ALIGN(clen, esp->padlen);
 	plen = clen - skb->len - tfclen;
 
 	err = skb_cow_data(skb, tfclen + plen + alen, &trailer);
@@ -461,7 +459,6 @@ static u32 esp4_get_mtu(struct xfrm_state *x, int mtu)
 {
 	struct esp_data *esp = x->data;
 	u32 blksize = ALIGN(crypto_aead_blocksize(esp->aead), 4);
-	u32 align = max_t(u32, blksize, esp->padlen);
 	unsigned int net_adj;
 
 	switch (x->props.mode) {
@@ -477,7 +474,7 @@ static u32 esp4_get_mtu(struct xfrm_state *x, int mtu)
 	}
 
 	return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) -
-		 net_adj) & ~(align - 1)) + net_adj - 2;
+		 net_adj) & ~(blksize - 1)) + net_adj - 2;
 }
 
 static void esp4_err(struct sk_buff *skb, u32 info)
@@ -659,8 +656,6 @@ static int esp_init_state(struct xfrm_state *x)
 
 	aead = esp->aead;
 
-	esp->padlen = 0;
-
 	x->props.header_len = sizeof(struct ip_esp_hdr) +
 			      crypto_aead_ivsize(aead);
 	if (x->props.mode == XFRM_MODE_TUNNEL)
@@ -683,8 +678,6 @@ static int esp_init_state(struct xfrm_state *x)
 	}
 
 	align = ALIGN(crypto_aead_blocksize(aead), 4);
-	if (esp->padlen)
-		align = max_t(u32, align, esp->padlen);
 	x->props.trailer_len = align + 1 + crypto_aead_authsize(esp->aead);
 
 error:
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index d3618a7..0073cd0 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -181,8 +181,6 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 	}
 	blksize = ALIGN(crypto_aead_blocksize(aead), 4);
 	clen = ALIGN(skb->len + 2 + tfclen, blksize);
-	if (esp->padlen)
-		clen = ALIGN(clen, esp->padlen);
 	plen = clen - skb->len - tfclen;
 
 	err = skb_cow_data(skb, tfclen + plen + alen, &trailer);
@@ -416,7 +414,6 @@ static u32 esp6_get_mtu(struct xfrm_state *x, int mtu)
 {
 	struct esp_data *esp = x->data;
 	u32 blksize = ALIGN(crypto_aead_blocksize(esp->aead), 4);
-	u32 align = max_t(u32, blksize, esp->padlen);
 	unsigned int net_adj;
 
 	if (x->props.mode != XFRM_MODE_TUNNEL)
@@ -425,7 +422,7 @@ static u32 esp6_get_mtu(struct xfrm_state *x, int mtu)
 		net_adj = 0;
 
 	return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) -
-		 net_adj) & ~(align - 1)) + net_adj - 2;
+		 net_adj) & ~(blksize - 1)) + net_adj - 2;
 }
 
 static void esp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
@@ -606,8 +603,6 @@ static int esp6_init_state(struct xfrm_state *x)
 
 	aead = esp->aead;
 
-	esp->padlen = 0;
-
 	x->props.header_len = sizeof(struct ip_esp_hdr) +
 			      crypto_aead_ivsize(aead);
 	switch (x->props.mode) {
@@ -626,8 +621,6 @@ static int esp6_init_state(struct xfrm_state *x)
 	}
 
 	align = ALIGN(crypto_aead_blocksize(aead), 4);
-	if (esp->padlen)
-		align = max_t(u32, align, esp->padlen);
 	x->props.trailer_len = align + 1 + crypto_aead_authsize(esp->aead);
 
 error:
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next 2/2] net: esp{4,6}: get rid of struct esp_data
From: Mathias Krause @ 2013-10-18 10:09 UTC (permalink / raw)
  To: netdev; +Cc: Mathias Krause, Steffen Klassert, Herbert Xu, David S. Miller
In-Reply-To: <1382090945-20860-1-git-send-email-mathias.krause@secunet.com>

struct esp_data consists of a single pointer, vanishing the need for it
to be a structure. Fold the pointer into 'data' direcly, removing one
level of pointer indirection.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
---
 include/net/esp.h |    7 -------
 net/ipv4/esp4.c   |   40 ++++++++++++++--------------------------
 net/ipv6/esp6.c   |   39 ++++++++++++++-------------------------
 3 files changed, 28 insertions(+), 58 deletions(-)

diff --git a/include/net/esp.h b/include/net/esp.h
index 706b740..c92213c 100644
--- a/include/net/esp.h
+++ b/include/net/esp.h
@@ -3,13 +3,6 @@
 
 #include <linux/skbuff.h>
 
-struct crypto_aead;
-
-struct esp_data {
-	/* Confidentiality & Integrity */
-	struct crypto_aead *aead;
-};
-
 void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len);
 
 struct ip_esp_hdr;
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 8b5386a..7785b28 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -121,7 +121,6 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 	struct aead_givcrypt_request *req;
 	struct scatterlist *sg;
 	struct scatterlist *asg;
-	struct esp_data *esp;
 	struct sk_buff *trailer;
 	void *tmp;
 	u8 *iv;
@@ -139,8 +138,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 
 	/* skb is pure payload to encrypt */
 
-	esp = x->data;
-	aead = esp->aead;
+	aead = x->data;
 	alen = crypto_aead_authsize(aead);
 
 	tfclen = 0;
@@ -278,8 +276,7 @@ static int esp_input_done2(struct sk_buff *skb, int err)
 {
 	const struct iphdr *iph;
 	struct xfrm_state *x = xfrm_input_state(skb);
-	struct esp_data *esp = x->data;
-	struct crypto_aead *aead = esp->aead;
+	struct crypto_aead *aead = x->data;
 	int alen = crypto_aead_authsize(aead);
 	int hlen = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead);
 	int elen = skb->len - hlen;
@@ -374,8 +371,7 @@ static void esp_input_done(struct crypto_async_request *base, int err)
 static int esp_input(struct xfrm_state *x, struct sk_buff *skb)
 {
 	struct ip_esp_hdr *esph;
-	struct esp_data *esp = x->data;
-	struct crypto_aead *aead = esp->aead;
+	struct crypto_aead *aead = x->data;
 	struct aead_request *req;
 	struct sk_buff *trailer;
 	int elen = skb->len - sizeof(*esph) - crypto_aead_ivsize(aead);
@@ -457,8 +453,8 @@ out:
 
 static u32 esp4_get_mtu(struct xfrm_state *x, int mtu)
 {
-	struct esp_data *esp = x->data;
-	u32 blksize = ALIGN(crypto_aead_blocksize(esp->aead), 4);
+	struct crypto_aead *aead = x->data;
+	u32 blksize = ALIGN(crypto_aead_blocksize(aead), 4);
 	unsigned int net_adj;
 
 	switch (x->props.mode) {
@@ -473,7 +469,7 @@ static u32 esp4_get_mtu(struct xfrm_state *x, int mtu)
 		BUG();
 	}
 
-	return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) -
+	return ((mtu - x->props.header_len - crypto_aead_authsize(aead) -
 		 net_adj) & ~(blksize - 1)) + net_adj - 2;
 }
 
@@ -508,18 +504,16 @@ static void esp4_err(struct sk_buff *skb, u32 info)
 
 static void esp_destroy(struct xfrm_state *x)
 {
-	struct esp_data *esp = x->data;
+	struct crypto_aead *aead = x->data;
 
-	if (!esp)
+	if (!aead)
 		return;
 
-	crypto_free_aead(esp->aead);
-	kfree(esp);
+	crypto_free_aead(aead);
 }
 
 static int esp_init_aead(struct xfrm_state *x)
 {
-	struct esp_data *esp = x->data;
 	struct crypto_aead *aead;
 	int err;
 
@@ -528,7 +522,7 @@ static int esp_init_aead(struct xfrm_state *x)
 	if (IS_ERR(aead))
 		goto error;
 
-	esp->aead = aead;
+	x->data = aead;
 
 	err = crypto_aead_setkey(aead, x->aead->alg_key,
 				 (x->aead->alg_key_len + 7) / 8);
@@ -545,7 +539,6 @@ error:
 
 static int esp_init_authenc(struct xfrm_state *x)
 {
-	struct esp_data *esp = x->data;
 	struct crypto_aead *aead;
 	struct crypto_authenc_key_param *param;
 	struct rtattr *rta;
@@ -580,7 +573,7 @@ static int esp_init_authenc(struct xfrm_state *x)
 	if (IS_ERR(aead))
 		goto error;
 
-	esp->aead = aead;
+	x->data = aead;
 
 	keylen = (x->aalg ? (x->aalg->alg_key_len + 7) / 8 : 0) +
 		 (x->ealg->alg_key_len + 7) / 8 + RTA_SPACE(sizeof(*param));
@@ -635,16 +628,11 @@ error:
 
 static int esp_init_state(struct xfrm_state *x)
 {
-	struct esp_data *esp;
 	struct crypto_aead *aead;
 	u32 align;
 	int err;
 
-	esp = kzalloc(sizeof(*esp), GFP_KERNEL);
-	if (esp == NULL)
-		return -ENOMEM;
-
-	x->data = esp;
+	x->data = NULL;
 
 	if (x->aead)
 		err = esp_init_aead(x);
@@ -654,7 +642,7 @@ static int esp_init_state(struct xfrm_state *x)
 	if (err)
 		goto error;
 
-	aead = esp->aead;
+	aead = x->data;
 
 	x->props.header_len = sizeof(struct ip_esp_hdr) +
 			      crypto_aead_ivsize(aead);
@@ -678,7 +666,7 @@ static int esp_init_state(struct xfrm_state *x)
 	}
 
 	align = ALIGN(crypto_aead_blocksize(aead), 4);
-	x->props.trailer_len = align + 1 + crypto_aead_authsize(esp->aead);
+	x->props.trailer_len = align + 1 + crypto_aead_authsize(aead);
 
 error:
 	return err;
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 0073cd0..87eb79e 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -164,10 +164,9 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 	u8 *iv;
 	u8 *tail;
 	__be32 *seqhi;
-	struct esp_data *esp = x->data;
 
 	/* skb is pure payload to encrypt */
-	aead = esp->aead;
+	aead = x->data;
 	alen = crypto_aead_authsize(aead);
 
 	tfclen = 0;
@@ -269,8 +268,7 @@ error:
 static int esp_input_done2(struct sk_buff *skb, int err)
 {
 	struct xfrm_state *x = xfrm_input_state(skb);
-	struct esp_data *esp = x->data;
-	struct crypto_aead *aead = esp->aead;
+	struct crypto_aead *aead = x->data;
 	int alen = crypto_aead_authsize(aead);
 	int hlen = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead);
 	int elen = skb->len - hlen;
@@ -323,8 +321,7 @@ static void esp_input_done(struct crypto_async_request *base, int err)
 static int esp6_input(struct xfrm_state *x, struct sk_buff *skb)
 {
 	struct ip_esp_hdr *esph;
-	struct esp_data *esp = x->data;
-	struct crypto_aead *aead = esp->aead;
+	struct crypto_aead *aead = x->data;
 	struct aead_request *req;
 	struct sk_buff *trailer;
 	int elen = skb->len - sizeof(*esph) - crypto_aead_ivsize(aead);
@@ -412,8 +409,8 @@ out:
 
 static u32 esp6_get_mtu(struct xfrm_state *x, int mtu)
 {
-	struct esp_data *esp = x->data;
-	u32 blksize = ALIGN(crypto_aead_blocksize(esp->aead), 4);
+	struct crypto_aead *aead = x->data;
+	u32 blksize = ALIGN(crypto_aead_blocksize(aead), 4);
 	unsigned int net_adj;
 
 	if (x->props.mode != XFRM_MODE_TUNNEL)
@@ -421,7 +418,7 @@ static u32 esp6_get_mtu(struct xfrm_state *x, int mtu)
 	else
 		net_adj = 0;
 
-	return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) -
+	return ((mtu - x->props.header_len - crypto_aead_authsize(aead) -
 		 net_adj) & ~(blksize - 1)) + net_adj - 2;
 }
 
@@ -452,18 +449,16 @@ static void esp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 
 static void esp6_destroy(struct xfrm_state *x)
 {
-	struct esp_data *esp = x->data;
+	struct crypto_aead *aead = x->data;
 
-	if (!esp)
+	if (!aead)
 		return;
 
-	crypto_free_aead(esp->aead);
-	kfree(esp);
+	crypto_free_aead(aead);
 }
 
 static int esp_init_aead(struct xfrm_state *x)
 {
-	struct esp_data *esp = x->data;
 	struct crypto_aead *aead;
 	int err;
 
@@ -472,7 +467,7 @@ static int esp_init_aead(struct xfrm_state *x)
 	if (IS_ERR(aead))
 		goto error;
 
-	esp->aead = aead;
+	x->data = aead;
 
 	err = crypto_aead_setkey(aead, x->aead->alg_key,
 				 (x->aead->alg_key_len + 7) / 8);
@@ -489,7 +484,6 @@ error:
 
 static int esp_init_authenc(struct xfrm_state *x)
 {
-	struct esp_data *esp = x->data;
 	struct crypto_aead *aead;
 	struct crypto_authenc_key_param *param;
 	struct rtattr *rta;
@@ -524,7 +518,7 @@ static int esp_init_authenc(struct xfrm_state *x)
 	if (IS_ERR(aead))
 		goto error;
 
-	esp->aead = aead;
+	x->data = aead;
 
 	keylen = (x->aalg ? (x->aalg->alg_key_len + 7) / 8 : 0) +
 		 (x->ealg->alg_key_len + 7) / 8 + RTA_SPACE(sizeof(*param));
@@ -579,7 +573,6 @@ error:
 
 static int esp6_init_state(struct xfrm_state *x)
 {
-	struct esp_data *esp;
 	struct crypto_aead *aead;
 	u32 align;
 	int err;
@@ -587,11 +580,7 @@ static int esp6_init_state(struct xfrm_state *x)
 	if (x->encap)
 		return -EINVAL;
 
-	esp = kzalloc(sizeof(*esp), GFP_KERNEL);
-	if (esp == NULL)
-		return -ENOMEM;
-
-	x->data = esp;
+	x->data = NULL;
 
 	if (x->aead)
 		err = esp_init_aead(x);
@@ -601,7 +590,7 @@ static int esp6_init_state(struct xfrm_state *x)
 	if (err)
 		goto error;
 
-	aead = esp->aead;
+	aead = x->data;
 
 	x->props.header_len = sizeof(struct ip_esp_hdr) +
 			      crypto_aead_ivsize(aead);
@@ -621,7 +610,7 @@ static int esp6_init_state(struct xfrm_state *x)
 	}
 
 	align = ALIGN(crypto_aead_blocksize(aead), 4);
-	x->props.trailer_len = align + 1 + crypto_aead_authsize(esp->aead);
+	x->props.trailer_len = align + 1 + crypto_aead_authsize(aead);
 
 error:
 	return err;
-- 
1.7.10.4

^ permalink raw reply related

* Re: [RFC] bridge and friends: reduce TheLinuxWay(tm)
From: Jamal Hadi Salim @ 2013-10-18 10:42 UTC (permalink / raw)
  To: vyasevic, Eric Dumazet
  Cc: Stephen Hemminger, Stephen Hemminger, netdev@vger.kernel.org
In-Reply-To: <525EF570.3050105@redhat.com>

On 10/16/13 16:22, Vlad Yasevich wrote:

> Yes, I know about that one.  BRIDGE_FLAGS started the polution and then
> we continued it with the other attributes....
>

Understood - note the email has subject TheLinuxWay ;->
So i fear this one also falls under "it is already in the wild, cant do
anything about it".

BTW: the flooding and learning knobs are still not exposed to iproute2.

In regards to a GET always being translated to DUMP forcing user space
to filter:
If neither you nor Stephen planning to take of it, I will because it
is a scaling problem.  Please let me know.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH] net bridge: remove unused field
From: Jamal Hadi Salim @ 2013-10-18 10:46 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, netdev
In-Reply-To: <20131017.161037.310607928835493105.davem@davemloft.net>


The only reason i made an exception of this one is because:
a) nothing in the kernel touches it
b) nothing in iproute2 touches it
c) the intended logic to transport the hairpin knob is now
in IFLA_BRPORT_MODE

But: I empathize;-> Your call

cheers,
jamal

On 10/17/13 16:10, David Miller wrote:
>
>
> I know it seems rediculous, but anything we export to userspace, as we
> do in this UAPI header, could be potentially be used by some piece of
> source out there.
>
> I'd rather just leave it be, you can add a comment saying it's unused
> if you wish, but I'd really rather not remove it and run the risk of
> breaking someone's build out there.
>
> Thanks.
>

^ permalink raw reply

* [PATCH ipsec-next] xfrm: use vmalloc_node() for percpu scratches
From: Eric Dumazet @ 2013-10-18 10:54 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Herbert Xu, David S. Miller, netdev
In-Reply-To: <20131018092512.GA31491@secunet.com>

From: Eric Dumazet <edumazet@google.com>

scratches are per cpu, we can use vmalloc_node() for proper
NUMA affinity.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/xfrm/xfrm_ipcomp.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
index 2906d52..b943c7f 100644
--- a/net/xfrm/xfrm_ipcomp.c
+++ b/net/xfrm/xfrm_ipcomp.c
@@ -220,8 +220,8 @@ static void ipcomp_free_scratches(void)
 
 static void * __percpu *ipcomp_alloc_scratches(void)
 {
-	int i;
 	void * __percpu *scratches;
+	int i;
 
 	if (ipcomp_scratch_users++)
 		return ipcomp_scratches;
@@ -233,7 +233,9 @@ static void * __percpu *ipcomp_alloc_scratches(void)
 	ipcomp_scratches = scratches;
 
 	for_each_possible_cpu(i) {
-		void *scratch = vmalloc(IPCOMP_SCRATCH_SIZE);
+		void *scratch;
+
+		scratch = vmalloc_node(IPCOMP_SCRATCH_SIZE, cpu_to_node(i));
 		if (!scratch)
 			return NULL;
 		*per_cpu_ptr(scratches, i) = scratch;

^ permalink raw reply related

* Re: [PATCH net] bridge: clean the nf_bridge status when forwarding the skb
From: Pablo Neira Ayuso @ 2013-10-18 11:10 UTC (permalink / raw)
  To: Antonio Quartulli
  Cc: Antonio Quartulli, David S. Miller, netdev@vger.kernel.org,
	Stephen Hemminger
In-Reply-To: <20131017113735.GB2699@open-mesh.com>

On Thu, Oct 17, 2013 at 01:37:35PM +0200, Antonio Quartulli wrote:
> On Thu, Oct 17, 2013 at 04:28:57AM -0700, Pablo Neira Ayuso wrote:
> > Hi,
> > > +
> > > +/**
> > > + * br_netfilter_skb_free - clean the NF bridge data in an skb
> > > + * @skb: the skb which the data to free belongs to
> > > + */
> > > +void br_netfilter_skb_free(struct sk_buff *skb)
> > > +{
> > > +	nf_bridge_put(skb->nf_bridge);
> > > +	skb->nf_bridge = NULL;
> > > +}
> > 
> > This should be nf_reset.
> 
> You think I should directly use nf_reset instead of this function?
> 
> I see that nf_reset() cleans up the conntrack part too: does it also become
> useless once the packet exits the bridge interface?

The conntrack should not attached if it's forwarded to another netif,
see dev_forward_skb.

But I'm not sure what scenario you're trying to handle with this
change, if you could please elaborate.

Perhaps your fix is more conservative to avoid breaking strange setups
that have been relying on this behaviour. I know of people deploying
strange configurations using netfilter bridge.

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: Wei Liu @ 2013-10-18 11:24 UTC (permalink / raw)
  To: David Laight
  Cc: Jan Beulich, annie li, david.vrabel, ian.campbell, wei.liu2,
	xen-devel, jianhai luan, netdev
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B7396@saturn3.aculab.com>

On Fri, Oct 18, 2013 at 09:40:33AM +0100, David Laight wrote:
> > > My understanding is this patch does not simply double the span, it is
> > > just stricter than the original one. Please check my previous comments,
> > > I paste it here.
> > 
> > No, the code (on a 32-bit arch) just _can't_ handle jiffies differences
> > beyond 2^32, no matter how cleverly you use the respective macros.
> > All arithmetic there is done modulo 2^32.
> 
> I haven't followed this discussion very closely but it might be possible
> to arrange that the 'incorrect lack of credit' only occurs for a few
> seconds every time 'jiffies' wraps - instead of half of the time.
> Then you'd have to be extremely unlucky to hit the timing window.
> 

As I understand it, this is the idea of this patch -- to narrow down the
timing window.

Wei.

> 	David
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net] bridge: clean the nf_bridge status when forwarding the skb
From: Antonio Quartulli @ 2013-10-18 11:35 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Antonio Quartulli, David S. Miller, netdev@vger.kernel.org,
	Stephen Hemminger
In-Reply-To: <20131018111041.GA10964@localhost>

[-- Attachment #1: Type: text/plain, Size: 1947 bytes --]

On Fri, Oct 18, 2013 at 01:10:41PM +0200, Pablo Neira Ayuso wrote:
> On Thu, Oct 17, 2013 at 01:37:35PM +0200, Antonio Quartulli wrote:
> > On Thu, Oct 17, 2013 at 04:28:57AM -0700, Pablo Neira Ayuso wrote:
> > > Hi,
> > > > +
> > > > +/**
> > > > + * br_netfilter_skb_free - clean the NF bridge data in an skb
> > > > + * @skb: the skb which the data to free belongs to
> > > > + */
> > > > +void br_netfilter_skb_free(struct sk_buff *skb)
> > > > +{
> > > > +	nf_bridge_put(skb->nf_bridge);
> > > > +	skb->nf_bridge = NULL;
> > > > +}
> > > 
> > > This should be nf_reset.
> > 
> > You think I should directly use nf_reset instead of this function?
> > 
> > I see that nf_reset() cleans up the conntrack part too: does it also become
> > useless once the packet exits the bridge interface?
> 
> The conntrack should not attached if it's forwarded to another netif,
> see dev_forward_skb.
> 
> But I'm not sure what scenario you're trying to handle with this
> change, if you could please elaborate.


This is a sample scenario (nf bridge is on):

[eth0] ---> [br0] ---> [bat0] ---> [br1]

where the relation '[a] ---> [b]' means 'a is enslaved in b' (bat0 is a
batman-adv virtual interface..in this situation it should not matter: it
just removs an header from an incoming skb and delivers it).

The problem I was having was due to an skb entering br0 first and br1 later.
When reaching br1 skb->nf_bridge was != NULL because of the previous processing
in br0.

To clarify, the packet arriving on eth0 is 'delivered' to br0. It is not
forwarded to another port of the bridge. Therefore I am not sure that we should
clean the conntrack part too.

> 
> Perhaps your fix is more conservative to avoid breaking strange setups
> that have been relying on this behaviour. I know of people deploying
> strange configurations using netfilter bridge.
> 

could be.

Cheers,

-- 
Antonio Quartulli

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: PROBLEM: GRE forwarding not working with 3.10.x, WCCPv2 and Cisco 7200 Router showing IP0 bad-hlen 4 in tcpdump
From: Peter Schmitt @ 2013-10-18 11:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org, Pravin B Shelar
In-Reply-To: <1382017464.2045.149.camel@edumazet-glaptop.roam.corp.google.com>

Hi,



> On Thursday, October 17, 2013 3:44 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Thu, 2013-10-17 at 11:53 +0100, Peter Schmitt wrote:
>>  Hi,
>> 
>> 
>>  > 
>>  > 3.10 is buggy (for ETH_P_WCCP support I mean), 3.11 has the probable
>>  > fix :
>>  > 
>>  > commit 3d7b46cd20e300bd6989fb1f43d46f1b9645816e
>>  > ("ip_tunnel: push generic protocol handling to ip_tunnel 
> module.")
>>  > 
>>  > The problem is 3.10 do not pull the extra 4 bytes of WCCP
>>  > 
>>  > This is now properly done in iptunnel_pull_header()
>>  > 
>> 
>>  First of all, many thanks for your help on this.
>> 
>>  I tried to apply the fix to the 3.10.16 sources, as I would like to
>>  stay with the long-term line. Unfortunately it does not apply, as
>>  there are a lot of dependencies on other patches.
>> 
>>  Do you think there will be a fix for the long-time 3.10 kernel line?
>>  Or can you guide me on how to apply this fix to the long-term kernel?
> 
> 3.10 is right in the middle of GRE refactoring, and many bugs are in it.
> 
> It might be very painful to get a complete list of patches to backport.

thanks again. Yes I saw that there were lots of changes.

Do you think there will be a fix for all the 3.10.x stable long term kernel users? Many of them might not be able to use some newer kernel easily or will this be a "won't fix"?

Best regards,
Peter

^ permalink raw reply

* Re: [PATCH ipsec-next] xfrm: use vmalloc_node() for percpu scratches
From: Herbert Xu @ 2013-10-18 12:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Steffen Klassert, David S. Miller, netdev
In-Reply-To: <1382093656.3284.14.camel@edumazet-glaptop.roam.corp.google.com>

On Fri, Oct 18, 2013 at 03:54:16AM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> scratches are per cpu, we can use vmalloc_node() for proper
> NUMA affinity.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

I agree completely.

Thanks!
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [net-next  02/14] i40e: don't free nonexistent rings
From: Jeff Kirsher @ 2013-10-18 13:23 UTC (permalink / raw)
  To: a, davem
  Cc: Mitch Williams, netdev, gospo, sassmann, Jesse Brandeburg,
	Jeff Kirsher
In-Reply-To: <1382102598-11343-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

Not all VSIs have rings! Check to see if rings were actually allocated before
freeing them.

This prevents a panic when tx_rings[0] is not allocated.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 69ed801..a8c18fa 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5160,11 +5160,12 @@ static s32 i40e_vsi_clear_rings(struct i40e_vsi *vsi)
 {
 	int i;
 
-	for (i = 0; i < vsi->alloc_queue_pairs; i++) {
-		kfree_rcu(vsi->tx_rings[i], rcu);
-		vsi->tx_rings[i] = NULL;
-		vsi->rx_rings[i] = NULL;
-	}
+	if (vsi->tx_rings[0])
+		for (i = 0; i < vsi->alloc_queue_pairs; i++) {
+			kfree_rcu(vsi->tx_rings[i], rcu);
+			vsi->tx_rings[i] = NULL;
+			vsi->rx_rings[i] = NULL;
+		}
 
 	return 0;
 }
-- 
1.8.3.1

^ permalink raw reply related

* [net-next  01/14] i40e: do not flush after re-enabling interrupts
From: Jeff Kirsher @ 2013-10-18 13:23 UTC (permalink / raw)
  To: a, davem; +Cc: Jesse Brandeburg, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1382102598-11343-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

Hot path doesn't need read-flush after interrupt enable, and this
flush really causes a lot of extra cpu utilization.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 --
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index fbe7fe2..69ed801 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2560,7 +2560,7 @@ void i40e_irq_dynamic_enable(struct i40e_vsi *vsi, int vector)
 	      I40E_PFINT_DYN_CTLN_CLEARPBA_MASK |
 	      (I40E_ITR_NONE << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT);
 	wr32(hw, I40E_PFINT_DYN_CTLN(vector - 1), val);
-	i40e_flush(hw);
+	/* skip the flush */
 }
 
 /**
@@ -2709,6 +2709,7 @@ static int i40e_vsi_enable_irq(struct i40e_vsi *vsi)
 		i40e_irq_dynamic_enable_icr0(pf);
 	}
 
+	i40e_flush(&pf->hw);
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index dc89e72..fbc40cd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -559,8 +559,6 @@ static void i40e_update_dynamic_itr(struct i40e_q_vector *q_vector)
 	i40e_set_new_dynamic_itr(&q_vector->tx);
 	if (old_itr != q_vector->tx.itr)
 		wr32(hw, reg_addr, q_vector->tx.itr);
-
-	i40e_flush(hw);
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related

* [net-next  00/14][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2013-10-18 13:23 UTC (permalink / raw)
  To: a, davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains updates to i40e only.

Jesse provides 6 patches against i40e.  First is a patch to reduce
CPU utilization by reducing read-flush to read in the hot path.  Next
couple of patches resolve coverity issues reported by Hannes Frederic
Sowa <hannes@stressinduktion.org>.  Then Jesse refactored i40e to cleanup
functions which used cpu_to_xxx(foo) which caused a lot of line wrapping.

Mitch provides 2 i40e patches.  First fixes a panic when tx_rings[0]
are not allocated, his second patch corrects a math error when
assigning MSI-X vectors to VFs.  The vectors-per-vf value reported
by the hardware already conveniently reports one less than the actual
value.

Shannon provides 5 patches against i40e.  His first patch corrects a
number of little bugs in the error handling of irq setup, most of
which ended up panicing the kernel.  Next he fixes the overactive
IRQ issue seen in testing and allows the use of the legacy interrupt.
Shannon then provides a cleanup of the arguments declared at the
beginning of each function.  Then he provides a patch to make sure
that there are really rings and queues before trying to dump
information in them.  Lastly he simplifies the code by using an
already existing variable.

Catherine provides an i40e patch to bump the version.

The following are changes since commit 7cc7c5e54b7128195a1403747a63971c3c3f8e25:
  net: Delete trailing semi-colon from definition of netdev_WARN()
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Catherine Sullivan (1):
  i40e: Bump version

Jesse Brandeburg (6):
  i40e: do not flush after re-enabling interrupts
  i40e: debugfs fixups
  i40e: clamp debugfs nvm read command
  i40e: fix use of untrusted scalar value warning
  i40e: fix sign extension issue
  i40e: refactor fdir setup function

Mitch Williams (2):
  i40e: don't free nonexistent rings
  i40e: assign correct vector to VF

Shannon Nelson (5):
  i40e: fixup legacy interrupt handling
  i40e: tweaking icr0 handling for legacy irq
  i40e: reorder block declarations in debugfs
  i40e: check vsi ptrs before dumping them
  i40e: use pf_id for pf function id in qtx_ctl

 drivers/net/ethernet/intel/i40e/i40e.h             |   1 +
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c     | 135 ++++++++++++---------
 drivers/net/ethernet/intel/i40e/i40e_main.c        |  50 ++++----
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |  83 ++++++-------
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   4 +-
 5 files changed, 146 insertions(+), 127 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox