* Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup
From: Steffen Klassert @ 2018-06-12 8:03 UTC (permalink / raw)
To: Kristian Evensen; +Cc: Tobias Hommel, Markus Berner, Network Development
In-Reply-To: <CAKfDRXiq2c2ruvT8XoXGQntHYccAOp0zUZ3uH4iJM3cSAQkNVw@mail.gmail.com>
On Fri, Jun 08, 2018 at 10:41:37AM +0200, Kristian Evensen wrote:
> Hi,
>
> On Wed, Jun 6, 2018 at 6:03 PM, Tobias Hommel <netdev-list@genoetigt.de> wrote:
> > Sorry no progress until now, I currently do not get time to have a deeper look
> > into that. We're back to 4.1.6 right now.
>
> Thanks for letting me know. In the project I am currently involved in,
> we unfortunately don't have the option of reverting the kernel, so we
> are finding ways to live with the error. We have been looking into the
> error a bit more, and have made the following observations:
>
> * First of all, as discussed earlier in the thread, the error is
> triggered by dst_orig being NULL.
I spent quite some time again yesterday in trying to find a
case where dst_orig can be NULL in xfrm_lookup(). I don't see
how this can happen, so I fear we need a bisection on this.
^ permalink raw reply
* [PATCH RFC v2 ipsec-next 3/3] xfrm: Add virtual xfrm interfaces
From: Steffen Klassert @ 2018-06-12 7:56 UTC (permalink / raw)
To: netdev, David Miller
Cc: Steffen Klassert, Eyal Birger, Antony Antony, Benedict Wong,
Lorenzo Colitti, Shannon Nelson
In-Reply-To: <20180612075610.2000-1-steffen.klassert@secunet.com>
This patch adds support for virtual xfrm interfaces.
Packets that are routed through such an interface
are guaranteed to be IPsec transformed or dropped.
It is a generic virtual interface that ensures IPsec
transformation, no need to know what happens behind
the interface. This means that we can tunnel IPv4 and
IPv6 through the same interface and support all xfrm
modes (tunnel, transport and beet) on it.
Co-developed-by: Lorenzo Colitti <lorenzo@google.com>
Co-developed-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Benedict Wong <benedictwong@google.com>
Tested-by: Antony Antony <antony@phenome.org>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
---
include/net/xfrm.h | 24 ++
include/uapi/linux/if_link.h | 10 +
net/xfrm/Kconfig | 8 +
net/xfrm/Makefile | 1 +
net/xfrm/xfrm_input.c | 3 +
net/xfrm/xfrm_interface.c | 971 +++++++++++++++++++++++++++++++++++++++++++
net/xfrm/xfrm_policy.c | 43 ++
7 files changed, 1060 insertions(+)
create mode 100644 net/xfrm/xfrm_interface.c
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 26df8d6d507c..211b15c55a92 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -23,6 +23,7 @@
#include <net/ipv6.h>
#include <net/ip6_fib.h>
#include <net/flow.h>
+#include <net/gro_cells.h>
#include <linux/interrupt.h>
@@ -293,6 +294,13 @@ struct xfrm_replay {
int (*overflow)(struct xfrm_state *x, struct sk_buff *skb);
};
+struct xfrm_if_cb {
+ struct xfrm_if *(*decode_session)(struct sk_buff *skb);
+};
+
+void xfrm_if_register_cb(const struct xfrm_if_cb *ifcb);
+void xfrm_if_unregister_cb(void);
+
struct net_device;
struct xfrm_type;
struct xfrm_dst;
@@ -1039,6 +1047,22 @@ static inline void xfrm_dst_destroy(struct xfrm_dst *xdst)
void xfrm_dst_ifdown(struct dst_entry *dst, struct net_device *dev);
+struct xfrm_if_parms {
+ char name[IFNAMSIZ]; /* name of XFRM device */
+ int link; /* ifindex of underlying L2 interface */
+ u32 if_id; /* interface identifyer */
+};
+
+struct xfrm_if {
+ struct xfrm_if __rcu *next; /* next interface in list */
+ struct net_device *dev; /* virtual device associated with interface */
+ struct net_device *phydev; /* physical device */
+ struct net *net; /* netns for packet i/o */
+ struct xfrm_if_parms p; /* interface parms */
+
+ struct gro_cells gro_cells;
+};
+
struct xfrm_offload {
/* Output sequence number for replay protection on offloading. */
struct {
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index cf01b6824244..bff0af507b32 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -459,6 +459,16 @@ enum {
#define IFLA_MACSEC_MAX (__IFLA_MACSEC_MAX - 1)
+/* XFRM section */
+enum {
+ IFLA_XFRM_UNSPEC,
+ IFLA_XFRM_LINK,
+ IFLA_XFRM_IF_ID,
+ __IFLA_XFRM_MAX
+};
+
+#define IFLA_XFRM_MAX (__IFLA_XFRM_MAX - 1)
+
enum macsec_validation_type {
MACSEC_VALIDATE_DISABLED = 0,
MACSEC_VALIDATE_CHECK = 1,
diff --git a/net/xfrm/Kconfig b/net/xfrm/Kconfig
index 286ed25c1a69..53381888a7b3 100644
--- a/net/xfrm/Kconfig
+++ b/net/xfrm/Kconfig
@@ -25,6 +25,14 @@ config XFRM_USER
If unsure, say Y.
+config XFRM_INTERFACE
+ tristate "Transformation virtual interface"
+ depends on XFRM && IPV6
+ ---help---
+ This provides a virtual interface to route IPsec traffic.
+
+ If unsure, say N.
+
config XFRM_SUB_POLICY
bool "Transformation sub policy support"
depends on XFRM
diff --git a/net/xfrm/Makefile b/net/xfrm/Makefile
index 0bd2465a8c5a..fbc4552d17b8 100644
--- a/net/xfrm/Makefile
+++ b/net/xfrm/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_XFRM_STATISTICS) += xfrm_proc.o
obj-$(CONFIG_XFRM_ALGO) += xfrm_algo.o
obj-$(CONFIG_XFRM_USER) += xfrm_user.o
obj-$(CONFIG_XFRM_IPCOMP) += xfrm_ipcomp.o
+obj-$(CONFIG_XFRM_INTERFACE) += xfrm_interface.o
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 352abca2605f..b104724d1caa 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -320,6 +320,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
seq = 0;
if (!spi && (err = xfrm_parse_spi(skb, nexthdr, &spi, &seq)) != 0) {
+ secpath_reset(skb);
XFRM_INC_STATS(net, LINUX_MIB_XFRMINHDRERROR);
goto drop;
}
@@ -328,12 +329,14 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
XFRM_SPI_SKB_CB(skb)->daddroff);
do {
if (skb->sp->len == XFRM_MAX_DEPTH) {
+ secpath_reset(skb);
XFRM_INC_STATS(net, LINUX_MIB_XFRMINBUFFERERROR);
goto drop;
}
x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr, family);
if (x == NULL) {
+ secpath_reset(skb);
XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
xfrm_audit_state_notfound(skb, family, spi, seq);
goto drop;
diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
new file mode 100644
index 000000000000..57df8f087132
--- /dev/null
+++ b/net/xfrm/xfrm_interface.c
@@ -0,0 +1,971 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * XFRM virtual interface
+ *
+ * Copyright (C) 2018 secunet Security Networks AG
+ *
+ * Author:
+ * Steffen Klassert <steffen.klassert@secunet.com>
+ */
+
+#include <linux/module.h>
+#include <linux/capability.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/sockios.h>
+#include <linux/icmp.h>
+#include <linux/if.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/net.h>
+#include <linux/in6.h>
+#include <linux/netdevice.h>
+#include <linux/if_link.h>
+#include <linux/if_arp.h>
+#include <linux/icmpv6.h>
+#include <linux/init.h>
+#include <linux/route.h>
+#include <linux/rtnetlink.h>
+#include <linux/netfilter_ipv6.h>
+#include <linux/slab.h>
+#include <linux/hash.h>
+
+#include <linux/uaccess.h>
+#include <linux/atomic.h>
+
+#include <net/icmp.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/ip6_route.h>
+#include <net/addrconf.h>
+#include <net/xfrm.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
+#include <linux/etherdevice.h>
+
+static int xfrmi_dev_init(struct net_device *dev);
+static void xfrmi_dev_setup(struct net_device *dev);
+static struct rtnl_link_ops xfrmi_link_ops __read_mostly;
+static unsigned int xfrmi_net_id __read_mostly;
+
+struct xfrmi_net {
+ /* lists for storing interfaces in use */
+ struct xfrm_if __rcu *xfrmi[1];
+};
+
+#define for_each_xfrmi_rcu(start, xi) \
+ for (xi = rcu_dereference(start); xi; xi = rcu_dereference(xi->next))
+
+static struct xfrm_if *xfrmi_lookup(struct net *net, struct xfrm_state *x)
+{
+ struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+ struct xfrm_if *xi;
+
+ for_each_xfrmi_rcu(xfrmn->xfrmi[0], xi) {
+ if (x->if_id == xi->p.if_id &&
+ (xi->dev->flags & IFF_UP))
+ return xi;
+ }
+
+ return NULL;
+}
+
+static struct xfrm_if *xfrmi_decode_session(struct sk_buff *skb)
+{
+ struct xfrmi_net *xfrmn;
+ int ifindex;
+ struct xfrm_if *xi;
+
+ if (!skb->dev)
+ return NULL;
+
+ xfrmn = net_generic(dev_net(skb->dev), xfrmi_net_id);
+ ifindex = skb->dev->ifindex;
+
+ for_each_xfrmi_rcu(xfrmn->xfrmi[0], xi) {
+ if (ifindex == xi->dev->ifindex &&
+ (xi->dev->flags & IFF_UP))
+ return xi;
+ }
+
+ return NULL;
+}
+
+static void xfrmi_link(struct xfrmi_net *xfrmn, struct xfrm_if *xi)
+{
+ struct xfrm_if __rcu **xip = &xfrmn->xfrmi[0];
+
+ rcu_assign_pointer(xi->next , rtnl_dereference(*xip));
+ rcu_assign_pointer(*xip, xi);
+}
+
+static void xfrmi_unlink(struct xfrmi_net *xfrmn, struct xfrm_if *xi)
+{
+ struct xfrm_if __rcu **xip;
+ struct xfrm_if *iter;
+
+ for (xip = &xfrmn->xfrmi[0];
+ (iter = rtnl_dereference(*xip)) != NULL;
+ xip = &iter->next) {
+ if (xi == iter) {
+ rcu_assign_pointer(*xip, xi->next);
+ break;
+ }
+ }
+}
+
+static void xfrmi_dev_free(struct net_device *dev)
+{
+ free_percpu(dev->tstats);
+}
+
+static int xfrmi_create2(struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct net *net = dev_net(dev);
+ struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+ int err;
+
+ dev->rtnl_link_ops = &xfrmi_link_ops;
+ err = register_netdevice(dev);
+ if (err < 0)
+ goto out;
+
+ strcpy(xi->p.name, dev->name);
+
+ dev_hold(dev);
+ xfrmi_link(xfrmn, xi);
+
+ return 0;
+
+out:
+ return err;
+}
+
+static struct xfrm_if *xfrmi_create(struct net *net, struct xfrm_if_parms *p)
+{
+ struct net_device *dev;
+ struct xfrm_if *xi;
+ char name[IFNAMSIZ];
+ int err;
+
+ if (p->name[0])
+ strlcpy(name, p->name, IFNAMSIZ);
+ else
+ goto failed;
+
+ dev = alloc_netdev(sizeof(*xi), name, NET_NAME_UNKNOWN, xfrmi_dev_setup);
+ if (!dev)
+ goto failed;
+
+ dev_net_set(dev, net);
+
+ xi = netdev_priv(dev);
+ xi->p = *p;
+ xi->net = net;
+ xi->dev = dev;
+ xi->phydev = dev_get_by_index(net, p->link);
+ if (!xi->phydev)
+ goto failed_free;
+
+ err = xfrmi_create2(dev);
+ if (err < 0)
+ goto failed_dev_put;
+
+ return xi;
+
+failed_dev_put:
+ dev_put(xi->phydev);
+failed_free:
+ free_netdev(dev);
+failed:
+ return NULL;
+}
+
+static struct xfrm_if *xfrmi_locate(struct net *net, struct xfrm_if_parms *p,
+ int create)
+{
+ struct xfrm_if __rcu **xip;
+ struct xfrm_if *xi;
+ struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+
+ for (xip = &xfrmn->xfrmi[0];
+ (xi = rtnl_dereference(*xip)) != NULL;
+ xip = &xi->next) {
+ if (xi->p.if_id == p->if_id) {
+ if (create)
+ return NULL;
+
+ return xi;
+ }
+ }
+ if (!create)
+ return NULL;
+ return xfrmi_create(net, p);
+}
+
+static void xfrmi_dev_uninit(struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct xfrmi_net *xfrmn = net_generic(xi->net, xfrmi_net_id);
+
+ xfrmi_unlink(xfrmn, xi);
+ dev_put(xi->phydev);
+ dev_put(dev);
+}
+
+static void xfrmi_scrub_packet(struct sk_buff *skb, bool xnet)
+{
+ skb->tstamp = 0;
+ skb->pkt_type = PACKET_HOST;
+ skb->skb_iif = 0;
+ skb->ignore_df = 0;
+ skb_dst_drop(skb);
+ nf_reset(skb);
+ nf_reset_trace(skb);
+
+ if (!xnet)
+ return;
+
+ ipvs_reset(skb);
+ secpath_reset(skb);
+ skb_orphan(skb);
+ skb->mark = 0;
+}
+
+static int xfrmi_rcv_cb(struct sk_buff *skb, int err)
+{
+ struct pcpu_sw_netstats *tstats;
+ struct xfrm_mode *inner_mode;
+ struct net_device *dev;
+ struct xfrm_state *x;
+ struct xfrm_if *xi;
+ bool xnet;
+
+ if (err && !skb->sp)
+ return 0;
+
+ x = xfrm_input_state(skb);
+
+ xi = xfrmi_lookup(xs_net(x), x);
+ if (!xi)
+ return 1;
+
+ dev = xi->dev;
+ skb->dev = dev;
+
+ if (err) {
+ dev->stats.rx_errors++;
+ dev->stats.rx_dropped++;
+
+ return 0;
+ }
+
+ xnet = !net_eq(xi->net, dev_net(skb->dev));
+
+ if (xnet) {
+ inner_mode = x->inner_mode;
+
+ if (x->sel.family == AF_UNSPEC) {
+ inner_mode = xfrm_ip2inner_mode(x, XFRM_MODE_SKB_CB(skb)->protocol);
+ if (inner_mode == NULL) {
+ XFRM_INC_STATS(dev_net(skb->dev),
+ LINUX_MIB_XFRMINSTATEMODEERROR);
+ return -EINVAL;
+ }
+ }
+
+ if (!xfrm_policy_check(NULL, XFRM_POLICY_IN, skb,
+ inner_mode->afinfo->family))
+ return -EPERM;
+ }
+
+ xfrmi_scrub_packet(skb, xnet);
+
+ tstats = this_cpu_ptr(dev->tstats);
+
+ u64_stats_update_begin(&tstats->syncp);
+ tstats->rx_packets++;
+ tstats->rx_bytes += skb->len;
+ u64_stats_update_end(&tstats->syncp);
+
+ return 0;
+}
+
+static int
+xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct net_device_stats *stats = &xi->dev->stats;
+ struct dst_entry *dst = skb_dst(skb);
+ struct net_device *tdev;
+ struct xfrm_state *x;
+ int err = -1;
+ int mtu;
+
+ if (!dst)
+ goto tx_err_link_failure;
+
+ fl->flowi_xfrm.if_id = xi->p.if_id;
+
+ dst_hold(dst);
+ dst = xfrm_lookup(xi->net, dst, fl, NULL, 0);
+ if (IS_ERR(dst)) {
+ err = PTR_ERR(dst);
+ dst = NULL;
+ goto tx_err_link_failure;
+ }
+
+ x = dst->xfrm;
+ if (!x)
+ goto tx_err_link_failure;
+
+ if (x->if_id != xi->p.if_id)
+ goto tx_err_link_failure;
+
+ tdev = dst->dev;
+
+ if (tdev == dev) {
+ stats->collisions++;
+ net_warn_ratelimited("%s: Local routing loop detected!\n",
+ xi->p.name);
+ goto tx_err_dst_release;
+ }
+
+ mtu = dst_mtu(dst);
+ if (!skb->ignore_df && skb->len > mtu) {
+ skb_dst_update_pmtu(skb, mtu);
+
+ if (skb->protocol == htons(ETH_P_IPV6)) {
+ if (mtu < IPV6_MIN_MTU)
+ mtu = IPV6_MIN_MTU;
+
+ icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+ } else {
+ icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+ htonl(mtu));
+ }
+
+ dst_release(dst);
+ return -EMSGSIZE;
+ }
+
+ xfrmi_scrub_packet(skb, !net_eq(xi->net, dev_net(dev)));
+ skb_dst_set(skb, dst);
+ skb->dev = tdev;
+
+ err = dst_output(xi->net, skb->sk, skb);
+ if (net_xmit_eval(err) == 0) {
+ struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
+
+ u64_stats_update_begin(&tstats->syncp);
+ tstats->tx_bytes += skb->len;
+ tstats->tx_packets++;
+ u64_stats_update_end(&tstats->syncp);
+ } else {
+ stats->tx_errors++;
+ stats->tx_aborted_errors++;
+ }
+
+ return 0;
+tx_err_link_failure:
+ stats->tx_carrier_errors++;
+ dst_link_failure(skb);
+tx_err_dst_release:
+ dst_release(dst);
+ return err;
+}
+
+static netdev_tx_t xfrmi_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct net_device_stats *stats = &xi->dev->stats;
+ struct flowi fl;
+ int ret;
+
+ memset(&fl, 0, sizeof(fl));
+
+ switch (skb->protocol) {
+ case htons(ETH_P_IPV6):
+ xfrm_decode_session(skb, &fl, AF_INET6);
+ memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
+ break;
+ case htons(ETH_P_IP):
+ xfrm_decode_session(skb, &fl, AF_INET);
+ memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
+ break;
+ default:
+ goto tx_err;
+ }
+
+ fl.flowi_oif = xi->phydev->ifindex;
+
+ ret = xfrmi_xmit2(skb, dev, &fl);
+ if (ret < 0)
+ goto tx_err;
+
+ return NETDEV_TX_OK;
+
+tx_err:
+ stats->tx_errors++;
+ stats->tx_dropped++;
+ kfree_skb(skb);
+ return NETDEV_TX_OK;
+}
+
+static int xfrmi4_err(struct sk_buff *skb, u32 info)
+{
+ const struct iphdr *iph = (const struct iphdr *)skb->data;
+ struct net *net = dev_net(skb->dev);
+ int protocol = iph->protocol;
+ struct ip_comp_hdr *ipch;
+ struct ip_esp_hdr *esph;
+ struct ip_auth_hdr *ah ;
+ struct xfrm_state *x;
+ struct xfrm_if *xi;
+ __be32 spi;
+
+ switch (protocol) {
+ case IPPROTO_ESP:
+ esph = (struct ip_esp_hdr *)(skb->data+(iph->ihl<<2));
+ spi = esph->spi;
+ break;
+ case IPPROTO_AH:
+ ah = (struct ip_auth_hdr *)(skb->data+(iph->ihl<<2));
+ spi = ah->spi;
+ break;
+ case IPPROTO_COMP:
+ ipch = (struct ip_comp_hdr *)(skb->data+(iph->ihl<<2));
+ spi = htonl(ntohs(ipch->cpi));
+ break;
+ default:
+ return 0;
+ }
+
+ switch (icmp_hdr(skb)->type) {
+ case ICMP_DEST_UNREACH:
+ if (icmp_hdr(skb)->code != ICMP_FRAG_NEEDED)
+ return 0;
+ case ICMP_REDIRECT:
+ break;
+ default:
+ return 0;
+ }
+
+ x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr,
+ spi, protocol, AF_INET);
+ if (!x)
+ return 0;
+
+ xi = xfrmi_lookup(net, x);
+ if (!xi) {
+ xfrm_state_put(x);
+ return -1;
+ }
+
+ if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
+ ipv4_update_pmtu(skb, net, info, 0, 0, protocol, 0);
+ else
+ ipv4_redirect(skb, net, 0, 0, protocol, 0);
+ xfrm_state_put(x);
+
+ return 0;
+}
+
+static int xfrmi6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
+ u8 type, u8 code, int offset, __be32 info)
+{
+ const struct ipv6hdr *iph = (const struct ipv6hdr *)skb->data;
+ struct net *net = dev_net(skb->dev);
+ int protocol = iph->nexthdr;
+ struct ip_comp_hdr *ipch;
+ struct ip_esp_hdr *esph;
+ struct ip_auth_hdr *ah;
+ struct xfrm_state *x;
+ struct xfrm_if *xi;
+ __be32 spi;
+
+ switch (protocol) {
+ case IPPROTO_ESP:
+ esph = (struct ip_esp_hdr *)(skb->data + offset);
+ spi = esph->spi;
+ break;
+ case IPPROTO_AH:
+ ah = (struct ip_auth_hdr *)(skb->data + offset);
+ spi = ah->spi;
+ break;
+ case IPPROTO_COMP:
+ ipch = (struct ip_comp_hdr *)(skb->data + offset);
+ spi = htonl(ntohs(ipch->cpi));
+ break;
+ default:
+ return 0;
+ }
+
+ if (type != ICMPV6_PKT_TOOBIG &&
+ type != NDISC_REDIRECT)
+ return 0;
+
+ x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr,
+ spi, protocol, AF_INET6);
+ if (!x)
+ return 0;
+
+ xi = xfrmi_lookup(net, x);
+ if (!xi) {
+ xfrm_state_put(x);
+ return -1;
+ }
+
+ if (type == NDISC_REDIRECT)
+ ip6_redirect(skb, net, skb->dev->ifindex, 0,
+ sock_net_uid(net, NULL));
+ else
+ ip6_update_pmtu(skb, net, info, 0, 0, sock_net_uid(net, NULL));
+ xfrm_state_put(x);
+
+ return 0;
+}
+
+static int xfrmi_change(struct xfrm_if *xi, const struct xfrm_if_parms *p)
+{
+ if (xi->p.link != p->link)
+ return -EINVAL;
+
+ xi->p.if_id = p->if_id;
+
+ return 0;
+}
+
+static int xfrmi_update(struct xfrm_if *xi, struct xfrm_if_parms *p)
+{
+ struct net *net = dev_net(xi->dev);
+ struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+ int err;
+
+ xfrmi_unlink(xfrmn, xi);
+ synchronize_net();
+ err = xfrmi_change(xi, p);
+ xfrmi_link(xfrmn, xi);
+ netdev_state_change(xi->dev);
+ return err;
+}
+
+static void xfrmi_get_stats64(struct net_device *dev,
+ struct rtnl_link_stats64 *s)
+{
+ int cpu;
+
+ if (!dev->tstats)
+ return;
+
+ for_each_possible_cpu(cpu) {
+ struct pcpu_sw_netstats *stats;
+ struct pcpu_sw_netstats tmp;
+ int start;
+
+ stats = per_cpu_ptr(dev->tstats, cpu);
+ do {
+ start = u64_stats_fetch_begin_irq(&stats->syncp);
+ tmp.rx_packets = stats->rx_packets;
+ tmp.rx_bytes = stats->rx_bytes;
+ tmp.tx_packets = stats->tx_packets;
+ tmp.tx_bytes = stats->tx_bytes;
+ } while (u64_stats_fetch_retry_irq(&stats->syncp, start));
+
+ s->rx_packets += tmp.rx_packets;
+ s->rx_bytes += tmp.rx_bytes;
+ s->tx_packets += tmp.tx_packets;
+ s->tx_bytes += tmp.tx_bytes;
+ }
+
+ s->rx_dropped = dev->stats.rx_dropped;
+ s->tx_dropped = dev->stats.tx_dropped;
+}
+
+static int xfrmi_get_iflink(const struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+
+ return xi->phydev->ifindex;
+}
+
+
+static const struct net_device_ops xfrmi_netdev_ops = {
+ .ndo_init = xfrmi_dev_init,
+ .ndo_uninit = xfrmi_dev_uninit,
+ .ndo_start_xmit = xfrmi_xmit,
+ .ndo_get_stats64 = xfrmi_get_stats64,
+ .ndo_get_iflink = xfrmi_get_iflink,
+};
+
+static void xfrmi_dev_setup(struct net_device *dev)
+{
+ dev->netdev_ops = &xfrmi_netdev_ops;
+ dev->type = ARPHRD_NONE;
+ dev->hard_header_len = ETH_HLEN;
+ dev->min_header_len = ETH_HLEN;
+ dev->mtu = ETH_DATA_LEN;
+ dev->min_mtu = ETH_MIN_MTU;
+ dev->max_mtu = ETH_DATA_LEN;
+ dev->addr_len = ETH_ALEN;
+ dev->flags = IFF_NOARP;
+ dev->needs_free_netdev = true;
+ dev->priv_destructor = xfrmi_dev_free;
+ netif_keep_dst(dev);
+}
+
+static int xfrmi_dev_init(struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct net_device *phydev = xi->phydev;
+ int err;
+
+ dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
+ if (!dev->tstats)
+ return -ENOMEM;
+
+ err = gro_cells_init(&xi->gro_cells, dev);
+ if (err) {
+ free_percpu(dev->tstats);
+ return err;
+ }
+
+ dev->features |= NETIF_F_LLTX;
+
+ dev->needed_headroom = phydev->needed_headroom;
+ dev->needed_tailroom = phydev->needed_tailroom;
+
+ if (is_zero_ether_addr(dev->dev_addr))
+ eth_hw_addr_inherit(dev, phydev);
+ if (is_zero_ether_addr(dev->broadcast))
+ memcpy(dev->broadcast, phydev->broadcast, dev->addr_len);
+
+ return 0;
+}
+
+static int xfrmi_validate(struct nlattr *tb[], struct nlattr *data[],
+ struct netlink_ext_ack *extack)
+{
+ return 0;
+}
+
+static void xfrmi_netlink_parms(struct nlattr *data[],
+ struct xfrm_if_parms *parms)
+{
+ memset(parms, 0, sizeof(*parms));
+
+ if (!data)
+ return;
+
+ if (data[IFLA_XFRM_LINK])
+ parms->link = nla_get_u32(data[IFLA_XFRM_LINK]);
+
+ if (data[IFLA_XFRM_IF_ID])
+ parms->if_id = nla_get_u32(data[IFLA_XFRM_IF_ID]);
+}
+
+static int xfrmi_newlink(struct net *src_net, struct net_device *dev,
+ struct nlattr *tb[], struct nlattr *data[],
+ struct netlink_ext_ack *extack)
+{
+ struct net *net = dev_net(dev);
+ struct xfrm_if_parms *p;
+ struct xfrm_if *xi;
+
+ xi = netdev_priv(dev);
+ p = &xi->p;
+
+ xfrmi_netlink_parms(data, p);
+
+ if (!tb[IFLA_IFNAME])
+ return -EINVAL;
+
+ nla_strlcpy(p->name, tb[IFLA_IFNAME], IFNAMSIZ);
+
+ if (!xfrmi_locate(net, p, 1))
+ return -EEXIST;
+
+ return 0;
+}
+
+static void xfrmi_dellink(struct net_device *dev, struct list_head *head)
+{
+ unregister_netdevice_queue(dev, head);
+}
+
+static int xfrmi_changelink(struct net_device *dev, struct nlattr *tb[],
+ struct nlattr *data[],
+ struct netlink_ext_ack *extack)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct net *net = dev_net(dev);
+
+ xfrmi_netlink_parms(data, &xi->p);
+
+ xi = xfrmi_locate(net, &xi->p, 0);
+
+ if (xi) {
+ if (xi->dev != dev)
+ return -EEXIST;
+ } else
+ xi = netdev_priv(dev);
+
+ return xfrmi_update(xi, &xi->p);
+}
+
+static size_t xfrmi_get_size(const struct net_device *dev)
+{
+ return
+ /* IFLA_XFRM_LINK */
+ nla_total_size(4) +
+ /* IFLA_XFRM_IF_ID */
+ nla_total_size(4) +
+ 0;
+}
+
+static int xfrmi_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct xfrm_if_parms *parm = &xi->p;
+
+ if (nla_put_u32(skb, IFLA_XFRM_LINK, parm->link) ||
+ nla_put_u32(skb, IFLA_XFRM_IF_ID, parm->if_id))
+ goto nla_put_failure;
+ return 0;
+
+nla_put_failure:
+ return -EMSGSIZE;
+}
+
+struct net *xfrmi_get_link_net(const struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+
+ return dev_net(xi->phydev);
+}
+
+static const struct nla_policy xfrmi_policy[IFLA_XFRM_MAX + 1] = {
+ [IFLA_XFRM_LINK] = { .type = NLA_U32 },
+ [IFLA_XFRM_IF_ID] = { .type = NLA_U32 },
+};
+
+static struct rtnl_link_ops xfrmi_link_ops __read_mostly = {
+ .kind = "xfrm",
+ .maxtype = IFLA_XFRM_MAX,
+ .policy = xfrmi_policy,
+ .priv_size = sizeof(struct xfrm_if),
+ .setup = xfrmi_dev_setup,
+ .validate = xfrmi_validate,
+ .newlink = xfrmi_newlink,
+ .dellink = xfrmi_dellink,
+ .changelink = xfrmi_changelink,
+ .get_size = xfrmi_get_size,
+ .fill_info = xfrmi_fill_info,
+ .get_link_net = xfrmi_get_link_net,
+};
+
+static void __net_exit xfrmi_destroy_interfaces(struct xfrmi_net *xfrmn)
+{
+ struct xfrm_if *xi;
+ LIST_HEAD(list);
+
+ xi = rtnl_dereference(xfrmn->xfrmi[0]);
+ if (!xi)
+ return;
+
+ unregister_netdevice_queue(xi->dev, &list);
+ unregister_netdevice_many(&list);
+}
+
+static int __net_init xfrmi_init_net(struct net *net)
+{
+ return 0;
+}
+
+static void __net_exit xfrmi_exit_net(struct net *net)
+{
+ struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+
+ rtnl_lock();
+ xfrmi_destroy_interfaces(xfrmn);
+ rtnl_unlock();
+}
+
+static struct pernet_operations xfrmi_net_ops = {
+ .init = xfrmi_init_net,
+ .exit = xfrmi_exit_net,
+ .id = &xfrmi_net_id,
+ .size = sizeof(struct xfrmi_net),
+};
+
+static struct xfrm6_protocol xfrmi_esp6_protocol __read_mostly = {
+ .handler = xfrm6_rcv,
+ .cb_handler = xfrmi_rcv_cb,
+ .err_handler = xfrmi6_err,
+ .priority = 10,
+};
+
+static struct xfrm6_protocol xfrmi_ah6_protocol __read_mostly = {
+ .handler = xfrm6_rcv,
+ .cb_handler = xfrmi_rcv_cb,
+ .err_handler = xfrmi6_err,
+ .priority = 10,
+};
+
+static struct xfrm6_protocol xfrmi_ipcomp6_protocol __read_mostly = {
+ .handler = xfrm6_rcv,
+ .cb_handler = xfrmi_rcv_cb,
+ .err_handler = xfrmi6_err,
+ .priority = 10,
+};
+
+static struct xfrm4_protocol xfrmi_esp4_protocol __read_mostly = {
+ .handler = xfrm4_rcv,
+ .input_handler = xfrm_input,
+ .cb_handler = xfrmi_rcv_cb,
+ .err_handler = xfrmi4_err,
+ .priority = 10,
+};
+
+static struct xfrm4_protocol xfrmi_ah4_protocol __read_mostly = {
+ .handler = xfrm4_rcv,
+ .input_handler = xfrm_input,
+ .cb_handler = xfrmi_rcv_cb,
+ .err_handler = xfrmi4_err,
+ .priority = 10,
+};
+
+static struct xfrm4_protocol xfrmi_ipcomp4_protocol __read_mostly = {
+ .handler = xfrm4_rcv,
+ .input_handler = xfrm_input,
+ .cb_handler = xfrmi_rcv_cb,
+ .err_handler = xfrmi4_err,
+ .priority = 10,
+};
+
+static int __init xfrmi4_init(void)
+{
+ int err;
+
+ err = xfrm4_protocol_register(&xfrmi_esp4_protocol, IPPROTO_ESP);
+ if (err < 0)
+ goto xfrm_proto_esp_failed;
+ err = xfrm4_protocol_register(&xfrmi_ah4_protocol, IPPROTO_AH);
+ if (err < 0)
+ goto xfrm_proto_ah_failed;
+ err = xfrm4_protocol_register(&xfrmi_ipcomp4_protocol, IPPROTO_COMP);
+ if (err < 0)
+ goto xfrm_proto_comp_failed;
+
+ return 0;
+
+xfrm_proto_comp_failed:
+ xfrm4_protocol_deregister(&xfrmi_ah4_protocol, IPPROTO_AH);
+xfrm_proto_ah_failed:
+ xfrm4_protocol_deregister(&xfrmi_esp4_protocol, IPPROTO_ESP);
+xfrm_proto_esp_failed:
+ return err;
+}
+
+static void xfrmi4_fini(void)
+{
+ xfrm4_protocol_deregister(&xfrmi_ipcomp4_protocol, IPPROTO_COMP);
+ xfrm4_protocol_deregister(&xfrmi_ah4_protocol, IPPROTO_AH);
+ xfrm4_protocol_deregister(&xfrmi_esp4_protocol, IPPROTO_ESP);
+}
+
+static int __init xfrmi6_init(void)
+{
+ int err;
+
+ err = xfrm6_protocol_register(&xfrmi_esp6_protocol, IPPROTO_ESP);
+ if (err < 0)
+ goto xfrm_proto_esp_failed;
+ err = xfrm6_protocol_register(&xfrmi_ah6_protocol, IPPROTO_AH);
+ if (err < 0)
+ goto xfrm_proto_ah_failed;
+ err = xfrm6_protocol_register(&xfrmi_ipcomp6_protocol, IPPROTO_COMP);
+ if (err < 0)
+ goto xfrm_proto_comp_failed;
+
+ return 0;
+
+xfrm_proto_comp_failed:
+ xfrm6_protocol_deregister(&xfrmi_ah6_protocol, IPPROTO_AH);
+xfrm_proto_ah_failed:
+ xfrm6_protocol_deregister(&xfrmi_esp6_protocol, IPPROTO_ESP);
+xfrm_proto_esp_failed:
+ return err;
+}
+
+static void xfrmi6_fini(void)
+{
+ xfrm6_protocol_deregister(&xfrmi_ipcomp6_protocol, IPPROTO_COMP);
+ xfrm6_protocol_deregister(&xfrmi_ah6_protocol, IPPROTO_AH);
+ xfrm6_protocol_deregister(&xfrmi_esp6_protocol, IPPROTO_ESP);
+}
+
+static const struct xfrm_if_cb xfrm_if_cb = {
+ .decode_session = xfrmi_decode_session,
+};
+
+static int __init xfrmi_init(void)
+{
+ const char *msg;
+ int err;
+
+ pr_info("IPsec XFRM device driver\n");
+
+ msg = "tunnel device";
+ err = register_pernet_device(&xfrmi_net_ops);
+ if (err < 0)
+ goto pernet_dev_failed;
+
+ msg = "xfrm4 protocols";
+ err = xfrmi4_init();
+ if (err < 0)
+ goto xfrmi4_failed;
+
+ msg = "xfrm6 protocols";
+ err = xfrmi6_init();
+ if (err < 0)
+ goto xfrmi6_failed;
+
+
+ msg = "netlink interface";
+ err = rtnl_link_register(&xfrmi_link_ops);
+ if (err < 0)
+ goto rtnl_link_failed;
+
+ xfrm_if_register_cb(&xfrm_if_cb);
+
+ return err;
+
+rtnl_link_failed:
+ xfrmi6_fini();
+xfrmi6_failed:
+ xfrmi4_fini();
+xfrmi4_failed:
+ unregister_pernet_device(&xfrmi_net_ops);
+pernet_dev_failed:
+ pr_err("xfrmi init: failed to register %s\n", msg);
+ return err;
+}
+
+static void __exit xfrmi_fini(void)
+{
+ xfrm_if_unregister_cb();
+ rtnl_link_unregister(&xfrmi_link_ops);
+ xfrmi4_fini();
+ xfrmi6_fini();
+ unregister_pernet_device(&xfrmi_net_ops);
+}
+
+module_init(xfrmi_init);
+module_exit(xfrmi_fini);
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_RTNL_LINK("xfrm");
+MODULE_ALIAS_NETDEV("xfrm0");
+MODULE_AUTHOR("Steffen Klassert");
+MODULE_DESCRIPTION("XFRM virtual interface");
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 08b745aade43..47f776840df1 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -47,6 +47,9 @@ struct xfrm_flo {
static DEFINE_PER_CPU(struct xfrm_dst *, xfrm_last_dst);
static struct work_struct *xfrm_pcpu_work __read_mostly;
+static DEFINE_SPINLOCK(xfrm_if_cb_lock);
+static struct xfrm_if_cb const __rcu *xfrm_if_cb __read_mostly;
+
static DEFINE_SPINLOCK(xfrm_policy_afinfo_lock);
static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1]
__read_mostly;
@@ -119,6 +122,12 @@ static const struct xfrm_policy_afinfo *xfrm_policy_get_afinfo(unsigned short fa
return afinfo;
}
+/* Called with rcu_read_lock(). */
+static const struct xfrm_if_cb *xfrm_if_get_cb(void)
+{
+ return rcu_dereference(xfrm_if_cb);
+}
+
struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif,
const xfrm_address_t *saddr,
const xfrm_address_t *daddr,
@@ -2083,6 +2092,11 @@ xfrm_bundle_lookup(struct net *net, const struct flowi *fl, u16 family, u8 dir,
if (IS_ERR(xdst)) {
err = PTR_ERR(xdst);
+ if (err == -EREMOTE) {
+ xfrm_pols_put(pols, num_pols);
+ return NULL;
+ }
+
if (err != -EAGAIN)
goto error;
goto make_dummy_bundle;
@@ -2176,6 +2190,9 @@ struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
if (IS_ERR(xdst)) {
xfrm_pols_put(pols, num_pols);
err = PTR_ERR(xdst);
+ if (err == -EREMOTE)
+ goto nopol;
+
goto dropdst;
} else if (xdst == NULL) {
num_xfrms = 0;
@@ -2368,12 +2385,20 @@ int __xfrm_decode_session(struct sk_buff *skb, struct flowi *fl,
unsigned int family, int reverse)
{
const struct xfrm_policy_afinfo *afinfo = xfrm_policy_get_afinfo(family);
+ const struct xfrm_if_cb *ifcb = xfrm_if_get_cb();
+ struct xfrm_if *xi;
int err;
if (unlikely(afinfo == NULL))
return -EAFNOSUPPORT;
afinfo->decode_session(skb, fl, reverse);
+ if (ifcb) {
+ xi = ifcb->decode_session(skb);
+ if (xi)
+ fl->flowi_xfrm.if_id = xi->p.if_id;
+ }
+
err = security_xfrm_decode_session(skb, &fl->flowi_secid);
rcu_read_unlock();
return err;
@@ -2828,6 +2853,21 @@ void xfrm_policy_unregister_afinfo(const struct xfrm_policy_afinfo *afinfo)
}
EXPORT_SYMBOL(xfrm_policy_unregister_afinfo);
+void xfrm_if_register_cb(const struct xfrm_if_cb *ifcb)
+{
+ spin_lock(&xfrm_if_cb_lock);
+ rcu_assign_pointer(xfrm_if_cb, ifcb);
+ spin_unlock(&xfrm_if_cb_lock);
+}
+EXPORT_SYMBOL(xfrm_if_register_cb);
+
+void xfrm_if_unregister_cb(void)
+{
+ RCU_INIT_POINTER(xfrm_if_cb, NULL);
+ synchronize_rcu();
+}
+EXPORT_SYMBOL(xfrm_if_unregister_cb);
+
#ifdef CONFIG_XFRM_STATISTICS
static int __net_init xfrm_statistics_init(struct net *net)
{
@@ -3008,6 +3048,9 @@ void __init xfrm_init(void)
xfrm_dev_init();
seqcount_init(&xfrm_policy_hash_generation);
xfrm_input_init();
+
+ RCU_INIT_POINTER(xfrm_if_cb, NULL);
+ synchronize_rcu();
}
#ifdef CONFIG_AUDITSYSCALL
--
2.14.1
^ permalink raw reply related
* [PATCH RFC v2 ipsec-next 2/3] xfrm: Add a new lookup key to match xfrm interfaces.
From: Steffen Klassert @ 2018-06-12 7:56 UTC (permalink / raw)
To: netdev, David Miller
Cc: Steffen Klassert, Eyal Birger, Antony Antony, Benedict Wong,
Lorenzo Colitti, Shannon Nelson
In-Reply-To: <20180612075610.2000-1-steffen.klassert@secunet.com>
This patch adds the xfrm interface id as a lookup key
for xfrm states and policies. With this we can assign
states and policies to virtual xfrm interfaces.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Benedict Wong <benedictwong@google.com>
Tested-by: Benedict Wong <benedictwong@google.com>
Tested-by: Antony Antony <antony@phenome.org>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
---
include/net/xfrm.h | 21 ++++++++++++++-----
include/uapi/linux/xfrm.h | 1 +
net/core/pktgen.c | 2 +-
net/key/af_key.c | 6 +++---
net/xfrm/xfrm_policy.c | 18 +++++++++++-----
net/xfrm/xfrm_state.c | 19 ++++++++++++-----
net/xfrm/xfrm_user.c | 53 ++++++++++++++++++++++++++++++++++++++++++-----
7 files changed, 96 insertions(+), 24 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 45e75c36b738..26df8d6d507c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -147,6 +147,7 @@ struct xfrm_state {
struct xfrm_id id;
struct xfrm_selector sel;
struct xfrm_mark mark;
+ u32 if_id;
u32 tfcpad;
u32 genid;
@@ -574,6 +575,7 @@ struct xfrm_policy {
atomic_t genid;
u32 priority;
u32 index;
+ u32 if_id;
struct xfrm_mark mark;
struct xfrm_selector selector;
struct xfrm_lifetime_cfg lft;
@@ -1533,7 +1535,7 @@ struct xfrm_state *xfrm_state_find(const xfrm_address_t *daddr,
struct xfrm_tmpl *tmpl,
struct xfrm_policy *pol, int *err,
unsigned short family);
-struct xfrm_state *xfrm_stateonly_find(struct net *net, u32 mark,
+struct xfrm_state *xfrm_stateonly_find(struct net *net, u32 mark, u32 if_id,
xfrm_address_t *daddr,
xfrm_address_t *saddr,
unsigned short family,
@@ -1690,20 +1692,20 @@ int xfrm_policy_walk(struct net *net, struct xfrm_policy_walk *walk,
void *);
void xfrm_policy_walk_done(struct xfrm_policy_walk *walk, struct net *net);
int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl);
-struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark,
+struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u32 if_id,
u8 type, int dir,
struct xfrm_selector *sel,
struct xfrm_sec_ctx *ctx, int delete,
int *err);
-struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u8, int dir,
- u32 id, int delete, int *err);
+struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u32 if_id, u8,
+ int dir, u32 id, int delete, int *err);
int xfrm_policy_flush(struct net *net, u8 type, bool task_valid);
void xfrm_policy_hash_rebuild(struct net *net);
u32 xfrm_get_acqseq(void);
int verify_spi_info(u8 proto, u32 min, u32 max);
int xfrm_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi);
struct xfrm_state *xfrm_find_acq(struct net *net, const struct xfrm_mark *mark,
- u8 mode, u32 reqid, u8 proto,
+ u8 mode, u32 reqid, u32 if_id, u8 proto,
const xfrm_address_t *daddr,
const xfrm_address_t *saddr, int create,
unsigned short family);
@@ -2012,6 +2014,15 @@ static inline int xfrm_mark_put(struct sk_buff *skb, const struct xfrm_mark *m)
return ret;
}
+static inline int xfrm_if_id_put(struct sk_buff *skb, __u32 if_id)
+{
+ int ret = 0;
+
+ if (if_id)
+ ret = nla_put_u32(skb, XFRMA_IF_ID, if_id);
+ return ret;
+}
+
static inline int xfrm_tunnel_check(struct sk_buff *skb, struct xfrm_state *x,
unsigned int family)
{
diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h
index e3af2859188b..690df6e7580b 100644
--- a/include/uapi/linux/xfrm.h
+++ b/include/uapi/linux/xfrm.h
@@ -306,6 +306,7 @@ enum xfrm_attr_type_t {
XFRMA_PAD,
XFRMA_OFFLOAD_DEV, /* struct xfrm_state_offload */
XFRMA_OUTPUT_MARK, /* __u32 */
+ XFRMA_IF_ID, /* __u32 */
__XFRMA_MAX
#define XFRMA_MAX (__XFRMA_MAX - 1)
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 7e4ede34cc52..a1b2d769fa36 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2255,7 +2255,7 @@ static void get_ipsec_sa(struct pktgen_dev *pkt_dev, int flow)
x = xfrm_state_lookup_byspi(pn->net, htonl(pkt_dev->spi), AF_INET);
} else {
/* slow path: we dont already have xfrm_state */
- x = xfrm_stateonly_find(pn->net, DUMMY_MARK,
+ x = xfrm_stateonly_find(pn->net, DUMMY_MARK, 0,
(xfrm_address_t *)&pkt_dev->cur_daddr,
(xfrm_address_t *)&pkt_dev->cur_saddr,
AF_INET,
diff --git a/net/key/af_key.c b/net/key/af_key.c
index e62e52e8f141..bf38b358cf38 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1383,7 +1383,7 @@ static int pfkey_getspi(struct sock *sk, struct sk_buff *skb, const struct sadb_
}
if (!x)
- x = xfrm_find_acq(net, &dummy_mark, mode, reqid, proto, xdaddr, xsaddr, 1, family);
+ x = xfrm_find_acq(net, &dummy_mark, mode, reqid, 0, proto, xdaddr, xsaddr, 1, family);
if (x == NULL)
return -ENOENT;
@@ -2414,7 +2414,7 @@ static int pfkey_spddelete(struct sock *sk, struct sk_buff *skb, const struct sa
return err;
}
- xp = xfrm_policy_bysel_ctx(net, DUMMY_MARK, XFRM_POLICY_TYPE_MAIN,
+ xp = xfrm_policy_bysel_ctx(net, DUMMY_MARK, 0, XFRM_POLICY_TYPE_MAIN,
pol->sadb_x_policy_dir - 1, &sel, pol_ctx,
1, &err);
security_xfrm_policy_free(pol_ctx);
@@ -2663,7 +2663,7 @@ static int pfkey_spdget(struct sock *sk, struct sk_buff *skb, const struct sadb_
return -EINVAL;
delete = (hdr->sadb_msg_type == SADB_X_SPDDELETE2);
- xp = xfrm_policy_byid(net, DUMMY_MARK, XFRM_POLICY_TYPE_MAIN,
+ xp = xfrm_policy_byid(net, DUMMY_MARK, 0, XFRM_POLICY_TYPE_MAIN,
dir, pol->sadb_x_policy_id, delete, &err);
if (xp == NULL)
return -ENOENT;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 40b54cc64243..08b745aade43 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -747,6 +747,7 @@ int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl)
newpos = NULL;
hlist_for_each_entry(pol, chain, bydst) {
if (pol->type == policy->type &&
+ pol->if_id == policy->if_id &&
!selector_cmp(&pol->selector, &policy->selector) &&
xfrm_policy_mark_match(policy, pol) &&
xfrm_sec_ctx_match(pol->security, policy->security) &&
@@ -798,8 +799,9 @@ int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl)
}
EXPORT_SYMBOL(xfrm_policy_insert);
-struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u8 type,
- int dir, struct xfrm_selector *sel,
+struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u32 if_id,
+ u8 type, int dir,
+ struct xfrm_selector *sel,
struct xfrm_sec_ctx *ctx, int delete,
int *err)
{
@@ -812,6 +814,7 @@ struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u8 type,
ret = NULL;
hlist_for_each_entry(pol, chain, bydst) {
if (pol->type == type &&
+ pol->if_id == if_id &&
(mark & pol->mark.m) == pol->mark.v &&
!selector_cmp(sel, &pol->selector) &&
xfrm_sec_ctx_match(ctx, pol->security)) {
@@ -837,8 +840,9 @@ struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u8 type,
}
EXPORT_SYMBOL(xfrm_policy_bysel_ctx);
-struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u8 type,
- int dir, u32 id, int delete, int *err)
+struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u32 if_id,
+ u8 type, int dir, u32 id, int delete,
+ int *err)
{
struct xfrm_policy *pol, *ret;
struct hlist_head *chain;
@@ -853,6 +857,7 @@ struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u8 type,
ret = NULL;
hlist_for_each_entry(pol, chain, byidx) {
if (pol->type == type && pol->index == id &&
+ pol->if_id == if_id &&
(mark & pol->mark.m) == pol->mark.v) {
xfrm_pol_hold(pol);
if (delete) {
@@ -1063,6 +1068,7 @@ static int xfrm_policy_match(const struct xfrm_policy *pol,
bool match;
if (pol->family != family ||
+ pol->if_id != fl->flowi_xfrm.if_id ||
(fl->flowi_mark & pol->mark.m) != pol->mark.v ||
pol->type != type)
return ret;
@@ -1177,7 +1183,8 @@ static struct xfrm_policy *xfrm_sk_policy_lookup(const struct sock *sk, int dir,
match = xfrm_selector_match(&pol->selector, fl, family);
if (match) {
- if ((sk->sk_mark & pol->mark.m) != pol->mark.v) {
+ if ((sk->sk_mark & pol->mark.m) != pol->mark.v ||
+ pol->if_id != fl->flowi_xfrm.if_id) {
pol = NULL;
goto out;
}
@@ -1305,6 +1312,7 @@ static struct xfrm_policy *clone_policy(const struct xfrm_policy *old, int dir)
newp->lft = old->lft;
newp->curlft = old->curlft;
newp->mark = old->mark;
+ newp->if_id = old->if_id;
newp->action = old->action;
newp->flags = old->flags;
newp->xfrm_nr = old->xfrm_nr;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 8308281f3253..3803b6813fc5 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -941,6 +941,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
int error = 0;
struct xfrm_state *best = NULL;
u32 mark = pol->mark.v & pol->mark.m;
+ u32 if_id = fl->flowi_xfrm.if_id;
unsigned short encap_family = tmpl->encap_family;
unsigned int sequence;
struct km_event c;
@@ -955,6 +956,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
if (x->props.family == encap_family &&
x->props.reqid == tmpl->reqid &&
(mark & x->mark.m) == x->mark.v &&
+ x->if_id == if_id &&
!(x->props.flags & XFRM_STATE_WILDRECV) &&
xfrm_state_addr_check(x, daddr, saddr, encap_family) &&
tmpl->mode == x->props.mode &&
@@ -971,6 +973,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
if (x->props.family == encap_family &&
x->props.reqid == tmpl->reqid &&
(mark & x->mark.m) == x->mark.v &&
+ x->if_id == if_id &&
!(x->props.flags & XFRM_STATE_WILDRECV) &&
xfrm_addr_equal(&x->id.daddr, daddr, encap_family) &&
tmpl->mode == x->props.mode &&
@@ -1010,6 +1013,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
* to current session. */
xfrm_init_tempstate(x, fl, tmpl, daddr, saddr, family);
memcpy(&x->mark, &pol->mark, sizeof(x->mark));
+ x->if_id = if_id;
error = security_xfrm_state_alloc_acquire(x, pol->security, fl->flowi_secid);
if (error) {
@@ -1067,7 +1071,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
}
struct xfrm_state *
-xfrm_stateonly_find(struct net *net, u32 mark,
+xfrm_stateonly_find(struct net *net, u32 mark, u32 if_id,
xfrm_address_t *daddr, xfrm_address_t *saddr,
unsigned short family, u8 mode, u8 proto, u32 reqid)
{
@@ -1080,6 +1084,7 @@ xfrm_stateonly_find(struct net *net, u32 mark,
if (x->props.family == family &&
x->props.reqid == reqid &&
(mark & x->mark.m) == x->mark.v &&
+ x->if_id == if_id &&
!(x->props.flags & XFRM_STATE_WILDRECV) &&
xfrm_state_addr_check(x, daddr, saddr, family) &&
mode == x->props.mode &&
@@ -1160,11 +1165,13 @@ static void __xfrm_state_bump_genids(struct xfrm_state *xnew)
struct xfrm_state *x;
unsigned int h;
u32 mark = xnew->mark.v & xnew->mark.m;
+ u32 if_id = xnew->if_id;
h = xfrm_dst_hash(net, &xnew->id.daddr, &xnew->props.saddr, reqid, family);
hlist_for_each_entry(x, net->xfrm.state_bydst+h, bydst) {
if (x->props.family == family &&
x->props.reqid == reqid &&
+ x->if_id == if_id &&
(mark & x->mark.m) == x->mark.v &&
xfrm_addr_equal(&x->id.daddr, &xnew->id.daddr, family) &&
xfrm_addr_equal(&x->props.saddr, &xnew->props.saddr, family))
@@ -1187,7 +1194,7 @@ EXPORT_SYMBOL(xfrm_state_insert);
static struct xfrm_state *__find_acq_core(struct net *net,
const struct xfrm_mark *m,
unsigned short family, u8 mode,
- u32 reqid, u8 proto,
+ u32 reqid, u32 if_id, u8 proto,
const xfrm_address_t *daddr,
const xfrm_address_t *saddr,
int create)
@@ -1242,6 +1249,7 @@ static struct xfrm_state *__find_acq_core(struct net *net,
x->props.family = family;
x->props.mode = mode;
x->props.reqid = reqid;
+ x->if_id = if_id;
x->mark.v = m->v;
x->mark.m = m->m;
x->lft.hard_add_expires_seconds = net->xfrm.sysctl_acq_expires;
@@ -1296,7 +1304,7 @@ int xfrm_state_add(struct xfrm_state *x)
if (use_spi && !x1)
x1 = __find_acq_core(net, &x->mark, family, x->props.mode,
- x->props.reqid, x->id.proto,
+ x->props.reqid, x->if_id, x->id.proto,
&x->id.daddr, &x->props.saddr, 0);
__xfrm_state_bump_genids(x);
@@ -1395,6 +1403,7 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig,
x->props.flags = orig->props.flags;
x->props.extra_flags = orig->props.extra_flags;
+ x->if_id = orig->if_id;
x->tfcpad = orig->tfcpad;
x->replay_maxdiff = orig->replay_maxdiff;
x->replay_maxage = orig->replay_maxage;
@@ -1619,13 +1628,13 @@ EXPORT_SYMBOL(xfrm_state_lookup_byaddr);
struct xfrm_state *
xfrm_find_acq(struct net *net, const struct xfrm_mark *mark, u8 mode, u32 reqid,
- u8 proto, const xfrm_address_t *daddr,
+ u32 if_id, u8 proto, const xfrm_address_t *daddr,
const xfrm_address_t *saddr, int create, unsigned short family)
{
struct xfrm_state *x;
spin_lock_bh(&net->xfrm.xfrm_state_lock);
- x = __find_acq_core(net, mark, family, mode, reqid, proto, daddr, saddr, create);
+ x = __find_acq_core(net, mark, family, mode, reqid, if_id, proto, daddr, saddr, create);
spin_unlock_bh(&net->xfrm.xfrm_state_lock);
return x;
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 080035f056d9..08ec1ab8e36c 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -582,6 +582,9 @@ static struct xfrm_state *xfrm_state_construct(struct net *net,
if (attrs[XFRMA_OUTPUT_MARK])
x->props.output_mark = nla_get_u32(attrs[XFRMA_OUTPUT_MARK]);
+ if (attrs[XFRMA_IF_ID])
+ x->if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
err = __xfrm_init_state(x, false, attrs[XFRMA_OFFLOAD_DEV]);
if (err)
goto error;
@@ -905,6 +908,11 @@ static int copy_to_user_state_extra(struct xfrm_state *x,
if (ret)
goto out;
}
+ if (x->if_id) {
+ ret = nla_put_u32(skb, XFRMA_IF_ID, x->if_id);
+ if (ret)
+ goto out;
+ }
if (x->security)
ret = copy_sec_ctx(x->security, skb);
out:
@@ -1253,6 +1261,7 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
int err;
u32 mark;
struct xfrm_mark m;
+ u32 if_id = 0;
p = nlmsg_data(nlh);
err = verify_spi_info(p->info.id.proto, p->min, p->max);
@@ -1265,6 +1274,10 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
x = NULL;
mark = xfrm_mark_get(attrs, &m);
+
+ if (attrs[XFRMA_IF_ID])
+ if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
if (p->info.seq) {
x = xfrm_find_acq_byseq(net, mark, p->info.seq);
if (x && !xfrm_addr_equal(&x->id.daddr, daddr, family)) {
@@ -1275,7 +1288,7 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
if (!x)
x = xfrm_find_acq(net, &m, p->info.mode, p->info.reqid,
- p->info.id.proto, daddr,
+ if_id, p->info.id.proto, daddr,
&p->info.saddr, 1,
family);
err = -ENOENT;
@@ -1563,6 +1576,9 @@ static struct xfrm_policy *xfrm_policy_construct(struct net *net, struct xfrm_us
xfrm_mark_get(attrs, &xp->mark);
+ if (attrs[XFRMA_IF_ID])
+ xp->if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
return xp;
error:
*errp = err;
@@ -1708,6 +1724,8 @@ static int dump_one_policy(struct xfrm_policy *xp, int dir, int count, void *ptr
err = copy_to_user_policy_type(xp->type, skb);
if (!err)
err = xfrm_mark_put(skb, &xp->mark);
+ if (!err)
+ err = xfrm_if_id_put(skb, xp->if_id);
if (err) {
nlmsg_cancel(skb, nlh);
return err;
@@ -1789,6 +1807,7 @@ static int xfrm_get_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
int delete;
struct xfrm_mark m;
u32 mark = xfrm_mark_get(attrs, &m);
+ u32 if_id = 0;
p = nlmsg_data(nlh);
delete = nlh->nlmsg_type == XFRM_MSG_DELPOLICY;
@@ -1801,8 +1820,11 @@ static int xfrm_get_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err)
return err;
+ if (attrs[XFRMA_IF_ID])
+ if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
if (p->index)
- xp = xfrm_policy_byid(net, mark, type, p->dir, p->index, delete, &err);
+ xp = xfrm_policy_byid(net, mark, if_id, type, p->dir, p->index, delete, &err);
else {
struct nlattr *rt = attrs[XFRMA_SEC_CTX];
struct xfrm_sec_ctx *ctx;
@@ -1819,7 +1841,7 @@ static int xfrm_get_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err)
return err;
}
- xp = xfrm_policy_bysel_ctx(net, mark, type, p->dir, &p->sel,
+ xp = xfrm_policy_bysel_ctx(net, mark, if_id, type, p->dir, &p->sel,
ctx, delete, &err);
security_xfrm_policy_free(ctx);
}
@@ -1942,6 +1964,10 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct
if (err)
goto out_cancel;
+ err = xfrm_if_id_put(skb, x->if_id);
+ if (err)
+ goto out_cancel;
+
nlmsg_end(skb, nlh);
return 0;
@@ -2084,6 +2110,7 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
int err = -ENOENT;
struct xfrm_mark m;
u32 mark = xfrm_mark_get(attrs, &m);
+ u32 if_id = 0;
err = copy_from_user_policy_type(&type, attrs);
if (err)
@@ -2093,8 +2120,11 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err)
return err;
+ if (attrs[XFRMA_IF_ID])
+ if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
if (p->index)
- xp = xfrm_policy_byid(net, mark, type, p->dir, p->index, 0, &err);
+ xp = xfrm_policy_byid(net, mark, if_id, type, p->dir, p->index, 0, &err);
else {
struct nlattr *rt = attrs[XFRMA_SEC_CTX];
struct xfrm_sec_ctx *ctx;
@@ -2111,7 +2141,7 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err)
return err;
}
- xp = xfrm_policy_bysel_ctx(net, mark, type, p->dir,
+ xp = xfrm_policy_bysel_ctx(net, mark, if_id, type, p->dir,
&p->sel, ctx, 0, &err);
security_xfrm_policy_free(ctx);
}
@@ -2494,6 +2524,7 @@ static const struct nla_policy xfrma_policy[XFRMA_MAX+1] = {
[XFRMA_ADDRESS_FILTER] = { .len = sizeof(struct xfrm_address_filter) },
[XFRMA_OFFLOAD_DEV] = { .len = sizeof(struct xfrm_user_offload) },
[XFRMA_OUTPUT_MARK] = { .type = NLA_U32 },
+ [XFRMA_IF_ID] = { .type = NLA_U32 },
};
static const struct nla_policy xfrma_spd_policy[XFRMA_SPD_MAX+1] = {
@@ -2625,6 +2656,10 @@ static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct
if (err)
return err;
+ err = xfrm_if_id_put(skb, x->if_id);
+ if (err)
+ return err;
+
nlmsg_end(skb, nlh);
return 0;
}
@@ -2721,6 +2756,8 @@ static inline unsigned int xfrm_sa_len(struct xfrm_state *x)
l += nla_total_size(sizeof(x->xso));
if (x->props.output_mark)
l += nla_total_size(sizeof(x->props.output_mark));
+ if (x->if_id)
+ l += nla_total_size(sizeof(x->if_id));
/* Must count x->lastused as it may become non-zero behind our back. */
l += nla_total_size_64bit(sizeof(u64));
@@ -2850,6 +2887,8 @@ static int build_acquire(struct sk_buff *skb, struct xfrm_state *x,
err = copy_to_user_policy_type(xp->type, skb);
if (!err)
err = xfrm_mark_put(skb, &xp->mark);
+ if (!err)
+ err = xfrm_if_id_put(skb, xp->if_id);
if (err) {
nlmsg_cancel(skb, nlh);
return err;
@@ -2966,6 +3005,8 @@ static int build_polexpire(struct sk_buff *skb, struct xfrm_policy *xp,
err = copy_to_user_policy_type(xp->type, skb);
if (!err)
err = xfrm_mark_put(skb, &xp->mark);
+ if (!err)
+ err = xfrm_if_id_put(skb, xp->if_id);
if (err) {
nlmsg_cancel(skb, nlh);
return err;
@@ -3047,6 +3088,8 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e
err = copy_to_user_policy_type(xp->type, skb);
if (!err)
err = xfrm_mark_put(skb, &xp->mark);
+ if (!err)
+ err = xfrm_if_id_put(skb, xp->if_id);
if (err)
goto out_free_skb;
--
2.14.1
^ permalink raw reply related
* [PATCH RFC v2 ipsec-next 1/3] flow: Extend flow informations with xfrm interface id.
From: Steffen Klassert @ 2018-06-12 7:56 UTC (permalink / raw)
To: netdev, David Miller
Cc: Steffen Klassert, Eyal Birger, Antony Antony, Benedict Wong,
Lorenzo Colitti, Shannon Nelson
In-Reply-To: <20180612075610.2000-1-steffen.klassert@secunet.com>
Add a new flowi_xfrm structure with informations needed to do
a xfrm lookup. At the moment it keeps the informations about
the new xfrm interface id needed to lookup xfrm interfaces
that are introduced with a followup patch. We need this new
lookup key as other possible keys, like the ifindex is
already part of the xfrm selector and used as a key to
enforce the output device after the transformation in the
policy/state lookup.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Benedict Wong <benedictwong@google.com>
Tested-by: Benedict Wong <benedictwong@google.com>
Tested-by: Antony Antony <antony@phenome.org>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
---
include/net/flow.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/net/flow.h b/include/net/flow.h
index 8ce21793094e..187c9bef672f 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -26,6 +26,10 @@ struct flowi_tunnel {
__be64 tun_id;
};
+struct flowi_xfrm {
+ __u32 if_id;
+};
+
struct flowi_common {
int flowic_oif;
int flowic_iif;
@@ -39,6 +43,7 @@ struct flowi_common {
#define FLOWI_FLAG_SKIP_NH_OIF 0x04
__u32 flowic_secid;
struct flowi_tunnel flowic_tun_key;
+ struct flowi_xfrm xfrm;
kuid_t flowic_uid;
};
@@ -78,6 +83,7 @@ struct flowi4 {
#define flowi4_secid __fl_common.flowic_secid
#define flowi4_tun_key __fl_common.flowic_tun_key
#define flowi4_uid __fl_common.flowic_uid
+#define flowi4_xfrm __fl_common.xfrm
/* (saddr,daddr) must be grouped, same order as in IP header */
__be32 saddr;
@@ -109,6 +115,7 @@ static inline void flowi4_init_output(struct flowi4 *fl4, int oif,
fl4->flowi4_flags = flags;
fl4->flowi4_secid = 0;
fl4->flowi4_tun_key.tun_id = 0;
+ fl4->flowi4_xfrm.if_id = 0;
fl4->flowi4_uid = uid;
fl4->daddr = daddr;
fl4->saddr = saddr;
@@ -138,6 +145,7 @@ struct flowi6 {
#define flowi6_secid __fl_common.flowic_secid
#define flowi6_tun_key __fl_common.flowic_tun_key
#define flowi6_uid __fl_common.flowic_uid
+#define flowi6_xfrm __fl_common.xfrm
struct in6_addr daddr;
struct in6_addr saddr;
/* Note: flowi6_tos is encoded in flowlabel, too. */
@@ -185,6 +193,7 @@ struct flowi {
#define flowi_secid u.__fl_common.flowic_secid
#define flowi_tun_key u.__fl_common.flowic_tun_key
#define flowi_uid u.__fl_common.flowic_uid
+#define flowi_xfrm u.__fl_common.xfrm
} __attribute__((__aligned__(BITS_PER_LONG/8)));
static inline struct flowi *flowi4_to_flowi(struct flowi4 *fl4)
--
2.14.1
^ permalink raw reply related
* [PATCH RFC v2 ipsec-next 0/3] Virtual xfrm interfaces
From: Steffen Klassert @ 2018-06-12 7:56 UTC (permalink / raw)
To: netdev, David Miller
Cc: Steffen Klassert, Eyal Birger, Antony Antony, Benedict Wong,
Lorenzo Colitti, Shannon Nelson
This patchset introduces new virtual xfrm interfaces.
The design of virtual xfrm interfaces interfaces was
discussed at the Linux IPsec workshop 2018. This patchset
implements these interfaces as the IPsec userspace and
kernel developers agreed. The purpose of these interfaces
is to overcome the design limitations that the existing
VTI devices have.
The main limitations that we see with the current VTI are the
following:
- VTI interfaces are L3 tunnels with configurable endpoints.
For xfrm, the tunnel endpoint are already determined by the SA.
So the VTI tunnel endpoints must be either the same as on the
SA or wildcards. In case VTI tunnel endpoints are same as on
the SA, we get a one to one correlation between the SA and
the tunnel. So each SA needs its own tunnel interface.
On the other hand, we can have only one VTI tunnel with
wildcard src/dst tunnel endpoints in the system because the
lookup is based on the tunnel endpoints. The existing tunnel
lookup won't work with multiple tunnels with wildcard
tunnel endpoints. Some usecases require more than on
VTI tunnel of this type, for example if somebody has multiple
namespaces and every namespace requires such a VTI.
- VTI needs separate interfaces for IPv4 and IPv6 tunnels.
So when routing to a VTI, we have to know to which address
family this traffic class is going to be encapsulated.
This is a lmitation because it makes routing more complex
and it is not always possible to know what happens behind the
VTI, e.g. when the VTI is move to some namespace.
- VTI works just with tunnel mode SAs. We need generic interfaces
that ensures transfomation, regardless of the xfrm mode and
the encapsulated address family.
- VTI is configured with a combination GRE keys and xfrm marks.
With this we have to deal with some extra cases in the generic
tunnel lookup because the GRE keys on the VTI are actually
not GRE keys, the GRE keys were just reused for something else.
All extensions to the VTI interfaces would require to add
even more complexity to the generic tunnel lookup.
To overcome this, we started with the following design goal:
- It should be possible to tunnel IPv4 and IPv6 through the same
interface.
- No limitation on xfrm mode (tunnel, transport and beet).
- Should be a generic virtual interface that ensures IPsec
transformation, no need to know what happens behind the
interface.
- Interfaces should be configured with a new key that must match a
new policy/SA lookup key.
- The lookup logic should stay in the xfrm codebase, no need to
change or extend generic routing and tunnel lookups.
- Should be possible to use IPsec hardware offloads of the underlying
interface.
Changes from v1:
- Document the limitations of VTI interfaces and the design of
the new xfrm interfaces more explicit in the commit messages.
- No code changes.
^ permalink raw reply
* Re: [PATCH 1/3] m68k: coldfire: Normalize clk API
From: Geert Uytterhoeven @ 2018-06-12 7:31 UTC (permalink / raw)
To: Greg Ungerer
Cc: Ralf Baechle, James Hogan, Giuseppe Cavallaro, Alexandre Torgue,
Jose Abreu, Corentin Labbe, David S. Miller, Arnd Bergmann,
linux-m68k, Linux MIPS Mailing List, netdev,
Linux Kernel Mailing List
In-Reply-To: <944b08ba-a882-e6cd-42fa-9251bce1d7b1@linux-m68k.org>
Hi Greg,
On Tue, Jun 12, 2018 at 9:27 AM Greg Ungerer <gerg@linux-m68k.org> wrote:
> On 11/06/18 18:44, Geert Uytterhoeven wrote:
> > Coldfire still provides its own variant of the clk API rather than using
> > the generic COMMON_CLK API. This generally works, but it causes some
> > link errors with drivers using the clk_round_rate(), clk_set_rate(),
> > clk_set_parent(), or clk_get_parent() functions when a platform lacks
> > those interfaces.
> >
> > This adds empty stub implementations for each of them, and I don't even
> > try to do something useful here but instead just print a WARN() message
> > to make it obvious what is going on if they ever end up being called.
> >
> > The drivers that call these won't be used on these platforms (otherwise
> > we'd get a link error today), so the added code is harmless bloat and
> > will warn about accidental use.
> >
> > Based on commit bd7fefe1f06ca6cc ("ARM: w90x900: normalize clk API").
> >
> > Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
>
> I am fine with this for ColdFire, so
>
> Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Thanks!
> Are you going to take this/these via your m68k git tree?
I''m fine delagating this to you.
Thanks!
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* Re: [PATCH 1/3] m68k: coldfire: Normalize clk API
From: Greg Ungerer @ 2018-06-12 7:26 UTC (permalink / raw)
To: Geert Uytterhoeven, Ralf Baechle, James Hogan, Giuseppe Cavallaro,
Alexandre Torgue, Jose Abreu, Corentin Labbe, David S . Miller
Cc: Arnd Bergmann, linux-m68k, linux-mips, netdev, linux-kernel
In-Reply-To: <1528706663-20670-2-git-send-email-geert@linux-m68k.org>
Hi Geert,
On 11/06/18 18:44, Geert Uytterhoeven wrote:
> Coldfire still provides its own variant of the clk API rather than using
> the generic COMMON_CLK API. This generally works, but it causes some
> link errors with drivers using the clk_round_rate(), clk_set_rate(),
> clk_set_parent(), or clk_get_parent() functions when a platform lacks
> those interfaces.
>
> This adds empty stub implementations for each of them, and I don't even
> try to do something useful here but instead just print a WARN() message
> to make it obvious what is going on if they ever end up being called.
>
> The drivers that call these won't be used on these platforms (otherwise
> we'd get a link error today), so the added code is harmless bloat and
> will warn about accidental use.
>
> Based on commit bd7fefe1f06ca6cc ("ARM: w90x900: normalize clk API").
>
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
I am fine with this for ColdFire, so
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Are you going to take this/these via your m68k git tree?
Regards
Greg
> ---
> arch/m68k/coldfire/clk.c | 29 +++++++++++++++++++++++++++++
> 1 file changed, 29 insertions(+)
>
> diff --git a/arch/m68k/coldfire/clk.c b/arch/m68k/coldfire/clk.c
> index 849cd208e2ed99e6..7bc666e482ebe82f 100644
> --- a/arch/m68k/coldfire/clk.c
> +++ b/arch/m68k/coldfire/clk.c
> @@ -129,4 +129,33 @@ unsigned long clk_get_rate(struct clk *clk)
> }
> EXPORT_SYMBOL(clk_get_rate);
>
> +/* dummy functions, should not be called */
> +long clk_round_rate(struct clk *clk, unsigned long rate)
> +{
> + WARN_ON(clk);
> + return 0;
> +}
> +EXPORT_SYMBOL(clk_round_rate);
> +
> +int clk_set_rate(struct clk *clk, unsigned long rate)
> +{
> + WARN_ON(clk);
> + return 0;
> +}
> +EXPORT_SYMBOL(clk_set_rate);
> +
> +int clk_set_parent(struct clk *clk, struct clk *parent)
> +{
> + WARN_ON(clk);
> + return 0;
> +}
> +EXPORT_SYMBOL(clk_set_parent);
> +
> +struct clk *clk_get_parent(struct clk *clk)
> +{
> + WARN_ON(clk);
> + return NULL;
> +}
> +EXPORT_SYMBOL(clk_get_parent);
> +
> /***************************************************************************/
>
^ permalink raw reply
* Re: [PATCH] net: phy: mdio-gpio: Cut surplus includes
From: Andrew Lunn @ 2018-06-12 7:05 UTC (permalink / raw)
To: Linus Walleij; +Cc: Florian Fainelli, netdev
In-Reply-To: <20180611111903.7221-1-linus.walleij@linaro.org>
On Mon, Jun 11, 2018 at 01:19:03PM +0200, Linus Walleij wrote:
> The GPIO MDIO driver now needs only <linux/gpio/consumer.h>
> so cut the legacy <linux/gpio.h> and <linux/of_gpio.h>
> includes that are no longer used.
>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* [PATCH v2] xen/netfront: raise max number of slots in xennet_get_responses()
From: Juergen Gross @ 2018-06-12 6:57 UTC (permalink / raw)
To: linux-kernel, xen-devel, netdev; +Cc: boris.ostrovsky, davem, Juergen Gross
The max number of slots used in xennet_get_responses() is set to
MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD).
In old kernel-xen MAX_SKB_FRAGS was 18, while nowadays it is 17. This
difference is resulting in frequent messages "too many slots" and a
reduced network throughput for some workloads (factor 10 below that of
a kernel-xen based guest).
Replacing MAX_SKB_FRAGS by XEN_NETIF_NR_SLOTS_MIN for calculation of
the max number of slots to use solves that problem (tests showed no
more messages "too many slots" and throughput was as high as with the
kernel-xen based guest system).
Replace MAX_SKB_FRAGS-2 by XEN_NETIF_NR_SLOTS_MIN-1 in
netfront_tx_slot_available() for making it clearer what is really being
tested without actually modifying the tested value.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
drivers/net/xen-netfront.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 679da1abd73c..922ce0abf5cf 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -239,7 +239,7 @@ static void rx_refill_timeout(struct timer_list *t)
static int netfront_tx_slot_available(struct netfront_queue *queue)
{
return (queue->tx.req_prod_pvt - queue->tx.rsp_cons) <
- (NET_TX_RING_SIZE - MAX_SKB_FRAGS - 2);
+ (NET_TX_RING_SIZE - XEN_NETIF_NR_SLOTS_MIN - 1);
}
static void xennet_maybe_wake_tx(struct netfront_queue *queue)
@@ -790,7 +790,7 @@ static int xennet_get_responses(struct netfront_queue *queue,
RING_IDX cons = queue->rx.rsp_cons;
struct sk_buff *skb = xennet_get_rx_skb(queue, cons);
grant_ref_t ref = xennet_get_rx_ref(queue, cons);
- int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD);
+ int max = XEN_NETIF_NR_SLOTS_MIN + (rx->status <= RX_COPY_THRESHOLD);
int slots = 1;
int err = 0;
unsigned long ret;
--
2.13.7
^ permalink raw reply related
* Re: [Xen-devel] [PATCH] xen/netfront: raise max number of slots in xennet_get_responses()
From: Juergen Gross @ 2018-06-12 6:45 UTC (permalink / raw)
To: Boris Ostrovsky, linux-kernel, xen-devel, netdev; +Cc: davem
In-Reply-To: <e970a52b-d4ad-30c1-65ca-15d7679951da@oracle.com>
On 11/06/18 20:59, Boris Ostrovsky wrote:
> On 06/11/2018 03:57 AM, Juergen Gross wrote:
>> The max number of slots used in xennet_get_responses() is set to
>> MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD).
>>
>> In old kernel-xen MAX_SKB_FRAGS was 18, while nowadays it is 17. This
>> difference is resulting in frequent messages "too many slots" and a
>> reduced network throughput for some workloads (factor 10 below that of
>> a kernel-xen based guest).
>>
>> Replacing MAX_SKB_FRAGS by XEN_NETIF_NR_SLOTS_MIN for calculation of
>> the max number of slots to use solves that problem (tests showed no
>> more messages "too many slots" and throughput was as high as with the
>> kernel-xen based guest system).
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
> I wonder also whether netfront_tx_slot_available() is meant to be
>
> return (queue->tx.req_prod_pvt - queue->tx.rsp_cons) <
> (NET_TX_RING_SIZE - XEN_NETIF_NR_SLOTS_MIN - 1);
>
> which is the same numeric value but provides a more accurate description
> of what is being tested.
Yes, this is a sensible idea. I'll add that, keeping your R-b.
Juergen
^ permalink raw reply
* [PATCH bpf v3] tools/bpftool: fix a bug in bpftool perf
From: Yonghong Song @ 2018-06-12 5:35 UTC (permalink / raw)
To: ast, daniel, netdev; +Cc: kernel-team
Commit b04df400c302 ("tools/bpftool: add perf subcommand")
introduced bpftool subcommand perf to query bpf program
kuprobe and tracepoint attachments.
The perf subcommand will first test whether bpf subcommand
BPF_TASK_FD_QUERY is supported in kernel or not. It does it
by opening a file with argv[0] and feeds the file descriptor
and current task pid to the kernel for querying.
Such an approach won't work if the argv[0] cannot be opened
successfully in the current directory. This is especially
true when bpftool is accessible through PATH env variable.
The error below reflects the open failure for file argv[0]
at home directory.
[yhs@localhost ~]$ which bpftool
/usr/local/sbin/bpftool
[yhs@localhost ~]$ bpftool perf
Error: perf_query_support: No such file or directory
To fix the issue, let us open root directory ("/")
which exists in every linux system. With the fix, the
error message will correctly reflect the permission issue.
[yhs@localhost ~]$ which bpftool
/usr/local/sbin/bpftool
[yhs@localhost ~]$ bpftool perf
Error: perf_query_support: Operation not permitted
HINT: non root or kernel doesn't support TASK_FD_QUERY
Fixes: b04df400c302 ("tools/bpftool: add perf subcommand")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
tools/bpf/bpftool/perf.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Changelogs:
v2 -> v3:
. fix a stupid error (flippingg the condition) I introduced in v2.
v1 -> v2:
. remove '\n' in the p_err format string in order to
have valid json output.
diff --git a/tools/bpf/bpftool/perf.c b/tools/bpf/bpftool/perf.c
index ac6b1a12c9b7..b76b77dcfd1f 100644
--- a/tools/bpf/bpftool/perf.c
+++ b/tools/bpf/bpftool/perf.c
@@ -29,9 +29,10 @@ static bool has_perf_query_support(void)
if (perf_query_supported)
goto out;
- fd = open(bin_name, O_RDONLY);
+ fd = open("/", O_RDONLY);
if (fd < 0) {
- p_err("perf_query_support: %s", strerror(errno));
+ p_err("perf_query_support: cannot open directory \"/\" (%s)",
+ strerror(errno));
goto out;
}
--
2.14.3
^ permalink raw reply related
* Re: [PATCH net] tls: fix NULL pointer dereference on poll
From: Christoph Hellwig @ 2018-06-12 5:37 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: davem, davejwatson, netdev, ast, Christoph Hellwig
In-Reply-To: <20180611212204.24002-1-daniel@iogearbox.net>
> Looks like the recent conversion from poll to poll_mask callback started
> in 152524231023 ("net: add support for ->poll_mask in proto_ops") missed
> to eventually convert kTLS, too: TCP's ->poll was converted over to the
> ->poll_mask in commit 2c7d3dacebd4 ("net/tcp: convert to ->poll_mask")
> and therefore kTLS wrongly saved the ->poll old one which is now NULL.
Looks like this TLS code was added in the same cycle.
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index 301f224..a127d61 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -712,7 +712,7 @@ static int __init tls_register(void)
> build_protos(tls_prots[TLSV4], &tcp_prot);
>
> tls_sw_proto_ops = inet_stream_ops;
> - tls_sw_proto_ops.poll = tls_sw_poll;
> + tls_sw_proto_ops.poll_mask = tls_sw_poll_mask;
> tls_sw_proto_ops.splice_read = tls_sw_splice_read;
Not new in this patch, but copying ops vectors is a very bad idea, not
only because your new instance can't be marked const and you thus open
up exploit vectors. I would suggest to clean this up eventually.
> +__poll_t tls_sw_poll_mask(struct socket *sock, __poll_t events)
> {
> struct sock *sk = sock->sk;
> struct tls_context *tls_ctx = tls_get_ctx(sk);
> struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
> + __poll_t mask;
>
> + /* Grab EPOLLOUT and EPOLLHUP from the underlying socket */
> + mask = ctx->sk_poll_mask(sock, events);
>
> + /* Clear EPOLLIN bits, and set based on recv_pkt */
> + mask &= ~(EPOLLIN | EPOLLRDNORM);
> if (ctx->recv_pkt)
> + mask |= EPOLLIN | EPOLLRDNORM;
>
> + return mask;
So you call the underlying protocol method on the struct sock of
the TLS code? Again not reall new in this patch, but how is this
even supposed to work?
^ permalink raw reply
* Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Samudrala, Sridhar @ 2018-06-12 5:02 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang
Cc: alexander.h.duyck, virtio-dev, aaron.f.brown, jiri, kubakici,
netdev, qemu-devel, loseweigh, virtualization
In-Reply-To: <20180612051432-mutt-send-email-mst@kernel.org>
On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote:
> On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote:
>>
>> On 2018年06月12日 01:26, Michael S. Tsirkin wrote:
>>> On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:
>>>> This feature bit can be used by hypervisor to indicate virtio_net device to
>>>> act as a standby for another device with the same MAC address.
>>>>
>>>> I tested this with a small change to the patch to mark the STANDBY feature 'true'
>>>> by default as i am using libvirt to start the VMs.
>>>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
>>>> XML file?
>>>>
>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>> So I do not think we can commit to this interface: we
>>> really need to control visibility of the primary device.
>> The problem is legacy guest won't use primary device at all if we do this.
> And that's by design - I think it's the only way to ensure the
> legacy guest isn't confused.
Yes. I think so. But i am not sure if Qemu is the right place to control the visibility
of the primary device. The primary device may not be specified as an argument to Qemu. It
may be plugged in later.
The cloud service provider is providing a feature that enables low latency datapath and live
migration capability.
A tenant can use this feature only if he is running a VM that has virtio-net with failover support.
I think Qemu should check if guest virtio-net supports this feature and provide a mechanism for
an upper layer indicating if the STANDBY feature is successfully negotiated or not.
The upper layer can then decide if it should hot plug a VF with the same MAC and manage the 2 links.
If VF is successfully hot plugged, virtio-net link should be disabled.
>
>> How about control the visibility of standby device?
>>
>> Thanks
> standy the always there to guarantee no downtime.
>
>>> However just for testing purposes, we could add a non-stable
>>> interface "x-standby" with the understanding that as any
>>> x- prefix it's unstable and will be changed down the road,
>>> likely in the next release.
>>>
>>>
>>>> ---
>>>> hw/net/virtio-net.c | 2 ++
>>>> include/standard-headers/linux/virtio_net.h | 3 +++
>>>> 2 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>> index 90502fca7c..38b3140670 100644
>>>> --- a/hw/net/virtio-net.c
>>>> +++ b/hw/net/virtio-net.c
>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>>>> true),
>>>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>>>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>>>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
>>>> + false),
>>>> DEFINE_PROP_END_OF_LIST(),
>>>> };
>>>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
>>>> index e9f255ea3f..01ec09684c 100644
>>>> --- a/include/standard-headers/linux/virtio_net.h
>>>> +++ b/include/standard-headers/linux/virtio_net.h
>>>> @@ -57,6 +57,9 @@
>>>> * Steering */
>>>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>>>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
>>>> + * with the same MAC.
>>>> + */
>>>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
>>>> #ifndef VIRTIO_NET_NO_LEGACY
>>>> --
>>>> 2.14.3
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [PATCH] isdn/i4l: add error handling for try_module_get
From: Zhouyang Jia @ 2018-06-12 4:43 UTC (permalink / raw)
Cc: Zhouyang Jia, Karsten Keil, Kees Cook, Annie Cherkaev, Al Viro,
Jiten Thakkar, netdev, linux-kernel
When try_module_get fails, the lack of error-handling code may
cause unexpected results.
This patch adds error-handling code after calling try_module_get.
Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
---
drivers/isdn/i4l/isdn_common.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/isdn/i4l/isdn_common.c b/drivers/isdn/i4l/isdn_common.c
index 7c6f3f5..7e52851 100644
--- a/drivers/isdn/i4l/isdn_common.c
+++ b/drivers/isdn/i4l/isdn_common.c
@@ -71,7 +71,8 @@ static int isdn_add_channels(isdn_driver_t *d, int drvidx, int n, int adding);
static inline void
isdn_lock_driver(isdn_driver_t *drv)
{
- try_module_get(drv->interface->owner);
+ if (!try_module_get(drv->interface->owner))
+ printk(KERN_WARNING "isdn_lock_driver: cannot get module\n");
drv->locks++;
}
--
2.7.4
^ permalink raw reply related
* [PATCH net 4/4] nfp: flower: free dst_entry in route table
From: Jakub Kicinski @ 2018-06-12 4:33 UTC (permalink / raw)
To: davem; +Cc: netdev, oss-drivers, Pieter Jansen van Vuuren, Louis Peens
In-Reply-To: <20180612043338.5447-1-jakub.kicinski@netronome.com>
From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
We need to release the refcnt on dst_entry in the route table, otherwise
we will leak the route.
Fixes: 8e6a9046b66a ("nfp: flower vxlan neighbour offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index ec524d97869d..78afe75129ab 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -381,6 +381,8 @@ nfp_tun_neigh_event_handler(struct notifier_block *nb, unsigned long event,
err = PTR_ERR_OR_ZERO(rt);
if (err)
return NOTIFY_DONE;
+
+ ip_rt_put(rt);
#else
return NOTIFY_DONE;
#endif
--
2.17.1
^ permalink raw reply related
* [PATCH net 0/4] nfp: fix a warning, stats, naming and route leak
From: Jakub Kicinski @ 2018-06-12 4:33 UTC (permalink / raw)
To: davem; +Cc: netdev, oss-drivers, Jakub Kicinski
Hi!
Various fixes for the NFP. Patch 1 fixes a harmless GCC 8 warning.
Patch 2 ensures statistics are correct after users decrease the number
of channels/rings. Patch 3 restores phy_port_name behaviour for flower,
ndo_get_phy_port_name used to return -EOPNOTSUPP on one of the netdevs,
and we need to keep it that way otherwise interface names may change.
Patch 4 fixes refcnt leak in flower tunnel offload code.
Jakub Kicinski (3):
nfp: don't pad strings in nfp_cpp_resource_find() to avoid gcc 8
warning
nfp: include all ring counters in interface stats
nfp: remove phys_port_name on flower's vNIC
Pieter Jansen van Vuuren (1):
nfp: flower: free dst_entry in route table
drivers/net/ethernet/netronome/nfp/flower/main.c | 1 +
drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 2 ++
drivers/net/ethernet/netronome/nfp/nfp_net.h | 4 ++++
drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 4 ++--
drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c | 7 ++-----
5 files changed, 11 insertions(+), 7 deletions(-)
--
2.17.1
^ permalink raw reply
* [PATCH net 3/4] nfp: remove phys_port_name on flower's vNIC
From: Jakub Kicinski @ 2018-06-12 4:33 UTC (permalink / raw)
To: davem; +Cc: netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <20180612043338.5447-1-jakub.kicinski@netronome.com>
.ndo_get_phys_port_name was recently extended to support multi-vNIC
FWs. These are firmwares which can have more than one vNIC per PF
without associated port (e.g. Adaptive Buffer Management FW), therefore
we need a way of distinguishing the vNICs. Unfortunately, it's too
late to make flower use the same naming. Flower users may depend on
.ndo_get_phys_port_name returning -EOPNOTSUPP, for example the name
udev gave the PF vNIC was just the bare PCI device-based name before
the change, and will have 'nn0' appended after.
To ensure flower's vNIC doesn't have phys_port_name attribute, add
a flag to vNIC struct and set it in flower code. New projects will
not set the flag adhere to the naming scheme from the start.
Fixes: 51c1df83e35c ("nfp: assign vNIC id as phys_port_name of vNICs which are not ports")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
drivers/net/ethernet/netronome/nfp/flower/main.c | 1 +
drivers/net/ethernet/netronome/nfp/nfp_net.h | 4 ++++
drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 +-
3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 19cfa162ac65..1decf3a1cad3 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -455,6 +455,7 @@ static int nfp_flower_vnic_alloc(struct nfp_app *app, struct nfp_net *nn,
eth_hw_addr_random(nn->dp.netdev);
netif_keep_dst(nn->dp.netdev);
+ nn->vnic_no_name = true;
return 0;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 57cb035dcc6d..2a71a9ffd095 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -590,6 +590,8 @@ struct nfp_net_dp {
* @vnic_list: Entry on device vNIC list
* @pdev: Backpointer to PCI device
* @app: APP handle if available
+ * @vnic_no_name: For non-port PF vNIC make ndo_get_phys_port_name return
+ * -EOPNOTSUPP to keep backwards compatibility (set by app)
* @port: Pointer to nfp_port structure if vNIC is a port
* @app_priv: APP private data for this vNIC
*/
@@ -663,6 +665,8 @@ struct nfp_net {
struct pci_dev *pdev;
struct nfp_app *app;
+ bool vnic_no_name;
+
struct nfp_port *port;
void *app_priv;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index ed27176c2bce..d4c27f849f9b 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3286,7 +3286,7 @@ nfp_net_get_phys_port_name(struct net_device *netdev, char *name, size_t len)
if (nn->port)
return nfp_port_get_phys_port_name(netdev, name, len);
- if (nn->dp.is_vf)
+ if (nn->dp.is_vf || nn->vnic_no_name)
return -EOPNOTSUPP;
n = snprintf(name, len, "n%d", nn->id);
--
2.17.1
^ permalink raw reply related
* [PATCH net 2/4] nfp: include all ring counters in interface stats
From: Jakub Kicinski @ 2018-06-12 4:33 UTC (permalink / raw)
To: davem; +Cc: netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <20180612043338.5447-1-jakub.kicinski@netronome.com>
We are gathering software statistics on per-ring basis.
.ndo_get_stats64 handler adds the rings up. Unfortunately
we are currently only adding up active rings, which means
that if user decreases the number of active rings the
statistics from deactivated rings will no longer be counted
and total interface statistics may go backwards.
Always sum all possible rings, the stats are allocated
statically for max number of rings, so we don't have to
worry about them being removed. We could add the stats
up when user changes the ring count, but it seems unnecessary..
Adding up inactive rings will be very quick since no datapath
will be touching them.
Fixes: 164d1e9e5d52 ("nfp: add support for ethtool .set_channels")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 75110c8d6a90..ed27176c2bce 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3121,7 +3121,7 @@ static void nfp_net_stat64(struct net_device *netdev,
struct nfp_net *nn = netdev_priv(netdev);
int r;
- for (r = 0; r < nn->dp.num_r_vecs; r++) {
+ for (r = 0; r < nn->max_r_vecs; r++) {
struct nfp_net_r_vector *r_vec = &nn->r_vecs[r];
u64 data[3];
unsigned int start;
--
2.17.1
^ permalink raw reply related
* [PATCH net 1/4] nfp: don't pad strings in nfp_cpp_resource_find() to avoid gcc 8 warning
From: Jakub Kicinski @ 2018-06-12 4:33 UTC (permalink / raw)
To: davem; +Cc: netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <20180612043338.5447-1-jakub.kicinski@netronome.com>
Once upon a time nfp_cpp_resource_find() took a name parameter,
which could be any user-chosen string. Resources are identified
by a CRC32 hash of a 8 byte string, so we had to pad user input
with zeros to make sure CRC32 gave the correct result.
Since then nfp_cpp_resource_find() was made to operate on allocated
resources only (struct nfp_resource). We kzalloc those so there is
no need to pad the strings and use memcmp.
This avoids a GCC 8 stringop-truncation warning:
In function ‘nfp_cpp_resource_find’,
inlined from ‘nfp_resource_try_acquire’ at .../nfpcore/nfp_resource.c:153:8,
inlined from ‘nfp_resource_acquire’ at .../nfpcore/nfp_resource.c:206:9:
.../nfpcore/nfp_resource.c:108:2: warning: strncpy’ output may be truncated copying 8 bytes from a string of length 8 [-Wstringop-truncation]
strncpy(name_pad, res->name, sizeof(name_pad));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c
index 2dd89dba9311..d32af598da90 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c
@@ -98,21 +98,18 @@ struct nfp_resource {
static int nfp_cpp_resource_find(struct nfp_cpp *cpp, struct nfp_resource *res)
{
- char name_pad[NFP_RESOURCE_ENTRY_NAME_SZ] = {};
struct nfp_resource_entry entry;
u32 cpp_id, key;
int ret, i;
cpp_id = NFP_CPP_ID(NFP_RESOURCE_TBL_TARGET, 3, 0); /* Atomic read */
- strncpy(name_pad, res->name, sizeof(name_pad));
-
/* Search for a matching entry */
- if (!memcmp(name_pad, NFP_RESOURCE_TBL_NAME "\0\0\0\0\0\0\0\0", 8)) {
+ if (!strcmp(res->name, NFP_RESOURCE_TBL_NAME)) {
nfp_err(cpp, "Grabbing device lock not supported\n");
return -EOPNOTSUPP;
}
- key = crc32_posix(name_pad, sizeof(name_pad));
+ key = crc32_posix(res->name, NFP_RESOURCE_ENTRY_NAME_SZ);
for (i = 0; i < NFP_RESOURCE_TBL_ENTRIES; i++) {
u64 addr = NFP_RESOURCE_TBL_BASE +
--
2.17.1
^ permalink raw reply related
* Re: [PATCH] net: add error handling for kmem_cache_create
From: Eric Dumazet @ 2018-06-12 4:31 UTC (permalink / raw)
To: Zhouyang Jia
Cc: David S. Miller, Wei Wang, Martin KaFai Lau, Kees Cook,
Eric Dumazet, Florian Westphal, Joe Perches, linux-decnet-user,
netdev, linux-kernel
In-Reply-To: <1528777798-40722-1-git-send-email-jiazhouyang09@gmail.com>
On 06/11/2018 09:29 PM, Zhouyang Jia wrote:
> When kmem_cache_create fails, the lack of error-handling code may
> cause unexpected results.
>
> This patch adds error-handling code after calling kmem_cache_create.
>
> Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
> ---
> net/decnet/dn_route.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
> index e747650..2b743c7 100644
> --- a/net/decnet/dn_route.c
> +++ b/net/decnet/dn_route.c
> @@ -1861,6 +1861,9 @@ void __init dn_route_init(void)
> dn_dst_ops.kmem_cachep =
> kmem_cache_create("dn_dst_cache", sizeof(struct dn_route), 0,
> SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
> + if (!dn_dst_ops.kmem_cachep)
> + panic("Failed to create dn_dst_cache cache\n");
> +
Not needed, since SLAB_PANIC would have paniced the box earlier.
> dst_entries_init(&dn_dst_ops);
> timer_setup(&dn_route_timer, dn_dst_check_expire, 0);
> dn_route_timer.expires = jiffies + decnet_dst_gc_interval * HZ;
>
^ permalink raw reply
* [PATCH] net: add error handling for kmem_cache_create
From: Zhouyang Jia @ 2018-06-12 4:29 UTC (permalink / raw)
Cc: Zhouyang Jia, David S. Miller, Wei Wang, Martin KaFai Lau,
Kees Cook, Eric Dumazet, Florian Westphal, Joe Perches,
linux-decnet-user, netdev, linux-kernel
When kmem_cache_create fails, the lack of error-handling code may
cause unexpected results.
This patch adds error-handling code after calling kmem_cache_create.
Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
---
net/decnet/dn_route.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index e747650..2b743c7 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1861,6 +1861,9 @@ void __init dn_route_init(void)
dn_dst_ops.kmem_cachep =
kmem_cache_create("dn_dst_cache", sizeof(struct dn_route), 0,
SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+ if (!dn_dst_ops.kmem_cachep)
+ panic("Failed to create dn_dst_cache cache\n");
+
dst_entries_init(&dn_dst_ops);
timer_setup(&dn_route_timer, dn_dst_check_expire, 0);
dn_route_timer.expires = jiffies + decnet_dst_gc_interval * HZ;
--
2.7.4
^ permalink raw reply related
* Re: [PATCH bpf v2] tools/bpftool: fix a bug in bpftool perf
From: Yonghong Song @ 2018-06-12 3:45 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180611194257.5f72219b@cakuba.netronome.com>
On 6/11/18 7:42 PM, Jakub Kicinski wrote:
> On Mon, 11 Jun 2018 19:01:08 -0700, Yonghong Song wrote:
>> Commit b04df400c302 ("tools/bpftool: add perf subcommand")
>> introduced bpftool subcommand perf to query bpf program
>> kuprobe and tracepoint attachments.
>>
>> The perf subcommand will first test whether bpf subcommand
>> BPF_TASK_FD_QUERY is supported in kernel or not. It does it
>> by opening a file with argv[0] and feeds the file descriptor
>> and current task pid to the kernel for querying.
>>
>> Such an approach won't work if the argv[0] cannot be opened
>> successfully in the current directory. This is especially
>> true when bpftool is accessible through PATH env variable.
>> The error below reflects the open failure for file argv[0]
>> at home directory.
>>
>> [yhs@localhost ~]$ which bpftool
>> /usr/local/sbin/bpftool
>> [yhs@localhost ~]$ bpftool perf
>> Error: perf_query_support: No such file or directory
>>
>> To fix the issue, let us open root directory ("/")
>> which exists in every linux system. With the fix, the
>> error message will correctly reflect the permission issue.
>>
>> [yhs@localhost ~]$ which bpftool
>> /usr/local/sbin/bpftool
>> [yhs@localhost ~]$ bpftool perf
>> Error: perf_query_support: Operation not permitted
>> HINT: non root or kernel doesn't support TASK_FD_QUERY
>>
>> Fixes: b04df400c302 ("tools/bpftool: add perf subcommand")
>> Reported-by: Alexei Starovoitov <ast@kernel.org>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>> tools/bpf/bpftool/perf.c | 7 ++++---
>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> Changelogs:
>> v1 -> v2:
>> . remove '\n' in the p_err format string in order to
>> have valid json output.
>>
>> diff --git a/tools/bpf/bpftool/perf.c b/tools/bpf/bpftool/perf.c
>> index ac6b1a12c9b7..239715aa6fb9 100644
>> --- a/tools/bpf/bpftool/perf.c
>> +++ b/tools/bpf/bpftool/perf.c
>> @@ -29,9 +29,10 @@ static bool has_perf_query_support(void)
>> if (perf_query_supported)
>> goto out;
>>
>> - fd = open(bin_name, O_RDONLY);
>> - if (fd < 0) {
>> - p_err("perf_query_support: %s", strerror(errno));
>> + fd = open("/", O_RDONLY);
>> + if (fd > 0) {
>
> Looking at this again, why change the comparison from less than to
> greater than 0?
Totally my fault. I altered the condition to test p_err output since
I cannot trigger it normally, but forgot to change it back.
Will send yet another revision :-(
>> + p_err("perf_query_support: cannot open directory \"/\" (%s)",
>> + strerror(errno));
>> goto out;
>> }
>>
>
^ permalink raw reply
* problems with SCTP GSO
From: David Miller @ 2018-06-12 3:29 UTC (permalink / raw)
To: marcelo.leitner; +Cc: lucien.xin, edumazet, netdev
I would like to bring up some problems with the current GSO
implementation in SCTP.
The most important for me right now is that SCTP uses
"skb_gro_receive()" to build "GSO" frames :-(
Really it just ends up using the slow path (basically, label 'merge'
and onwards).
So, using a GRO helper to build GSO packets is not great.
I want to make major surgery here and the only way I can is if
it is exactly the GRO demuxing path that uses skb_gro_receive().
Those paths pass in the list head from the NAPI struct that initiated
the GRO code paths. That makes it easy for me to change this to use a
list_head or a hash chain.
Probably in the short term SCTP should just have a private helper that
builds the frag list, appending 'skb' to 'head'.
In the long term, SCTP should use the page frags just like TCP to
append the data when building GSO frames. Then it could actually be
offloaded and passed into drivers without linearizing.
^ permalink raw reply
* Re: [PATCH iproute2-next] ip-xfrm: Add support for OUTPUT_MARK
From: Stephen Hemminger @ 2018-06-12 3:12 UTC (permalink / raw)
To: Subash Abhinov Kasiviswanathan; +Cc: lorenzo, netdev, dsahern
In-Reply-To: <1528769493-25435-1-git-send-email-subashab@codeaurora.org>
On Mon, 11 Jun 2018 20:11:33 -0600
Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> wrote:
> This patch adds support for OUTPUT_MARK in xfrm state to exercise the
> functionality added by kernel commit 077fbac405bf
> ("net: xfrm: support setting an output mark.").
>
> Sample output with output-mark -
>
> src 192.168.1.1 dst 192.168.1.2
> proto esp spi 0x00004321 reqid 0 mode tunnel
> replay-window 0 flag af-unspec
> auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
> enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
> anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
> output-mark 0x20000
>
> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
This reminds me. It is time to convert xfrm to json output.
^ permalink raw reply
* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Matthew Wilcox @ 2018-06-12 3:06 UTC (permalink / raw)
To: Jens Axboe
Cc: Juergen Gross, kvm, linux-scsi, Matthew Wilcox, netdev, linux-usb,
linux-kernel, virtualization, target-devel, qla2xxx-upstream,
linux1394-devel, Kent Overstreet
In-Reply-To: <02326395-3241-c94b-ad70-3de27a6f5a8c@kernel.dk>
On Mon, Jun 11, 2018 at 07:18:55PM -0600, Jens Axboe wrote:
> Are you going to push this further? I really think we should.
Yes, I'll resubmit it tomorrow. Sorry for dropping the ball on this one.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox