Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/2] net_sched: call qlen_notify only if child qdisc is empty
From: Cong Wang @ 2017-08-16 17:22 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Linux Kernel Network Developers, David S. Miller, Jiri Kosina,
	Eric Dumazet, Jamal Hadi Salim
In-Reply-To: <150280439968.717891.6476652519937001709.stgit@buzz>

On Tue, Aug 15, 2017 at 6:39 AM, Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
> This callback is used for deactivating class in parent qdisc.
> This is cheaper to test queue length right here.
>
> Also this allows to catch draining screwed backlog and prevent
> second deactivation of already inactive parent class which will
> crash kernel for sure. Kernel with print warning at destruction
> of child qdisc where no packets but backlog is not zero.

Good cleanup. Please explicitly mark patches like this as
for net-next.

Just one comment below.


> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index bd24a550e0f9..18da45c0769c 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -752,6 +752,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
>         const struct Qdisc_class_ops *cops;
>         unsigned long cl;
>         u32 parentid;
> +       bool notify;
>         int drops;
>
>         if (n == 0 && len == 0)
> @@ -764,6 +765,13 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
>
>                 if (sch->flags & TCQ_F_NOPARENT)
>                         break;
> +               /* Notify parent qdisc only if child qdisc becomes empty.
> +                *
> +                * If child was empty even before update then backlog
> +                * counter is screwed and we skip notification because
> +                * parent class is already passive.
> +                */
> +               notify = !sch->q.qlen && !WARN_ON_ONCE(!n);

Since 'n' never changes in this function, can we just move
this WARN_ON_ONCE() right before the loop??

^ permalink raw reply

* [PATCH 6/6] mpls: pseudowire control word support
From: David Lamparter @ 2017-08-16 17:02 UTC (permalink / raw)
  To: netdev; +Cc: amine.kherbouche, roopa, David Lamparter
In-Reply-To: <20170816170202.456851-1-equinox@diac24.net>

[TODO: maybe rename this to MPLS_FLAGS and use it for non-pseudowire OAM
bits too (e.g. enabling G-ACh or LSP ping.)]

Signed-off-by: David Lamparter <equinox@diac24.net>
---
 include/uapi/linux/rtnetlink.h |  4 ++++
 net/mpls/af_mpls.c             | 11 +++++++++++
 net/mpls/internal.h            |  8 ++------
 net/mpls/vpls.c                | 34 +++++++++++++++++++++++++++++++++-
 4 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index b7840ed94526..b5a34e0e4327 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -327,6 +327,7 @@ enum rtattr_type_t {
 	RTA_UID,
 	RTA_TTL_PROPAGATE,
 	RTA_VPLS_IF,
+	RTA_VPLS_FLAGS,
 	__RTA_MAX
 };
 
@@ -335,6 +336,9 @@ enum rtattr_type_t {
 #define RTM_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct rtmsg))))
 #define RTM_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct rtmsg))
 
+#define RTA_VPLS_F_CW_RX	(1 << 0)
+#define RTA_VPLS_F_CW_TX	(1 << 1)
+
 /* RTM_MULTIPATH --- array of struct rtnexthop.
  *
  * "struct rtnexthop" describes all necessary nexthop information,
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 4d3ce007b7db..9036beeca173 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -477,6 +477,7 @@ struct mpls_route_config {
 	u32			rc_protocol;
 	u32			rc_ifindex;
 	u32			rc_vpls_ifindex;
+	u8			rc_vpls_flags;
 	u8			rc_via_table;
 	u8			rc_via_alen;
 	u8			rc_via[MAX_VIA_ALEN];
@@ -1036,6 +1037,7 @@ static int mpls_route_add(struct mpls_route_config *cfg,
 	rt->rt_payload_type = cfg->rc_payload_type;
 	rt->rt_ttl_propagate = cfg->rc_ttl_propagate;
 	rt->rt_vpls_dev = vpls_dev;
+	rt->rt_vpls_flags = cfg->rc_vpls_flags;
 
 	if (cfg->rc_mp)
 		err = mpls_nh_build_multi(cfg, rt, max_labels, extack);
@@ -1819,6 +1821,9 @@ static int rtm_to_route_config(struct sk_buff *skb,
 			cfg->rc_vpls_ifindex = nla_get_u32(nla);
 			cfg->rc_payload_type = MPT_VPLS;
 			break;
+		case RTA_VPLS_FLAGS:
+			cfg->rc_vpls_flags = nla_get_u8(nla);
+			break;
 		case RTA_NEWDST:
 			if (nla_get_labels(nla, MAX_NEW_LABELS,
 					   &cfg->rc_output_labels,
@@ -1957,6 +1962,9 @@ static int mpls_dump_route(struct sk_buff *skb, u32 portid, u32 seq, int event,
 	if (rt->rt_vpls_dev)
 		if (nla_put_u32(skb, RTA_VPLS_IF, rt->rt_vpls_dev->ifindex))
 			goto nla_put_failure;
+	if (rt->rt_vpls_flags)
+		if (nla_put_u8(skb, RTA_VPLS_FLAGS, rt->rt_vpls_flags))
+			goto nla_put_failure;
 
 	if (rt->rt_nhn == 1) {
 		const struct mpls_nh *nh = rt->rt_nh;
@@ -2270,6 +2278,9 @@ static int mpls_getroute(struct sk_buff *in_skb, struct nlmsghdr *in_nlh,
 	if (rt->rt_vpls_dev)
 		if (nla_put_u32(skb, RTA_VPLS_IF, rt->rt_vpls_dev->ifindex))
 			goto nla_put_failure;
+	if (rt->rt_vpls_flags)
+		if (nla_put_u8(skb, RTA_VPLS_FLAGS, rt->rt_vpls_flags))
+			goto nla_put_failure;
 
 	if (nh->nh_labels &&
 	    nla_put_labels(skb, RTA_NEWDST, nh->nh_labels,
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index 876ae9993207..03048e3a5d83 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -79,11 +79,6 @@ enum mpls_payload_type {
 	MPT_VPLS = 2,	/* pseudowire */
 	MPT_IPV4 = 4,
 	MPT_IPV6 = 6,
-
-	/* Other types not implemented:
-	 *  - Pseudo-wire with or without control word (RFC4385)
-	 *  - GAL (RFC5586)
-	 */
 };
 
 struct mpls_nh { /* next hop label forwarding entry */
@@ -153,7 +148,8 @@ struct mpls_route { /* next hop label forwarding entry */
 	u8			rt_nhn_alive;
 	u8			rt_nh_size;
 	u8			rt_via_offset;
-	u8			rt_reserved1;
+
+	u8			rt_vpls_flags;
 	struct net_device	*rt_vpls_dev;
 
 	struct mpls_nh		rt_nh[0];
diff --git a/net/mpls/vpls.c b/net/mpls/vpls.c
index 28ac810da6e9..1496683d871c 100644
--- a/net/mpls/vpls.c
+++ b/net/mpls/vpls.c
@@ -27,6 +27,14 @@
 #define MIN_MTU 68		/* Min L3 MTU */
 #define MAX_MTU 65535		/* Max L3 MTU (arbitrary) */
 
+struct vpls_cw {
+	u8 type_flags;
+#define VPLS_CWTYPE(cw) ((cw)->type_flags & 0x0f)
+
+	u8 len;
+	u16 seqno;
+};
+
 struct vpls_wirelist {
 	struct rcu_head rcu;
 	size_t count;
@@ -53,6 +61,14 @@ static int vpls_xmit_wire(struct sk_buff *skb, struct net_device *dev,
 	if (rt->rt_vpls_dev != dev)
 		return -EINVAL;
 
+	if (rt->rt_vpls_flags & RTA_VPLS_F_CW_TX) {
+		struct vpls_cw *cw;
+		if (skb_cow(skb, sizeof(*cw)))
+			return -ENOMEM;
+		cw = skb_push(skb, sizeof(*cw));
+		memset(cw, 0, sizeof(*cw));
+	}
+
 	return mpls_rt_xmit(skb, rt, dec);
 }
 
@@ -123,6 +139,7 @@ int vpls_rcv(struct sk_buff *skb, struct net_device *in_dev,
 	struct mpls_entry_decoded dec;
 	struct metadata_dst *md_dst;
 	struct pcpu_sw_netstats *stats;
+	void *next;
 
 	if (!dev)
 		goto drop_nodev;
@@ -133,7 +150,22 @@ int vpls_rcv(struct sk_buff *skb, struct net_device *in_dev,
 		goto drop;
 	}
 
-	skb_pull(skb, sizeof(*hdr));
+	/* bottom label is still in the skb */
+	next = skb_pull(skb, sizeof(*hdr));
+
+	if (rt->rt_vpls_flags & RTA_VPLS_F_CW_RX) {
+		struct vpls_cw *cw = next;
+		if (unlikely(!pskb_may_pull(skb, sizeof(*cw)))) {
+			dev->stats.rx_length_errors++;
+			goto drop;
+		}
+		next = skb_pull(skb, sizeof(*cw));
+
+		if (VPLS_CWTYPE(cw) != 0) {
+			/* insert MPLS OAM implementation here */
+			goto drop_nodev;
+		}
+	}
 
 	if (unlikely(!pskb_may_pull(skb, ETH_HLEN))) {
 		dev->stats.rx_length_errors++;
-- 
2.13.0

^ permalink raw reply related

* [PATCH 5/6] bridge: add VPLS pseudowire info in fdb dump
From: David Lamparter @ 2017-08-16 17:02 UTC (permalink / raw)
  To: netdev; +Cc: amine.kherbouche, roopa, David Lamparter
In-Reply-To: <20170816170202.456851-1-equinox@diac24.net>

Add a NDA_VPLS_WIRE attribute to the FDB dump if we have dst metadata
that is indicative of a VPLS pseudowire.  This is really helpful for
debugging.

Signed-off-by: David Lamparter <equinox@diac24.net>
---
 include/uapi/linux/neighbour.h |  1 +
 net/bridge/br_fdb.c            | 12 ++++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h
index 3199d28980b3..703089c91b04 100644
--- a/include/uapi/linux/neighbour.h
+++ b/include/uapi/linux/neighbour.h
@@ -27,6 +27,7 @@ enum {
 	NDA_MASTER,
 	NDA_LINK_NETNSID,
 	NDA_SRC_VNI,
+	NDA_VPLS_WIRE,
 	__NDA_MAX
 };
 
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 0751fcb89699..ffbaff914d55 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -670,6 +670,18 @@ static int fdb_fill_info(struct sk_buff *skb, const struct net_bridge *br,
 
 	if (fdb->vlan_id && nla_put(skb, NDA_VLAN, sizeof(u16), &fdb->vlan_id))
 		goto nla_put_failure;
+	if (fdb->md_dst) {
+		struct metadata_dst *md_dst;
+		md_dst = (struct metadata_dst *)fdb->md_dst;
+		/* ensured in br_fdb_update */
+		BUG_ON(!(md_dst->dst.flags & DST_METADATA));
+
+		if (md_dst->type == METADATA_VPLS) {
+			unsigned wire = md_dst->u.vpls_info.pw_label;
+			if (nla_put_u32(skb, NDA_VPLS_WIRE, wire))
+				goto nla_put_failure;
+		}
+	}
 
 	nlmsg_end(skb, nlh);
 	return 0;
-- 
2.13.0

^ permalink raw reply related

* [PATCH 4/6] mpls: VPLS support
From: David Lamparter @ 2017-08-16 17:02 UTC (permalink / raw)
  To: netdev; +Cc: amine.kherbouche, roopa, David Lamparter
In-Reply-To: <20170816170202.456851-1-equinox@diac24.net>

[work-in-progress, works but needs changes]
[v2: refactored lots of things, e.g. dst_metadata, no more genetlink]

Signed-off-by: David Lamparter <equinox@diac24.net>
---
 include/net/dst_metadata.h |  21 ++
 include/net/vpls.h         |   8 +
 net/mpls/Kconfig           |  11 ++
 net/mpls/Makefile          |   1 +
 net/mpls/vpls.c            | 469 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 510 insertions(+)
 create mode 100644 include/net/vpls.h
 create mode 100644 net/mpls/vpls.c

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 8858dc441458..aeee4ce3b654 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -3,11 +3,13 @@
 
 #include <linux/skbuff.h>
 #include <net/ip_tunnels.h>
+#include <net/vpls.h>
 #include <net/dst.h>
 
 enum metadata_type {
 	METADATA_IP_TUNNEL,
 	METADATA_HW_PORT_MUX,
+	METADATA_VPLS,
 };
 
 struct hw_port_info {
@@ -21,6 +23,7 @@ struct metadata_dst {
 	union {
 		struct ip_tunnel_info	tun_info;
 		struct hw_port_info	port_info;
+		struct vpls_info	vpls_info;
 	} u;
 };
 
@@ -49,6 +52,15 @@ static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
 	return NULL;
 }
 
+static inline struct vpls_info *skb_vpls_info(struct sk_buff *skb)
+{
+	struct metadata_dst *md_dst = skb_metadata_dst(skb);
+	if (md_dst && md_dst->type == METADATA_VPLS)
+		return &md_dst->u.vpls_info;
+	return NULL;
+}
+
+
 static inline bool skb_valid_dst(const struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
@@ -73,6 +85,9 @@ static inline int dst_metadata_cmp(const struct dst_entry *dst_a,
 	case METADATA_HW_PORT_MUX:
 		return memcmp(&a->u.port_info, &b->u.port_info,
 			      sizeof(a->u.port_info));
+	case METADATA_VPLS:
+		return memcmp(&a->u.vpls_info, &b->u.vpls_info,
+			      sizeof(a->u.vpls_info));
 	case METADATA_IP_TUNNEL:
 		return memcmp(&a->u.tun_info, &b->u.tun_info,
 			      sizeof(a->u.tun_info) +
@@ -218,4 +233,10 @@ static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
 				  0, ip6_flowlabel(ip6h), flags, tunnel_id,
 				  md_size);
 }
+
+static inline struct metadata_dst *vpls_rx_dst(void)
+{
+	return metadata_dst_alloc(0, METADATA_VPLS, GFP_ATOMIC);
+}
+
 #endif /* __NET_DST_METADATA_H */
diff --git a/include/net/vpls.h b/include/net/vpls.h
new file mode 100644
index 000000000000..b261e2d97734
--- /dev/null
+++ b/include/net/vpls.h
@@ -0,0 +1,8 @@
+#ifndef __NET_VPLS_H
+#define __NET_VPLS_H 1
+
+struct vpls_info {
+	u32		pw_label;
+};
+
+#endif /* __NET_VPLS_H */
diff --git a/net/mpls/Kconfig b/net/mpls/Kconfig
index 5c467ef97311..c15ba73efb34 100644
--- a/net/mpls/Kconfig
+++ b/net/mpls/Kconfig
@@ -27,6 +27,17 @@ config MPLS_ROUTING
 	---help---
 	 Add support for forwarding of mpls packets.
 
+config MPLS_VPLS
+	bool "VPLS support"
+	default y
+	depends on MPLS_ROUTING && BRIDGE_NETFILTER=n
+	---help---
+	 Add support for de-&encapsulating VPLS.  Not compatible with
+	 bridge netfilter due to the latter stomping over VPLS' dst metadata.
+
+comment "disable 'Bridged IP/ARP packets filtering' for VPLS support"
+	depends on BRIDGE_NETFILTER
+
 config MPLS_IPTUNNEL
 	tristate "MPLS: IP over MPLS tunnel support"
 	depends on LWTUNNEL && MPLS_ROUTING
diff --git a/net/mpls/Makefile b/net/mpls/Makefile
index 9ca923625016..3c028600a980 100644
--- a/net/mpls/Makefile
+++ b/net/mpls/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_MPLS_ROUTING) += mpls_router.o
 obj-$(CONFIG_MPLS_IPTUNNEL) += mpls_iptunnel.o
 
 mpls_router-y := af_mpls.o
+mpls_router-$(CONFIG_MPLS_VPLS) += vpls.o
diff --git a/net/mpls/vpls.c b/net/mpls/vpls.c
new file mode 100644
index 000000000000..28ac810da6e9
--- /dev/null
+++ b/net/mpls/vpls.c
@@ -0,0 +1,469 @@
+/*
+ *  net/mpls/vpls.c
+ *
+ *  Copyright (C) 2016 David Lamparter
+ *
+ */
+
+#include <linux/netdevice.h>
+#include <linux/slab.h>
+#include <linux/ethtool.h>
+#include <linux/etherdevice.h>
+#include <linux/u64_stats_sync.h>
+#include <linux/mpls.h>
+
+#include <net/rtnetlink.h>
+#include <net/dst.h>
+#include <net/xfrm.h>
+#include <net/mpls.h>
+#include <linux/module.h>
+#include <net/dst_metadata.h>
+#include <net/ip_tunnels.h>
+
+#include "internal.h"
+
+#define DRV_NAME	"vpls"
+
+#define MIN_MTU 68		/* Min L3 MTU */
+#define MAX_MTU 65535		/* Max L3 MTU (arbitrary) */
+
+struct vpls_wirelist {
+	struct rcu_head rcu;
+	size_t count;
+	unsigned wires[0];
+};
+
+struct vpls_priv {
+	struct net *encap_net;
+	struct vpls_wirelist __rcu *wires;
+};
+
+static int vpls_xmit_wire(struct sk_buff *skb, struct net_device *dev,
+			  struct vpls_priv *vpls, u32 wire)
+{
+	struct mpls_route *rt;
+	struct mpls_entry_decoded dec;
+
+	dec.bos = 1;
+	dec.ttl = 255;
+
+	rt = mpls_route_input_rcu(vpls->encap_net, wire);
+	if (!rt)
+		return -ENOENT;
+	if (rt->rt_vpls_dev != dev)
+		return -EINVAL;
+
+	return mpls_rt_xmit(skb, rt, dec);
+}
+
+static netdev_tx_t vpls_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	int err = -EINVAL, ok_count = 0;
+	struct vpls_priv *priv = netdev_priv(dev);
+	struct vpls_info *vi;
+	struct pcpu_sw_netstats *stats;
+	size_t len = skb->len;
+
+	rcu_read_lock();
+	vi = skb_vpls_info(skb);
+
+	skb_orphan(skb);
+	skb_forward_csum(skb);
+
+	if (vi) {
+		err = vpls_xmit_wire(skb, dev, priv, vi->pw_label);
+		if (err)
+			goto out_err;
+	} else {
+		struct sk_buff *cloned;
+		struct vpls_wirelist *wl;
+		size_t i;
+
+		wl = rcu_dereference(priv->wires);
+		if (wl->count == 0) {
+			dev->stats.tx_carrier_errors++;
+			goto out_err;
+		}
+
+		for (i = 0; i < wl->count; i++) {
+			cloned = skb_clone(skb, GFP_KERNEL);
+			if (vpls_xmit_wire(cloned, dev, priv, wl->wires[i]))
+				consume_skb(cloned);
+			else
+				ok_count++;
+		}
+		if (!ok_count)
+			goto out_err;
+
+		consume_skb(skb);
+	}
+
+	stats = this_cpu_ptr(dev->tstats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->tx_packets++;
+	stats->tx_bytes += len;
+	u64_stats_update_end(&stats->syncp);
+
+	rcu_read_unlock();
+	return 0;
+
+out_err:
+	dev->stats.tx_errors++;
+
+	consume_skb(skb);
+	rcu_read_unlock();
+	return err;
+}
+
+int vpls_rcv(struct sk_buff *skb, struct net_device *in_dev,
+	     struct packet_type *pt, struct mpls_route *rt,
+	     struct mpls_shim_hdr *hdr, struct net_device *orig_dev)
+{
+	struct net_device *dev = rt->rt_vpls_dev;
+	struct mpls_entry_decoded dec;
+	struct metadata_dst *md_dst;
+	struct pcpu_sw_netstats *stats;
+
+	if (!dev)
+		goto drop_nodev;
+
+	dec = mpls_entry_decode(hdr);
+	if (!dec.bos) {
+		dev->stats.rx_frame_errors++;
+		goto drop;
+	}
+
+	skb_pull(skb, sizeof(*hdr));
+
+	if (unlikely(!pskb_may_pull(skb, ETH_HLEN))) {
+		dev->stats.rx_length_errors++;
+		goto drop;
+	}
+
+	md_dst = vpls_rx_dst();
+	if (unlikely(!md_dst)) {
+		netdev_err(dev, "failed to allocate dst metadata\n");
+		goto drop;
+	}
+	md_dst->u.vpls_info.pw_label = dec.label;
+
+	skb->dev = dev;
+
+	skb_reset_mac_header(skb);
+	skb->protocol = eth_type_trans(skb, dev);
+	skb->ip_summed = CHECKSUM_NONE;
+	skb->pkt_type = PACKET_HOST;
+
+	skb_clear_hash(skb);
+	skb->vlan_tci = 0;
+	skb_set_queue_mapping(skb, 0);
+	skb_scrub_packet(skb, !net_eq(dev_net(in_dev), dev_net(dev)));
+
+	skb_reset_network_header(skb);
+	skb_probe_transport_header(skb, 0);
+
+	skb_dst_drop(skb);
+	skb_dst_set(skb, &md_dst->dst);
+
+	stats = this_cpu_ptr(dev->tstats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_packets++;
+	stats->rx_bytes += skb->len;
+	u64_stats_update_end(&stats->syncp);
+
+	netif_rx(skb);
+	return 0;
+
+drop:
+	dev->stats.rx_errors++;
+drop_nodev:
+	kfree_skb(skb);
+	return NET_RX_DROP;
+}
+
+void vpls_label_update(unsigned label, struct mpls_route *rt_old,
+		       struct mpls_route *rt_new)
+{
+	struct vpls_priv *priv;
+	struct vpls_wirelist *wl, *wl_new;
+	size_t i;
+
+	ASSERT_RTNL();
+
+	if (rt_old && rt_new && rt_old->rt_vpls_dev == rt_new->rt_vpls_dev)
+		return;
+
+	if (rt_old && rt_old->rt_vpls_dev) {
+		priv = netdev_priv(rt_old->rt_vpls_dev);
+		wl = rcu_dereference(priv->wires);
+
+		for (i = 0; i < wl->count; i++)
+			if (wl->wires[i] == label)
+				break;
+
+		if (i == wl->count) {
+			netdev_err(rt_old->rt_vpls_dev,
+				   "can't find pseudowire to remove!\n");
+			goto update_new;
+		}
+
+		wl_new = kmalloc(sizeof(*wl) +
+				 (wl->count - 1) * sizeof(wl->wires[0]),
+				 GFP_ATOMIC);
+		if (!wl_new) {
+			netdev_err(rt_old->rt_vpls_dev,
+				   "out of memory for pseudowire delete!\n");
+			goto update_new;
+		}
+
+		wl_new->count = wl->count - 1;
+		memcpy(wl_new->wires, wl->wires, i * sizeof(wl->wires[0]));
+		memcpy(wl_new->wires + i, wl->wires + i + 1,
+			(wl->count - i - 1) * sizeof(wl->wires[0]));
+
+		rcu_assign_pointer(priv->wires, wl_new);
+		kfree_rcu(wl, rcu);
+
+		if (wl_new->count == 0)
+			netif_carrier_off(rt_old->rt_vpls_dev);
+	}
+
+update_new:
+	if (rt_new && rt_new->rt_vpls_dev) {
+		priv = netdev_priv(rt_new->rt_vpls_dev);
+		wl = rcu_dereference(priv->wires);
+
+		wl_new = kmalloc(sizeof(*wl) +
+				 (wl->count + 1) * sizeof(wl->wires[0]),
+				 GFP_ATOMIC);
+		if (!wl_new) {
+			netdev_err(rt_new->rt_vpls_dev,
+				   "out of memory for pseudowire add!\n");
+			return;
+		}
+		wl_new->count = wl->count + 1;
+		memcpy(wl_new->wires, wl->wires,
+			wl->count * sizeof(wl->wires[0]));
+		wl_new->wires[wl->count] = label;
+
+		rcu_assign_pointer(priv->wires, wl_new);
+		kfree_rcu(wl, rcu);
+
+		if (wl_new->count == 1)
+			netif_carrier_on(rt_new->rt_vpls_dev);
+	}
+}
+
+/* fake multicast ability */
+static void vpls_set_multicast_list(struct net_device *dev)
+{
+}
+
+static int vpls_open(struct net_device *dev)
+{
+	struct vpls_priv *priv = netdev_priv(dev);
+	struct vpls_wirelist *wl;
+
+	wl = rcu_dereference(priv->wires);
+	if (wl->count > 0)
+		netif_carrier_on(dev);
+
+	return 0;
+}
+
+static int vpls_close(struct net_device *dev)
+{
+	netif_carrier_off(dev);
+	return 0;
+}
+
+static int is_valid_vpls_mtu(int new_mtu)
+{
+	return new_mtu >= MIN_MTU && new_mtu <= MAX_MTU;
+}
+
+static int vpls_change_mtu(struct net_device *dev, int new_mtu)
+{
+	if (!is_valid_vpls_mtu(new_mtu))
+		return -EINVAL;
+	dev->mtu = new_mtu;
+	return 0;
+}
+
+static int vpls_dev_init(struct net_device *dev)
+{
+	struct vpls_priv *priv = netdev_priv(dev);
+	priv->wires = kzalloc(sizeof(struct vpls_wirelist), GFP_KERNEL);
+	if (!priv->wires)
+		return -ENOMEM;
+
+	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
+	if (!dev->tstats) {
+		kfree(priv->wires);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void vpls_dev_free(struct net_device *dev)
+{
+	struct vpls_priv *priv = netdev_priv(dev);
+
+	free_percpu(dev->tstats);
+
+	if (priv->wires)
+		kfree(priv->wires);
+
+	if (priv->encap_net)
+		put_net(priv->encap_net);
+
+	free_netdev(dev);
+}
+
+static const struct net_device_ops vpls_netdev_ops = {
+	.ndo_init		= vpls_dev_init,
+	.ndo_open		= vpls_open,
+	.ndo_stop		= vpls_close,
+	.ndo_start_xmit		= vpls_xmit,
+	.ndo_change_mtu		= vpls_change_mtu,
+	.ndo_get_stats64	= ip_tunnel_get_stats64,
+	.ndo_set_rx_mode	= vpls_set_multicast_list,
+	.ndo_set_mac_address	= eth_mac_addr,
+	.ndo_features_check	= passthru_features_check,
+};
+
+int is_vpls_dev(struct net_device *dev)
+{
+	return dev->netdev_ops == &vpls_netdev_ops;
+}
+
+#define VPLS_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | \
+		       NETIF_F_HW_CSUM | NETIF_F_RXCSUM | NETIF_F_HIGHDMA)
+
+static void vpls_setup(struct net_device *dev)
+{
+	ether_setup(dev);
+
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+	dev->priv_flags |= IFF_NO_QUEUE;
+
+	dev->netdev_ops = &vpls_netdev_ops;
+	dev->features |= NETIF_F_LLTX;
+	dev->features |= VPLS_FEATURES;
+	dev->vlan_features = dev->features;
+	dev->priv_destructor = vpls_dev_free;
+
+	dev->hw_features = VPLS_FEATURES;
+	dev->hw_enc_features = VPLS_FEATURES;
+
+	netif_keep_dst(dev);
+}
+
+/*
+ * netlink interface
+ */
+
+static int vpls_validate(struct nlattr *tb[], struct nlattr *data[],
+			 struct netlink_ext_ack *extack)
+{
+	if (tb[IFLA_ADDRESS]) {
+		if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN) {
+			NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_ADDRESS],
+				    "Invalid Ethernet address length");
+			return -EINVAL;
+		}
+		if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS]))) {
+			NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_ADDRESS],
+				    "Invalid Ethernet address");
+			return -EADDRNOTAVAIL;
+		}
+	}
+	if (tb[IFLA_MTU]) {
+		if (!is_valid_vpls_mtu(nla_get_u32(tb[IFLA_MTU]))) {
+			NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_MTU],
+				    "Invalid MTU");
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+static struct rtnl_link_ops vpls_link_ops;
+
+static int vpls_newlink(struct net *src_net, struct net_device *dev,
+			struct nlattr *tb[], struct nlattr *data[],
+			struct netlink_ext_ack *extack)
+{
+	int err;
+	struct vpls_priv *priv = netdev_priv(dev);
+
+	if (tb[IFLA_ADDRESS] == NULL)
+		eth_hw_addr_random(dev);
+
+	if (tb[IFLA_IFNAME])
+		nla_strlcpy(dev->name, tb[IFLA_IFNAME], IFNAMSIZ);
+	else
+		snprintf(dev->name, IFNAMSIZ, DRV_NAME "%%d");
+
+	err = register_netdevice(dev);
+	if (err < 0)
+		goto err;
+	priv->encap_net = get_net(src_net);
+
+	netif_carrier_off(dev);
+	return 0;
+
+err:
+	return err;
+}
+
+static void vpls_dellink(struct net_device *dev, struct list_head *head)
+{
+	unregister_netdevice_queue(dev, head);
+}
+
+
+static struct rtnl_link_ops vpls_link_ops = {
+	.kind		= DRV_NAME,
+	.priv_size	= sizeof(struct vpls_priv),
+	.setup		= vpls_setup,
+	.validate	= vpls_validate,
+	.newlink	= vpls_newlink,
+	.dellink	= vpls_dellink,
+};
+
+/*
+ * init/fini
+ */
+
+__init int vpls_init(void)
+{
+	int ret;
+
+	ret = rtnl_link_register(&vpls_link_ops);
+	if (ret)
+		goto out;
+
+	return 0;
+
+out:
+	return ret;
+}
+
+__exit void vpls_exit(void)
+{
+	rtnl_link_unregister(&vpls_link_ops);
+}
+
+#if 0
+/* not currently available as a separate module... */
+
+module_init(vpls_init);
+module_exit(vpls_exit);
+
+MODULE_DESCRIPTION("Virtual Private LAN Service");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_RTNL_LINK(DRV_NAME);
+#endif
-- 
2.13.0

^ permalink raw reply related

* [PATCH 3/6] mpls: add VPLS entry points
From: David Lamparter @ 2017-08-16 17:01 UTC (permalink / raw)
  To: netdev; +Cc: amine.kherbouche, roopa, David Lamparter
In-Reply-To: <20170816170202.456851-1-equinox@diac24.net>

This wires up the neccessary calls for VPLS into the MPLS forwarding
pieces.  Since CONFIG_MPLS_VPLS doesn't exist yet in Kconfig, it'll
never be enabled, so we're on the stubs for now.

Signed-off-by: David Lamparter <equinox@diac24.net>
---
 include/uapi/linux/rtnetlink.h |  1 +
 net/mpls/af_mpls.c             | 54 ++++++++++++++++++++++++++++++++++++++++++
 net/mpls/internal.h            | 29 +++++++++++++++++++++++
 3 files changed, 84 insertions(+)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index dab7dad9e01a..b7840ed94526 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -326,6 +326,7 @@ enum rtattr_type_t {
 	RTA_PAD,
 	RTA_UID,
 	RTA_TTL_PROPAGATE,
+	RTA_VPLS_IF,
 	__RTA_MAX
 };
 
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 0c5953e5d5bd..4d3ce007b7db 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -299,6 +299,11 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt,
 		success = true;
 		break;
 	}
+	case MPT_VPLS:
+		/* nothing to do here, no TTL in Ethernet
+		 * (and we shouldn't mess with the TTL in inner IP packets,
+		 * pseudowires are supposed to be transparent) */
+		break;
 	case MPT_UNSPEC:
 		/* Should have decided which protocol it is by now */
 		break;
@@ -349,6 +354,8 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 		goto drop;
 	}
 
+	if (rt->rt_payload_type == MPT_VPLS)
+		return vpls_rcv(skb, dev, pt, rt, hdr, orig_dev);
 
 	/* Pop the label */
 	skb_pull(skb, sizeof(*hdr));
@@ -469,6 +476,7 @@ static const struct nla_policy rtm_mpls_policy[RTA_MAX+1] = {
 struct mpls_route_config {
 	u32			rc_protocol;
 	u32			rc_ifindex;
+	u32			rc_vpls_ifindex;
 	u8			rc_via_table;
 	u8			rc_via_alen;
 	u8			rc_via[MAX_VIA_ALEN];
@@ -541,6 +549,8 @@ static void mpls_route_update(struct net *net, unsigned index,
 	rt = rtnl_dereference(platform_label[index]);
 	rcu_assign_pointer(platform_label[index], new);
 
+	vpls_label_update(index, rt, new);
+
 	mpls_notify_route(net, index, rt, new, info);
 
 	/* If we removed a route free it now */
@@ -942,6 +952,7 @@ static int mpls_route_add(struct mpls_route_config *cfg,
 	struct mpls_route __rcu **platform_label;
 	struct net *net = cfg->rc_nlinfo.nl_net;
 	struct mpls_route *rt, *old;
+	struct net_device *vpls_dev = NULL;
 	int err = -EINVAL;
 	u8 max_via_alen;
 	unsigned index;
@@ -996,6 +1007,24 @@ static int mpls_route_add(struct mpls_route_config *cfg,
 		goto errout;
 	}
 
+	if (cfg->rc_vpls_ifindex) {
+		vpls_dev = dev_get_by_index(net, cfg->rc_vpls_ifindex);
+		if (!vpls_dev) {
+			err = -ENODEV;
+			NL_SET_ERR_MSG(extack, "Invalid VPLS ifindex");
+			goto errout;
+		}
+		/* we're under RTNL; and we'll drop routes when we're
+		 * notified the device is going away. */
+		dev_put(vpls_dev);
+
+		if (!is_vpls_dev(vpls_dev)) {
+			err = -ENODEV;
+			NL_SET_ERR_MSG(extack, "Not a VPLS device");
+			goto errout;
+		}
+	}
+
 	err = -ENOMEM;
 	rt = mpls_rt_alloc(nhs, max_via_alen, max_labels);
 	if (IS_ERR(rt)) {
@@ -1006,6 +1035,7 @@ static int mpls_route_add(struct mpls_route_config *cfg,
 	rt->rt_protocol = cfg->rc_protocol;
 	rt->rt_payload_type = cfg->rc_payload_type;
 	rt->rt_ttl_propagate = cfg->rc_ttl_propagate;
+	rt->rt_vpls_dev = vpls_dev;
 
 	if (cfg->rc_mp)
 		err = mpls_nh_build_multi(cfg, rt, max_labels, extack);
@@ -1430,6 +1460,14 @@ static void mpls_ifdown(struct net_device *dev, int event)
 		if (!rt)
 			continue;
 
+		if (rt->rt_vpls_dev == dev) {
+			switch (event) {
+			case NETDEV_UNREGISTER:
+				mpls_route_update(net, index, NULL, NULL);
+				continue;
+			}
+		}
+
 		alive = 0;
 		deleted = 0;
 		change_nexthops(rt) {
@@ -1777,6 +1815,10 @@ static int rtm_to_route_config(struct sk_buff *skb,
 		case RTA_OIF:
 			cfg->rc_ifindex = nla_get_u32(nla);
 			break;
+		case RTA_VPLS_IF:
+			cfg->rc_vpls_ifindex = nla_get_u32(nla);
+			cfg->rc_payload_type = MPT_VPLS;
+			break;
 		case RTA_NEWDST:
 			if (nla_get_labels(nla, MAX_NEW_LABELS,
 					   &cfg->rc_output_labels,
@@ -1911,6 +1953,11 @@ static int mpls_dump_route(struct sk_buff *skb, u32 portid, u32 seq, int event,
 			       ttl_propagate))
 			goto nla_put_failure;
 	}
+
+	if (rt->rt_vpls_dev)
+		if (nla_put_u32(skb, RTA_VPLS_IF, rt->rt_vpls_dev->ifindex))
+			goto nla_put_failure;
+
 	if (rt->rt_nhn == 1) {
 		const struct mpls_nh *nh = rt->rt_nh;
 
@@ -2220,6 +2267,10 @@ static int mpls_getroute(struct sk_buff *in_skb, struct nlmsghdr *in_nlh,
 	if (nla_put_labels(skb, RTA_DST, 1, &in_label))
 		goto nla_put_failure;
 
+	if (rt->rt_vpls_dev)
+		if (nla_put_u32(skb, RTA_VPLS_IF, rt->rt_vpls_dev->ifindex))
+			goto nla_put_failure;
+
 	if (nh->nh_labels &&
 	    nla_put_labels(skb, RTA_NEWDST, nh->nh_labels,
 			   nh->nh_label))
@@ -2491,6 +2542,8 @@ static int __init mpls_init(void)
 
 	rtnl_af_register(&mpls_af_ops);
 
+	vpls_init();
+
 	rtnl_register(PF_MPLS, RTM_NEWROUTE, mpls_rtm_newroute, NULL, 0);
 	rtnl_register(PF_MPLS, RTM_DELROUTE, mpls_rtm_delroute, NULL, 0);
 	rtnl_register(PF_MPLS, RTM_GETROUTE, mpls_getroute, mpls_dump_routes,
@@ -2510,6 +2563,7 @@ module_init(mpls_init);
 static void __exit mpls_exit(void)
 {
 	rtnl_unregister_all(PF_MPLS);
+	vpls_exit();
 	rtnl_af_unregister(&mpls_af_ops);
 	dev_remove_pack(&mpls_packet_type);
 	unregister_netdevice_notifier(&mpls_dev_notifier);
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index b70c6663d4f3..876ae9993207 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -76,6 +76,7 @@ struct sk_buff;
 
 enum mpls_payload_type {
 	MPT_UNSPEC, /* IPv4 or IPv6 */
+	MPT_VPLS = 2,	/* pseudowire */
 	MPT_IPV4 = 4,
 	MPT_IPV6 = 6,
 
@@ -153,6 +154,8 @@ struct mpls_route { /* next hop label forwarding entry */
 	u8			rt_nh_size;
 	u8			rt_via_offset;
 	u8			rt_reserved1;
+	struct net_device	*rt_vpls_dev;
+
 	struct mpls_nh		rt_nh[0];
 };
 
@@ -214,4 +217,30 @@ struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index);
 int mpls_rt_xmit(struct sk_buff *skb, struct mpls_route *rt,
 		 struct mpls_entry_decoded dec);
 
+#ifdef CONFIG_MPLS_VPLS
+int vpls_rcv(struct sk_buff *skb, struct net_device *in_dev,
+	     struct packet_type *pt, struct mpls_route *rt,
+	     struct mpls_shim_hdr *hdr, struct net_device *orig_dev);
+void vpls_label_update(unsigned label, struct mpls_route *rt_old,
+		       struct mpls_route *rt_new);
+__init int vpls_init(void);
+__exit void vpls_exit(void);
+int is_vpls_dev(struct net_device *dev);
+
+#else /* !CONFIG_MPLS_VPLS */
+static inline int vpls_rcv(skb, in_dev, pt, rt, hdr, orig_dev)
+{
+	kfree_skb(skb);
+	return NET_RX_DROP;
+}
+static inline int is_vpls_dev(struct net_device *dev)
+{
+	return 0;
+}
+
+#define vpls_label_update(label, rt_old, rt_new) do { } while (0)
+#define vpls_init() do { } while (0)
+#define vpls_exit() do { } while (0)
+#endif
+
 #endif /* MPLS_INTERNAL_H */
-- 
2.13.0

^ permalink raw reply related

* [PATCH 2/6] mpls: split forwarding path on rx/tx boundary
From: David Lamparter @ 2017-08-16 17:01 UTC (permalink / raw)
  To: netdev; +Cc: amine.kherbouche, roopa, David Lamparter
In-Reply-To: <20170816170202.456851-1-equinox@diac24.net>

This makes mpls_rt_xmit() available for use in upcoming VPLS code.  Same
for mpls_route_input_rcu().

Signed-off-by: David Lamparter <equinox@diac24.net>
---
 net/mpls/af_mpls.c  | 48 ++++++++++++++++++++++++++++++------------------
 net/mpls/internal.h |  4 ++++
 2 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index c5b9ce41d66f..0c5953e5d5bd 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -43,7 +43,7 @@ static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
 		       struct nlmsghdr *nlh, struct net *net, u32 portid,
 		       unsigned int nlm_flags);
 
-static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
+struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
 {
 	struct mpls_route *rt = NULL;
 
@@ -313,15 +313,8 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	struct net *net = dev_net(dev);
 	struct mpls_shim_hdr *hdr;
 	struct mpls_route *rt;
-	struct mpls_nh *nh;
 	struct mpls_entry_decoded dec;
-	struct net_device *out_dev;
-	struct mpls_dev *out_mdev;
 	struct mpls_dev *mdev;
-	unsigned int hh_len;
-	unsigned int new_header_size;
-	unsigned int mtu;
-	int err;
 
 	/* Careful this entire function runs inside of an rcu critical section */
 
@@ -356,9 +349,6 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 		goto drop;
 	}
 
-	nh = mpls_select_multipath(rt, skb);
-	if (!nh)
-		goto err;
 
 	/* Pop the label */
 	skb_pull(skb, sizeof(*hdr));
@@ -376,6 +366,32 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 		goto err;
 	dec.ttl -= 1;
 
+	if (mpls_rt_xmit(skb, rt, dec))
+		goto drop;
+	return 0;
+
+err:
+	MPLS_INC_STATS(mdev, rx_errors);
+drop:
+	kfree_skb(skb);
+	return NET_RX_DROP;
+}
+
+int mpls_rt_xmit(struct sk_buff *skb, struct mpls_route *rt,
+		 struct mpls_entry_decoded dec)
+{
+	struct mpls_nh *nh;
+	struct net_device *out_dev;
+	struct mpls_dev *out_mdev;
+	unsigned int hh_len;
+	unsigned int new_header_size;
+	unsigned int mtu;
+	int err;
+
+	nh = mpls_select_multipath(rt, skb);
+	if (!nh)
+		goto tx_err;
+
 	/* Find the output device */
 	out_dev = rcu_dereference(nh->nh_dev);
 	if (!mpls_output_possible(out_dev))
@@ -401,8 +417,9 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	if (unlikely(!new_header_size && dec.bos)) {
 		/* Penultimate hop popping */
 		if (!mpls_egress(dev_net(out_dev), rt, skb, dec))
-			goto err;
+			goto tx_err;
 	} else {
+		struct mpls_shim_hdr *hdr;
 		bool bos;
 		int i;
 		skb_push(skb, new_header_size);
@@ -435,12 +452,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	out_mdev = out_dev ? mpls_dev_get(out_dev) : NULL;
 	if (out_mdev)
 		MPLS_INC_STATS(out_mdev, tx_errors);
-	goto drop;
-err:
-	MPLS_INC_STATS(mdev, rx_errors);
-drop:
-	kfree_skb(skb);
-	return NET_RX_DROP;
+	return -1;
 }
 
 static struct packet_type mpls_packet_type __read_mostly = {
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index cf65aec2e551..b70c6663d4f3 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -210,4 +210,8 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu);
 void mpls_stats_inc_outucastpkts(struct net_device *dev,
 				 const struct sk_buff *skb);
 
+struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index);
+int mpls_rt_xmit(struct sk_buff *skb, struct mpls_route *rt,
+		 struct mpls_entry_decoded dec);
+
 #endif /* MPLS_INTERNAL_H */
-- 
2.13.0

^ permalink raw reply related

* [PATCH 1/6] bridge: learn dst metadata in FDB
From: David Lamparter @ 2017-08-16 17:01 UTC (permalink / raw)
  To: netdev; +Cc: amine.kherbouche, roopa, David Lamparter
In-Reply-To: <20170816170202.456851-1-equinox@diac24.net>

This implements holding dst metadata information in the bridge layer,
but only for unicast entries in the MAC table.  Multicast is still left
to design and implement.

Signed-off-by: David Lamparter <equinox@diac24.net>
---
 include/net/dst_metadata.h | 19 +++++++++++++------
 net/bridge/br_device.c     |  4 ++++
 net/bridge/br_fdb.c        | 45 +++++++++++++++++++++++++++++++--------------
 net/bridge/br_input.c      |  6 ++++--
 net/bridge/br_private.h    |  4 +++-
 5 files changed, 55 insertions(+), 23 deletions(-)

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index a803129a4849..8858dc441458 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -56,16 +56,15 @@ static inline bool skb_valid_dst(const struct sk_buff *skb)
 	return dst && !(dst->flags & DST_METADATA);
 }
 
-static inline int skb_metadata_dst_cmp(const struct sk_buff *skb_a,
-				       const struct sk_buff *skb_b)
+static inline int dst_metadata_cmp(const struct dst_entry *dst_a,
+				   const struct dst_entry *dst_b)
 {
 	const struct metadata_dst *a, *b;
-
-	if (!(skb_a->_skb_refdst | skb_b->_skb_refdst))
+	if (!(dst_a || dst_b))
 		return 0;
 
-	a = (const struct metadata_dst *) skb_dst(skb_a);
-	b = (const struct metadata_dst *) skb_dst(skb_b);
+	a = (const struct metadata_dst *) dst_a;
+	b = (const struct metadata_dst *) dst_b;
 
 	if (!a != !b || a->type != b->type)
 		return 1;
@@ -83,6 +82,14 @@ static inline int skb_metadata_dst_cmp(const struct sk_buff *skb_a,
 	}
 }
 
+static inline int skb_metadata_dst_cmp(const struct sk_buff *skb_a,
+				       const struct sk_buff *skb_b)
+{
+	if (!(skb_a->_skb_refdst | skb_b->_skb_refdst))
+		return 0;
+	return dst_metadata_cmp(skb_dst(skb_a), skb_dst(skb_b));
+}
+
 void metadata_dst_free(struct metadata_dst *);
 struct metadata_dst *metadata_dst_alloc(u8 optslen, enum metadata_type type,
 					gfp_t flags);
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 861ae2a165f4..534cacf02f8d 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -53,6 +53,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	brstats->tx_bytes += skb->len;
 	u64_stats_update_end(&brstats->syncp);
 
+	skb_dst_drop(skb);
 	BR_INPUT_SKB_CB(skb)->brdev = dev;
 
 	skb_reset_mac_header(skb);
@@ -81,6 +82,9 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 		else
 			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
 	} else if ((dst = br_fdb_find_rcu(br, dest, vid)) != NULL) {
+		struct dst_entry *md_dst = rcu_dereference(dst->md_dst);
+		if (md_dst)
+			skb_dst_set_noref(skb, md_dst);
 		br_forward(dst->dst, skb, false, true);
 	} else {
 		br_flood(br, skb, BR_PKT_UNICAST, false, true);
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index a79b648aac88..0751fcb89699 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -25,11 +25,13 @@
 #include <asm/unaligned.h>
 #include <linux/if_vlan.h>
 #include <net/switchdev.h>
+#include <net/dst_metadata.h>
 #include "br_private.h"
 
 static struct kmem_cache *br_fdb_cache __read_mostly;
 static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
-		      const unsigned char *addr, u16 vid);
+		      struct dst_entry *md_dst, const unsigned char *addr,
+		      u16 vid);
 static void fdb_notify(struct net_bridge *br,
 		       const struct net_bridge_fdb_entry *, int);
 
@@ -174,6 +176,8 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f)
 	if (f->is_static)
 		fdb_del_hw_addr(br, f->addr.addr);
 
+	dst_release(rcu_access_pointer(f->md_dst));
+
 	hlist_del_init_rcu(&f->hlist);
 	fdb_notify(br, f, RTM_DELNEIGH);
 	call_rcu(&f->rcu, fdb_rcu_free);
@@ -260,7 +264,7 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
 
 insert:
 	/* insert new address,  may fail if invalid address or dup. */
-	fdb_insert(br, p, newaddr, 0);
+	fdb_insert(br, p, NULL, newaddr, 0);
 
 	if (!vg || !vg->num_vlans)
 		goto done;
@@ -270,7 +274,7 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
 	 * from under us.
 	 */
 	list_for_each_entry(v, &vg->vlan_list, vlist)
-		fdb_insert(br, p, newaddr, v->vid);
+		fdb_insert(br, p, NULL, newaddr, v->vid);
 
 done:
 	spin_unlock_bh(&br->hash_lock);
@@ -289,10 +293,11 @@ void br_fdb_change_mac_address(struct net_bridge *br, const u8 *newaddr)
 	if (f && f->is_local && !f->dst && !f->added_by_user)
 		fdb_delete_local(br, NULL, f);
 
-	fdb_insert(br, NULL, newaddr, 0);
+	fdb_insert(br, NULL, NULL, newaddr, 0);
 	vg = br_vlan_group(br);
 	if (!vg || !vg->num_vlans)
 		goto out;
+
 	/* Now remove and add entries for every VLAN configured on the
 	 * bridge.  This function runs under RTNL so the bitmap will not
 	 * change from under us.
@@ -303,7 +308,7 @@ void br_fdb_change_mac_address(struct net_bridge *br, const u8 *newaddr)
 		f = br_fdb_find(br, br->dev->dev_addr, v->vid);
 		if (f && f->is_local && !f->dst && !f->added_by_user)
 			fdb_delete_local(br, NULL, f);
-		fdb_insert(br, NULL, newaddr, v->vid);
+		fdb_insert(br, NULL, NULL, newaddr, v->vid);
 	}
 out:
 	spin_unlock_bh(&br->hash_lock);
@@ -477,6 +482,7 @@ int br_fdb_fillbuf(struct net_bridge *br, void *buf,
 
 static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head,
 					       struct net_bridge_port *source,
+					       struct dst_entry *md_dst,
 					       const unsigned char *addr,
 					       __u16 vid,
 					       unsigned char is_local,
@@ -488,6 +494,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head,
 	if (fdb) {
 		memcpy(fdb->addr.addr, addr, ETH_ALEN);
 		fdb->dst = source;
+		rcu_assign_pointer(fdb->md_dst, dst_clone(md_dst));
 		fdb->vlan_id = vid;
 		fdb->is_local = is_local;
 		fdb->is_static = is_static;
@@ -501,7 +508,8 @@ static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head,
 }
 
 static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
-		  const unsigned char *addr, u16 vid)
+		      struct dst_entry *md_dst, const unsigned char *addr,
+		      u16 vid)
 {
 	struct hlist_head *head = &br->hash[br_mac_hash(addr, vid)];
 	struct net_bridge_fdb_entry *fdb;
@@ -521,7 +529,7 @@ static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
 		fdb_delete(br, fdb);
 	}
 
-	fdb = fdb_create(head, source, addr, vid, 1, 1);
+	fdb = fdb_create(head, source, md_dst, addr, vid, 1, 1);
 	if (!fdb)
 		return -ENOMEM;
 
@@ -537,13 +545,14 @@ int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
 	int ret;
 
 	spin_lock_bh(&br->hash_lock);
-	ret = fdb_insert(br, source, addr, vid);
+	ret = fdb_insert(br, source, NULL, addr, vid);
 	spin_unlock_bh(&br->hash_lock);
 	return ret;
 }
 
 void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
-		   const unsigned char *addr, u16 vid, bool added_by_user)
+		   struct dst_entry *md_dst, const unsigned char *addr,
+		   u16 vid, bool added_by_user)
 {
 	struct hlist_head *head = &br->hash[br_mac_hash(addr, vid)];
 	struct net_bridge_fdb_entry *fdb;
@@ -558,6 +567,9 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
 	      source->state == BR_STATE_FORWARDING))
 		return;
 
+	if (md_dst && !(md_dst->flags & DST_METADATA))
+		md_dst = NULL;
+
 	fdb = fdb_find_rcu(head, addr, vid);
 	if (likely(fdb)) {
 		/* attempt to update an entry for a local interface */
@@ -567,10 +579,15 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
 					source->dev->name, addr, vid);
 		} else {
 			unsigned long now = jiffies;
+			struct dst_entry *ref_md = rcu_access_pointer(fdb->md_dst);
 
 			/* fastpath: update of existing entry */
-			if (unlikely(source != fdb->dst)) {
+			if (unlikely(source != fdb->dst ||
+			    dst_metadata_cmp(md_dst, ref_md))) {
 				fdb->dst = source;
+				dst_release(ref_md);
+				rcu_assign_pointer(fdb->md_dst,
+						dst_clone(md_dst));
 				fdb_modified = true;
 				/* Take over HW learned entry */
 				if (unlikely(fdb->added_by_external_learn))
@@ -586,7 +603,7 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
 	} else {
 		spin_lock(&br->hash_lock);
 		if (likely(!fdb_find_rcu(head, addr, vid))) {
-			fdb = fdb_create(head, source, addr, vid, 0, 0);
+			fdb = fdb_create(head, source, md_dst, addr, vid, 0, 0);
 			if (fdb) {
 				if (unlikely(added_by_user))
 					fdb->added_by_user = 1;
@@ -781,7 +798,7 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
 		if (!(flags & NLM_F_CREATE))
 			return -ENOENT;
 
-		fdb = fdb_create(head, source, addr, vid, 0, 0);
+		fdb = fdb_create(head, source, NULL, addr, vid, 0, 0);
 		if (!fdb)
 			return -ENOMEM;
 
@@ -844,7 +861,7 @@ static int __br_fdb_add(struct ndmsg *ndm, struct net_bridge *br,
 		}
 		local_bh_disable();
 		rcu_read_lock();
-		br_fdb_update(br, p, addr, vid, true);
+		br_fdb_update(br, p, NULL, addr, vid, true);
 		rcu_read_unlock();
 		local_bh_enable();
 	} else if (ndm->ndm_flags & NTF_EXT_LEARNED) {
@@ -1071,7 +1088,7 @@ int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p,
 	head = &br->hash[br_mac_hash(addr, vid)];
 	fdb = br_fdb_find(br, addr, vid);
 	if (!fdb) {
-		fdb = fdb_create(head, p, addr, vid, 0, 0);
+		fdb = fdb_create(head, p, NULL, addr, vid, 0, 0);
 		if (!fdb) {
 			err = -ENOMEM;
 			goto err_unlock;
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 7637f58c1226..df10c87b2499 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -150,7 +150,8 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 	/* insert into forwarding database after filtering to avoid spoofing */
 	br = p->br;
 	if (p->flags & BR_LEARNING)
-		br_fdb_update(br, p, eth_hdr(skb)->h_source, vid, false);
+		br_fdb_update(br, p, skb_dst(skb), eth_hdr(skb)->h_source,
+			      vid, false);
 
 	local_rcv = !!(br->dev->flags & IFF_PROMISC);
 	dest = eth_hdr(skb)->h_dest;
@@ -230,7 +231,8 @@ static void __br_handle_local_finish(struct sk_buff *skb)
 
 	/* check if vlan is allowed, to avoid spoofing */
 	if (p->flags & BR_LEARNING && br_should_learn(p, skb, &vid))
-		br_fdb_update(p->br, p, eth_hdr(skb)->h_source, vid, false);
+		br_fdb_update(p->br, p, skb_dst(skb),
+				eth_hdr(skb)->h_source, vid, false);
 }
 
 /* note: already called with rcu_read_lock */
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index fd9ee73e0a6d..b73f34ed765f 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -164,6 +164,7 @@ struct net_bridge_vlan_group {
 struct net_bridge_fdb_entry {
 	struct hlist_node		hlist;
 	struct net_bridge_port		*dst;
+	struct dst_entry __rcu		*md_dst;
 
 	mac_addr			addr;
 	__u16				vlan_id;
@@ -524,7 +525,8 @@ int br_fdb_fillbuf(struct net_bridge *br, void *buf, unsigned long count,
 int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
 		  const unsigned char *addr, u16 vid);
 void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
-		   const unsigned char *addr, u16 vid, bool added_by_user);
+		   struct dst_entry *md_dst, const unsigned char *addr,
+		   u16 vid, bool added_by_user);
 
 int br_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
 		  struct net_device *dev, const unsigned char *addr, u16 vid);
-- 
2.13.0

^ permalink raw reply related

* [RFC net-next] VPLS support
From: David Lamparter @ 2017-08-16 17:01 UTC (permalink / raw)
  To: netdev; +Cc: amine.kherbouche, roopa

Hi all,

triggered by Amine posting VPLS support earlier, this is what I had in
mind on my end.  You may note the patches share some bits, this is
because both series are derived from hacks I did at 33C3 in December
2016.

This patchset is different in the following ways:
- one vpls device encapsulates many pseudowires
- the pseudowire is learned in the bridge FDB as dst_metadata
- to do this, the bridge is extended to learn / keep dst metadata in its fdb
  on a per-entry level
- configuring pseudowires happens through the mpls LIB, through route
  add/delete commands
- the TX path is shared with normal MPLS forwarding
  - this also means this supports ECMP without any further work and without
    copypasta'ing it from af_mpls.c
- pseudowire control word is supported

iproute2 / FRR patches are at:
- https://github.com/eqvinox/vpls-iproute2
- https://github.com/opensourcerouting/frr/commits/vpls
while this patchset is also available at:
- https://github.com/eqvinox/vpls-linux-kernel
(but please be aware that I'm amending and rebasing commits)

I've tested some basic setups, the chain from LDP down into the kernel works
at least in these.  FRR has some testcases around from OpenBSD VPLS support,
I haven't wired that up to run against Linux / this patchset yet.

The patchset needs a lot of polishing (yes I left my TODO notes in the
commit messages), for now my primary concern is overall design feedback.
Roopa has already provided a lot of input (Thanks!);  the major topic I'm
expecting to get discussion on is the bridge FDB changes.

Note that there is a rather large spec difference to VXLAN;  VPLS has
no concept that is analog to a "VNI".  There is nothing to do on a
per-vlan basis.  (Don't get confused by "vlan pseudowire mode", that's
just a fixed tag insertion on a per-remote-endpoint level.)

Cheers / input & feedback appreciated,

-David

P.S.: For a little context on the bridge FDB changes - I'm hoping to find
some time to extend this to the MDB to allow aggregating dst metadata and
handing down a list of dst metas on TX.  This isn't specifically for VPLS
but rather to give sufficient information to the 802.11 stack to allow it to
optimize selecting rates (or unicasting) for multicast traffic by having the
multicast subscriber list known.  This is done by major commercial wifi
solutions (e.g. google "dynamic multicast optimization".)

^ permalink raw reply

* Re: [PATCH] tun: make tun_build_skb() thread safe
From: Michael S. Tsirkin @ 2017-08-16 16:55 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, Eric Dumazet
In-Reply-To: <1502892873-10770-1-git-send-email-jasowang@redhat.com>

On Wed, Aug 16, 2017 at 10:14:33PM +0800, Jason Wang wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> 
> tun_build_skb() is not thread safe since it uses per queue page frag,
> this will break things when multiple threads are sending through same
> queue. Switch to use per-thread generator (no lock involved).
> 
> Fixes: 66ccbc9c87c2 ("tap: use build_skb() for small packet")
> Tested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

Jason, given the switch to task_frag, would it be worth it to look at
using higher order allocs along the lines of
5640f7685831e088fe6c2e1f863a6805962f8e81 as well?

> ---
>  drivers/net/tun.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 5892284..c38cd84 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -175,7 +175,6 @@ struct tun_file {
>  	struct list_head next;
>  	struct tun_struct *detached;
>  	struct skb_array tx_array;
> -	struct page_frag alloc_frag;
>  };
>  
>  struct tun_flow_entry {
> @@ -578,8 +577,6 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
>  		}
>  		if (tun)
>  			skb_array_cleanup(&tfile->tx_array);
> -		if (tfile->alloc_frag.page)
> -			put_page(tfile->alloc_frag.page);
>  		sock_put(&tfile->sk);
>  	}
>  }
> @@ -1272,7 +1269,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  				     struct virtio_net_hdr *hdr,
>  				     int len, int *generic_xdp)
>  {
> -	struct page_frag *alloc_frag = &tfile->alloc_frag;
> +	struct page_frag *alloc_frag = &current->task_frag;
>  	struct sk_buff *skb;
>  	struct bpf_prog *xdp_prog;
>  	int buflen = SKB_DATA_ALIGN(len + TUN_RX_PAD) +
> @@ -2580,8 +2577,6 @@ static int tun_chr_open(struct inode *inode, struct file * file)
>  	tfile->sk.sk_write_space = tun_sock_write_space;
>  	tfile->sk.sk_sndbuf = INT_MAX;
>  
> -	tfile->alloc_frag.page = NULL;
> -
>  	file->private_data = tfile;
>  	INIT_LIST_HEAD(&tfile->next);
>  
> -- 
> 2.7.4

^ permalink raw reply

* [PATCH net] tipc: fix use-after-free
From: Eric Dumazet @ 2017-08-16 16:41 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jon Maloy, Ying Xue

From: Eric Dumazet <edumazet@google.com>

syszkaller reported use-after-free in tipc [1]

When msg->rep skb is freed, set the pointer to NULL,
so that caller does not free it again.

[1]

==================================================================
BUG: KASAN: use-after-free in skb_push+0xd4/0xe0 net/core/skbuff.c:1466
Read of size 8 at addr ffff8801c6e71e90 by task syz-executor5/4115

CPU: 1 PID: 4115 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 print_address_description+0x73/0x250 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x24e/0x340 mm/kasan/report.c:409
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
 skb_push+0xd4/0xe0 net/core/skbuff.c:1466
 tipc_nl_compat_recv+0x833/0x18f0 net/tipc/netlink_compat.c:1209
 genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
 genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
 netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
 netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
 netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
 netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 sock_write_iter+0x31a/0x5d0 net/socket.c:898
 call_write_iter include/linux/fs.h:1743 [inline]
 new_sync_write fs/read_write.c:457 [inline]
 __vfs_write+0x684/0x970 fs/read_write.c:470
 vfs_write+0x189/0x510 fs/read_write.c:518
 SYSC_write fs/read_write.c:565 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:557
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512e9
RSP: 002b:00007f3bc8184c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004512e9
RDX: 0000000000000020 RSI: 0000000020fdb000 RDI: 0000000000000006
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b5e76
R13: 00007f3bc8184b48 R14: 00000000004b5e86 R15: 0000000000000000

Allocated by task 4115:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
 kmem_cache_alloc_node+0x13d/0x750 mm/slab.c:3651
 __alloc_skb+0xf1/0x740 net/core/skbuff.c:219
 alloc_skb include/linux/skbuff.h:903 [inline]
 tipc_tlv_alloc+0x26/0xb0 net/tipc/netlink_compat.c:148
 tipc_nl_compat_dumpit+0xf2/0x3c0 net/tipc/netlink_compat.c:248
 tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
 tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
 genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
 genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
 netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
 netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
 netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
 netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 sock_write_iter+0x31a/0x5d0 net/socket.c:898
 call_write_iter include/linux/fs.h:1743 [inline]
 new_sync_write fs/read_write.c:457 [inline]
 __vfs_write+0x684/0x970 fs/read_write.c:470
 vfs_write+0x189/0x510 fs/read_write.c:518
 SYSC_write fs/read_write.c:565 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:557
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 4115:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
 __cache_free mm/slab.c:3503 [inline]
 kmem_cache_free+0x77/0x280 mm/slab.c:3763
 kfree_skbmem+0x1a1/0x1d0 net/core/skbuff.c:622
 __kfree_skb net/core/skbuff.c:682 [inline]
 kfree_skb+0x165/0x4c0 net/core/skbuff.c:699
 tipc_nl_compat_dumpit+0x36a/0x3c0 net/tipc/netlink_compat.c:260
 tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
 tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
 genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
 genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
 netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
 netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
 netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
 netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 sock_write_iter+0x31a/0x5d0 net/socket.c:898
 call_write_iter include/linux/fs.h:1743 [inline]
 new_sync_write fs/read_write.c:457 [inline]
 __vfs_write+0x684/0x970 fs/read_write.c:470
 vfs_write+0x189/0x510 fs/read_write.c:518
 SYSC_write fs/read_write.c:565 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:557
 entry_SYSCALL_64_fastpath+0x1f/0xbe

The buggy address belongs to the object at ffff8801c6e71dc0
 which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 208 bytes inside of
 224-byte region [ffff8801c6e71dc0, ffff8801c6e71ea0)
The buggy address belongs to the page:
page:ffffea00071b9c40 count:1 mapcount:0 mapping:ffff8801c6e71000 index:0x0
flags: 0x200000000000100(slab)
raw: 0200000000000100 ffff8801c6e71000 0000000000000000 000000010000000c
raw: ffffea0007224a20 ffff8801d98caf48 ffff8801d9e79040 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8801c6e71d80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
 ffff8801c6e71e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801c6e71e80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
 ffff8801c6e71f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8801c6e71f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov  <dvyukov@google.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
---
 net/tipc/netlink_compat.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/tipc/netlink_compat.c b/net/tipc/netlink_compat.c
index 9bfe886ab330..750949dfc1d7 100644
--- a/net/tipc/netlink_compat.c
+++ b/net/tipc/netlink_compat.c
@@ -258,13 +258,15 @@ static int tipc_nl_compat_dumpit(struct tipc_nl_compat_cmd_dump *cmd,
 	arg = nlmsg_new(0, GFP_KERNEL);
 	if (!arg) {
 		kfree_skb(msg->rep);
+		msg->rep = NULL;
 		return -ENOMEM;
 	}
 
 	err = __tipc_nl_compat_dumpit(cmd, msg, arg);
-	if (err)
+	if (err) {
 		kfree_skb(msg->rep);
-
+		msg->rep = NULL;
+	}
 	kfree_skb(arg);
 
 	return err;

^ permalink raw reply related

* Re: [PATCH net-next V2 1/3] tap: use build_skb() for small packet
From: David Miller @ 2017-08-16 16:30 UTC (permalink / raw)
  To: jasowang; +Cc: mst, eric.dumazet, netdev, linux-kernel, kubakici
In-Reply-To: <3b66193a-2df1-0191-0785-20123e38460a@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Wed, 16 Aug 2017 17:17:45 +0800

> It looks like full page allocation just produce too much stress on the
> page allocator.
> 
> I get 1.58Mpps (full page) vs 1.95Mpps (page frag) with the patches
> attached.

Yes, this is why drivers doing XDP tend to drift towards implementing
a local cache of pages.

^ permalink raw reply

* Re: [PATCH 1/2] tcp: Remove unnecessary dst check in tcp_conn_request.
From: Tonghao Zhang @ 2017-08-16 15:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers
In-Reply-To: <1502894687.4936.115.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, Aug 16, 2017 at 10:44 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2017-08-16 at 06:31 -0700, Tonghao Zhang wrote:
>> Because we remove the tcp_tw_recycle support in the commit
>
>
>> 4396e46187c ('tcp: remove tcp_tw_recycle') and also delete
>> the code 'af_ops->route_req' for sysctl_tw_recycle in tcp_conn_request.
>> Now when we call the 'af_ops->route_req', the dist always is
>> NULL, and we remove the unnecessay check.
>
> Thanks for these patches.
>
> You forgot :
>
> 1) a cover letter ( [PATCH next-next 0/2] tcp: ....
>
> 2) clearly state which tree you are targeting
> ( read Documentation/networking/netdev-FAQ.txt )
Thanks, I do this in v2

> 3) Also, I would also have removed tcp_peer_is_proven()
> since it is also called with dst=NULL
Thanks for your work.

>
>

^ permalink raw reply

* [PATCH net-next 3/3] vmbus: remove unused vmbus_sendpacket_ctl
From: Stephen Hemminger @ 2017-08-16 15:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin; +Cc: devel, netdev
In-Reply-To: <20170816155626.16580-1-sthemmin@microsoft.com>

The only usage of vmbus_sendpacket_ctl was by vmbus_sendpacket.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/hv/channel.c        | 43 +++++++++++++++++--------------------------
 drivers/net/hyperv/netvsc.c |  9 ++++-----
 include/linux/hyperv.h      |  7 -------
 3 files changed, 21 insertions(+), 38 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 9223fe8823e0..d9e9676e2b40 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -647,9 +647,23 @@ void vmbus_close(struct vmbus_channel *channel)
 }
 EXPORT_SYMBOL_GPL(vmbus_close);
 
-int vmbus_sendpacket_ctl(struct vmbus_channel *channel, void *buffer,
-			 u32 bufferlen, u64 requestid,
-			 enum vmbus_packet_type type, u32 flags)
+/**
+ * vmbus_sendpacket() - Send the specified buffer on the given channel
+ * @channel: Pointer to vmbus_channel structure.
+ * @buffer: Pointer to the buffer you want to receive the data into.
+ * @bufferlen: Maximum size of what the the buffer will hold
+ * @requestid: Identifier of the request
+ * @type: Type of packet that is being send e.g. negotiate, time
+ * packet etc.
+ *
+ * Sends data in @buffer directly to hyper-v via the vmbus
+ * This will send the data unparsed to hyper-v.
+ *
+ * Mainly used by Hyper-V drivers.
+ */
+int vmbus_sendpacket(struct vmbus_channel *channel, void *buffer,
+			   u32 bufferlen, u64 requestid,
+			   enum vmbus_packet_type type, u32 flags)
 {
 	struct vmpacket_descriptor desc;
 	u32 packetlen = sizeof(struct vmpacket_descriptor) + bufferlen;
@@ -676,29 +690,6 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, void *buffer,
 
 	return hv_ringbuffer_write(channel, bufferlist, num_vecs);
 }
-EXPORT_SYMBOL(vmbus_sendpacket_ctl);
-
-/**
- * vmbus_sendpacket() - Send the specified buffer on the given channel
- * @channel: Pointer to vmbus_channel structure.
- * @buffer: Pointer to the buffer you want to receive the data into.
- * @bufferlen: Maximum size of what the the buffer will hold
- * @requestid: Identifier of the request
- * @type: Type of packet that is being send e.g. negotiate, time
- * packet etc.
- *
- * Sends data in @buffer directly to hyper-v via the vmbus
- * This will send the data unparsed to hyper-v.
- *
- * Mainly used by Hyper-V drivers.
- */
-int vmbus_sendpacket(struct vmbus_channel *channel, void *buffer,
-			   u32 bufferlen, u64 requestid,
-			   enum vmbus_packet_type type, u32 flags)
-{
-	return vmbus_sendpacket_ctl(channel, buffer, bufferlen, requestid,
-				    type, flags);
-}
 EXPORT_SYMBOL(vmbus_sendpacket);
 
 /*
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 6031102cbba3..0062b802676f 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -780,11 +780,10 @@ static inline int netvsc_send_pkt(
 						  &nvmsg, sizeof(nvmsg),
 						  req_id);
 	} else {
-		ret = vmbus_sendpacket_ctl(out_channel, &nvmsg,
-					   sizeof(struct nvsp_message),
-					   req_id,
-					   VM_PKT_DATA_INBAND,
-					   VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+		ret = vmbus_sendpacket(out_channel,
+				       &nvmsg, sizeof(nvmsg),
+				       req_id, VM_PKT_DATA_INBAND,
+				       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
 	}
 
 	if (ret == 0) {
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 9692592d43a3..a5f961c4149e 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1030,13 +1030,6 @@ extern int vmbus_sendpacket(struct vmbus_channel *channel,
 				  enum vmbus_packet_type type,
 				  u32 flags);
 
-extern int vmbus_sendpacket_ctl(struct vmbus_channel *channel,
-				  void *buffer,
-				  u32 bufferLen,
-				  u64 requestid,
-				  enum vmbus_packet_type type,
-				  u32 flags);
-
 extern int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel,
 					    struct hv_page_buffer pagebuffers[],
 					    u32 pagecount,
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next 2/3] vmbus: remove unused vmubs_sendpacket_pagebuffer_ctl
From: Stephen Hemminger @ 2017-08-16 15:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin; +Cc: devel, netdev
In-Reply-To: <20170816155626.16580-1-sthemmin@microsoft.com>

The function vmbus_sendpacket_pagebuffer_ctl was never used directly.
Just have vmbus_send_pagebuffer

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/hv/channel.c        | 30 ++++++------------------------
 drivers/net/hyperv/netvsc.c | 10 ++++------
 include/linux/hyperv.h      |  8 --------
 3 files changed, 10 insertions(+), 38 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 756a1e841142..9223fe8823e0 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -702,16 +702,16 @@ int vmbus_sendpacket(struct vmbus_channel *channel, void *buffer,
 EXPORT_SYMBOL(vmbus_sendpacket);
 
 /*
- * vmbus_sendpacket_pagebuffer_ctl - Send a range of single-page buffer
+ * vmbus_sendpacket_pagebuffer - Send a range of single-page buffer
  * packets using a GPADL Direct packet type. This interface allows you
  * to control notifying the host. This will be useful for sending
  * batched data. Also the sender can control the send flags
  * explicitly.
  */
-int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
-				    struct hv_page_buffer pagebuffers[],
-				    u32 pagecount, void *buffer, u32 bufferlen,
-				    u64 requestid, u32 flags)
+int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel,
+				struct hv_page_buffer pagebuffers[],
+				u32 pagecount, void *buffer, u32 bufferlen,
+				u64 requestid)
 {
 	int i;
 	struct vmbus_channel_packet_page_buffer desc;
@@ -736,7 +736,7 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
 
 	/* Setup the descriptor */
 	desc.type = VM_PKT_DATA_USING_GPA_DIRECT;
-	desc.flags = flags;
+	desc.flags = VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED;
 	desc.dataoffset8 = descsize >> 3; /* in 8-bytes granularity */
 	desc.length8 = (u16)(packetlen_aligned >> 3);
 	desc.transactionid = requestid;
@@ -757,24 +757,6 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
 
 	return hv_ringbuffer_write(channel, bufferlist, 3);
 }
-EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer_ctl);
-
-/*
- * vmbus_sendpacket_pagebuffer - Send a range of single-page buffer
- * packets using a GPADL Direct packet type.
- */
-int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel,
-				     struct hv_page_buffer pagebuffers[],
-				     u32 pagecount, void *buffer, u32 bufferlen,
-				     u64 requestid)
-{
-	u32 flags = VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED;
-
-	return vmbus_sendpacket_pagebuffer_ctl(channel, pagebuffers, pagecount,
-					       buffer, bufferlen,
-					       requestid, flags);
-
-}
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer);
 
 /*
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 0530e7d729e1..6031102cbba3 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -775,12 +775,10 @@ static inline int netvsc_send_pkt(
 		if (packet->cp_partial)
 			pb += packet->rmsg_pgcnt;
 
-		ret = vmbus_sendpacket_pagebuffer_ctl(out_channel,
-						      pb, packet->page_buf_cnt,
-						      &nvmsg,
-						      sizeof(struct nvsp_message),
-						      req_id,
-						      VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+		ret = vmbus_sendpacket_pagebuffer(out_channel,
+						  pb, packet->page_buf_cnt,
+						  &nvmsg, sizeof(nvmsg),
+						  req_id);
 	} else {
 		ret = vmbus_sendpacket_ctl(out_channel, &nvmsg,
 					   sizeof(struct nvsp_message),
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 39a080ce17da..9692592d43a3 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1044,14 +1044,6 @@ extern int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel,
 					    u32 bufferlen,
 					    u64 requestid);
 
-extern int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
-					   struct hv_page_buffer pagebuffers[],
-					   u32 pagecount,
-					   void *buffer,
-					   u32 bufferlen,
-					   u64 requestid,
-					   u32 flags);
-
 extern int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
 				     struct vmbus_packet_mpb_array *mpb,
 				     u32 desc_size,
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next 1/3] vmbus: remove unused vmbus_sendpacket_multipagebuffer
From: Stephen Hemminger @ 2017-08-16 15:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin; +Cc: devel, netdev
In-Reply-To: <20170816155626.16580-1-sthemmin@microsoft.com>

This function is not used anywhere in current code.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/hv/channel.c   | 56 --------------------------------------------------
 include/linux/hyperv.h |  6 ------
 2 files changed, 62 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index e57cc40cb768..756a1e841142 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -814,62 +814,6 @@ int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
 }
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_mpb_desc);
 
-/*
- * vmbus_sendpacket_multipagebuffer - Send a multi-page buffer packet
- * using a GPADL Direct packet type.
- */
-int vmbus_sendpacket_multipagebuffer(struct vmbus_channel *channel,
-				struct hv_multipage_buffer *multi_pagebuffer,
-				void *buffer, u32 bufferlen, u64 requestid)
-{
-	struct vmbus_channel_packet_multipage_buffer desc;
-	u32 descsize;
-	u32 packetlen;
-	u32 packetlen_aligned;
-	struct kvec bufferlist[3];
-	u64 aligned_data = 0;
-	u32 pfncount = NUM_PAGES_SPANNED(multi_pagebuffer->offset,
-					 multi_pagebuffer->len);
-
-	if (pfncount > MAX_MULTIPAGE_BUFFER_COUNT)
-		return -EINVAL;
-
-	/*
-	 * Adjust the size down since vmbus_channel_packet_multipage_buffer is
-	 * the largest size we support
-	 */
-	descsize = sizeof(struct vmbus_channel_packet_multipage_buffer) -
-			  ((MAX_MULTIPAGE_BUFFER_COUNT - pfncount) *
-			  sizeof(u64));
-	packetlen = descsize + bufferlen;
-	packetlen_aligned = ALIGN(packetlen, sizeof(u64));
-
-
-	/* Setup the descriptor */
-	desc.type = VM_PKT_DATA_USING_GPA_DIRECT;
-	desc.flags = VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED;
-	desc.dataoffset8 = descsize >> 3; /* in 8-bytes granularity */
-	desc.length8 = (u16)(packetlen_aligned >> 3);
-	desc.transactionid = requestid;
-	desc.rangecount = 1;
-
-	desc.range.len = multi_pagebuffer->len;
-	desc.range.offset = multi_pagebuffer->offset;
-
-	memcpy(desc.range.pfn_array, multi_pagebuffer->pfn_array,
-	       pfncount * sizeof(u64));
-
-	bufferlist[0].iov_base = &desc;
-	bufferlist[0].iov_len = descsize;
-	bufferlist[1].iov_base = buffer;
-	bufferlist[1].iov_len = bufferlen;
-	bufferlist[2].iov_base = &aligned_data;
-	bufferlist[2].iov_len = (packetlen_aligned - packetlen);
-
-	return hv_ringbuffer_write(channel, bufferlist, 3);
-}
-EXPORT_SYMBOL_GPL(vmbus_sendpacket_multipagebuffer);
-
 /**
  * vmbus_recvpacket() - Retrieve the user packet on the specified channel
  * @channel: Pointer to vmbus_channel structure.
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b7d7bbec74e0..39a080ce17da 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1052,12 +1052,6 @@ extern int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
 					   u64 requestid,
 					   u32 flags);
 
-extern int vmbus_sendpacket_multipagebuffer(struct vmbus_channel *channel,
-					struct hv_multipage_buffer *mpb,
-					void *buffer,
-					u32 bufferlen,
-					u64 requestid);
-
 extern int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
 				     struct vmbus_packet_mpb_array *mpb,
 				     u32 desc_size,
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next 0/3] vmbus sendpacket cleanups
From: Stephen Hemminger @ 2017-08-16 15:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin; +Cc: devel, netdev

These patches remove and consolidate vmbus_sendpacket functions.

They should go through the net-next tree since these API's
were only used by the netvsc driver.

Stephen Hemminger (3):
  vmbus: remove unused vmbus_sendpacket_multipagebuffer
  vmbus: remove unused vmubs_sendpacket_pagebuffer_ctl
  vmbus: remove unused vmbus_sendpacket_ctl

 drivers/hv/channel.c        | 129 ++++++++------------------------------------
 drivers/net/hyperv/netvsc.c |  19 +++----
 include/linux/hyperv.h      |  21 --------
 3 files changed, 31 insertions(+), 138 deletions(-)

-- 
2.11.0

^ permalink raw reply

* [PATCH net-next v2 1/2] tcp: Remove unnecessary dst check in tcp_conn_request.
From: Tonghao Zhang @ 2017-08-16 15:28 UTC (permalink / raw)
  To: netdev; +Cc: Tonghao Zhang
In-Reply-To: <1502897316-3666-1-git-send-email-xiangxia.m.yue@gmail.com>

Because we remove the tcp_tw_recycle support in the commit
4396e46187c ('tcp: remove tcp_tw_recycle') and also delete
the code 'af_ops->route_req' for sysctl_tw_recycle in tcp_conn_request.
Now when we call the 'af_ops->route_req', the dist always is
NULL, and we remove the unnecessay check.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/ipv4/tcp_input.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d73903f..7eee2c7 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6132,11 +6132,10 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,

 		isn = af_ops->init_seq(skb);
 	}
-	if (!dst) {
-		dst = af_ops->route_req(sk, &fl, req);
-		if (!dst)
-			goto drop_and_free;
-	}
+
+	dst = af_ops->route_req(sk, &fl, req);
+	if (!dst)
+		goto drop_and_free;

 	tcp_ecn_create_request(req, skb, sk, dst);

-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v2 0/2] simplify the tcp_conn_request.
From: Tonghao Zhang @ 2017-08-16 15:28 UTC (permalink / raw)
  To: netdev; +Cc: Tonghao Zhang

These patches are not bugfix. But just simplify the tcp_conn_request
function.

These patches are based on Davem's net-next tree.

Tonghao Zhang (2):
  tcp: Remove unnecessary dst check in tcp_conn_request.
  tcp: Remove the unused parameter for tcp_try_fastopen.

 include/net/tcp.h       |  3 +--
 net/ipv4/tcp_fastopen.c |  6 ++----
 net/ipv4/tcp_input.c    | 11 +++++------
 3 files changed, 8 insertions(+), 12 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support
From: Daniel Borkmann @ 2017-08-16 15:25 UTC (permalink / raw)
  To: John Fastabend, davem, ast; +Cc: tgraf, netdev, tom
In-Reply-To: <20170816052338.15445.83732.stgit@john-Precision-Tower-5810>

On 08/16/2017 07:30 AM, John Fastabend wrote:
> This series implements a sockmap and socket redirect helper for BPF
> using a model similar to XDP netdev redirect. A sockmap is a BPF map
> type that holds references to sock structs. Then with a new sk
> redirect bpf helper BPF programs can use the map to redirect skbs
> between sockets,
>
>        bpf_sk_redirect_map(map, key, flags)
>
> Finally, we need a call site to attach our BPF logic to do socket
> redirects. We added hooks to recv_sock using the existing strparser
> infrastructure to do this. The call site is added via the BPF attach
> map call. To enable users to use this infrastructure a new BPF program
> BPF_PROG_TYPE_SK_SKB is created that allows users to reference sock
> details, such as port and ip address fields, to build useful socket
> layer program. The sockmap datapath is as follows,
>
>       recv -> strparser -> verdict/action
>
> where this series implements the drop and redirect actions.
> Additional, actions can be added as needed.
>
> A sample program is provided to illustrate how a sockmap can
> be integrated with cgroups and used to add/delete sockets in
> a sockmap. The program is simple but should show many of the
> key ideas.
>
> To test this work test_maps in selftests/bpf was leveraged.
> We added a set of tests to add sockets and do send/recv ops
> on the sockets to ensure correct behavior. Additionally, the
> selftests tests a series of negative test cases. We can expand
> on this in the future.
>
> I also have a basic test program I use with iperf/netperf
> clients that could be sent as an additional sample if folks
> want this. It needs a bit of cleanup to send to the list and
> wasn't included in this series.
>
> For people who prefer git over pulling patches out of their mail
> editor I've posted the code here,
>
> https://github.com/jrfastab/linux-kernel-xdp/tree/sockmap
>
> For some background information on the genesis of this work
> it might be helpful to review these slides from netconf 2017
> by Thomas Graf,
>
> http://vger.kernel.org/netconf2017.html
> https://docs.google.com/a/covalent.io/presentation/d/1dwSKSBGpUHD3WO5xxzZWj8awV_-xL-oYhvqQMOBhhtk/edit?usp=sharing
>
> Thanks to Daniel Borkmann for reviewing and providing initial
> feedback.

LGTM, for the set:

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH net] datagram: When peeking datagrams with offset < 0 don't skip empty skbs
From: Willem de Bruijn @ 2017-08-16 15:18 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Matthew Dawson, Network Development, Macieira, Thiago
In-Reply-To: <1502875722.3548.11.camel@redhat.com>

> If I read the above correctly, you are arguining in favor of the
> addittional flag version, right?

I was. Though if we are going to thread the argument from the caller
to __skb_try_recv_from_queue to avoid rereading sk->sk_peek_off,
on second thought it might be simpler to do it through off:

@@ -511,7 +511,9 @@ static inline int sk_peek_offset(struct sock *sk, int flags)
        if (unlikely(flags & MSG_PEEK)) {
                s32 off = READ_ONCE(sk->sk_peek_off);
                if (off >= 0)
-                       return off;
+                       return off + 1;
+               else
+                       return 0;
        }

        return 0;

In __skb_try_recv_from_queue we can then disambiguate the two as follows:

@@ -170,13 +170,19 @@ struct sk_buff *__skb_try_recv_from_queue(struct sock *sk,
                                          struct sk_buff **last)
 {
        struct sk_buff *skb;
-       int _off = *off;
+       bool peek_at_off = false;
+       int _off = 0;
+
+       if (flags & MSG_PEEK && *off) {
+               peek_at_off = true;
+               _off = (*off) - 1;
+       }

        *last = queue->prev;
        skb_queue_walk(queue, skb) {
                if (flags & MSG_PEEK) {
-                       if (_off >= skb->len && (skb->len || _off ||
-                                                skb->peeked)) {
+                       if (peek_at_off && _off >= skb->len &&
+                           (skb->len || _off || skb->peeked)) {


This, of course, requires restricting sk_peek_off to protect against overflow.

If I'm not mistaken, the test in udp_recvmsg currently incorrectly sets
peeking to false when peeking at offset zero:

        peeking = off = sk_peek_offset(sk, flags);


> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2408,9 +2408,7 @@ EXPORT_SYMBOL(__sk_mem_reclaim);
>
>  int sk_set_peek_off(struct sock *sk, int val)
>  {
> -       if (val < 0)
> -               return -EINVAL;
> -
> +       /* a negative value will disable peeking with offset */
>         sk->sk_peek_off = val;
>         return 0;
>  }

Separate patch to net-next?

^ permalink raw reply

* Re: [PATCH v3] openvswitch: enable NSH support
From: Eric Garver @ 2017-08-16 15:15 UTC (permalink / raw)
  To: Yi Yang
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	Jiri Benc
In-Reply-To: <1502861730-76203-1-git-send-email-yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

On Wed, Aug 16, 2017 at 01:35:30PM +0800, Yi Yang wrote:
> v2->v3
>  - Change OVS_KEY_ATTR_NSH to nested key to handle
>    length-fixed attributes and length-variable
>    attriubte more flexibly.
>  - Remove struct ovs_action_push_nsh completely
>  - Add code to handle nested attribute for SET_MASKED
>  - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
>    to transfer NSH header data.
>  - Fix comments and coding style issues by Jiri and Eric

Thanks for the updates Yi. Some feedback below.

> 
> v1->v2
>  - Change encap_nsh and decap_nsh to push_nsh and pop_nsh
>  - Dynamically allocate struct ovs_action_push_nsh for
>    length-variable metadata.
> 
> OVS master and 2.8 branch has merged NSH userspace
> patch series, this patch is to enable NSH support
> in kernel data path in order that OVS can support
> NSH in 2.8 release in compat mode by porting this.
> 
> Signed-off-by: Yi Yang <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> ---
>  drivers/net/vxlan.c              |   7 +
>  include/net/nsh.h                | 150 +++++++++++++++++++
>  include/uapi/linux/openvswitch.h |  30 ++++
>  net/openvswitch/actions.c        | 175 ++++++++++++++++++++++
>  net/openvswitch/flow.c           |  39 +++++
>  net/openvswitch/flow.h           |  11 ++
>  net/openvswitch/flow_netlink.c   | 304 ++++++++++++++++++++++++++++++++++++++-
>  net/openvswitch/flow_netlink.h   |   4 +
>  8 files changed, 719 insertions(+), 1 deletion(-)
>  create mode 100644 include/net/nsh.h
> 
[..]
> diff --git a/include/net/nsh.h b/include/net/nsh.h
> new file mode 100644
> index 0000000..54f44f6
> --- /dev/null
> +++ b/include/net/nsh.h
> @@ -0,0 +1,150 @@
[..]
> +#define NSH_VER_MASK       0xc000
> +#define NSH_VER_SHIFT      14
> +#define NSH_FLAGS_MASK     0x3fc0
> +#define NSH_FLAGS_SHIFT    6
> +#define NSH_LEN_MASK       0x003f
> +#define NSH_LEN_SHIFT      0
> +
> +#define NSH_SPI_MASK       0xffffff00
> +#define NSH_SPI_SHIFT      8
> +#define NSH_SI_MASK        0x000000ff
> +#define NSH_SI_SHIFT       0
> +
> +#define NSH_DST_PORT    4790     /* UDP Port for NSH on VXLAN. */
> +#define ETH_P_NSH       0x894F   /* Ethertype for NSH. */

ETH_P_NSH probably belongs in include/uapi/linux/if_ether.h with all the
other ETH_P_* defines.

> +
> +/* NSH Base Header Next Protocol. */
> +#define NSH_P_IPV4        0x01
> +#define NSH_P_IPV6        0x02
> +#define NSH_P_ETHERNET    0x03
> +#define NSH_P_NSH         0x04
> +#define NSH_P_MPLS        0x05
> +
> +/* MD Type Registry. */
> +#define NSH_M_TYPE1     0x01
> +#define NSH_M_TYPE2     0x02
> +#define NSH_M_EXP1      0xFE
> +#define NSH_M_EXP2      0xFF
> +
> +/* NSH Metadata Length. */
> +#define NSH_M_TYPE1_MDLEN 16
> +
> +/* NSH Base Header Length */
> +#define NSH_BASE_HDR_LEN  8
> +
> +/* NSH MD Type 1 header Length. */
> +#define NSH_M_TYPE1_LEN   24
> +
[..]
> diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
> index 1875bba..6c738d0 100644
> --- a/net/openvswitch/flow.h
> +++ b/net/openvswitch/flow.h
> @@ -35,6 +35,7 @@
>  #include <net/inet_ecn.h>
>  #include <net/ip_tunnels.h>
>  #include <net/dst_metadata.h>
> +#include <net/nsh.h>
>  
>  struct sk_buff;
>  
> @@ -66,6 +67,15 @@ struct vlan_head {
>  	(offsetof(struct sw_flow_key, recirc_id) +	\
>  	FIELD_SIZEOF(struct sw_flow_key, recirc_id))
>  
> +struct ovs_key_nsh {
> +	__u8 flags;
> +	__u8 mdtype;
> +	__u8 np;
> +	__u8 pad;
> +	__be32 path_hdr;
> +	__be32 context[NSH_MD1_CONTEXT_SIZE];
> +};
> +
>  struct sw_flow_key {
>  	u8 tun_opts[IP_TUNNEL_OPTS_MAX];
>  	u8 tun_opts_len;
> @@ -144,6 +154,7 @@ struct sw_flow_key {
>  			};
>  		} ipv6;
>  	};
> +	struct ovs_key_nsh nsh;         /* network service header */

Are you intentionally not reserving space in the flow key for
OVS_NSH_KEY_ATTR_MD2? I know it's not supported yet, but much of the
code is stubbed out for it - just making sure this wasn't an oversight.

>  	struct {
>  		/* Connection tracking fields not packed above. */
>  		struct {
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index f07d10a..79059db 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -78,9 +78,11 @@ static bool actions_may_change_flow(const struct nlattr *actions)
>  		case OVS_ACTION_ATTR_HASH:
>  		case OVS_ACTION_ATTR_POP_ETH:
>  		case OVS_ACTION_ATTR_POP_MPLS:
> +		case OVS_ACTION_ATTR_POP_NSH:
>  		case OVS_ACTION_ATTR_POP_VLAN:
>  		case OVS_ACTION_ATTR_PUSH_ETH:
>  		case OVS_ACTION_ATTR_PUSH_MPLS:
> +		case OVS_ACTION_ATTR_PUSH_NSH:
>  		case OVS_ACTION_ATTR_PUSH_VLAN:
>  		case OVS_ACTION_ATTR_SAMPLE:
>  		case OVS_ACTION_ATTR_SET:
> @@ -322,12 +324,22 @@ size_t ovs_tun_key_attr_size(void)
>  		+ nla_total_size(2);   /* OVS_TUNNEL_KEY_ATTR_TP_DST */
>  }
>  
> +size_t ovs_nsh_key_attr_size(void)
> +{
> +	/* Whenever adding new OVS_NSH_KEY_ FIELDS, we should consider
> +	 * updating this function.
> +	 */
> +	return  nla_total_size(8)      /* OVS_NSH_KEY_ATTR_BASE */
> +		+ nla_total_size(16)   /* OVS_NSH_KEY_ATTR_MD1 */
> +		+ nla_total_size(248); /* OVS_NSH_KEY_ATTR_MD2 */

_MD1 and _MD2 are mutually exclusive. You only need the larger of the
two.

Maybe use OVS_PUSH_NSH_MAX_MD_LEN instead of 248.

> +}
> +
>  size_t ovs_key_attr_size(void)
>  {
>  	/* Whenever adding new OVS_KEY_ FIELDS, we should consider
>  	 * updating this function.
>  	 */
> -	BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 28);
> +	BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 29);
>  
>  	return    nla_total_size(4)   /* OVS_KEY_ATTR_PRIORITY */
>  		+ nla_total_size(0)   /* OVS_KEY_ATTR_TUNNEL */
> @@ -341,6 +353,8 @@ size_t ovs_key_attr_size(void)
>  		+ nla_total_size(4)   /* OVS_KEY_ATTR_CT_MARK */
>  		+ nla_total_size(16)  /* OVS_KEY_ATTR_CT_LABELS */
>  		+ nla_total_size(40)  /* OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6 */
> +		+ nla_total_size(0)   /* OVS_KEY_ATTR_NSH */
> +		  + ovs_nsh_key_attr_size()
>  		+ nla_total_size(12)  /* OVS_KEY_ATTR_ETHERNET */
>  		+ nla_total_size(2)   /* OVS_KEY_ATTR_ETHERTYPE */
>  		+ nla_total_size(4)   /* OVS_KEY_ATTR_VLAN */
> @@ -373,6 +387,13 @@ static const struct ovs_len_tbl ovs_tunnel_key_lens[OVS_TUNNEL_KEY_ATTR_MAX + 1]
>  	[OVS_TUNNEL_KEY_ATTR_IPV6_DST]      = { .len = sizeof(struct in6_addr) },
>  };
>  
> +static const struct ovs_len_tbl
> +ovs_nsh_key_attr_lens[OVS_NSH_KEY_ATTR_MAX + 1] = {
> +	[OVS_NSH_KEY_ATTR_BASE]     = { .len = 8 },
> +	[OVS_NSH_KEY_ATTR_MD1]      = { .len = 16 },
> +	[OVS_NSH_KEY_ATTR_MD2]      = { .len = OVS_ATTR_VARIABLE },
> +};
> +
>  /* The size of the argument for each %OVS_KEY_ATTR_* Netlink attribute.  */
>  static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
>  	[OVS_KEY_ATTR_ENCAP]	 = { .len = OVS_ATTR_NESTED },
> @@ -405,6 +426,8 @@ static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
>  		.len = sizeof(struct ovs_key_ct_tuple_ipv4) },
>  	[OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6] = {
>  		.len = sizeof(struct ovs_key_ct_tuple_ipv6) },
> +	[OVS_KEY_ATTR_NSH]       = { .len = OVS_ATTR_NESTED,
> +				     .next = ovs_nsh_key_attr_lens, },
>  };
>  
>  static bool check_attr_len(unsigned int attr_len, unsigned int expected_len)
> @@ -1179,6 +1202,209 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
>  	return 0;
>  }
>  
> +int push_nsh_para_from_nlattr(const struct nlattr *attr,
> +			      struct push_nsh_para *pnp)
> +{
> +	struct nlattr *a;
> +	int rem;
> +	u8 flags = 0;
> +	size_t mdlen = 0;
> +
> +	nla_for_each_nested(a, attr, rem) {
> +		int type = nla_type(a);
> +
> +		if (type > OVS_NSH_KEY_ATTR_MAX) {
> +			OVS_NLERR(1, "nsh attr %d is out of range max %d",
> +				  type, OVS_NSH_KEY_ATTR_MAX);
> +			return -EINVAL;
> +		}
> +
> +		if (!check_attr_len(nla_len(a),
> +				    ovs_nsh_key_attr_lens[type].len)) {
> +			OVS_NLERR(
> +			    1,
> +			    "nsh attr %d has unexpected len %d expected %d",
> +			    type,
> +			    nla_len(a),
> +			    ovs_nsh_key_attr_lens[type].len
> +			);
> +			return -EINVAL;
> +		}
> +
> +		switch (type) {
> +		case OVS_NSH_KEY_ATTR_BASE: {
> +			const struct ovs_nsh_key_base *base =
> +				(struct ovs_nsh_key_base *)nla_data(a);
> +			flags = base->flags;
> +			pnp->next_proto = base->np;
> +			pnp->md_type = base->mdtype;
> +			pnp->path_hdr = base->path_hdr;
> +			break;
> +		}
> +		case OVS_NSH_KEY_ATTR_MD1: {
> +			const struct ovs_nsh_key_md1 *md1 =
> +				(struct ovs_nsh_key_md1 *)nla_data(a);
> +
> +			mdlen = nla_len(a);
> +			memcpy(pnp->metadata, md1, mdlen);
> +			break;
> +		}
> +		case OVS_NSH_KEY_ATTR_MD2: {
> +			const struct u8 *md2 = nla_data(a);
> +
> +			mdlen = nla_len(a);
> +			memcpy(pnp->metadata, md2, mdlen);
> +			break;
> +		}
> +		default:
> +			OVS_NLERR(1, "Unknown nsh attribute %d",
> +				  type);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	if (rem > 0) {
> +		OVS_NLERR(1, "nsh attribute has %d unknown bytes.", rem);
> +		return -EINVAL;
> +	}
> +
> +	/* nsh header length  = NSH_BASE_HDR_LEN + mdlen */
> +	pnp->ver_flags_len = htons(flags << NSH_FLAGS_SHIFT |
> +				   (NSH_BASE_HDR_LEN + mdlen) >> 2);
> +
> +	return 0;
> +}
> +
> +int nsh_key_from_nlattr(const struct nlattr *attr,
> +			struct ovs_key_nsh *nsh)
> +{
> +	struct nlattr *a;
> +	int rem;
> +
> +	nla_for_each_nested(a, attr, rem) {
> +		int type = nla_type(a);
> +
> +		if (type > OVS_NSH_KEY_ATTR_MAX) {
> +			OVS_NLERR(1, "nsh attr %d is out of range max %d",
> +				  type, OVS_NSH_KEY_ATTR_MAX);
> +			return -EINVAL;
> +		}
> +
> +		if (!check_attr_len(nla_len(a),
> +				    ovs_nsh_key_attr_lens[type].len)) {
> +			OVS_NLERR(
> +			    1,
> +			    "nsh attr %d has unexpected len %d expected %d",
> +			    type,
> +			    nla_len(a),
> +			    ovs_nsh_key_attr_lens[type].len
> +			);
> +			return -EINVAL;
> +		}
> +
> +		switch (type) {
> +		case OVS_NSH_KEY_ATTR_BASE: {
> +			const struct ovs_nsh_key_base *base =
> +				(struct ovs_nsh_key_base *)nla_data(a);
> +			nsh->flags = base->flags;
> +			nsh->mdtype = base->mdtype;
> +			nsh->np = base->np;
> +			nsh->path_hdr = base->path_hdr;
> +			break;
> +		}
> +		case OVS_NSH_KEY_ATTR_MD1: {
> +			const struct ovs_nsh_key_md1 *md1 =
> +				(struct ovs_nsh_key_md1 *)nla_data(a);
> +			memcpy(nsh->context, md1->context, sizeof(*md1));
> +			break;
> +		}
> +		case OVS_NSH_KEY_ATTR_MD2:
> +			/* Not supported yet */
> +			break;

When we encounter these metadata attributes do we need to verify that
nsh->mdtype is set correctly? Keep in mind we may parse ATTR_MD{1,2}
before ATTR_BASE.

> +		default:
> +			OVS_NLERR(1, "Unknown nsh attribute %d",
> +				  type);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	if (rem > 0) {
> +		OVS_NLERR(1, "nsh attribute has %d unknown bytes.", rem);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int nsh_key_put_from_nlattr(const struct nlattr *attr,
> +				   struct sw_flow_match *match, bool is_mask,
> +				   bool log)
> +{
> +	struct nlattr *a;
> +	int rem;
> +
> +	nla_for_each_nested(a, attr, rem) {
> +		int type = nla_type(a);
> +		int i;
> +
> +		if (type > OVS_NSH_KEY_ATTR_MAX) {
> +			OVS_NLERR(log, "nsh attr %d is out of range max %d",
> +				  type, OVS_NSH_KEY_ATTR_MAX);
> +			return -EINVAL;
> +		}
> +
> +		if (!check_attr_len(nla_len(a),
> +				    ovs_nsh_key_attr_lens[type].len)) {
> +			OVS_NLERR(
> +			    log,
> +			    "nsh attr %d has unexpected len %d expected %d",
> +			    type,
> +			    nla_len(a),
> +			    ovs_nsh_key_attr_lens[type].len
> +			);
> +			return -EINVAL;
> +		}
> +
> +		switch (type) {
> +		case OVS_NSH_KEY_ATTR_BASE: {
> +			const struct ovs_nsh_key_base *base =
> +				(struct ovs_nsh_key_base *)nla_data(a);
> +			SW_FLOW_KEY_PUT(match, nsh.flags,
> +					base->flags, is_mask);
> +			SW_FLOW_KEY_PUT(match, nsh.mdtype,
> +					base->mdtype, is_mask);
> +			SW_FLOW_KEY_PUT(match, nsh.np,
> +					base->np, is_mask);
> +			SW_FLOW_KEY_PUT(match, nsh.path_hdr,
> +					base->path_hdr, is_mask);
> +			break;
> +		}
> +		case OVS_NSH_KEY_ATTR_MD1: {
> +			const struct ovs_nsh_key_md1 *md1 =
> +				(struct ovs_nsh_key_md1 *)nla_data(a);
> +			for (i = 0; i < 4; i++)
> +				SW_FLOW_KEY_PUT(match, nsh.context[i],
> +						md1->context[i], is_mask);
> +			break;
> +		}
> +		case OVS_NSH_KEY_ATTR_MD2:
> +			/* Not supported yet */
> +			break;
> +		default:
> +			OVS_NLERR(log, "Unknown nsh attribute %d",
> +				  type);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	if (rem > 0) {
> +		OVS_NLERR(log, "nsh attribute has %d unknown bytes.", rem);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
>  				u64 attrs, const struct nlattr **a,
>  				bool is_mask, bool log)
[..]

^ permalink raw reply

* [patch net-next] net: sched: cls_flower: fix ndo_setup_tc type for stats call
From: Jiri Pirko @ 2017-08-16 15:15 UTC (permalink / raw)
  To: netdev; +Cc: davem, jhs, xiyou.wangcong, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

I made a stupid mistake using TC_CLSFLOWER_STATS instead of
TC_SETUP_CLSFLOWER. Funny thing is that both are defined as "2" so it
actually did not cause any harm. Anyway, fixing it now.

Fixes: 2572ac53c46f ("net: sched: make type an argument for ndo_setup_tc")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 052e902..bd9dab4 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -289,7 +289,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
 	cls_flower.cookie = (unsigned long) f;
 	cls_flower.exts = &f->exts;
 
-	dev->netdev_ops->ndo_setup_tc(dev, TC_CLSFLOWER_STATS,
+	dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSFLOWER,
 				      &cls_flower);
 }
 
-- 
2.9.3

^ permalink raw reply related

* Re: [patch net-next repost 1/3] idr: Use unsigned long instead of int
From: Jiri Pirko @ 2017-08-16 15:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Chris Mi, netdev, jhs, xiyou.wangcong, davem, mawilcox
In-Reply-To: <1502895438.4936.121.camel@edumazet-glaptop3.roam.corp.google.com>

Wed, Aug 16, 2017 at 04:57:18PM CEST, eric.dumazet@gmail.com wrote:
>On Wed, 2017-08-16 at 13:06 +0200, Jiri Pirko wrote:
>> Wed, Aug 16, 2017 at 12:58:53PM CEST, eric.dumazet@gmail.com wrote:
>> >On Wed, 2017-08-16 at 12:53 +0200, Jiri Pirko wrote:
>> >
>> >> rhashtable is unnecesary big hammer for this. IDR is nice fit for
>> >> this purpose.
>> >
>> >Obviously IDR does not fit, since you have to change its ABI.
>> 
>> I don't think it is a problem to adjust something to your needs.
>> Moreover, if it's API is misdesigned from the beginning. We are just
>> putting IDR back on track, cleaning it's API. I don't see anything wrong
>> on that. Everyone would benefit.
>
>Except that your patch is gigantic, and nobody really can review it.
>
>You could define idr_alloc_ext() maybe.
>
>Then provide a patch series grouped so that each maintainer can review
>its part.
>
>Or leave legacy code using the old idr_alloc() in place.

Fair. Thanks.

^ permalink raw reply

* Re: [patch net-next repost 1/3] idr: Use unsigned long instead of int
From: Eric Dumazet @ 2017-08-16 14:57 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Chris Mi, netdev, jhs, xiyou.wangcong, davem, mawilcox
In-Reply-To: <20170816110612.GI1868@nanopsycho>

On Wed, 2017-08-16 at 13:06 +0200, Jiri Pirko wrote:
> Wed, Aug 16, 2017 at 12:58:53PM CEST, eric.dumazet@gmail.com wrote:
> >On Wed, 2017-08-16 at 12:53 +0200, Jiri Pirko wrote:
> >
> >> rhashtable is unnecesary big hammer for this. IDR is nice fit for
> >> this purpose.
> >
> >Obviously IDR does not fit, since you have to change its ABI.
> 
> I don't think it is a problem to adjust something to your needs.
> Moreover, if it's API is misdesigned from the beginning. We are just
> putting IDR back on track, cleaning it's API. I don't see anything wrong
> on that. Everyone would benefit.

Except that your patch is gigantic, and nobody really can review it.

You could define idr_alloc_ext() maybe.

Then provide a patch series grouped so that each maintainer can review
its part.

Or leave legacy code using the old idr_alloc() in place.

^ permalink raw reply

* Re: [PATCH 1/2] tcp: Remove unnecessary dst check in tcp_conn_request.
From: Eric Dumazet @ 2017-08-16 14:44 UTC (permalink / raw)
  To: Tonghao Zhang; +Cc: netdev
In-Reply-To: <1502890265-9285-1-git-send-email-xiangxia.m.yue@gmail.com>

On Wed, 2017-08-16 at 06:31 -0700, Tonghao Zhang wrote:
> Because we remove the tcp_tw_recycle support in the commit


> 4396e46187c ('tcp: remove tcp_tw_recycle') and also delete
> the code 'af_ops->route_req' for sysctl_tw_recycle in tcp_conn_request.
> Now when we call the 'af_ops->route_req', the dist always is
> NULL, and we remove the unnecessay check.

Thanks for these patches.

You forgot :

1) a cover letter ( [PATCH next-next 0/2] tcp: ....

2) clearly state which tree you are targeting 
( read Documentation/networking/netdev-FAQ.txt )

3) Also, I would also have removed tcp_peer_is_proven()
since it is also called with dst=NULL

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox