Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v3 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
From: Saeed Mahameed @ 2016-11-21  9:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, Zhiyi Sun,
	Rana Shahout, Saeed Mahameed, Linux Netdev List
In-Reply-To: <6030d6a2d31ce68280138ac7ed33c3fc6ba585b2.1479514784.git.daniel@iogearbox.net>

On Sat, Nov 19, 2016 at 2:45 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> There are multiple issues in mlx5e_xdp_set():
>
> 1) The batched bpf_prog_add() is currently not checked for errors. When
>    doing so, it should be done at an earlier point in time to makes sure
>    that we cannot fail anymore at the time we want to set the program for
>    each channel. The batched refs short-cut can only be performed when we
>    don't need to perform a reset for changing the rq type and the device
>    was in opened state. In case the device was not in opened state, then
>    the next mlx5e_open_locked() will aquire the refs from the control prog
>    via mlx5e_create_rq(), same when we need to perform a reset.
>
> 2) When swapping the priv->xdp_prog, then no extra reference count must be
>    taken since we got that from call path via dev_change_xdp_fd() already.
>    Otherwise, we'd never be able to release the program. Also, bpf_prog_add()
>    without checking the return code could fail.
>
> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Acked-by: Saeed Mahameed <saeedm@mellanox.com>

^ permalink raw reply

* Re: [PATCH net-next v3 3/4] bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup
From: Saeed Mahameed @ 2016-11-21  9:31 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, Zhiyi Sun,
	Rana Shahout, Saeed Mahameed, Linux Netdev List
In-Reply-To: <29469b54bfddfe218266b8b902a3735034d28226.1479514784.git.daniel@iogearbox.net>

On Sat, Nov 19, 2016 at 2:45 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> mlx5e_xdp_set() is currently the only place where we drop reference on the
> prog sitting in priv->xdp_prog when it's exchanged by a new one. We also
> need to make sure that we eventually release that reference, for example,
> in case the netdev is dismantled, otherwise we leak the program.
>
> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Acked-by: Saeed Mahameed <saeedm@mellanox.com>

^ permalink raw reply

* [PATCH iproute2 0/2] tc/cls_flower: Support for ip tunnel metadata set/release/classify
From: Amir Vadai @ 2016-11-21 10:20 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, netdev, Or Gerlitz, Hadar Har-Zion, Roi Dayan,
	Amir Vadai

Hi,

This short series adds support for matching and setting metadata for ip tunnel
shared device using the TC system, introduced in kernel 4.9 [1].

Applied and tested on top of commit f3f339e9590a ("cleanup debris from revert")

Example usage:

$ tc filter add dev vxlan0 protocol ip parent ffff: \
    flower \
      enc_src_ip 11.11.0.2 \
      enc_dst_ip 11.11.0.1 \
      enc_key_id 11 \
      dst_ip 11.11.11.1 \
    action tunnel_key release \
    action mirred egress redirect dev vnet0

$ tc filter add dev net0 protocol ip parent ffff: \
    flower \
      ip_proto 1 \
      dst_ip 11.11.11.2 \
    action tunnel_key set \
      src_ip 11.11.0.1 \
      dst_ip 11.11.0.2 \
      id 11 \
    action mirred egress redirect dev vxlan0

[1] - d1ba24feb466 ("Merge branch 'act_tunnel_key'")

Thanks,
Amir

Amir Vadai (2):
  tc/cls_flower: Classify packet in ip tunnels
  tc/act_tunnel: Introduce ip tunnel action

 include/linux/tc_act/tc_tunnel_key.h |  42 ++++++
 man/man8/tc-flower.8                 |  17 ++-
 man/man8/tc-tunnel_key.8             | 105 ++++++++++++++
 tc/Makefile                          |   1 +
 tc/f_flower.c                        |  85 +++++++++++-
 tc/m_tunnel_key.c                    | 259 +++++++++++++++++++++++++++++++++++
 6 files changed, 505 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/tc_act/tc_tunnel_key.h
 create mode 100644 man/man8/tc-tunnel_key.8
 create mode 100644 tc/m_tunnel_key.c

-- 
2.10.2

^ permalink raw reply

* [PATCH iproute2 2/2] tc/act_tunnel: Introduce ip tunnel action
From: Amir Vadai @ 2016-11-21 10:20 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, netdev, Or Gerlitz, Hadar Har-Zion, Roi Dayan,
	Amir Vadai
In-Reply-To: <20161121102056.13468-1-amir@vadai.me>

This action could be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from a such a device.

The action will release the metadata created by the tunnel device
(decap), or set the metadata with the specified values for encap
operation.

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, a metadata for the vxlan tunnel is created using the
tunnel_key action and it's arguments:

$ tc filter add dev net0 protocol ip parent ffff: \
    flower \
      ip_proto 1 \
      dst_ip 11.11.11.2 \
    action tunnel_key set \
      src_ip 11.11.0.1 \
      dst_ip 11.11.0.2 \
      id 11 \
    action mirred egress redirect dev vxlan0

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 include/linux/tc_act/tc_tunnel_key.h |  42 ++++++
 man/man8/tc-tunnel_key.8             | 105 ++++++++++++++
 tc/Makefile                          |   1 +
 tc/m_tunnel_key.c                    | 259 +++++++++++++++++++++++++++++++++++
 4 files changed, 407 insertions(+)
 create mode 100644 include/linux/tc_act/tc_tunnel_key.h
 create mode 100644 man/man8/tc-tunnel_key.8
 create mode 100644 tc/m_tunnel_key.c

diff --git a/include/linux/tc_act/tc_tunnel_key.h b/include/linux/tc_act/tc_tunnel_key.h
new file mode 100644
index 000000000000..f9ddf5369a45
--- /dev/null
+++ b/include/linux/tc_act/tc_tunnel_key.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <amir@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_TC_TUNNEL_KEY_H
+#define __LINUX_TC_TUNNEL_KEY_H
+
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_TUNNEL_KEY 17
+
+#define TCA_TUNNEL_KEY_ACT_SET	    1
+#define TCA_TUNNEL_KEY_ACT_RELEASE  2
+
+struct tc_tunnel_key {
+	tc_gen;
+	int t_action;
+};
+
+enum {
+	TCA_TUNNEL_KEY_UNSPEC,
+	TCA_TUNNEL_KEY_TM,
+	TCA_TUNNEL_KEY_PARMS,
+	TCA_TUNNEL_KEY_ENC_IPV4_SRC,	/* be32 */
+	TCA_TUNNEL_KEY_ENC_IPV4_DST,	/* be32 */
+	TCA_TUNNEL_KEY_ENC_IPV6_SRC,	/* struct in6_addr */
+	TCA_TUNNEL_KEY_ENC_IPV6_DST,	/* struct in6_addr */
+	TCA_TUNNEL_KEY_ENC_KEY_ID,	/* be64 */
+	TCA_TUNNEL_KEY_PAD,
+	__TCA_TUNNEL_KEY_MAX,
+};
+
+#define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1)
+
+#endif
+
diff --git a/man/man8/tc-tunnel_key.8 b/man/man8/tc-tunnel_key.8
new file mode 100644
index 000000000000..c3b21e7d040e
--- /dev/null
+++ b/man/man8/tc-tunnel_key.8
@@ -0,0 +1,105 @@
+.TH "Tunnel metadata manipulation action in tc" 8 "10 Nov 2016" "iproute2" "Linux"
+
+.SH NAME
+tunnel_key - Tunnel metadata manipulation
+.SH SYNOPSIS
+.in +8
+.ti -8
+.BR tc " ... " "action tunnel_key" " { " release " | "
+.IR SET " }"
+
+.ti -8
+.IR SET " := "
+.BR set " " src_ip
+.IR ADDRESS
+.BR dst_ip
+.IR ADDRESS
+.BI id " KEY_ID"
+
+.SH DESCRIPTION
+The
+.B tunnel_key
+action allows to perform IP tunnel en- or decapsulation on a packet, reflected by
+the operation modes
+.IR RELEASE " and " SET .
+The
+.I RELEASE
+mode is simple, as no further information is required to just drop the
+metadata attached to the skb. The
+.IR SET
+mode requires the source and destination ip
+.I ADDRESS
+and the tunnel key id
+.I KEY_ID
+which will be used by the ip tunnel shared device to create the tunnel header. The
+.B tunnel_key
+action is useful only in combination with a
+.B mirred redirect
+action to a shared IP tunnel device which will use the metadata (for
+.I SET
+) and release the metadata created by it (for
+.I RELEASE
+).
+
+.SH OPTIONS
+.TP
+.B release
+Decapsulation mode, no further arguments allowed.
+.TP
+.B set
+Encapsulation mode. Requires
+.B id
+,
+.B src_ip
+and
+.B dst_ip
+options.
+.RS
+.TP
+.B id
+Tunnel ID (for example VNI in VXLAN tunnel)
+.TP
+.B src_ip
+Outer header source IP address (IPv4 or IPv6)
+.TP
+.B dst_ip
+Outer header destination IP address (IPv4 or IPv6)
+.RE
+.SH EXAMPLES
+The following example encapsulates incoming ICMP packets on eth0 into a vxlan
+tunnel by setting metadata to VNI 11, source IP 11.11.0.1 and destination IP
+11.11.0.1 by forwarding the skb with the metadata to device vxlan0, which will
+prepare the VXLAN headers:
+
+.RS
+.EX
+#tc qdisc add dev eth0 handle ffff: ingress
+#tc filter add dev eth0 protocol ip parent ffff: \\
+  flower \\
+    ip_proto icmp \\
+  action tunnel_key set \\
+    src_ip 11.11.0.1 \\
+    dst_ip 11.11.0.2 \\
+    id 11 \\
+  action mirred egress redirect dev vxlan0
+.EE
+.RE
+
+Here is an example of the
+.B release
+function: Incoming VXLAN packets on vxlan0 with specific outer IP's and VNI 11
+in the metadata are decapsulated and redirected to eth0:
+
+.RS
+.EX
+#tc qdisc add dev eth0 handle ffff: ingress
+#tc filter add dev vxlan0 protocol ip parent ffff: \
+  flower \\
+	  enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
+	action tunnel_key release \
+	action mirred egress redirect dev eth0
+.EE
+.RE
+
+.SH SEE ALSO
+.BR tc (8)
diff --git a/tc/Makefile b/tc/Makefile
index dfa875b5edaf..f6f41ca2bb3d 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -50,6 +50,7 @@ TCMODULES += m_simple.o
 TCMODULES += m_vlan.o
 TCMODULES += m_connmark.o
 TCMODULES += m_bpf.o
+TCMODULES += m_tunnel_key.o
 TCMODULES += p_ip.o
 TCMODULES += p_icmp.o
 TCMODULES += p_tcp.o
diff --git a/tc/m_tunnel_key.c b/tc/m_tunnel_key.c
new file mode 100644
index 000000000000..42f2a184bd3a
--- /dev/null
+++ b/tc/m_tunnel_key.c
@@ -0,0 +1,259 @@
+/*
+ * m_tunnel_key.c	ip tunel manipulation module
+ *
+ *              This program is free software; you can redistribute it and/or
+ *              modify it under the terms of the GNU General Public License
+ *              as published by the Free Software Foundation; either version
+ *              2 of the License, or (at your option) any later version.
+ *
+ * Authors:     Amir Vadai <amir@vadai.me>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <linux/if_ether.h>
+#include "utils.h"
+#include "rt_names.h"
+#include "tc_util.h"
+#include <linux/tc_act/tc_tunnel_key.h>
+
+static void explain(void)
+{
+	fprintf(stderr, "Usage: tunnel_key release\n");
+	fprintf(stderr, "       tunnel_key set id TUNNELID src_ip IP dst_ip IP\n");
+}
+
+static void usage(void)
+{
+	explain();
+	exit(-1);
+}
+
+static int tunnel_key_parse_ip_addr(char *str, int addr4_type, int addr6_type,
+				    struct nlmsghdr *n)
+{
+	int ret;
+	inet_prefix addr;
+
+	ret = get_addr(&addr, str, AF_UNSPEC);
+	if (ret)
+		return -1;
+
+	addattr_l(n, MAX_MSG, addr.family == AF_INET ? addr4_type : addr6_type,
+		  addr.data, addr.bytelen);
+
+	return 0;
+}
+
+static int tunnel_key_parse_key_id(char *str, int type, struct nlmsghdr *n)
+{
+	int ret;
+	__be32 key_id;
+
+	ret = get_be32(&key_id, str, 10);
+	if (ret)
+		return -1;
+
+	addattr32(n, MAX_MSG, type, key_id);
+
+	return 0;
+}
+
+static int parse_tunnel_key(struct action_util *a, int *argc_p, char ***argv_p,
+			    int tca_id, struct nlmsghdr *n)
+{
+	struct tc_tunnel_key parm = { .action = TC_ACT_PIPE };
+	char **argv = *argv_p;
+	int argc = *argc_p;
+	struct rtattr *tail;
+	int action = 0;
+	int ret;
+	int has_src_ip = 0;
+	int has_dst_ip = 0;
+	int has_key_id = 0;
+
+	if (matches(*argv, "tunnel_key") != 0)
+		return -1;
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, MAX_MSG, tca_id, NULL, 0);
+
+	NEXT_ARG();
+
+	while (argc > 0) {
+		if (matches(*argv, "release") == 0) {
+			if (action) {
+				fprintf(stderr, "unexpected \"%s\" - action already specified\n",
+					*argv);
+				explain();
+				return -1;
+			}
+			action = TCA_TUNNEL_KEY_ACT_RELEASE;
+		} else if (matches(*argv, "set") == 0) {
+			if (action) {
+				fprintf(stderr, "unexpected \"%s\" - action already specified\n",
+					*argv);
+				explain();
+				return -1;
+			}
+			action = TCA_TUNNEL_KEY_ACT_SET;
+		} else if (matches(*argv, "src_ip") == 0) {
+			NEXT_ARG();
+			ret = tunnel_key_parse_ip_addr(*argv,
+						       TCA_TUNNEL_KEY_ENC_IPV4_SRC,
+						       TCA_TUNNEL_KEY_ENC_IPV6_SRC,
+						       n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"src_ip\"\n");
+				return -1;
+			}
+			has_src_ip = 1;
+		} else if (matches(*argv, "dst_ip") == 0) {
+			NEXT_ARG();
+			ret = tunnel_key_parse_ip_addr(*argv,
+						       TCA_TUNNEL_KEY_ENC_IPV4_DST,
+						       TCA_TUNNEL_KEY_ENC_IPV6_DST,
+						       n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"dst_ip\"\n");
+				return -1;
+			}
+			has_dst_ip = 1;
+		} else if (matches(*argv, "id") == 0) {
+			NEXT_ARG();
+			ret = tunnel_key_parse_key_id(*argv, TCA_TUNNEL_KEY_ENC_KEY_ID, n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"id\"\n");
+				return -1;
+			}
+			has_key_id = 1;
+		} else if (matches(*argv, "help") == 0) {
+			usage();
+		} else {
+			break;
+		}
+		NEXT_ARG_FWD();
+	}
+
+	if (argc && !action_a2n(*argv, &parm.action, false))
+		NEXT_ARG_FWD();
+
+	if (argc) {
+		if (matches(*argv, "index") == 0) {
+			NEXT_ARG();
+			if (get_u32(&parm.index, *argv, 10)) {
+				fprintf(stderr, "tunnel_key: Illegal \"index\"\n");
+				return -1;
+			}
+
+			NEXT_ARG_FWD();
+		}
+	}
+
+	if (action == TCA_TUNNEL_KEY_ACT_SET &&
+	    (!has_src_ip || !has_dst_ip || !has_key_id)) {
+		fprintf(stderr, "set needs tunnel_key parameters\n");
+		explain();
+		return -1;
+	}
+
+	parm.t_action = action;
+	addattr_l(n, MAX_MSG, TCA_TUNNEL_KEY_PARMS, &parm, sizeof(parm));
+	tail->rta_len = (char *)NLMSG_TAIL(n) - (char *)tail;
+
+	*argc_p = argc;
+	*argv_p = argv;
+
+	return 0;
+}
+
+static void tunnel_key_print_ip_addr(FILE *f, char *name,
+				     struct rtattr *attr)
+{
+	int family;
+	size_t len;
+
+	if (!attr)
+		return;
+
+	len = RTA_PAYLOAD(attr);
+
+	if (len == 4)
+		family = AF_INET;
+	else if (len == 16)
+		family = AF_INET6;
+	else
+		return;
+
+	fprintf(f, "\n\t%s %s", name, rt_addr_n2a_rta(family, attr));
+}
+
+static void tunnel_key_print_key_id(FILE *f, char *name,
+				    struct rtattr *attr)
+{
+	if (!attr)
+		return;
+	fprintf(f, "\n\t%s %d", name, ntohl(rta_getattr_u32(attr)));
+}
+
+static int print_tunnel_key(struct action_util *au, FILE *f, struct rtattr *arg)
+{
+	struct rtattr *tb[TCA_TUNNEL_KEY_MAX + 1];
+	struct tc_tunnel_key *parm;
+
+	if (!arg)
+		return -1;
+
+	parse_rtattr_nested(tb, TCA_TUNNEL_KEY_MAX, arg);
+
+	if (!tb[TCA_TUNNEL_KEY_PARMS]) {
+		fprintf(f, "[NULL tunnel_key parameters]");
+		return -1;
+	}
+	parm = RTA_DATA(tb[TCA_TUNNEL_KEY_PARMS]);
+
+	fprintf(f, "tunnel_key");
+
+	switch (parm->t_action) {
+	case TCA_TUNNEL_KEY_ACT_RELEASE:
+		fprintf(f, " release");
+		break;
+	case TCA_TUNNEL_KEY_ACT_SET:
+		fprintf(f, " set");
+		tunnel_key_print_ip_addr(f, "src_ip",
+					 tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC]);
+		tunnel_key_print_ip_addr(f, "dst_ip",
+					 tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]);
+		tunnel_key_print_ip_addr(f, "src_ip",
+					 tb[TCA_TUNNEL_KEY_ENC_IPV6_SRC]);
+		tunnel_key_print_ip_addr(f, "dst_ip",
+					 tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]);
+		tunnel_key_print_key_id(f, "key_id",
+					tb[TCA_TUNNEL_KEY_ENC_KEY_ID]);
+		break;
+	}
+	fprintf(f, " %s", action_n2a(parm->action));
+
+	fprintf(f, "\n\tindex %d ref %d bind %d", parm->index, parm->refcnt,
+		parm->bindcnt);
+
+	if (show_stats) {
+		if (tb[TCA_TUNNEL_KEY_TM]) {
+			struct tcf_t *tm = RTA_DATA(tb[TCA_TUNNEL_KEY_TM]);
+
+			print_tm(f, tm);
+		}
+	}
+
+	fprintf(f, "\n ");
+
+	return 0;
+}
+
+struct action_util tunnel_key_action_util = {
+	.id = "tunnel_key",
+	.parse_aopt = parse_tunnel_key,
+	.print_aopt = print_tunnel_key,
+};
-- 
2.10.2

^ permalink raw reply related

* [PATCH iproute2 1/2] tc/cls_flower: Classify packet in ip tunnels
From: Amir Vadai @ 2016-11-21 10:20 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, netdev, Or Gerlitz, Hadar Har-Zion, Roi Dayan,
	Amir Vadai
In-Reply-To: <20161121102056.13468-1-amir@vadai.me>

Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.

For example, the following will add a filter on the ingress Qdisc of shared
vxlan device named 'vxlan0'. To forward packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets will be
forwarded to tap device 'vnet0' (after metadata is released):

$ tc filter add dev vxlan0 protocol ip parent ffff: \
    flower \
      enc_src_ip 11.11.0.2 \
      enc_dst_ip 11.11.0.1 \
      enc_key_id 11 \
      dst_ip 11.11.11.1 \
    action tunnel_key release \
    action mirred egress redirect dev vnet0

The action tunnel_key, will be introduced in the next patch in this
series.

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 man/man8/tc-flower.8 | 17 ++++++++++-
 tc/f_flower.c        | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 74f76647753b..0e0b0cf4bb72 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -36,7 +36,11 @@ flower \- flow based traffic control filter
 .BR dst_ip " | " src_ip " } { "
 .IR ipv4_address " | " ipv6_address " } | { "
 .BR dst_port " | " src_port " } "
-.IR port_number " }"
+.IR port_number " } | "
+.B enc_key_id
+.IR KEY-ID " | {"
+.BR enc_dst_ip " | " enc_src_ip " } { "
+.IR ipv4_address " | " ipv6_address " } | "
 .SH DESCRIPTION
 The
 .B flower
@@ -121,6 +125,17 @@ which has to be specified in beforehand.
 Match on layer 4 protocol source or destination port number. Only available for
 .BR ip_proto " values " udp " and " tcp ,
 which has to be specified in beforehand.
+.TP
+.BI enc_key_id " NUMBER"
+.TQ
+.BI enc_dst_ip " ADDRESS"
+.TQ
+.BI enc_src_ip " ADDRESS"
+Match on IP tunnel metadata. Key id
+.I NUMBER
+is a 32 bit tunnel key id (e.g. VNI for VXLAN tunnel).
+.I ADDRESS
+must be a valid IPv4 or IPv6 address.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer. Precisely, layer one and two matches (
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 2d31d1aa832d..1cf0750b5b83 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -41,7 +41,10 @@ static void explain(void)
 	fprintf(stderr, "                       dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
 	fprintf(stderr, "                       src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
 	fprintf(stderr, "                       dst_port PORT-NUMBER |\n");
-	fprintf(stderr, "                       src_port PORT-NUMBER }\n");
+	fprintf(stderr, "                       src_port PORT-NUMBER |\n");
+	fprintf(stderr, "                       enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
+	fprintf(stderr, "                       enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
+	fprintf(stderr, "                       enc_key_id [ KEY-ID ] }\n");
 	fprintf(stderr,	"       FILTERID := X:Y:Z\n");
 	fprintf(stderr,	"       ACTION-SPEC := ... look at individual actions\n");
 	fprintf(stderr,	"\n");
@@ -121,8 +124,9 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type,
 		family = AF_INET;
 	} else if (eth_type == htons(ETH_P_IPV6)) {
 		family = AF_INET6;
+	} else if (!eth_type) {
+		family = AF_UNSPEC;
 	} else {
-		fprintf(stderr, "Illegal \"eth_type\" for ip address\n");
 		return -1;
 	}
 
@@ -130,8 +134,10 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type,
 	if (ret)
 		return -1;
 
-	if (addr.family != family)
+	if (family && (addr.family != family)) {
+		fprintf(stderr, "Illegal \"eth_type\" for ip address\n");
 		return -1;
+	}
 
 	addattr_l(n, MAX_MSG, addr.family == AF_INET ? addr4_type : addr6_type,
 		  addr.data, addr.bytelen);
@@ -181,6 +187,20 @@ static int flower_parse_port(char *str, __u8 ip_port,
 	return 0;
 }
 
+static int flower_parse_key_id(char *str, int type, struct nlmsghdr *n)
+{
+	int ret;
+	__be32 key_id;
+
+	ret = get_be32(&key_id, str, 10);
+	if (ret)
+		return -1;
+
+	addattr32(n, MAX_MSG, type, key_id);
+
+	return 0;
+}
+
 static int flower_parse_opt(struct filter_util *qu, char *handle,
 			    int argc, char **argv, struct nlmsghdr *n)
 {
@@ -339,6 +359,38 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
 				fprintf(stderr, "Illegal \"src_port\"\n");
 				return -1;
 			}
+		} else if (matches(*argv, "enc_dst_ip") == 0) {
+			NEXT_ARG();
+			ret = flower_parse_ip_addr(*argv, 0,
+						   TCA_FLOWER_KEY_ENC_IPV4_DST,
+						   TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
+						   TCA_FLOWER_KEY_ENC_IPV6_DST,
+						   TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,
+						   n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"enc_dst_ip\"\n");
+				return -1;
+			}
+		} else if (matches(*argv, "enc_src_ip") == 0) {
+			NEXT_ARG();
+			ret = flower_parse_ip_addr(*argv, 0,
+						   TCA_FLOWER_KEY_ENC_IPV4_SRC,
+						   TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
+						   TCA_FLOWER_KEY_ENC_IPV6_SRC,
+						   TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,
+						   n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"enc_src_ip\"\n");
+				return -1;
+			}
+		} else if (matches(*argv, "enc_key_id") == 0) {
+			NEXT_ARG();
+			ret = flower_parse_key_id(*argv,
+						  TCA_FLOWER_KEY_ENC_KEY_ID, n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"enc_key_id\"\n");
+				return -1;
+			}
 		} else if (matches(*argv, "action") == 0) {
 			NEXT_ARG();
 			ret = parse_action(&argc, &argv, TCA_FLOWER_ACT, n);
@@ -509,6 +561,14 @@ static void flower_print_port(FILE *f, char *name, __u8 ip_proto,
 	fprintf(f, "\n  %s %d", name, ntohs(rta_getattr_u16(attr)));
 }
 
+static void flower_print_key_id(FILE *f, char *name,
+				struct rtattr *attr)
+{
+	if (!attr)
+		return;
+	fprintf(f, "\n  %s %d", name, ntohl(rta_getattr_u32(attr)));
+}
+
 static int flower_print_opt(struct filter_util *qu, FILE *f,
 			    struct rtattr *opt, __u32 handle)
 {
@@ -577,6 +637,25 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
 			  tb[TCA_FLOWER_KEY_TCP_SRC],
 			  tb[TCA_FLOWER_KEY_UDP_SRC]);
 
+	flower_print_ip_addr(f, "enc_dst_ip",
+			     tb[TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] ?
+			     htons(ETH_P_IP) : htons(ETH_P_IPV6),
+			     tb[TCA_FLOWER_KEY_ENC_IPV4_DST],
+			     tb[TCA_FLOWER_KEY_ENC_IPV4_DST_MASK],
+			     tb[TCA_FLOWER_KEY_ENC_IPV6_DST],
+			     tb[TCA_FLOWER_KEY_ENC_IPV6_DST_MASK]);
+
+	flower_print_ip_addr(f, "enc_src_ip",
+			     tb[TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] ?
+			     htons(ETH_P_IP) : htons(ETH_P_IPV6),
+			     tb[TCA_FLOWER_KEY_ENC_IPV4_SRC],
+			     tb[TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK],
+			     tb[TCA_FLOWER_KEY_ENC_IPV6_SRC],
+			     tb[TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK]);
+
+	flower_print_key_id(f, "enc_key_id",
+			    tb[TCA_FLOWER_KEY_ENC_KEY_ID]);
+
 	if (tb[TCA_FLOWER_FLAGS])  {
 		__u32 flags = rta_getattr_u32(tb[TCA_FLOWER_FLAGS]);
 
-- 
2.10.2

^ permalink raw reply related

* Re: [PATCH net 1/1] net: batman-adv: Treat NET_XMIT_CN as transmit successfully
From: Sergei Shtylyov @ 2016-11-21 10:44 UTC (permalink / raw)
  To: fgao-KlmEoCYek3zQT0dZR+AlfA, mareklindner-rVWd3aGhH2z5bpWLKbzFeg,
	sw-2YrNx6rUIHYiY0qSoAWiAoQuADTiUCJX, a,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	netdev-u79uwXL29TY76Z2rM5mHXA, gfree.wind-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <1479688779-1328-1-git-send-email-fgao-KlmEoCYek3zQT0dZR+AlfA@public.gmane.org>

Hello.

On 11/21/2016 3:39 AM, fgao-KlmEoCYek3zQT0dZR+AlfA@public.gmane.org wrote:

> From: Gao Feng <fgao-KlmEoCYek3zQT0dZR+AlfA@public.gmane.org>
>
> The tc could return NET_XMIT_CN as one congestion notification, but
> it does not mean the packe is lost. Other modules like ipvlan,

    Packet.

> macvlan, and others treat NET_XMIT_CN as success too.
>
> So batman-adv should add the NET_XMIT_CN check.
>
> Signed-off-by: Gao Feng <fgao-KlmEoCYek3zQT0dZR+AlfA@public.gmane.org>

[...]

> diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
> index 7e8dc64..8edd324 100644
> --- a/net/batman-adv/routing.c
> +++ b/net/batman-adv/routing.c
> @@ -706,7 +706,7 @@ static int batadv_route_unicast_packet(struct sk_buff *skb,
>  		goto out;
>
>  	/* translate transmit result into receive result */
> -	if (res == NET_XMIT_SUCCESS) {
> +	if (res == NET_XMIT_SUCCESS || ret == NET_XMIT_CN) {

    Not 'res == NET_XMIT_CN'?

>  		/* skb was transmitted and consumed */
>  		batadv_inc_counter(bat_priv, BATADV_CNT_FORWARD);
>  		batadv_add_counter(bat_priv, BATADV_CNT_FORWARD_BYTES,
[...]

MBR, Sergei

^ permalink raw reply

* [PATCH net v2 1/1] net: batman-adv: Treat NET_XMIT_CN as transmit successfully
From: fgao @ 2016-11-21 10:58 UTC (permalink / raw)
  To: mareklindner, sw, a, davem, b.a.t.m.a.n, netdev, gfree.wind; +Cc: Gao Feng

From: Gao Feng <fgao@ikuai8.com>

The tc could return NET_XMIT_CN as one congestion notification, but
it does not mean the packet is lost. Other modules like ipvlan,
macvlan, and others treat NET_XMIT_CN as success too.

So batman-adv should add the NET_XMIT_CN check.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
---
 v2: Correct two typo "packe" and "ret"
 v1: Initial version

 net/batman-adv/distributed-arp-table.c | 2 +-
 net/batman-adv/fragmentation.c         | 2 +-
 net/batman-adv/routing.c               | 2 +-
 net/batman-adv/tp_meter.c              | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
index e257efd..4bf0622 100644
--- a/net/batman-adv/distributed-arp-table.c
+++ b/net/batman-adv/distributed-arp-table.c
@@ -660,7 +660,7 @@ static bool batadv_dat_send_data(struct batadv_priv *bat_priv,
 		}
 
 		send_status = batadv_send_unicast_skb(tmp_skb, neigh_node);
-		if (send_status == NET_XMIT_SUCCESS) {
+		if (send_status == NET_XMIT_SUCCESS || send_status == NET_XMIT_CN) {
 			/* count the sent packet */
 			switch (packet_subtype) {
 			case BATADV_P_DAT_DHT_GET:
diff --git a/net/batman-adv/fragmentation.c b/net/batman-adv/fragmentation.c
index 0934730..4714b8f 100644
--- a/net/batman-adv/fragmentation.c
+++ b/net/batman-adv/fragmentation.c
@@ -495,7 +495,7 @@ int batadv_frag_send_packet(struct sk_buff *skb,
 		batadv_add_counter(bat_priv, BATADV_CNT_FRAG_TX_BYTES,
 				   skb_fragment->len + ETH_HLEN);
 		ret = batadv_send_unicast_skb(skb_fragment, neigh_node);
-		if (ret != NET_XMIT_SUCCESS) {
+		if (ret != NET_XMIT_SUCCESS && ret != NET_XMIT_CN) {
 			/* return -1 so that the caller can free the original
 			 * skb
 			 */
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index 7e8dc64..f44fb07 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -706,7 +706,7 @@ static int batadv_route_unicast_packet(struct sk_buff *skb,
 		goto out;
 
 	/* translate transmit result into receive result */
-	if (res == NET_XMIT_SUCCESS) {
+	if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN) {
 		/* skb was transmitted and consumed */
 		batadv_inc_counter(bat_priv, BATADV_CNT_FORWARD);
 		batadv_add_counter(bat_priv, BATADV_CNT_FORWARD_BYTES,
diff --git a/net/batman-adv/tp_meter.c b/net/batman-adv/tp_meter.c
index 8af1611..461dbad 100644
--- a/net/batman-adv/tp_meter.c
+++ b/net/batman-adv/tp_meter.c
@@ -618,7 +618,7 @@ static int batadv_tp_send_msg(struct batadv_tp_vars *tp_vars, const u8 *src,
 	if (r == -1)
 		kfree_skb(skb);
 
-	if (r == NET_XMIT_SUCCESS)
+	if (r == NET_XMIT_SUCCESS || r == NET_XMIT_CN)
 		return 0;
 
 	return BATADV_TP_REASON_CANT_SEND;
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net 1/1] net: batman-adv: Treat NET_XMIT_CN as transmit successfully
From: Feng Gao @ 2016-11-21 11:01 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: mareklindner, sw, a, David S. Miller, b.a.t.m.a.n,
	Linux Kernel Network Developers
In-Reply-To: <850ff61e-8f94-9316-ed1f-c7d3e8faf95b@cogentembedded.com>

Hi Sergei,

On Mon, Nov 21, 2016 at 6:44 PM, Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
> Hello.
>
> On 11/21/2016 3:39 AM, fgao@ikuai8.com wrote:
>
>> From: Gao Feng <fgao@ikuai8.com>
>>
>> The tc could return NET_XMIT_CN as one congestion notification, but
>> it does not mean the packe is lost. Other modules like ipvlan,
>
>
>    Packet.

Thanks, it was typo.

>
>> macvlan, and others treat NET_XMIT_CN as success too.
>>
>> So batman-adv should add the NET_XMIT_CN check.
>>
>> Signed-off-by: Gao Feng <fgao@ikuai8.com>
>
>
> [...]
>
>> diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
>> index 7e8dc64..8edd324 100644
>> --- a/net/batman-adv/routing.c
>> +++ b/net/batman-adv/routing.c
>> @@ -706,7 +706,7 @@ static int batadv_route_unicast_packet(struct sk_buff
>> *skb,
>>                 goto out;
>>
>>         /* translate transmit result into receive result */
>> -       if (res == NET_XMIT_SUCCESS) {
>> +       if (res == NET_XMIT_SUCCESS || ret == NET_XMIT_CN) {

Thanks again.
I didn't find it during myself's review and compile process.

>
>
>    Not 'res == NET_XMIT_CN'?
>
>>                 /* skb was transmitted and consumed */
>>                 batadv_inc_counter(bat_priv, BATADV_CNT_FORWARD);
>>                 batadv_add_counter(bat_priv, BATADV_CNT_FORWARD_BYTES,
>
> [...]
>
> MBR, Sergei
>

I have sent the v2 patch which corrects these two typos.

Best Regards
Feng

^ permalink raw reply

* Re: [PATCH net v2 1/1] net: batman-adv: Treat NET_XMIT_CN as transmit successfully
From: Florian Westphal @ 2016-11-21 11:09 UTC (permalink / raw)
  To: fgao; +Cc: mareklindner, sw, a, davem, b.a.t.m.a.n, netdev, gfree.wind
In-Reply-To: <1479725922-5112-1-git-send-email-fgao@ikuai8.com>

fgao@ikuai8.com <fgao@ikuai8.com> wrote:
> From: Gao Feng <fgao@ikuai8.com>
> 
> The tc could return NET_XMIT_CN as one congestion notification, but
> it does not mean the packet is lost. Other modules like ipvlan,
> macvlan, and others treat NET_XMIT_CN as success too.
> 
> So batman-adv should add the NET_XMIT_CN check.

"The tc could return NET_XMIT_CN as one congestion notification, but
it means another packet got dropped. Other modules like batman do not
treat NET_XMIT_CN as success, so modules like ipvlan, macvlan, ..
should ignore it as well."

What I am asking is:
Are you sure adding NET_XMIT_CN handling everywhere is the right way to
resolve the inconsistency?

^ permalink raw reply

* Re: [PATCH] mlxsw: switchib:  add MLXSW_PCI dependency
From: Jiri Pirko @ 2016-11-21 11:20 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jiri Pirko, Ido Schimmel, David S. Miller, Ivan Vecera, Elad Raz,
	netdev, linux-kernel
In-Reply-To: <20161118160127.473555-1-arnd@arndb.de>

Fri, Nov 18, 2016 at 05:01:14PM CET, arnd@arndb.de wrote:
>The newly added switchib driver fails to link if MLXSW_PCI=m:
>
>drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function^Cmlxsw_sib_module_exit':
>switchib.c:(.exit.text+0x8): undefined reference to `mlxsw_pci_driver_unregister'
>switchib.c:(.exit.text+0x10): undefined reference to `mlxsw_pci_driver_unregister'
>drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function `mlxsw_sib_module_init':
>switchib.c:(.init.text+0x28): undefined reference to `mlxsw_pci_driver_register'
>switchib.c:(.init.text+0x38): undefined reference to `mlxsw_pci_driver_register'
>switchib.c:(.init.text+0x48): undefined reference to `mlxsw_pci_driver_unregister'
>
>The other two such sub-drivers have a dependency, so add the same one
>here. In theory we could allow this driver if MLXSW_PCI is disabled,
>but it's probably not worth it.
>
>Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Thanks.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* ip -s macsec show: Statistics for OutOctets... and OutPkts... switched
From: Daniel.Hopf @ 2016-11-21 11:31 UTC (permalink / raw)
  To: netdev

Dear community,

I'm using the following ip utility:

ubuntu2@ubuntu2:~$ ip -V
ip utility, iproute2-ss161009
ubuntu2@ubuntu2:~$ uname -r
4.8.0-22-generic

During tests with the recent MACsec implementation I noticed that the 
statistics for OutOctets[Protected|Encrypted] and 
OutPkts[Protected|Encrypted] are switched.

The error seems to reside in lines 637:640 of iproute2/ip/ipmacsec.c:
        [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED] = 
"OutOctetsProtected",
        [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED] = 
"OutOctetsEncrypted",
        [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED] = 
"OutPktsProtected",
        [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_ENCRYPTED] = 
"OutPktsEncrypted",

In my opinion this should instead read:
        [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED] = "OutPktsProtected",
        [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED] = "OutPktsEncrypted",
        [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED] = 
"OutOctetsProtected",
        [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_ENCRYPTED] = 
"OutOctetsEncrypted",

Regards
- Daniel

^ permalink raw reply

* Re: [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues
From: Daniel Borkmann @ 2016-11-21 11:45 UTC (permalink / raw)
  To: John Fastabend, eric.dumazet, mst, kubakici, shm, davem,
	alexei.starovoitov
  Cc: netdev, bblanco, john.r.fastabend, brouer, tgraf
In-Reply-To: <20161120025104.19187.54400.stgit@john-Precision-Tower-5810>

On 11/20/2016 03:51 AM, John Fastabend wrote:
> XDP requires using isolated transmit queues to avoid interference
> with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
> adds a XDP queue per cpu when a XDP program is loaded and does not
> expose the queues to the OS via the normal API call to
> netif_set_real_num_tx_queues(). This way the stack will never push
> an skb to these queues.
>
> However virtio/vhost/qemu implementation only allows for creating
> TX/RX queue pairs at this time so creating only TX queues was not
> possible. And because the associated RX queues are being created I
> went ahead and exposed these to the stack and let the backend use
> them. This creates more RX queues visible to the network stack than
> TX queues which is worth mentioning but does not cause any issues as
> far as I can tell.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>   drivers/net/virtio_net.c |   32 +++++++++++++++++++++++++++++---
>   1 file changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8f99a53..80a426c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -114,6 +114,9 @@ struct virtnet_info {
>   	/* # of queue pairs currently used by the driver */
>   	u16 curr_queue_pairs;
>
> +	/* # of XDP queue pairs currently used by the driver */
> +	u16 xdp_queue_pairs;
> +
>   	/* I like... big packets and I cannot lie! */
>   	bool big_packets;
>
> @@ -1525,7 +1528,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>   {
>   	struct virtnet_info *vi = netdev_priv(dev);
>   	struct bpf_prog *old_prog;
> -	int i;
> +	u16 xdp_qp = 0, curr_qp;
> +	int err, i;
>
>   	if ((dev->features & NETIF_F_LRO) && prog) {
>   		netdev_warn(dev, "can't set XDP while LRO is on, disable LRO first\n");
> @@ -1542,12 +1546,34 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>   		return -EINVAL;
>   	}
>
> +	curr_qp = vi->curr_queue_pairs - vi->xdp_queue_pairs;
> +	if (prog)
> +		xdp_qp = nr_cpu_ids;
> +
> +	/* XDP requires extra queues for XDP_TX */
> +	if (curr_qp + xdp_qp > vi->max_queue_pairs) {
> +		netdev_warn(dev, "request %i queues but max is %i\n",
> +			    curr_qp + xdp_qp, vi->max_queue_pairs);
> +		return -ENOMEM;
> +	}
> +
> +	err = virtnet_set_queues(vi, curr_qp + xdp_qp);
> +	if (err) {
> +		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
> +		return err;
> +	}
> +
>   	if (prog) {
> -		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
> -		if (IS_ERR(prog))
> +		prog = bpf_prog_add(prog, vi->max_queue_pairs);

I think this change is not correct, it would be off by one now.
The previous 'vi->max_queue_pairs - 1' was actually correct here.
dev_change_xdp_fd() already gives you a reference (see the doc on
enum xdp_netdev_command in netdevice.h).

> +		if (IS_ERR(prog)) {
> +			virtnet_set_queues(vi, curr_qp);
>   			return PTR_ERR(prog);
> +		}
>   	}
>
> +	vi->xdp_queue_pairs = xdp_qp;
> +	netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
> +
>   	for (i = 0; i < vi->max_queue_pairs; i++) {
>   		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
>   		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
>

^ permalink raw reply

* [PATCH net-next 0/2] bridge: add support for IGMPv3 and MLDv2 querier
From: Nikolay Aleksandrov @ 2016-11-21 12:03 UTC (permalink / raw)
  To: netdev; +Cc: roopa, sashok, stephen, davem, liuhangbin, Nikolay Aleksandrov

Hi all,
This patch-set adds support for IGMPv3 and MLDv2 querier in the bridge.
Two new options which can be toggled via netlink and sysfs are added that
control the version per-bridge:
 multicast_igmp_version - default 2, can be set to 3
 multicast_mld_version - default 1, can be set to 2 (this option is
                         disabled if CONFIG_IPV6=n)

Note that the names do not include "querier", I think that these options
can be re-used later as more IGMPv3 support is added to the bridge so we
can avoid adding more options to switch between v2 and v3 behaviour.

The set uses the already existing br_ip{4,6}_multicast_alloc_query
functions and adds the appropriate header based on the chosen version.

For the initial support I have removed the compatibility implementation
(RFC3376 sec 7.3.1, 7.3.2; RFC3810 sec 8.3.1, 8.3.2), because there are
some details that we need to sort out.

Thank you,
 Nik


Nikolay Aleksandrov (2):
  bridge: mcast: add IGMPv3 query support
  bridge: mcast: add MLDv2 querier support

 include/uapi/linux/if_link.h |   2 +
 net/bridge/br_multicast.c    | 166 +++++++++++++++++++++++++++++++++----------
 net/bridge/br_netlink.c      |  34 ++++++++-
 net/bridge/br_private.h      |   7 ++
 net/bridge/br_sysfs_br.c     |  40 +++++++++++
 5 files changed, 210 insertions(+), 39 deletions(-)

-- 
2.1.4

^ permalink raw reply

* [PATCH net-next 1/2] bridge: mcast: add IGMPv3 query support
From: Nikolay Aleksandrov @ 2016-11-21 12:03 UTC (permalink / raw)
  To: netdev; +Cc: roopa, sashok, stephen, davem, liuhangbin, Nikolay Aleksandrov
In-Reply-To: <1479729805-23108-1-git-send-email-nikolay@cumulusnetworks.com>

This patch adds basic support for IGMPv3 queries, the default is IGMPv2
as before. A new multicast option - multicast_igmp_version, adds the
ability to change it between 2 and 3 via netlink and sysfs. The option
struct member is in a 4 byte hole in net_bridge.

There also a few minor style adjustments in br_multicast_new_group and
br_multicast_add_group.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 include/uapi/linux/if_link.h |  1 +
 net/bridge/br_multicast.c    | 79 ++++++++++++++++++++++++++++++++++----------
 net/bridge/br_netlink.c      | 15 ++++++++-
 net/bridge/br_private.h      |  3 ++
 net/bridge/br_sysfs_br.c     | 18 ++++++++++
 5 files changed, 98 insertions(+), 18 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index b4fba662cd32..325d2601150d 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -275,6 +275,7 @@ enum {
 	IFLA_BR_PAD,
 	IFLA_BR_VLAN_STATS_ENABLED,
 	IFLA_BR_MCAST_STATS_ENABLED,
+	IFLA_BR_MCAST_IGMP_VERSION,
 	__IFLA_BR_MAX,
 };
 
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 073d54afa056..66192c11aa45 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -365,13 +365,18 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
 						    __be32 group,
 						    u8 *igmp_type)
 {
+	struct igmpv3_query *ihv3;
+	size_t igmp_hdr_size;
 	struct sk_buff *skb;
 	struct igmphdr *ih;
 	struct ethhdr *eth;
 	struct iphdr *iph;
 
+	igmp_hdr_size = sizeof(*ih);
+	if (br->multicast_igmp_version == 3)
+		igmp_hdr_size = sizeof(*ihv3);
 	skb = netdev_alloc_skb_ip_align(br->dev, sizeof(*eth) + sizeof(*iph) +
-						 sizeof(*ih) + 4);
+						 igmp_hdr_size + 4);
 	if (!skb)
 		goto out;
 
@@ -396,7 +401,7 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
 	iph->version = 4;
 	iph->ihl = 6;
 	iph->tos = 0xc0;
-	iph->tot_len = htons(sizeof(*iph) + sizeof(*ih) + 4);
+	iph->tot_len = htons(sizeof(*iph) + igmp_hdr_size + 4);
 	iph->id = 0;
 	iph->frag_off = htons(IP_DF);
 	iph->ttl = 1;
@@ -412,17 +417,37 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
 	skb_put(skb, 24);
 
 	skb_set_transport_header(skb, skb->len);
-	ih = igmp_hdr(skb);
 	*igmp_type = IGMP_HOST_MEMBERSHIP_QUERY;
-	ih->type = IGMP_HOST_MEMBERSHIP_QUERY;
-	ih->code = (group ? br->multicast_last_member_interval :
-			    br->multicast_query_response_interval) /
-		   (HZ / IGMP_TIMER_SCALE);
-	ih->group = group;
-	ih->csum = 0;
-	ih->csum = ip_compute_csum((void *)ih, sizeof(struct igmphdr));
-	skb_put(skb, sizeof(*ih));
 
+	switch (br->multicast_igmp_version) {
+	case 2:
+		ih = igmp_hdr(skb);
+		ih->type = IGMP_HOST_MEMBERSHIP_QUERY;
+		ih->code = (group ? br->multicast_last_member_interval :
+				    br->multicast_query_response_interval) /
+			   (HZ / IGMP_TIMER_SCALE);
+		ih->group = group;
+		ih->csum = 0;
+		ih->csum = ip_compute_csum((void *)ih, sizeof(*ih));
+		break;
+	case 3:
+		ihv3 = igmpv3_query_hdr(skb);
+		ihv3->type = IGMP_HOST_MEMBERSHIP_QUERY;
+		ihv3->code = (group ? br->multicast_last_member_interval :
+				      br->multicast_query_response_interval) /
+			     (HZ / IGMP_TIMER_SCALE);
+		ihv3->group = group;
+		ihv3->qqic = br->multicast_query_interval / HZ;
+		ihv3->nsrcs = 0;
+		ihv3->resv = 0;
+		ihv3->suppress = 0;
+		ihv3->qrv = 2;
+		ihv3->csum = 0;
+		ihv3->csum = ip_compute_csum((void *)ihv3, sizeof(*ihv3));
+		break;
+	}
+
+	skb_put(skb, igmp_hdr_size);
 	__skb_pull(skb, sizeof(*eth));
 
 out:
@@ -608,7 +633,8 @@ static struct net_bridge_mdb_entry *br_multicast_get_group(
 }
 
 struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
-	struct net_bridge_port *port, struct br_ip *group)
+						    struct net_bridge_port *p,
+						    struct br_ip *group)
 {
 	struct net_bridge_mdb_htable *mdb;
 	struct net_bridge_mdb_entry *mp;
@@ -624,7 +650,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
 	}
 
 	hash = br_ip_hash(mdb, group);
-	mp = br_multicast_get_group(br, port, group, hash);
+	mp = br_multicast_get_group(br, p, group, hash);
 	switch (PTR_ERR(mp)) {
 	case 0:
 		break;
@@ -681,9 +707,9 @@ static int br_multicast_add_group(struct net_bridge *br,
 				  struct net_bridge_port *port,
 				  struct br_ip *group)
 {
-	struct net_bridge_mdb_entry *mp;
-	struct net_bridge_port_group *p;
 	struct net_bridge_port_group __rcu **pp;
+	struct net_bridge_port_group *p;
+	struct net_bridge_mdb_entry *mp;
 	unsigned long now = jiffies;
 	int err;
 
@@ -861,9 +887,9 @@ static void br_multicast_send_query(struct net_bridge *br,
 				    struct net_bridge_port *port,
 				    struct bridge_mcast_own_query *own_query)
 {
-	unsigned long time;
-	struct br_ip br_group;
 	struct bridge_mcast_other_query *other_query = NULL;
+	struct br_ip br_group;
+	unsigned long time;
 
 	if (!netif_running(br->dev) || br->multicast_disabled ||
 	    !br->multicast_querier)
@@ -1816,6 +1842,7 @@ void br_multicast_init(struct net_bridge *br)
 	br->hash_elasticity = 4;
 	br->hash_max = 512;
 
+	br->multicast_igmp_version = 2;
 	br->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
 	br->multicast_querier = 0;
 	br->multicast_query_use_ifaddr = 0;
@@ -2132,6 +2159,24 @@ int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val)
 	return err;
 }
 
+int br_multicast_set_igmp_version(struct net_bridge *br, unsigned long val)
+{
+	/* Currently we support only version 2 and 3 */
+	switch (val) {
+	case 2:
+	case 3:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	spin_lock_bh(&br->multicast_lock);
+	br->multicast_igmp_version = val;
+	spin_unlock_bh(&br->multicast_lock);
+
+	return 0;
+}
+
 /**
  * br_multicast_list_adjacent - Returns snooped multicast addresses
  * @dev:	The bridge port adjacent to which to retrieve addresses
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index e99037c6f7b7..10b9b80f778f 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -858,6 +858,7 @@ static const struct nla_policy br_policy[IFLA_BR_MAX + 1] = {
 	[IFLA_BR_VLAN_DEFAULT_PVID] = { .type = NLA_U16 },
 	[IFLA_BR_VLAN_STATS_ENABLED] = { .type = NLA_U8 },
 	[IFLA_BR_MCAST_STATS_ENABLED] = { .type = NLA_U8 },
+	[IFLA_BR_MCAST_IGMP_VERSION] = { .type = NLA_U8 },
 };
 
 static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
@@ -1069,6 +1070,15 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
 		mcast_stats = nla_get_u8(data[IFLA_BR_MCAST_STATS_ENABLED]);
 		br->multicast_stats_enabled = !!mcast_stats;
 	}
+
+	if (data[IFLA_BR_MCAST_IGMP_VERSION]) {
+		__u8 igmp_version;
+
+		igmp_version = nla_get_u8(data[IFLA_BR_MCAST_IGMP_VERSION]);
+		err = br_multicast_set_igmp_version(br, igmp_version);
+		if (err)
+			return err;
+	}
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	if (data[IFLA_BR_NF_CALL_IPTABLES]) {
@@ -1135,6 +1145,7 @@ static size_t br_get_size(const struct net_device *brdev)
 	       nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_MCAST_QUERY_INTVL */
 	       nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_MCAST_QUERY_RESPONSE_INTVL */
 	       nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_MCAST_STARTUP_QUERY_INTVL */
+	       nla_total_size(sizeof(u8)) +	/* IFLA_BR_MCAST_IGMP_VERSION */
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	       nla_total_size(sizeof(u8)) +     /* IFLA_BR_NF_CALL_IPTABLES */
@@ -1210,7 +1221,9 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
 	    nla_put_u32(skb, IFLA_BR_MCAST_LAST_MEMBER_CNT,
 			br->multicast_last_member_count) ||
 	    nla_put_u32(skb, IFLA_BR_MCAST_STARTUP_QUERY_CNT,
-			br->multicast_startup_query_count))
+			br->multicast_startup_query_count) ||
+	    nla_put_u8(skb, IFLA_BR_MCAST_IGMP_VERSION,
+		       br->multicast_igmp_version))
 		return -EMSGSIZE;
 
 	clockval = jiffies_to_clock_t(br->multicast_last_member_interval);
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 1b63177e0ccd..3d207d92d899 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -333,6 +333,8 @@ struct net_bridge
 	u32				multicast_last_member_count;
 	u32				multicast_startup_query_count;
 
+	u8				multicast_igmp_version;
+
 	unsigned long			multicast_last_member_interval;
 	unsigned long			multicast_membership_interval;
 	unsigned long			multicast_querier_interval;
@@ -582,6 +584,7 @@ int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val);
 int br_multicast_toggle(struct net_bridge *br, unsigned long val);
 int br_multicast_set_querier(struct net_bridge *br, unsigned long val);
 int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val);
+int br_multicast_set_igmp_version(struct net_bridge *br, unsigned long val);
 struct net_bridge_mdb_entry *
 br_mdb_ip_get(struct net_bridge_mdb_htable *mdb, struct br_ip *dst);
 struct net_bridge_mdb_entry *
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index e120307c6e36..f00d1690658c 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -440,6 +440,23 @@ static ssize_t hash_max_store(struct device *d, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RW(hash_max);
 
+static ssize_t multicast_igmp_version_show(struct device *d,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+
+	return sprintf(buf, "%u\n", br->multicast_igmp_version);
+}
+
+static ssize_t multicast_igmp_version_store(struct device *d,
+					    struct device_attribute *attr,
+					    const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, br_multicast_set_igmp_version);
+}
+static DEVICE_ATTR_RW(multicast_igmp_version);
+
 static ssize_t multicast_last_member_count_show(struct device *d,
 						struct device_attribute *attr,
 						char *buf)
@@ -809,6 +826,7 @@ static struct attribute *bridge_attrs[] = {
 	&dev_attr_multicast_query_response_interval.attr,
 	&dev_attr_multicast_startup_query_interval.attr,
 	&dev_attr_multicast_stats_enabled.attr,
+	&dev_attr_multicast_igmp_version.attr,
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	&dev_attr_nf_call_iptables.attr,
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next 2/2] bridge: mcast: add MLDv2 querier support
From: Nikolay Aleksandrov @ 2016-11-21 12:03 UTC (permalink / raw)
  To: netdev; +Cc: roopa, sashok, stephen, davem, liuhangbin, Nikolay Aleksandrov
In-Reply-To: <1479729805-23108-1-git-send-email-nikolay@cumulusnetworks.com>

This patch adds basic support for MLDv2 queries, the default is MLDv1
as before. A new multicast option - multicast_mld_version, adds the
ability to change it between 1 and 2 via netlink and sysfs.
The MLD option is disabled if CONFIG_IPV6 is disabled.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 include/uapi/linux/if_link.h |  1 +
 net/bridge/br_multicast.c    | 89 +++++++++++++++++++++++++++++++++-----------
 net/bridge/br_netlink.c      | 19 +++++++++-
 net/bridge/br_private.h      |  4 ++
 net/bridge/br_sysfs_br.c     | 22 +++++++++++
 5 files changed, 113 insertions(+), 22 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 325d2601150d..92b2d4928bf1 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -276,6 +276,7 @@ enum {
 	IFLA_BR_VLAN_STATS_ENABLED,
 	IFLA_BR_MCAST_STATS_ENABLED,
 	IFLA_BR_MCAST_IGMP_VERSION,
+	IFLA_BR_MCAST_MLD_VERSION,
 	__IFLA_BR_MAX,
 };
 
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 66192c11aa45..b30e77e8427c 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -459,15 +459,20 @@ static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
 						    const struct in6_addr *grp,
 						    u8 *igmp_type)
 {
-	struct sk_buff *skb;
+	struct mld2_query *mld2q;
+	unsigned long interval;
 	struct ipv6hdr *ip6h;
 	struct mld_msg *mldq;
+	size_t mld_hdr_size;
+	struct sk_buff *skb;
 	struct ethhdr *eth;
 	u8 *hopopt;
-	unsigned long interval;
 
+	mld_hdr_size = sizeof(*mldq);
+	if (br->multicast_mld_version == 2)
+		mld_hdr_size = sizeof(*mld2q);
 	skb = netdev_alloc_skb_ip_align(br->dev, sizeof(*eth) + sizeof(*ip6h) +
-						 8 + sizeof(*mldq));
+						 8 + mld_hdr_size);
 	if (!skb)
 		goto out;
 
@@ -486,7 +491,7 @@ static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
 	ip6h = ipv6_hdr(skb);
 
 	*(__force __be32 *)ip6h = htonl(0x60000000);
-	ip6h->payload_len = htons(8 + sizeof(*mldq));
+	ip6h->payload_len = htons(8 + mld_hdr_size);
 	ip6h->nexthdr = IPPROTO_HOPOPTS;
 	ip6h->hop_limit = 1;
 	ipv6_addr_set(&ip6h->daddr, htonl(0xff020000), 0, 0, htonl(1));
@@ -514,26 +519,47 @@ static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
 
 	/* ICMPv6 */
 	skb_set_transport_header(skb, skb->len);
-	mldq = (struct mld_msg *) icmp6_hdr(skb);
-
 	interval = ipv6_addr_any(grp) ?
 			br->multicast_query_response_interval :
 			br->multicast_last_member_interval;
-
 	*igmp_type = ICMPV6_MGM_QUERY;
-	mldq->mld_type = ICMPV6_MGM_QUERY;
-	mldq->mld_code = 0;
-	mldq->mld_cksum = 0;
-	mldq->mld_maxdelay = htons((u16)jiffies_to_msecs(interval));
-	mldq->mld_reserved = 0;
-	mldq->mld_mca = *grp;
-
-	/* checksum */
-	mldq->mld_cksum = csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr,
-					  sizeof(*mldq), IPPROTO_ICMPV6,
-					  csum_partial(mldq,
-						       sizeof(*mldq), 0));
-	skb_put(skb, sizeof(*mldq));
+	switch (br->multicast_mld_version) {
+	case 1:
+		mldq = (struct mld_msg *)icmp6_hdr(skb);
+		mldq->mld_type = ICMPV6_MGM_QUERY;
+		mldq->mld_code = 0;
+		mldq->mld_cksum = 0;
+		mldq->mld_maxdelay = htons((u16)jiffies_to_msecs(interval));
+		mldq->mld_reserved = 0;
+		mldq->mld_mca = *grp;
+		mldq->mld_cksum = csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr,
+						  sizeof(*mldq), IPPROTO_ICMPV6,
+						  csum_partial(mldq,
+							       sizeof(*mldq),
+							       0));
+		break;
+	case 2:
+		mld2q = (struct mld2_query *)icmp6_hdr(skb);
+		mld2q->mld2q_mrc = ntohs((u16)jiffies_to_msecs(interval));
+		mld2q->mld2q_type = ICMPV6_MGM_QUERY;
+		mld2q->mld2q_code = 0;
+		mld2q->mld2q_cksum = 0;
+		mld2q->mld2q_resv1 = 0;
+		mld2q->mld2q_resv2 = 0;
+		mld2q->mld2q_suppress = 0;
+		mld2q->mld2q_qrv = 2;
+		mld2q->mld2q_nsrcs = 0;
+		mld2q->mld2q_qqic = br->multicast_query_interval / HZ;
+		mld2q->mld2q_mca = *grp;
+		mld2q->mld2q_cksum = csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr,
+						     sizeof(*mld2q),
+						     IPPROTO_ICMPV6,
+						     csum_partial(mld2q,
+								  sizeof(*mld2q),
+								  0));
+		break;
+	}
+	skb_put(skb, mld_hdr_size);
 
 	__skb_pull(skb, sizeof(*eth));
 
@@ -1842,7 +1868,6 @@ void br_multicast_init(struct net_bridge *br)
 	br->hash_elasticity = 4;
 	br->hash_max = 512;
 
-	br->multicast_igmp_version = 2;
 	br->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
 	br->multicast_querier = 0;
 	br->multicast_query_use_ifaddr = 0;
@@ -1858,7 +1883,9 @@ void br_multicast_init(struct net_bridge *br)
 
 	br->ip4_other_query.delay_time = 0;
 	br->ip4_querier.port = NULL;
+	br->multicast_igmp_version = 2;
 #if IS_ENABLED(CONFIG_IPV6)
+	br->multicast_mld_version = 1;
 	br->ip6_other_query.delay_time = 0;
 	br->ip6_querier.port = NULL;
 #endif
@@ -2177,6 +2204,26 @@ int br_multicast_set_igmp_version(struct net_bridge *br, unsigned long val)
 	return 0;
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+int br_multicast_set_mld_version(struct net_bridge *br, unsigned long val)
+{
+	/* Currently we support version 1 and 2 */
+	switch (val) {
+	case 1:
+	case 2:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	spin_lock_bh(&br->multicast_lock);
+	br->multicast_mld_version = val;
+	spin_unlock_bh(&br->multicast_lock);
+
+	return 0;
+}
+#endif
+
 /**
  * br_multicast_list_adjacent - Returns snooped multicast addresses
  * @dev:	The bridge port adjacent to which to retrieve addresses
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 10b9b80f778f..71c7453268c1 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -859,6 +859,7 @@ static const struct nla_policy br_policy[IFLA_BR_MAX + 1] = {
 	[IFLA_BR_VLAN_STATS_ENABLED] = { .type = NLA_U8 },
 	[IFLA_BR_MCAST_STATS_ENABLED] = { .type = NLA_U8 },
 	[IFLA_BR_MCAST_IGMP_VERSION] = { .type = NLA_U8 },
+	[IFLA_BR_MCAST_MLD_VERSION] = { .type = NLA_U8 },
 };
 
 static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
@@ -1079,6 +1080,17 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
 		if (err)
 			return err;
 	}
+
+#if IS_ENABLED(CONFIG_IPV6)
+	if (data[IFLA_BR_MCAST_MLD_VERSION]) {
+		__u8 mld_version;
+
+		mld_version = nla_get_u8(data[IFLA_BR_MCAST_MLD_VERSION]);
+		err = br_multicast_set_mld_version(br, mld_version);
+		if (err)
+			return err;
+	}
+#endif
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	if (data[IFLA_BR_NF_CALL_IPTABLES]) {
@@ -1146,6 +1158,7 @@ static size_t br_get_size(const struct net_device *brdev)
 	       nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_MCAST_QUERY_RESPONSE_INTVL */
 	       nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_MCAST_STARTUP_QUERY_INTVL */
 	       nla_total_size(sizeof(u8)) +	/* IFLA_BR_MCAST_IGMP_VERSION */
+	       nla_total_size(sizeof(u8)) +	/* IFLA_BR_MCAST_MLD_VERSION */
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	       nla_total_size(sizeof(u8)) +     /* IFLA_BR_NF_CALL_IPTABLES */
@@ -1225,7 +1238,11 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
 	    nla_put_u8(skb, IFLA_BR_MCAST_IGMP_VERSION,
 		       br->multicast_igmp_version))
 		return -EMSGSIZE;
-
+#if IS_ENABLED(CONFIG_IPV6)
+	if (nla_put_u8(skb, IFLA_BR_MCAST_MLD_VERSION,
+		       br->multicast_mld_version))
+		return -EMSGSIZE;
+#endif
 	clockval = jiffies_to_clock_t(br->multicast_last_member_interval);
 	if (nla_put_u64_64bit(skb, IFLA_BR_MCAST_LAST_MEMBER_INTVL, clockval,
 			      IFLA_BR_PAD))
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 3d207d92d899..26aec2366bc3 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -355,6 +355,7 @@ struct net_bridge
 	struct bridge_mcast_other_query	ip6_other_query;
 	struct bridge_mcast_own_query	ip6_own_query;
 	struct bridge_mcast_querier	ip6_querier;
+	u8				multicast_mld_version;
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 #endif
 
@@ -585,6 +586,9 @@ int br_multicast_toggle(struct net_bridge *br, unsigned long val);
 int br_multicast_set_querier(struct net_bridge *br, unsigned long val);
 int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val);
 int br_multicast_set_igmp_version(struct net_bridge *br, unsigned long val);
+#if IS_ENABLED(CONFIG_IPV6)
+int br_multicast_set_mld_version(struct net_bridge *br, unsigned long val);
+#endif
 struct net_bridge_mdb_entry *
 br_mdb_ip_get(struct net_bridge_mdb_htable *mdb, struct br_ip *dst);
 struct net_bridge_mdb_entry *
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index f00d1690658c..c9d2e0abfb89 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -659,6 +659,25 @@ static ssize_t multicast_stats_enabled_store(struct device *d,
 	return store_bridge_parm(d, buf, len, set_stats_enabled);
 }
 static DEVICE_ATTR_RW(multicast_stats_enabled);
+
+#if IS_ENABLED(CONFIG_IPV6)
+static ssize_t multicast_mld_version_show(struct device *d,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+
+	return sprintf(buf, "%u\n", br->multicast_mld_version);
+}
+
+static ssize_t multicast_mld_version_store(struct device *d,
+					   struct device_attribute *attr,
+					   const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, br_multicast_set_mld_version);
+}
+static DEVICE_ATTR_RW(multicast_mld_version);
+#endif
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 static ssize_t nf_call_iptables_show(
@@ -827,6 +846,9 @@ static struct attribute *bridge_attrs[] = {
 	&dev_attr_multicast_startup_query_interval.attr,
 	&dev_attr_multicast_stats_enabled.attr,
 	&dev_attr_multicast_igmp_version.attr,
+#if IS_ENABLED(CONFIG_IPV6)
+	&dev_attr_multicast_mld_version.attr,
+#endif
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	&dev_attr_nf_call_iptables.attr,
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH v9 0/8] thunderbolt: Introducing Thunderbolt(TM) Networking
From: gregkh @ 2016-11-21 12:22 UTC (permalink / raw)
  To: Levy, Amir (Jer)
  Cc: Simon Guinot, andreas.noever@gmail.com, bhelgaas@google.com,
	corbet@lwn.net, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org, netdev@vger.kernel.org,
	linux-doc@vger.kernel.org, mario_limonciello@dell.com,
	thunderbolt-linux, Westerberg, Mika, Winkler, Tomas,
	Zhang, Xiong Y, Jamet, Michael, remi.rerolle@seagate.com
In-Reply-To: <E607265CB020454880711A6F96C05A03BE2214E2@hasmsx107.ger.corp.intel.com>

On Sun, Nov 20, 2016 at 06:30:19AM +0000, Levy, Amir (Jer) wrote:
> On Fri, Nov 18 2016, 12:07 PM, gregkh@linuxfoundation.org wrote:
> > On Fri, Nov 18, 2016 at 08:48:36AM +0000, Levy, Amir (Jer) wrote:
> > > > BTW, it is quite a shame that the Thunderbolt firmware version can't
> > > > be read from Linux.
> > > >
> > >
> > > This is WIP, once this patch will be upstream, we will be able to
> > > focus more on aligning Linux with the Thunderbolt features that we have
> > for windows.
> > 
> > Why is this patch somehow holding that work back?  You aren't just sitting
> > around waiting for people to review this and not doing anything else, right?
> > Is there some basic building block in these patches that your firmware
> > download code is going to rely on?
> > 
> > confused,
> > 
> > greg k-h
> 
> All the Thunderbolt SW features (including networking and FW update) depend 
> on the communication with FW, which is patch 3/8 in the series.
> The patch also sets up a generic netlink for user space communication.

It's that "generic netlink" connection that I really want a whole lot of
revewers to read over as it's very unusual and "different" from all
other driver subsystems.

thanks,

greg k-h

^ permalink raw reply

* Re: Synopsys Ethernet QoS Driver
From: Joao Pinto @ 2016-11-21 12:32 UTC (permalink / raw)
  To: Rayagond Kokatanur, Rabin Vincent, Giuseppe Cavallaro
  Cc: Joao Pinto, mued dib, David Miller, Jeff Kirsher, jiri, saeedm,
	idosch, netdev, linux-kernel, CARLOS.PALMINHA, andreas.irestal,
	alexandre.torgue, lars.persson
In-Reply-To: <CAJ3bTp7PpUJcz=1JohX=A54UP6vVAafWnB0PGUBSxb5JJiO8Lg@mail.gmail.com>

Hello,

On 21-11-2016 05:29, Rayagond Kokatanur wrote:
> On Sat, Nov 19, 2016 at 7:26 PM, Rabin Vincent <rabin@rab.in> wrote:
>> On Fri, Nov 18, 2016 at 02:20:27PM +0000, Joao Pinto wrote:
>>> For now we are interesting in improving the synopsys QoS driver under
>>> /nect/ethernet/synopsys. For now the driver structure consists of a single file
>>> called dwc_eth_qos.c, containing synopsys ethernet qos common ops and platform
>>> related stuff.
>>>
>>> Our strategy would be:
>>>
>>> a) Implement a platform glue driver (dwc_eth_qos_pltfm.c)
>>> b) Implement a pci glue driver (dwc_eth_qos_pci.c)
>>> c) Implement a "core driver" (dwc_eth_qos.c) that would only have Ethernet QoS
>>> related stuff to be reused by the platform / pci drivers
>>> d) Add a set of features to the "core driver" that we have available internally
>>
>> Note that there are actually two drivers in mainline for this hardware:
>>
>>  drivers/net/ethernet/synopsis/
>>  drivers/net/ethernet/stmicro/stmmac/
> 
> Yes the later driver (drivers/net/ethernet/stmicro/stmmac/) supports
> both 3.x and 4.x. It has glue layer for pci, platform, core etc,
> please refer this driver once before you start.
> 
> You can start adding missing feature of 4.x in stmmac driver.

Thanks you all for all the info.
Well, I think we are in a good position to organize the ethernet drivers
concerning Synopsys IPs.

First of all, in my opinion, it does not make sense to have a ethernet/synopsis
(typo :)) when ethernet/stmicro is also for a synopsys IP. If we have another
vendor using the same IP it should be able to reuse the commonn operations. But
I would put that discussion for later :)

For now I suggest that for we create ethernet/qos and create there a folder
called dwc (designware controller) where all the synopsys qos IP specific code
in order to be reused for example by ethernet/stmicro/stmmac/. We just have to
figure out a clean interface for "client drivers" like stmmac to interact with
the new qos driver.

What do you think about this approach?


> 
>>
>> (See http://lists.openwall.net/netdev/2016/02/29/127)
>>
>> The former only supports 4.x of the hardware.
>>
>> The later supports 4.x and 3.x and already has a platform glue driver
>> with support for several platforms, a PCI glue driver, and a core driver
>> with several features not present in the former (for example: TX/RX
>> interrupt coalescing, EEE, PTP).
>>
>> Have you evaluated both drivers?  Why have you decided to work on the
>> former rather than the latter?
> 
> 

Thanks.

^ permalink raw reply

* Re: [PATCH v7 1/2] nl80211: multicast_to_unicast can be changed while IFF_UP
From: Johannes Berg @ 2016-11-21 12:50 UTC (permalink / raw)
  To: Michael Braun
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	projekt-wlan-3kN+8DYepx7zMJDuovMtMLNAH6kLmebB
In-Reply-To: <1477921260-24556-1-git-send-email-michael-dev-1SGGS//iJ+Y38rf8aCqVIw@public.gmane.org>

On Mon, 2016-10-31 at 14:40 +0100, Michael Braun wrote:
> There is no need to prevent toggling multicast_to_unicast while
> interface is already up. This change simplifies reconfiguration
> from hostapd.

Applied. This check never should've been there anyway, if desired,
NEED_NETDEV_UP should've been used.

johannes

^ permalink raw reply

* Re: Synopsys Ethernet QoS Driver
From: Giuseppe CAVALLARO @ 2016-11-21 12:52 UTC (permalink / raw)
  To: Joao Pinto, Rayagond Kokatanur, Rabin Vincent
  Cc: andreas.irestal, Alexandre TORGUE, saeedm, netdev, linux-kernel,
	CARLOS.PALMINHA, idosch, mued dib, jiri, Jeff Kirsher,
	David Miller, linux-arm-kernel@lists.infradead.org, lars.persson
In-Reply-To: <1248f4ce-4859-10e6-fef9-342ea543f8d4@synopsys.com>

Hello Joao

On 11/21/2016 1:32 PM, Joao Pinto wrote:
> Hello,
>
> On 21-11-2016 05:29, Rayagond Kokatanur wrote:
>> On Sat, Nov 19, 2016 at 7:26 PM, Rabin Vincent <rabin@rab.in> wrote:
>>> On Fri, Nov 18, 2016 at 02:20:27PM +0000, Joao Pinto wrote:
>>>> For now we are interesting in improving the synopsys QoS driver under
>>>> /nect/ethernet/synopsys. For now the driver structure consists of a single file
>>>> called dwc_eth_qos.c, containing synopsys ethernet qos common ops and platform
>>>> related stuff.
>>>>
>>>> Our strategy would be:
>>>>
>>>> a) Implement a platform glue driver (dwc_eth_qos_pltfm.c)
>>>> b) Implement a pci glue driver (dwc_eth_qos_pci.c)
>>>> c) Implement a "core driver" (dwc_eth_qos.c) that would only have Ethernet QoS
>>>> related stuff to be reused by the platform / pci drivers
>>>> d) Add a set of features to the "core driver" that we have available internally
>>>
>>> Note that there are actually two drivers in mainline for this hardware:
>>>
>>>  drivers/net/ethernet/synopsis/
>>>  drivers/net/ethernet/stmicro/stmmac/
>>
>> Yes the later driver (drivers/net/ethernet/stmicro/stmmac/) supports
>> both 3.x and 4.x. It has glue layer for pci, platform, core etc,
>> please refer this driver once before you start.
>>
>> You can start adding missing feature of 4.x in stmmac driver.
>
> Thanks you all for all the info.
> Well, I think we are in a good position to organize the ethernet drivers
> concerning Synopsys IPs.
>
> First of all, in my opinion, it does not make sense to have a ethernet/synopsis
> (typo :)) when ethernet/stmicro is also for a synopsys IP. If we have another
> vendor using the same IP it should be able to reuse the commonn operations. But
> I would put that discussion for later :)
>
> For now I suggest that for we create ethernet/qos and create there a folder
> called dwc (designware controller) where all the synopsys qos IP specific code
> in order to be reused for example by ethernet/stmicro/stmmac/. We just have to
> figure out a clean interface for "client drivers" like stmmac to interact with
> the new qos driver.
>
> What do you think about this approach?

The stmmac drivers run since many years on several platforms
(sh4, stm32, arm, x86, mips ...) and it supports an huge of amount of
configurations starting from 3.1x to 3.7x databooks.

It also supports QoS hardware; for example, 4.00a, 4.10a and 4.20a
are fully working.

Also the stmmac has platform, device-tree and pcie supports and
a lot of maintained glue-logic files.

It is fully documented inside the kernel tree.

I am happy to have new enhancements from other developers.
So, on my side, if you want to spend your time on improving it on your
platforms please feel free to do it!

Concerning the stmicro/stmmac naming, these come from a really old
story and have no issue to adopt new folder/file names.

I am also open to merge fixes and changes from ethernet/synopsis.
I want to point you on some benchmarks made by Alex some months ago
(IIRC) that showed an stmmac winner (due to the several optimizations
analyzed and reviewed in this mailing list).

Peppe

>
>
>>
>>>
>>> (See http://lists.openwall.net/netdev/2016/02/29/127)
>>>
>>> The former only supports 4.x of the hardware.
>>>
>>> The later supports 4.x and 3.x and already has a platform glue driver
>>> with support for several platforms, a PCI glue driver, and a core driver
>>> with several features not present in the former (for example: TX/RX
>>> interrupt coalescing, EEE, PTP).
>>>
>>> Have you evaluated both drivers?  Why have you decided to work on the
>>> former rather than the latter?
>>
>>
>
> Thanks.
>
>
>
>

^ permalink raw reply

* Re: ip -s macsec show: Statistics for OutOctets... and OutPkts... switched
From: Rami Rosen @ 2016-11-21 13:02 UTC (permalink / raw)
  To: Daniel.Hopf; +Cc: Netdev
In-Reply-To: <OF4B44CECE.BF9C7A12-ONC1258072.0037C56C-C1258072.003F5731@continental-corporation.com>

+1
Rami Rosen

^ permalink raw reply

* [PATCH net-next 1/7] net: Add net-device param to the get offloaded stats ndo
From: Saeed Mahameed @ 2016-11-21 13:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Roi Dayan, Saeed Mahameed
In-Reply-To: <1479733561-26601-1-git-send-email-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

Some drivers would need to check few internal matters for
that. To be used in downstream mlx5 commit.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 2 +-
 include/linux/netdevice.h                      | 4 ++--
 net/core/rtnetlink.c                           | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 4a1f9d5..be32f47 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -857,7 +857,7 @@ mlxsw_sp_port_get_sw_stats64(const struct net_device *dev,
 	return 0;
 }
 
-static bool mlxsw_sp_port_has_offload_stats(int attr_id)
+static bool mlxsw_sp_port_has_offload_stats(struct net_device *dev, int attr_id)
 {
 	switch (attr_id) {
 	case IFLA_OFFLOAD_XSTATS_CPU_HIT:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e84800e..ae32a27 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -925,7 +925,7 @@ struct netdev_xdp {
  *	3. Update dev->stats asynchronously and atomically, and define
  *	   neither operation.
  *
- * bool (*ndo_has_offload_stats)(int attr_id)
+ * bool (*ndo_has_offload_stats)(const struct net_device *dev, int attr_id)
  *	Return true if this device supports offload stats of this attr_id.
  *
  * int (*ndo_get_offload_stats)(int attr_id, const struct net_device *dev,
@@ -1165,7 +1165,7 @@ struct net_device_ops {
 
 	struct rtnl_link_stats64* (*ndo_get_stats64)(struct net_device *dev,
 						     struct rtnl_link_stats64 *storage);
-	bool			(*ndo_has_offload_stats)(int attr_id);
+	bool			(*ndo_has_offload_stats)(const struct net_device *dev, int attr_id);
 	int			(*ndo_get_offload_stats)(int attr_id,
 							 const struct net_device *dev,
 							 void *attr_data);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index db313ec..f5a8d8a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3665,7 +3665,7 @@ static int rtnl_get_offload_stats(struct sk_buff *skb, struct net_device *dev,
 		if (!size)
 			continue;
 
-		if (!dev->netdev_ops->ndo_has_offload_stats(attr_id))
+		if (!dev->netdev_ops->ndo_has_offload_stats(dev, attr_id))
 			continue;
 
 		attr = nla_reserve_64bit(skb, attr_id, size,
@@ -3706,7 +3706,7 @@ static int rtnl_get_offload_stats_size(const struct net_device *dev)
 
 	for (attr_id = IFLA_OFFLOAD_XSTATS_FIRST;
 	     attr_id <= IFLA_OFFLOAD_XSTATS_MAX; attr_id++) {
-		if (!dev->netdev_ops->ndo_has_offload_stats(attr_id))
+		if (!dev->netdev_ops->ndo_has_offload_stats(dev, attr_id))
 			continue;
 		size = rtnl_get_offload_stats_attr_size(attr_id);
 		nla_size += nla_total_size_64bit(size);
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 3/7] net/mlx5e: Support VF vport link state control for SRIOV switchdev mode
From: Saeed Mahameed @ 2016-11-21 13:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Roi Dayan, Saeed Mahameed
In-Reply-To: <1479733561-26601-1-git-send-email-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

Reflect the administative link changes done on the VF representor to the
VF e-switch vport. This means that doing ip link set down/up commands on
the VF rep will modify the e-switch vport state which in turn will make
proper VF drivers to set their carrier accordingly.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 33 ++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index e0d1a56..5e33f6b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -236,6 +236,35 @@ void mlx5e_nic_rep_unload(struct mlx5_eswitch *esw,
 	mlx5e_tc_init(priv);
 }
 
+static int mlx5e_rep_open(struct net_device *dev)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5_eswitch_rep *rep = priv->ppriv;
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	int err;
+
+	err = mlx5e_open(dev);
+	if (err)
+		return err;
+
+	err = mlx5_eswitch_set_vport_state(esw, rep->vport, MLX5_ESW_VPORT_ADMIN_STATE_UP);
+	if (!err)
+		netif_carrier_on(dev);
+
+	return 0;
+}
+
+static int mlx5e_rep_close(struct net_device *dev)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5_eswitch_rep *rep = priv->ppriv;
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+
+	(void)mlx5_eswitch_set_vport_state(esw, rep->vport, MLX5_ESW_VPORT_ADMIN_STATE_DOWN);
+
+	return mlx5e_close(dev);
+}
+
 static int mlx5e_rep_get_phys_port_name(struct net_device *dev,
 					char *buf, size_t len)
 {
@@ -349,8 +378,8 @@ static const struct switchdev_ops mlx5e_rep_switchdev_ops = {
 };
 
 static const struct net_device_ops mlx5e_netdev_ops_rep = {
-	.ndo_open                = mlx5e_open,
-	.ndo_stop                = mlx5e_close,
+	.ndo_open                = mlx5e_rep_open,
+	.ndo_stop                = mlx5e_rep_close,
 	.ndo_start_xmit          = mlx5e_xmit,
 	.ndo_get_phys_port_name  = mlx5e_rep_get_phys_port_name,
 	.ndo_setup_tc            = mlx5e_rep_ndo_setup_tc,
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 5/7] net/mlx5: Enable to query min inline for a specific vport
From: Saeed Mahameed @ 2016-11-21 13:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Roi Dayan, Saeed Mahameed
In-Reply-To: <1479733561-26601-1-git-send-email-saeedm@mellanox.com>

From: Roi Dayan <roid@mellanox.com>

Also move the inline capablities enum to a shared header vport.h

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  6 ------
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 +++++------
 drivers/net/ethernet/mellanox/mlx5/core/vport.c   | 14 ++++++++------
 include/linux/mlx5/vport.h                        | 10 ++++++++--
 4 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ebf5dbc..a2b32ed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -150,12 +150,6 @@ static inline int mlx5_max_log_rq_size(int wq_type)
 	}
 }
 
-enum {
-	MLX5E_INLINE_MODE_L2,
-	MLX5E_INLINE_MODE_VPORT_CONTEXT,
-	MLX5_INLINE_MODE_NOT_REQUIRED,
-};
-
 struct mlx5e_tx_wqe {
 	struct mlx5_wqe_ctrl_seg ctrl;
 	struct mlx5_wqe_eth_seg  eth;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 480875f..d25bc0f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -952,7 +952,7 @@ static int mlx5e_create_sq(struct mlx5e_channel *c,
 	sq->bf_buf_size = (1 << MLX5_CAP_GEN(mdev, log_bf_reg_size)) / 2;
 	sq->max_inline  = param->max_inline;
 	sq->min_inline_mode =
-		MLX5_CAP_ETH(mdev, wqe_inline_mode) == MLX5E_INLINE_MODE_VPORT_CONTEXT ?
+		MLX5_CAP_ETH(mdev, wqe_inline_mode) == MLX5_CAP_INLINE_MODE_VPORT_CONTEXT ?
 		param->min_inline_mode : 0;
 
 	err = mlx5e_alloc_sq_db(sq, cpu_to_node(c->cpu));
@@ -3403,14 +3403,13 @@ static void mlx5e_query_min_inline(struct mlx5_core_dev *mdev,
 				   u8 *min_inline_mode)
 {
 	switch (MLX5_CAP_ETH(mdev, wqe_inline_mode)) {
-	case MLX5E_INLINE_MODE_L2:
+	case MLX5_CAP_INLINE_MODE_L2:
 		*min_inline_mode = MLX5_INLINE_MODE_L2;
 		break;
-	case MLX5E_INLINE_MODE_VPORT_CONTEXT:
-		mlx5_query_nic_vport_min_inline(mdev,
-						min_inline_mode);
+	case MLX5_CAP_INLINE_MODE_VPORT_CONTEXT:
+		mlx5_query_nic_vport_min_inline(mdev, 0, min_inline_mode);
 		break;
-	case MLX5_INLINE_MODE_NOT_REQUIRED:
+	case MLX5_CAP_INLINE_MODE_NOT_REQUIRED:
 		*min_inline_mode = MLX5_INLINE_MODE_NONE;
 		break;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 525f17a..269e440 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -113,15 +113,17 @@ static int mlx5_modify_nic_vport_context(struct mlx5_core_dev *mdev, void *in,
 	return mlx5_cmd_exec(mdev, in, inlen, out, sizeof(out));
 }
 
-void mlx5_query_nic_vport_min_inline(struct mlx5_core_dev *mdev,
-				     u8 *min_inline_mode)
+int mlx5_query_nic_vport_min_inline(struct mlx5_core_dev *mdev,
+				    u16 vport, u8 *min_inline)
 {
 	u32 out[MLX5_ST_SZ_DW(query_nic_vport_context_out)] = {0};
+	int err;
 
-	mlx5_query_nic_vport_context(mdev, 0, out, sizeof(out));
-
-	*min_inline_mode = MLX5_GET(query_nic_vport_context_out, out,
-				    nic_vport_context.min_wqe_inline_mode);
+	err = mlx5_query_nic_vport_context(mdev, vport, out, sizeof(out));
+	if (!err)
+		*min_inline = MLX5_GET(query_nic_vport_context_out, out,
+				       nic_vport_context.min_wqe_inline_mode);
+	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_min_inline);
 
diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h
index 451b0bd..ec35157 100644
--- a/include/linux/mlx5/vport.h
+++ b/include/linux/mlx5/vport.h
@@ -36,6 +36,12 @@
 #include <linux/mlx5/driver.h>
 #include <linux/mlx5/device.h>
 
+enum {
+	MLX5_CAP_INLINE_MODE_L2,
+	MLX5_CAP_INLINE_MODE_VPORT_CONTEXT,
+	MLX5_CAP_INLINE_MODE_NOT_REQUIRED,
+};
+
 u8 mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod, u16 vport);
 u8 mlx5_query_vport_admin_state(struct mlx5_core_dev *mdev, u8 opmod,
 				u16 vport);
@@ -43,8 +49,8 @@ int mlx5_modify_vport_admin_state(struct mlx5_core_dev *mdev, u8 opmod,
 				  u16 vport, u8 state);
 int mlx5_query_nic_vport_mac_address(struct mlx5_core_dev *mdev,
 				     u16 vport, u8 *addr);
-void mlx5_query_nic_vport_min_inline(struct mlx5_core_dev *mdev,
-				     u8 *min_inline);
+int mlx5_query_nic_vport_min_inline(struct mlx5_core_dev *mdev,
+				    u16 vport, u8 *min_inline);
 int mlx5_modify_nic_vport_min_inline(struct mlx5_core_dev *mdev,
 				     u16 vport, u8 min_inline);
 int mlx5_modify_nic_vport_mac_address(struct mlx5_core_dev *dev,
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 4/7] devlink: Add E-Switch inline mode control
From: Saeed Mahameed @ 2016-11-21 13:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Roi Dayan, Saeed Mahameed
In-Reply-To: <1479733561-26601-1-git-send-email-saeedm@mellanox.com>

From: Roi Dayan <roid@mellanox.com>

Some HWs need the VF driver to put part of the packet headers on the
TX descriptor so the e-switch can do proper matching and steering.

The supported modes: none, link, network, transport.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/net/devlink.h        |  2 ++
 include/uapi/linux/devlink.h |  8 +++++
 net/core/devlink.c           | 70 ++++++++++++++++++++++++++++++++------------
 3 files changed, 61 insertions(+), 19 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 211bd3c..d29e5fc 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -92,6 +92,8 @@ struct devlink_ops {
 
 	int (*eswitch_mode_get)(struct devlink *devlink, u16 *p_mode);
 	int (*eswitch_mode_set)(struct devlink *devlink, u16 mode);
+	int (*eswitch_inline_mode_get)(struct devlink *devlink, u8 *p_inline_mode);
+	int (*eswitch_inline_mode_set)(struct devlink *devlink, u8 inline_mode);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 915bfa7..9014c33 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -102,6 +102,13 @@ enum devlink_eswitch_mode {
 	DEVLINK_ESWITCH_MODE_SWITCHDEV,
 };
 
+enum devlink_eswitch_inline_mode {
+	DEVLINK_ESWITCH_INLINE_MODE_NONE,
+	DEVLINK_ESWITCH_INLINE_MODE_LINK,
+	DEVLINK_ESWITCH_INLINE_MODE_NETWORK,
+	DEVLINK_ESWITCH_INLINE_MODE_TRANSPORT,
+};
+
 enum devlink_attr {
 	/* don't change the order or add anything between, this is ABI! */
 	DEVLINK_ATTR_UNSPEC,
@@ -133,6 +140,7 @@ enum devlink_attr {
 	DEVLINK_ATTR_SB_OCC_CUR,		/* u32 */
 	DEVLINK_ATTR_SB_OCC_MAX,		/* u32 */
 	DEVLINK_ATTR_ESWITCH_MODE,		/* u16 */
+	DEVLINK_ATTR_ESWITCH_INLINE_MODE,	/* u8 */
 
 	/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index c14f8b6..2b5bf9e 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1394,26 +1394,45 @@ static int devlink_nl_cmd_sb_occ_max_clear_doit(struct sk_buff *skb,
 
 static int devlink_eswitch_fill(struct sk_buff *msg, struct devlink *devlink,
 				enum devlink_command cmd, u32 portid,
-				u32 seq, int flags, u16 mode)
+				u32 seq, int flags)
 {
+	const struct devlink_ops *ops = devlink->ops;
 	void *hdr;
+	int err = 0;
+	u16 mode;
+	u8 inline_mode;
 
 	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
 	if (!hdr)
 		return -EMSGSIZE;
 
-	if (devlink_nl_put_handle(msg, devlink))
-		goto nla_put_failure;
+	err = devlink_nl_put_handle(msg, devlink);
+	if (err)
+		goto out;
 
-	if (nla_put_u16(msg, DEVLINK_ATTR_ESWITCH_MODE, mode))
-		goto nla_put_failure;
+	err = ops->eswitch_mode_get(devlink, &mode);
+	if (err)
+		goto out;
+	err = nla_put_u16(msg, DEVLINK_ATTR_ESWITCH_MODE, mode);
+	if (err)
+		goto out;
+
+	if (ops->eswitch_inline_mode_get) {
+		err = ops->eswitch_inline_mode_get(devlink, &inline_mode);
+		if (err)
+			goto out;
+		err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_INLINE_MODE,
+				 inline_mode);
+		if (err)
+			goto out;
+	}
 
 	genlmsg_end(msg, hdr);
 	return 0;
 
-nla_put_failure:
+out:
 	genlmsg_cancel(msg, hdr);
-	return -EMSGSIZE;
+	return err;
 }
 
 static int devlink_nl_cmd_eswitch_mode_get_doit(struct sk_buff *skb,
@@ -1422,22 +1441,17 @@ static int devlink_nl_cmd_eswitch_mode_get_doit(struct sk_buff *skb,
 	struct devlink *devlink = info->user_ptr[0];
 	const struct devlink_ops *ops = devlink->ops;
 	struct sk_buff *msg;
-	u16 mode;
 	int err;
 
 	if (!ops || !ops->eswitch_mode_get)
 		return -EOPNOTSUPP;
 
-	err = ops->eswitch_mode_get(devlink, &mode);
-	if (err)
-		return err;
-
 	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
 	if (!msg)
 		return -ENOMEM;
 
 	err = devlink_eswitch_fill(msg, devlink, DEVLINK_CMD_ESWITCH_MODE_GET,
-				   info->snd_portid, info->snd_seq, 0, mode);
+				   info->snd_portid, info->snd_seq, 0);
 
 	if (err) {
 		nlmsg_free(msg);
@@ -1453,15 +1467,32 @@ static int devlink_nl_cmd_eswitch_mode_set_doit(struct sk_buff *skb,
 	struct devlink *devlink = info->user_ptr[0];
 	const struct devlink_ops *ops = devlink->ops;
 	u16 mode;
+	u8 inline_mode;
+	int err = 0;
 
-	if (!info->attrs[DEVLINK_ATTR_ESWITCH_MODE])
-		return -EINVAL;
+	if (!ops)
+		return -EOPNOTSUPP;
 
-	mode = nla_get_u16(info->attrs[DEVLINK_ATTR_ESWITCH_MODE]);
+	if (info->attrs[DEVLINK_ATTR_ESWITCH_MODE]) {
+		if (!ops->eswitch_mode_set)
+			return -EOPNOTSUPP;
+		mode = nla_get_u16(info->attrs[DEVLINK_ATTR_ESWITCH_MODE]);
+		err = ops->eswitch_mode_set(devlink, mode);
+		if (err)
+			return err;
+	}
 
-	if (ops && ops->eswitch_mode_set)
-		return ops->eswitch_mode_set(devlink, mode);
-	return -EOPNOTSUPP;
+	if (info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]) {
+		if (!ops->eswitch_inline_mode_set)
+			return -EOPNOTSUPP;
+		inline_mode = nla_get_u8(
+				info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]);
+		err = ops->eswitch_inline_mode_set(devlink, inline_mode);
+		if (err)
+			return err;
+	}
+
+	return 0;
 }
 
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
@@ -1478,6 +1509,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_SB_THRESHOLD] = { .type = NLA_U32 },
 	[DEVLINK_ATTR_SB_TC_INDEX] = { .type = NLA_U16 },
 	[DEVLINK_ATTR_ESWITCH_MODE] = { .type = NLA_U16 },
+	[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = { .type = NLA_U8 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 0/7] Mellanox 100G mlx5 SRIOV switchdev update
From: Saeed Mahameed @ 2016-11-21 13:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Roi Dayan, Saeed Mahameed

Hi Dave,

This series from Roi and Or further enhances the new SRIOV switchdev mode.

Roi's patches deal with allowing users to configure though devlink
the level of inline headers that the VF should be setting in order for
the eswitch HW to do proper matching. We also enforce that the matching
required for offloaded TC rules is aligned with that level on the PF driver.

Or's patches deals with allowing the user to control on the VF operational
link state through admin directives on the mlx5 VF rep link. Also in this series
is implementation of HW and SW counters for the mlx5 VF rep which is aligned
with the design set by commit a5ea31f57309 'Merge branch net-offloaded-stats'.

This series was generated against commit
f81a8a02bb3b ("Merge branch 'mV88e6xxx-interrupt-fixes'")

Thanks,
Saeed.

Or Gerlitz (3):
  net: Add net-device param to the get offloaded stats ndo
  net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev
    mode
  net/mlx5e: Support VF vport link state control for SRIOV switchdev
    mode

Roi Dayan (4):
  devlink: Add E-Switch inline mode control
  net/mlx5: Enable to query min inline for a specific vport
  net/mlx5: E-Switch, Add control for inline mode
  net/mlx5e: Enforce min inline mode when offloading flows

 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  15 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  42 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 144 +++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  46 ++++++-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   4 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 141 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/vport.c    |  14 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     |   2 +-
 include/linux/mlx5/vport.h                         |  10 +-
 include/linux/netdevice.h                          |   4 +-
 include/net/devlink.h                              |   2 +
 include/uapi/linux/devlink.h                       |   8 ++
 net/core/devlink.c                                 |  70 +++++++---
 net/core/rtnetlink.c                               |   4 +-
 17 files changed, 438 insertions(+), 72 deletions(-)

-- 
2.7.4

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox