Netdev List


* Re: [PATCH iproute2] iplink: Expose IFLA_*_FWMARK attributes for supported link types
From: Stephen Hemminger @ 2017-04-23 16:16 UTC (permalink / raw)
  To: Craig Gallek; +Cc: netdev
In-Reply-To: <20170421181453.166316-1-kraigatgoog@gmail.com>

On Fri, 21 Apr 2017 14:14:53 -0400
Craig Gallek <kraigatgoog@gmail.com> wrote:

> From: Craig Gallek <kraig@google.com>
> 
> This attribute allows the administrator to adjust the packet marking
> attribute of tunnels that support policy based routing.
> 
> Signed-off-by: Craig Gallek <kraig@google.com>

Applied to net-next, since the link attributes are not in 4.11.


* Re: [PATCH iproute2] gre6: fix copy/paste bugs in GREv6 attribute manipulation
From: Stephen Hemminger @ 2017-04-23 16:15 UTC (permalink / raw)
  To: Craig Gallek; +Cc: netdev
In-Reply-To: <20170421181425.166260-1-kraigatgoog@gmail.com>

On Fri, 21 Apr 2017 14:14:25 -0400
Craig Gallek <kraigatgoog@gmail.com> wrote:

> From: Craig Gallek <kraig@google.com>
> 
> Fixes: af89576d7a8c ("iproute2: GRE over IPv6 tunnel support.")
> Signed-off-by: Craig Gallek <kraig@google.com>

Thanks. Applied.


* Re: tools/testing/selftests/bpf/Makefile
From: Alexei Starovoitov @ 2017-04-23 16:13 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, netdev
In-Reply-To: <20170422.154501.1876167225428231606.davem@davemloft.net>

On Sat, Apr 22, 2017 at 03:45:01PM -0400, David Miller wrote:
> 
> Alexei, that unconditional -D__x86_64__ isn't going to work.  It in
> fact makes the build break on sparc because the types.h asm headers
> explicitly check for things like __sparc__ && __arch64__ etc.

yeah. it was a quick workaround for the following error:
In file included from net-next/tools/testing/selftests/bpf/test_xdp.c:8:
In file included from /usr/include/string.h:25:
In file included from /usr/include/features.h:399:
/usr/include/gnu/stubs.h:7:11: fatal error: 'gnu/stubs-32.h' file not found

the bpf progs don't do any x86 specific things.

> In every
> 	arch/${ARCH}/Makefile
> extract out the "-DXXX" stuff from CHECKFLAGS into a new Makefile

that should work. There is probably an even simpler way.
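
To illustrate why an unconditional -D__x86_64__ misleads the headers: arch
headers pick their code paths from the compiler's predefined macros, so
forcing one of them makes every such check take the x86-64 branch. A minimal
sketch in C follows; detected_arch() is a hypothetical helper for
illustration only and is not part of the selftests.

```c
#include <assert.h>

/*
 * Hypothetical helper: report the architecture the way arch headers decide
 * it, from compiler-predefined macros. Building with an unconditional
 * -D__x86_64__ would make this return "x86_64" on every host.
 */
static const char *detected_arch(void)
{
#if defined(__x86_64__)
	return "x86_64";
#elif defined(__sparc__) && defined(__arch64__)
	return "sparc64";
#elif defined(__aarch64__)
	return "aarch64";
#elif defined(__s390x__)
	return "s390x";
#elif defined(__powerpc64__)
	return "powerpc64";
#else
	return "unknown";
#endif
}
```

On a sparc64 build without the forced define, the second branch is the one
that matches, which is the check David describes.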


* Re: [PATCH iproute2 1/1] actions: Add support for user cookies
From: Stephen Hemminger @ 2017-04-23 16:11 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: netdev
In-Reply-To: <1492864583-20892-1-git-send-email-jhs@emojatatu.com>

On Sat, 22 Apr 2017 08:36:23 -0400
Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> From: Jamal Hadi Salim <jhs@mojatatu.com>
> 
> Make use of 128b user cookies
> 
> Introduce optional 128-bit action cookie.
> Like all other cookie schemes in the networking world (eg in protocols
> like http or existing kernel fib protocol field, etc) the idea is to
> save user state that when retrieved serves as a correlator. The kernel
> _should not_ interpret it. The user can store whatever they wish in the
> 128 bits.
> 
> Sample exercise (showing variable-length use of the cookie)
> 
> .. create an accept action with cookie a1b2c3d4
> sudo $TC actions add action ok index 1 cookie a1b2c3d4
> 
> .. dump all gact actions..
> sudo $TC -s actions ls action gact
> 
>     action order 0: gact action pass
>      random type none pass val 0
>      index 1 ref 1 bind 0 installed 5 sec used 5 sec
>     Action statistics:
>     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>     backlog 0b 0p requeues 0
>     cookie a1b2c3d4
> 
> .. bind the accept action to a filter..
> sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
> u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1
> 
> ... send some traffic..
> $ ping 127.0.0.1 -c 3
> PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
> 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
> 64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms
> 
> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

Applied. Please update man page as well.
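
For readers who want to see how a variable-length cookie such as a1b2c3d4
maps onto the 128-bit limit, here is a minimal sketch in C. It assumes the
cookie is given as a hex string with two digits per byte; parse_cookie_hex()
and COOKIE_MAX_LEN are hypothetical names, and the actual tc parsing code may
differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COOKIE_MAX_LEN 16	/* 128 bits */

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper: parse a hex string such as "a1b2c3d4" into at most
 * COOKIE_MAX_LEN bytes. Returns the number of bytes written, or -1 on an
 * empty, odd-length, over-long, or non-hex input.
 */
static int parse_cookie_hex(const char *hex, unsigned char out[COOKIE_MAX_LEN])
{
	size_t len = strlen(hex);
	size_t i;

	if (len == 0 || len % 2 != 0 || len / 2 > COOKIE_MAX_LEN)
		return -1;

	for (i = 0; i < len / 2; i++) {
		int hi = hex_digit(hex[2 * i]);
		int lo = hex_digit(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (unsigned char)(hi << 4 | lo);
	}
	return (int)(len / 2);
}
```

With this sketch the cookie a1b2c3d4 from the example occupies 4 of the 16
available bytes; anything longer than 32 hex digits is rejected.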


* [PATCH net-next] bpf, doc: update list of architectures that do eBPF JIT
From: Alexei Starovoitov @ 2017-04-23 16:01 UTC (permalink / raw)
  To: David S . Miller; +Cc: Daniel Borkmann, netdev

update the list and remove 'in the future' statement,
since all still alive 64-bit architectures now do eBPF JIT.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
mips64 is the only 'still alive' 64-bit arch without eBPF JIT :)
---
 Documentation/networking/filter.txt | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 683ada5ad81d..b69b205501de 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -595,10 +595,9 @@ got from bpf_prog_create(), and 'ctx' the given context (e.g.
 skb pointer). All constraints and restrictions from bpf_check_classic() apply
 before a conversion to the new layout is being done behind the scenes!
 
-Currently, the classic BPF format is being used for JITing on most of the
-architectures. x86-64, aarch64 and s390x perform JIT compilation from eBPF
-instruction set, however, future work will migrate other JIT compilers as well,
-so that they will profit from the very same benefits.
+Currently, the classic BPF format is being used for JITing on most 32-bit
+architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64 perform JIT
+compilation from eBPF instruction set.
 
 Some core changes of the new internal format:
 
-- 
2.9.3


* Re: [PATCH v2 net] net: ipv6: regenerate host route if moved to gc list
From: David Ahern @ 2017-04-23 14:08 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, dvyukov, andreyknvl, mmanning
In-Reply-To: <20170423022801.t7lw3vuazo2ks6u4@kafai-mba.local>

On 4/22/17 8:28 PM, Martin KaFai Lau wrote:
>> The code path to fixup_permanent_addr is under RTNL, so the if check on
>> ifp->rt and rt6i_ref is ok -- neither can be changed since RTNL is held.
>>
>> Since ifp->rt can be accessed outside of RTNL, the spinlock is needed to
>> change its value.
> Got it. It is to protect the readers which are not under RTNL.
> Many thanks for pointing out what I was missing.  It all makes sense now.
> 
>> Arguably only 'ifp->rt = rt;' needs the spinlock.
> It still seems like the existing 'ifp->rt = rt;' needs protection
> anyway regardless of the rt regeneration change.  It would be nice to
> explain it in the commit log or even better separating it out
> into another patch.

I'll add a comment to the commit log when I send a v3 tomorrow morning.


* Re: [PATCH 1/1] tipc: check return value of nlmsg_new
From: Jon Maloy @ 2017-04-23 13:33 UTC (permalink / raw)
  To: Pan Bian, Ying Xue, David S. Miller
  Cc: netdev@vger.kernel.org, tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
In-Reply-To: <1492931359-25004-1-git-send-email-bianpan2016@163.com>

Acknowledged. Thank you for doing this job.

///jon


> -----Original Message-----
> From: Pan Bian [mailto:bianpan2016@163.com]
> Sent: Sunday, April 23, 2017 03:09 AM
> To: Jon Maloy <jon.maloy@ericsson.com>; Ying Xue
> <ying.xue@windriver.com>; David S. Miller <davem@davemloft.net>
> Cc: netdev@vger.kernel.org; tipc-discussion@lists.sourceforge.net;
> linux-kernel@vger.kernel.org; Pan Bian <bianpan2016@163.com>
> Subject: [PATCH 1/1] tipc: check return value of nlmsg_new
> 
> Function nlmsg_new() will return a NULL pointer if there is not enough
> memory, and its return value should be checked before it is used.
> However, in function tipc_nl_node_get_monitor(), the validation of the
> return value of function nlmsg_new() is missing. This patch fixes the bug.
> 
> Signed-off-by: Pan Bian <bianpan2016@163.com>
> ---
>  net/tipc/node.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/tipc/node.c b/net/tipc/node.c
> index 4512e83..568e48d 100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -2098,6 +2098,8 @@ int tipc_nl_node_get_monitor(struct sk_buff *skb, struct genl_info *info)
>  	int err;
> 
>  	msg.skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
> +	if (!msg.skb)
> +		return -ENOMEM;
>  	msg.portid = info->snd_portid;
>  	msg.seq = info->snd_seq;
> 
> --
> 1.9.1
> 



* [PATCH 1/1] cfg80211: add return value validation
From: Pan Bian @ 2017-04-23 13:27 UTC (permalink / raw)
  To: Jussi Kivilinna, Kalle Valo,
	linux-wireless, netdev
  Cc: linux-kernel, Pan Bian

From: Pan Bian <bianpan2016@163.com>

Function create_singlethread_workqueue() will return a NULL pointer if
there is not enough memory, and its return value should be validated
before use. However, in function rndis_wlan_bind(), its return value
is not checked. This may cause a NULL dereference. This patch fixes
it.

Signed-off-by: Pan Bian <bianpan2016@163.com>
---
 drivers/net/wireless/rndis_wlan.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/wireless/rndis_wlan.c b/drivers/net/wireless/rndis_wlan.c
index 785334f..92a1bde 100644
--- a/drivers/net/wireless/rndis_wlan.c
+++ b/drivers/net/wireless/rndis_wlan.c
@@ -3427,6 +3427,10 @@ static int rndis_wlan_bind(struct usbnet *usbdev, struct usb_interface *intf)
 
 	/* because rndis_command() sleeps we need to use workqueue */
 	priv->workqueue = create_singlethread_workqueue("rndis_wlan");
+	if (!priv->workqueue) {
+		wiphy_free(wiphy);
+		return -ENOMEM;
+	}
 	INIT_WORK(&priv->work, rndis_wlan_worker);
 	INIT_DELAYED_WORK(&priv->dev_poller_work, rndis_device_poller);
 	INIT_DELAYED_WORK(&priv->scan_work, rndis_get_scan_results);
-- 
1.9.1


* [PATCH 1/1] libertas: check return value of alloc_workqueue
From: Pan Bian @ 2017-04-23 13:19 UTC (permalink / raw)
  To: Kalle Valo, Bhaktipriya Shridhar, Tejun Heo
  Cc: libertas-dev, linux-wireless, netdev, linux-kernel, Pan Bian

From: Pan Bian <bianpan2016@163.com>

Function alloc_workqueue() will return a NULL pointer if there is not
enough memory, and its return value should be validated before use.
However, in function if_spi_probe(), its return value is not checked.
This may result in a NULL dereference bug. This patch fixes the bug.

Signed-off-by: Pan Bian <bianpan2016@163.com>
---
 drivers/net/wireless/marvell/libertas/if_spi.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/wireless/marvell/libertas/if_spi.c b/drivers/net/wireless/marvell/libertas/if_spi.c
index c3a53cd..7b4955c 100644
--- a/drivers/net/wireless/marvell/libertas/if_spi.c
+++ b/drivers/net/wireless/marvell/libertas/if_spi.c
@@ -1181,6 +1181,10 @@ static int if_spi_probe(struct spi_device *spi)
 
 	/* Initialize interrupt handling stuff. */
 	card->workqueue = alloc_workqueue("libertas_spi", WQ_MEM_RECLAIM, 0);
+	if (!card->workqueue) {
+		err = -ENOMEM;
+		goto remove_card;
+	}
 	INIT_WORK(&card->packet_work, if_spi_host_to_card_worker);
 	INIT_WORK(&card->resume_work, if_spi_resume_worker);
 
@@ -1209,6 +1213,7 @@ static int if_spi_probe(struct spi_device *spi)
 	free_irq(spi->irq, card);
 terminate_workqueue:
 	destroy_workqueue(card->workqueue);
+remove_card:
 	lbs_remove_card(priv); /* will call free_netdev */
 free_card:
 	free_if_spi_card(card);
-- 
1.9.1
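
The new remove_card label matters because a failed allocation must skip
destroy_workqueue(): there is no workqueue to destroy yet. A minimal
userspace sketch of that goto-based unwinding, with malloc()/free() standing
in for the kernel calls; struct card, probe(), release(), and the fail_step
parameter are hypothetical and exist only to exercise the error paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct card {
	void *priv;		/* stands in for the lbs private data */
	void *workqueue;	/* stands in for the workqueue */
};

/*
 * Hypothetical probe routine. fail_step == 1 simulates the workqueue
 * allocation failing, fail_step == 2 simulates a later step failing.
 */
static int probe(struct card *card, int fail_step)
{
	int err;

	card->priv = malloc(32);
	if (!card->priv)
		return -ENOMEM;

	card->workqueue = (fail_step == 1) ? NULL : malloc(32);
	if (!card->workqueue) {
		err = -ENOMEM;
		goto remove_card;	/* no workqueue to destroy yet */
	}

	if (fail_step == 2) {		/* simulated failure after allocation */
		err = -EBUSY;
		goto terminate_workqueue;
	}
	return 0;

terminate_workqueue:
	free(card->workqueue);
	card->workqueue = NULL;
remove_card:
	free(card->priv);
	card->priv = NULL;
	return err;
}

/* Release everything a successful probe() set up. */
static void release(struct card *card)
{
	free(card->workqueue);
	free(card->priv);
	card->workqueue = NULL;
	card->priv = NULL;
}
```

Each label undoes exactly the steps that completed before the jump, so a
failure at the workqueue step falls through only the later part of the
cleanup chain.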


* [PATCH iproute2 net 7/8] tc/pedit: p_tcp: introduce pedit tcp support
From: Amir Vadai @ 2017-04-23 12:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Or Gerlitz, Jamal Hadi Salim, Amir Vadai
In-Reply-To: <20170423125356.1298-1-amir@vadai.me>

For example, forward tcp traffic destined to port 80 to veth0 and set
tcp port to 8080:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
      dst_port 80 \
    action pedit ex munge \
      tcp dport set 8080 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 man/man8/tc-pedit.8 | 23 +++++++++++++++++++++++
 tc/p_tcp.c          | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/man/man8/tc-pedit.8 b/man/man8/tc-pedit.8
index 8febdfe23f6e..ad1929592660 100644
--- a/man/man8/tc-pedit.8
+++ b/man/man8/tc-pedit.8
@@ -32,6 +32,8 @@ pedit - generic packet editor action
 .BI ip " IPHDR_FIELD"
 |
 .BI ip " EX_IPHDR_FIELD"
+|
+.BI tcp " TCPHDR_FIELD"
 .RI } " CMD_SPEC"
 
 .ti -8
@@ -52,6 +54,10 @@ pedit - generic packet editor action
 .BR ttl " }"
 
 .ti -8
+.IR TCPHDR_FIELD " := { "
+.BR sport " | " dport " | " flags " }"
+
+.ti -8
 .IR CMD_SPEC " := {"
 .BR clear " | " invert " | " set
 .IR VAL " | "
@@ -199,6 +205,20 @@ are:
 .B ttl
 .RE
 .TP
+.BI tcp " TCPHDR_FIELD"
+The supported keywords for
+.I TCPHDR_FIELD
+are:
+.RS
+.TP
+.B sport
+.TQ
+.B dport
+Source or destination TCP port number, a 16-bit value.
+.TP
+.B flags
+.RE
+.TP
 .B clear
 Clear the addressed data (i.e., set it to zero).
 .TP
@@ -293,6 +313,9 @@ tc filter add dev eth0 parent ffff: u32 \\
 tc filter add dev eth0 parent ffff: u32 \\
 	match ip sport 22 0xffff \\
 	action pedit ex munge eth dst set 11:22:33:44:55:66
+tc filter add dev eth0 parent ffff: u32 \\
+	match ip dport 23 0xffff \\
+	action pedit ex munge tcp dport set 22
 .EE
 .RE
 .SH SEE ALSO
diff --git a/tc/p_tcp.c b/tc/p_tcp.c
index 53ee9842160b..cf14574c9c3e 100644
--- a/tc/p_tcp.c
+++ b/tc/p_tcp.c
@@ -28,6 +28,43 @@ parse_tcp(int *argc_p, char ***argv_p,
 	  struct m_pedit_sel *sel, struct m_pedit_key *tkey)
 {
 	int res = -1;
+	int argc = *argc_p;
+	char **argv = *argv_p;
+
+	if (argc < 2)
+		return -1;
+
+	if (!sel->extended)
+		return -1;
+
+	tkey->htype = TCA_PEDIT_KEY_EX_HDR_TYPE_TCP;
+
+	if (strcmp(*argv, "sport") == 0) {
+		NEXT_ARG();
+		tkey->off = 0;
+		res = parse_cmd(&argc, &argv, 2, TU32, RU16, sel, tkey);
+		goto done;
+	}
+
+	if (strcmp(*argv, "dport") == 0) {
+		NEXT_ARG();
+		tkey->off = 2;
+		res = parse_cmd(&argc, &argv, 2, TU32, RU16, sel, tkey);
+		goto done;
+	}
+
+	if (strcmp(*argv, "flags") == 0) {
+		NEXT_ARG();
+		tkey->off = 13;
+		res = parse_cmd(&argc, &argv, 1, TU32, RU8, sel, tkey);
+		goto done;
+	}
+
+	return -1;
+
+done:
+	*argc_p = argc;
+	*argv_p = argv;
 	return res;
 }
 struct m_pedit_util p_pedit_tcp = {
-- 
2.12.0
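
For reference, the three keywords in the patch above map to fixed byte
positions in the TCP header. The sketch below restates them as a lookup
table; struct hdr_field and find_tcp_field() are hypothetical names used only
for illustration, while the offsets and widths are the ones passed to
parse_cmd() in parse_tcp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct hdr_field {
	const char *name;
	unsigned int off;	/* byte offset from the start of the TCP header */
	unsigned int len;	/* field width in bytes */
};

/* Offsets and widths as used by parse_tcp() in the patch. */
static const struct hdr_field tcp_fields[] = {
	{ "sport", 0, 2 },
	{ "dport", 2, 2 },
	{ "flags", 13, 1 },
};

/* Hypothetical lookup helper: returns the matching entry or NULL. */
static const struct hdr_field *find_tcp_field(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(tcp_fields) / sizeof(tcp_fields[0]); i++) {
		if (strcmp(tcp_fields[i].name, name) == 0)
			return &tcp_fields[i];
	}
	return NULL;
}
```

A table like this is only one possible layout; the patch itself uses a chain
of strcmp() checks, which matches the style of the other pedit parsers.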


* [PATCH iproute2 net 5/8] tc/pedit: Support fields bigger than 32 bits
From: Amir Vadai @ 2017-04-23 12:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Or Gerlitz, Jamal Hadi Salim, Amir Vadai
In-Reply-To: <20170423125356.1298-1-amir@vadai.me>

Make parse_val() accept fields up to 128 bits long. This should be
enough for current use cases and involves only a minimal code change.

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 tc/m_pedit.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/tc/m_pedit.c b/tc/m_pedit.c
index 7af074a5a97c..d982c91a2585 100644
--- a/tc/m_pedit.c
+++ b/tc/m_pedit.c
@@ -256,7 +256,10 @@ int parse_val(int *argc_p, char ***argv_p, __u32 *val, int type)
 int parse_cmd(int *argc_p, char ***argv_p, __u32 len, int type, __u32 retain,
 	      struct m_pedit_sel *sel, struct m_pedit_key *tkey)
 {
-	__u32 mask = 0, val = 0;
+	__u32 mask[4] = { 0 };
+	__u32 val[4] = { 0 };
+	__u32 *m = &mask[0];
+	__u32 *v = &val[0];
 	__u32 o = 0xFF;
 	int res = -1;
 	int argc = *argc_p;
@@ -275,7 +278,7 @@ int parse_cmd(int *argc_p, char ***argv_p, __u32 len, int type, __u32 retain,
 		o = 0xFFFFFFFF;
 
 	if (matches(*argv, "invert") == 0) {
-		val = mask = o;
+		*v = *m = o;
 	} else if (matches(*argv, "set") == 0 ||
 		   matches(*argv, "add") == 0) {
 		if (matches(*argv, "add") == 0)
@@ -287,7 +290,7 @@ int parse_cmd(int *argc_p, char ***argv_p, __u32 len, int type, __u32 retain,
 		}
 
 		NEXT_ARG();
-		if (parse_val(&argc, &argv, &val, type))
+		if (parse_val(&argc, &argv, val, type))
 			return -1;
 	} else if (matches(*argv, "preserve") == 0) {
 		retain = 0;
@@ -307,8 +310,8 @@ int parse_cmd(int *argc_p, char ***argv_p, __u32 len, int type, __u32 retain,
 		argv++;
 	}
 
-	tkey->val = val;
-	tkey->mask = mask;
+	tkey->val = *v;
+	tkey->mask = *m;
 
 	if (type == TIPV4)
 		tkey->val = ntohl(tkey->val);
-- 
2.12.0


* [PATCH iproute2 net 8/8] tc/pedit: p_udp: introduce pedit udp support
From: Amir Vadai @ 2017-04-23 12:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Or Gerlitz, Jamal Hadi Salim, Amir Vadai
In-Reply-To: <20170423125356.1298-1-amir@vadai.me>

From: Or Gerlitz <ogerlitz@mellanox.com>

For example, forward udp traffic destined to port 999 to veth0 and set
udp port to 888:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto udp \
      dst_port 999 \
    action pedit ex munge \
      udp dport set 888 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
---
 man/man8/tc-pedit.8 | 18 ++++++++++++++++++
 tc/p_udp.c          | 27 +++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/man/man8/tc-pedit.8 b/man/man8/tc-pedit.8
index ad1929592660..7f482eafc6c7 100644
--- a/man/man8/tc-pedit.8
+++ b/man/man8/tc-pedit.8
@@ -34,6 +34,8 @@ pedit - generic packet editor action
 .BI ip " EX_IPHDR_FIELD"
 |
 .BI tcp " TCPHDR_FIELD"
+|
+.BI udp " UDPHDR_FIELD"
 .RI } " CMD_SPEC"
 
 .ti -8
@@ -58,6 +60,10 @@ pedit - generic packet editor action
 .BR sport " | " dport " | " flags " }"
 
 .ti -8
+.IR UDPHDR_FIELD " := { "
+.BR sport " | " dport " }"
+
+.ti -8
 .IR CMD_SPEC " := {"
 .BR clear " | " invert " | " set
 .IR VAL " | "
@@ -219,6 +225,18 @@ Source or destination TCP port number, a 16-bit value.
 .B flags
 .RE
 .TP
+.BI udp " UDPHDR_FIELD"
+The supported keywords for
+.I UDPHDR_FIELD
+are:
+.RS
+.TP
+.B sport
+.TQ
+.B dport
+Source or destination UDP port number, a 16-bit value.
+.RE
+.TP
 .B clear
 Clear the addressed data (i.e., set it to zero).
 .TP
diff --git a/tc/p_udp.c b/tc/p_udp.c
index 3a86ba382391..a56a1b519254 100644
--- a/tc/p_udp.c
+++ b/tc/p_udp.c
@@ -28,6 +28,33 @@ parse_udp(int *argc_p, char ***argv_p,
 	  struct m_pedit_sel *sel, struct m_pedit_key *tkey)
 {
 	int res = -1;
+	int argc = *argc_p;
+	char **argv = *argv_p;
+
+	if (argc < 2)
+		return -1;
+
+	tkey->htype = TCA_PEDIT_KEY_EX_HDR_TYPE_UDP;
+
+	if (strcmp(*argv, "sport") == 0) {
+		NEXT_ARG();
+		tkey->off = 0;
+		res = parse_cmd(&argc, &argv, 2, TU32, RU16, sel, tkey);
+		goto done;
+	}
+
+	if (strcmp(*argv, "dport") == 0) {
+		NEXT_ARG();
+		tkey->off = 2;
+		res = parse_cmd(&argc, &argv, 2, TU32, RU16, sel, tkey);
+		goto done;
+	}
+
+	return -1;
+
+done:
+	*argc_p = argc;
+	*argv_p = argv;
 	return res;
 }
 
-- 
2.12.0


* [PATCH iproute2 net 6/8] tc/pedit: p_eth: ETH header editor
From: Amir Vadai @ 2017-04-23 12:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Or Gerlitz, Jamal Hadi Salim, Amir Vadai
In-Reply-To: <20170423125356.1298-1-amir@vadai.me>

For example, forward tcp traffic to veth0 and set
destination mac address to 11:22:33:44:55:66 :
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
    action pedit ex munge \
      eth dst set 11:22:33:44:55:66 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 man/man8/tc-pedit.8 | 24 ++++++++++++++++++
 tc/Makefile         |  1 +
 tc/m_pedit.c        | 46 ++++++++++++++++++++++++++++++++++
 tc/m_pedit.h        |  1 +
 tc/p_eth.c          | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 144 insertions(+)
 create mode 100644 tc/p_eth.c

diff --git a/man/man8/tc-pedit.8 b/man/man8/tc-pedit.8
index c98d95cb0021..8febdfe23f6e 100644
--- a/man/man8/tc-pedit.8
+++ b/man/man8/tc-pedit.8
@@ -27,12 +27,18 @@ pedit - generic packet editor action
 
 .ti -8
 .IR EXTENDED_LAYERED_OP " := { "
+.BI eth " ETHHDR_FIELD"
+|
 .BI ip " IPHDR_FIELD"
 |
 .BI ip " EX_IPHDR_FIELD"
 .RI } " CMD_SPEC"
 
 .ti -8
+.IR ETHHDR_FIELD " := { "
+.BR src " | " dst " | " type " }"
+
+.ti -8
 .IR IPHDR_FIELD " := { "
 .BR src " | " dst " | " tos " | " dsfield " | " ihl " | " protocol " |"
 .BR precedence " | " nofrag " | " firstfrag " | " ce " | " df " }"
@@ -103,6 +109,21 @@ and right-shifted by
 before adding it to
 .IR OFFSET .
 .TP
+.BI eth " ETHHDR_FIELD"
+Change an ETH header field. The supported keywords for
+.I ETHHDR_FIELD
+are:
+.RS
+.TP
+.B src
+.TQ
+.B dst
+Source or destination MAC address in the standard format: XX:XX:XX:XX:XX:XX
+.TP
+.B type
+Ether-type in numeric value
+.RE
+.TP
 .BI ip " IPHDR_FIELD"
 Change an IPv4 header field. The supported keywords for
 .I IPHDR_FIELD
@@ -269,6 +290,9 @@ tc filter add dev eth0 parent ffff: u32 \\
 tc filter add dev eth0 parent ffff: u32 \\
 	match ip sport 22 0xffff \\
 	action pedit ex munge ip dst set 192.168.1.199
+tc filter add dev eth0 parent ffff: u32 \\
+	match ip sport 22 0xffff \\
+	action pedit ex munge eth dst set 11:22:33:44:55:66
 .EE
 .RE
 .SH SEE ALSO
diff --git a/tc/Makefile b/tc/Makefile
index 3f7fc939e194..446a11391ad7 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -54,6 +54,7 @@ TCMODULES += m_tunnel_key.o
 TCMODULES += m_sample.o
 TCMODULES += p_ip.o
 TCMODULES += p_icmp.o
+TCMODULES += p_eth.o
 TCMODULES += p_tcp.o
 TCMODULES += p_udp.o
 TCMODULES += em_nbyte.o
diff --git a/tc/m_pedit.c b/tc/m_pedit.c
index d982c91a2585..0be42343ac88 100644
--- a/tc/m_pedit.c
+++ b/tc/m_pedit.c
@@ -28,6 +28,7 @@
 #include "utils.h"
 #include "tc_util.h"
 #include "m_pedit.h"
+#include "rt_names.h"
 
 static struct m_pedit_util *pedit_list;
 static int pedit_debug;
@@ -223,6 +224,38 @@ int pack_key8(__u32 retain, struct m_pedit_sel *sel, struct m_pedit_key *tkey)
 	return pack_key(sel, tkey);
 }
 
+static int pack_mac(struct m_pedit_sel *sel, struct m_pedit_key *tkey,
+		    __u8 *mac)
+{
+	int ret = 0;
+
+	if (!(tkey->off & 0x3)) {
+		tkey->mask = 0;
+		tkey->val = ntohl(*((__u32 *)mac));
+		ret |= pack_key32(~0, sel, tkey);
+
+		tkey->off += 4;
+		tkey->mask = 0;
+		tkey->val = ntohs(*((__u16 *)&mac[4]));
+		ret |= pack_key16(~0, sel, tkey);
+	} else if (!(tkey->off & 0x1)) {
+		tkey->mask = 0;
+		tkey->val = ntohs(*((__u16 *)mac));
+		ret |= pack_key16(~0, sel, tkey);
+
+		tkey->off += 4;
+		tkey->mask = 0;
+		tkey->val = ntohl(*((__u32 *)(mac + 2)));
+		ret |= pack_key32(~0, sel, tkey);
+	} else {
+		fprintf(stderr,
+			"pack_mac: mac offsets must begin in 32bit or 16bit boundaries\n");
+		return -1;
+	}
+
+	return ret;
+}
+
 int parse_val(int *argc_p, char ***argv_p, __u32 *val, int type)
 {
 	int argc = *argc_p;
@@ -250,6 +283,14 @@ int parse_val(int *argc_p, char ***argv_p, __u32 *val, int type)
 	if (type == TIPV6)
 		return -1; /* not implemented yet */
 
+	if (type == TMAC) {
+#define MAC_ALEN 6
+		int ret = ll_addr_a2n((char *)val, MAC_ALEN, *argv);
+
+		if (ret == MAC_ALEN)
+			return 0;
+	}
+
 	return -1;
 }
 
@@ -310,6 +351,11 @@ int parse_cmd(int *argc_p, char ***argv_p, __u32 len, int type, __u32 retain,
 		argv++;
 	}
 
+	if (type == TMAC) {
+		res = pack_mac(sel, tkey, (__u8 *)val);
+		goto done;
+	}
+
 	tkey->val = *v;
 	tkey->mask = *m;
 
diff --git a/tc/m_pedit.h b/tc/m_pedit.h
index e2897b0c9808..ecfb6add0df5 100644
--- a/tc/m_pedit.h
+++ b/tc/m_pedit.h
@@ -32,6 +32,7 @@
 #define TIPV6 2
 #define TINT 3
 #define TU32 4
+#define TMAC 5
 
 #define RU32 0xFFFFFFFF
 #define RU16 0xFFFF
diff --git a/tc/p_eth.c b/tc/p_eth.c
new file mode 100644
index 000000000000..ad3e28f80eb6
--- /dev/null
+++ b/tc/p_eth.c
@@ -0,0 +1,72 @@
+/*
+ * m_pedit_eth.c	packet editor: ETH header
+ *
+ *		This program is free software; you can distribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:  Amir Vadai (amir@vadai.me)
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+#include "utils.h"
+#include "tc_util.h"
+#include "m_pedit.h"
+
+static int
+parse_eth(int *argc_p, char ***argv_p,
+	  struct m_pedit_sel *sel, struct m_pedit_key *tkey)
+{
+	int res = -1;
+	int argc = *argc_p;
+	char **argv = *argv_p;
+
+	if (argc < 2)
+		return -1;
+
+	tkey->htype = TCA_PEDIT_KEY_EX_HDR_TYPE_ETH;
+
+	if (strcmp(*argv, "type") == 0) {
+		NEXT_ARG();
+		tkey->off = 12;
+		res = parse_cmd(&argc, &argv, 2, TU32, RU16, sel, tkey);
+		goto done;
+	}
+
+	if (strcmp(*argv, "dst") == 0) {
+		NEXT_ARG();
+		tkey->off = 0;
+		res = parse_cmd(&argc, &argv, 6, TMAC, RU32, sel, tkey);
+		goto done;
+	}
+
+	if (strcmp(*argv, "src") == 0) {
+		NEXT_ARG();
+		tkey->off = 6;
+		res = parse_cmd(&argc, &argv, 6, TMAC, RU32, sel, tkey);
+		goto done;
+	}
+
+	return -1;
+
+done:
+	*argc_p = argc;
+	*argv_p = argv;
+	return res;
+}
+
+struct m_pedit_util p_pedit_eth = {
+	NULL,
+	"eth",
+	parse_eth,
+};
-- 
2.12.0
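
Since pedit keys are at most 32 bits wide, a 6-byte MAC address has to be
written as one 32-bit and one 16-bit piece, in an order that depends on the
alignment of the starting offset. The sketch below shows only that
conceptual split into byte ranges; struct chunk and split_mac() are
hypothetical, and the offset bookkeeping done by pack_mac() through the
pack_key helpers is not reproduced here.

```c
#include <assert.h>

struct chunk {
	unsigned int off;	/* byte offset of the write */
	unsigned int len;	/* width of the write: 4 or 2 bytes */
};

/*
 * Hypothetical helper: decide how a 6-byte MAC at byte offset off is cut
 * into one 32-bit and one 16-bit write. Returns 0 on success, -1 when the
 * offset is odd.
 */
static int split_mac(unsigned int off, struct chunk out[2])
{
	if ((off & 0x3) == 0) {		/* 32-bit aligned: 4 bytes, then 2 */
		out[0] = (struct chunk){ off, 4 };
		out[1] = (struct chunk){ off + 4, 2 };
	} else if ((off & 0x1) == 0) {	/* 16-bit aligned: 2 bytes, then 4 */
		out[0] = (struct chunk){ off, 2 };
		out[1] = (struct chunk){ off + 2, 4 };
	} else {
		return -1;
	}
	return 0;
}
```

For the two offsets used by parse_eth(), dst at offset 0 splits into bytes
0-3 and 4-5, and src at offset 6 splits into bytes 6-7 and 8-11.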


* [PATCH iproute2 net 4/8] tc/pedit: p_ip: introduce editing ttl header
From: Amir Vadai @ 2017-04-23 12:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Or Gerlitz, Jamal Hadi Salim, Amir Vadai
In-Reply-To: <20170423125356.1298-1-amir@vadai.me>

Enable the user to edit the IP header ttl field.

For example, to forward any TCP packet and decrease its TTL by one:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
    action pedit ex munge \
      ip ttl add 0xff pipe \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 man/man8/tc-pedit.8 | 17 +++++++++++++++++
 tc/p_ip.c           |  6 ++++++
 2 files changed, 23 insertions(+)

diff --git a/man/man8/tc-pedit.8 b/man/man8/tc-pedit.8
index 6bba741956f1..c98d95cb0021 100644
--- a/man/man8/tc-pedit.8
+++ b/man/man8/tc-pedit.8
@@ -28,6 +28,8 @@ pedit - generic packet editor action
 .ti -8
 .IR EXTENDED_LAYERED_OP " := { "
 .BI ip " IPHDR_FIELD"
+|
+.BI ip " EX_IPHDR_FIELD"
 .RI } " CMD_SPEC"
 
 .ti -8
@@ -40,6 +42,10 @@ pedit - generic packet editor action
 .BR dport " | " sport " | " icmp_type " | " icmp_code " }"
 
 .ti -8
+.IR EX_IPHDR_FIELD " := { "
+.BR ttl " }"
+
+.ti -8
 .IR CMD_SPEC " := {"
 .BR clear " | " invert " | " set
 .IR VAL " | "
@@ -161,6 +167,17 @@ If it is not or the latter is bigger than the minimum of 20 bytes, this will do
 unexpected things. These fields are eight-bit values.
 .RE
 .TP
+.BI ip " EX_IPHDR_FIELD"
+Supported only when
+.I ex
+is used. The supported keywords for
+.I EX_IPHDR_FIELD
+are:
+.RS
+.TP
+.B ttl
+.RE
+.TP
 .B clear
 Clear the addressed data (i.e., set it to zero).
 .TP
diff --git a/tc/p_ip.c b/tc/p_ip.c
index e56eb39317ba..22fe6505e427 100644
--- a/tc/p_ip.c
+++ b/tc/p_ip.c
@@ -66,6 +66,12 @@ parse_ip(int *argc_p, char ***argv_p,
 		res = parse_cmd(&argc, &argv, 1, TU32, 0x0f, sel, tkey);
 		goto done;
 	}
+	if (strcmp(*argv, "ttl") == 0) {
+		NEXT_ARG();
+		tkey->off = 8;
+		res = parse_cmd(&argc, &argv, 1, TU32, RU8, sel, tkey);
+		goto done;
+	}
 	if (strcmp(*argv, "protocol") == 0) {
 		NEXT_ARG();
 		tkey->off = 9;
-- 
2.12.0
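
The example in the commit message decrements the TTL by adding 0xff. That
works if the addition is done on the 8-bit field with wraparound, since
adding 255 modulo 256 is the same as subtracting one. A minimal sketch under
that assumption; field_add_u8() is a hypothetical helper, not kernel or
iproute2 code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: add delta to an 8-bit field, wrapping modulo 256. */
static uint8_t field_add_u8(uint8_t cur, uint8_t delta)
{
	return (uint8_t)(cur + delta);
}
```

For a TTL of 64, field_add_u8(64, 0xff) yields 63. Note that under the same
assumption a TTL of 0 would wrap to 255, so a filter using this trick may
want to match on the TTL first.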


* [PATCH iproute2 net 3/8] tc/pedit: Introduce 'add' operation
From: Amir Vadai @ 2017-04-23 12:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Or Gerlitz, Jamal Hadi Salim, Amir Vadai
In-Reply-To: <20170423125356.1298-1-amir@vadai.me>

This command is useful for increasing or decreasing a field's value.

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 man/man8/tc-pedit.8 | 13 ++++++++++++-
 tc/m_pedit.c        | 18 +++++++++++++++---
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/man/man8/tc-pedit.8 b/man/man8/tc-pedit.8
index 761d5c8ee2d5..6bba741956f1 100644
--- a/man/man8/tc-pedit.8
+++ b/man/man8/tc-pedit.8
@@ -43,6 +43,8 @@ pedit - generic packet editor action
 .IR CMD_SPEC " := {"
 .BR clear " | " invert " | " set
 .IR VAL " | "
+.BR add
+.IR VAL " | "
 .BR preserve " } [ " retain
 .IR RVAL " ]"
 
@@ -63,7 +65,9 @@ only for IPv4 headers.
 .B ex
 Use extended pedit.
 .I EXTENDED_LAYERED_OP
-is allowed only in this mode.
+and the add
+.I CMD_SPEC
+are allowed only in this mode.
 .TP
 .BI offset " OFFSET " "\fR{ \fBu32 \fR| \fBu16 \fR| \fBu8 \fR}"
 Specify the offset at which to change data.
@@ -173,6 +177,13 @@ keywords in
 or the size of the addressed header field in
 .IR LAYERED_OP .
 .TP
+.BI add " VAL"
+Add a specific value to the addressed data. The size of
+.I VAL
+is defined by the size of the addressed header field in
+.IR EXTENDED_LAYERED_OP .
+This operation is supported only for extended layered op.
+.TP
 .B preserve
 Keep the addressed data as is.
 .TP
diff --git a/tc/m_pedit.c b/tc/m_pedit.c
index a26fd3e5bc5e..7af074a5a97c 100644
--- a/tc/m_pedit.c
+++ b/tc/m_pedit.c
@@ -41,7 +41,7 @@ static void explain(void)
 		"\t\tATC:= at <atval> offmask <maskval> shift <shiftval>\n"
 		"\t\tNOTE: offval is byte offset, must be multiple of 4\n"
 		"\t\tNOTE: maskval is a 32 bit hex number\n \t\tNOTE: shiftval is a shift value\n"
-		"\t\tCMD:= clear | invert | set <setval>| retain\n"
+		"\t\tCMD:= clear | invert | set <setval>| add <addval> | retain\n"
 		"\t<LAYERED>:= ip <ipdata> | ip6 <ip6data>\n"
 		" \t\t| udp <udpdata> | tcp <tcpdata> | icmp <icmpdata>\n"
 		"\tCONTROL:= reclassify | pipe | drop | continue | pass\n"
@@ -276,7 +276,16 @@ int parse_cmd(int *argc_p, char ***argv_p, __u32 len, int type, __u32 retain,
 
 	if (matches(*argv, "invert") == 0) {
 		val = mask = o;
-	} else if (matches(*argv, "set") == 0) {
+	} else if (matches(*argv, "set") == 0 ||
+		   matches(*argv, "add") == 0) {
+		if (matches(*argv, "add") == 0)
+			tkey->cmd = TCA_PEDIT_KEY_EX_CMD_ADD;
+
+		if (!sel->extended && tkey->cmd) {
+			fprintf(stderr, "Non extended mode. only 'set' command is supported\n");
+			return -1;
+		}
+
 		NEXT_ARG();
 		if (parse_val(&argc, &argv, &val, type))
 			return -1;
@@ -690,9 +699,11 @@ int print_pedit(struct action_util *au, FILE *f, struct rtattr *arg)
 		for (i = 0; i < sel->nkeys; i++, key++) {
 			enum pedit_header_type htype =
 				TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK;
+			enum pedit_cmd cmd = TCA_PEDIT_KEY_EX_CMD_SET;
 
 			if (keys_ex) {
 				htype = key_ex->htype;
+				cmd = key_ex->cmd;
 
 				key_ex++;
 			}
@@ -703,7 +714,8 @@ int print_pedit(struct action_util *au, FILE *f, struct rtattr *arg)
 
 			print_pedit_location(f, htype, key->off);
 
-			fprintf(f, ": val %08x mask %08x",
+			fprintf(f, ": %s %08x mask %08x",
+				cmd ? "add" : "val",
 				(unsigned int)ntohl(key->val),
 				(unsigned int)ntohl(key->mask));
 		}
-- 
2.12.0


* [PATCH iproute2 net 2/8] tc/pedit: Extend pedit to specify offset relative to mac/transport headers
From: Amir Vadai @ 2017-04-23 12:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Or Gerlitz, Jamal Hadi Salim, Amir Vadai
In-Reply-To: <20170423125356.1298-1-amir@vadai.me>

Utilize the extended pedit netlink interface to set an offset relative to a
specific header type. The old netlink interface only let the user set an
approximate offset relative to the IPv4 header.

To use this extended functionality, specify the 'ex' keyword after
'pedit' and before any 'munge', e.g.:
$ tc filter add dev ens9 protocol ip parent ffff: \
    flower \
      ip_proto udp \
      dst_port 80 \
    action pedit ex munge \
      ip dst set 1.1.1.1 \
      pipe \
    action mirred egress redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 man/man8/tc-pedit.8 |  41 +++++++---
 tc/m_pedit.c        | 213 +++++++++++++++++++++++++++++++++++++++++++++-------
 tc/m_pedit.h        |  43 ++++++++---
 tc/p_icmp.c         |   3 +-
 tc/p_ip.c           |  15 +++-
 tc/p_tcp.c          |   3 +-
 tc/p_udp.c          |   3 +-
 7 files changed, 270 insertions(+), 51 deletions(-)

diff --git a/man/man8/tc-pedit.8 b/man/man8/tc-pedit.8
index c34520c046a6..761d5c8ee2d5 100644
--- a/man/man8/tc-pedit.8
+++ b/man/man8/tc-pedit.8
@@ -5,8 +5,8 @@ pedit - generic packet editor action
 .SH SYNOPSIS
 .in +8
 .ti -8
-.BR tc " ... " "action pedit munge " {
-.IR RAW_OP " | " LAYERED_OP " } [ " CONTROL " ]"
+.BR tc " ... " "action pedit [ex] munge " {
+.IR RAW_OP " | " LAYERED_OP " | " EXTENDED_LAYERED_OP " } [ " CONTROL " ]"
 
 .ti -8
 .IR RAW_OP " := "
@@ -22,20 +22,22 @@ pedit - generic packet editor action
 .IR LAYERED_OP " := { "
 .BI ip " IPHDR_FIELD"
 |
-.BI ip6 " IP6HDR_FIELD"
-|
-.BI udp " UDPHDR_FIELD"
-|
-.BI tcp " TCPHDR_FIELD"
-|
-.BI icmp " ICMPHDR_FIELD"
+.BI ip " BEYOND_IPHDR_FIELD"
+.RI } " CMD_SPEC"
+
+.ti -8
+.IR EXTENDED_LAYERED_OP " := { "
+.BI ip " IPHDR_FIELD"
 .RI } " CMD_SPEC"
 
 .ti -8
 .IR IPHDR_FIELD " := { "
 .BR src " | " dst " | " tos " | " dsfield " | " ihl " | " protocol " |"
-.BR precedence " | " nofrag " | " firstfrag " | " ce " | " df " |"
-.BR mf " | " dport " | " sport " | " icmp_type " | " icmp_code " }"
+.BR precedence " | " nofrag " | " firstfrag " | " ce " | " df " }"
+
+.ti -8
+.IR BEYOND_IPHDR_FIELD " := { "
+.BR dport " | " sport " | " icmp_type " | " icmp_code " }"
 
 .ti -8
 .IR CMD_SPEC " := {"
@@ -58,6 +60,11 @@ chosen automatically based on the header field size. Currently this is supported
 only for IPv4 headers.
 .SH OPTIONS
 .TP
+.B ex
+Use extended pedit.
+.I EXTENDED_LAYERED_OP
+is allowed only in this mode.
+.TP
 .BI offset " OFFSET " "\fR{ \fBu32 \fR| \fBu16 \fR| \fBu8 \fR}"
 Specify the offset at which to change data.
 .I OFFSET
@@ -123,6 +130,15 @@ Change IP header flags. Note that the value to pass to the
 .B set
 command is not just a bit value, but the full byte including the flags field.
 Though only the relevant bits of that value are respected, the rest ignored.
+.RE
+.TP
+.BI ip " BEYOND_IPHDR_FIELD"
+Supported only for non-extended layered op. It is passed to the kernel as
+offsets relative to the beginning of the IP header and assumes the IP header is
+of minimum size (20 bytes). The supported keywords for
+.I BEYOND_IPHDR_FIELD
+are:
+.RS
 .TP
 .B dport
 .TQ
@@ -222,6 +238,9 @@ tc filter add dev eth0 parent 1: u32 \\
 tc filter add dev eth0 parent ffff: u32 \\
 	match ip sport 22 0xffff \\
 	action pedit pedit munge ip sport set 23
+tc filter add dev eth0 parent ffff: u32 \\
+	match ip sport 22 0xffff \\
+	action pedit ex munge ip dst set 192.168.1.199
 .EE
 .RE
 .SH SEE ALSO
diff --git a/tc/m_pedit.c b/tc/m_pedit.c
index 939a6a1455a5..a26fd3e5bc5e 100644
--- a/tc/m_pedit.c
+++ b/tc/m_pedit.c
@@ -34,7 +34,7 @@ static int pedit_debug;
 
 static void explain(void)
 {
-	fprintf(stderr, "Usage: ... pedit munge <MUNGE> [CONTROL]\n");
+	fprintf(stderr, "Usage: ... pedit munge [ex] <MUNGE> [CONTROL]\n");
 	fprintf(stderr,
 		"Where: MUNGE := <RAW>|<LAYERED>\n"
 		"\t<RAW>:= <OFFSETC>[ATC]<CMD>\n \t\tOFFSETC:= offset <offval> <u8|u16|u32>\n"
@@ -45,6 +45,7 @@ static void explain(void)
 		"\t<LAYERED>:= ip <ipdata> | ip6 <ip6data>\n"
 		" \t\t| udp <udpdata> | tcp <tcpdata> | icmp <icmpdata>\n"
 		"\tCONTROL:= reclassify | pipe | drop | continue | pass\n"
+		"\tNOTE: if 'ex' is set, extended functionality will be supported (kernel >= 4.11)\n"
 		"For Example usage look at the examples directory\n");
 
 }
@@ -56,8 +57,8 @@ static void usage(void)
 }
 
 static int pedit_parse_nopopt(int *argc_p, char ***argv_p,
-			      struct tc_pedit_sel *sel,
-			      struct tc_pedit_key *tkey)
+			      struct m_pedit_sel *sel,
+			      struct m_pedit_key *tkey)
 {
 	int argc = *argc_p;
 	char **argv = *argv_p;
@@ -116,8 +117,10 @@ noexist:
 	return p;
 }
 
-int pack_key(struct tc_pedit_sel *sel, struct tc_pedit_key *tkey)
+int pack_key(struct m_pedit_sel *_sel, struct m_pedit_key *tkey)
 {
+	struct tc_pedit_sel *sel = &_sel->sel;
+	struct m_pedit_key_ex *keys_ex = _sel->keys_ex;
 	int hwm = sel->nkeys;
 
 	if (hwm >= MAX_OFFS)
@@ -134,12 +137,24 @@ int pack_key(struct tc_pedit_sel *sel, struct tc_pedit_key *tkey)
 	sel->keys[hwm].at = tkey->at;
 	sel->keys[hwm].offmask = tkey->offmask;
 	sel->keys[hwm].shift = tkey->shift;
+
+	if (_sel->extended) {
+		keys_ex[hwm].htype = tkey->htype;
+		keys_ex[hwm].cmd = tkey->cmd;
+	} else {
+		if (tkey->htype != TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK ||
+		    tkey->cmd != TCA_PEDIT_KEY_EX_CMD_SET) {
+			fprintf(stderr, "Munge parameters not supported. Use 'munge ex'.\n");
+			return -1;
+		}
+	}
+
 	sel->nkeys++;
 	return 0;
 }
 
-int pack_key32(__u32 retain, struct tc_pedit_sel *sel,
-	       struct tc_pedit_key *tkey)
+int pack_key32(__u32 retain, struct m_pedit_sel *sel,
+	       struct m_pedit_key *tkey)
 {
 	if (tkey->off > (tkey->off & ~3)) {
 		fprintf(stderr,
@@ -152,8 +167,8 @@ int pack_key32(__u32 retain, struct tc_pedit_sel *sel,
 	return pack_key(sel, tkey);
 }
 
-int pack_key16(__u32 retain, struct tc_pedit_sel *sel,
-	       struct tc_pedit_key *tkey)
+int pack_key16(__u32 retain, struct m_pedit_sel *sel,
+	       struct m_pedit_key *tkey)
 {
 	int ind, stride;
 	__u32 m[4] = { 0x0000FFFF, 0xFF0000FF, 0xFFFF0000 };
@@ -183,7 +198,7 @@ int pack_key16(__u32 retain, struct tc_pedit_sel *sel,
 
 }
 
-int pack_key8(__u32 retain, struct tc_pedit_sel *sel, struct tc_pedit_key *tkey)
+int pack_key8(__u32 retain, struct m_pedit_sel *sel, struct m_pedit_key *tkey)
 {
 	int ind, stride;
 	__u32 m[4] = { 0x00FFFFFF, 0xFF00FFFF, 0xFFFF00FF, 0xFFFFFF00 };
@@ -239,7 +254,7 @@ int parse_val(int *argc_p, char ***argv_p, __u32 *val, int type)
 }
 
 int parse_cmd(int *argc_p, char ***argv_p, __u32 len, int type, __u32 retain,
-	      struct tc_pedit_sel *sel, struct tc_pedit_key *tkey)
+	      struct m_pedit_sel *sel, struct m_pedit_key *tkey)
 {
 	__u32 mask = 0, val = 0;
 	__u32 o = 0xFF;
@@ -313,8 +328,8 @@ done:
 
 }
 
-int parse_offset(int *argc_p, char ***argv_p, struct tc_pedit_sel *sel,
-		 struct tc_pedit_key *tkey)
+int parse_offset(int *argc_p, char ***argv_p, struct m_pedit_sel *sel,
+		 struct m_pedit_key *tkey)
 {
 	int off;
 	__u32 len, retain;
@@ -389,9 +404,9 @@ done:
 	return res;
 }
 
-static int parse_munge(int *argc_p, char ***argv_p, struct tc_pedit_sel *sel)
+static int parse_munge(int *argc_p, char ***argv_p, struct m_pedit_sel *sel)
 {
-	struct tc_pedit_key tkey = {};
+	struct m_pedit_key tkey = {};
 	int argc = *argc_p;
 	char **argv = *argv_p;
 	int res = -1;
@@ -433,13 +448,69 @@ done:
 	return res;
 }
 
+static int pedit_keys_ex_getattr(struct rtattr *attr,
+				 struct m_pedit_key_ex *keys_ex, int n)
+{
+	struct rtattr *i;
+	int rem = RTA_PAYLOAD(attr);
+	struct rtattr *tb[TCA_PEDIT_KEY_EX_MAX + 1];
+	struct m_pedit_key_ex *k = keys_ex;
+
+	for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
+		if (!n)
+			return -1;
+
+		if (i->rta_type != TCA_PEDIT_KEY_EX)
+			return -1;
+
+		parse_rtattr_nested(tb, TCA_PEDIT_KEY_EX_MAX, i);
+
+		k->htype = rta_getattr_u16(tb[TCA_PEDIT_KEY_EX_HTYPE]);
+		k->cmd = rta_getattr_u16(tb[TCA_PEDIT_KEY_EX_CMD]);
+
+		k++;
+		n--;
+	}
+
+	return !!n;
+}
+
+static int pedit_keys_ex_addattr(struct m_pedit_sel *sel, struct nlmsghdr *n)
+{
+	struct m_pedit_key_ex *k = sel->keys_ex;
+	struct rtattr *keys_start;
+	int i;
+
+	if (!sel->extended)
+		return 0;
+
+	keys_start = addattr_nest(n, MAX_MSG, TCA_PEDIT_KEYS_EX | NLA_F_NESTED);
+
+	for (i = 0; i < sel->sel.nkeys; i++) {
+		struct rtattr *key_start;
+
+		key_start = addattr_nest(n, MAX_MSG,
+					 TCA_PEDIT_KEY_EX | NLA_F_NESTED);
+
+		if (addattr16(n, MAX_MSG, TCA_PEDIT_KEY_EX_HTYPE, k->htype) ||
+		    addattr16(n, MAX_MSG, TCA_PEDIT_KEY_EX_CMD, k->cmd)) {
+			return -1;
+		}
+
+		addattr_nest_end(n, key_start);
+
+		k++;
+	}
+
+	addattr_nest_end(n, keys_start);
+
+	return 0;
+}
+
 int parse_pedit(struct action_util *a, int *argc_p, char ***argv_p, int tca_id,
 		struct nlmsghdr *n)
 {
-	struct {
-		struct tc_pedit_sel sel;
-		struct tc_pedit_key keys[MAX_OFFS];
-	} sel = {};
+	struct m_pedit_sel sel = {};
 
 	int argc = *argc_p;
 	char **argv = *argv_p;
@@ -452,6 +523,17 @@ int parse_pedit(struct action_util *a, int *argc_p, char ***argv_p, int tca_id,
 		if (matches(*argv, "pedit") == 0) {
 			NEXT_ARG();
 			ok++;
+
+			if (matches(*argv, "ex") == 0) {
+				if (ok > 1) {
+					fprintf(stderr, "'ex' must be before first 'munge'\n");
+					explain();
+					return -1;
+				}
+				sel.extended = true;
+				NEXT_ARG();
+			}
+
 			continue;
 		} else if (matches(*argv, "help") == 0) {
 			usage();
@@ -463,7 +545,8 @@ int parse_pedit(struct action_util *a, int *argc_p, char ***argv_p, int tca_id,
 				return -1;
 			}
 			NEXT_ARG();
-			if (parse_munge(&argc, &argv, &sel.sel)) {
+
+			if (parse_munge(&argc, &argv, &sel)) {
 				fprintf(stderr, "Bad pedit construct (%s)\n",
 					*argv);
 				explain();
@@ -499,9 +582,18 @@ int parse_pedit(struct action_util *a, int *argc_p, char ***argv_p, int tca_id,
 
 	tail = NLMSG_TAIL(n);
 	addattr_l(n, MAX_MSG, tca_id, NULL, 0);
-	addattr_l(n, MAX_MSG, TCA_PEDIT_PARMS, &sel,
-		  sizeof(sel.sel) +
-		  sel.sel.nkeys * sizeof(struct tc_pedit_key));
+	if (!sel.extended) {
+		addattr_l(n, MAX_MSG, TCA_PEDIT_PARMS, &sel,
+			  sizeof(sel.sel) +
+			  sel.sel.nkeys * sizeof(struct tc_pedit_key));
+	} else {
+		addattr_l(n, MAX_MSG, TCA_PEDIT_PARMS_EX, &sel,
+			  sizeof(sel.sel) +
+			  sel.sel.nkeys * sizeof(struct tc_pedit_key));
+
+		pedit_keys_ex_addattr(&sel, n);
+	}
+
 	tail->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail;
 
 	*argc_p = argc;
@@ -509,21 +601,74 @@ int parse_pedit(struct action_util *a, int *argc_p, char ***argv_p, int tca_id,
 	return 0;
 }
 
+const char *pedit_htype_str[] = {
+	[TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK] = "",
+	[TCA_PEDIT_KEY_EX_HDR_TYPE_ETH] = "eth",
+	[TCA_PEDIT_KEY_EX_HDR_TYPE_IP4] = "ipv4",
+	[TCA_PEDIT_KEY_EX_HDR_TYPE_IP6] = "ipv6",
+	[TCA_PEDIT_KEY_EX_HDR_TYPE_TCP] = "tcp",
+	[TCA_PEDIT_KEY_EX_HDR_TYPE_UDP] = "udp",
+};
+
+static void print_pedit_location(FILE *f,
+				 enum pedit_header_type htype, __u32 off)
+{
+	if (htype == TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK) {
+		fprintf(f, "%d", (unsigned int)off);
+		return;
+	}
+
+	if (htype < ARRAY_SIZE(pedit_htype_str))
+		fprintf(f, "%s", pedit_htype_str[htype]);
+	else
+		fprintf(f, "unknown(%d)", htype);
+
+	fprintf(f, "%c%d", (int)off  >= 0 ? '+' : '-', abs((int)off));
+}
+
 int print_pedit(struct action_util *au, FILE *f, struct rtattr *arg)
 {
 	struct tc_pedit_sel *sel;
 	struct rtattr *tb[TCA_PEDIT_MAX + 1];
+	struct m_pedit_key_ex *keys_ex = NULL;
 
 	if (arg == NULL)
 		return -1;
 
 	parse_rtattr_nested(tb, TCA_PEDIT_MAX, arg);
 
-	if (tb[TCA_PEDIT_PARMS] == NULL) {
+	if (!tb[TCA_PEDIT_PARMS] && !tb[TCA_PEDIT_PARMS_EX]) {
 		fprintf(f, "[NULL pedit parameters]");
 		return -1;
 	}
-	sel = RTA_DATA(tb[TCA_PEDIT_PARMS]);
+
+	if (tb[TCA_PEDIT_PARMS]) {
+		sel = RTA_DATA(tb[TCA_PEDIT_PARMS]);
+	} else {
+		int err;
+
+		sel = RTA_DATA(tb[TCA_PEDIT_PARMS_EX]);
+
+		if (!tb[TCA_PEDIT_KEYS_EX]) {
+			fprintf(f, "Netlink error\n");
+			return -1;
+		}
+
+		keys_ex = calloc(sel->nkeys, sizeof(*keys_ex));
+		if (!keys_ex) {
+			fprintf(f, "Out of memory\n");
+			return -1;
+		}
+
+		err = pedit_keys_ex_getattr(tb[TCA_PEDIT_KEYS_EX], keys_ex,
+					    sel->nkeys);
+		if (err) {
+			fprintf(f, "Netlink error\n");
+
+			free(keys_ex);
+			return -1;
+		}
+	}
 
 	fprintf(f, " pedit action %s keys %d\n ",
 		action_n2a(sel->action), sel->nkeys);
@@ -540,11 +685,25 @@ int print_pedit(struct action_util *au, FILE *f, struct rtattr *arg)
 	if (sel->nkeys) {
 		int i;
 		struct tc_pedit_key *key = sel->keys;
+		struct m_pedit_key_ex *key_ex = keys_ex;
 
 		for (i = 0; i < sel->nkeys; i++, key++) {
+			enum pedit_header_type htype =
+				TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK;
+
+			if (keys_ex) {
+				htype = key_ex->htype;
+
+				key_ex++;
+			}
+
 			fprintf(f, "\n\t key #%d", i);
-			fprintf(f, "  at %d: val %08x mask %08x",
-				(unsigned int)key->off,
+
+			fprintf(f, "  at ");
+
+			print_pedit_location(f, htype, key->off);
+
+			fprintf(f, ": val %08x mask %08x",
 				(unsigned int)ntohl(key->val),
 				(unsigned int)ntohl(key->mask));
 		}
@@ -554,6 +713,8 @@ int print_pedit(struct action_util *au, FILE *f, struct rtattr *arg)
 	}
 
 	fprintf(f, "\n ");
+
+	free(keys_ex);
 	return 0;
 }
 
diff --git a/tc/m_pedit.h b/tc/m_pedit.h
index 1698c954e999..e2897b0c9808 100644
--- a/tc/m_pedit.h
+++ b/tc/m_pedit.h
@@ -39,22 +39,47 @@
 
 #define PEDITKINDSIZ 16
 
+struct m_pedit_key {
+	__u32           mask;  /* AND */
+	__u32           val;   /*XOR */
+	__u32           off;  /*offset */
+	__u32           at;
+	__u32           offmask;
+	__u32           shift;
+
+	enum pedit_header_type htype;
+	enum pedit_cmd cmd;
+};
+
+struct m_pedit_key_ex {
+	enum pedit_header_type htype;
+	enum pedit_cmd cmd;
+};
+
+struct m_pedit_sel {
+	struct tc_pedit_sel sel;
+	struct tc_pedit_key keys[MAX_OFFS];
+	struct m_pedit_key_ex keys_ex[MAX_OFFS];
+	bool extended;
+};
+
 struct m_pedit_util
 {
 	struct m_pedit_util *next;
 	char    id[PEDITKINDSIZ];
-	int     (*parse_peopt)(int *argc_p, char ***argv_p,struct tc_pedit_sel *sel,struct tc_pedit_key *tkey);
+	int     (*parse_peopt)(int *argc_p, char ***argv_p,
+			       struct m_pedit_sel *sel, struct m_pedit_key *tkey);
 };
 
-
-extern int parse_cmd(int *argc_p, char ***argv_p, __u32 len, int type,__u32 retain,struct tc_pedit_sel *sel,struct tc_pedit_key *tkey);
-extern int pack_key(struct tc_pedit_sel *sel,struct tc_pedit_key *tkey);
-extern int pack_key32(__u32 retain,struct tc_pedit_sel *sel,struct tc_pedit_key *tkey);
-extern int pack_key16(__u32 retain,struct tc_pedit_sel *sel,struct tc_pedit_key *tkey);
-extern int pack_key8(__u32 retain,struct tc_pedit_sel *sel,struct tc_pedit_key *tkey);
+extern int pack_key(struct m_pedit_sel *sel, struct m_pedit_key *tkey);
+extern int pack_key32(__u32 retain, struct m_pedit_sel *sel, struct m_pedit_key *tkey);
+extern int pack_key16(__u32 retain, struct m_pedit_sel *sel, struct m_pedit_key *tkey);
+extern int pack_key8(__u32 retain, struct m_pedit_sel *sel, struct m_pedit_key *tkey);
 extern int parse_val(int *argc_p, char ***argv_p, __u32 * val, int type);
-extern int parse_cmd(int *argc_p, char ***argv_p, __u32 len, int type,__u32 retain,struct tc_pedit_sel *sel,struct tc_pedit_key *tkey);
-extern int parse_offset(int *argc_p, char ***argv_p,struct tc_pedit_sel *sel,struct tc_pedit_key *tkey);
+extern int parse_cmd(int *argc_p, char ***argv_p, __u32 len, int type, __u32 retain,
+		     struct m_pedit_sel *sel, struct m_pedit_key *tkey);
+extern int parse_offset(int *argc_p, char ***argv_p,
+			struct m_pedit_sel *sel, struct m_pedit_key *tkey);
 int parse_pedit(struct action_util *a, int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n);
 extern int print_pedit(struct action_util *au,FILE * f, struct rtattr *arg);
 extern int pedit_print_xstats(struct action_util *au, FILE *f, struct rtattr *xstats);
diff --git a/tc/p_icmp.c b/tc/p_icmp.c
index c2a6fcd69817..1c3a5d90006d 100644
--- a/tc/p_icmp.c
+++ b/tc/p_icmp.c
@@ -25,7 +25,8 @@
 
 
 static int
-parse_icmp(int *argc_p, char ***argv_p, struct tc_pedit_sel *sel, struct tc_pedit_key *tkey)
+parse_icmp(int *argc_p, char ***argv_p,
+	   struct m_pedit_sel *sel, struct m_pedit_key *tkey)
 {
 	int res = -1;
 #if 0
diff --git a/tc/p_ip.c b/tc/p_ip.c
index 535151e5d766..e56eb39317ba 100644
--- a/tc/p_ip.c
+++ b/tc/p_ip.c
@@ -25,7 +25,7 @@
 
 static int
 parse_ip(int *argc_p, char ***argv_p,
-	 struct tc_pedit_sel *sel, struct tc_pedit_key *tkey)
+	 struct m_pedit_sel *sel, struct m_pedit_key *tkey)
 {
 	int res = -1;
 	int argc = *argc_p;
@@ -34,6 +34,10 @@ parse_ip(int *argc_p, char ***argv_p,
 	if (argc < 2)
 		return -1;
 
+	tkey->htype = sel->extended ?
+		TCA_PEDIT_KEY_EX_HDR_TYPE_IP4 :
+		TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK;
+
 	if (strcmp(*argv, "src") == 0) {
 		NEXT_ARG();
 		tkey->off = 12;
@@ -107,6 +111,13 @@ parse_ip(int *argc_p, char ***argv_p,
 		res = parse_cmd(&argc, &argv, 1, TU32, 0x20, sel, tkey);
 		goto done;
 	}
+
+	if (sel->extended)
+		return -1; /* fields located outside IP header should be
+			    * addressed using the relevant header type in
+			    * extended pedit kABI
+			    */
+
 	if (strcmp(*argv, "dport") == 0) {
 		NEXT_ARG();
 		tkey->off = 22;
@@ -141,7 +152,7 @@ done:
 
 static int
 parse_ip6(int *argc_p, char ***argv_p,
-	  struct tc_pedit_sel *sel, struct tc_pedit_key *tkey)
+	  struct m_pedit_sel *sel, struct m_pedit_key *tkey)
 {
 	int res = -1;
 	return res;
diff --git a/tc/p_tcp.c b/tc/p_tcp.c
index 79f16c58cad6..53ee9842160b 100644
--- a/tc/p_tcp.c
+++ b/tc/p_tcp.c
@@ -24,7 +24,8 @@
 #include "m_pedit.h"
 
 static int
-parse_tcp(int *argc_p, char ***argv_p, struct tc_pedit_sel *sel, struct tc_pedit_key *tkey)
+parse_tcp(int *argc_p, char ***argv_p,
+	  struct m_pedit_sel *sel, struct m_pedit_key *tkey)
 {
 	int res = -1;
 	return res;
diff --git a/tc/p_udp.c b/tc/p_udp.c
index c056414e6eb3..3a86ba382391 100644
--- a/tc/p_udp.c
+++ b/tc/p_udp.c
@@ -24,7 +24,8 @@
 #include "m_pedit.h"
 
 static int
-parse_udp(int *argc_p, char ***argv_p, struct tc_pedit_sel *sel, struct tc_pedit_key *tkey)
+parse_udp(int *argc_p, char ***argv_p,
+	  struct m_pedit_sel *sel, struct m_pedit_key *tkey)
 {
 	int res = -1;
 	return res;
-- 
2.12.0

^ permalink raw reply related

* [PATCH iproute2 net 1/8] tc/pedit: Fix a typo in pedit usage message
From: Amir Vadai @ 2017-04-23 12:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Or Gerlitz, Jamal Hadi Salim, Amir Vadai
In-Reply-To: <20170423125356.1298-1-amir@vadai.me>

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 tc/m_pedit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/m_pedit.c b/tc/m_pedit.c
index 8e9bf0720dfe..939a6a1455a5 100644
--- a/tc/m_pedit.c
+++ b/tc/m_pedit.c
@@ -40,7 +40,7 @@ static void explain(void)
 		"\t<RAW>:= <OFFSETC>[ATC]<CMD>\n \t\tOFFSETC:= offset <offval> <u8|u16|u32>\n"
 		"\t\tATC:= at <atval> offmask <maskval> shift <shiftval>\n"
 		"\t\tNOTE: offval is byte offset, must be multiple of 4\n"
-		"\t\tNOTE: maskval is a 32 bit hex number\n \t\tNOTE: shiftval is a is a shift value\n"
+		"\t\tNOTE: maskval is a 32 bit hex number\n \t\tNOTE: shiftval is a shift value\n"
 		"\t\tCMD:= clear | invert | set <setval>| retain\n"
 		"\t<LAYERED>:= ip <ipdata> | ip6 <ip6data>\n"
 		" \t\t| udp <udpdata> | tcp <tcpdata> | icmp <icmpdata>\n"
-- 
2.12.0

^ permalink raw reply related

* [PATCH iproute2 net 0/8] tc/act_pedit: Support offset relative to conventional header
From: Amir Vadai @ 2017-04-23 12:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Or Gerlitz, Jamal Hadi Salim, Amir Vadai

Hi Stephen,

This patchset extends pedit to support modifying a field at an offset relative
to the conventional network headers (kernel support was added [1] in 4.11-rc1).
Without the extended pedit, the user could specify fields in TCP and ICMP headers,
but the kernel code used an offset relative to the beginning of the IP
header. This breaks if the IP header length is greater than the minimal value
of 20 bytes, or if L3 is not IPv4.
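
To illustrate the breakage, here is a minimal C sketch (not part of the
patchset; dport_offset() is a hypothetical helper) that computes where the L4
destination port really sits relative to the start of the IPv4 header. The
legacy 'ip dport' keyword uses a fixed offset of 22, which only holds when the
IP header carries no options:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: byte offset of the L4
 * destination port, measured from the first byte of the IPv4 header.
 * version_ihl is the first byte of the IPv4 header.
 */
unsigned int dport_offset(uint8_t version_ihl)
{
	unsigned int ihl_words = version_ihl & 0x0f;	/* low nibble is IHL */

	assert(ihl_words >= 5);		/* 20 bytes is the minimal IPv4 header */
	return ihl_words * 4 + 2;	/* dport follows the 2-byte sport */
}
```

With version_ihl 0x45 (no options) this yields 22. With 0x47 (8 bytes of
options) it yields 30, so a fixed offset of 22 would land inside the IP
options instead of the port.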

It also introduces support for manipulating ETH, TCP, UDP and IP.ttl fields and
a new command to increase/decrease the value of a field (the current use case is IP.ttl).
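
A rough C sketch of how a 'set' and an 'add' key could be applied to one
32-bit word. It assumes the AND-mask / XOR-value semantics noted in the
comments of struct m_pedit_key, and it assumes that 'add' sums the old word
with the value before masking. pedit_apply() and the enum are hypothetical
names, byte order is ignored, and the kernel code remains the authority:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the pedit commands, for illustration only */
enum sketch_cmd { SKETCH_CMD_SET, SKETCH_CMD_ADD };

/*
 * mask selects the bits to keep from the old word (AND), val is then
 * XORed in. For 'add' (assumed behaviour) the new bits are the sum of
 * the old word and val, limited to the bits that are not kept.
 */
uint32_t pedit_apply(uint32_t word, uint32_t mask, uint32_t val,
		     enum sketch_cmd cmd)
{
	if (cmd == SKETCH_CMD_ADD)
		val = (word + val) & ~mask;

	return (word & mask) ^ val;
}
```

Under these assumptions, adding 0xff to a single byte acts as a decrement of
that byte, which is how a TTL decrease could be expressed.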

Since there might be deployments already using pedit, special care was
taken not to break existing scripts: the extended capabilities are available
only when the special keyword 'ex' is specified, so there should be no impact
on existing scripts.
Also, the new code can coexist with rules added by the old code. It
supports both the old netlink and the new one.

This patchset is against master and not net-next, as the functionality was
added in 4.11.

Thanks,
Amir

[1] - 71d0ed7079df ("net/act_pedit: Support using offset relative to the
                     conventional network headers")

Amir Vadai (7):
  tc/pedit: Fix a typo in pedit usage message
  tc/pedit: Extend pedit to specify offset relative to mac/transport
    headers
  tc/pedit: Introduce 'add' operation
  tc/pedit: p_ip: introduce editing ttl header
  tc/pedit: Support fields bigger than 32 bits
  tc/pedit: p_eth: ETH header editor
  tc/pedit: p_tcp: introduce pedit tcp support

Or Gerlitz (1):
  tc/pedit: p_udp: introduce pedit udp support

 man/man8/tc-pedit.8 | 126 +++++++++++++++++++++--
 tc/Makefile         |   1 +
 tc/m_pedit.c        | 290 ++++++++++++++++++++++++++++++++++++++++++++++------
 tc/m_pedit.h        |  44 ++++++--
 tc/p_eth.c          |  72 +++++++++++++
 tc/p_icmp.c         |   3 +-
 tc/p_ip.c           |  21 +++-
 tc/p_tcp.c          |  40 +++++++-
 tc/p_udp.c          |  30 +++++-
 9 files changed, 572 insertions(+), 55 deletions(-)
 create mode 100644 tc/p_eth.c

-- 
2.12.0

^ permalink raw reply

* [PATCH 1/1] qlcnic: fix unchecked return value
From: Pan Bian @ 2017-04-23 12:04 UTC (permalink / raw)
  To: Harish Patil, Manish Chopra, Dept-GELinuxNICDev
  Cc: netdev, linux-kernel, Pan Bian

From: Pan Bian <bianpan2016@163.com>

Function pci_find_ext_capability() returns 0 when the capability is not
found, and 0 is not a valid capability offset. In function
qlcnic_sriov_virtid_fn(), its return value is used without validation.
This may result in invalid memory access bugs. This patch fixes the bug
by returning 0 when the capability is missing.
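
For readers unfamiliar with the convention, a user-space sketch of an extended
capability lookup over a 4096-byte config space image shows why 0 means "not
found". find_ext_cap() is a hypothetical stand-in for
pci_find_ext_capability(), not the kernel implementation. It assumes the PCIe
extended capability header layout (ID in bits 15:0, next pointer in bits
31:20, list starting at offset 0x100):

```c
#include <stdint.h>

#define CFG_SPACE_SIZE		4096u
#define EXT_CAP_START		0x100u
#define EXT_CAP_ID_SRIOV	0x0010u

/*
 * Hypothetical helper, for illustration only. cfg points to 1024 dwords
 * of config space. Returns the byte offset of the capability, or 0 if
 * the capability is absent, so callers must check for 0 before use.
 */
unsigned int find_ext_cap(const uint32_t *cfg, uint16_t cap_id)
{
	unsigned int pos = EXT_CAP_START;
	int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8;	/* loop guard */

	while (ttl-- > 0) {
		uint32_t hdr = cfg[pos / 4];

		if (hdr == 0)
			return 0;		/* empty list */
		if ((hdr & 0xffff) == cap_id)
			return pos;
		pos = (hdr >> 20) & 0xffc;	/* next capability */
		if (pos < EXT_CAP_START)
			return 0;		/* end of list */
	}
	return 0;
}
```

A caller that adds a register offset to a returned 0 would read an unrelated
part of the standard config header, which is what the check in the patch
avoids.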

Signed-off-by: Pan Bian <bianpan2016@163.com>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
index d710705..2f656f3 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
@@ -128,6 +128,8 @@ static int qlcnic_sriov_virtid_fn(struct qlcnic_adapter *adapter, int vf_id)
 		return 0;
 
 	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
+	if (!pos)
+		return 0;
 	pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, &offset);
 	pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, &stride);
 
-- 
1.9.1

^ permalink raw reply related

* [net 1/7] net/mlx5: Fix driver load bad flow when having fw initializing timeout
From: Saeed Mahameed @ 2017-04-23 10:07 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Mohamad Haj Yahia, Saeed Mahameed, kernel-team
In-Reply-To: <20170423100802.27630-1-saeedm@mellanox.com>

From: Mohamad Haj Yahia <mohamad@mellanox.com>

If the FW is stuck in the initializing state, we skip the driver load, but
the current error handling flow doesn't clean up the previously allocated
command interface resources.

Fixes: e3297246c2c8 ('net/mlx5_core: Wait for FW readiness on startup')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 60154a175bd3..0ad66324247f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1029,7 +1029,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	if (err) {
 		dev_err(&dev->pdev->dev, "Firmware over %d MS in initializing state, aborting\n",
 			FW_INIT_TIMEOUT_MILI);
-		goto out_err;
+		goto err_cmd_cleanup;
 	}

 	err = mlx5_core_enable_hca(dev, 0);
-- 
2.11.0

^ permalink raw reply related

* [net 2/7] net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5
From: Saeed Mahameed @ 2017-04-23 10:07 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Saeed Mahameed
In-Reply-To: <20170423100802.27630-1-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

On ConnectX-5 the wqe inline mode is "none" and hence the FW
reports MLX5_CAP_INLINE_MODE_NOT_REQUIRED.

Fix our devlink callbacks to deal with that on get and set.

Also fix the tc flow parsing code not to fail anything when
inline isn't required.
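
The tc flow parsing change can be sketched in plain C as below. The enum and
inline_mode_ok() are hypothetical names for illustration, and the ordering
NONE < L2 < IP < TCP_UDP is an assumption of this sketch. With the old
comparison (min_inline > inline_mode) a device in "none" mode would reject
every flow that needs any inlining, since "none" compares lowest:

```c
/* Hypothetical mirror of the inline modes; ordering is assumed */
enum sketch_inline_mode {
	SK_INLINE_NONE,
	SK_INLINE_L2,
	SK_INLINE_IP,
	SK_INLINE_TCP_UDP,
};

/*
 * Returns 1 if a flow requiring min_inline may be offloaded when the
 * e-switch is configured with mode. "None" means inlining is not
 * required by the device, so no minimum applies.
 */
int inline_mode_ok(enum sketch_inline_mode mode,
		   enum sketch_inline_mode min_inline)
{
	if (mode != SK_INLINE_NONE && mode < min_inline)
		return 0;
	return 1;
}
```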

Fixes: bffaa916588e ('net/mlx5: E-Switch, Add control for inline mode')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  3 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 36 ++++++++++++++--------
 2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index fade7233dac5..b7c99c38a7c4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -639,7 +639,8 @@ static int parse_cls_flower(struct mlx5e_priv *priv,
 
 	if (!err && (flow->flags & MLX5E_TC_FLOW_ESWITCH) &&
 	    rep->vport != FDB_UPLINK_VPORT) {
-		if (min_inline > esw->offloads.inline_mode) {
+		if (esw->offloads.inline_mode != MLX5_INLINE_MODE_NONE &&
+		    esw->offloads.inline_mode < min_inline) {
 			netdev_warn(priv->netdev,
 				    "Flow is not offloaded due to min inline setting, required %d actual %d\n",
 				    min_inline, esw->offloads.inline_mode);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 307ec6c5fd3b..d111cebca9f1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -911,8 +911,7 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode)
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
 	struct mlx5_eswitch *esw = dev->priv.eswitch;
 	int num_vports = esw->enabled_vports;
-	int err;
-	int vport;
+	int err, vport;
 	u8 mlx5_mode;
 
 	if (!MLX5_CAP_GEN(dev, vport_group_manager))
@@ -921,9 +920,17 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode)
 	if (esw->mode == SRIOV_NONE)
 		return -EOPNOTSUPP;
 
-	if (MLX5_CAP_ETH(dev, wqe_inline_mode) !=
-	    MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
+	switch (MLX5_CAP_ETH(dev, wqe_inline_mode)) {
+	case MLX5_CAP_INLINE_MODE_NOT_REQUIRED:
+		if (mode == DEVLINK_ESWITCH_INLINE_MODE_NONE)
+			return 0;
+		/* fall through */
+	case MLX5_CAP_INLINE_MODE_L2:
+		esw_warn(dev, "Inline mode can't be set\n");
 		return -EOPNOTSUPP;
+	case MLX5_CAP_INLINE_MODE_VPORT_CONTEXT:
+		break;
+	}
 
 	if (esw->offloads.num_flows > 0) {
 		esw_warn(dev, "Can't set inline mode when flows are configured\n");
@@ -966,18 +973,14 @@ int mlx5_devlink_eswitch_inline_mode_get(struct devlink *devlink, u8 *mode)
 	if (esw->mode == SRIOV_NONE)
 		return -EOPNOTSUPP;
 
-	if (MLX5_CAP_ETH(dev, wqe_inline_mode) !=
-	    MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
-		return -EOPNOTSUPP;
-
 	return esw_inline_mode_to_devlink(esw->offloads.inline_mode, mode);
 }
 
 int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode)
 {
+	u8 prev_mlx5_mode, mlx5_mode = MLX5_INLINE_MODE_L2;
 	struct mlx5_core_dev *dev = esw->dev;
 	int vport;
-	u8 prev_mlx5_mode, mlx5_mode = MLX5_INLINE_MODE_L2;
 
 	if (!MLX5_CAP_GEN(dev, vport_group_manager))
 		return -EOPNOTSUPP;
@@ -985,10 +988,18 @@ int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode)
 	if (esw->mode == SRIOV_NONE)
 		return -EOPNOTSUPP;
 
-	if (MLX5_CAP_ETH(dev, wqe_inline_mode) !=
-	    MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
-		return -EOPNOTSUPP;
+	switch (MLX5_CAP_ETH(dev, wqe_inline_mode)) {
+	case MLX5_CAP_INLINE_MODE_NOT_REQUIRED:
+		mlx5_mode = MLX5_INLINE_MODE_NONE;
+		goto out;
+	case MLX5_CAP_INLINE_MODE_L2:
+		mlx5_mode = MLX5_INLINE_MODE_L2;
+		goto out;
+	case MLX5_CAP_INLINE_MODE_VPORT_CONTEXT:
+		goto query_vports;
+	}
 
+query_vports:
 	for (vport = 1; vport <= nvfs; vport++) {
 		mlx5_query_nic_vport_min_inline(dev, vport, &mlx5_mode);
 		if (vport > 1 && prev_mlx5_mode != mlx5_mode)
@@ -996,6 +1007,7 @@ int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode)
 		prev_mlx5_mode = mlx5_mode;
 	}
 
+out:
 	*mode = mlx5_mode;
 	return 0;
 }
-- 
2.11.0

^ permalink raw reply related

* [net 5/7] net/mlx5: Fix UAR memory leak
From: Saeed Mahameed @ 2017-04-23 10:08 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Maor Gottlieb, Saeed Mahameed
In-Reply-To: <20170423100802.27630-1-saeedm@mellanox.com>

From: Maor Gottlieb <maorg@mellanox.com>

When a UAR is released, we deallocate the device resource, but
don't unmap the UAR mapping memory.
Fix the leak by unmapping this memory.

Fixes: a6d51b68611e9 ('net/mlx5: Introduce blue flame register allocator')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/uar.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/uar.c b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
index 2e6b0f290ddc..222b25908d01 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/uar.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
@@ -87,6 +87,7 @@ static void up_rel_func(struct kref *kref)
 	struct mlx5_uars_page *up = container_of(kref, struct mlx5_uars_page, ref_count);
 
 	list_del(&up->list);
+	iounmap(up->map);
 	if (mlx5_cmd_free_uar(up->mdev, up->index))
 		mlx5_core_warn(up->mdev, "failed to free uar index %d\n", up->index);
 	kfree(up->reg_bitmap);
-- 
2.11.0

^ permalink raw reply related

* [net 7/7] net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling
From: Saeed Mahameed @ 2017-04-23 10:08 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Ilan Tayari, Saeed Mahameed, kernel-team
In-Reply-To: <20170423100802.27630-1-saeedm@mellanox.com>

From: Ilan Tayari <ilant@mellanox.com>

The handler for ETHTOOL_GRXCLSRLALL must set info->data to the size
of the table, regardless of the number of entries in it.
Existing code does not do that, and this breaks all usage of ethtool -N
or -n without explicit location, with this error:
rmgr: Invalid RX class rules table size: Success

Set info->data to the table size.

Tested:
ethtool -n ens8
ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1
ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1 loc 55
ethtool -n ens8
ethtool -N ens8 delete 1023
ethtool -N ens8 delete 55

Fixes: f913a72aa008 ("net/mlx5e: Add support to get ethtool flow rules")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
---
 drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
index d55fff0ba388..26fc77e80f7b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
@@ -564,6 +564,7 @@ int mlx5e_ethtool_get_all_flows(struct mlx5e_priv *priv, struct ethtool_rxnfc *i
 	int idx = 0;
 	int err = 0;

+	info->data = MAX_NUM_OF_ETHTOOL_RULES;
 	while ((!err || err == -ENOENT) && idx < info->rule_cnt) {
 		err = mlx5e_ethtool_get_flow(priv, info, location);
 		if (!err)
-- 
2.11.0

^ permalink raw reply related

* [net 6/7] net/mlx5e: Fix small packet threshold
From: Saeed Mahameed @ 2017-04-23 10:08 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eugenia Emantayev, Saeed Mahameed, kernel-team
In-Reply-To: <20170423100802.27630-1-saeedm@mellanox.com>

From: Eugenia Emantayev <eugenia@mellanox.com>

RX packet headers are meant to be contained in SKB linear part,
and chose a threshold of 128.
It turns out this is not enough, i.e. for IPv6 packet over VxLAN.
In this case, UDP/IPv4 needs 42 bytes, GENEVE header is 8 bytes,
and 86 bytes for TCP/IPv6. In total 136 bytes that is more than
current 128 bytes. In this case expand header flow is reached.
The warning in skb_try_coalesce() caused by a wrong truesize
was already fixed here:
commit 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()").
Still, we prefer to totally avoid the expand header flow for performance reasons.
Tested regular TCP_STREAM with iperf for 1 and 8 streams, no degradation was found.
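
The header arithmetic above can be checked with a small standalone C sketch.
The byte counts are the ones quoted in this commit message; the enum and
function names are hypothetical and exist only for this illustration, and
the "fits" test is a plain comparison against the threshold rather than the
driver's actual logic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Header sizes in bytes, taken from the commit message. */
enum {
	EXAMPLE_OUTER_UDP_IPV4_LEN = 42, /* 14 Ethernet + 20 IPv4 + 8 UDP */
	EXAMPLE_TUNNEL_HDR_LEN     = 8,  /* tunnel header */
	EXAMPLE_INNER_TCP_IPV6_LEN = 86, /* inner TCP/IPv6 headers */
};

/* Total header bytes for the encapsulated packet described above. */
static size_t example_encap_headers_len(void)
{
	return EXAMPLE_OUTER_UDP_IPV4_LEN +
	       EXAMPLE_TUNNEL_HDR_LEN +
	       EXAMPLE_INNER_TCP_IPV6_LEN;
}

/* True when all headers fit within the given linear-part threshold. */
static bool example_headers_fit(size_t hdr_len, size_t threshold)
{
	return hdr_len <= threshold;
}
```

The total is 42 + 8 + 86 = 136 bytes, so the headers do not fit under a
128-byte threshold but do fit under 256.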

Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index dc52053128bc..3d9490cd2db1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -90,7 +90,7 @@
 #define MLX5E_VALID_NUM_MTTS(num_mtts) (MLX5_MTT_OCTW(num_mtts) - 1 <= U16_MAX)

 #define MLX5_UMR_ALIGN				(2048)
-#define MLX5_MPWRQ_SMALL_PACKET_THRESHOLD	(128)
+#define MLX5_MPWRQ_SMALL_PACKET_THRESHOLD	(256)

 #define MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ                 (64 * 1024)
 #define MLX5E_DEFAULT_LRO_TIMEOUT                       32
-- 
2.11.0

^ permalink raw reply related

* [pull request][net 0/7] Mellanox, mlx5 fixes 2017-04-22
From: Saeed Mahameed @ 2017-04-23 10:07 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Saeed Mahameed

Hi Dave,

This series contains some mlx5 fixes for net.

For your convenience, the series doesn't introduce any conflict with
the ongoing net-next pull request.

Please pull and let me know if there's any problem.

For -stable:
("net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5") kernels >= 4.10
("net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling") kernels >= 4.8
("net/mlx5e: Fix small packet threshold")       kernels >= 4.7
("net/mlx5: Fix driver load bad flow when having fw initializing timeout") kernels >= 4.4

Thanks,
Saeed.

The following changes since commit 94836ecf1e7378b64d37624fbb81fe48fbd4c772:

  Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux (2017-04-21 16:37:48 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2017-04-22

for you to fetch changes up to 5e82c9e4ed60beba83f46a1a5a8307b99a23e982:

  net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling (2017-04-22 21:52:37 +0300)

----------------------------------------------------------------
mlx5-fixes-2017-04-22

----------------------------------------------------------------
Eugenia Emantayev (1):
      net/mlx5e: Fix small packet threshold

Ilan Tayari (1):
      net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling

Maor Gottlieb (1):
      net/mlx5: Fix UAR memory leak

Mohamad Haj Yahia (1):
      net/mlx5: Fix driver load bad flow when having fw initializing timeout

Or Gerlitz (3):
      net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5
      net/mlx5e: Make sure the FW max encap size is enough for ipv4 tunnels
      net/mlx5e: Make sure the FW max encap size is enough for ipv6 tunnels

 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  2 +-
 .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c    |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 87 ++++++++++++----------
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 36 ++++++---
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/uar.c      |  1 +
 6 files changed, 76 insertions(+), 53 deletions(-)

^ permalink raw reply
