Netdev List


* [PATCH net-next] hv_netvsc: Fix the list processing for network change event
From: Haiyang Zhang @ 2016-04-21 23:13 UTC
  To: davem, netdev
  Cc: haiyangz, kys, olaf, vkuznets, linux-kernel, driverdev-devel

The RNDIS_STATUS_NETWORK_CHANGE event is handled as two "half events" --
media disconnect & connect. The second half should be added to the list
head, not to the tail, so that all events are processed in the normal order.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index bfdb568a..ba3f3f3 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1125,7 +1125,7 @@ static void netvsc_link_change(struct work_struct *w)
 			netif_tx_stop_all_queues(net);
 			event->event = RNDIS_STATUS_MEDIA_CONNECT;
 			spin_lock_irqsave(&ndev_ctx->lock, flags);
-			list_add_tail(&event->list, &ndev_ctx->reconfig_events);
+			list_add(&event->list, &ndev_ctx->reconfig_events);
 			spin_unlock_irqrestore(&ndev_ctx->lock, flags);
 			reschedule = true;
 		}
-- 
1.7.4.1
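
To illustrate the ordering point in the patch above, here is a minimal
userspace sketch. It is not the driver code: the demo_* names and the two
event constants are made up for this example, and the list helpers only
mimic the insert-after-head and insert-before-head behaviour of the
kernel's list_add() and list_add_tail(). It shows which entry a worker
that always takes the first list element would see next.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void demo_init_list(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link "entry" between two consecutive elements. */
static void demo_insert(struct list_head *entry, struct list_head *prev,
			struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Like list_add(): the entry becomes the first element. */
static void demo_list_add(struct list_head *entry, struct list_head *head)
{
	demo_insert(entry, head, head->next);
}

/* Like list_add_tail(): the entry becomes the last element. */
static void demo_list_add_tail(struct list_head *entry, struct list_head *head)
{
	demo_insert(entry, head->prev, head);
}

/* Hypothetical event codes, only for this sketch. */
enum { DEMO_EVENT_CONNECT = 1, DEMO_EVENT_OTHER = 2 };

struct demo_event {
	struct list_head list;	/* first member, so a cast recovers the event */
	int event;
};

/*
 * One unrelated event is already queued. The "connect" half of a network
 * change is then requeued at the head or at the tail. Returns the event
 * the worker would process next (the first list element).
 */
static int demo_next_event(int requeue_at_head)
{
	struct list_head queue;
	struct demo_event pending = { .event = DEMO_EVENT_OTHER };
	struct demo_event second_half = { .event = DEMO_EVENT_CONNECT };

	demo_init_list(&queue);
	demo_list_add_tail(&pending.list, &queue);
	if (requeue_at_head)
		demo_list_add(&second_half.list, &queue);
	else
		demo_list_add_tail(&second_half.list, &queue);

	return ((struct demo_event *)queue.next)->event;
}
```

With head insertion the connect half is handled right after its
disconnect half; with tail insertion the unrelated event would be
handled in between.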


* [PATCH v2] ixgbevf: Change the relaxed order settings in VF driver for sparc
From: Babu Moger @ 2016-04-21 22:56 UTC
  To: jeffrey.t.kirsher, jesse.brandeburg, shannon.nelson,
	carolyn.wyborny, donald.c.skidmore, bruce.w.allan, john.ronciak,
	mitch.a.williams
  Cc: intel-wired-lan, netdev, linux-kernel, babu.moger,
	sowmini.varadhan

We noticed performance issues with the VF interface on sparc compared
to the PF. Setting the RX to IXGBE_DCA_RXCTRL_DATA_WRO_EN brings it
on par with the PF. This also matches the default sparc setting in
the PF driver.

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
v2:
  Alexander had concerns about this negatively affecting other architectures.
  Added a CONFIG_SPARC check so this should not affect other architectures.

 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 0ea14c0..3596e0b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1748,9 +1748,15 @@ static void ixgbevf_configure_rx_ring(struct ixgbevf_adapter *adapter,
 	IXGBE_WRITE_REG(hw, IXGBE_VFRDLEN(reg_idx),
 			ring->count * sizeof(union ixgbe_adv_rx_desc));
 
+#ifndef CONFIG_SPARC
 	/* enable relaxed ordering */
 	IXGBE_WRITE_REG(hw, IXGBE_VFDCA_RXCTRL(reg_idx),
 			IXGBE_DCA_RXCTRL_DESC_RRO_EN);
+#else
+	IXGBE_WRITE_REG(hw, IXGBE_VFDCA_RXCTRL(reg_idx),
+			IXGBE_DCA_RXCTRL_DESC_RRO_EN |
+			IXGBE_DCA_RXCTRL_DATA_WRO_EN);
+#endif
 
 	/* reset head and tail pointers */
 	IXGBE_WRITE_REG(hw, IXGBE_VFRDH(reg_idx), 0);
-- 
1.7.1
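
As a rough sketch of what the two branches of the #ifndef above write
to the register, the value can be viewed as a bitwise OR of enable
bits. The bit positions below are assumptions made for this example
(the authoritative definitions are in the driver headers), and
demo_vf_rxctrl() is a hypothetical helper, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define DEMO_RXCTRL_DESC_RRO_EN	(1u << 9)	/* descriptor read relaxed ordering */
#define DEMO_RXCTRL_DATA_WRO_EN	(1u << 13)	/* data write relaxed ordering */

/* Value the VF driver would program for a given build target. */
static uint32_t demo_vf_rxctrl(int is_sparc)
{
	uint32_t val = DEMO_RXCTRL_DESC_RRO_EN;

	if (is_sparc)
		val |= DEMO_RXCTRL_DATA_WRO_EN;
	return val;
}
```

The non-sparc value is unchanged by the patch; sparc only gains the
extra data write bit.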


* Re: qdisc spin lock
From: Michael Ma @ 2016-04-21 22:12 UTC
  To: Eric Dumazet; +Cc: Cong Wang, Linux Kernel Network Developers
In-Reply-To: <1461242518.7627.8.camel@edumazet-glaptop3.roam.corp.google.com>

2016-04-21 5:41 GMT-07:00 Eric Dumazet <eric.dumazet@gmail.com>:
> On Wed, 2016-04-20 at 22:51 -0700, Michael Ma wrote:
>> 2016-04-20 15:34 GMT-07:00 Eric Dumazet <eric.dumazet@gmail.com>:
>> > On Wed, 2016-04-20 at 14:24 -0700, Michael Ma wrote:
>> >> 2016-04-08 7:19 GMT-07:00 Eric Dumazet <eric.dumazet@gmail.com>:
>> >> > On Thu, 2016-03-31 at 16:48 -0700, Michael Ma wrote:
>> >> >> I didn't really know that multiple qdiscs can be isolated using MQ so
>> >> >> that each txq can be associated with a particular qdisc. Also we don't
>> >> >> really have multiple interfaces...
>> >> >>
>> >> >> With this MQ solution we'll still need to assign transmit queues to
>> >> >> different classes by doing some math on the bandwidth limit if I
>> >> >> understand correctly, which seems to be less convenient compared with
>> >> >> a solution purely within HTB.
>> >> >>
>> >> >> I assume that with this solution I can still share qdisc among
>> >> >> multiple transmit queues - please let me know if this is not the case.
>> >> >
>> >> > Note that this MQ + HTB thing works well, unless you use a bonding
>> >> > device. (Or you need the MQ+HTB on the slaves, with no way of sharing
>> >> > tokens between the slaves)
>> >>
>> >> Actually MQ+HTB works well for small packets - like flow of 512 byte
>> >> packets can be throttled by HTB using one txq without being affected
>> >> by other flows with small packets. However I found using this solution
>> >> large packets (10k for example) will only achieve very limited
>> >> bandwidth. In my test I used MQ to assign one txq to a HTB which sets
>> >> rate at 1Gbit/s, 512 byte packets can achieve the ceiling rate by
>> >> using 30 threads. But sending 10k packets using 10 threads has only 10
>> >> Mbit/s with the same TC configuration. If I increase burst and cburst
>> >> of HTB to some extreme large value (like 50MB) the ceiling rate can be
>> >> hit.
>> >>
>> >> The strange thing is that I don't see this problem when using HTB as
>> >> the root. So txq number seems to be a factor here - however it's
>> >> really hard to understand why would it only affect larger packets. Is
>> >> this a known issue? Any suggestion on how to investigate the issue
>> >> further? Profiling shows that the cpu utilization is pretty low.
>> >
>> > You could try
>> >
>> > perf record -a -g -e skb:kfree_skb sleep 5
>> > perf report
>> >
>> > So that you see where the packets are dropped.
>> >
>> > Chances are that your UDP sockets SO_SNDBUF is too big, and packets are
>> > dropped at qdisc enqueue time, instead of having backpressure.
>> >
>>
>> Thanks for the hint - how should I read the perf report? Also we're
>> using TCP socket in this testing - TCP window size is set to 70kB.
>
> But how are you telling TCP to send 10k packets ?
>
We just write to the socket with 10k buffer and wait for a response
from the server (using read()) before the next write. Using tcpdump I
can see the 10k write is actually sent through 3 packets
(7.3k/1.5k/1.3k).

> AFAIK you can not : TCP happily aggregates packets in write queue
> (see current MSG_EOR discussion)
>
> I suspect a bug in your tc settings.
>
>

Could you help to check my tc setting?

sudo tc qdisc add dev eth0 root mqprio num_tc 6 map 0 1 2 3 4 5 0 0 \
    queues 19@0 1@19 1@20 1@21 1@22 1@23 hw 0
sudo tc qdisc add dev eth0 parent 805a:1a handle 8001:0 htb default 10
sudo tc class add dev eth0 parent 8001: classid 8001:10 htb rate 1000Mbit

I didn't set r2q/burst/cburst/mtu/mpu so the default value should be used.


* Re: [net-next PATCH iproute2 v2 1/1] tc: introduce IFE action
From: Jamal Hadi Salim @ 2016-04-21 22:09 UTC
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org, phil@nwl.cc
In-Reply-To: <5719490F.3020807@mojatatu.com>

On 16-04-21 05:41 PM, Jamal Hadi Salim wrote:
> On 16-04-21 05:36 PM, Stephen Hemminger wrote:
>> On Thu, 21 Apr 2016 21:27:39 +0000
>> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
>>
>> I use checkpatch from kernel source to check iproute code now.
>
> ah. It was wrong in this specific case.
> In any case I just resent the patch with the fixes.

I just realized I sent it with my local ife headers.
I didn't see them in the new iproute2 tree;
if you want me to resend I could, or just feel free to
edit.

cheers,
jamal


* Re: [PATCH net-next v2 0/4] libnl: enhance API to ease 64bit alignment for attribute
From: Nicolas Dichtel @ 2016-04-21 22:00 UTC
  To: David Miller; +Cc: netdev, roopa, eric.dumazet, tgraf, jhs
In-Reply-To: <20160421.142831.1815562418742721577.davem@davemloft.net>

On 21/04/2016 20:28, David Miller wrote:
> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Date: Thu, 21 Apr 2016 18:58:23 +0200
>
>> Here is a proposal to add more helpers in the libnetlink to manage 64-bit
>> alignment issues.
>> Note that this series was only tested on x86 by tweaking
>> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and adding some traces.
>>
>> The first patch adds helpers for 64bit alignment and other patches
>> use them.
>>
>> We could also add helpers for nla_put_u64() and its variants if needed.
>>
>> v1 -> v2:
>>   - remove patch #1
>>   - split patch #2 (now #1 and #2)
>>   - add nla_need_padding_for_64bit()
>
> I like it, nice work Nicolas.
Thank you.

>
> Applied to net-next.
>
> I did a quick scan and the following jumped out at me as cases we need
> to fix up as well:
Did you grep something or just catch this by code review?

>
> 1) xfrm_user
> 2) tcp_info
> 3) taskstats
> 4) pkt_{cls,sched}
> 5) openvswitch
> etc.
>
> Most of these are statistic cases just like all of the existing ones
> we have fixed so far.
Yes, I will follow up on this topic. There are also a bunch of
nla_put_[u|be|le]64() calls:
$ git grep -w "nla_put_.\{1,2\}64" net/ | wc -l
118
$ git grep -w "nla_put_.\{1,2\}64" | wc -l
172
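
For readers following the alignment discussion, here is a small
userspace sketch of the idea behind a check like
nla_need_padding_for_64bit(). It is not the kernel implementation. It
assumes a 4-byte attribute header, a message buffer that starts on an
8-byte boundary, and attributes that always start on a 4-byte
boundary; the demo_* names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NLA_HDRLEN 4	/* 16-bit length + 16-bit type */

/*
 * If the next attribute would start on an 8-byte boundary, its payload
 * would land 4 bytes later, i.e. misaligned for a 64-bit value. In that
 * case an empty pad attribute has to be emitted first.
 */
static int demo_need_padding_for_64bit(size_t offset)
{
	return (offset % 8) == 0;
}

/* Offset of the 64-bit payload once the optional pad attribute is added. */
static size_t demo_payload_offset(size_t offset)
{
	if (demo_need_padding_for_64bit(offset))
		offset += DEMO_NLA_HDRLEN;	/* zero-length pad attribute */
	return offset + DEMO_NLA_HDRLEN;	/* skip the real attribute header */
}
```

Either way the payload ends up on an 8-byte boundary, at the cost of
at most one extra 4-byte header.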


* Re: [net-next PATCH iproute2 v2 1/1] tc: introduce IFE action
From: Jamal Hadi Salim @ 2016-04-21 21:41 UTC
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org, phil@nwl.cc
In-Reply-To: <20160421143657.51ba9fdd@xeon-e3>

On 16-04-21 05:36 PM, Stephen Hemminger wrote:
> On Thu, 21 Apr 2016 21:27:39 +0000
> Jamal Hadi Salim <jhs@mojatatu.com> wrote:

>
> I use checkpatch from kernel source to check iproute code now.

ah. It was wrong in this specific case.
In any case I just resent the patch with the fixes.

cheers,
jamal


* [iproute2 PATCH v3 1/1] tc: introduce IFE action
From: Jamal Hadi Salim @ 2016-04-21 21:40 UTC
  To: stephen; +Cc: netdev, phil, Jamal Hadi Salim

From: Jamal Hadi Salim <jhs@mojatatu.com>

This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype, and we'll
then provide a default so the user doesn't have to specify it.
For now we require the user to specify it.

Described in netdev01 paper:
"Distributing Linux Traffic Control Classifier-Action Subsystem"
 Authors: Jamal Hadi Salim and Damascene M. Joachimpillai

Also refer to IETF draft-ietf-forces-interfelfb-03.txt

Let's show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use an
ethertype of 0xdead to interop.

YYYY: Let's start with the receiver-side policy config:
xxx: add an ingress qdisc
sudo tc qdisc add dev $ETH ingress

xxx: any packets with ethertype 0xdead will be subjected to ife decoding
xxx: we then restart the classification so we can match on icmp at prio 3
sudo tc filter add dev $ETH parent ffff: prio 2 protocol 0xdead \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

xxx: on restarting the classification from above if it was an icmp
xxx: packet, then match it here and continue to the next rule at prio 4
xxx: which will match based on skb mark of 17
sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue

xxx: match on skbmark of 0x11 (decimal 17) and accept
sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \
handle 0x11 fw flowid 1:1 \
action ok

xxx: Let's show the decoding policy
sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead
xxx:
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 0 success 0)
  match 00000000/00000000 at 0 (success 0 )
	action order 1: ife decode action reclassify type 0x0
	 allow mark allow prio
	 index 11 ref 1 bind 1 installed 45 sec used 45 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

xxx:
Observe that the output above lists all the metadata it can decode.
Typically these submodules will already be compiled into a monolithic
kernel or loaded as modules.

YYYY: Let's show the sender side now ..
xxx: Add an egress qdisc on the sender netdev
sudo tc qdisc add dev $ETH root handle 1: prio
xxx:
xxx: Match all icmp packets to 192.168.122.237/24, then
xxx: tag the packet with skb mark of decimal 17, then
xxx: Encode it with:
xxx:    ethertype 0xdead
xxx:    add skb->mark to whitelist of metadatum to send
xxx:    rewrite target dst MAC address to 02:15:15:15:15:15
xxx:
sudo tc filter add dev $ETH parent 1: protocol ip prio 10  u32 \
match ip dst 192.168.122.237/24 \
match ip protocol 1 0xff \
flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15

xxx: Let's show the encoding policy
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2  (rule hit 118 success 0)
  match c0a87a00/ffffff00 at 16 (success 0 )
  match 00010000/00ff0000 at 8 (success 0 )
	action order 1:  skbedit mark 17
	 index 11 ref 1 bind 1 installed 3 sec used 3 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: ife encode action pipe type 0xDEAD
	 allow mark dst 02:15:15:15:15:15
	 index 12 ref 1 bind 1 installed 3 sec used 3 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0
xxx:

Now test by sending ping from sender to destination

Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 include/linux/tc_act/tc_ife.h |  38 +++++
 include/linux/tc_ife.h        |  38 +++++
 tc/Makefile                   |   1 +
 tc/m_ife.c                    | 341 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 418 insertions(+)
 create mode 100644 include/linux/tc_act/tc_ife.h
 create mode 100644 include/linux/tc_ife.h
 create mode 100644 tc/m_ife.c

diff --git a/include/linux/tc_act/tc_ife.h b/include/linux/tc_act/tc_ife.h
new file mode 100644
index 0000000..d648ff6
--- /dev/null
+++ b/include/linux/tc_act/tc_ife.h
@@ -0,0 +1,38 @@
+#ifndef __UAPI_TC_IFE_H
+#define __UAPI_TC_IFE_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_IFE 25
+/* Flag bits for now just encoding/decoding; mutually exclusive */
+#define IFE_ENCODE 1
+#define IFE_DECODE 0
+
+struct tc_ife {
+	tc_gen;
+	__u16 flags;
+};
+
+/*XXX: We need to encode the total number of bytes consumed */
+enum {
+	TCA_IFE_UNSPEC,
+	TCA_IFE_PARMS,
+	TCA_IFE_TM,
+	TCA_IFE_DMAC,
+	TCA_IFE_SMAC,
+	TCA_IFE_TYPE,
+	TCA_IFE_METALST,
+	__TCA_IFE_MAX
+};
+#define TCA_IFE_MAX (__TCA_IFE_MAX - 1)
+
+#define IFE_META_SKBMARK 1
+#define IFE_META_HASHID 2
+#define	IFE_META_PRIO 3
+#define	IFE_META_QMAP 4
+/*Can be overridden at runtime by module option*/
+#define	__IFE_META_MAX 5
+#define IFE_META_MAX (__IFE_META_MAX - 1)
+
+#endif
diff --git a/include/linux/tc_ife.h b/include/linux/tc_ife.h
new file mode 100644
index 0000000..d648ff6
--- /dev/null
+++ b/include/linux/tc_ife.h
@@ -0,0 +1,38 @@
+#ifndef __UAPI_TC_IFE_H
+#define __UAPI_TC_IFE_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_IFE 25
+/* Flag bits for now just encoding/decoding; mutually exclusive */
+#define IFE_ENCODE 1
+#define IFE_DECODE 0
+
+struct tc_ife {
+	tc_gen;
+	__u16 flags;
+};
+
+/*XXX: We need to encode the total number of bytes consumed */
+enum {
+	TCA_IFE_UNSPEC,
+	TCA_IFE_PARMS,
+	TCA_IFE_TM,
+	TCA_IFE_DMAC,
+	TCA_IFE_SMAC,
+	TCA_IFE_TYPE,
+	TCA_IFE_METALST,
+	__TCA_IFE_MAX
+};
+#define TCA_IFE_MAX (__TCA_IFE_MAX - 1)
+
+#define IFE_META_SKBMARK 1
+#define IFE_META_HASHID 2
+#define	IFE_META_PRIO 3
+#define	IFE_META_QMAP 4
+/*Can be overridden at runtime by module option*/
+#define	__IFE_META_MAX 5
+#define IFE_META_MAX (__IFE_META_MAX - 1)
+
+#endif
diff --git a/tc/Makefile b/tc/Makefile
index f5bea87..20f5110 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -43,6 +43,7 @@ TCMODULES += m_gact.o
 TCMODULES += m_mirred.o
 TCMODULES += m_nat.o
 TCMODULES += m_pedit.o
+TCMODULES += m_ife.o
 TCMODULES += m_skbedit.o
 TCMODULES += m_csum.o
 TCMODULES += m_simple.o
diff --git a/tc/m_ife.c b/tc/m_ife.c
new file mode 100644
index 0000000..839e370
--- /dev/null
+++ b/tc/m_ife.c
@@ -0,0 +1,341 @@
+/*
+ * m_ife.c	IFE actions module
+ *
+ *		This program is free software; you can distribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:  J Hadi Salim (jhs@mojatatu.com)
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+#include <linux/netdevice.h>
+
+#include "rt_names.h"
+#include "utils.h"
+#include "tc_util.h"
+#include <linux/tc_act/tc_ife.h>
+
+static void ife_explain(void)
+{
+	fprintf(stderr,
+		"Usage:... ife {decode|encode} {ALLOW|USE} [dst DMAC] [src SMAC] [type TYPE] [CONTROL] [index INDEX]\n");
+	fprintf(stderr,
+		"\tALLOW := Encode direction. Allows encoding specified metadata\n"
+		"\t\t e.g \"allow mark\"\n"
+		"\tUSE := Encode direction. Enforce Static encoding of specified metadata\n"
+		"\t\t e.g \"use mark 0x12\"\n"
+		"\tDMAC := 6 byte Destination MAC address to encode\n"
+		"\tSMAC := optional 6 byte Source MAC address to encode\n"
+		"\tTYPE := optional 16 bit ethertype to encode\n"
+		"\tCONTROL := reclassify|pipe|drop|continue|ok\n"
+		"\tINDEX := optional IFE table index value used\n");
+	fprintf(stderr, "encode is used for sending IFE packets\n");
+	fprintf(stderr, "decode is used for receiving IFE packets\n");
+}
+
+static void ife_usage(void)
+{
+	ife_explain();
+	exit(-1);
+}
+
+static int parse_ife(struct action_util *a, int *argc_p, char ***argv_p,
+		     int tca_id, struct nlmsghdr *n)
+{
+	int argc = *argc_p;
+	char **argv = *argv_p;
+	int ok = 0;
+	struct tc_ife p;
+	struct rtattr *tail;
+	struct rtattr *tail2;
+	char dbuf[ETH_ALEN];
+	char sbuf[ETH_ALEN];
+	__u16 ife_type = 0;
+	__u32 ife_prio = 0;
+	__u32 ife_prio_v = 0;
+	__u32 ife_mark = 0;
+	__u32 ife_mark_v = 0;
+	char *daddr = NULL;
+	char *saddr = NULL;
+
+	memset(&p, 0, sizeof(p));
+	p.action = TC_ACT_PIPE;	/* good default */
+
+	if (argc <= 0)
+		return -1;
+
+	while (argc > 0) {
+		if (matches(*argv, "ife") == 0) {
+			NEXT_ARG();
+			continue;
+		} else if (matches(*argv, "decode") == 0) {
+			p.flags = IFE_DECODE; /* readability aid */
+			ok++;
+		} else if (matches(*argv, "encode") == 0) {
+			p.flags = IFE_ENCODE;
+			ok++;
+		} else if (matches(*argv, "allow") == 0) {
+			NEXT_ARG();
+			if (matches(*argv, "mark") == 0) {
+				ife_mark = IFE_META_SKBMARK;
+			} else if (matches(*argv, "prio") == 0) {
+				ife_prio = IFE_META_PRIO;
+			} else {
+				fprintf(stderr, "Illegal meta define <%s>\n",
+					*argv);
+				return -1;
+			}
+		} else if (matches(*argv, "use") == 0) {
+			NEXT_ARG();
+			if (matches(*argv, "mark") == 0) {
+				NEXT_ARG();
+				if (get_u32(&ife_mark_v, *argv, 0))
+					invarg("ife mark val is invalid",
+					       *argv);
+			} else if (matches(*argv, "prio") == 0) {
+				NEXT_ARG();
+				if (get_u32(&ife_prio_v, *argv, 0))
+					invarg("ife prio val is invalid",
+					       *argv);
+			} else {
+				fprintf(stderr, "Illegal meta use type <%s>\n",
+					*argv);
+				return -1;
+			}
+		} else if (matches(*argv, "type") == 0) {
+			NEXT_ARG();
+			if (get_u16(&ife_type, *argv, 0))
+				invarg("ife type is invalid", *argv);
+			fprintf(stderr, "IFE type 0x%x\n", ife_type);
+		} else if (matches(*argv, "dst") == 0) {
+			NEXT_ARG();
+			daddr = *argv;
+			if (sscanf(daddr, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+				   dbuf, dbuf + 1, dbuf + 2,
+				   dbuf + 3, dbuf + 4, dbuf + 5) != 6) {
+				fprintf(stderr, "Invalid mac address %s\n",
+					daddr);
+			}
+			fprintf(stderr, "dst MAC address <%s>\n", daddr);
+
+		} else if (matches(*argv, "src") == 0) {
+			NEXT_ARG();
+			saddr = *argv;
+			if (sscanf(saddr, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+				   sbuf, sbuf + 1, sbuf + 2,
+				   sbuf + 3, sbuf + 4, sbuf + 5) != 6) {
+				fprintf(stderr, "Invalid mac address %s\n",
+					saddr);
+			}
+			fprintf(stderr, "src MAC address <%s>\n", saddr);
+		} else if (matches(*argv, "help") == 0) {
+			ife_usage();
+		} else {
+			break;
+		}
+
+		argc--;
+		argv++;
+	}
+
+	if (argc) {
+		if (matches(*argv, "reclassify") == 0) {
+			p.action = TC_ACT_RECLASSIFY;
+			argc--;
+			argv++;
+		} else if (matches(*argv, "pipe") == 0) {
+			p.action = TC_ACT_PIPE;
+			argc--;
+			argv++;
+		} else if (matches(*argv, "drop") == 0 ||
+			   matches(*argv, "shot") == 0) {
+			p.action = TC_ACT_SHOT;
+			argc--;
+			argv++;
+		} else if (matches(*argv, "continue") == 0) {
+			p.action = TC_ACT_UNSPEC;
+			argc--;
+			argv++;
+		} else if (matches(*argv, "pass") == 0) {
+			p.action = TC_ACT_OK;
+			argc--;
+			argv++;
+		}
+	}
+
+	if (argc) {
+		if (matches(*argv, "index") == 0) {
+			NEXT_ARG();
+			if (get_u32(&p.index, *argv, 10)) {
+				fprintf(stderr, "ife: Illegal \"index\"\n");
+				return -1;
+			}
+			argc--;
+			argv++;
+		}
+	}
+
+	if (!ok) {
+		fprintf(stderr, "IFE requires decode/encode specified\n");
+		ife_usage();
+	}
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, MAX_MSG, tca_id, NULL, 0);
+	addattr_l(n, MAX_MSG, TCA_IFE_PARMS, &p, sizeof(p));
+
+	if (!(p.flags & IFE_ENCODE))
+		goto skip_encode;
+
+	if (daddr)
+		addattr_l(n, MAX_MSG, TCA_IFE_DMAC, dbuf, ETH_ALEN);
+	if (ife_type)
+		addattr_l(n, MAX_MSG, TCA_IFE_TYPE, &ife_type, 2);
+	if (saddr)
+		addattr_l(n, MAX_MSG, TCA_IFE_SMAC, sbuf, ETH_ALEN);
+
+	tail2 = NLMSG_TAIL(n);
+	addattr_l(n, MAX_MSG, TCA_IFE_METALST, NULL, 0);
+	if (ife_mark || ife_mark_v) {
+		if (ife_mark_v)
+			addattr_l(n, MAX_MSG, IFE_META_SKBMARK, &ife_mark_v, 4);
+		else
+			addattr_l(n, MAX_MSG, IFE_META_SKBMARK, NULL, 0);
+	}
+	if (ife_prio || ife_prio_v) {
+		if (ife_prio_v)
+			addattr_l(n, MAX_MSG, IFE_META_PRIO, &ife_prio_v, 4);
+		else
+			addattr_l(n, MAX_MSG, IFE_META_PRIO, NULL, 0);
+	}
+
+	tail2->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail2;
+
+skip_encode:
+	tail->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail;
+
+	*argc_p = argc;
+	*argv_p = argv;
+	return 0;
+}
+
+static int print_ife(struct action_util *au, FILE *f, struct rtattr *arg)
+{
+	struct tc_ife *p = NULL;
+	struct rtattr *tb[TCA_IFE_MAX + 1];
+	__u16 ife_type = 0;
+	__u32 mmark = 0;
+	__u32 mhash = 0;
+	__u32 mprio = 0;
+	int has_optional = 0;
+	SPRINT_BUF(b1);
+	SPRINT_BUF(b2);
+
+	if (arg == NULL)
+		return -1;
+
+	parse_rtattr_nested(tb, TCA_IFE_MAX, arg);
+
+	if (tb[TCA_IFE_PARMS] == NULL) {
+		fprintf(f, "[NULL ife parameters]");
+		return -1;
+	}
+	p = RTA_DATA(tb[TCA_IFE_PARMS]);
+
+	fprintf(f, "ife %s action %s ",
+		(p->flags & IFE_ENCODE) ? "encode" : "decode",
+		action_n2a(p->action, b1, sizeof(b1)));
+
+	if (tb[TCA_IFE_TYPE]) {
+		ife_type = rta_getattr_u16(tb[TCA_IFE_TYPE]);
+		has_optional = 1;
+		fprintf(f, "type 0x%X ", ife_type);
+	}
+
+	if (has_optional)
+		fprintf(f, "\n\t ");
+
+	if (tb[TCA_IFE_METALST]) {
+		struct rtattr *metalist[IFE_META_MAX + 1];
+		int len = 0;
+
+		parse_rtattr_nested(metalist, IFE_META_MAX,
+				    tb[TCA_IFE_METALST]);
+
+		if (metalist[IFE_META_SKBMARK]) {
+			len = RTA_PAYLOAD(metalist[IFE_META_SKBMARK]);
+			if (len) {
+				mmark = rta_getattr_u32(metalist[IFE_META_SKBMARK]);
+				fprintf(f, "use mark %d ", mmark);
+			} else
+				fprintf(f, "allow mark ");
+		}
+
+		if (metalist[IFE_META_HASHID]) {
+			len = RTA_PAYLOAD(metalist[IFE_META_HASHID]);
+			if (len) {
+				mhash = rta_getattr_u32(metalist[IFE_META_HASHID]);
+				fprintf(f, "use hash %d ", mhash);
+			} else
+				fprintf(f, "allow hash ");
+		}
+
+		if (metalist[IFE_META_PRIO]) {
+			len = RTA_PAYLOAD(metalist[IFE_META_PRIO]);
+			if (len) {
+				mprio = rta_getattr_u32(metalist[IFE_META_PRIO]);
+				fprintf(f, "use prio %d ", mprio);
+			} else
+				fprintf(f, "allow prio ");
+		}
+
+	}
+
+	if (tb[TCA_IFE_DMAC]) {
+		has_optional = 1;
+		fprintf(f, "dst %s ",
+			ll_addr_n2a(RTA_DATA(tb[TCA_IFE_DMAC]),
+				    RTA_PAYLOAD(tb[TCA_IFE_DMAC]), 0, b2,
+				    sizeof(b2)));
+
+	}
+
+	if (tb[TCA_IFE_SMAC]) {
+		has_optional = 1;
+		fprintf(f, "src %s ",
+			ll_addr_n2a(RTA_DATA(tb[TCA_IFE_SMAC]),
+				    RTA_PAYLOAD(tb[TCA_IFE_SMAC]), 0, b2,
+				    sizeof(b2)));
+	}
+
+	fprintf(f, "\n\t index %d ref %d bind %d", p->index, p->refcnt,
+		p->bindcnt);
+	if (show_stats) {
+		if (tb[TCA_IFE_TM]) {
+			struct tcf_t *tm = RTA_DATA(tb[TCA_IFE_TM]);
+
+			print_tm(f, tm);
+		}
+	}
+
+	fprintf(f, "\n");
+
+	return 0;
+}
+
+struct action_util ife_action_util = {
+	.id = "ife",
+	.parse_aopt = parse_ife,
+	.print_aopt = print_ife,
+};
-- 
1.9.1
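
The tail/tail2 handling in parse_ife() above follows the usual pattern
for nested attributes: remember where the container starts, append the
children, then patch the container length. Below is a self-contained
sketch of that pattern. It does not use libnetlink; struct demo_attr,
demo_addattr() and the outer type value 100 are hypothetical, while the
inner type values (1 for the parameters, 6 for the meta list, 1 for skb
mark) are taken from the header in this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ALIGN(len) (((len) + 3u) & ~3u)	/* attributes are 4-byte aligned */

struct demo_attr {
	uint16_t len;	/* header + payload, like rta_len */
	uint16_t type;
};

struct demo_msg {
	size_t used;		/* bytes consumed in buf */
	unsigned char buf[256];
};

/* Append one attribute; returns the offset where it starts. */
static size_t demo_addattr(struct demo_msg *m, uint16_t type,
			   const void *data, uint16_t alen)
{
	struct demo_attr hdr;
	size_t start = m->used;

	hdr.len = (uint16_t)(sizeof(struct demo_attr) + alen);
	hdr.type = type;
	assert(start + DEMO_ALIGN(hdr.len) <= sizeof(m->buf));
	memcpy(m->buf + start, &hdr, sizeof(hdr));
	if (alen)
		memcpy(m->buf + start + sizeof(hdr), data, alen);
	m->used = start + DEMO_ALIGN(hdr.len);
	return start;
}

/* Patch a container's length once all of its children are appended. */
static uint16_t demo_close_nest(struct demo_msg *m, size_t start)
{
	uint16_t len = (uint16_t)(m->used - start);

	memcpy(m->buf + start, &len, sizeof(len));	/* len is the first field */
	return len;
}

/* Build outer{ PARMS(flags), METALST{ SKBMARK(empty) } } and report a length. */
static uint16_t demo_build(int want_inner)
{
	struct demo_msg m;
	uint16_t flags = 1;	/* encode */
	uint16_t inner, outer;
	size_t tail, tail2;

	memset(&m, 0, sizeof(m));
	tail = demo_addattr(&m, 100, NULL, 0);		/* outer container */
	demo_addattr(&m, 1, &flags, sizeof(flags));	/* parameters */
	tail2 = demo_addattr(&m, 6, NULL, 0);		/* nested meta list */
	demo_addattr(&m, 1, NULL, 0);			/* empty payload: "allow mark" */
	inner = demo_close_nest(&m, tail2);
	outer = demo_close_nest(&m, tail);
	return want_inner ? inner : outer;
}
```

An empty payload versus a 4-byte payload on a meta attribute is what
distinguishes "allow" from "use" in the patch.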


* Re: [net-next PATCH iproute2 v2 1/1] tc: introduce IFE action
From: Stephen Hemminger @ 2016-04-21 21:36 UTC
  To: Jamal Hadi Salim; +Cc: netdev@vger.kernel.org, phil@nwl.cc
In-Reply-To: <e6070b054b0748f48991187bb4ccfccd@HQ1WP-EXMB11.corp.brocade.com>

On Thu, 21 Apr 2016 21:27:39 +0000
Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> > The code has also gotten deeply indented, creating lots of lines that are too long.
> >  
> 
> Is this you or the script saying the above? How is the conclusion that
> we have deep indentation come about?
> I checked there are some code lines that are > 80 characters because
> it doesn't make sense to break them down.

I use checkpatch from kernel source to check iproute code now.


* [PATCH net V1 8/8] net/mlx5: Add pci shutdown callback
From: Saeed Mahameed @ 2016-04-21 21:33 UTC
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Majd Dibbiny,
	Tariq Toukan, Haggai Abramovsky, Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

From: Majd Dibbiny <majd@mellanox.com>

This patch introduces kexec support for mlx5.
When switching kernels, kexec() calls shutdown, which unloads
the driver and cleans its resources.

In addition, remove unregister_netdev from the shutdown flow. This
allows a clean shutdown even if some netdev clients did not release their
references to this netdev. Releasing only the HW resources is enough, as
the kernel is shutting down.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   15 +++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c    |   23 +++++++++++++++++---
 include/linux/mlx5/driver.h                       |    7 +++--
 3 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 85773f8..67d548b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2633,7 +2633,16 @@ static void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, void *vpriv)
 	schedule_work(&priv->set_rx_mode_work);
 	mlx5e_disable_async_events(priv);
 	flush_scheduled_work();
-	unregister_netdev(netdev);
+	if (test_bit(MLX5_INTERFACE_STATE_SHUTDOWN, &mdev->intf_state)) {
+		netif_device_detach(netdev);
+		mutex_lock(&priv->state_lock);
+		if (test_bit(MLX5E_STATE_OPENED, &priv->state))
+			mlx5e_close_locked(netdev);
+		mutex_unlock(&priv->state_lock);
+	} else {
+		unregister_netdev(netdev);
+	}
+
 	mlx5e_tc_cleanup(priv);
 	mlx5e_vxlan_cleanup(priv);
 	mlx5e_destroy_flow_tables(priv);
@@ -2646,7 +2655,9 @@ static void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, void *vpriv)
 	mlx5_core_dealloc_transport_domain(priv->mdev, priv->tdn);
 	mlx5_core_dealloc_pd(priv->mdev, priv->pdn);
 	mlx5_unmap_free_uar(priv->mdev, &priv->cq_uar);
-	free_netdev(netdev);
+
+	if (!test_bit(MLX5_INTERFACE_STATE_SHUTDOWN, &mdev->intf_state))
+		free_netdev(netdev);
 }
 
 static void *mlx5e_get_netdev(void *vpriv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ddd352a..6892746 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -966,7 +966,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 	int err;
 
 	mutex_lock(&dev->intf_state_mutex);
-	if (dev->interface_state == MLX5_INTERFACE_STATE_UP) {
+	if (test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state)) {
 		dev_warn(&dev->pdev->dev, "%s: interface is up, NOP\n",
 			 __func__);
 		goto out;
@@ -1133,7 +1133,8 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 	if (err)
 		pr_info("failed request module on %s\n", MLX5_IB_MOD);
 
-	dev->interface_state = MLX5_INTERFACE_STATE_UP;
+	clear_bit(MLX5_INTERFACE_STATE_DOWN, &dev->intf_state);
+	set_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state);
 out:
 	mutex_unlock(&dev->intf_state_mutex);
 
@@ -1207,7 +1208,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 	}
 
 	mutex_lock(&dev->intf_state_mutex);
-	if (dev->interface_state == MLX5_INTERFACE_STATE_DOWN) {
+	if (test_bit(MLX5_INTERFACE_STATE_DOWN, &dev->intf_state)) {
 		dev_warn(&dev->pdev->dev, "%s: interface is down, NOP\n",
 			 __func__);
 		goto out;
@@ -1241,7 +1242,8 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 	mlx5_cmd_cleanup(dev);
 
 out:
-	dev->interface_state = MLX5_INTERFACE_STATE_DOWN;
+	clear_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state);
+	set_bit(MLX5_INTERFACE_STATE_DOWN, &dev->intf_state);
 	mutex_unlock(&dev->intf_state_mutex);
 	return err;
 }
@@ -1452,6 +1454,18 @@ static const struct pci_error_handlers mlx5_err_handler = {
 	.resume		= mlx5_pci_resume
 };
 
+static void shutdown(struct pci_dev *pdev)
+{
+	struct mlx5_core_dev *dev  = pci_get_drvdata(pdev);
+	struct mlx5_priv *priv = &dev->priv;
+
+	dev_info(&pdev->dev, "Shutdown was called\n");
+	/* Notify mlx5 clients that the kernel is being shut down */
+	set_bit(MLX5_INTERFACE_STATE_SHUTDOWN, &dev->intf_state);
+	mlx5_unload_one(dev, priv);
+	mlx5_pci_disable_device(dev);
+}
+
 static const struct pci_device_id mlx5_core_pci_table[] = {
 	{ PCI_VDEVICE(MELLANOX, 0x1011) },			/* Connect-IB */
 	{ PCI_VDEVICE(MELLANOX, 0x1012), MLX5_PCI_DEV_IS_VF},	/* Connect-IB VF */
@@ -1471,6 +1485,7 @@ static struct pci_driver mlx5_core_driver = {
 	.id_table       = mlx5_core_pci_table,
 	.probe          = init_one,
 	.remove         = remove_one,
+	.shutdown	= shutdown,
 	.err_handler	= &mlx5_err_handler,
 	.sriov_configure   = mlx5_core_sriov_configure,
 };
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index dcd5ac8..369c837 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -519,8 +519,9 @@ enum mlx5_device_state {
 };
 
 enum mlx5_interface_state {
-	MLX5_INTERFACE_STATE_DOWN,
-	MLX5_INTERFACE_STATE_UP,
+	MLX5_INTERFACE_STATE_DOWN = BIT(0),
+	MLX5_INTERFACE_STATE_UP = BIT(1),
+	MLX5_INTERFACE_STATE_SHUTDOWN = BIT(2),
 };
 
 enum mlx5_pci_status {
@@ -544,7 +545,7 @@ struct mlx5_core_dev {
 	enum mlx5_device_state	state;
 	/* sync interface state */
 	struct mutex		intf_state_mutex;
-	enum mlx5_interface_state interface_state;
+	unsigned long		intf_state;
 	void			(*event) (struct mlx5_core_dev *dev,
 					  enum mlx5_dev_event event,
 					  unsigned long param);
-- 
1.7.1
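
One detail worth keeping in mind when reading the intf_state changes
above: set_bit() and test_bit() take a bit index, not a mask. The
sketch below uses non-atomic userspace stand-ins (the kernel helpers
are atomic) and hypothetical DEMO_* names to show what happens when
enum values defined with a BIT()-style macro are passed as the index.

```c
#include <assert.h>

#define DEMO_BIT(n) (1UL << (n))

/* "nr" is a bit index, as with the kernel's set_bit()/test_bit(). */
static void demo_set_bit(unsigned int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static int demo_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (int)((*addr >> nr) & 1UL);
}

enum demo_interface_state {
	DEMO_STATE_DOWN = DEMO_BIT(0),		/* value 1 */
	DEMO_STATE_UP = DEMO_BIT(1),		/* value 2 */
	DEMO_STATE_SHUTDOWN = DEMO_BIT(2),	/* value 4 */
};

/* State word after marking shutdown: bit index 4 is set, i.e. 0x10. */
static unsigned long demo_word_after_shutdown(void)
{
	unsigned long state = 0;

	demo_set_bit(DEMO_STATE_SHUTDOWN, &state);
	return state;
}

/* The flag is still found, because set and test use the same index. */
static int demo_shutdown_visible(void)
{
	unsigned long state = demo_word_after_shutdown();

	return demo_test_bit(DEMO_STATE_SHUTDOWN, &state) &&
	       !demo_test_bit(DEMO_STATE_UP, &state);
}
```

The three states still map to distinct bits (indices 1, 2 and 4), so
the checks behave consistently, but the stored word is not the mask
value that the enum names might suggest.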


* [PATCH net V1 6/8] net/mlx5e: Use vport MTU rather than physical port MTU
From: Saeed Mahameed @ 2016-04-21 21:33 UTC
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

Set and report the vport MTU rather than the physical port MTU.
The driver will set both the vport and the physical port MTU and
will rely on the vport MTU query.

SRIOV VFs have to report their MTU to their vport manager (PF),
and this will allow them to work with any MTU they need
without failing the request.

Also, in some cases where the PF is not the port owner, the PF can
work with an MTU smaller than the physical port MTU if setting the
physical port MTU did not take effect.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
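For illustration only, the query side of the flow described above can be
modeled in plain C. The fake_dev struct and its fields are stand-ins
invented for this sketch and are not part of the mlx5 API, and the
HW-to-SW MTU conversion is left out. The point is only that the reported
MTU comes from the vport context, with the port operational MTU as the
fallback when the vport query fails or returns zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for device state; not an mlx5 structure. */
struct fake_dev {
	uint16_t vport_mtu;     /* value held in the vport context */
	uint16_t port_oper_mtu; /* operational MTU of the physical port */
	int vport_query_err;    /* nonzero simulates a failed vport query */
};

/* Prefer the vport MTU; fall back to the port operational MTU when the
 * vport query fails or reports zero.
 */
uint16_t query_mtu(const struct fake_dev *dev)
{
	uint16_t hw_mtu = 0;

	assert(dev);
	if (!dev->vport_query_err)
		hw_mtu = dev->vport_mtu;
	if (dev->vport_query_err || !hw_mtu)
		hw_mtu = dev->port_oper_mtu;

	return hw_mtu;
}
```
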
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   44 ++++++++++++++++----
 drivers/net/ethernet/mellanox/mlx5/core/vport.c   |   40 +++++++++++++++++++
 include/linux/mlx5/vport.h                        |    2 +
 3 files changed, 77 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 93e4ef4..85773f8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1404,24 +1404,50 @@ static int mlx5e_refresh_tirs_self_loopback_enable(struct mlx5e_priv *priv)
 	return 0;
 }
 
-static int mlx5e_set_dev_port_mtu(struct net_device *netdev)
+static int mlx5e_set_mtu(struct mlx5e_priv *priv, u16 mtu)
 {
-	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5_core_dev *mdev = priv->mdev;
-	u16 hw_mtu;
+	u16 hw_mtu = MLX5E_SW2HW_MTU(mtu);
 	int err;
 
-	err = mlx5_set_port_mtu(mdev, MLX5E_SW2HW_MTU(netdev->mtu), 1);
+	err = mlx5_set_port_mtu(mdev, hw_mtu, 1);
 	if (err)
 		return err;
 
-	mlx5_query_port_oper_mtu(mdev, &hw_mtu, 1);
+	/* Update vport context MTU */
+	mlx5_modify_nic_vport_mtu(mdev, hw_mtu);
+	return 0;
+}
+
+static void mlx5e_query_mtu(struct mlx5e_priv *priv, u16 *mtu)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	u16 hw_mtu = 0;
+	int err;
+
+	err = mlx5_query_nic_vport_mtu(mdev, &hw_mtu);
+	if (err || !hw_mtu) /* fallback to port oper mtu */
+		mlx5_query_port_oper_mtu(mdev, &hw_mtu, 1);
+
+	*mtu = MLX5E_HW2SW_MTU(hw_mtu);
+}
+
+static int mlx5e_set_dev_port_mtu(struct net_device *netdev)
+{
+	struct mlx5e_priv *priv = netdev_priv(netdev);
+	u16 mtu;
+	int err;
+
+	err = mlx5e_set_mtu(priv, netdev->mtu);
+	if (err)
+		return err;
 
-	if (MLX5E_HW2SW_MTU(hw_mtu) != netdev->mtu)
-		netdev_warn(netdev, "%s: Port MTU %d is different than netdev mtu %d\n",
-			    __func__, MLX5E_HW2SW_MTU(hw_mtu), netdev->mtu);
+	mlx5e_query_mtu(priv, &mtu);
+	if (mtu != netdev->mtu)
+		netdev_warn(netdev, "%s: VPort MTU %d is different than netdev mtu %d\n",
+			    __func__, mtu, netdev->mtu);
 
-	netdev->mtu = MLX5E_HW2SW_MTU(hw_mtu);
+	netdev->mtu = mtu;
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index bd51840..b69dadc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -196,6 +196,46 @@ int mlx5_modify_nic_vport_mac_address(struct mlx5_core_dev *mdev,
 }
 EXPORT_SYMBOL_GPL(mlx5_modify_nic_vport_mac_address);
 
+int mlx5_query_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
+{
+	int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
+	u32 *out;
+	int err;
+
+	out = mlx5_vzalloc(outlen);
+	if (!out)
+		return -ENOMEM;
+
+	err = mlx5_query_nic_vport_context(mdev, 0, out, outlen);
+	if (!err)
+		*mtu = MLX5_GET(query_nic_vport_context_out, out,
+				nic_vport_context.mtu);
+
+	kvfree(out);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_mtu);
+
+int mlx5_modify_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 mtu)
+{
+	int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+	void *in;
+	int err;
+
+	in = mlx5_vzalloc(inlen);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(modify_nic_vport_context_in, in, field_select.mtu, 1);
+	MLX5_SET(modify_nic_vport_context_in, in, nic_vport_context.mtu, mtu);
+
+	err = mlx5_modify_nic_vport_context(mdev, in, inlen);
+
+	kvfree(in);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_modify_nic_vport_mtu);
+
 int mlx5_query_nic_vport_mac_list(struct mlx5_core_dev *dev,
 				  u32 vport,
 				  enum mlx5_list_type list_type,
diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h
index bd93e63..301da4a 100644
--- a/include/linux/mlx5/vport.h
+++ b/include/linux/mlx5/vport.h
@@ -45,6 +45,8 @@ int mlx5_query_nic_vport_mac_address(struct mlx5_core_dev *mdev,
 				     u16 vport, u8 *addr);
 int mlx5_modify_nic_vport_mac_address(struct mlx5_core_dev *dev,
 				      u16 vport, u8 *addr);
+int mlx5_query_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 *mtu);
+int mlx5_modify_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 mtu);
 int mlx5_query_nic_vport_system_image_guid(struct mlx5_core_dev *mdev,
 					   u64 *system_image_guid);
 int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid);
-- 
1.7.1

^ permalink raw reply related

* [PATCH net V1 7/8] net/mlx5_core: Remove static from local variable
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Eli Cohen,
	Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

From: Eli Cohen <eli@mellanox.com>

The static qualifier is not required, and it would break re-entrancy
if re-entrancy is ever needed.

Fixes: 2530236303d9 ("net/mlx5_core: Flow steering tree initialization")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
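As a generic illustration of the re-entrancy concern (not mlx5 code; both
function names are made up for this sketch), a static local is a single
storage slot shared by every invocation, so a nested call overwrites the
value the outer call stored. An automatic local gives each call its own
slot.

```c
#include <assert.h>

/* One shared slot: the innermost call leaves 0 behind, and every outer
 * call then reads that clobbered value.
 */
int outer_value_static(int n)
{
	static int saved;

	assert(n >= 0);
	saved = n;
	if (n > 0)
		outer_value_static(n - 1);
	return saved;
}

/* One slot per call: the nested call cannot disturb the outer value. */
int outer_value_auto(int n)
{
	int saved;

	assert(n >= 0);
	saved = n;
	if (n > 0)
		outer_value_auto(n - 1);
	return saved;
}
```
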
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 3c7e3e5..89cce97 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1276,7 +1276,7 @@ struct mlx5_flow_namespace *mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
 {
 	struct mlx5_flow_root_namespace *root_ns = dev->priv.root_ns;
 	int prio;
-	static struct fs_prio *fs_prio;
+	struct fs_prio *fs_prio;
 	struct mlx5_flow_namespace *ns;
 
 	if (!root_ns)
-- 
1.7.1

^ permalink raw reply related

* [PATCH net V1 3/8] net/mlx5_core: Add ConnectX-5 to list of supported devices
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Majd Dibbiny,
	Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

From: Majd Dibbiny <majd@mellanox.com>

Add the upcoming ConnectX-5 devices (PF and VF) to the list of
supported devices by the mlx5 driver.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 3f3b2fa..ddd352a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1459,6 +1459,8 @@ static const struct pci_device_id mlx5_core_pci_table[] = {
 	{ PCI_VDEVICE(MELLANOX, 0x1014), MLX5_PCI_DEV_IS_VF},	/* ConnectX-4 VF */
 	{ PCI_VDEVICE(MELLANOX, 0x1015) },			/* ConnectX-4LX */
 	{ PCI_VDEVICE(MELLANOX, 0x1016), MLX5_PCI_DEV_IS_VF},	/* ConnectX-4LX VF */
+	{ PCI_VDEVICE(MELLANOX, 0x1017) },			/* ConnectX-5 */
+	{ PCI_VDEVICE(MELLANOX, 0x1018), MLX5_PCI_DEV_IS_VF},	/* ConnectX-5 VF */
 	{ 0, }
 };
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH net V1 5/8] net/mlx5e: Fix minimum MTU
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

The minimum MTU that can be set on a ConnectX-4 device is 68.

This fixes the case where a user tries to set an invalid MTU: the
driver fails to satisfy the request and the interface stays down.

It is better to report an error and continue working with the old
MTU.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
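A minimal standalone sketch of the range check described above, assuming
the bounds are passed in as plain integers. check_mtu_range() is a
hypothetical helper written for this note; in the driver the bounds come
from the port query and the minimum defined in the patch.

```c
#include <assert.h>
#include <errno.h>

/* Returns 0 when new_mtu lies in [min_mtu, max_mtu], -EINVAL otherwise,
 * so the caller can reject the request and keep the old MTU.
 */
int check_mtu_range(int new_mtu, int min_mtu, int max_mtu)
{
	assert(min_mtu <= max_mtu);
	if (new_mtu > max_mtu || new_mtu < min_mtu)
		return -EINVAL;
	return 0;
}
```

The example bounds used when exercising this (68 and 9000) are
illustrative values, not values read from hardware.
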
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2fbbc62..93e4ef4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1999,22 +1999,27 @@ static int mlx5e_set_features(struct net_device *netdev,
 	return err;
 }
 
+#define MXL5_HW_MIN_MTU 64
+#define MXL5E_MIN_MTU (MXL5_HW_MIN_MTU + ETH_FCS_LEN)
+
 static int mlx5e_change_mtu(struct net_device *netdev, int new_mtu)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5_core_dev *mdev = priv->mdev;
 	bool was_opened;
 	u16 max_mtu;
+	u16 min_mtu;
 	int err = 0;
 
 	mlx5_query_port_max_mtu(mdev, &max_mtu, 1);
 
 	max_mtu = MLX5E_HW2SW_MTU(max_mtu);
+	min_mtu = MLX5E_HW2SW_MTU(MXL5E_MIN_MTU);
 
-	if (new_mtu > max_mtu) {
+	if (new_mtu > max_mtu || new_mtu < min_mtu) {
 		netdev_err(netdev,
-			   "%s: Bad MTU (%d) > (%d) Max\n",
-			   __func__, new_mtu, max_mtu);
+			   "%s: Bad MTU (%d), valid range is: [%d..%d]\n",
+			   __func__, new_mtu, min_mtu, max_mtu);
 		return -EINVAL;
 	}
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH net V1 1/8] net/mlx5_core: Fix soft lockup in steering error flow
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Maor Gottlieb,
	Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

From: Maor Gottlieb <maorg@mellanox.com>

In the error flow of adding a flow rule to an auto-grouped flow
table, we call tree_remove_node.

tree_remove_node locks the node's parent, but the node's parent
is already locked by mlx5_add_flow_rule, which causes a deadlock.
With this patch, if we fail to add the flow rule, we unlock the
flow table before calling tree_remove_node.

Fixes: f0d22d187473 ('net/mlx5_core: Introduce flow steering autogrouped
flow table')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reported-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
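The lock ordering problem described above can be shown with a toy model.
Everything here (toy_lock, toy_remove_child, the two error_path_*
functions) is invented for this sketch and only mimics the shape of the
real code: the toy lock reports a double acquire instead of hanging, so
the old and new orderings can be compared safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: a second acquire fails instead of blocking. */
struct toy_lock {
	bool held;
};

bool toy_lock_acquire(struct toy_lock *l)
{
	assert(l);
	if (l->held)
		return false; /* a real mutex would block forever here */
	l->held = true;
	return true;
}

void toy_lock_release(struct toy_lock *l)
{
	assert(l && l->held);
	l->held = false;
}

/* Stand-in for the removal step: it takes the parent lock itself. */
bool toy_remove_child(struct toy_lock *parent)
{
	if (!toy_lock_acquire(parent))
		return false;
	/* ... unlink the child here ... */
	toy_lock_release(parent);
	return true;
}

/* Old ordering: remove while still holding the parent lock. */
bool error_path_before(struct toy_lock *parent)
{
	bool ok;

	if (!toy_lock_acquire(parent))
		return false;
	ok = toy_remove_child(parent);
	toy_lock_release(parent);
	return ok;
}

/* New ordering: drop the parent lock first, then remove. */
bool error_path_after(struct toy_lock *parent)
{
	if (!toy_lock_acquire(parent))
		return false;
	toy_lock_release(parent);
	return toy_remove_child(parent);
}
```
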
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   46 ++++++++-------------
 1 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 5121be4..3c7e3e5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1065,33 +1065,6 @@ unlock_fg:
 	return rule;
 }
 
-static struct mlx5_flow_rule *add_rule_to_auto_fg(struct mlx5_flow_table *ft,
-						  u8 match_criteria_enable,
-						  u32 *match_criteria,
-						  u32 *match_value,
-						  u8 action,
-						  u32 flow_tag,
-						  struct mlx5_flow_destination *dest)
-{
-	struct mlx5_flow_rule *rule;
-	struct mlx5_flow_group *g;
-
-	g = create_autogroup(ft, match_criteria_enable, match_criteria);
-	if (IS_ERR(g))
-		return (void *)g;
-
-	rule = add_rule_fg(g, match_value,
-			   action, flow_tag, dest);
-	if (IS_ERR(rule)) {
-		/* Remove assumes refcount > 0 and autogroup creates a group
-		 * with a refcount = 0.
-		 */
-		tree_get_node(&g->node);
-		tree_remove_node(&g->node);
-	}
-	return rule;
-}
-
 static struct mlx5_flow_rule *
 _mlx5_add_flow_rule(struct mlx5_flow_table *ft,
 		    u8 match_criteria_enable,
@@ -1119,8 +1092,23 @@ _mlx5_add_flow_rule(struct mlx5_flow_table *ft,
 				goto unlock;
 		}
 
-	rule = add_rule_to_auto_fg(ft, match_criteria_enable, match_criteria,
-				   match_value, action, flow_tag, dest);
+	g = create_autogroup(ft, match_criteria_enable, match_criteria);
+	if (IS_ERR(g)) {
+		rule = (void *)g;
+		goto unlock;
+	}
+
+	rule = add_rule_fg(g, match_value,
+			   action, flow_tag, dest);
+	if (IS_ERR(rule)) {
+		/* Remove assumes refcount > 0 and autogroup creates a group
+		 * with a refcount = 0.
+		 */
+		unlock_ref_node(&ft->node);
+		tree_get_node(&g->node);
+		tree_remove_node(&g->node);
+		return rule;
+	}
 unlock:
 	unlock_ref_node(&ft->node);
 	return rule;
-- 
1.7.1

^ permalink raw reply related

* [PATCH net V1 4/8] net/mlx5e: Device's mtu field is u16 and not int
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

For the set/query MTU port firmware commands, the MTU field
is 16 bits. Change all the "int mtu" parameters of the functions
wrapping those firmware commands to u16.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c                 |    4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/port.c    |   10 +++++-----
 include/linux/mlx5/port.h                         |    6 +++---
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 5acf346..99eb1c1 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -671,8 +671,8 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u8 port,
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	struct mlx5_core_dev *mdev = dev->mdev;
 	struct mlx5_hca_vport_context *rep;
-	int max_mtu;
-	int oper_mtu;
+	u16 max_mtu;
+	u16 oper_mtu;
 	int err;
 	u8 ib_link_width_oper;
 	u8 vl_hw_cap;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e0adb60..2fbbc62 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1408,7 +1408,7 @@ static int mlx5e_set_dev_port_mtu(struct net_device *netdev)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5_core_dev *mdev = priv->mdev;
-	int hw_mtu;
+	u16 hw_mtu;
 	int err;
 
 	err = mlx5_set_port_mtu(mdev, MLX5E_SW2HW_MTU(netdev->mtu), 1);
@@ -2004,7 +2004,7 @@ static int mlx5e_change_mtu(struct net_device *netdev, int new_mtu)
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5_core_dev *mdev = priv->mdev;
 	bool was_opened;
-	int max_mtu;
+	u16 max_mtu;
 	int err = 0;
 
 	mlx5_query_port_max_mtu(mdev, &max_mtu, 1);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index ae378c5..53cc1e2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -247,8 +247,8 @@ int mlx5_query_port_admin_status(struct mlx5_core_dev *dev,
 }
 EXPORT_SYMBOL_GPL(mlx5_query_port_admin_status);
 
-static void mlx5_query_port_mtu(struct mlx5_core_dev *dev, int *admin_mtu,
-				int *max_mtu, int *oper_mtu, u8 port)
+static void mlx5_query_port_mtu(struct mlx5_core_dev *dev, u16 *admin_mtu,
+				u16 *max_mtu, u16 *oper_mtu, u8 port)
 {
 	u32 in[MLX5_ST_SZ_DW(pmtu_reg)];
 	u32 out[MLX5_ST_SZ_DW(pmtu_reg)];
@@ -268,7 +268,7 @@ static void mlx5_query_port_mtu(struct mlx5_core_dev *dev, int *admin_mtu,
 		*admin_mtu = MLX5_GET(pmtu_reg, out, admin_mtu);
 }
 
-int mlx5_set_port_mtu(struct mlx5_core_dev *dev, int mtu, u8 port)
+int mlx5_set_port_mtu(struct mlx5_core_dev *dev, u16 mtu, u8 port)
 {
 	u32 in[MLX5_ST_SZ_DW(pmtu_reg)];
 	u32 out[MLX5_ST_SZ_DW(pmtu_reg)];
@@ -283,14 +283,14 @@ int mlx5_set_port_mtu(struct mlx5_core_dev *dev, int mtu, u8 port)
 }
 EXPORT_SYMBOL_GPL(mlx5_set_port_mtu);
 
-void mlx5_query_port_max_mtu(struct mlx5_core_dev *dev, int *max_mtu,
+void mlx5_query_port_max_mtu(struct mlx5_core_dev *dev, u16 *max_mtu,
 			     u8 port)
 {
 	mlx5_query_port_mtu(dev, NULL, max_mtu, NULL, port);
 }
 EXPORT_SYMBOL_GPL(mlx5_query_port_max_mtu);
 
-void mlx5_query_port_oper_mtu(struct mlx5_core_dev *dev, int *oper_mtu,
+void mlx5_query_port_oper_mtu(struct mlx5_core_dev *dev, u16 *oper_mtu,
 			      u8 port)
 {
 	mlx5_query_port_mtu(dev, NULL, NULL, oper_mtu, port);
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index a1d145a..b30250a 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -54,9 +54,9 @@ int mlx5_set_port_admin_status(struct mlx5_core_dev *dev,
 int mlx5_query_port_admin_status(struct mlx5_core_dev *dev,
 				 enum mlx5_port_status *status);
 
-int mlx5_set_port_mtu(struct mlx5_core_dev *dev, int mtu, u8 port);
-void mlx5_query_port_max_mtu(struct mlx5_core_dev *dev, int *max_mtu, u8 port);
-void mlx5_query_port_oper_mtu(struct mlx5_core_dev *dev, int *oper_mtu,
+int mlx5_set_port_mtu(struct mlx5_core_dev *dev, u16 mtu, u8 port);
+void mlx5_query_port_max_mtu(struct mlx5_core_dev *dev, u16 *max_mtu, u8 port);
+void mlx5_query_port_oper_mtu(struct mlx5_core_dev *dev, u16 *oper_mtu,
 			      u8 port);
 
 int mlx5_query_port_vl_hw_cap(struct mlx5_core_dev *dev,
-- 
1.7.1

^ permalink raw reply related

* [PATCH net V1 0/8] mlx5 driver updates and fixes
From: Saeed Mahameed @ 2016-04-21 21:32 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Saeed Mahameed

Hi Dave,

Changes from V0:
	- Dropped: ("net/mlx5e: Reset link modes upon setting speed to zero") 
	- Fixed compilation issue introduced to mlx5_ib driver.
	- Rebased to df637193906a ('Revert "Prevent NUll pointer dereference with two PHYs on cpsw"')

This series has a few bug fixes for the mlx5 core and Ethernet drivers.

Eli fixed a wrong static local variable declaration in the flow steering API.
Majd added support for ConnectX-5 PF and VF, and added a kernel
shutdown PCI callback for more robust reboot procedures.
Maor fixed a soft lockup in flow steering.
Rana fixed a wrong speed define in the mlx5 EN driver.
I also had the chance to introduce some bug fixes in mlx5 EN MTU
reporting and handling.

For -stable:
	net/mlx5_core: Fix soft lockup in steering error flow
	net/mlx5e: Device's mtu field is u16 and not int
	net/mlx5e: Fix minimum MTU
	net/mlx5e: Use vport MTU rather than physical port MTU

Thanks,
Saeed


Eli Cohen (1):
  net/mlx5_core: Remove static from local variable

Majd Dibbiny (2):
  net/mlx5_core: Add ConnectX-5 to list of supported devices
  net/mlx5: Add pci shutdown callback

Maor Gottlieb (1):
  net/mlx5_core: Fix soft lockup in steering error flow

Rana Shahout (1):
  net/mlx5e: Fix MLX5E_100BASE_T define

Saeed Mahameed (3):
  net/mlx5e: Device's mtu field is u16 and not int
  net/mlx5e: Fix minimum MTU
  net/mlx5e: Use vport MTU rather than physical port MTU

 drivers/infiniband/hw/mlx5/main.c                  |    4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |    2 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |    8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   72 +++++++++++++++----
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |   48 +++++--------
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |   25 ++++++-
 drivers/net/ethernet/mellanox/mlx5/core/port.c     |   10 ++--
 drivers/net/ethernet/mellanox/mlx5/core/vport.c    |   40 +++++++++++
 include/linux/mlx5/driver.h                        |    7 +-
 include/linux/mlx5/port.h                          |    6 +-
 include/linux/mlx5/vport.h                         |    2 +
 11 files changed, 157 insertions(+), 67 deletions(-)

^ permalink raw reply

* [PATCH net V1 2/8] net/mlx5e: Fix MLX5E_100BASE_T define
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Rana Shahout,
	Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

From: Rana Shahout <ranas@mellanox.com>

Bit 25 of eth_proto_capability in the PTYS register is
1000Base-T and not 100Base-T.

Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to
support ConnectX-4 Ethernet functionality')
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
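To make the bit-to-speed mapping concrete, here is a small standalone
sketch. The LINK_* names and the table are made up for this note; only
the two bit positions (24 and 25) and their speeds are taken from the
patch below.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as used in the patch; the names are hypothetical. */
enum {
	LINK_100BASE_TX = 24,
	LINK_1000BASE_T = 25,
	LINK_MODES      = 32,
};

/* Speed in Mb/s per capability bit; unlisted bits stay 0 (unknown). */
static const uint32_t link_speed_mbps[LINK_MODES] = {
	[LINK_100BASE_TX] = 100,
	[LINK_1000BASE_T] = 1000,
};

uint32_t speed_for_bit(unsigned int bit)
{
	assert(bit < LINK_MODES);
	return link_speed_mbps[bit];
}
```
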
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |    2 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |    8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 879e627..e80ce94 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -609,7 +609,7 @@ enum mlx5e_link_mode {
 	MLX5E_100GBASE_KR4	 = 22,
 	MLX5E_100GBASE_LR4	 = 23,
 	MLX5E_100BASE_TX	 = 24,
-	MLX5E_100BASE_T		 = 25,
+	MLX5E_1000BASE_T	 = 25,
 	MLX5E_10GBASE_T		 = 26,
 	MLX5E_25GBASE_CR	 = 27,
 	MLX5E_25GBASE_KR	 = 28,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 68834b7..3476ab8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -138,10 +138,10 @@ static const struct {
 	[MLX5E_100BASE_TX]   = {
 		.speed      = 100,
 	},
-	[MLX5E_100BASE_T]    = {
-		.supported  = SUPPORTED_100baseT_Full,
-		.advertised = ADVERTISED_100baseT_Full,
-		.speed      = 100,
+	[MLX5E_1000BASE_T]    = {
+		.supported  = SUPPORTED_1000baseT_Full,
+		.advertised = ADVERTISED_1000baseT_Full,
+		.speed      = 1000,
 	},
 	[MLX5E_10GBASE_T]    = {
 		.supported  = SUPPORTED_10000baseT_Full,
-- 
1.7.1

^ permalink raw reply related

* Re: [net-next PATCH iproute2 v2 1/1] tc: introduce IFE action
From: Jamal Hadi Salim @ 2016-04-21 21:27 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, phil
In-Reply-To: <20160313232651.354f6087@xeon-e3>

Sorry dropped the ball on this..

On 16-03-14 02:26 AM, Stephen Hemminger wrote:
> On Wed,  9 Mar 2016 07:04:36 -0500
> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>

> This code has diverged way from the general rule that ip utilities display
> format should match the command format. For example the properties shown
> on "ip route show" match those of "ip route add".
>

Valid point (and thanks for catching this, since I tend to be the biggest
whiner on this topic ;->). I will make the changes - it doesn't seem to be
far off already.
Note: in the ife case it may not always be symmetric because there are
optional fields which may be absent in a request to the kernel but present
in a response.

> Also over the last several years, the code in iproute2 has switched from casting
> RTA_DATA() everywhere to a cleaner interface rta_getattr_u32() more like what
> is used in mnl library.
>

Will convert where it makes sense..

> The code has also gotten deeply indented creating lots of lines that are too long.
>

Is this you or the script saying the above? How did the conclusion that
we have deep indentation come about?
I checked, and there are some code lines that are > 80 characters because
it doesn't make sense to break them down.

> WARNING: 'doesnt' may be misspelled - perhaps 'doesn't'?
> #21:

What, checking my spelling now? ;->
I am on the internets dude!

> then provide a default so the user doesnt have to specify it.
>
> WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
> #25:
>              "Distributing Linux Traffic Control Classifier-Action Subsystem"
>

75 character? ;-> What happened to 80?

> WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
> #143:
> new file mode 100644
>
> ERROR: "foo * bar" should be "foo *bar"

Will fix above and rest shortly.

Also, I promise to send a man page later. I've coerced someone to do it ;->

cheers,
jamal

^ permalink raw reply

* Re: [RFC PATCH v3 net-next 2/3] tcp: Handle eor bit when coalescing skb
From: Soheil Hassas Yeganeh @ 2016-04-21 21:14 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, Eric Dumazet, Neal Cardwell, Willem de Bruijn,
	Yuchung Cheng, Kernel Team
In-Reply-To: <20160421165633.GA74969@kafai-mba.local>

On Thu, Apr 21, 2016 at 12:56 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> On Wed, Apr 20, 2016 at 04:04:54PM -0400, Soheil Hassas Yeganeh wrote:
>> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> > index a6e4a83..96bdf98 100644
>> > --- a/net/ipv4/tcp_output.c
>> > +++ b/net/ipv4/tcp_output.c
>> > @@ -2494,6 +2494,7 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
>> >          * packet counting does not break.
>> >          */
>> >         TCP_SKB_CB(skb)->sacked |= TCP_SKB_CB(next_skb)->sacked & TCPCB_EVER_RETRANS;
>> > +       TCP_SKB_CB(skb)->eor = TCP_SKB_CB(next_skb)->eor;
>> >
>> >         /* changed transmit queue under us so clear hints */
>> >         tcp_clear_retrans_hints_partial(tp);
>> > @@ -2545,6 +2546,9 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
>> >                 if (!tcp_can_collapse(sk, skb))
>> >                         break;
>> >
>> > +               if (TCP_SKB_CB(to)->eor)
>> > +                       break;
>> > +
>>
>> nit: Perhaps a better place to check for eor is right after entering
>> the loop? to skip a few instructions and tcp_can_collapse, in an
>> unlikely case eor is set.
> hmm... Not sure I understand it.
> You meant moving the unlikely case before (or after?) the more likely
> cases which may have a better chance to break the loop sooner?

Well I don't have strong preference here. So, feel free to ignore.
Though I'm not sure how "likely" are the checks in tcp_can_collapse.

On another note, do you think putting this in a self-documenting
helper function, say tcp_can_collapse_to(), would help readability?
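
Something along these lines, as a rough standalone sketch only.
fake_skb_cb is a stand-in for the real control block and models nothing
but the eor flag, so this is not a drop-in for tcp_output.c:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the per-skb control block; only the eor flag is modeled. */
struct fake_skb_cb {
	bool eor;
};

/* An skb that ends a record must not have later data merged into it. */
bool tcp_can_collapse_to(const struct fake_skb_cb *to)
{
	assert(to);
	return !to->eor;
}
```

The loop would then read "if (!tcp_can_collapse_to(to)) break;", which
states the intent without a comment.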

Thanks.

^ permalink raw reply

* Re: [PATCH] wcn36xx: Set SMD timeout to 10 seconds
From: John Stultz @ 2016-04-21 21:11 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Eugene Krasnikov, Kalle Valo, linux-wireless, netdev, lkml
In-Reply-To: <1461272982-7233-1-git-send-email-bjorn.andersson@linaro.org>

On Thu, Apr 21, 2016 at 2:09 PM, Bjorn Andersson
<bjorn.andersson@linaro.org> wrote:
> After booting the wireless subsystem and uploading the NV blob to the
> WCNSS_CTRL service the remote continues to do things and will not start
> servicing wlan-requests for another 2-5 seconds (measured).
>
> The downstream code does not have any special handling for this case,
> but has a timeout of 10 seconds for the communication layer. By
> extending the wcn36xx timeout to match this we follow the same flow for
> the boot procedure and can successfully configure WiFi as wlan0 is
> registered.
>
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

I've been using this with my nexus7 tree, and it has avoided issues I was
seeing without it.

Tested-by: John Stultz <john.stultz@linaro.org>

thanks
-john

^ permalink raw reply

* [PATCH] wcn36xx: Set SMD timeout to 10 seconds
From: Bjorn Andersson @ 2016-04-21 21:09 UTC (permalink / raw)
  To: Eugene Krasnikov, Kalle Valo
  Cc: linux-wireless, netdev, linux-kernel, John Stultz

After booting the wireless subsystem and uploading the NV blob to the
WCNSS_CTRL service the remote continues to do things and will not start
servicing wlan-requests for another 2-5 seconds (measured).

The downstream code does not have any special handling for this case,
but has a timeout of 10 seconds for the communication layer. By
extending the wcn36xx timeout to match this we follow the same flow for
the boot procedure and can successfully configure WiFi as wlan0 is
registered.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---
 drivers/net/wireless/ath/wcn36xx/smd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/smd.h b/drivers/net/wireless/ath/wcn36xx/smd.h
index e6aadd273c46..6310560901f0 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.h
+++ b/drivers/net/wireless/ath/wcn36xx/smd.h
@@ -24,7 +24,7 @@

 #define WCN36XX_HAL_BUF_SIZE				4096

-#define HAL_MSG_TIMEOUT 500
+#define HAL_MSG_TIMEOUT 10000
 #define WCN36XX_SMSM_WLAN_TX_ENABLE			0x00000400
 #define WCN36XX_SMSM_WLAN_TX_RINGS_EMPTY		0x00000200
 /* The PNO version info be contained in the rsp msg */
-- 
2.5.0

^ permalink raw reply related

* Re: [PATCH] ixgbevf: Fix relaxed order settings in VF driver
From: Alexander Duyck @ 2016-04-21 21:00 UTC (permalink / raw)
  To: Babu Moger
  Cc: Jeff Kirsher, Brandeburg, Jesse, shannon nelson, Carolyn Wyborny,
	Skidmore, Donald C, Bruce W Allan, John Ronciak, Mitch Williams,
	intel-wired-lan, Netdev, linux-kernel@vger.kernel.org,
	Sowmini Varadhan
In-Reply-To: <57192C7D.9080507@oracle.com>

On Thu, Apr 21, 2016 at 12:39 PM, Babu Moger <babu.moger@oracle.com> wrote:
> Hi Alex,
>
> On 4/21/2016 2:22 PM, Alexander Duyck wrote:
>> On Thu, Apr 21, 2016 at 11:13 AM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On Thu, Apr 21, 2016 at 10:21 AM, Babu Moger <babu.moger@oracle.com> wrote:
>>>> Current code writes the tx/rx relaxed order without reading it first.
>>>> This can lead to unintended consequences as we are forcibly writing
>>>> other bits.
>>>
>>> The consequences were very much intended as there are situations where
>>> enabling relaxed ordering can lead to data corruption.
>>>
>>>> We noticed this problem while testing VF driver on sparc. Relaxed
>>>> order settings for rx queue were all messed up which was causing
>>>> performance drop with VF interface.
>>>
>>> What additional relaxed ordering bits are you enabling on Sparc?  I'm
>>> assuming it is just the Rx data write back but I want to verify.
>>>
>>>> Fixed it by reading the registers first and setting the specific
>>>> bit of interest. With this change we are able to match the bandwidth
>>>> equivalent to PF interface.
>>>>
>>>> Signed-off-by: Babu Moger <babu.moger@oracle.com>
>>>
>>> Fixed is a relative term here since you are only chasing performance
>>> from what I can tell.  We need to make certain that this doesn't break
>>> the driver on any other architectures by leading to things like data
>>> corruption.
>>>
>>> - Alex
>>
>> It occurs to me that what might be easier is instead of altering the
>> configuration on all architectures you could instead wrap the write so
>> that on SPARC you include the extra bits you need and on all other
>> architectures you leave the write as-is similar to how the code in the
>> ixgbe_start_hw_gen2 only clears the bits if CONFIG_SPARC is not
>> defined.
>
>
> Here are the default values that I see when testing on Sparc.
>
> Default tx value 0x2a00
>
> All below 3 set
> #define IXGBE_DCA_TXCTRL_DESC_RRO_EN (1 << 9) /* Tx rd Desc Relax Order */
> #define IXGBE_DCA_TXCTRL_DESC_WRO_EN (1 << 11) /* Tx Desc writeback RO bit */
> #define IXGBE_DCA_TXCTRL_DATA_RRO_EN (1 << 13) /* Tx rd data Relax Order */
>
> I am not too worried about tx values. I can keep it as it is. It did not
> seem to cause any problems right now.
>
>
> Default rx value 0xb200
>
> All below 3 set plus one more
>
> #define IXGBE_DCA_RXCTRL_DESC_RRO_EN (1 << 9) /* DCA Rx rd Desc Relax Order */
> #define IXGBE_DCA_RXCTRL_DATA_WRO_EN (1 << 13) /* Rx wr data Relax Order */
> #define IXGBE_DCA_RXCTRL_HEAD_WRO_EN (1 << 15) /* Rx wr header RO */

So that looks like the register defaults.  Which based on the released
data-sheet for 82599 are a bit off.  The "one more" bit that is set is
supposed to be written as 0 as per the 82599 datasheet, but it
defaults to 1 for some reason.  On the x540 data-sheet that appears to
be a no-snoop bit that you are enabling which should not be enabled.
It won't necessarily hurt things either though as I believe the
no-snoop bit is not being set in the descriptors.
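
For reference, the arithmetic behind the "one more" bit can be checked
with a few lines of standalone C. The RX_* macro names are shortened
stand-ins for the three defines quoted above, and both helpers are made
up for this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Same bit positions as the three Rx defines quoted above. */
#define RX_DESC_RRO_EN (1u << 9)
#define RX_DATA_WRO_EN (1u << 13)
#define RX_HEAD_WRO_EN (1u << 15)

/* Returns 1 if the given bit is set in val, 0 otherwise. */
int rx_bit_is_set(uint32_t val, unsigned int bit)
{
	assert(bit < 32);
	return (val >> bit) & 1u;
}

/* Returns the bits of val not covered by the three named flags. */
uint32_t rx_unnamed_bits(uint32_t val)
{
	const uint32_t named = RX_DESC_RRO_EN | RX_DATA_WRO_EN |
			       RX_HEAD_WRO_EN;

	return val & ~named;
}
```

Feeding in the reported default of 0xb200 leaves 0x1000, so the extra
bit is bit 12.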

> Is there a reason to disable IXGBE_DCA_RXCTRL_DATA_WRO_EN and
> IXGBE_DCA_RXCTRL_HEAD_WRO_EN for RX?

In the case of HEAD_WRO_EN it doesn't give us anything because we
don't have packet split/replication enabled anyway.  That feature is
broken on the 82599, and was deprecated some time ago in the ixgbe
driver.

I don't have the full history on the data writeback, but I believe
some x86 chipsets had issues when the device enabled relaxed ordering.
That was why relaxed ordering was disabled on all writes for the
device.  I was the one that went through and re-enabled relaxed
ordering on reads from the device so we at least allowed that much on
most architectures.

You would probably only need to add IXGBE_DCA_RXCTRL_DATA_WRO_EN to
the write in the case of CONFIG_SPARC being defined.  Another approach
might be to have a define value that you end up passing that is
defined one way if SPARC and another if a different architecture.

> I would think CONFIG_SPARC should be our last option. What do you think?

I was thinking CONFIG_SPARC would allow us to have feature parity with
the PF.  If you look, that is how this issue is solved there, in the
function ixgbe_start_hw_gen2().  That was why I was thinking that may
be the approach we want to take.  Otherwise we would have to write up
some complicated setup that uses the API to determine whether the PF
has already taken care of this for us, which I would prefer not to
have to do.

- Alex

^ permalink raw reply

* Re: [RFC PATCH v2 net-next 2/7] tcp: Merge tx_flags/tskey/txstamp_ack in tcp_collapse_retrans
From: Willem de Bruijn @ 2016-04-21 20:25 UTC (permalink / raw)
  To: Soheil Hassas Yeganeh
  Cc: Martin KaFai Lau, Eric Dumazet, netdev, Neal Cardwell,
	Soheil Hassas Yeganeh, Willem de Bruijn, Yuchung Cheng,
	Kernel Team
In-Reply-To: <CACSApvbzAtCTnom+xyQmaG=ATwncMnvLLDdxqCAYWG7rgYbBmw@mail.gmail.com>

On Tue, Apr 19, 2016 at 2:24 PM, Soheil Hassas Yeganeh
<soheil@google.com> wrote:
> On Tue, Apr 19, 2016 at 2:18 PM, Martin KaFai Lau <kafai@fb.com> wrote:
>> On Tue, Apr 19, 2016 at 10:35:52AM -0700, Eric Dumazet wrote:
>>> On Tue, Apr 19, 2016 at 10:28 AM, Martin KaFai Lau <kafai@fb.com> wrote:
>>>
>>> > A bit off topic, I feel like the SKBTX_ACK_TSTAMP and txstamp_ack are sort
>>> > of redundant but I have not looked into the details yet, so not completely
>>> > sure.  It would be a separate cleanup patch if it is the case.

Yes, with the introduction of txstamp_ack, SKBTX_ACK_TSTAMP is completely
redundant.

>>>
>>> Please read 6b084928baac562ed61866f540a96120e9c9ddb7 changelog ;)
>>>
>>> A cache line miss avoidance is critical
>> I looked at the patch but I probably am missing something :(
>> Is checking txstamp_ack alone enough and SKBTX_ACK_TSTAMP is not needed
>> since they are always set together?
>
> That's right, the check on "(shinfo->tx_flags & SKBTX_ACK_TSTAMP)" in
> tcp_ack_tstamp() is redundant and I had a patch prepared to remove it.

You can even remove the flag completely and set txstamp_ack directly
from tsflags:

-               tcb->txstamp_ack = !!(shinfo->tx_flags & SKBTX_ACK_TSTAMP);
+               if (tsflags & SOF_TIMESTAMPING_TX_ACK)
+                       tcb->txstamp_ack = 1;

> But I thought it's better to wait for
> https://patchwork.ozlabs.org/patch/611938/ to be merged first.
>
> Feel free to remove it in your patches, if you'd prefer that.

^ permalink raw reply

* [PATCH net] ipv4/fib: don't warn when primary address is missing if in_dev is dead
From: Paolo Abeni @ 2016-04-21 20:23 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Alexey Kuznetsov, linux-kernel

After commit fbd40ea0180a ("ipv4: Don't do expensive useless work
during inetdev destroy.") when deleting an interface,
fib_del_ifaddr() can be executed without any primary address
present on the dead interface.

The above is safe, but triggers some "bug: prim == NULL" warnings.

This commit avoids the warning if the in_dev is dead.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/fib_frontend.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 8a9246d..63566ec 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -904,7 +904,11 @@ void fib_del_ifaddr(struct in_ifaddr *ifa, struct in_ifaddr *iprim)
 	if (ifa->ifa_flags & IFA_F_SECONDARY) {
 		prim = inet_ifa_byprefix(in_dev, any, ifa->ifa_mask);
 		if (!prim) {
-			pr_warn("%s: bug: prim == NULL\n", __func__);
+			/* if the device has been deleted, we don't perform
+			 * address promotion
+			 */
+			if (!in_dev->dead)
+				pr_warn("%s: bug: prim == NULL\n", __func__);
 			return;
 		}
 		if (iprim && iprim != prim) {
-- 
1.8.3.1

^ permalink raw reply related

* RE: [net-next 15/17] fm10k: fix possible null pointer deref after kcalloc
From: Keller, Jacob E @ 2016-04-21 20:09 UTC (permalink / raw)
  To: Kirsher, Jeffrey T, David Miller
  Cc: netdev@vger.kernel.org, nhorman@redhat.com, sassmann@redhat.com,
	jogreene@redhat.com
In-Reply-To: <1461268100.3018.5.camel@intel.com>

> On Thu, 2016-04-21 at 15:47 -0400, David Miller wrote:
> > From: "Keller, Jacob E" <jacob.e.keller@intel.com>
> > Date: Thu, 21 Apr 2016 19:44:24 +0000
> >
> > > Dave, please don't pull this patch. Krishneil found an issue with
> > > this patch, and I sent a corrected fix to squash in but it didn't
> > > make it into this one. I'd like to get the fix squashed in a resend
> > > before this is pulled into net-next.
> >
> > Too late, it's already in my tree.
> 
> I will address Jake's concerns in my next fm10k pull request.

Thanks Jeff. Sorry I didn't notice this earlier.

Regards,
Jake

^ permalink raw reply
