Netdev List
* [PATCH net-next 7/7] ibmvnic: Free skb's in cases of failure in transmit
From: Nathan Fontenot @ 2017-04-21 19:39 UTC
  To: netdev; +Cc: brking, jallen, muvic, tlfalcon
In-Reply-To: <20170421193627.11030.34813.stgit@ltcalpine2-lp23.aus.stglabs.ibm.com>

From: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>

When an error is encountered during transmit, we need to free the
skb and return NETDEV_TX_OK instead of returning NETDEV_TX_BUSY,
which would make the stack requeue the same skb.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 5a916a2..51bf337 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -908,9 +908,13 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 				   be32_to_cpu(adapter->login_rsp_buf->
 					       off_txsubm_subcrqs));
 	if (adapter->migrated) {
+		if (!netif_subqueue_stopped(netdev, skb))
+			netif_stop_subqueue(netdev, queue_num);
+		dev_kfree_skb_any(skb);
+
 		tx_send_failed++;
 		tx_dropped++;
-		ret = NETDEV_TX_BUSY;
+		ret = NETDEV_TX_OK;
 		goto out;
 	}
 
@@ -976,11 +980,13 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 						    sizeof(tx_buff->indir_arr),
 						    DMA_TO_DEVICE);
 		if (dma_mapping_error(dev, tx_buff->indir_dma)) {
+			dev_kfree_skb_any(skb);
+			tx_buff->skb = NULL;
 			if (!firmware_has_feature(FW_FEATURE_CMO))
 				dev_err(dev, "tx: unable to map descriptor array\n");
 			tx_map_failed++;
 			tx_dropped++;
-			ret = NETDEV_TX_BUSY;
+			ret = NETDEV_TX_OK;
 			goto out;
 		}
 		lpar_rc = send_subcrq_indirect(adapter, handle_array[queue_num],
@@ -999,9 +1005,15 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 		else
 			tx_pool->consumer_index--;
 
+		dev_kfree_skb_any(skb);
+		tx_buff->skb = NULL;
+
+		if (lpar_rc == H_CLOSED)
+			netif_stop_subqueue(netdev, queue_num);
+
 		tx_send_failed++;
 		tx_dropped++;
-		ret = NETDEV_TX_BUSY;
+		ret = NETDEV_TX_OK;
 		goto out;
 	}
 

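The ownership rule behind this patch can be modeled in user space. In the sketch below, `struct fake_dev`, `fake_xmit()`, `stack_send()` and the `TX_OK`/`TX_BUSY` values are hypothetical stand-ins, not kernel APIs. Reporting busy leaves the packet with the caller, which retries the same packet, so a persistent error (such as a failed DMA mapping) never makes progress; freeing the packet, counting a drop and reporting OK lets the queue move on.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for NETDEV_TX_OK / NETDEV_TX_BUSY. */
enum tx_result { TX_OK, TX_BUSY };

struct pkt {
	int len;
};

struct fake_dev {
	bool broken;        /* persistent failure, e.g. DMA mapping error */
	bool old_behavior;  /* true: report busy on error (pre-patch) */
	unsigned long tx_sent;
	unsigned long tx_dropped;
};

/* Driver transmit hook. Returning TX_OK means the driver consumed the
 * packet; returning TX_BUSY means the caller still owns it. */
static enum tx_result fake_xmit(struct fake_dev *dev, struct pkt *p)
{
	if (dev->broken) {
		if (dev->old_behavior)
			return TX_BUSY;     /* not consumed: caller will retry */
		free(p);                    /* consume the packet,   */
		dev->tx_dropped++;          /* account for the drop, */
		return TX_OK;               /* and report it handled */
	}
	free(p);                            /* pretend it was sent and completed */
	dev->tx_sent++;
	return TX_OK;
}

/* Stack side: keep retrying while the driver reports busy, up to
 * max_tries (the bound exists only for this demo). Returns the number
 * of attempts, or -1 if allocation failed. */
static int stack_send(struct fake_dev *dev, int max_tries)
{
	struct pkt *p = malloc(sizeof(*p));
	int tries = 0;

	if (!p)
		return -1;
	p->len = 64;
	while (tries < max_tries) {
		tries++;
		if (fake_xmit(dev, p) == TX_OK)
			return tries;   /* driver took ownership */
	}
	free(p);                        /* gave up; we still own it */
	return tries;
}
```

In the real driver the free is `dev_kfree_skb_any()` and the drop is accounted in the `tx_dropped` counter touched by the hunks above.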
* Re: [PATCH net-next v4 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: David Miller @ 2017-04-21 14:51 UTC
  To: jhs; +Cc: eric.dumazet, jiri, netdev, xiyou.wangcong
In-Reply-To: <ed5c82c7-0e59-d377-fefc-3b97764d2843@mojatatu.com>

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Fri, 21 Apr 2017 06:36:19 -0400

> On 17-04-20 01:58 PM, David Miller wrote:
>> From: Jamal Hadi Salim <jhs@mojatatu.com>
>> Date: Thu, 20 Apr 2017 13:38:14 -0400
>>
> 
>>> There are no examples of such issues with bitmasks encapsulated in
>>> TLVs
> 
>>> It does not make much sense to have a TLV for each of these
>>> bits when i can fit a bunch of them in u16/32/64.
>>
>> I have not ruled out bitmasks.  I'm only saying that the kernel must
>> properly reject bits it doesn't recognize when they are set.
>>
> 
> It is the other way round from what i see: It ignores them.

Which means we can never use them for anything else reliably,
there could be random crap in there.

> This allows new bits to be added over time.

No, ignoring them actually means we cannot add new bits.

> Note: It is a bug - which must be fixed - if user space sets
> something the kernel doesn't want it to set. Even then, the only good
> use case I can think of for something like this is the kernel
> is exposing something to user space for read-only and user space
> is being silly and setting read-only bits on requests to the kernel.
> But even that is not a catastrophic issue; kernel should just ignore
> it.

But since we didn't check and enforce, we can't use the bits for
settings however we like.

That's the entire point.

We can _never_ go back later and say "oops, add the checks now, it's
all good" because that doesn't work at all.
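The check David is asking for is a mask test on every flags word the kernel receives. A minimal sketch, assuming a hypothetical 32-bit flags attribute whose only defined bits are `EX_FLAG_LARGE_DUMP` and `EX_FLAG_OTHER` (illustrative names, not real netlink constants); the negative return stands for the error the kernel would report back to the sender:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits understood by this (simulated) kernel. */
#define EX_FLAG_LARGE_DUMP  (1u << 0)
#define EX_FLAG_OTHER       (1u << 1)
#define EX_FLAGS_SUPPORTED  (EX_FLAG_LARGE_DUMP | EX_FLAG_OTHER)

/* Validate a flags word received from user space. Any bit outside the
 * supported mask is rejected instead of silently ignored, so unused
 * bits are guaranteed to be zero and can be given a meaning later. */
static int ex_validate_flags(uint32_t flags)
{
	if (flags & ~EX_FLAGS_SUPPORTED)
		return -EINVAL;
	return 0;
}
```

Because every unknown bit fails today, a later kernel can assign bit 2 a meaning knowing that no existing application was already setting it by accident.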

* Re: [net-next 00/11][pull request] 10GbE Intel Wired LAN Driver Updates 2017-04-20
From: David Miller @ 2017-04-21 18:18 UTC
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20170421015029.18994-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 20 Apr 2017 18:50:18 -0700

> John adds XDP support (yeah!) for ixgbe.

As excited and eager as I am about this, I want to see the build regression
for PAGE_SIZE>=8192 fixed before I pull this.

Thanks.

* Re: net: heap out-of-bounds in fib6_clean_node/rt6_fill_node/fib6_age/fib6_prune_clone
From: David Ahern @ 2017-04-21 18:25 UTC
  To: Eric Dumazet
  Cc: Andrey Konovalov, Dmitry Vyukov, Cong Wang, Mahesh Bandewar,
	Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller
In-Reply-To: <1492793238.6453.30.camel@edumazet-glaptop3.roam.corp.google.com>

On 4/21/17 10:47 AM, Eric Dumazet wrote:
> On Fri, 2017-04-21 at 08:27 -0600, David Ahern wrote:
>> On 4/20/17 10:09 AM, Andrey Konovalov wrote:
>>> On Thu, Apr 20, 2017 at 5:39 PM, Andrey Konovalov <andreyknvl@google.com> wrote:
>>>> On Thu, Apr 20, 2017 at 5:35 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
>>>>> On 4/20/17 9:28 AM, Andrey Konovalov wrote:
>>>>>> This one seems to be much closer to what Dmitry reported initially.
>>>>> does not repro here; I ran in a loop and nothing.
>>> Here's strace log, maybe it'll help figuring out why it doesn't reproduce:
>>
>> reproduced. working on it.
> 
> Thanks guys for working on this ;)
> 


A reliable reproducer is the key.

I see what's going on (why the WARN_ON is hit); I'm just looking for
the right fix.

* [PATCH net-next 2/3] mlx5: fix warning about missing prototype
From: Stephen Hemminger @ 2017-04-21 18:15 UTC
  To: davem, saeedm, matanb, leonro; +Cc: linux-rdma, netdev, Stephen Hemminger
In-Reply-To: <20170421181558.5414-1-sthemmin@microsoft.com>

Fix sparse warnings about missing prototypes. The rx/tx code paths
define functions whose prototypes are in ipoib.h, so include that header.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 43308243f519..ae66fad98244 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -39,6 +39,7 @@
 #include "en.h"
 #include "en_tc.h"
 #include "eswitch.h"
+#include "ipoib.h"
 
 static inline bool mlx5e_rx_hw_stamp(struct mlx5e_tstamp *tstamp)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index dda7db503043..ab3bb026ff9e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -33,6 +33,7 @@
 #include <linux/tcp.h>
 #include <linux/if_vlan.h>
 #include "en.h"
+#include "ipoib.h"
 
 #define MLX5E_SQ_NOPS_ROOM  MLX5_SEND_WQE_MAX_WQEBBS
 #define MLX5E_SQ_STOP_ROOM (MLX5_SEND_WQE_MAX_WQEBBS +\
-- 
2.11.0

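Sparse ("symbol 'x' was not declared. Should it be static?") and gcc's `-Wmissing-prototypes` both flag a non-static function defined with no earlier declaration in scope, because nothing then checks that callers in other files agree with the definition. Including the header that holds the prototypes puts the declaration in scope. A single-file sketch of the same idea, where the declaration stands in for `ipoib.h` and `example_sq_xmit()` is a hypothetical name:

```c
/* --- would normally live in a shared header such as ipoib.h --- */
int example_sq_xmit(int len);

/* --- definition in the .c file. With the prototype above in scope,
 * sparse and gcc -Wmissing-prototypes stay quiet, and the compiler
 * verifies that this definition matches the declaration. --- */
int example_sq_xmit(int len)
{
	return len > 0 ? 0 : -1;
}
```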
* [PATCH iproute2] gre6: fix copy/paste bugs in GREv6 attribute manipulation
From: Craig Gallek @ 2017-04-21 18:14 UTC
  To: stephen; +Cc: netdev

From: Craig Gallek <kraig@google.com>

Fixes: af89576d7a8c ("iproute2: GRE over IPv6 tunnel support.")
Signed-off-by: Craig Gallek <kraig@google.com>
---
 ip/link_gre6.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ip/link_gre6.c b/ip/link_gre6.c
index a91f635760fa..1b4fb051b37f 100644
--- a/ip/link_gre6.c
+++ b/ip/link_gre6.c
@@ -355,7 +355,7 @@ get_failed:
 	addattr_l(n, 1024, IFLA_GRE_TTL, &hop_limit, 1);
 	addattr_l(n, 1024, IFLA_GRE_ENCAP_LIMIT, &encap_limit, 1);
 	addattr_l(n, 1024, IFLA_GRE_FLOWINFO, &flowinfo, 4);
-	addattr_l(n, 1024, IFLA_GRE_FLAGS, &flowinfo, 4);
+	addattr32(n, 1024, IFLA_GRE_FLAGS, flags);
 
 	addattr16(n, 1024, IFLA_GRE_ENCAP_TYPE, encaptype);
 	addattr16(n, 1024, IFLA_GRE_ENCAP_FLAGS, encapflags);
@@ -383,7 +383,7 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 		flags = rta_getattr_u32(tb[IFLA_GRE_FLAGS]);
 
 	if (tb[IFLA_GRE_FLOWINFO])
-		flags = rta_getattr_u32(tb[IFLA_GRE_FLOWINFO]);
+		flowinfo = rta_getattr_u32(tb[IFLA_GRE_FLOWINFO]);
 
 	if (tb[IFLA_GRE_REMOTE]) {
 		struct in6_addr addr;
-- 
2.12.2.816.g2cccc81164-goog

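Both hunks fix a mismatch between an attribute type and the variable that carries its value. The round trip can be sketched in user space with a simplified TLV layout. `struct ex_attr`, `ex_put_u32()` and `ex_get_u32()` are hypothetical helpers that only mirror the shape of `struct rtattr`, iproute2's `addattr32()` and `rta_getattr_u32()`; the attribute numbers are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Simplified attribute header with the same shape as struct rtattr. */
struct ex_attr {
	uint16_t len;   /* header + payload, in bytes */
	uint16_t type;
};

/* Hypothetical attribute numbers for this sketch only. */
enum { EX_GRE_FLOWINFO = 1, EX_GRE_FLAGS = 2 };

/* Attributes start on 4-byte boundaries, as with RTA_ALIGN(). */
#define EX_ALIGN(n) (((n) + 3u) & ~(size_t)3u)

/* Append a u32 attribute at *off (caller guarantees buf is large enough). */
static void ex_put_u32(unsigned char *buf, size_t *off, uint16_t type,
		       uint32_t val)
{
	struct ex_attr a;

	a.len = (uint16_t)(sizeof(a) + sizeof(val));
	a.type = type;
	memcpy(buf + *off, &a, sizeof(a));
	memcpy(buf + *off + sizeof(a), &val, sizeof(val));
	*off += EX_ALIGN((size_t)a.len);
}

/* Return the u32 payload of the first attribute of the given type,
 * or 0 if it is absent or too short. */
static uint32_t ex_get_u32(const unsigned char *buf, size_t total,
			   uint16_t type)
{
	size_t off = 0;

	while (off + sizeof(struct ex_attr) <= total) {
		struct ex_attr a;
		uint32_t val;

		memcpy(&a, buf + off, sizeof(a));
		if (a.len < sizeof(a))
			break;                  /* malformed: stop parsing */
		if (a.type == type && a.len >= sizeof(a) + sizeof(val)) {
			memcpy(&val, buf + off + sizeof(a), sizeof(val));
			return val;
		}
		off += EX_ALIGN((size_t)a.len);
	}
	return 0;
}
```

Before the fix, the `IFLA_GRE_FLAGS` attribute carried the value of `flowinfo`, and on the print side the flow info was read into `flags`; writing each call as a (type, matching variable) pair, as above, is what the patch restores.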
* [PATCH net-next 1/3] mlx5: hide unused functions
From: Stephen Hemminger @ 2017-04-21 18:15 UTC
  To: davem, saeedm, matanb, leonro; +Cc: linux-rdma, netdev, Stephen Hemminger
In-Reply-To: <20170421181558.5414-1-sthemmin@microsoft.com>

Fix sparse warnings in the recent ipoib support.
The RDMA functions are not used yet, so hide them behind #ifdef.
Based on the comment, they will eventually be local, so make them static.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
index ec78e637840f..3c84e36af018 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
@@ -178,7 +178,7 @@ static int mlx5i_init_tx(struct mlx5e_priv *priv)
 	return 0;
 }
 
-void mlx5i_cleanup_tx(struct mlx5e_priv *priv)
+static void mlx5i_cleanup_tx(struct mlx5e_priv *priv)
 {
 	struct mlx5i_priv *ipriv = priv->ppriv;
 
@@ -359,9 +359,10 @@ static int mlx5i_close(struct net_device *netdev)
 	return 0;
 }
 
+#ifdef notusedyet
 /* IPoIB RDMA netdev callbacks */
-int mlx5i_attach_mcast(struct net_device *netdev, struct ib_device *hca,
-		       union ib_gid *gid, u16 lid, int set_qkey)
+static int mlx5i_attach_mcast(struct net_device *netdev, struct ib_device *hca,
+			      union ib_gid *gid, u16 lid, int set_qkey)
 {
 	struct mlx5e_priv    *epriv = mlx5i_epriv(netdev);
 	struct mlx5_core_dev *mdev  = epriv->mdev;
@@ -377,8 +378,8 @@ int mlx5i_attach_mcast(struct net_device *netdev, struct ib_device *hca,
 	return err;
 }
 
-int mlx5i_detach_mcast(struct net_device *netdev, struct ib_device *hca,
-		       union ib_gid *gid, u16 lid)
+static int mlx5i_detach_mcast(struct net_device *netdev, struct ib_device *hca,
+			      union ib_gid *gid, u16 lid)
 {
 	struct mlx5e_priv    *epriv = mlx5i_epriv(netdev);
 	struct mlx5_core_dev *mdev  = epriv->mdev;
@@ -395,7 +396,7 @@ int mlx5i_detach_mcast(struct net_device *netdev, struct ib_device *hca,
 	return err;
 }
 
-int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb,
+static int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb,
 	       struct ib_ah *address, u32 dqpn, u32 dqkey)
 {
 	struct mlx5e_priv *epriv = mlx5i_epriv(dev);
@@ -404,6 +405,7 @@ int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb,
 
 	return mlx5i_sq_xmit(sq, skb, &mah->av, dqpn, dqkey);
 }
+#endif
 
 static int mlx5i_check_required_hca_cap(struct mlx5_core_dev *mdev)
 {
@@ -418,10 +420,10 @@ static int mlx5i_check_required_hca_cap(struct mlx5_core_dev *mdev)
 	return 0;
 }
 
-struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
-					  struct ib_device *ibdev,
-					  const char *name,
-					  void (*setup)(struct net_device *))
+static struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
+						 struct ib_device *ibdev,
+						 const char *name,
+						 void (*setup)(struct net_device *))
 {
 	const struct mlx5e_profile *profile = &mlx5i_nic_profile;
 	int nch = profile->max_nch(mdev);
@@ -480,7 +482,7 @@ struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
 }
 EXPORT_SYMBOL(mlx5_rdma_netdev_alloc);
 
-void mlx5_rdma_netdev_free(struct net_device *netdev)
+static void mlx5_rdma_netdev_free(struct net_device *netdev)
 {
 	struct mlx5e_priv          *priv    = mlx5i_epriv(netdev);
 	const struct mlx5e_profile *profile = priv->profile;
-- 
2.11.0

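Making an unused function `static` silences sparse's "not declared" warning but trades it for gcc's `-Wunused-function` ("defined but not used"), which `-Wall` enables; that is why the not-yet-used RDMA callbacks are also wrapped in `#ifdef notusedyet`. A reduced sketch with hypothetical function names:

```c
/* "notusedyet" is never defined, as in the patch, so this block is
 * compiled out: gcc sees no unused static function and sparse sees
 * no undeclared global symbol. */
#ifdef notusedyet
static int example_attach_mcast(int gid)
{
	return gid;
}
#endif

/* Used only inside this file: being static, it needs no prototype in
 * a shared header. */
static int example_cleanup_tx(int *refcnt)
{
	return --(*refcnt);
}
```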
* Re: [PATCH net-next v4 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: David Miller @ 2017-04-21 15:38 UTC
  To: jhs; +Cc: eric.dumazet, jiri, netdev, xiyou.wangcong
In-Reply-To: <82a6c32b-d58e-aeed-bfb5-546f328eaf35@mojatatu.com>

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Fri, 21 Apr 2017 11:29:19 -0400

> On 17-04-21 10:51 AM, David Miller wrote:
>> From: Jamal Hadi Salim <jhs@mojatatu.com>
>> Date: Fri, 21 Apr 2017 06:36:19 -0400
>>
>>> On 17-04-20 01:58 PM, David Miller wrote:
>>>> From: Jamal Hadi Salim <jhs@mojatatu.com>
>>>> Date: Thu, 20 Apr 2017 13:38:14 -0400
>>>>
>>>
> 
>>
>> Which means we can never use them for anything else reliably,
>> there could be random crap in there.
>>
> 
> Today: User space set them to zero.

You don't know this because the kernel has never verified it.
Jamal, you cannot walk past this important point, nothing can
be argued further because of it.

> Old kernels ignore them. New kernels look at the new ones.
> We'll be in a lot of trouble if this was not the case
> for things today;-> People add bits all the time in TLVs
> and in netlink headers that are labeled as flags.

And when we do things that way it's broken, and why we have such
crappy behavior.

We made a very bad decision a long time ago to ignore unrecognized
things in netlink and it was a grave error which we must start
correcting now.

If a user says "enable X" and it just gets simply ignored by older
kernels, that can't work properly.  What if "enable X" is something
like "turn on encryption"?  Are you OK with the user getting no
feedback that their stuff is not going to be encrypted?

Even something as benign as "give me larger action dumps" _must_ still
have the same behavior because the user has no alternative action plan
possible if it cannot tell if the kernel supports the facility or not.

> Dave, I don't think you are suggesting we should use a TLV for every
> bit we want to send to the kernel (as Jiri is), are you?

Jiri is not suggesting this; he is instead saying that if you want to
support more bits in the future, then you must check that the unused
bits are zero _now_ so that we can prove that userland clears them
properly.

And if you don't have any direct plans for more bits in the future,
use just a single attribute with the smallest integer type possible.

> I think you are suggesting we should, from now on, enforce a rule that
> in the kernel we start checking for bits in a received bitmap for
> things we are not interested in. So if a bit I don't understand shows
> up in the kernel, what should I do?

Reject it.

> Rejecting the transaction because I received something I don't
> understand is not conducive to forward compatibility.

Not rejecting it breaks everything and gives the user no feedback
or way whatsoever to know whether the kernel supports something or
not.

I'm not letting us continue to do things so stupidly any more.

I want future applications to know if they are running on an older
kernel and that a specific netlink feature is not supported.

Ignoring not-understood bits prevents that and is the single most
fundamental mistake we've made in netlink.
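The payoff David describes is on the application side: once unknown bits are rejected, a program can probe for a feature and fall back cleanly. In this hypothetical simulation, `fake_kernel_request()` stands in for a netlink round trip to a kernel that understands the bits in `supported`, and `EX_FLAG_LARGE_DUMP` is an illustrative name; neither is a real API.

```c
#include <errno.h>
#include <stdint.h>

#define EX_FLAG_LARGE_DUMP (1u << 0)  /* hypothetical "larger dumps" bit */

/* Simulated kernel: accepts a request only if every requested bit is
 * one it understands; otherwise it fails with -EINVAL. */
static int fake_kernel_request(uint32_t supported, uint32_t flags)
{
	return (flags & ~supported) ? -EINVAL : 0;
}

/* Application: ask for large dumps and, if the kernel rejects the bit,
 * retry without it. Returns the flags the kernel actually accepted, so
 * the caller knows which behavior it will get (UINT32_MAX on an
 * unexpected failure). */
static uint32_t request_dump(uint32_t kernel_supported)
{
	uint32_t flags = EX_FLAG_LARGE_DUMP;

	if (fake_kernel_request(kernel_supported, flags) == -EINVAL) {
		flags &= ~EX_FLAG_LARGE_DUMP;   /* older kernel: fall back */
		if (fake_kernel_request(kernel_supported, flags) != 0)
			return UINT32_MAX;
	}
	return flags;
}
```

If the simulated kernel instead ignored unknown bits, the first call would succeed on both kernels and `request_dump()` would have no way to tell which behavior it actually got.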

* Re: [net-next 00/14][pull request] 40GbE Intel Wired LAN Driver Updates 2017-04-19
From: David Miller @ 2017-04-21 18:13 UTC
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20170420015750.6828-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 19 Apr 2017 18:57:36 -0700

> This series contains updates to i40e and i40evf only, most notable being
> the addition of trace points for BPF programs.

Pulled, thanks Jeff.

* Re: [net-next 04/14] i40e: dump VF information in debugfs
From: David Miller @ 2017-04-21 18:12 UTC
  To: mitch.a.williams
  Cc: Yuval.Mintz, gerlitz.or, jeffrey.t.kirsher, netdev, nhorman,
	sassmann, jogreene
In-Reply-To: <AAEA33E297BCAC4B9BB20A7C2DF0AB8D80551D1A@FMSMSX113.amr.corp.intel.com>

From: "Williams, Mitch A" <mitch.a.williams@intel.com>
Date: Thu, 20 Apr 2017 22:25:03 +0000

> I'm not adding backdoor hooks to do naughty things, I'm just (very
> slightly) enhancing what's already there.

I'm not going to drop this series because of this.

I'm just sounding the warning that I probably have to start pushing
back harder in the future because, seriously, debugfs is not deployed
in production, and in fact a lot of people disable it for security
reasons.

Even tracing moved out of debugfs for this reason.

* [PATCH] MAINTAINERS: Add "B:" field for networking.
From: David Miller @ 2017-04-21 14:45 UTC
  To: netdev


We want people to report bugs to the netdev list.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 63f8abc..08950e5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8761,6 +8761,7 @@ W:	http://www.linuxfoundation.org/en/Net
 Q:	http://patchwork.ozlabs.org/project/netdev/list/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
+B:	mailto:netdev@vger.kernel.org
 S:	Maintained
 F:	net/
 F:	include/net/
-- 
2.4.11

* [PATCH iproute2] iplink: Expose IFLA_*_FWMARK attributes for supported link types
From: Craig Gallek @ 2017-04-21 18:14 UTC
  To: stephen; +Cc: netdev

From: Craig Gallek <kraig@google.com>

This attribute allows the administrator to adjust the packet marking
attribute of tunnels that support policy-based routing.

Signed-off-by: Craig Gallek <kraig@google.com>
---
 include/linux/if_tunnel.h |  3 +++
 ip/link_gre.c             | 16 ++++++++++++++++
 ip/link_gre6.c            | 24 +++++++++++++++++++++++-
 ip/link_ip6tnl.c          | 23 ++++++++++++++++++-----
 ip/link_iptnl.c           | 16 ++++++++++++++++
 ip/link_vti.c             | 16 ++++++++++++++++
 ip/link_vti6.c            | 15 +++++++++++++++
 7 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/include/linux/if_tunnel.h b/include/linux/if_tunnel.h
index 4f975f5704d8..7375335a0773 100644
--- a/include/linux/if_tunnel.h
+++ b/include/linux/if_tunnel.h
@@ -75,6 +75,7 @@ enum {
 	IFLA_IPTUN_ENCAP_SPORT,
 	IFLA_IPTUN_ENCAP_DPORT,
 	IFLA_IPTUN_COLLECT_METADATA,
+	IFLA_IPTUN_FWMARK,
 	__IFLA_IPTUN_MAX,
 };
 #define IFLA_IPTUN_MAX	(__IFLA_IPTUN_MAX - 1)
@@ -132,6 +133,7 @@ enum {
 	IFLA_GRE_ENCAP_DPORT,
 	IFLA_GRE_COLLECT_METADATA,
 	IFLA_GRE_IGNORE_DF,
+	IFLA_GRE_FWMARK,
 	__IFLA_GRE_MAX,
 };
 
@@ -147,6 +149,7 @@ enum {
 	IFLA_VTI_OKEY,
 	IFLA_VTI_LOCAL,
 	IFLA_VTI_REMOTE,
+	IFLA_VTI_FWMARK,
 	__IFLA_VTI_MAX,
 };
 
diff --git a/ip/link_gre.c b/ip/link_gre.c
index 35d437a15562..82df900614bf 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -42,11 +42,13 @@ static void print_usage(FILE *f)
 		"                            [ [no]encap-csum ]\n"
 		"                            [ [no]encap-csum6 ]\n"
 		"                            [ [no]encap-remcsum ]\n"
+		"                            [ fwmark MARK ]\n"
 		"\n"
 		"Where: ADDR := { IP_ADDRESS | any }\n"
 		"       TOS  := { NUMBER | inherit }\n"
 		"       TTL  := { 1..255 | inherit }\n"
 		"       KEY  := { DOTTED_QUAD | NUMBER }\n"
+		"       MARK := { 0x0..0xffffffff }\n"
 	);
 }
 
@@ -91,6 +93,7 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u16 encapsport = 0;
 	__u16 encapdport = 0;
 	__u8 metadata = 0;
+	__u32 fwmark = 0;
 
 	if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 		if (rtnl_talk(&rth, &req.n, &req.n, sizeof(req)) < 0) {
@@ -160,6 +163,9 @@ get_failed:
 
 		if (greinfo[IFLA_GRE_COLLECT_METADATA])
 			metadata = 1;
+
+		if (greinfo[IFLA_GRE_FWMARK])
+			fwmark = rta_getattr_u32(greinfo[IFLA_GRE_FWMARK]);
 	}
 
 	while (argc > 0) {
@@ -305,6 +311,10 @@ get_failed:
 			encapflags |= ~TUNNEL_ENCAP_FLAG_REMCSUM;
 		} else if (strcmp(*argv, "external") == 0) {
 			metadata = 1;
+		} else if (strcmp(*argv, "fwmark") == 0) {
+			NEXT_ARG();
+			if (get_u32(&fwmark, *argv, 0))
+				invarg("invalid fwmark\n", *argv);
 		} else
 			usage();
 		argc--; argv++;
@@ -335,6 +345,7 @@ get_failed:
 			addattr32(n, 1024, IFLA_GRE_LINK, link);
 		addattr_l(n, 1024, IFLA_GRE_TTL, &ttl, 1);
 		addattr_l(n, 1024, IFLA_GRE_TOS, &tos, 1);
+		addattr32(n, 1024, IFLA_GRE_FWMARK, fwmark);
 	} else {
 		addattr_l(n, 1024, IFLA_GRE_COLLECT_METADATA, NULL, 0);
 	}
@@ -426,6 +437,11 @@ static void gre_print_direct_opt(FILE *f, struct rtattr *tb[])
 		fputs("icsum ", f);
 	if (oflags & GRE_CSUM)
 		fputs("ocsum ", f);
+
+	if (tb[IFLA_GRE_FWMARK] && rta_getattr_u32(tb[IFLA_GRE_FWMARK])) {
+		fprintf(f, "fwmark 0x%x ",
+			rta_getattr_u32(tb[IFLA_GRE_FWMARK]));
+	}
 }
 
 static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
diff --git a/ip/link_gre6.c b/ip/link_gre6.c
index 1b4fb051b37f..205bada78054 100644
--- a/ip/link_gre6.c
+++ b/ip/link_gre6.c
@@ -43,6 +43,7 @@ static void print_usage(FILE *f)
 		"                                  [ tclass TCLASS ]\n"
 		"                                  [ flowlabel FLOWLABEL ]\n"
 		"                                  [ dscp inherit ]\n"
+		"                                  [ fwmark MARK ]\n"
 		"                                  [ dev PHYS_DEV ]\n"
 		"                                  [ noencap ]\n"
 		"                                  [ encap { fou | gue | none } ]\n"
@@ -57,7 +58,8 @@ static void print_usage(FILE *f)
 		"       KEY       := { DOTTED_QUAD | NUMBER }\n"
 		"       ELIM      := { none | 0..255 }(default=%d)\n"
 		"       TCLASS    := { 0x0..0xff | inherit }\n"
-		"       FLOWLABEL := { 0x0..0xfffff | inherit }\n",
+		"       FLOWLABEL := { 0x0..0xfffff | inherit }\n"
+		"       MARK      := { 0x0..0xffffffff | inherit }\n",
 		DEFAULT_TNL_HOP_LIMIT, IPV6_DEFAULT_TNL_ENCAP_LIMIT
 	);
 }
@@ -103,6 +105,7 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u16 encapsport = 0;
 	__u16 encapdport = 0;
 	int len;
+	__u32 fwmark = 0;
 
 	if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 		if (rtnl_talk(&rth, &req.n, &req.n, sizeof(req)) < 0) {
@@ -174,6 +177,9 @@ get_failed:
 
 		if (greinfo[IFLA_GRE_ENCAP_DPORT])
 			encapdport = rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_DPORT]);
+
+		if (greinfo[IFLA_GRE_FWMARK])
+			fwmark = rta_getattr_u32(greinfo[IFLA_GRE_FWMARK]);
 	}
 
 	while (argc > 0) {
@@ -339,6 +345,16 @@ get_failed:
 			encapflags |= TUNNEL_ENCAP_FLAG_REMCSUM;
 		} else if (strcmp(*argv, "noencap-remcsum") == 0) {
 			encapflags &= ~TUNNEL_ENCAP_FLAG_REMCSUM;
+		} else if (strcmp(*argv, "fwmark") == 0) {
+			NEXT_ARG();
+			if (strcmp(*argv, "inherit") == 0) {
+				flags |= IP6_TNL_F_USE_ORIG_FWMARK;
+				fwmark = 0;
+			} else {
+				if (get_u32(&fwmark, *argv, 0))
+					invarg("invalid fwmark\n", *argv);
+				flags &= ~IP6_TNL_F_USE_ORIG_FWMARK;
+			}
 		} else
 			usage();
 		argc--; argv++;
@@ -356,6 +372,7 @@ get_failed:
 	addattr_l(n, 1024, IFLA_GRE_ENCAP_LIMIT, &encap_limit, 1);
 	addattr_l(n, 1024, IFLA_GRE_FLOWINFO, &flowinfo, 4);
 	addattr32(n, 1024, IFLA_GRE_FLAGS, flags);
+	addattr32(n, 1024, IFLA_GRE_FWMARK, fwmark);
 
 	addattr16(n, 1024, IFLA_GRE_ENCAP_TYPE, encaptype);
 	addattr16(n, 1024, IFLA_GRE_ENCAP_FLAGS, encapflags);
@@ -461,6 +478,11 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 	if (oflags & GRE_CSUM)
 		fputs("ocsum ", f);
 
+	if (flags & IP6_TNL_F_USE_ORIG_FWMARK)
+		fprintf(f, "fwmark inherit ");
+	else if (tb[IFLA_GRE_FWMARK] && rta_getattr_u32(tb[IFLA_GRE_FWMARK]))
+		fprintf(f, "fwmark 0x%x ", rta_getattr_u32(tb[IFLA_GRE_FWMARK]));
+
 	if (tb[IFLA_GRE_ENCAP_TYPE] &&
 	    rta_getattr_u16(tb[IFLA_GRE_ENCAP_TYPE]) != TUNNEL_ENCAP_NONE) {
 		__u16 type = rta_getattr_u16(tb[IFLA_GRE_ENCAP_TYPE]);
diff --git a/ip/link_ip6tnl.c b/ip/link_ip6tnl.c
index 6bb968d3c918..505fb47622ed 100644
--- a/ip/link_ip6tnl.c
+++ b/ip/link_ip6tnl.c
@@ -41,7 +41,7 @@ static void print_usage(FILE *f)
 		"                  [ tclass TCLASS ]\n"
 		"                  [ flowlabel FLOWLABEL ]\n"
 		"                  [ dscp inherit ]\n"
-		"                  [ fwmark inherit ]\n"
+		"                  [ fwmark MARK ]\n"
 		"                  [ noencap ]\n"
 		"                  [ encap { fou | gue | none } ]\n"
 		"                  [ encap-sport PORT ]\n"
@@ -55,7 +55,8 @@ static void print_usage(FILE *f)
 		"       ELIM      := { none | 0..255 }(default=%d)\n"
 		"       HLIM      := 0..255 (default=%d)\n"
 		"       TCLASS    := { 0x0..0xff | inherit }\n"
-		"       FLOWLABEL := { 0x0..0xfffff | inherit }\n",
+		"       FLOWLABEL := { 0x0..0xfffff | inherit }\n"
+		"       MARK      := { 0x0..0xffffffff | inherit }\n",
 		IPV6_DEFAULT_TNL_ENCAP_LIMIT, DEFAULT_TNL_HOP_LIMIT
 	);
 }
@@ -99,6 +100,7 @@ static int ip6tunnel_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u16 encapsport = 0;
 	__u16 encapdport = 0;
 	__u8 metadata = 0;
+	__u32 fwmark = 0;
 
 	if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 		if (rtnl_talk(&rth, &req.n, &req.n, sizeof(req)) < 0) {
@@ -153,6 +155,9 @@ get_failed:
 			proto = rta_getattr_u8(iptuninfo[IFLA_IPTUN_PROTO]);
 		if (iptuninfo[IFLA_IPTUN_COLLECT_METADATA])
 			metadata = 1;
+
+		if (iptuninfo[IFLA_IPTUN_FWMARK])
+			fwmark = rta_getattr_u32(iptuninfo[IFLA_IPTUN_FWMARK]);
 	}
 
 	while (argc > 0) {
@@ -252,9 +257,14 @@ get_failed:
 			flags |= IP6_TNL_F_RCV_DSCP_COPY;
 		} else if (strcmp(*argv, "fwmark") == 0) {
 			NEXT_ARG();
-			if (strcmp(*argv, "inherit") != 0)
-				invarg("not inherit", *argv);
-			flags |= IP6_TNL_F_USE_ORIG_FWMARK;
+			if (strcmp(*argv, "inherit") == 0) {
+				flags |= IP6_TNL_F_USE_ORIG_FWMARK;
+				fwmark = 0;
+			} else {
+				if (get_u32(&fwmark, *argv, 0))
+					invarg("invalid fwmark\n", *argv);
+				flags &= ~IP6_TNL_F_USE_ORIG_FWMARK;
+			}
 		} else if (strcmp(*argv, "noencap") == 0) {
 			encaptype = TUNNEL_ENCAP_NONE;
 		} else if (strcmp(*argv, "encap") == 0) {
@@ -308,6 +318,7 @@ get_failed:
 	addattr32(n, 1024, IFLA_IPTUN_FLOWINFO, flowinfo);
 	addattr32(n, 1024, IFLA_IPTUN_FLAGS, flags);
 	addattr32(n, 1024, IFLA_IPTUN_LINK, link);
+	addattr32(n, 1024, IFLA_IPTUN_FWMARK, fwmark);
 
 	addattr16(n, 1024, IFLA_IPTUN_ENCAP_TYPE, encaptype);
 	addattr16(n, 1024, IFLA_IPTUN_ENCAP_FLAGS, encapflags);
@@ -398,6 +409,8 @@ static void ip6tunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb
 
 	if (flags & IP6_TNL_F_USE_ORIG_FWMARK)
 		fprintf(f, "fwmark inherit ");
+	else if (tb[IFLA_IPTUN_FWMARK] && rta_getattr_u32(tb[IFLA_IPTUN_FWMARK]))
+		fprintf(f, "fwmark 0x%x ", rta_getattr_u32(tb[IFLA_IPTUN_FWMARK]));
 
 	if (tb[IFLA_IPTUN_ENCAP_TYPE] &&
 	    rta_getattr_u16(tb[IFLA_IPTUN_ENCAP_TYPE]) !=
diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
index f180b921e471..2f74d9b7df1a 100644
--- a/ip/link_iptnl.c
+++ b/ip/link_iptnl.c
@@ -52,10 +52,12 @@ static void print_usage(FILE *f, int sit)
 			"                [ isatap ]\n");
 	}
 	fprintf(f, "                [ external ]\n");
+	fprintf(f, "                [ fwmark MARK ]\n");
 	fprintf(f, "\n");
 	fprintf(f, "Where: ADDR := { IP_ADDRESS | any }\n");
 	fprintf(f, "       TOS  := { NUMBER | inherit }\n");
 	fprintf(f, "       TTL  := { 1..255 | inherit }\n");
+	fprintf(f, "       MARK := { 0x0..0xffffffff }\n");
 }
 
 static void usage(int sit) __attribute__((noreturn));
@@ -101,6 +103,7 @@ static int iptunnel_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u16 encapsport = 0;
 	__u16 encapdport = 0;
 	__u8 metadata = 0;
+	__u32 fwmark = 0;
 
 	if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 		if (rtnl_talk(&rth, &req.n, &req.n, sizeof(req)) < 0) {
@@ -179,6 +182,10 @@ get_failed:
 				rta_getattr_u16(iptuninfo[IFLA_IPTUN_6RD_RELAY_PREFIXLEN]);
 		if (iptuninfo[IFLA_IPTUN_COLLECT_METADATA])
 			metadata = 1;
+
+		if (iptuninfo[IFLA_IPTUN_FWMARK])
+			fwmark = rta_getattr_u32(iptuninfo[IFLA_IPTUN_FWMARK]);
+
 	}
 
 	while (argc > 0) {
@@ -301,6 +308,10 @@ get_failed:
 			ip6rdprefixlen = 16;
 			ip6rdrelayprefix = 0;
 			ip6rdrelayprefixlen = 0;
+		} else if (strcmp(*argv, "fwmark") == 0) {
+			NEXT_ARG();
+			if (get_u32(&fwmark, *argv, 0))
+				invarg("invalid fwmark\n", *argv);
 		} else
 			usage(strcmp(lu->id, "sit") == 0);
 		argc--, argv++;
@@ -322,6 +333,7 @@ get_failed:
 	addattr8(n, 1024, IFLA_IPTUN_TTL, ttl);
 	addattr8(n, 1024, IFLA_IPTUN_TOS, tos);
 	addattr8(n, 1024, IFLA_IPTUN_PMTUDISC, pmtudisc);
+	addattr32(n, 1024, IFLA_IPTUN_FWMARK, fwmark);
 
 	addattr16(n, 1024, IFLA_IPTUN_ENCAP_TYPE, encaptype);
 	addattr16(n, 1024, IFLA_IPTUN_ENCAP_FLAGS, encapflags);
@@ -471,6 +483,10 @@ static void iptunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[
 		else
 			fputs("noencap-remcsum ", f);
 	}
+
+	if (tb[IFLA_IPTUN_FWMARK] && rta_getattr_u32(tb[IFLA_IPTUN_FWMARK]))
+		fprintf(f, "fwmark 0x%x ",
+			rta_getattr_u32(tb[IFLA_IPTUN_FWMARK]));
 }
 
 static void iptunnel_print_help(struct link_util *lu, int argc, char **argv,
diff --git a/ip/link_vti.c b/ip/link_vti.c
index 95bc23e92897..d5242ac762fd 100644
--- a/ip/link_vti.c
+++ b/ip/link_vti.c
@@ -31,9 +31,11 @@ static void print_usage(FILE *f)
 		"               [ local ADDR ]\n"
 		"               [ [i|o]key KEY ]\n"
 		"               [ dev PHYS_DEV ]\n"
+		"               [ fwmark MARK ]\n"
 		"\n"
 		"Where: ADDR := { IP_ADDRESS }\n"
 		"       KEY  := { DOTTED_QUAD | NUMBER }\n"
+		"       MARK := { 0x0..0xffffffff }\n"
 	);
 }
 
@@ -67,6 +69,7 @@ static int vti_parse_opt(struct link_util *lu, int argc, char **argv,
 	unsigned int saddr = 0;
 	unsigned int daddr = 0;
 	unsigned int link = 0;
+	unsigned int fwmark = 0;
 	int len;
 
 	if (!(n->nlmsg_flags & NLM_F_CREATE)) {
@@ -109,6 +112,9 @@ get_failed:
 
 		if (vtiinfo[IFLA_VTI_LINK])
 			link = rta_getattr_u8(vtiinfo[IFLA_VTI_LINK]);
+
+		if (vtiinfo[IFLA_VTI_FWMARK])
+			fwmark = rta_getattr_u32(vtiinfo[IFLA_VTI_FWMARK]);
 	}
 
 	while (argc > 0) {
@@ -180,6 +186,10 @@ get_failed:
 					*argv);
 				exit(-1);
 			}
+		} else if (strcmp(*argv, "fwmark") == 0) {
+			NEXT_ARG();
+			if (get_u32(&fwmark, *argv, 0))
+				invarg("invalid fwmark\n", *argv);
 		} else
 			usage();
 		argc--; argv++;
@@ -189,6 +199,7 @@ get_failed:
 	addattr32(n, 1024, IFLA_VTI_OKEY, okey);
 	addattr_l(n, 1024, IFLA_VTI_LOCAL, &saddr, 4);
 	addattr_l(n, 1024, IFLA_VTI_REMOTE, &daddr, 4);
+	addattr32(n, 1024, IFLA_VTI_FWMARK, fwmark);
 	if (link)
 		addattr32(n, 1024, IFLA_VTI_LINK, link);
 
@@ -242,6 +253,11 @@ static void vti_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 		inet_ntop(AF_INET, RTA_DATA(tb[IFLA_VTI_OKEY]), s2, sizeof(s2));
 		fprintf(f, "okey %s ", s2);
 	}
+
+	if (tb[IFLA_VTI_FWMARK] && rta_getattr_u32(tb[IFLA_VTI_FWMARK])) {
+		fprintf(f, "fwmark 0x%x ",
+			rta_getattr_u32(tb[IFLA_VTI_FWMARK]));
+	}
 }
 
 static void vti_print_help(struct link_util *lu, int argc, char **argv,
diff --git a/ip/link_vti6.c b/ip/link_vti6.c
index 9ca127af8a5d..be4e33cee606 100644
--- a/ip/link_vti6.c
+++ b/ip/link_vti6.c
@@ -32,10 +32,12 @@ static void usage(void)
 	fprintf(stderr, "          type { vti6 } [ remote ADDR ] [ local ADDR ]\n");
 	fprintf(stderr, "          [ [i|o]key KEY ]\n");
 	fprintf(stderr, "          [ dev PHYS_DEV ]\n");
+	fprintf(stderr, "          [ fwmark MARK ]\n");
 	fprintf(stderr, "\n");
 	fprintf(stderr, "Where: NAME := STRING\n");
 	fprintf(stderr, "       ADDR := { IPV6_ADDRESS }\n");
 	fprintf(stderr, "       KEY  := { DOTTED_QUAD | NUMBER }\n");
+	fprintf(stderr, "       MARK := { 0x0..0xffffffff }\n");
 	exit(-1);
 }
 
@@ -62,6 +64,7 @@ static int vti6_parse_opt(struct link_util *lu, int argc, char **argv,
 	unsigned int ikey = 0;
 	unsigned int okey = 0;
 	unsigned int link = 0;
+	__u32 fwmark = 0;
 	int len;
 
 	if (!(n->nlmsg_flags & NLM_F_CREATE)) {
@@ -104,6 +107,9 @@ get_failed:
 
 		if (vtiinfo[IFLA_VTI_LINK])
 			link = rta_getattr_u8(vtiinfo[IFLA_VTI_LINK]);
+
+		if (vtiinfo[IFLA_VTI_FWMARK])
+			fwmark = rta_getattr_u32(vtiinfo[IFLA_VTI_FWMARK]);
 	}
 
 	while (argc > 0) {
@@ -178,6 +184,10 @@ get_failed:
 			link = if_nametoindex(*argv);
 			if (link == 0)
 				exit(-1);
+		} else if (strcmp(*argv, "fwmark") == 0) {
+			NEXT_ARG();
+			if (get_u32(&fwmark, *argv, 0))
+				invarg("invalid fwmark\n", *argv);
 		} else
 			usage();
 		argc--; argv++;
@@ -187,6 +197,7 @@ get_failed:
 	addattr32(n, 1024, IFLA_VTI_OKEY, okey);
 	addattr_l(n, 1024, IFLA_VTI_LOCAL, &saddr, sizeof(saddr));
 	addattr_l(n, 1024, IFLA_VTI_REMOTE, &daddr, sizeof(daddr));
+	addattr32(n, 1024, IFLA_VTI_FWMARK, fwmark);
 	if (link)
 		addattr32(n, 1024, IFLA_VTI_LINK, link);
 
@@ -239,6 +250,10 @@ static void vti6_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 		inet_ntop(AF_INET, RTA_DATA(tb[IFLA_VTI_OKEY]), s2, sizeof(s2));
 		fprintf(f, "okey %s ", s2);
 	}
+
+	if (tb[IFLA_VTI_FWMARK] && rta_getattr_u32(tb[IFLA_VTI_FWMARK])) {
+		fprintf(f, "fwmark 0x%x ", rta_getattr_u32(tb[IFLA_VTI_FWMARK]));
+	}
 }
 
 struct link_util vti6_link_util = {
-- 
2.12.2.816.g2cccc81164-goog

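The vti/vti6 hunks add two fwmark code paths. The first parses the argument through a base-0 get_u32() call, which is assumed here to mean hex, octal and decimal input are all accepted. The second prints the mark in hex only when it is non-zero. Below is a minimal user-space sketch of both. parse_fwmark() and format_fwmark() are hypothetical names, and the validation rules are assumptions, not iproute2's get_u32() implementation.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse a fwmark the way a base-0 strtoul() call does,
 * so "0x2a" (hex), "052" (octal) and "42" (decimal) are all accepted.
 * Returns 0 on success, -1 on empty, malformed or out-of-range input.
 */
static int parse_fwmark(uint32_t *val, const char *arg)
{
	unsigned long res;
	char *end;

	/* Reject an empty string and any minus sign: strtoul() would wrap "-1". */
	if (arg == NULL || *arg == '\0' || strchr(arg, '-') != NULL)
		return -1;

	errno = 0;
	res = strtoul(arg, &end, 0);
	if (errno != 0 || *end != '\0' || res > UINT32_MAX)
		return -1;

	*val = (uint32_t)res;
	return 0;
}

/*
 * Hypothetical helper mirroring the print side: write "fwmark 0x..."
 * only for a non-zero mark, otherwise leave an empty string.
 * Assumes len > 0.
 */
static int format_fwmark(char *buf, size_t len, uint32_t fwmark)
{
	if (fwmark == 0) {
		buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "fwmark 0x%x ", (unsigned int)fwmark);
}
```

Skipping a zero mark matches the patch's print condition, so links created without fwmark print exactly as they did before.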

* [PATCH net-next 3/3] mlx5: fix space waste from ethtool descriptions
From: Stephen Hemminger @ 2017-04-21 18:15 UTC
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, saeedm-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Stephen Hemminger
In-Reply-To: <20170421181558.5414-1-sthemmin-0li6OtcxBFHby3iVrkZq2A@public.gmane.org>

The ethtool statistics descriptions were declared static in
en_stats.h, but that header is included indirectly in multiple
places, so every file that includes it gets its own unused,
redundant copy of the tables. Building with W=1 reports these
unused copies as warnings.

The solution is to move the descriptions out of en_stats.h into
en_ethtool.c, the one file that uses them.

Signed-off-by: Stephen Hemminger <sthemmin-0li6OtcxBFHby3iVrkZq2A@public.gmane.org>
---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 208 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 206 +-------------------
 2 files changed, 209 insertions(+), 205 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index ce7b09d72ff6..b1d9518b2719 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -31,6 +31,214 @@
  */
 
 #include "en.h"
+#include "en_stats.h"
+
+
+static const struct counter_desc sw_stats_desc[] = {
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_packets) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_bytes) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_packets) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_bytes) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_tso_packets) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_tso_bytes) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_tso_inner_packets) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_tso_inner_bytes) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_lro_packets) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_lro_bytes) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_unnecessary) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_none) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_complete) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_unnecessary_inner) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xdp_drop) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xdp_tx) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xdp_tx_full) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_csum_partial) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_csum_partial_inner) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_queue_stopped) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_queue_wake) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_queue_dropped) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_xmit_more) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_wqe_err) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_mpwqe_filler) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_buff_alloc_err) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cqe_compress_blks) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cqe_compress_pkts) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_reuse) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_full) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_empty) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_busy) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, link_down_events_phy) },
+};
+
+static const struct counter_desc q_stats_desc[] = {
+	{ MLX5E_DECLARE_STAT(struct mlx5e_qcounter_stats, rx_out_of_buffer) },
+};
+
+static const struct counter_desc vport_stats_desc[] = {
+	{ "rx_vport_unicast_packets",
+		VPORT_COUNTER_OFF(received_eth_unicast.packets) },
+	{ "rx_vport_unicast_bytes",
+		VPORT_COUNTER_OFF(received_eth_unicast.octets) },
+	{ "tx_vport_unicast_packets",
+		VPORT_COUNTER_OFF(transmitted_eth_unicast.packets) },
+	{ "tx_vport_unicast_bytes",
+		VPORT_COUNTER_OFF(transmitted_eth_unicast.octets) },
+	{ "rx_vport_multicast_packets",
+		VPORT_COUNTER_OFF(received_eth_multicast.packets) },
+	{ "rx_vport_multicast_bytes",
+		VPORT_COUNTER_OFF(received_eth_multicast.octets) },
+	{ "tx_vport_multicast_packets",
+		VPORT_COUNTER_OFF(transmitted_eth_multicast.packets) },
+	{ "tx_vport_multicast_bytes",
+		VPORT_COUNTER_OFF(transmitted_eth_multicast.octets) },
+	{ "rx_vport_broadcast_packets",
+		VPORT_COUNTER_OFF(received_eth_broadcast.packets) },
+	{ "rx_vport_broadcast_bytes",
+		VPORT_COUNTER_OFF(received_eth_broadcast.octets) },
+	{ "tx_vport_broadcast_packets",
+		VPORT_COUNTER_OFF(transmitted_eth_broadcast.packets) },
+	{ "tx_vport_broadcast_bytes",
+		VPORT_COUNTER_OFF(transmitted_eth_broadcast.octets) },
+	{ "rx_vport_rdma_unicast_packets",
+		VPORT_COUNTER_OFF(received_ib_unicast.packets) },
+	{ "rx_vport_rdma_unicast_bytes",
+		VPORT_COUNTER_OFF(received_ib_unicast.octets) },
+	{ "tx_vport_rdma_unicast_packets",
+		VPORT_COUNTER_OFF(transmitted_ib_unicast.packets) },
+	{ "tx_vport_rdma_unicast_bytes",
+		VPORT_COUNTER_OFF(transmitted_ib_unicast.octets) },
+	{ "rx_vport_rdma_multicast_packets",
+		VPORT_COUNTER_OFF(received_ib_multicast.packets) },
+	{ "rx_vport_rdma_multicast_bytes",
+		VPORT_COUNTER_OFF(received_ib_multicast.octets) },
+	{ "tx_vport_rdma_multicast_packets",
+		VPORT_COUNTER_OFF(transmitted_ib_multicast.packets) },
+	{ "tx_vport_rdma_multicast_bytes",
+		VPORT_COUNTER_OFF(transmitted_ib_multicast.octets) },
+};
+static const struct counter_desc pport_802_3_stats_desc[] = {
+	{ "tx_packets_phy", PPORT_802_3_OFF(a_frames_transmitted_ok) },
+	{ "rx_packets_phy", PPORT_802_3_OFF(a_frames_received_ok) },
+	{ "rx_crc_errors_phy", PPORT_802_3_OFF(a_frame_check_sequence_errors) },
+	{ "tx_bytes_phy", PPORT_802_3_OFF(a_octets_transmitted_ok) },
+	{ "rx_bytes_phy", PPORT_802_3_OFF(a_octets_received_ok) },
+	{ "tx_multicast_phy", PPORT_802_3_OFF(a_multicast_frames_xmitted_ok) },
+	{ "tx_broadcast_phy", PPORT_802_3_OFF(a_broadcast_frames_xmitted_ok) },
+	{ "rx_multicast_phy", PPORT_802_3_OFF(a_multicast_frames_received_ok) },
+	{ "rx_broadcast_phy", PPORT_802_3_OFF(a_broadcast_frames_received_ok) },
+	{ "rx_in_range_len_errors_phy", PPORT_802_3_OFF(a_in_range_length_errors) },
+	{ "rx_out_of_range_len_phy", PPORT_802_3_OFF(a_out_of_range_length_field) },
+	{ "rx_oversize_pkts_phy", PPORT_802_3_OFF(a_frame_too_long_errors) },
+	{ "rx_symbol_err_phy", PPORT_802_3_OFF(a_symbol_error_during_carrier) },
+	{ "tx_mac_control_phy", PPORT_802_3_OFF(a_mac_control_frames_transmitted) },
+	{ "rx_mac_control_phy", PPORT_802_3_OFF(a_mac_control_frames_received) },
+	{ "rx_unsupported_op_phy", PPORT_802_3_OFF(a_unsupported_opcodes_received) },
+	{ "rx_pause_ctrl_phy", PPORT_802_3_OFF(a_pause_mac_ctrl_frames_received) },
+	{ "tx_pause_ctrl_phy", PPORT_802_3_OFF(a_pause_mac_ctrl_frames_transmitted) },
+};
+
+static const struct counter_desc pport_2863_stats_desc[] = {
+	{ "rx_discards_phy", PPORT_2863_OFF(if_in_discards) },
+	{ "tx_discards_phy", PPORT_2863_OFF(if_out_discards) },
+	{ "tx_errors_phy", PPORT_2863_OFF(if_out_errors) },
+};
+
+static const struct counter_desc pport_2819_stats_desc[] = {
+	{ "rx_undersize_pkts_phy", PPORT_2819_OFF(ether_stats_undersize_pkts) },
+	{ "rx_fragments_phy", PPORT_2819_OFF(ether_stats_fragments) },
+	{ "rx_jabbers_phy", PPORT_2819_OFF(ether_stats_jabbers) },
+	{ "rx_64_bytes_phy", PPORT_2819_OFF(ether_stats_pkts64octets) },
+	{ "rx_65_to_127_bytes_phy", PPORT_2819_OFF(ether_stats_pkts65to127octets) },
+	{ "rx_128_to_255_bytes_phy", PPORT_2819_OFF(ether_stats_pkts128to255octets) },
+	{ "rx_256_to_511_bytes_phy", PPORT_2819_OFF(ether_stats_pkts256to511octets) },
+	{ "rx_512_to_1023_bytes_phy", PPORT_2819_OFF(ether_stats_pkts512to1023octets) },
+	{ "rx_1024_to_1518_bytes_phy", PPORT_2819_OFF(ether_stats_pkts1024to1518octets) },
+	{ "rx_1519_to_2047_bytes_phy", PPORT_2819_OFF(ether_stats_pkts1519to2047octets) },
+	{ "rx_2048_to_4095_bytes_phy", PPORT_2819_OFF(ether_stats_pkts2048to4095octets) },
+	{ "rx_4096_to_8191_bytes_phy", PPORT_2819_OFF(ether_stats_pkts4096to8191octets) },
+	{ "rx_8192_to_10239_bytes_phy", PPORT_2819_OFF(ether_stats_pkts8192to10239octets) },
+};
+
+static const struct counter_desc pport_phy_statistical_stats_desc[] = {
+	{ "rx_symbol_errors_phy", PPORT_PHY_STATISTICAL_OFF(phy_symbol_errors) },
+	{ "rx_corrected_bits_phy", PPORT_PHY_STATISTICAL_OFF(phy_corrected_bits) },
+};
+
+static const struct counter_desc pport_per_prio_traffic_stats_desc[] = {
+	{ "rx_prio%d_bytes", PPORT_PER_PRIO_OFF(rx_octets) },
+	{ "rx_prio%d_packets", PPORT_PER_PRIO_OFF(rx_frames) },
+	{ "tx_prio%d_bytes", PPORT_PER_PRIO_OFF(tx_octets) },
+	{ "tx_prio%d_packets", PPORT_PER_PRIO_OFF(tx_frames) },
+};
+
+static const struct counter_desc pport_per_prio_pfc_stats_desc[] = {
+	/* %s is "global" or "prio{i}" */
+	{ "rx_%s_pause", PPORT_PER_PRIO_OFF(rx_pause) },
+	{ "rx_%s_pause_duration", PPORT_PER_PRIO_OFF(rx_pause_duration) },
+	{ "tx_%s_pause", PPORT_PER_PRIO_OFF(tx_pause) },
+	{ "tx_%s_pause_duration", PPORT_PER_PRIO_OFF(tx_pause_duration) },
+	{ "rx_%s_pause_transition", PPORT_PER_PRIO_OFF(rx_pause_transition) },
+};
+
+static const struct counter_desc pcie_perf_stats_desc[] = {
+	{ "rx_pci_signal_integrity", PCIE_PERF_OFF(rx_errors) },
+	{ "tx_pci_signal_integrity", PCIE_PERF_OFF(tx_errors) },
+};
+
+static const struct counter_desc rq_stats_desc[] = {
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, packets) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, bytes) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, csum_complete) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, csum_unnecessary_inner) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, csum_none) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_drop) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_tx) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_tx_full) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, lro_packets) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, lro_bytes) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, wqe_err) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, mpwqe_filler) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, buff_alloc_err) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cqe_compress_blks) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cqe_compress_pkts) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_reuse) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_full) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_empty) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_busy) },
+};
+
+static const struct counter_desc sq_stats_desc[] = {
+	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, packets) },
+	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, bytes) },
+	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, tso_packets) },
+	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, tso_bytes) },
+	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, tso_inner_packets) },
+	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, tso_inner_bytes) },
+	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, csum_partial_inner) },
+	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, nop) },
+	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, csum_none) },
+	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, stopped) },
+	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, wake) },
+	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, dropped) },
+	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, xmit_more) },
+};
+
+static const struct counter_desc mlx5e_pme_status_desc[] = {
+	{ "module_plug", 0 },
+	{ "module_unplug", 8 },
+};
+
+static const struct counter_desc mlx5e_pme_error_desc[] = {
+	{ "module_pwr_budget_exd", 0 },  /* power budget exceed */
+	{ "module_long_range", 8 },      /* long range for non MLNX cable */
+	{ "module_bus_stuck", 16 },      /* bus stuck (I2C or data shorted) */
+	{ "module_no_eeprom", 24 },      /* no eeprom/retry time out */
+	{ "module_enforce_part", 32 },   /* enforce part number list */
+	{ "module_unknown_id", 40 },     /* unknown identifier */
+	{ "module_high_temp", 48 },      /* high temperature */
+	{ "module_bad_shorted", 56 },    /* bad or shorted cable/module */
+	{ "module_unknown_status", 64 },
+};
 
 static void mlx5e_get_drvinfo(struct net_device *dev,
 			      struct ethtool_drvinfo *drvinfo)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 53e4992d6511..1c867ee973f2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -47,7 +47,7 @@
 
 struct counter_desc {
 	char		format[ETH_GSTRING_LEN];
-	int		offset; /* Byte offset */
+	size_t		offset; /* Byte offset */
 };
 
 struct mlx5e_sw_stats {
@@ -88,49 +88,10 @@ struct mlx5e_sw_stats {
 	u64 link_down_events_phy;
 };
 
-static const struct counter_desc sw_stats_desc[] = {
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_packets) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_bytes) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_packets) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_bytes) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_tso_packets) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_tso_bytes) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_tso_inner_packets) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_tso_inner_bytes) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_lro_packets) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_lro_bytes) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_unnecessary) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_none) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_complete) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_unnecessary_inner) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xdp_drop) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xdp_tx) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xdp_tx_full) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_csum_partial) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_csum_partial_inner) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_queue_stopped) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_queue_wake) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_queue_dropped) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_xmit_more) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_wqe_err) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_mpwqe_filler) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_buff_alloc_err) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cqe_compress_blks) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cqe_compress_pkts) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_reuse) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_full) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_empty) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_busy) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, link_down_events_phy) },
-};
-
 struct mlx5e_qcounter_stats {
 	u32 rx_out_of_buffer;
 };
 
-static const struct counter_desc q_stats_desc[] = {
-	{ MLX5E_DECLARE_STAT(struct mlx5e_qcounter_stats, rx_out_of_buffer) },
-};
 
 #define VPORT_COUNTER_OFF(c) MLX5_BYTE_OFF(query_vport_counter_out, c)
 #define VPORT_COUNTER_GET(vstats, c) MLX5_GET64(query_vport_counter_out, \
@@ -140,48 +101,6 @@ struct mlx5e_vport_stats {
 	__be64 query_vport_out[MLX5_ST_SZ_QW(query_vport_counter_out)];
 };
 
-static const struct counter_desc vport_stats_desc[] = {
-	{ "rx_vport_unicast_packets",
-		VPORT_COUNTER_OFF(received_eth_unicast.packets) },
-	{ "rx_vport_unicast_bytes",
-		VPORT_COUNTER_OFF(received_eth_unicast.octets) },
-	{ "tx_vport_unicast_packets",
-		VPORT_COUNTER_OFF(transmitted_eth_unicast.packets) },
-	{ "tx_vport_unicast_bytes",
-		VPORT_COUNTER_OFF(transmitted_eth_unicast.octets) },
-	{ "rx_vport_multicast_packets",
-		VPORT_COUNTER_OFF(received_eth_multicast.packets) },
-	{ "rx_vport_multicast_bytes",
-		VPORT_COUNTER_OFF(received_eth_multicast.octets) },
-	{ "tx_vport_multicast_packets",
-		VPORT_COUNTER_OFF(transmitted_eth_multicast.packets) },
-	{ "tx_vport_multicast_bytes",
-		VPORT_COUNTER_OFF(transmitted_eth_multicast.octets) },
-	{ "rx_vport_broadcast_packets",
-		VPORT_COUNTER_OFF(received_eth_broadcast.packets) },
-	{ "rx_vport_broadcast_bytes",
-		VPORT_COUNTER_OFF(received_eth_broadcast.octets) },
-	{ "tx_vport_broadcast_packets",
-		VPORT_COUNTER_OFF(transmitted_eth_broadcast.packets) },
-	{ "tx_vport_broadcast_bytes",
-		VPORT_COUNTER_OFF(transmitted_eth_broadcast.octets) },
-	{ "rx_vport_rdma_unicast_packets",
-		VPORT_COUNTER_OFF(received_ib_unicast.packets) },
-	{ "rx_vport_rdma_unicast_bytes",
-		VPORT_COUNTER_OFF(received_ib_unicast.octets) },
-	{ "tx_vport_rdma_unicast_packets",
-		VPORT_COUNTER_OFF(transmitted_ib_unicast.packets) },
-	{ "tx_vport_rdma_unicast_bytes",
-		VPORT_COUNTER_OFF(transmitted_ib_unicast.octets) },
-	{ "rx_vport_rdma_multicast_packets",
-		VPORT_COUNTER_OFF(received_ib_multicast.packets) },
-	{ "rx_vport_rdma_multicast_bytes",
-		VPORT_COUNTER_OFF(received_ib_multicast.octets) },
-	{ "tx_vport_rdma_multicast_packets",
-		VPORT_COUNTER_OFF(transmitted_ib_multicast.packets) },
-	{ "tx_vport_rdma_multicast_bytes",
-		VPORT_COUNTER_OFF(transmitted_ib_multicast.octets) },
-};
 
 #define PPORT_802_3_OFF(c) \
 	MLX5_BYTE_OFF(ppcnt_reg, \
@@ -224,69 +143,6 @@ struct mlx5e_pport_stats {
 	__be64 phy_statistical_counters[MLX5_ST_SZ_QW(ppcnt_reg)];
 };
 
-static const struct counter_desc pport_802_3_stats_desc[] = {
-	{ "tx_packets_phy", PPORT_802_3_OFF(a_frames_transmitted_ok) },
-	{ "rx_packets_phy", PPORT_802_3_OFF(a_frames_received_ok) },
-	{ "rx_crc_errors_phy", PPORT_802_3_OFF(a_frame_check_sequence_errors) },
-	{ "tx_bytes_phy", PPORT_802_3_OFF(a_octets_transmitted_ok) },
-	{ "rx_bytes_phy", PPORT_802_3_OFF(a_octets_received_ok) },
-	{ "tx_multicast_phy", PPORT_802_3_OFF(a_multicast_frames_xmitted_ok) },
-	{ "tx_broadcast_phy", PPORT_802_3_OFF(a_broadcast_frames_xmitted_ok) },
-	{ "rx_multicast_phy", PPORT_802_3_OFF(a_multicast_frames_received_ok) },
-	{ "rx_broadcast_phy", PPORT_802_3_OFF(a_broadcast_frames_received_ok) },
-	{ "rx_in_range_len_errors_phy", PPORT_802_3_OFF(a_in_range_length_errors) },
-	{ "rx_out_of_range_len_phy", PPORT_802_3_OFF(a_out_of_range_length_field) },
-	{ "rx_oversize_pkts_phy", PPORT_802_3_OFF(a_frame_too_long_errors) },
-	{ "rx_symbol_err_phy", PPORT_802_3_OFF(a_symbol_error_during_carrier) },
-	{ "tx_mac_control_phy", PPORT_802_3_OFF(a_mac_control_frames_transmitted) },
-	{ "rx_mac_control_phy", PPORT_802_3_OFF(a_mac_control_frames_received) },
-	{ "rx_unsupported_op_phy", PPORT_802_3_OFF(a_unsupported_opcodes_received) },
-	{ "rx_pause_ctrl_phy", PPORT_802_3_OFF(a_pause_mac_ctrl_frames_received) },
-	{ "tx_pause_ctrl_phy", PPORT_802_3_OFF(a_pause_mac_ctrl_frames_transmitted) },
-};
-
-static const struct counter_desc pport_2863_stats_desc[] = {
-	{ "rx_discards_phy", PPORT_2863_OFF(if_in_discards) },
-	{ "tx_discards_phy", PPORT_2863_OFF(if_out_discards) },
-	{ "tx_errors_phy", PPORT_2863_OFF(if_out_errors) },
-};
-
-static const struct counter_desc pport_2819_stats_desc[] = {
-	{ "rx_undersize_pkts_phy", PPORT_2819_OFF(ether_stats_undersize_pkts) },
-	{ "rx_fragments_phy", PPORT_2819_OFF(ether_stats_fragments) },
-	{ "rx_jabbers_phy", PPORT_2819_OFF(ether_stats_jabbers) },
-	{ "rx_64_bytes_phy", PPORT_2819_OFF(ether_stats_pkts64octets) },
-	{ "rx_65_to_127_bytes_phy", PPORT_2819_OFF(ether_stats_pkts65to127octets) },
-	{ "rx_128_to_255_bytes_phy", PPORT_2819_OFF(ether_stats_pkts128to255octets) },
-	{ "rx_256_to_511_bytes_phy", PPORT_2819_OFF(ether_stats_pkts256to511octets) },
-	{ "rx_512_to_1023_bytes_phy", PPORT_2819_OFF(ether_stats_pkts512to1023octets) },
-	{ "rx_1024_to_1518_bytes_phy", PPORT_2819_OFF(ether_stats_pkts1024to1518octets) },
-	{ "rx_1519_to_2047_bytes_phy", PPORT_2819_OFF(ether_stats_pkts1519to2047octets) },
-	{ "rx_2048_to_4095_bytes_phy", PPORT_2819_OFF(ether_stats_pkts2048to4095octets) },
-	{ "rx_4096_to_8191_bytes_phy", PPORT_2819_OFF(ether_stats_pkts4096to8191octets) },
-	{ "rx_8192_to_10239_bytes_phy", PPORT_2819_OFF(ether_stats_pkts8192to10239octets) },
-};
-
-static const struct counter_desc pport_phy_statistical_stats_desc[] = {
-	{ "rx_symbol_errors_phy", PPORT_PHY_STATISTICAL_OFF(phy_symbol_errors) },
-	{ "rx_corrected_bits_phy", PPORT_PHY_STATISTICAL_OFF(phy_corrected_bits) },
-};
-
-static const struct counter_desc pport_per_prio_traffic_stats_desc[] = {
-	{ "rx_prio%d_bytes", PPORT_PER_PRIO_OFF(rx_octets) },
-	{ "rx_prio%d_packets", PPORT_PER_PRIO_OFF(rx_frames) },
-	{ "tx_prio%d_bytes", PPORT_PER_PRIO_OFF(tx_octets) },
-	{ "tx_prio%d_packets", PPORT_PER_PRIO_OFF(tx_frames) },
-};
-
-static const struct counter_desc pport_per_prio_pfc_stats_desc[] = {
-	/* %s is "global" or "prio{i}" */
-	{ "rx_%s_pause", PPORT_PER_PRIO_OFF(rx_pause) },
-	{ "rx_%s_pause_duration", PPORT_PER_PRIO_OFF(rx_pause_duration) },
-	{ "tx_%s_pause", PPORT_PER_PRIO_OFF(tx_pause) },
-	{ "tx_%s_pause_duration", PPORT_PER_PRIO_OFF(tx_pause_duration) },
-	{ "rx_%s_pause_transition", PPORT_PER_PRIO_OFF(rx_pause_transition) },
-};
 
 #define PCIE_PERF_OFF(c) \
 	MLX5_BYTE_OFF(mpcnt_reg, counter_set.pcie_perf_cntrs_grp_data_layout.c)
@@ -298,11 +154,6 @@ struct mlx5e_pcie_stats {
 	__be64 pcie_perf_counters[MLX5_ST_SZ_QW(mpcnt_reg)];
 };
 
-static const struct counter_desc pcie_perf_stats_desc[] = {
-	{ "rx_pci_signal_integrity", PCIE_PERF_OFF(rx_errors) },
-	{ "tx_pci_signal_integrity", PCIE_PERF_OFF(tx_errors) },
-};
-
 struct mlx5e_rq_stats {
 	u64 packets;
 	u64 bytes;
@@ -325,28 +176,6 @@ struct mlx5e_rq_stats {
 	u64 cache_busy;
 };
 
-static const struct counter_desc rq_stats_desc[] = {
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, packets) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, bytes) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, csum_complete) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, csum_unnecessary_inner) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, csum_none) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_drop) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_tx) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_tx_full) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, lro_packets) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, lro_bytes) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, wqe_err) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, mpwqe_filler) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, buff_alloc_err) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cqe_compress_blks) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cqe_compress_pkts) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_reuse) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_full) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_empty) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_busy) },
-};
-
 struct mlx5e_sq_stats {
 	/* commonly accessed in data path */
 	u64 packets;
@@ -365,22 +194,6 @@ struct mlx5e_sq_stats {
 	u64 dropped;
 };
 
-static const struct counter_desc sq_stats_desc[] = {
-	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, packets) },
-	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, bytes) },
-	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, tso_packets) },
-	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, tso_bytes) },
-	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, tso_inner_packets) },
-	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, tso_inner_bytes) },
-	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, csum_partial_inner) },
-	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, nop) },
-	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, csum_none) },
-	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, stopped) },
-	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, wake) },
-	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, dropped) },
-	{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, xmit_more) },
-};
-
 #define NUM_SW_COUNTERS			ARRAY_SIZE(sw_stats_desc)
 #define NUM_Q_COUNTERS			ARRAY_SIZE(q_stats_desc)
 #define NUM_VPORT_COUNTERS		ARRAY_SIZE(vport_stats_desc)
@@ -416,21 +229,4 @@ struct mlx5e_stats {
 	struct mlx5e_pcie_stats pcie;
 };
 
-static const struct counter_desc mlx5e_pme_status_desc[] = {
-	{ "module_plug", 0 },
-	{ "module_unplug", 8 },
-};
-
-static const struct counter_desc mlx5e_pme_error_desc[] = {
-	{ "module_pwr_budget_exd", 0 },  /* power budget exceed */
-	{ "module_long_range", 8 },      /* long range for non MLNX cable */
-	{ "module_bus_stuck", 16 },      /* bus stuck (I2C or data shorted) */
-	{ "module_no_eeprom", 24 },      /* no eeprom/retry time out */
-	{ "module_enforce_part", 32 },   /* enforce part number list */
-	{ "module_unknown_id", 40 },     /* unknown identifier */
-	{ "module_high_temp", 48 },      /* high temperature */
-	{ "module_bad_shorted", 56 },    /* bad or shorted cable/module */
-	{ "module_unknown_status", 64 },
-};
-
 #endif /* __MLX5_EN_STATS_H__ */
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

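The pattern behind these descriptor tables can be sketched in plain C. Each entry pairs a counter name with the byte offset of the field it describes. offsetof() yields a size_t, which matches the patch's change of counter_desc.offset from int to size_t. struct sw_stats, DECLARE_STAT, STAT_NAME_LEN and read_counter() are hypothetical, simplified stand-ins, not the mlx5 definitions. Defining the table in a single .c file, as the patch does, means only one copy is emitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STAT_NAME_LEN 32	/* stands in for ETH_GSTRING_LEN */

struct counter_desc {
	char	format[STAT_NAME_LEN];
	size_t	offset;		/* byte offset into the stats struct */
};

/* Hypothetical cut-down stats struct; the real one has many more fields. */
struct sw_stats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint64_t tx_packets;
};

/* Stringify the field name and record its offset within the struct. */
#define DECLARE_STAT(type, fld) #fld, offsetof(type, fld)

/* Lives in exactly one .c file, so only one copy of the table exists. */
static const struct counter_desc sw_desc[] = {
	{ DECLARE_STAT(struct sw_stats, rx_packets) },
	{ DECLARE_STAT(struct sw_stats, rx_bytes) },
	{ DECLARE_STAT(struct sw_stats, tx_packets) },
};

#define NUM_SW_DESC (sizeof(sw_desc) / sizeof(sw_desc[0]))

/* Read the u64 counter described by desc[i]; memcpy avoids alignment assumptions. */
static uint64_t read_counter(const void *stats, const struct counter_desc *desc,
			     size_t i)
{
	uint64_t v;

	memcpy(&v, (const char *)stats + desc[i].offset, sizeof(v));
	return v;
}
```

A generic loop over the table can then fill both the ethtool string set and the value array without naming each field.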

* [PATCH net-next 0/3] mlx5: fix warnings
From: Stephen Hemminger @ 2017-04-21 18:15 UTC
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, saeedm-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Stephen Hemminger

While looking at sparse and compiler warning output for another
driver, I saw several trivial warnings from the mlx5 driver. This
patch series fixes them.

Stephen Hemminger (3):
  mlx5: hide unused functions
  mlx5: fix warning about missing prototype
  mlx5: fix space waste from ethtool descriptions

 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 208 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 206 +-------------------
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c    |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c    |  24 +--
 5 files changed, 224 insertions(+), 216 deletions(-)

-- 
2.11.0


* [PATCH net-next 5/7] ibmvnic: Add set_link_state routine for setting adapter link state
From: Nathan Fontenot @ 2017-04-21 19:39 UTC
  To: netdev; +Cc: brking, jallen, muvic, tlfalcon
In-Reply-To: <20170421193627.11030.34813.stgit@ltcalpine2-lp23.aus.stglabs.ibm.com>

Create a common routine for setting the link state of the vnic adapter.
This moves sending the CRQ and waiting for the link state response
into one place. The new routine also re-sends the CRQ when the
response indicates partial success.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c |   71 +++++++++++++++++++++++++++++-------
 drivers/net/ethernet/ibm/ibmvnic.h |    1 +
 2 files changed, 58 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 199cccb..115f216 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -561,6 +561,51 @@ static void release_resources(struct ibmvnic_adapter *adapter)
 	release_error_buffers(adapter);
 }
 
+static int set_link_state(struct ibmvnic_adapter *adapter, u8 link_state)
+{
+	struct net_device *netdev = adapter->netdev;
+	unsigned long timeout = msecs_to_jiffies(30000);
+	union ibmvnic_crq crq;
+	bool resend;
+	int rc;
+
+	if (adapter->logical_link_state == link_state) {
+		netdev_dbg(netdev, "Link state already %d\n", link_state);
+		return 0;
+	}
+
+	netdev_err(netdev, "setting link state %d\n", link_state);
+	memset(&crq, 0, sizeof(crq));
+	crq.logical_link_state.first = IBMVNIC_CRQ_CMD;
+	crq.logical_link_state.cmd = LOGICAL_LINK_STATE;
+	crq.logical_link_state.link_state = link_state;
+
+	do {
+		resend = false;
+
+		reinit_completion(&adapter->init_done);
+		rc = ibmvnic_send_crq(adapter, &crq);
+		if (rc) {
+			netdev_err(netdev, "Failed to set link state\n");
+			return rc;
+		}
+
+		if (!wait_for_completion_timeout(&adapter->init_done,
+						 timeout)) {
+			netdev_err(netdev, "timeout setting link state\n");
+			return -1;
+		}
+
+		if (adapter->init_done_rc == 1) {
+			/* Partial success, delay and re-send */
+			msleep(1000);
+			resend = true;
+		}
+	} while (resend);
+
+	return 0;
+}
+
 static int set_real_num_queues(struct net_device *netdev)
 {
 	struct ibmvnic_adapter *adapter = netdev_priv(netdev);
@@ -583,7 +628,6 @@ static int ibmvnic_open(struct net_device *netdev)
 {
 	struct ibmvnic_adapter *adapter = netdev_priv(netdev);
 	struct device *dev = &adapter->vdev->dev;
-	union ibmvnic_crq crq;
 	int rc = 0;
 	int i;
 
@@ -643,11 +687,9 @@ static int ibmvnic_open(struct net_device *netdev)
 	for (i = 0; i < adapter->req_tx_queues; i++)
 		enable_scrq_irq(adapter, adapter->tx_scrq[i]);
 
-	memset(&crq, 0, sizeof(crq));
-	crq.logical_link_state.first = IBMVNIC_CRQ_CMD;
-	crq.logical_link_state.cmd = LOGICAL_LINK_STATE;
-	crq.logical_link_state.link_state = IBMVNIC_LOGICAL_LNK_UP;
-	ibmvnic_send_crq(adapter, &crq);
+	rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_UP);
+	if (rc)
+		goto ibmvnic_open_fail;
 
 	netif_tx_start_all_queues(netdev);
 	adapter->is_closed = false;
@@ -681,7 +723,7 @@ static void disable_sub_crqs(struct ibmvnic_adapter *adapter)
 static int ibmvnic_close(struct net_device *netdev)
 {
 	struct ibmvnic_adapter *adapter = netdev_priv(netdev);
-	union ibmvnic_crq crq;
+	int rc = 0;
 	int i;
 
 	adapter->closing = true;
@@ -693,17 +735,13 @@ static int ibmvnic_close(struct net_device *netdev)
 	if (!adapter->failover)
 		netif_tx_stop_all_queues(netdev);
 
-	memset(&crq, 0, sizeof(crq));
-	crq.logical_link_state.first = IBMVNIC_CRQ_CMD;
-	crq.logical_link_state.cmd = LOGICAL_LINK_STATE;
-	crq.logical_link_state.link_state = IBMVNIC_LOGICAL_LNK_DN;
-	ibmvnic_send_crq(adapter, &crq);
+	rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
 
 	release_resources(adapter);
 
 	adapter->is_closed = true;
 	adapter->closing = false;
-	return 0;
+	return rc;
 }
 
 /**
@@ -2945,9 +2983,14 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq,
 		handle_login_rsp(crq, adapter);
 		break;
 	case LOGICAL_LINK_STATE_RSP:
-		netdev_dbg(netdev, "Got Logical Link State Response\n");
+		netdev_dbg(netdev,
+			   "Got Logical Link State Response, state: %d rc: %d\n",
+			   crq->logical_link_state_rsp.link_state,
+			   crq->logical_link_state_rsp.rc.code);
 		adapter->logical_link_state =
 		    crq->logical_link_state_rsp.link_state;
+		adapter->init_done_rc = crq->logical_link_state_rsp.rc.code;
+		complete(&adapter->init_done);
 		break;
 	case LINK_STATE_INDICATION:
 		netdev_dbg(netdev, "Got Logical Link State Indication\n");
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index 387c843..a69979f 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -964,6 +964,7 @@ struct ibmvnic_adapter {
 	struct ibmvnic_tx_pool *tx_pool;
 	bool closing;
 	struct completion init_done;
+	int init_done_rc;
 
 	struct list_head errors;
 	spinlock_t error_list_lock;

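The retry logic in set_link_state() above can be modeled outside the kernel: send the LOGICAL_LINK_STATE request, wait on the init_done completion, and re-send after a delay when the response rc is 1 ("partial success"). The following is a minimal user-space sketch, not driver code. `link_xchg_fn` is a hypothetical callback standing in for ibmvnic_send_crq() plus wait_for_completion_timeout(), `delay_fn` stands in for mdelay(1000), and `struct scripted_fw` is a hypothetical fake firmware used only to drive the loop.

```c
#include <stdbool.h>

/* Response rc that the patch treats as "partial success, re-send". */
#define LINK_RC_PARTIAL 1

/*
 * Hypothetical transport: sends the link-state request and waits for the
 * response. Returns 0 and stores the response rc in *rc, or a negative
 * value if the send failed or the wait timed out.
 */
typedef int (*link_xchg_fn)(void *ctx, int link_state, int *rc);

/* Hypothetical delay hook standing in for mdelay(1000). */
typedef void (*delay_fn)(void *ctx);

int set_link_state_sketch(link_xchg_fn xchg, delay_fn delay, void *ctx,
			  int link_state)
{
	bool resend;

	do {
		int rc = 0;
		int err;

		resend = false;		/* cleared on every pass */
		err = xchg(ctx, link_state, &rc);
		if (err)
			return err;	/* send failure or timeout */

		if (rc == LINK_RC_PARTIAL) {
			delay(ctx);	/* back off, then re-send */
			resend = true;
		}
	} while (resend);

	return 0;
}

/* Hypothetical scripted firmware: replays a fixed list of response rcs. */
struct scripted_fw {
	const int *rcs;
	int n;
	int calls;
	int delays;
};

int scripted_xchg(void *ctx, int link_state, int *rc)
{
	struct scripted_fw *fw = ctx;

	(void)link_state;
	if (fw->calls >= fw->n)
		return -1;	/* simulate a timeout once the script runs out */
	*rc = fw->rcs[fw->calls++];
	return 0;
}

void scripted_delay(void *ctx)
{
	((struct scripted_fw *)ctx)->delays++;
}
```

As in the hunk above, any rc other than 1 ends the loop with a 0 return, so a caller that needs to distinguish a firmware error from success would have to inspect the stored rc itself.
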
* Re: [PATCH net-next v2 2/5] virtio-net: transmit napi
From: Willem de Bruijn @ 2017-04-21 18:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: Network Development, Michael S. Tsirkin, virtualization,
	David Miller, Willem de Bruijn
In-Reply-To: <CAF=yD-KzHo1jVugrZKq_CmyrsqCR1zmM4Jo12OjacreKm74P8w@mail.gmail.com>

On Thu, Apr 20, 2017 at 12:02 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>>>   static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>>>   {
>>>         struct virtio_net_hdr_mrg_rxbuf *hdr;
>>> @@ -1130,9 +1172,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb,
>>> struct net_device *dev)
>>>         int err;
>>>         struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>>>         bool kick = !skb->xmit_more;
>>> +       bool use_napi = sq->napi.weight;
>>>         /* Free up any pending old buffers before queueing new ones. */
>>> -       free_old_xmit_skbs(sq);
>>> +       if (!use_napi)
>>> +               free_old_xmit_skbs(sq);
>>
>>
>> I'm not sure this is best or even correct. Considering that we clean xmit
>> packets speculatively in virtnet_poll_tx(), we need to call
>> free_old_xmit_skbs() unconditionally. This can also help to reduce the
>> possibility of napi rescheduling in virtnet_poll_tx().
>
> Because of the use of trylock there. Absolutely, thanks! Perhaps I should
> only use trylock in the opportunistic clean path from the rx softirq and
> full locking in the tx softirq.
>
> I previously observed that cleaning here would, counterintuitively,
> reduce efficiency. It reverted the improvements of cleaning transmit
> completions from the receive softirq. Going through my data, I did
> not observe this regression anymore on the latest patchset.
>
> Let me test again, with and without the new
> virtqueue_enable_cb_delayed patch. Perhaps that made a
> difference.

Neither cleaning in start_xmit nor converting the napi tx trylock to lock
shows a significant impact on loadtests, whether cpu affine or not.

I'll make both changes, as the first reduces patch size and code complexity
and the second is a more obviously correct codepath than the one with trylock.

To be clear, the variant called from the rx napi handler will still
opportunistically use trylock.

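The locking split described above (full locking in the tx napi handler, trylock only in the opportunistic clean from the rx softirq) can be illustrated with a user-space analogue. This sketch uses a pthread mutex in place of the kernel's tx queue lock; `struct txq_sketch` and the helper names are hypothetical and are not virtio-net code.

```c
#include <pthread.h>

/* Hypothetical stand-in for a send queue guarded by its tx lock. */
struct txq_sketch {
	pthread_mutex_t lock;
	int pending;	/* completed buffers not yet reclaimed */
	int freed;	/* total buffers reclaimed so far */
};

/* Reclaim completed buffers; caller must hold q->lock. */
static int free_old_xmit_sketch(struct txq_sketch *q)
{
	int n = q->pending;

	q->freed += n;
	q->pending = 0;
	return n;
}

/* Tx napi path: this is the queue's own cleanup, so always take the lock. */
int clean_from_tx_napi(struct txq_sketch *q)
{
	int n;

	pthread_mutex_lock(&q->lock);
	n = free_old_xmit_sketch(q);
	pthread_mutex_unlock(&q->lock);
	return n;
}

/*
 * Rx napi path: cleaning tx completions here is only opportunistic, so
 * skip it (return -1) instead of waiting when the lock is busy.
 */
int clean_from_rx_napi(struct txq_sketch *q)
{
	int n;

	if (pthread_mutex_trylock(&q->lock) != 0)
		return -1;
	n = free_old_xmit_sketch(q);
	pthread_mutex_unlock(&q->lock);
	return n;
}
```

The trylock variant never blocks the rx path. The tx napi variant is the one that must guarantee completions are reclaimed, which is why it takes the lock unconditionally.
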
* [PATCH net-next 2/7] ibmvnic: Insert header on VLAN tagged received frame
From: Nathan Fontenot @ 2017-04-21 19:38 UTC (permalink / raw)
  To: netdev; +Cc: brking, jallen, muvic, tlfalcon
In-Reply-To: <20170421193627.11030.34813.stgit@ltcalpine2-lp23.aus.stglabs.ibm.com>

From: Murilo Fossa Vicentini <muvic@linux.vnet.ibm.com>

This patch addresses a modification in the PAPR+ specification, which now
defines a previously reserved value for vNIC capabilities. The value
indicates whether the system firmware strips the VLAN header from all
VLAN-tagged received frames; if it does, the ibmvnic driver is expected
to insert the VLAN header.

Reported-by: Manvanthara B. Puttashankar <mputtash@in.ibm.com>
Signed-off-by: Murilo Fossa Vicentini <muvic@linux.vnet.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c |   21 ++++++++++++++++++++-
 drivers/net/ethernet/ibm/ibmvnic.h |    2 ++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 7f4cecb..0f359543 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -74,6 +74,7 @@
 #include <linux/uaccess.h>
 #include <asm/firmware.h>
 #include <linux/workqueue.h>
+#include <linux/if_vlan.h>
 
 #include "ibmvnic.h"
 
@@ -1105,7 +1106,15 @@ static int ibmvnic_poll(struct napi_struct *napi, int budget)
 		skb = rx_buff->skb;
 		skb_copy_to_linear_data(skb, rx_buff->data + offset,
 					length);
-		skb->vlan_tci = be16_to_cpu(next->rx_comp.vlan_tci);
+
+		/* VLAN Header has been stripped by the system firmware and
+		 * needs to be inserted by the driver
+		 */
+		if (adapter->rx_vlan_header_insertion &&
+		    (flags & IBMVNIC_VLAN_STRIPPED))
+			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
+					       ntohs(next->rx_comp.vlan_tci));
+
 		/* free the entry */
 		next->rx_comp.first = 0;
 		remove_buff_from_pool(adapter, rx_buff);
@@ -2170,6 +2179,10 @@ static void send_cap_queries(struct ibmvnic_adapter *adapter)
 	atomic_inc(&adapter->running_cap_crqs);
 	ibmvnic_send_crq(adapter, &crq);
 
+	crq.query_capability.capability = cpu_to_be16(RX_VLAN_HEADER_INSERTION);
+	atomic_inc(&adapter->running_cap_crqs);
+	ibmvnic_send_crq(adapter, &crq);
+
 	crq.query_capability.capability = cpu_to_be16(MAX_TX_SG_ENTRIES);
 	atomic_inc(&adapter->running_cap_crqs);
 	ibmvnic_send_crq(adapter, &crq);
@@ -2719,6 +2732,12 @@ static void handle_query_cap_rsp(union ibmvnic_crq *crq,
 		netdev_dbg(netdev, "vlan_header_insertion = %lld\n",
 			   adapter->vlan_header_insertion);
 		break;
+	case RX_VLAN_HEADER_INSERTION:
+		adapter->rx_vlan_header_insertion =
+		    be64_to_cpu(crq->query_capability.number);
+		netdev_dbg(netdev, "rx_vlan_header_insertion = %lld\n",
+			   adapter->rx_vlan_header_insertion);
+		break;
 	case MAX_TX_SG_ENTRIES:
 		adapter->max_tx_sg_entries =
 		    be64_to_cpu(crq->query_capability.number);
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index 355225c..387c843 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -733,6 +733,7 @@ enum ibmvnic_capabilities {
 	REQ_MTU = 21,
 	MAX_MULTICAST_FILTERS = 22,
 	VLAN_HEADER_INSERTION = 23,
+	RX_VLAN_HEADER_INSERTION = 24,
 	MAX_TX_SG_ENTRIES = 25,
 	RX_SG_SUPPORTED = 26,
 	RX_SG_REQUESTED = 27,
@@ -993,6 +994,7 @@ struct ibmvnic_adapter {
 	u64 req_mtu;
 	u64 max_multicast_filters;
 	u64 vlan_header_insertion;
+	u64 rx_vlan_header_insertion;
 	u64 max_tx_sg_entries;
 	u64 rx_sg_supported;
 	u64 rx_sg_requested;

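For illustration, restoring a stripped tag on the wire means putting the 4-byte 802.1Q header (TPID 0x8100 followed by the TCI) back between the source MAC address and the EtherType. The driver itself does not rewrite the frame: when `rx_vlan_header_insertion` is set and the completion carries IBMVNIC_VLAN_STRIPPED, `__vlan_hwaccel_put_tag()` only records the tag in the skb's VLAN metadata. The function below is a hypothetical byte-level sketch of the equivalent insertion, not ibmvnic code; the `SKETCH_*` constants are local stand-ins for the kernel's ETH_ALEN, VLAN_HLEN and ETH_P_8021Q.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN     6       /* bytes per MAC address */
#define SKETCH_VLAN_HLEN    4       /* TPID (2 bytes) + TCI (2 bytes) */
#define SKETCH_ETH_P_8021Q  0x8100  /* 802.1Q TPID */

/*
 * Insert an 802.1Q tag into an untagged Ethernet frame in place.
 * frame holds len valid bytes in a buffer of cap bytes.
 * Returns the new length, or 0 if the frame is too short or lacks room.
 */
size_t vlan_insert_tag_sketch(uint8_t *frame, size_t len, size_t cap,
			      uint16_t tci)
{
	size_t off = 2 * SKETCH_ETH_ALEN;	/* skip dst and src MAC */

	if (len < off + 2 || cap < len + SKETCH_VLAN_HLEN)
		return 0;

	/* Shift EtherType and payload right to open a 4-byte gap. */
	memmove(frame + off + SKETCH_VLAN_HLEN, frame + off, len - off);

	/* TPID and TCI go on the wire in network (big-endian) byte order. */
	frame[off]     = SKETCH_ETH_P_8021Q >> 8;
	frame[off + 1] = SKETCH_ETH_P_8021Q & 0xff;
	frame[off + 2] = tci >> 8;
	frame[off + 3] = tci & 0xff;

	return len + SKETCH_VLAN_HLEN;
}
```
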
* Re: [PATCH netnext 0/3] packet: Add option to create new fanout group with unique id.
From: Willem de Bruijn @ 2017-04-21 14:59 UTC (permalink / raw)
  To: Mike Maloney; +Cc: Network Development, David Miller, Mike Maloney
In-Reply-To: <20170420192546.5567-1-maloneykernel@gmail.com>

On Thu, Apr 20, 2017 at 3:25 PM, Mike Maloney <maloneykernel@gmail.com> wrote:
> From: Mike Maloney <maloney@google.com>
>
> Fanout uses a per net global namespace. A process that intends to create a
> new fanout group can accidentally join an existing group. It is
> not possible to detect this.
>
> Add a socket option to specify on the first call to
> setsockopt(..., PACKET_FANOUT, ...) to ensure that a new group is created.
> Also add tests.
>
> Mike Maloney (3):
>   selftests/net: cleanup unused parameter in psock_fanout
>   packet: add PACKET_FANOUT_FLAG_UNIQUEID to assign new fanout group id.
>   selftests/net: add tests for PACKET_FANOUT_FLAG_UNIQUEID
>
>  include/uapi/linux/if_packet.h             |  1 +
>  net/packet/af_packet.c                     | 44 ++++++++++++++
>  tools/testing/selftests/net/psock_fanout.c | 93 ++++++++++++++++++++++++++----
>  3 files changed, 128 insertions(+), 10 deletions(-)

Gmail refused to send out the main patch (2/3) from a fresh account. I
resubmitted the entire patchset on Mike's behalf.

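Here is a minimal sketch of how a process might request a fresh group with the new flag. It assumes the semantics of the version that was eventually merged: the group id passed to PACKET_FANOUT must be 0 when PACKET_FANOUT_FLAG_UNIQUEID is set, and the kernel-assigned id can be read back with getsockopt(PACKET_FANOUT). The fallback value 0x2000 for the flag and the helper names `fanout_arg()` and `open_unique_fanout_socket()` are assumptions for illustration, so check them against your <linux/if_packet.h>. Opening the socket requires CAP_NET_RAW.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>

#ifndef PACKET_FANOUT_FLAG_UNIQUEID
/* Assumed value from the merged uapi header; verify against your headers. */
#define PACKET_FANOUT_FLAG_UNIQUEID 0x2000
#endif

/* PACKET_FANOUT argument: group id in the low 16 bits, mode | flags above. */
uint32_t fanout_arg(uint16_t id, uint16_t mode, uint16_t flags)
{
	return (uint32_t)id | ((uint32_t)(mode | flags) << 16);
}

/*
 * Open an AF_PACKET socket and ask the kernel to create a new fanout group
 * (id 0 plus UNIQUEID). Returns the fd and stores the kernel-assigned id,
 * or returns -1 on any failure.
 */
int open_unique_fanout_socket(uint16_t *assigned_id)
{
	uint32_t arg = fanout_arg(0, PACKET_FANOUT_HASH,
				  PACKET_FANOUT_FLAG_UNIQUEID);
	uint32_t val = 0;
	socklen_t vlen = sizeof(val);
	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)) < 0 ||
	    getsockopt(fd, SOL_PACKET, PACKET_FANOUT, &val, &vlen) < 0) {
		close(fd);
		return -1;
	}
	*assigned_id = val & 0xffff;	/* low 16 bits: group id */
	return fd;
}
```

Other sockets would then join the group by passing the returned id without the UNIQUEID flag.
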
* Re: net/core: BUG in unregister_netdevice_many
From: Linus Torvalds @ 2017-04-21 17:42 UTC (permalink / raw)
  To: Andrey Konovalov, Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Alexander Duyck,
	David Ahern, Daniel Borkmann, tcharding, Jiri Pirko,
	stephen hemminger, Dmitry Vyukov, Kostya Serebryany, syzkaller
In-Reply-To: <CA+55aFwg=OtMWmU153uM27Fwk5eVv5ZBGBA9phQqVqn3nNAy6Q@mail.gmail.com>

On Fri, Apr 21, 2017 at 10:25 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'm assuming that the real cause is simply that "dev->reg_state" ends
> up being NETREG_UNREGISTERING or something. Maybe the BUG_ON() could
> be just removed, and replaced by the previous warning about
> NETREG_UNINITIALIZED.
>
> Something like the attached (TOTALLY UNTESTED) patch.

.. might as well test it.

That patch doesn't fix the problem, but it does show that yes, it was
NETREG_UNREGISTERING:

  unregister_netdevice: device pim6reg/ffff962dc4606000 was not registered (2)

but then immediately afterwards we get

  general protection fault: 0000 [#1] SMP
  Workqueue: netns cleanup_net
  RIP: 0010:dev_shutdown+0xe/0xc0
  Call Trace:
     rollback_registered_many+0x2a5/0x440
     unregister_netdevice_many+0x1e/0xb0
     default_device_exit_batch+0x145/0x170

which is due to a

        mov    0x388(%rdi),%eax

where %rdi is 0xdead000000000090. That is at the very beginning of
dev_shutdown(), so it's "dev" itself that has that value, which means it
comes from (_another_) invocation of rollback_registered_many(), when it
does that

        list_for_each_entry(dev, head, unreg_list) {

so it seems to be a case of another "list_del() leaves list in bad
state", and it was the added test for "dev->reg_state !=
NETREG_REGISTERED" that did that

        list_del(&dev->unreg_list);

and left random contents in the unreg_list.

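The failure mode described above can be illustrated with a sketch that models list_del() and list_del_init() on a hypothetical `struct list_node`. The poison values are patterned on the x86-64 LIST_POISON1/LIST_POISON2 (0x100 and 0x200 above 0xdead000000000000). A node removed with list_del() keeps poisoned pointers, so anything that later walks through it computes an address near 0xdead000000000000, while list_del_init() leaves the node pointing at itself. As a worked check, if `unreg_list` sat at offset 0x70 in struct net_device (an assumed offset, not taken from the report), container_of() on LIST_POISON1 would give exactly the 0xdead000000000090 seen in %rdi. This is only an illustration, not a proposed fix.

```c
#include <stdbool.h>
#include <stdint.h>

struct list_node {
	struct list_node *next, *prev;
};

/* Poison values modeled on the kernel's x86-64 LIST_POISON1/2. */
#define SKETCH_POISON_DELTA 0xdead000000000000ULL
#define SKETCH_POISON1 ((struct list_node *)(uintptr_t)(SKETCH_POISON_DELTA + 0x100))
#define SKETCH_POISON2 ((struct list_node *)(uintptr_t)(SKETCH_POISON_DELTA + 0x200))

static void unlink_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Like list_del(): unlink, then poison so any later use faults loudly. */
void list_del_sketch(struct list_node *n)
{
	unlink_node(n);
	n->next = SKETCH_POISON1;
	n->prev = SKETCH_POISON2;
}

/* Like list_del_init(): unlink, then self-link so the node reads as empty. */
void list_del_init_sketch(struct list_node *n)
{
	unlink_node(n);
	n->next = n;
	n->prev = n;
}

bool node_is_self_linked(const struct list_node *n)
{
	return n->next == n;
}

/* container_of() reduced to address arithmetic. */
uint64_t container_addr(uint64_t member_addr, uint64_t member_offset)
{
	return member_addr - member_offset;
}
```
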
So that "handle error case" was almost certainly just buggy too.

And the bug seems to be that we're trying to unregister a netdevice
that has already been unregistered.

Over to Eric and networking people. This oops is user-triggerable, and
leaves the machine in a bad state (the original BUG_ON() and the new
GP fault both happen while holding the RTNL, so networking is not
healthy afterwards).

                      Linus

* Re: [PATCH net] gso: Validate assumption of frag_list segementation
From: David Miller @ 2017-04-21 17:30 UTC (permalink / raw)
  To: ilant
  Cc: netdev, alexander.h.duyck, eric.dumazet, steffen.klassert, borisp,
	ilyal
In-Reply-To: <20170419182607.2342-1-ilant@mellanox.com>

From: <ilant@mellanox.com>
Date: Wed, 19 Apr 2017 21:26:07 +0300

> From: Ilan Tayari <ilant@mellanox.com>
> 
> Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list
> pointer") assumes that all SKBs in a frag_list (except maybe the last
> one) contain the same amount of GSO payload.
> 
> This assumption is not always correct, resulting in the following
> warning message in the log:
>     skb_segment: too many frags
> 
> For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
> one frag, and some with 2 frags.
> After GRO, the frag_list SKBs end up having different amounts of payload.
> If this frag_list SKB is then forwarded, the aforementioned assumption
> is violated.
> 
> Validate the assumption, and fall back to software GSO if it is not true.
> 
> Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212
> Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
> Signed-off-by: Ilan Tayari <ilant@mellanox.com>
> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>

Applied.

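The validation described in the patch can be sketched as a simple check over the payload lengths of the buffers in the chain. This is a hypothetical user-space helper, not the skb_segment() code. `frag_list_payloads_uniform()` takes the per-buffer payload sizes in chain order and reports whether every buffer except the last matches the first. When it returns false, the caller would fall back to full software GSO.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check of the frag_list splitting assumption: all buffers
 * except (possibly) the last carry the same payload length.
 */
bool frag_list_payloads_uniform(const unsigned int *payload, size_t n)
{
	/* Compare buffers 1 .. n-2 against buffer 0; the last is exempt. */
	for (size_t i = 1; i + 1 < n; i++)
		if (payload[i] != payload[0])
			return false;
	return true;
}
```

For instance, a chain with hypothetical payloads {1448, 1448, 1448, 500} passes. A chain like {1448, 2896, 1448, 500} fails, which is the kind of mix GRO can produce when some RX buffers carry one frag and others two.
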
* Re: net: cleanup_net is slow
From: Eric Dumazet @ 2017-04-21 17:57 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: Cong Wang, netdev, LKML, Dmitry Vyukov, Kostya Serebryany,
	syzkaller
In-Reply-To: <CAAeHK+yJXMo8kr_p8iM_dicVGQCtwQhdk=rdE+9dNbsoghAb6Q@mail.gmail.com>

On Fri, Apr 21, 2017 at 10:50 AM, Andrey Konovalov
<andreyknvl@google.com> wrote:
> Hi!
>
> We're investigating some approaches to improve isolation of syzkaller
> programs. One of the ideas is to run each program in its own user/net
> namespace. However, while I was experimenting with this, I stumbled
> upon a problem.
>
> It seems that cleanup_net() might take a very long time to execute.
>
> I've attached the reproducer and kernel .config that I used. Run as
> "./a.out 1". The reproducer just forks and does unshare(CLONE_NEWNET)
> in a loop. Note that I have a lot of network-related configs enabled,
> which causes a few interfaces to be set up by default.
>
> What I see with this reproducer is that at first a huge number
> (~200-300) net namespaces are created without any contention. But then
> (probably when one of these namespaces gets destroyed) the program
> hangs for a considerable amount of time (~100 seconds in my vm).
> Nothing locks up inside the kernel and the CPU is mostly idle.
>
> Adding debug printfs showed that the part that takes almost all of
> that time is the lines between synchronize_rcu() and
> mutex_unlock(&net_mutex) in cleanup_net. Running perf showed that the
> cause of this might be a lot of calls to synchronize_net that happen
> while executing those lines.
>
> Is there any change that can be done to speed up the
> creation/destruction of a huge number of net namespaces?
>

We have batches, but fundamentally this is a hard problem to solve.

Every time we try, we add bugs :/

RTNL is today's BKL (the Big Kernel Lock of early Linux).

Even the synchronize_rcu_expedited() done from synchronize_net() is a
serious issue on some platforms.

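The reproducer itself is an attachment and is not included here. The sketch below follows only what the description says (fork, then unshare(CLONE_NEWNET) in the child, in a loop), and the function name is hypothetical. Each child creates one network namespace and exits; once the namespace's last reference is dropped, it is queued for the kernel's asynchronous cleanup_net() work. It needs CAP_SYS_ADMIN. Without it, unshare() fails and the function simply counts zero successes.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork `iterations` children; each calls unshare(CLONE_NEWNET) and exits.
 * Returns how many children successfully created a new net namespace.
 */
int netns_churn_sketch(int iterations)
{
	int ok = 0;

	for (int i = 0; i < iterations; i++) {
		int status;
		pid_t pid = fork();

		if (pid < 0)
			break;			/* fork failed: stop early */
		if (pid == 0)
			_exit(unshare(CLONE_NEWNET) == 0 ? 0 : 1);

		if (waitpid(pid, &status, 0) == pid &&
		    WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	}
	return ok;
}
```
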
* RE: [net-next 02/11] ixgbe: add XDP support for pass and drop actions
From: Tantilov, Emil S @ 2017-04-21 16:52 UTC (permalink / raw)
  To: Kirsher, Jeffrey T, davem@davemloft.net
  Cc: Fastabend, John R, netdev@vger.kernel.org, nhorman@redhat.com,
	sassmann@redhat.com, jogreene@redhat.com, Kirsher, Jeffrey T
In-Reply-To: <20170421015029.18994-3-jeffrey.t.kirsher@intel.com>

>-----Original Message-----
>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
>Behalf Of Jeff Kirsher
>Sent: Thursday, April 20, 2017 6:50 PM
>To: davem@davemloft.net
>Cc: Fastabend, John R <john.r.fastabend@intel.com>; netdev@vger.kernel.org;
>nhorman@redhat.com; sassmann@redhat.com; jogreene@redhat.com; Kirsher,
>Jeffrey T <jeffrey.t.kirsher@intel.com>
>Subject: [net-next 02/11] ixgbe: add XDP support for pass and drop actions
>
>From: John Fastabend <john.r.fastabend@intel.com>
>
>Basic XDP drop support for ixgbe. Uses READ_ONCE/xchg semantics on XDP
>programs instead of rcu primitives as suggested by Daniel Borkmann and
>Alex Duyck.
>
>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
>Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
>Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>---
> drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   4 +-
> drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   4 +-
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 163
>+++++++++++++++++++----
> 3 files changed, 144 insertions(+), 27 deletions(-)
>
>diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>index 656ca8f69768..cb14813b0080 100644
>--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>@@ -318,6 +318,7 @@ struct ixgbe_ring {
> 	struct ixgbe_ring *next;	/* pointer to next ring in q_vector */
> 	struct ixgbe_q_vector *q_vector; /* backpointer to host q_vector */
> 	struct net_device *netdev;	/* netdev ring belongs to */
>+	struct bpf_prog *xdp_prog;
> 	struct device *dev;		/* device for DMA mapping */
> 	struct ixgbe_fwd_adapter *l2_accel_priv;
> 	void *desc;			/* descriptor ring memory */
>@@ -555,6 +556,7 @@ struct ixgbe_adapter {
> 	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
> 	/* OS defined structs */
> 	struct net_device *netdev;
>+	struct bpf_prog *xdp_prog;
> 	struct pci_dev *pdev;
>
> 	unsigned long state;
>@@ -835,7 +837,7 @@ void ixgbe_down(struct ixgbe_adapter *adapter);
> void ixgbe_reinit_locked(struct ixgbe_adapter *adapter);
> void ixgbe_reset(struct ixgbe_adapter *adapter);
> void ixgbe_set_ethtool_ops(struct net_device *netdev);
>-int ixgbe_setup_rx_resources(struct ixgbe_ring *);
>+int ixgbe_setup_rx_resources(struct ixgbe_adapter *, struct ixgbe_ring *);
> int ixgbe_setup_tx_resources(struct ixgbe_ring *);
> void ixgbe_free_rx_resources(struct ixgbe_ring *);
> void ixgbe_free_tx_resources(struct ixgbe_ring *);
>diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
>b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
>index 59730ede4746..79a126d9e091 100644
>--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
>+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
>@@ -1128,7 +1128,7 @@ static int ixgbe_set_ringparam(struct net_device
>*netdev,
> 			       sizeof(struct ixgbe_ring));
>
> 			temp_ring[i].count = new_rx_count;
>-			err = ixgbe_setup_rx_resources(&temp_ring[i]);
>+			err = ixgbe_setup_rx_resources(adapter, &temp_ring[i]);
> 			if (err) {
> 				while (i) {
> 					i--;
>@@ -1761,7 +1761,7 @@ static int ixgbe_setup_desc_rings(struct
>ixgbe_adapter *adapter)
> 	rx_ring->netdev = adapter->netdev;
> 	rx_ring->reg_idx = adapter->rx_ring[0]->reg_idx;
>
>-	err = ixgbe_setup_rx_resources(rx_ring);
>+	err = ixgbe_setup_rx_resources(adapter, rx_ring);
> 	if (err) {
> 		ret_val = 4;
> 		goto err_nomem;
>diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>index afff2ca7f8c0..3dac3918e0c9 100644
>--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>@@ -49,6 +49,9 @@
> #include <linux/if_macvlan.h>
> #include <linux/if_bridge.h>
> #include <linux/prefetch.h>
>+#include <linux/bpf.h>
>+#include <linux/bpf_trace.h>
>+#include <linux/atomic.h>
> #include <scsi/fc/fc_fcoe.h>
> #include <net/udp_tunnel.h>
> #include <net/pkt_cls.h>
>@@ -1855,6 +1858,10 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring
>*rx_ring,
>  * @rx_desc: pointer to the EOP Rx descriptor
>  * @skb: pointer to current skb being fixed
>  *
>+ * Check if the skb is valid in the XDP case it will be an error pointer.
>+ * Return true in this case to abort processing and advance to next
>+ * descriptor.
>+ *
>  * Check for corrupted packet headers caused by senders on the local L2
>  * embedded NIC switch not setting up their Tx Descriptors right.  These
>  * should be very rare.
>@@ -1873,6 +1880,10 @@ static bool ixgbe_cleanup_headers(struct ixgbe_ring
>*rx_ring,
> {
> 	struct net_device *netdev = rx_ring->netdev;
>
>+	/* XDP packets use error pointer so abort at this point */
>+	if (IS_ERR(skb))
>+		return true;
>+
> 	/* verify that the packet does not have any known errors */
> 	if (unlikely(ixgbe_test_staterr(rx_desc,
> 					IXGBE_RXDADV_ERR_FRAME_ERR_MASK) &&
>@@ -2048,7 +2059,7 @@ static void ixgbe_put_rx_buffer(struct ixgbe_ring
>*rx_ring,
> 		/* hand second half of page back to the ring */
> 		ixgbe_reuse_rx_page(rx_ring, rx_buffer);
> 	} else {
>-		if (IXGBE_CB(skb)->dma == rx_buffer->dma) {
>+		if (!IS_ERR(skb) && IXGBE_CB(skb)->dma == rx_buffer->dma) {
> 			/* the page has been released from the ring */
> 			IXGBE_CB(skb)->page_released = true;
> 		} else {
>@@ -2069,10 +2080,10 @@ static void ixgbe_put_rx_buffer(struct ixgbe_ring
>*rx_ring,
>
> static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
> 					   struct ixgbe_rx_buffer *rx_buffer,
>-					   union ixgbe_adv_rx_desc *rx_desc,
>-					   unsigned int size)
>+					   struct xdp_buff *xdp,
>+					   union ixgbe_adv_rx_desc *rx_desc)
> {
>-	void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
>+	unsigned int size = xdp->data_end - xdp->data;
> #if (PAGE_SIZE < 8192)
> 	unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
> #else

This patch introduces a build error in the case where PAGE_SIZE >= 8192
because the "size" parameter was removed from the function parameters, but
it is still being used to calculate truesize for that case.

This was caught by the test build robot:

drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:2173:36: error: 'size' undeclared (first use in this function)
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:2173:5: error: 'size' undeclared (first use in this function)

I also warned about this error before, but it was not resolved.

Thanks,
Emil

* Re: [PATCH net-next] bpf: add napi_id read access to __sk_buff
From: David Miller @ 2017-04-21 17:53 UTC (permalink / raw)
  To: daniel; +Cc: edumazet, alexei.starovoitov, netdev
In-Reply-To: <9f477418c0db1091bfb119ef8c159f18384d6100.1492635441.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 19 Apr 2017 23:01:17 +0200

> Add napi_id access to __sk_buff for socket filter program types, tc
> program types and other bpf_convert_ctx_access() users. Having access
> to skb->napi_id is useful for per RX queue listener siloing, f.e.
> in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
> used, meaning SO_REUSEPORT enabled listeners can then select the
> corresponding socket at SYN time already [1]. The skb is marked via
> skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).
> 
> Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d4339028b35
> ("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
> the NAPI ID associated with the queue for steering, which requires a
> prior sk_mark_napi_id() after the socket was looked up.
> 
> Semantics for the __sk_buff napi_id access are similar, meaning if
> skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
> then an invalid napi_id of 0 is returned to the program, otherwise a
> valid non-zero napi_id.
> 
>   [1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf
> 
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@kernel.org>

Applied, thanks Daniel.

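The napi_id semantics described above reduce to a threshold check. The helper below is a hypothetical plain-C model of the value a program would read from __sk_buff->napi_id, not the kernel's context-rewrite code. Here `min_napi_id` is a parameter, whereas the kernel uses its MIN_NAPI_ID constant, which is derived from NR_CPUS. Values below the threshold, such as a sender_cpu stored in the same skb field on transmit, are reported as 0.

```c
/*
 * Hypothetical model of the napi_id a BPF program sees: anything below
 * min_napi_id is not a real NAPI ID and is reported as 0 (invalid).
 */
unsigned int visible_napi_id(unsigned int raw, unsigned int min_napi_id)
{
	return raw < min_napi_id ? 0 : raw;
}
```
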
* Re: [PATCH net-next 00/10] ibmvnic: Updates and bug fixes
From: David Miller @ 2017-04-21 17:45 UTC (permalink / raw)
  To: nfont; +Cc: netdev, brking, jallen, muvic, tlfalcon
In-Reply-To: <20170419174015.37372.48544.stgit@ltcalpine2-lp23.aus.stglabs.ibm.com>

From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Date: Wed, 19 Apr 2017 13:44:22 -0400

> This set of patches is a series of updates to remove some unneeded
> and unused code in the driver as well as bug fixes for the
> ibmvnic driver.

Series applied, thanks.

* Re: [PATCH net 0/6] net: use skb_cow_head() to deal with cloned skbs
From: David Miller @ 2017-04-21 17:25 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, james.hughes, eric.dumazet
In-Reply-To: <20170419165926.30631-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 19 Apr 2017 09:59:20 -0700

> James Hughes found an issue with smsc95xx driver. Same problematic code
> is found in other drivers.

Series applied, thanks Eric.
