Netdev List

* Re: [PATCH] geneve: fix ip_hdr_len reserved for geneve6 tunnel.
From: David Miller @ 2016-11-28 21:15 UTC
  To: yanhaishuang; +Cc: hannes, aduyck, pshelar, jbenc, netdev, linux-kernel
In-Reply-To: <1480310818-78456-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Mon, 28 Nov 2016 13:26:58 +0800

> It should reserve sizeof(ipv6hdr) for geneve in ipv6 tunnel.
> 
> Fixes: c3ef5aa5e5 ('geneve: Merge ipv4 and ipv6 geneve_build_skb()')
> 
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

Applied, thanks.

* Re: Receive offloads, small RCVBUF and zero TCP window
From: Alex Sidorenko @ 2016-11-28 21:14 UTC
  To: David Miller; +Cc: netdev
In-Reply-To: <20161128.155459.1527519991492144879.davem@davemloft.net>

On Monday, November 28, 2016 3:54:59 PM EST David Miller wrote:
> From: Alex Sidorenko <alexandre.sidorenko@hpe.com>
> Date: Mon, 28 Nov 2016 15:49:26 -0500
> 
> > Now the question is whether it is OK to have icsk->icsk_ack.rcv_mss
> > larger than MTU.
> 
> It absolutely is not OK.
> 
> If VMWare wants to receive large frames for batching purposes it must
> use GRO or similar to achieve that, not just send vanilla frames into
> the stack which are larger than the device MTU.
> 

As VMWare's vmxnet3 driver is open-sourced and part of the generic kernel, do you think the problem is in that driver or elsewhere? I looked at the vmxnet3 sources and see that it uses LRO/GRO subroutines. Unfortunately, I don't understand its logic well enough to see whether they are doing anything incorrectly.

Alex 

-- 

------------------------------------------------------------------
Alex Sidorenko	email: asid@hpe.com
ERT  Linux 	Hewlett-Packard Enterprise (Canada)
------------------------------------------------------------------

* Re: [PATCH] net: handle no dst on skb in icmp6_send
From: David Miller @ 2016-11-28 21:13 UTC
  To: dsa; +Cc: netdev, andreyknvl
In-Reply-To: <1480301573-21183-1-git-send-email-dsa@cumulusnetworks.com>

From: David Ahern <dsa@cumulusnetworks.com>
Date: Sun, 27 Nov 2016 18:52:53 -0800

> Andrey reported the following while fuzzing the kernel with syzkaller:
 ...
> icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both
> cases the dst->dev should be preferred for determining the L3 domain
> if the dst has been set on the skb. Fallback to the skb->dev if it has
> not. This covers the case reported here where icmp6_send is invoked on
> Rx before the route lookup.
> 
> Fixes: 5d41ce29e ("net: icmp6_send should use dst dev to determine L3 domain")
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Applied, thanks David.

* Re: [PATCH] rtl8xxxu: fix tx rate debug output
From: Jes Sorensen @ 2016-11-28 21:12 UTC
  To: Arnd Bergmann; +Cc: Kalle Valo, linux-wireless, netdev, linux-kernel
In-Reply-To: <20161128210815.2368509-1-arnd@arndb.de>

Arnd Bergmann <arnd@arndb.de> writes:
> We accidentally print the rate before we know it for txdesc_v2:

Hi Arnd,

Thanks for the patch - Barry Day already posted a patch for this which
Kalle has applied to the wireless tree.

Cheers,
Jes


>
> wireless/realtek/rtl8xxxu/rtl8xxxu_core.c: In function 'rtl8xxxu_fill_txdesc_v2':
> wireless/realtek/rtl8xxxu/rtl8xxxu_core.c:4848:3: error: 'rate' may be used uninitialized in this function [-Werror=maybe-uninitialized]
>
> txdesc_v1 got it right, so let's do it the same way here.
>
> Fixes: b4c3d9cfb607 ("rtl8xxxu: Pass tx_info to fill_txdesc in order to have access to retry count")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> index 04141e57b8ae..a9137abc3ad9 100644
> --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> @@ -4844,16 +4844,16 @@ rtl8xxxu_fill_txdesc_v2(struct ieee80211_hw *hw, struct ieee80211_hdr *hdr,
>  
>  	tx_desc40 = (struct rtl8xxxu_txdesc40 *)tx_desc32;
>  
> -	if (rtl8xxxu_debug & RTL8XXXU_DEBUG_TX)
> -		dev_info(dev, "%s: TX rate: %d, pkt size %d\n",
> -			 __func__, rate, cpu_to_le16(tx_desc40->pkt_size));
> -
>  	if (rate_flags & IEEE80211_TX_RC_MCS &&
>  	    !ieee80211_is_mgmt(hdr->frame_control))
>  		rate = tx_info->control.rates[0].idx + DESC_RATE_MCS0;
>  	else
>  		rate = tx_rate->hw_value;
>  
> +	if (rtl8xxxu_debug & RTL8XXXU_DEBUG_TX)
> +		dev_info(dev, "%s: TX rate: %d, pkt size %d\n",
> +			 __func__, rate, cpu_to_le16(tx_desc40->pkt_size));
> +
>  	seq_number = IEEE80211_SEQ_TO_SN(le16_to_cpu(hdr->seq_ctrl));
>  
>  	tx_desc40->txdw4 = cpu_to_le32(rate);

* Re: [PATCH net 3/5] l2tp: fix racy socket lookup in l2tp_ip and l2tp_ip6 bind()
From: kbuild test robot @ 2016-11-28 21:10 UTC
  To: Guillaume Nault; +Cc: kbuild-all, netdev, James Chapman, Chris Elston
In-Reply-To: <de1fbe689e143ca160dd4da9a61c585c44ff6e78.1480360512.git.g.nault@alphalink.fr>

[-- Attachment #1: Type: text/plain, Size: 2705 bytes --]

Hi Guillaume,

[auto build test WARNING on net/master]

url:    https://github.com/0day-ci/linux/commits/Guillaume-Nault/l2tp-fixes-for-l2tp_ip-and-l2tp_ip6-socket-handling/20161129-043208
config: x86_64-rhel (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

   net/l2tp/l2tp_ip.c: In function 'l2tp_ip_bind':
>> net/l2tp/l2tp_ip.c:299:9: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized]
     return ret;
            ^~~

vim +/ret +299 net/l2tp/l2tp_ip.c

3fb4e5ea Guillaume Nault 2016-11-28  283  		goto out;
3fb4e5ea Guillaume Nault 2016-11-28  284  	}
3fb4e5ea Guillaume Nault 2016-11-28  285  
3fb4e5ea Guillaume Nault 2016-11-28  286  	sk_dst_reset(sk);
0d76751f James Chapman   2010-04-02  287  	l2tp_ip_sk(sk)->conn_id = addr->l2tp_conn_id;
0d76751f James Chapman   2010-04-02  288  
0d76751f James Chapman   2010-04-02  289  	sk_add_bind_node(sk, &l2tp_ip_bind_table);
0d76751f James Chapman   2010-04-02  290  	sk_del_node_init(sk);
0d76751f James Chapman   2010-04-02  291  	write_unlock_bh(&l2tp_ip_lock);
3fb4e5ea Guillaume Nault 2016-11-28  292  
0d76751f James Chapman   2010-04-02  293  	ret = 0;
c51ce497 James Chapman   2012-05-29  294  	sock_reset_flag(sk, SOCK_ZAPPED);
c51ce497 James Chapman   2012-05-29  295  
0d76751f James Chapman   2010-04-02  296  out:
0d76751f James Chapman   2010-04-02  297  	release_sock(sk);
0d76751f James Chapman   2010-04-02  298  
0d76751f James Chapman   2010-04-02 @299  	return ret;
0d76751f James Chapman   2010-04-02  300  }
0d76751f James Chapman   2010-04-02  301  
0d76751f James Chapman   2010-04-02  302  static int l2tp_ip_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
0d76751f James Chapman   2010-04-02  303  {
0d76751f James Chapman   2010-04-02  304  	struct sockaddr_l2tpip *lsa = (struct sockaddr_l2tpip *) uaddr;
de3c7a18 James Chapman   2012-04-29  305  	int rc;
0d76751f James Chapman   2010-04-02  306  
0d76751f James Chapman   2010-04-02  307  	if (addr_len < sizeof(*lsa))

:::::: The code at line 299 was first introduced by commit
:::::: 0d76751fad7739014485ba5bd388d4f1b4fd4143 l2tp: Add L2TPv3 IP encapsulation (no UDP) support

:::::: TO: James Chapman <jchapman@katalix.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37847 bytes --]

* Re: [PATCH net-next v3 0/4] Documentation: net: phy: Improve documentation
From: David Miller @ 2016-11-28 21:08 UTC
  To: f.fainelli
  Cc: netdev, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
	peppe.cavallaro, timur, jbrunet
In-Reply-To: <20161128024515.13070-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Sun, 27 Nov 2016 18:45:11 -0800

> This patch series addresses discussions and feedback that was recently received
> on the mailing-list in the area of: flow control/pause frames, interpretation of
> phy_interface_t and finally add some links to useful standards documents.

I'm always happy to see documentation improvements, series applied,
thanks!

* [PATCH] rtl8xxxu: fix tx rate debug output
From: Arnd Bergmann @ 2016-11-28 21:08 UTC
  To: Jes Sorensen
  Cc: Arnd Bergmann, Kalle Valo, linux-wireless, netdev, linux-kernel

We accidentally print the rate before we know it for txdesc_v2:

wireless/realtek/rtl8xxxu/rtl8xxxu_core.c: In function 'rtl8xxxu_fill_txdesc_v2':
wireless/realtek/rtl8xxxu/rtl8xxxu_core.c:4848:3: error: 'rate' may be used uninitialized in this function [-Werror=maybe-uninitialized]

txdesc_v1 got it right, so let's do it the same way here.

Fixes: b4c3d9cfb607 ("rtl8xxxu: Pass tx_info to fill_txdesc in order to have access to retry count")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
index 04141e57b8ae..a9137abc3ad9 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
@@ -4844,16 +4844,16 @@ rtl8xxxu_fill_txdesc_v2(struct ieee80211_hw *hw, struct ieee80211_hdr *hdr,
 
 	tx_desc40 = (struct rtl8xxxu_txdesc40 *)tx_desc32;
 
-	if (rtl8xxxu_debug & RTL8XXXU_DEBUG_TX)
-		dev_info(dev, "%s: TX rate: %d, pkt size %d\n",
-			 __func__, rate, cpu_to_le16(tx_desc40->pkt_size));
-
 	if (rate_flags & IEEE80211_TX_RC_MCS &&
 	    !ieee80211_is_mgmt(hdr->frame_control))
 		rate = tx_info->control.rates[0].idx + DESC_RATE_MCS0;
 	else
 		rate = tx_rate->hw_value;
 
+	if (rtl8xxxu_debug & RTL8XXXU_DEBUG_TX)
+		dev_info(dev, "%s: TX rate: %d, pkt size %d\n",
+			 __func__, rate, cpu_to_le16(tx_desc40->pkt_size));
+
 	seq_number = IEEE80211_SEQ_TO_SN(le16_to_cpu(hdr->seq_ctrl));
 
 	tx_desc40->txdw4 = cpu_to_le32(rate);

* [PATCH net-next 2/2] net: phy: bcm7xxx: Plug in support for reading PHY error counters
From: Florian Fainelli @ 2016-11-28 21:06 UTC
  To: netdev
  Cc: davem, andrew, bcm-kernel-feedback-list, allan.nielsen,
	raju.lakkaraju, Florian Fainelli
In-Reply-To: <20161128210614.12621-1-f.fainelli@gmail.com>

Broadcom BCM7xxx internal PHYs can leverage the Broadcom PHY library
module PHY error counters helper functions, just implement the
appropriate PHY driver function calls to do so. We need to allocate some
storage space for our PHY statistics, and provide it to the Broadcom PHY
library, so do this in a specific probe function, and slightly wrap the
get_stats function call.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/bcm7xxx.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
index 5b3be4c67be8..fb976ab2ab92 100644
--- a/drivers/net/phy/bcm7xxx.c
+++ b/drivers/net/phy/bcm7xxx.c
@@ -45,6 +45,10 @@
 #define AFE_VDAC_OTHERS_0		MISC_ADDR(0x39, 3)
 #define AFE_HPF_TRIM_OTHERS		MISC_ADDR(0x3a, 0)
 
+struct bcm7xxx_phy_priv {
+	u64	*stats;
+};
+
 static void r_rc_cal_reset(struct phy_device *phydev)
 {
 	/* Reset R_CAL/RC_CAL Engine */
@@ -350,6 +354,32 @@ static int bcm7xxx_28nm_set_tunable(struct phy_device *phydev,
 	return genphy_restart_aneg(phydev);
 }
 
+static void bcm7xxx_28nm_get_phy_stats(struct phy_device *phydev,
+				       struct ethtool_stats *stats, u64 *data)
+{
+	struct bcm7xxx_phy_priv *priv = phydev->priv;
+
+	bcm_phy_get_stats(phydev, priv->stats, stats, data);
+}
+
+static int bcm7xxx_28nm_probe(struct phy_device *phydev)
+{
+	struct bcm7xxx_phy_priv *priv;
+
+	priv = devm_kzalloc(&phydev->mdio.dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	phydev->priv = priv;
+
+	priv->stats = devm_kzalloc(&phydev->mdio.dev,
+				   bcm_phy_get_sset_count(phydev), GFP_KERNEL);
+	if (!priv->stats)
+		return -ENOMEM;
+
+	return 0;
+}
+
 #define BCM7XXX_28NM_GPHY(_oui, _name)					\
 {									\
 	.phy_id		= (_oui),					\
@@ -364,6 +394,10 @@ static int bcm7xxx_28nm_set_tunable(struct phy_device *phydev,
 	.resume		= bcm7xxx_28nm_resume,				\
 	.get_tunable	= bcm7xxx_28nm_get_tunable,			\
 	.set_tunable	= bcm7xxx_28nm_set_tunable,			\
+	.get_sset_count	= bcm_phy_get_sset_count,			\
+	.get_strings	= bcm_phy_get_strings,				\
+	.get_stats	= bcm7xxx_28nm_get_phy_stats,			\
+	.probe		= bcm7xxx_28nm_probe,				\
 }
 
 #define BCM7XXX_40NM_EPHY(_oui, _name)					\
-- 
2.9.3
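
A note on the probe hunk above: priv->stats is a u64 pointer that the library indexes once per counter, so the buffer presumably needs one u64 per counter, whereas the hunk passes the counter count itself as the byte size. A minimal userspace sketch of the sizing, assuming calloc() as a stand-in for the devm allocators; demo_shadow_bytes, demo_alloc_shadow and DEMO_NUM_STATS are hypothetical names, not part of the patch:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for bcm_phy_get_sset_count(); the library table
 * in patch 1/2 lists five counters. */
#define DEMO_NUM_STATS 5

/* Bytes needed for one u64 shadow slot per counter. */
size_t demo_shadow_bytes(size_t count)
{
	return count * sizeof(uint64_t);
}

/* Zeroed shadow array with one slot per counter, or NULL on failure.
 * calloc() takes count and element size separately, so the intent
 * (count elements of 8 bytes each) is explicit at the call site. */
uint64_t *demo_alloc_shadow(size_t count)
{
	return calloc(count, sizeof(uint64_t));
}
```

With five counters this comes to 40 bytes rather than 5. In kernel code the equivalent would likely be a count-and-size allocator call, but that is an assumption about how a respin might look, not something stated in this thread.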

* [PATCH net-next 1/2] net: phy: broadcom: Add support code for reading PHY counters
From: Florian Fainelli @ 2016-11-28 21:06 UTC
  To: netdev
  Cc: davem, andrew, bcm-kernel-feedback-list, allan.nielsen,
	raju.lakkaraju, Florian Fainelli
In-Reply-To: <20161128210614.12621-1-f.fainelli@gmail.com>

Broadcom PHYs expose a number of PHY error counters: receive errors,
false carrier sense, SerDes BER count, local and remote receive errors.
Add support code to allow retrieving these error counters. Since the
Broadcom PHY library code is used by several drivers, make it possible
for them to specify the storage for the software copy of the statistics.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/bcm-phy-lib.c | 67 +++++++++++++++++++++++++++++++++++++++++++
 drivers/net/phy/bcm-phy-lib.h |  5 ++++
 include/linux/brcmphy.h       |  3 ++
 3 files changed, 75 insertions(+)

diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c
index 3156ce6d5861..afaa747979f2 100644
--- a/drivers/net/phy/bcm-phy-lib.c
+++ b/drivers/net/phy/bcm-phy-lib.c
@@ -17,6 +17,7 @@
 #include <linux/mdio.h>
 #include <linux/module.h>
 #include <linux/phy.h>
+#include <linux/ethtool.h>
 
 #define MII_BCM_CHANNEL_WIDTH     0x2000
 #define BCM_CL45VEN_EEE_ADV       0x3c
@@ -317,6 +318,72 @@ int bcm_phy_downshift_set(struct phy_device *phydev, u8 count)
 }
 EXPORT_SYMBOL_GPL(bcm_phy_downshift_set);
 
+struct bcm_phy_hw_stat {
+	const char *string;
+	u8 reg;
+	u8 shift;
+	u8 bits;
+};
+
+/* Counters freeze at either 0xffff or 0xff, better than nothing */
+static struct bcm_phy_hw_stat bcm_phy_hw_stats[] = {
+	{ "phy_receive_errors", MII_BRCM_CORE_BASE12, 0, 16 },
+	{ "phy_serdes_ber_errors", MII_BRCM_CORE_BASE13, 8, 8 },
+	{ "phy_false_carrier_sense_errors", MII_BRCM_CORE_BASE13, 0, 8 },
+	{ "phy_local_rcvr_nok", MII_BRCM_CORE_BASE14, 8, 8 },
+	{ "phy_remote_rcv_nok", MII_BRCM_CORE_BASE14, 0, 8 },
+};
+
+int bcm_phy_get_sset_count(struct phy_device *phydev)
+{
+	return ARRAY_SIZE(bcm_phy_hw_stats);
+}
+
+void bcm_phy_get_strings(struct phy_device *phydev, u8 *data)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(bcm_phy_hw_stats); i++)
+		memcpy(data + i * ETH_GSTRING_LEN,
+		       bcm_phy_hw_stats[i].string, ETH_GSTRING_LEN);
+}
+
+#ifndef UINT64_MAX
+#define UINT64_MAX              (u64)(~((u64)0))
+#endif
+
+/* Caller is supposed to provide appropriate storage for the library code to
+ * access the shadow copy
+ */
+static u64 bcm_phy_get_stat(struct phy_device *phydev, u64 *shadow,
+			    unsigned int i)
+{
+	struct bcm_phy_hw_stat stat = bcm_phy_hw_stats[i];
+	int val;
+	u64 ret;
+
+	val = phy_read(phydev, stat.reg);
+	if (val < 0) {
+		ret = UINT64_MAX;
+	} else {
+		val >>= stat.shift;
+		val = val & ((1 << stat.bits) - 1);
+		shadow[i] += val;
+		ret = shadow[i];
+	}
+
+	return ret;
+}
+
+void bcm_phy_get_stats(struct phy_device *phydev, u64 *shadow,
+		       struct ethtool_stats *stats, u64 *data)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(bcm_phy_hw_stats); i++)
+		data[i] = bcm_phy_get_stat(phydev, shadow, i);
+}
+
 MODULE_DESCRIPTION("Broadcom PHY Library");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Broadcom Corporation");
diff --git a/drivers/net/phy/bcm-phy-lib.h b/drivers/net/phy/bcm-phy-lib.h
index a117f657c6d7..7c73808cbbde 100644
--- a/drivers/net/phy/bcm-phy-lib.h
+++ b/drivers/net/phy/bcm-phy-lib.h
@@ -42,4 +42,9 @@ int bcm_phy_downshift_get(struct phy_device *phydev, u8 *count);
 
 int bcm_phy_downshift_set(struct phy_device *phydev, u8 count);
 
+int bcm_phy_get_sset_count(struct phy_device *phydev);
+void bcm_phy_get_strings(struct phy_device *phydev, u8 *data);
+void bcm_phy_get_stats(struct phy_device *phydev, u64 *shadow,
+		       struct ethtool_stats *stats, u64 *data);
+
 #endif /* _LINUX_BCM_PHY_LIB_H */
diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
index f9f8aaf9c943..4f7d8be9ddbf 100644
--- a/include/linux/brcmphy.h
+++ b/include/linux/brcmphy.h
@@ -244,6 +244,9 @@
 #define LPI_FEATURE_EN_DIG1000X		0x4000
 
 /* Core register definitions*/
+#define MII_BRCM_CORE_BASE12	0x12
+#define MII_BRCM_CORE_BASE13	0x13
+#define MII_BRCM_CORE_BASE14	0x14
 #define MII_BRCM_CORE_BASE1E	0x1E
 #define MII_BRCM_CORE_EXPB0	0xB0
 #define MII_BRCM_CORE_EXPB1	0xB1
-- 
2.9.3
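
The shift-and-mask step in bcm_phy_get_stat() can be tried outside the kernel. A minimal sketch, assuming the 16-bit register value is already in hand (phy_read() is not modelled); demo_extract_field and demo_accumulate are hypothetical names:

```c
#include <stdint.h>

/* Extract a counter that occupies `bits` bits starting at bit `shift`
 * of a 16-bit register value, as the shift-and-mask in the patch does. */
uint32_t demo_extract_field(uint16_t reg, unsigned int shift, unsigned int bits)
{
	uint32_t val = reg;

	val >>= shift;
	return val & ((1u << bits) - 1);
}

/* Add the freshly read field to the caller-provided shadow slot and
 * return the running total. */
uint64_t demo_accumulate(uint64_t *shadow, unsigned int i, uint16_t reg,
			 unsigned int shift, unsigned int bits)
{
	shadow[i] += demo_extract_field(reg, shift, bits);
	return shadow[i];
}
```

For example, a register value of 0xAB12 with shift 8 and 8 bits yields 0xAB. Summing successive reads only gives a meaningful total if the hardware counter clears on read, which the patch description does not state.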

* [PATCH net-next 0/2] net: phy: broadcom: Support for PHY counters
From: Florian Fainelli @ 2016-11-28 21:06 UTC
  To: netdev
  Cc: davem, andrew, bcm-kernel-feedback-list, allan.nielsen,
	raju.lakkaraju, Florian Fainelli

Hi all,

This patch series adds support for reading the Broadcom PHYs internal counters.

Florian Fainelli (2):
  net: phy: broadcom: Add support code for reading PHY counters
  net: phy: bcm7xxx: Plug in support for reading PHY error counters

 drivers/net/phy/bcm-phy-lib.c | 67 +++++++++++++++++++++++++++++++++++++++++++
 drivers/net/phy/bcm-phy-lib.h |  5 ++++
 drivers/net/phy/bcm7xxx.c     | 34 ++++++++++++++++++++++
 include/linux/brcmphy.h       |  3 ++
 4 files changed, 109 insertions(+)

-- 
2.9.3

* Re: net: GPF in eth_header
From: Eric Dumazet @ 2016-11-28 21:05 UTC
  To: Dmitry Vyukov, Florian Westphal
  Cc: syzkaller, Hannes Frederic Sowa, David Miller, Tom Herbert,
	Alexander Duyck, Jiri Benc, Sabrina Dubroca, netdev, LKML
In-Reply-To: <1480362459.18162.83.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, 2016-11-28 at 11:47 -0800, Eric Dumazet wrote:
> On Mon, 2016-11-28 at 20:34 +0100, Dmitry Vyukov wrote:
> > On Mon, Nov 28, 2016 at 8:04 PM, 'Andrey Konovalov' via syzkaller
> 
> > > Hi Eric,
> > >
> > > As far as I can see, skb_network_offset() becomes negative after
> > > pskb_pull(skb, (u8 *) (fhdr + 1) - skb->data) in nf_ct_frag6_queue().
> > > At least I'm able to detect that with a BUG_ON().
> > >
> > > Also it seems that the issue is only reproducible (at least with the
> > > poc I provided) for a short time after boot.
> > 
> > 
> > Eric,
> > 
> > Is it enough to debug? Or maybe Andrey can trace some values for you.
> 
> Well, now we are talking, if you tell me how many modules you load, it
> might help ;)
> 
> nf_ct_frag6_queue is nowhere to be seen in my kernels, that might
> explain why I could not reproduce the bug.
> 
> Let me try ;)
> 

Might be a bug added in commit daaa7d647f81f3
("netfilter: ipv6: avoid nf_iterate recursion")

Florian, what do you think of dropping a packet that presumably was
mangled badly by nf_ct_frag6_queue() ?

(Like about 48 byte pulled :(, and/or skb->csum changed )

diff --git a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
index f7aab5ab93a5..31aa947332d8 100644
--- a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
+++ b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
@@ -65,8 +65,8 @@ static unsigned int ipv6_defrag(void *priv,
 
 	err = nf_ct_frag6_gather(state->net, skb,
 				 nf_ct6_defrag_user(state->hook, skb));
-	/* queued */
-	if (err == -EINPROGRESS)
+	/* queued or mangled ... */
+	if (err)
 		return NF_STOLEN;
 
 	return NF_ACCEPT;
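
Why the offset can go negative: skb_network_offset() is the distance from skb->data to the network header, so pulling data past the network header makes it negative. A toy model with plain integer offsets; struct demo_skb and its helpers are hypothetical and not kernel API:

```c
/* Toy model: both fields are byte offsets from the start of the buffer. */
struct demo_skb {
	int data;            /* where skb->data currently points */
	int network_header;  /* where the network header starts */
};

/* Mirrors the idea of skb_network_offset(): header position minus data. */
int demo_network_offset(const struct demo_skb *skb)
{
	return skb->network_header - skb->data;
}

/* Mirrors the effect of a pull: data advances, the header does not move. */
void demo_pull(struct demo_skb *skb, int len)
{
	skb->data += len;
}
```

Starting with data at the network header and pulling the roughly 48 bytes mentioned above leaves an offset of -48 in this model.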

* Re: [patch net] sched: cls_flower: remove from hashtable only in case skip sw flag is not set
From: Or Gerlitz @ 2016-11-28 21:04 UTC
  To: Jiri Pirko, Amir Vadai
  Cc: Linux Netdev List, David Miller, Jamal Hadi Salim, Ido Schimmel,
	Elad Raz, Or Gerlitz, Hadar Hen Zion
In-Reply-To: <1480344013-4812-1-git-send-email-jiri@resnulli.us>

On Mon, Nov 28, 2016 at 4:40 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Be symmetric to hashtable insert and remove filter from hashtable only
> in case skip sw flag is not set.
>
> Fixes: e69985c67c33 ("net/sched: cls_flower: Introduce support in SKIP SW flag")

Amir, Jiri - what was the impact of running without this fix for the
last 3-4 kernels? I haven't seen any crashes; was something leaking?
Or is this just a cleanup to make things clearer and more
maintainable?

* Re: [PATCH net-next] net: dsa: mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count
From: David Miller @ 2016-11-28 21:04 UTC
  To: afaerber
  Cc: uwe, Michal.Hrusecky, tomas.hlavacek, bedrich.kosata, andrew,
	vivien.didelot, f.fainelli, netdev, linux-kernel
In-Reply-To: <1480285588-13501-1-git-send-email-afaerber@suse.de>

From: Andreas Färber <afaerber@suse.de>
Date: Sun, 27 Nov 2016 23:26:28 +0100

> mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
> so free the same amount. This will be 8 or 9 in practice, less than 16.
> 
> Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
> Cc: Andrew Lunn <andrew@lunn.ch>
> Signed-off-by: Andreas Färber <afaerber@suse.de>

Applied, thanks.

* Re: [PATCH net-next 1/3] ethtool: (uapi) Add ETHTOOL_PHY_LOOPBACK to PHY tunables
From: Florian Fainelli @ 2016-11-28 21:01 UTC
  To: Andrew Lunn, Allan W. Nielsen; +Cc: netdev, raju.lakkaraju
In-Reply-To: <20161128202114.GM17704@lunn.ch>

On 11/28/2016 12:21 PM, Andrew Lunn wrote:
> On Mon, Nov 28, 2016 at 08:23:06PM +0100, Allan W. Nielsen wrote:
>> Hi Andrew and Florian,
>>
>> On 28/11/16 15:14, Andrew Lunn wrote:
>>> On Mon, Nov 28, 2016 at 02:24:30PM +0100, Allan W. Nielsen wrote:
>>>> From: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
>>>>
>>>> 3 types of PHY loopback are supported.
>>>> i.e. Near-End Loopback, Far-End Loopback and External Loopback.
>>> Is this integrated with ethtool --test? You only want the PHY to go
>>> into loopback mode when running ethtool --test external_lb or maybe
>>> ethtool --test offline.
>> There are other use-cases for enabling PHY loopback:
>>
>> 1) If the PHY is connected to a switch then a loop-port is sometimes
>>    used to "force/enable" one or more extra passes through the switch
>>    core. This "hack" can sometimes be used to achieve new functionality
>>    with existing silicon.
> 
> With Linux, switches are managed via switchdev, or DSA. You will have
> to teach this infrastructure that something really odd is going on
> with one of its ports before you do anything like this in the PHY
> layer. I suggest you leave this use case alone until somebody
> really-really wants it. From my knowledge of the Marvell DSA driver,
> this is not easy.

Agree with Andrew here, this particular use case with switches does not
need to be solved now, but if we imagine we need to support that,
chances are that we will want the network device as a configuration
entry point, more than the PHY device itself.

> 
>> 2) Existing user-space application may expect to be able to do the
>>    testing on its own (generate/validate the test traffic).
> 
> Please can you reference some existing user-space application and the
> kernel API it uses to put the PHY into loopback mode?
> 
>> We are always happy to integrate with any existing functionality, but
>> as I understand "ethtool --test" then intention is to perform a test
>> and then bring back the PHY in to a "normal" state (I may be
>> wrong...).
> 
> Correct.
> 
>> The idea with this patch is to allow configuring loopback more
>> "permanently" (userspace can decide when to activate and when to
>> de-activate). I should probably have made that clear in the cover
>> letter.
> 
> Leaving it in loopback is a really bad idea. I've spent days once
> working out why an Ethernet did not work. Turned out the PHY powered
> up in loopback mode, and the embedded OS running on it did not
> initialise it to sensible defaults on probe. Packets were going out,
> dhcp server was replying but all incoming packets were discarded.
> 
> It is really not obvious when everything looks O.K, but nothing works,
> because the PHY is in loopback. There needs to be a big red flag to
> warn you.
> 
> If you really do want to do this, you should look at NETIF_F_LOOPBACK
> and consider how this concept can be applied at the PHY, not the MAC.
> But you need the full concept, so you see things like:
> 
> 2: eth0: <PHY_LOOPBACK,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
>     link/ether 80:ee:73:83:60:27 brd ff:ff:ff:ff:ff:ff
> 
> Hmm, I've no idea how you actually enable the MAC loopback that
> NETIF_F_LOOPBACK represents. I don't see anything with ip link set.

I am afraid you lost me on this, NETIF_F_LOOPBACK is a netdev_features_t
bit, so it tells ethtool that this is a potential feature to be turned
on with ethtool -K <iface>. The semantics of this loopback feature are
not defined AFAICT, but a reasonable behavior from the driver is to put
itself in a mode where packets sent by a socket-level application are
looped through the Ethernet adapter itself. Whether this happens at the
DMA level, the MII signals, or somewhere in the PHY, is driver specific
unfortunately.

Now, there is another way to toggle a loopback for a given Ethernet
adapter which is to actually set IFF_LOOPBACK in dev->flags for this
interface. Some drivers seem to be able to properly react to that as
well, although I see no way this can be done looking at the iproute2 or
ifconfig man pages..
-- 
Florian

* Re: [RFC PATCH net-next] ipv6: implement consistent hashing for equal-cost multipath routing
From: David Lebrun @ 2016-11-28 20:58 UTC
  To: David Miller; +Cc: netdev
In-Reply-To: <20161128.153209.2135257061368558724.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 635 bytes --]

On 11/28/2016 09:32 PM, David Miller wrote:
> When I was working on the routing cache removal in ipv4 I compared
> using a stupid O(1) hash lookup of the FIB entries vs. the O(log n)
> fib_trie stuff actually in use.
> 
> It did make a difference.
> 
> This is a lookup that can be invoked 20 million times per second or
> more.
> 
> Every cycle matters.
> 
> We already have a lot of trouble getting under the cycle budget one
> has for routing at wire speed for very high link rates, please don't
> make it worse.

OK, so O(1) mandatory. I will continue in that direction then.

Thanks for the feedback

David


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 163 bytes --]

* Re: [PATCH net-next v3 2/3] bpf: Add new cgroup attach type to enable sock modifications
From: David Ahern @ 2016-11-28 20:57 UTC
  To: Alexei Starovoitov; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf
In-Reply-To: <20161128203252.GB7634@ast-mbp.thefacebook.com>

On 11/28/16 1:32 PM, Alexei Starovoitov wrote:
> On Mon, Nov 28, 2016 at 07:48:49AM -0800, David Ahern wrote:
>> Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
>> BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run
>> any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
>> Currently only sk_bound_dev_if is exported to userspace for modification
>> by a bpf program.
>>
>> This allows a cgroup to be configured such that AF_INET{6} sockets opened
>> by processes are automatically bound to a specific device. In turn, this
>> enables the running of programs that do not support SO_BINDTODEVICE in a
>> specific VRF context / L3 domain.
>>
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ...
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 1f09c521adfe..808e158742a2 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -408,7 +408,7 @@ struct bpf_prog {
>>  	enum bpf_prog_type	type;		/* Type of BPF program */
>>  	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
>>  	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
>> -	unsigned int		(*bpf_func)(const struct sk_buff *skb,
>> +	unsigned int		(*bpf_func)(const void *ctx,
>>  					    const struct bpf_insn *filter);
> 
> Daniel already tweaked it. pls rebase.

ack

> 
>> +static const struct bpf_func_proto *
>> +cg_sock_func_proto(enum bpf_func_id func_id)
>> +{
>> +	return NULL;
>> +}
> 
> if you don't want any helpers, just don't set .get_func_proto.
> See check_call() in verifier.

ack.

> Though why not allow socket filter like helpers that
> sk_filter_func_proto() provides?
> tail call, bpf_trace_printk, maps are useful things that you get for free.
> Developing programs without bpf_trace_printk is pretty hard.

This use case was trivial enough, but in general I get your point; I will use sk_filter_func_proto.

> 
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index 5ddf5cda07f4..24d2550492ee 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -374,8 +374,18 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
>>  
>>  	if (sk->sk_prot->init) {
>>  		err = sk->sk_prot->init(sk);
>> -		if (err)
>> +		if (err) {
>> +			sk_common_release(sk);
>> +			goto out;
>> +		}
>> +	}
>> +
>> +	if (!kern) {
>> +		err = BPF_CGROUP_RUN_PROG_INET_SOCK(sk);
> 
> i guess from vrf use case point of view this is the best place,
> since so_bindtodevice can still override it,
> but thinking little bit into other use case like port binding
> restrictions and port rewrites can we move it into inet_bind ?

Deferring to inet_bind won't work for a number of use cases (e.g., udp, raw).

> My understanding nothing will be using bound_dev_if until that
> time, so we can set it there?

And yes, I do want to allow a sufficiently privileged process to override the inherited setting. For example, the shell is in the management vrf cgroup and the root user wants to run a program that sends packets out a data plane vrf using an option built into the program. The sequence is:

1. socket - inherits sk_bound_dev_if from bpf program attached to mgmt cgroup

2. setsockopt( new vrf )

3. connect - lookups to remote address use vrf from step 2.

Thanks for the review.

* Re: Receive offloads, small RCVBUF and zero TCP window
From: David Miller @ 2016-11-28 20:54 UTC
  To: alexandre.sidorenko; +Cc: netdev
In-Reply-To: <2080597.A38JFJZ1AD@zbook>

From: Alex Sidorenko <alexandre.sidorenko@hpe.com>
Date: Mon, 28 Nov 2016 15:49:26 -0500

> Now the question is whether it is OK to have icsk->icsk_ack.rcv_mss
> larger than MTU.

It absolutely is not OK.

If VMWare wants to receive large frames for batching purposes it must
use GRO or similar to achieve that, not just send vanilla frames into
the stack which are larger than the device MTU.

* Re: [PATCH net-next 10/11] qede: Add basic XDP support
From: Mintz, Yuval @ 2016-11-28 20:53 UTC
  To: Tom Herbert; +Cc: David S. Miller, Linux Kernel Network Developers
In-Reply-To: <CALx6S36kiAiMeZoszx=5uBrUecwCodJx2tg3kL4HBk=4eVMSLg@mail.gmail.com>

> > +       if (act == XDP_PASS)
> > +               return true;
> > +
> > +       /* Count number of packets not to be passed to stack */
> > +       rxq->xdp_no_pass++;
> > +
> > +       switch (act) {
> > +       default:
> > +               bpf_warn_invalid_xdp_action(act);

> XDP_TX is a valid action and in fact some drivers already support that.
> Given that this isn't the first instance of a driver not supporting XDP_TX,
> I think we need to clearly define what that means. Personally, I think that we
> shouldn't allow a program to load that returns XDP_TX if the driver does
> not support it. I believe Jesper might be looking into capabilities
> support for XDP to handle that. For the purposes of this patch I'd suggest
> having an XDP_TX case and warning the user that an unsupported action,
> as opposed to an invalid one, was returned.

Patch #11 does add XDP_TX support.
Adding an explicit case with a warning to be removed in the next patch
sounds like a waste to me. But if you think 'future generations' would
benefit from it, sure.

> > +       case XDP_ABORTED:
> > +       case XDP_DROP:


* Receive offloads, small RCVBUF and zero TCP window
From: Alex Sidorenko @ 2016-11-28 20:49 UTC (permalink / raw)
  To: netdev

One of our customers has hit a problem: the TCP window closes and stays closed forever even though the receive buffer is empty. This problem has been reported for RHEL6.8 and I think that the issue is in the __tcp_select_window() subroutine. Comparing the sources of the RHEL6.8 kernel and the latest upstream kernel (pulled from GIT today), it looks like it should still be present in the latest kernels.

The problem is triggered by the following conditions:

(a) small RCVBUF (24576 in our case), as a result WS=0
(b) mss = icsk->icsk_ack.rcv_mss > MTU

I asked the customer to trigger a vmcore when the problem occurs, to find out why the window stays closed forever. I can see in the vmcore (doing calculations following the __tcp_select_window sources):

        windows: rcv=0, snd=65535  advmss=1460 rcv_ws=0 snd_ws=0
        --- Emulating __tcp_select_window ---
          rcv_mss=7300 free_space=18432 allowed_space=18432 full_space=16972
          rcv_ssthresh=5840, so free_space->5840 

So when we reach the test

		if (window <= free_space - mss || window > free_space)
			window = (free_space / mss) * mss;
		else if (mss == full_space &&
			 free_space > window + (full_space >> 1))
			window = free_space;

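we have a negative value of (free_space - mss) = 5840 - 7300 = -1460

With window = 0, neither (window <= free_space - mss) nor (window > free_space) is true, and mss != full_space. As a result, we do not update the window and it stays zero forever - even though the application has read all available data and we have sufficient free_space.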

This occurs only due to the fact that we have an interface with MTU=1500 (so that mss=1460 is expected), but icsk->icsk_ack.rcv_mss is 5*1460 = 7300.

As a result, "Get the largest window that is a nice multiple of mss" means a multiple of 7300, and this never happens!
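
The arithmetic above can be replayed in userspace with a simplified model of just the quoted test. This is only a sketch: it assumes window scaling is off (rcv_ws=0), ignores everything __tcp_select_window() does before that test, and the function name model_select_window() is hypothetical.

```c
#include <assert.h>

/* Simplified model of the rounding step quoted above; not the kernel code. */
static int model_select_window(int window, int free_space, int full_space,
			       int mss)
{
	assert(mss > 0);	/* guards the division below */

	if (window <= free_space - mss || window > free_space)
		window = (free_space / mss) * mss;
	else if (mss == full_space &&
		 free_space > window + (full_space >> 1))
		window = free_space;

	return window;
}
```

With the vmcore values (window=0, free_space=5840, full_space=16972), mss=7300 leaves the window at 0, while mss=1460 gives (5840 / 1460) * 1460 = 5840.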

All other mss-related values look reasonable:

crash64> struct tcp_sock 0xffff8801bcb8c840  | grep mss
    icsk_sync_mss = 0xffffffff814ce620 , 
      rcv_mss = 7300
  mss_cache = 1460, 
  advmss = 1460, 
    user_mss = 0, 
    mss_clamp = 1460

Now the question is whether it is OK to have icsk->icsk_ack.rcv_mss larger than MTU. I suspect the most important factor is that this host is running under VMWare. VMWare probably optimizes receive offloading dramatically, pushing to us merged SKBs larger than MTU. I have written a tool to print warnings when we have mss > advmss and ran it on my collection of vmcores. In almost all cases where the vmcore was taken on a VMWare guest, we have some connections with mss > advmss. I have not found this high mss value in any non-VMWare vmcore.

Obviously, this is a corner-case problem - it can happen only if we have a small RCVBUF. But I think this needs to be fixed anyway. I am not sure whether having icsk->icsk_ack.rcv_mss > MTU is expected. If not, this should be fixed in receiving offload subroutines (LRO?) or maybe VMWare NIC driver.

But if it is OK for NICs to merge received SKBs and present supersegments to TCP (similar to TSO), this needs to be fixed in __tcp_select_window - e.g. if we see a small RCVBUF and a large icsk->icsk_ack.rcv_mss, switch to mss_clamp, as was done in older versions. From the __tcp_select_window() comment:

	/* MSS for the peer's data.  Previous versions used mss_clamp
	 * here.  I don't know if the value based on our guesses
	 * of peer's MSS is better for the performance.  It's more correct
	 * but may be worse for the performance because of rcv_mss
	 * fluctuations.  --SAW  1998/11/1
	 */

Regards,
Alex

-- 

------------------------------------------------------------------
Alex Sidorenko	email: asid@hpe.com
ERT  Linux 	Hewlett-Packard Enterprise (Canada)
------------------------------------------------------------------

* Re: [PATCH net-next v3 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if
From: David Ahern @ 2016-11-28 20:47 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf
In-Reply-To: <20161128203752.GC7634@ast-mbp.thefacebook.com>

On 11/28/16 1:37 PM, Alexei Starovoitov wrote:
> On Mon, Nov 28, 2016 at 07:48:50AM -0800, David Ahern wrote:
>> Add a simple program to demonstrate the ability to attach a bpf program
>> to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they
>> are created.
>>
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ...
>> +static int prog_load(int idx)
>> +{
>> +	struct bpf_insn prog[] = {
>> +		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
>> +		BPF_MOV64_IMM(BPF_REG_3, idx),
>> +		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)),
>> +		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)),
>> +		BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
>> +		BPF_EXIT_INSN(),
>> +	};
>> +
>> +	return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
>> +			     "GPL", 0);
>> +}
> 
> the program looks trivial enough :)
> 
> Could you integrate it into iproute2 as well ?

yes, that is the plan. iproute2 can be used for all things vrf. As infra goes into the kernel, support is added to iproute2.

> Then the whole vrf management will be easier.
> The user wouldn't even need to be aware that iproute2 sets up
> this program. It will know ifindex and can delete
> the prog when vrf configs change and so on.
> 
> Also please convert this sample into automated test like samples/bpf/*.sh
> we're going to move all of them to tools/testing/selftests/ eventually.
> 

ok

* Re: [PATCH v3 net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver
From: Lino Sanfilippo @ 2016-11-28 20:46 UTC (permalink / raw)
  To: Markus Böhme, davem, charrer, liodot, gregkh, andrew
  Cc: devel, netdev, linux-kernel
In-Reply-To: <910890ec-7acf-1ced-55c6-d854ee2cdccc@mailbox.org>

Hi Markus,

On 27.11.2016 18:59, Markus Böhme wrote:
> Hello Lino,
> 
> just some things barely worth mentioning:
> 

> 
> I found a bunch of unused #defines in slic.h. I cannot judge if they are
> worth keeping:
> 
> 	SLIC_VRHSTATB_LONGE
> 	SLIC_VRHSTATB_PREA
> 	SLIC_ISR_IO
> 	SLIC_ISR_PING_MASK
> 	SLIC_GIG_SPEED_MASK
> 	SLIC_GMCR_RESET
> 	SLIC_XCR_RESET
> 	SLIC_XCR_XMTEN
> 	SLIC_XCR_PAUSEEN
> 	SLIC_XCR_LOADRNG
> 	SLIC_REG_DBAR
> 	SLIC_REG_PING
> 	SLIC_REG_DUMP_CMD
> 	SLIC_REG_DUMP_DATA
> 	SLIC_REG_WRHOSTID
> 	SLIC_REG_LOW_POWER
> 	SLIC_REG_RESET_IFACE
> 	SLIC_REG_ADDR_UPPER
> 	SLIC_REG_HBAR64
> 	SLIC_REG_DBAR64
> 	SLIC_REG_CBAR64
> 	SLIC_REG_RBAR64
> 	SLIC_REG_WRVLANID
> 	SLIC_REG_READ_XF_INFO
> 	SLIC_REG_WRITE_XF_INFO
> 	SLIC_REG_TICKS_PER_SEC
> 
> These device IDs are not used, either, but maybe it's good to keep them
> for documentation purposes:
> 
> 	PCI_SUBDEVICE_ID_ALACRITECH_1000X1_2
> 	PCI_SUBDEVICE_ID_ALACRITECH_SES1001T
> 	PCI_SUBDEVICE_ID_ALACRITECH_SEN2002XT
> 	PCI_SUBDEVICE_ID_ALACRITECH_SEN2001XT
> 	PCI_SUBDEVICE_ID_ALACRITECH_SEN2104ET
> 	PCI_SUBDEVICE_ID_ALACRITECH_SEN2102ET
> 

I left these defines in both for documentation and to avoid gaps in
register ranges. I would like to keep this as it is.

>> +
>> +/* SLIC EEPROM structure for Oasis */
>> +struct slic_mojave_eeprom {
> 
> Comment: "for Mojave".

Will fix, thanks,

> 
> [...]
> 
>> +struct slic_device {
>> +	struct pci_dev *pdev;
>> +	struct net_device *netdev;
>> +	void __iomem *regs;
>> +	/* upper address setting lock */
>> +	spinlock_t upper_lock;
>> +	struct slic_shmem shmem;
>> +	struct napi_struct napi;
>> +	struct slic_rx_queue rxq;
>> +	struct slic_tx_queue txq;
>> +	struct slic_stat_queue stq;
>> +	struct slic_stats stats;
>> +	struct slic_upr_list upr_list;
>> +	/* link configuration lock */
>> +	spinlock_t link_lock;
>> +	bool promisc;
>> +	bool autoneg;
>> +	int speed;
>> +	int duplex;
> 
> Maybe make speed and duplex unsigned? They are assigned and compared
> against unsigned values in slicoss.c, so this would get rid of some
> (benign, because of the range of the values) -Wsign-compare warnings in
> slic_configure_link_locked. However, in a comparison there SPEED_UNKNOWN
> would need to be casted to unsigned to prevent another one popping up.
> 

There is indeed a bunch of warnings concerning signedness. Will have a look
at all of them. However I think I will keep "speed" as an int, because casting
SPEED_UNKNOWN to an unsigned int is IMHO an ugly thing to do.
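
To make the trade-off concrete, here is a small userspace sketch of the two options. It assumes SPEED_UNKNOWN has the value -1, as in the ethtool UAPI header; the constant is redefined under a hypothetical name so the snippet builds outside the kernel, and both function names are made up.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumed to mirror SPEED_UNKNOWN (-1); hypothetical name for this sketch. */
#define EX_SPEED_UNKNOWN (-1)

/* Option kept here: speed stays an int, so no cast is needed. */
static bool ex_speed_unknown_int(int speed)
{
	return speed == EX_SPEED_UNKNOWN;
}

/* Option from the review: speed is unsigned, so the constant has to be
 * cast, which turns -1 into UINT_MAX.
 */
static bool ex_speed_unknown_uint(unsigned int speed)
{
	return speed == (unsigned int)EX_SPEED_UNKNOWN;
}
```

Both variants behave the same for valid link speeds; the difference is only where the cast ends up.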

> [...]
> 
>> +#endif /* _SLIC_H */
>> diff --git a/drivers/net/ethernet/alacritech/slicoss.c b/drivers/net/ethernet/alacritech/slicoss.c
>> new file mode 100644
>> index 0000000..8cd862a
>> --- /dev/null
>> +++ b/drivers/net/ethernet/alacritech/slicoss.c
>> @@ -0,0 +1,1867 @@
> 
> [...]
> 
>> +
>> +static const struct pci_device_id slic_id_tbl[] = {
>> +	{ PCI_DEVICE(PCI_VENDOR_ID_ALACRITECH,
>> +		     PCI_DEVICE_ID_ALACRITECH_MOAVE) },
> 
> I missed this in slic.h, but is this a typo and "MOAVE" should be
> "MOJAVE"? There are a couple similar #defines in slic.h.

This should definitely be "Mojave". Will fix it. 

> 
> [...]
> 
>> +static void slic_refill_rx_queue(struct slic_device *sdev, gfp_t gfp)
>> +{
>> +	const unsigned int ALIGN_MASK = SLIC_RX_BUFF_ALIGN - 1;
>> +	unsigned int maplen = SLIC_RX_BUFF_SIZE;
>> +	struct slic_rx_queue *rxq = &sdev->rxq;
>> +	struct net_device *dev = sdev->netdev;
>> +	struct slic_rx_buffer *buff;
>> +	struct slic_rx_desc *desc;
>> +	unsigned int misalign;
>> +	unsigned int offset;
>> +	struct sk_buff *skb;
>> +	dma_addr_t paddr;
>> +
>> +	while (slic_get_free_rx_descs(rxq) > SLIC_MAX_REQ_RX_DESCS) {
>> +		skb = alloc_skb(maplen + ALIGN_MASK, gfp);
>> +		if (!skb)
>> +			break;
>> +
>> +		paddr = dma_map_single(&sdev->pdev->dev, skb->data, maplen,
>> +				       DMA_FROM_DEVICE);
>> +		if (dma_mapping_error(&sdev->pdev->dev, paddr)) {
>> +			netdev_err(dev, "mapping rx packet failed\n");
>> +			/* drop skb */
>> +			dev_kfree_skb_any(skb);
>> +			break;
>> +		}
>> +		/* ensure head buffer descriptors are 256 byte aligned */
>> +		offset = 0;
>> +		misalign = paddr & ALIGN_MASK;
>> +		if (misalign) {
>> +			offset = SLIC_RX_BUFF_ALIGN - misalign;
>> +			skb_reserve(skb, offset);
>> +		}
>> +		/* the HW expects dma chunks for descriptor + frame data */
>> +		desc = (struct slic_rx_desc *)skb->data;
>> +		memset(desc, 0, sizeof(*desc));
>> +
>> +		buff = &rxq->rxbuffs[rxq->put_idx];
>> +		buff->skb = skb;
>> +		dma_unmap_addr_set(buff, map_addr, paddr);
>> +		dma_unmap_len_set(buff, map_len, maplen);
>> +		buff->addr_offset = offset;
>> +		/* head buffer descriptors are placed immediately before skb */
>> +		slic_write(sdev, SLIC_REG_HBAR, lower_32_bits(paddr) +
>> +						offset);
> 
> This fits nicely on one line. :-)

Right, will fix.

> 
> [...]
> 
>> +static int slic_init_tx_queue(struct slic_device *sdev)
>> +{
>> +	struct slic_tx_queue *txq = &sdev->txq;
>> +	struct slic_tx_buffer *buff;
>> +	struct slic_tx_desc *desc;
>> +	int err;
>> +	int i;
> 
> You could make i unsigned...
> 

>> +
>> +	txq->len = SLIC_NUM_TX_DESCS;
>> +	txq->put_idx = 0;
>> +	txq->done_idx = 0;
>> +
>> +	txq->txbuffs = kcalloc(txq->len, sizeof(*buff), GFP_KERNEL);
>> +	if (!txq->txbuffs)
>> +		return -ENOMEM;
>> +
>> +	txq->dma_pool = dma_pool_create("slic_pool", &sdev->pdev->dev,
>> +					sizeof(*desc), SLIC_TX_DESC_ALIGN,
>> +					4096);
>> +	if (!txq->dma_pool) {
>> +		err = -ENOMEM;
>> +		netdev_err(sdev->netdev, "failed to create dma pool\n");
>> +		goto free_buffs;
>> +	}
>> +
>> +	for (i = 0; i < txq->len; i++) {
> 
> ...to fix a signed/unsigned comparison warning here, but...
> 
>> +		buff = &txq->txbuffs[i];
>> +		desc = dma_pool_zalloc(txq->dma_pool, GFP_KERNEL,
>> +				       &buff->desc_paddr);
>> +		if (!desc) {
>> +			netdev_err(sdev->netdev,
>> +				   "failed to alloc pool chunk (%i)\n", i);
>> +			err = -ENOMEM;
>> +			goto free_descs;
>> +		}
>> +
>> +		desc->hnd = cpu_to_le32((u32)(i + 1));
>> +		desc->cmd = SLIC_CMD_XMT_REQ;
>> +		desc->flags = 0;
>> +		desc->type = cpu_to_le32(SLIC_CMD_TYPE_DUMB);
>> +		buff->desc = desc;
>> +	}
>> +
>> +	return 0;
>> +
>> +free_descs:
>> +	while (i--) {
> 
> ...this would require reworking this logic to prevent an endless loop,
> so probably not worth bothering, considering that txq->len is well
> within the positive signed range.

AFAICS the logic does not have to be changed. The while loop will also work
fine if "i" is unsigned.
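
A small userspace sketch of why the unwind loop still terminates with an unsigned counter: the post-decrement tests the old value of i, so the loop exits once that value is 0 (i then wraps around, but is not used again). The function name and the visited[] array are hypothetical stand-ins for the free_descs path.

```c
/* Visits indices i-1 down to 0, records them, and returns how many were
 * visited. The caller must provide room for i entries in visited[].
 */
static unsigned int ex_unwind(unsigned int i, unsigned int *visited)
{
	unsigned int n = 0;

	while (i--)		/* tests the old value, then decrements */
		visited[n++] = i;

	return n;
}
```

The form that would loop forever with an unsigned counter is a condition like "i >= 0", which is always true; that form is not used here.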

> 
>> +		buff = &txq->txbuffs[i];
>> +		dma_pool_free(txq->dma_pool, buff->desc, buff->desc_paddr);
>> +	}
>> +	dma_pool_destroy(txq->dma_pool);
>> +
>> +free_buffs:
>> +	kfree(txq->txbuffs);
>> +
>> +	return err;
>> +}
>> +
>> +static void slic_free_tx_queue(struct slic_device *sdev)
>> +{
>> +	struct slic_tx_queue *txq = &sdev->txq;
>> +	struct slic_tx_buffer *buff;
>> +	int i;
> 
> Make i unsigned? One warning less, almost no work invested.
> 
>> +
>> +	for (i = 0; i < txq->len; i++) {
>> +		buff = &txq->txbuffs[i];
>> +		dma_pool_free(txq->dma_pool, buff->desc, buff->desc_paddr);
>> +		if (!buff->skb)
>> +			continue;
>> +
>> +		dma_unmap_single(&sdev->pdev->dev,
>> +				 dma_unmap_addr(buff, map_addr),
>> +				 dma_unmap_len(buff, map_len), DMA_TO_DEVICE);
>> +		consume_skb(buff->skb);
>> +	}
>> +	dma_pool_destroy(txq->dma_pool);
>> +
>> +	kfree(txq->txbuffs);
>> +}
>> +
> 
> [...]
> 
>> +static void slic_free_rx_queue(struct slic_device *sdev)
>> +{
>> +	struct slic_rx_queue *rxq = &sdev->rxq;
>> +	struct slic_rx_buffer *buff;
>> +	int i;
> 
> Unsigned?
> 
>> +
>> +	/* free rx buffers */
>> +	for (i = 0; i < rxq->len; i++) {
>> +		buff = &rxq->rxbuffs[i];
>> +
>> +		if (!buff->skb)
>> +			continue;
>> +
>> +		dma_unmap_single(&sdev->pdev->dev,
>> +				 dma_unmap_addr(buff, map_addr),
>> +				 dma_unmap_len(buff, map_len),
>> +				 DMA_FROM_DEVICE);
>> +		consume_skb(buff->skb);
>> +	}
>> +	kfree(rxq->rxbuffs);
>> +}
> 
> [...]
> 
>> +static int slic_load_firmware(struct slic_device *sdev)
>> +{
>> +	u32 sectstart[SLIC_FIRMWARE_MAX_SECTIONS];
>> +	u32 sectsize[SLIC_FIRMWARE_MAX_SECTIONS];
>> +	const struct firmware *fw;
>> +	unsigned int datalen;
>> +	const char *file;
>> +	int code_start;
>> +	u32 numsects;
>> +	int idx = 0;
>> +	u32 sect;
>> +	u32 instr;
>> +	u32 addr;
>> +	u32 base;
>> +	int err;
>> +	int i;
> 
> Make i unsigned?
> 
>> +
>> +	file = (sdev->model == SLIC_MODEL_OASIS) ?  SLIC_FIRMWARE_OASIS :
>> +						    SLIC_FIRMWARE_MOAVE;
>> +	err = request_firmware(&fw, file, &sdev->pdev->dev);
>> +	if (err) {
>> +		dev_err(&sdev->pdev->dev, "failed to load firmware %s\n", file);
>> +		return err;
>> +	}
>> +	/* Do an initial sanity check concerning firmware size now. A further
>> +	 * check follows below.
>> +	 */
>> +	if (fw->size < SLIC_FIRMWARE_MIN_SIZE) {
>> +		dev_err(&sdev->pdev->dev,
>> +			"invalid firmware size %zu (min is %u)\n", fw->size,
>> +			SLIC_FIRMWARE_MIN_SIZE);
>> +		err = -EINVAL;
>> +		goto release;
>> +	}
>> +
>> +	numsects = slic_read_dword_from_firmware(fw, &idx);
>> +	if (numsects == 0 || numsects > SLIC_FIRMWARE_MAX_SECTIONS) {
>> +		dev_err(&sdev->pdev->dev,
>> +			"invalid number of sections in firmware: %u", numsects);
>> +		err = -EINVAL;
>> +		goto release;
>> +	}
>> +
>> +	datalen = numsects * 8 + 4;
>> +	for (i = 0; i < numsects; i++) {
>> +		sectsize[i] = slic_read_dword_from_firmware(fw, &idx);
>> +		datalen += sectsize[i];
>> +	}
>> +
>> +	/* do another sanity check against firmware size */
>> +	if (datalen > fw->size) {
>> +		dev_err(&sdev->pdev->dev,
>> +			"invalid firmware size %zu (expected >= %u)\n",
>> +			fw->size, datalen);
>> +		err = -EINVAL;
>> +		goto release;
>> +	}
>> +	/* get sections */
>> +	for (i = 0; i < numsects; i++)
>> +		sectstart[i] = slic_read_dword_from_firmware(fw, &idx);
>> +
>> +	code_start = idx;
>> +	instr = slic_read_dword_from_firmware(fw, &idx);
>> +
>> +	for (sect = 0; sect < numsects; sect++) {
>> +		unsigned int ssize = sectsize[sect] >> 3;
>> +
>> +		base = sectstart[sect];
>> +
>> +		for (addr = 0; addr < ssize; addr++) {
>> +			/* write out instruction address */
>> +			slic_write(sdev, SLIC_REG_WCS, base + addr);
>> +			/* write out instruction to low addr */
>> +			slic_write(sdev, SLIC_REG_WCS, instr);
>> +			instr = slic_read_dword_from_firmware(fw, &idx);
>> +			/* write out instruction to high addr */
>> +			slic_write(sdev, SLIC_REG_WCS, instr);
>> +			instr = slic_read_dword_from_firmware(fw, &idx);
>> +		}
>> +	}
>> +
>> +	idx = code_start;
>> +
>> +	for (sect = 0; sect < numsects; sect++) {
>> +		unsigned int ssize = sectsize[sect] >> 3;
>> +
>> +		instr = slic_read_dword_from_firmware(fw, &idx);
>> +		base = sectstart[sect];
>> +		if (base < 0x8000)
>> +			continue;
>> +
>> +		for (addr = 0; addr < ssize; addr++) {
>> +			/* write out instruction address */
>> +			slic_write(sdev, SLIC_REG_WCS,
>> +				   SLIC_WCS_COMPARE | (base + addr));
>> +			/* write out instruction to low addr */
>> +			slic_write(sdev, SLIC_REG_WCS, instr);
>> +			instr = slic_read_dword_from_firmware(fw, &idx);
>> +			/* write out instruction to high addr */
>> +			slic_write(sdev, SLIC_REG_WCS, instr);
>> +			instr = slic_read_dword_from_firmware(fw, &idx);
>> +		}
>> +	}
>> +	slic_flush_write(sdev);
>> +	mdelay(10);
>> +	/* everything OK, kick off the card */
>> +	slic_write(sdev, SLIC_REG_WCS, SLIC_WCS_START);
>> +	slic_flush_write(sdev);
>> +	/* wait long enough for ucode to init card and reach the mainloop */
>> +	mdelay(20);
>> +release:
>> +	release_firmware(fw);
>> +
>> +	return err;
>> +}
> 
> [...]
> 
>> +static int slic_init_iface(struct slic_device *sdev)
>> +{
>> +	struct slic_shmem *sm = &sdev->shmem;
>> +	int err;
>> +
>> +	sdev->upr_list.pending = false;
>> +
>> +	err = slic_init_shmem(sdev);
>> +	if (err) {
>> +		netdev_err(sdev->netdev, "failed to load firmware\n");
> 
> Wrong error message.

Yep, will fix.

> 
>> +		return err;
>> +	}
> 
> [...]
> 
>> +static netdev_tx_t slic_xmit(struct sk_buff *skb, struct net_device *dev)
>> +{
>> +	struct slic_device *sdev = netdev_priv(dev);
>> +	struct slic_tx_queue *txq = &sdev->txq;
>> +	struct slic_tx_buffer *buff;
>> +	struct slic_tx_desc *desc;
>> +	dma_addr_t paddr;
>> +	u32 cbar_val;
>> +	u32 maplen;
>> +
>> +	if (unlikely(slic_get_free_tx_descs(txq) < SLIC_MAX_REQ_TX_DESCS)) {
>> +		netdev_err(dev, "BUG! not enought tx LEs left: %u\n",
> 
> "Enough"?

Will fix.

>> +			   slic_get_free_tx_descs(txq));
>> +		return NETDEV_TX_BUSY;
>> +	}
> 
> [...]
> 
>> +static int slic_read_eeprom(struct slic_device *sdev)
>> +{
>> +	unsigned int devfn = PCI_FUNC(sdev->pdev->devfn);
>> +	struct slic_shmem *sm = &sdev->shmem;
>> +	struct slic_shmem_data *sm_data = sm->shmem_data;
>> +	const unsigned int MAX_LOOPS = 5000;
> 
> Another benign -Wsign-compare warning can be fixed by either dropping
> the unsigned here or making i below unsigned, too.
> 
>> +	unsigned int codesize;
>> +	unsigned char *eeprom;
>> +	struct slic_upr *upr;
>> +	dma_addr_t paddr;
>> +	int err = 0;
>> +	u8 *mac[2];
>> +	int i = 0;
>> +
>> +	eeprom = dma_zalloc_coherent(&sdev->pdev->dev, SLIC_EEPROM_SIZE,
>> +				     &paddr, GFP_KERNEL);
>> +	if (!eeprom)
>> +		return -ENOMEM;
>> +
>> +	slic_write(sdev, SLIC_REG_ICR, SLIC_ICR_INT_OFF);
>> +	/* setup ISP temporarily */
>> +	slic_write(sdev, SLIC_REG_ISP, lower_32_bits(sm->isr_paddr));
>> +
>> +	err = slic_new_upr(sdev, SLIC_UPR_CONFIG, paddr);
>> +	if (!err) {
>> +		for (i = 0; i < MAX_LOOPS; i++) {
>> +			if (le32_to_cpu(sm_data->isr) & SLIC_ISR_UPC)
>> +				break;
>> +			mdelay(1);
>> +		}
>> +		if (i == MAX_LOOPS) {
>> +			dev_err(&sdev->pdev->dev,
>> +				"timed out while waiting for eeprom data\n");
>> +			err = -ETIMEDOUT;
>> +		}
>> +		upr = slic_dequeue_upr(sdev);
>> +		kfree(upr);
>> +	}
>> +
>> +	slic_write(sdev, SLIC_REG_ISP, 0);
>> +	slic_write(sdev, SLIC_REG_ISR, 0);
>> +	slic_flush_write(sdev);
>> +
>> +	if (err)
>> +		goto free_eeprom;
>> +
>> +	if (sdev->model == SLIC_MODEL_OASIS) {
>> +		struct slic_oasis_eeprom *oee;
>> +
>> +		oee = (struct slic_oasis_eeprom *)eeprom;
>> +		mac[0] = oee->mac;
>> +		mac[1] = oee->mac2;
>> +		codesize = le16_to_cpu(oee->eeprom_code_size);
>> +	} else {
>> +		struct slic_mojave_eeprom *mee;
>> +
>> +		mee = (struct slic_mojave_eeprom *)eeprom;
>> +		mac[0] = mee->mac;
>> +		mac[1] = mee->mac2;
>> +		codesize = le16_to_cpu(mee->eeprom_code_size);
>> +	}
>> +
>> +	if (!slic_eeprom_valid(eeprom, codesize)) {
>> +		dev_err(&sdev->pdev->dev, "invalid checksum in eeprom\n");
>> +		err = -EINVAL;
>> +		goto free_eeprom;
>> +	}
>> +	/* set mac address */
>> +	ether_addr_copy(sdev->netdev->dev_addr, mac[devfn]);
>> +free_eeprom:
>> +	dma_free_coherent(&sdev->pdev->dev, SLIC_EEPROM_SIZE, eeprom, paddr);
>> +
>> +	return err;
>> +}
> 
> [...]
> 
>> +static int slic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>> +{
> 
> [...]
> 
>> +	err = register_netdev(dev);
>> +	if (err) {
>> +		dev_err(&pdev->dev, "failed to register net device: %i\n",
>> +			err);
> 
> Could be on one line.

Right, will adjust it.


> Regards,
> Markus
> 

Thanks Markus!

Regards,
Lino

* Re: [PATCH] mlx4: give precise rx/tx bytes/packets counters
From: David Miller @ 2016-11-28 20:40 UTC (permalink / raw)
  To: eric.dumazet; +Cc: saeedm, netdev, tariqt
In-Reply-To: <1480212960.18162.23.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 26 Nov 2016 18:16:00 -0800

> On Sun, 2016-11-27 at 00:47 +0200, Saeed Mahameed wrote:
>> On Fri, Nov 25, 2016 at 5:46 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> As you see here in SRIOV mode (PF only) reads   sw stats from FW.
>> Tariq, I think we need to fix this.
> 
> Sure, my patch does not change this at all.
> 
> If mlx4_is_master() is false, then we aggregate the software stats and
> only the software stats.
> 
> My patch makes this aggregation possible at the time ethtool or
> ndo_get_stat64() are called, since this absolutely does not depend on the
> 250 ms timer fetching hardware stats.

Saeed please provide counter arguments or ACK this patch, thank you.

* Re: [PATCH net-next v3 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if
From: Alexei Starovoitov @ 2016-11-28 20:37 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf
In-Reply-To: <1480348130-31354-4-git-send-email-dsa@cumulusnetworks.com>

On Mon, Nov 28, 2016 at 07:48:50AM -0800, David Ahern wrote:
> Add a simple program to demonstrate the ability to attach a bpf program
> to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they
> are created.
> 
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
...
> +static int prog_load(int idx)
> +{
> +	struct bpf_insn prog[] = {
> +		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
> +		BPF_MOV64_IMM(BPF_REG_3, idx),
> +		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)),
> +		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)),
> +		BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
> +		BPF_EXIT_INSN(),
> +	};
> +
> +	return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
> +			     "GPL", 0);
> +}

the program looks trivial enough :)

Could you integrate it into iproute2 as well ?
Then the whole vrf management will be easier.
The user wouldn't even need to be aware that iproute2 sets up
this program. It will know ifindex and can delete
the prog when vrf configs change and so on.

Also please convert this sample into automated test like samples/bpf/*.sh
we're going to move all of them to tools/testing/selftests/ eventually.

* Re: [[PATCH net-next RFC] 1/4] net: dsa: mv88e6xxx: Implement mv88e6390 tag remap
From: Vivien Didelot @ 2016-11-28 20:35 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Andrew Lunn
In-Reply-To: <1479944598-20372-2-git-send-email-andrew@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

>  #define PORT_TAG_REGMAP_0123	0x18
>  #define PORT_TAG_REGMAP_4567	0x19
> +#define PORT_PRIO_MAP_TABLE	0x18    /* 6390 */
> +#define PORT_PRIO_MAP_TABLE_UPDATE		BIT(15)
> +#define PORT_PRIO_MAP_TABLE_INGRESS_PCP		(0x0 << 12)
> +#define PORT_PRIO_MAP_TABLE_EGRESS_GREEN_PCP	(0x1 << 12)
> +#define PORT_PRIO_MAP_TABLE_EGRESS_YELLOW_PCP	(0x2 << 12)
> +#define PORT_PRIO_MAP_TABLE_EGRESS_AVB_PCP	(0x3 << 12)
> +#define PORT_PRIO_MAP_TABLE_EGRESS_GREEN_DSCP	(0x5 << 12)
> +#define PORT_PRIO_MAP_TABLE_EGRESS_YELLOW_DSCP	(0x6 << 12)
> +#define PORT_PRIO_MAP_TABLE_EGRESS_AVB_DSCP	(0x7 << 12)

0x17 is the "IP Priority Mapping Table" register, so I'd define 0x18 as
PORT_IEEE_PRIO_MAP_TABLE to avoid later confusion.

>  
>  #define GLOBAL_STATUS		0x00
>  #define GLOBAL_STATUS_PPU_STATE BIT(15) /* 6351 and 6171 */
> @@ -813,6 +822,7 @@ struct mv88e6xxx_ops {
>  	void (*stats_get_strings)(struct mv88e6xxx_chip *chip,  uint8_t *data);
>  	void (*stats_get_stats)(struct mv88e6xxx_chip *chip,  int port,
>  				uint64_t *data);
> +	int (*tag_remap)(struct mv88e6xxx_chip *chip, int port);

I would've preferred an op like .tag_remap(*chip, port, prio, new) and a
wrapper in chip.c which loops over priority 0-7, but that would make the
implementation unnecessarily complex, so let's keep it as is for now ;)

>  };
>  
>  #define STATS_TYPE_PORT		BIT(0)
> diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
> index af4772d86086..b7fab70f6cd7 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.c
> +++ b/drivers/net/dsa/mv88e6xxx/port.c
> @@ -496,3 +496,60 @@ int mv88e6xxx_port_set_8021q_mode(struct mv88e6xxx_chip *chip, int port,
>  
>  	return 0;
>  }

Please add an ordered comment:

/* Offset 0x18: Port IEEE Priority Remapping Registers [0-3]
 * Offset 0x19: Port IEEE Priority Remapping Registers [4-7]
 */

> +
> +int mv88e6095_tag_remap(struct mv88e6xxx_chip *chip, int port)
> +{
> +	int err;
> +
> +	/* Tag Remap: use an identity 802.1p prio -> switch prio
> +	 * mapping.
> +	 */
> +	err = mv88e6xxx_port_write(chip, port, PORT_TAG_REGMAP_0123, 0x3210);
> +	if (err)
> +		return err;
> +
> +	/* Tag Remap 2: use an identity 802.1p prio -> switch
> +	 * prio mapping.
> +	 */

A single comment like this before the 2 writes will be enough:

    /* Use a direct priority mapping for all IEEE tagged frames */

> +	return mv88e6xxx_port_write(chip, port, PORT_TAG_REGMAP_4567, 0x7654);
> +}

Functions of port.c implementing Port Registers features must be
prefixed mv88e6xxx_port_* (6xxx can be a model in case of conflict).
mv88e6xxx_port_tag_remap() seems fine.

> +
> +int mv88e6390_tag_remap(struct mv88e6xxx_chip *chip, int port)
> +{
> +	int err, reg, i;
> +
> +	for (i = 0; i <= 7; i++) {
> +		reg = i | (i << 4) |

The pointer offset is 9, not 4.

> +			PORT_PRIO_MAP_TABLE_INGRESS_PCP |
> +			PORT_PRIO_MAP_TABLE_UPDATE;
> +		err = mv88e6xxx_port_write(chip, port, PORT_PRIO_MAP_TABLE,
> +					   reg);
> +		if (err)
> +			return err;
> +
> +		reg = i | PORT_PRIO_MAP_TABLE_EGRESS_GREEN_PCP |
> +			PORT_PRIO_MAP_TABLE_UPDATE;
> +		err = mv88e6xxx_port_write(chip, port, PORT_PRIO_MAP_TABLE,
> +					   reg);
> +		if (err)
> +			return err;
> +
> +		reg = i |
> +			PORT_PRIO_MAP_TABLE_EGRESS_YELLOW_PCP |
> +			PORT_PRIO_MAP_TABLE_UPDATE;
> +		err = mv88e6xxx_port_write(chip, port, PORT_PRIO_MAP_TABLE,
> +					   reg);
> +		if (err)
> +			return err;
> +
> +		reg = i |
> +			PORT_PRIO_MAP_TABLE_EGRESS_AVB_PCP |
> +			PORT_PRIO_MAP_TABLE_UPDATE;
> +		err = mv88e6xxx_port_write(chip, port, PORT_PRIO_MAP_TABLE,
> +					   reg);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}

Please add a static helper first to write the table, e.g.

    /* Offset 0x18: Port IEEE Priority Mapping Table (88E6190) */

    static int mv88e6xxx_port_ieeepmt_write(struct mv88e6xxx_chip *chip,
                                            int port, u8 table,
                                            u8 pointer, u16 data)

And then provide

    int mv88e6xxx_port_ieeepmt_tag_remap(...)
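
As a rough userspace sketch, the value such a helper writes could be composed like this. The bit positions follow what is stated in this thread (update bit 15, table select shifted by 12, pointer at offset 9); the 3-bit table/pointer masks, the 9-bit data mask and all EX_/ex_ names are assumptions, not taken from the datasheet.

```c
#include <stdint.h>

#define EX_PMT_UPDATE		(1u << 15)
#define EX_PMT_TABLE_SHIFT	12
#define EX_PMT_PTR_SHIFT	9
#define EX_PMT_DATA_MASK	0x1ffu	/* assumption: bits 8:0 carry the data */

/* Builds the 16-bit value for one table entry update. */
static uint16_t ex_ieeepmt_value(uint8_t table, uint8_t pointer, uint16_t data)
{
	return (uint16_t)(EX_PMT_UPDATE |
			  ((table & 0x7u) << EX_PMT_TABLE_SHIFT) |
			  ((pointer & 0x7u) << EX_PMT_PTR_SHIFT) |
			  (data & EX_PMT_DATA_MASK));
}
```

An identity mapping for priority 3 in the ingress PCP table (table 0) would then be ex_ieeepmt_value(0, 3, 3), i.e. 0x8603.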


Thanks,

        Vivien

* Re: [PATCH net 0/2] mlx4 bug fixes for 4.9
From: David Miller @ 2016-11-28 20:34 UTC (permalink / raw)
  To: tariqt; +Cc: netdev, eranbe, sebott, swise
In-Reply-To: <1480267252-26146-1-git-send-email-tariqt@mellanox.com>

From: Tariq Toukan <tariqt@mellanox.com>
Date: Sun, 27 Nov 2016 19:20:50 +0200

> This patchset includes 2 bug fixes:
> * In patch 1 we revert the commit that avoids invoking unregister_netdev
> in shutdown flow, as it introduces netdev presence issues where
> it can be accessed unsafely by ndo operations during the flow.
> * Patch 2 is a simple fix for a variable uninitialization issue.
> 
> Series generated against net commit:
> 6998cc6ec237 tipc: resolve connection flow control compatibility problem

Series applied, thanks.
