Netdev List


* [PATCH v3 net-next 2/2] dt-bindings: net: add support for Microchip KSZ9131
From: Yuiko Oshino @ 2018-10-18 19:06 UTC (permalink / raw)
  To: davem, robh+dt, devicetree, f.fainelli, andrew
  Cc: linux-kernel, mark.rutland, m.felsch, Markus.Niebel, netdev,
	UNGLinuxDriver
In-Reply-To: <1539889562-21458-1-git-send-email-yuiko.oshino@microchip.com>

Add support for Microchip Technology KSZ9131 10/100/1000 Ethernet PHY

Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
---
 .../devicetree/bindings/net/micrel-ksz90x1.txt     | 28 +++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt b/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt
index e22d8cf..5100358 100644
--- a/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt
+++ b/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt
@@ -1,4 +1,4 @@
-Micrel KSZ9021/KSZ9031 Gigabit Ethernet PHY
+Micrel KSZ9021/KSZ9031/KSZ9131 Gigabit Ethernet PHY
 
 Some boards require special tuning values, particularly when it comes
 to clock delays. You can specify clock delay values in the PHY OF
@@ -64,6 +64,32 @@ KSZ9031:
         Attention: The link partner must be configurable as slave otherwise
         no link will be established.
 
+KSZ9131:
+
+  All skew control options are specified in picoseconds. The increment
+  step is 100ps. Unlike KSZ9031, the values represent picosecond delays.
+  A negative value can be assigned as rxc-skew-psec = <(-100)>;.
+
+  Optional properties:
+
+    Range of the value -700 to 2400, default value 0:
+
+      - rxc-skew-psec : Skew control of RX clock pad
+      - txc-skew-psec : Skew control of TX clock pad
+
+    Range of the value -700 to 800, default value 0:
+
+      - rxdv-skew-psec : Skew control of RX CTL pad
+      - txen-skew-psec : Skew control of TX CTL pad
+      - rxd0-skew-psec : Skew control of RX data 0 pad
+      - rxd1-skew-psec : Skew control of RX data 1 pad
+      - rxd2-skew-psec : Skew control of RX data 2 pad
+      - rxd3-skew-psec : Skew control of RX data 3 pad
+      - txd0-skew-psec : Skew control of TX data 0 pad
+      - txd1-skew-psec : Skew control of TX data 1 pad
+      - txd2-skew-psec : Skew control of TX data 2 pad
+      - txd3-skew-psec : Skew control of TX data 3 pad
+
 Examples:
 
 	mdio {
-- 
2.7.4

^ permalink raw reply related
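
The binding above gives KSZ9131 skew values directly in picoseconds, in 100 ps steps, starting at -700 ps. A minimal sketch of how such a value could map onto an unsigned register field, assuming the field stores the step index with -700 ps encoded as zero (this mirrors the offset arithmetic in the driver patch of the same series). The helper name and macro names are hypothetical and are not part of the patch:

```c
#include <assert.h>

#define SKEW_OFFSET_PS   700   /* most negative skew is -700 ps */
#define SKEW_STEP_PS     100   /* one register step is 100 ps */
#define SKEW_5BIT_MAX_PS 2400  /* clock pads use a 5-bit field */
#define SKEW_4BIT_MAX_PS 800   /* data and control pads use a 4-bit field */

/* Hypothetical helper: clamp a picosecond skew and return its field value. */
static unsigned int skew_ps_to_field(int psec, unsigned int field_bits)
{
	int max_ps;

	assert(field_bits == 4 || field_bits == 5);
	max_ps = (field_bits == 5) ? SKEW_5BIT_MAX_PS : SKEW_4BIT_MAX_PS;

	/* Out-of-range values are clamped, as the driver patch does. */
	if (psec < -SKEW_OFFSET_PS)
		psec = -SKEW_OFFSET_PS;
	else if (psec > max_ps)
		psec = max_ps;

	return (unsigned int)((psec + SKEW_OFFSET_PS) / SKEW_STEP_PS);
}
```

Under this mapping the default of 0 ps corresponds to field value 7, and the documented extremes fill the 4-bit (0..15) and 5-bit (0..31) ranges exactly.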

* [PATCH v3 net-next 1/2] net: phy: micrel: add Microchip KSZ9131 initial driver
From: Yuiko Oshino @ 2018-10-18 19:06 UTC (permalink / raw)
  To: davem, robh+dt, devicetree, f.fainelli, andrew
  Cc: linux-kernel, mark.rutland, m.felsch, Markus.Niebel, netdev,
	UNGLinuxDriver
In-Reply-To: <1539889562-21458-1-git-send-email-yuiko.oshino@microchip.com>

Add support for Microchip Technology KSZ9131 10/100/1000 Ethernet PHY

Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
---
 drivers/net/phy/micrel.c   | 130 ++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/micrel_phy.h |   1 +
 2 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 3db06b4..270ea16 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -14,7 +14,7 @@
  * option) any later version.
  *
  * Support : Micrel Phys:
- *		Giga phys: ksz9021, ksz9031
+ *		Giga phys: ksz9021, ksz9031, ksz9131
  *		100/10 Phys : ksz8001, ksz8721, ksz8737, ksz8041
  *			   ksz8021, ksz8031, ksz8051,
  *			   ksz8081, ksz8091,
@@ -609,6 +609,116 @@ static int ksz9031_config_init(struct phy_device *phydev)
 	return result;
 }
 
+#define KSZ9131_SKEW_5BIT_MAX	2400
+#define KSZ9131_SKEW_4BIT_MAX	800
+#define KSZ9131_OFFSET		700
+#define KSZ9131_STEP		100
+
+static int ksz9131_of_load_skew_values(struct phy_device *phydev,
+				       struct device_node *of_node,
+				       u16 reg, size_t field_sz,
+				       char *field[], u8 numfields)
+{
+	int val[4] = {-(1 + KSZ9131_OFFSET), -(2 + KSZ9131_OFFSET),
+		      -(3 + KSZ9131_OFFSET), -(4 + KSZ9131_OFFSET)};
+	int skewval, skewmax = 0;
+	int matches = 0;
+	u16 maxval;
+	u16 newval;
+	u16 mask;
+	int i;
+
+	/* psec properties in dts should mean x pico seconds */
+	if (field_sz == 5)
+		skewmax = KSZ9131_SKEW_5BIT_MAX;
+	else
+		skewmax = KSZ9131_SKEW_4BIT_MAX;
+
+	for (i = 0; i < numfields; i++)
+		if (!of_property_read_s32(of_node, field[i], &skewval)) {
+			if (skewval < -KSZ9131_OFFSET)
+				skewval = -KSZ9131_OFFSET;
+			else if (skewval > skewmax)
+				skewval = skewmax;
+
+			val[i] = skewval + KSZ9131_OFFSET;
+			matches++;
+		}
+
+	if (!matches)
+		return 0;
+
+	if (matches < numfields)
+		newval = ksz9031_extended_read(phydev, OP_DATA, 2, reg);
+	else
+		newval = 0;
+
+	maxval = (field_sz == 4) ? 0xf : 0x1f;
+	for (i = 0; i < numfields; i++)
+		if (val[i] != -(i + 1 + KSZ9131_OFFSET)) {
+			mask = 0xffff;
+			mask ^= maxval << (field_sz * i);
+			newval = (newval & mask) |
+				(((val[i] / KSZ9131_STEP) & maxval)
+					<< (field_sz * i));
+		}
+
+	return ksz9031_extended_write(phydev, OP_DATA, 2, reg, newval);
+}
+
+static int ksz9131_config_init(struct phy_device *phydev)
+{
+	const struct device *dev = &phydev->mdio.dev;
+	struct device_node *of_node = dev->of_node;
+	char *clk_skews[2] = {"rxc-skew-psec", "txc-skew-psec"};
+	char *rx_data_skews[4] = {
+		"rxd0-skew-psec", "rxd1-skew-psec",
+		"rxd2-skew-psec", "rxd3-skew-psec"
+	};
+	char *tx_data_skews[4] = {
+		"txd0-skew-psec", "txd1-skew-psec",
+		"txd2-skew-psec", "txd3-skew-psec"
+	};
+	char *control_skews[2] = {"txen-skew-psec", "rxdv-skew-psec"};
+	const struct device *dev_walker;
+	int ret;
+
+	dev_walker = &phydev->mdio.dev;
+	do {
+		of_node = dev_walker->of_node;
+		dev_walker = dev_walker->parent;
+	} while (!of_node && dev_walker);
+
+	if (!of_node)
+		return 0;
+
+	ret = ksz9131_of_load_skew_values(phydev, of_node,
+					  MII_KSZ9031RN_CLK_PAD_SKEW, 5,
+					  clk_skews, 2);
+	if (ret < 0)
+		return ret;
+
+	ret = ksz9131_of_load_skew_values(phydev, of_node,
+					  MII_KSZ9031RN_CONTROL_PAD_SKEW, 4,
+					  control_skews, 2);
+	if (ret < 0)
+		return ret;
+
+	ret = ksz9131_of_load_skew_values(phydev, of_node,
+					  MII_KSZ9031RN_RX_DATA_PAD_SKEW, 4,
+					  rx_data_skews, 4);
+	if (ret < 0)
+		return ret;
+
+	ret = ksz9131_of_load_skew_values(phydev, of_node,
+					  MII_KSZ9031RN_TX_DATA_PAD_SKEW, 4,
+					  tx_data_skews, 4);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
 #define KSZ8873MLL_GLOBAL_CONTROL_4	0x06
 #define KSZ8873MLL_GLOBAL_CONTROL_4_DUPLEX	BIT(6)
 #define KSZ8873MLL_GLOBAL_CONTROL_4_SPEED	BIT(4)
@@ -975,6 +1085,23 @@ static struct phy_driver ksphy_driver[] = {
 	.suspend	= genphy_suspend,
 	.resume		= kszphy_resume,
 }, {
+	.phy_id		= PHY_ID_KSZ9131,
+	.phy_id_mask	= MICREL_PHY_ID_MASK,
+	.name		= "Microchip KSZ9131 Gigabit PHY",
+	.features	= PHY_GBIT_FEATURES,
+	.flags		= PHY_HAS_INTERRUPT,
+	.driver_data	= &ksz9021_type,
+	.probe		= kszphy_probe,
+	.config_init	= ksz9131_config_init,
+	.read_status	= ksz9031_read_status,
+	.ack_interrupt	= kszphy_ack_interrupt,
+	.config_intr	= kszphy_config_intr,
+	.get_sset_count = kszphy_get_sset_count,
+	.get_strings	= kszphy_get_strings,
+	.get_stats	= kszphy_get_stats,
+	.suspend	= genphy_suspend,
+	.resume		= kszphy_resume,
+}, {
 	.phy_id		= PHY_ID_KSZ8873MLL,
 	.phy_id_mask	= MICREL_PHY_ID_MASK,
 	.name		= "Micrel KSZ8873MLL Switch",
@@ -1022,6 +1149,7 @@ MODULE_LICENSE("GPL");
 static struct mdio_device_id __maybe_unused micrel_tbl[] = {
 	{ PHY_ID_KSZ9021, 0x000ffffe },
 	{ PHY_ID_KSZ9031, MICREL_PHY_ID_MASK },
+	{ PHY_ID_KSZ9131, MICREL_PHY_ID_MASK },
 	{ PHY_ID_KSZ8001, 0x00fffffc },
 	{ PHY_ID_KS8737, MICREL_PHY_ID_MASK },
 	{ PHY_ID_KSZ8021, 0x00ffffff },
diff --git a/include/linux/micrel_phy.h b/include/linux/micrel_phy.h
index 472fa4d..7361cd3 100644
--- a/include/linux/micrel_phy.h
+++ b/include/linux/micrel_phy.h
@@ -31,6 +31,7 @@
 #define PHY_ID_KSZ8081		0x00221560
 #define PHY_ID_KSZ8061		0x00221570
 #define PHY_ID_KSZ9031		0x00221620
+#define PHY_ID_KSZ9131		0x00221640
 
 #define PHY_ID_KSZ886X		0x00221430
 #define PHY_ID_KSZ8863		0x00221435
-- 
2.7.4

^ permalink raw reply related
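
The driver patch above rewrites only those skew fields for which a device tree property was found and keeps the remaining bits of the 16-bit register as they were read. A standalone sketch of that read-modify-write step on a register image, assuming fields are packed from bit 0 upward with a fixed width, as the mask computation in the patch suggests. The function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Replace field number idx (field_bits wide) in a 16-bit register image. */
static uint16_t pad_skew_set_field(uint16_t reg, unsigned int idx,
				   unsigned int field_bits, unsigned int val)
{
	unsigned int shift = field_bits * idx;
	uint16_t maxval = (uint16_t)((1u << field_bits) - 1);
	uint16_t mask;

	/* The field must fit entirely inside the 16-bit register. */
	assert(shift + field_bits <= 16);
	mask = (uint16_t)(maxval << shift);

	/* Clear the old field, then insert the new value truncated to width. */
	return (uint16_t)((reg & ~mask) | ((val & maxval) << shift));
}
```

For example, writing 0x3 into the second 4-bit field of 0xffff leaves 0xff3f, so neighbouring fields are untouched.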

* pull request (net-next): ipsec-next 2018-10-18
From: Steffen Klassert @ 2018-10-18 10:56 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

1) Remove an unnecessary dev->tstats check in xfrmi_get_stats64.
   From Li RongQing.

2) We currently do a sizeof(element) instead of a sizeof(array)
   check when initializing the ovec array of the secpath.
   Currently this array can have only one element, so code is
   OK but error-prone. Change this to do a sizeof(array)
   check so that we can add more elements in future.
   From Li RongQing.

3) Improve xfrm IPv6 address hashing by using the complete IPv6
   addresses for a hash. From Michal Kubecek.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit abf1a08ff3237a27188ff8cc2904f2cea893af55:

  net: vhost: remove bad code line (2018-10-07 21:31:32 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master

for you to fetch changes up to 8d4b6bce2559755cf2db6513a267fccdfbf7c3ab:

  xfrm: use complete IPv6 addresses for hash (2018-10-15 10:09:18 +0200)

----------------------------------------------------------------
Li RongQing (2):
      xfrm: remove unnecessary check in xfrmi_get_stats64
      xfrm: use correct size to initialise sp->ovec

Michal Kubecek (1):
      xfrm: use complete IPv6 addresses for hash

 net/xfrm/xfrm_hash.h      | 5 ++---
 net/xfrm/xfrm_input.c     | 2 +-
 net/xfrm/xfrm_interface.c | 3 ---
 3 files changed, 3 insertions(+), 7 deletions(-)

^ permalink raw reply

* [PATCH 1/3] xfrm: remove unnecessary check in xfrmi_get_stats64
From: Steffen Klassert @ 2018-10-18 10:56 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20181018105654.32149-1-steffen.klassert@secunet.com>

From: Li RongQing <lirongqing@baidu.com>

If the tstats of a device is not allocated, the device was not
registered correctly and cannot be used, so the check is unnecessary.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_interface.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
index dc5b20bf29cf..abafd49cc65d 100644
--- a/net/xfrm/xfrm_interface.c
+++ b/net/xfrm/xfrm_interface.c
@@ -561,9 +561,6 @@ static void xfrmi_get_stats64(struct net_device *dev,
 {
 	int cpu;
 
-	if (!dev->tstats)
-		return;
-
 	for_each_possible_cpu(cpu) {
 		struct pcpu_sw_netstats *stats;
 		struct pcpu_sw_netstats tmp;
-- 
2.17.1

^ permalink raw reply related

* [PATCH 3/3] xfrm: use complete IPv6 addresses for hash
From: Steffen Klassert @ 2018-10-18 10:56 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20181018105654.32149-1-steffen.klassert@secunet.com>

From: Michal Kubecek <mkubecek@suse.cz>

In some environments it is common that many hosts share the same lower half
of their IPv6 addresses (in particular ::1). As __xfrm6_addr_hash() and
__xfrm6_daddr_saddr_hash() calculate the hash only from the lower halves,
as many as 1/3 of the hosts end up in one hashtable chain, which harms
performance.

Use complete IPv6 addresses when calculating the hashes. Rather than just
adding two more words to the xor, use jhash2() for consistency with
__xfrm6_pref_hash() and __xfrm6_dpref_spref_hash().

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_hash.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_hash.h b/net/xfrm/xfrm_hash.h
index 61be810389d8..ce66323102f9 100644
--- a/net/xfrm/xfrm_hash.h
+++ b/net/xfrm/xfrm_hash.h
@@ -13,7 +13,7 @@ static inline unsigned int __xfrm4_addr_hash(const xfrm_address_t *addr)
 
 static inline unsigned int __xfrm6_addr_hash(const xfrm_address_t *addr)
 {
-	return ntohl(addr->a6[2] ^ addr->a6[3]);
+	return jhash2((__force u32 *)addr->a6, 4, 0);
 }
 
 static inline unsigned int __xfrm4_daddr_saddr_hash(const xfrm_address_t *daddr,
@@ -26,8 +26,7 @@ static inline unsigned int __xfrm4_daddr_saddr_hash(const xfrm_address_t *daddr,
 static inline unsigned int __xfrm6_daddr_saddr_hash(const xfrm_address_t *daddr,
 						    const xfrm_address_t *saddr)
 {
-	return ntohl(daddr->a6[2] ^ daddr->a6[3] ^
-		     saddr->a6[2] ^ saddr->a6[3]);
+	return __xfrm6_addr_hash(daddr) ^ __xfrm6_addr_hash(saddr);
 }
 
 static inline u32 __bits2mask32(__u8 bits)
-- 
2.17.1

^ permalink raw reply related
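
To illustrate the collision described in the commit message above, here is a small sketch comparing a hash over the lower two words with a hash over all four words. It assumes addresses are given as four 32-bit words and leaves out the byte-order conversion of the original code. The full-address mixer is a simple FNV-1a style stand-in chosen for brevity, not jhash2(), and both function names are hypothetical:

```c
#include <stdint.h>

/* Old scheme: only the two low 32-bit words contribute to the hash. */
static uint32_t low_half_hash(const uint32_t a6[4])
{
	return a6[2] ^ a6[3];
}

/* Stand-in mixer over all four words (NOT jhash2; FNV-1a style, word-wise). */
static uint32_t full_addr_hash(const uint32_t a6[4])
{
	uint32_t h = 2166136261u;
	int i;

	for (i = 0; i < 4; i++)
		h = (h ^ a6[i]) * 16777619u;
	return h;
}
```

Two hosts such as 2001:db8:0:1::1 and 2001:db8:0:2::1 share the lower half ::1, so the first function returns the same value for both, while any hash that mixes in the upper words can tell them apart.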

* [PATCH 2/3] xfrm: use correct size to initialise sp->ovec
From: Steffen Klassert @ 2018-10-18 10:56 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20181018105654.32149-1-steffen.klassert@secunet.com>

From: Li RongQing <lirongqing@baidu.com>

The intent here is to initialize the whole array, not a single element,
so the size should be sizeof(array) rather than sizeof(element).

The array currently has only one element (XFRM_MAX_OFFLOAD_DEPTH is 1),
so the existing code happens to be correct.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index be3520e429c9..684c0bc01e2c 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -131,7 +131,7 @@ struct sec_path *secpath_dup(struct sec_path *src)
 	sp->len = 0;
 	sp->olen = 0;
 
-	memset(sp->ovec, 0, sizeof(sp->ovec[XFRM_MAX_OFFLOAD_DEPTH]));
+	memset(sp->ovec, 0, sizeof(sp->ovec));
 
 	if (src) {
 		int i;
-- 
2.17.1

^ permalink raw reply related
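
The difference between the two sizeof expressions in the patch above can be shown in isolation. This is a sketch with a hypothetical struct and a depth of 4 (XFRM_MAX_OFFLOAD_DEPTH is 1 in the patch, which is why the old code did no harm); the element type is a placeholder:

```c
#include <stddef.h>

#define MAX_DEPTH 4	/* hypothetical depth, chosen to make the gap visible */

struct path {
	unsigned int ovec[MAX_DEPTH];	/* placeholder element type */
};

/* sizeof does not evaluate its operand, so ovec[MAX_DEPTH] is never read. */
static size_t bytes_cleared_old(void)
{
	return sizeof(((struct path *)0)->ovec[MAX_DEPTH]);	/* one element */
}

static size_t bytes_cleared_new(void)
{
	return sizeof(((struct path *)0)->ovec);		/* whole array */
}
```

With a depth greater than one, a memset() sized by the first expression would clear only the first element and leave the rest uninitialized.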

* Re: [PATCH] net: ethernet: fec: Add missing SPEED_
From: Florian Fainelli @ 2018-10-18 18:55 UTC (permalink / raw)
  To: LABBE Corentin; +Cc: andrew, davem, fugang.duan, linux-kernel, netdev
In-Reply-To: <20181018184715.GA31736@Red>

On 10/18/2018 11:47 AM, LABBE Corentin wrote:
> On Thu, Oct 18, 2018 at 11:39:24AM -0700, Florian Fainelli wrote:
>> On 10/18/2018 08:05 AM, Corentin Labbe wrote:
>>> Since commit 58056c1e1b0e ("net: ethernet: Use phy_set_max_speed() to limit advertised speed"), the fec driver is unable to get any link.
>>> This is due to missing SPEED_.
>>
>> But SPEED_1000 is defined in include/uapi/linux/ethtool.h as 1000, so
>> surely this would amount to the same code paths being taken or am I
>> missing something here?
> 
> The bisect session pointed to your patch; reverting it fixes the issue.
> BUT since the fix seemed trivial, I sent the patch without testing it beyond compiling.
> Sorry, I found out just a few minutes ago that it didn't fix the issue.
> 
> But your patch is still the cause for sure.
> 

What you are writing is really lowering the confidence level, first
Andrew is the author of that patch, and second "just compiling" and
pretending this fixes a problem when it does not is not quite what I
would expect.

I don't have a problem helping you find the solution or the right fix
though, even if it is not my patch, but please get the author and actual
problem right so we can move forward in confidence, thanks!
-- 
Florian

^ permalink raw reply

* Re: [PATCH] brcmfmac: fix spelling mistake "Retreiving" -> "Retrieving"
From: Arend van Spriel @ 2018-10-18 10:49 UTC (permalink / raw)
  To: Colin King, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, Kalle Valo, David S . Miller, Pieter-Paul Giesberts,
	linux-wireless, brcm80211-dev-list.pdl, brcm80211-dev-list,
	netdev
  Cc: kernel-janitors, linux-kernel
In-Reply-To: <20181016174342.1867-1-colin.king@canonical.com>

On 10/16/2018 7:43 PM, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> Trivial fix to spelling mistake in brcmf_err error message.
>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

^ permalink raw reply

* Re: [PATCH] net: ethernet: fec: Add missing SPEED_
From: Florian Fainelli @ 2018-10-18 18:39 UTC (permalink / raw)
  To: Corentin Labbe, andrew, davem, fugang.duan; +Cc: linux-kernel, netdev
In-Reply-To: <1539875100-11121-1-git-send-email-clabbe@baylibre.com>

On 10/18/2018 08:05 AM, Corentin Labbe wrote:
> Since commit 58056c1e1b0e ("net: ethernet: Use phy_set_max_speed() to limit advertised speed"), the fec driver is unable to get any link.
> This is due to missing SPEED_.

But SPEED_1000 is defined in include/uapi/linux/ethtool.h as 1000, so
surely this would amount to the same code paths being taken or am I
missing something here?

> 
> Fixes: 58056c1e1b0e ("net: ethernet: Use phy_set_max_speed() to limit advertised speed")
> Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index a17cc97..75fd7c9 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -1946,7 +1946,7 @@ static int fec_enet_mii_probe(struct net_device *ndev)
>  
>  	/* mask with MAC supported features */
>  	if (fep->quirks & FEC_QUIRK_HAS_GBIT) {
> -		phy_set_max_speed(phy_dev, 1000);
> +		phy_set_max_speed(phy_dev, SPEED_1000);
>  		phy_remove_link_mode(phy_dev,
>  				     ETHTOOL_LINK_MODE_1000baseT_Half_BIT);
>  #if !defined(CONFIG_M5272)
> @@ -1954,7 +1954,7 @@ static int fec_enet_mii_probe(struct net_device *ndev)
>  #endif
>  	}
>  	else
> -		phy_set_max_speed(phy_dev, 100);
> +		phy_set_max_speed(phy_dev, SPEED_100);
>  
>  	fep->link = 0;
>  	fep->full_duplex = 0;
> 


-- 
Florian

^ permalink raw reply
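
Florian's point in the reply above is that the SPEED_ macros are plain integers, so the proposed change passes the same arguments as before. A tiny sketch, with the macro values as defined in include/uapi/linux/ethtool.h and a hypothetical stand-in in place of phy_set_max_speed():

```c
/* Values as defined in include/uapi/linux/ethtool.h. */
#define SPEED_10	10
#define SPEED_100	100
#define SPEED_1000	1000

/* Hypothetical stand-in that just reports the cap it was given. */
static int max_speed_cap(int max_speed)
{
	return max_speed;
}
```

Since max_speed_cap(SPEED_1000) and max_speed_cap(1000) receive the identical value, the macro spelling alone cannot change which code path is taken.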

* [PATCH 4/4] xfrm: policy: use hlist rcu variants on insert
From: Steffen Klassert @ 2018-10-18 10:25 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20181018102521.24997-1-steffen.klassert@secunet.com>

From: Florian Westphal <fw@strlen.de>

bydst table/list lookups use rcu, so insertions must use rcu versions.

Fixes: a7c44247f704e ("xfrm: policy: make xfrm_policy_lookup_bytype lockless")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index f094d4b3520d..119a427d9b2b 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -632,9 +632,9 @@ static void xfrm_hash_rebuild(struct work_struct *work)
 				break;
 		}
 		if (newpos)
-			hlist_add_behind(&policy->bydst, newpos);
+			hlist_add_behind_rcu(&policy->bydst, newpos);
 		else
-			hlist_add_head(&policy->bydst, chain);
+			hlist_add_head_rcu(&policy->bydst, chain);
 	}
 
 	spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
@@ -774,9 +774,9 @@ int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl)
 			break;
 	}
 	if (newpos)
-		hlist_add_behind(&policy->bydst, newpos);
+		hlist_add_behind_rcu(&policy->bydst, newpos);
 	else
-		hlist_add_head(&policy->bydst, chain);
+		hlist_add_head_rcu(&policy->bydst, chain);
 	__xfrm_policy_link(policy, dir);
 
 	/* After previous checking, family can either be AF_INET or AF_INET6 */
-- 
2.17.1

^ permalink raw reply related
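
The reasoning behind the patch above is that lockless readers may walk the chain while a node is being inserted, so the new node has to be fully initialized before it becomes reachable. A userspace sketch of that publish pattern using C11 atomics, assuming writers are serialized by a lock held by the caller. The acquire loads stand in for the kernel's RCU dereference primitives, and all names here are hypothetical:

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	int key;
	_Atomic(struct node *) next;
};

struct chain {
	_Atomic(struct node *) first;
};

/* Writer side; callers are assumed to hold the lock that serializes writers. */
static void chain_add_head_publish(struct chain *c, struct node *n)
{
	struct node *old = atomic_load_explicit(&c->first, memory_order_relaxed);

	/* Initialize the new node fully before it becomes reachable. */
	atomic_store_explicit(&n->next, old, memory_order_relaxed);
	/* Release store: readers that see n also see its initialized fields. */
	atomic_store_explicit(&c->first, n, memory_order_release);
}

/* Lockless reader: returns the node with the given key, or NULL. */
static struct node *chain_lookup(struct chain *c, int key)
{
	struct node *n = atomic_load_explicit(&c->first, memory_order_acquire);

	while (n && n->key != key)
		n = atomic_load_explicit(&n->next, memory_order_acquire);
	return n;
}
```

A plain store to the head pointer gives no such ordering guarantee, which is roughly the gap the _rcu list variants close in the kernel.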

* [PATCH 3/4] net/xfrm: fix out-of-bounds packet access
From: Steffen Klassert @ 2018-10-18 10:25 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20181018102521.24997-1-steffen.klassert@secunet.com>

From: Alexei Starovoitov <ast@kernel.org>

BUG: KASAN: slab-out-of-bounds in _decode_session6+0x1331/0x14e0
net/ipv6/xfrm6_policy.c:161
Read of size 1 at addr ffff8801d882eec7 by task syz-executor1/6667
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
  __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
  _decode_session6+0x1331/0x14e0 net/ipv6/xfrm6_policy.c:161
  __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2299
  xfrm_decode_session include/net/xfrm.h:1232 [inline]
  vti6_tnl_xmit+0x3c3/0x1bc1 net/ipv6/ip6_vti.c:542
  __netdev_start_xmit include/linux/netdevice.h:4313 [inline]
  netdev_start_xmit include/linux/netdevice.h:4322 [inline]
  xmit_one net/core/dev.c:3217 [inline]
  dev_hard_start_xmit+0x272/0xc10 net/core/dev.c:3233
  __dev_queue_xmit+0x2ab2/0x3870 net/core/dev.c:3803
  dev_queue_xmit+0x17/0x20 net/core/dev.c:3836

Reported-by: syzbot+acffccec848dc13fe459@syzkaller.appspotmail.com
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/xfrm6_policy.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index ef3defaf43b9..d35bcf92969c 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -146,8 +146,8 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse)
 	fl6->daddr = reverse ? hdr->saddr : hdr->daddr;
 	fl6->saddr = reverse ? hdr->daddr : hdr->saddr;
 
-	while (nh + offset + 1 < skb->data ||
-	       pskb_may_pull(skb, nh + offset + 1 - skb->data)) {
+	while (nh + offset + sizeof(*exthdr) < skb->data ||
+	       pskb_may_pull(skb, nh + offset + sizeof(*exthdr) - skb->data)) {
 		nh = skb_network_header(skb);
 		exthdr = (struct ipv6_opt_hdr *)(nh + offset);
 
-- 
2.17.1

^ permalink raw reply related
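
The fix above changes the length check from one byte to the size of the extension header that is subsequently read. A minimal sketch of such a bounds check on a flat buffer, assuming a two-byte header with the same leading fields as the kernel's struct ipv6_opt_hdr. The struct and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Same two leading bytes as the kernel's struct ipv6_opt_hdr. */
struct opt_hdr {
	uint8_t nexthdr;
	uint8_t hdrlen;
};

/* True if [offset, offset + need) lies inside a buffer of len bytes. */
static int bytes_available(size_t len, size_t offset, size_t need)
{
	/* Written this way so that offset + need cannot overflow. */
	return offset <= len && need <= len - offset;
}
```

With 41 bytes of data and a header starting at offset 40, a one-byte check succeeds although reading the second header byte would run past the end; checking for sizeof(struct opt_hdr) rejects that case.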

* [PATCH 2/4] MAINTAINERS: Remove net/core/flow.c
From: Steffen Klassert @ 2018-10-18 10:25 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20181018102521.24997-1-steffen.klassert@secunet.com>

net/core/flow.c does not exist anymore, so remove it
from the IPSEC NETWORKING section of the MAINTAINERS
file.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index dcb0191c4f54..4ff21dac9b45 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10130,7 +10130,6 @@ L:	netdev@vger.kernel.org
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git
 S:	Maintained
-F:	net/core/flow.c
 F:	net/xfrm/
 F:	net/key/
 F:	net/ipv4/xfrm*
-- 
2.17.1

^ permalink raw reply related

* [PATCH 1/4] xfrm: fix gro_cells leak when remove virtual xfrm interfaces
From: Steffen Klassert @ 2018-10-18 10:25 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20181018102521.24997-1-steffen.klassert@secunet.com>

From: Li RongQing <lirongqing@baidu.com>

The device's gro_cells is initialized, so it must be freed when the
device is freed; otherwise it is leaked.

Fixes: f203b76d78092faf2 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_interface.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
index 31acc6f33d98..6f05e831a73e 100644
--- a/net/xfrm/xfrm_interface.c
+++ b/net/xfrm/xfrm_interface.c
@@ -116,6 +116,9 @@ static void xfrmi_unlink(struct xfrmi_net *xfrmn, struct xfrm_if *xi)
 
 static void xfrmi_dev_free(struct net_device *dev)
 {
+	struct xfrm_if *xi = netdev_priv(dev);
+
+	gro_cells_destroy(&xi->gro_cells);
 	free_percpu(dev->tstats);
 }
 
-- 
2.17.1

^ permalink raw reply related

* pull request (net): ipsec 2018-10-18
From: Steffen Klassert @ 2018-10-18 10:25 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

1) Free the xfrm interface gro_cells when deleting the
   interface, otherwise we leak it. From Li RongQing.

2) net/core/flow.c does not exist anymore, so remove it
   from the MAINTAINERS file.

3) Fix a slab-out-of-bounds in _decode_session6.
   From Alexei Starovoitov.

4) Fix RCU protection when policies are inserted into
   their bydst lists. From Florian Westphal.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit 92d7c74b6f72a8a7d04970d5dcfb99673daaf91d:

  Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth (2018-10-01 22:40:39 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git master

for you to fetch changes up to 9dffff200fd178f11dd50eb1fd8ccd0650c9284e:

  xfrm: policy: use hlist rcu variants on insert (2018-10-11 13:24:46 +0200)

----------------------------------------------------------------
Alexei Starovoitov (1):
      net/xfrm: fix out-of-bounds packet access

Florian Westphal (1):
      xfrm: policy: use hlist rcu variants on insert

Li RongQing (1):
      xfrm: fix gro_cells leak when remove virtual xfrm interfaces

Steffen Klassert (1):
      MAINTAINERS: Remove net/core/flow.c

 MAINTAINERS               | 1 -
 net/ipv6/xfrm6_policy.c   | 4 ++--
 net/xfrm/xfrm_interface.c | 3 +++
 net/xfrm/xfrm_policy.c    | 8 ++++----
 4 files changed, 9 insertions(+), 7 deletions(-)

^ permalink raw reply

* [PATCH net-next] net: ethernet: ti: cpsw: don't flush mcast entries while switch promisc mode
From: Ivan Khoronzhuk @ 2018-10-18 18:00 UTC (permalink / raw)
  To: grygorii.strashko, davem
  Cc: linux-omap, netdev, linux-kernel, Ivan Khoronzhuk

There is no longer any need to flush mcast entries in switch mode when
toggling promiscuous mode: vlan reg_mcast = ALL_PORTS and mcast/vlan
ports = ALL_PORTS, and vlan unreg_mcast is likewise set to ALL_PORT_MASK
by the set allmulti call made right after the promisc mode routine. I
suppose the main reason for the flush was to use unreg_mcast to receive
everything on the host port. Since all mcast packets are now received
anyway, there is no reason to flush the mcast entries unsafely: they
were previously synced with __dev_mc_sync() and are not restored
afterwards. An alternative would be to __dev_mc_unsync() them, but that
is not needed.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---

Based on net-next/master
Tested on am572x EVM and BBB

 drivers/net/ethernet/ti/cpsw.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 226be2a56c1f..0e475020a674 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -638,9 +638,6 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 			} while (time_after(timeout, jiffies));
 			cpsw_ale_control_set(ale, 0, ALE_AGEOUT, 1);

-			/* Clear all mcast from ALE */
-			cpsw_ale_flush_multicast(ale, ALE_ALL_PORTS, -1);
-
 			/* Flood All Unicast Packets to Host port */
 			cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 1);
 			dev_dbg(&ndev->dev, "promiscuity enabled\n");
-- 
2.17.1

^ permalink raw reply related

* Re: [RFC] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Sebastian Andrzej Siewior @ 2018-10-18  9:30 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Jason Wang, netdev, virtualization, tglx, Michael S. Tsirkin,
	David S. Miller
In-Reply-To: <55f14915-744b-e11c-bc50-87a872218479@lab.ntt.co.jp>

On 2018-10-18 18:19:21 [+0900], Toshiaki Makita wrote:
> On 2018/10/18 18:08, Sebastian Andrzej Siewior wrote:
> > Again: lockdep saw the lock in softirq context once and in process
> > context once and this is what triggers the warning. It does not matter
> > if NAPI is enabled or not during the access in process context. If you
> > want to allow this you need further lockdep annotation…
> > 
> > … but: refill_work() disables NAPI for &vi->rq[1] and refills + updates
> > stats while NAPI is enabled for &vi->rq[0].
> 
> Do you mean this is false positive? rq[0] and rq[1] never race with each
> other...

Why? So you can't refill rq[1] and then be interrupted and process NAPI
for rq[0]?

But as I said: if lockdep saw the lock acquired in softirq context (which
it did) _and_ in process context (which it did as well) _once_, then this
is enough evidence for the warning.
If you claim that this cannot happen due to the NAPI guard [0], then this
is something lockdep does not know about.

[0] which I currently don't understand, and therefore I sent the patch [1],
    as Jason pointed out that in the ->ndo_open case the work is
    scheduled and then NAPI is enabled (which means the worker could
    disable NAPI and refill, but before it finishes, ->ndo_open would
    continue and enable NAPI).
[1] 20181018084753.wefvsypdevbzoadg@linutronix.de

Sebastian

^ permalink raw reply
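
The thread above concerns the sequence counter behind u64_stats_update_begin(), which 32-bit kernels use to keep 64-bit counters consistent. A userspace sketch of the general sequence-counter protocol using C11 atomics may help with the discussion. It assumes a single writer at a time and is an illustration of the technique only, not the kernel implementation; all names are hypothetical:

```c
#include <stdatomic.h>
#include <stdint.h>

struct rx_stats {
	atomic_uint seq;		/* odd while an update is in progress */
	_Atomic uint64_t packets;
};

/* Writer: bump seq to odd, update the data, bump seq back to even. */
static void rx_stats_add(struct rx_stats *s, uint64_t n)
{
	uint64_t cur;

	atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	cur = atomic_load_explicit(&s->packets, memory_order_relaxed);
	atomic_store_explicit(&s->packets, cur + n, memory_order_relaxed);
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}

/* Reader: retry while an update is in flight or seq changed underneath. */
static uint64_t rx_stats_read(struct rx_stats *s)
{
	unsigned int start;
	uint64_t val;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		val = atomic_load_explicit(&s->packets, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1) ||
		 start != atomic_load_explicit(&s->seq, memory_order_relaxed));
	return val;
}
```

In this scheme a reader that observes an odd sequence keeps retrying until the writer finishes. That is presumably why it matters whether code running in softirq context can interrupt a process-context update of the same counter on the same CPU.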

* Re: [PATCH v3 lora-next 1/5] regmap: Add regmap_noinc_write API
From: Mark Brown @ 2018-10-18 17:18 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Ben Whitten, starnight, hasnain.virk, netdev, liuxuenetmail,
	shess, Ben Whitten, Greg Kroah-Hartman, Rafael J. Wysocki,
	linux-kernel, Stefan Rehm, linux-spi@vger.kernel.org
In-Reply-To: <2908e406-a2e3-29cb-579c-949ddd383e97@suse.de>


On Thu, Oct 18, 2018 at 06:59:52PM +0200, Andreas Färber wrote:

> Mark, please take this one through your tree - I'll rebase the LoRa
> parts on linux-next then.

I don't have a copy of it, if I didn't apply it already and send a pull
request I probably got confused and thought I'd done that already, sorry
- can someone resend please?

^ permalink raw reply

* Re: [PATCH] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Sebastian Andrzej Siewior @ 2018-10-18  9:11 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Stephen Hemminger, netdev, virtualization, tglx,
	Michael S. Tsirkin, Jason Wang, David S. Miller
In-Reply-To: <0dce3017-9973-af2d-f324-6fe8b5eb5469@lab.ntt.co.jp>

On 2018-10-18 18:06:57 [+0900], Toshiaki Makita wrote:
> NACK. Again, this race should not happen because of NAPI guard.
> We need to investigate why this warning happened.

I tried to explain this. Please see
	20181018090812.rry5qgnqxxrjxaii@linutronix.de

Sebastian

^ permalink raw reply

* Re: [RFC] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Sebastian Andrzej Siewior @ 2018-10-18  9:08 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Jason Wang, netdev, virtualization, tglx, Michael S. Tsirkin,
	David S. Miller
In-Reply-To: <4849237d-38f5-9840-4ab9-4419de31db85@lab.ntt.co.jp>

On 2018-10-18 18:00:05 [+0900], Toshiaki Makita wrote:
> On 2018/10/18 17:47, Sebastian Andrzej Siewior wrote:
> > On 2018-10-17 14:48:02 [+0800], Jason Wang wrote:
> >>
> >> On 2018/10/17 上午9:13, Toshiaki Makita wrote:
> >>> I'm not sure what condition triggered this warning.
> > 
> > If the seqlock is acquired once in softirq and then in process context
> > again it is enough evidence for lockdep to trigger this warning.
> 
> No. As I said that should not happen because of NAPI guard.
Again: lockdep saw the lock in softirq context once and in process
context once and this is what triggers the warning. It does not matter
if NAPI is enabled or not during the access in process context. If you
want to allow this you need further lockdep annotation…

… but: refill_work() disables NAPI for &vi->rq[1] and refills + updates
stats while NAPI is enabled for &vi->rq[0].

Sebastian

^ permalink raw reply

* Re: [RFC] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Sebastian Andrzej Siewior @ 2018-10-18  8:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: Toshiaki Makita, netdev, virtualization, tglx, Michael S. Tsirkin,
	David S. Miller
In-Reply-To: <a281371f-dd20-2036-d0a8-1081c2f6a452@redhat.com>

On 2018-10-17 14:48:02 [+0800], Jason Wang wrote:
> 
> On 2018/10/17 上午9:13, Toshiaki Makita wrote:
> > I'm not sure what condition triggered this warning.

If the seqlock is acquired once in softirq and then in process context
again it is enough evidence for lockdep to trigger this warning.

> > Toshiaki Makita
> 
> 
> Or maybe NAPI is enabled unexpectedly somewhere?
> 
> Btw, the schedule_delayed_work() in virtnet_open() is also suspicious, if
> the work is executed before virtnet_napi_enable(), there will be a deadloop
> for napi_disable().

Something like this? It is also likely that if it runs OOM on queue 2, it
will run OOM again on queue 3.

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index fbcfb4d272336..87d6ec4765270 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1263,22 +1263,22 @@ static void refill_work(struct work_struct *work)
 {
 	struct virtnet_info *vi =
 		container_of(work, struct virtnet_info, refill.work);
-	bool still_empty;
+	int still_empty = 0;
 	int i;
 
 	for (i = 0; i < vi->curr_queue_pairs; i++) {
 		struct receive_queue *rq = &vi->rq[i];
 
 		napi_disable(&rq->napi);
-		still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
+		if (!try_fill_recv(vi, rq, GFP_KERNEL))
+		    still_empty++;
 		virtnet_napi_enable(rq->vq, &rq->napi);
-
-		/* In theory, this can happen: if we don't get any buffers in
-		 * we will *never* try to fill again.
-		 */
-		if (still_empty)
-			schedule_delayed_work(&vi->refill, HZ/2);
 	}
+	/* In theory, this can happen: if we don't get any buffers in
+	 * we will *never* try to fill again.
+	 */
+	if (still_empty)
+		schedule_delayed_work(&vi->refill, HZ/2);
 }
 
 static int virtnet_receive(struct receive_queue *rq, int budget,
@@ -1407,12 +1407,13 @@ static int virtnet_open(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	int i, err;
+	int need_refill = 0;
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		if (i < vi->curr_queue_pairs)
 			/* Make sure we have some buffers: if oom use wq. */
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
-				schedule_delayed_work(&vi->refill, 0);
+				need_refill++;
 
 		err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i);
 		if (err < 0)
@@ -1428,6 +1429,8 @@ static int virtnet_open(struct net_device *dev)
 		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
 		virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
 	}
+	if (need_refill)
+		schedule_delayed_work(&vi->refill, 0);
 
 	return 0;
 }
@@ -2236,6 +2239,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 {
 	struct virtnet_info *vi = vdev->priv;
 	int err, i;
+	int need_refill = 0;
 
 	err = init_vqs(vi);
 	if (err)
@@ -2246,13 +2250,15 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->curr_queue_pairs; i++)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
-				schedule_delayed_work(&vi->refill, 0);
+				need_refill++;
 
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
 			virtnet_napi_tx_enable(vi, vi->sq[i].vq,
 					       &vi->sq[i].napi);
 		}
+		if (need_refill)
+			schedule_delayed_work(&vi->refill, 0);
 	}
 
 	netif_device_attach(vi->dev);

> Thanks

Sebastian
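
The restructuring in the diff above (count failures across all queues, then schedule the retry once after the loop) can be sketched in plain userspace C. This is only a toy model under stated assumptions: `struct toy_dev`, `toy_refill_each()` and `toy_refill_once()` are hypothetical names, the per-queue fill result is supplied as an array rather than computed, and "scheduling" is just a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device with several receive queues. */
struct toy_dev {
	size_t nqueues;
	const bool *fill_ok;	/* assumed per-queue result of a fill attempt */
	int retries_scheduled;	/* stands in for schedule_delayed_work() calls */
};

/* Old shape: a retry is requested inside the loop, once per failing queue. */
static void toy_refill_each(struct toy_dev *d)
{
	size_t i;

	for (i = 0; i < d->nqueues; i++)
		if (!d->fill_ok[i])
			d->retries_scheduled++;
}

/* New shape: remember that something failed, request one retry at the end. */
static void toy_refill_once(struct toy_dev *d)
{
	int still_empty = 0;
	size_t i;

	for (i = 0; i < d->nqueues; i++)
		if (!d->fill_ok[i])
			still_empty++;

	if (still_empty)
		d->retries_scheduled++;
}
```

With three queues of which two fail to fill, `toy_refill_each()` requests two retries while `toy_refill_once()` requests one, and only after every queue has been visited.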


* [PATCH] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Sebastian Andrzej Siewior @ 2018-10-18  8:43 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, virtualization, tglx, Toshiaki Makita, Michael S. Tsirkin,
	Jason Wang, David S. Miller
In-Reply-To: <20181016114414.23ea73c3@xeon-e3>

On 32bit, lockdep notices that virtnet_open() and refill_work() invoke
try_fill_recv() from process context while virtnet_receive() invokes the
same function from BH context. The problem is that the seqcounter within
u64_stats_update_begin() may deadlock if it is interrupted by BH and
then acquired again.

Introduce u64_stats_update_begin_bh() which disables BH on 32bit
architectures. Since the BH might interrupt the worker, this new
function should not be limited to SMP like the others which are expected
to be used in softirq.

With this change we might lose increments but this is okay. The
important part is that the two 32bit parts of the 64bit counter are not
corrupted.

Fixes: 461f03dc99cf6 ("virtio_net: Add kick stats")
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/virtio_net.c       |  4 ++--
 include/linux/u64_stats_sync.h | 16 ++++++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index dab504ec5e502..fbcfb4d272336 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1206,9 +1206,9 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
 			break;
 	} while (rq->vq->num_free);
 	if (virtqueue_kick_prepare(rq->vq) && virtqueue_notify(rq->vq)) {
-		u64_stats_update_begin(&rq->stats.syncp);
+		u64_stats_update_begin_bh(&rq->stats.syncp);
 		rq->stats.kicks++;
-		u64_stats_update_end(&rq->stats.syncp);
+		u64_stats_update_end_bh(&rq->stats.syncp);
 	}
 
 	return !oom;
diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index a27604f99ed04..46b6ad6175628 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -90,6 +90,22 @@ static inline void u64_stats_update_end(struct u64_stats_sync *syncp)
 #endif
 }
 
+static inline void u64_stats_update_begin_bh(struct u64_stats_sync *syncp)
+{
+#if BITS_PER_LONG==32
+	local_bh_disable();
+	write_seqcount_begin(&syncp->seq);
+#endif
+}
+
+static inline void u64_stats_update_end_bh(struct u64_stats_sync *syncp)
+{
+#if BITS_PER_LONG==32
+	write_seqcount_end(&syncp->seq);
+	local_bh_enable();
+#endif
+}
+
 static inline unsigned long
 u64_stats_update_begin_irqsave(struct u64_stats_sync *syncp)
 {
-- 
2.19.1
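
As a rough illustration of what the sequence counter in the patch above protects, here is a userspace sketch of a 64-bit statistic stored as two 32-bit halves and guarded by a sequence counter, written with C11 atomics. The names `struct stat64`, `stat64_inc()` and `stat64_read()` are hypothetical and are not kernel API. The sketch assumes a single writer at a time and does not model preemption or BH, which is exactly what the proposed `_bh` helpers are meant to handle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace model; not the kernel's struct u64_stats_sync. */
struct stat64 {
	_Atomic uint32_t seq;	/* odd while an update is in progress */
	_Atomic uint32_t lo;	/* low 32 bits of the counter */
	_Atomic uint32_t hi;	/* high 32 bits of the counter */
};

/* Writer side: bump the counter between a begin and an end marker. */
static void stat64_inc(struct stat64 *s)
{
	uint32_t seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
	uint32_t lo;

	/* "begin": make the sequence odd before touching the data */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	lo = atomic_load_explicit(&s->lo, memory_order_relaxed) + 1;
	atomic_store_explicit(&s->lo, lo, memory_order_relaxed);
	if (lo == 0)	/* carry into the high half */
		atomic_fetch_add_explicit(&s->hi, 1, memory_order_relaxed);

	/* "end": make the sequence even again, publishing the update */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until both halves were read under one even sequence. */
static uint64_t stat64_read(struct stat64 *s)
{
	uint32_t seq, lo, hi;

	do {
		seq = atomic_load_explicit(&s->seq, memory_order_acquire);
		lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((seq & 1) ||
		 seq != atomic_load_explicit(&s->seq, memory_order_relaxed));

	return ((uint64_t)hi << 32) | lo;
}
```

If anything on the same CPU were to spin in `stat64_read()` or re-enter `stat64_inc()` while `seq` is odd, the interrupted update could never finish. In the sketch's terms, that is the situation the commit message describes for a BH interrupting the process-context update.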


* Re: bond: take rcu lock in netpoll_send_skb_on_dev
From: Eran Ben Elisha @ 2018-10-18  8:40 UTC (permalink / raw)
  To: Cong Wang
  Cc: Dave Jones, Linux Kernel Network Developers, Tariq Toukan,
	Saeed Mahameed
In-Reply-To: <CAM_iQpUjdPuM3mH-j94gTga3v4Ts=DdKOiCoG2wQct-tiZaOaQ@mail.gmail.com>



On 10/18/2018 8:46 AM, Cong Wang wrote:
> On Mon, Oct 15, 2018 at 4:36 AM Eran Ben Elisha <eranbe@mellanox.com> wrote:
>> Hi,
>>
>> This suggested fix introduced a regression while using netconsole module
>> with mlx5_core module loaded.
> 
> It is already reported here:
> https://marc.info/?l=linux-kernel&m=153917359528669&w=2
> 
> 
>>
>> During irq handling, we hit a warning that this rcu_read_lock_bh cannot
>> be taken inside an IRQ.
> 
> Yes, I mentioned the same even before this patch was sent out:
> https://marc.info/?l=linux-netdev&m=153816136624679&w=2

Thanks Cong.
From the discussion, I understand that the solution shouldn't be
touching netpoll_send_skb_on_dev. Logging from some modules/drivers will
now trigger traces while netconsole is loaded.

DaveJ,
Can you please submit a proper fix, or at least revert the current one in
the meantime?

Thanks,
Eran

> 
> Thanks.
> 


* Re: [PATCH net] net/sched: properly init chain in case of multiple control actions
From: Davide Caratti @ 2018-10-18  8:38 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jiri Pirko, Jamal Hadi Salim, David Miller,
	Linux Kernel Network Developers
In-Reply-To: <CAM_iQpUbUPgHn1LSZD2v+QhFmesyOn9cAYh-5TRCug-+ULyPgg@mail.gmail.com>

On Wed, 2018-10-17 at 22:35 -0700, Cong Wang wrote:
> > (well, after some more thinking I looked again at that patch and yes, it
> > lacked the most important thing:)
> 
> Hmm, as I said, I am not sure if the logic is correct, if we have two different
> goto actions, we must have two pointers.
> 
> I will re-think about it tomorrow. (I am at a conference so don't have much
> time on reviewing this.)
> 
> Thanks.

Sure, OK. In the meantime, I will post a v2 that:

- adds the missing test that avoids having 'goto action' in the primary
and in the fallback control action at the same time
- fixes a very silly bug that made it fail the TDC 'gact' selftest

regards,
-- 
davide


* [net PATCH] net: sched: Fix for duplicate class dump
From: Phil Sutter @ 2018-10-18  8:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet

When dumping classes by parent, the kernel would return classes twice:

| # tc qdisc add dev lo root prio
| # tc class show dev lo
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
| # tc class show dev lo parent 8001:
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:

This comes from qdisc_match_from_root() potentially returning the root
qdisc itself if its handle matched. Though in that case, root's classes
were already dumped a few lines above.

Fixes: cb395b2010879 ("net: sched: optimize class dumps")
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 net/sched/sch_api.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 6684641ea3448..3dc0acf542454 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -2059,7 +2059,8 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 
 	if (tcm->tcm_parent) {
 		q = qdisc_match_from_root(root, TC_H_MAJ(tcm->tcm_parent));
-		if (q && tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
+		if (q && q != root &&
+		    tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
 			return -1;
 		return 0;
 	}
-- 
2.19.0
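
The double dump described in the patch above can be reproduced in a small userspace model. Everything here is a hypothetical simplification: `struct toy_qdisc`, `toy_match_from_root()`, `toy_dump_qdisc()` and `toy_dump_by_parent()` are made-up names, a qdisc is reduced to a handle and a class count, and the assumption that a per-qdisc dump only emits classes when the qdisc's handle equals the requested parent is part of the model rather than something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct Qdisc. */
struct toy_qdisc {
	unsigned int handle;	/* major handle, e.g. 0x8001 */
	int nclasses;		/* number of classes this qdisc owns */
};

/* Lookup that may return the root itself when its handle matches. */
static struct toy_qdisc *toy_match_from_root(struct toy_qdisc *root,
					     struct toy_qdisc *others,
					     size_t n, unsigned int handle)
{
	size_t i;

	if (root->handle == handle)
		return root;
	for (i = 0; i < n; i++)
		if (others[i].handle == handle)
			return &others[i];
	return NULL;
}

/* Assumed filter: a qdisc only emits classes for a matching parent. */
static int toy_dump_qdisc(const struct toy_qdisc *q, unsigned int parent)
{
	return q->handle == parent ? q->nclasses : 0;
}

/* Returns how many class entries a dump by parent would emit. */
static int toy_dump_by_parent(struct toy_qdisc *root, struct toy_qdisc *others,
			      size_t n, unsigned int parent, int skip_root)
{
	int emitted = toy_dump_qdisc(root, parent);	/* root is dumped first */
	struct toy_qdisc *q = toy_match_from_root(root, others, n, parent);

	/* skip_root models the added "q != root" check */
	if (q && !(skip_root && q == root))
		emitted += toy_dump_qdisc(q, parent);
	return emitted;
}
```

For a root with handle 0x8001 and three classes, the model emits six entries without the check and three with it, which matches the `tc class show dev lo parent 8001:` output quoted in the commit message. A lookup that resolves to a non-root qdisc is unaffected by the check.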


* Re: [PATCH bpf-next 2/3] tools, perf: use smp_{rmb,mb} barriers instead of {rmb,mb}
From: Peter Zijlstra @ 2018-10-18  8:14 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: alexei.starovoitov, paulmck, will.deacon, acme, yhs,
	john.fastabend, netdev
In-Reply-To: <55f86215-44a8-2bb8-b1d0-a77a142dc697@iogearbox.net>

On Thu, Oct 18, 2018 at 01:10:15AM +0200, Daniel Borkmann wrote:

> Wouldn't this then also allow the kernel side to use smp_store_release()
> when it updates the head? We'd be pretty much at the model as described
> in Documentation/core-api/circular-buffers.rst.
> 
> Meaning, rough pseudo-code diff would look as:
> 
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 5d3cf40..3d96275 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -84,8 +84,9 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
>  	 *
>  	 * See perf_output_begin().
>  	 */
> -	smp_wmb(); /* B, matches C */
> -	rb->user_page->data_head = head;
> +
> +	/* B, matches C */
> +	smp_store_release(&rb->user_page->data_head, head);

Yes, this would be correct.

The reason we didn't do this is because smp_store_release() ends up
being smp_mb() + WRITE_ONCE() for a fair number of platforms, even if
they have a cheaper smp_wmb(). Most notably ARM.

(ARM64 OTOH would like to have smp_store_release() there I imagine;
while x86 doesn't care either way around).

A similar concern exists for the smp_load_acquire() I proposed for the
userspace side: ARM would have to resort to smp_mb() in that situation,
instead of the cheaper smp_rmb().

The smp_store_release() on the userspace side will actually be of equal
cost or cheaper, since it already has an smp_mb(). Most notably, x86 can
avoid the barrier entirely, because TSO doesn't allow the LOAD-STORE reorder
(it only allows the STORE-LOAD reorder). And PowerPC can use LWSYNC
instead of SYNC.
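
The release/acquire pairing discussed in this message can be sketched in userspace with C11 atomics, where `memory_order_release` and `memory_order_acquire` play the roles of smp_store_release() and smp_load_acquire(). This is a minimal single-producer/single-consumer ring under stated assumptions: `struct ring`, `ring_put()`, `ring_get()` and `RING_SIZE` are hypothetical names, and the layout has nothing to do with the actual perf mmap page.

```c
#include <assert.h>
#include <stdatomic.h>

#define RING_SIZE 8u	/* arbitrary size chosen for the sketch */

/* Hypothetical single-producer/single-consumer ring; not perf's layout. */
struct ring {
	atomic_uint head;	/* written by the producer only */
	atomic_uint tail;	/* written by the consumer only */
	int data[RING_SIZE];
};

/* Producer: returns 0 on success, -1 if the ring is full. */
static int ring_put(struct ring *r, int v)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* Acquire pairs with the consumer's release store to tail. */
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return -1;
	r->data[head % RING_SIZE] = v;
	/* Release: the data write above is ordered before the new head. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return 0;
}

/* Consumer: returns 0 on success, -1 if the ring is empty. */
static int ring_get(struct ring *r, int *v)
{
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Acquire pairs with the producer's release store to head. */
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return -1;
	*v = r->data[tail % RING_SIZE];
	/* Release: the data read above is ordered before the slot is freed. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return 0;
}
```

How cheaply each of these orderings maps to machine instructions depends on the architecture, which is the cost trade-off weighed in the message above.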
