Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: Resource management for ndo_xdp_xmit (Was: [PATCH net] virtio_net: Account for tx bytes and packets on sending xdp_frames)
From: Toke Høiland-Jørgensen @ 2019-02-09 16:56 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed
  Cc: brouer@redhat.com, hawk@kernel.org,
	virtualization@lists.linux-foundation.org, borkmann@iogearbox.net,
	Tariq Toukan, john.fastabend@gmail.com, mst@redhat.com,
	dsahern@gmail.com, netdev@vger.kernel.org, jasowang@redhat.com,
	davem@davemloft.net, makita.toshiaki@lab.ntt.co.jp
In-Reply-To: <20190208180516.287f6ddc@cakuba.netronome.com>

Jakub Kicinski <jakub.kicinski@netronome.com> writes:

> On Sat, 9 Feb 2019 00:18:31 +0000, Saeed Mahameed wrote:
>> On Fri, 2019-02-08 at 15:17 -0800, Saeed Mahameed wrote:
>> > On Thu, 2019-02-07 at 19:08 +0000, Saeed Mahameed wrote:  
>> > > 
>> > > So 
>> > > 1) on dev_map_update_elem() we will call
>> > > dev->dev->ndo_bpf() to notify the device on the intention to
>> > > start/stop
>> > > redirect, and wait for it to create/destroy the HW resources
>> > > before/after actually updating the map
>> > >   
>> > 
>> > silly me, dev_map_update_elem must be atomic, we can't hook driver
>> > resource allocation to it, it must come as a separate request
>> > (syscall)
>> > from user space to request to create XDP redirect resources.
>> >   
>> 
>> Well, it is possible to render dev_map_update_elem non-atomic and fail
>> BPF programs who try to update it in the verifier
>> check_map_func_compatibility.
>> 
>> if you know of any case where devmap needs to be updated from the BPF
>> program please let me know.
>
> Did we find a solution to non-map redirect?

See my other reply to Saeed :)

-Toke

^ permalink raw reply

* Re: Resource management for ndo_xdp_xmit (Was: [PATCH net] virtio_net: Account for tx bytes and packets on sending xdp_frames)
From: Toke Høiland-Jørgensen @ 2019-02-09 16:56 UTC (permalink / raw)
  To: Saeed Mahameed, brouer@redhat.com
  Cc: hawk@kernel.org, virtualization@lists.linux-foundation.org,
	borkmann@iogearbox.net, Tariq Toukan, john.fastabend@gmail.com,
	mst@redhat.com, jakub.kicinski@netronome.com, dsahern@gmail.com,
	netdev@vger.kernel.org, jasowang@redhat.com, davem@davemloft.net,
	makita.toshiaki@lab.ntt.co.jp
In-Reply-To: <b4eb9dd1c7aaa3215ade8d3d17f397588a519ca3.camel@mellanox.com>

Saeed Mahameed <saeedm@mellanox.com> writes:

> On Fri, 2019-02-08 at 15:17 -0800, Saeed Mahameed wrote:
>> On Thu, 2019-02-07 at 19:08 +0000, Saeed Mahameed wrote:
>> > 
>> > So 
>> > 1) on dev_map_update_elem() we will call
>> > dev->dev->ndo_bpf() to notify the device on the intention to
>> > start/stop
>> > redirect, and wait for it to create/destroy the HW resources
>> > before/after actually updating the map
>> > 
>> 
>> silly me, dev_map_update_elem must be atomic, we can't hook driver
>> resource allocation to it, it must come as a separate request
>> (syscall)
>> from user space to request to create XDP redirect resources.
>> 
>
> Well, it is possible to render dev_map_update_elem non-atomic and fail
> BPF programs who try to update it in the verifier
> check_map_func_compatibility.
>
> if you know of any case where devmap needs to be updated from the BPF
> program please let me know.

Well, maybe? My plan for dealing with the non-map redirect variant is
essentially to have a hidden map that does just-in-time insertion of
ifindexes if they are not already there; and rework XDP_REDIRECT to use
that.

However, this would essentially be an insert from eBPF; but I guess
maybe it could be deferred until the RX-side driver exits its NAPI loop
(as we're buffering frames in the map anyway)?

-Toke

^ permalink raw reply

* Re: [PATCH v1 net-next 1/4] net: dsa: microchip: prepare PHY for proper advertisement
From: Andrew Lunn @ 2019-02-09 16:54 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: Sergio Paracuellos, Florian Fainelli, Pavel Machek,
	UNGLinuxDriver, netdev
In-Reply-To: <1549598829-25970-2-git-send-email-Tristram.Ha@microchip.com>

> +static void ksz9477_phy_setup(struct ksz_device *dev, int port,
> +			      struct phy_device *phy)
> +{
> +	/* ETHTOOL_LINK_MODE_Pause_BIT and ETHTOOL_LINK_MODE_Asym_Pause_BIT
> +	 * can be removed to disable flow control when rate limiting is used.
> +	 */
> +	if (port < dev->phy_port_cnt) {
> +		/* The MAC actually cannot run in 1000 half-duplex mode. */
> +		phy_remove_link_mode(phy,
> +				     ETHTOOL_LINK_MODE_1000baseT_Half_BIT);
> +	}

Hi Tristram

The comment about pause does not seem to match the code.

> +}
> +
>  static void ksz9477_port_setup(struct ksz_device *dev, int port, bool cpu_port)
>  {
>  	u8 data8;
> @@ -1151,6 +1164,7 @@ static int ksz9477_setup(struct dsa_switch *ds)
>  	.setup			= ksz9477_setup,
>  	.phy_read		= ksz9477_phy_read16,
>  	.phy_write		= ksz9477_phy_write16,
> +	.adjust_link		= ksz_adjust_link,
>  	.port_enable		= ksz_enable_port,
>  	.port_disable		= ksz_disable_port,
>  	.get_strings		= ksz9477_get_strings,
> @@ -1298,6 +1312,7 @@ static void ksz9477_switch_exit(struct ksz_device *dev)
>  	.get_port_addr = ksz9477_get_port_addr,
>  	.cfg_port_member = ksz9477_cfg_port_member,
>  	.flush_dyn_mac_table = ksz9477_flush_dyn_mac_table,
> +	.phy_setup = ksz9477_phy_setup,
>  	.port_setup = ksz9477_port_setup,
>  	.shutdown = ksz9477_reset_switch,
>  	.detect = ksz9477_switch_detect,
> diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
> index 8a5111f..a57bda7 100644
> --- a/drivers/net/dsa/microchip/ksz_common.c
> +++ b/drivers/net/dsa/microchip/ksz_common.c
> @@ -2,7 +2,7 @@
>  /*
>   * Microchip switch driver main logic
>   *
> - * Copyright (C) 2017-2018 Microchip Technology Inc.
> + * Copyright (C) 2017-2019 Microchip Technology Inc.
>   */
>  
>  #include <linux/delay.h>
> @@ -61,6 +61,22 @@ int ksz_phy_write16(struct dsa_switch *ds, int addr, int reg, u16 val)
>  }
>  EXPORT_SYMBOL_GPL(ksz_phy_write16);
>  
> +void ksz_adjust_link(struct dsa_switch *ds, int port,
> +		     struct phy_device *phydev)
> +{
> +	struct ksz_device *dev = ds->priv;
> +	struct ksz_port *p = &dev->ports[port];
> +
> +	if (phydev->link) {
> +		dev->live_ports |= (1 << port) & dev->on_ports;
> +	} else if (p->phydev.link) {
> +		p->link_just_down = 1;
> +		dev->live_ports &= ~(1 << port);
> +	}

It is not very easy to understand what on_ports, live_ports and
link_just_down are all about. Could you simplify this, and make the
code symmetric? ksz_enable_port does not touch any of these, but
ksz_disable_port does?

grep live_ports *
ksz9477.c:		dev->live_ports = dev->host_mask;
ksz9477.c:				  dev->live_ports |= (1 << port);
ksz_common.c:				  dev->live_ports &= ~(1 << port);
ksz_priv.h:				  u16 live_ports;

So live_ports is not actually used for anything.

> +	p->phydev = *phydev;

This last statement seems like it belongs in phy_setup, which gets
called as part of port_enable.


^ permalink raw reply

* Re: [PATCH net-next] net: phy: Add support for asking the PHY its abilities
From: Andrew Lunn @ 2019-02-09 16:35 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <19c1e5c4-2f86-a3ff-e971-a0098c605b3f@gmail.com>

On Sat, Feb 09, 2019 at 03:24:47PM +0100, Heiner Kallweit wrote:
> From: Andrew Lunn <andrew@lunn.ch>
> Add support for runtime determination of what the PHY supports, by
> adding a new function to the phy driver. The get_features call should
> set the phydev->supported member with the features the PHY supports.
> It is only called if phydrv->features is NULL.
> 
> This requires minor changes to pause. The PHY driver should not set
> pause abilities, except for when it has odd cause capabilities, e.g.
> pause cannot be disabled. With this change, phydev->supported already
> contains the drivers abilities, including pause. So rather than
> considering phydrv->features, look at the phydev->supported, and
> enable pause if neither of the pause bits are already set.

Hi Heiner

Ah, cool, these are the patches i was asking for, when you asked
about splitting Maxime's patches. There is one more in my tree which
converts the marvell10g to using this. I think that should be posted
as well. It makes it clear how this should be used, and it replaces
one of the patches in Maxime's set.

> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> [hkallweit1@gmail.com: fixed small checkpatch complaint in one comment]

Thanks.

	Andrew

^ permalink raw reply

* Re: [PATCH net-next v2 00/10] net: phy: Add support for 2.5GBASET PHYs
From: Heiner Kallweit @ 2019-02-09 16:28 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Maxime Chevallier, davem, netdev, linux-kernel, Florian Fainelli,
	Russell King, linux-arm-kernel, Antoine Tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, mw
In-Reply-To: <20190209162545.GA30856@lunn.ch>

On 09.02.2019 17:25, Andrew Lunn wrote:
>> I'd propose that you extract generic patches being submission-ready
>> and split the patch series into two. I think the following patches
>> would be candidates for the first series: 2, 3, 5, 6
>> (provided they have no dependency on the other patches)
>> Based on that both of us can go on with our work.
>>
>> Andrew, what do you think?
> 
> Yes, that would help.
> 
> Heiner, can you also submit the .get_features patches soon. That is
> another important piece of the puzzle for both drivers.
> 
Just submitted it few hours ago.
https://patchwork.ozlabs.org/patch/1039237/

> 	Thanks
> 		Andrew
> 
Heiner

^ permalink raw reply

* Re: [PATCH net-next v2 00/10] net: phy: Add support for 2.5GBASET PHYs
From: Andrew Lunn @ 2019-02-09 16:25 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Maxime Chevallier, davem, netdev, linux-kernel, Florian Fainelli,
	Russell King, linux-arm-kernel, Antoine Tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, mw
In-Reply-To: <81c340ea-54b0-1abf-94af-b8dc4ee83e3a@gmail.com>

> I'd propose that you extract generic patches being submission-ready
> and split the patch series into two. I think the following patches
> would be candidates for the first series: 2, 3, 5, 6
> (provided they have no dependency on the other patches)
> Based on that both of us can go on with our work.
> 
> Andrew, what do you think?

Yes, that would help.

Heiner, can you also submit the .get_features patches soon. That is
another important piece of the puzzle for both drivers.

	Thanks
		Andrew

^ permalink raw reply

* [PATCH] net: hso: do not unregister if not registered
From: Yavuz, Tuba @ 2019-02-09 16:24 UTC (permalink / raw)
  To: netdev@vger.kernel.org

On an error path inside the hso_create_net_device function of the hso
driver, hso_free_net_device gets called. This causes potentially a
negative reference count in the net device if register_netdev has not
been called yet as hso_free_net_device calls unregister_netdev
regardless. I think the driver should distinguish these cases and call
unregister_netdev only if register_netdev has been called.

Signed-off-by: Tuba Yavuz <tuba@ece.ufl.edu>
---

--- linux-stable/drivers/net/usb/hso.c.orig	2019-01-27 14:45:58.232683119 -0500
+++ linux-stable/drivers/net/usb/hso.c	2019-02-05 17:54:17.056496019 -0500
@@ -2377,7 +2377,9 @@ static void hso_free_net_device(struct h

 	remove_net_device(hso_net->parent);

-	if (hso_net->net)
+	if (hso_net->net &&
+	    hso_net->net->reg_state == NETREG_REGISTERED)
 		unregister_netdev(hso_net->net);

 	/* start freeing */

^ permalink raw reply

* Re: [PATCH] net: mvpp2: clear flow control modes in 10G mode
From: Russell King - ARM Linux admin @ 2019-02-09 16:07 UTC (permalink / raw)
  To: Antoine Tenart, Maxime Chevallier
  Cc: Baruch Siach, Sven Auhagen, David S. Miller, netdev
In-Reply-To: <E1gsV8v-00044k-Bq@rmk-PC.armlinux.org.uk>

Please ignore this one - subject line is not correct.  Thanks.

On Sat, Feb 09, 2019 at 04:06:13PM +0000, Russell King wrote:
> When mvpp2 configures the flow control modes in mvpp2_xlg_config() for
> 10G mode, it only ever set the flow control enable bits.  There is no
> mechanism to clear these bits, which means that userspace is unable to
> use standard APIs to disable flow control (the only way is to poke the
> register directly.)
> 
> Fix the missing bit clearance to allow flow control to be disabled.
> This means that, by default, as there is no negotiation in 10G modes
> with mvpp2, flow control is now disabled rather than being rx-only.
> 
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> ---
>  drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> index b63fac6ee2e6..199e6f17ee1b 100644
> --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> @@ -4532,8 +4532,13 @@ static void mvpp2_xlg_config(struct mvpp2_port *port, unsigned int mode,
>  
>  	if (state->pause & MLO_PAUSE_TX)
>  		ctrl0 |= MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN;
> +	else
> +		ctrl0 &= ~MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN;
> +
>  	if (state->pause & MLO_PAUSE_RX)
>  		ctrl0 |= MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;
> +	else
> +		ctrl0 &= ~MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;
>  
>  	ctrl4 &= ~MVPP22_XLG_CTRL4_MACMODSELECT_GMAC;
>  	ctrl4 |= MVPP22_XLG_CTRL4_FWD_FC | MVPP22_XLG_CTRL4_FWD_PFC |
> -- 
> 2.7.4
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* [PATCH net-next] net: marvell: mvpp2: clear flow control modes in 10G mode
From: Russell King @ 2019-02-09 16:06 UTC (permalink / raw)
  To: Antoine Tenart, Maxime Chevallier
  Cc: Baruch Siach, Sven Auhagen, David S. Miller, netdev

When mvpp2 configures the flow control modes in mvpp2_xlg_config() for
10G mode, it only ever set the flow control enable bits.  There is no
mechanism to clear these bits, which means that userspace is unable to
use standard APIs to disable flow control (the only way is to poke the
register directly.)

Fix the missing bit clearance to allow flow control to be disabled.
This means that, by default, as there is no negotiation in 10G modes
with mvpp2, flow control is now disabled rather than being rx-only.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index b63fac6ee2e6..199e6f17ee1b 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -4532,8 +4532,13 @@ static void mvpp2_xlg_config(struct mvpp2_port *port, unsigned int mode,

 	if (state->pause & MLO_PAUSE_TX)
 		ctrl0 |= MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN;
+	else
+		ctrl0 &= ~MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN;
+
 	if (state->pause & MLO_PAUSE_RX)
 		ctrl0 |= MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;
+	else
+		ctrl0 &= ~MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;

 	ctrl4 &= ~MVPP22_XLG_CTRL4_MACMODSELECT_GMAC;
 	ctrl4 |= MVPP22_XLG_CTRL4_FWD_FC | MVPP22_XLG_CTRL4_FWD_PFC |
-- 
2.7.4

^ permalink raw reply related

* [PATCH] net: mvpp2: clear flow control modes in 10G mode
From: Russell King @ 2019-02-09 16:06 UTC (permalink / raw)
  To: Antoine Tenart, Maxime Chevallier
  Cc: Baruch Siach, Sven Auhagen, David S. Miller, netdev

When mvpp2 configures the flow control modes in mvpp2_xlg_config() for
10G mode, it only ever set the flow control enable bits.  There is no
mechanism to clear these bits, which means that userspace is unable to
use standard APIs to disable flow control (the only way is to poke the
register directly.)

Fix the missing bit clearance to allow flow control to be disabled.
This means that, by default, as there is no negotiation in 10G modes
with mvpp2, flow control is now disabled rather than being rx-only.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index b63fac6ee2e6..199e6f17ee1b 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -4532,8 +4532,13 @@ static void mvpp2_xlg_config(struct mvpp2_port *port, unsigned int mode,

 	if (state->pause & MLO_PAUSE_TX)
 		ctrl0 |= MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN;
+	else
+		ctrl0 &= ~MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN;
+
 	if (state->pause & MLO_PAUSE_RX)
 		ctrl0 |= MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;
+	else
+		ctrl0 &= ~MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;

 	ctrl4 &= ~MVPP22_XLG_CTRL4_MACMODSELECT_GMAC;
 	ctrl4 |= MVPP22_XLG_CTRL4_FWD_FC | MVPP22_XLG_CTRL4_FWD_PFC |
-- 
2.7.4

^ permalink raw reply related

* Re: [ISSUE][4.20.6] mlx5 and checksum failures
From: Ian Kumlien @ 2019-02-09 15:54 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Cong Wang, David Miller, Saeed Mahameed,
	Linux Kernel Network Developers
In-Reply-To: <CAA85sZv-6y2RmSkHPVV_j1fW_uk_mJ2FznssTm52K1YC6L+3aA@mail.gmail.com>

On Fri, Feb 8, 2019 at 5:29 PM Ian Kumlien <ian.kumlien@gmail.com> wrote
> On Thu, Feb 7, 2019 at 11:01 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > On Thu, Feb 7, 2019 at 7:43 PM Saeed Mahameed <saeedm@dev.mellanox.co.il> wrote:
> > > On Thu, Feb 7, 2019 at 2:17 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > On Thu, Feb 7, 2019 at 2:01 AM Saeed Mahameed <saeedm@dev.mellanox.co.il> wrote:
> > > > > On Wed, Feb 6, 2019 at 3:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > > > It changes directly after the first hw checksum failure, I don't know why =/
> > > > >
> > > > > weird, Maybe a real check-summing issue/corruption on the PCI ?!
> > > >
> > > > Actually, it seems to have been introduced in 4.20.6 - 4.20.5 works just fine
>
> > > Great, the difference is only 120 patches.
> > > that is bisect-able, it will only take 5 iterations to find the
> > > offending commit.
> >
> > I just wish it wasn't a server that takes, what feels like 5 minutes to boot...
> >
> > All of these seas of sensors 2d and 3d... =P
> >
> > But, yep, that's the plan
>
> Huh, spent most of the day with two bisects and none of them yielded
> any results....
>
> Looks like I'll have to start investigating the elrepo kernel-ml build =(

Just realized that it's not an entirely fair comparison - since
retpolines wasn't enabled, damned old compilers...

^ permalink raw reply

* [PATCH net-next] net: phy: Add support for asking the PHY its abilities
From: Heiner Kallweit @ 2019-02-09 14:24 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org

From: Andrew Lunn <andrew@lunn.ch>
Add support for runtime determination of what the PHY supports, by
adding a new function to the phy driver. The get_features call should
set the phydev->supported member with the features the PHY supports.
It is only called if phydrv->features is NULL.

This requires minor changes to pause. The PHY driver should not set
pause abilities, except for when it has odd cause capabilities, e.g.
pause cannot be disabled. With this change, phydev->supported already
contains the drivers abilities, including pause. So rather than
considering phydrv->features, look at the phydev->supported, and
enable pause if neither of the pause bits are already set.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[hkallweit1@gmail.com: fixed small checkpatch complaint in one comment]
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 31 +++++++++++++++----------------
 include/linux/phy.h          |  6 ++++++
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 92b7a71df..8573d17ec 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2236,7 +2236,14 @@ static int phy_probe(struct device *dev)
 	 * a controller will attach, and may modify one
 	 * or both of these values
 	 */
-	linkmode_copy(phydev->supported, phydrv->features);
+	if (phydrv->features) {
+		linkmode_copy(phydev->supported, phydrv->features);
+	} else {
+		err = phydrv->get_features(phydev);
+		if (err)
+			goto out;
+	}
+
 	of_set_phy_supported(phydev);
 	linkmode_copy(phydev->advertising, phydev->supported);
 
@@ -2256,20 +2263,8 @@ static int phy_probe(struct device *dev)
 	 * (e.g. hardware erratum) where the driver wants to set only one
 	 * of these bits.
 	 */
-	if (test_bit(ETHTOOL_LINK_MODE_Pause_BIT, phydrv->features) ||
-	    test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, phydrv->features)) {
-		linkmode_clear_bit(ETHTOOL_LINK_MODE_Pause_BIT,
-				   phydev->supported);
-		linkmode_clear_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
-				   phydev->supported);
-		if (test_bit(ETHTOOL_LINK_MODE_Pause_BIT, phydrv->features))
-			linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT,
-					 phydev->supported);
-		if (test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
-			     phydrv->features))
-			linkmode_set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
-					 phydev->supported);
-	} else {
+	if (!test_bit(ETHTOOL_LINK_MODE_Pause_BIT, phydev->supported) &&
+	    !test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, phydev->supported)) {
 		linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT,
 				 phydev->supported);
 		linkmode_set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
@@ -2315,7 +2310,11 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
 {
 	int retval;
 
-	if (WARN_ON(!new_driver->features)) {
+	/* Either the features are hard coded, or dynamically
+	 * determine. It cannot be both or neither
+	 */
+	if (WARN_ON((!new_driver->features && !new_driver->get_features) ||
+		    (new_driver->features && new_driver->get_features))) {
 		pr_err("%s: Driver features are missing\n", new_driver->name);
 		return -EINVAL;
 	}
diff --git a/include/linux/phy.h b/include/linux/phy.h
index f41bf651f..d2ffae992 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -502,6 +502,12 @@ struct phy_driver {
 	 */
 	int (*probe)(struct phy_device *phydev);
 
+	/*
+	 * Probe the hardware to determine what abilities it has.
+	 * Should only set phydev->supported.
+	 */
+	int (*get_features)(struct phy_device *phydev);
+
 	/* PHY Power Management */
 	int (*suspend)(struct phy_device *phydev);
 	int (*resume)(struct phy_device *phydev);
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next] net: phy: probe the PHY before determining the supported features
From: Heiner Kallweit @ 2019-02-09 13:46 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org

From: Andrew Lunn <andrew@lunn.ch>
We will soon support asking the PHY at runtime to determine what
features it supports, rather than forcing it to be compile time.
But we should probe the PHY first. So probe the phy driver earlier.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 31f9e7c49..92b7a71df 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2220,6 +2220,18 @@ static int phy_probe(struct device *dev)
 
 	mutex_lock(&phydev->lock);
 
+	if (phydev->drv->probe) {
+		/* Deassert the reset signal */
+		phy_device_reset(phydev, 0);
+
+		err = phydev->drv->probe(phydev);
+		if (err) {
+			/* Assert the reset signal */
+			phy_device_reset(phydev, 1);
+			goto out;
+		}
+	}
+
 	/* Start out supporting everything. Eventually,
 	 * a controller will attach, and may modify one
 	 * or both of these values
@@ -2267,17 +2279,7 @@ static int phy_probe(struct device *dev)
 	/* Set the state to READY by default */
 	phydev->state = PHY_READY;
 
-	if (phydev->drv->probe) {
-		/* Deassert the reset signal */
-		phy_device_reset(phydev, 0);
-
-		err = phydev->drv->probe(phydev);
-		if (err) {
-			/* Assert the reset signal */
-			phy_device_reset(phydev, 1);
-		}
-	}
-
+out:
 	mutex_unlock(&phydev->lock);
 
 	return err;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next v2 00/10] net: phy: Add support for 2.5GBASET PHYs
From: Heiner Kallweit @ 2019-02-09 13:22 UTC (permalink / raw)
  To: Maxime Chevallier, davem, Andrew Lunn
  Cc: netdev, linux-kernel, Florian Fainelli, Russell King,
	linux-arm-kernel, Antoine Tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, mw
In-Reply-To: <20190207094939.27369-1-maxime.chevallier@bootlin.com>

On 07.02.2019 10:49, Maxime Chevallier wrote:
> Hello everyone,
> 
> This is the second iteration of the series introducing support for
> 2.5GBASET and 5GBASET to the network PHY infrastructure.
> 
> These 2 modes are described in the 802.3bz specifications, and allow to
> use 2.5G and 5G speeds on cat5e/cat6 cables.
> 
> The required infrastructure code is added for 2.5GBASET and 5GBASET, but
> only 2.5GBASET support is added to the Marvell10G driver. The reason is
> because the 5GBASER interface mode used to communicate with the MAC
> isn't implemented yet, and therefore this can't be tested.
> 
> This series has seen some rework since last time, see the full details
> in the patches changelog. The main highlights are :
> 
>  - Patch 1 moves the Pause ans Asym_Pause parameters update after
>    calling config_init. This allows us to change the phydev->supported
>    modes in config_init, while still forcing some quirks taken from
>    phydrv->features, to disable unsupported Pause modes. This was
>    proposed by Andrew.
> 
>  - Patch 2 generalizes the way we mask-out modes we don't want to use
>    when forcing the link speed through DT or ethtool, by walking through
>    the PHY settings table, as proposed by Russell.
> 
>  - Patch 4 implements automatic setting of TM, FIBRE and Backplane bits
>    from the list of supported linkmodes.
> 
>  - In Patch 5, we only read abilities from the PMA. Pause parameters
>    aren't built from the genphy_c45_pma_read_abilities, as it was done
>    in the previous iteration of the patch. We also amke use of
>    linkmode_mod_bit to make sure we mask-out unsupported modes.
> 
>  - In Patch 8, we manually check for the PMA device id to see if we have
>    to use a quirk when reading the 2.5G/5G Extended Abilities
> 
> Maxime Chevallier (10):
>   net: phy: Update PHY linkmodes after config_init
>   net: phy: Mask-out non-compatible modes when setting the max-speed
>   net: phy: Move of_set_phy_eee_broken to phy-core.c
>   net: phy: Automatically fill the generic TP, FIBRE and Backplane modes
>   net: phy: Extract genphy_c45_pma_read_abilities from marvell10g
>   net: phy: Add generic support for 2.5GBaseT and 5GBaseT
>   net: phy: marvell10g: Add support for 2.5GBASET
>   net: phy: marvell10g: Force reading of 2.5/5G
>   net: mvpp2: Add 2.5GBaseT support
>   net: phy: marvell10g: add support for the 88x2110 PHY
> 
>  .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |   1 +
>  drivers/net/phy/marvell10g.c                  | 142 +++++------
>  drivers/net/phy/phy-c45.c                     | 111 ++++++++
>  drivers/net/phy/phy-core.c                    |  72 ++++++
>  drivers/net/phy/phy_device.c                  | 237 +++++++++---------
>  include/linux/linkmode.h                      |   6 +
>  include/linux/marvell_phy.h                   |   2 +
>  include/linux/phy.h                           |   3 +
>  include/uapi/linux/mdio.h                     |  16 ++
>  9 files changed, 405 insertions(+), 185 deletions(-)
> 
Hi Maxime,

Andrew and me are working on Aquantia PHY support and he handed over
to me a patch series which includes parts of the first version of your
series. Having said that I'm especially interested in your patches
5 and 6. Because your series is somewhat bigger and there are a few
review comments, preparing the next round may take time.

I'd propose that you extract generic patches being submission-ready
and split the patch series into two. I think the following patches
would be candidates for the first series: 2, 3, 5, 6
(provided they have no dependency on the other patches)
Based on that both of us can go on with our work.

Andrew, what do you think?

Heiner

^ permalink raw reply

* Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
From: Heiner Kallweit @ 2019-02-09 11:50 UTC (permalink / raw)
  To: Sander Eikelenboom, Eric Dumazet, Realtek linux nic maintainers,
	Eric Dumazet
  Cc: Linus Torvalds, linux-kernel, netdev
In-Reply-To: <88b80a6b-42e0-ce4e-8aad-4e23a17c7e65@eikelenboom.it>

On 09.02.2019 11:07, Sander Eikelenboom wrote:
> On 09/02/2019 10:59, Heiner Kallweit wrote:
>> On 09.02.2019 10:34, Sander Eikelenboom wrote:
>>> On 09/02/2019 10:02, Heiner Kallweit wrote:
>>>> On 09.02.2019 00:09, Eric Dumazet wrote:
>>>>>
>>>>>
>>>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>>>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>>>>>> L.S.,
>>>>>>>>>>>
>>>>>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen i the nasty splat below, 
>>>>>>>>>>> that I haven encountered with Linux 4.20.x.
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting could be nasty due to another (networking related) kernel bug.
>>>>>>>>>>>
>>>>>>>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>>>>>>>
>>>>>>>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>>>>>
>>>>>>>>> Hmm i did some diging and i think:
>>>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>>>>>
>>>>>>>> You're right. Thought this was added in 4.20 already.
>>>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
>>>>>>>> this issue from any user of physical hw. And due to the fact that a lot of mainboards
>>>>>>>> have onboard Realtek network I have quite a few testers out there.
>>>>>>>> Does the issue occur under specific circumstances like very high load?
>>>>>>>
>>>>>>> Yep, the box is already quite contented with the Xen VM's and if I remember correctly it occurred while kernel compiling
>>>>>>> on the host.
>>>>>>>
>>>>>>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>>>>>>> as author of the underlying changes.
>>>>>>>
>>>>>>> It could also be the barriers weren't that unneeded as assumed.
>>>>>>
>>>>>> The barriers were removed after adding xmit_more handling. Therefore it would be good to
>>>>>> test also with only 
>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>>> removed.
>>>>>>
>>>>>>> Since we are almost at RC6 i took the liberty to CC Eric now.
>>>>>>>
>>>>>> Sure, thanks.
>>>>>>
>>>>>>> BTW am i correct these patches are merely optimizations ?
>>>>>>
>>>>>> Yes
>>>>>>
>>>>>>> If so and concluding they revert cleanly, perhaps it should be considered at this point in the RC's
>>>>>>> to revert them for 5.0 and try again for 5.1 ?
>>>>>>>
>>>>>> Before removing both it would be good to test with only the barrier-removal removed.
>>>>>>
>>>>>
>>>>> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>> looks buggy to me, since the skb might have been freed already on another cpu when you call
>>>>>
>>>>> You could try :
>>>>>
>>>>> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
>>>>> index 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93 100644
>>>>> --- a/drivers/net/ethernet/realtek/r8169.c
>>>>> +++ b/drivers/net/ethernet/realtek/r8169.c
>>>>> @@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>>>         dma_addr_t mapping;
>>>>>         u32 opts[2], len;
>>>>>         bool stop_queue;
>>>>> +       bool door_bell;
>>>>>         int frags;
>>>>>  
>>>>>         if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
>>>>> @@ -6116,6 +6117,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>>>         /* Force memory writes to complete before releasing descriptor */
>>>>>         dma_wmb();
>>>>>  
>>>>> +       door_bell = __netdev_sent_queue(dev, skb->len, skb->xmit_more);
>>>>> +
>>>>>         txd->opts1 = rtl8169_get_txd_opts1(opts[0], len, entry);
>>>>>  
>>>>>         /* Force all memory writes to complete before notifying device */
>>>>> @@ -6127,7 +6130,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>>>         if (unlikely(stop_queue))
>>>>>                 netif_stop_queue(dev);
>>>>>  
>>>>> -       if (__netdev_sent_queue(dev, skb->len, skb->xmit_more)) {
>>>>> +       if (door_bell) {
>>>>>                 RTL_W8(tp, TxPoll, NPQ);
>>>>>                 mmiowb();
>>>>>         }
>>>>>
>>>> Thanks a lot for checking and for the proposed fix.
>>>> Sander, can you try with this patch on top of 5.0-rc5 w/o removing two two commits?
>>>
>>> I have done that already during the night .. the results:
>>> - I can confirm 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 is the first commit which causes hitting the BUG_ON in lib/dynamic_queue_limits.c.
>>>   (in other word, with only reverting bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 it still blows up).
>>>
>>> - The Eric's patch only applies cleanly with bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 reverted, so that's what I tested.
>>>   The patch seems to prevent hitting the BUG_ON in lib/dynamic_queue_limits.c, it has run this night and I gave done a few kernel compiles
>>>   this morning. How ever during these kernel compiles i'm getting a transmit queue timeout which i haven't seen with 4.20.x, although i regularly
>>>   compile kernels in the same way as I do now. The only thing I can't say if that is due to this change, or if it's again something else.
>>>   Which makes me somewhat inclined to go testing the complete revert some more and see if I can trigger the queue timeout on that or not.
>>>
>>>   If I can, it is a separate issue.
>>>   If I can't it seems even with a patch it still seems as a regression in comparison with 4.20.x, for which
>>>   a revert would be the right thing to do (since as you indicated these are merely optimizations), 
>>>   which would give us more time for 5.1 to try to solve things on top of the 5.0-release-to-be.
>>>   (especially since I seem to still have other issues which need to be sorted out and time is limited)
>>>
>>>   The timeout in question:
>>>         [28336.869479] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
>>>         [28336.881498] WARNING: CPU: 0 PID: 6925 at net/sched/sch_generic.c:461 dev_watchdog+0x20b/0x210
>>>         [28336.893358] Modules linked in:
>>>         [28336.904106] CPU: 0 PID: 6925 Comm: cc1 Tainted: G      D           5.0.0-rc5-20190208-thp-net-florian-rtl8169-eric-doflr+ #1
>>>         [28336.917385] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>>>         [28336.928988] RIP: e030:dev_watchdog+0x20b/0x210
>>>         [28336.940623] Code: 00 49 63 4e e0 eb 90 4c 89 e7 c6 05 ad d8 f1 00 01 e8 a9 32 fd ff 89 d9 48 89 c2 4c 89 e6 48 c7 c7 50 59 89 82 e8 e5 92 4d ff <0f> 0b eb c0 90 48 c7 47 08 00 00 00 00 48 c7 07 00 00 00 00 0f b7
>>>         [28336.965265] RSP: e02b:ffff88807d403ea0 EFLAGS: 00010286
>>>         [28336.977465] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff82a69db8
>>>         [28336.991265] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000200
>>>         [28337.008865] RBP: ffff88807936e41c R08: 0000000000000000 R09: 0000000000000819
>>>         [28337.022250] R10: 0000000000000202 R11: ffffffff8247ca80 R12: ffff88807936e000
>>>         [28337.035204] R13: 0000000000000000 R14: ffff88807936e440 R15: 0000000000000001
>>>         [28337.049832] FS:  00007f53e9bf3840(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
>>>         [28337.062524] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>         [28337.075086] CR2: 00007f53e60c4000 CR3: 000000001a0be000 CR4: 0000000000000660
>>>         [28337.090052] Call Trace:
>>>         [28337.103615]  <IRQ>
>>>         [28337.116587]  ? qdisc_destroy+0x120/0x120
>>>         [28337.128905]  call_timer_fn+0x19/0x90
>>>         [28337.141892]  expire_timers+0x8b/0xa0
>>>         [28337.153354]  run_timer_softirq+0x7e/0x160
>>>         [28337.165931]  ? handle_irq_event_percpu+0x4c/0x70
>>>         [28337.176548]  ? handle_percpu_irq+0x32/0x50
>>>         [28337.186734]  __do_softirq+0xed/0x229
>>>         [28337.196404]  ? hypervisor_callback+0xa/0x20
>>>         [28337.207822]  irq_exit+0xb7/0xc0
>>>         [28337.218978]  xen_evtchn_do_upcall+0x27/0x40
>>>         [28337.230763]  xen_do_hypervisor_callback+0x29/0x40
>>>         [28337.241261]  </IRQ>
>>>         [28337.253283] RIP: e033:0xff7e62
>>>         [28337.264899] Code: 35 43 0f c7 00 4c 89 ef e8 8b 6d 67 ff 0f 1f 00 44 89 e0 44 89 e2 c1 e8 06 83 e2 3f 48 8b 0c c5 40 8d c6 01 48 0f a3 d1 72 0e <48> 8b 04 c5 50 8d c6 01 48 0f a3 d0 73 0b 44 89 e6 4c 89 ef e8 b5
>>>         [28337.288677] RSP: e02b:00007fff0fc6a340 EFLAGS: 00000202
>>>         [28337.299234] RAX: 0000000000000000 RBX: 00007f53e60c3580 RCX: 0000000000000000
>>>         [28337.309577] RDX: 0000000000000034 RSI: 0000000001e71a98 RDI: 00007fff0fc6a538
>>>         [28337.320724] RBP: 00007fff0fc6a4b0 R08: 0000000000000000 R09: 0000000000000000
>>>         [28337.331829] R10: 0000000000000001 R11: 00000000020cb3d0 R12: 0000000000000034
>>>         [28337.343900] R13: 00007fff0fc6a538 R14: 0000000000000000 R15: 0000000000000001
>>>         [28337.353977] ---[ end trace 6ff49f09286816b7 ]---
>>>
>> Thanks for your efforts. As usual this tx timeout trace says basically nothing except
>> "timeout" and root cause could be anything. Earlier you reported a memory allocation error,
>> did that occur again?
>> If we decide to revert, I'd leave removal of the memory barriers in (as it doesn't seem to
>> contribute to the issue) and just submit a patch to effectively revert
>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356.
> 
> I can't say if that is correct, because i haven't tested that.
> 
> Another thing I could test is:
>  - putting all the r8169 patches (and prerequisites) that went into 5.0 
>    up to bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3, onto 4.20.7 and see what that does.
>    If that would be feasible (not too many needed prerequisites out of r8169) and if 
>    you could spare me some time and prep such a branch somewhere so i can pull and compile that,
>    that would be great.
> 

Unfortunately there's quite a number of changes. Regarding __netdev_tx_sent_queue()
and watchdog timeout I found the following comment in drivers/net/ethernet/sfc/tx.c,
efx_enqueue_skb():

	if (__netdev_tx_sent_queue(tx_queue->core_txq, skb_len, xmit_more)) {
		struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);

		/* There could be packets left on the partner queue if those
		 * SKBs had skb->xmit_more set. If we do not push those they
		 * could be left for a long time and cause a netdev watchdog.
		 */
		if (txq2->xmit_more_available)
			efx_nic_push_buffers(txq2);

But I'm not sure whether the situation in r8169 is comparable. The following patch
implements what I mentioned earlier: It leaves all other 5.0 changes in place and
effectively reverts 2e6eedb4813e34d8d84ac0eb3afb668966f3f356. Would be great if
you could give it a try.


diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index e8a112149..3cca2ffb2 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6192,7 +6192,6 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	struct device *d = tp_to_dev(tp);
 	dma_addr_t mapping;
 	u32 opts[2], len;
-	bool stop_queue;
 	int frags;
 
 	if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
@@ -6234,6 +6233,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	txd->opts2 = cpu_to_le32(opts[1]);
 
+	netdev_sent_queue(dev, skb->len);
+
 	skb_tx_timestamp(skb);
 
 	/* Force memory writes to complete before releasing descriptor */
@@ -6246,14 +6247,14 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	tp->cur_tx += frags + 1;
 
-	stop_queue = !rtl_tx_slots_avail(tp, MAX_SKB_FRAGS);
-	if (unlikely(stop_queue))
-		netif_stop_queue(dev);
-
-	if (__netdev_sent_queue(dev, skb->len, skb->xmit_more))
-		RTL_W8(tp, TxPoll, NPQ);
+	RTL_W8(tp, TxPoll, NPQ);
 
-	if (unlikely(stop_queue)) {
+	if (!rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
+		/* Avoid wrongly optimistic queue wake-up: rtl_tx thread must
+		 * not miss a ring update when it notices a stopped queue.
+		 */
+		smp_wmb();
+		netif_stop_queue(dev);
 		/* Sync with rtl_tx:
 		 * - publish queue status and cur_tx ring index (write barrier)
 		 * - refresh dirty_tx ring index (read barrier).
-- 
2.20.1


^ permalink raw reply related

* [PATCH] inet_diag: fix reporting cgroup classid and fallback to priority
From: Konstantin Khlebnikov @ 2019-02-09 10:35 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller; +Cc: Sasha Levin, linux-sctp

Field idiag_ext in struct inet_diag_req_v2 used as bitmap of requested
extensions has only 8 bits. Thus extensions starting from DCTCPINFO
cannot be requested directly. Some of them included into response
unconditionally or hook into some of lower 8 bits.

Extension INET_DIAG_CLASS_ID has not way to request from the beginning.

This patch bundle it with INET_DIAG_TCLASS (ipv6 tos), fixes space
reservation, and documents behavior for other extensions.

Also this patch adds fallback to reporting socket priority. This filed
is more widely used for traffic classification because ipv4 sockets
automatically maps TOS to priority and default qdisc pfifo_fast knows
about that. But priority could be changed via setsockopt SO_PRIORITY so
INET_DIAG_TOS isn't enough for predicting class.

Also cgroup2 obsoletes net_cls classid (it always zero), but we cannot
reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).

So, after this patch INET_DIAG_CLASS_ID will report socket priority
for most common setup when net_cls isn't set and/or cgroup2 in use.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 0888e372c37f ("net: inet: diag: expose sockets cgroup classid")
---
 include/uapi/linux/inet_diag.h |   16 +++++++++++-----
 net/ipv4/inet_diag.c           |   10 +++++++++-
 net/sctp/diag.c                |    1 +
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index 14565d703291..e8baca85bac6 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -137,15 +137,21 @@ enum {
 	INET_DIAG_TCLASS,
 	INET_DIAG_SKMEMINFO,
 	INET_DIAG_SHUTDOWN,
-	INET_DIAG_DCTCPINFO,
-	INET_DIAG_PROTOCOL,  /* response attribute only */
+
+	/*
+	 * Next extenstions cannot be requested in struct inet_diag_req_v2:
+	 * its field idiag_ext has only 8 bits.
+	 */
+
+	INET_DIAG_DCTCPINFO,	/* request as INET_DIAG_VEGASINFO */
+	INET_DIAG_PROTOCOL,	/* response attribute only */
 	INET_DIAG_SKV6ONLY,
 	INET_DIAG_LOCALS,
 	INET_DIAG_PEERS,
 	INET_DIAG_PAD,
-	INET_DIAG_MARK,
-	INET_DIAG_BBRINFO,
-	INET_DIAG_CLASS_ID,
+	INET_DIAG_MARK,		/* only with CAP_NET_ADMIN */
+	INET_DIAG_BBRINFO,	/* request as INET_DIAG_VEGASINFO */
+	INET_DIAG_CLASS_ID,	/* request as INET_DIAG_TCLASS */
 	INET_DIAG_MD5SIG,
 	__INET_DIAG_MAX,
 };
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 1a4e9ff02762..5731670c560b 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -108,6 +108,7 @@ static size_t inet_sk_attr_size(struct sock *sk,
 		+ nla_total_size(1) /* INET_DIAG_TOS */
 		+ nla_total_size(1) /* INET_DIAG_TCLASS */
 		+ nla_total_size(4) /* INET_DIAG_MARK */
+		+ nla_total_size(4) /* INET_DIAG_CLASS_ID */
 		+ nla_total_size(sizeof(struct inet_diag_meminfo))
 		+ nla_total_size(sizeof(struct inet_diag_msg))
 		+ nla_total_size(SK_MEMINFO_VARS * sizeof(u32))
@@ -287,12 +288,19 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 			goto errout;
 	}
 
-	if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
+	if (ext & (1 << (INET_DIAG_CLASS_ID - 1)) ||
+	    ext & (1 << (INET_DIAG_TCLASS - 1))) {
 		u32 classid = 0;
 
 #ifdef CONFIG_SOCK_CGROUP_DATA
 		classid = sock_cgroup_classid(&sk->sk_cgrp_data);
 #endif
+		/* Fallback to socket priority if class id isn't set.
+		 * Classful qdiscs use it as direct reference to class.
+		 * For cgroup2 classid is always zero.
+		 */
+		if (!classid)
+			classid = sk->sk_priority;
 
 		if (nla_put_u32(skb, INET_DIAG_CLASS_ID, classid))
 			goto errout;
diff --git a/net/sctp/diag.c b/net/sctp/diag.c
index 078f01a8d582..435847d98b51 100644
--- a/net/sctp/diag.c
+++ b/net/sctp/diag.c
@@ -256,6 +256,7 @@ static size_t inet_assoc_attr_size(struct sctp_association *asoc)
 		+ nla_total_size(1) /* INET_DIAG_TOS */
 		+ nla_total_size(1) /* INET_DIAG_TCLASS */
 		+ nla_total_size(4) /* INET_DIAG_MARK */
+		+ nla_total_size(4) /* INET_DIAG_CLASS_ID */
 		+ nla_total_size(addrlen * asoc->peer.transport_count)
 		+ nla_total_size(addrlen * addrcnt)
 		+ nla_total_size(sizeof(struct inet_diag_meminfo))


^ permalink raw reply related

* Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
From: Sander Eikelenboom @ 2019-02-09 10:07 UTC (permalink / raw)
  To: Heiner Kallweit, Eric Dumazet, Realtek linux nic maintainers,
	Eric Dumazet
  Cc: Linus Torvalds, linux-kernel, netdev
In-Reply-To: <70e9a3fe-158a-c3a2-a427-2343bc6c9031@gmail.com>

On 09/02/2019 10:59, Heiner Kallweit wrote:
> On 09.02.2019 10:34, Sander Eikelenboom wrote:
>> On 09/02/2019 10:02, Heiner Kallweit wrote:
>>> On 09.02.2019 00:09, Eric Dumazet wrote:
>>>>
>>>>
>>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>>>>> L.S.,
>>>>>>>>>>
>>>>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen i the nasty splat below, 
>>>>>>>>>> that I haven encountered with Linux 4.20.x.
>>>>>>>>>>
>>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting could be nasty due to another (networking related) kernel bug.
>>>>>>>>>>
>>>>>>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>>>>>>
>>>>>>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>>>>
>>>>>>>> Hmm i did some diging and i think:
>>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>>>>
>>>>>>> You're right. Thought this was added in 4.20 already.
>>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
>>>>>>> this issue from any user of physical hw. And due to the fact that a lot of mainboards
>>>>>>> have onboard Realtek network I have quite a few testers out there.
>>>>>>> Does the issue occur under specific circumstances like very high load?
>>>>>>
>>>>>> Yep, the box is already quite contented with the Xen VM's and if I remember correctly it occurred while kernel compiling
>>>>>> on the host.
>>>>>>
>>>>>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>>>>>> as author of the underlying changes.
>>>>>>
>>>>>> It could also be the barriers weren't that unneeded as assumed.
>>>>>
>>>>> The barriers were removed after adding xmit_more handling. Therefore it would be good to
>>>>> test also with only 
>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>> removed.
>>>>>
>>>>>> Since we are almost at RC6 i took the liberty to CC Eric now.
>>>>>>
>>>>> Sure, thanks.
>>>>>
>>>>>> BTW am i correct these patches are merely optimizations ?
>>>>>
>>>>> Yes
>>>>>
>>>>>> If so and concluding they revert cleanly, perhaps it should be considered at this point in the RC's
>>>>>> to revert them for 5.0 and try again for 5.1 ?
>>>>>>
>>>>> Before removing both it would be good to test with only the barrier-removal removed.
>>>>>
>>>>
>>>> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>> looks buggy to me, since the skb might have been freed already on another cpu when you call
>>>>
>>>> You could try :
>>>>
>>>> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
>>>> index 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93 100644
>>>> --- a/drivers/net/ethernet/realtek/r8169.c
>>>> +++ b/drivers/net/ethernet/realtek/r8169.c
>>>> @@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>>         dma_addr_t mapping;
>>>>         u32 opts[2], len;
>>>>         bool stop_queue;
>>>> +       bool door_bell;
>>>>         int frags;
>>>>  
>>>>         if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
>>>> @@ -6116,6 +6117,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>>         /* Force memory writes to complete before releasing descriptor */
>>>>         dma_wmb();
>>>>  
>>>> +       door_bell = __netdev_sent_queue(dev, skb->len, skb->xmit_more);
>>>> +
>>>>         txd->opts1 = rtl8169_get_txd_opts1(opts[0], len, entry);
>>>>  
>>>>         /* Force all memory writes to complete before notifying device */
>>>> @@ -6127,7 +6130,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>>         if (unlikely(stop_queue))
>>>>                 netif_stop_queue(dev);
>>>>  
>>>> -       if (__netdev_sent_queue(dev, skb->len, skb->xmit_more)) {
>>>> +       if (door_bell) {
>>>>                 RTL_W8(tp, TxPoll, NPQ);
>>>>                 mmiowb();
>>>>         }
>>>>
>>> Thanks a lot for checking and for the proposed fix.
>>> Sander, can you try with this patch on top of 5.0-rc5 w/o removing two two commits?
>>
>> I have done that already during the night .. the results:
>> - I can confirm 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 is the first commit which causes hitting the BUG_ON in lib/dynamic_queue_limits.c.
>>   (in other word, with only reverting bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 it still blows up).
>>
>> - The Eric's patch only applies cleanly with bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 reverted, so that's what I tested.
>>   The patch seems to prevent hitting the BUG_ON in lib/dynamic_queue_limits.c, it has run this night and I gave done a few kernel compiles
>>   this morning. How ever during these kernel compiles i'm getting a transmit queue timeout which i haven't seen with 4.20.x, although i regularly
>>   compile kernels in the same way as I do now. The only thing I can't say if that is due to this change, or if it's again something else.
>>   Which makes me somewhat inclined to go testing the complete revert some more and see if I can trigger the queue timeout on that or not.
>>
>>   If I can, it is a separate issue.
>>   If I can't it seems even with a patch it still seems as a regression in comparison with 4.20.x, for which
>>   a revert would be the right thing to do (since as you indicated these are merely optimizations), 
>>   which would give us more time for 5.1 to try to solve things on top of the 5.0-release-to-be.
>>   (especially since I seem to still have other issues which need to be sorted out and time is limited)
>>
>>   The timeout in question:
>>         [28336.869479] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
>>         [28336.881498] WARNING: CPU: 0 PID: 6925 at net/sched/sch_generic.c:461 dev_watchdog+0x20b/0x210
>>         [28336.893358] Modules linked in:
>>         [28336.904106] CPU: 0 PID: 6925 Comm: cc1 Tainted: G      D           5.0.0-rc5-20190208-thp-net-florian-rtl8169-eric-doflr+ #1
>>         [28336.917385] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>>         [28336.928988] RIP: e030:dev_watchdog+0x20b/0x210
>>         [28336.940623] Code: 00 49 63 4e e0 eb 90 4c 89 e7 c6 05 ad d8 f1 00 01 e8 a9 32 fd ff 89 d9 48 89 c2 4c 89 e6 48 c7 c7 50 59 89 82 e8 e5 92 4d ff <0f> 0b eb c0 90 48 c7 47 08 00 00 00 00 48 c7 07 00 00 00 00 0f b7
>>         [28336.965265] RSP: e02b:ffff88807d403ea0 EFLAGS: 00010286
>>         [28336.977465] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff82a69db8
>>         [28336.991265] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000200
>>         [28337.008865] RBP: ffff88807936e41c R08: 0000000000000000 R09: 0000000000000819
>>         [28337.022250] R10: 0000000000000202 R11: ffffffff8247ca80 R12: ffff88807936e000
>>         [28337.035204] R13: 0000000000000000 R14: ffff88807936e440 R15: 0000000000000001
>>         [28337.049832] FS:  00007f53e9bf3840(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
>>         [28337.062524] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>         [28337.075086] CR2: 00007f53e60c4000 CR3: 000000001a0be000 CR4: 0000000000000660
>>         [28337.090052] Call Trace:
>>         [28337.103615]  <IRQ>
>>         [28337.116587]  ? qdisc_destroy+0x120/0x120
>>         [28337.128905]  call_timer_fn+0x19/0x90
>>         [28337.141892]  expire_timers+0x8b/0xa0
>>         [28337.153354]  run_timer_softirq+0x7e/0x160
>>         [28337.165931]  ? handle_irq_event_percpu+0x4c/0x70
>>         [28337.176548]  ? handle_percpu_irq+0x32/0x50
>>         [28337.186734]  __do_softirq+0xed/0x229
>>         [28337.196404]  ? hypervisor_callback+0xa/0x20
>>         [28337.207822]  irq_exit+0xb7/0xc0
>>         [28337.218978]  xen_evtchn_do_upcall+0x27/0x40
>>         [28337.230763]  xen_do_hypervisor_callback+0x29/0x40
>>         [28337.241261]  </IRQ>
>>         [28337.253283] RIP: e033:0xff7e62
>>         [28337.264899] Code: 35 43 0f c7 00 4c 89 ef e8 8b 6d 67 ff 0f 1f 00 44 89 e0 44 89 e2 c1 e8 06 83 e2 3f 48 8b 0c c5 40 8d c6 01 48 0f a3 d1 72 0e <48> 8b 04 c5 50 8d c6 01 48 0f a3 d0 73 0b 44 89 e6 4c 89 ef e8 b5
>>         [28337.288677] RSP: e02b:00007fff0fc6a340 EFLAGS: 00000202
>>         [28337.299234] RAX: 0000000000000000 RBX: 00007f53e60c3580 RCX: 0000000000000000
>>         [28337.309577] RDX: 0000000000000034 RSI: 0000000001e71a98 RDI: 00007fff0fc6a538
>>         [28337.320724] RBP: 00007fff0fc6a4b0 R08: 0000000000000000 R09: 0000000000000000
>>         [28337.331829] R10: 0000000000000001 R11: 00000000020cb3d0 R12: 0000000000000034
>>         [28337.343900] R13: 00007fff0fc6a538 R14: 0000000000000000 R15: 0000000000000001
>>         [28337.353977] ---[ end trace 6ff49f09286816b7 ]---
>>
> Thanks for your efforts. As usual this tx timeout trace says basically nothing except
> "timeout" and root cause could be anything. Earlier you reported a memory allocation error,
> did that occur again?
> If we decide to revert, I'd leave removal of the memory barriers in (as it doesn't seem to
> contribute to the issue) and just submit a patch to effectively revert
> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356.

I can't say if that is correct, because i haven't tested that.

Another thing I could test is:
 - putting all the r8169 patches (and prerequisites) that went into 5.0 
   up to bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3, onto 4.20.7 and see what that does.
   If that would be feasible (not too many needed prerequisites out of r8169) and if 
   you could spare me some time and prep such a branch somewhere so i can pull and compile that,
   that would be great.

--
Sander

>> --
>> Sander
>>
>>  
>>>>
>>>> .
>>>>
>>> Heiner
>>>
>>
>>


^ permalink raw reply

* Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
From: Heiner Kallweit @ 2019-02-09  9:59 UTC (permalink / raw)
  To: Sander Eikelenboom, Eric Dumazet, Realtek linux nic maintainers,
	Eric Dumazet
  Cc: Linus Torvalds, linux-kernel, netdev
In-Reply-To: <6b4e8aa0-03c5-c0a8-439e-77daabb07416@eikelenboom.it>

On 09.02.2019 10:34, Sander Eikelenboom wrote:
> On 09/02/2019 10:02, Heiner Kallweit wrote:
>> On 09.02.2019 00:09, Eric Dumazet wrote:
>>>
>>>
>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>>>> L.S.,
>>>>>>>>>
>>>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen i the nasty splat below, 
>>>>>>>>> that I haven encountered with Linux 4.20.x.
>>>>>>>>>
>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting could be nasty due to another (networking related) kernel bug.
>>>>>>>>>
>>>>>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>>>>>
>>>>>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>>>
>>>>>>> Hmm i did some diging and i think:
>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>>>
>>>>>> You're right. Thought this was added in 4.20 already.
>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
>>>>>> this issue from any user of physical hw. And due to the fact that a lot of mainboards
>>>>>> have onboard Realtek network I have quite a few testers out there.
>>>>>> Does the issue occur under specific circumstances like very high load?
>>>>>
>>>>> Yep, the box is already quite contented with the Xen VM's and if I remember correctly it occurred while kernel compiling
>>>>> on the host.
>>>>>
>>>>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>>>>> as author of the underlying changes.
>>>>>
>>>>> It could also be the barriers weren't that unneeded as assumed.
>>>>
>>>> The barriers were removed after adding xmit_more handling. Therefore it would be good to
>>>> test also with only 
>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>> removed.
>>>>
>>>>> Since we are almost at RC6 i took the liberty to CC Eric now.
>>>>>
>>>> Sure, thanks.
>>>>
>>>>> BTW am i correct these patches are merely optimizations ?
>>>>
>>>> Yes
>>>>
>>>>> If so and concluding they revert cleanly, perhaps it should be considered at this point in the RC's
>>>>> to revert them for 5.0 and try again for 5.1 ?
>>>>>
>>>> Before removing both it would be good to test with only the barrier-removal removed.
>>>>
>>>
>>> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>> looks buggy to me, since the skb might have been freed already on another cpu when you call
>>>
>>> You could try :
>>>
>>> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
>>> index 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93 100644
>>> --- a/drivers/net/ethernet/realtek/r8169.c
>>> +++ b/drivers/net/ethernet/realtek/r8169.c
>>> @@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>         dma_addr_t mapping;
>>>         u32 opts[2], len;
>>>         bool stop_queue;
>>> +       bool door_bell;
>>>         int frags;
>>>  
>>>         if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
>>> @@ -6116,6 +6117,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>         /* Force memory writes to complete before releasing descriptor */
>>>         dma_wmb();
>>>  
>>> +       door_bell = __netdev_sent_queue(dev, skb->len, skb->xmit_more);
>>> +
>>>         txd->opts1 = rtl8169_get_txd_opts1(opts[0], len, entry);
>>>  
>>>         /* Force all memory writes to complete before notifying device */
>>> @@ -6127,7 +6130,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>         if (unlikely(stop_queue))
>>>                 netif_stop_queue(dev);
>>>  
>>> -       if (__netdev_sent_queue(dev, skb->len, skb->xmit_more)) {
>>> +       if (door_bell) {
>>>                 RTL_W8(tp, TxPoll, NPQ);
>>>                 mmiowb();
>>>         }
>>>
>> Thanks a lot for checking and for the proposed fix.
>> Sander, can you try with this patch on top of 5.0-rc5 w/o removing two two commits?
> 
> I have done that already during the night .. the results:
> - I can confirm 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 is the first commit which causes hitting the BUG_ON in lib/dynamic_queue_limits.c.
>   (in other word, with only reverting bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 it still blows up).
> 
> - The Eric's patch only applies cleanly with bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 reverted, so that's what I tested.
>   The patch seems to prevent hitting the BUG_ON in lib/dynamic_queue_limits.c, it has run this night and I gave done a few kernel compiles
>   this morning. How ever during these kernel compiles i'm getting a transmit queue timeout which i haven't seen with 4.20.x, although i regularly
>   compile kernels in the same way as I do now. The only thing I can't say if that is due to this change, or if it's again something else.
>   Which makes me somewhat inclined to go testing the complete revert some more and see if I can trigger the queue timeout on that or not.
> 
>   If I can, it is a separate issue.
>   If I can't it seems even with a patch it still seems as a regression in comparison with 4.20.x, for which
>   a revert would be the right thing to do (since as you indicated these are merely optimizations), 
>   which would give us more time for 5.1 to try to solve things on top of the 5.0-release-to-be.
>   (especially since I seem to still have other issues which need to be sorted out and time is limited)
> 
>   The timeout in question:
>         [28336.869479] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
>         [28336.881498] WARNING: CPU: 0 PID: 6925 at net/sched/sch_generic.c:461 dev_watchdog+0x20b/0x210
>         [28336.893358] Modules linked in:
>         [28336.904106] CPU: 0 PID: 6925 Comm: cc1 Tainted: G      D           5.0.0-rc5-20190208-thp-net-florian-rtl8169-eric-doflr+ #1
>         [28336.917385] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>         [28336.928988] RIP: e030:dev_watchdog+0x20b/0x210
>         [28336.940623] Code: 00 49 63 4e e0 eb 90 4c 89 e7 c6 05 ad d8 f1 00 01 e8 a9 32 fd ff 89 d9 48 89 c2 4c 89 e6 48 c7 c7 50 59 89 82 e8 e5 92 4d ff <0f> 0b eb c0 90 48 c7 47 08 00 00 00 00 48 c7 07 00 00 00 00 0f b7
>         [28336.965265] RSP: e02b:ffff88807d403ea0 EFLAGS: 00010286
>         [28336.977465] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff82a69db8
>         [28336.991265] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000200
>         [28337.008865] RBP: ffff88807936e41c R08: 0000000000000000 R09: 0000000000000819
>         [28337.022250] R10: 0000000000000202 R11: ffffffff8247ca80 R12: ffff88807936e000
>         [28337.035204] R13: 0000000000000000 R14: ffff88807936e440 R15: 0000000000000001
>         [28337.049832] FS:  00007f53e9bf3840(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
>         [28337.062524] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>         [28337.075086] CR2: 00007f53e60c4000 CR3: 000000001a0be000 CR4: 0000000000000660
>         [28337.090052] Call Trace:
>         [28337.103615]  <IRQ>
>         [28337.116587]  ? qdisc_destroy+0x120/0x120
>         [28337.128905]  call_timer_fn+0x19/0x90
>         [28337.141892]  expire_timers+0x8b/0xa0
>         [28337.153354]  run_timer_softirq+0x7e/0x160
>         [28337.165931]  ? handle_irq_event_percpu+0x4c/0x70
>         [28337.176548]  ? handle_percpu_irq+0x32/0x50
>         [28337.186734]  __do_softirq+0xed/0x229
>         [28337.196404]  ? hypervisor_callback+0xa/0x20
>         [28337.207822]  irq_exit+0xb7/0xc0
>         [28337.218978]  xen_evtchn_do_upcall+0x27/0x40
>         [28337.230763]  xen_do_hypervisor_callback+0x29/0x40
>         [28337.241261]  </IRQ>
>         [28337.253283] RIP: e033:0xff7e62
>         [28337.264899] Code: 35 43 0f c7 00 4c 89 ef e8 8b 6d 67 ff 0f 1f 00 44 89 e0 44 89 e2 c1 e8 06 83 e2 3f 48 8b 0c c5 40 8d c6 01 48 0f a3 d1 72 0e <48> 8b 04 c5 50 8d c6 01 48 0f a3 d0 73 0b 44 89 e6 4c 89 ef e8 b5
>         [28337.288677] RSP: e02b:00007fff0fc6a340 EFLAGS: 00000202
>         [28337.299234] RAX: 0000000000000000 RBX: 00007f53e60c3580 RCX: 0000000000000000
>         [28337.309577] RDX: 0000000000000034 RSI: 0000000001e71a98 RDI: 00007fff0fc6a538
>         [28337.320724] RBP: 00007fff0fc6a4b0 R08: 0000000000000000 R09: 0000000000000000
>         [28337.331829] R10: 0000000000000001 R11: 00000000020cb3d0 R12: 0000000000000034
>         [28337.343900] R13: 00007fff0fc6a538 R14: 0000000000000000 R15: 0000000000000001
>         [28337.353977] ---[ end trace 6ff49f09286816b7 ]---
> 
Thanks for your efforts. As usual this tx timeout trace says basically nothing except
"timeout" and root cause could be anything. Earlier you reported a memory allocation error,
did that occur again?
If we decide to revert, I'd leave removal of the memory barriers in (as it doesn't seem to
contribute to the issue) and just submit a patch to effectively revert
2e6eedb4813e34d8d84ac0eb3afb668966f3f356.

> --
> Sander
> 
>  
>>>
>>> .
>>>
>> Heiner
>>
> 
> 

^ permalink raw reply

* Re: [PATCH 13/19] net: split out functions related to registering inflight socket files
From: Hannes Reinecke @ 2019-02-09  9:49 UTC (permalink / raw)
  To: Jens Axboe, linux-aio, linux-block, linux-api
  Cc: hch, jmoyer, avi, jannh, viro, netdev, David S . Miller
In-Reply-To: <20190208173423.27014-14-axboe@kernel.dk>

On 2/8/19 6:34 PM, Jens Axboe wrote:
> We need this functionality for the io_uring file registration, but
> we cannot rely on it since CONFIG_UNIX can be modular. Move the helpers
> to a separate file, that's always builtin to the kernel if CONFIG_UNIX is
> m/y.
> 
> No functional changes in this patch, just moving code around.
> 
> Cc: netdev@vger.kernel.org
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>   include/net/af_unix.h |   1 +
>   net/unix/Kconfig      |   5 ++
>   net/unix/Makefile     |   2 +
>   net/unix/af_unix.c    |  63 +-----------------
>   net/unix/garbage.c    |  71 +-------------------
>   net/unix/scm.c        | 146 ++++++++++++++++++++++++++++++++++++++++++
>   net/unix/scm.h        |  10 +++
>   7 files changed, 168 insertions(+), 130 deletions(-)
>   create mode 100644 net/unix/scm.c
>   create mode 100644 net/unix/scm.h
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes



^ permalink raw reply

* Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
From: Sander Eikelenboom @ 2019-02-09  9:34 UTC (permalink / raw)
  To: Heiner Kallweit, Eric Dumazet, Realtek linux nic maintainers,
	Eric Dumazet
  Cc: Linus Torvalds, linux-kernel, netdev
In-Reply-To: <aa7eb814-ebf7-ffa5-ce61-317b9bbf2394@gmail.com>

On 09/02/2019 10:02, Heiner Kallweit wrote:
> On 09.02.2019 00:09, Eric Dumazet wrote:
>>
>>
>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>>> L.S.,
>>>>>>>>
>>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen i the nasty splat below, 
>>>>>>>> that I haven encountered with Linux 4.20.x.
>>>>>>>>
>>>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting could be nasty due to another (networking related) kernel bug.
>>>>>>>>
>>>>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>>>>
>>>>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>>
>>>>>> Hmm i did some diging and i think:
>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>>
>>>>> You're right. Thought this was added in 4.20 already.
>>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
>>>>> this issue from any user of physical hw. And due to the fact that a lot of mainboards
>>>>> have onboard Realtek network I have quite a few testers out there.
>>>>> Does the issue occur under specific circumstances like very high load?
>>>>
>>>> Yep, the box is already quite contented with the Xen VM's and if I remember correctly it occurred while kernel compiling
>>>> on the host.
>>>>
>>>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>>>> as author of the underlying changes.
>>>>
>>>> It could also be the barriers weren't that unneeded as assumed.
>>>
>>> The barriers were removed after adding xmit_more handling. Therefore it would be good to
>>> test also with only 
>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>> removed.
>>>
>>>> Since we are almost at RC6 i took the liberty to CC Eric now.
>>>>
>>> Sure, thanks.
>>>
>>>> BTW am i correct these patches are merely optimizations ?
>>>
>>> Yes
>>>
>>>> If so and concluding they revert cleanly, perhaps it should be considered at this point in the RC's
>>>> to revert them for 5.0 and try again for 5.1 ?
>>>>
>>> Before removing both it would be good to test with only the barrier-removal removed.
>>>
>>
>> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>> looks buggy to me, since the skb might have been freed already on another cpu when you call
>>
>> You could try :
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
>> index 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93 100644
>> --- a/drivers/net/ethernet/realtek/r8169.c
>> +++ b/drivers/net/ethernet/realtek/r8169.c
>> @@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>         dma_addr_t mapping;
>>         u32 opts[2], len;
>>         bool stop_queue;
>> +       bool door_bell;
>>         int frags;
>>  
>>         if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
>> @@ -6116,6 +6117,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>         /* Force memory writes to complete before releasing descriptor */
>>         dma_wmb();
>>  
>> +       door_bell = __netdev_sent_queue(dev, skb->len, skb->xmit_more);
>> +
>>         txd->opts1 = rtl8169_get_txd_opts1(opts[0], len, entry);
>>  
>>         /* Force all memory writes to complete before notifying device */
>> @@ -6127,7 +6130,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>         if (unlikely(stop_queue))
>>                 netif_stop_queue(dev);
>>  
>> -       if (__netdev_sent_queue(dev, skb->len, skb->xmit_more)) {
>> +       if (door_bell) {
>>                 RTL_W8(tp, TxPoll, NPQ);
>>                 mmiowb();
>>         }
>>
> Thanks a lot for checking and for the proposed fix.
> Sander, can you try with this patch on top of 5.0-rc5 w/o removing two two commits?

I have done that already during the night .. the results:
- I can confirm 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 is the first commit which causes hitting the BUG_ON in lib/dynamic_queue_limits.c.
  (in other word, with only reverting bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 it still blows up).

- The Eric's patch only applies cleanly with bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 reverted, so that's what I tested.
  The patch seems to prevent hitting the BUG_ON in lib/dynamic_queue_limits.c, it has run this night and I gave done a few kernel compiles
  this morning. How ever during these kernel compiles i'm getting a transmit queue timeout which i haven't seen with 4.20.x, although i regularly
  compile kernels in the same way as I do now. The only thing I can't say if that is due to this change, or if it's again something else.
  Which makes me somewhat inclined to go testing the complete revert some more and see if I can trigger the queue timeout on that or not.

  If I can, it is a separate issue.
  If I can't it seems even with a patch it still seems as a regression in comparison with 4.20.x, for which
  a revert would be the right thing to do (since as you indicated these are merely optimizations), 
  which would give us more time for 5.1 to try to solve things on top of the 5.0-release-to-be.
  (especially since I seem to still have other issues which need to be sorted out and time is limited)

  The timeout in question:
        [28336.869479] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
        [28336.881498] WARNING: CPU: 0 PID: 6925 at net/sched/sch_generic.c:461 dev_watchdog+0x20b/0x210
        [28336.893358] Modules linked in:
        [28336.904106] CPU: 0 PID: 6925 Comm: cc1 Tainted: G      D           5.0.0-rc5-20190208-thp-net-florian-rtl8169-eric-doflr+ #1
        [28336.917385] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
        [28336.928988] RIP: e030:dev_watchdog+0x20b/0x210
        [28336.940623] Code: 00 49 63 4e e0 eb 90 4c 89 e7 c6 05 ad d8 f1 00 01 e8 a9 32 fd ff 89 d9 48 89 c2 4c 89 e6 48 c7 c7 50 59 89 82 e8 e5 92 4d ff <0f> 0b eb c0 90 48 c7 47 08 00 00 00 00 48 c7 07 00 00 00 00 0f b7
        [28336.965265] RSP: e02b:ffff88807d403ea0 EFLAGS: 00010286
        [28336.977465] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff82a69db8
        [28336.991265] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000200
        [28337.008865] RBP: ffff88807936e41c R08: 0000000000000000 R09: 0000000000000819
        [28337.022250] R10: 0000000000000202 R11: ffffffff8247ca80 R12: ffff88807936e000
        [28337.035204] R13: 0000000000000000 R14: ffff88807936e440 R15: 0000000000000001
        [28337.049832] FS:  00007f53e9bf3840(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
        [28337.062524] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
        [28337.075086] CR2: 00007f53e60c4000 CR3: 000000001a0be000 CR4: 0000000000000660
        [28337.090052] Call Trace:
        [28337.103615]  <IRQ>
        [28337.116587]  ? qdisc_destroy+0x120/0x120
        [28337.128905]  call_timer_fn+0x19/0x90
        [28337.141892]  expire_timers+0x8b/0xa0
        [28337.153354]  run_timer_softirq+0x7e/0x160
        [28337.165931]  ? handle_irq_event_percpu+0x4c/0x70
        [28337.176548]  ? handle_percpu_irq+0x32/0x50
        [28337.186734]  __do_softirq+0xed/0x229
        [28337.196404]  ? hypervisor_callback+0xa/0x20
        [28337.207822]  irq_exit+0xb7/0xc0
        [28337.218978]  xen_evtchn_do_upcall+0x27/0x40
        [28337.230763]  xen_do_hypervisor_callback+0x29/0x40
        [28337.241261]  </IRQ>
        [28337.253283] RIP: e033:0xff7e62
        [28337.264899] Code: 35 43 0f c7 00 4c 89 ef e8 8b 6d 67 ff 0f 1f 00 44 89 e0 44 89 e2 c1 e8 06 83 e2 3f 48 8b 0c c5 40 8d c6 01 48 0f a3 d1 72 0e <48> 8b 04 c5 50 8d c6 01 48 0f a3 d0 73 0b 44 89 e6 4c 89 ef e8 b5
        [28337.288677] RSP: e02b:00007fff0fc6a340 EFLAGS: 00000202
        [28337.299234] RAX: 0000000000000000 RBX: 00007f53e60c3580 RCX: 0000000000000000
        [28337.309577] RDX: 0000000000000034 RSI: 0000000001e71a98 RDI: 00007fff0fc6a538
        [28337.320724] RBP: 00007fff0fc6a4b0 R08: 0000000000000000 R09: 0000000000000000
        [28337.331829] R10: 0000000000000001 R11: 00000000020cb3d0 R12: 0000000000000034
        [28337.343900] R13: 00007fff0fc6a538 R14: 0000000000000000 R15: 0000000000000001
        [28337.353977] ---[ end trace 6ff49f09286816b7 ]---

--
Sander

 
>>
>> .
>>
> Heiner
> 


^ permalink raw reply

* Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
From: Heiner Kallweit @ 2019-02-09  9:02 UTC (permalink / raw)
  To: Eric Dumazet, Sander Eikelenboom, Realtek linux nic maintainers,
	Eric Dumazet
  Cc: Linus Torvalds, linux-kernel, netdev
In-Reply-To: <e6423f51-42cc-e49f-bb48-85ea922f01b6@gmail.com>

On 09.02.2019 00:09, Eric Dumazet wrote:
> 
> 
> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>> L.S.,
>>>>>>>
>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen i the nasty splat below, 
>>>>>>> that I haven encountered with Linux 4.20.x.
>>>>>>>
>>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting could be nasty due to another (networking related) kernel bug.
>>>>>>>
>>>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>>>
>>>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>
>>>>> Hmm i did some diging and i think:
>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>
>>>> You're right. Thought this was added in 4.20 already.
>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
>>>> this issue from any user of physical hw. And due to the fact that a lot of mainboards
>>>> have onboard Realtek network I have quite a few testers out there.
>>>> Does the issue occur under specific circumstances like very high load?
>>>
>>> Yep, the box is already quite contented with the Xen VM's and if I remember correctly it occurred while kernel compiling
>>> on the host.
>>>
>>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>>> as author of the underlying changes.
>>>
>>> It could also be the barriers weren't that unneeded as assumed.
>>
>> The barriers were removed after adding xmit_more handling. Therefore it would be good to
>> test also with only 
>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>> removed.
>>
>>> Since we are almost at RC6 i took the liberty to CC Eric now.
>>>
>> Sure, thanks.
>>
>>> BTW am i correct these patches are merely optimizations ?
>>
>> Yes
>>
>>> If so and concluding they revert cleanly, perhaps it should be considered at this point in the RC's
>>> to revert them for 5.0 and try again for 5.1 ?
>>>
>> Before removing both it would be good to test with only the barrier-removal removed.
>>
> 
> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
> looks buggy to me, since the skb might have been freed already on another cpu when you call
> 
> You could try :
> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>         dma_addr_t mapping;
>         u32 opts[2], len;
>         bool stop_queue;
> +       bool door_bell;
>         int frags;
>  
>         if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
> @@ -6116,6 +6117,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>         /* Force memory writes to complete before releasing descriptor */
>         dma_wmb();
>  
> +       door_bell = __netdev_sent_queue(dev, skb->len, skb->xmit_more);
> +
>         txd->opts1 = rtl8169_get_txd_opts1(opts[0], len, entry);
>  
>         /* Force all memory writes to complete before notifying device */
> @@ -6127,7 +6130,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>         if (unlikely(stop_queue))
>                 netif_stop_queue(dev);
>  
> -       if (__netdev_sent_queue(dev, skb->len, skb->xmit_more)) {
> +       if (door_bell) {
>                 RTL_W8(tp, TxPoll, NPQ);
>                 mmiowb();
>         }
> 
Thanks a lot for checking and for the proposed fix.
Sander, can you try with this patch on top of 5.0-rc5 w/o removing two two commits?

> 
> .
> 
Heiner

^ permalink raw reply

* Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
From: Heiner Kallweit @ 2019-02-09  9:10 UTC (permalink / raw)
  To: Sander Eikelenboom, Realtek linux nic maintainers, Eric Dumazet
  Cc: Linus Torvalds, linux-kernel, netdev
In-Reply-To: <7084be7a-c279-080d-d1ec-cd604f2b2b14@eikelenboom.it>

On 09.02.2019 00:34, Sander Eikelenboom wrote:
> On 08/02/2019 22:50, Heiner Kallweit wrote:
>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>> L.S.,
>>>>>>>
>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen i the nasty splat below, 
>>>>>>> that I haven encountered with Linux 4.20.x.
>>>>>>>
>>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting could be nasty due to another (networking related) kernel bug.
>>>>>>>
>>>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>>>
>>>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>
>>>>> Hmm i did some diging and i think:
>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>
>>>> You're right. Thought this was added in 4.20 already.
>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
>>>> this issue from any user of physical hw. And due to the fact that a lot of mainboards
>>>> have onboard Realtek network I have quite a few testers out there.
>>>> Does the issue occur under specific circumstances like very high load?
>>>
>>> Yep, the box is already quite contented with the Xen VM's and if I remember correctly it occurred while kernel compiling
>>> on the host.
>>>
>>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>>> as author of the underlying changes.
>>>
>>> It could also be the barriers weren't that unneeded as assumed.
>>
>> The barriers were removed after adding xmit_more handling. Therefore it would be good to
>> test also with only 
>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>> removed.
> 
> *arghh* *grmbl*
> 
> with both:
>     bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3
>     and
>     2e6eedb4813e34d8d84ac0eb3afb668966f3f356 
> reverted i get yet another splat:
> 
Puh, I'm not a memory management expert. The traces include also a failed memory
allocation from a file system operation. Maybe the system is going low on memory?
The issue occurs so deep in the memory mgmt, that I wonder if and how this could
be caused by the network driver.


> [ 3769.246083] ld: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
> [ 3769.246095] CPU: 2 PID: 3201 Comm: ld Not tainted 5.0.0-rc5-20190208-thp-net-florian-rtl8169-doflr+ #1
> [ 3769.246096] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
> [ 3769.246098] Call Trace:
> [ 3769.246104]  <IRQ>
> [ 3769.246114]  dump_stack+0x5c/0x7b
> [ 3769.246120]  warn_alloc+0x103/0x190
> [ 3769.246122]  __alloc_pages_nodemask+0xe3d/0xe80
> [ 3769.246128]  ? inet_gro_receive+0x232/0x2c0
> [ 3769.246130]  page_frag_alloc+0x117/0x150
> [ 3769.246132]  __napi_alloc_skb+0x83/0xd0
> [ 3769.246137]  rtl8169_poll+0x210/0x640
> [ 3769.246140]  net_rx_action+0x23d/0x370
> [ 3769.246145]  __do_softirq+0xed/0x229
> [ 3769.246149]  irq_exit+0xb7/0xc0
> [ 3769.246152]  xen_evtchn_do_upcall+0x27/0x40
> [ 3769.246154]  xen_do_hypervisor_callback+0x29/0x40
> [ 3769.246155]  </IRQ>
> [ 3769.246161] RIP: e030:__pv_queued_spin_lock_slowpath+0xda/0x280
> [ 3769.246163] Code: 14 41 bc 01 00 00 00 41 bd 00 01 00 00 3c 02 0f 94 c0 0f b6 c0 48 89 04 24 c6 45 14 00 ba 00 80 00 00 c6 43 01 01 eb 0b f3 90 <83> ea 01 0f 84 49 01 00 00 0f b6 03 84 c0 75 ee 44 89 e8 f0 66 44
> [ 3769.246164] RSP: e02b:ffffc90005b0f780 EFLAGS: 00000202
> [ 3769.246166] RAX: 0000000000000001 RBX: ffff8880047c9200 RCX: 0000000000000001
> [ 3769.246167] RDX: 0000000000007d75 RSI: 0000000000000000 RDI: ffff8880047c9200
> [ 3769.246167] RBP: ffff88807d4a1a80 R08: ffffc90005b0f978 R09: ffffc90005b0f978
> [ 3769.246168] R10: ffffc90005b0f9d0 R11: ffff88807fc17000 R12: 0000000000000001
> [ 3769.246169] R13: 0000000000000100 R14: 0000000000000000 R15: 00000000000c0000
> [ 3769.246173]  _raw_spin_lock+0x16/0x20
> [ 3769.246176]  list_lru_add+0x59/0x170
> [ 3769.246179]  inode_lru_list_add+0x1b/0x40
> [ 3769.246182]  iput+0x18b/0x1a0
> [ 3769.246184]  __dentry_kill+0xc5/0x170
> [ 3769.246186]  shrink_dentry_list+0x93/0x1c0
> [ 3769.246187]  prune_dcache_sb+0x4d/0x70
> [ 3769.246191]  super_cache_scan+0x104/0x190
> [ 3769.246194]  do_shrink_slab+0x12c/0x1e0
> [ 3769.246196]  shrink_slab+0xdf/0x2b0
> [ 3769.246198]  shrink_node+0x158/0x470
> [ 3769.246200]  do_try_to_free_pages+0xd1/0x380
> [ 3769.246202]  try_to_free_pages+0xb2/0xe0
> [ 3769.246204]  __alloc_pages_nodemask+0x603/0xe80
> [ 3769.246207]  ? xas_load+0x9/0x80
> [ 3769.246209]  ? find_get_entry+0x58/0x120
> [ 3769.246210]  pagecache_get_page+0xde/0x210
> [ 3769.246213]  grab_cache_page_write_begin+0x17/0x30
> [ 3769.246215]  ext4_da_write_begin+0xc4/0x340
> [ 3769.246217]  generic_perform_write+0xb8/0x1b0
> [ 3769.246219]  __generic_file_write_iter+0x13c/0x1b0
> [ 3769.246223]  ext4_file_write_iter+0x121/0x3c0
> [ 3769.246225]  __vfs_write+0x123/0x1a0
> [ 3769.246226]  vfs_write+0xab/0x1a0
> [ 3769.246229]  ksys_write+0x4d/0xc0
> [ 3769.246232]  do_syscall_64+0x49/0x100
> [ 3769.246234]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 3769.246237] RIP: 0033:0x7fee5b265730
> [ 3769.246238] Code: 73 01 c3 48 8b 0d 68 d7 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d d9 2f 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 7e 9b 01 00 48 89 04 24
> [ 3769.246239] RSP: 002b:00007fff33183dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 3769.246240] RAX: ffffffffffffffda RBX: 0000000000000710 RCX: 00007fee5b265730
> [ 3769.246241] RDX: 0000000000000710 RSI: 000055559bed78b0 RDI: 0000000000000049
> [ 3769.246241] RBP: 000055559bed78b0 R08: 0000000000000b40 R09: 0000000001c0320c
> [ 3769.246242] R10: 00007fee5be91e80 R11: 0000000000000246 R12: 0000000000000710
> [ 3769.246243] R13: 0000000000000001 R14: 00005555a2690050 R15: 0000000000000710
> [ 3769.246244] Mem-Info:
> [ 3769.246249] active_anon:152383 inactive_anon:99216 isolated_anon:0
>                 active_file:51569 inactive_file:85922 isolated_file:0
>                 unevictable:552 dirty:6866 writeback:0 unstable:0
>                 slab_reclaimable:6707 slab_unreclaimable:16166
>                 mapped:1870 shmem:6 pagetables:2716 bounce:0
>                 free:3639 free_pcp:900 free_cma:0
> [ 3769.246252] Node 0 active_anon:609532kB inactive_anon:396864kB active_file:206276kB inactive_file:343688kB unevictable:2208kB isolated(anon):0kB isolated(file):0kB mapped:7480kB dirty:27464kB writeback:0kB shmem:24kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> [ 3769.246253] Node 0 DMA free:7480kB min:44kB low:56kB high:68kB active_anon:8056kB inactive_anon:0kB active_file:92kB inactive_file:148kB unevictable:0kB writepending:8kB present:15956kB managed:15872kB mlocked:0kB kernel_stack:0kB pagetables:20kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 3769.246256] lowmem_reserve[]: 0 1865 1865 1865
> [ 3769.246258] Node 0 DMA32 free:7076kB min:19472kB low:21380kB high:23288kB active_anon:601840kB inactive_anon:396512kB active_file:206216kB inactive_file:343644kB unevictable:2208kB writepending:27256kB present:2080768kB managed:1833792kB mlocked:2208kB kernel_stack:9392kB pagetables:10844kB bounce:0kB free_pcp:3600kB local_pcp:596kB free_cma:0kB
> [ 3769.246260] lowmem_reserve[]: 0 0 0 0
> [ 3769.246262] Node 0 DMA: 6*4kB (UE) 4*8kB (UME) 4*16kB (UME) 2*32kB (UE) 6*64kB (UE) 2*128kB (UM) 4*256kB (UME) 3*512kB (UME) 2*1024kB (ME) 1*2048kB (M) 0*4096kB = 7480kB
> [ 3769.246267] Node 0 DMA32: 66*4kB (UM) 271*8kB (UME) 218*16kB (UME) 45*32kB (UME) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7360kB
> [ 3769.246272] 144878 total pagecache pages
> [ 3769.246276] 6812 pages in swap cache
> [ 3769.246277] Swap cache stats: add 62616, delete 55806, find 31/55
> [ 3769.246278] Free swap  = 3943164kB
> [ 3769.246278] Total swap = 4194300kB
> [ 3769.246279] 524181 pages RAM
> [ 3769.246279] 0 pages HighMem/MovableOnly
> [ 3769.246280] 61765 pages reserved
> [ 3769.246280] 0 pages cma reserved
> [ 3769.246284] ld: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
> [ 3769.246286] CPU: 2 PID: 3201 Comm: ld Not tainted 5.0.0-rc5-20190208-thp-net-florian-rtl8169-doflr+ #1
> [ 3769.246287] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
> [ 3769.246287] Call Trace:
> [ 3769.246288]  <IRQ>
> [ 3769.246290]  dump_stack+0x5c/0x7b
> [ 3769.246291]  warn_alloc+0x103/0x190
> [ 3769.246293]  __alloc_pages_nodemask+0xe3d/0xe80
> [ 3769.246294]  ? inet_gro_receive+0x232/0x2c0
> [ 3769.246296]  page_frag_alloc+0x117/0x150
> [ 3769.246297]  __napi_alloc_skb+0x83/0xd0
> [ 3769.246299]  rtl8169_poll+0x210/0x640
> [ 3769.246300]  net_rx_action+0x23d/0x370
> [ 3769.246302]  __do_softirq+0xed/0x229
> [ 3769.246304]  irq_exit+0xb7/0xc0
> [ 3769.246305]  xen_evtchn_do_upcall+0x27/0x40
> [ 3769.246306]  xen_do_hypervisor_callback+0x29/0x40
> [ 3769.246307]  </IRQ>
> [ 3769.246308] RIP: e030:__pv_queued_spin_lock_slowpath+0xda/0x280
> [ 3769.246310] Code: 14 41 bc 01 00 00 00 41 bd 00 01 00 00 3c 02 0f 94 c0 0f b6 c0 48 89 04 24 c6 45 14 00 ba 00 80 00 00 c6 43 01 01 eb 0b f3 90 <83> ea 01 0f 84 49 01 00 00 0f b6 03 84 c0 75 ee 44 89 e8 f0 66 44
> [ 3769.246310] RSP: e02b:ffffc90005b0f780 EFLAGS: 00000202
> [ 3769.246311] RAX: 0000000000000001 RBX: ffff8880047c9200 RCX: 0000000000000001
> [ 3769.246312] RDX: 0000000000007d75 RSI: 0000000000000000 RDI: ffff8880047c9200
> [ 3769.246313] RBP: ffff88807d4a1a80 R08: ffffc90005b0f978 R09: ffffc90005b0f978
> [ 3769.246313] R10: ffffc90005b0f9d0 R11: ffff88807fc17000 R12: 0000000000000001
> [ 3769.246314] R13: 0000000000000100 R14: 0000000000000000 R15: 00000000000c0000
> [ 3769.246316]  _raw_spin_lock+0x16/0x20
> [ 3769.246317]  list_lru_add+0x59/0x170
> [ 3769.246318]  inode_lru_list_add+0x1b/0x40
> [ 3769.246320]  iput+0x18b/0x1a0
> [ 3769.246321]  __dentry_kill+0xc5/0x170
> [ 3769.246322]  shrink_dentry_list+0x93/0x1c0
> [ 3769.246323]  prune_dcache_sb+0x4d/0x70
> [ 3769.246325]  super_cache_scan+0x104/0x190
> [ 3769.246326]  do_shrink_slab+0x12c/0x1e0
> [ 3769.246328]  shrink_slab+0xdf/0x2b0
> [ 3769.246329]  shrink_node+0x158/0x470
> [ 3769.246331]  do_try_to_free_pages+0xd1/0x380
> [ 3769.246333]  try_to_free_pages+0xb2/0xe0
> [ 3769.246334]  __alloc_pages_nodemask+0x603/0xe80
> [ 3769.246336]  ? xas_load+0x9/0x80
> [ 3769.246337]  ? find_get_entry+0x58/0x120
> [ 3769.246338]  pagecache_get_page+0xde/0x210
> [ 3769.246340]  grab_cache_page_write_begin+0x17/0x30
> [ 3769.246341]  ext4_da_write_begin+0xc4/0x340
> [ 3769.246342]  generic_perform_write+0xb8/0x1b0
> [ 3769.246344]  __generic_file_write_iter+0x13c/0x1b0
> [ 3769.246345]  ext4_file_write_iter+0x121/0x3c0
> [ 3769.246347]  __vfs_write+0x123/0x1a0
> [ 3769.246348]  vfs_write+0xab/0x1a0
> [ 3769.246349]  ksys_write+0x4d/0xc0
> [ 3769.246350]  do_syscall_64+0x49/0x100
> [ 3769.246352]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 3769.246353] RIP: 0033:0x7fee5b265730
> [ 3769.246354] Code: 73 01 c3 48 8b 0d 68 d7 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d d9 2f 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 7e 9b 01 00 48 89 04 24
> [ 3769.246354] RSP: 002b:00007fff33183dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 3769.246355] RAX: ffffffffffffffda RBX: 0000000000000710 RCX: 00007fee5b265730
> [ 3769.246356] RDX: 0000000000000710 RSI: 000055559bed78b0 RDI: 0000000000000049
> [ 3769.246357] RBP: 000055559bed78b0 R08: 0000000000000b40 R09: 0000000001c0320c
> [ 3769.246357] R10: 00007fee5be91e80 R11: 0000000000000246 R12: 0000000000000710
> [ 3769.246358] R13: 0000000000000001 R14: 00005555a2690050 R15: 0000000000000710
> [ 3769.246364] ld: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
> [ 3769.246366] CPU: 2 PID: 3201 Comm: ld Not tainted 5.0.0-rc5-20190208-thp-net-florian-rtl8169-doflr+ #1
> [ 3769.246366] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
> [ 3769.246366] Call Trace:
> [ 3769.246367]  <IRQ>
> [ 3769.246368]  dump_stack+0x5c/0x7b
> [ 3769.246370]  warn_alloc+0x103/0x190
> [ 3769.246371]  __alloc_pages_nodemask+0xe3d/0xe80
> [ 3769.246373]  ? inet_gro_receive+0x232/0x2c0
> [ 3769.246374]  page_frag_alloc+0x117/0x150
> [ 3769.246375]  __napi_alloc_skb+0x83/0xd0
> [ 3769.246376]  rtl8169_poll+0x210/0x640
> [ 3769.246378]  net_rx_action+0x23d/0x370
> [ 3769.246379]  __do_softirq+0xed/0x229
> [ 3769.246381]  irq_exit+0xb7/0xc0
> [ 3769.246382]  xen_evtchn_do_upcall+0x27/0x40
> [ 3769.246383]  xen_do_hypervisor_callback+0x29/0x40
> [ 3769.246383]  </IRQ>
> [ 3769.246385] RIP: e030:__pv_queued_spin_lock_slowpath+0xda/0x280
> [ 3769.246386] Code: 14 41 bc 01 00 00 00 41 bd 00 01 00 00 3c 02 0f 94 c0 0f b6 c0 48 89 04 24 c6 45 14 00 ba 00 80 00 00 c6 43 01 01 eb 0b f3 90 <83> ea 01 0f 84 49 01 00 00 0f b6 03 84 c0 75 ee 44 89 e8 f0 66 44
> [ 3769.246387] RSP: e02b:ffffc90005b0f780 EFLAGS: 00000202
> [ 3769.246388] RAX: 0000000000000001 RBX: ffff8880047c9200 RCX: 0000000000000001
> [ 3769.246388] RDX: 0000000000007d75 RSI: 0000000000000000 RDI: ffff8880047c9200
> [ 3769.246389] RBP: ffff88807d4a1a80 R08: ffffc90005b0f978 R09: ffffc90005b0f978
> [ 3769.246390] R10: ffffc90005b0f9d0 R11: ffff88807fc17000 R12: 0000000000000001
> [ 3769.246390] R13: 0000000000000100 R14: 0000000000000000 R15: 00000000000c0000
> [ 3769.246392]  _raw_spin_lock+0x16/0x20
> [ 3769.246393]  list_lru_add+0x59/0x170
> [ 3769.246395]  inode_lru_list_add+0x1b/0x40
> [ 3769.246396]  iput+0x18b/0x1a0
> [ 3769.246397]  __dentry_kill+0xc5/0x170
> [ 3769.246398]  shrink_dentry_list+0x93/0x1c0
> [ 3769.246399]  prune_dcache_sb+0x4d/0x70
> [ 3769.246401]  super_cache_scan+0x104/0x190
> [ 3769.246402]  do_shrink_slab+0x12c/0x1e0
> [ 3769.246404]  shrink_slab+0xdf/0x2b0
> [ 3769.246405]  shrink_node+0x158/0x470
> [ 3769.246407]  do_try_to_free_pages+0xd1/0x380
> [ 3769.246408]  try_to_free_pages+0xb2/0xe0
> [ 3769.246410]  __alloc_pages_nodemask+0x603/0xe80
> [ 3769.246411]  ? xas_load+0x9/0x80
> [ 3769.246413]  ? find_get_entry+0x58/0x120
> [ 3769.246414]  pagecache_get_page+0xde/0x210
> [ 3769.246415]  grab_cache_page_write_begin+0x17/0x30
> [ 3769.246416]  ext4_da_write_begin+0xc4/0x340
> [ 3769.246418]  generic_perform_write+0xb8/0x1b0
> [ 3769.246420]  __generic_file_write_iter+0x13c/0x1b0
> [ 3769.246421]  ext4_file_write_iter+0x121/0x3c0
> [ 3769.246422]  __vfs_write+0x123/0x1a0
> [ 3769.246423]  vfs_write+0xab/0x1a0
> [ 3769.246424]  ksys_write+0x4d/0xc0
> [ 3769.246426]  do_syscall_64+0x49/0x100
> [ 3769.246427]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 3769.246428] RIP: 0033:0x7fee5b265730
> [ 3769.246429] Code: 73 01 c3 48 8b 0d 68 d7 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d d9 2f 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 7e 9b 01 00 48 89 04 24
> [ 3769.246430] RSP: 002b:00007fff33183dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 3769.246431] RAX: ffffffffffffffda RBX: 0000000000000710 RCX: 00007fee5b265730
> [ 3769.246431] RDX: 0000000000000710 RSI: 000055559bed78b0 RDI: 0000000000000049
> [ 3769.246432] RBP: 000055559bed78b0 R08: 0000000000000b40 R09: 0000000001c0320c
> [ 3769.246433] R10: 00007fee5be91e80 R11: 0000000000000246 R12: 0000000000000710
> [ 3769.246433] R13: 0000000000000001 R14: 00005555a2690050 R15: 0000000000000710
> 
> 
>  
>>> Since we are almost at RC6 i took the liberty to CC Eric now.
>>>
>> Sure, thanks.
>>
>>> BTW am i correct these patches are merely optimizations ?
>>
>> Yes
>>
>>> If so and concluding they revert cleanly, perhaps it should be considered at this point in the RC's
>>> to revert them for 5.0 and try again for 5.1 ?
>>>
>> Before removing both it would be good to test with only the barrier-removal removed.
>>
>>> --
>>> Sander
>>>
>> Heiner
>>
>>>
>>>>
>>>>> would be candidates, which were merged in 5.0.
>>>>>
>>>>> I have reverted the first two, see how that works out.
>>>>>
>>>>> --
>>>>> Sander
>>>>>
>>>> Heiner
>>>>
>>>>>  
>>>>>>> --
>>>>>>> Sander
>>>>>>>
>>>>>> Heiner
>>>>>>
>>>>>>>
>>>>>>> [ 6466.554866] kernel BUG at lib/dynamic_queue_limits.c:27!
>>>>>>> [ 6466.571425] invalid opcode: 0000 [#1] SMP NOPTI
>>>>>>> [ 6466.585890] CPU: 3 PID: 7057 Comm: as Not tainted 5.0.0-rc5-20190208-thp-net-florian-doflr+ #1
>>>>>>> [ 6466.598693] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>>>>>>> [ 6466.611579] RIP: e030:dql_completed+0x126/0x140
>>>>>>> [ 6466.624339] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
>>>>>>> [ 6466.648130] RSP: e02b:ffff88807d4c3e78 EFLAGS: 00010297
>>>>>>> [ 6466.659616] RAX: 0000000000000042 RBX: ffff8880049cf800 RCX: 0000000000000000
>>>>>>> [ 6466.672835] RDX: 0000000000000001 RSI: 0000000000000042 RDI: ffff8880049cf8c0
>>>>>>> [ 6466.684521] RBP: ffff888077df7260 R08: 0000000000000001 R09: 0000000000000000
>>>>>>> [ 6466.696824] R10: 00000000387c2336 R11: 00000000387c2336 R12: 0000000010000000
>>>>>>> [ 6466.709953] R13: ffff888077df6898 R14: ffff888077df75c0 R15: 0000000000454677
>>>>>>> [ 6466.722165] FS:  00007fd869147200(0000) GS:ffff88807d4c0000(0000) knlGS:0000000000000000
>>>>>>> [ 6466.733228] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [ 6466.746581] CR2: 00007fd867dfd000 CR3: 0000000074884000 CR4: 0000000000000660
>>>>>>> [ 6466.758366] Call Trace:
>>>>>>> [ 6466.768118]  <IRQ>
>>>>>>> [ 6466.778214]  rtl8169_poll+0x4f4/0x640
>>>>>>> [ 6466.789198]  net_rx_action+0x23d/0x370
>>>>>>> [ 6466.798467]  __do_softirq+0xed/0x229
>>>>>>> [ 6466.807039]  irq_exit+0xb7/0xc0
>>>>>>> [ 6466.815471]  xen_evtchn_do_upcall+0x27/0x40
>>>>>>> [ 6466.826647]  xen_do_hypervisor_callback+0x29/0x40
>>>>>>> [ 6466.835902]  </IRQ>
>>>>>>> [ 6466.845361] RIP: e030:xen_hypercall_mmu_update+0xa/0x20
>>>>>>> [ 6466.853390] Code: 51 41 53 b8 00 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 01 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>>>>>>> [ 6466.874031] RSP: e02b:ffffc90003c0bdd0 EFLAGS: 00000246
>>>>>>> [ 6466.883452] RAX: 0000000000000000 RBX: 000000041f83bfe8 RCX: ffffffff8100102a
>>>>>>> [ 6466.891986] RDX: deadbeefdeadf00d RSI: deadbeefdeadf00d RDI: deadbeefdeadf00d
>>>>>>> [ 6466.903402] RBP: 0000000000000fe8 R08: 000000000000000b R09: 0000000000000000
>>>>>>> [ 6466.911201] R10: deadbeefdeadf00d R11: 0000000000000246 R12: 800000050c346067
>>>>>>> [ 6466.918491] R13: ffff8880607c4fe8 R14: ffff888005082800 R15: 0000000000000000
>>>>>>> [ 6466.926647]  ? xen_hypercall_mmu_update+0xa/0x20
>>>>>>> [ 6466.938195]  ? xen_set_pte_at+0x78/0xe0
>>>>>>> [ 6466.947046]  ? __handle_mm_fault+0xc43/0x1060
>>>>>>> [ 6466.955772]  ? do_mmap+0x44b/0x5b0
>>>>>>> [ 6466.964410]  ? handle_mm_fault+0xf8/0x200
>>>>>>> [ 6466.973290]  ? __do_page_fault+0x231/0x4a0
>>>>>>> [ 6466.981973]  ? page_fault+0x8/0x30
>>>>>>> [ 6466.990904]  ? page_fault+0x1e/0x30
>>>>>>> [ 6466.999585] Modules linked in:
>>>>>>> [ 6467.007533] ---[ end trace 94bec01608fe4061 ]---
>>>>>>> [ 6467.016751] RIP: e030:dql_completed+0x126/0x140
>>>>>>> [ 6467.024271] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
>>>>>>> [ 6467.039726] RSP: e02b:ffff88807d4c3e78 EFLAGS: 00010297
>>>>>>> [ 6467.047243] RAX: 0000000000000042 RBX: ffff8880049cf800 RCX: 0000000000000000
>>>>>>> [ 6467.054202] RDX: 0000000000000001 RSI: 0000000000000042 RDI: ffff8880049cf8c0
>>>>>>> [ 6467.062000] RBP: ffff888077df7260 R08: 0000000000000001 R09: 0000000000000000
>>>>>>> [ 6467.069664] R10: 00000000387c2336 R11: 00000000387c2336 R12: 0000000010000000
>>>>>>> [ 6467.077715] R13: ffff888077df6898 R14: ffff888077df75c0 R15: 0000000000454677
>>>>>>> [ 6467.084916] FS:  00007fd869147200(0000) GS:ffff88807d4c0000(0000) knlGS:0000000000000000
>>>>>>> [ 6467.093352] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [ 6467.101492] CR2: 00007fd867dfd000 CR3: 0000000074884000 CR4: 0000000000000660
>>>>>>> [ 6467.110542] Kernel panic - not syncing: Fatal exception in interrupt
>>>>>>> [ 6467.118166] Kernel Offset: disabled
>>>>>>> (XEN) [2019-02-08 18:04:48.854] Hardware Dom0 crashed: rebooting machine in 5 seconds.
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 
> 


^ permalink raw reply

* [PATCH net-next] net: phy: remove unneeded masking of PHY register read results
From: Heiner Kallweit @ 2019-02-09  8:46 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <75c9d8ee-582f-f247-7595-d8732ac26c20@gmail.com>

PHY registers are only 16 bits wide, therefore, if the read was
successful, there's no need to mask out the higher 16 bits.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/phy/phy_device.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index d4fc1fd8a..31f9e7c49 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -676,13 +676,13 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
 	phy_reg = mdiobus_read(bus, addr, reg_addr);
 	if (phy_reg < 0)
 		return -EIO;
-	*devices_in_package = (phy_reg & 0xffff) << 16;
+	*devices_in_package = phy_reg << 16;
 
 	reg_addr = MII_ADDR_C45 | dev_addr << 16 | MDIO_DEVS1;
 	phy_reg = mdiobus_read(bus, addr, reg_addr);
 	if (phy_reg < 0)
 		return -EIO;
-	*devices_in_package |= (phy_reg & 0xffff);
+	*devices_in_package |= phy_reg;
 
 	/* Bit 0 doesn't represent a device, it indicates c22 regs presence */
 	*devices_in_package &= ~BIT(0);
@@ -746,13 +746,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
 		phy_reg = mdiobus_read(bus, addr, reg_addr);
 		if (phy_reg < 0)
 			return -EIO;
-		c45_ids->device_ids[i] = (phy_reg & 0xffff) << 16;
+		c45_ids->device_ids[i] = phy_reg << 16;
 
 		reg_addr = MII_ADDR_C45 | i << 16 | MII_PHYSID2;
 		phy_reg = mdiobus_read(bus, addr, reg_addr);
 		if (phy_reg < 0)
 			return -EIO;
-		c45_ids->device_ids[i] |= (phy_reg & 0xffff);
+		c45_ids->device_ids[i] |= phy_reg;
 	}
 	*phy_id = 0;
 	return 0;
@@ -789,14 +789,14 @@ static int get_phy_id(struct mii_bus *bus, int addr, u32 *phy_id,
 		return (phy_reg == -EIO || phy_reg == -ENODEV) ? -ENODEV : -EIO;
 	}
 
-	*phy_id = (phy_reg & 0xffff) << 16;
+	*phy_id = phy_reg << 16;
 
 	/* Grab the bits from PHYIR2, and put them in the lower half */
 	phy_reg = mdiobus_read(bus, addr, MII_PHYSID2);
 	if (phy_reg < 0)
 		return -EIO;
 
-	*phy_id |= (phy_reg & 0xffff);
+	*phy_id |= phy_reg;
 
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next 3/4] nfp: devlink: rename vendor to manufacture
From: Jiri Pirko @ 2019-02-09  8:36 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers
In-Reply-To: <20190209031611.1102-4-jakub.kicinski@netronome.com>

Sat, Feb 09, 2019 at 04:16:10AM CET, jakub.kicinski@netronome.com wrote:
>Vendor may sound ambiguous, let's rename the fab string to
>"board.manufacture".
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
>---
> drivers/net/ethernet/netronome/nfp/nfp_devlink.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
>index dddbb0575be9..6e15e216732a 100644
>--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
>+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
>@@ -178,7 +178,7 @@ static const struct nfp_devlink_versions_simple {
> } nfp_devlink_versions_hwinfo[] = {
> 	{ DEVLINK_INFO_VERSION_GENERIC_BOARD_ID,	"assembly.partno", },
> 	{ DEVLINK_INFO_VERSION_GENERIC_BOARD_REV,	"assembly.revision", },
>-	{ "board.vendor", /* fab */			"assembly.vendor", },
>+	{ "board.manufacture",				"assembly.vendor", },

I wonder, why this is not among generic?


> 	{ "board.model", /* code name */		"assembly.model", },
> };
> 
>-- 
>2.19.2
>

^ permalink raw reply

* Re: [PATCH net-next 2/4] devlink: don't allocate attrs on the stack
From: Jiri Pirko @ 2019-02-09  8:35 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers
In-Reply-To: <20190209031611.1102-3-jakub.kicinski@netronome.com>

Sat, Feb 09, 2019 at 04:16:09AM CET, jakub.kicinski@netronome.com wrote:
>Number of devlink attributes has grown over 128, causing the
>following warning:
>
>../net/core/devlink.c: In function ‘devlink_nl_cmd_region_read_dumpit’:
>../net/core/devlink.c:3740:1: warning: the frame size of 1064 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> }
>  ^
>
>Since the number of attributes is only going to grow allocate
>the array dynamically.
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox