* Re: [PATCH] net: phy: broadcom: Wire suspend/resume for BCM54612E
From: Andrew Lunn @ 2023-10-31 0:31 UTC (permalink / raw)
To: Marco von Rosenberg
Cc: Florian Fainelli, Broadcom internal kernel review list,
Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20231030225446.17422-1-marcovr@selfnet.de>
On Mon, Oct 30, 2023 at 11:54:45PM +0100, Marco von Rosenberg wrote:
> On some devices, the bootloader suspends the PHY before booting the OS.
> Not having a resume callback wired up is a problem in such situations
> since it is then never resumed.
Hi Marco
This description seems odd to me. I'm guessing here:
Are we talking about a device which as been suspended? The PHY has
been left running because there is no suspend callback? Something then
triggers a resume. The bootloader then suspends the active PHY? Linux
then boots, detects its a resume, so does not touch the hardware
because there is no resume callback? The suspended PHY is then
useless.
Adding suspend/resume calls makes sense, i just don't follow the
commit message reasoning.
Andrew
^ permalink raw reply
* Re: [PATCH net-next v2 3/3] net: dsa: realtek: support reset controller
From: Luiz Angelo Daros de Luca @ 2023-10-31 0:30 UTC (permalink / raw)
To: Vladimir Oltean
Cc: netdev, linus.walleij, alsi, andrew, vivien.didelot, f.fainelli,
davem, kuba, pabeni, robh+dt, krzk+dt, arinc.unal
In-Reply-To: <20231030205025.b4dryzqzuunrjils@skbuf>
> On Fri, Oct 27, 2023 at 04:00:57PM -0300, Luiz Angelo Daros de Luca wrote:
> > The 'reset-gpios' will not work when the switch reset is controlled by a
> > reset controller.
> >
> > Although the reset is optional and the driver performs a soft reset
> > during setup, if the initial reset state was asserted, the driver will
> > not detect it.
> >
> > The reset controller will take precedence over the reset GPIO.
> >
> > Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
> > ---
> > drivers/net/dsa/realtek/realtek-mdio.c | 51 ++++++++++++++++++++++----
> > drivers/net/dsa/realtek/realtek-smi.c | 49 ++++++++++++++++++++++---
> > drivers/net/dsa/realtek/realtek.h | 2 +
> > 3 files changed, 89 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/net/dsa/realtek/realtek-mdio.c b/drivers/net/dsa/realtek/realtek-mdio.c
> > index 292e6d087e8b..aad94e49d4c9 100644
> > --- a/drivers/net/dsa/realtek/realtek-mdio.c
> > +++ b/drivers/net/dsa/realtek/realtek-mdio.c
> > @@ -140,6 +140,40 @@ static const struct regmap_config realtek_mdio_nolock_regmap_config = {
> > .disable_locking = true,
> > };
> >
> > +static void realtek_mdio_reset_assert(struct realtek_priv *priv)
> > +{
> > + int ret;
> > +
> > + if (priv->reset_ctl) {
> > + ret = reset_control_assert(priv->reset_ctl);
> > + if (ret)
> > + dev_warn(priv->dev, "Failed to assert the switch reset control. Error: %i",
> > + ret);
>
> Instead of "Error: %i" you can say ".. reset control: %pe\n", ERR_PTR(ret)
> which will print the error as a symbolic error name (if CONFIG_SYMBOLIC_ERRNAME=y)
> rather than just a numeric value.
>
> Also, I don't know if this is explicit in the coding style, but I
> believe it is more consistent if single function calls are enveloped in
> curly braces if they span multiple lines, like so:
>
> if (ret) {
> dev_warn(priv->dev,
> "Failed to assert the switch reset control: %pe",
> ERR_PTR(ret));
> }
>
> Also, please note that netdev still prefers the 80 character line limit.
Sure.
> > +
> > + return;
> > + }
> > +
> > + if (priv->reset)
> > + gpiod_set_value(priv->reset, true);
> > +}
> > +
> > +static void realtek_mdio_reset_deassert(struct realtek_priv *priv)
> > +{
> > + int ret;
> > +
> > + if (priv->reset_ctl) {
> > + ret = reset_control_deassert(priv->reset_ctl);
> > + if (ret)
> > + dev_warn(priv->dev, "Failed to deassert the switch reset control. Error: %i",
> > + ret);
> > +
> > + return;
>
> Is there a particular reason why this has to ignore a reset GPIO if
> present, rather than fall through, checking for the latter as well?
Something like this, disregard white space issues?
static void realtek_smi_reset_assert(struct realtek_priv *priv)
{
int ret;
if (priv->reset_ctl) {
ret = reset_control_assert(priv->reset_ctl);
if (!ret)
return;
dev_warn(priv->dev,
"Failed to assert the switch reset control: %pe\n",
ERR_PTR(ret));
}
if (priv->reset)
gpiod_set_value(priv->reset, true);
}
> > + }
> > +
> > + if (priv->reset)
> > + gpiod_set_value(priv->reset, false);
> > +}
> > +
> > static int realtek_mdio_probe(struct mdio_device *mdiodev)
> > {
> > struct realtek_priv *priv;
> > @@ -194,20 +228,24 @@ static int realtek_mdio_probe(struct mdio_device *mdiodev)
> >
> > dev_set_drvdata(dev, priv);
> >
> > - /* TODO: if power is software controlled, set up any regulators here */
>
> As Andrew mentions, this commit does not make power software-controlled,
> so don't remove this.
I'll return it and move specific this TODO after the leds_disabled as
it should be before the reset. The one in realtek-smi was in the right
position.
> > priv->leds_disabled = of_property_read_bool(np, "realtek,disable-leds");
> >
> > + priv->reset_ctl = devm_reset_control_get_optional(dev, NULL);
> > + if (IS_ERR(priv->reset_ctl)) {
> > + ret = PTR_ERR(priv->reset_ctl);
> > + return dev_err_probe(dev, ret, "failed to get reset control\n");
> > + }
> > +
> > priv->reset = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_LOW);
> > if (IS_ERR(priv->reset)) {
> > dev_err(dev, "failed to get RESET GPIO\n");
> > return PTR_ERR(priv->reset);
> > }
> > -
> > - if (priv->reset) {
> > - gpiod_set_value(priv->reset, 1);
> > + if (priv->reset_ctl || priv->reset) {
> > + realtek_mdio_reset_assert(priv);
> > dev_dbg(dev, "asserted RESET\n");
> > msleep(REALTEK_HW_STOP_DELAY);
> > - gpiod_set_value(priv->reset, 0);
> > + realtek_mdio_reset_deassert(priv);
> > msleep(REALTEK_HW_START_DELAY);
> > dev_dbg(dev, "deasserted RESET\n");
> > }
> > @@ -246,8 +284,7 @@ static void realtek_mdio_remove(struct mdio_device *mdiodev)
> > dsa_unregister_switch(priv->ds);
> >
> > /* leave the device reset asserted */
> > - if (priv->reset)
> > - gpiod_set_value(priv->reset, 1);
> > + realtek_mdio_reset_assert(priv);
> > }
> >
> > static void realtek_mdio_shutdown(struct mdio_device *mdiodev)
> > diff --git a/drivers/net/dsa/realtek/realtek-smi.c b/drivers/net/dsa/realtek/realtek-smi.c
> > index bfd11591faf4..a99e53b5b662 100644
> > --- a/drivers/net/dsa/realtek/realtek-smi.c
> > +++ b/drivers/net/dsa/realtek/realtek-smi.c
> > @@ -408,6 +408,40 @@ static int realtek_smi_setup_mdio(struct dsa_switch *ds)
> > return ret;
> > }
> >
> > +static void realtek_smi_reset_assert(struct realtek_priv *priv)
> > +{
> > + int ret;
> > +
> > + if (priv->reset_ctl) {
> > + ret = reset_control_assert(priv->reset_ctl);
> > + if (ret)
> > + dev_warn(priv->dev, "Failed to assert the switch reset control. Error: %i",
> > + ret);
> > +
> > + return;
> > + }
> > +
> > + if (priv->reset)
> > + gpiod_set_value(priv->reset, true);
> > +}
> > +
> > +static void realtek_smi_reset_deassert(struct realtek_priv *priv)
> > +{
> > + int ret;
> > +
> > + if (priv->reset_ctl) {
> > + ret = reset_control_deassert(priv->reset_ctl);
> > + if (ret)
> > + dev_warn(priv->dev, "Failed to deassert the switch reset control. Error: %i",
> > + ret);
> > +
> > + return;
> > + }
> > +
> > + if (priv->reset)
> > + gpiod_set_value(priv->reset, false);
> > +}
> > +
>
> To respond here, in a single email, to your earlier question (sorry):
> https://lore.kernel.org/netdev/CAJq09z7miTe7HUzsL4GBSwkrzyy4mVi6z40+ETgvmY=iWGRN-g@mail.gmail.com/
>
> | Both interface modules, realtek-smi and realtek-mdio, do not share
> | code, except for the realtek.h header file. I don't know if it is
> | worth it to put the code in a new shared module. What is the best
> | practice here? Create a realtek_common.c linked to both modules?
>
> The answer is: I ran "meld" between realtek-mdio.c and realtek-smi.c,
> and the probe, remove and shutdown functions are surprisingly similar
> already, and perhaps might become even more similar in the future.
> I think it is worth introducing a common kernel module for both
> interface drivers as a preliminary patch, rather than keeping duplicated
> probe/remove/shutdown code.
The remove/shutdown are probably similar to any other DSA driver. I
think the extra code around a shared code in a new module would be
bigger than the duplicated code.
realtek-mdio is an MDIO driver while realtek-smi is a platform driver
implementing a bitbang protocol. They might never be used together in
a system to share RAM and not even installed together in small
systems. If I do need to share the code, I would just link it twice.
Would something like this be acceptable?
drivers/net/dsa/realtek/Makefile
-obj-$(CONFIG_NET_DSA_REALTEK_MDIO) += realtek-mdio.o
-obj-$(CONFIG_NET_DSA_REALTEK_SMI) += realtek-smi.o
+obj-$(CONFIG_NET_DSA_REALTEK_MDIO) += realtek-mdio.o realtek_common.o
+obj-$(CONFIG_NET_DSA_REALTEK_SMI) += realtek-smi.o realtek_common.o
drivers/net/dsa/realtek/realtek.h:
+void realtek_reset_assert(struct realtek_priv *priv);
+void realtek_reset_deassert(struct realtek_priv *priv);
realtek_common:
+void realtek_reset_assert(struct realtek_priv *priv) {}
+void realtek_reset_deassert(struct realtek_priv *priv) {}
If that realtek_common grows, we could convert it into a module. For
now, it would just introduce extra complexity.
> > static int realtek_smi_probe(struct platform_device *pdev)
> > {
> > const struct realtek_variant *var;
> > @@ -457,18 +491,22 @@ static int realtek_smi_probe(struct platform_device *pdev)
> > dev_set_drvdata(dev, priv);
> > spin_lock_init(&priv->lock);
> >
> > - /* TODO: if power is software controlled, set up any regulators here */
> > + priv->reset_ctl = devm_reset_control_get_optional(dev, NULL);
> > + if (IS_ERR(priv->reset_ctl)) {
> > + ret = PTR_ERR(priv->reset_ctl);
> > + return dev_err_probe(dev, ret, "failed to get reset control\n");
> > + }
> >
> > priv->reset = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_LOW);
> > if (IS_ERR(priv->reset)) {
> > dev_err(dev, "failed to get RESET GPIO\n");
> > return PTR_ERR(priv->reset);
> > }
> > - if (priv->reset) {
> > - gpiod_set_value(priv->reset, 1);
> > + if (priv->reset_ctl || priv->reset) {
> > + realtek_smi_reset_assert(priv);
> > dev_dbg(dev, "asserted RESET\n");
> > msleep(REALTEK_HW_STOP_DELAY);
> > - gpiod_set_value(priv->reset, 0);
> > + realtek_smi_reset_deassert(priv);
> > msleep(REALTEK_HW_START_DELAY);
> > dev_dbg(dev, "deasserted RESET\n");
> > }
> > @@ -518,8 +556,7 @@ static void realtek_smi_remove(struct platform_device *pdev)
> > of_node_put(priv->slave_mii_bus->dev.of_node);
>
> slave_mii_bus was renamed to user_mii_bus, and this prevents the
> application of the patch currently, so you will need to respin. But I
> think net-next is going to close soon for 2 weeks, so either you respin
> as RFC or you wait until it reopens.
I'll wait. I hope the comments on this thread might be enough to get
this patch sorted out.
>
> >
> > /* leave the device reset asserted */
> > - if (priv->reset)
> > - gpiod_set_value(priv->reset, 1);
> > + realtek_smi_reset_assert(priv);
> > }
> >
Regards,
Luiz
^ permalink raw reply
* [ANN] net-next is CLOSED
From: Jakub Kicinski @ 2023-10-31 0:06 UTC (permalink / raw)
To: netdev
Hi!
Please be advised that net-next is now closed for the duration
of the 6.7 merge window.
^ permalink raw reply
* Re: [PATCH net v2] net: dsa: tag_rtl4_a: Bump min packet size
From: Vladimir Oltean @ 2023-10-30 23:33 UTC (permalink / raw)
To: Linus Walleij
Cc: Andrew Lunn, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <CACRpkdZ4+QrSA0+JCOrx_OZs4gzt1zx1kPK5bdqxp0AHfEQY3g@mail.gmail.com>
On Mon, Oct 30, 2023 at 11:57:33PM +0100, Linus Walleij wrote:
> Here you can incidentally also see what happens if we don't pad the big packets:
> the packet gets truncated.
Are we sure we are debugging a switch problem? On how many platforms
with the RTL8366RB can the issue be seen? Is the conduit interface the
same on all these platforms, or is it different and that makes no
difference?
^ permalink raw reply
* Re: [PATCH v4 1/5] r8169: Coalesce r8169_mac_ocp_write/modify calls to reduce spinlock stalls
From: Jacob Keller @ 2023-10-30 23:14 UTC (permalink / raw)
To: Heiner Kallweit, Mirsad Goran Todorovac, Jason Gunthorpe,
Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev
Cc: Joerg Roedel, Will Deacon, Robin Murphy, nic_swsd,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Marco Elver
In-Reply-To: <a85e41ab-7cfa-413a-a446-f1b65c09c9ab@gmail.com>
On 10/30/2023 3:08 PM, Heiner Kallweit wrote:
> On 30.10.2023 22:50, Jacob Keller wrote:
>>
>>
>> On 10/29/2023 4:04 AM, Mirsad Goran Todorovac wrote:> A pair of new
>> helpers r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq()
>>> are introduced.
>>>
>>> The motivation for these helpers was the locking overhead of 130 consecutive
>>> r8168_mac_ocp_write() calls in the RTL8411b reset after the NIC gets confused
>>> if the PHY is powered-down.
>>>
>>> To quote Heiner:
>>>
>>> On RTL8411b the RX unit gets confused if the PHY is powered-down.
>>> This was reported in [0] and confirmed by Realtek. Realtek provided
>>> a sequence to fix the RX unit after PHY wakeup.
>>>
>>> A series of about 130 r8168_mac_ocp_write() calls is performed to program the
>>> RTL registers for recovery, each doing an expensive spin_lock_irqsave() and
>>> spin_unlock_irqrestore().
>>>
>>> Each mac ocp write is made of:
>>>
>>> static void __r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg,
>>> u32 data)
>>> {
>>> if (rtl_ocp_reg_failure(reg))
>>> return;
>>>
>>> RTL_W32(tp, OCPDR, OCPAR_FLAG | (reg << 15) | data);
>>> }
>>>
>>> static void r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg,
>>> u32 data)
>>> {
>>> unsigned long flags;
>>>
>>> raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags);
>>> __r8168_mac_ocp_write(tp, reg, data);
>>> raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
>>> }
>>>
>>> Register programming is done through RTL_W32() macro which expands into
>>>
>>> #define RTL_W32(tp, reg, val32) writel((val32), tp->mmio_addr + (reg))
>>>
>>> which is further (on Alpha):
>>>
>>> extern inline void writel(u32 b, volatile void __iomem *addr)
>>> {
>>> mb();
>>> __raw_writel(b, addr);
>>> }
>>>
>>> or on i386/x86_64:
>>>
>>> #define build_mmio_write(name, size, type, reg, barrier) \
>>> static inline void name(type val, volatile void __iomem *addr) \
>>> { asm volatile("mov" size " %0,%1": :reg (val), \
>>> "m" (*(volatile type __force *)addr) barrier); }
>>>
>>> build_mmio_write(writel, "l", unsigned int, "r", :"memory")
>>>
>>> This obviously involves iat least a compiler barrier.
>>>
>>> mb() expands into something like this i.e. on x86_64:
>>>
>>> #define mb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
>>>
>>> This means a whole lot of memory bus stalls: for spin_lock_irqsave(),
>>> memory barrier, writel(), and spin_unlock_irqrestore().
>>>
>>> With about 130 of these sequential calls to r8168_mac_ocp_write() this looks like
>>> a lock storm that will stall all of the cores and CPUs on the same memory controller
>>> for certain time I/O takes to finish.
>>>
>>> In a sequential case of RTL register programming, the writes to RTL registers
>>> can be coalesced under a same raw spinlock. This can dramatically decrease the
>>> number of bus stalls in a multicore or multi-CPU system.
>>>
>>> Macro helpers r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq() are
>>> provided to reduce lock contention:
>>>
>>> static void rtl_hw_start_8411_2(struct rtl8169_private *tp)
>>> {
>>>
>>> ...
>>>
>>> /* The following Realtek-provided magic fixes an issue with the RX unit
>>> * getting confused after the PHY having been powered-down.
>>> */
>>>
>>> static const struct recover_8411b_info init_zero_seq[] = {
>>> { 0xFC28, 0x0000 }, { 0xFC2A, 0x0000 }, { 0xFC2C, 0x0000 },
>>> ...
>>> };
>>>
>>> ...
>>>
>>> r8168_mac_ocp_write_seq(tp, init_zero_seq);
>>>
>>> ...
>>>
>>> }
>>>
>>> The hex data is preserved intact through s/r8168_mac_ocp_write[(]tp,/{ / and s/[)];/ },/
>>> functions that only changed the function names and the ending of the line, so the actual
>>> hex data is unchanged.
>>>
>>> To repeat, the reason for the introduction of the original commit
>>> was to enable recovery of the RX unit on the RTL8411b which was confused by the
>>> powered-down PHY. This sequence of r8168_mac_ocp_write() calls amplifies the problem
>>> into a series of about 500+ memory bus locks, most waiting for the main memory read,
>>> modify and write under a LOCK. The memory barrier in RTL_W32 should suffice for
>>> the programming sequence to reach RTL NIC registers.
>>>
>>> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075
>>>
>>
>>
>> I might have chosen to send some of this information as the cover letter
>> for the series instead of just as part of the commit message for [1/5],
>> but either way:
>>
>> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
>
> Cover letter is still missing, and there's a v5 already.
> Good example why we have the "max one version per day" rule.
>
> There's still some issues with the series, see my review comments
> for v5. As-is I'd NAK the series.
>
Heh, ya. A v5 was sent without there being a single (public) comment on
the list prior to my reviewing. I didn't notice the v5, and my mail
scripts pointed out this series didn't have anyone who'd looked at it
yet.. I guess I could have searched for and noticed a newer version.
Thanks,
Jake
^ permalink raw reply
* Re: [GIT PULL] Networking for 6.7
From: Jakub Kicinski @ 2023-10-30 23:15 UTC (permalink / raw)
To: torvalds; +Cc: davem, netdev, linux-kernel, pabeni
In-Reply-To: <20231028011741.2400327-1-kuba@kernel.org>
On Fri, 27 Oct 2023 18:17:41 -0700 Jakub Kicinski wrote:
> Hi Linus!
>
> I'll be traveling next week, so anticipating no -rc8 here is our
> merge window material.
Minor heads up, there's a silent conflict with the crypto tree,
you may want to squash this into whichever you merge second:
https://lore.kernel.org/all/20231030160953.28f2df61@canb.auug.org.au/
^ permalink raw reply
* Re: [PATCH net v2] net: dsa: tag_rtl4_a: Bump min packet size
From: Linus Walleij @ 2023-10-30 23:11 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Andrew Lunn, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20231030230906.s5feepjcvgbg5e7v@skbuf>
On Tue, Oct 31, 2023 at 12:09 AM Vladimir Oltean <olteanv@gmail.com> wrote:
> On Mon, Oct 30, 2023 at 11:57:33PM +0100, Linus Walleij wrote:
> > This of course make no sense, since the padding function should do nothing
> > when the packet is bigger than 60 bytes.
>
> Indeed, this of course makes no sense. Ping doesn't work, or ARP doesn't
> work? Could you add a static ARP entry for the 192.168.1.137 IP address?
ARP appears to work, and also DHCP then (I reconnect and
restart the interface before each test), so it's really weird.
I'm trying your suggestion to skb_dump() before/after.
Yours,
Linus Walleij
^ permalink raw reply
* Re: [PATCH net v2] net: dsa: tag_rtl4_a: Bump min packet size
From: Vladimir Oltean @ 2023-10-30 23:09 UTC (permalink / raw)
To: Linus Walleij
Cc: Andrew Lunn, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <CACRpkdZ4+QrSA0+JCOrx_OZs4gzt1zx1kPK5bdqxp0AHfEQY3g@mail.gmail.com>
On Mon, Oct 30, 2023 at 11:57:33PM +0100, Linus Walleij wrote:
> This of course make no sense, since the padding function should do nothing
> when the packet is bigger than 60 bytes.
Indeed, this of course makes no sense. Ping doesn't work, or ARP doesn't
work? Could you add a static ARP entry for the 192.168.1.137 IP address?
^ permalink raw reply
* [PATCH] net: phy: broadcom: Wire suspend/resume for BCM54612E
From: Marco von Rosenberg @ 2023-10-30 22:54 UTC (permalink / raw)
To: Florian Fainelli, Broadcom internal kernel review list,
Andrew Lunn, Heiner Kallweit, Russell King, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Marco von Rosenberg, netdev, linux-kernel
On some devices, the bootloader suspends the PHY before booting the OS.
Not having a resume callback wired up is a problem in such situations
since it is then never resumed.
This behavior was observed with a Huawei enterprise WLAN access point.
The BCM54612E ethernet PHY supports IDDQ-SR.
Therefore wire-up the suspend and resume callbacks
to point to bcm54xx_suspend() and bcm54xx_resume().
The same wire-up has been done in commit 38b6a9073007 ("net: phy:
broadcom: Wire suspend/resume for BCM50610 and BCM50610M") for two PHYs
also supporting IDDQ-SR.
Signed-off-by: Marco von Rosenberg <marcovr@selfnet.de>
---
drivers/net/phy/broadcom.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
index 04b2e6eeb195..ac14f223649b 100644
--- a/drivers/net/phy/broadcom.c
+++ b/drivers/net/phy/broadcom.c
@@ -1060,6 +1060,8 @@ static struct phy_driver broadcom_drivers[] = {
.handle_interrupt = bcm_phy_handle_interrupt,
.link_change_notify = bcm54xx_link_change_notify,
.led_brightness_set = bcm_phy_led_brightness_set,
+ .suspend = bcm54xx_suspend,
+ .resume = bcm54xx_resume,
}, {
.phy_id = PHY_ID_BCM54616S,
.phy_id_mask = 0xfffffff0,
--
2.42.0
^ permalink raw reply related
* Re: [PATCH net v2] net: dsa: tag_rtl4_a: Bump min packet size
From: Linus Walleij @ 2023-10-30 22:57 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Andrew Lunn, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20231030222035.oqos7v7sdq5u6mti@skbuf>
On Mon, Oct 30, 2023 at 11:20 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> I see commit 86dd9868b878 ("net: dsa: tag_rtl4_a: Support also egress tags")
> also mentions: "Qingfang came up with the solution: we need to pad the
> ethernet frame to 60 bytes using eth_skb_pad(), then the switch will
> happily accept frames with custom tags.". So the __skb_put_padto() was
> something very empirical in the first place.
>
> Since it's all problematic, would you mind removing the __skb_put_padto()
> altogether from rtl4a_tag_xmit(), and let me know what is the output for
> the following sweep through packet sizes? I truly wonder if it's just
> for small and large packets that we see packet drops, or if it's something
> repetitive throughout the range as well.
>
> for size in $(seq 0 1476); do if ping 10.0.0.56 -s $size -W 1 -c 1 -q >/dev/null; then echo "$((size + 42)): OK"; else echo "$((size + 42)): NOK"; fi; done
The weird thing is that if I remove the __skb_put_padto()
call, ping doesn't work at all. Somehow the packets are
corrupted, because they sure get out of the switch and I
can see them arriving with tcpdump on the host.
root@OpenWrt:/# for size in $(seq 0 1476); do if ping 192.168.1.137 -s $size -W
1 -c 1 -q >/dev/null; then echo "$((size + 42)): OK"; else echo "$((size + 42)):
NOK"; fi; done
42: NOK
43: NOK
44: NOK
45: NOK
46: NOK
47: NOK
48: NOK
49: NOK
50: NOK
51: NOK
(...)
1509: NOK
1510: NOK
1511: NOK
1512: NOK
1513: NOK
1514: NOK
1515: NOK
1516: NOK
1517: NOK
1518: NOK
This of course make no sense, since the padding function should do nothing
when the packet is bigger than 60 bytes.
So what we are seeing is some kind of side effect from the usage of
__skb_put_padto() I suppose? But I have no idea what that is, I looked
at the function and what it calls down to __skb_pad().
I'm testing skb_linearize(), which seems to be called on this path...
TCPdump on the host looks like this:
# tcpdump -i enp7s0
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp7s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:28:55.184019 IP _gateway > fedora: ICMP echo request, id 2461, seq
0, length 27
23:28:56.205294 IP _gateway > fedora: ICMP echo request, id 2462, seq
0, length 28
23:28:57.226495 IP _gateway > fedora: ICMP echo request, id 2463, seq
0, length 29
23:28:58.248013 IP _gateway > fedora: ICMP echo request, id 2464, seq
0, length 30
23:28:59.269157 IP _gateway > fedora: ICMP echo request, id 2465, seq
0, length 31
23:29:00.290443 IP _gateway > fedora: ICMP echo request, id 2466, seq
0, length 32
23:29:01.698700 IP _gateway > fedora: ICMP echo request, id 2467, seq
0, length 33
23:29:02.332131 IP _gateway > fedora: ICMP echo request, id 2468, seq
0, length 34
23:29:03.352442 IP _gateway > fedora: ICMP echo request, id 2469, seq
0, length 35
(...)
23:53:33.834706 IP _gateway > fedora: ICMP echo request, id 4000, seq
0, length 1475
23:53:34.854946 IP _gateway > fedora: ICMP echo request, id 4001, seq
0, length 1476
23:53:36.258777 IP truncated-ip - 1 bytes missing! _gateway > fedora:
ICMP echo request, id 4002, seq 0, length 1477
23:53:36.896654 IP truncated-ip - 2 bytes missing! _gateway > fedora:
ICMP echo request, id 4003, seq 0, length 1478
23:53:37.918022 IP truncated-ip - 3 bytes missing! _gateway > fedora:
ICMP echo request, id 4004, seq 0, length 1479
23:53:38.938355 IP truncated-ip - 4 bytes missing! _gateway > fedora:
ICMP echo request, id 4005, seq 0, length 1480
23:53:39.958451 IP truncated-ip - 4 bytes missing! _gateway > fedora:
ICMP echo request, id 4006, seq 0, length 1480
23:53:40.978598 IP truncated-ip - 4 bytes missing! _gateway > fedora:
ICMP echo request, id 4007, seq 0, length 1480
23:53:41.998991 IP truncated-ip - 4 bytes missing! _gateway > fedora:
ICMP echo request, id 4008, seq 0, length 1480
23:53:43.020309 IP truncated-ip - 4 bytes missing! _gateway > fedora:
ICMP echo request, id 4010, seq 0, length 1480
Here you can incidentally also see what happens if we don't pad the big packets:
the packet gets truncated.
Yours,
Linus Walleij
^ permalink raw reply
* Re: [PATCH net-next v2 2/3] dt-bindings: net: dsa: realtek: add reset controller
From: Luiz Angelo Daros de Luca @ 2023-10-30 22:30 UTC (permalink / raw)
To: Rob Herring
Cc: netdev, linus.walleij, alsi, andrew, vivien.didelot, f.fainelli,
olteanv, davem, kuba, pabeni, krzk+dt, arinc.unal, devicetree
In-Reply-To: <20231030131551.GA714112-robh@kernel.org>
> On Fri, Oct 27, 2023 at 04:00:56PM -0300, Luiz Angelo Daros de Luca wrote:
> > Realtek switches can use a reset controller instead of reset-gpios.
> >
> > Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
> > Cc: devicetree@vger.kernel.org
> > ---
> > .../devicetree/bindings/net/dsa/realtek.yaml | 75 +++++++++++++++++++
> > 1 file changed, 75 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/net/dsa/realtek.yaml b/Documentation/devicetree/bindings/net/dsa/realtek.yaml
> > index 46e113df77c8..ef7b27c3b1a3 100644
> > --- a/Documentation/devicetree/bindings/net/dsa/realtek.yaml
> > +++ b/Documentation/devicetree/bindings/net/dsa/realtek.yaml
> > @@ -59,6 +59,9 @@ properties:
> > description: GPIO to be used to reset the whole device
> > maxItems: 1
> >
> > + resets:
> > + maxItems: 1
> > +
> > realtek,disable-leds:
> > type: boolean
> > description: |
> > @@ -385,3 +388,75 @@ examples:
> > };
> > };
> > };
> > +
> > + - |
> > + #include <dt-bindings/gpio/gpio.h>
> > +
> > + platform {
> > + switch {
> > + compatible = "realtek,rtl8365mb";
> > + mdc-gpios = <&gpio1 16 GPIO_ACTIVE_HIGH>;
> > + mdio-gpios = <&gpio1 17 GPIO_ACTIVE_HIGH>;
> > +
> > + resets = <&rst 8>;
> > +
> > + ethernet-ports {
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > +
> > + ethernet-port@0 {
> > + reg = <0>;
> > + label = "wan";
> > + phy-handle = <ðphy-0>;
> > + };
> > + ethernet-port@1 {
> > + reg = <1>;
> > + label = "lan1";
> > + phy-handle = <ðphy-1>;
> > + };
> > + ethernet-port@2 {
> > + reg = <2>;
> > + label = "lan2";
> > + phy-handle = <ðphy-2>;
> > + };
> > + ethernet-port@3 {
> > + reg = <3>;
> > + label = "lan3";
> > + phy-handle = <ðphy-3>;
> > + };
> > + ethernet-port@4 {
> > + reg = <4>;
> > + label = "lan4";
> > + phy-handle = <ðphy-4>;
> > + };
> > + ethernet-port@5 {
> > + reg = <5>;
> > + ethernet = <ð0>;
> > + phy-mode = "rgmii";
> > + fixed-link {
> > + speed = <1000>;
> > + full-duplex;
> > + };
> > + };
> > + };
> > +
> > + mdio {
> > + compatible = "realtek,smi-mdio";
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > +
> > + ethphy-0: ethernet-phy@0 {
>
> You didn't test your binding (make dt_binding_check).
>
> '-' is not valid in labels.
>
My bad. I (wrongly) fixed that in the realtek.example.dtb content,
which is derived from the realtek.yaml.
As it has a newer mtime, it was not updated with dt_binding_check.
>
> Why do we have a whole other example just for 'resets' instead of
> 'reset-gpios'? That's not really worth it.
However, as it is not worth it, I'll drop it.
> Rob
^ permalink raw reply
* RE: [Patch v2] hv_netvsc: Mark VF as slave before exposing it to user-mode
From: Long Li @ 2023-10-30 22:29 UTC (permalink / raw)
To: Jakub Kicinski, longli@linuxonhyperv.com
Cc: KY Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
David S. Miller, Eric Dumazet, Paolo Abeni,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <20231030142542.6640190b@kernel.org>
> Subject: Re: [Patch v2] hv_netvsc: Mark VF as slave before exposing it to user-
> mode
>
> On Fri, 27 Oct 2023 13:59:50 -0700 longli@linuxonhyperv.com wrote:
> > When a VF is being exposed form the kernel, it should be marked as "slave"
> > before exposing to the user-mode. The VF is not usable without netvsc
> > running as master. The user-mode should never see a VF without the "slave"
> flag.
> >
> > This commit moves the code of setting the slave flag to the time
> > before VF is exposed to user-mode.
>
> Can you give a real example in the commit message of a flow in user space
> which would get confused by seeing the VF netdev without IFF_SLAVE?
A user-mode program may see the VF netdev show up without SLAVE flag before seeing the NETVSC netdev. It may try to configure the VF before it will be bonded to a NETVSC.
With the IFF_SLAVE correctly set at the time of VF showing up to the user-mode, it can rely on this flag to decide if this device should be ignored. (without implementing some timeout logic to detect a potential NETVSC device that may show up later)
>
> You're only moving setting IFF_SLAVE but not linking the master, is there no
> code which would assume that if SLAVE is set there is a master?
The same (taking IFF_SLAVE without linking to master) is done in the original code before VF is joined, but it was for another purpose. I think there is a gap between when the VF is acted upon by other parts of the system and when it's bonded.
^ permalink raw reply
* Re: [PATCH net v2] net: dsa: tag_rtl4_a: Bump min packet size
From: Vladimir Oltean @ 2023-10-30 22:20 UTC (permalink / raw)
To: Linus Walleij
Cc: Andrew Lunn, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <CACRpkdaN2rTSHXDxwuS4czCzWyUkazY4Fn5vVLYosqF0=qi-Bw@mail.gmail.com>
On Mon, Oct 30, 2023 at 10:50:31PM +0100, Linus Walleij wrote:
> > you said that what increments is Dot1dTpPortInDiscards
> No this was a coincidence, we can rule this out.
I see commit 86dd9868b878 ("net: dsa: tag_rtl4_a: Support also egress tags")
also mentions: "Qingfang came up with the solution: we need to pad the
ethernet frame to 60 bytes using eth_skb_pad(), then the switch will
happily accept frames with custom tags.". So the __skb_put_padto() was
something very empirical in the first place.
Since it's all problematic, would you mind removing the __skb_put_padto()
altogether from rtl4a_tag_xmit(), and let me know what is the output for
the following sweep through packet sizes? I truly wonder if it's just
for small and large packets that we see packet drops, or if it's something
repetitive throughout the range as well.
for size in $(seq 0 1476); do if ping 10.0.0.56 -s $size -W 1 -c 1 -q >/dev/null; then echo "$((size + 42)): OK"; else echo "$((size + 42)): NOK"; fi; done
^ permalink raw reply
* Re: [PATCH v4 1/5] r8169: Coalesce r8169_mac_ocp_write/modify calls to reduce spinlock stalls
From: Heiner Kallweit @ 2023-10-30 22:08 UTC (permalink / raw)
To: Jacob Keller, Mirsad Goran Todorovac, Jason Gunthorpe,
Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev
Cc: Joerg Roedel, Will Deacon, Robin Murphy, nic_swsd,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Marco Elver
In-Reply-To: <e7a6b0c1-9fc6-480c-a135-7e142514d0e7@intel.com>
On 30.10.2023 22:50, Jacob Keller wrote:
>
>
> On 10/29/2023 4:04 AM, Mirsad Goran Todorovac wrote:> A pair of new
> helpers r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq()
>> are introduced.
>>
>> The motivation for these helpers was the locking overhead of 130 consecutive
>> r8168_mac_ocp_write() calls in the RTL8411b reset after the NIC gets confused
>> if the PHY is powered-down.
>>
>> To quote Heiner:
>>
>> On RTL8411b the RX unit gets confused if the PHY is powered-down.
>> This was reported in [0] and confirmed by Realtek. Realtek provided
>> a sequence to fix the RX unit after PHY wakeup.
>>
>> A series of about 130 r8168_mac_ocp_write() calls is performed to program the
>> RTL registers for recovery, each doing an expensive spin_lock_irqsave() and
>> spin_unlock_irqrestore().
>>
>> Each mac ocp write is made of:
>>
>> static void __r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg,
>> u32 data)
>> {
>> if (rtl_ocp_reg_failure(reg))
>> return;
>>
>> RTL_W32(tp, OCPDR, OCPAR_FLAG | (reg << 15) | data);
>> }
>>
>> static void r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg,
>> u32 data)
>> {
>> unsigned long flags;
>>
>> raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags);
>> __r8168_mac_ocp_write(tp, reg, data);
>> raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
>> }
>>
>> Register programming is done through RTL_W32() macro which expands into
>>
>> #define RTL_W32(tp, reg, val32) writel((val32), tp->mmio_addr + (reg))
>>
>> which is further (on Alpha):
>>
>> extern inline void writel(u32 b, volatile void __iomem *addr)
>> {
>> mb();
>> __raw_writel(b, addr);
>> }
>>
>> or on i386/x86_64:
>>
>> #define build_mmio_write(name, size, type, reg, barrier) \
>> static inline void name(type val, volatile void __iomem *addr) \
>> { asm volatile("mov" size " %0,%1": :reg (val), \
>> "m" (*(volatile type __force *)addr) barrier); }
>>
>> build_mmio_write(writel, "l", unsigned int, "r", :"memory")
>>
>> This obviously involves iat least a compiler barrier.
>>
>> mb() expands into something like this i.e. on x86_64:
>>
>> #define mb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
>>
>> This means a whole lot of memory bus stalls: for spin_lock_irqsave(),
>> memory barrier, writel(), and spin_unlock_irqrestore().
>>
>> With about 130 of these sequential calls to r8168_mac_ocp_write() this looks like
>> a lock storm that will stall all of the cores and CPUs on the same memory controller
>> for certain time I/O takes to finish.
>>
>> In a sequential case of RTL register programming, the writes to RTL registers
>> can be coalesced under a same raw spinlock. This can dramatically decrease the
>> number of bus stalls in a multicore or multi-CPU system.
>>
>> Macro helpers r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq() are
>> provided to reduce lock contention:
>>
>> static void rtl_hw_start_8411_2(struct rtl8169_private *tp)
>> {
>>
>> ...
>>
>> /* The following Realtek-provided magic fixes an issue with the RX unit
>> * getting confused after the PHY having been powered-down.
>> */
>>
>> static const struct recover_8411b_info init_zero_seq[] = {
>> { 0xFC28, 0x0000 }, { 0xFC2A, 0x0000 }, { 0xFC2C, 0x0000 },
>> ...
>> };
>>
>> ...
>>
>> r8168_mac_ocp_write_seq(tp, init_zero_seq);
>>
>> ...
>>
>> }
>>
>> The hex data is preserved intact through s/r8168_mac_ocp_write[(]tp,/{ / and s/[)];/ },/
>> functions that only changed the function names and the ending of the line, so the actual
>> hex data is unchanged.
>>
>> To repeat, the reason for the introduction of the original commit
>> was to enable recovery of the RX unit on the RTL8411b which was confused by the
>> powered-down PHY. This sequence of r8168_mac_ocp_write() calls amplifies the problem
>> into a series of about 500+ memory bus locks, most waiting for the main memory read,
>> modify and write under a LOCK. The memory barrier in RTL_W32 should suffice for
>> the programming sequence to reach RTL NIC registers.
>>
>> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075
>>
>
>
> I might have chosen to send some of this information as the cover letter
> for the series instead of just as part of the commit message for [1/5],
> but either way:
>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Cover letter is still missing, and there's a v5 already.
Good example why we have the "max one version per day" rule.
There's still some issues with the series, see my review comments
for v5. As-is I'd NAK the series.
^ permalink raw reply
* Re: linux-next: build failure after merge of the crypto tree
From: Jakub Kicinski @ 2023-10-30 22:02 UTC (permalink / raw)
To: Herbert Xu
Cc: Stephen Rothwell, David Miller, Paolo Abeni, Networking,
Linux Crypto List, Dmitry Safonov, Dmitry Safonov,
Francesco Ruggeri, Salam Noureddine, Linux Kernel Mailing List,
Linux Next Mailing List
In-Reply-To: <ZT896a2j3hUI1NF+@gondor.apana.org.au>
On Mon, 30 Oct 2023 13:23:53 +0800 Herbert Xu wrote:
> If we simply apply this patch to the netdev tree then everything
> should work at the next merge window. But perhaps you could change
> the patch description to say something like remove the obsolete
> crypto_hash_alignmask. It's not important though.
I'm happy to massage the commit message and apply the fix to net.
But is it actually 100% correct to do that? IOW is calling
crypto_ahash_alignmask() already not necessary in net-next or does
it only become unnecessary after some prep work in crypto-next?
We can tell Linus to squash this fix into the merge of either
crypto-next or net-next, I'm pretty sure he'd be okay with that..
^ permalink raw reply
* Re: [PATCH v3] net: r8169: Disable multicast filter for RTL8168H and RTL8107E
From: Heiner Kallweit @ 2023-10-30 22:02 UTC (permalink / raw)
To: Patrick Thompson, netdev
Cc: Chun-Hao Lin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, linux-kernel, nic_swsd
In-Reply-To: <20231030205031.177855-1-ptf@google.com>
On 30.10.2023 21:50, Patrick Thompson wrote:
> RTL8168H and RTL8107E ethernet adapters erroneously filter unicast
> eapol packets unless allmulti is enabled. These devices correspond to
> RTL_GIGA_MAC_VER_46 and VER_48. Add an exception for VER_46 and VER_48
> in the same way that VER_35 has an exception.
>
> Fixes: 6e1d0b898818 ("r8169:add support for RTL8168H and RTL8107E")
> Signed-off-by: Patrick Thompson <ptf@google.com>
> ---
>
> Changes in v3:
> - disable mc filter for VER_48
> - update description
>
> Changes in v2:
> - add Fixes tag
> - add net annotation
> - update description
>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
^ permalink raw reply
* Re: [PATCH v1 net 2/2] dccp/tcp: Call security_inet_conn_request() after setting IPv6 addresses.
From: Paul Moore @ 2023-10-30 22:00 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: davem, dccp, dsahern, edumazet, huw, kuba, kuni1840,
linux-security-module, netdev, pabeni
In-Reply-To: <20231030212015.57180-1-kuniyu@amazon.com>
On Mon, Oct 30, 2023 at 5:20 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> From: Paul Moore <paul@paul-moore.com>
> Date: Mon, 30 Oct 2023 17:12:33 -0400
> > On Mon, Oct 30, 2023 at 4:12 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > Initially, commit 4237c75c0a35 ("[MLSXFRM]: Auto-labeling of child
> > > sockets") introduced security_inet_conn_request() in some functions
> > > where reqsk is allocated. The hook is added just after the allocation,
> > > so reqsk's IPv6 remote address was not initialised then.
> > >
> > > However, SELinux/Smack started to read it in netlbl_req_setattr()
> > > after commit e1adea927080 ("calipso: Allow request sockets to be
> > > relabelled by the lsm.").
> > >
> > > Commit 284904aa7946 ("lsm: Relocate the IPv4 security_inet_conn_request()
> > > hooks") fixed that kind of issue only in TCPv4 because IPv6 labeling was
> > > not supported at that time. Finally, the same issue was introduced again
> > > in IPv6.
> > >
> > > Let's apply the same fix on DCCPv6 and TCPv6.
> > >
> > > Fixes: e1adea927080 ("calipso: Allow request sockets to be relabelled by the lsm.")
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > ---
> > > Cc: Huw Davies <huw@codeweavers.com>
> > > Cc: Paul Moore <paul@paul-moore.com>
> > > ---
> > > net/dccp/ipv6.c | 6 +++---
> > > net/ipv6/syncookies.c | 7 ++++---
> > > 2 files changed, 7 insertions(+), 6 deletions(-)
> >
> > Thanks for catching this and submitting a patch!
> >
> > It seems like we should also update dccp_v4_conn_request(), what do you think?
>
> Yes, and it's done in patch 1 as it had a separate Fixes tag.
> https://lore.kernel.org/netdev/20231030201042.32885-2-kuniyu@amazon.com/
Great, thanks for doing that. netdev folks, please feel free to add
my ACK to both patches in the patchset.
Acked-by: Paul Moore <paul@paul-moore.com>
> It seems get_maintainers.pl suggested another email address of
> yours for patch 1. It would be good to update .mailmap ;)
Yes, I really should, thanks for the reminder. I'll send an update to
Linus once I get the merge window PRs sorted out.
--
paul-moore.com
^ permalink raw reply
* Re: [PATCH v4 5/5] r8169: Coalesce mac ocp commands for rtl_hw_init_8125 to reduce spinlocks
From: Jacob Keller @ 2023-10-30 21:54 UTC (permalink / raw)
To: Mirsad Goran Todorovac, Jason Gunthorpe, Joerg Roedel, Lu Baolu,
iommu, linux-kernel, netdev
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Heiner Kallweit,
nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Marco Elver
In-Reply-To: <20231029110442.347448-5-mirsad.todorovac@alu.unizg.hr>
On 10/29/2023 4:04 AM, Mirsad Goran Todorovac wrote:
> Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in
> the init sequence of the 8125 involve implicit spin_lock_irqsave() and
> spin_unlock_irqrestore() on each invocation.
>
> Coalesced with the corresponding helpers r8168_mac_ocp_write_seq() and
> r8168_mac_ocp_modify_seq() into sequential write or modidy with a sinqle lock/unlock,
> these calls reduce overall lock contention.
>
> Fixes: f1bce4ad2f1ce ("r8169: add support for RTL8125")
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: Marco Elver <elver@google.com>
> Cc: nic_swsd@realtek.com
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/
> Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/
> Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
> ---
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply
* Re: [PATCH v4 4/5] r8169: Coalesce mac ocp commands for 8125 and 8125B start to reduce spinlock contention
From: Jacob Keller @ 2023-10-30 21:53 UTC (permalink / raw)
To: Mirsad Goran Todorovac, Jason Gunthorpe, Joerg Roedel, Lu Baolu,
iommu, linux-kernel, netdev
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Heiner Kallweit,
nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Marco Elver
In-Reply-To: <20231029110442.347448-4-mirsad.todorovac@alu.unizg.hr>
On 10/29/2023 4:04 AM, Mirsad Goran Todorovac wrote:
> Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in
> the startup of 8125 and 8125B involve implicit spin_lock_irqsave() and
> spin_unlock_irqrestore() on each invocation.
>
> Coalesced with the corresponding helpers r8168_mac_ocp_write_seq() and
> r8168_mac_ocp_modify_seq() into sequential write or modidy with a sinqle lock/unlock,
> these calls reduce overall lock contention.
>
> Fixes: f1bce4ad2f1ce ("r8169: add support for RTL8125")
> Fixes: 0439297be9511 ("r8169: add support for RTL8125B")
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: Marco Elver <elver@google.com>
> Cc: nic_swsd@realtek.com
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/
> Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/
> Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
> ---
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply
* Re: [PATCH v4 3/5] r8169: Coalesce mac ocp write and modify for 8168H start to reduce spinlocks
From: Jacob Keller @ 2023-10-30 21:53 UTC (permalink / raw)
To: Mirsad Goran Todorovac, Jason Gunthorpe, Joerg Roedel, Lu Baolu,
iommu, linux-kernel, netdev
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Heiner Kallweit,
nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Marco Elver
In-Reply-To: <20231029110442.347448-3-mirsad.todorovac@alu.unizg.hr>
On 10/29/2023 4:04 AM, Mirsad Goran Todorovac wrote:
> Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in
> the startup of 8168H involve implicit spin_lock_irqsave() and spin_unlock_irqrestore()
> on each invocation.
>
> Coalesced with the corresponding helpers, r8168_mac_ocp_write_seq() and
> r8168_mac_ocp_modify_seq() with a sinqle lock/unlock, these calls reduce overall
> lock contention.
>
> Fixes: ef712ede3541d ("r8169: add helper r8168_mac_ocp_modify")
> Fixes: 6e1d0b8988188 ("r8169:add support for RTL8168H and RTL8107E")
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: Marco Elver <elver@google.com>
> Cc: nic_swsd@realtek.com
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/
> Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/
> Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
> ---
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply
* Re: [PATCH v4 2/5] r8169: Coalesce RTL8411b PHY power-down recovery calls to reduce spinlock stalls
From: Jacob Keller @ 2023-10-30 21:52 UTC (permalink / raw)
To: Mirsad Goran Todorovac, Jason Gunthorpe, Joerg Roedel, Lu Baolu,
iommu, linux-kernel, netdev
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Heiner Kallweit,
nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Marco Elver
In-Reply-To: <20231029110442.347448-2-mirsad.todorovac@alu.unizg.hr>
On 10/29/2023 4:04 AM, Mirsad Goran Todorovac wrote:
> On RTL8411b the RX unit gets confused if the PHY is powered-down.
> This was reported in [0] and confirmed by Realtek. Realtek provided
> a sequence to fix the RX unit after PHY wakeup.
>
> A series of about 130 r8168_mac_ocp_write() calls is performed to
> program the RTL registers for recovery.
>
> With about 130 of these sequential calls to r8168_mac_ocp_write() this looks like
> a lock storm that will stall all of the cores and CPUs on the same memory controller
> for certain time I/O takes to finish.
>
> In a sequential case of RTL register programming, a sequence of writes to the RTL
> registers can be coalesced under a same raw spinlock. This can dramatically decrease
> the number of bus stalls in a multicore or multi-CPU system:
>
> static void rtl_hw_start_8411_2(struct rtl8169_private *tp)
> {
>
> ...
>
> /* The following Realtek-provided magic fixes an issue with the RX unit
> * getting confused after the PHY having been powered-down.
> */
>
> static const struct recover_8411b_info init_zero_seq[] = {
> { 0xFC28, 0x0000 }, { 0xFC2A, 0x0000 }, { 0xFC2C, 0x0000 },
> ...
> };
>
> static const struct recover_8411b_info recover_seq[] = {
> { 0xF800, 0xE008 }, { 0xF802, 0xE00A }, { 0xF804, 0xE00C },
> ...
> };
>
> static const struct recover_8411b_info final_seq[] = {
> { 0xFC2A, 0x0743 }, { 0xFC2C, 0x0801 }, { 0xFC2E, 0x0BE9 },
> ...
> };
>
> r8168_mac_ocp_write_seq(tp, init_zero_seq);
> mdelay(3);
> r8168_mac_ocp_write(tp, 0xFC26, 0x0000);
> r8168_mac_ocp_write_seq(tp, recover_seq);
> r8168_mac_ocp_write(tp, 0xFC26, 0x8000);
> r8168_mac_ocp_write_seq(tp, final_seq);
> }
>
> The hex data is preserved intact through s/r8168_mac_ocp_write[(]tp,/{ / and s/[)];/ },/
> functions that only changed the function names and the ending of the line, so the actual
> hex data is unchanged.
>
> Note that the reason for the introduction of the original commit
> was to enable recovery of the RX unit on the RTL8411b which was confused by the
> powered-down PHY. This sequence of r8168_mac_ocp_write() calls amplifies the problem
> into a series of about 500+ memory bus locks, most waiting for the main MMIO memory
> read-modify-write under a LOCK. The memory barrier in RTL_W32 should suffice for
> the programming sequence to reach RTL NIC registers.
>
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075
>
> Fixes: fe4e8db0392a6 ("r8169: fix issue with confused RX unit after PHY power-down on RTL8411b")
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: Marco Elver <elver@google.com>
> Cc: nic_swsd@realtek.com
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/
> Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/
> Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
> ---
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply
* Re: [PATCH net] veth: Fix RX stats for bpf_redirect_peer() traffic
From: Peilin Ye @ 2023-10-30 21:51 UTC (permalink / raw)
To: Daniel Borkmann
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Jesper Dangaard Brouer, Peilin Ye, netdev,
bpf, linux-kernel, Cong Wang, Jiang Wang, Youlun Zhang
In-Reply-To: <94c88020-5282-c82b-8f88-a2d012444699@iogearbox.net>
On Mon, Oct 30, 2023 at 03:19:26PM +0100, Daniel Borkmann wrote:
> OT: does cadvisor run inside the Pod to collect the device stats? Just
> curious how it gathers them.
Seems like it reads from /proc/$PID/net/dev:
https://github.com/google/cadvisor/blob/d19df4f3b8e73c7e807d4b9894c252b54852fa8a/container/libcontainer/handler.go#L461
Thanks,
Peilin Ye
^ permalink raw reply
* Re: [PATCH v4 1/5] r8169: Coalesce r8169_mac_ocp_write/modify calls to reduce spinlock stalls
From: Jacob Keller @ 2023-10-30 21:50 UTC (permalink / raw)
To: Mirsad Goran Todorovac, Jason Gunthorpe, Joerg Roedel, Lu Baolu,
iommu, linux-kernel, netdev
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Heiner Kallweit,
nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Marco Elver
In-Reply-To: <20231029110442.347448-1-mirsad.todorovac@alu.unizg.hr>
On 10/29/2023 4:04 AM, Mirsad Goran Todorovac wrote:> A pair of new
helpers r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq()
> are introduced.
>
> The motivation for these helpers was the locking overhead of 130 consecutive
> r8168_mac_ocp_write() calls in the RTL8411b reset after the NIC gets confused
> if the PHY is powered-down.
>
> To quote Heiner:
>
> On RTL8411b the RX unit gets confused if the PHY is powered-down.
> This was reported in [0] and confirmed by Realtek. Realtek provided
> a sequence to fix the RX unit after PHY wakeup.
>
> A series of about 130 r8168_mac_ocp_write() calls is performed to program the
> RTL registers for recovery, each doing an expensive spin_lock_irqsave() and
> spin_unlock_irqrestore().
>
> Each mac ocp write is made of:
>
> static void __r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg,
> u32 data)
> {
> if (rtl_ocp_reg_failure(reg))
> return;
>
> RTL_W32(tp, OCPDR, OCPAR_FLAG | (reg << 15) | data);
> }
>
> static void r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg,
> u32 data)
> {
> unsigned long flags;
>
> raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags);
> __r8168_mac_ocp_write(tp, reg, data);
> raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
> }
>
> Register programming is done through RTL_W32() macro which expands into
>
> #define RTL_W32(tp, reg, val32) writel((val32), tp->mmio_addr + (reg))
>
> which is further (on Alpha):
>
> extern inline void writel(u32 b, volatile void __iomem *addr)
> {
> mb();
> __raw_writel(b, addr);
> }
>
> or on i386/x86_64:
>
> #define build_mmio_write(name, size, type, reg, barrier) \
> static inline void name(type val, volatile void __iomem *addr) \
> { asm volatile("mov" size " %0,%1": :reg (val), \
> "m" (*(volatile type __force *)addr) barrier); }
>
> build_mmio_write(writel, "l", unsigned int, "r", :"memory")
>
> This obviously involves iat least a compiler barrier.
>
> mb() expands into something like this i.e. on x86_64:
>
> #define mb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
>
> This means a whole lot of memory bus stalls: for spin_lock_irqsave(),
> memory barrier, writel(), and spin_unlock_irqrestore().
>
> With about 130 of these sequential calls to r8168_mac_ocp_write() this looks like
> a lock storm that will stall all of the cores and CPUs on the same memory controller
> for certain time I/O takes to finish.
>
> In a sequential case of RTL register programming, the writes to RTL registers
> can be coalesced under a same raw spinlock. This can dramatically decrease the
> number of bus stalls in a multicore or multi-CPU system.
>
> Macro helpers r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq() are
> provided to reduce lock contention:
>
> static void rtl_hw_start_8411_2(struct rtl8169_private *tp)
> {
>
> ...
>
> /* The following Realtek-provided magic fixes an issue with the RX unit
> * getting confused after the PHY having been powered-down.
> */
>
> static const struct recover_8411b_info init_zero_seq[] = {
> { 0xFC28, 0x0000 }, { 0xFC2A, 0x0000 }, { 0xFC2C, 0x0000 },
> ...
> };
>
> ...
>
> r8168_mac_ocp_write_seq(tp, init_zero_seq);
>
> ...
>
> }
>
> The hex data is preserved intact through s/r8168_mac_ocp_write[(]tp,/{ / and s/[)];/ },/
> functions that only changed the function names and the ending of the line, so the actual
> hex data is unchanged.
>
> To repeat, the reason for the introduction of the original commit
> was to enable recovery of the RX unit on the RTL8411b which was confused by the
> powered-down PHY. This sequence of r8168_mac_ocp_write() calls amplifies the problem
> into a series of about 500+ memory bus locks, most waiting for the main memory read,
> modify and write under a LOCK. The memory barrier in RTL_W32 should suffice for
> the programming sequence to reach RTL NIC registers.
>
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075
>
I might have chosen to send some of this information as the cover letter
for the series instead of just as part of the commit message for [1/5],
but either way:
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply
* Re: [PATCH net v2] net: dsa: tag_rtl4_a: Bump min packet size
From: Linus Walleij @ 2023-10-30 21:50 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Andrew Lunn, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20231030141623.ufzhb4ttvxi3ukbj@skbuf>
On Mon, Oct 30, 2023 at 3:16 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> which means that here, skb->len will be 1522, if it was originally 1496.
> So the code adds 26 extra octets, and only 4 of those are legitimate (a tag).
Yeah I know :/
> The rest is absolutely unexplained, which means that until there is a
> valid explanation for them:
>
> pw-bot: cr
>
> (sorry, but if it works and we don't know why it works, then at some
> point it will break and we won't know why it stopped working)
Yeah it broke now and we don't know why...
> you said that what increments is Dot1dTpPortInDiscards. 802.1Q-2018 says
> about it: "Count of received valid frames that were discarded (i.e.,
> filtered) by the Forwarding Process." which is odd enough to me, since
> packets sent by rtl4a_tag_xmit() should *not* be processed by the forwarding
> layer of the switch, but rather, force-delivered to the specified egress
> port.
No this was a coincidence, we can rule this out. There are always
a few (2-3) Dot1dTpPortInDiscards on the switch port when it
is connected, sorry for getting this wrong :/
What happens is way more disturbing: packets are dropped
*silently* if not padded.
I added the following patch:
@@ -37,6 +37,8 @@ static struct sk_buff *rtl4a_tag_xmit(struct sk_buff *skb,
struct net_device *dev)
{
struct dsa_port *dp = dsa_slave_to_port(dev);
+ static u16 mask = BIT(6);
+ static int cnt = 0;
__be16 *p;
u8 *tag;
u16 out;
@@ -60,6 +62,19 @@ static struct sk_buff *rtl4a_tag_xmit(struct sk_buff *skb,
/* The lower bits indicate the port number */
out |= BIT(dp->index);
+ if (skb->len >= (ETH_DATA_LEN - RTL4_A_HDR_LEN)) {
+ /* Test bits... */
+ out |= mask;
+ netdev_info(dev, "add mask %04x to big package\n", mask);
+ cnt ++;
+ if (cnt == 10) {
+ cnt = 0;
+ mask <<= 1;
+ if (mask == BIT(15))
+ mask = BIT(6);
+ }
+ }
+
p = (__be16 *)(tag + 2);
*p = htons(out);
This loops over all the bits not used by the port mask and test them
one by one to see if any of them help.
Then ran a few rounds of ping -s 1472 and ping -s 1470.
There are console prints:
realtek-smi switch lan0: add mask 0040 to big package
realtek-smi switch lan0: add mask 0040 to big package
realtek-smi switch lan0: add mask 0040 to big package
(...)
Then bits 6,7,8,9,10,11,12,13,14 and 15 are tested in succession.
No error counters increase in ethtool -S lan0.
I can see the big packets leave the eth0 interface
(from ethtool -S eth0)
p05_EtherStatsPkts1024to1518Octe: 370
But they do not appear in the targeted switch port stats:
EtherStatsPkts1024to1518Octets: 22
(these 22 are some unrelated -s 1400 packets I sent to test)
Yours,
Linus Walleij
^ permalink raw reply
* Re: [PATCH v3] net: r8169: Disable multicast filter for RTL8168H and RTL8107E
From: Jacob Keller @ 2023-10-30 21:44 UTC (permalink / raw)
To: Patrick Thompson, netdev
Cc: Chun-Hao Lin, David S. Miller, Eric Dumazet, Heiner Kallweit,
Jakub Kicinski, Paolo Abeni, linux-kernel, nic_swsd
In-Reply-To: <20231030205031.177855-1-ptf@google.com>
On 10/30/2023 1:50 PM, Patrick Thompson wrote:
> RTL8168H and RTL8107E ethernet adapters erroneously filter unicast
> eapol packets unless allmulti is enabled. These devices correspond to
> RTL_GIGA_MAC_VER_46 and VER_48. Add an exception for VER_46 and VER_48
> in the same way that VER_35 has an exception.
>
> Fixes: 6e1d0b898818 ("r8169:add support for RTL8168H and RTL8107E")
> Signed-off-by: Patrick Thompson <ptf@google.com>
> ---
>
> Changes in v3:
> - disable mc filter for VER_48
> - update description
>
> Changes in v2:
> - add Fixes tag
> - add net annotation
> - update description
>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox