From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: netdev@vger.kernel.org, andrew@lunn.ch, hkallweit1@gmail.com,
davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Subject: Re: [net-next PATCH 2/2] net: phylink: Fix issues with link balancing w/ BMC present
Date: Wed, 16 Apr 2025 20:19:37 +0100 [thread overview]
Message-ID: <aAACyQ494eO4LFQD@shell.armlinux.org.uk> (raw)
In-Reply-To: <CAKgT0UcgZpE4CMDmnR2V2GTz3tyd+atU-mgMqiHesZVXN8F_+g@mail.gmail.com>
On Wed, Apr 16, 2025 at 12:03:05PM -0700, Alexander Duyck wrote:
> On Wed, Apr 16, 2025 at 9:01 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Wed, Apr 16, 2025 at 08:29:00AM -0700, Alexander Duyck wrote:
> > > From: Alexander Duyck <alexanderduyck@fb.com>
> > >
> > > This change is meant to address the fact that there are link imbalances
> > > introduced when using phylink on a system with a BMC. Specifically there
> > > are two issues.
> > >
> > > The first issue is that if we lose link after the first call to
> > > phylink_start but before it gets to the phylink_resolve we will end up with
> > > the phylink interface assuming the link was always down and not calling
> > > phylink_link_down resulting in a stuck interface.
> >
> > That is intentional.
> >
> > phylink strictly orders .mac_link_down and .mac_link_up, and starts from
> > an initial position that the link _will_ be considered to be down. So,
> > it is intentional that .mac_link_down will _never_ be called after
> > phylink_start().
>
> Well the issue is that with a BMC present the link may be up before we
> even start using phylink. So if the link is lost while we are going
> through phylink_start we will end up in an in-between state where the
> link is physically down, but the MAC is still configured as though the
> link is up. This will be problematic as the MAC should essentially be
> discarding frames for transmit if the link is down to avoid blocking
> internal Tx FIFOs.
So, when a Linux network driver probes, it starts out in administrative
state *DOWN*. When the administrator configures the network driver,
.ndo_open is called, which is expected to configure the network adapter.
Part of that process is to call phylink_start() as one of the last
steps, which detects whether the link is up or not. If the link is up,
then phylink will call netif_carrier_on() and .mac_link_up(). This
tells the core networking layers that the network interface is now
ready to start sending packets, and it will begin queueing packets for
the network driver to process - not before.
Prior to .ndo_open being called, the networking layers do not expect
traffic from the network device no matter what physical state the
media link is in. If .ndo_open fails, the same applies - no traffic is
expected to be passed to the core network layers from the network
layers because as far as the network stack is concerned, the interface
is still administratively down.
Thus, the fact that your BMC thinks that the link is up is irrelevant.
So, start off in a state that packet activity is suspended even if the
link is up at probe time. Only start packet activity (reception and
transmission) once .mac_link_up() has been called. Stop that activity
when .mac_link_down() is subsequently called.
There have been lots of NICs out there where the physical link doesn't
follow the adminstrative state of the network interface. This is not a
problem. It may be desirable that it does, but a desire is not the same
as "it must".
> > > The second issue is that when a BMC is present we are currently forcing the
> > > link down. This results in us bouncing the link for a fraction of a second
> > > and that will result in dropped packets for the BMC.
> >
> > ... but you don't explain how that happens.
>
> It was right there in the patch. It was the lines I removed:
... thus further breaking phylink guarantees.
Sorry, but no.
> The issue, even with your recent patch, is that it will still force
> the link down if the link was previously up. That is the piece I need
> to avoid to prevent the BMC from losing link. Ideally what I need is
> to have a check of the current link state and then sync back up rather
> than force the phylink state on the MAC and then clean things up after
> the fact.
So don't force the link, just stop packet activity. As stated above,
nothing requires that the physical link is forced down just because
.mac_link_down() has been called.
This makes me wonder what happens to your BMC with your ideas if
userspace takes down the network interface. It sounds like all hell
breaks loose because you've taken the link down, and that's seen as
a critical failure... So taking down a network interface becomes a
critical failure - yet it's a *normal* userspace operation.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
next prev parent reply other threads:[~2025-04-16 19:19 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-16 15:28 [net-next PATCH 0/2] net: phylink: Fix issue w/ BMC link flap Alexander Duyck
2025-04-16 15:28 ` [net-next PATCH 1/2] net: phylink: Drop unused defines for SUPPORTED/ADVERTISED_INTERFACES Alexander Duyck
2025-04-22 8:32 ` Russell King (Oracle)
2025-04-16 15:29 ` [net-next PATCH 2/2] net: phylink: Fix issues with link balancing w/ BMC present Alexander Duyck
2025-04-16 16:01 ` Russell King (Oracle)
2025-04-16 16:16 ` [PATCH net] net: phylink: fix suspend/resume with WoL enabled and link down Russell King (Oracle)
2025-04-16 17:14 ` Russell King (Oracle)
2025-04-17 14:30 ` Alexander H Duyck
2025-04-17 14:35 ` Russell King (Oracle)
2025-04-17 15:23 ` Russell King (Oracle)
2025-04-17 17:06 ` Alexander Duyck
2025-04-17 17:27 ` Russell King (Oracle)
2025-04-17 19:49 ` Alexander Duyck
2025-04-22 9:51 ` Russell King (Oracle)
2025-04-22 15:30 ` Alexander Duyck
2025-04-22 16:00 ` Andrew Lunn
2025-04-22 20:09 ` Alexander Duyck
2025-04-23 0:00 ` patchwork-bot+netdevbpf
2025-04-16 17:12 ` [net-next PATCH 2/2] net: phylink: Fix issues with link balancing w/ BMC present Russell King (Oracle)
2025-04-16 19:03 ` Alexander Duyck
2025-04-16 19:19 ` Russell King (Oracle) [this message]
2025-04-16 20:05 ` Russell King (Oracle)
2025-04-16 22:58 ` Alexander Duyck
2025-04-19 18:11 ` [net-next PATCH 0/2] net: phylink: Fix issue w/ BMC link flap Andrew Lunn
2025-04-20 18:18 ` Alexander Duyck
2025-04-20 21:58 ` Andrew Lunn
2025-04-21 15:51 ` Alexander Duyck
2025-04-21 16:50 ` Alexander Duyck
2025-04-22 1:21 ` Jakub Kicinski
2025-04-22 13:49 ` Andrew Lunn
2025-04-22 15:28 ` Jakub Kicinski
2025-04-22 16:49 ` Andrew Lunn
2025-04-22 17:30 ` Russell King (Oracle)
2025-04-22 18:13 ` Andrew Lunn
2025-04-22 18:50 ` Russell King (Oracle)
2025-04-22 23:51 ` Jakub Kicinski
2025-04-22 21:29 ` Alexander Duyck
2025-04-22 22:26 ` Andrew Lunn
2025-04-22 23:06 ` Alexander Duyck
2025-04-23 18:38 ` Alexander Duyck
2025-04-24 20:34 ` Andrew Lunn
2025-04-24 23:40 ` Alexander Duyck
2025-04-25 13:11 ` Andrew Lunn
2025-04-25 15:41 ` Alexander Duyck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aAACyQ494eO4LFQD@shell.armlinux.org.uk \
--to=linux@armlinux.org.uk \
--cc=alexander.duyck@gmail.com \
--cc=andrew@lunn.ch \
--cc=davem@davemloft.net \
--cc=hkallweit1@gmail.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).