From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>,
Heiner Kallweit <hkallweit1@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Joakim Zhang <qiangqing.zhang@nxp.com>,
netdev@vger.kernel.org, Paolo Abeni <pabeni@redhat.com>
Subject: Re: [PATCH net] net: phylink: fix suspend/resume with WoL enabled and link down
Date: Thu, 17 Apr 2025 18:27:17 +0100 [thread overview]
Message-ID: <aAE59VOdtyOwv0Rv@shell.armlinux.org.uk> (raw)
In-Reply-To: <CAKgT0Uf2a48D7O_OSFV8W7j3DJjn_patFbjRbvktazt9UTKoLQ@mail.gmail.com>
On Thu, Apr 17, 2025 at 10:06:45AM -0700, Alexander Duyck wrote:
> On Thu, Apr 17, 2025 at 8:23 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Thu, Apr 17, 2025 at 03:35:56PM +0100, Russell King (Oracle) wrote:
> > > On Thu, Apr 17, 2025 at 07:30:05AM -0700, Alexander H Duyck wrote:
> > > > This is the only spot where we weren't setting netif_carrier_on/off and
> > > > old_link_state together. I suspect you could just carry old_link_state
> > > > without needing to add a new argument. Basically you would just need to
> > > > drop the "else" portion of this statement.
> > > >
> > > > In the grand scheme of things with the exception of this one spot
> > > > old_link_state is essentially the actual MAC/PCS link state whereas
> > > > netif_carrier_off is the administrative state.
> > >
> > > Sorry to say, but you have that wrong. Neither are the administrative
> > > state.
> >
> > To add to this (sorry, I rushed that reply), old_link_state is used when
> > we aren't bound to a network device, otherwise the netif carrier state
> > is used. Changes in the netif carrier state provoke netlink messages to
> > userspace to inform userspace of changes to the link state.
> >
> > Moreover, it controls whether the network stack queues packets for
> > transmission to the driver, and also whether the transmit watchdog is
> > allowed to time out. It _probably_ also lets the packet schedulers
> > know that the link state has changed.
> >
> > So, the netif carrier state is not "administrative".
>
> I have always sort of assumed that netif_carrier_on/off was the
> logical AND of the administrative state of the NIC and the state of
> the actual MAC/PCS/PHY. That is why I refer to it as an administrative
> state. You end up with ifconfig up/down toggling netif carrier even if
> the underlying link state hasn't changed. This has been the case for
> many of the high end NICs for a while now, especially for the
> multi-host ones, as the firmware won't change the actual physical
> state of the link. It leaves it in place while the NIC port on the
> host is no longer active.
>
> > It isn't strictly necessary for old_link_state to remain in sync with
> > the netif carrier state, but it is desirable to avoid errors - but
> > obviously the netif carrier state only exists when we're bound to a
> > network device.
>
> From what I can tell this is the only spot where the two diverge,
> prior to this patch. The general thought I was having was to update
> the netif_carrier state in the suspend, and then old_link_state in the
> resume. I realize now the concern is that setting the
> phylink_disable_state is essentially the same as setting the link
> down. So really we have 3 states we are tracking,
> netif_carrier_ok/old_link_state, phylink_disable_state, and if we want
> the link state to change while disabled. So we do need one additional
> piece of state in the event that there isn't a netdev in order to
> handle the case where "pl->phylink_disable_state &&
> !!pl->phylink_disable_state == pl->old_link_state".
>
> > > > > - /* Call mac_link_down() so we keep the overall state balanced.
> > > > > - * Do this under the state_mutex lock for consistency. This
> > > > > - * will cause a "Link Down" message to be printed during
> > > > > - * resume, which is harmless - the true link state will be
> > > > > - * printed when we run a resolve.
> > > > > - */
> > > > > - mutex_lock(&pl->state_mutex);
> > > > > - phylink_link_down(pl);
> > > > > - mutex_unlock(&pl->state_mutex);
> > > > > + if (pl->suspend_link_up) {
> > > > > + /* Call mac_link_down() so we keep the overall state
> > > > > + * balanced. Do this under the state_mutex lock for
> > > > > + * consistency. This will cause a "Link Down" message
> > > > > + * to be printed during resume, which is harmless -
> > > > > + * the true link state will be printed when we run a
> > > > > + * resolve.
> > > > > + */
> > > > > + mutex_lock(&pl->state_mutex);
> > > > > + phylink_link_down(pl);
> > > > > + mutex_unlock(&pl->state_mutex);
> > > > > + }
> > > >
> > > > You should be able to do all of this with just old_link_state. The only
> > > > thing that would have to change is that you would need to set
> > > > old_link_state to false after the if statement.
> > >
> > > Nope.
> >
> > And still nope - what we need to know here is what was the link state
> > _before_ we called netif_carrier_off() or set old_link_state to false.
> >
> > I somehow suspect that you don't understand what all this code is trying
> > to do, so let me explain it.
> >
> > In the suspend function, when WoL is enabled, and we're using MAC-based
> > WoL, we need the link to the MAC to remain up, so it can receive packets
> > to check whether they are the appropriate wake packet. However, we don't
> > want packets to be queued for transmission either.
>
> That is the same as what we are looking for. We aren't queueing any
> packets for transmission ourselves. We just want to leave the MAC
> enabled, however we also want to leave it enabled when we resume.
>
> > So, we have the case where we want to avoid notifying the MAC that the
> > link has gone down, but we also want to call netif_carrier_off() to stop
> > the network stack queueing packets.
> >
> > To achieve this, several things work in unison:
> >
> > - we take the state mutex to prevent the resolver from running while we
> > fiddle with various state.
> > - we disable the resolver (which, if that's the only thing that happens,
> > will provoke the resolver to take the link down.)
> > - we lie to the resolver about the link state, by calling
> > netif_carrier_off() and/or setting old_link_state to false. This
> > means the resolver believes the link is _already_ down, and thus
> > because of the strict ordering I've previously mentioned, will *not*
> > call mac_link_down().
> > - we release the lock.
> >
> > There is now no way that the resolver will call either mac_link_up() or
> > mac_link_down() - which is what we want here.
>
> The part that I think I missed here was that if we set
> phylink_disable_state we then set link_state.link to false in
> phylink_resolve. I wonder if we couldn't just have a flag that sets
> cur_link_state to false in the "if(pl->phylink_disable_state)" case
> and simplify this to indicate we won't force the link down"
Then how does phylink_stop() end up calling .mac_link_down() ?
> So fbnic_mac_link_up_asic doesn't trigger any issues. The issues are:
>
> 1. In fbnic_mac_link_down_asic we assume that if we are being told
> that the link is down by phylink that it really means the link is down
> and we need to shut off the MAC and flush any packets that are in the
> Tx FIFO. The issue isn't so much the call itself, it is the fact that
> we are being called in phylink_resume to rebalance the link that will
> be going from UP->UP. The rebalancing is introducing an extra down
> step. This could be resolved by adding an extra check to the line in
> phylink_resume that is calling the mac_link_down so that it doesn't
> get triggered and stall the link. In my test code I am now calling it
> "pl->rolling_start".
You're not addessing the issue I've already stated here.
If the link was up, and we *don't* call mac_link_down(), we then *can*
*not* call phylink_mac_initial_config(). It's as simple as that. The
MAC must see link down before being configured.
So, if the link was up, and we don't call mac_link_down() then we must
also *not* call phylink_mac_initial_config(). I've no idea what will
break with that change.
> 2. In phylink_start/phylink_resume since we are coming at this from a
> "rolling start" due to the BMC we have the MAC up and the
> netif_carrier in the "off" state. As a result if we end up losing the
> link between mac_prepare and pcs_get_state the MAC gets into a bad
> state where netif_carrier is off, the MAC is stuck in the "up" state,
> and we have stale packets clogging up the Tx FIFO.
>
> As far as the BMC it isn't so much wanting to hit the big red button
> as our platform team. Basically losing of packets is very problematic
> for them, think about ssh sessions that suddenly lock up during boot,
> and they can demonstrate that none of the other vendors that have the
> MAC/PCS/PHY buried in their firmware have this issue. I was really
> hoping to avoid going that route as the whole point of this was to
> keep the code open source and visible.
The problem I have is not the idea, but the implementation. You want
to change the phylink behaviour in a way that affects *all* existing
users. I don't want that because of the guarantees of that contract
I've previously stated that I've given to existing users.
As things currently stand, you have a currently unique new case, and
you're trying to force your solution on all users potentially breaking
them. I have no real issue with supporting the new case, but *not* at
the expense of regressing existing phylink users.
Yet, when I point out this, you seem to be dead against *not* affecting
other users. This is where the problem is, and where we fundamentally
disagree.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
next prev parent reply other threads:[~2025-04-17 17:27 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-16 15:28 [net-next PATCH 0/2] net: phylink: Fix issue w/ BMC link flap Alexander Duyck
2025-04-16 15:28 ` [net-next PATCH 1/2] net: phylink: Drop unused defines for SUPPORTED/ADVERTISED_INTERFACES Alexander Duyck
2025-04-22 8:32 ` Russell King (Oracle)
2025-04-16 15:29 ` [net-next PATCH 2/2] net: phylink: Fix issues with link balancing w/ BMC present Alexander Duyck
2025-04-16 16:01 ` Russell King (Oracle)
2025-04-16 16:16 ` [PATCH net] net: phylink: fix suspend/resume with WoL enabled and link down Russell King (Oracle)
2025-04-16 17:14 ` Russell King (Oracle)
2025-04-17 14:30 ` Alexander H Duyck
2025-04-17 14:35 ` Russell King (Oracle)
2025-04-17 15:23 ` Russell King (Oracle)
2025-04-17 17:06 ` Alexander Duyck
2025-04-17 17:27 ` Russell King (Oracle) [this message]
2025-04-17 19:49 ` Alexander Duyck
2025-04-22 9:51 ` Russell King (Oracle)
2025-04-22 15:30 ` Alexander Duyck
2025-04-22 16:00 ` Andrew Lunn
2025-04-22 20:09 ` Alexander Duyck
2025-04-23 0:00 ` patchwork-bot+netdevbpf
2025-04-16 17:12 ` [net-next PATCH 2/2] net: phylink: Fix issues with link balancing w/ BMC present Russell King (Oracle)
2025-04-16 19:03 ` Alexander Duyck
2025-04-16 19:19 ` Russell King (Oracle)
2025-04-16 20:05 ` Russell King (Oracle)
2025-04-16 22:58 ` Alexander Duyck
2025-04-19 18:11 ` [net-next PATCH 0/2] net: phylink: Fix issue w/ BMC link flap Andrew Lunn
2025-04-20 18:18 ` Alexander Duyck
2025-04-20 21:58 ` Andrew Lunn
2025-04-21 15:51 ` Alexander Duyck
2025-04-21 16:50 ` Alexander Duyck
2025-04-22 1:21 ` Jakub Kicinski
2025-04-22 13:49 ` Andrew Lunn
2025-04-22 15:28 ` Jakub Kicinski
2025-04-22 16:49 ` Andrew Lunn
2025-04-22 17:30 ` Russell King (Oracle)
2025-04-22 18:13 ` Andrew Lunn
2025-04-22 18:50 ` Russell King (Oracle)
2025-04-22 23:51 ` Jakub Kicinski
2025-04-22 21:29 ` Alexander Duyck
2025-04-22 22:26 ` Andrew Lunn
2025-04-22 23:06 ` Alexander Duyck
2025-04-23 18:38 ` Alexander Duyck
2025-04-24 20:34 ` Andrew Lunn
2025-04-24 23:40 ` Alexander Duyck
2025-04-25 13:11 ` Andrew Lunn
2025-04-25 15:41 ` Alexander Duyck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aAE59VOdtyOwv0Rv@shell.armlinux.org.uk \
--to=linux@armlinux.org.uk \
--cc=alexander.duyck@gmail.com \
--cc=andrew@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hkallweit1@gmail.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=qiangqing.zhang@nxp.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.