* Re: [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications
[not found] <E1qChay-00Fmrf-9Y@rmk-PC.armlinux.org.uk>
@ 2024-01-23 19:46 ` Sean Anderson
2024-01-23 20:07 ` Russell King (Oracle)
0 siblings, 1 reply; 5+ messages in thread
From: Sean Anderson @ 2024-01-23 19:46 UTC (permalink / raw)
To: rmk+kernel
Cc: Landen.Chao, UNGLinuxDriver, alexandre.belloni, andrew,
angelogioacchino.delregno, arinc.unal, claudiu.manoil, daniel,
davem, dqfext, edumazet, f.fainelli, hkallweit1, kuba,
linux-arm-kernel, linux-mediatek, matthias.bgg, netdev, olteanv,
pabeni, sean.wang
Hi Russell,
Does there need to be any locking when calling phylink_pcs_change? I
noticed that you call it from threaded IRQ context in [1]. Can that race
with phylink_major_config?
--Sean
[1] https://lore.kernel.org/all/E1qJruX-00Gkk8-RY@rmk-PC.armlinux.org.uk/
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications
2024-01-23 19:46 ` [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications Sean Anderson
@ 2024-01-23 20:07 ` Russell King (Oracle)
2024-01-23 20:33 ` Sean Anderson
0 siblings, 1 reply; 5+ messages in thread
From: Russell King (Oracle) @ 2024-01-23 20:07 UTC (permalink / raw)
To: Sean Anderson
Cc: Landen.Chao, UNGLinuxDriver, alexandre.belloni, andrew,
angelogioacchino.delregno, arinc.unal, claudiu.manoil, daniel,
davem, dqfext, edumazet, f.fainelli, hkallweit1, kuba,
linux-arm-kernel, linux-mediatek, matthias.bgg, netdev, olteanv,
pabeni, sean.wang
On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote:
> Hi Russell,
>
> Does there need to be any locking when calling phylink_pcs_change? I
> noticed that you call it from threaded IRQ context in [1]. Can that race
> with phylink_major_config?
What kind of scenario are you thinking may require locking?
I guess the possibility would be if pcs->phylink changes and the
compiler reads it multiple times - READ_ONCE() should solve that.
However, in terms of the mechanics, there's no race.
During the initial bringup, the resolve worker isn't started until
after phylink_major_config() has completed (it's started at
phylink_enable_and_run_resolve().) So, if phylink_pcs_change()
gets called while in phylink_major_config() there, it'll see
that pl->phylink_disable_state is non-zero, and won't queue the
work.
The next one is within the worker itself - and there can only
be one instance of the worker running in totality. So, if
phylink_pcs_change() gets called while phylink_major_config() is
running from this path, the only thing it'll do is re-schedule
the resolve worker to run another iteration which is harmless
(whether or not the PCS is still current.)
The last case is phylink_ethtool_ksettings_set(). This runs under
the state_mutex, which locks out the resolve worker (since it also
takes that mutex).
So calling phylink_pcs_change() should be pretty harmless _unless_
the compiler re-reads pcs->phylink multiple times inside
phylink_pcs_change(), which I suppose with modern compilers is
possible. Hence my suggestion above about READ_ONCE() for that.
Have you encountered an OOPS because pcs->phylink has become NULL?
Or have you spotted another issue?
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications
2024-01-23 20:07 ` Russell King (Oracle)
@ 2024-01-23 20:33 ` Sean Anderson
2024-01-23 21:05 ` Russell King (Oracle)
0 siblings, 1 reply; 5+ messages in thread
From: Sean Anderson @ 2024-01-23 20:33 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Landen.Chao, UNGLinuxDriver, alexandre.belloni, andrew,
angelogioacchino.delregno, arinc.unal, claudiu.manoil, daniel,
davem, dqfext, edumazet, f.fainelli, hkallweit1, kuba,
linux-arm-kernel, linux-mediatek, matthias.bgg, netdev, olteanv,
pabeni, sean.wang
On 1/23/24 15:07, Russell King (Oracle) wrote:
> On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote:
>> Hi Russell,
>>
>> Does there need to be any locking when calling phylink_pcs_change? I
>> noticed that you call it from threaded IRQ context in [1]. Can that race
>> with phylink_major_config?
>
> What kind of scenario are you thinking may require locking?
Can't we at least get a spurious bounce? E.g.
pcs_major_config()
pcs_disable(old_pcs) /* masks IRQ */
old_pcs->phylink = NULL;
new_pcs->phylink = pl;
...
pcs_enable(new_pcs) /* unmasks IRQ */
...
pcs_handle_irq(new_pcs) /* Link up IRQ */
phylink_pcs_change(new_pcs, true)
phylink_run_resolve(pl)
phylink_resolve(pl)
/* Link up */
pcs_handle_irq(old_pcs) /* Link down IRQ (pending from before pcs_disable) */
phylink_pcs_change(old_pcs, false)
phylink_run_resolve(pl) /* Doesn't see the NULL */
phylink_resolve(pl)
/* Link down; retrigger */
phylink_resolve(pl)
/* Link up */
And mac_link_dropped probably needs WRITE_ONCE in order to take
advantage of the ordering provided by queue_work.
> I guess the possibility would be if pcs->phylink changes and the
> compiler reads it multiple times - READ_ONCE() should solve that.
>
> However, in terms of the mechanics, there's no race.
>
> During the initial bringup, the resolve worker isn't started until
> after phylink_major_config() has completed (it's started at
> phylink_enable_and_run_resolve().) So, if phylink_pcs_change()
> gets called while in phylink_major_config() there, it'll see
> that pl->phylink_disable_state is non-zero, and won't queue the
> work.
>
> The next one is within the worker itself - and there can only
> be one instance of the worker running in totality. So, if
> phylink_pcs_change() gets called while phylink_major_config() is
> running from this path, the only thing it'll do is re-schedule
> the resolve worker to run another iteration which is harmless
> (whether or not the PCS is still current.)
>
> The last case is phylink_ethtool_ksettings_set(). This runs under
> the state_mutex, which locks out the resolve worker (since it also
> takes that mutex).
>
> So calling phylink_pcs_change() should be pretty harmless _unless_
> the compiler re-reads pcs->phylink multiple times inside
> phylink_pcs_change(), which I suppose with modern compilers is
> possible. Hence my suggestion above about READ_ONCE() for that.
>
> Have you encountered an OOPS because pcs->phylink has become NULL?
> Or have you spotted another issue?
I was looking at extending this code, and I was wondering if I needed
to e.g. take RTNL first. Thanks for the quick response.
--Sean
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications
2024-01-23 20:33 ` Sean Anderson
@ 2024-01-23 21:05 ` Russell King (Oracle)
2024-01-23 21:09 ` Sean Anderson
0 siblings, 1 reply; 5+ messages in thread
From: Russell King (Oracle) @ 2024-01-23 21:05 UTC (permalink / raw)
To: Sean Anderson
Cc: Landen.Chao, UNGLinuxDriver, alexandre.belloni, andrew,
angelogioacchino.delregno, arinc.unal, claudiu.manoil, daniel,
davem, dqfext, edumazet, f.fainelli, hkallweit1, kuba,
linux-arm-kernel, linux-mediatek, matthias.bgg, netdev, olteanv,
pabeni, sean.wang
On Tue, Jan 23, 2024 at 03:33:57PM -0500, Sean Anderson wrote:
> On 1/23/24 15:07, Russell King (Oracle) wrote:
> > On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote:
> >> Hi Russell,
> >>
> >> Does there need to be any locking when calling phylink_pcs_change? I
> >> noticed that you call it from threaded IRQ context in [1]. Can that race
> >> with phylink_major_config?
> >
> > What kind of scenario are you thinking may require locking?
>
> Can't we at least get a spurious bounce? E.g.
>
> pcs_major_config()
> pcs_disable(old_pcs) /* masks IRQ */
> old_pcs->phylink = NULL;
> new_pcs->phylink = pl;
> ...
> pcs_enable(new_pcs) /* unmasks IRQ */
> ...
>
> pcs_handle_irq(new_pcs) /* Link up IRQ */
> phylink_pcs_change(new_pcs, true)
> phylink_run_resolve(pl)
>
> phylink_resolve(pl)
> /* Link up */
By this time, old_pcs->phylink has been set to NULL as you mentioned
above.
> pcs_handle_irq(old_pcs) /* Link down IRQ (pending from before pcs_disable) */
> phylink_pcs_change(old_pcs, false)
> phylink_run_resolve(pl) /* Doesn't see the NULL */
So here, phylink_pcs_change(old_pcs, ...) will read old_pcs->phylink and
find that it's NULL, and do nothing.
> > I guess the possibility would be if pcs->phylink changes and the
> > compiler reads it multiple times - READ_ONCE() should solve that.
> >
> > However, in terms of the mechanics, there's no race.
> >
> > During the initial bringup, the resolve worker isn't started until
> > after phylink_major_config() has completed (it's started at
> > phylink_enable_and_run_resolve().) So, if phylink_pcs_change()
> > gets called while in phylink_major_config() there, it'll see
> > that pl->phylink_disable_state is non-zero, and won't queue the
> > work.
> >
> > The next one is within the worker itself - and there can only
> > be one instance of the worker running in totality. So, if
> > phylink_pcs_change() gets called while phylink_major_config() is
> > running from this path, the only thing it'll do is re-schedule
> > the resolve worker to run another iteration which is harmless
> > (whether or not the PCS is still current.)
> >
> > The last case is phylink_ethtool_ksettings_set(). This runs under
> > the state_mutex, which locks out the resolve worker (since it also
> > takes that mutex).
> >
> > So calling phylink_pcs_change() should be pretty harmless _unless_
> > the compiler re-reads pcs->phylink multiple times inside
> > phylink_pcs_change(), which I suppose with modern compilers is
> > possible. Hence my suggestion above about READ_ONCE() for that.
> >
> > Have you encountered an OOPS because pcs->phylink has become NULL?
> > Or have you spotted another issue?
>
> I was looking at extending this code, and I was wondering if I needed
> to e.g. take RTNL first. Thanks for the quick response.
Note that phylink_mac_change() gets called in irq context, so this
stuff can't take any mutexes or the rtnl. It is also intended that
phylink_pcs_change() is similarly callable in irq context.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications
2024-01-23 21:05 ` Russell King (Oracle)
@ 2024-01-23 21:09 ` Sean Anderson
0 siblings, 0 replies; 5+ messages in thread
From: Sean Anderson @ 2024-01-23 21:09 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Landen.Chao, UNGLinuxDriver, alexandre.belloni, andrew,
angelogioacchino.delregno, arinc.unal, claudiu.manoil, daniel,
davem, dqfext, edumazet, f.fainelli, hkallweit1, kuba,
linux-arm-kernel, linux-mediatek, matthias.bgg, netdev, olteanv,
pabeni, sean.wang
On 1/23/24 16:05, Russell King (Oracle) wrote:
> On Tue, Jan 23, 2024 at 03:33:57PM -0500, Sean Anderson wrote:
>> On 1/23/24 15:07, Russell King (Oracle) wrote:
>>> On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote:
>>>> Hi Russell,
>>>>
>>>> Does there need to be any locking when calling
>>>> phylink_pcs_change? I noticed that you call it from threaded
>>>> IRQ context in [1]. Can that race with phylink_major_config?
>>>
>>> What kind of scenario are you thinking may require locking?
>>
>> Can't we at least get a spurious bounce? E.g.
>>
>> pcs_major_config() pcs_disable(old_pcs) /* masks IRQ */
>> old_pcs->phylink = NULL; new_pcs->phylink = pl; ...
>> pcs_enable(new_pcs) /* unmasks IRQ */ ...
>>
>> pcs_handle_irq(new_pcs) /* Link up IRQ */
>> phylink_pcs_change(new_pcs, true) phylink_run_resolve(pl)
>>
>> phylink_resolve(pl) /* Link up */
>
> By this time, old_pcs->phylink has been set to NULL as you mentioned
> above.
>
>> pcs_handle_irq(old_pcs) /* Link down IRQ (pending from before
>> pcs_disable) */ phylink_pcs_change(old_pcs, false)
>> phylink_run_resolve(pl) /* Doesn't see the NULL */
>
> So here, phylink_pcs_change(old_pcs, ...) will read old_pcs->phylink
> and find that it's NULL, and do nothing.
This can happen on another CPU. There are no memory barriers on the read
side (until queue_work), so there's no guarantee that other CPUs will
see the write.
--Sean
>>> I guess the possibility would be if pcs->phylink changes and the
>>> compiler reads it multiple times - READ_ONCE() should solve
>>> that.
>>>
>>> However, in terms of the mechanics, there's no race.
>>>
>>> During the initial bringup, the resolve worker isn't started
>>> until after phylink_major_config() has completed (it's started
>>> at phylink_enable_and_run_resolve().) So, if
>>> phylink_pcs_change() gets called while in phylink_major_config()
>>> there, it'll see that pl->phylink_disable_state is non-zero, and
>>> won't queue the work.
>>>
>>> The next one is within the worker itself - and there can only be
>>> one instance of the worker running in totality. So, if
>>> phylink_pcs_change() gets called while phylink_major_config() is
>>> running from this path, the only thing it'll do is re-schedule
>>> the resolve worker to run another iteration which is harmless
>>> (whether or not the PCS is still current.)
>>>
>>> The last case is phylink_ethtool_ksettings_set(). This runs
>>> under the state_mutex, which locks out the resolve worker (since
>>> it also takes that mutex).
>>>
>>> So calling phylink_pcs_change() should be pretty harmless
>>> _unless_ the compiler re-reads pcs->phylink multiple times
>>> inside phylink_pcs_change(), which I suppose with modern
>>> compilers is possible. Hence my suggestion above about
>>> READ_ONCE() for that.
>>>
>>> Have you encountered an OOPS because pcs->phylink has become
>>> NULL? Or have you spotted another issue?
>>
>> I was looking at extending this code, and I was wondering if I
>> needed to e.g. take RTNL first. Thanks for the quick response.
>
> Note that phylink_mac_change() gets called in irq context, so this
> stuff can't take any mutexes or the rtnl. It is also intended that
> phylink_pcs_change() is similarly callable in irq context.
>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-01-23 21:10 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <E1qChay-00Fmrf-9Y@rmk-PC.armlinux.org.uk>
2024-01-23 19:46 ` [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications Sean Anderson
2024-01-23 20:07 ` Russell King (Oracle)
2024-01-23 20:33 ` Sean Anderson
2024-01-23 21:05 ` Russell King (Oracle)
2024-01-23 21:09 ` Sean Anderson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox