* Re: [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications [not found] <E1qChay-00Fmrf-9Y@rmk-PC.armlinux.org.uk> @ 2024-01-23 19:46 ` Sean Anderson 2024-01-23 20:07 ` Russell King (Oracle) 0 siblings, 1 reply; 5+ messages in thread From: Sean Anderson @ 2024-01-23 19:46 UTC (permalink / raw) To: rmk+kernel Cc: Landen.Chao, UNGLinuxDriver, alexandre.belloni, andrew, angelogioacchino.delregno, arinc.unal, claudiu.manoil, daniel, davem, dqfext, edumazet, f.fainelli, hkallweit1, kuba, linux-arm-kernel, linux-mediatek, matthias.bgg, netdev, olteanv, pabeni, sean.wang Hi Russell, Does there need to be any locking when calling phylink_pcs_change? I noticed that you call it from threaded IRQ context in [1]. Can that race with phylink_major_config? --Sean [1] https://lore.kernel.org/all/E1qJruX-00Gkk8-RY@rmk-PC.armlinux.org.uk/ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications 2024-01-23 19:46 ` [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications Sean Anderson @ 2024-01-23 20:07 ` Russell King (Oracle) 2024-01-23 20:33 ` Sean Anderson 0 siblings, 1 reply; 5+ messages in thread From: Russell King (Oracle) @ 2024-01-23 20:07 UTC (permalink / raw) To: Sean Anderson Cc: Landen.Chao, UNGLinuxDriver, alexandre.belloni, andrew, angelogioacchino.delregno, arinc.unal, claudiu.manoil, daniel, davem, dqfext, edumazet, f.fainelli, hkallweit1, kuba, linux-arm-kernel, linux-mediatek, matthias.bgg, netdev, olteanv, pabeni, sean.wang On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote: > Hi Russell, > > Does there need to be any locking when calling phylink_pcs_change? I > noticed that you call it from threaded IRQ context in [1]. Can that race > with phylink_major_config? What kind of scenario are you thinking may require locking? I guess the possibility would be if pcs->phylink changes and the compiler reads it multiple times - READ_ONCE() should solve that. However, in terms of the mechanics, there's no race. During the initial bringup, the resolve worker isn't started until after phylink_major_config() has completed (it's started at phylink_enable_and_run_resolve().) So, if phylink_pcs_change() gets called while in phylink_major_config() there, it'll see that pl->phylink_disable_state is non-zero, and won't queue the work. The next one is within the worker itself - and there can only be one instance of the worker running in totality. So, if phylink_pcs_change() gets called while phylink_major_config() is running from this path, the only thing it'll do is re-schedule the resolve worker to run another iteration which is harmless (whether or not the PCS is still current.) The last case is phylink_ethtool_ksettings_set(). This runs under the state_mutex, which locks out the resolve worker (since it also takes that mutex). So calling phylink_pcs_change() should be pretty harmless _unless_ the compiler re-reads pcs->phylink multiple times inside phylink_pcs_change(), which I suppose with modern compilers is possible. Hence my suggestion above about READ_ONCE() for that. Have you encountered an OOPS because pcs->phylink has become NULL? Or have you spotted another issue? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications 2024-01-23 20:07 ` Russell King (Oracle) @ 2024-01-23 20:33 ` Sean Anderson 2024-01-23 21:05 ` Russell King (Oracle) 0 siblings, 1 reply; 5+ messages in thread From: Sean Anderson @ 2024-01-23 20:33 UTC (permalink / raw) To: Russell King (Oracle) Cc: Landen.Chao, UNGLinuxDriver, alexandre.belloni, andrew, angelogioacchino.delregno, arinc.unal, claudiu.manoil, daniel, davem, dqfext, edumazet, f.fainelli, hkallweit1, kuba, linux-arm-kernel, linux-mediatek, matthias.bgg, netdev, olteanv, pabeni, sean.wang On 1/23/24 15:07, Russell King (Oracle) wrote: > On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote: >> Hi Russell, >> >> Does there need to be any locking when calling phylink_pcs_change? I >> noticed that you call it from threaded IRQ context in [1]. Can that race >> with phylink_major_config? > > What kind of scenario are you thinking may require locking? Can't we at least get a spurious bounce? E.g. pcs_major_config() pcs_disable(old_pcs) /* masks IRQ */ old_pcs->phylink = NULL; new_pcs->phylink = pl; ... pcs_enable(new_pcs) /* unmasks IRQ */ ... pcs_handle_irq(new_pcs) /* Link up IRQ */ phylink_pcs_change(new_pcs, true) phylink_run_resolve(pl) phylink_resolve(pl) /* Link up */ pcs_handle_irq(old_pcs) /* Link down IRQ (pending from before pcs_disable) */ phylink_pcs_change(old_pcs, false) phylink_run_resolve(pl) /* Doesn't see the NULL */ phylink_resolve(pl) /* Link down; retrigger */ phylink_resolve(pl) /* Link up */ And mac_link_dropped probably needs WRITE_ONCE in order to take advantage of the ordering provided by queue_work. > I guess the possibility would be if pcs->phylink changes and the > compiler reads it multiple times - READ_ONCE() should solve that. > > However, in terms of the mechanics, there's no race. > > During the initial bringup, the resolve worker isn't started until > after phylink_major_config() has completed (it's started at > phylink_enable_and_run_resolve().) So, if phylink_pcs_change() > gets called while in phylink_major_config() there, it'll see > that pl->phylink_disable_state is non-zero, and won't queue the > work. > > The next one is within the worker itself - and there can only > be one instance of the worker running in totality. So, if > phylink_pcs_change() gets called while phylink_major_config() is > running from this path, the only thing it'll do is re-schedule > the resolve worker to run another iteration which is harmless > (whether or not the PCS is still current.) > > The last case is phylink_ethtool_ksettings_set(). This runs under > the state_mutex, which locks out the resolve worker (since it also > takes that mutex). > > So calling phylink_pcs_change() should be pretty harmless _unless_ > the compiler re-reads pcs->phylink multiple times inside > phylink_pcs_change(), which I suppose with modern compilers is > possible. Hence my suggestion above about READ_ONCE() for that. > > Have you encountered an OOPS because pcs->phylink has become NULL? > Or have you spotted another issue? I was looking at extending this code, and I was wondering if I needed to e.g. take RTNL first. Thanks for the quick response. --Sean ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications 2024-01-23 20:33 ` Sean Anderson @ 2024-01-23 21:05 ` Russell King (Oracle) 2024-01-23 21:09 ` Sean Anderson 0 siblings, 1 reply; 5+ messages in thread From: Russell King (Oracle) @ 2024-01-23 21:05 UTC (permalink / raw) To: Sean Anderson Cc: Landen.Chao, UNGLinuxDriver, alexandre.belloni, andrew, angelogioacchino.delregno, arinc.unal, claudiu.manoil, daniel, davem, dqfext, edumazet, f.fainelli, hkallweit1, kuba, linux-arm-kernel, linux-mediatek, matthias.bgg, netdev, olteanv, pabeni, sean.wang On Tue, Jan 23, 2024 at 03:33:57PM -0500, Sean Anderson wrote: > On 1/23/24 15:07, Russell King (Oracle) wrote: > > On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote: > >> Hi Russell, > >> > >> Does there need to be any locking when calling phylink_pcs_change? I > >> noticed that you call it from threaded IRQ context in [1]. Can that race > >> with phylink_major_config? > > > > What kind of scenario are you thinking may require locking? > > Can't we at least get a spurious bounce? E.g. > > pcs_major_config() > pcs_disable(old_pcs) /* masks IRQ */ > old_pcs->phylink = NULL; > new_pcs->phylink = pl; > ... > pcs_enable(new_pcs) /* unmasks IRQ */ > ... > > pcs_handle_irq(new_pcs) /* Link up IRQ */ > phylink_pcs_change(new_pcs, true) > phylink_run_resolve(pl) > > phylink_resolve(pl) > /* Link up */ By this time, old_pcs->phylink has been set to NULL as you mentioned above. > pcs_handle_irq(old_pcs) /* Link down IRQ (pending from before pcs_disable) */ > phylink_pcs_change(old_pcs, false) > phylink_run_resolve(pl) /* Doesn't see the NULL */ So here, phylink_pcs_change(old_pcs, ...) will read old_pcs->phylink and find that it's NULL, and do nothing. > > I guess the possibility would be if pcs->phylink changes and the > > compiler reads it multiple times - READ_ONCE() should solve that. > > > > However, in terms of the mechanics, there's no race. > > > > During the initial bringup, the resolve worker isn't started until > > after phylink_major_config() has completed (it's started at > > phylink_enable_and_run_resolve().) So, if phylink_pcs_change() > > gets called while in phylink_major_config() there, it'll see > > that pl->phylink_disable_state is non-zero, and won't queue the > > work. > > > > The next one is within the worker itself - and there can only > > be one instance of the worker running in totality. So, if > > phylink_pcs_change() gets called while phylink_major_config() is > > running from this path, the only thing it'll do is re-schedule > > the resolve worker to run another iteration which is harmless > > (whether or not the PCS is still current.) > > > > The last case is phylink_ethtool_ksettings_set(). This runs under > > the state_mutex, which locks out the resolve worker (since it also > > takes that mutex). > > > > So calling phylink_pcs_change() should be pretty harmless _unless_ > > the compiler re-reads pcs->phylink multiple times inside > > phylink_pcs_change(), which I suppose with modern compilers is > > possible. Hence my suggestion above about READ_ONCE() for that. > > > > Have you encountered an OOPS because pcs->phylink has become NULL? > > Or have you spotted another issue? > > I was looking at extending this code, and I was wondering if I needed > to e.g. take RTNL first. Thanks for the quick response. Note that phylink_mac_change() gets called in irq context, so this stuff can't take any mutexes or the rtnl. It is also intended that phylink_pcs_change() is similarly callable in irq context. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications 2024-01-23 21:05 ` Russell King (Oracle) @ 2024-01-23 21:09 ` Sean Anderson 0 siblings, 0 replies; 5+ messages in thread From: Sean Anderson @ 2024-01-23 21:09 UTC (permalink / raw) To: Russell King (Oracle) Cc: Landen.Chao, UNGLinuxDriver, alexandre.belloni, andrew, angelogioacchino.delregno, arinc.unal, claudiu.manoil, daniel, davem, dqfext, edumazet, f.fainelli, hkallweit1, kuba, linux-arm-kernel, linux-mediatek, matthias.bgg, netdev, olteanv, pabeni, sean.wang On 1/23/24 16:05, Russell King (Oracle) wrote: > On Tue, Jan 23, 2024 at 03:33:57PM -0500, Sean Anderson wrote: >> On 1/23/24 15:07, Russell King (Oracle) wrote: >>> On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote: >>>> Hi Russell, >>>> >>>> Does there need to be any locking when calling >>>> phylink_pcs_change? I noticed that you call it from threaded >>>> IRQ context in [1]. Can that race with phylink_major_config? >>> >>> What kind of scenario are you thinking may require locking? >> >> Can't we at least get a spurious bounce? E.g. >> >> pcs_major_config() pcs_disable(old_pcs) /* masks IRQ */ >> old_pcs->phylink = NULL; new_pcs->phylink = pl; ... >> pcs_enable(new_pcs) /* unmasks IRQ */ ... >> >> pcs_handle_irq(new_pcs) /* Link up IRQ */ >> phylink_pcs_change(new_pcs, true) phylink_run_resolve(pl) >> >> phylink_resolve(pl) /* Link up */ > > By this time, old_pcs->phylink has been set to NULL as you mentioned > above. > >> pcs_handle_irq(old_pcs) /* Link down IRQ (pending from before >> pcs_disable) */ phylink_pcs_change(old_pcs, false) >> phylink_run_resolve(pl) /* Doesn't see the NULL */ > > So here, phylink_pcs_change(old_pcs, ...) will read old_pcs->phylink > and find that it's NULL, and do nothing. This can happen on another CPU. There are no memory barriers on the read side (until queue_work), so there's no guarantee that other CPUs will see the write. --Sean >>> I guess the possibility would be if pcs->phylink changes and the >>> compiler reads it multiple times - READ_ONCE() should solve >>> that. >>> >>> However, in terms of the mechanics, there's no race. >>> >>> During the initial bringup, the resolve worker isn't started >>> until after phylink_major_config() has completed (it's started >>> at phylink_enable_and_run_resolve().) So, if >>> phylink_pcs_change() gets called while in phylink_major_config() >>> there, it'll see that pl->phylink_disable_state is non-zero, and >>> won't queue the work. >>> >>> The next one is within the worker itself - and there can only be >>> one instance of the worker running in totality. So, if >>> phylink_pcs_change() gets called while phylink_major_config() is >>> running from this path, the only thing it'll do is re-schedule >>> the resolve worker to run another iteration which is harmless >>> (whether or not the PCS is still current.) >>> >>> The last case is phylink_ethtool_ksettings_set(). This runs >>> under the state_mutex, which locks out the resolve worker (since >>> it also takes that mutex). >>> >>> So calling phylink_pcs_change() should be pretty harmless >>> _unless_ the compiler re-reads pcs->phylink multiple times >>> inside phylink_pcs_change(), which I suppose with modern >>> compilers is possible. Hence my suggestion above about >>> READ_ONCE() for that. >>> >>> Have you encountered an OOPS because pcs->phylink has become >>> NULL? Or have you spotted another issue? >> >> I was looking at extending this code, and I was wondering if I >> needed to e.g. take RTNL first. Thanks for the quick response. > > Note that phylink_mac_change() gets called in irq context, so this > stuff can't take any mutexes or the rtnl. It is also intended that > phylink_pcs_change() is similarly callable in irq context. > ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-01-23 21:10 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <E1qChay-00Fmrf-9Y@rmk-PC.armlinux.org.uk>
2024-01-23 19:46 ` [PATCH RFC net-next 03/14] net: phylink: add support for PCS link change notifications Sean Anderson
2024-01-23 20:07 ` Russell King (Oracle)
2024-01-23 20:33 ` Sean Anderson
2024-01-23 21:05 ` Russell King (Oracle)
2024-01-23 21:09 ` Sean Anderson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox