* Missing wiphy lock in ieee80211_color_collision_detection_work
@ 2024-09-19 8:15 Nicolas Escande
2024-09-19 10:02 ` Nicolas Cavallari
0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Escande @ 2024-09-19 8:15 UTC (permalink / raw)
To: linux-wireless
Hi guys,
Running a pretty hacked up kernel under lockdep, I've had a few splats like this
[ 75.453180] Call trace:
[ 75.455595] cfg80211_bss_color_notify+0x220/0x260
[ 75.460341] ieee80211_color_collision_detection_work+0x4c/0x5c
[ 75.466205] process_one_work+0x434/0x6ec
[ 75.470169] worker_thread+0x9c/0x624
[ 75.473794] kthread+0x1a4/0x1ac
[ 75.476987] ret_from_fork+0x10/0x20
Which shows we are calling cfg80211_obss_color_collision_notify and thus
cfg80211_bss_color_notify from ieee80211_color_collision_detection_work without
holding the wiphy's mtx.
It seems that we should either explicitely lock the phy using something like
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 847304a3a29a..481e1550cb21 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -4833,9 +4833,12 @@ void ieee80211_color_collision_detection_work(struct work_struct *work)
container_of(delayed_work, struct ieee80211_link_data,
color_collision_detect_work);
struct ieee80211_sub_if_data *sdata = link->sdata;
+ struct ieee80211_local *local = sdata->local;
+ wiphy_lock(local->hw.wiphy);
cfg80211_obss_color_collision_notify(sdata->dev, link->color_bitmap,
link->link_id);
+ wiphy_unlock(local->hw.wiphy);
}
void ieee80211_color_change_finish(struct ieee80211_vif *vif, u8 link_id)
Or switch to using the wiphy_work_queue infrastructure.
Did I miss something ? Which one should we do ? I'm not sure of all the
implications of switching to the wiphy work queue and why it did not get
converted like the color_change_finalize_work stuff ?
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: Missing wiphy lock in ieee80211_color_collision_detection_work
2024-09-19 8:15 Missing wiphy lock in ieee80211_color_collision_detection_work Nicolas Escande
@ 2024-09-19 10:02 ` Nicolas Cavallari
2024-09-19 10:22 ` Johannes Berg
0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Cavallari @ 2024-09-19 10:02 UTC (permalink / raw)
To: Nicolas Escande, linux-wireless
On 19/09/2024 10:15, Nicolas Escande wrote:
> Hi guys,
>
> Running a pretty hacked up kernel under lockdep, I've had a few splats like this
>
> [ 75.453180] Call trace:
> [ 75.455595] cfg80211_bss_color_notify+0x220/0x260
> [ 75.460341] ieee80211_color_collision_detection_work+0x4c/0x5c
> [ 75.466205] process_one_work+0x434/0x6ec
> [ 75.470169] worker_thread+0x9c/0x624
> [ 75.473794] kthread+0x1a4/0x1ac
> [ 75.476987] ret_from_fork+0x10/0x20
>
> Which shows we are calling cfg80211_obss_color_collision_notify and thus
> cfg80211_bss_color_notify from ieee80211_color_collision_detection_work without
> holding the wiphy's mtx.
>
> It seems that we should either explicitely lock the phy using something like
>
> diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
> index 847304a3a29a..481e1550cb21 100644
> --- a/net/mac80211/cfg.c
> +++ b/net/mac80211/cfg.c
> @@ -4833,9 +4833,12 @@ void ieee80211_color_collision_detection_work(struct work_struct *work)
> container_of(delayed_work, struct ieee80211_link_data,
> color_collision_detect_work);
> struct ieee80211_sub_if_data *sdata = link->sdata;
> + struct ieee80211_local *local = sdata->local;
>
> + wiphy_lock(local->hw.wiphy);
> cfg80211_obss_color_collision_notify(sdata->dev, link->color_bitmap,
> link->link_id);
> + wiphy_unlock(local->hw.wiphy);
> }
>
> void ieee80211_color_change_finish(struct ieee80211_vif *vif, u8 link_id)
>
> Or switch to using the wiphy_work_queue infrastructure.
>
> Did I miss something ? Which one should we do ? I'm not sure of all the
> implications of switching to the wiphy work queue and why it did not get
> converted like the color_change_finalize_work stuff ?
ieee80211_color_collision_detection_work() used to lock the wdev mutex, now it
does not hold anything since 076fc8775da("wifi: cfg80211: remove wdev mutex")
Also the rate limiting uses delayed_work_pending(), There is no wiphy work queue
equivalent AFAIK, so the explicit lock is probably the way to go.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Missing wiphy lock in ieee80211_color_collision_detection_work
2024-09-19 10:02 ` Nicolas Cavallari
@ 2024-09-19 10:22 ` Johannes Berg
2024-09-20 7:05 ` Nicolas Escande
2024-09-20 8:28 ` Remi Pommarel
0 siblings, 2 replies; 12+ messages in thread
From: Johannes Berg @ 2024-09-19 10:22 UTC (permalink / raw)
To: Nicolas Cavallari, Nicolas Escande, linux-wireless
On Thu, 2024-09-19 at 12:02 +0200, Nicolas Cavallari wrote:
>
> > Did I miss something ? Which one should we do ? I'm not sure of all the
> > implications of switching to the wiphy work queue and why it did not get
> > converted like the color_change_finalize_work stuff ?
>
> ieee80211_color_collision_detection_work() used to lock the wdev mutex, now it
> does not hold anything since 076fc8775da("wifi: cfg80211: remove wdev mutex")
>
> Also the rate limiting uses delayed_work_pending(), There is no wiphy work queue
> equivalent AFAIK, so the explicit lock is probably the way to go.
That won't work, it's cancel_delayed_work_sync() under the wiphy mutex,
so that'll cause deadlocks (and should cause lockdep complaints about
them.)
johannes
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Missing wiphy lock in ieee80211_color_collision_detection_work
2024-09-19 10:22 ` Johannes Berg
@ 2024-09-20 7:05 ` Nicolas Escande
2024-09-20 8:28 ` Remi Pommarel
1 sibling, 0 replies; 12+ messages in thread
From: Nicolas Escande @ 2024-09-20 7:05 UTC (permalink / raw)
To: Johannes Berg, Nicolas Cavallari, Nicolas Escande, linux-wireless
On Thu Sep 19, 2024 at 12:22 PM CEST, Johannes Berg wrote:
> >
> > Also the rate limiting uses delayed_work_pending(), There is no wiphy work queue
> > equivalent AFAIK, so the explicit lock is probably the way to go.
>
> That won't work, it's cancel_delayed_work_sync() under the wiphy mutex,
> so that'll cause deadlocks (and should cause lockdep complaints about
> them.)
>
> johannes
Ok then we'll go the wiphy work queue way.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Missing wiphy lock in ieee80211_color_collision_detection_work
2024-09-19 10:22 ` Johannes Berg
2024-09-20 7:05 ` Nicolas Escande
@ 2024-09-20 8:28 ` Remi Pommarel
2024-09-20 8:41 ` Remi Pommarel
2024-09-20 11:35 ` Nicolas Cavallari
1 sibling, 2 replies; 12+ messages in thread
From: Remi Pommarel @ 2024-09-20 8:28 UTC (permalink / raw)
To: Johannes Berg; +Cc: Nicolas Cavallari, Nicolas Escande, linux-wireless
On Thu, Sep 19, 2024 at 12:22:10PM +0200, Johannes Berg wrote:
> On Thu, 2024-09-19 at 12:02 +0200, Nicolas Cavallari wrote:
> >
> > > Did I miss something ? Which one should we do ? I'm not sure of all the
> > > implications of switching to the wiphy work queue and why it did not get
> > > converted like the color_change_finalize_work stuff ?
> >
> > ieee80211_color_collision_detection_work() used to lock the wdev mutex, now it
> > does not hold anything since 076fc8775da("wifi: cfg80211: remove wdev mutex")
> >
> > Also the rate limiting uses delayed_work_pending(), There is no wiphy work queue
> > equivalent AFAIK, so the explicit lock is probably the way to go.
>
> That won't work, it's cancel_delayed_work_sync() under the wiphy mutex,
> so that'll cause deadlocks (and should cause lockdep complaints about
> them.)
Yes you are right, and AFAIU that is this kind of issue using wiphy work
queue would prevent. With wiphy work queue if wiphy_delayed_work_cancel
is called with wiphy lock held, work cannot be running after it returns;
making it handy to replace cancel_delayed_work_sync() with.
So, in my opinion, switching to wiphy work queue seems to be the
prefered solution here.
While there is no wiphy work queue equivalent of delayed_work_pending(),
I think using timer_pending(&link->color_collision_detect_work->timer)
to replace delayed_work_pending(), even if the semantic is a bit
different, would be ok to fulfill the rate limiting purpose. Having the
same delayed_work_pending() semantics on wiphy work queue would require
to take wiphy lock which seem a bit superfluous here.
Does that make sense ?
Thanks
--
Remi
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Missing wiphy lock in ieee80211_color_collision_detection_work
2024-09-20 8:28 ` Remi Pommarel
@ 2024-09-20 8:41 ` Remi Pommarel
2024-09-20 10:00 ` Johannes Berg
2024-09-20 11:35 ` Nicolas Cavallari
1 sibling, 1 reply; 12+ messages in thread
From: Remi Pommarel @ 2024-09-20 8:41 UTC (permalink / raw)
To: Johannes Berg; +Cc: Nicolas Cavallari, Nicolas Escande, linux-wireless
On Fri, Sep 20, 2024 at 10:28:55AM +0200, Remi Pommarel wrote:
> On Thu, Sep 19, 2024 at 12:22:10PM +0200, Johannes Berg wrote:
> > On Thu, 2024-09-19 at 12:02 +0200, Nicolas Cavallari wrote:
> > >
> > > > Did I miss something ? Which one should we do ? I'm not sure of all the
> > > > implications of switching to the wiphy work queue and why it did not get
> > > > converted like the color_change_finalize_work stuff ?
> > >
> > > ieee80211_color_collision_detection_work() used to lock the wdev mutex, now it
> > > does not hold anything since 076fc8775da("wifi: cfg80211: remove wdev mutex")
> > >
> > > Also the rate limiting uses delayed_work_pending(), There is no wiphy work queue
> > > equivalent AFAIK, so the explicit lock is probably the way to go.
> >
> > That won't work, it's cancel_delayed_work_sync() under the wiphy mutex,
> > so that'll cause deadlocks (and should cause lockdep complaints about
> > them.)
>
> Yes you are right, and AFAIU that is this kind of issue using wiphy work
> queue would prevent. With wiphy work queue if wiphy_delayed_work_cancel
> is called with wiphy lock held, work cannot be running after it returns;
> making it handy to replace cancel_delayed_work_sync() with.
>
> So, in my opinion, switching to wiphy work queue seems to be the
> prefered solution here.
>
> While there is no wiphy work queue equivalent of delayed_work_pending(),
> I think using timer_pending(&link->color_collision_detect_work->timer)
> to replace delayed_work_pending(), even if the semantic is a bit
> different, would be ok to fulfill the rate limiting purpose. Having the
> same delayed_work_pending() semantics on wiphy work queue would require
> to take wiphy lock which seem a bit superfluous here.
Forgot to mention that I am currently running a wiphy work version of
ieee80211_color_collision_detection_work() using timer_pending() as a
delayed_work_pending() substitute since a couple of hours and lockdep
did not complain so far.
--
Remi
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Missing wiphy lock in ieee80211_color_collision_detection_work
2024-09-20 8:41 ` Remi Pommarel
@ 2024-09-20 10:00 ` Johannes Berg
2024-09-20 12:07 ` Remi Pommarel
0 siblings, 1 reply; 12+ messages in thread
From: Johannes Berg @ 2024-09-20 10:00 UTC (permalink / raw)
To: Remi Pommarel; +Cc: Nicolas Cavallari, Nicolas Escande, linux-wireless
On Fri, 2024-09-20 at 10:41 +0200, Remi Pommarel wrote:
> > > That won't work, it's cancel_delayed_work_sync() under the wiphy mutex,
> > > so that'll cause deadlocks (and should cause lockdep complaints about
> > > them.)
> >
> > Yes you are right, and AFAIU that is this kind of issue using wiphy work
> > queue would prevent. With wiphy work queue if wiphy_delayed_work_cancel
> > is called with wiphy lock held, work cannot be running after it returns;
> > making it handy to replace cancel_delayed_work_sync() with.
Right, and safe under the lock, since the lock is taken externally to
the actual work function. It either is completed or still on the list,
both cases are handled safely (do nothing and remove, respectively.)
> > So, in my opinion, switching to wiphy work queue seems to be the
> > prefered solution here.
Yes.
> > While there is no wiphy work queue equivalent of delayed_work_pending(),
Maybe we should add it?
> > I think using timer_pending(&link->color_collision_detect_work->timer)
> > to replace delayed_work_pending(), even if the semantic is a bit
> > different, would be ok to fulfill the rate limiting purpose.
I think you're right. We could as well check list_empty() or so, but it
wouldn't make a big difference. As you say:
> > Having the
> > same delayed_work_pending() semantics on wiphy work queue would require
> > to take wiphy lock
To really have it known _precisely_, that's true.
> > which seem a bit superfluous here.
It's actually simply also not possible - if we could sleep in
ieee80211_obss_color_collision_notify() and take the wiphy mutex, we'd
probably have done this completely differently :)
And a hypothetical wiphy_delayed_work_pending() API should therefore not
be required to be called with the wiphy mutex held.
I think that perhaps we should add such a trivial inline instead of
open-coding the timer_pending() check, but I'm not sure whether or not
it should also check the list (i.e. check if timer has expired, but work
hasn't started yet): on the one hand it seems more appropriate, and if
actually holding the wiphy mutex it would in fact be completely correct,
on the other hand maybe it encourages being sloppily thinking the return
value is always perfect?
Right now I'd tend to have the check and document that it's only
guaranteed when under the wiphy mutex.
> Forgot to mention that I am currently running a wiphy work version of
> ieee80211_color_collision_detection_work() using timer_pending() as a
> delayed_work_pending() substitute since a couple of hours and lockdep
> did not complain so far.
:)
johannes
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Missing wiphy lock in ieee80211_color_collision_detection_work
2024-09-20 8:28 ` Remi Pommarel
2024-09-20 8:41 ` Remi Pommarel
@ 2024-09-20 11:35 ` Nicolas Cavallari
1 sibling, 0 replies; 12+ messages in thread
From: Nicolas Cavallari @ 2024-09-20 11:35 UTC (permalink / raw)
To: Remi Pommarel, Johannes Berg; +Cc: Nicolas Escande, linux-wireless
On 20/09/2024 10:28, Remi Pommarel wrote:
> On Thu, Sep 19, 2024 at 12:22:10PM +0200, Johannes Berg wrote:
>> On Thu, 2024-09-19 at 12:02 +0200, Nicolas Cavallari wrote:
>>>
>>>> Did I miss something ? Which one should we do ? I'm not sure of all the
>>>> implications of switching to the wiphy work queue and why it did not get
>>>> converted like the color_change_finalize_work stuff ?
>>>
>>> ieee80211_color_collision_detection_work() used to lock the wdev mutex, now it
>>> does not hold anything since 076fc8775da("wifi: cfg80211: remove wdev mutex")
>>>
>>> Also the rate limiting uses delayed_work_pending(), There is no wiphy work queue
>>> equivalent AFAIK, so the explicit lock is probably the way to go.
>>
>> That won't work, it's cancel_delayed_work_sync() under the wiphy mutex,
>> so that'll cause deadlocks (and should cause lockdep complaints about
>> them.)
>
> Yes you are right, and AFAIU that is this kind of issue using wiphy work
> queue would prevent. With wiphy work queue if wiphy_delayed_work_cancel
> is called with wiphy lock held, work cannot be running after it returns;
> making it handy to replace cancel_delayed_work_sync() with.
>
> So, in my opinion, switching to wiphy work queue seems to be the
> prefered solution here.
>
> While there is no wiphy work queue equivalent of delayed_work_pending(),
> I think using timer_pending(&link->color_collision_detect_work->timer)
> to replace delayed_work_pending(), even if the semantic is a bit
> different, would be ok to fulfill the rate limiting purpose. Having the
> same delayed_work_pending() semantics on wiphy work queue would require
> to take wiphy lock which seem a bit superfluous here.
>
> Does that make sense ?
At a cost of a spin_trylock, we can also simply take the __ratelimit() option
(or code an equivalent) and schedule a wiphy_work immediately.
The goal is just to rate limit the work (otherwise netlink is flooded and the
system is hosed). queue_delayed_work() and delayed_work_pending() was just an
easy way of implementing it since it had to be a work anyway.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Missing wiphy lock in ieee80211_color_collision_detection_work
2024-09-20 10:00 ` Johannes Berg
@ 2024-09-20 12:07 ` Remi Pommarel
2024-09-20 12:10 ` Remi Pommarel
0 siblings, 1 reply; 12+ messages in thread
From: Remi Pommarel @ 2024-09-20 12:07 UTC (permalink / raw)
To: Johannes Berg; +Cc: Nicolas Cavallari, Nicolas Escande, linux-wireless
On Fri, Sep 20, 2024 at 12:00:08PM +0200, Johannes Berg wrote:
> On Fri, 2024-09-20 at 10:41 +0200, Remi Pommarel wrote:
> > > I think using timer_pending(&link->color_collision_detect_work->timer)
> > > to replace delayed_work_pending(), even if the semantic is a bit
> > > different, would be ok to fulfill the rate limiting purpose.
>
> I think you're right. We could as well check list_empty() or so, but it
> wouldn't make a big difference. As you say:
>
> > > Having the
> > > same delayed_work_pending() semantics on wiphy work queue would require
> > > to take wiphy lock
>
> To really have it known _precisely_, that's true.
>
> > > which seem a bit superfluous here.
>
> It's actually simply also not possible - if we could sleep in
> ieee80211_obss_color_collision_notify() and take the wiphy mutex, we'd
> probably have done this completely differently :)
>
> And a hypothetical wiphy_delayed_work_pending() API should therefore not
> be required to be called with the wiphy mutex held.
>
> I think that perhaps we should add such a trivial inline instead of
> open-coding the timer_pending() check, but I'm not sure whether or not
> it should also check the list (i.e. check if timer has expired, but work
> hasn't started yet): on the one hand it seems more appropriate, and if
> actually holding the wiphy mutex it would in fact be completely correct,
> on the other hand maybe it encourages being sloppily thinking the return
> value is always perfect?
>
> Right now I'd tend to have the check and document that it's only
> guaranteed when under the wiphy mutex.
I had this exact train of thought and was replying that I did agree on
that, and then I checked the other *_pending semantics for which I tend
to forget the details. IIUC timer_pending() and work_pending() can both
return false if callback is still running (or even not yet started).
Thus the hypothetical wiphy_work_pending() semantics could allow to
implement it using timer_pending().
Adding a list_empty() check while making it more "precise" does not make
it "perfect" (even if there is no clear notion of what perfection should
be here) even with wiphy lock held. Indeed if wiphy_work_pending() is
called precisely after delayed work timer has cleared its pending state
but before the callback (i.e wiphy_delayed_work_timer()) has added the
work in the list both !timer_pending() and list_empty(work->entry) would
be true. So there is a small window where wiphy_work_pending() wouldn't
be more precise than just checking timer_pending() as show below:
CPU0 CPU1
expire_timers
detach_timer
wiphy_work_pending() -> return false
timer->function
wiphy_work_queue
list_add_tail()
wiphy_work_pending() -> return false
--
Remi
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Missing wiphy lock in ieee80211_color_collision_detection_work
2024-09-20 12:07 ` Remi Pommarel
@ 2024-09-20 12:10 ` Remi Pommarel
2024-09-20 12:12 ` Johannes Berg
0 siblings, 1 reply; 12+ messages in thread
From: Remi Pommarel @ 2024-09-20 12:10 UTC (permalink / raw)
To: Johannes Berg; +Cc: Nicolas Cavallari, Nicolas Escande, linux-wireless
On Fri, Sep 20, 2024 at 02:07:19PM +0200, Remi Pommarel wrote:
> On Fri, Sep 20, 2024 at 12:00:08PM +0200, Johannes Berg wrote:
> > On Fri, 2024-09-20 at 10:41 +0200, Remi Pommarel wrote:
> > > > I think using timer_pending(&link->color_collision_detect_work->timer)
> > > > to replace delayed_work_pending(), even if the semantic is a bit
> > > > different, would be ok to fulfill the rate limiting purpose.
> >
> > I think you're right. We could as well check list_empty() or so, but it
> > wouldn't make a big difference. As you say:
> >
> > > > Having the
> > > > same delayed_work_pending() semantics on wiphy work queue would require
> > > > to take wiphy lock
> >
> > To really have it known _precisely_, that's true.
> >
> > > > which seem a bit superfluous here.
> >
> > It's actually simply also not possible - if we could sleep in
> > ieee80211_obss_color_collision_notify() and take the wiphy mutex, we'd
> > probably have done this completely differently :)
> >
> > And a hypothetical wiphy_delayed_work_pending() API should therefore not
> > be required to be called with the wiphy mutex held.
> >
> > I think that perhaps we should add such a trivial inline instead of
> > open-coding the timer_pending() check, but I'm not sure whether or not
> > it should also check the list (i.e. check if timer has expired, but work
> > hasn't started yet): on the one hand it seems more appropriate, and if
> > actually holding the wiphy mutex it would in fact be completely correct,
> > on the other hand maybe it encourages being sloppily thinking the return
> > value is always perfect?
> >
> > Right now I'd tend to have the check and document that it's only
> > guaranteed when under the wiphy mutex.
>
> I had this exact train of thought and was replying that I did agree on
> that, and then I checked the other *_pending semantics for which I tend
> to forget the details. IIUC timer_pending() and work_pending() can both
> return false if callback is still running (or even not yet started).
> Thus the hypothetical wiphy_work_pending() semantics could allow to
> implement it using timer_pending().
>
> Adding a list_empty() check while making it more "precise" does not make
> it "perfect" (even if there is no clear notion of what perfection should
> be here) even with wiphy lock held. Indeed if wiphy_work_pending() is
> called precisely after delayed work timer has cleared its pending state
> but before the callback (i.e wiphy_delayed_work_timer()) has added the
> work in the list both !timer_pending() and list_empty(work->entry) would
> be true. So there is a small window where wiphy_work_pending() wouldn't
> be more precise than just checking timer_pending() as show below:
>
> CPU0 CPU1
> expire_timers
> detach_timer
> wiphy_work_pending() -> return false
> timer->function
> wiphy_work_queue
> list_add_tail()
> wiphy_work_pending() -> return false
I meant wiphy_work_pending() -> return true here ^.
--
Remi
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Missing wiphy lock in ieee80211_color_collision_detection_work
2024-09-20 12:10 ` Remi Pommarel
@ 2024-09-20 12:12 ` Johannes Berg
2024-09-20 13:37 ` Remi Pommarel
0 siblings, 1 reply; 12+ messages in thread
From: Johannes Berg @ 2024-09-20 12:12 UTC (permalink / raw)
To: Remi Pommarel; +Cc: Nicolas Cavallari, Nicolas Escande, linux-wireless
On Fri, 2024-09-20 at 14:10 +0200, Remi Pommarel wrote:
> On Fri, Sep 20, 2024 at 02:07:19PM +0200, Remi Pommarel wrote:
> > On Fri, Sep 20, 2024 at 12:00:08PM +0200, Johannes Berg wrote:
> > > On Fri, 2024-09-20 at 10:41 +0200, Remi Pommarel wrote:
> > > > > I think using timer_pending(&link->color_collision_detect_work->timer)
> > > > > to replace delayed_work_pending(), even if the semantic is a bit
> > > > > different, would be ok to fulfill the rate limiting purpose.
> > >
> > > I think you're right. We could as well check list_empty() or so, but it
> > > wouldn't make a big difference. As you say:
> > >
> > > > > Having the
> > > > > same delayed_work_pending() semantics on wiphy work queue would require
> > > > > to take wiphy lock
> > >
> > > To really have it known _precisely_, that's true.
> > >
> > > > > which seem a bit superfluous here.
> > >
> > > It's actually simply also not possible - if we could sleep in
> > > ieee80211_obss_color_collision_notify() and take the wiphy mutex, we'd
> > > probably have done this completely differently :)
> > >
> > > And a hypothetical wiphy_delayed_work_pending() API should therefore not
> > > be required to be called with the wiphy mutex held.
> > >
> > > I think that perhaps we should add such a trivial inline instead of
> > > open-coding the timer_pending() check, but I'm not sure whether or not
> > > it should also check the list (i.e. check if timer has expired, but work
> > > hasn't started yet): on the one hand it seems more appropriate, and if
> > > actually holding the wiphy mutex it would in fact be completely correct,
> > > on the other hand maybe it encourages being sloppily thinking the return
> > > value is always perfect?
> > >
> > > Right now I'd tend to have the check and document that it's only
> > > guaranteed when under the wiphy mutex.
> >
> > I had this exact train of thought and was replying that I did agree on
> > that, and then I checked the other *_pending semantics for which I tend
> > to forget the details.
Haha, yes, no kidding.
> > IIUC timer_pending() and work_pending() can both
> > return false if callback is still running (or even not yet started).
> > Thus the hypothetical wiphy_work_pending() semantics could allow to
> > implement it using timer_pending().
> >
> > Adding a list_empty() check while making it more "precise" does not make
> > it "perfect" (even if there is no clear notion of what perfection should
> > be here) even with wiphy lock held. Indeed if wiphy_work_pending() is
> > called precisely after delayed work timer has cleared its pending state
> > but before the callback (i.e wiphy_delayed_work_timer()) has added the
> > work in the list both !timer_pending() and list_empty(work->entry) would
> > be true. So there is a small window where wiphy_work_pending() wouldn't
> > be more precise than just checking timer_pending() as show below:
> >
> > CPU0 CPU1
> > expire_timers
> > detach_timer
> > wiphy_work_pending() -> return false
> > timer->function
> > wiphy_work_queue
> > list_add_tail()
> > wiphy_work_pending() -> return false
>
> I meant wiphy_work_pending() -> return true here ^.
>
Good point! The second call could even be before the list_add_tail and
still return false, so yeah, wiphy lock doesn't do anything.
But I guess we can live with both of these too, given sufficient
documentation :)
Agree also with Nicolas though that we could just ratelimit this
differently too, this was just a convenient way of achieving it with the
existing infrastructure.
johannes
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Missing wiphy lock in ieee80211_color_collision_detection_work
2024-09-20 12:12 ` Johannes Berg
@ 2024-09-20 13:37 ` Remi Pommarel
0 siblings, 0 replies; 12+ messages in thread
From: Remi Pommarel @ 2024-09-20 13:37 UTC (permalink / raw)
To: Johannes Berg; +Cc: Nicolas Cavallari, Nicolas Escande, linux-wireless
On Fri, Sep 20, 2024 at 02:12:23PM +0200, Johannes Berg wrote:
> On Fri, 2024-09-20 at 14:10 +0200, Remi Pommarel wrote:
> > On Fri, Sep 20, 2024 at 02:07:19PM +0200, Remi Pommarel wrote:
> > > On Fri, Sep 20, 2024 at 12:00:08PM +0200, Johannes Berg wrote:
> > > > On Fri, 2024-09-20 at 10:41 +0200, Remi Pommarel wrote:
> > > > > > I think using timer_pending(&link->color_collision_detect_work->timer)
> > > > > > to replace delayed_work_pending(), even if the semantic is a bit
> > > > > > different, would be ok to fulfill the rate limiting purpose.
> > > >
> > > > I think you're right. We could as well check list_empty() or so, but it
> > > > wouldn't make a big difference. As you say:
> > > >
> > > > > > Having the
> > > > > > same delayed_work_pending() semantics on wiphy work queue would require
> > > > > > to take wiphy lock
> > > >
> > > > To really have it known _precisely_, that's true.
> > > >
> > > > > > which seem a bit superfluous here.
> > > >
> > > > It's actually simply also not possible - if we could sleep in
> > > > ieee80211_obss_color_collision_notify() and take the wiphy mutex, we'd
> > > > probably have done this completely differently :)
> > > >
> > > > And a hypothetical wiphy_delayed_work_pending() API should therefore not
> > > > be required to be called with the wiphy mutex held.
> > > >
> > > > I think that perhaps we should add such a trivial inline instead of
> > > > open-coding the timer_pending() check, but I'm not sure whether or not
> > > > it should also check the list (i.e. check if timer has expired, but work
> > > > hasn't started yet): on the one hand it seems more appropriate, and if
> > > > actually holding the wiphy mutex it would in fact be completely correct,
> > > > on the other hand maybe it encourages being sloppily thinking the return
> > > > value is always perfect?
> > > >
> > > > Right now I'd tend to have the check and document that it's only
> > > > guaranteed when under the wiphy mutex.
> > >
> > > I had this exact train of thought and was replying that I did agree on
> > > that, and then I checked the other *_pending semantics for which I tend
> > > to forget the details.
>
> Haha, yes, no kidding.
>
> > > IIUC timer_pending() and work_pending() can both
> > > return false if callback is still running (or even not yet started).
> > > Thus the hypothetical wiphy_work_pending() semantics could allow to
> > > implement it using timer_pending().
> > >
> > > Adding a list_empty() check while making it more "precise" does not make
> > > it "perfect" (even if there is no clear notion of what perfection should
> > > be here) even with wiphy lock held. Indeed if wiphy_work_pending() is
> > > called precisely after delayed work timer has cleared its pending state
> > > but before the callback (i.e wiphy_delayed_work_timer()) has added the
> > > work in the list both !timer_pending() and list_empty(work->entry) would
> > > be true. So there is a small window where wiphy_work_pending() wouldn't
> > > be more precise than just checking timer_pending() as show below:
> > >
> > > CPU0 CPU1
> > > expire_timers
> > > detach_timer
> > > wiphy_work_pending() -> return false
> > > timer->function
> > > wiphy_work_queue
> > > list_add_tail()
> > > wiphy_work_pending() -> return false
> >
> > I meant wiphy_work_pending() -> return true here ^.
> >
> Good point! The second call could even be before the list_add_tail and
> still return false, so yeah, wiphy lock doesn't do anything.
>
> But I guess we can live with both of these too, given sufficient
> documentation :)
Also, sorry for being bothersome, wiphy_delayed_work_queue() is not a
genuine equivalent of queue_delayed_work() as the latter does not
requeue work if already scheduled while the former does so effectively
changing the delayed work deadline (very mod_delayed_work() alike).
If we wanted to fix that, ieee80211_obss_color_collision_notify() would
simply use wiphy_delayed_work_queue() directly. But other calls would
have to be rechecked and switched to use wiphy_delayed_work_mod() if
needed be.
>
> Agree also with Nicolas though that we could just ratelimit this
> differently too, this was just a convenient way of achieving it with the
> existing infrastructure.
Yes that would work also but still need to convert into wiphy work the
color collision work to avoid deadlock.
--
Remi
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-09-20 13:36 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-19 8:15 Missing wiphy lock in ieee80211_color_collision_detection_work Nicolas Escande
2024-09-19 10:02 ` Nicolas Cavallari
2024-09-19 10:22 ` Johannes Berg
2024-09-20 7:05 ` Nicolas Escande
2024-09-20 8:28 ` Remi Pommarel
2024-09-20 8:41 ` Remi Pommarel
2024-09-20 10:00 ` Johannes Berg
2024-09-20 12:07 ` Remi Pommarel
2024-09-20 12:10 ` Remi Pommarel
2024-09-20 12:12 ` Johannes Berg
2024-09-20 13:37 ` Remi Pommarel
2024-09-20 11:35 ` Nicolas Cavallari
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).