linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: mac80211: workqueue & RTNL lock
  2008-06-22 22:42 mac80211: workqueue & RTNL lock Ivo van Doorn
@ 2008-06-22 22:30 ` Michael Buesch
  2008-06-23  8:30   ` Ivo Van Doorn
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Buesch @ 2008-06-22 22:30 UTC (permalink / raw)
  To: Ivo van Doorn; +Cc: linux-wireless, Johannes Berg

On Monday 23 June 2008 00:42:41 Ivo van Doorn wrote:
> That function calls flush_workqueue()

Uh wait. It shouldn't call that. Where exactly is this function called?

-- 
Greetings Michael.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* mac80211: workqueue & RTNL lock
@ 2008-06-22 22:42 Ivo van Doorn
  2008-06-22 22:30 ` Michael Buesch
  0 siblings, 1 reply; 11+ messages in thread
From: Ivo van Doorn @ 2008-06-22 22:42 UTC (permalink / raw)
  To: linux-wireless; +Cc: Johannes Berg

Hi,

I am currently looking at a rt2x00 regression in 2.6.26-rcX
	http://bugzilla.kernel.org/process_bug.cgi

This regression is  caused by a circular locking problem
(see http://bugzilla.kernel.org/attachment.cgi?id=16581)

The problem seems be caused by rt2x00 using ieee80211_iterate_active_interfaces()
within the workqueue as provided by mac80211. But this will trigger the
circular dependency when 'ifdown' is called. If I am not mistaken,
'ifdown' will grab the rtnl lock and ieee80211_stop() is called.
That function calls flush_workqueue() which cannot succeed because it will wait
on a thread that is also trying to grab the rtnl lock.

Possible solution would be making rt2x00 use its own workqueue, but
that will cause similar problems since it can't use flush either it could however
use flags to prevent ieee80211_iterate_active_interfaces() being called.

However it does mean that a solution for the mac80211 workqueue also has to
be found, since rt2x00 isn't the only use of the mac80211 workqueue,
but currently is the only one who uses it in combination with
ieee80211_iterate_active_interfaces().
And apparently those 2 cannot be combined. :S

Ivo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mac80211: workqueue & RTNL lock
  2008-06-22 22:30 ` Michael Buesch
@ 2008-06-23  8:30   ` Ivo Van Doorn
  2008-06-23  9:20     ` Michael Buesch
  0 siblings, 1 reply; 11+ messages in thread
From: Ivo Van Doorn @ 2008-06-23  8:30 UTC (permalink / raw)
  To: Michael Buesch; +Cc: linux-wireless, Johannes Berg

On 6/23/08, Michael Buesch <mb@bu3sch.de> wrote:
> On Monday 23 June 2008 00:42:41 Ivo van Doorn wrote:
>> That function calls flush_workqueue()
>
> Uh wait. It shouldn't call that. Where exactly is this function called?

It is called right before local->ops->remove_interface() which means the
workqueue is flushed when it isn't even guaranteed that all interfaces are gone.

net/mac80211/main.c:553

static int ieee80211_stop(struct net_device *dev)
{
<snip>
  switch (sdata->vif.type) {
	case IEEE80211_IF_TYPE_MESH_POINT:
	case IEEE80211_IF_TYPE_STA:
	case IEEE80211_IF_TYPE_IBSS:
        <snip>
        flush_workqueue(local->hw.workqueue);
       <snip>
  }
<snip>
}

Ivo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mac80211: workqueue & RTNL lock
  2008-06-23  8:30   ` Ivo Van Doorn
@ 2008-06-23  9:20     ` Michael Buesch
  2008-06-23 11:17       ` Ivo Van Doorn
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Buesch @ 2008-06-23  9:20 UTC (permalink / raw)
  To: Ivo Van Doorn; +Cc: linux-wireless, Johannes Berg

On Monday 23 June 2008 10:30:47 Ivo Van Doorn wrote:
> On 6/23/08, Michael Buesch <mb@bu3sch.de> wrote:
> > On Monday 23 June 2008 00:42:41 Ivo van Doorn wrote:
> >> That function calls flush_workqueue()
> >
> > Uh wait. It shouldn't call that. Where exactly is this function called?
> 
> It is called right before local->ops->remove_interface() which means the
> workqueue is flushed when it isn't even guaranteed that all interfaces are gone.
> 
> net/mac80211/main.c:553
> 
> static int ieee80211_stop(struct net_device *dev)
> {
> <snip>
>   switch (sdata->vif.type) {
> 	case IEEE80211_IF_TYPE_MESH_POINT:
> 	case IEEE80211_IF_TYPE_STA:
> 	case IEEE80211_IF_TYPE_IBSS:
>         <snip>
>         flush_workqueue(local->hw.workqueue);
>        <snip>
>   }
> <snip>
> }

Oh right. I see. This is actually OK. I thought it would flush the global wq.
The local mac80211 wq was introduced to workaround this rtnl deadlock,
so I'm surprised it happens again.


-- 
Greetings Michael.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mac80211: workqueue & RTNL lock
  2008-06-23  9:20     ` Michael Buesch
@ 2008-06-23 11:17       ` Ivo Van Doorn
  2008-06-24  8:51         ` Johannes Berg
  0 siblings, 1 reply; 11+ messages in thread
From: Ivo Van Doorn @ 2008-06-23 11:17 UTC (permalink / raw)
  To: Michael Buesch; +Cc: linux-wireless, Johannes Berg

On 6/23/08, Michael Buesch <mb@bu3sch.de> wrote:
> On Monday 23 June 2008 10:30:47 Ivo Van Doorn wrote:
>> On 6/23/08, Michael Buesch <mb@bu3sch.de> wrote:
>> > On Monday 23 June 2008 00:42:41 Ivo van Doorn wrote:
>> >> That function calls flush_workqueue()
>> >
>> > Uh wait. It shouldn't call that. Where exactly is this function called?
>>
>> It is called right before local->ops->remove_interface() which means the
>> workqueue is flushed when it isn't even guaranteed that all interfaces are
>> gone.
>>
>> net/mac80211/main.c:553
>>
>> static int ieee80211_stop(struct net_device *dev)
>> {
>> <snip>
>>   switch (sdata->vif.type) {
>> 	case IEEE80211_IF_TYPE_MESH_POINT:
>> 	case IEEE80211_IF_TYPE_STA:
>> 	case IEEE80211_IF_TYPE_IBSS:
>>         <snip>
>>         flush_workqueue(local->hw.workqueue);
>>        <snip>
>>   }
>> <snip>
>> }
>
> Oh right. I see. This is actually OK. I thought it would flush the global
> wq.
> The local mac80211 wq was introduced to workaround this rtnl deadlock,
> so I'm surprised it happens again.

Well it was caused by the recent introduction of the interface iterator.
Question now is if we need to restrict the usage, remove the flush_workqueue,
or change the locking in the interface iterator. :S

Ivo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mac80211: workqueue & RTNL lock
  2008-06-23 11:17       ` Ivo Van Doorn
@ 2008-06-24  8:51         ` Johannes Berg
  2008-06-24  9:03           ` Ivo Van Doorn
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Berg @ 2008-06-24  8:51 UTC (permalink / raw)
  To: Ivo Van Doorn; +Cc: Michael Buesch, linux-wireless

[-- Attachment #1: Type: text/plain, Size: 528 bytes --]

On Mon, 2008-06-23 at 13:17 +0200, Ivo Van Doorn wrote:

> Well it was caused by the recent introduction of the interface iterator.
> Question now is if we need to restrict the usage, remove the flush_workqueue,
> or change the locking in the interface iterator. :S

Ouch. Any way you can use the atomic iterator there?

Or if not, can you use schedule_work and cancel_work_sync to take care
of things? I would like to keep the mac80211 workqueue rtnl-free because
otherwise locking is going to get ugly.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mac80211: workqueue & RTNL lock
  2008-06-24  8:51         ` Johannes Berg
@ 2008-06-24  9:03           ` Ivo Van Doorn
  2008-06-24  9:10             ` Johannes Berg
  0 siblings, 1 reply; 11+ messages in thread
From: Ivo Van Doorn @ 2008-06-24  9:03 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Michael Buesch, linux-wireless

On 6/24/08, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Mon, 2008-06-23 at 13:17 +0200, Ivo Van Doorn wrote:
>
>> Well it was caused by the recent introduction of the interface iterator.
>> Question now is if we need to restrict the usage, remove the
>> flush_workqueue,
>> or change the locking in the interface iterator. :S
>
> Ouch. Any way you can use the atomic iterator there?

No because of rt2500usb and rt73usb who need scheduling,
and this bug is in exactly the same function as before where we had
the scheduling while atomic problem when you introduced the atomic iterator. ;)

> Or if not, can you use schedule_work and cancel_work_sync to take care
> of things?

I thought about that it cancel_work_sync will cause exactly the same problem,
stop() holds the rtnl lock, rt2x00 work waits for rtnl lock, mac80211
calls stop(), rt2x00 waits until tasks are completed but one of those
is still waiting on the rtnl lock.

However I could use the non-synchronized version for cancelling the work
and add a check to see if the work should be cancelled within the
rtnl-protected area.

> I would like to keep the mac80211 workqueue rtnl-free because
> otherwise locking is going to get ugly.

We need a big warning in mac80211.h then that tells that the non-atomic iterator
or any other rtnl locking cannot be used in the mac80211 workqueue.

Ivo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mac80211: workqueue & RTNL lock
  2008-06-24  9:03           ` Ivo Van Doorn
@ 2008-06-24  9:10             ` Johannes Berg
  2008-06-24  9:20               ` Ivo Van Doorn
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Berg @ 2008-06-24  9:10 UTC (permalink / raw)
  To: Ivo Van Doorn; +Cc: Michael Buesch, linux-wireless

[-- Attachment #1: Type: text/plain, Size: 1775 bytes --]

On Tue, 2008-06-24 at 11:03 +0200, Ivo Van Doorn wrote:

> > Ouch. Any way you can use the atomic iterator there?
> 
> No because of rt2500usb and rt73usb who need scheduling,
> and this bug is in exactly the same function as before where we had
> the scheduling while atomic problem when you introduced the atomic iterator. ;)

Ok :)

> > Or if not, can you use schedule_work and cancel_work_sync to take care
> > of things?
> 
> I thought about that it cancel_work_sync will cause exactly the same problem,
> stop() holds the rtnl lock, rt2x00 work waits for rtnl lock, mac80211
> calls stop(), rt2x00 waits until tasks are completed but one of those
> is still waiting on the rtnl lock.
> 
> However I could use the non-synchronized version for cancelling the work
> and add a check to see if the work should be cancelled within the
> rtnl-protected area.

Oh, right, your work is actually taking the RTNL. We will not get the
->stop callback rtnl-free no matter what we do. And we can probably not
even get it free of whatever lock we need for the netdev list either
way...

However, the interface iterator will simply not run at all if ->stop has
already succeeded since no interfaces would be UP, so you don't really
have to protect it against that, you only have to make sure it's not
running when you unregister your hardware struct, and _that_ doesn't
have the rtnl taken so you can use cancel_work_sync() there.

> > I would like to keep the mac80211 workqueue rtnl-free because
> > otherwise locking is going to get ugly.
> 
> We need a big warning in mac80211.h then that tells that the non-atomic iterator
> or any other rtnl locking cannot be used in the mac80211 workqueue.

Agreed, care to prepare a patch?

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mac80211: workqueue & RTNL lock
  2008-06-24  9:10             ` Johannes Berg
@ 2008-06-24  9:20               ` Ivo Van Doorn
  2008-06-24  9:28                 ` Johannes Berg
  0 siblings, 1 reply; 11+ messages in thread
From: Ivo Van Doorn @ 2008-06-24  9:20 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Michael Buesch, linux-wireless

>> > Or if not, can you use schedule_work and cancel_work_sync to take care
>> > of things?
>>
>> I thought about that it cancel_work_sync will cause exactly the same
>> problem,
>> stop() holds the rtnl lock, rt2x00 work waits for rtnl lock, mac80211
>> calls stop(), rt2x00 waits until tasks are completed but one of those
>> is still waiting on the rtnl lock.
>>
>> However I could use the non-synchronized version for cancelling the work
>> and add a check to see if the work should be cancelled within the
>> rtnl-protected area.
>
> Oh, right, your work is actually taking the RTNL. We will not get the
> ->stop callback rtnl-free no matter what we do. And we can probably not
> even get it free of whatever lock we need for the netdev list either
> way...
>
> However, the interface iterator will simply not run at all if ->stop has
> already succeeded since no interfaces would be UP, so you don't really
> have to protect it against that, you only have to make sure it's not
> running when you unregister your hardware struct, and _that_ doesn't
> have the rtnl taken so you can use cancel_work_sync() there.

True, but the race condition happens when ieee80211_stop() is called,
and the rt2x00 work is triggered before before
rt2x00->remove_interface() or rt2x00>stop() is called. when stop() is
then called it cannot use cancel_work_sync() because
the work structure is waiting until stop() has completed which will
not end until cancel_work_sync() has completed.

>> > I would like to keep the mac80211 workqueue rtnl-free because
>> > otherwise locking is going to get ugly.
>>
>> We need a big warning in mac80211.h then that tells that the non-atomic
>> iterator
>> or any other rtnl locking cannot be used in the mac80211 workqueue.
>
> Agreed, care to prepare a patch?

Sure, as soon as get home I'll write the patch. :)

Ivo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mac80211: workqueue & RTNL lock
  2008-06-24  9:20               ` Ivo Van Doorn
@ 2008-06-24  9:28                 ` Johannes Berg
  2008-06-24  9:33                   ` Ivo Van Doorn
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Berg @ 2008-06-24  9:28 UTC (permalink / raw)
  To: Ivo Van Doorn; +Cc: Michael Buesch, linux-wireless

[-- Attachment #1: Type: text/plain, Size: 1513 bytes --]


> > However, the interface iterator will simply not run at all if ->stop has
> > already succeeded since no interfaces would be UP, so you don't really
> > have to protect it against that, you only have to make sure it's not
> > running when you unregister your hardware struct, and _that_ doesn't
> > have the rtnl taken so you can use cancel_work_sync() there.
> 
> True, but the race condition happens when ieee80211_stop() is called,
> and the rt2x00 work is triggered before before
> rt2x00->remove_interface() or rt2x00>stop() is called. when stop() is
> then called it cannot use cancel_work_sync() because
> the work structure is waiting until stop() has completed which will
> not end until cancel_work_sync() has completed.

No, but I'm saying that ->stop doesn't need to cancel_work_sync(). You
should have a work struct that you exclusively use for whatever you need
to do with interfaces, so the code looks something like this:

void work(...)
{
	iterate_interfaces();
}

maybe with some additional locking. You also use schedule_work() and not
the mac80211-provided workqueue.

This work will do exactly nothing if triggered before ->stop and running
after ->stop because all interfaces will be gone. Hence, you don't need
to cancel it in ->stop, you don't care whether it'll run or not.

Hence, the only thing you need to take care of is that it's canceled
before unregister_hw(), and for that you can use cancel_work_sync().

Am I missing something?

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mac80211: workqueue & RTNL lock
  2008-06-24  9:28                 ` Johannes Berg
@ 2008-06-24  9:33                   ` Ivo Van Doorn
  0 siblings, 0 replies; 11+ messages in thread
From: Ivo Van Doorn @ 2008-06-24  9:33 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Michael Buesch, linux-wireless

>> > However, the interface iterator will simply not run at all if ->stop has
>> > already succeeded since no interfaces would be UP, so you don't really
>> > have to protect it against that, you only have to make sure it's not
>> > running when you unregister your hardware struct, and _that_ doesn't
>> > have the rtnl taken so you can use cancel_work_sync() there.
>>
>> True, but the race condition happens when ieee80211_stop() is called,
>> and the rt2x00 work is triggered before before
>> rt2x00->remove_interface() or rt2x00>stop() is called. when stop() is
>> then called it cannot use cancel_work_sync() because
>> the work structure is waiting until stop() has completed which will
>> not end until cancel_work_sync() has completed.
>
> No, but I'm saying that ->stop doesn't need to cancel_work_sync(). You
> should have a work struct that you exclusively use for whatever you need
> to do with interfaces, so the code looks something like this:
>
> void work(...)
> {
> 	iterate_interfaces();
> }
>
> maybe with some additional locking. You also use schedule_work() and not
> the mac80211-provided workqueue.
>
> This work will do exactly nothing if triggered before ->stop and running
> after ->stop because all interfaces will be gone. Hence, you don't need
> to cancel it in ->stop, you don't care whether it'll run or not.
>
> Hence, the only thing you need to take care of is that it's canceled
> before unregister_hw(), and for that you can use cancel_work_sync().

Ah yes, I understand what you mean now. That would indeed work, I'll get
a patch ready that does exactly that.

> Am I missing something?

Nope, I was missing something. ;)

Thanks.

Ivo

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2008-06-24  9:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-22 22:42 mac80211: workqueue & RTNL lock Ivo van Doorn
2008-06-22 22:30 ` Michael Buesch
2008-06-23  8:30   ` Ivo Van Doorn
2008-06-23  9:20     ` Michael Buesch
2008-06-23 11:17       ` Ivo Van Doorn
2008-06-24  8:51         ` Johannes Berg
2008-06-24  9:03           ` Ivo Van Doorn
2008-06-24  9:10             ` Johannes Berg
2008-06-24  9:20               ` Ivo Van Doorn
2008-06-24  9:28                 ` Johannes Berg
2008-06-24  9:33                   ` Ivo Van Doorn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).