* hung task in mac80211
@ 2017-09-06 11:57 Matteo Croce
[not found] ` <CAGnkfhwi9MmtLneeU23iWEGLj9cb1ODb3KzzOqTDvA6nNVsugg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Matteo Croce @ 2017-09-06 11:57 UTC (permalink / raw)
To: linux-wireless, netdev, linux-kernel
Hi,
I have an hung task on vanilla 4.13 kernel which I haven't on 4.12.
The problem is present both on my AP and on my notebook,
so it seems it affects AP and STA mode as well.
The generated messages are:
INFO: task kworker/u16:6:120 blocked for more than 120 seconds.
Not tainted 4.13.0 #57
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:6 D 0 120 2 0x00000000
Workqueue: phy0 ieee80211_ba_session_work [mac80211]
Call Trace:
? __schedule+0x174/0x5b0
? schedule+0x31/0x80
? schedule_preempt_disabled+0x9/0x10
? __mutex_lock.isra.2+0x163/0x480
? select_task_rq_fair+0xb9f/0xc60
? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
? try_to_wake_up+0x1f1/0x340
? update_curr+0x88/0xd0
? ieee80211_ba_session_work+0x148/0x230 [mac80211]
? process_one_work+0x1a5/0x330
? worker_thread+0x42/0x3c0
? create_worker+0x170/0x170
? kthread+0x10d/0x130
? kthread_create_on_node+0x40/0x40
? ret_from_fork+0x22/0x30
I did a bisect and the offending commit is:
commit 699cb58c8a52ff39bf659bff7971893ebe111bf2
Author: Johannes Berg <johannes.berg@intel.com>
Date: Tue May 30 16:34:46 2017 +0200
mac80211: manage RX BA session offload without SKB queue
Instead of using the SKB queue with the fake pkt_type for the
offloaded RX BA session management, also handle this with the
normal aggregation state machine worker. This also makes the
use of this more reliable since it gets rid of the allocation
of the fake skb.
Combined with the previous patch, this finally allows us to
get rid of the pkt_type hack entirely, so do that as well.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Regards,
--
Matteo Croce
per aspera ad upstream
^ permalink raw reply [flat|nested] 18+ messages in thread[parent not found: <CAGnkfhwi9MmtLneeU23iWEGLj9cb1ODb3KzzOqTDvA6nNVsugg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: hung task in mac80211 [not found] ` <CAGnkfhwi9MmtLneeU23iWEGLj9cb1ODb3KzzOqTDvA6nNVsugg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2017-09-06 12:28 ` Christian Lamparter 0 siblings, 0 replies; 18+ messages in thread From: Christian Lamparter @ 2017-09-06 12:28 UTC (permalink / raw) To: Matteo Croce Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, johannes-cdvu00un1VgdHxzADdlk8Q On Wednesday, September 6, 2017 1:57:47 PM CEST Matteo Croce wrote: > Hi, > > I have an hung task on vanilla 4.13 kernel which I haven't on 4.12. > The problem is present both on my AP and on my notebook, > so it seems it affects AP and STA mode as well. > The generated messages are: > > INFO: task kworker/u16:6:120 blocked for more than 120 seconds. > Not tainted 4.13.0 #57 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kworker/u16:6 D 0 120 2 0x00000000 > Workqueue: phy0 ieee80211_ba_session_work [mac80211] > Call Trace: > ? __schedule+0x174/0x5b0 > ? schedule+0x31/0x80 > ? schedule_preempt_disabled+0x9/0x10 > ? __mutex_lock.isra.2+0x163/0x480 > ? select_task_rq_fair+0xb9f/0xc60 > ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211] > ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211] > ? try_to_wake_up+0x1f1/0x340 > ? update_curr+0x88/0xd0 > ? ieee80211_ba_session_work+0x148/0x230 [mac80211] > > I did a bisect and the offending commit is: > > commit 699cb58c8a52ff39bf659bff7971893ebe111bf2 > Author: Johannes Berg <johannes.berg-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> > Date: Tue May 30 16:34:46 2017 +0200 > > mac80211: manage RX BA session offload without SKB queue I looked at this briefly: ieee80211_ba_session_work acquires: mutex_lock(&sta->ampdu_mlme.mtx) @ <http://elixir.free-electrons.com/linux/v4.13/source/net/mac80211/ht.c#L321> But it now also calls __ieee80211_start_rx_ba_session() @ http://elixir.free-electrons.com/linux/v4.13/source/net/mac80211/ht.c#L336 which also wants to hold mutex_lock(&sta->ampdu_mlme.mtx) in: http://elixir.free-electrons.com/linux/v4.13/source/net/mac80211/agg-rx.c#L314 I guess this is where it deadlocks? Regards, Christian ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 11:57 hung task in mac80211 Matteo Croce [not found] ` <CAGnkfhwi9MmtLneeU23iWEGLj9cb1ODb3KzzOqTDvA6nNVsugg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2017-09-06 12:40 ` Stefano Brivio 2017-09-06 12:48 ` Johannes Berg 2017-09-06 13:07 ` Matteo Croce 2017-09-06 12:58 ` Johannes Berg 2 siblings, 2 replies; 18+ messages in thread From: Stefano Brivio @ 2017-09-06 12:40 UTC (permalink / raw) To: Matteo Croce; +Cc: linux-wireless, netdev, linux-kernel, Johannes Berg On Wed, 6 Sep 2017 13:57:47 +0200 Matteo Croce <mcroce@redhat.com> wrote: > Hi, > > I have an hung task on vanilla 4.13 kernel which I haven't on 4.12. > The problem is present both on my AP and on my notebook, > so it seems it affects AP and STA mode as well. > The generated messages are: > > INFO: task kworker/u16:6:120 blocked for more than 120 seconds. > Not tainted 4.13.0 #57 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kworker/u16:6 D 0 120 2 0x00000000 > Workqueue: phy0 ieee80211_ba_session_work [mac80211] > Call Trace: > ? __schedule+0x174/0x5b0 > ? schedule+0x31/0x80 > ? schedule_preempt_disabled+0x9/0x10 > ? __mutex_lock.isra.2+0x163/0x480 > ? select_task_rq_fair+0xb9f/0xc60 > ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211] > ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211] This is ugly and maybe wrong, but you could check perhaps...: diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c index c92df492e898..bd7512a656f2 100644 --- a/net/mac80211/ht.c +++ b/net/mac80211/ht.c @@ -320,28 +320,40 @@ void ieee80211_ba_session_work(struct work_struct *work) mutex_lock(&sta->ampdu_mlme.mtx); for (tid = 0; tid < IEEE80211_NUM_TIDS; tid++) { - if (test_and_clear_bit(tid, sta->ampdu_mlme.tid_rx_timer_expired)) + if (test_and_clear_bit(tid, sta->ampdu_mlme.tid_rx_timer_expired)) { + mutex_unlock(&sta->ampdu_mlme.mtx); ___ieee80211_stop_rx_ba_session( sta, tid, WLAN_BACK_RECIPIENT, WLAN_REASON_QSTA_TIMEOUT, true); + mutex_lock(&sta->ampdu_mlme.mtx); + } if (test_and_clear_bit(tid, - sta->ampdu_mlme.tid_rx_stop_requested)) + sta->ampdu_mlme.tid_rx_stop_requested)) { + mutex_unlock(&sta->ampdu_mlme.mtx); ___ieee80211_stop_rx_ba_session( sta, tid, WLAN_BACK_RECIPIENT, WLAN_REASON_UNSPECIFIED, true); + mutex_lock(&sta->ampdu_mlme.mtx); + } if (test_and_clear_bit(tid, - sta->ampdu_mlme.tid_rx_manage_offl)) + sta->ampdu_mlme.tid_rx_manage_offl)) { + mutex_unlock(&sta->ampdu_mlme.mtx); __ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid, IEEE80211_MAX_AMPDU_BUF, false, true); + mutex_lock(&sta->ampdu_mlme.mtx); + } if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS, - sta->ampdu_mlme.tid_rx_manage_offl)) + sta->ampdu_mlme.tid_rx_manage_offl)) { + mutex_unlock(&sta->ampdu_mlme.mtx); ___ieee80211_stop_rx_ba_session( sta, tid, WLAN_BACK_RECIPIENT, 0, false); + mutex_lock(&sta->ampdu_mlme.mtx); + } spin_lock_bh(&sta->lock); -- Stefano ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 12:40 ` Stefano Brivio @ 2017-09-06 12:48 ` Johannes Berg 2017-09-06 13:03 ` Johannes Berg 2017-09-06 13:19 ` Stefano Brivio 2017-09-06 13:07 ` Matteo Croce 1 sibling, 2 replies; 18+ messages in thread From: Johannes Berg @ 2017-09-06 12:48 UTC (permalink / raw) To: Stefano Brivio, Matteo Croce; +Cc: linux-wireless, netdev, linux-kernel I'll look in a bit - but > + mutex_unlock(&sta->ampdu_mlme.mtx); > ___ieee80211_stop_rx_ba_session( > sta, tid, WLAN_BACK_RECIPIENT, > WLAN_REASON_QSTA_TIMEOUT, true); This already has three underscores so shouldn't drop. > > + mutex_unlock(&sta->ampdu_mlme.mtx); > ___ieee80211_stop_rx_ba_session( ditto > + mutex_unlock(&sta->ampdu_mlme.mtx); > __ieee80211_start_rx_ba_session(sta, 0, 0, > 0, 1, tid, maybe this one needs a ___ version then? > + mutex_unlock(&sta->ampdu_mlme.mtx); > ___ieee80211_stop_rx_ba_session( > already ___ I'm surprised nobody saw this before - though perhaps Sebastian's useless report is the same. johannes ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 12:48 ` Johannes Berg @ 2017-09-06 13:03 ` Johannes Berg [not found] ` <1504702990.13457.19.camel-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> 2017-09-06 13:19 ` Stefano Brivio 1 sibling, 1 reply; 18+ messages in thread From: Johannes Berg @ 2017-09-06 13:03 UTC (permalink / raw) To: Stefano Brivio, Matteo Croce; +Cc: linux-wireless, netdev, linux-kernel On Wed, 2017-09-06 at 14:48 +0200, Johannes Berg wrote: > > I'm surprised nobody saw this before - though perhaps Sebastian's > useless report is the same. Oh, that's because this is only for the offloaded manager thing, and that's only ath10k. johannes ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <1504702990.13457.19.camel-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>]
* Re: hung task in mac80211 [not found] ` <1504702990.13457.19.camel-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> @ 2017-09-06 13:08 ` Sebastian Gottschall 0 siblings, 0 replies; 18+ messages in thread From: Sebastian Gottschall @ 2017-09-06 13:08 UTC (permalink / raw) To: Johannes Berg, Stefano Brivio, Matteo Croce Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA Am 06.09.2017 um 15:03 schrieb Johannes Berg: > On Wed, 2017-09-06 at 14:48 +0200, Johannes Berg wrote: >> I'm surprised nobody saw this before - though perhaps Sebastian's >> useless report is the same. > Oh, that's because this is only for the offloaded manager thing, and > that's only ath10k. > > johannes that explans the behaviour i have found with latest mac80211 and ath10k. -- Mit freundlichen Grüssen / Regards Sebastian Gottschall / CTO NewMedia-NET GmbH - DD-WRT Firmensitz: Berliner Ring 101, 64625 Bensheim Registergericht: Amtsgericht Darmstadt, HRB 25473 Geschäftsführer: Peter Steinhäuser, Christian Scheele http://www.dd-wrt.com email: s.gottschall-t37Kgv3TaIPQT0dZR+AlfA@public.gmane.org Tel.: +496251-582650 / Fax: +496251-5826565 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 12:48 ` Johannes Berg 2017-09-06 13:03 ` Johannes Berg @ 2017-09-06 13:19 ` Stefano Brivio 2017-09-06 13:21 ` Johannes Berg 1 sibling, 1 reply; 18+ messages in thread From: Stefano Brivio @ 2017-09-06 13:19 UTC (permalink / raw) To: Johannes Berg; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel On Wed, 06 Sep 2017 14:48:35 +0200 Johannes Berg <johannes@sipsolutions.net> wrote: > I'll look in a bit - but > > > + mutex_unlock(&sta->ampdu_mlme.mtx); > > ___ieee80211_stop_rx_ba_session( > > sta, tid, WLAN_BACK_RECIPIENT, > > WLAN_REASON_QSTA_TIMEOUT, true); > > This already has three underscores so shouldn't drop. Right, of course. > [...] > > + mutex_unlock(&sta->ampdu_mlme.mtx); > > __ieee80211_start_rx_ba_session(sta, 0, 0, > > 0, 1, tid, > > maybe this one needs a ___ version then? Either that, or as it's a single call, perhaps just the following? Matter of taste I guess... diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c index c92df492e898..377dd3c233d3 100644 --- a/net/mac80211/ht.c +++ b/net/mac80211/ht.c @@ -332,10 +332,13 @@ void ieee80211_ba_session_work(struct work_struct *work) WLAN_REASON_UNSPECIFIED, true); if (test_and_clear_bit(tid, - sta->ampdu_mlme.tid_rx_manage_offl)) + sta->ampdu_mlme.tid_rx_manage_offl)) { + mutex_unlock(&sta->ampdu_mlme.mtx); __ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid, IEEE80211_MAX_AMPDU_BUF, false, true); + mutex_lock(&sta->ampdu_mlme.mtx); + } if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS, sta->ampdu_mlme.tid_rx_manage_offl)) -- Stefano ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 13:19 ` Stefano Brivio @ 2017-09-06 13:21 ` Johannes Berg 2017-09-06 13:27 ` Stefano Brivio 0 siblings, 1 reply; 18+ messages in thread From: Johannes Berg @ 2017-09-06 13:21 UTC (permalink / raw) To: Stefano Brivio; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel On Wed, 2017-09-06 at 15:19 +0200, Stefano Brivio wrote: > On Wed, 06 Sep 2017 14:48:35 +0200 > Johannes Berg <johannes@sipsolutions.net> wrote: > > > I'll look in a bit - but > > > > > + mutex_unlock(&sta->ampdu_mlme.mtx); > > > ___ieee80211_stop_rx_ba_session( > > > sta, tid, WLAN_BACK_RECIPIENT, > > > WLAN_REASON_QSTA_TIMEOUT, > > > true); > > > > This already has three underscores so shouldn't drop. > > Right, of course. > > > [...] > > > + mutex_unlock(&sta->ampdu_mlme.mtx); > > > __ieee80211_start_rx_ba_session(sta, 0, > > > 0, > > > 0, 1, tid, > > > > maybe this one needs a ___ version then? > > Either that, or as it's a single call, perhaps just the following? > Matter of taste I guess... I don't think it's a matter of taste - for me, in principle, dropping locks for small sections of code where the larger section holds it is a bug waiting to happen. It may (may, I don't even know) be OK here, but in general it's something to avoid. johannes ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 13:21 ` Johannes Berg @ 2017-09-06 13:27 ` Stefano Brivio 2017-09-06 13:30 ` Johannes Berg 0 siblings, 1 reply; 18+ messages in thread From: Stefano Brivio @ 2017-09-06 13:27 UTC (permalink / raw) To: Johannes Berg; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel On Wed, 06 Sep 2017 15:21:00 +0200 Johannes Berg <johannes@sipsolutions.net> wrote: > On Wed, 2017-09-06 at 15:19 +0200, Stefano Brivio wrote: > > On Wed, 06 Sep 2017 14:48:35 +0200 > > Johannes Berg <johannes@sipsolutions.net> wrote: > > > > > I'll look in a bit - but > > > > > > > + mutex_unlock(&sta->ampdu_mlme.mtx); > > > > ___ieee80211_stop_rx_ba_session( > > > > sta, tid, WLAN_BACK_RECIPIENT, > > > > WLAN_REASON_QSTA_TIMEOUT, > > > > true); > > > > > > This already has three underscores so shouldn't drop. > > > > Right, of course. > > > > > [...] > > > > + mutex_unlock(&sta->ampdu_mlme.mtx); > > > > __ieee80211_start_rx_ba_session(sta, 0, > > > > 0, > > > > 0, 1, tid, > > > > > > maybe this one needs a ___ version then? > > > > Either that, or as it's a single call, perhaps just the following? > > Matter of taste I guess... > > I don't think it's a matter of taste - for me, in principle, dropping > locks for small sections of code where the larger section holds it is a > bug waiting to happen. It may (may, I don't even know) be OK here, but > in general it's something to avoid. Yes, that was based on the assumption that the initial part of __ieee80211_start_rx_ba_session() can't really affect the AMPDU state-machine in any way. But sure, one small change there in the future and the assumption doesn't hold anymore. -- Stefano ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 13:27 ` Stefano Brivio @ 2017-09-06 13:30 ` Johannes Berg 2017-09-06 13:40 ` Stefano Brivio 0 siblings, 1 reply; 18+ messages in thread From: Johannes Berg @ 2017-09-06 13:30 UTC (permalink / raw) To: Stefano Brivio; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel On Wed, 2017-09-06 at 15:27 +0200, Stefano Brivio wrote: > > Yes, that was based on the assumption that the initial part of > __ieee80211_start_rx_ba_session() can't really affect the AMPDU > state-machine in any way. That's not really the point, if that changes that function would have to move the locking around, and nothing else. The point is more that code in ieee80211_ba_session_work() could assume the lock is held across the entire loop, since that's the way it's written and looks like even with your patch. So for example replacing the loop of tid = 0..NUM_TIDS-1 with a list_for_each_entry() would already be unsafe with the dropping if the list were to require the mutex for locking. johannes ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 13:30 ` Johannes Berg @ 2017-09-06 13:40 ` Stefano Brivio 0 siblings, 0 replies; 18+ messages in thread From: Stefano Brivio @ 2017-09-06 13:40 UTC (permalink / raw) To: Johannes Berg; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel On Wed, 06 Sep 2017 15:30:10 +0200 Johannes Berg <johannes@sipsolutions.net> wrote: > So for example replacing the loop of tid = 0..NUM_TIDS-1 with a > list_for_each_entry() would already be unsafe with the dropping if the > list were to require the mutex for locking. Sure. Still, it would need another code change to break, but in general I do agree indeed. :) -- Stefano ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 12:40 ` Stefano Brivio 2017-09-06 12:48 ` Johannes Berg @ 2017-09-06 13:07 ` Matteo Croce 1 sibling, 0 replies; 18+ messages in thread From: Matteo Croce @ 2017-09-06 13:07 UTC (permalink / raw) To: Stefano Brivio Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Johannes Berg On Wed, Sep 6, 2017 at 2:40 PM, Stefano Brivio <sbrivio-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > On Wed, 6 Sep 2017 13:57:47 +0200 > Matteo Croce <mcroce-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > >> Hi, >> >> I have an hung task on vanilla 4.13 kernel which I haven't on 4.12. >> The problem is present both on my AP and on my notebook, >> so it seems it affects AP and STA mode as well. >> The generated messages are: >> >> INFO: task kworker/u16:6:120 blocked for more than 120 seconds. >> Not tainted 4.13.0 #57 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kworker/u16:6 D 0 120 2 0x00000000 >> Workqueue: phy0 ieee80211_ba_session_work [mac80211] >> Call Trace: >> ? __schedule+0x174/0x5b0 >> ? schedule+0x31/0x80 >> ? schedule_preempt_disabled+0x9/0x10 >> ? __mutex_lock.isra.2+0x163/0x480 >> ? select_task_rq_fair+0xb9f/0xc60 >> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211] >> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211] > > This is ugly and maybe wrong, but you could check perhaps...: > > diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c > index c92df492e898..bd7512a656f2 100644 > --- a/net/mac80211/ht.c > +++ b/net/mac80211/ht.c > @@ -320,28 +320,40 @@ void ieee80211_ba_session_work(struct work_struct *work) > > mutex_lock(&sta->ampdu_mlme.mtx); > for (tid = 0; tid < IEEE80211_NUM_TIDS; tid++) { > - if (test_and_clear_bit(tid, sta->ampdu_mlme.tid_rx_timer_expired)) > + if (test_and_clear_bit(tid, sta->ampdu_mlme.tid_rx_timer_expired)) { > + mutex_unlock(&sta->ampdu_mlme.mtx); > ___ieee80211_stop_rx_ba_session( > sta, tid, WLAN_BACK_RECIPIENT, > WLAN_REASON_QSTA_TIMEOUT, true); > + mutex_lock(&sta->ampdu_mlme.mtx); > + } > > if (test_and_clear_bit(tid, > - sta->ampdu_mlme.tid_rx_stop_requested)) > + sta->ampdu_mlme.tid_rx_stop_requested)) { > + mutex_unlock(&sta->ampdu_mlme.mtx); > ___ieee80211_stop_rx_ba_session( > sta, tid, WLAN_BACK_RECIPIENT, > WLAN_REASON_UNSPECIFIED, true); > + mutex_lock(&sta->ampdu_mlme.mtx); > + } > > if (test_and_clear_bit(tid, > - sta->ampdu_mlme.tid_rx_manage_offl)) > + sta->ampdu_mlme.tid_rx_manage_offl)) { > + mutex_unlock(&sta->ampdu_mlme.mtx); > __ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid, > IEEE80211_MAX_AMPDU_BUF, > false, true); > + mutex_lock(&sta->ampdu_mlme.mtx); > + } > > if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS, > - sta->ampdu_mlme.tid_rx_manage_offl)) > + sta->ampdu_mlme.tid_rx_manage_offl)) { > + mutex_unlock(&sta->ampdu_mlme.mtx); > ___ieee80211_stop_rx_ba_session( > sta, tid, WLAN_BACK_RECIPIENT, > 0, false); > + mutex_lock(&sta->ampdu_mlme.mtx); > + } > > spin_lock_bh(&sta->lock); > > -- > Stefano > ACK, I have it running since 12 minutes. The hang usually appears shortly after boot as I set kernel.hung_task_timeout_secs=10 -- Matteo Croce per aspera ad upstream ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 11:57 hung task in mac80211 Matteo Croce [not found] ` <CAGnkfhwi9MmtLneeU23iWEGLj9cb1ODb3KzzOqTDvA6nNVsugg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2017-09-06 12:40 ` Stefano Brivio @ 2017-09-06 12:58 ` Johannes Berg 2017-09-06 14:27 ` Stefano Brivio 2017-09-06 15:04 ` Matteo Croce 2 siblings, 2 replies; 18+ messages in thread From: Johannes Berg @ 2017-09-06 12:58 UTC (permalink / raw) To: Matteo Croce, linux-wireless, netdev, linux-kernel On Wed, 2017-09-06 at 13:57 +0200, Matteo Croce wrote: > I have an hung task on vanilla 4.13 kernel which I haven't on 4.12. > The problem is present both on my AP and on my notebook, > so it seems it affects AP and STA mode as well. > The generated messages are: > > INFO: task kworker/u16:6:120 blocked for more than 120 seconds. > Not tainted 4.13.0 #57 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this > message. > kworker/u16:6 D 0 120 2 0x00000000 > Workqueue: phy0 ieee80211_ba_session_work [mac80211] > Call Trace: > ? __schedule+0x174/0x5b0 > ? schedule+0x31/0x80 > ? schedule_preempt_disabled+0x9/0x10 > ? __mutex_lock.isra.2+0x163/0x480 > ? select_task_rq_fair+0xb9f/0xc60 > ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211] > ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211] Yeah - obviously as Stefano found, both take &sta->ampdu_mlme.mtx. Can you try this? diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c index 2b36eff5d97e..d8d32776031e 100644 --- a/net/mac80211/agg-rx.c +++ b/net/mac80211/agg-rx.c @@ -245,10 +245,10 @@ static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *d ieee80211_tx_skb(sdata, skb); } -void __ieee80211_start_rx_ba_session(struct sta_info *sta, - u8 dialog_token, u16 timeout, - u16 start_seq_num, u16 ba_policy, u16 tid, - u16 buf_size, bool tx, bool auto_seq) +void ___ieee80211_start_rx_ba_session(struct sta_info *sta, + u8 dialog_token, u16 timeout, + u16 start_seq_num, u16 ba_policy, u16 tid, + u16 buf_size, bool tx, bool auto_seq) { struct ieee80211_local *local = sta->sdata->local; struct tid_ampdu_rx *tid_agg_rx; @@ -267,7 +267,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, ht_dbg(sta->sdata, "STA %pM requests BA session on unsupported tid %d\n", sta->sta.addr, tid); - goto end_no_lock; + goto end; } if (!sta->sta.ht_cap.ht_supported) { @@ -275,14 +275,14 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, "STA %pM erroneously requests BA session on tid %d w/o QoS\n", sta->sta.addr, tid); /* send a response anyway, it's an error case if we get here */ - goto end_no_lock; + goto end; } if (test_sta_flag(sta, WLAN_STA_BLOCK_BA)) { ht_dbg(sta->sdata, "Suspend in progress - Denying ADDBA request (%pM tid %d)\n", sta->sta.addr, tid); - goto end_no_lock; + goto end; } /* sanity check for incoming parameters: @@ -296,7 +296,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, ht_dbg_ratelimited(sta->sdata, "AddBA Req with bad params from %pM on tid %u. policy %d, buffer size %d\n", sta->sta.addr, tid, ba_policy, buf_size); - goto end_no_lock; + goto end; } /* determine default buffer size */ if (buf_size == 0) @@ -311,7 +311,6 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, buf_size, sta->sta.addr); /* examine state machine */ - mutex_lock(&sta->ampdu_mlme.mtx); if (test_bit(tid, sta->ampdu_mlme.agg_session_valid)) { if (sta->ampdu_mlme.tid_rx_token[tid] == dialog_token) { @@ -415,15 +414,25 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, __clear_bit(tid, sta->ampdu_mlme.unexpected_agg); sta->ampdu_mlme.tid_rx_token[tid] = dialog_token; } - mutex_unlock(&sta->ampdu_mlme.mtx); -end_no_lock: if (tx) ieee80211_send_addba_resp(sta->sdata, sta->sta.addr, tid, dialog_token, status, 1, buf_size, timeout); } +void __ieee80211_start_rx_ba_session(struct sta_info *sta, + u8 dialog_token, u16 timeout, + u16 start_seq_num, u16 ba_policy, u16 tid, + u16 buf_size, bool tx, bool auto_seq) +{ + mutex_lock(&sta->ampdu_mlme.mtx); + ___ieee80211_start_rx_ba_session(sta, dialog_token, timeout, + start_seq_num, ba_policy, tid, + buf_size, tx, auto_seq); + mutex_unlock(&sta->ampdu_mlme.mtx); +} + void ieee80211_process_addba_request(struct ieee80211_local *local, struct sta_info *sta, struct ieee80211_mgmt *mgmt, diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c index c92df492e898..198b2d3e56fd 100644 --- a/net/mac80211/ht.c +++ b/net/mac80211/ht.c @@ -333,9 +333,9 @@ void ieee80211_ba_session_work(struct work_struct *work) if (test_and_clear_bit(tid, sta->ampdu_mlme.tid_rx_manage_offl)) - __ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid, - IEEE80211_MAX_AMPDU_BUF, - false, true); + ___ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid, + IEEE80211_MAX_AMPDU_BUF, + false, true); if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS, sta->ampdu_mlme.tid_rx_manage_offl)) diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index 2197c62a0a6e..9675814f64db 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -1760,6 +1760,10 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, u8 dialog_token, u16 timeout, u16 start_seq_num, u16 ba_policy, u16 tid, u16 buf_size, bool tx, bool auto_seq); +void ___ieee80211_start_rx_ba_session(struct sta_info *sta, + u8 dialog_token, u16 timeout, + u16 start_seq_num, u16 ba_policy, u16 tid, + u16 buf_size, bool tx, bool auto_seq); void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta, enum ieee80211_agg_stop_reason reason); void ieee80211_process_delba(struct ieee80211_sub_if_data *sdata, johannes ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 12:58 ` Johannes Berg @ 2017-09-06 14:27 ` Stefano Brivio 2017-09-06 14:30 ` Johannes Berg 2017-09-06 15:04 ` Matteo Croce 1 sibling, 1 reply; 18+ messages in thread From: Stefano Brivio @ 2017-09-06 14:27 UTC (permalink / raw) To: Johannes Berg; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel On Wed, 06 Sep 2017 14:58:48 +0200 Johannes Berg <johannes@sipsolutions.net> wrote: > +void __ieee80211_start_rx_ba_session(struct sta_info *sta, > + u8 dialog_token, u16 timeout, > + u16 start_seq_num, u16 ba_policy, u16 tid, > + u16 buf_size, bool tx, bool auto_seq) > +{ > + mutex_lock(&sta->ampdu_mlme.mtx); > + ___ieee80211_start_rx_ba_session(sta, dialog_token, timeout, > + start_seq_num, ba_policy, tid, > + buf_size, tx, auto_seq); > + mutex_unlock(&sta->ampdu_mlme.mtx); > +} > + Sorry for the extended bothering :) but here, you're extending quite a bit the scope of the lock also when__ieee80211_start_rx_ba_session() is called by ieee80211_process_addba_request(). No idea what the hit can be, but we can't safely assume it's nothing either. What about simply introducing a 'ampdu_mlme_lock_held' argument instead? Something like: diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c index 2b36eff5d97e..35a9eff1ec66 100644 --- a/net/mac80211/agg-rx.c +++ b/net/mac80211/agg-rx.c @@ -248,7 +248,8 @@ static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *d void __ieee80211_start_rx_ba_session(struct sta_info *sta, u8 dialog_token, u16 timeout, u16 start_seq_num, u16 ba_policy, u16 tid, - u16 buf_size, bool tx, bool auto_seq) + u16 buf_size, bool tx, bool auto_seq, + bool ampdu_mlme_lock_held) { struct ieee80211_local *local = sta->sdata->local; struct tid_ampdu_rx *tid_agg_rx; @@ -311,7 +312,8 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, buf_size, sta->sta.addr); /* examine state machine */ - mutex_lock(&sta->ampdu_mlme.mtx); + if (!ampdu_mlme_lock_held) + mutex_lock(&sta->ampdu_mlme.mtx); if (test_bit(tid, sta->ampdu_mlme.agg_session_valid)) { if (sta->ampdu_mlme.tid_rx_token[tid] == dialog_token) { @@ -415,7 +417,8 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, __clear_bit(tid, sta->ampdu_mlme.unexpected_agg); sta->ampdu_mlme.tid_rx_token[tid] = dialog_token; } - mutex_unlock(&sta->ampdu_mlme.mtx); + if (!ampdu_mlme_lock_held) + mutex_unlock(&sta->ampdu_mlme.mtx); end_no_lock: if (tx) @@ -445,7 +448,7 @@ void ieee80211_process_addba_request(struct ieee80211_local *local, __ieee80211_start_rx_ba_session(sta, dialog_token, timeout, start_seq_num, ba_policy, tid, - buf_size, true, false); + buf_size, true, false, false); } void ieee80211_manage_rx_ba_offl(struct ieee80211_vif *vif, diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c index c92df492e898..59ba67e8942f 100644 --- a/net/mac80211/ht.c +++ b/net/mac80211/ht.c @@ -335,7 +335,7 @@ void ieee80211_ba_session_work(struct work_struct *work) sta->ampdu_mlme.tid_rx_manage_offl)) __ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid, IEEE80211_MAX_AMPDU_BUF, - false, true); + false, true, true); if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS, sta->ampdu_mlme.tid_rx_manage_offl)) diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index 2197c62a0a6e..5d494ac65853 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -1759,7 +1759,8 @@ void __ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid, void __ieee80211_start_rx_ba_session(struct sta_info *sta, u8 dialog_token, u16 timeout, u16 start_seq_num, u16 ba_policy, u16 tid, - u16 buf_size, bool tx, bool auto_seq); + u16 buf_size, bool tx, bool auto_seq, + bool ampdu_mlme_lock_held); void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta, enum ieee80211_agg_stop_reason reason); void ieee80211_process_delba(struct ieee80211_sub_if_data *sdata, -- Stefano ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 14:27 ` Stefano Brivio @ 2017-09-06 14:30 ` Johannes Berg 0 siblings, 0 replies; 18+ messages in thread From: Johannes Berg @ 2017-09-06 14:30 UTC (permalink / raw) To: Stefano Brivio; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel On Wed, 2017-09-06 at 16:27 +0200, Stefano Brivio wrote: > > Sorry for the extended bothering :) but here, you're extending quite > a bit the scope of the lock also > when__ieee80211_start_rx_ba_session() is called by > ieee80211_process_addba_request(). I know, but it doesn't matter. > No idea what the hit can be, but we can't safely assume it's > nothing either. We don't really have to assume anything, we can read the code :-) Trust me, I probably wrote most of it. It's fine, just sanity checks. > What about simply introducing a 'ampdu_mlme_lock_held' argument > instead? Eww, no. johannes ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 12:58 ` Johannes Berg 2017-09-06 14:27 ` Stefano Brivio @ 2017-09-06 15:04 ` Matteo Croce 2017-09-06 15:11 ` Johannes Berg 2017-09-06 15:45 ` Sebastian Gottschall 1 sibling, 2 replies; 18+ messages in thread From: Matteo Croce @ 2017-09-06 15:04 UTC (permalink / raw) To: Johannes Berg; +Cc: linux-wireless, netdev, linux-kernel On Wed, Sep 6, 2017 at 2:58 PM, Johannes Berg <johannes@sipsolutions.net> wrote: > On Wed, 2017-09-06 at 13:57 +0200, Matteo Croce wrote: > >> I have an hung task on vanilla 4.13 kernel which I haven't on 4.12. >> The problem is present both on my AP and on my notebook, >> so it seems it affects AP and STA mode as well. >> The generated messages are: >> >> INFO: task kworker/u16:6:120 blocked for more than 120 seconds. >> Not tainted 4.13.0 #57 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this >> message. >> kworker/u16:6 D 0 120 2 0x00000000 >> Workqueue: phy0 ieee80211_ba_session_work [mac80211] >> Call Trace: >> ? __schedule+0x174/0x5b0 >> ? schedule+0x31/0x80 >> ? schedule_preempt_disabled+0x9/0x10 >> ? __mutex_lock.isra.2+0x163/0x480 >> ? select_task_rq_fair+0xb9f/0xc60 >> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211] >> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211] > > Yeah - obviously as Stefano found, both take &sta->ampdu_mlme.mtx. > > Can you try this? > > diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c > index 2b36eff5d97e..d8d32776031e 100644 > --- a/net/mac80211/agg-rx.c > +++ b/net/mac80211/agg-rx.c > @@ -245,10 +245,10 @@ static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *d > ieee80211_tx_skb(sdata, skb); > } > > -void __ieee80211_start_rx_ba_session(struct sta_info *sta, > - u8 dialog_token, u16 timeout, > - u16 start_seq_num, u16 ba_policy, u16 tid, > - u16 buf_size, bool tx, bool auto_seq) > +void ___ieee80211_start_rx_ba_session(struct sta_info *sta, > + u8 dialog_token, u16 timeout, > + u16 start_seq_num, u16 ba_policy, u16 tid, > + u16 buf_size, bool tx, bool auto_seq) > { > struct ieee80211_local *local = sta->sdata->local; > struct tid_ampdu_rx *tid_agg_rx; > @@ -267,7 +267,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, > ht_dbg(sta->sdata, > "STA %pM requests BA session on unsupported tid %d\n", > sta->sta.addr, tid); > - goto end_no_lock; > + goto end; > } > > if (!sta->sta.ht_cap.ht_supported) { > @@ -275,14 +275,14 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, > "STA %pM erroneously requests BA session on tid %d w/o QoS\n", > sta->sta.addr, tid); > /* send a response anyway, it's an error case if we get here */ > - goto end_no_lock; > + goto end; > } > > if (test_sta_flag(sta, WLAN_STA_BLOCK_BA)) { > ht_dbg(sta->sdata, > "Suspend in progress - Denying ADDBA request (%pM tid %d)\n", > sta->sta.addr, tid); > - goto end_no_lock; > + goto end; > } > > /* sanity check for incoming parameters: > @@ -296,7 +296,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, > ht_dbg_ratelimited(sta->sdata, > "AddBA Req with bad params from %pM on tid %u. policy %d, buffer size %d\n", > sta->sta.addr, tid, ba_policy, buf_size); > - goto end_no_lock; > + goto end; > } > /* determine default buffer size */ > if (buf_size == 0) > @@ -311,7 +311,6 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, > buf_size, sta->sta.addr); > > /* examine state machine */ > - mutex_lock(&sta->ampdu_mlme.mtx); > > if (test_bit(tid, sta->ampdu_mlme.agg_session_valid)) { > if (sta->ampdu_mlme.tid_rx_token[tid] == dialog_token) { > @@ -415,15 +414,25 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, > __clear_bit(tid, sta->ampdu_mlme.unexpected_agg); > sta->ampdu_mlme.tid_rx_token[tid] = dialog_token; > } > - mutex_unlock(&sta->ampdu_mlme.mtx); > > -end_no_lock: > if (tx) > ieee80211_send_addba_resp(sta->sdata, sta->sta.addr, tid, > dialog_token, status, 1, buf_size, > timeout); > } > > +void __ieee80211_start_rx_ba_session(struct sta_info *sta, > + u8 dialog_token, u16 timeout, > + u16 start_seq_num, u16 ba_policy, u16 tid, > + u16 buf_size, bool tx, bool auto_seq) > +{ > + mutex_lock(&sta->ampdu_mlme.mtx); > + ___ieee80211_start_rx_ba_session(sta, dialog_token, timeout, > + start_seq_num, ba_policy, tid, > + buf_size, tx, auto_seq); > + mutex_unlock(&sta->ampdu_mlme.mtx); > +} > + > void ieee80211_process_addba_request(struct ieee80211_local *local, > struct sta_info *sta, > struct ieee80211_mgmt *mgmt, > diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c > index c92df492e898..198b2d3e56fd 100644 > --- a/net/mac80211/ht.c > +++ b/net/mac80211/ht.c > @@ -333,9 +333,9 @@ void ieee80211_ba_session_work(struct work_struct *work) > > if (test_and_clear_bit(tid, > sta->ampdu_mlme.tid_rx_manage_offl)) > - __ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid, > - IEEE80211_MAX_AMPDU_BUF, > - false, true); > + ___ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid, > + IEEE80211_MAX_AMPDU_BUF, > + false, true); > > if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS, > sta->ampdu_mlme.tid_rx_manage_offl)) > diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h > index 2197c62a0a6e..9675814f64db 100644 > --- a/net/mac80211/ieee80211_i.h > +++ b/net/mac80211/ieee80211_i.h > @@ -1760,6 +1760,10 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, > u8 dialog_token, u16 timeout, > u16 start_seq_num, u16 ba_policy, u16 tid, > u16 buf_size, bool tx, bool auto_seq); > +void ___ieee80211_start_rx_ba_session(struct sta_info *sta, > + u8 dialog_token, u16 timeout, > + u16 start_seq_num, u16 ba_policy, u16 tid, > + u16 buf_size, bool tx, bool auto_seq); > void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta, > enum ieee80211_agg_stop_reason reason); > void ieee80211_process_delba(struct ieee80211_sub_if_data *sdata, > > johannes I confirm that this patch fixes the hang too. I'm curious to see if there are noticeable performance differences between the two solutions. Cheers, -- Matteo Croce per aspera ad upstream ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 15:04 ` Matteo Croce @ 2017-09-06 15:11 ` Johannes Berg 2017-09-06 15:45 ` Sebastian Gottschall 1 sibling, 0 replies; 18+ messages in thread From: Johannes Berg @ 2017-09-06 15:11 UTC (permalink / raw) To: Matteo Croce; +Cc: linux-wireless, netdev, linux-kernel On Wed, 2017-09-06 at 17:04 +0200, Matteo Croce wrote: > > I confirm that this patch fixes the hang too. Cool, I'll go apply it. > I'm curious to see if there are noticeable performance differences > between the two solutions. Nope, you hit this code path essentially once. johannes ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: hung task in mac80211 2017-09-06 15:04 ` Matteo Croce 2017-09-06 15:11 ` Johannes Berg @ 2017-09-06 15:45 ` Sebastian Gottschall 1 sibling, 0 replies; 18+ messages in thread From: Sebastian Gottschall @ 2017-09-06 15:45 UTC (permalink / raw) To: Matteo Croce, Johannes Berg; +Cc: linux-wireless, netdev, linux-kernel i confirm this patch fixes the issue for me as well Am 06.09.2017 um 17:04 schrieb Matteo Croce: > On Wed, Sep 6, 2017 at 2:58 PM, Johannes Berg <johannes@sipsolutions.net> wrote: >> On Wed, 2017-09-06 at 13:57 +0200, Matteo Croce wrote: >> >>> I have an hung task on vanilla 4.13 kernel which I haven't on 4.12. >>> The problem is present both on my AP and on my notebook, >>> so it seems it affects AP and STA mode as well. >>> The generated messages are: >>> >>> INFO: task kworker/u16:6:120 blocked for more than 120 seconds. >>> Not tainted 4.13.0 #57 >>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this >>> message. >>> kworker/u16:6 D 0 120 2 0x00000000 >>> Workqueue: phy0 ieee80211_ba_session_work [mac80211] >>> Call Trace: >>> ? __schedule+0x174/0x5b0 >>> ? schedule+0x31/0x80 >>> ? schedule_preempt_disabled+0x9/0x10 >>> ? __mutex_lock.isra.2+0x163/0x480 >>> ? select_task_rq_fair+0xb9f/0xc60 >>> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211] >>> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211] >> Yeah - obviously as Stefano found, both take &sta->ampdu_mlme.mtx. >> >> Can you try this? >> >> diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c >> index 2b36eff5d97e..d8d32776031e 100644 >> --- a/net/mac80211/agg-rx.c >> +++ b/net/mac80211/agg-rx.c >> @@ -245,10 +245,10 @@ static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *d >> ieee80211_tx_skb(sdata, skb); >> } >> >> -void __ieee80211_start_rx_ba_session(struct sta_info *sta, >> - u8 dialog_token, u16 timeout, >> - u16 start_seq_num, u16 ba_policy, u16 tid, >> - u16 buf_size, bool tx, bool auto_seq) >> +void ___ieee80211_start_rx_ba_session(struct sta_info *sta, >> + u8 dialog_token, u16 timeout, >> + u16 start_seq_num, u16 ba_policy, u16 tid, >> + u16 buf_size, bool tx, bool auto_seq) >> { >> struct ieee80211_local *local = sta->sdata->local; >> struct tid_ampdu_rx *tid_agg_rx; >> @@ -267,7 +267,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, >> ht_dbg(sta->sdata, >> "STA %pM requests BA session on unsupported tid %d\n", >> sta->sta.addr, tid); >> - goto end_no_lock; >> + goto end; >> } >> >> if (!sta->sta.ht_cap.ht_supported) { >> @@ -275,14 +275,14 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, >> "STA %pM erroneously requests BA session on tid %d w/o QoS\n", >> sta->sta.addr, tid); >> /* send a response anyway, it's an error case if we get here */ >> - goto end_no_lock; >> + goto end; >> } >> >> if (test_sta_flag(sta, WLAN_STA_BLOCK_BA)) { >> ht_dbg(sta->sdata, >> "Suspend in progress - Denying ADDBA request (%pM tid %d)\n", >> sta->sta.addr, tid); >> - goto end_no_lock; >> + goto end; >> } >> >> /* sanity check for incoming parameters: >> @@ -296,7 +296,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, >> ht_dbg_ratelimited(sta->sdata, >> "AddBA Req with bad params from %pM on tid %u. policy %d, buffer size %d\n", >> sta->sta.addr, tid, ba_policy, buf_size); >> - goto end_no_lock; >> + goto end; >> } >> /* determine default buffer size */ >> if (buf_size == 0) >> @@ -311,7 +311,6 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, >> buf_size, sta->sta.addr); >> >> /* examine state machine */ >> - mutex_lock(&sta->ampdu_mlme.mtx); >> >> if (test_bit(tid, sta->ampdu_mlme.agg_session_valid)) { >> if (sta->ampdu_mlme.tid_rx_token[tid] == dialog_token) { >> @@ -415,15 +414,25 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, >> __clear_bit(tid, sta->ampdu_mlme.unexpected_agg); >> sta->ampdu_mlme.tid_rx_token[tid] = dialog_token; >> } >> - mutex_unlock(&sta->ampdu_mlme.mtx); >> >> -end_no_lock: >> if (tx) >> ieee80211_send_addba_resp(sta->sdata, sta->sta.addr, tid, >> dialog_token, status, 1, buf_size, >> timeout); >> } >> >> +void __ieee80211_start_rx_ba_session(struct sta_info *sta, >> + u8 dialog_token, u16 timeout, >> + u16 start_seq_num, u16 ba_policy, u16 tid, >> + u16 buf_size, bool tx, bool auto_seq) >> +{ >> + mutex_lock(&sta->ampdu_mlme.mtx); >> + ___ieee80211_start_rx_ba_session(sta, dialog_token, timeout, >> + start_seq_num, ba_policy, tid, >> + buf_size, tx, auto_seq); >> + mutex_unlock(&sta->ampdu_mlme.mtx); >> +} >> + >> void ieee80211_process_addba_request(struct ieee80211_local *local, >> struct sta_info *sta, >> struct ieee80211_mgmt *mgmt, >> diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c >> index c92df492e898..198b2d3e56fd 100644 >> --- a/net/mac80211/ht.c >> +++ b/net/mac80211/ht.c >> @@ -333,9 +333,9 @@ void ieee80211_ba_session_work(struct work_struct *work) >> >> if (test_and_clear_bit(tid, >> sta->ampdu_mlme.tid_rx_manage_offl)) >> - __ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid, >> - IEEE80211_MAX_AMPDU_BUF, >> - false, true); >> + ___ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid, >> + IEEE80211_MAX_AMPDU_BUF, >> + false, true); >> >> if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS, >> sta->ampdu_mlme.tid_rx_manage_offl)) >> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h >> index 2197c62a0a6e..9675814f64db 100644 >> --- a/net/mac80211/ieee80211_i.h >> +++ b/net/mac80211/ieee80211_i.h >> @@ -1760,6 +1760,10 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, >> u8 dialog_token, u16 timeout, >> u16 start_seq_num, u16 ba_policy, u16 tid, >> u16 buf_size, bool tx, bool auto_seq); >> +void ___ieee80211_start_rx_ba_session(struct sta_info *sta, >> + u8 dialog_token, u16 timeout, >> + u16 start_seq_num, u16 ba_policy, u16 tid, >> + u16 buf_size, bool tx, bool auto_seq); >> void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta, >> enum ieee80211_agg_stop_reason reason); >> void ieee80211_process_delba(struct ieee80211_sub_if_data *sdata, >> >> johannes > I confirm that this patch fixes the hang too. > I'm curious to see if there are noticeable performance differences > between the two solutions. > > Cheers, -- Mit freundlichen Grüssen / Regards Sebastian Gottschall / CTO NewMedia-NET GmbH - DD-WRT Firmensitz: Berliner Ring 101, 64625 Bensheim Registergericht: Amtsgericht Darmstadt, HRB 25473 Geschäftsführer: Peter Steinhäuser, Christian Scheele http://www.dd-wrt.com email: s.gottschall@dd-wrt.com Tel.: +496251-582650 / Fax: +496251-5826565 ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2017-09-06 15:45 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-06 11:57 hung task in mac80211 Matteo Croce
[not found] ` <CAGnkfhwi9MmtLneeU23iWEGLj9cb1ODb3KzzOqTDvA6nNVsugg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-09-06 12:28 ` Christian Lamparter
2017-09-06 12:40 ` Stefano Brivio
2017-09-06 12:48 ` Johannes Berg
2017-09-06 13:03 ` Johannes Berg
[not found] ` <1504702990.13457.19.camel-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
2017-09-06 13:08 ` Sebastian Gottschall
2017-09-06 13:19 ` Stefano Brivio
2017-09-06 13:21 ` Johannes Berg
2017-09-06 13:27 ` Stefano Brivio
2017-09-06 13:30 ` Johannes Berg
2017-09-06 13:40 ` Stefano Brivio
2017-09-06 13:07 ` Matteo Croce
2017-09-06 12:58 ` Johannes Berg
2017-09-06 14:27 ` Stefano Brivio
2017-09-06 14:30 ` Johannes Berg
2017-09-06 15:04 ` Matteo Croce
2017-09-06 15:11 ` Johannes Berg
2017-09-06 15:45 ` Sebastian Gottschall
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).