* RE: [PATCH] mac80211: always update the PM state of a peer on MGMT / DATA frames
From: Grumbach, Emmanuel @ 2017-09-06 12:26 UTC (permalink / raw)
To: johannes@sipsolutions.net; +Cc: linux-wireless@vger.kernel.org
In-Reply-To: <20170906122523.17333-1-emmanuel.grumbach@intel.com>
>
> The 2016 version of the spec is more generic about when the AP should
> update the power management state of the peer:
> the AP shall update the state based on any management or data frames. This
> means that even non-bufferable management frames should be looked at to
> update to maintain the power management state of the peer.
>
> This can avoid problematic cases for example if a station disappears while
> being asleep and then re-appears. The AP would remember it as in power
> save, but the Authentication frame couldn't be used to set the peer as
> awake again.
> Note that this issues wasn't really critical since at some point (after the
> association) we would have removed the station and created another one
> with all the states cleared.
>
> type=feature
> ticket=none
>
> Change-Id: Iae1fbbd502d64f0c31db3e0f2d81224b29acb5af
Looks like I forgot to remove a few things here :)
And to put this as an RFC..
> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
> ---
> net/mac80211/rx.c | 17 +++++------------
> 1 file changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index
> fdf616811865..89db6b11af8a 100644
> --- a/net/mac80211/rx.c
> +++ b/net/mac80211/rx.c
> @@ -1749,23 +1749,16 @@ ieee80211_rx_h_sta_process(struct
> ieee80211_rx_data *rx)
>
> /*
> * Change STA power saving mode only at the end of a frame
> - * exchange sequence.
> + * exchange sequence, and only for a data or management
> + * frame as specified in ieee80211-2016 11.2.3.2
> */
> if (!ieee80211_hw_check(&sta->local->hw, AP_LINK_PS) &&
> !ieee80211_has_morefrags(hdr->frame_control) &&
> - !ieee80211_is_back_req(hdr->frame_control) &&
> + (ieee80211_is_mgmt(hdr->frame_control) ||
> + ieee80211_is_data(hdr->frame_control)) &&
> !(status->rx_flags & IEEE80211_RX_DEFERRED_RELEASE) &&
> (rx->sdata->vif.type == NL80211_IFTYPE_AP ||
> - rx->sdata->vif.type == NL80211_IFTYPE_AP_VLAN) &&
> - /*
> - * PM bit is only checked in frames where it isn't reserved,
> - * in AP mode it's reserved in non-bufferable management frames
> - * (cf. IEEE 802.11-2012 8.2.4.1.7 Power Management field)
> - * BAR frames should be ignored as specified in
> - * IEEE 802.11-2012 10.2.1.2.
> - */
> - (!ieee80211_is_mgmt(hdr->frame_control) ||
> - ieee80211_is_bufferable_mmpdu(hdr->frame_control))) {
> + rx->sdata->vif.type == NL80211_IFTYPE_AP_VLAN)) {
> if (test_sta_flag(sta, WLAN_STA_PS_STA)) {
> if (!ieee80211_has_pm(hdr->frame_control))
> sta_ps_end(sta);
> --
> 2.9.3
^ permalink raw reply
* Re: hung task in mac80211
From: Christian Lamparter @ 2017-09-06 12:28 UTC (permalink / raw)
To: Matteo Croce; +Cc: linux-wireless, netdev, linux-kernel, johannes
In-Reply-To: <CAGnkfhwi9MmtLneeU23iWEGLj9cb1ODb3KzzOqTDvA6nNVsugg@mail.gmail.com>
On Wednesday, September 6, 2017 1:57:47 PM CEST Matteo Croce wrote:
> Hi,
>
> I have an hung task on vanilla 4.13 kernel which I haven't on 4.12.
> The problem is present both on my AP and on my notebook,
> so it seems it affects AP and STA mode as well.
> The generated messages are:
>
> INFO: task kworker/u16:6:120 blocked for more than 120 seconds.
> Not tainted 4.13.0 #57
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u16:6 D 0 120 2 0x00000000
> Workqueue: phy0 ieee80211_ba_session_work [mac80211]
> Call Trace:
> ? __schedule+0x174/0x5b0
> ? schedule+0x31/0x80
> ? schedule_preempt_disabled+0x9/0x10
> ? __mutex_lock.isra.2+0x163/0x480
> ? select_task_rq_fair+0xb9f/0xc60
> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
> ? try_to_wake_up+0x1f1/0x340
> ? update_curr+0x88/0xd0
> ? ieee80211_ba_session_work+0x148/0x230 [mac80211]
>
> I did a bisect and the offending commit is:
>
> commit 699cb58c8a52ff39bf659bff7971893ebe111bf2
> Author: Johannes Berg <johannes.berg@intel.com>
> Date: Tue May 30 16:34:46 2017 +0200
>
> mac80211: manage RX BA session offload without SKB queue
I looked at this briefly:
ieee80211_ba_session_work acquires:
mutex_lock(&sta->ampdu_mlme.mtx) @
<http://elixir.free-electrons.com/linux/v4.13/source/net/mac80211/ht.c#L321>
But it now also calls
__ieee80211_start_rx_ba_session() @
http://elixir.free-electrons.com/linux/v4.13/source/net/mac80211/ht.c#L336
which also wants to hold mutex_lock(&sta->ampdu_mlme.mtx) in:
http://elixir.free-electrons.com/linux/v4.13/source/net/mac80211/agg-rx.c#L314
I guess this is where it deadlocks?
Regards,
Christian
^ permalink raw reply
* [PATCH v3] mac80211: Complete ampdu work schedule during session tear down
From: Ilan Peer @ 2017-09-06 14:32 UTC (permalink / raw)
To: linux-wireless; +Cc: Ilan Peer
Commit 7a7c0a6438b8 ("mac80211: fix TX aggregation start/stop callback race")
added a cancellation of the ampdu work after the loop that stopped the
Tx and Rx BA sessions. However, in some cases, e.g., during HW reconfig,
the low level driver might call mac80211 APIs to complete the stopping
of the BA sessions, which would queue the ampdu work to handle the actual
completion. This work needs to be performed as otherwise mac80211 data
structures would not be properly synced.
Fix this by checking if BA session STOP_CB bit is set after the BA session
cancellation and properly clean the session.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
---
net/mac80211/ht.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c
index c92df49..11d691c 100644
--- a/net/mac80211/ht.c
+++ b/net/mac80211/ht.c
@@ -300,6 +300,23 @@ void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta,
/* stopping might queue the work again - so cancel only afterwards */
cancel_work_sync(&sta->ampdu_mlme.work);
+
+ /* In case the tear down is part of a reconfigure due to HW restart
+ * request, it is possible that the low level driver requested to stop
+ * the BA session, so handle it to properly clean tid_tx data.
+ */
+ mutex_lock(&sta->ampdu_mlme.mtx);
+ for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
+ struct tid_ampdu_tx *tid_tx =
+ rcu_dereference_protected_tid_tx(sta, i);
+
+ if (!tid_tx)
+ continue;
+
+ if (test_and_clear_bit(HT_AGG_STATE_STOP_CB, &tid_tx->state))
+ ieee80211_stop_tx_ba_cb(sta, i, tid_tx);
+ }
+ mutex_unlock(&sta->ampdu_mlme.mtx);
}
void ieee80211_ba_session_work(struct work_struct *work)
--
1.9.1
^ permalink raw reply related
* Re: hung task in mac80211
From: Stefano Brivio @ 2017-09-06 12:40 UTC (permalink / raw)
To: Matteo Croce; +Cc: linux-wireless, netdev, linux-kernel, Johannes Berg
In-Reply-To: <CAGnkfhwi9MmtLneeU23iWEGLj9cb1ODb3KzzOqTDvA6nNVsugg@mail.gmail.com>
On Wed, 6 Sep 2017 13:57:47 +0200
Matteo Croce <mcroce@redhat.com> wrote:
> Hi,
>
> I have an hung task on vanilla 4.13 kernel which I haven't on 4.12.
> The problem is present both on my AP and on my notebook,
> so it seems it affects AP and STA mode as well.
> The generated messages are:
>
> INFO: task kworker/u16:6:120 blocked for more than 120 seconds.
> Not tainted 4.13.0 #57
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u16:6 D 0 120 2 0x00000000
> Workqueue: phy0 ieee80211_ba_session_work [mac80211]
> Call Trace:
> ? __schedule+0x174/0x5b0
> ? schedule+0x31/0x80
> ? schedule_preempt_disabled+0x9/0x10
> ? __mutex_lock.isra.2+0x163/0x480
> ? select_task_rq_fair+0xb9f/0xc60
> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
This is ugly and maybe wrong, but you could check perhaps...:
diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c
index c92df492e898..bd7512a656f2 100644
--- a/net/mac80211/ht.c
+++ b/net/mac80211/ht.c
@@ -320,28 +320,40 @@ void ieee80211_ba_session_work(struct work_struct *work)
mutex_lock(&sta->ampdu_mlme.mtx);
for (tid = 0; tid < IEEE80211_NUM_TIDS; tid++) {
- if (test_and_clear_bit(tid, sta->ampdu_mlme.tid_rx_timer_expired))
+ if (test_and_clear_bit(tid, sta->ampdu_mlme.tid_rx_timer_expired)) {
+ mutex_unlock(&sta->ampdu_mlme.mtx);
___ieee80211_stop_rx_ba_session(
sta, tid, WLAN_BACK_RECIPIENT,
WLAN_REASON_QSTA_TIMEOUT, true);
+ mutex_lock(&sta->ampdu_mlme.mtx);
+ }
if (test_and_clear_bit(tid,
- sta->ampdu_mlme.tid_rx_stop_requested))
+ sta->ampdu_mlme.tid_rx_stop_requested)) {
+ mutex_unlock(&sta->ampdu_mlme.mtx);
___ieee80211_stop_rx_ba_session(
sta, tid, WLAN_BACK_RECIPIENT,
WLAN_REASON_UNSPECIFIED, true);
+ mutex_lock(&sta->ampdu_mlme.mtx);
+ }
if (test_and_clear_bit(tid,
- sta->ampdu_mlme.tid_rx_manage_offl))
+ sta->ampdu_mlme.tid_rx_manage_offl)) {
+ mutex_unlock(&sta->ampdu_mlme.mtx);
__ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
IEEE80211_MAX_AMPDU_BUF,
false, true);
+ mutex_lock(&sta->ampdu_mlme.mtx);
+ }
if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS,
- sta->ampdu_mlme.tid_rx_manage_offl))
+ sta->ampdu_mlme.tid_rx_manage_offl)) {
+ mutex_unlock(&sta->ampdu_mlme.mtx);
___ieee80211_stop_rx_ba_session(
sta, tid, WLAN_BACK_RECIPIENT,
0, false);
+ mutex_lock(&sta->ampdu_mlme.mtx);
+ }
spin_lock_bh(&sta->lock);
--
Stefano
^ permalink raw reply related
* Re: hung task in mac80211
From: Johannes Berg @ 2017-09-06 12:48 UTC (permalink / raw)
To: Stefano Brivio, Matteo Croce; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20170906144019.1c98a636@elisabeth>
I'll look in a bit - but
> + mutex_unlock(&sta->ampdu_mlme.mtx);
> ___ieee80211_stop_rx_ba_session(
> sta, tid, WLAN_BACK_RECIPIENT,
> WLAN_REASON_QSTA_TIMEOUT, true);
This already has three underscores so shouldn't drop.
>
> + mutex_unlock(&sta->ampdu_mlme.mtx);
> ___ieee80211_stop_rx_ba_session(
ditto
> + mutex_unlock(&sta->ampdu_mlme.mtx);
> __ieee80211_start_rx_ba_session(sta, 0, 0,
> 0, 1, tid,
maybe this one needs a ___ version then?
> + mutex_unlock(&sta->ampdu_mlme.mtx);
> ___ieee80211_stop_rx_ba_session(
>
already ___
I'm surprised nobody saw this before - though perhaps Sebastian's
useless report is the same.
johannes
^ permalink raw reply
* Re: hung task in mac80211
From: Johannes Berg @ 2017-09-06 12:58 UTC (permalink / raw)
To: Matteo Croce, linux-wireless, netdev, linux-kernel
In-Reply-To: <CAGnkfhwi9MmtLneeU23iWEGLj9cb1ODb3KzzOqTDvA6nNVsugg@mail.gmail.com>
On Wed, 2017-09-06 at 13:57 +0200, Matteo Croce wrote:
> I have an hung task on vanilla 4.13 kernel which I haven't on 4.12.
> The problem is present both on my AP and on my notebook,
> so it seems it affects AP and STA mode as well.
> The generated messages are:
>
> INFO: task kworker/u16:6:120 blocked for more than 120 seconds.
> Not tainted 4.13.0 #57
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> kworker/u16:6 D 0 120 2 0x00000000
> Workqueue: phy0 ieee80211_ba_session_work [mac80211]
> Call Trace:
> ? __schedule+0x174/0x5b0
> ? schedule+0x31/0x80
> ? schedule_preempt_disabled+0x9/0x10
> ? __mutex_lock.isra.2+0x163/0x480
> ? select_task_rq_fair+0xb9f/0xc60
> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
Yeah - obviously as Stefano found, both take &sta->ampdu_mlme.mtx.
Can you try this?
diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
index 2b36eff5d97e..d8d32776031e 100644
--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -245,10 +245,10 @@ static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *d
ieee80211_tx_skb(sdata, skb);
}
-void __ieee80211_start_rx_ba_session(struct sta_info *sta,
- u8 dialog_token, u16 timeout,
- u16 start_seq_num, u16 ba_policy, u16 tid,
- u16 buf_size, bool tx, bool auto_seq)
+void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
+ u8 dialog_token, u16 timeout,
+ u16 start_seq_num, u16 ba_policy, u16 tid,
+ u16 buf_size, bool tx, bool auto_seq)
{
struct ieee80211_local *local = sta->sdata->local;
struct tid_ampdu_rx *tid_agg_rx;
@@ -267,7 +267,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
ht_dbg(sta->sdata,
"STA %pM requests BA session on unsupported tid %d\n",
sta->sta.addr, tid);
- goto end_no_lock;
+ goto end;
}
if (!sta->sta.ht_cap.ht_supported) {
@@ -275,14 +275,14 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
"STA %pM erroneously requests BA session on tid %d w/o QoS\n",
sta->sta.addr, tid);
/* send a response anyway, it's an error case if we get here */
- goto end_no_lock;
+ goto end;
}
if (test_sta_flag(sta, WLAN_STA_BLOCK_BA)) {
ht_dbg(sta->sdata,
"Suspend in progress - Denying ADDBA request (%pM tid %d)\n",
sta->sta.addr, tid);
- goto end_no_lock;
+ goto end;
}
/* sanity check for incoming parameters:
@@ -296,7 +296,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
ht_dbg_ratelimited(sta->sdata,
"AddBA Req with bad params from %pM on tid %u. policy %d, buffer size %d\n",
sta->sta.addr, tid, ba_policy, buf_size);
- goto end_no_lock;
+ goto end;
}
/* determine default buffer size */
if (buf_size == 0)
@@ -311,7 +311,6 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
buf_size, sta->sta.addr);
/* examine state machine */
- mutex_lock(&sta->ampdu_mlme.mtx);
if (test_bit(tid, sta->ampdu_mlme.agg_session_valid)) {
if (sta->ampdu_mlme.tid_rx_token[tid] == dialog_token) {
@@ -415,15 +414,25 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
__clear_bit(tid, sta->ampdu_mlme.unexpected_agg);
sta->ampdu_mlme.tid_rx_token[tid] = dialog_token;
}
- mutex_unlock(&sta->ampdu_mlme.mtx);
-end_no_lock:
if (tx)
ieee80211_send_addba_resp(sta->sdata, sta->sta.addr, tid,
dialog_token, status, 1, buf_size,
timeout);
}
+void __ieee80211_start_rx_ba_session(struct sta_info *sta,
+ u8 dialog_token, u16 timeout,
+ u16 start_seq_num, u16 ba_policy, u16 tid,
+ u16 buf_size, bool tx, bool auto_seq)
+{
+ mutex_lock(&sta->ampdu_mlme.mtx);
+ ___ieee80211_start_rx_ba_session(sta, dialog_token, timeout,
+ start_seq_num, ba_policy, tid,
+ buf_size, tx, auto_seq);
+ mutex_unlock(&sta->ampdu_mlme.mtx);
+}
+
void ieee80211_process_addba_request(struct ieee80211_local *local,
struct sta_info *sta,
struct ieee80211_mgmt *mgmt,
diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c
index c92df492e898..198b2d3e56fd 100644
--- a/net/mac80211/ht.c
+++ b/net/mac80211/ht.c
@@ -333,9 +333,9 @@ void ieee80211_ba_session_work(struct work_struct *work)
if (test_and_clear_bit(tid,
sta->ampdu_mlme.tid_rx_manage_offl))
- __ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
- IEEE80211_MAX_AMPDU_BUF,
- false, true);
+ ___ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
+ IEEE80211_MAX_AMPDU_BUF,
+ false, true);
if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS,
sta->ampdu_mlme.tid_rx_manage_offl))
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 2197c62a0a6e..9675814f64db 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1760,6 +1760,10 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
u8 dialog_token, u16 timeout,
u16 start_seq_num, u16 ba_policy, u16 tid,
u16 buf_size, bool tx, bool auto_seq);
+void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
+ u8 dialog_token, u16 timeout,
+ u16 start_seq_num, u16 ba_policy, u16 tid,
+ u16 buf_size, bool tx, bool auto_seq);
void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta,
enum ieee80211_agg_stop_reason reason);
void ieee80211_process_delba(struct ieee80211_sub_if_data *sdata,
johannes
^ permalink raw reply related
* [PATCH] ath10: mark PM functions as __maybe_unused
From: Arnd Bergmann @ 2017-09-06 12:58 UTC (permalink / raw)
To: Kalle Valo
Cc: Arnd Bergmann, Colin Ian King, Bartosz Markowski, Govind Singh,
Ryan Hsu, Srinivas Kandagatla, Rajkumar Manoharan,
Ashok Raj Nagarajan, Ben Greear, ath10k, linux-wireless, netdev,
linux-kernel
When CONFIG_PM_SLEEP is disabled, we get a compile-time
warning:
drivers/net/wireless/ath/ath10k/pci.c:3417:12: error: 'ath10k_pci_pm_resume' defined but not used [-Werror=unused-function]
static int ath10k_pci_pm_resume(struct device *dev)
^~~~~~~~~~~~~~~~~~~~
drivers/net/wireless/ath/ath10k/pci.c:3401:12: error: 'ath10k_pci_pm_suspend' defined but not used [-Werror=unused-function]
static int ath10k_pci_pm_suspend(struct device *dev)
Rather than fixing the #ifdef, this just marks both functions
as __maybe_unused, which is a more robust way to do this.
Fixes: 32faa3f0ee50 ("ath10k: add the PCI PM core suspend/resume ops")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/net/wireless/ath/ath10k/pci.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index bc1633945a56..195dafb98131 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -3396,9 +3396,7 @@ static void ath10k_pci_remove(struct pci_dev *pdev)
MODULE_DEVICE_TABLE(pci, ath10k_pci_id_table);
-#ifdef CONFIG_PM
-
-static int ath10k_pci_pm_suspend(struct device *dev)
+static __maybe_unused int ath10k_pci_pm_suspend(struct device *dev)
{
struct ath10k *ar = dev_get_drvdata(dev);
int ret;
@@ -3414,7 +3412,7 @@ static int ath10k_pci_pm_suspend(struct device *dev)
return ret;
}
-static int ath10k_pci_pm_resume(struct device *dev)
+static __maybe_unused int ath10k_pci_pm_resume(struct device *dev)
{
struct ath10k *ar = dev_get_drvdata(dev);
int ret;
@@ -3433,7 +3431,6 @@ static int ath10k_pci_pm_resume(struct device *dev)
static SIMPLE_DEV_PM_OPS(ath10k_pci_pm_ops,
ath10k_pci_pm_suspend,
ath10k_pci_pm_resume);
-#endif
static struct pci_driver ath10k_pci_driver = {
.name = "ath10k_pci",
--
2.9.0
^ permalink raw reply related
* Re: hung task in mac80211
From: Johannes Berg @ 2017-09-06 13:03 UTC (permalink / raw)
To: Stefano Brivio, Matteo Croce; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <1504702115.13457.16.camel@sipsolutions.net>
On Wed, 2017-09-06 at 14:48 +0200, Johannes Berg wrote:
>
> I'm surprised nobody saw this before - though perhaps Sebastian's
> useless report is the same.
Oh, that's because this is only for the offloaded manager thing, and
that's only ath10k.
johannes
^ permalink raw reply
* Re: hung task in mac80211
From: Matteo Croce @ 2017-09-06 13:07 UTC (permalink / raw)
To: Stefano Brivio; +Cc: linux-wireless, netdev, linux-kernel, Johannes Berg
In-Reply-To: <20170906144019.1c98a636@elisabeth>
On Wed, Sep 6, 2017 at 2:40 PM, Stefano Brivio <sbrivio@redhat.com> wrote:
> On Wed, 6 Sep 2017 13:57:47 +0200
> Matteo Croce <mcroce@redhat.com> wrote:
>
>> Hi,
>>
>> I have an hung task on vanilla 4.13 kernel which I haven't on 4.12.
>> The problem is present both on my AP and on my notebook,
>> so it seems it affects AP and STA mode as well.
>> The generated messages are:
>>
>> INFO: task kworker/u16:6:120 blocked for more than 120 seconds.
>> Not tainted 4.13.0 #57
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> kworker/u16:6 D 0 120 2 0x00000000
>> Workqueue: phy0 ieee80211_ba_session_work [mac80211]
>> Call Trace:
>> ? __schedule+0x174/0x5b0
>> ? schedule+0x31/0x80
>> ? schedule_preempt_disabled+0x9/0x10
>> ? __mutex_lock.isra.2+0x163/0x480
>> ? select_task_rq_fair+0xb9f/0xc60
>> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
>> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
>
> This is ugly and maybe wrong, but you could check perhaps...:
>
> diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c
> index c92df492e898..bd7512a656f2 100644
> --- a/net/mac80211/ht.c
> +++ b/net/mac80211/ht.c
> @@ -320,28 +320,40 @@ void ieee80211_ba_session_work(struct work_struct *work)
>
> mutex_lock(&sta->ampdu_mlme.mtx);
> for (tid = 0; tid < IEEE80211_NUM_TIDS; tid++) {
> - if (test_and_clear_bit(tid, sta->ampdu_mlme.tid_rx_timer_expired))
> + if (test_and_clear_bit(tid, sta->ampdu_mlme.tid_rx_timer_expired)) {
> + mutex_unlock(&sta->ampdu_mlme.mtx);
> ___ieee80211_stop_rx_ba_session(
> sta, tid, WLAN_BACK_RECIPIENT,
> WLAN_REASON_QSTA_TIMEOUT, true);
> + mutex_lock(&sta->ampdu_mlme.mtx);
> + }
>
> if (test_and_clear_bit(tid,
> - sta->ampdu_mlme.tid_rx_stop_requested))
> + sta->ampdu_mlme.tid_rx_stop_requested)) {
> + mutex_unlock(&sta->ampdu_mlme.mtx);
> ___ieee80211_stop_rx_ba_session(
> sta, tid, WLAN_BACK_RECIPIENT,
> WLAN_REASON_UNSPECIFIED, true);
> + mutex_lock(&sta->ampdu_mlme.mtx);
> + }
>
> if (test_and_clear_bit(tid,
> - sta->ampdu_mlme.tid_rx_manage_offl))
> + sta->ampdu_mlme.tid_rx_manage_offl)) {
> + mutex_unlock(&sta->ampdu_mlme.mtx);
> __ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
> IEEE80211_MAX_AMPDU_BUF,
> false, true);
> + mutex_lock(&sta->ampdu_mlme.mtx);
> + }
>
> if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS,
> - sta->ampdu_mlme.tid_rx_manage_offl))
> + sta->ampdu_mlme.tid_rx_manage_offl)) {
> + mutex_unlock(&sta->ampdu_mlme.mtx);
> ___ieee80211_stop_rx_ba_session(
> sta, tid, WLAN_BACK_RECIPIENT,
> 0, false);
> + mutex_lock(&sta->ampdu_mlme.mtx);
> + }
>
> spin_lock_bh(&sta->lock);
>
> --
> Stefano
>
ACK, I have it running since 12 minutes.
The hang usually appears shortly after boot as I set
kernel.hung_task_timeout_secs=10
--
Matteo Croce
per aspera ad upstream
^ permalink raw reply
* Re: hung task in mac80211
From: Sebastian Gottschall @ 2017-09-06 13:08 UTC (permalink / raw)
To: Johannes Berg, Stefano Brivio, Matteo Croce
Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <1504702990.13457.19.camel@sipsolutions.net>
Am 06.09.2017 um 15:03 schrieb Johannes Berg:
> On Wed, 2017-09-06 at 14:48 +0200, Johannes Berg wrote:
>> I'm surprised nobody saw this before - though perhaps Sebastian's
>> useless report is the same.
> Oh, that's because this is only for the offloaded manager thing, and
> that's only ath10k.
>
> johannes
that explans the behaviour i have found with latest mac80211 and ath10k.
--
Mit freundlichen Grüssen / Regards
Sebastian Gottschall / CTO
NewMedia-NET GmbH - DD-WRT
Firmensitz: Berliner Ring 101, 64625 Bensheim
Registergericht: Amtsgericht Darmstadt, HRB 25473
Geschäftsführer: Peter Steinhäuser, Christian Scheele
http://www.dd-wrt.com
email: s.gottschall@dd-wrt.com
Tel.: +496251-582650 / Fax: +496251-5826565
^ permalink raw reply
* [PATCH] mac80211: fix deadlock in driver-managed RX BA session start
From: Johannes Berg @ 2017-09-06 13:19 UTC (permalink / raw)
To: linux-wireless; +Cc: Matteo Croce, Johannes Berg
From: Johannes Berg <johannes.berg@intel.com>
When an RX BA session is started by the driver, and it has to tell
mac80211 about it, the corresponding bit in tid_rx_manage_offl gets
set and the BA session work is scheduled. Upon testing this bit, it
will call __ieee80211_start_rx_ba_session(), thus deadlocking as it
already holds the ampdu_mlme.mtx, which that acquires again.
Fix this by adding ___ieee80211_start_rx_ba_session(), a version of
the function that requires the mutex already held.
Cc: stable@vger.kernel.org
Fixes: 699cb58c8a52 ("mac80211: manage RX BA session offload without SKB queue")
Reported-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
net/mac80211/agg-rx.c | 32 +++++++++++++++++++++-----------
net/mac80211/ht.c | 6 +++---
net/mac80211/ieee80211_i.h | 4 ++++
3 files changed, 28 insertions(+), 14 deletions(-)
diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
index 2b36eff5d97e..2849a1fc41c5 100644
--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -245,10 +245,10 @@ static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *d
ieee80211_tx_skb(sdata, skb);
}
-void __ieee80211_start_rx_ba_session(struct sta_info *sta,
- u8 dialog_token, u16 timeout,
- u16 start_seq_num, u16 ba_policy, u16 tid,
- u16 buf_size, bool tx, bool auto_seq)
+void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
+ u8 dialog_token, u16 timeout,
+ u16 start_seq_num, u16 ba_policy, u16 tid,
+ u16 buf_size, bool tx, bool auto_seq)
{
struct ieee80211_local *local = sta->sdata->local;
struct tid_ampdu_rx *tid_agg_rx;
@@ -267,7 +267,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
ht_dbg(sta->sdata,
"STA %pM requests BA session on unsupported tid %d\n",
sta->sta.addr, tid);
- goto end_no_lock;
+ goto end;
}
if (!sta->sta.ht_cap.ht_supported) {
@@ -275,14 +275,14 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
"STA %pM erroneously requests BA session on tid %d w/o QoS\n",
sta->sta.addr, tid);
/* send a response anyway, it's an error case if we get here */
- goto end_no_lock;
+ goto end;
}
if (test_sta_flag(sta, WLAN_STA_BLOCK_BA)) {
ht_dbg(sta->sdata,
"Suspend in progress - Denying ADDBA request (%pM tid %d)\n",
sta->sta.addr, tid);
- goto end_no_lock;
+ goto end;
}
/* sanity check for incoming parameters:
@@ -296,7 +296,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
ht_dbg_ratelimited(sta->sdata,
"AddBA Req with bad params from %pM on tid %u. policy %d, buffer size %d\n",
sta->sta.addr, tid, ba_policy, buf_size);
- goto end_no_lock;
+ goto end;
}
/* determine default buffer size */
if (buf_size == 0)
@@ -311,7 +311,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
buf_size, sta->sta.addr);
/* examine state machine */
- mutex_lock(&sta->ampdu_mlme.mtx);
+ lockdep_assert_held(&sta->ampdu_mlme.mtx);
if (test_bit(tid, sta->ampdu_mlme.agg_session_valid)) {
if (sta->ampdu_mlme.tid_rx_token[tid] == dialog_token) {
@@ -415,15 +415,25 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
__clear_bit(tid, sta->ampdu_mlme.unexpected_agg);
sta->ampdu_mlme.tid_rx_token[tid] = dialog_token;
}
- mutex_unlock(&sta->ampdu_mlme.mtx);
-end_no_lock:
if (tx)
ieee80211_send_addba_resp(sta->sdata, sta->sta.addr, tid,
dialog_token, status, 1, buf_size,
timeout);
}
+void __ieee80211_start_rx_ba_session(struct sta_info *sta,
+ u8 dialog_token, u16 timeout,
+ u16 start_seq_num, u16 ba_policy, u16 tid,
+ u16 buf_size, bool tx, bool auto_seq)
+{
+ mutex_lock(&sta->ampdu_mlme.mtx);
+ ___ieee80211_start_rx_ba_session(sta, dialog_token, timeout,
+ start_seq_num, ba_policy, tid,
+ buf_size, tx, auto_seq);
+ mutex_unlock(&sta->ampdu_mlme.mtx);
+}
+
void ieee80211_process_addba_request(struct ieee80211_local *local,
struct sta_info *sta,
struct ieee80211_mgmt *mgmt,
diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c
index c92df492e898..198b2d3e56fd 100644
--- a/net/mac80211/ht.c
+++ b/net/mac80211/ht.c
@@ -333,9 +333,9 @@ void ieee80211_ba_session_work(struct work_struct *work)
if (test_and_clear_bit(tid,
sta->ampdu_mlme.tid_rx_manage_offl))
- __ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
- IEEE80211_MAX_AMPDU_BUF,
- false, true);
+ ___ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
+ IEEE80211_MAX_AMPDU_BUF,
+ false, true);
if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS,
sta->ampdu_mlme.tid_rx_manage_offl))
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 2197c62a0a6e..9675814f64db 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1760,6 +1760,10 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
u8 dialog_token, u16 timeout,
u16 start_seq_num, u16 ba_policy, u16 tid,
u16 buf_size, bool tx, bool auto_seq);
+void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
+ u8 dialog_token, u16 timeout,
+ u16 start_seq_num, u16 ba_policy, u16 tid,
+ u16 buf_size, bool tx, bool auto_seq);
void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta,
enum ieee80211_agg_stop_reason reason);
void ieee80211_process_delba(struct ieee80211_sub_if_data *sdata,
--
2.14.1
^ permalink raw reply related
* Re: hung task in mac80211
From: Stefano Brivio @ 2017-09-06 13:19 UTC (permalink / raw)
To: Johannes Berg; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel
In-Reply-To: <1504702115.13457.16.camel@sipsolutions.net>
On Wed, 06 Sep 2017 14:48:35 +0200
Johannes Berg <johannes@sipsolutions.net> wrote:
> I'll look in a bit - but
>
> > + mutex_unlock(&sta->ampdu_mlme.mtx);
> > ___ieee80211_stop_rx_ba_session(
> > sta, tid, WLAN_BACK_RECIPIENT,
> > WLAN_REASON_QSTA_TIMEOUT, true);
>
> This already has three underscores so shouldn't drop.
Right, of course.
> [...]
> > + mutex_unlock(&sta->ampdu_mlme.mtx);
> > __ieee80211_start_rx_ba_session(sta, 0, 0,
> > 0, 1, tid,
>
> maybe this one needs a ___ version then?
Either that, or as it's a single call, perhaps just the following?
Matter of taste I guess...
diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c
index c92df492e898..377dd3c233d3 100644
--- a/net/mac80211/ht.c
+++ b/net/mac80211/ht.c
@@ -332,10 +332,13 @@ void ieee80211_ba_session_work(struct work_struct *work)
WLAN_REASON_UNSPECIFIED, true);
if (test_and_clear_bit(tid,
- sta->ampdu_mlme.tid_rx_manage_offl))
+ sta->ampdu_mlme.tid_rx_manage_offl)) {
+ mutex_unlock(&sta->ampdu_mlme.mtx);
__ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
IEEE80211_MAX_AMPDU_BUF,
false, true);
+ mutex_lock(&sta->ampdu_mlme.mtx);
+ }
if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS,
sta->ampdu_mlme.tid_rx_manage_offl))
--
Stefano
^ permalink raw reply related
* Re: hung task in mac80211
From: Johannes Berg @ 2017-09-06 13:21 UTC (permalink / raw)
To: Stefano Brivio; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel
In-Reply-To: <20170906151922.4a320b1d@elisabeth>
On Wed, 2017-09-06 at 15:19 +0200, Stefano Brivio wrote:
> On Wed, 06 Sep 2017 14:48:35 +0200
> Johannes Berg <johannes@sipsolutions.net> wrote:
>
> > I'll look in a bit - but
> >
> > > + mutex_unlock(&sta->ampdu_mlme.mtx);
> > > ___ieee80211_stop_rx_ba_session(
> > > sta, tid, WLAN_BACK_RECIPIENT,
> > > WLAN_REASON_QSTA_TIMEOUT,
> > > true);
> >
> > This already has three underscores so shouldn't drop.
>
> Right, of course.
>
> > [...]
> > > + mutex_unlock(&sta->ampdu_mlme.mtx);
> > > __ieee80211_start_rx_ba_session(sta, 0,
> > > 0,
> > > 0, 1, tid,
> >
> > maybe this one needs a ___ version then?
>
> Either that, or as it's a single call, perhaps just the following?
> Matter of taste I guess...
I don't think it's a matter of taste - for me, in principle, dropping
locks for small sections of code where the larger section holds it is a
bug waiting to happen. It may (may, I don't even know) be OK here, but
in general it's something to avoid.
johannes
^ permalink raw reply
* Re: hung task in mac80211
From: Stefano Brivio @ 2017-09-06 13:27 UTC (permalink / raw)
To: Johannes Berg; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel
In-Reply-To: <1504704060.13457.20.camel@sipsolutions.net>
On Wed, 06 Sep 2017 15:21:00 +0200
Johannes Berg <johannes@sipsolutions.net> wrote:
> On Wed, 2017-09-06 at 15:19 +0200, Stefano Brivio wrote:
> > On Wed, 06 Sep 2017 14:48:35 +0200
> > Johannes Berg <johannes@sipsolutions.net> wrote:
> >
> > > I'll look in a bit - but
> > >
> > > > + mutex_unlock(&sta->ampdu_mlme.mtx);
> > > > ___ieee80211_stop_rx_ba_session(
> > > > sta, tid, WLAN_BACK_RECIPIENT,
> > > > WLAN_REASON_QSTA_TIMEOUT,
> > > > true);
> > >
> > > This already has three underscores so shouldn't drop.
> >
> > Right, of course.
> >
> > > [...]
> > > > + mutex_unlock(&sta->ampdu_mlme.mtx);
> > > > __ieee80211_start_rx_ba_session(sta, 0,
> > > > 0,
> > > > 0, 1, tid,
> > >
> > > maybe this one needs a ___ version then?
> >
> > Either that, or as it's a single call, perhaps just the following?
> > Matter of taste I guess...
>
> I don't think it's a matter of taste - for me, in principle, dropping
> locks for small sections of code where the larger section holds it is a
> bug waiting to happen. It may (may, I don't even know) be OK here, but
> in general it's something to avoid.
Yes, that was based on the assumption that the initial part of
__ieee80211_start_rx_ba_session() can't really affect the AMPDU
state-machine in any way.
But sure, one small change there in the future and the assumption
doesn't hold anymore.
--
Stefano
^ permalink raw reply
* Re: hung task in mac80211
From: Johannes Berg @ 2017-09-06 13:30 UTC (permalink / raw)
To: Stefano Brivio; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel
In-Reply-To: <20170906152709.673f230d@elisabeth>
On Wed, 2017-09-06 at 15:27 +0200, Stefano Brivio wrote:
>
> Yes, that was based on the assumption that the initial part of
> __ieee80211_start_rx_ba_session() can't really affect the AMPDU
> state-machine in any way.
That's not really the point, if that changes that function would have
to move the locking around, and nothing else.
The point is more that code in ieee80211_ba_session_work() could assume
the lock is held across the entire loop, since that's the way it's
written and looks like even with your patch.
So for example replacing the loop of tid = 0..NUM_TIDS-1 with a
list_for_each_entry() would already be unsafe with the dropping if the
list were to require the mutex for locking.
johannes
^ permalink raw reply
* Re: hung task in mac80211
From: Stefano Brivio @ 2017-09-06 13:40 UTC (permalink / raw)
To: Johannes Berg; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel
In-Reply-To: <1504704610.23905.1.camel@sipsolutions.net>
On Wed, 06 Sep 2017 15:30:10 +0200
Johannes Berg <johannes@sipsolutions.net> wrote:
> So for example replacing the loop of tid = 0..NUM_TIDS-1 with a
> list_for_each_entry() would already be unsafe with the dropping if the
> list were to require the mutex for locking.
Sure. Still, it would need another code change to break, but in general
I do agree indeed. :)
--
Stefano
^ permalink raw reply
* Re: hung task in mac80211
From: Stefano Brivio @ 2017-09-06 14:27 UTC (permalink / raw)
To: Johannes Berg; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel
In-Reply-To: <1504702728.13457.17.camel@sipsolutions.net>
On Wed, 06 Sep 2017 14:58:48 +0200
Johannes Berg <johannes@sipsolutions.net> wrote:
> +void __ieee80211_start_rx_ba_session(struct sta_info *sta,
> + u8 dialog_token, u16 timeout,
> + u16 start_seq_num, u16 ba_policy, u16 tid,
> + u16 buf_size, bool tx, bool auto_seq)
> +{
> + mutex_lock(&sta->ampdu_mlme.mtx);
> + ___ieee80211_start_rx_ba_session(sta, dialog_token, timeout,
> + start_seq_num, ba_policy, tid,
> + buf_size, tx, auto_seq);
> + mutex_unlock(&sta->ampdu_mlme.mtx);
> +}
> +
Sorry for the extended bothering :) but here, you're extending quite a
bit the scope of the lock also when__ieee80211_start_rx_ba_session() is
called by ieee80211_process_addba_request(). No idea what the hit can
be, but we can't safely assume it's nothing either. What about simply
introducing a 'ampdu_mlme_lock_held' argument instead? Something like:
diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
index 2b36eff5d97e..35a9eff1ec66 100644
--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -248,7 +248,8 @@ static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *d
void __ieee80211_start_rx_ba_session(struct sta_info *sta,
u8 dialog_token, u16 timeout,
u16 start_seq_num, u16 ba_policy, u16 tid,
- u16 buf_size, bool tx, bool auto_seq)
+ u16 buf_size, bool tx, bool auto_seq,
+ bool ampdu_mlme_lock_held)
{
struct ieee80211_local *local = sta->sdata->local;
struct tid_ampdu_rx *tid_agg_rx;
@@ -311,7 +312,8 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
buf_size, sta->sta.addr);
/* examine state machine */
- mutex_lock(&sta->ampdu_mlme.mtx);
+ if (!ampdu_mlme_lock_held)
+ mutex_lock(&sta->ampdu_mlme.mtx);
if (test_bit(tid, sta->ampdu_mlme.agg_session_valid)) {
if (sta->ampdu_mlme.tid_rx_token[tid] == dialog_token) {
@@ -415,7 +417,8 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
__clear_bit(tid, sta->ampdu_mlme.unexpected_agg);
sta->ampdu_mlme.tid_rx_token[tid] = dialog_token;
}
- mutex_unlock(&sta->ampdu_mlme.mtx);
+ if (!ampdu_mlme_lock_held)
+ mutex_unlock(&sta->ampdu_mlme.mtx);
end_no_lock:
if (tx)
@@ -445,7 +448,7 @@ void ieee80211_process_addba_request(struct ieee80211_local *local,
__ieee80211_start_rx_ba_session(sta, dialog_token, timeout,
start_seq_num, ba_policy, tid,
- buf_size, true, false);
+ buf_size, true, false, false);
}
void ieee80211_manage_rx_ba_offl(struct ieee80211_vif *vif,
diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c
index c92df492e898..59ba67e8942f 100644
--- a/net/mac80211/ht.c
+++ b/net/mac80211/ht.c
@@ -335,7 +335,7 @@ void ieee80211_ba_session_work(struct work_struct *work)
sta->ampdu_mlme.tid_rx_manage_offl))
__ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
IEEE80211_MAX_AMPDU_BUF,
- false, true);
+ false, true, true);
if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS,
sta->ampdu_mlme.tid_rx_manage_offl))
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 2197c62a0a6e..5d494ac65853 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1759,7 +1759,8 @@ void __ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid,
void __ieee80211_start_rx_ba_session(struct sta_info *sta,
u8 dialog_token, u16 timeout,
u16 start_seq_num, u16 ba_policy, u16 tid,
- u16 buf_size, bool tx, bool auto_seq);
+ u16 buf_size, bool tx, bool auto_seq,
+ bool ampdu_mlme_lock_held);
void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta,
enum ieee80211_agg_stop_reason reason);
void ieee80211_process_delba(struct ieee80211_sub_if_data *sdata,
--
Stefano
^ permalink raw reply related
* Re: hung task in mac80211
From: Johannes Berg @ 2017-09-06 14:30 UTC (permalink / raw)
To: Stefano Brivio; +Cc: Matteo Croce, linux-wireless, netdev, linux-kernel
In-Reply-To: <20170906162748.071b040e@elisabeth>
On Wed, 2017-09-06 at 16:27 +0200, Stefano Brivio wrote:
>
> Sorry for the extended bothering :) but here, you're extending quite
> a bit the scope of the lock also
> when__ieee80211_start_rx_ba_session() is called by
> ieee80211_process_addba_request().
I know, but it doesn't matter.
> No idea what the hit can be, but we can't safely assume it's
> nothing either.
We don't really have to assume anything, we can read the code :-)
Trust me, I probably wrote most of it. It's fine, just sanity checks.
> What about simply introducing a 'ampdu_mlme_lock_held' argument
> instead?
Eww, no.
johannes
^ permalink raw reply
* Re: hung task in mac80211
From: Matteo Croce @ 2017-09-06 15:04 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <1504702728.13457.17.camel@sipsolutions.net>
On Wed, Sep 6, 2017 at 2:58 PM, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Wed, 2017-09-06 at 13:57 +0200, Matteo Croce wrote:
>
>> I have an hung task on vanilla 4.13 kernel which I haven't on 4.12.
>> The problem is present both on my AP and on my notebook,
>> so it seems it affects AP and STA mode as well.
>> The generated messages are:
>>
>> INFO: task kworker/u16:6:120 blocked for more than 120 seconds.
>> Not tainted 4.13.0 #57
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
>> message.
>> kworker/u16:6 D 0 120 2 0x00000000
>> Workqueue: phy0 ieee80211_ba_session_work [mac80211]
>> Call Trace:
>> ? __schedule+0x174/0x5b0
>> ? schedule+0x31/0x80
>> ? schedule_preempt_disabled+0x9/0x10
>> ? __mutex_lock.isra.2+0x163/0x480
>> ? select_task_rq_fair+0xb9f/0xc60
>> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
>> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
>
> Yeah - obviously as Stefano found, both take &sta->ampdu_mlme.mtx.
>
> Can you try this?
>
> diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
> index 2b36eff5d97e..d8d32776031e 100644
> --- a/net/mac80211/agg-rx.c
> +++ b/net/mac80211/agg-rx.c
> @@ -245,10 +245,10 @@ static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *d
> ieee80211_tx_skb(sdata, skb);
> }
>
> -void __ieee80211_start_rx_ba_session(struct sta_info *sta,
> - u8 dialog_token, u16 timeout,
> - u16 start_seq_num, u16 ba_policy, u16 tid,
> - u16 buf_size, bool tx, bool auto_seq)
> +void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
> + u8 dialog_token, u16 timeout,
> + u16 start_seq_num, u16 ba_policy, u16 tid,
> + u16 buf_size, bool tx, bool auto_seq)
> {
> struct ieee80211_local *local = sta->sdata->local;
> struct tid_ampdu_rx *tid_agg_rx;
> @@ -267,7 +267,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
> ht_dbg(sta->sdata,
> "STA %pM requests BA session on unsupported tid %d\n",
> sta->sta.addr, tid);
> - goto end_no_lock;
> + goto end;
> }
>
> if (!sta->sta.ht_cap.ht_supported) {
> @@ -275,14 +275,14 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
> "STA %pM erroneously requests BA session on tid %d w/o QoS\n",
> sta->sta.addr, tid);
> /* send a response anyway, it's an error case if we get here */
> - goto end_no_lock;
> + goto end;
> }
>
> if (test_sta_flag(sta, WLAN_STA_BLOCK_BA)) {
> ht_dbg(sta->sdata,
> "Suspend in progress - Denying ADDBA request (%pM tid %d)\n",
> sta->sta.addr, tid);
> - goto end_no_lock;
> + goto end;
> }
>
> /* sanity check for incoming parameters:
> @@ -296,7 +296,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
> ht_dbg_ratelimited(sta->sdata,
> "AddBA Req with bad params from %pM on tid %u. policy %d, buffer size %d\n",
> sta->sta.addr, tid, ba_policy, buf_size);
> - goto end_no_lock;
> + goto end;
> }
> /* determine default buffer size */
> if (buf_size == 0)
> @@ -311,7 +311,6 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
> buf_size, sta->sta.addr);
>
> /* examine state machine */
> - mutex_lock(&sta->ampdu_mlme.mtx);
>
> if (test_bit(tid, sta->ampdu_mlme.agg_session_valid)) {
> if (sta->ampdu_mlme.tid_rx_token[tid] == dialog_token) {
> @@ -415,15 +414,25 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
> __clear_bit(tid, sta->ampdu_mlme.unexpected_agg);
> sta->ampdu_mlme.tid_rx_token[tid] = dialog_token;
> }
> - mutex_unlock(&sta->ampdu_mlme.mtx);
>
> -end_no_lock:
> if (tx)
> ieee80211_send_addba_resp(sta->sdata, sta->sta.addr, tid,
> dialog_token, status, 1, buf_size,
> timeout);
> }
>
> +void __ieee80211_start_rx_ba_session(struct sta_info *sta,
> + u8 dialog_token, u16 timeout,
> + u16 start_seq_num, u16 ba_policy, u16 tid,
> + u16 buf_size, bool tx, bool auto_seq)
> +{
> + mutex_lock(&sta->ampdu_mlme.mtx);
> + ___ieee80211_start_rx_ba_session(sta, dialog_token, timeout,
> + start_seq_num, ba_policy, tid,
> + buf_size, tx, auto_seq);
> + mutex_unlock(&sta->ampdu_mlme.mtx);
> +}
> +
> void ieee80211_process_addba_request(struct ieee80211_local *local,
> struct sta_info *sta,
> struct ieee80211_mgmt *mgmt,
> diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c
> index c92df492e898..198b2d3e56fd 100644
> --- a/net/mac80211/ht.c
> +++ b/net/mac80211/ht.c
> @@ -333,9 +333,9 @@ void ieee80211_ba_session_work(struct work_struct *work)
>
> if (test_and_clear_bit(tid,
> sta->ampdu_mlme.tid_rx_manage_offl))
> - __ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
> - IEEE80211_MAX_AMPDU_BUF,
> - false, true);
> + ___ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
> + IEEE80211_MAX_AMPDU_BUF,
> + false, true);
>
> if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS,
> sta->ampdu_mlme.tid_rx_manage_offl))
> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
> index 2197c62a0a6e..9675814f64db 100644
> --- a/net/mac80211/ieee80211_i.h
> +++ b/net/mac80211/ieee80211_i.h
> @@ -1760,6 +1760,10 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
> u8 dialog_token, u16 timeout,
> u16 start_seq_num, u16 ba_policy, u16 tid,
> u16 buf_size, bool tx, bool auto_seq);
> +void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
> + u8 dialog_token, u16 timeout,
> + u16 start_seq_num, u16 ba_policy, u16 tid,
> + u16 buf_size, bool tx, bool auto_seq);
> void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta,
> enum ieee80211_agg_stop_reason reason);
> void ieee80211_process_delba(struct ieee80211_sub_if_data *sdata,
>
> johannes
I confirm that this patch fixes the hang too.
I'm curious to see if there are noticeable performance differences
between the two solutions.
Cheers,
--
Matteo Croce
per aspera ad upstream
^ permalink raw reply
* Re: hung task in mac80211
From: Johannes Berg @ 2017-09-06 15:11 UTC (permalink / raw)
To: Matteo Croce; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <CAGnkfhyBMKT-Tf10NxjN40Znuj7OK2pSM3qLUidPSkQqh8gjwQ@mail.gmail.com>
On Wed, 2017-09-06 at 17:04 +0200, Matteo Croce wrote:
>
> I confirm that this patch fixes the hang too.
Cool, I'll go apply it.
> I'm curious to see if there are noticeable performance differences
> between the two solutions.
Nope, you hit this code path essentially once.
johannes
^ permalink raw reply
* Re: hung task in mac80211
From: Sebastian Gottschall @ 2017-09-06 15:45 UTC (permalink / raw)
To: Matteo Croce, Johannes Berg; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <CAGnkfhyBMKT-Tf10NxjN40Znuj7OK2pSM3qLUidPSkQqh8gjwQ@mail.gmail.com>
i confirm this patch fixes the issue for me as well
Am 06.09.2017 um 17:04 schrieb Matteo Croce:
> On Wed, Sep 6, 2017 at 2:58 PM, Johannes Berg <johannes@sipsolutions.net> wrote:
>> On Wed, 2017-09-06 at 13:57 +0200, Matteo Croce wrote:
>>
>>> I have an hung task on vanilla 4.13 kernel which I haven't on 4.12.
>>> The problem is present both on my AP and on my notebook,
>>> so it seems it affects AP and STA mode as well.
>>> The generated messages are:
>>>
>>> INFO: task kworker/u16:6:120 blocked for more than 120 seconds.
>>> Not tainted 4.13.0 #57
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
>>> message.
>>> kworker/u16:6 D 0 120 2 0x00000000
>>> Workqueue: phy0 ieee80211_ba_session_work [mac80211]
>>> Call Trace:
>>> ? __schedule+0x174/0x5b0
>>> ? schedule+0x31/0x80
>>> ? schedule_preempt_disabled+0x9/0x10
>>> ? __mutex_lock.isra.2+0x163/0x480
>>> ? select_task_rq_fair+0xb9f/0xc60
>>> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
>>> ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
>> Yeah - obviously as Stefano found, both take &sta->ampdu_mlme.mtx.
>>
>> Can you try this?
>>
>> diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
>> index 2b36eff5d97e..d8d32776031e 100644
>> --- a/net/mac80211/agg-rx.c
>> +++ b/net/mac80211/agg-rx.c
>> @@ -245,10 +245,10 @@ static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *d
>> ieee80211_tx_skb(sdata, skb);
>> }
>>
>> -void __ieee80211_start_rx_ba_session(struct sta_info *sta,
>> - u8 dialog_token, u16 timeout,
>> - u16 start_seq_num, u16 ba_policy, u16 tid,
>> - u16 buf_size, bool tx, bool auto_seq)
>> +void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
>> + u8 dialog_token, u16 timeout,
>> + u16 start_seq_num, u16 ba_policy, u16 tid,
>> + u16 buf_size, bool tx, bool auto_seq)
>> {
>> struct ieee80211_local *local = sta->sdata->local;
>> struct tid_ampdu_rx *tid_agg_rx;
>> @@ -267,7 +267,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
>> ht_dbg(sta->sdata,
>> "STA %pM requests BA session on unsupported tid %d\n",
>> sta->sta.addr, tid);
>> - goto end_no_lock;
>> + goto end;
>> }
>>
>> if (!sta->sta.ht_cap.ht_supported) {
>> @@ -275,14 +275,14 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
>> "STA %pM erroneously requests BA session on tid %d w/o QoS\n",
>> sta->sta.addr, tid);
>> /* send a response anyway, it's an error case if we get here */
>> - goto end_no_lock;
>> + goto end;
>> }
>>
>> if (test_sta_flag(sta, WLAN_STA_BLOCK_BA)) {
>> ht_dbg(sta->sdata,
>> "Suspend in progress - Denying ADDBA request (%pM tid %d)\n",
>> sta->sta.addr, tid);
>> - goto end_no_lock;
>> + goto end;
>> }
>>
>> /* sanity check for incoming parameters:
>> @@ -296,7 +296,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
>> ht_dbg_ratelimited(sta->sdata,
>> "AddBA Req with bad params from %pM on tid %u. policy %d, buffer size %d\n",
>> sta->sta.addr, tid, ba_policy, buf_size);
>> - goto end_no_lock;
>> + goto end;
>> }
>> /* determine default buffer size */
>> if (buf_size == 0)
>> @@ -311,7 +311,6 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
>> buf_size, sta->sta.addr);
>>
>> /* examine state machine */
>> - mutex_lock(&sta->ampdu_mlme.mtx);
>>
>> if (test_bit(tid, sta->ampdu_mlme.agg_session_valid)) {
>> if (sta->ampdu_mlme.tid_rx_token[tid] == dialog_token) {
>> @@ -415,15 +414,25 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
>> __clear_bit(tid, sta->ampdu_mlme.unexpected_agg);
>> sta->ampdu_mlme.tid_rx_token[tid] = dialog_token;
>> }
>> - mutex_unlock(&sta->ampdu_mlme.mtx);
>>
>> -end_no_lock:
>> if (tx)
>> ieee80211_send_addba_resp(sta->sdata, sta->sta.addr, tid,
>> dialog_token, status, 1, buf_size,
>> timeout);
>> }
>>
>> +void __ieee80211_start_rx_ba_session(struct sta_info *sta,
>> + u8 dialog_token, u16 timeout,
>> + u16 start_seq_num, u16 ba_policy, u16 tid,
>> + u16 buf_size, bool tx, bool auto_seq)
>> +{
>> + mutex_lock(&sta->ampdu_mlme.mtx);
>> + ___ieee80211_start_rx_ba_session(sta, dialog_token, timeout,
>> + start_seq_num, ba_policy, tid,
>> + buf_size, tx, auto_seq);
>> + mutex_unlock(&sta->ampdu_mlme.mtx);
>> +}
>> +
>> void ieee80211_process_addba_request(struct ieee80211_local *local,
>> struct sta_info *sta,
>> struct ieee80211_mgmt *mgmt,
>> diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c
>> index c92df492e898..198b2d3e56fd 100644
>> --- a/net/mac80211/ht.c
>> +++ b/net/mac80211/ht.c
>> @@ -333,9 +333,9 @@ void ieee80211_ba_session_work(struct work_struct *work)
>>
>> if (test_and_clear_bit(tid,
>> sta->ampdu_mlme.tid_rx_manage_offl))
>> - __ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
>> - IEEE80211_MAX_AMPDU_BUF,
>> - false, true);
>> + ___ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
>> + IEEE80211_MAX_AMPDU_BUF,
>> + false, true);
>>
>> if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS,
>> sta->ampdu_mlme.tid_rx_manage_offl))
>> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
>> index 2197c62a0a6e..9675814f64db 100644
>> --- a/net/mac80211/ieee80211_i.h
>> +++ b/net/mac80211/ieee80211_i.h
>> @@ -1760,6 +1760,10 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
>> u8 dialog_token, u16 timeout,
>> u16 start_seq_num, u16 ba_policy, u16 tid,
>> u16 buf_size, bool tx, bool auto_seq);
>> +void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
>> + u8 dialog_token, u16 timeout,
>> + u16 start_seq_num, u16 ba_policy, u16 tid,
>> + u16 buf_size, bool tx, bool auto_seq);
>> void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta,
>> enum ieee80211_agg_stop_reason reason);
>> void ieee80211_process_delba(struct ieee80211_sub_if_data *sdata,
>>
>> johannes
> I confirm that this patch fixes the hang too.
> I'm curious to see if there are noticeable performance differences
> between the two solutions.
>
> Cheers,
--
Mit freundlichen Grüssen / Regards
Sebastian Gottschall / CTO
NewMedia-NET GmbH - DD-WRT
Firmensitz: Berliner Ring 101, 64625 Bensheim
Registergericht: Amtsgericht Darmstadt, HRB 25473
Geschäftsführer: Peter Steinhäuser, Christian Scheele
http://www.dd-wrt.com
email: s.gottschall@dd-wrt.com
Tel.: +496251-582650 / Fax: +496251-5826565
^ permalink raw reply
* Re: VLAN/bridge "compression" in wifi (was: Re: [PATCH 3/8] qtnfmac: implement AP_VLAN iftype support)
From: Sergey Matyukevich @ 2017-09-06 15:45 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-wireless, netdev, Igor Mitsyanko, Avinash Patil
In-Reply-To: <1504621233.12380.21.camel@sipsolutions.net>
Hi Johannes and all,
> > In a way this feature seems mis-designed - you never have 802.1Q tags
> > over the air, but you're inserting them on RX and stripping them on
> > TX, probably in order to make bridging to ethernet easier and not
> > have to have 802.1Q acceleration on the ethernet port, or - well - in
> > order to have an ability to do this with an ethernet card that only
> > has a single CPU port.
>
> Ok this isn't really right either - it's only for saving the 802.1Q
> acceleration on the Ethernet port, really - and saving the extra
> bridges.
>
> To clarify, I think what you - conceptually - want is the following
> topology:
>
> +--- eth0.1 --- br.1 --- wlan0.1
> |
> eth0 ---+--- eth0.2 --- br.2 --- wlan0.2
> |
> +--- eth0.3 --- br.3 --- wlan0.3
>
> where eth0.N is just "ip link add link eth0 name eth0.N type vlan id N"
> and br.N is obviously a bridge for each, and the wlan0.N are AP_VLAN
> type interfaces that isolate the clients against each other as far as
> wifi is concerned.
>
> Is this correct? As far as I understand, that's the baseline topology
> that you're trying to achieve, expressed in terms of Linux networking.
That's right. In fact, hostapd is able to create this kind of network
bridge infrastructure automatically when it is built
with CONFIG_FULL_DYNAMIC_VLAN option enabled.
> Now, you seem to want to compress this to
>
> +--- wlan0.1
> |
> eth0 --- br ---+--- wlan0.2
> |
> +--- wlan0.3
>
> and have the 802.1Q tag insertion/removal that's normally configured to
> happen in eth0.N already be handled in wlan0.N.
>
> Also correct?
Exactly. And yes, the only purpose of this 'non-conventional' mode was
to have 802.1Q acceleration on the ethernet port.
>
>
> We clearly don't have APIs for this, and I don't think it makes sense
> in the Linux space - the bridge and wlan0.N suddenly have tagged
> traffic rather than untagged, and the VLAN tagging is completely hidden
> from the management view.
>
> johannes
^ permalink raw reply
* Re: [PATCH 3/8] qtnfmac: implement AP_VLAN iftype support
From: Sergey Matyukevich @ 2017-09-06 16:22 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-wireless, Igor Mitsyanko, Avinash Patil
In-Reply-To: <1504619151.12380.16.camel@sipsolutions.net>
Hi Johannes and all,
> > 1. Passing dynamic VLAN tag to firmware
>
> There's no such thing as a "VLAN tag" with AP_VLAN interfaces. You want
> 802.1Q, which can be done on top of them if needed, but it's an
> unrelated concept.
>
> > Currently there is no established way to pass VLAN tag assigned to
> > STA by Radius server to wireless driver. It is assumed that hostapd
> > is able to setup all the bridging and tagging on its own.
>
> I don't even know if this is true? Does hostapd in fact set up 802.1Q
> tagging in any way? Can you say how to configure this?
Yes, hostapd supports this feature when it is built with option
CONFIG_FULL_DYNAMIC_VLAN enabled. There is a set of hostapd configuration
options and additional files that make it possible to specify 'WAN'
interface, naming conventions for those dynamic bridges, and more.
However I agree that using 802.1Q is only one of possible ways to isolate
stations attached to different AP_VLAN intefaces.
> > 2. Packet routing to/from AP/VLAN interfaces
> > Firmware operates with tagged packets after dynamic VLAN mode is
> > configured. In particular, packets destined to STAs should be
> > properly tagged before they can be passed to firmware. Packets
> > received from STAs are properly tagged by firmware and then
> > passed to wireless driver. As a result, packet routing to AP/VLAN
> > interfaces is straightforward: it is enough to check VLAN tags.
>
> Ok, so here the whole tagging comes - a bit - into play. Technically
> you could, however, do the following:
>
> * assign "fake" tags as I suggested above
> * pack/unpack the 802.1Q header from the firmware (or put it into
> metadata) in the driver and just tx/rx untagged packets into the
> right interface
> * if the AP_VLAN has a real 802.1Q on top, then it's re-packed again
> by the ethernet driver when the data goes out
Yes, this is approach with 'fake' tags is how it can be done provided
we do not want to use any hacks with naming conventions for AP_VLAN
interfaces enforced by hostapd.
In fact, the 802.1Q approach has been chosen since in our case it is
possible to make use of 802.1Q acceleration in firmware. So we could
save some cycles on host CPU, tagging/untagging packets on the card.
But again, I agree that using 802.1Q is only one of the possible ways
to isolate stations attached to different AP_VLAN. Thus there is no
point to complicate/drop other usecases only because we are able
to support 802.1Q case better due to acceleration in firmware.
> > Normally hostapd expects untagged packets from AP/VLAN interfaces.
>
> hostapd doesn't really expect anything there, does it?
In fact, hostapd does not expect 802.1Q tagged packets. IIRC when
CONFIG_FULL_DYNAMIC_VLAN enabled, then hostapd creates bridges
and VLAN interfaces dynamically, connecting them to 'WAN' network
interface specified by vlan_tagged_interface configuration option.
> > Meanwhile firmware performs tagging using h/w acceleration. That
> > is why it makes sense to avoid untagging packets in driver if
> > they are supposed to by tagged again on host. To enable this
> > behavior a new module parameter 'dyn_vlan_tagged' has been
> > introduced:
> >
> > - dyn_vlan_tagged = 0 (default)
> > In this case untagged packets are sent to and expected from AP/VLAN
> > interfaces.
> > Driver tags/untags packets going to/from firmware. This behaviour is
> > expected
> > by hostapd which is able to create bridges and VLAN interfaces
> > automatically
> > when hostapd is built with CONFIG_FULL_DYNAMIC_VLAN option enabled.
> >
> > - dyn_vlan_tagged = 1
> > In this case tagged packets are sent to and expected from AP/VLAN
> > interfaces.
> > Hostapd build option CONFIG_FULL_DYNAMIC_VLAN should be disabled.
> > Setup of
> > networking topology on host is left up to the implementers.
>
> This is all very hacky, I really don't think we can accept that.
Fair enough. I didn't like it that much anyway :) I will rework the whole
patch using opaque tags for 802.1Q headers from the firmware, leaving AP_VLAN
interface management to hostapd or any other userspace tool.
Regards,
Sergey
^ permalink raw reply
* Re: [GIT] Networking
From: Linus Torvalds @ 2017-09-06 23:27 UTC (permalink / raw)
To: David Miller, Johannes Berg, Emmanuel Grumbach, Luca Coelho,
Kalle Valo
Cc: Andrew Morton, Network Development, Linux Kernel Mailing List,
Intel Linux Wireless, Linux Wireless List
In-Reply-To: <20170905.214143.826912481689443792.davem@davemloft.net>
This pull request completely breaks Intel wireless for me.
This is my trusty old XPS 13 (9350), using Intel Wireless 8260 (rev 3a).
That remains a very standard Intel machine with absolutely zero odd
things going on.
The firmware is iwlwifi-8000C-28.ucode from
iwl7260-firmware-25.30.13.0-75.fc26.noarch, and the kernel reports
iwlwifi 0000:3a:00.0: loaded firmware version 27.455470.0 op_mode iwlmvm
the thing starts acting badly with this:
iwlwifi 0000:3a:00.0: FW Error notification: type 0x00000000 cmd_id 0x04
iwlwifi 0000:3a:00.0: FW Error notification: seq 0x0000 service 0x00000004
iwlwifi 0000:3a:00.0: FW Error notification: timestamp 0x 5D84
iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.
iwlwifi 0000:3a:00.0: Start IWL Error Log Dump:
iwlwifi 0000:3a:00.0: Status: 0x00000100, count: 6
iwlwifi 0000:3a:00.0: Loaded firmware version: 27.455470.0
iwlwifi 0000:3a:00.0: 0x00000038 | BAD_COMMAND
iwlwifi 0000:3a:00.0: 0x00A002F0 | trm_hw_status0
...
iwlwifi 0000:3a:00.0: 0x00000000 | isr status reg
ieee80211 phy0: Hardware restart was requested
iwlwifi 0000:3a:00.0: FW error in SYNC CMD MAC_CONTEXT_CMD
CPU: 2 PID: 993 Comm: NetworkManager Not tainted 4.13.0-06466-g80cee03bf1d6 #4
Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.17 05/10/2017
Call Trace:
dump_stack+0x4d/0x70
iwl_trans_pcie_send_hcmd+0x4e7/0x530 [iwlwifi]
? wait_woken+0x80/0x80
iwl_trans_send_cmd+0x5c/0xc0 [iwlwifi]
iwl_mvm_send_cmd+0x32/0x90 [iwlmvm]
iwl_mvm_send_cmd_pdu+0x58/0x80 [iwlmvm]
iwl_mvm_mac_ctxt_send_cmd+0x2a/0x60 [iwlmvm]
? iwl_mvm_mac_ctxt_send_cmd+0x2a/0x60 [iwlmvm]
iwl_mvm_mac_ctxt_cmd_sta+0x140/0x1e0 [iwlmvm]
iwl_mvm_mac_ctx_send+0x2d/0x60 [iwlmvm]
iwl_mvm_mac_ctxt_add+0x43/0xc0 [iwlmvm]
iwl_mvm_mac_add_interface+0x139/0x2b0 [iwlmvm]
? iwl_led_brightness_set+0x1f/0x30 [iwlmvm]
drv_add_interface+0x4a/0x120 [mac80211]
ieee80211_do_open+0x33d/0x820 [mac80211]
ieee80211_open+0x52/0x60 [mac80211]
__dev_open+0xae/0x120
__dev_change_flags+0x17b/0x1c0
dev_change_flags+0x29/0x60
do_setlink+0x2f7/0xe60
? __nla_put+0x20/0x30
? _raw_read_unlock_bh+0x20/0x30
? inet6_fill_ifla6_attrs+0x4be/0x4e0
? __kmalloc_node_track_caller+0x35/0x2b0
? nla_parse+0x35/0x100
rtnl_newlink+0x5d2/0x8f0
? __netlink_sendskb+0x3b/0x60
? security_capset+0x40/0x80
? ns_capable_common+0x68/0x80
? ns_capable+0x13/0x20
rtnetlink_rcv_msg+0x1f9/0x280
? rtnl_calcit.isra.26+0x110/0x110
netlink_rcv_skb+0x8e/0x130
rtnetlink_rcv+0x15/0x20
netlink_unicast+0x18b/0x220
netlink_sendmsg+0x2ad/0x3a0
sock_sendmsg+0x38/0x50
___sys_sendmsg+0x269/0x2c0
? addrconf_sysctl_forward+0x114/0x280
? dev_forward_change+0x140/0x140
? sysctl_head_finish.part.22+0x32/0x40
? lockref_put_or_lock+0x5e/0x80
? dput.part.22+0x13e/0x1c0
? mntput+0x24/0x40
__sys_sendmsg+0x54/0x90
? __sys_sendmsg+0x54/0x90
SyS_sendmsg+0x12/0x20
entry_SYSCALL_64_fastpath+0x13/0x94
RIP: 0033:0x7ff1f9933134
RSP: 002b:00007ffe7419b460 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000055604b6d82b9 RCX: 00007ff1f9933134
RDX: 0000000000000000 RSI: 00007ffe7419b4b0 RDI: 0000000000000007
RBP: 00007ffe7419b940 R08: 0000000000000000 R09: 000055604d16b400
R10: 00007ff1f7cf8b38 R11: 0000000000000293 R12: 0000000000000001
R13: 0000000000000001 R14: 00007ffe7419b670 R15: 000055604b9515a0
iwlwifi 0000:3a:00.0: Failed to send MAC context (action:1): -5
and it doesn't get any better from there. The next error seems to be
Timeout waiting for hardware access (CSR_GP_CNTRL 0x08000008)
------------[ cut here ]------------
WARNING: CPU: 3 PID: 1075 at
drivers/net/wireless/intel/iwlwifi/pcie/trans.c:1874
iwl_trans_pcie_grab_nic_access+0xdf/0xf0 [iwlwifi]
and it will continue with those microcode failure errors and various
other warnigns about how nothing is working.
And no, nothing works. A lot of log output, no actual network access..
Linus
^ permalink raw reply
* Re: [GIT] Networking
From: David Miller @ 2017-09-06 23:31 UTC (permalink / raw)
To: torvalds
Cc: johannes.berg, emmanuel.grumbach, luciano.coelho, kvalo, akpm,
netdev, linux-kernel, linuxwifi, linux-wireless
In-Reply-To: <CA+55aFw0KNTKFrRHk5hthDGTGgL9BGt9G=x_m9Tz7m_-pN+NoA@mail.gmail.com>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 6 Sep 2017 16:27:15 -0700
> This pull request completely breaks Intel wireless for me.
>
> This is my trusty old XPS 13 (9350), using Intel Wireless 8260 (rev 3a).
>
> That remains a very standard Intel machine with absolutely zero odd
> things going on.
>
> The firmware is iwlwifi-8000C-28.ucode from
> iwl7260-firmware-25.30.13.0-75.fc26.noarch, and the kernel reports
...
Johannes and other Intel folks please look into this.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox