* [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism
@ 2024-11-22 16:48 Remi Pommarel
2024-11-22 16:48 ` [RESEND PATCH v3 1/2] wifi: ath10k: Implement ieee80211 flush_sta callback Remi Pommarel
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Remi Pommarel @ 2024-11-22 16:48 UTC (permalink / raw)
To: ath10k, linux-wireless, linux-kernel
Cc: Kalle Valo, Jeff Johnson, Cedric Veilleux,
Vasanthakumar Thiagarajan, Remi Pommarel
It has been reported [0] that a 3-4 seconds (actually up to 5 sec) of
radio silence could be observed followed by the error below on ath10k
devices:
ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
This is due to how the TX queues are flushed in ath10k. When a STA is
removed, mac80211 need to flush queues [1], but because ath10k does not
have a lightweight .flush_sta operation, ieee80211_flush_queues() is
called instead effectively blocking the whole queue during the drain
causing this radio silence. Also because ath10k_flush() waits for all
queued to be emptied, not only the flushed ones it could more easily
take up to 5 seconds to finish making the whole situation worst.
The first patch of this series adds a .flush_sta operation to flush only
specific STA traffic avoiding the need to stop whole queues and should
be enough in itself to fix the reported issue.
The second patch of this series is a proposal to improve ath10k_flush so
that it will be less likely to timeout waiting for non related queues to
drain.
The abose kernel warning could still be observed (e.g. flushing a dead
STA) but should be now harmless.
[0]: https://lore.kernel.org/all/CA+Xfe4FjUmzM5mvPxGbpJsF3SvSdE5_wgxvgFJ0bsdrKODVXCQ@mail.gmail.com/
[1]: commit 0b75a1b1e42e ("wifi: mac80211: flush queues on STA removal")
V3:
- Initialize empty to true to fix smatch error
V2:
- Add Closes tag
- Use atomic instead of spinlock for per sta pending frame counter
- Call ath10k_htt_tx_sta_dec_pending within rcu
- Rename pending_per_queue[] to num_pending_per_queue[]
Remi Pommarel (2):
wifi: ath10k: Implement ieee80211 flush_sta callback
wifi: ath10k: Flush only requested txq in ath10k_flush()
drivers/net/wireless/ath/ath10k/core.h | 2 +
drivers/net/wireless/ath/ath10k/htt.h | 11 +++-
drivers/net/wireless/ath/ath10k/htt_tx.c | 49 +++++++++++++++-
drivers/net/wireless/ath/ath10k/mac.c | 75 ++++++++++++++++++++----
drivers/net/wireless/ath/ath10k/txrx.c | 11 ++--
5 files changed, 127 insertions(+), 21 deletions(-)
--
2.40.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [RESEND PATCH v3 1/2] wifi: ath10k: Implement ieee80211 flush_sta callback
2024-11-22 16:48 [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism Remi Pommarel
@ 2024-11-22 16:48 ` Remi Pommarel
2024-11-22 16:48 ` [RESEND PATCH v3 2/2] wifi: ath10k: Flush only requested txq in ath10k_flush() Remi Pommarel
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Remi Pommarel @ 2024-11-22 16:48 UTC (permalink / raw)
To: ath10k, linux-wireless, linux-kernel
Cc: Kalle Valo, Jeff Johnson, Cedric Veilleux,
Vasanthakumar Thiagarajan, Remi Pommarel
When a STA reassociates, mac80211's _sta_info_move_state() waits for all
pending frame to be flushed before removing the key (so that no frame
get sent unencrypted after key removable [0]). When a driver does not
implement the flush_sta callback, ieee80211_flush_queues() is called
instead which effectively stops the whole queue until it is completely
drained.
The ath10k driver configure all STAs of one vdev to share the same
queue. So when flushing one STA this is the whole vdev queue that is
blocked until completely drained causing Tx to other STA to also stall
this whole time.
One easy way to reproduce the issue is to connect two STAs (STA0 and
STA1) to an ath10k AP. While Generating a bunch of traffic from AP to
STA0 (e.g. fping -l -p 20 <STA0-IP>) disconnect STA0 from AP without
clean disassociation (e.g. remove power, reboot -f). Then as soon as
STA0 is effectively disconnected from AP (either after inactivity
timeout or forced with iw dev AP station del STA0), its queues get
flushed using ieee80211_flush_queues(). This causes STA1 to suffer a
connectivity stall for about 5 seconds (see ATH10K_FLUSH_TIMEOUT_HZ).
Implement a flush_sta callback in ath10k to wait only for a specific
STA pending frames to be drained (without stopping the whole HW queue)
to fix that.
[0]: commit 0b75a1b1e42e ("wifi: mac80211: flush queues on STA removal")
Reported-by: Cedric Veilleux <veilleux.cedric@gmail.com>
Closes: https://lore.kernel.org/all/CA+Xfe4FjUmzM5mvPxGbpJsF3SvSdE5_wgxvgFJ0bsdrKODVXCQ@mail.gmail.com/
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
---
drivers/net/wireless/ath/ath10k/core.h | 2 ++
drivers/net/wireless/ath/ath10k/htt.h | 4 +++
drivers/net/wireless/ath/ath10k/htt_tx.c | 31 ++++++++++++++++++
drivers/net/wireless/ath/ath10k/mac.c | 40 +++++++++++++++++++++++-
drivers/net/wireless/ath/ath10k/txrx.c | 9 ++++--
5 files changed, 82 insertions(+), 4 deletions(-)
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 446dca74f06a..4ec2faa055c0 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -558,6 +558,8 @@ struct ath10k_sta {
u8 rate_ctrl[ATH10K_TID_MAX];
u32 rate_code[ATH10K_TID_MAX];
int rtscts[ATH10K_TID_MAX];
+ wait_queue_head_t empty_tx_wq;
+ atomic_t num_fw_queued;
};
#define ATH10K_VDEV_SETUP_TIMEOUT_HZ (5 * HZ)
diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 603f6de62b0a..d150f9330941 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -2452,6 +2452,10 @@ int ath10k_htt_tx_inc_pending(struct ath10k_htt *htt);
void ath10k_htt_tx_mgmt_dec_pending(struct ath10k_htt *htt);
int ath10k_htt_tx_mgmt_inc_pending(struct ath10k_htt *htt, bool is_mgmt,
bool is_presp);
+void ath10k_htt_tx_sta_inc_pending(struct ath10k_htt *htt,
+ struct ieee80211_sta *sta);
+void ath10k_htt_tx_sta_dec_pending(struct ath10k_htt *htt,
+ struct ieee80211_sta *sta);
int ath10k_htt_tx_alloc_msdu_id(struct ath10k_htt *htt, struct sk_buff *skb);
void ath10k_htt_tx_free_msdu_id(struct ath10k_htt *htt, u16 msdu_id);
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index 9725feecefd6..211752bd0f65 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -195,6 +195,37 @@ void ath10k_htt_tx_mgmt_dec_pending(struct ath10k_htt *htt)
htt->num_pending_mgmt_tx--;
}
+void ath10k_htt_tx_sta_inc_pending(struct ath10k_htt *htt,
+ struct ieee80211_sta *sta)
+{
+ struct ath10k_sta *arsta;
+
+ if (!sta)
+ return;
+
+ arsta = (struct ath10k_sta *)sta->drv_priv;
+
+ atomic_inc(&arsta->num_fw_queued);
+}
+
+void ath10k_htt_tx_sta_dec_pending(struct ath10k_htt *htt,
+ struct ieee80211_sta *sta)
+{
+ struct ath10k_sta *arsta;
+ int v;
+
+ if (!sta)
+ return;
+
+ arsta = (struct ath10k_sta *)sta->drv_priv;
+
+ v = atomic_dec_if_positive(&arsta->num_fw_queued);
+ if (v < 0)
+ WARN_ON_ONCE(1);
+ if (v == 0)
+ wake_up(&arsta->empty_tx_wq);
+}
+
int ath10k_htt_tx_alloc_msdu_id(struct ath10k_htt *htt, struct sk_buff *skb)
{
struct ath10k *ar = htt->ar;
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index c61b95a928da..8b8d4f24e5fb 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4423,6 +4423,8 @@ int ath10k_mac_tx_push_txq(struct ieee80211_hw *hw,
spin_unlock_bh(&ar->htt.tx_lock);
}
+ ath10k_htt_tx_sta_inc_pending(&ar->htt, sta);
+
ret = ath10k_mac_tx(ar, vif, txmode, txpath, skb, false);
if (unlikely(ret)) {
ath10k_warn(ar, "failed to push frame: %d\n", ret);
@@ -4432,6 +4434,7 @@ int ath10k_mac_tx_push_txq(struct ieee80211_hw *hw,
if (is_mgmt)
ath10k_htt_tx_mgmt_dec_pending(htt);
spin_unlock_bh(&ar->htt.tx_lock);
+ ath10k_htt_tx_sta_dec_pending(&ar->htt, sta);
return ret;
}
@@ -7481,7 +7484,7 @@ static int ath10k_sta_state(struct ieee80211_hw *hw,
arsta->peer_ps_state = WMI_PEER_PS_STATE_DISABLED;
INIT_WORK(&arsta->update_wk, ath10k_sta_rc_update_wk);
INIT_WORK(&arsta->tid_config_wk, ath10k_sta_tid_cfg_wk);
-
+ init_waitqueue_head(&arsta->empty_tx_wq);
for (i = 0; i < ARRAY_SIZE(sta->txq); i++)
ath10k_mac_txq_init(sta->txq[i]);
}
@@ -8112,6 +8115,40 @@ static void ath10k_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
mutex_unlock(&ar->conf_mutex);
}
+static void ath10k_flush_sta(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ struct ieee80211_sta *sta)
+{
+ struct ath10k_sta *arsta = (struct ath10k_sta *)sta->drv_priv;
+ struct ath10k *ar = hw->priv;
+ bool skip;
+ long time_left;
+
+ /* TODO do we need drop implemented here ? */
+
+ mutex_lock(&ar->conf_mutex);
+
+ if (ar->state == ATH10K_STATE_WEDGED)
+ goto out;
+
+ time_left = wait_event_timeout(arsta->empty_tx_wq, ({
+ bool empty;
+
+ empty = atomic_read(&arsta->num_fw_queued) == 0;
+
+ skip = (ar->state == ATH10K_STATE_WEDGED) ||
+ test_bit(ATH10K_FLAG_CRASH_FLUSH,
+ &ar->dev_flags);
+
+ (empty || skip);
+ }), ATH10K_FLUSH_TIMEOUT_HZ);
+
+ if (time_left == 0 || skip)
+ ath10k_warn(ar, "failed to flush sta txq (sta %pM skip %i ar-state %i): %ld\n",
+ sta->addr, skip, ar->state, time_left);
+out:
+ mutex_unlock(&ar->conf_mutex);
+}
+
/* TODO: Implement this function properly
* For now it is needed to reply to Probe Requests in IBSS mode.
* Probably we need this information from FW.
@@ -9459,6 +9496,7 @@ static const struct ieee80211_ops ath10k_ops = {
.set_rts_threshold = ath10k_set_rts_threshold,
.set_frag_threshold = ath10k_mac_op_set_frag_threshold,
.flush = ath10k_flush,
+ .flush_sta = ath10k_flush_sta,
.tx_last_beacon = ath10k_tx_last_beacon,
.set_antenna = ath10k_set_antenna,
.get_antenna = ath10k_get_antenna,
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index da3bc35e41aa..5b6750ef7d19 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -86,9 +86,12 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
spin_unlock_bh(&htt->tx_lock);
rcu_read_lock();
- if (txq && txq->sta && skb_cb->airtime_est)
- ieee80211_sta_register_airtime(txq->sta, txq->tid,
- skb_cb->airtime_est, 0);
+ if (txq && txq->sta) {
+ if (skb_cb->airtime_est)
+ ieee80211_sta_register_airtime(txq->sta, txq->tid,
+ skb_cb->airtime_est, 0);
+ ath10k_htt_tx_sta_dec_pending(htt, txq->sta);
+ }
rcu_read_unlock();
if (ar->bus_param.dev_type != ATH10K_DEV_TYPE_HL)
--
2.40.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RESEND PATCH v3 2/2] wifi: ath10k: Flush only requested txq in ath10k_flush()
2024-11-22 16:48 [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism Remi Pommarel
2024-11-22 16:48 ` [RESEND PATCH v3 1/2] wifi: ath10k: Implement ieee80211 flush_sta callback Remi Pommarel
@ 2024-11-22 16:48 ` Remi Pommarel
2025-07-25 21:57 ` Maurer, Florian
2024-11-26 12:57 ` [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism James Prestwood
2025-01-09 13:03 ` Kalle Valo
3 siblings, 1 reply; 10+ messages in thread
From: Remi Pommarel @ 2024-11-22 16:48 UTC (permalink / raw)
To: ath10k, linux-wireless, linux-kernel
Cc: Kalle Valo, Jeff Johnson, Cedric Veilleux,
Vasanthakumar Thiagarajan, Remi Pommarel
The ieee80211 flush callback can be called to flush only part of all hw
queues. The ath10k's flush callback implementation (i.e. ath10k_flush())
was waiting for all pending frames of all queues to be flushed ignoring
the queue parameter. Because only the queues to be flushed are stopped
by mac80211, skb can still be queued to other queues meanwhile. Thus
ath10k_flush() could fail (and wait 5sec holding ar->conf lock) even if
the requested queues are flushed correctly.
A way to reproduce the issue is to use two different APs because
each vdev has its own hw queue in ath10k. Connect STA0 to AP0 and STA1
to AP1. Then generate traffic from AP0 to STA0 and kill STA0 without
clean disassociation frame (e.g. unplug power cable, reboot -f, ...).
Now if we were to flush AP1's queue, ath10k_flush() would fail (and
effectively block 5 seconds with ar->conf or even wiphy's lock held)
with the following warning:
ath10k_pci 0000:01:00.0: failed to flush transmit queue (skip 0 ar-state 2): 0
Wait only for pending frames of the requested queues to be flushed in
ath10k_flush() to avoid that long blocking.
Reported-by: Cedric Veilleux <veilleux.cedric@gmail.com>
Closes: https://lore.kernel.org/all/CA+Xfe4FjUmzM5mvPxGbpJsF3SvSdE5_wgxvgFJ0bsdrKODVXCQ@mail.gmail.com/
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
---
drivers/net/wireless/ath/ath10k/htt.h | 7 +++--
drivers/net/wireless/ath/ath10k/htt_tx.c | 18 ++++++++++--
drivers/net/wireless/ath/ath10k/mac.c | 35 ++++++++++++++++--------
drivers/net/wireless/ath/ath10k/txrx.c | 2 +-
4 files changed, 45 insertions(+), 17 deletions(-)
diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index d150f9330941..ca8bf3b6766d 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1870,6 +1870,7 @@ struct ath10k_htt {
spinlock_t tx_lock;
int max_num_pending_tx;
int num_pending_tx;
+ int num_pending_per_queue[IEEE80211_MAX_QUEUES];
int num_pending_mgmt_tx;
struct idr pending_tx;
wait_queue_head_t empty_tx_wq;
@@ -2447,8 +2448,10 @@ void ath10k_htt_tx_txq_update(struct ieee80211_hw *hw,
void ath10k_htt_tx_txq_recalc(struct ieee80211_hw *hw,
struct ieee80211_txq *txq);
void ath10k_htt_tx_txq_sync(struct ath10k *ar);
-void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt);
-int ath10k_htt_tx_inc_pending(struct ath10k_htt *htt);
+void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt,
+ struct ieee80211_txq *txq);
+int ath10k_htt_tx_inc_pending(struct ath10k_htt *htt,
+ struct ieee80211_txq *txq);
void ath10k_htt_tx_mgmt_dec_pending(struct ath10k_htt *htt);
int ath10k_htt_tx_mgmt_inc_pending(struct ath10k_htt *htt, bool is_mgmt,
bool is_presp);
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index 211752bd0f65..ef5a992e8cce 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -140,19 +140,26 @@ void ath10k_htt_tx_txq_update(struct ieee80211_hw *hw,
spin_unlock_bh(&ar->htt.tx_lock);
}
-void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt)
+void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt,
+ struct ieee80211_txq *txq)
{
+ int qnr = -1;
+
lockdep_assert_held(&htt->tx_lock);
htt->num_pending_tx--;
if (htt->num_pending_tx == htt->max_num_pending_tx - 1)
ath10k_mac_tx_unlock(htt->ar, ATH10K_TX_PAUSE_Q_FULL);
- if (htt->num_pending_tx == 0)
+ if (txq)
+ qnr = --htt->num_pending_per_queue[txq->vif->hw_queue[txq->ac]];
+
+ if (htt->num_pending_tx == 0 || qnr == 0)
wake_up(&htt->empty_tx_wq);
}
-int ath10k_htt_tx_inc_pending(struct ath10k_htt *htt)
+int ath10k_htt_tx_inc_pending(struct ath10k_htt *htt,
+ struct ieee80211_txq *txq)
{
lockdep_assert_held(&htt->tx_lock);
@@ -163,6 +170,11 @@ int ath10k_htt_tx_inc_pending(struct ath10k_htt *htt)
if (htt->num_pending_tx == htt->max_num_pending_tx)
ath10k_mac_tx_lock(htt->ar, ATH10K_TX_PAUSE_Q_FULL);
+ if (!txq)
+ return 0;
+
+ htt->num_pending_per_queue[txq->vif->hw_queue[txq->ac]]++;
+
return 0;
}
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 8b8d4f24e5fb..7c5f95f2858f 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4385,7 +4385,7 @@ int ath10k_mac_tx_push_txq(struct ieee80211_hw *hw,
u16 airtime;
spin_lock_bh(&ar->htt.tx_lock);
- ret = ath10k_htt_tx_inc_pending(htt);
+ ret = ath10k_htt_tx_inc_pending(htt, txq);
spin_unlock_bh(&ar->htt.tx_lock);
if (ret)
@@ -4394,7 +4394,7 @@ int ath10k_mac_tx_push_txq(struct ieee80211_hw *hw,
skb = ieee80211_tx_dequeue_ni(hw, txq);
if (!skb) {
spin_lock_bh(&ar->htt.tx_lock);
- ath10k_htt_tx_dec_pending(htt);
+ ath10k_htt_tx_dec_pending(htt, txq);
spin_unlock_bh(&ar->htt.tx_lock);
return -ENOENT;
@@ -4416,7 +4416,7 @@ int ath10k_mac_tx_push_txq(struct ieee80211_hw *hw,
ret = ath10k_htt_tx_mgmt_inc_pending(htt, is_mgmt, is_presp);
if (ret) {
- ath10k_htt_tx_dec_pending(htt);
+ ath10k_htt_tx_dec_pending(htt, txq);
spin_unlock_bh(&ar->htt.tx_lock);
return ret;
}
@@ -4430,7 +4430,7 @@ int ath10k_mac_tx_push_txq(struct ieee80211_hw *hw,
ath10k_warn(ar, "failed to push frame: %d\n", ret);
spin_lock_bh(&ar->htt.tx_lock);
- ath10k_htt_tx_dec_pending(htt);
+ ath10k_htt_tx_dec_pending(htt, txq);
if (is_mgmt)
ath10k_htt_tx_mgmt_dec_pending(htt);
spin_unlock_bh(&ar->htt.tx_lock);
@@ -4693,7 +4693,7 @@ static void ath10k_mac_op_tx(struct ieee80211_hw *hw,
is_presp = ieee80211_is_probe_resp(hdr->frame_control);
}
- ret = ath10k_htt_tx_inc_pending(htt);
+ ret = ath10k_htt_tx_inc_pending(htt, txq);
if (ret) {
ath10k_warn(ar, "failed to increase tx pending count: %d, dropping\n",
ret);
@@ -4706,7 +4706,7 @@ static void ath10k_mac_op_tx(struct ieee80211_hw *hw,
if (ret) {
ath10k_dbg(ar, ATH10K_DBG_MAC, "failed to increase tx mgmt pending count: %d, dropping\n",
ret);
- ath10k_htt_tx_dec_pending(htt);
+ ath10k_htt_tx_dec_pending(htt, txq);
spin_unlock_bh(&ar->htt.tx_lock);
ieee80211_free_txskb(ar->hw, skb);
return;
@@ -4719,7 +4719,7 @@ static void ath10k_mac_op_tx(struct ieee80211_hw *hw,
ath10k_warn(ar, "failed to transmit frame: %d\n", ret);
if (is_htt) {
spin_lock_bh(&ar->htt.tx_lock);
- ath10k_htt_tx_dec_pending(htt);
+ ath10k_htt_tx_dec_pending(htt, txq);
if (is_mgmt)
ath10k_htt_tx_mgmt_dec_pending(htt);
spin_unlock_bh(&ar->htt.tx_lock);
@@ -8059,10 +8059,12 @@ static int ath10k_mac_op_set_frag_threshold(struct ieee80211_hw *hw, u32 value)
return -EOPNOTSUPP;
}
-void ath10k_mac_wait_tx_complete(struct ath10k *ar)
+static void _ath10k_mac_wait_tx_complete(struct ath10k *ar,
+ unsigned long queues)
{
bool skip;
long time_left;
+ unsigned int q;
/* mac80211 doesn't care if we really xmit queued frames or not
* we'll collect those frames either way if we stop/delete vdevs
@@ -8072,10 +8074,14 @@ void ath10k_mac_wait_tx_complete(struct ath10k *ar)
return;
time_left = wait_event_timeout(ar->htt.empty_tx_wq, ({
- bool empty;
+ bool empty = true;
spin_lock_bh(&ar->htt.tx_lock);
- empty = (ar->htt.num_pending_tx == 0);
+ for_each_set_bit(q, &queues, ar->hw->queues) {
+ empty = (ar->htt.num_pending_per_queue[q] == 0);
+ if (!empty)
+ break;
+ }
spin_unlock_bh(&ar->htt.tx_lock);
skip = (ar->state == ATH10K_STATE_WEDGED) ||
@@ -8090,6 +8096,13 @@ void ath10k_mac_wait_tx_complete(struct ath10k *ar)
skip, ar->state, time_left);
}
+void ath10k_mac_wait_tx_complete(struct ath10k *ar)
+{
+ unsigned int queues = GENMASK(ar->hw->queues - 1, 0);
+
+ _ath10k_mac_wait_tx_complete(ar, queues);
+}
+
static void ath10k_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
u32 queues, bool drop)
{
@@ -8111,7 +8124,7 @@ static void ath10k_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
}
mutex_lock(&ar->conf_mutex);
- ath10k_mac_wait_tx_complete(ar);
+ _ath10k_mac_wait_tx_complete(ar, queues);
mutex_unlock(&ar->conf_mutex);
}
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index 5b6750ef7d19..b848962fc8fb 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -82,7 +82,7 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
flags = skb_cb->flags;
ath10k_htt_tx_free_msdu_id(htt, tx_done->msdu_id);
- ath10k_htt_tx_dec_pending(htt);
+ ath10k_htt_tx_dec_pending(htt, txq);
spin_unlock_bh(&htt->tx_lock);
rcu_read_lock();
--
2.40.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism
2024-11-22 16:48 [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism Remi Pommarel
2024-11-22 16:48 ` [RESEND PATCH v3 1/2] wifi: ath10k: Implement ieee80211 flush_sta callback Remi Pommarel
2024-11-22 16:48 ` [RESEND PATCH v3 2/2] wifi: ath10k: Flush only requested txq in ath10k_flush() Remi Pommarel
@ 2024-11-26 12:57 ` James Prestwood
2024-11-26 12:59 ` James Prestwood
2024-11-29 16:31 ` Remi Pommarel
2025-01-09 13:03 ` Kalle Valo
3 siblings, 2 replies; 10+ messages in thread
From: James Prestwood @ 2024-11-26 12:57 UTC (permalink / raw)
To: Remi Pommarel, ath10k, linux-wireless, linux-kernel
Cc: Kalle Valo, Jeff Johnson, Cedric Veilleux,
Vasanthakumar Thiagarajan
Hi Remi,
On 11/22/24 8:48 AM, Remi Pommarel wrote:
> It has been reported [0] that a 3-4 seconds (actually up to 5 sec) of
> radio silence could be observed followed by the error below on ath10k
> devices:
>
> ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
>
> This is due to how the TX queues are flushed in ath10k. When a STA is
> removed, mac80211 need to flush queues [1], but because ath10k does not
> have a lightweight .flush_sta operation, ieee80211_flush_queues() is
> called instead effectively blocking the whole queue during the drain
> causing this radio silence. Also because ath10k_flush() waits for all
> queued to be emptied, not only the flushed ones it could more easily
> take up to 5 seconds to finish making the whole situation worst.
>
> The first patch of this series adds a .flush_sta operation to flush only
> specific STA traffic avoiding the need to stop whole queues and should
> be enough in itself to fix the reported issue.
>
> The second patch of this series is a proposal to improve ath10k_flush so
> that it will be less likely to timeout waiting for non related queues to
> drain.
>
> The abose kernel warning could still be observed (e.g. flushing a dead
> STA) but should be now harmless.
>
> [0]: https://lore.kernel.org/all/CA+Xfe4FjUmzM5mvPxGbpJsF3SvSdE5_wgxvgFJ0bsdrKODVXCQ@mail.gmail.com/
> [1]: commit 0b75a1b1e42e ("wifi: mac80211: flush queues on STA removal")
I saw in the original report that it indicated it was only for AP mode
but after seeing this and checking some of our clients I saw that this
is also happening in station mode too. I only have clients on 6.2 and
6.8. I can confirm its not occurring on 6.2, but is on 6.8. I also tried
your set of patches but did not notice any behavior difference with or
without them. When it happens, its always just after a roam scan, ~4
seconds go by and we get the failure followed by a "Connection to AP
<mac> lost". Oddly the MAC address is all zeros.
Nov 25 09:09:50 iwd[16256]: src/station.c:station_start_roam() Using
cached neighbor report for roam
Nov 25 09:09:54 kernel: ath10k_pci 0000:02:00.0: failed to flush
transmit queue (skip 0 ar-state 1): 0
Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME
notification Del Station(20)
Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_link_notify() event 16
on ifindex 7
Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME
notification Deauthenticate(39)
Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_deauthenticate_event()
Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME
notification Disconnect(48)
Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_disconnect_event()
Nov 25 09:09:54 iwd[16256]: Received Deauthentication event, reason: 4,
from_ap: false
Nov 25 09:09:54 kernel: wlan0: Connection to AP 00:00:00:00:00:00 lost
Other times, the above logs are preceded by this:
Nov 26 00:25:25 kernel: ath10k_pci 0000:02:00.0: failed to flush sta txq
(sta ca:55:b8:7a:91:4b skip 0 ar-state 1): 0
Note, the above logs are with your patches applied. Maybe this is a
separate issue? Or do you think its related?
Thanks,
James
>
> V3:
> - Initialize empty to true to fix smatch error
>
> V2:
> - Add Closes tag
> - Use atomic instead of spinlock for per sta pending frame counter
> - Call ath10k_htt_tx_sta_dec_pending within rcu
> - Rename pending_per_queue[] to num_pending_per_queue[]
>
> Remi Pommarel (2):
> wifi: ath10k: Implement ieee80211 flush_sta callback
> wifi: ath10k: Flush only requested txq in ath10k_flush()
>
> drivers/net/wireless/ath/ath10k/core.h | 2 +
> drivers/net/wireless/ath/ath10k/htt.h | 11 +++-
> drivers/net/wireless/ath/ath10k/htt_tx.c | 49 +++++++++++++++-
> drivers/net/wireless/ath/ath10k/mac.c | 75 ++++++++++++++++++++----
> drivers/net/wireless/ath/ath10k/txrx.c | 11 ++--
> 5 files changed, 127 insertions(+), 21 deletions(-)
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism
2024-11-26 12:57 ` [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism James Prestwood
@ 2024-11-26 12:59 ` James Prestwood
2024-11-29 16:31 ` Remi Pommarel
1 sibling, 0 replies; 10+ messages in thread
From: James Prestwood @ 2024-11-26 12:59 UTC (permalink / raw)
To: Remi Pommarel, ath10k, linux-wireless, linux-kernel
Cc: Kalle Valo, Jeff Johnson, Cedric Veilleux,
Vasanthakumar Thiagarajan
On 11/26/24 4:57 AM, James Prestwood wrote:
> Hi Remi,
>
> On 11/22/24 8:48 AM, Remi Pommarel wrote:
>> It has been reported [0] that a 3-4 seconds (actually up to 5 sec) of
>> radio silence could be observed followed by the error below on ath10k
>> devices:
>>
>> ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0
>> ar-state 1): 0
>>
>> This is due to how the TX queues are flushed in ath10k. When a STA is
>> removed, mac80211 need to flush queues [1], but because ath10k does not
>> have a lightweight .flush_sta operation, ieee80211_flush_queues() is
>> called instead effectively blocking the whole queue during the drain
>> causing this radio silence. Also because ath10k_flush() waits for all
>> queued to be emptied, not only the flushed ones it could more easily
>> take up to 5 seconds to finish making the whole situation worst.
>>
>> The first patch of this series adds a .flush_sta operation to flush only
>> specific STA traffic avoiding the need to stop whole queues and should
>> be enough in itself to fix the reported issue.
>>
>> The second patch of this series is a proposal to improve ath10k_flush so
>> that it will be less likely to timeout waiting for non related queues to
>> drain.
>>
>> The abose kernel warning could still be observed (e.g. flushing a dead
>> STA) but should be now harmless.
>>
>> [0]:
>> https://lore.kernel.org/all/CA+Xfe4FjUmzM5mvPxGbpJsF3SvSdE5_wgxvgFJ0bsdrKODVXCQ@mail.gmail.com/
>> [1]: commit 0b75a1b1e42e ("wifi: mac80211: flush queues on STA removal")
>
> I saw in the original report that it indicated it was only for AP mode
> but after seeing this and checking some of our clients I saw that this
> is also happening in station mode too. I only have clients on 6.2 and
> 6.8. I can confirm its not occurring on 6.2, but is on 6.8. I also
> tried your set of patches but did not notice any behavior difference
> with or without them. When it happens, its always just after a roam
> scan, ~4 seconds go by and we get the failure followed by a
> "Connection to AP <mac> lost". Oddly the MAC address is all zeros.
>
> Nov 25 09:09:50 iwd[16256]: src/station.c:station_start_roam() Using
> cached neighbor report for roam
> Nov 25 09:09:54 kernel: ath10k_pci 0000:02:00.0: failed to flush
> transmit queue (skip 0 ar-state 1): 0
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME
> notification Del Station(20)
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_link_notify() event 16
> on ifindex 7
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME
> notification Deauthenticate(39)
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_deauthenticate_event()
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME
> notification Disconnect(48)
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_disconnect_event()
> Nov 25 09:09:54 iwd[16256]: Received Deauthentication event, reason:
> 4, from_ap: false
> Nov 25 09:09:54 kernel: wlan0: Connection to AP 00:00:00:00:00:00 lost
>
> Other times, the above logs are preceded by this:
>
> Nov 26 00:25:25 kernel: ath10k_pci 0000:02:00.0: failed to flush sta
> txq (sta ca:55:b8:7a:91:4b skip 0 ar-state 1): 0
>
> Note, the above logs are with your patches applied. Maybe this is a
> separate issue? Or do you think its related?
Forgot to mention, this is on the QCA6174 hw 3.2
firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp
crc32 bf907c7c
>
> Thanks,
>
> James
>
>>
>> V3:
>> - Initialize empty to true to fix smatch error
>>
>> V2:
>> - Add Closes tag
>> - Use atomic instead of spinlock for per sta pending frame counter
>> - Call ath10k_htt_tx_sta_dec_pending within rcu
>> - Rename pending_per_queue[] to num_pending_per_queue[]
>>
>> Remi Pommarel (2):
>> wifi: ath10k: Implement ieee80211 flush_sta callback
>> wifi: ath10k: Flush only requested txq in ath10k_flush()
>>
>> drivers/net/wireless/ath/ath10k/core.h | 2 +
>> drivers/net/wireless/ath/ath10k/htt.h | 11 +++-
>> drivers/net/wireless/ath/ath10k/htt_tx.c | 49 +++++++++++++++-
>> drivers/net/wireless/ath/ath10k/mac.c | 75 ++++++++++++++++++++----
>> drivers/net/wireless/ath/ath10k/txrx.c | 11 ++--
>> 5 files changed, 127 insertions(+), 21 deletions(-)
>>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism
2024-11-26 12:57 ` [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism James Prestwood
2024-11-26 12:59 ` James Prestwood
@ 2024-11-29 16:31 ` Remi Pommarel
2024-12-02 15:25 ` James Prestwood
1 sibling, 1 reply; 10+ messages in thread
From: Remi Pommarel @ 2024-11-29 16:31 UTC (permalink / raw)
To: James Prestwood
Cc: ath10k, linux-wireless, linux-kernel, Kalle Valo, Jeff Johnson,
Cedric Veilleux, Vasanthakumar Thiagarajan
Hi James,
On Tue, Nov 26, 2024 at 04:57:36AM -0800, James Prestwood wrote:
> Hi Remi,
>
> On 11/22/24 8:48 AM, Remi Pommarel wrote:
> > It has been reported [0] that a 3-4 seconds (actually up to 5 sec) of
> > radio silence could be observed followed by the error below on ath10k
> > devices:
> >
> > ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
> >
> > This is due to how the TX queues are flushed in ath10k. When a STA is
> > removed, mac80211 need to flush queues [1], but because ath10k does not
> > have a lightweight .flush_sta operation, ieee80211_flush_queues() is
> > called instead effectively blocking the whole queue during the drain
> > causing this radio silence. Also because ath10k_flush() waits for all
> > queued to be emptied, not only the flushed ones it could more easily
> > take up to 5 seconds to finish making the whole situation worst.
> >
> > The first patch of this series adds a .flush_sta operation to flush only
> > specific STA traffic avoiding the need to stop whole queues and should
> > be enough in itself to fix the reported issue.
> >
> > The second patch of this series is a proposal to improve ath10k_flush so
> > that it will be less likely to timeout waiting for non related queues to
> > drain.
> >
> > The abose kernel warning could still be observed (e.g. flushing a dead
> > STA) but should be now harmless.
> >
> > [0]: https://lore.kernel.org/all/CA+Xfe4FjUmzM5mvPxGbpJsF3SvSdE5_wgxvgFJ0bsdrKODVXCQ@mail.gmail.com/
> > [1]: commit 0b75a1b1e42e ("wifi: mac80211: flush queues on STA removal")
>
> I saw in the original report that it indicated it was only for AP mode but
> after seeing this and checking some of our clients I saw that this is also
> happening in station mode too. I only have clients on 6.2 and 6.8. I can
> confirm its not occurring on 6.2, but is on 6.8. I also tried your set of
> patches but did not notice any behavior difference with or without them.
> When it happens, its always just after a roam scan, ~4 seconds go by and we
> get the failure followed by a "Connection to AP <mac> lost". Oddly the MAC
> address is all zeros.
>
> Nov 25 09:09:50 iwd[16256]: src/station.c:station_start_roam() Using cached
> neighbor report for roam
> Nov 25 09:09:54 kernel: ath10k_pci 0000:02:00.0: failed to flush transmit
> queue (skip 0 ar-state 1): 0
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME
> notification Del Station(20)
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_link_notify() event 16 on
> ifindex 7
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME
> notification Deauthenticate(39)
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_deauthenticate_event()
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME
> notification Disconnect(48)
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_disconnect_event()
> Nov 25 09:09:54 iwd[16256]: Received Deauthentication event, reason: 4,
> from_ap: false
> Nov 25 09:09:54 kernel: wlan0: Connection to AP 00:00:00:00:00:00 lost
>
> Other times, the above logs are preceded by this:
>
> Nov 26 00:25:25 kernel: ath10k_pci 0000:02:00.0: failed to flush sta txq
> (sta ca:55:b8:7a:91:4b skip 0 ar-state 1): 0
>
> Note, the above logs are with your patches applied. Maybe this is a separate
> issue? Or do you think its related?
Thanks fot the test. Yes this patchset is here only to fix the issue for
AP (this caused AP to stall all traffic for every STA connected to it).
So while this issue is interesting it is not addressed by this patchset.
Out of curiosity I tried to reproduce it currently trying to roam an
ath10k sta back and forth two APs (same SSID/psk, different channels)
and wasn't able to reproduce with wpa_supplicant, didn't try with iwd
though. Or maybe the AP the sta is roaming away from has stopped
responding, in that case I don't know what can be done here as it does
not seem we want to drop pending frames (as we would prefer to deauth
cleanly from AP in main case).
In any case still I think this is a separate issue and it is also way
less critical than the AP one (one STA can create ~4sec DOS to the
entire BSS vs a STA took more time to roam away if AP crashed).
Thanks,
--
Remi
>
> Thanks,
>
> James
>
> >
> > V3:
> > - Initialize empty to true to fix smatch error
> >
> > V2:
> > - Add Closes tag
> > - Use atomic instead of spinlock for per sta pending frame counter
> > - Call ath10k_htt_tx_sta_dec_pending within rcu
> > - Rename pending_per_queue[] to num_pending_per_queue[]
> >
> > Remi Pommarel (2):
> > wifi: ath10k: Implement ieee80211 flush_sta callback
> > wifi: ath10k: Flush only requested txq in ath10k_flush()
> >
> > drivers/net/wireless/ath/ath10k/core.h | 2 +
> > drivers/net/wireless/ath/ath10k/htt.h | 11 +++-
> > drivers/net/wireless/ath/ath10k/htt_tx.c | 49 +++++++++++++++-
> > drivers/net/wireless/ath/ath10k/mac.c | 75 ++++++++++++++++++++----
> > drivers/net/wireless/ath/ath10k/txrx.c | 11 ++--
> > 5 files changed, 127 insertions(+), 21 deletions(-)
> >
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism
2024-11-29 16:31 ` Remi Pommarel
@ 2024-12-02 15:25 ` James Prestwood
0 siblings, 0 replies; 10+ messages in thread
From: James Prestwood @ 2024-12-02 15:25 UTC (permalink / raw)
To: Remi Pommarel
Cc: ath10k, linux-wireless, linux-kernel, Kalle Valo, Jeff Johnson,
Cedric Veilleux, Vasanthakumar Thiagarajan
Hi Remi,
On 11/29/24 8:31 AM, Remi Pommarel wrote:
> Hi James,
>
> On Tue, Nov 26, 2024 at 04:57:36AM -0800, James Prestwood wrote:
>> Hi Remi,
>>
>> On 11/22/24 8:48 AM, Remi Pommarel wrote:
>>> It has been reported [0] that a 3-4 seconds (actually up to 5 sec) of
>>> radio silence could be observed followed by the error below on ath10k
>>> devices:
>>>
>>> ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
>>>
>>> This is due to how the TX queues are flushed in ath10k. When a STA is
>>> removed, mac80211 need to flush queues [1], but because ath10k does not
>>> have a lightweight .flush_sta operation, ieee80211_flush_queues() is
>>> called instead effectively blocking the whole queue during the drain
>>> causing this radio silence. Also because ath10k_flush() waits for all
>>> queued to be emptied, not only the flushed ones it could more easily
>>> take up to 5 seconds to finish making the whole situation worst.
>>>
>>> The first patch of this series adds a .flush_sta operation to flush only
>>> specific STA traffic avoiding the need to stop whole queues and should
>>> be enough in itself to fix the reported issue.
>>>
>>> The second patch of this series is a proposal to improve ath10k_flush so
>>> that it will be less likely to timeout waiting for non related queues to
>>> drain.
>>>
>>> The abose kernel warning could still be observed (e.g. flushing a dead
>>> STA) but should be now harmless.
>>>
>>> [0]: https://lore.kernel.org/all/CA+Xfe4FjUmzM5mvPxGbpJsF3SvSdE5_wgxvgFJ0bsdrKODVXCQ@mail.gmail.com/
>>> [1]: commit 0b75a1b1e42e ("wifi: mac80211: flush queues on STA removal")
>> I saw in the original report that it indicated it was only for AP mode but
>> after seeing this and checking some of our clients I saw that this is also
>> happening in station mode too. I only have clients on 6.2 and 6.8. I can
>> confirm its not occurring on 6.2, but is on 6.8. I also tried your set of
>> patches but did not notice any behavior difference with or without them.
>> When it happens, its always just after a roam scan, ~4 seconds go by and we
>> get the failure followed by a "Connection to AP <mac> lost". Oddly the MAC
>> address is all zeros.
>>
>> Nov 25 09:09:50 iwd[16256]: src/station.c:station_start_roam() Using cached
>> neighbor report for roam
>> Nov 25 09:09:54 kernel: ath10k_pci 0000:02:00.0: failed to flush transmit
>> queue (skip 0 ar-state 1): 0
>> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME
>> notification Del Station(20)
>> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_link_notify() event 16 on
>> ifindex 7
>> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME
>> notification Deauthenticate(39)
>> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_deauthenticate_event()
>> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME
>> notification Disconnect(48)
>> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_disconnect_event()
>> Nov 25 09:09:54 iwd[16256]: Received Deauthentication event, reason: 4,
>> from_ap: false
>> Nov 25 09:09:54 kernel: wlan0: Connection to AP 00:00:00:00:00:00 lost
>>
>> Other times, the above logs are preceded by this:
>>
>> Nov 26 00:25:25 kernel: ath10k_pci 0000:02:00.0: failed to flush sta txq
>> (sta ca:55:b8:7a:91:4b skip 0 ar-state 1): 0
>>
>> Note, the above logs are with your patches applied. Maybe this is a separate
>> issue? Or do you think its related?
> Thanks fot the test. Yes this patchset is here only to fix the issue for
> AP (this caused AP to stall all traffic for every STA connected to it).
> So while this issue is interesting it is not addressed by this patchset.
Thanks for the clarification.
>
> Out of curiosity I tried to reproduce it currently trying to roam an
> ath10k sta back and forth two APs (same SSID/psk, different channels)
> and wasn't able to reproduce with wpa_supplicant, didn't try with iwd
> though. Or maybe the AP the sta is roaming away from has stopped
> responding, in that case I don't know what can be done here as it does
> not seem we want to drop pending frames (as we would prefer to deauth
> cleanly from AP in main case).
We have quite a lot of clients on ath10k and the issue is rare(ish). But
you may be right and its spurred from the AP not responding. I need to
dig in more to see if there is anything to be done on the client side, I
just figured implementing the flush queue op would apply to both station
and AP mode.
>
> In any case still I think this is a separate issue and it is also way
> less critical than the AP one (one STA can create ~4sec DOS to the
> entire BSS vs a STA took more time to roam away if AP crashed).
So for my companies use case a 4 second DOS to an individual BSS can be
potentially bad. This doesn't really differ from an outright disconnect
but I'm still trying to limit any lapse in connectivity if at all
possible. If I can gather more info I'll report back.
Thanks,
James
>
> Thanks,
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism
2024-11-22 16:48 [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism Remi Pommarel
` (2 preceding siblings ...)
2024-11-26 12:57 ` [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism James Prestwood
@ 2025-01-09 13:03 ` Kalle Valo
2025-01-09 13:56 ` Remi Pommarel
3 siblings, 1 reply; 10+ messages in thread
From: Kalle Valo @ 2025-01-09 13:03 UTC (permalink / raw)
To: Remi Pommarel
Cc: ath10k, linux-wireless, linux-kernel, Jeff Johnson,
Cedric Veilleux, Vasanthakumar Thiagarajan
Remi Pommarel <repk@triplefau.lt> writes:
> It has been reported [0] that a 3-4 seconds (actually up to 5 sec) of
> radio silence could be observed followed by the error below on ath10k
> devices:
>
> ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
>
> This is due to how the TX queues are flushed in ath10k. When a STA is
> removed, mac80211 need to flush queues [1], but because ath10k does not
> have a lightweight .flush_sta operation, ieee80211_flush_queues() is
> called instead effectively blocking the whole queue during the drain
> causing this radio silence. Also because ath10k_flush() waits for all
> queued to be emptied, not only the flushed ones it could more easily
> take up to 5 seconds to finish making the whole situation worst.
>
> The first patch of this series adds a .flush_sta operation to flush only
> specific STA traffic avoiding the need to stop whole queues and should
> be enough in itself to fix the reported issue.
>
> The second patch of this series is a proposal to improve ath10k_flush so
> that it will be less likely to timeout waiting for non related queues to
> drain.
>
> The abose kernel warning could still be observed (e.g. flushing a dead
> STA) but should be now harmless.
>
> [0]: https://lore.kernel.org/all/CA+Xfe4FjUmzM5mvPxGbpJsF3SvSdE5_wgxvgFJ0bsdrKODVXCQ@mail.gmail.com/
> [1]: commit 0b75a1b1e42e ("wifi: mac80211: flush queues on STA removal")
On what hardware and firmware did you test this? As they can behave very
differently knowing that is really important.
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism
2025-01-09 13:03 ` Kalle Valo
@ 2025-01-09 13:56 ` Remi Pommarel
0 siblings, 0 replies; 10+ messages in thread
From: Remi Pommarel @ 2025-01-09 13:56 UTC (permalink / raw)
To: Kalle Valo
Cc: ath10k, linux-wireless, linux-kernel, Jeff Johnson,
Cedric Veilleux, Vasanthakumar Thiagarajan
On Thu, Jan 09, 2025 at 03:03:44PM +0200, Kalle Valo wrote:
> Remi Pommarel <repk@triplefau.lt> writes:
>
> > It has been reported [0] that a 3-4 seconds (actually up to 5 sec) of
> > radio silence could be observed followed by the error below on ath10k
> > devices:
> >
> > ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
> >
> > This is due to how the TX queues are flushed in ath10k. When a STA is
> > removed, mac80211 need to flush queues [1], but because ath10k does not
> > have a lightweight .flush_sta operation, ieee80211_flush_queues() is
> > called instead effectively blocking the whole queue during the drain
> > causing this radio silence. Also because ath10k_flush() waits for all
> > queued to be emptied, not only the flushed ones it could more easily
> > take up to 5 seconds to finish making the whole situation worst.
> >
> > The first patch of this series adds a .flush_sta operation to flush only
> > specific STA traffic avoiding the need to stop whole queues and should
> > be enough in itself to fix the reported issue.
> >
> > The second patch of this series is a proposal to improve ath10k_flush so
> > that it will be less likely to timeout waiting for non related queues to
> > drain.
> >
> > The abose kernel warning could still be observed (e.g. flushing a dead
> > STA) but should be now harmless.
> >
> > [0]: https://lore.kernel.org/all/CA+Xfe4FjUmzM5mvPxGbpJsF3SvSdE5_wgxvgFJ0bsdrKODVXCQ@mail.gmail.com/
> > [1]: commit 0b75a1b1e42e ("wifi: mac80211: flush queues on STA removal")
>
> On what hardware and firmware did you test this? As they can behave very
> differently knowing that is really important.
This was tested on QCA9888 hw 2.0 10.4-3.10-00076.
Not sure to see how a different HW/FW could have prevented that though.
--
Remi
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v3 2/2] wifi: ath10k: Flush only requested txq in ath10k_flush()
2024-11-22 16:48 ` [RESEND PATCH v3 2/2] wifi: ath10k: Flush only requested txq in ath10k_flush() Remi Pommarel
@ 2025-07-25 21:57 ` Maurer, Florian
0 siblings, 0 replies; 10+ messages in thread
From: Maurer, Florian @ 2025-07-25 21:57 UTC (permalink / raw)
To: linux-wireless@vger.kernel.org, repk@triplefau.lt,
ath10k@lists.infradead.org, linux-kernel@vger.kernel.org
Cc: kvalo@kernel.org, veilleux.cedric@gmail.com, jjohnson@kernel.org,
quic_vthiagar@quicinc.com
Hi everyone,
we tested this Patch in Freifunk Aachen using Gluon/OpenWrt.
We had the issue that the Extreme Networks WS-AP3825i did have a high
load in a busy environment, making the node unresponsive.
This issue is resolved with this patch, improving the wifi stability
noticeably.
Thanks,
Florian Maurer
Tested-by: Florian Maurer <maurer@fh-aachen.de>
On Fri, 2024-11-22 at 17:48 +0100, Remi Pommarel wrote:
> The ieee80211 flush callback can be called to flush only part of all hw
> queues. The ath10k's flush callback implementation (i.e. ath10k_flush())
> was waiting for all pending frames of all queues to be flushed ignoring
> the queue parameter. Because only the queues to be flushed are stopped
> by mac80211, skb can still be queued to other queues meanwhile. Thus
> ath10k_flush() could fail (and wait 5sec holding ar->conf lock) even if
> the requested queues are flushed correctly.
>
> A way to reproduce the issue is to use two different APs because
> each vdev has its own hw queue in ath10k. Connect STA0 to AP0 and STA1
> to AP1. Then generate traffic from AP0 to STA0 and kill STA0 without
> clean disassociation frame (e.g. unplug power cable, reboot -f, ...).
> Now if we were to flush AP1's queue, ath10k_flush() would fail (and
> effectively block 5 seconds with ar->conf or even wiphy's lock held)
> with the following warning:
>
> ath10k_pci 0000:01:00.0: failed to flush transmit queue (skip 0 ar-state 2): 0
>
> Wait only for pending frames of the requested queues to be flushed in
> ath10k_flush() to avoid that long blocking.
>
> Reported-by: Cedric Veilleux <veilleux.cedric@gmail.com>
> Closes: https://lore.kernel.org/all/CA+Xfe4FjUmzM5mvPxGbpJsF3SvSdE5_wgxvgFJ0bsdrKODVXCQ@mail.gmail.com/
> Signed-off-by: Remi Pommarel <repk@triplefau.lt>
> ---
> drivers/net/wireless/ath/ath10k/htt.h | 7 +++--
> drivers/net/wireless/ath/ath10k/htt_tx.c | 18 ++++++++++--
> drivers/net/wireless/ath/ath10k/mac.c | 35 ++++++++++++++++--------
> drivers/net/wireless/ath/ath10k/txrx.c | 2 +-
> 4 files changed, 45 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
> index d150f9330941..ca8bf3b6766d 100644
> --- a/drivers/net/wireless/ath/ath10k/htt.h
> +++ b/drivers/net/wireless/ath/ath10k/htt.h
> @@ -1870,6 +1870,7 @@ struct ath10k_htt {
> spinlock_t tx_lock;
> int max_num_pending_tx;
> int num_pending_tx;
> + int num_pending_per_queue[IEEE80211_MAX_QUEUES];
> int num_pending_mgmt_tx;
> struct idr pending_tx;
> wait_queue_head_t empty_tx_wq;
> @@ -2447,8 +2448,10 @@ void ath10k_htt_tx_txq_update(struct ieee80211_hw *hw,
> void ath10k_htt_tx_txq_recalc(struct ieee80211_hw *hw,
> struct ieee80211_txq *txq);
> void ath10k_htt_tx_txq_sync(struct ath10k *ar);
> -void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt);
> -int ath10k_htt_tx_inc_pending(struct ath10k_htt *htt);
> +void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt,
> + struct ieee80211_txq *txq);
> +int ath10k_htt_tx_inc_pending(struct ath10k_htt *htt,
> + struct ieee80211_txq *txq);
> void ath10k_htt_tx_mgmt_dec_pending(struct ath10k_htt *htt);
> int ath10k_htt_tx_mgmt_inc_pending(struct ath10k_htt *htt, bool is_mgmt,
> bool is_presp);
> diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
> index 211752bd0f65..ef5a992e8cce 100644
> --- a/drivers/net/wireless/ath/ath10k/htt_tx.c
> +++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
> @@ -140,19 +140,26 @@ void ath10k_htt_tx_txq_update(struct ieee80211_hw *hw,
> spin_unlock_bh(&ar->htt.tx_lock);
> }
>
> -void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt)
> +void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt,
> + struct ieee80211_txq *txq)
> {
> + int qnr = -1;
> +
> lockdep_assert_held(&htt->tx_lock);
>
> htt->num_pending_tx--;
> if (htt->num_pending_tx == htt->max_num_pending_tx - 1)
> ath10k_mac_tx_unlock(htt->ar, ATH10K_TX_PAUSE_Q_FULL);
>
> - if (htt->num_pending_tx == 0)
> + if (txq)
> + qnr = --htt->num_pending_per_queue[txq->vif->hw_queue[txq->ac]];
> +
> + if (htt->num_pending_tx == 0 || qnr == 0)
> wake_up(&htt->empty_tx_wq);
> }
>
> -int ath10k_htt_tx_inc_pending(struct ath10k_htt *htt)
> +int ath10k_htt_tx_inc_pending(struct ath10k_htt *htt,
> + struct ieee80211_txq *txq)
> {
> lockdep_assert_held(&htt->tx_lock);
>
> @@ -163,6 +170,11 @@ int ath10k_htt_tx_inc_pending(struct ath10k_htt *htt)
> if (htt->num_pending_tx == htt->max_num_pending_tx)
> ath10k_mac_tx_lock(htt->ar, ATH10K_TX_PAUSE_Q_FULL);
>
> + if (!txq)
> + return 0;
> +
> + htt->num_pending_per_queue[txq->vif->hw_queue[txq->ac]]++;
> +
> return 0;
> }
>
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> index 8b8d4f24e5fb..7c5f95f2858f 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -4385,7 +4385,7 @@ int ath10k_mac_tx_push_txq(struct ieee80211_hw *hw,
> u16 airtime;
>
> spin_lock_bh(&ar->htt.tx_lock);
> - ret = ath10k_htt_tx_inc_pending(htt);
> + ret = ath10k_htt_tx_inc_pending(htt, txq);
> spin_unlock_bh(&ar->htt.tx_lock);
>
> if (ret)
> @@ -4394,7 +4394,7 @@ int ath10k_mac_tx_push_txq(struct ieee80211_hw *hw,
> skb = ieee80211_tx_dequeue_ni(hw, txq);
> if (!skb) {
> spin_lock_bh(&ar->htt.tx_lock);
> - ath10k_htt_tx_dec_pending(htt);
> + ath10k_htt_tx_dec_pending(htt, txq);
> spin_unlock_bh(&ar->htt.tx_lock);
>
> return -ENOENT;
> @@ -4416,7 +4416,7 @@ int ath10k_mac_tx_push_txq(struct ieee80211_hw *hw,
> ret = ath10k_htt_tx_mgmt_inc_pending(htt, is_mgmt, is_presp);
>
> if (ret) {
> - ath10k_htt_tx_dec_pending(htt);
> + ath10k_htt_tx_dec_pending(htt, txq);
> spin_unlock_bh(&ar->htt.tx_lock);
> return ret;
> }
> @@ -4430,7 +4430,7 @@ int ath10k_mac_tx_push_txq(struct ieee80211_hw *hw,
> ath10k_warn(ar, "failed to push frame: %d\n", ret);
>
> spin_lock_bh(&ar->htt.tx_lock);
> - ath10k_htt_tx_dec_pending(htt);
> + ath10k_htt_tx_dec_pending(htt, txq);
> if (is_mgmt)
> ath10k_htt_tx_mgmt_dec_pending(htt);
> spin_unlock_bh(&ar->htt.tx_lock);
> @@ -4693,7 +4693,7 @@ static void ath10k_mac_op_tx(struct ieee80211_hw *hw,
> is_presp = ieee80211_is_probe_resp(hdr->frame_control);
> }
>
> - ret = ath10k_htt_tx_inc_pending(htt);
> + ret = ath10k_htt_tx_inc_pending(htt, txq);
> if (ret) {
> ath10k_warn(ar, "failed to increase tx pending count: %d, dropping\n",
> ret);
> @@ -4706,7 +4706,7 @@ static void ath10k_mac_op_tx(struct ieee80211_hw *hw,
> if (ret) {
> ath10k_dbg(ar, ATH10K_DBG_MAC, "failed to increase tx mgmt pending count: %d, dropping\n",
> ret);
> - ath10k_htt_tx_dec_pending(htt);
> + ath10k_htt_tx_dec_pending(htt, txq);
> spin_unlock_bh(&ar->htt.tx_lock);
> ieee80211_free_txskb(ar->hw, skb);
> return;
> @@ -4719,7 +4719,7 @@ static void ath10k_mac_op_tx(struct ieee80211_hw *hw,
> ath10k_warn(ar, "failed to transmit frame: %d\n", ret);
> if (is_htt) {
> spin_lock_bh(&ar->htt.tx_lock);
> - ath10k_htt_tx_dec_pending(htt);
> + ath10k_htt_tx_dec_pending(htt, txq);
> if (is_mgmt)
> ath10k_htt_tx_mgmt_dec_pending(htt);
> spin_unlock_bh(&ar->htt.tx_lock);
> @@ -8059,10 +8059,12 @@ static int ath10k_mac_op_set_frag_threshold(struct ieee80211_hw *hw, u32 value)
> return -EOPNOTSUPP;
> }
>
> -void ath10k_mac_wait_tx_complete(struct ath10k *ar)
> +static void _ath10k_mac_wait_tx_complete(struct ath10k *ar,
> + unsigned long queues)
> {
> bool skip;
> long time_left;
> + unsigned int q;
>
> /* mac80211 doesn't care if we really xmit queued frames or not
> * we'll collect those frames either way if we stop/delete vdevs
> @@ -8072,10 +8074,14 @@ void ath10k_mac_wait_tx_complete(struct ath10k *ar)
> return;
>
> time_left = wait_event_timeout(ar->htt.empty_tx_wq, ({
> - bool empty;
> + bool empty = true;
>
> spin_lock_bh(&ar->htt.tx_lock);
> - empty = (ar->htt.num_pending_tx == 0);
> + for_each_set_bit(q, &queues, ar->hw->queues) {
> + empty = (ar->htt.num_pending_per_queue[q] == 0);
> + if (!empty)
> + break;
> + }
> spin_unlock_bh(&ar->htt.tx_lock);
>
> skip = (ar->state == ATH10K_STATE_WEDGED) ||
> @@ -8090,6 +8096,13 @@ void ath10k_mac_wait_tx_complete(struct ath10k *ar)
> skip, ar->state, time_left);
> }
>
> +void ath10k_mac_wait_tx_complete(struct ath10k *ar)
> +{
> + unsigned int queues = GENMASK(ar->hw->queues - 1, 0);
> +
> + _ath10k_mac_wait_tx_complete(ar, queues);
> +}
> +
> static void ath10k_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
> u32 queues, bool drop)
> {
> @@ -8111,7 +8124,7 @@ static void ath10k_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
> }
>
> mutex_lock(&ar->conf_mutex);
> - ath10k_mac_wait_tx_complete(ar);
> + _ath10k_mac_wait_tx_complete(ar, queues);
> mutex_unlock(&ar->conf_mutex);
> }
>
> diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
> index 5b6750ef7d19..b848962fc8fb 100644
> --- a/drivers/net/wireless/ath/ath10k/txrx.c
> +++ b/drivers/net/wireless/ath/ath10k/txrx.c
> @@ -82,7 +82,7 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
>
> flags = skb_cb->flags;
> ath10k_htt_tx_free_msdu_id(htt, tx_done->msdu_id);
> - ath10k_htt_tx_dec_pending(htt);
> + ath10k_htt_tx_dec_pending(htt, txq);
> spin_unlock_bh(&htt->tx_lock);
>
> rcu_read_lock();
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-07-25 21:59 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-22 16:48 [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism Remi Pommarel
2024-11-22 16:48 ` [RESEND PATCH v3 1/2] wifi: ath10k: Implement ieee80211 flush_sta callback Remi Pommarel
2024-11-22 16:48 ` [RESEND PATCH v3 2/2] wifi: ath10k: Flush only requested txq in ath10k_flush() Remi Pommarel
2025-07-25 21:57 ` Maurer, Florian
2024-11-26 12:57 ` [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism James Prestwood
2024-11-26 12:59 ` James Prestwood
2024-11-29 16:31 ` Remi Pommarel
2024-12-02 15:25 ` James Prestwood
2025-01-09 13:03 ` Kalle Valo
2025-01-09 13:56 ` Remi Pommarel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).