public inbox for linux-kernel@vger.kernel.org
* [PATCH] ath11k: workaround firmware bug where peer_id=0
@ 2026-03-26 10:53 Matthew Leach
  2026-03-30  7:57 ` Matthew Leach
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Leach @ 2026-03-26 10:53 UTC
  To: Jeff Johnson; +Cc: linux-wireless, ath11k, linux-kernel, kernel, Matthew Leach

It has been observed that under certain conditions the ath11k firmware
sets peer_id=0 for received frames. For standard MPDUs this is fine, as
ath11k_dp_rx_h_find_peer() has a fallback case where it locates the peer
based upon the source MAC address.

However, on an aggregated link, reception of an A-MSDU results in the
peer not being resolved for the second (and any subsequent) sub-MSDUs.
This causes the encryption type of the frame to be set to an incorrect
value, resulting in the sub-MSDUs being dropped by ieee80211. Notice how
the flags differ in:

ath11k_pci 0000:03:00.0: data rx skb 000000002f4b704d len 1534 peer xx:xx:xx:xx:xx:xx 0 ucast sn 3063 he160 rate_idx 9 vht_nss 2 freq 5240 band 1 flag 0x40d1a fcs-err 0 mic-err 0 amsdu-more 0 peer_id 0 first_msdu 1 last_msdu 0
ath11k_pci 0000:03:00.0: data rx skb 0000000038acd580 len 1534 peer (null) 0 ucast sn 3063 he160 rate_idx 9 vht_nss 2 freq 5240 band 1 flag 0x40d00 fcs-err 0 mic-err 0 amsdu-more 0 peer_id 0 first_msdu 0 last_msdu 1

This patch caches the peer enctype during the MSDU processing loop,
caching it on the first AMSDU sub-frame (is_first_msdu=1 is_last_msdu=0)
and setting the correct enctype for any subsequent sub-MSDUs.

Signed-off-by: Matthew Leach <matthew.leach@collabora.com>
---
 drivers/net/wireless/ath/ath11k/dp_rx.c | 35 ++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath11k/dp_rx.c b/drivers/net/wireless/ath/ath11k/dp_rx.c
index 49d959b2e148..f5c2a8085a1b 100644
--- a/drivers/net/wireless/ath/ath11k/dp_rx.c
+++ b/drivers/net/wireless/ath/ath11k/dp_rx.c
@@ -21,6 +21,12 @@
 
 #define ATH11K_DP_RX_FRAGMENT_TIMEOUT_MS (2 * HZ)
 
+struct cached_peer_info {
+	enum hal_encrypt_type enctype;
+	u16 seq_no;
+	bool valid;
+};
+
 static inline
 u8 *ath11k_dp_rx_h_80211_hdr(struct ath11k_base *ab, struct hal_rx_desc *desc)
 {
@@ -2232,7 +2238,8 @@ ath11k_dp_rx_h_find_peer(struct ath11k_base *ab, struct sk_buff *msdu)
 static void ath11k_dp_rx_h_mpdu(struct ath11k *ar,
 				struct sk_buff *msdu,
 				struct hal_rx_desc *rx_desc,
-				struct ieee80211_rx_status *rx_status)
+				struct ieee80211_rx_status *rx_status,
+				struct cached_peer_info *peer_cache)
 {
 	bool  fill_crypto_hdr;
 	enum hal_encrypt_type enctype;
@@ -2265,6 +2272,21 @@ static void ath11k_dp_rx_h_mpdu(struct ath11k *ar,
 	}
 	spin_unlock_bh(&ar->ab->base_lock);
 
+	if (!rxcb->peer_id && rxcb->is_first_msdu && !rxcb->is_last_msdu) {
+		peer_cache->enctype = enctype;
+		peer_cache->seq_no = rxcb->seq_no;
+		peer_cache->valid = true;
+	}
+
+	if (!rxcb->peer_id && !rxcb->is_first_msdu && peer_cache->valid) {
+		if (rxcb->seq_no == peer_cache->seq_no)
+			enctype = peer_cache->enctype;
+		else
+			ath11k_dbg(ar->ab, ATH11K_DBG_DATA,
+				   "null peer_id workaround failed. cached seq_no=%d, msdu seq_no=%d",
+				   peer_cache->seq_no, rxcb->seq_no);
+	}
+
 	rx_attention = ath11k_dp_rx_get_attention(ar->ab, rx_desc);
 	err_bitmap = ath11k_dp_rx_h_attn_mpdu_err(rx_attention);
 	if (enctype != HAL_ENCRYPT_TYPE_OPEN && !err_bitmap)
@@ -2506,7 +2528,8 @@ static void ath11k_dp_rx_deliver_msdu(struct ath11k *ar, struct napi_struct *nap
 static int ath11k_dp_rx_process_msdu(struct ath11k *ar,
 				     struct sk_buff *msdu,
 				     struct sk_buff_head *msdu_list,
-				     struct ieee80211_rx_status *rx_status)
+				     struct ieee80211_rx_status *rx_status,
+				     struct cached_peer_info *peer_cache)
 {
 	struct ath11k_base *ab = ar->ab;
 	struct hal_rx_desc *rx_desc, *lrx_desc;
@@ -2574,7 +2597,7 @@ static int ath11k_dp_rx_process_msdu(struct ath11k *ar,
 	}
 
 	ath11k_dp_rx_h_ppdu(ar, rx_desc, rx_status);
-	ath11k_dp_rx_h_mpdu(ar, msdu, rx_desc, rx_status);
+	ath11k_dp_rx_h_mpdu(ar, msdu, rx_desc, rx_status, peer_cache);
 
 	rx_status->flag |= RX_FLAG_SKIP_MONITOR | RX_FLAG_DUP_VALIDATED;
 
@@ -2592,6 +2615,7 @@ static void ath11k_dp_rx_process_received_packets(struct ath11k_base *ab,
 	struct sk_buff *msdu;
 	struct ath11k *ar;
 	struct ieee80211_rx_status rx_status = {};
+	struct cached_peer_info peer_cache = {};
 	int ret;
 
 	if (skb_queue_empty(msdu_list))
@@ -2609,7 +2633,7 @@ static void ath11k_dp_rx_process_received_packets(struct ath11k_base *ab,
 	}
 
 	while ((msdu = __skb_dequeue(msdu_list))) {
-		ret = ath11k_dp_rx_process_msdu(ar, msdu, msdu_list, &rx_status);
+		ret = ath11k_dp_rx_process_msdu(ar, msdu, msdu_list, &rx_status, &peer_cache);
 		if (unlikely(ret)) {
 			ath11k_dbg(ab, ATH11K_DBG_DATA,
 				   "Unable to process msdu %d", ret);
@@ -3959,6 +3983,7 @@ static int ath11k_dp_rx_h_null_q_desc(struct ath11k *ar, struct sk_buff *msdu,
 	u8 l3pad_bytes;
 	struct ath11k_skb_rxcb *rxcb = ATH11K_SKB_RXCB(msdu);
 	u32 hal_rx_desc_sz = ar->ab->hw_params.hal_desc_sz;
+	struct cached_peer_info peer_cache = {};
 
 	msdu_len = ath11k_dp_rx_h_msdu_start_msdu_len(ar->ab, desc);
 
@@ -4002,7 +4027,7 @@ static int ath11k_dp_rx_h_null_q_desc(struct ath11k *ar, struct sk_buff *msdu,
 	}
 	ath11k_dp_rx_h_ppdu(ar, desc, status);
 
-	ath11k_dp_rx_h_mpdu(ar, msdu, desc, status);
+	ath11k_dp_rx_h_mpdu(ar, msdu, desc, status, &peer_cache);
 
 	rxcb->tid = ath11k_dp_rx_h_mpdu_start_tid(ar->ab, desc);
 

---
base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c
change-id: 20260326-ath11k-null-peerid-workaround-625a129781b1

Best regards,
--  
Matt



* Re: [PATCH] ath11k: workaround firmware bug where peer_id=0
  2026-03-26 10:53 [PATCH] ath11k: workaround firmware bug where peer_id=0 Matthew Leach
@ 2026-03-30  7:57 ` Matthew Leach
  2026-04-14  7:06   ` Baochen Qiang
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Leach @ 2026-03-30  7:57 UTC
  To: Jeff Johnson; +Cc: linux-wireless, ath11k, linux-kernel, kernel

Hello,

Matthew Leach <matthew.leach@collabora.com> writes:

> This patch caches the peer enctype during the MSDU processing loop,
> caching it on the first AMSDU sub-frame (is_first_msdu=1
> is_last_msdu=0) and setting the correct enctype for any subsequent
> sub-MSDUs.
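
The caching described in the quoted paragraph can be modelled outside
the driver. This is only a sketch under assumptions: every name below
(enc_sketch, msdu_sketch, pick_enctype and so on) is hypothetical, the
enum values are illustrative rather than the real hal_encrypt_type
values, and "resolved" stands for whatever enctype the normal peer
lookup produced for that sub-MSDU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; not the driver's hal_encrypt_type. */
enum enc_sketch { ENC_OPEN = 0, ENC_CCMP = 1 };

/* Mirrors the shape of the cache added by the patch. */
struct cached_peer_info_sketch {
	enum enc_sketch enctype;
	uint16_t seq_no;
	bool valid;
};

/* Hypothetical per-MSDU metadata. */
struct msdu_sketch {
	uint16_t peer_id;
	uint16_t seq_no;
	bool is_first;
	bool is_last;
	enum enc_sketch resolved; /* enctype from the normal peer lookup */
};

/* Return the enctype to use for this MSDU, updating the cache. */
static enum enc_sketch pick_enctype(struct cached_peer_info_sketch *c,
				    const struct msdu_sketch *m)
{
	enum enc_sketch enctype = m->resolved;

	/* First sub-frame of an A-MSDU with peer_id 0: remember its enctype. */
	if (!m->peer_id && m->is_first && !m->is_last) {
		c->enctype = enctype;
		c->seq_no = m->seq_no;
		c->valid = true;
	}

	/* Later sub-frame with the same sequence number: reuse the cache. */
	if (!m->peer_id && !m->is_first && c->valid &&
	    m->seq_no == c->seq_no)
		enctype = c->enctype;

	return enctype;
}
```

Matching on seq_no is what keeps a stale cache entry from being applied
to a sub-frame that belongs to a different MPDU.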

I've been looking at creating a patch that addresses the root cause,
rather than patching the flags of incoming frames:

--8<---------------cut here---------------start------------->8---
diff --git a/drivers/net/wireless/ath/ath11k/peer.c b/drivers/net/wireless/ath/ath11k/peer.c
index 6d0126c39301..98348ccfdfbe 100644
--- a/drivers/net/wireless/ath/ath11k/peer.c
+++ b/drivers/net/wireless/ath/ath11k/peer.c
@@ -347,7 +347,7 @@ static int __ath11k_peer_delete(struct ath11k *ar, u32 vdev_id, const u8 *addr)
 	return 0;
 }
 
-int ath11k_peer_delete(struct ath11k *ar, u32 vdev_id, u8 *addr)
+int ath11k_peer_delete(struct ath11k *ar, u32 vdev_id, const u8 *addr)
 {
 	int ret;
 
@@ -372,7 +372,7 @@ int ath11k_peer_create(struct ath11k *ar, struct ath11k_vif *arvif,
 {
 	struct ath11k_peer *peer;
 	struct ath11k_sta *arsta;
-	int ret, fbret;
+	int ret, fbret, retries = 3;
 
 	lockdep_assert_held(&ar->conf_mutex);
 
@@ -400,6 +400,8 @@ int ath11k_peer_create(struct ath11k *ar, struct ath11k_vif *arvif,
 	spin_unlock_bh(&ar->ab->base_lock);
 	mutex_unlock(&ar->ab->tbl_mtx_lock);
 
+retry:
+
 	ret = ath11k_wmi_send_peer_create_cmd(ar, param);
 	if (ret) {
 		ath11k_warn(ar->ab,
@@ -427,6 +429,18 @@ int ath11k_peer_create(struct ath11k *ar, struct ath11k_vif *arvif,
 		goto cleanup;
 	}
 
+	if (!peer->peer_id) {
+		if (retries--) {
+			spin_unlock_bh(&ar->ab->base_lock);
+			mutex_unlock(&ar->ab->tbl_mtx_lock);
+			ath11k_peer_delete(ar, param->vdev_id, param->peer_addr);
+			goto retry;
+		} else {
+			ath11k_warn(ar->ab, "Null peer workaround failed for peer %pM, adding anyway",
+				    param->peer_addr);
+		}
+	}
+
 	ret = ath11k_peer_rhash_add(ar->ab, peer);
 	if (ret) {
 		spin_unlock_bh(&ar->ab->base_lock);
diff --git a/drivers/net/wireless/ath/ath11k/peer.h b/drivers/net/wireless/ath/ath11k/peer.h
index 3ad2f3355b14..6325c4d157c7 100644
--- a/drivers/net/wireless/ath/ath11k/peer.h
+++ b/drivers/net/wireless/ath/ath11k/peer.h
@@ -47,7 +47,7 @@ struct ath11k_peer *ath11k_peer_find_by_addr(struct ath11k_base *ab,
 					     const u8 *addr);
 struct ath11k_peer *ath11k_peer_find_by_id(struct ath11k_base *ab, int peer_id);
 void ath11k_peer_cleanup(struct ath11k *ar, u32 vdev_id);
-int ath11k_peer_delete(struct ath11k *ar, u32 vdev_id, u8 *addr);
+int ath11k_peer_delete(struct ath11k *ar, u32 vdev_id, const u8 *addr);
 int ath11k_peer_create(struct ath11k *ar, struct ath11k_vif *arvif,
 		       struct ieee80211_sta *sta, struct peer_create_params *param);
 int ath11k_wait_for_peer_delete_done(struct ath11k *ar, u32 vdev_id,
--8<---------------cut here---------------end--------------->8---

This patch detects the error condition at the point where a peer map
request reply is received from the firmware. If the firmware maps with
peer_id=0, we request that the firmware unmap that peer and map again,
hoping it selects a peer_id!=0. We attempt this up to three times, at
which point we give up and let the peer be mapped with an ID of 0.

This patch addresses the root cause, but I think it's more invasive. I'd
appreciate some comments as to which approach upstream would prefer. If
the preference is for the above, I'll send out a v2.
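
The retry policy described above can be sketched in isolation as
follows. This is a minimal model, not driver code: fake_fw and its
helpers are hypothetical stand-ins for the firmware's map and unmap
handling, and map_peer_with_retry is an invented name.

```c
#include <stdint.h>

/* Hypothetical stand-in for the firmware; not an ath11k API. */
struct fake_fw {
	const uint16_t *ids; /* peer_ids handed out, in order */
	int n_ids;           /* must be at least 1 */
	int next;            /* index of the next id to hand out */
	int unmaps;          /* number of unmap requests seen */
};

static uint16_t fake_fw_map(struct fake_fw *fw)
{
	/* Repeat the final id once the scripted list is exhausted. */
	int i = fw->next < fw->n_ids ? fw->next : fw->n_ids - 1;

	fw->next++;
	return fw->ids[i];
}

static void fake_fw_unmap(struct fake_fw *fw)
{
	fw->unmaps++;
}

/*
 * Map a peer. While the firmware answers with peer_id 0, unmap and map
 * again, for at most max_retries extra attempts. The final id is
 * returned even if it is still 0, matching the "adding anyway" path.
 */
static uint16_t map_peer_with_retry(struct fake_fw *fw, int max_retries)
{
	uint16_t id = fake_fw_map(fw);

	while (id == 0 && max_retries-- > 0) {
		fake_fw_unmap(fw);
		id = fake_fw_map(fw);
	}
	return id;
}
```

With max_retries set to 3 this makes at most four map requests in total,
which corresponds to the retries = 3 counter in the diff.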

Regards,
-- 
Matt


* Re: [PATCH] ath11k: workaround firmware bug where peer_id=0
  2026-03-30  7:57 ` Matthew Leach
@ 2026-04-14  7:06   ` Baochen Qiang
  2026-04-14 12:54     ` Matthew Leach
  0 siblings, 1 reply; 5+ messages in thread
From: Baochen Qiang @ 2026-04-14  7:06 UTC
  To: Matthew Leach, Jeff Johnson; +Cc: linux-wireless, ath11k, linux-kernel, kernel



On 3/30/2026 3:57 PM, Matthew Leach wrote:
> Hello,
> 
> Matthew Leach <matthew.leach@collabora.com> writes:
> 
>> This patch caches the peer enctype during the MSDU processing loop,
>> caching it on the first AMSDU sub-frame (is_first_msdu=1
>> is_last_msdu=0) and setting the correct enctype for any subsequent
>> sub-MSDUs.
> 
> I've been looking at creating a patch that addresses the root cause,
> rather than patching the flags of incoming frames:
> 
> [...]
> 
> This patch detects the error condition at the point where a peer map
> request reply is received from the firmware. If the firmware maps with
> peer_id=0, we request that the firmware unmap that peer and map again,
> hoping it selects a peer_id!=0. We attempt this up to three times, at
> which point we give up and let the peer be mapped with an ID of 0.
> 
> This patch addresses the root cause, but I think it's more invasive. I'd
> appreciate some comments as to which approach upstream would prefer. If
> the preference is for the above, I'll send out a v2.

For chips like QCA2066 and WCN6855, 0 is a valid peer_id; for chips like QCN9074 it is
not.

So a possible fix would be to add a per-chip hardware op: for QCN9074 the op keeps the
existing validation against 0, while for QCA2066 the op is a null func. Or, even
simpler, we can remove the validation for all chips.
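
One way the per-chip op could look, as a rough sketch only. The struct,
field and table names here are hypothetical and do not exist in ath11k,
and treating a NULL callback as "no validation" is just one reading of
"null func" (an op that always returns true would work equally well).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chip ops table; the field name is an assumption. */
struct hw_ops_sketch {
	/* NULL means the chip places no restriction on peer_id. */
	bool (*peer_id_valid)(uint16_t peer_id);
};

/* Validation for chips where firmware never assigns peer_id 0. */
static bool peer_id_nonzero(uint16_t peer_id)
{
	return peer_id != 0;
}

static const struct hw_ops_sketch qcn9074_ops_sketch = {
	.peer_id_valid = peer_id_nonzero,
};

static const struct hw_ops_sketch qca2066_ops_sketch = {
	.peer_id_valid = NULL,
};

/* Should the RX path attempt a lookup by peer_id on this chip? */
static bool should_lookup_by_id(const struct hw_ops_sketch *ops,
				uint16_t peer_id)
{
	if (!ops->peer_id_valid)
		return true;
	return ops->peer_id_valid(peer_id);
}
```

Removing the validation for all chips amounts to should_lookup_by_id()
always returning true, at the cost of one extra table lookup on chips
that never use 0.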

> 
> Regards,



* Re: [PATCH] ath11k: workaround firmware bug where peer_id=0
  2026-04-14  7:06   ` Baochen Qiang
@ 2026-04-14 12:54     ` Matthew Leach
  2026-04-15  3:16       ` Baochen Qiang
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Leach @ 2026-04-14 12:54 UTC
  To: Baochen Qiang; +Cc: Jeff Johnson, linux-wireless, ath11k, linux-kernel, kernel

Hi Baochen,

Baochen Qiang <baochen.qiang@oss.qualcomm.com> writes:

> On 3/30/2026 3:57 PM, Matthew Leach wrote:
>> Hello,
>> 
>> Matthew Leach <matthew.leach@collabora.com> writes:
>> 

[...]

> For chips like QCA2066 and WCN6855, 0 is a valid peer_id; for chips
> like QCN9074 it is not.
>
> So a possible fix would be to add a per-chip hardware op: for QCN9074
> the op keeps the existing validation against 0, while for QCA2066 the
> op is a null func. Or, even simpler, we can remove the validation for
> all chips.

In that case, does it make sense to remove the condition check

if (rxcb->peer_id)

in ath11k_dp_rx_h_find_peer()? It looks like this has been used as a
small optimisation, where if peer_id isn't valid it skips checking for
it in the peer hash table. However, if on newer chips peer_id=0 is
valid, we should remove this?
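
To make the question concrete, here is a simplified model of the lookup
with the peer_id check removed. It is a sketch under assumptions: the
flat array stands in for the driver's peer hash table, and peer_sketch,
find_by_id, find_by_addr and find_peer are hypothetical names rather
than the driver's functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN_SKETCH 6

/* Hypothetical, flattened stand-in for a peer table entry. */
struct peer_sketch {
	uint16_t peer_id;
	uint8_t addr[ETH_ALEN_SKETCH];
};

static const struct peer_sketch *
find_by_id(const struct peer_sketch *tbl, size_t n, uint16_t id)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].peer_id == id)
			return &tbl[i];
	return NULL;
}

static const struct peer_sketch *
find_by_addr(const struct peer_sketch *tbl, size_t n, const uint8_t *addr)
{
	for (size_t i = 0; i < n; i++)
		if (memcmp(tbl[i].addr, addr, ETH_ALEN_SKETCH) == 0)
			return &tbl[i];
	return NULL;
}

/*
 * Lookup without special-casing peer_id 0: always try the id first,
 * then fall back to the source MAC when one is available (src may be
 * NULL, e.g. for a sub-MSDU that carries no usable header).
 */
static const struct peer_sketch *
find_peer(const struct peer_sketch *tbl, size_t n, uint16_t id,
	  const uint8_t *src)
{
	const struct peer_sketch *p = find_by_id(tbl, n, id);

	if (p)
		return p;
	return src ? find_by_addr(tbl, n, src) : NULL;
}
```

In this model a peer mapped with id 0 is found even when no source
address is available, which is the case the original workaround was
trying to cover.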

Regards,
-- 
Matt


* Re: [PATCH] ath11k: workaround firmware bug where peer_id=0
  2026-04-14 12:54     ` Matthew Leach
@ 2026-04-15  3:16       ` Baochen Qiang
  0 siblings, 0 replies; 5+ messages in thread
From: Baochen Qiang @ 2026-04-15  3:16 UTC
  To: Matthew Leach; +Cc: Jeff Johnson, linux-wireless, ath11k, linux-kernel, kernel



On 4/14/2026 8:54 PM, Matthew Leach wrote:
> Hi Baochen,
> 
> Baochen Qiang <baochen.qiang@oss.qualcomm.com> writes:
> 
>> On 3/30/2026 3:57 PM, Matthew Leach wrote:
>>> Hello,
>>>
>>> Matthew Leach <matthew.leach@collabora.com> writes:
>>>
> 
> [...]
> 
>> For chips like QCA2066 and WCN6855, 0 is a valid peer_id; for chips
>> like QCN9074 it is not.
>>
>> So a possible fix would be to add a per-chip hardware op: for QCN9074
>> the op keeps the existing validation against 0, while for QCA2066 the
>> op is a null func. Or, even simpler, we can remove the validation for
>> all chips.
> 
> In that case, does it make sense to remove the condition check
> 
> if (rxcb->peer_id)
> 
> in ath11k_dp_rx_h_find_peer()? It looks like this has been used as a
> small optimisation, where if peer_id isn't valid it skips checking for
> it in the peer hash table. However, if on newer chips peer_id=0 is
> valid, we should remove this?

Yeah, I think so. This check was also based on the assumption that peer_id is non-zero.

> 
> Regards,


