* [PATCH 2/5] wifi: mt76: mt7996: fix wrong DMAD length when using MAC TXP
2026-02-02 7:53 [PATCH 1/5] wifi: mt76: fix potential deadlock caused by rx_lock Shayne Chen
@ 2026-02-02 7:53 ` Shayne Chen
2026-02-02 7:53 ` [PATCH 3/5] wifi: mt76: mt7996: fix struct mt7996_mcu_uni_event Shayne Chen
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Shayne Chen @ 2026-02-02 7:53 UTC
To: Felix Fietkau
Cc: linux-wireless, Lorenzo Bianconi, Ryder Lee, Evelyn Tsai,
Money Wang, linux-mediatek, Shayne Chen
The struct mt76_connac_fw_txp is used for HIF TXP. Change to use the
struct mt76_connac_hw_txp to fix the wrong DMAD length for MAC TXP.
Fixes: cb6ebbdffef2 ("wifi: mt76: mt7996: support writing MAC TXD for AddBA Request")
Signed-off-by: Shayne Chen <shayne.chen@mediatek.com>
---
drivers/net/wireless/mediatek/mt76/mt7996/mac.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7996/mac.c b/drivers/net/wireless/mediatek/mt76/mt7996/mac.c
index 77a036ac043c..3b09eff216c3 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7996/mac.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7996/mac.c
@@ -1099,10 +1099,10 @@ int mt7996_tx_prepare_skb(struct mt76_dev *mdev, void *txwi_ptr,
* req
*/
if (le32_to_cpu(ptr[7]) & MT_TXD7_MAC_TXD) {
- u32 val;
+ u32 val, mac_txp_size = sizeof(struct mt76_connac_hw_txp);
ptr = (__le32 *)(txwi + MT_TXD_SIZE);
- memset((void *)ptr, 0, sizeof(struct mt76_connac_fw_txp));
+ memset((void *)ptr, 0, mac_txp_size);
val = FIELD_PREP(MT_TXP0_TOKEN_ID0, id) |
MT_TXP0_TOKEN_ID0_VALID_MASK;
@@ -1121,6 +1121,8 @@ int mt7996_tx_prepare_skb(struct mt76_dev *mdev, void *txwi_ptr,
tx_info->buf[1].addr >> 32);
#endif
ptr[3] = cpu_to_le32(val);
+
+ tx_info->buf[0].len = MT_TXD_SIZE + mac_txp_size;
} else {
struct mt76_connac_txp_common *txp;
--
2.51.0
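The length fix above can be illustrated with a small userspace sketch. Everything prefixed with toy_ or TOY_ is a hypothetical stand-in: the two struct layouts and the 32-byte descriptor size are assumptions chosen for illustration, not the real mt76 definitions. The sketch only shows the idea of the patch: zero exactly the MAC TXP area that follows the TXD, and report a first-buffer length that covers the TXD plus that same area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor size in bytes, a stand-in for MT_TXD_SIZE. */
#define TOY_TXD_SIZE 32

/* Hypothetical stand-in for the firmware (HIF) TXP layout. */
struct toy_fw_txp {
	uint16_t flags;
	uint16_t token;
	uint8_t bss_idx;
	uint8_t nbuf;
	uint16_t wcid;
	uint32_t buf[6];
	uint16_t len[6];
};

/* Hypothetical stand-in for the hardware (MAC) TXP layout. */
struct toy_hw_txp {
	uint16_t msdu_id[4];
	uint32_t ptr[6];
};

/*
 * Zero the MAC TXP area that follows the TXD and return the length that
 * the first DMA buffer should advertise (TXD plus MAC TXP).
 */
static size_t toy_prepare_mac_txp(uint8_t *txwi, size_t txwi_size)
{
	const size_t mac_txp_size = sizeof(struct toy_hw_txp);

	assert(txwi_size >= TOY_TXD_SIZE + mac_txp_size);
	memset(txwi + TOY_TXD_SIZE, 0, mac_txp_size);

	return TOY_TXD_SIZE + mac_txp_size;
}
```

In this toy layout the two TXP structs differ in size, so sizing the buffer from the wrong one would give a different length than the hardware format expects.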
* [PATCH 3/5] wifi: mt76: mt7996: fix struct mt7996_mcu_uni_event
2026-02-02 7:53 [PATCH 1/5] wifi: mt76: fix potential deadlock caused by rx_lock Shayne Chen
2026-02-02 7:53 ` [PATCH 2/5] wifi: mt76: mt7996: fix wrong DMAD length when using MAC TXP Shayne Chen
@ 2026-02-02 7:53 ` Shayne Chen
2026-02-02 7:53 ` [PATCH 4/5] wifi: mt76: avoid to set ACK for MCU command if wait_resp is not set Shayne Chen
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Shayne Chen @ 2026-02-02 7:53 UTC
To: Felix Fietkau
Cc: linux-wireless, Lorenzo Bianconi, Ryder Lee, Evelyn Tsai,
Money Wang, linux-mediatek, StanleyYP Wang, Shayne Chen
From: StanleyYP Wang <StanleyYP.Wang@mediatek.com>
The cid field is defined as a two-byte value in the firmware.
Fixes: 98686cd21624 ("wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices")
Signed-off-by: StanleyYP Wang <StanleyYP.Wang@mediatek.com>
Signed-off-by: Shayne Chen <shayne.chen@mediatek.com>
---
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c | 2 +-
drivers/net/wireless/mediatek/mt76/mt7996/mcu.h | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c
index 8e1c8e1d6a99..285cd83e7117 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c
@@ -242,7 +242,7 @@ mt7996_mcu_parse_response(struct mt76_dev *mdev, int cmd,
event = (struct mt7996_mcu_uni_event *)skb->data;
ret = le32_to_cpu(event->status);
/* skip invalid event */
- if (mcu_cmd != event->cid)
+ if (mcu_cmd != le16_to_cpu(event->cid))
ret = -EAGAIN;
} else {
skb_pull(skb, sizeof(struct mt7996_mcu_rxd));
diff --git a/drivers/net/wireless/mediatek/mt76/mt7996/mcu.h b/drivers/net/wireless/mediatek/mt76/mt7996/mcu.h
index d9fb49f7b01b..d70540982983 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7996/mcu.h
+++ b/drivers/net/wireless/mediatek/mt76/mt7996/mcu.h
@@ -25,8 +25,8 @@ struct mt7996_mcu_rxd {
};
struct mt7996_mcu_uni_event {
- u8 cid;
- u8 __rsv[3];
+ __le16 cid;
+ u8 __rsv[2];
__le32 status; /* 0: success, others: fail */
} __packed;
--
2.51.0
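A minimal userspace sketch of why the field width matters, assuming a command ID larger than 0xff can appear (the thread does not say which IDs do). The toy_ structs and toy_le16_to_cpu() are hypothetical stand-ins for the driver's struct and for le16_to_cpu(). Both layouts describe the same eight wire bytes, but only the two-byte decode recovers the full command ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old layout: one-byte cid followed by three reserved bytes. */
struct toy_event_old {
	uint8_t cid;
	uint8_t rsv[3];
	uint8_t status[4];
};

/* New layout: two-byte little-endian cid followed by two reserved bytes. */
struct toy_event_new {
	uint8_t cid[2]; /* little-endian on the wire */
	uint8_t rsv[2];
	uint8_t status[4];
};

/* Portable little-endian 16-bit load, a stand-in for le16_to_cpu(). */
static uint16_t toy_le16_to_cpu(const uint8_t b[2])
{
	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Old comparison: only the low byte of the event's cid is visible. */
static bool toy_match_old(uint16_t mcu_cmd, const struct toy_event_old *ev)
{
	return mcu_cmd == ev->cid;
}

/* New comparison: the full 16-bit cid is decoded before comparing. */
static bool toy_match_new(uint16_t mcu_cmd, const struct toy_event_new *ev)
{
	return mcu_cmd == toy_le16_to_cpu(ev->cid);
}
```

With wire bytes 0x34 0x12, the old layout sees cid 0x34 and a reserved byte of 0x12, so a command ID of 0x1234 would be treated as a mismatch; the new layout decodes 0x1234 and matches.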
* [PATCH 4/5] wifi: mt76: avoid to set ACK for MCU command if wait_resp is not set
2026-02-02 7:53 [PATCH 1/5] wifi: mt76: fix potential deadlock caused by rx_lock Shayne Chen
2026-02-02 7:53 ` [PATCH 2/5] wifi: mt76: mt7996: fix wrong DMAD length when using MAC TXP Shayne Chen
2026-02-02 7:53 ` [PATCH 3/5] wifi: mt76: mt7996: fix struct mt7996_mcu_uni_event Shayne Chen
@ 2026-02-02 7:53 ` Shayne Chen
2026-02-02 7:53 ` [PATCH 5/5] wifi: mt76: mt7996: fix queue pause after scan due to wrong channel switch reason Shayne Chen
2026-02-02 8:52 ` [PATCH 1/5] wifi: mt76: fix potential deadlock caused by rx_lock Felix Fietkau
4 siblings, 0 replies; 10+ messages in thread
From: Shayne Chen @ 2026-02-02 7:53 UTC
To: Felix Fietkau
Cc: linux-wireless, Lorenzo Bianconi, Ryder Lee, Evelyn Tsai,
Money Wang, linux-mediatek, StanleyYP Wang, Shayne Chen
From: StanleyYP Wang <StanleyYP.Wang@mediatek.com>
When wait_resp is not set but the ACK option is enabled in the MCU TXD,
the ACK event is enqueued to the MCU event queue without being dequeued
by the original MCU command request.
Any orphaned ACK events will only be removed from the queue when another
MCU command requests a response. Due to sequence index mismatches, these
events are discarded one by one until a matching sequence index is found.
However, if several MCU commands that do not require a response continue
to fill up the event queue, there is a risk that when an MCU command with
wait_resp enabled is issued, it may dequeue the wrong event skb,
especially if the queue contains events with all possible sequence
indices.
Signed-off-by: StanleyYP Wang <StanleyYP.Wang@mediatek.com>
Signed-off-by: Shayne Chen <shayne.chen@mediatek.com>
---
drivers/net/wireless/mediatek/mt76/mcu.c | 2 +-
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c | 11 +++++------
2 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/net/wireless/mediatek/mt76/mcu.c b/drivers/net/wireless/mediatek/mt76/mcu.c
index 535c3d8a9cc0..cbfb3bbec503 100644
--- a/drivers/net/wireless/mediatek/mt76/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mcu.c
@@ -98,7 +98,7 @@ int mt76_mcu_skb_send_and_get_msg(struct mt76_dev *dev, struct sk_buff *skb,
/* orig skb might be needed for retry, mcu_skb_send_msg consumes it */
if (orig_skb)
skb_get(orig_skb);
- ret = dev->mcu_ops->mcu_skb_send_msg(dev, skb, cmd, &seq);
+ ret = dev->mcu_ops->mcu_skb_send_msg(dev, skb, cmd, wait_resp ? &seq : NULL);
if (ret < 0)
goto out;
diff --git a/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c
index 285cd83e7117..68d698033e43 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c
@@ -322,13 +322,12 @@ mt7996_mcu_send_message(struct mt76_dev *mdev, struct sk_buff *skb,
uni_txd->pkt_type = MCU_PKT_ID;
uni_txd->seq = seq;
- if (cmd & __MCU_CMD_FIELD_QUERY)
- uni_txd->option = MCU_CMD_UNI_QUERY_ACK;
- else
- uni_txd->option = MCU_CMD_UNI_EXT_ACK;
+ uni_txd->option = MCU_CMD_UNI;
+ if (!(cmd & __MCU_CMD_FIELD_QUERY))
+ uni_txd->option |= MCU_CMD_SET;
- if (mcu_cmd == MCU_UNI_CMD_SDO)
- uni_txd->option &= ~MCU_CMD_ACK;
+ if (wait_seq)
+ uni_txd->option |= MCU_CMD_ACK;
if ((cmd & __MCU_CMD_FIELD_WA) && (cmd & __MCU_CMD_FIELD_WM))
uni_txd->s2d_index = MCU_S2D_H2CN;
--
2.51.0
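The failure mode described in the commit message can be modelled with a toy event queue. This is a sketch under stated assumptions, not driver code: the toy_ names are hypothetical, the queue is a plain array, and the sequence index is assumed to cycle through a small range (1 to 15 in the example below). The waiter pops events in order, discards mismatching sequence indices, and accepts the first match, which may be a stale ACK left behind by a command that never waited for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 32

struct toy_event {
	unsigned int seq; /* sequence index echoed back with the event */
	int cmd_tag;      /* identifies the originating command (demo only) */
};

struct toy_queue {
	struct toy_event ev[TOY_QUEUE_MAX];
	size_t head;
	size_t tail;
};

/* Append an event; returns false when the toy queue is full. */
static bool toy_enqueue(struct toy_queue *q, unsigned int seq, int cmd_tag)
{
	if (q->tail == TOY_QUEUE_MAX)
		return false;

	q->ev[q->tail].seq = seq;
	q->ev[q->tail].cmd_tag = cmd_tag;
	q->tail++;
	return true;
}

/*
 * Model of a command that waits for a response: drop events whose
 * sequence index differs, return the tag of the first one that matches.
 * Returns -1 if the queue runs dry without a match.
 */
static int toy_wait_resp(struct toy_queue *q, unsigned int want_seq)
{
	while (q->head < q->tail) {
		struct toy_event e = q->ev[q->head++];

		if (e.seq == want_seq)
			return e.cmd_tag;
	}
	return -1;
}
```

If fifteen orphaned ACKs with sequence indices 1 to 15 are queued first and a waiting command then reuses index 3, toy_wait_resp() returns the stale event rather than the real response. When commands that do not wait never request an ACK, the queue holds only the real response and the match is unambiguous.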
* [PATCH 5/5] wifi: mt76: mt7996: fix queue pause after scan due to wrong channel switch reason
2026-02-02 7:53 [PATCH 1/5] wifi: mt76: fix potential deadlock caused by rx_lock Shayne Chen
` (2 preceding siblings ...)
2026-02-02 7:53 ` [PATCH 4/5] wifi: mt76: avoid to set ACK for MCU command if wait_resp is not set Shayne Chen
@ 2026-02-02 7:53 ` Shayne Chen
2026-02-02 9:01 ` Felix Fietkau
2026-02-02 8:52 ` [PATCH 1/5] wifi: mt76: fix potential deadlock caused by rx_lock Felix Fietkau
4 siblings, 1 reply; 10+ messages in thread
From: Shayne Chen @ 2026-02-02 7:53 UTC
To: Felix Fietkau
Cc: linux-wireless, Lorenzo Bianconi, Ryder Lee, Evelyn Tsai,
Money Wang, linux-mediatek, StanleyYP Wang, Shayne Chen
From: StanleyYP Wang <StanleyYP.Wang@mediatek.com>
Previously, we used the IEEE80211_CONF_IDLE flag to avoid setting the
parking channel with the CH_SWITCH_NORMAL reason, which could trigger TX
emission before bootup CAC.
However, we found that this flag can be set after triggering scanning on a
connected station interface, and the reason CH_SWITCH_SCAN_BYPASS_DPD will
be used when switching back to the operating channel, which makes the
firmware fail to resume paused AC queues.
It seems that we should avoid relying on this flag after switching to the
single multi-radio architecture. Instead, replace it with MT76_STATE_RUNNING.
Signed-off-by: StanleyYP Wang <StanleyYP.Wang@mediatek.com>
Signed-off-by: Shayne Chen <shayne.chen@mediatek.com>
---
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c
index 68d698033e43..1d5ea28e7b9b 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c
@@ -3925,7 +3925,7 @@ int mt7996_mcu_set_chan_info(struct mt7996_phy *phy, u16 tag)
if (phy->mt76->hw->conf.flags & IEEE80211_CONF_MONITOR)
req.switch_reason = CH_SWITCH_NORMAL;
else if (phy->mt76->offchannel ||
- phy->mt76->hw->conf.flags & IEEE80211_CONF_IDLE)
+ !test_bit(MT76_STATE_RUNNING, &phy->mt76->state))
req.switch_reason = CH_SWITCH_SCAN_BYPASS_DPD;
else if (!cfg80211_reg_can_beacon(phy->mt76->hw->wiphy, chandef,
NL80211_IFTYPE_AP))
--
2.51.0
* Re: [PATCH 5/5] wifi: mt76: mt7996: fix queue pause after scan due to wrong channel switch reason
2026-02-02 7:53 ` [PATCH 5/5] wifi: mt76: mt7996: fix queue pause after scan due to wrong channel switch reason Shayne Chen
@ 2026-02-02 9:01 ` Felix Fietkau
2026-02-02 11:52 ` Shayne Chen
0 siblings, 1 reply; 10+ messages in thread
From: Felix Fietkau @ 2026-02-02 9:01 UTC
To: Shayne Chen
Cc: linux-wireless, Lorenzo Bianconi, Ryder Lee, Evelyn Tsai,
Money Wang, linux-mediatek, StanleyYP Wang
On 02.02.26 08:53, Shayne Chen wrote:
> From: StanleyYP Wang <StanleyYP.Wang@mediatek.com>
>
> Previously, we used the IEEE80211_CONF_IDLE flag to avoid setting the
> parking channel with the CH_SWITCH_NORMAL reason, which could trigger TX
> emission before bootup CAC.
>
> However, we found that this flag can be set after triggering scanning on a
> connected station interface, and the reason CH_SWITCH_SCAN_BYPASS_DPD will
> be used when switching back to the operating channel, which makes the
> firmware fail to resume paused AC queues.
>
> It seems that we should avoid relying on this flag after switching to the
> single multi-radio architecture. Instead, replace it with MT76_STATE_RUNNING.
I don't see how the conditions are comparable at all. I also don't see
how this function can be called with MT76_STATE_RUNNING unset.
Maybe a better replacement would be to check for a chanctx on the phy?
- Felix
* Re: [PATCH 5/5] wifi: mt76: mt7996: fix queue pause after scan due to wrong channel switch reason
2026-02-02 9:01 ` Felix Fietkau
@ 2026-02-02 11:52 ` Shayne Chen
2026-02-02 12:22 ` Felix Fietkau
0 siblings, 1 reply; 10+ messages in thread
From: Shayne Chen @ 2026-02-02 11:52 UTC
To: Felix Fietkau
Cc: linux-wireless, Lorenzo Bianconi, Ryder Lee, Evelyn Tsai,
Money Wang, linux-mediatek, StanleyYP Wang
On Mon, 2026-02-02 at 10:01 +0100, Felix Fietkau wrote:
> On 02.02.26 08:53, Shayne Chen wrote:
> > From: StanleyYP Wang <StanleyYP.Wang@mediatek.com>
> >
> > Previously, we used the IEEE80211_CONF_IDLE flag to avoid setting the
> > parking channel with the CH_SWITCH_NORMAL reason, which could trigger TX
> > emission before bootup CAC.
> >
> > However, we found that this flag can be set after triggering scanning on a
> > connected station interface, and the reason CH_SWITCH_SCAN_BYPASS_DPD will
> > be used when switching back to the operating channel, which makes the
> > firmware fail to resume paused AC queues.
> >
> > It seems that we should avoid relying on this flag after switching to the
> > single multi-radio architecture. Instead, replace it with MT76_STATE_RUNNING.
>
> I don't see how the conditions are comparable at all. I also don't
> see
> how this function can be called with MT76_STATE_RUNNING unset.
>
The condition is used to prevent mt7996_mcu_set_chan_info() (in
mt7996_run()) from triggering TX emission.
> Maybe a better replacement would be to check for a chanctx on the
> phy?
>
Will do some tests on this and send v2.
Thanks,
Shayne
> - Felix
* Re: [PATCH 5/5] wifi: mt76: mt7996: fix queue pause after scan due to wrong channel switch reason
2026-02-02 11:52 ` Shayne Chen
@ 2026-02-02 12:22 ` Felix Fietkau
0 siblings, 0 replies; 10+ messages in thread
From: Felix Fietkau @ 2026-02-02 12:22 UTC
To: Shayne Chen
Cc: linux-wireless, Lorenzo Bianconi, Ryder Lee, Evelyn Tsai,
Money Wang, linux-mediatek, StanleyYP Wang
On 02.02.26 12:52, Shayne Chen wrote:
> On Mon, 2026-02-02 at 10:01 +0100, Felix Fietkau wrote:
>> On 02.02.26 08:53, Shayne Chen wrote:
>> > From: StanleyYP Wang <StanleyYP.Wang@mediatek.com>
>> >
>> > Previously, we used the IEEE80211_CONF_IDLE flag to avoid setting the
>> > parking channel with the CH_SWITCH_NORMAL reason, which could trigger TX
>> > emission before bootup CAC.
>> >
>> > However, we found that this flag can be set after triggering scanning on a
>> > connected station interface, and the reason CH_SWITCH_SCAN_BYPASS_DPD will
>> > be used when switching back to the operating channel, which makes the
>> > firmware fail to resume paused AC queues.
>> >
>> > It seems that we should avoid relying on this flag after switching to the
>> > single multi-radio architecture. Instead, replace it with MT76_STATE_RUNNING.
>>
>> I don't see how the conditions are comparable at all. I also don't
>> see
>> how this function can be called with MT76_STATE_RUNNING unset.
>>
> The condition is used to prevent mt7996_mcu_set_chan_info() (in
> mt7996_run()) from triggering TX emission.
Makes sense.
>> Maybe a better replacement would be to check for a chanctx on the
>> phy?
>>
> Will do some tests on this and send v2.
Thanks.
- Felix
* Re: [PATCH 1/5] wifi: mt76: fix potential deadlock caused by rx_lock
2026-02-02 7:53 [PATCH 1/5] wifi: mt76: fix potential deadlock caused by rx_lock Shayne Chen
` (3 preceding siblings ...)
2026-02-02 7:53 ` [PATCH 5/5] wifi: mt76: mt7996: fix queue pause after scan due to wrong channel switch reason Shayne Chen
@ 2026-02-02 8:52 ` Felix Fietkau
2026-02-02 11:46 ` Shayne Chen
4 siblings, 1 reply; 10+ messages in thread
From: Felix Fietkau @ 2026-02-02 8:52 UTC
To: Shayne Chen
Cc: linux-wireless, Lorenzo Bianconi, Ryder Lee, Evelyn Tsai,
Money Wang, linux-mediatek
On 02.02.26 08:53, Shayne Chen wrote:
> A deadlock will occur if both of the following conditions are met,
> because they each attempt to acquire the rx_lock:
> - mac80211 receives an unexpected BAR control frame, which triggers
> a BA deletion
> - A transmission failure happens due to an abnormality in DMA
>
> Since ieee80211_tx_status_ext() is primarily used to address AQL issues,
> avoid potential deadlocks by restricting calls to ieee80211_tx_status_ext
> only for data frames.
First of all, ieee80211_tx_status_ext is not primarily used to address
AQL, ieee80211_free_txskb handles it as well. The reason for it is tx
status handling, e.g. for management frames sent by hostapd that require
an ACK status report, so limiting the status calls for data frames seems
wrong to me.
I don't really understand how the scenario you're describing leads to a
deadlock. From my understanding, if something in the mac80211 rx path
triggers a tx, it should end up calling mt76_tx(), which queues the skb
to wcid->tx_list and triggers the tx worker. So the actual dma tx calls
are expected to come from the worker kthread.
How does this lead to a deadlock on rx_lock?
- Felix
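The deferral described in this reply can be sketched as a single-threaded toy model. It is an illustration under assumptions, not the mt76 implementation: the toy_ names are hypothetical, the lock is a plain flag rather than a spinlock, and the worker is an ordinary function call instead of a kthread. The point it shows is that a transmit requested from the receive path is only queued while the lock is held, and the DMA call happens later from the worker.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_TX_LIST_MAX 8

struct toy_dev {
	bool rx_lock_held;            /* models "rx_lock is currently taken" */
	int tx_list[TOY_TX_LIST_MAX]; /* models frames queued on wcid->tx_list */
	int tx_pending;
	int dma_calls;
	int dma_calls_under_rx_lock;  /* would indicate a lock-recursion risk */
};

/* Models the DMA submission; records whether it ran under the rx lock. */
static void toy_dma_tx(struct toy_dev *dev, int frame)
{
	(void)frame;
	dev->dma_calls++;
	if (dev->rx_lock_held)
		dev->dma_calls_under_rx_lock++;
}

/* Models the transmit entry point: queue the frame, do not touch DMA. */
static bool toy_tx(struct toy_dev *dev, int frame)
{
	if (dev->tx_pending == TOY_TX_LIST_MAX)
		return false;

	dev->tx_list[dev->tx_pending++] = frame;
	return true;
}

/* Models the tx worker: runs outside the rx lock and drains the list. */
static void toy_tx_worker(struct toy_dev *dev)
{
	assert(!dev->rx_lock_held);

	for (int i = 0; i < dev->tx_pending; i++)
		toy_dma_tx(dev, dev->tx_list[i]);
	dev->tx_pending = 0;
}

/* Models an rx path that triggers a transmit while holding the rx lock. */
static void toy_rx_path(struct toy_dev *dev, int response_frame)
{
	dev->rx_lock_held = true;
	toy_tx(dev, response_frame);
	dev->rx_lock_held = false;
}
```

In this model the counter dma_calls_under_rx_lock stays at zero because the receive path never reaches toy_dma_tx() directly.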
* Re: [PATCH 1/5] wifi: mt76: fix potential deadlock caused by rx_lock
2026-02-02 8:52 ` [PATCH 1/5] wifi: mt76: fix potential deadlock caused by rx_lock Felix Fietkau
@ 2026-02-02 11:46 ` Shayne Chen
0 siblings, 0 replies; 10+ messages in thread
From: Shayne Chen @ 2026-02-02 11:46 UTC
To: Felix Fietkau
Cc: linux-wireless, Lorenzo Bianconi, Ryder Lee, Evelyn Tsai,
Money Wang, linux-mediatek
On Mon, 2026-02-02 at 09:52 +0100, Felix Fietkau wrote:
> On 02.02.26 08:53, Shayne Chen wrote:
> > A deadlock will occur if both of the following conditions are met,
> > because they each attempt to acquire the rx_lock:
> > - mac80211 receives an unexpected BAR control frame, which triggers
> > a BA deletion
> > - A transmission failure happens due to an abnormality in DMA
> >
> > Since ieee80211_tx_status_ext() is primarily used to address AQL
> > issues,
> > avoid potential deadlocks by restricting calls to
> > ieee80211_tx_status_ext
> > only for data frames.
>
> First of all, ieee80211_tx_status_ext is not primarily used to
> address
> AQL, ieee80211_free_txskb handles it as well. The reason for it is tx
> status handling, e.g. for management frames sent by hostapd that
> require
> an ACK status report, so limiting the status calls for data frames
> seems
> wrong to me.
>
> I don't really understand how the scenario you're describing leads to
> a
> deadlock. From my understanding, if something in the mac80211 rx path
> triggers a tx, it should end up calling mt76_tx(), which queues the
> skb
> to wcid->tx_list and triggers the tx worker. So the actual dma tx calls
> are expected to come from the worker kthread.
> How does this lead to a deadlock on rx_lock?
Hi Felix,
Thanks for the explanation.
I've re-checked the codebase used by the customer when the issue was
reported, and I found that the wcid->tx_list structure was not present
in that version. So yes, this problem should not occur in the current
codebase.
Will drop this patch in v2.
Thanks,
Shayne
>
> - Felix