* [PATCH rtw v4 0/4] wifi: fixes for rtw89
@ 2025-09-17 9:52 Fedor Pchelkin
2025-09-17 9:52 ` [PATCH rtw v4 1/4] wifi: rtw89: fix use-after-free in rtw89_core_tx_kick_off_and_wait() Fedor Pchelkin
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: Fedor Pchelkin @ 2025-09-17 9:52 UTC (permalink / raw)
To: Ping-Ke Shih, Zong-Zhe Yang
Cc: Fedor Pchelkin, Bitterblue Smith, Po-Hao Huang, linux-wireless,
linux-kernel, lvc-project
v4: - drop USB patch to handle that in a separate series
- further changelog below --- in the patches
v3: https://lore.kernel.org/linux-wireless/20250828211245.178843-1-pchelkin@ispras.ru/
- further improvements for fix for use-after-free in
rtw89_core_tx_kick_off_and_wait(), see changelog below --- in the patch
- add patch 3/5 for USB case
v2: https://lore.kernel.org/linux-wireless/20250827120603.723548-1-pchelkin@ispras.ru/
- rework fix for use-after-free in rtw89_core_tx_kick_off_and_wait() (Zong-Zhe)
- add patches 2/4 and 3/4 for another issues discovered while reworking
- use rtw tree instead of rtw-next
v1: https://lore.kernel.org/linux-wireless/20250820141441.106156-1-pchelkin@ispras.ru/
Fedor Pchelkin (4):
wifi: rtw89: fix use-after-free in rtw89_core_tx_kick_off_and_wait()
wifi: rtw89: fix tx_wait initialization race
wifi: rtw89: fix leak in rtw89_core_send_nullfunc()
wifi: rtw89: avoid circular locking dependency in ser_state_run()
drivers/net/wireless/realtek/rtw89/core.c | 61 ++++++++++++++++-------
drivers/net/wireless/realtek/rtw89/core.h | 36 ++++++++++++-
drivers/net/wireless/realtek/rtw89/pci.c | 5 +-
drivers/net/wireless/realtek/rtw89/ser.c | 5 +-
4 files changed, 82 insertions(+), 25 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH rtw v4 1/4] wifi: rtw89: fix use-after-free in rtw89_core_tx_kick_off_and_wait()
2025-09-17 9:52 [PATCH rtw v4 0/4] wifi: fixes for rtw89 Fedor Pchelkin
@ 2025-09-17 9:52 ` Fedor Pchelkin
2025-09-18 4:00 ` Ping-Ke Shih
2025-09-17 9:52 ` [PATCH rtw v4 2/4] wifi: rtw89: fix tx_wait initialization race Fedor Pchelkin
` (2 subsequent siblings)
3 siblings, 1 reply; 20+ messages in thread
From: Fedor Pchelkin @ 2025-09-17 9:52 UTC (permalink / raw)
To: Ping-Ke Shih, Zong-Zhe Yang
Cc: Fedor Pchelkin, Bitterblue Smith, Po-Hao Huang, linux-wireless,
linux-kernel, lvc-project, stable
There is a bug observed when rtw89_core_tx_kick_off_and_wait() tries to
access already freed skb_data:
BUG: KFENCE: use-after-free write in rtw89_core_tx_kick_off_and_wait drivers/net/wireless/realtek/rtw89/core.c:1110
CPU: 6 UID: 0 PID: 41377 Comm: kworker/u64:24 Not tainted 6.17.0-rc1+ #1 PREEMPT(lazy)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS edk2-20250523-14.fc42 05/23/2025
Workqueue: events_unbound cfg80211_wiphy_work [cfg80211]
Use-after-free write at 0x0000000020309d9d (in kfence-#251):
rtw89_core_tx_kick_off_and_wait drivers/net/wireless/realtek/rtw89/core.c:1110
rtw89_core_scan_complete drivers/net/wireless/realtek/rtw89/core.c:5338
rtw89_hw_scan_complete_cb drivers/net/wireless/realtek/rtw89/fw.c:7979
rtw89_chanctx_proceed_cb drivers/net/wireless/realtek/rtw89/chan.c:3165
rtw89_chanctx_proceed drivers/net/wireless/realtek/rtw89/chan.h:141
rtw89_hw_scan_complete drivers/net/wireless/realtek/rtw89/fw.c:8012
rtw89_mac_c2h_scanofld_rsp drivers/net/wireless/realtek/rtw89/mac.c:5059
rtw89_fw_c2h_work drivers/net/wireless/realtek/rtw89/fw.c:6758
process_one_work kernel/workqueue.c:3241
worker_thread kernel/workqueue.c:3400
kthread kernel/kthread.c:463
ret_from_fork arch/x86/kernel/process.c:154
ret_from_fork_asm arch/x86/entry/entry_64.S:258
kfence-#251: 0x0000000056e2393d-0x000000009943cb62, size=232, cache=skbuff_head_cache
allocated by task 41377 on cpu 6 at 77869.159548s (0.009551s ago):
__alloc_skb net/core/skbuff.c:659
__netdev_alloc_skb net/core/skbuff.c:734
ieee80211_nullfunc_get net/mac80211/tx.c:5844
rtw89_core_send_nullfunc drivers/net/wireless/realtek/rtw89/core.c:3431
rtw89_core_scan_complete drivers/net/wireless/realtek/rtw89/core.c:5338
rtw89_hw_scan_complete_cb drivers/net/wireless/realtek/rtw89/fw.c:7979
rtw89_chanctx_proceed_cb drivers/net/wireless/realtek/rtw89/chan.c:3165
rtw89_chanctx_proceed drivers/net/wireless/realtek/rtw89/chan.c:3194
rtw89_hw_scan_complete drivers/net/wireless/realtek/rtw89/fw.c:8012
rtw89_mac_c2h_scanofld_rsp drivers/net/wireless/realtek/rtw89/mac.c:5059
rtw89_fw_c2h_work drivers/net/wireless/realtek/rtw89/fw.c:6758
process_one_work kernel/workqueue.c:3241
worker_thread kernel/workqueue.c:3400
kthread kernel/kthread.c:463
ret_from_fork arch/x86/kernel/process.c:154
ret_from_fork_asm arch/x86/entry/entry_64.S:258
freed by task 1045 on cpu 9 at 77869.168393s (0.001557s ago):
ieee80211_tx_status_skb net/mac80211/status.c:1117
rtw89_pci_release_txwd_skb drivers/net/wireless/realtek/rtw89/pci.c:564
rtw89_pci_release_tx_skbs.isra.0 drivers/net/wireless/realtek/rtw89/pci.c:651
rtw89_pci_release_tx drivers/net/wireless/realtek/rtw89/pci.c:676
rtw89_pci_napi_poll drivers/net/wireless/realtek/rtw89/pci.c:4238
__napi_poll net/core/dev.c:7495
net_rx_action net/core/dev.c:7557 net/core/dev.c:7684
handle_softirqs kernel/softirq.c:580
do_softirq.part.0 kernel/softirq.c:480
__local_bh_enable_ip kernel/softirq.c:407
rtw89_pci_interrupt_threadfn drivers/net/wireless/realtek/rtw89/pci.c:927
irq_thread_fn kernel/irq/manage.c:1133
irq_thread kernel/irq/manage.c:1257
kthread kernel/kthread.c:463
ret_from_fork arch/x86/kernel/process.c:154
ret_from_fork_asm arch/x86/entry/entry_64.S:258
It is a consequence of a race between the waiting and the signaling side
of the completion:
Waiting thread Completing thread
rtw89_core_tx_kick_off_and_wait()
rcu_assign_pointer(skb_data->wait, wait)
/* start waiting */
wait_for_completion_timeout()
rtw89_pci_tx_status()
rtw89_core_tx_wait_complete()
rcu_read_lock()
/* signals completion and
* proceeds further
*/
complete(&wait->completion)
rcu_read_unlock()
...
/* frees skb_data */
ieee80211_tx_status_ni()
/* returns (exit status doesn't matter) */
wait_for_completion_timeout()
...
/* accesses the already freed skb_data */
rcu_assign_pointer(skb_data->wait, NULL)
The completing side might proceed and free the underlying skb even before
the waiting side is fully awoken and run to execution. Actually the race
happens regardless of wait_for_completion_timeout() exit status, e.g.
the waiting side may hit a timeout and the concurrent completing side is
still able to free the skb.
Skbs which are sent by rtw89_core_tx_kick_off_and_wait() are owned by the
driver. They don't come from core ieee80211 stack so no need to pass them
to ieee80211_tx_status_ni() on completing side.
Introduce a work function which will act as a garbage collector for
rtw89_tx_wait_info objects and the associated skbs. Thus no potentially
heavy locks are required on the completing side.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 1ae5ca615285 ("wifi: rtw89: add function to wait for completion of TX skbs")
Cc: stable@vger.kernel.org
Suggested-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
v4: - fill wait's fields before publishing (Zong-Zhe)
- leave dev_kfree_skb_any + kfree_rcu for tx_wait release (Zong-Zhe)
v3: - decrease waiting timeout in rtw89_tx_wait_work() (Zong-Zhe)
- clear tx_waits list from rtw89_hci_reset(), too (Zong-Zhe)
- keep RCU updating for skb_data->wait (Zong-Zhe)
- keep the old order of calls in rtw89_pci_tx_status() (Zong-Zhe)
- drop wait->finished as complete_all() would make the completion be
done permanently
v2: - use a work function to manage release of tx_waits and associated skbs (Zong-Zhe)
drivers/net/wireless/realtek/rtw89/core.c | 29 ++++++++++++++----
drivers/net/wireless/realtek/rtw89/core.h | 36 +++++++++++++++++++++--
drivers/net/wireless/realtek/rtw89/pci.c | 3 +-
drivers/net/wireless/realtek/rtw89/ser.c | 2 ++
4 files changed, 61 insertions(+), 9 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw89/core.c b/drivers/net/wireless/realtek/rtw89/core.c
index 57590f5577a3..438930b65631 100644
--- a/drivers/net/wireless/realtek/rtw89/core.c
+++ b/drivers/net/wireless/realtek/rtw89/core.c
@@ -1073,6 +1073,14 @@ rtw89_core_tx_update_desc_info(struct rtw89_dev *rtwdev,
}
}
+static void rtw89_tx_wait_work(struct wiphy *wiphy, struct wiphy_work *work)
+{
+ struct rtw89_dev *rtwdev = container_of(work, struct rtw89_dev,
+ tx_wait_work);
+
+ rtw89_tx_wait_list_clear(rtwdev);
+}
+
void rtw89_core_tx_kick_off(struct rtw89_dev *rtwdev, u8 qsel)
{
u8 ch_dma;
@@ -1090,6 +1098,8 @@ int rtw89_core_tx_kick_off_and_wait(struct rtw89_dev *rtwdev, struct sk_buff *sk
unsigned long time_left;
int ret = 0;
+ lockdep_assert_wiphy(rtwdev->hw->wiphy);
+
wait = kzalloc(sizeof(*wait), GFP_KERNEL);
if (!wait) {
rtw89_core_tx_kick_off(rtwdev, qsel);
@@ -1097,18 +1107,22 @@ int rtw89_core_tx_kick_off_and_wait(struct rtw89_dev *rtwdev, struct sk_buff *sk
}
init_completion(&wait->completion);
+ wait->skb = skb;
rcu_assign_pointer(skb_data->wait, wait);
rtw89_core_tx_kick_off(rtwdev, qsel);
time_left = wait_for_completion_timeout(&wait->completion,
msecs_to_jiffies(timeout));
- if (time_left == 0)
- ret = -ETIMEDOUT;
- else if (!wait->tx_done)
- ret = -EAGAIN;
- rcu_assign_pointer(skb_data->wait, NULL);
- kfree_rcu(wait, rcu_head);
+ if (time_left == 0) {
+ ret = -ETIMEDOUT;
+ list_add_tail(&wait->list, &rtwdev->tx_waits);
+ wiphy_work_queue(rtwdev->hw->wiphy, &rtwdev->tx_wait_work);
+ } else {
+ if (!wait->tx_done)
+ ret = -EAGAIN;
+ rtw89_tx_wait_release(wait);
+ }
return ret;
}
@@ -4972,6 +4986,7 @@ void rtw89_core_stop(struct rtw89_dev *rtwdev)
clear_bit(RTW89_FLAG_RUNNING, rtwdev->flags);
wiphy_work_cancel(wiphy, &rtwdev->c2h_work);
+ wiphy_work_cancel(wiphy, &rtwdev->tx_wait_work);
wiphy_work_cancel(wiphy, &rtwdev->cancel_6ghz_probe_work);
wiphy_work_cancel(wiphy, &btc->eapol_notify_work);
wiphy_work_cancel(wiphy, &btc->arp_notify_work);
@@ -5203,6 +5218,7 @@ int rtw89_core_init(struct rtw89_dev *rtwdev)
INIT_LIST_HEAD(&rtwdev->scan_info.pkt_list[band]);
}
INIT_LIST_HEAD(&rtwdev->scan_info.chan_list);
+ INIT_LIST_HEAD(&rtwdev->tx_waits);
INIT_WORK(&rtwdev->ba_work, rtw89_core_ba_work);
INIT_WORK(&rtwdev->txq_work, rtw89_core_txq_work);
INIT_DELAYED_WORK(&rtwdev->txq_reinvoke_work, rtw89_core_txq_reinvoke_work);
@@ -5233,6 +5249,7 @@ int rtw89_core_init(struct rtw89_dev *rtwdev)
wiphy_work_init(&rtwdev->c2h_work, rtw89_fw_c2h_work);
wiphy_work_init(&rtwdev->ips_work, rtw89_ips_work);
wiphy_work_init(&rtwdev->cancel_6ghz_probe_work, rtw89_cancel_6ghz_probe_work);
+ wiphy_work_init(&rtwdev->tx_wait_work, rtw89_tx_wait_work);
INIT_WORK(&rtwdev->load_firmware_work, rtw89_load_firmware_work);
skb_queue_head_init(&rtwdev->c2h_queue);
diff --git a/drivers/net/wireless/realtek/rtw89/core.h b/drivers/net/wireless/realtek/rtw89/core.h
index 43e10278e14d..b5586603f027 100644
--- a/drivers/net/wireless/realtek/rtw89/core.h
+++ b/drivers/net/wireless/realtek/rtw89/core.h
@@ -3506,9 +3506,12 @@ struct rtw89_phy_rate_pattern {
bool enable;
};
+#define RTW89_TX_WAIT_DEFAULT_TIMEOUT msecs_to_jiffies(10)
struct rtw89_tx_wait_info {
struct rcu_head rcu_head;
+ struct list_head list;
struct completion completion;
+ struct sk_buff *skb;
bool tx_done;
};
@@ -5925,6 +5928,9 @@ struct rtw89_dev {
/* used to protect rpwm */
spinlock_t rpwm_lock;
+ struct list_head tx_waits;
+ struct wiphy_work tx_wait_work;
+
struct rtw89_cam_info cam_info;
struct sk_buff_head c2h_queue;
@@ -6181,6 +6187,27 @@ rtw89_assoc_link_rcu_dereference(struct rtw89_dev *rtwdev, u8 macid)
list_first_entry_or_null(&p->dlink_pool, typeof(*p->links_inst), dlink_schd); \
})
+static inline void rtw89_tx_wait_release(struct rtw89_tx_wait_info *wait)
+{
+ dev_kfree_skb_any(wait->skb);
+ kfree_rcu(wait, rcu_head);
+}
+
+static inline void rtw89_tx_wait_list_clear(struct rtw89_dev *rtwdev)
+{
+ struct rtw89_tx_wait_info *wait, *tmp;
+
+ lockdep_assert_wiphy(rtwdev->hw->wiphy);
+
+ list_for_each_entry_safe(wait, tmp, &rtwdev->tx_waits, list) {
+ if (!wait_for_completion_timeout(&wait->completion,
+ RTW89_TX_WAIT_DEFAULT_TIMEOUT))
+ continue;
+ list_del(&wait->list);
+ rtw89_tx_wait_release(wait);
+ }
+}
+
static inline int rtw89_hci_tx_write(struct rtw89_dev *rtwdev,
struct rtw89_core_tx_request *tx_req)
{
@@ -6190,6 +6217,7 @@ static inline int rtw89_hci_tx_write(struct rtw89_dev *rtwdev,
static inline void rtw89_hci_reset(struct rtw89_dev *rtwdev)
{
rtwdev->hci.ops->reset(rtwdev);
+ rtw89_tx_wait_list_clear(rtwdev);
}
static inline int rtw89_hci_start(struct rtw89_dev *rtwdev)
@@ -7258,11 +7286,12 @@ static inline struct sk_buff *rtw89_alloc_skb_for_rx(struct rtw89_dev *rtwdev,
return dev_alloc_skb(length);
}
-static inline void rtw89_core_tx_wait_complete(struct rtw89_dev *rtwdev,
+static inline bool rtw89_core_tx_wait_complete(struct rtw89_dev *rtwdev,
struct rtw89_tx_skb_data *skb_data,
bool tx_done)
{
struct rtw89_tx_wait_info *wait;
+ bool ret = false;
rcu_read_lock();
@@ -7270,11 +7299,14 @@ static inline void rtw89_core_tx_wait_complete(struct rtw89_dev *rtwdev,
if (!wait)
goto out;
+ ret = true;
wait->tx_done = tx_done;
- complete(&wait->completion);
+ /* Don't access skb anymore after completion */
+ complete_all(&wait->completion);
out:
rcu_read_unlock();
+ return ret;
}
static inline bool rtw89_is_mlo_1_1(struct rtw89_dev *rtwdev)
diff --git a/drivers/net/wireless/realtek/rtw89/pci.c b/drivers/net/wireless/realtek/rtw89/pci.c
index a669f2f843aa..4e3034b44f56 100644
--- a/drivers/net/wireless/realtek/rtw89/pci.c
+++ b/drivers/net/wireless/realtek/rtw89/pci.c
@@ -464,7 +464,8 @@ static void rtw89_pci_tx_status(struct rtw89_dev *rtwdev,
struct rtw89_tx_skb_data *skb_data = RTW89_TX_SKB_CB(skb);
struct ieee80211_tx_info *info;
- rtw89_core_tx_wait_complete(rtwdev, skb_data, tx_status == RTW89_TX_DONE);
+ if (rtw89_core_tx_wait_complete(rtwdev, skb_data, tx_status == RTW89_TX_DONE))
+ return;
info = IEEE80211_SKB_CB(skb);
ieee80211_tx_info_clear_status(info);
diff --git a/drivers/net/wireless/realtek/rtw89/ser.c b/drivers/net/wireless/realtek/rtw89/ser.c
index bb39fdbcba0d..fe7beff8c424 100644
--- a/drivers/net/wireless/realtek/rtw89/ser.c
+++ b/drivers/net/wireless/realtek/rtw89/ser.c
@@ -502,7 +502,9 @@ static void ser_reset_trx_st_hdl(struct rtw89_ser *ser, u8 evt)
}
drv_stop_rx(ser);
+ wiphy_lock(wiphy);
drv_trx_reset(ser);
+ wiphy_unlock(wiphy);
/* wait m3 */
hal_send_m2_event(ser);
--
2.51.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH rtw v4 2/4] wifi: rtw89: fix tx_wait initialization race
2025-09-17 9:52 [PATCH rtw v4 0/4] wifi: fixes for rtw89 Fedor Pchelkin
2025-09-17 9:52 ` [PATCH rtw v4 1/4] wifi: rtw89: fix use-after-free in rtw89_core_tx_kick_off_and_wait() Fedor Pchelkin
@ 2025-09-17 9:52 ` Fedor Pchelkin
2025-09-18 5:47 ` Ping-Ke Shih
2025-09-17 9:52 ` [PATCH rtw v4 3/4] wifi: rtw89: fix leak in rtw89_core_send_nullfunc() Fedor Pchelkin
2025-09-17 9:52 ` [PATCH rtw v4 4/4] wifi: rtw89: avoid circular locking dependency in ser_state_run() Fedor Pchelkin
3 siblings, 1 reply; 20+ messages in thread
From: Fedor Pchelkin @ 2025-09-17 9:52 UTC (permalink / raw)
To: Ping-Ke Shih, Zong-Zhe Yang
Cc: Fedor Pchelkin, Bitterblue Smith, Po-Hao Huang, linux-wireless,
linux-kernel, lvc-project, stable
Now that nullfunc skbs are recycled in a separate work item in the driver,
the following race during initialization and processing of those skbs
might lead to noticeable bugs:
Waiting thread Completing thread
rtw89_core_send_nullfunc()
rtw89_core_tx_write_link()
...
rtw89_pci_txwd_submit()
skb_data->wait = NULL
/* add skb to the queue */
skb_queue_tail(&txwd->queue, skb)
rtw89_pci_napi_poll()
...
rtw89_pci_release_txwd_skb()
/* get skb from the queue */
skb_unlink(skb, &txwd->queue)
rtw89_pci_tx_status()
rtw89_core_tx_wait_complete()
/* use incorrect skb_data->wait */
rtw89_core_tx_kick_off_and_wait()
/* assign skb_data->wait but too late */
The value of skb_data->wait indicates whether skb is passed on to the
core ieee80211 stack or released by the driver itself. So assure that by
the time skb is added to txwd queue and becomes visible to the completing
side, it has already allocated tx_wait-related data (in case it's needed).
Found by Linux Verification Center (linuxtesting.org).
Fixes: 1ae5ca615285 ("wifi: rtw89: add function to wait for completion of TX skbs")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
v4: - use wiphy_dereference (Zong-Zhe)
- move wait->skb assignment place
drivers/net/wireless/realtek/rtw89/core.c | 35 ++++++++++++++---------
drivers/net/wireless/realtek/rtw89/pci.c | 2 --
2 files changed, 21 insertions(+), 16 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw89/core.c b/drivers/net/wireless/realtek/rtw89/core.c
index 438930b65631..1efe4bb09262 100644
--- a/drivers/net/wireless/realtek/rtw89/core.c
+++ b/drivers/net/wireless/realtek/rtw89/core.c
@@ -1094,22 +1094,13 @@ int rtw89_core_tx_kick_off_and_wait(struct rtw89_dev *rtwdev, struct sk_buff *sk
int qsel, unsigned int timeout)
{
struct rtw89_tx_skb_data *skb_data = RTW89_TX_SKB_CB(skb);
- struct rtw89_tx_wait_info *wait;
+ struct rtw89_tx_wait_info *wait = wiphy_dereference(rtwdev->hw->wiphy,
+ skb_data->wait);
unsigned long time_left;
int ret = 0;
lockdep_assert_wiphy(rtwdev->hw->wiphy);
- wait = kzalloc(sizeof(*wait), GFP_KERNEL);
- if (!wait) {
- rtw89_core_tx_kick_off(rtwdev, qsel);
- return 0;
- }
-
- init_completion(&wait->completion);
- wait->skb = skb;
- rcu_assign_pointer(skb_data->wait, wait);
-
rtw89_core_tx_kick_off(rtwdev, qsel);
time_left = wait_for_completion_timeout(&wait->completion,
msecs_to_jiffies(timeout));
@@ -1171,10 +1162,12 @@ int rtw89_h2c_tx(struct rtw89_dev *rtwdev,
static int rtw89_core_tx_write_link(struct rtw89_dev *rtwdev,
struct rtw89_vif_link *rtwvif_link,
struct rtw89_sta_link *rtwsta_link,
- struct sk_buff *skb, int *qsel, bool sw_mld)
+ struct sk_buff *skb, int *qsel, bool sw_mld,
+ struct rtw89_tx_wait_info *wait)
{
struct ieee80211_sta *sta = rtwsta_link_to_sta_safe(rtwsta_link);
struct ieee80211_vif *vif = rtwvif_link_to_vif(rtwvif_link);
+ struct rtw89_tx_skb_data *skb_data = RTW89_TX_SKB_CB(skb);
struct rtw89_vif *rtwvif = rtwvif_link->rtwvif;
struct rtw89_core_tx_request tx_req = {};
int ret;
@@ -1191,6 +1184,8 @@ static int rtw89_core_tx_write_link(struct rtw89_dev *rtwdev,
rtw89_core_tx_update_desc_info(rtwdev, &tx_req);
rtw89_core_tx_wake(rtwdev, &tx_req);
+ rcu_assign_pointer(skb_data->wait, wait);
+
ret = rtw89_hci_tx_write(rtwdev, &tx_req);
if (ret) {
rtw89_err(rtwdev, "failed to transmit skb to HCI\n");
@@ -1227,7 +1222,8 @@ int rtw89_core_tx_write(struct rtw89_dev *rtwdev, struct ieee80211_vif *vif,
}
}
- return rtw89_core_tx_write_link(rtwdev, rtwvif_link, rtwsta_link, skb, qsel, false);
+ return rtw89_core_tx_write_link(rtwdev, rtwvif_link, rtwsta_link, skb, qsel, false,
+ NULL);
}
static __le32 rtw89_build_txwd_body0(struct rtw89_tx_desc_info *desc_info)
@@ -3425,6 +3421,7 @@ int rtw89_core_send_nullfunc(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rt
struct ieee80211_vif *vif = rtwvif_link_to_vif(rtwvif_link);
int link_id = ieee80211_vif_is_mld(vif) ? rtwvif_link->link_id : -1;
struct rtw89_sta_link *rtwsta_link;
+ struct rtw89_tx_wait_info *wait;
struct ieee80211_sta *sta;
struct ieee80211_hdr *hdr;
struct rtw89_sta *rtwsta;
@@ -3434,6 +3431,12 @@ int rtw89_core_send_nullfunc(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rt
if (vif->type != NL80211_IFTYPE_STATION || !vif->cfg.assoc)
return 0;
+ wait = kzalloc(sizeof(*wait), GFP_KERNEL);
+ if (!wait)
+ return -ENOMEM;
+
+ init_completion(&wait->completion);
+
rcu_read_lock();
sta = ieee80211_find_sta(vif, vif->cfg.ap_addr);
if (!sta) {
@@ -3448,6 +3451,8 @@ int rtw89_core_send_nullfunc(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rt
goto out;
}
+ wait->skb = skb;
+
hdr = (struct ieee80211_hdr *)skb->data;
if (ps)
hdr->frame_control |= cpu_to_le16(IEEE80211_FCTL_PM);
@@ -3458,7 +3463,8 @@ int rtw89_core_send_nullfunc(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rt
goto out;
}
- ret = rtw89_core_tx_write_link(rtwdev, rtwvif_link, rtwsta_link, skb, &qsel, true);
+ ret = rtw89_core_tx_write_link(rtwdev, rtwvif_link, rtwsta_link, skb, &qsel, true,
+ wait);
if (ret) {
rtw89_warn(rtwdev, "nullfunc transmit failed: %d\n", ret);
dev_kfree_skb_any(skb);
@@ -3471,6 +3477,7 @@ int rtw89_core_send_nullfunc(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rt
timeout);
out:
rcu_read_unlock();
+ kfree(wait);
return ret;
}
diff --git a/drivers/net/wireless/realtek/rtw89/pci.c b/drivers/net/wireless/realtek/rtw89/pci.c
index 4e3034b44f56..cb9682f306a6 100644
--- a/drivers/net/wireless/realtek/rtw89/pci.c
+++ b/drivers/net/wireless/realtek/rtw89/pci.c
@@ -1372,7 +1372,6 @@ static int rtw89_pci_txwd_submit(struct rtw89_dev *rtwdev,
struct pci_dev *pdev = rtwpci->pdev;
struct sk_buff *skb = tx_req->skb;
struct rtw89_pci_tx_data *tx_data = RTW89_PCI_TX_SKB_CB(skb);
- struct rtw89_tx_skb_data *skb_data = RTW89_TX_SKB_CB(skb);
bool en_wd_info = desc_info->en_wd_info;
u32 txwd_len;
u32 txwp_len;
@@ -1388,7 +1387,6 @@ static int rtw89_pci_txwd_submit(struct rtw89_dev *rtwdev,
}
tx_data->dma = dma;
- rcu_assign_pointer(skb_data->wait, NULL);
txwp_len = sizeof(*txwp_info);
txwd_len = chip->txwd_body_size;
--
2.51.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH rtw v4 3/4] wifi: rtw89: fix leak in rtw89_core_send_nullfunc()
2025-09-17 9:52 [PATCH rtw v4 0/4] wifi: fixes for rtw89 Fedor Pchelkin
2025-09-17 9:52 ` [PATCH rtw v4 1/4] wifi: rtw89: fix use-after-free in rtw89_core_tx_kick_off_and_wait() Fedor Pchelkin
2025-09-17 9:52 ` [PATCH rtw v4 2/4] wifi: rtw89: fix tx_wait initialization race Fedor Pchelkin
@ 2025-09-17 9:52 ` Fedor Pchelkin
2025-09-18 5:48 ` Ping-Ke Shih
2025-09-17 9:52 ` [PATCH rtw v4 4/4] wifi: rtw89: avoid circular locking dependency in ser_state_run() Fedor Pchelkin
3 siblings, 1 reply; 20+ messages in thread
From: Fedor Pchelkin @ 2025-09-17 9:52 UTC (permalink / raw)
To: Ping-Ke Shih, Zong-Zhe Yang
Cc: Fedor Pchelkin, Bitterblue Smith, Po-Hao Huang, linux-wireless,
linux-kernel, lvc-project
If there is no rtwsta_link found in rtw89_core_send_nullfunc(), allocated
skb is leaked. Free it on the error handling path.
Found by Linux Verification Center (linuxtesting.org).
Fixes: a8ba4acab7db ("wifi: rtw89: send nullfunc based on the given link")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
v3: - use dev_kfree_skb_any() directly on error handling path without
extra goto complexity (Zong-Zhe)
drivers/net/wireless/realtek/rtw89/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/realtek/rtw89/core.c b/drivers/net/wireless/realtek/rtw89/core.c
index 1efe4bb09262..4310ef839dd6 100644
--- a/drivers/net/wireless/realtek/rtw89/core.c
+++ b/drivers/net/wireless/realtek/rtw89/core.c
@@ -3460,6 +3460,7 @@ int rtw89_core_send_nullfunc(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rt
rtwsta_link = rtwsta->links[rtwvif_link->link_id];
if (unlikely(!rtwsta_link)) {
ret = -ENOLINK;
+ dev_kfree_skb_any(skb);
goto out;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH rtw v4 4/4] wifi: rtw89: avoid circular locking dependency in ser_state_run()
2025-09-17 9:52 [PATCH rtw v4 0/4] wifi: fixes for rtw89 Fedor Pchelkin
` (2 preceding siblings ...)
2025-09-17 9:52 ` [PATCH rtw v4 3/4] wifi: rtw89: fix leak in rtw89_core_send_nullfunc() Fedor Pchelkin
@ 2025-09-17 9:52 ` Fedor Pchelkin
2025-09-18 5:52 ` Ping-Ke Shih
3 siblings, 1 reply; 20+ messages in thread
From: Fedor Pchelkin @ 2025-09-17 9:52 UTC (permalink / raw)
To: Ping-Ke Shih, Zong-Zhe Yang
Cc: Fedor Pchelkin, Bitterblue Smith, Po-Hao Huang, linux-wireless,
linux-kernel, lvc-project
Lockdep gives a splat [1] when ser_hdl_work item is executed. It is
scheduled at mac80211 workqueue via ieee80211_queue_work() and takes a
wiphy lock inside. However, this workqueue can be flushed when e.g.
closing the interface and wiphy lock is already taken in that case.
Choosing wiphy_work_queue() for SER is likely not suitable. Back on to
the global workqueue.
[1]:
WARNING: possible circular locking dependency detected
6.17.0-rc2 #17 Not tainted
------------------------------------------------------
kworker/u32:1/61 is trying to acquire lock:
ffff88811bc00768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: ser_state_run+0x5e/0x180 [rtw89_core]
but task is already holding lock:
ffffc9000048fd30 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}, at: process_one_work+0x7b5/0x1450
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}:
process_one_work+0x7c6/0x1450
worker_thread+0x49e/0xd00
kthread+0x313/0x640
ret_from_fork+0x221/0x300
ret_from_fork_asm+0x1a/0x30
-> #1 ((wq_completion)phy0){+.+.}-{0:0}:
touch_wq_lockdep_map+0x8e/0x180
__flush_workqueue+0x129/0x10d0
ieee80211_stop_device+0xa8/0x110
ieee80211_do_stop+0x14ce/0x2880
ieee80211_stop+0x13a/0x2c0
__dev_close_many+0x18f/0x510
__dev_change_flags+0x25f/0x670
netif_change_flags+0x7b/0x160
do_setlink.isra.0+0x1640/0x35d0
rtnl_newlink+0xd8c/0x1d30
rtnetlink_rcv_msg+0x700/0xb80
netlink_rcv_skb+0x11d/0x350
netlink_unicast+0x49a/0x7a0
netlink_sendmsg+0x759/0xc20
____sys_sendmsg+0x812/0xa00
___sys_sendmsg+0xf7/0x180
__sys_sendmsg+0x11f/0x1b0
do_syscall_64+0xbb/0x360
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&rdev->wiphy.mtx){+.+.}-{4:4}:
__lock_acquire+0x124c/0x1d20
lock_acquire+0x154/0x2e0
__mutex_lock+0x17b/0x12f0
ser_state_run+0x5e/0x180 [rtw89_core]
rtw89_ser_hdl_work+0x119/0x220 [rtw89_core]
process_one_work+0x82d/0x1450
worker_thread+0x49e/0xd00
kthread+0x313/0x640
ret_from_fork+0x221/0x300
ret_from_fork_asm+0x1a/0x30
other info that might help us debug this:
Chain exists of:
&rdev->wiphy.mtx --> (wq_completion)phy0 --> (work_completion)(&ser->ser_hdl_work)
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock((work_completion)(&ser->ser_hdl_work));
lock((wq_completion)phy0);
lock((work_completion)(&ser->ser_hdl_work));
lock(&rdev->wiphy.mtx);
*** DEADLOCK ***
2 locks held by kworker/u32:1/61:
#0: ffff888103835148 ((wq_completion)phy0){+.+.}-{0:0}, at: process_one_work+0xefa/0x1450
#1: ffffc9000048fd30 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}, at: process_one_work+0x7b5/0x1450
stack backtrace:
CPU: 0 UID: 0 PID: 61 Comm: kworker/u32:1 Not tainted 6.17.0-rc2 #17 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS edk2-20250523-14.fc42 05/23/2025
Workqueue: phy0 rtw89_ser_hdl_work [rtw89_core]
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
print_circular_bug.cold+0x178/0x1be
check_noncircular+0x14c/0x170
__lock_acquire+0x124c/0x1d20
lock_acquire+0x154/0x2e0
__mutex_lock+0x17b/0x12f0
ser_state_run+0x5e/0x180 [rtw89_core]
rtw89_ser_hdl_work+0x119/0x220 [rtw89_core]
process_one_work+0x82d/0x1450
worker_thread+0x49e/0xd00
kthread+0x313/0x640
ret_from_fork+0x221/0x300
ret_from_fork_asm+0x1a/0x30
</TASK>
Found by Linux Verification Center (linuxtesting.org).
Fixes: ebfc9199df05 ("wifi: rtw89: add wiphy_lock() to work that isn't held wiphy_lock() yet")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
drivers/net/wireless/realtek/rtw89/ser.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw89/ser.c b/drivers/net/wireless/realtek/rtw89/ser.c
index fe7beff8c424..f99e179f7ff9 100644
--- a/drivers/net/wireless/realtek/rtw89/ser.c
+++ b/drivers/net/wireless/realtek/rtw89/ser.c
@@ -205,7 +205,6 @@ static void rtw89_ser_hdl_work(struct work_struct *work)
static int ser_send_msg(struct rtw89_ser *ser, u8 event)
{
- struct rtw89_dev *rtwdev = container_of(ser, struct rtw89_dev, ser);
struct ser_msg *msg = NULL;
if (test_bit(RTW89_SER_DRV_STOP_RUN, ser->flags))
@@ -221,7 +220,7 @@ static int ser_send_msg(struct rtw89_ser *ser, u8 event)
list_add(&msg->list, &ser->msg_q);
spin_unlock_irq(&ser->msg_q_lock);
- ieee80211_queue_work(rtwdev->hw, &ser->ser_hdl_work);
+ schedule_work(&ser->ser_hdl_work);
return 0;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* RE: [PATCH rtw v4 1/4] wifi: rtw89: fix use-after-free in rtw89_core_tx_kick_off_and_wait()
2025-09-17 9:52 ` [PATCH rtw v4 1/4] wifi: rtw89: fix use-after-free in rtw89_core_tx_kick_off_and_wait() Fedor Pchelkin
@ 2025-09-18 4:00 ` Ping-Ke Shih
2025-09-18 4:40 ` Zong-Zhe Yang
0 siblings, 1 reply; 20+ messages in thread
From: Ping-Ke Shih @ 2025-09-18 4:00 UTC (permalink / raw)
To: Fedor Pchelkin, Zong-Zhe Yang
Cc: Bitterblue Smith, Bernie Huang, linux-wireless@vger.kernel.org,
linux-kernel@vger.kernel.org, lvc-project@linuxtesting.org,
stable@vger.kernel.org
Fedor Pchelkin <pchelkin@ispras.ru> wrote:
[...]
> @@ -6181,6 +6187,27 @@ rtw89_assoc_link_rcu_dereference(struct rtw89_dev *rtwdev, u8 macid)
> list_first_entry_or_null(&p->dlink_pool, typeof(*p->links_inst), dlink_schd); \
> })
>
> +static inline void rtw89_tx_wait_release(struct rtw89_tx_wait_info *wait)
> +{
> + dev_kfree_skb_any(wait->skb);
> + kfree_rcu(wait, rcu_head);
> +}
> +
> +static inline void rtw89_tx_wait_list_clear(struct rtw89_dev *rtwdev)
> +{
> + struct rtw89_tx_wait_info *wait, *tmp;
> +
> + lockdep_assert_wiphy(rtwdev->hw->wiphy);
> +
> + list_for_each_entry_safe(wait, tmp, &rtwdev->tx_waits, list) {
> + if (!wait_for_completion_timeout(&wait->completion,
> + RTW89_TX_WAIT_DEFAULT_TIMEOUT))
> + continue;
Why should we wait 10ms? Just try_wait_for_completion()?
Since TX completion might be missing (rtw89_core_stop(), for example),
shouldn't we unconditionally free all in wait list for that case?
> + list_del(&wait->list);
> + rtw89_tx_wait_release(wait);
> + }
> +}
> +
> static inline int rtw89_hci_tx_write(struct rtw89_dev *rtwdev,
> struct rtw89_core_tx_request *tx_req)
> {
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH rtw v4 1/4] wifi: rtw89: fix use-after-free in rtw89_core_tx_kick_off_and_wait()
2025-09-18 4:00 ` Ping-Ke Shih
@ 2025-09-18 4:40 ` Zong-Zhe Yang
2025-09-18 5:23 ` Ping-Ke Shih
0 siblings, 1 reply; 20+ messages in thread
From: Zong-Zhe Yang @ 2025-09-18 4:40 UTC (permalink / raw)
To: Ping-Ke Shih, Fedor Pchelkin
Cc: Bitterblue Smith, Bernie Huang, linux-wireless@vger.kernel.org,
linux-kernel@vger.kernel.org, lvc-project@linuxtesting.org,
stable@vger.kernel.org
Ping-Ke Shih <pkshih@realtek.com> wrote:
>
> Fedor Pchelkin <pchelkin@ispras.ru> wrote:
>
> [...]
>
> > @@ -6181,6 +6187,27 @@ rtw89_assoc_link_rcu_dereference(struct rtw89_dev *rtwdev,
> u8 macid)
> > list_first_entry_or_null(&p->dlink_pool,
> > typeof(*p->links_inst), dlink_schd); \
> > })
> >
> > +static inline void rtw89_tx_wait_release(struct rtw89_tx_wait_info
> > +*wait) {
> > + dev_kfree_skb_any(wait->skb);
> > + kfree_rcu(wait, rcu_head);
> > +}
> > +
> > +static inline void rtw89_tx_wait_list_clear(struct rtw89_dev *rtwdev)
> > +{
> > + struct rtw89_tx_wait_info *wait, *tmp;
> > +
> > + lockdep_assert_wiphy(rtwdev->hw->wiphy);
> > +
> > + list_for_each_entry_safe(wait, tmp, &rtwdev->tx_waits, list) {
> > + if (!wait_for_completion_timeout(&wait->completion,
> > +
> RTW89_TX_WAIT_DEFAULT_TIMEOUT))
> > + continue;
>
>
> Why should we wait 10ms? Just try_wait_for_completion()?
>
> Since TX completion might be missing (rtw89_core_stop(), for example), shouldn't we
> unconditionally free all in wait list for that case?
>
In hci reset (when we release pending skb), the condition will become true.
So, all left will be freed at that time. Before that, there is no logic to ensure no
more completing side, so it cannot be unconditionally freed unless we don't
want to double check if those, which timed out, are done at some moment.
(e.g. core stop will do hci reset)
>
> > + list_del(&wait->list);
> > + rtw89_tx_wait_release(wait);
> > + }
> > +}
> > +
> > static inline int rtw89_hci_tx_write(struct rtw89_dev *rtwdev,
> > struct rtw89_core_tx_request *tx_req)
> > {
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH rtw v4 1/4] wifi: rtw89: fix use-after-free in rtw89_core_tx_kick_off_and_wait()
2025-09-18 4:40 ` Zong-Zhe Yang
@ 2025-09-18 5:23 ` Ping-Ke Shih
2025-09-18 13:34 ` Fedor Pchelkin
0 siblings, 1 reply; 20+ messages in thread
From: Ping-Ke Shih @ 2025-09-18 5:23 UTC (permalink / raw)
To: Zong-Zhe Yang, Fedor Pchelkin
Cc: Bitterblue Smith, Bernie Huang, linux-wireless@vger.kernel.org,
linux-kernel@vger.kernel.org, lvc-project@linuxtesting.org,
stable@vger.kernel.org
Zong-Zhe Yang <kevin_yang@realtek.com> wrote:
> Ping-Ke Shih <pkshih@realtek.com> wrote:
> >
> > Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> >
> > [...]
> >
> > > @@ -6181,6 +6187,27 @@ rtw89_assoc_link_rcu_dereference(struct rtw89_dev *rtwdev,
> > u8 macid)
> > > list_first_entry_or_null(&p->dlink_pool,
> > > typeof(*p->links_inst), dlink_schd); \
> > > })
> > >
> > > +static inline void rtw89_tx_wait_release(struct rtw89_tx_wait_info
> > > +*wait) {
> > > + dev_kfree_skb_any(wait->skb);
> > > + kfree_rcu(wait, rcu_head);
> > > +}
> > > +
> > > +static inline void rtw89_tx_wait_list_clear(struct rtw89_dev *rtwdev)
> > > +{
> > > + struct rtw89_tx_wait_info *wait, *tmp;
> > > +
> > > + lockdep_assert_wiphy(rtwdev->hw->wiphy);
> > > +
> > > + list_for_each_entry_safe(wait, tmp, &rtwdev->tx_waits, list) {
> > > + if (!wait_for_completion_timeout(&wait->completion,
> > > +
> > RTW89_TX_WAIT_DEFAULT_TIMEOUT))
> > > + continue;
> >
> >
> > Why should we wait 10ms? Just try_wait_for_completion()?
> >
> > Since TX completion might be missing (rtw89_core_stop(), for example), shouldn't we
> > unconditionally free all in wait list for that case?
> >
>
> In hci reset (when we release pending skb), the condition will become true.
> So, all left will be freed at that time. Before that, there is no logic to ensure no
> more completing side, so it cannot be unconditionally freed unless we don't
> want to double check if those, which timed out, are done at some moment.
>
> (e.g. core stop will do hci reset)
Thanks for the explanation.
Just consider try_wait_for_completion() then.
By the way, if want a delay for timeout case, use delayed work for tx_wait_work
instead.
>
> >
> > > + list_del(&wait->list);
> > > + rtw89_tx_wait_release(wait);
> > > + }
> > > +}
> > > +
> > > static inline int rtw89_hci_tx_write(struct rtw89_dev *rtwdev,
> > > struct rtw89_core_tx_request *tx_req)
> > > {
> >
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH rtw v4 2/4] wifi: rtw89: fix tx_wait initialization race
2025-09-17 9:52 ` [PATCH rtw v4 2/4] wifi: rtw89: fix tx_wait initialization race Fedor Pchelkin
@ 2025-09-18 5:47 ` Ping-Ke Shih
2025-09-18 15:19 ` Fedor Pchelkin
0 siblings, 1 reply; 20+ messages in thread
From: Ping-Ke Shih @ 2025-09-18 5:47 UTC (permalink / raw)
To: Fedor Pchelkin, Zong-Zhe Yang
Cc: Bitterblue Smith, Bernie Huang, linux-wireless@vger.kernel.org,
linux-kernel@vger.kernel.org, lvc-project@linuxtesting.org,
stable@vger.kernel.org
Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> Now that nullfunc skbs are recycled in a separate work item in the driver,
> the following race during initialization and processing of those skbs
> might lead to noticeable bugs:
>
> Waiting thread Completing thread
>
> rtw89_core_send_nullfunc()
> rtw89_core_tx_write_link()
> ...
> rtw89_pci_txwd_submit()
> skb_data->wait = NULL
> /* add skb to the queue */
> skb_queue_tail(&txwd->queue, skb)
> rtw89_pci_napi_poll()
> ...
> rtw89_pci_release_txwd_skb()
> /* get skb from the queue */
> skb_unlink(skb, &txwd->queue)
> rtw89_pci_tx_status()
> rtw89_core_tx_wait_complete()
> /* use incorrect skb_data->wait */
> rtw89_core_tx_kick_off_and_wait()
> /* assign skb_data->wait but too late */
How will we receive tx completion before TX kick off?
(see the original code below)
[...]
> diff --git a/drivers/net/wireless/realtek/rtw89/core.c b/drivers/net/wireless/realtek/rtw89/core.c
> index 438930b65631..1efe4bb09262 100644
> --- a/drivers/net/wireless/realtek/rtw89/core.c
> +++ b/drivers/net/wireless/realtek/rtw89/core.c
> @@ -1094,22 +1094,13 @@ int rtw89_core_tx_kick_off_and_wait(struct rtw89_dev *rtwdev, struct sk_buff *sk
> int qsel, unsigned int timeout)
> {
> struct rtw89_tx_skb_data *skb_data = RTW89_TX_SKB_CB(skb);
> - struct rtw89_tx_wait_info *wait;
> + struct rtw89_tx_wait_info *wait = wiphy_dereference(rtwdev->hw->wiphy,
> + skb_data->wait);
Can't we just pass 'wait' by function argument?
> unsigned long time_left;
> int ret = 0;
>
> lockdep_assert_wiphy(rtwdev->hw->wiphy);
>
> - wait = kzalloc(sizeof(*wait), GFP_KERNEL);
> - if (!wait) {
> - rtw89_core_tx_kick_off(rtwdev, qsel);
> - return 0;
> - }
> -
> - init_completion(&wait->completion);
> - wait->skb = skb;
> - rcu_assign_pointer(skb_data->wait, wait);
> -
Here, original code prepares completion before TX kick off. How it could
be a problem? Do I miss something?
> rtw89_core_tx_kick_off(rtwdev, qsel);
> time_left = wait_for_completion_timeout(&wait->completion,
> msecs_to_jiffies(timeout));
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH rtw v4 3/4] wifi: rtw89: fix leak in rtw89_core_send_nullfunc()
2025-09-17 9:52 ` [PATCH rtw v4 3/4] wifi: rtw89: fix leak in rtw89_core_send_nullfunc() Fedor Pchelkin
@ 2025-09-18 5:48 ` Ping-Ke Shih
0 siblings, 0 replies; 20+ messages in thread
From: Ping-Ke Shih @ 2025-09-18 5:48 UTC (permalink / raw)
To: Fedor Pchelkin, Zong-Zhe Yang
Cc: Bitterblue Smith, Bernie Huang, linux-wireless@vger.kernel.org,
linux-kernel@vger.kernel.org, lvc-project@linuxtesting.org
Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> If there is no rtwsta_link found in rtw89_core_send_nullfunc(), allocated
> skb is leaked. Free it on the error handling path.
>
> Found by Linux Verification Center (linuxtesting.org).
>
> Fixes: a8ba4acab7db ("wifi: rtw89: send nullfunc based on the given link")
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH rtw v4 4/4] wifi: rtw89: avoid circular locking dependency in ser_state_run()
2025-09-17 9:52 ` [PATCH rtw v4 4/4] wifi: rtw89: avoid circular locking dependency in ser_state_run() Fedor Pchelkin
@ 2025-09-18 5:52 ` Ping-Ke Shih
2025-09-18 15:30 ` Fedor Pchelkin
0 siblings, 1 reply; 20+ messages in thread
From: Ping-Ke Shih @ 2025-09-18 5:52 UTC (permalink / raw)
To: Fedor Pchelkin, Zong-Zhe Yang
Cc: Bitterblue Smith, Bernie Huang, linux-wireless@vger.kernel.org,
linux-kernel@vger.kernel.org, lvc-project@linuxtesting.org
Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> Lockdep gives a splat [1] when ser_hdl_work item is executed. It is
> scheduled at mac80211 workqueue via ieee80211_queue_work() and takes a
> wiphy lock inside. However, this workqueue can be flushed when e.g.
> closing the interface and wiphy lock is already taken in that case.
>
> Choosing wiphy_work_queue() for SER is likely not suitable. Back on to
> the global workqueue.
>
> [1]:
>
> WARNING: possible circular locking dependency detected
> 6.17.0-rc2 #17 Not tainted
> ------------------------------------------------------
> kworker/u32:1/61 is trying to acquire lock:
> ffff88811bc00768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: ser_state_run+0x5e/0x180 [rtw89_core]
>
> but task is already holding lock:
> ffffc9000048fd30 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}, at:
> process_one_work+0x7b5/0x1450
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}:
> process_one_work+0x7c6/0x1450
> worker_thread+0x49e/0xd00
> kthread+0x313/0x640
> ret_from_fork+0x221/0x300
> ret_from_fork_asm+0x1a/0x30
>
> -> #1 ((wq_completion)phy0){+.+.}-{0:0}:
> touch_wq_lockdep_map+0x8e/0x180
> __flush_workqueue+0x129/0x10d0
> ieee80211_stop_device+0xa8/0x110
> ieee80211_do_stop+0x14ce/0x2880
> ieee80211_stop+0x13a/0x2c0
> __dev_close_many+0x18f/0x510
> __dev_change_flags+0x25f/0x670
> netif_change_flags+0x7b/0x160
> do_setlink.isra.0+0x1640/0x35d0
> rtnl_newlink+0xd8c/0x1d30
> rtnetlink_rcv_msg+0x700/0xb80
> netlink_rcv_skb+0x11d/0x350
> netlink_unicast+0x49a/0x7a0
> netlink_sendmsg+0x759/0xc20
> ____sys_sendmsg+0x812/0xa00
> ___sys_sendmsg+0xf7/0x180
> __sys_sendmsg+0x11f/0x1b0
> do_syscall_64+0xbb/0x360
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> -> #0 (&rdev->wiphy.mtx){+.+.}-{4:4}:
> __lock_acquire+0x124c/0x1d20
> lock_acquire+0x154/0x2e0
> __mutex_lock+0x17b/0x12f0
> ser_state_run+0x5e/0x180 [rtw89_core]
> rtw89_ser_hdl_work+0x119/0x220 [rtw89_core]
> process_one_work+0x82d/0x1450
> worker_thread+0x49e/0xd00
> kthread+0x313/0x640
> ret_from_fork+0x221/0x300
> ret_from_fork_asm+0x1a/0x30
>
> other info that might help us debug this:
>
> Chain exists of:
> &rdev->wiphy.mtx --> (wq_completion)phy0 --> (work_completion)(&ser->ser_hdl_work)
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock((work_completion)(&ser->ser_hdl_work));
> lock((wq_completion)phy0);
> lock((work_completion)(&ser->ser_hdl_work));
> lock(&rdev->wiphy.mtx);
>
> *** DEADLOCK ***
>
> 2 locks held by kworker/u32:1/61:
> #0: ffff888103835148 ((wq_completion)phy0){+.+.}-{0:0}, at: process_one_work+0xefa/0x1450
> #1: ffffc9000048fd30 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}, at:
> process_one_work+0x7b5/0x1450
>
> stack backtrace:
> CPU: 0 UID: 0 PID: 61 Comm: kworker/u32:1 Not tainted 6.17.0-rc2 #17 PREEMPT(voluntary)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS edk2-20250523-14.fc42 05/23/2025
> Workqueue: phy0 rtw89_ser_hdl_work [rtw89_core]
> Call Trace:
> <TASK>
> dump_stack_lvl+0x5d/0x80
> print_circular_bug.cold+0x178/0x1be
> check_noncircular+0x14c/0x170
> __lock_acquire+0x124c/0x1d20
> lock_acquire+0x154/0x2e0
> __mutex_lock+0x17b/0x12f0
> ser_state_run+0x5e/0x180 [rtw89_core]
> rtw89_ser_hdl_work+0x119/0x220 [rtw89_core]
> process_one_work+0x82d/0x1450
> worker_thread+0x49e/0xd00
> kthread+0x313/0x640
> ret_from_fork+0x221/0x300
> ret_from_fork_asm+0x1a/0x30
> </TASK>
>
> Found by Linux Verification Center (linuxtesting.org).
>
> Fixes: ebfc9199df05 ("wifi: rtw89: add wiphy_lock() to work that isn't held wiphy_lock() yet")
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
By the way, you mark this patchset as 'rtw'. Does it mean this patchset is
urgent to you? If not, it will be more smooth (avoid possible merge conflict)
if it goes via 'rtw-next'. Let me know your preference.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH rtw v4 1/4] wifi: rtw89: fix use-after-free in rtw89_core_tx_kick_off_and_wait()
2025-09-18 5:23 ` Ping-Ke Shih
@ 2025-09-18 13:34 ` Fedor Pchelkin
2025-09-19 0:27 ` Ping-Ke Shih
0 siblings, 1 reply; 20+ messages in thread
From: Fedor Pchelkin @ 2025-09-18 13:34 UTC (permalink / raw)
To: Ping-Ke Shih
Cc: Zong-Zhe Yang, Bitterblue Smith, Bernie Huang,
linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
lvc-project@linuxtesting.org, stable@vger.kernel.org
On Thu, 18. Sep 05:23, Ping-Ke Shih wrote:
> Zong-Zhe Yang <kevin_yang@realtek.com> wrote:
> > Ping-Ke Shih <pkshih@realtek.com> wrote:
> > >
> > > Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> > >
> > > [...]
> > >
> > > > @@ -6181,6 +6187,27 @@ rtw89_assoc_link_rcu_dereference(struct rtw89_dev *rtwdev,
> > > u8 macid)
> > > > list_first_entry_or_null(&p->dlink_pool,
> > > > typeof(*p->links_inst), dlink_schd); \
> > > > })
> > > >
> > > > +static inline void rtw89_tx_wait_release(struct rtw89_tx_wait_info
> > > > +*wait) {
> > > > + dev_kfree_skb_any(wait->skb);
> > > > + kfree_rcu(wait, rcu_head);
> > > > +}
> > > > +
> > > > +static inline void rtw89_tx_wait_list_clear(struct rtw89_dev *rtwdev)
> > > > +{
> > > > + struct rtw89_tx_wait_info *wait, *tmp;
> > > > +
> > > > + lockdep_assert_wiphy(rtwdev->hw->wiphy);
> > > > +
> > > > + list_for_each_entry_safe(wait, tmp, &rtwdev->tx_waits, list) {
> > > > + if (!wait_for_completion_timeout(&wait->completion,
> > > > +
> > > RTW89_TX_WAIT_DEFAULT_TIMEOUT))
> > > > + continue;
> > >
> > >
> > > Why should we wait 10ms? Just try_wait_for_completion()?
> > >
> > > Since TX completion might be missing (rtw89_core_stop(), for example), shouldn't we
> > > unconditionally free all in wait list for that case?
> > >
> >
> > In hci reset (when we release pending skb), the condition will become true.
> > So, all left will be freed at that time. Before that, there is no logic to ensure no
> > more completing side, so it cannot be unconditionally freed unless we don't
> > want to double check if those, which timed out, are done at some moment.
> >
> > (e.g. core stop will do hci reset)
>
> Thanks for the explanation.
>
> Just consider try_wait_for_completion() then.
OK. completion_done() looks appropriate here as well.
>
> By the way, if want a delay for timeout case, use delayed work for tx_wait_work
> instead.
That makes sense, thanks. So the next time I'll go with delayed
tx_wait_work performing completion_done(): work delay 500 ms, looks
neither too small nor too big for freeing potentially timed out items.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH rtw v4 2/4] wifi: rtw89: fix tx_wait initialization race
2025-09-18 5:47 ` Ping-Ke Shih
@ 2025-09-18 15:19 ` Fedor Pchelkin
2025-09-19 0:34 ` Ping-Ke Shih
0 siblings, 1 reply; 20+ messages in thread
From: Fedor Pchelkin @ 2025-09-18 15:19 UTC (permalink / raw)
To: Ping-Ke Shih
Cc: Zong-Zhe Yang, Bitterblue Smith, Bernie Huang,
linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
lvc-project@linuxtesting.org, stable@vger.kernel.org
On Thu, 18. Sep 05:47, Ping-Ke Shih wrote:
> Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> > @@ -1094,22 +1094,13 @@ int rtw89_core_tx_kick_off_and_wait(struct rtw89_dev *rtwdev, struct sk_buff *sk
> > int qsel, unsigned int timeout)
> > {
> > struct rtw89_tx_skb_data *skb_data = RTW89_TX_SKB_CB(skb);
> > - struct rtw89_tx_wait_info *wait;
> > + struct rtw89_tx_wait_info *wait = wiphy_dereference(rtwdev->hw->wiphy,
> > + skb_data->wait);
>
> Can't we just pass 'wait' by function argument?
Yep.
>
> > unsigned long time_left;
> > int ret = 0;
> >
> > lockdep_assert_wiphy(rtwdev->hw->wiphy);
> >
> > - wait = kzalloc(sizeof(*wait), GFP_KERNEL);
> > - if (!wait) {
> > - rtw89_core_tx_kick_off(rtwdev, qsel);
> > - return 0;
> > - }
> > -
> > - init_completion(&wait->completion);
> > - wait->skb = skb;
> > - rcu_assign_pointer(skb_data->wait, wait);
> > -
>
> Here, original code prepares completion before TX kick off. How it could
> be a problem? Do I miss something?
That's a good question and it made me rethink the cause of the race
scenario. I didn't initially take TX kick off into consideration when
it actually matters.
The thing is: there might have been another thread initiating TX kick off
for the same queue in parallel. But no such thread exists because a taken
wiphy lock generally protects from such situations. rtw89_core_txq_schedule()
worker looks like a good candidate but it doesn't operate on the needed
management queues.
So I may conclude this patch doesn't fix any real bug though I'd prefer to
keep it (with description rewritten of course) because it helps to avoid
potential issues in future.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH rtw v4 4/4] wifi: rtw89: avoid circular locking dependency in ser_state_run()
2025-09-18 5:52 ` Ping-Ke Shih
@ 2025-09-18 15:30 ` Fedor Pchelkin
2025-09-19 0:46 ` Ping-Ke Shih
0 siblings, 1 reply; 20+ messages in thread
From: Fedor Pchelkin @ 2025-09-18 15:30 UTC (permalink / raw)
To: Ping-Ke Shih
Cc: Zong-Zhe Yang, Bitterblue Smith, Bernie Huang,
linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
lvc-project@linuxtesting.org
On Thu, 18. Sep 05:52, Ping-Ke Shih wrote:
> Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> By the way, you mark this patchset as 'rtw'. Does it mean this patchset is
> urgent to you? If not, it will be more smooth (avoid possible merge conflict)
> if it goes via 'rtw-next'. Let me know your preference.
The first patch of the series is rather urgent compared to the others
because it addresses the issue occasionally banging on a working system.
The other ones are less urgent.
TBH I'm not aware of your development process in details. It's v6.17-rc6
at the moment. If I target all patches at rtw-next, are they to land in
upcoming merge window for v6.18 release (a couple of weeks from now)?
If yes then no hurries from me, rtw-next is ok.
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH rtw v4 1/4] wifi: rtw89: fix use-after-free in rtw89_core_tx_kick_off_and_wait()
2025-09-18 13:34 ` Fedor Pchelkin
@ 2025-09-19 0:27 ` Ping-Ke Shih
0 siblings, 0 replies; 20+ messages in thread
From: Ping-Ke Shih @ 2025-09-19 0:27 UTC (permalink / raw)
To: Fedor Pchelkin
Cc: Zong-Zhe Yang, Bitterblue Smith, Bernie Huang,
linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
lvc-project@linuxtesting.org, stable@vger.kernel.org
Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> On Thu, 18. Sep 05:23, Ping-Ke Shih wrote:
> > Zong-Zhe Yang <kevin_yang@realtek.com> wrote:
> > > Ping-Ke Shih <pkshih@realtek.com> wrote:
> > > >
> > > > Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> > > >
> > > > [...]
> > > >
> > > > > @@ -6181,6 +6187,27 @@ rtw89_assoc_link_rcu_dereference(struct rtw89_dev *rtwdev,
> > > > u8 macid)
> > > > > list_first_entry_or_null(&p->dlink_pool,
> > > > > typeof(*p->links_inst), dlink_schd); \
> > > > > })
> > > > >
> > > > > +static inline void rtw89_tx_wait_release(struct rtw89_tx_wait_info
> > > > > +*wait) {
> > > > > + dev_kfree_skb_any(wait->skb);
> > > > > + kfree_rcu(wait, rcu_head);
> > > > > +}
> > > > > +
> > > > > +static inline void rtw89_tx_wait_list_clear(struct rtw89_dev *rtwdev)
> > > > > +{
> > > > > + struct rtw89_tx_wait_info *wait, *tmp;
> > > > > +
> > > > > + lockdep_assert_wiphy(rtwdev->hw->wiphy);
> > > > > +
> > > > > + list_for_each_entry_safe(wait, tmp, &rtwdev->tx_waits, list) {
> > > > > + if (!wait_for_completion_timeout(&wait->completion,
> > > > > +
> > > > RTW89_TX_WAIT_DEFAULT_TIMEOUT))
> > > > > + continue;
> > > >
> > > >
> > > > Why should we wait 10ms? Just try_wait_for_completion()?
> > > >
> > > > Since TX completion might be missing (rtw89_core_stop(), for example), shouldn't we
> > > > unconditionally free all in wait list for that case?
> > > >
> > >
> > > In hci reset (when we release pending skb), the condition will become true.
> > > So, all left will be freed at that time. Before that, there is no logic to ensure no
> > > more completing side, so it cannot be unconditionally freed unless we don't
> > > want to double check if those, which timed out, are done at some moment.
> > >
> > > (e.g. core stop will do hci reset)
> >
> > Thanks for the explanation.
> >
> > Just consider try_wait_for_completion() then.
>
> OK. completion_done() looks appropriate here as well.
>
> >
> > By the way, if want a delay for timeout case, use delayed work for tx_wait_work
> > instead.
>
> That makes sense, thanks. So the next time I'll go with delayed
> tx_wait_work performing completion_done(): work delay 500 ms, looks
> neither too small nor too big for freeing potentially timed out items.
Both look reasonable to me.
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH rtw v4 2/4] wifi: rtw89: fix tx_wait initialization race
2025-09-18 15:19 ` Fedor Pchelkin
@ 2025-09-19 0:34 ` Ping-Ke Shih
2025-09-19 0:50 ` Ping-Ke Shih
0 siblings, 1 reply; 20+ messages in thread
From: Ping-Ke Shih @ 2025-09-19 0:34 UTC (permalink / raw)
To: Fedor Pchelkin
Cc: Zong-Zhe Yang, Bitterblue Smith, Bernie Huang,
linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
lvc-project@linuxtesting.org, stable@vger.kernel.org
Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> On Thu, 18. Sep 05:47, Ping-Ke Shih wrote:
> > Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> > > @@ -1094,22 +1094,13 @@ int rtw89_core_tx_kick_off_and_wait(struct rtw89_dev *rtwdev, struct sk_buff
> *sk
> > > int qsel, unsigned int timeout)
> > > {
> > > struct rtw89_tx_skb_data *skb_data = RTW89_TX_SKB_CB(skb);
> > > - struct rtw89_tx_wait_info *wait;
> > > + struct rtw89_tx_wait_info *wait = wiphy_dereference(rtwdev->hw->wiphy,
> > > + skb_data->wait);
> >
> > Can't we just pass 'wait' by function argument?
>
> Yep.
>
> >
> > > unsigned long time_left;
> > > int ret = 0;
> > >
> > > lockdep_assert_wiphy(rtwdev->hw->wiphy);
> > >
> > > - wait = kzalloc(sizeof(*wait), GFP_KERNEL);
> > > - if (!wait) {
> > > - rtw89_core_tx_kick_off(rtwdev, qsel);
> > > - return 0;
> > > - }
> > > -
> > > - init_completion(&wait->completion);
> > > - wait->skb = skb;
> > > - rcu_assign_pointer(skb_data->wait, wait);
> > > -
> >
> > Here, original code prepares completion before TX kick off. How it could
> > be a problem? Do I miss something?
>
> That's a good question and it made me rethink the cause of the race
> scenario. I didn't initially take TX kick off into consideration when
> it actually matters.
Do it mean that you pictured the racing scenario in commit message by
code review instead of a real case you met?
>
> The thing is: there might have been another thread initiating TX kick off
> for the same queue in parallel. But no such thread exists because a taken
> wiphy lock generally protects from such situations. rtw89_core_txq_schedule()
> worker looks like a good candidate but it doesn't operate on the needed
> management queues.
Last night I also thought if another thread works in parallel.
Maybe rtw89_ops_tx() could be?
>
> So I may conclude this patch doesn't fix any real bug though I'd prefer to
> keep it (with description rewritten of course) because it helps to avoid
> potential issues in future.
Agree.
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH rtw v4 4/4] wifi: rtw89: avoid circular locking dependency in ser_state_run()
2025-09-18 15:30 ` Fedor Pchelkin
@ 2025-09-19 0:46 ` Ping-Ke Shih
2025-09-19 11:00 ` Fedor Pchelkin
0 siblings, 1 reply; 20+ messages in thread
From: Ping-Ke Shih @ 2025-09-19 0:46 UTC (permalink / raw)
To: Fedor Pchelkin
Cc: Zong-Zhe Yang, Bitterblue Smith, Bernie Huang,
linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
lvc-project@linuxtesting.org
Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> On Thu, 18. Sep 05:52, Ping-Ke Shih wrote:
> > Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> > By the way, you mark this patchset as 'rtw'. Does it mean this patchset is
> > urgent to you? If not, it will be more smooth (avoid possible merge conflict)
> > if it goes via 'rtw-next'. Let me know your preference.
>
> The first patch of the series is rather urgent compared to the others
> because it addresses the issue occasionally banging on a working system.
> The other ones are less urgent.
>
> TBH I'm not aware of your development process in details. It's v6.17-rc6
> at the moment. If I target all patches at rtw-next, are they to land in
> upcoming merge window for v6.18 release (a couple of weeks from now)?
> If yes then no hurries from me, rtw-next is ok.
It's v6.17-rc6 (-rc eve), so I don't plan to send a pull-request.
Originally I plan to send the last pull-request to v6.18 today, so I did
review this patchset yesterday to see if I can merge it before sending.
Since only two or three minor changes are needed, I can wait a while and
send the pull-request next Monday if you can re-spin the patchset this
weekend.
If not, I can still merge this patchset in v6.18-rc cycle to rtw tree.
However, this might cause merge conflict with -next, so I don't prefer
this. Upper maintainers need to spend extra time to resolve conflicts.
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH rtw v4 2/4] wifi: rtw89: fix tx_wait initialization race
2025-09-19 0:34 ` Ping-Ke Shih
@ 2025-09-19 0:50 ` Ping-Ke Shih
2025-09-19 7:46 ` Fedor Pchelkin
0 siblings, 1 reply; 20+ messages in thread
From: Ping-Ke Shih @ 2025-09-19 0:50 UTC (permalink / raw)
To: Fedor Pchelkin
Cc: Zong-Zhe Yang, Bitterblue Smith, Bernie Huang,
linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
lvc-project@linuxtesting.org, stable@vger.kernel.org
Ping-Ke Shih <pkshih@realtek.com> wrote:
> Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> > On Thu, 18. Sep 05:47, Ping-Ke Shih wrote:
> > > Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> > > > @@ -1094,22 +1094,13 @@ int rtw89_core_tx_kick_off_and_wait(struct rtw89_dev *rtwdev, struct sk_buff
> > *sk
> > > > int qsel, unsigned int timeout)
> > > > {
> > > > struct rtw89_tx_skb_data *skb_data = RTW89_TX_SKB_CB(skb);
> > > > - struct rtw89_tx_wait_info *wait;
> > > > + struct rtw89_tx_wait_info *wait = wiphy_dereference(rtwdev->hw->wiphy,
> > > > + skb_data->wait);
> > >
> > > Can't we just pass 'wait' by function argument?
> >
> > Yep.
> >
> > >
> > > > unsigned long time_left;
> > > > int ret = 0;
> > > >
> > > > lockdep_assert_wiphy(rtwdev->hw->wiphy);
> > > >
> > > > - wait = kzalloc(sizeof(*wait), GFP_KERNEL);
> > > > - if (!wait) {
> > > > - rtw89_core_tx_kick_off(rtwdev, qsel);
> > > > - return 0;
> > > > - }
> > > > -
> > > > - init_completion(&wait->completion);
> > > > - wait->skb = skb;
> > > > - rcu_assign_pointer(skb_data->wait, wait);
> > > > -
> > >
> > > Here, original code prepares completion before TX kick off. How it could
> > > be a problem? Do I miss something?
> >
> > That's a good question and it made me rethink the cause of the race
> > scenario. I didn't initially take TX kick off into consideration when
> > it actually matters.
>
> Do it mean that you pictured the racing scenario in commit message by
> code review instead of a real case you met?
>
> >
> > The thing is: there might have been another thread initiating TX kick off
> > for the same queue in parallel. But no such thread exists because a taken
> > wiphy lock generally protects from such situations. rtw89_core_txq_schedule()
> > worker looks like a good candidate but it doesn't operate on the needed
> > management queues.
>
> Last night I also thought if another thread works in parallel.
> Maybe rtw89_ops_tx() could be?
>
> >
> > So I may conclude this patch doesn't fix any real bug though I'd prefer to
> > keep it (with description rewritten of course) because it helps to avoid
> > potential issues in future.
>
> Agree.
>
Forgot to say. Could you mention this racing scenario was found by core
review and your perspective in commit message?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH rtw v4 2/4] wifi: rtw89: fix tx_wait initialization race
2025-09-19 0:50 ` Ping-Ke Shih
@ 2025-09-19 7:46 ` Fedor Pchelkin
0 siblings, 0 replies; 20+ messages in thread
From: Fedor Pchelkin @ 2025-09-19 7:46 UTC (permalink / raw)
To: Ping-Ke Shih
Cc: Zong-Zhe Yang, Bitterblue Smith, Bernie Huang,
linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
lvc-project@linuxtesting.org, stable@vger.kernel.org
On Fri, 19. Sep 00:50, Ping-Ke Shih wrote:
> Ping-Ke Shih <pkshih@realtek.com> wrote:
> > Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> > > That's a good question and it made me rethink the cause of the race
> > > scenario. I didn't initially take TX kick off into consideration when
> > > it actually matters.
> >
> > Do it mean that you pictured the racing scenario in commit message by
> > code review instead of a real case you met?
Yes, the underlying issue for this patch was found by code review only.
Somehow the negative consequences of the potential race became an "obvious"
thing after preparing the first commit, and ignorance of TX kick off
influence made the changelog confusing..
> >
> > >
> > > The thing is: there might have been another thread initiating TX kick off
> > > for the same queue in parallel. But no such thread exists because a taken
> > > wiphy lock generally protects from such situations. rtw89_core_txq_schedule()
> > > worker looks like a good candidate but it doesn't operate on the needed
> > > management queues.
> >
> > Last night I also thought if another thread works in parallel.
> > Maybe rtw89_ops_tx() could be?
Well, probably it could. I thought rtw89_ops_tx() is wiphy locked, too,
but apparently it's not always the case.
Not that it's a relatively easy-to-hit race I'm going to try to reproduce
though :)
> >
> > >
> > > So I may conclude this patch doesn't fix any real bug though I'd prefer to
> > > keep it (with description rewritten of course) because it helps to avoid
> > > potential issues in future.
> >
> > Agree.
> >
>
> Forgot to say. Could you mention this racing scenario was found by core
> review and your perspective in commit message?
Sure.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH rtw v4 4/4] wifi: rtw89: avoid circular locking dependency in ser_state_run()
2025-09-19 0:46 ` Ping-Ke Shih
@ 2025-09-19 11:00 ` Fedor Pchelkin
0 siblings, 0 replies; 20+ messages in thread
From: Fedor Pchelkin @ 2025-09-19 11:00 UTC (permalink / raw)
To: Ping-Ke Shih
Cc: Zong-Zhe Yang, Bitterblue Smith, Bernie Huang,
linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
lvc-project@linuxtesting.org
On Fri, 19. Sep 00:46, Ping-Ke Shih wrote:
> Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> > On Thu, 18. Sep 05:52, Ping-Ke Shih wrote:
> > > Fedor Pchelkin <pchelkin@ispras.ru> wrote:
> > > By the way, you mark this patchset as 'rtw'. Does it mean this patchset is
> > > urgent to you? If not, it will be more smooth (avoid possible merge conflict)
> > > if it goes via 'rtw-next'. Let me know your preference.
> >
> > The first patch of the series is rather urgent compared to the others
> > because it addresses the issue occasionally banging on a working system.
> > The other ones are less urgent.
> >
> > TBH I'm not aware of your development process in details. It's v6.17-rc6
> > at the moment. If I target all patches at rtw-next, are they to land in
> > upcoming merge window for v6.18 release (a couple of weeks from now)?
> > If yes then no hurries from me, rtw-next is ok.
>
> It's v6.17-rc6 (-rc eve), so I don't plan to send a pull-request.
>
> Originally I plan to send the last pull-request to v6.18 today, so I did
> review this patchset yesterday to see if I can merge it before sending.
> Since only two or three minor changes are needed, I can wait a while and
> send the pull-request next Monday if you can re-spin the patchset this
> weekend.
>
> If not, I can still merge this patchset in v6.18-rc cycle to rtw tree.
> However, this might cause merge conflict with -next, so I don't prefer
> this. Upper maintainers need to spend extra time to resolve conflicts.
One thing that I forgot to mention is about rtw89 USB part.
"BUG: KFENCE: use-after-free write in rtw89_core_tx_kick_off_and_wait"
which is fixed by the first patch of the current series is reproduced
reliably with USB HCI because there is no TX completion there yet, i.e.
rtw89_core_tx_kick_off_and_wait() always exits with a timeout and touches
skb parts which are most probably already freed by the call to
ieee80211_tx_status_irqsafe() from URB completion callback. I've got
a dongle now and confirm the bug. Turns out it was reported here [1] by
Bitterblue Smith as well.
[1]: https://lore.kernel.org/linux-wireless/1e5e97d4-8267-4f77-a4bf-1fe23ea40f77@gmail.com/
The first patch does avoid use-after-free bug for USB, too. But, as
there is no TX ACK completion implemented for USB yet, tx_wait_list will
be piled up with wasted items which can't be freed due to the lack of
completion. It's better than crash but still a problem.
Bitterblue suggested [2] implementing the missing TX completion parts for
USB to fix this entirely. I've got a bunch of patches for it which will
send as a separate USB-series today or tomorrow. I expect it'll require
time for review and it probably should have to be improved/reworked in
several places. Anyway I'll send it soon so you've got a more complete
picture and some time until Monday to decide how to handle it.
[2]: https://lore.kernel.org/linux-wireless/0cb4d19b-94c7-450e-ac56-8b0d4a1d889f@gmail.com/
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-09-19 11:00 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-17 9:52 [PATCH rtw v4 0/4] wifi: fixes for rtw89 Fedor Pchelkin
2025-09-17 9:52 ` [PATCH rtw v4 1/4] wifi: rtw89: fix use-after-free in rtw89_core_tx_kick_off_and_wait() Fedor Pchelkin
2025-09-18 4:00 ` Ping-Ke Shih
2025-09-18 4:40 ` Zong-Zhe Yang
2025-09-18 5:23 ` Ping-Ke Shih
2025-09-18 13:34 ` Fedor Pchelkin
2025-09-19 0:27 ` Ping-Ke Shih
2025-09-17 9:52 ` [PATCH rtw v4 2/4] wifi: rtw89: fix tx_wait initialization race Fedor Pchelkin
2025-09-18 5:47 ` Ping-Ke Shih
2025-09-18 15:19 ` Fedor Pchelkin
2025-09-19 0:34 ` Ping-Ke Shih
2025-09-19 0:50 ` Ping-Ke Shih
2025-09-19 7:46 ` Fedor Pchelkin
2025-09-17 9:52 ` [PATCH rtw v4 3/4] wifi: rtw89: fix leak in rtw89_core_send_nullfunc() Fedor Pchelkin
2025-09-18 5:48 ` Ping-Ke Shih
2025-09-17 9:52 ` [PATCH rtw v4 4/4] wifi: rtw89: avoid circular locking dependency in ser_state_run() Fedor Pchelkin
2025-09-18 5:52 ` Ping-Ke Shih
2025-09-18 15:30 ` Fedor Pchelkin
2025-09-19 0:46 ` Ping-Ke Shih
2025-09-19 11:00 ` Fedor Pchelkin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox