* [PATCH 1/2] wifi: rtw88: usb: Copy instead of cloning the RX skb
@ 2024-11-14 23:06 Bitterblue Smith
2024-11-14 23:07 ` [PATCH 2/2] wifi: rtw88: usb: Preallocate and reuse the RX skbs Bitterblue Smith
2024-11-18 9:16 ` [PATCH 1/2] wifi: rtw88: usb: Copy instead of cloning the RX skb Ping-Ke Shih
0 siblings, 2 replies; 10+ messages in thread
From: Bitterblue Smith @ 2024-11-14 23:06 UTC (permalink / raw)
To: linux-wireless@vger.kernel.org; +Cc: Ping-Ke Shih, Sascha Hauer
"iperf3 -c 192.168.0.1 -R --udp -b 0" shows about 40% of datagrams
are lost. Many torrents don't download faster than 3 MiB/s, probably
because the Bittorrent protocol uses UDP. This is somehow related to
the use of skb_clone() in the RX path.
Don't use skb_clone(). Instead allocate a new skb for each 802.11 frame
received and copy the data from the big (32768 byte) skb.
With this patch, "iperf3 -c 192.168.0.1 -R --udp -b 0" shows only 1-2%
of datagrams are lost, and torrents can reach download speeds of 36
MiB/s.
Tested with RTL8812AU and RTL8822CU.
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
---
drivers/net/wireless/realtek/rtw88/usb.c | 52 ++++++++++++++----------
1 file changed, 31 insertions(+), 21 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
index 93ac4837e1b5..727569d4ef4b 100644
--- a/drivers/net/wireless/realtek/rtw88/usb.c
+++ b/drivers/net/wireless/realtek/rtw88/usb.c
@@ -7,6 +7,7 @@
#include <linux/mutex.h>
#include "main.h"
#include "debug.h"
+#include "mac.h"
#include "reg.h"
#include "tx.h"
#include "rx.h"
@@ -546,49 +547,58 @@ static void rtw_usb_rx_handler(struct work_struct *work)
{
struct rtw_usb *rtwusb = container_of(work, struct rtw_usb, rx_work);
struct rtw_dev *rtwdev = rtwusb->rtwdev;
- const struct rtw_chip_info *chip = rtwdev->chip;
- u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
struct ieee80211_rx_status rx_status;
- u32 pkt_offset, next_pkt, urb_len;
struct rtw_rx_pkt_stat pkt_stat;
- struct sk_buff *next_skb;
+ struct sk_buff *rx_skb;
struct sk_buff *skb;
+ u32 pkt_desc_sz = rtwdev->chip->rx_pkt_desc_sz;
+ u32 max_skb_len = pkt_desc_sz + PHY_STATUS_SIZE * 8 +
+ IEEE80211_MAX_MPDU_LEN_VHT_11454;
+ u32 pkt_offset, next_pkt, skb_len;
u8 *rx_desc;
int limit;
for (limit = 0; limit < 200; limit++) {
- skb = skb_dequeue(&rtwusb->rx_queue);
- if (!skb)
+ rx_skb = skb_dequeue(&rtwusb->rx_queue);
+ if (!rx_skb)
break;
if (skb_queue_len(&rtwusb->rx_queue) >= RTW_USB_MAX_RXQ_LEN) {
dev_dbg_ratelimited(rtwdev->dev, "failed to get rx_queue, overflow\n");
- dev_kfree_skb_any(skb);
+ dev_kfree_skb_any(rx_skb);
continue;
}
- urb_len = skb->len;
+ rx_desc = rx_skb->data;
do {
- rx_desc = skb->data;
rtw_rx_query_rx_desc(rtwdev, rx_desc, &pkt_stat,
&rx_status);
pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
pkt_stat.shift;
- next_pkt = round_up(pkt_stat.pkt_len + pkt_offset, 8);
+ skb_len = pkt_stat.pkt_len + pkt_offset;
+ if (skb_len > max_skb_len) {
+ rtw_dbg(rtwdev, RTW_DBG_USB,
+ "skipping too big packet: %u\n",
+ skb_len);
+ goto skip_packet;
+ }
- if (urb_len >= next_pkt + pkt_desc_sz)
- next_skb = skb_clone(skb, GFP_KERNEL);
- else
- next_skb = NULL;
+ skb = alloc_skb(skb_len, GFP_KERNEL);
+ if (!skb) {
+ rtw_dbg(rtwdev, RTW_DBG_USB,
+ "failed to allocate RX skb of size %u\n",
+ skb_len);
+ goto skip_packet;
+ }
+
+ skb_put_data(skb, rx_desc, skb_len);
if (pkt_stat.is_c2h) {
- skb_trim(skb, pkt_stat.pkt_len + pkt_offset);
rtw_fw_c2h_cmd_rx_irqsafe(rtwdev, pkt_offset, skb);
} else {
skb_pull(skb, pkt_offset);
- skb_trim(skb, pkt_stat.pkt_len);
rtw_update_rx_freq_for_invalid(rtwdev, skb,
&rx_status,
&pkt_stat);
@@ -597,12 +607,12 @@ static void rtw_usb_rx_handler(struct work_struct *work)
ieee80211_rx_irqsafe(rtwdev->hw, skb);
}
- skb = next_skb;
- if (skb)
- skb_pull(skb, next_pkt);
+skip_packet:
+ next_pkt = round_up(skb_len, 8);
+ rx_desc += next_pkt;
+ } while (rx_desc + pkt_desc_sz < rx_skb->data + rx_skb->len);
- urb_len -= next_pkt;
- } while (skb);
+ dev_kfree_skb_any(rx_skb);
}
}
--
2.47.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] wifi: rtw88: usb: Preallocate and reuse the RX skbs
2024-11-14 23:06 [PATCH 1/2] wifi: rtw88: usb: Copy instead of cloning the RX skb Bitterblue Smith
@ 2024-11-14 23:07 ` Bitterblue Smith
2024-11-19 0:50 ` Ping-Ke Shih
2024-11-18 9:16 ` [PATCH 1/2] wifi: rtw88: usb: Copy instead of cloning the RX skb Ping-Ke Shih
1 sibling, 1 reply; 10+ messages in thread
From: Bitterblue Smith @ 2024-11-14 23:07 UTC (permalink / raw)
To: linux-wireless@vger.kernel.org; +Cc: Ping-Ke Shih, Sascha Hauer
The USB driver uses four USB Request Blocks for RX. Before submitting
one, it allocates a 32768 byte skb for the RX data. This allocation can
fail, maybe due to temporary memory fragmentation. When the allocation
fails, the corresponding URB is never submitted again. After four such
allocation failures, all RX stops because the driver is not requesting
data from the device anymore.
Don't allocate a 32768 byte skb when submitting a USB Request Block
(which happens very often). Instead preallocate 8 such skbs, and reuse
them over and over. If all 8 are busy, allocate a new one. This is
pretty rare. If the allocation fails, use a work to try again later.
When there are enough free skbs again, free the excess skbs.
Also, use WQ_BH for the RX workqueue. With a normal or high priority
workqueue the skbs are processed too slowly when the system is even a
little busy, like when opening a new page in a browser, and the driver
runs out of free skbs and allocates a lot of new ones.
Move C2H_ADAPTIVITY to the c2h workqueue instead of handling it directly
from rtw_fw_c2h_cmd_rx_irqsafe(), which runs in the RX workqueue. It
reads hardware registers, which is not "irqsafe" with USB.
This is more or less what the out-of-tree Realtek drivers do, except
they use a tasklet instead of a BH workqueue.
Tested with RTL8723DU, RTL8821AU, RTL8812AU, RTL8812BU, RTL8822CU,
RTL8811CU.
Closes: https://lore.kernel.org/linux-wireless/6e7ecb47-7ea0-433a-a19f-05f88a2edf6b@gmail.com/
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
---
drivers/net/wireless/realtek/rtw88/fw.c | 7 +--
drivers/net/wireless/realtek/rtw88/usb.c | 73 ++++++++++++++++++++----
drivers/net/wireless/realtek/rtw88/usb.h | 3 +
3 files changed, 67 insertions(+), 16 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw88/fw.c b/drivers/net/wireless/realtek/rtw88/fw.c
index e6e9946fbf44..02389b7c6876 100644
--- a/drivers/net/wireless/realtek/rtw88/fw.c
+++ b/drivers/net/wireless/realtek/rtw88/fw.c
@@ -332,6 +332,9 @@ void rtw_fw_c2h_cmd_handle(struct rtw_dev *rtwdev, struct sk_buff *skb)
case C2H_RA_RPT:
rtw_fw_ra_report_handle(rtwdev, c2h->payload, len);
break;
+ case C2H_ADAPTIVITY:
+ rtw_fw_adaptivity_result(rtwdev, c2h->payload, len);
+ break;
default:
rtw_dbg(rtwdev, RTW_DBG_FW, "C2H 0x%x isn't handled\n", c2h->id);
break;
@@ -367,10 +370,6 @@ void rtw_fw_c2h_cmd_rx_irqsafe(struct rtw_dev *rtwdev, u32 pkt_offset,
rtw_fw_scan_result(rtwdev, c2h->payload, len);
dev_kfree_skb_any(skb);
break;
- case C2H_ADAPTIVITY:
- rtw_fw_adaptivity_result(rtwdev, c2h->payload, len);
- dev_kfree_skb_any(skb);
- break;
default:
/* pass offset for further operation */
*((u32 *)skb->cb) = pkt_offset;
diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
index 727569d4ef4b..5c2dfa2ced92 100644
--- a/drivers/net/wireless/realtek/rtw88/usb.c
+++ b/drivers/net/wireless/realtek/rtw88/usb.c
@@ -585,7 +585,7 @@ static void rtw_usb_rx_handler(struct work_struct *work)
goto skip_packet;
}
- skb = alloc_skb(skb_len, GFP_KERNEL);
+ skb = alloc_skb(skb_len, GFP_ATOMIC);
if (!skb) {
rtw_dbg(rtwdev, RTW_DBG_USB,
"failed to allocate RX skb of size %u\n",
@@ -612,7 +612,11 @@ static void rtw_usb_rx_handler(struct work_struct *work)
rx_desc += next_pkt;
} while (rx_desc + pkt_desc_sz < rx_skb->data + rx_skb->len);
- dev_kfree_skb_any(rx_skb);
+ if (skb_queue_len(&rtwusb->rx_free_queue) >=
+ RTW_USB_RX_SKB_NUM - RTW_USB_RXCB_NUM)
+ dev_kfree_skb_any(rx_skb);
+ else
+ skb_queue_tail(&rtwusb->rx_free_queue, rx_skb);
}
}
@@ -621,23 +625,57 @@ static void rtw_usb_read_port_complete(struct urb *urb);
static void rtw_usb_rx_resubmit(struct rtw_usb *rtwusb, struct rx_usb_ctrl_block *rxcb)
{
struct rtw_dev *rtwdev = rtwusb->rtwdev;
+ struct sk_buff *rx_skb;
+ gfp_t priority = in_interrupt() ? GFP_ATOMIC : GFP_KERNEL;
int error;
- rxcb->rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, GFP_ATOMIC);
- if (!rxcb->rx_skb)
- return;
+ rx_skb = skb_dequeue(&rtwusb->rx_free_queue);
+ if (!rx_skb)
+ rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, priority);
+
+ if (!rx_skb)
+ goto try_later;
+
+ rxcb->rx_skb = rx_skb;
+
+ skb_reset_tail_pointer(rxcb->rx_skb);
+ rxcb->rx_skb->len = 0;
usb_fill_bulk_urb(rxcb->rx_urb, rtwusb->udev,
usb_rcvbulkpipe(rtwusb->udev, rtwusb->pipe_in),
rxcb->rx_skb->data, RTW_USB_MAX_RECVBUF_SZ,
rtw_usb_read_port_complete, rxcb);
- error = usb_submit_urb(rxcb->rx_urb, GFP_ATOMIC);
+ error = usb_submit_urb(rxcb->rx_urb, priority);
if (error) {
- kfree_skb(rxcb->rx_skb);
+ skb_queue_tail(&rtwusb->rx_free_queue, rxcb->rx_skb);
+
if (error != -ENODEV)
rtw_err(rtwdev, "Err sending rx data urb %d\n",
error);
+
+ if (error == -ENOMEM)
+ goto try_later;
+ }
+
+ return;
+
+try_later:
+ rxcb->rx_skb = NULL;
+ queue_work(rtwusb->rxwq, &rtwusb->rx_urb_work);
+}
+
+static void rtw_usb_rx_resubmit_work(struct work_struct *work)
+{
+ struct rtw_usb *rtwusb = container_of(work, struct rtw_usb, rx_urb_work);
+ struct rx_usb_ctrl_block *rxcb;
+ int i;
+
+ for (i = 0; i < RTW_USB_RXCB_NUM; i++) {
+ rxcb = &rtwusb->rx_cb[i];
+
+ if (!rxcb->rx_skb)
+ rtw_usb_rx_resubmit(rtwusb, rxcb);
}
}
@@ -653,8 +691,7 @@ static void rtw_usb_read_port_complete(struct urb *urb)
urb->actual_length < 24) {
rtw_err(rtwdev, "failed to get urb length:%d\n",
urb->actual_length);
- if (skb)
- dev_kfree_skb_any(skb);
+ skb_queue_tail(&rtwusb->rx_free_queue, skb);
} else {
skb_put(skb, urb->actual_length);
skb_queue_tail(&rtwusb->rx_queue, skb);
@@ -662,6 +699,8 @@ static void rtw_usb_read_port_complete(struct urb *urb)
}
rtw_usb_rx_resubmit(rtwusb, rxcb);
} else {
+ skb_queue_tail(&rtwusb->rx_free_queue, skb);
+
switch (urb->status) {
case -EINVAL:
case -EPIPE:
@@ -679,8 +718,6 @@ static void rtw_usb_read_port_complete(struct urb *urb)
rtw_err(rtwdev, "status %d\n", urb->status);
break;
}
- if (skb)
- dev_kfree_skb_any(skb);
}
}
@@ -868,16 +905,26 @@ static struct rtw_hci_ops rtw_usb_ops = {
static int rtw_usb_init_rx(struct rtw_dev *rtwdev)
{
struct rtw_usb *rtwusb = rtw_get_usb_priv(rtwdev);
+ struct sk_buff *rx_skb;
+ int i;
- rtwusb->rxwq = create_singlethread_workqueue("rtw88_usb: rx wq");
+ rtwusb->rxwq = alloc_workqueue("rtw88_usb: rx wq", WQ_BH, 0);
if (!rtwusb->rxwq) {
rtw_err(rtwdev, "failed to create RX work queue\n");
return -ENOMEM;
}
skb_queue_head_init(&rtwusb->rx_queue);
+ skb_queue_head_init(&rtwusb->rx_free_queue);
INIT_WORK(&rtwusb->rx_work, rtw_usb_rx_handler);
+ INIT_WORK(&rtwusb->rx_urb_work, rtw_usb_rx_resubmit_work);
+
+ for (i = 0; i < RTW_USB_RX_SKB_NUM; i++) {
+ rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, GFP_KERNEL);
+ if (rx_skb)
+ skb_queue_tail(&rtwusb->rx_free_queue, rx_skb);
+ }
return 0;
}
@@ -902,6 +949,8 @@ static void rtw_usb_deinit_rx(struct rtw_dev *rtwdev)
flush_workqueue(rtwusb->rxwq);
destroy_workqueue(rtwusb->rxwq);
+
+ skb_queue_purge(&rtwusb->rx_free_queue);
}
static int rtw_usb_init_tx(struct rtw_dev *rtwdev)
diff --git a/drivers/net/wireless/realtek/rtw88/usb.h b/drivers/net/wireless/realtek/rtw88/usb.h
index 86697a5c0103..9b695b688b24 100644
--- a/drivers/net/wireless/realtek/rtw88/usb.h
+++ b/drivers/net/wireless/realtek/rtw88/usb.h
@@ -38,6 +38,7 @@
#define RTW_USB_RXAGG_TIMEOUT 10
#define RTW_USB_RXCB_NUM 4
+#define RTW_USB_RX_SKB_NUM 8
#define RTW_USB_EP_MAX 4
@@ -81,7 +82,9 @@ struct rtw_usb {
struct rx_usb_ctrl_block rx_cb[RTW_USB_RXCB_NUM];
struct sk_buff_head rx_queue;
+ struct sk_buff_head rx_free_queue;
struct work_struct rx_work;
+ struct work_struct rx_urb_work;
};
static inline struct rtw_usb_tx_data *rtw_usb_get_tx_data(struct sk_buff *skb)
--
2.47.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* RE: [PATCH 1/2] wifi: rtw88: usb: Copy instead of cloning the RX skb
2024-11-14 23:06 [PATCH 1/2] wifi: rtw88: usb: Copy instead of cloning the RX skb Bitterblue Smith
2024-11-14 23:07 ` [PATCH 2/2] wifi: rtw88: usb: Preallocate and reuse the RX skbs Bitterblue Smith
@ 2024-11-18 9:16 ` Ping-Ke Shih
2024-11-25 23:48 ` Bitterblue Smith
1 sibling, 1 reply; 10+ messages in thread
From: Ping-Ke Shih @ 2024-11-18 9:16 UTC (permalink / raw)
To: Bitterblue Smith, linux-wireless@vger.kernel.org; +Cc: Sascha Hauer
Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
> "iperf3 -c 192.168.0.1 -R --udp -b 0" shows about 40% of datagrams
> are lost. Many torrents don't download faster than 3 MiB/s, probably
> because the Bittorrent protocol uses UDP. This is somehow related to
> the use of skb_clone() in the RX path.
Using skb_clone() can improve throughput is weird to me too. Do you check
top with 100% cpu usage?
>
> Don't use skb_clone(). Instead allocate a new skb for each 802.11 frame
> received and copy the data from the big (32768 byte) skb.
>
> With this patch, "iperf3 -c 192.168.0.1 -R --udp -b 0" shows only 1-2%
> of datagrams are lost, and torrents can reach download speeds of 36
> MiB/s.
>
> Tested with RTL8812AU and RTL8822CU.
>
> Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
> ---
> drivers/net/wireless/realtek/rtw88/usb.c | 52 ++++++++++++++----------
> 1 file changed, 31 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
> index 93ac4837e1b5..727569d4ef4b 100644
> --- a/drivers/net/wireless/realtek/rtw88/usb.c
> +++ b/drivers/net/wireless/realtek/rtw88/usb.c
> @@ -7,6 +7,7 @@
> #include <linux/mutex.h>
> #include "main.h"
> #include "debug.h"
> +#include "mac.h"
> #include "reg.h"
> #include "tx.h"
> #include "rx.h"
> @@ -546,49 +547,58 @@ static void rtw_usb_rx_handler(struct work_struct *work)
> {
> struct rtw_usb *rtwusb = container_of(work, struct rtw_usb, rx_work);
> struct rtw_dev *rtwdev = rtwusb->rtwdev;
> - const struct rtw_chip_info *chip = rtwdev->chip;
> - u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
> struct ieee80211_rx_status rx_status;
> - u32 pkt_offset, next_pkt, urb_len;
> struct rtw_rx_pkt_stat pkt_stat;
> - struct sk_buff *next_skb;
> + struct sk_buff *rx_skb;
> struct sk_buff *skb;
> + u32 pkt_desc_sz = rtwdev->chip->rx_pkt_desc_sz;
> + u32 max_skb_len = pkt_desc_sz + PHY_STATUS_SIZE * 8 +
> + IEEE80211_MAX_MPDU_LEN_VHT_11454;
> + u32 pkt_offset, next_pkt, skb_len;
> u8 *rx_desc;
> int limit;
>
> for (limit = 0; limit < 200; limit++) {
> - skb = skb_dequeue(&rtwusb->rx_queue);
> - if (!skb)
> + rx_skb = skb_dequeue(&rtwusb->rx_queue);
> + if (!rx_skb)
> break;
>
> if (skb_queue_len(&rtwusb->rx_queue) >= RTW_USB_MAX_RXQ_LEN) {
> dev_dbg_ratelimited(rtwdev->dev, "failed to get rx_queue, overflow\n");
> - dev_kfree_skb_any(skb);
> + dev_kfree_skb_any(rx_skb);
> continue;
> }
>
> - urb_len = skb->len;
> + rx_desc = rx_skb->data;
>
> do {
> - rx_desc = skb->data;
> rtw_rx_query_rx_desc(rtwdev, rx_desc, &pkt_stat,
> &rx_status);
> pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
> pkt_stat.shift;
>
> - next_pkt = round_up(pkt_stat.pkt_len + pkt_offset, 8);
> + skb_len = pkt_stat.pkt_len + pkt_offset;
> + if (skb_len > max_skb_len) {
This seems a new rule introduced by this patch. Do you really encounter this
case? Maybe this is the cause of slow download throughput?
> + rtw_dbg(rtwdev, RTW_DBG_USB,
> + "skipping too big packet: %u\n",
> + skb_len);
> + goto skip_packet;
> + }
>
> - if (urb_len >= next_pkt + pkt_desc_sz)
> - next_skb = skb_clone(skb, GFP_KERNEL);
> - else
> - next_skb = NULL;
> + skb = alloc_skb(skb_len, GFP_KERNEL);
> + if (!skb) {
> + rtw_dbg(rtwdev, RTW_DBG_USB,
> + "failed to allocate RX skb of size %u\n",
> + skb_len);
> + goto skip_packet;
> + }
> +
> + skb_put_data(skb, rx_desc, skb_len);
>
> if (pkt_stat.is_c2h) {
> - skb_trim(skb, pkt_stat.pkt_len + pkt_offset);
> rtw_fw_c2h_cmd_rx_irqsafe(rtwdev, pkt_offset, skb);
> } else {
> skb_pull(skb, pkt_offset);
> - skb_trim(skb, pkt_stat.pkt_len);
> rtw_update_rx_freq_for_invalid(rtwdev, skb,
> &rx_status,
> &pkt_stat);
> @@ -597,12 +607,12 @@ static void rtw_usb_rx_handler(struct work_struct *work)
> ieee80211_rx_irqsafe(rtwdev->hw, skb);
> }
>
> - skb = next_skb;
> - if (skb)
> - skb_pull(skb, next_pkt);
> +skip_packet:
> + next_pkt = round_up(skb_len, 8);
> + rx_desc += next_pkt;
> + } while (rx_desc + pkt_desc_sz < rx_skb->data + rx_skb->len);
>
> - urb_len -= next_pkt;
> - } while (skb);
> + dev_kfree_skb_any(rx_skb);
> }
> }
>
> --
> 2.47.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH 2/2] wifi: rtw88: usb: Preallocate and reuse the RX skbs
2024-11-14 23:07 ` [PATCH 2/2] wifi: rtw88: usb: Preallocate and reuse the RX skbs Bitterblue Smith
@ 2024-11-19 0:50 ` Ping-Ke Shih
2024-11-25 23:49 ` Bitterblue Smith
0 siblings, 1 reply; 10+ messages in thread
From: Ping-Ke Shih @ 2024-11-19 0:50 UTC (permalink / raw)
To: Bitterblue Smith, linux-wireless@vger.kernel.org; +Cc: Sascha Hauer
Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
> The USB driver uses four USB Request Blocks for RX. Before submitting
> one, it allocates a 32768 byte skb for the RX data. This allocation can
> fail, maybe due to temporary memory fragmentation. When the allocation
> fails, the corresponding URB is never submitted again. After four such
> allocation failures, all RX stops because the driver is not requesting
> data from the device anymore.
>
> Don't allocate a 32768 byte skb when submitting a USB Request Block
> (which happens very often). Instead preallocate 8 such skbs, and reuse
> them over and over. If all 8 are busy, allocate a new one. This is
> pretty rare. If the allocation fails, use a work to try again later.
> When there are enough free skbs again, free the excess skbs.
>
> Also, use WQ_BH for the RX workqueue. With a normal or high priority
> workqueue the skbs are processed too slowly when the system is even a
> little busy, like when opening a new page in a browser, and the driver
> runs out of free skbs and allocates a lot of new ones.
>
> Move C2H_ADAPTIVITY to the c2h workqueue instead of handling it directly
> from rtw_fw_c2h_cmd_rx_irqsafe(), which runs in the RX workqueue. It
> reads hardware registers, which is not "irqsafe" with USB.
This part should be in a separated patch.
>
> This is more or less what the out-of-tree Realtek drivers do, except
> they use a tasklet instead of a BH workqueue.
>
> Tested with RTL8723DU, RTL8821AU, RTL8812AU, RTL8812BU, RTL8822CU,
> RTL8811CU.
>
> Closes: https://lore.kernel.org/linux-wireless/6e7ecb47-7ea0-433a-a19f-05f88a2edf6b@gmail.com/
> Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
> ---
> drivers/net/wireless/realtek/rtw88/fw.c | 7 +--
> drivers/net/wireless/realtek/rtw88/usb.c | 73 ++++++++++++++++++++----
> drivers/net/wireless/realtek/rtw88/usb.h | 3 +
> 3 files changed, 67 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtw88/fw.c b/drivers/net/wireless/realtek/rtw88/fw.c
> index e6e9946fbf44..02389b7c6876 100644
> --- a/drivers/net/wireless/realtek/rtw88/fw.c
> +++ b/drivers/net/wireless/realtek/rtw88/fw.c
> @@ -332,6 +332,9 @@ void rtw_fw_c2h_cmd_handle(struct rtw_dev *rtwdev, struct sk_buff *skb)
> case C2H_RA_RPT:
> rtw_fw_ra_report_handle(rtwdev, c2h->payload, len);
> break;
> + case C2H_ADAPTIVITY:
> + rtw_fw_adaptivity_result(rtwdev, c2h->payload, len);
> + break;
> default:
> rtw_dbg(rtwdev, RTW_DBG_FW, "C2H 0x%x isn't handled\n", c2h->id);
> break;
> @@ -367,10 +370,6 @@ void rtw_fw_c2h_cmd_rx_irqsafe(struct rtw_dev *rtwdev, u32 pkt_offset,
> rtw_fw_scan_result(rtwdev, c2h->payload, len);
> dev_kfree_skb_any(skb);
> break;
> - case C2H_ADAPTIVITY:
> - rtw_fw_adaptivity_result(rtwdev, c2h->payload, len);
> - dev_kfree_skb_any(skb);
> - break;
> default:
> /* pass offset for further operation */
> *((u32 *)skb->cb) = pkt_offset;
> diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
> index 727569d4ef4b..5c2dfa2ced92 100644
> --- a/drivers/net/wireless/realtek/rtw88/usb.c
> +++ b/drivers/net/wireless/realtek/rtw88/usb.c
> @@ -585,7 +585,7 @@ static void rtw_usb_rx_handler(struct work_struct *work)
> goto skip_packet;
> }
>
> - skb = alloc_skb(skb_len, GFP_KERNEL);
> + skb = alloc_skb(skb_len, GFP_ATOMIC);
> if (!skb) {
> rtw_dbg(rtwdev, RTW_DBG_USB,
> "failed to allocate RX skb of size %u\n",
> @@ -612,7 +612,11 @@ static void rtw_usb_rx_handler(struct work_struct *work)
> rx_desc += next_pkt;
> } while (rx_desc + pkt_desc_sz < rx_skb->data + rx_skb->len);
>
> - dev_kfree_skb_any(rx_skb);
> + if (skb_queue_len(&rtwusb->rx_free_queue) >=
> + RTW_USB_RX_SKB_NUM - RTW_USB_RXCB_NUM)
> + dev_kfree_skb_any(rx_skb);
> + else
> + skb_queue_tail(&rtwusb->rx_free_queue, rx_skb);
Why not just queue and reuse rx_skb for all? That would be simpler.
> }
> }
>
> @@ -621,23 +625,57 @@ static void rtw_usb_read_port_complete(struct urb *urb);
> static void rtw_usb_rx_resubmit(struct rtw_usb *rtwusb, struct rx_usb_ctrl_block *rxcb)
> {
> struct rtw_dev *rtwdev = rtwusb->rtwdev;
> + struct sk_buff *rx_skb;
> + gfp_t priority = in_interrupt() ? GFP_ATOMIC : GFP_KERNEL;
I remember in_interrupt() is not recommended. Can't we pass necessary gfp_t
via function argument by callers?
> int error;
>
> - rxcb->rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, GFP_ATOMIC);
> - if (!rxcb->rx_skb)
> - return;
> + rx_skb = skb_dequeue(&rtwusb->rx_free_queue);
> + if (!rx_skb)
> + rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, priority);
> +
> + if (!rx_skb)
> + goto try_later;
> +
> + rxcb->rx_skb = rx_skb;
> +
> + skb_reset_tail_pointer(rxcb->rx_skb);
> + rxcb->rx_skb->len = 0;
How about moving these initialization upward before ' rxcb->rx_skb = rx_skb;'?
Statements can be shorter, and it is reasonable to initialize data before
assignment.
>
> usb_fill_bulk_urb(rxcb->rx_urb, rtwusb->udev,
> usb_rcvbulkpipe(rtwusb->udev, rtwusb->pipe_in),
> rxcb->rx_skb->data, RTW_USB_MAX_RECVBUF_SZ,
> rtw_usb_read_port_complete, rxcb);
>
> - error = usb_submit_urb(rxcb->rx_urb, GFP_ATOMIC);
> + error = usb_submit_urb(rxcb->rx_urb, priority);
Not sure if 'priority' fits the meaning. Maybe many kernel code just
uses 'gfp'.
> if (error) {
> - kfree_skb(rxcb->rx_skb);
> + skb_queue_tail(&rtwusb->rx_free_queue, rxcb->rx_skb);
> +
> if (error != -ENODEV)
> rtw_err(rtwdev, "Err sending rx data urb %d\n",
> error);
Since here rxcb->rx_skb != NULL, will it be a problem? The rxcb will not be
submitted again? Should all error cases go to try_later label?
> +
> + if (error == -ENOMEM)
> + goto try_later;
> + }
> +
> + return;
> +
> +try_later:
> + rxcb->rx_skb = NULL;
> + queue_work(rtwusb->rxwq, &rtwusb->rx_urb_work);
> +}
> +
> +static void rtw_usb_rx_resubmit_work(struct work_struct *work)
> +{
> + struct rtw_usb *rtwusb = container_of(work, struct rtw_usb, rx_urb_work);
> + struct rx_usb_ctrl_block *rxcb;
> + int i;
> +
> + for (i = 0; i < RTW_USB_RXCB_NUM; i++) {
> + rxcb = &rtwusb->rx_cb[i];
> +
> + if (!rxcb->rx_skb)
> + rtw_usb_rx_resubmit(rtwusb, rxcb);
> }
> }
>
> @@ -653,8 +691,7 @@ static void rtw_usb_read_port_complete(struct urb *urb)
> urb->actual_length < 24) {
> rtw_err(rtwdev, "failed to get urb length:%d\n",
> urb->actual_length);
> - if (skb)
> - dev_kfree_skb_any(skb);
> + skb_queue_tail(&rtwusb->rx_free_queue, skb);
> } else {
> skb_put(skb, urb->actual_length);
> skb_queue_tail(&rtwusb->rx_queue, skb);
> @@ -662,6 +699,8 @@ static void rtw_usb_read_port_complete(struct urb *urb)
> }
> rtw_usb_rx_resubmit(rtwusb, rxcb);
> } else {
> + skb_queue_tail(&rtwusb->rx_free_queue, skb);
> +
> switch (urb->status) {
> case -EINVAL:
> case -EPIPE:
> @@ -679,8 +718,6 @@ static void rtw_usb_read_port_complete(struct urb *urb)
> rtw_err(rtwdev, "status %d\n", urb->status);
> break;
> }
> - if (skb)
> - dev_kfree_skb_any(skb);
> }
> }
>
> @@ -868,16 +905,26 @@ static struct rtw_hci_ops rtw_usb_ops = {
> static int rtw_usb_init_rx(struct rtw_dev *rtwdev)
> {
> struct rtw_usb *rtwusb = rtw_get_usb_priv(rtwdev);
> + struct sk_buff *rx_skb;
> + int i;
>
> - rtwusb->rxwq = create_singlethread_workqueue("rtw88_usb: rx wq");
> + rtwusb->rxwq = alloc_workqueue("rtw88_usb: rx wq", WQ_BH, 0);
> if (!rtwusb->rxwq) {
> rtw_err(rtwdev, "failed to create RX work queue\n");
> return -ENOMEM;
> }
>
> skb_queue_head_init(&rtwusb->rx_queue);
> + skb_queue_head_init(&rtwusb->rx_free_queue);
>
> INIT_WORK(&rtwusb->rx_work, rtw_usb_rx_handler);
> + INIT_WORK(&rtwusb->rx_urb_work, rtw_usb_rx_resubmit_work);
> +
> + for (i = 0; i < RTW_USB_RX_SKB_NUM; i++) {
> + rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, GFP_KERNEL);
> + if (rx_skb)
> + skb_queue_tail(&rtwusb->rx_free_queue, rx_skb);
> + }
>
> return 0;
> }
> @@ -902,6 +949,8 @@ static void rtw_usb_deinit_rx(struct rtw_dev *rtwdev)
>
> flush_workqueue(rtwusb->rxwq);
> destroy_workqueue(rtwusb->rxwq);
> +
> + skb_queue_purge(&rtwusb->rx_free_queue);
> }
>
> static int rtw_usb_init_tx(struct rtw_dev *rtwdev)
> diff --git a/drivers/net/wireless/realtek/rtw88/usb.h b/drivers/net/wireless/realtek/rtw88/usb.h
> index 86697a5c0103..9b695b688b24 100644
> --- a/drivers/net/wireless/realtek/rtw88/usb.h
> +++ b/drivers/net/wireless/realtek/rtw88/usb.h
> @@ -38,6 +38,7 @@
> #define RTW_USB_RXAGG_TIMEOUT 10
>
> #define RTW_USB_RXCB_NUM 4
> +#define RTW_USB_RX_SKB_NUM 8
>
> #define RTW_USB_EP_MAX 4
>
> @@ -81,7 +82,9 @@ struct rtw_usb {
>
> struct rx_usb_ctrl_block rx_cb[RTW_USB_RXCB_NUM];
> struct sk_buff_head rx_queue;
> + struct sk_buff_head rx_free_queue;
> struct work_struct rx_work;
> + struct work_struct rx_urb_work;
> };
>
> static inline struct rtw_usb_tx_data *rtw_usb_get_tx_data(struct sk_buff *skb)
> --
> 2.47.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] wifi: rtw88: usb: Copy instead of cloning the RX skb
2024-11-18 9:16 ` [PATCH 1/2] wifi: rtw88: usb: Copy instead of cloning the RX skb Ping-Ke Shih
@ 2024-11-25 23:48 ` Bitterblue Smith
2024-11-26 1:17 ` Ping-Ke Shih
0 siblings, 1 reply; 10+ messages in thread
From: Bitterblue Smith @ 2024-11-25 23:48 UTC (permalink / raw)
To: Ping-Ke Shih, linux-wireless@vger.kernel.org; +Cc: Sascha Hauer
On 18/11/2024 11:16, Ping-Ke Shih wrote:
> Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>> "iperf3 -c 192.168.0.1 -R --udp -b 0" shows about 40% of datagrams
>> are lost. Many torrents don't download faster than 3 MiB/s, probably
>> because the Bittorrent protocol uses UDP. This is somehow related to
>> the use of skb_clone() in the RX path.
>
> Using skb_clone() can improve throughput is weird to me too. Do you check
> top with 100% cpu usage?
>
I checked the CPU usage now and the results are interesting. In short,
patch 1/2 slightly raises the CPU usage, and patch 2/2 lowers it a lot.
I used this command: "top -b -d 20 -n 2" while running iperf3 in
reverse mode. During every test the RX speed was approximately 490 Mbps.
I tested with RTL8812AU. I used the numbers from the second "frame"
printed by each top command. I added up all the processes that had CPU
usage of at least 0.1%. I removed processes that I think are irrelevant
(kwin_wayland, konsole, weechat, plasmashell).
Before patch 1/2:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21192 asdf 20 0 17988 3468 2828 S 26.6 0.0 0:05.90 iperf3
34 root 20 0 0 0 0 S 8.7 0.0 1:06.89 ksoftirqd/1
17 root 20 0 0 0 0 S 4.7 0.0 0:44.58 ksoftirqd/0
40 root 20 0 0 0 0 S 3.9 0.0 0:25.15 ksoftirqd/3
28 root 20 0 0 0 0 R 3.7 0.0 0:31.46 ksoftirqd/2
20964 root 20 0 0 0 0 R 3.5 0.0 0:01.21 kworker/u16:1-rtw88_usb: tx wq
20716 root 20 0 0 0 0 I 3.4 0.0 0:02.74 kworker/u16:3-flush-259:0
11253 root 20 0 0 0 0 I 2.9 0.0 0:43.42 kworker/u16:4-events_unbound
20572 root 0 -20 0 0 0 I 2.7 0.0 0:03.05 kworker/u17:0-rtw_tx_wq
21194 root 0 -20 0 0 0 I 2.7 0.0 0:00.59 kworker/u17:1-rtw_tx_wq
62.8
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21798 asdf 20 0 17988 3408 2768 S 33.7 0.0 0:08.41 iperf3
34 root 20 0 0 0 0 S 7.5 0.0 1:21.17 ksoftirqd/1
17 root 20 0 0 0 0 S 5.8 0.0 0:54.08 ksoftirqd/0
28 root 20 0 0 0 0 S 5.7 0.0 0:38.42 ksoftirqd/2
40 root 20 0 0 0 0 S 5.5 0.0 0:29.69 ksoftirqd/3
21261 root 20 0 0 0 0 I 3.0 0.0 0:02.32 kworker/u16:0-rtw88_usb: tx wq
21237 root 0 -20 0 0 0 I 2.6 0.0 0:00.71 kworker/u17:4-rtw_tx_wq
20716 root 20 0 0 0 0 I 2.4 0.0 0:05.97 kworker/u16:3-rtw88_usb: rx wq
20964 root 20 0 0 0 0 I 2.4 0.0 0:04.65 kworker/u16:1-rtw88_usb: rx wq
21244 root 0 -20 0 0 0 I 0.7 0.0 0:00.24 kworker/u17:12-rtw_tx_wq
69.3
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21853 asdf 20 0 17988 3700 2932 S 34.3 0.0 0:08.38 iperf3
17 root 20 0 0 0 0 S 7.0 0.0 0:55.93 ksoftirqd/0
34 root 20 0 0 0 0 S 6.8 0.0 1:22.92 ksoftirqd/1
40 root 20 0 0 0 0 S 5.4 0.0 0:31.05 ksoftirqd/3
28 root 20 0 0 0 0 S 4.9 0.0 0:39.62 ksoftirqd/2
21261 root 20 0 0 0 0 I 3.1 0.0 0:03.22 kworker/u16:0-rtw88_usb: rx wq
21833 root 0 -20 0 0 0 I 2.4 0.0 0:00.67 kworker/u17:0-rtw_tx_wq
20427 root 20 0 0 0 0 I 2.2 0.0 0:12.37 kworker/u16:2-events_unbound
21834 root 0 -20 0 0 0 I 1.1 0.0 0:00.30 kworker/u17:2-rtw_tx_wq
67.2
With patch 1/2:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22222 asdf 20 0 17988 3804 3036 S 29.4 0.0 0:08.38 iperf3
34 root 20 0 0 0 0 R 10.3 0.0 1:26.13 ksoftirqd/1
28 root 20 0 0 0 0 R 7.9 0.0 0:42.00 ksoftirqd/2
20964 root 20 0 0 0 0 I 5.8 0.0 0:06.66 kworker/u16:1-rtw88_usb: rx wq
17 root 20 0 0 0 0 S 5.0 0.0 0:57.49 ksoftirqd/0
21261 root 20 0 0 0 0 I 4.2 0.0 0:04.55 kworker/u16:0-rtw88_usb: tx wq
40 root 20 0 0 0 0 S 4.1 0.0 0:32.39 ksoftirqd/3
20716 root 20 0 0 0 0 I 2.8 0.0 0:08.07 kworker/u16:3-rtw88_usb: tx wq
21237 root 0 -20 0 0 0 I 2.5 0.0 0:01.32 kworker/u17:4-rtw_tx_wq
21834 root 0 -20 0 0 0 I 1.0 0.0 0:00.61 kworker/u17:2-rtw_tx_wq
21225 root 20 0 0 0 0 I 0.1 0.0 0:00.33 kworker/2:2-events
73.1
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22265 asdf 20 0 17988 3596 2828 S 30.5 0.0 0:08.17 iperf3
34 root 20 0 0 0 0 S 10.5 0.0 1:28.96 ksoftirqd/1
28 root 20 0 0 0 0 S 9.1 0.0 0:44.54 ksoftirqd/2
20427 root 20 0 0 0 0 I 5.9 0.0 0:13.72 kworker/u16:2-rtw88_usb: rx wq
20716 root 20 0 0 0 0 I 5.7 0.0 0:09.78 kworker/u16:3-rtw88_usb: tx wq
17 root 20 0 0 0 0 S 3.9 0.0 0:58.72 ksoftirqd/0
40 root 20 0 0 0 0 S 3.5 0.0 0:33.53 ksoftirqd/3
21833 root 0 -20 0 0 0 I 3.2 0.0 0:01.76 kworker/u17:0-rtw_tx_wq
20964 root 20 0 0 0 0 I 1.1 0.0 0:07.34 kworker/u16:1-flush-259:0
21834 root 0 -20 0 0 0 I 0.4 0.0 0:00.92 kworker/u17:2-rtw_tx_wq
73.8
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22303 asdf 20 0 17988 3808 3040 S 31.3 0.0 0:08.58 iperf3
28 root 20 0 0 0 0 R 10.0 0.0 0:47.50 ksoftirqd/2
34 root 20 0 0 0 0 S 9.9 0.0 1:31.69 ksoftirqd/1
20716 root 20 0 0 0 0 I 5.2 0.0 0:11.26 kworker/u16:3-rtw88_usb: rx wq
17 root 20 0 0 0 0 S 4.0 0.0 0:59.98 ksoftirqd/0
20427 root 20 0 0 0 0 I 4.0 0.0 0:15.06 kworker/u16:2-rtw88_usb: tx wq
20964 root 20 0 0 0 0 I 3.8 0.0 0:08.73 kworker/u16:1-rtw88_usb: tx wq
40 root 20 0 0 0 0 S 2.7 0.0 0:34.49 ksoftirqd/3
21833 root 0 -20 0 0 0 I 1.9 0.0 0:02.47 kworker/u17:0-rtw_tx_wq
21834 root 0 -20 0 0 0 I 1.8 0.0 0:01.49 kworker/u17:2-rtw_tx_wq
74.6
With patches 1/2 and 2/2:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23151 asdf 20 0 17988 3656 2888 S 28.9 0.0 0:08.86 iperf3
22400 root 0 -20 0 0 0 I 2.7 0.0 0:00.86 kworker/u17:6-rtw_tx_wq
21261 root 20 0 0 0 0 I 2.3 0.0 0:06.47 kworker/u16:0-rtw88_usb: tx wq
22401 root 0 -20 0 0 0 I 1.3 0.0 0:00.41 kworker/u17:7-rtw_tx_wq
20427 root 20 0 0 0 0 I 1.2 0.0 0:16.02 kworker/u16:2-rtw88_usb: tx wq
34 root 20 0 0 0 0 S 0.5 0.0 1:32.23 ksoftirqd/1
20964 root 20 0 0 0 0 I 0.4 0.0 0:09.30 kworker/u16:1-rtw88_usb: tx wq
22498 root 20 0 0 0 0 I 0.1 0.0 0:00.07 kworker/1:1-events
37.4
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23213 asdf 20 0 17988 3360 2720 S 30.7 0.0 0:08.69 iperf3
23169 root 0 -20 0 0 0 I 3.3 0.0 0:00.93 kworker/u17:1-rtw_tx_wq
21261 root 20 0 0 0 0 I 1.5 0.0 0:07.17 kworker/u16:0-rtw88_usb: tx wq
23219 root 20 0 0 0 0 I 1.0 0.0 0:00.21 kworker/u16:3-rtw88_usb: tx wq
34 root 20 0 0 0 0 S 0.6 0.0 1:32.59 ksoftirqd/1
20964 root 20 0 0 0 0 I 0.6 0.0 0:09.46 kworker/u16:1-rtw88_usb: tx wq
23189 root 0 -20 0 0 0 I 0.6 0.0 0:00.26 kworker/u17:9-rtw_tx_wq
20427 root 20 0 0 0 0 I 0.5 0.0 0:16.48 kworker/u16:2-rtw88_usb: tx wq
22283 root 20 0 0 0 0 I 0.1 0.0 0:00.03 kworker/1:0-events
38.9
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23266 asdf 20 0 17988 3608 2968 S 27.5 0.0 0:07.55 iperf3
21833 root 0 -20 0 0 0 R 2.7 0.0 0:03.09 kworker/u17:0+rtw_tx_wq
20964 root 20 0 0 0 0 I 1.6 0.0 0:10.03 kworker/u16:1-events_unbound
23272 root 20 0 0 0 0 I 1.5 0.0 0:00.31 kworker/u16:4-rtw88_usb: tx wq
22401 root 0 -20 0 0 0 I 1.1 0.0 0:00.97 kworker/u17:7-rtw_tx_wq
34 root 20 0 0 0 0 S 0.5 0.0 1:32.83 ksoftirqd/1
20427 root 20 0 0 0 0 I 0.5 0.0 0:16.78 kworker/u16:2-rtw88_usb: tx wq
23228 root 20 0 0 0 0 I 0.1 0.0 0:00.08 kworker/1:2-events
35.5
>>
>> Don't use skb_clone(). Instead allocate a new skb for each 802.11 frame
>> received and copy the data from the big (32768 byte) skb.
>>
>> With this patch, "iperf3 -c 192.168.0.1 -R --udp -b 0" shows only 1-2%
>> of datagrams are lost, and torrents can reach download speeds of 36
>> MiB/s.
>>
>> Tested with RTL8812AU and RTL8822CU.
>>
>> Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
>> ---
>> drivers/net/wireless/realtek/rtw88/usb.c | 52 ++++++++++++++----------
>> 1 file changed, 31 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
>> index 93ac4837e1b5..727569d4ef4b 100644
>> --- a/drivers/net/wireless/realtek/rtw88/usb.c
>> +++ b/drivers/net/wireless/realtek/rtw88/usb.c
>> @@ -7,6 +7,7 @@
>> #include <linux/mutex.h>
>> #include "main.h"
>> #include "debug.h"
>> +#include "mac.h"
>> #include "reg.h"
>> #include "tx.h"
>> #include "rx.h"
>> @@ -546,49 +547,58 @@ static void rtw_usb_rx_handler(struct work_struct *work)
>> {
>> struct rtw_usb *rtwusb = container_of(work, struct rtw_usb, rx_work);
>> struct rtw_dev *rtwdev = rtwusb->rtwdev;
>> - const struct rtw_chip_info *chip = rtwdev->chip;
>> - u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
>> struct ieee80211_rx_status rx_status;
>> - u32 pkt_offset, next_pkt, urb_len;
>> struct rtw_rx_pkt_stat pkt_stat;
>> - struct sk_buff *next_skb;
>> + struct sk_buff *rx_skb;
>> struct sk_buff *skb;
>> + u32 pkt_desc_sz = rtwdev->chip->rx_pkt_desc_sz;
>> + u32 max_skb_len = pkt_desc_sz + PHY_STATUS_SIZE * 8 +
>> + IEEE80211_MAX_MPDU_LEN_VHT_11454;
>> + u32 pkt_offset, next_pkt, skb_len;
>> u8 *rx_desc;
>> int limit;
>>
>> for (limit = 0; limit < 200; limit++) {
>> - skb = skb_dequeue(&rtwusb->rx_queue);
>> - if (!skb)
>> + rx_skb = skb_dequeue(&rtwusb->rx_queue);
>> + if (!rx_skb)
>> break;
>>
>> if (skb_queue_len(&rtwusb->rx_queue) >= RTW_USB_MAX_RXQ_LEN) {
>> dev_dbg_ratelimited(rtwdev->dev, "failed to get rx_queue, overflow\n");
>> - dev_kfree_skb_any(skb);
>> + dev_kfree_skb_any(rx_skb);
>> continue;
>> }
>>
>> - urb_len = skb->len;
>> + rx_desc = rx_skb->data;
>>
>> do {
>> - rx_desc = skb->data;
>> rtw_rx_query_rx_desc(rtwdev, rx_desc, &pkt_stat,
>> &rx_status);
>> pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
>> pkt_stat.shift;
>>
>> - next_pkt = round_up(pkt_stat.pkt_len + pkt_offset, 8);
>> + skb_len = pkt_stat.pkt_len + pkt_offset;
>> + if (skb_len > max_skb_len) {
>
> This seems a new rule introduced by this patch. Do you really encounter this
> case? Maybe this is the cause of slow download throughput?
>
No, I never saw this case. It just seemed like a good idea to check the
size passed to alloc_skb. Maybe it's not needed?
>> + rtw_dbg(rtwdev, RTW_DBG_USB,
>> + "skipping too big packet: %u\n",
>> + skb_len);
>> + goto skip_packet;
>> + }
>>
>> - if (urb_len >= next_pkt + pkt_desc_sz)
>> - next_skb = skb_clone(skb, GFP_KERNEL);
>> - else
>> - next_skb = NULL;
>> + skb = alloc_skb(skb_len, GFP_KERNEL);
>> + if (!skb) {
>> + rtw_dbg(rtwdev, RTW_DBG_USB,
>> + "failed to allocate RX skb of size %u\n",
>> + skb_len);
>> + goto skip_packet;
>> + }
>> +
>> + skb_put_data(skb, rx_desc, skb_len);
>>
>> if (pkt_stat.is_c2h) {
>> - skb_trim(skb, pkt_stat.pkt_len + pkt_offset);
>> rtw_fw_c2h_cmd_rx_irqsafe(rtwdev, pkt_offset, skb);
>> } else {
>> skb_pull(skb, pkt_offset);
>> - skb_trim(skb, pkt_stat.pkt_len);
>> rtw_update_rx_freq_for_invalid(rtwdev, skb,
>> &rx_status,
>> &pkt_stat);
>> @@ -597,12 +607,12 @@ static void rtw_usb_rx_handler(struct work_struct *work)
>> ieee80211_rx_irqsafe(rtwdev->hw, skb);
>> }
>>
>> - skb = next_skb;
>> - if (skb)
>> - skb_pull(skb, next_pkt);
>> +skip_packet:
>> + next_pkt = round_up(skb_len, 8);
>> + rx_desc += next_pkt;
>> + } while (rx_desc + pkt_desc_sz < rx_skb->data + rx_skb->len);
>>
>> - urb_len -= next_pkt;
>> - } while (skb);
>> + dev_kfree_skb_any(rx_skb);
>> }
>> }
>>
>> --
>> 2.47.0
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] wifi: rtw88: usb: Preallocate and reuse the RX skbs
2024-11-19 0:50 ` Ping-Ke Shih
@ 2024-11-25 23:49 ` Bitterblue Smith
2024-11-26 1:26 ` Ping-Ke Shih
0 siblings, 1 reply; 10+ messages in thread
From: Bitterblue Smith @ 2024-11-25 23:49 UTC (permalink / raw)
To: Ping-Ke Shih, linux-wireless@vger.kernel.org; +Cc: Sascha Hauer
On 19/11/2024 02:50, Ping-Ke Shih wrote:
> Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>> The USB driver uses four USB Request Blocks for RX. Before submitting
>> one, it allocates a 32768 byte skb for the RX data. This allocation can
>> fail, maybe due to temporary memory fragmentation. When the allocation
>> fails, the corresponding URB is never submitted again. After four such
>> allocation failures, all RX stops because the driver is not requesting
>> data from the device anymore.
>>
>> Don't allocate a 32768 byte skb when submitting a USB Request Block
>> (which happens very often). Instead preallocate 8 such skbs, and reuse
>> them over and over. If all 8 are busy, allocate a new one. This is
>> pretty rare. If the allocation fails, use a work to try again later.
>> When there are enough free skbs again, free the excess skbs.
>>
>> Also, use WQ_BH for the RX workqueue. With a normal or high priority
>> workqueue the skbs are processed too slowly when the system is even a
>> little busy, like when opening a new page in a browser, and the driver
>> runs out of free skbs and allocates a lot of new ones.
>>
>> Move C2H_ADAPTIVITY to the c2h workqueue instead of handling it directly
>> from rtw_fw_c2h_cmd_rx_irqsafe(), which runs in the RX workqueue. It
>> reads hardware registers, which is not "irqsafe" with USB.
>
> This part should be in a separated patch.
>
Do you mean just C2H_ADAPTIVITY or WQ_BH as well?
>>
>> This is more or less what the out-of-tree Realtek drivers do, except
>> they use a tasklet instead of a BH workqueue.
>>
>> Tested with RTL8723DU, RTL8821AU, RTL8812AU, RTL8812BU, RTL8822CU,
>> RTL8811CU.
>>
>> Closes: https://lore.kernel.org/linux-wireless/6e7ecb47-7ea0-433a-a19f-05f88a2edf6b@gmail.com/
>> Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
>> ---
>> drivers/net/wireless/realtek/rtw88/fw.c | 7 +--
>> drivers/net/wireless/realtek/rtw88/usb.c | 73 ++++++++++++++++++++----
>> drivers/net/wireless/realtek/rtw88/usb.h | 3 +
>> 3 files changed, 67 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/net/wireless/realtek/rtw88/fw.c b/drivers/net/wireless/realtek/rtw88/fw.c
>> index e6e9946fbf44..02389b7c6876 100644
>> --- a/drivers/net/wireless/realtek/rtw88/fw.c
>> +++ b/drivers/net/wireless/realtek/rtw88/fw.c
>> @@ -332,6 +332,9 @@ void rtw_fw_c2h_cmd_handle(struct rtw_dev *rtwdev, struct sk_buff *skb)
>> case C2H_RA_RPT:
>> rtw_fw_ra_report_handle(rtwdev, c2h->payload, len);
>> break;
>> + case C2H_ADAPTIVITY:
>> + rtw_fw_adaptivity_result(rtwdev, c2h->payload, len);
>> + break;
>> default:
>> rtw_dbg(rtwdev, RTW_DBG_FW, "C2H 0x%x isn't handled\n", c2h->id);
>> break;
>> @@ -367,10 +370,6 @@ void rtw_fw_c2h_cmd_rx_irqsafe(struct rtw_dev *rtwdev, u32 pkt_offset,
>> rtw_fw_scan_result(rtwdev, c2h->payload, len);
>> dev_kfree_skb_any(skb);
>> break;
>> - case C2H_ADAPTIVITY:
>> - rtw_fw_adaptivity_result(rtwdev, c2h->payload, len);
>> - dev_kfree_skb_any(skb);
>> - break;
>> default:
>> /* pass offset for further operation */
>> *((u32 *)skb->cb) = pkt_offset;
>> diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
>> index 727569d4ef4b..5c2dfa2ced92 100644
>> --- a/drivers/net/wireless/realtek/rtw88/usb.c
>> +++ b/drivers/net/wireless/realtek/rtw88/usb.c
>> @@ -585,7 +585,7 @@ static void rtw_usb_rx_handler(struct work_struct *work)
>> goto skip_packet;
>> }
>>
>> - skb = alloc_skb(skb_len, GFP_KERNEL);
>> + skb = alloc_skb(skb_len, GFP_ATOMIC);
>> if (!skb) {
>> rtw_dbg(rtwdev, RTW_DBG_USB,
>> "failed to allocate RX skb of size %u\n",
>> @@ -612,7 +612,11 @@ static void rtw_usb_rx_handler(struct work_struct *work)
>> rx_desc += next_pkt;
>> } while (rx_desc + pkt_desc_sz < rx_skb->data + rx_skb->len);
>>
>> - dev_kfree_skb_any(rx_skb);
>> + if (skb_queue_len(&rtwusb->rx_free_queue) >=
>> + RTW_USB_RX_SKB_NUM - RTW_USB_RXCB_NUM)
>> + dev_kfree_skb_any(rx_skb);
>> + else
>> + skb_queue_tail(&rtwusb->rx_free_queue, rx_skb);
>
> Why not just queue and reuse rx_skb for all? That would be simpler.
>
I didn't want to let it allocate too many skbs. I didn't find any kind
of limit in the official drivers, so maybe it would be fine.
>> }
>> }
>>
>> @@ -621,23 +625,57 @@ static void rtw_usb_read_port_complete(struct urb *urb);
>> static void rtw_usb_rx_resubmit(struct rtw_usb *rtwusb, struct rx_usb_ctrl_block *rxcb)
>> {
>> struct rtw_dev *rtwdev = rtwusb->rtwdev;
>> + struct sk_buff *rx_skb;
>> + gfp_t priority = in_interrupt() ? GFP_ATOMIC : GFP_KERNEL;
>
> I remember in_interrupt() is not recommended. Can't we pass necessary gfp_t
> via function argument by callers?
>
Yes, I can do that.
>> int error;
>>
>> - rxcb->rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, GFP_ATOMIC);
>> - if (!rxcb->rx_skb)
>> - return;
>> + rx_skb = skb_dequeue(&rtwusb->rx_free_queue);
>> + if (!rx_skb)
>> + rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, priority);
>> +
>> + if (!rx_skb)
>> + goto try_later;
>> +
>> + rxcb->rx_skb = rx_skb;
>> +
>> + skb_reset_tail_pointer(rxcb->rx_skb);
>> + rxcb->rx_skb->len = 0;
>
> How about moving these initialization upward before ' rxcb->rx_skb = rx_skb;'?
> Statements can be shorter, and it is reasonable to initialize data before
> assignment.
>
>>
>> usb_fill_bulk_urb(rxcb->rx_urb, rtwusb->udev,
>> usb_rcvbulkpipe(rtwusb->udev, rtwusb->pipe_in),
>> rxcb->rx_skb->data, RTW_USB_MAX_RECVBUF_SZ,
>> rtw_usb_read_port_complete, rxcb);
>>
>> - error = usb_submit_urb(rxcb->rx_urb, GFP_ATOMIC);
>> + error = usb_submit_urb(rxcb->rx_urb, priority);
>
> Not sure if 'priority' fits the meaning. Maybe many kernel code just
> uses 'gfp'.
>
>> if (error) {
>> - kfree_skb(rxcb->rx_skb);
>> + skb_queue_tail(&rtwusb->rx_free_queue, rxcb->rx_skb);
>> +
>> if (error != -ENODEV)
>> rtw_err(rtwdev, "Err sending rx data urb %d\n",
>> error);
>
> Since here rxcb->rx_skb != NULL, will it be a problem? The rxcb will not be
> submitted again? Should all error cases go to try_later label?
>
Right, the rxcb will be submitted again only if the error was ENOMEM
I don't know what other errors can be considered temporary. I never
ran into the case where the error is not ENODEV.
>
>> +
>> + if (error == -ENOMEM)
>> + goto try_later;
>> + }
>> +
>> + return;
>> +
>> +try_later:
>> + rxcb->rx_skb = NULL;
>> + queue_work(rtwusb->rxwq, &rtwusb->rx_urb_work);
>> +}
>> +
>> +static void rtw_usb_rx_resubmit_work(struct work_struct *work)
>> +{
>> + struct rtw_usb *rtwusb = container_of(work, struct rtw_usb, rx_urb_work);
>> + struct rx_usb_ctrl_block *rxcb;
>> + int i;
>> +
>> + for (i = 0; i < RTW_USB_RXCB_NUM; i++) {
>> + rxcb = &rtwusb->rx_cb[i];
>> +
>> + if (!rxcb->rx_skb)
>> + rtw_usb_rx_resubmit(rtwusb, rxcb);
>> }
>> }
>>
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH 1/2] wifi: rtw88: usb: Copy instead of cloning the RX skb
2024-11-25 23:48 ` Bitterblue Smith
@ 2024-11-26 1:17 ` Ping-Ke Shih
2024-11-26 20:40 ` Bitterblue Smith
0 siblings, 1 reply; 10+ messages in thread
From: Ping-Ke Shih @ 2024-11-26 1:17 UTC (permalink / raw)
To: Bitterblue Smith, linux-wireless@vger.kernel.org; +Cc: Sascha Hauer
Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
> On 18/11/2024 11:16, Ping-Ke Shih wrote:
> > Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
> >> "iperf3 -c 192.168.0.1 -R --udp -b 0" shows about 40% of datagrams
> >> are lost. Many torrents don't download faster than 3 MiB/s, probably
> >> because the Bittorrent protocol uses UDP. This is somehow related to
> >> the use of skb_clone() in the RX path.
> >
> > Using skb_clone() can improve throughput is weird to me too. Do you check
> > top with 100% cpu usage?
> >
>
> I checked the CPU usage now and the results are interesting. In short,
> patch 1/2 slightly raises the CPU usage, and patch 2/2 lowers it a lot.
Originally I just wanted to know if throughput is a limit of CPU bound.
Anyway good to know this patchset can improve CPU usage.
> >>
> >> Don't use skb_clone(). Instead allocate a new skb for each 802.11 frame
> >> received and copy the data from the big (32768 byte) skb.
> >>
> >> With this patch, "iperf3 -c 192.168.0.1 -R --udp -b 0" shows only 1-2%
> >> of datagrams are lost, and torrents can reach download speeds of 36
> >> MiB/s.
> >>
> >> Tested with RTL8812AU and RTL8822CU.
> >>
> >> Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
> >> ---
> >> drivers/net/wireless/realtek/rtw88/usb.c | 52 ++++++++++++++----------
> >> 1 file changed, 31 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
> >> index 93ac4837e1b5..727569d4ef4b 100644
> >> --- a/drivers/net/wireless/realtek/rtw88/usb.c
> >> +++ b/drivers/net/wireless/realtek/rtw88/usb.c
> >> @@ -7,6 +7,7 @@
> >> #include <linux/mutex.h>
> >> #include "main.h"
> >> #include "debug.h"
> >> +#include "mac.h"
> >> #include "reg.h"
> >> #include "tx.h"
> >> #include "rx.h"
> >> @@ -546,49 +547,58 @@ static void rtw_usb_rx_handler(struct work_struct *work)
> >> {
> >> struct rtw_usb *rtwusb = container_of(work, struct rtw_usb, rx_work);
> >> struct rtw_dev *rtwdev = rtwusb->rtwdev;
> >> - const struct rtw_chip_info *chip = rtwdev->chip;
> >> - u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
> >> struct ieee80211_rx_status rx_status;
> >> - u32 pkt_offset, next_pkt, urb_len;
> >> struct rtw_rx_pkt_stat pkt_stat;
> >> - struct sk_buff *next_skb;
> >> + struct sk_buff *rx_skb;
> >> struct sk_buff *skb;
> >> + u32 pkt_desc_sz = rtwdev->chip->rx_pkt_desc_sz;
> >> + u32 max_skb_len = pkt_desc_sz + PHY_STATUS_SIZE * 8 +
> >> + IEEE80211_MAX_MPDU_LEN_VHT_11454;
> >> + u32 pkt_offset, next_pkt, skb_len;
> >> u8 *rx_desc;
> >> int limit;
> >>
> >> for (limit = 0; limit < 200; limit++) {
> >> - skb = skb_dequeue(&rtwusb->rx_queue);
> >> - if (!skb)
> >> + rx_skb = skb_dequeue(&rtwusb->rx_queue);
> >> + if (!rx_skb)
> >> break;
> >>
> >> if (skb_queue_len(&rtwusb->rx_queue) >= RTW_USB_MAX_RXQ_LEN) {
> >> dev_dbg_ratelimited(rtwdev->dev, "failed to get rx_queue, overflow\n");
> >> - dev_kfree_skb_any(skb);
> >> + dev_kfree_skb_any(rx_skb);
> >> continue;
> >> }
> >>
> >> - urb_len = skb->len;
> >> + rx_desc = rx_skb->data;
> >>
> >> do {
> >> - rx_desc = skb->data;
> >> rtw_rx_query_rx_desc(rtwdev, rx_desc, &pkt_stat,
> >> &rx_status);
> >> pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
> >> pkt_stat.shift;
> >>
> >> - next_pkt = round_up(pkt_stat.pkt_len + pkt_offset, 8);
> >> + skb_len = pkt_stat.pkt_len + pkt_offset;
> >> + if (skb_len > max_skb_len) {
> >
> > This seems a new rule introduced by this patch. Do you really encounter this
> > case? Maybe this is the cause of slow download throughput?
> >
>
> No, I never saw this case. It just seemed like a good idea to check the
> size passed to alloc_skb. Maybe it's not needed?
I think it is fine.
Asking some questions before, I just tried to find a cause why 40% datagram get
lost as you mentioned in commit message, but I can't.
>
> >> + rtw_dbg(rtwdev, RTW_DBG_USB,
> >> + "skipping too big packet: %u\n",
> >> + skb_len);
> >> + goto skip_packet;
> >> + }
> >>
> >> - if (urb_len >= next_pkt + pkt_desc_sz)
> >> - next_skb = skb_clone(skb, GFP_KERNEL);
> >> - else
> >> - next_skb = NULL;
> >> + skb = alloc_skb(skb_len, GFP_KERNEL);
> >> + if (!skb) {
> >> + rtw_dbg(rtwdev, RTW_DBG_USB,
> >> + "failed to allocate RX skb of size %u\n",
> >> + skb_len);
> >> + goto skip_packet;
> >> + }
> >> +
> >> + skb_put_data(skb, rx_desc, skb_len);
> >>
> >> if (pkt_stat.is_c2h) {
> >> - skb_trim(skb, pkt_stat.pkt_len + pkt_offset);
> >> rtw_fw_c2h_cmd_rx_irqsafe(rtwdev, pkt_offset, skb);
> >> } else {
> >> skb_pull(skb, pkt_offset);
> >> - skb_trim(skb, pkt_stat.pkt_len);
> >> rtw_update_rx_freq_for_invalid(rtwdev, skb,
> >> &rx_status,
> >> &pkt_stat);
> >> @@ -597,12 +607,12 @@ static void rtw_usb_rx_handler(struct work_struct *work)
> >> ieee80211_rx_irqsafe(rtwdev->hw, skb);
> >> }
> >>
> >> - skb = next_skb;
> >> - if (skb)
> >> - skb_pull(skb, next_pkt);
> >> +skip_packet:
> >> + next_pkt = round_up(skb_len, 8);
> >> + rx_desc += next_pkt;
> >> + } while (rx_desc + pkt_desc_sz < rx_skb->data + rx_skb->len);
> >>
> >> - urb_len -= next_pkt;
> >> - } while (skb);
> >> + dev_kfree_skb_any(rx_skb);
> >> }
> >> }
> >>
> >> --
> >> 2.47.0
> >
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH 2/2] wifi: rtw88: usb: Preallocate and reuse the RX skbs
2024-11-25 23:49 ` Bitterblue Smith
@ 2024-11-26 1:26 ` Ping-Ke Shih
2024-12-11 18:07 ` Bitterblue Smith
0 siblings, 1 reply; 10+ messages in thread
From: Ping-Ke Shih @ 2024-11-26 1:26 UTC (permalink / raw)
To: Bitterblue Smith, linux-wireless@vger.kernel.org; +Cc: Sascha Hauer
Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
> On 19/11/2024 02:50, Ping-Ke Shih wrote:
> > Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
> >> The USB driver uses four USB Request Blocks for RX. Before submitting
> >> one, it allocates a 32768 byte skb for the RX data. This allocation can
> >> fail, maybe due to temporary memory fragmentation. When the allocation
> >> fails, the corresponding URB is never submitted again. After four such
> >> allocation failures, all RX stops because the driver is not requesting
> >> data from the device anymore.
> >>
> >> Don't allocate a 32768 byte skb when submitting a USB Request Block
> >> (which happens very often). Instead preallocate 8 such skbs, and reuse
> >> them over and over. If all 8 are busy, allocate a new one. This is
> >> pretty rare. If the allocation fails, use a work to try again later.
> >> When there are enough free skbs again, free the excess skbs.
> >>
> >> Also, use WQ_BH for the RX workqueue. With a normal or high priority
> >> workqueue the skbs are processed too slowly when the system is even a
> >> little busy, like when opening a new page in a browser, and the driver
> >> runs out of free skbs and allocates a lot of new ones.
> >>
> >> Move C2H_ADAPTIVITY to the c2h workqueue instead of handling it directly
> >> from rtw_fw_c2h_cmd_rx_irqsafe(), which runs in the RX workqueue. It
> >> reads hardware registers, which is not "irqsafe" with USB.
> >
> > This part should be in a separated patch.
> >
>
> Do you mean just C2H_ADAPTIVITY or WQ_BH as well?
Just C2H_ADAPTIVITY.
With patch subject, people can't understand this change.
>
> >>
> >> This is more or less what the out-of-tree Realtek drivers do, except
> >> they use a tasklet instead of a BH workqueue.
> >>
> >> Tested with RTL8723DU, RTL8821AU, RTL8812AU, RTL8812BU, RTL8822CU,
> >> RTL8811CU.
> >>
> >> Closes: https://lore.kernel.org/linux-wireless/6e7ecb47-7ea0-433a-a19f-05f88a2edf6b@gmail.com/
> >> Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
> >> ---
> >> drivers/net/wireless/realtek/rtw88/fw.c | 7 +--
> >> drivers/net/wireless/realtek/rtw88/usb.c | 73 ++++++++++++++++++++----
> >> drivers/net/wireless/realtek/rtw88/usb.h | 3 +
> >> 3 files changed, 67 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/net/wireless/realtek/rtw88/fw.c b/drivers/net/wireless/realtek/rtw88/fw.c
> >> index e6e9946fbf44..02389b7c6876 100644
> >> --- a/drivers/net/wireless/realtek/rtw88/fw.c
> >> +++ b/drivers/net/wireless/realtek/rtw88/fw.c
> >> @@ -332,6 +332,9 @@ void rtw_fw_c2h_cmd_handle(struct rtw_dev *rtwdev, struct sk_buff *skb)
> >> case C2H_RA_RPT:
> >> rtw_fw_ra_report_handle(rtwdev, c2h->payload, len);
> >> break;
> >> + case C2H_ADAPTIVITY:
> >> + rtw_fw_adaptivity_result(rtwdev, c2h->payload, len);
> >> + break;
> >> default:
> >> rtw_dbg(rtwdev, RTW_DBG_FW, "C2H 0x%x isn't handled\n", c2h->id);
> >> break;
> >> @@ -367,10 +370,6 @@ void rtw_fw_c2h_cmd_rx_irqsafe(struct rtw_dev *rtwdev, u32 pkt_offset,
> >> rtw_fw_scan_result(rtwdev, c2h->payload, len);
> >> dev_kfree_skb_any(skb);
> >> break;
> >> - case C2H_ADAPTIVITY:
> >> - rtw_fw_adaptivity_result(rtwdev, c2h->payload, len);
> >> - dev_kfree_skb_any(skb);
> >> - break;
> >> default:
> >> /* pass offset for further operation */
> >> *((u32 *)skb->cb) = pkt_offset;
> >> diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
> >> index 727569d4ef4b..5c2dfa2ced92 100644
> >> --- a/drivers/net/wireless/realtek/rtw88/usb.c
> >> +++ b/drivers/net/wireless/realtek/rtw88/usb.c
> >> @@ -585,7 +585,7 @@ static void rtw_usb_rx_handler(struct work_struct *work)
> >> goto skip_packet;
> >> }
> >>
> >> - skb = alloc_skb(skb_len, GFP_KERNEL);
> >> + skb = alloc_skb(skb_len, GFP_ATOMIC);
> >> if (!skb) {
> >> rtw_dbg(rtwdev, RTW_DBG_USB,
> >> "failed to allocate RX skb of size %u\n",
> >> @@ -612,7 +612,11 @@ static void rtw_usb_rx_handler(struct work_struct *work)
> >> rx_desc += next_pkt;
> >> } while (rx_desc + pkt_desc_sz < rx_skb->data + rx_skb->len);
> >>
> >> - dev_kfree_skb_any(rx_skb);
> >> + if (skb_queue_len(&rtwusb->rx_free_queue) >=
> >> + RTW_USB_RX_SKB_NUM - RTW_USB_RXCB_NUM)
> >> + dev_kfree_skb_any(rx_skb);
> >> + else
> >> + skb_queue_tail(&rtwusb->rx_free_queue, rx_skb);
> >
> > Why not just queue and reuse rx_skb for all? That would be simpler.
> >
>
> I didn't want to let it allocate too many skbs. I didn't find any kind
> of limit in the official drivers, so maybe it would be fine.
The case ' skb_queue_len(&rtwusb->rx_free_queue) >= 8 - 4' is rare?
If so, I think this is fine. If not, to repeatedly allocate and free
would cause memory fragment, and higher probability to failed to allocate
memory with GFP_ATOMIC. Also seemingly additional 4 persistent rx_skb
is not a big cost.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] wifi: rtw88: usb: Copy instead of cloning the RX skb
2024-11-26 1:17 ` Ping-Ke Shih
@ 2024-11-26 20:40 ` Bitterblue Smith
0 siblings, 0 replies; 10+ messages in thread
From: Bitterblue Smith @ 2024-11-26 20:40 UTC (permalink / raw)
To: Ping-Ke Shih, linux-wireless@vger.kernel.org; +Cc: Sascha Hauer
On 26/11/2024 03:17, Ping-Ke Shih wrote:
> Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>> On 18/11/2024 11:16, Ping-Ke Shih wrote:
>>> Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>>>> "iperf3 -c 192.168.0.1 -R --udp -b 0" shows about 40% of datagrams
>>>> are lost. Many torrents don't download faster than 3 MiB/s, probably
>>>> because the Bittorrent protocol uses UDP. This is somehow related to
>>>> the use of skb_clone() in the RX path.
>>>
>>> Using skb_clone() can improve throughput is weird to me too. Do you check
>>> top with 100% cpu usage?
>>>
>>
>> I checked the CPU usage now and the results are interesting. In short,
>> patch 1/2 slightly raises the CPU usage, and patch 2/2 lowers it a lot.
>
> Originally I just wanted to know if throughput is a limit of CPU bound.
> Anyway good to know this patchset can improve CPU usage.
>
>>>>
>>>> Don't use skb_clone(). Instead allocate a new skb for each 802.11 frame
>>>> received and copy the data from the big (32768 byte) skb.
>>>>
>>>> With this patch, "iperf3 -c 192.168.0.1 -R --udp -b 0" shows only 1-2%
>>>> of datagrams are lost, and torrents can reach download speeds of 36
>>>> MiB/s.
>>>>
>>>> Tested with RTL8812AU and RTL8822CU.
>>>>
>>>> Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
>>>> ---
>>>> drivers/net/wireless/realtek/rtw88/usb.c | 52 ++++++++++++++----------
>>>> 1 file changed, 31 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
>>>> index 93ac4837e1b5..727569d4ef4b 100644
>>>> --- a/drivers/net/wireless/realtek/rtw88/usb.c
>>>> +++ b/drivers/net/wireless/realtek/rtw88/usb.c
>>>> @@ -7,6 +7,7 @@
>>>> #include <linux/mutex.h>
>>>> #include "main.h"
>>>> #include "debug.h"
>>>> +#include "mac.h"
>>>> #include "reg.h"
>>>> #include "tx.h"
>>>> #include "rx.h"
>>>> @@ -546,49 +547,58 @@ static void rtw_usb_rx_handler(struct work_struct *work)
>>>> {
>>>> struct rtw_usb *rtwusb = container_of(work, struct rtw_usb, rx_work);
>>>> struct rtw_dev *rtwdev = rtwusb->rtwdev;
>>>> - const struct rtw_chip_info *chip = rtwdev->chip;
>>>> - u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
>>>> struct ieee80211_rx_status rx_status;
>>>> - u32 pkt_offset, next_pkt, urb_len;
>>>> struct rtw_rx_pkt_stat pkt_stat;
>>>> - struct sk_buff *next_skb;
>>>> + struct sk_buff *rx_skb;
>>>> struct sk_buff *skb;
>>>> + u32 pkt_desc_sz = rtwdev->chip->rx_pkt_desc_sz;
>>>> + u32 max_skb_len = pkt_desc_sz + PHY_STATUS_SIZE * 8 +
>>>> + IEEE80211_MAX_MPDU_LEN_VHT_11454;
>>>> + u32 pkt_offset, next_pkt, skb_len;
>>>> u8 *rx_desc;
>>>> int limit;
>>>>
>>>> for (limit = 0; limit < 200; limit++) {
>>>> - skb = skb_dequeue(&rtwusb->rx_queue);
>>>> - if (!skb)
>>>> + rx_skb = skb_dequeue(&rtwusb->rx_queue);
>>>> + if (!rx_skb)
>>>> break;
>>>>
>>>> if (skb_queue_len(&rtwusb->rx_queue) >= RTW_USB_MAX_RXQ_LEN) {
>>>> dev_dbg_ratelimited(rtwdev->dev, "failed to get rx_queue, overflow\n");
>>>> - dev_kfree_skb_any(skb);
>>>> + dev_kfree_skb_any(rx_skb);
>>>> continue;
>>>> }
>>>>
>>>> - urb_len = skb->len;
>>>> + rx_desc = rx_skb->data;
>>>>
>>>> do {
>>>> - rx_desc = skb->data;
>>>> rtw_rx_query_rx_desc(rtwdev, rx_desc, &pkt_stat,
>>>> &rx_status);
>>>> pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
>>>> pkt_stat.shift;
>>>>
>>>> - next_pkt = round_up(pkt_stat.pkt_len + pkt_offset, 8);
>>>> + skb_len = pkt_stat.pkt_len + pkt_offset;
>>>> + if (skb_len > max_skb_len) {
>>>
>>> This seems a new rule introduced by this patch. Do you really encounter this
>>> case? Maybe this is the cause of slow download throughput?
>>>
>>
>> No, I never saw this case. It just seemed like a good idea to check the
>> size passed to alloc_skb. Maybe it's not needed?
>
> I think it is fine.
>
> Asking some questions before, I just tried to find a cause why 40% datagram get
> lost as you mentioned in commit message, but I can't.
>
By the way, I saw 30-40% datagrams lost with RTL8822CE as well. But the
PCI driver is not using skb_clone so I'm not sure what is going on there.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] wifi: rtw88: usb: Preallocate and reuse the RX skbs
2024-11-26 1:26 ` Ping-Ke Shih
@ 2024-12-11 18:07 ` Bitterblue Smith
0 siblings, 0 replies; 10+ messages in thread
From: Bitterblue Smith @ 2024-12-11 18:07 UTC (permalink / raw)
To: Ping-Ke Shih, linux-wireless@vger.kernel.org; +Cc: Sascha Hauer
On 26/11/2024 03:26, Ping-Ke Shih wrote:
> Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>> On 19/11/2024 02:50, Ping-Ke Shih wrote:
>>> Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>>>> The USB driver uses four USB Request Blocks for RX. Before submitting
>>>> one, it allocates a 32768 byte skb for the RX data. This allocation can
>>>> fail, maybe due to temporary memory fragmentation. When the allocation
>>>> fails, the corresponding URB is never submitted again. After four such
>>>> allocation failures, all RX stops because the driver is not requesting
>>>> data from the device anymore.
>>>>
>>>> Don't allocate a 32768 byte skb when submitting a USB Request Block
>>>> (which happens very often). Instead preallocate 8 such skbs, and reuse
>>>> them over and over. If all 8 are busy, allocate a new one. This is
>>>> pretty rare. If the allocation fails, use a work to try again later.
>>>> When there are enough free skbs again, free the excess skbs.
>>>>
>>>> Also, use WQ_BH for the RX workqueue. With a normal or high priority
>>>> workqueue the skbs are processed too slowly when the system is even a
>>>> little busy, like when opening a new page in a browser, and the driver
>>>> runs out of free skbs and allocates a lot of new ones.
>>>>
>>>> Move C2H_ADAPTIVITY to the c2h workqueue instead of handling it directly
>>>> from rtw_fw_c2h_cmd_rx_irqsafe(), which runs in the RX workqueue. It
>>>> reads hardware registers, which is not "irqsafe" with USB.
>>>
>>> This part should be in a separated patch.
>>>
>>
>> Do you mean just C2H_ADAPTIVITY or WQ_BH as well?
>
> Just C2H_ADAPTIVITY.
>
> With patch subject, people can't understand this change.
>
>>
>>>>
>>>> This is more or less what the out-of-tree Realtek drivers do, except
>>>> they use a tasklet instead of a BH workqueue.
>>>>
>>>> Tested with RTL8723DU, RTL8821AU, RTL8812AU, RTL8812BU, RTL8822CU,
>>>> RTL8811CU.
>>>>
>>>> Closes: https://lore.kernel.org/linux-wireless/6e7ecb47-7ea0-433a-a19f-05f88a2edf6b@gmail.com/
>>>> Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
>>>> ---
>>>> drivers/net/wireless/realtek/rtw88/fw.c | 7 +--
>>>> drivers/net/wireless/realtek/rtw88/usb.c | 73 ++++++++++++++++++++----
>>>> drivers/net/wireless/realtek/rtw88/usb.h | 3 +
>>>> 3 files changed, 67 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/drivers/net/wireless/realtek/rtw88/fw.c b/drivers/net/wireless/realtek/rtw88/fw.c
>>>> index e6e9946fbf44..02389b7c6876 100644
>>>> --- a/drivers/net/wireless/realtek/rtw88/fw.c
>>>> +++ b/drivers/net/wireless/realtek/rtw88/fw.c
>>>> @@ -332,6 +332,9 @@ void rtw_fw_c2h_cmd_handle(struct rtw_dev *rtwdev, struct sk_buff *skb)
>>>> case C2H_RA_RPT:
>>>> rtw_fw_ra_report_handle(rtwdev, c2h->payload, len);
>>>> break;
>>>> + case C2H_ADAPTIVITY:
>>>> + rtw_fw_adaptivity_result(rtwdev, c2h->payload, len);
>>>> + break;
>>>> default:
>>>> rtw_dbg(rtwdev, RTW_DBG_FW, "C2H 0x%x isn't handled\n", c2h->id);
>>>> break;
>>>> @@ -367,10 +370,6 @@ void rtw_fw_c2h_cmd_rx_irqsafe(struct rtw_dev *rtwdev, u32 pkt_offset,
>>>> rtw_fw_scan_result(rtwdev, c2h->payload, len);
>>>> dev_kfree_skb_any(skb);
>>>> break;
>>>> - case C2H_ADAPTIVITY:
>>>> - rtw_fw_adaptivity_result(rtwdev, c2h->payload, len);
>>>> - dev_kfree_skb_any(skb);
>>>> - break;
>>>> default:
>>>> /* pass offset for further operation */
>>>> *((u32 *)skb->cb) = pkt_offset;
>>>> diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
>>>> index 727569d4ef4b..5c2dfa2ced92 100644
>>>> --- a/drivers/net/wireless/realtek/rtw88/usb.c
>>>> +++ b/drivers/net/wireless/realtek/rtw88/usb.c
>>>> @@ -585,7 +585,7 @@ static void rtw_usb_rx_handler(struct work_struct *work)
>>>> goto skip_packet;
>>>> }
>>>>
>>>> - skb = alloc_skb(skb_len, GFP_KERNEL);
>>>> + skb = alloc_skb(skb_len, GFP_ATOMIC);
>>>> if (!skb) {
>>>> rtw_dbg(rtwdev, RTW_DBG_USB,
>>>> "failed to allocate RX skb of size %u\n",
>>>> @@ -612,7 +612,11 @@ static void rtw_usb_rx_handler(struct work_struct *work)
>>>> rx_desc += next_pkt;
>>>> } while (rx_desc + pkt_desc_sz < rx_skb->data + rx_skb->len);
>>>>
>>>> - dev_kfree_skb_any(rx_skb);
>>>> + if (skb_queue_len(&rtwusb->rx_free_queue) >=
>>>> + RTW_USB_RX_SKB_NUM - RTW_USB_RXCB_NUM)
>>>> + dev_kfree_skb_any(rx_skb);
>>>> + else
>>>> + skb_queue_tail(&rtwusb->rx_free_queue, rx_skb);
>>>
>>> Why not just queue and reuse rx_skb for all? That would be simpler.
>>>
>>
>> I didn't want to let it allocate too many skbs. I didn't find any kind
>> of limit in the official drivers, so maybe it would be fine.
>
> The case ' skb_queue_len(&rtwusb->rx_free_queue) >= 8 - 4' is rare?
> If so, I think this is fine. If not, to repeatedly allocate and free
> would cause memory fragment, and higher probability to failed to allocate
> memory with GFP_ATOMIC. Also seemingly additional 4 persistent rx_skb
> is not a big cost.
>
>
I feel like it's rare for me, but I never counted how often it happens.
Someone else reported that in a period of 14.5 hours additional skbs
were allocated and freed 295 times. This is their log:
https://github.com/user-attachments/files/17792914/dmesg_grep_rtw_8821au.txt
This computer has i7-6700 CPU with 32 GB of RAM.
Another user reported that additional skbs were allocated and freed
3824 times in a period of 1.25 hours. This is their log:
https://github.com/user-attachments/files/18008735/log.txt
This computer is a Raspberry Pi 3B with 1 GB RAM.
I agree that another 4 skbs would be fine. I don't know what the upper
limit should be. The user with the Raspberry Pi 3B also tested with no
limit and reported that the driver allocated 17 extra skbs. That seems
like too many to me.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-12-11 18:07 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-14 23:06 [PATCH 1/2] wifi: rtw88: usb: Copy instead of cloning the RX skb Bitterblue Smith
2024-11-14 23:07 ` [PATCH 2/2] wifi: rtw88: usb: Preallocate and reuse the RX skbs Bitterblue Smith
2024-11-19 0:50 ` Ping-Ke Shih
2024-11-25 23:49 ` Bitterblue Smith
2024-11-26 1:26 ` Ping-Ke Shih
2024-12-11 18:07 ` Bitterblue Smith
2024-11-18 9:16 ` [PATCH 1/2] wifi: rtw88: usb: Copy instead of cloning the RX skb Ping-Ke Shih
2024-11-25 23:48 ` Bitterblue Smith
2024-11-26 1:17 ` Ping-Ke Shih
2024-11-26 20:40 ` Bitterblue Smith
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).