Linux wireless drivers development
* [PATCH v2] wifi: mt76: mt76u: use a threaded NAPI for the RX path
@ 2026-06-09  0:32 Filip Bakreski
  2026-06-09  9:45 ` Lorenzo Bianconi
  0 siblings, 1 reply; 3+ messages in thread
From: Filip Bakreski @ 2026-06-09  0:32 UTC
  To: nbd, lorenzo, ryder.lee; +Cc: shayne.chen, sean.wang, linux-wireless

The USB RX path delivers frames to the stack via mt76_rx_complete() with
a NULL napi pointer, taking the netif_receive_skb_list() path, so it never
benefits from GRO -- unlike the DMA-based mt76 drivers, which pass a real
napi and use napi_gro_receive(). For bulk TCP traffic this is costly, as
every segment traverses the stack individually.

Service the MT_RXQ_MAIN queue from a threaded NAPI, reusing mt76_dev's
existing napi_dev and napi[] rather than adding new fields. The URB
completion handler schedules the NAPI; its poll drains the completed URBs,
builds the skbs, resubmits the URBs and delivers the skbs through
napi_gro_receive(). The MCU queue stays on the existing RX worker. This
enables GRO and moves RX processing into its own kernel thread,
parallelising the datapath.

On mt7921u at HE-MCS 11 (2x2, 80 MHz; fast.com, multiple streams) this
averages ~588 Mbit/s, versus ~424 Mbit/s when the same napi is instead
driven manually from the RX worker, and ~380 Mbit/s for the unmodified
driver.

Suggested-by: Lorenzo Bianconi <lorenzo@kernel.org>
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Filip Bakreski <phial@phiality.com>
---
v2:
- Service MT_RXQ_MAIN from a threaded NAPI instead of a NAPI driven
  manually from the RX worker; on mt7921u the threaded variant measured
  ~39% faster (~588 vs ~424 Mbit/s, fast.com) (Lorenzo Bianconi).
- Reuse mt76_dev's existing napi_dev/napi[] instead of adding new fields
  to struct mt76_usb (Lorenzo Bianconi).
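
As a quick check on the figures quoted above, the relative gains can be
recomputed from the three averages. The helper below is only an illustrative
sketch; gain_percent() is a hypothetical name and not part of the patch. With
the numbers from this posting it gives roughly 38.7% for threaded NAPI over
the worker-driven NAPI (the "~39%" above), roughly 54.7% for threaded NAPI
over the unmodified driver, and roughly 11.6% for the worker-driven NAPI over
the unmodified driver.

```c
#include <assert.h>

/*
 * Relative gain of `after` over `before`, in percent.
 * Hypothetical helper, used only to re-derive the percentages
 * from the Mbit/s averages quoted in this posting.
 */
static double gain_percent(double before, double after)
{
	assert(before > 0.0);

	return (after / before - 1.0) * 100.0;
}
```

For example, gain_percent(424.0, 588.0) compares the worker-driven NAPI with
the threaded NAPI.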

v1: https://lore.kernel.org/linux-wireless/20260608044109.31730-1-phial@phiality.com/
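
The split between the two servicing contexts described in the commit message
can be modelled in userspace as below. This is a simplified sketch under
stated assumptions, not the driver code: every model_* name is hypothetical,
the counters stand in for URB and skb handling, and no locking or real NAPI
scheduling is modelled. It only shows which context ends up handling which
queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the mt76 RX queue ids. */
enum { MODEL_RXQ_MAIN, MODEL_RXQ_MCU, MODEL_RXQ_NUM };

struct model_dev {
	int queued[MODEL_RXQ_NUM];	/* completed URBs awaiting processing */
	bool napi_scheduled;		/* models napi_schedule() for MAIN */
	bool worker_scheduled;		/* models mt76_worker_schedule() */
	int gro_delivered;		/* frames sent through the NAPI path */
	int worker_delivered;		/* frames handled by the RX worker */
};

/* Models mt76u_complete_rx(): account the URB, then pick the context. */
static void model_complete_rx(struct model_dev *dev, int qid)
{
	assert(qid >= 0 && qid < MODEL_RXQ_NUM);

	dev->queued[qid]++;
	if (qid == MODEL_RXQ_MAIN)
		dev->napi_scheduled = true;
	else
		dev->worker_scheduled = true;
}

/*
 * Models the NAPI poll: drain the MAIN queue only, then complete.
 * Returning the drained count is a convenience of this model; the
 * poll in the patch returns 0.
 */
static int model_napi_poll(struct model_dev *dev)
{
	int done = dev->queued[MODEL_RXQ_MAIN];

	dev->gro_delivered += done;
	dev->queued[MODEL_RXQ_MAIN] = 0;
	dev->napi_scheduled = false;

	return done;
}

/* Models mt76u_rx_worker(): service every queue except MAIN. */
static void model_rx_worker(struct model_dev *dev)
{
	int i;

	for (i = 0; i < MODEL_RXQ_NUM; i++) {
		if (i == MODEL_RXQ_MAIN)
			continue;

		dev->worker_delivered += dev->queued[i];
		dev->queued[i] = 0;
	}
	dev->worker_scheduled = false;
}
```

In this model, running the worker never touches frames queued on MAIN; they
stay pending until the poll function runs.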

 drivers/net/wireless/mediatek/mt76/usb.c | 56 +++++++++++++++++++++---
 1 file changed, 49 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
index d9638a9b7..aef8f855f 100644
--- a/drivers/net/wireless/mediatek/mt76/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/usb.c
@@ -580,7 +580,10 @@ static void mt76u_complete_rx(struct urb *urb)
 
 	q->head = (q->head + 1) % q->ndesc;
 	q->queued++;
-	mt76_worker_schedule(&dev->usb.rx_worker);
+	if (q == &dev->q_rx[MT_RXQ_MAIN])
+		napi_schedule(&dev->napi[MT_RXQ_MAIN]);
+	else
+		mt76_worker_schedule(&dev->usb.rx_worker);
 out:
 	spin_unlock_irqrestore(&q->lock, flags);
 }
@@ -618,11 +621,23 @@ mt76u_process_rx_queue(struct mt76_dev *dev, struct mt76_queue *q)
 		}
 		mt76u_submit_rx_buf(dev, qid, urb);
 	}
-	if (qid == MT_RXQ_MAIN) {
-		local_bh_disable();
-		mt76_rx_poll_complete(dev, MT_RXQ_MAIN, NULL);
-		local_bh_enable();
-	}
+}
+
+/* Threaded NAPI poll for the MAIN RX queue: drain URBs, build skbs, resubmit,
+ * then deliver through napi_gro_receive() and let napi_complete() flush GRO.
+ */
+static int mt76u_napi_poll(struct napi_struct *napi, int budget)
+{
+	struct mt76_dev *dev = mt76_priv(napi->dev);
+
+	rcu_read_lock();
+	mt76u_process_rx_queue(dev, &dev->q_rx[MT_RXQ_MAIN]);
+	mt76_rx_poll_complete(dev, MT_RXQ_MAIN, napi);
+	rcu_read_unlock();
+
+	napi_complete(napi);
+
+	return 0;
 }
 
 static void mt76u_rx_worker(struct mt76_worker *w)
@@ -632,8 +647,12 @@ static void mt76u_rx_worker(struct mt76_worker *w)
 	int i;
 
 	rcu_read_lock();
-	mt76_for_each_q_rx(dev, i)
+	mt76_for_each_q_rx(dev, i) {
+		/* MT_RXQ_MAIN is serviced by the threaded NAPI poll */
+		if (i == MT_RXQ_MAIN)
+			continue;
 		mt76u_process_rx_queue(dev, &dev->q_rx[i]);
+	}
 	rcu_read_unlock();
 }
 
@@ -723,6 +742,8 @@ void mt76u_stop_rx(struct mt76_dev *dev)
 	int i;
 
 	mt76_worker_disable(&dev->usb.rx_worker);
+	if (dev->napi_dev)
+		napi_disable(&dev->napi[MT_RXQ_MAIN]);
 
 	mt76_for_each_q_rx(dev, i) {
 		struct mt76_queue *q = &dev->q_rx[i];
@@ -751,6 +772,8 @@ int mt76u_resume_rx(struct mt76_dev *dev)
 	}
 
 	mt76_worker_enable(&dev->usb.rx_worker);
+	if (dev->napi_dev)
+		napi_enable(&dev->napi[MT_RXQ_MAIN]);
 
 	return 0;
 }
@@ -1051,6 +1074,13 @@ void mt76u_queues_deinit(struct mt76_dev *dev)
 	mt76u_stop_rx(dev);
 	mt76u_stop_tx(dev);
 
+	/* mt76u_stop_rx() (above) already napi_disable()d the MAIN queue */
+	if (dev->napi_dev) {
+		netif_napi_del(&dev->napi[MT_RXQ_MAIN]);
+		free_netdev(dev->napi_dev);
+		dev->napi_dev = NULL;
+	}
+
 	mt76u_free_rx(dev);
 	mt76u_free_tx(dev);
 }
@@ -1115,6 +1145,18 @@ int __mt76u_init(struct mt76_dev *dev, struct usb_interface *intf,
 	sched_set_fifo_low(usb->rx_worker.task);
 	sched_set_fifo_low(usb->status_worker.task);
 
+	/* threaded NAPI on a dummy netdev (reusing mt76_dev's napi_dev/napi[])
+	 * services the MAIN RX queue and gives the RX path GRO
+	 */
+	dev->napi_dev = alloc_netdev_dummy(sizeof(struct mt76_dev *));
+	if (!dev->napi_dev)
+		return -ENOMEM;
+	*(struct mt76_dev **)netdev_priv(dev->napi_dev) = dev;
+	strscpy(dev->napi_dev->name, "mt76u-rx", sizeof(dev->napi_dev->name));
+	dev->napi_dev->threaded = 1;
+	netif_napi_add(dev->napi_dev, &dev->napi[MT_RXQ_MAIN], mt76u_napi_poll);
+	napi_enable(&dev->napi[MT_RXQ_MAIN]);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(__mt76u_init);

base-commit: 5f6099446d1ddb888e36cdf93b6a0551f05c1267
-- 
2.54.0



* Re: [PATCH v2] wifi: mt76: mt76u: use a threaded NAPI for the RX path
  2026-06-09  0:32 [PATCH v2] wifi: mt76: mt76u: use a threaded NAPI for the RX path Filip Bakreski
@ 2026-06-09  9:45 ` Lorenzo Bianconi
  2026-06-09 10:57   ` Phiality
  0 siblings, 1 reply; 3+ messages in thread
From: Lorenzo Bianconi @ 2026-06-09  9:45 UTC
  To: Filip Bakreski; +Cc: nbd, ryder.lee, shayne.chen, sean.wang, linux-wireless

> The USB RX path delivers frames to the stack via mt76_rx_complete() with
> a NULL napi pointer, taking the netif_receive_skb_list() path, so it never
> benefits from GRO -- unlike the DMA-based mt76 drivers, which pass a real
> napi and use napi_gro_receive(). For bulk TCP traffic this is costly, as
> every segment traverses the stack individually.
> 
> Service the MT_RXQ_MAIN queue from a threaded NAPI, reusing mt76_dev's
> existing napi_dev and napi[] rather than adding new fields. The URB
> completion handler schedules the NAPI; its poll drains the completed URBs,
> builds the skbs, resubmits the URBs and delivers the skbs through
> napi_gro_receive(). The MCU queue stays on the existing RX worker. This
> enables GRO and moves RX processing into its own kernel thread,
> parallelising the datapath.
> 
> On mt7921u at HE-MCS 11 (2x2, 80 MHz; fast.com, multiple streams) this
> averages ~588 Mbit/s, versus ~424 Mbit/s when the same napi is instead
> driven manually from the RX worker, and ~380 Mbit/s for the unmodified
> driver.
> 
> Suggested-by: Lorenzo Bianconi <lorenzo@kernel.org>
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Filip Bakreski <phial@phiality.com>
> ---
> v2:
> - Service MT_RXQ_MAIN from a threaded NAPI instead of a NAPI driven
>   manually from the RX worker; on mt7921u the threaded variant measured
>   ~39% faster (~588 vs ~424 Mbit/s, fast.com) (Lorenzo Bianconi).
> - Reuse mt76_dev's existing napi_dev/napi[] instead of adding new fields
>   to struct mt76_usb (Lorenzo Bianconi).
> 
> v1: https://lore.kernel.org/linux-wireless/20260608044109.31730-1-phial@phiality.com/

Hi Filip,

I guess the patch is fine, just a couple of nits inline.

Regards,
Lorenzo

> 
>  drivers/net/wireless/mediatek/mt76/usb.c | 56 +++++++++++++++++++++---
>  1 file changed, 49 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> index d9638a9b7..aef8f855f 100644
> --- a/drivers/net/wireless/mediatek/mt76/usb.c
> +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> @@ -580,7 +580,10 @@ static void mt76u_complete_rx(struct urb *urb)
>  
>  	q->head = (q->head + 1) % q->ndesc;
>  	q->queued++;
> -	mt76_worker_schedule(&dev->usb.rx_worker);

nit: new-line here.

> +	if (q == &dev->q_rx[MT_RXQ_MAIN])
> +		napi_schedule(&dev->napi[MT_RXQ_MAIN]);
> +	else
> +		mt76_worker_schedule(&dev->usb.rx_worker);
>  out:
>  	spin_unlock_irqrestore(&q->lock, flags);
>  }
> @@ -618,11 +621,23 @@ mt76u_process_rx_queue(struct mt76_dev *dev, struct mt76_queue *q)
>  		}
>  		mt76u_submit_rx_buf(dev, qid, urb);
>  	}
> -	if (qid == MT_RXQ_MAIN) {
> -		local_bh_disable();
> -		mt76_rx_poll_complete(dev, MT_RXQ_MAIN, NULL);
> -		local_bh_enable();
> -	}
> +}
> +
> +/* Threaded NAPI poll for the MAIN RX queue: drain URBs, build skbs, resubmit,
> + * then deliver through napi_gro_receive() and let napi_complete() flush GRO.
> + */
> +static int mt76u_napi_poll(struct napi_struct *napi, int budget)
> +{
> +	struct mt76_dev *dev = mt76_priv(napi->dev);
> +
> +	rcu_read_lock();
> +	mt76u_process_rx_queue(dev, &dev->q_rx[MT_RXQ_MAIN]);
> +	mt76_rx_poll_complete(dev, MT_RXQ_MAIN, napi);
> +	rcu_read_unlock();
> +
> +	napi_complete(napi);
> +
> +	return 0;
>  }
>  
>  static void mt76u_rx_worker(struct mt76_worker *w)
> @@ -632,8 +647,12 @@ static void mt76u_rx_worker(struct mt76_worker *w)
>  	int i;
>  
>  	rcu_read_lock();
> -	mt76_for_each_q_rx(dev, i)
> +	mt76_for_each_q_rx(dev, i) {
> +		/* MT_RXQ_MAIN is serviced by the threaded NAPI poll */
> +		if (i == MT_RXQ_MAIN)
> +			continue;

nit: new-line here.

>  		mt76u_process_rx_queue(dev, &dev->q_rx[i]);
> +	}
>  	rcu_read_unlock();
>  }
>  
> @@ -723,6 +742,8 @@ void mt76u_stop_rx(struct mt76_dev *dev)
>  	int i;
>  
>  	mt76_worker_disable(&dev->usb.rx_worker);
> +	if (dev->napi_dev)
> +		napi_disable(&dev->napi[MT_RXQ_MAIN]);
>  
>  	mt76_for_each_q_rx(dev, i) {
>  		struct mt76_queue *q = &dev->q_rx[i];
> @@ -751,6 +772,8 @@ int mt76u_resume_rx(struct mt76_dev *dev)
>  	}
>  
>  	mt76_worker_enable(&dev->usb.rx_worker);
> +	if (dev->napi_dev)
> +		napi_enable(&dev->napi[MT_RXQ_MAIN]);
>  
>  	return 0;
>  }
> @@ -1051,6 +1074,13 @@ void mt76u_queues_deinit(struct mt76_dev *dev)
>  	mt76u_stop_rx(dev);
>  	mt76u_stop_tx(dev);
>  
> +	/* mt76u_stop_rx() (above) already napi_disable()d the MAIN queue */
> +	if (dev->napi_dev) {
> +		netif_napi_del(&dev->napi[MT_RXQ_MAIN]);
> +		free_netdev(dev->napi_dev);
> +		dev->napi_dev = NULL;
> +	}
> +
>  	mt76u_free_rx(dev);
>  	mt76u_free_tx(dev);
>  }
> @@ -1115,6 +1145,18 @@ int __mt76u_init(struct mt76_dev *dev, struct usb_interface *intf,
>  	sched_set_fifo_low(usb->rx_worker.task);
>  	sched_set_fifo_low(usb->status_worker.task);
>  
> +	/* threaded NAPI on a dummy netdev (reusing mt76_dev's napi_dev/napi[])
> +	 * services the MAIN RX queue and gives the RX path GRO
> +	 */
> +	dev->napi_dev = alloc_netdev_dummy(sizeof(struct mt76_dev *));
> +	if (!dev->napi_dev)
> +		return -ENOMEM;

nit: new-line here.

> +	*(struct mt76_dev **)netdev_priv(dev->napi_dev) = dev;

To make the code more readable, I guess you can define a priv pointer, similar
to mt76_dma_init().
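
For readers following the thread, the pattern being suggested is to take the
private area as a typed pointer first and store through it, rather than
casting and dereferencing in one expression. The userspace sketch below
illustrates that pattern only. All model_* names are hypothetical;
model_alloc_netdev() and model_netdev_priv() merely stand in for
alloc_netdev_dummy() and netdev_priv(), and the layout of the private area is
an assumption of the model, not how the kernel implements it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the dummy net_device and its private area. */
struct model_netdev {
	void *priv;
};

/* Hypothetical stand-in for struct mt76_dev. */
struct model_dev {
	struct model_netdev *napi_dev;
	int id;
};

static struct model_netdev *model_alloc_netdev(size_t priv_size)
{
	struct model_netdev *nd = calloc(1, sizeof(*nd));

	if (!nd)
		return NULL;

	nd->priv = calloc(1, priv_size);
	if (!nd->priv) {
		free(nd);
		return NULL;
	}

	return nd;
}

static void *model_netdev_priv(struct model_netdev *nd)
{
	assert(nd != NULL);

	return nd->priv;
}

static int model_init(struct model_dev *dev)
{
	struct model_dev **priv;

	dev->napi_dev = model_alloc_netdev(sizeof(struct model_dev *));
	if (!dev->napi_dev)
		return -1;

	/* The private area holds a back-pointer to the parent device. */
	priv = model_netdev_priv(dev->napi_dev);
	*priv = dev;

	return 0;
}

/* Reverse lookup, as a poll function would do from napi->dev. */
static struct model_dev *model_priv(struct model_netdev *nd)
{
	struct model_dev **priv = model_netdev_priv(nd);

	return *priv;
}

static void model_deinit(struct model_dev *dev)
{
	if (!dev->napi_dev)
		return;

	free(dev->napi_dev->priv);
	free(dev->napi_dev);
	dev->napi_dev = NULL;
}
```

The named priv variable makes both the type stored in the private area and
the direction of the back-pointer visible at the assignment.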


> +	strscpy(dev->napi_dev->name, "mt76u-rx", sizeof(dev->napi_dev->name));
> +	dev->napi_dev->threaded = 1;
> +	netif_napi_add(dev->napi_dev, &dev->napi[MT_RXQ_MAIN], mt76u_napi_poll);
> +	napi_enable(&dev->napi[MT_RXQ_MAIN]);
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(__mt76u_init);
> 
> base-commit: 5f6099446d1ddb888e36cdf93b6a0551f05c1267
> -- 
> 2.54.0
> 


* Re: [PATCH v2] wifi: mt76: mt76u: use a threaded NAPI for the RX path
  2026-06-09  9:45 ` Lorenzo Bianconi
@ 2026-06-09 10:57   ` Phiality
  0 siblings, 0 replies; 3+ messages in thread
From: Phiality @ 2026-06-09 10:57 UTC
  To: Lorenzo Bianconi; +Cc: nbd, ryder.lee, shayne.chen, sean.wang, linux-wireless

> I guess the patch is fine, just a couple of nits inline.

I have addressed the nits in v3:
https://lore.kernel.org/linux-wireless/20260609105301.196302-1-phial@phiality.com/

Cheers,
Filip
