Linux virtualization list
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/2] virtio_net: xsk: rx and tx wake up
@ 2026-06-11  2:56 menglong8.dong
  2026-06-11  2:56 ` [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx " menglong8.dong
  2026-06-11  2:56 ` [PATCH net-next v2 2/2] virtio-net: xsk: support tx " menglong8.dong
  0 siblings, 2 replies; 4+ messages in thread
From: menglong8.dong @ 2026-06-11  2:56 UTC (permalink / raw)
  To: xuanzhuo, eperezma
  Cc: mst, jasowang, andrew+netdev, davem, edumazet, kuba, pabeni,
	minhquangbui99, kerneljasonxing, netdev, virtualization,
	linux-kernel

From: Menglong Dong <dongml2@chinatelecom.cn>

In the first patch, we fix a race condition for the xsk rx wake up in
virtio-net.

In the second patch, we support xsk tx wake up for virtio-net.

Changes since v1:
- split the rx and tx into two patch
- add the Fixes tag


Menglong Dong (2):
  virtio_net: xsk: fix race in rx wake up
  virtio-net: xsk: support tx wake up

 drivers/net/virtio_net.c | 52 ++++++++++++++++++++++++++++++++++------
 1 file changed, 45 insertions(+), 7 deletions(-)

-- 
2.54.0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx wake up
  2026-06-11  2:56 [PATCH net-next v2 0/2] virtio_net: xsk: rx and tx wake up menglong8.dong
@ 2026-06-11  2:56 ` menglong8.dong
  2026-06-11 16:24   ` Bui Quang Minh
  2026-06-11  2:56 ` [PATCH net-next v2 2/2] virtio-net: xsk: support tx " menglong8.dong
  1 sibling, 1 reply; 4+ messages in thread
From: menglong8.dong @ 2026-06-11  2:56 UTC (permalink / raw)
  To: xuanzhuo, eperezma
  Cc: mst, jasowang, andrew+netdev, davem, edumazet, kuba, pabeni,
	minhquangbui99, kerneljasonxing, netdev, virtualization,
	linux-kernel

From: Menglong Dong <dongml2@chinatelecom.cn>

During packet receiving in virtio-net, the rq can be empty, which means
"rq->vq->num_free == virtqueue_get_vring_size(rq->vq)", in
virtnet_add_recvbuf_xsk(), if we are using xsk. Meanwhile, the fill ring
can be empty too, which means we can't allocate anything from
xsk_buff_alloc_batch(). Then, we will set the XDP_RING_NEED_WAKEUP flag.

However, if the user clean all the data in rx ring and fill the
"fill ring" and check the XDP_RING_NEED_WAKEUP flag after
xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(), then the rx
napi will never be scheduled: the rx ring is empty, which means we will
never receive a packet to trigger the further recv fill. The rx ring is
empty now, so the user will not check the flag too.

Fix this by set the XDP_RING_NEED_WAKEUP flag before
xsk_buff_alloc_batch() if both rq->vq and fill ring are empty.

Meanwhile, set the XDP_RING_NEED_WAKEUP flag if we have any free entry in
rq->vq.

Fixes: e3f8800aa243 ("virtio-net: xsk: Support wakeup on RX side")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 drivers/net/virtio_net.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f4adcfee7a80..4b5b3fa62008 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1323,16 +1323,27 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
 				   struct xsk_buff_pool *pool, gfp_t gfp)
 {
 	struct xdp_buff **xsk_buffs;
+	bool need_wakeup;
 	dma_addr_t addr;
 	int err = 0;
 	u32 len, i;
 	int num;
 
+	need_wakeup = xsk_uses_need_wakeup(pool);
 	xsk_buffs = rq->xsk_buffs;
 
+	/* If both rq->vq and fill ring are empty, and then the user submit
+	 * all the chunks to the fill ring and check the wake up flag
+	 * after xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(),
+	 * we will lose the chance to wake up the rx napi, so we have to
+	 * set the need_wakeup flag here.
+	 */
+	if (need_wakeup && virtqueue_get_vring_size(rq->vq) == rq->vq->num_free)
+		xsk_set_rx_need_wakeup(pool);
+
 	num = xsk_buff_alloc_batch(pool, xsk_buffs, rq->vq->num_free);
 	if (!num) {
-		if (xsk_uses_need_wakeup(pool)) {
+		if (need_wakeup) {
 			xsk_set_rx_need_wakeup(pool);
 			/* Return 0 instead of -ENOMEM so that NAPI is
 			 * descheduled.
@@ -1341,8 +1352,6 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
 		}
 
 		return -ENOMEM;
-	} else {
-		xsk_clear_rx_need_wakeup(pool);
 	}
 
 	len = xsk_pool_get_rx_frame_size(pool) + vi->hdr_len;
@@ -1363,6 +1372,16 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
 			goto err;
 	}
 
+	if (need_wakeup) {
+		if (rq->vq->num_free)
+			/* We have free buffers, so we'd better wake up the
+			 * rx napi as soon as possible.
+			 */
+			xsk_set_rx_need_wakeup(pool);
+		else
+			xsk_clear_rx_need_wakeup(pool);
+	}
+
 	return num;
 
 err:
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH net-next v2 2/2] virtio-net: xsk: support tx wake up
  2026-06-11  2:56 [PATCH net-next v2 0/2] virtio_net: xsk: rx and tx wake up menglong8.dong
  2026-06-11  2:56 ` [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx " menglong8.dong
@ 2026-06-11  2:56 ` menglong8.dong
  1 sibling, 0 replies; 4+ messages in thread
From: menglong8.dong @ 2026-06-11  2:56 UTC (permalink / raw)
  To: xuanzhuo, eperezma
  Cc: mst, jasowang, andrew+netdev, davem, edumazet, kuba, pabeni,
	minhquangbui99, kerneljasonxing, netdev, virtualization,
	linux-kernel

From: Menglong Dong <dongml2@chinatelecom.cn>

For now, XDP_RING_NEED_WAKEUP is not supported properly by the virtio-net
in the tx path for example: we set xsk_set_tx_need_wakeup() in
virtnet_xsk_xmit(), but we didn't call xsk_clear_tx_need_wakeup()
anywhere, which means the user will call send() for every packet.

We call xsk_set_tx_need_wakeup() after virtnet_xsk_xmit_batch() if sq->vq
is empty, as we can't be wakeup by the skb_xmit_done() in this case.
Otherwise, we will clear the wakeup flag.

Race condition is considered for tx path.

Fixes: 89f86675cb03 ("virtio_net: xsk: tx: support xmit xsk buffer")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 drivers/net/virtio_net.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4b5b3fa62008..86b5c1ca568c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1459,8 +1459,9 @@ static bool virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_sq_free_stats stats = {};
 	struct net_device *dev = vi->dev;
+	int sent, vring_size;
+	bool need_wakeup;
 	u64 kicks = 0;
-	int sent;
 
 	/* Avoid to wakeup napi meanless, so call __free_old_xmit instead of
 	 * free_old_xmit().
@@ -1470,8 +1471,29 @@ static bool virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 	if (stats.xsk)
 		xsk_tx_completed(sq->xsk_pool, stats.xsk);
 
+	vring_size = virtqueue_get_vring_size(sq->vq);
+	need_wakeup = xsk_uses_need_wakeup(pool);
+	/* If the sq->vq is empty, and the tx ring is empty, and the user
+	 * submit an entry to the tx ring after virtnet_xsk_xmit_batch() and
+	 * before xsk_set_tx_need_wakeup(), we will lose the chance to wake
+	 * up the tx napi, so we have to set the need_wakeup flag here.
+	 */
+	if (need_wakeup && vring_size == sq->vq->num_free)
+		xsk_set_tx_need_wakeup(pool);
+
 	sent = virtnet_xsk_xmit_batch(sq, pool, budget, &kicks);
 
+	if (need_wakeup) {
+		if (vring_size == sq->vq->num_free)
+			/* we can't wake up by ourself, and it should be done
+			 * by the user.
+			 */
+			xsk_set_tx_need_wakeup(pool);
+		else
+			/* we can wake up from skb_xmit_done() */
+			xsk_clear_tx_need_wakeup(pool);
+	}
+
 	if (!is_xdp_raw_buffer_queue(vi, sq - vi->sq))
 		check_sq_full_and_disable(vi, vi->dev, sq);
 
@@ -1489,9 +1511,6 @@ static bool virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 	u64_stats_add(&sq->stats.xdp_tx,  sent);
 	u64_stats_update_end(&sq->stats.syncp);
 
-	if (xsk_uses_need_wakeup(pool))
-		xsk_set_tx_need_wakeup(pool);
-
 	return sent;
 }
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx wake up
  2026-06-11  2:56 ` [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx " menglong8.dong
@ 2026-06-11 16:24   ` Bui Quang Minh
  0 siblings, 0 replies; 4+ messages in thread
From: Bui Quang Minh @ 2026-06-11 16:24 UTC (permalink / raw)
  To: menglong8.dong, xuanzhuo, eperezma
  Cc: mst, jasowang, andrew+netdev, davem, edumazet, kuba, pabeni,
	kerneljasonxing, netdev, virtualization, linux-kernel

On 6/11/26 09:56, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <dongml2@chinatelecom.cn>
>
> During packet receiving in virtio-net, the rq can be empty, which means
> "rq->vq->num_free == virtqueue_get_vring_size(rq->vq)", in
> virtnet_add_recvbuf_xsk(), if we are using xsk. Meanwhile, the fill ring
> can be empty too, which means we can't allocate anything from
> xsk_buff_alloc_batch(). Then, we will set the XDP_RING_NEED_WAKEUP flag.
>
> However, if the user clean all the data in rx ring and fill the
> "fill ring" and check the XDP_RING_NEED_WAKEUP flag after
> xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(), then the rx
> napi will never be scheduled: the rx ring is empty, which means we will
> never receive a packet to trigger the further recv fill. The rx ring is
> empty now, so the user will not check the flag too.
>
> Fix this by set the XDP_RING_NEED_WAKEUP flag before
> xsk_buff_alloc_batch() if both rq->vq and fill ring are empty.
>
> Meanwhile, set the XDP_RING_NEED_WAKEUP flag if we have any free entry in
> rq->vq.
>
> Fixes: e3f8800aa243 ("virtio-net: xsk: Support wakeup on RX side")
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> ---
>   drivers/net/virtio_net.c | 25 ++++++++++++++++++++++---
>   1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f4adcfee7a80..4b5b3fa62008 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1323,16 +1323,27 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
>   				   struct xsk_buff_pool *pool, gfp_t gfp)
>   {
>   	struct xdp_buff **xsk_buffs;
> +	bool need_wakeup;
>   	dma_addr_t addr;
>   	int err = 0;
>   	u32 len, i;
>   	int num;
>   
> +	need_wakeup = xsk_uses_need_wakeup(pool);
>   	xsk_buffs = rq->xsk_buffs;
>   
> +	/* If both rq->vq and fill ring are empty, and then the user submit
> +	 * all the chunks to the fill ring and check the wake up flag
> +	 * after xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(),
> +	 * we will lose the chance to wake up the rx napi, so we have to
> +	 * set the need_wakeup flag here.
> +	 */
> +	if (need_wakeup && virtqueue_get_vring_size(rq->vq) == rq->vq->num_free)
> +		xsk_set_rx_need_wakeup(pool);

I think when polling the receive queue, the userspace program needs to 
check the XDP_RING_NEED_WAKEUP flag if it does not see any packets. The 
flag check is quite lightweight in my opinion. Here are some examples I find

- 
https://github.com/xdp-project/xdp-tools/blob/e9469501622aa22a7e452a671000bec8685edcde/lib/util/xdpsock.c#L1206
- 
https://github.com/xdp-project/bpf-examples/blob/43e565901c4287efa863edca7f0e6cd6e35ed896/AF_XDP-forwarding/xsk_fwd.c#L540

Furthermore, the XDP_RING_NEED_WAKEUP flag related functions does not 
provide any memory orderings. So even with your patch, I'm worried that 
this case is possible

kernel userspace

xsk_buff_alloc_batch -> failed
                                                             submit fill 
ring
                                                             flag != 
XDP_RING_NEED_WAKEUP
// reordering due to lack of memory orderings
xsk_set_rx_need_wakeup

I'm not expert here, so correct me if I'm wrong. I think the wake up 
flag is designed with no orderings so we cannot rely on it to reason and 
skip further checks.

> +
>   	num = xsk_buff_alloc_batch(pool, xsk_buffs, rq->vq->num_free);
>   	if (!num) {
> -		if (xsk_uses_need_wakeup(pool)) {
> +		if (need_wakeup) {
>   			xsk_set_rx_need_wakeup(pool);
>   			/* Return 0 instead of -ENOMEM so that NAPI is
>   			 * descheduled.
> @@ -1341,8 +1352,6 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
>   		}
>   
>   		return -ENOMEM;
> -	} else {
> -		xsk_clear_rx_need_wakeup(pool);
>   	}
>   
>   	len = xsk_pool_get_rx_frame_size(pool) + vi->hdr_len;
> @@ -1363,6 +1372,16 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
>   			goto err;
>   	}
>   
> +	if (need_wakeup) {
> +		if (rq->vq->num_free)
> +			/* We have free buffers, so we'd better wake up the
> +			 * rx napi as soon as possible.
> +			 */
> +			xsk_set_rx_need_wakeup(pool);
> +		else
> +			xsk_clear_rx_need_wakeup(pool);
> +	}
> +

Why do we need to set XDP_RING_NEED_WAKEUP even when 
xsk_buff_alloc_batch succeeds?

>   	return num;
>   
>   err:

Thanks,
Quang Minh.



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-11 16:24 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-11  2:56 [PATCH net-next v2 0/2] virtio_net: xsk: rx and tx wake up menglong8.dong
2026-06-11  2:56 ` [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx " menglong8.dong
2026-06-11 16:24   ` Bui Quang Minh
2026-06-11  2:56 ` [PATCH net-next v2 2/2] virtio-net: xsk: support tx " menglong8.dong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox