Linux virtualization list
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/2] virtio_net: xsk: rx and tx wake up
@ 2026-06-11  2:56 menglong8.dong
  2026-06-11  2:56 ` [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx " menglong8.dong
  2026-06-11  2:56 ` [PATCH net-next v2 2/2] virtio-net: xsk: support tx " menglong8.dong
  0 siblings, 2 replies; 5+ messages in thread
From: menglong8.dong @ 2026-06-11  2:56 UTC (permalink / raw)
  To: xuanzhuo, eperezma
  Cc: mst, jasowang, andrew+netdev, davem, edumazet, kuba, pabeni,
	minhquangbui99, kerneljasonxing, netdev, virtualization,
	linux-kernel

From: Menglong Dong <dongml2@chinatelecom.cn>

In the first patch, we fix a race condition for the xsk rx wake up in
virtio-net.

In the second patch, we support xsk tx wake up for virtio-net.

Changes since v1:
- split the rx and tx into two patch
- add the Fixes tag


Menglong Dong (2):
  virtio_net: xsk: fix race in rx wake up
  virtio-net: xsk: support tx wake up

 drivers/net/virtio_net.c | 52 ++++++++++++++++++++++++++++++++++------
 1 file changed, 45 insertions(+), 7 deletions(-)

-- 
2.54.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx wake up
  2026-06-11  2:56 [PATCH net-next v2 0/2] virtio_net: xsk: rx and tx wake up menglong8.dong
@ 2026-06-11  2:56 ` menglong8.dong
  2026-06-11 16:24   ` Bui Quang Minh
  2026-06-11  2:56 ` [PATCH net-next v2 2/2] virtio-net: xsk: support tx " menglong8.dong
  1 sibling, 1 reply; 5+ messages in thread
From: menglong8.dong @ 2026-06-11  2:56 UTC (permalink / raw)
  To: xuanzhuo, eperezma
  Cc: mst, jasowang, andrew+netdev, davem, edumazet, kuba, pabeni,
	minhquangbui99, kerneljasonxing, netdev, virtualization,
	linux-kernel

From: Menglong Dong <dongml2@chinatelecom.cn>

During packet receiving in virtio-net, the rq can be empty, which means
"rq->vq->num_free == virtqueue_get_vring_size(rq->vq)", in
virtnet_add_recvbuf_xsk(), if we are using xsk. Meanwhile, the fill ring
can be empty too, which means we can't allocate anything from
xsk_buff_alloc_batch(). Then, we will set the XDP_RING_NEED_WAKEUP flag.

However, if the user clean all the data in rx ring and fill the
"fill ring" and check the XDP_RING_NEED_WAKEUP flag after
xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(), then the rx
napi will never be scheduled: the rx ring is empty, which means we will
never receive a packet to trigger the further recv fill. The rx ring is
empty now, so the user will not check the flag too.

Fix this by set the XDP_RING_NEED_WAKEUP flag before
xsk_buff_alloc_batch() if both rq->vq and fill ring are empty.

Meanwhile, set the XDP_RING_NEED_WAKEUP flag if we have any free entry in
rq->vq.

Fixes: e3f8800aa243 ("virtio-net: xsk: Support wakeup on RX side")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 drivers/net/virtio_net.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f4adcfee7a80..4b5b3fa62008 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1323,16 +1323,27 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
 				   struct xsk_buff_pool *pool, gfp_t gfp)
 {
 	struct xdp_buff **xsk_buffs;
+	bool need_wakeup;
 	dma_addr_t addr;
 	int err = 0;
 	u32 len, i;
 	int num;
 
+	need_wakeup = xsk_uses_need_wakeup(pool);
 	xsk_buffs = rq->xsk_buffs;
 
+	/* If both rq->vq and fill ring are empty, and then the user submit
+	 * all the chunks to the fill ring and check the wake up flag
+	 * after xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(),
+	 * we will lose the chance to wake up the rx napi, so we have to
+	 * set the need_wakeup flag here.
+	 */
+	if (need_wakeup && virtqueue_get_vring_size(rq->vq) == rq->vq->num_free)
+		xsk_set_rx_need_wakeup(pool);
+
 	num = xsk_buff_alloc_batch(pool, xsk_buffs, rq->vq->num_free);
 	if (!num) {
-		if (xsk_uses_need_wakeup(pool)) {
+		if (need_wakeup) {
 			xsk_set_rx_need_wakeup(pool);
 			/* Return 0 instead of -ENOMEM so that NAPI is
 			 * descheduled.
@@ -1341,8 +1352,6 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
 		}
 
 		return -ENOMEM;
-	} else {
-		xsk_clear_rx_need_wakeup(pool);
 	}
 
 	len = xsk_pool_get_rx_frame_size(pool) + vi->hdr_len;
@@ -1363,6 +1372,16 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
 			goto err;
 	}
 
+	if (need_wakeup) {
+		if (rq->vq->num_free)
+			/* We have free buffers, so we'd better wake up the
+			 * rx napi as soon as possible.
+			 */
+			xsk_set_rx_need_wakeup(pool);
+		else
+			xsk_clear_rx_need_wakeup(pool);
+	}
+
 	return num;
 
 err:
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH net-next v2 2/2] virtio-net: xsk: support tx wake up
  2026-06-11  2:56 [PATCH net-next v2 0/2] virtio_net: xsk: rx and tx wake up menglong8.dong
  2026-06-11  2:56 ` [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx " menglong8.dong
@ 2026-06-11  2:56 ` menglong8.dong
  1 sibling, 0 replies; 5+ messages in thread
From: menglong8.dong @ 2026-06-11  2:56 UTC (permalink / raw)
  To: xuanzhuo, eperezma
  Cc: mst, jasowang, andrew+netdev, davem, edumazet, kuba, pabeni,
	minhquangbui99, kerneljasonxing, netdev, virtualization,
	linux-kernel

From: Menglong Dong <dongml2@chinatelecom.cn>

For now, XDP_RING_NEED_WAKEUP is not supported properly by the virtio-net
in the tx path for example: we set xsk_set_tx_need_wakeup() in
virtnet_xsk_xmit(), but we didn't call xsk_clear_tx_need_wakeup()
anywhere, which means the user will call send() for every packet.

We call xsk_set_tx_need_wakeup() after virtnet_xsk_xmit_batch() if sq->vq
is empty, as we can't be wakeup by the skb_xmit_done() in this case.
Otherwise, we will clear the wakeup flag.

Race condition is considered for tx path.

Fixes: 89f86675cb03 ("virtio_net: xsk: tx: support xmit xsk buffer")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 drivers/net/virtio_net.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4b5b3fa62008..86b5c1ca568c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1459,8 +1459,9 @@ static bool virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_sq_free_stats stats = {};
 	struct net_device *dev = vi->dev;
+	int sent, vring_size;
+	bool need_wakeup;
 	u64 kicks = 0;
-	int sent;
 
 	/* Avoid to wakeup napi meanless, so call __free_old_xmit instead of
 	 * free_old_xmit().
@@ -1470,8 +1471,29 @@ static bool virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 	if (stats.xsk)
 		xsk_tx_completed(sq->xsk_pool, stats.xsk);
 
+	vring_size = virtqueue_get_vring_size(sq->vq);
+	need_wakeup = xsk_uses_need_wakeup(pool);
+	/* If the sq->vq is empty, and the tx ring is empty, and the user
+	 * submit an entry to the tx ring after virtnet_xsk_xmit_batch() and
+	 * before xsk_set_tx_need_wakeup(), we will lose the chance to wake
+	 * up the tx napi, so we have to set the need_wakeup flag here.
+	 */
+	if (need_wakeup && vring_size == sq->vq->num_free)
+		xsk_set_tx_need_wakeup(pool);
+
 	sent = virtnet_xsk_xmit_batch(sq, pool, budget, &kicks);
 
+	if (need_wakeup) {
+		if (vring_size == sq->vq->num_free)
+			/* we can't wake up by ourself, and it should be done
+			 * by the user.
+			 */
+			xsk_set_tx_need_wakeup(pool);
+		else
+			/* we can wake up from skb_xmit_done() */
+			xsk_clear_tx_need_wakeup(pool);
+	}
+
 	if (!is_xdp_raw_buffer_queue(vi, sq - vi->sq))
 		check_sq_full_and_disable(vi, vi->dev, sq);
 
@@ -1489,9 +1511,6 @@ static bool virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 	u64_stats_add(&sq->stats.xdp_tx,  sent);
 	u64_stats_update_end(&sq->stats.syncp);
 
-	if (xsk_uses_need_wakeup(pool))
-		xsk_set_tx_need_wakeup(pool);
-
 	return sent;
 }
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx wake up
  2026-06-11  2:56 ` [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx " menglong8.dong
@ 2026-06-11 16:24   ` Bui Quang Minh
  2026-06-13 12:26     ` Menglong Dong
  0 siblings, 1 reply; 5+ messages in thread
From: Bui Quang Minh @ 2026-06-11 16:24 UTC (permalink / raw)
  To: menglong8.dong, xuanzhuo, eperezma
  Cc: mst, jasowang, andrew+netdev, davem, edumazet, kuba, pabeni,
	kerneljasonxing, netdev, virtualization, linux-kernel

On 6/11/26 09:56, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <dongml2@chinatelecom.cn>
>
> During packet receiving in virtio-net, the rq can be empty, which means
> "rq->vq->num_free == virtqueue_get_vring_size(rq->vq)", in
> virtnet_add_recvbuf_xsk(), if we are using xsk. Meanwhile, the fill ring
> can be empty too, which means we can't allocate anything from
> xsk_buff_alloc_batch(). Then, we will set the XDP_RING_NEED_WAKEUP flag.
>
> However, if the user clean all the data in rx ring and fill the
> "fill ring" and check the XDP_RING_NEED_WAKEUP flag after
> xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(), then the rx
> napi will never be scheduled: the rx ring is empty, which means we will
> never receive a packet to trigger the further recv fill. The rx ring is
> empty now, so the user will not check the flag too.
>
> Fix this by set the XDP_RING_NEED_WAKEUP flag before
> xsk_buff_alloc_batch() if both rq->vq and fill ring are empty.
>
> Meanwhile, set the XDP_RING_NEED_WAKEUP flag if we have any free entry in
> rq->vq.
>
> Fixes: e3f8800aa243 ("virtio-net: xsk: Support wakeup on RX side")
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> ---
>   drivers/net/virtio_net.c | 25 ++++++++++++++++++++++---
>   1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f4adcfee7a80..4b5b3fa62008 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1323,16 +1323,27 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
>   				   struct xsk_buff_pool *pool, gfp_t gfp)
>   {
>   	struct xdp_buff **xsk_buffs;
> +	bool need_wakeup;
>   	dma_addr_t addr;
>   	int err = 0;
>   	u32 len, i;
>   	int num;
>   
> +	need_wakeup = xsk_uses_need_wakeup(pool);
>   	xsk_buffs = rq->xsk_buffs;
>   
> +	/* If both rq->vq and fill ring are empty, and then the user submit
> +	 * all the chunks to the fill ring and check the wake up flag
> +	 * after xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(),
> +	 * we will lose the chance to wake up the rx napi, so we have to
> +	 * set the need_wakeup flag here.
> +	 */
> +	if (need_wakeup && virtqueue_get_vring_size(rq->vq) == rq->vq->num_free)
> +		xsk_set_rx_need_wakeup(pool);

I think when polling the receive queue, the userspace program needs to 
check the XDP_RING_NEED_WAKEUP flag if it does not see any packets. The 
flag check is quite lightweight in my opinion. Here are some examples I find

- 
https://github.com/xdp-project/xdp-tools/blob/e9469501622aa22a7e452a671000bec8685edcde/lib/util/xdpsock.c#L1206
- 
https://github.com/xdp-project/bpf-examples/blob/43e565901c4287efa863edca7f0e6cd6e35ed896/AF_XDP-forwarding/xsk_fwd.c#L540

Furthermore, the XDP_RING_NEED_WAKEUP flag related functions does not 
provide any memory orderings. So even with your patch, I'm worried that 
this case is possible

kernel userspace

xsk_buff_alloc_batch -> failed
                                                             submit fill 
ring
                                                             flag != 
XDP_RING_NEED_WAKEUP
// reordering due to lack of memory orderings
xsk_set_rx_need_wakeup

I'm not expert here, so correct me if I'm wrong. I think the wake up 
flag is designed with no orderings so we cannot rely on it to reason and 
skip further checks.

> +
>   	num = xsk_buff_alloc_batch(pool, xsk_buffs, rq->vq->num_free);
>   	if (!num) {
> -		if (xsk_uses_need_wakeup(pool)) {
> +		if (need_wakeup) {
>   			xsk_set_rx_need_wakeup(pool);
>   			/* Return 0 instead of -ENOMEM so that NAPI is
>   			 * descheduled.
> @@ -1341,8 +1352,6 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
>   		}
>   
>   		return -ENOMEM;
> -	} else {
> -		xsk_clear_rx_need_wakeup(pool);
>   	}
>   
>   	len = xsk_pool_get_rx_frame_size(pool) + vi->hdr_len;
> @@ -1363,6 +1372,16 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
>   			goto err;
>   	}
>   
> +	if (need_wakeup) {
> +		if (rq->vq->num_free)
> +			/* We have free buffers, so we'd better wake up the
> +			 * rx napi as soon as possible.
> +			 */
> +			xsk_set_rx_need_wakeup(pool);
> +		else
> +			xsk_clear_rx_need_wakeup(pool);
> +	}
> +

Why do we need to set XDP_RING_NEED_WAKEUP even when 
xsk_buff_alloc_batch succeeds?

>   	return num;
>   
>   err:

Thanks,
Quang Minh.



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx wake up
  2026-06-11 16:24   ` Bui Quang Minh
@ 2026-06-13 12:26     ` Menglong Dong
  0 siblings, 0 replies; 5+ messages in thread
From: Menglong Dong @ 2026-06-13 12:26 UTC (permalink / raw)
  To: menglong8.dong, xuanzhuo, eperezma, Bui Quang Minh
  Cc: mst, jasowang, andrew+netdev, davem, edumazet, kuba, pabeni,
	kerneljasonxing, netdev, virtualization, linux-kernel

On 2026/6/12 00:24, Bui Quang Minh wrote:
> On 6/11/26 09:56, menglong8.dong@gmail.com wrote:
> > From: Menglong Dong <dongml2@chinatelecom.cn>
> >
> > During packet receiving in virtio-net, the rq can be empty, which means
> > "rq->vq->num_free == virtqueue_get_vring_size(rq->vq)", in
> > virtnet_add_recvbuf_xsk(), if we are using xsk. Meanwhile, the fill ring
> > can be empty too, which means we can't allocate anything from
> > xsk_buff_alloc_batch(). Then, we will set the XDP_RING_NEED_WAKEUP flag.
> >
> > However, if the user clean all the data in rx ring and fill the
> > "fill ring" and check the XDP_RING_NEED_WAKEUP flag after
> > xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(), then the rx
> > napi will never be scheduled: the rx ring is empty, which means we will
> > never receive a packet to trigger the further recv fill. The rx ring is
> > empty now, so the user will not check the flag too.
> >
> > Fix this by set the XDP_RING_NEED_WAKEUP flag before
> > xsk_buff_alloc_batch() if both rq->vq and fill ring are empty.
> >
> > Meanwhile, set the XDP_RING_NEED_WAKEUP flag if we have any free entry in
> > rq->vq.
> >
> > Fixes: e3f8800aa243 ("virtio-net: xsk: Support wakeup on RX side")
> > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> > ---
> >   drivers/net/virtio_net.c | 25 ++++++++++++++++++++++---
> >   1 file changed, 22 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index f4adcfee7a80..4b5b3fa62008 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1323,16 +1323,27 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
> >   				   struct xsk_buff_pool *pool, gfp_t gfp)
> >   {
> >   	struct xdp_buff **xsk_buffs;
> > +	bool need_wakeup;
> >   	dma_addr_t addr;
> >   	int err = 0;
> >   	u32 len, i;
> >   	int num;
> >   
> > +	need_wakeup = xsk_uses_need_wakeup(pool);
> >   	xsk_buffs = rq->xsk_buffs;
> >   
> > +	/* If both rq->vq and fill ring are empty, and then the user submit
> > +	 * all the chunks to the fill ring and check the wake up flag
> > +	 * after xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(),
> > +	 * we will lose the chance to wake up the rx napi, so we have to
> > +	 * set the need_wakeup flag here.
> > +	 */
> > +	if (need_wakeup && virtqueue_get_vring_size(rq->vq) == rq->vq->num_free)
> > +		xsk_set_rx_need_wakeup(pool);
> 

Hi, Bui Quang. Thanks for your reply. I spent some time learning
what you said.

> I think when polling the receive queue, the userspace program needs to 
> check the XDP_RING_NEED_WAKEUP flag if it does not see any packets. The 
> flag check is quite lightweight in my opinion. Here are some examples I find
> 
> - 
> https://github.com/xdp-project/xdp-tools/blob/e9469501622aa22a7e452a671000bec8685edcde/lib/util/xdpsock.c#L1206

You are right, I'm over concerned about this point. My origin
concern is that we can't wake up from the poll syscall in this case:

The chunk of the umem is 2000. In the beginning, the xsk->fill_ring
is filled with 2000 chunk, and then the user fall asleep and don't
do anything.

Kernel: the 2000th packet is received
Kernel: xsk_buff_alloc_batch return 0(xsk->fill_ring is empty and xsk->rx_ring is full)

        User: handle the xsk->rx_ring
        User: fill the xsk->fill_ring with 2000 chunks
        User: check the wake up flag
        User: no need_wakeup flag, fall asleep with poll() syscall

Kernel: call xsk_set_rx_need_wakeup()
Kernel: virio-net rx ringbuf is empty, we can't receive any packet further
Kernel: to call virtnet_add_recvbuf_xsk(), we are dead

But then, I found that we can still be wake up with the 2000th
packet from the poll syscall, which means that the case that
the NAPI and the user can't both be waked up doesn't exist.

> - 
> https://github.com/xdp-project/bpf-examples/blob/43e565901c4287efa863edca7f0e6cd6e35ed896/AF_XDP-forwarding/xsk_fwd.c#L540
> 
> Furthermore, the XDP_RING_NEED_WAKEUP flag related functions does not 
> provide any memory orderings. So even with your patch, I'm worried that 
> this case is possible
> 
> kernel userspace
> 
> xsk_buff_alloc_batch -> failed
>                                                              submit fill 
> ring
>                                                              flag != 
> XDP_RING_NEED_WAKEUP
> // reordering due to lack of memory orderings
> xsk_set_rx_need_wakeup
> 
> I'm not expert here, so correct me if I'm wrong. I think the wake up 
> flag is designed with no orderings so we cannot rely on it to reason and 
> skip further checks.
> 
> > +
> >   	num = xsk_buff_alloc_batch(pool, xsk_buffs, rq->vq->num_free);
[....]
> > +
> 
> Why do we need to set XDP_RING_NEED_WAKEUP even when 
> xsk_buff_alloc_batch succeeds?

Ah, don't mind here. I just thought that if xsk_buff_alloc_batch()
didn't allocate enough chunks as we need, we can wake up
the NAPI as soon as possible, in case that the virtio-net ringbuf
is full and cause packet dropping :)

Anyway, I'll remove the first patch, and send the second patch
only in the V3.

Thanks!
Menglong Dong

> 
> >   	return num;
> >   
> >   err:
> 
> Thanks,
> Quang Minh.
> 
> 
> 
> 





^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-06-13 12:26 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-11  2:56 [PATCH net-next v2 0/2] virtio_net: xsk: rx and tx wake up menglong8.dong
2026-06-11  2:56 ` [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx " menglong8.dong
2026-06-11 16:24   ` Bui Quang Minh
2026-06-13 12:26     ` Menglong Dong
2026-06-11  2:56 ` [PATCH net-next v2 2/2] virtio-net: xsk: support tx " menglong8.dong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox