From: Menglong Dong <menglong.dong@linux.dev>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Menglong Dong <menglong8.dong@gmail.com>,
xuanzhuo@linux.alibaba.com, eperezma@redhat.com,
jasowang@redhat.com, andrew+netdev@lunn.ch, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
netdev@vger.kernel.org, virtualization@lists.linux.dev,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v3] virtio-net: xsk: support tx wake up
Date: Tue, 23 Jun 2026 09:48:51 +0800
Message-ID: <d3QXZoiISIiEIwCXJDbC5A@linux.dev>
In-Reply-To: <20260622085825-mutt-send-email-mst@kernel.org>
On 2026/6/22 21:24 Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Jun 22, 2026 at 08:27:12PM +0800, Menglong Dong wrote:
> > On 2026/6/22 06:31 Michael S. Tsirkin <mst@redhat.com> write:
> > > On Tue, Jun 16, 2026 at 07:59:12PM +0800, Menglong Dong wrote:
> > [...]
[...]
> >
> > And the logic is like this:
> >
> > Kernel: tx NAPI is waked up from skb_xmit_done() ->
> > Kernel: sq->vq and xsk->tx_ring are both empty ->
> > Kernel: call virtnet_xsk_xmit_batch()
> >
> > User: submit a entry to the xsk->tx_ring
> > User: check the wakeup flag
> > User: wakeup flag is not set, skip send()
> >
> > Kernel: call xsk_set_tx_need_wakeup(), because sq->vq is empty
> >
> > If we don't send more data, the data in the xsk->tx_ring will
> > not be sent forever.
>
> I'm not 100% sure I understand, but when someone fixes cross-CPU races
> with no synchronization or CPU memory barriers just with extra checks,
> this always gives me pause.
>
> AI helped write this for me, for example:
> 1. Kernel: xsk_set_tx_need_wakeup stores NEED_WAKEUP (sits in store buffer)
> 2. Kernel: xsk_tx_peek_release_desc_batch - load, sees empty (reordered before the store is globally visible)
> 3. Kernel: peek finds nothing, returns 0
> 4. Userspace: stores entry + producer
> 5. Userspace: loads flags - doesn't see NEED_WAKEUP yet (still in kernel's store buffer)
> 6. Userspace: skips send()
> 7. Kernel: NEED_WAKEUP store finally becomes visible - too late
>
> Seems legit?
Ah, that seems right. The race is more complex than I thought.
It also seems to be a common problem with the XSK wakeup
mechanism, one that should exist for all the drivers.
So I think we can remove the check here, and I'll see whether I
can solve the problem completely in a follow-up. WDYT?
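To make sure I understand the scenario, here is a minimal userspace
sketch of the store-buffering pattern in steps 1-7, written with C11
atomics. It is only a model: struct model_xsk, model_kernel_go_idle()
and model_user_submit() are hypothetical names of mine, not kernel or
libxdp code, and I am not claiming this is how the real fix should
look. It only shows that with a full barrier between the store and the
load on both sides, at least one side has to observe the other's store.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared XSK TX state; not a kernel struct. */
struct model_xsk {
	atomic_uint producer;    /* written by the user, read by the kernel */
	atomic_uint consumer;    /* written by the kernel */
	atomic_bool need_wakeup; /* written by the kernel, read by the user */
};

void model_xsk_init(struct model_xsk *x)
{
	atomic_init(&x->producer, 0);
	atomic_init(&x->consumer, 0);
	atomic_init(&x->need_wakeup, false);
}

/* Kernel side: publish NEED_WAKEUP first, then re-check the TX ring.
 * Returns true if an entry is pending and the caller has to keep polling.
 */
bool model_kernel_go_idle(struct model_xsk *x)
{
	unsigned int prod, cons;

	atomic_store_explicit(&x->need_wakeup, true, memory_order_relaxed);
	/* Full barrier: order the flag store before the ring loads. */
	atomic_thread_fence(memory_order_seq_cst);
	prod = atomic_load_explicit(&x->producer, memory_order_relaxed);
	cons = atomic_load_explicit(&x->consumer, memory_order_relaxed);
	return prod != cons;
}

/* User side: publish the new entry first, then check the flag.
 * Returns true if the caller has to issue the wakeup syscall.
 */
bool model_user_submit(struct model_xsk *x)
{
	atomic_fetch_add_explicit(&x->producer, 1, memory_order_relaxed);
	/* Full barrier: order the producer store before the flag load. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&x->need_wakeup, memory_order_relaxed);
}
```

Without the two fences, both loads are allowed to return the old
values, which is exactly the lost wakeup described above. Note that one
of the barriers sits on the userspace side in this model, so a check
added only in the driver cannot close the window by itself.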
>
>
>
> > >
> > > > sent = virtnet_xsk_xmit_batch(sq, pool, budget, &kicks);
> > > >
> > > > + if (need_wakeup) {
> > > > + if (vring_size == sq->vq->num_free)
> > > > + /* we can't wake up by ourself, and it should be done
> > > > + * by the user.
> > > > + */
> > > > + xsk_set_tx_need_wakeup(pool);
> > > > + else
> > > > + /* we can wake up from skb_xmit_done() */
> > > > + xsk_clear_tx_need_wakeup(pool);
> > >
> > > But what if we don't have get tx napi so no wakeup in skb_xmit_done?
> >
> > Sorry that I'm not sure what "get tx napi" means here ;(
> >
> > There are entry in sq->vq, so skb_xmit_done() will be called after
> > the entries in the ring is consumed by the HOST, right?
> > Then, the corresponding sq->napi will be scheduled, as we ensure
> > that tx napi is always enabled, which means napi->weight is not
> > zero, in this commit:
> > 1df5116a41a8 ("virtio_net: xsk: prevent disable tx napi")
>
> Oh I forgot we did that. But can xsk bind when tx napi has already
> been disabled previously?
From what I have observed, it can. I think that is a separate issue, and
I was going to fix it later in a separate patch.
It is a problem, right?
There are two ways to fix it:
1. don't allow the binding if tx napi is not enabled
2. or set tx_napi->weight to 1 when binding, and restore it to 0
when unbinding.
Should I fix it in this series?
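Roughly, the two options would behave like the sketch below. This is
a standalone model, not driver code: struct model_sq and the model_*()
functions are hypothetical, the -1 return stands in for whatever errno
would be chosen, and saving the previous weight instead of hard-coding
0 on unbind is my own assumption.

```c
#include <stdbool.h>

/* Hypothetical model of a send queue's TX NAPI state; not the kernel's
 * struct send_queue or struct napi_struct.
 */
struct model_sq {
	int napi_weight;       /* 0 means TX NAPI is disabled */
	int saved_napi_weight; /* weight to restore on unbind */
	bool xsk_bound;
};

/* Option 1: refuse the bind while TX NAPI is disabled. */
int model_xsk_bind_strict(struct model_sq *sq)
{
	if (sq->napi_weight == 0)
		return -1; /* placeholder error code */
	sq->saved_napi_weight = sq->napi_weight;
	sq->xsk_bound = true;
	return 0;
}

/* Option 2: force TX NAPI on for as long as the pool is bound. */
void model_xsk_bind_force(struct model_sq *sq)
{
	sq->saved_napi_weight = sq->napi_weight;
	if (sq->napi_weight == 0)
		sq->napi_weight = 1;
	sq->xsk_bound = true;
}

/* Undo either bind variant and restore the weight seen at bind time. */
void model_xsk_unbind(struct model_sq *sq)
{
	if (!sq->xsk_bound)
		return;
	sq->napi_weight = sq->saved_napi_weight;
	sq->xsk_bound = false;
}
```

Option 1 is simpler but pushes the work to the user, who has to enable
tx napi before binding. Option 2 is transparent to the user but changes
a setting behind their back for the lifetime of the binding.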
Thanks!
Menglong Dong
>
>
> > Right?
> >
> > Thanks!
> > Menglong Dong
> >
> > >
> > >
> > > > + }
> > > > +
> > > > if (!is_xdp_raw_buffer_queue(vi, sq - vi->sq))
> > > > check_sq_full_and_disable(vi, vi->dev, sq);
> > > >
> > > > @@ -1470,9 +1488,6 @@ static bool virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
> > > > u64_stats_add(&sq->stats.xdp_tx, sent);
> > > > u64_stats_update_end(&sq->stats.syncp);
> > > >
> > > > - if (xsk_uses_need_wakeup(pool))
> > > > - xsk_set_tx_need_wakeup(pool);
> > > > -
> > > > return sent;
> > > > }
> > > >
> > > > --
> > > > 2.54.0
> > >
> > >
> > >
> >
> >
> >
>
>