From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: "Nikhil P. Rao" <nikhil.rao@amd.com>, <netdev@vger.kernel.org>,
<magnus.karlsson@intel.com>, <sdf@fomichev.me>,
<davem@davemloft.net>, <edumazet@google.com>, <pabeni@redhat.com>,
<horms@kernel.org>, <kerneljasonxing@gmail.com>
Subject: Re: [PATCH net v4 2/2] xsk: Fix zero-copy AF_XDP fragment drop
Date: Fri, 20 Feb 2026 13:37:09 +0100
Message-ID: <aZhVdcTDceAlhLvV@boxer>
In-Reply-To: <20260219145529.42b177d3@kernel.org>
On Thu, Feb 19, 2026 at 02:55:29PM -0800, Jakub Kicinski wrote:
> On Tue, 17 Feb 2026 21:08:51 +0000 Nikhil P. Rao wrote:
> > AF_XDP should ensure that only a complete packet is sent to application.
> > In the zero-copy case, if the Rx queue gets full as fragments are being
> > enqueued, the remaining fragments are dropped.
> >
> > For the multi-buffer case, add a check to ensure that the Rx queue has
> > enough space for all fragments of a packet before starting to enqueue
> > them.
> >
> > Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
> > Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
> > ---
> > net/xdp/xsk.c | 23 +++++++++++++++--------
> > 1 file changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index f2ec4f78bbb6..f7f816a5cb80 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -167,25 +167,32 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
> > struct xdp_buff_xsk *pos, *tmp;
> > struct list_head *xskb_list;
> > u32 contd = 0;
> > + u32 num_desc;
> > int err;
> >
> > - if (frags)
> > + if (frags) {
> > + num_desc = xdp_get_shared_info_from_buff(xdp)->nr_frags + 1;
> > contd = XDP_PKT_CONTD;
>
> [1]
>
> > + } else {
> > + err = __xsk_rcv_zc(xs, xskb, len, contd);
> > + if (err)
> > + goto err;
> > + return 0;
> > + }
> >
> > - err = __xsk_rcv_zc(xs, xskb, len, contd);
> > - if (err)
> > + if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) {
>
> We can pull this check into the branch at [1]
> It will let us preserve the existing flow.
Hi Jakub,
that would work, yes.
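For illustration only, the flow with the space check pulled into the branch at [1] could look roughly like the user-space model below. This is a sketch, not the actual net/xdp/xsk.c code: `toy_ring`, `toy_prod_nb_free()`, `toy_produce()` and `toy_rcv()` are hypothetical stand-ins for the xsk Rx queue, xskq_prod_nb_free(), __xsk_rcv_zc() and xsk_rcv_zc(), and the ring is reduced to a plain counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_PKT_CONTD 1u	/* hypothetical stand-in for XDP_PKT_CONTD */

struct toy_ring {
	uint32_t size;		/* total descriptor slots */
	uint32_t used;		/* slots currently occupied */
	uint32_t rx_queue_full;	/* drop counter, like xs->rx_queue_full */
};

/* Stand-in for xskq_prod_nb_free(): number of free slots, capped at max. */
static uint32_t toy_prod_nb_free(const struct toy_ring *r, uint32_t max)
{
	uint32_t free_slots;

	assert(r->used <= r->size);
	free_slots = r->size - r->used;
	return free_slots < max ? free_slots : max;
}

/* Stand-in for __xsk_rcv_zc(): produce one descriptor or fail. */
static int toy_produce(struct toy_ring *r, uint32_t flags)
{
	(void)flags;		/* a real descriptor would carry the flags */
	if (r->used == r->size) {
		r->rx_queue_full++;
		return -ENOBUFS;
	}
	r->used++;
	return 0;
}

/* Receive one packet: a head buffer plus nr_frags fragments. */
static int toy_rcv(struct toy_ring *r, uint32_t nr_frags)
{
	uint32_t contd = 0, i;
	int err;

	if (nr_frags) {
		uint32_t num_desc = nr_frags + 1;

		/* All-or-nothing: refuse the packet before touching the ring. */
		if (toy_prod_nb_free(r, num_desc) < num_desc) {
			r->rx_queue_full++;
			return -ENOBUFS;
		}
		contd = TOY_PKT_CONTD;
	}

	/* From here on the pre-existing flow is unchanged. */
	err = toy_produce(r, contd);
	if (err)
		return err;
	if (!nr_frags)
		return 0;

	for (i = 0; i < nr_frags; i++) {
		if (i == nr_frags - 1)
			contd = 0;	/* last fragment ends the packet */
		err = toy_produce(r, contd);
		if (err)
			return err;
	}
	return 0;
}
```

In this model a three-descriptor packet offered to a ring with two free slots is rejected as a whole and leaves the ring untouched, while the single-buffer path never pays for the extra check.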
>
> Either that or handle the non-frag case fully upfront:
>
> if (likely(!frags)) {
> err = __xsk_rcv_zc(xs, xskb, len, 0);
> if (err)
> goto err;
> return 0;
> }
>
> As is you have a weird mix of the two.
>
> > + xs->rx_queue_full++;
> > + err = -ENOBUFS;
> > goto err;
> > - if (likely(!frags))
> > - return 0;
> > + }
> >
> > + __xsk_rcv_zc(xs, xskb, len, contd);
>
> Personal preference perhaps but removing error checking always
> gives me pause. Maybe:
>
> bool frag_fail;
>
> frag_fail = __xsk_rcv_zc(xs, xskb, len, contd);
> list_for_each...
> ...
> frag_fail |= __xsk_rcv_zc(xs, xskb, len, contd);
> DEBUG_NET_WARN_ON_ONCE(frag_fail);
Error checking can actually be skipped here, as xskq_prod_nb_free() peeked into
the xsk Rx queue and told us there is enough space for descriptor production.
I have sent a patch that adds a variant of __xsk_rcv_zc() that skips
xskq_prod_reserve_desc():
https://lore.kernel.org/bpf/20260218150000.301176-1-maciej.fijalkowski@intel.com/
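The reasoning above can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not kernel code: it assumes the ring has a single producer, so between the free-space check and the writes only the consumer can move an index, and the consumer can only free slots. `toy_q`, `toy_nb_free()`, `toy_reserve()`, `toy_consume()` and `toy_enqueue_packet()` are hypothetical names, and the plain assert() merely plays the role that DEBUG_NET_WARN_ON_ONCE() has in Jakub's suggestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_q {
	uint32_t nentries;	/* ring size */
	uint32_t prod;		/* advanced only by the producer */
	uint32_t cons;		/* advanced only by the consumer */
};

/* Free slots; unsigned subtraction stays correct across index wraparound. */
static uint32_t toy_nb_free(const struct toy_q *q)
{
	return q->nentries - (q->prod - q->cons);
}

/* Reserve one slot. Returns true on failure so results can be OR-ed. */
static bool toy_reserve(struct toy_q *q)
{
	if (toy_nb_free(q) == 0)
		return true;
	q->prod++;
	return false;
}

/* Consumer side: release n previously produced slots. */
static void toy_consume(struct toy_q *q, uint32_t n)
{
	assert(n <= q->prod - q->cons);
	q->cons += n;
}

/* Returns false if the packet was refused, true if fully enqueued. */
static bool toy_enqueue_packet(struct toy_q *q, uint32_t num_desc)
{
	bool frag_fail = false;
	uint32_t i;

	/* Peek first: either every descriptor fits or none is written. */
	if (toy_nb_free(q) < num_desc)
		return false;

	for (i = 0; i < num_desc; i++) {
		/* A consumer running here could only increase the free count. */
		frag_fail |= toy_reserve(q);
	}

	/* Cannot trigger after a successful peek with a single producer. */
	assert(!frag_fail);
	return true;
}
```

Under that single-producer assumption the per-descriptor result carries no information once the peek has succeeded, which is why it can be either ignored or reduced to a debug-only check.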
Logistics of these patches (this set & the patch linked above) are a bit of a
question to me though, since what Nikhil sent are clearly fixes that need
backports, whereas mine was sent as an improvement targeting the -next tree.
However, the path that Nikhil touched here should be adjusted to what my patch
introduces. I might do this as a follow-up once bpf is merged into bpf-next.
Nikhil, I also see you routed the set to the 'net' tree; previously xsk core
was handled via bpf/bpf-next.
>
> ?
>
> > xskb_list = &xskb->pool->xskb_list;
> > list_for_each_entry_safe(pos, tmp, xskb_list, list_node) {
> > if (list_is_singular(xskb_list))
> > contd = 0;
> > len = pos->xdp.data_end - pos->xdp.data;
> > - err = __xsk_rcv_zc(xs, pos, len, contd);
> > - if (err)
> > - goto err;
> > + __xsk_rcv_zc(xs, pos, len, contd);
> > list_del_init(&pos->list_node);
> > }
> >
> --
> pw-bot: cr