From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: "Nikhil P. Rao" <nikhil.rao@amd.com>, <netdev@vger.kernel.org>,
<magnus.karlsson@intel.com>, <sdf@fomichev.me>,
<davem@davemloft.net>, <edumazet@google.com>, <pabeni@redhat.com>,
<horms@kernel.org>, <kerneljasonxing@gmail.com>
Subject: Re: [PATCH net v4 2/2] xsk: Fix zero-copy AF_XDP fragment drop
Date: Fri, 20 Feb 2026 13:37:09 +0100
Message-ID: <aZhVdcTDceAlhLvV@boxer>
In-Reply-To: <20260219145529.42b177d3@kernel.org>

On Thu, Feb 19, 2026 at 02:55:29PM -0800, Jakub Kicinski wrote:
> On Tue, 17 Feb 2026 21:08:51 +0000 Nikhil P. Rao wrote:
> > AF_XDP should ensure that only a complete packet is sent to application.
> > In the zero-copy case, if the Rx queue gets full as fragments are being
> > enqueued, the remaining fragments are dropped.
> >
> > For the multi-buffer case, add a check to ensure that the Rx queue has
> > enough space for all fragments of a packet before starting to enqueue
> > them.
> >
> > Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
> > Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
> > ---
> > net/xdp/xsk.c | 23 +++++++++++++++--------
> > 1 file changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index f2ec4f78bbb6..f7f816a5cb80 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -167,25 +167,32 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
> > struct xdp_buff_xsk *pos, *tmp;
> > struct list_head *xskb_list;
> > u32 contd = 0;
> > + u32 num_desc;
> > int err;
> >
> > - if (frags)
> > + if (frags) {
> > + num_desc = xdp_get_shared_info_from_buff(xdp)->nr_frags + 1;
> > contd = XDP_PKT_CONTD;
>
> [1]
>
> > + } else {
> > + err = __xsk_rcv_zc(xs, xskb, len, contd);
> > + if (err)
> > + goto err;
> > + return 0;
> > + }
> >
> > - err = __xsk_rcv_zc(xs, xskb, len, contd);
> > - if (err)
> > + if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) {
>
> We can pull this check into the branch at [1]
> It will let us preserve the existing flow.

Hi Jakub,

that would work, yes.

>
> Either that or handle the non-frag case fully upfront:
>
> if (likely(!frags)) {
> err = __xsk_rcv_zc(xs, xskb, len, 0);
> if (err)
> goto err;
> return 0;
> }
>
> As is you have a weird mix of the two.
>
> > + xs->rx_queue_full++;
> > + err = -ENOBUFS;
> > goto err;
> > - if (likely(!frags))
> > - return 0;
> > + }
> >
> > + __xsk_rcv_zc(xs, xskb, len, contd);
>
> Personal preference perhaps but removing error checking always
> gives me pause. Maybe:
>
> bool frag_fail;
>
> frag_fail = __xsk_rcv_zc(xs, xskb, len, contd);
> list_for_each...
> ...
> frag_fail |= __xsk_rcv_zc(xs, xskb, len, contd);
> DEBUG_NET_WARN_ON_ONCE(frag_fail);

Error checking can actually be skipped here, since xskq_prod_nb_free()
has already peeked into the xsk Rx queue and told us there is enough
space to produce all the descriptors. I have sent a patch that adds a
variant of __xsk_rcv_zc() that skips xskq_prod_reserve_desc():

https://lore.kernel.org/bpf/20260218150000.301176-1-maciej.fijalkowski@intel.com/
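
To illustrate the argument above outside of the kernel, here is a
minimal userspace sketch of the "check free space once, then enqueue
without per-slot error checks" pattern. Everything in it (struct
toy_ring, toy_ring_free(), toy_ring_push_packet()) is a hypothetical
stand-in and not the xsk code; it only assumes a single producer and a
power-of-two ring with free-running indices.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_RING_SLOTS 8

/* Hypothetical toy ring; NOT the kernel's struct xsk_queue. */
struct toy_ring {
	uint32_t prod;       /* free-running producer index */
	uint32_t cons;       /* free-running consumer index */
	uint32_t size;       /* slots in use, power of two, <= TOY_RING_SLOTS */
	uint32_t full_drops; /* packets rejected because the ring was full */
	uint32_t lens[TOY_RING_SLOTS];
};

/* Number of free slots; unsigned wrap-around keeps this correct. */
static uint32_t toy_ring_free(const struct toy_ring *r)
{
	return r->size - (r->prod - r->cons);
}

/*
 * Enqueue all fragments of one packet, or none of them.
 * Returns 0 on success and -ENOBUFS if the packet does not fit.
 */
static int toy_ring_push_packet(struct toy_ring *r,
				const uint32_t *frag_lens, uint32_t nfrags)
{
	assert(r->size && (r->size & (r->size - 1)) == 0);
	assert(r->size <= TOY_RING_SLOTS);

	/* Single up-front check covering every fragment of the packet. */
	if (toy_ring_free(r) < nfrags) {
		r->full_drops++;
		return -ENOBUFS;
	}

	/*
	 * Space was verified above and there is only one producer, so
	 * the individual stores cannot fail and need no error handling.
	 */
	for (uint32_t i = 0; i < nfrags; i++)
		r->lens[r->prod++ & (r->size - 1)] = frag_lens[i];

	return 0;
}
```

With a 4-slot ring that already holds 3 entries, pushing a 2-fragment
packet is rejected as a whole and the producer index does not move, so
the consumer never sees a partial packet.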

The logistics of these patches (this set and the patch linked above) are
a bit of an open question to me, though, since what Nikhil sent are
clearly fixes that need backports, whereas mine was sent as an
improvement towards the -next tree. However, the path that Nikhil
touched here should be adjusted to what my patch introduces. I might do
this as a follow-up once bpf is merged to bpf-next.

Nikhil, I also see you routed the set to the 'net' tree; previously the
xsk core was handled via bpf/bpf-next.

>
> ?
>
> > xskb_list = &xskb->pool->xskb_list;
> > list_for_each_entry_safe(pos, tmp, xskb_list, list_node) {
> > if (list_is_singular(xskb_list))
> > contd = 0;
> > len = pos->xdp.data_end - pos->xdp.data;
> > - err = __xsk_rcv_zc(xs, pos, len, contd);
> > - if (err)
> > - goto err;
> > + __xsk_rcv_zc(xs, pos, len, contd);
> > list_del_init(&pos->list_node);
> > }
> >
> --
> pw-bot: cr