public inbox for netdev@vger.kernel.org
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: "Nikhil P. Rao" <nikhil.rao@amd.com>, <netdev@vger.kernel.org>,
	<magnus.karlsson@intel.com>, <sdf@fomichev.me>,
	<davem@davemloft.net>, <edumazet@google.com>, <pabeni@redhat.com>,
	<horms@kernel.org>, <kerneljasonxing@gmail.com>
Subject: Re: [PATCH net v4 2/2] xsk: Fix zero-copy AF_XDP fragment drop
Date: Fri, 20 Feb 2026 13:37:09 +0100
Message-ID: <aZhVdcTDceAlhLvV@boxer>
In-Reply-To: <20260219145529.42b177d3@kernel.org>

On Thu, Feb 19, 2026 at 02:55:29PM -0800, Jakub Kicinski wrote:
> On Tue, 17 Feb 2026 21:08:51 +0000 Nikhil P. Rao wrote:
> > AF_XDP should ensure that only a complete packet is sent to application.
> > In the zero-copy case, if the Rx queue gets full as fragments are being
> > enqueued, the remaining fragments are dropped.
> > 
> > For the multi-buffer case, add a check to ensure that the Rx queue has
> > enough space for all fragments of a packet before starting to enqueue
> > them.
> > 
> > Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
> > Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
> > ---
> >  net/xdp/xsk.c | 23 +++++++++++++++--------
> >  1 file changed, 15 insertions(+), 8 deletions(-)
> > 
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index f2ec4f78bbb6..f7f816a5cb80 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -167,25 +167,32 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
> >  	struct xdp_buff_xsk *pos, *tmp;
> >  	struct list_head *xskb_list;
> >  	u32 contd = 0;
> > +	u32 num_desc;
> >  	int err;
> >  
> > -	if (frags)
> > +	if (frags) {
> > +		num_desc = xdp_get_shared_info_from_buff(xdp)->nr_frags + 1;
> >  		contd = XDP_PKT_CONTD;
> 
> [1]
> 
> > +	} else {
> > +		err = __xsk_rcv_zc(xs, xskb, len, contd);
> > +		if (err)
> > +			goto err;
> > +		return 0;
> > +	}
> >  
> > -	err = __xsk_rcv_zc(xs, xskb, len, contd);
> > -	if (err)
> > +	if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) {
> 
> We can pull this check into the branch at [1]
> It will let us preserve the existing flow.

Hi Jakub,

that would work, yes.

> 
> Either that or handle the non-frag case fully upfront:
> 
> if (likely(!frags)) {
> 	err = __xsk_rcv_zc(xs, xskb, len, 0);
> 	if (err)
> 		goto err;
> 	return 0;
> }
> 
> As is you have a weird mix of the two.
> 
> > +		xs->rx_queue_full++;
> > +		err = -ENOBUFS;
> >  		goto err;
> > -	if (likely(!frags))
> > -		return 0;
> > +	}
> >  
> > +	__xsk_rcv_zc(xs, xskb, len, contd);
> 
> Personal preference perhaps but removing error checking always
> gives me pause. Maybe:
> 
> 	bool frag_fail;
> 
> 	frag_fail = __xsk_rcv_zc(xs, xskb, len, contd);
> 	list_for_each...
> 		...
> 		frag_fail |= __xsk_rcv_zc(xs, xskb, len, contd);
> 	DEBUG_NET_WARN_ON_ONCE(frag_fail);

Error checking can actually be skipped here, as xskq_prod_nb_free() has
already peeked into the xsk Rx queue and told us there is enough space to
produce all of the descriptors.
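
To illustrate the reasoning, here is a minimal user-space sketch of the
check-then-produce pattern. It is not the kernel code: struct ring,
ring_nb_free(), ring_reserve() and ring_rcv_packet() are hypothetical
stand-ins for the xsk Rx queue helpers, and RING_SIZE is an arbitrary
power of two. It assumes a single producer, so the free-space answer
cannot shrink between the check and the enqueue loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical size, must be a power of two */

/* Toy model of a descriptor ring; not the kernel's struct xsk_queue. */
struct ring {
	uint32_t prod;            /* free-running producer index */
	uint32_t cons;            /* free-running consumer index */
	uint32_t dropped;         /* packets refused for lack of space */
	uint32_t len[RING_SIZE];  /* one "descriptor" per slot */
};

/* Number of free slots; unsigned wrap-around keeps this correct. */
static uint32_t ring_nb_free(const struct ring *r)
{
	return RING_SIZE - (r->prod - r->cons);
}

/* Produce one descriptor; fails only if the ring is full. */
static int ring_reserve(struct ring *r, uint32_t len)
{
	if (ring_nb_free(r) == 0)
		return -1;
	r->len[r->prod++ & (RING_SIZE - 1)] = len;
	return 0;
}

/*
 * Enqueue all fragments of a packet or none of them. Once the up-front
 * check passes, the per-fragment calls cannot fail, so the accumulated
 * result is only asserted, not handled.
 */
static int ring_rcv_packet(struct ring *r, const uint32_t *frag_len,
			   uint32_t num_desc)
{
	bool frag_fail = false;
	uint32_t i;

	if (ring_nb_free(r) < num_desc) {
		r->dropped++;
		return -1; /* whole packet refused, nothing enqueued */
	}

	for (i = 0; i < num_desc; i++)
		frag_fail |= ring_reserve(r, frag_len[i]) != 0;

	assert(!frag_fail);
	return 0;
}
```

In this model a packet that does not fit leaves the ring untouched, which
is the property the fix is after: the consumer never sees a partial
packet.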

I have sent a patch that adds a variant of __xsk_rcv_zc() that skips
xskq_prod_reserve_desc():

https://lore.kernel.org/bpf/20260218150000.301176-1-maciej.fijalkowski@intel.com/

The logistics of these patches (this set and the patch linked above) are a
bit of a question to me, though, since what Nikhil sent are clearly fixes
that need backports, whereas mine was sent as an improvement targeting the
-next tree. However, the path that Nikhil touched here should be adjusted
to what my patch introduces. I might do this as a follow-up once bpf is
merged into bpf-next.

Nikhil, I also see you routed the set to the 'net' tree; previously, xsk
core was handled via bpf/bpf-next.

> 
> ?
> 
> >  	xskb_list = &xskb->pool->xskb_list;
> >  	list_for_each_entry_safe(pos, tmp, xskb_list, list_node) {
> >  		if (list_is_singular(xskb_list))
> >  			contd = 0;
> >  		len = pos->xdp.data_end - pos->xdp.data;
> > -		err = __xsk_rcv_zc(xs, pos, len, contd);
> > -		if (err)
> > -			goto err;
> > +		__xsk_rcv_zc(xs, pos, len, contd);
> >  		list_del_init(&pos->list_node);
> >  	}
> >  
> -- 
> pw-bot: cr

Thread overview: 6+ messages
2026-02-17 21:08 [PATCH net v4 0/2] xsk: Fixes for AF_XDP fragment handling Nikhil P. Rao
2026-02-17 21:08 ` [PATCH net v4 1/2] xsk: Fix fragment node deletion to prevent buffer leak Nikhil P. Rao
2026-02-17 21:08 ` [PATCH net v4 2/2] xsk: Fix zero-copy AF_XDP fragment drop Nikhil P. Rao
2026-02-19 22:55   ` Jakub Kicinski
2026-02-20 12:37     ` Maciej Fijalkowski [this message]
2026-02-20 20:38       ` Jakub Kicinski
