From: Jakub Kicinski <kuba@kernel.org>
To: Amery Hung <ameryhung@gmail.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
alexei.starovoitov@gmail.com, andrii@kernel.org,
daniel@iogearbox.net, paul.chaignon@gmail.com,
stfomichev@gmail.com, martin.lau@kernel.org,
mohsin.bashr@gmail.com, noren@nvidia.com, dtatulea@nvidia.com,
saeedm@nvidia.com, tariqt@nvidia.com, mbloch@nvidia.com,
maciej.fijalkowski@intel.com, kernel-team@meta.com
Subject: Re: [PATCH bpf-next v3 2/6] bpf: Support pulling non-linear xdp data
Date: Tue, 16 Sep 2025 17:17:11 -0700
Message-ID: <20250916171711.1b0d0bc4@kernel.org>
In-Reply-To: <20250915224801.2961360-3-ameryhung@gmail.com>

On Mon, 15 Sep 2025 15:47:57 -0700 Amery Hung wrote:
> +/**
> + * bpf_xdp_pull_data() - Pull in non-linear xdp data.
> + * @x: &xdp_md associated with the XDP buffer
> + * @len: length of data to be made directly accessible in the linear part
> + *
> + * Pull in non-linear data in case the XDP buffer associated with @x is

Looks like there will be a v4, so a nit: I'd drop the first "non-linear":

  Pull in data in case the XDP buffer associated with @x is

We say "linear" too many times, which makes the doc hard to read.

> + * non-linear and not all @len are in the linear data area.
> + *
> + * Direct packet access allows reading and writing linear XDP data through
> + * packet pointers (i.e., &xdp_md->data + offsets). The amount of data which
> + * ends up in the linear part of the xdp_buff depends on the NIC and its
> + * configuration. When an eBPF program wants to directly access headers that

s/eBPF/frag-capable XDP/ ?

> + * may be in the non-linear area, call this kfunc to make sure the data is
> + * available in the linear area. Alternatively, use dynptr or
> + * bpf_xdp_{load,store}_bytes() to access data without pulling.
> + *
> + * This kfunc can also be used with bpf_xdp_adjust_head() to decapsulate
> + * headers in the non-linear data area.
> + *
> + * A call to this kfunc may reduce headroom. If there is not enough tailroom
> + * in the linear data area, metadata and data will be shifted down.
> + *
> + * A call to this kfunc is susceptible to change the buffer geometry.
> + * Therefore, at load time, all checks on pointers previously done by the
> + * verifier are invalidated and must be performed again, if the kfunc is used
> + * in combination with direct packet access.
> + *
> + * Return:
> + * * %0 - success
> + * * %-EINVAL - invalid len
> + */
> +__bpf_kfunc int bpf_xdp_pull_data(struct xdp_md *x, u32 len)
> +{
> +	struct xdp_buff *xdp = (struct xdp_buff *)x;
> +	int i, delta, shift, headroom, tailroom, n_frags_free = 0, len_free = 0;
> +	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
> +	void *data_hard_end = xdp_data_hard_end(xdp);
> +	int data_len = xdp->data_end - xdp->data;
> +	void *start, *new_end = xdp->data + len;
> +
> +	if (len <= data_len)
> +		return 0;
> +
> +	if (unlikely(len > xdp_get_buff_len(xdp)))
> +		return -EINVAL;
> +
> +	start = xdp_data_meta_unsupported(xdp) ? xdp->data : xdp->data_meta;
> +
> +	headroom = start - xdp->data_hard_start - sizeof(struct xdp_frame);
> +	tailroom = data_hard_end - xdp->data_end;
> +
> +	delta = len - data_len;
> +	if (unlikely(delta > tailroom + headroom))
> +		return -EINVAL;
> +
> +	shift = delta - tailroom;
> +	if (shift > 0) {
> +		memmove(start - shift, start, xdp->data_end - start);
> +
> +		xdp->data_meta -= shift;
> +		xdp->data -= shift;
> +		xdp->data_end -= shift;
> +
> +		new_end = data_hard_end;
> +	}
> +
> +	for (i = 0; i < sinfo->nr_frags && delta; i++) {
> +		skb_frag_t *frag = &sinfo->frags[i];
> +		u32 shrink = min_t(u32, delta, skb_frag_size(frag));
> +
> +		memcpy(xdp->data_end + len_free, skb_frag_address(frag), shrink);
> +
> +		len_free += shrink;
> +		delta -= shrink;
> +		if (bpf_xdp_shrink_data(xdp, frag, shrink, false))
> +			n_frags_free++;
> +	}
> +
> +	if (unlikely(n_frags_free)) {
> +		memmove(sinfo->frags, sinfo->frags + n_frags_free,
> +			(sinfo->nr_frags - n_frags_free) * sizeof(skb_frag_t));
> +
> +		sinfo->nr_frags -= n_frags_free;
> +
> +		if (!sinfo->nr_frags)
> +			xdp_buff_clear_frags_flag(xdp);
> +	}
> +
> +	sinfo->xdp_frags_size -= len_free;
> +	xdp->data_end = new_end;

Not sure I see the benefit of maintaining new_end and len_free.
We could directly adjust

	xdp->data_end += shrink;
	sinfo->xdp_frags_size -= shrink;

as we copy from the frags. But either way:

Reviewed-by: Jakub Kicinski <kuba@kernel.org>

The whole thing actually looks pretty clean, I was worried
the shifting down of the data would add a lot of complexity :)
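
For illustration only, the direct-adjustment variant suggested above can be
modeled in userspace roughly as follows. This is a sketch under stated
assumptions, not the kernel code: struct model_buff, struct model_frag,
model_pull_data() and model_demo() are hypothetical stand-ins for xdp_buff,
skb_frag_t and the kfunc, and the model ignores metadata and the
struct xdp_frame reservation in the headroom.

```c
#include <assert.h>
#include <string.h>

#define MAX_FRAGS 4

/* Hypothetical userspace stand-ins; NOT the kernel's xdp_buff/skb_frag_t. */
struct model_frag {
	unsigned char *addr;	/* start of the data remaining in the frag */
	unsigned int size;	/* bytes remaining in the frag */
};

struct model_buff {
	unsigned char *data;		/* start of linear packet data */
	unsigned char *data_end;	/* end of linear packet data */
	unsigned char *hard_start;	/* lowest address data may move to */
	unsigned char *hard_end;	/* end of the linear area */
	struct model_frag frags[MAX_FRAGS];
	unsigned int nr_frags;
	unsigned int frags_size;	/* total bytes held in frags */
};

/* Make the first len bytes linear. Returns 0 on success, -1 on bad len. */
static int model_pull_data(struct model_buff *b, unsigned int len)
{
	unsigned int data_len = b->data_end - b->data;
	unsigned int headroom = b->data - b->hard_start;
	unsigned int tailroom = b->hard_end - b->data_end;
	unsigned int delta, i, freed = 0;

	if (len <= data_len)
		return 0;
	if (len > data_len + b->frags_size)
		return -1;

	delta = len - data_len;
	if (delta > headroom + tailroom)
		return -1;

	/* Not enough tailroom: shift the linear data down into the headroom. */
	if (delta > tailroom) {
		unsigned int shift = delta - tailroom;

		memmove(b->data - shift, b->data, data_len);
		b->data -= shift;
		b->data_end -= shift;
	}

	for (i = 0; i < b->nr_frags && delta; i++) {
		struct model_frag *f = &b->frags[i];
		unsigned int shrink = delta < f->size ? delta : f->size;

		memcpy(b->data_end, f->addr, shrink);

		/* Adjust as we copy; no separate new_end or len_free. */
		b->data_end += shrink;
		b->frags_size -= shrink;

		f->addr += shrink;
		f->size -= shrink;
		delta -= shrink;
		if (!f->size)
			freed++;
	}

	/* Fully drained frags are always at the front; drop them. */
	memmove(b->frags, b->frags + freed,
		(b->nr_frags - freed) * sizeof(b->frags[0]));
	b->nr_frags -= freed;
	return 0;
}

/* 4 bytes of headroom, 4 of data, 2 of tailroom, 8 bytes in two frags. */
static unsigned int model_demo(void)
{
	unsigned char page[10] = { 0 };
	unsigned char f0[4] = { 'E', 'F', 'G', 'H' };
	unsigned char f1[4] = { 'I', 'J', 'K', 'L' };
	struct model_buff b = {
		.data = page + 4, .data_end = page + 8,
		.hard_start = page, .hard_end = page + 10,
		.frags = { { f0, 4 }, { f1, 4 } },
		.nr_frags = 2, .frags_size = 8,
	};

	memcpy(b.data, "ABCD", 4);
	/* Pull 10 bytes: delta = 6, tailroom = 2, so the data shifts by 4. */
	if (model_pull_data(&b, 10))
		return 0;
	assert(b.data_end == b.hard_end);
	assert(memcmp(b.data, "ABCDEFGHIJ", 10) == 0);
	return b.data_end - b.data;
}
```

In this model, when a shift happens data_end lands on hard_end purely from
the per-frag increments, which is the value the patch assigns through
new_end.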