From: Stanislav Fomichev <stfomichev@gmail.com>
To: Amery Hung <ameryhung@gmail.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
alexei.starovoitov@gmail.com, andrii@kernel.org,
daniel@iogearbox.net, kuba@kernel.org, martin.lau@kernel.org,
mohsin.bashr@gmail.com, saeedm@nvidia.com, tariqt@nvidia.com,
mbloch@nvidia.com, maciej.fijalkowski@intel.com,
kernel-team@meta.com
Subject: Re: [RFC bpf-next v1 3/7] bpf: Support pulling non-linear xdp data
Date: Mon, 25 Aug 2025 15:46:02 -0700
Message-ID: <aKznqjd1aowjxJfK@mini-arch>
In-Reply-To: <CAMB2axOkPx=5vseNXbwQtHQTFhdur6OSZ-HbNPUciwBmubQa1w@mail.gmail.com>
On 08/25, Amery Hung wrote:
> On Mon, Aug 25, 2025 at 2:29 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 08/25, Amery Hung wrote:
> > > Add kfunc, bpf_xdp_pull_data(), to support pulling data from xdp
> > > fragments. Similar to bpf_skb_pull_data(), bpf_xdp_pull_data() makes
> > > the first len bytes of data directly readable and writable in bpf
> > > programs. If the "len" argument is larger than the linear data size,
> > > data in fragments will be copied to the linear region when there
> > > is enough room between xdp->data_end and xdp_data_hard_end(xdp),
> > > which is subject to driver implementation.
> > >
> > > A use case of the kfunc is to decapsulate headers residing in xdp
> > > fragments. It is possible for a NIC driver to place headers in xdp
> > > fragments. To keep using direct packet access for parsing and
> > > decapsulating headers, users can pull headers into the linear data
> > > area by calling bpf_xdp_pull_data() and then pop the header with
> > > bpf_xdp_adjust_head().
> > >
> > > An unused argument, flags is reserved for future extension (e.g.,
> > > tossing the data instead of copying it to the linear data area).
> > >
> > > Signed-off-by: Amery Hung <ameryhung@gmail.com>
> > > ---
> > > net/core/filter.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 52 insertions(+)
> > >
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index f0ee5aec7977..82d953e077ac 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -12211,6 +12211,57 @@ __bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
> > > return 0;
> > > }
> > >
> > > +__bpf_kfunc int bpf_xdp_pull_data(struct xdp_md *x, u32 len, u64 flags)
> > > +{
> > > + struct xdp_buff *xdp = (struct xdp_buff *)x;
> > > + struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
> > > + void *data_end, *data_hard_end = xdp_data_hard_end(xdp);
> > > + int i, delta, buff_len, n_frags_free = 0, len_free = 0;
> > > +
> > > + buff_len = xdp_get_buff_len(xdp);
> > > +
> > > + if (unlikely(len > buff_len))
> > > + return -EINVAL;
> > > +
> > > + if (!len)
> > > + len = xdp_get_buff_len(xdp);
> >
> > Why not return -EINVAL here for len=0?
> >
>
> I try to mirror the behavior of bpf_skb_pull_data() to reduce confusion here.
Ah, makes sense!
> > > +
> > > > +	data_end = xdp->data + len;
> > > > +	delta = data_end - xdp->data_end;
> > > > +
> > > > +	if (delta <= 0)
> > > > +		return 0;
> > > > +
> > > > +	if (unlikely(data_end > data_hard_end))
> > > > +		return -EINVAL;
> > > > +
> > > > +	for (i = 0; i < sinfo->nr_frags && delta; i++) {
> > > > +		skb_frag_t *frag = &sinfo->frags[i];
> > > > +		u32 shrink = min_t(u32, delta, skb_frag_size(frag));
> > > > +
> > > > +		memcpy(xdp->data_end + len_free, skb_frag_address(frag), shrink);
> >
> > skb_frag_address can return NULL for unreadable frags.
>
> Is it safe to assume that drivers will ensure frags to be readable? It
> seems at least mlx5 does.
>
> I did a quick check and found other xdp kfuncs using
> skb_frag_address() without checking the return.
Unreadable frags will always be unreadable to the host. This is TCP
device memory: memory on the accelerators that is not mapped onto
the CPU. Any attempt to read that memory should gracefully error out.
Can you also please fix that other one? (Doing it outside this series
should be ok.)
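
For illustration only, here is a minimal userspace sketch of the two
behaviours discussed above: len == 0 meaning "pull everything", and
erroring out instead of copying from a frag that has no CPU address.
It is not the kernel implementation. struct model_frag, struct
model_buff and model_pull_data() are hypothetical stand-ins for
skb_frag_t, xdp_buff and the kfunc, a NULL addr models an unreadable
frag, and the choice of -EFAULT is an assumption that nobody in this
thread has proposed.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for skb_frag_t; addr == NULL models an unreadable frag. */
struct model_frag {
	const unsigned char *addr;
	size_t size;
};

/* Hypothetical stand-in for an xdp_buff with a linear area plus frags. */
struct model_buff {
	unsigned char *data;		/* start of the linear area */
	size_t linear_len;		/* bytes currently linear */
	size_t linear_cap;		/* room available up to the "hard end" */
	struct model_frag *frags;
	size_t nr_frags;
};

static size_t model_buff_len(const struct model_buff *b)
{
	size_t total = b->linear_len;

	for (size_t i = 0; i < b->nr_frags; i++)
		total += b->frags[i].size;
	return total;
}

/* Make the first len bytes linear; returns 0 or a negative errno. */
static int model_pull_data(struct model_buff *b, size_t len)
{
	size_t total = model_buff_len(b);
	size_t delta, left, off = 0;

	if (len > total)
		return -EINVAL;
	if (!len)			/* same convention as bpf_skb_pull_data() */
		len = total;
	if (len <= b->linear_len)
		return 0;
	if (len > b->linear_cap)
		return -EINVAL;

	delta = len - b->linear_len;

	/* Pass 1: refuse before copying anything if a needed frag is unreadable. */
	left = delta;
	for (size_t i = 0; i < b->nr_frags && left; i++) {
		size_t n = b->frags[i].size < left ? b->frags[i].size : left;

		if (n && !b->frags[i].addr)
			return -EFAULT;	/* assumed errno, see above */
		left -= n;
	}

	/* Pass 2: copy into the linear area and shrink each frag from its head. */
	left = delta;
	for (size_t i = 0; i < b->nr_frags && left; i++) {
		struct model_frag *f = &b->frags[i];
		size_t n = f->size < left ? f->size : left;

		if (!n)
			continue;
		memcpy(b->data + b->linear_len + off, f->addr, n);
		f->addr += n;
		f->size -= n;
		off += n;
		left -= n;
	}

	b->linear_len = len;
	return 0;
}
```

Validating every frag before the first memcpy keeps the buffer
unchanged when the pull fails, which is one way to read "gracefully
error out". Whether the real kfunc should check up front or bail out
mid-loop is left open here.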