From: Lorenzo Bianconi <lorenzo@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Donald Hunter <donald.hunter@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
Stanislav Fomichev <sdf@fomichev.me>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Tony Nguyen <anthony.l.nguyen@intel.com>,
Przemek Kitszel <przemyslaw.kitszel@intel.com>,
Alexander Lobakin <aleksander.lobakin@intel.com>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
KP Singh <kpsingh@kernel.org>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>,
Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
Jakub Sitnicki <jakub@cloudflare.com>,
netdev@vger.kernel.org, bpf@vger.kernel.org,
intel-wired-lan@lists.osuosl.org,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH bpf-next v3 1/5] netlink: specs: Add XDP RX checksum capability to XDP metadata specs
Date: Fri, 27 Feb 2026 14:21:44 +0100
Message-ID: <aaGaaExy63bGa7Or@lore-desk>
In-Reply-To: <20260223151845.06db43b0@kernel.org>
> On Mon, 23 Feb 2026 18:11:54 +0100 Lorenzo Bianconi wrote:
> > > Off the top of my head drivers prefer reporting UNNECESSARY when they
> > > have both, and reserve COMPLETE for cases where L4 could not be found
> > > or is incorrect. Why don't we report both? We're using 3 args, we still
> > > have 3 to go. We could turn ip_summed into a bitmap and have explicit
> > > output args for both level and csum complete value?
> >
> > Ack, thx for the explanation. Just for sake of understanding, is there
> > any NIC capable of reporting both csum_value and csum for the same packet
> > in the DMA descriptor? Or is this change needed to be future-proof?
>
> Both nfp and fbnic definitely can. Off the top of my head - mlx5 also
> can, but I haven't double checked.
Ack, thanks for pointing this out, I was not aware of it. I will modify the API
so that it can report both the checksum value and csum_level for a given
packet.
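
To make the "report both" idea concrete, here is a rough userspace sketch of
what such an interface could look like, following the suggestion above of a
bitmap plus explicit output arguments. Everything in it is hypothetical: the
flag names, struct rx_desc_sketch, and rx_checksum_sketch() are invented for
illustration and are not the kfunc proposed in this series or any driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bitmap: one bit per kind of checksum information. */
enum xdp_csum_flags_sketch {
	XDP_CSUM_SKETCH_UNNECESSARY = 1u << 0, /* HW validated the checksum(s) */
	XDP_CSUM_SKETCH_COMPLETE    = 1u << 1, /* HW supplied a raw packet checksum */
};

/* Hypothetical stand-in for what a driver parses out of an RX descriptor. */
struct rx_desc_sketch {
	bool l4_csum_ok;    /* HW says the L4 checksum is valid */
	uint8_t csum_level; /* encapsulation level of the validated checksum */
	bool has_raw_csum;  /* HW computed a checksum over the packet */
	uint32_t raw_csum;  /* that checksum, if has_raw_csum is set */
};

/*
 * Report everything the descriptor carries instead of picking one mode.
 * Returns 0 on success, -1 if the descriptor has no checksum information.
 */
static int rx_checksum_sketch(const struct rx_desc_sketch *desc,
			      uint32_t *flags, uint32_t *csum, uint8_t *level)
{
	*flags = 0;
	*csum = 0;
	*level = 0;

	if (desc->l4_csum_ok) {
		*flags |= XDP_CSUM_SKETCH_UNNECESSARY;
		*level = desc->csum_level;
	}
	if (desc->has_raw_csum) {
		*flags |= XDP_CSUM_SKETCH_COMPLETE;
		*csum = desc->raw_csum;
	}

	return *flags ? 0 : -1;
}
```

With this shape the program can test each bit independently, so a NIC that
provides both kinds of information does not have to drop one of them.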
>
> > > One more thing I'd like us to at least have a plan for at this stage
> > > is how to deal with COMPLETE + modified packet + XDP_PASS.
> > > Right now some drivers discard COMPLETE when XDP is attached since
> > > they can't be sure if XDP modifies the packet. Other drivers don't
> > > and we end up with bad csum splat. Do we have a recommendation on
> > > the correct behavior? If not - should we have a kfunc to adjust /
> > > discard csum complete explicitly?
> >
> > At the moment there is no way to store the csum value we got running
> > bpf_xdp_metadata_rx_checksum() in order to be consumed during
> > xdp_buff/xdp_frame to skb conversion (this info can just be consumed in the
> > ebpf program bound to the NIC) but
>
> I think the scope here is much narrower than the xdp_buff to xdp_frame
> to skb conversion. We are just passing information between the program and
> the driver which owns the xdp_buff. Very similar to your new xmo.
>
> We could either tell the driver to discard the csum complete or even
> add a helper to "adjust" the csum value. Similar to the helper
> we have to adjust the csum in TC / skb context.
IIUC, for the CSUM_COMPLETE case we want to add a kfunc to update (or
invalidate) the checksum value if the packet has been modified by the eBPF
program bound to the NIC, and to report the updated checksum to the driver
when the XDP verdict is XDP_PASS. Correct?
I guess we could take one of two approaches here:
- Write the new checksum value into the xdp_metadata area (if available),
  where the driver can load it and update the checksum value before
  allocating the skb.
  The main downside of this approach is that we need to modify each driver.
- Add a new xmo callback that sets the checksum value, reporting it
  from the eBPF program into a specific memory area provided by the driver
  (e.g. the DMA descriptor) that is used to build the skb.
What do you think?
Moreover, since we already have this issue upstream, do you think this new
feature must be part of this series, or can we do it in a follow-up
patch/series?
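
For reference, the "update" half of such a kfunc boils down to a
one's-complement incremental update in the style of RFC 1624. The sketch
below is plain userspace C meant only to show the arithmetic; csum_fold16(),
csum_full() and csum_replace16() are names made up here, not kernel helpers,
and it assumes an even-length buffer and a 16-bit aligned modification.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * One's-complement sum over big-endian 16-bit words.
 * Assumes len is even and small enough not to overflow the accumulator.
 */
static uint16_t csum_full(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	return csum_fold16(sum);
}

/*
 * Adjust an existing sum after one 16-bit word changed from old_w to new_w:
 * sum' = sum + ~old_w + new_w, in one's-complement arithmetic.
 */
static uint16_t csum_replace16(uint16_t sum, uint16_t old_w, uint16_t new_w)
{
	return csum_fold16((uint32_t)sum + (uint16_t)~old_w + new_w);
}
```

If the program rewrites one 16-bit word, feeding the old and new values to
csum_replace16() gives the same sum as running csum_full() over the modified
buffer, up to the usual one's-complement ambiguity between 0x0000 and 0xffff.
The "invalidate" half would simply tell the driver not to report
CSUM_COMPLETE for that packet.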
Regards,
Lorenzo
>
> > I guess the issue you pointed out can be solved in the verifier
> > during program load time. What do you think?
>
> It could, but at the verifier level we'd probably have to be fairly
> coarse-grained. Any write to the packet data would mean csum complete
> cannot be trusted, that's not too hard. But also any tail call / fentry?
> I'm not really up to date on the latest in program chaining in BPF but
> I think a lot of real-life deployments would use either chaining or
> fentry. So in practice it may be a lot of complexity for having csum
> complete always disabled w/ XDP, in practice.
>
> Up to you. I'm totally okay to just say** that drivers should never
> report csum complete with XDP (until appropriate API is built).
> Perhaps this will force those who care about XDP+csum_complete to
> tell us what their requirements are?
>
> [**] "just say" == document and add driver kselftest that validates it