From: Ralf Lici <ralf@mandelbit.com>
To: "Toke Høiland-Jørgensen" <toke@kernel.org>
Cc: netdev@vger.kernel.org, "Daniel Gröber" <dxld@darkboxed.org>,
	"Antonio Quartulli" <antonio@mandelbit.com>,
	"Andrew Lunn" <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	linux-kernel@vger.kernel.org,
	"Pablo Neira Ayuso" <pablo@netfilter.org>,
	"Florian Westphal" <fw@strlen.de>, "Phil Sutter" <phil@nwl.cc>,
	"Beniamino Galvani" <bgalvani@redhat.com>
Subject: Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
Date: Wed, 24 Jun 2026 18:18:52 +0200
Message-ID: <20260624161854.686569-1-ralf@mandelbit.com>
In-Reply-To: <87v7b9c9jj.fsf@toke.dk>

On Tue, 23 Jun 2026 21:59:44 +0200, Toke Høiland-Jørgensen <toke@kernel.org> wrote:
> Ralf Lici <ralf@mandelbit.com> writes:
> > On the BPF point specifically: I agree a BPF program should be able to
> > decide whether to translate. What I am less sure about is whether
> > redirecting to a netdevice is the best way to expose that. A TC action
> > (yet another model, I know :)) gives you the same thing in-pipeline and
> > more directly:
> >
> >   tc filter add dev wwan0 egress \
> >       bpf obj match.o action ipxlat4to6 domain clat0
> >
> > Let BPF make the policy decision, with the native action doing the
> > translation work that the current BPF CLAT implementations have trouble
> > with: fragmentation, checksum corner cases, and ICMP error inner
> > headers (as explained by Beniamino).
> >
> > So TC clsact looks like the natural in-kernel replacement for today's
> > TC-BPF CLAT programs: no extra netdev, you attach to the existing
> > uplink, direction is explicit, and on egress you sit on the real route
> > dst, so the synthetic-dst and double-routing problems above just don't
> > arise. The cost is more moving parts than a single bpf_redirect since
> > userspace has to manage clsact, filters, priorities and action
> > lifecycle/cleanup.
>
> Hmm, so no one really uses the bpf filter mechanism, since you can just
> do everything from an action anyway (and with TCX attachment, you can
> even avoid the overhead of the TC filter/action infrastructure
> entirely). However, point taken wrt how to integrate this with BPF. I
> guess the most flexible thing would be to expose the functionality
> directly (as a kfunc callable from a BPF program). Which also fits with
> your point below:
>

Ah, I see, the cls_bpf example was dated, and I like the kfunc angle
better than a new TC action.

I would probably keep that as the minimal per-packet interface: BPF can
decide whether a packet should be translated, and the kfunc can do the
actual translation work for packets whose translated form still fits the
output MTU. The full 4->6 fragmentation case still looks like
output-path/harness territory to me, since it is a 1->N fan-out
operation.

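
To make the "still fits the output MTU" condition concrete, here is a
rough userspace sketch of the length arithmetic for the 4->6 direction.
The function names are hypothetical and not taken from the posted
series. It assumes only the header swap (IPv4 header out, 40-byte IPv6
header in, plus an 8-byte Fragment header when one is needed) and
deliberately ignores the extra growth of quoted inner headers in ICMP
errors.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20u /* IPv4 header without options */
#define IPV4_MAX_HDR_LEN 60u /* IPv4 header with maximum options */
#define IPV6_HDR_LEN     40u /* fixed IPv6 header */
#define IPV6_FRAG_LEN     8u /* IPv6 Fragment extension header */

/*
 * Hypothetical helper: length of a packet after 4->6 translation.
 * tot_len is the IPv4 total length, ihl_bytes the IPv4 header length in
 * bytes. needs_frag_hdr is decided by the caller, for example when the
 * input is already a fragment.
 */
static uint32_t xlat46_out_len(uint16_t tot_len, uint8_t ihl_bytes,
			       bool needs_frag_hdr)
{
	uint32_t payload = (uint32_t)tot_len - ihl_bytes;

	return IPV6_HDR_LEN + (needs_frag_hdr ? IPV6_FRAG_LEN : 0u) + payload;
}

/*
 * Hypothetical per-packet check: true if the translated packet can be
 * emitted as a single packet, false if it is malformed or would need
 * the 1->N fragmentation path.
 */
static bool xlat46_fits(uint16_t tot_len, uint8_t ihl_bytes,
			bool needs_frag_hdr, uint32_t out_mtu)
{
	if (ihl_bytes < IPV4_MIN_HDR_LEN || ihl_bytes > IPV4_MAX_HDR_LEN ||
	    tot_len < ihl_bytes)
		return false;

	return xlat46_out_len(tot_len, ihl_bytes, needs_frag_hdr) <= out_mtu;
}
```

With a 1500-byte output MTU, a 1480-byte IPv4 packet with no options
translates to exactly 1500 bytes and fits, while a full 1500-byte one
grows to 1520 bytes and would have to go through the fragmentation
path.
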
> > For a gateway translator, though, I still think a device-bound model is
> > less natural. There the translation point is more like a forwarding
> > decision across routes and nexthops, so a route/LWT attachment, or
> > possibly a netfilter attachment seems easier to reason about. Also, as
> > you already pointed out while discussing LWT, an admin setting up NAT64
> > is more likely to reach for an nft rule than for a clsact filter on a
> > specific device.
> >
> > Taking a step back, ipxlat is really a generic translation engine plus a
> > thin harness around it. So rather than pick one attachment, it might be
> > worth structuring the engine so different harnesses can drive it.
> > There's interesting precedent for this shape:
> >
> > - ILA, again, is the closest sibling: stateless IPv6 address translation
> > with a shared core in ila_common.c, driven both by an LWT frontend in
> > ila_lwt.c and by an inline netfilter hook with a netlink-configured
> > mapping table in ila_xlat.c.
> >
> > - act_ct is the precedent for the TC side specifically: a TC action that
> > reuses the netfilter conntrack engine rather than reimplementing it.
> >
> > And act_nat is the cautionary counter-example: a standalone TC
> > reimplementation of stateless NAT that shares no code with nf_nat, and
> > carries a "would be nice to share code" comment :)
> >
> > So I am wondering whether the right direction is to factor the
> > translation engine cleanly, land it with one harness first, and keep the
> > other attachment points as follow-up work once the core semantics are
> > settled.
> >
> > Does that direction seem reasonable to you?
>
> Yes, reusable functionality that can be called from multiple places
> sounds like a good fit; let's try to structure it that way!
>

Great, that's the direction I'll take then.

> As for which hook to start with, well, let's see if we hear back from
> the netfilter devs, but either netfilter or the routing subsystem (LWT
> style) would be OK for me I think.
>

Works for me. The engine factoring is common to all of them, so I'll
start there. Once it's in shape I can sketch a harness against it to
sanity-check the interface.
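
As a very rough illustration of the split I have in mind, and nothing
more than that: the sketch below uses made-up names and a stand-in
packet descriptor instead of an skb. The idea is that the engine only
reports a verdict, and each harness decides what it is able to do with
it. A per-packet harness (kfunc-style) cannot fan out, so it refuses
packets that need fragmentation, while an output-path harness can take
them.

```c
#include <stdint.h>

/* Hypothetical verdicts returned by a harness-agnostic engine. */
enum xlat_verdict {
	XLAT_DONE,      /* packet translated, still a single packet */
	XLAT_NEED_FRAG, /* translated form exceeds the MTU: 1->N fan-out */
	XLAT_DROP,      /* invalid or untranslatable packet */
};

/* Stand-in for the packet metadata the engine would really look at. */
struct xlat_pkt {
	uint32_t out_len; /* length after translation */
	int valid;        /* non-zero if validation passed */
};

/* Engine: no knowledge of where it was called from. */
static enum xlat_verdict xlat_engine(const struct xlat_pkt *p, uint32_t mtu)
{
	if (!p->valid)
		return XLAT_DROP;

	return p->out_len <= mtu ? XLAT_DONE : XLAT_NEED_FRAG;
}

/* Per-packet harness: returns 0 if translated, -1 if it must decline. */
static int harness_per_packet(const struct xlat_pkt *p, uint32_t mtu)
{
	return xlat_engine(p, mtu) == XLAT_DONE ? 0 : -1;
}

/* Output-path harness: returns 1 if it can emit the packet, 0 on drop. */
static int harness_output_path(const struct xlat_pkt *p, uint32_t mtu)
{
	switch (xlat_engine(p, mtu)) {
	case XLAT_DONE:
	case XLAT_NEED_FRAG: /* this harness owns the fragmentation path */
		return 1;
	default:
		return 0;
	}
}
```

The real interface will of course operate on skbs and carry the domain
configuration, so treat the verdict names and signatures above as
placeholders.
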
--
Ralf Lici
Mandelbit Srl