Netdev List
From: "Toke Høiland-Jørgensen" <toke@kernel.org>
To: Ralf Lici <ralf@mandelbit.com>
Cc: netdev@vger.kernel.org, "Daniel Gröber" <dxld@darkboxed.org>,
	"Antonio Quartulli" <antonio@mandelbit.com>,
	"Andrew Lunn" <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	linux-kernel@vger.kernel.org,
	"Pablo Neira Ayuso" <pablo@netfilter.org>,
	"Florian Westphal" <fw@strlen.de>, "Phil Sutter" <phil@nwl.cc>,
	"Beniamino Galvani" <bgalvani@redhat.com>
Subject: Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
Date: Mon, 15 Jun 2026 15:31:51 +0200
Message-ID: <87tsr4gcag.fsf@toke.dk>
In-Reply-To: <20260613131720.253936-1-ralf@mandelbit.com>

>> >> I think a better model is to treat the device as basically a loopback
>> >> device that translates packets before looping them back (so when they
>> >> come back they appear to be coming from that device).
>> >>
>> >> Any reason why that wouldn't work?
>> >>
>> >
>> > That's indeed the intended model for the ipxlat netdevice: route packets
>> > to it, translate them, then loop them back into the stack as packets
>> > received from that same device. That seemed like the simplest model and
>> > the one that exposes the translation point most clearly.
>>
>> Right. I think this could be made a bit more explicit in the
>> documentation as well, since it's a bit of an unusual model.
>>
>> And, well, taking a step back: is it really the right model? Regular NAT
>> lives in netfilter, why can't this be a netfilter module as well? Seems
>> to me you could have something like:
>>
>> table ip xlat4 {
>> 	chain postrouting {
>> 		type nat hook postrouting priority srcnat; policy accept;
>> 		ip daddr 0.0.0.0/0 oifname "eth0" xlat to 64:ff9b::/96
>> 	}
>> }
>> table ip6 xlat6 {
>> 	chain prerouting {
>> 		type nat hook prerouting priority dstnat; policy accept;
>> 		ip6 saddr 64:ff9b::/96 iifname "eth0" xlat from 64:ff9b::/96
>> 	}
>> }
>>
>> and that would provide the functionality without having to implement a
>> new interface type and the associated multiple traversals through the
>> stack? Did you consider this as an alternative to the new device type?
>>
>
> We did consider netfilter, and your example is syntactically attractive,
> but I am no longer convinced it is the cleanest model for SIIT.
>
> An nft expression cannot simply rewrite ETH_P_IP <-> ETH_P_IPV6 and
> return ACCEPT as if this were normal NAT, because the current hook
> invocation, dst, and conntrack-related state were established for the
> packet as it entered that hook. A cross-family translator would need to
> consume the skb, clear or rebuild route and ct metadata as appropriate,
> perform a route lookup in the other family, and resume at a well-defined
> point in that family. That seems possible, but it would be a new
> stateless cross-family action, not just a new mode of the existing nft
> nat expression (which is built around nf_nat_setup_info and assumes the
> packet's L3 family does not change AFAICT).

Right, I did not expect it would be possible to actually share code with
the existing NAT functionality, but conceptually they're similar. I.e.,
if I was an admin trying to figure out if my system supported SIIT
translation, my chain of thought would be something along the lines of:
"SIIT is a variant of NAT, and I know NAT is a long-standing feature of
netfilter, so I wonder if SIIT exists there as well".

Adding the netfilter folks to Cc to try to get their attention and an
opinion on this :)

> My second concern is that the SIIT boundary would be a property of
> rule and hook placement. That gives flexibility, but it also means the
> translation point has to be constrained and documented very carefully
> to avoid ambiguous TTL/Hop Limit, PMTU/ICMP, and hook-order behavior.
> For this use case I would rather have the route that matches the
> translation prefix also be the object that says: leave this family
> here and continue in the other one.

Yeah, with flexibility comes the ability to shoot yourself in the foot.
But that's not really different from much of the other functionality we
have in the kernel today, is it? For netfilter in particular it's
certainly possible to configure a broken NAT configuration that leads to
packet drops (or just invalid packets being sent out on a network
device).

> After looking at the available kernel mechanisms again, I think the
> better model is probably LWT: routes carry an ipxlat encap referencing a
> named translator domain configured over netlink. That should represent
> the stateless, prefix-based and symmetric nature of ipxlat.

I think this description actually hits the nail on the head: What are we
implementing here? Is it a product feature, or a building block for one?
The properties you mention wrt consistency, symmetry, etc. are properties
of the high-level feature (which is also generally the level at which
things are specified in RFCs). By contrast, other packet mangling
features in the kernel are more in the "building block" category, where
it's possible to configure things to implement a particular feature set
or compliance with a particular RFC, but it's also possible to do things
that fall outside of that.

I think this relates to the "mechanism, not policy" approach that we
take to most things in the kernel: implement the building blocks to do
something in the most general way we can, and then leave it up to
userspace to configure things in a way that results in a consistent
high-level system behaviour.

That being said:

> Very roughly, userspace could look like:
>
>     ip xlat add siit0 prefix6 64:ff9b::/96
>     ip route add ... encap ipxlat id siit0
>     ip -6 route add ... encap ipxlat id siit0
>
> There are some useful precedents for this: ILA is stateless address
> translation as LWT, seg6_local already has cross-family LWT actions, and
> ioam6 has a similar split between separately configured objects and
> route attachments.
>
> The invariant I would like v2 to follow is that the original-family
> route lookup selects translation as its terminal route action. The
> translated skb then gets a fresh lookup in the other family. From that
> point on, TTL/Hop Limit where applicable, PMTU, ICMP errors, and
> netfilter visibility belong to the translated family.
>
> So I think your question addresses the core design issue in this RFC. My
> current preference is to rework the next version around an LWT/domain
> model instead of the virtual netdevice model, unless prototyping shows a
> fundamental problem with that approach.
>
> Does that model make sense to you?

I did consider this as well before suggesting netfilter as the right
place to hook things, and I do think the route object model has some
appeal. I agree it's a better model than the magical loopback interface,
certainly.

I think in the end this comes down to whether flexibility in how to use
this translation mechanism is a bug or a feature, as outlined above. I'm
leaning towards "feature", but could probably be persuaded otherwise :)

> Thanks for pushing on this.

You're welcome! Thanks for working on it - will be cool to have this
land in whatever form we end up agreeing on!

-Toke
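For reference, a minimal sketch of the address mapping implied by a /96 translation prefix such as the RFC 6052 Well-Known Prefix 64:ff9b::/96 used in the nft and "ip xlat" examples above. The function names xlat_v4_to_v6_96() and xlat_v6_to_v4_96() are hypothetical and not taken from the ipxlat patches; only the /96 case is handled, since other RFC 6052 prefix lengths (/32 to /64) split the IPv4 address around the reserved "u" octet and are left out here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the ipxlat series): embed an IPv4
 * address into a /96 IPv6 prefix. For /96, RFC 6052 places the 32-bit
 * IPv4 address in the last 32 bits of the IPv6 address. */
void xlat_v4_to_v6_96(const uint8_t prefix[16], const uint8_t v4[4],
		      uint8_t out[16])
{
	memcpy(out, prefix, 12);	/* first 96 bits: the prefix */
	memcpy(out + 12, v4, 4);	/* last 32 bits: the IPv4 address */
}

/* Hypothetical inverse: if v6 lies inside the /96 prefix, copy the
 * embedded IPv4 address into v4 and return true. Otherwise return false
 * and leave v4 untouched, since the address cannot be mapped back with
 * this prefix. */
bool xlat_v6_to_v4_96(const uint8_t prefix[16], const uint8_t v6[16],
		      uint8_t v4[4])
{
	if (memcmp(v6, prefix, 12) != 0)
		return false;
	memcpy(v4, v6 + 12, 4);
	return true;
}
```

With prefix 64:ff9b::/96, embedding 192.0.2.33 gives 64:ff9b::c000:221 (also written 64:ff9b::192.0.2.33, the example RFC 6052 itself uses), and extracting from that address returns 192.0.2.33 again. Because the mapping is a pure function of the prefix in both directions, it needs no per-flow state, which is the "stateless, prefix-based and symmetric" property discussed above. Note that RFC 6052 also restricts the Well-Known Prefix to global IPv4 addresses; this sketch does not check that.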

Thread overview: 30+ messages
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
2026-03-19 15:12 ` [RFC net-next 01/15] drivers/net: add ipxlat netdevice skeleton and build plumbing Ralf Lici
2026-03-19 15:12 ` [RFC net-next 02/15] ipxlat: add RFC 6052 address conversion helpers Ralf Lici
2026-03-19 15:12 ` [RFC net-next 03/15] ipxlat: add packet metadata control block helpers Ralf Lici
2026-03-19 15:12 ` [RFC net-next 04/15] ipxlat: add IPv4 packet validation path Ralf Lici
2026-03-19 15:12 ` [RFC net-next 05/15] ipxlat: add IPv6 " Ralf Lici
2026-04-09  2:18   ` Xavier HSINYUAN
2026-04-09  9:44     ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 06/15] ipxlat: add transport checksum and offload helpers Ralf Lici
2026-03-19 15:12 ` [RFC net-next 07/15] ipxlat: add 4to6 and 6to4 TCP/UDP translation helpers Ralf Lici
2026-03-19 15:12 ` [RFC net-next 08/15] ipxlat: add translation engine and dispatch core Ralf Lici
2026-06-04 18:23   ` Toke Høiland-Jørgensen
2026-06-05 12:32     ` Ralf Lici
2026-06-10 11:14       ` Toke Høiland-Jørgensen
2026-06-13 13:17         ` Ralf Lici
2026-06-15 13:31           ` Toke Høiland-Jørgensen [this message]
2026-03-19 15:12 ` [RFC net-next 09/15] ipxlat: emit translator-generated ICMP errors on drop Ralf Lici
2026-03-19 15:12 ` [RFC net-next 10/15] ipxlat: add 4to6 pre-fragmentation path Ralf Lici
2026-05-18 12:36   ` Xavier HSINYUAN
2026-06-05 12:24     ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 11/15] ipxlat: add ICMP informational translation paths Ralf Lici
2026-03-19 15:12 ` [RFC net-next 12/15] ipxlat: add ICMP error translation and quoted-inner handling Ralf Lici
2026-03-19 15:12 ` [RFC net-next 13/15] ipxlat: add netlink control plane and uapi Ralf Lici
2026-03-19 15:12 ` [RFC net-next 14/15] selftests: net: add ipxlat coverage Ralf Lici
2026-03-19 15:12 ` [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide Ralf Lici
2026-03-19 22:11   ` Jonathan Corbet
2026-03-24  9:55     ` Ralf Lici
2026-04-06 14:50   ` Xavier Hsinyuan
2026-04-07 11:30     ` Daniel Gröber
2026-04-09  2:17       ` Xavier HSINYUAN
