* [RFC net-next 01/15] drivers/net: add ipxlat netdevice skeleton and build plumbing
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 02/15] ipxlat: add RFC 6052 address conversion helpers Ralf Lici
` (13 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
ipxlat is a virtual netdevice implementing stateless IPv4/IPv6
translation (SIIT). The translation model follows RFC 7915 behavior and
RFC 6052 address embedding rules.
The netdevice form is intentional: it provides per-instance lifecycle,
MTU/statistics semantics and explicit routing integration, so translated
traffic can be steered through a dedicated device and configured per
namespace.
This series targets ipxlat as a reusable kernel building block for SIIT
deployments and, when combined with existing nftables rules as
userspace-defined policy, for NAT64-style setups.
This first patch introduces only the driver scaffolding:
- drivers/net/ipxlat/ directory and build integration
- Kconfig/Makefile entries
- basic private structures and defaults
- rtnl_link_ops and netdevice skeleton needed to create/register links
No translation logic is added in this patch yet. Follow-up patches add
packet validation, transport/ICMP translation, error handling,
fragmentation handling, generic netlink control plane, selftests and
documentation.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
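The MTU defaults mentioned in the commit message can be checked with a little arithmetic. The following is only a userspace sketch: the `SKETCH_` constants and `sketch_default_mtu_limits()` are hypothetical names that mirror, by value, the kernel macros used in `ipxlat_setup()` further down in this patch. It assumes an option-less IPv4 header (20 bytes) and a fixed IPv6 header (40 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied by value for illustration; the SKETCH_ names are
 * hypothetical and only mirror the kernel macros used in ipxlat_setup().
 */
#define SKETCH_IP_MAX_MTU	0xFFFFU	/* IP_MAX_MTU */
#define SKETCH_IPV6_MIN_MTU	1280U	/* IPV6_MIN_MTU */
#define SKETCH_ETH_DATA_LEN	1500U	/* ETH_DATA_LEN */
#define SKETCH_IPV4_HDR_LEN	20U	/* sizeof(struct iphdr) */
#define SKETCH_IPV6_HDR_LEN	40U	/* sizeof(struct ipv6hdr) */

struct sketch_mtu_limits {
	uint32_t min_mtu;
	uint32_t max_mtu;
	uint32_t mtu;
};

/* Compute the same min/max/default MTU triple the driver sets up. */
static struct sketch_mtu_limits sketch_default_mtu_limits(void)
{
	struct sketch_mtu_limits lim = {
		.min_mtu = SKETCH_IPV6_MIN_MTU,
		.max_mtu = SKETCH_IP_MAX_MTU - SKETCH_IPV6_HDR_LEN -
			   SKETCH_IPV4_HDR_LEN,
		.mtu = SKETCH_ETH_DATA_LEN,
	};

	/* the default must fall inside the advertised range */
	assert(lim.min_mtu <= lim.mtu && lim.mtu <= lim.max_mtu);
	return lim;
}
```

With these values the upper bound works out to 65535 - 40 - 20 = 65475 bytes, which leaves headroom for either L3 header on both sides of the translation.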
drivers/net/Kconfig | 13 ++++
drivers/net/Makefile | 1 +
drivers/net/ipxlat/Makefile | 7 ++
drivers/net/ipxlat/ipxlpriv.h | 53 +++++++++++++
drivers/net/ipxlat/main.c | 137 ++++++++++++++++++++++++++++++++++
drivers/net/ipxlat/main.h | 27 +++++++
6 files changed, 238 insertions(+)
create mode 100644 drivers/net/ipxlat/Makefile
create mode 100644 drivers/net/ipxlat/ipxlpriv.h
create mode 100644 drivers/net/ipxlat/main.c
create mode 100644 drivers/net/ipxlat/main.h
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index b2fd90466bab..a3b28f294d95 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -117,6 +117,19 @@ config OVPN
This module enhances the performance of the OpenVPN userspace software
by offloading the data channel processing to kernelspace.
+config IPXLAT
+ tristate "IPv6<>IPv4 packet translation virtual device (SIIT)"
+ depends on NET && INET && IPV6
+ help
+ Virtual network device driver for Stateless IP/ICMP Packet
+ Translation (RFC 7915). Useful for IPv6-focused networks,
+ particularly NAT64, SIIT-DC and 464XLAT network architectures.
+
+ See also <file:Documentation/networking/ipxlat.rst>.
+
+ To compile this driver as a module, choose M here: the module will be
+ called ipxlat.
+
config EQUALIZER
tristate "EQL (serial line load balancing) support"
help
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 5b01215f6829..4f982c9e6585 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_NET) += loopback.o
obj-$(CONFIG_NETDEV_LEGACY_INIT) += Space.o
obj-$(CONFIG_NETCONSOLE) += netconsole.o
obj-$(CONFIG_NETKIT) += netkit.o
+obj-$(CONFIG_IPXLAT) += ipxlat/
obj-y += phy/
obj-y += pse-pd/
obj-y += mdio/
diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile
new file mode 100644
index 000000000000..bd48c2700bf5
--- /dev/null
+++ b/drivers/net/ipxlat/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+
+obj-$(CONFIG_IPXLAT) := ipxlat.o
+
+ipxlat-objs += main.o
diff --git a/drivers/net/ipxlat/ipxlpriv.h b/drivers/net/ipxlat/ipxlpriv.h
new file mode 100644
index 000000000000..5027d8377bdd
--- /dev/null
+++ b/drivers/net/ipxlat/ipxlpriv.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_IPXLPRIV_H_
+#define _NET_IPXLAT_IPXLPRIV_H_
+
+#include <linux/mutex.h>
+#include <linux/netdevice.h>
+#include <net/gro_cells.h>
+
+/**
+ * struct ipv6_prefix - IPv6 prefix definition
+ * @addr: prefix address (host bits may be non-zero)
+ * @len: prefix length in bits
+ */
+struct ipv6_prefix {
+ struct in6_addr addr;
+ u8 len;
+};
+
+/**
+ * struct ipxlat_priv - private state stored in netdev priv area
+ * @dev: owning netdevice
+ * @xlat_prefix6: RFC 6052 prefix used for stateless v4<->v6 mapping
+ * @lowest_ipv6_mtu: LIM threshold used by 4->6 pre-fragment planning
+ * @cfg_lock: serializes control-plane updates
+ * @gro_cells: receive-side reinjection queue used by forward path
+ *
+ * Datapath reads config without taking @cfg_lock to keep per-packet overhead
+ * low. Writers serialize updates under @cfg_lock. During reconfiguration,
+ * readers may transiently observe mixed old/new values; this may cause a small
+ * number of drops and is an accepted tradeoff for a lightweight datapath.
+ */
+struct ipxlat_priv {
+ struct net_device *dev;
+ struct ipv6_prefix xlat_prefix6;
+ u32 lowest_ipv6_mtu;
+ /* serializes control-plane updates */
+ struct mutex cfg_lock;
+ struct gro_cells gro_cells;
+};
+
+#endif /* _NET_IPXLAT_IPXLPRIV_H_ */
diff --git a/drivers/net/ipxlat/main.c b/drivers/net/ipxlat/main.c
new file mode 100644
index 000000000000..26b7f5b6ff20
--- /dev/null
+++ b/drivers/net/ipxlat/main.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <linux/module.h>
+
+#include <net/ip.h>
+
+#include "ipxlpriv.h"
+#include "main.h"
+
+MODULE_AUTHOR("Alberto Leiva Popper <ydahhrk@gmail.com>");
+MODULE_AUTHOR("Antonio Quartulli <antonio@mandelbit.com>");
+MODULE_AUTHOR("Daniel Gröber <dxld@darkboxed.org>");
+MODULE_AUTHOR("Ralf Lici <ralf@mandelbit.com>");
+MODULE_DESCRIPTION("IPv6<>IPv4 translation virtual netdev support (SIIT)");
+MODULE_LICENSE("GPL");
+
+static int ipxlat_dev_init(struct net_device *dev)
+{
+ struct ipxlat_priv *ipxlat = netdev_priv(dev);
+ int err;
+
+ ipxlat->dev = dev;
+ /* default xlat-prefix6 is 64:ff9b::/96 */
+ ipxlat->xlat_prefix6.addr.s6_addr32[0] = htonl(0x0064ff9b);
+ ipxlat->xlat_prefix6.addr.s6_addr32[1] = 0;
+ ipxlat->xlat_prefix6.addr.s6_addr32[2] = 0;
+ ipxlat->xlat_prefix6.addr.s6_addr32[3] = 0;
+ ipxlat->xlat_prefix6.len = 96;
+ ipxlat->lowest_ipv6_mtu = 1280;
+ mutex_init(&ipxlat->cfg_lock);
+
+ err = gro_cells_init(&ipxlat->gro_cells, dev);
+ if (unlikely(err))
+ return err;
+
+ return 0;
+}
+
+static void ipxlat_dev_uninit(struct net_device *dev)
+{
+ struct ipxlat_priv *ipxlat = netdev_priv(dev);
+
+ gro_cells_destroy(&ipxlat->gro_cells);
+}
+
+static netdev_tx_t ipxlat_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ dev_dstats_tx_dropped(dev);
+ kfree_skb(skb);
+ return NETDEV_TX_OK;
+}
+
+static const struct net_device_ops ipxlat_netdev_ops = {
+ .ndo_init = ipxlat_dev_init,
+ .ndo_uninit = ipxlat_dev_uninit,
+ .ndo_start_xmit = ipxlat_start_xmit,
+};
+
+static const struct device_type ipxlat_type = {
+ .name = "ipxlat",
+};
+
+static void ipxlat_setup(struct net_device *dev)
+{
+ const netdev_features_t feat = NETIF_F_SG | NETIF_F_FRAGLIST |
+ NETIF_F_HW_CSUM | NETIF_F_HIGHDMA |
+ NETIF_F_GSO_SOFTWARE;
+
+ dev->type = ARPHRD_NONE;
+ dev->flags = IFF_NOARP;
+ dev->priv_flags |= IFF_NO_QUEUE;
+ dev->hard_header_len = 0;
+ dev->addr_len = 0;
+
+ dev->lltx = true;
+ dev->features |= feat;
+ dev->hw_features |= feat;
+ dev->hw_enc_features |= feat;
+
+ dev->netdev_ops = &ipxlat_netdev_ops;
+ dev->needs_free_netdev = true;
+ dev->pcpu_stat_type = NETDEV_PCPU_STAT_DSTATS;
+ dev->max_mtu = IP_MAX_MTU - sizeof(struct ipv6hdr) -
+ sizeof(struct iphdr);
+ dev->min_mtu = IPV6_MIN_MTU;
+ dev->mtu = ETH_DATA_LEN;
+
+ /* keep skb->dst up to ndo_start_xmit so ICMP error emission can
+ * reuse routing metadata from ingress when available
+ */
+ netif_keep_dst(dev);
+
+ SET_NETDEV_DEVTYPE(dev, &ipxlat_type);
+}
+
+static struct rtnl_link_ops ipxlat_link_ops = {
+ .kind = "ipxlat",
+ .priv_size = sizeof(struct ipxlat_priv),
+ .setup = ipxlat_setup,
+};
+
+bool ipxlat_dev_is_valid(const struct net_device *dev)
+{
+ return dev->rtnl_link_ops == &ipxlat_link_ops;
+}
+
+static int __init ipxlat_init(void)
+{
+ int err;
+
+ err = rtnl_link_register(&ipxlat_link_ops);
+ if (err) {
+ pr_err("ipxlat: failed to register rtnl link ops: %d\n", err);
+ return err;
+ }
+
+ return 0;
+}
+
+static void __exit ipxlat_exit(void)
+{
+ rtnl_link_unregister(&ipxlat_link_ops);
+}
+
+module_init(ipxlat_init);
+module_exit(ipxlat_exit);
diff --git a/drivers/net/ipxlat/main.h b/drivers/net/ipxlat/main.h
new file mode 100644
index 000000000000..fb78f910b2e2
--- /dev/null
+++ b/drivers/net/ipxlat/main.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_MAIN_H_
+#define _NET_IPXLAT_MAIN_H_
+
+#include <linux/netdevice.h>
+
+/**
+ * ipxlat_dev_is_valid - tell whether a netdev is an ipxlat interface
+ * @dev: netdevice to inspect
+ *
+ * Return: true if @dev was created with ipxlat link ops.
+ */
+bool ipxlat_dev_is_valid(const struct net_device *dev);
+
+#endif /* _NET_IPXLAT_MAIN_H_ */
--
2.53.0
* [RFC net-next 02/15] ipxlat: add RFC 6052 address conversion helpers
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
2026-03-19 15:12 ` [RFC net-next 01/15] drivers/net: add ipxlat netdevice skeleton and build plumbing Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 03/15] ipxlat: add packet metadata control block helpers Ralf Lici
` (12 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Introduce IPv4/IPv6 stateless address mapping helpers used by the
translation pipeline. Add the core 4<->6 conversion routines, including
RFC 6052 prefix embedding/extraction and the RFC 6791 fallback source
selection logic used by ICMP translation paths.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
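For readers unfamiliar with RFC 6052, the embedding that the commit message refers to can be sketched in plain userspace C. This is an illustration only, not the driver code: `rfc6052_positions()`, `rfc6052_embed()` and `rfc6052_extract()` are hypothetical names, addresses are handled as raw byte arrays in network order, and the prefix bytes are copied verbatim as the patch does. The table encodes where the four IPv4 octets land for each allowed prefix length; byte 8 (the reserved "u" octet) is always skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte positions (0-based, within the 16-byte IPv6 address) of the four
 * IPv4 octets for each RFC 6052 prefix length. Returns NULL for any
 * prefix length that RFC 6052 does not define.
 */
static const uint8_t *rfc6052_positions(unsigned int plen)
{
	static const uint8_t p32[4] = { 4, 5, 6, 7 };
	static const uint8_t p40[4] = { 5, 6, 7, 9 };
	static const uint8_t p48[4] = { 6, 7, 9, 10 };
	static const uint8_t p56[4] = { 7, 9, 10, 11 };
	static const uint8_t p64[4] = { 9, 10, 11, 12 };
	static const uint8_t p96[4] = { 12, 13, 14, 15 };

	switch (plen) {
	case 32: return p32;
	case 40: return p40;
	case 48: return p48;
	case 56: return p56;
	case 64: return p64;
	case 96: return p96;
	default: return NULL;
	}
}

/* Copy the prefix, then overwrite the IPv4 octet positions. */
static int rfc6052_embed(const uint8_t prefix[16], unsigned int plen,
			 const uint8_t v4[4], uint8_t out[16])
{
	const uint8_t *pos = rfc6052_positions(plen);
	int i;

	assert(prefix && v4 && out);
	if (!pos)
		return -1;

	memcpy(out, prefix, 16);
	for (i = 0; i < 4; i++)
		out[pos[i]] = v4[i];
	return 0;
}

/* Reverse operation: read the IPv4 octets back out of an IPv6 address. */
static int rfc6052_extract(const uint8_t addr6[16], unsigned int plen,
			   uint8_t v4[4])
{
	const uint8_t *pos = rfc6052_positions(plen);
	int i;

	assert(addr6 && v4);
	if (!pos)
		return -1;

	for (i = 0; i < 4; i++)
		v4[i] = addr6[pos[i]];
	return 0;
}
```

Using the example from RFC 6052, embedding 192.0.2.33 into 2001:db8:122:344::/64 places the octets c0, 00, 02, 21 at bytes 9 to 12, giving 2001:db8:122:344:c0:2:2100:0. A table-driven layout is a choice made for this sketch; the patch instead passes the byte indices per `case`, and uses a direct 32-bit store for the /96 and /32 lengths.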
drivers/net/ipxlat/Makefile | 1 +
drivers/net/ipxlat/address.c | 132 +++++++++++++++++++++++++++++++++++
drivers/net/ipxlat/address.h | 59 ++++++++++++++++
3 files changed, 192 insertions(+)
create mode 100644 drivers/net/ipxlat/address.c
create mode 100644 drivers/net/ipxlat/address.h
diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile
index bd48c2700bf5..b6367dedd78e 100644
--- a/drivers/net/ipxlat/Makefile
+++ b/drivers/net/ipxlat/Makefile
@@ -5,3 +5,4 @@
obj-$(CONFIG_IPXLAT) := ipxlat.o
ipxlat-objs += main.o
+ipxlat-objs += address.o
diff --git a/drivers/net/ipxlat/address.c b/drivers/net/ipxlat/address.c
new file mode 100644
index 000000000000..d1a2b7d1768f
--- /dev/null
+++ b/drivers/net/ipxlat/address.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include "address.h"
+
+static bool ipxlat_prefix6_contains(const struct ipv6_prefix *prefix,
+ const struct in6_addr *addr)
+{
+ return ipv6_prefix_equal(&prefix->addr, addr, prefix->len);
+}
+
+static __be32 ipxlat_64_extract_addr(const struct in6_addr *src,
+ unsigned int q1, unsigned int q2,
+ unsigned int q3, unsigned int q4)
+{
+ q1 = src->s6_addr[q1];
+ q2 = src->s6_addr[q2];
+ q3 = src->s6_addr[q3];
+ q4 = src->s6_addr[q4];
+ return htonl((q1 << 24) | (q2 << 16) | (q3 << 8) | q4);
+}
+
+static void ipxlat_46_embed_addr(__be32 __src, struct in6_addr *dst,
+ unsigned int q1, unsigned int q2,
+ unsigned int q3, unsigned int q4)
+{
+ u32 src = ntohl(__src);
+
+ dst->s6_addr[q1] = ((src >> 24) & 0xFF);
+ dst->s6_addr[q2] = ((src >> 16) & 0xFF);
+ dst->s6_addr[q3] = ((src >> 8) & 0xFF);
+ dst->s6_addr[q4] = ((src) & 0xFF);
+}
+
+void ipxlat_46_convert_addr(const struct ipv6_prefix *xlat_prefix6,
+ __be32 addr4, struct in6_addr *addr6)
+{
+ *addr6 = xlat_prefix6->addr;
+
+ switch (xlat_prefix6->len) {
+ case 96:
+ addr6->s6_addr32[3] = addr4;
+ return;
+ case 64:
+ ipxlat_46_embed_addr(addr4, addr6, 9, 10, 11, 12);
+ return;
+ case 56:
+ ipxlat_46_embed_addr(addr4, addr6, 7, 9, 10, 11);
+ return;
+ case 48:
+ ipxlat_46_embed_addr(addr4, addr6, 6, 7, 9, 10);
+ return;
+ case 40:
+ ipxlat_46_embed_addr(addr4, addr6, 5, 6, 7, 9);
+ return;
+ case 32:
+ addr6->s6_addr32[1] = addr4;
+ return;
+ }
+
+ DEBUG_NET_WARN_ON_ONCE(1);
+}
+
+int ipxlat_64_convert_addrs(const struct ipv6_prefix *xlat_prefix6,
+ const struct ipv6hdr *hdr6, bool icmp_err,
+ __be32 *src, __be32 *dst)
+{
+ bool src_ok;
+
+ src_ok = ipxlat_prefix6_contains(xlat_prefix6, &hdr6->saddr);
+ if (unlikely(!src_ok && !icmp_err))
+ return -EINVAL;
+ if (unlikely(!ipxlat_prefix6_contains(xlat_prefix6, &hdr6->daddr)))
+ return -EINVAL;
+
+ switch (xlat_prefix6->len) {
+ case 96:
+ if (likely(src_ok))
+ *src = hdr6->saddr.s6_addr32[3];
+ *dst = hdr6->daddr.s6_addr32[3];
+ break;
+ case 64:
+ if (likely(src_ok))
+ *src = ipxlat_64_extract_addr(&hdr6->saddr, 9, 10, 11,
+ 12);
+ *dst = ipxlat_64_extract_addr(&hdr6->daddr, 9, 10, 11, 12);
+ break;
+ case 56:
+ if (likely(src_ok))
+ *src = ipxlat_64_extract_addr(&hdr6->saddr, 7,
+ 9, 10, 11);
+ *dst = ipxlat_64_extract_addr(&hdr6->daddr, 7, 9, 10, 11);
+ break;
+ case 48:
+ if (likely(src_ok))
+ *src = ipxlat_64_extract_addr(&hdr6->saddr, 6,
+ 7, 9, 10);
+ *dst = ipxlat_64_extract_addr(&hdr6->daddr, 6, 7, 9, 10);
+ break;
+ case 40:
+ if (likely(src_ok))
+ *src = ipxlat_64_extract_addr(&hdr6->saddr, 5, 6, 7, 9);
+ *dst = ipxlat_64_extract_addr(&hdr6->daddr, 5, 6, 7, 9);
+ break;
+ case 32:
+ if (likely(src_ok))
+ *src = hdr6->saddr.s6_addr32[1];
+ *dst = hdr6->daddr.s6_addr32[1];
+ break;
+ default:
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
+
+ /* keep 6->4 ICMP error translation functional even when the ICMPv6
+ * source is not xlat_prefix6-mapped (for example, stack-generated PTB)
+ */
+ if (unlikely(!src_ok))
+ *src = htonl(INADDR_DUMMY);
+
+ return 0;
+}
diff --git a/drivers/net/ipxlat/address.h b/drivers/net/ipxlat/address.h
new file mode 100644
index 000000000000..4283fdddac56
--- /dev/null
+++ b/drivers/net/ipxlat/address.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_ADDRESS_H_
+#define _NET_IPXLAT_ADDRESS_H_
+
+#include <linux/ip.h>
+#include <net/ipv6.h>
+
+#include "ipxlpriv.h"
+
+/**
+ * ipxlat_46_convert_addr - translate one IPv4 address into RFC 6052 IPv6 form
+ * @xlat_prefix6: configured RFC 6052 prefix
+ * @addr4: IPv4 address to convert
+ * @addr6: output IPv6 address
+ */
+void ipxlat_46_convert_addr(const struct ipv6_prefix *xlat_prefix6,
+ __be32 addr4, struct in6_addr *addr6);
+
+/**
+ * ipxlat_64_convert_addrs - translate outer IPv6 endpoints into IPv4 pair
+ * @xlat_prefix6: configured RFC 6052 prefix
+ * @hdr6: source IPv6 header
+ * @icmp_err: source packet is ICMPv6 error
+ * @src: output IPv4 source address
+ * @dst: output IPv4 destination address
+ *
+ * Return: 0 on success, negative errno on non-translatable addresses.
+ */
+int ipxlat_64_convert_addrs(const struct ipv6_prefix *xlat_prefix6,
+ const struct ipv6hdr *hdr6, bool icmp_err,
+ __be32 *src, __be32 *dst);
+
+/**
+ * ipxlat_46_convert_addrs - translate outer IPv4 endpoints into IPv6 pair
+ * @xlat_prefix6: configured RFC 6052 prefix
+ * @iph4: source IPv4 header
+ * @iph6: output IPv6 header (only saddr/daddr are updated)
+ */
+static inline void
+ipxlat_46_convert_addrs(const struct ipv6_prefix *xlat_prefix6,
+ const struct iphdr *iph4, struct ipv6hdr *iph6)
+{
+ ipxlat_46_convert_addr(xlat_prefix6, iph4->saddr, &iph6->saddr);
+ ipxlat_46_convert_addr(xlat_prefix6, iph4->daddr, &iph6->daddr);
+}
+
+#endif /* _NET_IPXLAT_ADDRESS_H_ */
--
2.53.0
* [RFC net-next 03/15] ipxlat: add packet metadata control block helpers
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
2026-03-19 15:12 ` [RFC net-next 01/15] drivers/net: add ipxlat netdevice skeleton and build plumbing Ralf Lici
2026-03-19 15:12 ` [RFC net-next 02/15] ipxlat: add RFC 6052 address conversion helpers Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 04/15] ipxlat: add IPv4 packet validation path Ralf Lici
` (11 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Add the per-skb control-block layout and the shared packet helper
routines used by the translation stages. This introduces common metadata
bookkeeping (offset rebasing and invariant checks) plus protocol and
fragment helper utilities.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
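The offset rebasing described in the commit message can be illustrated with a reduced userspace model. This is a sketch under stated assumptions: `struct sketch_cb`, `sketch_rebase()` and `sketch_offsets_valid()` are hypothetical names, and only the two outer offsets are modeled. Unlike the helper in this patch, which updates fields one at a time, the sketch computes every new value first and commits only if all of them are non-negative.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced stand-in for the cached offsets kept in skb->cb. */
struct sketch_cb {
	uint16_t l4_off;	/* outer L4 header offset */
	uint16_t payload_off;	/* outer payload offset */
};

/* Shift all cached offsets by the signed L3 header size delta.
 * Returns 0 on success, -1 if any offset would become negative, in
 * which case the control block is left untouched.
 */
static int sketch_rebase(struct sketch_cb *cb, int delta)
{
	int l4, payload;

	assert(cb);
	l4 = cb->l4_off + delta;
	payload = cb->payload_off + delta;
	if (l4 < 0 || payload < 0)
		return -1;

	cb->l4_off = (uint16_t)l4;
	cb->payload_off = (uint16_t)payload;
	return 0;
}

/* Ordering invariant: the payload never starts before the L4 header. */
static int sketch_offsets_valid(const struct sketch_cb *cb)
{
	return cb->payload_off >= cb->l4_off;
}
```

For a 4->6 translation of a packet with an option-less IPv4 header, the delta is +20 (a 20-byte header becomes a 40-byte one), so an L4 offset of 20 and a payload offset of 28 become 40 and 48. Because the same delta is applied to every field, the relative ordering is preserved, which is why the ordering check can be kept separate from the underflow check.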
drivers/net/ipxlat/Makefile | 1 +
drivers/net/ipxlat/packet.c | 99 +++++++++++++++++++++
drivers/net/ipxlat/packet.h | 166 ++++++++++++++++++++++++++++++++++++
3 files changed, 266 insertions(+)
create mode 100644 drivers/net/ipxlat/packet.c
create mode 100644 drivers/net/ipxlat/packet.h
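The fragment helpers in packet.h rely on the two protocols encoding the fragment offset differently. The sketch below shows that difference in userspace C with host-order values; the `sketch_` function names are hypothetical and the byte-order conversion that the kernel helpers perform is deliberately left out. IPv4 stores the offset in 8-byte units in the low 13 bits, with DF and MF as bits 14 and 13. The IPv6 Fragment Header stores the byte offset in the upper 13 bits, with the M flag in bit 0.

```c
#include <assert.h>
#include <stdint.h>

/* Build the IPv4 flags/fragment-offset field (host order). */
static uint16_t sketch_frag4_field(int df, int mf, uint16_t byte_off)
{
	/* fragment offsets are always multiples of 8 bytes */
	assert((byte_off & 7) == 0);
	return (uint16_t)((df ? 0x4000 : 0) | (mf ? 0x2000 : 0) |
			  (byte_off >> 3));
}

/* Recover the byte offset from an IPv4 field (host order). */
static uint16_t sketch_frag4_byte_offset(uint16_t field)
{
	return (uint16_t)((field & 0x1FFF) << 3);
}

/* Build the IPv6 Fragment Header offset field (host order). */
static uint16_t sketch_frag6_field(uint16_t byte_off, int mf)
{
	return (uint16_t)((byte_off & 0xFFF8) | (mf ? 1 : 0));
}

/* Recover the byte offset from an IPv6 field (host order). */
static uint16_t sketch_frag6_byte_offset(uint16_t field)
{
	return (uint16_t)(field & 0xFFF8);
}
```

A fragment starting at byte 1480 with more fragments to follow is 0x20B9 in the IPv4 field (1480 / 8 = 185 = 0xB9, plus the MF bit) and 0x05C9 in the IPv6 field (1480 = 0x5C8, plus the M bit).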
diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile
index b6367dedd78e..90dbc0489fa2 100644
--- a/drivers/net/ipxlat/Makefile
+++ b/drivers/net/ipxlat/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_IPXLAT) := ipxlat.o
ipxlat-objs += main.o
ipxlat-objs += address.o
+ipxlat-objs += packet.o
diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c
new file mode 100644
index 000000000000..f82c375255f3
--- /dev/null
+++ b/drivers/net/ipxlat/packet.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include "packet.h"
+
+/* Shift cached skb cb offsets by the L3 header delta after in-place rewrite.
+ *
+ * Translation may replace only the outer L3 header size (4->6 or 6->4), while
+ * cached offsets were computed before rewrite. Rebasing applies the same delta
+ * to all cached absolute offsets so they still point to the same logical
+ * fields in the modified skb.
+ *
+ * This helper only guards against underflow (< 0). Relative ordering checks
+ * are done by ipxlat_cb_offsets_valid.
+ */
+int ipxlat_cb_rebase_offsets(struct ipxlat_cb *cb, int delta)
+{
+ int off;
+
+ off = cb->l4_off + delta;
+ if (unlikely(off < 0))
+ return -EINVAL;
+ cb->l4_off = off;
+
+ off = cb->payload_off + delta;
+ if (unlikely(off < 0))
+ return -EINVAL;
+ cb->payload_off = off;
+
+ if (unlikely(cb->is_icmp_err)) {
+ off = cb->inner_l3_offset + delta;
+ if (unlikely(off < 0))
+ return -EINVAL;
+ cb->inner_l3_offset = off;
+
+ off = cb->inner_l4_offset + delta;
+ if (unlikely(off < 0))
+ return -EINVAL;
+ cb->inner_l4_offset = off;
+
+ if (cb->inner_fragh_off) {
+ off = cb->inner_fragh_off + delta;
+ if (unlikely(off < 0))
+ return -EINVAL;
+ cb->inner_fragh_off = off;
+ }
+ }
+
+ return 0;
+}
+
+#ifdef CONFIG_DEBUG_NET
+/* Verify ordering/range relations between cached skb cb offsets.
+ *
+ * Unlike ipxlat_cb_rebase_offsets, this checks structural invariants:
+ * l4 <= payload, inner_l3 >= payload, inner_l3 <= inner_l4, and fragment
+ * header (when present) located inside inner L3 area before inner L4.
+ */
+bool ipxlat_cb_offsets_valid(const struct ipxlat_cb *cb)
+{
+ if (unlikely(cb->payload_off < cb->l4_off))
+ return false;
+
+ if (unlikely(cb->is_icmp_err)) {
+ if (unlikely(cb->inner_l3_offset < cb->payload_off))
+ return false;
+ if (unlikely(cb->inner_l4_offset < cb->inner_l3_offset))
+ return false;
+ if (unlikely(cb->inner_fragh_off &&
+ cb->inner_fragh_off < cb->inner_l3_offset))
+ return false;
+ if (unlikely(cb->inner_fragh_off &&
+ cb->inner_fragh_off >= cb->inner_l4_offset))
+ return false;
+ }
+
+ return true;
+}
+#endif
+
+int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxl, struct sk_buff *skb)
+{
+ return -EOPNOTSUPP;
+}
+
+int ipxlat_v6_validate_skb(struct sk_buff *skb)
+{
+ return -EOPNOTSUPP;
+}
diff --git a/drivers/net/ipxlat/packet.h b/drivers/net/ipxlat/packet.h
new file mode 100644
index 000000000000..f39c25987940
--- /dev/null
+++ b/drivers/net/ipxlat/packet.h
@@ -0,0 +1,166 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_PACKET_H_
+#define _NET_IPXLAT_PACKET_H_
+
+#include <net/ip.h>
+
+#include "ipxlpriv.h"
+
+/**
+ * struct ipxlat_cb - per-skb parser and control metadata stored in skb->cb
+ * @l4_off: outer L4 header offset
+ * @payload_off: outer payload offset
+ * @fragh_off: outer IPv6 Fragment Header offset, or 0 if absent
+ * @inner_l3_offset: quoted inner L3 offset for ICMP errors
+ * @inner_l4_offset: quoted inner L4 offset for ICMP errors
+ * @inner_fragh_off: quoted inner IPv6 Fragment Header offset, or 0
+ * @udp_zero_csum_len: outer UDP length used for 4->6 checksum synthesis
+ * @frag_max_size: pre-fragment payload cap for ip_do_fragment
+ * @l4_proto: outer L4 protocol (or nexthdr for IPv6)
+ * @inner_l4_proto: quoted inner L4 protocol
+ * @l3_hdr_len: outer L3 header length including extension headers
+ * @inner_l3_hdr_len: quoted inner L3 header length
+ * @is_icmp_err: packet is ICMP error and carries quoted inner packet
+ * @emit_icmp_err: datapath must emit translator-generated ICMP on drop
+ * @icmp_err: ICMP type/code/info cached for deferred emission
+ * @icmp_err.type: ICMP type to emit
+ * @icmp_err.code: ICMP code to emit
+ * @icmp_err.info: ICMP auxiliary info (e.g. pointer/MTU)
+ */
+struct ipxlat_cb {
+ u16 l4_off;
+ u16 payload_off;
+ u16 fragh_off;
+ u16 inner_l3_offset;
+ u16 inner_l4_offset;
+ u16 inner_fragh_off;
+ /* L4 span length (UDP header + payload) for outer IPv4 UDP packets
+ * arriving with checksum 0.
+ */
+ u16 udp_zero_csum_len;
+ u16 frag_max_size;
+ u8 l4_proto;
+ u8 inner_l4_proto;
+ u8 l3_hdr_len;
+ u8 inner_l3_hdr_len;
+ bool is_icmp_err;
+ bool emit_icmp_err;
+ struct {
+ u8 type;
+ u8 code;
+ u32 info;
+ } icmp_err;
+};
+
+/**
+ * ipxlat_skb_cb - return ipxlat private control block in skb->cb
+ * @skb: skb carrying ipxlat metadata
+ *
+ * Return: pointer to &struct ipxlat_cb stored in the control buffer of @skb.
+ */
+static inline struct ipxlat_cb *ipxlat_skb_cb(const struct sk_buff *skb)
+{
+ BUILD_BUG_ON(sizeof(struct ipxlat_cb) > sizeof(skb->cb));
+ return (struct ipxlat_cb *)(skb->cb);
+}
+
+static inline unsigned int ipxlat_skb_datagram_len(const struct sk_buff *skb)
+{
+ return skb->len - skb_transport_offset(skb);
+}
+
+static inline u8 ipxlat_get_ipv6_tclass(const struct ipv6hdr *hdr)
+{
+ return (hdr->priority << 4) | (hdr->flow_lbl[0] >> 4);
+}
+
+static inline u16 ipxlat_get_frag6_offset(const struct frag_hdr *hdr)
+{
+ return be16_to_cpu(hdr->frag_off) & 0xFFF8U;
+}
+
+static inline u16 ipxlat_get_frag4_offset(const struct iphdr *hdr)
+{
+ return (be16_to_cpu(hdr->frag_off) & IP_OFFSET) << 3;
+}
+
+static inline bool ipxlat_is_first_frag6(const struct frag_hdr *hdr)
+{
+ return hdr ? (ipxlat_get_frag6_offset(hdr) == 0) : true;
+}
+
+static inline bool ipxlat_is_first_frag4(const struct iphdr *hdr)
+{
+ return !(hdr->frag_off & htons(IP_OFFSET));
+}
+
+static inline __be16 ipxlat_build_frag6_offset(u16 frag_offset, bool mf)
+{
+ return cpu_to_be16((frag_offset & 0xFFF8U) | mf);
+}
+
+static inline __be16
+ipxlat_build_frag4_offset(bool df, bool mf, u16 frag_offset)
+{
+ return cpu_to_be16((df ? (1U << 14) : 0) | (mf ? (1U << 13) : 0) |
+ (frag_offset >> 3));
+}
+
+/**
+ * ipxlat_cb_rebase_offsets - shift cached cb offsets after skb relayout
+ * @cb: parsed packet metadata
+ * @delta: signed byte delta applied to cached offsets
+ *
+ * Return: 0 on success, negative errno if rebased offsets would underflow.
+ */
+int ipxlat_cb_rebase_offsets(struct ipxlat_cb *cb, int delta);
+#ifdef CONFIG_DEBUG_NET
+/**
+ * ipxlat_cb_offsets_valid - validate monotonicity and bounds of cb offsets
+ * @cb: parsed packet metadata
+ *
+ * Return: true if cached offsets are internally consistent.
+ */
+bool ipxlat_cb_offsets_valid(const struct ipxlat_cb *cb);
+#else
+static inline bool ipxlat_cb_offsets_valid(const struct ipxlat_cb *cb)
+{
+ return true;
+}
+#endif
+
+/**
+ * ipxlat_v4_validate_skb - validate and summarize IPv4 packet into skb->cb
+ * @ipxlat: translator private context
+ * @skb: packet to validate
+ *
+ * Populates &struct ipxlat_cb and may mark translator-generated ICMP action on
+ * failure paths.
+ *
+ * Return: 0 on success, negative errno on validation failure.
+ */
+int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
+
+/**
+ * ipxlat_v6_validate_skb - validate and summarize IPv6 packet into skb->cb
+ * @skb: packet to validate
+ *
+ * Populates &struct ipxlat_cb for subsequent 6->4 translation.
+ *
+ * Return: 0 on success, negative errno on validation failure.
+ */
+int ipxlat_v6_validate_skb(struct sk_buff *skb);
+
+#endif /* _NET_IPXLAT_PACKET_H_ */
--
2.53.0
* [RFC net-next 04/15] ipxlat: add IPv4 packet validation path
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (2 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 03/15] ipxlat: add packet metadata control block helpers Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 05/15] ipxlat: add IPv6 " Ralf Lici
` (10 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Implement IPv4 packet parsing and validation, including option
inspection, fragment-sensitive L4 checks, and UDP checksum-zero handling
consistent with translator constraints. The parser populates skb
control-block metadata consumed by translation and marks RFC-driven drop
reasons for later action handling.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
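The option inspection mentioned in the commit message boils down to a TLV walk over the IPv4 options area. The following userspace sketch shows the idea; it is not the driver code. `sketch_scan_options()` and the `SRR_*` result values are hypothetical, and the option type numbers are written out by value (131 for LSRR, 137 for SSRR) rather than taken from kernel headers. A source route counts as expired when its pointer field is greater than its length, as described in RFC 791.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_OPT_END	0	/* end of option list */
#define SKETCH_OPT_NOOP	1	/* single-byte padding */
#define SKETCH_OPT_LSRR	131	/* loose source and record route */
#define SKETCH_OPT_SSRR	137	/* strict source and record route */

enum srr_result {
	SRR_NONE,	/* no source route option found */
	SRR_EXPIRED,	/* source route present but fully consumed */
	SRR_UNEXPIRED,	/* source route with hops remaining */
	SRR_MALFORMED,	/* broken option encoding */
};

/* Walk the options area that follows the fixed 20-byte IPv4 header. */
static enum srr_result sketch_scan_options(const uint8_t *opt, size_t optlen)
{
	const uint8_t *end;

	assert(opt || optlen == 0);
	end = opt + optlen;

	while (opt < end) {
		uint8_t type = opt[0];
		uint8_t len, ptr;

		if (type == SKETCH_OPT_END)
			return SRR_NONE;
		if (type == SKETCH_OPT_NOOP) {
			opt++;
			continue;
		}

		/* every other option carries a length byte */
		if (end - opt < 2)
			return SRR_MALFORMED;
		len = opt[1];
		if (len < 2 || len > end - opt)
			return SRR_MALFORMED;

		if (type == SKETCH_OPT_LSRR || type == SKETCH_OPT_SSRR) {
			if (len < 3)
				return SRR_MALFORMED;
			/* 1-based pointer to the next address slot */
			ptr = opt[2];
			if (ptr < 4)
				return SRR_MALFORMED;
			return ptr > len ? SRR_EXPIRED : SRR_UNEXPIRED;
		}

		opt += len;
	}

	return SRR_NONE;
}
```

In this model only `SRR_UNEXPIRED` would lead to an ICMPv4 Source Route Failed error, `SRR_MALFORMED` would be dropped silently, and the other two results would let translation continue with the options ignored. How the patch maps these cases onto errno values and deferred ICMP actions is determined by the diff below and later patches in the series, not by this sketch.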
drivers/net/ipxlat/packet.c | 312 +++++++++++++++++++++++++++++++++++-
1 file changed, 310 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c
index f82c375255f3..0cc619dca147 100644
--- a/drivers/net/ipxlat/packet.c
+++ b/drivers/net/ipxlat/packet.c
@@ -11,6 +11,8 @@
* Ralf Lici <ralf@mandelbit.com>
*/
+#include <linux/icmp.h>
+
#include "packet.h"
/* Shift cached skb cb offsets by the L3 header delta after in-place rewrite.
@@ -88,9 +90,315 @@ bool ipxlat_cb_offsets_valid(const struct ipxlat_cb *cb)
}
#endif
-int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxl, struct sk_buff *skb)
+static bool ipxlat_v4_validate_addr(__be32 addr4)
{
- return -EOPNOTSUPP;
+ return !(ipv4_is_zeronet(addr4) || ipv4_is_loopback(addr4) ||
+ ipv4_is_multicast(addr4) || ipv4_is_lbcast(addr4));
+}
+
+/* RFC 7915 Section 4.1 requires ignoring IPv4 options unless an unexpired
+ * LSRR/SSRR is present, in which case we must send ICMPv4 SR_FAILED.
+ * We intentionally treat malformed option encoding as invalid input and
+ * drop early instead of continuing translation.
+ */
+static int ipxlat_v4_srr_check(struct sk_buff *skb, const struct iphdr *hdr)
+{
+ const u8 *opt, *end;
+ u8 type, len, ptr;
+
+ if (likely(hdr->ihl <= 5))
+ return 0;
+
+ opt = (const u8 *)(hdr + 1);
+ end = (const u8 *)hdr + (hdr->ihl << 2);
+
+ while (opt < end) {
+ type = opt[0];
+ if (type == IPOPT_END)
+ return 0;
+ if (type == IPOPT_NOOP) {
+ opt++;
+ continue;
+ }
+
+ if (unlikely(end - opt < 2))
+ return -EINVAL;
+
+ len = opt[1];
+ if (unlikely(len < 2 || opt + len > end))
+ return -EINVAL;
+
+ if (type == IPOPT_LSRR || type == IPOPT_SSRR) {
+ if (unlikely(len < 3))
+ return -EINVAL;
+
+ /* points to the beginning of the next IP addr */
+ ptr = opt[2];
+ if (unlikely(ptr < 4))
+ return -EINVAL;
+ if (unlikely(ptr > len))
+ return 0;
+ if (unlikely(ptr > len - 3))
+ return -EINVAL;
+
+ return -EINVAL;
+ }
+
+ opt += len;
+ }
+
+ return 0;
+}
+
+static int ipxlat_v4_pull_l3(struct sk_buff *skb, unsigned int l3_offset,
+ bool inner)
+{
+ const struct iphdr *iph;
+ unsigned int tot_len;
+ int l3_len;
+
+ if (unlikely(!pskb_may_pull(skb, l3_offset + sizeof(*iph))))
+ return -EINVAL;
+
+ iph = (const struct iphdr *)(skb->data + l3_offset);
+ if (unlikely(iph->version != 4 || iph->ihl < 5))
+ return -EINVAL;
+
+ l3_len = iph->ihl << 2;
+ /* For inner packets use ntohs(iph->tot_len) instead of iph_totlen.
+ * If inner iph->tot_len is zero, iph_totlen would fall back to outer
+ * GSO metadata, which is unrelated to quoted inner packet length.
+ */
+ tot_len = unlikely(inner) ? ntohs(iph->tot_len) : iph_totlen(skb, iph);
+ if (unlikely(tot_len < l3_len))
+ return -EINVAL;
+
+ if (unlikely(!pskb_may_pull(skb, l3_offset + l3_len)))
+ return -EINVAL;
+
+ return l3_len;
+}
+
+static int ipxlat_v4_pull_l4(struct sk_buff *skb, unsigned int l4_offset,
+ u8 l4_proto, bool *is_icmp_err)
+{
+ struct icmphdr *icmp;
+ struct udphdr *udp;
+ struct tcphdr *tcp;
+
+ *is_icmp_err = false;
+
+ switch (l4_proto) {
+ case IPPROTO_TCP:
+ if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*tcp))))
+ return -EINVAL;
+
+ tcp = (struct tcphdr *)(skb->data + l4_offset);
+ if (unlikely(tcp->doff < 5))
+ return -EINVAL;
+
+ return __tcp_hdrlen(tcp);
+ case IPPROTO_UDP:
+ if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*udp))))
+ return -EINVAL;
+
+ udp = (struct udphdr *)(skb->data + l4_offset);
+ if (unlikely(ntohs(udp->len) < sizeof(*udp)))
+ return -EINVAL;
+
+ return sizeof(struct udphdr);
+ case IPPROTO_ICMP:
+ if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*icmp))))
+ return -EINVAL;
+
+ icmp = (struct icmphdr *)(skb->data + l4_offset);
+ *is_icmp_err = icmp_is_err(icmp->type);
+ return sizeof(struct icmphdr);
+ default:
+ return 0;
+ }
+}
+
+static int ipxlat_v4_pull_icmp_inner(struct sk_buff *skb,
+ unsigned int inner_l3_off)
+{
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct iphdr *inner_l3_hdr;
+ unsigned int inner_l4_off;
+ int inner_l3_len, err;
+ bool is_icmp_err;
+
+ inner_l3_len = ipxlat_v4_pull_l3(skb, inner_l3_off, true);
+ if (unlikely(inner_l3_len < 0))
+ return inner_l3_len;
+ inner_l3_hdr = (const struct iphdr *)(skb->data + inner_l3_off);
+
+ /* accept non-first quoted fragments: only inner L3 is translatable */
+ inner_l4_off = inner_l3_off + inner_l3_len;
+ cb->inner_l3_offset = inner_l3_off;
+ cb->inner_l3_hdr_len = inner_l3_len;
+ cb->inner_l4_offset = inner_l4_off;
+
+ if (unlikely(!ipxlat_is_first_frag4(inner_l3_hdr)))
+ return 0;
+
+ err = ipxlat_v4_pull_l4(skb, inner_l4_off, inner_l3_hdr->protocol,
+ &is_icmp_err);
+ if (unlikely(err < 0))
+ return err;
+ if (unlikely(is_icmp_err))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int ipxlat_v4_pull_hdrs(struct sk_buff *skb)
+{
+ const unsigned int l3_off = skb_network_offset(skb);
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ int err, l3_len, l4_len = 0;
+ const struct iphdr *l3_hdr;
+
+ /* parse IPv4 header and get its full length including options */
+ l3_len = ipxlat_v4_pull_l3(skb, l3_off, false);
+ if (unlikely(l3_len < 0))
+ return l3_len;
+ l3_hdr = ip_hdr(skb);
+
+ if (unlikely(!ipxlat_v4_validate_addr(l3_hdr->daddr)))
+ return -EINVAL;
+
+ /* RFC 7915 Section 4.1 */
+ if (unlikely(ipxlat_v4_srr_check(skb, l3_hdr)))
+ return -EINVAL;
+ if (unlikely(l3_hdr->ttl <= 1))
+ return -EINVAL;
+
+ /* RFC 7915 Section 1.2:
+ * Fragmented ICMP/ICMPv6 packets will not be translated by IP/ICMP
+ * translators.
+ */
+ if (unlikely(l3_hdr->protocol == IPPROTO_ICMP &&
+ ip_is_fragment(l3_hdr)))
+ return -EINVAL;
+
+ cb->l3_hdr_len = l3_len;
+ cb->l4_proto = l3_hdr->protocol;
+ cb->l4_off = l3_off + l3_len;
+ cb->payload_off = cb->l4_off;
+ cb->is_icmp_err = false;
+
+ /* only non-fragmented packets or first fragments have transport hdrs */
+ if (unlikely(!ipxlat_is_first_frag4(l3_hdr))) {
+ if (unlikely(!ipxlat_v4_validate_addr(l3_hdr->saddr)))
+ return -EINVAL;
+ return 0;
+ }
+
+ l4_len = ipxlat_v4_pull_l4(skb, cb->l4_off, l3_hdr->protocol,
+ &cb->is_icmp_err);
+ if (unlikely(l4_len < 0))
+ return l4_len;
+
+ /* RFC 7915 Section 4.1:
+ * Illegal IPv4 sources are accepted only for ICMPv4 error translation.
+ */
+ if (unlikely(!ipxlat_v4_validate_addr(l3_hdr->saddr) &&
+ !cb->is_icmp_err))
+ return -EINVAL;
+
+ cb->payload_off = cb->l4_off + l4_len;
+
+ if (unlikely(cb->is_icmp_err)) {
+ /* validate the quoted packet in an ICMP error */
+ err = ipxlat_v4_pull_icmp_inner(skb, cb->payload_off);
+ if (unlikely(err))
+ return err;
+ }
+
+ return 0;
+}
+
+static int ipxlat_v4_validate_icmp_csum(const struct sk_buff *skb)
+{
+ __sum16 csum;
+
+ /* skip when checksum is not software-owned */
+ if (skb->ip_summed != CHECKSUM_NONE)
+ return 0;
+
+ /* compute checksum over ICMP header and payload, then fold to 16-bit
+ * Internet checksum to validate it
+ */
+ csum = csum_fold(skb_checksum(skb, skb_transport_offset(skb),
+ ipxlat_skb_datagram_len(skb), 0));
+ return unlikely(csum) ? -EINVAL : 0;
+}
+
+/**
+ * ipxlat_v4_validate_skb - validate IPv4 input and fill parser metadata in cb
+ * @ipxlat: translator private context
+ * @skb: packet to validate
+ *
+ * Ensures required headers are present/consistent and stores parsed offsets
+ * into &struct ipxlat_cb for the translation path.
+ *
+ * Return: 0 on success, negative errno on validation failure.
+ */
+int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ struct iphdr *l3_hdr;
+ struct udphdr *udph;
+ int err;
+
+ if (unlikely(skb_shared(skb)))
+ return -EINVAL;
+
+ err = ipxlat_v4_pull_hdrs(skb);
+ if (unlikely(err))
+ return err;
+
+ skb_set_transport_header(skb, cb->l4_off);
+
+ if (unlikely(cb->is_icmp_err)) {
+ if (unlikely(cb->l4_proto != IPPROTO_ICMP)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
+
+ /* Translation path recomputes ICMPv6 checksum from scratch.
+ * Validate here so a corrupted ICMPv4 error is not converted
+ * into a translated packet with a valid checksum.
+ */
+ return ipxlat_v4_validate_icmp_csum(skb);
+ }
+
+ l3_hdr = ip_hdr(skb);
+ if (likely(cb->l4_proto != IPPROTO_UDP))
+ return 0;
+ if (unlikely(!ipxlat_is_first_frag4(l3_hdr)))
+ return 0;
+
+ udph = udp_hdr(skb);
+ if (likely(udph->check != 0))
+ return 0;
+
+ /* We are in the path where L4 header is present (unfragmented packets
+ * or first fragments) and is UDP.
+ * Fragmented checksum-less IPv4 UDP is rejected because 4->6 cannot
+ * reliably translate it.
+ */
+ if (unlikely(ip_is_fragment(l3_hdr)))
+ return -EINVAL;
+
+ /* udph->len bounds the span used to compute replacement checksum */
+ if (unlikely(ntohs(udph->len) > skb->len - cb->l4_off))
+ return -EINVAL;
+
+ cb->udp_zero_csum_len = ntohs(udph->len);
+
+ return 0;
}
int ipxlat_v6_validate_skb(struct sk_buff *skb)
--
2.53.0
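For readers who want to try the option walk from ipxlat_v4_srr_check() outside the kernel, here is a minimal userspace sketch of the same logic over a raw options buffer. The names (srr_scan, the SRR_* verdicts, the sample_* buffers) are made up for illustration and are not part of the patch. Unlike the patch, which returns -EINVAL for both cases, the sketch keeps "malformed" and "unexpired source route" apart so the two outcomes are easier to see.

```c
#include <stddef.h>
#include <stdint.h>

/* IPv4 option types from RFC 791; local names chosen for this sketch. */
enum { OPT_END = 0, OPT_NOOP = 1, OPT_LSRR = 131, OPT_SSRR = 137 };

/* Hypothetical verdicts; the patch folds the last two into -EINVAL. */
enum srr_verdict { SRR_NONE, SRR_UNEXPIRED, SRR_MALFORMED };

/* opts points at the first option byte, opts_len is (ihl - 5) * 4. */
enum srr_verdict srr_scan(const uint8_t *opts, size_t opts_len)
{
	size_t off = 0;

	while (off < opts_len) {
		uint8_t type = opts[off];
		uint8_t len, ptr;

		if (type == OPT_END)
			return SRR_NONE;
		if (type == OPT_NOOP) {
			off++;
			continue;
		}

		/* every other option is encoded as type, length, data */
		if (opts_len - off < 2)
			return SRR_MALFORMED;
		len = opts[off + 1];
		if (len < 2 || len > opts_len - off)
			return SRR_MALFORMED;

		if (type == OPT_LSRR || type == OPT_SSRR) {
			if (len < 3)
				return SRR_MALFORMED;
			/* 1-based offset of the next address to visit */
			ptr = opts[off + 2];
			if (ptr < 4)
				return SRR_MALFORMED;
			/* pointer past the option: every hop was consumed */
			if (ptr > len)
				return SRR_NONE;
			/* the next address needs bytes ptr..ptr+3 */
			if (ptr + 3 > len)
				return SRR_MALFORMED;
			return SRR_UNEXPIRED;
		}

		off += len;
	}

	return SRR_NONE;
}

/* LSRR, length 7, pointer 4: one address still to visit */
const uint8_t sample_pending[] = { 131, 7, 4, 192, 0, 2, 1, 0 };
/* same option with pointer 8 > length: route fully consumed */
const uint8_t sample_consumed[] = { 131, 7, 8, 192, 0, 2, 1, 0 };
/* length byte claims 11 bytes but only 8 are present */
const uint8_t sample_truncated[] = { 131, 11, 4, 192, 0, 2, 1, 0 };
```

Calling srr_scan(sample_pending, sizeof(sample_pending)) yields SRR_UNEXPIRED, which corresponds to the RFC 7915 Section 4.1 drop case; the consumed route is accepted and the truncated option is rejected as malformed.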
* [RFC net-next 05/15] ipxlat: add IPv6 packet validation path
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (3 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 04/15] ipxlat: add IPv4 packet validation path Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 06/15] ipxlat: add transport checksum and offload helpers Ralf Lici
` (9 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Implement IPv6 packet parsing and validation, including extension header
traversal, fragment-header constraints, and checksum validation for
ICMPv6 error messages. The parser fills skb control-block metadata
for 6->4 translation and quoted-inner packet handling.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/packet.c | 326 +++++++++++++++++++++++++++++++++++-
1 file changed, 325 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c
index 0cc619dca147..b9a9af1b3adb 100644
--- a/drivers/net/ipxlat/packet.c
+++ b/drivers/net/ipxlat/packet.c
@@ -401,7 +401,331 @@ int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
return 0;
}
+static bool ipxlat_v6_validate_saddr(const struct in6_addr *addr6)
+{
+ return !(ipv6_addr_any(addr6) || ipv6_addr_loopback(addr6) ||
+ ipv6_addr_is_multicast(addr6));
+}
+
+static int ipxlat_v6_pull_l4(struct sk_buff *skb, unsigned int l4_offset,
+ u8 l4_proto, bool *is_icmp_err)
+{
+ struct icmp6hdr *icmp;
+ struct udphdr *udp;
+ struct tcphdr *tcp;
+
+ *is_icmp_err = false;
+
+ switch (l4_proto) {
+ case NEXTHDR_TCP:
+ if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*tcp))))
+ return -EINVAL;
+ tcp = (struct tcphdr *)(skb->data + l4_offset);
+ return unlikely(tcp->doff < 5) ? -EINVAL : (int)__tcp_hdrlen(tcp);
+ case NEXTHDR_UDP:
+ if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*udp))))
+ return -EINVAL;
+ udp = (struct udphdr *)(skb->data + l4_offset);
+ if (unlikely(ntohs(udp->len) < sizeof(*udp)))
+ return -EINVAL;
+ return sizeof(struct udphdr);
+ case NEXTHDR_ICMP:
+ if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*icmp))))
+ return -EINVAL;
+ icmp = (struct icmp6hdr *)(skb->data + l4_offset);
+ *is_icmp_err = icmpv6_is_err(icmp->icmp6_type);
+ return sizeof(struct icmp6hdr);
+ default:
+ return 0;
+ }
+}
+
+/* Basic IPv6 header walk: parse only the packet starting at l3_offset.
+ * It does not inspect quoted inner packets carried by ICMP errors.
+ */
+static int ipxlat_v6_walk_hdrs(struct sk_buff *skb, unsigned int l3_offset,
+ u8 *l4_proto, unsigned int *fhdr_offset,
+ unsigned int *l4_offset, bool *has_l4)
+{
+ unsigned int frag_hdr_off, l4hdr_off;
+ struct frag_hdr *frag;
+ struct ipv6hdr *ip6;
+ bool first_frag;
+ int err;
+
+ /* cannot use default getter because this function is used both for
+ * outer and inner packets
+ */
+ ip6 = (struct ipv6hdr *)(skb->data + l3_offset);
+
+ /* if present, locate Fragment Header first because it affects
+ * whether transport headers are available
+ */
+ frag_hdr_off = l3_offset;
+ err = ipv6_find_hdr(skb, &frag_hdr_off, NEXTHDR_FRAGMENT, NULL, NULL);
+ if (unlikely(err < 0 && err != -ENOENT))
+ return -EINVAL;
+
+ *has_l4 = true;
+ *fhdr_offset = 0;
+ if (unlikely(err == NEXTHDR_FRAGMENT)) {
+ if (unlikely(!pskb_may_pull(skb, frag_hdr_off + sizeof(*frag))))
+ return -EINVAL;
+ frag = (struct frag_hdr *)(skb->data + frag_hdr_off);
+
+ /* remember Fragment Header offset for downstream logic */
+ *fhdr_offset = frag_hdr_off;
+ first_frag = ipxlat_is_first_frag6(frag);
+
+ /* IPv6 forbids chaining Fragment Headers */
+ if (unlikely(frag->nexthdr == NEXTHDR_FRAGMENT))
+ return -EINVAL;
+
+ /* RFC 7915 Section 5.1.1 does not support extension headers
+ * after FH (except NEXTHDR_NONE)
+ */
+ if (unlikely(ipv6_ext_hdr(frag->nexthdr) &&
+ frag->nexthdr != NEXTHDR_NONE))
+ return -EPROTONOSUPPORT;
+
+ /* non-first fragments do not carry a full transport header */
+ if (!first_frag) {
+ *l4_proto = frag->nexthdr;
+ /* first byte after FH is fragment payload,
+ * not L4 header
+ */
+ *l4_offset = frag_hdr_off + sizeof(struct frag_hdr);
+ *has_l4 = false;
+ return 0;
+ }
+ }
+
+ /* walk extension headers to terminal protocol and compute offsets used
+ * by validation/translation
+ */
+ l4hdr_off = l3_offset;
+ err = ipv6_find_hdr(skb, &l4hdr_off, -1, NULL, NULL);
+ if (unlikely(err < 0))
+ return -EINVAL;
+
+ *l4_proto = err;
+ *l4_offset = l4hdr_off;
+ return 0;
+}
+
+/* RFC 7915 Section 5.1 says a Routing Header with Segments Left != 0
+ * must not be translated. IP6_FH_F_SKIP_RH makes ipv6_find_hdr skip
+ * Routing Headers whose Segments Left is 0, so a match means drop.
+ */
+static int ipxlat_v6_check_rh(struct sk_buff *skb)
+{
+ unsigned int rh_off;
+ int flags, nexthdr;
+
+ rh_off = 0;
+ flags = IP6_FH_F_SKIP_RH;
+ nexthdr = ipv6_find_hdr(skb, &rh_off, NEXTHDR_ROUTING, NULL, &flags);
+ if (unlikely(nexthdr < 0 && nexthdr != -ENOENT))
+ return -EINVAL;
+ if (likely(nexthdr != NEXTHDR_ROUTING))
+ return 0;
+
+ return -EINVAL;
+}
+
+static int ipxlat_v6_pull_outer_l3(struct sk_buff *skb)
+{
+ const unsigned int l3_off = skb_network_offset(skb);
+ struct ipv6hdr *l3_hdr;
+
+ if (unlikely(!pskb_may_pull(skb, l3_off + sizeof(*l3_hdr))))
+ return -EINVAL;
+ l3_hdr = ipv6_hdr(skb);
+
+ /* translator does not support jumbograms; payload_len must match skb */
+ if (unlikely(l3_hdr->version != 6 ||
+ skb->len != sizeof(*l3_hdr) +
+ be16_to_cpu(l3_hdr->payload_len) ||
+ !ipxlat_v6_validate_saddr(&l3_hdr->saddr)))
+ return -EINVAL;
+
+ if (unlikely(l3_hdr->hop_limit <= 1))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int ipxlat_v6_pull_icmp_inner(struct sk_buff *skb,
+ unsigned int outer_payload_off)
+{
+ unsigned int inner_fhdr_off, inner_l4_off;
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ struct ipv6hdr *inner_ip6;
+ bool has_l4, is_icmp_err;
+ u8 inner_l4_proto;
+ int err;
+
+ if (unlikely(!pskb_may_pull(skb,
+ outer_payload_off + sizeof(*inner_ip6))))
+ return -EINVAL;
+
+ inner_ip6 = (struct ipv6hdr *)(skb->data + outer_payload_off);
+ if (unlikely(inner_ip6->version != 6))
+ return -EINVAL;
+
+ err = ipxlat_v6_walk_hdrs(skb, outer_payload_off, &inner_l4_proto,
+ &inner_fhdr_off, &inner_l4_off, &has_l4);
+ if (unlikely(err))
+ return err;
+
+ cb->inner_l3_offset = outer_payload_off;
+ cb->inner_l4_offset = inner_l4_off;
+ cb->inner_fragh_off = inner_fhdr_off;
+ cb->inner_l4_proto = inner_l4_proto;
+
+ if (likely(has_l4)) {
+ err = ipxlat_v6_pull_l4(skb, inner_l4_off, inner_l4_proto,
+ &is_icmp_err);
+ if (unlikely(err < 0))
+ return err;
+ if (unlikely(is_icmp_err))
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int ipxlat_v6_pull_hdrs(struct sk_buff *skb)
+{
+ const unsigned int l3_off = skb_network_offset(skb);
+ unsigned int fragh_off, l4_off, payload_off;
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ int l3_len, l4_len, err;
+ struct frag_hdr *frag;
+ bool has_l4;
+ u8 l4_proto;
+
+ /* parse IPv6 base header and perform basic structural checks */
+ err = ipxlat_v6_pull_outer_l3(skb);
+ if (unlikely(err))
+ return err;
+
+ /* walk extension/fragment headers and locate the transport header */
+ err = ipxlat_v6_walk_hdrs(skb, l3_off, &l4_proto, &fragh_off, &l4_off,
+ &has_l4);
+ /* -EPROTONOSUPPORT means packet layout is syntactically valid but
+ * unsupported by our RFC 7915 path
+ */
+ if (unlikely(err == -EPROTONOSUPPORT))
+ return -EINVAL;
+ if (unlikely(err))
+ return err;
+
+ l3_len = l4_off - l3_off;
+ payload_off = l4_off;
+
+ if (likely(has_l4)) {
+ l4_len = ipxlat_v6_pull_l4(skb, l4_off, l4_proto,
+ &cb->is_icmp_err);
+ if (unlikely(l4_len < 0))
+ return l4_len;
+ payload_off += l4_len;
+ }
+
+ /* RFC 7915 Section 5.1 */
+ err = ipxlat_v6_check_rh(skb);
+ if (unlikely(err))
+ return err;
+
+ if (unlikely(l4_proto == NEXTHDR_ICMP)) {
+ /* A stateless translator cannot reliably translate ICMP
+ * checksum across real IPv6 fragments, so fragmented ICMP is
+ * dropped. A Fragment Header alone, however, is not enough to
+ * decide: so-called atomic fragments (offset=0, M=0) carry a
+ * Fragment Header but are not actually fragmented.
+ */
+ if (unlikely(fragh_off)) {
+ if (unlikely(!pskb_may_pull(skb,
+ fragh_off + sizeof(*frag))))
+ return -EINVAL;
+
+ frag = (struct frag_hdr *)(skb->data + fragh_off);
+ if (unlikely(ipxlat_get_frag6_offset(frag) ||
+ (be16_to_cpu(frag->frag_off) & IP6_MF)))
+ return -EINVAL;
+ }
+
+ if (unlikely(cb->is_icmp_err)) {
+ /* validate the quoted packet in an ICMP error */
+ err = ipxlat_v6_pull_icmp_inner(skb, payload_off);
+ if (unlikely(err))
+ return err;
+ }
+ }
+
+ cb->l4_proto = l4_proto;
+ cb->l4_off = l4_off;
+ cb->fragh_off = fragh_off;
+ cb->payload_off = payload_off;
+ cb->l3_hdr_len = l3_len;
+
+ return 0;
+}
+
+static int ipxlat_v6_validate_icmp_csum(const struct sk_buff *skb)
+{
+ struct ipv6hdr *iph6;
+ unsigned int len;
+ __sum16 csum;
+
+ if (skb->ip_summed != CHECKSUM_NONE)
+ return 0;
+
+ iph6 = ipv6_hdr(skb);
+ len = ipxlat_skb_datagram_len(skb);
+ csum = csum_ipv6_magic(&iph6->saddr, &iph6->daddr, len, NEXTHDR_ICMP,
+ skb_checksum(skb, skb_transport_offset(skb), len,
+ 0));
+
+ return unlikely(csum) ? -EINVAL : 0;
+}
+
+/**
+ * ipxlat_v6_validate_skb - validate IPv6 input and fill parser metadata in cb
+ * @skb: packet to validate
+ *
+ * Ensures required headers are present/consistent and stores parsed offsets
+ * into &struct ipxlat_cb for the translation path.
+ *
+ * Return: 0 on success, negative errno on validation failure.
+ */
int ipxlat_v6_validate_skb(struct sk_buff *skb)
{
- return -EOPNOTSUPP;
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ int err;
+
+ if (unlikely(skb_shared(skb)))
+ return -EINVAL;
+
+ err = ipxlat_v6_pull_hdrs(skb);
+ if (unlikely(err))
+ return err;
+
+ skb_set_transport_header(skb, cb->l4_off);
+
+ if (unlikely(cb->is_icmp_err)) {
+ if (unlikely(cb->l4_proto != NEXTHDR_ICMP)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
+
+ /* The translated ICMPv4 checksum is recomputed from scratch,
+ * so reject bad ICMPv6 error checksums before conversion.
+ */
+ err = ipxlat_v6_validate_icmp_csum(skb);
+ if (unlikely(err))
+ return err;
+ }
+
+ return 0;
}
--
2.53.0
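The fragmented-ICMPv6 rule in ipxlat_v6_pull_hdrs() depends on telling a real fragment from an atomic one (offset 0, M=0). As a rough userspace illustration, the sketch below decodes the 16-bit offset/flags field of a Fragment Header and applies that rule. All names here (frag6_field, frag6_is_atomic, icmp6_frag_ok, the FRAG6_* masks) are invented for the example. The masks follow the RFC 8200 field layout and are assumed to match the kernel's IP6_OFFSET and IP6_MF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Layout of the 16-bit Fragment Header field (RFC 8200), host byte order:
 * bits 15..3 fragment offset in 8-byte units, bits 2..1 reserved,
 * bit 0 the M (more fragments) flag.
 */
#define FRAG6_OFFSET_MASK 0xfff8u
#define FRAG6_MORE 0x0001u

/* Combine the two wire bytes (network order) into a host-order value. */
uint16_t frag6_field(uint8_t hi, uint8_t lo)
{
	return (uint16_t)((hi << 8) | lo);
}

/* Byte offset of this fragment: units * 8 is the field with the low
 * three bits masked off.
 */
unsigned int frag6_offset_bytes(uint16_t field)
{
	return field & FRAG6_OFFSET_MASK;
}

/* First fragments (offset 0) still carry the transport header. */
bool frag6_is_first(uint16_t field)
{
	return frag6_offset_bytes(field) == 0;
}

/* Atomic fragment: has a Fragment Header but is not actually split. */
bool frag6_is_atomic(uint16_t field)
{
	return frag6_is_first(field) && !(field & FRAG6_MORE);
}

/* ICMPv6 is translatable without a Fragment Header or with an atomic one. */
bool icmp6_frag_ok(bool has_fh, uint16_t field)
{
	return !has_fh || frag6_is_atomic(field);
}
```

A first fragment with M=1 (wire bytes 00 01) still exposes its transport header, yet it fails icmp6_frag_ok() because the rest of the ICMPv6 message lives in other fragments.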
* [RFC net-next 06/15] ipxlat: add transport checksum and offload helpers
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (4 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 05/15] ipxlat: add IPv6 " Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 07/15] ipxlat: add 4to6 and 6to4 TCP/UDP translation helpers Ralf Lici
` (8 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Add shared transport-layer helpers for checksum manipulation and offload
metadata normalization across address-family translation.
This introduces incremental and full checksum utilities plus a generic
offload finalization routine reused by later 4->6 and 6->4 transport
translation paths.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/transport.c | 146 +++++++++++++++++++++++++++++++++
drivers/net/ipxlat/transport.h | 83 +++++++++++++++++++
2 files changed, 229 insertions(+)
create mode 100644 drivers/net/ipxlat/transport.c
create mode 100644 drivers/net/ipxlat/transport.h
diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c
new file mode 100644
index 000000000000..cd786ce84adc
--- /dev/null
+++ b/drivers/net/ipxlat/transport.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <net/ip.h>
+#include <net/ip6_checksum.h>
+#include <net/tcp.h>
+#include <net/udp.h>
+
+#include "packet.h"
+#include "transport.h"
+
+/* set CHECKSUM_PARTIAL metadata for transport checksum completion */
+int ipxlat_set_partial_csum(struct sk_buff *skb, u16 csum_offset)
+{
+ if (likely(skb_partial_csum_set(skb, skb_transport_offset(skb),
+ csum_offset)))
+ return 0;
+ return -EINVAL;
+}
+
+static __wsum ipxlat_pseudohdr6_csum(const struct ipv6hdr *hdr)
+{
+ return ~csum_unfold(csum_ipv6_magic(&hdr->saddr, &hdr->daddr, 0, 0, 0));
+}
+
+static __wsum ipxlat_pseudohdr4_csum(const struct iphdr *hdr)
+{
+ return csum_tcpudp_nofold(hdr->saddr, hdr->daddr, 0, 0, 0);
+}
+
+static __sum16 ipxlat_46_update_csum(__sum16 csum16,
+ const struct iphdr *in_ip4,
+ const void *in_l4_hdr,
+ const struct ipv6hdr *out_ip6,
+ const void *out_l4_hdr, size_t l4_hdr_len)
+{
+ __wsum csum;
+
+ csum = ~csum_unfold(csum16);
+
+ /* replace pseudohdr and L4 header contributions, payload unchanged */
+ csum = csum_sub(csum, ipxlat_pseudohdr4_csum(in_ip4));
+ csum = csum_sub(csum, csum_partial(in_l4_hdr, l4_hdr_len, 0));
+ csum = csum_add(csum, ipxlat_pseudohdr6_csum(out_ip6));
+ csum = csum_add(csum, csum_partial(out_l4_hdr, l4_hdr_len, 0));
+ return csum_fold(csum);
+}
+
+static __sum16 ipxlat_64_update_csum(__sum16 csum16,
+ const struct ipv6hdr *in_ip6,
+ const void *in_l4_hdr,
+ size_t in_l4_hdr_len,
+ const struct iphdr *out_ip4,
+ const void *out_l4_hdr,
+ size_t out_l4_hdr_len)
+{
+ __wsum csum;
+
+ csum = ~csum_unfold(csum16);
+
+ /* only address terms matter because L4 length/proto are unchanged */
+ csum = csum_sub(csum, ipxlat_pseudohdr6_csum(in_ip6));
+ csum = csum_sub(csum, csum_partial(in_l4_hdr, in_l4_hdr_len, 0));
+
+ csum = csum_add(csum, ipxlat_pseudohdr4_csum(out_ip4));
+ csum = csum_add(csum, csum_partial(out_l4_hdr, out_l4_hdr_len, 0));
+
+ return csum_fold(csum);
+}
+
+__sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr,
+ const struct in6_addr *daddr,
+ const struct sk_buff *skb, unsigned int l4_off,
+ unsigned int l4_len, u8 proto)
+{
+ return csum_ipv6_magic(saddr, daddr, l4_len, proto,
+ skb_checksum(skb, l4_off, l4_len, 0));
+}
+
+/* Normalize checksum/offload metadata after address-family translation.
+ *
+ * Translation changes protocol family but keeps transport payload semantics
+ * intact, so TCP GSO only needs type remap (gso_from -> gso_to), while ICMP
+ * must clear stale GSO state because there is no ICMP GSO transform here.
+ *
+ * This mirrors forwarding expectations: reject LRO on xmit and clear hash
+ * when tuple semantics may have changed (fragments and non-TCP/UDP).
+ */
+int ipxlat_finalize_offload(struct sk_buff *skb, u8 l4_proto, bool is_fragment,
+ u32 gso_from, u32 gso_to)
+{
+ struct skb_shared_info *shinfo;
+
+ if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE))
+ skb->ip_summed = CHECKSUM_NONE;
+
+ if (!skb_is_gso(skb))
+ goto out_hash;
+
+ /* align with forwarding paths that reject LRO skbs before xmit */
+ if (unlikely(skb_warn_if_lro(skb)))
+ return -EINVAL;
+
+ shinfo = skb_shinfo(skb);
+ switch (l4_proto) {
+ case IPPROTO_TCP:
+ /* segment payload size is unchanged by address-family
+ * translation so there's no need to touch gso_size
+ */
+ if (shinfo->gso_type & gso_from) {
+ shinfo->gso_type &= ~gso_from;
+ shinfo->gso_type |= gso_to;
+ } else if (unlikely(!(shinfo->gso_type & gso_to))) {
+ return -EOPNOTSUPP;
+ }
+ break;
+ case IPPROTO_UDP:
+ break;
+ case IPPROTO_ICMP:
+ /* for ICMP there is no GSO transform; clear stale offload
+ * metadata so the stack treats it as a normal frame
+ */
+ skb_gso_reset(skb);
+ break;
+ default:
+ return -EPROTONOSUPPORT;
+ }
+
+out_hash:
+ if (unlikely(is_fragment ||
+ (l4_proto != IPPROTO_TCP && l4_proto != IPPROTO_UDP)))
+ skb_clear_hash(skb);
+ else
+ skb_clear_hash_if_not_l4(skb);
+ return 0;
+}
diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h
new file mode 100644
index 000000000000..bd228aecfb3b
--- /dev/null
+++ b/drivers/net/ipxlat/transport.h
@@ -0,0 +1,83 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_TRANSPORT_H_
+#define _NET_IPXLAT_TRANSPORT_H_
+
+#include <linux/icmp.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+
+/**
+ * ipxlat_l4_min_len - minimum transport header size for protocol
+ * @protocol: transport protocol identifier
+ *
+ * Return: minimum header length for @protocol, or 0 when unsupported.
+ */
+static inline unsigned int ipxlat_l4_min_len(u8 protocol)
+{
+ switch (protocol) {
+ case IPPROTO_TCP:
+ return sizeof(struct tcphdr);
+ case IPPROTO_UDP:
+ return sizeof(struct udphdr);
+ case IPPROTO_ICMP:
+ return sizeof(struct icmphdr);
+ default:
+ return 0;
+ }
+}
+
+/**
+ * ipxlat_set_partial_csum - program CHECKSUM_PARTIAL metadata on skb
+ * @skb: packet with transport checksum field
+ * @csum_offset: offset of checksum field within transport header
+ *
+ * Return: 0 on success, negative errno on invalid skb state.
+ */
+int ipxlat_set_partial_csum(struct sk_buff *skb, u16 csum_offset);
+
+/**
+ * ipxlat_l4_csum_ipv6 - compute full L4 checksum with IPv6 pseudo-header
+ * @saddr: IPv6 source address
+ * @daddr: IPv6 destination address
+ * @skb: packet buffer
+ * @l4_off: transport header offset
+ * @l4_len: transport span (header + payload)
+ * @proto: transport protocol
+ *
+ * Return: folded checksum value covering pseudo-header and transport payload.
+ */
+__sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr,
+ const struct in6_addr *daddr,
+ const struct sk_buff *skb, unsigned int l4_off,
+ unsigned int l4_len, u8 proto);
+
+/**
+ * ipxlat_finalize_offload - normalize checksum/GSO metadata after translation
+ * @skb: translated packet
+ * @l4_proto: resulting transport protocol
+ * @is_fragment: resulting packet is fragmented
+ * @gso_from: input TCP GSO type bit
+ * @gso_to: output TCP GSO type bit
+ *
+ * Converts TCP GSO family bits and clears stale checksum/hash state when
+ * offload metadata cannot be preserved across address-family translation.
+ *
+ * Return: 0 on success, negative errno on unsupported/offload-incompatible
+ * input.
+ */
+int ipxlat_finalize_offload(struct sk_buff *skb, u8 l4_proto, bool is_fragment,
+ u32 gso_from, u32 gso_to);
+
+#endif /* _NET_IPXLAT_TRANSPORT_H_ */
--
2.53.0
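The incremental update in ipxlat_46_update_csum() relies on the Internet checksum being a one's-complement sum: the old pseudo-header contribution can be subtracted and the new one added without touching the payload. The sketch below is a userspace analogue of that arithmetic, not the kernel code. The helper names (sum16, fold, l4_csum, csum_swap_addrs) and the sample buffers are assumptions made for the example, and it only swaps the address terms, since the L4 length and protocol contribute the same value in both pseudo-headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes, read as big-endian 16-bit words, to a running sum. */
static uint64_t sum16(uint64_t sum, const uint8_t *p, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return sum;
}

/* Fold carries back in until the sum fits in 16 bits (no complement). */
static uint16_t fold(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full checksum over pseudo-header addresses, proto, length and L4 bytes.
 * The checksum field inside l4 is expected to be zero.
 */
uint16_t l4_csum(const uint8_t *addrs, size_t addrs_len, uint8_t proto,
		 const uint8_t *l4, size_t l4_len)
{
	uint64_t sum = sum16(0, addrs, addrs_len);

	sum += proto;
	sum += l4_len;
	return (uint16_t)~fold(sum16(sum, l4, l4_len));
}

/* Incremental update: undo the final complement, subtract the old
 * addresses (add their one's complement), add the new ones, fold again.
 */
uint16_t csum_swap_addrs(uint16_t csum,
			 const uint8_t *old_addrs, size_t old_len,
			 const uint8_t *new_addrs, size_t new_len)
{
	uint64_t sum = (uint16_t)~csum;

	sum += (uint16_t)~fold(sum16(0, old_addrs, old_len));
	sum = sum16(sum, new_addrs, new_len);
	return (uint16_t)~fold(sum);
}

/* 192.0.2.1 -> 198.51.100.2 */
const uint8_t sample_v4_addrs[8] = { 192, 0, 2, 1, 198, 51, 100, 2 };
/* the same addresses embedded in 2001:db8::/96 (RFC 6052 style) */
const uint8_t sample_v6_addrs[32] = {
	0x20, 0x01, 0x0d, 0xb8, 0, 0, 0, 0, 0, 0, 0, 0, 192, 0, 2, 1,
	0x20, 0x01, 0x0d, 0xb8, 0, 0, 0, 0, 0, 0, 0, 0, 198, 51, 100, 2,
};
/* UDP header (checksum zeroed) followed by a 4-byte payload */
const uint8_t sample_udp[12] = {
	0x12, 0x34, 0x00, 0x35, 0x00, 0x0c, 0x00, 0x00, 'p', 'i', 'n', 'g',
};
```

Feeding the IPv4 checksum of sample_udp through csum_swap_addrs() gives the same value as recomputing it from scratch with the IPv6 pseudo-header, which is the property the kernel helpers depend on.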
* [RFC net-next 07/15] ipxlat: add 4to6 and 6to4 TCP/UDP translation helpers
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (5 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 06/15] ipxlat: add transport checksum and offload helpers Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 08/15] ipxlat: add translation engine and dispatch core Ralf Lici
` (7 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Add protocol-specific transport translation entry points for both
address-family directions.
This wires up checksum adjustment for outer and quoted-inner TCP/UDP
headers and provides the transport routines consumed by the translation
engine.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/transport.c | 194 +++++++++++++++++++++++++++++++++
drivers/net/ipxlat/transport.h | 20 ++++
2 files changed, 214 insertions(+)
diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c
index cd786ce84adc..3aa00c635916 100644
--- a/drivers/net/ipxlat/transport.c
+++ b/drivers/net/ipxlat/transport.c
@@ -144,3 +144,197 @@ int ipxlat_finalize_offload(struct sk_buff *skb, u8 l4_proto, bool is_fragment,
skb_clear_hash_if_not_l4(skb);
return 0;
}
+
+int ipxlat_46_outer_tcp(struct sk_buff *skb, const struct iphdr *in4)
+{
+ const struct ipv6hdr *iph6 = ipv6_hdr(skb);
+ struct tcphdr *tcp_new = tcp_hdr(skb);
+ struct tcphdr tcp_old;
+ __sum16 csum16;
+
+ /* CHECKSUM_PARTIAL keeps a pseudohdr seed in check, not a final
+ * transport checksum. For 4->6, we only re-seed it with IPv6 pseudohdr
+ * data and keep completion deferred to offload.
+ */
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ tcp_new->check = ~tcp_v6_check(ipxlat_skb_datagram_len(skb),
+ &iph6->saddr, &iph6->daddr, 0);
+ return ipxlat_set_partial_csum(skb,
+ offsetof(struct tcphdr, check));
+ }
+
+ /* zeroing check in old/new headers avoids double-accounting it */
+ csum16 = tcp_new->check;
+ tcp_old = *tcp_new;
+ tcp_old.check = 0;
+ tcp_new->check = 0;
+ tcp_new->check = ipxlat_46_update_csum(csum16, in4,
+ &tcp_old, iph6, tcp_new,
+ sizeof(*tcp_new));
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_46_outer_udp(struct sk_buff *skb, const struct iphdr *in4)
+{
+ const struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct ipv6hdr *iph6 = ipv6_hdr(skb);
+ struct udphdr *udp_new = udp_hdr(skb);
+ struct udphdr udp_old;
+ __sum16 csum16;
+
+ /* outer path enforces UDP zero-checksum policy in validation */
+ if (skb->ip_summed == CHECKSUM_PARTIAL && likely(udp_new->check != 0)) {
+ udp_new->check = ~udp_v6_check(ipxlat_skb_datagram_len(skb),
+ &iph6->saddr, &iph6->daddr, 0);
+ return ipxlat_set_partial_csum(skb,
+ offsetof(struct udphdr, check));
+ }
+
+ /* incoming UDP IPv4 has no checksum (legal in IPv4, not in IPv6) */
+ if (unlikely(udp_new->check == 0)) {
+ if (unlikely(!cb->udp_zero_csum_len))
+ return -EINVAL;
+
+ udp_new->check =
+ ipxlat_l4_csum_ipv6(&iph6->saddr, &iph6->daddr, skb,
+ skb_transport_offset(skb),
+ cb->udp_zero_csum_len, IPPROTO_UDP);
+ /* 0 on wire means "no checksum"; send a computed zero as 0xffff */
+ if (udp_new->check == 0)
+ udp_new->check = CSUM_MANGLED_0;
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+ }
+
+ csum16 = udp_new->check;
+ udp_old = *udp_new;
+ udp_old.check = 0;
+ udp_new->check = 0;
+ udp_new->check = ipxlat_46_update_csum(csum16, in4,
+ &udp_old, iph6, udp_new,
+ sizeof(*udp_new));
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_46_inner_tcp(struct sk_buff *skb, const struct iphdr *in4,
+ const struct ipv6hdr *iph6, struct tcphdr *tcp_new)
+{
+ struct tcphdr tcp_old;
+ __sum16 csum16;
+
+ csum16 = tcp_new->check;
+ tcp_old = *tcp_new;
+ tcp_old.check = 0;
+ tcp_new->check = 0;
+ tcp_new->check = ipxlat_46_update_csum(csum16, in4, &tcp_old, iph6,
+ tcp_new, sizeof(*tcp_new));
+ return 0;
+}
+
+int ipxlat_46_inner_udp(struct sk_buff *skb, const struct iphdr *in4,
+ const struct ipv6hdr *iph6, struct udphdr *udp_new)
+{
+ struct udphdr udp_old;
+ __sum16 csum16;
+
+ if (unlikely(udp_new->check == 0))
+ return 0;
+
+ csum16 = udp_new->check;
+ udp_old = *udp_new;
+ udp_old.check = 0;
+ udp_new->check = 0;
+ udp_new->check = ipxlat_46_update_csum(csum16, in4, &udp_old, iph6,
+ udp_new, sizeof(*udp_new));
+ return 0;
+}
+
+int ipxlat_64_outer_tcp(struct sk_buff *skb, const struct ipv6hdr *in6)
+{
+ struct tcphdr tcp_old, *tcp_new;
+ __sum16 csum16;
+
+ tcp_new = tcp_hdr(skb);
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ tcp_new->check = ~tcp_v4_check(ipxlat_skb_datagram_len(skb),
+ ip_hdr(skb)->saddr,
+ ip_hdr(skb)->daddr, 0);
+ return ipxlat_set_partial_csum(skb,
+ offsetof(struct tcphdr, check));
+ }
+
+ csum16 = tcp_new->check;
+ tcp_old = *tcp_new;
+ tcp_old.check = 0;
+ tcp_new->check = 0;
+ tcp_new->check = ipxlat_64_update_csum(csum16, in6, &tcp_old,
+ sizeof(tcp_old), ip_hdr(skb),
+ tcp_new, sizeof(*tcp_new));
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_64_outer_udp(struct sk_buff *skb, const struct ipv6hdr *in6)
+{
+ struct udphdr udp_old, *udp_new;
+ __sum16 csum16;
+
+ udp_new = udp_hdr(skb);
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ udp_new->check = ~udp_v4_check(ipxlat_skb_datagram_len(skb),
+ ip_hdr(skb)->saddr,
+ ip_hdr(skb)->daddr, 0);
+ return ipxlat_set_partial_csum(skb,
+ offsetof(struct udphdr, check));
+ }
+
+ csum16 = udp_new->check;
+ udp_old = *udp_new;
+ udp_old.check = 0;
+ udp_new->check = 0;
+ udp_new->check = ipxlat_64_update_csum(csum16, in6, &udp_old,
+ sizeof(udp_old), ip_hdr(skb),
+ udp_new, sizeof(*udp_new));
+ if (udp_new->check == 0)
+ udp_new->check = CSUM_MANGLED_0;
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_64_inner_tcp(struct sk_buff *skb, const struct ipv6hdr *in6,
+ const struct iphdr *out4, struct tcphdr *tcp_new)
+{
+ struct tcphdr tcp_old;
+ __sum16 csum16;
+
+ csum16 = tcp_new->check;
+ tcp_old = *tcp_new;
+ tcp_old.check = 0;
+ tcp_new->check = 0;
+ tcp_new->check = ipxlat_64_update_csum(csum16, in6, &tcp_old,
+ sizeof(tcp_old), out4, tcp_new,
+ sizeof(*tcp_new));
+ return 0;
+}
+
+int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
+ const struct iphdr *out4, struct udphdr *udp_new)
+{
+ struct udphdr udp_old;
+ __sum16 csum16;
+
+ csum16 = udp_new->check;
+ udp_old = *udp_new;
+ udp_old.check = 0;
+ udp_new->check = 0;
+ udp_new->check = ipxlat_64_update_csum(csum16, in6, &udp_old,
+ sizeof(udp_old), out4, udp_new,
+ sizeof(*udp_new));
+ if (udp_new->check == 0)
+ udp_new->check = CSUM_MANGLED_0;
+ return 0;
+}
diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h
index bd228aecfb3b..9b6fe422b01f 100644
--- a/drivers/net/ipxlat/transport.h
+++ b/drivers/net/ipxlat/transport.h
@@ -80,4 +80,24 @@ __sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr,
int ipxlat_finalize_offload(struct sk_buff *skb, u8 l4_proto, bool is_fragment,
u32 gso_from, u32 gso_to);
+/* outer transport translation helpers (packet L3 already translated) */
+int ipxlat_46_outer_tcp(struct sk_buff *skb, const struct iphdr *in4);
+int ipxlat_46_outer_udp(struct sk_buff *skb, const struct iphdr *in4);
+
+/* quoted-inner transport translation helpers for ICMP error payloads */
+int ipxlat_46_inner_tcp(struct sk_buff *skb, const struct iphdr *in4,
+ const struct ipv6hdr *iph6, struct tcphdr *tcp_new);
+int ipxlat_46_inner_udp(struct sk_buff *skb, const struct iphdr *in4,
+ const struct ipv6hdr *iph6, struct udphdr *udp_new);
+
+/* outer transport translation helpers (packet L3 already translated) */
+int ipxlat_64_outer_tcp(struct sk_buff *skb, const struct ipv6hdr *in6);
+int ipxlat_64_outer_udp(struct sk_buff *skb, const struct ipv6hdr *in6);
+
+/* quoted-inner transport translation helpers for ICMP error payloads */
+int ipxlat_64_inner_tcp(struct sk_buff *skb, const struct ipv6hdr *in6,
+ const struct iphdr *out4, struct tcphdr *tcp_new);
+int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
+ const struct iphdr *out4, struct udphdr *udp_new);
+
#endif /* _NET_IPXLAT_TRANSPORT_H_ */
--
2.53.0
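
The UDP helpers in the patch above replace a computed checksum of 0 with CSUM_MANGLED_0, because an all-zero UDP checksum over IPv4 means "no checksum was computed" (RFC 768). The userspace sketch below illustrates only that rule. The sketch_* names are hypothetical and not part of the series, and the fold is the generic ones'-complement fold rather than the kernel's csum_fold().

```c
#include <stdint.h>

/* Hypothetical helper: fold a 32-bit ones'-complement accumulator into
 * 16 bits and return its complement, as a checksum field would hold it.
 */
static uint16_t sketch_csum_fold(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical helper: a computed UDP checksum of 0 is sent as 0xffff,
 * which is the value the kernel names CSUM_MANGLED_0.
 */
static uint16_t sketch_udp_finalize(uint16_t check)
{
	return check == 0 ? 0xffff : check;
}
```

TCP has no such special case, which matches the TCP helpers above storing the updated checksum unchanged.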
* [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (6 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 07/15] ipxlat: add 4to6 and 6to4 TCP/UDP translation helpers Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 09/15] ipxlat: emit translator-generated ICMP errors on drop Ralf Lici
` (6 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
This commit introduces the core start_xmit processing flow: validate the
packet, translate it, resolve the resulting action, and forward or drop.
Action resolution is centralized in the dispatch layer, and per-direction
translation logic is kept separate from device glue. The result is a single
data-path entry point with explicit control over drop/forward/emit behavior.
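
As a rough illustration of the flow described above, here is a minimal userspace sketch. All sketch_* names and the boolean fields are hypothetical stand-ins for the validation and translation outcomes; the real code operates on struct sk_buff and the ipxlat private context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
enum sketch_action { SKETCH_ACT_DROP, SKETCH_ACT_FWD };

enum sketch_family { SKETCH_FAM_V4, SKETCH_FAM_V6, SKETCH_FAM_OTHER };

struct sketch_pkt {
	enum sketch_family family;
	bool valid;		/* outcome of the validation step */
	bool translatable;	/* outcome of the translation step */
};

struct sketch_stats {
	unsigned int forwarded;
	unsigned int dropped;
};

/* Validate, then translate; any failure resolves to a drop. */
static enum sketch_action sketch_translate(const struct sketch_pkt *pkt)
{
	if (pkt->family != SKETCH_FAM_V4 && pkt->family != SKETCH_FAM_V6)
		return SKETCH_ACT_DROP;
	if (!pkt->valid || !pkt->translatable)
		return SKETCH_ACT_DROP;
	return SKETCH_ACT_FWD;
}

/* Single entry point: resolve the action once, then act on it. */
static int sketch_process(struct sketch_stats *stats,
			  const struct sketch_pkt *pkt)
{
	assert(stats != NULL && pkt != NULL);

	switch (sketch_translate(pkt)) {
	case SKETCH_ACT_FWD:
		stats->forwarded++;
		return 0;
	case SKETCH_ACT_DROP:
	default:
		stats->dropped++;
		return -1;
	}
}
```

Keeping the switch in one place means later actions (pre-fragmentation, ICMP error emission) can be added as new cases without touching the per-direction translators.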
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/Makefile | 4 +
drivers/net/ipxlat/dispatch.c | 104 +++++++++++++++
drivers/net/ipxlat/dispatch.h | 71 +++++++++++
drivers/net/ipxlat/main.c | 6 +-
drivers/net/ipxlat/packet.c | 1 +
drivers/net/ipxlat/translate_46.c | 198 +++++++++++++++++++++++++++++
drivers/net/ipxlat/translate_46.h | 73 +++++++++++
drivers/net/ipxlat/translate_64.c | 205 ++++++++++++++++++++++++++++++
drivers/net/ipxlat/translate_64.h | 56 ++++++++
drivers/net/ipxlat/transport.c | 11 ++
drivers/net/ipxlat/transport.h | 5 +
11 files changed, 732 insertions(+), 2 deletions(-)
create mode 100644 drivers/net/ipxlat/dispatch.c
create mode 100644 drivers/net/ipxlat/dispatch.h
create mode 100644 drivers/net/ipxlat/translate_46.c
create mode 100644 drivers/net/ipxlat/translate_46.h
create mode 100644 drivers/net/ipxlat/translate_64.c
create mode 100644 drivers/net/ipxlat/translate_64.h
diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile
index 90dbc0489fa2..d7b7097aee5f 100644
--- a/drivers/net/ipxlat/Makefile
+++ b/drivers/net/ipxlat/Makefile
@@ -7,3 +7,7 @@ obj-$(CONFIG_IPXLAT) := ipxlat.o
ipxlat-objs += main.o
ipxlat-objs += address.o
ipxlat-objs += packet.o
+ipxlat-objs += transport.o
+ipxlat-objs += dispatch.o
+ipxlat-objs += translate_46.o
+ipxlat-objs += translate_64.o
diff --git a/drivers/net/ipxlat/dispatch.c b/drivers/net/ipxlat/dispatch.c
new file mode 100644
index 000000000000..133d30859f49
--- /dev/null
+++ b/drivers/net/ipxlat/dispatch.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <net/ip.h>
+
+#include "dispatch.h"
+#include "packet.h"
+#include "translate_46.h"
+#include "translate_64.h"
+
+static enum ipxlat_action
+ipxlat_resolve_failed_action(const struct sk_buff *skb)
+{
+ return IPXLAT_ACT_DROP;
+}
+
+enum ipxlat_action ipxlat_translate(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb)
+{
+ const u16 proto = ntohs(skb->protocol);
+
+ memset(skb->cb, 0, sizeof(struct ipxlat_cb));
+
+ if (proto == ETH_P_IPV6) {
+ if (unlikely(ipxlat_v6_validate_skb(skb)) ||
+ unlikely(ipxlat_64_translate(ipxlat, skb)))
+ return ipxlat_resolve_failed_action(skb);
+
+ return IPXLAT_ACT_FWD;
+ } else if (likely(proto == ETH_P_IP)) {
+ if (unlikely(ipxlat_v4_validate_skb(ipxlat, skb)))
+ return ipxlat_resolve_failed_action(skb);
+
+ if (unlikely(ipxlat_46_translate(ipxlat, skb)))
+ return ipxlat_resolve_failed_action(skb);
+
+ return IPXLAT_ACT_FWD;
+ }
+
+ return IPXLAT_ACT_DROP;
+}
+
+/* mark current skb as drop-with-icmp and cache type/code/info for dispatch */
+void ipxlat_mark_icmp_drop(struct sk_buff *skb, u8 type, u8 code, u32 info)
+{
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+
+ cb->emit_icmp_err = true;
+ cb->icmp_err.type = type;
+ cb->icmp_err.code = code;
+ cb->icmp_err.info = info;
+}
+
+static void ipxlat_forward_pkt(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ const unsigned int len = skb->len;
+ int err;
+
+ /* reinject as a fresh packet with scrubbed metadata */
+ skb_set_queue_mapping(skb, 0);
+ skb_scrub_packet(skb, false);
+
+ err = gro_cells_receive(&ipxlat->gro_cells, skb);
+ if (likely(err == NET_RX_SUCCESS))
+ dev_dstats_rx_add(ipxlat->dev, len);
+ /* on failure gro_cells updates rx drop stats internally */
+}
+
+int ipxlat_process_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
+ bool allow_pre_frag)
+{
+ enum ipxlat_action action;
+ int err = -EINVAL;
+
+ (void)allow_pre_frag;
+
+ action = ipxlat_translate(ipxlat, skb);
+ switch (action) {
+ case IPXLAT_ACT_FWD:
+ dev_dstats_tx_add(ipxlat->dev, skb->len);
+ ipxlat_forward_pkt(ipxlat, skb);
+ return 0;
+ case IPXLAT_ACT_DROP:
+ goto drop_free;
+ default:
+ DEBUG_NET_WARN_ON_ONCE(1);
+ goto drop_free;
+ }
+
+drop_free:
+ dev_dstats_tx_dropped(ipxlat->dev);
+ kfree_skb(skb);
+ return err;
+}
diff --git a/drivers/net/ipxlat/dispatch.h b/drivers/net/ipxlat/dispatch.h
new file mode 100644
index 000000000000..fa6fafea656b
--- /dev/null
+++ b/drivers/net/ipxlat/dispatch.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_DISPATCH_H_
+#define _NET_IPXLAT_DISPATCH_H_
+
+#include "ipxlpriv.h"
+
+struct sk_buff;
+
+/**
+ * enum ipxlat_action - result of packet translation dispatch
+ * @IPXLAT_ACT_DROP: drop the packet
+ * @IPXLAT_ACT_FWD: packet translated and ready for forward reinjection
+ * @IPXLAT_ACT_PRE_FRAG: packet must be fragmented before 4->6 translation
+ * @IPXLAT_ACT_ICMP_ERR: drop packet and emit translator-generated ICMP error
+ */
+enum ipxlat_action {
+ IPXLAT_ACT_DROP,
+ IPXLAT_ACT_FWD,
+ IPXLAT_ACT_PRE_FRAG,
+ IPXLAT_ACT_ICMP_ERR,
+};
+
+/**
+ * ipxlat_mark_icmp_drop - cache translator-generated ICMP action in skb cb
+ * @skb: packet being rejected
+ * @type: ICMP type to emit
+ * @code: ICMP code to emit
+ * @info: ICMP auxiliary info (pointer/MTU), host-endian
+ *
+ * This does not emit immediately; dispatch consumes the mark later and sends
+ * the ICMP error through the appropriate address family path.
+ */
+void ipxlat_mark_icmp_drop(struct sk_buff *skb, u8 type, u8 code, u32 info);
+
+/**
+ * ipxlat_translate - validate/translate one packet and return next action
+ * @ipxlat: translator private context
+ * @skb: packet to process
+ *
+ * Return: one of &enum ipxlat_action.
+ */
+enum ipxlat_action ipxlat_translate(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb);
+
+/**
+ * ipxlat_process_skb - top-level packet handler for ndo_start_xmit/reinjection
+ * @ipxlat: translator private context
+ * @skb: packet to process
+ * @allow_pre_frag: allow 4->6 pre-fragment action for this invocation
+ *
+ * The function always consumes @skb directly or through fragmentation
+ * callback/reinjection paths.
+ *
+ * Return: 0 on success, negative errno on processing failure.
+ */
+int ipxlat_process_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
+ bool allow_pre_frag);
+
+#endif /* _NET_IPXLAT_DISPATCH_H_ */
diff --git a/drivers/net/ipxlat/main.c b/drivers/net/ipxlat/main.c
index 26b7f5b6ff20..a1b4bcd39478 100644
--- a/drivers/net/ipxlat/main.c
+++ b/drivers/net/ipxlat/main.c
@@ -15,6 +15,7 @@
#include <net/ip.h>
+#include "dispatch.h"
#include "ipxlpriv.h"
#include "main.h"
@@ -56,8 +57,9 @@ static void ipxlat_dev_uninit(struct net_device *dev)
static int ipxlat_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
- dev_dstats_tx_dropped(dev);
- kfree_skb(skb);
+ struct ipxlat_priv *ipxlat = netdev_priv(dev);
+
+ ipxlat_process_skb(ipxlat, skb, true);
return NETDEV_TX_OK;
}
diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c
index b9a9af1b3adb..b37a3e55aff8 100644
--- a/drivers/net/ipxlat/packet.c
+++ b/drivers/net/ipxlat/packet.c
@@ -13,6 +13,7 @@
#include <linux/icmp.h>
+#include "dispatch.h"
#include "packet.h"
/* Shift cached skb cb offsets by the L3 header delta after in-place rewrite.
diff --git a/drivers/net/ipxlat/translate_46.c b/drivers/net/ipxlat/translate_46.c
new file mode 100644
index 000000000000..aec8500db2c2
--- /dev/null
+++ b/drivers/net/ipxlat/translate_46.c
@@ -0,0 +1,198 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <net/ip6_route.h>
+
+#include "address.h"
+#include "packet.h"
+#include "transport.h"
+#include "translate_46.h"
+
+u8 ipxlat_46_map_proto_to_nexthdr(u8 protocol)
+{
+ return (protocol == IPPROTO_ICMP) ? NEXTHDR_ICMP : protocol;
+}
+
+void ipxlat_46_build_frag_hdr(struct frag_hdr *fh6, const struct iphdr *hdr4,
+ u8 l4_proto)
+{
+ fh6->nexthdr = ipxlat_46_map_proto_to_nexthdr(l4_proto);
+ fh6->reserved = 0;
+ fh6->frag_off =
+ ipxlat_build_frag6_offset(ipxlat_get_frag4_offset(hdr4),
+ !!(be16_to_cpu(hdr4->frag_off) &
+ IP_MF));
+ fh6->identification = cpu_to_be32(be16_to_cpu(hdr4->id));
+}
+
+void ipxlat_46_build_l3(struct ipv6hdr *iph6, const struct iphdr *iph4,
+ unsigned int payload_len, u8 nexthdr, u8 hop_limit)
+{
+ iph6->version = 6;
+ iph6->priority = iph4->tos >> 4;
+ iph6->flow_lbl[0] = (iph4->tos & 0x0F) << 4;
+ iph6->flow_lbl[1] = 0;
+ iph6->flow_lbl[2] = 0;
+ iph6->payload_len = htons(payload_len);
+ iph6->nexthdr = nexthdr;
+ iph6->hop_limit = hop_limit;
+}
+
+/* Lookup post-translation IPv6 PMTU for 4->6 output decisions.
+ * Falls back to translator MTU on routing failures and clamps route MTU
+ * against translator egress MTU.
+ */
+unsigned int ipxlat_46_lookup_pmtu6(struct ipxlat_priv *ipxlat,
+ const struct sk_buff *skb,
+ const struct iphdr *in4)
+{
+ unsigned int mtu6, dev_mtu;
+ struct flowi6 fl6 = {};
+ struct dst_entry *dst;
+
+ dev_mtu = READ_ONCE(ipxlat->dev->mtu);
+
+ ipxlat_46_convert_addr(&ipxlat->xlat_prefix6, in4->saddr,
+ &fl6.saddr);
+ ipxlat_46_convert_addr(&ipxlat->xlat_prefix6, in4->daddr,
+ &fl6.daddr);
+ fl6.flowi6_mark = skb->mark;
+
+ dst = ip6_route_output(dev_net(ipxlat->dev), NULL, &fl6);
+ if (unlikely(dst->error)) {
+ mtu6 = dev_mtu;
+ goto out;
+ }
+
+ /* Route lookup can return a very large MTU (e.g., local/loopback style
+ * routes) that does not reflect the translator egress constraint.
+ * Clamp with the translator device MTU so DF decisions are stable and
+ * pre-fragment planning never targets packets larger than what this
+ * interface can hand to the next stages.
+ */
+ mtu6 = min_t(unsigned int, dst_mtu(dst), dev_mtu);
+
+out:
+ dst_release(dst);
+ return mtu6;
+}
+
+/**
+ * ipxlat_46_translate - translate one validated packet from IPv4 to IPv6
+ * @ipxlat: translator private context
+ * @skb: packet to translate
+ *
+ * Rewrites outer L3 in place, rebases cached offsets and translates L4 on
+ * first fragments only.
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_46_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ unsigned int min_l4_len, old_l3_len, new_l3_len;
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct iphdr outer4 = *ip_hdr(skb);
+ const u8 in_l4_proto = cb->l4_proto;
+ bool has_frag, first_frag;
+ struct frag_hdr *fh6;
+ struct ipv6hdr *iph6;
+ int l3_delta, err;
+ u8 out_l4_proto;
+
+ /* snapshot the original IPv4 header fields before skb layout changes */
+ has_frag = ip_is_fragment(&outer4);
+ first_frag = ipxlat_is_first_frag4(&outer4);
+ out_l4_proto = ipxlat_46_map_proto_to_nexthdr(in_l4_proto);
+
+ old_l3_len = cb->l3_hdr_len;
+ new_l3_len = sizeof(struct ipv6hdr) +
+ (has_frag ? sizeof(struct frag_hdr) : 0);
+ l3_delta = (int)new_l3_len - (int)old_l3_len;
+
+ /* make room for the new hdrs */
+ if (unlikely(skb_cow_head(skb, max_t(int, 0, l3_delta))))
+ return -ENOMEM;
+
+ /* replace outer L3 area: drop IPv4 hdr, reserve IPv6(+Frag) hdr */
+ skb_pull(skb, old_l3_len);
+ skb_push(skb, new_l3_len);
+ skb_reset_network_header(skb);
+ skb_set_transport_header(skb, new_l3_len);
+ skb->protocol = htons(ETH_P_IPV6);
+
+ /* build outer IPv6 base hdr from translated IPv4 fields */
+ iph6 = ipv6_hdr(skb);
+ ipxlat_46_build_l3(iph6, &outer4, skb->len - sizeof(*iph6),
+ out_l4_proto, outer4.ttl - 1);
+
+ /* translate IPv4 endpoints into IPv6 addresses using xlat_prefix6 */
+ ipxlat_46_convert_addrs(&ipxlat->xlat_prefix6, &outer4, iph6);
+
+ /* add IPv6 fragment hdr when the IPv4 packet carried fragmentation */
+ if (unlikely(has_frag)) {
+ iph6->nexthdr = NEXTHDR_FRAGMENT;
+
+ fh6 = (struct frag_hdr *)(iph6 + 1);
+ ipxlat_46_build_frag_hdr(fh6, &outer4, in_l4_proto);
+ cb->fragh_off = sizeof(struct ipv6hdr);
+ }
+
+ /* Rebase cached offsets after L3 size delta.
+ * For outer 4->6 translation this should not underflow: cached offsets
+ * were built from l3_off + ip4_len(+...) and delta = ip6_len - ip4_len,
+ * so ip4_len cancels out after rebasing. A failure here means internal
+ * metadata inconsistency, not a packet validation outcome.
+ */
+ err = ipxlat_cb_rebase_offsets(cb, l3_delta);
+ if (unlikely(err)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return err;
+ }
+
+ cb->l3_hdr_len = new_l3_len;
+ cb->l4_proto = out_l4_proto;
+ DEBUG_NET_WARN_ON_ONCE(!ipxlat_cb_offsets_valid(cb));
+
+ /* non-first fragments have no transport header to translate */
+ if (unlikely(!first_frag))
+ goto out;
+
+ /* ensure transport bytes are writable before L4 csum/proto rewrites */
+ min_l4_len = ipxlat_l4_min_len(in_l4_proto);
+ if (unlikely(skb_ensure_writable(skb, skb_transport_offset(skb) +
+ min_l4_len)))
+ return -ENOMEM;
+
+ /* translate transport hdr and pseudohdr dependent checksums */
+ switch (in_l4_proto) {
+ case IPPROTO_TCP:
+ err = ipxlat_46_outer_tcp(skb, &outer4);
+ break;
+ case IPPROTO_UDP:
+ err = ipxlat_46_outer_udp(skb, &outer4);
+ break;
+ case IPPROTO_ICMP:
+ err = ipxlat_46_icmp(ipxlat, skb);
+ break;
+ default:
+ err = 0;
+ break;
+ }
+ if (unlikely(err))
+ return err;
+
+out:
+ /* normalize checksum/offload metadata for the translated frame */
+ return ipxlat_finalize_offload(skb, in_l4_proto, has_frag,
+ SKB_GSO_TCPV4, SKB_GSO_TCPV6);
+}
diff --git a/drivers/net/ipxlat/translate_46.h b/drivers/net/ipxlat/translate_46.h
new file mode 100644
index 000000000000..75def10d0cad
--- /dev/null
+++ b/drivers/net/ipxlat/translate_46.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_TRANSLATE_46_H_
+#define _NET_IPXLAT_TRANSLATE_46_H_
+
+#include "ipxlpriv.h"
+
+struct iphdr;
+struct ipv6hdr;
+struct frag_hdr;
+struct sk_buff;
+
+/**
+ * ipxlat_46_map_proto_to_nexthdr - map IPv4 L4 protocol to IPv6 nexthdr
+ * @protocol: IPv4 L4 protocol
+ *
+ * Return: IPv6 next-header value corresponding to @protocol.
+ */
+u8 ipxlat_46_map_proto_to_nexthdr(u8 protocol);
+
+/**
+ * ipxlat_46_build_frag_hdr - build IPv6 Fragment Header from IPv4 fragment info
+ * @fh6: output IPv6 fragment header
+ * @hdr4: source IPv4 header
+ * @l4_proto: original IPv4 L4 protocol
+ */
+void ipxlat_46_build_frag_hdr(struct frag_hdr *fh6, const struct iphdr *hdr4,
+ u8 l4_proto);
+
+/**
+ * ipxlat_46_build_l3 - build translated outer IPv6 header from IPv4 metadata
+ * @iph6: output IPv6 header
+ * @iph4: source IPv4 header
+ * @payload_len: IPv6 payload length
+ * @nexthdr: resulting IPv6 nexthdr
+ * @hop_limit: resulting IPv6 hop limit
+ */
+void ipxlat_46_build_l3(struct ipv6hdr *iph6, const struct iphdr *iph4,
+ unsigned int payload_len, u8 nexthdr, u8 hop_limit);
+
+/**
+ * ipxlat_46_lookup_pmtu6 - lookup post-translation IPv6 PMTU for a 4->6 packet
+ * @ipxlat: translator private context
+ * @skb: packet being translated
+ * @in4: source IPv4 header snapshot
+ *
+ * Return: effective PMTU clamped against translator device MTU.
+ */
+unsigned int ipxlat_46_lookup_pmtu6(struct ipxlat_priv *ipxlat,
+ const struct sk_buff *skb,
+ const struct iphdr *in4);
+
+/**
+ * ipxlat_46_translate - translate outer packet from IPv4 to IPv6 in place
+ * @ipxlat: translator private context
+ * @skb: packet to translate
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_46_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
+
+#endif /* _NET_IPXLAT_TRANSLATE_46_H_ */
diff --git a/drivers/net/ipxlat/translate_64.c b/drivers/net/ipxlat/translate_64.c
new file mode 100644
index 000000000000..50a95fb75f9d
--- /dev/null
+++ b/drivers/net/ipxlat/translate_64.c
@@ -0,0 +1,205 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <linux/icmpv6.h>
+#include <net/ip.h>
+
+#include "translate_64.h"
+#include "address.h"
+#include "packet.h"
+#include "transport.h"
+
+u8 ipxlat_64_map_nexthdr_proto(u8 nexthdr)
+{
+ return (nexthdr == NEXTHDR_ICMP) ? IPPROTO_ICMP : nexthdr;
+}
+
+void ipxlat_64_build_l3(struct iphdr *iph4, const struct ipv6hdr *iph6,
+ unsigned int tot_len, __be16 frag_off, u8 protocol,
+ __be32 saddr, __be32 daddr, u8 ttl, __be16 id)
+{
+ iph4->version = 4;
+ iph4->ihl = 5;
+ iph4->tos = ipxlat_get_ipv6_tclass(iph6);
+ iph4->tot_len = cpu_to_be16(tot_len);
+ iph4->frag_off = frag_off;
+ iph4->ttl = ttl;
+ iph4->protocol = protocol;
+ iph4->saddr = saddr;
+ iph4->daddr = daddr;
+ iph4->id = id;
+ iph4->check = 0;
+ iph4->check = ip_fast_csum(iph4, iph4->ihl);
+}
+
+static __be16 ipxlat_64_build_frag_off(const struct sk_buff *skb,
+ const struct frag_hdr *frag6,
+ u8 l4_proto)
+{
+ bool df, mf, over_mtu;
+ u16 frag_offset;
+
+ /* preserve real IPv6 fragmentation state with a Fragment Header */
+ if (frag6) {
+ mf = !!(be16_to_cpu(frag6->frag_off) & IP6_MF);
+ frag_offset = ipxlat_get_frag6_offset(frag6);
+ return ipxlat_build_frag4_offset(false, mf, frag_offset);
+ }
+
+ /* frag_list implies segmented payload emitted as fragments */
+ if (skb_has_frag_list(skb))
+ return ipxlat_build_frag4_offset(false, false, 0);
+
+ if (skb_is_gso(skb)) {
+ /* GSO frames are one datagram here; set DF only for TCP
+ * when later segmentation exceeds IPv6 minimum MTU
+ */
+ df = (l4_proto == IPPROTO_TCP) &&
+ (ipxlat_skb_cb(skb)->payload_off +
+ skb_shinfo(skb)->gso_size >
+ (IPV6_MIN_MTU - sizeof(struct iphdr)));
+ return ipxlat_build_frag4_offset(df, false, 0);
+ }
+
+ over_mtu = skb->len > (IPV6_MIN_MTU - sizeof(struct iphdr));
+ return ipxlat_build_frag4_offset(over_mtu, false, 0);
+}
+
+/**
+ * ipxlat_64_translate - translate one validated packet from IPv6 to IPv4
+ * @ipxlat: translator private context
+ * @skb: packet to translate
+ *
+ * Rewrites outer L3 in place, rebases cached offsets and translates L4 on
+ * first fragments only.
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_64_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ unsigned int min_l4_len, old_l3_len, new_l3_len;
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ struct ipv6hdr outer6 = *ipv6_hdr(skb);
+ bool is_icmp_err, has_frag, first_frag;
+ u8 in_l4_proto, out_l4_proto;
+ struct frag_hdr frag_copy;
+ struct frag_hdr *frag6;
+ __be32 saddr, daddr;
+ __be16 frag_off, id;
+ struct iphdr *iph4;
+ int l3_delta, err;
+
+ /* snapshot original outer IPv6 fields before L3 rewrite */
+ frag6 = cb->fragh_off ? (struct frag_hdr *)(skb->data + cb->fragh_off) :
+ NULL;
+ has_frag = !!frag6;
+ in_l4_proto = cb->l4_proto;
+ is_icmp_err = cb->is_icmp_err;
+ out_l4_proto = ipxlat_64_map_nexthdr_proto(in_l4_proto);
+
+ old_l3_len = cb->l3_hdr_len;
+ new_l3_len = sizeof(struct iphdr);
+ l3_delta = (int)new_l3_len - (int)old_l3_len;
+
+ if (unlikely(has_frag))
+ frag_copy = *frag6;
+ first_frag = ipxlat_is_first_frag6(has_frag ? &frag_copy : NULL);
+
+ if (unlikely(is_icmp_err)) {
+ if (unlikely(in_l4_proto != NEXTHDR_ICMP))
+ return -EINVAL;
+ }
+
+ /* derive translated IPv4 endpoints */
+ err = ipxlat_64_convert_addrs(&ipxlat->xlat_prefix6, &outer6,
+ is_icmp_err, &saddr, &daddr);
+ if (unlikely(err))
+ return err;
+
+ /* replace outer IPv6 hdr with IPv4 hdr in-place */
+ skb_pull(skb, old_l3_len);
+ skb_push(skb, new_l3_len);
+ skb_reset_network_header(skb);
+ skb_set_transport_header(skb, new_l3_len);
+ skb->protocol = htons(ETH_P_IP);
+
+ /* Rebase cached offsets after L3 size delta.
+ * For outer 6->4 translation this should not underflow: cached offsets
+ * were built from l3_off + ip6_len (+ ...), and
+ * delta = sizeof(struct iphdr) - ip6_len, so ip6_len cancels out after
+ * rebasing. A failure here means internal metadata inconsistency, not
+ * a packet validation outcome.
+ */
+ err = ipxlat_cb_rebase_offsets(cb, l3_delta);
+ if (unlikely(err)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return err;
+ }
+
+ cb->l3_hdr_len = sizeof(struct iphdr);
+ cb->fragh_off = 0;
+ cb->l4_proto = out_l4_proto;
+ DEBUG_NET_WARN_ON_ONCE(!ipxlat_cb_offsets_valid(cb));
+
+ /* build outer IPv4 base hdr from translated IPv6 fields */
+ iph4 = ip_hdr(skb);
+ frag_off = ipxlat_64_build_frag_off(skb, has_frag ? &frag_copy : NULL,
+ out_l4_proto);
+ /* when source had Fragment Header we preserve its identification;
+ * otherwise allocate a fresh IPv4 ID for the translated packet
+ */
+ id = has_frag ? cpu_to_be16(be32_to_cpu(frag_copy.identification)) : 0;
+ ipxlat_64_build_l3(iph4, &outer6, skb->len, frag_off,
+ out_l4_proto, saddr, daddr,
+ outer6.hop_limit - 1, id);
+
+ if (likely(!has_frag)) {
+ iph4->id = 0;
+ __ip_select_ident(dev_net(ipxlat->dev), iph4, 1);
+ iph4->check = 0;
+ iph4->check = ip_fast_csum(iph4, iph4->ihl);
+ }
+
+ /* non-first fragments have no transport header to translate */
+ if (unlikely(!first_frag))
+ goto out;
+
+ /* ensure transport bytes are writable before L4 csum/proto rewrites */
+ min_l4_len = ipxlat_l4_min_len(out_l4_proto);
+ if (unlikely(skb_ensure_writable(skb, skb_transport_offset(skb) +
+ min_l4_len)))
+ return -ENOMEM;
+
+ /* translate transport hdr and pseudohdr dependent checksums */
+ switch (out_l4_proto) {
+ case IPPROTO_TCP:
+ err = ipxlat_64_outer_tcp(skb, &outer6);
+ break;
+ case IPPROTO_UDP:
+ err = ipxlat_64_outer_udp(skb, &outer6);
+ break;
+ case IPPROTO_ICMP:
+ err = ipxlat_64_icmp(ipxlat, skb, &outer6);
+ break;
+ default:
+ err = 0;
+ break;
+ }
+ if (unlikely(err))
+ return err;
+
+out:
+ /* normalize checksum/offload metadata for the translated frame */
+ return ipxlat_finalize_offload(skb, out_l4_proto, ip_is_fragment(iph4),
+ SKB_GSO_TCPV6, SKB_GSO_TCPV4);
+}
diff --git a/drivers/net/ipxlat/translate_64.h b/drivers/net/ipxlat/translate_64.h
new file mode 100644
index 000000000000..269d1955944f
--- /dev/null
+++ b/drivers/net/ipxlat/translate_64.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_TRANSLATE_64_H_
+#define _NET_IPXLAT_TRANSLATE_64_H_
+
+#include "ipxlpriv.h"
+
+struct sk_buff;
+struct iphdr;
+struct ipv6hdr;
+
+/**
+ * ipxlat_64_build_l3 - build translated outer IPv4 header from IPv6 metadata
+ * @iph4: output IPv4 header
+ * @iph6: source IPv6 header
+ * @tot_len: resulting IPv4 total length
+ * @frag_off: resulting IPv4 fragment offset/flags
+ * @protocol: resulting IPv4 L4 protocol
+ * @saddr: resulting IPv4 source address
+ * @daddr: resulting IPv4 destination address
+ * @ttl: resulting IPv4 TTL
+ * @id: resulting IPv4 identification field
+ */
+void ipxlat_64_build_l3(struct iphdr *iph4, const struct ipv6hdr *iph6,
+ unsigned int tot_len, __be16 frag_off, u8 protocol,
+ __be32 saddr, __be32 daddr, u8 ttl, __be16 id);
+
+/**
+ * ipxlat_64_translate - translate outer packet from IPv6 to IPv4 in place
+ * @ipxlat: translator private context
+ * @skb: packet to translate
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_64_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
+
+/**
+ * ipxlat_64_map_nexthdr_proto - map IPv6 nexthdr to IPv4 L4 protocol
+ * @nexthdr: IPv6 next-header value
+ *
+ * Return: IPv4 protocol value corresponding to @nexthdr.
+ */
+u8 ipxlat_64_map_nexthdr_proto(u8 nexthdr);
+
+#endif /* _NET_IPXLAT_TRANSLATE_64_H_ */
diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c
index 3aa00c635916..78548d0b8c22 100644
--- a/drivers/net/ipxlat/transport.c
+++ b/drivers/net/ipxlat/transport.c
@@ -338,3 +338,14 @@ int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
udp_new->check = CSUM_MANGLED_0;
return 0;
}
+
+int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ return -EPROTONOSUPPORT;
+}
+
+int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
+ const struct ipv6hdr *outer6)
+{
+ return -EPROTONOSUPPORT;
+}
diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h
index 9b6fe422b01f..0e69b98eafd0 100644
--- a/drivers/net/ipxlat/transport.h
+++ b/drivers/net/ipxlat/transport.h
@@ -100,4 +100,9 @@ int ipxlat_64_inner_tcp(struct sk_buff *skb, const struct ipv6hdr *in6,
int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
const struct iphdr *out4, struct udphdr *udp_new);
+/* temporary ICMP stubs until ICMP translation support is introduced */
+int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
+int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
+ const struct ipv6hdr *outer6);
+
#endif /* _NET_IPXLAT_TRANSPORT_H_ */
--
2.53.0
* [RFC net-next 09/15] ipxlat: emit translator-generated ICMP errors on drop
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (7 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 08/15] ipxlat: add translation engine and dispatch core Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 10/15] ipxlat: add 4to6 pre-fragmentation path Ralf Lici
` (5 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
When validation or policy requires dropping a packet and generating an
ICMP error, route that failure through explicit ICMP emission paths so
the sender can be notified where appropriate. This commit adds
translator-originated error generation for both directions and
integrates it into dispatch action handling without changing normal
forwarding behavior.
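
The mark-then-resolve pattern used here can be sketched in userspace as follows. The icmp_sketch_* names and the struct layout are hypothetical and only mirror the idea of caching type/code/info at rejection time and deciding later whether the drop is silent or emits an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the driver keeps this state in the skb cb. */
enum icmp_sketch_action { ICMP_SKETCH_DROP, ICMP_SKETCH_EMIT };

struct icmp_sketch_cb {
	bool emit_icmp_err;
	uint8_t type;
	uint8_t code;
	uint32_t info;
};

/* Called where the packet is rejected: record what to send, send nothing. */
static void icmp_sketch_mark(struct icmp_sketch_cb *cb, uint8_t type,
			     uint8_t code, uint32_t info)
{
	assert(cb != NULL);
	cb->emit_icmp_err = true;
	cb->type = type;
	cb->code = code;
	cb->info = info;
}

/* Called by dispatch after a failure: a marked packet gets an ICMP error,
 * an unmarked one is dropped silently.
 */
static enum icmp_sketch_action
icmp_sketch_resolve(const struct icmp_sketch_cb *cb)
{
	return cb->emit_icmp_err ? ICMP_SKETCH_EMIT : ICMP_SKETCH_DROP;
}
```

For example, a TTL-expired IPv4 packet would be marked with type 11 (Time Exceeded) and code 0, and the dispatch layer would then pick the emit action instead of a plain drop.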
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/dispatch.c | 66 ++++++++++++++++++++++++++++++++++-
drivers/net/ipxlat/dispatch.h | 7 ++++
drivers/net/ipxlat/packet.c | 25 ++++++++++---
3 files changed, 92 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ipxlat/dispatch.c b/drivers/net/ipxlat/dispatch.c
index 133d30859f49..b8b9b930b04c 100644
--- a/drivers/net/ipxlat/dispatch.c
+++ b/drivers/net/ipxlat/dispatch.c
@@ -11,7 +11,12 @@
* Ralf Lici <ralf@mandelbit.com>
*/
+#include <linux/icmp.h>
+#include <linux/icmpv6.h>
+#include <net/icmp.h>
#include <net/ip.h>
+#include <net/route.h>
+#include <net/ipv6.h>
#include "dispatch.h"
#include "packet.h"
@@ -21,7 +26,8 @@
static enum ipxlat_action
ipxlat_resolve_failed_action(const struct sk_buff *skb)
{
- return IPXLAT_ACT_DROP;
+ return ipxlat_skb_cb(skb)->emit_icmp_err ? IPXLAT_ACT_ICMP_ERR :
+ IPXLAT_ACT_DROP;
}
enum ipxlat_action ipxlat_translate(struct ipxlat_priv *ipxlat,
@@ -61,6 +67,59 @@ void ipxlat_mark_icmp_drop(struct sk_buff *skb, u8 type, u8 code, u32 info)
cb->icmp_err.info = info;
}
+static void ipxlat_46_emit_icmp_err(struct ipxlat_priv *ipxlat,
+ struct sk_buff *inner)
+{
+ struct ipxlat_cb *cb = ipxlat_skb_cb(inner);
+ const struct iphdr *iph = ip_hdr(inner);
+ struct inet_skb_parm param = {};
+
+ /* build route metadata on demand when the packet has no dst */
+ if (unlikely(!skb_dst(inner))) {
+ const int reason = ip_route_input_noref(inner, iph->daddr,
+ iph->saddr,
+ ip4h_dscp(iph),
+ inner->dev);
+
+ if (unlikely(reason)) {
+ netdev_dbg(ipxlat->dev,
+ "icmp4 emit: route build failed reason=%d\n",
+ reason);
+ return;
+ }
+ }
+
+ /* emit the ICMPv4 error */
+ __icmp_send(inner, cb->icmp_err.type, cb->icmp_err.code,
+ htonl(cb->icmp_err.info), &param);
+}
+
+static void ipxlat_64_emit_icmp_err(struct sk_buff *inner)
+{
+ struct ipxlat_cb *cb = ipxlat_skb_cb(inner);
+ struct inet6_skb_parm param = {};
+
+ /* emit the ICMPv6 error */
+ icmp6_send(inner, cb->icmp_err.type, cb->icmp_err.code,
+ cb->icmp_err.info, NULL, &param);
+}
+
+/* emit translator-generated ICMP errors for packets rejected by RFC rules */
+void ipxlat_emit_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff *inner)
+{
+ switch (ntohs(inner->protocol)) {
+ case ETH_P_IPV6:
+ ipxlat_64_emit_icmp_err(inner);
+ return;
+ case ETH_P_IP:
+ ipxlat_46_emit_icmp_err(ipxlat, inner);
+ return;
+ default:
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return;
+ }
+}
+
static void ipxlat_forward_pkt(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
{
const unsigned int len = skb->len;
@@ -90,6 +149,11 @@ int ipxlat_process_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
dev_dstats_tx_add(ipxlat->dev, skb->len);
ipxlat_forward_pkt(ipxlat, skb);
return 0;
+ case IPXLAT_ACT_ICMP_ERR:
+ dev_dstats_tx_dropped(ipxlat->dev);
+ ipxlat_emit_icmp_error(ipxlat, skb);
+ consume_skb(skb);
+ return 0;
case IPXLAT_ACT_DROP:
goto drop_free;
default:
diff --git a/drivers/net/ipxlat/dispatch.h b/drivers/net/ipxlat/dispatch.h
index fa6fafea656b..73acd831b6cf 100644
--- a/drivers/net/ipxlat/dispatch.h
+++ b/drivers/net/ipxlat/dispatch.h
@@ -44,6 +44,13 @@ enum ipxlat_action {
*/
void ipxlat_mark_icmp_drop(struct sk_buff *skb, u8 type, u8 code, u32 info);
+/**
+ * ipxlat_emit_icmp_error - emit cached translator-generated ICMP error
+ * @ipxlat: translator private context
+ * @inner: offending packet used as quoted payload
+ */
+void ipxlat_emit_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff *inner);
+
/**
* ipxlat_translate - validate/translate one packet and return next action
* @ipxlat: translator private context
diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c
index b37a3e55aff8..758b72bdc6f1 100644
--- a/drivers/net/ipxlat/packet.c
+++ b/drivers/net/ipxlat/packet.c
@@ -142,6 +142,8 @@ static int ipxlat_v4_srr_check(struct sk_buff *skb, const struct iphdr *hdr)
if (unlikely(ptr > len - 3))
return -EINVAL;
+ ipxlat_mark_icmp_drop(skb, ICMP_DEST_UNREACH,
+ ICMP_SR_FAILED, 0);
return -EINVAL;
}
@@ -272,8 +274,10 @@ static int ipxlat_v4_pull_hdrs(struct sk_buff *skb)
/* RFC 7915 Section 4.1 */
if (unlikely(ipxlat_v4_srr_check(skb, l3_hdr)))
return -EINVAL;
- if (unlikely(l3_hdr->ttl <= 1))
+ if (unlikely(l3_hdr->ttl <= 1)) {
+ ipxlat_mark_icmp_drop(skb, ICMP_TIME_EXCEEDED, ICMP_EXC_TTL, 0);
return -EINVAL;
+ }
/* RFC 7915 Section 1.2:
* Fragmented ICMP/ICMPv6 packets will not be translated by IP/ICMP
@@ -390,8 +394,11 @@ int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
* Fragmented checksum-less IPv4 UDP is rejected because 4->6 cannot
* reliably translate it.
*/
- if (unlikely(ip_is_fragment(l3_hdr)))
+ if (unlikely(ip_is_fragment(l3_hdr))) {
+ ipxlat_mark_icmp_drop(skb, ICMP_DEST_UNREACH, ICMP_PKT_FILTERED,
+ 0);
return -EINVAL;
+ }
/* udph->len bounds the span used to compute replacement checksum */
if (unlikely(ntohs(udph->len) > skb->len - cb->l4_off))
@@ -520,7 +527,7 @@ static int ipxlat_v6_walk_hdrs(struct sk_buff *skb, unsigned int l3_offset,
*/
static int ipxlat_v6_check_rh(struct sk_buff *skb)
{
- unsigned int rh_off;
+ unsigned int rh_off, pointer;
int flags, nexthdr;
rh_off = 0;
@@ -531,6 +538,8 @@ static int ipxlat_v6_check_rh(struct sk_buff *skb)
if (likely(nexthdr != NEXTHDR_ROUTING))
return 0;
+ pointer = rh_off + offsetof(struct ipv6_rt_hdr, segments_left);
+ ipxlat_mark_icmp_drop(skb, ICMPV6_PARAMPROB, ICMPV6_HDR_FIELD, pointer);
return -EINVAL;
}
@@ -550,8 +559,11 @@ static int ipxlat_v6_pull_outer_l3(struct sk_buff *skb)
!ipxlat_v6_validate_saddr(&l3_hdr->saddr)))
return -EINVAL;
- if (unlikely(l3_hdr->hop_limit <= 1))
+ if (unlikely(l3_hdr->hop_limit <= 1)) {
+ ipxlat_mark_icmp_drop(skb, ICMPV6_TIME_EXCEED,
+ ICMPV6_EXC_HOPLIMIT, 0);
return -EINVAL;
+ }
return 0;
}
@@ -617,8 +629,11 @@ static int ipxlat_v6_pull_hdrs(struct sk_buff *skb)
/* -EPROTONOSUPPORT means packet layout is syntactically valid but
* unsupported by our RFC 7915 path
*/
- if (unlikely(err == -EPROTONOSUPPORT))
+ if (unlikely(err == -EPROTONOSUPPORT)) {
+ ipxlat_mark_icmp_drop(skb, ICMPV6_DEST_UNREACH,
+ ICMPV6_ADM_PROHIBITED, 0);
return -EINVAL;
+ }
if (unlikely(err))
return err;
--
2.53.0
* [RFC net-next 10/15] ipxlat: add 4to6 pre-fragmentation path
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (8 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 09/15] ipxlat: emit translator-generated ICMP errors on drop Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 11/15] ipxlat: add ICMP informational translation paths Ralf Lici
` (4 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
RFC 7915 requires the translator to handle IPv4 packets whose translated
IPv6 form would exceed the IPv6 size limit (the path MTU or the
configured lowest IPv6 MTU, whichever is smaller). Add a
pre-fragmentation planning/action path that invokes the kernel
fragmentation helpers before translation, carries the fragment size
through skb metadata, and then reinjects the fragments into the normal
translation path.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/dispatch.c | 99 ++++++++++++++++++++++++++++++-
drivers/net/ipxlat/translate_46.c | 59 +++++++++++++++++-
drivers/net/ipxlat/translate_46.h | 11 ++++
3 files changed, 166 insertions(+), 3 deletions(-)
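The sizing decision described in the commit message can be illustrated
outside the kernel. The following is only a userspace sketch of the
arithmetic in ipxlat_46_plan_prefrag(); the function name
plan_prefrag_cap() and its flat parameter list are hypothetical and stand
in for the skb, cb and PMTU lookup used by the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define IPV6_HDR_LEN      40u     /* sizeof(struct ipv6hdr) */
#define IPV6_FRAG_HDR_LEN 8u      /* sizeof(struct frag_hdr) */
#define IP_MAX_MTU        0xFFFFu

/* Hypothetical userspace mirror of the sizing in ipxlat_46_plan_prefrag().
 * Returns the per-fragment cap, or 0 when no pre-fragmentation is planned.
 */
static unsigned int plan_prefrag_cap(unsigned int pkt_len4,
				     unsigned int ihl_bytes,
				     bool is_fragment, bool df,
				     unsigned int pmtu6,
				     unsigned int lowest_ipv6_mtu)
{
	unsigned int new_l3_len, pkt_len6, threshold6, cap;
	int l3_delta, frag_l3_delta;

	/* assumption: caller passes a sane IPv4 header length */
	assert(ihl_bytes >= 20 && ihl_bytes <= 60 && pkt_len4 >= ihl_bytes);

	/* translated length: swap the IPv4 header for IPv6 (+ frag header) */
	new_l3_len = IPV6_HDR_LEN + (is_fragment ? IPV6_FRAG_HDR_LEN : 0);
	l3_delta = (int)new_l3_len - (int)ihl_bytes;
	pkt_len6 = (unsigned int)((int)pkt_len4 + l3_delta);

	threshold6 = pmtu6 < lowest_ipv6_mtu ? pmtu6 : lowest_ipv6_mtu;
	if (pkt_len6 <= threshold6)
		return 0;	/* fits after translation */
	if (df)
		return 0;	/* DF set: never fragment locally */

	/* every resulting fragment will carry an IPv6 fragment header */
	frag_l3_delta = (int)(IPV6_HDR_LEN + IPV6_FRAG_HDR_LEN) -
			(int)ihl_bytes;
	cap = (unsigned int)((int)threshold6 - frag_l3_delta);
	return cap < IP_MAX_MTU ? cap : IP_MAX_MTU;
}
```

With a 1500-byte packet, a 20-byte IPv4 header, DF clear and a threshold
of 1280, the cap is 1280 - (48 - 20) = 1252 bytes of IPv4 packet per
fragment, so each translated fragment is at most 1280 bytes.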
diff --git a/drivers/net/ipxlat/dispatch.c b/drivers/net/ipxlat/dispatch.c
index b8b9b930b04c..b58191d4b2c9 100644
--- a/drivers/net/ipxlat/dispatch.c
+++ b/drivers/net/ipxlat/dispatch.c
@@ -47,6 +47,16 @@ enum ipxlat_action ipxlat_translate(struct ipxlat_priv *ipxlat,
if (unlikely(ipxlat_v4_validate_skb(ipxlat, skb)))
return ipxlat_resolve_failed_action(skb);
+ /* 4->6 prefrag plan stores per-skb frag_max_size
+ * when the packet must be split before translation
+ * (DF clear and translated size
+ * above PMTU/threshold).
+ */
+ if (unlikely(ipxlat_46_plan_prefrag(ipxlat, skb)))
+ return ipxlat_resolve_failed_action(skb);
+ if (unlikely(ipxlat_skb_cb(skb)->frag_max_size))
+ return IPXLAT_ACT_PRE_FRAG;
+
if (unlikely(ipxlat_46_translate(ipxlat, skb)))
return ipxlat_resolve_failed_action(skb);
@@ -120,6 +130,76 @@ void ipxlat_emit_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff *inner)
}
}
+static unsigned int ipxlat_frag_dst_get_mtu(const struct dst_entry *dst)
+{
+ return READ_ONCE(dst->dev->mtu);
+}
+
+static struct dst_ops ipxlat_frag_dst_ops = {
+ .family = AF_UNSPEC,
+ .mtu = ipxlat_frag_dst_get_mtu,
+};
+
+/**
+ * ipxlat_46_frag_output - reinject one fragment produced by ip_do_fragment
+ * @net: network namespace of the transmitter
+ * @sk: originating socket
+ * @skb: fragment to reinject
+ *
+ * This callback mirrors ndo_start_xmit processing but runs with
+ * pre-fragmentation disabled to prevent recursive pre-fragment loops.
+ *
+ * Return: 0 on success, negative errno on processing failure.
+ */
+static int ipxlat_46_frag_output(struct net *net, struct sock *sk,
+ struct sk_buff *skb)
+{
+ struct ipxlat_priv *ipxlat = netdev_priv(skb->dev);
+
+ return ipxlat_process_skb(ipxlat, skb, false);
+}
+
+/**
+ * ipxlat_46_fragment_pkt - fragment oversized 4->6 input before translation
+ * @ipxlat: translator private context
+ * @skb: original packet to fragment
+ * @frag_max_size: per-fragment payload cap for ip_do_fragment
+ *
+ * Installs a temporary synthetic dst so ip_do_fragment can read MTU and then
+ * reinjects each produced fragment back into ipxlat through
+ * ipxlat_46_frag_output.
+ *
+ * Return: 0 on success, negative errno on fragmentation failure.
+ */
+static int ipxlat_46_fragment_pkt(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb, u16 frag_max_size)
+{
+ const unsigned long orig_dst = skb->_skb_refdst;
+ struct rtable ipxlat_rt = {};
+ int err;
+
+ /* ip_do_fragment needs a dst object to query mtu */
+ dst_init(&ipxlat_rt.dst, &ipxlat_frag_dst_ops, NULL, DST_OBSOLETE_NONE,
+ DST_NOCOUNT);
+
+ /* use translator netdev as mtu source for the temporary dst */
+ ipxlat_rt.dst.dev = ipxlat->dev;
+
+ /* setup the skb for fragmentation */
+ skb_dst_set_noref(skb, &ipxlat_rt.dst);
+ memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+ IPCB(skb)->frag_max_size = frag_max_size;
+
+ /* fragment and reinject each frag in the translator */
+ err = ip_do_fragment(dev_net(ipxlat->dev), skb->sk, skb,
+ ipxlat_46_frag_output);
+
+ /* drop original dst ref replaced by the synthetic NOREF dst */
+ refdst_drop(orig_dst);
+
+ return err;
+}
+
static void ipxlat_forward_pkt(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
{
const unsigned int len = skb->len;
@@ -141,14 +221,29 @@ int ipxlat_process_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
enum ipxlat_action action;
int err = -EINVAL;
- (void)allow_pre_frag;
-
action = ipxlat_translate(ipxlat, skb);
switch (action) {
case IPXLAT_ACT_FWD:
dev_dstats_tx_add(ipxlat->dev, skb->len);
ipxlat_forward_pkt(ipxlat, skb);
return 0;
+ case IPXLAT_ACT_PRE_FRAG:
+ /* prefrag is allowed only once to avoid unbounded loops */
+ if (unlikely(!allow_pre_frag)) {
+ err = -ELOOP;
+ goto drop_free;
+ }
+
+ /* fragment first, then reinject each fragment through
+ * ipxlat_process_skb via ipxlat_46_frag_output
+ */
+ err = ipxlat_46_fragment_pkt(ipxlat, skb,
+ ipxlat_skb_cb(skb)->frag_max_size);
+ /* fragment path already consumed/freed skb */
+ skb = NULL;
+ if (unlikely(err))
+ goto drop_free;
+ return 0;
case IPXLAT_ACT_ICMP_ERR:
dev_dstats_tx_dropped(ipxlat->dev);
ipxlat_emit_icmp_error(ipxlat, skb);
diff --git a/drivers/net/ipxlat/translate_46.c b/drivers/net/ipxlat/translate_46.c
index aec8500db2c2..0b79ca07c771 100644
--- a/drivers/net/ipxlat/translate_46.c
+++ b/drivers/net/ipxlat/translate_46.c
@@ -87,6 +87,63 @@ unsigned int ipxlat_46_lookup_pmtu6(struct ipxlat_priv *ipxlat,
return mtu6;
}
+/**
+ * ipxlat_46_plan_prefrag - plan pre-translation IPv4 fragmentation for 4->6
+ * @ipxlat: translator private context
+ * @skb: packet being translated
+ *
+ * Decides whether packet exceeds PMTU/LIM thresholds and, when needed, stores
+ * per-skb fragmentation cap in cb->frag_max_size for later ip_do_fragment.
+ *
+ * Return: 0 on success, negative errno on policy/validation failure.
+ */
+int ipxlat_46_plan_prefrag(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ unsigned int pkt_len6, pmtu6, threshold6, frag_max_size, pkt_len4,
+ old_l3_len, new_l3_len;
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct iphdr *in4 = ip_hdr(skb);
+ int l3_delta, frag_l3_delta;
+
+ if (unlikely(cb->frag_max_size)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ cb->frag_max_size = 0;
+ }
+
+ pkt_len4 = iph_totlen(skb, in4);
+ old_l3_len = cb->l3_hdr_len;
+ new_l3_len = sizeof(struct ipv6hdr) +
+ (ip_is_fragment(in4) ? sizeof(struct frag_hdr) : 0);
+ l3_delta = (int)new_l3_len - (int)old_l3_len;
+ pkt_len6 = pkt_len4 + l3_delta;
+
+ pmtu6 = ipxlat_46_lookup_pmtu6(ipxlat, skb, in4);
+ threshold6 = min(pmtu6, READ_ONCE(ipxlat->lowest_ipv6_mtu));
+
+ if (likely(pkt_len6 <= threshold6))
+ return 0;
+
+ /* df packets are never locally pre-fragmented */
+ if (likely(be16_to_cpu(in4->frag_off) & IP_DF)) {
+ /* Let the IPv6 forwarding path raise PTB when needed and rely
+ * on the reverse 6->4 ICMP translation path for feedback.
+ */
+ return 0;
+ }
+
+ /* df not set: we can fragment */
+
+ frag_l3_delta =
+ (int)(sizeof(struct ipv6hdr) + sizeof(struct frag_hdr)) -
+ (int)old_l3_len;
+ frag_max_size = threshold6 - frag_l3_delta;
+ /* store per-skb prefrag cap: ipxlat_46_fragment_pkt will copy it into
+ * IPCB(skb)->frag_max_size before calling ip_do_fragment
+ */
+ cb->frag_max_size = min_t(unsigned int, frag_max_size, IP_MAX_MTU);
+ return 0;
+}
+
/**
* ipxlat_46_translate - translate one validated packet from IPv4 to IPv6
* @ipxlat: translator private context
@@ -182,7 +239,7 @@ int ipxlat_46_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
err = ipxlat_46_outer_udp(skb, &outer4);
break;
case IPPROTO_ICMP:
- err = ipxlat_46_icmp(ipxlat, skb);
+ err = -EPROTONOSUPPORT;
break;
default:
err = 0;
diff --git a/drivers/net/ipxlat/translate_46.h b/drivers/net/ipxlat/translate_46.h
index 75def10d0cad..6ba409c94185 100644
--- a/drivers/net/ipxlat/translate_46.h
+++ b/drivers/net/ipxlat/translate_46.h
@@ -61,6 +61,17 @@ unsigned int ipxlat_46_lookup_pmtu6(struct ipxlat_priv *ipxlat,
const struct sk_buff *skb,
const struct iphdr *in4);
+/**
+ * ipxlat_46_plan_prefrag - decide whether IPv4 packet must be pre-fragmented
+ * @ipxlat: translator private context
+ * @skb: packet being translated
+ *
+ * Sets cb->frag_max_size when pre-fragmentation is required.
+ *
+ * Return: 0 on success, negative errno on policy/validation failure.
+ */
+int ipxlat_46_plan_prefrag(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
+
/**
* ipxlat_46_translate - translate outer packet from IPv4 to IPv6 in place
* @ipxlat: translator private context
--
2.53.0
* [RFC net-next 11/15] ipxlat: add ICMP informational translation paths
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (9 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 10/15] ipxlat: add 4to6 pre-fragmentation path Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 12/15] ipxlat: add ICMP error translation and quoted-inner handling Ralf Lici
` (3 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Add ICMP informational message translation for both 4->6 and 6->4 paths
and wire the new ICMP translation units into the engine.
This introduces the protocol mapping and checksum update logic for echo
request/reply traffic, while ICMP error quoted-inner translation is
added in a follow-up commit.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/Makefile | 2 +
drivers/net/ipxlat/icmp.h | 43 ++++++++++++++
drivers/net/ipxlat/icmp_46.c | 95 +++++++++++++++++++++++++++++++
drivers/net/ipxlat/icmp_64.c | 92 ++++++++++++++++++++++++++++++
drivers/net/ipxlat/translate_64.c | 1 +
drivers/net/ipxlat/transport.c | 11 ----
drivers/net/ipxlat/transport.h | 5 --
7 files changed, 233 insertions(+), 16 deletions(-)
create mode 100644 drivers/net/ipxlat/icmp.h
create mode 100644 drivers/net/ipxlat/icmp_46.c
create mode 100644 drivers/net/ipxlat/icmp_64.c
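The checksum update mentioned in the commit message is incremental: the
old ICMPv4 header is subtracted from the existing sum, the new ICMPv6
header is added, and the IPv6 pseudo-header is folded in. The following
userspace sketch shows that idea with plain one's-complement arithmetic
instead of the kernel csum_*() helpers; all function names in it are
hypothetical, and it assumes the header buffers are passed with their
checksum fields zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement accumulation over big-endian 16-bit words. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return sum;
}

static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* IPv6 pseudo-header: src, dst, upper-layer length, next header 58. */
static uint32_t pseudo6_sum(const uint8_t src[16], const uint8_t dst[16],
			    uint32_t l4_len)
{
	uint32_t sum = sum16(src, 16, 0);

	sum = sum16(dst, 16, sum);
	sum += (l4_len >> 16) + (l4_len & 0xffff);
	return sum + 58;
}

/* ICMPv4 checksum over a message whose checksum field is zero. */
static uint16_t icmp4_csum(const uint8_t *msg, size_t len)
{
	return (uint16_t)~fold(sum16(msg, len, 0));
}

/* Reference: full recomputation over the translated ICMPv6 message. */
static uint16_t icmp6_csum_full(const uint8_t src[16], const uint8_t dst[16],
				const uint8_t *msg, size_t len)
{
	uint32_t sum = pseudo6_sum(src, dst, (uint32_t)len);

	return (uint16_t)~fold(sum16(msg, len, sum));
}

/* Incremental update: never touches the payload after the 8-byte header. */
static uint16_t icmp6_csum_incremental(uint16_t csum4,
				       const uint8_t old_hdr[8],
				       const uint8_t new_hdr[8],
				       const uint8_t src[16],
				       const uint8_t dst[16],
				       uint32_t l4_len)
{
	uint32_t sum = (uint16_t)~csum4;		/* undo final complement */

	sum += (uint16_t)~fold(sum16(old_hdr, 8, 0));	/* subtract old header */
	sum = sum16(new_hdr, 8, sum);			/* add new header */
	sum += pseudo6_sum(src, dst, l4_len);		/* add pseudo-header */
	return (uint16_t)~fold(sum);
}
```

For an echo request only the type byte changes (8 to 128), so both
routines must agree. The incremental form is the one that scales,
because its cost does not depend on the payload length.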
diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile
index d7b7097aee5f..2ded504902e3 100644
--- a/drivers/net/ipxlat/Makefile
+++ b/drivers/net/ipxlat/Makefile
@@ -11,3 +11,5 @@ ipxlat-objs += transport.o
ipxlat-objs += dispatch.o
ipxlat-objs += translate_46.o
ipxlat-objs += translate_64.o
+ipxlat-objs += icmp_46.o
+ipxlat-objs += icmp_64.o
diff --git a/drivers/net/ipxlat/icmp.h b/drivers/net/ipxlat/icmp.h
new file mode 100644
index 000000000000..52d681787d6a
--- /dev/null
+++ b/drivers/net/ipxlat/icmp.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_ICMP_H_
+#define _NET_IPXLAT_ICMP_H_
+
+#include <linux/ipv6.h>
+
+#include "ipxlpriv.h"
+
+/**
+ * ipxlat_46_icmp - translate ICMP informational payload
+ * after outer 4->6 rewrite
+ * @ipxl: translator private context
+ * @skb: packet carrying ICMPv4 transport payload
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_46_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb);
+
+/**
+ * ipxlat_64_icmp - translate ICMP informational payload
+ * after outer 6->4 rewrite
+ * @ipxlat: translator private context
+ * @skb: packet carrying ICMPv6 transport payload
+ * @in6: snapshot of original outer IPv6 header
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
+ const struct ipv6hdr *in6);
+
+#endif /* _NET_IPXLAT_ICMP_H_ */
diff --git a/drivers/net/ipxlat/icmp_46.c b/drivers/net/ipxlat/icmp_46.c
new file mode 100644
index 000000000000..ad907f60416c
--- /dev/null
+++ b/drivers/net/ipxlat/icmp_46.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <linux/icmp.h>
+#include <linux/icmpv6.h>
+
+#include "icmp.h"
+#include "packet.h"
+#include "transport.h"
+
+static int ipxlat_46_map_icmp_info_type_code(const struct icmphdr *in,
+ struct icmp6hdr *out)
+{
+ switch (in->type) {
+ case ICMP_ECHO:
+ out->icmp6_type = ICMPV6_ECHO_REQUEST;
+ out->icmp6_code = 0;
+ out->icmp6_identifier = in->un.echo.id;
+ out->icmp6_sequence = in->un.echo.sequence;
+ return 0;
+ case ICMP_ECHOREPLY:
+ out->icmp6_type = ICMPV6_ECHO_REPLY;
+ out->icmp6_code = 0;
+ out->icmp6_identifier = in->un.echo.id;
+ out->icmp6_sequence = in->un.echo.sequence;
+ return 0;
+ }
+
+ return -EPROTONOSUPPORT;
+}
+
+static void ipxlat_46_icmp_info_update_csum(const struct icmphdr *icmp4,
+ struct icmp6hdr *icmp6,
+ const struct ipv6hdr *ip6,
+ const struct sk_buff *skb,
+ unsigned int l4_off)
+{
+ struct icmp6hdr icmp6_zero;
+ struct icmphdr icmp4_zero;
+ __wsum csum;
+
+ icmp4_zero = *icmp4;
+ icmp4_zero.checksum = 0;
+ icmp6_zero = *icmp6;
+ icmp6_zero.icmp6_cksum = 0;
+ csum = ~csum_unfold(icmp4->checksum);
+ csum = csum_sub(csum, csum_partial(&icmp4_zero, sizeof(icmp4_zero), 0));
+ csum = csum_add(csum, csum_partial(&icmp6_zero, sizeof(icmp6_zero), 0));
+ icmp6->icmp6_cksum = csum_ipv6_magic(&ip6->saddr, &ip6->daddr,
+ skb->len - l4_off,
+ IPPROTO_ICMPV6, csum);
+}
+
+static int ipxlat_46_icmp_info_outer(struct sk_buff *skb)
+{
+ const unsigned int l4_off = skb_transport_offset(skb);
+ const struct icmphdr icmp4 = *icmp_hdr(skb);
+ const struct ipv6hdr *ip6 = ipv6_hdr(skb);
+ struct icmp6hdr *icmp6 = icmp6_hdr(skb);
+ int err;
+
+ err = ipxlat_46_map_icmp_info_type_code(&icmp4, icmp6);
+ if (unlikely(err))
+ return -EINVAL;
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ icmp6->icmp6_cksum = ~csum_ipv6_magic(&ip6->saddr, &ip6->daddr,
+ skb->len - l4_off,
+ IPPROTO_ICMPV6, 0);
+ return ipxlat_set_partial_csum(skb, offsetof(struct icmp6hdr,
+ icmp6_cksum));
+ }
+
+ ipxlat_46_icmp_info_update_csum(&icmp4, icmp6, ip6, skb, l4_off);
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_46_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb)
+{
+ if (unlikely(ipxlat_skb_cb(skb)->is_icmp_err))
+ return -EPROTONOSUPPORT;
+
+ return ipxlat_46_icmp_info_outer(skb);
+}
diff --git a/drivers/net/ipxlat/icmp_64.c b/drivers/net/ipxlat/icmp_64.c
new file mode 100644
index 000000000000..6b11aa638068
--- /dev/null
+++ b/drivers/net/ipxlat/icmp_64.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <linux/icmpv6.h>
+
+#include "icmp.h"
+#include "packet.h"
+#include "transport.h"
+
+static int ipxlat_64_map_icmp_info_type_code(const struct icmp6hdr *in,
+ struct icmphdr *out)
+{
+ switch (in->icmp6_type) {
+ case ICMPV6_ECHO_REQUEST:
+ out->type = ICMP_ECHO;
+ out->code = 0;
+ out->un.echo.id = in->icmp6_identifier;
+ out->un.echo.sequence = in->icmp6_sequence;
+ return 0;
+ case ICMPV6_ECHO_REPLY:
+ out->type = ICMP_ECHOREPLY;
+ out->code = 0;
+ out->un.echo.id = in->icmp6_identifier;
+ out->un.echo.sequence = in->icmp6_sequence;
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
+static __sum16 ipxlat_64_compute_icmp_info_csum(const struct ipv6hdr *in6,
+ const struct icmp6hdr *in_icmp6,
+ const struct icmphdr *out_icmp4,
+ unsigned int l4_len)
+{
+ struct icmp6hdr icmp6_zero;
+ struct icmphdr icmp4_zero;
+ __wsum csum, tmp;
+
+ icmp6_zero = *in_icmp6;
+ icmp6_zero.icmp6_cksum = 0;
+ icmp4_zero = *out_icmp4;
+ icmp4_zero.checksum = 0;
+
+ csum = ~csum_unfold(in_icmp6->icmp6_cksum);
+ tmp = ~csum_unfold(csum_ipv6_magic(&in6->saddr, &in6->daddr, l4_len,
+ NEXTHDR_ICMP, 0));
+ csum = csum_sub(csum, tmp);
+ csum = csum_sub(csum, csum_partial(&icmp6_zero, sizeof(icmp6_zero), 0));
+ csum = csum_add(csum, csum_partial(&icmp4_zero, sizeof(icmp4_zero), 0));
+ return csum_fold(csum);
+}
+
+static int ipxlat_64_icmp_info(struct sk_buff *skb, const struct ipv6hdr *in6)
+{
+ struct icmp6hdr ic6_copy, *ic6;
+ struct icmphdr *ic4;
+ int err;
+
+ ic6 = icmp6_hdr(skb);
+ ic6_copy = *ic6;
+
+ ic4 = (struct icmphdr *)(skb->data + skb_transport_offset(skb));
+ err = ipxlat_64_map_icmp_info_type_code(&ic6_copy, ic4);
+ if (unlikely(err))
+ return err;
+
+ ic4->checksum =
+ ipxlat_64_compute_icmp_info_csum(in6, &ic6_copy, ic4,
+ ipxlat_skb_datagram_len(skb));
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_64_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb,
+ const struct ipv6hdr *in6)
+{
+ if (unlikely(ipxlat_skb_cb(skb)->is_icmp_err))
+ return -EPROTONOSUPPORT;
+
+ return ipxlat_64_icmp_info(skb, in6);
+}
diff --git a/drivers/net/ipxlat/translate_64.c b/drivers/net/ipxlat/translate_64.c
index 50a95fb75f9d..412d29214a43 100644
--- a/drivers/net/ipxlat/translate_64.c
+++ b/drivers/net/ipxlat/translate_64.c
@@ -16,6 +16,7 @@
#include "translate_64.h"
#include "address.h"
+#include "icmp.h"
#include "packet.h"
#include "transport.h"
diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c
index 78548d0b8c22..3aa00c635916 100644
--- a/drivers/net/ipxlat/transport.c
+++ b/drivers/net/ipxlat/transport.c
@@ -338,14 +338,3 @@ int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
udp_new->check = CSUM_MANGLED_0;
return 0;
}
-
-int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
-{
- return -EPROTONOSUPPORT;
-}
-
-int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
- const struct ipv6hdr *outer6)
-{
- return -EPROTONOSUPPORT;
-}
diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h
index 0e69b98eafd0..9b6fe422b01f 100644
--- a/drivers/net/ipxlat/transport.h
+++ b/drivers/net/ipxlat/transport.h
@@ -100,9 +100,4 @@ int ipxlat_64_inner_tcp(struct sk_buff *skb, const struct ipv6hdr *in6,
int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
const struct iphdr *out4, struct udphdr *udp_new);
-/* temporary ICMP stubs until ICMP translation support is introduced */
-int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
-int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
- const struct ipv6hdr *outer6);
-
#endif /* _NET_IPXLAT_TRANSPORT_H_ */
--
2.53.0
* [RFC net-next 12/15] ipxlat: add ICMP error translation and quoted-inner handling
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (10 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 11/15] ipxlat: add ICMP informational translation paths Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 13/15] ipxlat: add netlink control plane and uapi Ralf Lici
` (2 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Extend ICMP translation with error-path support for both directions,
including quoted-inner packet rewriting and RFC 4884 extension
relayout/squeeze logic.
This adds the ICMP type/code/error-field mappings, inner L3/L4 rewrite
paths, and final checksum handling required for translator ICMP error
processing.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/icmp.h | 14 +-
drivers/net/ipxlat/icmp_46.c | 467 ++++++++++++++++++++++++++++++++-
drivers/net/ipxlat/icmp_64.c | 453 +++++++++++++++++++++++++++++++-
drivers/net/ipxlat/transport.c | 61 +++++
drivers/net/ipxlat/transport.h | 19 ++
5 files changed, 996 insertions(+), 18 deletions(-)
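One of the error-field mappings added here is the MTU carried in a
translated Packet Too Big message. A userspace sketch of that rule,
max(1280, min(pkt_mtu + 20, mtu6_nexthop, mtu4_nexthop + 20)) with the
RFC 1191 plateau fallback, is given below. The function name ptb_mtu6()
is hypothetical, and the plateau table is the shortened one used in this
patch rather than the full RFC 1191 list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_MIN_MTU 1280u

/* Plateau values as listed in this patch (subset of RFC 1191, Table 1). */
static const uint16_t mtu_plateaus[] = {
	65535, 32000, 17914, 8166, 4352, 2002, 1492,
};

/* Hypothetical mirror of ipxlat_46_compute_icmp_mtu6(), host byte order. */
static uint32_t ptb_mtu6(uint32_t pkt_mtu, uint32_t nexthop6_mtu,
			 uint32_t nexthop4_mtu, uint16_t tot_len_field)
{
	uint32_t result, local;
	size_t i;

	/* MTU field of zero: guess from the quoted packet's total length */
	if (pkt_mtu == 0) {
		for (i = 0; i < sizeof(mtu_plateaus) / sizeof(mtu_plateaus[0]); i++) {
			if (mtu_plateaus[i] < tot_len_field) {
				pkt_mtu = mtu_plateaus[i];
				break;
			}
		}
	}

	/* +20 accounts for the IPv4 -> IPv6 header size difference */
	local = nexthop6_mtu < nexthop4_mtu + 20 ? nexthop6_mtu
						 : nexthop4_mtu + 20;
	result = pkt_mtu + 20 < local ? pkt_mtu + 20 : local;

	/* never advertise less than the IPv6 minimum MTU */
	return result < IPV6_MIN_MTU ? IPV6_MIN_MTU : result;
}
```

A reported MTU of 1400 with 1500-byte next hops on both sides yields
1420. A reported MTU of 576 is clamped up to 1280, since an IPv6 sender
is not expected to go below the IPv6 minimum MTU.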
diff --git a/drivers/net/ipxlat/icmp.h b/drivers/net/ipxlat/icmp.h
index 52d681787d6a..71bd7e20af91 100644
--- a/drivers/net/ipxlat/icmp.h
+++ b/drivers/net/ipxlat/icmp.h
@@ -19,22 +19,24 @@
#include "ipxlpriv.h"
/**
- * ipxlat_46_icmp - translate ICMP informational payload
- * after outer 4->6 rewrite
- * @ipxl: translator private context
+ * ipxlat_46_icmp - translate ICMP payload after outer 4->6 L3 rewrite
+ * @ipxlat: translator private context
* @skb: packet carrying ICMPv4 transport payload
*
+ * Handles both ICMP info translation and ICMP error quoted-inner rewriting.
+ *
* Return: 0 on success, negative errno on translation failure.
*/
-int ipxlat_46_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb);
+int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
/**
- * ipxlat_64_icmp - translate ICMP informational payload
- * after outer 6->4 rewrite
+ * ipxlat_64_icmp - translate ICMP payload after outer 6->4 L3 rewrite
* @ipxlat: translator private context
* @skb: packet carrying ICMPv6 transport payload
* @in6: snapshot of original outer IPv6 header
*
+ * Handles both ICMP info translation and ICMP error quoted-inner rewriting.
+ *
* Return: 0 on success, negative errno on translation failure.
*/
int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
diff --git a/drivers/net/ipxlat/icmp_46.c b/drivers/net/ipxlat/icmp_46.c
index ad907f60416c..41a91d4bc3dc 100644
--- a/drivers/net/ipxlat/icmp_46.c
+++ b/drivers/net/ipxlat/icmp_46.c
@@ -11,13 +11,49 @@
* Ralf Lici <ralf@mandelbit.com>
*/
-#include <linux/icmp.h>
-#include <linux/icmpv6.h>
-
+#include "address.h"
#include "icmp.h"
#include "packet.h"
+#include "translate_46.h"
#include "transport.h"
+#define IPXLAT_ICMP4_PP_CODE_PTR 0
+#define IPXLAT_ICMP4_PP_CODE_BADLEN 2
+
+/* RFC 7915 Section 4.2, Figure 3 */
+static const u8 ipxlat_46_icmp_param_prob_map[] = { 0, 1, 4, 4, 0xff,
+ 0xff, 0xff, 0xff, 7, 6,
+ 0xff, 0xff, 8, 8, 8,
+ 8, 24, 24, 24, 24 };
+
+/* RFC 1191 plateau table used when ICMPv4 FRAG_NEEDED reports MTU=0 */
+static const u16 ipxlat_46_mtu_plateaus[] = {
+ 65535, 32000, 17914, 8166, 4352, 2002, 1492,
+};
+
+static u8 ipxlat_icmp4_get_param_ptr(const struct icmphdr *ic4)
+{
+ return ntohl(ic4->un.gateway) >> 24;
+}
+
+static int ipxlat_46_map_icmp_param_prob(const struct icmphdr *in,
+ struct icmp6hdr *out)
+{
+ u8 ptr;
+
+ if (unlikely(in->code != IPXLAT_ICMP4_PP_CODE_PTR &&
+ in->code != IPXLAT_ICMP4_PP_CODE_BADLEN))
+ return -EPROTONOSUPPORT;
+
+ ptr = ipxlat_icmp4_get_param_ptr(in);
+ if (unlikely(ptr >= ARRAY_SIZE(ipxlat_46_icmp_param_prob_map) ||
+ ipxlat_46_icmp_param_prob_map[ptr] == 0xff))
+ return -EPROTONOSUPPORT;
+
+ out->icmp6_pointer = cpu_to_be32(ipxlat_46_icmp_param_prob_map[ptr]);
+ return 0;
+}
+
static int ipxlat_46_map_icmp_info_type_code(const struct icmphdr *in,
struct icmp6hdr *out)
{
@@ -39,6 +75,165 @@ static int ipxlat_46_map_icmp_info_type_code(const struct icmphdr *in,
return -EPROTONOSUPPORT;
}
+static __be32 ipxlat_46_compute_icmp_mtu6(unsigned int pkt_mtu,
+ unsigned int nexthop6mtu,
+ unsigned int nexthop4mtu,
+ u16 tot_len_field)
+{
+ unsigned int i;
+ u32 result;
+
+ /* RFC 7915 Section 4.2:
+ * If the IPv4 router set the MTU field to zero, then the translator
+ * MUST use the plateau values specified in RFC 1191 to determine a
+ * likely path MTU and include that path MTU in the ICMPv6 packet.
+ */
+ if (unlikely(pkt_mtu == 0)) {
+ for (i = 0; i < ARRAY_SIZE(ipxlat_46_mtu_plateaus); i++) {
+ if (ipxlat_46_mtu_plateaus[i] < tot_len_field) {
+ pkt_mtu = ipxlat_46_mtu_plateaus[i];
+ break;
+ }
+ }
+ }
+
+ /* RFC 7915 Section 4.2:
+ * max(1280, min(pkt_mtu + 20, mtu6_nexthop, mtu4_nexthop + 20))
+ *
+ * pkt_mtu + 20 converts ICMPv4-reported MTU to IPv6 context.
+ * mtu6_nexthop and mtu4_nexthop + 20 clamp to local next-hop limits.
+ * max(..., 1280) enforces IPv6 minimum MTU.
+ */
+ result = min(pkt_mtu + 20, min(nexthop6mtu, nexthop4mtu + 20));
+ if (result < IPV6_MIN_MTU)
+ result = IPV6_MIN_MTU;
+
+ return cpu_to_be32(result);
+}
+
+static int ipxlat_46_build_icmp_dest_unreach(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb,
+ const struct icmphdr *in,
+ struct icmp6hdr *out,
+ const struct iphdr *inner4)
+{
+ unsigned int inner4_tot_len, in_frag_mtu, in_mtu, out_mtu;
+
+ switch (in->code) {
+ case ICMP_NET_UNREACH:
+ case ICMP_HOST_UNREACH:
+ case ICMP_SR_FAILED:
+ case ICMP_NET_UNKNOWN:
+ case ICMP_HOST_UNKNOWN:
+ case ICMP_HOST_ISOLATED:
+ case ICMP_NET_UNR_TOS:
+ case ICMP_HOST_UNR_TOS:
+ case ICMP_PORT_UNREACH:
+ case ICMP_NET_ANO:
+ case ICMP_HOST_ANO:
+ case ICMP_PKT_FILTERED:
+ case ICMP_PREC_CUTOFF:
+ out->icmp6_unused = 0;
+ return 0;
+ case ICMP_PROT_UNREACH:
+ out->icmp6_pointer =
+ cpu_to_be32(offsetof(struct ipv6hdr, nexthdr));
+ return 0;
+ case ICMP_FRAG_NEEDED:
+ in_frag_mtu = be16_to_cpu(in->un.frag.mtu);
+ inner4_tot_len = be16_to_cpu(inner4->tot_len);
+ in_mtu = READ_ONCE(ipxlat->dev->mtu);
+ out_mtu = ipxlat_46_lookup_pmtu6(ipxlat, skb, inner4);
+
+ out->icmp6_mtu =
+ ipxlat_46_compute_icmp_mtu6(in_frag_mtu, out_mtu,
+ in_mtu, inner4_tot_len);
+ return 0;
+ }
+
+ return -EPROTONOSUPPORT;
+}
+
+static int ipxlat_46_map_icmp_type_code(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb,
+ const struct icmphdr *in,
+ struct icmp6hdr *out,
+ const struct iphdr *inner4,
+ bool *ie_forbidden)
+{
+ int err;
+
+ *ie_forbidden = false;
+
+ switch (in->type) {
+ case ICMP_ECHO:
+ case ICMP_ECHOREPLY:
+ return ipxlat_46_map_icmp_info_type_code(in, out);
+ case ICMP_DEST_UNREACH:
+ switch (in->code) {
+ case ICMP_NET_UNREACH:
+ case ICMP_HOST_UNREACH:
+ case ICMP_SR_FAILED:
+ case ICMP_NET_UNKNOWN:
+ case ICMP_HOST_UNKNOWN:
+ case ICMP_HOST_ISOLATED:
+ case ICMP_NET_UNR_TOS:
+ case ICMP_HOST_UNR_TOS:
+ out->icmp6_type = ICMPV6_DEST_UNREACH;
+ out->icmp6_code = ICMPV6_NOROUTE;
+ break;
+ case ICMP_PROT_UNREACH:
+ out->icmp6_type = ICMPV6_PARAMPROB;
+ out->icmp6_code = ICMPV6_UNK_NEXTHDR;
+ *ie_forbidden = true;
+ break;
+ case ICMP_PORT_UNREACH:
+ out->icmp6_type = ICMPV6_DEST_UNREACH;
+ out->icmp6_code = ICMPV6_PORT_UNREACH;
+ break;
+ case ICMP_FRAG_NEEDED:
+ out->icmp6_type = ICMPV6_PKT_TOOBIG;
+ out->icmp6_code = 0;
+ *ie_forbidden = true;
+ break;
+ case ICMP_NET_ANO:
+ case ICMP_HOST_ANO:
+ case ICMP_PKT_FILTERED:
+ case ICMP_PREC_CUTOFF:
+ out->icmp6_type = ICMPV6_DEST_UNREACH;
+ out->icmp6_code = ICMPV6_ADM_PROHIBITED;
+ break;
+ default:
+ return -EPROTONOSUPPORT;
+ }
+ return ipxlat_46_build_icmp_dest_unreach(ipxlat,
+ skb, in, out,
+ inner4);
+ case ICMP_TIME_EXCEEDED:
+ out->icmp6_type = ICMPV6_TIME_EXCEED;
+ out->icmp6_code = in->code;
+ out->icmp6_unused = 0;
+ return 0;
+ case ICMP_PARAMETERPROB:
+ out->icmp6_type = ICMPV6_PARAMPROB;
+ *ie_forbidden = true;
+ switch (in->code) {
+ case IPXLAT_ICMP4_PP_CODE_PTR:
+ case IPXLAT_ICMP4_PP_CODE_BADLEN:
+ out->icmp6_code = ICMPV6_HDR_FIELD;
+ break;
+ default:
+ return -EPROTONOSUPPORT;
+ }
+ err = ipxlat_46_map_icmp_param_prob(in, out);
+ if (unlikely(err))
+ return err;
+ return 0;
+ }
+
+ return -EPROTONOSUPPORT;
+}
+
static void ipxlat_46_icmp_info_update_csum(const struct icmphdr *icmp4,
struct icmp6hdr *icmp6,
const struct ipv6hdr *ip6,
@@ -86,10 +281,272 @@ static int ipxlat_46_icmp_info_outer(struct sk_buff *skb)
return 0;
}
-int ipxlat_46_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb)
+static int ipxlat_46_icmp_info_inner(struct sk_buff *skb,
+ unsigned int inner_l4_off,
+ const struct ipv6hdr *inner6)
+{
+ struct icmp6hdr *icmp6;
+ struct icmphdr icmp4;
+ int err;
+
+ /* inner header alignment is not guaranteed */
+ memcpy(&icmp4, skb->data + inner_l4_off, sizeof(icmp4));
+ icmp6 = (struct icmp6hdr *)(skb->data + inner_l4_off);
+
+ err = ipxlat_46_map_icmp_info_type_code(&icmp4, icmp6);
+ if (unlikely(err))
+ return -EINVAL;
+
+ ipxlat_46_icmp_info_update_csum(&icmp4, icmp6, inner6, skb,
+ inner_l4_off);
+ return 0;
+}
+
+static int ipxlat_46_icmp_inner_l4(struct sk_buff *skb,
+ unsigned int inner_l4_off,
+ const struct iphdr *inner4,
+ const struct ipv6hdr *inner6)
+{
+ struct tcphdr *tcp;
+ struct udphdr *udp;
+
+ switch (inner4->protocol) {
+ case IPPROTO_TCP:
+ tcp = (struct tcphdr *)(skb->data + inner_l4_off);
+ return ipxlat_46_inner_tcp(skb, inner4, inner6, tcp);
+ case IPPROTO_UDP:
+ udp = (struct udphdr *)(skb->data + inner_l4_off);
+ return ipxlat_46_inner_udp(skb, inner4, inner6, udp);
+ case IPPROTO_ICMP:
+ return ipxlat_46_icmp_info_inner(skb, inner_l4_off, inner6);
+ default:
+ return 0;
+ }
+}
+
+static int ipxlat_46_icmp_inner(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb, struct iphdr *inner4,
+ int *inner_delta)
+{
+ unsigned int inner_l3_len, inner_l3_off, inner_l4_off, old_prefix,
+ new_prefix, inner_tot_len, inner_l3_payload, inner_l4_payload;
+ const unsigned int outer_l3_len = skb_transport_offset(skb);
+ const struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ struct ipv6hdr outer_ip6_copy, *inner_ip6;
+ struct frag_hdr *fh6;
+ u8 next_hdr;
+ bool has_inner_frag;
+
+ inner_l3_off = cb->inner_l3_offset;
+ inner_l4_off = cb->inner_l4_offset;
+
+ /* inner header alignment is not guaranteed */
+ memcpy(inner4, skb->data + inner_l3_off, sizeof(*inner4));
+ inner_l3_len = inner4->ihl << 2;
+ has_inner_frag = ip_is_fragment(inner4);
+
+ /* save outer IPv6 hdr because pull+push destroys that hdr region */
+ outer_ip6_copy = *ipv6_hdr(skb);
+
+ old_prefix = inner_l3_off + inner_l3_len;
+ new_prefix = inner_l3_off + sizeof(struct ipv6hdr) +
+ (has_inner_frag ? sizeof(struct frag_hdr) : 0);
+ *inner_delta = (int)new_prefix - (int)old_prefix;
+
+ if (unlikely(skb_cow_head(skb, max_t(int, 0, *inner_delta))))
+ return -ENOMEM;
+
+ skb_pull(skb, old_prefix);
+ skb_push(skb, new_prefix);
+ /* outer 4->6 path already set header offsets, but inner relayout
+ * pulls/pushes change skb->data placement. Reinitialize outer header
+ * offsets so ip{,v6}_hdr/icmp{,6}_hdr and skb_transport_offset keep
+ * pointing to the outer packet.
+ */
+ skb_reset_network_header(skb);
+ skb_set_transport_header(skb, outer_l3_len);
+
+ *ipv6_hdr(skb) = outer_ip6_copy;
+ ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+
+ inner_ip6 = (struct ipv6hdr *)(skb->data + inner_l3_off);
+ /* use quoted IPv4 total-length, not skb->len:
+ * skb->len also includes ICMP extension bytes at the end, which are
+ * not part of the quoted inner IP datagram length.
+ */
+ inner_tot_len = ntohs(inner4->tot_len);
+ if (unlikely(inner_tot_len < inner_l3_len))
+ return -EINVAL;
+
+ inner_l3_payload = inner_tot_len - inner_l3_len +
+ (has_inner_frag ? sizeof(struct frag_hdr) : 0);
+ if (has_inner_frag)
+ next_hdr = NEXTHDR_FRAGMENT;
+ else
+ next_hdr = ipxlat_46_map_proto_to_nexthdr(inner4->protocol);
+
+ ipxlat_46_build_l3(inner_ip6, inner4, inner_l3_payload, next_hdr,
+ inner4->ttl);
+
+ ipxlat_46_convert_addrs(&ipxlat->xlat_prefix6, inner4, inner_ip6);
+
+ if (unlikely(has_inner_frag)) {
+ fh6 = (struct frag_hdr *)(inner_ip6 + 1);
+ ipxlat_46_build_frag_hdr(fh6, inner4, inner4->protocol);
+ }
+
+ if (unlikely(!ipxlat_is_first_frag4(inner4)))
+ return 0;
+
+ inner_l4_payload = new_prefix + ipxlat_l4_min_len(inner4->protocol);
+ if (unlikely(skb_ensure_writable(skb, inner_l4_payload)))
+ return -ENOMEM;
+ inner_ip6 = (struct ipv6hdr *)(skb->data + inner_l3_off); /* data may move */
+ return ipxlat_46_icmp_inner_l4(skb, new_prefix, inner4, inner_ip6);
+}
+
+/* Adjust ICMP error quoted-datagram/extensions after inner 4->6 translation.
+ * The inner rewrite changes quoted datagram length; this helper recomputes
+ * RFC 4884 delimiter/padding, preserves extensions only when allowed, and
+ * enforces IPv6 minimum-MTU packet size constraints.
+ */
+static int ipxlat_46_icmp_squeeze_ext(struct sk_buff *skb,
+ unsigned int icmp4_ipl, int inner_delta,
+ bool ie_forbidden)
+{
+ unsigned int icmp6_iel_in, icmp6_iel_out, max_iel, outer_hdrs_len,
+ out_pad, payload_len, icmp6_ipl_out_bytes, pkt_len_cap;
+ unsigned int icmp6_ipl_out = 0;
+ int icmp6_ipl_in_bytes, err;
+ struct icmp6hdr *ic6;
+ struct ipv6hdr *iph6;
+
+ /* icmp4_ipl marks where quoted datagram ends and extension area starts
+ */
+ if (likely(!icmp4_ipl))
+ goto no_extensions;
+
+ outer_hdrs_len = skb_transport_offset(skb) + sizeof(struct icmp6hdr);
+ payload_len = skb->len - outer_hdrs_len;
+ icmp6_ipl_in_bytes = icmp4_ipl + inner_delta;
+ if (unlikely(icmp6_ipl_in_bytes < 0 ||
+ icmp6_ipl_in_bytes > payload_len))
+ return -EINVAL;
+
+ if (likely(icmp6_ipl_in_bytes == payload_len))
+ goto no_extensions;
+
+ icmp6_iel_in = payload_len - icmp6_ipl_in_bytes;
+ max_iel = IPV6_MIN_MTU - (outer_hdrs_len + ICMP_EXT_ORIG_DGRAM_MIN_LEN);
+
+ if (unlikely(ie_forbidden || icmp6_iel_in > max_iel)) {
+ pkt_len_cap = min_t(unsigned int, skb->len - icmp6_iel_in,
+ IPV6_MIN_MTU);
+ icmp6_ipl_out_bytes = pkt_len_cap - outer_hdrs_len;
+ out_pad = 0;
+ icmp6_iel_out = 0;
+ icmp6_ipl_out = 0;
+ } else {
+ pkt_len_cap = min_t(unsigned int, skb->len, IPV6_MIN_MTU);
+ icmp6_ipl_out_bytes =
+ round_down(pkt_len_cap - icmp6_iel_in - outer_hdrs_len,
+ sizeof(u64));
+ out_pad = max_t(unsigned int, ICMP_EXT_ORIG_DGRAM_MIN_LEN,
+ icmp6_ipl_out_bytes) -
+ icmp6_ipl_out_bytes;
+ icmp6_iel_out = icmp6_iel_in;
+ icmp6_ipl_out = (icmp6_ipl_out_bytes + out_pad) >> 3;
+ }
+
+ /* if no extension bytes are copied and no pad is written, relayout only
+ * trims/updates lengths and does not require full data writability
+ */
+ if (unlikely(icmp6_iel_out || out_pad)) {
+ err = skb_ensure_writable(skb, skb->len);
+ if (unlikely(err))
+ return err;
+ }
+
+ err = ipxlat_icmp_relayout(skb, outer_hdrs_len, icmp6_ipl_in_bytes,
+ icmp6_iel_in, icmp6_ipl_out_bytes, out_pad,
+ icmp6_iel_out);
+ if (unlikely(err))
+ return err;
+
+ iph6 = ipv6_hdr(skb);
+ iph6->payload_len = htons(skb->len - sizeof(*iph6));
+
+no_extensions:
+ if (unlikely(skb->len > IPV6_MIN_MTU)) {
+ err = pskb_trim(skb, IPV6_MIN_MTU);
+ if (unlikely(err))
+ return err;
+
+ iph6 = ipv6_hdr(skb);
+ iph6->payload_len = htons(skb->len - sizeof(*iph6));
+ }
+
+ ic6 = icmp6_hdr(skb);
+ ic6->icmp6_datagram_len = icmp6_ipl_out;
+ return 0;
+}
+
+/**
+ * ipxlat_46_icmp_error - translate ICMPv4 error payload to ICMPv6 error form
+ * @ipxlat: translator private context
+ * @skb: packet carrying outer ICMPv4 error
+ *
+ * Rewrites the quoted inner datagram in place, maps type/code/fields and
+ * adjusts RFC 4884 datagram/extension layout before recomputing outer checksum.
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+static int ipxlat_46_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ const struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct icmphdr icmp4 = *icmp_hdr(skb);
+ struct iphdr inner4_ip;
+ int inner_delta, err;
+ bool ie_forbidden;
+
+ if (unlikely(!(cb->is_icmp_err))) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
+
+ /* translate quoted inner packet headers */
+ err = ipxlat_46_icmp_inner(ipxlat, skb, &inner4_ip, &inner_delta);
+ if (unlikely(err))
+ return err;
+
+ err = ipxlat_46_map_icmp_type_code(ipxlat, skb, &icmp4, icmp6_hdr(skb),
+ &inner4_ip, &ie_forbidden);
+ if (unlikely(err))
+ return err;
+
+ err = ipxlat_46_icmp_squeeze_ext(skb, icmp4.un.reserved[1] << 2,
+ inner_delta, ie_forbidden);
+ if (unlikely(err))
+ return err;
+
+ /* error path rewrites quoted packet bytes/lengths, so use full
+ * checksum recomputation instead of incremental update
+ */
+ icmp6_hdr(skb)->icmp6_cksum = 0;
+ icmp6_hdr(skb)->icmp6_cksum =
+ ipxlat_l4_csum_ipv6(&ipv6_hdr(skb)->saddr,
+ &ipv6_hdr(skb)->daddr, skb,
+ skb_transport_offset(skb),
+ ipxlat_skb_datagram_len(skb),
+ IPPROTO_ICMPV6);
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
{
if (unlikely(ipxlat_skb_cb(skb)->is_icmp_err))
- return -EPROTONOSUPPORT;
+ return ipxlat_46_icmp_error(ipxlat, skb);
return ipxlat_46_icmp_info_outer(skb);
}
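
Note on the 4->6 extension path above: the length arithmetic in the branch of ipxlat_46_icmp_squeeze_ext() that keeps extensions can be sketched in isolation as follows. This is only an illustration under stated assumptions; every sketch_* name is hypothetical and not part of the patch. It assumes extensions are allowed and that ext_len + outer_hdrs_len does not exceed the capped packet length, which the patch checks before reaching this branch.

```c
#include <stddef.h>

#define SKETCH_IPV6_MIN_MTU   1280u /* same value as IPV6_MIN_MTU */
#define SKETCH_ORIG_DGRAM_MIN 128u  /* RFC 4884 minimum quoted length */

/* Hypothetical result type, not present in the patch. */
struct sketch_layout {
	unsigned int quoted_bytes; /* quoted datagram bytes kept */
	unsigned int pad_bytes;    /* zero padding after the quote */
	unsigned int length_field; /* RFC 4884 length, in 64-bit words */
};

/* Precondition (assumed): ext_len + outer_hdrs_len <= min(pkt_len, 1280). */
static struct sketch_layout
sketch_icmp6_ext_layout(unsigned int pkt_len, unsigned int outer_hdrs_len,
			unsigned int ext_len)
{
	struct sketch_layout l;
	unsigned int cap = pkt_len < SKETCH_IPV6_MIN_MTU ?
			   pkt_len : SKETCH_IPV6_MIN_MTU;

	/* ICMPv6 counts the quoted datagram in 8-byte units: round down */
	l.quoted_bytes = (cap - ext_len - outer_hdrs_len) & ~7u;
	/* pad short quotes up to the 128-byte minimum */
	l.pad_bytes = l.quoted_bytes < SKETCH_ORIG_DGRAM_MIN ?
		      SKETCH_ORIG_DGRAM_MIN - l.quoted_bytes : 0;
	l.length_field = (l.quoted_bytes + l.pad_bytes) >> 3;
	return l;
}
```

For a 116-byte packet (48 bytes of outer IPv6 + ICMPv6 headers, 60 quoted bytes, 8 extension bytes) the quote is rounded down to 56 bytes and padded with 72 zero bytes, giving a length field of 16.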
diff --git a/drivers/net/ipxlat/icmp_64.c b/drivers/net/ipxlat/icmp_64.c
index 6b11aa638068..18583620a09a 100644
--- a/drivers/net/ipxlat/icmp_64.c
+++ b/drivers/net/ipxlat/icmp_64.c
@@ -11,12 +11,38 @@
* Ralf Lici <ralf@mandelbit.com>
*/
-#include <linux/icmpv6.h>
+#include <net/route.h>
+#include "address.h"
#include "icmp.h"
#include "packet.h"
+#include "translate_64.h"
#include "transport.h"
+#define IPXLAT_ICMP4_ERROR_MAX_LEN 576U
+
+/* RFC 7915 Section 5.2, Figure 4 */
+static const u8 ipxlat_64_icmp_param_prob_map[] = {
+ 0, 1, 0xff, 0xff, 2, 2, 9, 8, 12, 12, 12, 12, 12, 12,
+ 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 16, 16, 16, 16,
+ 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
+};
+
+static int ipxlat_64_map_icmp_param_prob(u32 ptr6, u32 *ptr4)
+{
+ if (unlikely(ptr6 >= ARRAY_SIZE(ipxlat_64_icmp_param_prob_map) ||
+ ipxlat_64_icmp_param_prob_map[ptr6] == 0xff))
+ return -EPROTONOSUPPORT;
+
+ *ptr4 = ipxlat_64_icmp_param_prob_map[ptr6];
+ return 0;
+}
+
+static void ipxlat_icmp4_set_param_ptr(struct icmphdr *ic4, u8 ptr)
+{
+ ic4->un.gateway = htonl((u32)ptr << 24);
+}
+
static int ipxlat_64_map_icmp_info_type_code(const struct icmp6hdr *in,
struct icmphdr *out)
{
@@ -38,10 +64,119 @@ static int ipxlat_64_map_icmp_info_type_code(const struct icmp6hdr *in,
}
}
-static __sum16 ipxlat_64_compute_icmp_info_csum(const struct ipv6hdr *in6,
- const struct icmp6hdr *in_icmp6,
- const struct icmphdr *out_icmp4,
- unsigned int l4_len)
+/* Lookup post-translation IPv4 PMTU for ICMPv6 PTB -> ICMPv4 FRAG_NEEDED.
+ * Falls back to translator MTU on routing failures and clamps route MTU
+ * against translator egress MTU.
+ */
+static unsigned int ipxlat_64_lookup_pmtu4(struct ipxlat_priv *ipxlat,
+ const struct sk_buff *skb)
+{
+ const struct iphdr *iph4;
+ struct flowi4 fl4 = {};
+ unsigned int dev_mtu;
+ struct rtable *rt;
+ unsigned int mtu4;
+
+ dev_mtu = READ_ONCE(ipxlat->dev->mtu);
+ iph4 = ip_hdr(skb);
+
+ fl4.daddr = iph4->daddr;
+ fl4.saddr = iph4->saddr;
+ fl4.flowi4_mark = skb->mark;
+ fl4.flowi4_proto = IPPROTO_ICMP;
+
+ rt = ip_route_output_key(dev_net(ipxlat->dev), &fl4);
+ if (IS_ERR(rt))
+ return dev_mtu;
+
+ /* clamp against translator MTU to avoid oversized local PMTU */
+ mtu4 = min_t(unsigned int, dst_mtu(&rt->dst), dev_mtu);
+ ip_rt_put(rt);
+
+ return mtu4;
+}
+
+static int ipxlat_64_build_icmp4_errhdr(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb,
+ const struct icmp6hdr *ic6,
+ struct icmphdr *ic4, bool *ie_forbidden)
+{
+ unsigned int in_mtu, out_mtu;
+ u32 ptr6, ptr4;
+ int err;
+
+ switch (ic6->icmp6_type) {
+ case ICMPV6_DEST_UNREACH:
+ ic4->type = ICMP_DEST_UNREACH;
+ switch (ic6->icmp6_code) {
+ case ICMPV6_NOROUTE:
+ case ICMPV6_NOT_NEIGHBOUR:
+ case ICMPV6_ADDR_UNREACH:
+ ic4->code = ICMP_HOST_UNREACH;
+ break;
+ case ICMPV6_ADM_PROHIBITED:
+ ic4->code = ICMP_HOST_ANO;
+ break;
+ case ICMPV6_PORT_UNREACH:
+ ic4->code = ICMP_PORT_UNREACH;
+ break;
+ default:
+ return -EINVAL;
+ }
+ ic4->un.gateway = 0;
+ *ie_forbidden = false;
+ return 0;
+ case ICMPV6_TIME_EXCEED:
+ ic4->type = ICMP_TIME_EXCEEDED;
+ ic4->code = ic6->icmp6_code;
+ ic4->un.gateway = 0;
+ *ie_forbidden = false;
+ return 0;
+ case ICMPV6_PKT_TOOBIG:
+ ic4->type = ICMP_DEST_UNREACH;
+ ic4->code = ICMP_FRAG_NEEDED;
+ ic4->un.frag.__unused = 0;
+ in_mtu = ipxlat_64_lookup_pmtu4(ipxlat, skb);
+ out_mtu = READ_ONCE(ipxlat->dev->mtu);
+ /* RFC 7915 Section 5.2:
+ * min((PTB_mtu - 20), mtu4_nexthop, (mtu6_nexthop - 20))
+ */
+ ic4->un.frag.mtu =
+ cpu_to_be16(min3(be32_to_cpu(ic6->icmp6_mtu) - 20,
+ in_mtu, out_mtu - 20));
+ *ie_forbidden = true;
+ return 0;
+ case ICMPV6_PARAMPROB:
+ ptr6 = be32_to_cpu(ic6->icmp6_dataun.un_data32[0]);
+ switch (ic6->icmp6_code) {
+ case ICMPV6_HDR_FIELD:
+ ic4->type = ICMP_PARAMETERPROB;
+ ic4->code = 0;
+ err = ipxlat_64_map_icmp_param_prob(ptr6, &ptr4);
+ if (unlikely(err))
+ return err;
+ ipxlat_icmp4_set_param_ptr(ic4, ptr4);
+ break;
+ case ICMPV6_UNK_NEXTHDR:
+ ic4->type = ICMP_DEST_UNREACH;
+ ic4->code = ICMP_PROT_UNREACH;
+ ic4->un.gateway = 0;
+ break;
+ default:
+ return -EINVAL;
+ }
+ *ie_forbidden = true;
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
+static __sum16
+ipxlat_64_compute_icmp_info_csum(const struct ipv6hdr *in6,
+ const struct icmp6hdr *in_icmp6,
+ const struct icmphdr *out_icmp4,
+ unsigned int l4_len)
{
struct icmp6hdr icmp6_zero;
struct icmphdr icmp4_zero;
@@ -82,11 +217,315 @@ static int ipxlat_64_icmp_info(struct sk_buff *skb, const struct ipv6hdr *in6)
return 0;
}
-int ipxlat_64_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb,
+static int ipxlat_64_icmp_inner_info(struct sk_buff *skb,
+ unsigned int inner_l4_off)
+{
+ struct icmphdr *ic4;
+ struct icmp6hdr ic6;
+ int err;
+
+ /* inner header alignment is not guaranteed */
+ memcpy(&ic6, skb->data + inner_l4_off, sizeof(ic6));
+ ic4 = (struct icmphdr *)(skb->data + inner_l4_off);
+ err = ipxlat_64_map_icmp_info_type_code(&ic6, ic4);
+ if (unlikely(err))
+ return err;
+
+ ic4->checksum = 0;
+ ic4->checksum = csum_fold(skb_checksum(skb, inner_l4_off,
+ skb->len - inner_l4_off, 0));
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+static int ipxlat_64_icmp_inner_l4(struct sk_buff *skb,
+ unsigned int inner_l4_off,
+ const struct iphdr *inner4,
+ const struct ipv6hdr *inner6)
+{
+ struct tcphdr *tcp;
+ struct udphdr *udp;
+
+ switch (inner4->protocol) {
+ case IPPROTO_TCP:
+ tcp = (struct tcphdr *)(skb->data + inner_l4_off);
+ return ipxlat_64_inner_tcp(skb, inner6, inner4, tcp);
+ case IPPROTO_UDP:
+ udp = (struct udphdr *)(skb->data + inner_l4_off);
+ return ipxlat_64_inner_udp(skb, inner6, inner4, udp);
+ case IPPROTO_ICMP:
+ return ipxlat_64_icmp_inner_info(skb, inner_l4_off);
+ default:
+ return 0;
+ }
+}
+
+static int ipxlat_64_icmp_inner(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
+ int *inner_delta)
+{
+ unsigned int old_prefix, new_prefix, inner_l3_len, inner_tot_len,
+ inner_l4_payload, outer_prefix, inner_l3_off, inner_l4_old_off;
+ const unsigned int outer_l3_len = skb_transport_offset(skb);
+ const struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct iphdr outer4_copy = *ip_hdr(skb);
+ bool has_inner_frag, first_inner_frag, mf, df;
+ struct frag_hdr inner_fragh;
+ struct ipv6hdr inner6;
+ struct iphdr *inner4;
+ __be32 saddr, daddr;
+ u16 frag_off;
+ u8 inner_l4_proto;
+ __be16 frag_id;
+ int err;
+
+ inner_l3_off = cb->inner_l3_offset;
+ inner_l4_old_off = cb->inner_l4_offset;
+ inner_l3_len = inner_l4_old_off - inner_l3_off;
+ outer_prefix = inner_l3_off;
+
+ inner_l4_proto = ipxlat_64_map_nexthdr_proto(cb->inner_l4_proto);
+ has_inner_frag = !!cb->inner_fragh_off;
+
+ /* inner header alignment is not guaranteed */
+ memcpy(&inner6, skb->data + outer_prefix, sizeof(inner6));
+
+ first_inner_frag = true;
+ if (unlikely(has_inner_frag)) {
+ memcpy(&inner_fragh, skb->data + cb->inner_fragh_off,
+ sizeof(inner_fragh));
+ first_inner_frag = ipxlat_is_first_frag6(&inner_fragh);
+ }
+
+ err = ipxlat_64_convert_addrs(&ipxlat->xlat_prefix6, &inner6, false,
+ &saddr, &daddr);
+ if (unlikely(err))
+ return err;
+
+ old_prefix = outer_prefix + inner_l3_len;
+ new_prefix = outer_prefix + sizeof(struct iphdr);
+ *inner_delta = (int)new_prefix - (int)old_prefix;
+
+ /* unlike 46, inner 6->4 always shrinks quoted L3 size */
+ skb_pull(skb, old_prefix);
+ skb_push(skb, new_prefix);
+ /* outer 6->4 translation already set network/transport headers, but
+ * inner relayout pulls/pushes again and changes skb->data placement.
+ * Reinitialize outer header offsets so ip{,v6}_hdr/icmp{,6}_hdr and
+ * skb_transport_offset keep pointing to the outer packet.
+ */
+ skb_reset_network_header(skb);
+ skb_set_transport_header(skb, outer_l3_len);
+
+ *ip_hdr(skb) = outer4_copy;
+
+ inner4 = (struct iphdr *)(skb->data + outer_prefix);
+ inner_tot_len = ntohs(inner6.payload_len) + sizeof(inner6) -
+ inner_l3_len + sizeof(struct iphdr);
+ /* RFC 7915 Section 5.1 */
+ if (likely(!has_inner_frag)) {
+ df = inner_tot_len > (IPV6_MIN_MTU - sizeof(struct iphdr));
+ inner4->frag_off = ipxlat_build_frag4_offset(df, false, 0);
+ } else {
+ mf = !!(be16_to_cpu(inner_fragh.frag_off) & IP6_MF);
+ frag_off = ipxlat_get_frag6_offset(&inner_fragh);
+ inner4->frag_off =
+ ipxlat_build_frag4_offset(false, mf, frag_off);
+ }
+
+ /* keep low 16 bits of IPv6 Fragment ID as numeric value, then re-encode
+ * to network-order IPv4 ID
+ */
+ frag_id = has_inner_frag ?
+ cpu_to_be16(be32_to_cpu(inner_fragh.identification)) :
+ 0;
+ ipxlat_64_build_l3(inner4, &inner6, inner_tot_len, inner4->frag_off,
+ inner_l4_proto, saddr, daddr, inner6.hop_limit,
+ frag_id);
+
+ if (likely(!has_inner_frag)) {
+ inner4->id = 0;
+ __ip_select_ident(dev_net(ipxlat->dev), inner4, 1);
+ inner4->check = 0;
+ inner4->check = ip_fast_csum(inner4, inner4->ihl);
+ }
+
+ if (unlikely(!first_inner_frag))
+ return 0;
+
+ inner_l4_payload = new_prefix + ipxlat_l4_min_len(inner4->protocol);
+ if (unlikely(skb_ensure_writable(skb, inner_l4_payload)))
+ return -ENOMEM;
+ inner4 = (struct iphdr *)(skb->data + outer_prefix); /* data may move */
+ return ipxlat_64_icmp_inner_l4(skb, new_prefix, inner4, &inner6);
+}
+
+/* Rebuild ICMPv4 quoted-datagram/extensions after inner 6->4 translation.
+ *
+ * The inner rewrite changes the quoted datagram length. This helper updates
+ * the RFC 4884 delimiter/padding and extension bytes, then enforces the
+ * IPv4 ICMP error size cap.
+ *
+ * This is intentionally not a mirror of ipxlat_46_icmp_squeeze_ext:
+ * - 4->6 always writes icmp6_datagram_len (either computed or 0).
+ * - 6->4 updates ICMPv4 datagram-length only when extensions are allowed.
+ * Some mapped ICMPv6 errors set ie_forbidden, and in that case we keep the
+ * ICMPv4 header semantics for that type/code and only relayout/trim payload.
+ */
+static int ipxlat_64_squeeze_icmp_ext(struct sk_buff *skb,
+ unsigned int icmp6_ipl, int inner_delta,
+ bool ie_forbidden)
+{
+ unsigned int outer_hdrs_len, payload_len, icmp4_iel_in, icmp4_iel_out;
+ unsigned int out_pad, max_iel, pkt_len_cap, icmp4_ipl_out_bytes;
+ unsigned int icmp4_ipl_out = 0, icmp4_ipl_in_bytes;
+ unsigned int new_tot_len;
+ int icmp4_ipl_in, err;
+ struct icmphdr *ic4;
+ struct iphdr *iph4;
+
+ if (likely(!icmp6_ipl))
+ goto finalize;
+
+ outer_hdrs_len = skb_transport_offset(skb) + sizeof(struct icmphdr);
+ if (unlikely(skb->len < outer_hdrs_len))
+ return -EINVAL;
+
+ payload_len = skb->len - outer_hdrs_len;
+ icmp4_ipl_in = (int)icmp6_ipl + inner_delta;
+ if (unlikely(icmp4_ipl_in < 0))
+ return -EINVAL;
+ icmp4_ipl_in_bytes = icmp4_ipl_in;
+ if (unlikely(icmp4_ipl_in_bytes > payload_len))
+ return -EINVAL;
+
+ if (likely(icmp4_ipl_in_bytes == payload_len))
+ goto finalize;
+
+ icmp4_iel_in = payload_len - icmp4_ipl_in_bytes;
+ max_iel = IPXLAT_ICMP4_ERROR_MAX_LEN -
+ (outer_hdrs_len + ICMP_EXT_ORIG_DGRAM_MIN_LEN);
+
+ if (unlikely(ie_forbidden)) {
+ icmp4_ipl_out_bytes = icmp4_ipl_in_bytes;
+ out_pad = 0;
+ icmp4_iel_out = 0;
+ } else if (unlikely(icmp4_iel_in > max_iel)) {
+ pkt_len_cap = min_t(unsigned int, skb->len - icmp4_iel_in,
+ IPXLAT_ICMP4_ERROR_MAX_LEN);
+ icmp4_ipl_out_bytes = pkt_len_cap - outer_hdrs_len;
+ out_pad = 0;
+ icmp4_iel_out = 0;
+ icmp4_ipl_out = 0;
+ } else {
+ pkt_len_cap = min_t(unsigned int, skb->len,
+ IPXLAT_ICMP4_ERROR_MAX_LEN);
+ icmp4_ipl_out_bytes =
+ round_down(pkt_len_cap - icmp4_iel_in - outer_hdrs_len,
+ sizeof(u32));
+ out_pad = max_t(unsigned int, ICMP_EXT_ORIG_DGRAM_MIN_LEN,
+ icmp4_ipl_out_bytes) -
+ icmp4_ipl_out_bytes;
+ icmp4_iel_out = icmp4_iel_in;
+ /* RFC 4884 field is in 32-bit units for ICMPv4 errors */
+ icmp4_ipl_out = (icmp4_ipl_out_bytes + out_pad) >> 2;
+ }
+
+ /* if no extension bytes are copied and no pad is written, relayout only
+ * trims/updates lengths and does not require full data writability
+ */
+ if (unlikely(icmp4_iel_out || out_pad)) {
+ err = skb_ensure_writable(skb, skb->len);
+ if (unlikely(err))
+ return err;
+ }
+
+ err = ipxlat_icmp_relayout(skb, outer_hdrs_len, icmp4_ipl_in_bytes,
+ icmp4_iel_in, icmp4_ipl_out_bytes, out_pad,
+ icmp4_iel_out);
+ if (unlikely(err))
+ return err;
+
+finalize:
+ if (!ie_forbidden) {
+ ic4 = icmp_hdr(skb);
+ ic4->un.reserved[1] = icmp4_ipl_out;
+ }
+
+ if (unlikely(skb->len > IPXLAT_ICMP4_ERROR_MAX_LEN)) {
+ err = pskb_trim(skb, IPXLAT_ICMP4_ERROR_MAX_LEN);
+ if (unlikely(err))
+ return err;
+ }
+
+ iph4 = ip_hdr(skb);
+ new_tot_len = skb->len;
+ if (unlikely(be16_to_cpu(iph4->tot_len) != new_tot_len)) {
+ iph4->tot_len = cpu_to_be16(new_tot_len);
+ /* relayout/trim may invalidate precomputed DF decision */
+ iph4->frag_off &= cpu_to_be16(~IP_DF);
+ iph4->check = 0;
+ iph4->check = ip_fast_csum(iph4, iph4->ihl);
+ }
+
+ return 0;
+}
+
+/**
+ * ipxlat_64_icmp_error - translate ICMPv6 error payload to ICMPv4 error form
+ * @ipxlat: translator private context
+ * @skb: packet carrying outer ICMPv6 error
+ *
+ * Rewrites the quoted inner datagram in place, maps type/code/fields and
+ * adjusts RFC 4884 datagram/extension layout before recomputing outer checksum.
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+static int ipxlat_64_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ const struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct icmp6hdr ic6 = *icmp6_hdr(skb);
+ unsigned int icmp6_ipl;
+ int inner_delta, err;
+ struct icmphdr *ic4;
+ bool ie_forbidden;
+
+ if (unlikely(!(cb->is_icmp_err))) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
+
+ /* translate quoted inner packet headers */
+ err = ipxlat_64_icmp_inner(ipxlat, skb, &inner_delta);
+ if (unlikely(err))
+ return err;
+
+ /* build outer ICMPv4 error header after inner relayout */
+ ic4 = (struct icmphdr *)(skb->data + skb_transport_offset(skb));
+ err = ipxlat_64_build_icmp4_errhdr(ipxlat, skb, &ic6, ic4,
+ &ie_forbidden);
+ if (unlikely(err))
+ return err;
+
+ icmp6_ipl = ic6.icmp6_datagram_len << 3;
+ err = ipxlat_64_squeeze_icmp_ext(skb, icmp6_ipl, inner_delta,
+ ie_forbidden);
+ if (unlikely(err))
+ return err;
+
+ /* relayout may move skb data: refetch ic4, then recompute checksum */
+ ic4 = icmp_hdr(skb);
+ ic4->checksum = 0;
+ ic4->checksum = csum_fold(skb_checksum(skb, skb_transport_offset(skb),
+ ipxlat_skb_datagram_len(skb), 0));
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
const struct ipv6hdr *in6)
{
if (unlikely(ipxlat_skb_cb(skb)->is_icmp_err))
- return -EPROTONOSUPPORT;
+ return ipxlat_64_icmp_error(ipxlat, skb);
return ipxlat_64_icmp_info(skb, in6);
}
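
Note on the Packet Too Big handling above: the RFC 7915 Section 5.2 formula quoted in ipxlat_64_build_icmp4_errhdr() can be sketched as a standalone helper. This is a hedged illustration, not the patch's code. The function name is hypothetical, and the saturating subtraction and the clamp to 16 bits are assumptions added here for the sketch; the patch applies min3() and cpu_to_be16() directly.

```c
#include <stdint.h>

/* min((PTB_mtu - 20), mtu4_nexthop, (mtu6_nexthop - 20)), RFC 7915 5.2.
 * Hypothetical helper: guards against underflow and 16-bit overflow are
 * assumptions of this sketch.
 */
static uint16_t sketch_ptb_to_frag_needed_mtu(uint32_t ptb_mtu,
					      uint32_t mtu4_nexthop,
					      uint32_t mtu6_nexthop)
{
	/* 20 is the IPv6 (40) vs IPv4 (20) header size difference */
	uint32_t a = ptb_mtu > 20 ? ptb_mtu - 20 : 0;
	uint32_t b = mtu6_nexthop > 20 ? mtu6_nexthop - 20 : 0;
	uint32_t m = a;

	if (mtu4_nexthop < m)
		m = mtu4_nexthop;
	if (b < m)
		m = b;
	/* the ICMPv4 next-hop MTU field is only 16 bits wide */
	if (m > UINT16_MAX)
		m = UINT16_MAX;
	return (uint16_t)m;
}
```

With a PTB MTU of 1280 and 1500-byte next hops on both sides, the reported IPv4 MTU is 1260.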
diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c
index 3aa00c635916..82aedfb0ee48 100644
--- a/drivers/net/ipxlat/transport.c
+++ b/drivers/net/ipxlat/transport.c
@@ -87,6 +87,67 @@ __sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr,
skb_checksum(skb, l4_off, l4_len, 0));
}
+static int ipxlat_ensure_tailroom(struct sk_buff *skb, const unsigned int grow)
+{
+ int err;
+
+ if (!grow || skb_tailroom(skb) >= grow)
+ return 0;
+
+ /* tail growth may reallocate backing storage and move skb data */
+ err = pskb_expand_head(skb, 0, grow - skb_tailroom(skb), GFP_ATOMIC);
+ if (unlikely(err))
+ return err;
+
+ return 0;
+}
+
+/* Rewrite quoted datagram layout after inner translation in ICMP errors.
+ *
+ * Caller provides old/new quoted lengths and extension lengths; this helper
+ * only does byte moves/padding/trim while preserving extension bytes at the
+ * end of the packet when present.
+ */
+int ipxlat_icmp_relayout(struct sk_buff *skb, unsigned int outer_len,
+ unsigned int in_ipl, unsigned int in_iel,
+ unsigned int out_ipl, unsigned int out_pad,
+ unsigned int out_iel)
+{
+ const unsigned int in_ie_off = outer_len + in_ipl, old_len = skb->len;
+ const unsigned int new_len = outer_len + out_ipl + out_pad + out_iel;
+ const unsigned int out_ie_off = outer_len + out_ipl + out_pad;
+ unsigned int grow = 0;
+ int err;
+
+ /* new_len > old_len here means "we need extra bytes on top of
+ * already-translated length", mainly due to padding/layout decisions
+ * while keeping extensions
+ */
+ if (unlikely(new_len > old_len)) {
+ grow = new_len - old_len;
+
+ err = ipxlat_ensure_tailroom(skb, grow);
+ if (unlikely(err))
+ return err;
+
+ __skb_put(skb, grow);
+ }
+
+ if (unlikely(out_iel))
+ memmove(skb->data + out_ie_off, skb->data + in_ie_off, out_iel);
+
+ if (unlikely(out_pad))
+ memset(skb->data + outer_len + out_ipl, 0, out_pad);
+
+ if (unlikely(new_len < old_len)) {
+ err = pskb_trim(skb, new_len);
+ if (unlikely(err))
+ return err;
+ }
+
+ return 0;
+}
+
/* Normalize checksum/offload metadata after address-family translation.
*
* Translation changes protocol family but keeps transport payload semantics
diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h
index 9b6fe422b01f..09f522696eea 100644
--- a/drivers/net/ipxlat/transport.h
+++ b/drivers/net/ipxlat/transport.h
@@ -63,6 +63,25 @@ __sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr,
const struct sk_buff *skb, unsigned int l4_off,
unsigned int l4_len, u8 proto);
+/**
+ * ipxlat_icmp_relayout - resize quoted ICMP payload/extensions in place
+ * @skb: packet buffer
+ * @outer_len: offset to quoted datagram start
+ * @in_ipl: input datagram payload length
+ * @in_iel: input extension length
+ * @out_ipl: output datagram payload length
+ * @out_pad: output pad bytes between datagram and extensions
+ * @out_iel: output extension length
+ *
+ * This helper may move payload bytes and adjust skb tail length.
+ *
+ * Return: 0 on success, negative errno on resize/memory failures.
+ */
+int ipxlat_icmp_relayout(struct sk_buff *skb, unsigned int outer_len,
+ unsigned int in_ipl, unsigned int in_iel,
+ unsigned int out_ipl, unsigned int out_pad,
+ unsigned int out_iel);
+
/**
* ipxlat_finalize_offload - normalize checksum/GSO metadata after translation
* @skb: translated packet
--
2.53.0
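
Note on ipxlat_icmp_relayout() from the patch above: the byte moves it performs can be sketched on a flat buffer, without any skb handling. This is a minimal illustration with hypothetical names (sketch_relayout and its parameters are not part of the patch). It assumes the whole packet is linear, and that the caller supplies a buffer capacity in place of the tailroom check.

```c
#include <stddef.h>
#include <string.h>

/* Layout before: [outer hdrs | quoted (in_ipl) | extensions]
 * Layout after:  [outer hdrs | quoted (out_ipl) | pad | extensions (out_iel)]
 * Returns 0 and stores the new length, or -1 if the buffer is too small.
 */
static int sketch_relayout(unsigned char *buf, size_t cap, size_t old_len,
			   size_t outer_len, size_t in_ipl, size_t out_ipl,
			   size_t out_pad, size_t out_iel, size_t *new_len)
{
	const size_t in_ie_off = outer_len + in_ipl;
	const size_t out_ie_off = outer_len + out_ipl + out_pad;
	const size_t len = out_ie_off + out_iel;

	if (len > cap || in_ie_off + out_iel > old_len)
		return -1;

	/* move extensions first: source may overlap the pad region */
	if (out_iel)
		memmove(buf + out_ie_off, buf + in_ie_off, out_iel);
	/* then zero the pad between quoted datagram and extensions */
	if (out_pad)
		memset(buf + outer_len + out_ipl, 0, out_pad);

	*new_len = len;
	return 0;
}
```

The order matters: the extension bytes are moved before the pad is zeroed, because the old extension area can overlap the new pad area when the quoted datagram shrinks.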
* [RFC net-next 13/15] ipxlat: add netlink control plane and uapi
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (11 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 12/15] ipxlat: add ICMP error translation and quoted-inner handling Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 14/15] selftests: net: add ipxlat coverage Ralf Lici
2026-03-19 15:12 ` [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide Ralf Lici
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Donald Hunter,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Andrew Lunn, linux-kernel
Expose runtime configuration through netlink with validated set/get/dump
operations and generated policy glue from the YAML spec. The API
configures the translator prefix and MTU threshold used by the data
path.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
Documentation/netlink/specs/ipxlat.yaml | 97 +++++++
drivers/net/ipxlat/Makefile | 2 +
drivers/net/ipxlat/main.c | 9 +
drivers/net/ipxlat/netlink-gen.c | 71 +++++
drivers/net/ipxlat/netlink-gen.h | 31 +++
drivers/net/ipxlat/netlink.c | 348 ++++++++++++++++++++++++
drivers/net/ipxlat/netlink.h | 27 ++
drivers/net/ipxlat/translate_46.c | 3 +-
include/uapi/linux/ipxlat.h | 48 ++++
9 files changed, 635 insertions(+), 1 deletion(-)
create mode 100644 Documentation/netlink/specs/ipxlat.yaml
create mode 100644 drivers/net/ipxlat/netlink-gen.c
create mode 100644 drivers/net/ipxlat/netlink-gen.h
create mode 100644 drivers/net/ipxlat/netlink.c
create mode 100644 drivers/net/ipxlat/netlink.h
create mode 100644 include/uapi/linux/ipxlat.h
diff --git a/Documentation/netlink/specs/ipxlat.yaml b/Documentation/netlink/specs/ipxlat.yaml
new file mode 100644
index 000000000000..d0df5ef16e04
--- /dev/null
+++ b/Documentation/netlink/specs/ipxlat.yaml
@@ -0,0 +1,97 @@
+# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+#
+# Copyright (C) 2026- Mandelbit SRL
+#
+# Author: Antonio Quartulli <antonio@mandelbit.com>
+# Ralf Lici <ralf@mandelbit.com>
+#
+---
+name: ipxlat
+protocol: genetlink
+doc: Netlink protocol to control IPXLAT (SIIT) network devices.
+
+definitions:
+ -
+ type: const
+ name: xlat-prefix6-max-prefix-len
+ value: 96
+ doc: Maximum prefix length accepted for xlat-prefix6.
+
+attribute-sets:
+ -
+ name: pool
+ attributes:
+ -
+ name: prefix
+ type: binary
+ checks:
+ exact-len: 16
+ -
+ name: prefix-len
+ type: u8
+ checks:
+ max: xlat-prefix6-max-prefix-len
+ -
+ name: cfg
+ attributes:
+ -
+ name: xlat-prefix6
+ type: nest
+ doc: IPv6 translation prefix.
+ nested-attributes: pool
+ -
+ name: lowest-ipv6-mtu
+ type: u32
+ checks:
+ min: 1280
+ -
+ name: dev
+ attributes:
+ -
+ name: ifindex
+ type: u32
+ doc: Index of the ipxlat interface to operate on.
+ -
+ name: netnsid
+ type: s32
+ doc: ID of the netns the device lives in.
+ -
+ name: config
+ type: nest
+ doc: Ipxlat device configuration.
+ nested-attributes: cfg
+
+operations:
+ list:
+ -
+ name: dev-get
+ attribute-set: dev
+ doc: Get / dump configuration of ipxlat devices.
+ do:
+ pre: ipxlat-nl-pre-doit
+ post: ipxlat-nl-post-doit
+ request:
+ attributes:
+ - ifindex
+ reply: &dev-all
+ attributes:
+ - ifindex
+ - netnsid
+ - config
+ dump:
+ reply: *dev-all
+
+ -
+ name: dev-set
+ doc: Set configuration of an ipxlat device.
+ attribute-set: dev
+ flags: [admin-perm]
+ do:
+ request:
+ attributes:
+ - ifindex
+ - config
+ reply:
+ attributes: []
+ pre: ipxlat-nl-pre-doit
+ post: ipxlat-nl-post-doit
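
Note on the spec above: the attribute checks it declares (exact-len 16 for prefix, max 96 for prefix-len, min 1280 for lowest-ipv6-mtu) amount to the following predicate. This is only a sketch of the constraints as written in the YAML; the struct and function names are hypothetical, and any further validation done by the driver's own netlink code is not covered here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PREFIX6_BYTES   16u   /* checks: exact-len: 16 */
#define SKETCH_PREFIX6_MAX_LEN 96u   /* xlat-prefix6-max-prefix-len */
#define SKETCH_MIN_IPV6_MTU    1280u /* checks: min: 1280 */

/* Hypothetical flattened view of a dev-set request. */
struct sketch_cfg {
	size_t prefix_bytes;      /* length of the binary prefix attribute */
	uint8_t prefix_len;       /* prefix length in bits */
	uint32_t lowest_ipv6_mtu; /* MTU threshold */
};

/* True when the request satisfies the policy checks in the spec. */
static bool sketch_cfg_valid(const struct sketch_cfg *c)
{
	return c->prefix_bytes == SKETCH_PREFIX6_BYTES &&
	       c->prefix_len <= SKETCH_PREFIX6_MAX_LEN &&
	       c->lowest_ipv6_mtu >= SKETCH_MIN_IPV6_MTU;
}
```

A request carrying a 16-byte prefix with prefix-len 96 and lowest-ipv6-mtu 1280 passes; prefix-len 97 or an MTU of 1279 would be rejected by the generated policy before any handler runs.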
diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile
index 2ded504902e3..b906d5698351 100644
--- a/drivers/net/ipxlat/Makefile
+++ b/drivers/net/ipxlat/Makefile
@@ -13,3 +13,5 @@ ipxlat-objs += translate_46.o
ipxlat-objs += translate_64.o
ipxlat-objs += icmp_46.o
ipxlat-objs += icmp_64.o
+ipxlat-objs += netlink.o
+ipxlat-objs += netlink-gen.o
diff --git a/drivers/net/ipxlat/main.c b/drivers/net/ipxlat/main.c
index a1b4bcd39478..bef67ed634b6 100644
--- a/drivers/net/ipxlat/main.c
+++ b/drivers/net/ipxlat/main.c
@@ -18,6 +18,7 @@
#include "dispatch.h"
#include "ipxlpriv.h"
#include "main.h"
+#include "netlink.h"
MODULE_AUTHOR("Alberto Leiva Popper <ydahhrk@gmail.com>");
MODULE_AUTHOR("Antonio Quartulli <antonio@mandelbit.com>");
@@ -127,11 +128,19 @@ static int __init ipxlat_init(void)
return err;
}
+ err = ipxlat_nl_register();
+ if (err) {
+ pr_err("ipxlat: failed to register netlink family: %d\n", err);
+ rtnl_link_unregister(&ipxlat_link_ops);
+ return err;
+ }
+
return 0;
}
static void __exit ipxlat_exit(void)
{
+ ipxlat_nl_unregister();
rtnl_link_unregister(&ipxlat_link_ops);
}
diff --git a/drivers/net/ipxlat/netlink-gen.c b/drivers/net/ipxlat/netlink-gen.c
new file mode 100644
index 000000000000..e2cfaa6bb4dc
--- /dev/null
+++ b/drivers/net/ipxlat/netlink-gen.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+/* Do not edit directly, auto-generated from: */
+/* Documentation/netlink/specs/ipxlat.yaml */
+/* YNL-GEN kernel source */
+/* To regenerate run: tools/net/ynl/ynl-regen.sh */
+
+#include <net/netlink.h>
+#include <net/genetlink.h>
+
+#include "netlink-gen.h"
+
+#include <uapi/linux/ipxlat.h>
+
+/* Common nested types */
+const struct nla_policy ipxlat_cfg_nl_policy[IPXLAT_A_CFG_LOWEST_IPV6_MTU + 1] = {
+ [IPXLAT_A_CFG_XLAT_PREFIX6] = NLA_POLICY_NESTED(ipxlat_pool_nl_policy),
+ [IPXLAT_A_CFG_LOWEST_IPV6_MTU] = NLA_POLICY_MIN(NLA_U32, 1280),
+};
+
+const struct nla_policy ipxlat_pool_nl_policy[IPXLAT_A_POOL_PREFIX_LEN + 1] = {
+ [IPXLAT_A_POOL_PREFIX] = NLA_POLICY_EXACT_LEN(16),
+ [IPXLAT_A_POOL_PREFIX_LEN] = NLA_POLICY_MAX(NLA_U8, IPXLAT_XLAT_PREFIX6_MAX_PREFIX_LEN),
+};
+
+/* IPXLAT_CMD_DEV_GET - do */
+static const struct nla_policy ipxlat_dev_get_nl_policy[IPXLAT_A_DEV_IFINDEX + 1] = {
+ [IPXLAT_A_DEV_IFINDEX] = { .type = NLA_U32, },
+};
+
+/* IPXLAT_CMD_DEV_SET - do */
+static const struct nla_policy ipxlat_dev_set_nl_policy[IPXLAT_A_DEV_CONFIG + 1] = {
+ [IPXLAT_A_DEV_IFINDEX] = { .type = NLA_U32, },
+ [IPXLAT_A_DEV_CONFIG] = NLA_POLICY_NESTED(ipxlat_cfg_nl_policy),
+};
+
+/* Ops table for ipxlat */
+static const struct genl_split_ops ipxlat_nl_ops[] = {
+ {
+ .cmd = IPXLAT_CMD_DEV_GET,
+ .pre_doit = ipxlat_nl_pre_doit,
+ .doit = ipxlat_nl_dev_get_doit,
+ .post_doit = ipxlat_nl_post_doit,
+ .policy = ipxlat_dev_get_nl_policy,
+ .maxattr = IPXLAT_A_DEV_IFINDEX,
+ .flags = GENL_CMD_CAP_DO,
+ },
+ {
+ .cmd = IPXLAT_CMD_DEV_GET,
+ .dumpit = ipxlat_nl_dev_get_dumpit,
+ .flags = GENL_CMD_CAP_DUMP,
+ },
+ {
+ .cmd = IPXLAT_CMD_DEV_SET,
+ .pre_doit = ipxlat_nl_pre_doit,
+ .doit = ipxlat_nl_dev_set_doit,
+ .post_doit = ipxlat_nl_post_doit,
+ .policy = ipxlat_dev_set_nl_policy,
+ .maxattr = IPXLAT_A_DEV_CONFIG,
+ .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+ },
+};
+
+struct genl_family ipxlat_nl_family __ro_after_init = {
+ .name = IPXLAT_FAMILY_NAME,
+ .version = IPXLAT_FAMILY_VERSION,
+ .netnsok = true,
+ .parallel_ops = true,
+ .module = THIS_MODULE,
+ .split_ops = ipxlat_nl_ops,
+ .n_split_ops = ARRAY_SIZE(ipxlat_nl_ops),
+};
diff --git a/drivers/net/ipxlat/netlink-gen.h b/drivers/net/ipxlat/netlink-gen.h
new file mode 100644
index 000000000000..2a766d05e0b4
--- /dev/null
+++ b/drivers/net/ipxlat/netlink-gen.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* Do not edit directly, auto-generated from: */
+/* Documentation/netlink/specs/ipxlat.yaml */
+/* YNL-GEN kernel header */
+/* To regenerate run: tools/net/ynl/ynl-regen.sh */
+
+#ifndef _LINUX_IPXLAT_GEN_H
+#define _LINUX_IPXLAT_GEN_H
+
+#include <net/netlink.h>
+#include <net/genetlink.h>
+
+#include <uapi/linux/ipxlat.h>
+
+/* Common nested types */
+extern const struct nla_policy ipxlat_cfg_nl_policy[IPXLAT_A_CFG_LOWEST_IPV6_MTU + 1];
+extern const struct nla_policy ipxlat_pool_nl_policy[IPXLAT_A_POOL_PREFIX_LEN + 1];
+
+int ipxlat_nl_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
+ struct genl_info *info);
+void
+ipxlat_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
+ struct genl_info *info);
+
+int ipxlat_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info);
+int ipxlat_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
+int ipxlat_nl_dev_set_doit(struct sk_buff *skb, struct genl_info *info);
+
+extern struct genl_family ipxlat_nl_family;
+
+#endif /* _LINUX_IPXLAT_GEN_H */
diff --git a/drivers/net/ipxlat/netlink.c b/drivers/net/ipxlat/netlink.c
new file mode 100644
index 000000000000..02d097726f22
--- /dev/null
+++ b/drivers/net/ipxlat/netlink.c
@@ -0,0 +1,348 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <net/genetlink.h>
+#include <net/ipv6.h>
+
+#include <uapi/linux/ipxlat.h>
+
+#include "netlink.h"
+#include "main.h"
+#include "netlink-gen.h"
+#include "ipxlpriv.h"
+
+MODULE_ALIAS_GENL_FAMILY(IPXLAT_FAMILY_NAME);
+
+struct ipxlat_nl_info_ctx {
+ struct ipxlat_priv *ipxlat;
+ netdevice_tracker tracker;
+};
+
+struct ipxlat_nl_dump_ctx {
+ unsigned long last_ifindex;
+};
+
+/**
+ * ipxlat_get_from_attrs - retrieve ipxlat private data for target netdev
+ * @net: network namespace where to look for the interface
+ * @info: generic netlink info from the user request
+ * @tracker: tracker object to be used for the netdev reference acquisition
+ *
+ * Return: the ipxlat private data (netdev reference held) or an ERR_PTR()
+ */
+static struct ipxlat_priv *ipxlat_get_from_attrs(struct net *net,
+ struct genl_info *info,
+ netdevice_tracker *tracker)
+{
+ struct ipxlat_priv *ipxlat;
+ struct net_device *dev;
+ int ifindex;
+
+ if (GENL_REQ_ATTR_CHECK(info, IPXLAT_A_DEV_IFINDEX))
+ return ERR_PTR(-EINVAL);
+ ifindex = nla_get_u32(info->attrs[IPXLAT_A_DEV_IFINDEX]);
+
+ rcu_read_lock();
+ dev = dev_get_by_index_rcu(net, ifindex);
+ if (!dev) {
+ rcu_read_unlock();
+ NL_SET_ERR_MSG_MOD(info->extack,
+ "ifindex does not match any interface");
+ return ERR_PTR(-ENODEV);
+ }
+
+ if (!ipxlat_dev_is_valid(dev)) {
+ rcu_read_unlock();
+ NL_SET_ERR_MSG_MOD(info->extack,
+ "specified interface is not ipxlat");
+ NL_SET_BAD_ATTR(info->extack,
+ info->attrs[IPXLAT_A_DEV_IFINDEX]);
+ return ERR_PTR(-EINVAL);
+ }
+
+ ipxlat = netdev_priv(dev);
+ netdev_hold(dev, tracker, GFP_ATOMIC);
+ rcu_read_unlock();
+
+ return ipxlat;
+}
+
+int ipxlat_nl_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct ipxlat_nl_info_ctx *ctx = (struct ipxlat_nl_info_ctx *)info->ctx;
+ struct ipxlat_priv *ipxlat;
+
+ BUILD_BUG_ON(sizeof(*ctx) > sizeof(info->ctx));
+
+ ipxlat = ipxlat_get_from_attrs(genl_info_net(info), info,
+ &ctx->tracker);
+ if (IS_ERR(ipxlat))
+ return PTR_ERR(ipxlat);
+
+ ctx->ipxlat = ipxlat;
+ return 0;
+}
+
+void ipxlat_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct ipxlat_nl_info_ctx *ctx = (struct ipxlat_nl_info_ctx *)info->ctx;
+
+ if (ctx->ipxlat)
+ netdev_put(ctx->ipxlat->dev, &ctx->tracker);
+}
+
+static int ipxlat_nl_send_dev(struct sk_buff *skb, struct ipxlat_priv *ipxlat,
+ struct net *src_net, const u32 portid,
+ const u32 seq, int flags)
+{
+ struct nlattr *attr_cfg, *attr_pool;
+ struct ipv6_prefix xlat_prefix6;
+ int id, ret = -EMSGSIZE;
+ u32 lowest_ipv6_mtu;
+ void *hdr;
+
+ /* snapshot settings under lock so userspace sees a coherent state */
+ mutex_lock(&ipxlat->cfg_lock);
+ xlat_prefix6 = ipxlat->xlat_prefix6;
+ lowest_ipv6_mtu = ipxlat->lowest_ipv6_mtu;
+ mutex_unlock(&ipxlat->cfg_lock);
+
+ hdr = genlmsg_put(skb, portid, seq, &ipxlat_nl_family, flags,
+ IPXLAT_CMD_DEV_GET);
+ if (!hdr)
+ return -ENOBUFS;
+
+ if (nla_put_u32(skb, IPXLAT_A_DEV_IFINDEX, ipxlat->dev->ifindex))
+ goto err;
+
+ if (!net_eq(src_net, dev_net(ipxlat->dev))) {
+ id = peernet2id_alloc(src_net, dev_net(ipxlat->dev),
+ GFP_ATOMIC);
+ if (id < 0) {
+ ret = id;
+ goto err;
+ }
+ if (nla_put_s32(skb, IPXLAT_A_DEV_NETNSID, id))
+ goto err;
+ }
+
+ attr_cfg = nla_nest_start(skb, IPXLAT_A_DEV_CONFIG);
+ if (!attr_cfg)
+ goto err;
+
+ attr_pool = nla_nest_start(skb, IPXLAT_A_CFG_XLAT_PREFIX6);
+ if (!attr_pool)
+ goto err;
+
+ if (nla_put_in6_addr(skb, IPXLAT_A_POOL_PREFIX, &xlat_prefix6.addr) ||
+ nla_put_u8(skb, IPXLAT_A_POOL_PREFIX_LEN, xlat_prefix6.len))
+ goto err;
+
+ nla_nest_end(skb, attr_pool);
+
+ if (nla_put_u32(skb, IPXLAT_A_CFG_LOWEST_IPV6_MTU, lowest_ipv6_mtu))
+ goto err;
+
+ nla_nest_end(skb, attr_cfg);
+ genlmsg_end(skb, hdr);
+
+ return 0;
+err:
+ genlmsg_cancel(skb, hdr);
+ return ret;
+}
+
+int ipxlat_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info)
+{
+ struct ipxlat_nl_info_ctx *ctx = (struct ipxlat_nl_info_ctx *)info->ctx;
+ struct sk_buff *reply;
+ int ret;
+
+ if (GENL_REQ_ATTR_CHECK(info, IPXLAT_A_DEV_IFINDEX))
+ return -EINVAL;
+
+ reply = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!reply)
+ return -ENOMEM;
+
+ ret = ipxlat_nl_send_dev(reply, ctx->ipxlat, genl_info_net(info),
+ info->snd_portid, info->snd_seq, 0);
+ if (ret < 0) {
+ nlmsg_free(reply);
+ return ret;
+ }
+
+ return genlmsg_reply(reply, info);
+}
+
+int ipxlat_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct ipxlat_nl_dump_ctx *state = (struct ipxlat_nl_dump_ctx *)cb->ctx;
+ struct net *net = sock_net(cb->skb->sk);
+ netdevice_tracker tracker;
+ struct net_device *dev;
+ int ret;
+
+ rcu_read_lock();
+ for_each_netdev_dump(net, dev, state->last_ifindex) {
+ if (!ipxlat_dev_is_valid(dev))
+ continue;
+
+ netdev_hold(dev, &tracker, GFP_ATOMIC);
+ rcu_read_unlock();
+
+ ret = ipxlat_nl_send_dev(skb, netdev_priv(dev), net,
+ NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, NLM_F_MULTI);
+
+ rcu_read_lock();
+ netdev_put(dev, &tracker);
+
+ if (ret < 0) {
+ if (skb->len > 0)
+ break;
+ rcu_read_unlock();
+ return ret;
+ }
+ }
+ rcu_read_unlock();
+ return skb->len;
+}
+
+static int ipxlat_nl_validate_xlat_prefix6(const struct ipv6_prefix *prefix,
+ struct netlink_ext_ack *extack)
+{
+ if (prefix->len != 32 && prefix->len != 40 && prefix->len != 48 &&
+ prefix->len != 56 && prefix->len != 64 && prefix->len != 96) {
+ NL_SET_ERR_MSG_FMT_MOD(extack,
+ "unsupported RFC 6052 prefix length: %u",
+ prefix->len);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int ipxlat_nl_parse_xlat_prefix6(struct nlattr *attr,
+ struct ipv6_prefix *xlat_prefix6,
+ struct netlink_ext_ack *extack)
+{
+ struct nlattr *attrs_pool[IPXLAT_A_POOL_MAX + 1];
+ struct ipv6_prefix new_xlat_prefix6;
+ int ret;
+
+ new_xlat_prefix6 = *xlat_prefix6;
+
+ ret = nla_parse_nested(attrs_pool, IPXLAT_A_POOL_MAX, attr,
+ ipxlat_pool_nl_policy, extack);
+ if (ret)
+ return ret;
+
+ if (!attrs_pool[IPXLAT_A_POOL_PREFIX] &&
+ !attrs_pool[IPXLAT_A_POOL_PREFIX_LEN]) {
+ NL_SET_ERR_MSG_MOD(extack, "xlat-prefix6 update is empty");
+ return -EINVAL;
+ }
+
+ if (attrs_pool[IPXLAT_A_POOL_PREFIX])
+ new_xlat_prefix6.addr =
+ nla_get_in6_addr(attrs_pool[IPXLAT_A_POOL_PREFIX]);
+ if (attrs_pool[IPXLAT_A_POOL_PREFIX_LEN])
+ new_xlat_prefix6.len =
+ nla_get_u8(attrs_pool[IPXLAT_A_POOL_PREFIX_LEN]);
+
+ ret = ipxlat_nl_validate_xlat_prefix6(&new_xlat_prefix6, extack);
+ if (ret) {
+ if (attrs_pool[IPXLAT_A_POOL_PREFIX_LEN])
+ NL_SET_BAD_ATTR(extack,
+ attrs_pool[IPXLAT_A_POOL_PREFIX_LEN]);
+ else
+ NL_SET_BAD_ATTR(extack,
+ attrs_pool[IPXLAT_A_POOL_PREFIX]);
+ return ret;
+ }
+
+ *xlat_prefix6 = new_xlat_prefix6;
+ return 0;
+}
+
+int ipxlat_nl_dev_set_doit(struct sk_buff *skb, struct genl_info *info)
+{
+ struct ipxlat_nl_info_ctx *ctx = (struct ipxlat_nl_info_ctx *)info->ctx;
+ struct nlattr *attrs[IPXLAT_A_CFG_MAX + 1];
+ struct nlattr *xlat_prefix6_attr;
+ struct ipv6_prefix xlat_prefix6;
+ u32 lowest_ipv6_mtu;
+ int ret = 0;
+
+ if (GENL_REQ_ATTR_CHECK(info, IPXLAT_A_DEV_CONFIG))
+ return -EINVAL;
+
+ ret = nla_parse_nested(attrs, IPXLAT_A_CFG_MAX,
+ info->attrs[IPXLAT_A_DEV_CONFIG],
+ ipxlat_cfg_nl_policy, info->extack);
+ if (ret)
+ return ret;
+
+ if (!attrs[IPXLAT_A_CFG_XLAT_PREFIX6] &&
+ !attrs[IPXLAT_A_CFG_LOWEST_IPV6_MTU]) {
+ NL_SET_ERR_MSG_MOD(info->extack, "config update is empty");
+ return -EINVAL;
+ }
+ xlat_prefix6_attr = attrs[IPXLAT_A_CFG_XLAT_PREFIX6];
+
+ mutex_lock(&ctx->ipxlat->cfg_lock);
+
+ /* Stage updates that can fail before writing device state.
+ * This keeps dev-set all-or-nothing and avoids partial commits when
+ * xlat-prefix parsing/validation fails.
+ */
+ if (xlat_prefix6_attr) {
+ xlat_prefix6 = ctx->ipxlat->xlat_prefix6;
+ ret = ipxlat_nl_parse_xlat_prefix6(xlat_prefix6_attr,
+ &xlat_prefix6,
+ info->extack);
+ if (ret)
+ goto out_unlock;
+ }
+
+ if (xlat_prefix6_attr)
+ ctx->ipxlat->xlat_prefix6 = xlat_prefix6;
+ if (attrs[IPXLAT_A_CFG_LOWEST_IPV6_MTU]) {
+ lowest_ipv6_mtu =
+ nla_get_u32(attrs[IPXLAT_A_CFG_LOWEST_IPV6_MTU]);
+ WRITE_ONCE(ctx->ipxlat->lowest_ipv6_mtu, lowest_ipv6_mtu);
+ }
+
+out_unlock:
+ mutex_unlock(&ctx->ipxlat->cfg_lock);
+ return ret;
+}
+
+/**
+ * ipxlat_nl_register - perform any needed registration in the netlink subsystem
+ *
+ * Return: 0 on success, a negative error code otherwise
+ */
+int __init ipxlat_nl_register(void)
+{
+ return genl_register_family(&ipxlat_nl_family);
+}
+
+/**
+ * ipxlat_nl_unregister - undo any module wide netlink registration
+ */
+void ipxlat_nl_unregister(void)
+{
+ genl_unregister_family(&ipxlat_nl_family);
+}
diff --git a/drivers/net/ipxlat/netlink.h b/drivers/net/ipxlat/netlink.h
new file mode 100644
index 000000000000..1ea292ad9964
--- /dev/null
+++ b/drivers/net/ipxlat/netlink.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_NETLINK_H_
+#define _NET_IPXLAT_NETLINK_H_
+
+/**
+ * ipxlat_nl_register - register ipxlat generic-netlink family
+ *
+ * Return: 0 on success, negative errno on registration failure.
+ */
+int ipxlat_nl_register(void);
+
+/**
+ * ipxlat_nl_unregister - unregister ipxlat generic-netlink family
+ */
+void ipxlat_nl_unregister(void);
+
+#endif /* _NET_IPXLAT_NETLINK_H_ */
diff --git a/drivers/net/ipxlat/translate_46.c b/drivers/net/ipxlat/translate_46.c
index 0b79ca07c771..d625dc85576b 100644
--- a/drivers/net/ipxlat/translate_46.c
+++ b/drivers/net/ipxlat/translate_46.c
@@ -14,6 +14,7 @@
#include <net/ip6_route.h>
#include "address.h"
+#include "icmp.h"
#include "packet.h"
#include "transport.h"
#include "translate_46.h"
@@ -239,7 +240,7 @@ int ipxlat_46_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
err = ipxlat_46_outer_udp(skb, &outer4);
break;
case IPPROTO_ICMP:
- err = -EPROTONOSUPPORT;
+ err = ipxlat_46_icmp(ipxlat, skb);
break;
default:
err = 0;
diff --git a/include/uapi/linux/ipxlat.h b/include/uapi/linux/ipxlat.h
new file mode 100644
index 000000000000..f8db3df3f9e8
--- /dev/null
+++ b/include/uapi/linux/ipxlat.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* Do not edit directly, auto-generated from: */
+/* Documentation/netlink/specs/ipxlat.yaml */
+/* YNL-GEN uapi header */
+/* To regenerate run: tools/net/ynl/ynl-regen.sh */
+
+#ifndef _UAPI_LINUX_IPXLAT_H
+#define _UAPI_LINUX_IPXLAT_H
+
+#define IPXLAT_FAMILY_NAME "ipxlat"
+#define IPXLAT_FAMILY_VERSION 1
+
+#define IPXLAT_XLAT_PREFIX6_MAX_PREFIX_LEN 96
+
+enum {
+ IPXLAT_A_POOL_PREFIX = 1,
+ IPXLAT_A_POOL_PREFIX_LEN,
+
+ __IPXLAT_A_POOL_MAX,
+ IPXLAT_A_POOL_MAX = (__IPXLAT_A_POOL_MAX - 1)
+};
+
+enum {
+ IPXLAT_A_CFG_XLAT_PREFIX6 = 1,
+ IPXLAT_A_CFG_LOWEST_IPV6_MTU,
+
+ __IPXLAT_A_CFG_MAX,
+ IPXLAT_A_CFG_MAX = (__IPXLAT_A_CFG_MAX - 1)
+};
+
+enum {
+ IPXLAT_A_DEV_IFINDEX = 1,
+ IPXLAT_A_DEV_NETNSID,
+ IPXLAT_A_DEV_CONFIG,
+
+ __IPXLAT_A_DEV_MAX,
+ IPXLAT_A_DEV_MAX = (__IPXLAT_A_DEV_MAX - 1)
+};
+
+enum {
+ IPXLAT_CMD_DEV_GET = 1,
+ IPXLAT_CMD_DEV_SET,
+
+ __IPXLAT_CMD_MAX,
+ IPXLAT_CMD_MAX = (__IPXLAT_CMD_MAX - 1)
+};
+
+#endif /* _UAPI_LINUX_IPXLAT_H */
--
2.53.0
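For readers who drive dev-set from scripts: the prefix-length rule enforced by ipxlat_nl_validate_xlat_prefix6() in the patch above (only 32, 40, 48, 56, 64 and 96 are accepted) could be mirrored by a userspace pre-check. The following is a minimal sketch, not part of the series; the helper name ipxlat_prefix_len_ok is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of the series): mirrors the kernel-side
# check in ipxlat_nl_validate_xlat_prefix6(), which accepts only the
# RFC 6052 prefix lengths 32, 40, 48, 56, 64 and 96.
ipxlat_prefix_len_ok()
{
	case "$1" in
	32|40|48|56|64|96) return 0 ;;
	*) return 1 ;;
	esac
}

# Usage sketch: decide whether a dev-set request is worth sending.
for len in 40 41 96 128; do
	if ipxlat_prefix_len_ok "$len"; then
		echo "prefix-len $len: accepted"
	else
		echo "prefix-len $len: rejected"
	fi
done
```

The kernel remains the authority here: the netlink policy already caps prefix-len at 96 and the validator rejects everything outside the list, so a check like this only saves a round trip and gives a friendlier local error.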
* [RFC net-next 14/15] selftests: net: add ipxlat coverage
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (12 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 13/15] ipxlat: add netlink control plane and uapi Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide Ralf Lici
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Shuah Khan, linux-kernel, linux-kselftest
Add selftests for ipxlat data-plane behavior and control-plane setup.
The tests build an isolated netns topology, configure ipxlat through
YNL, and exercise core traffic classes (TCP, UDP, ICMP info/error, and
fragment-related paths). This provides reproducible end-to-end coverage
for the translation pipeline and basic regression protection for future
changes.
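The test library in this patch hard-codes RFC 6052 mapped addresses (for example 192.0.2.33 under 2001:db8:100::/40 becomes 2001:db8:1c0:2:21::). As an illustration of where those constants come from, here is a minimal bash sketch of the embedding; it is not part of the series, the function name ipxlat_map_v4 is hypothetical, and it prints the uncompressed IPv6 form to keep the code short.

```shell
#!/bin/bash
# Hypothetical helper (not part of the series): embed an IPv4 address into
# an RFC 6052 prefix. Arguments:
#   $1  prefix as 32 hex digits (same form as IPXLAT_XLAT_PREFIX6_HEX)
#   $2  prefix length, one of 32 40 48 56 64 96
#   $3  dotted-quad IPv4 address
# Prints the mapped address as eight uncompressed 16-bit groups.
ipxlat_map_v4()
{
	local prefix_hex="$1"
	local plen="$2"
	local v4="$3"
	local -a bytes=()
	local -a octets=()
	local i pos out=""

	IFS=. read -r -a octets <<< "$v4"

	# Copy the prefix bytes and zero everything after the prefix.
	for ((i = 0; i < 16; i++)); do
		if ((i < plen / 8)); then
			bytes[i]=$((16#${prefix_hex:i*2:2}))
		else
			bytes[i]=0
		fi
	done

	# Place the four IPv4 octets right after the prefix, skipping
	# byte 8 (bits 64-71), which RFC 6052 reserves as zero.
	pos=$((plen / 8))
	for ((i = 0; i < 4; i++)); do
		if ((pos == 8)); then
			pos=9
		fi
		bytes[pos]=${octets[i]}
		pos=$((pos + 1))
	done

	for ((i = 0; i < 16; i += 2)); do
		out+=$(printf '%02x%02x' "${bytes[i]}" "${bytes[i + 1]}")
		if ((i < 14)); then
			out+=":"
		fi
	done
	echo "$out"
}

ipxlat_map_v4 20010db8010000000000000000000000 40 192.0.2.33
# prints 2001:0db8:01c0:0002:0021:0000:0000:0000
ipxlat_map_v4 20010db8010000000000000000000000 40 198.51.100.2
# prints 2001:0db8:01c6:3364:0002:0000:0000:0000
```

The two outputs are the expanded forms of IPXLAT_V6_REMOTE (2001:db8:1c0:2:21::) and IPXLAT_V6_NS4 (2001:db8:1c6:3364:2::) from ipxlat_lib.sh, which is a quick way to sanity-check the constants if the test prefix is ever changed.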
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
tools/testing/selftests/net/ipxlat/.gitignore | 1 +
tools/testing/selftests/net/ipxlat/Makefile | 25 ++
.../selftests/net/ipxlat/ipxlat_data.sh | 70 +++++
.../selftests/net/ipxlat/ipxlat_frag.sh | 70 +++++
.../selftests/net/ipxlat/ipxlat_icmp_err.sh | 54 ++++
.../selftests/net/ipxlat/ipxlat_lib.sh | 273 ++++++++++++++++++
.../net/ipxlat/ipxlat_udp4_zero_csum_send.c | 119 ++++++++
7 files changed, 612 insertions(+)
create mode 100644 tools/testing/selftests/net/ipxlat/.gitignore
create mode 100644 tools/testing/selftests/net/ipxlat/Makefile
create mode 100755 tools/testing/selftests/net/ipxlat/ipxlat_data.sh
create mode 100755 tools/testing/selftests/net/ipxlat/ipxlat_frag.sh
create mode 100755 tools/testing/selftests/net/ipxlat/ipxlat_icmp_err.sh
create mode 100644 tools/testing/selftests/net/ipxlat/ipxlat_lib.sh
create mode 100644 tools/testing/selftests/net/ipxlat/ipxlat_udp4_zero_csum_send.c
diff --git a/tools/testing/selftests/net/ipxlat/.gitignore b/tools/testing/selftests/net/ipxlat/.gitignore
new file mode 100644
index 000000000000..43bd01d8a84b
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/.gitignore
@@ -0,0 +1 @@
+ipxlat_udp4_zero_csum_send
diff --git a/tools/testing/selftests/net/ipxlat/Makefile b/tools/testing/selftests/net/ipxlat/Makefile
new file mode 100644
index 000000000000..cca588945e48
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/Makefile
@@ -0,0 +1,25 @@
+# SPDX-License-Identifier: GPL-2.0
+# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+#
+# Copyright (C) 2026- Mandelbit SRL
+# Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+#
+# Author: Antonio Quartulli <antonio@mandelbit.com>
+# Daniel Gröber <dxld@darkboxed.org>
+# Ralf Lici <ralf@mandelbit.com>
+
+TEST_PROGS := \
+ ipxlat_data.sh \
+ ipxlat_frag.sh \
+ ipxlat_icmp_err.sh \
+# end of TEST_PROGS
+
+TEST_FILES := \
+ ipxlat_lib.sh \
+# end of TEST_FILES
+
+TEST_GEN_FILES := \
+ ipxlat_udp4_zero_csum_send \
+# end of TEST_GEN_FILES
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_data.sh b/tools/testing/selftests/net/ipxlat/ipxlat_data.sh
new file mode 100755
index 000000000000..101e0a65f0a9
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/ipxlat_data.sh
@@ -0,0 +1,70 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+#
+# Copyright (C) 2026- Mandelbit SRL
+# Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+#
+# Author: Antonio Quartulli <antonio@mandelbit.com>
+# Daniel Gröber <dxld@darkboxed.org>
+# Ralf Lici <ralf@mandelbit.com>
+
+set -o pipefail
+
+SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
+source "$SCRIPT_DIR/ipxlat_lib.sh"
+
+trap ipxlat_cleanup EXIT
+
+ipxlat_setup_env
+
+# Send ICMP Echo and verify we receive a reply back
+
+RET=0
+ip netns exec "$NS4" ping -c 2 -W 2 "$IPXLAT_V4_REMOTE" >/dev/null 2>&1
+check_err $? "ping 4->6 failed"
+log_test "icmp-info 4->6"
+
+RET=0
+ip netns exec "$NS6" ping -6 -c 2 -W 2 -I "$IPXLAT_V6_NS6_SRC" \
+ "$IPXLAT_V6_NS4" >/dev/null 2>&1
+check_err $? "ping 6->4 failed"
+log_test "icmp-info 6->4"
+
+# Run a TCP data transfer over the translator path
+
+RET=0
+ipxlat_run_iperf "$NS6" "$NS4" "$IPXLAT_V4_REMOTE" 5201 -n 256K
+check_err $? "tcp 4->6 failed"
+log_test "tcp 4->6"
+
+RET=0
+ipxlat_run_iperf "$NS4" "$NS6" "$IPXLAT_V6_NS4" 5201 \
+ -B "$IPXLAT_V6_NS6_SRC" -n 256K
+check_err $? "tcp 6->4 failed"
+log_test "tcp 6->4"
+
+# Run UDP traffic to verify UDP translation and delivery
+
+RET=0
+ipxlat_run_iperf "$NS6" "$NS4" "$IPXLAT_V4_REMOTE" 5202 -u -b 5M -t 1
+check_err $? "udp 4->6 failed"
+log_test "udp 4->6"
+
+RET=0
+ipxlat_run_iperf "$NS4" "$NS6" "$IPXLAT_V6_NS4" 5202 \
+ -B "$IPXLAT_V6_NS6_SRC" -u -b 5M -t 1
+check_err $? "udp 6->4 failed"
+log_test "udp 6->4"
+
+# Send one IPv4 UDP packet with checksum=0 and verify 4->6 translation.
+
+RET=0
+ipxlat_capture_pkts "$NS6" \
+ "ip6 and udp and dst host $IPXLAT_V6_REMOTE and dst port 5555" 1 3 \
+ ip netns exec "$NS4" "$SCRIPT_DIR/ipxlat_udp4_zero_csum_send" \
+ "$IPXLAT_NS4_ADDR" "$IPXLAT_V4_REMOTE" 5555
+check_err $? "udp checksum-zero 4->6 failed"
+log_test "udp checksum-zero 4->6"
+
+exit "$EXIT_STATUS"
diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_frag.sh b/tools/testing/selftests/net/ipxlat/ipxlat_frag.sh
new file mode 100755
index 000000000000..26ed351cd263
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/ipxlat_frag.sh
@@ -0,0 +1,70 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+#
+# Copyright (C) 2026- Mandelbit SRL
+# Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+#
+# Author: Antonio Quartulli <antonio@mandelbit.com>
+# Daniel Gröber <dxld@darkboxed.org>
+# Ralf Lici <ralf@mandelbit.com>
+
+set -o pipefail
+
+SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
+source "$SCRIPT_DIR/ipxlat_lib.sh"
+
+trap ipxlat_cleanup EXIT
+
+ipxlat_setup_env
+
+# Exercise large TCP flow on 4->6 path to cover pre-fragmentation behavior
+RET=0
+ipxlat_run_iperf "$NS6" "$NS4" "$IPXLAT_V4_REMOTE" 5301 -n 8M
+check_err $? "large tcp 4->6 failed"
+log_test "large tcp 4->6"
+
+# Exercise large UDP flow on 4->6 path to cover pre-fragmentation behavior
+RET=0
+ipxlat_run_iperf "$NS6" "$NS4" "$IPXLAT_V4_REMOTE" 5302 -u -b 20M -t 2 -l 1400
+check_err $? "large udp 4->6 failed"
+log_test "large udp 4->6"
+
+# Exercise large TCP flow on 6->4 path to cover
+# fragmentation-sensitive translation
+RET=0
+ipxlat_run_iperf "$NS4" "$NS6" "$IPXLAT_V6_NS4" 5303 \
+ -B "$IPXLAT_V6_NS6_SRC" -n 8M
+check_err $? "large tcp 6->4 failed"
+log_test "large tcp 6->4"
+
+# Exercise large UDP flow on 6->4 path to cover
+# fragmentation-sensitive translation
+RET=0
+ipxlat_run_iperf "$NS4" "$NS6" "$IPXLAT_V6_NS4" 5304 \
+ -B "$IPXLAT_V6_NS6_SRC" -u -b 20M -t 2 -l 1400
+check_err $? "large udp 6->4 failed"
+log_test "large udp 6->4"
+
+# Send oversized IPv4 ICMP Echo with DF disabled (source fragmentation allowed)
+# and verify translator drops fragmented ICMPv4 input (no translated ICMPv6
+# Echo seen in NS6)
+RET=0
+ipxlat_capture_pkts "$NS6" "icmp6 and ip6[40] == 128" 0 5 \
+ ip netns exec "$NS4" bash -c \
+ "ping -M \"dont\" -s 2000 -c 1 -W 1 \"$IPXLAT_V4_REMOTE\" \
+ >/dev/null 2>&1 || test \$? -eq 1"
+check_err $? "fragmented icmp 4->6 should be dropped"
+log_test "drop fragmented icmp 4->6"
+
+# Send oversized IPv6 ICMP echo request and verify translator drops fragmented
+# ICMPv6 input (no translated ICMPv4 Echo seen in NS4)
+RET=0
+ipxlat_capture_pkts "$NS4" "icmp and icmp[0] == 8" 0 5 \
+ ip netns exec "$NS6" bash -c \
+ "ping -6 -s 2000 -c 1 -W 1 -I \"$IPXLAT_V6_NS6_SRC\" \
+ \"$IPXLAT_V6_NS4\" >/dev/null 2>&1 || test \$? -eq 1"
+check_err $? "fragmented icmp 6->4 should be dropped"
+log_test "drop fragmented icmp 6->4"
+
+exit "$EXIT_STATUS"
diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_icmp_err.sh b/tools/testing/selftests/net/ipxlat/ipxlat_icmp_err.sh
new file mode 100755
index 000000000000..946584b55895
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/ipxlat_icmp_err.sh
@@ -0,0 +1,54 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+#
+# Copyright (C) 2026- Mandelbit SRL
+# Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+#
+# Author: Antonio Quartulli <antonio@mandelbit.com>
+# Daniel Gröber <dxld@darkboxed.org>
+# Ralf Lici <ralf@mandelbit.com>
+
+set -o pipefail
+
+SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
+source "$SCRIPT_DIR/ipxlat_lib.sh"
+
+trap ipxlat_cleanup EXIT
+
+ipxlat_setup_env
+
+# Trigger UDP to a closed port from NS4 and capture translated
+# ICMPv4 Port Unreachable
+RET=0
+ipxlat_capture_pkts "$NS4" "icmp and icmp[0] == 3 and icmp[1] == 3" 1 3 \
+ ip netns exec "$NS4" bash -c \
+ "echo x > /dev/udp/$IPXLAT_V4_REMOTE/9 || true"
+check_err $? "icmp-error 4->6 not observed"
+log_test "icmp-error xlate 4->6"
+
+# Trigger UDP to a closed port from NS6 and capture translated
+# ICMPv6 Port Unreachable
+RET=0
+ipxlat_capture_pkts "$NS6" "icmp6 and ip6[40] == 1 and ip6[41] == 4" 1 3 \
+ ip netns exec "$NS6" bash -c \
+ "echo x > /dev/udp/$IPXLAT_V6_NS4/9 || true"
+check_err $? "icmp-error 6->4 not observed"
+log_test "icmp-error xlate 6->4"
+
+# Send oversized DF IPv4 packet and verify local ICMPv4
+# Fragmentation Needed emission
+sysctl -qw net.ipv4.conf.ipxl0.accept_local=1
+sysctl -qw net.ipv4.conf.all.rp_filter=0
+sysctl -qw net.ipv4.conf.default.rp_filter=0
+sysctl -qw net.ipv4.conf.ipxl0.rp_filter=0
+sleep 2
+RET=0
+ipxlat_capture_pkts "$NS4" "icmp and icmp[0] == 3 and icmp[1] == 4" 1 3 \
+ ip netns exec "$NS4" bash -c \
+ "ping -M \"do\" -s 1300 -c 1 -W 1 \"$IPXLAT_V4_REMOTE\" \
+ >/dev/null 2>&1 || test \$? -eq 1"
+check_err $? "icmpv4 frag-needed emission not observed"
+log_test "icmpv4 frag-needed emission"
+
+exit "$EXIT_STATUS"
diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_lib.sh b/tools/testing/selftests/net/ipxlat/ipxlat_lib.sh
new file mode 100644
index 000000000000..e27683f280d4
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/ipxlat_lib.sh
@@ -0,0 +1,273 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+#
+# Copyright (C) 2026- Mandelbit SRL
+# Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+#
+# Author: Antonio Quartulli <antonio@mandelbit.com>
+# Daniel Gröber <dxld@darkboxed.org>
+# Ralf Lici <ralf@mandelbit.com>
+
+set -o pipefail
+
+IPXLAT_TEST_DIR=$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")
+source "$IPXLAT_TEST_DIR/../lib.sh"
+
+KDIR=${KDIR:-$(readlink -f "$IPXLAT_TEST_DIR/../../../../../")}
+YNL_CLI="$KDIR/tools/net/ynl/pyynl/cli.py"
+YNL_SPEC="$KDIR/Documentation/netlink/specs/ipxlat.yaml"
+IPXLAT_IPERF_TIMEOUT=${IPXLAT_IPERF_TIMEOUT:-10}
+
+IPXLAT_TRANSLATOR_DEV=ipxl0
+IPXLAT_VETH4_HOST=veth4r
+IPXLAT_VETH4_NS=veth4n
+IPXLAT_VETH6_HOST=veth6r
+IPXLAT_VETH6_NS=veth6n
+
+IPXLAT_XLAT_PREFIX6=2001:db8:100::
+IPXLAT_XLAT_PREFIX6_LEN=40
+IPXLAT_XLAT_PREFIX6_HEX=20010db8010000000000000000000000
+IPXLAT_LOWEST_IPV6_MTU=1280
+
+IPXLAT_HOST4_ADDR=198.51.100.1
+IPXLAT_HOST6_ADDR=2001:db8:1::1
+
+IPXLAT_NS4_ADDR=198.51.100.2
+IPXLAT_NS6_ADDR=2001:db8:1::2
+export IPXLAT_V4_REMOTE=192.0.2.33
+
+IPXLAT_V6_REMOTE=2001:db8:1c0:2:21::
+IPXLAT_V6_NS4=2001:db8:1c6:3364:2::
+IPXLAT_V6_NS6_SRC=2001:db8:1c0:2:2::
+
+NS4=""
+NS6=""
+
+ipxlat_ynl()
+{
+ python3 "$YNL_CLI" --spec "$YNL_SPEC" "$@"
+}
+
+ipxlat_build_dev_set_json()
+{
+ local ifindex="$1"
+
+ jq -cn \
+ --argjson ifindex "$ifindex" \
+ --arg prefix "$IPXLAT_XLAT_PREFIX6_HEX" \
+ --argjson prefix_len "$IPXLAT_XLAT_PREFIX6_LEN" \
+ --argjson lowest_ipv6_mtu "$IPXLAT_LOWEST_IPV6_MTU" \
+ '{
+ ifindex: $ifindex,
+ config: {
+ "xlat-prefix6": {
+ prefix: $prefix,
+ "prefix-len": $prefix_len
+ },
+ "lowest-ipv6-mtu": $lowest_ipv6_mtu
+ }
+ }'
+}
+
+ipxlat_require_root()
+{
+ if [[ $(id -u) -ne 0 ]]; then
+ echo "ipxlat selftests need root; skipping"
+ exit "$ksft_skip"
+ fi
+}
+
+ipxlat_require_tools()
+{
+ if [[ ! -f "$YNL_CLI" || ! -f "$YNL_SPEC" ]]; then
+ log_test_skip "ipxlat netlink spec/ynl not found"
+ exit "$ksft_skip"
+ fi
+
+ for tool in ip python3 ping iperf3 tcpdump timeout jq; do
+ require_command "$tool"
+ done
+}
+
+ipxlat_cleanup()
+{
+ cleanup_ns "${NS4:-}" "${NS6:-}" || true
+ ip link del "$IPXLAT_TRANSLATOR_DEV" 2>/dev/null || true
+ ip link del "$IPXLAT_VETH4_HOST" 2>/dev/null || true
+ ip link del "$IPXLAT_VETH6_HOST" 2>/dev/null || true
+}
+
+# Test topology:
+#
+# host namespace:
+# - owns ipxlat dev `ipxl0`
+# - has veth peers `veth4r` and `veth6r`
+# - routes IPv4 test prefix (192.0.2.0/24) to ipxl0 (v4 network steering rule)
+# - routes xlat-prefix6 (2001:db8:100::/40) out to NS6 side
+# - routes mapped NS4 IPv6 identity (2001:db8:1c6:3364:2::/128) to ipxl0
+# so NS6->NS4 traffic enters 6->4 translation
+#
+# NS4:
+# - IPv4-only endpoint: 198.51.100.2/24 on veth4n
+# - default route via host 198.51.100.1 (veth4r)
+# - sends traffic to 192.0.2.33 (translated by ipxl0 to IPv6)
+#
+# NS6:
+# - IPv6 endpoint: 2001:db8:1::2/64 on veth6n
+# - also owns mapped addresses used by tests:
+# 2001:db8:1c0:2:21:: (maps to 192.0.2.33)
+# 2001:db8:1c0:2:2:: (maps to 192.0.2.2, used as explicit src
+# since we have multiple v6 addresses)
+# - route to mapped NS4 IPv6 address is pinned via host:
+# 2001:db8:1c6:3364:2::/128
+# This keeps the 6->4 test path deterministic.
+#
+# ipxlat config under test:
+# - xlat-prefix6 = 2001:db8:100::/40
+# - lowest-ipv6-mtu = 1280
+ipxlat_configure_topology()
+{
+ local ifindex
+ local dev_set_json
+
+ if ! ip link add "$IPXLAT_TRANSLATOR_DEV" type ipxlat; then
+ echo "ipxlat link kind unavailable; skipping"
+ exit "$ksft_skip"
+ fi
+ ip link set "$IPXLAT_TRANSLATOR_DEV" up
+ ifindex=$(cat /sys/class/net/"$IPXLAT_TRANSLATOR_DEV"/ifindex)
+ dev_set_json=$(ipxlat_build_dev_set_json "$ifindex")
+
+ if ! ipxlat_ynl --do dev-set --json "$dev_set_json" >/dev/null; then
+ echo "ipxlat dev-set failed"
+ exit "$ksft_fail"
+ fi
+
+ setup_ns NS4 NS6 || exit "$ksft_skip"
+
+ ip link add "$IPXLAT_VETH4_HOST" type veth peer name "$IPXLAT_VETH4_NS"
+ ip link add "$IPXLAT_VETH6_HOST" type veth peer name "$IPXLAT_VETH6_NS"
+ ip link set "$IPXLAT_VETH4_NS" netns "$NS4"
+ ip link set "$IPXLAT_VETH6_NS" netns "$NS6"
+
+ ip addr add "$IPXLAT_HOST4_ADDR/24" dev "$IPXLAT_VETH4_HOST"
+ ip -6 addr add "$IPXLAT_HOST6_ADDR/64" dev "$IPXLAT_VETH6_HOST"
+ ip link set "$IPXLAT_VETH4_HOST" up
+ ip link set "$IPXLAT_VETH6_HOST" up
+
+ ip netns exec "$NS4" ip addr add "$IPXLAT_NS4_ADDR/24" \
+ dev "$IPXLAT_VETH4_NS"
+ ip netns exec "$NS4" ip link set "$IPXLAT_VETH4_NS" up
+ ip netns exec "$NS4" ip route add default via "$IPXLAT_HOST4_ADDR"
+
+ ip netns exec "$NS6" ip -6 addr add "$IPXLAT_NS6_ADDR/64" \
+ dev "$IPXLAT_VETH6_NS"
+ ip netns exec "$NS6" ip -6 addr add "$IPXLAT_V6_REMOTE/128" \
+ dev "$IPXLAT_VETH6_NS"
+ ip netns exec "$NS6" ip -6 addr add "$IPXLAT_V6_NS6_SRC/128" \
+ dev "$IPXLAT_VETH6_NS"
+ ip netns exec "$NS6" ip link set "$IPXLAT_VETH6_NS" up
+ ip netns exec "$NS6" ip -6 route add default via "$IPXLAT_HOST6_ADDR"
+ ip netns exec "$NS6" ip -6 route replace "$IPXLAT_V6_NS4/128" \
+ via "$IPXLAT_HOST6_ADDR"
+ sleep 2
+
+ sysctl -qw net.ipv4.ip_forward=1
+ sysctl -qw net.ipv6.conf.all.forwarding=1
+
+ # 4->6 steering rule
+ ip route replace 192.0.2.0/24 dev "$IPXLAT_TRANSLATOR_DEV"
+ # Post-translation egress:
+ # IPv6 destinations in xlat-prefix6 leave toward NS6.
+ ip -6 route replace "$IPXLAT_XLAT_PREFIX6/$IPXLAT_XLAT_PREFIX6_LEN" \
+ dev "$IPXLAT_VETH6_HOST"
+ # 6->4 steering rule
+ ip -6 route replace "$IPXLAT_V6_NS4/128" dev "$IPXLAT_TRANSLATOR_DEV"
+
+ ip link set "$IPXLAT_VETH6_HOST" mtu 1280
+ ip netns exec "$NS6" ip link set "$IPXLAT_VETH6_NS" mtu 1280
+}
+
+ipxlat_setup_env()
+{
+ ipxlat_require_root
+ ipxlat_require_tools
+ ipxlat_cleanup
+
+ ipxlat_configure_topology
+}
+
+ipxlat_run_iperf()
+{
+ local srv_ns="$1"
+ local cli_ns="$2"
+ local dst="$3"
+ local port="$4"
+ local -a args=()
+ local client_rc
+ local server_rc
+ local spid
+ local idx
+
+ for ((idx = 5; idx <= $#; idx++)); do
+ args+=("${!idx}")
+ done
+
+ ip netns exec "$srv_ns" timeout "$IPXLAT_IPERF_TIMEOUT" \
+ iperf3 -s -1 -p "$port" >/dev/null 2>&1 &
+ spid=$!
+ sleep 0.2
+
+ ip netns exec "$cli_ns" timeout "$IPXLAT_IPERF_TIMEOUT" \
+ iperf3 -c "$dst" -p "$port" "${args[@]}" >/dev/null 2>&1
+
+ client_rc=$?
+ if [[ $client_rc -ne 0 ]]; then
+ kill "$spid" >/dev/null 2>&1 || true
+ fi
+
+ wait "$spid" >/dev/null 2>&1
+ server_rc=$?
+
+ ((client_rc != 0)) && return "$client_rc"
+ return "$server_rc"
+}
+
+ipxlat_capture_pkts()
+{
+ local ns="$1"
+ local filter="$2"
+ local expect_pkts="$3"
+ local timeout_s="$4"
+ local cap_goal
+ local cap_pid
+ local rc
+ local trigger_rc
+
+ shift 4
+
+ cap_goal=1
+ [[ $expect_pkts -gt 0 ]] && cap_goal=$expect_pkts
+
+ ip netns exec "$ns" timeout "$timeout_s" \
+ tcpdump -nni any -c "$cap_goal" \
+ "$filter" >/dev/null 2>&1 &
+ cap_pid=$!
+ sleep 0.2
+
+ "$@"
+ trigger_rc=$?
+ wait "$cap_pid" >/dev/null 2>&1
+ rc=$?
+
+ if [[ $trigger_rc -ne 0 ]]; then
+ return "$trigger_rc"
+ fi
+
+ if [[ $expect_pkts -eq 0 ]]; then
+ [[ $rc -eq 124 ]]
+ else
+ [[ $rc -eq 0 ]]
+ fi
+}
diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_udp4_zero_csum_send.c b/tools/testing/selftests/net/ipxlat/ipxlat_udp4_zero_csum_send.c
new file mode 100644
index 000000000000..ef9f07f8d699
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/ipxlat_udp4_zero_csum_send.c
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <linux/ip.h>
+#include <linux/udp.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+static uint16_t iphdr_csum(const void *buf, size_t len)
+{
+ const uint16_t *p = buf;
+ uint32_t sum = 0;
+
+ while (len > 1) {
+ sum += *p++;
+ len -= 2;
+ }
+ if (len)
+ sum += *(const uint8_t *)p;
+
+ while (sum >> 16)
+ sum = (sum & 0xffff) + (sum >> 16);
+
+ return (uint16_t)~sum;
+}
+
+int main(int argc, char **argv)
+{
+ static const char payload[] = "ipxlat-zero-udp-csum";
+ struct sockaddr_in dst = {};
+ struct {
+ struct iphdr ip;
+ struct udphdr udp;
+ char payload[sizeof(payload)];
+ } pkt = {};
+ in_addr_t saddr, daddr;
+ unsigned long dport_ul;
+ socklen_t dst_len;
+ ssize_t n;
+ int one = 1;
+ int fd;
+
+ if (argc != 4) {
+ fprintf(stderr, "usage: %s <src4> <dst4> <dport>\n", argv[0]);
+ return 2;
+ }
+
+ if (!inet_pton(AF_INET, argv[1], &saddr) ||
+ !inet_pton(AF_INET, argv[2], &daddr)) {
+ fprintf(stderr, "invalid IPv4 address\n");
+ return 2;
+ }
+
+ errno = 0;
+ dport_ul = strtoul(argv[3], NULL, 10);
+ if (errno || dport_ul > 65535) {
+ fprintf(stderr, "invalid UDP port\n");
+ return 2;
+ }
+
+ fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
+ if (fd < 0) {
+ perror("socket");
+ return 1;
+ }
+
+ if (setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &one, sizeof(one)) < 0) {
+ perror("setsockopt(IP_HDRINCL)");
+ close(fd);
+ return 1;
+ }
+
+ pkt.ip.version = 4;
+ pkt.ip.ihl = 5;
+ pkt.ip.ttl = 64;
+ pkt.ip.protocol = IPPROTO_UDP;
+ pkt.ip.tot_len = htons(sizeof(pkt));
+ pkt.ip.id = htons(1);
+ pkt.ip.frag_off = 0;
+ pkt.ip.saddr = saddr;
+ pkt.ip.daddr = daddr;
+ pkt.ip.check = iphdr_csum(&pkt.ip, sizeof(pkt.ip));
+
+ pkt.udp.source = htons(4242);
+ pkt.udp.dest = htons((uint16_t)dport_ul);
+ pkt.udp.len = htons(sizeof(pkt.udp) + sizeof(payload));
+ pkt.udp.check = 0;
+
+ memcpy(pkt.payload, payload, sizeof(payload));
+
+ dst.sin_family = AF_INET;
+ dst.sin_port = pkt.udp.dest;
+ dst.sin_addr.s_addr = daddr;
+ dst_len = sizeof(dst);
+
+ n = sendto(fd, &pkt, sizeof(pkt), 0, (struct sockaddr *)&dst, dst_len);
+ if (n != (ssize_t)sizeof(pkt)) {
+ perror("sendto");
+ close(fd);
+ return 1;
+ }
+
+ close(fd);
+ return 0;
+}
--
2.53.0
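
Note on the selftest sender above: iphdr_csum() computes the usual RFC 1071
ones'-complement sum over the IPv4 header. The Python sketch below does the
same computation in network byte order and shows the property a receiver
relies on: once the checksum field is filled in, re-summing the header
yields zero. The helper names inet_csum and build_ipv4_header are
hypothetical and are not part of the patch. The field values (TTL 64, ID 1,
UDP, no fragmentation) simply mirror the C program.

```python
import socket
import struct


def inet_csum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over data (network byte order)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int,
                      ttl: int = 64, ident: int = 1) -> bytes:
    """Build a 20-byte IPv4/UDP header with a valid header checksum."""
    hdr = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                 # version 4, ihl 5
        0,                    # tos
        20 + payload_len,     # tot_len
        ident,                # id
        0,                    # frag_off
        ttl,
        socket.IPPROTO_UDP,
        0,                    # checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    csum = inet_csum(hdr)
    return hdr[:10] + struct.pack("!H", csum) + hdr[12:]


if __name__ == "__main__":
    header = build_ipv4_header("198.51.100.1", "192.0.2.1", payload_len=29)
    # A header carrying a correct checksum re-sums to zero.
    print(len(header), hex(inet_csum(header)))
```

The UDP checksum is a separate field that the selftest deliberately leaves
at zero. This sketch covers only the IPv4 header checksum.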
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (13 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 14/15] selftests: net: add ipxlat coverage Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 22:11 ` Jonathan Corbet
14 siblings, 1 reply; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Antonio Quartulli, Ralf Lici, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Jonathan Corbet, Shuah Khan, linux-doc, linux-kernel
From: Daniel Gröber <dxld@darkboxed.org>
Add user and reviewer documentation for the ipxlat virtual netdevice in
Documentation/networking/ipxlat.rst.
The document describes the datapath model, stateless IPv4/IPv6 address
translation rules, ICMP handling, control-plane configuration, and test
topology assumptions. It also records the intended runtime configuration
contract and current behavior limits so deployment expectations are
clear.
Signed-off-by: Daniel Gröber <dxld@darkboxed.org>
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
Documentation/networking/ipxlat.rst | 190 ++++++++++++++++++++++++++++
1 file changed, 190 insertions(+)
create mode 100644 Documentation/networking/ipxlat.rst
diff --git a/Documentation/networking/ipxlat.rst b/Documentation/networking/ipxlat.rst
new file mode 100644
index 000000000000..5a0ad02c05be
--- /dev/null
+++ b/Documentation/networking/ipxlat.rst
@@ -0,0 +1,190 @@
+.. SPDX-License-Identifier: GPL-2.0+
+.. Copyright (C) 2026 Daniel Gröber <dxld@debian.org>
+
+==============================================
+IPXLAT - IPv6<>IPv4 IP/ICMP Translation (SIIT)
+==============================================
+
+ipxlat (``CONFIG_IPXLAT=y``) provides a virtual netdevice implementing
+stateless IP packet translation between IP versions 6 and 4. This is a
+building block for establishing layer 3 connectivity between otherwise
+uncommunicative IPv6-only and/or IPv4-only networks.
+
+
+Creation and Configuration Parameters
+=====================================
+
+An ipxlat netdevice can be created and configured using YNL like so::
+
+ $ ip link add siit0 type ipxlat
+
+ $ IID=$(cat /sys/class/net/siit0/ifindex)
+
+ $ ADDR_HEX=$(python3 -c 'import ipaddress,sys; \
+ print(ipaddress.IPv6Address(sys.argv[1]).packed.hex())' \
+ 64:ff9b:: | tee /dev/stderr)
+ 0064ff9b000000000000000000000000
+
+ $ ./tools/net/ynl/pyynl/cli.py --family ipxlat --json '{"ifindex": '"$IID"',
+ "config": {"xlat-prefix6": "'"$ADDR_HEX"'", "prefix-len": 96} }'
+
+(TODO: Once implemented) An ipxlat netdevice can be configured using
+iproute2::
+
+ $ ip link add siit0 type ipxlat [ OPTIONS ]
+
+ # where OPTIONS can include (TODO: iproute2 patch):
+ #
+ # prefix ADDR (default 64:ff9b::/96)
+ #
+ # lowest-ipv6-mtu MTU (default 1280)
+
+
+Introduction to Packet-level IPv6<>IPv4 Translation
+===================================================
+
+Translatable packets delivered into an ipxlat device as either of the IP
+protocol versions loop-back as the other. Untranslatable packets are
+rejected with ICMP errors of the same IP version as appropriate or dropped
+silently if required by RFC-SIIT_.
+
+.. _RFC-SIIT: https://datatracker.ietf.org/doc/html/rfc7915
+
+Supported upper layer protocols (TCP/UDP/ICMP) have their checksums
+recomputed as-needed as part of translation. Unsupported IP protocols
+(IPPROTO\_*) are passed through unmodified. This will make them fail at the
+receiver except in special cases.
+
+Differences in IP layer semantic concerns are handled using several
+different strategies. Here we only give a high-level summary of the
+areas of most friction:
+ Fragmentation approach, Path MTU Discovery (PMTUD), IP Options and Extension
+ Headers.
+
+**Fragmentation Approach** (v4: on-path vs v6: end-to-end) is smoothed over by:
+ | 4->6: Fragmenting (DF=0) IPv4 packets when needed. See "lowest-ipv6-mtu".
+ | 6->4: Using on-path frag. down the line for v4 pkts smaller than 1260.
+ Details are tedious, check RFC-SIIT_.
+
+**PMTUD** is maintained by recalculating advised MTU values in ICMP
+PKT_TOO_BIG and FRAG_NEEDED messages as they're being translated, taking
+into account the necessary header re-sizing and the post-translation
+nexthop MTU in the main routing table.
+
+**IP Options and IPv6 Extension Headers** except the Fragment Header are
+dropped or ignored except where more specific behaviour is specified in
+RFC-SIIT_.
+
+
+Address Translation
+-------------------
+
+The ipxlat address translation algorithm is stateless. Per RFC-ADDR_, all
+possible IPv4 addresses are mapped one-to-one into the translation prefix,
+optionally including a non-standard "suffix". See `RFC-ADDR Section 2.2
+<https://datatracker.ietf.org/doc/html/rfc6052#section-2.2>`_.
+
+.. _RFC-ADDR: https://datatracker.ietf.org/doc/html/rfc6052
+
+IPv6 addresses outside this prefix are rejected with ICMPv6 errors, with
+the notable exception of ICMPv6 errors originating from untranslatable
+source addresses. These are translated to be sourced from the IPv4 Dummy
+Address ``192.0.0.8`` (per I-D-dummy_) instead to maintain IPv4 traceroute
+visibility.
+
+.. _I-D-dummy:
+ https://datatracker.ietf.org/doc/draft-ietf-v6ops-icmpext-xlat-v6only-source/
+
+In a basic bidirectional 6<>4 connectivity scenario this means IPv6 hosts
+must be addressed wholly from inside the translation prefix and per
+RFC-ADDR_. Plain vanilla SLAAC doesn't cut it here; static addressing or
+DHCPv6 is needed, unless we introduce statefulness (RFC-NAT64_) into
+the mix. See below on that.
+
+.. _RFC-NAT64: https://datatracker.ietf.org/doc/html/rfc6146
+
+
+Stateful Translation (NAT64)
+----------------------------
+
+Using NAT64 has several drawbacks; it's necessary only when your control
+over IPv4 or IPv6 addressing of hosts is limited.
+
+Using nftables we can turn a system into a stateful translator. For example
+to make the IPv4 internet reachable to an IPv6-only LAN having this system
+as its default route, further assuming we have an IPv4 default route and
+``192.0.2.1/32`` is routed to this system::
+
+ $ ip link add siit0 type ipxlat
+ $ ip link set dev siit0 up
+ $ ip route add 192.0.2.1/32 dev siit0
+ $ ip -6 route add 64:ff9b::/96 dev siit0
+ $ sysctl -w net.ipv4.conf.all.forwarding=1
+ $ sysctl -w net.ipv6.conf.all.forwarding=1
+ $ nft -f- <<EOF
+ table ip6 nat {
+ chain postrouting {
+ type nat hook postrouting priority filter; policy accept;
+ oifname "siit0" snat to 64:ff9b::c000:201 comment "::192.0.2.1"
+ }
+ }
+ table ip nat {
+ chain postrouting {
+ type nat hook postrouting priority filter; policy accept;
+ iifname "siit0" masquerade
+ }
+ }
+ EOF
+
+Note: Keep reading when replacing the 192.0.2.0/24 documentation
+placeholder with RFC 1918 "private IPv4" space.
+
+
+Translation Prefix Choice and Complications
+-------------------------------------------
+
+Several prefix sizes between /32 and /96 are supported by ipxlat. Using
+a /96 prefix is often convenient as it allows using the dotted quad IPv6
+notation, e.g. "64:ff9b::192.0.2.1". RFC-ADDR_ "3.3. Choice of Prefix for
+Stateless Translation Deployments" has more detailed recommendations.
+
+The "Well-Known Prefix" (WKP) 64:ff9b::/96, while a convenient and short
+choice for LANs, comes with some IETF baggage. As specified (at time of
+writing) addresses drawn from RFC 1918 "private IPv4" space "MUST NOT" be
+used with the WKP. While ipxlat does not enforce this, other network
+elements may.
+
+If I-D-WKP-1918_ makes it through the IETF process this complication for
+the cautious network engineer may disappear in the future.
+
+.. _I-D-WKP-1918:
+ https://datatracker.ietf.org/doc/draft-ietf-v6ops-nat64-wkp-1918/
+
+In the meantime the newer and more lax prefix allocated by RFC-LWKP_ or an
+entirely Network-Specific Prefix may be a better fit. We'd recommend using
+the checksum-neutral ``64:ff9b:1:fffe::/96`` prefix from the larger /48
+allocation.
+
+.. _RFC-LWKP: https://datatracker.ietf.org/doc/html/rfc8215
+
+
+RFC Considerations for Userspace
+--------------------------------
+
+- Per `RFC 7915
+ <https://datatracker.ietf.org/doc/html/rfc7915#section-4.5>`_,
+ ipxlat SHOULD drop UDPv4 zero checksum packets, yet we chose to always
+ recalculate checksums for unfragmented packets.
+
+ If you want your translator to follow the SHOULD add a netfilter rule
+ dropping such packets. For example using ``nft(8)`` syntax::
+
+ nft add rule ip filter postrouting -- oifkind ipxlat udp checksum 0 log drop
+
+- Per `RFC 6146
+ <https://datatracker.ietf.org/doc/html/rfc6146#section-3.4>`_,
+ Fragmented UDPv4 zero checksum recalculation by reassembly is not
+ supported.
+
+- I-D-dummy_: Adding a Node Identity Object for IPv4-side traceroute
+ disambiguation is not yet supported.
--
2.53.0
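
Note on the "Address Translation" and "Translation Prefix Choice" sections
of the document above: the /96 case of the RFC 6052 mapping is simple enough
to check by hand, and the Python sketch below does so with only the standard
ipaddress module. It is an illustration, not the driver's implementation.
The function names embed_ipv4, extract_ipv4 and is_checksum_neutral are
hypothetical, and only the /96 prefix length is handled (shorter prefixes
also need the RFC 6052 "u" octet, which is left out here).

```python
import ipaddress


def embed_ipv4(prefix96: str, addr4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in a /96 prefix (RFC 6052 section 2.2)."""
    net = ipaddress.IPv6Network(prefix96)
    if net.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = int(ipaddress.IPv4Address(addr4))
    return ipaddress.IPv6Address(int(net.network_address) | v4)


def extract_ipv4(prefix96: str, addr6: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 address from the low 32 bits of a /96 mapping."""
    net = ipaddress.IPv6Network(prefix96)
    if net.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v6 = ipaddress.IPv6Address(addr6)
    if v6 not in net:
        raise ValueError(f"{v6} is outside {net}")
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)


def is_checksum_neutral(prefix: str) -> bool:
    """True if the prefix's 16-bit words sum to ones'-complement zero."""
    packed = ipaddress.IPv6Network(prefix).network_address.packed
    total = sum(int.from_bytes(packed[i:i + 2], "big") for i in range(0, 16, 2))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total in (0, 0xFFFF)


if __name__ == "__main__":
    print(embed_ipv4("64:ff9b::/96", "192.0.2.1"))
    print(extract_ipv4("64:ff9b::/96", "64:ff9b::c000:201"))
    print(is_checksum_neutral("64:ff9b:1:fffe::/96"))
```

With a checksum-neutral prefix, the pseudo-header contribution of a
translated address equals that of the original IPv4 address, which is why
the document recommends such a prefix.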
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide
2026-03-19 15:12 ` [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide Ralf Lici
@ 2026-03-19 22:11 ` Jonathan Corbet
2026-03-24 9:55 ` Ralf Lici
0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Corbet @ 2026-03-19 22:11 UTC (permalink / raw)
To: Ralf Lici, netdev
Cc: Daniel Gröber, Antonio Quartulli, Ralf Lici, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Shuah Khan, linux-doc, linux-kernel
Ralf Lici <ralf@mandelbit.com> writes:
> From: Daniel Gröber <dxld@darkboxed.org>
>
> Add user and reviewer documentation for the ipxlat virtual netdevice in
> Documentation/networking/ipxlat.rst.
>
> The document describes the datapath model, stateless IPv4/IPv6 address
> translation rules, ICMP handling, control-plane configuration, and test
> topology assumptions. It also records the intended runtime configuration
> contract and current behavior limits so deployment expectations are
> clear.
>
> Signed-off-by: Daniel Gröber <dxld@darkboxed.org>
> Signed-off-by: Ralf Lici <ralf@mandelbit.com>
> ---
> Documentation/networking/ipxlat.rst | 190 ++++++++++++++++++++++++++++
> 1 file changed, 190 insertions(+)
> create mode 100644 Documentation/networking/ipxlat.rst
You need to add this new file to Documentation/networking/index.rst or
it won't be included in the build (and you'll get a warning).
Thanks,
jon
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide
2026-03-19 22:11 ` Jonathan Corbet
@ 2026-03-24 9:55 ` Ralf Lici
0 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-24 9:55 UTC (permalink / raw)
To: Jonathan Corbet, netdev
Cc: Daniel Gröber, Antonio Quartulli, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Shuah Khan, linux-doc, linux-kernel
On 3/19/26 23:11, Jonathan Corbet wrote:
> Ralf Lici <ralf@mandelbit.com> writes:
>
>> From: Daniel Gröber <dxld@darkboxed.org>
>>
>> Add user and reviewer documentation for the ipxlat virtual netdevice in
>> Documentation/networking/ipxlat.rst.
>>
>> The document describes the datapath model, stateless IPv4/IPv6 address
>> translation rules, ICMP handling, control-plane configuration, and test
>> topology assumptions. It also records the intended runtime configuration
>> contract and current behavior limits so deployment expectations are
>> clear.
>>
>> Signed-off-by: Daniel Gröber <dxld@darkboxed.org>
>> Signed-off-by: Ralf Lici <ralf@mandelbit.com>
>> ---
>> Documentation/networking/ipxlat.rst | 190 ++++++++++++++++++++++++++++
>> 1 file changed, 190 insertions(+)
>> create mode 100644 Documentation/networking/ipxlat.rst
>
> You need to add this new file to Documentation/networking/index.rst or
> it won't be included in the build (and you'll get a warning).
>
> Thanks,
>
> jon
Hi Jon,
Thanks for the heads-up.
I’ve fixed this for the next revision. While rechecking with 'make
SPHINXDIRS=networking htmldocs', I also found and fixed a couple of
'ipxlat.rst' issues reported by Sphinx.
Thanks,
--
Ralf Lici
Mandelbit Srl
^ permalink raw reply [flat|nested] 18+ messages in thread