* [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device
@ 2026-03-19 15:12 Ralf Lici
2026-03-19 15:12 ` [RFC net-next 01/15] drivers/net: add ipxlat netdevice skeleton and build plumbing Ralf Lici
` (14 more replies)
0 siblings, 15 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC
To: netdev
Cc: Daniel Gröber, Ralf Lici, Andrew Lunn, Antonio Quartulli,
David S. Miller, Eric Dumazet, Jakub Kicinski, linux-kernel,
Paolo Abeni
Hi all,
this RFC series introduces ipxlat, a virtual netdevice for stateless
packet translation between IPv6 and IPv4.
This stateless IP/ICMP translation (SIIT, RFC 7915) device is a building
block ultimately allowing suitably configured Linux systems to cover all
IPv6<>IPv4 connectivity scenarios outlined in RFC 6144, "Framework for
IPv4/IPv6 Translation".
While the packet translation function implemented in ipxlat itself is
stateless, building stateful NAT64 translators is easy in combination
with a sandwich of simple nft SNAT and MASQUERADE rules. Even SIIT-DC
(RFC 7755 / 7756) ER/BR functions including EAMT (RFC 7757) are thought
to be possible with suitable nft/iptables configuration, but this needs
further testing.
The series contains patches covering driver core, translation paths,
netlink API, selftests and documentation.
See Documentation/networking/ipxlat.rst for more details.
== Architecture ==
ipxlat sits at a boundary between two kernel models. It is exposed as a
netdevice, so it has device semantics such as MTU and netdev statistics.
However, most of its processing falls within protocol translation logic.
The implementation therefore uses netdevice hooks for integration and
lifecycle, while translation behavior follows RFC rules and reuses
existing IP stack helpers for routing, fragmentation and checksum
handling.
Feedback on the netdevice integration model is welcome, yet this series
intentionally keeps scope limited to a self-contained module to make
review and validation tractable.
ipxlat devices are created and destroyed via rtnl link operations.
Per-device translation parameters are configured through a generic
netlink family named ipxlat.
No generic networking core behavior is changed.
== RFCs ==
The ipxlat packet translation code considers:
- RFC 7915 - Stateless IP/ICMP translation (SIIT) behavior
- RFC 6052 - Address mapping for xlat-prefix lengths /32, /40, /48, /56,
/64 and /96
- RFC 6791 - Considered, although we use standard ICMP source-address
selection
- RFC 4884 - Translation painstakingly handles ICMP extensions
- RFC 5837 - Interface Information Objects from RFC 6791 are not
implemented in this series and are planned as follow-up work
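As an illustration of the RFC 6052 mapping listed above, here is a
minimal userspace sketch of embedding an IPv4 address into a translation
prefix. The helper name embed_ipv4 and the layout table are hypothetical
and exist only for this example; they are not the driver's API. The byte
positions follow RFC 6052 Section 2.2, where byte 8 (the "u" octet) is
skipped.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte positions of the four IPv4 octets inside the 16-byte IPv6
 * address for each RFC 6052 prefix length. Byte 8 is never used.
 */
struct embed_layout {
	unsigned int len;	/* prefix length in bits */
	unsigned int pos[4];	/* destination byte index per IPv4 octet */
};

static const struct embed_layout embed_layouts[] = {
	{ 32, { 4, 5, 6, 7 } },
	{ 40, { 5, 6, 7, 9 } },
	{ 48, { 6, 7, 9, 10 } },
	{ 56, { 7, 9, 10, 11 } },
	{ 64, { 9, 10, 11, 12 } },
	{ 96, { 12, 13, 14, 15 } },
};

/* Hypothetical helper: copy the prefix, then overwrite the four bytes
 * that carry the IPv4 address. Returns 0 on success, -1 if the prefix
 * length is not one of the six lengths above.
 */
static int embed_ipv4(const uint8_t prefix[16], unsigned int len,
		      const uint8_t addr4[4], uint8_t out[16])
{
	size_t i, j;

	assert(prefix && addr4 && out);

	for (i = 0; i < sizeof(embed_layouts) / sizeof(embed_layouts[0]); i++) {
		if (embed_layouts[i].len != len)
			continue;
		memcpy(out, prefix, 16);
		for (j = 0; j < 4; j++)
			out[embed_layouts[i].pos[j]] = addr4[j];
		return 0;
	}

	return -1;
}
```

With 64:ff9b::/96 and 192.0.2.33 this yields 64:ff9b::c000:221; with
2001:db8:100::/40 it yields 2001:db8:1c0:2:21::, matching the examples
in RFC 6052 Section 2.4.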
== Implementation ==
We enforce a strict processing contract: packet validation is done once,
then translation runs on that validated layout. When translation cannot
continue, we either drop the packet or switch to the ICMP error emission
path.
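The contract above can be sketched in userspace C roughly as follows.
This is only an illustration under stated assumptions: the names
validate_v4, process_v4 and struct pkt_meta are hypothetical, the checks
are a toy subset, and the length test uses the buffer length rather than
the IPv4 total-length field. The one error case shown (DF set and the
translated packet exceeding the IPv6 MTU) is taken from RFC 7915, not
from the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum verdict { VERDICT_FORWARD, VERDICT_DROP, VERDICT_ICMP_ERROR };

/* Hypothetical per-packet metadata filled in by validation. */
struct pkt_meta {
	size_t l4_off;		/* where the L4 header starts */
	bool emit_icmp_err;	/* failure should be reported, not silent */
};

/* Stage 1: validate once and record the layout. */
static int validate_v4(const uint8_t *pkt, size_t len, size_t mtu6,
		       struct pkt_meta *m)
{
	size_t ihl;

	m->l4_off = 0;
	m->emit_icmp_err = false;

	if (len < 20 || (pkt[0] >> 4) != 4)
		return -1;		/* malformed: drop silently */
	ihl = (size_t)(pkt[0] & 0x0f) * 4;
	if (ihl < 20 || ihl > len)
		return -1;

	/* DF set and the translated packet (40-byte IPv6 header) would
	 * not fit: report an error instead of dropping silently.
	 */
	if ((pkt[6] & 0x40) && len - ihl + 40 > mtu6) {
		m->emit_icmp_err = true;
		return -1;
	}

	m->l4_off = ihl;
	return 0;
}

/* Stage 2 would translate using the cached layout without re-parsing. */
static enum verdict process_v4(const uint8_t *pkt, size_t len, size_t mtu6)
{
	struct pkt_meta m;

	assert(pkt);
	if (validate_v4(pkt, len, mtu6, &m) < 0)
		return m.emit_icmp_err ? VERDICT_ICMP_ERROR : VERDICT_DROP;

	/* Translation would run here, trusting m.l4_off as-is. */
	return VERDICT_FORWARD;
}
```

The point of the split is that every later stage can rely on the
offsets recorded by the single validation pass.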
Control-plane updates are serialized, while the data path reads
configuration locklessly to keep per-packet overhead low.
During live reconfiguration, readers may transiently observe mixed old
and new values; this may cause a small number of packet drops while
configuration is being changed.
This tradeoff is intentional to keep the fast path simple and
lightweight.
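To make the tradeoff concrete, here is a small userspace sketch using
C11 atomics. It is an assumption-laden model, not the driver's code: the
struct and function names are hypothetical, a spin flag stands in for
the control-plane lock, and the kernel primitives actually used by the
series are not shown in this cover letter. Writers are serialized, while
readers load each field independently, so a reader racing with a writer
can see one new field and one old field.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical configuration block shared by writers and readers. */
struct xlat_cfg {
	_Atomic uint8_t prefix_len;
	_Atomic uint32_t lowest_ipv6_mtu;
	atomic_flag writer_lock;	/* stand-in for the control-plane lock */
};

struct cfg_snapshot {
	uint8_t prefix_len;
	uint32_t lowest_ipv6_mtu;
};

/* Control plane: updates are serialized against each other only. */
static void cfg_update(struct xlat_cfg *cfg, uint8_t len, uint32_t mtu)
{
	assert(mtu >= 1280);	/* IPv6 minimum link MTU */

	while (atomic_flag_test_and_set_explicit(&cfg->writer_lock,
						 memory_order_acquire))
		;	/* spin until the previous writer is done */

	atomic_store_explicit(&cfg->prefix_len, len, memory_order_relaxed);
	/* A reader running here sees the new length with the old MTU. */
	atomic_store_explicit(&cfg->lowest_ipv6_mtu, mtu,
			      memory_order_relaxed);

	atomic_flag_clear_explicit(&cfg->writer_lock, memory_order_release);
}

/* Data path: no lock taken, each field is read on its own. */
static struct cfg_snapshot cfg_read(struct xlat_cfg *cfg)
{
	struct cfg_snapshot s;

	s.prefix_len = atomic_load_explicit(&cfg->prefix_len,
					    memory_order_relaxed);
	s.lowest_ipv6_mtu = atomic_load_explicit(&cfg->lowest_ipv6_mtu,
						 memory_order_relaxed);
	return s;
}
```

Each individual field is always read whole; only the combination of
fields can be inconsistent for a short time during an update.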
== Selftests ==
Selftests are added under tools/testing/selftests/net/ipxlat and cover
ICMP, TCP and UDP translation in both directions, large-packet and
fragmentation-sensitive paths, ICMP error translation and PMTUD-related
emission paths.
== Points of Discussion ==
- Tighter stack integration?
== Work Planned for v1 ==
- icmp: Simplify FRAG_NEEDED / PKT_TOOBIG MTU calculation.
- translation: Prevent skb loops without TTL/HLIM decrement?
- netdevice: Decide on hardcoding MTU = 0xffff - $xlat_overhead
- UDPv4 defrag and csum recalc for NAT64 (RFC 6146 Sec 3.4.) "For
incoming IPv4 packets carrying UDP packets with a zero checksum ...
MUST calculate the checksum"
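For reference on the last item, the checksum that has to be synthesized
is the UDP checksum over the IPv6 pseudo-header (RFC 8200 Section 8.1).
The following userspace sketch shows the arithmetic only; the function
names are hypothetical, the input is assumed to be a complete,
defragmented UDP datagram whose checksum field is zero, and it says
nothing about how the planned kernel code will be structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the big-endian 16-bit words of buf to a running sum. */
static uint32_t csum_add(uint32_t sum, const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* zero-pad odd byte */
	return sum;
}

/* Fold the carries back in (one's-complement addition). */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* udp points at the UDP header followed by the payload; bytes 6 and 7
 * (the checksum field) must be zero on entry.
 */
static uint16_t udp6_checksum(const uint8_t src[16], const uint8_t dst[16],
			      const uint8_t *udp, size_t udp_len)
{
	uint32_t sum = 0;
	uint16_t csum;

	assert(udp_len >= 8 && udp_len <= 0xffff);

	sum = csum_add(sum, src, 16);		/* pseudo-header: source */
	sum = csum_add(sum, dst, 16);		/* pseudo-header: destination */
	sum += (uint32_t)udp_len;		/* upper-layer packet length */
	sum += 17;				/* next header: UDP */
	sum = csum_add(sum, udp, udp_len);

	csum = (uint16_t)~csum_fold(sum);
	return csum ? csum : 0xffff;		/* zero is sent as all ones */
}
```

A receiver can verify the result by summing the pseudo-header and the
datagram including the stored checksum; the folded sum is then 0xffff.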
== Acknowledgements ==
The ipxlat translation code is based on the Jool project in order to
benefit from years of accumulated experience and its golden-packet
test-suite.
Thanks to Jool's Principal Author, Alberto Leiva Popper, for developing
and maintaining Jool since IPv6 translation was last in vogue, and for
writing the initial "joolif" netdevice prototype that our work was able
to start from.
Thanks to NLnet's NGI0 Core Fund for supporting development of the
ipxlat driver.
Thanks for your review,
Ralf Lici
Mandelbit SRL
---
Daniel Gröber (1):
Documentation: networking: add ipxlat translator guide
Ralf Lici (14):
drivers/net: add ipxlat netdevice skeleton and build plumbing
ipxlat: add RFC 6052 address conversion helpers
ipxlat: add packet metadata control block helpers
ipxlat: add IPv4 packet validation path
ipxlat: add IPv6 packet validation path
ipxlat: add transport checksum and offload helpers
ipxlat: add 4to6 and 6to4 TCP/UDP translation helpers
ipxlat: add translation engine and dispatch core
ipxlat: emit translator-generated ICMP errors on drop
ipxlat: add 4to6 pre-fragmentation path
ipxlat: add ICMP informational translation paths
ipxlat: add ICMP error translation and quoted-inner handling
ipxlat: add netlink control plane and uapi
selftests: net: add ipxlat coverage
Documentation/netlink/specs/ipxlat.yaml | 97 +++
Documentation/networking/ipxlat.rst | 190 +++++
drivers/net/Kconfig | 13 +
drivers/net/Makefile | 1 +
drivers/net/ipxlat/Makefile | 17 +
drivers/net/ipxlat/address.c | 132 ++++
drivers/net/ipxlat/address.h | 59 ++
drivers/net/ipxlat/dispatch.c | 263 ++++++
drivers/net/ipxlat/dispatch.h | 78 ++
drivers/net/ipxlat/icmp.h | 45 ++
drivers/net/ipxlat/icmp_46.c | 552 +++++++++++++
drivers/net/ipxlat/icmp_64.c | 531 +++++++++++++
drivers/net/ipxlat/ipxlpriv.h | 53 ++
drivers/net/ipxlat/main.c | 148 ++++
drivers/net/ipxlat/main.h | 27 +
drivers/net/ipxlat/netlink-gen.c | 71 ++
drivers/net/ipxlat/netlink-gen.h | 31 +
drivers/net/ipxlat/netlink.c | 348 ++++++++
drivers/net/ipxlat/netlink.h | 27 +
drivers/net/ipxlat/packet.c | 747 ++++++++++++++++++
drivers/net/ipxlat/packet.h | 166 ++++
drivers/net/ipxlat/translate_46.c | 256 ++++++
drivers/net/ipxlat/translate_46.h | 84 ++
drivers/net/ipxlat/translate_64.c | 206 +++++
drivers/net/ipxlat/translate_64.h | 56 ++
drivers/net/ipxlat/transport.c | 401 ++++++++++
drivers/net/ipxlat/transport.h | 122 +++
include/uapi/linux/ipxlat.h | 48 ++
tools/testing/selftests/net/ipxlat/.gitignore | 1 +
tools/testing/selftests/net/ipxlat/Makefile | 25 +
.../selftests/net/ipxlat/ipxlat_data.sh | 70 ++
.../selftests/net/ipxlat/ipxlat_frag.sh | 70 ++
.../selftests/net/ipxlat/ipxlat_icmp_err.sh | 54 ++
.../selftests/net/ipxlat/ipxlat_lib.sh | 273 +++++++
.../net/ipxlat/ipxlat_udp4_zero_csum_send.c | 119 +++
35 files changed, 5381 insertions(+)
create mode 100644 Documentation/netlink/specs/ipxlat.yaml
create mode 100644 Documentation/networking/ipxlat.rst
create mode 100644 drivers/net/ipxlat/Makefile
create mode 100644 drivers/net/ipxlat/address.c
create mode 100644 drivers/net/ipxlat/address.h
create mode 100644 drivers/net/ipxlat/dispatch.c
create mode 100644 drivers/net/ipxlat/dispatch.h
create mode 100644 drivers/net/ipxlat/icmp.h
create mode 100644 drivers/net/ipxlat/icmp_46.c
create mode 100644 drivers/net/ipxlat/icmp_64.c
create mode 100644 drivers/net/ipxlat/ipxlpriv.h
create mode 100644 drivers/net/ipxlat/main.c
create mode 100644 drivers/net/ipxlat/main.h
create mode 100644 drivers/net/ipxlat/netlink-gen.c
create mode 100644 drivers/net/ipxlat/netlink-gen.h
create mode 100644 drivers/net/ipxlat/netlink.c
create mode 100644 drivers/net/ipxlat/netlink.h
create mode 100644 drivers/net/ipxlat/packet.c
create mode 100644 drivers/net/ipxlat/packet.h
create mode 100644 drivers/net/ipxlat/translate_46.c
create mode 100644 drivers/net/ipxlat/translate_46.h
create mode 100644 drivers/net/ipxlat/translate_64.c
create mode 100644 drivers/net/ipxlat/translate_64.h
create mode 100644 drivers/net/ipxlat/transport.c
create mode 100644 drivers/net/ipxlat/transport.h
create mode 100644 include/uapi/linux/ipxlat.h
create mode 100644 tools/testing/selftests/net/ipxlat/.gitignore
create mode 100644 tools/testing/selftests/net/ipxlat/Makefile
create mode 100755 tools/testing/selftests/net/ipxlat/ipxlat_data.sh
create mode 100755 tools/testing/selftests/net/ipxlat/ipxlat_frag.sh
create mode 100755 tools/testing/selftests/net/ipxlat/ipxlat_icmp_err.sh
create mode 100644 tools/testing/selftests/net/ipxlat/ipxlat_lib.sh
create mode 100644 tools/testing/selftests/net/ipxlat/ipxlat_udp4_zero_csum_send.c
--
2.53.0
* [RFC net-next 01/15] drivers/net: add ipxlat netdevice skeleton and build plumbing
From: Ralf Lici @ 2026-03-19 15:12 UTC
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
ipxlat is a virtual netdevice implementing stateless IPv4/IPv6
translation (SIIT). The translation model follows RFC 7915 behavior and
RFC 6052 address embedding rules.
The netdevice form is intentional: it provides per-instance lifecycle,
MTU/statistics semantics and explicit routing integration, so translated
traffic can be steered through a dedicated device and configured per
namespace.
This series targets ipxlat as a reusable kernel building block for SIIT
deployments and for NAT64-style setups when combined with nftables rules
supplied by userspace policy.
This first patch introduces only the driver scaffolding:
- drivers/net/ipxlat/ directory and build integration
- Kconfig/Makefile entries
- basic private structures and defaults
- rtnl_link_ops and netdevice skeleton needed to create/register links
No translation logic is added in this patch yet. Follow-up patches add
packet validation, transport/ICMP translation, error handling,
fragmentation handling, generic netlink control plane, selftests and
documentation.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/Kconfig | 13 ++++
drivers/net/Makefile | 1 +
drivers/net/ipxlat/Makefile | 7 ++
drivers/net/ipxlat/ipxlpriv.h | 53 +++++++++++++
drivers/net/ipxlat/main.c | 137 ++++++++++++++++++++++++++++++++++
drivers/net/ipxlat/main.h | 27 +++++++
6 files changed, 238 insertions(+)
create mode 100644 drivers/net/ipxlat/Makefile
create mode 100644 drivers/net/ipxlat/ipxlpriv.h
create mode 100644 drivers/net/ipxlat/main.c
create mode 100644 drivers/net/ipxlat/main.h
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index b2fd90466bab..a3b28f294d95 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -117,6 +117,19 @@ config OVPN
This module enhances the performance of the OpenVPN userspace software
by offloading the data channel processing to kernelspace.
+config IPXLAT
+ tristate "IPv6<>IPv4 packet translation virtual device (SIIT)"
+ depends on NET && INET && IPV6
+ help
+ Virtual network device driver for Stateless IP/ICMP Packet
+ Translation (RFC 7915). Useful for IPv6-focused networks,
+ particularly NAT64, SIIT-DC and 464XLAT network architectures.
+
+ See also <file:Documentation/networking/ipxlat.rst>.
+
+ To compile this driver as a module, choose M here: the module will be
+ called ipxlat.
+
config EQUALIZER
tristate "EQL (serial line load balancing) support"
help
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 5b01215f6829..4f982c9e6585 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_NET) += loopback.o
obj-$(CONFIG_NETDEV_LEGACY_INIT) += Space.o
obj-$(CONFIG_NETCONSOLE) += netconsole.o
obj-$(CONFIG_NETKIT) += netkit.o
+obj-$(CONFIG_IPXLAT) += ipxlat/
obj-y += phy/
obj-y += pse-pd/
obj-y += mdio/
diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile
new file mode 100644
index 000000000000..bd48c2700bf5
--- /dev/null
+++ b/drivers/net/ipxlat/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+
+obj-$(CONFIG_IPXLAT) := ipxlat.o
+
+ipxlat-objs += main.o
diff --git a/drivers/net/ipxlat/ipxlpriv.h b/drivers/net/ipxlat/ipxlpriv.h
new file mode 100644
index 000000000000..5027d8377bdd
--- /dev/null
+++ b/drivers/net/ipxlat/ipxlpriv.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_IPXLPRIV_H_
+#define _NET_IPXLAT_IPXLPRIV_H_
+
+#include <linux/mutex.h>
+#include <linux/netdevice.h>
+#include <net/gro_cells.h>
+
+/**
+ * struct ipv6_prefix - IPv6 prefix definition
+ * @addr: prefix address (host bits may be non-zero)
+ * @len: prefix length in bits
+ */
+struct ipv6_prefix {
+ struct in6_addr addr;
+ u8 len;
+};
+
+/**
+ * struct ipxlat_priv - private state stored in netdev priv area
+ * @dev: owning netdevice
+ * @xlat_prefix6: RFC 6052 prefix used for stateless v4<->v6 mapping
+ * @lowest_ipv6_mtu: LIM threshold used by 4->6 pre-fragment planning
+ * @cfg_lock: serializes control-plane updates
+ * @gro_cells: receive-side reinjection queue used by forward path
+ *
+ * Datapath reads config without taking @cfg_lock to keep per-packet overhead
+ * low. Writers serialize updates under @cfg_lock. During reconfiguration,
+ * readers may transiently observe mixed old/new values; this may cause a small
+ * number of drops and is an accepted tradeoff for a lightweight datapath.
+ */
+struct ipxlat_priv {
+ struct net_device *dev;
+ struct ipv6_prefix xlat_prefix6;
+ u32 lowest_ipv6_mtu;
+ /* serializes control-plane updates */
+ struct mutex cfg_lock;
+ struct gro_cells gro_cells;
+};
+
+#endif /* _NET_IPXLAT_IPXLPRIV_H_ */
diff --git a/drivers/net/ipxlat/main.c b/drivers/net/ipxlat/main.c
new file mode 100644
index 000000000000..26b7f5b6ff20
--- /dev/null
+++ b/drivers/net/ipxlat/main.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <linux/module.h>
+
+#include <net/ip.h>
+
+#include "ipxlpriv.h"
+#include "main.h"
+
+MODULE_AUTHOR("Alberto Leiva Popper <ydahhrk@gmail.com>");
+MODULE_AUTHOR("Antonio Quartulli <antonio@mandelbit.com>");
+MODULE_AUTHOR("Daniel Gröber <dxld@darkboxed.org>");
+MODULE_AUTHOR("Ralf Lici <ralf@mandelbit.com>");
+MODULE_DESCRIPTION("IPv6<>IPv4 translation virtual netdev support (SIIT)");
+MODULE_LICENSE("GPL");
+
+static int ipxlat_dev_init(struct net_device *dev)
+{
+ struct ipxlat_priv *ipxlat = netdev_priv(dev);
+ int err;
+
+ ipxlat->dev = dev;
+ /* default xlat-prefix6 is 64:ff9b::/96 */
+ ipxlat->xlat_prefix6.addr.s6_addr32[0] = htonl(0x0064ff9b);
+ ipxlat->xlat_prefix6.addr.s6_addr32[1] = 0;
+ ipxlat->xlat_prefix6.addr.s6_addr32[2] = 0;
+ ipxlat->xlat_prefix6.addr.s6_addr32[3] = 0;
+ ipxlat->xlat_prefix6.len = 96;
+ ipxlat->lowest_ipv6_mtu = 1280;
+ mutex_init(&ipxlat->cfg_lock);
+
+ err = gro_cells_init(&ipxlat->gro_cells, dev);
+ if (unlikely(err))
+ return err;
+
+ return 0;
+}
+
+static void ipxlat_dev_uninit(struct net_device *dev)
+{
+ struct ipxlat_priv *ipxlat = netdev_priv(dev);
+
+ gro_cells_destroy(&ipxlat->gro_cells);
+}
+
+static netdev_tx_t ipxlat_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ dev_dstats_tx_dropped(dev);
+ kfree_skb(skb);
+ return NETDEV_TX_OK;
+}
+
+static const struct net_device_ops ipxlat_netdev_ops = {
+ .ndo_init = ipxlat_dev_init,
+ .ndo_uninit = ipxlat_dev_uninit,
+ .ndo_start_xmit = ipxlat_start_xmit,
+};
+
+static const struct device_type ipxlat_type = {
+ .name = "ipxlat",
+};
+
+static void ipxlat_setup(struct net_device *dev)
+{
+ const netdev_features_t feat = NETIF_F_SG | NETIF_F_FRAGLIST |
+ NETIF_F_HW_CSUM | NETIF_F_HIGHDMA |
+ NETIF_F_GSO_SOFTWARE;
+
+ dev->type = ARPHRD_NONE;
+ dev->flags = IFF_NOARP;
+ dev->priv_flags |= IFF_NO_QUEUE;
+ dev->hard_header_len = 0;
+ dev->addr_len = 0;
+
+ dev->lltx = true;
+ dev->features |= feat;
+ dev->hw_features |= feat;
+ dev->hw_enc_features |= feat;
+
+ dev->netdev_ops = &ipxlat_netdev_ops;
+ dev->needs_free_netdev = true;
+ dev->pcpu_stat_type = NETDEV_PCPU_STAT_DSTATS;
+ dev->max_mtu = IP_MAX_MTU - sizeof(struct ipv6hdr) -
+ sizeof(struct iphdr);
+ dev->min_mtu = IPV6_MIN_MTU;
+ dev->mtu = ETH_DATA_LEN;
+
+ /* keep skb->dst up to ndo_start_xmit so ICMP error emission can
+ * reuse routing metadata from ingress when available
+ */
+ netif_keep_dst(dev);
+
+ SET_NETDEV_DEVTYPE(dev, &ipxlat_type);
+}
+
+static struct rtnl_link_ops ipxlat_link_ops = {
+ .kind = "ipxlat",
+ .priv_size = sizeof(struct ipxlat_priv),
+ .setup = ipxlat_setup,
+};
+
+bool ipxlat_dev_is_valid(const struct net_device *dev)
+{
+ return dev->rtnl_link_ops == &ipxlat_link_ops;
+}
+
+static int __init ipxlat_init(void)
+{
+ int err;
+
+ err = rtnl_link_register(&ipxlat_link_ops);
+ if (err) {
+ pr_err("ipxlat: failed to register rtnl link ops: %d\n", err);
+ return err;
+ }
+
+ return 0;
+}
+
+static void __exit ipxlat_exit(void)
+{
+ rtnl_link_unregister(&ipxlat_link_ops);
+}
+
+module_init(ipxlat_init);
+module_exit(ipxlat_exit);
diff --git a/drivers/net/ipxlat/main.h b/drivers/net/ipxlat/main.h
new file mode 100644
index 000000000000..fb78f910b2e2
--- /dev/null
+++ b/drivers/net/ipxlat/main.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_MAIN_H_
+#define _NET_IPXLAT_MAIN_H_
+
+#include <linux/netdevice.h>
+
+/**
+ * ipxlat_dev_is_valid - tell whether a netdev is an ipxlat interface
+ * @dev: netdevice to inspect
+ *
+ * Return: true if @dev was created with ipxlat link ops.
+ */
+bool ipxlat_dev_is_valid(const struct net_device *dev);
+
+#endif /* _NET_IPXLAT_MAIN_H_ */
--
2.53.0
* [RFC net-next 02/15] ipxlat: add RFC 6052 address conversion helpers
From: Ralf Lici @ 2026-03-19 15:12 UTC
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Introduce IPv4/IPv6 stateless address mapping helpers used by the
translation pipeline. Add the core 4<->6 conversion routines, including
RFC 6052 prefix embedding/extraction and the RFC 6791 fallback source
selection logic used by ICMP translation paths.
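For reviewers who want a quick mental model of the 6->4 direction, here
is a userspace sketch restricted to a /96 prefix. The function names are
hypothetical and the code is a simplification, not a copy of the helpers
in this patch. It shows the two rules described above: both addresses
must normally fall inside the translation prefix, but for ICMPv6 errors
an unmapped source is replaced by the IPv4 dummy address 192.0.0.8
(the value of INADDR_DUMMY).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* 192.0.0.8, the IPv4 dummy address */
static const uint8_t dummy_v4[4] = { 192, 0, 0, 8 };

/* Compare the leading bits of two addresses; bits must be a multiple
 * of 8 in this simplified version.
 */
static bool prefix_match(const uint8_t prefix[16], const uint8_t addr[16],
			 unsigned int bits)
{
	return memcmp(prefix, addr, bits / 8) == 0;
}

/* Hypothetical /96-only extraction of an IPv4 source/destination pair.
 * Returns 0 on success, -1 if the addresses are not translatable.
 */
static int extract_pair_96(const uint8_t prefix[16],
			   const uint8_t src6[16], const uint8_t dst6[16],
			   bool icmp_err, uint8_t src4[4], uint8_t dst4[4])
{
	bool src_ok;

	assert(prefix && src6 && dst6 && src4 && dst4);

	src_ok = prefix_match(prefix, src6, 96);
	if (!src_ok && !icmp_err)
		return -1;	/* ordinary traffic needs a mapped source */
	if (!prefix_match(prefix, dst6, 96))
		return -1;	/* the destination must always be mapped */

	/* ICMPv6 errors from unmapped routers get the dummy source. */
	memcpy(src4, src_ok ? &src6[12] : dummy_v4, 4);
	memcpy(dst4, &dst6[12], 4);
	return 0;
}
```

The fallback keeps errors such as Packet Too Big usable on the IPv4
side even when the reporting IPv6 node has no IPv4-mapped address.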
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/Makefile | 1 +
drivers/net/ipxlat/address.c | 132 +++++++++++++++++++++++++++++++++++
drivers/net/ipxlat/address.h | 59 ++++++++++++++++
3 files changed, 192 insertions(+)
create mode 100644 drivers/net/ipxlat/address.c
create mode 100644 drivers/net/ipxlat/address.h
diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile
index bd48c2700bf5..b6367dedd78e 100644
--- a/drivers/net/ipxlat/Makefile
+++ b/drivers/net/ipxlat/Makefile
@@ -5,3 +5,4 @@
obj-$(CONFIG_IPXLAT) := ipxlat.o
ipxlat-objs += main.o
+ipxlat-objs += address.o
diff --git a/drivers/net/ipxlat/address.c b/drivers/net/ipxlat/address.c
new file mode 100644
index 000000000000..d1a2b7d1768f
--- /dev/null
+++ b/drivers/net/ipxlat/address.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include "address.h"
+
+static bool ipxlat_prefix6_contains(const struct ipv6_prefix *prefix,
+ const struct in6_addr *addr)
+{
+ return ipv6_prefix_equal(&prefix->addr, addr, prefix->len);
+}
+
+static __be32 ipxlat_64_extract_addr(const struct in6_addr *src,
+ unsigned int q1, unsigned int q2,
+ unsigned int q3, unsigned int q4)
+{
+ q1 = src->s6_addr[q1];
+ q2 = src->s6_addr[q2];
+ q3 = src->s6_addr[q3];
+ q4 = src->s6_addr[q4];
+ return htonl((q1 << 24) | (q2 << 16) | (q3 << 8) | q4);
+}
+
+static void ipxlat_46_embed_addr(__be32 __src, struct in6_addr *dst,
+ unsigned int q1, unsigned int q2,
+ unsigned int q3, unsigned int q4)
+{
+ u32 src = ntohl(__src);
+
+ dst->s6_addr[q1] = ((src >> 24) & 0xFF);
+ dst->s6_addr[q2] = ((src >> 16) & 0xFF);
+ dst->s6_addr[q3] = ((src >> 8) & 0xFF);
+ dst->s6_addr[q4] = ((src) & 0xFF);
+}
+
+void ipxlat_46_convert_addr(const struct ipv6_prefix *xlat_prefix6,
+ __be32 addr4, struct in6_addr *addr6)
+{
+ *addr6 = xlat_prefix6->addr;
+
+ switch (xlat_prefix6->len) {
+ case 96:
+ addr6->s6_addr32[3] = addr4;
+ return;
+ case 64:
+ ipxlat_46_embed_addr(addr4, addr6, 9, 10, 11, 12);
+ return;
+ case 56:
+ ipxlat_46_embed_addr(addr4, addr6, 7, 9, 10, 11);
+ return;
+ case 48:
+ ipxlat_46_embed_addr(addr4, addr6, 6, 7, 9, 10);
+ return;
+ case 40:
+ ipxlat_46_embed_addr(addr4, addr6, 5, 6, 7, 9);
+ return;
+ case 32:
+ addr6->s6_addr32[1] = addr4;
+ return;
+ }
+
+ DEBUG_NET_WARN_ON_ONCE(1);
+}
+
+int ipxlat_64_convert_addrs(const struct ipv6_prefix *xlat_prefix6,
+ const struct ipv6hdr *hdr6, bool icmp_err,
+ __be32 *src, __be32 *dst)
+{
+ bool src_ok;
+
+ src_ok = ipxlat_prefix6_contains(xlat_prefix6, &hdr6->saddr);
+ if (unlikely(!src_ok && !icmp_err))
+ return -EINVAL;
+ if (unlikely(!ipxlat_prefix6_contains(xlat_prefix6, &hdr6->daddr)))
+ return -EINVAL;
+
+ switch (xlat_prefix6->len) {
+ case 96:
+ if (likely(src_ok))
+ *src = hdr6->saddr.s6_addr32[3];
+ *dst = hdr6->daddr.s6_addr32[3];
+ break;
+ case 64:
+ if (likely(src_ok))
+ *src = ipxlat_64_extract_addr(&hdr6->saddr, 9, 10, 11,
+ 12);
+ *dst = ipxlat_64_extract_addr(&hdr6->daddr, 9, 10, 11, 12);
+ break;
+ case 56:
+ if (likely(src_ok))
+ *src = ipxlat_64_extract_addr(&hdr6->saddr, 7,
+ 9, 10, 11);
+ *dst = ipxlat_64_extract_addr(&hdr6->daddr, 7, 9, 10, 11);
+ break;
+ case 48:
+ if (likely(src_ok))
+ *src = ipxlat_64_extract_addr(&hdr6->saddr, 6,
+ 7, 9, 10);
+ *dst = ipxlat_64_extract_addr(&hdr6->daddr, 6, 7, 9, 10);
+ break;
+ case 40:
+ if (likely(src_ok))
+ *src = ipxlat_64_extract_addr(&hdr6->saddr, 5, 6, 7, 9);
+ *dst = ipxlat_64_extract_addr(&hdr6->daddr, 5, 6, 7, 9);
+ break;
+ case 32:
+ if (likely(src_ok))
+ *src = hdr6->saddr.s6_addr32[1];
+ *dst = hdr6->daddr.s6_addr32[1];
+ break;
+ default:
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
+
+ /* keep 6->4 ICMP error translation functional even when the ICMPv6
+ * source is not xlat_prefix6-mapped (for example, stack-generated PTB)
+ */
+ if (unlikely(!src_ok))
+ *src = htonl(INADDR_DUMMY);
+
+ return 0;
+}
diff --git a/drivers/net/ipxlat/address.h b/drivers/net/ipxlat/address.h
new file mode 100644
index 000000000000..4283fdddac56
--- /dev/null
+++ b/drivers/net/ipxlat/address.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_ADDRESS_H_
+#define _NET_IPXLAT_ADDRESS_H_
+
+#include <linux/ip.h>
+#include <net/ipv6.h>
+
+#include "ipxlpriv.h"
+
+/**
+ * ipxlat_46_convert_addr - translate one IPv4 address into RFC 6052 IPv6 form
+ * @xlat_prefix6: configured RFC 6052 prefix
+ * @addr4: IPv4 address to convert
+ * @addr6: output IPv6 address
+ */
+void ipxlat_46_convert_addr(const struct ipv6_prefix *xlat_prefix6,
+ __be32 addr4, struct in6_addr *addr6);
+
+/**
+ * ipxlat_64_convert_addrs - translate outer IPv6 endpoints into IPv4 pair
+ * @xlat_prefix6: configured RFC 6052 prefix
+ * @hdr6: source IPv6 header
+ * @icmp_err: source packet is ICMPv6 error
+ * @src: output IPv4 source address
+ * @dst: output IPv4 destination address
+ *
+ * Return: 0 on success, negative errno on non-translatable addresses.
+ */
+int ipxlat_64_convert_addrs(const struct ipv6_prefix *xlat_prefix6,
+ const struct ipv6hdr *hdr6, bool icmp_err,
+ __be32 *src, __be32 *dst);
+
+/**
+ * ipxlat_46_convert_addrs - translate outer IPv4 endpoints into IPv6 pair
+ * @xlat_prefix6: configured RFC 6052 prefix
+ * @iph4: source IPv4 header
+ * @iph6: output IPv6 header (only saddr/daddr are updated)
+ */
+static inline void
+ipxlat_46_convert_addrs(const struct ipv6_prefix *xlat_prefix6,
+ const struct iphdr *iph4, struct ipv6hdr *iph6)
+{
+ ipxlat_46_convert_addr(xlat_prefix6, iph4->saddr, &iph6->saddr);
+ ipxlat_46_convert_addr(xlat_prefix6, iph4->daddr, &iph6->daddr);
+}
+
+#endif /* _NET_IPXLAT_ADDRESS_H_ */
--
2.53.0
* [RFC net-next 03/15] ipxlat: add packet metadata control block helpers
From: Ralf Lici @ 2026-03-19 15:12 UTC
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Add the per-skb control-block layout and the shared packet helper
routines used by the translation stages. This introduces common metadata
bookkeeping (offset rebasing and invariant checks) plus protocol-fragment
helper utilities.
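A worked example of offset rebasing, as a userspace sketch with a
trimmed-down, hypothetical struct (only two of the cached offsets) and
hypothetical function names. For an IPv4 UDP packet without options the
L4 header sits at offset 20 and the payload at 28; replacing the 20-byte
IPv4 header with a 40-byte IPv6 header gives delta = +20, so the offsets
become 40 and 48. Unlike the helper in this patch, the sketch checks all
offsets before writing any of them; that is a simplification chosen for
the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the cached offsets. */
struct offsets {
	uint16_t l4_off;
	uint16_t payload_off;
};

/* Shift every cached offset by the L3 header size change.
 * Returns 0 on success, -1 if any offset would become negative.
 */
static int rebase(struct offsets *o, int delta)
{
	int l4, payload;

	assert(o);

	l4 = o->l4_off + delta;
	payload = o->payload_off + delta;
	if (l4 < 0 || payload < 0)
		return -1;	/* *o is left untouched on failure */

	o->l4_off = (uint16_t)l4;
	o->payload_off = (uint16_t)payload;
	return 0;
}

/* Ordering invariant: the payload never starts before the L4 header. */
static bool offsets_valid(const struct offsets *o)
{
	return o->l4_off <= o->payload_off;
}
```

Applying the same delta to every offset preserves their relative order,
which is why the ordering check can stay a separate debug-only helper.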
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/Makefile | 1 +
drivers/net/ipxlat/packet.c | 99 +++++++++++++++++++++
drivers/net/ipxlat/packet.h | 166 ++++++++++++++++++++++++++++++++++++
3 files changed, 266 insertions(+)
create mode 100644 drivers/net/ipxlat/packet.c
create mode 100644 drivers/net/ipxlat/packet.h
diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile
index b6367dedd78e..90dbc0489fa2 100644
--- a/drivers/net/ipxlat/Makefile
+++ b/drivers/net/ipxlat/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_IPXLAT) := ipxlat.o
ipxlat-objs += main.o
ipxlat-objs += address.o
+ipxlat-objs += packet.o
diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c
new file mode 100644
index 000000000000..f82c375255f3
--- /dev/null
+++ b/drivers/net/ipxlat/packet.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include "packet.h"
+
+/* Shift cached skb cb offsets by the L3 header delta after in-place rewrite.
+ *
+ * Translation may replace only the outer L3 header size (4->6 or 6->4), while
+ * cached offsets were computed before rewrite. Rebasing applies the same delta
+ * to all cached absolute offsets so they still point to the same logical
+ * fields in the modified skb.
+ *
+ * This helper only guards against underflow (< 0). Relative ordering checks
+ * are done by ipxlat_cb_offsets_valid.
+ */
+int ipxlat_cb_rebase_offsets(struct ipxlat_cb *cb, int delta)
+{
+ int off;
+
+ off = cb->l4_off + delta;
+ if (unlikely(off < 0))
+ return -EINVAL;
+ cb->l4_off = off;
+
+ off = cb->payload_off + delta;
+ if (unlikely(off < 0))
+ return -EINVAL;
+ cb->payload_off = off;
+
+ if (unlikely(cb->is_icmp_err)) {
+ off = cb->inner_l3_offset + delta;
+ if (unlikely(off < 0))
+ return -EINVAL;
+ cb->inner_l3_offset = off;
+
+ off = cb->inner_l4_offset + delta;
+ if (unlikely(off < 0))
+ return -EINVAL;
+ cb->inner_l4_offset = off;
+
+ if (cb->inner_fragh_off) {
+ off = cb->inner_fragh_off + delta;
+ if (unlikely(off < 0))
+ return -EINVAL;
+ cb->inner_fragh_off = off;
+ }
+ }
+
+ return 0;
+}
+
+#ifdef CONFIG_DEBUG_NET
+/* Verify ordering/range relations between cached skb cb offsets.
+ *
+ * Unlike ipxlat_cb_rebase_offsets, this checks structural invariants:
+ * l4 <= payload, inner_l3 >= payload, inner_l3 <= inner_l4, and fragment
+ * header (when present) located inside inner L3 area before inner L4.
+ */
+bool ipxlat_cb_offsets_valid(const struct ipxlat_cb *cb)
+{
+ if (unlikely(cb->payload_off < cb->l4_off))
+ return false;
+
+ if (unlikely(cb->is_icmp_err)) {
+ if (unlikely(cb->inner_l3_offset < cb->payload_off))
+ return false;
+ if (unlikely(cb->inner_l4_offset < cb->inner_l3_offset))
+ return false;
+ if (unlikely(cb->inner_fragh_off &&
+ cb->inner_fragh_off < cb->inner_l3_offset))
+ return false;
+ if (unlikely(cb->inner_fragh_off &&
+ cb->inner_fragh_off >= cb->inner_l4_offset))
+ return false;
+ }
+
+ return true;
+}
+#endif
+
+int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxl, struct sk_buff *skb)
+{
+ return -EOPNOTSUPP;
+}
+
+int ipxlat_v6_validate_skb(struct sk_buff *skb)
+{
+ return -EOPNOTSUPP;
+}
diff --git a/drivers/net/ipxlat/packet.h b/drivers/net/ipxlat/packet.h
new file mode 100644
index 000000000000..f39c25987940
--- /dev/null
+++ b/drivers/net/ipxlat/packet.h
@@ -0,0 +1,166 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_PACKET_H_
+#define _NET_IPXLAT_PACKET_H_
+
+#include <net/ip.h>
+
+#include "ipxlpriv.h"
+
+/**
+ * struct ipxlat_cb - per-skb parser and control metadata stored in skb->cb
+ * @l4_off: outer L4 header offset
+ * @payload_off: outer payload offset
+ * @fragh_off: outer IPv6 Fragment Header offset, or 0 if absent
+ * @inner_l3_offset: quoted inner L3 offset for ICMP errors
+ * @inner_l4_offset: quoted inner L4 offset for ICMP errors
+ * @inner_fragh_off: quoted inner IPv6 Fragment Header offset, or 0
+ * @udp_zero_csum_len: outer UDP length used for 4->6 checksum synthesis
+ * @frag_max_size: pre-fragment payload cap for ip_do_fragment
+ * @l4_proto: outer L4 protocol (or nexthdr for IPv6)
+ * @inner_l4_proto: quoted inner L4 protocol
+ * @l3_hdr_len: outer L3 header length including extension headers
+ * @inner_l3_hdr_len: quoted inner L3 header length
+ * @is_icmp_err: packet is ICMP error and carries quoted inner packet
+ * @emit_icmp_err: datapath must emit translator-generated ICMP on drop
+ * @icmp_err: ICMP type/code/info cached for deferred emission
+ * @icmp_err.type: ICMP type to emit
+ * @icmp_err.code: ICMP code to emit
+ * @icmp_err.info: ICMP auxiliary info (e.g. pointer/MTU)
+ */
+struct ipxlat_cb {
+ u16 l4_off;
+ u16 payload_off;
+ u16 fragh_off;
+ u16 inner_l3_offset;
+ u16 inner_l4_offset;
+ u16 inner_fragh_off;
+ /* L4 span length (UDP header + payload) for outer IPv4 UDP packets
+ * arriving with checksum 0.
+ */
+ u16 udp_zero_csum_len;
+ u16 frag_max_size;
+ u8 l4_proto;
+ u8 inner_l4_proto;
+ u8 l3_hdr_len;
+ u8 inner_l3_hdr_len;
+ bool is_icmp_err;
+ bool emit_icmp_err;
+ struct {
+ u8 type;
+ u8 code;
+ u32 info;
+ } icmp_err;
+};
+
+/**
+ * ipxlat_skb_cb - return ipxlat private control block in skb->cb
+ * @skb: skb carrying ipxlat metadata
+ *
+ * Return: pointer to &struct ipxlat_cb stored in the control buffer of @skb.
+ */
+static inline struct ipxlat_cb *ipxlat_skb_cb(const struct sk_buff *skb)
+{
+ BUILD_BUG_ON(sizeof(struct ipxlat_cb) > sizeof(skb->cb));
+ return (struct ipxlat_cb *)(skb->cb);
+}
+
+static inline unsigned int ipxlat_skb_datagram_len(const struct sk_buff *skb)
+{
+ return skb->len - skb_transport_offset(skb);
+}
+
+static inline u8 ipxlat_get_ipv6_tclass(const struct ipv6hdr *hdr)
+{
+ return (hdr->priority << 4) | (hdr->flow_lbl[0] >> 4);
+}
+
+static inline u16 ipxlat_get_frag6_offset(const struct frag_hdr *hdr)
+{
+ return be16_to_cpu(hdr->frag_off) & 0xFFF8U;
+}
+
+static inline u16 ipxlat_get_frag4_offset(const struct iphdr *hdr)
+{
+ return (be16_to_cpu(hdr->frag_off) & IP_OFFSET) << 3;
+}
+
+static inline bool ipxlat_is_first_frag6(const struct frag_hdr *hdr)
+{
+ return hdr ? (ipxlat_get_frag6_offset(hdr) == 0) : true;
+}
+
+static inline bool ipxlat_is_first_frag4(const struct iphdr *hdr)
+{
+ return !(hdr->frag_off & htons(IP_OFFSET));
+}
+
+static inline __be16 ipxlat_build_frag6_offset(u16 frag_offset, bool mf)
+{
+ return cpu_to_be16((frag_offset & 0xFFF8U) | mf);
+}
+
+static inline __be16
+ipxlat_build_frag4_offset(bool df, bool mf, u16 frag_offset)
+{
+ return cpu_to_be16((df ? (1U << 14) : 0) | (mf ? (1U << 13) : 0) |
+ (frag_offset >> 3));
+}
+
+/**
+ * ipxlat_cb_rebase_offsets - shift cached cb offsets after skb relayout
+ * @cb: parsed packet metadata
+ * @delta: signed byte delta applied to cached offsets
+ *
+ * Return: 0 on success, negative errno if rebased offsets would underflow.
+ */
+int ipxlat_cb_rebase_offsets(struct ipxlat_cb *cb, int delta);
+#ifdef CONFIG_DEBUG_NET
+/**
+ * ipxlat_cb_offsets_valid - validate monotonicity and bounds of cb offsets
+ * @cb: parsed packet metadata
+ *
+ * Return: true if cached offsets are internally consistent.
+ */
+bool ipxlat_cb_offsets_valid(const struct ipxlat_cb *cb);
+#else
+static inline bool ipxlat_cb_offsets_valid(const struct ipxlat_cb *cb)
+{
+ return true;
+}
+#endif
+
+/**
+ * ipxlat_v4_validate_skb - validate and summarize IPv4 packet into skb->cb
+ * @ipxlat: translator private context
+ * @skb: packet to validate
+ *
+ * Populates &struct ipxlat_cb and may mark translator-generated ICMP action on
+ * failure paths.
+ *
+ * Return: 0 on success, negative errno on validation failure.
+ */
+int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
+
+/**
+ * ipxlat_v6_validate_skb - validate and summarize IPv6 packet into skb->cb
+ * @skb: packet to validate
+ *
+ * Populates &struct ipxlat_cb for subsequent 6->4 translation.
+ *
+ * Return: 0 on success, negative errno on validation failure.
+ */
+int ipxlat_v6_validate_skb(struct sk_buff *skb);
+
+#endif /* _NET_IPXLAT_PACKET_H_ */
--
2.53.0
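
For readers checking the bit layouts used by the frag4/frag6 helpers in
packet.h above, here is a minimal user-space sketch. It is not part of the
series: the sk_ names are hypothetical, fields are kept in host order (the
kernel helpers convert with be16_to_cpu/cpu_to_be16), and the mask values
are assumed to match the kernel's IP_DF, IP_MF, IP_OFFSET and IP6_MF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-order masks, assumed equal to the kernel constants of similar name. */
#define SK_IP4_DF	0x4000u
#define SK_IP4_MF	0x2000u
#define SK_IP4_OFFSET	0x1fffu
#define SK_IP6_OFFSET	0xfff8u
#define SK_IP6_MF	0x0001u

/* IPv4: flags in the top bits, 13-bit offset counted in 8-byte units. */
static uint16_t sk_build_frag4(bool df, bool mf, uint16_t byte_off)
{
	return (uint16_t)((df ? SK_IP4_DF : 0) | (mf ? SK_IP4_MF : 0) |
			  (byte_off >> 3));
}

static uint16_t sk_frag4_byte_off(uint16_t field)
{
	return (uint16_t)((field & SK_IP4_OFFSET) << 3);
}

/* IPv6: the offset occupies the upper 13 bits, so masking off the low
 * three bits already yields a byte offset; bit 0 is the M flag.
 */
static uint16_t sk_build_frag6(uint16_t byte_off, bool mf)
{
	return (uint16_t)((byte_off & SK_IP6_OFFSET) | (mf ? SK_IP6_MF : 0));
}

static uint16_t sk_frag6_byte_off(uint16_t field)
{
	return (uint16_t)(field & SK_IP6_OFFSET);
}
```

With these definitions a 1480-byte offset with MF set encodes as 0x20b9 in
the IPv4 header (185 units of 8 bytes) and as 0x05c9 in the IPv6 Fragment
Header.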
* [RFC net-next 04/15] ipxlat: add IPv4 packet validation path
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (2 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 03/15] ipxlat: add packet metadata control block helpers Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 05/15] ipxlat: add IPv6 " Ralf Lici
` (10 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Implement IPv4 packet parsing and validation, including option
inspection, fragment-sensitive L4 checks, and UDP checksum-zero handling
consistent with translator constraints. The parser populates skb
control-block metadata consumed by translation and marks RFC-driven drop
reasons for later action handling.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/packet.c | 312 +++++++++++++++++++++++++++++++++++-
1 file changed, 310 insertions(+), 2 deletions(-)
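
The option inspection described in the commit message can be pictured with
a small user-space walk over the IPv4 options area. This is only a sketch
under assumptions: the sk_ names and the three-way verdict are hypothetical
(the patch returns -EINVAL for both malformed options and an unexpired
source route), and the option type values are assumed to match IPOPT_END,
IPOPT_NOOP, IPOPT_LSRR and IPOPT_SSRR.

```c
#include <stddef.h>
#include <stdint.h>

enum { SK_OPT_END = 0, SK_OPT_NOOP = 1, SK_OPT_LSRR = 131, SK_OPT_SSRR = 137 };

enum sk_opt_verdict {
	SK_OPT_OK = 0,		/* options can be ignored */
	SK_OPT_MALFORMED = -1,	/* broken TLV encoding */
	SK_OPT_ACTIVE_SRR = -2,	/* unexpired LSRR/SSRR present */
};

/* Walk the bytes that follow the fixed 20-byte IPv4 header. */
static int sk_v4_option_walk(const uint8_t *opt, size_t opts_len)
{
	const uint8_t *end = opt + opts_len;

	while (opt < end) {
		uint8_t type = opt[0], len, ptr;

		if (type == SK_OPT_END)
			return SK_OPT_OK;
		if (type == SK_OPT_NOOP) {	/* single-byte option */
			opt++;
			continue;
		}
		if (end - opt < 2)
			return SK_OPT_MALFORMED;
		len = opt[1];
		if (len < 2 || len > end - opt)
			return SK_OPT_MALFORMED;

		if (type == SK_OPT_LSRR || type == SK_OPT_SSRR) {
			if (len < 3)
				return SK_OPT_MALFORMED;
			ptr = opt[2];	/* 1-based index of next address */
			if (ptr < 4)
				return SK_OPT_MALFORMED;
			if (ptr > len)	/* route exhausted, option is inert */
				return SK_OPT_OK;
			if (ptr > len - 3)	/* no room for a full address */
				return SK_OPT_MALFORMED;
			return SK_OPT_ACTIVE_SRR;
		}
		opt += len;
	}
	return SK_OPT_OK;
}
```

Keeping the malformed and active-source-route verdicts apart is a choice
made for the sketch only, to show which inputs would call for the ICMPv4
source-route-failed reply that RFC 7915 Section 4.1 mentions.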
diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c
index f82c375255f3..0cc619dca147 100644
--- a/drivers/net/ipxlat/packet.c
+++ b/drivers/net/ipxlat/packet.c
@@ -11,6 +11,8 @@
* Ralf Lici <ralf@mandelbit.com>
*/
+#include <linux/icmp.h>
+
#include "packet.h"
/* Shift cached skb cb offsets by the L3 header delta after in-place rewrite.
@@ -88,9 +90,315 @@ bool ipxlat_cb_offsets_valid(const struct ipxlat_cb *cb)
}
#endif
-int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxl, struct sk_buff *skb)
+static bool ipxlat_v4_validate_addr(__be32 addr4)
{
- return -EOPNOTSUPP;
+ return !(ipv4_is_zeronet(addr4) || ipv4_is_loopback(addr4) ||
+ ipv4_is_multicast(addr4) || ipv4_is_lbcast(addr4));
+}
+
+/* RFC 7915 Section 4.1 requires ignoring IPv4 options unless an unexpired
+ * LSRR/SSRR is present, in which case we must send ICMPv4 SR_FAILED.
+ * We intentionally treat malformed option encoding as invalid input and
+ * drop early instead of continuing translation.
+ */
+static int ipxlat_v4_srr_check(struct sk_buff *skb, const struct iphdr *hdr)
+{
+ const u8 *opt, *end;
+ u8 type, len, ptr;
+
+ if (likely(hdr->ihl <= 5))
+ return 0;
+
+ opt = (const u8 *)(hdr + 1);
+ end = (const u8 *)hdr + (hdr->ihl << 2);
+
+ while (opt < end) {
+ type = opt[0];
+ if (type == IPOPT_END)
+ return 0;
+ if (type == IPOPT_NOOP) {
+ opt++;
+ continue;
+ }
+
+ if (unlikely(end - opt < 2))
+ return -EINVAL;
+
+ len = opt[1];
+ if (unlikely(len < 2 || opt + len > end))
+ return -EINVAL;
+
+ if (type == IPOPT_LSRR || type == IPOPT_SSRR) {
+ if (unlikely(len < 3))
+ return -EINVAL;
+
+ /* points to the beginning of the next IP addr */
+ ptr = opt[2];
+ if (unlikely(ptr < 4))
+ return -EINVAL;
+ if (unlikely(ptr > len))
+ return 0;
+ if (unlikely(ptr > len - 3))
+ return -EINVAL;
+
+ return -EINVAL;
+ }
+
+ opt += len;
+ }
+
+ return 0;
+}
+
+static int ipxlat_v4_pull_l3(struct sk_buff *skb, unsigned int l3_offset,
+ bool inner)
+{
+ const struct iphdr *iph;
+ unsigned int tot_len;
+ int l3_len;
+
+ if (unlikely(!pskb_may_pull(skb, l3_offset + sizeof(*iph))))
+ return -EINVAL;
+
+ iph = (const struct iphdr *)(skb->data + l3_offset);
+ if (unlikely(iph->version != 4 || iph->ihl < 5))
+ return -EINVAL;
+
+ l3_len = iph->ihl << 2;
+ /* For inner packets use ntohs(iph->tot_len) instead of iph_totlen.
+ * If inner iph->tot_len is zero, iph_totlen would fall back to outer
+ * GSO metadata, which is unrelated to quoted inner packet length.
+ */
+ tot_len = unlikely(inner) ? ntohs(iph->tot_len) : iph_totlen(skb, iph);
+ if (unlikely(tot_len < l3_len))
+ return -EINVAL;
+
+ if (unlikely(!pskb_may_pull(skb, l3_offset + l3_len)))
+ return -EINVAL;
+
+ return l3_len;
+}
+
+static int ipxlat_v4_pull_l4(struct sk_buff *skb, unsigned int l4_offset,
+ u8 l4_proto, bool *is_icmp_err)
+{
+ struct icmphdr *icmp;
+ struct udphdr *udp;
+ struct tcphdr *tcp;
+
+ *is_icmp_err = false;
+
+ switch (l4_proto) {
+ case IPPROTO_TCP:
+ if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*tcp))))
+ return -EINVAL;
+
+ tcp = (struct tcphdr *)(skb->data + l4_offset);
+ if (unlikely(tcp->doff < 5))
+ return -EINVAL;
+
+ return __tcp_hdrlen(tcp);
+ case IPPROTO_UDP:
+ if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*udp))))
+ return -EINVAL;
+
+ udp = (struct udphdr *)(skb->data + l4_offset);
+ if (unlikely(ntohs(udp->len) < sizeof(*udp)))
+ return -EINVAL;
+
+ return sizeof(struct udphdr);
+ case IPPROTO_ICMP:
+ if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*icmp))))
+ return -EINVAL;
+
+ icmp = (struct icmphdr *)(skb->data + l4_offset);
+ *is_icmp_err = icmp_is_err(icmp->type);
+ return sizeof(struct icmphdr);
+ default:
+ return 0;
+ }
+}
+
+static int ipxlat_v4_pull_icmp_inner(struct sk_buff *skb,
+ unsigned int inner_l3_off)
+{
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct iphdr *inner_l3_hdr;
+ unsigned int inner_l4_off;
+ int inner_l3_len, err;
+ bool is_icmp_err;
+
+ inner_l3_len = ipxlat_v4_pull_l3(skb, inner_l3_off, true);
+ if (unlikely(inner_l3_len < 0))
+ return inner_l3_len;
+ inner_l3_hdr = (const struct iphdr *)(skb->data + inner_l3_off);
+
+ /* accept non-first quoted fragments: only inner L3 is translatable */
+ inner_l4_off = inner_l3_off + inner_l3_len;
+ cb->inner_l3_offset = inner_l3_off;
+ cb->inner_l3_hdr_len = inner_l3_len;
+ cb->inner_l4_offset = inner_l4_off;
+
+ if (unlikely(!ipxlat_is_first_frag4(inner_l3_hdr)))
+ return 0;
+
+ err = ipxlat_v4_pull_l4(skb, inner_l4_off, inner_l3_hdr->protocol,
+ &is_icmp_err);
+ if (unlikely(err < 0))
+ return err;
+ if (unlikely(is_icmp_err))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int ipxlat_v4_pull_hdrs(struct sk_buff *skb)
+{
+ const unsigned int l3_off = skb_network_offset(skb);
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ int err, l3_len, l4_len = 0;
+ const struct iphdr *l3_hdr;
+
+ /* parse IPv4 header and get its full length including options */
+ l3_len = ipxlat_v4_pull_l3(skb, l3_off, false);
+ if (unlikely(l3_len < 0))
+ return l3_len;
+ l3_hdr = ip_hdr(skb);
+
+ if (unlikely(!ipxlat_v4_validate_addr(l3_hdr->daddr)))
+ return -EINVAL;
+
+ /* RFC 7915 Section 4.1 */
+ if (unlikely(ipxlat_v4_srr_check(skb, l3_hdr)))
+ return -EINVAL;
+ if (unlikely(l3_hdr->ttl <= 1))
+ return -EINVAL;
+
+ /* RFC 7915 Section 1.2:
+ * Fragmented ICMP/ICMPv6 packets will not be translated by IP/ICMP
+ * translators.
+ */
+ if (unlikely(l3_hdr->protocol == IPPROTO_ICMP &&
+ ip_is_fragment(l3_hdr)))
+ return -EINVAL;
+
+ cb->l3_hdr_len = l3_len;
+ cb->l4_proto = l3_hdr->protocol;
+ cb->l4_off = l3_off + l3_len;
+ cb->payload_off = cb->l4_off;
+ cb->is_icmp_err = false;
+
+ /* only non-fragmented packets or first fragments have transport hdrs */
+ if (unlikely(!ipxlat_is_first_frag4(l3_hdr))) {
+ if (unlikely(!ipxlat_v4_validate_addr(l3_hdr->saddr)))
+ return -EINVAL;
+ return 0;
+ }
+
+ l4_len = ipxlat_v4_pull_l4(skb, cb->l4_off, l3_hdr->protocol,
+ &cb->is_icmp_err);
+ if (unlikely(l4_len < 0))
+ return l4_len;
+
+ /* RFC 7915 Section 4.1:
+ * Illegal IPv4 sources are accepted only for ICMPv4 error translation.
+ */
+ if (unlikely(!ipxlat_v4_validate_addr(l3_hdr->saddr) &&
+ !cb->is_icmp_err))
+ return -EINVAL;
+
+ cb->payload_off = cb->l4_off + l4_len;
+
+ if (unlikely(cb->is_icmp_err)) {
+ /* validate the quoted packet in an ICMP error */
+ err = ipxlat_v4_pull_icmp_inner(skb, cb->payload_off);
+ if (unlikely(err))
+ return err;
+ }
+
+ return 0;
+}
+
+static int ipxlat_v4_validate_icmp_csum(const struct sk_buff *skb)
+{
+ __sum16 csum;
+
+ /* skip when checksum is not software-owned */
+ if (skb->ip_summed != CHECKSUM_NONE)
+ return 0;
+
+ /* compute checksum over ICMP header and payload, then fold to 16-bit
+ * Internet checksum to validate it
+ */
+ csum = csum_fold(skb_checksum(skb, skb_transport_offset(skb),
+ ipxlat_skb_datagram_len(skb), 0));
+ return unlikely(csum) ? -EINVAL : 0;
+}
+
+/**
+ * ipxlat_v4_validate_skb - validate IPv4 input and fill parser metadata in cb
+ * @ipxlat: translator private context
+ * @skb: packet to validate
+ *
+ * Ensures required headers are present/consistent and stores parsed offsets
+ * into &struct ipxlat_cb for the translation path.
+ *
+ * Return: 0 on success, negative errno on validation failure.
+ */
+int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ struct iphdr *l3_hdr;
+ struct udphdr *udph;
+ int err;
+
+ if (unlikely(skb_shared(skb)))
+ return -EINVAL;
+
+ err = ipxlat_v4_pull_hdrs(skb);
+ if (unlikely(err))
+ return err;
+
+ skb_set_transport_header(skb, cb->l4_off);
+
+ if (unlikely(cb->is_icmp_err)) {
+ if (unlikely(cb->l4_proto != IPPROTO_ICMP)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
+
+ /* Translation path recomputes ICMPv6 checksum from scratch.
+ * Validate here so a corrupted ICMPv4 error is not converted
+ * into a translated packet with a valid checksum.
+ */
+ return ipxlat_v4_validate_icmp_csum(skb);
+ }
+
+ l3_hdr = ip_hdr(skb);
+ if (likely(cb->l4_proto != IPPROTO_UDP))
+ return 0;
+ if (unlikely(!ipxlat_is_first_frag4(l3_hdr)))
+ return 0;
+
+ udph = udp_hdr(skb);
+ if (likely(udph->check != 0))
+ return 0;
+
+ /* We are in the path where L4 header is present (unfragmented packets
+ * or first fragments) and is UDP.
+ * Fragmented checksum-less IPv4 UDP is rejected because 4->6 cannot
+ * reliably translate it.
+ */
+ if (unlikely(ip_is_fragment(l3_hdr)))
+ return -EINVAL;
+
+ /* udph->len bounds the span used to compute replacement checksum */
+ if (unlikely(ntohs(udph->len) > skb->len - cb->l4_off))
+ return -EINVAL;
+
+ cb->udp_zero_csum_len = ntohs(udph->len);
+
+ return 0;
}
int ipxlat_v6_validate_skb(struct sk_buff *skb)
--
2.53.0
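
The ICMPv4 checksum test in ipxlat_v4_validate_icmp_csum() above relies on
a property of the Internet checksum: summing a message that already
contains a correct checksum field folds to zero. A user-space sketch of
that property follows. The sk_ names are hypothetical and the buffer is
assumed to be linear and at most 65535 bytes long; the kernel path uses
skb_checksum() and csum_fold() instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Complemented one's-complement sum of big-endian 16-bit words.
 * An odd trailing byte is padded with a zero low byte.
 */
static uint16_t sk_inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)data[i] << 8 | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;

	/* fold carries back into the low 16 bits */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* ICMPv4 has no pseudo-header: the checksum covers header and payload. */
static int sk_icmp4_csum_ok(const uint8_t *icmp, size_t len)
{
	return sk_inet_csum(icmp, len) == 0;
}
```

Note that the patch only performs this check when skb->ip_summed is
CHECKSUM_NONE; the sketch has no notion of offload state.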
* [RFC net-next 05/15] ipxlat: add IPv6 packet validation path
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (3 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 04/15] ipxlat: add IPv4 packet validation path Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 06/15] ipxlat: add transport checksum and offload helpers Ralf Lici
` (9 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Implement IPv6 packet parsing and validation, including extension header
traversal, fragment-header constraints, and ICMPv6 checksum handling for
informational/error traffic. The parser fills skb control-block metadata
for 6->4 translation and quoted-inner packet handling.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/packet.c | 326 +++++++++++++++++++++++++++++++++++-
1 file changed, 325 insertions(+), 1 deletion(-)
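
One of the fragment-header constraints mentioned in the commit message is
the treatment of atomic fragments for ICMPv6. As a rough sketch (the sk_
names are hypothetical, frag_off is taken in host order, and the mask
values are assumed to match the kernel's IP6_OFFSET and IP6_MF), the
decision reduces to:

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_IP6_OFFSET	0xfff8u	/* fragment offset, upper 13 bits */
#define SK_IP6_MF	0x0001u	/* "more fragments" flag */

/* Atomic fragment: a Fragment Header with offset 0 and M = 0. The two
 * reserved bits are ignored.
 */
static bool sk_frag6_is_atomic(uint16_t frag_off_host)
{
	return (frag_off_host & (SK_IP6_OFFSET | SK_IP6_MF)) == 0;
}

/* ICMPv6 stays translatable without a Fragment Header, or with one that
 * only marks an atomic fragment.
 */
static bool sk_icmp6_translatable(bool has_fragh, uint16_t frag_off_host)
{
	return !has_fragh || sk_frag6_is_atomic(frag_off_host);
}
```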
diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c
index 0cc619dca147..b9a9af1b3adb 100644
--- a/drivers/net/ipxlat/packet.c
+++ b/drivers/net/ipxlat/packet.c
@@ -401,7 +401,331 @@ int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
return 0;
}
+static bool ipxlat_v6_validate_saddr(const struct in6_addr *addr6)
+{
+ return !(ipv6_addr_any(addr6) || ipv6_addr_loopback(addr6) ||
+ ipv6_addr_is_multicast(addr6));
+}
+
+static int ipxlat_v6_pull_l4(struct sk_buff *skb, unsigned int l4_offset,
+ u8 l4_proto, bool *is_icmp_err)
+{
+ struct icmp6hdr *icmp;
+ struct udphdr *udp;
+ struct tcphdr *tcp;
+
+ *is_icmp_err = false;
+
+ switch (l4_proto) {
+ case NEXTHDR_TCP:
+ if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*tcp))))
+ return -EINVAL;
+ tcp = (struct tcphdr *)(skb->data + l4_offset);
+ return unlikely(tcp->doff < 5) ? -EINVAL : __tcp_hdrlen(tcp);
+ case NEXTHDR_UDP:
+ if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*udp))))
+ return -EINVAL;
+ udp = (struct udphdr *)(skb->data + l4_offset);
+ if (unlikely(ntohs(udp->len) < sizeof(*udp)))
+ return -EINVAL;
+ return sizeof(struct udphdr);
+ case NEXTHDR_ICMP:
+ if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*icmp))))
+ return -EINVAL;
+ icmp = (struct icmp6hdr *)(skb->data + l4_offset);
+ *is_icmp_err = icmpv6_is_err(icmp->icmp6_type);
+ return sizeof(struct icmp6hdr);
+ default:
+ return 0;
+ }
+}
+
+/* Basic IPv6 header walk: parse only the packet starting at l3_offset.
+ * It does not inspect quoted inner packets carried by ICMP errors.
+ */
+static int ipxlat_v6_walk_hdrs(struct sk_buff *skb, unsigned int l3_offset,
+ u8 *l4_proto, unsigned int *fhdr_offset,
+ unsigned int *l4_offset, bool *has_l4)
+{
+ unsigned int frag_hdr_off, l4hdr_off;
+ struct frag_hdr *frag;
+ struct ipv6hdr *ip6;
+ bool first_frag;
+ int err;
+
+ /* cannot use default getter because this function is used both for
+ * outer and inner packets
+ */
+ ip6 = (struct ipv6hdr *)(skb->data + l3_offset);
+
+ /* if present, locate Fragment Header first because it affects
+ * whether transport headers are available
+ */
+ frag_hdr_off = l3_offset;
+ err = ipv6_find_hdr(skb, &frag_hdr_off, NEXTHDR_FRAGMENT, NULL, NULL);
+ if (unlikely(err < 0 && err != -ENOENT))
+ return -EINVAL;
+
+ *has_l4 = true;
+ *fhdr_offset = 0;
+ if (unlikely(err == NEXTHDR_FRAGMENT)) {
+ if (unlikely(!pskb_may_pull(skb, frag_hdr_off + sizeof(*frag))))
+ return -EINVAL;
+ frag = (struct frag_hdr *)(skb->data + frag_hdr_off);
+
+ /* remember Fragment Header offset for downstream logic */
+ *fhdr_offset = frag_hdr_off;
+ first_frag = ipxlat_is_first_frag6(frag);
+
+ /* ipv6 forbids chaining FHs */
+ if (unlikely(frag->nexthdr == NEXTHDR_FRAGMENT))
+ return -EINVAL;
+
+ /* RFC 7915 Section 5.1.1 does not support extension headers
+ * after FH (except NEXTHDR_NONE)
+ */
+ if (unlikely(ipv6_ext_hdr(frag->nexthdr) &&
+ frag->nexthdr != NEXTHDR_NONE))
+ return -EPROTONOSUPPORT;
+
+ /* non-first fragments do not carry a full transport header */
+ if (!first_frag) {
+ *l4_proto = frag->nexthdr;
+ /* first byte after FH is fragment payload,
+ * not L4 header
+ */
+ *l4_offset = frag_hdr_off + sizeof(struct frag_hdr);
+ *has_l4 = false;
+ return 0;
+ }
+ }
+
+ /* walk extension headers to terminal protocol and compute offsets used
+ * by validation/translation
+ */
+ l4hdr_off = l3_offset;
+ err = ipv6_find_hdr(skb, &l4hdr_off, -1, NULL, NULL);
+ if (unlikely(err < 0))
+ return -EINVAL;
+
+ *l4_proto = err;
+ *l4_offset = l4hdr_off;
+ return 0;
+}
+
+/* RFC 7915 Section 5.1 says a Routing Header with Segments Left != 0
+ * must not be translated. IP6_FH_F_SKIP_RH makes ipv6_find_hdr skip RHs
+ * with Segments Left == 0, so any RH it still reports must be rejected.
+ */
+static int ipxlat_v6_check_rh(struct sk_buff *skb)
+{
+ unsigned int rh_off;
+ int flags, nexthdr;
+
+ rh_off = 0;
+ flags = IP6_FH_F_SKIP_RH;
+ nexthdr = ipv6_find_hdr(skb, &rh_off, NEXTHDR_ROUTING, NULL, &flags);
+ if (unlikely(nexthdr < 0 && nexthdr != -ENOENT))
+ return -EINVAL;
+ if (likely(nexthdr != NEXTHDR_ROUTING))
+ return 0;
+
+ return -EINVAL;
+}
+
+static int ipxlat_v6_pull_outer_l3(struct sk_buff *skb)
+{
+ const unsigned int l3_off = skb_network_offset(skb);
+ struct ipv6hdr *l3_hdr;
+
+ if (unlikely(!pskb_may_pull(skb, l3_off + sizeof(*l3_hdr))))
+ return -EINVAL;
+ l3_hdr = ipv6_hdr(skb);
+
+ /* translator does not support jumbograms; payload_len must match skb */
+ if (unlikely(l3_hdr->version != 6 ||
+ skb->len != sizeof(*l3_hdr) +
+ be16_to_cpu(l3_hdr->payload_len) ||
+ !ipxlat_v6_validate_saddr(&l3_hdr->saddr)))
+ return -EINVAL;
+
+ if (unlikely(l3_hdr->hop_limit <= 1))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int ipxlat_v6_pull_icmp_inner(struct sk_buff *skb,
+ unsigned int outer_payload_off)
+{
+ unsigned int inner_fhdr_off, inner_l4_off;
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ struct ipv6hdr *inner_ip6;
+ bool has_l4, is_icmp_err;
+ u8 inner_l4_proto;
+ int err;
+
+ if (unlikely(!pskb_may_pull(skb,
+ outer_payload_off + sizeof(*inner_ip6))))
+ return -EINVAL;
+
+ inner_ip6 = (struct ipv6hdr *)(skb->data + outer_payload_off);
+ if (unlikely(inner_ip6->version != 6))
+ return -EINVAL;
+
+ err = ipxlat_v6_walk_hdrs(skb, outer_payload_off, &inner_l4_proto,
+ &inner_fhdr_off, &inner_l4_off, &has_l4);
+ if (unlikely(err))
+ return err;
+
+ cb->inner_l3_offset = outer_payload_off;
+ cb->inner_l4_offset = inner_l4_off;
+ cb->inner_fragh_off = inner_fhdr_off;
+ cb->inner_l4_proto = inner_l4_proto;
+
+ if (likely(has_l4)) {
+ err = ipxlat_v6_pull_l4(skb, inner_l4_off, inner_l4_proto,
+ &is_icmp_err);
+ if (unlikely(err < 0))
+ return err;
+ if (unlikely(is_icmp_err))
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int ipxlat_v6_pull_hdrs(struct sk_buff *skb)
+{
+ const unsigned int l3_off = skb_network_offset(skb);
+ unsigned int fragh_off, l4_off, payload_off;
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ int l3_len, l4_len, err;
+ struct frag_hdr *frag;
+ bool has_l4;
+ u8 l4_proto;
+
+ /* parse IPv6 base header and perform basic structural checks */
+ err = ipxlat_v6_pull_outer_l3(skb);
+ if (unlikely(err))
+ return err;
+
+ /* walk extension/fragment headers and locate the transport header */
+ err = ipxlat_v6_walk_hdrs(skb, l3_off, &l4_proto, &fragh_off, &l4_off,
+ &has_l4);
+ /* -EPROTONOSUPPORT means packet layout is syntactically valid but
+ * unsupported by our RFC 7915 path
+ */
+ if (unlikely(err == -EPROTONOSUPPORT))
+ return -EINVAL;
+ if (unlikely(err))
+ return err;
+
+ l3_len = l4_off - l3_off;
+ payload_off = l4_off;
+
+ if (likely(has_l4)) {
+ l4_len = ipxlat_v6_pull_l4(skb, l4_off, l4_proto,
+ &cb->is_icmp_err);
+ if (unlikely(l4_len < 0))
+ return l4_len;
+ payload_off += l4_len;
+ }
+
+ /* RFC 7915 Section 5.1 */
+ err = ipxlat_v6_check_rh(skb);
+ if (unlikely(err))
+ return err;
+
+ if (unlikely(l4_proto == NEXTHDR_ICMP)) {
+ /* A stateless translator cannot reliably translate ICMP
+ * checksum across real IPv6 fragments, so fragmented ICMP is
+ * dropped. A Fragment Header alone, however, is not enough to
+ * decide: so-called atomic fragments (offset=0, M=0) carry a
+ * Fragment Header but are not actually fragmented.
+ */
+ if (unlikely(fragh_off)) {
+ if (unlikely(!pskb_may_pull(skb,
+ fragh_off + sizeof(*frag))))
+ return -EINVAL;
+
+ frag = (struct frag_hdr *)(skb->data + fragh_off);
+ if (unlikely(ipxlat_get_frag6_offset(frag) ||
+ (be16_to_cpu(frag->frag_off) & IP6_MF)))
+ return -EINVAL;
+ }
+
+ if (unlikely(cb->is_icmp_err)) {
+ /* validate the quoted packet in an ICMP error */
+ err = ipxlat_v6_pull_icmp_inner(skb, payload_off);
+ if (unlikely(err))
+ return err;
+ }
+ }
+
+ cb->l4_proto = l4_proto;
+ cb->l4_off = l4_off;
+ cb->fragh_off = fragh_off;
+ cb->payload_off = payload_off;
+ cb->l3_hdr_len = l3_len;
+
+ return 0;
+}
+
+static int ipxlat_v6_validate_icmp_csum(const struct sk_buff *skb)
+{
+ struct ipv6hdr *iph6;
+ unsigned int len;
+ __sum16 csum;
+
+ if (skb->ip_summed != CHECKSUM_NONE)
+ return 0;
+
+ iph6 = ipv6_hdr(skb);
+ len = ipxlat_skb_datagram_len(skb);
+ csum = csum_ipv6_magic(&iph6->saddr, &iph6->daddr, len, NEXTHDR_ICMP,
+ skb_checksum(skb, skb_transport_offset(skb), len,
+ 0));
+
+ return unlikely(csum) ? -EINVAL : 0;
+}
+
+/**
+ * ipxlat_v6_validate_skb - validate IPv6 input and fill parser metadata in cb
+ * @skb: packet to validate
+ *
+ * Ensures required headers are present/consistent and stores parsed offsets
+ * into &struct ipxlat_cb for the translation path.
+ *
+ * Return: 0 on success, negative errno on validation failure.
+ */
int ipxlat_v6_validate_skb(struct sk_buff *skb)
{
- return -EOPNOTSUPP;
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ int err;
+
+ if (unlikely(skb_shared(skb)))
+ return -EINVAL;
+
+ err = ipxlat_v6_pull_hdrs(skb);
+ if (unlikely(err))
+ return err;
+
+ skb_set_transport_header(skb, cb->l4_off);
+
+ if (unlikely(cb->is_icmp_err)) {
+ if (unlikely(cb->l4_proto != NEXTHDR_ICMP)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
+
+ /* The translated ICMPv4 checksum is recomputed from scratch,
+ * so reject bad ICMPv6 error checksums before conversion.
+ */
+ err = ipxlat_v6_validate_icmp_csum(skb);
+ if (unlikely(err))
+ return err;
+ }
+
+ return 0;
}
--
2.53.0
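
Unlike ICMPv4, the ICMPv6 checksum verified by
ipxlat_v6_validate_icmp_csum() above also covers an IPv6 pseudo-header
(source, destination, upper-layer length, next header 58). The following
user-space sketch shows that computation under assumptions: the sk6_ names
are hypothetical, the message is linear and at most 65535 bytes long, and
the kernel path uses csum_ipv6_magic() rather than this code.

```c
#include <stddef.h>
#include <stdint.h>

#define SK6_NEXTHDR_ICMP 58u

/* Accumulate big-endian 16-bit words into a 32-bit running sum. */
static uint32_t sk6_sum16(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return sum;
}

/* Returns the value to store in the checksum field when that field is
 * zero in @icmp6, or 0 when @icmp6 already carries a correct checksum.
 */
static uint16_t sk6_icmp6_csum(const uint8_t src[16], const uint8_t dst[16],
			       const uint8_t *icmp6, size_t len)
{
	uint32_t sum = 0;

	/* pseudo-header: addresses, 32-bit length, next header */
	sum = sk6_sum16(src, 16, sum);
	sum = sk6_sum16(dst, 16, sum);
	sum += (uint32_t)(len >> 16) + (uint32_t)(len & 0xffff);
	sum += SK6_NEXTHDR_ICMP;

	/* ICMPv6 header and payload */
	sum = sk6_sum16(icmp6, len, sum);

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

For an echo request (id 1, seq 1) from 2001:db8::1 to 2001:db8::2 the
checksum field works out to 0x2446, and re-running the function over the
completed message yields 0.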
* [RFC net-next 06/15] ipxlat: add transport checksum and offload helpers
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (4 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 05/15] ipxlat: add IPv6 " Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 07/15] ipxlat: add 4to6 and 6to4 TCP/UDP translation helpers Ralf Lici
` (8 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Add shared transport-layer helpers for checksum manipulation and offload
metadata normalization across family translation.
This introduces incremental and full checksum utilities plus generic
ICMP relayout/offload finalization routines reused by later 4->6 and
6->4 transport translation paths.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/transport.c | 146 +++++++++++++++++++++++++++++++++
drivers/net/ipxlat/transport.h | 83 +++++++++++++++++++
2 files changed, 229 insertions(+)
create mode 100644 drivers/net/ipxlat/transport.c
create mode 100644 drivers/net/ipxlat/transport.h
diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c
new file mode 100644
index 000000000000..cd786ce84adc
--- /dev/null
+++ b/drivers/net/ipxlat/transport.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <net/ip.h>
+#include <net/ip6_checksum.h>
+#include <net/tcp.h>
+#include <net/udp.h>
+
+#include "packet.h"
+#include "transport.h"
+
+/* set CHECKSUM_PARTIAL metadata for transport checksum completion */
+int ipxlat_set_partial_csum(struct sk_buff *skb, u16 csum_offset)
+{
+ if (likely(skb_partial_csum_set(skb, skb_transport_offset(skb),
+ csum_offset)))
+ return 0;
+ return -EINVAL;
+}
+
+static __wsum ipxlat_pseudohdr6_csum(const struct ipv6hdr *hdr)
+{
+ return ~csum_unfold(csum_ipv6_magic(&hdr->saddr, &hdr->daddr, 0, 0, 0));
+}
+
+static __wsum ipxlat_pseudohdr4_csum(const struct iphdr *hdr)
+{
+ return csum_tcpudp_nofold(hdr->saddr, hdr->daddr, 0, 0, 0);
+}
+
+static __sum16 ipxlat_46_update_csum(__sum16 csum16,
+ const struct iphdr *in_ip4,
+ const void *in_l4_hdr,
+ const struct ipv6hdr *out_ip6,
+ const void *out_l4_hdr, size_t l4_hdr_len)
+{
+ __wsum csum;
+
+ csum = ~csum_unfold(csum16);
+
+ /* replace pseudohdr and L4 header contributions, payload unchanged */
+ csum = csum_sub(csum, ipxlat_pseudohdr4_csum(in_ip4));
+ csum = csum_sub(csum, csum_partial(in_l4_hdr, l4_hdr_len, 0));
+ csum = csum_add(csum, ipxlat_pseudohdr6_csum(out_ip6));
+ csum = csum_add(csum, csum_partial(out_l4_hdr, l4_hdr_len, 0));
+ return csum_fold(csum);
+}
+
+static __sum16 ipxlat_64_update_csum(__sum16 csum16,
+ const struct ipv6hdr *in_ip6,
+ const void *in_l4_hdr,
+ size_t in_l4_hdr_len,
+ const struct iphdr *out_ip4,
+ const void *out_l4_hdr,
+ size_t out_l4_hdr_len)
+{
+ __wsum csum;
+
+ csum = ~csum_unfold(csum16);
+
+ /* only address terms matter because L4 length/proto are unchanged */
+ csum = csum_sub(csum, ipxlat_pseudohdr6_csum(in_ip6));
+ csum = csum_sub(csum, csum_partial(in_l4_hdr, in_l4_hdr_len, 0));
+
+ csum = csum_add(csum, ipxlat_pseudohdr4_csum(out_ip4));
+ csum = csum_add(csum, csum_partial(out_l4_hdr, out_l4_hdr_len, 0));
+
+ return csum_fold(csum);
+}
+
+__sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr,
+ const struct in6_addr *daddr,
+ const struct sk_buff *skb, unsigned int l4_off,
+ unsigned int l4_len, u8 proto)
+{
+ return csum_ipv6_magic(saddr, daddr, l4_len, proto,
+ skb_checksum(skb, l4_off, l4_len, 0));
+}
+
+/* Normalize checksum/offload metadata after address-family translation.
+ *
+ * Translation changes protocol family but keeps transport payload semantics
+ * intact, so TCP GSO only needs type remap (gso_from -> gso_to), while ICMP
+ * must clear stale GSO state because there is no ICMP GSO transform here.
+ *
+ * This mirrors forwarding expectations: reject LRO on xmit and clear hash
+ * when tuple semantics may have changed (fragments and non-TCP/UDP).
+ */
+int ipxlat_finalize_offload(struct sk_buff *skb, u8 l4_proto, bool is_fragment,
+ u32 gso_from, u32 gso_to)
+{
+ struct skb_shared_info *shinfo;
+
+ if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE))
+ skb->ip_summed = CHECKSUM_NONE;
+
+ if (!skb_is_gso(skb))
+ goto out_hash;
+
+ /* align with forwarding paths that reject LRO skbs before xmit */
+ if (unlikely(skb_warn_if_lro(skb)))
+ return -EINVAL;
+
+ shinfo = skb_shinfo(skb);
+ switch (l4_proto) {
+ case IPPROTO_TCP:
+ /* segment payload size is unchanged by address-family
+ * translation so there's no need to touch gso_size
+ */
+ if (shinfo->gso_type & gso_from) {
+ shinfo->gso_type &= ~gso_from;
+ shinfo->gso_type |= gso_to;
+ } else if (unlikely(!(shinfo->gso_type & gso_to))) {
+ return -EOPNOTSUPP;
+ }
+ break;
+ case IPPROTO_UDP:
+ break;
+ case IPPROTO_ICMP:
+ /* for ICMP there is no GSO transform; clear stale offload
+ * metadata so the stack treats it as a normal frame
+ */
+ skb_gso_reset(skb);
+ break;
+ default:
+ return -EPROTONOSUPPORT;
+ }
+
+out_hash:
+ if (unlikely(is_fragment ||
+ (l4_proto != IPPROTO_TCP && l4_proto != IPPROTO_UDP)))
+ skb_clear_hash(skb);
+ else
+ skb_clear_hash_if_not_l4(skb);
+ return 0;
+}
diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h
new file mode 100644
index 000000000000..bd228aecfb3b
--- /dev/null
+++ b/drivers/net/ipxlat/transport.h
@@ -0,0 +1,83 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_TRANSPORT_H_
+#define _NET_IPXLAT_TRANSPORT_H_
+
+#include <linux/icmp.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+
+/**
+ * ipxlat_l4_min_len - minimum transport header size for protocol
+ * @protocol: transport protocol identifier
+ *
+ * Return: minimum header length for @protocol, or 0 when unsupported.
+ */
+static inline unsigned int ipxlat_l4_min_len(u8 protocol)
+{
+ switch (protocol) {
+ case IPPROTO_TCP:
+ return sizeof(struct tcphdr);
+ case IPPROTO_UDP:
+ return sizeof(struct udphdr);
+ case IPPROTO_ICMP:
+ return sizeof(struct icmphdr);
+ default:
+ return 0;
+ }
+}
+
+/**
+ * ipxlat_set_partial_csum - program CHECKSUM_PARTIAL metadata on skb
+ * @skb: packet with transport checksum field
+ * @csum_offset: offset of checksum field within transport header
+ *
+ * Return: 0 on success, negative errno on invalid skb state.
+ */
+int ipxlat_set_partial_csum(struct sk_buff *skb, u16 csum_offset);
+
+/**
+ * ipxlat_l4_csum_ipv6 - compute full L4 checksum with IPv6 pseudo-header
+ * @saddr: IPv6 source address
+ * @daddr: IPv6 destination address
+ * @skb: packet buffer
+ * @l4_off: transport header offset
+ * @l4_len: transport span (header + payload)
+ * @proto: transport protocol
+ *
+ * Return: folded checksum value covering pseudo-header and transport payload.
+ */
+__sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr,
+ const struct in6_addr *daddr,
+ const struct sk_buff *skb, unsigned int l4_off,
+ unsigned int l4_len, u8 proto);
+
+/**
+ * ipxlat_finalize_offload - normalize checksum/GSO metadata after translation
+ * @skb: translated packet
+ * @l4_proto: resulting transport protocol
+ * @is_fragment: resulting packet is fragmented
+ * @gso_from: input TCP GSO type bit
+ * @gso_to: output TCP GSO type bit
+ *
+ * Converts TCP GSO family bits and clears stale checksum/hash state when
+ * offload metadata cannot be preserved across address-family translation.
+ *
+ * Return: 0 on success, negative errno on unsupported/offload-incompatible
+ * input.
+ */
+int ipxlat_finalize_offload(struct sk_buff *skb, u8 l4_proto, bool is_fragment,
+ u32 gso_from, u32 gso_to);
+
+#endif /* _NET_IPXLAT_TRANSPORT_H_ */
--
2.53.0
* [RFC net-next 07/15] ipxlat: add 4to6 and 6to4 TCP/UDP translation helpers
@ 2026-03-19 15:12 ` Ralf Lici
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Add protocol-specific transport translation entry points for both
address-family directions (4->6 and 6->4).
Wire up checksum adjustment for outer and quoted-inner TCP/UDP headers
and provide the transport routines consumed by the translation engine.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
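For readers unfamiliar with the checksum adjustment mentioned above, the
following userspace sketch shows the general idea of updating a UDP checksum
incrementally (RFC 1624, eqn. 3) when only the pseudo-header addresses change.
It is not the ipxlat_46_update_csum() implementation, which is not part of
this patch; every name below (csum_words, addr_sum, csum_replace_addrs,
l4_csum_full, the demo_* functions) and the 2001:db8::/96 prefix are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum over big-endian 16-bit words, folded to 16 bits.
 * An odd trailing byte is padded with zero. Hypothetical helper.
 */
static uint16_t csum_words(const uint8_t *p, size_t len, uint32_t sum)
{
	while (len > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)p[0] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Sum only the address bytes: for TCP/UDP the length and protocol words
 * contribute the same amount to the IPv4 and IPv6 pseudo-headers.
 */
static uint16_t addr_sum(const uint8_t *src, const uint8_t *dst, size_t alen)
{
	return csum_words(dst, alen, csum_words(src, alen, 0));
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m') */
static uint16_t csum_replace_addrs(uint16_t check, uint16_t old_sum,
				   uint16_t new_sum)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~old_sum;
	sum += new_sum;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Reference: full checksum over pseudo-header plus transport segment,
 * computed with the checksum field of the segment set to zero.
 */
static uint16_t l4_csum_full(const uint8_t *src, const uint8_t *dst,
			     size_t alen, uint8_t proto,
			     const uint8_t *seg, size_t seglen)
{
	uint32_t sum = addr_sum(src, dst, alen);

	assert(seglen <= 0xffff);
	sum += proto;
	sum += (uint32_t)seglen;
	return (uint16_t)~csum_words(seg, seglen, sum);
}

/* Illustrative data: documentation addresses, assumed /96 prefix */
static const uint8_t v4_src[4] = { 192, 0, 2, 1 };
static const uint8_t v4_dst[4] = { 198, 51, 100, 2 };
static const uint8_t v6_src[16] = { 0x20, 0x01, 0x0d, 0xb8, 0, 0, 0, 0,
				    0, 0, 0, 0, 192, 0, 2, 1 };
static const uint8_t v6_dst[16] = { 0x20, 0x01, 0x0d, 0xb8, 0, 0, 0, 0,
				    0, 0, 0, 0, 198, 51, 100, 2 };
/* UDP header (checksum field zeroed) followed by a 4-byte payload */
static const uint8_t udp_seg[12] = { 0x12, 0x34, 0x00, 0x35, 0x00, 0x0c,
				     0x00, 0x00, 'p', 'i', 'n', 'g' };

uint16_t demo_full_v4_check(void)
{
	return l4_csum_full(v4_src, v4_dst, 4, 17, udp_seg, sizeof(udp_seg));
}

uint16_t demo_full_v6_check(void)
{
	return l4_csum_full(v6_src, v6_dst, 16, 17, udp_seg, sizeof(udp_seg));
}

/* Translate the IPv4 checksum without touching the payload */
uint16_t demo_incremental_check(void)
{
	return csum_replace_addrs(demo_full_v4_check(),
				  addr_sum(v4_src, v4_dst, 4),
				  addr_sum(v6_src, v6_dst, 16));
}
```

The incremental result matches a full recomputation over the IPv6
pseudo-header, which is why a translator can leave the payload untouched.
With the RFC 6052 well-known prefix 64:ff9b::/96 the address delta sums to
zero in one's-complement arithmetic, so the checksum would not change at all;
the sketch uses a different prefix to make the update visible.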
drivers/net/ipxlat/transport.c | 194 +++++++++++++++++++++++++++++++++
drivers/net/ipxlat/transport.h | 20 ++++
2 files changed, 214 insertions(+)
diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c
index cd786ce84adc..3aa00c635916 100644
--- a/drivers/net/ipxlat/transport.c
+++ b/drivers/net/ipxlat/transport.c
@@ -144,3 +144,197 @@ int ipxlat_finalize_offload(struct sk_buff *skb, u8 l4_proto, bool is_fragment,
skb_clear_hash_if_not_l4(skb);
return 0;
}
+
+int ipxlat_46_outer_tcp(struct sk_buff *skb, const struct iphdr *in4)
+{
+ const struct ipv6hdr *iph6 = ipv6_hdr(skb);
+ struct tcphdr *tcp_new = tcp_hdr(skb);
+ struct tcphdr tcp_old;
+ __sum16 csum16;
+
+ /* CHECKSUM_PARTIAL keeps a pseudohdr seed in check, not a final
+ * transport checksum. For 4->6, we only re-seed it with IPv6 pseudohdr
+ * data and keep completion deferred to offload.
+ */
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ tcp_new->check = ~tcp_v6_check(ipxlat_skb_datagram_len(skb),
+ &iph6->saddr, &iph6->daddr, 0);
+ return ipxlat_set_partial_csum(skb,
+ offsetof(struct tcphdr, check));
+ }
+
+ /* zeroing check in old/new headers avoids double-accounting it */
+ csum16 = tcp_new->check;
+ tcp_old = *tcp_new;
+ tcp_old.check = 0;
+ tcp_new->check = 0;
+ tcp_new->check = ipxlat_46_update_csum(csum16, in4,
+ &tcp_old, iph6, tcp_new,
+ sizeof(*tcp_new));
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_46_outer_udp(struct sk_buff *skb, const struct iphdr *in4)
+{
+ const struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct ipv6hdr *iph6 = ipv6_hdr(skb);
+ struct udphdr *udp_new = udp_hdr(skb);
+ struct udphdr udp_old;
+ __sum16 csum16;
+
+ /* outer path enforces UDP zero-checksum policy in validation */
+ if (skb->ip_summed == CHECKSUM_PARTIAL && likely(udp_new->check != 0)) {
+ udp_new->check = ~udp_v6_check(ipxlat_skb_datagram_len(skb),
+ &iph6->saddr, &iph6->daddr, 0);
+ return ipxlat_set_partial_csum(skb,
+ offsetof(struct udphdr, check));
+ }
+
+ /* incoming UDP IPv4 has no checksum (legal in IPv4, not in IPv6) */
+ if (unlikely(udp_new->check == 0)) {
+ if (unlikely(!cb->udp_zero_csum_len))
+ return -EINVAL;
+
+ udp_new->check =
+ ipxlat_l4_csum_ipv6(&iph6->saddr, &iph6->daddr, skb,
+ skb_transport_offset(skb),
+ cb->udp_zero_csum_len, IPPROTO_UDP);
+ /* 0x0000 on wire means "no checksum"; preserve computed zero */
+ if (udp_new->check == 0)
+ udp_new->check = CSUM_MANGLED_0;
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+ }
+
+ csum16 = udp_new->check;
+ udp_old = *udp_new;
+ udp_old.check = 0;
+ udp_new->check = 0;
+ udp_new->check = ipxlat_46_update_csum(csum16, in4,
+ &udp_old, iph6, udp_new,
+ sizeof(*udp_new));
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_46_inner_tcp(struct sk_buff *skb, const struct iphdr *in4,
+ const struct ipv6hdr *iph6, struct tcphdr *tcp_new)
+{
+ struct tcphdr tcp_old;
+ __sum16 csum16;
+
+ csum16 = tcp_new->check;
+ tcp_old = *tcp_new;
+ tcp_old.check = 0;
+ tcp_new->check = 0;
+ tcp_new->check = ipxlat_46_update_csum(csum16, in4, &tcp_old, iph6,
+ tcp_new, sizeof(*tcp_new));
+ return 0;
+}
+
+int ipxlat_46_inner_udp(struct sk_buff *skb, const struct iphdr *in4,
+ const struct ipv6hdr *iph6, struct udphdr *udp_new)
+{
+ struct udphdr udp_old;
+ __sum16 csum16;
+
+ if (unlikely(udp_new->check == 0))
+ return 0;
+
+ csum16 = udp_new->check;
+ udp_old = *udp_new;
+ udp_old.check = 0;
+ udp_new->check = 0;
+ udp_new->check = ipxlat_46_update_csum(csum16, in4, &udp_old, iph6,
+ udp_new, sizeof(*udp_new));
+ return 0;
+}
+
+int ipxlat_64_outer_tcp(struct sk_buff *skb, const struct ipv6hdr *in6)
+{
+ struct tcphdr tcp_old, *tcp_new;
+ __sum16 csum16;
+
+ tcp_new = tcp_hdr(skb);
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ tcp_new->check = ~tcp_v4_check(ipxlat_skb_datagram_len(skb),
+ ip_hdr(skb)->saddr,
+ ip_hdr(skb)->daddr, 0);
+ return ipxlat_set_partial_csum(skb,
+ offsetof(struct tcphdr, check));
+ }
+
+ csum16 = tcp_new->check;
+ tcp_old = *tcp_new;
+ tcp_old.check = 0;
+ tcp_new->check = 0;
+ tcp_new->check = ipxlat_64_update_csum(csum16, in6, &tcp_old,
+ sizeof(tcp_old), ip_hdr(skb),
+ tcp_new, sizeof(*tcp_new));
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_64_outer_udp(struct sk_buff *skb, const struct ipv6hdr *in6)
+{
+ struct udphdr udp_old, *udp_new;
+ __sum16 csum16;
+
+ udp_new = udp_hdr(skb);
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ udp_new->check = ~udp_v4_check(ipxlat_skb_datagram_len(skb),
+ ip_hdr(skb)->saddr,
+ ip_hdr(skb)->daddr, 0);
+ return ipxlat_set_partial_csum(skb,
+ offsetof(struct udphdr, check));
+ }
+
+ csum16 = udp_new->check;
+ udp_old = *udp_new;
+ udp_old.check = 0;
+ udp_new->check = 0;
+ udp_new->check = ipxlat_64_update_csum(csum16, in6, &udp_old,
+ sizeof(udp_old), ip_hdr(skb),
+ udp_new, sizeof(*udp_new));
+ if (udp_new->check == 0)
+ udp_new->check = CSUM_MANGLED_0;
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_64_inner_tcp(struct sk_buff *skb, const struct ipv6hdr *in6,
+ const struct iphdr *out4, struct tcphdr *tcp_new)
+{
+ struct tcphdr tcp_old;
+ __sum16 csum16;
+
+ csum16 = tcp_new->check;
+ tcp_old = *tcp_new;
+ tcp_old.check = 0;
+ tcp_new->check = 0;
+ tcp_new->check = ipxlat_64_update_csum(csum16, in6, &tcp_old,
+ sizeof(tcp_old), out4, tcp_new,
+ sizeof(*tcp_new));
+ return 0;
+}
+
+int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
+ const struct iphdr *out4, struct udphdr *udp_new)
+{
+ struct udphdr udp_old;
+ __sum16 csum16;
+
+ csum16 = udp_new->check;
+ udp_old = *udp_new;
+ udp_old.check = 0;
+ udp_new->check = 0;
+ udp_new->check = ipxlat_64_update_csum(csum16, in6, &udp_old,
+ sizeof(udp_old), out4, udp_new,
+ sizeof(*udp_new));
+ if (udp_new->check == 0)
+ udp_new->check = CSUM_MANGLED_0;
+ return 0;
+}
diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h
index bd228aecfb3b..9b6fe422b01f 100644
--- a/drivers/net/ipxlat/transport.h
+++ b/drivers/net/ipxlat/transport.h
@@ -80,4 +80,24 @@ __sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr,
int ipxlat_finalize_offload(struct sk_buff *skb, u8 l4_proto, bool is_fragment,
u32 gso_from, u32 gso_to);
+/* outer transport translation helpers (packet L3 already translated) */
+int ipxlat_46_outer_tcp(struct sk_buff *skb, const struct iphdr *in4);
+int ipxlat_46_outer_udp(struct sk_buff *skb, const struct iphdr *in4);
+
+/* quoted-inner transport translation helpers for ICMP error payloads */
+int ipxlat_46_inner_tcp(struct sk_buff *skb, const struct iphdr *in4,
+ const struct ipv6hdr *iph6, struct tcphdr *tcp_new);
+int ipxlat_46_inner_udp(struct sk_buff *skb, const struct iphdr *in4,
+ const struct ipv6hdr *iph6, struct udphdr *udp_new);
+
+/* outer transport translation helpers (packet L3 already translated) */
+int ipxlat_64_outer_tcp(struct sk_buff *skb, const struct ipv6hdr *in6);
+int ipxlat_64_outer_udp(struct sk_buff *skb, const struct ipv6hdr *in6);
+
+/* quoted-inner transport translation helpers for ICMP error payloads */
+int ipxlat_64_inner_tcp(struct sk_buff *skb, const struct ipv6hdr *in6,
+ const struct iphdr *out4, struct tcphdr *tcp_new);
+int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
+ const struct iphdr *out4, struct udphdr *udp_new);
+
#endif /* _NET_IPXLAT_TRANSPORT_H_ */
--
2.53.0
* [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
@ 2026-03-19 15:12 ` Ralf Lici
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Introduce the core start_xmit processing flow: validate, select an
action, translate, and forward. Centralize action resolution in the
dispatch layer and keep per-direction translation logic separate from
device glue. The result is a single data-path entry point with explicit
control over drop/forward/emit behavior.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
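As a reading aid, the validate / select action / translate / forward flow
described above can be modelled in a few lines of userspace C. This is only a
simplified sketch and not the driver code: the demo_* names, the packet and
stats structs, and the boolean "valid"/"translatable" flags are hypothetical
stand-ins for the real validation and translation steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_P_IP	0x0800
#define DEMO_ETH_P_IPV6	0x86DD

/* Hypothetical subset of the dispatch actions */
enum demo_action { DEMO_ACT_DROP, DEMO_ACT_FWD };

/* Hypothetical packet model: flags replace real header checks */
struct demo_pkt {
	uint16_t ethertype;
	bool valid;		/* would validation accept it? */
	bool translatable;	/* would translation succeed? */
};

struct demo_stats {
	unsigned int tx_packets;
	unsigned int tx_dropped;
};

static int demo_validate(const struct demo_pkt *p)
{
	return p->valid ? 0 : -1;
}

/* Flip the address family on success, mimicking the L3 rewrite */
static int demo_xlat(struct demo_pkt *p)
{
	if (!p->translatable)
		return -1;
	p->ethertype = (p->ethertype == DEMO_ETH_P_IP) ?
		       DEMO_ETH_P_IPV6 : DEMO_ETH_P_IP;
	return 0;
}

/* Resolve the action in one place: any failure becomes a drop */
static enum demo_action demo_translate(struct demo_pkt *p)
{
	assert(p != NULL);

	if (p->ethertype != DEMO_ETH_P_IP && p->ethertype != DEMO_ETH_P_IPV6)
		return DEMO_ACT_DROP;
	if (demo_validate(p) || demo_xlat(p))
		return DEMO_ACT_DROP;
	return DEMO_ACT_FWD;
}

/* Single entry point: account the outcome and report success/failure */
static int demo_process(struct demo_stats *st, struct demo_pkt *p)
{
	switch (demo_translate(p)) {
	case DEMO_ACT_FWD:
		st->tx_packets++;
		return 0;
	case DEMO_ACT_DROP:
	default:
		st->tx_dropped++;
		return -1;
	}
}

/* Convenience wrapper returning the action chosen for one packet */
enum demo_action demo_classify(uint16_t ethertype, bool valid,
			       bool translatable)
{
	struct demo_pkt p = { ethertype, valid, translatable };

	return demo_translate(&p);
}

/* Run one good and two bad packets, return the drop counter */
unsigned int demo_dropped_count(void)
{
	struct demo_pkt pkts[] = {
		{ DEMO_ETH_P_IP, true, true },
		{ DEMO_ETH_P_IPV6, false, true },
		{ 0x0806, true, true },
	};
	struct demo_stats st = { 0, 0 };
	size_t i;

	for (i = 0; i < sizeof(pkts) / sizeof(pkts[0]); i++)
		demo_process(&st, &pkts[i]);
	return st.tx_dropped;
}
```

Keeping the failure-to-action mapping in a single function is what would let
later patches add further outcomes (such as emitting an ICMP error) without
touching the per-direction translation code.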
drivers/net/ipxlat/Makefile | 4 +
drivers/net/ipxlat/dispatch.c | 104 +++++++++++++++
drivers/net/ipxlat/dispatch.h | 71 +++++++++++
drivers/net/ipxlat/main.c | 6 +-
drivers/net/ipxlat/packet.c | 1 +
drivers/net/ipxlat/translate_46.c | 198 +++++++++++++++++++++++++++++
drivers/net/ipxlat/translate_46.h | 73 +++++++++++
drivers/net/ipxlat/translate_64.c | 205 ++++++++++++++++++++++++++++++
drivers/net/ipxlat/translate_64.h | 56 ++++++++
drivers/net/ipxlat/transport.c | 11 ++
drivers/net/ipxlat/transport.h | 5 +
11 files changed, 732 insertions(+), 2 deletions(-)
create mode 100644 drivers/net/ipxlat/dispatch.c
create mode 100644 drivers/net/ipxlat/dispatch.h
create mode 100644 drivers/net/ipxlat/translate_46.c
create mode 100644 drivers/net/ipxlat/translate_46.h
create mode 100644 drivers/net/ipxlat/translate_64.c
create mode 100644 drivers/net/ipxlat/translate_64.h
diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile
index 90dbc0489fa2..d7b7097aee5f 100644
--- a/drivers/net/ipxlat/Makefile
+++ b/drivers/net/ipxlat/Makefile
@@ -7,3 +7,7 @@ obj-$(CONFIG_IPXLAT) := ipxlat.o
ipxlat-objs += main.o
ipxlat-objs += address.o
ipxlat-objs += packet.o
+ipxlat-objs += transport.o
+ipxlat-objs += dispatch.o
+ipxlat-objs += translate_46.o
+ipxlat-objs += translate_64.o
diff --git a/drivers/net/ipxlat/dispatch.c b/drivers/net/ipxlat/dispatch.c
new file mode 100644
index 000000000000..133d30859f49
--- /dev/null
+++ b/drivers/net/ipxlat/dispatch.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <net/ip.h>
+
+#include "dispatch.h"
+#include "packet.h"
+#include "translate_46.h"
+#include "translate_64.h"
+
+static enum ipxlat_action
+ipxlat_resolve_failed_action(const struct sk_buff *skb)
+{
+ return IPXLAT_ACT_DROP;
+}
+
+enum ipxlat_action ipxlat_translate(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb)
+{
+ const u16 proto = ntohs(skb->protocol);
+
+ memset(skb->cb, 0, sizeof(struct ipxlat_cb));
+
+ if (proto == ETH_P_IPV6) {
+ if (unlikely(ipxlat_v6_validate_skb(skb)) ||
+ unlikely(ipxlat_64_translate(ipxlat, skb)))
+ return ipxlat_resolve_failed_action(skb);
+
+ return IPXLAT_ACT_FWD;
+ } else if (likely(proto == ETH_P_IP)) {
+ if (unlikely(ipxlat_v4_validate_skb(ipxlat, skb)))
+ return ipxlat_resolve_failed_action(skb);
+
+ if (unlikely(ipxlat_46_translate(ipxlat, skb)))
+ return ipxlat_resolve_failed_action(skb);
+
+ return IPXLAT_ACT_FWD;
+ }
+
+ return IPXLAT_ACT_DROP;
+}
+
+/* mark current skb as drop-with-icmp and cache type/code/info for dispatch */
+void ipxlat_mark_icmp_drop(struct sk_buff *skb, u8 type, u8 code, u32 info)
+{
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+
+ cb->emit_icmp_err = true;
+ cb->icmp_err.type = type;
+ cb->icmp_err.code = code;
+ cb->icmp_err.info = info;
+}
+
+static void ipxlat_forward_pkt(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ const unsigned int len = skb->len;
+ int err;
+
+ /* reinject as a fresh packet with scrubbed metadata */
+ skb_set_queue_mapping(skb, 0);
+ skb_scrub_packet(skb, false);
+
+ err = gro_cells_receive(&ipxlat->gro_cells, skb);
+ if (likely(err == NET_RX_SUCCESS))
+ dev_dstats_rx_add(ipxlat->dev, len);
+ /* on failure gro_cells updates rx drop stats internally */
+}
+
+int ipxlat_process_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
+ bool allow_pre_frag)
+{
+ enum ipxlat_action action;
+ int err = -EINVAL;
+
+ (void)allow_pre_frag;
+
+ action = ipxlat_translate(ipxlat, skb);
+ switch (action) {
+ case IPXLAT_ACT_FWD:
+ dev_dstats_tx_add(ipxlat->dev, skb->len);
+ ipxlat_forward_pkt(ipxlat, skb);
+ return 0;
+ case IPXLAT_ACT_DROP:
+ goto drop_free;
+ default:
+ DEBUG_NET_WARN_ON_ONCE(1);
+ goto drop_free;
+ }
+
+drop_free:
+ dev_dstats_tx_dropped(ipxlat->dev);
+ kfree_skb(skb);
+ return err;
+}
diff --git a/drivers/net/ipxlat/dispatch.h b/drivers/net/ipxlat/dispatch.h
new file mode 100644
index 000000000000..fa6fafea656b
--- /dev/null
+++ b/drivers/net/ipxlat/dispatch.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_DISPATCH_H_
+#define _NET_IPXLAT_DISPATCH_H_
+
+#include "ipxlpriv.h"
+
+struct sk_buff;
+
+/**
+ * enum ipxlat_action - result of packet translation dispatch
+ * @IPXLAT_ACT_DROP: drop the packet
+ * @IPXLAT_ACT_FWD: packet translated and ready for forward reinjection
+ * @IPXLAT_ACT_PRE_FRAG: packet must be fragmented before 4->6 translation
+ * @IPXLAT_ACT_ICMP_ERR: drop packet and emit translator-generated ICMP error
+ */
+enum ipxlat_action {
+ IPXLAT_ACT_DROP,
+ IPXLAT_ACT_FWD,
+ IPXLAT_ACT_PRE_FRAG,
+ IPXLAT_ACT_ICMP_ERR,
+};
+
+/**
+ * ipxlat_mark_icmp_drop - cache translator-generated ICMP action in skb cb
+ * @skb: packet being rejected
+ * @type: ICMP type to emit
+ * @code: ICMP code to emit
+ * @info: ICMP auxiliary info (pointer/MTU), host-endian
+ *
+ * This does not emit immediately; dispatch consumes the mark later and sends
+ * the ICMP error through the appropriate address family path.
+ */
+void ipxlat_mark_icmp_drop(struct sk_buff *skb, u8 type, u8 code, u32 info);
+
+/**
+ * ipxlat_translate - validate/translate one packet and return next action
+ * @ipxlat: translator private context
+ * @skb: packet to process
+ *
+ * Return: one of &enum ipxlat_action.
+ */
+enum ipxlat_action ipxlat_translate(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb);
+
+/**
+ * ipxlat_process_skb - top-level packet handler for ndo_start_xmit/reinjection
+ * @ipxlat: translator private context
+ * @skb: packet to process
+ * @allow_pre_frag: allow 4->6 pre-fragment action for this invocation
+ *
+ * The function always consumes @skb directly or through fragmentation
+ * callback/reinjection paths.
+ *
+ * Return: 0 on success, negative errno on processing failure.
+ */
+int ipxlat_process_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
+ bool allow_pre_frag);
+
+#endif /* _NET_IPXLAT_DISPATCH_H_ */
diff --git a/drivers/net/ipxlat/main.c b/drivers/net/ipxlat/main.c
index 26b7f5b6ff20..a1b4bcd39478 100644
--- a/drivers/net/ipxlat/main.c
+++ b/drivers/net/ipxlat/main.c
@@ -15,6 +15,7 @@
#include <net/ip.h>
+#include "dispatch.h"
#include "ipxlpriv.h"
#include "main.h"
@@ -56,8 +57,9 @@ static void ipxlat_dev_uninit(struct net_device *dev)
static int ipxlat_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
- dev_dstats_tx_dropped(dev);
- kfree_skb(skb);
+ struct ipxlat_priv *ipxlat = netdev_priv(dev);
+
+ ipxlat_process_skb(ipxlat, skb, true);
return NETDEV_TX_OK;
}
diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c
index b9a9af1b3adb..b37a3e55aff8 100644
--- a/drivers/net/ipxlat/packet.c
+++ b/drivers/net/ipxlat/packet.c
@@ -13,6 +13,7 @@
#include <linux/icmp.h>
+#include "dispatch.h"
#include "packet.h"
/* Shift cached skb cb offsets by the L3 header delta after in-place rewrite.
diff --git a/drivers/net/ipxlat/translate_46.c b/drivers/net/ipxlat/translate_46.c
new file mode 100644
index 000000000000..aec8500db2c2
--- /dev/null
+++ b/drivers/net/ipxlat/translate_46.c
@@ -0,0 +1,198 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <net/ip6_route.h>
+
+#include "address.h"
+#include "packet.h"
+#include "transport.h"
+#include "translate_46.h"
+
+u8 ipxlat_46_map_proto_to_nexthdr(u8 protocol)
+{
+ return (protocol == IPPROTO_ICMP) ? NEXTHDR_ICMP : protocol;
+}
+
+void ipxlat_46_build_frag_hdr(struct frag_hdr *fh6, const struct iphdr *hdr4,
+ u8 l4_proto)
+{
+ fh6->nexthdr = ipxlat_46_map_proto_to_nexthdr(l4_proto);
+ fh6->reserved = 0;
+ fh6->frag_off =
+ ipxlat_build_frag6_offset(ipxlat_get_frag4_offset(hdr4),
+ !!(be16_to_cpu(hdr4->frag_off) &
+ IP_MF));
+ fh6->identification = cpu_to_be32(be16_to_cpu(hdr4->id));
+}
+
+void ipxlat_46_build_l3(struct ipv6hdr *iph6, const struct iphdr *iph4,
+ unsigned int payload_len, u8 nexthdr, u8 hop_limit)
+{
+ iph6->version = 6;
+ iph6->priority = iph4->tos >> 4;
+ iph6->flow_lbl[0] = (iph4->tos & 0x0F) << 4;
+ iph6->flow_lbl[1] = 0;
+ iph6->flow_lbl[2] = 0;
+ iph6->payload_len = htons(payload_len);
+ iph6->nexthdr = nexthdr;
+ iph6->hop_limit = hop_limit;
+}
+
+/* Lookup post-translation IPv6 PMTU for 4->6 output decisions.
+ * Falls back to translator MTU on routing failures and clamps route MTU
+ * against translator egress MTU.
+ */
+unsigned int ipxlat_46_lookup_pmtu6(struct ipxlat_priv *ipxlat,
+ const struct sk_buff *skb,
+ const struct iphdr *in4)
+{
+ unsigned int mtu6, dev_mtu;
+ struct flowi6 fl6 = {};
+ struct dst_entry *dst;
+
+ dev_mtu = READ_ONCE(ipxlat->dev->mtu);
+
+ ipxlat_46_convert_addr(&ipxlat->xlat_prefix6, in4->saddr,
+ &fl6.saddr);
+ ipxlat_46_convert_addr(&ipxlat->xlat_prefix6, in4->daddr,
+ &fl6.daddr);
+ fl6.flowi6_mark = skb->mark;
+
+ dst = ip6_route_output(dev_net(ipxlat->dev), NULL, &fl6);
+ if (unlikely(dst->error)) {
+ mtu6 = dev_mtu;
+ goto out;
+ }
+
+ /* Route lookup can return a very large MTU (eg, local/loopback style
+ * routes) that does not reflect the translator egress constraint.
+ * Clamp with the translator device MTU so DF decisions are stable and
+ * pre-fragment planning never targets packets larger than what this
+ * interface can hand to the next stages.
+ */
+ mtu6 = min_t(unsigned int, dst_mtu(dst), dev_mtu);
+
+out:
+ dst_release(dst);
+ return mtu6;
+}
+
+/**
+ * ipxlat_46_translate - translate one validated packet from IPv4 to IPv6
+ * @ipxlat: translator private context
+ * @skb: packet to translate
+ *
+ * Rewrites outer L3 in place, rebases cached offsets and translates L4 on
+ * first fragments only.
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_46_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ unsigned int min_l4_len, old_l3_len, new_l3_len;
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct iphdr outer4 = *ip_hdr(skb);
+ const u8 in_l4_proto = cb->l4_proto;
+ bool has_frag, first_frag;
+ struct frag_hdr *fh6;
+ struct ipv6hdr *iph6;
+ int l3_delta, err;
+ u8 out_l4_proto;
+
+ /* snapshot the original IPv4 header fields before skb layout changes */
+ has_frag = ip_is_fragment(&outer4);
+ first_frag = ipxlat_is_first_frag4(&outer4);
+ out_l4_proto = ipxlat_46_map_proto_to_nexthdr(in_l4_proto);
+
+ old_l3_len = cb->l3_hdr_len;
+ new_l3_len = sizeof(struct ipv6hdr) +
+ (has_frag ? sizeof(struct frag_hdr) : 0);
+ l3_delta = (int)new_l3_len - (int)old_l3_len;
+
+ /* make room for the new hdrs */
+ if (unlikely(skb_cow_head(skb, max_t(int, 0, l3_delta))))
+ return -ENOMEM;
+
+ /* replace outer L3 area: drop IPv4 hdr, reserve IPv6(+Frag) hdr */
+ skb_pull(skb, old_l3_len);
+ skb_push(skb, new_l3_len);
+ skb_reset_network_header(skb);
+ skb_set_transport_header(skb, new_l3_len);
+ skb->protocol = htons(ETH_P_IPV6);
+
+ /* build outer IPv6 base hdr from translated IPv4 fields */
+ iph6 = ipv6_hdr(skb);
+ ipxlat_46_build_l3(iph6, &outer4, skb->len - sizeof(*iph6),
+ out_l4_proto, outer4.ttl - 1);
+
+ /* translate IPv4 endpoints into IPv6 addresses using xlat_prefix6 */
+ ipxlat_46_convert_addrs(&ipxlat->xlat_prefix6, &outer4, iph6);
+
+ /* add IPv6 fragment hdr when the IPv4 packet carried fragmentation */
+ if (unlikely(has_frag)) {
+ iph6->nexthdr = NEXTHDR_FRAGMENT;
+
+ fh6 = (struct frag_hdr *)(iph6 + 1);
+ ipxlat_46_build_frag_hdr(fh6, &outer4, in_l4_proto);
+ cb->fragh_off = sizeof(struct ipv6hdr);
+ }
+
+ /* Rebase cached offsets after L3 size delta.
+ * For outer 4->6 translation this should not underflow: cached offsets
+ * were built from l3_off + ip4_len(+...) and delta = ip6_len - ip4_len,
+ * so ip4_len cancels out after rebasing. A failure here means internal
+ * metadata inconsistency, not a packet validation outcome.
+ */
+ err = ipxlat_cb_rebase_offsets(cb, l3_delta);
+ if (unlikely(err)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return err;
+ }
+
+ cb->l3_hdr_len = new_l3_len;
+ cb->l4_proto = out_l4_proto;
+ DEBUG_NET_WARN_ON_ONCE(!ipxlat_cb_offsets_valid(cb));
+
+ /* non-first fragments have no transport header to translate */
+ if (unlikely(!first_frag))
+ goto out;
+
+ /* ensure transport bytes are writable before L4 csum/proto rewrites */
+ min_l4_len = ipxlat_l4_min_len(in_l4_proto);
+ if (unlikely(skb_ensure_writable(skb, skb_transport_offset(skb) +
+ min_l4_len)))
+ return -ENOMEM;
+
+ /* translate transport hdr and pseudohdr dependent checksums */
+ switch (in_l4_proto) {
+ case IPPROTO_TCP:
+ err = ipxlat_46_outer_tcp(skb, &outer4);
+ break;
+ case IPPROTO_UDP:
+ err = ipxlat_46_outer_udp(skb, &outer4);
+ break;
+ case IPPROTO_ICMP:
+ err = ipxlat_46_icmp(ipxlat, skb);
+ break;
+ default:
+ err = 0;
+ break;
+ }
+ if (unlikely(err))
+ return err;
+
+out:
+ /* normalize checksum/offload metadata for the translated frame */
+ return ipxlat_finalize_offload(skb, in_l4_proto, has_frag,
+ SKB_GSO_TCPV4, SKB_GSO_TCPV6);
+}
diff --git a/drivers/net/ipxlat/translate_46.h b/drivers/net/ipxlat/translate_46.h
new file mode 100644
index 000000000000..75def10d0cad
--- /dev/null
+++ b/drivers/net/ipxlat/translate_46.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_TRANSLATE_46_H_
+#define _NET_IPXLAT_TRANSLATE_46_H_
+
+#include "ipxlpriv.h"
+
+struct iphdr;
+struct ipv6hdr;
+struct frag_hdr;
+struct sk_buff;
+
+/**
+ * ipxlat_46_map_proto_to_nexthdr - map IPv4 L4 protocol to IPv6 nexthdr
+ * @protocol: IPv4 L4 protocol
+ *
+ * Return: IPv6 next-header value corresponding to @protocol.
+ */
+u8 ipxlat_46_map_proto_to_nexthdr(u8 protocol);
+
+/**
+ * ipxlat_46_build_frag_hdr - build IPv6 Fragment Header from IPv4 fragment info
+ * @fh6: output IPv6 fragment header
+ * @hdr4: source IPv4 header
+ * @l4_proto: original IPv4 L4 protocol
+ */
+void ipxlat_46_build_frag_hdr(struct frag_hdr *fh6, const struct iphdr *hdr4,
+ u8 l4_proto);
+
+/**
+ * ipxlat_46_build_l3 - build translated outer IPv6 header from IPv4 metadata
+ * @iph6: output IPv6 header
+ * @iph4: source IPv4 header
+ * @payload_len: IPv6 payload length
+ * @nexthdr: resulting IPv6 nexthdr
+ * @hop_limit: resulting IPv6 hop limit
+ */
+void ipxlat_46_build_l3(struct ipv6hdr *iph6, const struct iphdr *iph4,
+ unsigned int payload_len, u8 nexthdr, u8 hop_limit);
+
+/**
+ * ipxlat_46_lookup_pmtu6 - lookup post-translation IPv6 PMTU for a 4->6 packet
+ * @ipxlat: translator private context
+ * @skb: packet being translated
+ * @in4: source IPv4 header snapshot
+ *
+ * Return: effective PMTU clamped against translator device MTU.
+ */
+unsigned int ipxlat_46_lookup_pmtu6(struct ipxlat_priv *ipxlat,
+ const struct sk_buff *skb,
+ const struct iphdr *in4);
+
+/**
+ * ipxlat_46_translate - translate outer packet from IPv4 to IPv6 in place
+ * @ipxlat: translator private context
+ * @skb: packet to translate
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_46_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
+
+#endif /* _NET_IPXLAT_TRANSLATE_46_H_ */
diff --git a/drivers/net/ipxlat/translate_64.c b/drivers/net/ipxlat/translate_64.c
new file mode 100644
index 000000000000..50a95fb75f9d
--- /dev/null
+++ b/drivers/net/ipxlat/translate_64.c
@@ -0,0 +1,205 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <linux/icmpv6.h>
+#include <net/ip.h>
+
+#include "translate_64.h"
+#include "address.h"
+#include "packet.h"
+#include "transport.h"
+
+u8 ipxlat_64_map_nexthdr_proto(u8 nexthdr)
+{
+ return (nexthdr == NEXTHDR_ICMP) ? IPPROTO_ICMP : nexthdr;
+}
+
+void ipxlat_64_build_l3(struct iphdr *iph4, const struct ipv6hdr *iph6,
+ unsigned int tot_len, __be16 frag_off, u8 protocol,
+ __be32 saddr, __be32 daddr, u8 ttl, __be16 id)
+{
+ iph4->version = 4;
+ iph4->ihl = 5;
+ iph4->tos = ipxlat_get_ipv6_tclass(iph6);
+ iph4->tot_len = cpu_to_be16(tot_len);
+ iph4->frag_off = frag_off;
+ iph4->ttl = ttl;
+ iph4->protocol = protocol;
+ iph4->saddr = saddr;
+ iph4->daddr = daddr;
+ iph4->id = id;
+ iph4->check = 0;
+ iph4->check = ip_fast_csum(iph4, iph4->ihl);
+}
+
+static __be16 ipxlat_64_build_frag_off(const struct sk_buff *skb,
+ const struct frag_hdr *frag6,
+ u8 l4_proto)
+{
+ bool df, mf, over_mtu;
+ u16 frag_offset;
+
+ /* preserve real IPv6 fragmentation state with a Fragment Header */
+ if (frag6) {
+ mf = !!(be16_to_cpu(frag6->frag_off) & IP6_MF);
+ frag_offset = ipxlat_get_frag6_offset(frag6);
+ return ipxlat_build_frag4_offset(false, mf, frag_offset);
+ }
+
+ /* frag_list implies segmented payload emitted as fragments */
+ if (skb_has_frag_list(skb))
+ return ipxlat_build_frag4_offset(false, false, 0);
+
+ if (skb_is_gso(skb)) {
+ /* GSO frames are one datagram here; set DF only for TCP
+ * when later segmentation exceeds IPv6 minimum MTU
+ */
+ df = (l4_proto == IPPROTO_TCP) &&
+ (ipxlat_skb_cb(skb)->payload_off +
+ skb_shinfo(skb)->gso_size >
+ (IPV6_MIN_MTU - sizeof(struct iphdr)));
+ return ipxlat_build_frag4_offset(df, false, 0);
+ }
+
+ over_mtu = skb->len > (IPV6_MIN_MTU - sizeof(struct iphdr));
+ return ipxlat_build_frag4_offset(over_mtu, false, 0);
+}
+
+/**
+ * ipxlat_64_translate - translate one validated packet from IPv6 to IPv4
+ * @ipxlat: translator private context
+ * @skb: packet to translate
+ *
+ * Rewrites outer L3 in place, rebases cached offsets and translates L4 on
+ * first fragments only.
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_64_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ unsigned int min_l4_len, old_l3_len, new_l3_len;
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ struct ipv6hdr outer6 = *ipv6_hdr(skb);
+ bool is_icmp_err, has_frag, first_frag;
+ u8 in_l4_proto, out_l4_proto;
+ struct frag_hdr frag_copy;
+ struct frag_hdr *frag6;
+ __be32 saddr, daddr;
+ __be16 frag_off, id;
+ struct iphdr *iph4;
+ int l3_delta, err;
+
+ /* snapshot original outer IPv6 fields before L3 rewrite */
+ frag6 = cb->fragh_off ? (struct frag_hdr *)(skb->data + cb->fragh_off) :
+ NULL;
+ has_frag = !!frag6;
+ in_l4_proto = cb->l4_proto;
+ is_icmp_err = cb->is_icmp_err;
+ out_l4_proto = ipxlat_64_map_nexthdr_proto(in_l4_proto);
+
+ old_l3_len = cb->l3_hdr_len;
+ new_l3_len = sizeof(struct iphdr);
+ l3_delta = (int)new_l3_len - (int)old_l3_len;
+
+ if (unlikely(has_frag))
+ frag_copy = *frag6;
+ first_frag = ipxlat_is_first_frag6(has_frag ? &frag_copy : NULL);
+
+ if (unlikely(is_icmp_err)) {
+ if (unlikely(in_l4_proto != NEXTHDR_ICMP))
+ return -EINVAL;
+ }
+
+ /* derive translated IPv4 endpoints */
+ err = ipxlat_64_convert_addrs(&ipxlat->xlat_prefix6, &outer6,
+ is_icmp_err, &saddr, &daddr);
+ if (unlikely(err))
+ return err;
+
+ /* replace outer IPv6 hdr with IPv4 hdr in-place */
+ skb_pull(skb, old_l3_len);
+ skb_push(skb, new_l3_len);
+ skb_reset_network_header(skb);
+ skb_set_transport_header(skb, new_l3_len);
+ skb->protocol = htons(ETH_P_IP);
+
+ /* Rebase cached offsets after L3 size delta.
+ * For outer 6->4 translation this should not underflow: cached offsets
+ * were built from l3_off + ip6_len (+ ...), and
+ * delta = sizeof(struct iphdr) - ip6_len, so ip6_len cancels out after
+ * rebasing. A failure here means internal metadata inconsistency, not
+ * a packet validation outcome.
+ */
+ err = ipxlat_cb_rebase_offsets(cb, l3_delta);
+ if (unlikely(err)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return err;
+ }
+
+ cb->l3_hdr_len = sizeof(struct iphdr);
+ cb->fragh_off = 0;
+ cb->l4_proto = out_l4_proto;
+ DEBUG_NET_WARN_ON_ONCE(!ipxlat_cb_offsets_valid(cb));
+
+ /* build outer IPv4 base hdr from translated IPv6 fields */
+ iph4 = ip_hdr(skb);
+ frag_off = ipxlat_64_build_frag_off(skb, has_frag ? &frag_copy : NULL,
+ out_l4_proto);
+ /* when source had Fragment Header we preserve the low-order 16 bits
+ * of its identification; otherwise a fresh IPv4 ID is selected below
+ */
+ id = has_frag ? cpu_to_be16(be32_to_cpu(frag_copy.identification)) : 0;
+ ipxlat_64_build_l3(iph4, &outer6, skb->len, frag_off,
+ out_l4_proto, saddr, daddr,
+ outer6.hop_limit - 1, id);
+
+ if (likely(!has_frag)) {
+ iph4->id = 0;
+ __ip_select_ident(dev_net(ipxlat->dev), iph4, 1);
+ iph4->check = 0;
+ iph4->check = ip_fast_csum(iph4, iph4->ihl);
+ }
+
+ /* non-first fragments have no transport header to translate */
+ if (unlikely(!first_frag))
+ goto out;
+
+ /* ensure transport bytes are writable before L4 csum/proto rewrites */
+ min_l4_len = ipxlat_l4_min_len(out_l4_proto);
+ if (unlikely(skb_ensure_writable(skb, skb_transport_offset(skb) +
+ min_l4_len)))
+ return -ENOMEM;
+
+ /* translate transport hdr and pseudohdr dependent checksums */
+ switch (out_l4_proto) {
+ case IPPROTO_TCP:
+ err = ipxlat_64_outer_tcp(skb, &outer6);
+ break;
+ case IPPROTO_UDP:
+ err = ipxlat_64_outer_udp(skb, &outer6);
+ break;
+ case IPPROTO_ICMP:
+ err = ipxlat_64_icmp(ipxlat, skb, &outer6);
+ break;
+ default:
+ err = 0;
+ break;
+ }
+ if (unlikely(err))
+ return err;
+
+out:
+ /* normalize checksum/offload metadata for the translated frame */
+ return ipxlat_finalize_offload(skb, out_l4_proto, ip_is_fragment(iph4),
+ SKB_GSO_TCPV6, SKB_GSO_TCPV4);
+}
diff --git a/drivers/net/ipxlat/translate_64.h b/drivers/net/ipxlat/translate_64.h
new file mode 100644
index 000000000000..269d1955944f
--- /dev/null
+++ b/drivers/net/ipxlat/translate_64.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_TRANSLATE_64_H_
+#define _NET_IPXLAT_TRANSLATE_64_H_
+
+#include "ipxlpriv.h"
+
+struct sk_buff;
+struct iphdr;
+struct ipv6hdr;
+
+/**
+ * ipxlat_64_build_l3 - build translated outer IPv4 header from IPv6 metadata
+ * @iph4: output IPv4 header
+ * @iph6: source IPv6 header
+ * @tot_len: resulting IPv4 total length
+ * @frag_off: resulting IPv4 fragment offset/flags
+ * @protocol: resulting IPv4 L4 protocol
+ * @saddr: resulting IPv4 source address
+ * @daddr: resulting IPv4 destination address
+ * @ttl: resulting IPv4 TTL
+ * @id: resulting IPv4 identification field
+ */
+void ipxlat_64_build_l3(struct iphdr *iph4, const struct ipv6hdr *iph6,
+ unsigned int tot_len, __be16 frag_off, u8 protocol,
+ __be32 saddr, __be32 daddr, u8 ttl, __be16 id);
+
+/**
+ * ipxlat_64_translate - translate outer packet from IPv6 to IPv4 in place
+ * @ipxlat: translator private context
+ * @skb: packet to translate
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_64_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
+
+/**
+ * ipxlat_64_map_nexthdr_proto - map IPv6 nexthdr to IPv4 L4 protocol
+ * @nexthdr: IPv6 next-header value
+ *
+ * Return: IPv4 protocol value corresponding to @nexthdr.
+ */
+u8 ipxlat_64_map_nexthdr_proto(u8 nexthdr);
+
+#endif /* _NET_IPXLAT_TRANSLATE_64_H_ */
diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c
index 3aa00c635916..78548d0b8c22 100644
--- a/drivers/net/ipxlat/transport.c
+++ b/drivers/net/ipxlat/transport.c
@@ -338,3 +338,14 @@ int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
udp_new->check = CSUM_MANGLED_0;
return 0;
}
+
+int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ return -EPROTONOSUPPORT;
+}
+
+int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
+ const struct ipv6hdr *outer6)
+{
+ return -EPROTONOSUPPORT;
+}
diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h
index 9b6fe422b01f..0e69b98eafd0 100644
--- a/drivers/net/ipxlat/transport.h
+++ b/drivers/net/ipxlat/transport.h
@@ -100,4 +100,9 @@ int ipxlat_64_inner_tcp(struct sk_buff *skb, const struct ipv6hdr *in6,
int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
const struct iphdr *out4, struct udphdr *udp_new);
+/* temporary ICMP stubs until ICMP translation support is introduced */
+int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
+int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
+ const struct ipv6hdr *outer6);
+
#endif /* _NET_IPXLAT_TRANSPORT_H_ */
--
2.53.0
* [RFC net-next 09/15] ipxlat: emit translator-generated ICMP errors on drop
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (7 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 08/15] ipxlat: add translation engine and dispatch core Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 10/15] ipxlat: add 4to6 pre-fragmentation path Ralf Lici
` (5 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
When validation or policy requires dropping a packet and generating an
ICMP error, route that failure through explicit ICMP emission paths so
the sender can be notified where appropriate. This commit adds
translator-originated error generation for both directions and
integrates it into dispatch action handling without changing normal
forwarding behavior.
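
The mark-then-resolve split described above can be sketched in userspace
C. This is only an illustration under stated assumptions: struct pkt_cb,
mark_icmp_drop(), validate() and translate() are hypothetical stand-ins
for ipxlat_cb, ipxlat_mark_icmp_drop() and the dispatch code, and it
assumes the mark helper also sets the emit_icmp_err flag that
ipxlat_resolve_failed_action() reads.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are the driver's types. */
enum xlat_action { ACT_FWD, ACT_DROP, ACT_ICMP_ERR };

struct pkt_cb {
	bool emit_icmp_err;	/* set only when an ICMP error is owed */
	uint8_t type, code;
	uint32_t info;
};

/* Record which ICMP error to send; emission happens later in dispatch. */
static void mark_icmp_drop(struct pkt_cb *cb, uint8_t type, uint8_t code,
			   uint32_t info)
{
	assert(cb != NULL);
	cb->emit_icmp_err = true;
	cb->type = type;
	cb->code = code;
	cb->info = info;
}

/* Validation step: a truncated header is dropped silently, an expired TTL
 * is dropped with Time Exceeded (type 11, code 0) recorded in the cb.
 */
static int validate(struct pkt_cb *cb, unsigned int len, uint8_t ttl)
{
	if (len < 20)
		return -EINVAL;
	if (ttl <= 1) {
		mark_icmp_drop(cb, 11, 0, 0);
		return -EINVAL;
	}
	return 0;
}

/* Dispatcher side: the errno alone does not pick the action, the cb does. */
static enum xlat_action translate(struct pkt_cb *cb, unsigned int len,
				  uint8_t ttl)
{
	if (validate(cb, len, ttl))
		return cb->emit_icmp_err ? ACT_ICMP_ERR : ACT_DROP;
	return ACT_FWD;
}
```

Keeping the decision in per-packet metadata lets validation helpers keep
returning a plain errno while the dispatcher alone decides between a
silent drop and an ICMP error.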
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/dispatch.c | 66 ++++++++++++++++++++++++++++++++++-
drivers/net/ipxlat/dispatch.h | 7 ++++
drivers/net/ipxlat/packet.c | 25 ++++++++++---
3 files changed, 92 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ipxlat/dispatch.c b/drivers/net/ipxlat/dispatch.c
index 133d30859f49..b8b9b930b04c 100644
--- a/drivers/net/ipxlat/dispatch.c
+++ b/drivers/net/ipxlat/dispatch.c
@@ -11,7 +11,12 @@
* Ralf Lici <ralf@mandelbit.com>
*/
+#include <linux/icmp.h>
+#include <linux/icmpv6.h>
+#include <net/icmp.h>
#include <net/ip.h>
+#include <net/route.h>
+#include <net/ipv6.h>
#include "dispatch.h"
#include "packet.h"
@@ -21,7 +26,8 @@
static enum ipxlat_action
ipxlat_resolve_failed_action(const struct sk_buff *skb)
{
- return IPXLAT_ACT_DROP;
+ return ipxlat_skb_cb(skb)->emit_icmp_err ? IPXLAT_ACT_ICMP_ERR :
+ IPXLAT_ACT_DROP;
}
enum ipxlat_action ipxlat_translate(struct ipxlat_priv *ipxlat,
@@ -61,6 +67,59 @@ void ipxlat_mark_icmp_drop(struct sk_buff *skb, u8 type, u8 code, u32 info)
cb->icmp_err.info = info;
}
+static void ipxlat_46_emit_icmp_err(struct ipxlat_priv *ipxlat,
+ struct sk_buff *inner)
+{
+ struct ipxlat_cb *cb = ipxlat_skb_cb(inner);
+ const struct iphdr *iph = ip_hdr(inner);
+ struct inet_skb_parm param = {};
+
+ /* build route metadata on demand when the packet has no dst */
+ if (unlikely(!skb_dst(inner))) {
+ const int reason = ip_route_input_noref(inner, iph->daddr,
+ iph->saddr,
+ ip4h_dscp(iph),
+ inner->dev);
+
+ if (unlikely(reason)) {
+ netdev_dbg(ipxlat->dev,
+ "icmp4 emit: route build failed reason=%d\n",
+ reason);
+ return;
+ }
+ }
+
+ /* emit the ICMPv4 error */
+ __icmp_send(inner, cb->icmp_err.type, cb->icmp_err.code,
+ htonl(cb->icmp_err.info), &param);
+}
+
+static void ipxlat_64_emit_icmp_err(struct sk_buff *inner)
+{
+ struct ipxlat_cb *cb = ipxlat_skb_cb(inner);
+ struct inet6_skb_parm param = {};
+
+ /* emit the ICMPv6 error */
+ icmp6_send(inner, cb->icmp_err.type, cb->icmp_err.code,
+ cb->icmp_err.info, NULL, &param);
+}
+
+/* emit translator-generated ICMP errors for packets rejected by RFC rules */
+void ipxlat_emit_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff *inner)
+{
+ switch (ntohs(inner->protocol)) {
+ case ETH_P_IPV6:
+ ipxlat_64_emit_icmp_err(inner);
+ return;
+ case ETH_P_IP:
+ ipxlat_46_emit_icmp_err(ipxlat, inner);
+ return;
+ default:
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return;
+ }
+}
+
static void ipxlat_forward_pkt(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
{
const unsigned int len = skb->len;
@@ -90,6 +149,11 @@ int ipxlat_process_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
dev_dstats_tx_add(ipxlat->dev, skb->len);
ipxlat_forward_pkt(ipxlat, skb);
return 0;
+ case IPXLAT_ACT_ICMP_ERR:
+ dev_dstats_tx_dropped(ipxlat->dev);
+ ipxlat_emit_icmp_error(ipxlat, skb);
+ consume_skb(skb);
+ return 0;
case IPXLAT_ACT_DROP:
goto drop_free;
default:
diff --git a/drivers/net/ipxlat/dispatch.h b/drivers/net/ipxlat/dispatch.h
index fa6fafea656b..73acd831b6cf 100644
--- a/drivers/net/ipxlat/dispatch.h
+++ b/drivers/net/ipxlat/dispatch.h
@@ -44,6 +44,13 @@ enum ipxlat_action {
*/
void ipxlat_mark_icmp_drop(struct sk_buff *skb, u8 type, u8 code, u32 info);
+/**
+ * ipxlat_emit_icmp_error - emit cached translator-generated ICMP error
+ * @ipxlat: translator private context
+ * @inner: offending packet used as quoted payload
+ */
+void ipxlat_emit_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff *inner);
+
/**
* ipxlat_translate - validate/translate one packet and return next action
* @ipxlat: translator private context
diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c
index b37a3e55aff8..758b72bdc6f1 100644
--- a/drivers/net/ipxlat/packet.c
+++ b/drivers/net/ipxlat/packet.c
@@ -142,6 +142,8 @@ static int ipxlat_v4_srr_check(struct sk_buff *skb, const struct iphdr *hdr)
if (unlikely(ptr > len - 3))
return -EINVAL;
+ ipxlat_mark_icmp_drop(skb, ICMP_DEST_UNREACH,
+ ICMP_SR_FAILED, 0);
return -EINVAL;
}
@@ -272,8 +274,10 @@ static int ipxlat_v4_pull_hdrs(struct sk_buff *skb)
/* RFC 7915 Section 4.1 */
if (unlikely(ipxlat_v4_srr_check(skb, l3_hdr)))
return -EINVAL;
- if (unlikely(l3_hdr->ttl <= 1))
+ if (unlikely(l3_hdr->ttl <= 1)) {
+ ipxlat_mark_icmp_drop(skb, ICMP_TIME_EXCEEDED, ICMP_EXC_TTL, 0);
return -EINVAL;
+ }
/* RFC 7915 Section 1.2:
* Fragmented ICMP/ICMPv6 packets will not be translated by IP/ICMP
@@ -390,8 +394,11 @@ int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
* Fragmented checksum-less IPv4 UDP is rejected because 4->6 cannot
* reliably translate it.
*/
- if (unlikely(ip_is_fragment(l3_hdr)))
+ if (unlikely(ip_is_fragment(l3_hdr))) {
+ ipxlat_mark_icmp_drop(skb, ICMP_DEST_UNREACH, ICMP_PKT_FILTERED,
+ 0);
return -EINVAL;
+ }
/* udph->len bounds the span used to compute replacement checksum */
if (unlikely(ntohs(udph->len) > skb->len - cb->l4_off))
@@ -520,7 +527,7 @@ static int ipxlat_v6_walk_hdrs(struct sk_buff *skb, unsigned int l3_offset,
*/
static int ipxlat_v6_check_rh(struct sk_buff *skb)
{
- unsigned int rh_off;
+ unsigned int rh_off, pointer;
int flags, nexthdr;
rh_off = 0;
@@ -531,6 +538,8 @@ static int ipxlat_v6_check_rh(struct sk_buff *skb)
if (likely(nexthdr != NEXTHDR_ROUTING))
return 0;
+ pointer = rh_off + offsetof(struct ipv6_rt_hdr, segments_left);
+ ipxlat_mark_icmp_drop(skb, ICMPV6_PARAMPROB, ICMPV6_HDR_FIELD, pointer);
return -EINVAL;
}
@@ -550,8 +559,11 @@ static int ipxlat_v6_pull_outer_l3(struct sk_buff *skb)
!ipxlat_v6_validate_saddr(&l3_hdr->saddr)))
return -EINVAL;
- if (unlikely(l3_hdr->hop_limit <= 1))
+ if (unlikely(l3_hdr->hop_limit <= 1)) {
+ ipxlat_mark_icmp_drop(skb, ICMPV6_TIME_EXCEED,
+ ICMPV6_EXC_HOPLIMIT, 0);
return -EINVAL;
+ }
return 0;
}
@@ -617,8 +629,11 @@ static int ipxlat_v6_pull_hdrs(struct sk_buff *skb)
/* -EPROTONOSUPPORT means packet layout is syntactically valid but
* unsupported by our RFC 7915 path
*/
- if (unlikely(err == -EPROTONOSUPPORT))
+ if (unlikely(err == -EPROTONOSUPPORT)) {
+ ipxlat_mark_icmp_drop(skb, ICMPV6_DEST_UNREACH,
+ ICMPV6_ADM_PROHIBITED, 0);
return -EINVAL;
+ }
if (unlikely(err))
return err;
--
2.53.0
* [RFC net-next 10/15] ipxlat: add 4to6 pre-fragmentation path
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (8 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 09/15] ipxlat: emit translator-generated ICMP errors on drop Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 11/15] ipxlat: add ICMP informational translation paths Ralf Lici
` (4 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
RFC 7915 requires handling IPv4 packets whose translated IPv6 form would
exceed the IPv6 path MTU or the translator's lowest_ipv6_mtu. Add a
pre-fragmentation planning/action path that, for packets with DF clear,
invokes kernel fragmentation helpers before translation, carries the
fragment size cap through skb metadata, and then reinjects the fragments
into the normal translation path.
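
The size arithmetic behind the planning step can be sketched as a small
userspace function. This is a simplified illustration, not the driver
code: plan_prefrag() and its parameters are hypothetical names, the
header sizes are hard-coded, and the PMTU lookup is replaced by a plain
argument.

```c
#include <assert.h>
#include <stdbool.h>

#define IPV6_HDR_LEN	40u	/* sizeof(struct ipv6hdr) */
#define FRAG_HDR_LEN	8u	/* sizeof(struct frag_hdr) */
#define MAX_MTU		0xFFFFu	/* same value as the kernel's IP_MAX_MTU */

/* Return the IPv4 fragment size cap, or 0 when no pre-fragmentation is
 * planned. All names here are illustrative.
 */
static unsigned int plan_prefrag(unsigned int pkt_len4, unsigned int ihl_bytes,
				 bool is_fragment, bool df,
				 unsigned int pmtu6, unsigned int lowest_mtu6)
{
	unsigned int threshold6 = pmtu6 < lowest_mtu6 ? pmtu6 : lowest_mtu6;
	unsigned int new_l3_len, pkt_len6, cap;
	int l3_delta, frag_l3_delta;

	assert(ihl_bytes >= 20 && ihl_bytes <= 60);

	/* size of the packet once the IPv4 header is swapped for IPv6 */
	new_l3_len = IPV6_HDR_LEN + (is_fragment ? FRAG_HDR_LEN : 0);
	l3_delta = (int)new_l3_len - (int)ihl_bytes;
	pkt_len6 = (unsigned int)((int)pkt_len4 + l3_delta);

	if (pkt_len6 <= threshold6)
		return 0;	/* fits after translation */
	if (df)
		return 0;	/* DF set: never fragment locally */

	/* every produced fragment will carry a Fragment Header after 4->6 */
	frag_l3_delta = (int)(IPV6_HDR_LEN + FRAG_HDR_LEN) - (int)ihl_bytes;
	cap = (unsigned int)((int)threshold6 - frag_l3_delta);
	return cap < MAX_MTU ? cap : MAX_MTU;
}
```

For a 1500-byte IPv4 packet with a 20-byte header, DF clear and a
1280-byte threshold, the translated size would be 1520 bytes, so the cap
becomes 1280 - (48 - 20) = 1252 bytes per IPv4 fragment.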
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/dispatch.c | 99 ++++++++++++++++++++++++++++++-
drivers/net/ipxlat/translate_46.c | 59 +++++++++++++++++-
drivers/net/ipxlat/translate_46.h | 11 ++++
3 files changed, 166 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ipxlat/dispatch.c b/drivers/net/ipxlat/dispatch.c
index b8b9b930b04c..b58191d4b2c9 100644
--- a/drivers/net/ipxlat/dispatch.c
+++ b/drivers/net/ipxlat/dispatch.c
@@ -47,6 +47,16 @@ enum ipxlat_action ipxlat_translate(struct ipxlat_priv *ipxlat,
if (unlikely(ipxlat_v4_validate_skb(ipxlat, skb)))
return ipxlat_resolve_failed_action(skb);
+ /* 4->6 prefrag plan stores per-skb frag_max_size
+ * when the packet must be split before translation
+ * (DF clear and translated size
+ * above PMTU/threshold).
+ */
+ if (unlikely(ipxlat_46_plan_prefrag(ipxlat, skb)))
+ return ipxlat_resolve_failed_action(skb);
+ if (unlikely(ipxlat_skb_cb(skb)->frag_max_size))
+ return IPXLAT_ACT_PRE_FRAG;
+
if (unlikely(ipxlat_46_translate(ipxlat, skb)))
return ipxlat_resolve_failed_action(skb);
@@ -120,6 +130,76 @@ void ipxlat_emit_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff *inner)
}
}
+static unsigned int ipxlat_frag_dst_get_mtu(const struct dst_entry *dst)
+{
+ return READ_ONCE(dst->dev->mtu);
+}
+
+static struct dst_ops ipxlat_frag_dst_ops = {
+ .family = AF_UNSPEC,
+ .mtu = ipxlat_frag_dst_get_mtu,
+};
+
+/**
+ * ipxlat_46_frag_output - reinject one fragment produced by ip_do_fragment
+ * @net: network namespace of the transmitter
+ * @sk: originating socket
+ * @skb: fragment to reinject
+ *
+ * This callback mirrors ndo_start_xmit processing but runs with
+ * pre-fragmentation disabled to prevent recursive pre-fragment loops.
+ *
+ * Return: 0 on success, negative errno on processing failure.
+ */
+static int ipxlat_46_frag_output(struct net *net, struct sock *sk,
+ struct sk_buff *skb)
+{
+ struct ipxlat_priv *ipxlat = netdev_priv(skb->dev);
+
+ return ipxlat_process_skb(ipxlat, skb, false);
+}
+
+/**
+ * ipxlat_46_fragment_pkt - fragment oversized 4->6 input before translation
+ * @ipxlat: translator private context
+ * @skb: original packet to fragment
+ * @frag_max_size: per-fragment payload cap for ip_do_fragment
+ *
+ * Installs a temporary synthetic dst so ip_do_fragment can read MTU and then
+ * reinjects each produced fragment back into ipxlat through
+ * ipxlat_46_frag_output.
+ *
+ * Return: 0 on success, negative errno on fragmentation failure.
+ */
+static int ipxlat_46_fragment_pkt(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb, u16 frag_max_size)
+{
+ const unsigned long orig_dst = skb->_skb_refdst;
+ struct rtable ipxlat_rt = {};
+ int err;
+
+ /* ip_do_fragment needs a dst object to query mtu */
+ dst_init(&ipxlat_rt.dst, &ipxlat_frag_dst_ops, NULL, DST_OBSOLETE_NONE,
+ DST_NOCOUNT);
+
+ /* use translator netdev as mtu source for the temporary dst */
+ ipxlat_rt.dst.dev = ipxlat->dev;
+
+ /* setup the skb for fragmentation */
+ skb_dst_set_noref(skb, &ipxlat_rt.dst);
+ memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+ IPCB(skb)->frag_max_size = frag_max_size;
+
+ /* fragment and reinject each frag in the translator */
+ err = ip_do_fragment(dev_net(ipxlat->dev), skb->sk, skb,
+ ipxlat_46_frag_output);
+
+ /* drop original dst ref replaced by the synthetic NOREF dst */
+ refdst_drop(orig_dst);
+
+ return err;
+}
+
static void ipxlat_forward_pkt(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
{
const unsigned int len = skb->len;
@@ -141,14 +221,29 @@ int ipxlat_process_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
enum ipxlat_action action;
int err = -EINVAL;
- (void)allow_pre_frag;
-
action = ipxlat_translate(ipxlat, skb);
switch (action) {
case IPXLAT_ACT_FWD:
dev_dstats_tx_add(ipxlat->dev, skb->len);
ipxlat_forward_pkt(ipxlat, skb);
return 0;
+ case IPXLAT_ACT_PRE_FRAG:
+ /* prefrag is allowed only once to avoid unbounded loops */
+ if (unlikely(!allow_pre_frag)) {
+ err = -ELOOP;
+ goto drop_free;
+ }
+
+ /* fragment first, then reinject each fragment through
+ * ipxlat_process_skb via ipxlat_46_frag_output
+ */
+ err = ipxlat_46_fragment_pkt(ipxlat, skb,
+ ipxlat_skb_cb(skb)->frag_max_size);
+ /* fragment path already consumed/freed skb */
+ skb = NULL;
+ if (unlikely(err))
+ goto drop_free;
+ return 0;
case IPXLAT_ACT_ICMP_ERR:
dev_dstats_tx_dropped(ipxlat->dev);
ipxlat_emit_icmp_error(ipxlat, skb);
diff --git a/drivers/net/ipxlat/translate_46.c b/drivers/net/ipxlat/translate_46.c
index aec8500db2c2..0b79ca07c771 100644
--- a/drivers/net/ipxlat/translate_46.c
+++ b/drivers/net/ipxlat/translate_46.c
@@ -87,6 +87,63 @@ unsigned int ipxlat_46_lookup_pmtu6(struct ipxlat_priv *ipxlat,
return mtu6;
}
+/**
+ * ipxlat_46_plan_prefrag - plan pre-translation IPv4 fragmentation for 4->6
+ * @ipxlat: translator private context
+ * @skb: packet being translated
+ *
+ * Decides whether packet exceeds PMTU/LIM thresholds and, when needed, stores
+ * per-skb fragmentation cap in cb->frag_max_size for later ip_do_fragment.
+ *
+ * Return: 0 on success, negative errno on policy/validation failure.
+ */
+int ipxlat_46_plan_prefrag(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ unsigned int pkt_len6, pmtu6, threshold6, frag_max_size, pkt_len4,
+ old_l3_len, new_l3_len;
+ struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct iphdr *in4 = ip_hdr(skb);
+ int l3_delta, frag_l3_delta;
+
+ if (unlikely(cb->frag_max_size)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ cb->frag_max_size = 0;
+ }
+
+ pkt_len4 = iph_totlen(skb, in4);
+ old_l3_len = cb->l3_hdr_len;
+ new_l3_len = sizeof(struct ipv6hdr) +
+ (ip_is_fragment(in4) ? sizeof(struct frag_hdr) : 0);
+ l3_delta = (int)new_l3_len - (int)old_l3_len;
+ pkt_len6 = pkt_len4 + l3_delta;
+
+ pmtu6 = ipxlat_46_lookup_pmtu6(ipxlat, skb, in4);
+ threshold6 = min(pmtu6, READ_ONCE(ipxlat->lowest_ipv6_mtu));
+
+ if (likely(pkt_len6 <= threshold6))
+ return 0;
+
+ /* df packets are never locally pre-fragmented */
+ if (likely(be16_to_cpu(in4->frag_off) & IP_DF)) {
+ /* Let the IPv6 forwarding path raise PTB when needed and rely
+ * on the reverse 6->4 ICMP translation path for feedback.
+ */
+ return 0;
+ }
+
+ /* df not set: we can fragment */
+
+ frag_l3_delta =
+ (int)(sizeof(struct ipv6hdr) + sizeof(struct frag_hdr)) -
+ (int)old_l3_len;
+ frag_max_size = threshold6 - frag_l3_delta;
+ /* store per-skb prefrag cap: ipxlat_46_fragment_pkt will copy it into
+ * IPCB(skb)->frag_max_size before calling ip_do_fragment
+ */
+ cb->frag_max_size = min_t(unsigned int, frag_max_size, IP_MAX_MTU);
+ return 0;
+}
+
/**
* ipxlat_46_translate - translate one validated packet from IPv4 to IPv6
* @ipxlat: translator private context
@@ -182,7 +239,7 @@ int ipxlat_46_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
err = ipxlat_46_outer_udp(skb, &outer4);
break;
case IPPROTO_ICMP:
- err = ipxlat_46_icmp(ipxlat, skb);
+ err = -EPROTONOSUPPORT;
break;
default:
err = 0;
diff --git a/drivers/net/ipxlat/translate_46.h b/drivers/net/ipxlat/translate_46.h
index 75def10d0cad..6ba409c94185 100644
--- a/drivers/net/ipxlat/translate_46.h
+++ b/drivers/net/ipxlat/translate_46.h
@@ -61,6 +61,17 @@ unsigned int ipxlat_46_lookup_pmtu6(struct ipxlat_priv *ipxlat,
const struct sk_buff *skb,
const struct iphdr *in4);
+/**
+ * ipxlat_46_plan_prefrag - decide whether IPv4 packet must be pre-fragmented
+ * @ipxlat: translator private context
+ * @skb: packet being translated
+ *
+ * Sets cb->frag_max_size when pre-fragmentation is required.
+ *
+ * Return: 0 on success, negative errno on policy/validation failure.
+ */
+int ipxlat_46_plan_prefrag(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
+
/**
* ipxlat_46_translate - translate outer packet from IPv4 to IPv6 in place
* @ipxlat: translator private context
--
2.53.0
* [RFC net-next 11/15] ipxlat: add ICMP informational translation paths
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (9 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 10/15] ipxlat: add 4to6 pre-fragmentation path Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 12/15] ipxlat: add ICMP error translation and quoted-inner handling Ralf Lici
` (3 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Add ICMP informational message translation for both 4->6 and 6->4 paths
and wire the new ICMP translation units into the engine.
This introduces the protocol mapping and checksum update logic for echo
request/reply traffic, while ICMP error quoted-inner translation is
added in a follow-up commit.
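
The checksum update for echo traffic can be illustrated with plain
ones' complement arithmetic in userspace C. This is a sketch of the
general incremental technique (remove the old header words, add the new
ones, add the IPv6 pseudo-header), not the driver's implementation,
which uses the kernel csum_* helpers. All function names below are
hypothetical, and the 32-bit accumulator is only sized for small
messages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Sum a buffer as big-endian 16-bit words, padding an odd tail with 0. */
static uint32_t sum_bytes(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(p[i] << 8 | p[i + 1]);
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return sum;
}

/* IPv6 pseudo-header: addresses, upper-layer length, next header 58. */
static uint32_t pseudo6_sum(const uint8_t src[16], const uint8_t dst[16],
			    uint32_t l4_len)
{
	uint32_t sum = sum_bytes(src, 16, 0);

	sum = sum_bytes(dst, 16, sum);
	return sum + (l4_len >> 16) + (l4_len & 0xffff) + 58;
}

/* Incremental 4->6 update: only the type byte changes for echo messages,
 * so swap that word in the old sum and add the pseudo-header.
 */
static uint16_t icmp_echo_46_csum(uint16_t csum4, uint8_t type4, uint8_t type6,
				  const uint8_t src6[16],
				  const uint8_t dst6[16], uint32_t l4_len)
{
	uint32_t sum = (uint16_t)~csum4;	/* sum of the ICMPv4 message */

	assert(l4_len >= 8);			/* ICMP echo header size */
	sum += (uint16_t)~((uint16_t)type4 << 8); /* subtract old type/code */
	sum += (uint32_t)type6 << 8;		  /* add new type, code 0 */
	sum += pseudo6_sum(src6, dst6, l4_len);
	return (uint16_t)~fold16(sum);
}

/* Reference computations over the whole message, used for comparison. */
static uint16_t icmp4_full_csum(uint8_t *msg, size_t len)
{
	msg[2] = msg[3] = 0;
	return (uint16_t)~fold16(sum_bytes(msg, len, 0));
}

static uint16_t icmp6_full_csum(uint8_t *msg, size_t len,
				const uint8_t src[16], const uint8_t dst[16])
{
	msg[2] = msg[3] = 0;
	return (uint16_t)~fold16(sum_bytes(msg, len,
					   pseudo6_sum(src, dst, (uint32_t)len)));
}
```

The incremental form never touches the payload, which is why the update
cost stays constant regardless of echo data length.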
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
drivers/net/ipxlat/Makefile | 2 +
drivers/net/ipxlat/icmp.h | 43 ++++++++++++++
drivers/net/ipxlat/icmp_46.c | 95 +++++++++++++++++++++++++++++++
drivers/net/ipxlat/icmp_64.c | 92 ++++++++++++++++++++++++++++++
drivers/net/ipxlat/translate_64.c | 1 +
drivers/net/ipxlat/transport.c | 11 ----
drivers/net/ipxlat/transport.h | 5 --
7 files changed, 233 insertions(+), 16 deletions(-)
create mode 100644 drivers/net/ipxlat/icmp.h
create mode 100644 drivers/net/ipxlat/icmp_46.c
create mode 100644 drivers/net/ipxlat/icmp_64.c
diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile
index d7b7097aee5f..2ded504902e3 100644
--- a/drivers/net/ipxlat/Makefile
+++ b/drivers/net/ipxlat/Makefile
@@ -11,3 +11,5 @@ ipxlat-objs += transport.o
ipxlat-objs += dispatch.o
ipxlat-objs += translate_46.o
ipxlat-objs += translate_64.o
+ipxlat-objs += icmp_46.o
+ipxlat-objs += icmp_64.o
diff --git a/drivers/net/ipxlat/icmp.h b/drivers/net/ipxlat/icmp.h
new file mode 100644
index 000000000000..52d681787d6a
--- /dev/null
+++ b/drivers/net/ipxlat/icmp.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_ICMP_H_
+#define _NET_IPXLAT_ICMP_H_
+
+#include <linux/ipv6.h>
+
+#include "ipxlpriv.h"
+
+/**
+ * ipxlat_46_icmp - translate ICMP informational payload
+ * after outer 4->6 rewrite
+ * @ipxl: translator private context
+ * @skb: packet carrying ICMPv4 transport payload
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_46_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb);
+
+/**
+ * ipxlat_64_icmp - translate ICMP informational payload
+ * after outer 6->4 rewrite
+ * @ipxlat: translator private context
+ * @skb: packet carrying ICMPv6 transport payload
+ * @in6: snapshot of original outer IPv6 header
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
+ const struct ipv6hdr *in6);
+
+#endif /* _NET_IPXLAT_ICMP_H_ */
diff --git a/drivers/net/ipxlat/icmp_46.c b/drivers/net/ipxlat/icmp_46.c
new file mode 100644
index 000000000000..ad907f60416c
--- /dev/null
+++ b/drivers/net/ipxlat/icmp_46.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <linux/icmp.h>
+#include <linux/icmpv6.h>
+
+#include "icmp.h"
+#include "packet.h"
+#include "transport.h"
+
+static int ipxlat_46_map_icmp_info_type_code(const struct icmphdr *in,
+ struct icmp6hdr *out)
+{
+ switch (in->type) {
+ case ICMP_ECHO:
+ out->icmp6_type = ICMPV6_ECHO_REQUEST;
+ out->icmp6_code = 0;
+ out->icmp6_identifier = in->un.echo.id;
+ out->icmp6_sequence = in->un.echo.sequence;
+ return 0;
+ case ICMP_ECHOREPLY:
+ out->icmp6_type = ICMPV6_ECHO_REPLY;
+ out->icmp6_code = 0;
+ out->icmp6_identifier = in->un.echo.id;
+ out->icmp6_sequence = in->un.echo.sequence;
+ return 0;
+ }
+
+ return -EPROTONOSUPPORT;
+}
+
+static void ipxlat_46_icmp_info_update_csum(const struct icmphdr *icmp4,
+ struct icmp6hdr *icmp6,
+ const struct ipv6hdr *ip6,
+ const struct sk_buff *skb,
+ unsigned int l4_off)
+{
+ struct icmp6hdr icmp6_zero;
+ struct icmphdr icmp4_zero;
+ __wsum csum;
+
+ icmp4_zero = *icmp4;
+ icmp4_zero.checksum = 0;
+ icmp6_zero = *icmp6;
+ icmp6_zero.icmp6_cksum = 0;
+ csum = ~csum_unfold(icmp4->checksum);
+ csum = csum_sub(csum, csum_partial(&icmp4_zero, sizeof(icmp4_zero), 0));
+ csum = csum_add(csum, csum_partial(&icmp6_zero, sizeof(icmp6_zero), 0));
+ icmp6->icmp6_cksum = csum_ipv6_magic(&ip6->saddr, &ip6->daddr,
+ skb->len - l4_off,
+ IPPROTO_ICMPV6, csum);
+}
+
+static int ipxlat_46_icmp_info_outer(struct sk_buff *skb)
+{
+ const unsigned int l4_off = skb_transport_offset(skb);
+ const struct icmphdr icmp4 = *icmp_hdr(skb);
+ const struct ipv6hdr *ip6 = ipv6_hdr(skb);
+ struct icmp6hdr *icmp6 = icmp6_hdr(skb);
+ int err;
+
+ err = ipxlat_46_map_icmp_info_type_code(&icmp4, icmp6);
+ if (unlikely(err))
+ return -EINVAL;
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ icmp6->icmp6_cksum = ~csum_ipv6_magic(&ip6->saddr, &ip6->daddr,
+ skb->len - l4_off,
+ IPPROTO_ICMPV6, 0);
+ return ipxlat_set_partial_csum(skb, offsetof(struct icmp6hdr,
+ icmp6_cksum));
+ }
+
+ ipxlat_46_icmp_info_update_csum(&icmp4, icmp6, ip6, skb, l4_off);
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_46_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb)
+{
+ if (unlikely(ipxlat_skb_cb(skb)->is_icmp_err))
+ return -EPROTONOSUPPORT;
+
+ return ipxlat_46_icmp_info_outer(skb);
+}
diff --git a/drivers/net/ipxlat/icmp_64.c b/drivers/net/ipxlat/icmp_64.c
new file mode 100644
index 000000000000..6b11aa638068
--- /dev/null
+++ b/drivers/net/ipxlat/icmp_64.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Alberto Leiva Popper <ydahhrk@gmail.com>
+ * Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <linux/icmpv6.h>
+
+#include "icmp.h"
+#include "packet.h"
+#include "transport.h"
+
+static int ipxlat_64_map_icmp_info_type_code(const struct icmp6hdr *in,
+ struct icmphdr *out)
+{
+ switch (in->icmp6_type) {
+ case ICMPV6_ECHO_REQUEST:
+ out->type = ICMP_ECHO;
+ out->code = 0;
+ out->un.echo.id = in->icmp6_identifier;
+ out->un.echo.sequence = in->icmp6_sequence;
+ return 0;
+ case ICMPV6_ECHO_REPLY:
+ out->type = ICMP_ECHOREPLY;
+ out->code = 0;
+ out->un.echo.id = in->icmp6_identifier;
+ out->un.echo.sequence = in->icmp6_sequence;
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
+static __sum16 ipxlat_64_compute_icmp_info_csum(const struct ipv6hdr *in6,
+ const struct icmp6hdr *in_icmp6,
+ const struct icmphdr *out_icmp4,
+ unsigned int l4_len)
+{
+ struct icmp6hdr icmp6_zero;
+ struct icmphdr icmp4_zero;
+ __wsum csum, tmp;
+
+ icmp6_zero = *in_icmp6;
+ icmp6_zero.icmp6_cksum = 0;
+ icmp4_zero = *out_icmp4;
+ icmp4_zero.checksum = 0;
+
+ csum = ~csum_unfold(in_icmp6->icmp6_cksum);
+ tmp = ~csum_unfold(csum_ipv6_magic(&in6->saddr, &in6->daddr, l4_len,
+ NEXTHDR_ICMP, 0));
+ csum = csum_sub(csum, tmp);
+ csum = csum_sub(csum, csum_partial(&icmp6_zero, sizeof(icmp6_zero), 0));
+ csum = csum_add(csum, csum_partial(&icmp4_zero, sizeof(icmp4_zero), 0));
+ return csum_fold(csum);
+}
+
+static int ipxlat_64_icmp_info(struct sk_buff *skb, const struct ipv6hdr *in6)
+{
+ struct icmp6hdr ic6_copy, *ic6;
+ struct icmphdr *ic4;
+ int err;
+
+ ic6 = icmp6_hdr(skb);
+ ic6_copy = *ic6;
+
+ ic4 = (struct icmphdr *)(skb->data + skb_transport_offset(skb));
+ err = ipxlat_64_map_icmp_info_type_code(&ic6_copy, ic4);
+ if (unlikely(err))
+ return err;
+
+ ic4->checksum =
+ ipxlat_64_compute_icmp_info_csum(in6, &ic6_copy, ic4,
+ ipxlat_skb_datagram_len(skb));
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_64_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb,
+ const struct ipv6hdr *in6)
+{
+ if (unlikely(ipxlat_skb_cb(skb)->is_icmp_err))
+ return -EPROTONOSUPPORT;
+
+ return ipxlat_64_icmp_info(skb, in6);
+}
diff --git a/drivers/net/ipxlat/translate_64.c b/drivers/net/ipxlat/translate_64.c
index 50a95fb75f9d..412d29214a43 100644
--- a/drivers/net/ipxlat/translate_64.c
+++ b/drivers/net/ipxlat/translate_64.c
@@ -16,6 +16,7 @@
#include "translate_64.h"
#include "address.h"
+#include "icmp.h"
#include "packet.h"
#include "transport.h"
diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c
index 78548d0b8c22..3aa00c635916 100644
--- a/drivers/net/ipxlat/transport.c
+++ b/drivers/net/ipxlat/transport.c
@@ -338,14 +338,3 @@ int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
udp_new->check = CSUM_MANGLED_0;
return 0;
}
-
-int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
-{
- return -EPROTONOSUPPORT;
-}
-
-int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
- const struct ipv6hdr *outer6)
-{
- return -EPROTONOSUPPORT;
-}
diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h
index 0e69b98eafd0..9b6fe422b01f 100644
--- a/drivers/net/ipxlat/transport.h
+++ b/drivers/net/ipxlat/transport.h
@@ -100,9 +100,4 @@ int ipxlat_64_inner_tcp(struct sk_buff *skb, const struct ipv6hdr *in6,
int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
const struct iphdr *out4, struct udphdr *udp_new);
-/* temporary ICMP stubs until ICMP translation support is introduced */
-int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
-int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
- const struct ipv6hdr *outer6);
-
#endif /* _NET_IPXLAT_TRANSPORT_H_ */
--
2.53.0
* [RFC net-next 12/15] ipxlat: add ICMP error translation and quoted-inner handling
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (10 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 11/15] ipxlat: add ICMP informational translation paths Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 13/15] ipxlat: add netlink control plane and uapi Ralf Lici
` (2 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel
Extend ICMP translation with error-path support for both directions,
including quoted-inner packet rewriting and RFC 4884 extension
relayout/squeeze logic.
This adds the ICMP type/code/error-field mappings, inner L3/L4 rewrite
paths, and final checksum handling required for translator ICMP error
processing.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
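Reviewer note (not part of the commit message): the FRAG_NEEDED ->
Packet Too Big MTU rule from RFC 7915 Section 4.2 that this patch
implements can be sketched in userspace as below. This is a minimal
sketch, not the driver code: the sketch_* names and the plain integer
arguments are assumptions made for illustration, and the plateau table
lists only the RFC 1191 plateaus above the IPv6 minimum MTU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IPV6_MIN_MTU 1280u

/* RFC 1191 plateaus above the IPv6 minimum MTU, largest first */
static const uint16_t sketch_plateaus[] = {
	65535, 32000, 17914, 8166, 4352, 2002, 1492,
};

uint32_t sketch_min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* hypothetical userspace mirror of the ICMPv4 -> ICMPv6 MTU rule:
 * max(1280, min(pkt_mtu + 20, mtu6_nexthop, mtu4_nexthop + 20))
 */
uint32_t sketch_icmp_mtu6(uint32_t pkt_mtu, uint32_t nexthop6mtu,
			  uint32_t nexthop4mtu, uint16_t tot_len)
{
	const size_t n = sizeof(sketch_plateaus) / sizeof(sketch_plateaus[0]);
	uint32_t result;
	size_t i;

	if (pkt_mtu == 0) {
		/* router reported no MTU: take the largest plateau that is
		 * strictly below the quoted IPv4 total length
		 */
		for (i = 0; i < n; i++) {
			if (sketch_plateaus[i] < tot_len) {
				pkt_mtu = sketch_plateaus[i];
				break;
			}
		}
	}

	/* +20 accounts for the IPv4 -> IPv6 header size difference */
	result = sketch_min_u32(pkt_mtu + 20,
				sketch_min_u32(nexthop6mtu, nexthop4mtu + 20));

	/* never advertise less than the IPv6 minimum MTU */
	return result < SKETCH_IPV6_MIN_MTU ? SKETCH_IPV6_MIN_MTU : result;
}
```

With a reported MTU of 1400 and both next-hop MTUs at 1500 the sketch
yields 1420. With a reported MTU of 0 and a quoted total length of 4000
it falls back to the 2002 plateau and yields 2022. If no plateau is
below the quoted length, the result is clamped to 1280.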
drivers/net/ipxlat/icmp.h | 14 +-
drivers/net/ipxlat/icmp_46.c | 467 ++++++++++++++++++++++++++++++++-
drivers/net/ipxlat/icmp_64.c | 453 +++++++++++++++++++++++++++++++-
drivers/net/ipxlat/transport.c | 61 +++++
drivers/net/ipxlat/transport.h | 19 ++
5 files changed, 996 insertions(+), 18 deletions(-)
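Reviewer note: the RFC 4884 arithmetic in the 4->6 squeeze path, for the
case where extensions are kept, can be sketched in userspace as below.
This is a hedged illustration only. The sketch_* names and the struct
are hypothetical, the sketch takes plain lengths instead of an skb, and
the branch that drops extensions is reported as -1 rather than modelled.

```c
#include <assert.h>

#define SKETCH_IPV6_MIN_MTU 1280u
#define SKETCH_EXT_ORIG_DGRAM_MIN_LEN 128u /* RFC 4884 minimum quoted length */

struct sketch_layout {
	unsigned int ipl_bytes; /* quoted datagram bytes kept */
	unsigned int pad_bytes; /* zero padding written after the quote */
	unsigned int len_field; /* ICMPv6 length field, in 64-bit words */
};

/* pkt_len:        whole packet length after the inner 4->6 rewrite
 * outer_hdrs_len: outer IPv6 header plus ICMPv6 header
 * iel:            extension bytes trailing the quoted datagram
 *
 * Returns 0 on success, -1 when the extensions would not fit and would
 * have to be dropped instead (not modelled here).
 */
int sketch_46_ext_layout(unsigned int pkt_len, unsigned int outer_hdrs_len,
			 unsigned int iel, struct sketch_layout *out)
{
	unsigned int cap, ipl;

	if (outer_hdrs_len + SKETCH_EXT_ORIG_DGRAM_MIN_LEN > SKETCH_IPV6_MIN_MTU ||
	    iel > SKETCH_IPV6_MIN_MTU - outer_hdrs_len -
		  SKETCH_EXT_ORIG_DGRAM_MIN_LEN ||
	    pkt_len < outer_hdrs_len + iel)
		return -1;

	/* ICMPv6 errors must fit in the IPv6 minimum MTU */
	cap = pkt_len < SKETCH_IPV6_MIN_MTU ? pkt_len : SKETCH_IPV6_MIN_MTU;

	/* ICMPv6 counts the quoted datagram in 64-bit words: round down */
	ipl = (cap - iel - outer_hdrs_len) & ~7u;

	out->ipl_bytes = ipl;
	/* pad short quotes up to the 128-byte minimum */
	out->pad_bytes = ipl < SKETCH_EXT_ORIG_DGRAM_MIN_LEN ?
			 SKETCH_EXT_ORIG_DGRAM_MIN_LEN - ipl : 0;
	out->len_field = (ipl + out->pad_bytes) >> 3;
	return 0;
}
```

Note that a short quote can make the packet grow. With 48 bytes of outer
headers, a 100-byte quote and 16 extension bytes, the quote is rounded
down to 96 bytes and padded by 32, so the result is 28 bytes longer than
the input. That is the new_len > old_len case the relayout helper has to
handle by extending the tail.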
diff --git a/drivers/net/ipxlat/icmp.h b/drivers/net/ipxlat/icmp.h
index 52d681787d6a..71bd7e20af91 100644
--- a/drivers/net/ipxlat/icmp.h
+++ b/drivers/net/ipxlat/icmp.h
@@ -19,22 +19,24 @@
#include "ipxlpriv.h"
/**
- * ipxlat_46_icmp - translate ICMP informational payload
- * after outer 4->6 rewrite
- * @ipxl: translator private context
+ * ipxlat_46_icmp - translate ICMP payload after outer 4->6 L3 rewrite
+ * @ipxlat: translator private context
* @skb: packet carrying ICMPv4 transport payload
*
+ * Handles both ICMP info translation and ICMP error quoted-inner rewriting.
+ *
* Return: 0 on success, negative errno on translation failure.
*/
-int ipxlat_46_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb);
+int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
/**
- * ipxlat_64_icmp - translate ICMP informational payload
- * after outer 6->4 rewrite
+ * ipxlat_64_icmp - translate ICMP payload after outer 6->4 L3 rewrite
* @ipxlat: translator private context
* @skb: packet carrying ICMPv6 transport payload
* @in6: snapshot of original outer IPv6 header
*
+ * Handles both ICMP info translation and ICMP error quoted-inner rewriting.
+ *
* Return: 0 on success, negative errno on translation failure.
*/
int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
diff --git a/drivers/net/ipxlat/icmp_46.c b/drivers/net/ipxlat/icmp_46.c
index ad907f60416c..41a91d4bc3dc 100644
--- a/drivers/net/ipxlat/icmp_46.c
+++ b/drivers/net/ipxlat/icmp_46.c
@@ -11,13 +11,49 @@
* Ralf Lici <ralf@mandelbit.com>
*/
-#include <linux/icmp.h>
-#include <linux/icmpv6.h>
-
+#include "address.h"
#include "icmp.h"
#include "packet.h"
+#include "translate_46.h"
#include "transport.h"
+#define IPXLAT_ICMP4_PP_CODE_PTR 0
+#define IPXLAT_ICMP4_PP_CODE_BADLEN 2
+
+/* RFC 7915 Section 4.2, Figure 3 */
+static const u8 ipxlat_46_icmp_param_prob_map[] = { 0, 1, 4, 4, 0xff,
+ 0xff, 0xff, 0xff, 7, 6,
+ 0xff, 0xff, 8, 8, 8,
+ 8, 24, 24, 24, 24 };
+
+/* RFC 1191 plateaus above IPV6_MIN_MTU, used when FRAG_NEEDED has MTU=0 */
+static const u16 ipxlat_46_mtu_plateaus[] = {
+ 65535, 32000, 17914, 8166, 4352, 2002, 1492,
+};
+
+static u8 ipxlat_icmp4_get_param_ptr(const struct icmphdr *ic4)
+{
+ return ntohl(ic4->un.gateway) >> 24;
+}
+
+static int ipxlat_46_map_icmp_param_prob(const struct icmphdr *in,
+ struct icmp6hdr *out)
+{
+ u8 ptr;
+
+ if (unlikely(in->code != IPXLAT_ICMP4_PP_CODE_PTR &&
+ in->code != IPXLAT_ICMP4_PP_CODE_BADLEN))
+ return -EPROTONOSUPPORT;
+
+ ptr = ipxlat_icmp4_get_param_ptr(in);
+ if (unlikely(ptr >= ARRAY_SIZE(ipxlat_46_icmp_param_prob_map) ||
+ ipxlat_46_icmp_param_prob_map[ptr] == 0xff))
+ return -EPROTONOSUPPORT;
+
+ out->icmp6_pointer = cpu_to_be32(ipxlat_46_icmp_param_prob_map[ptr]);
+ return 0;
+}
+
static int ipxlat_46_map_icmp_info_type_code(const struct icmphdr *in,
struct icmp6hdr *out)
{
@@ -39,6 +75,165 @@ static int ipxlat_46_map_icmp_info_type_code(const struct icmphdr *in,
return -EPROTONOSUPPORT;
}
+static __be32 ipxlat_46_compute_icmp_mtu6(unsigned int pkt_mtu,
+ unsigned int nexthop6mtu,
+ unsigned int nexthop4mtu,
+ u16 tot_len_field)
+{
+ unsigned int i;
+ u32 result;
+
+ /* RFC 7915 Section 4.2:
+ * If the IPv4 router set the MTU field to zero, then the translator
+ * MUST use the plateau values specified in RFC 1191 to determine a
+ * likely path MTU and include that path MTU in the ICMPv6 packet.
+ */
+ if (unlikely(pkt_mtu == 0)) {
+ for (i = 0; i < ARRAY_SIZE(ipxlat_46_mtu_plateaus); i++) {
+ if (ipxlat_46_mtu_plateaus[i] < tot_len_field) {
+ pkt_mtu = ipxlat_46_mtu_plateaus[i];
+ break;
+ }
+ }
+ }
+
+ /* RFC 7915 Section 4.2:
+ * max(1280, min(pkt_mtu + 20, mtu6_nexthop, mtu4_nexthop + 20))
+ *
+ * pkt_mtu + 20 converts ICMPv4-reported MTU to IPv6 context.
+ * mtu6_nexthop and mtu4_nexthop + 20 clamp to local next-hop limits.
+ * max(..., 1280) enforces IPv6 minimum MTU.
+ */
+ result = min(pkt_mtu + 20, min(nexthop6mtu, nexthop4mtu + 20));
+ if (result < IPV6_MIN_MTU)
+ result = IPV6_MIN_MTU;
+
+ return cpu_to_be32(result);
+}
+
+static int ipxlat_46_build_icmp_dest_unreach(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb,
+ const struct icmphdr *in,
+ struct icmp6hdr *out,
+ const struct iphdr *inner4)
+{
+ unsigned int inner4_tot_len, in_frag_mtu, in_mtu, out_mtu;
+
+ switch (in->code) {
+ case ICMP_NET_UNREACH:
+ case ICMP_HOST_UNREACH:
+ case ICMP_SR_FAILED:
+ case ICMP_NET_UNKNOWN:
+ case ICMP_HOST_UNKNOWN:
+ case ICMP_HOST_ISOLATED:
+ case ICMP_NET_UNR_TOS:
+ case ICMP_HOST_UNR_TOS:
+ case ICMP_PORT_UNREACH:
+ case ICMP_NET_ANO:
+ case ICMP_HOST_ANO:
+ case ICMP_PKT_FILTERED:
+ case ICMP_PREC_CUTOFF:
+ out->icmp6_unused = 0;
+ return 0;
+ case ICMP_PROT_UNREACH:
+ out->icmp6_pointer =
+ cpu_to_be32(offsetof(struct ipv6hdr, nexthdr));
+ return 0;
+ case ICMP_FRAG_NEEDED:
+ in_frag_mtu = be16_to_cpu(in->un.frag.mtu);
+ inner4_tot_len = be16_to_cpu(inner4->tot_len);
+ in_mtu = READ_ONCE(ipxlat->dev->mtu);
+ out_mtu = ipxlat_46_lookup_pmtu6(ipxlat, skb, inner4);
+
+ out->icmp6_mtu =
+ ipxlat_46_compute_icmp_mtu6(in_frag_mtu, out_mtu,
+ in_mtu, inner4_tot_len);
+ return 0;
+ }
+
+ return -EPROTONOSUPPORT;
+}
+
+static int ipxlat_46_map_icmp_type_code(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb,
+ const struct icmphdr *in,
+ struct icmp6hdr *out,
+ const struct iphdr *inner4,
+ bool *ie_forbidden)
+{
+ int err;
+
+ *ie_forbidden = false;
+
+ switch (in->type) {
+ case ICMP_ECHO:
+ case ICMP_ECHOREPLY:
+ return ipxlat_46_map_icmp_info_type_code(in, out);
+ case ICMP_DEST_UNREACH:
+ switch (in->code) {
+ case ICMP_NET_UNREACH:
+ case ICMP_HOST_UNREACH:
+ case ICMP_SR_FAILED:
+ case ICMP_NET_UNKNOWN:
+ case ICMP_HOST_UNKNOWN:
+ case ICMP_HOST_ISOLATED:
+ case ICMP_NET_UNR_TOS:
+ case ICMP_HOST_UNR_TOS:
+ out->icmp6_type = ICMPV6_DEST_UNREACH;
+ out->icmp6_code = ICMPV6_NOROUTE;
+ break;
+ case ICMP_PROT_UNREACH:
+ out->icmp6_type = ICMPV6_PARAMPROB;
+ out->icmp6_code = ICMPV6_UNK_NEXTHDR;
+ *ie_forbidden = true;
+ break;
+ case ICMP_PORT_UNREACH:
+ out->icmp6_type = ICMPV6_DEST_UNREACH;
+ out->icmp6_code = ICMPV6_PORT_UNREACH;
+ break;
+ case ICMP_FRAG_NEEDED:
+ out->icmp6_type = ICMPV6_PKT_TOOBIG;
+ out->icmp6_code = 0;
+ *ie_forbidden = true;
+ break;
+ case ICMP_NET_ANO:
+ case ICMP_HOST_ANO:
+ case ICMP_PKT_FILTERED:
+ case ICMP_PREC_CUTOFF:
+ out->icmp6_type = ICMPV6_DEST_UNREACH;
+ out->icmp6_code = ICMPV6_ADM_PROHIBITED;
+ break;
+ default:
+ return -EPROTONOSUPPORT;
+ }
+ return ipxlat_46_build_icmp_dest_unreach(ipxlat,
+ skb, in, out,
+ inner4);
+ case ICMP_TIME_EXCEEDED:
+ out->icmp6_type = ICMPV6_TIME_EXCEED;
+ out->icmp6_code = in->code;
+ out->icmp6_unused = 0;
+ return 0;
+ case ICMP_PARAMETERPROB:
+ out->icmp6_type = ICMPV6_PARAMPROB;
+ *ie_forbidden = true;
+ switch (in->code) {
+ case IPXLAT_ICMP4_PP_CODE_PTR:
+ case IPXLAT_ICMP4_PP_CODE_BADLEN:
+ out->icmp6_code = ICMPV6_HDR_FIELD;
+ break;
+ default:
+ return -EPROTONOSUPPORT;
+ }
+ err = ipxlat_46_map_icmp_param_prob(in, out);
+ if (unlikely(err))
+ return err;
+ return 0;
+ }
+
+ return -EPROTONOSUPPORT;
+}
+
static void ipxlat_46_icmp_info_update_csum(const struct icmphdr *icmp4,
struct icmp6hdr *icmp6,
const struct ipv6hdr *ip6,
@@ -86,10 +281,272 @@ static int ipxlat_46_icmp_info_outer(struct sk_buff *skb)
return 0;
}
-int ipxlat_46_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb)
+static int ipxlat_46_icmp_info_inner(struct sk_buff *skb,
+ unsigned int inner_l4_off,
+ const struct ipv6hdr *inner6)
+{
+ struct icmp6hdr *icmp6;
+ struct icmphdr icmp4;
+ int err;
+
+ /* inner header alignment is not guaranteed */
+ memcpy(&icmp4, skb->data + inner_l4_off, sizeof(icmp4));
+ icmp6 = (struct icmp6hdr *)(skb->data + inner_l4_off);
+
+ err = ipxlat_46_map_icmp_info_type_code(&icmp4, icmp6);
+ if (unlikely(err))
+ return -EINVAL;
+
+ ipxlat_46_icmp_info_update_csum(&icmp4, icmp6, inner6, skb,
+ inner_l4_off);
+ return 0;
+}
+
+static int ipxlat_46_icmp_inner_l4(struct sk_buff *skb,
+ unsigned int inner_l4_off,
+ const struct iphdr *inner4,
+ const struct ipv6hdr *inner6)
+{
+ struct tcphdr *tcp;
+ struct udphdr *udp;
+
+ switch (inner4->protocol) {
+ case IPPROTO_TCP:
+ tcp = (struct tcphdr *)(skb->data + inner_l4_off);
+ return ipxlat_46_inner_tcp(skb, inner4, inner6, tcp);
+ case IPPROTO_UDP:
+ udp = (struct udphdr *)(skb->data + inner_l4_off);
+ return ipxlat_46_inner_udp(skb, inner4, inner6, udp);
+ case IPPROTO_ICMP:
+ return ipxlat_46_icmp_info_inner(skb, inner_l4_off, inner6);
+ default:
+ return 0;
+ }
+}
+
+static int ipxlat_46_icmp_inner(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb, struct iphdr *inner4,
+ int *inner_delta)
+{
+ unsigned int inner_l3_len, inner_l3_off, inner_l4_off, old_prefix,
+ new_prefix, inner_tot_len, inner_l3_payload, inner_l4_payload;
+ const unsigned int outer_l3_len = skb_transport_offset(skb);
+ const struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ struct ipv6hdr outer_ip6_copy, *inner_ip6;
+ struct frag_hdr *fh6;
+ u8 next_hdr;
+ bool has_inner_frag;
+
+ inner_l3_off = cb->inner_l3_offset;
+ inner_l4_off = cb->inner_l4_offset;
+
+ /* inner header alignment is not guaranteed */
+ memcpy(inner4, skb->data + inner_l3_off, sizeof(*inner4));
+ inner_l3_len = inner4->ihl << 2;
+ has_inner_frag = ip_is_fragment(inner4);
+
+ /* save outer IPv6 hdr because pull+push destroys that hdr region */
+ outer_ip6_copy = *ipv6_hdr(skb);
+
+ old_prefix = inner_l3_off + inner_l3_len;
+ new_prefix = inner_l3_off + sizeof(struct ipv6hdr) +
+ (has_inner_frag ? sizeof(struct frag_hdr) : 0);
+ *inner_delta = (int)new_prefix - (int)old_prefix;
+
+ if (unlikely(skb_cow_head(skb, max_t(int, 0, *inner_delta))))
+ return -ENOMEM;
+
+ skb_pull(skb, old_prefix);
+ skb_push(skb, new_prefix);
+ /* outer 4->6 path already set header offsets, but inner relayout
+ * pulls/pushes change skb->data placement. Reinitialize outer header
+ * offsets so ip{,v6}_hdr/icmp{,6}_hdr and skb_transport_offset keep
+ * pointing to the outer packet.
+ */
+ skb_reset_network_header(skb);
+ skb_set_transport_header(skb, outer_l3_len);
+
+ *ipv6_hdr(skb) = outer_ip6_copy;
+ ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+
+ inner_ip6 = (struct ipv6hdr *)(skb->data + inner_l3_off);
+ /* use quoted IPv4 total-length, not skb->len:
+ * skb->len also includes ICMP extension bytes at the end, which are
+ * not part of the quoted inner IP datagram length.
+ */
+ inner_tot_len = ntohs(inner4->tot_len);
+ if (unlikely(inner_tot_len < inner_l3_len))
+ return -EINVAL;
+
+ inner_l3_payload = inner_tot_len - inner_l3_len +
+ (has_inner_frag ? sizeof(struct frag_hdr) : 0);
+ if (has_inner_frag)
+ next_hdr = NEXTHDR_FRAGMENT;
+ else
+ next_hdr = ipxlat_46_map_proto_to_nexthdr(inner4->protocol);
+
+ ipxlat_46_build_l3(inner_ip6, inner4, inner_l3_payload, next_hdr,
+ inner4->ttl);
+
+ ipxlat_46_convert_addrs(&ipxlat->xlat_prefix6, inner4, inner_ip6);
+
+ if (unlikely(has_inner_frag)) {
+ fh6 = (struct frag_hdr *)(inner_ip6 + 1);
+ ipxlat_46_build_frag_hdr(fh6, inner4, inner4->protocol);
+ }
+
+ if (unlikely(!ipxlat_is_first_frag4(inner4)))
+ return 0;
+
+ inner_l4_payload = new_prefix + ipxlat_l4_min_len(inner4->protocol);
+ if (unlikely(skb_ensure_writable(skb, inner_l4_payload)))
+ return -ENOMEM;
+
+ return ipxlat_46_icmp_inner_l4(skb, new_prefix, inner4, inner_ip6);
+}
+
+/* Adjust ICMP error quoted-datagram/extensions after inner 4->6 translation.
+ * The inner rewrite changes quoted datagram length; this helper recomputes
+ * RFC 4884 delimiter/padding, preserves extensions only when allowed, and
+ * enforces IPv6 minimum-MTU packet size constraints.
+ */
+static int ipxlat_46_icmp_squeeze_ext(struct sk_buff *skb,
+ unsigned int icmp4_ipl, int inner_delta,
+ bool ie_forbidden)
+{
+ unsigned int icmp6_iel_in, icmp6_iel_out, max_iel, outer_hdrs_len,
+ out_pad, payload_len, icmp6_ipl_out_bytes, pkt_len_cap;
+ unsigned int icmp6_ipl_out = 0;
+ int icmp6_ipl_in_bytes, err;
+ struct icmp6hdr *ic6;
+ struct ipv6hdr *iph6;
+
+ /* icmp4_ipl marks where quoted datagram ends and extension area starts
+ */
+ if (likely(!icmp4_ipl))
+ goto no_extensions;
+
+ outer_hdrs_len = skb_transport_offset(skb) + sizeof(struct icmp6hdr);
+ payload_len = skb->len - outer_hdrs_len;
+ icmp6_ipl_in_bytes = icmp4_ipl + inner_delta;
+ if (unlikely(icmp6_ipl_in_bytes < 0 ||
+ icmp6_ipl_in_bytes > payload_len))
+ return -EINVAL;
+
+ if (likely(icmp6_ipl_in_bytes == payload_len))
+ goto no_extensions;
+
+ icmp6_iel_in = payload_len - icmp6_ipl_in_bytes;
+ max_iel = IPV6_MIN_MTU - (outer_hdrs_len + ICMP_EXT_ORIG_DGRAM_MIN_LEN);
+
+ if (unlikely(ie_forbidden || icmp6_iel_in > max_iel)) {
+ pkt_len_cap = min_t(unsigned int, skb->len - icmp6_iel_in,
+ IPV6_MIN_MTU);
+ icmp6_ipl_out_bytes = pkt_len_cap - outer_hdrs_len;
+ out_pad = 0;
+ icmp6_iel_out = 0;
+ icmp6_ipl_out = 0;
+ } else {
+ pkt_len_cap = min_t(unsigned int, skb->len, IPV6_MIN_MTU);
+ icmp6_ipl_out_bytes =
+ round_down(pkt_len_cap - icmp6_iel_in - outer_hdrs_len,
+ sizeof(u64));
+ out_pad = max_t(unsigned int, ICMP_EXT_ORIG_DGRAM_MIN_LEN,
+ icmp6_ipl_out_bytes) -
+ icmp6_ipl_out_bytes;
+ icmp6_iel_out = icmp6_iel_in;
+ icmp6_ipl_out = (icmp6_ipl_out_bytes + out_pad) >> 3;
+ }
+
+ /* if no extension bytes are copied and no pad is written, relayout only
+ * trims/updates lengths and does not require full data writability
+ */
+ if (unlikely(icmp6_iel_out || out_pad)) {
+ err = skb_ensure_writable(skb, skb->len);
+ if (unlikely(err))
+ return err;
+ }
+
+ err = ipxlat_icmp_relayout(skb, outer_hdrs_len, icmp6_ipl_in_bytes,
+ icmp6_iel_in, icmp6_ipl_out_bytes, out_pad,
+ icmp6_iel_out);
+ if (unlikely(err))
+ return err;
+
+ iph6 = ipv6_hdr(skb);
+ iph6->payload_len = htons(skb->len - sizeof(*iph6));
+
+no_extensions:
+ if (unlikely(skb->len > IPV6_MIN_MTU)) {
+ err = pskb_trim(skb, IPV6_MIN_MTU);
+ if (unlikely(err))
+ return err;
+
+ iph6 = ipv6_hdr(skb);
+ iph6->payload_len = htons(skb->len - sizeof(*iph6));
+ }
+
+ ic6 = icmp6_hdr(skb);
+ ic6->icmp6_datagram_len = icmp6_ipl_out;
+ return 0;
+}
+
+/**
+ * ipxlat_46_icmp_error - translate ICMPv4 error payload to ICMPv6 error form
+ * @ipxlat: translator private context
+ * @skb: packet carrying outer ICMPv4 error
+ *
+ * Rewrites the quoted inner datagram in place, maps type/code/fields and
+ * adjusts RFC 4884 datagram/extension layout before recomputing outer checksum.
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+static int ipxlat_46_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ const struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct icmphdr icmp4 = *icmp_hdr(skb);
+ struct iphdr inner4_ip;
+ int inner_delta, err;
+ bool ie_forbidden;
+
+ if (unlikely(!(cb->is_icmp_err))) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
+
+ /* translate quoted inner packet headers */
+ err = ipxlat_46_icmp_inner(ipxlat, skb, &inner4_ip, &inner_delta);
+ if (unlikely(err))
+ return err;
+
+ err = ipxlat_46_map_icmp_type_code(ipxlat, skb, &icmp4, icmp6_hdr(skb),
+ &inner4_ip, &ie_forbidden);
+ if (unlikely(err))
+ return err;
+
+ err = ipxlat_46_icmp_squeeze_ext(skb, icmp4.un.reserved[1] << 2,
+ inner_delta, ie_forbidden);
+ if (unlikely(err))
+ return err;
+
+ /* error path rewrites quoted packet bytes/lengths, so use full
+ * checksum recomputation instead of incremental update
+ */
+ icmp6_hdr(skb)->icmp6_cksum = 0;
+ icmp6_hdr(skb)->icmp6_cksum =
+ ipxlat_l4_csum_ipv6(&ipv6_hdr(skb)->saddr,
+ &ipv6_hdr(skb)->daddr, skb,
+ skb_transport_offset(skb),
+ ipxlat_skb_datagram_len(skb),
+ IPPROTO_ICMPV6);
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
{
if (unlikely(ipxlat_skb_cb(skb)->is_icmp_err))
- return -EPROTONOSUPPORT;
+ return ipxlat_46_icmp_error(ipxlat, skb);
return ipxlat_46_icmp_info_outer(skb);
}
diff --git a/drivers/net/ipxlat/icmp_64.c b/drivers/net/ipxlat/icmp_64.c
index 6b11aa638068..18583620a09a 100644
--- a/drivers/net/ipxlat/icmp_64.c
+++ b/drivers/net/ipxlat/icmp_64.c
@@ -11,12 +11,38 @@
* Ralf Lici <ralf@mandelbit.com>
*/
-#include <linux/icmpv6.h>
+#include <net/route.h>
+#include "address.h"
#include "icmp.h"
#include "packet.h"
+#include "translate_64.h"
#include "transport.h"
+#define IPXLAT_ICMP4_ERROR_MAX_LEN 576U
+
+/* RFC 7915 Section 5.2, Figure 4 */
+static const u8 ipxlat_64_icmp_param_prob_map[] = {
+ 0, 1, 0xff, 0xff, 2, 2, 9, 8, 12, 12, 12, 12, 12, 12,
+ 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 16, 16, 16, 16,
+ 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
+};
+
+static int ipxlat_64_map_icmp_param_prob(u32 ptr6, u32 *ptr4)
+{
+ if (unlikely(ptr6 >= ARRAY_SIZE(ipxlat_64_icmp_param_prob_map) ||
+ ipxlat_64_icmp_param_prob_map[ptr6] == 0xff))
+ return -EPROTONOSUPPORT;
+
+ *ptr4 = ipxlat_64_icmp_param_prob_map[ptr6];
+ return 0;
+}
+
+static void ipxlat_icmp4_set_param_ptr(struct icmphdr *ic4, u8 ptr)
+{
+ ic4->un.gateway = htonl((u32)ptr << 24);
+}
+
static int ipxlat_64_map_icmp_info_type_code(const struct icmp6hdr *in,
struct icmphdr *out)
{
@@ -38,10 +64,119 @@ static int ipxlat_64_map_icmp_info_type_code(const struct icmp6hdr *in,
}
}
-static __sum16 ipxlat_64_compute_icmp_info_csum(const struct ipv6hdr *in6,
- const struct icmp6hdr *in_icmp6,
- const struct icmphdr *out_icmp4,
- unsigned int l4_len)
+/* Lookup post-translation IPv4 PMTU for ICMPv6 PTB -> ICMPv4 FRAG_NEEDED.
+ * Falls back to translator MTU on routing failures and clamps route MTU
+ * against translator egress MTU.
+ */
+static unsigned int ipxlat_64_lookup_pmtu4(struct ipxlat_priv *ipxlat,
+ const struct sk_buff *skb)
+{
+ const struct iphdr *iph4;
+ struct flowi4 fl4 = {};
+ unsigned int dev_mtu;
+ struct rtable *rt;
+ unsigned int mtu4;
+
+ dev_mtu = READ_ONCE(ipxlat->dev->mtu);
+ iph4 = ip_hdr(skb);
+
+ fl4.daddr = iph4->daddr;
+ fl4.saddr = iph4->saddr;
+ fl4.flowi4_mark = skb->mark;
+ fl4.flowi4_proto = IPPROTO_ICMP;
+
+ rt = ip_route_output_key(dev_net(ipxlat->dev), &fl4);
+ if (IS_ERR(rt))
+ return dev_mtu;
+
+ /* clamp against translator MTU to avoid oversized local PMTU */
+ mtu4 = min_t(unsigned int, dst_mtu(&rt->dst), dev_mtu);
+ ip_rt_put(rt);
+
+ return mtu4;
+}
+
+static int ipxlat_64_build_icmp4_errhdr(struct ipxlat_priv *ipxlat,
+ struct sk_buff *skb,
+ const struct icmp6hdr *ic6,
+ struct icmphdr *ic4, bool *ie_forbidden)
+{
+ unsigned int in_mtu, out_mtu;
+ u32 ptr6, ptr4;
+ int err;
+
+ switch (ic6->icmp6_type) {
+ case ICMPV6_DEST_UNREACH:
+ ic4->type = ICMP_DEST_UNREACH;
+ switch (ic6->icmp6_code) {
+ case ICMPV6_NOROUTE:
+ case ICMPV6_NOT_NEIGHBOUR:
+ case ICMPV6_ADDR_UNREACH:
+ ic4->code = ICMP_HOST_UNREACH;
+ break;
+ case ICMPV6_ADM_PROHIBITED:
+ ic4->code = ICMP_HOST_ANO;
+ break;
+ case ICMPV6_PORT_UNREACH:
+ ic4->code = ICMP_PORT_UNREACH;
+ break;
+ default:
+ return -EINVAL;
+ }
+ ic4->un.gateway = 0;
+ *ie_forbidden = false;
+ return 0;
+ case ICMPV6_TIME_EXCEED:
+ ic4->type = ICMP_TIME_EXCEEDED;
+ ic4->code = ic6->icmp6_code;
+ ic4->un.gateway = 0;
+ *ie_forbidden = false;
+ return 0;
+ case ICMPV6_PKT_TOOBIG:
+ ic4->type = ICMP_DEST_UNREACH;
+ ic4->code = ICMP_FRAG_NEEDED;
+ ic4->un.frag.__unused = 0;
+ in_mtu = ipxlat_64_lookup_pmtu4(ipxlat, skb);
+ out_mtu = READ_ONCE(ipxlat->dev->mtu);
+ /* RFC 7915 Section 5.2:
+ * min((PTB_mtu - 20), mtu4_nexthop, (mtu6_nexthop - 20))
+ */
+ ic4->un.frag.mtu =
+ cpu_to_be16(min3(be32_to_cpu(ic6->icmp6_mtu) - 20,
+ in_mtu, out_mtu - 20));
+ *ie_forbidden = true;
+ return 0;
+ case ICMPV6_PARAMPROB:
+ ptr6 = be32_to_cpu(ic6->icmp6_dataun.un_data32[0]);
+ switch (ic6->icmp6_code) {
+ case ICMPV6_HDR_FIELD:
+ ic4->type = ICMP_PARAMETERPROB;
+ ic4->code = 0;
+ err = ipxlat_64_map_icmp_param_prob(ptr6, &ptr4);
+ if (unlikely(err))
+ return err;
+ ipxlat_icmp4_set_param_ptr(ic4, ptr4);
+ break;
+ case ICMPV6_UNK_NEXTHDR:
+ ic4->type = ICMP_DEST_UNREACH;
+ ic4->code = ICMP_PROT_UNREACH;
+ ic4->un.gateway = 0;
+ break;
+ default:
+ return -EINVAL;
+ }
+ *ie_forbidden = true;
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
+static __sum16
+ipxlat_64_compute_icmp_info_csum(const struct ipv6hdr *in6,
+ const struct icmp6hdr *in_icmp6,
+ const struct icmphdr *out_icmp4,
+ unsigned int l4_len)
{
struct icmp6hdr icmp6_zero;
struct icmphdr icmp4_zero;
@@ -82,11 +217,315 @@ static int ipxlat_64_icmp_info(struct sk_buff *skb, const struct ipv6hdr *in6)
return 0;
}
-int ipxlat_64_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb,
+static int ipxlat_64_icmp_inner_info(struct sk_buff *skb,
+ unsigned int inner_l4_off)
+{
+ struct icmphdr *ic4;
+ struct icmp6hdr ic6;
+ int err;
+
+ /* inner header alignment is not guaranteed */
+ memcpy(&ic6, skb->data + inner_l4_off, sizeof(ic6));
+ ic4 = (struct icmphdr *)(skb->data + inner_l4_off);
+ err = ipxlat_64_map_icmp_info_type_code(&ic6, ic4);
+ if (unlikely(err))
+ return err;
+
+ ic4->checksum = 0;
+ ic4->checksum = csum_fold(skb_checksum(skb, inner_l4_off,
+ skb->len - inner_l4_off, 0));
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+static int ipxlat_64_icmp_inner_l4(struct sk_buff *skb,
+ unsigned int inner_l4_off,
+ const struct iphdr *inner4,
+ const struct ipv6hdr *inner6)
+{
+ struct tcphdr *tcp;
+ struct udphdr *udp;
+
+ switch (inner4->protocol) {
+ case IPPROTO_TCP:
+ tcp = (struct tcphdr *)(skb->data + inner_l4_off);
+ return ipxlat_64_inner_tcp(skb, inner6, inner4, tcp);
+ case IPPROTO_UDP:
+ udp = (struct udphdr *)(skb->data + inner_l4_off);
+ return ipxlat_64_inner_udp(skb, inner6, inner4, udp);
+ case IPPROTO_ICMP:
+ return ipxlat_64_icmp_inner_info(skb, inner_l4_off);
+ default:
+ return 0;
+ }
+}
+
+static int ipxlat_64_icmp_inner(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
+ int *inner_delta)
+{
+ unsigned int old_prefix, new_prefix, inner_l3_len, inner_tot_len,
+ inner_l4_payload, outer_prefix, inner_l3_off, inner_l4_old_off;
+ const unsigned int outer_l3_len = skb_transport_offset(skb);
+ const struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct iphdr outer4_copy = *ip_hdr(skb);
+ bool has_inner_frag, first_inner_frag, mf, df;
+ struct frag_hdr inner_fragh;
+ struct ipv6hdr inner6;
+ struct iphdr *inner4;
+ __be32 saddr, daddr;
+ u16 frag_off;
+ u8 inner_l4_proto;
+ __be16 frag_id;
+ int err;
+
+ inner_l3_off = cb->inner_l3_offset;
+ inner_l4_old_off = cb->inner_l4_offset;
+ inner_l3_len = inner_l4_old_off - inner_l3_off;
+ outer_prefix = inner_l3_off;
+
+ inner_l4_proto = ipxlat_64_map_nexthdr_proto(cb->inner_l4_proto);
+ has_inner_frag = !!cb->inner_fragh_off;
+
+ /* inner header alignment is not guaranteed */
+ memcpy(&inner6, skb->data + outer_prefix, sizeof(inner6));
+
+ first_inner_frag = true;
+ if (unlikely(has_inner_frag)) {
+ memcpy(&inner_fragh, skb->data + cb->inner_fragh_off,
+ sizeof(inner_fragh));
+ first_inner_frag = ipxlat_is_first_frag6(&inner_fragh);
+ }
+
+ err = ipxlat_64_convert_addrs(&ipxlat->xlat_prefix6, &inner6, false,
+ &saddr, &daddr);
+ if (unlikely(err))
+ return err;
+
+ old_prefix = outer_prefix + inner_l3_len;
+ new_prefix = outer_prefix + sizeof(struct iphdr);
+ *inner_delta = (int)new_prefix - (int)old_prefix;
+
+ /* unlike 46, inner 6->4 always shrinks quoted L3 size */
+ skb_pull(skb, old_prefix);
+ skb_push(skb, new_prefix);
+ /* outer 6->4 translation already set network/transport headers, but
+ * inner relayout pulls/pushes again and changes skb->data placement.
+ * Reinitialize outer header offsets so ip{,v6}_hdr/icmp{,6}_hdr and
+ * skb_transport_offset keep pointing to the outer packet.
+ */
+ skb_reset_network_header(skb);
+ skb_set_transport_header(skb, outer_l3_len);
+
+ *ip_hdr(skb) = outer4_copy;
+
+ inner4 = (struct iphdr *)(skb->data + outer_prefix);
+ inner_tot_len = ntohs(inner6.payload_len) + sizeof(inner6) -
+ inner_l3_len + sizeof(struct iphdr);
+ /* RFC 7915 Section 5.1 */
+ if (likely(!has_inner_frag)) {
+ df = inner_tot_len > (IPV6_MIN_MTU - sizeof(struct iphdr));
+ inner4->frag_off = ipxlat_build_frag4_offset(df, false, 0);
+ } else {
+ mf = !!(be16_to_cpu(inner_fragh.frag_off) & IP6_MF);
+ frag_off = ipxlat_get_frag6_offset(&inner_fragh);
+ inner4->frag_off =
+ ipxlat_build_frag4_offset(false, mf, frag_off);
+ }
+
+ /* keep low 16 bits of IPv6 Fragment ID as numeric value, then re-encode
+ * to network-order IPv4 ID
+ */
+ frag_id = has_inner_frag ?
+ cpu_to_be16(be32_to_cpu(inner_fragh.identification)) :
+ 0;
+ ipxlat_64_build_l3(inner4, &inner6, inner_tot_len, inner4->frag_off,
+ inner_l4_proto, saddr, daddr, inner6.hop_limit,
+ frag_id);
+
+ if (likely(!has_inner_frag)) {
+ inner4->id = 0;
+ __ip_select_ident(dev_net(ipxlat->dev), inner4, 1);
+ inner4->check = 0;
+ inner4->check = ip_fast_csum(inner4, inner4->ihl);
+ }
+
+ if (unlikely(!first_inner_frag))
+ return 0;
+
+ inner_l4_payload = new_prefix + ipxlat_l4_min_len(inner4->protocol);
+ if (unlikely(skb_ensure_writable(skb, inner_l4_payload)))
+ return -ENOMEM;
+
+ return ipxlat_64_icmp_inner_l4(skb, new_prefix, inner4, &inner6);
+}
+
+/* Rebuild ICMPv4 quoted-datagram/extensions after inner 6->4 translation.
+ *
+ * The inner rewrite changes the quoted datagram length. This helper updates
+ * the RFC 4884 delimiter/padding and extension bytes, then enforces the
+ * IPv4 ICMP error size cap.
+ *
+ * This is intentionally not a mirror of ipxlat_46_icmp_squeeze_ext:
+ * - 4->6 always writes icmp6_datagram_len (either computed or 0).
+ * - 6->4 updates ICMPv4 datagram-length only when extensions are allowed.
+ * Some mapped ICMPv6 errors set ie_forbidden, and in that case we keep the
+ * ICMPv4 header semantics for that type/code and only relayout/trim payload.
+ */
+static int ipxlat_64_squeeze_icmp_ext(struct sk_buff *skb,
+ unsigned int icmp6_ipl, int inner_delta,
+ bool ie_forbidden)
+{
+ unsigned int outer_hdrs_len, payload_len, icmp4_iel_in, icmp4_iel_out;
+ unsigned int out_pad, max_iel, pkt_len_cap, icmp4_ipl_out_bytes;
+ unsigned int icmp4_ipl_out = 0, icmp4_ipl_in_bytes;
+ unsigned int new_tot_len;
+ int icmp4_ipl_in, err;
+ struct icmphdr *ic4;
+ struct iphdr *iph4;
+
+ if (likely(!icmp6_ipl))
+ goto finalize;
+
+ outer_hdrs_len = skb_transport_offset(skb) + sizeof(struct icmphdr);
+ if (unlikely(skb->len < outer_hdrs_len))
+ return -EINVAL;
+
+ payload_len = skb->len - outer_hdrs_len;
+ icmp4_ipl_in = (int)icmp6_ipl + inner_delta;
+ if (unlikely(icmp4_ipl_in < 0))
+ return -EINVAL;
+ icmp4_ipl_in_bytes = icmp4_ipl_in;
+ if (unlikely(icmp4_ipl_in_bytes > payload_len))
+ return -EINVAL;
+
+ if (likely(icmp4_ipl_in_bytes == payload_len))
+ goto finalize;
+
+ icmp4_iel_in = payload_len - icmp4_ipl_in_bytes;
+ max_iel = IPXLAT_ICMP4_ERROR_MAX_LEN -
+ (outer_hdrs_len + ICMP_EXT_ORIG_DGRAM_MIN_LEN);
+
+ if (unlikely(ie_forbidden)) {
+ icmp4_ipl_out_bytes = icmp4_ipl_in_bytes;
+ out_pad = 0;
+ icmp4_iel_out = 0;
+ } else if (unlikely(icmp4_iel_in > max_iel)) {
+ pkt_len_cap = min_t(unsigned int, skb->len - icmp4_iel_in,
+ IPXLAT_ICMP4_ERROR_MAX_LEN);
+ icmp4_ipl_out_bytes = pkt_len_cap - outer_hdrs_len;
+ out_pad = 0;
+ icmp4_iel_out = 0;
+ icmp4_ipl_out = 0;
+ } else {
+ pkt_len_cap = min_t(unsigned int, skb->len,
+ IPXLAT_ICMP4_ERROR_MAX_LEN);
+ icmp4_ipl_out_bytes =
+ round_down(pkt_len_cap - icmp4_iel_in - outer_hdrs_len,
+ sizeof(u32));
+ out_pad = max_t(unsigned int, ICMP_EXT_ORIG_DGRAM_MIN_LEN,
+ icmp4_ipl_out_bytes) -
+ icmp4_ipl_out_bytes;
+ icmp4_iel_out = icmp4_iel_in;
+ /* RFC 4884 field is in 32-bit units for ICMPv4 errors */
+ icmp4_ipl_out = (icmp4_ipl_out_bytes + out_pad) >> 2;
+ }
+
+ /* if no extension bytes are copied and no pad is written, relayout only
+ * trims/updates lengths and does not require full data writability
+ */
+ if (unlikely(icmp4_iel_out || out_pad)) {
+ err = skb_ensure_writable(skb, skb->len);
+ if (unlikely(err))
+ return err;
+ }
+
+ err = ipxlat_icmp_relayout(skb, outer_hdrs_len, icmp4_ipl_in_bytes,
+ icmp4_iel_in, icmp4_ipl_out_bytes, out_pad,
+ icmp4_iel_out);
+ if (unlikely(err))
+ return err;
+
+finalize:
+ if (!ie_forbidden) {
+ ic4 = icmp_hdr(skb);
+ ic4->un.reserved[1] = icmp4_ipl_out;
+ }
+
+ if (unlikely(skb->len > IPXLAT_ICMP4_ERROR_MAX_LEN)) {
+ err = pskb_trim(skb, IPXLAT_ICMP4_ERROR_MAX_LEN);
+ if (unlikely(err))
+ return err;
+ }
+
+ iph4 = ip_hdr(skb);
+ new_tot_len = skb->len;
+ if (unlikely(be16_to_cpu(iph4->tot_len) != new_tot_len)) {
+ iph4->tot_len = cpu_to_be16(new_tot_len);
+ /* relayout/trim may invalidate precomputed DF decision */
+ iph4->frag_off &= cpu_to_be16(~IP_DF);
+ iph4->check = 0;
+ iph4->check = ip_fast_csum(iph4, iph4->ihl);
+ }
+
+ return 0;
+}
+
+/**
+ * ipxlat_64_icmp_error - translate ICMPv6 error payload to ICMPv4 error form
+ * @ipxlat: translator private context
+ * @skb: packet carrying outer ICMPv6 error
+ *
+ * Rewrites the quoted inner datagram in place, maps type/code/fields and
+ * adjusts RFC 4884 datagram/extension layout before recomputing outer checksum.
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+static int ipxlat_64_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+ const struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+ const struct icmp6hdr ic6 = *icmp6_hdr(skb);
+ unsigned int icmp6_ipl;
+ int inner_delta, err;
+ struct icmphdr *ic4;
+ bool ie_forbidden;
+
+ if (unlikely(!(cb->is_icmp_err))) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
+
+ /* translate quoted inner packet headers */
+ err = ipxlat_64_icmp_inner(ipxlat, skb, &inner_delta);
+ if (unlikely(err))
+ return err;
+
+ /* build outer ICMPv4 error header after inner relayout */
+ ic4 = (struct icmphdr *)(skb->data + skb_transport_offset(skb));
+ err = ipxlat_64_build_icmp4_errhdr(ipxlat, skb, &ic6, ic4,
+ &ie_forbidden);
+ if (unlikely(err))
+ return err;
+
+ icmp6_ipl = ic6.icmp6_datagram_len << 3;
+ err = ipxlat_64_squeeze_icmp_ext(skb, icmp6_ipl, inner_delta,
+ ie_forbidden);
+ if (unlikely(err))
+ return err;
+
+ /* recompute whole ICMPv4 checksum after error-path relayout */
+ ic4->checksum = 0;
+ ic4->checksum = csum_fold(skb_checksum(skb, skb_transport_offset(skb),
+ ipxlat_skb_datagram_len(skb),
+ 0));
+ skb->ip_summed = CHECKSUM_NONE;
+ return 0;
+}
+
+int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
const struct ipv6hdr *in6)
{
if (unlikely(ipxlat_skb_cb(skb)->is_icmp_err))
- return -EPROTONOSUPPORT;
+ return ipxlat_64_icmp_error(ipxlat, skb);
return ipxlat_64_icmp_info(skb, in6);
}
diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c
index 3aa00c635916..82aedfb0ee48 100644
--- a/drivers/net/ipxlat/transport.c
+++ b/drivers/net/ipxlat/transport.c
@@ -87,6 +87,67 @@ __sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr,
skb_checksum(skb, l4_off, l4_len, 0));
}
+static int ipxlat_ensure_tailroom(struct sk_buff *skb, const unsigned int grow)
+{
+ int err;
+
+ if (!grow || skb_tailroom(skb) >= grow)
+ return 0;
+
+ /* tail growth may reallocate backing storage and move skb data */
+ err = pskb_expand_head(skb, 0, grow - skb_tailroom(skb), GFP_ATOMIC);
+ if (unlikely(err))
+ return err;
+
+ return 0;
+}
+
+/* Rewrite quoted datagram layout after inner translation in ICMP errors.
+ *
+ * Caller provides old/new quoted lengths and extension lengths; this helper
+ * only does byte moves/padding/trim while preserving extension bytes at the
+ * end of the packet when present.
+ */
+int ipxlat_icmp_relayout(struct sk_buff *skb, unsigned int outer_len,
+ unsigned int in_ipl, unsigned int in_iel,
+ unsigned int out_ipl, unsigned int out_pad,
+ unsigned int out_iel)
+{
+ const unsigned int in_ie_off = outer_len + in_ipl, old_len = skb->len;
+ const unsigned int new_len = outer_len + out_ipl + out_pad + out_iel;
+ const unsigned int out_ie_off = outer_len + out_ipl + out_pad;
+ unsigned int grow = 0;
+ int err;
+
+ /* new_len > old_len here means "we need extra bytes on top of
+ * already-translated length", mainly due to padding/layout decisions
+ * while keeping extensions
+ */
+ if (unlikely(new_len > old_len)) {
+ grow = new_len - old_len;
+
+ err = ipxlat_ensure_tailroom(skb, grow);
+ if (unlikely(err))
+ return err;
+
+ __skb_put(skb, grow);
+ }
+
+ if (unlikely(out_iel))
+ memmove(skb->data + out_ie_off, skb->data + in_ie_off, out_iel);
+
+ if (unlikely(out_pad))
+ memset(skb->data + outer_len + out_ipl, 0, out_pad);
+
+ if (unlikely(new_len < old_len)) {
+ err = pskb_trim(skb, new_len);
+ if (unlikely(err))
+ return err;
+ }
+
+ return 0;
+}
+
/* Normalize checksum/offload metadata after address-family translation.
*
* Translation changes protocol family but keeps transport payload semantics
diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h
index 9b6fe422b01f..09f522696eea 100644
--- a/drivers/net/ipxlat/transport.h
+++ b/drivers/net/ipxlat/transport.h
@@ -63,6 +63,25 @@ __sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr,
const struct sk_buff *skb, unsigned int l4_off,
unsigned int l4_len, u8 proto);
+/**
+ * ipxlat_icmp_relayout - resize quoted ICMP payload/extensions in place
+ * @skb: packet buffer
+ * @outer_len: offset to quoted datagram start
+ * @in_ipl: input datagram payload length
+ * @in_iel: input extension length
+ * @out_ipl: output datagram payload length
+ * @out_pad: output pad bytes between datagram and extensions
+ * @out_iel: output extension length
+ *
+ * This helper may move payload bytes and adjust skb tail length.
+ *
+ * Return: 0 on success, negative errno on resize/memory failures.
+ */
+int ipxlat_icmp_relayout(struct sk_buff *skb, unsigned int outer_len,
+ unsigned int in_ipl, unsigned int in_iel,
+ unsigned int out_ipl, unsigned int out_pad,
+ unsigned int out_iel);
+
/**
* ipxlat_finalize_offload - normalize checksum/GSO metadata after translation
* @skb: translated packet
--
2.53.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [RFC net-next 13/15] ipxlat: add netlink control plane and uapi
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (11 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 12/15] ipxlat: add ICMP error translation and quoted-inner handling Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 14/15] selftests: net: add ipxlat coverage Ralf Lici
2026-03-19 15:12 ` [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide Ralf Lici
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, Donald Hunter,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Andrew Lunn, linux-kernel
Expose runtime configuration through netlink with validated set/get/dump
operations and generated policy glue from the YAML spec. The API
configures the translator prefix and MTU threshold used by the data
path.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
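Note for reviewers (not part of the commit message): the prefix lengths
accepted by ipxlat_nl_validate_xlat_prefix6() are the ones RFC 6052
section 2.2 defines. As an illustration of why only those lengths make
sense, here is a minimal userspace sketch of the RFC 6052 address
embedding. The names rfc6052_plen_ok() and rfc6052_embed() are
hypothetical and do not exist in this series; the driver's own address
code may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same set of lengths the netlink validation in this patch accepts. */
static bool rfc6052_plen_ok(unsigned int plen)
{
	return plen == 32 || plen == 40 || plen == 48 ||
	       plen == 56 || plen == 64 || plen == 96;
}

/* Hypothetical helper: embed the IPv4 address v4[] into prefix[]/plen
 * following the RFC 6052 section 2.2 layout and store the result in out[].
 * Returns 0 on success, -1 if plen is not an RFC 6052 length.
 */
static int rfc6052_embed(const uint8_t prefix[16], unsigned int plen,
			 const uint8_t v4[4], uint8_t out[16])
{
	unsigned int pos, i;

	assert(prefix && v4 && out);
	if (!rfc6052_plen_ok(plen))
		return -1;

	/* keep only the prefix bytes; the suffix stays zero */
	memset(out, 0, 16);
	memcpy(out, prefix, plen / 8);

	pos = plen / 8;
	for (i = 0; i < 4; i++) {
		if (pos == 8)	/* bits 64..71 (the "u" octet) must stay zero */
			pos++;
		out[pos++] = v4[i];
	}

	return 0;
}
```

With a /96 prefix the IPv4 address simply occupies the last four bytes.
With shorter prefixes it straddles the reserved octet at byte 8, which
is why arbitrary lengths cannot be accepted.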
Documentation/netlink/specs/ipxlat.yaml | 97 +++++++
drivers/net/ipxlat/Makefile | 2 +
drivers/net/ipxlat/main.c | 9 +
drivers/net/ipxlat/netlink-gen.c | 71 +++++
drivers/net/ipxlat/netlink-gen.h | 31 +++
drivers/net/ipxlat/netlink.c | 348 ++++++++++++++++++++++++
drivers/net/ipxlat/netlink.h | 27 ++
drivers/net/ipxlat/translate_46.c | 3 +-
include/uapi/linux/ipxlat.h | 48 ++++
9 files changed, 635 insertions(+), 1 deletion(-)
create mode 100644 Documentation/netlink/specs/ipxlat.yaml
create mode 100644 drivers/net/ipxlat/netlink-gen.c
create mode 100644 drivers/net/ipxlat/netlink-gen.h
create mode 100644 drivers/net/ipxlat/netlink.c
create mode 100644 drivers/net/ipxlat/netlink.h
create mode 100644 include/uapi/linux/ipxlat.h
diff --git a/Documentation/netlink/specs/ipxlat.yaml b/Documentation/netlink/specs/ipxlat.yaml
new file mode 100644
index 000000000000..d0df5ef16e04
--- /dev/null
+++ b/Documentation/netlink/specs/ipxlat.yaml
@@ -0,0 +1,97 @@
+# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+#
+# Copyright (C) 2026- Mandelbit SRL
+#
+# Author: Antonio Quartulli <antonio@mandelbit.com>
+# Ralf Lici <ralf@mandelbit.com>
+#
+---
+name: ipxlat
+protocol: genetlink
+doc: Netlink protocol to control IPXLAT (SIIT) network devices.
+
+definitions:
+ -
+ type: const
+ name: xlat-prefix6-max-prefix-len
+ value: 96
+ doc: Maximum prefix length accepted for xlat-prefix6.
+
+attribute-sets:
+ -
+ name: pool
+ attributes:
+ -
+ name: prefix
+ type: binary
+ checks:
+ exact-len: 16
+ -
+ name: prefix-len
+ type: u8
+ checks:
+ max: xlat-prefix6-max-prefix-len
+ -
+ name: cfg
+ attributes:
+ -
+ name: xlat-prefix6
+ type: nest
+ doc: IPv6 translation prefix.
+ nested-attributes: pool
+ -
+ name: lowest-ipv6-mtu
+ type: u32
+ checks:
+ min: 1280
+ -
+ name: dev
+ attributes:
+ -
+ name: ifindex
+ type: u32
+ doc: Index of the ipxlat interface to operate on.
+ -
+ name: netnsid
+ type: s32
+ doc: ID of the netns the device lives in.
+ -
+ name: config
+ type: nest
+ doc: Ipxlat device configuration.
+ nested-attributes: cfg
+
+operations:
+ list:
+ -
+ name: dev-get
+ attribute-set: dev
+ doc: Get / dump configuration of ipxlat devices.
+ do:
+ pre: ipxlat-nl-pre-doit
+ post: ipxlat-nl-post-doit
+ request:
+ attributes:
+ - ifindex
+ reply: &dev-all
+ attributes:
+ - ifindex
+ - netnsid
+ - config
+ dump:
+ reply: *dev-all
+
+ -
+ name: dev-set
+ doc: Set configuration of an ipxlat device.
+ attribute-set: dev
+ flags: [admin-perm]
+ do:
+ request:
+ attributes:
+ - ifindex
+ - config
+ reply:
+ attributes: []
+ pre: ipxlat-nl-pre-doit
+ post: ipxlat-nl-post-doit
diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile
index 2ded504902e3..b906d5698351 100644
--- a/drivers/net/ipxlat/Makefile
+++ b/drivers/net/ipxlat/Makefile
@@ -13,3 +13,5 @@ ipxlat-objs += translate_46.o
ipxlat-objs += translate_64.o
ipxlat-objs += icmp_46.o
ipxlat-objs += icmp_64.o
+ipxlat-objs += netlink.o
+ipxlat-objs += netlink-gen.o
diff --git a/drivers/net/ipxlat/main.c b/drivers/net/ipxlat/main.c
index a1b4bcd39478..bef67ed634b6 100644
--- a/drivers/net/ipxlat/main.c
+++ b/drivers/net/ipxlat/main.c
@@ -18,6 +18,7 @@
#include "dispatch.h"
#include "ipxlpriv.h"
#include "main.h"
+#include "netlink.h"
MODULE_AUTHOR("Alberto Leiva Popper <ydahhrk@gmail.com>");
MODULE_AUTHOR("Antonio Quartulli <antonio@mandelbit.com>");
@@ -127,11 +128,19 @@ static int __init ipxlat_init(void)
return err;
}
+ err = ipxlat_nl_register();
+ if (err) {
+ pr_err("ipxlat: failed to register netlink family: %d\n", err);
+ rtnl_link_unregister(&ipxlat_link_ops);
+ return err;
+ }
+
return 0;
}
static void __exit ipxlat_exit(void)
{
+ ipxlat_nl_unregister();
rtnl_link_unregister(&ipxlat_link_ops);
}
diff --git a/drivers/net/ipxlat/netlink-gen.c b/drivers/net/ipxlat/netlink-gen.c
new file mode 100644
index 000000000000..e2cfaa6bb4dc
--- /dev/null
+++ b/drivers/net/ipxlat/netlink-gen.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+/* Do not edit directly, auto-generated from: */
+/* Documentation/netlink/specs/ipxlat.yaml */
+/* YNL-GEN kernel source */
+/* To regenerate run: tools/net/ynl/ynl-regen.sh */
+
+#include <net/netlink.h>
+#include <net/genetlink.h>
+
+#include "netlink-gen.h"
+
+#include <uapi/linux/ipxlat.h>
+
+/* Common nested types */
+const struct nla_policy ipxlat_cfg_nl_policy[IPXLAT_A_CFG_LOWEST_IPV6_MTU + 1] = {
+ [IPXLAT_A_CFG_XLAT_PREFIX6] = NLA_POLICY_NESTED(ipxlat_pool_nl_policy),
+ [IPXLAT_A_CFG_LOWEST_IPV6_MTU] = NLA_POLICY_MIN(NLA_U32, 1280),
+};
+
+const struct nla_policy ipxlat_pool_nl_policy[IPXLAT_A_POOL_PREFIX_LEN + 1] = {
+ [IPXLAT_A_POOL_PREFIX] = NLA_POLICY_EXACT_LEN(16),
+ [IPXLAT_A_POOL_PREFIX_LEN] = NLA_POLICY_MAX(NLA_U8, IPXLAT_XLAT_PREFIX6_MAX_PREFIX_LEN),
+};
+
+/* IPXLAT_CMD_DEV_GET - do */
+static const struct nla_policy ipxlat_dev_get_nl_policy[IPXLAT_A_DEV_IFINDEX + 1] = {
+ [IPXLAT_A_DEV_IFINDEX] = { .type = NLA_U32, },
+};
+
+/* IPXLAT_CMD_DEV_SET - do */
+static const struct nla_policy ipxlat_dev_set_nl_policy[IPXLAT_A_DEV_CONFIG + 1] = {
+ [IPXLAT_A_DEV_IFINDEX] = { .type = NLA_U32, },
+ [IPXLAT_A_DEV_CONFIG] = NLA_POLICY_NESTED(ipxlat_cfg_nl_policy),
+};
+
+/* Ops table for ipxlat */
+static const struct genl_split_ops ipxlat_nl_ops[] = {
+ {
+ .cmd = IPXLAT_CMD_DEV_GET,
+ .pre_doit = ipxlat_nl_pre_doit,
+ .doit = ipxlat_nl_dev_get_doit,
+ .post_doit = ipxlat_nl_post_doit,
+ .policy = ipxlat_dev_get_nl_policy,
+ .maxattr = IPXLAT_A_DEV_IFINDEX,
+ .flags = GENL_CMD_CAP_DO,
+ },
+ {
+ .cmd = IPXLAT_CMD_DEV_GET,
+ .dumpit = ipxlat_nl_dev_get_dumpit,
+ .flags = GENL_CMD_CAP_DUMP,
+ },
+ {
+ .cmd = IPXLAT_CMD_DEV_SET,
+ .pre_doit = ipxlat_nl_pre_doit,
+ .doit = ipxlat_nl_dev_set_doit,
+ .post_doit = ipxlat_nl_post_doit,
+ .policy = ipxlat_dev_set_nl_policy,
+ .maxattr = IPXLAT_A_DEV_CONFIG,
+ .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+ },
+};
+
+struct genl_family ipxlat_nl_family __ro_after_init = {
+ .name = IPXLAT_FAMILY_NAME,
+ .version = IPXLAT_FAMILY_VERSION,
+ .netnsok = true,
+ .parallel_ops = true,
+ .module = THIS_MODULE,
+ .split_ops = ipxlat_nl_ops,
+ .n_split_ops = ARRAY_SIZE(ipxlat_nl_ops),
+};
diff --git a/drivers/net/ipxlat/netlink-gen.h b/drivers/net/ipxlat/netlink-gen.h
new file mode 100644
index 000000000000..2a766d05e0b4
--- /dev/null
+++ b/drivers/net/ipxlat/netlink-gen.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* Do not edit directly, auto-generated from: */
+/* Documentation/netlink/specs/ipxlat.yaml */
+/* YNL-GEN kernel header */
+/* To regenerate run: tools/net/ynl/ynl-regen.sh */
+
+#ifndef _LINUX_IPXLAT_GEN_H
+#define _LINUX_IPXLAT_GEN_H
+
+#include <net/netlink.h>
+#include <net/genetlink.h>
+
+#include <uapi/linux/ipxlat.h>
+
+/* Common nested types */
+extern const struct nla_policy ipxlat_cfg_nl_policy[IPXLAT_A_CFG_LOWEST_IPV6_MTU + 1];
+extern const struct nla_policy ipxlat_pool_nl_policy[IPXLAT_A_POOL_PREFIX_LEN + 1];
+
+int ipxlat_nl_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
+ struct genl_info *info);
+void
+ipxlat_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
+ struct genl_info *info);
+
+int ipxlat_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info);
+int ipxlat_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
+int ipxlat_nl_dev_set_doit(struct sk_buff *skb, struct genl_info *info);
+
+extern struct genl_family ipxlat_nl_family;
+
+#endif /* _LINUX_IPXLAT_GEN_H */
diff --git a/drivers/net/ipxlat/netlink.c b/drivers/net/ipxlat/netlink.c
new file mode 100644
index 000000000000..02d097726f22
--- /dev/null
+++ b/drivers/net/ipxlat/netlink.c
@@ -0,0 +1,348 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <net/genetlink.h>
+#include <net/ipv6.h>
+
+#include <uapi/linux/ipxlat.h>
+
+#include "netlink.h"
+#include "main.h"
+#include "netlink-gen.h"
+#include "ipxlpriv.h"
+
+MODULE_ALIAS_GENL_FAMILY(IPXLAT_FAMILY_NAME);
+
+struct ipxlat_nl_info_ctx {
+ struct ipxlat_priv *ipxlat;
+ netdevice_tracker tracker;
+};
+
+struct ipxlat_nl_dump_ctx {
+ unsigned long last_ifindex;
+};
+
+/**
+ * ipxlat_get_from_attrs - retrieve ipxlat private data for target netdev
+ * @net: network namespace where to look for the interface
+ * @info: generic netlink info from the user request
+ * @tracker: tracker object to be used for the netdev reference acquisition
+ *
+ * Return: the ipxlat private data (netdev reference held), or an ERR_PTR()
+ */
+static struct ipxlat_priv *ipxlat_get_from_attrs(struct net *net,
+ struct genl_info *info,
+ netdevice_tracker *tracker)
+{
+ struct ipxlat_priv *ipxlat;
+ struct net_device *dev;
+ int ifindex;
+
+ if (GENL_REQ_ATTR_CHECK(info, IPXLAT_A_DEV_IFINDEX))
+ return ERR_PTR(-EINVAL);
+ ifindex = nla_get_u32(info->attrs[IPXLAT_A_DEV_IFINDEX]);
+
+ rcu_read_lock();
+ dev = dev_get_by_index_rcu(net, ifindex);
+ if (!dev) {
+ rcu_read_unlock();
+ NL_SET_ERR_MSG_MOD(info->extack,
+ "ifindex does not match any interface");
+ return ERR_PTR(-ENODEV);
+ }
+
+ if (!ipxlat_dev_is_valid(dev)) {
+ rcu_read_unlock();
+ NL_SET_ERR_MSG_MOD(info->extack,
+ "specified interface is not ipxlat");
+ NL_SET_BAD_ATTR(info->extack,
+ info->attrs[IPXLAT_A_DEV_IFINDEX]);
+ return ERR_PTR(-EINVAL);
+ }
+
+ ipxlat = netdev_priv(dev);
+ netdev_hold(dev, tracker, GFP_ATOMIC);
+ rcu_read_unlock();
+
+ return ipxlat;
+}
+
+int ipxlat_nl_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct ipxlat_nl_info_ctx *ctx = (struct ipxlat_nl_info_ctx *)info->ctx;
+ struct ipxlat_priv *ipxlat;
+
+ BUILD_BUG_ON(sizeof(*ctx) > sizeof(info->ctx));
+
+ ipxlat = ipxlat_get_from_attrs(genl_info_net(info), info,
+ &ctx->tracker);
+ if (IS_ERR(ipxlat))
+ return PTR_ERR(ipxlat);
+
+ ctx->ipxlat = ipxlat;
+ return 0;
+}
+
+void ipxlat_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct ipxlat_nl_info_ctx *ctx = (struct ipxlat_nl_info_ctx *)info->ctx;
+
+ if (ctx->ipxlat)
+ netdev_put(ctx->ipxlat->dev, &ctx->tracker);
+}
+
+static int ipxlat_nl_send_dev(struct sk_buff *skb, struct ipxlat_priv *ipxlat,
+ struct net *src_net, const u32 portid,
+ const u32 seq, int flags)
+{
+ struct nlattr *attr_cfg, *attr_pool;
+ struct ipv6_prefix xlat_prefix6;
+ int id, ret = -EMSGSIZE;
+ u32 lowest_ipv6_mtu;
+ void *hdr;
+
+ /* snapshot settings under lock so userspace sees a coherent state */
+ mutex_lock(&ipxlat->cfg_lock);
+ xlat_prefix6 = ipxlat->xlat_prefix6;
+ lowest_ipv6_mtu = ipxlat->lowest_ipv6_mtu;
+ mutex_unlock(&ipxlat->cfg_lock);
+
+ hdr = genlmsg_put(skb, portid, seq, &ipxlat_nl_family, flags,
+ IPXLAT_CMD_DEV_GET);
+ if (!hdr)
+ return -ENOBUFS;
+
+ if (nla_put_u32(skb, IPXLAT_A_DEV_IFINDEX, ipxlat->dev->ifindex))
+ goto err;
+
+ if (!net_eq(src_net, dev_net(ipxlat->dev))) {
+ id = peernet2id_alloc(src_net, dev_net(ipxlat->dev),
+ GFP_ATOMIC);
+ if (id < 0) {
+ ret = id;
+ goto err;
+ }
+ if (nla_put_s32(skb, IPXLAT_A_DEV_NETNSID, id))
+ goto err;
+ }
+
+ attr_cfg = nla_nest_start(skb, IPXLAT_A_DEV_CONFIG);
+ if (!attr_cfg)
+ goto err;
+
+ attr_pool = nla_nest_start(skb, IPXLAT_A_CFG_XLAT_PREFIX6);
+ if (!attr_pool)
+ goto err;
+
+ if (nla_put_in6_addr(skb, IPXLAT_A_POOL_PREFIX, &xlat_prefix6.addr) ||
+ nla_put_u8(skb, IPXLAT_A_POOL_PREFIX_LEN, xlat_prefix6.len))
+ goto err;
+
+ nla_nest_end(skb, attr_pool);
+
+ if (nla_put_u32(skb, IPXLAT_A_CFG_LOWEST_IPV6_MTU, lowest_ipv6_mtu))
+ goto err;
+
+ nla_nest_end(skb, attr_cfg);
+ genlmsg_end(skb, hdr);
+
+ return 0;
+err:
+ genlmsg_cancel(skb, hdr);
+ return ret;
+}
+
+int ipxlat_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info)
+{
+ struct ipxlat_nl_info_ctx *ctx = (struct ipxlat_nl_info_ctx *)info->ctx;
+ struct sk_buff *reply;
+ int ret;
+
+ if (GENL_REQ_ATTR_CHECK(info, IPXLAT_A_DEV_IFINDEX))
+ return -EINVAL;
+
+ reply = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!reply)
+ return -ENOMEM;
+
+ ret = ipxlat_nl_send_dev(reply, ctx->ipxlat, genl_info_net(info),
+ info->snd_portid, info->snd_seq, 0);
+ if (ret < 0) {
+ nlmsg_free(reply);
+ return ret;
+ }
+
+ return genlmsg_reply(reply, info);
+}
+
+int ipxlat_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct ipxlat_nl_dump_ctx *state = (struct ipxlat_nl_dump_ctx *)cb->ctx;
+ struct net *net = sock_net(cb->skb->sk);
+ netdevice_tracker tracker;
+ struct net_device *dev;
+ int ret;
+
+ rcu_read_lock();
+ for_each_netdev_dump(net, dev, state->last_ifindex) {
+ if (!ipxlat_dev_is_valid(dev))
+ continue;
+
+ netdev_hold(dev, &tracker, GFP_ATOMIC);
+ rcu_read_unlock();
+
+ ret = ipxlat_nl_send_dev(skb, netdev_priv(dev), net,
+ NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, NLM_F_MULTI);
+
+ rcu_read_lock();
+ netdev_put(dev, &tracker);
+
+ if (ret < 0) {
+ if (skb->len > 0)
+ break;
+ rcu_read_unlock();
+ return ret;
+ }
+ }
+ rcu_read_unlock();
+ return skb->len;
+}
+
+static int ipxlat_nl_validate_xlat_prefix6(const struct ipv6_prefix *prefix,
+ struct netlink_ext_ack *extack)
+{
+ if (prefix->len != 32 && prefix->len != 40 && prefix->len != 48 &&
+ prefix->len != 56 && prefix->len != 64 && prefix->len != 96) {
+ NL_SET_ERR_MSG_FMT_MOD(extack,
+ "unsupported RFC 6052 prefix length: %u",
+ prefix->len);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int ipxlat_nl_parse_xlat_prefix6(struct nlattr *attr,
+ struct ipv6_prefix *xlat_prefix6,
+ struct netlink_ext_ack *extack)
+{
+ struct nlattr *attrs_pool[IPXLAT_A_POOL_MAX + 1];
+ struct ipv6_prefix new_xlat_prefix6;
+ int ret;
+
+ new_xlat_prefix6 = *xlat_prefix6;
+
+ ret = nla_parse_nested(attrs_pool, IPXLAT_A_POOL_MAX, attr,
+ ipxlat_pool_nl_policy, extack);
+ if (ret)
+ return ret;
+
+ if (!attrs_pool[IPXLAT_A_POOL_PREFIX] &&
+ !attrs_pool[IPXLAT_A_POOL_PREFIX_LEN]) {
+ NL_SET_ERR_MSG_MOD(extack, "xlat-prefix6 update is empty");
+ return -EINVAL;
+ }
+
+ if (attrs_pool[IPXLAT_A_POOL_PREFIX])
+ new_xlat_prefix6.addr =
+ nla_get_in6_addr(attrs_pool[IPXLAT_A_POOL_PREFIX]);
+ if (attrs_pool[IPXLAT_A_POOL_PREFIX_LEN])
+ new_xlat_prefix6.len =
+ nla_get_u8(attrs_pool[IPXLAT_A_POOL_PREFIX_LEN]);
+
+ ret = ipxlat_nl_validate_xlat_prefix6(&new_xlat_prefix6, extack);
+ if (ret) {
+ if (attrs_pool[IPXLAT_A_POOL_PREFIX_LEN])
+ NL_SET_BAD_ATTR(extack,
+ attrs_pool[IPXLAT_A_POOL_PREFIX_LEN]);
+ else
+ NL_SET_BAD_ATTR(extack,
+ attrs_pool[IPXLAT_A_POOL_PREFIX]);
+ return ret;
+ }
+
+ *xlat_prefix6 = new_xlat_prefix6;
+ return 0;
+}
+
+int ipxlat_nl_dev_set_doit(struct sk_buff *skb, struct genl_info *info)
+{
+ struct ipxlat_nl_info_ctx *ctx = (struct ipxlat_nl_info_ctx *)info->ctx;
+ struct nlattr *attrs[IPXLAT_A_CFG_MAX + 1];
+ struct nlattr *xlat_prefix6_attr;
+ struct ipv6_prefix xlat_prefix6;
+ u32 lowest_ipv6_mtu;
+ int ret = 0;
+
+ if (GENL_REQ_ATTR_CHECK(info, IPXLAT_A_DEV_CONFIG))
+ return -EINVAL;
+
+ ret = nla_parse_nested(attrs, IPXLAT_A_CFG_MAX,
+ info->attrs[IPXLAT_A_DEV_CONFIG],
+ ipxlat_cfg_nl_policy, info->extack);
+ if (ret)
+ return ret;
+
+ if (!attrs[IPXLAT_A_CFG_XLAT_PREFIX6] &&
+ !attrs[IPXLAT_A_CFG_LOWEST_IPV6_MTU]) {
+ NL_SET_ERR_MSG_MOD(info->extack, "config update is empty");
+ return -EINVAL;
+ }
+ xlat_prefix6_attr = attrs[IPXLAT_A_CFG_XLAT_PREFIX6];
+
+ mutex_lock(&ctx->ipxlat->cfg_lock);
+
+ /* Stage updates that can fail before writing device state.
+ * This keeps dev-set all-or-nothing and avoids partial commits when
+ * xlat-prefix parsing/validation fails.
+ */
+ if (xlat_prefix6_attr) {
+ xlat_prefix6 = ctx->ipxlat->xlat_prefix6;
+ ret = ipxlat_nl_parse_xlat_prefix6(xlat_prefix6_attr,
+ &xlat_prefix6,
+ info->extack);
+ if (ret)
+ goto out_unlock;
+ }
+
+ if (xlat_prefix6_attr)
+ ctx->ipxlat->xlat_prefix6 = xlat_prefix6;
+ if (attrs[IPXLAT_A_CFG_LOWEST_IPV6_MTU]) {
+ lowest_ipv6_mtu =
+ nla_get_u32(attrs[IPXLAT_A_CFG_LOWEST_IPV6_MTU]);
+ WRITE_ONCE(ctx->ipxlat->lowest_ipv6_mtu, lowest_ipv6_mtu);
+ }
+
+out_unlock:
+ mutex_unlock(&ctx->ipxlat->cfg_lock);
+ return ret;
+}
+
+/**
+ * ipxlat_nl_register - perform any needed registration in the netlink subsystem
+ *
+ * Return: 0 on success, a negative error code otherwise
+ */
+int __init ipxlat_nl_register(void)
+{
+ return genl_register_family(&ipxlat_nl_family);
+}
+
+/**
+ * ipxlat_nl_unregister - undo any module wide netlink registration
+ */
+void ipxlat_nl_unregister(void)
+{
+ genl_unregister_family(&ipxlat_nl_family);
+}
diff --git a/drivers/net/ipxlat/netlink.h b/drivers/net/ipxlat/netlink.h
new file mode 100644
index 000000000000..1ea292ad9964
--- /dev/null
+++ b/drivers/net/ipxlat/netlink.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#ifndef _NET_IPXLAT_NETLINK_H_
+#define _NET_IPXLAT_NETLINK_H_
+
+/**
+ * ipxlat_nl_register - register ipxlat generic-netlink family
+ *
+ * Return: 0 on success, negative errno on registration failure.
+ */
+int ipxlat_nl_register(void);
+
+/**
+ * ipxlat_nl_unregister - unregister ipxlat generic-netlink family
+ */
+void ipxlat_nl_unregister(void);
+
+#endif /* _NET_IPXLAT_NETLINK_H_ */
diff --git a/drivers/net/ipxlat/translate_46.c b/drivers/net/ipxlat/translate_46.c
index 0b79ca07c771..d625dc85576b 100644
--- a/drivers/net/ipxlat/translate_46.c
+++ b/drivers/net/ipxlat/translate_46.c
@@ -14,6 +14,7 @@
#include <net/ip6_route.h>
#include "address.h"
+#include "icmp.h"
#include "packet.h"
#include "transport.h"
#include "translate_46.h"
@@ -239,7 +240,7 @@ int ipxlat_46_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
err = ipxlat_46_outer_udp(skb, &outer4);
break;
case IPPROTO_ICMP:
- err = -EPROTONOSUPPORT;
+ err = ipxlat_46_icmp(ipxlat, skb);
break;
default:
err = 0;
diff --git a/include/uapi/linux/ipxlat.h b/include/uapi/linux/ipxlat.h
new file mode 100644
index 000000000000..f8db3df3f9e8
--- /dev/null
+++ b/include/uapi/linux/ipxlat.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* Do not edit directly, auto-generated from: */
+/* Documentation/netlink/specs/ipxlat.yaml */
+/* YNL-GEN uapi header */
+/* To regenerate run: tools/net/ynl/ynl-regen.sh */
+
+#ifndef _UAPI_LINUX_IPXLAT_H
+#define _UAPI_LINUX_IPXLAT_H
+
+#define IPXLAT_FAMILY_NAME "ipxlat"
+#define IPXLAT_FAMILY_VERSION 1
+
+#define IPXLAT_XLAT_PREFIX6_MAX_PREFIX_LEN 96
+
+enum {
+ IPXLAT_A_POOL_PREFIX = 1,
+ IPXLAT_A_POOL_PREFIX_LEN,
+
+ __IPXLAT_A_POOL_MAX,
+ IPXLAT_A_POOL_MAX = (__IPXLAT_A_POOL_MAX - 1)
+};
+
+enum {
+ IPXLAT_A_CFG_XLAT_PREFIX6 = 1,
+ IPXLAT_A_CFG_LOWEST_IPV6_MTU,
+
+ __IPXLAT_A_CFG_MAX,
+ IPXLAT_A_CFG_MAX = (__IPXLAT_A_CFG_MAX - 1)
+};
+
+enum {
+ IPXLAT_A_DEV_IFINDEX = 1,
+ IPXLAT_A_DEV_NETNSID,
+ IPXLAT_A_DEV_CONFIG,
+
+ __IPXLAT_A_DEV_MAX,
+ IPXLAT_A_DEV_MAX = (__IPXLAT_A_DEV_MAX - 1)
+};
+
+enum {
+ IPXLAT_CMD_DEV_GET = 1,
+ IPXLAT_CMD_DEV_SET,
+
+ __IPXLAT_CMD_MAX,
+ IPXLAT_CMD_MAX = (__IPXLAT_CMD_MAX - 1)
+};
+
+#endif /* _UAPI_LINUX_IPXLAT_H */
--
2.53.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [RFC net-next 14/15] selftests: net: add ipxlat coverage
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (12 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 13/15] ipxlat: add netlink control plane and uapi Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 15:12 ` [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide Ralf Lici
14 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Ralf Lici, Antonio Quartulli, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Shuah Khan, linux-kernel, linux-kselftest
Add selftests for ipxlat data-plane behavior and control-plane setup.
The tests build an isolated netns topology, configure ipxlat through
YNL, and exercise core traffic classes (TCP, UDP, ICMP info/error, and
fragment-related paths). This provides reproducible end-to-end coverage
for the translation pipeline and basic regression protection for future
changes.
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
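Note for reviewers (not part of the commit message): the
"udp checksum-zero 4->6" case expects the translated packet to show up
in NS6, which suggests the 4->6 path fills in a UDP checksum, since
IPv6 does not allow a zero UDP checksum. The sketch below shows, in
plain userspace C, what computing a UDP checksum over the IPv6
pseudo-header involves. The helpers csum_add() and udp6_csum() are
hypothetical names for illustration only; the driver itself goes
through ipxlat_l4_csum_ipv6() and the kernel checksum helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum as big-endian 16-bit words. */
static uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)	/* odd trailing byte is padded with zero */
		sum += (uint32_t)p[len - 1] << 8;

	return sum;
}

/* Hypothetical helper: UDP checksum over the IPv6 pseudo-header
 * (source, destination, upper-layer length, next header 17) and the
 * UDP header plus payload. The checksum field inside udp[] must be
 * zero on entry, as it is for the zero-checksum IPv4 input.
 */
static uint16_t udp6_csum(const uint8_t src[16], const uint8_t dst[16],
			  const uint8_t *udp, size_t udp_len)
{
	uint32_t sum = 0;
	uint16_t res;

	assert(udp && udp_len >= 8 && udp_len <= 0xffff);

	sum = csum_add(sum, src, 16);
	sum = csum_add(sum, dst, 16);
	sum += (uint32_t)udp_len;	/* upper-layer packet length */
	sum += 17;			/* next header: UDP */
	sum = csum_add(sum, udp, udp_len);

	/* fold carries back into the low 16 bits */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	res = (uint16_t)~sum;
	/* a computed zero is transmitted as all ones */
	return res ? res : 0xffff;
}
```

The pseudo-header is why the checksum cannot simply be carried over
from IPv4: the addresses that enter the sum change with translation.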
tools/testing/selftests/net/ipxlat/.gitignore | 1 +
tools/testing/selftests/net/ipxlat/Makefile | 25 ++
.../selftests/net/ipxlat/ipxlat_data.sh | 70 +++++
.../selftests/net/ipxlat/ipxlat_frag.sh | 70 +++++
.../selftests/net/ipxlat/ipxlat_icmp_err.sh | 54 ++++
.../selftests/net/ipxlat/ipxlat_lib.sh | 273 ++++++++++++++++++
.../net/ipxlat/ipxlat_udp4_zero_csum_send.c | 119 ++++++++
7 files changed, 612 insertions(+)
create mode 100644 tools/testing/selftests/net/ipxlat/.gitignore
create mode 100644 tools/testing/selftests/net/ipxlat/Makefile
create mode 100755 tools/testing/selftests/net/ipxlat/ipxlat_data.sh
create mode 100755 tools/testing/selftests/net/ipxlat/ipxlat_frag.sh
create mode 100755 tools/testing/selftests/net/ipxlat/ipxlat_icmp_err.sh
create mode 100644 tools/testing/selftests/net/ipxlat/ipxlat_lib.sh
create mode 100644 tools/testing/selftests/net/ipxlat/ipxlat_udp4_zero_csum_send.c
diff --git a/tools/testing/selftests/net/ipxlat/.gitignore b/tools/testing/selftests/net/ipxlat/.gitignore
new file mode 100644
index 000000000000..43bd01d8a84b
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/.gitignore
@@ -0,0 +1 @@
+ipxlat_udp4_zero_csum_send
diff --git a/tools/testing/selftests/net/ipxlat/Makefile b/tools/testing/selftests/net/ipxlat/Makefile
new file mode 100644
index 000000000000..cca588945e48
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/Makefile
@@ -0,0 +1,25 @@
+# SPDX-License-Identifier: GPL-2.0
+# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+#
+# Copyright (C) 2026- Mandelbit SRL
+# Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+#
+# Author: Antonio Quartulli <antonio@mandelbit.com>
+# Daniel Gröber <dxld@darkboxed.org>
+# Ralf Lici <ralf@mandelbit.com>
+
+TEST_PROGS := \
+ ipxlat_data.sh \
+ ipxlat_frag.sh \
+ ipxlat_icmp_err.sh \
+# end of TEST_PROGS
+
+TEST_FILES := \
+ ipxlat_lib.sh \
+# end of TEST_FILES
+
+TEST_GEN_FILES := \
+ ipxlat_udp4_zero_csum_send \
+# end of TEST_GEN_FILES
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_data.sh b/tools/testing/selftests/net/ipxlat/ipxlat_data.sh
new file mode 100755
index 000000000000..101e0a65f0a9
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/ipxlat_data.sh
@@ -0,0 +1,70 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+#
+# Copyright (C) 2026- Mandelbit SRL
+# Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+#
+# Author: Antonio Quartulli <antonio@mandelbit.com>
+# Daniel Gröber <dxld@darkboxed.org>
+# Ralf Lici <ralf@mandelbit.com>
+
+set -o pipefail
+
+SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
+source "$SCRIPT_DIR/ipxlat_lib.sh"
+
+trap ipxlat_cleanup EXIT
+
+ipxlat_setup_env
+
+# Send ICMP Echo and verify we receive a reply back
+
+RET=0
+ip netns exec "$NS4" ping -c 2 -W 2 "$IPXLAT_V4_REMOTE" >/dev/null 2>&1
+check_err $? "ping 4->6 failed"
+log_test "icmp-info 4->6"
+
+RET=0
+ip netns exec "$NS6" ping -6 -c 2 -W 2 -I "$IPXLAT_V6_NS6_SRC" \
+ "$IPXLAT_V6_NS4" >/dev/null 2>&1
+check_err $? "ping 6->4 failed"
+log_test "icmp-info 6->4"
+
+# Run a TCP data transfer over the translator path
+
+RET=0
+ipxlat_run_iperf "$NS6" "$NS4" "$IPXLAT_V4_REMOTE" 5201 -n 256K
+check_err $? "tcp 4->6 failed"
+log_test "tcp 4->6"
+
+RET=0
+ipxlat_run_iperf "$NS4" "$NS6" "$IPXLAT_V6_NS4" 5201 \
+ -B "$IPXLAT_V6_NS6_SRC" -n 256K
+check_err $? "tcp 6->4 failed"
+log_test "tcp 6->4"
+
+# Run UDP traffic to verify UDP translation and delivery
+
+RET=0
+ipxlat_run_iperf "$NS6" "$NS4" "$IPXLAT_V4_REMOTE" 5202 -u -b 5M -t 1
+check_err $? "udp 4->6 failed"
+log_test "udp 4->6"
+
+RET=0
+ipxlat_run_iperf "$NS4" "$NS6" "$IPXLAT_V6_NS4" 5202 \
+ -B "$IPXLAT_V6_NS6_SRC" -u -b 5M -t 1
+check_err $? "udp 6->4 failed"
+log_test "udp 6->4"
+
+# Send one IPv4 UDP packet with checksum=0 and verify 4->6 translation.
+
+RET=0
+ipxlat_capture_pkts "$NS6" \
+ "ip6 and udp and dst host $IPXLAT_V6_REMOTE and dst port 5555" 1 3 \
+ ip netns exec "$NS4" "$SCRIPT_DIR/ipxlat_udp4_zero_csum_send" \
+ "$IPXLAT_NS4_ADDR" "$IPXLAT_V4_REMOTE" 5555
+check_err $? "udp checksum-zero 4->6 failed"
+log_test "udp checksum-zero 4->6"
+
+exit "$EXIT_STATUS"
diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_frag.sh b/tools/testing/selftests/net/ipxlat/ipxlat_frag.sh
new file mode 100755
index 000000000000..26ed351cd263
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/ipxlat_frag.sh
@@ -0,0 +1,70 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+#
+# Copyright (C) 2026- Mandelbit SRL
+# Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+#
+# Author: Antonio Quartulli <antonio@mandelbit.com>
+# Daniel Gröber <dxld@darkboxed.org>
+# Ralf Lici <ralf@mandelbit.com>
+
+set -o pipefail
+
+SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
+source "$SCRIPT_DIR/ipxlat_lib.sh"
+
+trap ipxlat_cleanup EXIT
+
+ipxlat_setup_env
+
+# Exercise large TCP flow on 4->6 path to cover pre-fragmentation behavior
+RET=0
+ipxlat_run_iperf "$NS6" "$NS4" "$IPXLAT_V4_REMOTE" 5301 -n 8M
+check_err $? "large tcp 4->6 failed"
+log_test "large tcp 4->6"
+
+# Exercise large UDP flow on 4->6 path to cover pre-fragmentation behavior
+RET=0
+ipxlat_run_iperf "$NS6" "$NS4" "$IPXLAT_V4_REMOTE" 5302 -u -b 20M -t 2 -l 1400
+check_err $? "large udp 4->6 failed"
+log_test "large udp 4->6"
+
+# Exercise large TCP flow on 6->4 path to cover
+# fragmentation-sensitive translation
+RET=0
+ipxlat_run_iperf "$NS4" "$NS6" "$IPXLAT_V6_NS4" 5303 \
+ -B "$IPXLAT_V6_NS6_SRC" -n 8M
+check_err $? "large tcp 6->4 failed"
+log_test "large tcp 6->4"
+
+# Exercise large UDP flow on 6->4 path to cover
+# fragmentation-sensitive translation
+RET=0
+ipxlat_run_iperf "$NS4" "$NS6" "$IPXLAT_V6_NS4" 5304 \
+ -B "$IPXLAT_V6_NS6_SRC" -u -b 20M -t 2 -l 1400
+check_err $? "large udp 6->4 failed"
+log_test "large udp 6->4"
+
+# Send oversized IPv4 ICMP Echo with DF disabled (source fragmentation allowed)
+# and verify translator drops fragmented ICMPv4 input (no translated ICMPv6
+# Echo seen in NS6)
+RET=0
+ipxlat_capture_pkts "$NS6" "icmp6 and ip6[40] == 128" 0 5 \
+ ip netns exec "$NS4" bash -c \
+ "ping -M \"dont\" -s 2000 -c 1 -W 1 \"$IPXLAT_V4_REMOTE\" \
+ >/dev/null 2>&1 || test \$? -eq 1"
+check_err $? "fragmented icmp 4->6 should be dropped"
+log_test "drop fragmented icmp 4->6"
+
+# Send oversized IPv6 ICMP echo request and verify translator drops fragmented
+# ICMPv6 input (no translated ICMPv4 Echo seen in NS4)
+RET=0
+ipxlat_capture_pkts "$NS4" "icmp and icmp[0] == 8" 0 5 \
+ ip netns exec "$NS6" bash -c \
+ "ping -6 -s 2000 -c 1 -W 1 -I \"$IPXLAT_V6_NS6_SRC\" \
+ \"$IPXLAT_V6_NS4\" >/dev/null 2>&1 || test \$? -eq 1"
+check_err $? "fragmented icmp 6->4 should be dropped"
+log_test "drop fragmented icmp 6->4"
+
+exit "$EXIT_STATUS"
diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_icmp_err.sh b/tools/testing/selftests/net/ipxlat/ipxlat_icmp_err.sh
new file mode 100755
index 000000000000..946584b55895
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/ipxlat_icmp_err.sh
@@ -0,0 +1,54 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+#
+# Copyright (C) 2026- Mandelbit SRL
+# Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+#
+# Author: Antonio Quartulli <antonio@mandelbit.com>
+# Daniel Gröber <dxld@darkboxed.org>
+# Ralf Lici <ralf@mandelbit.com>
+
+set -o pipefail
+
+SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
+source "$SCRIPT_DIR/ipxlat_lib.sh"
+
+trap ipxlat_cleanup EXIT
+
+ipxlat_setup_env
+
+# Trigger UDP to a closed port from NS4 and capture translated
+# ICMPv4 Port Unreachable
+RET=0
+ipxlat_capture_pkts "$NS4" "icmp and icmp[0] == 3 and icmp[1] == 3" 1 3 \
+ ip netns exec "$NS4" bash -c \
+ "echo x > /dev/udp/$IPXLAT_V4_REMOTE/9 || true"
+check_err $? "icmp-error 4->6 not observed"
+log_test "icmp-error xlate 4->6"
+
+# Trigger UDP to a closed port from NS6 and capture translated
+# ICMPv6 Port Unreachable
+RET=0
+ipxlat_capture_pkts "$NS6" "icmp6 and ip6[40] == 1 and ip6[41] == 4" 1 3 \
+ ip netns exec "$NS6" bash -c \
+ "echo x > /dev/udp/$IPXLAT_V6_NS4/9 || true"
+check_err $? "icmp-error 6->4 not observed"
+log_test "icmp-error xlate 6->4"
+
+# Send oversized DF IPv4 packet and verify local ICMPv4
+# Fragmentation Needed emission
+sysctl -qw net.ipv4.conf.ipxl0.accept_local=1
+sysctl -qw net.ipv4.conf.all.rp_filter=0
+sysctl -qw net.ipv4.conf.default.rp_filter=0
+sysctl -qw net.ipv4.conf.ipxl0.rp_filter=0
+sleep 2
+RET=0
+ipxlat_capture_pkts "$NS4" "icmp and icmp[0] == 3 and icmp[1] == 4" 1 3 \
+ ip netns exec "$NS4" bash -c \
+ "ping -M \"do\" -s 1300 -c 1 -W 1 \"$IPXLAT_V4_REMOTE\" \
+ >/dev/null 2>&1 || test \$? -eq 1"
+check_err $? "icmpv4 frag-needed emission not observed"
+log_test "icmpv4 frag-needed emission"
+
+exit "$EXIT_STATUS"
diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_lib.sh b/tools/testing/selftests/net/ipxlat/ipxlat_lib.sh
new file mode 100644
index 000000000000..e27683f280d4
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/ipxlat_lib.sh
@@ -0,0 +1,273 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+#
+# Copyright (C) 2026- Mandelbit SRL
+# Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+#
+# Author: Antonio Quartulli <antonio@mandelbit.com>
+# Daniel Gröber <dxld@darkboxed.org>
+# Ralf Lici <ralf@mandelbit.com>
+
+set -o pipefail
+
+IPXLAT_TEST_DIR=$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")
+source "$IPXLAT_TEST_DIR/../lib.sh"
+
+KDIR=${KDIR:-$(readlink -f "$IPXLAT_TEST_DIR/../../../../../")}
+YNL_CLI="$KDIR/tools/net/ynl/pyynl/cli.py"
+YNL_SPEC="$KDIR/Documentation/netlink/specs/ipxlat.yaml"
+IPXLAT_IPERF_TIMEOUT=${IPXLAT_IPERF_TIMEOUT:-10}
+
+IPXLAT_TRANSLATOR_DEV=ipxl0
+IPXLAT_VETH4_HOST=veth4r
+IPXLAT_VETH4_NS=veth4n
+IPXLAT_VETH6_HOST=veth6r
+IPXLAT_VETH6_NS=veth6n
+
+IPXLAT_XLAT_PREFIX6=2001:db8:100::
+IPXLAT_XLAT_PREFIX6_LEN=40
+IPXLAT_XLAT_PREFIX6_HEX=20010db8010000000000000000000000
+IPXLAT_LOWEST_IPV6_MTU=1280
+
+IPXLAT_HOST4_ADDR=198.51.100.1
+IPXLAT_HOST6_ADDR=2001:db8:1::1
+
+IPXLAT_NS4_ADDR=198.51.100.2
+IPXLAT_NS6_ADDR=2001:db8:1::2
+export IPXLAT_V4_REMOTE=192.0.2.33
+
+IPXLAT_V6_REMOTE=2001:db8:1c0:2:21::
+IPXLAT_V6_NS4=2001:db8:1c6:3364:2::
+IPXLAT_V6_NS6_SRC=2001:db8:1c0:2:2::
+
+NS4=""
+NS6=""
+
+ipxlat_ynl()
+{
+ python3 "$YNL_CLI" --spec "$YNL_SPEC" "$@"
+}
+
+ipxlat_build_dev_set_json()
+{
+ local ifindex="$1"
+
+ jq -cn \
+ --argjson ifindex "$ifindex" \
+ --arg prefix "$IPXLAT_XLAT_PREFIX6_HEX" \
+ --argjson prefix_len "$IPXLAT_XLAT_PREFIX6_LEN" \
+ --argjson lowest_ipv6_mtu "$IPXLAT_LOWEST_IPV6_MTU" \
+ '{
+ ifindex: $ifindex,
+ config: {
+ "xlat-prefix6": {
+ prefix: $prefix,
+ "prefix-len": $prefix_len
+ },
+ "lowest-ipv6-mtu": $lowest_ipv6_mtu
+ }
+ }'
+}
+
+ipxlat_require_root()
+{
+ if [[ $(id -u) -ne 0 ]]; then
+ echo "ipxlat selftests need root; skipping"
+ exit "$ksft_skip"
+ fi
+}
+
+ipxlat_require_tools()
+{
+ if [[ ! -f "$YNL_CLI" || ! -f "$YNL_SPEC" ]]; then
+ log_test_skip "ipxlat netlink spec/ynl not found"
+ exit "$ksft_skip"
+ fi
+
+ for tool in ip python3 ping iperf3 tcpdump timeout jq; do
+ require_command "$tool"
+ done
+}
+
+ipxlat_cleanup()
+{
+ cleanup_ns "${NS4:-}" "${NS6:-}" || true
+ ip link del "$IPXLAT_TRANSLATOR_DEV" 2>/dev/null || true
+ ip link del "$IPXLAT_VETH4_HOST" 2>/dev/null || true
+ ip link del "$IPXLAT_VETH6_HOST" 2>/dev/null || true
+}
+
+# Test topology:
+#
+# host namespace:
+# - owns ipxlat dev `ipxl0`
+# - has veth peers `veth4r` and `veth6r`
+# - routes IPv4 test prefix (192.0.2.0/24) to ipxl0 (v4 network steering rule)
+# - routes xlat-prefix6 prefix (2001:db8:100::/40) out to NS6 side
+# - routes mapped NS4 IPv6 identity (2001:db8:1c6:3364:2::/128) to ipxl0
+# so NS6->NS4 traffic enters 6->4 translation
+#
+# NS4:
+# - IPv4-only endpoint: 198.51.100.2/24 on veth4n
+# - default route via host 198.51.100.1 (veth4r)
+# - sends traffic to 192.0.2.33 (translated by ipxl0 to IPv6)
+#
+# NS6:
+# - IPv6 endpoint: 2001:db8:1::2/64 on veth6n
+# - also owns mapped addresses used by tests:
+# 2001:db8:1c0:2:21:: (maps to 192.0.2.33)
+# 2001:db8:1c0:2:2:: (maps to 192.0.2.2, used as explicit src
+# since we have multiple v6 addresses)
+# - route to mapped NS4 IPv6 address is pinned via host:
+# 2001:db8:1c6:3364:2::/128
+# This keeps the 6->4 test path deterministic.
+#
+# ipxlat config under test:
+# - xlat-prefix6 = 2001:db8:100::/40
+# - lowest-ipv6-mtu = 1280
+ipxlat_configure_topology()
+{
+ local ifindex
+ local dev_set_json
+
+ if ! ip link add "$IPXLAT_TRANSLATOR_DEV" type ipxlat; then
+ echo "ipxlat link kind unavailable; skipping"
+ exit "$ksft_skip"
+ fi
+ ip link set "$IPXLAT_TRANSLATOR_DEV" up
+ ifindex=$(cat /sys/class/net/"$IPXLAT_TRANSLATOR_DEV"/ifindex)
+ dev_set_json=$(ipxlat_build_dev_set_json "$ifindex")
+
+ if ! ipxlat_ynl --do dev-set --json "$dev_set_json" >/dev/null; then
+ echo "ipxlat dev-set failed"
+ exit "$ksft_fail"
+ fi
+
+ setup_ns NS4 NS6 || exit "$ksft_skip"
+
+ ip link add "$IPXLAT_VETH4_HOST" type veth peer name "$IPXLAT_VETH4_NS"
+ ip link add "$IPXLAT_VETH6_HOST" type veth peer name "$IPXLAT_VETH6_NS"
+ ip link set "$IPXLAT_VETH4_NS" netns "$NS4"
+ ip link set "$IPXLAT_VETH6_NS" netns "$NS6"
+
+ ip addr add "$IPXLAT_HOST4_ADDR/24" dev "$IPXLAT_VETH4_HOST"
+ ip -6 addr add "$IPXLAT_HOST6_ADDR/64" dev "$IPXLAT_VETH6_HOST"
+ ip link set "$IPXLAT_VETH4_HOST" up
+ ip link set "$IPXLAT_VETH6_HOST" up
+
+ ip netns exec "$NS4" ip addr add "$IPXLAT_NS4_ADDR/24" \
+ dev "$IPXLAT_VETH4_NS"
+ ip netns exec "$NS4" ip link set "$IPXLAT_VETH4_NS" up
+ ip netns exec "$NS4" ip route add default via "$IPXLAT_HOST4_ADDR"
+
+ ip netns exec "$NS6" ip -6 addr add "$IPXLAT_NS6_ADDR/64" \
+ dev "$IPXLAT_VETH6_NS"
+ ip netns exec "$NS6" ip -6 addr add "$IPXLAT_V6_REMOTE/128" \
+ dev "$IPXLAT_VETH6_NS"
+ ip netns exec "$NS6" ip -6 addr add "$IPXLAT_V6_NS6_SRC/128" \
+ dev "$IPXLAT_VETH6_NS"
+ ip netns exec "$NS6" ip link set "$IPXLAT_VETH6_NS" up
+ ip netns exec "$NS6" ip -6 route add default via "$IPXLAT_HOST6_ADDR"
+ ip netns exec "$NS6" ip -6 route replace "$IPXLAT_V6_NS4/128" \
+ via "$IPXLAT_HOST6_ADDR"
+ sleep 2
+
+ sysctl -qw net.ipv4.ip_forward=1
+ sysctl -qw net.ipv6.conf.all.forwarding=1
+
+ # 4->6 steering rule
+ ip route replace 192.0.2.0/24 dev "$IPXLAT_TRANSLATOR_DEV"
+ # Post-translation egress:
+ # IPv6 destinations in xlat-prefix6 leave toward NS6.
+ ip -6 route replace "$IPXLAT_XLAT_PREFIX6/$IPXLAT_XLAT_PREFIX6_LEN" \
+ dev "$IPXLAT_VETH6_HOST"
+ # 6->4 steering rule
+ ip -6 route replace "$IPXLAT_V6_NS4/128" dev "$IPXLAT_TRANSLATOR_DEV"
+
+ ip link set "$IPXLAT_VETH6_HOST" mtu 1280
+ ip netns exec "$NS6" ip link set "$IPXLAT_VETH6_NS" mtu 1280
+}
+
+ipxlat_setup_env()
+{
+ ipxlat_require_root
+ ipxlat_require_tools
+ ipxlat_cleanup
+
+ ipxlat_configure_topology
+}
+
+ipxlat_run_iperf()
+{
+ local srv_ns="$1"
+ local cli_ns="$2"
+ local dst="$3"
+ local port="$4"
+ local -a args=()
+ local client_rc
+ local server_rc
+ local spid
+ local idx
+
+ for ((idx = 5; idx <= $#; idx++)); do
+ args+=("${!idx}")
+ done
+
+ ip netns exec "$srv_ns" timeout "$IPXLAT_IPERF_TIMEOUT" \
+ iperf3 -s -1 -p "$port" >/dev/null 2>&1 &
+ spid=$!
+ sleep 0.2
+
+ ip netns exec "$cli_ns" timeout "$IPXLAT_IPERF_TIMEOUT" \
+ iperf3 -c "$dst" -p "$port" "${args[@]}" >/dev/null 2>&1
+
+ client_rc=$?
+ if [[ $client_rc -ne 0 ]]; then
+ kill "$spid" >/dev/null 2>&1 || true
+ fi
+
+ wait "$spid" >/dev/null 2>&1
+ server_rc=$?
+
+ ((client_rc != 0)) && return "$client_rc"
+ return "$server_rc"
+}
+
+ipxlat_capture_pkts()
+{
+ local ns="$1"
+ local filter="$2"
+ local expect_pkts="$3"
+ local timeout_s="$4"
+ local cap_goal
+ local cap_pid
+ local rc
+ local trigger_rc
+
+ shift 4
+
+ cap_goal=1
+ [[ $expect_pkts -gt 0 ]] && cap_goal=$expect_pkts
+
+ ip netns exec "$ns" timeout "$timeout_s" \
+ tcpdump -nni any -c "$cap_goal" \
+ "$filter" >/dev/null 2>&1 &
+ cap_pid=$!
+ sleep 0.2
+
+ "$@"
+ trigger_rc=$?
+ wait "$cap_pid" >/dev/null 2>&1
+ rc=$?
+
+ if [[ $trigger_rc -ne 0 ]]; then
+ return "$trigger_rc"
+ fi
+
+ if [[ $expect_pkts -eq 0 ]]; then
+ [[ $rc -eq 124 ]]
+ else
+ [[ $rc -eq 0 ]]
+ fi
+}
diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_udp4_zero_csum_send.c b/tools/testing/selftests/net/ipxlat/ipxlat_udp4_zero_csum_send.c
new file mode 100644
index 000000000000..ef9f07f8d699
--- /dev/null
+++ b/tools/testing/selftests/net/ipxlat/ipxlat_udp4_zero_csum_send.c
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber <dxld@darkboxed.org>
+ *
+ * Author: Antonio Quartulli <antonio@mandelbit.com>
+ * Daniel Gröber <dxld@darkboxed.org>
+ * Ralf Lici <ralf@mandelbit.com>
+ */
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <linux/ip.h>
+#include <linux/udp.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+static uint16_t iphdr_csum(const void *buf, size_t len)
+{
+ const uint16_t *p = buf;
+ uint32_t sum = 0;
+
+ while (len > 1) {
+ sum += *p++;
+ len -= 2;
+ }
+ if (len)
+ sum += *(const uint8_t *)p;
+
+ while (sum >> 16)
+ sum = (sum & 0xffff) + (sum >> 16);
+
+ return (uint16_t)~sum;
+}
+
+int main(int argc, char **argv)
+{
+ static const char payload[] = "ipxlat-zero-udp-csum";
+ struct sockaddr_in dst = {};
+ struct {
+ struct iphdr ip;
+ struct udphdr udp;
+ char payload[sizeof(payload)];
+ } pkt = {};
+ in_addr_t saddr, daddr;
+ unsigned long dport_ul;
+ socklen_t dst_len;
+ ssize_t n;
+ int one = 1;
+ int fd;
+
+ if (argc != 4) {
+ fprintf(stderr, "usage: %s <src4> <dst4> <dport>\n", argv[0]);
+ return 2;
+ }
+
+ if (inet_pton(AF_INET, argv[1], &saddr) != 1 ||
+ inet_pton(AF_INET, argv[2], &daddr) != 1) {
+ fprintf(stderr, "invalid IPv4 address\n");
+ return 2;
+ }
+
+ errno = 0;
+ dport_ul = strtoul(argv[3], NULL, 10);
+ if (errno || dport_ul > 65535) {
+ fprintf(stderr, "invalid UDP port\n");
+ return 2;
+ }
+
+ fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
+ if (fd < 0) {
+ perror("socket");
+ return 1;
+ }
+
+ if (setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &one, sizeof(one)) < 0) {
+ perror("setsockopt(IP_HDRINCL)");
+ close(fd);
+ return 1;
+ }
+
+ pkt.ip.version = 4;
+ pkt.ip.ihl = 5;
+ pkt.ip.ttl = 64;
+ pkt.ip.protocol = IPPROTO_UDP;
+ pkt.ip.tot_len = htons(sizeof(pkt));
+ pkt.ip.id = htons(1);
+ pkt.ip.frag_off = 0;
+ pkt.ip.saddr = saddr;
+ pkt.ip.daddr = daddr;
+ pkt.ip.check = iphdr_csum(&pkt.ip, sizeof(pkt.ip));
+
+ pkt.udp.source = htons(4242);
+ pkt.udp.dest = htons((uint16_t)dport_ul);
+ pkt.udp.len = htons(sizeof(pkt.udp) + sizeof(payload));
+ pkt.udp.check = 0;
+
+ memcpy(pkt.payload, payload, sizeof(payload));
+
+ dst.sin_family = AF_INET;
+ dst.sin_port = pkt.udp.dest;
+ dst.sin_addr.s_addr = daddr;
+ dst_len = sizeof(dst);
+
+ n = sendto(fd, &pkt, sizeof(pkt), 0, (struct sockaddr *)&dst, dst_len);
+ if (n != (ssize_t)sizeof(pkt)) {
+ perror("sendto");
+ close(fd);
+ return 1;
+ }
+
+ close(fd);
+ return 0;
+}
--
2.53.0
* [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide
2026-03-19 15:12 [RFC net-next 00/15] Introducing ipxlat: a stateless IPv4/IPv6 translation device Ralf Lici
` (13 preceding siblings ...)
2026-03-19 15:12 ` [RFC net-next 14/15] selftests: net: add ipxlat coverage Ralf Lici
@ 2026-03-19 15:12 ` Ralf Lici
2026-03-19 22:11 ` Jonathan Corbet
14 siblings, 1 reply; 18+ messages in thread
From: Ralf Lici @ 2026-03-19 15:12 UTC (permalink / raw)
To: netdev
Cc: Daniel Gröber, Antonio Quartulli, Ralf Lici, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Jonathan Corbet, Shuah Khan, linux-doc, linux-kernel
From: Daniel Gröber <dxld@darkboxed.org>
Add user and reviewer documentation for the ipxlat virtual netdevice in
Documentation/networking/ipxlat.rst.
The document describes the datapath model, stateless IPv4/IPv6 address
translation rules, ICMP handling, control-plane configuration, and test
topology assumptions. It also records the intended runtime configuration
contract and current behavior limits so deployment expectations are
clear.
Signed-off-by: Daniel Gröber <dxld@darkboxed.org>
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
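Reviewer-style note, not part of the commit: the address mapping the
document below refers to (RFC 6052, Section 2.2) can be sketched in a few
lines of Python. This is a minimal illustration under stated assumptions,
not the ipxlat kernel code; the helper names embed_ipv4() and
extract_ipv4() are hypothetical. The sample addresses are the ones used in
the selftest topology comments of this series.

```python
import ipaddress

# Prefix lengths permitted by RFC 6052, Section 2.2.
VALID_PREFIX_LENS = (32, 40, 48, 56, 64, 96)


def embed_ipv4(prefix: str, v4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address into a translation prefix (RFC 6052 layout)."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen not in VALID_PREFIX_LENS:
        raise ValueError("unsupported prefix length")
    out = bytearray(net.network_address.packed)
    pos = net.prefixlen // 8
    for octet in ipaddress.IPv4Address(v4).packed:
        if pos == 8:  # octet 8 (bits 64..71, the "u" octet) stays zero
            pos += 1
        out[pos] = octet
        pos += 1
    return ipaddress.IPv6Address(bytes(out))


def extract_ipv4(prefix: str, v6: str) -> ipaddress.IPv4Address:
    """Inverse of embed_ipv4(): recover the IPv4 address from an IPv6 one."""
    net = ipaddress.IPv6Network(prefix)
    packed = ipaddress.IPv6Address(v6).packed
    pos = net.prefixlen // 8
    v4 = bytearray()
    while len(v4) < 4:
        if pos == 8:  # skip the reserved "u" octet
            pos += 1
        v4.append(packed[pos])
        pos += 1
    return ipaddress.IPv4Address(bytes(v4))


print(embed_ipv4("2001:db8:100::/40", "192.0.2.33"))    # 2001:db8:1c0:2:21::
print(embed_ipv4("2001:db8:100::/40", "198.51.100.2"))  # 2001:db8:1c6:3364:2::
print(embed_ipv4("64:ff9b::/96", "192.0.2.1"))          # 64:ff9b::c000:201
print(extract_ipv4("2001:db8:100::/40", "2001:db8:1c0:2:2::"))  # 192.0.2.2
```

For prefixes shorter than /96 the IPv4 octets are split around octet 8,
which is why 192.0.2.33 appears as "1c0:2:21" under the /40 test prefix.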
Documentation/networking/ipxlat.rst | 190 ++++++++++++++++++++++++++++
1 file changed, 190 insertions(+)
create mode 100644 Documentation/networking/ipxlat.rst
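Reviewer-style note, not part of the commit: the document below calls
64:ff9b:1:fffe::/96 "checksum-neutral". One way to read that (following
RFC 6052, Section 4.1) is that the 16-bit one's-complement sum of the
prefix is zero, so replacing an IPv4 address by prefix plus IPv4 address
leaves a pseudo-header checksum unchanged. The sketch below only checks
that property; is_checksum_neutral() is a hypothetical helper, and whether
ipxlat exploits neutrality is not stated in this series.

```python
import ipaddress


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum over an even-length byte string."""
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def is_checksum_neutral(prefix: str) -> bool:
    """True if the prefix bits sum to zero in one's-complement arithmetic."""
    net = ipaddress.IPv6Network(prefix)
    # 0x0000 and 0xffff both represent zero in one's complement.
    return ones_complement_sum(net.network_address.packed) in (0x0000, 0xFFFF)


print(is_checksum_neutral("64:ff9b:1:fffe::/96"))  # True
print(is_checksum_neutral("64:ff9b::/96"))         # True
print(is_checksum_neutral("2001:db8:100::/40"))    # False
```

The last line shows that an arbitrary Network-Specific Prefix, such as the
/40 used by the selftests, is generally not neutral, in which case the
transport checksum has to be adjusted during translation.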
diff --git a/Documentation/networking/ipxlat.rst b/Documentation/networking/ipxlat.rst
new file mode 100644
index 000000000000..5a0ad02c05be
--- /dev/null
+++ b/Documentation/networking/ipxlat.rst
@@ -0,0 +1,190 @@
+.. SPDX-License-Identifier: GPL-2.0+
+.. Copyright (C) 2026 Daniel Gröber <dxld@debian.org>
+
+==============================================
+IPXLAT - IPv6<>IPv4 IP/ICMP Translation (SIIT)
+==============================================
+
+ipxlat (``CONFIG_IPXLAT=y``) provides a virtual netdevice implementing
+stateless IP packet translation between IP versions 6 and 4. This is a
+building block for establishing layer 3 connectivity between otherwise
+uncommunicative IPv6-only and/or IPv4-only networks.
+
+
+Creation and Configuration Parameters
+=====================================
+
+An ipxlat netdevice can be created and configured using YNL like so::
+
+ $ ip link add siit0 type ipxlat
+
+ $ IID=$(cat /sys/class/net/siit0/ifindex)
+
+ $ ADDR_HEX=$(python3 -c 'import ipaddress,sys; \
+ print(ipaddress.IPv6Address(sys.argv[1]).packed.hex())' \
+ 64:ff9b:: | tee /dev/stderr)
+ 0064ff9b000000000000000000000000
+
+ $ ./tools/net/ynl/pyynl/cli.py --family ipxlat --do dev-set --json \
+ '{"ifindex": '"$IID"', "config": {"xlat-prefix6": {"prefix": "'"$ADDR_HEX"'", "prefix-len": 96}}}'
+
+(TODO: Once implemented) An ipxlat netdevice can be configured using
+iproute2::
+
+ $ ip link add siit0 type ipxlat [ OPTIONS ]
+
+ # where OPTIONS can include (TODO: iproute2 patch):
+ #
+ # prefix ADDR (default 64:ff9b::/96)
+ #
+ # lowest-ipv6-mtu MTU (default 1280)
+
+
+Introduction to Packet-level IPv6<>IPv4 Translation
+===================================================
+
+Translatable packets delivered into an ipxlat device as either of the IP
+protocol versions loop back as the other. Untranslatable packets are
+rejected with ICMP errors of the same IP version as appropriate or dropped
+silently if required by RFC-SIIT_.
+
+.. _RFC-SIIT: https://datatracker.ietf.org/doc/html/rfc7915
+
+Supported upper layer protocols (TCP/UDP/ICMP) have their checksums
+recomputed as needed as part of translation. Unsupported IP protocols
+(IPPROTO\_*) are passed through unmodified. This will make them fail at the
+receiver except in special cases.
+
+Differences in IP layer semantics are handled using several different
+strategies. Here we only give a high-level summary of the areas of most
+friction:
+ Fragmentation approach, Path MTU Discovery (PMTUD), IP Options and Extension
+ Headers.
+
+**Fragmentation Approach** (v4: on-path vs v6: end-to-end) is smoothed over by:
+ | 4->6: Fragmenting (DF=0) IPv4 packets when needed. See "lowest-ipv6-mtu".
+ | 6->4: Using on-path frag. down the line for v4 pkts smaller than 1260.
+ Details are tedious, check RFC-SIIT_.
+
+**PMTUD** is maintained by recalculating advised MTU values in ICMP
+PKT_TOO_BIG and FRAG_NEEDED messages as they're being translated, taking
+into account the necessary header re-sizing and the post-translation
+nexthop MTU in the main routing table.
+
+**IP Options and IPv6 Extension Headers** except the Fragment Header are
+dropped or ignored except where more specific behaviour is specified in
+RFC-SIIT_.
+
+
+Address Translation
+-------------------
+
+The ipxlat address translation algorithm is stateless. Per RFC-ADDR_, all
+possible IPv4 addresses are mapped one-to-one into the translation prefix,
+optionally including a non-standard "suffix". See `RFC-ADDR Section 2.2
+<https://datatracker.ietf.org/doc/html/rfc6052#section-2.2>`_.
+
+.. _RFC-ADDR: https://datatracker.ietf.org/doc/html/rfc6052
+
+IPv6 addresses outside this prefix are rejected with ICMPv6 errors with
+the notable exception of ICMPv6 errors originating from untranslatable
+source addresses. These are translated to be sourced from the IPv4 Dummy
+Address ``192.0.0.8`` (per I-D-dummy_) instead to maintain IPv4 traceroute
+visibility.
+
+.. _I-D-dummy:
+ https://datatracker.ietf.org/doc/draft-ietf-v6ops-icmpext-xlat-v6only-source/
+
+In a basic bidirectional 6<>4 connectivity scenario this means IPv6 hosts
+must be addressed wholly from inside the translation prefix and per
+RFC-ADDR_. Plain vanilla SLAAC doesn't cut it here: static addressing or
+DHCPv6 is needed, unless we introduce statefulness (RFC-NAT64_) into the
+mix. See below on that.
+
+.. _RFC-NAT64: https://datatracker.ietf.org/doc/html/rfc6146
+
+
+Stateful Translation (NAT64)
+----------------------------
+
+NAT64 has several drawbacks; it is necessary only when your control
+over IPv4 or IPv6 addressing of hosts is limited.
+
+Using nftables we can turn a system into a stateful translator. For example
+to make the IPv4 internet reachable to an IPv6-only LAN having this system
+as its default route, further assuming we have an IPv4 default route and
+``192.0.2.1/32`` is routed to this system::
+
+ $ ip link add siit0 type ipxlat
+ $ ip link set dev siit0 up
+ $ ip route add 192.0.2.1/32 dev siit0
+ $ ip -6 route add 64:ff9b::/96 dev siit0
+ $ sysctl -w net.ipv4.conf.all.forwarding=1
+ $ sysctl -w net.ipv6.conf.all.forwarding=1
+ $ nft -f- <<EOF
+ table ip6 nat {
+ chain postrouting {
+ type nat hook postrouting priority filter; policy accept;
+ oifname "siit0" snat to 64:ff9b::c000:201 comment "::192.0.2.1"
+ }
+ }
+ table ip nat {
+ chain postrouting {
+ type nat hook postrouting priority filter; policy accept;
+ iifname "siit0" masquerade
+ }
+ }
+ EOF
+
+Note: Keep reading if you plan to replace the 192.0.2.0/24 documentation
+placeholder with RFC 1918 "private IPv4" space.
+
+
+Translation Prefix Choice and Complications
+-------------------------------------------
+
+Several prefix sizes between /32 and /96 are supported by ipxlat. Using
+a /96 prefix is often convenient as it allows using the dotted quad IPv6
+notation, e.g. "64:ff9b::192.0.2.1". RFC-ADDR_ "3.3. Choice of Prefix for
+Stateless Translation Deployments" has more detailed recommendations.
+
+The "Well-Known Prefix" (WKP) 64:ff9b::/96, while a convenient and short
+choice for LANs, comes with some IETF baggage. As specified (at time of
+writing) addresses drawn from RFC 1918 "private IPv4" space "MUST NOT" be
+used with the WKP. While ipxlat does not enforce this, other network
+elements may.
+
+If I-D-WKP-1918_ makes it through the IETF process this complication for
+the cautious network engineer may disappear in the future.
+
+.. _I-D-WKP-1918:
+ https://datatracker.ietf.org/doc/draft-ietf-v6ops-nat64-wkp-1918/
+
+In the meantime the newer and more lax prefix allocated by RFC-LWKP_ or an
+entirely Network-Specific Prefix may be a better fit. We'd recommend using
+the checksum-neutral ``64:ff9b:1:fffe::/96`` prefix from the larger /48
+allocation.
+
+.. _RFC-LWKP: https://datatracker.ietf.org/doc/html/rfc8215
+
+
+RFC Considerations for Userspace
+--------------------------------
+
+- Per `RFC 7915
+ <https://datatracker.ietf.org/doc/html/rfc7915#section-4.5>`_,
+ ipxlat SHOULD drop UDPv4 zero checksum packets, yet we chose to always
+ recalculate checksums for unfragmented packets.
+
+ If you want your translator to follow the SHOULD add a netfilter rule
+ dropping such packets. For example using ``nft(8)`` syntax::
+
+ nft add rule ip filter postrouting -- oifkind ipxlat udp checksum 0 log drop
+
+- Per `RFC 6146
+ <https://datatracker.ietf.org/doc/html/rfc6146#section-3.4>`_,
+ Fragmented UDPv4 zero checksum recalculation by reassembly is not
+ supported.
+
+- I-D-dummy_: Adding a Node Identity Object for IPv4-side traceroute
+ disambiguation is not yet supported.
--
2.53.0
* Re: [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide
2026-03-19 15:12 ` [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide Ralf Lici
@ 2026-03-19 22:11 ` Jonathan Corbet
2026-03-24 9:55 ` Ralf Lici
0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Corbet @ 2026-03-19 22:11 UTC (permalink / raw)
To: Ralf Lici, netdev
Cc: Daniel Gröber, Antonio Quartulli, Ralf Lici, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Shuah Khan, linux-doc, linux-kernel
Ralf Lici <ralf@mandelbit.com> writes:
> From: Daniel Gröber <dxld@darkboxed.org>
>
> Add user and reviewer documentation for the ipxlat virtual netdevice in
> Documentation/networking/ipxlat.rst.
>
> The document describes the datapath model, stateless IPv4/IPv6 address
> translation rules, ICMP handling, control-plane configuration, and test
> topology assumptions. It also records the intended runtime configuration
> contract and current behavior limits so deployment expectations are
> clear.
>
> Signed-off-by: Daniel Gröber <dxld@darkboxed.org>
> Signed-off-by: Ralf Lici <ralf@mandelbit.com>
> ---
> Documentation/networking/ipxlat.rst | 190 ++++++++++++++++++++++++++++
> 1 file changed, 190 insertions(+)
> create mode 100644 Documentation/networking/ipxlat.rst
You need to add this new file to Documentation/networking/index.rst or
it won't be included in the build (and you'll get a warning).
Thanks,
jon
* Re: [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide
2026-03-19 22:11 ` Jonathan Corbet
@ 2026-03-24 9:55 ` Ralf Lici
0 siblings, 0 replies; 18+ messages in thread
From: Ralf Lici @ 2026-03-24 9:55 UTC (permalink / raw)
To: Jonathan Corbet, netdev
Cc: Daniel Gröber, Antonio Quartulli, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Shuah Khan, linux-doc, linux-kernel
On 3/19/26 23:11, Jonathan Corbet wrote:
> Ralf Lici <ralf@mandelbit.com> writes:
>
>> From: Daniel Gröber <dxld@darkboxed.org>
>>
>> Add user and reviewer documentation for the ipxlat virtual netdevice in
>> Documentation/networking/ipxlat.rst.
>>
>> The document describes the datapath model, stateless IPv4/IPv6 address
>> translation rules, ICMP handling, control-plane configuration, and test
>> topology assumptions. It also records the intended runtime configuration
>> contract and current behavior limits so deployment expectations are
>> clear.
>>
>> Signed-off-by: Daniel Gröber <dxld@darkboxed.org>
>> Signed-off-by: Ralf Lici <ralf@mandelbit.com>
>> ---
>> Documentation/networking/ipxlat.rst | 190 ++++++++++++++++++++++++++++
>> 1 file changed, 190 insertions(+)
>> create mode 100644 Documentation/networking/ipxlat.rst
>
> You need to add this new file to Documentation/networking/index.rst or
> it won't be included in the build (and you'll get a warning).
>
> Thanks,
>
> jon
Hi Jon,
Thanks for the heads-up.
I’ve fixed this for the next revision. While rechecking with 'make
SPHINXDIRS=networking htmldocs', I also found and fixed a couple of
'ipxlat.rst' issues reported by Sphinx.
Thanks,
--
Ralf Lici
Mandelbit Srl