From: Ralf Lici
To: Toke Høiland-Jørgensen
Cc: netdev@vger.kernel.org, Daniel Gröber, Antonio Quartulli, Andrew Lunn,
 "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 linux-kernel@vger.kernel.org, Pablo Neira Ayuso, Florian Westphal,
 Phil Sutter, Beniamino Galvani
Subject: Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
Date: Wed, 24 Jun 2026 18:18:52 +0200
Message-ID: <20260624161854.686569-1-ralf@mandelbit.com>
In-Reply-To: <87v7b9c9jj.fsf@toke.dk>
References: <87v7b9c9jj.fsf@toke.dk>
X-Mailing-List: netdev@vger.kernel.org

On Tue, 23 Jun 2026 21:59:44 +0200, Toke Høiland-Jørgensen wrote:
> Ralf Lici writes:
> > On the BPF point specifically: I agree a BPF program should be able to
> > decide whether to translate. What I am less sure about is whether
> > redirecting to a netdevice is the best way to expose that. A TC action
> > (yet another model, I know :)) gives you the same thing in-pipeline and
> > more directly:
> >
> >   tc filter add dev wwan0 egress \
> >     bpf obj match.o action ipxlat4to6 domain clat0
> >
> > Let BPF make the policy decision, with the native action doing the
> > translation work that the current BPF CLAT implementations have trouble
> > with: fragmentation, checksum corner cases, and ICMP error inner
> > headers (as explained by Beniamino).
> >
> > So TC clsact looks like the natural in-kernel replacement for today's
> > TC-BPF CLAT programs: no extra netdev, you attach to the existing
> > uplink, direction is explicit, and on egress you sit on the real route
> > dst, so the synthetic-dst and double-routing problems above just don't
> > arise. The cost is more moving parts than a single bpf_redirect since
> > userspace has to manage clsact, filters, priorities and action
> > lifecycle/cleanup.
>
> Hmm, so no one really uses the bpf filter mechanism, since you can just
> do everything from an action anyway (and with TCX attachment, you can
> even avoid the overhead of the TC filter/action infrastructure
> entirely). However, point taken wrt how to integrate this with BPF. I
> guess the most flexible thing would be to expose the functionality
> directly (as a kfunc callable from a BPF program). Which also fits with
> your point below:
>

Ah, I see, the cls_bpf example was dated, and I like the kfunc angle
better than a new TC action. I would probably keep that as the minimal
per-packet interface: BPF can decide whether a packet should be
translated, and the kfunc can do the actual translation work for packets
whose translated form still fits the output MTU. The full 4->6
fragmentation case still looks like output-path/harness territory to me,
since it is a 1->N fan-out operation.

> > For a gateway translator, though, I still think a device-bound model is
> > less natural. There the translation point is more like a forwarding
> > decision across routes and nexthops, so a route/LWT attachment, or
> > possibly a netfilter attachment seems easier to reason about. Also, as
> > you already pointed out while discussing LWT, an admin setting up NAT64
> > is more likely to reach for an nft rule than for a clsact filter on a
> > specific device.
> >
> > Taking a step back, ipxlat is really a generic translation engine plus a
> > thin harness around it. So rather than pick one attachment, it might be
> > worth structuring the engine so different harnesses can drive it.
> > There's interesting precedent for this shape:
> >
> > - ILA, again, is the closest sibling: stateless IPv6 address translation
> >   with a shared core in ila_common.c, driven both by an LWT frontend in
> >   ila_lwt.c and by an inline netfilter hook with a netlink-configured
> >   mapping table in ila_xlat.c.
> >
> > - act_ct is the precedent for the TC side specifically: a TC action that
> >   reuses the netfilter conntrack engine rather than reimplementing it.
> >
> > And act_nat is the cautionary counter-example: a standalone TC
> > reimplementation of stateless NAT that shares no code with nf_nat, and
> > carries a "would be nice to share code" comment :)
> >
> > So I am wondering whether the right direction is to factor the
> > translation engine cleanly, land it with one harness first, and keep the
> > other attachment points as follow-up work once the core semantics are
> > settled.
> >
> > Does that direction seem reasonable to you?
>
> Yes, reusable functionality that can be called from multiple places
> sounds like a good fit; let's try to structure it that way!
>

Great, that's the direction I'll take then.

> As for which hook to start with, well, let's see if we hear back from
> the netfilter devs, but either netfilter or the routing subsystem (LWT
> style) would be OK for me I think.
>

Works for me. The engine factoring is common to all of them, so I'll
start there. Once it's in shape I can sketch a harness against it to
sanity-check the interface.

-- 
Ralf Lici
Mandelbit Srl