From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2AA413101C8; Thu, 11 Jun 2026 08:24:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781166249; cv=none; b=cInC4bQ5Q5lUfuTQZTv54G8/zHv5AfeOEW468kVFuRKenKTjp1eTOgBfPvtJAazq9Tlbk8hySsPn3qpgYGjyEXzib0WgU+gd7fWDqYxhvkGeax//qdpo9JxftKegaVSm1WEbEeFYyM96nczz7mWwm1RBCFv00yBzBvgnsG5OtYU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781166249; c=relaxed/simple; bh=IO9rJs7T38yo+TK6izt9duvp55N+9i3j+3Atz0x5s1c=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=kBW2tETAC7D7l+w2uXk0vR4N3YiRmGUgJeYKBOWWcMxN0tgy4mmDhMop2ZWrUtZp/IRA5+tqHfPvAPo8qWiAPZXz9st9Jyh8WLfuOE3Q+WxqcVkFqbmTsfuC06LfZsRc0Yio9Hpi2uEf9YIeJdsmadhb3lLpEybptJeT7RQEN3I= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=DZWso0TR; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="DZWso0TR" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 31D671F00893; Thu, 11 Jun 2026 08:24:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1781166247; bh=Yjg0i8SR0KD3kOfgggb+4qce8niDREk8xow7m/nK0vE=; h=Date:From:To:Cc:Subject:References:In-Reply-To; b=DZWso0TRqEKL3AK96G0lqFXNCZZvH9jOeq9NnuIuqvEeDHHEF4abMHvdStGGQaJo6 yRCSiiEJlf8INlzvceNajfri2LxQZmMmVFvaFmV9OrFQD37PLzGTyKC4NKfmre1fD6 APNde28aEMORP8o6ZqXZGk7ADhohwTiIg40pr83pN/i678cY/2ahaKnN/+MOAj6rmg Kn7ett0M/FW7ycEPUGQE1xBIp3VPZ4bbe90at3fEmmok7TZcRbKy8az4/VWx8odwcN uyWwahD6Cuj2hqkYUFQnGGF8gtRJ/PC9Rmq4ZqPAfHPiRcfgj3PtZsqKn9b1WRV1M+ TGzZqwMTKhDmw== Date: Thu, 11 Jun 2026 10:24:05 +0200 From: Lorenzo Bianconi To: Pablo Neira Ayuso Cc: Florian Westphal , Phil Sutter , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman , netfilter-devel@vger.kernel.org, coreteam@netfilter.org, netdev@vger.kernel.org Subject: Re: [PATCH nf] netfilter: flowtable: use pskb_may_pull() in nf_flow_ip6_tunnel_proto() Message-ID: References: <20260608-b4-nf_flow_ip6_tunnel_proto-update-v1-1-782c7052c8fd@kernel.org> Precedence: bulk X-Mailing-List: netfilter-devel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="LMbnxSaivFP16/g/" Content-Disposition: inline In-Reply-To: --LMbnxSaivFP16/g/ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable > Hi Lorenzo, >=20 > On Tue, Jun 09, 2026 at 02:31:03PM +0200, Lorenzo Bianconi wrote: > > > On Tue, Jun 09, 2026 at 12:28:08PM +0200, Lorenzo Bianconi wrote: > > > > On Jun 09, Pablo Neira Ayuso wrote: > > > > > Hi Lorenzo, > > > > >=20 > > > > > On Mon, Jun 08, 2026 at 07:06:52PM +0200, Lorenzo Bianconi wrote: > > > > > > Switch nf_flow_ip6_tunnel_proto() from skb_header_pointer() to > > > > > > pskb_may_pull() for header validation, aligning it with the app= roach > > > > > > used in nf_flow_ip4_tunnel_proto(). > > > > > > Move ctx->offset update inside the IPPROTO_IPV6 conditional blo= ck since > > > > > > it should only be adjusted when a tunnel is actually detected. > > > > > > While at it, use nexthdr instead of the hardcoded IPPROTO_IPV6 = constant > > > > > > when setting ctx->tun.proto. > > > > > >=20 > > > > > > Fixes: d98103575dcdd ("netfilter: flowtable: Add IP6IP6 rx sw a= cceleration") > > > > > > Signed-off-by: Lorenzo Bianconi > > > > > > --- > > > > > > net/netfilter/nf_flow_table_ip.c | 10 +++++----- > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > >=20 > > > > > > diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/n= f_flow_table_ip.c > > > > > > index 9c05a50d6013..2946399ab715 100644 > > > > > > --- a/net/netfilter/nf_flow_table_ip.c > > > > > > +++ b/net/netfilter/nf_flow_table_ip.c > > > > > > @@ -347,15 +347,15 @@ static bool nf_flow_ip6_tunnel_proto(stru= ct nf_flowtable_ctx *ctx, > > > > > > struct sk_buff *skb) > > > > > > { > > > > > > #if IS_ENABLED(CONFIG_IPV6) > > > > > > - struct ipv6hdr *ip6h, _ip6h; > > > > > > + struct ipv6hdr *ip6h; > > > > > > __be16 frag_off; > > > > > > u8 nexthdr; > > > > > > int hdrlen; > > > > > > =20 > > > > > > - ip6h =3D skb_header_pointer(skb, ctx->offset, sizeof(*ip6h), = &_ip6h); > > > > > > - if (!ip6h) > > > > > > + if (!pskb_may_pull(skb, sizeof(*ip6h) + ctx->offset)) > > > > > > return false; > > > > > > =20 > > > > > > + ip6h =3D (struct ipv6hdr *)(skb_network_header(skb) + ctx->of= fset); > > > > > > if (ip6h->hop_limit <=3D 1) > > > > > > return false; > > > > >=20 > > > > > Not shown in the patch, but is there still a corner case here that > > > > > needs to be covered? > > > > >=20 > > > > > ipv6_skip_exthdr() uses skb_header_pointer() internal, then anoth= er > > > > > pskb_may_pull() is needed to make sure no other IPv6 extension he= ader > > > > > sits between the outer and the inner IPPROTO_IPV6 header, allowin= g to > > > > > be in a non-linear area of the skb? =20 > > > > >=20 > > > > > > @@ -367,9 +367,9 @@ static bool nf_flow_ip6_tunnel_proto(struct= nf_flowtable_ctx *ctx, > > > > > > =20 > > > > >=20 > > > > > I mean: > > > > >=20 > > > > > if (!pskb_may_pull(skb, hdrlen)) > > > > > return false; > > > > >=20 > > > > > where hdrlen is what ipv6_skip_exthdr() returns. > > > > >=20 > > > > > Then, I think it should be safe to call skb_pull() on > > > > > ctx->tun.hdr_size. > > > > >=20 > > > > > Let me know, thanks. > > > >=20 > > > > I think you are right, here we need to run: > > > >=20 > > > > if (!pskb_may_pull(skb, hdrlen)) > > > > return false; > > > >=20 > > > > in order to be sure we can pull ctx->tun.hdr_size in nf_flow_ip_tun= nel_pop(). > > > > Doing so, we can roll-back to the original skb_header_pointer() to = access the > > > > outer ip6 header here. What do you think? > > >=20 > > > Yes, initial skb_header_pointer() then pskb_may_pull(skb, hdrlen) to > > > ensure the entire should be fine. > > >=20 > > > I think this need one more fix: This needs to resort to classic path > > > if there are intermediate extension headers sitting in between the > > > outer and inner headers in IP6IP6, ie. ipv6_ext_hdr() =3D=3D true. Th= ose > > > extensions need to be handled by the IPv6 stack. > >=20 > > In my setup we have just a single Destination Option extension header (= 60) > > between the outer and the inner IPV6 headers. In order to check if we h= ave > > other extensions headers other than Destination Option (and if so, send= the > > packet the networking stack) I guess we need to implement something sim= ilar > > to ipv6_skip_exthdr(), agree? >=20 > Maybe simpler check? If nexthdr is immediately IP6IP6 (ie. no > intermediate headers are in place), then handle this from the > flowtable datapath, otherwise fallback to classic. Thus, no new > special parser function is needed. Hi Pablo, ack, I agree. I guess we can limit the flowtable acceleration to the case o= f a IP6IP6 tunnel created with encaplimit set to none: $ip link add name tun0 type ip6tnl local remote encaplimit= none >=20 > > > nf_flow_ip6_tunnel_proto() needs to be fixed to deal with this.=20 > >=20 > > Do you want to do it with a dedicated patch or do you prefer to do it i= n this > > one? >=20 > I think a single patch to fix the issues in nf_flow_ip6_tunnel_proto() > should be fine. I cooked this patch, it works fine. What do you think? Regards, Lorenzo diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table= _ip.c index 9c05a50d6013..5cb50414a491 100644 --- a/net/netfilter/nf_flow_table_ip.c +++ b/net/netfilter/nf_flow_table_ip.c @@ -347,29 +347,20 @@ static bool nf_flow_ip6_tunnel_proto(struct nf_flowta= ble_ctx *ctx, struct sk_buff *skb) { #if IS_ENABLED(CONFIG_IPV6) - struct ipv6hdr *ip6h, _ip6h; - __be16 frag_off; - u8 nexthdr; - int hdrlen; + struct ipv6hdr *ip6h; =20 - ip6h =3D skb_header_pointer(skb, ctx->offset, sizeof(*ip6h), &_ip6h); - if (!ip6h) + if (!pskb_may_pull(skb, sizeof(*ip6h) + ctx->offset)) return false; =20 + ip6h =3D (struct ipv6hdr *)(skb_network_header(skb) + ctx->offset); if (ip6h->hop_limit <=3D 1) return false; =20 - nexthdr =3D ip6h->nexthdr; - hdrlen =3D ipv6_skip_exthdr(skb, sizeof(*ip6h) + ctx->offset, &nexthdr, - &frag_off); - if (hdrlen < 0) - return false; - - if (nexthdr =3D=3D IPPROTO_IPV6) { - ctx->tun.hdr_size =3D hdrlen; - ctx->tun.proto =3D IPPROTO_IPV6; + if (ip6h->nexthdr =3D=3D IPPROTO_IPV6) { + ctx->tun.proto =3D ip6h->nexthdr; + ctx->tun.hdr_size =3D sizeof(*ip6h); + ctx->offset +=3D ctx->tun.hdr_size; } - ctx->offset +=3D ctx->tun.hdr_size; =20 return true; #else @@ -648,25 +639,19 @@ static int nf_flow_tunnel_v4_push(struct net *net, st= ruct sk_buff *skb, return 0; } =20 -struct ipv6_tel_txoption { - struct ipv6_txoptions ops; - __u8 dst_opt[8]; -}; - static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb, struct flow_offload_tuple *tuple, - struct in6_addr **ip6_daddr, - int encap_limit) + struct in6_addr **ip6_daddr) { struct ipv6hdr *ip6h =3D (struct ipv6hdr *)skb_network_header(skb); - u8 hop_limit =3D ip6h->hop_limit, proto =3D IPPROTO_IPV6; struct rtable *rt =3D dst_rtable(tuple->dst_cache); __u8 dsfield =3D ipv6_get_dsfield(ip6h); struct flowi6 fl6 =3D { .daddr =3D tuple->tun.src_v6, .saddr =3D tuple->tun.dst_v6, - .flowi6_proto =3D proto, + .flowi6_proto =3D IPPROTO_IPV6, }; + u8 hop_limit =3D ip6h->hop_limit; int err, mtu; u32 headroom; =20 @@ -674,41 +659,18 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net= , struct sk_buff *skb, if (err) return err; =20 - skb_set_inner_ipproto(skb, proto); + skb_set_inner_ipproto(skb, IPPROTO_IPV6); headroom =3D sizeof(*ip6h) + LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len; - if (encap_limit) - headroom +=3D 8; err =3D skb_cow_head(skb, headroom); if (err) return err; =20 skb_scrub_packet(skb, true); mtu =3D dst_mtu(&rt->dst) - sizeof(*ip6h); - if (encap_limit) - mtu -=3D 8; mtu =3D max(mtu, IPV6_MIN_MTU); skb_dst_update_pmtu_no_confirm(skb, mtu); =20 - if (encap_limit > 0) { - struct ipv6_tel_txoption opt =3D { - .dst_opt[2] =3D IPV6_TLV_TNL_ENCAP_LIMIT, - .dst_opt[3] =3D 1, - .dst_opt[4] =3D encap_limit, - .dst_opt[5] =3D IPV6_TLV_PADN, - .dst_opt[6] =3D 1, - }; - struct ipv6_opt_hdr *hopt; - - opt.ops.dst1opt =3D (struct ipv6_opt_hdr *)opt.dst_opt; - opt.ops.opt_nflen =3D 8; - - hopt =3D skb_push(skb, ipv6_optlen(opt.ops.dst1opt)); - memcpy(hopt, opt.ops.dst1opt, ipv6_optlen(opt.ops.dst1opt)); - hopt->nexthdr =3D IPPROTO_IPV6; - proto =3D NEXTHDR_DEST; - } - skb_push(skb, sizeof(*ip6h)); skb_reset_network_header(skb); =20 @@ -716,7 +678,7 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net, = struct sk_buff *skb, ip6_flow_hdr(ip6h, dsfield, ip6_make_flowlabel(net, skb, fl6.flowlabel, true, &fl6)); ip6h->hop_limit =3D hop_limit; - ip6h->nexthdr =3D proto; + ip6h->nexthdr =3D IPPROTO_IPV6; ip6h->daddr =3D tuple->tun.src_v6; ip6h->saddr =3D tuple->tun.dst_v6; ipv6_hdr(skb)->payload_len =3D htons(skb->len - sizeof(*ip6h)); @@ -729,12 +691,10 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net= , struct sk_buff *skb, =20 static int nf_flow_tunnel_v6_push(struct net *net, struct sk_buff *skb, struct flow_offload_tuple *tuple, - struct in6_addr **ip6_daddr, - int encap_limit) + struct in6_addr **ip6_daddr) { if (tuple->tun_num) - return nf_flow_tunnel_ip6ip6_push(net, skb, tuple, ip6_daddr, - encap_limit); + return nf_flow_tunnel_ip6ip6_push(net, skb, tuple, ip6_daddr); =20 return 0; } @@ -1089,7 +1049,7 @@ static int nf_flow_tuple_ipv6(struct nf_flowtable_ctx= *ctx, struct sk_buff *skb, static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx, struct nf_flowtable *flow_table, struct flow_offload_tuple_rhash *tuplehash, - struct sk_buff *skb, int encap_limit) + struct sk_buff *skb) { enum flow_offload_tuple_dir dir; struct flow_offload *flow; @@ -1100,11 +1060,8 @@ static int nf_flow_offload_ipv6_forward(struct nf_fl= owtable_ctx *ctx, flow =3D container_of(tuplehash, struct flow_offload, tuplehash[dir]); =20 mtu =3D flow->tuplehash[dir].tuple.mtu + ctx->offset; - if (flow->tuplehash[!dir].tuple.tun_num) { + if (flow->tuplehash[!dir].tuple.tun_num) mtu -=3D sizeof(*ip6h); - if (encap_limit > 0) - mtu -=3D 8; /* encap limit option */ - } =20 if (unlikely(nf_flow_exceeds_mtu(skb, mtu))) return 0; @@ -1158,7 +1115,6 @@ unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { - int encap_limit =3D IPV6_DEFAULT_TNL_ENCAP_LIMIT; struct flow_offload_tuple_rhash *tuplehash; struct nf_flowtable *flow_table =3D priv; struct flow_offload_tuple *other_tuple; @@ -1177,8 +1133,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff = *skb, if (tuplehash =3D=3D NULL) return NF_ACCEPT; =20 - ret =3D nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb, - encap_limit); + ret =3D nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb); if (ret < 0) return NF_DROP; else if (ret =3D=3D 0) @@ -1198,7 +1153,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff = *skb, ip6_daddr =3D &other_tuple->src_v6; =20 if (nf_flow_tunnel_v6_push(state->net, skb, other_tuple, - &ip6_daddr, encap_limit) < 0) + &ip6_daddr) < 0) return NF_DROP; =20 switch (tuplehash->tuple.xmit_type) { >=20 > My understanding is that IP6IP6 tunnels are special, because it is > handled in the received path as local traffic, then decapsulation > happens and packet follows local output path. This is different to > what we already have where packets follow forwarding path, not local > input. The flowtable should only pull the outer IP6 header. >=20 > > > And I suspect nf_flow_ip4_tunnel_proto() with IP options have the same > > > problem, the flowtable need to resort to the classic stack path. > >=20 > > In the IPv4 case, if the packet has options nf_flow_ip4_tunnel_proto() = will > > return false and so the packet will be sent to the networking stack. >=20 > OK, then IPv4 is fine, thanks for explaining. --LMbnxSaivFP16/g/ Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTquNwa3Txd3rGGn7Y6cBh0uS2trAUCaipwpQAKCRA6cBh0uS2t rIkbAP4hcapcj6PpaqksT5TiwKMT9yXGQnldZYY8WDBDiqvTQwEArWT4NOoXcvxR HONktiBhKWlH+TNyLmhA+G0r3cQFPgQ= =7uKb -----END PGP SIGNATURE----- --LMbnxSaivFP16/g/--