From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Lorenzo Bianconi <lorenzo@kernel.org>
Cc: Florian Westphal <fw@strlen.de>, Phil Sutter <phil@nwl.cc>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
netdev@vger.kernel.org
Subject: Re: [PATCH nf] netfilter: flowtable: use pskb_may_pull() in nf_flow_ip6_tunnel_proto()
Date: Thu, 11 Jun 2026 13:03:58 +0200
Message-ID: <aiqWHszyI-RXqHVI@chamomile>
In-Reply-To: <aipwpXlisPVxO2ig@lore-desk>

On Thu, Jun 11, 2026 at 10:24:05AM +0200, Lorenzo Bianconi wrote:
> > Hi Lorenzo,
> >
> > On Tue, Jun 09, 2026 at 02:31:03PM +0200, Lorenzo Bianconi wrote:
> > > > On Tue, Jun 09, 2026 at 12:28:08PM +0200, Lorenzo Bianconi wrote:
> > > > > On Jun 09, Pablo Neira Ayuso wrote:
> > > > > > Hi Lorenzo,
> > > > > >
> > > > > > On Mon, Jun 08, 2026 at 07:06:52PM +0200, Lorenzo Bianconi wrote:
> > > > > > > Switch nf_flow_ip6_tunnel_proto() from skb_header_pointer() to
> > > > > > > pskb_may_pull() for header validation, aligning it with the approach
> > > > > > > used in nf_flow_ip4_tunnel_proto().
> > > > > > > Move ctx->offset update inside the IPPROTO_IPV6 conditional block since
> > > > > > > it should only be adjusted when a tunnel is actually detected.
> > > > > > > While at it, use nexthdr instead of the hardcoded IPPROTO_IPV6 constant
> > > > > > > when setting ctx->tun.proto.
> > > > > > >
> > > > > > > Fixes: d98103575dcdd ("netfilter: flowtable: Add IP6IP6 rx sw acceleration")
> > > > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > > > > > ---
> > > > > > > net/netfilter/nf_flow_table_ip.c | 10 +++++-----
> > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > > > > > >
> > > > > > > diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> > > > > > > index 9c05a50d6013..2946399ab715 100644
> > > > > > > --- a/net/netfilter/nf_flow_table_ip.c
> > > > > > > +++ b/net/netfilter/nf_flow_table_ip.c
> > > > > > > @@ -347,15 +347,15 @@ static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
> > > > > > > struct sk_buff *skb)
> > > > > > > {
> > > > > > > #if IS_ENABLED(CONFIG_IPV6)
> > > > > > > - struct ipv6hdr *ip6h, _ip6h;
> > > > > > > + struct ipv6hdr *ip6h;
> > > > > > > __be16 frag_off;
> > > > > > > u8 nexthdr;
> > > > > > > int hdrlen;
> > > > > > >
> > > > > > > - ip6h = skb_header_pointer(skb, ctx->offset, sizeof(*ip6h), &_ip6h);
> > > > > > > - if (!ip6h)
> > > > > > > + if (!pskb_may_pull(skb, sizeof(*ip6h) + ctx->offset))
> > > > > > > return false;
> > > > > > >
> > > > > > > + ip6h = (struct ipv6hdr *)(skb_network_header(skb) + ctx->offset);
> > > > > > > if (ip6h->hop_limit <= 1)
> > > > > > > return false;
> > > > > >
> > > > > > Not shown in the patch, but is there still a corner case here that
> > > > > > needs to be covered?
> > > > > >
> > > > > > ipv6_skip_exthdr() uses skb_header_pointer() internally, so another
> > > > > > pskb_may_pull() is needed to make sure that any IPv6 extension header
> > > > > > sitting between the outer and the inner IPPROTO_IPV6 header is not
> > > > > > left in a non-linear area of the skb?
> > > > > >
> > > > > > > @@ -367,9 +367,9 @@ static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
> > > > > > >
> > > > > >
> > > > > > I mean:
> > > > > >
> > > > > > if (!pskb_may_pull(skb, hdrlen))
> > > > > > return false;
> > > > > >
> > > > > > where hdrlen is what ipv6_skip_exthdr() returns.
> > > > > >
> > > > > > Then, I think it should be safe to call skb_pull() on
> > > > > > ctx->tun.hdr_size.
> > > > > >
> > > > > > Let me know, thanks.
> > > > >
> > > > > I think you are right, here we need to run:
> > > > >
> > > > > if (!pskb_may_pull(skb, hdrlen))
> > > > > return false;
> > > > >
> > > > > in order to be sure we can pull ctx->tun.hdr_size in nf_flow_ip_tunnel_pop().
> > > > > Doing so, we can roll-back to the original skb_header_pointer() to access the
> > > > > outer ip6 header here. What do you think?
> > > >
> > > > Yes, initial skb_header_pointer() then pskb_may_pull(skb, hdrlen) to
> > > > ensure the entire tunnel header is linear should be fine.
> > > >
> > > > I think this needs one more fix: it needs to fall back to the classic
> > > > path if there are intermediate extension headers sitting in between the
> > > > outer and inner headers in IP6IP6, i.e. ipv6_ext_hdr() == true. Those
> > > > extensions need to be handled by the IPv6 stack.
> > >
> > > In my setup we have just a single Destination Option extension header (60)
> > > between the outer and the inner IPv6 headers. In order to check whether we
> > > have extension headers other than Destination Option (and if so, send the
> > > packet to the networking stack) I guess we need to implement something
> > > similar to ipv6_skip_exthdr(), agree?
> >
> > Maybe a simpler check? If nexthdr is immediately IP6IP6 (i.e. no
> > intermediate headers are in place), then handle this from the
> > flowtable datapath, otherwise fall back to the classic path. Thus, no
> > new special parser function is needed.
>
> Hi Pablo,
>
> ack, I agree. I guess we can limit the flowtable acceleration to the case of
> an IP6IP6 tunnel created with encaplimit set to none:
>
> $ ip link add name tun0 type ip6tnl local <local> remote <remote> encaplimit none

I can see IPV6_DEFAULT_TNL_ENCAP_LIMIT is used from the packet path, so
the flowtable ignores the existing tunnel configuration.

I guess encaplimit is reachable from the net_device, so .fill_forward_path
can check this _at flow offload setup time_.

I think it is sensible to start simple, i.e. no encaplimit support,
then maybe add full support for NEXTHDR_DEST later on, but
I think encaplimit support is currently only half-way done?

I think your patch is fine, probably only .fill_forward_path needs the
update to skip the offload if encaplimit is used for ip6ip6?
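
To make the setup-time idea concrete, here is a rough user-space sketch of
the decision, not kernel code. The struct and function names are
hypothetical stand-ins for whatever .fill_forward_path can reach through
the net_device. The flag value is assumed to match
IP6_TNL_F_IGN_ENCAP_LIMIT from <linux/ip6_tunnel.h>, which is what
"encaplimit none" is expected to set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror IP6_TNL_F_IGN_ENCAP_LIMIT from <linux/ip6_tunnel.h>. */
#define TNL_F_IGN_ENCAP_LIMIT 0x1

/* Hypothetical stand-in for the ip6tnl parameters read at setup time. */
struct tnl_parms_model {
	uint8_t encap_limit;
	uint32_t flags;
};

/*
 * True when the tunnel would emit a Tunnel Encapsulation Limit
 * destination option, i.e. "encaplimit none" was not configured.
 */
static bool tnl_uses_encap_limit(const struct tnl_parms_model *p)
{
	assert(p != NULL);
	return !(p->flags & TNL_F_IGN_ENCAP_LIMIT);
}

/* Hypothetical setup-time decision: offload only plain IP6IP6 tunnels. */
static bool tnl_can_offload(const struct tnl_parms_model *p)
{
	return !tnl_uses_encap_limit(p);
}
```

With a check of this kind done once per flow, the packet path would not
need to know about the encapsulation limit at all.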

More comments below.

> > > > nf_flow_ip6_tunnel_proto() needs to be fixed to deal with this.
> > >
> > > Do you want to do it with a dedicated patch or do you prefer to do it in this
> > > one?
> >
> > I think a single patch to fix the issues in nf_flow_ip6_tunnel_proto()
> > should be fine.
>
> I cooked this patch, it works fine. What do you think?
>
> Regards,
> Lorenzo
>
> diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> index 9c05a50d6013..5cb50414a491 100644
> --- a/net/netfilter/nf_flow_table_ip.c
> +++ b/net/netfilter/nf_flow_table_ip.c
> @@ -347,29 +347,20 @@ static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
> struct sk_buff *skb)
> {
> #if IS_ENABLED(CONFIG_IPV6)
> - struct ipv6hdr *ip6h, _ip6h;
> - __be16 frag_off;
> - u8 nexthdr;
> - int hdrlen;
> + struct ipv6hdr *ip6h;
>
> - ip6h = skb_header_pointer(skb, ctx->offset, sizeof(*ip6h), &_ip6h);
> - if (!ip6h)
> + if (!pskb_may_pull(skb, sizeof(*ip6h) + ctx->offset))
> return false;
>
> + ip6h = (struct ipv6hdr *)(skb_network_header(skb) + ctx->offset);
> if (ip6h->hop_limit <= 1)
> return false;
>
> - nexthdr = ip6h->nexthdr;
> - hdrlen = ipv6_skip_exthdr(skb, sizeof(*ip6h) + ctx->offset, &nexthdr,
> - &frag_off);
> - if (hdrlen < 0)
> - return false;
> -
> - if (nexthdr == IPPROTO_IPV6) {
> - ctx->tun.hdr_size = hdrlen;
> - ctx->tun.proto = IPPROTO_IPV6;
> + if (ip6h->nexthdr == IPPROTO_IPV6) {

LGTM, if we see an IP6IP6 packet with intermediate headers, then the
flowtable lookup fails and the packet follows the classic path.

This also removes the ipv6_skip_exthdr() preprocessing, which is a
bit expensive to run for all IPv6 packets; most people do not use
IP6IP6.

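
For illustration only, the simplified check can be modelled in user space
roughly as below. This is a sketch over a flat byte buffer, not the kernel
function: the struct, the function name and the constants are hypothetical,
and a plain length check stands in for pskb_may_pull().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IP6_HDR_LEN      40  /* fixed IPv6 header size */
#define IP6_OFF_NEXTHDR   6  /* offset of the Next Header byte */
#define IP6_OFF_HOPLIMIT  7  /* offset of the Hop Limit byte */
#define PROTO_IPV6       41  /* same value as IPPROTO_IPV6 */

/* Hypothetical, reduced model of the parser context. */
struct parse_ctx_model {
	size_t offset;        /* where the next header to parse begins */
	uint8_t tun_proto;    /* set only when a tunnel is detected */
	size_t tun_hdr_size;  /* set only when a tunnel is detected */
};

/*
 * Returns false for a truncated header or an exhausted hop limit.
 * Otherwise returns true, and fills in the tunnel fields only when the
 * outer header is immediately followed by another IPv6 header.
 */
static bool ip6_tunnel_proto_model(struct parse_ctx_model *ctx,
				   const uint8_t *pkt, size_t len)
{
	const uint8_t *ip6h;

	assert(ctx != NULL && pkt != NULL);

	/* Stand-in for pskb_may_pull(): the whole header must be present. */
	if (len < ctx->offset + IP6_HDR_LEN)
		return false;

	ip6h = pkt + ctx->offset;
	if (ip6h[IP6_OFF_HOPLIMIT] <= 1)
		return false;

	if (ip6h[IP6_OFF_NEXTHDR] == PROTO_IPV6) {
		ctx->tun_proto = PROTO_IPV6;
		ctx->tun_hdr_size = IP6_HDR_LEN;
		ctx->offset += ctx->tun_hdr_size;
	}

	return true;
}
```

In this model, a packet whose outer header is followed by a Destination
Options header (60) leaves the offset untouched, so the later tuple lookup
would be done on the outer header and is expected to miss.
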
> + ctx->tun.proto = ip6h->nexthdr;
> + ctx->tun.hdr_size = sizeof(*ip6h);
> + ctx->offset += ctx->tun.hdr_size;
> }
> - ctx->offset += ctx->tun.hdr_size;
>
> return true;
> #else
> @@ -648,25 +639,19 @@ static int nf_flow_tunnel_v4_push(struct net *net, struct sk_buff *skb,
> return 0;
> }
>
> -struct ipv6_tel_txoption {
> - struct ipv6_txoptions ops;
> - __u8 dst_opt[8];
> -};
> -
> static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
> struct flow_offload_tuple *tuple,
> - struct in6_addr **ip6_daddr,
> - int encap_limit)
> + struct in6_addr **ip6_daddr)
> {
> struct ipv6hdr *ip6h = (struct ipv6hdr *)skb_network_header(skb);
> - u8 hop_limit = ip6h->hop_limit, proto = IPPROTO_IPV6;
> struct rtable *rt = dst_rtable(tuple->dst_cache);
> __u8 dsfield = ipv6_get_dsfield(ip6h);
> struct flowi6 fl6 = {
> .daddr = tuple->tun.src_v6,
> .saddr = tuple->tun.dst_v6,
> - .flowi6_proto = proto,
> + .flowi6_proto = IPPROTO_IPV6,
> };
> + u8 hop_limit = ip6h->hop_limit;
> int err, mtu;
> u32 headroom;
>
> @@ -674,41 +659,18 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
> if (err)
> return err;
>
> - skb_set_inner_ipproto(skb, proto);
> + skb_set_inner_ipproto(skb, IPPROTO_IPV6);
> headroom = sizeof(*ip6h) + LL_RESERVED_SPACE(rt->dst.dev) +
> rt->dst.header_len;
> - if (encap_limit)
> - headroom += 8;
> err = skb_cow_head(skb, headroom);
> if (err)
> return err;
>
> skb_scrub_packet(skb, true);
> mtu = dst_mtu(&rt->dst) - sizeof(*ip6h);
> - if (encap_limit)
> - mtu -= 8;
> mtu = max(mtu, IPV6_MIN_MTU);
> skb_dst_update_pmtu_no_confirm(skb, mtu);
>
> - if (encap_limit > 0) {
> - struct ipv6_tel_txoption opt = {
> - .dst_opt[2] = IPV6_TLV_TNL_ENCAP_LIMIT,
> - .dst_opt[3] = 1,
> - .dst_opt[4] = encap_limit,
> - .dst_opt[5] = IPV6_TLV_PADN,
> - .dst_opt[6] = 1,
> - };
> - struct ipv6_opt_hdr *hopt;
> -
> - opt.ops.dst1opt = (struct ipv6_opt_hdr *)opt.dst_opt;
> - opt.ops.opt_nflen = 8;
> -
> - hopt = skb_push(skb, ipv6_optlen(opt.ops.dst1opt));
> - memcpy(hopt, opt.ops.dst1opt, ipv6_optlen(opt.ops.dst1opt));
> - hopt->nexthdr = IPPROTO_IPV6;
> - proto = NEXTHDR_DEST;
> - }
> -
> skb_push(skb, sizeof(*ip6h));
> skb_reset_network_header(skb);
>
> @@ -716,7 +678,7 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
> ip6_flow_hdr(ip6h, dsfield,
> ip6_make_flowlabel(net, skb, fl6.flowlabel, true, &fl6));
> ip6h->hop_limit = hop_limit;
> - ip6h->nexthdr = proto;
> + ip6h->nexthdr = IPPROTO_IPV6;
> ip6h->daddr = tuple->tun.src_v6;
> ip6h->saddr = tuple->tun.dst_v6;
> ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(*ip6h));
> @@ -729,12 +691,10 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
>
> static int nf_flow_tunnel_v6_push(struct net *net, struct sk_buff *skb,
> struct flow_offload_tuple *tuple,
> - struct in6_addr **ip6_daddr,
> - int encap_limit)
> + struct in6_addr **ip6_daddr)
> {
> if (tuple->tun_num)
> - return nf_flow_tunnel_ip6ip6_push(net, skb, tuple, ip6_daddr,
> - encap_limit);
> + return nf_flow_tunnel_ip6ip6_push(net, skb, tuple, ip6_daddr);
>
> return 0;
> }
> @@ -1089,7 +1049,7 @@ static int nf_flow_tuple_ipv6(struct nf_flowtable_ctx *ctx, struct sk_buff *skb,
> static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
> struct nf_flowtable *flow_table,
> struct flow_offload_tuple_rhash *tuplehash,
> - struct sk_buff *skb, int encap_limit)
> + struct sk_buff *skb)
> {
> enum flow_offload_tuple_dir dir;
> struct flow_offload *flow;
> @@ -1100,11 +1060,8 @@ static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
> flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
>
> mtu = flow->tuplehash[dir].tuple.mtu + ctx->offset;
> - if (flow->tuplehash[!dir].tuple.tun_num) {
> + if (flow->tuplehash[!dir].tuple.tun_num)
> mtu -= sizeof(*ip6h);
> - if (encap_limit > 0)
> - mtu -= 8; /* encap limit option */
> - }
>
> if (unlikely(nf_flow_exceeds_mtu(skb, mtu)))
> return 0;
> @@ -1158,7 +1115,6 @@ unsigned int
> nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
> const struct nf_hook_state *state)
> {
> - int encap_limit = IPV6_DEFAULT_TNL_ENCAP_LIMIT;
> struct flow_offload_tuple_rhash *tuplehash;
> struct nf_flowtable *flow_table = priv;
> struct flow_offload_tuple *other_tuple;
> @@ -1177,8 +1133,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
> if (tuplehash == NULL)
> return NF_ACCEPT;
>
> - ret = nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb,
> - encap_limit);
> + ret = nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb);
> if (ret < 0)
> return NF_DROP;
> else if (ret == 0)
> @@ -1198,7 +1153,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
> ip6_daddr = &other_tuple->src_v6;
>
> if (nf_flow_tunnel_v6_push(state->net, skb, other_tuple,
> - &ip6_daddr, encap_limit) < 0)
> + &ip6_daddr) < 0)
> return NF_DROP;
>
> switch (tuplehash->tuple.xmit_type) {
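
As a side note on the MTU arithmetic in the push path: with the
encapsulation limit option gone, the only overhead left is the outer IPv6
header. A tiny user-space sketch of that calculation, using a hypothetical
helper name and constants assumed to match sizeof(struct ipv6hdr) and
IPV6_MIN_MTU:

```c
#include <assert.h>

#define IP6_HDR_LEN 40    /* assumed sizeof(struct ipv6hdr) */
#define IP6_MIN_MTU 1280  /* assumed value of IPV6_MIN_MTU */

/* Hypothetical helper: MTU left for the inner packet after IP6IP6 encap. */
static int ip6ip6_inner_mtu(int dst_mtu)
{
	int mtu;

	assert(dst_mtu > 0);

	mtu = dst_mtu - IP6_HDR_LEN;
	return mtu > IP6_MIN_MTU ? mtu : IP6_MIN_MTU;
}
```

For a 1500-byte path this gives 1460, where the removed encap limit branch
would have subtracted a further 8 bytes.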