From: Lorenzo Bianconi <lorenzo@kernel.org>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>, Phil Sutter <phil@nwl.cc>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
netdev@vger.kernel.org
Subject: Re: [PATCH nf] netfilter: flowtable: use pskb_may_pull() in nf_flow_ip6_tunnel_proto()
Date: Thu, 11 Jun 2026 10:24:05 +0200 [thread overview]
Message-ID: <aipwpXlisPVxO2ig@lore-desk> (raw)
In-Reply-To: <aih5uTn97bK29LDR@chamomile>
[-- Attachment #1: Type: text/plain, Size: 11924 bytes --]
> Hi Lorenzo,
>
> On Tue, Jun 09, 2026 at 02:31:03PM +0200, Lorenzo Bianconi wrote:
> > > On Tue, Jun 09, 2026 at 12:28:08PM +0200, Lorenzo Bianconi wrote:
> > > > On Jun 09, Pablo Neira Ayuso wrote:
> > > > > Hi Lorenzo,
> > > > >
> > > > > On Mon, Jun 08, 2026 at 07:06:52PM +0200, Lorenzo Bianconi wrote:
> > > > > > Switch nf_flow_ip6_tunnel_proto() from skb_header_pointer() to
> > > > > > pskb_may_pull() for header validation, aligning it with the approach
> > > > > > used in nf_flow_ip4_tunnel_proto().
> > > > > > Move ctx->offset update inside the IPPROTO_IPV6 conditional block since
> > > > > > it should only be adjusted when a tunnel is actually detected.
> > > > > > While at it, use nexthdr instead of the hardcoded IPPROTO_IPV6 constant
> > > > > > when setting ctx->tun.proto.
> > > > > >
> > > > > > Fixes: d98103575dcdd ("netfilter: flowtable: Add IP6IP6 rx sw acceleration")
> > > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > > > > ---
> > > > > > net/netfilter/nf_flow_table_ip.c | 10 +++++-----
> > > > > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > > > > >
> > > > > > diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> > > > > > index 9c05a50d6013..2946399ab715 100644
> > > > > > --- a/net/netfilter/nf_flow_table_ip.c
> > > > > > +++ b/net/netfilter/nf_flow_table_ip.c
> > > > > > @@ -347,15 +347,15 @@ static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
> > > > > > struct sk_buff *skb)
> > > > > > {
> > > > > > #if IS_ENABLED(CONFIG_IPV6)
> > > > > > - struct ipv6hdr *ip6h, _ip6h;
> > > > > > + struct ipv6hdr *ip6h;
> > > > > > __be16 frag_off;
> > > > > > u8 nexthdr;
> > > > > > int hdrlen;
> > > > > >
> > > > > > - ip6h = skb_header_pointer(skb, ctx->offset, sizeof(*ip6h), &_ip6h);
> > > > > > - if (!ip6h)
> > > > > > + if (!pskb_may_pull(skb, sizeof(*ip6h) + ctx->offset))
> > > > > > return false;
> > > > > >
> > > > > > + ip6h = (struct ipv6hdr *)(skb_network_header(skb) + ctx->offset);
> > > > > > if (ip6h->hop_limit <= 1)
> > > > > > return false;
> > > > >
> > > > > Not shown in the patch, but is there still a corner case here that
> > > > > needs to be covered?
> > > > >
> > > > > ipv6_skip_exthdr() uses skb_header_pointer() internal, then another
> > > > > pskb_may_pull() is needed to make sure no other IPv6 extension header
> > > > > sits between the outer and the inner IPPROTO_IPV6 header, allowing to
> > > > > be in a non-linear area of the skb?
> > > > >
> > > > > > @@ -367,9 +367,9 @@ static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
> > > > > >
> > > > >
> > > > > I mean:
> > > > >
> > > > > if (!pskb_may_pull(skb, hdrlen))
> > > > > return false;
> > > > >
> > > > > where hdrlen is what ipv6_skip_exthdr() returns.
> > > > >
> > > > > Then, I think it should be safe to call skb_pull() on
> > > > > ctx->tun.hdr_size.
> > > > >
> > > > > Let me know, thanks.
> > > >
> > > > I think you are right, here we need to run:
> > > >
> > > > if (!pskb_may_pull(skb, hdrlen))
> > > > return false;
> > > >
> > > > in order to be sure we can pull ctx->tun.hdr_size in nf_flow_ip_tunnel_pop().
> > > > Doing so, we can roll-back to the original skb_header_pointer() to access the
> > > > outer ip6 header here. What do you think?
> > >
> > > Yes, initial skb_header_pointer() then pskb_may_pull(skb, hdrlen) to
> > > ensure the entire should be fine.
> > >
> > > I think this need one more fix: This needs to resort to classic path
> > > if there are intermediate extension headers sitting in between the
> > > outer and inner headers in IP6IP6, ie. ipv6_ext_hdr() == true. Those
> > > extensions need to be handled by the IPv6 stack.
> >
> > In my setup we have just a single Destination Option extension header (60)
> > between the outer and the inner IPV6 headers. In order to check if we have
> > other extensions headers other than Destination Option (and if so, send the
> > packet the networking stack) I guess we need to implement something similar
> > to ipv6_skip_exthdr(), agree?
>
> Maybe simpler check? If nexthdr is immediately IP6IP6 (ie. no
> intermediate headers are in place), then handle this from the
> flowtable datapath, otherwise fallback to classic. Thus, no new
> special parser function is needed.
Hi Pablo,
ack, I agree. I guess we can limit the flowtable acceleration to the case of a
IP6IP6 tunnel created with encaplimit set to none:
$ip link add name tun0 type ip6tnl local <local> remote <remote> encaplimit none
>
> > > nf_flow_ip6_tunnel_proto() needs to be fixed to deal with this.
> >
> > Do you want to do it with a dedicated patch or do you prefer to do it in this
> > one?
>
> I think a single patch to fix the issues in nf_flow_ip6_tunnel_proto()
> should be fine.
I cooked this patch, it works fine. What do you think?
Regards,
Lorenzo
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 9c05a50d6013..5cb50414a491 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -347,29 +347,20 @@ static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
struct sk_buff *skb)
{
#if IS_ENABLED(CONFIG_IPV6)
- struct ipv6hdr *ip6h, _ip6h;
- __be16 frag_off;
- u8 nexthdr;
- int hdrlen;
+ struct ipv6hdr *ip6h;
- ip6h = skb_header_pointer(skb, ctx->offset, sizeof(*ip6h), &_ip6h);
- if (!ip6h)
+ if (!pskb_may_pull(skb, sizeof(*ip6h) + ctx->offset))
return false;
+ ip6h = (struct ipv6hdr *)(skb_network_header(skb) + ctx->offset);
if (ip6h->hop_limit <= 1)
return false;
- nexthdr = ip6h->nexthdr;
- hdrlen = ipv6_skip_exthdr(skb, sizeof(*ip6h) + ctx->offset, &nexthdr,
- &frag_off);
- if (hdrlen < 0)
- return false;
-
- if (nexthdr == IPPROTO_IPV6) {
- ctx->tun.hdr_size = hdrlen;
- ctx->tun.proto = IPPROTO_IPV6;
+ if (ip6h->nexthdr == IPPROTO_IPV6) {
+ ctx->tun.proto = ip6h->nexthdr;
+ ctx->tun.hdr_size = sizeof(*ip6h);
+ ctx->offset += ctx->tun.hdr_size;
}
- ctx->offset += ctx->tun.hdr_size;
return true;
#else
@@ -648,25 +639,19 @@ static int nf_flow_tunnel_v4_push(struct net *net, struct sk_buff *skb,
return 0;
}
-struct ipv6_tel_txoption {
- struct ipv6_txoptions ops;
- __u8 dst_opt[8];
-};
-
static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
struct flow_offload_tuple *tuple,
- struct in6_addr **ip6_daddr,
- int encap_limit)
+ struct in6_addr **ip6_daddr)
{
struct ipv6hdr *ip6h = (struct ipv6hdr *)skb_network_header(skb);
- u8 hop_limit = ip6h->hop_limit, proto = IPPROTO_IPV6;
struct rtable *rt = dst_rtable(tuple->dst_cache);
__u8 dsfield = ipv6_get_dsfield(ip6h);
struct flowi6 fl6 = {
.daddr = tuple->tun.src_v6,
.saddr = tuple->tun.dst_v6,
- .flowi6_proto = proto,
+ .flowi6_proto = IPPROTO_IPV6,
};
+ u8 hop_limit = ip6h->hop_limit;
int err, mtu;
u32 headroom;
@@ -674,41 +659,18 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
if (err)
return err;
- skb_set_inner_ipproto(skb, proto);
+ skb_set_inner_ipproto(skb, IPPROTO_IPV6);
headroom = sizeof(*ip6h) + LL_RESERVED_SPACE(rt->dst.dev) +
rt->dst.header_len;
- if (encap_limit)
- headroom += 8;
err = skb_cow_head(skb, headroom);
if (err)
return err;
skb_scrub_packet(skb, true);
mtu = dst_mtu(&rt->dst) - sizeof(*ip6h);
- if (encap_limit)
- mtu -= 8;
mtu = max(mtu, IPV6_MIN_MTU);
skb_dst_update_pmtu_no_confirm(skb, mtu);
- if (encap_limit > 0) {
- struct ipv6_tel_txoption opt = {
- .dst_opt[2] = IPV6_TLV_TNL_ENCAP_LIMIT,
- .dst_opt[3] = 1,
- .dst_opt[4] = encap_limit,
- .dst_opt[5] = IPV6_TLV_PADN,
- .dst_opt[6] = 1,
- };
- struct ipv6_opt_hdr *hopt;
-
- opt.ops.dst1opt = (struct ipv6_opt_hdr *)opt.dst_opt;
- opt.ops.opt_nflen = 8;
-
- hopt = skb_push(skb, ipv6_optlen(opt.ops.dst1opt));
- memcpy(hopt, opt.ops.dst1opt, ipv6_optlen(opt.ops.dst1opt));
- hopt->nexthdr = IPPROTO_IPV6;
- proto = NEXTHDR_DEST;
- }
-
skb_push(skb, sizeof(*ip6h));
skb_reset_network_header(skb);
@@ -716,7 +678,7 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
ip6_flow_hdr(ip6h, dsfield,
ip6_make_flowlabel(net, skb, fl6.flowlabel, true, &fl6));
ip6h->hop_limit = hop_limit;
- ip6h->nexthdr = proto;
+ ip6h->nexthdr = IPPROTO_IPV6;
ip6h->daddr = tuple->tun.src_v6;
ip6h->saddr = tuple->tun.dst_v6;
ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(*ip6h));
@@ -729,12 +691,10 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
static int nf_flow_tunnel_v6_push(struct net *net, struct sk_buff *skb,
struct flow_offload_tuple *tuple,
- struct in6_addr **ip6_daddr,
- int encap_limit)
+ struct in6_addr **ip6_daddr)
{
if (tuple->tun_num)
- return nf_flow_tunnel_ip6ip6_push(net, skb, tuple, ip6_daddr,
- encap_limit);
+ return nf_flow_tunnel_ip6ip6_push(net, skb, tuple, ip6_daddr);
return 0;
}
@@ -1089,7 +1049,7 @@ static int nf_flow_tuple_ipv6(struct nf_flowtable_ctx *ctx, struct sk_buff *skb,
static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
struct nf_flowtable *flow_table,
struct flow_offload_tuple_rhash *tuplehash,
- struct sk_buff *skb, int encap_limit)
+ struct sk_buff *skb)
{
enum flow_offload_tuple_dir dir;
struct flow_offload *flow;
@@ -1100,11 +1060,8 @@ static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
mtu = flow->tuplehash[dir].tuple.mtu + ctx->offset;
- if (flow->tuplehash[!dir].tuple.tun_num) {
+ if (flow->tuplehash[!dir].tuple.tun_num)
mtu -= sizeof(*ip6h);
- if (encap_limit > 0)
- mtu -= 8; /* encap limit option */
- }
if (unlikely(nf_flow_exceeds_mtu(skb, mtu)))
return 0;
@@ -1158,7 +1115,6 @@ unsigned int
nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state)
{
- int encap_limit = IPV6_DEFAULT_TNL_ENCAP_LIMIT;
struct flow_offload_tuple_rhash *tuplehash;
struct nf_flowtable *flow_table = priv;
struct flow_offload_tuple *other_tuple;
@@ -1177,8 +1133,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
if (tuplehash == NULL)
return NF_ACCEPT;
- ret = nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb,
- encap_limit);
+ ret = nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb);
if (ret < 0)
return NF_DROP;
else if (ret == 0)
@@ -1198,7 +1153,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
ip6_daddr = &other_tuple->src_v6;
if (nf_flow_tunnel_v6_push(state->net, skb, other_tuple,
- &ip6_daddr, encap_limit) < 0)
+ &ip6_daddr) < 0)
return NF_DROP;
switch (tuplehash->tuple.xmit_type) {
>
> My understanding is that IP6IP6 tunnels are special, because it is
> handled in the received path as local traffic, then decapsulation
> happens and packet follows local output path. This is different to
> what we already have where packets follow forwarding path, not local
> input. The flowtable should only pull the outer IP6 header.
>
> > > And I suspect nf_flow_ip4_tunnel_proto() with IP options have the same
> > > problem, the flowtable need to resort to the classic stack path.
> >
> > In the IPv4 case, if the packet has options nf_flow_ip4_tunnel_proto() will
> > return false and so the packet will be sent to the networking stack.
>
> OK, then IPv4 is fine, thanks for explaining.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
prev parent reply other threads:[~2026-06-11 8:24 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-08 17:06 [PATCH nf] netfilter: flowtable: use pskb_may_pull() in nf_flow_ip6_tunnel_proto() Lorenzo Bianconi
2026-06-08 23:11 ` Pablo Neira Ayuso
2026-06-09 10:28 ` Lorenzo Bianconi
2026-06-09 11:48 ` Pablo Neira Ayuso
2026-06-09 12:31 ` Lorenzo Bianconi
2026-06-09 20:38 ` Pablo Neira Ayuso
2026-06-11 8:24 ` Lorenzo Bianconi [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aipwpXlisPVxO2ig@lore-desk \
--to=lorenzo@kernel.org \
--cc=coreteam@netfilter.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=fw@strlen.de \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pablo@netfilter.org \
--cc=phil@nwl.cc \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox