* [PATCH nf] netfilter: flowtable: fix IP6IP6 tunnel offset double-count with vlan/pppoe encap
@ 2026-06-04 21:17 David Carlier
2026-06-05 13:53 ` Lorenzo Bianconi
0 siblings, 1 reply; 4+ messages in thread
From: David Carlier @ 2026-06-04 21:17 UTC (permalink / raw)
To: netfilter-devel
Cc: Pablo Neira Ayuso, Florian Westphal, Lorenzo Bianconi, coreteam,
David Carlier
nf_flow_ip6_tunnel_proto() stores the return value of ipv6_skip_exthdr()
directly into ctx->tun.hdr_size and then does ctx->offset +=
ctx->tun.hdr_size.
ipv6_skip_exthdr() returns an offset measured from skb->data, i.e. its
result already includes the "sizeof(*ip6h) + ctx->offset" start argument.
So hdr_size ends up containing ctx->offset, and the subsequent
"ctx->offset += ctx->tun.hdr_size" counts the encap offset twice.
This is harmless for a bare IP6IP6 packet, where ctx->offset is 0 on
entry, which is why it has gone unnoticed. But nf_flow_skb_encap_protocol()
advances ctx->offset by VLAN_HLEN / PPPOE_SES_HLEN before the tunnel
parser runs, so for an IP6IP6 flow carried over vlan or pppoe both
ctx->offset and ctx->tun.hdr_size are off by the encap length:
- nf_flow_tuple_ipv6() then reads the inner header at the wrong offset,
the computed tuple no longer matches the flowtable entry, and the
packet silently falls back to the slow path (IP6IP6 rx acceleration
stops working);
- on the forward path nf_flow_ip_tunnel_pop() would skb_pull() past the
inner header.
The IPv4 sibling nf_flow_ip4_tunnel_proto() does this correctly: it stores
a relative header length (iph->ihl << 2) and adds that to ctx->offset.
Make the IPv6 path symmetric by storing the relative size.
Fixes: d98103575dcd ("netfilter: flowtable: Add IP6IP6 rx sw acceleration")
Signed-off-by: David Carlier <devnexen@gmail.com>
---
net/netfilter/nf_flow_table_ip.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 9c05a50d6013..4c6a68765c6b 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -366,7 +366,7 @@ static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
return false;
if (nexthdr == IPPROTO_IPV6) {
- ctx->tun.hdr_size = hdrlen;
+ ctx->tun.hdr_size = hdrlen - ctx->offset;
ctx->tun.proto = IPPROTO_IPV6;
}
ctx->offset += ctx->tun.hdr_size;
--
2.53.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH nf] netfilter: flowtable: fix IP6IP6 tunnel offset double-count with vlan/pppoe encap
2026-06-04 21:17 [PATCH nf] netfilter: flowtable: fix IP6IP6 tunnel offset double-count with vlan/pppoe encap David Carlier
@ 2026-06-05 13:53 ` Lorenzo Bianconi
2026-06-05 15:17 ` Pablo Neira Ayuso
0 siblings, 1 reply; 4+ messages in thread
From: Lorenzo Bianconi @ 2026-06-05 13:53 UTC (permalink / raw)
To: David Carlier
Cc: netfilter-devel, Pablo Neira Ayuso, Florian Westphal, coreteam
[-- Attachment #1: Type: text/plain, Size: 2436 bytes --]
> nf_flow_ip6_tunnel_proto() stores the return value of ipv6_skip_exthdr()
> directly into ctx->tun.hdr_size and then does ctx->offset +=
> ctx->tun.hdr_size.
>
> ipv6_skip_exthdr() returns an offset measured from skb->data, i.e. its
> result already includes the "sizeof(*ip6h) + ctx->offset" start argument.
> So hdr_size ends up containing ctx->offset, and the subsequent
> "ctx->offset += ctx->tun.hdr_size" counts the encap offset twice.
>
> This is harmless for a bare IP6IP6 packet, where ctx->offset is 0 on
> entry, which is why it has gone unnoticed. But nf_flow_skb_encap_protocol()
> advances ctx->offset by VLAN_HLEN / PPPOE_SES_HLEN before the tunnel
> parser runs, so for an IP6IP6 flow carried over vlan or pppoe both
> ctx->offset and ctx->tun.hdr_size are off by the encap length:
>
> - nf_flow_tuple_ipv6() then reads the inner header at the wrong offset,
> the computed tuple no longer matches the flowtable entry, and the
> packet silently falls back to the slow path (IP6IP6 rx acceleration
> stops working);
> - on the forward path nf_flow_ip_tunnel_pop() would skb_pull() past the
> inner header.
>
> The IPv4 sibling nf_flow_ip4_tunnel_proto() does this correctly: it stores
> a relative header length (iph->ihl << 2) and adds that to ctx->offset.
> Make the IPv6 path symmetric by storing the relative size.
>
> Fixes: d98103575dcd ("netfilter: flowtable: Add IP6IP6 rx sw acceleration")
> Signed-off-by: David Carlier <devnexen@gmail.com>
Hi David,
thx for fixing it. I developed the IP6IP6 vlan support using the veth as
underlying device. veth enables vlan rx/tx offload by default so I was
not able to spot the issue.
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Regards,
Lorenzo
> ---
> net/netfilter/nf_flow_table_ip.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> index 9c05a50d6013..4c6a68765c6b 100644
> --- a/net/netfilter/nf_flow_table_ip.c
> +++ b/net/netfilter/nf_flow_table_ip.c
> @@ -366,7 +366,7 @@ static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
> return false;
>
> if (nexthdr == IPPROTO_IPV6) {
> - ctx->tun.hdr_size = hdrlen;
> + ctx->tun.hdr_size = hdrlen - ctx->offset;
> ctx->tun.proto = IPPROTO_IPV6;
> }
> ctx->offset += ctx->tun.hdr_size;
> --
> 2.53.0
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH nf] netfilter: flowtable: fix IP6IP6 tunnel offset double-count with vlan/pppoe encap
2026-06-05 13:53 ` Lorenzo Bianconi
@ 2026-06-05 15:17 ` Pablo Neira Ayuso
2026-06-05 15:26 ` Lorenzo Bianconi
0 siblings, 1 reply; 4+ messages in thread
From: Pablo Neira Ayuso @ 2026-06-05 15:17 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: David Carlier, netfilter-devel, Florian Westphal, coreteam
On Fri, Jun 05, 2026 at 03:53:50PM +0200, Lorenzo Bianconi wrote:
> > nf_flow_ip6_tunnel_proto() stores the return value of ipv6_skip_exthdr()
> > directly into ctx->tun.hdr_size and then does ctx->offset +=
> > ctx->tun.hdr_size.
> >
> > ipv6_skip_exthdr() returns an offset measured from skb->data, i.e. its
> > result already includes the "sizeof(*ip6h) + ctx->offset" start argument.
> > So hdr_size ends up containing ctx->offset, and the subsequent
> > "ctx->offset += ctx->tun.hdr_size" counts the encap offset twice.
> >
> > This is harmless for a bare IP6IP6 packet, where ctx->offset is 0 on
> > entry, which is why it has gone unnoticed. But nf_flow_skb_encap_protocol()
> > advances ctx->offset by VLAN_HLEN / PPPOE_SES_HLEN before the tunnel
> > parser runs, so for an IP6IP6 flow carried over vlan or pppoe both
> > ctx->offset and ctx->tun.hdr_size are off by the encap length:
> >
> > - nf_flow_tuple_ipv6() then reads the inner header at the wrong offset,
> > the computed tuple no longer matches the flowtable entry, and the
> > packet silently falls back to the slow path (IP6IP6 rx acceleration
> > stops working);
> > - on the forward path nf_flow_ip_tunnel_pop() would skb_pull() past the
> > inner header.
> >
> > The IPv4 sibling nf_flow_ip4_tunnel_proto() does this correctly: it stores
> > a relative header length (iph->ihl << 2) and adds that to ctx->offset.
> > Make the IPv6 path symmetric by storing the relative size.
> >
> > Fixes: d98103575dcd ("netfilter: flowtable: Add IP6IP6 rx sw acceleration")
> > Signed-off-by: David Carlier <devnexen@gmail.com>
>
> Hi David,
>
> thx for fixing it. I developed the IP6IP6 vlan support using the veth as
> underlying device. veth enables vlan rx/tx offload by default so I was
> not able to spot the issue.
One question when looking at this code:
In nf_flow_ip6_tunnel_proto():
if (nexthdr == IPPROTO_IPV6) {
ctx->tun.hdr_size = hdrlen - ctx->offset;
ctx->tun.proto = IPPROTO_IPV6;
}
ctx->offset += ctx->tun.hdr_size;
ctx->offset is bumped out of the branch.
and nf_flow_ip4_tunnel_proto():
if (iph->protocol == IPPROTO_IPIP) {
ctx->tun.proto = IPPROTO_IPIP;
ctx->tun.hdr_size = size;
ctx->offset += size;
}
I think these checks are superfluous at this stage:
if (nexthdr == IPPROTO_IPV6) {
if (iph->protocol == IPPROTO_IPIP) {
because only ipip and ip6ip6 is supported.
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH nf] netfilter: flowtable: fix IP6IP6 tunnel offset double-count with vlan/pppoe encap
2026-06-05 15:17 ` Pablo Neira Ayuso
@ 2026-06-05 15:26 ` Lorenzo Bianconi
0 siblings, 0 replies; 4+ messages in thread
From: Lorenzo Bianconi @ 2026-06-05 15:26 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: David Carlier, netfilter-devel, Florian Westphal, coreteam
[-- Attachment #1: Type: text/plain, Size: 2930 bytes --]
> On Fri, Jun 05, 2026 at 03:53:50PM +0200, Lorenzo Bianconi wrote:
> > > nf_flow_ip6_tunnel_proto() stores the return value of ipv6_skip_exthdr()
> > > directly into ctx->tun.hdr_size and then does ctx->offset +=
> > > ctx->tun.hdr_size.
> > >
> > > ipv6_skip_exthdr() returns an offset measured from skb->data, i.e. its
> > > result already includes the "sizeof(*ip6h) + ctx->offset" start argument.
> > > So hdr_size ends up containing ctx->offset, and the subsequent
> > > "ctx->offset += ctx->tun.hdr_size" counts the encap offset twice.
> > >
> > > This is harmless for a bare IP6IP6 packet, where ctx->offset is 0 on
> > > entry, which is why it has gone unnoticed. But nf_flow_skb_encap_protocol()
> > > advances ctx->offset by VLAN_HLEN / PPPOE_SES_HLEN before the tunnel
> > > parser runs, so for an IP6IP6 flow carried over vlan or pppoe both
> > > ctx->offset and ctx->tun.hdr_size are off by the encap length:
> > >
> > > - nf_flow_tuple_ipv6() then reads the inner header at the wrong offset,
> > > the computed tuple no longer matches the flowtable entry, and the
> > > packet silently falls back to the slow path (IP6IP6 rx acceleration
> > > stops working);
> > > - on the forward path nf_flow_ip_tunnel_pop() would skb_pull() past the
> > > inner header.
> > >
> > > The IPv4 sibling nf_flow_ip4_tunnel_proto() does this correctly: it stores
> > > a relative header length (iph->ihl << 2) and adds that to ctx->offset.
> > > Make the IPv6 path symmetric by storing the relative size.
> > >
> > > Fixes: d98103575dcd ("netfilter: flowtable: Add IP6IP6 rx sw acceleration")
> > > Signed-off-by: David Carlier <devnexen@gmail.com>
> >
> > Hi David,
> >
> > thx for fixing it. I developed the IP6IP6 vlan support using the veth as
> > underlying device. veth enables vlan rx/tx offload by default so I was
> > not able to spot the issue.
>
> One question when looking at this code:
>
> In nf_flow_ip6_tunnel_proto():
>
> if (nexthdr == IPPROTO_IPV6) {
> ctx->tun.hdr_size = hdrlen - ctx->offset;
> ctx->tun.proto = IPPROTO_IPV6;
> }
> ctx->offset += ctx->tun.hdr_size;
>
> ctx->offset is bumped out of the branch.
>
> and nf_flow_ip4_tunnel_proto():
>
> if (iph->protocol == IPPROTO_IPIP) {
> ctx->tun.proto = IPPROTO_IPIP;
> ctx->tun.hdr_size = size;
> ctx->offset += size;
> }
>
> I think these checks are superfluous at this stage:
>
> if (nexthdr == IPPROTO_IPV6) {
>
> if (iph->protocol == IPPROTO_IPIP) {
>
> because only ipip and ip6ip6 is supported.
In this series [0] I modified a bit this code to support the mixed cases
(SIT and IPv4 over IPv6).
Regards,
Lorenzo
[0] https://lore.kernel.org/netfilter-devel/20260531-b4-flowtable-sw-accel-ip6ip-v3-0-56a2826f3279@kernel.org/
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-06-05 15:26 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-04 21:17 [PATCH nf] netfilter: flowtable: fix IP6IP6 tunnel offset double-count with vlan/pppoe encap David Carlier
2026-06-05 13:53 ` Lorenzo Bianconi
2026-06-05 15:17 ` Pablo Neira Ayuso
2026-06-05 15:26 ` Lorenzo Bianconi
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.