[PATCH nf] netfilter: flowtable: fix IP6IP6 tunnel offset double-count with vlan/pppoe encap

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH nf] netfilter: flowtable: fix IP6IP6 tunnel offset double-count with vlan/pppoe encap
@ 2026-06-04 21:17 David Carlier
  2026-06-05 13:53 ` Lorenzo Bianconi
  0 siblings, 1 reply; 4+ messages in thread
From: David Carlier @ 2026-06-04 21:17 UTC (permalink / raw)
  To: netfilter-devel
  Cc: Pablo Neira Ayuso, Florian Westphal, Lorenzo Bianconi, coreteam,
	David Carlier

nf_flow_ip6_tunnel_proto() stores the return value of ipv6_skip_exthdr()
directly into ctx->tun.hdr_size and then does ctx->offset +=
ctx->tun.hdr_size.

ipv6_skip_exthdr() returns an offset measured from skb->data, i.e. its
result already includes the "sizeof(*ip6h) + ctx->offset" start argument.
So hdr_size ends up containing ctx->offset, and the subsequent
"ctx->offset += ctx->tun.hdr_size" counts the encap offset twice.

This is harmless for a bare IP6IP6 packet, where ctx->offset is 0 on
entry, which is why it has gone unnoticed. But nf_flow_skb_encap_protocol()
advances ctx->offset by VLAN_HLEN / PPPOE_SES_HLEN before the tunnel
parser runs, so for an IP6IP6 flow carried over vlan or pppoe both
ctx->offset and ctx->tun.hdr_size are off by the encap length:

  - nf_flow_tuple_ipv6() then reads the inner header at the wrong offset,
    the computed tuple no longer matches the flowtable entry, and the
    packet silently falls back to the slow path (IP6IP6 rx acceleration
    stops working);
  - on the forward path nf_flow_ip_tunnel_pop() would skb_pull() past the
    inner header.

The IPv4 sibling nf_flow_ip4_tunnel_proto() does this correctly: it stores
a relative header length (iph->ihl << 2) and adds that to ctx->offset.
Make the IPv6 path symmetric by storing the relative size.

Fixes: d98103575dcd ("netfilter: flowtable: Add IP6IP6 rx sw acceleration")
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 net/netfilter/nf_flow_table_ip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 9c05a50d6013..4c6a68765c6b 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -366,7 +366,7 @@ static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
 		return false;

 	if (nexthdr == IPPROTO_IPV6) {
-		ctx->tun.hdr_size = hdrlen;
+		ctx->tun.hdr_size = hdrlen - ctx->offset;
 		ctx->tun.proto = IPPROTO_IPV6;
 	}
 	ctx->offset += ctx->tun.hdr_size;
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH nf] netfilter: flowtable: fix IP6IP6 tunnel offset double-count with vlan/pppoe encap
  2026-06-04 21:17 [PATCH nf] netfilter: flowtable: fix IP6IP6 tunnel offset double-count with vlan/pppoe encap David Carlier
@ 2026-06-05 13:53 ` Lorenzo Bianconi
  2026-06-05 15:17   ` Pablo Neira Ayuso
  0 siblings, 1 reply; 4+ messages in thread
From: Lorenzo Bianconi @ 2026-06-05 13:53 UTC (permalink / raw)
  To: David Carlier
  Cc: netfilter-devel, Pablo Neira Ayuso, Florian Westphal, coreteam

[-- Attachment #1: Type: text/plain, Size: 2436 bytes --]

> nf_flow_ip6_tunnel_proto() stores the return value of ipv6_skip_exthdr()
> directly into ctx->tun.hdr_size and then does ctx->offset +=
> ctx->tun.hdr_size.
> 
> ipv6_skip_exthdr() returns an offset measured from skb->data, i.e. its
> result already includes the "sizeof(*ip6h) + ctx->offset" start argument.
> So hdr_size ends up containing ctx->offset, and the subsequent
> "ctx->offset += ctx->tun.hdr_size" counts the encap offset twice.
> 
> This is harmless for a bare IP6IP6 packet, where ctx->offset is 0 on
> entry, which is why it has gone unnoticed. But nf_flow_skb_encap_protocol()
> advances ctx->offset by VLAN_HLEN / PPPOE_SES_HLEN before the tunnel
> parser runs, so for an IP6IP6 flow carried over vlan or pppoe both
> ctx->offset and ctx->tun.hdr_size are off by the encap length:
> 
>   - nf_flow_tuple_ipv6() then reads the inner header at the wrong offset,
>     the computed tuple no longer matches the flowtable entry, and the
>     packet silently falls back to the slow path (IP6IP6 rx acceleration
>     stops working);
>   - on the forward path nf_flow_ip_tunnel_pop() would skb_pull() past the
>     inner header.
> 
> The IPv4 sibling nf_flow_ip4_tunnel_proto() does this correctly: it stores
> a relative header length (iph->ihl << 2) and adds that to ctx->offset.
> Make the IPv6 path symmetric by storing the relative size.
> 
> Fixes: d98103575dcd ("netfilter: flowtable: Add IP6IP6 rx sw acceleration")
> Signed-off-by: David Carlier <devnexen@gmail.com>

Hi David,

thx for fixing it. I developed the IP6IP6 vlan support using the veth as
underlying device. veth enables vlan rx/tx offload by default so I was
not able to spot the issue.

Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>

Regards,
Lorenzo

> ---
>  net/netfilter/nf_flow_table_ip.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> index 9c05a50d6013..4c6a68765c6b 100644
> --- a/net/netfilter/nf_flow_table_ip.c
> +++ b/net/netfilter/nf_flow_table_ip.c
> @@ -366,7 +366,7 @@ static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
>  		return false;
>  
>  	if (nexthdr == IPPROTO_IPV6) {
> -		ctx->tun.hdr_size = hdrlen;
> +		ctx->tun.hdr_size = hdrlen - ctx->offset;
>  		ctx->tun.proto = IPPROTO_IPV6;
>  	}
>  	ctx->offset += ctx->tun.hdr_size;
> -- 
> 2.53.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH nf] netfilter: flowtable: fix IP6IP6 tunnel offset double-count with vlan/pppoe encap
  2026-06-05 13:53 ` Lorenzo Bianconi
@ 2026-06-05 15:17   ` Pablo Neira Ayuso
  2026-06-05 15:26     ` Lorenzo Bianconi
  0 siblings, 1 reply; 4+ messages in thread
From: Pablo Neira Ayuso @ 2026-06-05 15:17 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: David Carlier, netfilter-devel, Florian Westphal, coreteam

On Fri, Jun 05, 2026 at 03:53:50PM +0200, Lorenzo Bianconi wrote:
> > nf_flow_ip6_tunnel_proto() stores the return value of ipv6_skip_exthdr()
> > directly into ctx->tun.hdr_size and then does ctx->offset +=
> > ctx->tun.hdr_size.
> > 
> > ipv6_skip_exthdr() returns an offset measured from skb->data, i.e. its
> > result already includes the "sizeof(*ip6h) + ctx->offset" start argument.
> > So hdr_size ends up containing ctx->offset, and the subsequent
> > "ctx->offset += ctx->tun.hdr_size" counts the encap offset twice.
> > 
> > This is harmless for a bare IP6IP6 packet, where ctx->offset is 0 on
> > entry, which is why it has gone unnoticed. But nf_flow_skb_encap_protocol()
> > advances ctx->offset by VLAN_HLEN / PPPOE_SES_HLEN before the tunnel
> > parser runs, so for an IP6IP6 flow carried over vlan or pppoe both
> > ctx->offset and ctx->tun.hdr_size are off by the encap length:
> > 
> >   - nf_flow_tuple_ipv6() then reads the inner header at the wrong offset,
> >     the computed tuple no longer matches the flowtable entry, and the
> >     packet silently falls back to the slow path (IP6IP6 rx acceleration
> >     stops working);
> >   - on the forward path nf_flow_ip_tunnel_pop() would skb_pull() past the
> >     inner header.
> > 
> > The IPv4 sibling nf_flow_ip4_tunnel_proto() does this correctly: it stores
> > a relative header length (iph->ihl << 2) and adds that to ctx->offset.
> > Make the IPv6 path symmetric by storing the relative size.
> > 
> > Fixes: d98103575dcd ("netfilter: flowtable: Add IP6IP6 rx sw acceleration")
> > Signed-off-by: David Carlier <devnexen@gmail.com>
> 
> Hi David,
> 
> thx for fixing it. I developed the IP6IP6 vlan support using the veth as
> underlying device. veth enables vlan rx/tx offload by default so I was
> not able to spot the issue.

One question when looking at this code:

In nf_flow_ip6_tunnel_proto():

        if (nexthdr == IPPROTO_IPV6) {
                ctx->tun.hdr_size = hdrlen - ctx->offset;
                ctx->tun.proto = IPPROTO_IPV6;
        }
        ctx->offset += ctx->tun.hdr_size;

ctx->offset is bumped out of the branch.

and nf_flow_ip4_tunnel_proto():

        if (iph->protocol == IPPROTO_IPIP) {
                ctx->tun.proto = IPPROTO_IPIP;
                ctx->tun.hdr_size = size;
                ctx->offset += size;
        }

I think these checks are superfluous at this stage:

if (nexthdr == IPPROTO_IPV6) {

if (iph->protocol == IPPROTO_IPIP) {

because only ipip and ip6ip6 is supported.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH nf] netfilter: flowtable: fix IP6IP6 tunnel offset double-count with vlan/pppoe encap
  2026-06-05 15:17   ` Pablo Neira Ayuso
@ 2026-06-05 15:26     ` Lorenzo Bianconi
  0 siblings, 0 replies; 4+ messages in thread
From: Lorenzo Bianconi @ 2026-06-05 15:26 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: David Carlier, netfilter-devel, Florian Westphal, coreteam

[-- Attachment #1: Type: text/plain, Size: 2930 bytes --]

> On Fri, Jun 05, 2026 at 03:53:50PM +0200, Lorenzo Bianconi wrote:
> > > nf_flow_ip6_tunnel_proto() stores the return value of ipv6_skip_exthdr()
> > > directly into ctx->tun.hdr_size and then does ctx->offset +=
> > > ctx->tun.hdr_size.
> > > 
> > > ipv6_skip_exthdr() returns an offset measured from skb->data, i.e. its
> > > result already includes the "sizeof(*ip6h) + ctx->offset" start argument.
> > > So hdr_size ends up containing ctx->offset, and the subsequent
> > > "ctx->offset += ctx->tun.hdr_size" counts the encap offset twice.
> > > 
> > > This is harmless for a bare IP6IP6 packet, where ctx->offset is 0 on
> > > entry, which is why it has gone unnoticed. But nf_flow_skb_encap_protocol()
> > > advances ctx->offset by VLAN_HLEN / PPPOE_SES_HLEN before the tunnel
> > > parser runs, so for an IP6IP6 flow carried over vlan or pppoe both
> > > ctx->offset and ctx->tun.hdr_size are off by the encap length:
> > > 
> > >   - nf_flow_tuple_ipv6() then reads the inner header at the wrong offset,
> > >     the computed tuple no longer matches the flowtable entry, and the
> > >     packet silently falls back to the slow path (IP6IP6 rx acceleration
> > >     stops working);
> > >   - on the forward path nf_flow_ip_tunnel_pop() would skb_pull() past the
> > >     inner header.
> > > 
> > > The IPv4 sibling nf_flow_ip4_tunnel_proto() does this correctly: it stores
> > > a relative header length (iph->ihl << 2) and adds that to ctx->offset.
> > > Make the IPv6 path symmetric by storing the relative size.
> > > 
> > > Fixes: d98103575dcd ("netfilter: flowtable: Add IP6IP6 rx sw acceleration")
> > > Signed-off-by: David Carlier <devnexen@gmail.com>
> > 
> > Hi David,
> > 
> > thx for fixing it. I developed the IP6IP6 vlan support using the veth as
> > underlying device. veth enables vlan rx/tx offload by default so I was
> > not able to spot the issue.
> 
> One question when looking at this code:
> 
> In nf_flow_ip6_tunnel_proto():
> 
>         if (nexthdr == IPPROTO_IPV6) {
>                 ctx->tun.hdr_size = hdrlen - ctx->offset;
>                 ctx->tun.proto = IPPROTO_IPV6;
>         }
>         ctx->offset += ctx->tun.hdr_size;
> 
> ctx->offset is bumped out of the branch.
> 
> and nf_flow_ip4_tunnel_proto():
> 
>         if (iph->protocol == IPPROTO_IPIP) {
>                 ctx->tun.proto = IPPROTO_IPIP;
>                 ctx->tun.hdr_size = size;
>                 ctx->offset += size;
>         }
> 
> I think these checks are superfluous at this stage:
> 
> if (nexthdr == IPPROTO_IPV6) {
> 
> if (iph->protocol == IPPROTO_IPIP) {
> 
> because only ipip and ip6ip6 is supported.

In this series [0] I modified a bit this code to support the mixed cases
(SIT and IPv4 over IPv6).

Regards,
Lorenzo

[0] https://lore.kernel.org/netfilter-devel/20260531-b4-flowtable-sw-accel-ip6ip-v3-0-56a2826f3279@kernel.org/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-05 15:26 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-04 21:17 [PATCH nf] netfilter: flowtable: fix IP6IP6 tunnel offset double-count with vlan/pppoe encap David Carlier
2026-06-05 13:53 ` Lorenzo Bianconi
2026-06-05 15:17   ` Pablo Neira Ayuso
2026-06-05 15:26     ` Lorenzo Bianconi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.