All of lore.kernel.org
 help / color / mirror / Atom feed
From: Florian Westphal <fw@strlen.de>
To: "Antoine C." <acalando@free.fr>
Cc: netfilter-devel@vger.kernel.org,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Eric Woudstra <ericwouds@gmail.com>
Subject: Re: bug report: MAC src + protocol optiomization failing with 802.1Q frames
Date: Fri, 7 Nov 2025 00:46:45 +0100	[thread overview]
Message-ID: <aQ0zZe__4AKOWCj3@strlen.de> (raw)
In-Reply-To: <628074913.12181179.1761604837771.JavaMail.root@zimbra62-e11.priv.proxad.net>

Antoine C. <acalando@free.fr> wrote:
> ----- Florian Westphal <fw@strlen.de> a écrit :
> > Antoine C. <acalando@free.fr> wrote:
> > > 
> > > This bug does not seem to get a lot of attention but may be it
> > > deserves at least to be filed ?
> > 
> > Its a design bug and so far not a single solution was
> > presented.
> > 
> > And no one answered any of the questions that I asked.
> > 
> > So, whats YOUR opinion?
> > 
> > > > Should "ip saddr 1.2.3.4" match:
> > > > 
> > > > Only in classic ethernet case?
> > > > In VLAN?
> > > > In QinQ?
> > > > 
> > > > What about IP packet in a PPPOE frame?
> > > > What about other L2 protocols?
> 
> As a user, my answer would be of course that "ip xxx" rules 
> work seamlessly whatever the encapsulation is above. I have

Thanks.  I believe that this is the way to go, i.e., if user asks
to match L3, then we should be greedy and attempt to provide the
relevant context.

TL;DR: I think
https://patchwork.ozlabs.org/project/netfilter-devel/patch/20251104145728.517197-4-ericwouds@gmail.com/

is the way to go with only minor changes needed.

Unfortunately this isn't as easy as it sounds.
Lets see where we are:

When user asks for 'ip saddr 1.2.3.4'. we must provide context
for matching to work in first place.

1. Kernel doesn't always know where to start the matching (the network
header offset in this case).  For ethernet and vlan the kernel will
know.

But for other procotols on top of ethernet that might not be the case
especially if the kernel is only forwarding frames.

To make 'ip saddr' work on bridge with other encapsulations, e.g. PPPoE,
this needs kernel changes.

That also means that existing rules behave differently when updating
kernel with new L2 encap protocol support.

2. Even if kernel knows where in the frame the network header starts,
'ip saddr 1.2.3.4' must not match e.g. an ipv6 packet.

This is solved (or rather, supposed to be solved) by nftables userspace:
For bridge family nft will insert a protocol dependency check
internally, in the given example it is
'meta load protocol;cmp eq 0x0800'.

- It excludes non-ipv4 (no false matches). This is wanted behavior.
- It includes VLAN encapsulated frames because kernel (by default)
  removes vlan headers and thus replaces the 0x8100 vlan type with
  the upper protocol.
- If kernel is configured to not do that, then this will only match
  normal (non-vlan) ipv4 ethernet frames.

If we want to make 'ip saddr 1.2.3.4' always match for netdev and bridge
families, then we need to make changes on both userspace and/or kernel:

- kernel must, for bridge and netdev chains, follow the L2 encap trail
  until it finds ip/ip6 protocol or it can't follow any further.

 This means extension to also skip PPPoE, q-in-q and so on and would
 be largely identical to what Eric Woudstras patches already do.

Downside: Behaviour change on kernel upgrade: packets that were
not matched before now will be, UNLESS we don't update
skb->protocol value and e.g. add a new l3proto field to nft_pktinfo.

That in turn is also problematic, because it means that we'd have to
add a new 'meta l3proto' that can extract this from nft_pktinfo
as a replacement of 'meta protocol'.

Semantics would be:
meta protocol == skb->protocol

meta l3proto == "last" ETH_P_ that kernel could figure out.
It is similar to 'ip6 nexthdr' vs. 'meta l4proto';
former is just the first next header value, the latter the last
protocol header with all extension headers, if any, skipped.

The problem is that, given its a new extension, we can't update
nftables to use it for implicit dependencies; else rule add
fails on older kernels.

If we ignore that and pretend we could change this, then
the behaviour of:

ip saddr 1.2.3.4 (with meta l3proto dep)
ip saddr 1.2.3.4 (with current meta protocol)

... are identical between the two options
(i.e., follow dependency trail and update skb->protocol to last
 known ETH_P value) resp. 'stash last known ETH_P in nft_pktinfo)
from user perspective.

That would also mean that nftables netdev family should be
fixed to follow the bridge dependency approach consistently
if possible.

      reply	other threads:[~2025-11-06 23:46 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-12 20:59 bug report: MAC src + protocol optiomization failing with 802.1Q frames Antoine C.
2025-10-12 21:44 ` Florian Westphal
2025-10-27 18:26   ` Antoine C.
2025-10-27 18:37     ` Florian Westphal
2025-10-27 22:40       ` Antoine C.
2025-11-06 23:46         ` Florian Westphal [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQ0zZe__4AKOWCj3@strlen.de \
    --to=fw@strlen.de \
    --cc=acalando@free.fr \
    --cc=ericwouds@gmail.com \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.