* bug report: MAC src + protocol optiomization failing with 802.1Q frames
@ 2025-10-12 20:59 Antoine C.
2025-10-12 21:44 ` Florian Westphal
0 siblings, 1 reply; 6+ messages in thread
From: Antoine C. @ 2025-10-12 20:59 UTC (permalink / raw)
To: netfilter-devel
Hello,
Following the mails I sent on the user mailing list, it seems that
there is a bug occurring with the first rule below (the second is
fine):
# nft list table netdev t
table netdev t {
chain c {
ether saddr aa:bb:cc:dd:00:38 ip saddr 192.168.140.56 \
log prefix "--tests 1&2 --"
ip saddr 192.168.140.56 ether saddr aa:bb:cc:dd:00:38 \
log prefix "--tests 2&1 --"
}
}
It is translated this way:
netdev t c
[ meta load iiftype => reg 1 ]
[ cmp eq reg 1 0x00000001 ]
[ payload load 8b @ link header + 6 => reg 1 ]
[ cmp eq reg 1 0xddccbbaa 0x00083800 ]
[ payload load 4b @ network header + 12 => reg 1 ]
[ cmp eq reg 1 0x388ca8c0 ]
[ log prefix --tests 1&2 -- ]
The MAC source and the protocol are loaded at the same time
then checked... but with an 802.1Q packet, it is actually
wrong: the ethertype will be 0x8100 and the protocol (here
IPv4, 0x0800), will be 4 bytes further. And it that case,
the second test above will succeed because the protocol
is loaded independently.
I just tested with latest versions of libmnl/libnftnl/nft
and I get the same behavior.
The mail on the netfilter-user ML:
https://marc.info/?l=netfilter&m=176011821517829
Regards,
Antoine
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: bug report: MAC src + protocol optiomization failing with 802.1Q frames 2025-10-12 20:59 bug report: MAC src + protocol optiomization failing with 802.1Q frames Antoine C. @ 2025-10-12 21:44 ` Florian Westphal 2025-10-27 18:26 ` Antoine C. 0 siblings, 1 reply; 6+ messages in thread From: Florian Westphal @ 2025-10-12 21:44 UTC (permalink / raw) To: Antoine C.; +Cc: netfilter-devel, Pablo Neira Ayuso, Eric Woudstra Antoine C. <acalando@free.fr> wrote: > Following the mails I sent on the user mailing list, it seems that > there is a bug occurring with the first rule below (the second is > fine): > > # nft list table netdev t > table netdev t { > chain c { > ether saddr aa:bb:cc:dd:00:38 ip saddr 192.168.140.56 \ > log prefix "--tests 1&2 --" > ip saddr 192.168.140.56 ether saddr aa:bb:cc:dd:00:38 \ > log prefix "--tests 2&1 --" > } > } > > It is translated this way: > netdev t c > [ meta load iiftype => reg 1 ] > [ cmp eq reg 1 0x00000001 ] > [ payload load 8b @ link header + 6 => reg 1 ] > [ cmp eq reg 1 0xddccbbaa 0x00083800 ] > [ payload load 4b @ network header + 12 => reg 1 ] > [ cmp eq reg 1 0x388ca8c0 ] > [ log prefix --tests 1&2 -- ] > > The MAC source and the protocol are loaded at the same time > then checked... but with an 802.1Q packet, it is actually > wrong: the ethertype will be 0x8100 and the protocol (here > IPv4, 0x0800), will be 4 bytes further. And it that case, > the second test above will succeed because the protocol > is loaded independently. > > I just tested with latest versions of libmnl/libnftnl/nft > and I get the same behavior. The question is what these rules should actually match, there are no consistent semantics in nftables for bridge and netdev families: The existing behaviour is undefined resp. random. Should "ip saddr 1.2.3.4" match: Only in classic ethernet case? In VLAN? In QinQ? What about IP packet in a PPPOE frame? What about other L2 protocols? Pablo, I can't come up with any good answer for this; I think an explicit dissector expression is needed to populate l3 and l4 information into nft_pktinfo structure for bridge/netdev families so "ip saddr" would only ever match plain ethernet (no vlans, no pppoe). This also means the existing skb->protocol based dependencies need to die resp. check for offloaded vlan headers. Whats your take? This is also related to Eric Woudstras work to add qinq+pppoe automatching. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: bug report: MAC src + protocol optiomization failing with 802.1Q frames 2025-10-12 21:44 ` Florian Westphal @ 2025-10-27 18:26 ` Antoine C. 2025-10-27 18:37 ` Florian Westphal 0 siblings, 1 reply; 6+ messages in thread From: Antoine C. @ 2025-10-27 18:26 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, Pablo Neira Ayuso, Eric Woudstra This bug does not seem to get a lot of attention but may be it deserves at least to be filed ? I have tried to do it myself after requesting an account on bugzilla but I do not have the necessary permissions. Thanks, Antoine ----- Florian Westphal <fw@strlen.de> a écrit : > Antoine C. <acalando@free.fr> wrote: > > Following the mails I sent on the user mailing list, it seems that > > there is a bug occurring with the first rule below (the second is > > fine): > > > > # nft list table netdev t > > table netdev t { > > chain c { > > ether saddr aa:bb:cc:dd:00:38 ip saddr 192.168.140.56 \ > > log prefix "--tests 1&2 --" > > ip saddr 192.168.140.56 ether saddr aa:bb:cc:dd:00:38 \ > > log prefix "--tests 2&1 --" > > } > > } > > > > It is translated this way: > > netdev t c > > [ meta load iiftype => reg 1 ] > > [ cmp eq reg 1 0x00000001 ] > > [ payload load 8b @ link header + 6 => reg 1 ] > > [ cmp eq reg 1 0xddccbbaa 0x00083800 ] > > [ payload load 4b @ network header + 12 => reg 1 ] > > [ cmp eq reg 1 0x388ca8c0 ] > > [ log prefix --tests 1&2 -- ] > > > > The MAC source and the protocol are loaded at the same time > > then checked... but with an 802.1Q packet, it is actually > > wrong: the ethertype will be 0x8100 and the protocol (here > > IPv4, 0x0800), will be 4 bytes further. And it that case, > > the second test above will succeed because the protocol > > is loaded independently. > > > > I just tested with latest versions of libmnl/libnftnl/nft > > and I get the same behavior. > > The question is what these rules should actually match, there > are no consistent semantics in nftables for bridge and > netdev families: The existing behaviour is undefined resp. > random. > > Should "ip saddr 1.2.3.4" match: > > Only in classic ethernet case? > In VLAN? > In QinQ? > > What about IP packet in a PPPOE frame? > What about other L2 protocols? > > Pablo, I can't come up with any good answer for this; I think > an explicit dissector expression is needed to populate l3 and l4 > information into nft_pktinfo structure for bridge/netdev families so > "ip saddr" would only ever match plain ethernet (no vlans, no pppoe). > > This also means the existing skb->protocol based dependencies > need to die resp. check for offloaded vlan headers. > > Whats your take? > > This is also related to Eric Woudstras work to add qinq+pppoe > automatching. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: bug report: MAC src + protocol optiomization failing with 802.1Q frames 2025-10-27 18:26 ` Antoine C. @ 2025-10-27 18:37 ` Florian Westphal 2025-10-27 22:40 ` Antoine C. 0 siblings, 1 reply; 6+ messages in thread From: Florian Westphal @ 2025-10-27 18:37 UTC (permalink / raw) To: Antoine C.; +Cc: netfilter-devel, Pablo Neira Ayuso, Eric Woudstra Antoine C. <acalando@free.fr> wrote: > > This bug does not seem to get a lot of attention but may be it > deserves at least to be filed ? Its a design bug and so far not a single solution was presented. And no one answered any of the questions that I asked. So, whats YOUR opinion? > > Should "ip saddr 1.2.3.4" match: > > > > Only in classic ethernet case? > > In VLAN? > > In QinQ? > > > > What about IP packet in a PPPOE frame? > > What about other L2 protocols? As long as nobody defines how any of this is supposed to work in the first place nothing will happen here. What you encounter is the 'historic random behaviour' where outcome depends on the hidden dependencies that nft (userspace) inserts and if vlan offload was disabled. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: bug report: MAC src + protocol optiomization failing with 802.1Q frames 2025-10-27 18:37 ` Florian Westphal @ 2025-10-27 22:40 ` Antoine C. 2025-11-06 23:46 ` Florian Westphal 0 siblings, 1 reply; 6+ messages in thread From: Antoine C. @ 2025-10-27 22:40 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, Pablo Neira Ayuso, Eric Woudstra ----- Florian Westphal <fw@strlen.de> a écrit : > Antoine C. <acalando@free.fr> wrote: > > > > This bug does not seem to get a lot of attention but may be it > > deserves at least to be filed ? > > Its a design bug and so far not a single solution was > presented. > > And no one answered any of the questions that I asked. > > So, whats YOUR opinion? > > > > Should "ip saddr 1.2.3.4" match: > > > > > > Only in classic ethernet case? > > > In VLAN? > > > In QinQ? > > > > > > What about IP packet in a PPPOE frame? > > > What about other L2 protocols? As a user, my answer would be of course that "ip xxx" rules work seamlessly whatever the encapsulation is above. I have been trying to do for a few weeks with iptables what I just did with nft (for reasons external to this problem) and I can only praise how this was easy and elegant with nft, despite of this bug (to give more context: I am converting a huge access control list with L2/L3/L4 fields to NFT/iptables rules). Of course, this is just my personal use case and I fully understand that my wishes come after under-the-hood constraints or main cases optimizations. But at least, I would expect an error or a warning if I ask nft something ending up in undefined behavior. Antoine ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: bug report: MAC src + protocol optiomization failing with 802.1Q frames 2025-10-27 22:40 ` Antoine C. @ 2025-11-06 23:46 ` Florian Westphal 0 siblings, 0 replies; 6+ messages in thread From: Florian Westphal @ 2025-11-06 23:46 UTC (permalink / raw) To: Antoine C.; +Cc: netfilter-devel, Pablo Neira Ayuso, Eric Woudstra Antoine C. <acalando@free.fr> wrote: > ----- Florian Westphal <fw@strlen.de> a écrit : > > Antoine C. <acalando@free.fr> wrote: > > > > > > This bug does not seem to get a lot of attention but may be it > > > deserves at least to be filed ? > > > > Its a design bug and so far not a single solution was > > presented. > > > > And no one answered any of the questions that I asked. > > > > So, whats YOUR opinion? > > > > > > Should "ip saddr 1.2.3.4" match: > > > > > > > > Only in classic ethernet case? > > > > In VLAN? > > > > In QinQ? > > > > > > > > What about IP packet in a PPPOE frame? > > > > What about other L2 protocols? > > As a user, my answer would be of course that "ip xxx" rules > work seamlessly whatever the encapsulation is above. I have Thanks. I believe that this is the way to go, i.e., if user asks to match L3, then we should be greedy and attempt to provide the relevant context. TL;DR: I think https://patchwork.ozlabs.org/project/netfilter-devel/patch/20251104145728.517197-4-ericwouds@gmail.com/ is the way to go with only minor changes needed. Unfortunately this isn't as easy as it sounds. Lets see where we are: When user asks for 'ip saddr 1.2.3.4'. we must provide context for matching to work in first place. 1. Kernel doesn't always know where to start the matching (the network header offset in this case). For ethernet and vlan the kernel will know. But for other procotols on top of ethernet that might not be the case especially if the kernel is only forwarding frames. To make 'ip saddr' work on bridge with other encapsulations, e.g. PPPoE, this needs kernel changes. That also means that existing rules behave differently when updating kernel with new L2 encap protocol support. 2. Even if kernel knows where in the frame the network header starts, 'ip saddr 1.2.3.4' must not match e.g. an ipv6 packet. This is solved (or rather, supposed to be solved) by nftables userspace: For bridge family nft will insert a protocol dependency check internally, in the given example it is 'meta load protocol;cmp eq 0x0800'. - It excludes non-ipv4 (no false matches). This is wanted behavior. - It includes VLAN encapsulated frames because kernel (by default) removes vlan headers and thus replaces the 0x8100 vlan type with the upper protocol. - If kernel is configured to not do that, then this will only match normal (non-vlan) ipv4 ethernet frames. If we want to make 'ip saddr 1.2.3.4' always match for netdev and bridge families, then we need to make changes on both userspace and/or kernel: - kernel must, for bridge and netdev chains, follow the L2 encap trail until it finds ip/ip6 protocol or it can't follow any further. This means extension to also skip PPPoE, q-in-q and so on and would be largely identical to what Eric Woudstras patches already do. Downside: Behaviour change on kernel upgrade: packets that were not matched before now will be, UNLESS we don't update skb->protocol value and e.g. add a new l3proto field to nft_pktinfo. That in turn is also problematic, because it means that we'd have to add a new 'meta l3proto' that can extract this from nft_pktinfo as a replacement of 'meta protocol'. Semantics would be: meta protocol == skb->protocol meta l3proto == "last" ETH_P_ that kernel could figure out. It is similar to 'ip6 nexthdr' vs. 'meta l4proto'; former is just the first next header value, the latter the last protocol header with all extension headers, if any, skipped. The problem is that, given its a new extension, we can't update nftables to use it for implicit dependencies; else rule add fails on older kernels. If we ignore that and pretend we could change this, then the behaviour of: ip saddr 1.2.3.4 (with meta l3proto dep) ip saddr 1.2.3.4 (with current meta protocol) ... are identical between the two options (i.e., follow dependency trail and update skb->protocol to last known ETH_P value) resp. 'stash last known ETH_P in nft_pktinfo) from user perspective. That would also mean that nftables netdev family should be fixed to follow the bridge dependency approach consistently if possible. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-11-06 23:46 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-10-12 20:59 bug report: MAC src + protocol optiomization failing with 802.1Q frames Antoine C. 2025-10-12 21:44 ` Florian Westphal 2025-10-27 18:26 ` Antoine C. 2025-10-27 18:37 ` Florian Westphal 2025-10-27 22:40 ` Antoine C. 2025-11-06 23:46 ` Florian Westphal
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).