Ethernet Bridge development
* Re: Please backport bridge multicast exponential field encoding fix series to stable kernels
From: Ujjal Roy @ 2026-06-25 14:50 UTC (permalink / raw)
  To: Sasha Levin
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Nikolay Aleksandrov, Ido Schimmel, David Ahern,
	Shuah Khan, Andy Roulin, Yong Wang, Petr Machata, stable, Greg KH,
	Greg Kroah-Hartman, Ujjal Roy, bridge, Kernel, Kernel,
	linux-kselftest
In-Reply-To: <20260625054005.0016.bridge-mcast@kernel.org>

On Thu, Jun 25, 2026 at 4:12 PM Sasha Levin <sashal@kernel.org> wrote:
>
> > Please backport the 5-patch bridge multicast exponential field
> > encoding series (726fa7da2d8c, 12cfb4ecc471, 95bfd196f0dc,
> > e51560f4220a, 529dbe762de0) to the stable kernels.
>
> I tried, but it doesn't apply to 7.1. Could you provide a backport please?
>
> --
> Thanks,
> Sasha

I will create patches on top of 7.1. But what about the other stable
releases? Do I have to create patches for all of them, and how should I
share the patches with you? Via this email thread or some other process?
I am new to backporting my changes to the stable kernels.


* Re: Please backport bridge multicast exponential field encoding fix series to stable kernels
From: Sasha Levin @ 2026-06-25 10:42 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Nikolay Aleksandrov, Ido Schimmel, David Ahern,
	Shuah Khan, Andy Roulin, Yong Wang, Petr Machata, stable, Greg KH,
	Greg Kroah-Hartman
  Cc: Sasha Levin, Ujjal Roy, bridge, Kernel, Kernel, linux-kselftest,
	Ujjal Roy
In-Reply-To: <CAE2MWknz4X_gcNo6jkR87Lg8F0zfubkOc4Ujr57CS3aBMWrjEA@mail.gmail.com>

> Please backport the 5-patch bridge multicast exponential field
> encoding series (726fa7da2d8c, 12cfb4ecc471, 95bfd196f0dc,
> e51560f4220a, 529dbe762de0) to the stable kernels.

I tried, but it doesn't apply to 7.1. Could you provide a backport please?

--
Thanks,
Sasha


* Please backport bridge multicast exponential field encoding fix series to stable kernels
From: Ujjal Roy @ 2026-06-24  6:59 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Nikolay Aleksandrov, Ido Schimmel, David Ahern,
	Shuah Khan, Andy Roulin, Yong Wang, Petr Machata, stable, Greg KH,
	Greg Kroah-Hartman
  Cc: Ujjal Roy, bridge, Kernel, Kernel, linux-kselftest

Hi Greg,

Please consider backporting the following bridge multicast fix series
to all applicable stable kernels:

726fa7da2d8c ("ipv4: igmp: get rid of IGMPV3_{QQIC,MRC} and simplify
calculation")
12cfb4ecc471 ("ipv6: mld: rename mldv2_mrc() and add mldv2_qqi()")
95bfd196f0dc ("ipv4: igmp: encode multicast exponential fields")
e51560f4220a ("ipv6: mld: encode multicast exponential fields")
529dbe762de0 ("selftests: net: bridge: add MRC and QQIC field encoding tests")

This series was merged via: db314398f618 ("net: bridge: mcast: support
exponential field encoding")

History: The multicast stack currently supports decoding of IGMPv3 and
MLDv2 exponential timer field encodings, but lacks the corresponding
encoding logic when generating multicast query packets. As a result,
query intervals and response codes exceeding the linear encoding range
can be transmitted incorrectly. This can cause multicast queriers and
listeners to interpret different timing values, resulting in protocol
interoperability issues, membership timeouts, and premature multicast
group expiration.
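
For readers unfamiliar with the encoding, here is a minimal userspace
sketch of the 8-bit IGMPv3 Max Resp Code / QQIC format as described in
RFC 3376: values below 128 are carried linearly, otherwise the byte is
laid out as 1 | exp (3 bits) | mant (4 bits) and decodes to
(mant | 0x10) << (exp + 3). The function and macro names are hypothetical
and this is not the code from the series; the wider 16-bit MLDv2 Maximum
Response Code layout is not shown.

```c
#include <stdint.h>

#define EXP_MIN_VALUE 128u          /* first value that needs the exponential form */
#define EXP_MAX_VALUE (0x1fu << 10) /* 31744, largest representable value */

/* Decode an 8-bit code into the value it represents. */
static uint32_t igmpv3_field_decode(uint8_t code)
{
	uint32_t exp, mant;

	if (code < 0x80)
		return code;

	exp = (code >> 4) & 0x7;
	mant = code & 0xf;
	return (mant | 0x10) << (exp + 3);
}

/* Encode a value into an 8-bit code, rounding down to the nearest
 * representable value and clamping at the maximum.
 */
static uint8_t igmpv3_field_encode(uint32_t value)
{
	uint32_t exp = 0;

	if (value < EXP_MIN_VALUE)
		return (uint8_t)value;
	if (value >= EXP_MAX_VALUE)
		return 0xff;

	/* Pick the smallest exp for which the shifted value fits in 5 bits. */
	while ((value >> (exp + 3)) > 0x1f)
		exp++;

	/* The leading 1 of the 5-bit mantissa is implicit, so mask it off. */
	return (uint8_t)(0x80 | (exp << 4) | ((value >> (exp + 3)) & 0xf));
}
```

In this sketch a value such as 1000 encodes to 0xAF and decodes back to
992, so a sender that writes the raw value into the field instead of
encoding it would advertise a different timer than it intends.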

Testing: The series adds the missing encoding support for both IGMPv3
and MLDv2 and includes selftests that validate the behavior.
I backported the series to v6.6.123.2 and verified the accompanying
selftests. The selftests fail on the unpatched kernel and pass after
applying the series, demonstrating both the bug and the effectiveness
of the fix.

Given that this is a protocol correctness issue affecting multicast
query generation, please consider backporting the complete series to
all applicable stable kernels.

Thanks,
Ujjal


* Re: [PATCH RFC] net: bridge: mcast: don't clear L2 host_joined on port group deletion
From: Ido Schimmel @ 2026-06-14 10:54 UTC (permalink / raw)
  To: cedric.jehasse
  Cc: Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, bridge, netdev,
	linux-kernel, Cedric Jehasse
In-Reply-To: <20260610-mdb_l2_host_joined_fix-v1-1-19746b0b8a5d@luminex.be>

On Wed, Jun 10, 2026 at 10:31:23AM +0200, Cedric Jehasse via B4 Relay wrote:
> From: Cedric Jehasse <cedric.jehasse@luminex.be>
> 
> For a static L2 multicast group that has both a host entry and a port
> entry, deleting the port entry also removes the host entry, and the
> whole group then disappears from "bridge mdb show".
> 
> To reproduce:
>   bridge mdb add dev br0 port br0  grp 01:02:03:04:05:06 permanent
>   bridge mdb add dev br0 port swp1 grp 01:02:03:04:05:06 permanent
>   bridge mdb del dev br0 port swp1 grp 01:02:03:04:05:06 permanent
>   bridge mdb show # the "port br0" host entry is gone, too

Please show the output in the commit message and also show that this
differs from regular (*, G) entries where the host entry is not removed
following the deletion of the port entry.

> 
> br_multicast_del_pg() processes every non-(*,G) entry through the S,G
> path, which removes the port group from br->sg_port_tbl and then calls
> br_multicast_sg_del_exclude_ports(). L2 entries are stored in
> sg_port_tbl as well, so they take this path too.
> 
> When the last port is removed in br_multicast_sg_del_exclude_ports it
> sets "sgmp->host_joined = false", clearing the host membership directly
> and bypassing br_multicast_host_leave(). With host_joined now false and
> no ports left, br_multicast_del_pg() arms the group timer and
> br_multicast_group_expired() tears down the whole mdb entry -- even
> though the host membership was explicitly and permanently configured
> from user space.
> 
> Keep removing L2 port groups from sg_port_tbl, but skip the S,G
> EXCLUDE-mode handling for them. The host membership of an L2 group is
> managed solely via br_multicast_host_join() / br_multicast_host_leave().
> 
> Signed-off-by: Cedric Jehasse <cedric.jehasse@luminex.be>

The patch seems OK to me, but please add a test case in bridge_mdb.sh.

I checked the code and AFAICT this never worked, so please target
net-next without a Fixes tag. Support for L2 multicast groups was added
in 955062b03fa62, but at that point the mode handling already existed in
br_multicast_del_pg().


* Re: [PATCH net v4] bridge: cfm: reject invalid CCM interval at configuration time
From: patchwork-bot+netdevbpf @ 2026-06-11 22:30 UTC (permalink / raw)
  To: Xiang Mei
  Cc: netdev, idosch, horms, bridge, razor, davem, edumazet, pabeni,
	bestswngs
In-Reply-To: <20260609065116.2818837-1-xmei5@asu.edu>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon,  8 Jun 2026 23:51:16 -0700 you wrote:
> ccm_tx_work_expired() re-arms itself via queue_delayed_work() using
> the configured exp_interval converted by interval_to_us(). When
> exp_interval is BR_CFM_CCM_INTERVAL_NONE or out of range,
> interval_to_us() returns 0, causing the worker to fire immediately in
> a tight loop that allocates skbs until OOM.
> 
> Fix this by validating exp_interval at configuration time:
> 
> [...]

Here is the summary with links:
  - [net,v4] bridge: cfm: reject invalid CCM interval at configuration time
    https://git.kernel.org/netdev/net/c/f3e02edd8322

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH] net: bridge: vxlan: Protocol field in bridge fdb
From: Ido Schimmel @ 2026-06-11 12:19 UTC (permalink / raw)
  To: Patrice Brissette
  Cc: Ido Schimmel, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, bridge@lists.linux-foundation.org,
	Mrinmoy Ghosh, razor
In-Reply-To: <CACWwMkvCAdjDPEraGhBdH67tkK=3p=aygZ2JYCRPkOzjVghFpw@mail.gmail.com>

On Tue, Jun 09, 2026 at 06:55:10PM -0400, Patrice Brissette wrote:
> I'm following up on the status of this patch series. This feature is
> critical for our EVPN Multihoming deployments, and the corresponding
> FRRouting work is currently blocked pending support for this
> functionality.
> 
> Has there been any progress on this effort, or has someone else picked
> it up?

No. The assumption was that the author would follow up on the feedback.

> What are the next steps?

Before I answer this, I have some questions / comments below.

> 
> For reference, the proposed work includes:
> 
> Adding support for a protocol field to bridge and VXLAN FDB entries.
> 
> Allowing the protocol field to identify the source of an FDB update,
> for example:
> 
> Zebra for control-plane-originated entries
> 
> HW for data-plane-learned entries (e.g., ASIC-learned MACs)

Note that entries installed by the kernel (as opposed to user space)
will always be programmed with RTPROT_KERNEL, regardless of the data
path in which they were learned (software / hardware).

> 
> Extending iproute2 to support configuration and display of this new field.
> 
> The primary use case is EVPN Multihoming with ARP/ND synchronization
> for hosts that are multihomed to a set of routers. In this
> environment, the same MAC address may be learned locally by hardware
> when the host is directly attached, or installed by the control plane
> when the entry is synchronized through BGP, as commonly occurs in
> all-active scenarios.
> 
> The protocol field allows FRRouting to distinguish
> control-plane-installed entries from hardware-learned entries,
> enabling correct MAC mobility handling, ES peer synchronization, and
> proper processing of MAC ownership changes.

Can't this be achieved by using "activity_notify"?

See:

https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git/commit/?id=e041178ba6bc2af0a1148145ee303c9db79fb4cb
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=5e88777a382480d0b1f7eafb6d0fb680ec7a40bb

When an FDB entry is learned on an ES, install it with "activity_notify
norefresh":

es1# bridge fdb replace 00:aa:bb:cc:dd:ee dev bond1 master static activity_notify norefresh
es1# bridge -d fdb get 00:aa:bb:cc:dd:ee br br1
00:aa:bb:cc:dd:ee dev bond1 activity_notify master br1 static

It will transition to "inactive" after the aging time has elapsed:

es1# bridge -d fdb get 00:aa:bb:cc:dd:ee br br1
00:aa:bb:cc:dd:ee dev bond1 activity_notify inactive master br1 static

And install it as "activity_notify inactive" when synchronizing it to
other ES peers:

es2# bridge fdb add 00:aa:bb:cc:dd:ee dev bond1 master static activity_notify inactive
es2# bridge -d fdb get 00:aa:bb:cc:dd:ee br br1
00:aa:bb:cc:dd:ee dev bond1 activity_notify inactive master br1 static

The entry will then become active if it is later refreshed / learned by
the data path:

es2# bridge -d fdb get 00:aa:bb:cc:dd:ee br br1
00:aa:bb:cc:dd:ee dev bond1 activity_notify master br1 static
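
The flag transitions shown above can be modelled with a small userspace
sketch. The struct and function names are hypothetical and only mirror
the flags printed by "bridge -d fdb get"; this is a toy model under the
assumption that each transition produces one notification, not the
bridge implementation.

```c
#include <stdbool.h>

/* Toy model of an FDB entry installed with "activity_notify". */
struct fdb_model {
	bool activity_notify; /* entry tracks activity */
	bool inactive;        /* shown as "inactive" in the output */
	int notifications;    /* transitions reported to the control plane */
};

/* The aging time elapsed without the entry being refreshed. */
static void fdb_model_age(struct fdb_model *f)
{
	if (f->activity_notify && !f->inactive) {
		f->inactive = true;
		f->notifications++;
	}
}

/* The data path refreshed / learned the entry. */
static void fdb_model_refresh(struct fdb_model *f)
{
	if (f->activity_notify && f->inactive) {
		f->inactive = false;
		f->notifications++;
	}
}
```

In this model, an entry synchronized to a peer starts with
"inactive" set, and the first local refresh clears it and counts as a
notification, which is the signal the control plane would act on.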


* [PATCH RFC] net: bridge: mcast: don't clear L2 host_joined on port group deletion
From: Cedric Jehasse via B4 Relay @ 2026-06-10  8:31 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Ido Schimmel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: bridge, netdev, linux-kernel, Cedric Jehasse, Cedric Jehasse

From: Cedric Jehasse <cedric.jehasse@luminex.be>

For a static L2 multicast group that has both a host entry and a port
entry, deleting the port entry also removes the host entry, and the
whole group then disappears from "bridge mdb show".

To reproduce:
  bridge mdb add dev br0 port br0  grp 01:02:03:04:05:06 permanent
  bridge mdb add dev br0 port swp1 grp 01:02:03:04:05:06 permanent
  bridge mdb del dev br0 port swp1 grp 01:02:03:04:05:06 permanent
  bridge mdb show # the "port br0" host entry is gone, too

br_multicast_del_pg() processes every non-(*,G) entry through the S,G
path, which removes the port group from br->sg_port_tbl and then calls
br_multicast_sg_del_exclude_ports(). L2 entries are stored in
sg_port_tbl as well, so they take this path too.

When the last port is removed, br_multicast_sg_del_exclude_ports()
sets "sgmp->host_joined = false", clearing the host membership directly
and bypassing br_multicast_host_leave(). With host_joined now false and
no ports left, br_multicast_del_pg() arms the group timer and
br_multicast_group_expired() tears down the whole mdb entry -- even
though the host membership was explicitly and permanently configured
from user space.

Keep removing L2 port groups from sg_port_tbl, but skip the S,G
EXCLUDE-mode handling for them. The host membership of an L2 group is
managed solely via br_multicast_host_join() / br_multicast_host_leave().

Signed-off-by: Cedric Jehasse <cedric.jehasse@luminex.be>
---
This fixes the issue, but I'd like a second opinion on whether this is
the correct way to fix it.
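
The control flow described in the commit message can be sketched as a
toy userspace model. All names below are hypothetical, and the model
deliberately reduces br_multicast_sg_del_exclude_ports() to "clear
host_joined once no ports remain"; it only illustrates the effect of the
proposed br_group_is_l2() guard and is not the kernel code.

```c
#include <stdbool.h>

/* Toy model of an mdb entry. */
struct mdb_model {
	bool is_star_g;   /* (*,G) entry */
	bool is_l2;       /* L2 (MAC) group */
	bool host_joined; /* the "port br0" host entry */
	int ports;        /* remaining port groups */
	bool timer_armed; /* group timer armed, entry is about to expire */
};

/* Model of deleting one port group, with or without the proposed guard. */
static void model_del_pg(struct mdb_model *mp, bool with_fix)
{
	mp->ports--;

	if (!mp->is_star_g) {
		/* Simplified stand-in for the EXCLUDE-mode handling: once
		 * no ports remain, host membership is cleared directly.
		 * With the guard, L2 groups skip this step.
		 */
		if (!(with_fix && mp->is_l2) && mp->ports == 0)
			mp->host_joined = false;
	}

	/* No host membership and no ports: the entry will be torn down. */
	if (!mp->host_joined && mp->ports == 0)
		mp->timer_armed = true;
}
```

Without the guard, deleting the last port of an L2 group in this model
clears host_joined and arms the timer; with the guard, the host entry
survives.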
---
 net/bridge/br_multicast.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 881d866d687a..d718a6d1ddb1 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -816,7 +816,13 @@ void br_multicast_del_pg(struct net_bridge_mdb_entry *mp,
 	if (!br_multicast_is_star_g(&mp->addr)) {
 		rhashtable_remove_fast(&br->sg_port_tbl, &pg->rhnode,
 				       br_sg_port_rht_params);
-		br_multicast_sg_del_exclude_ports(mp);
+		/* L2 entries share sg_port_tbl with S,G entries but have no
+		 * *,G/S,G EXCLUDE-mode semantics; their host membership is
+		 * managed explicitly via br_multicast_host_join()/leave() and
+		 * must not be cleared here when the last port group is removed.
+		 */
+		if (!br_group_is_l2(&mp->addr))
+			br_multicast_sg_del_exclude_ports(mp);
 	} else {
 		br_multicast_star_g_handle_mode(pg, MCAST_INCLUDE);
 	}

---
base-commit: 022bdd9c0d036863c4bacd1688b73c6be3001cee
change-id: 20260609-mdb_l2_host_joined_fix-fb2de21580c7

Best regards,
-- 
Cedric Jehasse <cedric.jehasse@luminex.be>




* Re: [PATCH] net: bridge: vxlan: Protocol field in bridge fdb
From: Patrice Brissette @ 2026-06-09 22:55 UTC (permalink / raw)
  To: Ido Schimmel, linux-kernel@vger.kernel.org
  Cc: netdev@vger.kernel.org, bridge@lists.linux-foundation.org,
	Mrinmoy Ghosh

Hi,

I'm following up on the status of this patch series. This feature is
critical for our EVPN Multihoming deployments, and the corresponding
FRRouting work is currently blocked pending support for this
functionality.

Has there been any progress on this effort, or has someone else picked
it up? What are the next steps?

For reference, the proposed work includes:

Adding support for a protocol field to bridge and VXLAN FDB entries.

Allowing the protocol field to identify the source of an FDB update,
for example:

Zebra for control-plane-originated entries

HW for data-plane-learned entries (e.g., ASIC-learned MACs)

Extending iproute2 to support configuration and display of this new field.

The primary use case is EVPN Multihoming with ARP/ND synchronization
for hosts that are multihomed to a set of routers. In this
environment, the same MAC address may be learned locally by hardware
when the host is directly attached, or installed by the control plane
when the entry is synchronized through BGP, as commonly occurs in
all-active scenarios.

The protocol field allows FRRouting to distinguish
control-plane-installed entries from hardware-learned entries,
enabling correct MAC mobility handling, ES peer synchronization, and
proper processing of MAC ownership changes.

Please let me know if anyone is already working on this. Otherwise, I
would be happy to pick it up and help move it forward.

Thanks,
Patrice


From: Ido Schimmel <idosch@idosch.org>
Date: Thursday, August 21, 2025 at 07:03
To: Mrinmoy Ghosh (mrghosh) <mrghosh@cisco.com>
Cc: linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>;
netdev@vger.kernel.org <netdev@vger.kernel.org>;
bridge@lists.linux-foundation.org <bridge@lists.linux-foundation.org>;
Mrinmoy Ghosh <mrinmoy_g@hotmail.com>; Patrice Brissette (pbrisset)
<pbrisset@cisco.com>
Subject: Re: [PATCH] net: bridge: vxlan: Protocol field in bridge fdb

On Mon, Aug 18, 2025 at 05:52:58PM +0000, Mrinmoy Ghosh wrote:
> This is to add optional "protocol" field for bridge fdb entries.
> The introduction of the 'protocol' field in the bridge FDB for EVPN Multihome, addresses the need to distinguish between MAC addresses learned via the control plane and those learned via the data plane with data plane aging. Specifically:
> * A MAC address in an EVPN Multihome environment can be learned either through the control plane (static MAC) or the data plane (dynamic MAC with aging).

This is true for EVPN in general, so why mention MH?

> * The 'protocol' field uses values such as 'HW' for data plane dynamic MACs and 'ZEBRA' for control plane static MACs.

"HW" does not make sense to me. Why does the control plane care if the
entry was learned dynamically in the software data path (no offload) or
in the hardware data path? Entries installed by the kernel should be
installed with "RTPROT_KERNEL" regardless of the origin of the entry
(software / hardware).

That being said, you can encode whatever you want in the protocol field
and adjust rt_protos to display it however you like.

> * This distinction allows the application to manage the MAC address state machine effectively during transitions, which can occur due to traffic hashing between EVPN Multihome peers or mobility of MAC addresses across EVPN peers.
> * By identifying the source of the MAC learning (control plane vs. data plane), the system can handle MAC aging and mobility more accurately, ensuring synchronization between control and data planes and improving stability and reliability in MAC route handling.

This is quite vague. Can you be more specific on how exactly the control
plane is expected to use the protocol field in EVPN MH?

AFAIK, when the kernel notifies FRR about an FDB entry that was learned
on an ES peer, FRR installs the entry on all the ES peers as a static
entry (no aging, roaming enabled) with the "activity_notify" and
"inactive" flags so that the control plane will be notified when the
entry becomes active (i.e., was learned locally). See:

https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git/commit/?id=e041178ba6bc2af0a1148145ee303c9db79fb4cb
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=5e88777a382480d0b1f7eafb6d0fb680ec7a40bb

I am not against adding a cookie ("protocol") to FDB entries, but I
would like to understand your motivation.

>
> This mechanism supports the complex state transitions and synchronization required in EVPN Multihome scenarios, where MAC addresses may move or be learned differently depending on network events and traffic patterns.
>
> Change Summary:
> vxlan_core.c:  Encode NDA_PROTOCOL, and create and update fdb protocol field
>                Use RTPROT_UNSPEC when protocol not specified (default)
> vxlan_private.h: protocol field in vxlan_fdb, function signature updates
> vxlan_vnifilter.c: Use default RTPROT_UNSPEC, for default fdb create
> br.c: Use default RTPROT_UNSPEC as protocol, for swdev event
> br_fdb.c: Set NDA_PROTOCOL from protocol for fdb fill.
>           bridge fdb add, delete, learn update of protocol field
> br_private.h: protocol field in net_bridge_fdb_entry
>
> e.g:
> Test along with iproute2 change i.e https://lore.kernel.org/netdev/20250816031145.1153429-1-mrghosh@cisco.com/T/#u
>
> $ bridge fdb add 00:00:00:00:00:88 dev hostbond2 vlan 1000 master dynamic extern_learn proto hw
>
> $ bridge -d fdb show dev hostbond2 | grep 00:00:00:00:00:88
> 00:00:00:00:00:88 vlan 1000 extern_learn master br1000 proto hw
>
> $ bridge -d -j -p fdb show dev hostbond2
>
> ...
>
> [ {
>         "mac": "00:00:00:00:00:88",
>         "vlan": 1000,
>         "flags": [ "extern_learn" ],
>         "master": "br1000",
>         "flags_ext": [ ],
>         "protocol": "hw",
>         "state": ""
>     },{
> ...
>
> Transition to Zebra:
> $ bridge fdb replace  00:00:00:00:00:88 dev hostbond2 vlan 1000 master dynamic extern_learn proto zebra
>
> $ bridge -d fdb show dev hostbond2 | grep 00:00:00:00:00:88
> 00:00:00:00:00:88 vlan 1000 extern_learn master br1000 proto zebra
>
> $ bridge -d -j -p fdb show dev hostbond2 ...
> [ {
>         "mac": "00:00:00:00:00:88",
>         "vlan": 1000,
>         "flags": [ "extern_learn" ],
>         "master": "br1000",
>         "flags_ext": [ ],
>         "protocol": "zebra",
>         "state": ""
>     },
> ...
>
> iproute2 review: https://lore.kernel.org/netdev/20250816031145.1153429-1-mrghosh@cisco.com/T/#u
>
> Signed-off-by: Mrinmoy Ghosh <mrghosh@cisco.com>
> Co-authored-by: Mrinmoy Ghosh <mrinmoy_g@hotmail.com>
> Co-authored-by: Patrice Brissette <pbrisset@cisco.com>
> ---
>  drivers/net/vxlan/vxlan_core.c      | 132 ++++++++++++++--------------
>  drivers/net/vxlan/vxlan_private.h   |  21 +++--
>  drivers/net/vxlan/vxlan_vnifilter.c |  11 +--
>  net/bridge/br.c                     |   4 +-
>  net/bridge/br_fdb.c                 |  52 ++++++++---
>  net/bridge/br_private.h             |   5 +-
>  6 files changed, 127 insertions(+), 98 deletions(-)
>
> diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
> index f32be2e301f2..eff342e467a6 100644
> --- a/drivers/net/vxlan/vxlan_core.c
> +++ b/drivers/net/vxlan/vxlan_core.c
> @@ -206,6 +206,8 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
>                        peernet2id(dev_net(vxlan->dev), vxlan->net)))
>                goto nla_put_failure;
>
> +     if (nla_put_u8(skb, NDA_PROTOCOL, fdb->protocol))

Maybe fill only if not 0?

You should patch vxlan_nlmsg_size() as well.

> +             goto nla_put_failure;
>        if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->key.eth_addr))
>                goto nla_put_failure;
>        if (nh) {
> @@ -852,12 +854,11 @@ static int vxlan_fdb_nh_update(struct vxlan_dev *vxlan, struct vxlan_fdb *fdb,
>        return err;
>  }
>
> -int vxlan_fdb_create(struct vxlan_dev *vxlan,
> -                  const u8 *mac, union vxlan_addr *ip,
> -                  __u16 state, __be16 port, __be32 src_vni,
> -                  __be32 vni, __u32 ifindex, __u16 ndm_flags,
> +int vxlan_fdb_create(struct vxlan_dev *vxlan, const u8 *mac,
> +                  union vxlan_addr *ip, __u16 state, __be16 port,
> +                  __be32 src_vni, __be32 vni, __u32 ifindex, __u16 ndm_flags,
>                     u32 nhid, struct vxlan_fdb **fdb,
> -                  struct netlink_ext_ack *extack)
> +                  struct netlink_ext_ack *extack, u8 protocol)

Move this after 'nhid' so it's closer to the other attributes.

I will add an item to my TODO list to move these arguments into an FDB
config structure.

>  {
>        struct vxlan_rdst *rd = NULL;
>        struct vxlan_fdb *f;
> @@ -872,6 +873,7 @@ int vxlan_fdb_create(struct vxlan_dev *vxlan,
>        if (!f)
>                return -ENOMEM;
>
> +     f->protocol = protocol;

Multicast FDB entries can have multiple remotes and each can be added by
a different entity. I think it makes more sense to move the protocol
field to the remote structure.

>        if (nhid)
>                rc = vxlan_fdb_nh_update(vxlan, f, nhid, extack);
>        else
> @@ -964,14 +966,12 @@ static void vxlan_dst_free(struct rcu_head *head)
>        kfree(rd);
>  }
>
> -static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
> -                                  union vxlan_addr *ip,
> -                                  __u16 state, __u16 flags,
> -                                  __be16 port, __be32 vni,
> -                                  __u32 ifindex, __u16 ndm_flags,
> -                                  struct vxlan_fdb *f, u32 nhid,
> -                                  bool swdev_notify,
> -                                  struct netlink_ext_ack *extack)
> +static int
> +vxlan_fdb_update_existing(struct vxlan_dev *vxlan, union vxlan_addr *ip,
> +                       __u16 state, __u16 flags, __be16 port, __be32 vni,
> +                       __u32 ifindex, __u16 ndm_flags, struct vxlan_fdb *f,
> +                       u32 nhid, bool swdev_notify,
> +                       struct netlink_ext_ack *extack, u8 protocol)
>  {
>        __u16 fdb_flags = (ndm_flags & ~NTF_USE);
>        struct vxlan_rdst *rd = NULL;
> @@ -1005,6 +1005,11 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
>                        f->flags = fdb_flags;
>                        notify = 1;
>                }
> +             if (f->protocol != protocol) {
> +                     f->protocol = protocol;
> +                     f->updated = jiffies;
> +                     notify = 1;
> +             }
>        }
>
>        if ((flags & NLM_F_REPLACE)) {
> @@ -1063,13 +1068,12 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
>        return err;
>  }
>
> -static int vxlan_fdb_update_create(struct vxlan_dev *vxlan,
> -                                const u8 *mac, union vxlan_addr *ip,
> -                                __u16 state, __u16 flags,
> -                                __be16 port, __be32 src_vni, __be32 vni,
> -                                __u32 ifindex, __u16 ndm_flags, u32 nhid,
> -                                bool swdev_notify,
> -                                struct netlink_ext_ack *extack)
> +static int vxlan_fdb_update_create(struct vxlan_dev *vxlan, const u8 *mac,
> +                                union vxlan_addr *ip, __u16 state,
> +                                __u16 flags, __be16 port, __be32 src_vni,
> +                                __be32 vni, __u32 ifindex, __u16 ndm_flags,
> +                                u32 nhid, bool swdev_notify,
> +                                struct netlink_ext_ack *extack, u8 protocol)
>  {
>        __u16 fdb_flags = (ndm_flags & ~NTF_USE);
>        struct vxlan_fdb *f;
> @@ -1081,8 +1085,8 @@ static int vxlan_fdb_update_create(struct vxlan_dev *vxlan,
>                return -EOPNOTSUPP;
>
>        netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
> -     rc = vxlan_fdb_create(vxlan, mac, ip, state, port, src_vni,
> -                           vni, ifindex, fdb_flags, nhid, &f, extack);
> +     rc = vxlan_fdb_create(vxlan, mac, ip, state, port, src_vni, vni,
> +                           ifindex, fdb_flags, nhid, &f, extack, protocol);
>        if (rc < 0)
>                return rc;
>
> @@ -1099,13 +1103,11 @@ static int vxlan_fdb_update_create(struct vxlan_dev *vxlan,
>  }
>
>  /* Add new entry to forwarding table -- assumes lock held */
> -int vxlan_fdb_update(struct vxlan_dev *vxlan,
> -                  const u8 *mac, union vxlan_addr *ip,
> -                  __u16 state, __u16 flags,
> -                  __be16 port, __be32 src_vni, __be32 vni,
> -                  __u32 ifindex, __u16 ndm_flags, u32 nhid,
> -                  bool swdev_notify,
> -                  struct netlink_ext_ack *extack)
> +int vxlan_fdb_update(struct vxlan_dev *vxlan, const u8 *mac,
> +                  union vxlan_addr *ip, __u16 state, __u16 flags,
> +                  __be16 port, __be32 src_vni, __be32 vni, __u32 ifindex,
> +                  __u16 ndm_flags, u32 nhid, bool swdev_notify,
> +                  struct netlink_ext_ack *extack, u8 protocol)
>  {
>        struct vxlan_fdb *f;
>
> @@ -1119,7 +1121,8 @@ int vxlan_fdb_update(struct vxlan_dev *vxlan,
>
>                return vxlan_fdb_update_existing(vxlan, ip, state, flags, port,
>                                                 vni, ifindex, ndm_flags, f,
> -                                              nhid, swdev_notify, extack);
> +                                              nhid, swdev_notify, extack,
> +                                              protocol);
>        } else {
>                if (!(flags & NLM_F_CREATE))
>                        return -ENOENT;
> @@ -1127,7 +1130,7 @@ int vxlan_fdb_update(struct vxlan_dev *vxlan,
>                return vxlan_fdb_update_create(vxlan, mac, ip, state, flags,
>                                               port, src_vni, vni, ifindex,
>                                               ndm_flags, nhid, swdev_notify,
> -                                            extack);
> +                                            extack, protocol);
>        }
>  }
>
> @@ -1142,7 +1145,7 @@ static void vxlan_fdb_dst_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f,
>  static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
>                           union vxlan_addr *ip, __be16 *port, __be32 *src_vni,
>                           __be32 *vni, u32 *ifindex, u32 *nhid,
> -                        struct netlink_ext_ack *extack)
> +                        struct netlink_ext_ack *extack, u8 *protocol)
>  {
>        struct net *net = dev_net(vxlan->dev);
>        int err;
> @@ -1222,6 +1225,11 @@ static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
>
>        *nhid = nla_get_u32_default(tb[NDA_NH_ID], 0);
>
> +     if (tb[NDA_PROTOCOL])
> +             *protocol = nla_get_u8(tb[NDA_PROTOCOL]);
> +     else
> +             *protocol = RTPROT_UNSPEC;
> +
>        return 0;
>  }
>
> @@ -1238,6 +1246,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>        __be32 src_vni, vni;
>        u32 ifindex, nhid;
>        int err;
> +     u8 protocol;

https://docs.kernel.org/process/maintainer-netdev.html#local-variable-ordering-reverse-xmas-tree-rcs

Make this change throughout the patch.
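
For illustration, reverse xmas tree ordering means that local variable
declaration lines are sorted from longest to shortest. A minimal
standalone sketch follows; the struct, function and variable names are
made up and unrelated to vxlan_fdb_add().

```c
#include <stdint.h>

struct example_ctx {
	uint32_t base;
	uint16_t step;
};

/* Adds ctx->step to ctx->base count times. */
static uint32_t example_sum(const struct example_ctx *ctx, uint8_t count)
{
	/* Declarations ordered from the longest line to the shortest. */
	uint32_t total = ctx->base;
	uint16_t step = ctx->step;
	uint8_t i;

	for (i = 0; i < count; i++)
		total += step;

	return total;
}
```

Applied to the hunk above, the new "u8 protocol;" line would need to be
placed according to its length relative to the neighbouring declarations
rather than appended at the end.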

>
>        if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_REACHABLE))) {
>                pr_info("RTM_NEWNEIGH with invalid state %#x\n",
> @@ -1249,7 +1258,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>                return -EINVAL;
>
>        err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex,
> -                           &nhid, extack);
> +                           &nhid, extack, &protocol);
>        if (err)
>                return err;
>
> @@ -1257,10 +1266,10 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>                return -EAFNOSUPPORT;
>
>        spin_lock_bh(&vxlan->hash_lock);
> -     err = vxlan_fdb_update(vxlan, addr, &ip, ndm->ndm_state, flags,
> -                            port, src_vni, vni, ifindex,
> -                            ndm->ndm_flags | NTF_VXLAN_ADDED_BY_USER,
> -                            nhid, true, extack);
> +     err = vxlan_fdb_update(vxlan, addr, &ip, ndm->ndm_state, flags, port,
> +                            src_vni, vni, ifindex,
> +                            ndm->ndm_flags | NTF_VXLAN_ADDED_BY_USER, nhid,
> +                            true, extack, protocol);
>        spin_unlock_bh(&vxlan->hash_lock);
>
>        if (!err)
> @@ -1314,9 +1323,10 @@ static int vxlan_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
>        u32 ifindex, nhid;
>        __be16 port;
>        int err;
> +     u8 protocol;
>
>        err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex,
> -                           &nhid, extack);
> +                           &nhid, extack, &protocol);
>        if (err)
>                return err;
>
> @@ -1470,13 +1480,12 @@ static enum skb_drop_reason vxlan_snoop(struct net_device *dev,
>
>                /* close off race between vxlan_flush and incoming packets */
>                if (netif_running(dev))
> -                     vxlan_fdb_update(vxlan, src_mac, src_ip,
> -                                      NUD_REACHABLE,
> -                                      NLM_F_EXCL|NLM_F_CREATE,
> -                                      vxlan->cfg.dst_port,
> -                                      vni,
> -                                      vxlan->default_dst.remote_vni,
> -                                      ifindex, NTF_SELF, 0, true, NULL);
> +                     vxlan_fdb_update(vxlan, src_mac, src_ip, NUD_REACHABLE,
> +                                      NLM_F_EXCL | NLM_F_CREATE,
> +                                      vxlan->cfg.dst_port, vni,
> +                                      vxlan->default_dst.remote_vni, ifindex,
> +                                      NTF_SELF, 0, true, NULL,
> +                                      RTPROT_UNSPEC);

Entries installed by the kernel should use RTPROT_KERNEL, not RTPROT_UNSPEC.
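
One possible reading of this rule, as a self-contained userspace sketch.
The helper sketch_fdb_protocol() and the SKETCH_* names are hypothetical.
The numeric values are local copies of RTPROT_UNSPEC, RTPROT_KERNEL and
RTPROT_STATIC from <linux/rtnetlink.h>, and the NULL pointer stands in for
a request that carries no NDA_PROTOCOL attribute.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of RTPROT_* values from <linux/rtnetlink.h>. */
enum {
	SKETCH_RTPROT_UNSPEC = 0,
	SKETCH_RTPROT_KERNEL = 2,
	SKETCH_RTPROT_STATIC = 4,
};

/* Example value a user might pass via NDA_PROTOCOL (assumption). */
static const uint8_t sketch_user_proto = SKETCH_RTPROT_STATIC;

/* Hypothetical helper: choose the protocol stored in a new FDB entry.
 * Entries the kernel installs on its own (for example, ones learned from
 * the data path) get RTPROT_KERNEL. User requests keep the protocol they
 * supplied, or fall back to RTPROT_UNSPEC when none was given.
 */
static uint8_t sketch_fdb_protocol(int added_by_kernel,
				   const uint8_t *user_protocol)
{
	if (added_by_kernel)
		return SKETCH_RTPROT_KERNEL;
	if (user_protocol)
		return *user_protocol;
	return SKETCH_RTPROT_UNSPEC;
}
```

In the patch itself this would amount to passing RTPROT_KERNEL instead of
RTPROT_UNSPEC at the call sites where the kernel creates the entry.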

>                spin_unlock(&vxlan->hash_lock);
>        }
>
> @@ -3963,15 +3972,13 @@ static int __vxlan_dev_create(struct net *net, struct net_device *dev,
>        /* create an fdb entry for a valid default destination */
>        if (!vxlan_addr_any(&dst->remote_ip)) {
>                spin_lock_bh(&vxlan->hash_lock);
> -             err = vxlan_fdb_update(vxlan, all_zeros_mac,
> -                                    &dst->remote_ip,
> +             err = vxlan_fdb_update(vxlan, all_zeros_mac, &dst->remote_ip,
>                                       NUD_REACHABLE | NUD_PERMANENT,
>                                       NLM_F_EXCL | NLM_F_CREATE,
> -                                    vxlan->cfg.dst_port,
> -                                    dst->remote_vni,
> -                                    dst->remote_vni,
> -                                    dst->remote_ifindex,
> -                                    NTF_SELF, 0, true, extack);
> +                                    vxlan->cfg.dst_port, dst->remote_vni,
> +                                    dst->remote_vni, dst->remote_ifindex,
> +                                    NTF_SELF, 0, true, extack,
> +                                    RTPROT_UNSPEC);
>                spin_unlock_bh(&vxlan->hash_lock);
>                if (err)
>                        goto unlink;
> @@ -4416,10 +4423,10 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
>                                               &conf.remote_ip,
>                                               NUD_REACHABLE | NUD_PERMANENT,
>                                               NLM_F_APPEND | NLM_F_CREATE,
> -                                            vxlan->cfg.dst_port,
> -                                            conf.vni, conf.vni,
> -                                            conf.remote_ifindex,
> -                                            NTF_SELF, 0, true, extack);
> +                                            vxlan->cfg.dst_port, conf.vni,
> +                                            conf.vni, conf.remote_ifindex,
> +                                            NTF_SELF, 0, true, extack,
> +                                            RTPROT_UNSPEC);
>                        if (err) {
>                                spin_unlock_bh(&vxlan->hash_lock);
>                                netdev_adjacent_change_abort(dst->remote_dev,
> @@ -4767,14 +4774,11 @@ vxlan_fdb_external_learn_add(struct net_device *dev,
>
>        spin_lock_bh(&vxlan->hash_lock);
>        err = vxlan_fdb_update(vxlan, fdb_info->eth_addr, &fdb_info->remote_ip,
> -                            NUD_REACHABLE,
> -                            NLM_F_CREATE | NLM_F_REPLACE,
> -                            fdb_info->remote_port,
> -                            fdb_info->vni,
> -                            fdb_info->remote_vni,
> -                            fdb_info->remote_ifindex,
> -                            NTF_USE | NTF_SELF | NTF_EXT_LEARNED,
> -                            0, false, extack);
> +                            NUD_REACHABLE, NLM_F_CREATE | NLM_F_REPLACE,
> +                            fdb_info->remote_port, fdb_info->vni,
> +                            fdb_info->remote_vni, fdb_info->remote_ifindex,
> +                            NTF_USE | NTF_SELF | NTF_EXT_LEARNED, 0, false,
> +                            extack, RTPROT_UNSPEC);
>        spin_unlock_bh(&vxlan->hash_lock);
>
>        return err;
> diff --git a/drivers/net/vxlan/vxlan_private.h b/drivers/net/vxlan/vxlan_private.h
> index 6c625fb29c6c..19d1b93be279 100644
> --- a/drivers/net/vxlan/vxlan_private.h
> +++ b/drivers/net/vxlan/vxlan_private.h
> @@ -39,6 +39,7 @@ struct vxlan_fdb {
>        struct vxlan_fdb_key key;
>        u16               state; /* see ndm_state */
>        u16               flags; /* see ndm_flags and below */
> +     u8 protocol; /* protocol for FDB entry */
>        struct list_head  nh_list;
>        struct hlist_node fdb_node;
>        struct nexthop __rcu *nh;
> @@ -180,24 +181,22 @@ vxlan_vnifilter_lookup(struct vxlan_dev *vxlan, __be32 vni)
>  }
>
>  /* vxlan_core.c */
> -int vxlan_fdb_create(struct vxlan_dev *vxlan,
> -                  const u8 *mac, union vxlan_addr *ip,
> -                  __u16 state, __be16 port, __be32 src_vni,
> -                  __be32 vni, __u32 ifindex, __u16 ndm_flags,
> +int vxlan_fdb_create(struct vxlan_dev *vxlan, const u8 *mac,
> +                  union vxlan_addr *ip, __u16 state, __be16 port,
> +                  __be32 src_vni, __be32 vni, __u32 ifindex, __u16 ndm_flags,
>                     u32 nhid, struct vxlan_fdb **fdb,
> -                  struct netlink_ext_ack *extack);
> +                  struct netlink_ext_ack *extack, u8 protocol);
>  int __vxlan_fdb_delete(struct vxlan_dev *vxlan,
>                       const unsigned char *addr, union vxlan_addr ip,
>                       __be16 port, __be32 src_vni, __be32 vni,
>                       u32 ifindex, bool swdev_notify);
>  u32 eth_vni_hash(const unsigned char *addr, __be32 vni);
>  u32 fdb_head_index(struct vxlan_dev *vxlan, const u8 *mac, __be32 vni);
> -int vxlan_fdb_update(struct vxlan_dev *vxlan,
> -                  const u8 *mac, union vxlan_addr *ip,
> -                  __u16 state, __u16 flags,
> -                  __be16 port, __be32 src_vni, __be32 vni,
> -                  __u32 ifindex, __u16 ndm_flags, u32 nhid,
> -                  bool swdev_notify, struct netlink_ext_ack *extack);
> +int vxlan_fdb_update(struct vxlan_dev *vxlan, const u8 *mac,
> +                  union vxlan_addr *ip, __u16 state, __u16 flags,
> +                  __be16 port, __be32 src_vni, __be32 vni, __u32 ifindex,
> +                  __u16 ndm_flags, u32 nhid, bool swdev_notify,
> +                  struct netlink_ext_ack *extack, u8 protocol);
>  void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>                    __be32 default_vni, struct vxlan_rdst *rdst, bool did_rsc);
>  int vxlan_vni_in_use(struct net *src_net, struct vxlan_dev *vxlan,
> diff --git a/drivers/net/vxlan/vxlan_vnifilter.c b/drivers/net/vxlan/vxlan_vnifilter.c
> index adc89e651e27..908b6b489ac8 100644
> --- a/drivers/net/vxlan/vxlan_vnifilter.c
> +++ b/drivers/net/vxlan/vxlan_vnifilter.c
> @@ -482,15 +482,12 @@ static int vxlan_update_default_fdb_entry(struct vxlan_dev *vxlan, __be32 vni,
>
>        spin_lock_bh(&vxlan->hash_lock);
>        if (remote_ip && !vxlan_addr_any(remote_ip)) {
> -             err = vxlan_fdb_update(vxlan, all_zeros_mac,
> -                                    remote_ip,
> +             err = vxlan_fdb_update(vxlan, all_zeros_mac, remote_ip,
>                                       NUD_REACHABLE | NUD_PERMANENT,
>                                       NLM_F_APPEND | NLM_F_CREATE,
> -                                    vxlan->cfg.dst_port,
> -                                    vni,
> -                                    vni,
> -                                    dst->remote_ifindex,
> -                                    NTF_SELF, 0, true, extack);
> +                                    vxlan->cfg.dst_port, vni, vni,
> +                                    dst->remote_ifindex, NTF_SELF, 0, true,
> +                                    extack, RTPROT_UNSPEC);
>                if (err) {
>                        spin_unlock_bh(&vxlan->hash_lock);
>                        return err;
> diff --git a/net/bridge/br.c b/net/bridge/br.c
> index 1885d0c315f0..55f017e00247 100644
> --- a/net/bridge/br.c
> +++ b/net/bridge/br.c
> @@ -173,8 +173,8 @@ static int br_switchdev_event(struct notifier_block *unused,
>        case SWITCHDEV_FDB_ADD_TO_BRIDGE:
>                fdb_info = ptr;
>                err = br_fdb_external_learn_add(br, p, fdb_info->addr,
> -                                             fdb_info->vid,
> -                                             fdb_info->locked, false);
> +                                             fdb_info->vid, fdb_info->locked,
> +                                             false, RTPROT_UNSPEC);
>                if (err) {
>                        err = notifier_from_errno(err);
>                        break;
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index 902694c0ce64..e1b93e495db3 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -123,6 +123,8 @@ static int fdb_fill_info(struct sk_buff *skb, const struct net_bridge *br,
>                goto nla_put_failure;
>        if (nla_put_u32(skb, NDA_MASTER, br->dev->ifindex))
>                goto nla_put_failure;
> +     if (nla_put_u8(skb, NDA_PROTOCOL, fdb->protocol))
> +             goto nla_put_failure;

fdb_nlmsg_size() needs to be patched as well, to account for the new
attribute.
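
To show why the size helper matters, here is a minimal userspace sketch of
the netlink attribute sizing arithmetic. The sketch_* names are
hypothetical. The constants mirror the 4-byte attribute alignment and the
4-byte aligned attribute header used by netlink, and the function follows
the same formula as the kernel's nla_total_size(): header plus payload,
rounded up to the alignment.

```c
#include <stddef.h>

/* Netlink attributes are padded to 4-byte boundaries. */
#define SKETCH_NLA_ALIGNTO	((size_t)4)
/* Aligned size of the attribute header (u16 length + u16 type). */
#define SKETCH_NLA_HDRLEN	((size_t)4)

/* Round len up to the next multiple of the attribute alignment. */
static size_t sketch_nla_align(size_t len)
{
	return (len + SKETCH_NLA_ALIGNTO - 1) & ~(SKETCH_NLA_ALIGNTO - 1);
}

/* Space one attribute occupies in a message: header plus payload,
 * padded to the alignment.
 */
static size_t sketch_nla_total_size(size_t payload)
{
	return sketch_nla_align(SKETCH_NLA_HDRLEN + payload);
}
```

A one-byte payload such as NDA_PROTOCOL therefore takes 8 bytes in the
message. If the size estimate does not include those bytes, the skb
allocated for the notification can be too small for the attributes that
fdb_fill_info() puts into it.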

>        if (nla_put_u32(skb, NDA_FLAGS_EXT, ext_flags))
>                goto nla_put_failure;
>
> @@ -1153,7 +1155,8 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
>  static int __br_fdb_add(struct ndmsg *ndm, struct net_bridge *br,
>                        struct net_bridge_port *p, const unsigned char *addr,
>                        u16 nlh_flags, u16 vid, struct nlattr *nfea_tb[],
> -                     bool *notified, struct netlink_ext_ack *extack)
> +                     bool *notified, struct netlink_ext_ack *extack,
> +                     u8 protocol)

Move this after 'vid' so it's closer to the other attributes.

>  {
>        int err = 0;
>
> @@ -1177,7 +1180,8 @@ static int __br_fdb_add(struct ndmsg *ndm, struct net_bridge *br,
>                                           "FDB entry towards bridge must be permanent");
>                        return -EINVAL;
>                }
> -             err = br_fdb_external_learn_add(br, p, addr, vid, false, true);
> +             err = br_fdb_external_learn_add(br, p, addr, vid, false, true,
> +                                             protocol);
>        } else {
>                spin_lock_bh(&br->hash_lock);
>                err = fdb_add_entry(br, p, addr, ndm, nlh_flags, vid, nfea_tb);
> @@ -1206,6 +1210,7 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>        struct net_bridge_vlan *v;
>        struct net_bridge *br = NULL;
>        u32 ext_flags = 0;
> +     u8 protocol = RTPROT_UNSPEC;
>        int err = 0;
>
>        trace_br_fdb_add(ndm, dev, addr, vid, nlh_flags);
> @@ -1237,6 +1242,9 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>        if (tb[NDA_FLAGS_EXT])
>                ext_flags = nla_get_u32(tb[NDA_FLAGS_EXT]);
>
> +     if (tb[NDA_PROTOCOL])
> +             protocol = nla_get_u8(tb[NDA_PROTOCOL]);
> +
>        if (ext_flags & NTF_EXT_LOCKED) {
>                NL_SET_ERR_MSG_MOD(extack, "Cannot add FDB entry with \"locked\" flag set");
>                return -EINVAL;
> @@ -1261,10 +1269,10 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>
>                /* VID was specified, so use it. */
>                err = __br_fdb_add(ndm, br, p, addr, nlh_flags, vid, nfea_tb,
> -                                notified, extack);
> +                                notified, extack, protocol);
>        } else {
>                err = __br_fdb_add(ndm, br, p, addr, nlh_flags, 0, nfea_tb,
> -                                notified, extack);
> +                                notified, extack, protocol);
>                if (err || !vg || !vg->num_vlans)
>                        goto out;
>
> @@ -1276,7 +1284,7 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>                        if (!br_vlan_should_use(v))
>                                continue;
>                        err = __br_fdb_add(ndm, br, p, addr, nlh_flags, v->vid,
> -                                        nfea_tb, notified, extack);
> +                                        nfea_tb, notified, extack, protocol);
>                        if (err)
>                                goto out;
>                }
> @@ -1288,7 +1296,8 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>
>  static int fdb_delete_by_addr_and_port(struct net_bridge *br,
>                                       const struct net_bridge_port *p,
> -                                    const u8 *addr, u16 vlan, bool *notified)
> +                                    const u8 *addr, u16 vlan, u8 protocol,
> +                                    bool *notified)
>  {
>        struct net_bridge_fdb_entry *fdb;
>
> @@ -1296,6 +1305,13 @@ static int fdb_delete_by_addr_and_port(struct net_bridge *br,
>        if (!fdb || READ_ONCE(fdb->dst) != p)
>                return -ENOENT;
>
> +     /* If the delete comes from a different protocol type,
> +     * that type is used in the notification as some software
> +     * may be expecting multiple deletes (control learned +
> +     * hardware datapath learned) */
> +     if (protocol != RTPROT_UNSPEC)
> +             fdb->protocol = protocol;

I don't understand this. The protocol reported in the notification should
be that of the FDB entry being deleted, not the one from the delete request.

> +
>        fdb_delete(br, fdb, true);
>        *notified = true;
>
> @@ -1304,12 +1320,13 @@ static int fdb_delete_by_addr_and_port(struct net_bridge *br,
>
>  static int __br_fdb_delete(struct net_bridge *br,
>                           const struct net_bridge_port *p,
> -                        const unsigned char *addr, u16 vid, bool *notified)
> +                        const unsigned char *addr, u16 vid, u8 protocol,
> +                        bool *notified)
>  {
>        int err;
>
>        spin_lock_bh(&br->hash_lock);
> -     err = fdb_delete_by_addr_and_port(br, p, addr, vid, notified);
> +     err = fdb_delete_by_addr_and_port(br, p, addr, vid, protocol, notified);
>        spin_unlock_bh(&br->hash_lock);
>
>        return err;
> @@ -1324,8 +1341,12 @@ int br_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
>        struct net_bridge_vlan_group *vg;
>        struct net_bridge_port *p = NULL;
>        struct net_bridge *br;
> +     u8 protocol = RTPROT_UNSPEC;
>        int err;
>
> +     if (tb[NDA_PROTOCOL])
> +             protocol = nla_get_u8(tb[NDA_PROTOCOL]);
> +
>        if (netif_is_bridge_master(dev)) {
>                br = netdev_priv(dev);
>                vg = br_vlan_group(br);
> @@ -1341,19 +1362,20 @@ int br_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
>        }
>
>        if (vid) {
> -             err = __br_fdb_delete(br, p, addr, vid, notified);
> +             err = __br_fdb_delete(br, p, addr, vid, protocol, notified);
>        } else {
>                struct net_bridge_vlan *v;
>
>                err = -ENOENT;
> -             err &= __br_fdb_delete(br, p, addr, 0, notified);
> +             err &= __br_fdb_delete(br, p, addr, 0, protocol, notified);
>                if (!vg || !vg->num_vlans)
>                        return err;
>
>                list_for_each_entry(v, &vg->vlan_list, vlist) {
>                        if (!br_vlan_should_use(v))
>                                continue;
> -                     err &= __br_fdb_delete(br, p, addr, v->vid, notified);
> +                     err &= __br_fdb_delete(br, p, addr, v->vid, protocol,
> +                                            notified);
>                }
>        }
>
> @@ -1414,7 +1436,7 @@ void br_fdb_unsync_static(struct net_bridge *br, struct net_bridge_port *p)
>
>  int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p,
>                              const unsigned char *addr, u16 vid, bool locked,
> -                           bool swdev_notify)
> +                           bool swdev_notify, u8 protocol)
>  {
>        struct net_bridge_fdb_entry *fdb;
>        bool modified = false;
> @@ -1445,6 +1467,7 @@ int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p,
>                        err = -ENOMEM;
>                        goto err_unlock;
>                }
> +             fdb->protocol = protocol;
>                fdb_notify(br, fdb, RTM_NEWNEIGH, swdev_notify);
>        } else {
>                if (locked &&
> @@ -1483,6 +1506,11 @@ int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p,
>                    test_and_clear_bit(BR_FDB_DYNAMIC_LEARNED, &fdb->flags))
>                        atomic_dec(&br->fdb_n_learned);
>
> +             if (fdb->protocol != protocol) {
> +                     modified = true;
> +                     fdb->protocol = protocol;
> +             }
> +
>                if (modified)
>                        fdb_notify(br, fdb, RTM_NEWNEIGH, swdev_notify);
>        }
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index b159aae594c0..dc14c3c102b2 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -291,6 +291,7 @@ struct net_bridge_fdb_entry {
>        struct net_bridge_fdb_key       key;
>        struct hlist_node               fdb_node;
>        unsigned long                   flags;
> +     u8 protocol;

Align the name like the other fields. I guess the position in the structure
is OK since this field is not write-heavy and there is a 16-byte hole
here.

>
>        /* write-heavy members should not affect lookups */
>        unsigned long                   updated ____cacheline_aligned_in_smp;
> @@ -870,8 +871,8 @@ int br_fdb_get(struct sk_buff *skb, struct nlattr *tb[], struct net_device *dev,
>  int br_fdb_sync_static(struct net_bridge *br, struct net_bridge_port *p);
>  void br_fdb_unsync_static(struct net_bridge *br, struct net_bridge_port *p);
>  int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p,
> -                           const unsigned char *addr, u16 vid,
> -                           bool locked, bool swdev_notify);
> +                           const unsigned char *addr, u16 vid, bool locked,
> +                           bool swdev_notify, u8 protocol);
>  int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port *p,
>                              const unsigned char *addr, u16 vid,
>                              bool swdev_notify);
> --
> 2.43.0
>
>

^ permalink raw reply

* Re: [PATCH net v4] bridge: cfm: reject invalid CCM interval at configuration time
From: Xiang Mei @ 2026-06-09  8:13 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, horms, bridge, razor, davem, edumazet, pabeni, bestswngs
In-Reply-To: <20260609074606.GC663407@shredder>

On Tue, Jun 9, 2026 at 12:46 AM Ido Schimmel <idosch@nvidia.com> wrote:
>
> On Mon, Jun 08, 2026 at 11:51:16PM -0700, Xiang Mei wrote:
> > ccm_tx_work_expired() re-arms itself via queue_delayed_work() using
> > the configured exp_interval converted by interval_to_us(). When
> > exp_interval is BR_CFM_CCM_INTERVAL_NONE or out of range,
> > interval_to_us() returns 0, causing the worker to fire immediately in
> > a tight loop that allocates skbs until OOM.
> >
> > Fix this by validating exp_interval at configuration time:
> >
> >  - Constrain IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL to the valid range
> >    [BR_CFM_CCM_INTERVAL_3_3_MS, BR_CFM_CCM_INTERVAL_10_MIN] in the
> >    netlink policy so userspace cannot set an invalid value.
> >
> >  - Reject starting CCM TX in br_cfm_cc_ccm_tx() when exp_interval has
> >    not yet been configured (defaults to 0 from kzalloc).
> >
> > Fixes: 2be665c3940d ("bridge: cfm: Netlink SET configuration Interface.")
> > Reported-by: Weiming Shi <bestswngs@gmail.com>
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
>
> > ---
> > v4: remove the Suggested-by tag
>
> Should have kept my R-b tag...

Thanks for the tip, Ido.

^ permalink raw reply

* Re: [PATCH net v4] bridge: cfm: reject invalid CCM interval at configuration time
From: Ido Schimmel @ 2026-06-09  7:46 UTC (permalink / raw)
  To: Xiang Mei
  Cc: netdev, horms, bridge, razor, davem, edumazet, pabeni, bestswngs
In-Reply-To: <20260609065116.2818837-1-xmei5@asu.edu>

On Mon, Jun 08, 2026 at 11:51:16PM -0700, Xiang Mei wrote:
> ccm_tx_work_expired() re-arms itself via queue_delayed_work() using
> the configured exp_interval converted by interval_to_us(). When
> exp_interval is BR_CFM_CCM_INTERVAL_NONE or out of range,
> interval_to_us() returns 0, causing the worker to fire immediately in
> a tight loop that allocates skbs until OOM.
> 
> Fix this by validating exp_interval at configuration time:
> 
>  - Constrain IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL to the valid range
>    [BR_CFM_CCM_INTERVAL_3_3_MS, BR_CFM_CCM_INTERVAL_10_MIN] in the
>    netlink policy so userspace cannot set an invalid value.
> 
>  - Reject starting CCM TX in br_cfm_cc_ccm_tx() when exp_interval has
>    not yet been configured (defaults to 0 from kzalloc).
> 
> Fixes: 2be665c3940d ("bridge: cfm: Netlink SET configuration Interface.")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Signed-off-by: Xiang Mei <xmei5@asu.edu>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>

> ---
> v4: remove the Suggested-by tag

Should have kept my R-b tag...

^ permalink raw reply

* Re: [PATCH net v3] bridge: cfm: reject invalid CCM interval at configuration time
From: Xiang Mei @ 2026-06-09  7:22 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Ido Schimmel, netdev, horms, bridge, davem, edumazet, pabeni,
	bestswngs
In-Reply-To: <099ba3ba-49af-4ed2-8b58-31f8132f0d32@blackwall.org>

On Tue, Jun 9, 2026 at 12:19 AM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> On 09/06/2026 09:51, Xiang Mei wrote:
> > Thanks for your review and the tip. V4 was sent.
> >
> > Xiang
> >
>
> Please don't top post on netdev@.

Thanks for the tip!

Xiang

> Cheers,
>   Nik
>
> > On Mon, Jun 8, 2026 at 11:46 PM Ido Schimmel <idosch@nvidia.com> wrote:
> >>
> >> On Sat, Jun 06, 2026 at 02:58:48PM -0700, Xiang Mei wrote:
> >>> ccm_tx_work_expired() re-arms itself via queue_delayed_work() using
> >>> the configured exp_interval converted by interval_to_us(). When
> >>> exp_interval is BR_CFM_CCM_INTERVAL_NONE or out of range,
> >>> interval_to_us() returns 0, causing the worker to fire immediately in
> >>> a tight loop that allocates skbs until OOM.
> >>>
> >>> Fix this by validating exp_interval at configuration time:
> >>>
> >>>   - Constrain IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL to the valid range
> >>>     [BR_CFM_CCM_INTERVAL_3_3_MS, BR_CFM_CCM_INTERVAL_10_MIN] in the
> >>>     netlink policy so userspace cannot set an invalid value.
> >>>
> >>>   - Reject starting CCM TX in br_cfm_cc_ccm_tx() when exp_interval has
> >>>     not yet been configured (defaults to 0 from kzalloc).
> >>>
> >>> Fixes: 2be665c3940d ("bridge: cfm: Netlink SET configuration Interface.")
> >>> Reported-by: Weiming Shi <bestswngs@gmail.com>
> >>> Suggested-by: Ido Schimmel <idosch@nvidia.com>
> >>> Signed-off-by: Xiang Mei <xmei5@asu.edu>
> >>
> >> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> >>
> >> Nit: I don't think that the Suggested-by is appropriate here since I
> >> merely had minor comments on the previous version.
>

^ permalink raw reply

* Re: [PATCH net v4] bridge: cfm: reject invalid CCM interval at configuration time
From: Nikolay Aleksandrov @ 2026-06-09  7:19 UTC (permalink / raw)
  To: Xiang Mei, netdev
  Cc: idosch, horms, bridge, davem, edumazet, pabeni, bestswngs
In-Reply-To: <20260609065116.2818837-1-xmei5@asu.edu>

On 09/06/2026 09:51, Xiang Mei wrote:
> ccm_tx_work_expired() re-arms itself via queue_delayed_work() using
> the configured exp_interval converted by interval_to_us(). When
> exp_interval is BR_CFM_CCM_INTERVAL_NONE or out of range,
> interval_to_us() returns 0, causing the worker to fire immediately in
> a tight loop that allocates skbs until OOM.
> 
> Fix this by validating exp_interval at configuration time:
> 
>   - Constrain IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL to the valid range
>     [BR_CFM_CCM_INTERVAL_3_3_MS, BR_CFM_CCM_INTERVAL_10_MIN] in the
>     netlink policy so userspace cannot set an invalid value.
> 
>   - Reject starting CCM TX in br_cfm_cc_ccm_tx() when exp_interval has
>     not yet been configured (defaults to 0 from kzalloc).
> 
> Fixes: 2be665c3940d ("bridge: cfm: Netlink SET configuration Interface.")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Signed-off-by: Xiang Mei <xmei5@asu.edu>
> ---
> v4: remove the Suggested-by tag
> v3: Correct the Fixes tag and avoid using magic numbers
> v2: Move validation out of the datapath and into configuration
> 
>   net/bridge/br_cfm.c         | 6 ++++++
>   net/bridge/br_cfm_netlink.c | 4 +++-
>   2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/net/bridge/br_cfm.c b/net/bridge/br_cfm.c
> index 118c7ea48c35..dea56fffa1c1 100644
> --- a/net/bridge/br_cfm.c
> +++ b/net/bridge/br_cfm.c
> @@ -805,6 +805,12 @@ int br_cfm_cc_ccm_tx(struct net_bridge *br, const u32 instance,
>   		goto save;
>   	}
>   
> +	if (!interval_to_us(mep->cc_config.exp_interval)) {
> +		NL_SET_ERR_MSG_MOD(extack,
> +				   "Invalid CCM interval");
> +		return -EINVAL;
> +	}
> +
>   	/* Start delayed work to transmit CCM frames. It is done with zero delay
>   	 * to send first frame immediately
>   	 */
> diff --git a/net/bridge/br_cfm_netlink.c b/net/bridge/br_cfm_netlink.c
> index 2faab44652e7..91b9922dc3f2 100644
> --- a/net/bridge/br_cfm_netlink.c
> +++ b/net/bridge/br_cfm_netlink.c
> @@ -34,7 +34,9 @@ br_cfm_cc_config_policy[IFLA_BRIDGE_CFM_CC_CONFIG_MAX + 1] = {
>   	[IFLA_BRIDGE_CFM_CC_CONFIG_UNSPEC]	 = { .type = NLA_REJECT },
>   	[IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE]	 = { .type = NLA_U32 },
>   	[IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE]	 = { .type = NLA_U32 },
> -	[IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL] = { .type = NLA_U32 },
> +	[IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL] =
> +		NLA_POLICY_RANGE(NLA_U32, BR_CFM_CCM_INTERVAL_3_3_MS,
> +				 BR_CFM_CCM_INTERVAL_10_MIN),
>   	[IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID]	 = {
>   	.type = NLA_BINARY, .len = CFM_MAID_LENGTH },
>   };

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>


^ permalink raw reply

* Re: [PATCH net v3] bridge: cfm: reject invalid CCM interval at configuration time
From: Nikolay Aleksandrov @ 2026-06-09  7:19 UTC (permalink / raw)
  To: Xiang Mei, Ido Schimmel
  Cc: netdev, horms, bridge, davem, edumazet, pabeni, bestswngs
In-Reply-To: <CAPpSM+QGAARoC2Jq2EfMkXCgBQXy7k64W6rBFdJ8mhpfrR0zXA@mail.gmail.com>

On 09/06/2026 09:51, Xiang Mei wrote:
> Thanks for your review and the tip. V4 was sent.
> 
> Xiang
> 

Please don't top post on netdev@.

Cheers,
  Nik

> On Mon, Jun 8, 2026 at 11:46 PM Ido Schimmel <idosch@nvidia.com> wrote:
>>
>> On Sat, Jun 06, 2026 at 02:58:48PM -0700, Xiang Mei wrote:
>>> ccm_tx_work_expired() re-arms itself via queue_delayed_work() using
>>> the configured exp_interval converted by interval_to_us(). When
>>> exp_interval is BR_CFM_CCM_INTERVAL_NONE or out of range,
>>> interval_to_us() returns 0, causing the worker to fire immediately in
>>> a tight loop that allocates skbs until OOM.
>>>
>>> Fix this by validating exp_interval at configuration time:
>>>
>>>   - Constrain IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL to the valid range
>>>     [BR_CFM_CCM_INTERVAL_3_3_MS, BR_CFM_CCM_INTERVAL_10_MIN] in the
>>>     netlink policy so userspace cannot set an invalid value.
>>>
>>>   - Reject starting CCM TX in br_cfm_cc_ccm_tx() when exp_interval has
>>>     not yet been configured (defaults to 0 from kzalloc).
>>>
>>> Fixes: 2be665c3940d ("bridge: cfm: Netlink SET configuration Interface.")
>>> Reported-by: Weiming Shi <bestswngs@gmail.com>
>>> Suggested-by: Ido Schimmel <idosch@nvidia.com>
>>> Signed-off-by: Xiang Mei <xmei5@asu.edu>
>>
>> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
>>
>> Nit: I don't think that the Suggested-by is appropriate here since I
>> merely had minor comments on the previous version.


^ permalink raw reply

* Re: [PATCH net v3] bridge: cfm: reject invalid CCM interval at configuration time
From: Xiang Mei @ 2026-06-09  6:51 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, horms, bridge, razor, davem, edumazet, pabeni, bestswngs
In-Reply-To: <20260609064613.GA652158@shredder>

Thanks for the review and the tip. I have sent v4.

Xiang

On Mon, Jun 8, 2026 at 11:46 PM Ido Schimmel <idosch@nvidia.com> wrote:
>
> On Sat, Jun 06, 2026 at 02:58:48PM -0700, Xiang Mei wrote:
> > ccm_tx_work_expired() re-arms itself via queue_delayed_work() using
> > the configured exp_interval converted by interval_to_us(). When
> > exp_interval is BR_CFM_CCM_INTERVAL_NONE or out of range,
> > interval_to_us() returns 0, causing the worker to fire immediately in
> > a tight loop that allocates skbs until OOM.
> >
> > Fix this by validating exp_interval at configuration time:
> >
> >  - Constrain IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL to the valid range
> >    [BR_CFM_CCM_INTERVAL_3_3_MS, BR_CFM_CCM_INTERVAL_10_MIN] in the
> >    netlink policy so userspace cannot set an invalid value.
> >
> >  - Reject starting CCM TX in br_cfm_cc_ccm_tx() when exp_interval has
> >    not yet been configured (defaults to 0 from kzalloc).
> >
> > Fixes: 2be665c3940d ("bridge: cfm: Netlink SET configuration Interface.")
> > Reported-by: Weiming Shi <bestswngs@gmail.com>
> > Suggested-by: Ido Schimmel <idosch@nvidia.com>
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
>
> Nit: I don't think that the Suggested-by is appropriate here since I
> merely had minor comments on the previous version.

^ permalink raw reply

* [PATCH net v4] bridge: cfm: reject invalid CCM interval at configuration time
From: Xiang Mei @ 2026-06-09  6:51 UTC (permalink / raw)
  To: netdev
  Cc: idosch, horms, bridge, razor, davem, edumazet, pabeni, bestswngs,
	Xiang Mei

ccm_tx_work_expired() re-arms itself via queue_delayed_work() using
the configured exp_interval converted by interval_to_us(). When
exp_interval is BR_CFM_CCM_INTERVAL_NONE or out of range,
interval_to_us() returns 0, causing the worker to fire immediately in
a tight loop that allocates skbs until OOM.

Fix this by validating exp_interval at configuration time:

 - Constrain IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL to the valid range
   [BR_CFM_CCM_INTERVAL_3_3_MS, BR_CFM_CCM_INTERVAL_10_MIN] in the
   netlink policy so userspace cannot set an invalid value.

 - Reject starting CCM TX in br_cfm_cc_ccm_tx() when exp_interval has
   not yet been configured (defaults to 0 from kzalloc).

Fixes: 2be665c3940d ("bridge: cfm: Netlink SET configuration Interface.")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
v4: remove the Suggested-by tag
v3: Correct the Fixes tag and avoid using magic numbers
v2: Move validation out of the datapath and into configuration

 net/bridge/br_cfm.c         | 6 ++++++
 net/bridge/br_cfm_netlink.c | 4 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_cfm.c b/net/bridge/br_cfm.c
index 118c7ea48c35..dea56fffa1c1 100644
--- a/net/bridge/br_cfm.c
+++ b/net/bridge/br_cfm.c
@@ -805,6 +805,12 @@ int br_cfm_cc_ccm_tx(struct net_bridge *br, const u32 instance,
 		goto save;
 	}
 
+	if (!interval_to_us(mep->cc_config.exp_interval)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Invalid CCM interval");
+		return -EINVAL;
+	}
+
 	/* Start delayed work to transmit CCM frames. It is done with zero delay
 	 * to send first frame immediately
 	 */
diff --git a/net/bridge/br_cfm_netlink.c b/net/bridge/br_cfm_netlink.c
index 2faab44652e7..91b9922dc3f2 100644
--- a/net/bridge/br_cfm_netlink.c
+++ b/net/bridge/br_cfm_netlink.c
@@ -34,7 +34,9 @@ br_cfm_cc_config_policy[IFLA_BRIDGE_CFM_CC_CONFIG_MAX + 1] = {
 	[IFLA_BRIDGE_CFM_CC_CONFIG_UNSPEC]	 = { .type = NLA_REJECT },
 	[IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE]	 = { .type = NLA_U32 },
 	[IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE]	 = { .type = NLA_U32 },
-	[IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL] = { .type = NLA_U32 },
+	[IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL] =
+		NLA_POLICY_RANGE(NLA_U32, BR_CFM_CCM_INTERVAL_3_3_MS,
+				 BR_CFM_CCM_INTERVAL_10_MIN),
 	[IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID]	 = {
 	.type = NLA_BINARY, .len = CFM_MAID_LENGTH },
 };
-- 
2.43.0


^ permalink raw reply related
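
Editorial sketch (not part of the patch): the two checks described in the
commit message above can be illustrated in user-space C. Every name ending
in _sketch, and the CCM_INTERVAL_* enum, is hypothetical. The numeric values
0..7 are an assumption based on the [1, 7] range quoted in v2, and the
microsecond table is an assumption derived from the constant names
(3_3_MS, 10_MIN, and so on), not a copy of the kernel's interval_to_us().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the CCM interval codes (assumed values 0..7). */
enum ccm_interval_sketch {
	CCM_INTERVAL_NONE = 0,
	CCM_INTERVAL_3_3_MS,
	CCM_INTERVAL_10_MS,
	CCM_INTERVAL_100_MS,
	CCM_INTERVAL_1_SEC,
	CCM_INTERVAL_10_SEC,
	CCM_INTERVAL_1_MIN,
	CCM_INTERVAL_10_MIN,
};

/* Map an interval code to microseconds; NONE and unknown codes give 0. */
static uint32_t interval_to_us_sketch(uint32_t interval)
{
	switch (interval) {
	case CCM_INTERVAL_3_3_MS:	return 3300;
	case CCM_INTERVAL_10_MS:	return 10 * 1000;
	case CCM_INTERVAL_100_MS:	return 100 * 1000;
	case CCM_INTERVAL_1_SEC:	return 1000 * 1000;
	case CCM_INTERVAL_10_SEC:	return 10 * 1000 * 1000;
	case CCM_INTERVAL_1_MIN:	return 60 * 1000 * 1000;
	case CCM_INTERVAL_10_MIN:	return 600 * 1000 * 1000;
	default:			return 0;
	}
}

/* Check 1: configuration-time range check, in the spirit of the
 * NLA_POLICY_RANGE() entry added to the netlink policy.
 */
static bool config_interval_ok_sketch(uint32_t interval)
{
	return interval >= CCM_INTERVAL_3_3_MS &&
	       interval <= CCM_INTERVAL_10_MIN;
}

/* Check 2: refuse to start transmission when the stored interval (still 0
 * if it was never configured) would re-arm the worker with a zero delay.
 */
static int start_ccm_tx_sketch(uint32_t stored_interval)
{
	if (!interval_to_us_sketch(stored_interval))
		return -EINVAL;
	return 0;
}
```

The second check is still needed with the range policy in place because a
never-configured interval is 0 without ever passing through the policy.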

* Re: [PATCH net v3] bridge: cfm: reject invalid CCM interval at configuration time
From: Ido Schimmel @ 2026-06-09  6:46 UTC (permalink / raw)
  To: Xiang Mei
  Cc: netdev, horms, bridge, razor, davem, edumazet, pabeni, bestswngs
In-Reply-To: <20260606215848.1951633-1-xmei5@asu.edu>

On Sat, Jun 06, 2026 at 02:58:48PM -0700, Xiang Mei wrote:
> ccm_tx_work_expired() re-arms itself via queue_delayed_work() using
> the configured exp_interval converted by interval_to_us(). When
> exp_interval is BR_CFM_CCM_INTERVAL_NONE or out of range,
> interval_to_us() returns 0, causing the worker to fire immediately in
> a tight loop that allocates skbs until OOM.
> 
> Fix this by validating exp_interval at configuration time:
> 
>  - Constrain IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL to the valid range
>    [BR_CFM_CCM_INTERVAL_3_3_MS, BR_CFM_CCM_INTERVAL_10_MIN] in the
>    netlink policy so userspace cannot set an invalid value.
> 
>  - Reject starting CCM TX in br_cfm_cc_ccm_tx() when exp_interval has
>    not yet been configured (defaults to 0 from kzalloc).
> 
> Fixes: 2be665c3940d ("bridge: cfm: Netlink SET configuration Interface.")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Suggested-by: Ido Schimmel <idosch@nvidia.com>
> Signed-off-by: Xiang Mei <xmei5@asu.edu>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>

Nit: I don't think that the Suggested-by is appropriate here since I
merely had minor comments on the previous version.

^ permalink raw reply

* Re: [PATCH net-next v2 3/6] net: bridge: add 802.1Qat stream reservation admission control
From: Luke Howard @ 2026-06-07  0:02 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Andrew Lunn, Cedric Jehasse, Jiri Pirko, Ivan Vecera,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Ido Schimmel, Andrew Lunn, David Ahern, Shuah Khan,
	Vladimir Oltean, netdev, linux-kernel, bridge, linux-kselftest,
	Max Hunter, Kieran Tyrrell
In-Reply-To: <aiSbrmlOunfxwvGu@penguin>

Hi Nikolay,

> For the MDB to have a dynamic reservation entry means someone must've added it,
> these are not dynamically learned, so you can just as well build the table
> in a more appropriate place which can tag or filter the packet.

Fair enough, also a reasonable argument :) Without 802.1Q Dynamic Reservation Entries having first-class support in the MDB, I don’t think there’s an abstraction that also maps well to the mv88e6xxx.

I’ll keep thinking about it, but for now this had probably best remain a local patch. [1] Thanks for taking the time to give feedback, I learned a lot.

Cheers,
Luke

[1] https://github.com/PADL/linux/tree/b4/mv88e6xxx-8021qat-mqprio

^ permalink raw reply

* Re: [PATCH net-next v2 3/6] net: bridge: add 802.1Qat stream reservation admission control
From: Nikolay Aleksandrov @ 2026-06-06 22:14 UTC (permalink / raw)
  To: Luke Howard
  Cc: Andrew Lunn, Cedric Jehasse, Jiri Pirko, Ivan Vecera,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Ido Schimmel, Andrew Lunn, David Ahern, Shuah Khan,
	Vladimir Oltean, netdev, linux-kernel, bridge, linux-kselftest,
	Max Hunter, Kieran Tyrrell
In-Reply-To: <77CCA8EB-145D-4B76-B6F4-9B775C361995@padl.com>

On Sun, Jun 07, 2026 at 07:49:26AM +1000, Luke Howard wrote:
> 
> 
> > On 6 Jun 2026, at 6:21 pm, Nikolay Aleksandrov <razor@blackwall.org> wrote:
> > 
> > On 06/06/2026 11:02, Luke Howard wrote:
> >> The definition of Dynamic Reservation Entries in 802.1Q (clause 8.8.7) might support the addition of a new MDB (or even FDB) entry state to the kernel:
> >> - add MDB_DYNAMIC_RESERVATION (a state, not a flag);
> >> - the software bridge only _classifies_ packets against  MDB_DYNAMIC_RESERVATION entries, and only when MDB is authoritative. Classification sets dynamic_reservation_hit on tc_skb_ext;
> >> - dynamic_reservation_hit is visible to the flow dissector so can be used for policy enforcement.
> > 
> > See, saying the bridge has to classify doesn't sound right. Why not do the
> > classification where such operations are usually done, e.g. tc?
> > You have to manually designate these entries anyway.
> 
> s/classify/mark, i.e. marking a forwarding bit for tc to match, a la l2_miss.
> 
> tc can’t see into the MDB to tell if a DA has a dynamic reservation entry so, without an explicit DRE bit, the SRP daemon would need to maintain a flower permit filter per DRE. Not needing this allows the user to set a single policy filter prior to starting SRP, e.g.:
> 
> tc filter add dev lan0 egress protocol 802.1Q pref 1 handle 1 flower vlan_prio 3 dynamic_reservation_hit 0 action drop
> 
> It also maps cleanly to chips that support 802.1Qav with priority regeneration or filtering, but which can’t support tc-flower.

Yeah, that was an expected answer and I've seen such claims multiple times.
Just because it is convenient to add it in the bridge does not make it the
right software model. There are layers that do filtering, marking and
manipulation; this must be done at such a layer. If you have to create a new
table with the entries filled there, then that is what your user-space
software must do, or come up with a better alternative. There are also bridge
netfilter chains that can do packet filtering and manipulation; that might be
an option.

For the MDB to have a dynamic reservation entry means someone must've added it,
these are not dynamically learned, so you can just as well build the table
in a more appropriate place which can tag or filter the packet.

^ permalink raw reply

* Re: [PATCH net v2] bridge: cfm: reject invalid CCM interval at configuration time
From: Xiang Mei @ 2026-06-06 22:00 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, horms, bridge, razor, davem, edumazet, pabeni, bestswngs
In-Reply-To: <20260407070716.GA752875@shredder>

Thanks for the feedback, and sorry for the delay. v3 has now been sent.

Best,
Xiang

On Tue, Apr 7, 2026 at 12:07 AM Ido Schimmel <idosch@nvidia.com> wrote:
>
> On Sat, Apr 04, 2026 at 05:03:24PM -0700, Xiang Mei wrote:
> > ccm_tx_work_expired() re-arms itself via queue_delayed_work() using
> > the configured exp_interval converted by interval_to_us(). When
> > exp_interval is BR_CFM_CCM_INTERVAL_NONE or out of range,
> > interval_to_us() returns 0, causing the worker to fire immediately in
> > a tight loop that allocates skbs until OOM.
> >
> > Fix this by validating exp_interval at configuration time:
> >
> >  - Constrain IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL to [1, 7] in the
> >    netlink policy so userspace cannot set an invalid value.
> >
> >  - Reject starting CCM TX in br_cfm_cc_ccm_tx() when exp_interval has
> >    not yet been configured (defaults to 0 from kzalloc).
> >
> > Fixes: a806ad8ee2aa ("bridge: cfm: Kernel space implementation of CFM. CCM frame TX added.")
>
> Nit: Doesn't matter in practice, but let's blame commit 2be665c3940d
> ("bridge: cfm: Netlink SET configuration Interface.") instead as I don't
> think this bug could be triggered before exposing the netlink API.
>
> > Reported-by: Weiming Shi <bestswngs@gmail.com>
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > ---
> > v2: Move validation out of the datapath and into configuration
> >
> >  net/bridge/br_cfm.c         | 6 ++++++
> >  net/bridge/br_cfm_netlink.c | 2 +-
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/bridge/br_cfm.c b/net/bridge/br_cfm.c
> > index 118c7ea48c35..dea56fffa1c1 100644
> > --- a/net/bridge/br_cfm.c
> > +++ b/net/bridge/br_cfm.c
> > @@ -805,6 +805,12 @@ int br_cfm_cc_ccm_tx(struct net_bridge *br, const u32 instance,
> >               goto save;
> >       }
> >
> > +     if (!interval_to_us(mep->cc_config.exp_interval)) {
> > +             NL_SET_ERR_MSG_MOD(extack,
> > +                                "Invalid CCM interval");
> > +             return -EINVAL;
> > +     }
> > +
> >       /* Start delayed work to transmit CCM frames. It is done with zero delay
> >        * to send first frame immediately
> >        */
> > diff --git a/net/bridge/br_cfm_netlink.c b/net/bridge/br_cfm_netlink.c
> > index 2faab44652e7..1bb33c8f587b 100644
> > --- a/net/bridge/br_cfm_netlink.c
> > +++ b/net/bridge/br_cfm_netlink.c
> > @@ -34,7 +34,7 @@ br_cfm_cc_config_policy[IFLA_BRIDGE_CFM_CC_CONFIG_MAX + 1] = {
> >       [IFLA_BRIDGE_CFM_CC_CONFIG_UNSPEC]       = { .type = NLA_REJECT },
> >       [IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE]     = { .type = NLA_U32 },
> >       [IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE]       = { .type = NLA_U32 },
> > -     [IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL] = { .type = NLA_U32 },
> > +     [IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL] = NLA_POLICY_RANGE(NLA_U32, 1, 7),
>
> Use BR_CFM_CCM_INTERVAL_3_3_MS and BR_CFM_CCM_INTERVAL_10_MIN instead of
> the magic numbers?
>
> The Sashiko review points out that blocking BR_CFM_CCM_INTERVAL_NONE
> might break user space, but it seems weird to allow passing a value that
> is interpreted the same as an invalid one. Worst case, if someone
> complains, we can revert and go back to v1.
>
> >       [IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID]     = {
> >       .type = NLA_BINARY, .len = CFM_MAID_LENGTH },
> >  };
> > --
> > 2.43.0
> >

^ permalink raw reply

* [PATCH net v3] bridge: cfm: reject invalid CCM interval at configuration time
From: Xiang Mei @ 2026-06-06 21:58 UTC (permalink / raw)
  To: netdev
  Cc: idosch, horms, bridge, razor, davem, edumazet, pabeni, bestswngs,
	Xiang Mei

ccm_tx_work_expired() re-arms itself via queue_delayed_work() using
the configured exp_interval converted by interval_to_us(). When
exp_interval is BR_CFM_CCM_INTERVAL_NONE or out of range,
interval_to_us() returns 0, causing the worker to fire immediately in
a tight loop that allocates skbs until OOM.

Fix this by validating exp_interval at configuration time:

 - Constrain IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL to the valid range
   [BR_CFM_CCM_INTERVAL_3_3_MS, BR_CFM_CCM_INTERVAL_10_MIN] in the
   netlink policy so userspace cannot set an invalid value.

 - Reject starting CCM TX in br_cfm_cc_ccm_tx() when exp_interval has
   not yet been configured (defaults to 0 from kzalloc).

Fixes: 2be665c3940d ("bridge: cfm: Netlink SET configuration Interface.")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
v3: Correct the Fixes tag and avoid using magic numbers
v2: Move validation out of the datapath and into configuration

 net/bridge/br_cfm.c         | 6 ++++++
 net/bridge/br_cfm_netlink.c | 4 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_cfm.c b/net/bridge/br_cfm.c
index 118c7ea48c35..dea56fffa1c1 100644
--- a/net/bridge/br_cfm.c
+++ b/net/bridge/br_cfm.c
@@ -805,6 +805,12 @@ int br_cfm_cc_ccm_tx(struct net_bridge *br, const u32 instance,
 		goto save;
 	}
 
+	if (!interval_to_us(mep->cc_config.exp_interval)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Invalid CCM interval");
+		return -EINVAL;
+	}
+
 	/* Start delayed work to transmit CCM frames. It is done with zero delay
 	 * to send first frame immediately
 	 */
diff --git a/net/bridge/br_cfm_netlink.c b/net/bridge/br_cfm_netlink.c
index 2faab44652e7..91b9922dc3f2 100644
--- a/net/bridge/br_cfm_netlink.c
+++ b/net/bridge/br_cfm_netlink.c
@@ -34,7 +34,9 @@ br_cfm_cc_config_policy[IFLA_BRIDGE_CFM_CC_CONFIG_MAX + 1] = {
 	[IFLA_BRIDGE_CFM_CC_CONFIG_UNSPEC]	 = { .type = NLA_REJECT },
 	[IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE]	 = { .type = NLA_U32 },
 	[IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE]	 = { .type = NLA_U32 },
-	[IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL] = { .type = NLA_U32 },
+	[IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL] =
+		NLA_POLICY_RANGE(NLA_U32, BR_CFM_CCM_INTERVAL_3_3_MS,
+				 BR_CFM_CCM_INTERVAL_10_MIN),
 	[IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID]	 = {
 	.type = NLA_BINARY, .len = CFM_MAID_LENGTH },
 };
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH net-next v2 3/6] net: bridge: add 802.1Qat stream reservation admission control
From: Luke Howard @ 2026-06-06 21:49 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Andrew Lunn, Cedric Jehasse, Jiri Pirko, Ivan Vecera,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Ido Schimmel, Andrew Lunn, David Ahern, Shuah Khan,
	Vladimir Oltean, netdev, linux-kernel, bridge, linux-kselftest,
	Max Hunter, Kieran Tyrrell
In-Reply-To: <a63eed7d-b674-4830-a690-1b55dfdc9a72@blackwall.org>



> On 6 Jun 2026, at 6:21 pm, Nikolay Aleksandrov <razor@blackwall.org> wrote:
> 
> On 06/06/2026 11:02, Luke Howard wrote:
>> The definition of Dynamic Reservation Entries in 802.1Q (clause 8.8.7) might support the addition of a new MDB (or even FDB) entry state to the kernel:
>> - add MDB_DYNAMIC_RESERVATION (a state, not a flag);
>> - the software bridge only _classifies_ packets against  MDB_DYNAMIC_RESERVATION entries, and only when MDB is authoritative. Classification sets dynamic_reservation_hit on tc_skb_ext;
>> - dynamic_reservation_hit is visible to the flow dissector so can be used for policy enforcement.
> 
> See, saying the bridge has to classify doesn't sound right. Why not do the
> classification where such operations are usually done, e.g. tc?
> You have to manually designate these entries anyway.

s/classify/mark, i.e. marking a forwarding bit for tc to match, a la l2_miss.

tc can’t see into the MDB to tell if a DA has a dynamic reservation entry so, without an explicit DRE bit, the SRP daemon would need to maintain a flower permit filter per DRE. Not needing this allows the user to set a single policy filter prior to starting SRP, e.g.:

tc filter add dev lan0 egress protocol 802.1Q pref 1 handle 1 flower vlan_prio 3 dynamic_reservation_hit 0 action drop

It also maps cleanly to chips that support 802.1Qav with priority regeneration or filtering, but which can’t support tc-flower.

^ permalink raw reply
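
Editorial sketch (not part of the thread): the "bridge marks, tc decides"
split discussed in the message above can be modelled in user-space C. All
types and functions here are hypothetical. dynamic_reservation_hit and
MDB_DYNAMIC_RESERVATION are names from the proposal, not existing kernel
interfaces, and the linear table scan merely stands in for an MDB lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical MDB entry states; only the second one comes from the proposal. */
enum mdb_state_sketch {
	MDB_STATE_PERMANENT_SKETCH,
	MDB_STATE_DYNAMIC_RESERVATION_SKETCH,
};

struct mdb_entry_sketch {
	uint8_t da[6];			/* destination MAC address */
	enum mdb_state_sketch state;
};

struct pkt_sketch {
	uint8_t da[6];
	uint8_t vlan_prio;
	bool dynamic_reservation_hit;	/* the proposed per-packet bit */
};

/* Bridge step: mark the packet, never drop it. */
static void bridge_mark_sketch(struct pkt_sketch *pkt,
			       const struct mdb_entry_sketch *mdb, size_t n)
{
	pkt->dynamic_reservation_hit = false;
	for (size_t i = 0; i < n; i++) {
		if (mdb[i].state == MDB_STATE_DYNAMIC_RESERVATION_SKETCH &&
		    !memcmp(mdb[i].da, pkt->da, sizeof(pkt->da))) {
			pkt->dynamic_reservation_hit = true;
			return;
		}
	}
}

/* Policy step: models the single filter from the example, i.e. drop
 * priority-3 traffic that did not hit a dynamic reservation entry.
 */
static bool policy_drops_sketch(const struct pkt_sketch *pkt)
{
	return pkt->vlan_prio == 3 && !pkt->dynamic_reservation_hit;
}
```

In this model one policy rule covers every stream, whereas the alternative
raised in the thread keeps the table and the marking outside the bridge
(for example in tc or bridge netfilter) at the cost of one entry per stream.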

* Re: [PATCH net-next v2 3/6] net: bridge: add 802.1Qat stream reservation admission control
From: Nikolay Aleksandrov @ 2026-06-06  8:21 UTC (permalink / raw)
  To: Luke Howard, Andrew Lunn
  Cc: Cedric Jehasse, Jiri Pirko, Ivan Vecera, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Ido Schimmel, Andrew Lunn, David Ahern, Shuah Khan,
	Vladimir Oltean, netdev, linux-kernel, bridge, linux-kselftest,
	Max Hunter, Kieran Tyrrell
In-Reply-To: <CCF20D32-6C9F-49DC-9838-592EFB5E88D2@padl.com>

On 06/06/2026 11:02, Luke Howard wrote:
> 
>> So it sounds like you first need to work on the software
>> implementation and expand the simplified version with the features you
>> need. You can then add support to accelerate this by offloading it to
>> the hardware.
> 
> Agreed.
> 
> The definition of Dynamic Reservation Entries in 802.1Q (clause 8.8.7) might support the addition of a new MDB (or even FDB) entry state to the kernel:
> 
> - add MDB_DYNAMIC_RESERVATION (a state, not a flag);
> - the software bridge only _classifies_ packets against  MDB_DYNAMIC_RESERVATION entries, and only when MDB is authoritative. Classification sets dynamic_reservation_hit on tc_skb_ext;
> - dynamic_reservation_hit is visible to the flow dissector so can be used for policy enforcement.
> 

See, saying the bridge has to classify doesn't sound right. Why not do the
classification where such operations are usually done, e.g. tc?
You have to manually designate these entries anyway.

> Advantages:
> 
> - the new MDB state maps well to 802.1Q;
> - minimal changes to bridge;
> - actual policy (reclassify, drop, etc) is left to the user;
> - doesn’t require a new tc-flower entry for each stream DA;
> - entry state maps 1:1 to mv88e6xxx AVB_NRL ATU EntryState.
> 
> Disadvantages:
> 
> - no unicast support, although potentially can be extended (there are some subtleties);
> - mv88e6xxx support would either require rich enough tc-flower support in TCAM, intercepting TCA_FLOWER_DYNAMIC_RESERVATION_HIT and mapping to native AVB admission control, or a per-port devlink parameter (not so nice).
> 
> This is implemented and working with the software bridge, I still haven’t quite figured out the right mapping for mv88e6xxx.
> 
> Luke
> 
> PS. I previously incorrectly asserted that 802.1Q required dropping frames with AVB/SRP PCPs but without valid dynamic reservation entries. 802.1Q discusses priority mapping in clause 6.9.4 for traffic from SRP boundary ports (those not participating in SRP). In practice I think this should all be policy, e.g. you might want to reprioritise valid DSCP traffic from within a SRP domain, or drop instead of reprioritise, etc.


^ permalink raw reply

* Re: [PATCH net-next v2 3/6] net: bridge: add 802.1Qat stream reservation admission control
From: Luke Howard @ 2026-06-06  8:02 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Cedric Jehasse, Nikolay Aleksandrov, Jiri Pirko, Ivan Vecera,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Ido Schimmel, Andrew Lunn, David Ahern, Shuah Khan,
	Vladimir Oltean, netdev, linux-kernel, bridge, linux-kselftest,
	Max Hunter, Kieran Tyrrell
In-Reply-To: <d0e650fb-4011-4d64-9780-a0ddd0ca1cf3@lunn.ch>


> So it sounds like you first need to work on the software
> implementation and expand the simplified version with the features you
> need. You can then add support to accelerate this by offloading it to
> the hardware.

Agreed.

The definition of Dynamic Reservation Entries in 802.1Q (clause 8.8.7) might support the addition of a new MDB (or even FDB) entry state to the kernel:

- add MDB_DYNAMIC_RESERVATION (a state, not a flag);
- the software bridge only _classifies_ packets against  MDB_DYNAMIC_RESERVATION entries, and only when MDB is authoritative. Classification sets dynamic_reservation_hit on tc_skb_ext;
- dynamic_reservation_hit is visible to the flow dissector so can be used for policy enforcement.

Advantages:

- the new MDB state maps well to 802.1Q;
- minimal changes to bridge;
- actual policy (reclassify, drop, etc) is left to the user;
- doesn’t require a new tc-flower entry for each stream DA;
- entry state maps 1:1 to mv88e6xxx AVB_NRL ATU EntryState.

Disadvantages:

- no unicast support, although potentially can be extended (there are some subtleties);
- mv88e6xxx support would either require rich enough tc-flower support in TCAM, intercepting TCA_FLOWER_DYNAMIC_RESERVATION_HIT and mapping to native AVB admission control, or a per-port devlink parameter (not so nice).

This is implemented and working with the software bridge, but I still haven’t quite figured out the right mapping for mv88e6xxx.

Luke

PS. I previously incorrectly asserted that 802.1Q required dropping frames with AVB/SRP PCPs but without valid dynamic reservation entries. 802.1Q discusses priority mapping in clause 6.9.4 for traffic from SRP boundary ports (those not participating in SRP). In practice I think this should all be policy, e.g. you might want to reprioritise valid DSCP traffic from within a SRP domain, or drop instead of reprioritise, etc.

^ permalink raw reply

* Re: [PATCH net-next v2 3/6] net: bridge: add 802.1Qat stream reservation admission control
From: Luke Howard @ 2026-06-05 22:36 UTC (permalink / raw)
  To: Cedric Jehasse
  Cc: Nikolay Aleksandrov, Jiri Pirko, Ivan Vecera, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Ido Schimmel, Andrew Lunn, David Ahern, Shuah Khan, Andrew Lunn,
	Vladimir Oltean, netdev, linux-kernel, bridge, linux-kselftest,
	Max Hunter, Kieran Tyrrell
In-Reply-To: <n2pi7h5nvxg3xvshtcxqeizpukp6hphqa7kodbe7b3ioc3kcq5@n34fgocvaicm>


> But the concept comes from the 802.1Q spec. The spec describes different types
> of FDB entries. One of these is Dynamic Reservation Entries, which are created
> from the Stream Reservation Protocol.
> The issue here is the Marvell switch has an implementation where we need to
> know if an entry is a Dynamic Reservation Entry. But the linux bridge has a
> simplified version of the FDB described in 802.1Q.
> By adding a way to distinguish between the type of FDB entries, it's possible
> to program a Marvell which has this distinction between Dynamic Reservation
> Entries (or AVB entries) and other entries.

If we only care about the multicast case, and we can depend on snooping being enabled and flooding disabled, then I think Dynamic Reservation Entries collapse to permanent MDB entries. Unicast is trickier.

The issue of reclassifying (or dropping) frames that share a SRP class PCP is separate. I think it could be done with a tc-flower entry that reclassifies all ingress traffic with SRP PCPs (e.g. see 802.1Q Table 6-5) and then a per-stream egress entry that sets the queue and priority.

^ permalink raw reply

* Re: [syzbot] [bridge?] KASAN: use-after-free Read in qdisc_pkt_len_segs_init
From: syzbot @ 2026-06-05 16:58 UTC (permalink / raw)
  To: bridge, davem, dsahern, edumazet, horms, idosch, jiayuan.chen,
	kuba, linux-kernel, netdev, pabeni, pshelar, razor,
	syzkaller-bugs, tom
In-Reply-To: <69de2bee.a00a0220.475f0.0041.GAE@google.com>

syzbot has found a reproducer for the following issue on:

HEAD commit:    4aacf509e537 net: mv643xx: fix OF node refcount
git tree:       net
console output: https://syzkaller.appspot.com/x/log.txt?x=15b8fdd2580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=65472e27d1590a04
dashboard link: https://syzkaller.appspot.com/bug?extid=83181a31faf9455499c5
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10a6da86580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13a3c0ae580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/7559c887601e/disk-4aacf509.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/185ea069b480/vmlinux-4aacf509.xz
kernel image: https://storage.googleapis.com/syzbot-assets/4285524349b9/bzImage-4aacf509.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+83181a31faf9455499c5@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in __tcp_hdrlen include/linux/tcp.h:31 [inline]
BUG: KASAN: use-after-free in qdisc_pkt_len_segs_init+0x7f8/0xa30 net/core/dev.c:4140
Read of size 2 at addr ffff88817cd41734 by task syz.0.17/5856

CPU: 0 UID: 0 PID: 5856 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
Call Trace:
 <IRQ>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 __tcp_hdrlen include/linux/tcp.h:31 [inline]
 qdisc_pkt_len_segs_init+0x7f8/0xa30 net/core/dev.c:4140
 __dev_queue_xmit+0x29a/0x3950 net/core/dev.c:4782
 dev_queue_xmit include/linux/netdevice.h:3418 [inline]
 hsr_xmit net/hsr/hsr_forward.c:440 [inline]
 hsr_forward_do net/hsr/hsr_forward.c:581 [inline]
 hsr_forward_skb+0x167e/0x2ab0 net/hsr/hsr_forward.c:743
 hsr_handle_frame+0x6b8/0xa50 net/hsr/hsr_slave.c:81
 __netif_receive_skb_core+0x98f/0x3170 net/core/dev.c:6089
 __netif_receive_skb_list_core+0x24d/0x810 net/core/dev.c:6277
 __netif_receive_skb_list net/core/dev.c:6344 [inline]
 netif_receive_skb_list_internal+0x995/0xcf0 net/core/dev.c:6435
 gro_normal_list include/net/gro.h:523 [inline]
 gro_flush_normal include/net/gro.h:531 [inline]
 napi_complete_done+0x299/0x730 net/core/dev.c:6803
 gro_cell_poll+0x5a9/0x5d0 net/core/gro_cells.c:74
 __napi_poll+0xae/0x340 net/core/dev.c:7733
 napi_poll net/core/dev.c:7796 [inline]
 net_rx_action+0x627/0xf70 net/core/dev.c:7953
 handle_softirqs+0x22a/0x840 kernel/softirq.c:622
 do_softirq+0x76/0xd0 kernel/softirq.c:523
 </IRQ>
 <TASK>
 __local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
 local_bh_enable include/linux/bottom_half.h:33 [inline]
 tun_rx_batched+0x617/0x790 drivers/net/tun.c:-1
 tun_get_user+0x2bbc/0x43e0 drivers/net/tun.c:1955
 tun_chr_write_iter+0x113/0x200 drivers/net/tun.c:2001
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_write+0x150/0x270 fs/read_write.c:740
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe91fb9ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffda826a2e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fe91fe15fa0 RCX: 00007fe91fb9ce59
RDX: 000000000000007a RSI: 00002000000002c0 RDI: 0000000000000005
RBP: 00007fe91fc32d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fe91fe15fac R14: 00007fe91fe15fa0 R15: 00007fe91fe15fa0
 </TASK>

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x17cd41
flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
raw: 057ff00000000000 ffffea0005f35048 ffffea0005f35048 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner info is not present (never set?)

Memory state around the buggy address:
 ffff88817cd41600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88817cd41680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff88817cd41700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                     ^
 ffff88817cd41780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88817cd41800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

^ permalink raw reply

