[PATCH net] net: bridge: Trigger host query on v6 addr valid

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net] net: bridge: Trigger host query on v6 addr valid
@ 2025-09-12 22:39 Joseph Huang
  2025-09-13 18:23 ` Ido Schimmel
  0 siblings, 1 reply; 7+ messages in thread
From: Joseph Huang @ 2025-09-12 22:39 UTC (permalink / raw)
  To: netdev
  Cc: Joseph Huang, Joseph Huang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Nikolay Aleksandrov, Ido Schimmel, David Ahern,
	Stanislav Fomichev, Kuniyuki Iwashima, Ahmed Zaki,
	Alexander Lobakin, linux-kernel, bridge

Trigger the bridge to (re)start sending out Queries to the Host once
IPv6 address becomes valid.

In current implementation, once the bridge (interface) is brought up,
the bridge will start trying to send v4 and v6 Queries to the Host
immediately. However, at that time most likely the IPv6 address of
the bridge interface is not valid yet, and thus the send (actually
the alloc) operation will fail. So the first v6 Startup Query is
always missed.

This caused a ripple effect on the timing of Querier Election. In
current implementation, :: always wins the election. In order for
the "real" election to take place, the bridge would have to first
select itself (this happens when a v6 Query is successfully sent
to the Host), and then do the real address comparison when the next
Query is received. In worst cast scenario, the bridge would have to
wait for [Startup Query Interval] seconds (for the second Query to
be sent to the Host) plus [Query Interval] seconds (for the real
Querier to send the next Query) before it can recognize the real
Querier.

This patch adds a new notification NETDEV_NEWADDR when IPv6 address
becomes valid. When the bridge receives the notification, it will
restart the Startup Queries (much like how the bridge handles port
NETDEV_CHANGE events today).

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
---
 include/linux/netdevice.h |  1 +
 net/bridge/br.c           |  5 +++++
 net/bridge/br_multicast.c | 16 ++++++++++++++++
 net/bridge/br_private.h   |  1 +
 net/core/dev.c            | 10 +++++-----
 net/ipv6/addrconf.c       |  3 +++
 6 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f3a3b761abfb..27297e46e064 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3129,6 +3129,7 @@ enum netdev_cmd {
        NETDEV_REGISTER,
        NETDEV_UNREGISTER,
        NETDEV_CHANGEMTU,       /* notify after mtu change happened */
+       NETDEV_NEWADDR,
        NETDEV_CHANGEADDR,      /* notify after the address change */
        NETDEV_PRE_CHANGEADDR,  /* notify before the address change */
        NETDEV_GOING_DOWN,
diff --git a/net/bridge/br.c b/net/bridge/br.c
index c683baa3847f..6f66965e8075 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -49,6 +49,11 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v

                        return NOTIFY_DONE;
                }
+
+               if (event == NETDEV_NEWADDR) {
+                       br_multicast_enable_host(netdev_priv(dev));
+                       return NOTIFY_DONE;
+               }
        }

        if (is_vlan_dev(dev)) {
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 8ce145938b02..5a138c5731f5 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -2076,6 +2076,22 @@ static void br_multicast_enable(struct bridge_mcast_own_query *query)
                mod_timer(&query->timer, jiffies);
 }

+void br_multicast_enable_host(struct net_bridge *br)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+       spin_lock_bh(&br->multicast_lock);
+
+       if (!br_opt_get(br, BROPT_MULTICAST_ENABLED) ||
+           !netif_running(br->dev))
+               goto out;
+
+       br_multicast_enable(&br->multicast_ctx.ip6_own_query);
+
+out:
+       spin_unlock_bh(&br->multicast_lock);
+#endif
+}
+
 static void __br_multicast_enable_port_ctx(struct net_bridge_mcast_port *pmctx)
 {
        struct net_bridge *br = pmctx->port->br;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 8de0904b9627..16864286dc0d 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -967,6 +967,7 @@ br_mdb_entry_skb_get(struct net_bridge_mcast *brmctx, struct sk_buff *skb,
                     u16 vid);
 int br_multicast_add_port(struct net_bridge_port *port);
 void br_multicast_del_port(struct net_bridge_port *port);
+void br_multicast_enable_host(struct net_bridge *br);
 void br_multicast_enable_port(struct net_bridge_port *port);
 void br_multicast_disable_port(struct net_bridge_port *port);
 void br_multicast_init(struct net_bridge *br);
diff --git a/net/core/dev.c b/net/core/dev.c
index 93a25d87b86b..70a9f379f003 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1843,11 +1843,11 @@ const char *netdev_cmd_to_name(enum netdev_cmd cmd)
                return "NETDEV_" __stringify(val);
        switch (cmd) {
        N(UP) N(DOWN) N(REBOOT) N(CHANGE) N(REGISTER) N(UNREGISTER)
-       N(CHANGEMTU) N(CHANGEADDR) N(GOING_DOWN) N(CHANGENAME) N(FEAT_CHANGE)
-       N(BONDING_FAILOVER) N(PRE_UP) N(PRE_TYPE_CHANGE) N(POST_TYPE_CHANGE)
-       N(POST_INIT) N(PRE_UNINIT) N(RELEASE) N(NOTIFY_PEERS) N(JOIN)
-       N(CHANGEUPPER) N(RESEND_IGMP) N(PRECHANGEMTU) N(CHANGEINFODATA)
-       N(BONDING_INFO) N(PRECHANGEUPPER) N(CHANGELOWERSTATE)
+       N(CHANGEMTU) N(NEWADDR) N(CHANGEADDR) N(GOING_DOWN) N(CHANGENAME)
+       N(FEAT_CHANGE) N(BONDING_FAILOVER) N(PRE_UP) N(PRE_TYPE_CHANGE)
+       N(POST_TYPE_CHANGE) N(POST_INIT) N(PRE_UNINIT) N(RELEASE)
+       N(NOTIFY_PEERS) N(JOIN) N(CHANGEUPPER) N(RESEND_IGMP) N(PRECHANGEMTU)
+       N(CHANGEINFODATA) N(BONDING_INFO) N(PRECHANGEUPPER) N(CHANGELOWERSTATE)
        N(UDP_TUNNEL_PUSH_INFO) N(UDP_TUNNEL_DROP_INFO) N(CHANGE_TX_QUEUE_LEN)
        N(CVLAN_FILTER_PUSH_INFO) N(CVLAN_FILTER_DROP_INFO)
        N(SVLAN_FILTER_PUSH_INFO) N(SVLAN_FILTER_DROP_INFO)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index f17a5dd4789f..785952377d69 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -6292,6 +6292,9 @@ static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp)
                        addrconf_prefix_route(&ifp->peer_addr, 128,
                                              ifp->rt_priority, ifp->idev->dev,
                                              0, 0, GFP_ATOMIC);
+
+               call_netdevice_notifiers(NETDEV_NEWADDR, ifp->idev->dev);
+
                break;
        case RTM_DELADDR:
                if (ifp->idev->cnf.forwarding)
--
2.50.1


________________________________

CONFIDENTIALITY NOTICE: This email and any attachments are for the sole use of the intended recipient(s) and contain information that may be Garmin confidential and/or Garmin legally privileged. If you have received this email in error, please notify the sender by reply email and delete the message. Any disclosure, copying, distribution or use of this communication (including attachments) by someone other than the intended recipient is prohibited. Thank you.

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: bridge: Trigger host query on v6 addr valid
  2025-09-12 22:39 [PATCH net] net: bridge: Trigger host query on v6 addr valid Joseph Huang
@ 2025-09-13 18:23 ` Ido Schimmel
  2025-09-15 22:41   ` Huang, Joseph
  0 siblings, 1 reply; 7+ messages in thread
From: Ido Schimmel @ 2025-09-13 18:23 UTC (permalink / raw)
  To: Joseph Huang
  Cc: netdev, Joseph Huang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Nikolay Aleksandrov, David Ahern, Stanislav Fomichev,
	Kuniyuki Iwashima, Ahmed Zaki, Alexander Lobakin, linux-kernel,
	bridge

On Fri, Sep 12, 2025 at 06:39:30PM -0400, Joseph Huang wrote:
> Trigger the bridge to (re)start sending out Queries to the Host once
> IPv6 address becomes valid.
> 
> In current implementation, once the bridge (interface) is brought up,
> the bridge will start trying to send v4 and v6 Queries to the Host
> immediately. However, at that time most likely the IPv6 address of
> the bridge interface is not valid yet, and thus the send (actually
> the alloc) operation will fail. So the first v6 Startup Query is
> always missed.
> 
> This caused a ripple effect on the timing of Querier Election. In
> current implementation, :: always wins the election. In order for
> the "real" election to take place, the bridge would have to first
> select itself (this happens when a v6 Query is successfully sent
> to the Host), and then do the real address comparison when the next
> Query is received. In worst cast scenario, the bridge would have to
> wait for [Startup Query Interval] seconds (for the second Query to
> be sent to the Host) plus [Query Interval] seconds (for the real
> Querier to send the next Query) before it can recognize the real
> Querier.
> 
> This patch adds a new notification NETDEV_NEWADDR when IPv6 address
> becomes valid. When the bridge receives the notification, it will
> restart the Startup Queries (much like how the bridge handles port
> NETDEV_CHANGE events today).
> 
> Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
> ---
>  include/linux/netdevice.h |  1 +
>  net/bridge/br.c           |  5 +++++
>  net/bridge/br_multicast.c | 16 ++++++++++++++++
>  net/bridge/br_private.h   |  1 +
>  net/core/dev.c            | 10 +++++-----
>  net/ipv6/addrconf.c       |  3 +++
>  6 files changed, 31 insertions(+), 5 deletions(-)

A few comments:

1. The confidentiality footer needs to be removed.

2. Patches targeted at net need to have a Fixes tag. If you cannot
identify a commit before which this worked correctly (i.e., it's not a
regression), then target the patch at net-next instead.

3. The commit message needs to describe the user visible changes. My
understanding is as follows: When the bridge is brought administratively
up it will try to send a General Query which requires an IPv6 link-local
address to be configured on the bridge device. Because of DAD, such an
address might not exist right away, which means that the first General
Query will be sent after "mcast_startup_query_interval" seconds.

During this time the bridge will be unaware of multicast listeners that
joined before the creation of the bridge. Therefore, the bridge will
either unnecessarily flood multicast traffic to all the bridge ports or
just to those marked as router ports.

The patch aims to reduce this time period and send a General Query as
soon as the bridge is assigned an IPv6 link-local address.

4. Use imperative mood:
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

5. There is already a notification chain that notifies about addition /
deletion of IPv6 addresses. See register_inet6addr_notifier().

6. Please extend bridge_mld.sh with a test case in a separate patch. You
can look at xstats to see if queries were sent or not. See for example:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aea45363e29dd16050e6ce333ce0d3696ac3b5a9

Thanks

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: bridge: Trigger host query on v6 addr valid
  2025-09-13 18:23 ` Ido Schimmel
@ 2025-09-15 22:41   ` Huang, Joseph
  2025-09-17 11:30     ` Ido Schimmel
  0 siblings, 1 reply; 7+ messages in thread
From: Huang, Joseph @ 2025-09-15 22:41 UTC (permalink / raw)
  To: Ido Schimmel, Joseph Huang
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Andrew Lunn, Nikolay Aleksandrov,
	David Ahern, Stanislav Fomichev, Kuniyuki Iwashima, Ahmed Zaki,
	Alexander Lobakin, linux-kernel, bridge

On 9/13/2025 2:23 PM, Ido Schimmel wrote:
 > On Fri, Sep 12, 2025 at 06:39:30PM -0400, Joseph Huang wrote:
 >> Trigger the bridge to (re)start sending out Queries to the Host once
 >> IPv6 address becomes valid.
 >>
 >> In current implementation, once the bridge (interface) is brought up,
 >> the bridge will start trying to send v4 and v6 Queries to the Host
 >> immediately. However, at that time most likely the IPv6 address of
 >> the bridge interface is not valid yet, and thus the send (actually
 >> the alloc) operation will fail. So the first v6 Startup Query is
 >> always missed.
 >>
 >> This caused a ripple effect on the timing of Querier Election. In
 >> current implementation, :: always wins the election. In order for
 >> the "real" election to take place, the bridge would have to first
 >> select itself (this happens when a v6 Query is successfully sent
 >> to the Host), and then do the real address comparison when the next
 >> Query is received. In worst cast scenario, the bridge would have to
 >> wait for [Startup Query Interval] seconds (for the second Query to
 >> be sent to the Host) plus [Query Interval] seconds (for the real
 >> Querier to send the next Query) before it can recognize the real
 >> Querier.
 >>
 >> This patch adds a new notification NETDEV_NEWADDR when IPv6 address
 >> becomes valid. When the bridge receives the notification, it will
 >> restart the Startup Queries (much like how the bridge handles port
 >> NETDEV_CHANGE events today).
 >>
 >> Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
 >> ---
 >>   include/linux/netdevice.h |  1 +
 >>   net/bridge/br.c           |  5 +++++
 >>   net/bridge/br_multicast.c | 16 ++++++++++++++++
 >>   net/bridge/br_private.h   |  1 +
 >>   net/core/dev.c            | 10 +++++-----
 >>   net/ipv6/addrconf.c       |  3 +++
 >>   6 files changed, 31 insertions(+), 5 deletions(-)
 >
 > A few comments:
 >
 > 1. The confidentiality footer needs to be removed.
 >
 > 2. Patches targeted at net need to have a Fixes tag. If you cannot
 > identify a commit before which this worked correctly (i.e., it's not a
 > regression), then target the patch at net-next instead.
 >
 > 3. The commit message needs to describe the user visible changes. My
 > understanding is as follows: When the bridge is brought administratively
 > up it will try to send a General Query which requires an IPv6 link-local
 > address to be configured on the bridge device. Because of DAD, such an
 > address might not exist right away, which means that the first General
 > Query will be sent after "mcast_startup_query_interval" seconds.
 >
 > During this time the bridge will be unaware of multicast listeners that
 > joined before the creation of the bridge. Therefore, the bridge will
 > either unnecessarily flood multicast traffic to all the bridge ports or
 > just to those marked as router ports.
 >
 > The patch aims to reduce this time period and send a General Query as
 > soon as the bridge is assigned an IPv6 link-local address.
 >
 > 4. Use imperative mood:
 > 
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
 >
 > 5. There is already a notification chain that notifies about addition /
 > deletion of IPv6 addresses. See register_inet6addr_notifier().
 >

It seems that inet6addr_notifier_call_chain() can be called when the 
address is still tentative, which means br_ip6_multicast_alloc_query() 
is still going to fail (br_ip6_multicast_alloc_query() calls 
ipv6_dev_get_saddr(), which calls __ipv6_dev_get_saddr(), which does not 
consider tentative source addresses).

What the bridge needs really is a notification after DAD is completed, 
but I couldn't find such notification. Or did you mean reusing the same 
notification inet6addr_notifier_call_chain() but with a new event after 
DAD is completed?

Thanks,
Joseph



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: bridge: Trigger host query on v6 addr valid
  2025-09-15 22:41   ` Huang, Joseph
@ 2025-09-17 11:30     ` Ido Schimmel
  2025-10-04 14:27       ` Linus Lüssing
  0 siblings, 1 reply; 7+ messages in thread
From: Ido Schimmel @ 2025-09-17 11:30 UTC (permalink / raw)
  To: Huang, Joseph, linus.luessing
  Cc: Joseph Huang, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Nikolay Aleksandrov, David Ahern, Stanislav Fomichev,
	Kuniyuki Iwashima, Ahmed Zaki, Alexander Lobakin, linux-kernel,
	bridge

+ Linus Lüssing

Original patch: https://lore.kernel.org/netdev/20250912223937.1363559-1-Joseph.Huang@garmin.com/

On Mon, Sep 15, 2025 at 06:41:19PM -0400, Huang, Joseph wrote:
> It seems that inet6addr_notifier_call_chain() can be called when the address
> is still tentative, which means br_ip6_multicast_alloc_query() is still
> going to fail (br_ip6_multicast_alloc_query() calls ipv6_dev_get_saddr(),
> which calls __ipv6_dev_get_saddr(), which does not consider tentative source
> addresses).
> 
> What the bridge needs really is a notification after DAD is completed, but I
> couldn't find such notification. Or did you mean reusing the same
> notification inet6addr_notifier_call_chain() but with a new event after DAD
> is completed?

Adding a new event to the inet6addr notification chain makes more sense
than adding a new event to the netdev notification chain.

But before making changes, I want to better understand the problem you
are seeing. Is it specific to the offloaded data path? I believe the
problem was fixed in the software data path by this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0888d5f3c0f183ea6177355752ada433d370ac89

And Linus is working [1][2] on reflecting it to device drivers so that
the hardware data path will act like the software data path and flood
unregistered multicast traffic to all the ports as long as no querier
was detected.

As a temporary workaround, you can either configure an IPv6 link-local
address on the bridge before opening it:

 # ip -6 address add fe80::1/64 dev br0 nodad

Or enable optimistic DAD:

 # sysctl -wq net.ipv6.conf.br0.optimistic_dad=1

[1] https://lore.kernel.org/netdev/20250522195952.29265-1-linus.luessing@c0d3.blue/
[2] https://lore.kernel.org/netdev/20250829085724.24230-1-linus.luessing@c0d3.blue/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: bridge: Trigger host query on v6 addr valid
  2025-09-17 11:30     ` Ido Schimmel
@ 2025-10-04 14:27       ` Linus Lüssing
  2025-10-06 15:43         ` Huang, Joseph
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Lüssing @ 2025-10-04 14:27 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Huang, Joseph, Joseph Huang, netdev, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Andrew Lunn, Nikolay Aleksandrov, David Ahern, Stanislav Fomichev,
	Kuniyuki Iwashima, Ahmed Zaki, Alexander Lobakin, linux-kernel,
	bridge

On Wed, Sep 17, 2025 at 02:30:51PM +0300, Ido Schimmel wrote:
> But before making changes, I want to better understand the problem you
> are seeing. Is it specific to the offloaded data path? I believe the
> problem was fixed in the software data path by this commit:

Two issues I noticed recently, even without any hardware switch
offloading, on plain soft bridges:

1) (Probably not the issue here? But just to avoid that this
causes additional confusion:) we don't seem to properly converge to
the lowest MAC address, which is a bug, a violation of the RFCs.

If we received an IGMP/MLD query from a foreign host with an
address like fe80::2 and selected it and then enable our own
multicast querier with a lower address like fe80::1 on our bridge
interface for example then we won't send our queries, won't reelect
ourself. If I recall correctly. (Not too critical though, as at least we
have a querier on the link. But I find the election code a bit
confusing and I wouldn't dare to touch it without adding some tests.)

2) Without Ido's suggested workaround when the bridge multicast snooping
+ querier is enabled before the IPv6 DAD has taken place then our
first IGMP/MLD query will fizzle, not be transmitted.

However (at least for a non-hardware-offloaded) bridge as far as I
recall this shouldn't create any multicast packet loss and should
operate as "normal" with flooding multicast data packets first,
with multicast snooping activating on multicast data
after another IGMP/MLD querier interval has elapsed (default:
125 sec.)?

Which indeed could be optimized and is confusing, this delay could
be avoided. Is that that the issue you mean, Joseph?
(I'd consider it more an optimization, so for net-next, not
net though.)

> In current implementation, :: always wins the election

That would be news to me.

RFC2710, section 5:

   To be valid, the Query message MUST come from a link-
   local IPv6 Source Address

RFC3810, section 5.1.14, is even more explicit:

   5.1.14.  Source Addresses for Queries

   All MLDv2 Queries MUST be sent with a valid IPv6 link-local source
   address.  If a node (router or host) receives a Query message with
   the IPv6 Source Address set to the unspecified address (::), or any
   other address that is not a valid IPv6 link-local address, it MUST
   silently discard the message and SHOULD log a warning.

So :: can't be used as a source address for an MLD query.
And since 2014 with "bridge: multicast: add sanity check for query source addresses"
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6565b9eeef194afbb3beec80d6dd2447f4091f8c)
we should be adhering to that requirement? Let me know if I'm missing
something.

For IPv4 and 0.0.0.0 this is a different story though... I'm not
aware of a requirement in RFCs to avoid 0.0.0.0 in IGMP
queries. And "intuitively" one would prefer 0.0.0.0 to be the
least prefered querier address. But when taking the IGMP RFCs
literally then 0.0.0.0 would be the lowest one and always win... And RFC4541
unfortunately does not clarify the use of 0.0.0.0 for IGMP queries.
Not quite sure what the common practice among other layer 2 multicast
snooping implemetations across other vendos is.

> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0888d5f3c0f183ea6177355752ada433d370ac89
> 
> And Linus is working [1][2] on reflecting it to device drivers so that
> the hardware data path will act like the software data path and flood
> unregistered multicast traffic to all the ports as long as no querier
> was detected.

Right, for hardware offloading bridges/switches I'm on it, next
revision shouldn't take much longer...

Regards, Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: bridge: Trigger host query on v6 addr valid
  2025-10-04 14:27       ` Linus Lüssing
@ 2025-10-06 15:43         ` Huang, Joseph
  2025-10-08 12:28           ` Ido Schimmel
  0 siblings, 1 reply; 7+ messages in thread
From: Huang, Joseph @ 2025-10-06 15:43 UTC (permalink / raw)
  To: Linus Lüssing, Ido Schimmel
  Cc: Joseph Huang, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Nikolay Aleksandrov, David Ahern, Stanislav Fomichev,
	Kuniyuki Iwashima, Ahmed Zaki, Alexander Lobakin, linux-kernel,
	bridge

On 10/4/2025 10:27 AM, Linus Lüssing wrote:
> On Wed, Sep 17, 2025 at 02:30:51PM +0300, Ido Schimmel wrote:
>> But before making changes, I want to better understand the problem you
>> are seeing. Is it specific to the offloaded data path? I believe the
>> problem was fixed in the software data path by this commit:
> 
> Two issues I noticed recently, even without any hardware switch
> offloading, on plain soft bridges:
> 
> 1) (Probably not the issue here? But just to avoid that this
> causes additional confusion:) we don't seem to properly converge to
> the lowest MAC address, which is a bug, a violation of the RFCs.
> 
> If we received an IGMP/MLD query from a foreign host with an
> address like fe80::2 and selected it and then enable our own
> multicast querier with a lower address like fe80::1 on our bridge
> interface for example then we won't send our queries, won't reelect
> ourself. If I recall correctly. (Not too critical though, as at least we
> have a querier on the link. But I find the election code a bit
> confusing and I wouldn't dare to touch it without adding some tests.)
> 

I agree that there might be some corner cases which the current election 
code does not handle very well (one of them is outlined below).

> 2) Without Ido's suggested workaround when the bridge multicast snooping
> + querier is enabled before the IPv6 DAD has taken place then our
> first IGMP/MLD query will fizzle, not be transmitted.

This (#2) is what this patch trying to address. With DAD enabled, the 
first MLD Query is never transmitted. That essentially means that the 
Robustness Variable is 1 (which is not very robust).

> However (at least for a non-hardware-offloaded) bridge as far as I
> recall this shouldn't create any multicast packet loss and should
> operate as "normal" with flooding multicast data packets first,
> with multicast snooping activating on multicast data
> after another IGMP/MLD querier interval has elapsed (default:
> 125 sec.)?
> 

Some systems could not afford to flood multicast traffic. Think of some 
resource-constrained low power sensors connected to a network with high 
volume multicast video traffic for example. The multicast traffic could 
easily choke the sensors and is essentially a DDoS attack.

> Which indeed could be optimized and is confusing, this delay could
> be avoided. Is that that the issue you mean, Joseph?
> (I'd consider it more an optimization, so for net-next, not
> net though.)
> 

I'm not sure this should be categorized as an optimization. If we never 
intend to send Startup Queries, that's a different story. But if we 
intend to send it but failed, I think that should be a bug.

>> In current implementation, :: always wins the election
> 
> That would be news to me.
> 
> RFC2710, section 5:
> 
>     To be valid, the Query message MUST come from a link-
>     local IPv6 Source Address
> 
> RFC3810, section 5.1.14, is even more explicit:
> 
>     5.1.14.  Source Addresses for Queries
> 
>     All MLDv2 Queries MUST be sent with a valid IPv6 link-local source
>     address.  If a node (router or host) receives a Query message with
>     the IPv6 Source Address set to the unspecified address (::), or any
>     other address that is not a valid IPv6 link-local address, it MUST
>     silently discard the message and SHOULD log a warning.
> 
> So :: can't be used as a source address for an MLD query.
> And since 2014 with "bridge: multicast: add sanity check for query source addresses"
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6565b9eeef194afbb3beec80d6dd2447f4091f8c)
> we should be adhering to that requirement? Let me know if I'm missing
> something.
> 

This is what I meant by ":: always wins":

In br_multicast_select_querier(),

	if (ipv6_addr_cmp(&saddr->src.ip6, &querier->addr.src.ip6) <= 0)
		goto update;

If querier->addr.src.ip6 is 0, nothing can be less than that, so ":: 
always wins".

However,

1. querier->addr.src.ip6 is (un)initialized(?) to 0 (I couldn't find the 
place where ip6_querier.addr is initialized)
2. Querier election cannot take place due to the comparison above, until 
the bridge selects itself first via br_multicast_select_own_querier()
3. the bridge only selects itself after the first successful Query is 
sent to the host
4. br_ip6_multicast_alloc_query() will fail if v6 address is not valid

So, without this patch a system would have to wait for

31.25 seconds (for the second Query to the host to selects itself) +
~125 seconds (for the next Query from the real Querier to arrive)

in order to receive multicast traffic. For some embedded devices that's 
a very long time (imagine turning on a TV and have to wait for 2 minutes 
and a half before it starts working).

Thanks,
Joseph

> For IPv4 and 0.0.0.0 this is a different story though... I'm not
> aware of a requirement in RFCs to avoid 0.0.0.0 in IGMP
> queries. And "intuitively" one would prefer 0.0.0.0 to be the
> least prefered querier address. But when taking the IGMP RFCs
> literally then 0.0.0.0 would be the lowest one and always win... And RFC4541
> unfortunately does not clarify the use of 0.0.0.0 for IGMP queries.
> Not quite sure what the common practice among other layer 2 multicast
> snooping implemetations across other vendos is.
> 
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0888d5f3c0f183ea6177355752ada433d370ac89
>>
>> And Linus is working [1][2] on reflecting it to device drivers so that
>> the hardware data path will act like the software data path and flood
>> unregistered multicast traffic to all the ports as long as no querier
>> was detected.
> 
> Right, for hardware offloading bridges/switches I'm on it, next
> revision shouldn't take much longer...
> 
> Regards, Linus


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: bridge: Trigger host query on v6 addr valid
  2025-10-06 15:43         ` Huang, Joseph
@ 2025-10-08 12:28           ` Ido Schimmel
  0 siblings, 0 replies; 7+ messages in thread
From: Ido Schimmel @ 2025-10-08 12:28 UTC (permalink / raw)
  To: Huang, Joseph, linus.luessing
  Cc: Joseph Huang, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Nikolay Aleksandrov, David Ahern, Stanislav Fomichev,
	Kuniyuki Iwashima, Ahmed Zaki, Alexander Lobakin, linux-kernel,
	bridge

On Mon, Oct 06, 2025 at 11:43:02AM -0400, Huang, Joseph wrote:
> On 10/4/2025 10:27 AM, Linus Lüssing wrote:
> > However (at least for a non-hardware-offloaded) bridge as far as I
> > recall this shouldn't create any multicast packet loss and should
> > operate as "normal" with flooding multicast data packets first,
> > with multicast snooping activating on multicast data
> > after another IGMP/MLD querier interval has elapsed (default:
> > 125 sec.)?

Isn't this 10 seconds (default mcast_query_response_interval)?

BTW, I see that delay_timer is started in br_multicast_set_querier()
which is called from br_changelink(). Isn't this problematic if querier
is enabled while the bridge is administratively down? It's possible for
this timer to expire by the time the bridge is opened.

> Some systems could not afford to flood multicast traffic. Think of some
> resource-constrained low power sensors connected to a network with high
> volume multicast video traffic for example. The multicast traffic could
> easily choke the sensors and is essentially a DDoS attack.

Note that even with your patch (or optimistic DAD) there will still be a
time period where multicast traffic is flooded to give responses enough
time to arrive.

Can you clarify how you observed the problem? Did you observe packet
loss with hardware offload or did you observe excessive flooding with
the software data path?

> > Which indeed could be optimized and is confusing, this delay could
> > be avoided. Is that that the issue you mean, Joseph?
> > (I'd consider it more an optimization, so for net-next, not
> > net though.)
> > 
> 
> I'm not sure this should be categorized as an optimization. If we never
> intend to send Startup Queries, that's a different story. But if we intend
> to send it but failed, I think that should be a bug.

I would say that the deciding factor should be if it's a regression or
not.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-10-08 12:28 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-12 22:39 [PATCH net] net: bridge: Trigger host query on v6 addr valid Joseph Huang
2025-09-13 18:23 ` Ido Schimmel
2025-09-15 22:41   ` Huang, Joseph
2025-09-17 11:30     ` Ido Schimmel
2025-10-04 14:27       ` Linus Lüssing
2025-10-06 15:43         ` Huang, Joseph
2025-10-08 12:28           ` Ido Schimmel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).