netdev.vger.kernel.org archive mirror
From: "Huang, Joseph" <joseph.huang.at.garmin@gmail.com>
To: "Linus Lüssing" <linus.luessing@c0d3.blue>,
	"Ido Schimmel" <idosch@nvidia.com>
Cc: Joseph Huang <Joseph.Huang@garmin.com>,
	netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	Nikolay Aleksandrov <razor@blackwall.org>,
	David Ahern <dsahern@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	Ahmed Zaki <ahmed.zaki@intel.com>,
	Alexander Lobakin <aleksander.lobakin@intel.com>,
	linux-kernel@vger.kernel.org, bridge@lists.linux.dev
Subject: Re: [PATCH net] net: bridge: Trigger host query on v6 addr valid
Date: Mon, 6 Oct 2025 11:43:02 -0400
Message-ID: <9cc66694-6fcd-4460-9bce-cdbcb0153a89@gmail.com>
In-Reply-To: <aOEu6uQ4pP4PJH-y@sellars>

On 10/4/2025 10:27 AM, Linus Lüssing wrote:
> On Wed, Sep 17, 2025 at 02:30:51PM +0300, Ido Schimmel wrote:
>> But before making changes, I want to better understand the problem you
>> are seeing. Is it specific to the offloaded data path? I believe the
>> problem was fixed in the software data path by this commit:
> 
> Two issues I noticed recently, even without any hardware switch
> offloading, on plain soft bridges:
> 
> 1) (Probably not the issue here? But just to avoid that this
> causes additional confusion:) we don't seem to properly converge to
> the lowest MAC address, which is a bug, a violation of the RFCs.
> 
> If we received an IGMP/MLD query from a foreign host with an
> address like fe80::2 and selected it and then enable our own
> multicast querier with a lower address like fe80::1 on our bridge
> interface for example then we won't send our queries, won't reelect
> ourself. If I recall correctly. (Not too critical though, as at least we
> have a querier on the link. But I find the election code a bit
> confusing and I wouldn't dare to touch it without adding some tests.)
> 

I agree that there might be some corner cases which the current election 
code does not handle very well (one of them is outlined below).

> 2) Without Ido's suggested workaround when the bridge multicast snooping
> + querier is enabled before the IPv6 DAD has taken place then our
> first IGMP/MLD query will fizzle, not be transmitted.

This (#2) is what this patch is trying to address. With DAD enabled, the
first MLD Query is never transmitted. That effectively reduces the
Robustness Variable to 1 (which is not very robust).

> However (at least for a non-hardware-offloaded) bridge as far as I
> recall this shouldn't create any multicast packet loss and should
> operate as "normal" with flooding multicast data packets first,
> with multicast snooping activating on multicast data
> after another IGMP/MLD querier interval has elapsed (default:
> 125 sec.)?
> 

Some systems cannot afford to have multicast traffic flooded to them.
Think of resource-constrained, low-power sensors connected to a network
that carries high-volume multicast video traffic, for example. The
flooded traffic could easily choke the sensors, which is essentially a
DoS attack.

> Which indeed could be optimized and is confusing, this delay could
> be avoided. Is that the issue you mean, Joseph?
> (I'd consider it more an optimization, so for net-next, not
> net though.)
> 

I'm not sure this should be categorized as an optimization. If we never
intended to send Startup Queries, that would be a different story. But if
we intend to send them and fail to, I think that is a bug.

>> In current implementation, :: always wins the election
> 
> That would be news to me.
> 
> RFC2710, section 5:
> 
>     To be valid, the Query message MUST come from a link-
>     local IPv6 Source Address
> 
> RFC3810, section 5.1.14, is even more explicit:
> 
>     5.1.14.  Source Addresses for Queries
> 
>     All MLDv2 Queries MUST be sent with a valid IPv6 link-local source
>     address.  If a node (router or host) receives a Query message with
>     the IPv6 Source Address set to the unspecified address (::), or any
>     other address that is not a valid IPv6 link-local address, it MUST
>     silently discard the message and SHOULD log a warning.
> 
> So :: can't be used as a source address for an MLD query.
> And since 2014 with "bridge: multicast: add sanity check for query source addresses"
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6565b9eeef194afbb3beec80d6dd2447f4091f8c)
> we should be adhering to that requirement? Let me know if I'm missing
> something.
> 
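
For reference, the source-address rule quoted above can be sketched in userspace C. This is only an illustration of the RFC 3810 requirement, not the kernel's sanity check; mld_query_source_ok() is a made-up name, and IN6_IS_ADDR_LINKLOCAL() from <netinet/in.h> stands in for whatever test the bridge code really uses.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if an MLD Query carrying this source
 * address would be acceptable under RFC 3810, section 5.1.14, i.e. the
 * source is a link-local (fe80::/10) address. The unspecified address
 * (::) is not link-local, so it is rejected as well.
 */
static bool mld_query_source_ok(const char *text)
{
	struct in6_addr src;
	int rc = inet_pton(AF_INET6, text, &src);

	/* The sketch only handles well-formed literals. */
	assert(rc == 1);
	(void)rc;

	return IN6_IS_ADDR_LINKLOCAL(&src);
}
```

With this check, mld_query_source_ok("fe80::1") is true, while "::" and a global address such as "2001:db8::1" are both rejected.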

This is what I meant by ":: always wins":

In br_multicast_select_querier(),

	if (ipv6_addr_cmp(&saddr->src.ip6, &querier->addr.src.ip6) <= 0)
		goto update;

If querier->addr.src.ip6 is 0, nothing can be less than that, so ":: 
always wins".
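
A small userspace sketch of just that comparison, in case it helps. It assumes ipv6_addr_cmp() behaves like a bytewise memcmp() over the 16 address bytes; addr_cmp(), parse_addr() and candidate_wins() are made-up names, and the rest of br_multicast_select_querier() (timers and so on) is deliberately left out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Userspace stand-in for ipv6_addr_cmp(): bytewise compare (assumption). */
static int addr_cmp(const struct in6_addr *a, const struct in6_addr *b)
{
	return memcmp(a, b, sizeof(*a));
}

/* Hypothetical helper: parse a literal, aborting on a malformed string. */
static struct in6_addr parse_addr(const char *text)
{
	struct in6_addr out;
	int rc = inet_pton(AF_INET6, text, &out);

	assert(rc == 1);
	(void)rc;
	return out;
}

/*
 * Models only the quoted "<= 0" test: true means the candidate source
 * address would take the "goto update" path and replace the currently
 * recorded querier address.
 */
static bool candidate_wins(const char *candidate, const char *current)
{
	struct in6_addr c = parse_addr(candidate);
	struct in6_addr q = parse_addr(current);

	return addr_cmp(&c, &q) <= 0;
}
```

Here candidate_wins("fe80::1", "fe80::2") is true, but candidate_wins("fe80::1", "::") is false: once the recorded address is all zeroes, no link-local source can displace it through this comparison.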

However,

1. querier->addr.src.ip6 starts out as 0 (I couldn't find the place
where ip6_querier.addr is explicitly initialized, so it may simply be
left zeroed)
2. Because of the comparison above, querier election cannot take place
until the bridge selects itself first via
br_multicast_select_own_querier()
3. The bridge only selects itself after the first Query is successfully
sent to the host
4. br_ip6_multicast_alloc_query() will fail if the v6 address is not
yet valid

So, without this patch, a system would have to wait for

31.25 seconds (for the second Query to the host, after which the bridge
selects itself) +
~125 seconds (for the next Query from the real Querier to arrive)

in order to receive multicast traffic. For some embedded devices that's
a very long time (imagine turning on a TV and having to wait two and a
half minutes before it starts working).
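
The arithmetic, as a rough sketch. It assumes the 31.25 s figure is the default Startup Query Interval, i.e. one quarter of the 125 s Query Interval; the function names are made up for illustration.

```c
#include <assert.h>

/* Assumption: Startup Query Interval defaults to 1/4 of Query Interval. */
static double startup_query_interval(double query_interval_s)
{
	assert(query_interval_s > 0.0);
	return query_interval_s / 4.0;
}

/*
 * Wait described above: one Startup Query Interval until the second
 * Query to the host succeeds and the bridge selects itself, plus up to
 * one full Query Interval until the real Querier's next Query arrives.
 */
static double worst_case_wait(double query_interval_s)
{
	return startup_query_interval(query_interval_s) + query_interval_s;
}
```

With the default of 125 seconds, worst_case_wait(125.0) comes to 156.25 seconds, which is a bit over two and a half minutes.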

Thanks,
Joseph

> For IPv4 and 0.0.0.0 this is a different story though... I'm not
> aware of a requirement in RFCs to avoid 0.0.0.0 in IGMP
> queries. And "intuitively" one would prefer 0.0.0.0 to be the
> least preferred querier address. But when taking the IGMP RFCs
> literally then 0.0.0.0 would be the lowest one and always win... And RFC4541
> unfortunately does not clarify the use of 0.0.0.0 for IGMP queries.
> Not quite sure what the common practice among other layer 2 multicast
> snooping implementations across other vendors is.
> 
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0888d5f3c0f183ea6177355752ada433d370ac89
>>
>> And Linus is working [1][2] on reflecting it to device drivers so that
>> the hardware data path will act like the software data path and flood
>> unregistered multicast traffic to all the ports as long as no querier
>> was detected.
> 
> Right, for hardware offloading bridges/switches I'm on it, next
> revision shouldn't take much longer...
> 
> Regards, Linus


Thread overview: 7+ messages
2025-09-12 22:39 [PATCH net] net: bridge: Trigger host query on v6 addr valid Joseph Huang
2025-09-13 18:23 ` Ido Schimmel
2025-09-15 22:41   ` Huang, Joseph
2025-09-17 11:30     ` Ido Schimmel
2025-10-04 14:27       ` Linus Lüssing
2025-10-06 15:43         ` Huang, Joseph [this message]
2025-10-08 12:28           ` Ido Schimmel