netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Petr Machata <petrm@nvidia.com>, Ido Schimmel <idosch@nvidia.com>,
	Nikolay Aleksandrov <razor@blackwall.org>,
	Paolo Abeni <pabeni@redhat.com>, Sasha Levin <sashal@kernel.org>,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, menglong8.dong@gmail.com, gnault@redhat.com,
	netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 6.1 151/212] vxlan: Join / leave MC group after remote changes
Date: Mon,  5 May 2025 19:05:23 -0400	[thread overview]
Message-ID: <20250505230624.2692522-151-sashal@kernel.org> (raw)
In-Reply-To: <20250505230624.2692522-1-sashal@kernel.org>

From: Petr Machata <petrm@nvidia.com>

[ Upstream commit d42d543368343c0449a4e433b5f02e063a86209c ]

When a vxlan netdevice is brought up, if its default remote is a multicast
address, the device joins the indicated group.

Therefore when the multicast remote address changes, the device should
leave the current group and subscribe to the new one. Similarly when the
interface used for endpoint communication is changed in a situation when
multicast remote is configured. This is currently not done.

Both vxlan_igmp_join() and vxlan_igmp_leave() can however fail. So it is
possible that with such fix, the netdevice will end up in an inconsistent
situation where the old group is not joined anymore, but joining the new
group fails. Should we join the new group first, and leave the old one
second, we might end up in the opposite situation, where both groups are
joined. Undoing any of this during rollback is going to be similarly
problematic.

One solution would be to just forbid the change when the netdevice is up.
However in vnifilter mode, changing the group address is allowed, and these
problems are simply ignored (see vxlan_vni_update_group()):

 # ip link add name br up type bridge vlan_filtering 1
 # ip link add vx1 up master br type vxlan external vnifilter local 192.0.2.1 dev lo dstport 4789
 # bridge vni add dev vx1 vni 200 group 224.0.0.1
 # tcpdump -i lo &
 # bridge vni add dev vx1 vni 200 group 224.0.0.2
 18:55:46.523438 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
 18:55:46.943447 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
 # bridge vni
 dev               vni                group/remote
 vx1               200                224.0.0.2

Having two different modes of operation for conceptually the same interface
is silly, so in this patch, just do what the vnifilter code does and deal
with the errors by crossing fingers real hard.

The vnifilter code leaves old before joining new, and in case of join /
leave failures does not roll back the configuration changes that have
already been applied, but bails out of joining if it could not leave. Do
the same here: leave before join, apply changes unconditionally and do not
attempt to join if we couldn't leave.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/vxlan/vxlan_core.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 50be5a3c47795..0afd7eb976e6c 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -4231,6 +4231,7 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
 			    struct netlink_ext_ack *extack)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
+	bool rem_ip_changed, change_igmp;
 	struct net_device *lowerdev;
 	struct vxlan_config conf;
 	struct vxlan_rdst *dst;
@@ -4254,8 +4255,13 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
 	if (err)
 		return err;
 
+	rem_ip_changed = !vxlan_addr_equal(&conf.remote_ip, &dst->remote_ip);
+	change_igmp = vxlan->dev->flags & IFF_UP &&
+		      (rem_ip_changed ||
+		       dst->remote_ifindex != conf.remote_ifindex);
+
 	/* handle default dst entry */
-	if (!vxlan_addr_equal(&conf.remote_ip, &dst->remote_ip)) {
+	if (rem_ip_changed) {
 		u32 hash_index = fdb_head_index(vxlan, all_zeros_mac, conf.vni);
 
 		spin_lock_bh(&vxlan->hash_lock[hash_index]);
@@ -4299,6 +4305,9 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
 		}
 	}
 
+	if (change_igmp && vxlan_addr_multicast(&dst->remote_ip))
+		err = vxlan_multicast_leave(vxlan);
+
 	if (conf.age_interval != vxlan->cfg.age_interval)
 		mod_timer(&vxlan->age_timer, jiffies);
 
@@ -4306,7 +4315,12 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
 	if (lowerdev && lowerdev != dst->remote_dev)
 		dst->remote_dev = lowerdev;
 	vxlan_config_apply(dev, &conf, lowerdev, vxlan->net, true);
-	return 0;
+
+	if (!err && change_igmp &&
+	    vxlan_addr_multicast(&dst->remote_ip))
+		err = vxlan_multicast_join(vxlan);
+
+	return err;
 }
 
 static void vxlan_dellink(struct net_device *dev, struct list_head *head)
-- 
2.39.5


  parent reply	other threads:[~2025-05-05 23:11 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20250505230624.2692522-1-sashal@kernel.org>
2025-05-05 23:03 ` [PATCH AUTOSEL 6.1 009/212] SUNRPC: Don't allow waiting for exiting tasks Sasha Levin
2025-05-05 23:03 ` [PATCH AUTOSEL 6.1 018/212] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting Sasha Levin
2025-05-05 23:03 ` [PATCH AUTOSEL 6.1 019/212] SUNRPC: rpcbind should never reset the port to the value '0' Sasha Levin
2025-05-05 23:03 ` [PATCH AUTOSEL 6.1 051/212] tcp: reorganize tcp_in_ack_event() and tcp_count_delivered() Sasha Levin
2025-05-05 23:03 ` [PATCH AUTOSEL 6.1 058/212] net/smc: use the correct ndev to find pnetid by pnetid table Sasha Levin
2025-05-05 23:03 ` [PATCH AUTOSEL 6.1 067/212] netfilter: conntrack: Bound nf_conntrack sysctl writes Sasha Levin
2025-05-05 23:04 ` [PATCH AUTOSEL 6.1 072/212] ipv6: save dontfrag in cork Sasha Levin
2025-05-05 23:04 ` [PATCH AUTOSEL 6.1 083/212] tcp: bring back NUMA dispersion in inet_ehash_locks_alloc() Sasha Levin
2025-05-05 23:04 ` [PATCH AUTOSEL 6.1 085/212] ieee802154: ca8210: Use proper setters and getters for bitwise types Sasha Levin
2025-05-05 23:04 ` [PATCH AUTOSEL 6.1 090/212] net: phylink: use pl->link_interface in phylink_expects_phy() Sasha Levin
2025-05-05 23:04 ` [PATCH AUTOSEL 6.1 095/212] net: ethernet: ti: cpsw_new: populate netdev of_node Sasha Levin
2025-05-05 23:04 ` [PATCH AUTOSEL 6.1 096/212] net: pktgen: fix mpls maximum labels list parsing Sasha Levin
2025-05-05 23:04 ` [PATCH AUTOSEL 6.1 099/212] ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config() Sasha Levin
2025-05-05 23:04 ` [PATCH AUTOSEL 6.1 114/212] net/mlx5: Avoid report two health errors on same syndrome Sasha Levin
2025-05-05 23:04 ` [PATCH AUTOSEL 6.1 115/212] selftests/net: have `gro.sh -t` return a correct exit code Sasha Levin
2025-05-05 23:04 ` [PATCH AUTOSEL 6.1 118/212] net: ethernet: mtk_ppe_offload: Allow QinQ, double ETH_P_8021Q only Sasha Levin
2025-05-05 23:04 ` [PATCH AUTOSEL 6.1 119/212] net: xgene-v2: remove incorrect ACPI_PTR annotation Sasha Levin
2025-05-05 23:04 ` [PATCH AUTOSEL 6.1 120/212] bonding: report duplicate MAC address in all situations Sasha Levin
2025-05-05 23:05 ` [PATCH AUTOSEL 6.1 140/212] net: pktgen: fix access outside of user given buffer in pktgen_thread_write() Sasha Levin
2025-05-05 23:05 ` [PATCH AUTOSEL 6.1 142/212] bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback Sasha Levin
2025-05-05 23:05 ` [PATCH AUTOSEL 6.1 148/212] eth: mlx4: don't try to complete XDP frames in netpoll Sasha Levin
2025-05-05 23:05 ` Sasha Levin [this message]
2025-05-05 23:05 ` [PATCH AUTOSEL 6.1 153/212] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Sasha Levin
2025-05-05 23:05 ` [PATCH AUTOSEL 6.1 154/212] net/mlx5: Apply rate-limiting to high temperature warning Sasha Levin
2025-05-05 23:05 ` [PATCH AUTOSEL 6.1 166/212] net/mlx4_core: Avoid impossible mlx4_db_alloc() order value Sasha Levin
2025-05-05 23:05 ` [PATCH AUTOSEL 6.1 174/212] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB Sasha Levin
2025-05-05 23:05 ` [PATCH AUTOSEL 6.1 175/212] net/mlx5e: set the tx_queue_len for pfifo_fast Sasha Levin
2025-05-05 23:05 ` [PATCH AUTOSEL 6.1 176/212] net/mlx5e: reduce rep rxq depth to 256 for ECPF Sasha Levin
2025-05-05 23:05 ` [PATCH AUTOSEL 6.1 180/212] ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure() Sasha Levin
2025-05-05 23:05 ` [PATCH AUTOSEL 6.1 181/212] r8152: add vendor/device ID pair for Dell Alienware AW1022z Sasha Levin
2025-05-05 23:05 ` [PATCH AUTOSEL 6.1 187/212] vxlan: Annotate FDB data races Sasha Levin
2025-05-05 23:06 ` [PATCH AUTOSEL 6.1 188/212] r8169: don't scan PHY addresses > 0 Sasha Levin
2025-05-05 23:06 ` [PATCH AUTOSEL 6.1 189/212] net-sysfs: prevent uncleared queues from being re-added Sasha Levin
2025-05-05 23:06 ` [PATCH AUTOSEL 6.1 194/212] ice: count combined queues using Rx/Tx count Sasha Levin
2025-05-06  9:26   ` [Intel-wired-lan] " Loktionov, Aleksandr
2025-05-05 23:06 ` [PATCH AUTOSEL 6.1 195/212] net/mana: fix warning in the writer of client oob Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250505230624.2692522-151-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=gnault@redhat.com \
    --cc=idosch@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=menglong8.dong@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=petrm@nvidia.com \
    --cc=razor@blackwall.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).