* [PATCH net 0/3] mlx5e misc fixes 2025-09-08
@ 2025-09-08 10:07 Tariq Toukan
2025-09-08 10:07 ` [PATCH net 1/3] net/mlx5e: Harden uplink netdev access against device unbind Tariq Toukan
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Tariq Toukan @ 2025-09-08 10:07 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
linux-rdma, linux-kernel, Gal Pressman
Hi,
This patchset provides misc bug fixes from the team to the mlx5 Eth
driver.
Thanks,
Tariq.
Jianbo Liu (2):
net/mlx5e: Harden uplink netdev access against device unbind
net/mlx5e: Prevent entering switchdev mode with inconsistent netns
Lama Kayal (1):
net/mlx5e: Add a miss level for ipsec crypto offload
.../net/ethernet/mellanox/mlx5/core/en/fs.h | 1 +
.../mellanox/mlx5/core/en_accel/ipsec.h | 1 +
.../mellanox/mlx5/core/en_accel/ipsec_fs.c | 3 +-
.../net/ethernet/mellanox/mlx5/core/en_rep.c | 26 ++++++++++++---
.../net/ethernet/mellanox/mlx5/core/esw/qos.c | 1 +
.../mellanox/mlx5/core/eswitch_offloads.c | 33 +++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/fs_core.c | 4 +--
.../ethernet/mellanox/mlx5/core/lib/mlx5.h | 15 ++++++++-
include/linux/mlx5/driver.h | 1 +
9 files changed, 76 insertions(+), 9 deletions(-)
base-commit: e2a10daba84968f6b5777d150985fd7d6abc9c84
--
2.31.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH net 1/3] net/mlx5e: Harden uplink netdev access against device unbind
2025-09-08 10:07 [PATCH net 0/3] mlx5e misc fixes 2025-09-08 Tariq Toukan
@ 2025-09-08 10:07 ` Tariq Toukan
2025-09-10 1:23 ` Jakub Kicinski
2025-09-08 10:07 ` [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns Tariq Toukan
2025-09-08 10:07 ` [PATCH net 3/3] net/mlx5e: Add a miss level for ipsec crypto offload Tariq Toukan
2 siblings, 1 reply; 16+ messages in thread
From: Tariq Toukan @ 2025-09-08 10:07 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
linux-rdma, linux-kernel, Gal Pressman, Jianbo Liu
From: Jianbo Liu <jianbol@nvidia.com>
The function mlx5_uplink_netdev_get() gets the uplink netdevice
pointer from mdev->mlx5e_res.uplink_netdev. However, the netdevice can
be removed and its pointer cleared when unbound from the mlx5_core.eth
driver. This results in a NULL pointer, causing a kernel panic.
BUG: unable to handle page fault for address: 0000000000001300
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 5 UID: 0 PID: 1420 Comm: devlink Not tainted 6.16.0-rc7+ #1 NONE
Hardware name: ...
RIP: 0010:mlx5e_vport_rep_load+0x22a/0x270 [mlx5_core]
Code: ...
RSP: 0018:ffff8881142b78f0 EFLAGS: 00010246
RAX: ffff888104652150 RBX: ffff88810215a6c0 RCX: 0000000000000000
RDX: ffff888104652000 RSI: ffffffffa02b0cc0 RDI: 0000000000000000
RBP: ffff888104652000 R08: ffff8884edaad5c0 R09: 0000000000000000
R10: ffff8881142b78f0 R11: ffff888100716480 R12: ffff88810f1e0000
R13: ffff88810215a6c0 R14: ffff888104652000 R15: ffff8881095301a0
FS: 00007fa008742740(0000) GS:ffff88856a9f6000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000001300 CR3: 0000000106fb6001 CR4: 0000000000372eb0
Call Trace:
<TASK>
mlx5_esw_offloads_rep_load+0x68/0xe0 [mlx5_core]
esw_offloads_enable+0x593/0x910 [mlx5_core]
mlx5_eswitch_enable_locked+0x341/0x420 [mlx5_core]
? is_mp_supported+0x4b/0xa0 [mlx5_core]
mlx5_devlink_eswitch_mode_set+0x17e/0x3a0 [mlx5_core]
devlink_nl_eswitch_set_doit+0x60/0xd0
genl_family_rcv_msg_doit+0xe0/0x130
genl_rcv_msg+0x183/0x290
? devlink_get_from_attrs_lock+0x180/0x180
? devlink_nl_eswitch_get_doit+0x290/0x290
? devlink_nl_pre_doit_port_optional+0x40/0x40
? genl_family_rcv_msg_dumpit+0xf0/0xf0
netlink_rcv_skb+0x4b/0xf0
genl_rcv+0x24/0x40
netlink_unicast+0x255/0x380
? __alloc_skb+0x120/0x1b0
netlink_sendmsg+0x1f3/0x420
__sock_sendmsg+0x38/0x60
__sys_sendto+0x119/0x180
? count_memcg_events+0xec/0x150
? __rseq_handle_notify_resume+0xa9/0x460
? handle_mm_fault+0x12e/0x260
? do_user_addr_fault+0x2a4/0x680
__x64_sys_sendto+0x20/0x30
do_syscall_64+0x53/0x1d0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Ensure the pointer is valid before use by checking it for NULL. If it
is valid, immediately call netdev_hold() to take a reference, and
preventing the netdevice from being freed while it is in use.
Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/en_rep.c | 26 +++++++++++++++----
.../net/ethernet/mellanox/mlx5/core/esw/qos.c | 1 +
.../ethernet/mellanox/mlx5/core/lib/mlx5.h | 15 ++++++++++-
include/linux/mlx5/driver.h | 1 +
4 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 63a7a788fb0d..912887376824 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1506,12 +1506,20 @@ static const struct mlx5e_profile mlx5e_uplink_rep_profile = {
static int
mlx5e_vport_uplink_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep)
{
- struct mlx5e_priv *priv = netdev_priv(mlx5_uplink_netdev_get(dev));
struct mlx5e_rep_priv *rpriv = mlx5e_rep_to_rep_priv(rep);
+ struct net_device *netdev = mlx5_uplink_netdev_get(dev);
+ struct mlx5e_priv *priv;
+ int err;
+
+ if (!netdev)
+ return 0;
+ priv = netdev_priv(netdev);
rpriv->netdev = priv->netdev;
- return mlx5e_netdev_change_profile(priv, &mlx5e_uplink_rep_profile,
- rpriv);
+ err = mlx5e_netdev_change_profile(priv, &mlx5e_uplink_rep_profile,
+ rpriv);
+ mlx5_uplink_netdev_put(dev, netdev);
+ return err;
}
static void
@@ -1638,8 +1646,16 @@ mlx5e_vport_rep_unload(struct mlx5_eswitch_rep *rep)
{
struct mlx5e_rep_priv *rpriv = mlx5e_rep_to_rep_priv(rep);
struct net_device *netdev = rpriv->netdev;
- struct mlx5e_priv *priv = netdev_priv(netdev);
- void *ppriv = priv->ppriv;
+ struct mlx5e_priv *priv;
+ void *ppriv;
+
+ if (!netdev) {
+ ppriv = rpriv;
+ goto free_ppriv;
+ }
+
+ priv = netdev_priv(netdev);
+ ppriv = priv->ppriv;
if (rep->vport == MLX5_VPORT_UPLINK) {
mlx5e_vport_uplink_rep_unload(rpriv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index 8b4977650183..5f2d6c35f1ad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -1515,6 +1515,7 @@ static u32 mlx5_esw_qos_lag_link_speed_get_locked(struct mlx5_core_dev *mdev)
speed = lksettings.base.speed;
out:
+ mlx5_uplink_netdev_put(mdev, slave);
return speed;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
index b111ccd03b02..74ea5da58b7e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
@@ -47,7 +47,20 @@ int mlx5_crdump_collect(struct mlx5_core_dev *dev, u32 *cr_data);
static inline struct net_device *mlx5_uplink_netdev_get(struct mlx5_core_dev *mdev)
{
- return mdev->mlx5e_res.uplink_netdev;
+ struct mlx5e_resources *mlx5e_res = &mdev->mlx5e_res;
+ struct net_device *netdev;
+
+ mutex_lock(&mlx5e_res->uplink_netdev_lock);
+ netdev = mlx5e_res->uplink_netdev;
+ netdev_hold(netdev, &mlx5e_res->tracker, GFP_KERNEL);
+ mutex_unlock(&mlx5e_res->uplink_netdev_lock);
+ return netdev;
+}
+
+static inline void mlx5_uplink_netdev_put(struct mlx5_core_dev *mdev,
+ struct net_device *netdev)
+{
+ netdev_put(netdev, &mdev->mlx5e_res.tracker);
}
struct mlx5_sd;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 8c5fbfb85749..10fe492e1fed 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -663,6 +663,7 @@ struct mlx5e_resources {
bool tisn_valid;
} hw_objs;
struct net_device *uplink_netdev;
+ netdevice_tracker tracker;
struct mutex uplink_netdev_lock;
struct mlx5_crypto_dek_priv *dek_priv;
};
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns
2025-09-08 10:07 [PATCH net 0/3] mlx5e misc fixes 2025-09-08 Tariq Toukan
2025-09-08 10:07 ` [PATCH net 1/3] net/mlx5e: Harden uplink netdev access against device unbind Tariq Toukan
@ 2025-09-08 10:07 ` Tariq Toukan
2025-09-10 1:23 ` Jakub Kicinski
2025-09-08 10:07 ` [PATCH net 3/3] net/mlx5e: Add a miss level for ipsec crypto offload Tariq Toukan
2 siblings, 1 reply; 16+ messages in thread
From: Tariq Toukan @ 2025-09-08 10:07 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
linux-rdma, linux-kernel, Gal Pressman, Jianbo Liu
From: Jianbo Liu <jianbol@nvidia.com>
When a PF enters switchdev mode, its netdevice becomes the uplink
representor but remains in its current network namespace. All other
representors (VFs, SFs) are created in the netns of the devlink
instance.
If the PF's netns has been moved and differs from the devlink's netns,
enabling switchdev mode would create an invalid state where
representors and PF exist in different namespaces.
To prevent this inconsistent configuration, block the request to enter
switchdev mode if the PF netdevice's netns does not match the netns of
its devlink instance.
As part of this change, the PF's netns is first marked as immutable.
This prevents race conditions where the netns could be changed after
the check is performed but before the mode transition is complete, and
it aligns the PF's behavior with that of the final uplink representor.
Fixes: 71c6eaebf06a ("net/mlx5e: Set netdev name space on creation")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../mellanox/mlx5/core/eswitch_offloads.c | 33 +++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index bee906661282..b204ed459760 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -3739,6 +3739,29 @@ void mlx5_eswitch_unblock_mode(struct mlx5_core_dev *dev)
up_write(&esw->mode_lock);
}
+/* Returns false only when uplink netdev exists and its netns is different from
+ * devlink's netns. True for all others so entering switchdev mode is allowed.
+ */
+static bool mlx5_devlink_netdev_netns_immutable_set(struct devlink *devlink,
+ bool immutable)
+{
+ struct mlx5_core_dev *mdev = devlink_priv(devlink);
+ struct net_device *netdev;
+ bool ret;
+
+ netdev = mlx5_uplink_netdev_get(mdev);
+ if (!netdev)
+ return true;
+
+ rtnl_lock();
+ netdev->netns_immutable = immutable;
+ ret = net_eq(dev_net(netdev), devlink_net(devlink));
+ rtnl_unlock();
+
+ mlx5_uplink_netdev_put(mdev, netdev);
+ return ret;
+}
+
int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
struct netlink_ext_ack *extack)
{
@@ -3781,6 +3804,14 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
esw->eswitch_operation_in_progress = true;
up_write(&esw->mode_lock);
+ if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV &&
+ !mlx5_devlink_netdev_netns_immutable_set(devlink, true)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Can't change E-Switch mode to switchdev when netdev net namespace has diverged from the devlink's.");
+ err = -EINVAL;
+ goto skip;
+ }
+
if (mode == DEVLINK_ESWITCH_MODE_LEGACY)
esw->dev->priv.flags |= MLX5_PRIV_FLAGS_SWITCH_LEGACY;
mlx5_eswitch_disable_locked(esw);
@@ -3799,6 +3830,8 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
}
skip:
+ if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV && err)
+ mlx5_devlink_netdev_netns_immutable_set(devlink, false);
down_write(&esw->mode_lock);
esw->eswitch_operation_in_progress = false;
unlock:
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net 3/3] net/mlx5e: Add a miss level for ipsec crypto offload
2025-09-08 10:07 [PATCH net 0/3] mlx5e misc fixes 2025-09-08 Tariq Toukan
2025-09-08 10:07 ` [PATCH net 1/3] net/mlx5e: Harden uplink netdev access against device unbind Tariq Toukan
2025-09-08 10:07 ` [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns Tariq Toukan
@ 2025-09-08 10:07 ` Tariq Toukan
2 siblings, 0 replies; 16+ messages in thread
From: Tariq Toukan @ 2025-09-08 10:07 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
linux-rdma, linux-kernel, Gal Pressman, Lama Kayal, Jianbo Liu,
Chris Mi
From: Lama Kayal <lkayal@nvidia.com>
The cited commit adds a miss table for switchdev mode. But it
uses the same level as policy table. Will hit the following error
when running command:
# ip xfrm state add src 192.168.1.22 dst 192.168.1.21 proto \
esp spi 1001 reqid 10001 aead 'rfc4106(gcm(aes))' \
0x3a189a7f9374955d3817886c8587f1da3df387ff 128 \
mode tunnel offload dev enp8s0f0 dir in
Error: mlx5_core: Device failed to offload this state.
The dmesg error is:
mlx5_core 0000:03:00.0: ipsec_miss_create:578:(pid 311797): fail to create IPsec miss_rule err=-22
Fix it by adding a new miss level to avoid the error.
Fixes: 7d9e292ecd67 ("net/mlx5e: Move IPSec policy check after decryption")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en/fs.h | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c | 3 ++-
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 4 ++--
4 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
index 9560fcba643f..ac65e3191480 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
@@ -92,6 +92,7 @@ enum {
MLX5E_ACCEL_FS_ESP_FT_LEVEL = MLX5E_INNER_TTC_FT_LEVEL + 1,
MLX5E_ACCEL_FS_ESP_FT_ERR_LEVEL,
MLX5E_ACCEL_FS_POL_FT_LEVEL,
+ MLX5E_ACCEL_FS_POL_MISS_FT_LEVEL,
MLX5E_ACCEL_FS_ESP_FT_ROCE_LEVEL,
#endif
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
index ffcd0cdeb775..23703f28386a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
@@ -185,6 +185,7 @@ struct mlx5e_ipsec_rx_create_attr {
u32 family;
int prio;
int pol_level;
+ int pol_miss_level;
int sa_level;
int status_level;
enum mlx5_flow_namespace_type chains_ns;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
index 98b6a3a623f9..65dc3529283b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
@@ -747,6 +747,7 @@ static void ipsec_rx_create_attr_set(struct mlx5e_ipsec *ipsec,
attr->family = family;
attr->prio = MLX5E_NIC_PRIO;
attr->pol_level = MLX5E_ACCEL_FS_POL_FT_LEVEL;
+ attr->pol_miss_level = MLX5E_ACCEL_FS_POL_MISS_FT_LEVEL;
attr->sa_level = MLX5E_ACCEL_FS_ESP_FT_LEVEL;
attr->status_level = MLX5E_ACCEL_FS_ESP_FT_ERR_LEVEL;
attr->chains_ns = MLX5_FLOW_NAMESPACE_KERNEL;
@@ -833,7 +834,7 @@ static int ipsec_rx_chains_create_miss(struct mlx5e_ipsec *ipsec,
ft_attr.max_fte = 1;
ft_attr.autogroup.max_num_groups = 1;
- ft_attr.level = attr->pol_level;
+ ft_attr.level = attr->pol_miss_level;
ft_attr.prio = attr->prio;
ft = mlx5_create_auto_grouped_flow_table(attr->ns, &ft_attr);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index cb165085a4c1..db552c012b4f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -114,9 +114,9 @@
#define ETHTOOL_NUM_PRIOS 11
#define ETHTOOL_MIN_LEVEL (KERNEL_MIN_LEVEL + ETHTOOL_NUM_PRIOS)
/* Vlan, mac, ttc, inner ttc, {UDP/ANY/aRFS/accel/{esp, esp_err}}, IPsec policy,
- * {IPsec RoCE MPV,Alias table},IPsec RoCE policy
+ * IPsec policy miss, {IPsec RoCE MPV,Alias table},IPsec RoCE policy
*/
-#define KERNEL_NIC_PRIO_NUM_LEVELS 10
+#define KERNEL_NIC_PRIO_NUM_LEVELS 11
#define KERNEL_NIC_NUM_PRIOS 1
/* One more level for tc, and one more for promisc */
#define KERNEL_MIN_LEVEL (KERNEL_NIC_PRIO_NUM_LEVELS + 2)
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns
2025-09-08 10:07 ` [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns Tariq Toukan
@ 2025-09-10 1:23 ` Jakub Kicinski
2025-09-10 3:01 ` Jianbo Liu
0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2025-09-10 1:23 UTC (permalink / raw)
To: Tariq Toukan
Cc: Eric Dumazet, Paolo Abeni, Andrew Lunn, David S. Miller,
Saeed Mahameed, Leon Romanovsky, Mark Bloch, netdev, linux-rdma,
linux-kernel, Gal Pressman, Jianbo Liu
On Mon, 8 Sep 2025 13:07:05 +0300 Tariq Toukan wrote:
> If the PF's netns has been moved and differs from the devlink's netns,
> enabling switchdev mode would create an invalid state where
> representors and PF exist in different namespaces.
>
> To prevent this inconsistent configuration,
Could you explain clearly what is the problem with having different
netdevs in different namespaces? From networking perspective it really
doesn't matter.
--
pw-bot: cr
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 1/3] net/mlx5e: Harden uplink netdev access against device unbind
2025-09-08 10:07 ` [PATCH net 1/3] net/mlx5e: Harden uplink netdev access against device unbind Tariq Toukan
@ 2025-09-10 1:23 ` Jakub Kicinski
2025-09-10 3:23 ` Jianbo Liu
0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2025-09-10 1:23 UTC (permalink / raw)
To: Tariq Toukan
Cc: Eric Dumazet, Paolo Abeni, Andrew Lunn, David S. Miller,
Saeed Mahameed, Leon Romanovsky, Mark Bloch, netdev, linux-rdma,
linux-kernel, Gal Pressman, Jianbo Liu
On Mon, 8 Sep 2025 13:07:04 +0300 Tariq Toukan wrote:
> + struct net_device *netdev = mlx5_uplink_netdev_get(dev);
> + struct mlx5e_priv *priv;
> + int err;
> +
> + if (!netdev)
> + return 0;
Please don't call in variable init functions which require cleanup
or error checking.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns
2025-09-10 1:23 ` Jakub Kicinski
@ 2025-09-10 3:01 ` Jianbo Liu
2025-09-11 0:48 ` Jakub Kicinski
0 siblings, 1 reply; 16+ messages in thread
From: Jianbo Liu @ 2025-09-10 3:01 UTC (permalink / raw)
To: Jakub Kicinski, Tariq Toukan
Cc: Eric Dumazet, Paolo Abeni, Andrew Lunn, David S. Miller,
Saeed Mahameed, Leon Romanovsky, Mark Bloch, netdev, linux-rdma,
linux-kernel, Gal Pressman
On 9/10/2025 9:23 AM, Jakub Kicinski wrote:
> On Mon, 8 Sep 2025 13:07:05 +0300 Tariq Toukan wrote:
>> If the PF's netns has been moved and differs from the devlink's netns,
>> enabling switchdev mode would create an invalid state where
>> representors and PF exist in different namespaces.
>>
>> To prevent this inconsistent configuration,
>
> Could you explain clearly what is the problem with having different
> netdevs in different namespaces? From networking perspective it really
> doesn't matter.
There is a requirement from customer who wants to manage openvswitch in
a container. But he can't complete the steps (changing eswitch and
configuring OVS) in the container if the netns are different.
Besides, ibdev is dependent on netdev, there is refcnt issue if netdev
is moved to other netns but devlink netns is not changed by "devlink dev
reload netns" command.
Thanks!
Jianbo
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 1/3] net/mlx5e: Harden uplink netdev access against device unbind
2025-09-10 1:23 ` Jakub Kicinski
@ 2025-09-10 3:23 ` Jianbo Liu
2025-09-11 0:45 ` Jakub Kicinski
0 siblings, 1 reply; 16+ messages in thread
From: Jianbo Liu @ 2025-09-10 3:23 UTC (permalink / raw)
To: Jakub Kicinski, Tariq Toukan
Cc: Eric Dumazet, Paolo Abeni, Andrew Lunn, David S. Miller,
Saeed Mahameed, Leon Romanovsky, Mark Bloch, netdev, linux-rdma,
linux-kernel, Gal Pressman
On 9/10/2025 9:23 AM, Jakub Kicinski wrote:
> On Mon, 8 Sep 2025 13:07:04 +0300 Tariq Toukan wrote:
>> + struct net_device *netdev = mlx5_uplink_netdev_get(dev);
>> + struct mlx5e_priv *priv;
>> + int err;
>> +
>> + if (!netdev)
>> + return 0;
>
> Please don't call in variable init functions which require cleanup
> or error checking.
But in this function, a NULL return from mlx5_uplink_netdev_get is a
valid condition where it should simply return 0. No cleanup or error
check is needed.
Thanks!
Jianbo
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 1/3] net/mlx5e: Harden uplink netdev access against device unbind
2025-09-10 3:23 ` Jianbo Liu
@ 2025-09-11 0:45 ` Jakub Kicinski
2025-09-11 7:09 ` Jianbo Liu
0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2025-09-11 0:45 UTC (permalink / raw)
To: Jianbo Liu
Cc: Tariq Toukan, Eric Dumazet, Paolo Abeni, Andrew Lunn,
David S. Miller, Saeed Mahameed, Leon Romanovsky, Mark Bloch,
netdev, linux-rdma, linux-kernel, Gal Pressman
On Wed, 10 Sep 2025 11:23:09 +0800 Jianbo Liu wrote:
> On 9/10/2025 9:23 AM, Jakub Kicinski wrote:
> > On Mon, 8 Sep 2025 13:07:04 +0300 Tariq Toukan wrote:
> >> + struct net_device *netdev = mlx5_uplink_netdev_get(dev);
> >> + struct mlx5e_priv *priv;
> >> + int err;
> >> +
> >> + if (!netdev)
> >> + return 0;
> >
> > Please don't call in variable init functions which require cleanup
> > or error checking.
>
> But in this function, a NULL return from mlx5_uplink_netdev_get is a
> valid condition where it should simply return 0. No cleanup or error
> check is needed.
You have to check if it succeeded, and if so, you need to clean up
later. Do no hide meaningful code in variable init.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns
2025-09-10 3:01 ` Jianbo Liu
@ 2025-09-11 0:48 ` Jakub Kicinski
2025-09-11 7:48 ` Jianbo Liu
0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2025-09-11 0:48 UTC (permalink / raw)
To: Jianbo Liu
Cc: Tariq Toukan, Eric Dumazet, Paolo Abeni, Andrew Lunn,
David S. Miller, Saeed Mahameed, Leon Romanovsky, Mark Bloch,
netdev, linux-rdma, linux-kernel, Gal Pressman
On Wed, 10 Sep 2025 11:01:18 +0800 Jianbo Liu wrote:
> On 9/10/2025 9:23 AM, Jakub Kicinski wrote:
> > On Mon, 8 Sep 2025 13:07:05 +0300 Tariq Toukan wrote:
> >> If the PF's netns has been moved and differs from the devlink's netns,
> >> enabling switchdev mode would create an invalid state where
> >> representors and PF exist in different namespaces.
> >>
> >> To prevent this inconsistent configuration,
> >
> > Could you explain clearly what is the problem with having different
> > netdevs in different namespaces? From networking perspective it really
> > doesn't matter.
>
> There is a requirement from customer who wants to manage openvswitch in
> a container. But he can't complete the steps (changing eswitch and
> configuring OVS) in the container if the netns are different.
You're preventing a configuration which you think is "bad" (for a
reason unknown). How is _rejecting_ a config enabling you to fulfill
some "customer requirement" which sounds like having all interfaces
in a separate ns?
> Besides, ibdev is dependent on netdev, there is refcnt issue if netdev
> is moved to other netns but devlink netns is not changed by "devlink dev
> reload netns" command.
shrug
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 1/3] net/mlx5e: Harden uplink netdev access against device unbind
2025-09-11 0:45 ` Jakub Kicinski
@ 2025-09-11 7:09 ` Jianbo Liu
2025-09-12 11:07 ` Przemek Kitszel
0 siblings, 1 reply; 16+ messages in thread
From: Jianbo Liu @ 2025-09-11 7:09 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Tariq Toukan, Eric Dumazet, Paolo Abeni, Andrew Lunn,
David S. Miller, Saeed Mahameed, Leon Romanovsky, Mark Bloch,
netdev, linux-rdma, linux-kernel, Gal Pressman
On 9/11/2025 8:45 AM, Jakub Kicinski wrote:
> On Wed, 10 Sep 2025 11:23:09 +0800 Jianbo Liu wrote:
>> On 9/10/2025 9:23 AM, Jakub Kicinski wrote:
>>> On Mon, 8 Sep 2025 13:07:04 +0300 Tariq Toukan wrote:
>>>> + struct net_device *netdev = mlx5_uplink_netdev_get(dev);
>>>> + struct mlx5e_priv *priv;
>>>> + int err;
>>>> +
>>>> + if (!netdev)
>>>> + return 0;
>>>
>>> Please don't call in variable init functions which require cleanup
>>> or error checking.
>>
>> But in this function, a NULL return from mlx5_uplink_netdev_get is a
>> valid condition where it should simply return 0. No cleanup or error
>> check is needed.
>
> You have to check if it succeeded, and if so, you need to clean up
> later. Do no hide meaningful code in variable init.
My focus was on the NULL case, but I see now that the real issue is
ensuring the corresponding cleanup (_put) happens on the successful
path. Hiding the _get call in the initializer makes that less clear.
I will refactor the code to follow the correct pattern, like this:
struct net_device *netdev;
netdev = mlx5_uplink_netdev_get(dev);
if (!netdev)
return 0;
Thank you for the explanation.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns
2025-09-11 0:48 ` Jakub Kicinski
@ 2025-09-11 7:48 ` Jianbo Liu
2025-09-12 0:11 ` Jakub Kicinski
0 siblings, 1 reply; 16+ messages in thread
From: Jianbo Liu @ 2025-09-11 7:48 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Tariq Toukan, Eric Dumazet, Paolo Abeni, Andrew Lunn,
David S. Miller, Saeed Mahameed, Leon Romanovsky, Mark Bloch,
netdev, linux-rdma, linux-kernel, Gal Pressman
On 9/11/2025 8:48 AM, Jakub Kicinski wrote:
> On Wed, 10 Sep 2025 11:01:18 +0800 Jianbo Liu wrote:
>> On 9/10/2025 9:23 AM, Jakub Kicinski wrote:
>>> On Mon, 8 Sep 2025 13:07:05 +0300 Tariq Toukan wrote:
>>>> If the PF's netns has been moved and differs from the devlink's netns,
>>>> enabling switchdev mode would create an invalid state where
>>>> representors and PF exist in different namespaces.
>>>>
>>>> To prevent this inconsistent configuration,
>>>
>>> Could you explain clearly what is the problem with having different
>>> netdevs in different namespaces? From networking perspective it really
>>> doesn't matter.
>>
>> There is a requirement from customer who wants to manage openvswitch in
>> a container. But he can't complete the steps (changing eswitch and
>> configuring OVS) in the container if the netns are different.
>
> You're preventing a configuration which you think is "bad" (for a
> reason unknown). How is _rejecting_ a config enabling you to fulfill
> some "customer requirement" which sounds like having all interfaces
> in a separate ns?
>
My apologies, I wasn't clear. The problem is specific to the OVS control
plane. ovs-vsctl cannot manage the switch if the PF uplink and VF
representors are in different namespaces. When the PF is in a container
while the devlink instance is bound to the host, enabling switchdev
creates this exact split: the PF uplink stays in the container, while
the VF representors are created on the host.
Our patch prevents this broken state by requiring the devlink namespace
to be set to the container's namespace before switchdev can be enabled.
>> Besides, ibdev is dependent on netdev, there is refcnt issue if netdev
>> is moved to other netns but devlink netns is not changed by "devlink dev
>> reload netns" command.
>
> shrug
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns
2025-09-11 7:48 ` Jianbo Liu
@ 2025-09-12 0:11 ` Jakub Kicinski
2025-09-12 1:12 ` Jianbo Liu
0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2025-09-12 0:11 UTC (permalink / raw)
To: Jianbo Liu
Cc: Tariq Toukan, Eric Dumazet, Paolo Abeni, Andrew Lunn,
David S. Miller, Saeed Mahameed, Leon Romanovsky, Mark Bloch,
netdev, linux-rdma, linux-kernel, Gal Pressman
On Thu, 11 Sep 2025 15:48:24 +0800 Jianbo Liu wrote:
> >> There is a requirement from customer who wants to manage openvswitch in
> >> a container. But he can't complete the steps (changing eswitch and
> >> configuring OVS) in the container if the netns are different.
> >
> > You're preventing a configuration which you think is "bad" (for a
> > reason unknown). How is _rejecting_ a config enabling you to fulfill
> > some "customer requirement" which sounds like having all interfaces
> > in a separate ns?
>
> My apologies, I wasn't clear. The problem is specific to the OVS control
> plane. ovs-vsctl cannot manage the switch if the PF uplink and VF
> representors are in different namespaces. When the PF is in a container
> while the devlink instance is bound to the host, enabling switchdev
> creates this exact split: the PF uplink stays in the container, while
> the VF representors are created on the host.
So you're saying the user can mess up the configuration in a way that'd
prevent them from using OVS. No strong objection to the patch (assuming
commit message is improved), but I don't see how this is a fix.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns
2025-09-12 0:11 ` Jakub Kicinski
@ 2025-09-12 1:12 ` Jianbo Liu
0 siblings, 0 replies; 16+ messages in thread
From: Jianbo Liu @ 2025-09-12 1:12 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Tariq Toukan, Eric Dumazet, Paolo Abeni, Andrew Lunn,
David S. Miller, Saeed Mahameed, Leon Romanovsky, Mark Bloch,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, Gal Pressman
On 9/12/2025 8:11 AM, Jakub Kicinski wrote:
> On Thu, 11 Sep 2025 15:48:24 +0800 Jianbo Liu wrote:
>>>> There is a requirement from customer who wants to manage openvswitch in
>>>> a container. But he can't complete the steps (changing eswitch and
>>>> configuring OVS) in the container if the netns are different.
>>>
>>> You're preventing a configuration which you think is "bad" (for a
>>> reason unknown). How is _rejecting_ a config enabling you to fulfill
>>> some "customer requirement" which sounds like having all interfaces
>>> in a separate ns?
>>
>> My apologies, I wasn't clear. The problem is specific to the OVS control
>> plane. ovs-vsctl cannot manage the switch if the PF uplink and VF
>> representors are in different namespaces. When the PF is in a container
>> while the devlink instance is bound to the host, enabling switchdev
>> creates this exact split: the PF uplink stays in the container, while
>> the VF representors are created on the host.
>
> So you're saying the user can mess up the configuration in a way that'd
> prevent them from using OVS. No strong objection to the patch (assuming
> commit message is improved), but I don't see how this is a fix.
Yes. We are preventing a configuration that breaks the OVS control plane
for this specific use case. Thank you for the review. I will update the
commit message.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 1/3] net/mlx5e: Harden uplink netdev access against device unbind
2025-09-11 7:09 ` Jianbo Liu
@ 2025-09-12 11:07 ` Przemek Kitszel
2025-09-15 0:59 ` Jianbo Liu
0 siblings, 1 reply; 16+ messages in thread
From: Przemek Kitszel @ 2025-09-12 11:07 UTC (permalink / raw)
To: Jianbo Liu
Cc: Tariq Toukan, Eric Dumazet, Paolo Abeni, Andrew Lunn,
David S. Miller, Saeed Mahameed, Leon Romanovsky, Mark Bloch,
netdev, linux-rdma, linux-kernel, Gal Pressman, Jakub Kicinski
On 9/11/25 09:09, Jianbo Liu wrote:
>
>
> On 9/11/2025 8:45 AM, Jakub Kicinski wrote:
>> On Wed, 10 Sep 2025 11:23:09 +0800 Jianbo Liu wrote:
>>> On 9/10/2025 9:23 AM, Jakub Kicinski wrote:
>>>> On Mon, 8 Sep 2025 13:07:04 +0300 Tariq Toukan wrote:
>>>>> + struct net_device *netdev = mlx5_uplink_netdev_get(dev);
>>>>> + struct mlx5e_priv *priv;
>>>>> + int err;
>>>>> +
>>>>> + if (!netdev)
>>>>> + return 0;
>>>>
>>>> Please don't call in variable init functions which require cleanup
>>>> or error checking.
>>>
>>> But in this function, a NULL return from mlx5_uplink_netdev_get is a
>>> valid condition where it should simply return 0. No cleanup or error
>>> check is needed.
>>
>> You have to check if it succeeded, and if so, you need to clean up
>> later. Do no hide meaningful code in variable init.
>
> My focus was on the NULL case, but I see now that the real issue is
> ensuring the corresponding cleanup (_put) happens on the successful
> path. Hiding the _get call in the initializer makes that less clear.
>
> I will refactor the code to follow the correct pattern, like this:
>
> struct net_device *netdev;
>
> netdev = mlx5_uplink_netdev_get(dev);
> if (!netdev)
> return 0;
>
> Thank you for the explanation.
>
that would be much better, and make it obvious that there is
matched get() and put() calls
would be also great to minify stacktrace
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net 1/3] net/mlx5e: Harden uplink netdev access against device unbind
2025-09-12 11:07 ` Przemek Kitszel
@ 2025-09-15 0:59 ` Jianbo Liu
0 siblings, 0 replies; 16+ messages in thread
From: Jianbo Liu @ 2025-09-15 0:59 UTC (permalink / raw)
To: Przemek Kitszel
Cc: Tariq Toukan, Eric Dumazet, Paolo Abeni, Andrew Lunn,
David S. Miller, Saeed Mahameed, Leon Romanovsky, Mark Bloch,
netdev, linux-rdma, linux-kernel, Gal Pressman, Jakub Kicinski
On 9/12/2025 7:07 PM, Przemek Kitszel wrote:
> On 9/11/25 09:09, Jianbo Liu wrote:
>>
>>
>> On 9/11/2025 8:45 AM, Jakub Kicinski wrote:
>>> On Wed, 10 Sep 2025 11:23:09 +0800 Jianbo Liu wrote:
>>>> On 9/10/2025 9:23 AM, Jakub Kicinski wrote:
>>>>> On Mon, 8 Sep 2025 13:07:04 +0300 Tariq Toukan wrote:
>>>>>> + struct net_device *netdev = mlx5_uplink_netdev_get(dev);
>>>>>> + struct mlx5e_priv *priv;
>>>>>> + int err;
>>>>>> +
>>>>>> + if (!netdev)
>>>>>> + return 0;
>>>>>
>>>>> Please don't call in variable init functions which require cleanup
>>>>> or error checking.
>>>>
>>>> But in this function, a NULL return from mlx5_uplink_netdev_get is a
>>>> valid condition where it should simply return 0. No cleanup or error
>>>> check is needed.
>>>
>>> You have to check if it succeeded, and if so, you need to clean up
>>> later. Do no hide meaningful code in variable init.
>>
>> My focus was on the NULL case, but I see now that the real issue is
>> ensuring the corresponding cleanup (_put) happens on the successful
>> path. Hiding the _get call in the initializer makes that less clear.
>>
>> I will refactor the code to follow the correct pattern, like this:
>>
>> struct net_device *netdev;
>>
>> netdev = mlx5_uplink_netdev_get(dev);
>> if (!netdev)
>> return 0;
>>
>> Thank you for the explanation.
>>
>
> that would be much better, and make it obvious that there is
> matched get() and put() calls
>
> would be also great to minify stacktrace
> https://www.kernel.org/doc/html/latest/process/submitting-
> patches.html#backtraces-in-commit-messages
>
Got it. I'll minify the stack trace. Thanks for the tip.
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-09-15 1:00 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-08 10:07 [PATCH net 0/3] mlx5e misc fixes 2025-09-08 Tariq Toukan
2025-09-08 10:07 ` [PATCH net 1/3] net/mlx5e: Harden uplink netdev access against device unbind Tariq Toukan
2025-09-10 1:23 ` Jakub Kicinski
2025-09-10 3:23 ` Jianbo Liu
2025-09-11 0:45 ` Jakub Kicinski
2025-09-11 7:09 ` Jianbo Liu
2025-09-12 11:07 ` Przemek Kitszel
2025-09-15 0:59 ` Jianbo Liu
2025-09-08 10:07 ` [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns Tariq Toukan
2025-09-10 1:23 ` Jakub Kicinski
2025-09-10 3:01 ` Jianbo Liu
2025-09-11 0:48 ` Jakub Kicinski
2025-09-11 7:48 ` Jianbo Liu
2025-09-12 0:11 ` Jakub Kicinski
2025-09-12 1:12 ` Jianbo Liu
2025-09-08 10:07 ` [PATCH net 3/3] net/mlx5e: Add a miss level for ipsec crypto offload Tariq Toukan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).