* [PATCH net-next 0/3] devlink eswitch active/inactive state
@ 2025-10-16 1:36 Saeed Mahameed
2025-10-16 1:36 ` [PATCH net-next 1/3] devlink: Introduce devlink eswitch state Saeed Mahameed
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Saeed Mahameed @ 2025-10-16 1:36 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
Leon Romanovsky, mbloch, Adithya Jayachandran, Jiri Pirko
From: Saeed Mahameed <saeedm@nvidia.com>
Before having traffic flow through an eswitch, a user may want to have the
ability to block traffic towards the FDB until FDB is fully programmed and the
user is ready to send traffic to it. For example: when two eswitches are present
for vports in a multi-PF setup, one eswitch may take over the traffic from the
other when the user chooses. Before this take over, a user may want to first
program the inactive eswitch and then once ready redirect traffic to this new
eswitch.
This series introduces a user-configurable state for an eswitch that allows
dynamically switching between active and inactive states. When inactive, traffic
does not flow through the eswitch. While inactive, steering pipeline
configuration can be done (e.g. adding TC rules, discovering representors,
enabling the desired SDN modes such as bridge/OVS/DPDK/etc). Once configuration
is completed, a user can set the eswitch state to active and have traffic flow
through. This allows admins to upgrade forwarding pipeline rules with very
minimal downtime and packet drops.
A user can start the eswitch in switchdev mode in either active or inactive
state. To preserve backwards compatibility, the default state is active.
Active: Traffic is enabled on this eswitch FDB.
Inactive: Traffic is ignored/dropped on this eswitch FDB.
The following commands start the eswitch in the active state:
1. Default is active (backward compatible)
$ devlink dev eswitch set pci/0000:08:00.1 mode switchdev
2. Explicitly set the state
$ devlink dev eswitch set pci/0000:08:00.1 mode switchdev state active
To bring up the esw in 'inactive' state:
$ devlink dev eswitch set pci/0000:08:00.1 mode switchdev state inactive
Querying the eswitch also reports its state:
$ devlink dev eswitch show pci/0000:01:01.0
pci/0000:01:01.0: mode switchdev inline-mode none encap-mode basic state inactive
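The handover flow described above can be sketched as a small user-space model. This is only an illustration of the intended semantics, not code from the series: `struct esw_model`, `esw_add_rule()`, `esw_forwards()` and `esw_handover()` are hypothetical names, and the enum values merely mirror the order of the proposed uAPI enum (inactive = 0, active = 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
enum esw_state { ESW_INACTIVE = 0, ESW_ACTIVE = 1 };

struct esw_model {
	enum esw_state state;
	size_t nr_rules; /* rules programmed into this eswitch's FDB */
};

/* Rules can be programmed in either state; that is the point of "inactive". */
static void esw_add_rule(struct esw_model *esw)
{
	esw->nr_rules++;
}

/* Traffic is forwarded only by an active eswitch that has rules to match. */
static bool esw_forwards(const struct esw_model *esw)
{
	return esw->state == ESW_ACTIVE && esw->nr_rules > 0;
}

/* Move traffic from one eswitch to an already programmed one. */
static void esw_handover(struct esw_model *from, struct esw_model *to)
{
	from->state = ESW_INACTIVE;
	to->state = ESW_ACTIVE;
}
```

In this model the new eswitch is programmed with `esw_add_rule()` while still inactive, so the only step that affects traffic is the final `esw_handover()` call.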
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Parav Pandit (1):
devlink: Introduce devlink eswitch state
Saeed Mahameed (2):
net/mlx5: MPFS, add support for dynamic enable/disable
net/mlx5: E-Switch, support eswitch state
Documentation/netlink/specs/devlink.yaml | 13 ++
.../devlink/devlink-eswitch-attr.rst | 15 ++
.../net/ethernet/mellanox/mlx5/core/devlink.c | 2 +
.../mellanox/mlx5/core/esw/adj_vport.c | 15 +-
.../net/ethernet/mellanox/mlx5/core/eswitch.c | 1 +
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 12 ++
.../mellanox/mlx5/core/eswitch_offloads.c | 157 ++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/fs_core.c | 5 +
.../ethernet/mellanox/mlx5/core/lib/mpfs.c | 111 +++++++++++--
.../ethernet/mellanox/mlx5/core/lib/mpfs.h | 9 +
include/linux/mlx5/fs.h | 1 +
include/net/devlink.h | 5 +
include/uapi/linux/devlink.h | 7 +
net/devlink/dev.c | 30 ++++
net/devlink/netlink_gen.c | 5 +-
15 files changed, 358 insertions(+), 30 deletions(-)
--
2.51.0
* [PATCH net-next 1/3] devlink: Introduce devlink eswitch state
2025-10-16 1:36 [PATCH net-next 0/3] devlink eswitch active/inactive state Saeed Mahameed
@ 2025-10-16 1:36 ` Saeed Mahameed
2025-10-16 9:16 ` Jiri Pirko
2025-10-16 1:36 ` [PATCH net-next 2/3] net/mlx5: MPFS, add support for dynamic enable/disable Saeed Mahameed
2025-10-16 1:36 ` [PATCH net-next 3/3] net/mlx5: E-Switch, support eswitch state Saeed Mahameed
2 siblings, 1 reply; 11+ messages in thread
From: Saeed Mahameed @ 2025-10-16 1:36 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
Leon Romanovsky, mbloch, Parav Pandit, Adithya Jayachandran
From: Parav Pandit <parav@nvidia.com>
Introduce a new state to eswitch (active/inactive) and
enable user to set it dynamically.
A user can start the eswitch in switchdev mode in either active or
inactive state.
Active: Traffic is enabled on this eswitch FDB.
Inactive: Traffic is ignored/dropped on this eswitch FDB.
The following commands start the eswitch in the active state:
1. devlink dev eswitch set pci/0000:08:00.1 mode switchdev
(default is active, backward compatible)
2. devlink dev eswitch set pci/0000:08:00.1 mode switchdev state
active
To bring up the esw in 'inactive' state:
devlink dev eswitch set pci/0000:08:00.1 mode switchdev state
inactive
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
---
Documentation/netlink/specs/devlink.yaml | 13 ++++++++
.../devlink/devlink-eswitch-attr.rst | 15 ++++++++++
include/net/devlink.h | 5 ++++
include/uapi/linux/devlink.h | 7 +++++
net/devlink/dev.c | 30 +++++++++++++++++++
net/devlink/netlink_gen.c | 5 ++--
6 files changed, 73 insertions(+), 2 deletions(-)
diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
index 3db59c965869..4242a3431320 100644
--- a/Documentation/netlink/specs/devlink.yaml
+++ b/Documentation/netlink/specs/devlink.yaml
@@ -119,6 +119,14 @@ definitions:
name: none
-
name: basic
+ -
+ type: enum
+ name: eswitch-state
+ entries:
+ -
+ name: inactive
+ -
+ name: active
-
type: enum
name: dpipe-header-id
@@ -857,6 +865,10 @@ attribute-sets:
name: health-reporter-burst-period
type: u64
doc: Time (in msec) for recoveries before starting the grace period.
+ -
+ name: eswitch-state
+ type: u8
+ enum: eswitch-state
-
name: dl-dev-stats
subset-of: devlink
@@ -1609,6 +1621,7 @@ operations:
- eswitch-mode
- eswitch-inline-mode
- eswitch-encap-mode
+ - eswitch-state
-
name: eswitch-set
diff --git a/Documentation/networking/devlink/devlink-eswitch-attr.rst b/Documentation/networking/devlink/devlink-eswitch-attr.rst
index 08bb39ab1528..13ad1ed300ee 100644
--- a/Documentation/networking/devlink/devlink-eswitch-attr.rst
+++ b/Documentation/networking/devlink/devlink-eswitch-attr.rst
@@ -57,6 +57,18 @@ The following is a list of E-Switch attributes.
* ``none`` Disable encapsulation support.
* ``basic`` Enable encapsulation support.
+ * - ``state``
+ - enum
+ - The state of the E-Switch.
+ In situations where the user wants to bring up the e-switch, they want to
+ have the ability to block traffic towards the FDB until FDB is fully
+ programmed.
+ The state can be one of the following:
+
+ * ``active`` Traffic is enabled on this eswitch FDB - default mode
+ * ``inactive`` Traffic is disabled on this eswitch FDB - no traffic
+ will be forwarded to/from this eswitch FDB
+
Example Usage
=============
@@ -74,3 +86,6 @@ Example Usage
# enable encap-mode with legacy mode
$ devlink dev eswitch set pci/0000:08:00.0 mode legacy inline-mode none encap-mode basic
+
+ # enable switchdev mode in inactive state
+ $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev state inactive
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 8d4362f010e4..aca56a905ab8 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1369,6 +1369,11 @@ struct devlink_ops {
int (*eswitch_encap_mode_set)(struct devlink *devlink,
enum devlink_eswitch_encap_mode encap_mode,
struct netlink_ext_ack *extack);
+ int (*eswitch_state_get)(struct devlink *devlink,
+ enum devlink_eswitch_state *state);
+ int (*eswitch_state_set)(struct devlink *devlink,
+ enum devlink_eswitch_state state,
+ struct netlink_ext_ack *extack);
int (*info_get)(struct devlink *devlink, struct devlink_info_req *req,
struct netlink_ext_ack *extack);
/**
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index bcad11a787a5..a01443810658 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -195,6 +195,11 @@ enum devlink_eswitch_encap_mode {
DEVLINK_ESWITCH_ENCAP_MODE_BASIC,
};
+enum devlink_eswitch_state {
+ DEVLINK_ESWITCH_STATE_INACTIVE,
+ DEVLINK_ESWITCH_STATE_ACTIVE,
+};
+
enum devlink_port_flavour {
DEVLINK_PORT_FLAVOUR_PHYSICAL, /* Any kind of a port physically
* facing the user.
@@ -638,6 +643,8 @@ enum devlink_attr {
DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD, /* u64 */
+ DEVLINK_ATTR_ESWITCH_STATE, /* u8 */
+
/* Add new attributes above here, update the spec in
* Documentation/netlink/specs/devlink.yaml and re-generate
* net/devlink/netlink_gen.c.
diff --git a/net/devlink/dev.c b/net/devlink/dev.c
index 02602704bdea..1eea3e2c1ade 100644
--- a/net/devlink/dev.c
+++ b/net/devlink/dev.c
@@ -672,6 +672,17 @@ static int devlink_nl_eswitch_fill(struct sk_buff *msg, struct devlink *devlink,
goto nla_put_failure;
}
+ if (ops->eswitch_state_get) {
+ enum devlink_eswitch_state state;
+
+ err = ops->eswitch_state_get(devlink, &state);
+ if (err)
+ return err;
+ err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_STATE, state);
+ if (err)
+ return err;
+ }
+
genlmsg_end(msg, hdr);
return 0;
@@ -706,6 +717,7 @@ int devlink_nl_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info)
struct devlink *devlink = info->user_ptr[0];
const struct devlink_ops *ops = devlink->ops;
enum devlink_eswitch_encap_mode encap_mode;
+ enum devlink_eswitch_state state;
u8 inline_mode;
int err = 0;
u16 mode;
@@ -722,6 +734,24 @@ int devlink_nl_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info)
return err;
}
+ state = DEVLINK_ESWITCH_STATE_ACTIVE;
+ if (info->attrs[DEVLINK_ATTR_ESWITCH_STATE]) {
+ if (!ops->eswitch_state_set)
+ return -EOPNOTSUPP;
+ state = nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_STATE]);
+ }
+ /* If user did not supply the state attribute, the default is
+ * active state. If the state was not explicitly set, set the default
+ * state for drivers that support eswitch state.
+ * Keep this after mode-set as state handling can be dependent on
+ * the eswitch mode.
+ */
+ if (ops->eswitch_state_set) {
+ err = ops->eswitch_state_set(devlink, state, info->extack);
+ if (err)
+ return err;
+ }
+
if (info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]) {
if (!ops->eswitch_inline_mode_set)
return -EOPNOTSUPP;
diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
index 9fd00977d59e..e0910fb2214d 100644
--- a/net/devlink/netlink_gen.c
+++ b/net/devlink/netlink_gen.c
@@ -226,12 +226,13 @@ static const struct nla_policy devlink_eswitch_get_nl_policy[DEVLINK_ATTR_DEV_NA
};
/* DEVLINK_CMD_ESWITCH_SET - do */
-static const struct nla_policy devlink_eswitch_set_nl_policy[DEVLINK_ATTR_ESWITCH_ENCAP_MODE + 1] = {
+static const struct nla_policy devlink_eswitch_set_nl_policy[DEVLINK_ATTR_ESWITCH_STATE + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_ESWITCH_MODE] = NLA_POLICY_MAX(NLA_U16, 1),
[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = NLA_POLICY_MAX(NLA_U8, 3),
[DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = NLA_POLICY_MAX(NLA_U8, 1),
+ [DEVLINK_ATTR_ESWITCH_STATE] = NLA_POLICY_MAX(NLA_U8, 1),
};
/* DEVLINK_CMD_DPIPE_TABLE_GET - do */
@@ -822,7 +823,7 @@ const struct genl_split_ops devlink_nl_ops[74] = {
.doit = devlink_nl_eswitch_set_doit,
.post_doit = devlink_nl_post_doit,
.policy = devlink_eswitch_set_nl_policy,
- .maxattr = DEVLINK_ATTR_ESWITCH_ENCAP_MODE,
+ .maxattr = DEVLINK_ATTR_ESWITCH_STATE,
.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
},
{
--
2.51.0
* [PATCH net-next 2/3] net/mlx5: MPFS, add support for dynamic enable/disable
2025-10-16 1:36 [PATCH net-next 0/3] devlink eswitch active/inactive state Saeed Mahameed
2025-10-16 1:36 ` [PATCH net-next 1/3] devlink: Introduce devlink eswitch state Saeed Mahameed
@ 2025-10-16 1:36 ` Saeed Mahameed
2025-10-16 19:28 ` kernel test robot
` (2 more replies)
2025-10-16 1:36 ` [PATCH net-next 3/3] net/mlx5: E-Switch, support eswitch state Saeed Mahameed
2 siblings, 3 replies; 11+ messages in thread
From: Saeed Mahameed @ 2025-10-16 1:36 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
Leon Romanovsky, mbloch, Adithya Jayachandran
From: Saeed Mahameed <saeedm@nvidia.com>
MPFS (Multi PF Switch) is enabled by default in Multi-Host environments. The
driver keeps a list of the desired unicast MAC addresses of all vports
(VFs/SFs) and applies it to HW via the L2_table FW command.
Add an API to dynamically apply the list of MACs to HW when needed. The next
patch uses this new API in the devlink eswitch active/inactive uAPI.
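As a rough illustration of the idea, the following user-space model keeps the MAC list in software and only assigns HW table slots while MPFS is enabled. It is a sketch under stated assumptions, not the driver code: `struct mpfs_model`, the `mpfs_model_*()` helpers, and both size constants are hypothetical, and the FW commands are reduced to marking a slot as used. An index of -1 means "known to the driver but not programmed in HW", as in the diff below.

```c
#include <assert.h>
#include <stdbool.h>

#define HW_TABLE_SIZE 4 /* hypothetical size of the HW L2 table */
#define MAX_MACS      8 /* hypothetical capacity of the software list */

struct mpfs_model {
	bool enabled;
	bool slot_used[HW_TABLE_SIZE]; /* stands in for the HW L2 table */
	int index[MAX_MACS];           /* per-MAC HW slot, -1 = not in HW */
	int nr_macs;
};

static void mpfs_model_init(struct mpfs_model *m)
{
	m->enabled = true;
	m->nr_macs = 0;
	for (int i = 0; i < HW_TABLE_SIZE; i++)
		m->slot_used[i] = false;
}

/* Return the first free HW slot, or -1 if the table is full. */
static int alloc_slot(struct mpfs_model *m)
{
	for (int i = 0; i < HW_TABLE_SIZE; i++) {
		if (!m->slot_used[i]) {
			m->slot_used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Track a MAC; program it to HW only while enabled. Returns list position. */
static int mpfs_model_add_mac(struct mpfs_model *m)
{
	int idx = -1;

	if (m->nr_macs == MAX_MACS)
		return -1;
	if (m->enabled) {
		idx = alloc_slot(m);
		if (idx < 0)
			return -1;
	}
	m->index[m->nr_macs] = idx;
	return m->nr_macs++;
}

/* Remove every MAC from HW but keep it in the software list. */
static void mpfs_model_disable(struct mpfs_model *m)
{
	if (!m->enabled)
		return;
	for (int i = 0; i < m->nr_macs; i++) {
		if (m->index[i] < 0)
			continue;
		m->slot_used[m->index[i]] = false;
		m->index[i] = -1;
	}
	m->enabled = false;
}

/* Replay the software list into HW. Stops at the first failure. */
static int mpfs_model_enable(struct mpfs_model *m)
{
	if (m->enabled)
		return 0;
	m->enabled = true;
	for (int i = 0; i < m->nr_macs; i++) {
		int idx = alloc_slot(m);

		if (idx < 0)
			return -1;
		m->index[i] = idx;
	}
	return 0;
}
```

Note that the model, like the enable path in the diff, does not roll back entries that were already programmed when a later one fails; whether that is the desired behavior is a design question for the real implementation.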
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
---
.../ethernet/mellanox/mlx5/core/lib/mpfs.c | 111 +++++++++++++++---
.../ethernet/mellanox/mlx5/core/lib/mpfs.h | 9 ++
2 files changed, 103 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c
index 4450091e181a..9230c31539fb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c
@@ -65,13 +65,14 @@ static int del_l2table_entry_cmd(struct mlx5_core_dev *dev, u32 index)
/* UC L2 table hash node */
struct l2table_node {
struct l2addr_node node;
- u32 index; /* index in HW l2 table */
+ int index; /* index in HW l2 table */
int ref_count;
};
struct mlx5_mpfs {
struct hlist_head hash[MLX5_L2_ADDR_HASH_SIZE];
struct mutex lock; /* Synchronize l2 table access */
+ bool enabled;
u32 size;
unsigned long *bitmap;
};
@@ -114,6 +115,8 @@ int mlx5_mpfs_init(struct mlx5_core_dev *dev)
return -ENOMEM;
}
+ mpfs->enabled = true;
+
dev->priv.mpfs = mpfs;
return 0;
}
@@ -135,7 +138,7 @@ int mlx5_mpfs_add_mac(struct mlx5_core_dev *dev, u8 *mac)
struct mlx5_mpfs *mpfs = dev->priv.mpfs;
struct l2table_node *l2addr;
int err = 0;
- u32 index;
+ int index;
if (!mpfs)
return 0;
@@ -148,30 +151,32 @@ int mlx5_mpfs_add_mac(struct mlx5_core_dev *dev, u8 *mac)
goto out;
}
- err = alloc_l2table_index(mpfs, &index);
- if (err)
- goto out;
-
l2addr = l2addr_hash_add(mpfs->hash, mac, struct l2table_node, GFP_KERNEL);
if (!l2addr) {
err = -ENOMEM;
- goto hash_add_err;
+ goto out;
}
- err = set_l2table_entry_cmd(dev, index, mac);
- if (err)
- goto set_table_entry_err;
+ index = -1;
+
+ if (mpfs->enabled) {
+ err = alloc_l2table_index(mpfs, &index);
+ if (err)
+ goto hash_del;
+ err = set_l2table_entry_cmd(dev, index, mac);
+ if (err)
+ goto free_l2table_index;
+ }
l2addr->index = index;
l2addr->ref_count = 1;
mlx5_core_dbg(dev, "MPFS mac added %pM, index (%d)\n", mac, index);
goto out;
-
-set_table_entry_err:
- l2addr_hash_del(l2addr);
-hash_add_err:
+free_l2table_index:
free_l2table_index(mpfs, index);
+hash_del:
+ l2addr_hash_del(l2addr);
out:
mutex_unlock(&mpfs->lock);
return err;
@@ -183,7 +188,7 @@ int mlx5_mpfs_del_mac(struct mlx5_core_dev *dev, u8 *mac)
struct mlx5_mpfs *mpfs = dev->priv.mpfs;
struct l2table_node *l2addr;
int err = 0;
- u32 index;
+ int index;
if (!mpfs)
return 0;
@@ -200,12 +205,84 @@ int mlx5_mpfs_del_mac(struct mlx5_core_dev *dev, u8 *mac)
goto unlock;
index = l2addr->index;
- del_l2table_entry_cmd(dev, index);
+ if (index >= 0) {
+ del_l2table_entry_cmd(dev, index);
+ free_l2table_index(mpfs, index);
+ }
l2addr_hash_del(l2addr);
- free_l2table_index(mpfs, index);
mlx5_core_dbg(dev, "MPFS mac deleted %pM, index (%d)\n", mac, index);
unlock:
mutex_unlock(&mpfs->lock);
return err;
}
EXPORT_SYMBOL(mlx5_mpfs_del_mac);
+
+int mlx5_mpfs_enable(struct mlx5_core_dev *dev)
+{
+ struct mlx5_mpfs *mpfs = dev->priv.mpfs;
+ struct l2table_node *l2addr;
+ struct hlist_node *n;
+ int err = 0;
+
+ if (!mpfs)
+ return -ENODEV;
+
+ mutex_lock(&mpfs->lock);
+ if (mpfs->enabled)
+ goto out;
+ mpfs->enabled = true;
+ mlx5_core_dbg(dev, "MPFS enabling mpfs\n");
+
+ mlx5_mpfs_foreach(l2addr, n, mpfs) {
+ u32 index;
+
+ err = alloc_l2table_index(mpfs, &index);
+ if (err) {
+ mlx5_core_err(dev, "Failed to allocate MPFS index for %pM, err(%d)\n",
+ l2addr->node.addr, err);
+ goto out;
+ }
+
+ err = set_l2table_entry_cmd(dev, index, l2addr->node.addr);
+ if (err) {
+ mlx5_core_err(dev, "Failed to set MPFS l2table entry for %pM index=%d, err(%d)\n",
+ l2addr->node.addr, index, err);
+ free_l2table_index(mpfs, index);
+ goto out;
+ }
+
+ l2addr->index = index;
+ mlx5_core_dbg(dev, "MPFS entry %pM, set @index (%d)\n",
+ l2addr->node.addr, l2addr->index);
+ }
+out:
+ mutex_unlock(&mpfs->lock);
+ return err;
+}
+
+void mlx5_mpfs_disable(struct mlx5_core_dev *dev)
+{
+ struct mlx5_mpfs *mpfs = dev->priv.mpfs;
+ struct l2table_node *l2addr;
+ struct hlist_node *n;
+
+ if (!mpfs)
+ return;
+
+ mutex_lock(&mpfs->lock);
+ if (!mpfs->enabled)
+ goto unlock;
+ mlx5_mpfs_foreach(l2addr, n, mpfs) {
+ if (l2addr->index < 0)
+ continue;
+ del_l2table_entry_cmd(dev, l2addr->index);
+ free_l2table_index(mpfs, l2addr->index);
+ mlx5_core_dbg(dev, "MPFS entry %pM, deleted @index (%d)\n",
+ l2addr->node.addr, l2addr->index);
+ l2addr->index = -1;
+ }
+ mpfs->enabled = false;
+ mlx5_core_dbg(dev, "MPFS disabled\n");
+unlock:
+ mutex_unlock(&mpfs->lock);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h
index 4a293542a7aa..866c94982e46 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h
@@ -45,6 +45,10 @@ struct l2addr_node {
u8 addr[ETH_ALEN];
};
+#define mlx5_mpfs_foreach(hs, tmp, mpfs) \
+ for (int j = 0; j < MLX5_L2_ADDR_HASH_SIZE; j++) \
+ hlist_for_each_entry_safe(hs, tmp, &(mpfs)->hash[j], node.hlist)
+
#define for_each_l2hash_node(hn, tmp, hash, i) \
for (i = 0; i < MLX5_L2_ADDR_HASH_SIZE; i++) \
hlist_for_each_entry_safe(hn, tmp, &(hash)[i], hlist)
@@ -82,11 +86,16 @@ struct l2addr_node {
})
#ifdef CONFIG_MLX5_MPFS
+struct mlx5_core_dev;
int mlx5_mpfs_init(struct mlx5_core_dev *dev);
void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev);
+int mlx5_mpfs_enable(struct mlx5_core_dev *dev);
+void mlx5_mpfs_disable(struct mlx5_core_dev *dev);
#else /* #ifndef CONFIG_MLX5_MPFS */
static inline int mlx5_mpfs_init(struct mlx5_core_dev *dev) { return 0; }
static inline void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev) {}
+static inline int mlx5_mpfs_enable(struct mlx5_core_dev *dev) { return 0; }
+static inline void mlx5_mpfs_disable(struct mlx5_core_dev *dev) {}
#endif
#endif
--
2.51.0
* [PATCH net-next 3/3] net/mlx5: E-Switch, support eswitch state
2025-10-16 1:36 [PATCH net-next 0/3] devlink eswitch active/inactive state Saeed Mahameed
2025-10-16 1:36 ` [PATCH net-next 1/3] devlink: Introduce devlink eswitch state Saeed Mahameed
2025-10-16 1:36 ` [PATCH net-next 2/3] net/mlx5: MPFS, add support for dynamic enable/disable Saeed Mahameed
@ 2025-10-16 1:36 ` Saeed Mahameed
2025-10-16 14:54 ` Jakub Kicinski
2 siblings, 1 reply; 11+ messages in thread
From: Saeed Mahameed @ 2025-10-16 1:36 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
Leon Romanovsky, mbloch, Adithya Jayachandran
From: Saeed Mahameed <saeedm@nvidia.com>
Support eswitch state:
- Active state: allow FDB traffic, connect adjacent vports, and apply the
L2 MPFS rules.
- Inactive state: drop all traffic going to the FDB, remove the MPFS L2
rules, and disconnect adjacent vports.
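The two states listed above, together with the legacy-mode restriction in the diff below, can be summarized in a small user-space model. This is a sketch for illustration only: `struct esw_model` and `esw_model_state_set()` are hypothetical, and the three booleans stand in for the FDB drop rule, the MPFS L2 entries, and the adjacent vport connections.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the driver's data structures. */
enum esw_mode { MODE_LEGACY, MODE_SWITCHDEV };
enum esw_state { STATE_INACTIVE = 0, STATE_ACTIVE = 1 };

struct esw_model {
	enum esw_mode mode;
	enum esw_state state;
	bool drop_rule;            /* catch-all drop rule at the FDB root */
	bool mpfs_enabled;         /* L2 MPFS entries programmed in HW */
	bool adj_vports_connected; /* adjacent vports attached to this esw */
};

static int esw_model_state_set(struct esw_model *esw, enum esw_state state)
{
	/* Legacy mode is always active; any other request is rejected. */
	if (esw->mode == MODE_LEGACY)
		return state == STATE_ACTIVE ? 0 : -EOPNOTSUPP;

	if (state == STATE_ACTIVE) {
		esw->drop_rule = false;
		esw->mpfs_enabled = true;
		esw->adj_vports_connected = true;
	} else {
		esw->mpfs_enabled = false;
		esw->drop_rule = true;
		esw->adj_vports_connected = false;
	}
	esw->state = state;
	return 0;
}
```

The order of the assignments follows the order of the calls in the activate and deactivate helpers of the diff; the model does not capture any failure of the individual steps.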
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/devlink.c | 2 +
.../mellanox/mlx5/core/esw/adj_vport.c | 15 +-
.../net/ethernet/mellanox/mlx5/core/eswitch.c | 1 +
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 12 ++
.../mellanox/mlx5/core/eswitch_offloads.c | 157 ++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/fs_core.c | 5 +
include/linux/mlx5/fs.h | 1 +
7 files changed, 182 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index a0b68321355a..32dbb11db94b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -371,6 +371,8 @@ static const struct devlink_ops mlx5_devlink_ops = {
#ifdef CONFIG_MLX5_ESWITCH
.eswitch_mode_set = mlx5_devlink_eswitch_mode_set,
.eswitch_mode_get = mlx5_devlink_eswitch_mode_get,
+ .eswitch_state_get = mlx5_devlink_eswitch_state_get,
+ .eswitch_state_set = mlx5_devlink_eswitch_state_set,
.eswitch_inline_mode_set = mlx5_devlink_eswitch_inline_mode_set,
.eswitch_inline_mode_get = mlx5_devlink_eswitch_inline_mode_get,
.eswitch_encap_mode_set = mlx5_devlink_eswitch_encap_mode_set,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c
index 0091ba697bae..250af09b5af2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c
@@ -4,13 +4,8 @@
#include "fs_core.h"
#include "eswitch.h"
-enum {
- MLX5_ADJ_VPORT_DISCONNECT = 0x0,
- MLX5_ADJ_VPORT_CONNECT = 0x1,
-};
-
-static int mlx5_esw_adj_vport_modify(struct mlx5_core_dev *dev,
- u16 vport, bool connect)
+int mlx5_esw_adj_vport_modify(struct mlx5_core_dev *dev, u16 vport,
+ bool connect)
{
u32 in[MLX5_ST_SZ_DW(modify_vport_state_in)] = {};
@@ -24,7 +19,7 @@ static int mlx5_esw_adj_vport_modify(struct mlx5_core_dev *dev,
MLX5_SET(modify_vport_state_in, in, egress_connect_valid, 1);
MLX5_SET(modify_vport_state_in, in, ingress_connect, connect);
MLX5_SET(modify_vport_state_in, in, egress_connect, connect);
-
+ MLX5_SET(modify_vport_state_in, in, admin_state, connect);
return mlx5_cmd_exec_in(dev, modify_vport_state, in);
}
@@ -96,7 +91,6 @@ static int mlx5_esw_adj_vport_create(struct mlx5_eswitch *esw, u16 vhca_id,
if (err)
goto acl_ns_remove;
- mlx5_esw_adj_vport_modify(esw->dev, vport_num, MLX5_ADJ_VPORT_CONNECT);
return 0;
acl_ns_remove:
@@ -117,8 +111,7 @@ static void mlx5_esw_adj_vport_destroy(struct mlx5_eswitch *esw,
esw_debug(esw->dev, "Destroying adjacent vport %d for vhca_id 0x%x\n",
vport_num, vport->vhca_id);
- mlx5_esw_adj_vport_modify(esw->dev, vport_num,
- MLX5_ADJ_VPORT_DISCONNECT);
+
mlx5_esw_offloads_rep_remove(esw, vport);
mlx5_fs_vport_egress_acl_ns_remove(esw->dev->priv.steering,
vport->index);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index ad6858789e48..b22f270e4859 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -2044,6 +2044,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
refcount_set(&esw->qos.refcnt, 0);
esw->enabled_vports = 0;
+ esw->state = DEVLINK_ESWITCH_STATE_ACTIVE;
esw->offloads.inline_mode = MLX5_INLINE_MODE_NONE;
if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, reformat) &&
MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 4fe285ce32aa..5fd70bd8fb8c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -264,6 +264,10 @@ struct mlx5_eswitch_fdb {
struct offloads_fdb {
struct mlx5_flow_namespace *ns;
+ struct mlx5_flow_table *drop_root;
+ struct mlx5_flow_handle *drop_root_rule;
+ struct mlx5_fc *drop_root_counter;
+ struct dentry *drop_root_dbgfs;
struct mlx5_flow_table *tc_miss_table;
struct mlx5_flow_table *slow_fdb;
struct mlx5_flow_group *send_to_vport_grp;
@@ -392,6 +396,7 @@ struct mlx5_eswitch {
struct mlx5_esw_offload offloads;
u32 last_vport_idx;
int mode;
+ u8 state;
u16 manager_vport;
u16 first_host_vport;
u8 num_peers;
@@ -569,6 +574,11 @@ int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink,
struct netlink_ext_ack *extack);
int mlx5_devlink_eswitch_encap_mode_get(struct devlink *devlink,
enum devlink_eswitch_encap_mode *encap);
+int mlx5_devlink_eswitch_state_get(struct devlink *devlink,
+ enum devlink_eswitch_state *state);
+int mlx5_devlink_eswitch_state_set(struct devlink *devlink,
+ enum devlink_eswitch_state state,
+ struct netlink_ext_ack *extack);
int mlx5_devlink_port_fn_hw_addr_get(struct devlink_port *port,
u8 *hw_addr, int *hw_addr_len,
struct netlink_ext_ack *extack);
@@ -633,6 +643,8 @@ const u32 *mlx5_esw_query_functions(struct mlx5_core_dev *dev);
void mlx5_esw_adjacent_vhcas_setup(struct mlx5_eswitch *esw);
void mlx5_esw_adjacent_vhcas_cleanup(struct mlx5_eswitch *esw);
+int mlx5_esw_adj_vport_modify(struct mlx5_core_dev *dev, u16 vport,
+ bool connect);
#define MLX5_DEBUG_ESWITCH_MASK BIT(3)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index f289e846ea3a..326f1e33799c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -1577,6 +1577,7 @@ esw_chains_create(struct mlx5_eswitch *esw, struct mlx5_flow_table *miss_fdb)
attr.max_grp_num = esw->params.large_group_num;
attr.default_ft = miss_fdb;
attr.mapping = esw->offloads.reg_c0_obj_pool;
+ attr.fs_base_prio = FDB_BYPASS_PATH;
chains = mlx5_chains_create(dev, &attr);
if (IS_ERR(chains)) {
@@ -2354,6 +2355,115 @@ static void esw_mode_change(struct mlx5_eswitch *esw, u16 mode)
mlx5_devcom_comp_unlock(esw->dev->priv.hca_devcom_comp);
}
+static void mlx5_esw_fdb_drop_destroy(struct mlx5_eswitch *esw)
+{
+ if (!esw->fdb_table.offloads.drop_root)
+ return;
+
+ mlx5_del_flow_rules(esw->fdb_table.offloads.drop_root_rule);
+ mlx5_fc_destroy(esw->dev, esw->fdb_table.offloads.drop_root_counter);
+ mlx5_destroy_flow_table(esw->fdb_table.offloads.drop_root);
+ esw->fdb_table.offloads.drop_root_counter = NULL;
+ esw->fdb_table.offloads.drop_root_rule = NULL;
+ esw->fdb_table.offloads.drop_root = NULL;
+}
+
+static int mlx5_esw_fdb_drop_create(struct mlx5_eswitch *esw)
+{
+ struct mlx5_flow_table_attr ft_attr = {};
+ struct mlx5_flow_destination dst = {};
+ struct mlx5_core_dev *dev = esw->dev;
+ struct mlx5_flow_namespace *root_ns;
+ struct mlx5_flow_act flow_act = {};
+ struct mlx5_flow_handle *flow_rule;
+ struct mlx5_flow_table *table;
+ int err = 0;
+
+ if (esw->fdb_table.offloads.drop_root)
+ return 0;
+
+ root_ns = esw->fdb_table.offloads.ns;
+
+ ft_attr.prio = FDB_DROP_ROOT;
+ ft_attr.max_fte = 1;
+ ft_attr.autogroup.max_num_groups = 1;
+ table = mlx5_create_auto_grouped_flow_table(root_ns, &ft_attr);
+ if (IS_ERR(table)) {
+ esw_warn(dev, "Failed to create fdb drop root table, err %ld\n",
+ PTR_ERR(table));
+ return PTR_ERR(table);
+ }
+
+ dst.type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
+ dst.counter = mlx5_fc_create(dev, 0);
+ err = PTR_ERR_OR_ZERO(dst.counter);
+ if (err) {
+ esw_warn(dev, "Failed to create fdb drop counter, err %d\n",
+ err);
+ goto err_counter;
+ }
+
+ flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP |
+ MLX5_FLOW_CONTEXT_ACTION_COUNT;
+ flow_rule = mlx5_add_flow_rules(table, NULL, &flow_act, &dst, 1);
+ err = PTR_ERR_OR_ZERO(flow_rule);
+ if (err) {
+ esw_warn(esw->dev,
+ "fs offloads: Failed to add vport rx drop rule err %d\n",
+ err);
+ goto err_flow_rule;
+ }
+
+ esw->fdb_table.offloads.drop_root = table;
+ esw->fdb_table.offloads.drop_root_rule = flow_rule;
+ esw->fdb_table.offloads.drop_root_counter = dst.counter;
+ return 0;
+
+err_flow_rule:
+ mlx5_fc_destroy(dev, dst.counter);
+err_counter:
+ mlx5_destroy_flow_table(table);
+ return err;
+}
+
+static void mlx5_esw_fdb_active(struct mlx5_eswitch *esw)
+{
+ struct mlx5_vport *vport;
+ unsigned long i;
+
+ mlx5_esw_fdb_drop_destroy(esw);
+ mlx5_mpfs_enable(esw->dev);
+
+ mlx5_esw_for_each_vf_vport(esw, i, vport, U16_MAX) {
+ if (!vport->adjacent)
+ continue;
+ /* connect vport to this esw */
+ mlx5_esw_adj_vport_modify(esw->dev, vport->vport, true);
+ }
+
+ esw->state = DEVLINK_ESWITCH_STATE_ACTIVE;
+ esw_warn(esw->dev, "MPFS/FDB activated\n");
+}
+
+static void mlx5_esw_fdb_inactive(struct mlx5_eswitch *esw)
+{
+ struct mlx5_vport *vport;
+ unsigned long i;
+
+ mlx5_mpfs_disable(esw->dev);
+ mlx5_esw_fdb_drop_create(esw);
+
+ mlx5_esw_for_each_vf_vport(esw, i, vport, U16_MAX) {
+ if (!vport->adjacent)
+ continue;
+ /* disconnect vport from this esw */
+ mlx5_esw_adj_vport_modify(esw->dev, vport->vport, false);
+ }
+
+ esw->state = DEVLINK_ESWITCH_STATE_INACTIVE;
+ esw_warn(esw->dev, "MPFS/FDB de-activated\n");
+}
+
static int esw_offloads_start(struct mlx5_eswitch *esw,
struct netlink_ext_ack *extack)
{
@@ -3656,6 +3766,10 @@ void esw_offloads_disable(struct mlx5_eswitch *esw)
{
mlx5_eswitch_disable_pf_vf_vports(esw);
mlx5_esw_offloads_rep_unload(esw, MLX5_VPORT_UPLINK);
+
+ if (esw->state == DEVLINK_ESWITCH_STATE_INACTIVE)
+ mlx5_esw_fdb_active(esw); /* legacy mode always active */
+
esw_set_passing_vport_metadata(esw, false);
esw_offloads_steering_cleanup(esw);
mapping_destroy(esw->offloads.reg_c0_obj_pool);
@@ -3851,6 +3965,49 @@ int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
return esw_mode_to_devlink(esw->mode, mode);
}
+int mlx5_devlink_eswitch_state_get(struct devlink *devlink,
+ enum devlink_eswitch_state *state)
+{
+ struct mlx5_eswitch *esw;
+
+ esw = mlx5_devlink_eswitch_get(devlink);
+ if (IS_ERR(esw))
+ return PTR_ERR(esw);
+
+ *state = esw->state;
+ return 0;
+}
+
+int mlx5_devlink_eswitch_state_set(struct devlink *devlink,
+ enum devlink_eswitch_state state,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_eswitch *esw;
+
+ esw = mlx5_devlink_eswitch_get(devlink);
+ if (IS_ERR(esw)) {
+ NL_SET_ERR_MSG_MOD(extack, "Unable to query eswitch");
+ return PTR_ERR(esw);
+ }
+
+ if (esw->mode == MLX5_ESWITCH_LEGACY) {
+ if (state != DEVLINK_ESWITCH_STATE_ACTIVE) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "legacy mode only supports active state");
+ return -EOPNOTSUPP;
+ }
+ return 0;
+ }
+
+ if (state == DEVLINK_ESWITCH_STATE_ACTIVE)
+ mlx5_esw_fdb_active(esw);
+ else
+ mlx5_esw_fdb_inactive(esw);
+
+ esw->state = state;
+ return 0;
+}
+
static int mlx5_esw_vports_inline_set(struct mlx5_eswitch *esw, u8 mlx5_mode,
struct netlink_ext_ack *extack)
{
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 4308e89802f3..c8cfcf939d08 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -3520,6 +3520,11 @@ static int init_fdb_root_ns(struct mlx5_flow_steering *steering)
if (!steering->fdb_root_ns)
return -ENOMEM;
+ maj_prio = fs_create_prio(&steering->fdb_root_ns->ns, FDB_DROP_ROOT, 1);
+ err = PTR_ERR_OR_ZERO(maj_prio);
+ if (err)
+ goto out_err;
+
err = create_fdb_bypass(steering);
if (err)
goto out_err;
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 86055d55836d..f1fe56e7efdb 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -116,6 +116,7 @@ enum mlx5_flow_namespace_type {
};
enum {
+ FDB_DROP_ROOT,
FDB_BYPASS_PATH,
FDB_CRYPTO_INGRESS,
FDB_TC_OFFLOAD,
--
2.51.0
* Re: [PATCH net-next 1/3] devlink: Introduce devlink eswitch state
2025-10-16 1:36 ` [PATCH net-next 1/3] devlink: Introduce devlink eswitch state Saeed Mahameed
@ 2025-10-16 9:16 ` Jiri Pirko
2025-10-16 17:34 ` Saeed Mahameed
0 siblings, 1 reply; 11+ messages in thread
From: Jiri Pirko @ 2025-10-16 9:16 UTC (permalink / raw)
To: Saeed Mahameed
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
Leon Romanovsky, mbloch, Parav Pandit, Adithya Jayachandran
Thu, Oct 16, 2025 at 03:36:16AM +0200, saeed@kernel.org wrote:
>From: Parav Pandit <parav@nvidia.com>
[...]
>@@ -722,6 +734,24 @@ int devlink_nl_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info)
> return err;
> }
>
>+ state = DEVLINK_ESWITCH_STATE_ACTIVE;
>+ if (info->attrs[DEVLINK_ATTR_ESWITCH_STATE]) {
>+ if (!ops->eswitch_state_set)
>+ return -EOPNOTSUPP;
>+ state = nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_STATE]);
>+ }
>+ /* If user did not supply the state attribute, the default is
>+ * active state. If the state was not explicitly set, set the default
>+ * state for drivers that support eswitch state.
>+ * Keep this after mode-set as state handling can be dependent on
>+ * the eswitch mode.
>+ */
>+ if (ops->eswitch_state_set) {
>+ err = ops->eswitch_state_set(devlink, state, info->extack);
Calling state_set() upon every DEVLINK_CMD_ESWITCH_SET call,
even if STATE attr is not present, is plain wrong. Don't do it.
I don't really understand why you do so.
>+ if (err)
>+ return err;
>+ }
>+
> if (info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]) {
> if (!ops->eswitch_inline_mode_set)
> return -EOPNOTSUPP;
[...]
* Re: [PATCH net-next 3/3] net/mlx5: E-Switch, support eswitch state
2025-10-16 1:36 ` [PATCH net-next 3/3] net/mlx5: E-Switch, support eswitch state Saeed Mahameed
@ 2025-10-16 14:54 ` Jakub Kicinski
0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2025-10-16 14:54 UTC (permalink / raw)
To: Saeed Mahameed
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
netdev, Tariq Toukan, Gal Pressman, Leon Romanovsky, mbloch,
Adithya Jayachandran
On Wed, 15 Oct 2025 18:36:18 -0700 Saeed Mahameed wrote:
> + esw_warn(dev, "Failed to create fdb drop root table, err %ld\n",
> + PTR_ERR(table));
Gal's new coccicheck says:
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:2395:4-11: WARNING: Consider using %pe to print PTR_ERR()
:)
--
pw-bot: cr
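
The coccicheck warning above is about the format specifier: %pe takes the
error pointer itself (not PTR_ERR() of it) and prints a symbolic errno name
when the kernel is built with CONFIG_SYMBOLIC_ERRNAME. Because %pe exists only
in the kernel's printk, the following is a userspace model of the idea rather
than driver code. The helper fmt_pe() and the small errno table are
assumptions made for illustration; ERR_PTR/PTR_ERR/IS_ERR mirror the kernel
helpers of the same names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Userspace copies of the kernel's error-pointer encoding. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical stand-in for what "%pe" does inside printk: symbolic name
 * for known error pointers, plain number for unknown ones, and an ordinary
 * pointer print otherwise. In the driver, the change would presumably be:
 *   esw_warn(dev, "Failed to create fdb drop root table, err %pe\n", table);
 */
static const char *fmt_pe(const void *ptr, char *buf, size_t len)
{
	if (!IS_ERR(ptr)) {
		snprintf(buf, len, "%p", ptr);
		return buf;
	}
	switch (-PTR_ERR(ptr)) {
	case ENOMEM:
		snprintf(buf, len, "-ENOMEM");
		break;
	case EINVAL:
		snprintf(buf, len, "-EINVAL");
		break;
	case EOPNOTSUPP:
		snprintf(buf, len, "-EOPNOTSUPP");
		break;
	default:
		snprintf(buf, len, "%ld", PTR_ERR(ptr));
		break;
	}
	return buf;
}

/* Small self-check exercising the helpers above. */
static void self_check(void)
{
	char buf[32];
	int dummy = 0;

	assert(IS_ERR(ERR_PTR(-ENOMEM)));
	assert(!IS_ERR(&dummy));
	assert(strcmp(fmt_pe(ERR_PTR(-ENOMEM), buf, sizeof(buf)), "-ENOMEM") == 0);
}
```

The symbolic form is easier to read in logs than a raw "err -12", which is
presumably why the check flags "%ld" with PTR_ERR().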
* Re: [PATCH net-next 1/3] devlink: Introduce devlink eswitch state
2025-10-16 9:16 ` Jiri Pirko
@ 2025-10-16 17:34 ` Saeed Mahameed
2025-10-17 8:06 ` Jiri Pirko
0 siblings, 1 reply; 11+ messages in thread
From: Saeed Mahameed @ 2025-10-16 17:34 UTC (permalink / raw)
To: Jiri Pirko
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
Leon Romanovsky, mbloch, Parav Pandit, Adithya Jayachandran
On 16 Oct 11:16, Jiri Pirko wrote:
>Thu, Oct 16, 2025 at 03:36:16AM +0200, saeed@kernel.org wrote:
>>From: Parav Pandit <parav@nvidia.com>
>
>[...]
>
>
>>@@ -722,6 +734,24 @@ int devlink_nl_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info)
>> return err;
>> }
>>
>>+ state = DEVLINK_ESWITCH_STATE_ACTIVE;
>>+ if (info->attrs[DEVLINK_ATTR_ESWITCH_STATE]) {
>>+ if (!ops->eswitch_state_set)
>>+ return -EOPNOTSUPP;
>>+ state = nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_STATE]);
>>+ }
>>+ /* If user did not supply the state attribute, the default is
>>+ * active state. If the state was not explicitly set, set the default
>>+ * state for drivers that support eswitch state.
>>+ * Keep this after mode-set as state handling can be dependent on
>>+ * the eswitch mode.
>>+ */
>>+ if (ops->eswitch_state_set) {
>>+ err = ops->eswitch_state_set(devlink, state, info->extack);
>
>Calling state_set() upon every DEVLINK_CMD_ESWITCH_SET call,
>even if STATE attr is not present, is plain wrong. Don't do it.
>I don't really understand why you do so.
>
I don't get the "plain wrong" part? Please explain.
Here is what we are trying to solve and why I think this way is the best
way to solve it, unless you have a better idea.
We want to preserve backwards compatibility, think of:
- old devlink iproute2 (doesn't provide STATE attr).
- new kernel (expects new STATE attr).
Upon your request we split mode and state handling into separate callbacks,
meaning, you set mode first and then state in DEVLINK_CMD_ESWITCH_SET.
ops->mode_set() doesn't have information on state, so a driver that
implements state_set() will expect state_set() to be called after
mode_set(); otherwise, state will remain inactive for that driver.
If state attr is not provided (e.g. old devlink userspace) but the user
expects state to be active, then if we do what you ask for, we don't
call state_set() and after mode_set() we will be in an inactive state,
while user expects active (default behavior) for backward compatibility.
To solve this we always default state = ACTIVE (if state attr wasn't
provided) and call state_set();
Let me know if you have better ideas, on how to solve this problem.
Otherwise, this patch's way of preserving backward compatibility is
not "plain wrong".
We can optimize to call state_set() only if (mode || state) attr was
provided. Let me know if that works for you.
- Saeed.
* Re: [PATCH net-next 2/3] net/mlx5: MPFS, add support for dynamic enable/disable
2025-10-16 1:36 ` [PATCH net-next 2/3] net/mlx5: MPFS, add support for dynamic enable/disable Saeed Mahameed
@ 2025-10-16 19:28 ` kernel test robot
2025-10-16 21:35 ` kernel test robot
2025-10-18 7:42 ` kernel test robot
2 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2025-10-16 19:28 UTC (permalink / raw)
To: Saeed Mahameed, David S. Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet
Cc: oe-kbuild-all, netdev, Saeed Mahameed, Tariq Toukan, Gal Pressman,
Leon Romanovsky, mbloch, Adithya Jayachandran
Hi Saeed,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Saeed-Mahameed/devlink-Introduce-devlink-eswitch-state/20251016-094245
base: net-next/main
patch link: https://lore.kernel.org/r/20251016013618.2030940-3-saeed%40kernel.org
patch subject: [PATCH net-next 2/3] net/mlx5: MPFS, add support for dynamic enable/disable
config: powerpc-randconfig-001-20251017 (https://download.01.org/0day-ci/archive/20251017/202510170321.YvES75vr-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 14.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251017/202510170321.YvES75vr-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510170321.YvES75vr-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:45,
from drivers/net/ethernet/mellanox/mlx5/core/vport.c:39:
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97:5: warning: no previous prototype for 'mlx5_mpfs_enable' [-Wmissing-prototypes]
97 | int mlx5_mpfs_enable(struct mlx5_core_dev *dev) { return 0; }
| ^~~~~~~~~~~~~~~~
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98:6: warning: no previous prototype for 'mlx5_mpfs_disable' [-Wmissing-prototypes]
98 | void mlx5_mpfs_disable(struct mlx5_core_dev *dev) {}
| ^~~~~~~~~~~~~~~~~
vim +/mlx5_mpfs_enable +97 drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h
87
88 #ifdef CONFIG_MLX5_MPFS
89 struct mlx5_core_dev;
90 int mlx5_mpfs_init(struct mlx5_core_dev *dev);
91 void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev);
92 int mlx5_mpfs_enable(struct mlx5_core_dev *dev);
93 void mlx5_mpfs_disable(struct mlx5_core_dev *dev);
94 #else /* #ifndef CONFIG_MLX5_MPFS */
95 static inline int mlx5_mpfs_init(struct mlx5_core_dev *dev) { return 0; }
96 static inline void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev) {}
> 97 int mlx5_mpfs_enable(struct mlx5_core_dev *dev) { return 0; }
> 98 void mlx5_mpfs_disable(struct mlx5_core_dev *dev) {}
99 #endif
100
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH net-next 2/3] net/mlx5: MPFS, add support for dynamic enable/disable
2025-10-16 1:36 ` [PATCH net-next 2/3] net/mlx5: MPFS, add support for dynamic enable/disable Saeed Mahameed
2025-10-16 19:28 ` kernel test robot
@ 2025-10-16 21:35 ` kernel test robot
2025-10-18 7:42 ` kernel test robot
2 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2025-10-16 21:35 UTC (permalink / raw)
To: Saeed Mahameed, David S. Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet
Cc: llvm, oe-kbuild-all, netdev, Saeed Mahameed, Tariq Toukan,
Gal Pressman, Leon Romanovsky, mbloch, Adithya Jayachandran
Hi Saeed,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Saeed-Mahameed/devlink-Introduce-devlink-eswitch-state/20251016-094245
base: net-next/main
patch link: https://lore.kernel.org/r/20251016013618.2030940-3-saeed%40kernel.org
patch subject: [PATCH net-next 2/3] net/mlx5: MPFS, add support for dynamic enable/disable
config: x86_64-randconfig-075-20251017 (https://download.01.org/0day-ci/archive/20251017/202510170416.DfoJV8mp-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251017/202510170416.DfoJV8mp-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510170416.DfoJV8mp-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from drivers/net/ethernet/mellanox/mlx5/core/sriov.c:38:
In file included from drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:45:
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97:5: warning: no previous prototype for function 'mlx5_mpfs_enable' [-Wmissing-prototypes]
97 | int mlx5_mpfs_enable(struct mlx5_core_dev *dev) { return 0; }
| ^
drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
97 | int mlx5_mpfs_enable(struct mlx5_core_dev *dev) { return 0; }
| ^
| static
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98:6: warning: no previous prototype for function 'mlx5_mpfs_disable' [-Wmissing-prototypes]
98 | void mlx5_mpfs_disable(struct mlx5_core_dev *dev) {}
| ^
drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
98 | void mlx5_mpfs_disable(struct mlx5_core_dev *dev) {}
| ^
| static
2 warnings generated.
vim +/mlx5_mpfs_enable +97 drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h
87
88 #ifdef CONFIG_MLX5_MPFS
89 struct mlx5_core_dev;
90 int mlx5_mpfs_init(struct mlx5_core_dev *dev);
91 void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev);
92 int mlx5_mpfs_enable(struct mlx5_core_dev *dev);
93 void mlx5_mpfs_disable(struct mlx5_core_dev *dev);
94 #else /* #ifndef CONFIG_MLX5_MPFS */
95 static inline int mlx5_mpfs_init(struct mlx5_core_dev *dev) { return 0; }
96 static inline void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev) {}
> 97 int mlx5_mpfs_enable(struct mlx5_core_dev *dev) { return 0; }
> 98 void mlx5_mpfs_disable(struct mlx5_core_dev *dev) {}
99 #endif
100
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH net-next 1/3] devlink: Introduce devlink eswitch state
2025-10-16 17:34 ` Saeed Mahameed
@ 2025-10-17 8:06 ` Jiri Pirko
0 siblings, 0 replies; 11+ messages in thread
From: Jiri Pirko @ 2025-10-17 8:06 UTC (permalink / raw)
To: Saeed Mahameed
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
Leon Romanovsky, mbloch, Parav Pandit, Adithya Jayachandran
Thu, Oct 16, 2025 at 07:34:04PM +0200, saeed@kernel.org wrote:
>On 16 Oct 11:16, Jiri Pirko wrote:
>> Thu, Oct 16, 2025 at 03:36:16AM +0200, saeed@kernel.org wrote:
>> > From: Parav Pandit <parav@nvidia.com>
>>
>> [...]
>>
>>
>> > @@ -722,6 +734,24 @@ int devlink_nl_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info)
>> > return err;
>> > }
>> >
>> > + state = DEVLINK_ESWITCH_STATE_ACTIVE;
>> > + if (info->attrs[DEVLINK_ATTR_ESWITCH_STATE]) {
>> > + if (!ops->eswitch_state_set)
>> > + return -EOPNOTSUPP;
>> > + state = nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_STATE]);
>> > + }
>> > + /* If user did not supply the state attribute, the default is
>> > + * active state. If the state was not explicitly set, set the default
>> > + * state for drivers that support eswitch state.
>> > + * Keep this after mode-set as state handling can be dependent on
>> > + * the eswitch mode.
>> > + */
>> > + if (ops->eswitch_state_set) {
>> > + err = ops->eswitch_state_set(devlink, state, info->extack);
>>
>> Calling state_set() upon every DEVLINK_CMD_ESWITCH_SET call,
>> even if STATE attr is not present, is plain wrong. Don't do it.
>> I don't really understand why you do so.
>>
>
>I don't get the "plain wrong" part? Please explain.
>
>Here is what we are trying to solve and why I think this way is the best
>way to solve it, unless you have a better idea.
>
>We want to preserve backwards compatibility, think of:
> - old devlink iproute2 (doesn't provide STATE attr).
> - new kernel (expects new STATE attr).
>
>Upon your request we split mode and state handling into separate callbacks,
>meaning, you set mode first and then state in DEVLINK_CMD_ESWITCH_SET.
>
>ops->mode_set() doesn't have information on state, so a driver that
>implements state_set() will expect state_set() to be called after
>mode_set(); otherwise, state will remain inactive for that driver.
>
>If state attr is not provided (e.g. old devlink userspace) but the user
>expects state to be active, then if we do what you ask for, we don't
>call state_set() and after mode_set() we will be in an inactive state,
>while user expects active (default behavior) for backward compatibility.
>
>To solve this we always default state = ACTIVE (if state attr wasn't
>provided) and call state_set();
>
>Let me know if you have better ideas, on how to solve this problem.
>Otherwise, this patch's way of preserving backward compatibility is not
>"plain wrong".
>
>We can optimize to call state_set() only if (mode || state) attr was
>provided. Let me know if that works for you.
I'm just saying you have a bug in the code. You assume user *always*
sets mode. That is not the case however. User might set only:
inline-mode
encap-mode
In those cases, you wrongly call state_set() without any reason.
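
A minimal userspace sketch of the gating discussed in this exchange, assuming
the optimization proposed earlier in the thread (call state_set() only when a
mode or state attribute was supplied). The names fake_attrs, fake_ops,
eswitch_set and demo_state_set are hypothetical stand-ins, not the devlink
API; only the control flow is meant to be illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum esw_state { ESW_STATE_INACTIVE, ESW_STATE_ACTIVE };

/* Hypothetical stand-in for the parsed netlink attributes. */
struct fake_attrs {
	bool has_mode;
	bool has_state;
	bool has_inline_mode;	/* present, but unrelated to state */
	enum esw_state state;
};

/* Hypothetical stand-in for the driver ops; state_set may be NULL. */
struct fake_ops {
	int (*state_set)(enum esw_state state);
};

static int state_set_calls;
static enum esw_state last_state;

static int demo_state_set(enum esw_state state)
{
	state_set_calls++;
	last_state = state;
	return 0;
}

static int eswitch_set(const struct fake_ops *ops, const struct fake_attrs *a)
{
	enum esw_state state = ESW_STATE_ACTIVE;	/* backward-compatible default */

	/* Mode handling would run here, before state handling. */

	if (a->has_state) {
		if (!ops->state_set)
			return -EOPNOTSUPP;
		state = a->state;
	}
	/* Skip state_set() for requests that touch neither mode nor state. */
	if (ops->state_set && (a->has_mode || a->has_state))
		return ops->state_set(state);
	return 0;
}

/* Small self-check covering the cases raised in the review. */
static void self_check(void)
{
	struct fake_ops ops = { .state_set = demo_state_set };
	struct fake_ops no_state_ops = { .state_set = NULL };
	struct fake_attrs inline_only = { .has_inline_mode = true };
	struct fake_attrs mode_only = { .has_mode = true };
	struct fake_attrs inactive = { .has_state = true,
				       .state = ESW_STATE_INACTIVE };

	/* inline-mode only: state_set() must not be called */
	assert(eswitch_set(&ops, &inline_only) == 0);
	assert(state_set_calls == 0);

	/* mode only (old userspace): state defaults to active */
	assert(eswitch_set(&ops, &mode_only) == 0);
	assert(state_set_calls == 1 && last_state == ESW_STATE_ACTIVE);

	/* explicit state attribute is honoured */
	assert(eswitch_set(&ops, &inactive) == 0);
	assert(last_state == ESW_STATE_INACTIVE);

	/* state attribute without driver support is rejected */
	assert(eswitch_set(&no_state_ops, &inactive) == -EOPNOTSUPP);
}
```

With this shape, a request carrying only inline-mode or encap-mode leaves the
current state untouched, while a mode-only request from older userspace still
ends up active.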
* Re: [PATCH net-next 2/3] net/mlx5: MPFS, add support for dynamic enable/disable
2025-10-16 1:36 ` [PATCH net-next 2/3] net/mlx5: MPFS, add support for dynamic enable/disable Saeed Mahameed
2025-10-16 19:28 ` kernel test robot
2025-10-16 21:35 ` kernel test robot
@ 2025-10-18 7:42 ` kernel test robot
2 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2025-10-18 7:42 UTC (permalink / raw)
To: Saeed Mahameed, David S. Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet
Cc: oe-kbuild-all, netdev, Saeed Mahameed, Tariq Toukan, Gal Pressman,
Leon Romanovsky, mbloch, Adithya Jayachandran
Hi Saeed,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Saeed-Mahameed/devlink-Introduce-devlink-eswitch-state/20251016-094245
base: net-next/main
patch link: https://lore.kernel.org/r/20251016013618.2030940-3-saeed%40kernel.org
patch subject: [PATCH net-next 2/3] net/mlx5: MPFS, add support for dynamic enable/disable
config: i386-buildonly-randconfig-001-20251018 (https://download.01.org/0day-ci/archive/20251018/202510181424.S8zAzGjf-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251018/202510181424.S8zAzGjf-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510181424.S8zAzGjf-lkp@intel.com/
All errors (new ones prefixed by >>):
ld: drivers/net/ethernet/mellanox/mlx5/core/eq.o: in function `mlx5_mpfs_enable':
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97: multiple definition of `mlx5_mpfs_enable'; drivers/net/ethernet/mellanox/mlx5/core/main.o:drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97: first defined here
ld: drivers/net/ethernet/mellanox/mlx5/core/eq.o: in function `mlx5_mpfs_disable':
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98: multiple definition of `mlx5_mpfs_disable'; drivers/net/ethernet/mellanox/mlx5/core/main.o:drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98: first defined here
ld: drivers/net/ethernet/mellanox/mlx5/core/vport.o: in function `mlx5_mpfs_enable':
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97: multiple definition of `mlx5_mpfs_enable'; drivers/net/ethernet/mellanox/mlx5/core/main.o:drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97: first defined here
ld: drivers/net/ethernet/mellanox/mlx5/core/vport.o: in function `mlx5_mpfs_disable':
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98: multiple definition of `mlx5_mpfs_disable'; drivers/net/ethernet/mellanox/mlx5/core/main.o:drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98: first defined here
ld: drivers/net/ethernet/mellanox/mlx5/core/sriov.o: in function `mlx5_mpfs_enable':
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97: multiple definition of `mlx5_mpfs_enable'; drivers/net/ethernet/mellanox/mlx5/core/main.o:drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97: first defined here
ld: drivers/net/ethernet/mellanox/mlx5/core/sriov.o: in function `mlx5_mpfs_disable':
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98: multiple definition of `mlx5_mpfs_disable'; drivers/net/ethernet/mellanox/mlx5/core/main.o:drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98: first defined here
ld: drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.o: in function `mlx5_mpfs_enable':
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97: multiple definition of `mlx5_mpfs_enable'; drivers/net/ethernet/mellanox/mlx5/core/main.o:drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97: first defined here
ld: drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.o: in function `mlx5_mpfs_disable':
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98: multiple definition of `mlx5_mpfs_disable'; drivers/net/ethernet/mellanox/mlx5/core/main.o:drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98: first defined here
ld: drivers/net/ethernet/mellanox/mlx5/core/lag/lag.o: in function `mlx5_mpfs_enable':
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97: multiple definition of `mlx5_mpfs_enable'; drivers/net/ethernet/mellanox/mlx5/core/main.o:drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97: first defined here
ld: drivers/net/ethernet/mellanox/mlx5/core/lag/lag.o: in function `mlx5_mpfs_disable':
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98: multiple definition of `mlx5_mpfs_disable'; drivers/net/ethernet/mellanox/mlx5/core/main.o:drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98: first defined here
ld: drivers/net/ethernet/mellanox/mlx5/core/devlink.o: in function `mlx5_mpfs_enable':
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97: multiple definition of `mlx5_mpfs_enable'; drivers/net/ethernet/mellanox/mlx5/core/main.o:drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:97: first defined here
ld: drivers/net/ethernet/mellanox/mlx5/core/devlink.o: in function `mlx5_mpfs_disable':
>> drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98: multiple definition of `mlx5_mpfs_disable'; drivers/net/ethernet/mellanox/mlx5/core/main.o:drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h:98: first defined here
vim +97 drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h
87
88 #ifdef CONFIG_MLX5_MPFS
89 struct mlx5_core_dev;
90 int mlx5_mpfs_init(struct mlx5_core_dev *dev);
91 void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev);
92 int mlx5_mpfs_enable(struct mlx5_core_dev *dev);
93 void mlx5_mpfs_disable(struct mlx5_core_dev *dev);
94 #else /* #ifndef CONFIG_MLX5_MPFS */
95 static inline int mlx5_mpfs_init(struct mlx5_core_dev *dev) { return 0; }
96 static inline void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev) {}
> 97 int mlx5_mpfs_enable(struct mlx5_core_dev *dev) { return 0; }
> 98 void mlx5_mpfs_disable(struct mlx5_core_dev *dev) {}
99 #endif
100
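
The W=1 warnings in the earlier reports and the link errors above all point at
the same two lines: the !CONFIG_MLX5_MPFS stubs at mpfs.h:97-98 are defined in
a header without "static inline", so every translation unit that includes the
header emits its own external definition. A likely fix, following the existing
init/cleanup stubs and the note printed by clang, is sketched below as a
standalone file. The demo_caller() function is a hypothetical addition for
illustration only.

```c
#include <assert.h>
#include <stddef.h>

struct mlx5_core_dev;	/* opaque here; only pointers are passed around */

#ifdef CONFIG_MLX5_MPFS
int mlx5_mpfs_init(struct mlx5_core_dev *dev);
void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev);
int mlx5_mpfs_enable(struct mlx5_core_dev *dev);
void mlx5_mpfs_disable(struct mlx5_core_dev *dev);
#else /* !CONFIG_MLX5_MPFS */
static inline int mlx5_mpfs_init(struct mlx5_core_dev *dev) { return 0; }
static inline void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev) {}
/* static inline keeps these stubs local to each including .c file */
static inline int mlx5_mpfs_enable(struct mlx5_core_dev *dev) { return 0; }
static inline void mlx5_mpfs_disable(struct mlx5_core_dev *dev) {}
#endif

/* Hypothetical caller: with MPFS compiled out, the stubs are no-ops. */
static int demo_caller(void)
{
	int err = mlx5_mpfs_enable(NULL);

	mlx5_mpfs_disable(NULL);
	return err;
}
```

Because each stub now has internal linkage, including the header from
main.c, eq.c, vport.c and the other files listed by ld no longer produces
duplicate symbols, and -Wmissing-prototypes has nothing to flag.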
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki