* [PATCH net-next V2 1/3] devlink: Introduce switchdev_inactive eswitch mode
2025-11-07 0:08 [PATCH net-next V2 0/3] devlink eswitch inactive mode Saeed Mahameed
@ 2025-11-07 0:08 ` Saeed Mahameed
2025-11-07 0:08 ` [PATCH net-next V2 2/3] net/mlx5: MPFS, add support for dynamic enable/disable Saeed Mahameed
2025-11-07 0:08 ` [PATCH net-next V2 3/3] net/mlx5: E-Switch, support eswitch inactive mode Saeed Mahameed
2 siblings, 0 replies; 9+ messages in thread
From: Saeed Mahameed @ 2025-11-07 0:08 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
Leon Romanovsky, Jiri Pirko, mbloch
From: Saeed Mahameed <saeedm@nvidia.com>
Adds DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE attribute to UAPI and
documentation.
Before having traffic flow through an eswitch, a user may want to have the
ability to block traffic towards the FDB until FDB is fully programmed and
the user is ready to send traffic to it. For example: when two eswitches
are present for vports in a multi-PF setup, one eswitch may take over the
traffic from the other when the user chooses.
Before this take over, a user may want to first program the inactive
eswitch and then once ready redirect traffic to this new eswitch.
switchdev modes transition semantics:
legacy->switchdev_inactive: Create switchdev mode normally, traffic not
allowed to flow yet.
switchdev_inactive->switchdev: Enable traffic to flow.
switchdev->switchdev_inactive: Block traffic on the FDB, FDB and
representros state and content is preserved.
When eswitch is configured to this mode, traffic is ignored/dropped on
this eswitch FDB, while current configuration is kept, e.g FDB rules and
netdev representros are kept available, FDB programming is allowed.
Example:
# start inactive switchdev
devlink dev eswitch set pci/0000:08:00.1 mode switchdev_inactive
# setup TC rules, representors etc ..
# activate
devlink dev eswitch set pci/0000:08:00.1 mode switchdev
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
Documentation/netlink/specs/devlink.yaml | 2 ++
.../networking/devlink/devlink-eswitch-attr.rst | 13 +++++++++++++
include/uapi/linux/devlink.h | 1 +
net/devlink/netlink_gen.c | 2 +-
4 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
index 3db59c965869..426d5aa7d955 100644
--- a/Documentation/netlink/specs/devlink.yaml
+++ b/Documentation/netlink/specs/devlink.yaml
@@ -99,6 +99,8 @@ definitions:
name: legacy
-
name: switchdev
+ -
+ name: switchdev-inactive
-
type: enum
name: eswitch-inline-mode
diff --git a/Documentation/networking/devlink/devlink-eswitch-attr.rst b/Documentation/networking/devlink/devlink-eswitch-attr.rst
index 08bb39ab1528..eafe09abc40c 100644
--- a/Documentation/networking/devlink/devlink-eswitch-attr.rst
+++ b/Documentation/networking/devlink/devlink-eswitch-attr.rst
@@ -39,6 +39,10 @@ The following is a list of E-Switch attributes.
rules.
* ``switchdev`` allows for more advanced offloading capabilities of
the E-Switch to hardware.
+ * ``switchdev_inactive`` switchdev mode but starts inactive, doesn't allow traffic
+ until explicitly activated. This mode is useful for orchestrators that
+ want to prepare the device in switchdev mode but only activate it when
+ all configurations are done.
* - ``inline-mode``
- enum
- Some HWs need the VF driver to put part of the packet
@@ -74,3 +78,12 @@ Example Usage
# enable encap-mode with legacy mode
$ devlink dev eswitch set pci/0000:08:00.0 mode legacy inline-mode none encap-mode basic
+
+ # start switchdev mode in inactive state
+ $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev_inactive
+
+ # setup switchdev configurations, representors, FDB entries, etc..
+ ...
+
+ # activate switchdev mode to allow traffic
+ $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index bcad11a787a5..157f11d3fb72 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -181,6 +181,7 @@ enum devlink_sb_threshold_type {
enum devlink_eswitch_mode {
DEVLINK_ESWITCH_MODE_LEGACY,
DEVLINK_ESWITCH_MODE_SWITCHDEV,
+ DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE,
};
enum devlink_eswitch_inline_mode {
diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
index 9fd00977d59e..5ad435aee29d 100644
--- a/net/devlink/netlink_gen.c
+++ b/net/devlink/netlink_gen.c
@@ -229,7 +229,7 @@ static const struct nla_policy devlink_eswitch_get_nl_policy[DEVLINK_ATTR_DEV_NA
static const struct nla_policy devlink_eswitch_set_nl_policy[DEVLINK_ATTR_ESWITCH_ENCAP_MODE + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
- [DEVLINK_ATTR_ESWITCH_MODE] = NLA_POLICY_MAX(NLA_U16, 1),
+ [DEVLINK_ATTR_ESWITCH_MODE] = NLA_POLICY_MAX(NLA_U16, 2),
[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = NLA_POLICY_MAX(NLA_U8, 3),
[DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = NLA_POLICY_MAX(NLA_U8, 1),
};
--
2.51.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH net-next V2 2/3] net/mlx5: MPFS, add support for dynamic enable/disable
2025-11-07 0:08 [PATCH net-next V2 0/3] devlink eswitch inactive mode Saeed Mahameed
2025-11-07 0:08 ` [PATCH net-next V2 1/3] devlink: Introduce switchdev_inactive eswitch mode Saeed Mahameed
@ 2025-11-07 0:08 ` Saeed Mahameed
2025-11-08 15:21 ` Simon Horman
2025-11-07 0:08 ` [PATCH net-next V2 3/3] net/mlx5: E-Switch, support eswitch inactive mode Saeed Mahameed
2 siblings, 1 reply; 9+ messages in thread
From: Saeed Mahameed @ 2025-11-07 0:08 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
Leon Romanovsky, Jiri Pirko, mbloch, Adithya Jayachandran
From: Saeed Mahameed <saeedm@nvidia.com>
MPFS (Multi PF Switch) is enabled by default in Multi-Host environments,
the driver keeps a list of desired unicast mac addresses of all vports
(vfs/Sfs) and applied to HW via L2_table FW command.
Add API to dynamically apply the list of MACs to HW when needed for next
patches, to utilize this new API in devlink eswitch active/in-active uAPI.
Issue: 4314625
Change-Id: I185c144319e514f787811f556888e1b727bdbf35
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
.../ethernet/mellanox/mlx5/core/lib/mpfs.c | 115 +++++++++++++++---
.../ethernet/mellanox/mlx5/core/lib/mpfs.h | 9 ++
2 files changed, 107 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c
index 4450091e181a..f27b5adb9606 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c
@@ -65,13 +65,14 @@ static int del_l2table_entry_cmd(struct mlx5_core_dev *dev, u32 index)
/* UC L2 table hash node */
struct l2table_node {
struct l2addr_node node;
- u32 index; /* index in HW l2 table */
+ int index; /* index in HW l2 table */
int ref_count;
};
struct mlx5_mpfs {
struct hlist_head hash[MLX5_L2_ADDR_HASH_SIZE];
struct mutex lock; /* Synchronize l2 table access */
+ bool enabled;
u32 size;
unsigned long *bitmap;
};
@@ -114,6 +115,8 @@ int mlx5_mpfs_init(struct mlx5_core_dev *dev)
return -ENOMEM;
}
+ mpfs->enabled = true;
+
dev->priv.mpfs = mpfs;
return 0;
}
@@ -135,7 +138,7 @@ int mlx5_mpfs_add_mac(struct mlx5_core_dev *dev, u8 *mac)
struct mlx5_mpfs *mpfs = dev->priv.mpfs;
struct l2table_node *l2addr;
int err = 0;
- u32 index;
+ int index;
if (!mpfs)
return 0;
@@ -148,30 +151,34 @@ int mlx5_mpfs_add_mac(struct mlx5_core_dev *dev, u8 *mac)
goto out;
}
- err = alloc_l2table_index(mpfs, &index);
- if (err)
- goto out;
-
l2addr = l2addr_hash_add(mpfs->hash, mac, struct l2table_node, GFP_KERNEL);
if (!l2addr) {
err = -ENOMEM;
- goto hash_add_err;
+ goto out;
}
- err = set_l2table_entry_cmd(dev, index, mac);
- if (err)
- goto set_table_entry_err;
+ index = -1;
+
+ if (mpfs->enabled) {
+ err = alloc_l2table_index(mpfs, &index);
+ if (err)
+ goto hash_del;
+ err = set_l2table_entry_cmd(dev, index, mac);
+ if (err)
+ goto free_l2table_index;
+ mlx5_core_dbg(dev, "MPFS entry %pM, set @index (%d)\n",
+ l2addr->node.addr, l2addr->index);
+ }
l2addr->index = index;
l2addr->ref_count = 1;
mlx5_core_dbg(dev, "MPFS mac added %pM, index (%d)\n", mac, index);
goto out;
-
-set_table_entry_err:
- l2addr_hash_del(l2addr);
-hash_add_err:
+free_l2table_index:
free_l2table_index(mpfs, index);
+hash_del:
+ l2addr_hash_del(l2addr);
out:
mutex_unlock(&mpfs->lock);
return err;
@@ -183,7 +190,7 @@ int mlx5_mpfs_del_mac(struct mlx5_core_dev *dev, u8 *mac)
struct mlx5_mpfs *mpfs = dev->priv.mpfs;
struct l2table_node *l2addr;
int err = 0;
- u32 index;
+ int index;
if (!mpfs)
return 0;
@@ -200,12 +207,86 @@ int mlx5_mpfs_del_mac(struct mlx5_core_dev *dev, u8 *mac)
goto unlock;
index = l2addr->index;
- del_l2table_entry_cmd(dev, index);
+ if (index >= 0) {
+ del_l2table_entry_cmd(dev, index);
+ free_l2table_index(mpfs, index);
+ mlx5_core_dbg(dev, "MPFS entry %pM, deleted @index (%d)\n",
+ mac, index);
+ }
l2addr_hash_del(l2addr);
- free_l2table_index(mpfs, index);
mlx5_core_dbg(dev, "MPFS mac deleted %pM, index (%d)\n", mac, index);
unlock:
mutex_unlock(&mpfs->lock);
return err;
}
EXPORT_SYMBOL(mlx5_mpfs_del_mac);
+
+int mlx5_mpfs_enable(struct mlx5_core_dev *dev)
+{
+ struct mlx5_mpfs *mpfs = dev->priv.mpfs;
+ struct l2table_node *l2addr;
+ struct hlist_node *n;
+ int err = 0;
+
+ if (!mpfs)
+ return -ENODEV;
+
+ mutex_lock(&mpfs->lock);
+ if (mpfs->enabled)
+ goto out;
+ mpfs->enabled = true;
+ mlx5_core_dbg(dev, "MPFS enabling mpfs\n");
+
+ mlx5_mpfs_foreach(l2addr, n, mpfs) {
+ u32 index;
+
+ err = alloc_l2table_index(mpfs, &index);
+ if (err) {
+ mlx5_core_err(dev, "Failed to allocated MPFS index for %pM, err(%d)\n",
+ l2addr->node.addr, err);
+ goto out;
+ }
+
+ err = set_l2table_entry_cmd(dev, index, l2addr->node.addr);
+ if (err) {
+ mlx5_core_err(dev, "Failed to set MPFS l2table entry for %pM index=%d, err(%d)\n",
+ l2addr->node.addr, index, err);
+ free_l2table_index(mpfs, index);
+ goto out;
+ }
+
+ l2addr->index = index;
+ mlx5_core_dbg(dev, "MPFS entry %pM, set @index (%d)\n",
+ l2addr->node.addr, l2addr->index);
+ }
+out:
+ mutex_unlock(&mpfs->lock);
+ return err;
+}
+
+void mlx5_mpfs_disable(struct mlx5_core_dev *dev)
+{
+ struct mlx5_mpfs *mpfs = dev->priv.mpfs;
+ struct l2table_node *l2addr;
+ struct hlist_node *n;
+
+ if (!mpfs)
+ return;
+
+ mutex_lock(&mpfs->lock);
+ if (!mpfs->enabled)
+ goto unlock;
+ mlx5_mpfs_foreach(l2addr, n, mpfs) {
+ if (l2addr->index < 0)
+ continue;
+ del_l2table_entry_cmd(dev, l2addr->index);
+ free_l2table_index(mpfs, l2addr->index);
+ mlx5_core_dbg(dev, "MPFS entry %pM, deleted @index (%d)\n",
+ l2addr->node.addr, l2addr->index);
+ l2addr->index = -1;
+ }
+ mpfs->enabled = false;
+ mlx5_core_dbg(dev, "MPFS disabled\n");
+unlock:
+ mutex_unlock(&mpfs->lock);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h
index 4a293542a7aa..be0ccdd5a0aa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h
@@ -45,6 +45,10 @@ struct l2addr_node {
u8 addr[ETH_ALEN];
};
+#define mlx5_mpfs_foreach(hs, tmp, mpfs) \
+ for (int j = 0; j < MLX5_L2_ADDR_HASH_SIZE; j++) \
+ hlist_for_each_entry_safe(hs, tmp, &(mpfs)->hash[j], node.hlist)
+
#define for_each_l2hash_node(hn, tmp, hash, i) \
for (i = 0; i < MLX5_L2_ADDR_HASH_SIZE; i++) \
hlist_for_each_entry_safe(hn, tmp, &(hash)[i], hlist)
@@ -82,11 +86,16 @@ struct l2addr_node {
})
#ifdef CONFIG_MLX5_MPFS
+struct mlx5_core_dev;
int mlx5_mpfs_init(struct mlx5_core_dev *dev);
void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev);
+int mlx5_mpfs_enable(struct mlx5_core_dev *dev);
+void mlx5_mpfs_disable(struct mlx5_core_dev *dev);
#else /* #ifndef CONFIG_MLX5_MPFS */
static inline int mlx5_mpfs_init(struct mlx5_core_dev *dev) { return 0; }
static inline void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev) {}
+static inline int mlx5_mpfs_enable(struct mlx5_core_dev *dev) { return 0; }
+static inline void mlx5_mpfs_disable(struct mlx5_core_dev *dev) {}
#endif
#endif
--
2.51.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH net-next V2 3/3] net/mlx5: E-Switch, support eswitch inactive mode
2025-11-07 0:08 [PATCH net-next V2 0/3] devlink eswitch inactive mode Saeed Mahameed
2025-11-07 0:08 ` [PATCH net-next V2 1/3] devlink: Introduce switchdev_inactive eswitch mode Saeed Mahameed
2025-11-07 0:08 ` [PATCH net-next V2 2/3] net/mlx5: MPFS, add support for dynamic enable/disable Saeed Mahameed
@ 2025-11-07 0:08 ` Saeed Mahameed
2025-11-08 2:28 ` Jakub Kicinski
2 siblings, 1 reply; 9+ messages in thread
From: Saeed Mahameed @ 2025-11-07 0:08 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
Leon Romanovsky, Jiri Pirko, mbloch, Adithya Jayachandran
From: Saeed Mahameed <saeedm@nvidia.com>
Add support for eswitch switchdev inactive mode
Inactive mode: Drop all traffic going to FDB, Remove
mpfs l2 rules and disconnect adjacent vports.
Active mode: Traffic flows through FDB, mpfs table populated, and
adjacent vports are connected.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
.../mellanox/mlx5/core/esw/adj_vport.c | 15 +-
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 6 +
.../mellanox/mlx5/core/eswitch_offloads.c | 194 +++++++++++++++++-
.../net/ethernet/mellanox/mlx5/core/fs_core.c | 5 +
.../ethernet/mellanox/mlx5/core/lib/mpfs.c | 2 +-
include/linux/mlx5/fs.h | 1 +
6 files changed, 202 insertions(+), 21 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c
index 0091ba697bae..250af09b5af2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c
@@ -4,13 +4,8 @@
#include "fs_core.h"
#include "eswitch.h"
-enum {
- MLX5_ADJ_VPORT_DISCONNECT = 0x0,
- MLX5_ADJ_VPORT_CONNECT = 0x1,
-};
-
-static int mlx5_esw_adj_vport_modify(struct mlx5_core_dev *dev,
- u16 vport, bool connect)
+int mlx5_esw_adj_vport_modify(struct mlx5_core_dev *dev, u16 vport,
+ bool connect)
{
u32 in[MLX5_ST_SZ_DW(modify_vport_state_in)] = {};
@@ -24,7 +19,7 @@ static int mlx5_esw_adj_vport_modify(struct mlx5_core_dev *dev,
MLX5_SET(modify_vport_state_in, in, egress_connect_valid, 1);
MLX5_SET(modify_vport_state_in, in, ingress_connect, connect);
MLX5_SET(modify_vport_state_in, in, egress_connect, connect);
-
+ MLX5_SET(modify_vport_state_in, in, admin_state, connect);
return mlx5_cmd_exec_in(dev, modify_vport_state, in);
}
@@ -96,7 +91,6 @@ static int mlx5_esw_adj_vport_create(struct mlx5_eswitch *esw, u16 vhca_id,
if (err)
goto acl_ns_remove;
- mlx5_esw_adj_vport_modify(esw->dev, vport_num, MLX5_ADJ_VPORT_CONNECT);
return 0;
acl_ns_remove:
@@ -117,8 +111,7 @@ static void mlx5_esw_adj_vport_destroy(struct mlx5_eswitch *esw,
esw_debug(esw->dev, "Destroying adjacent vport %d for vhca_id 0x%x\n",
vport_num, vport->vhca_id);
- mlx5_esw_adj_vport_modify(esw->dev, vport_num,
- MLX5_ADJ_VPORT_DISCONNECT);
+
mlx5_esw_offloads_rep_remove(esw, vport);
mlx5_fs_vport_egress_acl_ns_remove(esw->dev->priv.steering,
vport->index);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 16eb99aba2a7..368ef2032825 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -264,6 +264,9 @@ struct mlx5_eswitch_fdb {
struct offloads_fdb {
struct mlx5_flow_namespace *ns;
+ struct mlx5_flow_table *drop_root;
+ struct mlx5_flow_handle *drop_root_rule;
+ struct mlx5_fc *drop_root_counter;
struct mlx5_flow_table *tc_miss_table;
struct mlx5_flow_table *slow_fdb;
struct mlx5_flow_group *send_to_vport_grp;
@@ -392,6 +395,7 @@ struct mlx5_eswitch {
struct mlx5_esw_offload offloads;
u32 last_vport_idx;
int mode;
+ bool offloads_inactive;
u16 manager_vport;
u16 first_host_vport;
u8 num_peers;
@@ -634,6 +638,8 @@ const u32 *mlx5_esw_query_functions(struct mlx5_core_dev *dev);
void mlx5_esw_adjacent_vhcas_setup(struct mlx5_eswitch *esw);
void mlx5_esw_adjacent_vhcas_cleanup(struct mlx5_eswitch *esw);
+int mlx5_esw_adj_vport_modify(struct mlx5_core_dev *dev, u16 vport,
+ bool connect);
#define MLX5_DEBUG_ESWITCH_MASK BIT(3)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 4092ea29c630..d0da2ed979f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -1577,6 +1577,7 @@ esw_chains_create(struct mlx5_eswitch *esw, struct mlx5_flow_table *miss_fdb)
attr.max_grp_num = esw->params.large_group_num;
attr.default_ft = miss_fdb;
attr.mapping = esw->offloads.reg_c0_obj_pool;
+ attr.fs_base_prio = FDB_BYPASS_PATH;
chains = mlx5_chains_create(dev, &attr);
if (IS_ERR(chains)) {
@@ -2355,6 +2356,123 @@ static void esw_mode_change(struct mlx5_eswitch *esw, u16 mode)
mlx5_devcom_comp_unlock(esw->dev->priv.hca_devcom_comp);
}
+static void mlx5_esw_fdb_drop_destroy(struct mlx5_eswitch *esw)
+{
+ if (!esw->fdb_table.offloads.drop_root)
+ return;
+
+ esw_debug(esw->dev, "Destroying FDB drop root table %#x fc %#x\n",
+ esw->fdb_table.offloads.drop_root->id,
+ esw->fdb_table.offloads.drop_root_counter->id);
+ mlx5_del_flow_rules(esw->fdb_table.offloads.drop_root_rule);
+ mlx5_fc_destroy(esw->dev, esw->fdb_table.offloads.drop_root_counter);
+ mlx5_destroy_flow_table(esw->fdb_table.offloads.drop_root);
+ esw->fdb_table.offloads.drop_root_counter = NULL;
+ esw->fdb_table.offloads.drop_root_rule = NULL;
+ esw->fdb_table.offloads.drop_root = NULL;
+}
+
+static int mlx5_esw_fdb_drop_create(struct mlx5_eswitch *esw)
+{
+ struct mlx5_flow_table_attr ft_attr = {};
+ struct mlx5_flow_destination dst = {};
+ struct mlx5_core_dev *dev = esw->dev;
+ struct mlx5_flow_namespace *root_ns;
+ struct mlx5_flow_act flow_act = {};
+ struct mlx5_flow_handle *flow_rule;
+ struct mlx5_flow_table *table;
+ int err = 0;
+
+ if (esw->fdb_table.offloads.drop_root)
+ return 0;
+
+ root_ns = esw->fdb_table.offloads.ns;
+
+ ft_attr.prio = FDB_DROP_ROOT;
+ ft_attr.max_fte = 1;
+ ft_attr.autogroup.max_num_groups = 1;
+ table = mlx5_create_auto_grouped_flow_table(root_ns, &ft_attr);
+ if (IS_ERR(table)) {
+ esw_warn(dev, "Failed to create fdb drop root table, err %ld\n",
+ PTR_ERR(table));
+ return PTR_ERR(table);
+ }
+
+ dst.type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
+ dst.counter = mlx5_fc_create(dev, 0);
+ err = PTR_ERR_OR_ZERO(dst.counter);
+ if (err) {
+ esw_warn(dev, "Failed to create fdb drop counter, err %d\n",
+ err);
+ goto err_counter;
+ }
+
+ flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP |
+ MLX5_FLOW_CONTEXT_ACTION_COUNT;
+ flow_rule = mlx5_add_flow_rules(table, NULL, &flow_act, &dst, 1);
+ err = PTR_ERR_OR_ZERO(flow_rule);
+ if (err) {
+ esw_warn(esw->dev,
+ "fs offloads: Failed to add vport rx drop rule err %d\n",
+ err);
+ goto err_flow_rule;
+ }
+
+ esw->fdb_table.offloads.drop_root = table;
+ esw->fdb_table.offloads.drop_root_rule = flow_rule;
+ esw->fdb_table.offloads.drop_root_counter = dst.counter;
+ esw_debug(esw->dev, "Created FDB drop root table %#x fc %#x\n",
+ table->id, dst.counter->id);
+ return 0;
+
+err_flow_rule:
+ mlx5_fc_destroy(dev, dst.counter);
+err_counter:
+ mlx5_destroy_flow_table(table);
+ return err;
+}
+
+static void mlx5_esw_fdb_active(struct mlx5_eswitch *esw)
+{
+ struct mlx5_vport *vport;
+ unsigned long i;
+
+ mlx5_esw_fdb_drop_destroy(esw);
+ mlx5_mpfs_enable(esw->dev);
+
+ mlx5_esw_for_each_vf_vport(esw, i, vport, U16_MAX) {
+ if (!vport->adjacent)
+ continue;
+ esw_debug(esw->dev, "Connecting vport %d to eswitch\n",
+ vport->vport);
+ mlx5_esw_adj_vport_modify(esw->dev, vport->vport, true);
+ }
+
+ esw->offloads_inactive = false;
+ esw_warn(esw->dev, "MPFS/FDB active\n");
+}
+
+static void mlx5_esw_fdb_inactive(struct mlx5_eswitch *esw)
+{
+ struct mlx5_vport *vport;
+ unsigned long i;
+
+ mlx5_mpfs_disable(esw->dev);
+ mlx5_esw_fdb_drop_create(esw);
+
+ mlx5_esw_for_each_vf_vport(esw, i, vport, U16_MAX) {
+ if (!vport->adjacent)
+ continue;
+ esw_debug(esw->dev, "Disconnecting vport %u from eswitch\n",
+ vport->vport);
+
+ mlx5_esw_adj_vport_modify(esw->dev, vport->vport, false);
+ }
+
+ esw->offloads_inactive = true;
+ esw_warn(esw->dev, "MPFS/FDB inactive\n");
+}
+
static int esw_offloads_start(struct mlx5_eswitch *esw,
struct netlink_ext_ack *extack)
{
@@ -3438,6 +3556,7 @@ static int esw_offloads_steering_init(struct mlx5_eswitch *esw)
static void esw_offloads_steering_cleanup(struct mlx5_eswitch *esw)
{
+ mlx5_esw_fdb_drop_destroy(esw);
esw_destroy_vport_rx_drop_rule(esw);
esw_destroy_vport_rx_drop_group(esw);
esw_destroy_vport_rx_group(esw);
@@ -3600,6 +3719,11 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
if (err)
goto err_steering_init;
+ if (esw->offloads_inactive)
+ mlx5_esw_fdb_inactive(esw);
+ else
+ mlx5_esw_fdb_active(esw);
+
/* Representor will control the vport link state */
mlx5_esw_for_each_vf_vport(esw, i, vport, esw->esw_funcs.num_vfs)
vport->info.link_state = MLX5_VPORT_ADMIN_STATE_DOWN;
@@ -3666,6 +3790,9 @@ void esw_offloads_disable(struct mlx5_eswitch *esw)
esw_offloads_metadata_uninit(esw);
mlx5_rdma_disable_roce(esw->dev);
mlx5_esw_adjacent_vhcas_cleanup(esw);
+ /* must be done after vhcas cleanup to avoid adjacent vports connect */
+ if (esw->offloads_inactive)
+ mlx5_esw_fdb_active(esw); /* legacy mode always active */
mutex_destroy(&esw->offloads.termtbl_mutex);
}
@@ -3676,6 +3803,7 @@ static int esw_mode_from_devlink(u16 mode, u16 *mlx5_mode)
*mlx5_mode = MLX5_ESWITCH_LEGACY;
break;
case DEVLINK_ESWITCH_MODE_SWITCHDEV:
+ case DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE:
*mlx5_mode = MLX5_ESWITCH_OFFLOADS;
break;
default:
@@ -3685,14 +3813,17 @@ static int esw_mode_from_devlink(u16 mode, u16 *mlx5_mode)
return 0;
}
-static int esw_mode_to_devlink(u16 mlx5_mode, u16 *mode)
+static int esw_mode_to_devlink(struct mlx5_eswitch *esw, u16 *mode)
{
- switch (mlx5_mode) {
+ switch (esw->mode) {
case MLX5_ESWITCH_LEGACY:
*mode = DEVLINK_ESWITCH_MODE_LEGACY;
break;
case MLX5_ESWITCH_OFFLOADS:
- *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV;
+ if (esw->offloads_inactive)
+ *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE;
+ else
+ *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV;
break;
default:
return -EINVAL;
@@ -3798,6 +3929,45 @@ static bool mlx5_devlink_netdev_netns_immutable_set(struct devlink *devlink,
return ret;
}
+/* Returns true when only changing between active and inactive switchdev mode */
+static bool mlx5_devlink_switchdev_active_mode_change(struct mlx5_eswitch *esw,
+ u16 devlink_mode)
+{
+ /* current mode is not switchdev */
+ if (esw->mode != MLX5_ESWITCH_OFFLOADS)
+ return false;
+
+ /* new mode is not switchdev */
+ if (devlink_mode != DEVLINK_ESWITCH_MODE_SWITCHDEV &&
+ devlink_mode != DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE)
+ return false;
+
+ /* already inactive: no change in current state */
+ if (devlink_mode == DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE &&
+ esw->offloads_inactive)
+ return false;
+
+ /* already active: no change in current state */
+ if (devlink_mode == DEVLINK_ESWITCH_MODE_SWITCHDEV &&
+ !esw->offloads_inactive)
+ return false;
+
+ down_write(&esw->mode_lock);
+ esw->offloads_inactive = !esw->offloads_inactive;
+ esw->eswitch_operation_in_progress = true;
+ up_write(&esw->mode_lock);
+
+ if (esw->offloads_inactive)
+ mlx5_esw_fdb_inactive(esw);
+ else
+ mlx5_esw_fdb_active(esw);
+
+ down_write(&esw->mode_lock);
+ esw->eswitch_operation_in_progress = false;
+ up_write(&esw->mode_lock);
+ return true;
+}
+
int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
struct netlink_ext_ack *extack)
{
@@ -3812,12 +3982,16 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
if (esw_mode_from_devlink(mode, &mlx5_mode))
return -EINVAL;
- if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV && mlx5_get_sd(esw->dev)) {
+ if (mlx5_mode == MLX5_ESWITCH_OFFLOADS && mlx5_get_sd(esw->dev)) {
NL_SET_ERR_MSG_MOD(extack,
"Can't change E-Switch mode to switchdev when multi-PF netdev (Socket Direct) is configured.");
return -EPERM;
}
+ /* Avoid try_lock, active/inactive mode change is not restricted */
+ if (mlx5_devlink_switchdev_active_mode_change(esw, mode))
+ return 0;
+
mlx5_lag_disable_change(esw->dev);
err = mlx5_esw_try_lock(esw);
if (err < 0) {
@@ -3840,7 +4014,7 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
esw->eswitch_operation_in_progress = true;
up_write(&esw->mode_lock);
- if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV &&
+ if (mlx5_mode == MLX5_ESWITCH_OFFLOADS &&
!mlx5_devlink_netdev_netns_immutable_set(devlink, true)) {
NL_SET_ERR_MSG_MOD(extack,
"Can't change E-Switch mode to switchdev when netdev net namespace has diverged from the devlink's.");
@@ -3848,18 +4022,20 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
goto skip;
}
- if (mode == DEVLINK_ESWITCH_MODE_LEGACY)
+ if (mlx5_mode == MLX5_ESWITCH_LEGACY)
esw->dev->priv.flags |= MLX5_PRIV_FLAGS_SWITCH_LEGACY;
mlx5_eswitch_disable_locked(esw);
- if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) {
+ if (mlx5_mode == MLX5_ESWITCH_OFFLOADS) {
if (mlx5_devlink_trap_get_num_active(esw->dev)) {
NL_SET_ERR_MSG_MOD(extack,
"Can't change mode while devlink traps are active");
err = -EOPNOTSUPP;
goto skip;
}
+ esw->offloads_inactive =
+ (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE);
err = esw_offloads_start(esw, extack);
- } else if (mode == DEVLINK_ESWITCH_MODE_LEGACY) {
+ } else if (mlx5_mode == MLX5_ESWITCH_LEGACY) {
err = esw_offloads_stop(esw, extack);
} else {
err = -EINVAL;
@@ -3885,7 +4061,7 @@ int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
if (IS_ERR(esw))
return PTR_ERR(esw);
- return esw_mode_to_devlink(esw->mode, mode);
+ return esw_mode_to_devlink(esw, mode);
}
static int mlx5_esw_vports_inline_set(struct mlx5_eswitch *esw, u8 mlx5_mode,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 2db3ffb0a2b2..2ca3bddbdf05 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -3520,6 +3520,11 @@ static int init_fdb_root_ns(struct mlx5_flow_steering *steering)
if (!steering->fdb_root_ns)
return -ENOMEM;
+ maj_prio = fs_create_prio(&steering->fdb_root_ns->ns, FDB_DROP_ROOT, 1);
+ err = PTR_ERR_OR_ZERO(maj_prio);
+ if (err)
+ goto out_err;
+
err = create_fdb_bypass(steering);
if (err)
goto out_err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c
index f27b5adb9606..3777aef600d1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c
@@ -167,7 +167,7 @@ int mlx5_mpfs_add_mac(struct mlx5_core_dev *dev, u8 *mac)
if (err)
goto free_l2table_index;
mlx5_core_dbg(dev, "MPFS entry %pM, set @index (%d)\n",
- l2addr->node.addr, l2addr->index);
+ l2addr->node.addr, index);
}
l2addr->index = index;
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 6ac76a0c3827..7bf2449c53b2 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -116,6 +116,7 @@ enum mlx5_flow_namespace_type {
};
enum {
+ FDB_DROP_ROOT,
FDB_BYPASS_PATH,
FDB_CRYPTO_INGRESS,
FDB_TC_OFFLOAD,
--
2.51.1
^ permalink raw reply related [flat|nested] 9+ messages in thread