* [PATCH net 1/8] net/mlx5: Fix RDMA TX steering prio
2025-01-13 15:40 [PATCH net 0/8] mlx5 misc fixes 2025-01-13 Tariq Toukan
@ 2025-01-13 15:40 ` Tariq Toukan
2025-01-13 18:58 ` Jacob Keller
2025-01-13 15:40 ` [PATCH net 2/8] net/mlx5: Fix a lockdep warning as part of the write combining test Tariq Toukan
` (6 subsequent siblings)
7 siblings, 1 reply; 20+ messages in thread
From: Tariq Toukan @ 2025-01-13 15:40 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh, Patrisious Haddad, Tariq Toukan
From: Patrisious Haddad <phaddad@nvidia.com>
User added steering rules at RDMA_TX were being added to the first prio,
which is the counters prio.
Fix that so that they are correctly added to the BYPASS_PRIO instead.
Fixes: 24670b1a3166 ("net/mlx5: Add support for RDMA TX steering")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 2eabfcc247c6..0ce999706d41 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -2709,6 +2709,7 @@ struct mlx5_flow_namespace *mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
break;
case MLX5_FLOW_NAMESPACE_RDMA_TX:
root_ns = steering->rdma_tx_root_ns;
+ prio = RDMA_TX_BYPASS_PRIO;
break;
case MLX5_FLOW_NAMESPACE_RDMA_RX_COUNTERS:
root_ns = steering->rdma_rx_root_ns;
--
2.45.0
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH net 1/8] net/mlx5: Fix RDMA TX steering prio
2025-01-13 15:40 ` [PATCH net 1/8] net/mlx5: Fix RDMA TX steering prio Tariq Toukan
@ 2025-01-13 18:58 ` Jacob Keller
0 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2025-01-13 18:58 UTC (permalink / raw)
To: Tariq Toukan, David S. Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh, Patrisious Haddad
On 1/13/2025 7:40 AM, Tariq Toukan wrote:
> From: Patrisious Haddad <phaddad@nvidia.com>
>
> User added steering rules at RDMA_TX were being added to the first prio,
> which is the counters prio.
> Fix that so that they are correctly added to the BYPASS_PRIO instead.
>
> Fixes: 24670b1a3166 ("net/mlx5: Add support for RDMA TX steering")
> Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
> Reviewed-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> index 2eabfcc247c6..0ce999706d41 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> @@ -2709,6 +2709,7 @@ struct mlx5_flow_namespace *mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
> break;
> case MLX5_FLOW_NAMESPACE_RDMA_TX:
> root_ns = steering->rdma_tx_root_ns;
> + prio = RDMA_TX_BYPASS_PRIO;
> break;
> case MLX5_FLOW_NAMESPACE_RDMA_RX_COUNTERS:
> root_ns = steering->rdma_rx_root_ns;
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH net 2/8] net/mlx5: Fix a lockdep warning as part of the write combining test
2025-01-13 15:40 [PATCH net 0/8] mlx5 misc fixes 2025-01-13 Tariq Toukan
2025-01-13 15:40 ` [PATCH net 1/8] net/mlx5: Fix RDMA TX steering prio Tariq Toukan
@ 2025-01-13 15:40 ` Tariq Toukan
2025-01-14 16:33 ` Larysa Zaremba
2025-01-13 15:40 ` [PATCH net 3/8] net/mlx5: SF, Fix add port error handling Tariq Toukan
` (5 subsequent siblings)
7 siblings, 1 reply; 20+ messages in thread
From: Tariq Toukan @ 2025-01-13 15:40 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh, Yishai Hadas, Michael Guralnik, Tariq Toukan
From: Yishai Hadas <yishaih@nvidia.com>
Fix a lockdep warning [1] observed during the write combining test.
The warning indicates a potential nested lock scenario that could lead
to a deadlock.
However, this is a false positive alarm because the SF lock and its
parent lock are distinct ones.
The lockdep confusion arises because the locks belong to the same object
class (i.e., struct mlx5_core_dev).
To resolve this, the code has been refactored to avoid taking both
locks. Instead, only the parent lock is acquired.
[1]
raw_ethernet_bw/2118 is trying to acquire lock:
[ 213.619032] ffff88811dd75e08 (&dev->wc_state_lock){+.+.}-{3:3}, at:
mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.620270]
[ 213.620270] but task is already holding lock:
[ 213.620943] ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3}, at:
mlx5_wc_support_get+0x10c/0x210 [mlx5_core]
[ 213.622045]
[ 213.622045] other info that might help us debug this:
[ 213.622778] Possible unsafe locking scenario:
[ 213.622778]
[ 213.623465] CPU0
[ 213.623815] ----
[ 213.624148] lock(&dev->wc_state_lock);
[ 213.624615] lock(&dev->wc_state_lock);
[ 213.625071]
[ 213.625071] *** DEADLOCK ***
[ 213.625071]
[ 213.625805] May be due to missing lock nesting notation
[ 213.625805]
[ 213.626522] 4 locks held by raw_ethernet_bw/2118:
[ 213.627019] #0: ffff88813f80d578 (&uverbs_dev->disassociate_srcu){.+.+}-{0:0},
at: ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[ 213.628088] #1: ffff88810fb23930 (&file->hw_destroy_rwsem){.+.+}-{3:3},
at: ib_init_ucontext+0x2d/0xf0 [ib_uverbs]
[ 213.629094] #2: ffff88810fb23878 (&file->ucontext_lock){+.+.}-{3:3},
at: ib_init_ucontext+0x49/0xf0 [ib_uverbs]
[ 213.630106] #3: ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3},
at: mlx5_wc_support_get+0x10c/0x210 [mlx5_core]
[ 213.631185]
[ 213.631185] stack backtrace:
[ 213.631718] CPU: 1 UID: 0 PID: 2118 Comm: raw_ethernet_bw Not tainted
6.12.0-rc7_internal_net_next_mlx5_89a0ad0 #1
[ 213.632722] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 213.633785] Call Trace:
[ 213.634099]
[ 213.634393] dump_stack_lvl+0x7e/0xc0
[ 213.634806] print_deadlock_bug+0x278/0x3c0
[ 213.635265] __lock_acquire+0x15f4/0x2c40
[ 213.635712] lock_acquire+0xcd/0x2d0
[ 213.636120] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.636722] ? mlx5_ib_enable_lb+0x24/0xa0 [mlx5_ib]
[ 213.637277] __mutex_lock+0x81/0xda0
[ 213.637697] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.638305] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.638902] ? rcu_read_lock_sched_held+0x3f/0x70
[ 213.639400] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.640016] mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.640615] set_ucontext_resp+0x68/0x2b0 [mlx5_ib]
[ 213.641144] ? debug_mutex_init+0x33/0x40
[ 213.641586] mlx5_ib_alloc_ucontext+0x18e/0x7b0 [mlx5_ib]
[ 213.642145] ib_init_ucontext+0xa0/0xf0 [ib_uverbs]
[ 213.642679] ib_uverbs_handler_UVERBS_METHOD_GET_CONTEXT+0x95/0xc0
[ib_uverbs]
[ 213.643426] ? _copy_from_user+0x46/0x80
[ 213.643878] ib_uverbs_cmd_verbs+0xa6b/0xc80 [ib_uverbs]
[ 213.644426] ? ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x130/0x130
[ib_uverbs]
[ 213.645213] ? __lock_acquire+0xa99/0x2c40
[ 213.645675] ? lock_acquire+0xcd/0x2d0
[ 213.646101] ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[ 213.646625] ? reacquire_held_locks+0xcf/0x1f0
[ 213.647102] ? do_user_addr_fault+0x45d/0x770
[ 213.647586] ib_uverbs_ioctl+0xe0/0x170 [ib_uverbs]
[ 213.648102] ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[ 213.648632] __x64_sys_ioctl+0x4d3/0xaa0
[ 213.649060] ? do_user_addr_fault+0x4a8/0x770
[ 213.649528] do_syscall_64+0x6d/0x140
[ 213.649947] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 213.650478] RIP: 0033:0x7fa179b0737b
[ 213.650893] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c
89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8
10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d
7d 2a 0f 00 f7 d8 64 89 01 48
[ 213.652619] RSP: 002b:00007ffd2e6d46e8 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 213.653390] RAX: ffffffffffffffda RBX: 00007ffd2e6d47f8 RCX:
00007fa179b0737b
[ 213.654084] RDX: 00007ffd2e6d47e0 RSI: 00000000c0181b01 RDI:
0000000000000003
[ 213.654767] RBP: 00007ffd2e6d47c0 R08: 00007fa1799be010 R09:
0000000000000002
[ 213.655453] R10: 00007ffd2e6d4960 R11: 0000000000000246 R12:
00007ffd2e6d487c
[ 213.656170] R13: 0000000000000027 R14: 0000000000000001 R15:
00007ffd2e6d4f70
Fixes: d98995b4bf98 ("net/mlx5: Reimplement write combining test")
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/wc.c | 24 ++++++++++----------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
index 1bed75eca97d..740b719e7072 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
@@ -382,6 +382,7 @@ static void mlx5_core_test_wc(struct mlx5_core_dev *mdev)
bool mlx5_wc_support_get(struct mlx5_core_dev *mdev)
{
+ struct mutex *wc_state_lock = &mdev->wc_state_lock;
struct mlx5_core_dev *parent = NULL;
if (!MLX5_CAP_GEN(mdev, bf)) {
@@ -400,32 +401,31 @@ bool mlx5_wc_support_get(struct mlx5_core_dev *mdev)
*/
goto out;
- mutex_lock(&mdev->wc_state_lock);
-
- if (mdev->wc_state != MLX5_WC_STATE_UNINITIALIZED)
- goto unlock;
-
#ifdef CONFIG_MLX5_SF
- if (mlx5_core_is_sf(mdev))
+ if (mlx5_core_is_sf(mdev)) {
parent = mdev->priv.parent_mdev;
+ wc_state_lock = &parent->wc_state_lock;
+ }
#endif
- if (parent) {
- mutex_lock(&parent->wc_state_lock);
+ mutex_lock(wc_state_lock);
+ if (mdev->wc_state != MLX5_WC_STATE_UNINITIALIZED)
+ goto unlock;
+
+ if (parent) {
mlx5_core_test_wc(parent);
mlx5_core_dbg(mdev, "parent set wc_state=%d\n",
parent->wc_state);
mdev->wc_state = parent->wc_state;
- mutex_unlock(&parent->wc_state_lock);
+ } else {
+ mlx5_core_test_wc(mdev);
}
- mlx5_core_test_wc(mdev);
-
unlock:
- mutex_unlock(&mdev->wc_state_lock);
+ mutex_unlock(wc_state_lock);
out:
mlx5_core_dbg(mdev, "wc_state=%d\n", mdev->wc_state);
--
2.45.0
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH net 2/8] net/mlx5: Fix a lockdep warning as part of the write combining test
2025-01-13 15:40 ` [PATCH net 2/8] net/mlx5: Fix a lockdep warning as part of the write combining test Tariq Toukan
@ 2025-01-14 16:33 ` Larysa Zaremba
0 siblings, 0 replies; 20+ messages in thread
From: Larysa Zaremba @ 2025-01-14 16:33 UTC (permalink / raw)
To: Tariq Toukan
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, netdev, Saeed Mahameed, Gal Pressman,
Leon Romanovsky, Mark Bloch, Moshe Shemesh, Yishai Hadas,
Michael Guralnik
On Mon, Jan 13, 2025 at 05:40:48PM +0200, Tariq Toukan wrote:
> From: Yishai Hadas <yishaih@nvidia.com>
>
> Fix a lockdep warning [1] observed during the write combining test.
>
> The warning indicates a potential nested lock scenario that could lead
> to a deadlock.
>
> However, this is a false positive alarm because the SF lock and its
> parent lock are distinct ones.
>
> The lockdep confusion arises because the locks belong to the same object
> class (i.e., struct mlx5_core_dev).
>
> To resolve this, the code has been refactored to avoid taking both
> locks. Instead, only the parent lock is acquired.
>
[...]
>
> Fixes: d98995b4bf98 ("net/mlx5: Reimplement write combining test")
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/wc.c | 24 ++++++++++----------
> 1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
> index 1bed75eca97d..740b719e7072 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
> @@ -382,6 +382,7 @@ static void mlx5_core_test_wc(struct mlx5_core_dev *mdev)
>
> bool mlx5_wc_support_get(struct mlx5_core_dev *mdev)
> {
> + struct mutex *wc_state_lock = &mdev->wc_state_lock;
> struct mlx5_core_dev *parent = NULL;
>
> if (!MLX5_CAP_GEN(mdev, bf)) {
> @@ -400,32 +401,31 @@ bool mlx5_wc_support_get(struct mlx5_core_dev *mdev)
> */
> goto out;
>
> - mutex_lock(&mdev->wc_state_lock);
> -
> - if (mdev->wc_state != MLX5_WC_STATE_UNINITIALIZED)
> - goto unlock;
> -
> #ifdef CONFIG_MLX5_SF
> - if (mlx5_core_is_sf(mdev))
> + if (mlx5_core_is_sf(mdev)) {
> parent = mdev->priv.parent_mdev;
> + wc_state_lock = &parent->wc_state_lock;
> + }
> #endif
>
> - if (parent) {
> - mutex_lock(&parent->wc_state_lock);
> + mutex_lock(wc_state_lock);
>
> + if (mdev->wc_state != MLX5_WC_STATE_UNINITIALIZED)
> + goto unlock;
> +
> + if (parent) {
> mlx5_core_test_wc(parent);
>
> mlx5_core_dbg(mdev, "parent set wc_state=%d\n",
> parent->wc_state);
> mdev->wc_state = parent->wc_state;
>
> - mutex_unlock(&parent->wc_state_lock);
> + } else {
> + mlx5_core_test_wc(mdev);
> }
>
> - mlx5_core_test_wc(mdev);
> -
> unlock:
> - mutex_unlock(&mdev->wc_state_lock);
> + mutex_unlock(wc_state_lock);
> out:
> mlx5_core_dbg(mdev, "wc_state=%d\n", mdev->wc_state);
>
> --
> 2.45.0
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH net 3/8] net/mlx5: SF, Fix add port error handling
2025-01-13 15:40 [PATCH net 0/8] mlx5 misc fixes 2025-01-13 Tariq Toukan
2025-01-13 15:40 ` [PATCH net 1/8] net/mlx5: Fix RDMA TX steering prio Tariq Toukan
2025-01-13 15:40 ` [PATCH net 2/8] net/mlx5: Fix a lockdep warning as part of the write combining test Tariq Toukan
@ 2025-01-13 15:40 ` Tariq Toukan
2025-01-13 19:00 ` Jacob Keller
2025-01-13 15:40 ` [PATCH net 4/8] net/mlx5: Clear port select structure when fail to create Tariq Toukan
` (4 subsequent siblings)
7 siblings, 1 reply; 20+ messages in thread
From: Tariq Toukan @ 2025-01-13 15:40 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh, Chris Mi, Shay Drori, Tariq Toukan
From: Chris Mi <cmi@nvidia.com>
If failed to add SF, error handling doesn't delete the SF from the
SF table. But the hw resources are deleted. So when unload driver,
hw resources will be deleted again. Firmware will report syndrome
0x68def3 which means "SF is not allocated can not deallocate".
Fix it by delete SF from SF table if failed to add SF.
Fixes: 2597ee190b4e ("net/mlx5: Call mlx5_sf_id_erase() once in mlx5_sf_dealloc()")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
index a96be98be032..b96909fbeb12 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
@@ -257,6 +257,7 @@ static int mlx5_sf_add(struct mlx5_core_dev *dev, struct mlx5_sf_table *table,
return 0;
esw_err:
+ mlx5_sf_function_id_erase(table, sf);
mlx5_sf_free(table, sf);
return err;
}
--
2.45.0
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH net 3/8] net/mlx5: SF, Fix add port error handling
2025-01-13 15:40 ` [PATCH net 3/8] net/mlx5: SF, Fix add port error handling Tariq Toukan
@ 2025-01-13 19:00 ` Jacob Keller
0 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2025-01-13 19:00 UTC (permalink / raw)
To: Tariq Toukan, David S. Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh, Chris Mi, Shay Drori
On 1/13/2025 7:40 AM, Tariq Toukan wrote:
> From: Chris Mi <cmi@nvidia.com>
>
> If failed to add SF, error handling doesn't delete the SF from the
> SF table. But the hw resources are deleted. So when unload driver,
> hw resources will be deleted again. Firmware will report syndrome
> 0x68def3 which means "SF is not allocated can not deallocate".
>
> Fix it by delete SF from SF table if failed to add SF.
>
> Fixes: 2597ee190b4e ("net/mlx5: Call mlx5_sf_id_erase() once in mlx5_sf_dealloc()")
> Signed-off-by: Chris Mi <cmi@nvidia.com>
> Reviewed-by: Shay Drori <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
> index a96be98be032..b96909fbeb12 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
> @@ -257,6 +257,7 @@ static int mlx5_sf_add(struct mlx5_core_dev *dev, struct mlx5_sf_table *table,
> return 0;
>
> esw_err:
> + mlx5_sf_function_id_erase(table, sf);
> mlx5_sf_free(table, sf);
> return err;
> }
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH net 4/8] net/mlx5: Clear port select structure when fail to create
2025-01-13 15:40 [PATCH net 0/8] mlx5 misc fixes 2025-01-13 Tariq Toukan
` (2 preceding siblings ...)
2025-01-13 15:40 ` [PATCH net 3/8] net/mlx5: SF, Fix add port error handling Tariq Toukan
@ 2025-01-13 15:40 ` Tariq Toukan
2025-01-13 19:00 ` Jacob Keller
2025-01-13 15:40 ` [PATCH net 5/8] net/mlx5e: Fix inversion dependency warning while enabling IPsec tunnel Tariq Toukan
` (3 subsequent siblings)
7 siblings, 1 reply; 20+ messages in thread
From: Tariq Toukan @ 2025-01-13 15:40 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh, Mark Zhang, Tariq Toukan
From: Mark Zhang <markzhang@nvidia.com>
Clear the port select structure on error so no stale values left after
definers are destroyed. That's because the mlx5_lag_destroy_definers()
always try to destroy all lag definers in the tt_map, so in the flow
below lag definers get double-destroyed and cause kernel crash:
mlx5_lag_port_sel_create()
mlx5_lag_create_definers()
mlx5_lag_create_definer() <- Failed on tt 1
mlx5_lag_destroy_definers() <- definers[tt=0] gets destroyed
mlx5_lag_port_sel_create()
mlx5_lag_create_definers()
mlx5_lag_create_definer() <- Failed on tt 0
mlx5_lag_destroy_definers() <- definers[tt=0] gets double-destroyed
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
Mem abort info:
ESR = 0x0000000096000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
Data abort info:
ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 64k pages, 48-bit VAs, pgdp=0000000112ce2e00
[0000000000000008] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Modules linked in: iptable_raw bonding ip_gre ip6_gre gre ip6_tunnel tunnel6 geneve ip6_udp_tunnel udp_tunnel ipip tunnel4 ip_tunnel rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_fwctl(OE) fwctl(OE) mlx5_core(OE) mlxdevm(OE) ib_core(OE) mlxfw(OE) memtrack(OE) mlx_compat(OE) openvswitch nsh nf_conncount psample xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc netconsole overlay efi_pstore sch_fq_codel zram ip_tables crct10dif_ce qemu_fw_cfg fuse ipv6 crc_ccitt [last unloaded: mlx_compat(OE)]
CPU: 3 UID: 0 PID: 217 Comm: kworker/u53:2 Tainted: G OE 6.11.0+ #2
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core]
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core]
lr : mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core]
sp : ffff800085fafb00
x29: ffff800085fafb00 x28: ffff0000da0c8000 x27: 0000000000000000
x26: ffff0000da0c8000 x25: ffff0000da0c8000 x24: ffff0000da0c8000
x23: ffff0000c31f81a0 x22: 0400000000000000 x21: ffff0000da0c8000
x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff8b0c9350
x14: 0000000000000000 x13: ffff800081390d18 x12: ffff800081dc3cc0
x11: 0000000000000001 x10: 0000000000000b10 x9 : ffff80007ab7304c
x8 : ffff0000d00711f0 x7 : 0000000000000004 x6 : 0000000000000190
x5 : ffff00027edb3010 x4 : 0000000000000000 x3 : 0000000000000000
x2 : ffff0000d39b8000 x1 : ffff0000d39b8000 x0 : 0400000000000000
Call trace:
mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core]
mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core]
mlx5_lag_destroy_definers+0xa0/0x108 [mlx5_core]
mlx5_lag_port_sel_create+0x2d4/0x6f8 [mlx5_core]
mlx5_activate_lag+0x60c/0x6f8 [mlx5_core]
mlx5_do_bond_work+0x284/0x5c8 [mlx5_core]
process_one_work+0x170/0x3e0
worker_thread+0x2d8/0x3e0
kthread+0x11c/0x128
ret_from_fork+0x10/0x20
Code: a9025bf5 aa0003f6 a90363f7 f90023f9 (f9400400)
---[ end trace 0000000000000000 ]---
Fixes: dc48516ec7d3 ("net/mlx5: Lag, add support to create definers for LAG")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c
index ab2717012b79..39e80704b1c4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c
@@ -530,7 +530,7 @@ int mlx5_lag_port_sel_create(struct mlx5_lag *ldev,
set_tt_map(port_sel, hash_type);
err = mlx5_lag_create_definers(ldev, hash_type, ports);
if (err)
- return err;
+ goto clear_port_sel;
if (port_sel->tunnel) {
err = mlx5_lag_create_inner_ttc_table(ldev);
@@ -549,6 +549,8 @@ int mlx5_lag_port_sel_create(struct mlx5_lag *ldev,
mlx5_destroy_ttc_table(port_sel->inner.ttc);
destroy_definers:
mlx5_lag_destroy_definers(ldev);
+clear_port_sel:
+ memset(port_sel, 0, sizeof(*port_sel));
return err;
}
--
2.45.0
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH net 4/8] net/mlx5: Clear port select structure when fail to create
2025-01-13 15:40 ` [PATCH net 4/8] net/mlx5: Clear port select structure when fail to create Tariq Toukan
@ 2025-01-13 19:00 ` Jacob Keller
0 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2025-01-13 19:00 UTC (permalink / raw)
To: Tariq Toukan, David S. Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh, Mark Zhang
On 1/13/2025 7:40 AM, Tariq Toukan wrote:
> From: Mark Zhang <markzhang@nvidia.com>
>
> Clear the port select structure on error so no stale values left after
> definers are destroyed. That's because the mlx5_lag_destroy_definers()
> always try to destroy all lag definers in the tt_map, so in the flow
> below lag definers get double-destroyed and cause kernel crash:
>
> mlx5_lag_port_sel_create()
> mlx5_lag_create_definers()
> mlx5_lag_create_definer() <- Failed on tt 1
> mlx5_lag_destroy_definers() <- definers[tt=0] gets destroyed
> mlx5_lag_port_sel_create()
> mlx5_lag_create_definers()
> mlx5_lag_create_definer() <- Failed on tt 0
> mlx5_lag_destroy_definers() <- definers[tt=0] gets double-destroyed
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> Mem abort info:
> ESR = 0x0000000096000005
> EC = 0x25: DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> FSC = 0x05: level 1 translation fault
> Data abort info:
> ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> user pgtable: 64k pages, 48-bit VAs, pgdp=0000000112ce2e00
> [0000000000000008] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
> Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
> Modules linked in: iptable_raw bonding ip_gre ip6_gre gre ip6_tunnel tunnel6 geneve ip6_udp_tunnel udp_tunnel ipip tunnel4 ip_tunnel rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_fwctl(OE) fwctl(OE) mlx5_core(OE) mlxdevm(OE) ib_core(OE) mlxfw(OE) memtrack(OE) mlx_compat(OE) openvswitch nsh nf_conncount psample xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc netconsole overlay efi_pstore sch_fq_codel zram ip_tables crct10dif_ce qemu_fw_cfg fuse ipv6 crc_ccitt [last unloaded: mlx_compat(OE)]
> CPU: 3 UID: 0 PID: 217 Comm: kworker/u53:2 Tainted: G OE 6.11.0+ #2
> Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core]
> pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core]
> lr : mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core]
> sp : ffff800085fafb00
> x29: ffff800085fafb00 x28: ffff0000da0c8000 x27: 0000000000000000
> x26: ffff0000da0c8000 x25: ffff0000da0c8000 x24: ffff0000da0c8000
> x23: ffff0000c31f81a0 x22: 0400000000000000 x21: ffff0000da0c8000
> x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff8b0c9350
> x14: 0000000000000000 x13: ffff800081390d18 x12: ffff800081dc3cc0
> x11: 0000000000000001 x10: 0000000000000b10 x9 : ffff80007ab7304c
> x8 : ffff0000d00711f0 x7 : 0000000000000004 x6 : 0000000000000190
> x5 : ffff00027edb3010 x4 : 0000000000000000 x3 : 0000000000000000
> x2 : ffff0000d39b8000 x1 : ffff0000d39b8000 x0 : 0400000000000000
> Call trace:
> mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core]
> mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core]
> mlx5_lag_destroy_definers+0xa0/0x108 [mlx5_core]
> mlx5_lag_port_sel_create+0x2d4/0x6f8 [mlx5_core]
> mlx5_activate_lag+0x60c/0x6f8 [mlx5_core]
> mlx5_do_bond_work+0x284/0x5c8 [mlx5_core]
> process_one_work+0x170/0x3e0
> worker_thread+0x2d8/0x3e0
> kthread+0x11c/0x128
> ret_from_fork+0x10/0x20
> Code: a9025bf5 aa0003f6 a90363f7 f90023f9 (f9400400)
> ---[ end trace 0000000000000000 ]---
>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Fixes: dc48516ec7d3 ("net/mlx5: Lag, add support to create definers for LAG")
> Signed-off-by: Mark Zhang <markzhang@nvidia.com>
> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
> Reviewed-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c
> index ab2717012b79..39e80704b1c4 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c
> @@ -530,7 +530,7 @@ int mlx5_lag_port_sel_create(struct mlx5_lag *ldev,
> set_tt_map(port_sel, hash_type);
> err = mlx5_lag_create_definers(ldev, hash_type, ports);
> if (err)
> - return err;
> + goto clear_port_sel;
>
> if (port_sel->tunnel) {
> err = mlx5_lag_create_inner_ttc_table(ldev);
> @@ -549,6 +549,8 @@ int mlx5_lag_port_sel_create(struct mlx5_lag *ldev,
> mlx5_destroy_ttc_table(port_sel->inner.ttc);
> destroy_definers:
> mlx5_lag_destroy_definers(ldev);
> +clear_port_sel:
> + memset(port_sel, 0, sizeof(*port_sel));
> return err;
> }
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH net 5/8] net/mlx5e: Fix inversion dependency warning while enabling IPsec tunnel
2025-01-13 15:40 [PATCH net 0/8] mlx5 misc fixes 2025-01-13 Tariq Toukan
` (3 preceding siblings ...)
2025-01-13 15:40 ` [PATCH net 4/8] net/mlx5: Clear port select structure when fail to create Tariq Toukan
@ 2025-01-13 15:40 ` Tariq Toukan
2025-01-13 15:40 ` [PATCH net 6/8] net/mlx5e: Properly match IPsec subnet addresses Tariq Toukan
` (2 subsequent siblings)
7 siblings, 0 replies; 20+ messages in thread
From: Tariq Toukan @ 2025-01-13 15:40 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh, Tariq Toukan
From: Leon Romanovsky <leonro@nvidia.com>
Attempt to enable IPsec packet offload in tunnel mode in debug kernel
generates the following kernel panic, which is happening due to two
issues:
1. In SA add section, the should be _bh() variant when marking SA mode.
2. There is not needed flush_workqueue in SA delete routine. It is not
needed as at this stage as it is removed from SADB and the running work
will be canceled later in SA free.
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
6.12.0+ #4 Not tainted
-----------------------------------------------------
charon/1337 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
ffff88810f365020 (&xa->xa_lock#24){+.+.}-{3:3}, at: mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
and this task is already holding:
ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30
which would create a new lock dependency:
(&x->lock){+.-.}-{3:3} -> (&xa->xa_lock#24){+.+.}-{3:3}
but this new dependency connects a SOFTIRQ-irq-safe lock:
(&x->lock){+.-.}-{3:3}
... which became SOFTIRQ-irq-safe at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
xfrm_timer_handler+0x91/0xd70
__hrtimer_run_queues+0x1dd/0xa60
hrtimer_run_softirq+0x146/0x2e0
handle_softirqs+0x266/0x860
irq_exit_rcu+0x115/0x1a0
sysvec_apic_timer_interrupt+0x6e/0x90
asm_sysvec_apic_timer_interrupt+0x16/0x20
default_idle+0x13/0x20
default_idle_call+0x67/0xa0
do_idle+0x2da/0x320
cpu_startup_entry+0x50/0x60
start_secondary+0x213/0x2a0
common_startup_64+0x129/0x138
to a SOFTIRQ-irq-unsafe lock:
(&xa->xa_lock#24){+.+.}-{3:3}
... which became SOFTIRQ-irq-unsafe at:
...
lock_acquire+0x1be/0x520
_raw_spin_lock+0x2c/0x40
xa_set_mark+0x70/0x110
mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core]
xfrm_dev_state_add+0x3bb/0xd70
xfrm_add_sa+0x2451/0x4a90
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&xa->xa_lock#24);
local_irq_disable();
lock(&x->lock);
lock(&xa->xa_lock#24);
<Interrupt>
lock(&x->lock);
*** DEADLOCK ***
2 locks held by charon/1337:
#0: ffffffff87f8f858 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x5e/0x90
#1: ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30
the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
-> (&x->lock){+.-.}-{3:3} ops: 29 {
HARDIRQ-ON-W at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
xfrm_alloc_spi+0xc0/0xe60
xfrm_alloc_userspi+0x5f6/0xbc0
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
IN-SOFTIRQ-W at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
xfrm_timer_handler+0x91/0xd70
__hrtimer_run_queues+0x1dd/0xa60
hrtimer_run_softirq+0x146/0x2e0
handle_softirqs+0x266/0x860
irq_exit_rcu+0x115/0x1a0
sysvec_apic_timer_interrupt+0x6e/0x90
asm_sysvec_apic_timer_interrupt+0x16/0x20
default_idle+0x13/0x20
default_idle_call+0x67/0xa0
do_idle+0x2da/0x320
cpu_startup_entry+0x50/0x60
start_secondary+0x213/0x2a0
common_startup_64+0x129/0x138
INITIAL USE at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
xfrm_alloc_spi+0xc0/0xe60
xfrm_alloc_userspi+0x5f6/0xbc0
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
}
... key at: [<ffffffff87f9cd20>] __key.18+0x0/0x40
the dependencies between the lock to be acquired
and SOFTIRQ-irq-unsafe lock:
-> (&xa->xa_lock#24){+.+.}-{3:3} ops: 9 {
HARDIRQ-ON-W at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core]
xfrm_dev_state_add+0x3bb/0xd70
xfrm_add_sa+0x2451/0x4a90
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
SOFTIRQ-ON-W at:
lock_acquire+0x1be/0x520
_raw_spin_lock+0x2c/0x40
xa_set_mark+0x70/0x110
mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core]
xfrm_dev_state_add+0x3bb/0xd70
xfrm_add_sa+0x2451/0x4a90
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
INITIAL USE at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core]
xfrm_dev_state_add+0x3bb/0xd70
xfrm_add_sa+0x2451/0x4a90
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
}
... key at: [<ffffffffa078ff60>] __key.48+0x0/0xfffffffffff210a0 [mlx5_core]
... acquired at:
__lock_acquire+0x30a0/0x5040
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
xfrm_dev_state_delete+0x90/0x160
__xfrm_state_delete+0x662/0xae0
xfrm_state_delete+0x1e/0x30
xfrm_del_sa+0x1c2/0x340
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
stack backtrace:
CPU: 7 UID: 0 PID: 1337 Comm: charon Not tainted 6.12.0+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x74/0xd0
check_irq_usage+0x12e8/0x1d90
? print_shortest_lock_dependencies_backwards+0x1b0/0x1b0
? check_chain_key+0x1bb/0x4c0
? __lockdep_reset_lock+0x180/0x180
? check_path.constprop.0+0x24/0x50
? mark_lock+0x108/0x2fb0
? print_circular_bug+0x9b0/0x9b0
? mark_lock+0x108/0x2fb0
? print_usage_bug.part.0+0x670/0x670
? check_prev_add+0x1c4/0x2310
check_prev_add+0x1c4/0x2310
__lock_acquire+0x30a0/0x5040
? lockdep_set_lock_cmp_fn+0x190/0x190
? lockdep_set_lock_cmp_fn+0x190/0x190
lock_acquire+0x1be/0x520
? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
? lockdep_hardirqs_on_prepare+0x400/0x400
? __xfrm_state_delete+0x5f0/0xae0
? lock_downgrade+0x6b0/0x6b0
_raw_spin_lock_bh+0x34/0x40
? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
xfrm_dev_state_delete+0x90/0x160
__xfrm_state_delete+0x662/0xae0
xfrm_state_delete+0x1e/0x30
xfrm_del_sa+0x1c2/0x340
? xfrm_get_sa+0x250/0x250
? check_chain_key+0x1bb/0x4c0
xfrm_user_rcv_msg+0x493/0x880
? copy_sec_ctx+0x270/0x270
? check_chain_key+0x1bb/0x4c0
? lockdep_set_lock_cmp_fn+0x190/0x190
? lockdep_set_lock_cmp_fn+0x190/0x190
netlink_rcv_skb+0x12e/0x380
? copy_sec_ctx+0x270/0x270
? netlink_ack+0xd90/0xd90
? netlink_deliver_tap+0xcd/0xb60
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
? netlink_attachskb+0x730/0x730
? lock_acquire+0x1be/0x520
netlink_sendmsg+0x745/0xbe0
? netlink_unicast+0x740/0x740
? __might_fault+0xbb/0x170
? netlink_unicast+0x740/0x740
__sock_sendmsg+0xc5/0x190
? fdget+0x163/0x1d0
__sys_sendto+0x1fe/0x2c0
? __x64_sys_getpeername+0xb0/0xb0
? do_user_addr_fault+0x856/0xe30
? lock_acquire+0x1be/0x520
? __task_pid_nr_ns+0x117/0x410
? lock_downgrade+0x6b0/0x6b0
__x64_sys_sendto+0xdc/0x1b0
? lockdep_hardirqs_on_prepare+0x284/0x400
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f7d31291ba4
Code: 7d e8 89 4d d4 e8 4c 42 f7 ff 44 8b 4d d0 4c 8b 45 c8 89 c3 44 8b 55 d4 8b 7d e8 b8 2c 00 00 00 48 8b 55 d8 48 8b 75 e0 0f 05 <48> 3d 00 f0 ff ff 77 34 89 df 48 89 45 e8 e8 99 42 f7 ff 48 8b 45
RSP: 002b:00007f7d2ccd94f0 EFLAGS: 00000297 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7d31291ba4
RDX: 0000000000000028 RSI: 00007f7d2ccd96a0 RDI: 000000000000000a
RBP: 00007f7d2ccd9530 R08: 00007f7d2ccd9598 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000028
R13: 00007f7d2ccd9598 R14: 00007f7d2ccd96a0 R15: 00000000000000e1
</TASK>
Fixes: 4c24272b4e2b ("net/mlx5e: Listen to ARP events to update IPsec L2 headers in tunnel mode")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index ca92e518be76..21857474ad83 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -768,9 +768,12 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x,
MLX5_IPSEC_RESCHED);
if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET &&
- x->props.mode == XFRM_MODE_TUNNEL)
- xa_set_mark(&ipsec->sadb, sa_entry->ipsec_obj_id,
- MLX5E_IPSEC_TUNNEL_SA);
+ x->props.mode == XFRM_MODE_TUNNEL) {
+ xa_lock_bh(&ipsec->sadb);
+ __xa_set_mark(&ipsec->sadb, sa_entry->ipsec_obj_id,
+ MLX5E_IPSEC_TUNNEL_SA);
+ xa_unlock_bh(&ipsec->sadb);
+ }
out:
x->xso.offload_handle = (unsigned long)sa_entry;
@@ -797,7 +800,6 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x,
static void mlx5e_xfrm_del_state(struct xfrm_state *x)
{
struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x);
- struct mlx5_accel_esp_xfrm_attrs *attrs = &sa_entry->attrs;
struct mlx5e_ipsec *ipsec = sa_entry->ipsec;
struct mlx5e_ipsec_sa_entry *old;
@@ -806,12 +808,6 @@ static void mlx5e_xfrm_del_state(struct xfrm_state *x)
old = xa_erase_bh(&ipsec->sadb, sa_entry->ipsec_obj_id);
WARN_ON(old != sa_entry);
-
- if (attrs->mode == XFRM_MODE_TUNNEL &&
- attrs->type == XFRM_DEV_OFFLOAD_PACKET)
- /* Make sure that no ARP requests are running in parallel */
- flush_workqueue(ipsec->wq);
-
}
static void mlx5e_xfrm_free_state(struct xfrm_state *x)
--
2.45.0
^ permalink raw reply related [flat|nested] 20+ messages in thread* [PATCH net 6/8] net/mlx5e: Properly match IPsec subnet addresses
2025-01-13 15:40 [PATCH net 0/8] mlx5 misc fixes 2025-01-13 Tariq Toukan
` (4 preceding siblings ...)
2025-01-13 15:40 ` [PATCH net 5/8] net/mlx5e: Fix inversion dependency warning while enabling IPsec tunnel Tariq Toukan
@ 2025-01-13 15:40 ` Tariq Toukan
2025-01-13 19:07 ` Jacob Keller
2025-01-15 2:37 ` Jakub Kicinski
2025-01-13 15:40 ` [PATCH net 7/8] net/mlx5e: Rely on reqid in IPsec tunnel mode Tariq Toukan
2025-01-13 15:40 ` [PATCH net 8/8] net/mlx5e: Always start IPsec sequence number from 1 Tariq Toukan
7 siblings, 2 replies; 20+ messages in thread
From: Tariq Toukan @ 2025-01-13 15:40 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh, Tariq Toukan
From: Leon Romanovsky <leonro@nvidia.com>
Existing match criteria didn't allow to match whole subnet and
only by specific addresses only. This caused to tunnel mode do not
forward such traffic through relevant SA.
In tunnel mode, policies look like this:
src 192.169.0.0/16 dst 192.169.0.0/16
dir out priority 383615 ptype main
tmpl src 192.169.101.2 dst 192.169.101.1
proto esp spi 0xc5141c18 reqid 1 mode tunnel
crypto offload parameters: dev eth2 mode packet
Fixes: a5b8ca9471d3 ("net/mlx5e: Add XFRM policy offload logic")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../mellanox/mlx5/core/en_accel/ipsec_fs.c | 41 +++++++++++++++----
1 file changed, 33 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
index e51b03d4c717..47df02ef5d69 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
@@ -1150,9 +1150,20 @@ static void tx_ft_put_policy(struct mlx5e_ipsec *ipsec, u32 prio, int type)
mutex_unlock(&tx->ft.mutex);
}
+static void addr4_to_mask(__be32 *addr, __be32 *mask)
+{
+ int i;
+
+ *mask = 0;
+ for (i = 0; i < 4; i++)
+ *mask |= ((*addr >> 8 * i) & 0xFF) ? (0xFF << 8 * i) : 0;
+}
+
static void setup_fte_addr4(struct mlx5_flow_spec *spec, __be32 *saddr,
__be32 *daddr)
{
+ __be32 mask;
+
if (!*saddr && !*daddr)
return;
@@ -1164,21 +1175,33 @@ static void setup_fte_addr4(struct mlx5_flow_spec *spec, __be32 *saddr,
if (*saddr) {
memcpy(MLX5_ADDR_OF(fte_match_param, spec->match_value,
outer_headers.src_ipv4_src_ipv6.ipv4_layout.ipv4), saddr, 4);
- MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria,
- outer_headers.src_ipv4_src_ipv6.ipv4_layout.ipv4);
+ addr4_to_mask(saddr, &mask);
+ memcpy(MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
+ outer_headers.src_ipv4_src_ipv6.ipv4_layout.ipv4), &mask, 4);
}
if (*daddr) {
memcpy(MLX5_ADDR_OF(fte_match_param, spec->match_value,
outer_headers.dst_ipv4_dst_ipv6.ipv4_layout.ipv4), daddr, 4);
- MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria,
- outer_headers.dst_ipv4_dst_ipv6.ipv4_layout.ipv4);
+ addr4_to_mask(daddr, &mask);
+ memcpy(MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
+ outer_headers.dst_ipv4_dst_ipv6.ipv4_layout.ipv4), &mask, 4);
}
}
+static void addr6_to_mask(__be32 *addr, __be32 *mask)
+{
+ int i;
+
+ for (i = 0; i < 4; i++)
+ addr4_to_mask(&addr[i], &mask[i]);
+}
+
static void setup_fte_addr6(struct mlx5_flow_spec *spec, __be32 *saddr,
__be32 *daddr)
{
+ __be32 mask[4];
+
if (addr6_all_zero(saddr) && addr6_all_zero(daddr))
return;
@@ -1190,15 +1213,17 @@ static void setup_fte_addr6(struct mlx5_flow_spec *spec, __be32 *saddr,
if (!addr6_all_zero(saddr)) {
memcpy(MLX5_ADDR_OF(fte_match_param, spec->match_value,
outer_headers.src_ipv4_src_ipv6.ipv6_layout.ipv6), saddr, 16);
- memset(MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
- outer_headers.src_ipv4_src_ipv6.ipv6_layout.ipv6), 0xff, 16);
+ addr6_to_mask(saddr, mask);
+ memcpy(MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
+ outer_headers.src_ipv4_src_ipv6.ipv6_layout.ipv6), mask, 16);
}
if (!addr6_all_zero(daddr)) {
memcpy(MLX5_ADDR_OF(fte_match_param, spec->match_value,
outer_headers.dst_ipv4_dst_ipv6.ipv6_layout.ipv6), daddr, 16);
- memset(MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
- outer_headers.dst_ipv4_dst_ipv6.ipv6_layout.ipv6), 0xff, 16);
+ addr6_to_mask(daddr, mask);
+ memcpy(MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
+ outer_headers.dst_ipv4_dst_ipv6.ipv6_layout.ipv6), mask, 16);
}
}
--
2.45.0
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH net 6/8] net/mlx5e: Properly match IPsec subnet addresses
2025-01-13 15:40 ` [PATCH net 6/8] net/mlx5e: Properly match IPsec subnet addresses Tariq Toukan
@ 2025-01-13 19:07 ` Jacob Keller
2025-01-13 19:23 ` Leon Romanovsky
2025-01-15 2:37 ` Jakub Kicinski
1 sibling, 1 reply; 20+ messages in thread
From: Jacob Keller @ 2025-01-13 19:07 UTC (permalink / raw)
To: Tariq Toukan, David S. Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh
On 1/13/2025 7:40 AM, Tariq Toukan wrote:
> +static void addr4_to_mask(__be32 *addr, __be32 *mask)
> +{
> + int i;
> +
> + *mask = 0;
> + for (i = 0; i < 4; i++)
> + *mask |= ((*addr >> 8 * i) & 0xFF) ? (0xFF << 8 * i) : 0;
> +}
> +
I'm surprised this isn't already a common helper function.
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: [PATCH net 6/8] net/mlx5e: Properly match IPsec subnet addresses
2025-01-13 19:07 ` Jacob Keller
@ 2025-01-13 19:23 ` Leon Romanovsky
2025-01-13 19:48 ` Jacob Keller
0 siblings, 1 reply; 20+ messages in thread
From: Leon Romanovsky @ 2025-01-13 19:23 UTC (permalink / raw)
To: Jacob Keller
Cc: Tariq Toukan, David S. Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Andrew Lunn, netdev, Saeed Mahameed, Gal Pressman,
Mark Bloch, Moshe Shemesh
On Mon, Jan 13, 2025 at 11:07:03AM -0800, Jacob Keller wrote:
>
>
> On 1/13/2025 7:40 AM, Tariq Toukan wrote:
> > +static void addr4_to_mask(__be32 *addr, __be32 *mask)
> > +{
> > + int i;
> > +
> > + *mask = 0;
> > + for (i = 0; i < 4; i++)
> > + *mask |= ((*addr >> 8 * i) & 0xFF) ? (0xFF << 8 * i) : 0;
> > +}
> > +
>
> I'm surprised this isn't already a common helper function.
I failed to find.
Thanks
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: [PATCH net 6/8] net/mlx5e: Properly match IPsec subnet addresses
2025-01-13 19:23 ` Leon Romanovsky
@ 2025-01-13 19:48 ` Jacob Keller
0 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2025-01-13 19:48 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Tariq Toukan, David S. Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Andrew Lunn, netdev, Saeed Mahameed, Gal Pressman,
Mark Bloch, Moshe Shemesh
On 1/13/2025 11:23 AM, Leon Romanovsky wrote:
> On Mon, Jan 13, 2025 at 11:07:03AM -0800, Jacob Keller wrote:
>>
>>
>> On 1/13/2025 7:40 AM, Tariq Toukan wrote:
>>> +static void addr4_to_mask(__be32 *addr, __be32 *mask)
>>> +{
>>> + int i;
>>> +
>>> + *mask = 0;
>>> + for (i = 0; i < 4; i++)
>>> + *mask |= ((*addr >> 8 * i) & 0xFF) ? (0xFF << 8 * i) : 0;
>>> +}
>>> +
>>
>> I'm surprised this isn't already a common helper function.
>
> I failed to find.
>
> Thanks
To clarify, I didn't find one either, and I don't think its a blocker on
this fix. I'm just surprised that there hasn't been another user with
the same need before.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 6/8] net/mlx5e: Properly match IPsec subnet addresses
2025-01-13 15:40 ` [PATCH net 6/8] net/mlx5e: Properly match IPsec subnet addresses Tariq Toukan
2025-01-13 19:07 ` Jacob Keller
@ 2025-01-15 2:37 ` Jakub Kicinski
2025-01-15 8:39 ` Leon Romanovsky
1 sibling, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2025-01-15 2:37 UTC (permalink / raw)
To: Tariq Toukan
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn, netdev,
Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh
On Mon, 13 Jan 2025 17:40:52 +0200 Tariq Toukan wrote:
> +static void addr4_to_mask(__be32 *addr, __be32 *mask)
> +{
> + int i;
> +
> + *mask = 0;
> + for (i = 0; i < 4; i++)
> + *mask |= ((*addr >> 8 * i) & 0xFF) ? (0xFF << 8 * i) : 0;
sparse is unimpressed with the lack of byteswaps here.
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c:1159:28: warning: restricted __be32 degrades to integer
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c:1159:23: warning: invalid assignment: |=
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c:1159:23: left side has type restricted __be32
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c:1159:23: right side has type int
Maybe just force cast to u32 inside the helper, and add a comment why?
Or just byte swap.
Also from the word mask and subnet in the commit message it sounds like
you are shooting for a prefix. But this does "byte enable" kind of
thing :S Think 10.0.55.0/24. Maybe mention if this is intentional in
a comment, too.
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: [PATCH net 6/8] net/mlx5e: Properly match IPsec subnet addresses
2025-01-15 2:37 ` Jakub Kicinski
@ 2025-01-15 8:39 ` Leon Romanovsky
0 siblings, 0 replies; 20+ messages in thread
From: Leon Romanovsky @ 2025-01-15 8:39 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
Andrew Lunn, netdev, Saeed Mahameed, Gal Pressman, Mark Bloch,
Moshe Shemesh
On Tue, Jan 14, 2025 at 06:37:35PM -0800, Jakub Kicinski wrote:
> On Mon, 13 Jan 2025 17:40:52 +0200 Tariq Toukan wrote:
> > +static void addr4_to_mask(__be32 *addr, __be32 *mask)
> > +{
> > + int i;
> > +
> > + *mask = 0;
> > + for (i = 0; i < 4; i++)
> > + *mask |= ((*addr >> 8 * i) & 0xFF) ? (0xFF << 8 * i) : 0;
>
> sparse is unimpressed with the lack of byteswaps here.
>
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c:1159:28: warning: restricted __be32 degrades to integer
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c:1159:23: warning: invalid assignment: |=
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c:1159:23: left side has type restricted __be32
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c:1159:23: right side has type int
Strange that I didn't get such errors from kbuild, I kept this patch in
my tree for approximately month, before we sent it.
>
>
> Maybe just force cast to u32 inside the helper, and add a comment why?
> Or just byte swap.
Thanks, will try.
>
> Also from the word mask and subnet in the commit message it sounds like
> you are shooting for a prefix. But this does "byte enable" kind of
> thing :S Think 10.0.55.0/24. Maybe mention if this is intentional in
> a comment, too.
It is not essential, I need to take into account subnet too.
Tariq, please drop this patch for now.
Thanks
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH net 7/8] net/mlx5e: Rely on reqid in IPsec tunnel mode
2025-01-13 15:40 [PATCH net 0/8] mlx5 misc fixes 2025-01-13 Tariq Toukan
` (5 preceding siblings ...)
2025-01-13 15:40 ` [PATCH net 6/8] net/mlx5e: Properly match IPsec subnet addresses Tariq Toukan
@ 2025-01-13 15:40 ` Tariq Toukan
2025-01-13 19:08 ` Jacob Keller
2025-01-13 15:40 ` [PATCH net 8/8] net/mlx5e: Always start IPsec sequence number from 1 Tariq Toukan
7 siblings, 1 reply; 20+ messages in thread
From: Tariq Toukan @ 2025-01-13 15:40 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh, Tariq Toukan
From: Leon Romanovsky <leonro@nvidia.com>
All packet offloads SAs have reqid in it to make sure they have
corresponding policy. While it is not strictly needed for transparent
mode, it is extremely important in tunnel mode. In that mode, policy and
SAs have different match criteria.
Policy catches the whole subnet addresses, and SA catches the tunnel gateways
addresses. The source address of such tunnel is not known during egress packet
traversal in flow steering as it is added only after successful encryption.
As reqid is required for packet offload and it is unique for every SA,
we can safely rely on it only.
The output below shows the configured egress policy and SA by strongswan:
[leonro@vm ~]$ sudo ip x s
src 192.169.101.2 dst 192.169.101.1
proto esp spi 0xc88b7652 reqid 1 mode tunnel
replay-window 0 flag af-unspec esn
aead rfc4106(gcm(aes)) 0xe406a01083986e14d116488549094710e9c57bc6 128
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0
replay_window 1, bitmap-length 1
00000000
crypto offload parameters: dev eth2 dir out mode packet
[leonro@064 ~]$ sudo ip x p
src 192.170.0.0/16 dst 192.170.0.0/16
dir out priority 383615 ptype main
tmpl src 192.169.101.2 dst 192.169.101.1
proto esp spi 0xc88b7652 reqid 1 mode tunnel
crypto offload parameters: dev eth2 mode packet
Fixes: b3beba1fb404 ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
index 47df02ef5d69..772b329aecc5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
@@ -1743,23 +1743,21 @@ static int tx_add_rule(struct mlx5e_ipsec_sa_entry *sa_entry)
goto err_alloc;
}
- if (attrs->family == AF_INET)
- setup_fte_addr4(spec, &attrs->saddr.a4, &attrs->daddr.a4);
- else
- setup_fte_addr6(spec, attrs->saddr.a6, attrs->daddr.a6);
-
setup_fte_no_frags(spec);
setup_fte_upper_proto_match(spec, &attrs->upspec);
switch (attrs->type) {
case XFRM_DEV_OFFLOAD_CRYPTO:
+ if (attrs->family == AF_INET)
+ setup_fte_addr4(spec, &attrs->saddr.a4, &attrs->daddr.a4);
+ else
+ setup_fte_addr6(spec, attrs->saddr.a6, attrs->daddr.a6);
setup_fte_spi(spec, attrs->spi, false);
setup_fte_esp(spec);
setup_fte_reg_a(spec);
break;
case XFRM_DEV_OFFLOAD_PACKET:
- if (attrs->reqid)
- setup_fte_reg_c4(spec, attrs->reqid);
+ setup_fte_reg_c4(spec, attrs->reqid);
err = setup_pkt_reformat(ipsec, attrs, &flow_act);
if (err)
goto err_pkt_reformat;
--
2.45.0
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH net 7/8] net/mlx5e: Rely on reqid in IPsec tunnel mode
2025-01-13 15:40 ` [PATCH net 7/8] net/mlx5e: Rely on reqid in IPsec tunnel mode Tariq Toukan
@ 2025-01-13 19:08 ` Jacob Keller
0 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2025-01-13 19:08 UTC (permalink / raw)
To: Tariq Toukan, David S. Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh
On 1/13/2025 7:40 AM, Tariq Toukan wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> All packet offloads SAs have reqid in it to make sure they have
> corresponding policy. While it is not strictly needed for transparent
> mode, it is extremely important in tunnel mode. In that mode, policy and
> SAs have different match criteria.
>
> Policy catches the whole subnet addresses, and SA catches the tunnel gateways
> addresses. The source address of such tunnel is not known during egress packet
> traversal in flow steering as it is added only after successful encryption.
>
> As reqid is required for packet offload and it is unique for every SA,
> we can safely rely on it only.
>
> The output below shows the configured egress policy and SA by strongswan:
>
> [leonro@vm ~]$ sudo ip x s
> src 192.169.101.2 dst 192.169.101.1
> proto esp spi 0xc88b7652 reqid 1 mode tunnel
> replay-window 0 flag af-unspec esn
> aead rfc4106(gcm(aes)) 0xe406a01083986e14d116488549094710e9c57bc6 128
> anti-replay esn context:
> seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0
> replay_window 1, bitmap-length 1
> 00000000
> crypto offload parameters: dev eth2 dir out mode packet
>
> [leonro@064 ~]$ sudo ip x p
> src 192.170.0.0/16 dst 192.170.0.0/16
> dir out priority 383615 ptype main
> tmpl src 192.169.101.2 dst 192.169.101.1
> proto esp spi 0xc88b7652 reqid 1 mode tunnel
> crypto offload parameters: dev eth2 mode packet
>
> Fixes: b3beba1fb404 ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes")
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH net 8/8] net/mlx5e: Always start IPsec sequence number from 1
2025-01-13 15:40 [PATCH net 0/8] mlx5 misc fixes 2025-01-13 Tariq Toukan
` (6 preceding siblings ...)
2025-01-13 15:40 ` [PATCH net 7/8] net/mlx5e: Rely on reqid in IPsec tunnel mode Tariq Toukan
@ 2025-01-13 15:40 ` Tariq Toukan
2025-01-13 19:08 ` Jacob Keller
7 siblings, 1 reply; 20+ messages in thread
From: Tariq Toukan @ 2025-01-13 15:40 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh, Tariq Toukan
From: Leon Romanovsky <leonro@nvidia.com>
According to RFC4303, section "3.3.3. Sequence Number Generation",
the first packet sent using a given SA will contain a sequence
number of 1.
This is applicable to both ESN and non-ESN mode, which was not covered
in commit mentioned in Fixes line.
Fixes: 3d42c8cc67a8 ("net/mlx5e: Ensure that IPsec sequence packet number starts from 1")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 6 ++++++
.../mellanox/mlx5/core/en_accel/ipsec_offload.c | 11 ++++++++---
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 21857474ad83..1baf8933a07c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -724,6 +724,12 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x,
/* check esn */
if (x->props.flags & XFRM_STATE_ESN)
mlx5e_ipsec_update_esn_state(sa_entry);
+ else
+ /* According to RFC4303, section "3.3.3. Sequence Number Generation",
+ * the first packet sent using a given SA will contain a sequence
+ * number of 1.
+ */
+ sa_entry->esn_state.esn = 1;
mlx5e_ipsec_build_accel_xfrm_attrs(sa_entry, &sa_entry->attrs);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
index 53cfa39188cb..820debf3fbbf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
@@ -91,8 +91,9 @@ u32 mlx5_ipsec_device_caps(struct mlx5_core_dev *mdev)
EXPORT_SYMBOL_GPL(mlx5_ipsec_device_caps);
static void mlx5e_ipsec_packet_setup(void *obj, u32 pdn,
- struct mlx5_accel_esp_xfrm_attrs *attrs)
+ struct mlx5e_ipsec_sa_entry *sa_entry)
{
+ struct mlx5_accel_esp_xfrm_attrs *attrs = &sa_entry->attrs;
void *aso_ctx;
aso_ctx = MLX5_ADDR_OF(ipsec_obj, obj, ipsec_aso);
@@ -120,8 +121,12 @@ static void mlx5e_ipsec_packet_setup(void *obj, u32 pdn,
* active.
*/
MLX5_SET(ipsec_obj, obj, aso_return_reg, MLX5_IPSEC_ASO_REG_C_4_5);
- if (attrs->dir == XFRM_DEV_OFFLOAD_OUT)
+ if (attrs->dir == XFRM_DEV_OFFLOAD_OUT) {
MLX5_SET(ipsec_aso, aso_ctx, mode, MLX5_IPSEC_ASO_INC_SN);
+ if (!attrs->replay_esn.trigger)
+ MLX5_SET(ipsec_aso, aso_ctx, mode_parameter,
+ sa_entry->esn_state.esn);
+ }
if (attrs->lft.hard_packet_limit != XFRM_INF) {
MLX5_SET(ipsec_aso, aso_ctx, remove_flow_pkt_cnt,
@@ -175,7 +180,7 @@ static int mlx5_create_ipsec_obj(struct mlx5e_ipsec_sa_entry *sa_entry)
res = &mdev->mlx5e_res.hw_objs;
if (attrs->type == XFRM_DEV_OFFLOAD_PACKET)
- mlx5e_ipsec_packet_setup(obj, res->pdn, attrs);
+ mlx5e_ipsec_packet_setup(obj, res->pdn, sa_entry);
err = mlx5_cmd_exec(mdev, in, sizeof(in), out, sizeof(out));
if (!err)
--
2.45.0
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH net 8/8] net/mlx5e: Always start IPsec sequence number from 1
2025-01-13 15:40 ` [PATCH net 8/8] net/mlx5e: Always start IPsec sequence number from 1 Tariq Toukan
@ 2025-01-13 19:08 ` Jacob Keller
0 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2025-01-13 19:08 UTC (permalink / raw)
To: Tariq Toukan, David S. Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Andrew Lunn
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Mark Bloch,
Moshe Shemesh
On 1/13/2025 7:40 AM, Tariq Toukan wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> According to RFC4303, section "3.3.3. Sequence Number Generation",
> the first packet sent using a given SA will contain a sequence
> number of 1.
>
> This is applicable to both ESN and non-ESN mode, which was not covered
> in commit mentioned in Fixes line.
>
> Fixes: 3d42c8cc67a8 ("net/mlx5e: Ensure that IPsec sequence packet number starts from 1")
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply [flat|nested] 20+ messages in thread