netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/6] mlxsw: Miscellaneous fixes
@ 2024-01-17 15:04 Petr Machata
  2024-01-17 15:04 ` [PATCH net 1/6] mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure Petr Machata
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Petr Machata @ 2024-01-17 15:04 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Ido Schimmel, Petr Machata, Danielle Ratson, Amit Cohen,
	Jiri Pirko, mlxsw

This patchset is a bric-a-brac of fixes for bugs impacting mlxsw.

- Patches #1 and #2 fix issues in ACL handling error paths.
- Patch #3 fixes stack corruption in ACL code that a recent FW update
  has uncovered.

- Patch #4 fixes an issue in handling of IPIP next hops.

- Patch #5 fixes a typo in a the qos_pfc selftest
- Patch #6 fixes the same selftest to work with 8-lane ports.

Amit Cohen (3):
  mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
  selftests: mlxsw: qos_pfc: Remove wrong description
  selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes

Ido Schimmel (2):
  mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path
  mlxsw: spectrum_acl_tcam: Fix stack corruption

Petr Machata (1):
  mlxsw: spectrum_router: Register netdevice notifier before nexthop

 .../mellanox/mlxsw/spectrum_acl_erp.c         |   8 +-
 .../mellanox/mlxsw/spectrum_acl_tcam.c        |   6 +-
 .../ethernet/mellanox/mlxsw/spectrum_router.c |  24 ++--
 .../selftests/drivers/net/mlxsw/qos_pfc.sh    |  19 +++-
 .../drivers/net/mlxsw/spectrum-2/tc_flower.sh | 106 +++++++++++++++++-
 5 files changed, 143 insertions(+), 20 deletions(-)

-- 
2.42.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net 1/6] mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
  2024-01-17 15:04 [PATCH net 0/6] mlxsw: Miscellaneous fixes Petr Machata
@ 2024-01-17 15:04 ` Petr Machata
  2024-01-17 15:04 ` [PATCH net 2/6] mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path Petr Machata
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Petr Machata @ 2024-01-17 15:04 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Ido Schimmel, Petr Machata, Danielle Ratson, Amit Cohen,
	Jiri Pirko, mlxsw

From: Amit Cohen <amcohen@nvidia.com>

Lately, a bug was found when many TC filters are added - at some point,
several bugs are printed to dmesg [1] and the switch is crashed with
segmentation fault.

The issue starts when gen_pool_free() fails because of unexpected
behavior - a try to free memory which is already freed, this leads to BUG()
call which crashes the switch and makes many other bugs.

Trying to track down the unexpected behavior led to a bug in eRP code. The
function mlxsw_sp_acl_erp_table_alloc() gets a pointer to the allocated
index, sets the value and returns an error code. When gen_pool_alloc()
fails it returns address 0, we track it and return -ENOBUFS outside, BUT
the call for gen_pool_alloc() already override the index in erp_table
structure. This is a problem when such allocation is done as part of
table expansion. This is not a new table, which will not be used in case
of allocation failure. We try to expand eRP table and override the
current index (non-zero) with zero. Then, it leads to an unexpected
behavior when address 0 is freed twice. Note that address 0 is valid in
erp_table->base_index and indeed other tables use it.

gen_pool_alloc() fails in case that there is no space left in the
pre-allocated pool, in our case, the pool is limited to
ACL_MAX_ERPT_BANK_SIZE, which is read from hardware. When more than max
erp entries are required, we exceed the limit and return an error, this
error leads to "Failed to migrate vregion" print.

Fix this by changing erp_table->base_index only in case of a successful
allocation.

Add a test case for such a scenario. Without this fix it causes
segmentation fault:

$ TESTS="max_erp_entries_test" ./tc_flower.sh
./tc_flower.sh: line 988:  1560 Segmentation fault      tc filter del dev $h2 ingress chain $i protocol ip pref $i handle $j flower &>/dev/null

[1]:
kernel BUG at lib/genalloc.c:508!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 3531 Comm: tc Not tainted 6.7.0-rc5-custom-ga6893f479f5e #1
Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021
RIP: 0010:gen_pool_free_owner+0xc9/0xe0
...
Call Trace:
 <TASK>
 __mlxsw_sp_acl_erp_table_other_dec+0x70/0xa0 [mlxsw_spectrum]
 mlxsw_sp_acl_erp_mask_destroy+0xf5/0x110 [mlxsw_spectrum]
 objagg_obj_root_destroy+0x18/0x80 [objagg]
 objagg_obj_destroy+0x12c/0x130 [objagg]
 mlxsw_sp_acl_erp_mask_put+0x37/0x50 [mlxsw_spectrum]
 mlxsw_sp_acl_ctcam_region_entry_remove+0x74/0xa0 [mlxsw_spectrum]
 mlxsw_sp_acl_ctcam_entry_del+0x1e/0x40 [mlxsw_spectrum]
 mlxsw_sp_acl_tcam_ventry_del+0x78/0xd0 [mlxsw_spectrum]
 mlxsw_sp_flower_destroy+0x4d/0x70 [mlxsw_spectrum]
 mlxsw_sp_flow_block_cb+0x73/0xb0 [mlxsw_spectrum]
 tc_setup_cb_destroy+0xc1/0x180
 fl_hw_destroy_filter+0x94/0xc0 [cls_flower]
 __fl_delete+0x1ac/0x1c0 [cls_flower]
 fl_destroy+0xc2/0x150 [cls_flower]
 tcf_proto_destroy+0x1a/0xa0
...
mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion
mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion

Fixes: f465261aa105 ("mlxsw: spectrum_acl: Implement common eRP core")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 .../mellanox/mlxsw/spectrum_acl_erp.c         |  8 +--
 .../drivers/net/mlxsw/spectrum-2/tc_flower.sh | 52 ++++++++++++++++++-
 2 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
index 4c98950380d5..d231f4d2888b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
@@ -301,6 +301,7 @@ mlxsw_sp_acl_erp_table_alloc(struct mlxsw_sp_acl_erp_core *erp_core,
 			     unsigned long *p_index)
 {
 	unsigned int num_rows, entry_size;
+	unsigned long index;
 
 	/* We only allow allocations of entire rows */
 	if (num_erps % erp_core->num_erp_banks != 0)
@@ -309,10 +310,11 @@ mlxsw_sp_acl_erp_table_alloc(struct mlxsw_sp_acl_erp_core *erp_core,
 	entry_size = erp_core->erpt_entries_size[region_type];
 	num_rows = num_erps / erp_core->num_erp_banks;
 
-	*p_index = gen_pool_alloc(erp_core->erp_tables, num_rows * entry_size);
-	if (*p_index == 0)
+	index = gen_pool_alloc(erp_core->erp_tables, num_rows * entry_size);
+	if (!index)
 		return -ENOBUFS;
-	*p_index -= MLXSW_SP_ACL_ERP_GENALLOC_OFFSET;
+
+	*p_index = index - MLXSW_SP_ACL_ERP_GENALLOC_OFFSET;
 
 	return 0;
 }
diff --git a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
index fb850e0ec837..7bf56ea161e3 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
@@ -10,7 +10,8 @@ lib_dir=$(dirname $0)/../../../../net/forwarding
 ALL_TESTS="single_mask_test identical_filters_test two_masks_test \
 	multiple_masks_test ctcam_edge_cases_test delta_simple_test \
 	delta_two_masks_one_key_test delta_simple_rehash_test \
-	bloom_simple_test bloom_complex_test bloom_delta_test"
+	bloom_simple_test bloom_complex_test bloom_delta_test \
+	max_erp_entries_test"
 NUM_NETIFS=2
 source $lib_dir/lib.sh
 source $lib_dir/tc_common.sh
@@ -983,6 +984,55 @@ bloom_delta_test()
 	log_test "bloom delta test ($tcflags)"
 }
 
+max_erp_entries_test()
+{
+	# The number of eRP entries is limited. Once the maximum number of eRPs
+	# has been reached, filters cannot be added. This test verifies that
+	# when this limit is reached, inserstion fails without crashing.
+
+	RET=0
+
+	local num_masks=32
+	local num_regions=15
+	local chain_failed
+	local mask_failed
+	local ret
+
+	if [[ "$tcflags" != "skip_sw" ]]; then
+		return 0;
+	fi
+
+	for ((i=1; i < $num_regions; i++)); do
+		for ((j=$num_masks; j >= 0; j--)); do
+			tc filter add dev $h2 ingress chain $i protocol ip \
+				pref $i	handle $j flower $tcflags \
+				dst_ip 192.1.0.0/$j &> /dev/null
+			ret=$?
+
+			if [ $ret -ne 0 ]; then
+				chain_failed=$i
+				mask_failed=$j
+				break 2
+			fi
+		done
+	done
+
+	# We expect to exceed the maximum number of eRP entries, so that
+	# insertion eventually fails. Otherwise, the test should be adjusted to
+	# add more filters.
+	check_fail $ret "expected to exceed number of eRP entries"
+
+	for ((; i >= 1; i--)); do
+		for ((j=0; j <= $num_masks; j++)); do
+			tc filter del dev $h2 ingress chain $i protocol ip \
+				pref $i handle $j flower &> /dev/null
+		done
+	done
+
+	log_test "max eRP entries test ($tcflags). " \
+		"max chain $chain_failed, mask $mask_failed"
+}
+
 setup_prepare()
 {
 	h1=${NETIFS[p1]}
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net 2/6] mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path
  2024-01-17 15:04 [PATCH net 0/6] mlxsw: Miscellaneous fixes Petr Machata
  2024-01-17 15:04 ` [PATCH net 1/6] mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure Petr Machata
@ 2024-01-17 15:04 ` Petr Machata
  2024-01-17 15:04 ` [PATCH net 3/6] mlxsw: spectrum_acl_tcam: Fix stack corruption Petr Machata
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Petr Machata @ 2024-01-17 15:04 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Ido Schimmel, Petr Machata, Danielle Ratson, Amit Cohen,
	Jiri Pirko, mlxsw, Jiri Pirko

From: Ido Schimmel <idosch@nvidia.com>

When calling mlxsw_sp_acl_tcam_region_destroy() from an error path after
failing to attach the region to an ACL group, we hit a NULL pointer
dereference upon 'region->group->tcam' [1].

Fix by retrieving the 'tcam' pointer using mlxsw_sp_acl_to_tcam().

[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
RIP: 0010:mlxsw_sp_acl_tcam_region_destroy+0xa0/0xd0
[...]
Call Trace:
 mlxsw_sp_acl_tcam_vchunk_get+0x88b/0xa20
 mlxsw_sp_acl_tcam_ventry_add+0x25/0xe0
 mlxsw_sp_acl_rule_add+0x47/0x240
 mlxsw_sp_flower_replace+0x1a9/0x1d0
 tc_setup_cb_add+0xdc/0x1c0
 fl_hw_replace_filter+0x146/0x1f0
 fl_change+0xc17/0x1360
 tc_new_tfilter+0x472/0xb90
 rtnetlink_rcv_msg+0x313/0x3b0
 netlink_rcv_skb+0x58/0x100
 netlink_unicast+0x244/0x390
 netlink_sendmsg+0x1e4/0x440
 ____sys_sendmsg+0x164/0x260
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xc0
 do_syscall_64+0x40/0xe0
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Fixes: 22a677661f56 ("mlxsw: spectrum: Introduce ACL core with simple TCAM implementation")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
index d50786b0a6ce..7d1e91196e94 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
@@ -681,13 +681,13 @@ static void
 mlxsw_sp_acl_tcam_region_destroy(struct mlxsw_sp *mlxsw_sp,
 				 struct mlxsw_sp_acl_tcam_region *region)
 {
+	struct mlxsw_sp_acl_tcam *tcam = mlxsw_sp_acl_to_tcam(mlxsw_sp->acl);
 	const struct mlxsw_sp_acl_tcam_ops *ops = mlxsw_sp->acl_tcam_ops;
 
 	ops->region_fini(mlxsw_sp, region->priv);
 	mlxsw_sp_acl_tcam_region_disable(mlxsw_sp, region);
 	mlxsw_sp_acl_tcam_region_free(mlxsw_sp, region);
-	mlxsw_sp_acl_tcam_region_id_put(region->group->tcam,
-					region->id);
+	mlxsw_sp_acl_tcam_region_id_put(tcam, region->id);
 	kfree(region);
 }
 
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net 3/6] mlxsw: spectrum_acl_tcam: Fix stack corruption
  2024-01-17 15:04 [PATCH net 0/6] mlxsw: Miscellaneous fixes Petr Machata
  2024-01-17 15:04 ` [PATCH net 1/6] mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure Petr Machata
  2024-01-17 15:04 ` [PATCH net 2/6] mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path Petr Machata
@ 2024-01-17 15:04 ` Petr Machata
  2024-01-17 15:04 ` [PATCH net 4/6] mlxsw: spectrum_router: Register netdevice notifier before nexthop Petr Machata
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Petr Machata @ 2024-01-17 15:04 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Ido Schimmel, Petr Machata, Danielle Ratson, Amit Cohen,
	Jiri Pirko, mlxsw, Orel Hagag

From: Ido Schimmel <idosch@nvidia.com>

When tc filters are first added to a net device, the corresponding local
port gets bound to an ACL group in the device. The group contains a list
of ACLs. In turn, each ACL points to a different TCAM region where the
filters are stored. During forwarding, the ACLs are sequentially
evaluated until a match is found.

One reason to place filters in different regions is when they are added
with decreasing priorities and in an alternating order so that two
consecutive filters can never fit in the same region because of their
key usage.

In Spectrum-2 and newer ASICs the firmware started to report that the
maximum number of ACLs in a group is more than 16, but the layout of the
register that configures ACL groups (PAGT) was not updated to account
for that. It is therefore possible to hit stack corruption [1] in the
rare case where more than 16 ACLs in a group are required.

Fix by limiting the maximum ACL group size to the minimum between what
the firmware reports and the maximum ACLs that fit in the PAGT register.

Add a test case to make sure the machine does not crash when this
condition is hit.

[1]
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: mlxsw_sp_acl_tcam_group_update+0x116/0x120
[...]
 dump_stack_lvl+0x36/0x50
 panic+0x305/0x330
 __stack_chk_fail+0x15/0x20
 mlxsw_sp_acl_tcam_group_update+0x116/0x120
 mlxsw_sp_acl_tcam_group_region_attach+0x69/0x110
 mlxsw_sp_acl_tcam_vchunk_get+0x492/0xa20
 mlxsw_sp_acl_tcam_ventry_add+0x25/0xe0
 mlxsw_sp_acl_rule_add+0x47/0x240
 mlxsw_sp_flower_replace+0x1a9/0x1d0
 tc_setup_cb_add+0xdc/0x1c0
 fl_hw_replace_filter+0x146/0x1f0
 fl_change+0xc17/0x1360
 tc_new_tfilter+0x472/0xb90
 rtnetlink_rcv_msg+0x313/0x3b0
 netlink_rcv_skb+0x58/0x100
 netlink_unicast+0x244/0x390
 netlink_sendmsg+0x1e4/0x440
 ____sys_sendmsg+0x164/0x260
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xc0
 do_syscall_64+0x40/0xe0
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Fixes: c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Reported-by: Orel Hagag <orelh@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 .../mellanox/mlxsw/spectrum_acl_tcam.c        |  2 +
 .../drivers/net/mlxsw/spectrum-2/tc_flower.sh | 56 ++++++++++++++++++-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
index 7d1e91196e94..50ea1eff02b2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
@@ -1564,6 +1564,8 @@ int mlxsw_sp_acl_tcam_init(struct mlxsw_sp *mlxsw_sp,
 	tcam->max_groups = max_groups;
 	tcam->max_group_size = MLXSW_CORE_RES_GET(mlxsw_sp->core,
 						  ACL_MAX_GROUP_SIZE);
+	tcam->max_group_size = min_t(unsigned int, tcam->max_group_size,
+				     MLXSW_REG_PAGT_ACL_MAX_NUM);
 
 	err = ops->init(mlxsw_sp, tcam->priv, tcam);
 	if (err)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
index 7bf56ea161e3..616d3581419c 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
@@ -11,7 +11,7 @@ ALL_TESTS="single_mask_test identical_filters_test two_masks_test \
 	multiple_masks_test ctcam_edge_cases_test delta_simple_test \
 	delta_two_masks_one_key_test delta_simple_rehash_test \
 	bloom_simple_test bloom_complex_test bloom_delta_test \
-	max_erp_entries_test"
+	max_erp_entries_test max_group_size_test"
 NUM_NETIFS=2
 source $lib_dir/lib.sh
 source $lib_dir/tc_common.sh
@@ -1033,6 +1033,60 @@ max_erp_entries_test()
 		"max chain $chain_failed, mask $mask_failed"
 }
 
+max_group_size_test()
+{
+	# The number of ACLs in an ACL group is limited. Once the maximum
+	# number of ACLs has been reached, filters cannot be added. This test
+	# verifies that when this limit is reached, insertion fails without
+	# crashing.
+
+	RET=0
+
+	local num_acls=32
+	local max_size
+	local ret
+
+	if [[ "$tcflags" != "skip_sw" ]]; then
+		return 0;
+	fi
+
+	for ((i=1; i < $num_acls; i++)); do
+		if [[ $(( i % 2 )) == 1 ]]; then
+			tc filter add dev $h2 ingress pref $i proto ipv4 \
+				flower $tcflags dst_ip 198.51.100.1/32 \
+				ip_proto tcp tcp_flags 0x01/0x01 \
+				action drop &> /dev/null
+		else
+			tc filter add dev $h2 ingress pref $i proto ipv6 \
+				flower $tcflags dst_ip 2001:db8:1::1/128 \
+				action drop &> /dev/null
+		fi
+
+		ret=$?
+		[[ $ret -ne 0 ]] && max_size=$((i - 1)) && break
+	done
+
+	# We expect to exceed the maximum number of ACLs in a group, so that
+	# insertion eventually fails. Otherwise, the test should be adjusted to
+	# add more filters.
+	check_fail $ret "expected to exceed number of ACLs in a group"
+
+	for ((; i >= 1; i--)); do
+		if [[ $(( i % 2 )) == 1 ]]; then
+			tc filter del dev $h2 ingress pref $i proto ipv4 \
+				flower $tcflags dst_ip 198.51.100.1/32 \
+				ip_proto tcp tcp_flags 0x01/0x01 \
+				action drop &> /dev/null
+		else
+			tc filter del dev $h2 ingress pref $i proto ipv6 \
+				flower $tcflags dst_ip 2001:db8:1::1/128 \
+				action drop &> /dev/null
+		fi
+	done
+
+	log_test "max ACL group size test ($tcflags). max size $max_size"
+}
+
 setup_prepare()
 {
 	h1=${NETIFS[p1]}
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net 4/6] mlxsw: spectrum_router: Register netdevice notifier before nexthop
  2024-01-17 15:04 [PATCH net 0/6] mlxsw: Miscellaneous fixes Petr Machata
                   ` (2 preceding siblings ...)
  2024-01-17 15:04 ` [PATCH net 3/6] mlxsw: spectrum_acl_tcam: Fix stack corruption Petr Machata
@ 2024-01-17 15:04 ` Petr Machata
  2024-01-17 15:04 ` [PATCH net 5/6] selftests: mlxsw: qos_pfc: Remove wrong description Petr Machata
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Petr Machata @ 2024-01-17 15:04 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Ido Schimmel, Petr Machata, Danielle Ratson, Amit Cohen,
	Jiri Pirko, mlxsw, Maksym Yaremchuk

If there are IPIP nexthops at the time when the driver is loaded (or the
devlink instance reloaded), the driver looks up the corresponding IPIP
entry. But IPIP entries are only created as a result of netdevice
notifications. Since the netdevice notifier is registered after the nexthop
notifier, mlxsw_sp_nexthop_type_init() never finds the IPIP entry,
registers the nexthop MLXSW_SP_NEXTHOP_TYPE_ETH, and fails to assign a CRIF
to the nexthop. Later on when the CRIF is necessary, the WARN_ON in
mlxsw_sp_nexthop_rif() triggers, causing the splat [1].

In order to fix the issue, reorder the netdevice notifier to be registered
before the nexthop one.

[1] (edited for clarity):

    WARNING: CPU: 1 PID: 1364 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3245 mlxsw_sp_nexthop_rif (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3246 (discriminator 1)) mlxsw_spectrum
    Hardware name: Mellanox Technologies Ltd. MSN4410/VMOD0010, BIOS 5.11 01/06/2019
    Call Trace:
    ? mlxsw_sp_nexthop_rif (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3246 (discriminator 1)) mlxsw_spectrum
    __mlxsw_sp_nexthop_eth_update (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3637) mlxsw_spectrum
    mlxsw_sp_nexthop_update (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3679 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3727) mlxsw_spectrum
    mlxsw_sp_nexthop_group_update (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3757) mlxsw_spectrum
    mlxsw_sp_nexthop_group_refresh (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4112) mlxsw_spectrum
    mlxsw_sp_nexthop_obj_event (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5118 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5191 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5315 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5500) mlxsw_spectrum
    nexthops_dump (net/ipv4/nexthop.c:217 net/ipv4/nexthop.c:440 net/ipv4/nexthop.c:3609)
    register_nexthop_notifier (net/ipv4/nexthop.c:3624)
    mlxsw_sp_router_init (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:11486) mlxsw_spectrum
    mlxsw_sp_init (drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3267) mlxsw_spectrum
    __mlxsw_core_bus_device_register (drivers/net/ethernet/mellanox/mlxsw/core.c:2202) mlxsw_core
    mlxsw_devlink_core_bus_device_reload_up (drivers/net/ethernet/mellanox/mlxsw/core.c:2265 drivers/net/ethernet/mellanox/mlxsw/core.c:1603) mlxsw_core
    devlink_reload (net/devlink/dev.c:314 net/devlink/dev.c:475)
    [...]

Fixes: 9464a3d68ea9 ("mlxsw: spectrum_router: Track next hops at CRIFs")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 .../ethernet/mellanox/mlxsw/spectrum_router.c | 24 +++++++++----------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 2c255ed9b8a9..7164f9e6370f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -11472,6 +11472,13 @@ int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp,
 	if (err)
 		goto err_register_netevent_notifier;
 
+	mlxsw_sp->router->netdevice_nb.notifier_call =
+		mlxsw_sp_router_netdevice_event;
+	err = register_netdevice_notifier_net(mlxsw_sp_net(mlxsw_sp),
+					      &mlxsw_sp->router->netdevice_nb);
+	if (err)
+		goto err_register_netdev_notifier;
+
 	mlxsw_sp->router->nexthop_nb.notifier_call =
 		mlxsw_sp_nexthop_obj_event;
 	err = register_nexthop_notifier(mlxsw_sp_net(mlxsw_sp),
@@ -11487,22 +11494,15 @@ int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp,
 	if (err)
 		goto err_register_fib_notifier;
 
-	mlxsw_sp->router->netdevice_nb.notifier_call =
-		mlxsw_sp_router_netdevice_event;
-	err = register_netdevice_notifier_net(mlxsw_sp_net(mlxsw_sp),
-					      &mlxsw_sp->router->netdevice_nb);
-	if (err)
-		goto err_register_netdev_notifier;
-
 	return 0;
 
-err_register_netdev_notifier:
-	unregister_fib_notifier(mlxsw_sp_net(mlxsw_sp),
-				&mlxsw_sp->router->fib_nb);
 err_register_fib_notifier:
 	unregister_nexthop_notifier(mlxsw_sp_net(mlxsw_sp),
 				    &mlxsw_sp->router->nexthop_nb);
 err_register_nexthop_notifier:
+	unregister_netdevice_notifier_net(mlxsw_sp_net(mlxsw_sp),
+					  &router->netdevice_nb);
+err_register_netdev_notifier:
 	unregister_netevent_notifier(&mlxsw_sp->router->netevent_nb);
 err_register_netevent_notifier:
 	unregister_inet6addr_validator_notifier(&router->inet6addr_valid_nb);
@@ -11550,11 +11550,11 @@ void mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
 {
 	struct mlxsw_sp_router *router = mlxsw_sp->router;
 
-	unregister_netdevice_notifier_net(mlxsw_sp_net(mlxsw_sp),
-					  &router->netdevice_nb);
 	unregister_fib_notifier(mlxsw_sp_net(mlxsw_sp), &router->fib_nb);
 	unregister_nexthop_notifier(mlxsw_sp_net(mlxsw_sp),
 				    &router->nexthop_nb);
+	unregister_netdevice_notifier_net(mlxsw_sp_net(mlxsw_sp),
+					  &router->netdevice_nb);
 	unregister_netevent_notifier(&router->netevent_nb);
 	unregister_inet6addr_validator_notifier(&router->inet6addr_valid_nb);
 	unregister_inetaddr_validator_notifier(&router->inetaddr_valid_nb);
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net 5/6] selftests: mlxsw: qos_pfc: Remove wrong description
  2024-01-17 15:04 [PATCH net 0/6] mlxsw: Miscellaneous fixes Petr Machata
                   ` (3 preceding siblings ...)
  2024-01-17 15:04 ` [PATCH net 4/6] mlxsw: spectrum_router: Register netdevice notifier before nexthop Petr Machata
@ 2024-01-17 15:04 ` Petr Machata
  2024-01-17 15:04 ` [PATCH net 6/6] selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes Petr Machata
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Petr Machata @ 2024-01-17 15:04 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Ido Schimmel, Petr Machata, Danielle Ratson, Amit Cohen,
	Jiri Pirko, mlxsw, Shuah Khan, linux-kselftest

From: Amit Cohen <amcohen@nvidia.com>

In the diagram of the topology, $swp3 and $swp4 are described as 1Gbps
ports. This is wrong information, the test does not configure such speed.

Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Fixes: bfa804784e32 ("selftests: mlxsw: Add a PFC test")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh
index 42ce602d8d49..49bef76083b8 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh
@@ -40,7 +40,6 @@
 # |   + $swp1          $swp3 +                    + $swp4                     |
 # |   | iPOOL1        iPOOL0 |                    | iPOOL2                    |
 # |   | ePOOL4        ePOOL5 |                    | ePOOL4                    |
-# |   |                1Gbps |                    | 1Gbps                     |
 # |   |        PFC:enabled=1 |                    | PFC:enabled=1             |
 # | +-|----------------------|-+                +-|------------------------+  |
 # | | + $swp1.111  $swp3.111 + |                | + $swp4.111              |  |
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net 6/6] selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes
  2024-01-17 15:04 [PATCH net 0/6] mlxsw: Miscellaneous fixes Petr Machata
                   ` (4 preceding siblings ...)
  2024-01-17 15:04 ` [PATCH net 5/6] selftests: mlxsw: qos_pfc: Remove wrong description Petr Machata
@ 2024-01-17 15:04 ` Petr Machata
  2024-01-18 15:06 ` [PATCH net 0/6] mlxsw: Miscellaneous fixes Paolo Abeni
  2024-01-18 18:00 ` patchwork-bot+netdevbpf
  7 siblings, 0 replies; 9+ messages in thread
From: Petr Machata @ 2024-01-17 15:04 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Ido Schimmel, Petr Machata, Danielle Ratson, Amit Cohen,
	Jiri Pirko, mlxsw, Shuah Khan, linux-kselftest

From: Amit Cohen <amcohen@nvidia.com>

'qos_pfc' test checks PFC behavior. The idea is to limit the traffic
using a shaper somewhere in the flow of the packets. In this area, the
buffer is smaller than the buffer at the beginning of the flow, so it fills
up until there is no more space left. The test configures there PFC
which is supposed to notice that the headroom is filling up and send PFC
Xoff to indicate the transmitter to stop sending traffic for the priorities
sharing this PG.

The Xon/Xoff threshold is auto-configured and always equal to
2*(MTU rounded up to cell size). Even after sending the PFC Xoff packet,
traffic will keep arriving until the transmitter receives and processes
the PFC packet. This amount of traffic is known as the PFC delay allowance.

Currently the buffer for the delay traffic is configured as 100KB. The
MTU in the test is 10KB, therefore the threshold for Xoff is about 20KB.
This allows 80KB extra to be stored in this buffer.

8-lane ports use two buffers among which the configured buffer is split,
the Xoff threshold then applies to each buffer in parallel.

The test does not take into account the behavior of 8-lane ports, when the
ports are configured to 400Gbps with 8 lanes or 800Gbps with 8 lanes,
packets are dropped and the test fails.

Check if the relevant ports use 8 lanes, in such case double the size of
the buffer, as the headroom is split half-half.

Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Fixes: bfa804784e32 ("selftests: mlxsw: Add a PFC test")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 .../selftests/drivers/net/mlxsw/qos_pfc.sh     | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh
index 49bef76083b8..0f0f4f05807c 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh
@@ -119,6 +119,9 @@ h2_destroy()
 
 switch_create()
 {
+	local lanes_swp4
+	local pg1_size
+
 	# pools
 	# -----
 
@@ -228,7 +231,20 @@ switch_create()
 	dcb pfc set dev $swp4 prio-pfc all:off 1:on
 	# PG0 will get autoconfigured to Xoff, give PG1 arbitrarily 100K, which
 	# is (-2*MTU) about 80K of delay provision.
-	dcb buffer set dev $swp4 buffer-size all:0 1:$_100KB
+	pg1_size=$_100KB
+
+	setup_wait_dev_with_timeout $swp4
+
+	lanes_swp4=$(ethtool $swp4 | grep 'Lanes:')
+	lanes_swp4=${lanes_swp4#*"Lanes: "}
+
+	# 8-lane ports use two buffers among which the configured buffer
+	# is split, so double the size to get twice (20K + 80K).
+	if [[ $lanes_swp4 -eq 8 ]]; then
+		pg1_size=$((pg1_size * 2))
+	fi
+
+	dcb buffer set dev $swp4 buffer-size all:0 1:$pg1_size
 
 	# bridges
 	# -------
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net 0/6] mlxsw: Miscellaneous fixes
  2024-01-17 15:04 [PATCH net 0/6] mlxsw: Miscellaneous fixes Petr Machata
                   ` (5 preceding siblings ...)
  2024-01-17 15:04 ` [PATCH net 6/6] selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes Petr Machata
@ 2024-01-18 15:06 ` Paolo Abeni
  2024-01-18 18:00 ` patchwork-bot+netdevbpf
  7 siblings, 0 replies; 9+ messages in thread
From: Paolo Abeni @ 2024-01-18 15:06 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netdev
  Cc: Ido Schimmel, Danielle Ratson, Amit Cohen, Jiri Pirko, mlxsw

On Wed, 2024-01-17 at 16:04 +0100, Petr Machata wrote:
> This patchset is a bric-a-brac of fixes for bugs impacting mlxsw.
> 
> - Patches #1 and #2 fix issues in ACL handling error paths.
> - Patch #3 fixes stack corruption in ACL code that a recent FW update
>   has uncovered.
> 
> - Patch #4 fixes an issue in handling of IPIP next hops.
> 
> - Patch #5 fixes a typo in a the qos_pfc selftest
> - Patch #6 fixes the same selftest to work with 8-lane ports.
> 
> Amit Cohen (3):
>   mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
>   selftests: mlxsw: qos_pfc: Remove wrong description
>   selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes
> 
> Ido Schimmel (2):
>   mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path
>   mlxsw: spectrum_acl_tcam: Fix stack corruption
> 
> Petr Machata (1):
>   mlxsw: spectrum_router: Register netdevice notifier before nexthop
> 
>  .../mellanox/mlxsw/spectrum_acl_erp.c         |   8 +-
>  .../mellanox/mlxsw/spectrum_acl_tcam.c        |   6 +-
>  .../ethernet/mellanox/mlxsw/spectrum_router.c |  24 ++--
>  .../selftests/drivers/net/mlxsw/qos_pfc.sh    |  19 +++-
>  .../drivers/net/mlxsw/spectrum-2/tc_flower.sh | 106 +++++++++++++++++-
>  5 files changed, 143 insertions(+), 20 deletions(-)

LGTM, but still a bit too fresh to be merged right now.

Acked-by: Paolo Abeni <pabeni@redhat.com>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 0/6] mlxsw: Miscellaneous fixes
  2024-01-17 15:04 [PATCH net 0/6] mlxsw: Miscellaneous fixes Petr Machata
                   ` (6 preceding siblings ...)
  2024-01-18 15:06 ` [PATCH net 0/6] mlxsw: Miscellaneous fixes Paolo Abeni
@ 2024-01-18 18:00 ` patchwork-bot+netdevbpf
  7 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-01-18 18:00 UTC (permalink / raw)
  To: Petr Machata
  Cc: davem, edumazet, kuba, pabeni, netdev, idosch, danieller, amcohen,
	jiri, mlxsw

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 17 Jan 2024 16:04:15 +0100 you wrote:
> This patchset is a bric-a-brac of fixes for bugs impacting mlxsw.
> 
> - Patches #1 and #2 fix issues in ACL handling error paths.
> - Patch #3 fixes stack corruption in ACL code that a recent FW update
>   has uncovered.
> 
> - Patch #4 fixes an issue in handling of IPIP next hops.
> 
> [...]

Here is the summary with links:
  - [net,1/6] mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
    https://git.kernel.org/netdev/net/c/6d6eeabcfaba
  - [net,2/6] mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path
    https://git.kernel.org/netdev/net/c/efeb7dfea8ee
  - [net,3/6] mlxsw: spectrum_acl_tcam: Fix stack corruption
    https://git.kernel.org/netdev/net/c/483ae90d8f97
  - [net,4/6] mlxsw: spectrum_router: Register netdevice notifier before nexthop
    https://git.kernel.org/netdev/net/c/62bef63646c1
  - [net,5/6] selftests: mlxsw: qos_pfc: Remove wrong description
    https://git.kernel.org/netdev/net/c/40cc674bafd5
  - [net,6/6] selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes
    https://git.kernel.org/netdev/net/c/b34f4de6d30c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-01-18 18:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-17 15:04 [PATCH net 0/6] mlxsw: Miscellaneous fixes Petr Machata
2024-01-17 15:04 ` [PATCH net 1/6] mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure Petr Machata
2024-01-17 15:04 ` [PATCH net 2/6] mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path Petr Machata
2024-01-17 15:04 ` [PATCH net 3/6] mlxsw: spectrum_acl_tcam: Fix stack corruption Petr Machata
2024-01-17 15:04 ` [PATCH net 4/6] mlxsw: spectrum_router: Register netdevice notifier before nexthop Petr Machata
2024-01-17 15:04 ` [PATCH net 5/6] selftests: mlxsw: qos_pfc: Remove wrong description Petr Machata
2024-01-17 15:04 ` [PATCH net 6/6] selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes Petr Machata
2024-01-18 15:06 ` [PATCH net 0/6] mlxsw: Miscellaneous fixes Paolo Abeni
2024-01-18 18:00 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).