netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/6] mlxsw: ACL fixes
@ 2024-06-06 14:49 Petr Machata
  2024-06-06 14:49 ` [PATCH net 1/6] lib: objagg: Fix spelling Petr Machata
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Petr Machata @ 2024-06-06 14:49 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Amit Cohen, Ido Schimmel, Petr Machata, Jiri Pirko,
	Alexander Zubkov, mlxsw

Ido Schimmel writes:

Patches #1-#3 fix various spelling mistakes I noticed while working on
the code base.

Patch #4 fixes a general protection fault by bailing out when the error
occurs and warning.

Patch #5 fixes the warning.

Patch #6 fixes ACL scale regression and firmware errors.

See the commit messages for more info.

Ido Schimmel (6):
  lib: objagg: Fix spelling
  lib: test_objagg: Fix spelling
  mlxsw: spectrum_acl_atcam: Fix wrong comment
  lib: objagg: Fix general protection fault
  mlxsw: spectrum_acl_erp: Fix object nesting warning
  mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors

 .../mellanox/mlxsw/spectrum_acl_atcam.c       | 20 +++----
 .../mlxsw/spectrum_acl_bloom_filter.c         |  2 +-
 .../mellanox/mlxsw/spectrum_acl_erp.c         | 13 -----
 .../mellanox/mlxsw/spectrum_acl_tcam.h        |  9 +--
 include/linux/objagg.h                        |  1 -
 lib/objagg.c                                  | 20 ++-----
 lib/test_objagg.c                             |  2 +-
 .../drivers/net/mlxsw/spectrum-2/tc_flower.sh | 55 +++++++++++++++++--
 8 files changed, 69 insertions(+), 53 deletions(-)

-- 
2.45.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net 1/6] lib: objagg: Fix spelling
  2024-06-06 14:49 [PATCH net 0/6] mlxsw: ACL fixes Petr Machata
@ 2024-06-06 14:49 ` Petr Machata
  2024-06-08  9:00   ` Simon Horman
  2024-06-06 14:49 ` [PATCH net 2/6] lib: test_objagg: " Petr Machata
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Petr Machata @ 2024-06-06 14:49 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Amit Cohen, Ido Schimmel, Petr Machata, Jiri Pirko,
	Alexander Zubkov, mlxsw

From: Ido Schimmel <idosch@nvidia.com>

Fixes: 0a020d416d0a ("lib: introduce initial implementation of object aggregation manager")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 lib/objagg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/objagg.c b/lib/objagg.c
index 1e248629ed64..955538c90223 100644
--- a/lib/objagg.c
+++ b/lib/objagg.c
@@ -421,7 +421,7 @@ static struct objagg_obj *__objagg_obj_get(struct objagg *objagg, void *obj)
  *
  * There are 3 main options this function wraps:
  * 1) The object according to "obj" already exist. In that case
- *    the reference counter is incrementes and the object is returned.
+ *    the reference counter is incremented and the object is returned.
  * 2) The object does not exist, but it can be aggregated within
  *    another object. In that case, user ops->delta_create() is called
  *    to obtain delta data and a new object is created with returned
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 2/6] lib: test_objagg: Fix spelling
  2024-06-06 14:49 [PATCH net 0/6] mlxsw: ACL fixes Petr Machata
  2024-06-06 14:49 ` [PATCH net 1/6] lib: objagg: Fix spelling Petr Machata
@ 2024-06-06 14:49 ` Petr Machata
  2024-06-08  9:00   ` Simon Horman
  2024-06-06 14:49 ` [PATCH net 3/6] mlxsw: spectrum_acl_atcam: Fix wrong comment Petr Machata
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Petr Machata @ 2024-06-06 14:49 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Amit Cohen, Ido Schimmel, Petr Machata, Jiri Pirko,
	Alexander Zubkov, mlxsw

From: Ido Schimmel <idosch@nvidia.com>

Fixes: 0a020d416d0a ("lib: introduce initial implementation of object aggregation manager")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 lib/test_objagg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/test_objagg.c b/lib/test_objagg.c
index c0c957c50635..d34df4306b87 100644
--- a/lib/test_objagg.c
+++ b/lib/test_objagg.c
@@ -60,7 +60,7 @@ static struct objagg_obj *world_obj_get(struct world *world,
 	if (!world->key_refs[key_id_index(key_id)]) {
 		world->objagg_objs[key_id_index(key_id)] = objagg_obj;
 	} else if (world->objagg_objs[key_id_index(key_id)] != objagg_obj) {
-		pr_err("Key %u: God another object for the same key.\n",
+		pr_err("Key %u: Got another object for the same key.\n",
 		       key_id);
 		err = -EINVAL;
 		goto err_key_id_check;
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 3/6] mlxsw: spectrum_acl_atcam: Fix wrong comment
  2024-06-06 14:49 [PATCH net 0/6] mlxsw: ACL fixes Petr Machata
  2024-06-06 14:49 ` [PATCH net 1/6] lib: objagg: Fix spelling Petr Machata
  2024-06-06 14:49 ` [PATCH net 2/6] lib: test_objagg: " Petr Machata
@ 2024-06-06 14:49 ` Petr Machata
  2024-06-08  9:01   ` Simon Horman
  2024-06-06 14:49 ` [PATCH net 4/6] lib: objagg: Fix general protection fault Petr Machata
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Petr Machata @ 2024-06-06 14:49 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Amit Cohen, Ido Schimmel, Petr Machata, Jiri Pirko,
	Alexander Zubkov, mlxsw

From: Ido Schimmel <idosch@nvidia.com>

The key is encoded, not encrypted.

Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
index 4b713832fdd5..a7473e782b56 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
@@ -491,7 +491,7 @@ __mlxsw_sp_acl_atcam_entry_add(struct mlxsw_sp *mlxsw_sp,
 	       sizeof(aentry->enc_key));
 
 	/* Compute all needed delta information and clear the delta bits
-	 * from the encrypted key.
+	 * from the encoded key.
 	 */
 	delta = mlxsw_sp_acl_erp_delta(aentry->erp_mask);
 	aentry->delta_info.start = mlxsw_sp_acl_erp_delta_start(delta);
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 4/6] lib: objagg: Fix general protection fault
  2024-06-06 14:49 [PATCH net 0/6] mlxsw: ACL fixes Petr Machata
                   ` (2 preceding siblings ...)
  2024-06-06 14:49 ` [PATCH net 3/6] mlxsw: spectrum_acl_atcam: Fix wrong comment Petr Machata
@ 2024-06-06 14:49 ` Petr Machata
  2024-06-08  9:01   ` Simon Horman
  2024-06-06 14:49 ` [PATCH net 5/6] mlxsw: spectrum_acl_erp: Fix object nesting warning Petr Machata
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Petr Machata @ 2024-06-06 14:49 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Amit Cohen, Ido Schimmel, Petr Machata, Jiri Pirko,
	Alexander Zubkov, mlxsw

From: Ido Schimmel <idosch@nvidia.com>

The library supports aggregation of objects into other objects only if
the parent object does not have a parent itself. That is, nesting is not
supported.

Aggregation happens in two cases: Without and with hints, where hints
are a pre-computed recommendation on how to aggregate the provided
objects.

Nesting is not possible in the first case due to a check that prevents
it, but in the second case there is no check because the assumption is
that nesting cannot happen when creating objects based on hints. The
violation of this assumption leads to various warnings and eventually to
a general protection fault [1].

Before fixing the root cause, error out when nesting happens and warn.

[1]
general protection fault, probably for non-canonical address 0xdead000000000d90: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1083 Comm: kworker/1:9 Tainted: G        W          6.9.0-rc6-custom-gd9b4f1cca7fb #7
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:mlxsw_sp_acl_erp_bf_insert+0x25/0x80
[...]
Call Trace:
 <TASK>
 mlxsw_sp_acl_atcam_entry_add+0x256/0x3c0
 mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
 mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
 mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
 process_one_work+0x151/0x370
 worker_thread+0x2cb/0x3e0
 kthread+0xd0/0x100
 ret_from_fork+0x34/0x50
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
Reported-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 lib/objagg.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/objagg.c b/lib/objagg.c
index 955538c90223..0f99ea5f5371 100644
--- a/lib/objagg.c
+++ b/lib/objagg.c
@@ -167,6 +167,9 @@ static int objagg_obj_parent_assign(struct objagg *objagg,
 {
 	void *delta_priv;
 
+	if (WARN_ON(!objagg_obj_is_root(parent)))
+		return -EINVAL;
+
 	delta_priv = objagg->ops->delta_create(objagg->priv, parent->obj,
 					       objagg_obj->obj);
 	if (IS_ERR(delta_priv))
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 5/6] mlxsw: spectrum_acl_erp: Fix object nesting warning
  2024-06-06 14:49 [PATCH net 0/6] mlxsw: ACL fixes Petr Machata
                   ` (3 preceding siblings ...)
  2024-06-06 14:49 ` [PATCH net 4/6] lib: objagg: Fix general protection fault Petr Machata
@ 2024-06-06 14:49 ` Petr Machata
  2024-06-08  9:01   ` Simon Horman
  2024-06-06 14:49 ` [PATCH net 6/6] mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors Petr Machata
  2024-06-10 10:20 ` [PATCH net 0/6] mlxsw: ACL fixes patchwork-bot+netdevbpf
  6 siblings, 1 reply; 14+ messages in thread
From: Petr Machata @ 2024-06-06 14:49 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Amit Cohen, Ido Schimmel, Petr Machata, Jiri Pirko,
	Alexander Zubkov, mlxsw

From: Ido Schimmel <idosch@nvidia.com>

ACLs in Spectrum-2 and newer ASICs can reside in the algorithmic TCAM
(A-TCAM) or in the ordinary circuit TCAM (C-TCAM). The former can
contain more ACLs (i.e., tc filters), but the number of masks in each
region (i.e., tc chain) is limited.

In order to mitigate the effects of the above limitation, the device
allows filters to share a single mask if their masks only differ in up
to 8 consecutive bits. For example, dst_ip/25 can be represented using
dst_ip/24 with a delta of 1 bit. The C-TCAM does not have a limit on the
number of masks being used (and therefore does not support mask
aggregation), but can contain a limited number of filters.

The driver uses the "objagg" library to perform the mask aggregation by
passing it objects that consist of the filter's mask and whether the
filter is to be inserted into the A-TCAM or the C-TCAM since filters in
different TCAMs cannot share a mask.

The set of created objects is dependent on the insertion order of the
filters and is not necessarily optimal. Therefore, the driver will
periodically ask the library to compute a more optimal set ("hints") by
looking at all the existing objects.

When the library asks the driver whether two objects can be aggregated
the driver only compares the provided masks and ignores the A-TCAM /
C-TCAM indication. This is the right thing to do since the goal is to
move as many filters as possible to the A-TCAM. The driver also forbids
two identical masks from being aggregated since this can only happen if
one was intentionally put in the C-TCAM to avoid a conflict in the
A-TCAM.

The above can result in the following set of hints:

H1: {mask X, A-TCAM} -> H2: {mask Y, A-TCAM} // X is Y + delta
H3: {mask Y, C-TCAM} -> H4: {mask Z, A-TCAM} // Y is Z + delta

After getting the hints from the library the driver will start migrating
filters from one region to another while consulting the computed hints
and instructing the device to perform a lookup in both regions during
the transition.

Assuming a filter with mask X is being migrated into the A-TCAM in the
new region, the hints lookup will return H1. Since H2 is the parent of
H1, the library will try to find the object associated with it and
create it if necessary in which case another hints lookup (recursive)
will be performed. This hints lookup for {mask Y, A-TCAM} will either
return H2 or H3 since the driver passes the library an object comparison
function that ignores the A-TCAM / C-TCAM indication.

This can eventually lead to nested objects which are not supported by
the library [1].

Fix by removing the object comparison function from both the driver and
the library as the driver was the only user. That way the lookup will
only return exact matches.

I do not have a reliable reproducer that can reproduce the issue in a
timely manner, but before the fix the issue would reproduce in several
minutes and with the fix it does not reproduce in over an hour.

Note that the current usefulness of the hints is limited because they
include the C-TCAM indication and represent aggregation that cannot
actually happen. This will be addressed in net-next.

[1]
WARNING: CPU: 0 PID: 153 at lib/objagg.c:170 objagg_obj_parent_assign+0xb5/0xd0
Modules linked in:
CPU: 0 PID: 153 Comm: kworker/0:18 Not tainted 6.9.0-rc6-custom-g70fbc2c1c38b #42
Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:objagg_obj_parent_assign+0xb5/0xd0
[...]
Call Trace:
 <TASK>
 __objagg_obj_get+0x2bb/0x580
 objagg_obj_get+0xe/0x80
 mlxsw_sp_acl_erp_mask_get+0xb5/0xf0
 mlxsw_sp_acl_atcam_entry_add+0xe8/0x3c0
 mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
 mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
 mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
 process_one_work+0x151/0x370

Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 .../ethernet/mellanox/mlxsw/spectrum_acl_erp.c    | 13 -------------
 include/linux/objagg.h                            |  1 -
 lib/objagg.c                                      | 15 ---------------
 3 files changed, 29 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
index d231f4d2888b..9eee229303cc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
@@ -1217,18 +1217,6 @@ static bool mlxsw_sp_acl_erp_delta_check(void *priv, const void *parent_obj,
 	return err ? false : true;
 }
 
-static int mlxsw_sp_acl_erp_hints_obj_cmp(const void *obj1, const void *obj2)
-{
-	const struct mlxsw_sp_acl_erp_key *key1 = obj1;
-	const struct mlxsw_sp_acl_erp_key *key2 = obj2;
-
-	/* For hints purposes, two objects are considered equal
-	 * in case the masks are the same. Does not matter what
-	 * the "ctcam" value is.
-	 */
-	return memcmp(key1->mask, key2->mask, sizeof(key1->mask));
-}
-
 static void *mlxsw_sp_acl_erp_delta_create(void *priv, void *parent_obj,
 					   void *obj)
 {
@@ -1308,7 +1296,6 @@ static void mlxsw_sp_acl_erp_root_destroy(void *priv, void *root_priv)
 static const struct objagg_ops mlxsw_sp_acl_erp_objagg_ops = {
 	.obj_size = sizeof(struct mlxsw_sp_acl_erp_key),
 	.delta_check = mlxsw_sp_acl_erp_delta_check,
-	.hints_obj_cmp = mlxsw_sp_acl_erp_hints_obj_cmp,
 	.delta_create = mlxsw_sp_acl_erp_delta_create,
 	.delta_destroy = mlxsw_sp_acl_erp_delta_destroy,
 	.root_create = mlxsw_sp_acl_erp_root_create,
diff --git a/include/linux/objagg.h b/include/linux/objagg.h
index 78021777df46..6df5b887dc54 100644
--- a/include/linux/objagg.h
+++ b/include/linux/objagg.h
@@ -8,7 +8,6 @@ struct objagg_ops {
 	size_t obj_size;
 	bool (*delta_check)(void *priv, const void *parent_obj,
 			    const void *obj);
-	int (*hints_obj_cmp)(const void *obj1, const void *obj2);
 	void * (*delta_create)(void *priv, void *parent_obj, void *obj);
 	void (*delta_destroy)(void *priv, void *delta_priv);
 	void * (*root_create)(void *priv, void *obj, unsigned int root_id);
diff --git a/lib/objagg.c b/lib/objagg.c
index 0f99ea5f5371..363e43e849ac 100644
--- a/lib/objagg.c
+++ b/lib/objagg.c
@@ -906,20 +906,6 @@ static const struct objagg_opt_algo *objagg_opt_algos[] = {
 	[OBJAGG_OPT_ALGO_SIMPLE_GREEDY] = &objagg_opt_simple_greedy,
 };
 
-static int objagg_hints_obj_cmp(struct rhashtable_compare_arg *arg,
-				const void *obj)
-{
-	struct rhashtable *ht = arg->ht;
-	struct objagg_hints *objagg_hints =
-			container_of(ht, struct objagg_hints, node_ht);
-	const struct objagg_ops *ops = objagg_hints->ops;
-	const char *ptr = obj;
-
-	ptr += ht->p.key_offset;
-	return ops->hints_obj_cmp ? ops->hints_obj_cmp(ptr, arg->key) :
-				    memcmp(ptr, arg->key, ht->p.key_len);
-}
-
 /**
  * objagg_hints_get - obtains hints instance
  * @objagg:		objagg instance
@@ -958,7 +944,6 @@ struct objagg_hints *objagg_hints_get(struct objagg *objagg,
 				offsetof(struct objagg_hints_node, obj);
 	objagg_hints->ht_params.head_offset =
 				offsetof(struct objagg_hints_node, ht_node);
-	objagg_hints->ht_params.obj_cmpfn = objagg_hints_obj_cmp;
 
 	err = rhashtable_init(&objagg_hints->node_ht, &objagg_hints->ht_params);
 	if (err)
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 6/6] mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors
  2024-06-06 14:49 [PATCH net 0/6] mlxsw: ACL fixes Petr Machata
                   ` (4 preceding siblings ...)
  2024-06-06 14:49 ` [PATCH net 5/6] mlxsw: spectrum_acl_erp: Fix object nesting warning Petr Machata
@ 2024-06-06 14:49 ` Petr Machata
  2024-06-08  9:01   ` Simon Horman
  2024-06-10 10:20 ` [PATCH net 0/6] mlxsw: ACL fixes patchwork-bot+netdevbpf
  6 siblings, 1 reply; 14+ messages in thread
From: Petr Machata @ 2024-06-06 14:49 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Amit Cohen, Ido Schimmel, Petr Machata, Jiri Pirko,
	Alexander Zubkov, mlxsw

From: Ido Schimmel <idosch@nvidia.com>

ACLs that reside in the algorithmic TCAM (A-TCAM) in Spectrum-2 and
newer ASICs can share the same mask if their masks only differ in up to
8 consecutive bits. For example, consider the following filters:

 # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 192.0.2.0/24 action drop
 # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 198.51.100.128/25 action drop

The second filter can use the same mask as the first (dst_ip/24) with a
delta of 1 bit.

However, the above only works because the two filters have different
values in the common unmasked part (dst_ip/24). When entries have the
same value in the common unmasked part they create undesired collisions
in the device since many entries now have the same key. This leads to
firmware errors such as [1] and to a reduced scale.

Fix by adjusting the hash table key to only include the value in the
common unmasked part. That is, without including the delta bits. That
way the driver will detect the collision during filter insertion and
spill the filter into the circuit TCAM (C-TCAM).

Add a test case that fails without the fix and adjust existing cases
that check C-TCAM spillage according to the above limitation.

[1]
mlxsw_spectrum2 0000:06:00.0: EMAD reg access failed (tid=3379b18a00003394,reg_id=3027(ptce3),type=write,status=8(resource not available))

Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP")
Reported-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 .../mellanox/mlxsw/spectrum_acl_atcam.c       | 18 +++---
 .../mlxsw/spectrum_acl_bloom_filter.c         |  2 +-
 .../mellanox/mlxsw/spectrum_acl_tcam.h        |  9 +--
 .../drivers/net/mlxsw/spectrum-2/tc_flower.sh | 55 +++++++++++++++++--
 4 files changed, 63 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
index a7473e782b56..07cb1e26ca3e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
@@ -391,7 +391,8 @@ mlxsw_sp_acl_atcam_region_entry_insert(struct mlxsw_sp *mlxsw_sp,
 	if (err)
 		return err;
 
-	lkey_id = aregion->ops->lkey_id_get(aregion, aentry->enc_key, erp_id);
+	lkey_id = aregion->ops->lkey_id_get(aregion, aentry->ht_key.enc_key,
+					    erp_id);
 	if (IS_ERR(lkey_id))
 		return PTR_ERR(lkey_id);
 	aentry->lkey_id = lkey_id;
@@ -399,7 +400,7 @@ mlxsw_sp_acl_atcam_region_entry_insert(struct mlxsw_sp *mlxsw_sp,
 	kvdl_index = mlxsw_afa_block_first_kvdl_index(rulei->act_block);
 	mlxsw_reg_ptce3_pack(ptce3_pl, true, MLXSW_REG_PTCE3_OP_WRITE_WRITE,
 			     priority, region->tcam_region_info,
-			     aentry->enc_key, erp_id,
+			     aentry->ht_key.enc_key, erp_id,
 			     aentry->delta_info.start,
 			     aentry->delta_info.mask,
 			     aentry->delta_info.value,
@@ -428,7 +429,7 @@ mlxsw_sp_acl_atcam_region_entry_remove(struct mlxsw_sp *mlxsw_sp,
 
 	mlxsw_reg_ptce3_pack(ptce3_pl, false, MLXSW_REG_PTCE3_OP_WRITE_WRITE, 0,
 			     region->tcam_region_info,
-			     aentry->enc_key, erp_id,
+			     aentry->ht_key.enc_key, erp_id,
 			     aentry->delta_info.start,
 			     aentry->delta_info.mask,
 			     aentry->delta_info.value,
@@ -457,7 +458,7 @@ mlxsw_sp_acl_atcam_region_entry_action_replace(struct mlxsw_sp *mlxsw_sp,
 	kvdl_index = mlxsw_afa_block_first_kvdl_index(rulei->act_block);
 	mlxsw_reg_ptce3_pack(ptce3_pl, true, MLXSW_REG_PTCE3_OP_WRITE_UPDATE,
 			     priority, region->tcam_region_info,
-			     aentry->enc_key, erp_id,
+			     aentry->ht_key.enc_key, erp_id,
 			     aentry->delta_info.start,
 			     aentry->delta_info.mask,
 			     aentry->delta_info.value,
@@ -480,15 +481,13 @@ __mlxsw_sp_acl_atcam_entry_add(struct mlxsw_sp *mlxsw_sp,
 	int err;
 
 	mlxsw_afk_encode(afk, region->key_info, &rulei->values,
-			 aentry->ht_key.full_enc_key, mask);
+			 aentry->ht_key.enc_key, mask);
 
 	erp_mask = mlxsw_sp_acl_erp_mask_get(aregion, mask, false);
 	if (IS_ERR(erp_mask))
 		return PTR_ERR(erp_mask);
 	aentry->erp_mask = erp_mask;
 	aentry->ht_key.erp_id = mlxsw_sp_acl_erp_mask_erp_id(erp_mask);
-	memcpy(aentry->enc_key, aentry->ht_key.full_enc_key,
-	       sizeof(aentry->enc_key));
 
 	/* Compute all needed delta information and clear the delta bits
 	 * from the encoded key.
@@ -497,9 +496,8 @@ __mlxsw_sp_acl_atcam_entry_add(struct mlxsw_sp *mlxsw_sp,
 	aentry->delta_info.start = mlxsw_sp_acl_erp_delta_start(delta);
 	aentry->delta_info.mask = mlxsw_sp_acl_erp_delta_mask(delta);
 	aentry->delta_info.value =
-		mlxsw_sp_acl_erp_delta_value(delta,
-					     aentry->ht_key.full_enc_key);
-	mlxsw_sp_acl_erp_delta_clear(delta, aentry->enc_key);
+		mlxsw_sp_acl_erp_delta_value(delta, aentry->ht_key.enc_key);
+	mlxsw_sp_acl_erp_delta_clear(delta, aentry->ht_key.enc_key);
 
 	/* Add rule to the list of A-TCAM rules, assuming this
 	 * rule is intended to A-TCAM. In case this rule does
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_bloom_filter.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_bloom_filter.c
index 95f63fcf4ba1..a54eedb69a3f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_bloom_filter.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_bloom_filter.c
@@ -249,7 +249,7 @@ __mlxsw_sp_acl_bf_key_encode(struct mlxsw_sp_acl_atcam_region *aregion,
 		memcpy(chunk + pad_bytes, &erp_region_id,
 		       sizeof(erp_region_id));
 		memcpy(chunk + key_offset,
-		       &aentry->enc_key[chunk_key_offsets[chunk_index]],
+		       &aentry->ht_key.enc_key[chunk_key_offsets[chunk_index]],
 		       chunk_key_len);
 		chunk += chunk_len;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
index 79a1d8606512..010204f73ea4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
@@ -167,9 +167,9 @@ struct mlxsw_sp_acl_atcam_region {
 };
 
 struct mlxsw_sp_acl_atcam_entry_ht_key {
-	char full_enc_key[MLXSW_REG_PTCEX_FLEX_KEY_BLOCKS_LEN]; /* Encoded
-								 * key.
-								 */
+	char enc_key[MLXSW_REG_PTCEX_FLEX_KEY_BLOCKS_LEN]; /* Encoded key, minus
+							    * delta bits.
+							    */
 	u8 erp_id;
 };
 
@@ -181,9 +181,6 @@ struct mlxsw_sp_acl_atcam_entry {
 	struct rhash_head ht_node;
 	struct list_head list; /* Member in entries_list */
 	struct mlxsw_sp_acl_atcam_entry_ht_key ht_key;
-	char enc_key[MLXSW_REG_PTCEX_FLEX_KEY_BLOCKS_LEN]; /* Encoded key,
-							    * minus delta bits.
-							    */
 	struct {
 		u16 start;
 		u8 mask;
diff --git a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
index 31252bc8775e..4994bea5daf8 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
@@ -11,7 +11,7 @@ ALL_TESTS="single_mask_test identical_filters_test two_masks_test \
 	multiple_masks_test ctcam_edge_cases_test delta_simple_test \
 	delta_two_masks_one_key_test delta_simple_rehash_test \
 	bloom_simple_test bloom_complex_test bloom_delta_test \
-	max_erp_entries_test max_group_size_test"
+	max_erp_entries_test max_group_size_test collision_test"
 NUM_NETIFS=2
 source $lib_dir/lib.sh
 source $lib_dir/tc_common.sh
@@ -457,7 +457,7 @@ delta_two_masks_one_key_test()
 {
 	# If 2 keys are the same and only differ in mask in a way that
 	# they belong under the same ERP (second is delta of the first),
-	# there should be no C-TCAM spill.
+	# there should be C-TCAM spill.
 
 	RET=0
 
@@ -474,8 +474,8 @@ delta_two_masks_one_key_test()
 	tp_record "mlxsw:*" "tc filter add dev $h2 ingress protocol ip \
 		   pref 2 handle 102 flower $tcflags dst_ip 192.0.2.2 \
 		   action drop"
-	tp_check_hits "mlxsw:mlxsw_sp_acl_atcam_entry_add_ctcam_spill" 0
-	check_err $? "incorrect C-TCAM spill while inserting the second rule"
+	tp_check_hits "mlxsw:mlxsw_sp_acl_atcam_entry_add_ctcam_spill" 1
+	check_err $? "C-TCAM spill did not happen while inserting the second rule"
 
 	$MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \
 		-t ip -q
@@ -1087,6 +1087,53 @@ max_group_size_test()
 	log_test "max ACL group size test ($tcflags). max size $max_size"
 }
 
+collision_test()
+{
+	# Filters cannot share an eRP if in the common unmasked part (i.e.,
+	# without the delta bits) they have the same values. If the driver does
+	# not prevent such configuration (by spilling into the C-TCAM), then
+	# multiple entries will be present in the device with the same key,
+	# leading to collisions and a reduced scale.
+	#
+	# Create such a scenario and make sure all the filters are successfully
+	# added.
+
+	RET=0
+
+	local ret
+
+	if [[ "$tcflags" != "skip_sw" ]]; then
+		return 0;
+	fi
+
+	# Add a single dst_ip/24 filter and multiple dst_ip/32 filters that all
+	# have the same values in the common unmasked part (dst_ip/24).
+
+	tc filter add dev $h2 ingress pref 1 proto ipv4 handle 101 \
+		flower $tcflags dst_ip 198.51.100.0/24 \
+		action drop
+
+	for i in {0..255}; do
+		tc filter add dev $h2 ingress pref 2 proto ipv4 \
+			handle $((102 + i)) \
+			flower $tcflags dst_ip 198.51.100.${i}/32 \
+			action drop
+		ret=$?
+		[[ $ret -ne 0 ]] && break
+	done
+
+	check_err $ret "failed to add all the filters"
+
+	for i in {255..0}; do
+		tc filter del dev $h2 ingress pref 2 proto ipv4 \
+			handle $((102 + i)) flower
+	done
+
+	tc filter del dev $h2 ingress pref 1 proto ipv4 handle 101 flower
+
+	log_test "collision test ($tcflags)"
+}
+
 setup_prepare()
 {
 	h1=${NETIFS[p1]}
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/6] lib: objagg: Fix spelling
  2024-06-06 14:49 ` [PATCH net 1/6] lib: objagg: Fix spelling Petr Machata
@ 2024-06-08  9:00   ` Simon Horman
  0 siblings, 0 replies; 14+ messages in thread
From: Simon Horman @ 2024-06-08  9:00 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, Amit Cohen, Ido Schimmel, Jiri Pirko, Alexander Zubkov,
	mlxsw

On Thu, Jun 06, 2024 at 04:49:38PM +0200, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> Fixes: 0a020d416d0a ("lib: introduce initial implementation of object aggregation manager")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Amit Cohen <amcohen@nvidia.com>
> Tested-by: Alexander Zubkov <green@qrator.net>
> Signed-off-by: Petr Machata <petrm@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/6] lib: test_objagg: Fix spelling
  2024-06-06 14:49 ` [PATCH net 2/6] lib: test_objagg: " Petr Machata
@ 2024-06-08  9:00   ` Simon Horman
  0 siblings, 0 replies; 14+ messages in thread
From: Simon Horman @ 2024-06-08  9:00 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, Amit Cohen, Ido Schimmel, Jiri Pirko, Alexander Zubkov,
	mlxsw

On Thu, Jun 06, 2024 at 04:49:39PM +0200, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> Fixes: 0a020d416d0a ("lib: introduce initial implementation of object aggregation manager")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Amit Cohen <amcohen@nvidia.com>
> Tested-by: Alexander Zubkov <green@qrator.net>
> Signed-off-by: Petr Machata <petrm@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 3/6] mlxsw: spectrum_acl_atcam: Fix wrong comment
  2024-06-06 14:49 ` [PATCH net 3/6] mlxsw: spectrum_acl_atcam: Fix wrong comment Petr Machata
@ 2024-06-08  9:01   ` Simon Horman
  0 siblings, 0 replies; 14+ messages in thread
From: Simon Horman @ 2024-06-08  9:01 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, Amit Cohen, Ido Schimmel, Jiri Pirko, Alexander Zubkov,
	mlxsw

On Thu, Jun 06, 2024 at 04:49:40PM +0200, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> The key is encoded, not encrypted.
> 
> Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Amit Cohen <amcohen@nvidia.com>
> Tested-by: Alexander Zubkov <green@qrator.net>
> Signed-off-by: Petr Machata <petrm@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 6/6] mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors
  2024-06-06 14:49 ` [PATCH net 6/6] mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors Petr Machata
@ 2024-06-08  9:01   ` Simon Horman
  0 siblings, 0 replies; 14+ messages in thread
From: Simon Horman @ 2024-06-08  9:01 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, Amit Cohen, Ido Schimmel, Jiri Pirko, Alexander Zubkov,
	mlxsw

On Thu, Jun 06, 2024 at 04:49:43PM +0200, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> ACLs that reside in the algorithmic TCAM (A-TCAM) in Spectrum-2 and
> newer ASICs can share the same mask if their masks only differ in up to
> 8 consecutive bits. For example, consider the following filters:
> 
>  # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 192.0.2.0/24 action drop
>  # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 198.51.100.128/25 action drop
> 
> The second filter can use the same mask as the first (dst_ip/24) with a
> delta of 1 bit.
> 
> However, the above only works because the two filters have different
> values in the common unmasked part (dst_ip/24). When entries have the
> same value in the common unmasked part they create undesired collisions
> in the device since many entries now have the same key. This leads to
> firmware errors such as [1] and to a reduced scale.
> 
> Fix by adjusting the hash table key to only include the value in the
> common unmasked part. That is, without including the delta bits. That
> way the driver will detect the collision during filter insertion and
> spill the filter into the circuit TCAM (C-TCAM).
> 
> Add a test case that fails without the fix and adjust existing cases
> that check C-TCAM spillage according to the above limitation.
> 
> [1]
> mlxsw_spectrum2 0000:06:00.0: EMAD reg access failed (tid=3379b18a00003394,reg_id=3027(ptce3),type=write,status=8(resource not available))
> 
> Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP")
> Reported-by: Alexander Zubkov <green@qrator.net>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Amit Cohen <amcohen@nvidia.com>
> Tested-by: Alexander Zubkov <green@qrator.net>
> Signed-off-by: Petr Machata <petrm@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 4/6] lib: objagg: Fix general protection fault
  2024-06-06 14:49 ` [PATCH net 4/6] lib: objagg: Fix general protection fault Petr Machata
@ 2024-06-08  9:01   ` Simon Horman
  0 siblings, 0 replies; 14+ messages in thread
From: Simon Horman @ 2024-06-08  9:01 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, Amit Cohen, Ido Schimmel, Jiri Pirko, Alexander Zubkov,
	mlxsw

On Thu, Jun 06, 2024 at 04:49:41PM +0200, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> The library supports aggregation of objects into other objects only if
> the parent object does not have a parent itself. That is, nesting is not
> supported.
> 
> Aggregation happens in two cases: Without and with hints, where hints
> are a pre-computed recommendation on how to aggregate the provided
> objects.
> 
> Nesting is not possible in the first case due to a check that prevents
> it, but in the second case there is no check because the assumption is
> that nesting cannot happen when creating objects based on hints. The
> violation of this assumption leads to various warnings and eventually to
> a general protection fault [1].
> 
> Before fixing the root cause, error out when nesting happens and warn.
> 
> [1]
> general protection fault, probably for non-canonical address 0xdead000000000d90: 0000 [#1] PREEMPT SMP PTI
> CPU: 1 PID: 1083 Comm: kworker/1:9 Tainted: G        W          6.9.0-rc6-custom-gd9b4f1cca7fb #7
> Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
> Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
> RIP: 0010:mlxsw_sp_acl_erp_bf_insert+0x25/0x80
> [...]
> Call Trace:
>  <TASK>
>  mlxsw_sp_acl_atcam_entry_add+0x256/0x3c0
>  mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
>  mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
>  mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
>  process_one_work+0x151/0x370
>  worker_thread+0x2cb/0x3e0
>  kthread+0xd0/0x100
>  ret_from_fork+0x34/0x50
>  ret_from_fork_asm+0x1a/0x30
>  </TASK>
> 
> Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
> Reported-by: Alexander Zubkov <green@qrator.net>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Amit Cohen <amcohen@nvidia.com>
> Tested-by: Alexander Zubkov <green@qrator.net>
> Signed-off-by: Petr Machata <petrm@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 5/6] mlxsw: spectrum_acl_erp: Fix object nesting warning
  2024-06-06 14:49 ` [PATCH net 5/6] mlxsw: spectrum_acl_erp: Fix object nesting warning Petr Machata
@ 2024-06-08  9:01   ` Simon Horman
  0 siblings, 0 replies; 14+ messages in thread
From: Simon Horman @ 2024-06-08  9:01 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, Amit Cohen, Ido Schimmel, Jiri Pirko, Alexander Zubkov,
	mlxsw

On Thu, Jun 06, 2024 at 04:49:42PM +0200, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> ACLs in Spectrum-2 and newer ASICs can reside in the algorithmic TCAM
> (A-TCAM) or in the ordinary circuit TCAM (C-TCAM). The former can
> contain more ACLs (i.e., tc filters), but the number of masks in each
> region (i.e., tc chain) is limited.
> 
> In order to mitigate the effects of the above limitation, the device
> allows filters to share a single mask if their masks only differ in up
> to 8 consecutive bits. For example, dst_ip/25 can be represented using
> dst_ip/24 with a delta of 1 bit. The C-TCAM does not have a limit on the
> number of masks being used (and therefore does not support mask
> aggregation), but can contain a limited number of filters.
> 
> The driver uses the "objagg" library to perform the mask aggregation by
> passing it objects that consist of the filter's mask and whether the
> filter is to be inserted into the A-TCAM or the C-TCAM since filters in
> different TCAMs cannot share a mask.
> 
> The set of created objects is dependent on the insertion order of the
> filters and is not necessarily optimal. Therefore, the driver will
> periodically ask the library to compute a more optimal set ("hints") by
> looking at all the existing objects.
> 
> When the library asks the driver whether two objects can be aggregated
> the driver only compares the provided masks and ignores the A-TCAM /
> C-TCAM indication. This is the right thing to do since the goal is to
> move as many filters as possible to the A-TCAM. The driver also forbids
> two identical masks from being aggregated since this can only happen if
> one was intentionally put in the C-TCAM to avoid a conflict in the
> A-TCAM.
> 
> The above can result in the following set of hints:
> 
> H1: {mask X, A-TCAM} -> H2: {mask Y, A-TCAM} // X is Y + delta
> H3: {mask Y, C-TCAM} -> H4: {mask Z, A-TCAM} // Y is Z + delta
> 
> After getting the hints from the library the driver will start migrating
> filters from one region to another while consulting the computed hints
> and instructing the device to perform a lookup in both regions during
> the transition.
> 
> Assuming a filter with mask X is being migrated into the A-TCAM in the
> new region, the hints lookup will return H1. Since H2 is the parent of
> H1, the library will try to find the object associated with it and
> create it if necessary in which case another hints lookup (recursive)
> will be performed. This hints lookup for {mask Y, A-TCAM} will either
> return H2 or H3 since the driver passes the library an object comparison
> function that ignores the A-TCAM / C-TCAM indication.
> 
> This can eventually lead to nested objects which are not supported by
> the library [1].
> 
> Fix by removing the object comparison function from both the driver and
> the library as the driver was the only user. That way the lookup will
> only return exact matches.
> 
> I do not have a reliable reproducer that can reproduce the issue in a
> timely manner, but before the fix the issue would reproduce in several
> minutes and with the fix it does not reproduce in over an hour.
> 
> Note that the current usefulness of the hints is limited because they
> include the C-TCAM indication and represent aggregation that cannot
> actually happen. This will be addressed in net-next.
> 
> [1]
> WARNING: CPU: 0 PID: 153 at lib/objagg.c:170 objagg_obj_parent_assign+0xb5/0xd0
> Modules linked in:
> CPU: 0 PID: 153 Comm: kworker/0:18 Not tainted 6.9.0-rc6-custom-g70fbc2c1c38b #42
> Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
> Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
> RIP: 0010:objagg_obj_parent_assign+0xb5/0xd0
> [...]
> Call Trace:
>  <TASK>
>  __objagg_obj_get+0x2bb/0x580
>  objagg_obj_get+0xe/0x80
>  mlxsw_sp_acl_erp_mask_get+0xb5/0xf0
>  mlxsw_sp_acl_atcam_entry_add+0xe8/0x3c0
>  mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
>  mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
>  mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
>  process_one_work+0x151/0x370
> 
> Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Amit Cohen <amcohen@nvidia.com>
> Tested-by: Alexander Zubkov <green@qrator.net>
> Signed-off-by: Petr Machata <petrm@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 0/6] mlxsw: ACL fixes
  2024-06-06 14:49 [PATCH net 0/6] mlxsw: ACL fixes Petr Machata
                   ` (5 preceding siblings ...)
  2024-06-06 14:49 ` [PATCH net 6/6] mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors Petr Machata
@ 2024-06-10 10:20 ` patchwork-bot+netdevbpf
  6 siblings, 0 replies; 14+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-06-10 10:20 UTC (permalink / raw)
  To: Petr Machata
  Cc: davem, edumazet, kuba, pabeni, netdev, amcohen, idosch, jiri,
	green, mlxsw

Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Thu, 6 Jun 2024 16:49:37 +0200 you wrote:
> Ido Schimmel writes:
> 
> Patches #1-#3 fix various spelling mistakes I noticed while working on
> the code base.
> 
> Patch #4 fixes a general protection fault by bailing out when the error
> occurs and warning.
> 
> [...]

Here is the summary with links:
  - [net,1/6] lib: objagg: Fix spelling
    https://git.kernel.org/netdev/net-next/c/c1e156ae50ee
  - [net,2/6] lib: test_objagg: Fix spelling
    https://git.kernel.org/netdev/net-next/c/2aad28ec4543
  - [net,3/6] mlxsw: spectrum_acl_atcam: Fix wrong comment
    https://git.kernel.org/netdev/net-next/c/06fcdf249406
  - [net,4/6] lib: objagg: Fix general protection fault
    https://git.kernel.org/netdev/net-next/c/b4a3a89fffcd
  - [net,5/6] mlxsw: spectrum_acl_erp: Fix object nesting warning
    https://git.kernel.org/netdev/net-next/c/97d833ceb27d
  - [net,6/6] mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors
    https://git.kernel.org/netdev/net-next/c/75d8d7a63065

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-06-10 10:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-06 14:49 [PATCH net 0/6] mlxsw: ACL fixes Petr Machata
2024-06-06 14:49 ` [PATCH net 1/6] lib: objagg: Fix spelling Petr Machata
2024-06-08  9:00   ` Simon Horman
2024-06-06 14:49 ` [PATCH net 2/6] lib: test_objagg: " Petr Machata
2024-06-08  9:00   ` Simon Horman
2024-06-06 14:49 ` [PATCH net 3/6] mlxsw: spectrum_acl_atcam: Fix wrong comment Petr Machata
2024-06-08  9:01   ` Simon Horman
2024-06-06 14:49 ` [PATCH net 4/6] lib: objagg: Fix general protection fault Petr Machata
2024-06-08  9:01   ` Simon Horman
2024-06-06 14:49 ` [PATCH net 5/6] mlxsw: spectrum_acl_erp: Fix object nesting warning Petr Machata
2024-06-08  9:01   ` Simon Horman
2024-06-06 14:49 ` [PATCH net 6/6] mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors Petr Machata
2024-06-08  9:01   ` Simon Horman
2024-06-10 10:20 ` [PATCH net 0/6] mlxsw: ACL fixes patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).