DPDK-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/3] fib: tbl8 reservation drift reproducer and proposed fix
@ 2026-05-07  9:50 Maxime Leroy
  2026-05-07  9:50 ` [RFC 1/3] fib6: fix tbl8 reservation drift in trie Maxime Leroy
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Maxime Leroy @ 2026-05-07  9:50 UTC (permalink / raw)
  To: Vladimir Medvedkin; +Cc: dev, Maxime Leroy

Asymmetric ADD/DEL sequences in FIB6 trie (a covering parent
removed between the ADD and DEL of a longer prefix) eventually
make ADD fail with -ENOSPC even when the tbl8 pool is empty.

Patch 2/3 is a small reproducer.

Root cause: rsvd_tbl8s is updated by depth_diff recomputed from
the current RIB, so increments at ADD and decrements at DEL do
not cancel when the RIB state changes in between. The counter
drifts and wraps to UINT32_MAX.

The simplest fix I could find (1/3, 3/3) is to drop rsvd_tbl8s
and use the pool counters already maintained by alloc/free:
tbl8_pool_pos in trie, cur_tbl8s in dir24_8. The DQ reclaim
inside tbl8_alloc() is moved into the pre-check.

I am not sure I understood the original intent of keeping
rsvd_tbl8s separate from the pool counters. In dir24_8 the two
mirror each other 1:1 and rsvd_tbl8s looks redundant; in trie,
depth_diff gives it a worst-case-reservation flavor but the
recomputation from the RIB is exactly what makes it drift. If
there was a deliberate reason, please point it out.

Patch 3/3 is a no-op cleanup that aligns dir24_8 with the trie
pattern.

Maxime Leroy (3):
  fib6: fix tbl8 reservation drift in trie
  test/fib6: add reproducer for tbl8 reservation drift
  fib: drop redundant tbl8 reservation counter

 app/test/test_fib6.c | 83 ++++++++++++++++++++++++++++++++++++++++++++
 lib/fib/dir24_8.c    | 31 ++++++-----------
 lib/fib/dir24_8.h    |  1 -
 lib/fib/trie.c       | 23 ++++++------
 lib/fib/trie.h       |  3 +-
 5 files changed, 104 insertions(+), 37 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC 1/3] fib6: fix tbl8 reservation drift in trie
  2026-05-07  9:50 [RFC 0/3] fib: tbl8 reservation drift reproducer and proposed fix Maxime Leroy
@ 2026-05-07  9:50 ` Maxime Leroy
  2026-05-07  9:50 ` [RFC 2/3] test/fib6: add reproducer for tbl8 reservation drift Maxime Leroy
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Maxime Leroy @ 2026-05-07  9:50 UTC (permalink / raw)
  To: Vladimir Medvedkin; +Cc: dev, Maxime Leroy, stable

trie_modify() updates rsvd_tbl8s by depth_diff computed from the
current RIB state. The RIB is not invariant between the ADD of a
prefix and its later DEL (a covering parent may be added or removed
in between), so depth_diff at DEL time may not match depth_diff at
ADD time. Repeated over asymmetric pairs, rsvd_tbl8s drifts and
eventually wraps to UINT32_MAX, after which the pre-check rejects
all long-prefix ADDs with -ENOSPC even when the pool is empty.

Replace rsvd_tbl8s with tbl8_pool_pos, which tbl8_get()/tbl8_put()
maintain exactly. To preserve the QSBR_MODE_DQ safety net previously
provided by the retry-with-reclaim inside tbl8_alloc(), the pre-check
now calls rte_rcu_qsbr_dq_reclaim(depth_diff) before returning
-ENOSPC.

The single-tbl8 retry inside tbl8_alloc() is removed: depth_diff is
the algorithmic upper bound for new tbl8 allocations, and the
pre-check now performs the DQ reclaim before allocation, so the
retry inside the allocator is no longer needed.

Fixes: c3e12e0f0354 ("fib: add dataplane algorithm for IPv6")
Cc: stable@dpdk.org
Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
 lib/fib/trie.c | 23 ++++++++++-------------
 lib/fib/trie.h |  3 +--
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/lib/fib/trie.c b/lib/fib/trie.c
index fa5d9ec6b0..52f25d499c 100644
--- a/lib/fib/trie.c
+++ b/lib/fib/trie.c
@@ -161,12 +161,6 @@ tbl8_alloc(struct rte_trie_tbl *dp, uint64_t nh)
 	uint8_t		*tbl8_ptr;
 
 	tbl8_idx = tbl8_get(dp);
-
-	/* If there are no tbl8 groups try to reclaim one. */
-	if (unlikely(tbl8_idx == -ENOSPC && dp->dq &&
-			!rte_rcu_qsbr_dq_reclaim(dp->dq, 1, NULL, NULL, NULL)))
-		tbl8_idx = tbl8_get(dp);
-
 	if (tbl8_idx < 0)
 		return tbl8_idx;
 	tbl8_ptr = get_tbl_p_by_idx(dp->tbl8,
@@ -603,8 +597,15 @@ trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
 			return 0;
 		}
 
-		if ((depth > 24) && (dp->rsvd_tbl8s + depth_diff > dp->number_tbl8s))
-			return -ENOSPC;
+		if ((depth > 24) &&
+		    (dp->tbl8_pool_pos + depth_diff > dp->number_tbl8s)) {
+			/* Reclaim deferred tbl8s before failing. */
+			if (dp->dq != NULL)
+				rte_rcu_qsbr_dq_reclaim(dp->dq, depth_diff,
+					NULL, NULL, NULL);
+			if (dp->tbl8_pool_pos + depth_diff > dp->number_tbl8s)
+				return -ENOSPC;
+		}
 
 		node = rte_rib6_insert(rib, &ip_masked, depth);
 		if (node == NULL)
@@ -614,15 +615,13 @@ trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
 		if (parent != NULL) {
 			rte_rib6_get_nh(parent, &par_nh);
 			if (par_nh == next_hop)
-				goto successfully_added;
+				return 0;
 		}
 		ret = modify_dp(dp, rib, &ip_masked, depth, next_hop);
 		if (ret != 0) {
 			rte_rib6_remove(rib, &ip_masked, depth);
 			return ret;
 		}
-successfully_added:
-		dp->rsvd_tbl8s += depth_diff;
 		return 0;
 	case RTE_FIB6_DEL:
 		if (node == NULL)
@@ -641,8 +640,6 @@ trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
 		if (ret != 0)
 			return ret;
 		rte_rib6_remove(rib, ip, depth);
-
-		dp->rsvd_tbl8s -= depth_diff;
 		return 0;
 	default:
 		break;
diff --git a/lib/fib/trie.h b/lib/fib/trie.h
index c34cc2c057..b42a28f84e 100644
--- a/lib/fib/trie.h
+++ b/lib/fib/trie.h
@@ -31,8 +31,7 @@
 
 struct rte_trie_tbl {
 	uint32_t	number_tbl8s;	/**< Total number of tbl8s */
-	uint32_t	rsvd_tbl8s;	/**< Number of reserved tbl8s */
-	uint32_t	cur_tbl8s;	/**< Current cumber of tbl8s */
+	uint32_t	cur_tbl8s;	/**< Current number of tbl8s */
 	uint64_t	def_nh;		/**< Default next hop */
 	enum rte_fib_trie_nh_sz	nh_sz;	/**< Size of nexthop entry */
 	uint64_t	*tbl8;		/**< tbl8 table. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC 2/3] test/fib6: add reproducer for tbl8 reservation drift
  2026-05-07  9:50 [RFC 0/3] fib: tbl8 reservation drift reproducer and proposed fix Maxime Leroy
  2026-05-07  9:50 ` [RFC 1/3] fib6: fix tbl8 reservation drift in trie Maxime Leroy
@ 2026-05-07  9:50 ` Maxime Leroy
  2026-05-07  9:50 ` [RFC 3/3] fib: drop redundant tbl8 reservation counter Maxime Leroy
  2026-05-07 19:02 ` [RFC 0/3] fib: tbl8 reservation drift reproducer and proposed fix Stephen Hemminger
  3 siblings, 0 replies; 6+ messages in thread
From: Maxime Leroy @ 2026-05-07  9:50 UTC (permalink / raw)
  To: Vladimir Medvedkin; +Cc: dev, Maxime Leroy

Regression test for the rsvd_tbl8s drift bug fixed in
"fib6: fix tbl8 reservation drift in trie".

The test installs a /28 parent and three /48 children, deletes the
/28 parent while children are alive, then deletes each /48 child.
One asymmetric cycle wraps rsvd_tbl8s past zero (net -2 per cycle).

The probe is a /28 ADD (depth_diff=1) so UINT32_MAX-1 + 1 exceeds
the pool size without further uint32_t overflow.

Without the fix, the final ADD /28 fails with -ENOSPC.
With the fix, it succeeds.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
 app/test/test_fib6.c | 83 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/app/test/test_fib6.c b/app/test/test_fib6.c
index fffb590dbf..85a6c0abc3 100644
--- a/app/test/test_fib6.c
+++ b/app/test/test_fib6.c
@@ -25,6 +25,7 @@ static int32_t test_get_invalid(void);
 static int32_t test_lookup(void);
 static int32_t test_invalid_rcu(void);
 static int32_t test_fib_rcu_sync_rw(void);
+static int32_t test_drift(void);
 
 #define MAX_ROUTES	(1 << 16)
 /** Maximum number of tbl8 for 2-byte entries */
@@ -599,6 +600,87 @@ test_fib_rcu_sync_rw(void)
 	return status == 0 ? TEST_SUCCESS : TEST_FAILED;
 }
 
+/*
+ * Reproducer for the rsvd_tbl8s drift bug. depth_diff used to maintain
+ * rsvd_tbl8s is computed from the current RIB state, so it is not
+ * invariant between the ADD of a prefix and its later DEL when a
+ * covering parent prefix is removed in between.
+ *
+ * Layout: one /28 parent (fcde::/28) and three /48 siblings under it
+ * (fcde:0:6000::/48, fcde:1:6000::/48, fcde:2:6000::/48). The second
+ * hextet's high 12 bits are zero, so the three /48 IPs all fall inside
+ * the /28.
+ *
+ * One asymmetric sequence is enough to wrap the counter:
+ *   ADD /28                                  rsvd_tbl8s += 1
+ *   ADD /48 child_0,1,2 (with /28 parent)    rsvd_tbl8s += 2 each (+6)
+ *   DEL /28 (sibling /48 found)              rsvd_tbl8s -= 0
+ *   DEL /48 child_0,1,2 (no parent left)     rsvd_tbl8s -= 3 each (-9)
+ *
+ * Net: -2. Starting from 0, rsvd_tbl8s wraps to UINT32_MAX-1. The
+ * next ADD of a prefix longer than /24 then unconditionally fails the
+ * pre-check (rsvd_tbl8s + depth_diff > number_tbl8s), even though the
+ * pool is empty.
+ */
+static int32_t
+test_drift(void)
+{
+	struct rte_fib6_conf config = { 0 };
+	struct rte_fib6 *fib;
+	struct rte_ipv6_addr parent =
+		RTE_IPV6(0xfcde, 0, 0, 0, 0, 0, 0, 0);
+	struct rte_ipv6_addr child[3] = {
+		RTE_IPV6(0xfcde, 0, 0x6000, 0, 0, 0, 0, 0),
+		RTE_IPV6(0xfcde, 1, 0x6000, 0, 0, 0, 0, 0),
+		RTE_IPV6(0xfcde, 2, 0x6000, 0, 0, 0, 0, 0),
+	};
+	unsigned int c;
+	int ret;
+
+	config.max_routes = 1024;
+	config.rib_ext_sz = 0;
+	config.default_nh = 0;
+	config.type = RTE_FIB6_TRIE;
+	config.trie.nh_sz = RTE_FIB6_TRIE_2B;
+	config.trie.num_tbl8 = 256;
+
+	fib = rte_fib6_create(__func__, SOCKET_ID_ANY, &config);
+	RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB\n");
+
+	ret = rte_fib6_add(fib, &parent, 28, 0xa);
+	RTE_TEST_ASSERT(ret == 0, "ADD /28 failed (ret=%d)\n", ret);
+
+	for (c = 0; c < 3; c++) {
+		ret = rte_fib6_add(fib, &child[c], 48, 0xb + c);
+		RTE_TEST_ASSERT(ret == 0,
+			"ADD /48 child %u failed (ret=%d)\n", c, ret);
+	}
+
+	ret = rte_fib6_delete(fib, &parent, 28);
+	RTE_TEST_ASSERT(ret == 0, "DEL /28 failed (ret=%d)\n", ret);
+
+	for (c = 0; c < 3; c++) {
+		ret = rte_fib6_delete(fib, &child[c], 48);
+		RTE_TEST_ASSERT(ret == 0,
+			"DEL /48 child %u failed (ret=%d)\n", c, ret);
+	}
+
+	/*
+	 * Pre-fix: -ENOSPC because rsvd_tbl8s wrapped to UINT32_MAX-1.
+	 * Post-fix: succeeds; the pre-check uses tbl8_pool_pos which
+	 * accurately reflects the (empty) pool.
+	 */
+	ret = rte_fib6_add(fib, &parent, 28, 0xe);
+	RTE_TEST_ASSERT(ret == 0,
+		"Fresh ADD /28 spuriously failed (ret=%d)\n", ret);
+
+	ret = rte_fib6_delete(fib, &parent, 28);
+	RTE_TEST_ASSERT(ret == 0, "Final DEL /28 failed (ret=%d)\n", ret);
+
+	rte_fib6_free(fib);
+	return TEST_SUCCESS;
+}
+
 static struct unit_test_suite fib6_fast_tests = {
 	.suite_name = "fib6 autotest",
 	.setup = NULL,
@@ -611,6 +693,7 @@ static struct unit_test_suite fib6_fast_tests = {
 	TEST_CASE(test_lookup),
 	TEST_CASE(test_invalid_rcu),
 	TEST_CASE(test_fib_rcu_sync_rw),
+	TEST_CASE(test_drift),
 	TEST_CASES_END()
 	}
 };
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC 3/3] fib: drop redundant tbl8 reservation counter
  2026-05-07  9:50 [RFC 0/3] fib: tbl8 reservation drift reproducer and proposed fix Maxime Leroy
  2026-05-07  9:50 ` [RFC 1/3] fib6: fix tbl8 reservation drift in trie Maxime Leroy
  2026-05-07  9:50 ` [RFC 2/3] test/fib6: add reproducer for tbl8 reservation drift Maxime Leroy
@ 2026-05-07  9:50 ` Maxime Leroy
  2026-05-07 19:02 ` [RFC 0/3] fib: tbl8 reservation drift reproducer and proposed fix Stephen Hemminger
  3 siblings, 0 replies; 6+ messages in thread
From: Maxime Leroy @ 2026-05-07  9:50 UTC (permalink / raw)
  To: Vladimir Medvedkin; +Cc: dev, Maxime Leroy

In dir24_8, rsvd_tbl8s mirrors cur_tbl8s: both are bumped when
depth > 24 and no /24+ cover exists, and decremented under the
same condition. Removing rsvd_tbl8s leaves cur_tbl8s as the
single source of truth for the pre-check.

Move the DQ reclaim retry from tbl8_alloc() into the pre-check,
mirroring the layout used by the trie path. Behavior is unchanged
for non-DQ modes; in DQ mode the pre-check now performs the
reclaim that the in-allocator retry used to perform.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
 lib/fib/dir24_8.c | 31 ++++++++++---------------------
 lib/fib/dir24_8.h |  1 -
 2 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/lib/fib/dir24_8.c b/lib/fib/dir24_8.c
index 489d2ef427..80215b93b0 100644
--- a/lib/fib/dir24_8.c
+++ b/lib/fib/dir24_8.c
@@ -207,12 +207,6 @@ tbl8_alloc(struct dir24_8_tbl *dp, uint64_t nh)
 	uint8_t	*tbl8_ptr;
 
 	tbl8_idx = tbl8_get_idx(dp);
-
-	/* If there are no tbl8 groups try to reclaim one. */
-	if (unlikely(tbl8_idx == -ENOSPC && dp->dq &&
-			!rte_rcu_qsbr_dq_reclaim(dp->dq, 1, NULL, NULL, NULL)))
-		tbl8_idx = tbl8_get_idx(dp);
-
 	if (tbl8_idx < 0)
 		return tbl8_idx;
 	tbl8_ptr = (uint8_t *)dp->tbl8 +
@@ -504,9 +498,14 @@ dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
 			tmp = rte_rib_get_nxt(rib, ip, 24, NULL,
 				RTE_RIB_GET_NXT_COVER);
 			if ((tmp == NULL) &&
-				(dp->rsvd_tbl8s >= dp->number_tbl8s))
-				return -ENOSPC;
-
+			    (dp->cur_tbl8s >= dp->number_tbl8s)) {
+				/* Reclaim deferred tbl8s before failing. */
+				if (dp->dq != NULL)
+					rte_rcu_qsbr_dq_reclaim(dp->dq, 1,
+						NULL, NULL, NULL);
+				if (dp->cur_tbl8s >= dp->number_tbl8s)
+					return -ENOSPC;
+			}
 		}
 		node = rte_rib_insert(rib, ip, depth);
 		if (node == NULL)
@@ -516,16 +515,13 @@ dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
 		if (parent != NULL) {
 			rte_rib_get_nh(parent, &par_nh);
 			if (par_nh == next_hop)
-				goto successfully_added;
+				return 0;
 		}
 		ret = modify_fib(dp, rib, ip, depth, next_hop);
 		if (ret != 0) {
 			rte_rib_remove(rib, ip, depth);
 			return ret;
 		}
-successfully_added:
-		if ((depth > 24) && (tmp == NULL))
-			dp->rsvd_tbl8s++;
 		return 0;
 	case RTE_FIB_DEL:
 		if (node == NULL)
@@ -539,15 +535,8 @@ dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
 				ret = modify_fib(dp, rib, ip, depth, par_nh);
 		} else
 			ret = modify_fib(dp, rib, ip, depth, dp->def_nh);
-		if (ret == 0) {
+		if (ret == 0)
 			rte_rib_remove(rib, ip, depth);
-			if (depth > 24) {
-				tmp = rte_rib_get_nxt(rib, ip, 24, NULL,
-					RTE_RIB_GET_NXT_COVER);
-				if (tmp == NULL)
-					dp->rsvd_tbl8s--;
-			}
-		}
 		return ret;
 	default:
 		break;
diff --git a/lib/fib/dir24_8.h b/lib/fib/dir24_8.h
index b343b5d686..502540173c 100644
--- a/lib/fib/dir24_8.h
+++ b/lib/fib/dir24_8.h
@@ -30,7 +30,6 @@
 
 struct dir24_8_tbl {
 	uint32_t	number_tbl8s;	/**< Total number of tbl8s */
-	uint32_t	rsvd_tbl8s;	/**< Number of reserved tbl8s */
 	uint32_t	cur_tbl8s;	/**< Current number of tbl8s */
 	enum rte_fib_dir24_8_nh_sz	nh_sz;	/**< Size of nexthop entry */
 	/* RCU config. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC 0/3] fib: tbl8 reservation drift reproducer and proposed fix
  2026-05-07  9:50 [RFC 0/3] fib: tbl8 reservation drift reproducer and proposed fix Maxime Leroy
                   ` (2 preceding siblings ...)
  2026-05-07  9:50 ` [RFC 3/3] fib: drop redundant tbl8 reservation counter Maxime Leroy
@ 2026-05-07 19:02 ` Stephen Hemminger
  2026-05-11  7:29   ` Maxime Leroy
  3 siblings, 1 reply; 6+ messages in thread
From: Stephen Hemminger @ 2026-05-07 19:02 UTC (permalink / raw)
  To: Maxime Leroy; +Cc: Vladimir Medvedkin, dev

On Thu,  7 May 2026 11:50:40 +0200
Maxime Leroy <maxime@leroys.fr> wrote:

> Asymmetric ADD/DEL sequences in FIB6 trie (a covering parent
> removed between the ADD and DEL of a longer prefix) eventually
> make ADD fail with -ENOSPC even when the tbl8 pool is empty.
> 
> Patch 2/3 is a small reproducer.
> 
> Root cause: rsvd_tbl8s is updated by depth_diff recomputed from
> the current RIB, so increments at ADD and decrements at DEL do
> not cancel when the RIB state changes in between. The counter
> drifts and wraps to UINT32_MAX.
> 
> The simplest fix I could find (1/3, 3/3) is to drop rsvd_tbl8s
> and use the pool counters already maintained by alloc/free:
> tbl8_pool_pos in trie, cur_tbl8s in dir24_8. The DQ reclaim
> inside tbl8_alloc() is moved into the pre-check.
> 
> I am not sure I understood the original intent of keeping
> rsvd_tbl8s separate from the pool counters. In dir24_8 the two
> mirror each other 1:1 and rsvd_tbl8s looks redundant; in trie,
> depth_diff gives it a worst-case-reservation flavor but the
> recomputation from the RIB is exactly what makes it drift. If
> there was a deliberate reason, please point it out.
> 
> Patch 3/3 is a no-op cleanup that aligns dir24_8 with the trie
> pattern.
> 
> Maxime Leroy (3):
>   fib6: fix tbl8 reservation drift in trie
>   test/fib6: add reproducer for tbl8 reservation drift
>   fib: drop redundant tbl8 reservation counter
> 
>  app/test/test_fib6.c | 83 ++++++++++++++++++++++++++++++++++++++++++++
>  lib/fib/dir24_8.c    | 31 ++++++-----------
>  lib/fib/dir24_8.h    |  1 -
>  lib/fib/trie.c       | 23 ++++++------
>  lib/fib/trie.h       |  3 +-
>  5 files changed, 104 insertions(+), 37 deletions(-)
> 

Used  this a test case for multiple AI reviewers.
Most reported false positives in various places.
But the winner was Claude web (or desktop) with Opus 4.7 extended.
It saw:

Patch 3 - fib: drop redundant tbl8 reservation counter
Warning: The commit message frames this as removing a "redundant" counter and claims "Behavior is unchanged for non-DQ modes," but dir24_8 has the same rsvd_tbl8s drift bug that patch 1/3 fixes for trie6 — this patch silently fixes it as well.
Trace (no RCU, depth-1 mode):
ADD /28 A (no /24 cover):
- tmp == NULL, pre-check passes
- install_to_fib allocates 1 tbl8 -> cur_tbl8s = 1
- post-add hook: tmp == NULL -> rsvd_tbl8s++ = 1
ADD /24 B covering A:
- depth == 24, no rsvd_tbl8s update
- tbl8 entries rewritten, no alloc/free
- state: rsvd_tbl8s = 1, cur_tbl8s = 1
DEL /28 A (with /24 cover B):
- modify_fib installs B's nh over A's range
- tbl8 becomes uniform -> tbl8_recycle() -> cur_tbl8s = 0
- post-del check: tmp != NULL (B is the cover) -> rsvd_tbl8s NOT decremented
- state: rsvd_tbl8s = 1, cur_tbl8s = 0  (diverged)
DEL /24 B:
- depth == 24, no rsvd_tbl8s update
- final: rsvd_tbl8s = 1, cur_tbl8s = 0
Each repetition leaks +1 into rsvd_tbl8s with no actual tbl8 in use. Over enough cycles, rsvd_tbl8s reaches number_tbl8s and the pre-check rejects valid /25-/32 ADDs even though the pool is empty. This is the same class of bug as the IPv6 case, just with smaller per-cycle drift.
The patch should carry:
Fixes: <commit that introduced rsvd_tbl8s in dir24_8>
Cc: stable@dpdk.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC 0/3] fib: tbl8 reservation drift reproducer and proposed fix
  2026-05-07 19:02 ` [RFC 0/3] fib: tbl8 reservation drift reproducer and proposed fix Stephen Hemminger
@ 2026-05-11  7:29   ` Maxime Leroy
  0 siblings, 0 replies; 6+ messages in thread
From: Maxime Leroy @ 2026-05-11  7:29 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Vladimir Medvedkin, dev

Hi Stephen,

On Thu, May 7, 2026 at 9:03 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Thu,  7 May 2026 11:50:40 +0200
> Maxime Leroy <maxime@leroys.fr> wrote:
>
> > Asymmetric ADD/DEL sequences in FIB6 trie (a covering parent
> > removed between the ADD and DEL of a longer prefix) eventually
> > make ADD fail with -ENOSPC even when the tbl8 pool is empty.
> >
> > Patch 2/3 is a small reproducer.
> >
> > Root cause: rsvd_tbl8s is updated by depth_diff recomputed from
> > the current RIB, so increments at ADD and decrements at DEL do
> > not cancel when the RIB state changes in between. The counter
> > drifts and wraps to UINT32_MAX.
> >
> > The simplest fix I could find (1/3, 3/3) is to drop rsvd_tbl8s
> > and use the pool counters already maintained by alloc/free:
> > tbl8_pool_pos in trie, cur_tbl8s in dir24_8. The DQ reclaim
> > inside tbl8_alloc() is moved into the pre-check.
> >
> > I am not sure I understood the original intent of keeping
> > rsvd_tbl8s separate from the pool counters. In dir24_8 the two
> > mirror each other 1:1 and rsvd_tbl8s looks redundant; in trie,
> > depth_diff gives it a worst-case-reservation flavor but the
> > recomputation from the RIB is exactly what makes it drift. If
> > there was a deliberate reason, please point it out.
> >
> > Patch 3/3 is a no-op cleanup that aligns dir24_8 with the trie
> > pattern.
> >
> > Maxime Leroy (3):
> >   fib6: fix tbl8 reservation drift in trie
> >   test/fib6: add reproducer for tbl8 reservation drift
> >   fib: drop redundant tbl8 reservation counter
> >
> >  app/test/test_fib6.c | 83 ++++++++++++++++++++++++++++++++++++++++++++
> >  lib/fib/dir24_8.c    | 31 ++++++-----------
> >  lib/fib/dir24_8.h    |  1 -
> >  lib/fib/trie.c       | 23 ++++++------
> >  lib/fib/trie.h       |  3 +-
> >  5 files changed, 104 insertions(+), 37 deletions(-)
> >
>
> Used  this a test case for multiple AI reviewers.
> Most reported false positives in various places.
> But the winner was Claude web (or desktop) with Opus 4.7 extended.
> It saw:
>
> Patch 3 - fib: drop redundant tbl8 reservation counter
> Warning: The commit message frames this as removing a "redundant" counter and claims "Behavior is unchanged for non-DQ modes," but dir24_8 has the same rsvd_tbl8s drift bug that patch 1/3 fixes for trie6 — this patch silently fixes it as well.
> Trace (no RCU, depth-1 mode):
> ADD /28 A (no /24 cover):
> - tmp == NULL, pre-check passes
> - install_to_fib allocates 1 tbl8 -> cur_tbl8s = 1
> - post-add hook: tmp == NULL -> rsvd_tbl8s++ = 1
> ADD /24 B covering A:
> - depth == 24, no rsvd_tbl8s update
> - tbl8 entries rewritten, no alloc/free
> - state: rsvd_tbl8s = 1, cur_tbl8s = 1
> DEL /28 A (with /24 cover B):
> - modify_fib installs B's nh over A's range
> - tbl8 becomes uniform -> tbl8_recycle() -> cur_tbl8s = 0
> - post-del check: tmp != NULL (B is the cover) -> rsvd_tbl8s NOT decremented
> - state: rsvd_tbl8s = 1, cur_tbl8s = 0  (diverged)
> DEL /24 B:
> - depth == 24, no rsvd_tbl8s update
> - final: rsvd_tbl8s = 1, cur_tbl8s = 0
> Each repetition leaks +1 into rsvd_tbl8s with no actual tbl8 in use. Over enough cycles, rsvd_tbl8s reaches number_tbl8s and the pre-check rejects valid /25-/32 ADDs even though the pool is empty. This is the same class of bug as the IPv6 case, just with smaller per-cycle drift.
> The patch should carry:
> Fixes: <commit that introduced rsvd_tbl8s in dir24_8>
> Cc: stable@dpdk.org

Thanks for the AI review. It makes senses. I will fix it for the V1.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-05-11  7:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-07  9:50 [RFC 0/3] fib: tbl8 reservation drift reproducer and proposed fix Maxime Leroy
2026-05-07  9:50 ` [RFC 1/3] fib6: fix tbl8 reservation drift in trie Maxime Leroy
2026-05-07  9:50 ` [RFC 2/3] test/fib6: add reproducer for tbl8 reservation drift Maxime Leroy
2026-05-07  9:50 ` [RFC 3/3] fib: drop redundant tbl8 reservation counter Maxime Leroy
2026-05-07 19:02 ` [RFC 0/3] fib: tbl8 reservation drift reproducer and proposed fix Stephen Hemminger
2026-05-11  7:29   ` Maxime Leroy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox