* [RFC 0/5] fib: shared and resizable tbl8 pool
@ 2026-03-31 21:41 Maxime Leroy
2026-03-31 21:41 ` [RFC 1/5] test/fib6: zero-initialize config struct Maxime Leroy
` (6 more replies)
0 siblings, 7 replies; 9+ messages in thread
From: Maxime Leroy @ 2026-03-31 21:41 UTC (permalink / raw)
To: dev; +Cc: vladimir.medvedkin, rjarry, Maxime Leroy
This RFC proposes an optional shared tbl8 pool for FIB/FIB6,
to address the difficulty of sizing num_tbl8 upfront.
In practice, tbl8 usage depends on prefix distribution and
evolves over time. In multi-VRF environments, some VRFs are
elephants (full table, thousands of tbl8 groups) while others
consume very little (mostly /24 or shorter). Per-FIB sizing
forces each instance to provision for its worst case, leading
to significant memory waste.
A shared pool solves this: all FIBs draw from the same tbl8
memory, so elephant VRFs use what they need while light VRFs
cost almost nothing. The sharing granularity is flexible: one pool per
VRF, per address family, a global pool, or no sharing at all.
This series adds:
- A shared tbl8 pool, replacing per-backend allocation
(bitmap in dir24_8, stack in trie) with a common
refcounted O(1) stack allocator.
- An optional resizable mode (grow via alloc + copy + QSBR
synchronize), removing the need to guess peak usage at
creation time.
- A stats API (rte_fib_tbl8_pool_get_stats()) exposing
used/total/max counters.
All features are opt-in:
- Existing per-FIB allocation remains the default.
- Shared pool is enabled via the tbl8_pool config field.
- Resize is enabled by setting max_tbl8 > 0 with QSBR.
Shrinking (reducing pool capacity after usage drops) is not
part of this series. It would always be best-effort since
there is no compaction: if any tbl8 group near the end of the
pool is still in use, the pool cannot shrink. The current LIFO
free list works against this: freed high indices are reused
immediately, so a contiguous free tail rarely forms. A
different allocation strategy (e.g. a min-heap favoring low
indices) could improve shrink opportunities, but that is
better addressed separately.
A working integration in Grout is available:
https://github.com/DPDK/grout/pull/581 (still a draft)
Maxime Leroy (5):
test/fib6: zero-initialize config struct
fib: share tbl8 definitions between fib and fib6
fib: add shared tbl8 pool
fib: add resizable tbl8 pool
fib: add tbl8 pool stats API
app/test/test_fib6.c | 10 +-
lib/fib/dir24_8.c | 234 ++++++++++---------------
lib/fib/dir24_8.h | 17 +-
lib/fib/fib_tbl8.h | 50 ++++++
lib/fib/fib_tbl8_pool.c | 337 ++++++++++++++++++++++++++++++++++++
lib/fib/fib_tbl8_pool.h | 113 ++++++++++++
lib/fib/meson.build | 5 +-
lib/fib/rte_fib.h | 3 +
lib/fib/rte_fib6.h | 3 +
lib/fib/rte_fib_tbl8_pool.h | 149 ++++++++++++++++
lib/fib/trie.c | 230 +++++++++---------------
lib/fib/trie.h | 15 +-
12 files changed, 844 insertions(+), 322 deletions(-)
create mode 100644 lib/fib/fib_tbl8.h
create mode 100644 lib/fib/fib_tbl8_pool.c
create mode 100644 lib/fib/fib_tbl8_pool.h
create mode 100644 lib/fib/rte_fib_tbl8_pool.h
--
2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC 1/5] test/fib6: zero-initialize config struct
2026-03-31 21:41 [RFC 0/5] fib: shared and resizable tbl8 pool Maxime Leroy
@ 2026-03-31 21:41 ` Maxime Leroy
2026-03-31 21:41 ` [RFC 2/5] fib: share tbl8 definitions between fib and fib6 Maxime Leroy
` (5 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Maxime Leroy @ 2026-03-31 21:41 UTC (permalink / raw)
To: dev; +Cc: vladimir.medvedkin, rjarry, Maxime Leroy
Initialize rte_fib6_conf with { 0 } to avoid reading uninitialized
fields, matching what test_fib.c already does.
This is needed because the struct will gain new optional fields
(tbl8_pool) that must default to NULL.
Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
app/test/test_fib6.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/app/test/test_fib6.c b/app/test/test_fib6.c
index fffb590dbf..9fbdde6b05 100644
--- a/app/test/test_fib6.c
+++ b/app/test/test_fib6.c
@@ -38,7 +38,7 @@ int32_t
test_create_invalid(void)
{
struct rte_fib6 *fib = NULL;
- struct rte_fib6_conf config;
+ struct rte_fib6_conf config = { 0 };
config.max_routes = MAX_ROUTES;
config.rib_ext_sz = 0;
@@ -97,7 +97,7 @@ int32_t
test_multiple_create(void)
{
struct rte_fib6 *fib = NULL;
- struct rte_fib6_conf config;
+ struct rte_fib6_conf config = { 0 };
int32_t i;
config.rib_ext_sz = 0;
@@ -124,7 +124,7 @@ int32_t
test_free_null(void)
{
struct rte_fib6 *fib = NULL;
- struct rte_fib6_conf config;
+ struct rte_fib6_conf config = { 0 };
config.max_routes = MAX_ROUTES;
config.rib_ext_sz = 0;
@@ -148,7 +148,7 @@ int32_t
test_add_del_invalid(void)
{
struct rte_fib6 *fib = NULL;
- struct rte_fib6_conf config;
+ struct rte_fib6_conf config = { 0 };
uint64_t nh = 100;
struct rte_ipv6_addr ip = RTE_IPV6_ADDR_UNSPEC;
int ret;
@@ -342,7 +342,7 @@ int32_t
test_lookup(void)
{
struct rte_fib6 *fib = NULL;
- struct rte_fib6_conf config;
+ struct rte_fib6_conf config = { 0 };
uint64_t def_nh = 100;
int ret;
--
2.43.0
* [RFC 2/5] fib: share tbl8 definitions between fib and fib6
2026-03-31 21:41 [RFC 0/5] fib: shared and resizable tbl8 pool Maxime Leroy
2026-03-31 21:41 ` [RFC 1/5] test/fib6: zero-initialize config struct Maxime Leroy
@ 2026-03-31 21:41 ` Maxime Leroy
2026-03-31 21:41 ` [RFC 3/5] fib: add shared tbl8 pool Maxime Leroy
` (4 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Maxime Leroy @ 2026-03-31 21:41 UTC (permalink / raw)
To: dev; +Cc: vladimir.medvedkin, rjarry, Maxime Leroy
Extract common tbl8 definitions shared by dir24_8 and trie backends
into a new fib_tbl8.h header:
- FIB_TBL8_GRP_NUM_ENT constant (was DIR24_8_TBL8_GRP_NUM_ENT
and TRIE_TBL8_GRP_NUM_ENT)
- enum fib_nh_sz with static_asserts against public enums
- fib_tbl8_write() inline (was write_to_fib and write_to_dp)
Convert the dir24_8 tbl8 index allocator from a bitmap to a stack,
aligning it with the trie backend, which already uses a stack-based
allocator.
Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
lib/fib/dir24_8.c | 151 +++++++++++++++++++--------------------------
lib/fib/dir24_8.h | 17 +++--
lib/fib/fib_tbl8.h | 50 +++++++++++++++
lib/fib/trie.c | 80 +++++++++---------------
lib/fib/trie.h | 11 +---
5 files changed, 155 insertions(+), 154 deletions(-)
create mode 100644 lib/fib/fib_tbl8.h
diff --git a/lib/fib/dir24_8.c b/lib/fib/dir24_8.c
index 489d2ef427..935eca12c3 100644
--- a/lib/fib/dir24_8.c
+++ b/lib/fib/dir24_8.c
@@ -17,6 +17,11 @@
#include "dir24_8.h"
#include "fib_log.h"
+static_assert((int)FIB_NH_SZ_1B == (int)RTE_FIB_DIR24_8_1B, "nh_sz 1B mismatch");
+static_assert((int)FIB_NH_SZ_2B == (int)RTE_FIB_DIR24_8_2B, "nh_sz 2B mismatch");
+static_assert((int)FIB_NH_SZ_4B == (int)RTE_FIB_DIR24_8_4B, "nh_sz 4B mismatch");
+static_assert((int)FIB_NH_SZ_8B == (int)RTE_FIB_DIR24_8_8B, "nh_sz 8B mismatch");
+
#ifdef CC_AVX512_SUPPORT
#include "dir24_8_avx512.h"
@@ -147,57 +152,27 @@ dir24_8_get_lookup_fn(void *p, enum rte_fib_lookup_type type, bool be_addr)
return NULL;
}
-static void
-write_to_fib(void *ptr, uint64_t val, enum rte_fib_dir24_8_nh_sz size, int n)
+/*
+ * Get an index of a free tbl8 from the pool
+ */
+static inline int32_t
+tbl8_get(struct dir24_8_tbl *dp)
{
- int i;
- uint8_t *ptr8 = (uint8_t *)ptr;
- uint16_t *ptr16 = (uint16_t *)ptr;
- uint32_t *ptr32 = (uint32_t *)ptr;
- uint64_t *ptr64 = (uint64_t *)ptr;
-
- switch (size) {
- case RTE_FIB_DIR24_8_1B:
- for (i = 0; i < n; i++)
- ptr8[i] = (uint8_t)val;
- break;
- case RTE_FIB_DIR24_8_2B:
- for (i = 0; i < n; i++)
- ptr16[i] = (uint16_t)val;
- break;
- case RTE_FIB_DIR24_8_4B:
- for (i = 0; i < n; i++)
- ptr32[i] = (uint32_t)val;
- break;
- case RTE_FIB_DIR24_8_8B:
- for (i = 0; i < n; i++)
- ptr64[i] = (uint64_t)val;
- break;
- }
-}
+ if (dp->tbl8_pool_pos == dp->number_tbl8s)
+ /* no more free tbl8 */
+ return -ENOSPC;
-static int
-tbl8_get_idx(struct dir24_8_tbl *dp)
-{
- uint32_t i;
- int bit_idx;
-
- for (i = 0; (i < (dp->number_tbl8s >> BITMAP_SLAB_BIT_SIZE_LOG2)) &&
- (dp->tbl8_idxes[i] == UINT64_MAX); i++)
- ;
- if (i < (dp->number_tbl8s >> BITMAP_SLAB_BIT_SIZE_LOG2)) {
- bit_idx = rte_ctz64(~dp->tbl8_idxes[i]);
- dp->tbl8_idxes[i] |= (1ULL << bit_idx);
- return (i << BITMAP_SLAB_BIT_SIZE_LOG2) + bit_idx;
- }
- return -ENOSPC;
+ /* next index */
+ return dp->tbl8_pool[dp->tbl8_pool_pos++];
}
+/*
+ * Put an index of a free tbl8 back to the pool
+ */
static inline void
-tbl8_free_idx(struct dir24_8_tbl *dp, int idx)
+tbl8_put(struct dir24_8_tbl *dp, uint32_t tbl8_ind)
{
- dp->tbl8_idxes[idx >> BITMAP_SLAB_BIT_SIZE_LOG2] &=
- ~(1ULL << (idx & BITMAP_SLAB_BITMASK));
+ dp->tbl8_pool[--dp->tbl8_pool_pos] = tbl8_ind;
}
static int
@@ -206,34 +181,32 @@ tbl8_alloc(struct dir24_8_tbl *dp, uint64_t nh)
int64_t tbl8_idx;
uint8_t *tbl8_ptr;
- tbl8_idx = tbl8_get_idx(dp);
+ tbl8_idx = tbl8_get(dp);
/* If there are no tbl8 groups try to reclaim one. */
if (unlikely(tbl8_idx == -ENOSPC && dp->dq &&
!rte_rcu_qsbr_dq_reclaim(dp->dq, 1, NULL, NULL, NULL)))
- tbl8_idx = tbl8_get_idx(dp);
+ tbl8_idx = tbl8_get(dp);
if (tbl8_idx < 0)
return tbl8_idx;
tbl8_ptr = (uint8_t *)dp->tbl8 +
- ((tbl8_idx * DIR24_8_TBL8_GRP_NUM_ENT) <<
+ ((tbl8_idx * FIB_TBL8_GRP_NUM_ENT) <<
dp->nh_sz);
/*Init tbl8 entries with nexthop from tbl24*/
- write_to_fib((void *)tbl8_ptr, nh|
+ fib_tbl8_write((void *)tbl8_ptr, nh|
DIR24_8_EXT_ENT, dp->nh_sz,
- DIR24_8_TBL8_GRP_NUM_ENT);
- dp->cur_tbl8s++;
+ FIB_TBL8_GRP_NUM_ENT);
return tbl8_idx;
}
static void
tbl8_cleanup_and_free(struct dir24_8_tbl *dp, uint64_t tbl8_idx)
{
- uint8_t *ptr = (uint8_t *)dp->tbl8 + (tbl8_idx * DIR24_8_TBL8_GRP_NUM_ENT << dp->nh_sz);
+ uint8_t *ptr = (uint8_t *)dp->tbl8 + (tbl8_idx * FIB_TBL8_GRP_NUM_ENT << dp->nh_sz);
- memset(ptr, 0, DIR24_8_TBL8_GRP_NUM_ENT << dp->nh_sz);
- tbl8_free_idx(dp, tbl8_idx);
- dp->cur_tbl8s--;
+ memset(ptr, 0, FIB_TBL8_GRP_NUM_ENT << dp->nh_sz);
+ tbl8_put(dp, tbl8_idx);
}
static void
@@ -258,9 +231,9 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
switch (dp->nh_sz) {
case RTE_FIB_DIR24_8_1B:
ptr8 = &((uint8_t *)dp->tbl8)[tbl8_idx *
- DIR24_8_TBL8_GRP_NUM_ENT];
+ FIB_TBL8_GRP_NUM_ENT];
nh = *ptr8;
- for (i = 1; i < DIR24_8_TBL8_GRP_NUM_ENT; i++) {
+ for (i = 1; i < FIB_TBL8_GRP_NUM_ENT; i++) {
if (nh != ptr8[i])
return;
}
@@ -269,9 +242,9 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
break;
case RTE_FIB_DIR24_8_2B:
ptr16 = &((uint16_t *)dp->tbl8)[tbl8_idx *
- DIR24_8_TBL8_GRP_NUM_ENT];
+ FIB_TBL8_GRP_NUM_ENT];
nh = *ptr16;
- for (i = 1; i < DIR24_8_TBL8_GRP_NUM_ENT; i++) {
+ for (i = 1; i < FIB_TBL8_GRP_NUM_ENT; i++) {
if (nh != ptr16[i])
return;
}
@@ -280,9 +253,9 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
break;
case RTE_FIB_DIR24_8_4B:
ptr32 = &((uint32_t *)dp->tbl8)[tbl8_idx *
- DIR24_8_TBL8_GRP_NUM_ENT];
+ FIB_TBL8_GRP_NUM_ENT];
nh = *ptr32;
- for (i = 1; i < DIR24_8_TBL8_GRP_NUM_ENT; i++) {
+ for (i = 1; i < FIB_TBL8_GRP_NUM_ENT; i++) {
if (nh != ptr32[i])
return;
}
@@ -291,9 +264,9 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
break;
case RTE_FIB_DIR24_8_8B:
ptr64 = &((uint64_t *)dp->tbl8)[tbl8_idx *
- DIR24_8_TBL8_GRP_NUM_ENT];
+ FIB_TBL8_GRP_NUM_ENT];
nh = *ptr64;
- for (i = 1; i < DIR24_8_TBL8_GRP_NUM_ENT; i++) {
+ for (i = 1; i < FIB_TBL8_GRP_NUM_ENT; i++) {
if (nh != ptr64[i])
return;
}
@@ -337,32 +310,32 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
* needs tbl8 for ledge and redge.
*/
tbl8_idx = tbl8_alloc(dp, tbl24_tmp);
- tmp_tbl8_idx = tbl8_get_idx(dp);
+ tmp_tbl8_idx = tbl8_get(dp);
if (tbl8_idx < 0)
return -ENOSPC;
else if (tmp_tbl8_idx < 0) {
- tbl8_free_idx(dp, tbl8_idx);
+ tbl8_put(dp, tbl8_idx);
return -ENOSPC;
}
- tbl8_free_idx(dp, tmp_tbl8_idx);
+ tbl8_put(dp, tmp_tbl8_idx);
/*update dir24 entry with tbl8 index*/
- write_to_fib(get_tbl24_p(dp, ledge,
+ fib_tbl8_write(get_tbl24_p(dp, ledge,
dp->nh_sz), (tbl8_idx << 1)|
DIR24_8_EXT_ENT,
dp->nh_sz, 1);
} else
tbl8_idx = tbl24_tmp >> 1;
tbl8_ptr = (uint8_t *)dp->tbl8 +
- (((tbl8_idx * DIR24_8_TBL8_GRP_NUM_ENT) +
+ (((tbl8_idx * FIB_TBL8_GRP_NUM_ENT) +
(ledge & ~DIR24_8_TBL24_MASK)) <<
dp->nh_sz);
/*update tbl8 with new next hop*/
- write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
+ fib_tbl8_write((void *)tbl8_ptr, (next_hop << 1)|
DIR24_8_EXT_ENT,
dp->nh_sz, ROUNDUP(ledge, 24) - ledge);
tbl8_recycle(dp, ledge, tbl8_idx);
}
- write_to_fib(get_tbl24_p(dp, ROUNDUP(ledge, 24), dp->nh_sz),
+ fib_tbl8_write(get_tbl24_p(dp, ROUNDUP(ledge, 24), dp->nh_sz),
next_hop << 1, dp->nh_sz, len);
if (redge & ~DIR24_8_TBL24_MASK) {
tbl24_tmp = get_tbl24(dp, redge, dp->nh_sz);
@@ -372,17 +345,17 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
if (tbl8_idx < 0)
return -ENOSPC;
/*update dir24 entry with tbl8 index*/
- write_to_fib(get_tbl24_p(dp, redge,
+ fib_tbl8_write(get_tbl24_p(dp, redge,
dp->nh_sz), (tbl8_idx << 1)|
DIR24_8_EXT_ENT,
dp->nh_sz, 1);
} else
tbl8_idx = tbl24_tmp >> 1;
tbl8_ptr = (uint8_t *)dp->tbl8 +
- ((tbl8_idx * DIR24_8_TBL8_GRP_NUM_ENT) <<
+ ((tbl8_idx * FIB_TBL8_GRP_NUM_ENT) <<
dp->nh_sz);
/*update tbl8 with new next hop*/
- write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
+ fib_tbl8_write((void *)tbl8_ptr, (next_hop << 1)|
DIR24_8_EXT_ENT,
dp->nh_sz, redge & ~DIR24_8_TBL24_MASK);
tbl8_recycle(dp, redge, tbl8_idx);
@@ -395,18 +368,18 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
if (tbl8_idx < 0)
return -ENOSPC;
/*update dir24 entry with tbl8 index*/
- write_to_fib(get_tbl24_p(dp, ledge, dp->nh_sz),
+ fib_tbl8_write(get_tbl24_p(dp, ledge, dp->nh_sz),
(tbl8_idx << 1)|
DIR24_8_EXT_ENT,
dp->nh_sz, 1);
} else
tbl8_idx = tbl24_tmp >> 1;
tbl8_ptr = (uint8_t *)dp->tbl8 +
- (((tbl8_idx * DIR24_8_TBL8_GRP_NUM_ENT) +
+ (((tbl8_idx * FIB_TBL8_GRP_NUM_ENT) +
(ledge & ~DIR24_8_TBL24_MASK)) <<
dp->nh_sz);
/*update tbl8 with new next hop*/
- write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
+ fib_tbl8_write((void *)tbl8_ptr, (next_hop << 1)|
DIR24_8_EXT_ENT,
dp->nh_sz, redge - ledge);
tbl8_recycle(dp, ledge, tbl8_idx);
@@ -561,7 +534,9 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
char mem_name[DIR24_8_NAMESIZE];
struct dir24_8_tbl *dp;
uint64_t def_nh;
+ uint64_t tbl8_sz;
uint32_t num_tbl8;
+ uint32_t i;
enum rte_fib_dir24_8_nh_sz nh_sz;
if ((name == NULL) || (fib_conf == NULL) ||
@@ -578,8 +553,7 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
def_nh = fib_conf->default_nh;
nh_sz = fib_conf->dir24_8.nh_sz;
- num_tbl8 = RTE_ALIGN_CEIL(fib_conf->dir24_8.num_tbl8,
- BITMAP_SLAB_BIT_SIZE);
+ num_tbl8 = fib_conf->dir24_8.num_tbl8;
snprintf(mem_name, sizeof(mem_name), "DP_%s", name);
dp = rte_zmalloc_socket(name, sizeof(struct dir24_8_tbl) +
@@ -590,11 +564,8 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
return NULL;
}
- /* Init table with default value */
- write_to_fib(dp->tbl24, (def_nh << 1), nh_sz, 1 << 24);
-
snprintf(mem_name, sizeof(mem_name), "TBL8_%p", dp);
- uint64_t tbl8_sz = DIR24_8_TBL8_GRP_NUM_ENT * (1ULL << nh_sz) *
+ tbl8_sz = FIB_TBL8_GRP_NUM_ENT * (1ULL << nh_sz) *
(num_tbl8 + 1);
dp->tbl8 = rte_zmalloc_socket(mem_name, tbl8_sz,
RTE_CACHE_LINE_SIZE, socket_id);
@@ -608,16 +579,24 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
dp->number_tbl8s = num_tbl8;
snprintf(mem_name, sizeof(mem_name), "TBL8_idxes_%p", dp);
- dp->tbl8_idxes = rte_zmalloc_socket(mem_name,
- RTE_ALIGN_CEIL(dp->number_tbl8s, 64) >> 3,
+ dp->tbl8_pool = rte_zmalloc_socket(mem_name,
+ sizeof(uint32_t) * dp->number_tbl8s,
RTE_CACHE_LINE_SIZE, socket_id);
- if (dp->tbl8_idxes == NULL) {
+ if (dp->tbl8_pool == NULL) {
rte_errno = ENOMEM;
rte_free(dp->tbl8);
rte_free(dp);
return NULL;
}
+ /* Init pool with all tbl8 indices free */
+ for (i = 0; i < dp->number_tbl8s; i++)
+ dp->tbl8_pool[i] = i;
+ dp->tbl8_pool_pos = 0;
+
+ /* Init table with default value */
+ fib_tbl8_write(dp->tbl24, (def_nh << 1), nh_sz, 1 << 24);
+
return dp;
}
@@ -627,7 +606,7 @@ dir24_8_free(void *p)
struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
rte_rcu_qsbr_dq_delete(dp->dq);
- rte_free(dp->tbl8_idxes);
+ rte_free(dp->tbl8_pool);
rte_free(dp->tbl8);
rte_free(dp);
}
diff --git a/lib/fib/dir24_8.h b/lib/fib/dir24_8.h
index b343b5d686..e75bd120ad 100644
--- a/lib/fib/dir24_8.h
+++ b/lib/fib/dir24_8.h
@@ -14,24 +14,21 @@
#include <rte_branch_prediction.h>
#include <rte_rcu_qsbr.h>
+#include "fib_tbl8.h"
+
/**
* @file
* DIR24_8 algorithm
*/
#define DIR24_8_TBL24_NUM_ENT (1 << 24)
-#define DIR24_8_TBL8_GRP_NUM_ENT 256U
#define DIR24_8_EXT_ENT 1
#define DIR24_8_TBL24_MASK 0xffffff00
-#define BITMAP_SLAB_BIT_SIZE_LOG2 6
-#define BITMAP_SLAB_BIT_SIZE (1 << BITMAP_SLAB_BIT_SIZE_LOG2)
-#define BITMAP_SLAB_BITMASK (BITMAP_SLAB_BIT_SIZE - 1)
-
struct dir24_8_tbl {
uint32_t number_tbl8s; /**< Total number of tbl8s */
uint32_t rsvd_tbl8s; /**< Number of reserved tbl8s */
- uint32_t cur_tbl8s; /**< Current number of tbl8s */
+ uint32_t tbl8_pool_pos; /**< Next free index in pool */
enum rte_fib_dir24_8_nh_sz nh_sz; /**< Size of nexthop entry */
/* RCU config. */
enum rte_fib_qsbr_mode rcu_mode;/* Blocking, defer queue. */
@@ -39,7 +36,7 @@ struct dir24_8_tbl {
struct rte_rcu_qsbr_dq *dq; /* RCU QSBR defer queue. */
uint64_t def_nh; /**< Default next hop */
uint64_t *tbl8; /**< tbl8 table. */
- uint64_t *tbl8_idxes; /**< bitmap containing free tbl8 idxes*/
+ uint32_t *tbl8_pool; /**< Stack of free tbl8 indices */
/* tbl24 table. */
alignas(RTE_CACHE_LINE_SIZE) uint64_t tbl24[];
};
@@ -72,7 +69,7 @@ get_tbl24_idx(uint32_t ip)
static inline uint32_t
get_tbl8_idx(uint32_t res, uint32_t ip)
{
- return (res >> 1) * DIR24_8_TBL8_GRP_NUM_ENT + (uint8_t)ip;
+ return (res >> 1) * FIB_TBL8_GRP_NUM_ENT + (uint8_t)ip;
}
static inline uint64_t
@@ -133,14 +130,14 @@ static inline void dir24_8_lookup_bulk_##suffix(void *p, const uint32_t *ips, \
tmp = ((type *)dp->tbl24)[ips[i] >> 8]; \
if (unlikely(is_entry_extended(tmp))) \
tmp = ((type *)dp->tbl8)[(uint8_t)ips[i] + \
- ((tmp >> 1) * DIR24_8_TBL8_GRP_NUM_ENT)]; \
+ ((tmp >> 1) * FIB_TBL8_GRP_NUM_ENT)]; \
next_hops[i] = tmp >> 1; \
} \
for (; i < n; i++) { \
tmp = ((type *)dp->tbl24)[ips[i] >> 8]; \
if (unlikely(is_entry_extended(tmp))) \
tmp = ((type *)dp->tbl8)[(uint8_t)ips[i] + \
- ((tmp >> 1) * DIR24_8_TBL8_GRP_NUM_ENT)]; \
+ ((tmp >> 1) * FIB_TBL8_GRP_NUM_ENT)]; \
next_hops[i] = tmp >> 1; \
} \
} \
diff --git a/lib/fib/fib_tbl8.h b/lib/fib/fib_tbl8.h
new file mode 100644
index 0000000000..b345c1e489
--- /dev/null
+++ b/lib/fib/fib_tbl8.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Maxime Leroy, Free Mobile
+ */
+
+#ifndef _FIB_TBL8_H_
+#define _FIB_TBL8_H_
+
+/**
+ * @file
+ * Common tbl8 definitions shared by dir24_8 and trie backends.
+ */
+
+#include <stdint.h>
+
+#define FIB_TBL8_GRP_NUM_ENT 256U
+
+/** Nexthop size (log2 of byte width) */
+enum fib_nh_sz {
+ FIB_NH_SZ_1B = 0,
+ FIB_NH_SZ_2B = 1,
+ FIB_NH_SZ_4B = 2,
+ FIB_NH_SZ_8B = 3,
+};
+
+static inline void
+fib_tbl8_write(void *ptr, uint64_t val, uint8_t nh_sz, int n)
+{
+ int i;
+
+ switch (nh_sz) {
+ case FIB_NH_SZ_1B:
+ for (i = 0; i < n; i++)
+ ((uint8_t *)ptr)[i] = (uint8_t)val;
+ break;
+ case FIB_NH_SZ_2B:
+ for (i = 0; i < n; i++)
+ ((uint16_t *)ptr)[i] = (uint16_t)val;
+ break;
+ case FIB_NH_SZ_4B:
+ for (i = 0; i < n; i++)
+ ((uint32_t *)ptr)[i] = (uint32_t)val;
+ break;
+ case FIB_NH_SZ_8B:
+ for (i = 0; i < n; i++)
+ ((uint64_t *)ptr)[i] = (uint64_t)val;
+ break;
+ }
+}
+
+#endif /* _FIB_TBL8_H_ */
diff --git a/lib/fib/trie.c b/lib/fib/trie.c
index fa5d9ec6b0..198fc54395 100644
--- a/lib/fib/trie.c
+++ b/lib/fib/trie.c
@@ -16,6 +16,10 @@
#include "fib_log.h"
#include "trie.h"
+static_assert((int)FIB_NH_SZ_2B == (int)RTE_FIB6_TRIE_2B, "nh_sz 2B mismatch");
+static_assert((int)FIB_NH_SZ_4B == (int)RTE_FIB6_TRIE_4B, "nh_sz 4B mismatch");
+static_assert((int)FIB_NH_SZ_8B == (int)RTE_FIB6_TRIE_8B, "nh_sz 8B mismatch");
+
#ifdef CC_AVX512_SUPPORT
#include "trie_avx512.h"
@@ -95,30 +99,6 @@ trie_get_lookup_fn(void *p, enum rte_fib6_lookup_type type)
return NULL;
}
-static void
-write_to_dp(void *ptr, uint64_t val, enum rte_fib_trie_nh_sz size, int n)
-{
- int i;
- uint16_t *ptr16 = (uint16_t *)ptr;
- uint32_t *ptr32 = (uint32_t *)ptr;
- uint64_t *ptr64 = (uint64_t *)ptr;
-
- switch (size) {
- case RTE_FIB6_TRIE_2B:
- for (i = 0; i < n; i++)
- ptr16[i] = (uint16_t)val;
- break;
- case RTE_FIB6_TRIE_4B:
- for (i = 0; i < n; i++)
- ptr32[i] = (uint32_t)val;
- break;
- case RTE_FIB6_TRIE_8B:
- for (i = 0; i < n; i++)
- ptr64[i] = (uint64_t)val;
- break;
- }
-}
-
static void
tbl8_pool_init(struct rte_trie_tbl *dp)
{
@@ -170,19 +150,19 @@ tbl8_alloc(struct rte_trie_tbl *dp, uint64_t nh)
if (tbl8_idx < 0)
return tbl8_idx;
tbl8_ptr = get_tbl_p_by_idx(dp->tbl8,
- tbl8_idx * TRIE_TBL8_GRP_NUM_ENT, dp->nh_sz);
+ tbl8_idx * FIB_TBL8_GRP_NUM_ENT, dp->nh_sz);
/*Init tbl8 entries with nexthop from tbl24*/
- write_to_dp((void *)tbl8_ptr, nh, dp->nh_sz,
- TRIE_TBL8_GRP_NUM_ENT);
+ fib_tbl8_write((void *)tbl8_ptr, nh, dp->nh_sz,
+ FIB_TBL8_GRP_NUM_ENT);
return tbl8_idx;
}
static void
tbl8_cleanup_and_free(struct rte_trie_tbl *dp, uint64_t tbl8_idx)
{
- uint8_t *ptr = (uint8_t *)dp->tbl8 + (tbl8_idx * TRIE_TBL8_GRP_NUM_ENT << dp->nh_sz);
+ uint8_t *ptr = (uint8_t *)dp->tbl8 + (tbl8_idx * FIB_TBL8_GRP_NUM_ENT << dp->nh_sz);
- memset(ptr, 0, TRIE_TBL8_GRP_NUM_ENT << dp->nh_sz);
+ memset(ptr, 0, FIB_TBL8_GRP_NUM_ENT << dp->nh_sz);
tbl8_put(dp, tbl8_idx);
}
@@ -206,39 +186,39 @@ tbl8_recycle(struct rte_trie_tbl *dp, void *par, uint64_t tbl8_idx)
switch (dp->nh_sz) {
case RTE_FIB6_TRIE_2B:
ptr16 = &((uint16_t *)dp->tbl8)[tbl8_idx *
- TRIE_TBL8_GRP_NUM_ENT];
+ FIB_TBL8_GRP_NUM_ENT];
nh = *ptr16;
if (nh & TRIE_EXT_ENT)
return;
- for (i = 1; i < TRIE_TBL8_GRP_NUM_ENT; i++) {
+ for (i = 1; i < FIB_TBL8_GRP_NUM_ENT; i++) {
if (nh != ptr16[i])
return;
}
- write_to_dp(par, nh, dp->nh_sz, 1);
+ fib_tbl8_write(par, nh, dp->nh_sz, 1);
break;
case RTE_FIB6_TRIE_4B:
ptr32 = &((uint32_t *)dp->tbl8)[tbl8_idx *
- TRIE_TBL8_GRP_NUM_ENT];
+ FIB_TBL8_GRP_NUM_ENT];
nh = *ptr32;
if (nh & TRIE_EXT_ENT)
return;
- for (i = 1; i < TRIE_TBL8_GRP_NUM_ENT; i++) {
+ for (i = 1; i < FIB_TBL8_GRP_NUM_ENT; i++) {
if (nh != ptr32[i])
return;
}
- write_to_dp(par, nh, dp->nh_sz, 1);
+ fib_tbl8_write(par, nh, dp->nh_sz, 1);
break;
case RTE_FIB6_TRIE_8B:
ptr64 = &((uint64_t *)dp->tbl8)[tbl8_idx *
- TRIE_TBL8_GRP_NUM_ENT];
+ FIB_TBL8_GRP_NUM_ENT];
nh = *ptr64;
if (nh & TRIE_EXT_ENT)
return;
- for (i = 1; i < TRIE_TBL8_GRP_NUM_ENT; i++) {
+ for (i = 1; i < FIB_TBL8_GRP_NUM_ENT; i++) {
if (nh != ptr64[i])
return;
}
- write_to_dp(par, nh, dp->nh_sz, 1);
+ fib_tbl8_write(par, nh, dp->nh_sz, 1);
break;
}
@@ -265,7 +245,7 @@ get_idx(const struct rte_ipv6_addr *ip, uint32_t prev_idx, int bytes, int first_
bitshift = (int8_t)(((first_byte + bytes - 1) - i)*BYTE_SIZE);
idx |= ip->a[i] << bitshift;
}
- return (prev_idx * TRIE_TBL8_GRP_NUM_ENT) + idx;
+ return (prev_idx * FIB_TBL8_GRP_NUM_ENT) + idx;
}
static inline uint64_t
@@ -303,7 +283,7 @@ recycle_root_path(struct rte_trie_tbl *dp, const uint8_t *ip_part,
if (common_tbl8 != 0) {
p = get_tbl_p_by_idx(dp->tbl8, (val >> 1) *
- TRIE_TBL8_GRP_NUM_ENT + *ip_part, dp->nh_sz);
+ FIB_TBL8_GRP_NUM_ENT + *ip_part, dp->nh_sz);
recycle_root_path(dp, ip_part + 1, common_tbl8 - 1, p);
}
tbl8_recycle(dp, prev, val >> 1);
@@ -327,7 +307,7 @@ build_common_root(struct rte_trie_tbl *dp, const struct rte_ipv6_addr *ip,
idx = tbl8_alloc(dp, val);
if (unlikely(idx < 0))
return idx;
- write_to_dp(tbl_ptr, (idx << 1) |
+ fib_tbl8_write(tbl_ptr, (idx << 1) |
TRIE_EXT_ENT, dp->nh_sz, 1);
prev_idx = idx;
} else
@@ -336,7 +316,7 @@ build_common_root(struct rte_trie_tbl *dp, const struct rte_ipv6_addr *ip,
j = i;
cur_tbl = dp->tbl8;
}
- *tbl = get_tbl_p_by_idx(cur_tbl, prev_idx * TRIE_TBL8_GRP_NUM_ENT,
+ *tbl = get_tbl_p_by_idx(cur_tbl, prev_idx * FIB_TBL8_GRP_NUM_ENT,
dp->nh_sz);
return 0;
}
@@ -361,22 +341,22 @@ write_edge(struct rte_trie_tbl *dp, const uint8_t *ip_part, uint64_t next_hop,
val = (tbl8_idx << 1)|TRIE_EXT_ENT;
}
p = get_tbl_p_by_idx(dp->tbl8, (tbl8_idx *
- TRIE_TBL8_GRP_NUM_ENT) + *ip_part, dp->nh_sz);
+ FIB_TBL8_GRP_NUM_ENT) + *ip_part, dp->nh_sz);
ret = write_edge(dp, ip_part + 1, next_hop, len - 1, edge, p);
if (ret < 0)
return ret;
if (edge == LEDGE) {
- write_to_dp(RTE_PTR_ADD(p, (uintptr_t)(1) << dp->nh_sz),
+ fib_tbl8_write(RTE_PTR_ADD(p, (uintptr_t)(1) << dp->nh_sz),
next_hop << 1, dp->nh_sz, UINT8_MAX - *ip_part);
} else {
- write_to_dp(get_tbl_p_by_idx(dp->tbl8, tbl8_idx *
- TRIE_TBL8_GRP_NUM_ENT, dp->nh_sz),
+ fib_tbl8_write(get_tbl_p_by_idx(dp->tbl8, tbl8_idx *
+ FIB_TBL8_GRP_NUM_ENT, dp->nh_sz),
next_hop << 1, dp->nh_sz, *ip_part);
}
tbl8_recycle(dp, &val, tbl8_idx);
}
- write_to_dp(ent, val, dp->nh_sz, 1);
+ fib_tbl8_write(ent, val, dp->nh_sz, 1);
return ret;
}
@@ -444,7 +424,7 @@ install_to_dp(struct rte_trie_tbl *dp, const struct rte_ipv6_addr *ledge,
if (right_idx > left_idx + 1) {
ent = get_tbl_p_by_idx(common_root_tbl, left_idx + 1,
dp->nh_sz);
- write_to_dp(ent, next_hop << 1, dp->nh_sz,
+ fib_tbl8_write(ent, next_hop << 1, dp->nh_sz,
right_idx - (left_idx + 1));
}
ent = get_tbl_p_by_idx(common_root_tbl, right_idx, dp->nh_sz);
@@ -686,10 +666,10 @@ trie_create(const char *name, int socket_id,
return dp;
}
- write_to_dp(&dp->tbl24, (def_nh << 1), nh_sz, 1 << 24);
+ fib_tbl8_write(&dp->tbl24, (def_nh << 1), nh_sz, 1 << 24);
snprintf(mem_name, sizeof(mem_name), "TBL8_%p", dp);
- dp->tbl8 = rte_zmalloc_socket(mem_name, TRIE_TBL8_GRP_NUM_ENT *
+ dp->tbl8 = rte_zmalloc_socket(mem_name, FIB_TBL8_GRP_NUM_ENT *
(1ll << nh_sz) * (num_tbl8 + 1),
RTE_CACHE_LINE_SIZE, socket_id);
if (dp->tbl8 == NULL) {
diff --git a/lib/fib/trie.h b/lib/fib/trie.h
index c34cc2c057..30fa886792 100644
--- a/lib/fib/trie.h
+++ b/lib/fib/trie.h
@@ -11,6 +11,8 @@
#include <rte_common.h>
#include <rte_fib6.h>
+#include "fib_tbl8.h"
+
/**
* @file
* RTE IPv6 Longest Prefix Match (LPM)
@@ -18,21 +20,14 @@
/* @internal Total number of tbl24 entries. */
#define TRIE_TBL24_NUM_ENT (1 << 24)
-/* @internal Number of entries in a tbl8 group. */
-#define TRIE_TBL8_GRP_NUM_ENT 256ULL
/* @internal Total number of tbl8 groups in the tbl8. */
#define TRIE_TBL8_NUM_GROUPS 65536
/* @internal bitmask with valid and valid_group fields set */
#define TRIE_EXT_ENT 1
-#define BITMAP_SLAB_BIT_SIZE_LOG2 6
-#define BITMAP_SLAB_BIT_SIZE (1ULL << BITMAP_SLAB_BIT_SIZE_LOG2)
-#define BITMAP_SLAB_BITMASK (BITMAP_SLAB_BIT_SIZE - 1)
-
struct rte_trie_tbl {
uint32_t number_tbl8s; /**< Total number of tbl8s */
uint32_t rsvd_tbl8s; /**< Number of reserved tbl8s */
- uint32_t cur_tbl8s; /**< Current cumber of tbl8s */
uint64_t def_nh; /**< Default next hop */
enum rte_fib_trie_nh_sz nh_sz; /**< Size of nexthop entry */
uint64_t *tbl8; /**< tbl8 table. */
@@ -124,7 +119,7 @@ static inline void rte_trie_lookup_bulk_##suffix(void *p, \
j = 3; \
while (is_entry_extended(tmp)) { \
tmp = ((type *)dp->tbl8)[ips[i].a[j++] + \
- ((tmp >> 1) * TRIE_TBL8_GRP_NUM_ENT)]; \
+ ((tmp >> 1) * FIB_TBL8_GRP_NUM_ENT)]; \
} \
next_hops[i] = tmp >> 1; \
} \
--
2.43.0
* [RFC 3/5] fib: add shared tbl8 pool
2026-03-31 21:41 [RFC 0/5] fib: shared and resizable tbl8 pool Maxime Leroy
2026-03-31 21:41 ` [RFC 1/5] test/fib6: zero-initialize config struct Maxime Leroy
2026-03-31 21:41 ` [RFC 2/5] fib: share tbl8 definitions between fib and fib6 Maxime Leroy
@ 2026-03-31 21:41 ` Maxime Leroy
2026-03-31 21:41 ` [RFC 4/5] fib: add resizable " Maxime Leroy
` (3 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Maxime Leroy @ 2026-03-31 21:41 UTC (permalink / raw)
To: dev; +Cc: vladimir.medvedkin, rjarry, Maxime Leroy
Replace the per-FIB tbl8 allocation with a common refcounted tbl8
pool (fib_tbl8_pool).
A FIB can either use an internal pool (created transparently from the
existing num_tbl8 config parameter) or attach to an external shared
pool via the new tbl8_pool config field. The shared pool allows
multiple FIB instances (e.g. one per VRF) to draw tbl8 groups from
the same memory, reducing overall allocation.
The pool is refcounted: internal pools start at refcount 1 and are
freed when the owning FIB is destroyed. External pools are
incremented on FIB attach and decremented on FIB detach; the creator
releases its reference via rte_fib_tbl8_pool_free().
The per-FIB RCU defer queue callback is shared across both backends
(fib_tbl8_pool_rcu_free_cb).
New public API:
- rte_fib_tbl8_pool_create()
- rte_fib_tbl8_pool_free()
Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
lib/fib/dir24_8.c | 128 ++++++++++---------------------
lib/fib/dir24_8.h | 6 +-
lib/fib/fib_tbl8_pool.c | 148 ++++++++++++++++++++++++++++++++++++
lib/fib/fib_tbl8_pool.h | 74 ++++++++++++++++++
lib/fib/meson.build | 5 +-
lib/fib/rte_fib.h | 3 +
lib/fib/rte_fib6.h | 3 +
lib/fib/rte_fib_tbl8_pool.h | 76 ++++++++++++++++++
lib/fib/trie.c | 134 ++++++++++----------------------
lib/fib/trie.h | 6 +-
10 files changed, 394 insertions(+), 189 deletions(-)
create mode 100644 lib/fib/fib_tbl8_pool.c
create mode 100644 lib/fib/fib_tbl8_pool.h
create mode 100644 lib/fib/rte_fib_tbl8_pool.h
diff --git a/lib/fib/dir24_8.c b/lib/fib/dir24_8.c
index 935eca12c3..b8e588a56a 100644
--- a/lib/fib/dir24_8.c
+++ b/lib/fib/dir24_8.c
@@ -152,41 +152,18 @@ dir24_8_get_lookup_fn(void *p, enum rte_fib_lookup_type type, bool be_addr)
return NULL;
}
-/*
- * Get an index of a free tbl8 from the pool
- */
-static inline int32_t
-tbl8_get(struct dir24_8_tbl *dp)
-{
- if (dp->tbl8_pool_pos == dp->number_tbl8s)
- /* no more free tbl8 */
- return -ENOSPC;
-
- /* next index */
- return dp->tbl8_pool[dp->tbl8_pool_pos++];
-}
-
-/*
- * Put an index of a free tbl8 back to the pool
- */
-static inline void
-tbl8_put(struct dir24_8_tbl *dp, uint32_t tbl8_ind)
-{
- dp->tbl8_pool[--dp->tbl8_pool_pos] = tbl8_ind;
-}
-
static int
tbl8_alloc(struct dir24_8_tbl *dp, uint64_t nh)
{
int64_t tbl8_idx;
uint8_t *tbl8_ptr;
- tbl8_idx = tbl8_get(dp);
+ tbl8_idx = fib_tbl8_pool_get(dp->pool);
/* If there are no tbl8 groups try to reclaim one. */
if (unlikely(tbl8_idx == -ENOSPC && dp->dq &&
!rte_rcu_qsbr_dq_reclaim(dp->dq, 1, NULL, NULL, NULL)))
- tbl8_idx = tbl8_get(dp);
+ tbl8_idx = fib_tbl8_pool_get(dp->pool);
if (tbl8_idx < 0)
return tbl8_idx;
@@ -200,24 +177,6 @@ tbl8_alloc(struct dir24_8_tbl *dp, uint64_t nh)
return tbl8_idx;
}
-static void
-tbl8_cleanup_and_free(struct dir24_8_tbl *dp, uint64_t tbl8_idx)
-{
- uint8_t *ptr = (uint8_t *)dp->tbl8 + (tbl8_idx * FIB_TBL8_GRP_NUM_ENT << dp->nh_sz);
-
- memset(ptr, 0, FIB_TBL8_GRP_NUM_ENT << dp->nh_sz);
- tbl8_put(dp, tbl8_idx);
-}
-
-static void
-__rcu_qsbr_free_resource(void *p, void *data, unsigned int n __rte_unused)
-{
- struct dir24_8_tbl *dp = p;
- uint64_t tbl8_idx = *(uint64_t *)data;
-
- tbl8_cleanup_and_free(dp, tbl8_idx);
-}
-
static void
tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
{
@@ -276,10 +235,10 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
}
if (dp->v == NULL) {
- tbl8_cleanup_and_free(dp, tbl8_idx);
+ fib_tbl8_pool_cleanup_and_free(dp->pool, tbl8_idx);
} else if (dp->rcu_mode == RTE_FIB_QSBR_MODE_SYNC) {
rte_rcu_qsbr_synchronize(dp->v, RTE_QSBR_THRID_INVALID);
- tbl8_cleanup_and_free(dp, tbl8_idx);
+ fib_tbl8_pool_cleanup_and_free(dp->pool, tbl8_idx);
} else { /* RTE_FIB_QSBR_MODE_DQ */
if (rte_rcu_qsbr_dq_enqueue(dp->dq, &tbl8_idx))
FIB_LOG(ERR, "Failed to push QSBR FIFO");
@@ -310,14 +269,14 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
* needs tbl8 for ledge and redge.
*/
tbl8_idx = tbl8_alloc(dp, tbl24_tmp);
- tmp_tbl8_idx = tbl8_get(dp);
+ tmp_tbl8_idx = fib_tbl8_pool_get(dp->pool);
if (tbl8_idx < 0)
return -ENOSPC;
else if (tmp_tbl8_idx < 0) {
- tbl8_put(dp, tbl8_idx);
+ fib_tbl8_pool_cleanup_and_free(dp->pool, tbl8_idx);
return -ENOSPC;
}
- tbl8_put(dp, tmp_tbl8_idx);
+ fib_tbl8_pool_put(dp->pool, tmp_tbl8_idx);
/*update dir24 entry with tbl8 index*/
fib_tbl8_write(get_tbl24_p(dp, ledge,
dp->nh_sz), (tbl8_idx << 1)|
@@ -477,7 +436,7 @@ dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
tmp = rte_rib_get_nxt(rib, ip, 24, NULL,
RTE_RIB_GET_NXT_COVER);
if ((tmp == NULL) &&
- (dp->rsvd_tbl8s >= dp->number_tbl8s))
+ (dp->rsvd_tbl8s >= dp->pool->num_tbl8s))
return -ENOSPC;
}
@@ -533,18 +492,13 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
{
char mem_name[DIR24_8_NAMESIZE];
struct dir24_8_tbl *dp;
+ struct rte_fib_tbl8_pool *pool;
uint64_t def_nh;
- uint64_t tbl8_sz;
- uint32_t num_tbl8;
- uint32_t i;
enum rte_fib_dir24_8_nh_sz nh_sz;
if ((name == NULL) || (fib_conf == NULL) ||
(fib_conf->dir24_8.nh_sz < RTE_FIB_DIR24_8_1B) ||
(fib_conf->dir24_8.nh_sz > RTE_FIB_DIR24_8_8B) ||
- (fib_conf->dir24_8.num_tbl8 >
- get_max_nh(fib_conf->dir24_8.nh_sz)) ||
- (fib_conf->dir24_8.num_tbl8 == 0) ||
(fib_conf->default_nh >
get_max_nh(fib_conf->dir24_8.nh_sz))) {
rte_errno = EINVAL;
@@ -553,46 +507,47 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
def_nh = fib_conf->default_nh;
nh_sz = fib_conf->dir24_8.nh_sz;
- num_tbl8 = fib_conf->dir24_8.num_tbl8;
+
+ if (fib_conf->dir24_8.tbl8_pool != NULL) {
+ /* External shared pool */
+ pool = fib_conf->dir24_8.tbl8_pool;
+ if (pool->nh_sz != nh_sz) {
+ rte_errno = EINVAL;
+ return NULL;
+ }
+ fib_tbl8_pool_ref(pool);
+ } else {
+ /* Internal pool */
+ if ((fib_conf->dir24_8.num_tbl8 >
+ get_max_nh(fib_conf->dir24_8.nh_sz)) ||
+ (fib_conf->dir24_8.num_tbl8 == 0)) {
+ rte_errno = EINVAL;
+ return NULL;
+ }
+ struct rte_fib_tbl8_pool_conf pool_conf = {
+ .num_tbl8 = fib_conf->dir24_8.num_tbl8,
+ .nh_sz = nh_sz,
+ .socket_id = socket_id,
+ };
+ pool = rte_fib_tbl8_pool_create(name, &pool_conf);
+ if (pool == NULL)
+ return NULL;
+ }
snprintf(mem_name, sizeof(mem_name), "DP_%s", name);
dp = rte_zmalloc_socket(name, sizeof(struct dir24_8_tbl) +
DIR24_8_TBL24_NUM_ENT * (1 << nh_sz) + sizeof(uint32_t),
RTE_CACHE_LINE_SIZE, socket_id);
if (dp == NULL) {
+ fib_tbl8_pool_unref(pool);
rte_errno = ENOMEM;
return NULL;
}
- snprintf(mem_name, sizeof(mem_name), "TBL8_%p", dp);
- tbl8_sz = FIB_TBL8_GRP_NUM_ENT * (1ULL << nh_sz) *
- (num_tbl8 + 1);
- dp->tbl8 = rte_zmalloc_socket(mem_name, tbl8_sz,
- RTE_CACHE_LINE_SIZE, socket_id);
- if (dp->tbl8 == NULL) {
- rte_errno = ENOMEM;
- rte_free(dp);
- return NULL;
- }
+ dp->pool = pool;
+ dp->tbl8 = pool->tbl8;
dp->def_nh = def_nh;
dp->nh_sz = nh_sz;
- dp->number_tbl8s = num_tbl8;
-
- snprintf(mem_name, sizeof(mem_name), "TBL8_idxes_%p", dp);
- dp->tbl8_pool = rte_zmalloc_socket(mem_name,
- sizeof(uint32_t) * dp->number_tbl8s,
- RTE_CACHE_LINE_SIZE, socket_id);
- if (dp->tbl8_pool == NULL) {
- rte_errno = ENOMEM;
- rte_free(dp->tbl8);
- rte_free(dp);
- return NULL;
- }
-
- /* Init pool with all tbl8 indices free */
- for (i = 0; i < dp->number_tbl8s; i++)
- dp->tbl8_pool[i] = i;
- dp->tbl8_pool_pos = 0;
/* Init table with default value */
fib_tbl8_write(dp->tbl24, (def_nh << 1), nh_sz, 1 << 24);
@@ -606,8 +561,7 @@ dir24_8_free(void *p)
struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
rte_rcu_qsbr_dq_delete(dp->dq);
- rte_free(dp->tbl8_pool);
- rte_free(dp->tbl8);
+ fib_tbl8_pool_unref(dp->pool);
rte_free(dp);
}
@@ -639,8 +593,8 @@ dir24_8_rcu_qsbr_add(struct dir24_8_tbl *dp, struct rte_fib_rcu_config *cfg,
if (params.max_reclaim_size == 0)
params.max_reclaim_size = RTE_FIB_RCU_DQ_RECLAIM_MAX;
params.esize = sizeof(uint64_t);
- params.free_fn = __rcu_qsbr_free_resource;
- params.p = dp;
+ params.free_fn = fib_tbl8_pool_rcu_free_cb;
+ params.p = dp->pool;
params.v = cfg->v;
dp->dq = rte_rcu_qsbr_dq_create(&params);
if (dp->dq == NULL) {
diff --git a/lib/fib/dir24_8.h b/lib/fib/dir24_8.h
index e75bd120ad..287b91ef4b 100644
--- a/lib/fib/dir24_8.h
+++ b/lib/fib/dir24_8.h
@@ -14,7 +14,7 @@
#include <rte_branch_prediction.h>
#include <rte_rcu_qsbr.h>
-#include "fib_tbl8.h"
+#include "fib_tbl8_pool.h"
/**
* @file
@@ -26,9 +26,7 @@
#define DIR24_8_TBL24_MASK 0xffffff00
struct dir24_8_tbl {
- uint32_t number_tbl8s; /**< Total number of tbl8s */
uint32_t rsvd_tbl8s; /**< Number of reserved tbl8s */
- uint32_t tbl8_pool_pos; /**< Next free index in pool */
enum rte_fib_dir24_8_nh_sz nh_sz; /**< Size of nexthop entry */
/* RCU config. */
enum rte_fib_qsbr_mode rcu_mode;/* Blocking, defer queue. */
@@ -36,7 +34,7 @@ struct dir24_8_tbl {
struct rte_rcu_qsbr_dq *dq; /* RCU QSBR defer queue. */
uint64_t def_nh; /**< Default next hop */
uint64_t *tbl8; /**< tbl8 table. */
- uint32_t *tbl8_pool; /**< Stack of free tbl8 indices */
+ struct rte_fib_tbl8_pool *pool; /**< tbl8 pool */
/* tbl24 table. */
alignas(RTE_CACHE_LINE_SIZE) uint64_t tbl24[];
};
diff --git a/lib/fib/fib_tbl8_pool.c b/lib/fib/fib_tbl8_pool.c
new file mode 100644
index 0000000000..5f8ba74219
--- /dev/null
+++ b/lib/fib/fib_tbl8_pool.c
@@ -0,0 +1,148 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Maxime Leroy, Free Mobile
+ */
+
+#include <stdint.h>
+#include <string.h>
+
+#include <eal_export.h>
+#include <rte_debug.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+
+#include "fib_tbl8_pool.h"
+
+static void
+pool_init_free_list(struct rte_fib_tbl8_pool *pool)
+{
+ uint32_t i;
+
+ /* put the entire range of indices into the free list */
+ for (i = 0; i < pool->num_tbl8s; i++)
+ pool->free_list[i] = i;
+
+ pool->cur_tbl8s = 0;
+}
+
+int32_t
+fib_tbl8_pool_get(struct rte_fib_tbl8_pool *pool)
+{
+ if (pool->cur_tbl8s == pool->num_tbl8s)
+ /* no more free tbl8 */
+ return -ENOSPC;
+
+ /* next index */
+ return pool->free_list[pool->cur_tbl8s++];
+}
+
+void
+fib_tbl8_pool_put(struct rte_fib_tbl8_pool *pool, uint32_t idx)
+{
+ RTE_ASSERT(pool->cur_tbl8s > 0);
+ pool->free_list[--pool->cur_tbl8s] = idx;
+}
+
+void
+fib_tbl8_pool_cleanup_and_free(struct rte_fib_tbl8_pool *pool, uint64_t idx)
+{
+ uint8_t *ptr = (uint8_t *)pool->tbl8 +
+ ((idx * FIB_TBL8_GRP_NUM_ENT) << pool->nh_sz);
+
+ memset(ptr, 0, FIB_TBL8_GRP_NUM_ENT << pool->nh_sz);
+ fib_tbl8_pool_put(pool, idx);
+}
+
+void
+fib_tbl8_pool_rcu_free_cb(void *p, void *data,
+ unsigned int n __rte_unused)
+{
+ struct rte_fib_tbl8_pool *pool = p;
+ uint64_t tbl8_idx = *(uint64_t *)data;
+
+ fib_tbl8_pool_cleanup_and_free(pool, tbl8_idx);
+}
+
+void
+fib_tbl8_pool_ref(struct rte_fib_tbl8_pool *pool)
+{
+ pool->refcnt++;
+}
+
+static void
+pool_free(struct rte_fib_tbl8_pool *pool)
+{
+ rte_free(pool->free_list);
+ rte_free(pool->tbl8);
+ rte_free(pool);
+}
+
+void
+fib_tbl8_pool_unref(struct rte_fib_tbl8_pool *pool)
+{
+ if (--pool->refcnt == 0)
+ pool_free(pool);
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib_tbl8_pool_create, 26.07)
+struct rte_fib_tbl8_pool *
+rte_fib_tbl8_pool_create(const char *name,
+ const struct rte_fib_tbl8_pool_conf *conf)
+{
+ struct rte_fib_tbl8_pool *pool;
+ char mem_name[64];
+
+ if (name == NULL || conf == NULL || conf->num_tbl8 == 0 ||
+ conf->nh_sz > 3) {
+ rte_errno = EINVAL;
+ return NULL;
+ }
+
+ snprintf(mem_name, sizeof(mem_name), "TBL8_POOL_%s", name);
+ pool = rte_zmalloc_socket(mem_name, sizeof(*pool),
+ RTE_CACHE_LINE_SIZE, conf->socket_id);
+ if (pool == NULL) {
+ rte_errno = ENOMEM;
+ return NULL;
+ }
+
+ pool->nh_sz = conf->nh_sz;
+ pool->num_tbl8s = conf->num_tbl8;
+ pool->socket_id = conf->socket_id;
+ pool->refcnt = 1;
+
+ snprintf(mem_name, sizeof(mem_name), "TBL8_%s", name);
+ pool->tbl8 = rte_zmalloc_socket(mem_name,
+ FIB_TBL8_GRP_NUM_ENT * (1ULL << pool->nh_sz) *
+ (pool->num_tbl8s + 1),
+ RTE_CACHE_LINE_SIZE, conf->socket_id);
+ if (pool->tbl8 == NULL) {
+ rte_errno = ENOMEM;
+ rte_free(pool);
+ return NULL;
+ }
+
+ snprintf(mem_name, sizeof(mem_name), "TBL8_FL_%s", name);
+ pool->free_list = rte_zmalloc_socket(mem_name,
+ sizeof(uint32_t) * pool->num_tbl8s,
+ RTE_CACHE_LINE_SIZE, conf->socket_id);
+ if (pool->free_list == NULL) {
+ rte_errno = ENOMEM;
+ rte_free(pool->tbl8);
+ rte_free(pool);
+ return NULL;
+ }
+
+ pool_init_free_list(pool);
+
+ return pool;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib_tbl8_pool_free, 26.07)
+void
+rte_fib_tbl8_pool_free(struct rte_fib_tbl8_pool *pool)
+{
+ if (pool == NULL)
+ return;
+
+ fib_tbl8_pool_unref(pool);
+}
diff --git a/lib/fib/fib_tbl8_pool.h b/lib/fib/fib_tbl8_pool.h
new file mode 100644
index 0000000000..285f06d87f
--- /dev/null
+++ b/lib/fib/fib_tbl8_pool.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Maxime Leroy, Free Mobile
+ */
+
+#ifndef _FIB_TBL8_POOL_H_
+#define _FIB_TBL8_POOL_H_
+
+/**
+ * @file
+ * Internal tbl8 pool header.
+ *
+ * The pool is not thread-safe. When multiple FIBs share a pool,
+ * all operations (route modifications, FIB creation/destruction)
+ * must be serialized by the caller.
+ */
+
+#include <stdint.h>
+#include <string.h>
+
+#include <rte_common.h>
+
+#include "fib_tbl8.h"
+#include "rte_fib_tbl8_pool.h"
+
+struct rte_fib_tbl8_pool {
+ uint64_t *tbl8; /**< tbl8 group array */
+ uint32_t *free_list; /**< Stack of free group indices */
+ uint32_t cur_tbl8s; /**< Number of allocated groups */
+ uint32_t num_tbl8s; /**< Total number of tbl8 groups */
+ uint8_t nh_sz; /**< Nexthop entry size (0-3) */
+ int socket_id;
+ uint32_t refcnt; /**< Reference count */
+};
+
+/**
+ * Get a free tbl8 group index from the pool.
+ * @return index on success, -ENOSPC if pool is full
+ */
+int32_t
+fib_tbl8_pool_get(struct rte_fib_tbl8_pool *pool);
+
+/**
+ * Return a tbl8 group index to the pool.
+ */
+void
+fib_tbl8_pool_put(struct rte_fib_tbl8_pool *pool, uint32_t idx);
+
+/**
+ * Clear a tbl8 group and return its index to the pool.
+ */
+void
+fib_tbl8_pool_cleanup_and_free(struct rte_fib_tbl8_pool *pool, uint64_t idx);
+
+/**
+ * RCU defer queue callback for tbl8 group reclamation.
+ * Shared by dir24_8 and trie backends.
+ * Use as params.free_fn with params.p = pool.
+ */
+void
+fib_tbl8_pool_rcu_free_cb(void *p, void *data, unsigned int n);
+
+/**
+ * Increment pool reference count.
+ */
+void
+fib_tbl8_pool_ref(struct rte_fib_tbl8_pool *pool);
+
+/**
+ * Decrement pool reference count. Free the pool if it reaches 0.
+ */
+void
+fib_tbl8_pool_unref(struct rte_fib_tbl8_pool *pool);
+
+#endif /* _FIB_TBL8_POOL_H_ */
diff --git a/lib/fib/meson.build b/lib/fib/meson.build
index 573fc50ff1..6ecd954b26 100644
--- a/lib/fib/meson.build
+++ b/lib/fib/meson.build
@@ -2,8 +2,9 @@
# Copyright(c) 2018 Vladimir Medvedkin <medvedkinv@gmail.com>
# Copyright(c) 2019 Intel Corporation
-sources = files('rte_fib.c', 'rte_fib6.c', 'dir24_8.c', 'trie.c')
-headers = files('rte_fib.h', 'rte_fib6.h')
+sources = files('rte_fib.c', 'rte_fib6.c', 'dir24_8.c', 'trie.c',
+ 'fib_tbl8_pool.c')
+headers = files('rte_fib.h', 'rte_fib6.h', 'rte_fib_tbl8_pool.h')
deps += ['rib']
deps += ['rcu']
deps += ['net']
diff --git a/lib/fib/rte_fib.h b/lib/fib/rte_fib.h
index b16a653535..b8c86566ad 100644
--- a/lib/fib/rte_fib.h
+++ b/lib/fib/rte_fib.h
@@ -19,6 +19,7 @@
#include <rte_common.h>
#include <rte_rcu_qsbr.h>
+#include <rte_fib_tbl8_pool.h>
#ifdef __cplusplus
extern "C" {
@@ -107,6 +108,8 @@ struct rte_fib_conf {
struct {
enum rte_fib_dir24_8_nh_sz nh_sz;
uint32_t num_tbl8;
+ /** Shared tbl8 pool (NULL = internal pool) */
+ struct rte_fib_tbl8_pool *tbl8_pool;
} dir24_8;
};
unsigned int flags; /**< Optional feature flags from RTE_FIB_F_* */
diff --git a/lib/fib/rte_fib6.h b/lib/fib/rte_fib6.h
index 4527328bf0..655a4c9501 100644
--- a/lib/fib/rte_fib6.h
+++ b/lib/fib/rte_fib6.h
@@ -20,6 +20,7 @@
#include <rte_common.h>
#include <rte_ip6.h>
#include <rte_rcu_qsbr.h>
+#include <rte_fib_tbl8_pool.h>
#ifdef __cplusplus
extern "C" {
@@ -95,6 +96,8 @@ struct rte_fib6_conf {
struct {
enum rte_fib_trie_nh_sz nh_sz;
uint32_t num_tbl8;
+ /** Shared tbl8 pool (NULL = internal pool) */
+ struct rte_fib_tbl8_pool *tbl8_pool;
} trie;
};
};
diff --git a/lib/fib/rte_fib_tbl8_pool.h b/lib/fib/rte_fib_tbl8_pool.h
new file mode 100644
index 0000000000..e362efe74b
--- /dev/null
+++ b/lib/fib/rte_fib_tbl8_pool.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Maxime Leroy, Free Mobile
+ */
+
+#ifndef _RTE_FIB_TBL8_POOL_H_
+#define _RTE_FIB_TBL8_POOL_H_
+
+/**
+ * @file
+ * Shared tbl8 pool for FIB backends.
+ *
+ * A tbl8 pool manages a shared array of tbl8 groups that can be used
+ * across multiple FIB instances (e.g., one per VRF).
+ *
+ * Two modes of operation:
+ * - Internal pool: set num_tbl8 in the FIB config and leave tbl8_pool
+ * NULL. The pool is created and destroyed with the FIB.
+ * - External shared pool: create with rte_fib_tbl8_pool_create(), pass
+ * the handle via the tbl8_pool config field. Each FIB holds a
+ * reference; the creator releases its reference with
+ * rte_fib_tbl8_pool_free(). The pool is freed when the last
+ * reference is dropped.
+ *
+ * Thread safety: none. The pool is not thread-safe. All operations
+ * on FIBs sharing the same pool (route updates, FIB creation and
+ * destruction, pool create/free) must be serialized by the caller.
+ */
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_fib_tbl8_pool;
+
+/** tbl8 pool configuration */
+struct rte_fib_tbl8_pool_conf {
+ uint32_t num_tbl8; /**< Number of tbl8 groups */
+ uint8_t nh_sz; /**< Nexthop size: 0=1B, 1=2B, 2=4B, 3=8B */
+ int socket_id; /**< NUMA socket for memory allocation */
+};
+
+/**
+ * Create a tbl8 pool.
+ *
+ * @param name
+ * Pool name (for memory allocation tracking)
+ * @param conf
+ * Pool configuration
+ * @return
+ * Pool handle on success, NULL on failure with rte_errno set
+ */
+__rte_experimental
+struct rte_fib_tbl8_pool *
+rte_fib_tbl8_pool_create(const char *name,
+ const struct rte_fib_tbl8_pool_conf *conf);
+
+/**
+ * Release the creator's reference on a tbl8 pool.
+ *
+ * The pool is freed when the last reference is dropped (i.e. after
+ * all FIBs using this pool have been destroyed).
+ *
+ * @param pool
+ * Pool handle (NULL is allowed)
+ */
+__rte_experimental
+void
+rte_fib_tbl8_pool_free(struct rte_fib_tbl8_pool *pool);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_FIB_TBL8_POOL_H_ */
diff --git a/lib/fib/trie.c b/lib/fib/trie.c
index 198fc54395..798d322b1e 100644
--- a/lib/fib/trie.c
+++ b/lib/fib/trie.c
@@ -99,53 +99,18 @@ trie_get_lookup_fn(void *p, enum rte_fib6_lookup_type type)
return NULL;
}
-static void
-tbl8_pool_init(struct rte_trie_tbl *dp)
-{
- uint32_t i;
-
- /* put entire range of indexes to the tbl8 pool */
- for (i = 0; i < dp->number_tbl8s; i++)
- dp->tbl8_pool[i] = i;
-
- dp->tbl8_pool_pos = 0;
-}
-
-/*
- * Get an index of a free tbl8 from the pool
- */
-static inline int32_t
-tbl8_get(struct rte_trie_tbl *dp)
-{
- if (dp->tbl8_pool_pos == dp->number_tbl8s)
- /* no more free tbl8 */
- return -ENOSPC;
-
- /* next index */
- return dp->tbl8_pool[dp->tbl8_pool_pos++];
-}
-
-/*
- * Put an index of a free tbl8 back to the pool
- */
-static inline void
-tbl8_put(struct rte_trie_tbl *dp, uint32_t tbl8_ind)
-{
- dp->tbl8_pool[--dp->tbl8_pool_pos] = tbl8_ind;
-}
-
static int
tbl8_alloc(struct rte_trie_tbl *dp, uint64_t nh)
{
int64_t tbl8_idx;
uint8_t *tbl8_ptr;
- tbl8_idx = tbl8_get(dp);
+ tbl8_idx = fib_tbl8_pool_get(dp->pool);
/* If there are no tbl8 groups try to reclaim one. */
if (unlikely(tbl8_idx == -ENOSPC && dp->dq &&
!rte_rcu_qsbr_dq_reclaim(dp->dq, 1, NULL, NULL, NULL)))
- tbl8_idx = tbl8_get(dp);
+ tbl8_idx = fib_tbl8_pool_get(dp->pool);
if (tbl8_idx < 0)
return tbl8_idx;
@@ -157,23 +122,6 @@ tbl8_alloc(struct rte_trie_tbl *dp, uint64_t nh)
return tbl8_idx;
}
-static void
-tbl8_cleanup_and_free(struct rte_trie_tbl *dp, uint64_t tbl8_idx)
-{
- uint8_t *ptr = (uint8_t *)dp->tbl8 + (tbl8_idx * FIB_TBL8_GRP_NUM_ENT << dp->nh_sz);
-
- memset(ptr, 0, FIB_TBL8_GRP_NUM_ENT << dp->nh_sz);
- tbl8_put(dp, tbl8_idx);
-}
-
-static void
-__rcu_qsbr_free_resource(void *p, void *data, unsigned int n __rte_unused)
-{
- struct rte_trie_tbl *dp = p;
- uint64_t tbl8_idx = *(uint64_t *)data;
- tbl8_cleanup_and_free(dp, tbl8_idx);
-}
-
static void
tbl8_recycle(struct rte_trie_tbl *dp, void *par, uint64_t tbl8_idx)
{
@@ -223,10 +171,10 @@ tbl8_recycle(struct rte_trie_tbl *dp, void *par, uint64_t tbl8_idx)
}
if (dp->v == NULL) {
- tbl8_cleanup_and_free(dp, tbl8_idx);
+ fib_tbl8_pool_cleanup_and_free(dp->pool, tbl8_idx);
} else if (dp->rcu_mode == RTE_FIB6_QSBR_MODE_SYNC) {
rte_rcu_qsbr_synchronize(dp->v, RTE_QSBR_THRID_INVALID);
- tbl8_cleanup_and_free(dp, tbl8_idx);
+ fib_tbl8_pool_cleanup_and_free(dp->pool, tbl8_idx);
} else { /* RTE_FIB6_QSBR_MODE_DQ */
if (rte_rcu_qsbr_dq_enqueue(dp->dq, &tbl8_idx))
FIB_LOG(ERR, "Failed to push QSBR FIFO");
@@ -583,7 +531,7 @@ trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
return 0;
}
- if ((depth > 24) && (dp->rsvd_tbl8s + depth_diff > dp->number_tbl8s))
+ if ((depth > 24) && (dp->rsvd_tbl8s + depth_diff > dp->pool->num_tbl8s))
return -ENOSPC;
node = rte_rib6_insert(rib, &ip_masked, depth);
@@ -636,63 +584,66 @@ trie_create(const char *name, int socket_id,
{
char mem_name[TRIE_NAMESIZE];
struct rte_trie_tbl *dp = NULL;
+ struct rte_fib_tbl8_pool *pool;
uint64_t def_nh;
- uint32_t num_tbl8;
enum rte_fib_trie_nh_sz nh_sz;
if ((name == NULL) || (conf == NULL) ||
(conf->trie.nh_sz < RTE_FIB6_TRIE_2B) ||
(conf->trie.nh_sz > RTE_FIB6_TRIE_8B) ||
- (conf->trie.num_tbl8 >
- get_max_nh(conf->trie.nh_sz)) ||
- (conf->trie.num_tbl8 == 0) ||
(conf->default_nh >
get_max_nh(conf->trie.nh_sz))) {
-
rte_errno = EINVAL;
return NULL;
}
def_nh = conf->default_nh;
nh_sz = conf->trie.nh_sz;
- num_tbl8 = conf->trie.num_tbl8;
+
+ if (conf->trie.tbl8_pool != NULL) {
+ /* External shared pool: validate nh_sz matches. */
+ pool = conf->trie.tbl8_pool;
+ if (pool->nh_sz != nh_sz) {
+ rte_errno = EINVAL;
+ return NULL;
+ }
+ fib_tbl8_pool_ref(pool);
+ } else {
+ /* Internal pool: create from config. */
+ struct rte_fib_tbl8_pool_conf pool_conf = {
+ .num_tbl8 = conf->trie.num_tbl8,
+ .nh_sz = nh_sz,
+ .socket_id = socket_id,
+ };
+
+ if (conf->trie.num_tbl8 == 0 ||
+ conf->trie.num_tbl8 >
+ get_max_nh(nh_sz)) {
+ rte_errno = EINVAL;
+ return NULL;
+ }
+
+ pool = rte_fib_tbl8_pool_create(name, &pool_conf);
+ if (pool == NULL)
+ return NULL;
+ }
snprintf(mem_name, sizeof(mem_name), "DP_%s", name);
dp = rte_zmalloc_socket(name, sizeof(struct rte_trie_tbl) +
TRIE_TBL24_NUM_ENT * (1 << nh_sz) + sizeof(uint32_t),
RTE_CACHE_LINE_SIZE, socket_id);
if (dp == NULL) {
+ fib_tbl8_pool_unref(pool);
rte_errno = ENOMEM;
- return dp;
- }
-
- fib_tbl8_write(&dp->tbl24, (def_nh << 1), nh_sz, 1 << 24);
-
- snprintf(mem_name, sizeof(mem_name), "TBL8_%p", dp);
- dp->tbl8 = rte_zmalloc_socket(mem_name, FIB_TBL8_GRP_NUM_ENT *
- (1ll << nh_sz) * (num_tbl8 + 1),
- RTE_CACHE_LINE_SIZE, socket_id);
- if (dp->tbl8 == NULL) {
- rte_errno = ENOMEM;
- rte_free(dp);
return NULL;
}
+
dp->def_nh = def_nh;
dp->nh_sz = nh_sz;
- dp->number_tbl8s = num_tbl8;
+ dp->pool = pool;
+ dp->tbl8 = pool->tbl8;
- snprintf(mem_name, sizeof(mem_name), "TBL8_idxes_%p", dp);
- dp->tbl8_pool = rte_zmalloc_socket(mem_name,
- sizeof(uint32_t) * dp->number_tbl8s,
- RTE_CACHE_LINE_SIZE, socket_id);
- if (dp->tbl8_pool == NULL) {
- rte_errno = ENOMEM;
- rte_free(dp->tbl8);
- rte_free(dp);
- return NULL;
- }
-
- tbl8_pool_init(dp);
+ fib_tbl8_write(&dp->tbl24, (def_nh << 1), nh_sz, 1 << 24);
return dp;
}
@@ -703,8 +654,7 @@ trie_free(void *p)
struct rte_trie_tbl *dp = (struct rte_trie_tbl *)p;
rte_rcu_qsbr_dq_delete(dp->dq);
- rte_free(dp->tbl8_pool);
- rte_free(dp->tbl8);
+ fib_tbl8_pool_unref(dp->pool);
rte_free(dp);
}
@@ -735,8 +685,8 @@ trie_rcu_qsbr_add(struct rte_trie_tbl *dp, struct rte_fib6_rcu_config *cfg,
if (params.max_reclaim_size == 0)
params.max_reclaim_size = RTE_FIB6_RCU_DQ_RECLAIM_MAX;
params.esize = sizeof(uint64_t);
- params.free_fn = __rcu_qsbr_free_resource;
- params.p = dp;
+ params.free_fn = fib_tbl8_pool_rcu_free_cb;
+ params.p = dp->pool;
params.v = cfg->v;
dp->dq = rte_rcu_qsbr_dq_create(&params);
if (dp->dq == NULL) {
diff --git a/lib/fib/trie.h b/lib/fib/trie.h
index 30fa886792..61df56b1bb 100644
--- a/lib/fib/trie.h
+++ b/lib/fib/trie.h
@@ -11,7 +11,7 @@
#include <rte_common.h>
#include <rte_fib6.h>
-#include "fib_tbl8.h"
+#include "fib_tbl8_pool.h"
/**
* @file
@@ -26,13 +26,11 @@
#define TRIE_EXT_ENT 1
struct rte_trie_tbl {
- uint32_t number_tbl8s; /**< Total number of tbl8s */
uint32_t rsvd_tbl8s; /**< Number of reserved tbl8s */
uint64_t def_nh; /**< Default next hop */
enum rte_fib_trie_nh_sz nh_sz; /**< Size of nexthop entry */
uint64_t *tbl8; /**< tbl8 table. */
- uint32_t *tbl8_pool; /**< bitmap containing free tbl8 idxes*/
- uint32_t tbl8_pool_pos;
+ struct rte_fib_tbl8_pool *pool; /**< tbl8 pool */
/* RCU config. */
enum rte_fib6_qsbr_mode rcu_mode; /**< Blocking, defer queue. */
struct rte_rcu_qsbr *v; /**< RCU QSBR variable. */
--
2.43.0
* [RFC 4/5] fib: add resizable tbl8 pool
@ 2026-03-31 21:41 ` Maxime Leroy
From: Maxime Leroy @ 2026-03-31 21:41 UTC (permalink / raw)
To: dev; +Cc: vladimir.medvedkin, rjarry, Maxime Leroy
Add dynamic resize support to the shared tbl8 pool. When all groups
are in use, the pool doubles its capacity via an RCU-safe pointer
swap.
The resize mechanism:
1. Allocate new tbl8 array (double the current size)
2. Copy existing data
3. Patch all registered dp->tbl8 consumer pointers via SLIST
4. rte_rcu_qsbr_synchronize() to wait for all readers
5. Free old tbl8 array
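The steps above can be sketched in plain, single-threaded C. This is a simplified model, not the patch's implementation: `grow_and_swap`, `struct consumer`, and the flat `uint64_t` array are illustrative stand-ins for the pool internals, and step 4's QSBR synchronize is only marked by a comment since it needs a live RCU domain:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative consumer record: holds the address of a FIB's tbl8 pointer. */
struct consumer {
	uint64_t **tbl8_ptr;
};

static int
grow_and_swap(uint64_t **pool_tbl8, uint32_t *num_ent, uint32_t new_num,
	      struct consumer *cons, size_t n_cons)
{
	/* 1. Allocate the larger array. */
	uint64_t *new_tbl8 = calloc(new_num, sizeof(*new_tbl8));
	if (new_tbl8 == NULL)
		return -1;
	/* 2. Copy existing data into it. */
	memcpy(new_tbl8, *pool_tbl8, *num_ent * sizeof(*new_tbl8));
	uint64_t *old = *pool_tbl8;
	*pool_tbl8 = new_tbl8;
	/* 3. Patch every registered consumer pointer. */
	for (size_t i = 0; i < n_cons; i++)
		*cons[i].tbl8_ptr = new_tbl8;
	/* 4. rte_rcu_qsbr_synchronize() would run here, before step 5. */
	/* 5. Free the old array. */
	free(old);
	*num_ent = new_num;
	return 0;
}
```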
The pool maintains a SLIST of consumer pointers (dp->tbl8) that are
registered at FIB creation and unregistered at FIB destruction.
A new fib_tbl8_pool_alloc() function replaces the per-backend
tbl8_alloc logic: it handles get + RCU reclaim retry + resize retry +
group initialization in one place.
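That fallback chain (pop from the free list, then reclaim, then resize, then pop again) can be modeled with a plain LIFO stack. The names below (`mini_pool`, `pool_get`, `pool_grow`, `pool_alloc`) are hypothetical simplifications, the RCU reclaim step is elided, and `POOL_MAX` plays the role of max_tbl8:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define POOL_MAX 8  /* stands in for max_tbl8 */

struct mini_pool {
	uint32_t free_list[POOL_MAX]; /* LIFO stack of free indices */
	uint32_t cur;                 /* stack position = allocated count */
	uint32_t num;                 /* current capacity */
};

static int32_t
pool_get(struct mini_pool *p)
{
	if (p->cur == p->num)
		return -ENOSPC;       /* no more free groups */
	return p->free_list[p->cur++];
}

static int
pool_grow(struct mini_pool *p, uint32_t new_num)
{
	if (new_num > POOL_MAX || new_num <= p->num)
		return -ENOSPC;
	for (uint32_t i = p->num; i < new_num; i++)
		p->free_list[i] = i;  /* append fresh indices at the top */
	p->num = new_num;
	return 0;
}

/* get -> (RCU reclaim elided) -> grow -> get, as in fib_tbl8_pool_alloc() */
static int32_t
pool_alloc(struct mini_pool *p)
{
	int32_t idx = pool_get(p);
	if (idx == -ENOSPC && pool_grow(p, p->num * 2) == 0)
		idx = pool_get(p);
	return idx;
}
```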
RCU is required for resize and is configured either:
- Explicitly via rte_fib_tbl8_pool_rcu_qsbr_add() for external pools
- Automatically propagated from rte_fib_rcu_qsbr_add() for internal
pools
New public API:
- rte_fib_tbl8_pool_rcu_qsbr_add()
New config field:
- rte_fib_tbl8_pool_conf.max_tbl8 (maximum capacity, 0 keeps
the pool fixed-size)
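The doubling-with-cap behaviour implied by max_tbl8 amounts to the following helper (hypothetical name; it mirrors the clamping in rte_fib_tbl8_pool_resize(), where a target not larger than the current size means the pool cannot grow further):

```c
#include <assert.h>
#include <stdint.h>

/* Next capacity when growing: double, clamped to max; max == 0 keeps
 * the pool fixed-size. A return value <= cur means "cannot grow". */
static uint32_t
next_capacity(uint32_t cur, uint32_t max)
{
	if (max == 0)
		return cur;                 /* fixed-size pool */
	uint32_t next = cur * 2;
	return next > max ? max : next;     /* clamp to configured cap */
}
```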
Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
lib/fib/dir24_8.c | 49 +++++-----
lib/fib/fib_tbl8_pool.c | 174 +++++++++++++++++++++++++++++++++++-
lib/fib/fib_tbl8_pool.h | 41 ++++++++-
lib/fib/rte_fib_tbl8_pool.h | 56 +++++++++++-
lib/fib/trie.c | 46 ++++++----
5 files changed, 323 insertions(+), 43 deletions(-)
diff --git a/lib/fib/dir24_8.c b/lib/fib/dir24_8.c
index b8e588a56a..3e8d8d7321 100644
--- a/lib/fib/dir24_8.c
+++ b/lib/fib/dir24_8.c
@@ -155,26 +155,8 @@ dir24_8_get_lookup_fn(void *p, enum rte_fib_lookup_type type, bool be_addr)
static int
tbl8_alloc(struct dir24_8_tbl *dp, uint64_t nh)
{
- int64_t tbl8_idx;
- uint8_t *tbl8_ptr;
-
- tbl8_idx = fib_tbl8_pool_get(dp->pool);
-
- /* If there are no tbl8 groups try to reclaim one. */
- if (unlikely(tbl8_idx == -ENOSPC && dp->dq &&
- !rte_rcu_qsbr_dq_reclaim(dp->dq, 1, NULL, NULL, NULL)))
- tbl8_idx = fib_tbl8_pool_get(dp->pool);
-
- if (tbl8_idx < 0)
- return tbl8_idx;
- tbl8_ptr = (uint8_t *)dp->tbl8 +
- ((tbl8_idx * FIB_TBL8_GRP_NUM_ENT) <<
- dp->nh_sz);
- /*Init tbl8 entries with nexthop from tbl24*/
- fib_tbl8_write((void *)tbl8_ptr, nh|
- DIR24_8_EXT_ENT, dp->nh_sz,
- FIB_TBL8_GRP_NUM_ENT);
- return tbl8_idx;
+ return fib_tbl8_pool_alloc(dp->pool, nh | DIR24_8_EXT_ENT,
+ dp->dq);
}
static void
@@ -436,7 +418,9 @@ dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
tmp = rte_rib_get_nxt(rib, ip, 24, NULL,
RTE_RIB_GET_NXT_COVER);
if ((tmp == NULL) &&
- (dp->rsvd_tbl8s >= dp->pool->num_tbl8s))
+ (dp->rsvd_tbl8s >= (dp->pool->max_tbl8s ?
+ dp->pool->max_tbl8s :
+ dp->pool->num_tbl8s)))
return -ENOSPC;
}
@@ -549,6 +533,13 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
dp->def_nh = def_nh;
dp->nh_sz = nh_sz;
+ if (fib_tbl8_pool_register(pool, &dp->tbl8) != 0) {
+ rte_errno = ENOMEM;
+ fib_tbl8_pool_unref(pool);
+ rte_free(dp);
+ return NULL;
+ }
+
/* Init table with default value */
fib_tbl8_write(dp->tbl24, (def_nh << 1), nh_sz, 1 << 24);
@@ -560,6 +551,7 @@ dir24_8_free(void *p)
{
struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
+ fib_tbl8_pool_unregister(dp->pool, &dp->tbl8);
rte_rcu_qsbr_dq_delete(dp->dq);
fib_tbl8_pool_unref(dp->pool);
rte_free(dp);
@@ -578,6 +570,21 @@ dir24_8_rcu_qsbr_add(struct dir24_8_tbl *dp, struct rte_fib_rcu_config *cfg,
if (dp->v != NULL)
return -EEXIST;
+ /* Propagate RCU to the pool for resize if it is resizable */
+ if (dp->pool->max_tbl8s > 0) {
+ if (dp->pool->v != NULL && dp->pool->v != cfg->v)
+ return -EINVAL;
+ if (dp->pool->v == NULL) {
+ struct rte_fib_tbl8_pool_rcu_config pool_rcu = {
+ .v = cfg->v,
+ };
+ int rc = rte_fib_tbl8_pool_rcu_qsbr_add(
+ dp->pool, &pool_rcu);
+ if (rc != 0)
+ return rc;
+ }
+ }
+
if (cfg->mode == RTE_FIB_QSBR_MODE_SYNC) {
/* No other things to do. */
} else if (cfg->mode == RTE_FIB_QSBR_MODE_DQ) {
diff --git a/lib/fib/fib_tbl8_pool.c b/lib/fib/fib_tbl8_pool.c
index 5f8ba74219..10e0c57ba7 100644
--- a/lib/fib/fib_tbl8_pool.c
+++ b/lib/fib/fib_tbl8_pool.c
@@ -2,14 +2,18 @@
* Copyright(c) 2026 Maxime Leroy, Free Mobile
*/
+#include <stdatomic.h>
#include <stdint.h>
+#include <stdlib.h>
#include <string.h>
#include <eal_export.h>
+#include <rte_branch_prediction.h>
#include <rte_debug.h>
#include <rte_errno.h>
#include <rte_malloc.h>
+#include "fib_log.h"
#include "fib_tbl8_pool.h"
static void
@@ -62,6 +66,151 @@ fib_tbl8_pool_rcu_free_cb(void *p, void *data,
fib_tbl8_pool_cleanup_and_free(pool, tbl8_idx);
}
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib_tbl8_pool_resize, 26.07)
+int
+rte_fib_tbl8_pool_resize(struct rte_fib_tbl8_pool *pool,
+ uint32_t new_num_tbl8)
+{
+ uint32_t new_num, old_num;
+ uint64_t *new_tbl8;
+ uint32_t *new_fl;
+ char mem_name[64];
+ struct fib_tbl8_consumer *c;
+
+ if (pool == NULL)
+ return -EINVAL;
+ if (pool->v == NULL)
+ return -EINVAL;
+
+ old_num = pool->num_tbl8s;
+ new_num = new_num_tbl8;
+ if (pool->max_tbl8s != 0 && new_num > pool->max_tbl8s)
+ new_num = pool->max_tbl8s;
+ if (new_num <= old_num)
+ return -ENOSPC;
+
+ FIB_LOG(INFO, "Resizing tbl8 pool from %u to %u groups",
+ old_num, new_num);
+
+ snprintf(mem_name, sizeof(mem_name), "TBL8_%u", new_num);
+ new_tbl8 = rte_zmalloc_socket(mem_name,
+ FIB_TBL8_GRP_NUM_ENT * (1ULL << pool->nh_sz) * (new_num + 1),
+ RTE_CACHE_LINE_SIZE, pool->socket_id);
+ if (new_tbl8 == NULL)
+ return -ENOMEM;
+
+ snprintf(mem_name, sizeof(mem_name), "TBL8_FL_%u", new_num);
+ new_fl = rte_zmalloc_socket(mem_name,
+ sizeof(uint32_t) * new_num,
+ RTE_CACHE_LINE_SIZE, pool->socket_id);
+ if (new_fl == NULL) {
+ rte_free(new_tbl8);
+ return -ENOMEM;
+ }
+
+ /* Copy existing tbl8 data */
+ memcpy(new_tbl8, pool->tbl8,
+ FIB_TBL8_GRP_NUM_ENT * (1ULL << pool->nh_sz) * (old_num + 1));
+
+ /*
+ * Rebuild the free list: copy the existing in-use portion,
+ * then append new indices at the top.
+ */
+ memcpy(new_fl, pool->free_list, sizeof(uint32_t) * old_num);
+ uint32_t i;
+ for (i = old_num; i < new_num; i++)
+ new_fl[i] = i;
+
+ uint64_t *old_tbl8 = pool->tbl8;
+ uint32_t *old_fl = pool->free_list;
+
+ pool->free_list = new_fl;
+ pool->num_tbl8s = new_num;
+
+ /*
+ * Ensure copied tbl8 contents are visible before publishing
+ * the new pointer on weakly ordered architectures.
+ */
+ atomic_thread_fence(memory_order_release);
+
+ pool->tbl8 = new_tbl8;
+
+ /* Update all registered consumer tbl8 pointers */
+ SLIST_FOREACH(c, &pool->consumers, next)
+ *c->tbl8_ptr = new_tbl8;
+
+ /*
+ * If RCU is configured, readers may still be accessing old_tbl8.
+ * Synchronize before freeing.
+ */
+ if (pool->v != NULL)
+ rte_rcu_qsbr_synchronize(pool->v, RTE_QSBR_THRID_INVALID);
+
+ rte_free(old_tbl8);
+ rte_free(old_fl);
+
+ return 0;
+}
+
+int
+fib_tbl8_pool_alloc(struct rte_fib_tbl8_pool *pool, uint64_t nh,
+ struct rte_rcu_qsbr_dq *dq)
+{
+ int32_t tbl8_idx;
+ uint8_t *tbl8_ptr;
+
+ tbl8_idx = fib_tbl8_pool_get(pool);
+
+ /* If there are no tbl8 groups try to reclaim one. */
+ if (unlikely(tbl8_idx == -ENOSPC && dq &&
+ !rte_rcu_qsbr_dq_reclaim(dq, 1, NULL, NULL, NULL)))
+ tbl8_idx = fib_tbl8_pool_get(pool);
+
+ /* Still full -- try to grow the pool */
+ if (unlikely(tbl8_idx == -ENOSPC &&
+ rte_fib_tbl8_pool_resize(pool, pool->num_tbl8s * 2) == 0))
+ tbl8_idx = fib_tbl8_pool_get(pool);
+
+ if (tbl8_idx < 0)
+ return tbl8_idx;
+
+ tbl8_ptr = (uint8_t *)pool->tbl8 +
+ ((tbl8_idx * FIB_TBL8_GRP_NUM_ENT) << pool->nh_sz);
+ /* Init tbl8 entries with nexthop */
+ fib_tbl8_write((void *)tbl8_ptr, nh, pool->nh_sz,
+ FIB_TBL8_GRP_NUM_ENT);
+ return tbl8_idx;
+}
+
+int
+fib_tbl8_pool_register(struct rte_fib_tbl8_pool *pool, uint64_t **tbl8_ptr)
+{
+ struct fib_tbl8_consumer *c;
+
+ c = calloc(1, sizeof(*c));
+ if (c == NULL)
+ return -ENOMEM;
+
+ c->tbl8_ptr = tbl8_ptr;
+ SLIST_INSERT_HEAD(&pool->consumers, c, next);
+ return 0;
+}
+
+void
+fib_tbl8_pool_unregister(struct rte_fib_tbl8_pool *pool, uint64_t **tbl8_ptr)
+{
+ struct fib_tbl8_consumer *c;
+
+ SLIST_FOREACH(c, &pool->consumers, next) {
+ if (c->tbl8_ptr == tbl8_ptr) {
+ SLIST_REMOVE(&pool->consumers, c,
+ fib_tbl8_consumer, next);
+ free(c);
+ return;
+ }
+ }
+}
+
void
fib_tbl8_pool_ref(struct rte_fib_tbl8_pool *pool)
{
@@ -71,6 +220,7 @@ fib_tbl8_pool_ref(struct rte_fib_tbl8_pool *pool)
static void
pool_free(struct rte_fib_tbl8_pool *pool)
{
+ RTE_ASSERT(SLIST_EMPTY(&pool->consumers));
rte_free(pool->free_list);
rte_free(pool->tbl8);
rte_free(pool);
@@ -92,7 +242,9 @@ rte_fib_tbl8_pool_create(const char *name,
char mem_name[64];
if (name == NULL || conf == NULL || conf->num_tbl8 == 0 ||
- conf->nh_sz > 3) {
+ conf->nh_sz > 3 ||
+ (conf->max_tbl8 != 0 &&
+ conf->max_tbl8 < conf->num_tbl8)) {
rte_errno = EINVAL;
return NULL;
}
@@ -107,8 +259,10 @@ rte_fib_tbl8_pool_create(const char *name,
pool->nh_sz = conf->nh_sz;
pool->num_tbl8s = conf->num_tbl8;
+ pool->max_tbl8s = conf->max_tbl8;
pool->socket_id = conf->socket_id;
pool->refcnt = 1;
+ SLIST_INIT(&pool->consumers);
snprintf(mem_name, sizeof(mem_name), "TBL8_%s", name);
pool->tbl8 = rte_zmalloc_socket(mem_name,
@@ -146,3 +300,21 @@ rte_fib_tbl8_pool_free(struct rte_fib_tbl8_pool *pool)
fib_tbl8_pool_unref(pool);
}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib_tbl8_pool_rcu_qsbr_add, 26.07)
+int
+rte_fib_tbl8_pool_rcu_qsbr_add(struct rte_fib_tbl8_pool *pool,
+ const struct rte_fib_tbl8_pool_rcu_config *cfg)
+{
+ if (pool == NULL || cfg == NULL || cfg->v == NULL)
+ return -EINVAL;
+
+ if (pool->v != NULL)
+ return -EEXIST;
+
+ if (pool->max_tbl8s == 0)
+ return -ENOTSUP;
+
+ pool->v = cfg->v;
+ return 0;
+}
diff --git a/lib/fib/fib_tbl8_pool.h b/lib/fib/fib_tbl8_pool.h
index 285f06d87f..edd0aedf0f 100644
--- a/lib/fib/fib_tbl8_pool.h
+++ b/lib/fib/fib_tbl8_pool.h
@@ -17,19 +17,30 @@
#include <stdint.h>
#include <string.h>
+#include <sys/queue.h>
+
#include <rte_common.h>
#include "fib_tbl8.h"
#include "rte_fib_tbl8_pool.h"
+/** Consumer entry -- tracks each FIB's tbl8 pointer for resize updates. */
+struct fib_tbl8_consumer {
+ SLIST_ENTRY(fib_tbl8_consumer) next;
+ uint64_t **tbl8_ptr; /**< Points to the FIB's dp->tbl8 field */
+};
+
struct rte_fib_tbl8_pool {
uint64_t *tbl8; /**< tbl8 group array */
uint32_t *free_list; /**< Stack of free group indices */
uint32_t cur_tbl8s; /**< Number of allocated groups */
- uint32_t num_tbl8s; /**< Total number of tbl8 groups */
+ uint32_t num_tbl8s; /**< Current capacity */
+ uint32_t max_tbl8s; /**< Maximum capacity (0 = fixed) */
uint8_t nh_sz; /**< Nexthop entry size (0-3) */
int socket_id;
uint32_t refcnt; /**< Reference count */
+ struct rte_rcu_qsbr *v; /**< RCU QSBR variable (for resize) */
+ SLIST_HEAD(, fib_tbl8_consumer) consumers; /**< Registered FIBs */
};
/**
@@ -71,4 +82,32 @@ fib_tbl8_pool_ref(struct rte_fib_tbl8_pool *pool);
void
fib_tbl8_pool_unref(struct rte_fib_tbl8_pool *pool);
+/**
+ * Allocate a tbl8 group, resizing the pool if needed.
+ *
+ * Tries fib_tbl8_pool_get() first; on ENOSPC, tries RCU reclaim via @p dq,
+ * then attempts fib_tbl8_pool_resize(). Initialises the group with @p nh.
+ *
+ * @return group index on success, negative errno on failure.
+ */
+int
+fib_tbl8_pool_alloc(struct rte_fib_tbl8_pool *pool, uint64_t nh,
+ struct rte_rcu_qsbr_dq *dq);
+
+/**
+ * Register a FIB consumer so its tbl8 pointer is updated on resize.
+ *
+ * @param pool Pool handle.
+ * @param tbl8_ptr Address of the consumer's tbl8 pointer (e.g. &dp->tbl8).
+ * @return 0 on success, negative errno on failure.
+ */
+int
+fib_tbl8_pool_register(struct rte_fib_tbl8_pool *pool, uint64_t **tbl8_ptr);
+
+/**
+ * Unregister a FIB consumer.
+ */
+void
+fib_tbl8_pool_unregister(struct rte_fib_tbl8_pool *pool, uint64_t **tbl8_ptr);
+
#endif /* _FIB_TBL8_POOL_H_ */
diff --git a/lib/fib/rte_fib_tbl8_pool.h b/lib/fib/rte_fib_tbl8_pool.h
index e362efe74b..d37ddedff3 100644
--- a/lib/fib/rte_fib_tbl8_pool.h
+++ b/lib/fib/rte_fib_tbl8_pool.h
@@ -21,6 +21,12 @@
* rte_fib_tbl8_pool_free(). The pool is freed when the last
* reference is dropped.
*
+ * Resizing: if max_tbl8 is set in the pool configuration, the pool
+ * can grow on demand up to that limit. This requires an RCU QSBR
+ * variable (rte_fib_tbl8_pool_rcu_qsbr_add). When max_tbl8 is 0
+ * (default), the pool has a fixed capacity and no RCU is needed
+ * for pool operation.
+ *
* Thread safety: none. The pool is not thread-safe. All operations
* on FIBs sharing the same pool (route updates, FIB creation and
* destruction, pool create/free) must be serialized by the caller.
@@ -28,6 +34,8 @@
#include <stdint.h>
+#include <rte_rcu_qsbr.h>
+
#ifdef __cplusplus
extern "C" {
#endif
@@ -36,11 +44,17 @@ struct rte_fib_tbl8_pool;
/** tbl8 pool configuration */
struct rte_fib_tbl8_pool_conf {
- uint32_t num_tbl8; /**< Number of tbl8 groups */
+ uint32_t num_tbl8; /**< Initial number of tbl8 groups */
+ uint32_t max_tbl8; /**< Max tbl8 groups (0 = fixed, no resize) */
uint8_t nh_sz; /**< Nexthop size: 0=1B, 1=2B, 2=4B, 3=8B */
int socket_id; /**< NUMA socket for memory allocation */
};
+/** RCU QSBR configuration for tbl8 pool resize. */
+struct rte_fib_tbl8_pool_rcu_config {
+ struct rte_rcu_qsbr *v; /**< RCU QSBR variable */
+};
+
/**
* Create a tbl8 pool.
*
@@ -69,6 +83,46 @@ __rte_experimental
void
rte_fib_tbl8_pool_free(struct rte_fib_tbl8_pool *pool);
+/**
+ * Associate an RCU QSBR variable with the pool.
+ *
+ * Required for resizable pools so that the old tbl8 array can be
+ * reclaimed safely after a resize.
+ *
+ * @param pool
+ * Pool handle
+ * @param cfg
+ * RCU configuration
+ * @return
+ * 0 on success, negative errno on failure
+ */
+__rte_experimental
+int
+rte_fib_tbl8_pool_rcu_qsbr_add(struct rte_fib_tbl8_pool *pool,
+ const struct rte_fib_tbl8_pool_rcu_config *cfg);
+
+/**
+ * Resize the tbl8 pool to a given capacity.
+ *
+ * The new capacity must be greater than the current capacity and
+ * must not exceed max_tbl8 (if set). Requires RCU to be configured.
+ *
+ * @param pool
+ * Pool handle
+ * @param new_num_tbl8
+ * Target number of tbl8 groups
+ * @return
+ * 0 on success
+ * -EINVAL if RCU is not configured (see rte_fib_tbl8_pool_rcu_qsbr_add)
+ * -ENOSPC if pool cannot grow (at max capacity or
+ * new_num_tbl8 <= current capacity)
+ * -ENOMEM if memory allocation failed
+ */
+__rte_experimental
+int
+rte_fib_tbl8_pool_resize(struct rte_fib_tbl8_pool *pool,
+ uint32_t new_num_tbl8);
+
#ifdef __cplusplus
}
#endif
diff --git a/lib/fib/trie.c b/lib/fib/trie.c
index 798d322b1e..7b9c11f81f 100644
--- a/lib/fib/trie.c
+++ b/lib/fib/trie.c
@@ -102,24 +102,7 @@ trie_get_lookup_fn(void *p, enum rte_fib6_lookup_type type)
static int
tbl8_alloc(struct rte_trie_tbl *dp, uint64_t nh)
{
- int64_t tbl8_idx;
- uint8_t *tbl8_ptr;
-
- tbl8_idx = fib_tbl8_pool_get(dp->pool);
-
- /* If there are no tbl8 groups try to reclaim one. */
- if (unlikely(tbl8_idx == -ENOSPC && dp->dq &&
- !rte_rcu_qsbr_dq_reclaim(dp->dq, 1, NULL, NULL, NULL)))
- tbl8_idx = fib_tbl8_pool_get(dp->pool);
-
- if (tbl8_idx < 0)
- return tbl8_idx;
- tbl8_ptr = get_tbl_p_by_idx(dp->tbl8,
- tbl8_idx * FIB_TBL8_GRP_NUM_ENT, dp->nh_sz);
- /*Init tbl8 entries with nexthop from tbl24*/
- fib_tbl8_write((void *)tbl8_ptr, nh, dp->nh_sz,
- FIB_TBL8_GRP_NUM_ENT);
- return tbl8_idx;
+ return fib_tbl8_pool_alloc(dp->pool, nh, dp->dq);
}
static void
@@ -531,7 +514,9 @@ trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
return 0;
}
- if ((depth > 24) && (dp->rsvd_tbl8s + depth_diff > dp->pool->num_tbl8s))
+ if ((depth > 24) && (dp->rsvd_tbl8s + depth_diff >
+ (dp->pool->max_tbl8s ? dp->pool->max_tbl8s :
+ dp->pool->num_tbl8s)))
return -ENOSPC;
node = rte_rib6_insert(rib, &ip_masked, depth);
@@ -643,6 +628,13 @@ trie_create(const char *name, int socket_id,
dp->pool = pool;
dp->tbl8 = pool->tbl8;
+ if (fib_tbl8_pool_register(pool, &dp->tbl8) != 0) {
+ rte_errno = ENOMEM;
+ fib_tbl8_pool_unref(pool);
+ rte_free(dp);
+ return NULL;
+ }
+
fib_tbl8_write(&dp->tbl24, (def_nh << 1), nh_sz, 1 << 24);
return dp;
@@ -653,6 +645,7 @@ trie_free(void *p)
{
struct rte_trie_tbl *dp = (struct rte_trie_tbl *)p;
+ fib_tbl8_pool_unregister(dp->pool, &dp->tbl8);
rte_rcu_qsbr_dq_delete(dp->dq);
fib_tbl8_pool_unref(dp->pool);
rte_free(dp);
@@ -671,6 +664,21 @@ trie_rcu_qsbr_add(struct rte_trie_tbl *dp, struct rte_fib6_rcu_config *cfg,
if (dp->v != NULL)
return -EEXIST;
+ /* Propagate RCU to the pool for resize if it is resizable */
+ if (dp->pool->max_tbl8s > 0) {
+ if (dp->pool->v != NULL && dp->pool->v != cfg->v)
+ return -EINVAL;
+ if (dp->pool->v == NULL) {
+ struct rte_fib_tbl8_pool_rcu_config pool_rcu = {
+ .v = cfg->v,
+ };
+ int rc = rte_fib_tbl8_pool_rcu_qsbr_add(
+ dp->pool, &pool_rcu);
+ if (rc != 0)
+ return rc;
+ }
+ }
+
switch (cfg->mode) {
case RTE_FIB6_QSBR_MODE_DQ:
/* Init QSBR defer queue. */
--
2.43.0
* [RFC 5/5] fib: add tbl8 pool stats API
2026-03-31 21:41 [RFC 0/5] fib: shared and resizable tbl8 pool Maxime Leroy
` (3 preceding siblings ...)
2026-03-31 21:41 ` [RFC 4/5] fib: add resizable " Maxime Leroy
@ 2026-03-31 21:41 ` Maxime Leroy
2026-03-31 22:17 ` [RFC 0/5] fib: shared and resizable tbl8 pool Robin Jarry
2026-03-31 22:30 ` Stephen Hemminger
6 siblings, 0 replies; 9+ messages in thread
From: Maxime Leroy @ 2026-03-31 21:41 UTC (permalink / raw)
To: dev; +Cc: vladimir.medvedkin, rjarry, Maxime Leroy
Add rte_fib_tbl8_pool_get_stats() to retrieve the number of used and
total tbl8 groups from a pool handle directly, without going through
a FIB instance.
Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
lib/fib/fib_tbl8_pool.c | 17 +++++++++++++++++
lib/fib/rte_fib_tbl8_pool.h | 19 +++++++++++++++++++
2 files changed, 36 insertions(+)
diff --git a/lib/fib/fib_tbl8_pool.c b/lib/fib/fib_tbl8_pool.c
index 10e0c57ba7..d47fccd987 100644
--- a/lib/fib/fib_tbl8_pool.c
+++ b/lib/fib/fib_tbl8_pool.c
@@ -318,3 +318,20 @@ rte_fib_tbl8_pool_rcu_qsbr_add(struct rte_fib_tbl8_pool *pool,
pool->v = cfg->v;
return 0;
}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib_tbl8_pool_get_stats, 26.07)
+int
+rte_fib_tbl8_pool_get_stats(struct rte_fib_tbl8_pool *pool,
+ uint32_t *used, uint32_t *total, uint32_t *max)
+{
+ if (pool == NULL)
+ return -EINVAL;
+
+ if (used != NULL)
+ *used = pool->cur_tbl8s;
+ if (total != NULL)
+ *total = pool->num_tbl8s;
+ if (max != NULL)
+ *max = pool->max_tbl8s;
+ return 0;
+}
diff --git a/lib/fib/rte_fib_tbl8_pool.h b/lib/fib/rte_fib_tbl8_pool.h
index d37ddedff3..49a2589a5b 100644
--- a/lib/fib/rte_fib_tbl8_pool.h
+++ b/lib/fib/rte_fib_tbl8_pool.h
@@ -123,6 +123,25 @@ int
rte_fib_tbl8_pool_resize(struct rte_fib_tbl8_pool *pool,
uint32_t new_num_tbl8);
+/**
+ * Retrieve tbl8 pool statistics.
+ *
+ * @param pool
+ * Pool handle
+ * @param used
+ * Number of tbl8 groups currently in use (can be NULL)
+ * @param total
+ * Total number of tbl8 groups (current capacity, can be NULL)
+ * @param max
+ * Maximum number of tbl8 groups (0 = fixed, can be NULL)
+ * @return
+ * 0 on success, -EINVAL if pool is NULL
+ */
+__rte_experimental
+int
+rte_fib_tbl8_pool_get_stats(struct rte_fib_tbl8_pool *pool,
+ uint32_t *used, uint32_t *total, uint32_t *max);
+
#ifdef __cplusplus
}
#endif
--
2.43.0
* Re: [RFC 0/5] fib: shared and resizable tbl8 pool
2026-03-31 21:41 [RFC 0/5] fib: shared and resizable tbl8 pool Maxime Leroy
` (4 preceding siblings ...)
2026-03-31 21:41 ` [RFC 5/5] fib: add tbl8 pool stats API Maxime Leroy
@ 2026-03-31 22:17 ` Robin Jarry
2026-04-01 9:15 ` Maxime Leroy
2026-03-31 22:30 ` Stephen Hemminger
6 siblings, 1 reply; 9+ messages in thread
From: Robin Jarry @ 2026-03-31 22:17 UTC (permalink / raw)
To: Maxime Leroy, dev; +Cc: vladimir.medvedkin
Hi Maxime,
Maxime Leroy, Mar 31, 2026 at 23:41:
> This RFC proposes an optional shared tbl8 pool for FIB/FIB6,
> to address the difficulty of sizing num_tbl8 upfront.
>
> In practice, tbl8 usage depends on prefix distribution and
> evolves over time. In multi-VRF environments, some VRFs are
> elephants (full table, thousands of tbl8 groups) while others
> consume very little (mostly /24 or shorter). Per-FIB sizing
> forces each instance to provision for its worst case, leading
> to significant memory waste.
>
> A shared pool solves this: all FIBs draw from the same tbl8
> memory, so elephant VRFs use what they need while light VRFs
> cost almost nothing. The sharing granularity is flexible: one pool per
> VRF, per address family, a global pool, or no sharing at all.
>
> This series adds:
>
> - A shared tbl8 pool, replacing per-backend allocation
> (bitmap in dir24_8, stack in trie) with a common
> refcounted O(1) stack allocator.
> - An optional resizable mode (grow via alloc + copy + QSBR
> synchronize), removing the need to guess peak usage at
> creation time.
> - A stats API (rte_fib_tbl8_pool_get_stats()) exposing
> used/total/max counters.
>
> All features are opt-in:
>
> - Existing per-FIB allocation remains the default.
> - Shared pool is enabled via the tbl8_pool config field.
> - Resize is enabled by setting max_tbl8 > 0 with QSBR.
The shared pool is nice, but dynamic resize is awesome.
I have gone over the implementation and it seems sane to me. The only
concern I might have is the change of tbl8 pool allocator for DIR24_8
from a O(n/64) slab to O(1) stack. I don't know if it can have
a performance impact on lookup or if it only affects the control plane
operations (add/del).
> Shrinking (reducing pool capacity after usage drops) is not
> part of this series. It would always be best-effort since
> there is no compaction: if any tbl8 group near the end of the
> pool is still in use, the pool cannot shrink. The current LIFO
> free-list makes this less likely by immediately reusing freed
> high indices, which prevents a contiguous free tail from
> forming. A different allocation strategy (e.g. a min-heap
> favoring low indices) could improve shrink opportunities, but
> is better addressed separately.
Shrinking would be nice to have but not critical in my opinion. I would
prefer if we could add a dynamic resize feature (and possibly RIB node
mempool sharing) for rte_rib* as well so that FIB objects can really be
scaled up on demand. For now, if you run out of space in the RIB, you
will get an ENOSPC error even if the FIB tbl8 pool still has room.
Nice work, thanks!
> A working integration in Grout is available:
> https://github.com/DPDK/grout/pull/581 (still a draft)
>
> Maxime Leroy (5):
> test/fib6: zero-initialize config struct
> fib: share tbl8 definitions between fib and fib6
> fib: add shared tbl8 pool
> fib: add resizable tbl8 pool
> fib: add tbl8 pool stats API
>
> app/test/test_fib6.c | 10 +-
> lib/fib/dir24_8.c | 234 ++++++++++---------------
> lib/fib/dir24_8.h | 17 +-
> lib/fib/fib_tbl8.h | 50 ++++++
> lib/fib/fib_tbl8_pool.c | 337 ++++++++++++++++++++++++++++++++++++
> lib/fib/fib_tbl8_pool.h | 113 ++++++++++++
> lib/fib/meson.build | 5 +-
> lib/fib/rte_fib.h | 3 +
> lib/fib/rte_fib6.h | 3 +
> lib/fib/rte_fib_tbl8_pool.h | 149 ++++++++++++++++
> lib/fib/trie.c | 230 +++++++++---------------
> lib/fib/trie.h | 15 +-
> 12 files changed, 844 insertions(+), 322 deletions(-)
> create mode 100644 lib/fib/fib_tbl8.h
> create mode 100644 lib/fib/fib_tbl8_pool.c
> create mode 100644 lib/fib/fib_tbl8_pool.h
> create mode 100644 lib/fib/rte_fib_tbl8_pool.h
--
Robin
For recreational use only.
* Re: [RFC 0/5] fib: shared and resizable tbl8 pool
2026-03-31 21:41 [RFC 0/5] fib: shared and resizable tbl8 pool Maxime Leroy
` (5 preceding siblings ...)
2026-03-31 22:17 ` [RFC 0/5] fib: shared and resizable tbl8 pool Robin Jarry
@ 2026-03-31 22:30 ` Stephen Hemminger
6 siblings, 0 replies; 9+ messages in thread
From: Stephen Hemminger @ 2026-03-31 22:30 UTC (permalink / raw)
To: Maxime Leroy; +Cc: dev, vladimir.medvedkin, rjarry
On Tue, 31 Mar 2026 23:41:12 +0200
Maxime Leroy <maxime@leroys.fr> wrote:
> This RFC proposes an optional shared tbl8 pool for FIB/FIB6,
> to address the difficulty of sizing num_tbl8 upfront.
>
> In practice, tbl8 usage depends on prefix distribution and
> evolves over time. In multi-VRF environments, some VRFs are
> elephants (full table, thousands of tbl8 groups) while others
> consume very little (mostly /24 or shorter). Per-FIB sizing
> forces each instance to provision for its worst case, leading
> to significant memory waste.
>
> A shared pool solves this: all FIBs draw from the same tbl8
> memory, so elephant VRFs use what they need while light VRFs
> cost almost nothing. The sharing granularity is flexible: one pool per
> VRF, per address family, a global pool, or no sharing at all.
>
> This series adds:
>
> - A shared tbl8 pool, replacing per-backend allocation
> (bitmap in dir24_8, stack in trie) with a common
> refcounted O(1) stack allocator.
> - An optional resizable mode (grow via alloc + copy + QSBR
> synchronize), removing the need to guess peak usage at
> creation time.
> - A stats API (rte_fib_tbl8_pool_get_stats()) exposing
> used/total/max counters.
>
> All features are opt-in:
>
> - Existing per-FIB allocation remains the default.
> - Shared pool is enabled via the tbl8_pool config field.
> - Resize is enabled by setting max_tbl8 > 0 with QSBR.
>
> Shrinking (reducing pool capacity after usage drops) is not
> part of this series. It would always be best-effort since
> there is no compaction: if any tbl8 group near the end of the
> pool is still in use, the pool cannot shrink. The current LIFO
> free-list makes this less likely by immediately reusing freed
> high indices, which prevents a contiguous free tail from
> forming. A different allocation strategy (e.g. a min-heap
> favoring low indices) could improve shrink opportunities, but
> is better addressed separately.
>
> A working integration in Grout is available:
> https://github.com/DPDK/grout/pull/581 (still a draft)
>
> Maxime Leroy (5):
> test/fib6: zero-initialize config struct
> fib: share tbl8 definitions between fib and fib6
> fib: add shared tbl8 pool
> fib: add resizable tbl8 pool
> fib: add tbl8 pool stats API
>
> app/test/test_fib6.c | 10 +-
> lib/fib/dir24_8.c | 234 ++++++++++---------------
> lib/fib/dir24_8.h | 17 +-
> lib/fib/fib_tbl8.h | 50 ++++++
> lib/fib/fib_tbl8_pool.c | 337 ++++++++++++++++++++++++++++++++++++
> lib/fib/fib_tbl8_pool.h | 113 ++++++++++++
> lib/fib/meson.build | 5 +-
> lib/fib/rte_fib.h | 3 +
> lib/fib/rte_fib6.h | 3 +
> lib/fib/rte_fib_tbl8_pool.h | 149 ++++++++++++++++
> lib/fib/trie.c | 230 +++++++++---------------
> lib/fib/trie.h | 15 +-
> 12 files changed, 844 insertions(+), 322 deletions(-)
> create mode 100644 lib/fib/fib_tbl8.h
> create mode 100644 lib/fib/fib_tbl8_pool.c
> create mode 100644 lib/fib/fib_tbl8_pool.h
> create mode 100644 lib/fib/rte_fib_tbl8_pool.h
>
Brief AI review
Review of [RFC 0/5] fib: shared and resizable tbl8 pool
Good series overall. The motivation for shared tbl8 pools in multi-VRF
environments is clear and the cover letter is well-written. A few
issues below, mostly in the resize path.
Patch 4/5: fib: add resizable tbl8 pool
--------------------------------------------
Error: Uses C11 <stdatomic.h> directly instead of DPDK atomic wrappers.
New DPDK code must use rte_atomic_thread_fence() with rte_memory_order_*
constants, not C11 atomic_thread_fence() with memory_order_*.
In fib_tbl8_pool.c:
#include <stdatomic.h>
...
atomic_thread_fence(memory_order_release);
Should be:
#include <rte_stdatomic.h>
...
rte_atomic_thread_fence(rte_memory_order_release);
Warning: The plain store to consumer tbl8 pointers during resize
(*c->tbl8_ptr = new_tbl8) and the data-plane readers' plain load of
dp->tbl8 in the lookup functions have no acquire/release annotation.
This works today because the RCU synchronize prevents use-after-free
of the old array, and both old and new arrays contain identical data
during the transition. However, the release fence before
pool->tbl8 = new_tbl8 does not cover the subsequent consumer pointer
stores. Consider using rte_atomic_store_explicit() with release
ordering on the consumer pointer stores, or at minimum adding a
comment explaining why plain stores are safe here.
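For illustration, a self-contained sketch of the publish pattern I have in mind (names are illustrative, and plain C11 atomics stand in for the rte_atomic_*_explicit() wrappers the DPDK code would use):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical consumer pointer; in the series this is the FIB's dp->tbl8. */
static _Atomic(uint64_t *) tbl8_ptr;

static int
grow(uint32_t old_n, uint32_t new_n)
{
	uint64_t *old = atomic_load_explicit(&tbl8_ptr, memory_order_relaxed);
	uint64_t *new_tbl = calloc(new_n, sizeof(*new_tbl));

	if (new_tbl == NULL)
		return -1;
	/* Both arrays hold identical data during the transition window. */
	memcpy(new_tbl, old, old_n * sizeof(*old));
	/* Release store: the copied contents become visible to a reader that
	 * acquire-loads the pointer before the pointer itself does. */
	atomic_store_explicit(&tbl8_ptr, new_tbl, memory_order_release);
	/* Real code must call rte_rcu_qsbr_synchronize() here before
	 * freeing the old array. */
	free(old);
	return 0;
}
```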
Warning: rte_fib_tbl8_pool_resize() is declared in the public header
and exported, but it is also called automatically from
fib_tbl8_pool_alloc() as an internal fallback. Having an auto-resize
path that calls rte_rcu_qsbr_synchronize() means a route add can
block for an unbounded time waiting for all reader threads to go
quiescent. This should be documented prominently, or the resize
should be separated from the alloc path so the caller can control
when blocking is acceptable.
Patch 3/5: fib: add shared tbl8 pool
--------------------------------------------
Warning: The rte_fib_tbl8_pool struct and the free_list array are
allocated with rte_zmalloc_socket but are only used on the control
path. Standard calloc/malloc would avoid consuming limited hugepage
memory. The tbl8 data array is correctly allocated with
rte_zmalloc_socket since it is accessed in the data plane.
Warning: install_to_fib() in dir24_8.c has an error path that
calls fib_tbl8_pool_cleanup_and_free() to return tbl8_idx when
tmp_tbl8_idx allocation fails:
} else if (tmp_tbl8_idx < 0) {
fib_tbl8_pool_cleanup_and_free(dp->pool, tbl8_idx);
return -ENOSPC;
}
This is correct (cleans the initialized tbl8 group before returning
it), but note this is a behavior change from the previous patch in
the series where tbl8_put() was used without cleanup. The change is
an improvement but should be mentioned in the commit message since
it affects error-path semantics.
Patches 3/5, 4/5, 5/5: New public API
--------------------------------------------
Warning: Five new public API functions are added across these patches
(rte_fib_tbl8_pool_create, rte_fib_tbl8_pool_free,
rte_fib_tbl8_pool_rcu_qsbr_add, rte_fib_tbl8_pool_resize,
rte_fib_tbl8_pool_get_stats) but no tests are added. New APIs need
test coverage, at minimum exercising:
- create/free lifecycle
- shared pool between two FIB instances
- resize with RCU configured
- stats accuracy after alloc/free cycles
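For the stats case, the check could look roughly like this (a mock structure stands in for the real pool; the actual test would go through rte_fib_tbl8_pool_create() and alloc/free cycles):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of the pool counters exposed by rte_fib_tbl8_pool_get_stats(). */
struct mock_pool {
	uint32_t cur_tbl8s;	/* groups currently in use */
	uint32_t num_tbl8s;	/* current capacity */
	uint32_t max_tbl8s;	/* max capacity, 0 = fixed */
};

static int
mock_get_stats(struct mock_pool *pool,
		uint32_t *used, uint32_t *total, uint32_t *max)
{
	if (pool == NULL)
		return -EINVAL;
	if (used != NULL)
		*used = pool->cur_tbl8s;
	if (total != NULL)
		*total = pool->num_tbl8s;
	if (max != NULL)
		*max = pool->max_tbl8s;
	return 0;
}
```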
Warning: No release notes for the new APIs and features. These will
be needed before the series moves past RFC.
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
* Re: [RFC 0/5] fib: shared and resizable tbl8 pool
2026-03-31 22:17 ` [RFC 0/5] fib: shared and resizable tbl8 pool Robin Jarry
@ 2026-04-01 9:15 ` Maxime Leroy
0 siblings, 0 replies; 9+ messages in thread
From: Maxime Leroy @ 2026-04-01 9:15 UTC (permalink / raw)
To: Robin Jarry; +Cc: dev, Vladimir Medvedkin
Hi Robin,
Thanks a lot for the review.
Le mer. 1 avr. 2026, 00:17, Robin Jarry <rjarry@redhat.com> a écrit :
> Hi Maxime,
>
> Maxime Leroy, Mar 31, 2026 at 23:41:
> > This RFC proposes an optional shared tbl8 pool for FIB/FIB6,
> > to address the difficulty of sizing num_tbl8 upfront.
> >
> > In practice, tbl8 usage depends on prefix distribution and
> > evolves over time. In multi-VRF environments, some VRFs are
> > elephants (full table, thousands of tbl8 groups) while others
> > consume very little (mostly /24 or shorter). Per-FIB sizing
> > forces each instance to provision for its worst case, leading
> > to significant memory waste.
> >
> > A shared pool solves this: all FIBs draw from the same tbl8
> > memory, so elephant VRFs use what they need while light VRFs
> > cost almost nothing. The sharing granularity is flexible: one pool per
> > VRF, per address family, a global pool, or no sharing at all.
> >
> > This series adds:
> >
> > - A shared tbl8 pool, replacing per-backend allocation
> > (bitmap in dir24_8, stack in trie) with a common
> > refcounted O(1) stack allocator.
> > - An optional resizable mode (grow via alloc + copy + QSBR
> > synchronize), removing the need to guess peak usage at
> > creation time.
> > - A stats API (rte_fib_tbl8_pool_get_stats()) exposing
> > used/total/max counters.
> >
> > All features are opt-in:
> >
> > - Existing per-FIB allocation remains the default.
> > - Shared pool is enabled via the tbl8_pool config field.
> > - Resize is enabled by setting max_tbl8 > 0 with QSBR.
>
> The shared pool is nice, but dynamic resize is awesome.
>
> I have gone over the implementation and it seems sane to me. The only
> concern I might have is the change of tbl8 pool allocator for DIR24_8
> from a O(n/64) slab to O(1) stack. I don't know if it can have
> a performance impact on lookup or if it only affects the control plane
> operations (add/del).
>
This only affects control-plane operations (tbl8 alloc/free on
add/del). Lookup only reads the final tbl24/tbl8 arrays and does not
interact with the allocator itself.
So the motivation for the stack allocator here is to make shared-pool
management simple and O(1) on the update path, not to change lookup
behavior.
If shrinking ever becomes interesting later, then I agree the allocator
choice may need to be revisited. A LIFO stack immediately reuses the most
recently freed entries, so high indices tend to get reused first and it
becomes difficult to form a contiguous free tail. A low-first bitmap/slab
allocator, or a min-heap, would be better for shrinking because they prefer
lower free indices and therefore leave high indices unused longer. The
trade-off is that put/get become more expensive (O(n/64) for bitmap scan or
O(log n) for a heap, instead of O(1) for the stack).
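For reference, the O(1) LIFO behavior described above can be sketched in a few lines (illustrative only, not the series' actual code; note how the test shows the just-freed high index being handed out again immediately):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal stack allocator over tbl8 group indices. */
struct pool {
	uint32_t *free_list;	/* stack of free group indices */
	uint32_t top;		/* entries currently on the stack */
	uint32_t capacity;
};

static int
pool_init(struct pool *p, uint32_t n)
{
	p->free_list = malloc(n * sizeof(*p->free_list));
	if (p->free_list == NULL)
		return -1;
	/* Push in reverse so index 0 is allocated first. */
	for (uint32_t i = 0; i < n; i++)
		p->free_list[i] = n - 1 - i;
	p->top = n;
	p->capacity = n;
	return 0;
}

static int64_t
pool_get(struct pool *p)	/* O(1) */
{
	if (p->top == 0)
		return -1;	/* ENOSPC in the real API */
	return p->free_list[--p->top];
}

static void
pool_put(struct pool *p, uint32_t idx)	/* O(1) */
{
	p->free_list[p->top++] = idx;
}
```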
> > Shrinking (reducing pool capacity after usage drops) is not
> > part of this series. It would always be best-effort since
> > there is no compaction: if any tbl8 group near the end of the
> > pool is still in use, the pool cannot shrink. The current LIFO
> > free-list makes this less likely by immediately reusing freed
> > high indices, which prevents a contiguous free tail from
> > forming. A different allocation strategy (e.g. a min-heap
> > favoring low indices) could improve shrink opportunities, but
> > is better addressed separately.
>
> Shrinking would be nice to have but not critical in my opinion. I would
> prefer if we could add a dynamic resize feature (and possibly RIB node
> mempool sharing) for rte_rib* as well so that FIB objects can really be
> scaled up on demand. For now, if you run out of space in the RIB, you
> will get an ENOSPC error even if the FIB tbl8 pool still has room.
>
Agreed. Today tbl8 is only one side of the sizing problem.
For rte_rib*, I think we should probably move in a similar direction as
well: avoid per-VRF/per-instance worst-case provisioning, while keeping
separate global limits for IPv4 and IPv6, e.g. max_ipv4_routes and
max_ipv6_routes.
The difference with tbl8 is that the trade-off is not the same. tbl8 usage
is both expensive (2 KB per group) and hard to predict from route count
alone, since it depends on prefix distribution and table shape. RIB nodes
are much smaller and their usage is more predictable, so a shared global
node pool per AF already looks like a sensible first step.
That would remove per-VRF over-provisioning while keeping global limits. If
later true on-demand growth is needed, I think the mechanism would likely
have to be different from tbl8 resizing anyway: tbl8 is an indexed array
and can grow via alloc + copy + pointer swap, while RIB nodes are linked by
pointers, so they cannot be relocated transparently. In that case, a
chunked allocator would probably make more sense.
Also, I do not think hugepage-backed allocation (i.e. rte_mempool) is
really needed for rte_rib*.
Maxime