* [RFC PATCH 0/4] VRF support in FIB library
@ 2026-03-22 15:42 Vladimir Medvedkin
2026-03-22 15:42 ` [RFC PATCH 1/4] fib: add multi-VRF support Vladimir Medvedkin
` (7 more replies)
0 siblings, 8 replies; 33+ messages in thread
From: Vladimir Medvedkin @ 2026-03-22 15:42 UTC (permalink / raw)
To: dev; +Cc: rjarry, nsaxena16, mb, adwivedi, jerinjacobk
This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
allowing a single FIB instance to host multiple isolated routing domains.
Currently, a FIB instance represents a single routing domain. For workloads
that need multiple VRFs, the only option is to create multiple FIB objects. In
a burst-oriented datapath, packets in the same batch can belong to different
VRFs, so the application must either perform per-packet lookups in different
FIB instances or regroup packets by VRF before lookup. Both approaches are
expensive.
To remove that cost, this series keeps all VRFs inside one FIB instance and
extends lookup input with per-packet VRF IDs.
The design follows the existing fast-path structure for both families. IPv4
and IPv6 use multi-ary tries whose first level (tbl24) has 2^24 entries. With
this series the first-level table scales with the number of configured VRFs.
This increases memory usage, but keeps performance and lookup complexity on
par with the non-VRF implementation.
Vladimir Medvedkin (4):
fib: add multi-VRF support
fib: add VRF functional and unit tests
fib6: add multi-VRF support
fib6: add VRF functional and unit tests
app/test-fib/main.c | 257 ++++++++++++++++++++++--
app/test/test_fib.c | 298 +++++++++++++++++++++++++++
app/test/test_fib6.c | 319 ++++++++++++++++++++++++++++-
lib/fib/dir24_8.c | 241 ++++++++++++++++------
lib/fib/dir24_8.h | 255 ++++++++++++++++--------
lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
lib/fib/dir24_8_avx512.h | 80 +++++++-
lib/fib/rte_fib.c | 158 ++++++++++++---
lib/fib/rte_fib.h | 94 ++++++++-
lib/fib/rte_fib6.c | 166 +++++++++++++---
lib/fib/rte_fib6.h | 88 +++++++-
lib/fib/trie.c | 158 +++++++++++----
lib/fib/trie.h | 51 +++--
lib/fib/trie_avx512.c | 225 +++++++++++++++++++--
lib/fib/trie_avx512.h | 39 +++-
15 files changed, 2453 insertions(+), 396 deletions(-)
--
2.43.0
* [RFC PATCH 1/4] fib: add multi-VRF support
2026-03-22 15:42 [RFC PATCH 0/4] VRF support in FIB library Vladimir Medvedkin
@ 2026-03-22 15:42 ` Vladimir Medvedkin
2026-03-23 15:48 ` Konstantin Ananyev
2026-03-22 15:42 ` [RFC PATCH 2/4] fib: add VRF functional and unit tests Vladimir Medvedkin
` (6 subsequent siblings)
7 siblings, 1 reply; 33+ messages in thread
From: Vladimir Medvedkin @ 2026-03-22 15:42 UTC (permalink / raw)
To: dev; +Cc: rjarry, nsaxena16, mb, adwivedi, jerinjacobk
Add VRF (Virtual Routing and Forwarding) support to the IPv4
FIB library, allowing multiple independent routing tables
within a single FIB instance.
Introduce max_vrfs and vrf_default_nh fields in rte_fib_conf
to configure the number of VRFs and per-VRF default nexthops.
Add four new experimental APIs:
- rte_fib_vrf_add() and rte_fib_vrf_delete() to manage routes
per VRF
- rte_fib_vrf_lookup_bulk() for multi-VRF bulk lookups
- rte_fib_vrf_get_rib() to retrieve a per-VRF RIB handle
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
lib/fib/dir24_8.c | 241 ++++++++++++++++------
lib/fib/dir24_8.h | 255 ++++++++++++++++--------
lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
lib/fib/dir24_8_avx512.h | 80 +++++++-
lib/fib/rte_fib.c | 158 ++++++++++++---
lib/fib/rte_fib.h | 94 ++++++++-
6 files changed, 988 insertions(+), 260 deletions(-)
diff --git a/lib/fib/dir24_8.c b/lib/fib/dir24_8.c
index 489d2ef427..ad295c5f16 100644
--- a/lib/fib/dir24_8.c
+++ b/lib/fib/dir24_8.c
@@ -32,41 +32,80 @@
#define ROUNDUP(x, y) RTE_ALIGN_CEIL(x, (1 << (32 - y)))
static inline rte_fib_lookup_fn_t
-get_scalar_fn(enum rte_fib_dir24_8_nh_sz nh_sz, bool be_addr)
+get_scalar_fn(const struct dir24_8_tbl *dp, enum rte_fib_dir24_8_nh_sz nh_sz,
+ bool be_addr)
{
+ bool single_vrf = dp->num_vrfs <= 1;
+
switch (nh_sz) {
case RTE_FIB_DIR24_8_1B:
- return be_addr ? dir24_8_lookup_bulk_1b_be : dir24_8_lookup_bulk_1b;
+ if (single_vrf)
+ return be_addr ? dir24_8_lookup_bulk_1b_be :
+ dir24_8_lookup_bulk_1b;
+ return be_addr ? dir24_8_lookup_bulk_vrf_1b_be :
+ dir24_8_lookup_bulk_vrf_1b;
case RTE_FIB_DIR24_8_2B:
- return be_addr ? dir24_8_lookup_bulk_2b_be : dir24_8_lookup_bulk_2b;
+ if (single_vrf)
+ return be_addr ? dir24_8_lookup_bulk_2b_be :
+ dir24_8_lookup_bulk_2b;
+ return be_addr ? dir24_8_lookup_bulk_vrf_2b_be :
+ dir24_8_lookup_bulk_vrf_2b;
case RTE_FIB_DIR24_8_4B:
- return be_addr ? dir24_8_lookup_bulk_4b_be : dir24_8_lookup_bulk_4b;
+ if (single_vrf)
+ return be_addr ? dir24_8_lookup_bulk_4b_be :
+ dir24_8_lookup_bulk_4b;
+ return be_addr ? dir24_8_lookup_bulk_vrf_4b_be :
+ dir24_8_lookup_bulk_vrf_4b;
case RTE_FIB_DIR24_8_8B:
- return be_addr ? dir24_8_lookup_bulk_8b_be : dir24_8_lookup_bulk_8b;
+ if (single_vrf)
+ return be_addr ? dir24_8_lookup_bulk_8b_be :
+ dir24_8_lookup_bulk_8b;
+ return be_addr ? dir24_8_lookup_bulk_vrf_8b_be :
+ dir24_8_lookup_bulk_vrf_8b;
default:
return NULL;
}
}
static inline rte_fib_lookup_fn_t
-get_scalar_fn_inlined(enum rte_fib_dir24_8_nh_sz nh_sz, bool be_addr)
+get_scalar_fn_inlined(const struct dir24_8_tbl *dp,
+ enum rte_fib_dir24_8_nh_sz nh_sz, bool be_addr)
{
+ bool single_vrf = dp->num_vrfs <= 1;
+
switch (nh_sz) {
case RTE_FIB_DIR24_8_1B:
- return be_addr ? dir24_8_lookup_bulk_0_be : dir24_8_lookup_bulk_0;
+ if (single_vrf)
+ return be_addr ? dir24_8_lookup_bulk_0_be :
+ dir24_8_lookup_bulk_0;
+ return be_addr ? dir24_8_lookup_bulk_vrf_0_be :
+ dir24_8_lookup_bulk_vrf_0;
case RTE_FIB_DIR24_8_2B:
- return be_addr ? dir24_8_lookup_bulk_1_be : dir24_8_lookup_bulk_1;
+ if (single_vrf)
+ return be_addr ? dir24_8_lookup_bulk_1_be :
+ dir24_8_lookup_bulk_1;
+ return be_addr ? dir24_8_lookup_bulk_vrf_1_be :
+ dir24_8_lookup_bulk_vrf_1;
case RTE_FIB_DIR24_8_4B:
- return be_addr ? dir24_8_lookup_bulk_2_be : dir24_8_lookup_bulk_2;
+ if (single_vrf)
+ return be_addr ? dir24_8_lookup_bulk_2_be :
+ dir24_8_lookup_bulk_2;
+ return be_addr ? dir24_8_lookup_bulk_vrf_2_be :
+ dir24_8_lookup_bulk_vrf_2;
case RTE_FIB_DIR24_8_8B:
- return be_addr ? dir24_8_lookup_bulk_3_be : dir24_8_lookup_bulk_3;
+ if (single_vrf)
+ return be_addr ? dir24_8_lookup_bulk_3_be :
+ dir24_8_lookup_bulk_3;
+ return be_addr ? dir24_8_lookup_bulk_vrf_3_be :
+ dir24_8_lookup_bulk_vrf_3;
default:
return NULL;
}
}
static inline rte_fib_lookup_fn_t
-get_vector_fn(enum rte_fib_dir24_8_nh_sz nh_sz, bool be_addr)
+get_vector_fn(const struct dir24_8_tbl *dp, enum rte_fib_dir24_8_nh_sz nh_sz,
+ bool be_addr)
{
#ifdef CC_AVX512_SUPPORT
if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) <= 0 ||
@@ -77,24 +116,63 @@ get_vector_fn(enum rte_fib_dir24_8_nh_sz nh_sz, bool be_addr)
if (be_addr && rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) <= 0)
return NULL;
+ if (dp->num_vrfs <= 1) {
+ switch (nh_sz) {
+ case RTE_FIB_DIR24_8_1B:
+ return be_addr ? rte_dir24_8_vec_lookup_bulk_1b_be :
+ rte_dir24_8_vec_lookup_bulk_1b;
+ case RTE_FIB_DIR24_8_2B:
+ return be_addr ? rte_dir24_8_vec_lookup_bulk_2b_be :
+ rte_dir24_8_vec_lookup_bulk_2b;
+ case RTE_FIB_DIR24_8_4B:
+ return be_addr ? rte_dir24_8_vec_lookup_bulk_4b_be :
+ rte_dir24_8_vec_lookup_bulk_4b;
+ case RTE_FIB_DIR24_8_8B:
+ return be_addr ? rte_dir24_8_vec_lookup_bulk_8b_be :
+ rte_dir24_8_vec_lookup_bulk_8b;
+ default:
+ return NULL;
+ }
+ }
+
+ if (dp->num_vrfs >= 256) {
+ switch (nh_sz) {
+ case RTE_FIB_DIR24_8_1B:
+ return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_1b_be_large :
+ rte_dir24_8_vec_lookup_bulk_vrf_1b_large;
+ case RTE_FIB_DIR24_8_2B:
+ return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_2b_be_large :
+ rte_dir24_8_vec_lookup_bulk_vrf_2b_large;
+ case RTE_FIB_DIR24_8_4B:
+ return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_4b_be_large :
+ rte_dir24_8_vec_lookup_bulk_vrf_4b_large;
+ case RTE_FIB_DIR24_8_8B:
+ return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_8b_be_large :
+ rte_dir24_8_vec_lookup_bulk_vrf_8b_large;
+ default:
+ return NULL;
+ }
+ }
+
switch (nh_sz) {
case RTE_FIB_DIR24_8_1B:
- return be_addr ? rte_dir24_8_vec_lookup_bulk_1b_be :
- rte_dir24_8_vec_lookup_bulk_1b;
+ return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_1b_be :
+ rte_dir24_8_vec_lookup_bulk_vrf_1b;
case RTE_FIB_DIR24_8_2B:
- return be_addr ? rte_dir24_8_vec_lookup_bulk_2b_be :
- rte_dir24_8_vec_lookup_bulk_2b;
+ return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_2b_be :
+ rte_dir24_8_vec_lookup_bulk_vrf_2b;
case RTE_FIB_DIR24_8_4B:
- return be_addr ? rte_dir24_8_vec_lookup_bulk_4b_be :
- rte_dir24_8_vec_lookup_bulk_4b;
+ return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_4b_be :
+ rte_dir24_8_vec_lookup_bulk_vrf_4b;
case RTE_FIB_DIR24_8_8B:
- return be_addr ? rte_dir24_8_vec_lookup_bulk_8b_be :
- rte_dir24_8_vec_lookup_bulk_8b;
+ return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_8b_be :
+ rte_dir24_8_vec_lookup_bulk_vrf_8b;
default:
return NULL;
}
#elif defined(RTE_RISCV_FEATURE_V)
RTE_SET_USED(be_addr);
+ RTE_SET_USED(dp);
if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_RISCV_ISA_V) <= 0)
return NULL;
switch (nh_sz) {
@@ -130,16 +208,17 @@ dir24_8_get_lookup_fn(void *p, enum rte_fib_lookup_type type, bool be_addr)
switch (type) {
case RTE_FIB_LOOKUP_DIR24_8_SCALAR_MACRO:
- return get_scalar_fn(nh_sz, be_addr);
+ return get_scalar_fn(dp, nh_sz, be_addr);
case RTE_FIB_LOOKUP_DIR24_8_SCALAR_INLINE:
- return get_scalar_fn_inlined(nh_sz, be_addr);
+ return get_scalar_fn_inlined(dp, nh_sz, be_addr);
case RTE_FIB_LOOKUP_DIR24_8_SCALAR_UNI:
- return be_addr ? dir24_8_lookup_bulk_uni_be : dir24_8_lookup_bulk_uni;
+ return be_addr ? dir24_8_lookup_bulk_uni_be :
+ dir24_8_lookup_bulk_uni;
case RTE_FIB_LOOKUP_DIR24_8_VECTOR_AVX512:
- return get_vector_fn(nh_sz, be_addr);
+ return get_vector_fn(dp, nh_sz, be_addr);
case RTE_FIB_LOOKUP_DEFAULT:
- ret_fn = get_vector_fn(nh_sz, be_addr);
- return ret_fn != NULL ? ret_fn : get_scalar_fn(nh_sz, be_addr);
+ ret_fn = get_vector_fn(dp, nh_sz, be_addr);
+ return ret_fn != NULL ? ret_fn : get_scalar_fn(dp, nh_sz, be_addr);
default:
return NULL;
}
@@ -246,15 +325,18 @@ __rcu_qsbr_free_resource(void *p, void *data, unsigned int n __rte_unused)
}
static void
-tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
+tbl8_recycle(struct dir24_8_tbl *dp, uint16_t vrf_id, uint32_t ip, uint64_t tbl8_idx)
{
uint32_t i;
uint64_t nh;
+ uint64_t tbl24_idx;
uint8_t *ptr8;
uint16_t *ptr16;
uint32_t *ptr32;
uint64_t *ptr64;
+ tbl24_idx = get_tbl24_idx(vrf_id, ip);
+
switch (dp->nh_sz) {
case RTE_FIB_DIR24_8_1B:
ptr8 = &((uint8_t *)dp->tbl8)[tbl8_idx *
@@ -264,7 +346,7 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
if (nh != ptr8[i])
return;
}
- ((uint8_t *)dp->tbl24)[ip >> 8] =
+ ((uint8_t *)dp->tbl24)[tbl24_idx] =
nh & ~DIR24_8_EXT_ENT;
break;
case RTE_FIB_DIR24_8_2B:
@@ -275,7 +357,7 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
if (nh != ptr16[i])
return;
}
- ((uint16_t *)dp->tbl24)[ip >> 8] =
+ ((uint16_t *)dp->tbl24)[tbl24_idx] =
nh & ~DIR24_8_EXT_ENT;
break;
case RTE_FIB_DIR24_8_4B:
@@ -286,7 +368,7 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
if (nh != ptr32[i])
return;
}
- ((uint32_t *)dp->tbl24)[ip >> 8] =
+ ((uint32_t *)dp->tbl24)[tbl24_idx] =
nh & ~DIR24_8_EXT_ENT;
break;
case RTE_FIB_DIR24_8_8B:
@@ -297,7 +379,7 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
if (nh != ptr64[i])
return;
}
- ((uint64_t *)dp->tbl24)[ip >> 8] =
+ ((uint64_t *)dp->tbl24)[tbl24_idx] =
nh & ~DIR24_8_EXT_ENT;
break;
}
@@ -314,7 +396,7 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
}
static int
-install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
+install_to_fib(struct dir24_8_tbl *dp, uint16_t vrf_id, uint32_t ledge, uint32_t redge,
uint64_t next_hop)
{
uint64_t tbl24_tmp;
@@ -328,7 +410,7 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
if (((ledge >> 8) != (redge >> 8)) || (len == 1 << 24)) {
if ((ROUNDUP(ledge, 24) - ledge) != 0) {
- tbl24_tmp = get_tbl24(dp, ledge, dp->nh_sz);
+ tbl24_tmp = get_tbl24(dp, vrf_id, ledge, dp->nh_sz);
if ((tbl24_tmp & DIR24_8_EXT_ENT) !=
DIR24_8_EXT_ENT) {
/**
@@ -346,7 +428,7 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
}
tbl8_free_idx(dp, tmp_tbl8_idx);
/*update dir24 entry with tbl8 index*/
- write_to_fib(get_tbl24_p(dp, ledge,
+ write_to_fib(get_tbl24_p(dp, vrf_id, ledge,
dp->nh_sz), (tbl8_idx << 1)|
DIR24_8_EXT_ENT,
dp->nh_sz, 1);
@@ -360,19 +442,19 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
DIR24_8_EXT_ENT,
dp->nh_sz, ROUNDUP(ledge, 24) - ledge);
- tbl8_recycle(dp, ledge, tbl8_idx);
+ tbl8_recycle(dp, vrf_id, ledge, tbl8_idx);
}
- write_to_fib(get_tbl24_p(dp, ROUNDUP(ledge, 24), dp->nh_sz),
+ write_to_fib(get_tbl24_p(dp, vrf_id, ROUNDUP(ledge, 24), dp->nh_sz),
next_hop << 1, dp->nh_sz, len);
if (redge & ~DIR24_8_TBL24_MASK) {
- tbl24_tmp = get_tbl24(dp, redge, dp->nh_sz);
+ tbl24_tmp = get_tbl24(dp, vrf_id, redge, dp->nh_sz);
if ((tbl24_tmp & DIR24_8_EXT_ENT) !=
DIR24_8_EXT_ENT) {
tbl8_idx = tbl8_alloc(dp, tbl24_tmp);
if (tbl8_idx < 0)
return -ENOSPC;
/*update dir24 entry with tbl8 index*/
- write_to_fib(get_tbl24_p(dp, redge,
+ write_to_fib(get_tbl24_p(dp, vrf_id, redge,
dp->nh_sz), (tbl8_idx << 1)|
DIR24_8_EXT_ENT,
dp->nh_sz, 1);
@@ -385,17 +467,17 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
DIR24_8_EXT_ENT,
dp->nh_sz, redge & ~DIR24_8_TBL24_MASK);
- tbl8_recycle(dp, redge, tbl8_idx);
+ tbl8_recycle(dp, vrf_id, redge, tbl8_idx);
}
} else if ((redge - ledge) != 0) {
- tbl24_tmp = get_tbl24(dp, ledge, dp->nh_sz);
+ tbl24_tmp = get_tbl24(dp, vrf_id, ledge, dp->nh_sz);
if ((tbl24_tmp & DIR24_8_EXT_ENT) !=
DIR24_8_EXT_ENT) {
tbl8_idx = tbl8_alloc(dp, tbl24_tmp);
if (tbl8_idx < 0)
return -ENOSPC;
/*update dir24 entry with tbl8 index*/
- write_to_fib(get_tbl24_p(dp, ledge, dp->nh_sz),
+ write_to_fib(get_tbl24_p(dp, vrf_id, ledge, dp->nh_sz),
(tbl8_idx << 1)|
DIR24_8_EXT_ENT,
dp->nh_sz, 1);
@@ -409,13 +491,13 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
DIR24_8_EXT_ENT,
dp->nh_sz, redge - ledge);
- tbl8_recycle(dp, ledge, tbl8_idx);
+ tbl8_recycle(dp, vrf_id, ledge, tbl8_idx);
}
return 0;
}
static int
-modify_fib(struct dir24_8_tbl *dp, struct rte_rib *rib, uint32_t ip,
+modify_fib(struct dir24_8_tbl *dp, struct rte_rib *rib, uint16_t vrf_id, uint32_t ip,
uint8_t depth, uint64_t next_hop)
{
struct rte_rib_node *tmp = NULL;
@@ -438,7 +520,7 @@ modify_fib(struct dir24_8_tbl *dp, struct rte_rib *rib, uint32_t ip,
(uint32_t)(1ULL << (32 - tmp_depth));
continue;
}
- ret = install_to_fib(dp, ledge, redge,
+ ret = install_to_fib(dp, vrf_id, ledge, redge,
next_hop);
if (ret != 0)
return ret;
@@ -454,7 +536,7 @@ modify_fib(struct dir24_8_tbl *dp, struct rte_rib *rib, uint32_t ip,
redge = ip + (uint32_t)(1ULL << (32 - depth));
if (ledge == redge && ledge != 0)
break;
- ret = install_to_fib(dp, ledge, redge,
+ ret = install_to_fib(dp, vrf_id, ledge, redge,
next_hop);
if (ret != 0)
return ret;
@@ -465,7 +547,7 @@ modify_fib(struct dir24_8_tbl *dp, struct rte_rib *rib, uint32_t ip,
}
int
-dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
+dir24_8_modify(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip, uint8_t depth,
uint64_t next_hop, int op)
{
struct dir24_8_tbl *dp;
@@ -480,8 +562,13 @@ dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
return -EINVAL;
dp = rte_fib_get_dp(fib);
- rib = rte_fib_get_rib(fib);
- RTE_ASSERT((dp != NULL) && (rib != NULL));
+ RTE_ASSERT(dp != NULL);
+
+ if (vrf_id >= dp->num_vrfs)
+ return -EINVAL;
+
+ rib = rte_fib_vrf_get_rib(fib, vrf_id);
+ RTE_ASSERT(rib != NULL);
if (next_hop > get_max_nh(dp->nh_sz))
return -EINVAL;
@@ -495,7 +582,7 @@ dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
rte_rib_get_nh(node, &node_nh);
if (node_nh == next_hop)
return 0;
- ret = modify_fib(dp, rib, ip, depth, next_hop);
+ ret = modify_fib(dp, rib, vrf_id, ip, depth, next_hop);
if (ret == 0)
rte_rib_set_nh(node, next_hop);
return 0;
@@ -518,7 +605,7 @@ dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
if (par_nh == next_hop)
goto successfully_added;
}
- ret = modify_fib(dp, rib, ip, depth, next_hop);
+ ret = modify_fib(dp, rib, vrf_id, ip, depth, next_hop);
if (ret != 0) {
rte_rib_remove(rib, ip, depth);
return ret;
@@ -536,9 +623,9 @@ dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
rte_rib_get_nh(parent, &par_nh);
rte_rib_get_nh(node, &node_nh);
if (par_nh != node_nh)
- ret = modify_fib(dp, rib, ip, depth, par_nh);
+ ret = modify_fib(dp, rib, vrf_id, ip, depth, par_nh);
} else
- ret = modify_fib(dp, rib, ip, depth, dp->def_nh);
+ ret = modify_fib(dp, rib, vrf_id, ip, depth, dp->def_nh[vrf_id]);
if (ret == 0) {
rte_rib_remove(rib, ip, depth);
if (depth > 24) {
@@ -562,7 +649,10 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
struct dir24_8_tbl *dp;
uint64_t def_nh;
uint32_t num_tbl8;
+ uint16_t num_vrfs;
enum rte_fib_dir24_8_nh_sz nh_sz;
+ uint64_t tbl24_sz;
+ uint16_t vrf;
if ((name == NULL) || (fib_conf == NULL) ||
(fib_conf->dir24_8.nh_sz < RTE_FIB_DIR24_8_1B) ||
@@ -580,19 +670,56 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
nh_sz = fib_conf->dir24_8.nh_sz;
num_tbl8 = RTE_ALIGN_CEIL(fib_conf->dir24_8.num_tbl8,
BITMAP_SLAB_BIT_SIZE);
+ num_vrfs = (fib_conf->max_vrfs == 0) ? 1 : fib_conf->max_vrfs;
+
+ /* Validate per-VRF default nexthops if provided */
+ if (fib_conf->vrf_default_nh != NULL) {
+ for (vrf = 0; vrf < num_vrfs; vrf++) {
+ if (fib_conf->vrf_default_nh[vrf] > get_max_nh(nh_sz)) {
+ rte_errno = EINVAL;
+ return NULL;
+ }
+ }
+ }
+
+ tbl24_sz = (uint64_t)num_vrfs * DIR24_8_TBL24_NUM_ENT * (1 << nh_sz);
snprintf(mem_name, sizeof(mem_name), "DP_%s", name);
dp = rte_zmalloc_socket(name, sizeof(struct dir24_8_tbl) +
- DIR24_8_TBL24_NUM_ENT * (1 << nh_sz) + sizeof(uint32_t),
+ tbl24_sz + sizeof(uint32_t),
RTE_CACHE_LINE_SIZE, socket_id);
if (dp == NULL) {
rte_errno = ENOMEM;
return NULL;
}
- /* Init table with default value */
- write_to_fib(dp->tbl24, (def_nh << 1), nh_sz, 1 << 24);
+ dp->num_vrfs = num_vrfs;
+ dp->nh_sz = nh_sz;
+ dp->number_tbl8s = num_tbl8;
+
+ /* Allocate per-VRF default nexthop array */
+ snprintf(mem_name, sizeof(mem_name), "DEFNH_%p", dp);
+ dp->def_nh = rte_zmalloc_socket(mem_name, num_vrfs * sizeof(uint64_t),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (dp->def_nh == NULL) {
+ rte_errno = ENOMEM;
+ rte_free(dp);
+ return NULL;
+ }
+
+ /* Initialize all VRFs with default nexthop */
+ for (vrf = 0; vrf < num_vrfs; vrf++) {
+ uint64_t vrf_def_nh = (fib_conf->vrf_default_nh != NULL) ?
+ fib_conf->vrf_default_nh[vrf] : def_nh;
+ dp->def_nh[vrf] = vrf_def_nh;
+ /* Init TBL24 for this VRF with default value */
+ uint64_t vrf_offset = (uint64_t)vrf * DIR24_8_TBL24_NUM_ENT;
+ void *vrf_tbl24 = (void *)&((uint8_t *)dp->tbl24)[vrf_offset << nh_sz];
+ write_to_fib(vrf_tbl24, (vrf_def_nh << 1), nh_sz, 1 << 24);
+ }
+
+ /* Allocate shared TBL8 for all VRFs */
snprintf(mem_name, sizeof(mem_name), "TBL8_%p", dp);
uint64_t tbl8_sz = DIR24_8_TBL8_GRP_NUM_ENT * (1ULL << nh_sz) *
(num_tbl8 + 1);
@@ -600,12 +727,10 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
RTE_CACHE_LINE_SIZE, socket_id);
if (dp->tbl8 == NULL) {
rte_errno = ENOMEM;
+ rte_free(dp->def_nh);
rte_free(dp);
return NULL;
}
- dp->def_nh = def_nh;
- dp->nh_sz = nh_sz;
- dp->number_tbl8s = num_tbl8;
snprintf(mem_name, sizeof(mem_name), "TBL8_idxes_%p", dp);
dp->tbl8_idxes = rte_zmalloc_socket(mem_name,
@@ -614,6 +739,7 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
if (dp->tbl8_idxes == NULL) {
rte_errno = ENOMEM;
rte_free(dp->tbl8);
+ rte_free(dp->def_nh);
rte_free(dp);
return NULL;
}
@@ -629,6 +755,7 @@ dir24_8_free(void *p)
rte_rcu_qsbr_dq_delete(dp->dq);
rte_free(dp->tbl8_idxes);
rte_free(dp->tbl8);
+ rte_free(dp->def_nh);
rte_free(dp);
}
diff --git a/lib/fib/dir24_8.h b/lib/fib/dir24_8.h
index b343b5d686..37a73a3cc2 100644
--- a/lib/fib/dir24_8.h
+++ b/lib/fib/dir24_8.h
@@ -12,6 +12,7 @@
#include <rte_byteorder.h>
#include <rte_prefetch.h>
#include <rte_branch_prediction.h>
+#include <rte_debug.h>
#include <rte_rcu_qsbr.h>
/**
@@ -32,24 +33,19 @@ struct dir24_8_tbl {
uint32_t number_tbl8s; /**< Total number of tbl8s */
uint32_t rsvd_tbl8s; /**< Number of reserved tbl8s */
uint32_t cur_tbl8s; /**< Current number of tbl8s */
+ uint16_t num_vrfs; /**< Number of VRFs */
enum rte_fib_dir24_8_nh_sz nh_sz; /**< Size of nexthop entry */
/* RCU config. */
enum rte_fib_qsbr_mode rcu_mode;/* Blocking, defer queue. */
struct rte_rcu_qsbr *v; /* RCU QSBR variable. */
struct rte_rcu_qsbr_dq *dq; /* RCU QSBR defer queue. */
- uint64_t def_nh; /**< Default next hop */
+ uint64_t *def_nh; /**< Per-VRF default next hop array */
uint64_t *tbl8; /**< tbl8 table. */
uint64_t *tbl8_idxes; /**< bitmap containing free tbl8 idxes*/
/* tbl24 table. */
alignas(RTE_CACHE_LINE_SIZE) uint64_t tbl24[];
};
-static inline void *
-get_tbl24_p(struct dir24_8_tbl *dp, uint32_t ip, uint8_t nh_sz)
-{
- return (void *)&((uint8_t *)dp->tbl24)[(ip &
- DIR24_8_TBL24_MASK) >> (8 - nh_sz)];
-}
static inline uint8_t
bits_in_nh(uint8_t nh_sz)
@@ -63,14 +59,21 @@ get_max_nh(uint8_t nh_sz)
return ((1ULL << (bits_in_nh(nh_sz) - 1)) - 1);
}
-static inline uint32_t
-get_tbl24_idx(uint32_t ip)
+static inline uint64_t
+get_tbl24_idx(uint16_t vrf_id, uint32_t ip)
+{
+ return ((uint64_t)vrf_id << 24) + (ip >> 8);
+}
+
+static inline void *
+get_tbl24_p(struct dir24_8_tbl *dp, uint16_t vrf_id, uint32_t ip, uint8_t nh_sz)
{
- return ip >> 8;
+ uint64_t idx = get_tbl24_idx(vrf_id, ip);
+ return (void *)&((uint8_t *)dp->tbl24)[idx << nh_sz];
}
-static inline uint32_t
-get_tbl8_idx(uint32_t res, uint32_t ip)
+static inline uint64_t
+get_tbl8_idx(uint64_t res, uint32_t ip)
{
return (res >> 1) * DIR24_8_TBL8_GRP_NUM_ENT + (uint8_t)ip;
}
@@ -87,17 +90,18 @@ get_psd_idx(uint32_t val, uint8_t nh_sz)
return val & ((1 << (3 - nh_sz)) - 1);
}
-static inline uint32_t
-get_tbl_idx(uint32_t val, uint8_t nh_sz)
+static inline uint64_t
+get_tbl_idx(uint64_t val, uint8_t nh_sz)
{
return val >> (3 - nh_sz);
}
static inline uint64_t
-get_tbl24(struct dir24_8_tbl *dp, uint32_t ip, uint8_t nh_sz)
+get_tbl24(struct dir24_8_tbl *dp, uint16_t vrf_id, uint32_t ip, uint8_t nh_sz)
{
- return ((dp->tbl24[get_tbl_idx(get_tbl24_idx(ip), nh_sz)] >>
- (get_psd_idx(get_tbl24_idx(ip), nh_sz) *
+ uint64_t idx = get_tbl24_idx(vrf_id, ip);
+ return ((dp->tbl24[get_tbl_idx(idx, nh_sz)] >>
+ (get_psd_idx(idx, nh_sz) *
bits_in_nh(nh_sz))) & lookup_msk(nh_sz));
}
@@ -115,62 +119,92 @@ is_entry_extended(uint64_t ent)
return (ent & DIR24_8_EXT_ENT) == DIR24_8_EXT_ENT;
}
-#define LOOKUP_FUNC(suffix, type, bulk_prefetch, nh_sz) \
-static inline void dir24_8_lookup_bulk_##suffix(void *p, const uint32_t *ips, \
- uint64_t *next_hops, const unsigned int n) \
-{ \
- struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p; \
- uint64_t tmp; \
- uint32_t i; \
- uint32_t prefetch_offset = \
- RTE_MIN((unsigned int)bulk_prefetch, n); \
- \
- for (i = 0; i < prefetch_offset; i++) \
- rte_prefetch0(get_tbl24_p(dp, ips[i], nh_sz)); \
- for (i = 0; i < (n - prefetch_offset); i++) { \
- rte_prefetch0(get_tbl24_p(dp, \
- ips[i + prefetch_offset], nh_sz)); \
- tmp = ((type *)dp->tbl24)[ips[i] >> 8]; \
- if (unlikely(is_entry_extended(tmp))) \
- tmp = ((type *)dp->tbl8)[(uint8_t)ips[i] + \
- ((tmp >> 1) * DIR24_8_TBL8_GRP_NUM_ENT)]; \
- next_hops[i] = tmp >> 1; \
- } \
- for (; i < n; i++) { \
- tmp = ((type *)dp->tbl24)[ips[i] >> 8]; \
- if (unlikely(is_entry_extended(tmp))) \
- tmp = ((type *)dp->tbl8)[(uint8_t)ips[i] + \
- ((tmp >> 1) * DIR24_8_TBL8_GRP_NUM_ENT)]; \
- next_hops[i] = tmp >> 1; \
- } \
-} \
-
-LOOKUP_FUNC(1b, uint8_t, 5, 0)
-LOOKUP_FUNC(2b, uint16_t, 6, 1)
-LOOKUP_FUNC(4b, uint32_t, 15, 2)
-LOOKUP_FUNC(8b, uint64_t, 12, 3)
+
+#define LOOKUP_FUNC(suffix, type, bulk_prefetch, nh_sz, is_vrf) \
+static inline void dir24_8_lookup_bulk_##suffix(void *p, \
+ const uint16_t *vrf_ids, const uint32_t *ips, \
+ uint64_t *next_hops, const unsigned int n) \
+{ \
+ struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p; \
+ uint64_t tmp; \
+ uint32_t i; \
+ uint32_t prefetch_offset = RTE_MIN((unsigned int)bulk_prefetch, n); \
+ \
+ if (!is_vrf) \
+ RTE_SET_USED(vrf_ids); \
+ \
+ for (i = 0; i < prefetch_offset; i++) { \
+ uint16_t vid = is_vrf ? vrf_ids[i] : 0; \
+ RTE_ASSERT(vid < dp->num_vrfs); \
+ rte_prefetch0(get_tbl24_p(dp, vid, ips[i], nh_sz)); \
+ } \
+ for (i = 0; i < (n - prefetch_offset); i++) { \
+ uint16_t vid = is_vrf ? vrf_ids[i] : 0; \
+ uint16_t vid_next = is_vrf ? vrf_ids[i + prefetch_offset] : 0; \
+ RTE_ASSERT(vid < dp->num_vrfs); \
+ RTE_ASSERT(vid_next < dp->num_vrfs); \
+ rte_prefetch0(get_tbl24_p(dp, vid_next, \
+ ips[i + prefetch_offset], nh_sz)); \
+ tmp = ((type *)dp->tbl24)[get_tbl24_idx(vid, ips[i])]; \
+ if (unlikely(is_entry_extended(tmp))) \
+ tmp = ((type *)dp->tbl8)[(uint8_t)ips[i] + \
+ ((tmp >> 1) * DIR24_8_TBL8_GRP_NUM_ENT)]; \
+ next_hops[i] = tmp >> 1; \
+ } \
+ for (; i < n; i++) { \
+ uint16_t vid = is_vrf ? vrf_ids[i] : 0; \
+ RTE_ASSERT(vid < dp->num_vrfs); \
+ tmp = ((type *)dp->tbl24)[get_tbl24_idx(vid, ips[i])]; \
+ if (unlikely(is_entry_extended(tmp))) \
+ tmp = ((type *)dp->tbl8)[(uint8_t)ips[i] + \
+ ((tmp >> 1) * DIR24_8_TBL8_GRP_NUM_ENT)]; \
+ next_hops[i] = tmp >> 1; \
+ } \
+}
+
+LOOKUP_FUNC(1b, uint8_t, 5, 0, false)
+LOOKUP_FUNC(2b, uint16_t, 6, 1, false)
+LOOKUP_FUNC(4b, uint32_t, 15, 2, false)
+LOOKUP_FUNC(8b, uint64_t, 12, 3, false)
+LOOKUP_FUNC(vrf_1b, uint8_t, 5, 0, true)
+LOOKUP_FUNC(vrf_2b, uint16_t, 6, 1, true)
+LOOKUP_FUNC(vrf_4b, uint32_t, 15, 2, true)
+LOOKUP_FUNC(vrf_8b, uint64_t, 12, 3, true)
static inline void
-dir24_8_lookup_bulk(struct dir24_8_tbl *dp, const uint32_t *ips,
- uint64_t *next_hops, const unsigned int n, uint8_t nh_sz)
+__dir24_8_lookup_bulk(struct dir24_8_tbl *dp, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n,
+ uint8_t nh_sz, bool is_vrf)
{
uint64_t tmp;
uint32_t i;
uint32_t prefetch_offset = RTE_MIN(15U, n);
- for (i = 0; i < prefetch_offset; i++)
- rte_prefetch0(get_tbl24_p(dp, ips[i], nh_sz));
+ if (!is_vrf)
+ RTE_SET_USED(vrf_ids);
+
+ for (i = 0; i < prefetch_offset; i++) {
+ uint16_t vid = is_vrf ? vrf_ids[i] : 0;
+ RTE_ASSERT(vid < dp->num_vrfs);
+ rte_prefetch0(get_tbl24_p(dp, vid, ips[i], nh_sz));
+ }
for (i = 0; i < (n - prefetch_offset); i++) {
- rte_prefetch0(get_tbl24_p(dp, ips[i + prefetch_offset],
- nh_sz));
- tmp = get_tbl24(dp, ips[i], nh_sz);
+ uint16_t vid = is_vrf ? vrf_ids[i] : 0;
+ uint16_t vid_next = is_vrf ? vrf_ids[i + prefetch_offset] : 0;
+ RTE_ASSERT(vid < dp->num_vrfs);
+ RTE_ASSERT(vid_next < dp->num_vrfs);
+ rte_prefetch0(get_tbl24_p(dp, vid_next,
+ ips[i + prefetch_offset], nh_sz));
+ tmp = get_tbl24(dp, vid, ips[i], nh_sz);
if (unlikely(is_entry_extended(tmp)))
tmp = get_tbl8(dp, tmp, ips[i], nh_sz);
next_hops[i] = tmp >> 1;
}
for (; i < n; i++) {
- tmp = get_tbl24(dp, ips[i], nh_sz);
+ uint16_t vid = is_vrf ? vrf_ids[i] : 0;
+ RTE_ASSERT(vid < dp->num_vrfs);
+ tmp = get_tbl24(dp, vid, ips[i], nh_sz);
if (unlikely(is_entry_extended(tmp)))
tmp = get_tbl8(dp, tmp, ips[i], nh_sz);
@@ -179,43 +213,79 @@ dir24_8_lookup_bulk(struct dir24_8_tbl *dp, const uint32_t *ips,
}
static inline void
-dir24_8_lookup_bulk_0(void *p, const uint32_t *ips,
+dir24_8_lookup_bulk_0(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
uint64_t *next_hops, const unsigned int n)
{
struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
- dir24_8_lookup_bulk(dp, ips, next_hops, n, 0);
+ __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 0, false);
+}
+
+static inline void
+dir24_8_lookup_bulk_vrf_0(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
+{
+ struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
+
+ __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 0, true);
}
static inline void
-dir24_8_lookup_bulk_1(void *p, const uint32_t *ips,
+dir24_8_lookup_bulk_1(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
uint64_t *next_hops, const unsigned int n)
{
struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
- dir24_8_lookup_bulk(dp, ips, next_hops, n, 1);
+ __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 1, false);
}
static inline void
-dir24_8_lookup_bulk_2(void *p, const uint32_t *ips,
+dir24_8_lookup_bulk_vrf_1(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
+{
+ struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
+
+ __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 1, true);
+}
+
+static inline void
+dir24_8_lookup_bulk_2(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
uint64_t *next_hops, const unsigned int n)
{
struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
- dir24_8_lookup_bulk(dp, ips, next_hops, n, 2);
+ __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 2, false);
}
static inline void
-dir24_8_lookup_bulk_3(void *p, const uint32_t *ips,
+dir24_8_lookup_bulk_vrf_2(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
+{
+ struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
+
+ __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 2, true);
+}
+
+static inline void
+dir24_8_lookup_bulk_3(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
uint64_t *next_hops, const unsigned int n)
{
struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
- dir24_8_lookup_bulk(dp, ips, next_hops, n, 3);
+ __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 3, false);
}
static inline void
-dir24_8_lookup_bulk_uni(void *p, const uint32_t *ips,
+dir24_8_lookup_bulk_vrf_3(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
+{
+ struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
+
+ __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 3, true);
+}
+
+static inline void
+dir24_8_lookup_bulk_uni(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
uint64_t *next_hops, const unsigned int n)
{
struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
@@ -224,66 +294,83 @@ dir24_8_lookup_bulk_uni(void *p, const uint32_t *ips,
uint32_t prefetch_offset = RTE_MIN(15U, n);
uint8_t nh_sz = dp->nh_sz;
- for (i = 0; i < prefetch_offset; i++)
- rte_prefetch0(get_tbl24_p(dp, ips[i], nh_sz));
+ for (i = 0; i < prefetch_offset; i++) {
+ uint16_t vid = vrf_ids[i];
+ RTE_ASSERT(vid < dp->num_vrfs);
+ rte_prefetch0(get_tbl24_p(dp, vid, ips[i], nh_sz));
+ }
for (i = 0; i < (n - prefetch_offset); i++) {
- rte_prefetch0(get_tbl24_p(dp, ips[i + prefetch_offset],
- nh_sz));
- tmp = get_tbl24(dp, ips[i], nh_sz);
+ uint16_t vid = vrf_ids[i];
+ uint16_t vid_next = vrf_ids[i + prefetch_offset];
+ RTE_ASSERT(vid < dp->num_vrfs);
+ RTE_ASSERT(vid_next < dp->num_vrfs);
+ rte_prefetch0(get_tbl24_p(dp, vid_next,
+ ips[i + prefetch_offset], nh_sz));
+ tmp = get_tbl24(dp, vid, ips[i], nh_sz);
if (unlikely(is_entry_extended(tmp)))
tmp = get_tbl8(dp, tmp, ips[i], nh_sz);
next_hops[i] = tmp >> 1;
}
for (; i < n; i++) {
- tmp = get_tbl24(dp, ips[i], nh_sz);
+ uint16_t vid = vrf_ids[i];
+ RTE_ASSERT(vid < dp->num_vrfs);
+ tmp = get_tbl24(dp, vid, ips[i], nh_sz);
if (unlikely(is_entry_extended(tmp)))
tmp = get_tbl8(dp, tmp, ips[i], nh_sz);
next_hops[i] = tmp >> 1;
}
}
-
#define BSWAP_MAX_LENGTH 64
-typedef void (*dir24_8_lookup_bulk_be_cb)(void *p, const uint32_t *ips, uint64_t *next_hops,
- const unsigned int n);
+typedef void (*dir24_8_lookup_bulk_be_cb)(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
static inline void
-dir24_8_lookup_bulk_be(void *p, const uint32_t *ips, uint64_t *next_hops, const unsigned int n,
- dir24_8_lookup_bulk_be_cb cb)
+dir24_8_lookup_bulk_be(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
+ uint64_t *next_hops, const unsigned int n, dir24_8_lookup_bulk_be_cb cb)
{
uint32_t le_ips[BSWAP_MAX_LENGTH];
unsigned int i;
#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
- cb(p, ips, next_hops, n);
+ cb(p, vrf_ids, ips, next_hops, n);
#else
for (i = 0; i < n; i += BSWAP_MAX_LENGTH) {
int j;
for (j = 0; j < BSWAP_MAX_LENGTH && i + j < n; j++)
le_ips[j] = rte_be_to_cpu_32(ips[i + j]);
- cb(p, le_ips, next_hops + i, j);
+ cb(p, vrf_ids + i, le_ips, next_hops + i, j);
}
#endif
}
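[Reviewer note: the big-endian path above converts addresses in fixed-size stack chunks so the host-order lookup core never sees byte-swapped IPs and no per-call allocation is needed. A minimal scalar sketch of that chunking scheme, with illustrative names (`lookup_be_chunked`, the inner copy standing in for the callback):]

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define BSWAP_MAX_LENGTH 64

static uint32_t bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0xff00) |
	       ((v << 8) & 0xff0000) | (v << 24);
}

/* Convert big-endian IPs to host order in fixed-size chunks, then hand
 * each chunk to a host-order consumer -- mirrors dir24_8_lookup_bulk_be. */
static void lookup_be_chunked(const uint32_t *be_ips, uint32_t *out, size_t n)
{
	uint32_t le_ips[BSWAP_MAX_LENGTH];

	for (size_t i = 0; i < n; i += BSWAP_MAX_LENGTH) {
		size_t j;
		for (j = 0; j < BSWAP_MAX_LENGTH && i + j < n; j++)
			le_ips[j] = bswap32(be_ips[i + j]);
		/* stand-in for cb(p, vrf_ids + i, le_ips, next_hops + i, j) */
		for (size_t k = 0; k < j; k++)
			out[i + k] = le_ips[k];
	}
}
```

Note how the inner loop bound `i + j < n` handles a final partial chunk, matching the patch.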
#define DECLARE_BE_LOOKUP_FN(name) \
static inline void \
-name##_be(void *p, const uint32_t *ips, uint64_t *next_hops, const unsigned int n) \
+name##_be(void *p, const uint16_t *vrf_ids, const uint32_t *ips, \
+ uint64_t *next_hops, const unsigned int n) \
{ \
- dir24_8_lookup_bulk_be(p, ips, next_hops, n, name); \
+ dir24_8_lookup_bulk_be(p, vrf_ids, ips, next_hops, n, name); \
}
DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_1b)
DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_2b)
DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_4b)
DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_8b)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_1b)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_2b)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_4b)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_8b)
DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_0)
DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_1)
DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_2)
DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_3)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_0)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_1)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_2)
+DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_3)
DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_uni)
void *
@@ -296,7 +383,7 @@ rte_fib_lookup_fn_t
dir24_8_get_lookup_fn(void *p, enum rte_fib_lookup_type type, bool be_addr);
int
-dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
+dir24_8_modify(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip, uint8_t depth,
uint64_t next_hop, int op);
int
diff --git a/lib/fib/dir24_8_avx512.c b/lib/fib/dir24_8_avx512.c
index 89b43583c7..3e576e410e 100644
--- a/lib/fib/dir24_8_avx512.c
+++ b/lib/fib/dir24_8_avx512.c
@@ -4,75 +4,132 @@
#include <rte_vect.h>
#include <rte_fib.h>
+#include <rte_debug.h>
#include "dir24_8.h"
#include "dir24_8_avx512.h"
+enum vrf_scale {
+ VRF_SCALE_SINGLE = 0,
+ VRF_SCALE_SMALL = 1,
+ VRF_SCALE_LARGE = 2,
+};
+
static __rte_always_inline void
-dir24_8_vec_lookup_x16(void *p, const uint32_t *ips,
- uint64_t *next_hops, int size, bool be_addr)
+dir24_8_vec_lookup_x8_64b_path(struct dir24_8_tbl *dp, __m256i ip_vec_256,
+ __m256i vrf32_256, uint64_t *next_hops, int size)
{
- struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
- __mmask16 msk_ext;
- __mmask16 exp_msk = 0x5555;
- __m512i ip_vec, idxes, res, bytes;
- const __m512i zero = _mm512_set1_epi32(0);
- const __m512i lsb = _mm512_set1_epi32(1);
- const __m512i lsbyte_msk = _mm512_set1_epi32(0xff);
- __m512i tmp1, tmp2, res_msk;
- __m256i tmp256;
- /* used to mask gather values if size is 1/2 (8/16 bit next hops) */
+ const __m512i zero_64 = _mm512_set1_epi64(0);
+ const __m512i lsb_64 = _mm512_set1_epi64(1);
+ const __m512i lsbyte_msk_64 = _mm512_set1_epi64(0xff);
+ __m512i res_msk_64, vrf64, idxes_64, res, bytes_64;
+ __mmask8 msk_ext_64;
+
if (size == sizeof(uint8_t))
- res_msk = _mm512_set1_epi32(UINT8_MAX);
+ res_msk_64 = _mm512_set1_epi64(UINT8_MAX);
else if (size == sizeof(uint16_t))
- res_msk = _mm512_set1_epi32(UINT16_MAX);
+ res_msk_64 = _mm512_set1_epi64(UINT16_MAX);
+ else if (size == sizeof(uint32_t))
+ res_msk_64 = _mm512_set1_epi64(UINT32_MAX);
- ip_vec = _mm512_loadu_si512(ips);
- if (be_addr) {
- const __m512i bswap32 = _mm512_set_epi32(
- 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203,
- 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203,
- 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203,
- 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203
- );
- ip_vec = _mm512_shuffle_epi8(ip_vec, bswap32);
+ vrf64 = _mm512_cvtepu32_epi64(vrf32_256);
+
+ /* Compute index: (vrf_id << 24) + (ip >> 8) using 64-bit arithmetic to avoid 32-bit overflow */
+ idxes_64 = _mm512_slli_epi64(vrf64, 24);
+ idxes_64 = _mm512_add_epi64(idxes_64, _mm512_cvtepu32_epi64(
+ _mm256_srli_epi32(ip_vec_256, 8)));
+
+ /* lookup in tbl24 */
+ if (size == sizeof(uint8_t)) {
+ res = _mm512_i64gather_epi64(idxes_64, (const void *)dp->tbl24, 1);
+ res = _mm512_and_epi64(res, res_msk_64);
+ } else if (size == sizeof(uint16_t)) {
+ res = _mm512_i64gather_epi64(idxes_64, (const void *)dp->tbl24, 2);
+ res = _mm512_and_epi64(res, res_msk_64);
+ } else {
+ res = _mm512_i64gather_epi64(idxes_64, (const void *)dp->tbl24, 4);
+ res = _mm512_and_epi64(res, res_msk_64);
+ }
+
+ /* get extended entries indexes */
+ msk_ext_64 = _mm512_test_epi64_mask(res, lsb_64);
+
+ if (msk_ext_64 != 0) {
+ bytes_64 = _mm512_cvtepu32_epi64(ip_vec_256);
+ idxes_64 = _mm512_srli_epi64(res, 1);
+ idxes_64 = _mm512_slli_epi64(idxes_64, 8);
+ bytes_64 = _mm512_and_epi64(bytes_64, lsbyte_msk_64);
+ idxes_64 = _mm512_maskz_add_epi64(msk_ext_64, idxes_64, bytes_64);
+
+ if (size == sizeof(uint8_t))
+ idxes_64 = _mm512_mask_i64gather_epi64(zero_64, msk_ext_64,
+ idxes_64, (const void *)dp->tbl8, 1);
+ else if (size == sizeof(uint16_t))
+ idxes_64 = _mm512_mask_i64gather_epi64(zero_64, msk_ext_64,
+ idxes_64, (const void *)dp->tbl8, 2);
+ else
+ idxes_64 = _mm512_mask_i64gather_epi64(zero_64, msk_ext_64,
+ idxes_64, (const void *)dp->tbl8, 4);
+
+ res = _mm512_mask_blend_epi64(msk_ext_64, res, idxes_64);
}
- /* mask 24 most significant bits */
- idxes = _mm512_srli_epi32(ip_vec, 8);
+ res = _mm512_srli_epi64(res, 1);
+ _mm512_storeu_si512(next_hops, res);
+}
+
+static __rte_always_inline void
+dir24_8_vec_lookup_x16_32b_path(struct dir24_8_tbl *dp, __m512i ip_vec,
+ __m512i idxes, uint64_t *next_hops, int size)
+{
+ __mmask16 msk_ext;
+ const __mmask16 exp_msk = 0x5555;
+ const __m512i zero_32 = _mm512_set1_epi32(0);
+ const __m512i lsb_32 = _mm512_set1_epi32(1);
+ const __m512i lsbyte_msk_32 = _mm512_set1_epi32(0xff);
+ __m512i res, bytes, tmp1, tmp2;
+ __m256i tmp256;
+ __m512i res_msk_32;
+
+ if (size == sizeof(uint8_t))
+ res_msk_32 = _mm512_set1_epi32(UINT8_MAX);
+ else if (size == sizeof(uint16_t))
+ res_msk_32 = _mm512_set1_epi32(UINT16_MAX);
- /**
+ /*
* lookup in tbl24
* Put it inside branch to make compiler happy with -O0
*/
if (size == sizeof(uint8_t)) {
res = _mm512_i32gather_epi32(idxes, (const int *)dp->tbl24, 1);
- res = _mm512_and_epi32(res, res_msk);
+ res = _mm512_and_epi32(res, res_msk_32);
} else if (size == sizeof(uint16_t)) {
res = _mm512_i32gather_epi32(idxes, (const int *)dp->tbl24, 2);
- res = _mm512_and_epi32(res, res_msk);
- } else
+ res = _mm512_and_epi32(res, res_msk_32);
+ } else {
res = _mm512_i32gather_epi32(idxes, (const int *)dp->tbl24, 4);
+ }
/* get extended entries indexes */
- msk_ext = _mm512_test_epi32_mask(res, lsb);
+ msk_ext = _mm512_test_epi32_mask(res, lsb_32);
if (msk_ext != 0) {
idxes = _mm512_srli_epi32(res, 1);
idxes = _mm512_slli_epi32(idxes, 8);
- bytes = _mm512_and_epi32(ip_vec, lsbyte_msk);
+ bytes = _mm512_and_epi32(ip_vec, lsbyte_msk_32);
idxes = _mm512_maskz_add_epi32(msk_ext, idxes, bytes);
if (size == sizeof(uint8_t)) {
- idxes = _mm512_mask_i32gather_epi32(zero, msk_ext,
+ idxes = _mm512_mask_i32gather_epi32(zero_32, msk_ext,
idxes, (const int *)dp->tbl8, 1);
- idxes = _mm512_and_epi32(idxes, res_msk);
+ idxes = _mm512_and_epi32(idxes, res_msk_32);
} else if (size == sizeof(uint16_t)) {
- idxes = _mm512_mask_i32gather_epi32(zero, msk_ext,
+ idxes = _mm512_mask_i32gather_epi32(zero_32, msk_ext,
idxes, (const int *)dp->tbl8, 2);
- idxes = _mm512_and_epi32(idxes, res_msk);
- } else
- idxes = _mm512_mask_i32gather_epi32(zero, msk_ext,
+ idxes = _mm512_and_epi32(idxes, res_msk_32);
+ } else {
+ idxes = _mm512_mask_i32gather_epi32(zero_32, msk_ext,
idxes, (const int *)dp->tbl8, 4);
+ }
res = _mm512_mask_blend_epi32(msk_ext, res, idxes);
}
@@ -86,16 +143,74 @@ dir24_8_vec_lookup_x16(void *p, const uint32_t *ips,
_mm512_storeu_si512(next_hops + 8, tmp2);
}
+/* Unified x16 lookup: vrf_scale selects the index-computation path, just as be_addr selects byte swapping */
+static __rte_always_inline void
+dir24_8_vec_lookup_x16(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
+ uint64_t *next_hops, int size, bool be_addr, enum vrf_scale vrf_scale)
+{
+ struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
+ __m512i ip_vec, idxes;
+ __m256i ip_vec_256, vrf32_256;
+
+ ip_vec = _mm512_loadu_si512(ips);
+ if (be_addr) {
+ const __m512i bswap32 = _mm512_set_epi32(
+ 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203,
+ 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203,
+ 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203,
+ 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203
+ );
+ ip_vec = _mm512_shuffle_epi8(ip_vec, bswap32);
+ }
+
+ if (vrf_scale == VRF_SCALE_SINGLE) {
+ /* mask 24 most significant bits */
+ idxes = _mm512_srli_epi32(ip_vec, 8);
+ dir24_8_vec_lookup_x16_32b_path(dp, ip_vec, idxes, next_hops, size);
+ } else if (vrf_scale == VRF_SCALE_SMALL) {
+ /* For < 256 VRFs: (vrf_id << 24) + (ip >> 8) fits in 32-bit indices */
+ __m512i vrf32;
+ uint32_t i;
+
+ for (i = 0; i < 16; i++)
+ RTE_ASSERT(vrf_ids[i] < dp->num_vrfs);
+
+ vrf32 = _mm512_cvtepu16_epi32(_mm256_loadu_si256((const void *)vrf_ids));
+
+ /* mask 24 most significant bits */
+ idxes = _mm512_srli_epi32(ip_vec, 8);
+ idxes = _mm512_add_epi32(idxes, _mm512_slli_epi32(vrf32, 24));
+ dir24_8_vec_lookup_x16_32b_path(dp, ip_vec, idxes, next_hops, size);
+ } else {
+ /* For >= 256 VRFs: use 64-bit indices to avoid overflow */
+ uint32_t i;
+
+ for (i = 0; i < 16; i++)
+ RTE_ASSERT(vrf_ids[i] < dp->num_vrfs);
+
+ /* Extract first 8 IPs and VRF IDs */
+ ip_vec_256 = _mm512_castsi512_si256(ip_vec);
+ vrf32_256 = _mm256_cvtepu16_epi32(_mm_loadu_si128((const void *)vrf_ids));
+ dir24_8_vec_lookup_x8_64b_path(dp, ip_vec_256, vrf32_256, next_hops, size);
+
+ /* Process next 8 IPs from the second half of the vector */
+ ip_vec_256 = _mm512_extracti32x8_epi32(ip_vec, 1);
+ vrf32_256 = _mm256_cvtepu16_epi32(_mm_loadu_si128((const void *)(vrf_ids + 8)));
+ dir24_8_vec_lookup_x8_64b_path(dp, ip_vec_256, vrf32_256, next_hops + 8, size);
+ }
+}
+
+/* Unified x8 lookup for 8-byte next hops, dispatching on vrf_scale */
static __rte_always_inline void
-dir24_8_vec_lookup_x8_8b(void *p, const uint32_t *ips,
- uint64_t *next_hops, bool be_addr)
+dir24_8_vec_lookup_x8_8b(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, bool be_addr, enum vrf_scale vrf_scale)
{
struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
- const __m512i zero = _mm512_set1_epi32(0);
- const __m512i lsbyte_msk = _mm512_set1_epi64(0xff);
- const __m512i lsb = _mm512_set1_epi64(1);
+ const __m512i zero_64 = _mm512_set1_epi64(0);
+ const __m512i lsbyte_msk_64 = _mm512_set1_epi64(0xff);
+ const __m512i lsb_64 = _mm512_set1_epi64(1);
__m512i res, idxes, bytes;
- __m256i idxes_256, ip_vec;
+ __m256i ip_vec, vrf32_256;
__mmask8 msk_ext;
ip_vec = _mm256_loadu_si256((const void *)ips);
@@ -106,66 +221,207 @@ dir24_8_vec_lookup_x8_8b(void *p, const uint32_t *ips,
);
ip_vec = _mm256_shuffle_epi8(ip_vec, bswap32);
}
- /* mask 24 most significant bits */
- idxes_256 = _mm256_srli_epi32(ip_vec, 8);
- /* lookup in tbl24 */
- res = _mm512_i32gather_epi64(idxes_256, (const void *)dp->tbl24, 8);
+ if (vrf_scale == VRF_SCALE_SINGLE) {
+ /* For single VRF: use 32-bit indices without vrf_ids */
+ __m256i idxes_256;
- /* get extended entries indexes */
- msk_ext = _mm512_test_epi64_mask(res, lsb);
+ /* mask 24 most significant bits */
+ idxes_256 = _mm256_srli_epi32(ip_vec, 8);
- if (msk_ext != 0) {
- bytes = _mm512_cvtepi32_epi64(ip_vec);
- idxes = _mm512_srli_epi64(res, 1);
- idxes = _mm512_slli_epi64(idxes, 8);
- bytes = _mm512_and_epi64(bytes, lsbyte_msk);
- idxes = _mm512_maskz_add_epi64(msk_ext, idxes, bytes);
- idxes = _mm512_mask_i64gather_epi64(zero, msk_ext, idxes,
- (const void *)dp->tbl8, 8);
-
- res = _mm512_mask_blend_epi64(msk_ext, res, idxes);
- }
+ /* lookup in tbl24 */
+ res = _mm512_i32gather_epi64(idxes_256, (const void *)dp->tbl24, 8);
- res = _mm512_srli_epi64(res, 1);
- _mm512_storeu_si512(next_hops, res);
+ /* get extended entries indexes */
+ msk_ext = _mm512_test_epi64_mask(res, lsb_64);
+
+ if (msk_ext != 0) {
+ bytes = _mm512_cvtepu32_epi64(ip_vec);
+ idxes = _mm512_srli_epi64(res, 1);
+ idxes = _mm512_slli_epi64(idxes, 8);
+ bytes = _mm512_and_epi64(bytes, lsbyte_msk_64);
+ idxes = _mm512_maskz_add_epi64(msk_ext, idxes, bytes);
+ idxes = _mm512_mask_i64gather_epi64(zero_64, msk_ext, idxes,
+ (const void *)dp->tbl8, 8);
+
+ res = _mm512_mask_blend_epi64(msk_ext, res, idxes);
+ }
+
+ res = _mm512_srli_epi64(res, 1);
+ _mm512_storeu_si512(next_hops, res);
+ } else if (vrf_scale == VRF_SCALE_SMALL) {
+ /* For < 256 VRFs: use 32-bit indices with 32-bit shift */
+ __m256i idxes_256;
+ uint32_t i;
+
+ for (i = 0; i < 8; i++)
+ RTE_ASSERT(vrf_ids[i] < dp->num_vrfs);
+
+ /* mask 24 most significant bits */
+ idxes_256 = _mm256_srli_epi32(ip_vec, 8);
+ vrf32_256 = _mm256_cvtepu16_epi32(_mm_loadu_si128((const void *)vrf_ids));
+ idxes_256 = _mm256_add_epi32(idxes_256, _mm256_slli_epi32(vrf32_256, 24));
+
+ /* lookup in tbl24 */
+ res = _mm512_i32gather_epi64(idxes_256, (const void *)dp->tbl24, 8);
+
+ /* get extended entries indexes */
+ msk_ext = _mm512_test_epi64_mask(res, lsb_64);
+
+ if (msk_ext != 0) {
+ bytes = _mm512_cvtepu32_epi64(ip_vec);
+ idxes = _mm512_srli_epi64(res, 1);
+ idxes = _mm512_slli_epi64(idxes, 8);
+ bytes = _mm512_and_epi64(bytes, lsbyte_msk_64);
+ idxes = _mm512_maskz_add_epi64(msk_ext, idxes, bytes);
+ idxes = _mm512_mask_i64gather_epi64(zero_64, msk_ext, idxes,
+ (const void *)dp->tbl8, 8);
+
+ res = _mm512_mask_blend_epi64(msk_ext, res, idxes);
+ }
+
+ res = _mm512_srli_epi64(res, 1);
+ _mm512_storeu_si512(next_hops, res);
+ } else {
+ /* For >= 256 VRFs: use 64-bit indices to avoid overflow */
+ uint32_t i;
+
+ for (i = 0; i < 8; i++)
+ RTE_ASSERT(vrf_ids[i] < dp->num_vrfs);
+
+ vrf32_256 = _mm256_cvtepu16_epi32(_mm_loadu_si128((const void *)vrf_ids));
+ __m512i vrf64 = _mm512_cvtepu32_epi64(vrf32_256);
+
+ /* Compute index: (vrf_id << 24) + (ip >> 8) using 64-bit arithmetic */
+ idxes = _mm512_slli_epi64(vrf64, 24);
+ idxes = _mm512_add_epi64(idxes, _mm512_cvtepu32_epi64(
+ _mm256_srli_epi32(ip_vec, 8)));
+
+ /* lookup in tbl24 with 64-bit gather */
+ res = _mm512_i64gather_epi64(idxes, (const void *)dp->tbl24, 8);
+
+ /* get extended entries indexes */
+ msk_ext = _mm512_test_epi64_mask(res, lsb_64);
+
+ if (msk_ext != 0) {
+ bytes = _mm512_cvtepu32_epi64(ip_vec);
+ idxes = _mm512_srli_epi64(res, 1);
+ idxes = _mm512_slli_epi64(idxes, 8);
+ bytes = _mm512_and_epi64(bytes, lsbyte_msk_64);
+ idxes = _mm512_maskz_add_epi64(msk_ext, idxes, bytes);
+ idxes = _mm512_mask_i64gather_epi64(zero_64, msk_ext, idxes,
+ (const void *)dp->tbl8, 8);
+
+ res = _mm512_mask_blend_epi64(msk_ext, res, idxes);
+ }
+
+ res = _mm512_srli_epi64(res, 1);
+ _mm512_storeu_si512(next_hops, res);
+ }
}
-#define DECLARE_VECTOR_FN(suffix, nh_type, be_addr) \
+#define DECLARE_VECTOR_FN(suffix, scalar_suffix, nh_type, be_addr, vrf_scale) \
void \
-rte_dir24_8_vec_lookup_bulk_##suffix(void *p, const uint32_t *ips, uint64_t *next_hops, \
- const unsigned int n) \
+rte_dir24_8_vec_lookup_bulk_##suffix(void *p, const uint16_t *vrf_ids, \
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n) \
{ \
uint32_t i; \
for (i = 0; i < (n / 16); i++) \
- dir24_8_vec_lookup_x16(p, ips + i * 16, next_hops + i * 16, sizeof(nh_type), \
- be_addr); \
- dir24_8_lookup_bulk_##suffix(p, ips + i * 16, next_hops + i * 16, n - i * 16); \
+ dir24_8_vec_lookup_x16(p, vrf_ids + i * 16, ips + i * 16, \
+ next_hops + i * 16, sizeof(nh_type), be_addr, vrf_scale); \
+ dir24_8_lookup_bulk_##scalar_suffix(p, vrf_ids + i * 16, ips + i * 16, \
+ next_hops + i * 16, n - i * 16); \
+}
+
+DECLARE_VECTOR_FN(1b, 1b, uint8_t, false, VRF_SCALE_SINGLE)
+DECLARE_VECTOR_FN(1b_be, 1b_be, uint8_t, true, VRF_SCALE_SINGLE)
+DECLARE_VECTOR_FN(2b, 2b, uint16_t, false, VRF_SCALE_SINGLE)
+DECLARE_VECTOR_FN(2b_be, 2b_be, uint16_t, true, VRF_SCALE_SINGLE)
+DECLARE_VECTOR_FN(4b, 4b, uint32_t, false, VRF_SCALE_SINGLE)
+DECLARE_VECTOR_FN(4b_be, 4b_be, uint32_t, true, VRF_SCALE_SINGLE)
+
+DECLARE_VECTOR_FN(vrf_1b, vrf_1b, uint8_t, false, VRF_SCALE_SMALL)
+DECLARE_VECTOR_FN(vrf_1b_be, vrf_1b_be, uint8_t, true, VRF_SCALE_SMALL)
+DECLARE_VECTOR_FN(vrf_2b, vrf_2b, uint16_t, false, VRF_SCALE_SMALL)
+DECLARE_VECTOR_FN(vrf_2b_be, vrf_2b_be, uint16_t, true, VRF_SCALE_SMALL)
+DECLARE_VECTOR_FN(vrf_4b, vrf_4b, uint32_t, false, VRF_SCALE_SMALL)
+DECLARE_VECTOR_FN(vrf_4b_be, vrf_4b_be, uint32_t, true, VRF_SCALE_SMALL)
+
+DECLARE_VECTOR_FN(vrf_1b_large, vrf_1b, uint8_t, false, VRF_SCALE_LARGE)
+DECLARE_VECTOR_FN(vrf_1b_be_large, vrf_1b_be, uint8_t, true, VRF_SCALE_LARGE)
+DECLARE_VECTOR_FN(vrf_2b_large, vrf_2b, uint16_t, false, VRF_SCALE_LARGE)
+DECLARE_VECTOR_FN(vrf_2b_be_large, vrf_2b_be, uint16_t, true, VRF_SCALE_LARGE)
+DECLARE_VECTOR_FN(vrf_4b_large, vrf_4b, uint32_t, false, VRF_SCALE_LARGE)
+DECLARE_VECTOR_FN(vrf_4b_be_large, vrf_4b_be, uint32_t, true, VRF_SCALE_LARGE)
+
+void
+rte_dir24_8_vec_lookup_bulk_8b(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
+{
+ uint32_t i;
+ for (i = 0; i < (n / 8); i++)
+ dir24_8_vec_lookup_x8_8b(p, vrf_ids + i * 8, ips + i * 8,
+ next_hops + i * 8, false, VRF_SCALE_SINGLE);
+ dir24_8_lookup_bulk_8b(p, vrf_ids + i * 8, ips + i * 8,
+ next_hops + i * 8, n - i * 8);
+}
+
+void
+rte_dir24_8_vec_lookup_bulk_8b_be(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
+{
+ uint32_t i;
+ for (i = 0; i < (n / 8); i++)
+ dir24_8_vec_lookup_x8_8b(p, vrf_ids + i * 8, ips + i * 8,
+ next_hops + i * 8, true, VRF_SCALE_SINGLE);
+ dir24_8_lookup_bulk_8b_be(p, vrf_ids + i * 8, ips + i * 8,
+ next_hops + i * 8, n - i * 8);
+}
+
+void
+rte_dir24_8_vec_lookup_bulk_vrf_8b(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
+{
+ uint32_t i;
+ for (i = 0; i < (n / 8); i++)
+ dir24_8_vec_lookup_x8_8b(p, vrf_ids + i * 8, ips + i * 8,
+ next_hops + i * 8, false, VRF_SCALE_SMALL);
+ dir24_8_lookup_bulk_vrf_8b(p, vrf_ids + i * 8, ips + i * 8,
+ next_hops + i * 8, n - i * 8);
}
-DECLARE_VECTOR_FN(1b, uint8_t, false)
-DECLARE_VECTOR_FN(1b_be, uint8_t, true)
-DECLARE_VECTOR_FN(2b, uint16_t, false)
-DECLARE_VECTOR_FN(2b_be, uint16_t, true)
-DECLARE_VECTOR_FN(4b, uint32_t, false)
-DECLARE_VECTOR_FN(4b_be, uint32_t, true)
+void
+rte_dir24_8_vec_lookup_bulk_vrf_8b_be(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
+{
+ uint32_t i;
+ for (i = 0; i < (n / 8); i++)
+ dir24_8_vec_lookup_x8_8b(p, vrf_ids + i * 8, ips + i * 8,
+ next_hops + i * 8, true, VRF_SCALE_SMALL);
+ dir24_8_lookup_bulk_vrf_8b_be(p, vrf_ids + i * 8, ips + i * 8,
+ next_hops + i * 8, n - i * 8);
+}
void
-rte_dir24_8_vec_lookup_bulk_8b(void *p, const uint32_t *ips,
- uint64_t *next_hops, const unsigned int n)
+rte_dir24_8_vec_lookup_bulk_vrf_8b_large(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
{
uint32_t i;
for (i = 0; i < (n / 8); i++)
- dir24_8_vec_lookup_x8_8b(p, ips + i * 8, next_hops + i * 8, false);
- dir24_8_lookup_bulk_8b(p, ips + i * 8, next_hops + i * 8, n - i * 8);
+ dir24_8_vec_lookup_x8_8b(p, vrf_ids + i * 8, ips + i * 8,
+ next_hops + i * 8, false, VRF_SCALE_LARGE);
+ dir24_8_lookup_bulk_vrf_8b(p, vrf_ids + i * 8, ips + i * 8,
+ next_hops + i * 8, n - i * 8);
}
void
-rte_dir24_8_vec_lookup_bulk_8b_be(void *p, const uint32_t *ips,
- uint64_t *next_hops, const unsigned int n)
+rte_dir24_8_vec_lookup_bulk_vrf_8b_be_large(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
{
uint32_t i;
for (i = 0; i < (n / 8); i++)
- dir24_8_vec_lookup_x8_8b(p, ips + i * 8, next_hops + i * 8, true);
- dir24_8_lookup_bulk_8b_be(p, ips + i * 8, next_hops + i * 8, n - i * 8);
+ dir24_8_vec_lookup_x8_8b(p, vrf_ids + i * 8, ips + i * 8,
+ next_hops + i * 8, true, VRF_SCALE_LARGE);
+ dir24_8_lookup_bulk_vrf_8b_be(p, vrf_ids + i * 8, ips + i * 8,
+ next_hops + i * 8, n - i * 8);
}
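[Reviewer note: the reason for the VRF_SCALE_SMALL/VRF_SCALE_LARGE split above is that the tbl24 index is (vrf_id << 24) + (ip >> 8), which overflows a 32-bit lane once vrf_id reaches 256. A scalar sketch of that boundary, with illustrative function names:]

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* Scalar view of the tbl24 index computed by the vector paths:
 * one 2^24-entry slice of tbl24 per VRF. */
static uint64_t tbl24_index(uint16_t vrf_id, uint32_t ip)
{
	return ((uint64_t)vrf_id << 24) + (ip >> 8);
}

/* A 32-bit lane can hold the index only while vrf_id < 256; this is
 * the VRF_SCALE_SMALL vs VRF_SCALE_LARGE decision point. */
static bool fits_in_32bit_lane(uint16_t vrf_id, uint32_t ip)
{
	return tbl24_index(vrf_id, ip) <= UINT32_MAX;
}
```

With vrf_id == 255 and ip == 0xFFFFFFFF the index is exactly UINT32_MAX, so 256 distinct VRFs is the most the 32-bit path can address.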
diff --git a/lib/fib/dir24_8_avx512.h b/lib/fib/dir24_8_avx512.h
index 3e2bbc2490..d42ef1d17f 100644
--- a/lib/fib/dir24_8_avx512.h
+++ b/lib/fib/dir24_8_avx512.h
@@ -6,35 +6,99 @@
#define _DIR248_AVX512_H_
void
-rte_dir24_8_vec_lookup_bulk_1b(void *p, const uint32_t *ips,
+rte_dir24_8_vec_lookup_bulk_1b(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_1b_be(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_vrf_1b(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
uint64_t *next_hops, const unsigned int n);
void
-rte_dir24_8_vec_lookup_bulk_2b(void *p, const uint32_t *ips,
+rte_dir24_8_vec_lookup_bulk_vrf_1b_be(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
uint64_t *next_hops, const unsigned int n);
void
-rte_dir24_8_vec_lookup_bulk_4b(void *p, const uint32_t *ips,
+rte_dir24_8_vec_lookup_bulk_vrf_1b_large(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_vrf_1b_be_large(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_2b(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_2b_be(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_vrf_2b(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
uint64_t *next_hops, const unsigned int n);
void
-rte_dir24_8_vec_lookup_bulk_8b(void *p, const uint32_t *ips,
+rte_dir24_8_vec_lookup_bulk_vrf_2b_be(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
uint64_t *next_hops, const unsigned int n);
void
-rte_dir24_8_vec_lookup_bulk_1b_be(void *p, const uint32_t *ips,
+rte_dir24_8_vec_lookup_bulk_vrf_2b_large(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_vrf_2b_be_large(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_4b(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_4b_be(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_vrf_4b(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
uint64_t *next_hops, const unsigned int n);
void
-rte_dir24_8_vec_lookup_bulk_2b_be(void *p, const uint32_t *ips,
+rte_dir24_8_vec_lookup_bulk_vrf_4b_be(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
uint64_t *next_hops, const unsigned int n);
void
-rte_dir24_8_vec_lookup_bulk_4b_be(void *p, const uint32_t *ips,
+rte_dir24_8_vec_lookup_bulk_vrf_4b_large(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_vrf_4b_be_large(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_8b(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_8b_be(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_vrf_8b(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
uint64_t *next_hops, const unsigned int n);
void
-rte_dir24_8_vec_lookup_bulk_8b_be(void *p, const uint32_t *ips,
+rte_dir24_8_vec_lookup_bulk_vrf_8b_be(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
uint64_t *next_hops, const unsigned int n);
+void
+rte_dir24_8_vec_lookup_bulk_vrf_8b_large(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
+void
+rte_dir24_8_vec_lookup_bulk_vrf_8b_be_large(void *p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
+
#endif /* _DIR248_AVX512_H_ */
diff --git a/lib/fib/rte_fib.c b/lib/fib/rte_fib.c
index 184210f380..efc0595a7f 100644
--- a/lib/fib/rte_fib.c
+++ b/lib/fib/rte_fib.c
@@ -14,12 +14,15 @@
#include <rte_string_fns.h>
#include <rte_tailq.h>
+#include <rte_debug.h>
#include <rte_rib.h>
#include <rte_fib.h>
#include "dir24_8.h"
#include "fib_log.h"
+#define FIB_MAX_LOOKUP_BULK 64U
+
RTE_LOG_REGISTER_DEFAULT(fib_logtype, INFO);
TAILQ_HEAD(rte_fib_list, rte_tailq_entry);
@@ -40,52 +43,61 @@ EAL_REGISTER_TAILQ(rte_fib_tailq)
struct rte_fib {
char name[RTE_FIB_NAMESIZE];
enum rte_fib_type type; /**< Type of FIB struct */
- unsigned int flags; /**< Flags */
- struct rte_rib *rib; /**< RIB helper datastructure */
+ uint16_t flags; /**< Flags */
+ uint16_t num_vrfs; /**< Number of VRFs */
+ struct rte_rib **ribs; /**< RIB helper datastructures per VRF */
void *dp; /**< pointer to the dataplane struct*/
rte_fib_lookup_fn_t lookup; /**< FIB lookup function */
rte_fib_modify_fn_t modify; /**< modify FIB datastructure */
- uint64_t def_nh;
+ uint64_t *def_nh; /**< Per-VRF default next hop array */
};
static void
-dummy_lookup(void *fib_p, const uint32_t *ips, uint64_t *next_hops,
- const unsigned int n)
+dummy_lookup(void *fib_p, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
{
unsigned int i;
struct rte_fib *fib = fib_p;
struct rte_rib_node *node;
+ struct rte_rib *rib;
for (i = 0; i < n; i++) {
- node = rte_rib_lookup(fib->rib, ips[i]);
+ RTE_ASSERT(vrf_ids[i] < fib->num_vrfs);
+ rib = rte_fib_vrf_get_rib(fib, vrf_ids[i]);
+ node = rte_rib_lookup(rib, ips[i]);
if (node != NULL)
rte_rib_get_nh(node, &next_hops[i]);
else
- next_hops[i] = fib->def_nh;
+ next_hops[i] = fib->def_nh[vrf_ids[i]];
}
}
static int
-dummy_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
- uint64_t next_hop, int op)
+dummy_modify(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip,
+ uint8_t depth, uint64_t next_hop, int op)
{
struct rte_rib_node *node;
+ struct rte_rib *rib;
if ((fib == NULL) || (depth > RTE_FIB_MAXDEPTH))
return -EINVAL;
- node = rte_rib_lookup_exact(fib->rib, ip, depth);
+ rib = rte_fib_vrf_get_rib(fib, vrf_id);
+ if (rib == NULL)
+ return -EINVAL;
+
+ node = rte_rib_lookup_exact(rib, ip, depth);
switch (op) {
case RTE_FIB_ADD:
if (node == NULL)
- node = rte_rib_insert(fib->rib, ip, depth);
+ node = rte_rib_insert(rib, ip, depth);
if (node == NULL)
return -rte_errno;
return rte_rib_set_nh(node, next_hop);
case RTE_FIB_DEL:
if (node == NULL)
return -ENOENT;
- rte_rib_remove(fib->rib, ip, depth);
+ rte_rib_remove(rib, ip, depth);
return 0;
}
return -EINVAL;
@@ -125,7 +137,7 @@ rte_fib_add(struct rte_fib *fib, uint32_t ip, uint8_t depth, uint64_t next_hop)
if ((fib == NULL) || (fib->modify == NULL) ||
(depth > RTE_FIB_MAXDEPTH))
return -EINVAL;
- return fib->modify(fib, ip, depth, next_hop, RTE_FIB_ADD);
+ return fib->modify(fib, 0, ip, depth, next_hop, RTE_FIB_ADD);
}
RTE_EXPORT_SYMBOL(rte_fib_delete)
@@ -135,7 +147,7 @@ rte_fib_delete(struct rte_fib *fib, uint32_t ip, uint8_t depth)
if ((fib == NULL) || (fib->modify == NULL) ||
(depth > RTE_FIB_MAXDEPTH))
return -EINVAL;
- return fib->modify(fib, ip, depth, 0, RTE_FIB_DEL);
+ return fib->modify(fib, 0, ip, depth, 0, RTE_FIB_DEL);
}
RTE_EXPORT_SYMBOL(rte_fib_lookup_bulk)
@@ -143,24 +155,73 @@ int
rte_fib_lookup_bulk(struct rte_fib *fib, uint32_t *ips,
uint64_t *next_hops, int n)
{
+ static const uint16_t zero_vrf_ids[FIB_MAX_LOOKUP_BULK];
+ unsigned int off = 0;
+ unsigned int total = (unsigned int)n;
+
FIB_RETURN_IF_TRUE(((fib == NULL) || (ips == NULL) ||
(next_hops == NULL) || (fib->lookup == NULL)), -EINVAL);
- fib->lookup(fib->dp, ips, next_hops, n);
+ while (off < total) {
+ unsigned int chunk = RTE_MIN(total - off,
+ FIB_MAX_LOOKUP_BULK);
+ fib->lookup(fib->dp, zero_vrf_ids, ips + off,
+ next_hops + off, chunk);
+ off += chunk;
+ }
+
+ return 0;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib_vrf_lookup_bulk, 26.07)
+int
+rte_fib_vrf_lookup_bulk(struct rte_fib *fib, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, int n)
+{
+ FIB_RETURN_IF_TRUE(((fib == NULL) || (vrf_ids == NULL) ||
+ (ips == NULL) || (next_hops == NULL) ||
+ (fib->lookup == NULL)), -EINVAL);
+
+ fib->lookup(fib->dp, vrf_ids, ips, next_hops, (unsigned int)n);
return 0;
}
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib_vrf_add, 26.07)
+int
+rte_fib_vrf_add(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip,
+ uint8_t depth, uint64_t next_hop)
+{
+ if ((fib == NULL) || (fib->modify == NULL) ||
+ (depth > RTE_FIB_MAXDEPTH))
+ return -EINVAL;
+ return fib->modify(fib, vrf_id, ip, depth, next_hop, RTE_FIB_ADD);
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib_vrf_delete, 26.07)
+int
+rte_fib_vrf_delete(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip,
+ uint8_t depth)
+{
+ if ((fib == NULL) || (fib->modify == NULL) ||
+ (depth > RTE_FIB_MAXDEPTH))
+ return -EINVAL;
+ return fib->modify(fib, vrf_id, ip, depth, 0, RTE_FIB_DEL);
+}
+
RTE_EXPORT_SYMBOL(rte_fib_create)
struct rte_fib *
rte_fib_create(const char *name, int socket_id, struct rte_fib_conf *conf)
{
char mem_name[RTE_FIB_NAMESIZE];
+ char rib_name[RTE_FIB_NAMESIZE];
int ret;
struct rte_fib *fib = NULL;
struct rte_rib *rib = NULL;
struct rte_tailq_entry *te;
struct rte_fib_list *fib_list;
struct rte_rib_conf rib_conf;
+ uint16_t num_vrfs;
+ uint16_t vrf;
/* Check user arguments. */
if ((name == NULL) || (conf == NULL) || (conf->max_routes < 0) ||
@@ -170,16 +231,42 @@ rte_fib_create(const char *name, int socket_id, struct rte_fib_conf *conf)
return NULL;
}
+ num_vrfs = (conf->max_vrfs == 0) ? 1 : conf->max_vrfs;
rib_conf.ext_sz = conf->rib_ext_sz;
rib_conf.max_nodes = conf->max_routes * 2;
- rib = rte_rib_create(name, socket_id, &rib_conf);
- if (rib == NULL) {
- FIB_LOG(ERR,
- "Can not allocate RIB %s", name);
+ struct rte_rib **ribs = rte_zmalloc_socket("FIB_RIBS",
+ num_vrfs * sizeof(*fib->ribs), RTE_CACHE_LINE_SIZE, socket_id);
+ if (ribs == NULL) {
+ FIB_LOG(ERR, "FIB %s RIB array allocation failed", name);
+ rte_errno = ENOMEM;
return NULL;
}
+ uint64_t *def_nh = rte_zmalloc_socket("FIB_DEF_NH",
+ num_vrfs * sizeof(*def_nh), RTE_CACHE_LINE_SIZE, socket_id);
+ if (def_nh == NULL) {
+ FIB_LOG(ERR, "FIB %s default nexthop array allocation failed", name);
+ rte_errno = ENOMEM;
+ rte_free(ribs);
+ return NULL;
+ }
+
+ for (vrf = 0; vrf < num_vrfs; vrf++) {
+ if (num_vrfs == 1)
+ snprintf(rib_name, sizeof(rib_name), "%s", name);
+ else
+ snprintf(rib_name, sizeof(rib_name), "%s_vrf%u", name, vrf);
+ rib = rte_rib_create(rib_name, socket_id, &rib_conf);
+ if (rib == NULL) {
+ FIB_LOG(ERR, "Can not allocate RIB %s", rib_name);
+ goto free_ribs;
+ }
+ ribs[vrf] = rib;
+ def_nh[vrf] = (conf->vrf_default_nh != NULL) ?
+ conf->vrf_default_nh[vrf] : conf->default_nh;
+ }
+
snprintf(mem_name, sizeof(mem_name), "FIB_%s", name);
fib_list = RTE_TAILQ_CAST(rte_fib_tailq.head, rte_fib_list);
@@ -215,11 +302,13 @@ rte_fib_create(const char *name, int socket_id, struct rte_fib_conf *conf)
goto free_te;
}
+ fib->num_vrfs = num_vrfs;
+ fib->ribs = ribs;
+ fib->def_nh = def_nh;
+
rte_strlcpy(fib->name, name, sizeof(fib->name));
- fib->rib = rib;
fib->type = conf->type;
fib->flags = conf->flags;
- fib->def_nh = conf->default_nh;
ret = init_dataplane(fib, socket_id, conf);
if (ret < 0) {
FIB_LOG(ERR,
@@ -242,8 +331,12 @@ rte_fib_create(const char *name, int socket_id, struct rte_fib_conf *conf)
rte_free(te);
exit:
rte_mcfg_tailq_write_unlock();
- rte_rib_free(rib);
+free_ribs:
+ for (vrf = 0; vrf < num_vrfs; vrf++)
+ rte_rib_free(ribs[vrf]);
+ rte_free(def_nh);
+ rte_free(ribs);
return NULL;
}
@@ -311,7 +404,13 @@ rte_fib_free(struct rte_fib *fib)
rte_mcfg_tailq_write_unlock();
free_dataplane(fib);
- rte_rib_free(fib->rib);
+ if (fib->ribs != NULL) {
+ uint16_t vrf;
+ for (vrf = 0; vrf < fib->num_vrfs; vrf++)
+ rte_rib_free(fib->ribs[vrf]);
+ }
+ rte_free(fib->ribs);
+ rte_free(fib->def_nh);
rte_free(fib);
rte_free(te);
}
@@ -327,7 +426,18 @@ RTE_EXPORT_SYMBOL(rte_fib_get_rib)
struct rte_rib *
rte_fib_get_rib(struct rte_fib *fib)
{
- return (fib == NULL) ? NULL : fib->rib;
+ return (fib == NULL || fib->ribs == NULL) ? NULL : fib->ribs[0];
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib_vrf_get_rib, 26.07)
+struct rte_rib *
+rte_fib_vrf_get_rib(struct rte_fib *fib, uint16_t vrf_id)
+{
+ if (fib == NULL || fib->ribs == NULL)
+ return NULL;
+ if (vrf_id >= fib->num_vrfs)
+ return NULL;
+ return fib->ribs[vrf_id];
}
RTE_EXPORT_SYMBOL(rte_fib_select_lookup)
diff --git a/lib/fib/rte_fib.h b/lib/fib/rte_fib.h
index b16a653535..883195c7d6 100644
--- a/lib/fib/rte_fib.h
+++ b/lib/fib/rte_fib.h
@@ -53,11 +53,11 @@ enum rte_fib_type {
};
/** Modify FIB function */
-typedef int (*rte_fib_modify_fn_t)(struct rte_fib *fib, uint32_t ip,
- uint8_t depth, uint64_t next_hop, int op);
+typedef int (*rte_fib_modify_fn_t)(struct rte_fib *fib, uint16_t vrf_id,
+ uint32_t ip, uint8_t depth, uint64_t next_hop, int op);
/** FIB bulk lookup function */
-typedef void (*rte_fib_lookup_fn_t)(void *fib, const uint32_t *ips,
- uint64_t *next_hops, const unsigned int n);
+typedef void (*rte_fib_lookup_fn_t)(void *fib, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
enum rte_fib_op {
RTE_FIB_ADD,
@@ -110,6 +110,10 @@ struct rte_fib_conf {
} dir24_8;
};
unsigned int flags; /**< Optional feature flags from RTE_FIB_F_* */
+ /** Number of VRFs to support (0 or 1 = single VRF for backward compat) */
+ uint16_t max_vrfs;
+ /** Per-VRF default nexthops (NULL = use default_nh for all) */
+ uint64_t *vrf_default_nh;
};
/** FIB RCU QSBR configuration structure. */
@@ -224,6 +228,71 @@ rte_fib_delete(struct rte_fib *fib, uint32_t ip, uint8_t depth);
int
rte_fib_lookup_bulk(struct rte_fib *fib, uint32_t *ips,
uint64_t *next_hops, int n);
+
+/**
+ * Add a route to the FIB with VRF ID.
+ *
+ * @param fib
+ * FIB object handle
+ * @param vrf_id
+ * VRF ID (0 to max_vrfs-1)
+ * @param ip
+ * IPv4 prefix address to be added to the FIB
+ * @param depth
+ * Prefix length
+ * @param next_hop
+ * Next hop to be added to the FIB
+ * @return
+ * 0 on success, negative value otherwise
+ */
+__rte_experimental
+int
+rte_fib_vrf_add(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip,
+ uint8_t depth, uint64_t next_hop);
+
+/**
+ * Delete a rule from the FIB with VRF ID.
+ *
+ * @param fib
+ * FIB object handle
+ * @param vrf_id
+ * VRF ID (0 to max_vrfs-1)
+ * @param ip
+ * IPv4 prefix address to be deleted from the FIB
+ * @param depth
+ * Prefix length
+ * @return
+ * 0 on success, negative value otherwise
+ */
+__rte_experimental
+int
+rte_fib_vrf_delete(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip,
+ uint8_t depth);
+
+/**
+ * Lookup multiple IP addresses in the FIB with per-packet VRF IDs.
+ *
+ * @param fib
+ * FIB object handle
+ * @param vrf_ids
+ * Array of VRF IDs
+ * @param ips
+ * Array of IPs to be looked up in the FIB
+ * @param next_hops
+ * Next hop of the most specific rule found for IP in the corresponding VRF.
+ * This is an array of eight byte values.
+ * If the lookup for a given IP fails, the corresponding element will
+ * contain the default nexthop value configured for that VRF.
+ * @param n
+ * Number of elements in the vrf_ids, ips (and next_hops) arrays to look up.
+ * @return
+ * -EINVAL for incorrect arguments, otherwise 0
+ */
+__rte_experimental
+int
+rte_fib_vrf_lookup_bulk(struct rte_fib *fib, const uint16_t *vrf_ids,
+ const uint32_t *ips, uint64_t *next_hops, int n);
+
/**
* Get pointer to the dataplane specific struct
*
@@ -237,7 +306,7 @@ void *
rte_fib_get_dp(struct rte_fib *fib);
/**
- * Get pointer to the RIB
+ * Get pointer to the RIB for VRF 0
*
* @param fib
* FIB object handle
@@ -248,6 +317,21 @@ rte_fib_get_dp(struct rte_fib *fib);
struct rte_rib *
rte_fib_get_rib(struct rte_fib *fib);
+/**
+ * Get pointer to the RIB for a specific VRF
+ *
+ * @param fib
+ * FIB object handle
+ * @param vrf_id
+ * VRF ID (0 to max_vrfs-1)
+ * @return
+ * Pointer to the RIB on success
+ * NULL otherwise
+ */
+__rte_experimental
+struct rte_rib *
+rte_fib_vrf_get_rib(struct rte_fib *fib, uint16_t vrf_id);
+
/**
* Set lookup function based on type
*
--
2.43.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [RFC PATCH 2/4] fib: add VRF functional and unit tests
2026-03-22 15:42 [RFC PATCH 0/4] VRF support in FIB library Vladimir Medvedkin
2026-03-22 15:42 ` [RFC PATCH 1/4] fib: add multi-VRF support Vladimir Medvedkin
@ 2026-03-22 15:42 ` Vladimir Medvedkin
2026-03-22 16:40 ` Stephen Hemminger
2026-03-22 16:41 ` Stephen Hemminger
2026-03-22 15:42 ` [RFC PATCH 3/4] fib6: add multi-VRF support Vladimir Medvedkin
` (5 subsequent siblings)
7 siblings, 2 replies; 33+ messages in thread
From: Vladimir Medvedkin @ 2026-03-22 15:42 UTC (permalink / raw)
To: dev; +Cc: rjarry, nsaxena16, mb, adwivedi, jerinjacobk
Add test coverage for the multi-VRF IPv4 FIB API.
The app/test-fib functional harness (enabled with the -V option)
exercises per-VRF route installation with VRF-encoded nexthops,
bulk VRF lookup, and nexthop validation across multiple VRFs.
The app/test unit tests cover VRF table creation, per-VRF
route add and delete, lookup correctness, inter-VRF isolation,
and all supported nexthop sizes.
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
app/test-fib/main.c | 167 +++++++++++++++++++++++--
app/test/test_fib.c | 298 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 454 insertions(+), 11 deletions(-)
diff --git a/app/test-fib/main.c b/app/test-fib/main.c
index dd1a6d7297..5593fdd47e 100644
--- a/app/test-fib/main.c
+++ b/app/test-fib/main.c
@@ -82,6 +82,7 @@ static struct {
uint32_t nb_routes_per_depth[128 + 1];
uint32_t flags;
uint32_t tbl8;
+ uint32_t nb_vrfs;
uint8_t ent_sz;
uint8_t rnd_lookup_ips_ratio;
uint8_t print_fract;
@@ -95,6 +96,7 @@ static struct {
.nb_routes_per_depth = {0},
.flags = FIB_V4_DIR_TYPE,
.tbl8 = DEFAULT_LPM_TBL8,
+ .nb_vrfs = 1,
.ent_sz = 4,
.rnd_lookup_ips_ratio = 0,
.print_fract = 10,
@@ -136,6 +138,45 @@ get_max_nh(uint8_t nh_sz)
(1ULL << 21) - 1);
}
+/* Get number of bits needed to encode VRF IDs */
+static __rte_always_inline __rte_pure uint8_t
+get_vrf_bits(uint32_t nb_vrfs)
+{
+ uint8_t bits = 0;
+ uint32_t vrfs = nb_vrfs > 1 ? nb_vrfs : 2; /* At least 1 bit */
+
+ /* Round up to next power of 2 */
+ vrfs--;
+ while (vrfs > 0) {
+ bits++;
+ vrfs >>= 1;
+ }
+ return bits;
+}
+
+/* Encode VRF ID into nexthop (high bits) */
+static __rte_always_inline uint64_t
+encode_vrf_nh(uint16_t vrf_id, uint64_t nh, uint8_t nh_sz)
+{
+ uint8_t vrf_bits = get_vrf_bits(config.nb_vrfs);
+ /* +1 to account for ext bit in nexthop */
+ uint8_t vrf_shift = bits_in_nh(nh_sz) - (vrf_bits + 1);
+ uint64_t nh_mask = (1ULL << vrf_shift) - 1;
+
+ return (nh & nh_mask) | ((uint64_t)vrf_id << vrf_shift);
+}
+
+/* Decode VRF ID from nexthop (high bits) */
+static __rte_always_inline uint16_t
+decode_vrf_nh(uint64_t nh, uint8_t nh_sz)
+{
+ uint8_t vrf_bits = get_vrf_bits(config.nb_vrfs);
+ /* +1 to account for ext bit in nexthop */
+ uint8_t vrf_shift = bits_in_nh(nh_sz) - (vrf_bits + 1);
+
+ return (uint16_t)(nh >> vrf_shift);
+}
+
static int
get_fib_type(void)
{
@@ -629,7 +670,8 @@ print_usage(void)
"[-v <type of lookup function:"
"\ts1, s2, s3 (3 types of scalar), v (vector) -"
" for DIR24_8 based FIB\n"
- "\ts, v - for TRIE based ipv6 FIB>]\n",
+ "\ts, v - for TRIE based ipv6 FIB>]\n"
+ "[-V <number of VRFs (default 1)>]\n",
config.prgname);
}
@@ -652,6 +694,11 @@ check_config(void)
return -1;
}
+ if ((config.flags & CMP_FLAG) && (config.nb_vrfs > 1)) {
+ printf("-c option can not be used with -V > 1\n");
+ return -1;
+ }
+
if (!((config.ent_sz == 1) || (config.ent_sz == 2) ||
(config.ent_sz == 4) || (config.ent_sz == 8))) {
printf("wrong -e option %d, can be 1 or 2 or 4 or 8\n",
@@ -663,6 +710,24 @@ check_config(void)
printf("-e 1 is valid only for ipv4\n");
return -1;
}
+
+ /*
+ * For multi-VRF mode, VRF IDs are packed into the high bits of the
+ * nexthop for validation. Ensure the encoding leaves room for the
+ * nexthop value itself and for the ext_ent flag bit.
+ */
+ if ((config.nb_vrfs > 1) && !(config.flags & IPV6_FLAG)) {
+ uint8_t nh_sz = rte_ctz32(config.ent_sz);
+ uint8_t vrf_bits = get_vrf_bits(config.nb_vrfs);
+ /* - 2 to leave at least 1 bit for nexthop and 1 bit for ext_ent flag */
+ if (vrf_bits >= bits_in_nh(nh_sz) - 2) {
+ printf("%u VRFs cannot be encoded in a %u-byte nexthop; "
+ "use a wider -e entry size\n",
+ config.nb_vrfs, config.ent_sz);
+ return -1;
+ }
+ }
return 0;
}
@@ -672,7 +737,7 @@ parse_opts(int argc, char **argv)
int opt;
char *endptr;
- while ((opt = getopt(argc, argv, "f:t:n:d:l:r:c6ab:e:g:w:u:sv:")) !=
+ while ((opt = getopt(argc, argv, "f:t:n:d:l:r:c6ab:e:g:w:u:sv:V:")) !=
-1) {
switch (opt) {
case 'f':
@@ -781,6 +846,16 @@ parse_opts(int argc, char **argv)
}
print_usage();
rte_exit(-EINVAL, "Invalid option -v %s\n", optarg);
+ case 'V':
+ errno = 0;
+ config.nb_vrfs = strtoul(optarg, &endptr, 10);
+ /* VRF IDs are uint16_t, max valid VRF is 65535 */
+ if ((errno != 0) || (config.nb_vrfs == 0) ||
+ (config.nb_vrfs > UINT16_MAX)) {
+ print_usage();
+ rte_exit(-EINVAL, "Invalid option -V: must be 1..65535\n");
+ }
+ break;
default:
print_usage();
rte_exit(-EINVAL, "Invalid options\n");
@@ -820,6 +895,7 @@ run_v4(void)
{
uint64_t start, acc;
uint64_t def_nh = 0;
+ uint8_t nh_sz = rte_ctz32(config.ent_sz);
struct rte_fib *fib;
struct rte_fib_conf conf = {0};
struct rt_rule_4 *rt;
@@ -830,6 +906,7 @@ run_v4(void)
uint32_t *tbl4 = config.lookup_tbl;
uint64_t fib_nh[BURST_SZ];
uint32_t lpm_nh[BURST_SZ];
+ uint16_t *vrf_ids = NULL;
rt = (struct rt_rule_4 *)config.rt;
@@ -843,16 +920,38 @@ run_v4(void)
return ret;
}
+ /* Allocate VRF IDs array for lookups if using multiple VRFs */
+ if (config.nb_vrfs > 1) {
+ vrf_ids = rte_malloc(NULL, sizeof(uint16_t) * config.nb_lookup_ips, 0);
+ if (vrf_ids == NULL) {
+ printf("Can not alloc VRF IDs array\n");
+ return -ENOMEM;
+ }
+ /* Generate random VRF IDs for each lookup */
+ for (i = 0; i < config.nb_lookup_ips; i++)
+ vrf_ids[i] = rte_rand() % config.nb_vrfs;
+ }
+
conf.type = get_fib_type();
conf.default_nh = def_nh;
conf.max_routes = config.nb_routes * 2;
conf.rib_ext_sz = 0;
+ conf.max_vrfs = config.nb_vrfs;
if (conf.type == RTE_FIB_DIR24_8) {
conf.dir24_8.nh_sz = rte_ctz32(config.ent_sz);
conf.dir24_8.num_tbl8 = RTE_MIN(config.tbl8,
get_max_nh(conf.dir24_8.nh_sz));
}
+ conf.vrf_default_nh = rte_malloc(NULL, conf.max_vrfs * sizeof(uint64_t), 0);
+ if (conf.vrf_default_nh == NULL) {
+ printf("Can not alloc VRF default nexthops array\n");
+ rte_free(vrf_ids);
+ return -ENOMEM;
+ }
+ for (i = 0; i < conf.max_vrfs; i++)
+ conf.vrf_default_nh[i] = encode_vrf_nh(i, def_nh, nh_sz);
+
fib = rte_fib_create("test", -1, &conf);
if (fib == NULL) {
printf("Can not alloc FIB, err %d\n", rte_errno);
@@ -883,12 +982,27 @@ run_v4(void)
for (k = config.print_fract, i = 0; k > 0; k--) {
start = rte_rdtsc_precise();
for (j = 0; j < (config.nb_routes - i) / k; j++) {
- ret = rte_fib_add(fib, rt[i + j].addr, rt[i + j].depth,
- rt[i + j].nh);
- if (unlikely(ret != 0)) {
- printf("Can not add a route to FIB, err %d\n",
- ret);
- return -ret;
+ uint32_t idx = i + j;
+ if (config.nb_vrfs > 1) {
+ uint16_t vrf_id;
+ for (vrf_id = 0; vrf_id < config.nb_vrfs; vrf_id++) {
+ uint64_t nh = encode_vrf_nh(vrf_id, rt[idx].nh, nh_sz);
+ ret = rte_fib_vrf_add(fib, vrf_id, rt[idx].addr,
+ rt[idx].depth, nh);
+ if (unlikely(ret != 0)) {
+ printf("Can not add a route to FIB, err %d\n",
+ ret);
+ return -ret;
+ }
+ }
+ } else {
+ ret = rte_fib_add(fib, rt[idx].addr, rt[idx].depth,
+ rt[idx].nh);
+ if (unlikely(ret != 0)) {
+ printf("Can not add a route to FIB, err %d\n",
+ ret);
+ return -ret;
+ }
}
}
printf("AVG FIB add %"PRIu64"\n",
@@ -928,14 +1042,33 @@ run_v4(void)
acc = 0;
for (i = 0; i < config.nb_lookup_ips; i += BURST_SZ) {
start = rte_rdtsc_precise();
- ret = rte_fib_lookup_bulk(fib, tbl4 + i, fib_nh, BURST_SZ);
+ if (config.nb_vrfs > 1)
+ ret = rte_fib_vrf_lookup_bulk(fib, vrf_ids + i,
+ tbl4 + i, fib_nh, BURST_SZ);
+ else
+ ret = rte_fib_lookup_bulk(fib, tbl4 + i, fib_nh,
+ BURST_SZ);
acc += rte_rdtsc_precise() - start;
if (ret != 0) {
printf("FIB lookup fails, err %d\n", ret);
return -ret;
}
+ /* Validate VRF IDs in returned nexthops */
+ if (config.nb_vrfs > 1) {
+ for (j = 0; j < BURST_SZ; j++) {
+ uint16_t returned_vrf = decode_vrf_nh(fib_nh[j], nh_sz);
+ if (returned_vrf != vrf_ids[i + j]) {
+ printf("VRF validation failed: "
+ "expected VRF %u, got %u\n",
+ vrf_ids[i + j], returned_vrf);
+ return -1;
+ }
+ }
+ }
}
printf("AVG FIB lookup %.1f\n", (double)acc / (double)i);
+ if (config.nb_vrfs > 1)
+ printf("VRF validation passed\n");
if (config.flags & CMP_FLAG) {
acc = 0;
@@ -970,8 +1103,17 @@ run_v4(void)
for (k = config.print_fract, i = 0; k > 0; k--) {
start = rte_rdtsc_precise();
- for (j = 0; j < (config.nb_routes - i) / k; j++)
- rte_fib_delete(fib, rt[i + j].addr, rt[i + j].depth);
+ for (j = 0; j < (config.nb_routes - i) / k; j++) {
+ uint32_t idx = i + j;
+ if (config.nb_vrfs > 1) {
+ uint16_t vrf_id;
+ for (vrf_id = 0; vrf_id < config.nb_vrfs; vrf_id++)
+ rte_fib_vrf_delete(fib, vrf_id, rt[idx].addr,
+ rt[idx].depth);
+ } else {
+ rte_fib_delete(fib, rt[idx].addr, rt[idx].depth);
+ }
+ }
printf("AVG FIB delete %"PRIu64"\n",
(rte_rdtsc_precise() - start) / j);
@@ -991,6 +1133,9 @@ run_v4(void)
}
}
+ rte_free(vrf_ids);
+ rte_free(conf.vrf_default_nh);
+
return 0;
}
diff --git a/app/test/test_fib.c b/app/test/test_fib.c
index bd73399d56..6a5d9836de 100644
--- a/app/test/test_fib.c
+++ b/app/test/test_fib.c
@@ -24,6 +24,11 @@ static int32_t test_get_invalid(void);
static int32_t test_lookup(void);
static int32_t test_invalid_rcu(void);
static int32_t test_fib_rcu_sync_rw(void);
+static int32_t test_create_vrf(void);
+static int32_t test_vrf_add_del(void);
+static int32_t test_vrf_lookup(void);
+static int32_t test_vrf_isolation(void);
+static int32_t test_vrf_all_nh_sizes(void);
#define MAX_ROUTES (1 << 16)
#define MAX_TBL8 (1 << 15)
@@ -588,6 +593,294 @@ test_fib_rcu_sync_rw(void)
return status == 0 ? TEST_SUCCESS : TEST_FAILED;
}
+/*
+ * Test VRF creation and basic operations
+ */
+static int32_t
+test_create_vrf(void)
+{
+ struct rte_fib *fib = NULL;
+ struct rte_fib_conf config = { 0 };
+ uint64_t def_nh = 100;
+ uint64_t vrf_def_nh[4] = {100, 200, 300, 400};
+
+ config.max_routes = MAX_ROUTES;
+ config.rib_ext_sz = 0;
+ config.default_nh = def_nh;
+ config.type = RTE_FIB_DIR24_8;
+ config.dir24_8.nh_sz = RTE_FIB_DIR24_8_4B;
+ config.dir24_8.num_tbl8 = MAX_TBL8;
+
+ /* Test single VRF (backward compat) */
+ config.max_vrfs = 0;
+ config.vrf_default_nh = NULL;
+ fib = rte_fib_create(__func__, SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB with max_vrfs=0\n");
+ rte_fib_free(fib);
+
+ /* Test single VRF explicitly */
+ config.max_vrfs = 1;
+ fib = rte_fib_create(__func__, SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB with max_vrfs=1\n");
+ rte_fib_free(fib);
+
+ /* Test multi-VRF with per-VRF defaults */
+ config.max_vrfs = 4;
+ config.vrf_default_nh = vrf_def_nh;
+ fib = rte_fib_create(__func__, SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB with max_vrfs=4\n");
+ rte_fib_free(fib);
+
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test VRF route add/delete operations
+ */
+static int32_t
+test_vrf_add_del(void)
+{
+ struct rte_fib *fib = NULL;
+ struct rte_fib_conf config = { 0 };
+ uint64_t def_nh = 100;
+ uint64_t vrf_def_nh[4] = {100, 200, 300, 400};
+ uint32_t ip = RTE_IPV4(192, 168, 1, 0);
+ uint8_t depth = 24;
+ uint64_t nh = 1000;
+ int ret;
+
+ config.max_routes = MAX_ROUTES;
+ config.rib_ext_sz = 0;
+ config.default_nh = def_nh;
+ config.type = RTE_FIB_DIR24_8;
+ config.dir24_8.nh_sz = RTE_FIB_DIR24_8_4B;
+ config.dir24_8.num_tbl8 = MAX_TBL8;
+ config.max_vrfs = 4;
+ config.vrf_default_nh = vrf_def_nh;
+
+ fib = rte_fib_create(__func__, SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB\n");
+
+ /* Add route to VRF 0 */
+ ret = rte_fib_vrf_add(fib, 0, ip, depth, nh);
+ RTE_TEST_ASSERT(ret == 0, "Failed to add route to VRF 0\n");
+
+ /* Add route to VRF 1 with different nexthop */
+ ret = rte_fib_vrf_add(fib, 1, ip, depth, nh + 1);
+ RTE_TEST_ASSERT(ret == 0, "Failed to add route to VRF 1\n");
+
+ /* Add route to VRF 2 */
+ ret = rte_fib_vrf_add(fib, 2, ip, depth, nh + 2);
+ RTE_TEST_ASSERT(ret == 0, "Failed to add route to VRF 2\n");
+
+ /* Test invalid VRF ID */
+ ret = rte_fib_vrf_add(fib, 10, ip, depth, nh);
+ RTE_TEST_ASSERT(ret != 0, "Should fail with invalid VRF ID\n");
+
+ /* Delete route from VRF 1 */
+ ret = rte_fib_vrf_delete(fib, 1, ip, depth);
+ RTE_TEST_ASSERT(ret == 0, "Failed to delete route from VRF 1\n");
+
+ /* Delete non-existent route - implementation may return error */
+ ret = rte_fib_vrf_delete(fib, 3, ip, depth);
+ (void)ret; /* Accept any return value */
+
+ rte_fib_free(fib);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test VRF lookup functionality
+ */
+static int32_t
+test_vrf_lookup(void)
+{
+ struct rte_fib *fib = NULL;
+ struct rte_fib_conf config = { 0 };
+ uint64_t def_nh = 100;
+ uint64_t vrf_def_nh[4] = {1000, 2000, 3000, 4000};
+ uint32_t ip_base = RTE_IPV4(10, 0, 0, 0);
+ uint16_t vrf_ids[8];
+ uint32_t ips[8];
+ uint64_t next_hops[8];
+ int ret;
+ uint32_t i;
+
+ config.max_routes = MAX_ROUTES;
+ config.rib_ext_sz = 0;
+ config.default_nh = def_nh;
+ config.type = RTE_FIB_DIR24_8;
+ config.dir24_8.nh_sz = RTE_FIB_DIR24_8_4B;
+ config.dir24_8.num_tbl8 = MAX_TBL8;
+ config.max_vrfs = 4;
+ config.vrf_default_nh = vrf_def_nh;
+
+ fib = rte_fib_create(__func__, SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB\n");
+
+ /* Add routes to different VRFs with VRF-specific nexthops */
+ for (i = 0; i < 4; i++) {
+ ret = rte_fib_vrf_add(fib, i, ip_base + (i << 16), 16, 100 + i);
+ RTE_TEST_ASSERT(ret == 0, "Failed to add route to VRF %u\n", i);
+ }
+
+ /* Prepare lookup: each IP should match its VRF-specific route */
+ for (i = 0; i < 4; i++) {
+ vrf_ids[i] = i;
+ ips[i] = ip_base + (i << 16) + 0x1234; /* Within the /16 */
+ }
+
+ /* Lookup should return VRF-specific nexthops */
+ ret = rte_fib_vrf_lookup_bulk(fib, vrf_ids, ips, next_hops, 4);
+ RTE_TEST_ASSERT(ret == 0, "VRF lookup failed\n");
+
+ for (i = 0; i < 4; i++) {
+ RTE_TEST_ASSERT(next_hops[i] == 100 + i,
+ "Wrong nexthop for VRF %u: expected %"PRIu64", got %"PRIu64"\n",
+ i, (uint64_t)(100 + i), next_hops[i]);
+ }
+
+ /* Test default nexthops for unmatched IPs */
+ for (i = 0; i < 4; i++) {
+ vrf_ids[i] = i;
+ ips[i] = RTE_IPV4(192, 168, i, 1); /* No route for these */
+ }
+
+ ret = rte_fib_vrf_lookup_bulk(fib, vrf_ids, ips, next_hops, 4);
+ RTE_TEST_ASSERT(ret == 0, "VRF lookup failed\n");
+
+ for (i = 0; i < 4; i++) {
+ RTE_TEST_ASSERT(next_hops[i] == vrf_def_nh[i],
+ "Wrong default nexthop for VRF %u: expected %"PRIu64", got %"PRIu64"\n",
+ i, vrf_def_nh[i], next_hops[i]);
+ }
+
+ rte_fib_free(fib);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test VRF isolation - routes in one VRF shouldn't affect others
+ */
+static int32_t
+test_vrf_isolation(void)
+{
+ struct rte_fib *fib = NULL;
+ struct rte_fib_conf config = { 0 };
+ uint64_t vrf_def_nh[3] = {100, 200, 300};
+ uint32_t ip = RTE_IPV4(10, 10, 10, 0);
+ uint16_t vrf_ids[3] = {0, 1, 2};
+ uint32_t ips[3];
+ uint64_t next_hops[3];
+ int ret;
+ uint32_t i;
+
+ config.max_routes = MAX_ROUTES;
+ config.rib_ext_sz = 0;
+ config.default_nh = 0;
+ config.type = RTE_FIB_DIR24_8;
+ config.dir24_8.nh_sz = RTE_FIB_DIR24_8_4B;
+ config.dir24_8.num_tbl8 = MAX_TBL8;
+ config.max_vrfs = 3;
+ config.vrf_default_nh = vrf_def_nh;
+
+ fib = rte_fib_create("test_vrfisol", SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB\n");
+
+ /* Add route only to VRF 1 */
+ ret = rte_fib_vrf_add(fib, 1, ip, 24, 777);
+ RTE_TEST_ASSERT(ret == 0, "Failed to add route to VRF 1\n");
+
+ /* Lookup same IP in all three VRFs */
+ for (i = 0; i < 3; i++)
+ ips[i] = ip + 15; /* Within /24 */
+
+ ret = rte_fib_vrf_lookup_bulk(fib, vrf_ids, ips, next_hops, 3);
+ RTE_TEST_ASSERT(ret == 0, "VRF lookup failed\n");
+
+ /* VRF 0 should get default */
+ RTE_TEST_ASSERT(next_hops[0] == vrf_def_nh[0],
+ "VRF 0 should return default nexthop\n");
+
+ /* VRF 1 should get the route */
+ RTE_TEST_ASSERT(next_hops[1] == 777,
+ "VRF 1 should return route nexthop 777, got %"PRIu64"\n", next_hops[1]);
+
+ /* VRF 2 should get default */
+ RTE_TEST_ASSERT(next_hops[2] == vrf_def_nh[2],
+ "VRF 2 should return default nexthop\n");
+
+ rte_fib_free(fib);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test multi-VRF with all nexthop sizes
+ */
+static int32_t
+test_vrf_all_nh_sizes(void)
+{
+ struct rte_fib *fib = NULL;
+ struct rte_fib_conf config = { 0 };
+ uint64_t vrf_def_nh[2] = {10, 20};
+ uint32_t ip = RTE_IPV4(172, 16, 0, 0);
+ uint16_t vrf_ids[2] = {0, 1};
+ uint32_t ips[2];
+ uint64_t next_hops[2];
+ int ret;
+ enum rte_fib_dir24_8_nh_sz nh_sizes[] = {
+ RTE_FIB_DIR24_8_1B,
+ RTE_FIB_DIR24_8_2B,
+ RTE_FIB_DIR24_8_4B,
+ RTE_FIB_DIR24_8_8B
+ };
+ uint64_t max_nhs[] = {127, 32767, 2147483647ULL, 9223372036854775807ULL};
+ int i;
+
+ config.max_routes = MAX_ROUTES;
+ config.rib_ext_sz = 0;
+ config.default_nh = 0;
+ config.type = RTE_FIB_DIR24_8;
+ config.dir24_8.num_tbl8 = 127;
+ config.max_vrfs = 2;
+ config.vrf_default_nh = vrf_def_nh;
+
+ for (i = 0; i < (int)RTE_DIM(nh_sizes); i++) {
+ char name[32];
+ config.dir24_8.nh_sz = nh_sizes[i];
+ snprintf(name, sizeof(name), "vrf_nh%d", i);
+
+ fib = rte_fib_create(name, SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB\n");
+
+ /* Add routes with max nexthop for this size */
+ ret = rte_fib_vrf_add(fib, 0, ip, 16, max_nhs[i]);
+ RTE_TEST_ASSERT(ret == 0,
+ "Failed to add route to VRF 0 with nh_sz=%d\n", nh_sizes[i]);
+
+ ret = rte_fib_vrf_add(fib, 1, ip, 16, max_nhs[i] - 1);
+ RTE_TEST_ASSERT(ret == 0,
+ "Failed to add route to VRF 1 with nh_sz=%d\n", nh_sizes[i]);
+
+ /* Lookup */
+ ips[0] = ip + 0x100;
+ ips[1] = ip + 0x200;
+
+ ret = rte_fib_vrf_lookup_bulk(fib, vrf_ids, ips, next_hops, 2);
+ RTE_TEST_ASSERT(ret == 0, "VRF lookup failed with nh_sz=%d\n", nh_sizes[i]);
+
+ RTE_TEST_ASSERT(next_hops[0] == max_nhs[i],
+ "Wrong nexthop for VRF 0 with nh_sz=%d\n", nh_sizes[i]);
+ RTE_TEST_ASSERT(next_hops[1] == max_nhs[i] - 1,
+ "Wrong nexthop for VRF 1 with nh_sz=%d\n", nh_sizes[i]);
+
+ rte_fib_free(fib);
+ fib = NULL;
+ }
+
+ return TEST_SUCCESS;
+}
+
static struct unit_test_suite fib_fast_tests = {
.suite_name = "fib autotest",
.setup = NULL,
@@ -600,6 +893,11 @@ static struct unit_test_suite fib_fast_tests = {
TEST_CASE(test_lookup),
TEST_CASE(test_invalid_rcu),
TEST_CASE(test_fib_rcu_sync_rw),
+ TEST_CASE(test_create_vrf),
+ TEST_CASE(test_vrf_add_del),
+ TEST_CASE(test_vrf_lookup),
+ TEST_CASE(test_vrf_isolation),
+ TEST_CASE(test_vrf_all_nh_sizes),
TEST_CASES_END()
}
};
--
2.43.0
* [RFC PATCH 3/4] fib6: add multi-VRF support
2026-03-22 15:42 [RFC PATCH 0/4] VRF support in FIB library Vladimir Medvedkin
2026-03-22 15:42 ` [RFC PATCH 1/4] fib: add multi-VRF support Vladimir Medvedkin
2026-03-22 15:42 ` [RFC PATCH 2/4] fib: add VRF functional and unit tests Vladimir Medvedkin
@ 2026-03-22 15:42 ` Vladimir Medvedkin
2026-03-22 15:42 ` [RFC PATCH 4/4] fib6: add VRF functional and unit tests Vladimir Medvedkin
` (4 subsequent siblings)
7 siblings, 0 replies; 33+ messages in thread
From: Vladimir Medvedkin @ 2026-03-22 15:42 UTC (permalink / raw)
To: dev; +Cc: rjarry, nsaxena16, mb, adwivedi, jerinjacobk
Add VRF (Virtual Routing and Forwarding) support to the IPv6
FIB library, allowing multiple independent routing tables
within a single FIB instance.
Introduce max_vrfs and vrf_default_nh in rte_fib6_conf and
add four new experimental APIs:
- rte_fib6_vrf_add() and rte_fib6_vrf_delete() for per-VRF
route management
- rte_fib6_vrf_lookup_bulk() for multi-VRF bulk lookups
- rte_fib6_vrf_get_rib() to retrieve a per-VRF RIB handle
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
lib/fib/rte_fib6.c | 166 ++++++++++++++++++++++++++-----
lib/fib/rte_fib6.h | 88 ++++++++++++++++-
lib/fib/trie.c | 158 +++++++++++++++++++++--------
lib/fib/trie.h | 51 +++++++---
lib/fib/trie_avx512.c | 225 ++++++++++++++++++++++++++++++++++++++----
lib/fib/trie_avx512.h | 39 +++++++-
6 files changed, 617 insertions(+), 110 deletions(-)
diff --git a/lib/fib/rte_fib6.c b/lib/fib/rte_fib6.c
index 770becdb61..0d2b2927d5 100644
--- a/lib/fib/rte_fib6.c
+++ b/lib/fib/rte_fib6.c
@@ -22,6 +22,8 @@
#include "trie.h"
#include "fib_log.h"
+#define FIB6_MAX_LOOKUP_BULK 64U
+
TAILQ_HEAD(rte_fib6_list, rte_tailq_entry);
static struct rte_tailq_elem rte_fib6_tailq = {
.name = "RTE_FIB6",
@@ -40,51 +42,61 @@ EAL_REGISTER_TAILQ(rte_fib6_tailq)
struct rte_fib6 {
char name[RTE_FIB6_NAMESIZE];
enum rte_fib6_type type; /**< Type of FIB struct */
- struct rte_rib6 *rib; /**< RIB helper datastructure */
- void *dp; /**< pointer to the dataplane struct*/
- rte_fib6_lookup_fn_t lookup; /**< FIB lookup function */
+ uint16_t num_vrfs; /**< Number of VRFs */
+ struct rte_rib6 **ribs; /**< RIB helper datastructures per VRF */
+ void *dp; /**< pointer to the dataplane struct */
+ rte_fib6_lookup_fn_t lookup; /**< lookup function */
rte_fib6_modify_fn_t modify; /**< modify FIB datastructure */
- uint64_t def_nh;
+ uint64_t *def_nh; /**< Per-VRF default next hop array */
};
static void
-dummy_lookup(void *fib_p, const struct rte_ipv6_addr *ips,
+dummy_lookup(void *fib_p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
uint64_t *next_hops, const unsigned int n)
{
unsigned int i;
struct rte_fib6 *fib = fib_p;
struct rte_rib6_node *node;
+ struct rte_rib6 *rib;
for (i = 0; i < n; i++) {
- node = rte_rib6_lookup(fib->rib, &ips[i]);
+ RTE_ASSERT(vrf_ids[i] < fib->num_vrfs);
+ rib = rte_fib6_vrf_get_rib(fib, vrf_ids[i]);
+ node = rte_rib6_lookup(rib, &ips[i]);
if (node != NULL)
rte_rib6_get_nh(node, &next_hops[i]);
else
- next_hops[i] = fib->def_nh;
+ next_hops[i] = fib->def_nh[vrf_ids[i]];
}
}
static int
-dummy_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
- uint8_t depth, uint64_t next_hop, int op)
+dummy_modify(struct rte_fib6 *fib, uint16_t vrf_id,
+ const struct rte_ipv6_addr *ip, uint8_t depth,
+ uint64_t next_hop, int op)
{
struct rte_rib6_node *node;
+ struct rte_rib6 *rib;
if ((fib == NULL) || (depth > RTE_IPV6_MAX_DEPTH))
return -EINVAL;
+ rib = rte_fib6_vrf_get_rib(fib, vrf_id);
+ if (rib == NULL)
+ return -EINVAL;
- node = rte_rib6_lookup_exact(fib->rib, ip, depth);
+ node = rte_rib6_lookup_exact(rib, ip, depth);
switch (op) {
case RTE_FIB6_ADD:
if (node == NULL)
- node = rte_rib6_insert(fib->rib, ip, depth);
+ node = rte_rib6_insert(rib, ip, depth);
if (node == NULL)
return -rte_errno;
return rte_rib6_set_nh(node, next_hop);
case RTE_FIB6_DEL:
if (node == NULL)
return -ENOENT;
- rte_rib6_remove(fib->rib, ip, depth);
+ rte_rib6_remove(rib, ip, depth);
return 0;
}
return -EINVAL;
@@ -113,7 +125,6 @@ init_dataplane(struct rte_fib6 *fib, __rte_unused int socket_id,
default:
return -EINVAL;
}
- return 0;
}
RTE_EXPORT_SYMBOL(rte_fib6_add)
@@ -124,7 +135,7 @@ rte_fib6_add(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
if ((fib == NULL) || (ip == NULL) || (fib->modify == NULL) ||
(depth > RTE_IPV6_MAX_DEPTH))
return -EINVAL;
- return fib->modify(fib, ip, depth, next_hop, RTE_FIB6_ADD);
+ return fib->modify(fib, 0, ip, depth, next_hop, RTE_FIB6_ADD);
}
RTE_EXPORT_SYMBOL(rte_fib6_delete)
@@ -135,7 +146,7 @@ rte_fib6_delete(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
if ((fib == NULL) || (ip == NULL) || (fib->modify == NULL) ||
(depth > RTE_IPV6_MAX_DEPTH))
return -EINVAL;
- return fib->modify(fib, ip, depth, 0, RTE_FIB6_DEL);
+ return fib->modify(fib, 0, ip, depth, 0, RTE_FIB6_DEL);
}
RTE_EXPORT_SYMBOL(rte_fib6_lookup_bulk)
@@ -144,23 +155,72 @@ rte_fib6_lookup_bulk(struct rte_fib6 *fib,
const struct rte_ipv6_addr *ips,
uint64_t *next_hops, int n)
{
+ static const uint16_t zero_vrf_ids[FIB6_MAX_LOOKUP_BULK];
+ unsigned int off = 0;
+ unsigned int total = (unsigned int)n;
+
FIB6_RETURN_IF_TRUE((fib == NULL) || (ips == NULL) ||
(next_hops == NULL) || (fib->lookup == NULL), -EINVAL);
- fib->lookup(fib->dp, ips, next_hops, n);
+
+ while (off < total) {
+ unsigned int chunk = RTE_MIN(total - off,
+ FIB6_MAX_LOOKUP_BULK);
+ fib->lookup(fib->dp, zero_vrf_ids, ips + off,
+ next_hops + off, chunk);
+ off += chunk;
+ }
+ return 0;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib6_vrf_lookup_bulk, 26.07)
+int
+rte_fib6_vrf_lookup_bulk(struct rte_fib6 *fib, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips, uint64_t *next_hops, int n)
+{
+ FIB6_RETURN_IF_TRUE((fib == NULL) || (vrf_ids == NULL) || (ips == NULL) ||
+ (next_hops == NULL) || (fib->lookup == NULL), -EINVAL);
+
+ fib->lookup(fib->dp, vrf_ids, ips, next_hops, (unsigned int)n);
+
return 0;
}
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib6_vrf_add, 26.07)
+int
+rte_fib6_vrf_add(struct rte_fib6 *fib, uint16_t vrf_id,
+ const struct rte_ipv6_addr *ip, uint8_t depth, uint64_t next_hop)
+{
+ if ((fib == NULL) || (ip == NULL) || (fib->modify == NULL) ||
+ (depth > RTE_IPV6_MAX_DEPTH))
+ return -EINVAL;
+ return fib->modify(fib, vrf_id, ip, depth, next_hop, RTE_FIB6_ADD);
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib6_vrf_delete, 26.07)
+int
+rte_fib6_vrf_delete(struct rte_fib6 *fib, uint16_t vrf_id,
+ const struct rte_ipv6_addr *ip, uint8_t depth)
+{
+ if ((fib == NULL) || (ip == NULL) || (fib->modify == NULL) ||
+ (depth > RTE_IPV6_MAX_DEPTH))
+ return -EINVAL;
+ return fib->modify(fib, vrf_id, ip, depth, 0, RTE_FIB6_DEL);
+}
+
RTE_EXPORT_SYMBOL(rte_fib6_create)
struct rte_fib6 *
rte_fib6_create(const char *name, int socket_id, struct rte_fib6_conf *conf)
{
char mem_name[RTE_FIB6_NAMESIZE];
+ char rib_name[RTE_FIB6_NAMESIZE];
int ret;
struct rte_fib6 *fib = NULL;
struct rte_rib6 *rib = NULL;
struct rte_tailq_entry *te;
struct rte_fib6_list *fib_list;
struct rte_rib6_conf rib_conf;
+ uint16_t num_vrfs;
+ uint16_t vrf;
/* Check user arguments. */
if ((name == NULL) || (conf == NULL) || (conf->max_routes < 0) ||
@@ -172,13 +232,41 @@ rte_fib6_create(const char *name, int socket_id, struct rte_fib6_conf *conf)
rib_conf.ext_sz = conf->rib_ext_sz;
rib_conf.max_nodes = conf->max_routes * 2;
- rib = rte_rib6_create(name, socket_id, &rib_conf);
- if (rib == NULL) {
- FIB_LOG(ERR,
- "Can not allocate RIB %s", name);
+ num_vrfs = (conf->max_vrfs == 0) ? 1 : conf->max_vrfs;
+
+ struct rte_rib6 **ribs = rte_zmalloc_socket("FIB6_RIBS",
+ num_vrfs * sizeof(*ribs), RTE_CACHE_LINE_SIZE, socket_id);
+ if (ribs == NULL) {
+ FIB_LOG(ERR, "FIB6 %s RIB array allocation failed", name);
+ rte_errno = ENOMEM;
+ return NULL;
+ }
+
+ uint64_t *def_nh = rte_zmalloc_socket("FIB6_DEF_NH",
+ num_vrfs * sizeof(*def_nh), RTE_CACHE_LINE_SIZE, socket_id);
+ if (def_nh == NULL) {
+ FIB_LOG(ERR, "FIB6 %s default nexthop array allocation failed", name);
+ rte_errno = ENOMEM;
+ rte_free(ribs);
return NULL;
}
+ for (vrf = 0; vrf < num_vrfs; vrf++) {
+ if (num_vrfs == 1)
+ snprintf(rib_name, sizeof(rib_name), "%s", name);
+ else
+ snprintf(rib_name, sizeof(rib_name), "%s_vrf%u", name, vrf);
+ rib = rte_rib6_create(rib_name, socket_id, &rib_conf);
+ if (rib == NULL) {
+ FIB_LOG(ERR, "Can not allocate RIB %s", rib_name);
+ rte_errno = ENOMEM;
+ goto free_ribs;
+ }
+ ribs[vrf] = rib;
+ def_nh[vrf] = (conf->vrf_default_nh != NULL) ?
+ conf->vrf_default_nh[vrf] : conf->default_nh;
+ }
+
snprintf(mem_name, sizeof(mem_name), "FIB6_%s", name);
fib_list = RTE_TAILQ_CAST(rte_fib6_tailq.head, rte_fib6_list);
@@ -214,15 +302,17 @@ rte_fib6_create(const char *name, int socket_id, struct rte_fib6_conf *conf)
goto free_te;
}
+ fib->num_vrfs = num_vrfs;
+ fib->ribs = ribs;
+ fib->def_nh = def_nh;
+
rte_strlcpy(fib->name, name, sizeof(fib->name));
- fib->rib = rib;
fib->type = conf->type;
- fib->def_nh = conf->default_nh;
ret = init_dataplane(fib, socket_id, conf);
if (ret < 0) {
FIB_LOG(ERR,
- "FIB dataplane struct %s memory allocation failed",
- name);
+ "FIB dataplane struct %s memory allocation failed with err %d",
+ name, ret);
rte_errno = -ret;
goto free_fib;
}
@@ -240,7 +330,12 @@ rte_fib6_create(const char *name, int socket_id, struct rte_fib6_conf *conf)
rte_free(te);
exit:
rte_mcfg_tailq_write_unlock();
- rte_rib6_free(rib);
+free_ribs:
+ for (vrf = 0; vrf < num_vrfs; vrf++)
+ rte_rib6_free(ribs[vrf]);
+
+ rte_free(def_nh);
+ rte_free(ribs);
return NULL;
}
@@ -309,7 +404,13 @@ rte_fib6_free(struct rte_fib6 *fib)
rte_mcfg_tailq_write_unlock();
free_dataplane(fib);
- rte_rib6_free(fib->rib);
+ if (fib->ribs != NULL) {
+ uint16_t vrf;
+ for (vrf = 0; vrf < fib->num_vrfs; vrf++)
+ rte_rib6_free(fib->ribs[vrf]);
+ }
+ rte_free(fib->ribs);
+ rte_free(fib->def_nh);
rte_free(fib);
rte_free(te);
}
@@ -325,7 +426,18 @@ RTE_EXPORT_SYMBOL(rte_fib6_get_rib)
struct rte_rib6 *
rte_fib6_get_rib(struct rte_fib6 *fib)
{
- return (fib == NULL) ? NULL : fib->rib;
+ return (fib == NULL || fib->ribs == NULL) ? NULL : fib->ribs[0];
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib6_vrf_get_rib, 26.07)
+struct rte_rib6 *
+rte_fib6_vrf_get_rib(struct rte_fib6 *fib, uint16_t vrf_id)
+{
+ if (fib == NULL || fib->ribs == NULL)
+ return NULL;
+ if (vrf_id >= fib->num_vrfs)
+ return NULL;
+ return fib->ribs[vrf_id];
}
RTE_EXPORT_SYMBOL(rte_fib6_select_lookup)
diff --git a/lib/fib/rte_fib6.h b/lib/fib/rte_fib6.h
index 4527328bf0..864ec08c4e 100644
--- a/lib/fib/rte_fib6.h
+++ b/lib/fib/rte_fib6.h
@@ -55,11 +55,11 @@ enum rte_fib6_type {
};
/** Modify FIB function */
-typedef int (*rte_fib6_modify_fn_t)(struct rte_fib6 *fib,
+typedef int (*rte_fib6_modify_fn_t)(struct rte_fib6 *fib, uint16_t vrf_id,
const struct rte_ipv6_addr *ip, uint8_t depth,
uint64_t next_hop, int op);
/** FIB bulk lookup function */
-typedef void (*rte_fib6_lookup_fn_t)(void *fib,
+typedef void (*rte_fib6_lookup_fn_t)(void *fib, const uint16_t *vrf_ids,
const struct rte_ipv6_addr *ips,
uint64_t *next_hops, const unsigned int n);
@@ -97,6 +97,10 @@ struct rte_fib6_conf {
uint32_t num_tbl8;
} trie;
};
+ /** Number of VRFs to support (0 or 1 = single VRF for backward compatibility) */
+ uint16_t max_vrfs;
+ /** Per-VRF default nexthops (NULL = use default_nh for all) */
+ uint64_t *vrf_default_nh;
};
/** FIB RCU QSBR configuration structure. */
@@ -215,6 +219,70 @@ rte_fib6_lookup_bulk(struct rte_fib6 *fib,
const struct rte_ipv6_addr *ips,
uint64_t *next_hops, int n);
+/**
+ * Add a route to the FIB with VRF ID.
+ *
+ * @param fib
+ * FIB object handle
+ * @param vrf_id
+ * VRF ID (0 to max_vrfs-1)
+ * @param ip
+ * IPv6 prefix address to be added to the FIB
+ * @param depth
+ * Prefix length
+ * @param next_hop
+ * Next hop to be added to the FIB
+ * @return
+ * 0 on success, negative value otherwise
+ */
+__rte_experimental
+int
+rte_fib6_vrf_add(struct rte_fib6 *fib, uint16_t vrf_id,
+ const struct rte_ipv6_addr *ip, uint8_t depth, uint64_t next_hop);
+
+/**
+ * Delete a rule from the FIB with VRF ID.
+ *
+ * @param fib
+ * FIB object handle
+ * @param vrf_id
+ * VRF ID (0 to max_vrfs-1)
+ * @param ip
+ * IPv6 prefix address to be deleted from the FIB
+ * @param depth
+ * Prefix length
+ * @return
+ * 0 on success, negative value otherwise
+ */
+__rte_experimental
+int
+rte_fib6_vrf_delete(struct rte_fib6 *fib, uint16_t vrf_id,
+ const struct rte_ipv6_addr *ip, uint8_t depth);
+
+/**
+ * Lookup multiple IP addresses in the FIB with per-packet VRF IDs.
+ *
+ * @param fib
+ * FIB object handle
+ * @param vrf_ids
+ * Array of VRF IDs corresponding to ips[] (0 to max_vrfs-1)
+ * @param ips
+ * Array of IPv6s to be looked up in the FIB
+ * @param next_hops
+ * Next hop of the most specific rule found for each IP.
+ * This is an array of eight-byte values.
+ * If the lookup for a given IP fails, the corresponding element will
+ * contain the default nexthop value configured for that VRF.
+ * @param n
+ * Number of elements in vrf_ids/ips/next_hops arrays to look up.
+ * @return
+ * -EINVAL for incorrect arguments, otherwise 0
+ */
+__rte_experimental
+int
+rte_fib6_vrf_lookup_bulk(struct rte_fib6 *fib, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips, uint64_t *next_hops, int n);
+
/**
* Get pointer to the dataplane specific struct
*
@@ -228,7 +296,7 @@ void *
rte_fib6_get_dp(struct rte_fib6 *fib);
/**
- * Get pointer to the RIB6
+ * Get pointer to the RIB6 for VRF 0
*
* @param fib
* FIB object handle
@@ -239,6 +307,20 @@ rte_fib6_get_dp(struct rte_fib6 *fib);
struct rte_rib6 *
rte_fib6_get_rib(struct rte_fib6 *fib);
+/**
+ * Get the RIB for a specific VRF.
+ *
+ * @param fib
+ * FIB object handle
+ * @param vrf_id
+ * VRF ID (0 to max_vrfs-1)
+ * @return
+ * RIB for the specified VRF or NULL on error.
+ */
+__rte_experimental
+struct rte_rib6 *
+rte_fib6_vrf_get_rib(struct rte_fib6 *fib, uint16_t vrf_id);
+
/**
* Set lookup function based on type
*
diff --git a/lib/fib/trie.c b/lib/fib/trie.c
index fa5d9ec6b0..2acc9d9526 100644
--- a/lib/fib/trie.c
+++ b/lib/fib/trie.c
@@ -30,22 +30,27 @@ enum edge {
};
static inline rte_fib6_lookup_fn_t
-get_scalar_fn(enum rte_fib_trie_nh_sz nh_sz)
+get_scalar_fn(const struct rte_trie_tbl *dp, enum rte_fib_trie_nh_sz nh_sz)
{
+ bool single_vrf = dp->num_vrfs <= 1;
+
switch (nh_sz) {
case RTE_FIB6_TRIE_2B:
- return rte_trie_lookup_bulk_2b;
+ return single_vrf ? rte_trie_lookup_bulk_2b :
+ rte_trie_lookup_bulk_vrf_2b;
case RTE_FIB6_TRIE_4B:
- return rte_trie_lookup_bulk_4b;
+ return single_vrf ? rte_trie_lookup_bulk_4b :
+ rte_trie_lookup_bulk_vrf_4b;
case RTE_FIB6_TRIE_8B:
- return rte_trie_lookup_bulk_8b;
+ return single_vrf ? rte_trie_lookup_bulk_8b :
+ rte_trie_lookup_bulk_vrf_8b;
default:
return NULL;
}
}
static inline rte_fib6_lookup_fn_t
-get_vector_fn(enum rte_fib_trie_nh_sz nh_sz)
+get_vector_fn(const struct rte_trie_tbl *dp, enum rte_fib_trie_nh_sz nh_sz)
{
#ifdef CC_AVX512_SUPPORT
if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) <= 0 ||
@@ -53,13 +58,40 @@ get_vector_fn(enum rte_fib_trie_nh_sz nh_sz)
rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) <= 0 ||
rte_vect_get_max_simd_bitwidth() < RTE_VECT_SIMD_512)
return NULL;
+
+ if (dp->num_vrfs <= 1) {
+ switch (nh_sz) {
+ case RTE_FIB6_TRIE_2B:
+ return rte_trie_vec_lookup_bulk_2b;
+ case RTE_FIB6_TRIE_4B:
+ return rte_trie_vec_lookup_bulk_4b;
+ case RTE_FIB6_TRIE_8B:
+ return rte_trie_vec_lookup_bulk_8b;
+ default:
+ return NULL;
+ }
+ }
+
+ if (dp->num_vrfs >= 256) {
+ switch (nh_sz) {
+ case RTE_FIB6_TRIE_2B:
+ return rte_trie_vec_lookup_bulk_vrf_2b_large;
+ case RTE_FIB6_TRIE_4B:
+ return rte_trie_vec_lookup_bulk_vrf_4b_large;
+ case RTE_FIB6_TRIE_8B:
+ return rte_trie_vec_lookup_bulk_vrf_8b_large;
+ default:
+ return NULL;
+ }
+ }
+
switch (nh_sz) {
case RTE_FIB6_TRIE_2B:
- return rte_trie_vec_lookup_bulk_2b;
+ return rte_trie_vec_lookup_bulk_vrf_2b;
case RTE_FIB6_TRIE_4B:
- return rte_trie_vec_lookup_bulk_4b;
+ return rte_trie_vec_lookup_bulk_vrf_4b;
case RTE_FIB6_TRIE_8B:
- return rte_trie_vec_lookup_bulk_8b;
+ return rte_trie_vec_lookup_bulk_vrf_8b;
default:
return NULL;
}
@@ -83,12 +115,12 @@ trie_get_lookup_fn(void *p, enum rte_fib6_lookup_type type)
switch (type) {
case RTE_FIB6_LOOKUP_TRIE_SCALAR:
- return get_scalar_fn(nh_sz);
+ return get_scalar_fn(dp, nh_sz);
case RTE_FIB6_LOOKUP_TRIE_VECTOR_AVX512:
- return get_vector_fn(nh_sz);
+ return get_vector_fn(dp, nh_sz);
case RTE_FIB6_LOOKUP_DEFAULT:
- ret_fn = get_vector_fn(nh_sz);
- return (ret_fn != NULL) ? ret_fn : get_scalar_fn(nh_sz);
+ ret_fn = get_vector_fn(dp, nh_sz);
+ return (ret_fn != NULL) ? ret_fn : get_scalar_fn(dp, nh_sz);
default:
return NULL;
}
@@ -310,19 +342,22 @@ recycle_root_path(struct rte_trie_tbl *dp, const uint8_t *ip_part,
}
static inline int
-build_common_root(struct rte_trie_tbl *dp, const struct rte_ipv6_addr *ip,
- int common_bytes, void **tbl)
+build_common_root(struct rte_trie_tbl *dp, uint16_t vrf_id,
+ const struct rte_ipv6_addr *ip, int common_bytes, void **tbl)
{
void *tbl_ptr = NULL;
uint64_t *cur_tbl;
uint64_t val;
int i, j, idx, prev_idx = 0;
+ uint64_t idx_tbl;
+ uint64_t tbl24_base = (uint64_t)vrf_id * TRIE_TBL24_NUM_ENT;
cur_tbl = dp->tbl24;
for (i = 3, j = 0; i <= common_bytes; i++) {
idx = get_idx(ip, prev_idx, i - j, j);
- val = get_tbl_val_by_idx(cur_tbl, idx, dp->nh_sz);
- tbl_ptr = get_tbl_p_by_idx(cur_tbl, idx, dp->nh_sz);
+ idx_tbl = (cur_tbl == dp->tbl24) ? idx + tbl24_base : (uint32_t)idx;
+ val = get_tbl_val_by_idx(cur_tbl, idx_tbl, dp->nh_sz);
+ tbl_ptr = get_tbl_p_by_idx(cur_tbl, idx_tbl, dp->nh_sz);
if ((val & TRIE_EXT_ENT) != TRIE_EXT_ENT) {
idx = tbl8_alloc(dp, val);
if (unlikely(idx < 0))
@@ -336,8 +371,11 @@ build_common_root(struct rte_trie_tbl *dp, const struct rte_ipv6_addr *ip,
j = i;
cur_tbl = dp->tbl8;
}
- *tbl = get_tbl_p_by_idx(cur_tbl, prev_idx * TRIE_TBL8_GRP_NUM_ENT,
- dp->nh_sz);
+
+ uint64_t final_idx = (cur_tbl == dp->tbl24) ?
+ (prev_idx * TRIE_TBL8_GRP_NUM_ENT + tbl24_base) :
+ (prev_idx * TRIE_TBL8_GRP_NUM_ENT);
+ *tbl = get_tbl_p_by_idx(cur_tbl, final_idx, dp->nh_sz);
return 0;
}
@@ -385,7 +423,8 @@ write_edge(struct rte_trie_tbl *dp, const uint8_t *ip_part, uint64_t next_hop,
#define TBL8_LEN (RTE_IPV6_ADDR_SIZE - TBL24_BYTES)
static int
-install_to_dp(struct rte_trie_tbl *dp, const struct rte_ipv6_addr *ledge,
+install_to_dp(struct rte_trie_tbl *dp, uint16_t vrf_id,
+ const struct rte_ipv6_addr *ledge,
const struct rte_ipv6_addr *r, uint64_t next_hop)
{
void *common_root_tbl;
@@ -409,7 +448,7 @@ install_to_dp(struct rte_trie_tbl *dp, const struct rte_ipv6_addr *ledge,
break;
}
- ret = build_common_root(dp, ledge, common_bytes, &common_root_tbl);
+ ret = build_common_root(dp, vrf_id, ledge, common_bytes, &common_root_tbl);
if (unlikely(ret != 0))
return ret;
/*first uncommon tbl8 byte idx*/
@@ -455,7 +494,7 @@ install_to_dp(struct rte_trie_tbl *dp, const struct rte_ipv6_addr *ledge,
uint8_t common_tbl8 = (common_bytes < TBL24_BYTES) ?
0 : common_bytes - (TBL24_BYTES - 1);
- ent = get_tbl24_p(dp, ledge, dp->nh_sz);
+ ent = get_tbl24_p(dp, vrf_id, ledge, dp->nh_sz);
recycle_root_path(dp, ledge->a + TBL24_BYTES, common_tbl8, ent);
return 0;
}
@@ -482,9 +521,8 @@ get_nxt_net(struct rte_ipv6_addr *ip, uint8_t depth)
}
static int
-modify_dp(struct rte_trie_tbl *dp, struct rte_rib6 *rib,
- const struct rte_ipv6_addr *ip,
- uint8_t depth, uint64_t next_hop)
+modify_dp(struct rte_trie_tbl *dp, struct rte_rib6 *rib, uint16_t vrf_id,
+ const struct rte_ipv6_addr *ip, uint8_t depth, uint64_t next_hop)
{
struct rte_rib6_node *tmp = NULL;
struct rte_ipv6_addr ledge, redge;
@@ -507,7 +545,7 @@ modify_dp(struct rte_trie_tbl *dp, struct rte_rib6 *rib,
get_nxt_net(&ledge, tmp_depth);
continue;
}
- ret = install_to_dp(dp, &ledge, &redge, next_hop);
+ ret = install_to_dp(dp, vrf_id, &ledge, &redge, next_hop);
if (ret != 0)
return ret;
get_nxt_net(&redge, tmp_depth);
@@ -525,7 +563,7 @@ modify_dp(struct rte_trie_tbl *dp, struct rte_rib6 *rib,
!rte_ipv6_addr_is_unspec(&ledge))
break;
- ret = install_to_dp(dp, &ledge, &redge, next_hop);
+ ret = install_to_dp(dp, vrf_id, &ledge, &redge, next_hop);
if (ret != 0)
return ret;
}
@@ -535,7 +573,8 @@ modify_dp(struct rte_trie_tbl *dp, struct rte_rib6 *rib,
}
int
-trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
+trie_modify(struct rte_fib6 *fib, uint16_t vrf_id,
+ const struct rte_ipv6_addr *ip,
uint8_t depth, uint64_t next_hop, int op)
{
struct rte_trie_tbl *dp;
@@ -552,9 +591,11 @@ trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
return -EINVAL;
dp = rte_fib6_get_dp(fib);
- RTE_ASSERT(dp);
- rib = rte_fib6_get_rib(fib);
- RTE_ASSERT(rib);
+ rib = rte_fib6_vrf_get_rib(fib, vrf_id);
+ RTE_ASSERT(dp != NULL);
+
+ if ((vrf_id >= dp->num_vrfs) || (rib == NULL))
+ return -EINVAL;
ip_masked = *ip;
rte_ipv6_addr_mask(&ip_masked, depth);
@@ -597,7 +638,7 @@ trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
rte_rib6_get_nh(node, &node_nh);
if (node_nh == next_hop)
return 0;
- ret = modify_dp(dp, rib, &ip_masked, depth, next_hop);
+ ret = modify_dp(dp, rib, vrf_id, &ip_masked, depth, next_hop);
if (ret == 0)
rte_rib6_set_nh(node, next_hop);
return 0;
@@ -616,7 +657,7 @@ trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
if (par_nh == next_hop)
goto successfully_added;
}
- ret = modify_dp(dp, rib, &ip_masked, depth, next_hop);
+ ret = modify_dp(dp, rib, vrf_id, &ip_masked, depth, next_hop);
if (ret != 0) {
rte_rib6_remove(rib, &ip_masked, depth);
return ret;
@@ -633,10 +674,11 @@ trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
rte_rib6_get_nh(parent, &par_nh);
rte_rib6_get_nh(node, &node_nh);
if (par_nh != node_nh)
- ret = modify_dp(dp, rib, &ip_masked, depth,
+ ret = modify_dp(dp, rib, vrf_id, &ip_masked, depth,
par_nh);
} else
- ret = modify_dp(dp, rib, &ip_masked, depth, dp->def_nh);
+ ret = modify_dp(dp, rib, vrf_id, &ip_masked, depth,
+ dp->def_nh[vrf_id]);
if (ret != 0)
return ret;
@@ -656,9 +698,11 @@ trie_create(const char *name, int socket_id,
{
char mem_name[TRIE_NAMESIZE];
struct rte_trie_tbl *dp = NULL;
- uint64_t def_nh;
uint32_t num_tbl8;
enum rte_fib_trie_nh_sz nh_sz;
+ uint16_t num_vrfs;
+ uint16_t vrf;
+ uint64_t tbl24_sz;
if ((name == NULL) || (conf == NULL) ||
(conf->trie.nh_sz < RTE_FIB6_TRIE_2B) ||
@@ -673,21 +717,28 @@ trie_create(const char *name, int socket_id,
return NULL;
}
- def_nh = conf->default_nh;
nh_sz = conf->trie.nh_sz;
num_tbl8 = conf->trie.num_tbl8;
+ num_vrfs = (conf->max_vrfs == 0) ? 1 : conf->max_vrfs;
+ tbl24_sz = (uint64_t)num_vrfs * TRIE_TBL24_NUM_ENT * (1 << nh_sz);
+
+ if (conf->vrf_default_nh != NULL) {
+ for (vrf = 0; vrf < num_vrfs; vrf++) {
+ if (conf->vrf_default_nh[vrf] > get_max_nh(nh_sz)) {
+ rte_errno = EINVAL;
+ return NULL;
+ }
+ }
+ }
snprintf(mem_name, sizeof(mem_name), "DP_%s", name);
- dp = rte_zmalloc_socket(name, sizeof(struct rte_trie_tbl) +
- TRIE_TBL24_NUM_ENT * (1 << nh_sz) + sizeof(uint32_t),
+ dp = rte_zmalloc_socket(name, sizeof(struct rte_trie_tbl) + tbl24_sz,
RTE_CACHE_LINE_SIZE, socket_id);
if (dp == NULL) {
rte_errno = ENOMEM;
return dp;
}
- write_to_dp(&dp->tbl24, (def_nh << 1), nh_sz, 1 << 24);
-
snprintf(mem_name, sizeof(mem_name), "TBL8_%p", dp);
dp->tbl8 = rte_zmalloc_socket(mem_name, TRIE_TBL8_GRP_NUM_ENT *
(1ll << nh_sz) * (num_tbl8 + 1),
@@ -697,9 +748,32 @@ trie_create(const char *name, int socket_id,
rte_free(dp);
return NULL;
}
- dp->def_nh = def_nh;
+
+ snprintf(mem_name, sizeof(mem_name), "DEF_NH_%p", dp);
+ dp->def_nh = rte_zmalloc_socket(mem_name,
+ num_vrfs * sizeof(*dp->def_nh),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (dp->def_nh == NULL) {
+ rte_errno = ENOMEM;
+ rte_free(dp->tbl8);
+ rte_free(dp);
+ return NULL;
+ }
+
+ for (vrf = 0; vrf < num_vrfs; vrf++) {
+ uint64_t vrf_def = (conf->vrf_default_nh != NULL) ?
+ conf->vrf_default_nh[vrf] : conf->default_nh;
+ uint8_t *tbl24_ptr = (uint8_t *)dp->tbl24 +
+ ((uint64_t)vrf * TRIE_TBL24_NUM_ENT << nh_sz);
+
+ dp->def_nh[vrf] = vrf_def;
+ write_to_dp((void *)tbl24_ptr, (vrf_def << 1), nh_sz,
+ TRIE_TBL24_NUM_ENT);
+ }
+
dp->nh_sz = nh_sz;
dp->number_tbl8s = num_tbl8;
+ dp->num_vrfs = num_vrfs;
snprintf(mem_name, sizeof(mem_name), "TBL8_idxes_%p", dp);
dp->tbl8_pool = rte_zmalloc_socket(mem_name,
@@ -707,6 +781,7 @@ trie_create(const char *name, int socket_id,
RTE_CACHE_LINE_SIZE, socket_id);
if (dp->tbl8_pool == NULL) {
rte_errno = ENOMEM;
+ rte_free(dp->def_nh);
rte_free(dp->tbl8);
rte_free(dp);
return NULL;
@@ -725,6 +800,7 @@ trie_free(void *p)
rte_rcu_qsbr_dq_delete(dp->dq);
rte_free(dp->tbl8_pool);
rte_free(dp->tbl8);
+ rte_free(dp->def_nh);
rte_free(dp);
}
diff --git a/lib/fib/trie.h b/lib/fib/trie.h
index c34cc2c057..ef9a1d50c6 100644
--- a/lib/fib/trie.h
+++ b/lib/fib/trie.h
@@ -9,6 +9,7 @@
#include <stdalign.h>
#include <rte_common.h>
+#include <rte_debug.h>
#include <rte_fib6.h>
/**
@@ -32,18 +33,19 @@
struct rte_trie_tbl {
uint32_t number_tbl8s; /**< Total number of tbl8s */
uint32_t rsvd_tbl8s; /**< Number of reserved tbl8s */
- uint32_t cur_tbl8s; /**< Current cumber of tbl8s */
- uint64_t def_nh; /**< Default next hop */
+ uint32_t cur_tbl8s; /**< Current number of tbl8s */
+ uint16_t num_vrfs; /**< Number of VRFs */
enum rte_fib_trie_nh_sz nh_sz; /**< Size of nexthop entry */
- uint64_t *tbl8; /**< tbl8 table. */
- uint32_t *tbl8_pool; /**< bitmap containing free tbl8 idxes*/
- uint32_t tbl8_pool_pos;
/* RCU config. */
enum rte_fib6_qsbr_mode rcu_mode; /**< Blocking, defer queue. */
struct rte_rcu_qsbr *v; /**< RCU QSBR variable. */
struct rte_rcu_qsbr_dq *dq; /**< RCU QSBR defer queue. */
+ uint64_t *def_nh; /**< Per-VRF default next hop array */
+ uint64_t *tbl8; /**< tbl8 table for all VRFs */
+ uint32_t *tbl8_pool; /**< bitmap containing free tbl8 idxes */
+ uint32_t tbl8_pool_pos;
/* tbl24 table. */
- alignas(RTE_CACHE_LINE_SIZE) uint64_t tbl24[];
+ alignas(RTE_CACHE_LINE_SIZE) uint64_t tbl24[];
};
static inline uint32_t
@@ -53,12 +55,15 @@ get_tbl24_idx(const struct rte_ipv6_addr *ip)
}
static inline void *
-get_tbl24_p(struct rte_trie_tbl *dp, const struct rte_ipv6_addr *ip, uint8_t nh_sz)
+get_tbl24_p(struct rte_trie_tbl *dp, uint16_t vrf_id,
+ const struct rte_ipv6_addr *ip, uint8_t nh_sz)
{
uint32_t tbl24_idx;
+ uint64_t base;
tbl24_idx = get_tbl24_idx(ip);
- return (void *)&((uint8_t *)dp->tbl24)[tbl24_idx << nh_sz];
+ base = (uint64_t)vrf_id * TRIE_TBL24_NUM_ENT;
+ return (void *)&((uint8_t *)dp->tbl24)[(base + tbl24_idx) << nh_sz];
}
static inline uint8_t
@@ -110,17 +115,26 @@ is_entry_extended(uint64_t ent)
return (ent & TRIE_EXT_ENT) == TRIE_EXT_ENT;
}
-#define LOOKUP_FUNC(suffix, type, nh_sz) \
+#define LOOKUP_FUNC(suffix, type, is_vrf) \
static inline void rte_trie_lookup_bulk_##suffix(void *p, \
- const struct rte_ipv6_addr *ips, \
+ const uint16_t *vrf_ids, const struct rte_ipv6_addr *ips, \
uint64_t *next_hops, const unsigned int n) \
-{ \
+{ \
struct rte_trie_tbl *dp = (struct rte_trie_tbl *)p; \
uint64_t tmp; \
uint32_t i, j; \
+ uint32_t tbl24_idx; \
+ uint64_t base; \
+ \
+ if (!is_vrf) \
+ RTE_SET_USED(vrf_ids); \
\
for (i = 0; i < n; i++) { \
- tmp = ((type *)dp->tbl24)[get_tbl24_idx(&ips[i])]; \
+ uint16_t vrf_id = is_vrf ? vrf_ids[i] : 0; \
+ RTE_ASSERT(vrf_id < dp->num_vrfs); \
+ base = (uint64_t)vrf_id * TRIE_TBL24_NUM_ENT; \
+ tbl24_idx = get_tbl24_idx(&ips[i]); \
+ tmp = ((type *)dp->tbl24)[base + tbl24_idx]; \
j = 3; \
while (is_entry_extended(tmp)) { \
tmp = ((type *)dp->tbl8)[ips[i].a[j++] + \
@@ -129,9 +143,13 @@ static inline void rte_trie_lookup_bulk_##suffix(void *p, \
next_hops[i] = tmp >> 1; \
} \
}
-LOOKUP_FUNC(2b, uint16_t, 1)
-LOOKUP_FUNC(4b, uint32_t, 2)
-LOOKUP_FUNC(8b, uint64_t, 3)
+
+LOOKUP_FUNC(2b, uint16_t, false)
+LOOKUP_FUNC(4b, uint32_t, false)
+LOOKUP_FUNC(8b, uint64_t, false)
+LOOKUP_FUNC(vrf_2b, uint16_t, true)
+LOOKUP_FUNC(vrf_4b, uint32_t, true)
+LOOKUP_FUNC(vrf_8b, uint64_t, true)
void
trie_free(void *p);
@@ -144,7 +162,8 @@ rte_fib6_lookup_fn_t
trie_get_lookup_fn(void *p, enum rte_fib6_lookup_type type);
int
-trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
+trie_modify(struct rte_fib6 *fib, uint16_t vrf_id,
+ const struct rte_ipv6_addr *ip,
uint8_t depth, uint64_t next_hop, int op);
int
diff --git a/lib/fib/trie_avx512.c b/lib/fib/trie_avx512.c
index f49482a95d..19cd69e69c 100644
--- a/lib/fib/trie_avx512.c
+++ b/lib/fib/trie_avx512.c
@@ -8,6 +8,12 @@
#include "trie.h"
#include "trie_avx512.h"
+enum vrf_scale {
+ VRF_SCALE_SINGLE = 0,
+ VRF_SCALE_SMALL = 1,
+ VRF_SCALE_LARGE = 2,
+};
+
static __rte_always_inline void
transpose_x16(const struct rte_ipv6_addr *ips,
__m512i *first, __m512i *second, __m512i *third, __m512i *fourth)
@@ -67,8 +73,9 @@ transpose_x8(const struct rte_ipv6_addr *ips,
}
static __rte_always_inline void
-trie_vec_lookup_x16x2(void *p, const struct rte_ipv6_addr *ips,
- uint64_t *next_hops, int size)
+trie_vec_lookup_x16x2(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips, uint64_t *next_hops, int size,
+ enum vrf_scale vrf_scale)
{
struct rte_trie_tbl *dp = (struct rte_trie_tbl *)p;
const __m512i zero = _mm512_set1_epi32(0);
@@ -79,6 +86,7 @@ trie_vec_lookup_x16x2(void *p, const struct rte_ipv6_addr *ips,
__m512i first_2, second_2, third_2, fourth_2;
__m512i idxes_1, res_1;
__m512i idxes_2, res_2;
+ __m512i vrf32_1, vrf32_2;
__m512i shuf_idxes;
__m512i tmp_1, tmp2_1, bytes_1, byte_chunk_1;
__m512i tmp_2, tmp2_2, bytes_2, byte_chunk_2;
@@ -109,6 +117,24 @@ trie_vec_lookup_x16x2(void *p, const struct rte_ipv6_addr *ips,
idxes_1 = _mm512_shuffle_epi8(first_1, bswap.z);
idxes_2 = _mm512_shuffle_epi8(first_2, bswap.z);
+ if (vrf_scale == VRF_SCALE_SINGLE) {
+ RTE_SET_USED(vrf_ids);
+ } else {
+ uint32_t j;
+
+ for (j = 0; j < 32; j++)
+ RTE_ASSERT(vrf_ids[j] < dp->num_vrfs);
+
+ vrf32_1 = _mm512_cvtepu16_epi32(
+ _mm256_loadu_si256((const void *)vrf_ids));
+ vrf32_2 = _mm512_cvtepu16_epi32(
+ _mm256_loadu_si256((const void *)(vrf_ids + 16)));
+ idxes_1 = _mm512_add_epi32(idxes_1,
+ _mm512_slli_epi32(vrf32_1, 24));
+ idxes_2 = _mm512_add_epi32(idxes_2,
+ _mm512_slli_epi32(vrf32_2, 24));
+ }
+
/**
* lookup in tbl24
* Put it inside branch to make compiller happy with -O0
@@ -213,13 +239,15 @@ trie_vec_lookup_x16x2(void *p, const struct rte_ipv6_addr *ips,
}
static void
-trie_vec_lookup_x8x2_8b(void *p, const struct rte_ipv6_addr *ips,
- uint64_t *next_hops)
+trie_vec_lookup_x8x2(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips, uint64_t *next_hops, int size,
+ enum vrf_scale vrf_scale)
{
struct rte_trie_tbl *dp = (struct rte_trie_tbl *)p;
const __m512i zero = _mm512_set1_epi32(0);
const __m512i lsb = _mm512_set1_epi32(1);
const __m512i three_lsb = _mm512_set1_epi32(7);
+ __m512i res_msk;
/* IPv6 eight byte chunks */
__m512i first_1, second_1;
__m512i first_2, second_2;
@@ -228,6 +256,7 @@ trie_vec_lookup_x8x2_8b(void *p, const struct rte_ipv6_addr *ips,
__m512i shuf_idxes, base_idxes;
__m512i tmp_1, bytes_1, byte_chunk_1;
__m512i tmp_2, bytes_2, byte_chunk_2;
+ __m512i vrf64_1, vrf64_2;
const __rte_x86_zmm_t bswap = {
.u8 = { 2, 1, 0, 255, 255, 255, 255, 255,
10, 9, 8, 255, 255, 255, 255, 255,
@@ -244,6 +273,11 @@ trie_vec_lookup_x8x2_8b(void *p, const struct rte_ipv6_addr *ips,
__mmask8 msk_ext_1, new_msk_1;
__mmask8 msk_ext_2, new_msk_2;
+ if (size == sizeof(uint16_t))
+ res_msk = _mm512_set1_epi64(UINT16_MAX);
+ else if (size == sizeof(uint32_t))
+ res_msk = _mm512_set1_epi64(UINT32_MAX);
+
transpose_x8(ips, &first_1, &second_1);
transpose_x8(ips + 8, &first_2, &second_2);
@@ -251,9 +285,39 @@ trie_vec_lookup_x8x2_8b(void *p, const struct rte_ipv6_addr *ips,
idxes_1 = _mm512_shuffle_epi8(first_1, bswap.z);
idxes_2 = _mm512_shuffle_epi8(first_2, bswap.z);
+ if (vrf_scale == VRF_SCALE_SINGLE) {
+ RTE_SET_USED(vrf_ids);
+ } else {
+ uint32_t j;
+
+ for (j = 0; j < 16; j++)
+ RTE_ASSERT(vrf_ids[j] < dp->num_vrfs);
+
+ vrf64_1 = _mm512_cvtepu16_epi64(
+ _mm_loadu_si128((const void *)vrf_ids));
+ vrf64_2 = _mm512_cvtepu16_epi64(
+ _mm_loadu_si128((const void *)(vrf_ids + 8)));
+ idxes_1 = _mm512_add_epi64(idxes_1,
+ _mm512_slli_epi64(vrf64_1, 24));
+ idxes_2 = _mm512_add_epi64(idxes_2,
+ _mm512_slli_epi64(vrf64_2, 24));
+ }
+
/* lookup in tbl24 */
- res_1 = _mm512_i64gather_epi64(idxes_1, (const void *)dp->tbl24, 8);
- res_2 = _mm512_i64gather_epi64(idxes_2, (const void *)dp->tbl24, 8);
+ if (size == sizeof(uint16_t)) {
+ res_1 = _mm512_i64gather_epi64(idxes_1, (const void *)dp->tbl24, 2);
+ res_2 = _mm512_i64gather_epi64(idxes_2, (const void *)dp->tbl24, 2);
+ res_1 = _mm512_and_epi64(res_1, res_msk);
+ res_2 = _mm512_and_epi64(res_2, res_msk);
+ } else if (size == sizeof(uint32_t)) {
+ res_1 = _mm512_i64gather_epi64(idxes_1, (const void *)dp->tbl24, 4);
+ res_2 = _mm512_i64gather_epi64(idxes_2, (const void *)dp->tbl24, 4);
+ res_1 = _mm512_and_epi64(res_1, res_msk);
+ res_2 = _mm512_and_epi64(res_2, res_msk);
+ } else {
+ res_1 = _mm512_i64gather_epi64(idxes_1, (const void *)dp->tbl24, 8);
+ res_2 = _mm512_i64gather_epi64(idxes_2, (const void *)dp->tbl24, 8);
+ }
/* get extended entries indexes */
msk_ext_1 = _mm512_test_epi64_mask(res_1, lsb);
msk_ext_2 = _mm512_test_epi64_mask(res_2, lsb);
@@ -278,10 +342,26 @@ trie_vec_lookup_x8x2_8b(void *p, const struct rte_ipv6_addr *ips,
shuf_idxes);
idxes_1 = _mm512_maskz_add_epi64(msk_ext_1, idxes_1, bytes_1);
idxes_2 = _mm512_maskz_add_epi64(msk_ext_2, idxes_2, bytes_2);
- tmp_1 = _mm512_mask_i64gather_epi64(zero, msk_ext_1,
+ if (size == sizeof(uint16_t)) {
+ tmp_1 = _mm512_mask_i64gather_epi64(zero, msk_ext_1,
+ idxes_1, (const void *)dp->tbl8, 2);
+ tmp_2 = _mm512_mask_i64gather_epi64(zero, msk_ext_2,
+ idxes_2, (const void *)dp->tbl8, 2);
+ tmp_1 = _mm512_and_epi64(tmp_1, res_msk);
+ tmp_2 = _mm512_and_epi64(tmp_2, res_msk);
+ } else if (size == sizeof(uint32_t)) {
+ tmp_1 = _mm512_mask_i64gather_epi64(zero, msk_ext_1,
+ idxes_1, (const void *)dp->tbl8, 4);
+ tmp_2 = _mm512_mask_i64gather_epi64(zero, msk_ext_2,
+ idxes_2, (const void *)dp->tbl8, 4);
+ tmp_1 = _mm512_and_epi64(tmp_1, res_msk);
+ tmp_2 = _mm512_and_epi64(tmp_2, res_msk);
+ } else {
+ tmp_1 = _mm512_mask_i64gather_epi64(zero, msk_ext_1,
idxes_1, (const void *)dp->tbl8, 8);
- tmp_2 = _mm512_mask_i64gather_epi64(zero, msk_ext_2,
+ tmp_2 = _mm512_mask_i64gather_epi64(zero, msk_ext_2,
idxes_2, (const void *)dp->tbl8, 8);
+ }
new_msk_1 = _mm512_test_epi64_mask(tmp_1, lsb);
new_msk_2 = _mm512_test_epi64_mask(tmp_2, lsb);
res_1 = _mm512_mask_blend_epi64(msk_ext_1 ^ new_msk_1, res_1,
@@ -306,40 +386,145 @@ trie_vec_lookup_x8x2_8b(void *p, const struct rte_ipv6_addr *ips,
}
void
-rte_trie_vec_lookup_bulk_2b(void *p, const struct rte_ipv6_addr *ips,
+rte_trie_vec_lookup_bulk_2b(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
uint64_t *next_hops, const unsigned int n)
{
uint32_t i;
+
for (i = 0; i < (n / 32); i++) {
- trie_vec_lookup_x16x2(p, &ips[i * 32],
- next_hops + i * 32, sizeof(uint16_t));
+ trie_vec_lookup_x16x2(p, vrf_ids + i * 32, &ips[i * 32],
+ next_hops + i * 32, sizeof(uint16_t),
+ VRF_SCALE_SINGLE);
}
- rte_trie_lookup_bulk_2b(p, &ips[i * 32],
+ rte_trie_lookup_bulk_2b(p, vrf_ids + i * 32, &ips[i * 32],
next_hops + i * 32, n - i * 32);
}
void
-rte_trie_vec_lookup_bulk_4b(void *p, const struct rte_ipv6_addr *ips,
+rte_trie_vec_lookup_bulk_vrf_2b(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
uint64_t *next_hops, const unsigned int n)
{
uint32_t i;
+
for (i = 0; i < (n / 32); i++) {
- trie_vec_lookup_x16x2(p, &ips[i * 32],
- next_hops + i * 32, sizeof(uint32_t));
+ trie_vec_lookup_x16x2(p, vrf_ids + i * 32, &ips[i * 32],
+ next_hops + i * 32, sizeof(uint16_t),
+ VRF_SCALE_SMALL);
}
- rte_trie_lookup_bulk_4b(p, &ips[i * 32],
+ rte_trie_lookup_bulk_vrf_2b(p, vrf_ids + i * 32, &ips[i * 32],
next_hops + i * 32, n - i * 32);
}
void
-rte_trie_vec_lookup_bulk_8b(void *p, const struct rte_ipv6_addr *ips,
+rte_trie_vec_lookup_bulk_vrf_2b_large(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
uint64_t *next_hops, const unsigned int n)
{
uint32_t i;
+
+ for (i = 0; i < (n / 16); i++) {
+ trie_vec_lookup_x8x2(p, vrf_ids + i * 16, &ips[i * 16],
+ next_hops + i * 16, sizeof(uint16_t),
+ VRF_SCALE_LARGE);
+ }
+ rte_trie_lookup_bulk_vrf_2b(p, vrf_ids + i * 16, &ips[i * 16],
+ next_hops + i * 16, n - i * 16);
+}
+
+void
+rte_trie_vec_lookup_bulk_4b(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
+ uint64_t *next_hops, const unsigned int n)
+{
+ uint32_t i;
+
+ for (i = 0; i < (n / 32); i++) {
+ trie_vec_lookup_x16x2(p, vrf_ids + i * 32, &ips[i * 32],
+ next_hops + i * 32, sizeof(uint32_t),
+ VRF_SCALE_SINGLE);
+ }
+ rte_trie_lookup_bulk_4b(p, vrf_ids + i * 32, &ips[i * 32],
+ next_hops + i * 32, n - i * 32);
+}
+
+void
+rte_trie_vec_lookup_bulk_vrf_4b(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
+ uint64_t *next_hops, const unsigned int n)
+{
+ uint32_t i;
+
+ for (i = 0; i < (n / 32); i++) {
+ trie_vec_lookup_x16x2(p, vrf_ids + i * 32, &ips[i * 32],
+ next_hops + i * 32, sizeof(uint32_t),
+ VRF_SCALE_SMALL);
+ }
+ rte_trie_lookup_bulk_vrf_4b(p, vrf_ids + i * 32, &ips[i * 32],
+ next_hops + i * 32, n - i * 32);
+}
+
+void
+rte_trie_vec_lookup_bulk_vrf_4b_large(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
+ uint64_t *next_hops, const unsigned int n)
+{
+ uint32_t i;
+
+ for (i = 0; i < (n / 16); i++) {
+ trie_vec_lookup_x8x2(p, vrf_ids + i * 16, &ips[i * 16],
+ next_hops + i * 16, sizeof(uint32_t),
+ VRF_SCALE_LARGE);
+ }
+ rte_trie_lookup_bulk_vrf_4b(p, vrf_ids + i * 16, &ips[i * 16],
+ next_hops + i * 16, n - i * 16);
+}
+
+void
+rte_trie_vec_lookup_bulk_8b(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
+ uint64_t *next_hops, const unsigned int n)
+{
+ uint32_t i;
+
+ for (i = 0; i < (n / 16); i++) {
+ trie_vec_lookup_x8x2(p, vrf_ids + i * 16, &ips[i * 16],
+ next_hops + i * 16, sizeof(uint64_t),
+ VRF_SCALE_SINGLE);
+ }
+ rte_trie_lookup_bulk_8b(p, vrf_ids + i * 16, &ips[i * 16],
+ next_hops + i * 16, n - i * 16);
+}
+
+void
+rte_trie_vec_lookup_bulk_vrf_8b(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
+ uint64_t *next_hops, const unsigned int n)
+{
+ uint32_t i;
+
+ for (i = 0; i < (n / 16); i++) {
+ trie_vec_lookup_x8x2(p, vrf_ids + i * 16, &ips[i * 16],
+ next_hops + i * 16, sizeof(uint64_t),
+ VRF_SCALE_SMALL);
+ }
+ rte_trie_lookup_bulk_vrf_8b(p, vrf_ids + i * 16, &ips[i * 16],
+ next_hops + i * 16, n - i * 16);
+}
+
+void
+rte_trie_vec_lookup_bulk_vrf_8b_large(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
+ uint64_t *next_hops, const unsigned int n)
+{
+ uint32_t i;
+
for (i = 0; i < (n / 16); i++) {
- trie_vec_lookup_x8x2_8b(p, &ips[i * 16],
- next_hops + i * 16);
+ trie_vec_lookup_x8x2(p, vrf_ids + i * 16, &ips[i * 16],
+ next_hops + i * 16, sizeof(uint64_t),
+ VRF_SCALE_LARGE);
}
- rte_trie_lookup_bulk_8b(p, &ips[i * 16],
+ rte_trie_lookup_bulk_vrf_8b(p, vrf_ids + i * 16, &ips[i * 16],
next_hops + i * 16, n - i * 16);
}
diff --git a/lib/fib/trie_avx512.h b/lib/fib/trie_avx512.h
index 1028a4899f..190a5c5aa4 100644
--- a/lib/fib/trie_avx512.h
+++ b/lib/fib/trie_avx512.h
@@ -10,15 +10,48 @@
struct rte_ipv6_addr;
void
-rte_trie_vec_lookup_bulk_2b(void *p, const struct rte_ipv6_addr *ips,
+rte_trie_vec_lookup_bulk_2b(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
uint64_t *next_hops, const unsigned int n);
void
-rte_trie_vec_lookup_bulk_4b(void *p, const struct rte_ipv6_addr *ips,
+rte_trie_vec_lookup_bulk_vrf_2b(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
uint64_t *next_hops, const unsigned int n);
void
-rte_trie_vec_lookup_bulk_8b(void *p, const struct rte_ipv6_addr *ips,
+rte_trie_vec_lookup_bulk_vrf_2b_large(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
+ uint64_t *next_hops, const unsigned int n);
+
+void
+rte_trie_vec_lookup_bulk_4b(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
+ uint64_t *next_hops, const unsigned int n);
+
+void
+rte_trie_vec_lookup_bulk_vrf_4b(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
+ uint64_t *next_hops, const unsigned int n);
+
+void
+rte_trie_vec_lookup_bulk_vrf_4b_large(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
+ uint64_t *next_hops, const unsigned int n);
+
+void
+rte_trie_vec_lookup_bulk_8b(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
+ uint64_t *next_hops, const unsigned int n);
+
+void
+rte_trie_vec_lookup_bulk_vrf_8b(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
+ uint64_t *next_hops, const unsigned int n);
+
+void
+rte_trie_vec_lookup_bulk_vrf_8b_large(void *p, const uint16_t *vrf_ids,
+ const struct rte_ipv6_addr *ips,
uint64_t *next_hops, const unsigned int n);
#endif /* _TRIE_AVX512_H_ */
--
2.43.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [RFC PATCH 4/4] fib6: add VRF functional and unit tests
2026-03-22 15:42 [RFC PATCH 0/4] VRF support in FIB library Vladimir Medvedkin
` (2 preceding siblings ...)
2026-03-22 15:42 ` [RFC PATCH 3/4] fib6: add multi-VRF support Vladimir Medvedkin
@ 2026-03-22 15:42 ` Vladimir Medvedkin
2026-03-22 16:45 ` Stephen Hemminger
2026-03-22 16:43 ` [RFC PATCH 0/4] VRF support in FIB library Stephen Hemminger
` (3 subsequent siblings)
7 siblings, 1 reply; 33+ messages in thread
From: Vladimir Medvedkin @ 2026-03-22 15:42 UTC (permalink / raw)
To: dev; +Cc: rjarry, nsaxena16, mb, adwivedi, jerinjacobk
Add test coverage for the multi-VRF IPv6 FIB API.
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
app/test-fib/main.c | 92 +++++++++++--
app/test/test_fib6.c | 319 ++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 395 insertions(+), 16 deletions(-)
diff --git a/app/test-fib/main.c b/app/test-fib/main.c
index 5593fdd47e..0feac72b87 100644
--- a/app/test-fib/main.c
+++ b/app/test-fib/main.c
@@ -717,7 +717,7 @@ check_config(void)
* get_vrf_bits(nb_vrfs) must be strictly less than
* the total nexthop width.
*/
- if ((config.nb_vrfs > 1) && !(config.flags & IPV6_FLAG)) {
+ if (config.nb_vrfs > 1) {
uint8_t nh_sz = rte_ctz32(config.ent_sz);
uint8_t vrf_bits = get_vrf_bits(config.nb_vrfs);
/* - 2 to leave at least 1 bit for nexthop and 1 bit for ext_ent flag */
@@ -1165,6 +1165,7 @@ run_v6(void)
{
uint64_t start, acc;
uint64_t def_nh = 0;
+ uint8_t nh_sz = rte_ctz32(config.ent_sz);
struct rte_fib6 *fib;
struct rte_fib6_conf conf = {0};
struct rt_rule_6 *rt;
@@ -1175,6 +1176,7 @@ run_v6(void)
struct rte_ipv6_addr *tbl6;
uint64_t fib_nh[BURST_SZ];
int32_t lpm_nh[BURST_SZ];
+ uint16_t *vrf_ids = NULL;
rt = (struct rt_rule_6 *)config.rt;
tbl6 = config.lookup_tbl;
@@ -1189,16 +1191,38 @@ run_v6(void)
return ret;
}
+ /* Allocate VRF IDs array for lookups if using multiple VRFs */
+ if (config.nb_vrfs > 1) {
+ vrf_ids = rte_malloc(NULL, sizeof(uint16_t) * config.nb_lookup_ips, 0);
+ if (vrf_ids == NULL) {
+ printf("Can not alloc VRF IDs array\n");
+ return -ENOMEM;
+ }
+ /* Generate random VRF IDs for each lookup */
+ for (i = 0; i < config.nb_lookup_ips; i++)
+ vrf_ids[i] = rte_rand() % config.nb_vrfs;
+ }
+
conf.type = get_fib_type();
conf.default_nh = def_nh;
conf.max_routes = config.nb_routes * 2;
conf.rib_ext_sz = 0;
+ conf.max_vrfs = config.nb_vrfs;
+ conf.vrf_default_nh = NULL;
if (conf.type == RTE_FIB6_TRIE) {
conf.trie.nh_sz = rte_ctz32(config.ent_sz);
conf.trie.num_tbl8 = RTE_MIN(config.tbl8,
get_max_nh(conf.trie.nh_sz));
}
+ conf.vrf_default_nh = rte_malloc(NULL, conf.max_vrfs * sizeof(uint64_t), 0);
+ if (conf.vrf_default_nh == NULL) {
+ printf("Can not alloc VRF default nexthops array\n");
+ return -ENOMEM;
+ }
+ for (i = 0; i < conf.max_vrfs; i++)
+ conf.vrf_default_nh[i] = encode_vrf_nh(i, def_nh, nh_sz);
+
fib = rte_fib6_create("test", -1, &conf);
if (fib == NULL) {
printf("Can not alloc FIB, err %d\n", rte_errno);
@@ -1223,12 +1247,28 @@ run_v6(void)
for (k = config.print_fract, i = 0; k > 0; k--) {
start = rte_rdtsc_precise();
for (j = 0; j < (config.nb_routes - i) / k; j++) {
- ret = rte_fib6_add(fib, &rt[i + j].addr,
- rt[i + j].depth, rt[i + j].nh);
- if (unlikely(ret != 0)) {
- printf("Can not add a route to FIB, err %d\n",
- ret);
- return -ret;
+ uint32_t idx = i + j;
+ if (config.nb_vrfs > 1) {
+ uint16_t vrf_id;
+ for (vrf_id = 0; vrf_id < config.nb_vrfs; vrf_id++) {
+ uint64_t nh = encode_vrf_nh(vrf_id, rt[idx].nh,
+ nh_sz);
+ ret = rte_fib6_vrf_add(fib, vrf_id, &rt[idx].addr,
+ rt[idx].depth, nh);
+ if (unlikely(ret != 0)) {
+ printf("Can not add a route to FIB, err %d\n",
+ ret);
+ return -ret;
+ }
+ }
+ } else {
+ ret = rte_fib6_add(fib, &rt[idx].addr,
+ rt[idx].depth, rt[idx].nh);
+ if (unlikely(ret != 0)) {
+ printf("Can not add a route to FIB, err %d\n",
+ ret);
+ return -ret;
+ }
}
}
printf("AVG FIB add %"PRIu64"\n",
@@ -1268,15 +1308,33 @@ run_v6(void)
acc = 0;
for (i = 0; i < config.nb_lookup_ips; i += BURST_SZ) {
start = rte_rdtsc_precise();
- ret = rte_fib6_lookup_bulk(fib, &tbl6[i],
- fib_nh, BURST_SZ);
+ if (config.nb_vrfs > 1)
+ ret = rte_fib6_vrf_lookup_bulk(fib, vrf_ids + i,
+ &tbl6[i], fib_nh, BURST_SZ);
+ else
+ ret = rte_fib6_lookup_bulk(fib, &tbl6[i],
+ fib_nh, BURST_SZ);
acc += rte_rdtsc_precise() - start;
if (ret != 0) {
printf("FIB lookup fails, err %d\n", ret);
return -ret;
}
+ /* Validate VRF IDs in returned nexthops */
+ if (config.nb_vrfs > 1) {
+ for (j = 0; j < BURST_SZ; j++) {
+ uint16_t returned_vrf = decode_vrf_nh(fib_nh[j],
+ nh_sz);
+ if (returned_vrf != vrf_ids[i + j]) {
+ printf("VRF validation failed: expected VRF %u, got %u\n",
+ vrf_ids[i + j], returned_vrf);
+ return -1;
+ }
+ }
+ }
}
printf("AVG FIB lookup %.1f\n", (double)acc / (double)i);
+ if (config.nb_vrfs > 1)
+ printf("VRF validation passed\n");
if (config.flags & CMP_FLAG) {
acc = 0;
@@ -1314,8 +1372,17 @@ run_v6(void)
for (k = config.print_fract, i = 0; k > 0; k--) {
start = rte_rdtsc_precise();
- for (j = 0; j < (config.nb_routes - i) / k; j++)
- rte_fib6_delete(fib, &rt[i + j].addr, rt[i + j].depth);
+ for (j = 0; j < (config.nb_routes - i) / k; j++) {
+ uint32_t idx = i + j;
+ if (config.nb_vrfs > 1) {
+ uint16_t vrf_id;
+ for (vrf_id = 0; vrf_id < config.nb_vrfs; vrf_id++)
+ rte_fib6_vrf_delete(fib, vrf_id, &rt[idx].addr,
+ rt[idx].depth);
+ } else {
+ rte_fib6_delete(fib, &rt[idx].addr, rt[idx].depth);
+ }
+ }
printf("AVG FIB delete %"PRIu64"\n",
(rte_rdtsc_precise() - start) / j);
@@ -1334,6 +1401,9 @@ run_v6(void)
i += j;
}
}
+
+ if (vrf_ids != NULL)
+ rte_free(vrf_ids);
return 0;
}
diff --git a/app/test/test_fib6.c b/app/test/test_fib6.c
index fffb590dbf..1143a338e6 100644
--- a/app/test/test_fib6.c
+++ b/app/test/test_fib6.c
@@ -6,6 +6,7 @@
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
+#include <inttypes.h>
#include <rte_memory.h>
#include <rte_log.h>
@@ -25,6 +26,11 @@ static int32_t test_get_invalid(void);
static int32_t test_lookup(void);
static int32_t test_invalid_rcu(void);
static int32_t test_fib_rcu_sync_rw(void);
+static int32_t test_create_vrf(void);
+static int32_t test_vrf_add_del(void);
+static int32_t test_vrf_lookup(void);
+static int32_t test_vrf_isolation(void);
+static int32_t test_vrf_all_nh_sizes(void);
#define MAX_ROUTES (1 << 16)
/** Maximum number of tbl8 for 2-byte entries */
@@ -38,7 +44,7 @@ int32_t
test_create_invalid(void)
{
struct rte_fib6 *fib = NULL;
- struct rte_fib6_conf config;
+ struct rte_fib6_conf config = { 0 };
config.max_routes = MAX_ROUTES;
config.rib_ext_sz = 0;
@@ -97,7 +103,7 @@ int32_t
test_multiple_create(void)
{
struct rte_fib6 *fib = NULL;
- struct rte_fib6_conf config;
+ struct rte_fib6_conf config = { 0 };
int32_t i;
config.rib_ext_sz = 0;
@@ -124,7 +130,7 @@ int32_t
test_free_null(void)
{
struct rte_fib6 *fib = NULL;
- struct rte_fib6_conf config;
+ struct rte_fib6_conf config = { 0 };
config.max_routes = MAX_ROUTES;
config.rib_ext_sz = 0;
@@ -148,7 +154,7 @@ int32_t
test_add_del_invalid(void)
{
struct rte_fib6 *fib = NULL;
- struct rte_fib6_conf config;
+ struct rte_fib6_conf config = { 0 };
uint64_t nh = 100;
struct rte_ipv6_addr ip = RTE_IPV6_ADDR_UNSPEC;
int ret;
@@ -342,7 +348,7 @@ int32_t
test_lookup(void)
{
struct rte_fib6 *fib = NULL;
- struct rte_fib6_conf config;
+ struct rte_fib6_conf config = { 0 };
uint64_t def_nh = 100;
int ret;
@@ -599,6 +605,304 @@ test_fib_rcu_sync_rw(void)
return status == 0 ? TEST_SUCCESS : TEST_FAILED;
}
+/*
+ * Test VRF creation and basic operations
+ */
+static int32_t
+test_create_vrf(void)
+{
+ struct rte_fib6 *fib = NULL;
+ struct rte_fib6_conf config = { 0 };
+ uint64_t def_nh = 100;
+ uint64_t vrf_def_nh[4] = {100, 200, 300, 400};
+
+ config.max_routes = MAX_ROUTES;
+ config.rib_ext_sz = 0;
+ config.default_nh = def_nh;
+ config.type = RTE_FIB6_TRIE;
+ config.trie.nh_sz = RTE_FIB6_TRIE_4B;
+ config.trie.num_tbl8 = MAX_TBL8;
+
+ /* Test single VRF (backward compat) */
+ config.max_vrfs = 0;
+ config.vrf_default_nh = NULL;
+ fib = rte_fib6_create(__func__, SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB with max_vrfs=0\n");
+ rte_fib6_free(fib);
+
+ /* Test single VRF explicitly */
+ config.max_vrfs = 1;
+ fib = rte_fib6_create(__func__, SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB with max_vrfs=1\n");
+ rte_fib6_free(fib);
+
+ /* Test multi-VRF with per-VRF defaults */
+ config.max_vrfs = 4;
+ config.vrf_default_nh = vrf_def_nh;
+ fib = rte_fib6_create(__func__, SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB with max_vrfs=4\n");
+ rte_fib6_free(fib);
+
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test VRF route add/delete operations
+ */
+static int32_t
+test_vrf_add_del(void)
+{
+ struct rte_fib6 *fib = NULL;
+ struct rte_fib6_conf config = { 0 };
+ uint64_t def_nh = 100;
+ uint64_t vrf_def_nh[4] = {100, 200, 300, 400};
+ struct rte_ipv6_addr ip = RTE_IPV6(0x2001, 0, 0, 0, 0, 0, 0, 0);
+ uint8_t depth = 64;
+ uint64_t nh = 1000;
+ int ret;
+
+ config.max_routes = MAX_ROUTES;
+ config.rib_ext_sz = 0;
+ config.default_nh = def_nh;
+ config.type = RTE_FIB6_TRIE;
+ config.trie.nh_sz = RTE_FIB6_TRIE_4B;
+ config.trie.num_tbl8 = MAX_TBL8;
+ config.max_vrfs = 4;
+ config.vrf_default_nh = vrf_def_nh;
+
+ fib = rte_fib6_create(__func__, SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB\n");
+
+ /* Add route to VRF 0 */
+ ret = rte_fib6_vrf_add(fib, 0, &ip, depth, nh);
+ RTE_TEST_ASSERT(ret == 0, "Failed to add route to VRF 0\n");
+
+ /* Add route to VRF 1 with different nexthop */
+ ret = rte_fib6_vrf_add(fib, 1, &ip, depth, nh + 1);
+ RTE_TEST_ASSERT(ret == 0, "Failed to add route to VRF 1\n");
+
+ /* Add route to VRF 2 */
+ ret = rte_fib6_vrf_add(fib, 2, &ip, depth, nh + 2);
+ RTE_TEST_ASSERT(ret == 0, "Failed to add route to VRF 2\n");
+
+ /* Test invalid VRF ID */
+ ret = rte_fib6_vrf_add(fib, 10, &ip, depth, nh);
+ RTE_TEST_ASSERT(ret != 0, "Should fail with invalid VRF ID\n");
+
+ /* Delete route from VRF 1 */
+ ret = rte_fib6_vrf_delete(fib, 1, &ip, depth);
+ RTE_TEST_ASSERT(ret == 0, "Failed to delete route from VRF 1\n");
+
+ /* Delete non-existent route - implementation may return error */
+ ret = rte_fib6_vrf_delete(fib, 3, &ip, depth);
+ (void)ret; /* Accept any return value */
+
+ rte_fib6_free(fib);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test VRF lookup functionality
+ */
+static int32_t
+test_vrf_lookup(void)
+{
+ struct rte_fib6 *fib = NULL;
+ struct rte_fib6_conf config = { 0 };
+ uint64_t def_nh = 100;
+ uint64_t vrf_def_nh[4] = {1000, 2000, 3000, 4000};
+ struct rte_ipv6_addr ip_base = RTE_IPV6(0x2001, 0, 0, 0, 0, 0, 0, 0);
+ uint16_t vrf_ids[4];
+ struct rte_ipv6_addr ips[4];
+ uint64_t next_hops[4];
+ int ret;
+ uint32_t i;
+
+ config.max_routes = MAX_ROUTES;
+ config.rib_ext_sz = 0;
+ config.default_nh = def_nh;
+ config.type = RTE_FIB6_TRIE;
+ config.trie.nh_sz = RTE_FIB6_TRIE_4B;
+ config.trie.num_tbl8 = MAX_TBL8;
+ config.max_vrfs = 4;
+ config.vrf_default_nh = vrf_def_nh;
+
+ fib = rte_fib6_create(__func__, SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB\n");
+
+ /* Add routes to different VRFs with VRF-specific nexthops */
+ for (i = 0; i < 4; i++) {
+ struct rte_ipv6_addr ip = ip_base;
+ ip.a[1] = (uint8_t)i;
+ ret = rte_fib6_vrf_add(fib, i, &ip, 64, 100 + i);
+ RTE_TEST_ASSERT(ret == 0, "Failed to add route to VRF %u\n", i);
+ }
+
+ /* Prepare lookup: each IP should match its VRF-specific route */
+ for (i = 0; i < 4; i++) {
+ vrf_ids[i] = i;
+ ips[i] = ip_base;
+ ips[i].a[1] = (uint8_t)i;
+ ips[i].a[15] = 0x34; /* within /64 */
+ }
+
+ /* Lookup should return VRF-specific nexthops */
+ ret = rte_fib6_vrf_lookup_bulk(fib, vrf_ids, ips, next_hops, 4);
+ RTE_TEST_ASSERT(ret == 0, "VRF lookup failed\n");
+
+ for (i = 0; i < 4; i++) {
+ RTE_TEST_ASSERT(next_hops[i] == 100 + i,
+ "Wrong nexthop for VRF %u: expected %"PRIu64", got %"PRIu64"\n",
+ i, (uint64_t)(100 + i), next_hops[i]);
+ }
+
+ /* Test default nexthops for unmatched IPs */
+ {
+ struct rte_ipv6_addr ip_unmatch = RTE_IPV6(0x3001, 0, 0, 0, 0, 0, 0, 1);
+ for (i = 0; i < 4; i++) {
+ vrf_ids[i] = i;
+ ips[i] = ip_unmatch;
+ }
+ }
+
+ ret = rte_fib6_vrf_lookup_bulk(fib, vrf_ids, ips, next_hops, 4);
+ RTE_TEST_ASSERT(ret == 0, "VRF lookup failed\n");
+
+ for (i = 0; i < 4; i++) {
+ RTE_TEST_ASSERT(next_hops[i] == vrf_def_nh[i],
+ "Wrong default nexthop for VRF %u: expected %"PRIu64", got %"PRIu64"\n",
+ i, vrf_def_nh[i], next_hops[i]);
+ }
+
+ rte_fib6_free(fib);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test VRF isolation - routes in one VRF shouldn't affect others
+ */
+static int32_t
+test_vrf_isolation(void)
+{
+ struct rte_fib6 *fib = NULL;
+ struct rte_fib6_conf config = { 0 };
+ uint64_t vrf_def_nh[3] = {100, 200, 300};
+ struct rte_ipv6_addr ip = RTE_IPV6(0x2001, 0, 0, 0, 0, 0, 0, 0);
+ uint16_t vrf_ids[3] = {0, 1, 2};
+ struct rte_ipv6_addr ips[3];
+ uint64_t next_hops[3];
+ int ret;
+ uint32_t i;
+
+ config.max_routes = MAX_ROUTES;
+ config.rib_ext_sz = 0;
+ config.default_nh = 0;
+ config.type = RTE_FIB6_TRIE;
+ config.trie.nh_sz = RTE_FIB6_TRIE_4B;
+ config.trie.num_tbl8 = MAX_TBL8;
+ config.max_vrfs = 3;
+ config.vrf_default_nh = vrf_def_nh;
+
+ fib = rte_fib6_create("test_vrf6_isol", SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB\n");
+
+ /* Add route only to VRF 1 */
+ ret = rte_fib6_vrf_add(fib, 1, &ip, 64, 777);
+ RTE_TEST_ASSERT(ret == 0, "Failed to add route to VRF 1\n");
+
+ /* Lookup same IP in all three VRFs */
+ for (i = 0; i < 3; i++) {
+ ips[i] = ip;
+ ips[i].a[15] = 0x22; /* within /64 */
+ }
+
+ ret = rte_fib6_vrf_lookup_bulk(fib, vrf_ids, ips, next_hops, 3);
+ RTE_TEST_ASSERT(ret == 0, "VRF lookup failed\n");
+
+ /* VRF 0 should get default */
+ RTE_TEST_ASSERT(next_hops[0] == vrf_def_nh[0],
+ "VRF 0 should return default nexthop\n");
+
+ /* VRF 1 should get the route */
+ RTE_TEST_ASSERT(next_hops[1] == 777,
+ "VRF 1 should return route nexthop 777, got %"PRIu64"\n", next_hops[1]);
+
+ /* VRF 2 should get default */
+ RTE_TEST_ASSERT(next_hops[2] == vrf_def_nh[2],
+ "VRF 2 should return default nexthop\n");
+
+ rte_fib6_free(fib);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test multi-VRF with all nexthop sizes
+ */
+static int32_t
+test_vrf_all_nh_sizes(void)
+{
+ struct rte_fib6 *fib = NULL;
+ struct rte_fib6_conf config = { 0 };
+ uint64_t vrf_def_nh[2] = {10, 20};
+ struct rte_ipv6_addr ip = RTE_IPV6(0x2001, 0, 0, 0, 0, 0, 0, 0);
+ uint16_t vrf_ids[2] = {0, 1};
+ struct rte_ipv6_addr ips[2];
+ uint64_t next_hops[2];
+ int ret;
+ enum rte_fib_trie_nh_sz nh_sizes[] = {
+ RTE_FIB6_TRIE_2B,
+ RTE_FIB6_TRIE_4B,
+ RTE_FIB6_TRIE_8B
+ };
+ uint64_t max_nhs[] = {32767, 2147483647ULL, 9223372036854775807ULL};
+ int i;
+
+ config.max_routes = MAX_ROUTES;
+ config.rib_ext_sz = 0;
+ config.default_nh = 0;
+ config.type = RTE_FIB6_TRIE;
+ config.trie.num_tbl8 = MAX_TBL8 - 1;
+ config.max_vrfs = 2;
+ config.vrf_default_nh = vrf_def_nh;
+
+ for (i = 0; i < (int)RTE_DIM(nh_sizes); i++) {
+ char name[32];
+ config.trie.nh_sz = nh_sizes[i];
+ snprintf(name, sizeof(name), "vrf6_nh%d", i);
+
+ fib = rte_fib6_create(name, SOCKET_ID_ANY, &config);
+ RTE_TEST_ASSERT(fib != NULL, "Failed to create FIB\n");
+
+ /* Add routes with max nexthop for this size */
+ ret = rte_fib6_vrf_add(fib, 0, &ip, 64, max_nhs[i]);
+ RTE_TEST_ASSERT(ret == 0,
+ "Failed to add route to VRF 0 with nh_sz=%d\n", nh_sizes[i]);
+
+ ret = rte_fib6_vrf_add(fib, 1, &ip, 64, max_nhs[i] - 1);
+ RTE_TEST_ASSERT(ret == 0,
+ "Failed to add route to VRF 1 with nh_sz=%d\n", nh_sizes[i]);
+
+ /* Lookup */
+ ips[0] = ip;
+ ips[1] = ip;
+ ips[0].a[15] = 0x11;
+ ips[1].a[15] = 0x22;
+
+ ret = rte_fib6_vrf_lookup_bulk(fib, vrf_ids, ips, next_hops, 2);
+ RTE_TEST_ASSERT(ret == 0, "VRF lookup failed with nh_sz=%d\n", nh_sizes[i]);
+
+ RTE_TEST_ASSERT(next_hops[0] == max_nhs[i],
+ "Wrong nexthop for VRF 0 with nh_sz=%d\n", nh_sizes[i]);
+ RTE_TEST_ASSERT(next_hops[1] == max_nhs[i] - 1,
+ "Wrong nexthop for VRF 1 with nh_sz=%d\n", nh_sizes[i]);
+
+ rte_fib6_free(fib);
+ fib = NULL;
+ }
+
+ return TEST_SUCCESS;
+}
+
static struct unit_test_suite fib6_fast_tests = {
.suite_name = "fib6 autotest",
.setup = NULL,
@@ -611,6 +915,11 @@ static struct unit_test_suite fib6_fast_tests = {
TEST_CASE(test_lookup),
TEST_CASE(test_invalid_rcu),
TEST_CASE(test_fib_rcu_sync_rw),
+ TEST_CASE(test_create_vrf),
+ TEST_CASE(test_vrf_add_del),
+ TEST_CASE(test_vrf_lookup),
+ TEST_CASE(test_vrf_isolation),
+ TEST_CASE(test_vrf_all_nh_sizes),
TEST_CASES_END()
}
};
--
2.43.0
* Re: [RFC PATCH 2/4] fib: add VRF functional and unit tests
2026-03-22 15:42 ` [RFC PATCH 2/4] fib: add VRF functional and unit tests Vladimir Medvedkin
@ 2026-03-22 16:40 ` Stephen Hemminger
2026-03-22 16:41 ` Stephen Hemminger
1 sibling, 0 replies; 33+ messages in thread
From: Stephen Hemminger @ 2026-03-22 16:40 UTC (permalink / raw)
To: Vladimir Medvedkin; +Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk
On Sun, 22 Mar 2026 15:42:13 +0000
Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote:
> +/* Get number of bits needed to encode VRF IDs */
> +static __rte_always_inline __rte_pure uint8_t
> +get_vrf_bits(uint32_t nb_vrfs)
This is test code, no need for __rte_always_inline.
* Re: [RFC PATCH 2/4] fib: add VRF functional and unit tests
2026-03-22 15:42 ` [RFC PATCH 2/4] fib: add VRF functional and unit tests Vladimir Medvedkin
2026-03-22 16:40 ` Stephen Hemminger
@ 2026-03-22 16:41 ` Stephen Hemminger
1 sibling, 0 replies; 33+ messages in thread
From: Stephen Hemminger @ 2026-03-22 16:41 UTC (permalink / raw)
To: Vladimir Medvedkin; +Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk
On Sun, 22 Mar 2026 15:42:13 +0000
Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote:
> + case 'V':
> + errno = 0;
> + config.nb_vrfs = strtoul(optarg, &endptr, 10);
> + /* VRF IDs are uint16_t, max valid VRF is 65535 */
> + if ((errno != 0) || (config.nb_vrfs == 0) ||
> + (config.nb_vrfs > UINT16_MAX)) {
> + print_usage();
Should also check endptr. Right now "-V1xx" would be accepted.
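The stricter parse Stephen asks for can be sketched as a small standalone helper; `parse_nb_vrfs` is a hypothetical name for illustration, not part of the patch:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal VRF count, rejecting trailing
 * garbage such as "1xx" by requiring *endptr == '\0' after strtoul(). */
static int
parse_nb_vrfs(const char *arg, uint32_t *out)
{
	char *endptr;
	unsigned long v;

	errno = 0;
	v = strtoul(arg, &endptr, 10);
	/* errno catches overflow; endptr == arg catches empty input;
	 * *endptr != '\0' catches trailing non-digits. */
	if (errno != 0 || endptr == arg || *endptr != '\0')
		return -1;
	/* VRF IDs are uint16_t, so the count must fit in [1, 65535]. */
	if (v == 0 || v > UINT16_MAX)
		return -1;
	*out = (uint32_t)v;
	return 0;
}
```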
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-22 15:42 [RFC PATCH 0/4] VRF support in FIB library Vladimir Medvedkin
` (3 preceding siblings ...)
2026-03-22 15:42 ` [RFC PATCH 4/4] fib6: add VRF functional and unit tests Vladimir Medvedkin
@ 2026-03-22 16:43 ` Stephen Hemminger
2026-03-23 9:01 ` Morten Brørup
2026-03-23 11:16 ` Medvedkin, Vladimir
2026-03-23 9:54 ` Robin Jarry
` (2 subsequent siblings)
7 siblings, 2 replies; 33+ messages in thread
From: Stephen Hemminger @ 2026-03-22 16:43 UTC (permalink / raw)
To: Vladimir Medvedkin; +Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk
On Sun, 22 Mar 2026 15:42:11 +0000
Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote:
> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
> allowing a single FIB instance to host multiple isolated routing domains.
>
> Currently FIB instance represents one routing instance. For workloads that
> need multiple VRFs, the only option is to create multiple FIB objects. In a
> burst oriented datapath, packets in the same batch can belong to different VRFs, so
> the application either does per-packet lookup in different FIB instances or
> regroups packets by VRF before lookup. Both approaches are expensive.
>
> To remove that cost, this series keeps all VRFs inside one FIB instance and
> extends lookup input with per-packet VRF IDs.
>
> The design follows the existing fast-path structure for both families. IPv4 and
> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
> first-level table scales per configured VRF. This increases memory usage, but
> keeps performance and lookup complexity on par with non-VRF implementation.
>
> Vladimir Medvedkin (4):
> fib: add multi-VRF support
> fib: add VRF functional and unit tests
> fib6: add multi-VRF support
> fib6: add VRF functional and unit tests
>
> app/test-fib/main.c | 257 ++++++++++++++++++++++--
> app/test/test_fib.c | 298 +++++++++++++++++++++++++++
> app/test/test_fib6.c | 319 ++++++++++++++++++++++++++++-
> lib/fib/dir24_8.c | 241 ++++++++++++++++------
> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
> lib/fib/dir24_8_avx512.h | 80 +++++++-
> lib/fib/rte_fib.c | 158 ++++++++++++---
> lib/fib/rte_fib.h | 94 ++++++++-
> lib/fib/rte_fib6.c | 166 +++++++++++++---
> lib/fib/rte_fib6.h | 88 +++++++-
> lib/fib/trie.c | 158 +++++++++++----
> lib/fib/trie.h | 51 +++--
> lib/fib/trie_avx512.c | 225 +++++++++++++++++++--
> lib/fib/trie_avx512.h | 39 +++-
> 15 files changed, 2453 insertions(+), 396 deletions(-)
Not sure at all if this is the right way to do VRF.
There are multiple ways to do VRF, the Linux way, the Cisco way, ...
This needs way more documentation and also an example.
Like an option to l3fwd. And also an implementation in testpmd.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 4/4] fib6: add VRF functional and unit tests
2026-03-22 15:42 ` [RFC PATCH 4/4] fib6: add VRF functional and unit tests Vladimir Medvedkin
@ 2026-03-22 16:45 ` Stephen Hemminger
0 siblings, 0 replies; 33+ messages in thread
From: Stephen Hemminger @ 2026-03-22 16:45 UTC (permalink / raw)
To: Vladimir Medvedkin; +Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk
On Sun, 22 Mar 2026 15:42:15 +0000
Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote:
>
> + /* Allocate VRF IDs array for lookups if using multiple VRFs */
> + if (config.nb_vrfs > 1) {
> + vrf_ids = rte_malloc(NULL, sizeof(uint16_t) * config.nb_lookup_ips, 0);
Since this won't be shared across processes, use regular malloc, not hugepages.
> + if (vrf_ids == NULL) {
> + printf("Can not alloc VRF IDs array\n");
> + return -ENOMEM;
> + }
> + /* Generate random VRF IDs for each lookup */
> + for (i = 0; i < config.nb_lookup_ips; i++)
> + vrf_ids[i] = rte_rand() % config.nb_vrfs;
> +
Use rte_rand_max here.
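For context, `rte_rand() % n` has a slight modulo bias when n does not divide the generator's range; `rte_rand_max()` avoids it. A standalone sketch of the rejection-sampling idea (with `rand_source` as a stand-in for `rte_rand()`, purely for illustration):

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for rte_rand(); any 64-bit PRNG works for illustration. */
static uint64_t
rand_source(void)
{
	return ((uint64_t)rand() << 32) | (uint64_t)rand();
}

/* Bounded random in [0, max), roughly the rejection-sampling approach
 * rte_rand_max() uses: discard draws from the incomplete top interval
 * that would otherwise skew the distribution toward small residues. */
static uint64_t
rand_below(uint64_t max)
{
	uint64_t v;
	const uint64_t limit = UINT64_MAX - (UINT64_MAX % max);

	do {
		v = rand_source();
	} while (v >= limit);
	return v % max;
}
```

The loop almost never iterates more than once for small `max`, so the cost is negligible compared with the bias it removes.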
* RE: [RFC PATCH 0/4] VRF support in FIB library
2026-03-22 16:43 ` [RFC PATCH 0/4] VRF support in FIB library Stephen Hemminger
@ 2026-03-23 9:01 ` Morten Brørup
2026-03-23 11:32 ` Medvedkin, Vladimir
2026-03-23 11:16 ` Medvedkin, Vladimir
1 sibling, 1 reply; 33+ messages in thread
From: Morten Brørup @ 2026-03-23 9:01 UTC (permalink / raw)
To: Stephen Hemminger, Vladimir Medvedkin
Cc: dev, rjarry, nsaxena16, adwivedi, jerinjacobk
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Sunday, 22 March 2026 17.44
>
> On Sun, 22 Mar 2026 15:42:11 +0000
> Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote:
>
> > This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
> > allowing a single FIB instance to host multiple isolated routing
> domains.
> >
> > Currently FIB instance represents one routing instance. For workloads
> that
> > need multiple VRFs, the only option is to create multiple FIB
> objects. In a
> > burst oriented datapath, packets in the same batch can belong to
> different VRFs, so
> > the application either does per-packet lookup in different FIB
> instances or
> > regroups packets by VRF before lookup. Both approaches are expensive.
> >
> > To remove that cost, this series keeps all VRFs inside one FIB
> instance and
> > extends lookup input with per-packet VRF IDs.
> >
> > The design follows the existing fast-path structure for both
> families. IPv4 and
> > IPv6 use multi-ary trees with a 2^24 associativity on a first level
> (tbl24). The
> > first-level table scales per configured VRF. This increases memory
> usage, but
> > keeps performance and lookup complexity on par with non-VRF
> implementation.
> >
I noticed the suggested API uses separate parameters for the VRF and IP.
How about using one parameter, a structure containing the {VRF, IP} tuple, instead?
I'm mainly thinking about the bulk operations, where passing one array seems more intuitive than passing two arrays.
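One hypothetical shape for that API (names and layout are illustrative only, not a proposed ABI; the toy lookup stands in for the real tbl24/tbl8 walk):

```c
#include <stdint.h>

/* Hypothetical {VRF, IP} key for IPv4 bulk lookup: one array of keys
 * instead of the parallel vrf_ids[] and ips[] arrays in the RFC. */
struct fib_vrf_key {
	uint16_t vrf_id;
	uint32_t ip;
};

/* Toy linear lookup standing in for a rte_fib_vrf_lookup_bulk()
 * variant: VRF 1 routes 10.0.0.0/8 to nexthop 777, everything else
 * falls back to the default nexthop. */
static void
lookup_bulk(const struct fib_vrf_key *keys, uint64_t *next_hops,
	    unsigned int n, uint64_t def_nh)
{
	for (unsigned int i = 0; i < n; i++) {
		if (keys[i].vrf_id == 1 && (keys[i].ip >> 24) == 10)
			next_hops[i] = 777;
		else
			next_hops[i] = def_nh;
	}
}
```

One trade-off worth weighing: interleaving VRF and IP in a struct changes the memory access pattern for the vectorized (gather-based) lookup paths compared with two contiguous arrays.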
>
>
> Not sure at all if this the right way to do VRF.
> There are multiple ways to do VRF, the Linux way, the Cisco way, ...
I think a shared table operating on the {VRF, IP} tuple makes sense.
If a table instance per VRF is preferred, that is still supported.
Can you elaborate on what Linux and Cisco do differently from this?
>
>
>
> This needs way more documentation and also an example.
+1
> Like an option to l3fwd. And also an implementation in testpmd.
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-22 15:42 [RFC PATCH 0/4] VRF support in FIB library Vladimir Medvedkin
` (4 preceding siblings ...)
2026-03-22 16:43 ` [RFC PATCH 0/4] VRF support in FIB library Stephen Hemminger
@ 2026-03-23 9:54 ` Robin Jarry
2026-03-23 11:34 ` Medvedkin, Vladimir
2026-03-23 11:27 ` Maxime Leroy
2026-03-23 19:05 ` Stephen Hemminger
7 siblings, 1 reply; 33+ messages in thread
From: Robin Jarry @ 2026-03-23 9:54 UTC (permalink / raw)
To: Vladimir Medvedkin, dev; +Cc: nsaxena16, mb, adwivedi, jerinjacobk
Vladimir Medvedkin, Mar 22, 2026 at 16:42:
> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
> allowing a single FIB instance to host multiple isolated routing domains.
>
> Currently FIB instance represents one routing instance. For workloads that
> need multiple VRFs, the only option is to create multiple FIB objects. In a
> burst oriented datapath, packets in the same batch can belong to different VRFs, so
> the application either does per-packet lookup in different FIB instances or
> regroups packets by VRF before lookup. Both approaches are expensive.
>
> To remove that cost, this series keeps all VRFs inside one FIB instance and
> extends lookup input with per-packet VRF IDs.
>
> The design follows the existing fast-path structure for both families. IPv4 and
> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
> first-level table scales per configured VRF. This increases memory usage, but
> keeps performance and lookup complexity on par with non-VRF implementation.
>
> Vladimir Medvedkin (4):
> fib: add multi-VRF support
> fib: add VRF functional and unit tests
> fib6: add multi-VRF support
> fib6: add VRF functional and unit tests
Hey Vladimir,
Thanks for the series, this is an interesting approach. Does this allow
sharing the tbl8 arrays amongst VRFs?
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-22 16:43 ` [RFC PATCH 0/4] VRF support in FIB library Stephen Hemminger
2026-03-23 9:01 ` Morten Brørup
@ 2026-03-23 11:16 ` Medvedkin, Vladimir
1 sibling, 0 replies; 33+ messages in thread
From: Medvedkin, Vladimir @ 2026-03-23 11:16 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk
Hi Stephen,
On 3/22/2026 4:43 PM, Stephen Hemminger wrote:
> On Sun, 22 Mar 2026 15:42:11 +0000
> Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote:
>
>> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
>> allowing a single FIB instance to host multiple isolated routing domains.
>>
>> Currently FIB instance represents one routing instance. For workloads that
>> need multiple VRFs, the only option is to create multiple FIB objects. In a
>> burst oriented datapath, packets in the same batch can belong to different VRFs, so
>> the application either does per-packet lookup in different FIB instances or
>> regroups packets by VRF before lookup. Both approaches are expensive.
>>
>> To remove that cost, this series keeps all VRFs inside one FIB instance and
>> extends lookup input with per-packet VRF IDs.
>>
>> The design follows the existing fast-path structure for both families. IPv4 and
>> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
>> first-level table scales per configured VRF. This increases memory usage, but
>> keeps performance and lookup complexity on par with non-VRF implementation.
>>
>> Vladimir Medvedkin (4):
>> fib: add multi-VRF support
>> fib: add VRF functional and unit tests
>> fib6: add multi-VRF support
>> fib6: add VRF functional and unit tests
>>
>> app/test-fib/main.c | 257 ++++++++++++++++++++++--
>> app/test/test_fib.c | 298 +++++++++++++++++++++++++++
>> app/test/test_fib6.c | 319 ++++++++++++++++++++++++++++-
>> lib/fib/dir24_8.c | 241 ++++++++++++++++------
>> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
>> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
>> lib/fib/dir24_8_avx512.h | 80 +++++++-
>> lib/fib/rte_fib.c | 158 ++++++++++++---
>> lib/fib/rte_fib.h | 94 ++++++++-
>> lib/fib/rte_fib6.c | 166 +++++++++++++---
>> lib/fib/rte_fib6.h | 88 +++++++-
>> lib/fib/trie.c | 158 +++++++++++----
>> lib/fib/trie.h | 51 +++--
>> lib/fib/trie_avx512.c | 225 +++++++++++++++++++--
>> lib/fib/trie_avx512.h | 39 +++-
>> 15 files changed, 2453 insertions(+), 396 deletions(-)
>
> Not sure at all if this is the right way to do VRF.
I'd like to hear the reasons why this is not the right way, what the
right way is, and why.
> There are multiple ways to do VRF, the Linux way, the Cisco way, ...
I'm not 100% sure what defines a "way", but looking at VPP (since Cisco
was mentioned) their way is pretty much the same as Linux - i.e. it has
a separate handle per VRF (fib_index):
https://github.com/FDio/vpp/blob/c2f6a88b43d6175730d5d19364cb446e128065da/src/vnet/fib/ip4_fib.h#L128
In grout with the current FIB design the situation is the same - FIB
instance per VRF (vrf_id):
https://github.com/DPDK/grout/blob/0f7b91c287f7f54cd75a1b64da7ade505b346bf5/modules/ip/control/route.c#L147
I wrote about the problems of this approach in the cover letter,
different options were considered, and this is the design I have arrived at.
After all, why should we have to follow someone else's way? I'm
proposing our own "DPDK way" :)
>
>
>
> This needs way more documentation and also an example.
> Like an option to l3fwd. And also an implementation in testpmd.
--
Regards,
Vladimir
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-22 15:42 [RFC PATCH 0/4] VRF support in FIB library Vladimir Medvedkin
` (5 preceding siblings ...)
2026-03-23 9:54 ` Robin Jarry
@ 2026-03-23 11:27 ` Maxime Leroy
2026-03-23 12:49 ` Medvedkin, Vladimir
2026-03-23 19:05 ` Stephen Hemminger
7 siblings, 1 reply; 33+ messages in thread
From: Maxime Leroy @ 2026-03-23 11:27 UTC (permalink / raw)
To: Vladimir Medvedkin; +Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk
Hi Vladimir,
On Sun, Mar 22, 2026 at 4:42 PM Vladimir Medvedkin
<vladimir.medvedkin@intel.com> wrote:
>
> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
> allowing a single FIB instance to host multiple isolated routing domains.
>
> Currently FIB instance represents one routing instance. For workloads that
> need multiple VRFs, the only option is to create multiple FIB objects. In a
> burst oriented datapath, packets in the same batch can belong to different VRFs, so
> the application either does per-packet lookup in different FIB instances or
> regroups packets by VRF before lookup. Both approaches are expensive.
>
> To remove that cost, this series keeps all VRFs inside one FIB instance and
> extends lookup input with per-packet VRF IDs.
>
> The design follows the existing fast-path structure for both families. IPv4 and
> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
> first-level table scales per configured VRF. This increases memory usage, but
> keeps performance and lookup complexity on par with non-VRF implementation.
>
Thanks for the RFC. Some thoughts below.
Memory cost: the flat TBL24 replicates the entire table for every VRF
(num_vrfs * 2^24 * nh_size). With 256 VRFs and 8B nexthops that is
32 GB for TBL24 alone. In grout we support up to 256 VRFs allocated
on demand -- this approach forces the full cost upfront even if most
VRFs are empty.
Per-packet VRF lookup: Rx bursts come from one port, thus one VRF.
Mixed-VRF bulk lookups do not occur in practice. The three AVX512
code paths add complexity for a scenario that does not exist, at
least for a classic router. Am I missing a use-case?
I am not too familiar with DPDK FIB internals, but would it be
possible to keep a separate TBL24 per VRF and only share the TBL8
pool? Something like pre-allocating an array of max_vrfs TBL24
pointers, allocating each TBL24 on demand at VRF add time, and
having them all point into a shared TBL8 pool. The TBL8 index in
TBL24 entries seems to already be global, so would that work without
encoding changes?
Going further: could the same idea extend to IPv6? The dir24_8 and
trie seem to use the same TBL8 block format (256 entries, same
(nh << 1) | ext_bit encoding, same size). Would unifying the TBL8
allocator allow a single pool shared across IPv4, IPv6, and all
VRFs? That could be a bigger win for /32-heavy and /128-heavy tables
and maybe a good first step before multi-VRF.
Regards,
Maxime Leroy
> Vladimir Medvedkin (4):
> fib: add multi-VRF support
> fib: add VRF functional and unit tests
> fib6: add multi-VRF support
> fib6: add VRF functional and unit tests
>
> app/test-fib/main.c | 257 ++++++++++++++++++++++--
> app/test/test_fib.c | 298 +++++++++++++++++++++++++++
> app/test/test_fib6.c | 319 ++++++++++++++++++++++++++++-
> lib/fib/dir24_8.c | 241 ++++++++++++++++------
> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
> lib/fib/dir24_8_avx512.h | 80 +++++++-
> lib/fib/rte_fib.c | 158 ++++++++++++---
> lib/fib/rte_fib.h | 94 ++++++++-
> lib/fib/rte_fib6.c | 166 +++++++++++++---
> lib/fib/rte_fib6.h | 88 +++++++-
> lib/fib/trie.c | 158 +++++++++++----
> lib/fib/trie.h | 51 +++--
> lib/fib/trie_avx512.c | 225 +++++++++++++++++++--
> lib/fib/trie_avx512.h | 39 +++-
> 15 files changed, 2453 insertions(+), 396 deletions(-)
>
> --
> 2.43.0
>
--
-------------------------------
Maxime Leroy
maxime@leroys.fr
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-23 9:01 ` Morten Brørup
@ 2026-03-23 11:32 ` Medvedkin, Vladimir
0 siblings, 0 replies; 33+ messages in thread
From: Medvedkin, Vladimir @ 2026-03-23 11:32 UTC (permalink / raw)
To: Morten Brørup, Stephen Hemminger
Cc: dev, rjarry, nsaxena16, adwivedi, jerinjacobk
Hi Morten,
On 3/23/2026 9:01 AM, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Sunday, 22 March 2026 17.44
>>
>> On Sun, 22 Mar 2026 15:42:11 +0000
>> Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote:
>>
>>> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
>>> allowing a single FIB instance to host multiple isolated routing
>> domains.
>>> Currently FIB instance represents one routing instance. For workloads
>> that
>>> need multiple VRFs, the only option is to create multiple FIB
>> objects. In a
>>> burst oriented datapath, packets in the same batch can belong to
>> different VRFs, so
>>> the application either does per-packet lookup in different FIB
>> instances or
>>> regroups packets by VRF before lookup. Both approaches are expensive.
>>>
>>> To remove that cost, this series keeps all VRFs inside one FIB
>> instance and
>>> extends lookup input with per-packet VRF IDs.
>>>
>>> The design follows the existing fast-path structure for both
>> families. IPv4 and
>>> IPv6 use multi-ary trees with a 2^24 associativity on a first level
>> (tbl24). The
>>> first-level table scales per configured VRF. This increases memory
>> usage, but
>>> keeps performance and lookup complexity on par with non-VRF
>> implementation.
> I noticed the suggested API uses separate parameters for the VRF and IP.
> How about using one parameter, a structure containing the {VRF, IP} tuple, instead?
> I'm mainly thinking about the bulk operations, where passing one array seems more intuitive than passing two arrays.
I found this design to be more intuitive and somewhat backward
compatible: many apps already create an array of addresses, so adding an
extra array with corresponding VRFs may not be counterintuitive IMO.
But what I find more important is performance; at least this approach is
more convenient for vectorization.
>
>>
>> Not sure at all if this the right way to do VRF.
>> There are multiple ways to do VRF, the Linux way, the Cisco way, ...
> I think a shared table operating on the {VRF, IP} tuple makes sense.
> If a table instance per VRF is preferred, that is still supported.
>
> Can you elaborate what Linux and Cisco does differently than this?
>
>>
>>
>> This needs way more documentation and also an example.
> +1
>
>> Like an option to l3fwd. And also an implementation in testpmd.
--
Regards,
Vladimir
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-23 9:54 ` Robin Jarry
@ 2026-03-23 11:34 ` Medvedkin, Vladimir
0 siblings, 0 replies; 33+ messages in thread
From: Medvedkin, Vladimir @ 2026-03-23 11:34 UTC (permalink / raw)
To: Robin Jarry, dev; +Cc: nsaxena16, mb, adwivedi, jerinjacobk
Hi Robin,
On 3/23/2026 9:54 AM, Robin Jarry wrote:
> Vladimir Medvedkin, Mar 22, 2026 at 16:42:
>> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
>> allowing a single FIB instance to host multiple isolated routing domains.
>>
>> Currently FIB instance represents one routing instance. For workloads that
>> need multiple VRFs, the only option is to create multiple FIB objects. In a
>> burst oriented datapath, packets in the same batch can belong to different VRFs, so
>> the application either does per-packet lookup in different FIB instances or
>> regroups packets by VRF before lookup. Both approaches are expensive.
>>
>> To remove that cost, this series keeps all VRFs inside one FIB instance and
>> extends lookup input with per-packet VRF IDs.
>>
>> The design follows the existing fast-path structure for both families. IPv4 and
>> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
>> first-level table scales per configured VRF. This increases memory usage, but
>> keeps performance and lookup complexity on par with non-VRF implementation.
>>
>> Vladimir Medvedkin (4):
>> fib: add multi-VRF support
>> fib: add VRF functional and unit tests
>> fib6: add multi-VRF support
>> fib6: add VRF functional and unit tests
> Hey Vladimir,
>
> Thanks for the series, this is an interesting approach. Does this allow
> sharing the tbl8 arrays amongst VRFs?
Yes! The tbl8 array is shared across all VRFs in both IPv4 (dir24_8) and IPv6 (trie).
--
Regards,
Vladimir
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-23 11:27 ` Maxime Leroy
@ 2026-03-23 12:49 ` Medvedkin, Vladimir
2026-03-23 14:53 ` Maxime Leroy
0 siblings, 1 reply; 33+ messages in thread
From: Medvedkin, Vladimir @ 2026-03-23 12:49 UTC (permalink / raw)
To: Maxime Leroy; +Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk
Hi Maxime,
On 3/23/2026 11:27 AM, Maxime Leroy wrote:
> Hi Vladimir,
>
>
> On Sun, Mar 22, 2026 at 4:42 PM Vladimir Medvedkin
> <vladimir.medvedkin@intel.com> wrote:
>> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
>> allowing a single FIB instance to host multiple isolated routing domains.
>>
>> Currently FIB instance represents one routing instance. For workloads that
>> need multiple VRFs, the only option is to create multiple FIB objects. In a
>> burst oriented datapath, packets in the same batch can belong to different VRFs, so
>> the application either does per-packet lookup in different FIB instances or
>> regroups packets by VRF before lookup. Both approaches are expensive.
>>
>> To remove that cost, this series keeps all VRFs inside one FIB instance and
>> extends lookup input with per-packet VRF IDs.
>>
>> The design follows the existing fast-path structure for both families. IPv4 and
>> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
>> first-level table scales per configured VRF. This increases memory usage, but
>> keeps performance and lookup complexity on par with non-VRF implementation.
>>
> Thanks for the RFC. Some thoughts below.
>
> Memory cost: the flat TBL24 replicates the entire table for every VRF
> (num_vrfs * 2^24 * nh_size). With 256 VRFs and 8B nexthops that is
> 32 GB for TBL24 alone. In grout we support up to 256 VRFs allocated
> on demand -- this approach forces the full cost upfront even if most
> VRFs are empty.
Yes, increased memory consumption is the trade-off. We make this choice
in DPDK quite often, such as with pre-allocated mbufs, mempools and many
other things allocated in advance to gain performance.
For FIB, I chose to replicate TBL24 per VRF for this same reason.
And, as Morten mentioned earlier, if memory is the priority, a table
instance per VRF allocated on-demand is still supported.
The high memory cost stems from TBL24's design: for IPv4, it was
justified by the BGP filtering convention (no prefixes more specific
than /24 in BGPv4 full view), ensuring most lookups hit with just one
random memory access. For IPv6, we should likely switch to a 16-bit TRIE
scheme on all layers. For IPv4, alternative algorithms with smaller
footprints (like DXR or DIR16-8-8, as used in VPP) may be worth
exploring if BGP full view is not required for those VRFs.
>
> Per-packet VRF lookup: Rx bursts come from one port, thus one VRF.
> Mixed-VRF bulk lookups do not occur in practice. The three AVX512
> code paths add complexity for a scenario that does not exist, at
> least for a classic router. Am I missing a use-case?
That's not true; you're missing a number of established core use
cases that are at least two decades old:
- VLAN subinterface abstraction. Each subinterface may belong to a
separate VRF
- MPLS VPN
- Policy based routing
>
> I am not too familiar with DPDK FIB internals, but would it be
> possible to keep a separate TBL24 per VRF and only share the TBL8
> pool?
That is how it is implemented right now, with one note - the TBL24
tables are pre-allocated.
> Something like pre-allocating an array of max_vrfs TBL24
> pointers, allocating each TBL24 on demand at VRF add time,
And you are suggesting allocating TBL24 on demand by adding an extra
indirection layer. This will lead to lower performance, which I would like to avoid.
> and
> having them all point into a shared TBL8 pool. The TBL8 index in
> TBL24 entries seems to already be global, so would that work without
> encoding changes?
>
> Going further: could the same idea extend to IPv6? The dir24_8 and
> trie seem to use the same TBL8 block format (256 entries, same
> (nh << 1) | ext_bit encoding, same size). Would unifying the TBL8
> allocator allow a single pool shared across IPv4, IPv6, and all
> VRFs? That could be a bigger win for /32-heavy and /128-heavy tables
> and maybe a good first step before multi-VRF.
So, you are suggesting merging IPv4 and IPv6 into a single unified FIB?
I'm not sure how this can be a bigger win; could you please elaborate
more on this?
> Regards,
>
> Maxime Leroy
>
>> Vladimir Medvedkin (4):
>> fib: add multi-VRF support
>> fib: add VRF functional and unit tests
>> fib6: add multi-VRF support
>> fib6: add VRF functional and unit tests
>>
>> app/test-fib/main.c | 257 ++++++++++++++++++++++--
>> app/test/test_fib.c | 298 +++++++++++++++++++++++++++
>> app/test/test_fib6.c | 319 ++++++++++++++++++++++++++++-
>> lib/fib/dir24_8.c | 241 ++++++++++++++++------
>> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
>> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
>> lib/fib/dir24_8_avx512.h | 80 +++++++-
>> lib/fib/rte_fib.c | 158 ++++++++++++---
>> lib/fib/rte_fib.h | 94 ++++++++-
>> lib/fib/rte_fib6.c | 166 +++++++++++++---
>> lib/fib/rte_fib6.h | 88 +++++++-
>> lib/fib/trie.c | 158 +++++++++++----
>> lib/fib/trie.h | 51 +++--
>> lib/fib/trie_avx512.c | 225 +++++++++++++++++++--
>> lib/fib/trie_avx512.h | 39 +++-
>> 15 files changed, 2453 insertions(+), 396 deletions(-)
>>
>> --
>> 2.43.0
>>
>
--
Regards,
Vladimir
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-23 12:49 ` Medvedkin, Vladimir
@ 2026-03-23 14:53 ` Maxime Leroy
2026-03-23 15:08 ` Robin Jarry
2026-03-23 18:42 ` Medvedkin, Vladimir
0 siblings, 2 replies; 33+ messages in thread
From: Maxime Leroy @ 2026-03-23 14:53 UTC (permalink / raw)
To: Medvedkin, Vladimir; +Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk
On Mon, Mar 23, 2026 at 1:49 PM Medvedkin, Vladimir
<vladimir.medvedkin@intel.com> wrote:
>
> Hi Maxime,
>
> On 3/23/2026 11:27 AM, Maxime Leroy wrote:
> > Hi Vladimir,
> >
> >
> > On Sun, Mar 22, 2026 at 4:42 PM Vladimir Medvedkin
> > <vladimir.medvedkin@intel.com> wrote:
> >> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
> >> allowing a single FIB instance to host multiple isolated routing domains.
> >>
> >> Currently FIB instance represents one routing instance. For workloads that
> >> need multiple VRFs, the only option is to create multiple FIB objects. In a
> >> burst oriented datapath, packets in the same batch can belong to different VRFs, so
> >> the application either does per-packet lookup in different FIB instances or
> >> regroups packets by VRF before lookup. Both approaches are expensive.
> >>
> >> To remove that cost, this series keeps all VRFs inside one FIB instance and
> >> extends lookup input with per-packet VRF IDs.
> >>
> >> The design follows the existing fast-path structure for both families. IPv4 and
> >> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
> >> first-level table scales per configured VRF. This increases memory usage, but
> >> keeps performance and lookup complexity on par with non-VRF implementation.
> >>
> > Thanks for the RFC. Some thoughts below.
> >
> > Memory cost: the flat TBL24 replicates the entire table for every VRF
> > (num_vrfs * 2^24 * nh_size). With 256 VRFs and 8B nexthops that is
> > 32 GB for TBL24 alone. In grout we support up to 256 VRFs allocated
> > on demand -- this approach forces the full cost upfront even if most
> > VRFs are empty.
>
> Yes, increased memory consumption is the trade-off. We make this choice
> in DPDK quite often, such as with pre-allocated mbufs, mempools and many
> other things allocated in advance to gain performance.
> For FIB, I chose to replicate TBL24 per VRF for this same reason.
>
> And, as Morten mentioned earlier, if memory is the priority, a table
> instance per VRF allocated on-demand is still supported.
>
> The high memory cost stems from TBL24's design: for IPv4, it was
> justified by the BGP filtering convention (no prefixes more specific
> than /24 in BGPv4 full view), ensuring most lookups hit with just one
> random memory access. For IPv6, we should likely switch to a 16-bit TRIE
> scheme on all layers. For IPv4, alternative algorithms with smaller
> footprints (like DXR or DIR16-8-8, as used in VPP) may be worth
> exploring if BGP full view is not required for those VRFs.
>
> >
> > Per-packet VRF lookup: Rx bursts come from one port, thus one VRF.
> > Mixed-VRF bulk lookups do not occur in practice. The three AVX512
> > code paths add complexity for a scenario that does not exist, at
> > least for a classic router. Am I missing a use-case?
>
> That's not true, you're missing out on a lot of established core use
> cases that are at least 2 decades old:
>
> - VLAN subinterface abstraction. Each subinterface may belong to a
> separate VRF
>
> - MPLS VPN
>
> - Policy based routing
>
Fair point on VLAN subinterfaces and MPLS VPN. SRv6 L3VPN (End.DT4/
End.DT6) also fits that pattern after decap.
I agree DPDK often pre-allocates for performance, but I wonder if the
flat TBL24 actually helps here. Each VRF's working set is spread
128 MB apart in the flat table. Would regrouping packets by VRF and
doing one bulk lookup per VRF with separate contiguous TBL24s be
more cache-friendly than a single mixed-VRF gather? Do you have
benchmarks comparing the two approaches?
On the memory trade-off and VRF ID mapping: the API uses vrf_id as
a direct index (0 to max_vrfs-1). With 256 VRFs and 8B nexthops,
TBL24 alone costs 32 GB for IPv4 and 32 GB for IPv6 -- 64 GB total
at startup. In grout, VRF IDs are interface IDs that can be any
uint16_t, so we would also need to maintain a mapping between our
VRF IDs and FIB slot indices. We would need to introduce a max_vrfs
limit, which forces a bad trade-off: either set it low (e.g. 16)
and limit deployments, or set it high (e.g. 256) and pay 64 GB at
startup even with a single VRF. With separate FIB instances per VRF,
we only allocate what we use.
> >
> > I am not too familiar with DPDK FIB internals, but would it be
> > possible to keep a separate TBL24 per VRF and only share the TBL8
> > pool?
> That is how it is implemented right now, with one note - the TBL24
> tables are pre-allocated.
> > Something like pre-allocating an array of max_vrfs TBL24
> > pointers, allocating each TBL24 on demand at VRF add time,
> And you are suggesting allocating TBL24 on demand by adding an extra
> indirection layer. This will lead to lower performance, which I would like to avoid.
> > and
> > having them all point into a shared TBL8 pool. The TBL8 index in
> > TBL24 entries seems to already be global, so would that work without
> > encoding changes?
> >
> > Going further: could the same idea extend to IPv6? The dir24_8 and
> > trie seem to use the same TBL8 block format (256 entries, same
> > (nh << 1) | ext_bit encoding, same size). Would unifying the TBL8
> > allocator allow a single pool shared across IPv4, IPv6, and all
> > VRFs? That could be a bigger win for /32-heavy and /128-heavy tables
> > and maybe a good first step before multi-VRF.
>
> So, you are suggesting merging IPv4 and IPv6 into a single unified FIB?
> I'm not sure how this can be a bigger win, could you please elaborate
> more on this?
On the IPv4/IPv6 TBL8 pool: I was not suggesting merging FIBs, just
sharing the TBL8 block allocator between separate FIB instances.
This is possible since dir24_8 and trie use the same TBL8 block
format (256 entries, same encoding, same size).
Would it be possible to pass a shared TBL8 pool at rte_fib_create()
time? Each FIB keeps its own TBL24 and RIB, but TBL8 is shared
across all FIBs and potentially across IPv4/IPv6. Users would no
longer have to guess num_tbl8 per FIB.
>
> > Regards,
> >
> > Maxime Leroy
> >
> >> Vladimir Medvedkin (4):
> >> fib: add multi-VRF support
> >> fib: add VRF functional and unit tests
> >> fib6: add multi-VRF support
> >> fib6: add VRF functional and unit tests
> >>
> >> app/test-fib/main.c | 257 ++++++++++++++++++++++--
> >> app/test/test_fib.c | 298 +++++++++++++++++++++++++++
> >> app/test/test_fib6.c | 319 ++++++++++++++++++++++++++++-
> >> lib/fib/dir24_8.c | 241 ++++++++++++++++------
> >> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
> >> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
> >> lib/fib/dir24_8_avx512.h | 80 +++++++-
> >> lib/fib/rte_fib.c | 158 ++++++++++++---
> >> lib/fib/rte_fib.h | 94 ++++++++-
> >> lib/fib/rte_fib6.c | 166 +++++++++++++---
> >> lib/fib/rte_fib6.h | 88 +++++++-
> >> lib/fib/trie.c | 158 +++++++++++----
> >> lib/fib/trie.h | 51 +++--
> >> lib/fib/trie_avx512.c | 225 +++++++++++++++++++--
> >> lib/fib/trie_avx512.h | 39 +++-
> >> 15 files changed, 2453 insertions(+), 396 deletions(-)
> >>
> >> --
> >> 2.43.0
> >>
> >
> --
> Regards,
> Vladimir
>
--
-------------------------------
Maxime Leroy
maxime@leroys.fr
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-23 14:53 ` Maxime Leroy
@ 2026-03-23 15:08 ` Robin Jarry
2026-03-23 15:27 ` Morten Brørup
2026-03-23 18:42 ` Medvedkin, Vladimir
1 sibling, 1 reply; 33+ messages in thread
From: Robin Jarry @ 2026-03-23 15:08 UTC (permalink / raw)
To: Maxime Leroy, Medvedkin, Vladimir
Cc: dev, nsaxena16, mb, adwivedi, jerinjacobk
Hey folks,
Maxime Leroy, Mar 23, 2026 at 15:53:
> Fair point on VLAN subinterfaces and MPLS VPN. SRv6 L3VPN (End.DT4/
> End.DT6) also fits that pattern after decap.
>
> I agree DPDK often pre-allocates for performance, but I wonder if the
> flat TBL24 actually helps here. Each VRF's working set is spread
> 128 MB apart in the flat table. Would regrouping packets by VRF and
> doing one bulk lookup per VRF with separate contiguous TBL24s be
> more cache-friendly than a single mixed-VRF gather? Do you have
> benchmarks comparing the two approaches?
> On the memory trade-off and VRF ID mapping: the API uses vrf_id as
> a direct index (0 to max_vrfs-1). With 256 VRFs and 8B nexthops,
> TBL24 alone costs 32 GB for IPv4 and 32 GB for IPv6 -- 64 GB total
> at startup. In grout, VRF IDs are interface IDs that can be any
> uint16_t, so we would also need to maintain a mapping between our
> VRF IDs and FIB slot indices. We would need to introduce a max_vrfs
> limit, which forces a bad trade-off: either set it low (e.g. 16)
> and limit deployments, or set it high (e.g. 256) and pay 64 GB at
> startup even with a single VRF. With separate FIB instances per VRF,
> we only allocate what we use.
I am also concerned about the global memory consumption. Taking grout as
a live example, we currently support up to 1024 VRFs (each VRF is
an interface so the upper limit is just the number of interfaces).
Pre-allocating 1024 rte_fib and rte_fib6 instances is virtually impossible.
> On the IPv4/IPv6 TBL8 pool: I was not suggesting merging FIBs, just
> sharing the TBL8 block allocator between separate FIB instances.
> This is possible since dir24_8 and trie use the same TBL8 block
> format (256 entries, same encoding, same size).
>
> Would it be possible to pass a shared TBL8 pool at rte_fib_create()
> time? Each FIB keeps its own TBL24 and RIB, but TBL8 is shared
> across all FIBs and potentially across IPv4/IPv6. Users would no
> longer have to guess num_tbl8 per FIB.
+1 to this. That would help a lot to have a common tbl8 pool. That way
we could keep the single VRF per fib/rib but have a global tbl8 pool
that we can tune to our use case.
Cheers,
* RE: [RFC PATCH 0/4] VRF support in FIB library
2026-03-23 15:08 ` Robin Jarry
@ 2026-03-23 15:27 ` Morten Brørup
2026-03-23 18:52 ` Medvedkin, Vladimir
0 siblings, 1 reply; 33+ messages in thread
From: Morten Brørup @ 2026-03-23 15:27 UTC (permalink / raw)
To: Robin Jarry, Maxime Leroy, Medvedkin, Vladimir
Cc: dev, nsaxena16, adwivedi, jerinjacobk, Stephen Hemminger
Let's take a level up in abstraction, and consider one of the key design goals:
What are the typical use cases we want VRF support designed for?
I can imagine a use case with one or very few VRFs with a very large (internet) route table, and many VRFs with small (private) route tables.
PS: I have no experience with the DPDK FIB library, so my feedback might be completely off.
* RE: [RFC PATCH 1/4] fib: add multi-VRF support
2026-03-22 15:42 ` [RFC PATCH 1/4] fib: add multi-VRF support Vladimir Medvedkin
@ 2026-03-23 15:48 ` Konstantin Ananyev
2026-03-23 19:06 ` Medvedkin, Vladimir
0 siblings, 1 reply; 33+ messages in thread
From: Konstantin Ananyev @ 2026-03-23 15:48 UTC (permalink / raw)
To: Vladimir Medvedkin, dev@dpdk.org
Cc: rjarry@redhat.com, nsaxena16@gmail.com, mb@smartsharesystems.com,
adwivedi@marvell.com, jerinjacobk@gmail.com, Maxime Leroy
> Add VRF (Virtual Routing and Forwarding) support to the IPv4
> FIB library, allowing multiple independent routing tables
> within a single FIB instance.
>
> Introduce max_vrfs and vrf_default_nh fields in rte_fib_conf
> to configure the number of VRFs and per-VRF default nexthops.
Thanks Vladimir, allowing multiple VRFs per same LPM table will
definitely be a useful thing to have.
Though, I have the same concern as Maxime:
memory requirements are just overwhelming.
Stupid question - why not just store a pointer to a vector of next-hops
within the table entry?
And we could provide the user with the ability to specify custom
alloc/free functions for these vectors.
That would help avoid allocating huge chunks of memory at startup.
I understand that it would be one extra memory dereference,
but it will probably not be that critical in terms of performance.
Again, for the bulk function we might be able to pipeline lookups and
dereferences and hide that extra load latency.
> Add four new experimental APIs:
> - rte_fib_vrf_add() and rte_fib_vrf_delete() to manage routes
> per VRF
> - rte_fib_vrf_lookup_bulk() for multi-VRF bulk lookups
> - rte_fib_vrf_get_rib() to retrieve a per-VRF RIB handle
>
> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> ---
> lib/fib/dir24_8.c | 241 ++++++++++++++++------
> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
> lib/fib/dir24_8_avx512.h | 80 +++++++-
> lib/fib/rte_fib.c | 158 ++++++++++++---
> lib/fib/rte_fib.h | 94 ++++++++-
> 6 files changed, 988 insertions(+), 260 deletions(-)
>
> diff --git a/lib/fib/dir24_8.c b/lib/fib/dir24_8.c
> index 489d2ef427..ad295c5f16 100644
> --- a/lib/fib/dir24_8.c
> +++ b/lib/fib/dir24_8.c
> @@ -32,41 +32,80 @@
> #define ROUNDUP(x, y) RTE_ALIGN_CEIL(x, (1 << (32 - y)))
>
> static inline rte_fib_lookup_fn_t
> -get_scalar_fn(enum rte_fib_dir24_8_nh_sz nh_sz, bool be_addr)
> +get_scalar_fn(const struct dir24_8_tbl *dp, enum rte_fib_dir24_8_nh_sz nh_sz,
> + bool be_addr)
> {
> + bool single_vrf = dp->num_vrfs <= 1;
> +
> switch (nh_sz) {
> case RTE_FIB_DIR24_8_1B:
> - return be_addr ? dir24_8_lookup_bulk_1b_be : dir24_8_lookup_bulk_1b;
> + if (single_vrf)
> + return be_addr ? dir24_8_lookup_bulk_1b_be :
> + dir24_8_lookup_bulk_1b;
> + return be_addr ? dir24_8_lookup_bulk_vrf_1b_be :
> + dir24_8_lookup_bulk_vrf_1b;
> case RTE_FIB_DIR24_8_2B:
> - return be_addr ? dir24_8_lookup_bulk_2b_be : dir24_8_lookup_bulk_2b;
> + if (single_vrf)
> + return be_addr ? dir24_8_lookup_bulk_2b_be :
> + dir24_8_lookup_bulk_2b;
> + return be_addr ? dir24_8_lookup_bulk_vrf_2b_be :
> + dir24_8_lookup_bulk_vrf_2b;
> case RTE_FIB_DIR24_8_4B:
> - return be_addr ? dir24_8_lookup_bulk_4b_be : dir24_8_lookup_bulk_4b;
> + if (single_vrf)
> + return be_addr ? dir24_8_lookup_bulk_4b_be :
> + dir24_8_lookup_bulk_4b;
> + return be_addr ? dir24_8_lookup_bulk_vrf_4b_be :
> + dir24_8_lookup_bulk_vrf_4b;
> case RTE_FIB_DIR24_8_8B:
> - return be_addr ? dir24_8_lookup_bulk_8b_be : dir24_8_lookup_bulk_8b;
> + if (single_vrf)
> + return be_addr ? dir24_8_lookup_bulk_8b_be :
> + dir24_8_lookup_bulk_8b;
> + return be_addr ? dir24_8_lookup_bulk_vrf_8b_be :
> + dir24_8_lookup_bulk_vrf_8b;
> default:
> return NULL;
> }
> }
>
> static inline rte_fib_lookup_fn_t
> -get_scalar_fn_inlined(enum rte_fib_dir24_8_nh_sz nh_sz, bool be_addr)
> +get_scalar_fn_inlined(const struct dir24_8_tbl *dp,
> + enum rte_fib_dir24_8_nh_sz nh_sz, bool be_addr)
> {
> + bool single_vrf = dp->num_vrfs <= 1;
> +
> switch (nh_sz) {
> case RTE_FIB_DIR24_8_1B:
> - return be_addr ? dir24_8_lookup_bulk_0_be : dir24_8_lookup_bulk_0;
> + if (single_vrf)
> + return be_addr ? dir24_8_lookup_bulk_0_be :
> + dir24_8_lookup_bulk_0;
> + return be_addr ? dir24_8_lookup_bulk_vrf_0_be :
> + dir24_8_lookup_bulk_vrf_0;
> case RTE_FIB_DIR24_8_2B:
> - return be_addr ? dir24_8_lookup_bulk_1_be : dir24_8_lookup_bulk_1;
> + if (single_vrf)
> + return be_addr ? dir24_8_lookup_bulk_1_be :
> + dir24_8_lookup_bulk_1;
> + return be_addr ? dir24_8_lookup_bulk_vrf_1_be :
> + dir24_8_lookup_bulk_vrf_1;
> case RTE_FIB_DIR24_8_4B:
> - return be_addr ? dir24_8_lookup_bulk_2_be : dir24_8_lookup_bulk_2;
> + if (single_vrf)
> + return be_addr ? dir24_8_lookup_bulk_2_be :
> + dir24_8_lookup_bulk_2;
> + return be_addr ? dir24_8_lookup_bulk_vrf_2_be :
> + dir24_8_lookup_bulk_vrf_2;
> case RTE_FIB_DIR24_8_8B:
> - return be_addr ? dir24_8_lookup_bulk_3_be : dir24_8_lookup_bulk_3;
> + if (single_vrf)
> + return be_addr ? dir24_8_lookup_bulk_3_be :
> + dir24_8_lookup_bulk_3;
> + return be_addr ? dir24_8_lookup_bulk_vrf_3_be :
> + dir24_8_lookup_bulk_vrf_3;
> default:
> return NULL;
> }
> }
>
> static inline rte_fib_lookup_fn_t
> -get_vector_fn(enum rte_fib_dir24_8_nh_sz nh_sz, bool be_addr)
> +get_vector_fn(const struct dir24_8_tbl *dp, enum rte_fib_dir24_8_nh_sz nh_sz,
> + bool be_addr)
> {
> #ifdef CC_AVX512_SUPPORT
> if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) <= 0 ||
> @@ -77,24 +116,63 @@ get_vector_fn(enum rte_fib_dir24_8_nh_sz nh_sz, bool be_addr)
> if (be_addr && rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) <= 0)
> return NULL;
>
> + if (dp->num_vrfs <= 1) {
> + switch (nh_sz) {
> + case RTE_FIB_DIR24_8_1B:
> + return be_addr ? rte_dir24_8_vec_lookup_bulk_1b_be :
> + rte_dir24_8_vec_lookup_bulk_1b;
> + case RTE_FIB_DIR24_8_2B:
> + return be_addr ? rte_dir24_8_vec_lookup_bulk_2b_be :
> + rte_dir24_8_vec_lookup_bulk_2b;
> + case RTE_FIB_DIR24_8_4B:
> + return be_addr ? rte_dir24_8_vec_lookup_bulk_4b_be :
> + rte_dir24_8_vec_lookup_bulk_4b;
> + case RTE_FIB_DIR24_8_8B:
> + return be_addr ? rte_dir24_8_vec_lookup_bulk_8b_be :
> + rte_dir24_8_vec_lookup_bulk_8b;
> + default:
> + return NULL;
> + }
> + }
> +
> + if (dp->num_vrfs >= 256) {
> + switch (nh_sz) {
> + case RTE_FIB_DIR24_8_1B:
> + return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_1b_be_large :
> + rte_dir24_8_vec_lookup_bulk_vrf_1b_large;
> case RTE_FIB_DIR24_8_2B:
> + return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_2b_be_large :
> + rte_dir24_8_vec_lookup_bulk_vrf_2b_large;
> case RTE_FIB_DIR24_8_4B:
> + return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_4b_be_large :
> + rte_dir24_8_vec_lookup_bulk_vrf_4b_large;
> case RTE_FIB_DIR24_8_8B:
> + return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_8b_be_large :
> + rte_dir24_8_vec_lookup_bulk_vrf_8b_large;
> + default:
> + return NULL;
> + }
> + }
> +
> switch (nh_sz) {
> case RTE_FIB_DIR24_8_1B:
> - return be_addr ? rte_dir24_8_vec_lookup_bulk_1b_be :
> - rte_dir24_8_vec_lookup_bulk_1b;
> + return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_1b_be :
> + rte_dir24_8_vec_lookup_bulk_vrf_1b;
> case RTE_FIB_DIR24_8_2B:
> - return be_addr ? rte_dir24_8_vec_lookup_bulk_2b_be :
> - rte_dir24_8_vec_lookup_bulk_2b;
> + return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_2b_be :
> + rte_dir24_8_vec_lookup_bulk_vrf_2b;
> case RTE_FIB_DIR24_8_4B:
> - return be_addr ? rte_dir24_8_vec_lookup_bulk_4b_be :
> - rte_dir24_8_vec_lookup_bulk_4b;
> + return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_4b_be :
> + rte_dir24_8_vec_lookup_bulk_vrf_4b;
> case RTE_FIB_DIR24_8_8B:
> - return be_addr ? rte_dir24_8_vec_lookup_bulk_8b_be :
> - rte_dir24_8_vec_lookup_bulk_8b;
> + return be_addr ? rte_dir24_8_vec_lookup_bulk_vrf_8b_be :
> + rte_dir24_8_vec_lookup_bulk_vrf_8b;
> default:
> return NULL;
> }
> #elif defined(RTE_RISCV_FEATURE_V)
> RTE_SET_USED(be_addr);
> + RTE_SET_USED(dp);
> if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_RISCV_ISA_V) <= 0)
> return NULL;
> switch (nh_sz) {
> @@ -130,16 +208,17 @@ dir24_8_get_lookup_fn(void *p, enum rte_fib_lookup_type type, bool be_addr)
>
> switch (type) {
> case RTE_FIB_LOOKUP_DIR24_8_SCALAR_MACRO:
> - return get_scalar_fn(nh_sz, be_addr);
> + return get_scalar_fn(dp, nh_sz, be_addr);
> case RTE_FIB_LOOKUP_DIR24_8_SCALAR_INLINE:
> - return get_scalar_fn_inlined(nh_sz, be_addr);
> + return get_scalar_fn_inlined(dp, nh_sz, be_addr);
> case RTE_FIB_LOOKUP_DIR24_8_SCALAR_UNI:
> - return be_addr ? dir24_8_lookup_bulk_uni_be : dir24_8_lookup_bulk_uni;
> + return be_addr ? dir24_8_lookup_bulk_uni_be :
> + dir24_8_lookup_bulk_uni;
> case RTE_FIB_LOOKUP_DIR24_8_VECTOR_AVX512:
> - return get_vector_fn(nh_sz, be_addr);
> + return get_vector_fn(dp, nh_sz, be_addr);
> case RTE_FIB_LOOKUP_DEFAULT:
> - ret_fn = get_vector_fn(nh_sz, be_addr);
> - return ret_fn != NULL ? ret_fn : get_scalar_fn(nh_sz, be_addr);
> + ret_fn = get_vector_fn(dp, nh_sz, be_addr);
> + return ret_fn != NULL ? ret_fn : get_scalar_fn(dp, nh_sz, be_addr);
> default:
> return NULL;
> }
> @@ -246,15 +325,18 @@ __rcu_qsbr_free_resource(void *p, void *data, unsigned int n __rte_unused)
> }
>
> static void
> -tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
> +tbl8_recycle(struct dir24_8_tbl *dp, uint16_t vrf_id, uint32_t ip, uint64_t tbl8_idx)
> {
> uint32_t i;
> uint64_t nh;
> + uint64_t tbl24_idx;
> uint8_t *ptr8;
> uint16_t *ptr16;
> uint32_t *ptr32;
> uint64_t *ptr64;
>
> + tbl24_idx = get_tbl24_idx(vrf_id, ip);
> +
> switch (dp->nh_sz) {
> case RTE_FIB_DIR24_8_1B:
> ptr8 = &((uint8_t *)dp->tbl8)[tbl8_idx *
> @@ -264,7 +346,7 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
> if (nh != ptr8[i])
> return;
> }
> - ((uint8_t *)dp->tbl24)[ip >> 8] =
> + ((uint8_t *)dp->tbl24)[tbl24_idx] =
> nh & ~DIR24_8_EXT_ENT;
> break;
> case RTE_FIB_DIR24_8_2B:
> @@ -275,7 +357,7 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
> if (nh != ptr16[i])
> return;
> }
> - ((uint16_t *)dp->tbl24)[ip >> 8] =
> + ((uint16_t *)dp->tbl24)[tbl24_idx] =
> nh & ~DIR24_8_EXT_ENT;
> break;
> case RTE_FIB_DIR24_8_4B:
> @@ -286,7 +368,7 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
> if (nh != ptr32[i])
> return;
> }
> - ((uint32_t *)dp->tbl24)[ip >> 8] =
> + ((uint32_t *)dp->tbl24)[tbl24_idx] =
> nh & ~DIR24_8_EXT_ENT;
> break;
> case RTE_FIB_DIR24_8_8B:
> @@ -297,7 +379,7 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
> if (nh != ptr64[i])
> return;
> }
> - ((uint64_t *)dp->tbl24)[ip >> 8] =
> + ((uint64_t *)dp->tbl24)[tbl24_idx] =
> nh & ~DIR24_8_EXT_ENT;
> break;
> }
> @@ -314,7 +396,7 @@ tbl8_recycle(struct dir24_8_tbl *dp, uint32_t ip, uint64_t tbl8_idx)
> }
>
> static int
> -install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
> +install_to_fib(struct dir24_8_tbl *dp, uint16_t vrf_id, uint32_t ledge, uint32_t redge,
> uint64_t next_hop)
> {
> uint64_t tbl24_tmp;
> @@ -328,7 +410,7 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
>
> if (((ledge >> 8) != (redge >> 8)) || (len == 1 << 24)) {
> if ((ROUNDUP(ledge, 24) - ledge) != 0) {
> - tbl24_tmp = get_tbl24(dp, ledge, dp->nh_sz);
> + tbl24_tmp = get_tbl24(dp, vrf_id, ledge, dp->nh_sz);
> if ((tbl24_tmp & DIR24_8_EXT_ENT) !=
> DIR24_8_EXT_ENT) {
> /**
> @@ -346,7 +428,7 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
> }
> tbl8_free_idx(dp, tmp_tbl8_idx);
> /*update dir24 entry with tbl8 index*/
> - write_to_fib(get_tbl24_p(dp, ledge,
> + write_to_fib(get_tbl24_p(dp, vrf_id, ledge,
> dp->nh_sz), (tbl8_idx << 1)|
> DIR24_8_EXT_ENT,
> dp->nh_sz, 1);
> @@ -360,19 +442,19 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
> write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
> DIR24_8_EXT_ENT,
> dp->nh_sz, ROUNDUP(ledge, 24) - ledge);
> - tbl8_recycle(dp, ledge, tbl8_idx);
> + tbl8_recycle(dp, vrf_id, ledge, tbl8_idx);
> }
> - write_to_fib(get_tbl24_p(dp, ROUNDUP(ledge, 24), dp->nh_sz),
> + write_to_fib(get_tbl24_p(dp, vrf_id, ROUNDUP(ledge, 24), dp->nh_sz),
> next_hop << 1, dp->nh_sz, len);
> if (redge & ~DIR24_8_TBL24_MASK) {
> - tbl24_tmp = get_tbl24(dp, redge, dp->nh_sz);
> + tbl24_tmp = get_tbl24(dp, vrf_id, redge, dp->nh_sz);
> if ((tbl24_tmp & DIR24_8_EXT_ENT) !=
> DIR24_8_EXT_ENT) {
> tbl8_idx = tbl8_alloc(dp, tbl24_tmp);
> if (tbl8_idx < 0)
> return -ENOSPC;
> /*update dir24 entry with tbl8 index*/
> - write_to_fib(get_tbl24_p(dp, redge,
> + write_to_fib(get_tbl24_p(dp, vrf_id, redge,
> dp->nh_sz), (tbl8_idx << 1)|
> DIR24_8_EXT_ENT,
> dp->nh_sz, 1);
> @@ -385,17 +467,17 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
> write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
> DIR24_8_EXT_ENT,
> dp->nh_sz, redge & ~DIR24_8_TBL24_MASK);
> - tbl8_recycle(dp, redge, tbl8_idx);
> + tbl8_recycle(dp, vrf_id, redge, tbl8_idx);
> }
> } else if ((redge - ledge) != 0) {
> - tbl24_tmp = get_tbl24(dp, ledge, dp->nh_sz);
> + tbl24_tmp = get_tbl24(dp, vrf_id, ledge, dp->nh_sz);
> if ((tbl24_tmp & DIR24_8_EXT_ENT) !=
> DIR24_8_EXT_ENT) {
> tbl8_idx = tbl8_alloc(dp, tbl24_tmp);
> if (tbl8_idx < 0)
> return -ENOSPC;
> /*update dir24 entry with tbl8 index*/
> - write_to_fib(get_tbl24_p(dp, ledge, dp->nh_sz),
> + write_to_fib(get_tbl24_p(dp, vrf_id, ledge, dp->nh_sz),
> (tbl8_idx << 1)|
> DIR24_8_EXT_ENT,
> dp->nh_sz, 1);
> @@ -409,13 +491,13 @@ install_to_fib(struct dir24_8_tbl *dp, uint32_t ledge, uint32_t redge,
> write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
> DIR24_8_EXT_ENT,
> dp->nh_sz, redge - ledge);
> - tbl8_recycle(dp, ledge, tbl8_idx);
> + tbl8_recycle(dp, vrf_id, ledge, tbl8_idx);
> }
> return 0;
> }
>
> static int
> -modify_fib(struct dir24_8_tbl *dp, struct rte_rib *rib, uint32_t ip,
> +modify_fib(struct dir24_8_tbl *dp, struct rte_rib *rib, uint16_t vrf_id, uint32_t ip,
> uint8_t depth, uint64_t next_hop)
> {
> struct rte_rib_node *tmp = NULL;
> @@ -438,7 +520,7 @@ modify_fib(struct dir24_8_tbl *dp, struct rte_rib *rib, uint32_t ip,
> (uint32_t)(1ULL << (32 - tmp_depth));
> continue;
> }
> - ret = install_to_fib(dp, ledge, redge,
> + ret = install_to_fib(dp, vrf_id, ledge, redge,
> next_hop);
> if (ret != 0)
> return ret;
> @@ -454,7 +536,7 @@ modify_fib(struct dir24_8_tbl *dp, struct rte_rib *rib, uint32_t ip,
> redge = ip + (uint32_t)(1ULL << (32 - depth));
> if (ledge == redge && ledge != 0)
> break;
> - ret = install_to_fib(dp, ledge, redge,
> + ret = install_to_fib(dp, vrf_id, ledge, redge,
> next_hop);
> if (ret != 0)
> return ret;
> @@ -465,7 +547,7 @@ modify_fib(struct dir24_8_tbl *dp, struct rte_rib *rib, uint32_t ip,
> }
>
> int
> -dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
> +dir24_8_modify(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip, uint8_t depth,
> uint64_t next_hop, int op)
> {
> struct dir24_8_tbl *dp;
> @@ -480,8 +562,13 @@ dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
> return -EINVAL;
>
> dp = rte_fib_get_dp(fib);
> - rib = rte_fib_get_rib(fib);
> - RTE_ASSERT((dp != NULL) && (rib != NULL));
> + RTE_ASSERT(dp != NULL);
> +
> + if (vrf_id >= dp->num_vrfs)
> + return -EINVAL;
> +
> + rib = rte_fib_vrf_get_rib(fib, vrf_id);
> + RTE_ASSERT(rib != NULL);
>
> if (next_hop > get_max_nh(dp->nh_sz))
> return -EINVAL;
> @@ -495,7 +582,7 @@ dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
> rte_rib_get_nh(node, &node_nh);
> if (node_nh == next_hop)
> return 0;
> - ret = modify_fib(dp, rib, ip, depth, next_hop);
> + ret = modify_fib(dp, rib, vrf_id, ip, depth, next_hop);
> if (ret == 0)
> rte_rib_set_nh(node, next_hop);
> return 0;
> @@ -518,7 +605,7 @@ dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
> if (par_nh == next_hop)
> goto successfully_added;
> }
> - ret = modify_fib(dp, rib, ip, depth, next_hop);
> + ret = modify_fib(dp, rib, vrf_id, ip, depth, next_hop);
> if (ret != 0) {
> rte_rib_remove(rib, ip, depth);
> return ret;
> @@ -536,9 +623,9 @@ dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
> rte_rib_get_nh(parent, &par_nh);
> rte_rib_get_nh(node, &node_nh);
> if (par_nh != node_nh)
> - ret = modify_fib(dp, rib, ip, depth, par_nh);
> + ret = modify_fib(dp, rib, vrf_id, ip, depth, par_nh);
> } else
> - ret = modify_fib(dp, rib, ip, depth, dp->def_nh);
> + ret = modify_fib(dp, rib, vrf_id, ip, depth, dp->def_nh[vrf_id]);
> if (ret == 0) {
> rte_rib_remove(rib, ip, depth);
> if (depth > 24) {
> @@ -562,7 +649,10 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
> struct dir24_8_tbl *dp;
> uint64_t def_nh;
> uint32_t num_tbl8;
> + uint16_t num_vrfs;
> enum rte_fib_dir24_8_nh_sz nh_sz;
> + uint64_t tbl24_sz;
> + uint16_t vrf;
>
> if ((name == NULL) || (fib_conf == NULL) ||
> (fib_conf->dir24_8.nh_sz < RTE_FIB_DIR24_8_1B) ||
> @@ -580,19 +670,56 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
> nh_sz = fib_conf->dir24_8.nh_sz;
> num_tbl8 = RTE_ALIGN_CEIL(fib_conf->dir24_8.num_tbl8,
> BITMAP_SLAB_BIT_SIZE);
> + num_vrfs = (fib_conf->max_vrfs == 0) ? 1 : fib_conf->max_vrfs;
> +
> + /* Validate per-VRF default nexthops if provided */
> + if (fib_conf->vrf_default_nh != NULL) {
> + for (vrf = 0; vrf < num_vrfs; vrf++) {
> + if (fib_conf->vrf_default_nh[vrf] > get_max_nh(nh_sz)) {
> + rte_errno = EINVAL;
> + return NULL;
> + }
> + }
> + }
> +
> + tbl24_sz = (uint64_t)num_vrfs * DIR24_8_TBL24_NUM_ENT * (1 << nh_sz);
>
> snprintf(mem_name, sizeof(mem_name), "DP_%s", name);
> dp = rte_zmalloc_socket(name, sizeof(struct dir24_8_tbl) +
> - DIR24_8_TBL24_NUM_ENT * (1 << nh_sz) + sizeof(uint32_t),
> + tbl24_sz + sizeof(uint32_t),
> RTE_CACHE_LINE_SIZE, socket_id);
> if (dp == NULL) {
> rte_errno = ENOMEM;
> return NULL;
> }
>
> - /* Init table with default value */
> - write_to_fib(dp->tbl24, (def_nh << 1), nh_sz, 1 << 24);
> + dp->num_vrfs = num_vrfs;
> + dp->nh_sz = nh_sz;
> + dp->number_tbl8s = num_tbl8;
> +
> + /* Allocate per-VRF default nexthop array */
> + snprintf(mem_name, sizeof(mem_name), "DEFNH_%p", dp);
> + dp->def_nh = rte_zmalloc_socket(mem_name, num_vrfs * sizeof(uint64_t),
> + RTE_CACHE_LINE_SIZE, socket_id);
> + if (dp->def_nh == NULL) {
> + rte_errno = ENOMEM;
> + rte_free(dp);
> + return NULL;
> + }
> +
> + /* Initialize all VRFs with default nexthop */
> + for (vrf = 0; vrf < num_vrfs; vrf++) {
> + uint64_t vrf_def_nh = (fib_conf->vrf_default_nh != NULL) ?
> + fib_conf->vrf_default_nh[vrf] : def_nh;
> + dp->def_nh[vrf] = vrf_def_nh;
>
> + /* Init TBL24 for this VRF with default value */
> + uint64_t vrf_offset = (uint64_t)vrf * DIR24_8_TBL24_NUM_ENT;
> + void *vrf_tbl24 = (void *)&((uint8_t *)dp->tbl24)[vrf_offset << nh_sz];
> + write_to_fib(vrf_tbl24, (vrf_def_nh << 1), nh_sz, 1 << 24);
> + }
> +
> + /* Allocate shared TBL8 for all VRFs */
> snprintf(mem_name, sizeof(mem_name), "TBL8_%p", dp);
> uint64_t tbl8_sz = DIR24_8_TBL8_GRP_NUM_ENT * (1ULL << nh_sz) *
> (num_tbl8 + 1);
> @@ -600,12 +727,10 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
> RTE_CACHE_LINE_SIZE, socket_id);
> if (dp->tbl8 == NULL) {
> rte_errno = ENOMEM;
> + rte_free(dp->def_nh);
> rte_free(dp);
> return NULL;
> }
> - dp->def_nh = def_nh;
> - dp->nh_sz = nh_sz;
> - dp->number_tbl8s = num_tbl8;
>
> snprintf(mem_name, sizeof(mem_name), "TBL8_idxes_%p", dp);
> dp->tbl8_idxes = rte_zmalloc_socket(mem_name,
> @@ -614,6 +739,7 @@ dir24_8_create(const char *name, int socket_id, struct rte_fib_conf *fib_conf)
> if (dp->tbl8_idxes == NULL) {
> rte_errno = ENOMEM;
> rte_free(dp->tbl8);
> + rte_free(dp->def_nh);
> rte_free(dp);
> return NULL;
> }
> @@ -629,6 +755,7 @@ dir24_8_free(void *p)
> rte_rcu_qsbr_dq_delete(dp->dq);
> rte_free(dp->tbl8_idxes);
> rte_free(dp->tbl8);
> + rte_free(dp->def_nh);
> rte_free(dp);
> }
>
> diff --git a/lib/fib/dir24_8.h b/lib/fib/dir24_8.h
> index b343b5d686..37a73a3cc2 100644
> --- a/lib/fib/dir24_8.h
> +++ b/lib/fib/dir24_8.h
> @@ -12,6 +12,7 @@
> #include <rte_byteorder.h>
> #include <rte_prefetch.h>
> #include <rte_branch_prediction.h>
> +#include <rte_debug.h>
> #include <rte_rcu_qsbr.h>
>
> /**
> @@ -32,24 +33,19 @@ struct dir24_8_tbl {
> uint32_t number_tbl8s; /**< Total number of tbl8s */
> uint32_t rsvd_tbl8s; /**< Number of reserved tbl8s */
> uint32_t cur_tbl8s; /**< Current number of tbl8s */
> + uint16_t num_vrfs; /**< Number of VRFs */
> enum rte_fib_dir24_8_nh_sz nh_sz; /**< Size of nexthop entry */
> /* RCU config. */
> enum rte_fib_qsbr_mode rcu_mode;/* Blocking, defer queue. */
> struct rte_rcu_qsbr *v; /* RCU QSBR variable. */
> struct rte_rcu_qsbr_dq *dq; /* RCU QSBR defer queue. */
> - uint64_t def_nh; /**< Default next hop */
> + uint64_t *def_nh; /**< Per-VRF default next hop array */
> uint64_t *tbl8; /**< tbl8 table. */
> uint64_t *tbl8_idxes; /**< bitmap containing free tbl8 idxes*/
> /* tbl24 table. */
> alignas(RTE_CACHE_LINE_SIZE) uint64_t tbl24[];
> };
>
> -static inline void *
> -get_tbl24_p(struct dir24_8_tbl *dp, uint32_t ip, uint8_t nh_sz)
> -{
> - return (void *)&((uint8_t *)dp->tbl24)[(ip &
> - DIR24_8_TBL24_MASK) >> (8 - nh_sz)];
> -}
>
> static inline uint8_t
> bits_in_nh(uint8_t nh_sz)
> @@ -63,14 +59,21 @@ get_max_nh(uint8_t nh_sz)
> return ((1ULL << (bits_in_nh(nh_sz) - 1)) - 1);
> }
>
> -static inline uint32_t
> -get_tbl24_idx(uint32_t ip)
> +static inline uint64_t
> +get_tbl24_idx(uint16_t vrf_id, uint32_t ip)
> +{
> + return ((uint64_t)vrf_id << 24) + (ip >> 8);
> +}
> +
> +static inline void *
> +get_tbl24_p(struct dir24_8_tbl *dp, uint16_t vrf_id, uint32_t ip, uint8_t nh_sz)
> {
> - return ip >> 8;
> + uint64_t idx = get_tbl24_idx(vrf_id, ip);
> + return (void *)&((uint8_t *)dp->tbl24)[idx << nh_sz];
> }
>
> -static inline uint32_t
> -get_tbl8_idx(uint32_t res, uint32_t ip)
> +static inline uint64_t
> +get_tbl8_idx(uint64_t res, uint32_t ip)
> {
> return (res >> 1) * DIR24_8_TBL8_GRP_NUM_ENT + (uint8_t)ip;
> }
> @@ -87,17 +90,18 @@ get_psd_idx(uint32_t val, uint8_t nh_sz)
> return val & ((1 << (3 - nh_sz)) - 1);
> }
>
> -static inline uint32_t
> -get_tbl_idx(uint32_t val, uint8_t nh_sz)
> +static inline uint64_t
> +get_tbl_idx(uint64_t val, uint8_t nh_sz)
> {
> return val >> (3 - nh_sz);
> }
>
> static inline uint64_t
> -get_tbl24(struct dir24_8_tbl *dp, uint32_t ip, uint8_t nh_sz)
> +get_tbl24(struct dir24_8_tbl *dp, uint16_t vrf_id, uint32_t ip, uint8_t nh_sz)
> {
> - return ((dp->tbl24[get_tbl_idx(get_tbl24_idx(ip), nh_sz)] >>
> - (get_psd_idx(get_tbl24_idx(ip), nh_sz) *
> + uint64_t idx = get_tbl24_idx(vrf_id, ip);
> + return ((dp->tbl24[get_tbl_idx(idx, nh_sz)] >>
> + (get_psd_idx(idx, nh_sz) *
> bits_in_nh(nh_sz))) & lookup_msk(nh_sz));
> }
>
> @@ -115,62 +119,92 @@ is_entry_extended(uint64_t ent)
> return (ent & DIR24_8_EXT_ENT) == DIR24_8_EXT_ENT;
> }
>
> -#define LOOKUP_FUNC(suffix, type, bulk_prefetch, nh_sz) \
> -static inline void dir24_8_lookup_bulk_##suffix(void *p, const uint32_t *ips, \
> - uint64_t *next_hops, const unsigned int n) \
> -{ \
> - struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p; \
> - uint64_t tmp; \
> - uint32_t i; \
> - uint32_t prefetch_offset = \
> - RTE_MIN((unsigned int)bulk_prefetch, n); \
> - \
> - for (i = 0; i < prefetch_offset; i++) \
> - rte_prefetch0(get_tbl24_p(dp, ips[i], nh_sz)); \
> - for (i = 0; i < (n - prefetch_offset); i++) { \
> - rte_prefetch0(get_tbl24_p(dp, \
> - ips[i + prefetch_offset], nh_sz)); \
> - tmp = ((type *)dp->tbl24)[ips[i] >> 8]; \
> - if (unlikely(is_entry_extended(tmp))) \
> - tmp = ((type *)dp->tbl8)[(uint8_t)ips[i] + \
> - ((tmp >> 1) * DIR24_8_TBL8_GRP_NUM_ENT)]; \
> - next_hops[i] = tmp >> 1; \
> - } \
> - for (; i < n; i++) { \
> - tmp = ((type *)dp->tbl24)[ips[i] >> 8]; \
> - if (unlikely(is_entry_extended(tmp))) \
> - tmp = ((type *)dp->tbl8)[(uint8_t)ips[i] + \
> - ((tmp >> 1) * DIR24_8_TBL8_GRP_NUM_ENT)]; \
> - next_hops[i] = tmp >> 1; \
> - } \
> -} \
> -
> -LOOKUP_FUNC(1b, uint8_t, 5, 0)
> -LOOKUP_FUNC(2b, uint16_t, 6, 1)
> -LOOKUP_FUNC(4b, uint32_t, 15, 2)
> -LOOKUP_FUNC(8b, uint64_t, 12, 3)
> +
> +#define LOOKUP_FUNC(suffix, type, bulk_prefetch, nh_sz, is_vrf) \
> +static inline void dir24_8_lookup_bulk_##suffix(void *p, \
> + const uint16_t *vrf_ids, const uint32_t *ips, \
> + uint64_t *next_hops, const unsigned int n) \
> +{ \
> + struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p; \
> + uint64_t tmp; \
> + uint32_t i; \
> + uint32_t prefetch_offset = RTE_MIN((unsigned int)bulk_prefetch, n); \
> + \
> + if (!is_vrf) \
> + RTE_SET_USED(vrf_ids); \
> + \
> + for (i = 0; i < prefetch_offset; i++) { \
> + uint16_t vid = is_vrf ? vrf_ids[i] : 0; \
> + RTE_ASSERT(vid < dp->num_vrfs); \
> + rte_prefetch0(get_tbl24_p(dp, vid, ips[i], nh_sz)); \
> + } \
> + for (i = 0; i < (n - prefetch_offset); i++) { \
> + uint16_t vid = is_vrf ? vrf_ids[i] : 0; \
> + uint16_t vid_next = is_vrf ? vrf_ids[i + prefetch_offset] : 0; \
> + RTE_ASSERT(vid < dp->num_vrfs); \
> + RTE_ASSERT(vid_next < dp->num_vrfs); \
> + rte_prefetch0(get_tbl24_p(dp, vid_next, \
> + ips[i + prefetch_offset], nh_sz)); \
> + tmp = ((type *)dp->tbl24)[get_tbl24_idx(vid, ips[i])]; \
> + if (unlikely(is_entry_extended(tmp))) \
> + tmp = ((type *)dp->tbl8)[(uint8_t)ips[i] + \
> + ((tmp >> 1) * DIR24_8_TBL8_GRP_NUM_ENT)]; \
> + next_hops[i] = tmp >> 1; \
> + } \
> + for (; i < n; i++) { \
> + uint16_t vid = is_vrf ? vrf_ids[i] : 0; \
> + RTE_ASSERT(vid < dp->num_vrfs); \
> + tmp = ((type *)dp->tbl24)[get_tbl24_idx(vid, ips[i])]; \
> + if (unlikely(is_entry_extended(tmp))) \
> + tmp = ((type *)dp->tbl8)[(uint8_t)ips[i] + \
> + ((tmp >> 1) * DIR24_8_TBL8_GRP_NUM_ENT)]; \
> + next_hops[i] = tmp >> 1; \
> + } \
> +}
> +
> +LOOKUP_FUNC(1b, uint8_t, 5, 0, false)
> +LOOKUP_FUNC(2b, uint16_t, 6, 1, false)
> +LOOKUP_FUNC(4b, uint32_t, 15, 2, false)
> +LOOKUP_FUNC(8b, uint64_t, 12, 3, false)
> +LOOKUP_FUNC(vrf_1b, uint8_t, 5, 0, true)
> +LOOKUP_FUNC(vrf_2b, uint16_t, 6, 1, true)
> +LOOKUP_FUNC(vrf_4b, uint32_t, 15, 2, true)
> +LOOKUP_FUNC(vrf_8b, uint64_t, 12, 3, true)
>
> static inline void
> -dir24_8_lookup_bulk(struct dir24_8_tbl *dp, const uint32_t *ips,
> - uint64_t *next_hops, const unsigned int n, uint8_t nh_sz)
> +__dir24_8_lookup_bulk(struct dir24_8_tbl *dp, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n,
> + uint8_t nh_sz, bool is_vrf)
> {
> uint64_t tmp;
> uint32_t i;
> uint32_t prefetch_offset = RTE_MIN(15U, n);
>
> - for (i = 0; i < prefetch_offset; i++)
> - rte_prefetch0(get_tbl24_p(dp, ips[i], nh_sz));
> + if (!is_vrf)
> + RTE_SET_USED(vrf_ids);
> +
> + for (i = 0; i < prefetch_offset; i++) {
> + uint16_t vid = is_vrf ? vrf_ids[i] : 0;
> + RTE_ASSERT(vid < dp->num_vrfs);
> + rte_prefetch0(get_tbl24_p(dp, vid, ips[i], nh_sz));
> + }
> for (i = 0; i < (n - prefetch_offset); i++) {
> - rte_prefetch0(get_tbl24_p(dp, ips[i + prefetch_offset],
> - nh_sz));
> - tmp = get_tbl24(dp, ips[i], nh_sz);
> + uint16_t vid = is_vrf ? vrf_ids[i] : 0;
> + uint16_t vid_next = is_vrf ? vrf_ids[i + prefetch_offset] : 0;
> + RTE_ASSERT(vid < dp->num_vrfs);
> + RTE_ASSERT(vid_next < dp->num_vrfs);
> + rte_prefetch0(get_tbl24_p(dp, vid_next,
> + ips[i + prefetch_offset], nh_sz));
> + tmp = get_tbl24(dp, vid, ips[i], nh_sz);
> if (unlikely(is_entry_extended(tmp)))
> tmp = get_tbl8(dp, tmp, ips[i], nh_sz);
>
> next_hops[i] = tmp >> 1;
> }
> for (; i < n; i++) {
> - tmp = get_tbl24(dp, ips[i], nh_sz);
> + uint16_t vid = is_vrf ? vrf_ids[i] : 0;
> + RTE_ASSERT(vid < dp->num_vrfs);
> + tmp = get_tbl24(dp, vid, ips[i], nh_sz);
> if (unlikely(is_entry_extended(tmp)))
> tmp = get_tbl8(dp, tmp, ips[i], nh_sz);
>
> @@ -179,43 +213,79 @@ dir24_8_lookup_bulk(struct dir24_8_tbl *dp, const uint32_t *ips,
> }
>
> static inline void
> -dir24_8_lookup_bulk_0(void *p, const uint32_t *ips,
> +dir24_8_lookup_bulk_0(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> uint64_t *next_hops, const unsigned int n)
> {
> struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
>
> - dir24_8_lookup_bulk(dp, ips, next_hops, n, 0);
> + __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 0, false);
> +}
> +
> +static inline void
> +dir24_8_lookup_bulk_vrf_0(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
> +{
> + struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
> +
> + __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 0, true);
> }
>
> static inline void
> -dir24_8_lookup_bulk_1(void *p, const uint32_t *ips,
> +dir24_8_lookup_bulk_1(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> uint64_t *next_hops, const unsigned int n)
> {
> struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
>
> - dir24_8_lookup_bulk(dp, ips, next_hops, n, 1);
> + __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 1, false);
> }
>
> static inline void
> -dir24_8_lookup_bulk_2(void *p, const uint32_t *ips,
> +dir24_8_lookup_bulk_vrf_1(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
> +{
> + struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
> +
> + __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 1, true);
> +}
> +
> +static inline void
> +dir24_8_lookup_bulk_2(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> uint64_t *next_hops, const unsigned int n)
> {
> struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
>
> - dir24_8_lookup_bulk(dp, ips, next_hops, n, 2);
> + __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 2, false);
> }
>
> static inline void
> -dir24_8_lookup_bulk_3(void *p, const uint32_t *ips,
> +dir24_8_lookup_bulk_vrf_2(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
> +{
> + struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
> +
> + __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 2, true);
> +}
> +
> +static inline void
> +dir24_8_lookup_bulk_3(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> uint64_t *next_hops, const unsigned int n)
> {
> struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
>
> - dir24_8_lookup_bulk(dp, ips, next_hops, n, 3);
> + __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 3, false);
> }
>
> static inline void
> -dir24_8_lookup_bulk_uni(void *p, const uint32_t *ips,
> +dir24_8_lookup_bulk_vrf_3(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
> +{
> + struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
> +
> + __dir24_8_lookup_bulk(dp, vrf_ids, ips, next_hops, n, 3, true);
> +}
> +
> +static inline void
> +dir24_8_lookup_bulk_uni(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> uint64_t *next_hops, const unsigned int n)
> {
> struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
> @@ -224,66 +294,83 @@ dir24_8_lookup_bulk_uni(void *p, const uint32_t *ips,
> uint32_t prefetch_offset = RTE_MIN(15U, n);
> uint8_t nh_sz = dp->nh_sz;
>
> - for (i = 0; i < prefetch_offset; i++)
> - rte_prefetch0(get_tbl24_p(dp, ips[i], nh_sz));
> + for (i = 0; i < prefetch_offset; i++) {
> + uint16_t vid = vrf_ids[i];
> + RTE_ASSERT(vid < dp->num_vrfs);
> + rte_prefetch0(get_tbl24_p(dp, vid, ips[i], nh_sz));
> + }
> for (i = 0; i < (n - prefetch_offset); i++) {
> - rte_prefetch0(get_tbl24_p(dp, ips[i + prefetch_offset],
> - nh_sz));
> - tmp = get_tbl24(dp, ips[i], nh_sz);
> + uint16_t vid = vrf_ids[i];
> + uint16_t vid_next = vrf_ids[i + prefetch_offset];
> + RTE_ASSERT(vid < dp->num_vrfs);
> + RTE_ASSERT(vid_next < dp->num_vrfs);
> + rte_prefetch0(get_tbl24_p(dp, vid_next,
> + ips[i + prefetch_offset], nh_sz));
> + tmp = get_tbl24(dp, vid, ips[i], nh_sz);
> if (unlikely(is_entry_extended(tmp)))
> tmp = get_tbl8(dp, tmp, ips[i], nh_sz);
>
> next_hops[i] = tmp >> 1;
> }
> for (; i < n; i++) {
> - tmp = get_tbl24(dp, ips[i], nh_sz);
> + uint16_t vid = vrf_ids[i];
> + RTE_ASSERT(vid < dp->num_vrfs);
> + tmp = get_tbl24(dp, vid, ips[i], nh_sz);
> if (unlikely(is_entry_extended(tmp)))
> tmp = get_tbl8(dp, tmp, ips[i], nh_sz);
>
> next_hops[i] = tmp >> 1;
> }
> }
> -
> #define BSWAP_MAX_LENGTH 64
>
> -typedef void (*dir24_8_lookup_bulk_be_cb)(void *p, const uint32_t *ips, uint64_t *next_hops,
> - const unsigned int n);
> +typedef void (*dir24_8_lookup_bulk_be_cb)(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
>
> static inline void
> -dir24_8_lookup_bulk_be(void *p, const uint32_t *ips, uint64_t *next_hops, const unsigned int n,
> - dir24_8_lookup_bulk_be_cb cb)
> +dir24_8_lookup_bulk_be(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> + uint64_t *next_hops, const unsigned int n, dir24_8_lookup_bulk_be_cb cb)
> {
> uint32_t le_ips[BSWAP_MAX_LENGTH];
> unsigned int i;
>
> #if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
> - cb(p, ips, next_hops, n);
> + cb(p, vrf_ids, ips, next_hops, n);
> #else
> for (i = 0; i < n; i += BSWAP_MAX_LENGTH) {
> int j;
> for (j = 0; j < BSWAP_MAX_LENGTH && i + j < n; j++)
> le_ips[j] = rte_be_to_cpu_32(ips[i + j]);
>
> - cb(p, le_ips, next_hops + i, j);
> + cb(p, vrf_ids + i, le_ips, next_hops + i, j);
> }
> #endif
> }
>
> #define DECLARE_BE_LOOKUP_FN(name) \
> static inline void \
> -name##_be(void *p, const uint32_t *ips, uint64_t *next_hops, const unsigned int n) \
> +name##_be(void *p, const uint16_t *vrf_ids, const uint32_t *ips, \
> + uint64_t *next_hops, const unsigned int n) \
> { \
> - dir24_8_lookup_bulk_be(p, ips, next_hops, n, name); \
> + dir24_8_lookup_bulk_be(p, vrf_ids, ips, next_hops, n, name); \
> }
>
> DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_1b)
> DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_2b)
> DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_4b)
> DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_8b)
> +DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_1b)
> +DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_2b)
> +DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_4b)
> +DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_8b)
> DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_0)
> DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_1)
> DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_2)
> DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_3)
> +DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_0)
> +DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_1)
> +DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_2)
> +DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_vrf_3)
> DECLARE_BE_LOOKUP_FN(dir24_8_lookup_bulk_uni)
>
> void *
> @@ -296,7 +383,7 @@ rte_fib_lookup_fn_t
> dir24_8_get_lookup_fn(void *p, enum rte_fib_lookup_type type, bool be_addr);
>
> int
> -dir24_8_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
> +dir24_8_modify(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip, uint8_t depth,
> uint64_t next_hop, int op);
>
> int
> diff --git a/lib/fib/dir24_8_avx512.c b/lib/fib/dir24_8_avx512.c
> index 89b43583c7..3e576e410e 100644
> --- a/lib/fib/dir24_8_avx512.c
> +++ b/lib/fib/dir24_8_avx512.c
> @@ -4,75 +4,132 @@
>
> #include <rte_vect.h>
> #include <rte_fib.h>
> +#include <rte_debug.h>
>
> #include "dir24_8.h"
> #include "dir24_8_avx512.h"
>
> +enum vrf_scale {
> + VRF_SCALE_SINGLE = 0,
> + VRF_SCALE_SMALL = 1,
> + VRF_SCALE_LARGE = 2,
> +};
> +
> static __rte_always_inline void
> -dir24_8_vec_lookup_x16(void *p, const uint32_t *ips,
> - uint64_t *next_hops, int size, bool be_addr)
> +dir24_8_vec_lookup_x8_64b_path(struct dir24_8_tbl *dp, __m256i ip_vec_256,
> + __m256i vrf32_256, uint64_t *next_hops, int size)
> {
> - struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
> - __mmask16 msk_ext;
> - __mmask16 exp_msk = 0x5555;
> - __m512i ip_vec, idxes, res, bytes;
> - const __m512i zero = _mm512_set1_epi32(0);
> - const __m512i lsb = _mm512_set1_epi32(1);
> - const __m512i lsbyte_msk = _mm512_set1_epi32(0xff);
> - __m512i tmp1, tmp2, res_msk;
> - __m256i tmp256;
> - /* used to mask gather values if size is 1/2 (8/16 bit next hops) */
> + const __m512i zero_64 = _mm512_set1_epi64(0);
> + const __m512i lsb_64 = _mm512_set1_epi64(1);
> + const __m512i lsbyte_msk_64 = _mm512_set1_epi64(0xff);
> + __m512i res_msk_64, vrf64, idxes_64, res, bytes_64;
> + __mmask8 msk_ext_64;
> +
> if (size == sizeof(uint8_t))
> - res_msk = _mm512_set1_epi32(UINT8_MAX);
> + res_msk_64 = _mm512_set1_epi64(UINT8_MAX);
> else if (size == sizeof(uint16_t))
> - res_msk = _mm512_set1_epi32(UINT16_MAX);
> + res_msk_64 = _mm512_set1_epi64(UINT16_MAX);
> + else if (size == sizeof(uint32_t))
> + res_msk_64 = _mm512_set1_epi64(UINT32_MAX);
>
> - ip_vec = _mm512_loadu_si512(ips);
> - if (be_addr) {
> - const __m512i bswap32 = _mm512_set_epi32(
> - 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203,
> - 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203,
> - 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203,
> - 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203
> - );
> - ip_vec = _mm512_shuffle_epi8(ip_vec, bswap32);
> + vrf64 = _mm512_cvtepu32_epi64(vrf32_256);
> +
> + /* Compute index: (vrf_id << 24) + (ip >> 8) using 64-bit shift */
> + idxes_64 = _mm512_slli_epi64(vrf64, 24);
> + idxes_64 = _mm512_add_epi64(idxes_64, _mm512_cvtepu32_epi64(
> + _mm256_srli_epi32(ip_vec_256, 8)));
> +
> + /* lookup in tbl24 */
> + if (size == sizeof(uint8_t)) {
> + res = _mm512_i64gather_epi64(idxes_64, (const void *)dp->tbl24, 1);
> + res = _mm512_and_epi64(res, res_msk_64);
> + } else if (size == sizeof(uint16_t)) {
> + res = _mm512_i64gather_epi64(idxes_64, (const void *)dp->tbl24, 2);
> + res = _mm512_and_epi64(res, res_msk_64);
> + } else {
> + res = _mm512_i64gather_epi64(idxes_64, (const void *)dp->tbl24, 4);
> + res = _mm512_and_epi64(res, res_msk_64);
> + }
> +
> + /* get extended entries indexes */
> + msk_ext_64 = _mm512_test_epi64_mask(res, lsb_64);
> +
> + if (msk_ext_64 != 0) {
> + bytes_64 = _mm512_cvtepu32_epi64(ip_vec_256);
> + idxes_64 = _mm512_srli_epi64(res, 1);
> + idxes_64 = _mm512_slli_epi64(idxes_64, 8);
> + bytes_64 = _mm512_and_epi64(bytes_64, lsbyte_msk_64);
> + idxes_64 = _mm512_maskz_add_epi64(msk_ext_64, idxes_64, bytes_64);
> +
> + if (size == sizeof(uint8_t))
> + idxes_64 = _mm512_mask_i64gather_epi64(zero_64, msk_ext_64,
> + idxes_64, (const void *)dp->tbl8, 1);
> + else if (size == sizeof(uint16_t))
> + idxes_64 = _mm512_mask_i64gather_epi64(zero_64, msk_ext_64,
> + idxes_64, (const void *)dp->tbl8, 2);
> + else
> + idxes_64 = _mm512_mask_i64gather_epi64(zero_64, msk_ext_64,
> + idxes_64, (const void *)dp->tbl8, 4);
> +
> + res = _mm512_mask_blend_epi64(msk_ext_64, res, idxes_64);
> }
>
> - /* mask 24 most significant bits */
> - idxes = _mm512_srli_epi32(ip_vec, 8);
> + res = _mm512_srli_epi64(res, 1);
> + _mm512_storeu_si512(next_hops, res);
> +}
> +
> +static __rte_always_inline void
> +dir24_8_vec_lookup_x16_32b_path(struct dir24_8_tbl *dp, __m512i ip_vec,
> + __m512i idxes, uint64_t *next_hops, int size)
> +{
> + __mmask16 msk_ext;
> + const __mmask16 exp_msk = 0x5555;
> + const __m512i zero_32 = _mm512_set1_epi32(0);
> + const __m512i lsb_32 = _mm512_set1_epi32(1);
> + const __m512i lsbyte_msk_32 = _mm512_set1_epi32(0xff);
> + __m512i res, bytes, tmp1, tmp2;
> + __m256i tmp256;
> + __m512i res_msk_32;
> +
> + if (size == sizeof(uint8_t))
> + res_msk_32 = _mm512_set1_epi32(UINT8_MAX);
> + else if (size == sizeof(uint16_t))
> + res_msk_32 = _mm512_set1_epi32(UINT16_MAX);
>
> - /**
> + /*
> * lookup in tbl24
> * Put it inside branch to make compiler happy with -O0
> */
> if (size == sizeof(uint8_t)) {
> res = _mm512_i32gather_epi32(idxes, (const int *)dp->tbl24, 1);
> - res = _mm512_and_epi32(res, res_msk);
> + res = _mm512_and_epi32(res, res_msk_32);
> } else if (size == sizeof(uint16_t)) {
> res = _mm512_i32gather_epi32(idxes, (const int *)dp->tbl24, 2);
> - res = _mm512_and_epi32(res, res_msk);
> - } else
> + res = _mm512_and_epi32(res, res_msk_32);
> + } else {
> res = _mm512_i32gather_epi32(idxes, (const int *)dp->tbl24, 4);
> + }
>
> /* get extended entries indexes */
> - msk_ext = _mm512_test_epi32_mask(res, lsb);
> + msk_ext = _mm512_test_epi32_mask(res, lsb_32);
>
> if (msk_ext != 0) {
> idxes = _mm512_srli_epi32(res, 1);
> idxes = _mm512_slli_epi32(idxes, 8);
> - bytes = _mm512_and_epi32(ip_vec, lsbyte_msk);
> + bytes = _mm512_and_epi32(ip_vec, lsbyte_msk_32);
> idxes = _mm512_maskz_add_epi32(msk_ext, idxes, bytes);
> if (size == sizeof(uint8_t)) {
> - idxes = _mm512_mask_i32gather_epi32(zero, msk_ext,
> + idxes = _mm512_mask_i32gather_epi32(zero_32, msk_ext,
> idxes, (const int *)dp->tbl8, 1);
> - idxes = _mm512_and_epi32(idxes, res_msk);
> + idxes = _mm512_and_epi32(idxes, res_msk_32);
> } else if (size == sizeof(uint16_t)) {
> - idxes = _mm512_mask_i32gather_epi32(zero, msk_ext,
> + idxes = _mm512_mask_i32gather_epi32(zero_32, msk_ext,
> idxes, (const int *)dp->tbl8, 2);
> - idxes = _mm512_and_epi32(idxes, res_msk);
> - } else
> - idxes = _mm512_mask_i32gather_epi32(zero, msk_ext,
> + idxes = _mm512_and_epi32(idxes, res_msk_32);
> + } else {
> + idxes = _mm512_mask_i32gather_epi32(zero_32, msk_ext,
> idxes, (const int *)dp->tbl8, 4);
> + }
>
> res = _mm512_mask_blend_epi32(msk_ext, res, idxes);
> }
> @@ -86,16 +143,74 @@ dir24_8_vec_lookup_x16(void *p, const uint32_t *ips,
> _mm512_storeu_si512(next_hops + 8, tmp2);
> }
>
> +/* Unified function with vrf_scale parameter similar to be_addr */
> +static __rte_always_inline void
> +dir24_8_vec_lookup_x16(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> + uint64_t *next_hops, int size, bool be_addr, enum vrf_scale vrf_scale)
> +{
> + struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
> + __m512i ip_vec, idxes;
> + __m256i ip_vec_256, vrf32_256;
> +
> + ip_vec = _mm512_loadu_si512(ips);
> + if (be_addr) {
> + const __m512i bswap32 = _mm512_set_epi32(
> + 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203,
> + 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203,
> + 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203,
> + 0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203
> + );
> + ip_vec = _mm512_shuffle_epi8(ip_vec, bswap32);
> + }
> +
> + if (vrf_scale == VRF_SCALE_SINGLE) {
> + /* mask 24 most significant bits */
> + idxes = _mm512_srli_epi32(ip_vec, 8);
> + dir24_8_vec_lookup_x16_32b_path(dp, ip_vec, idxes, next_hops, size);
> + } else if (vrf_scale == VRF_SCALE_SMALL) {
> + /* For < 256 VRFs: use 32-bit indices with 32-bit shift */
> + __m512i vrf32;
> + uint32_t i;
> +
> + for (i = 0; i < 16; i++)
> + RTE_ASSERT(vrf_ids[i] < dp->num_vrfs);
> +
> + vrf32 = _mm512_cvtepu16_epi32(_mm256_loadu_si256((const void *)vrf_ids));
> +
> + /* mask 24 most significant bits */
> + idxes = _mm512_srli_epi32(ip_vec, 8);
> + idxes = _mm512_add_epi32(idxes, _mm512_slli_epi32(vrf32, 24));
> + dir24_8_vec_lookup_x16_32b_path(dp, ip_vec, idxes, next_hops, size);
> + } else {
> + /* For >= 256 VRFs: use 64-bit indices to avoid overflow */
> + uint32_t i;
> +
> + for (i = 0; i < 16; i++)
> + RTE_ASSERT(vrf_ids[i] < dp->num_vrfs);
> +
> + /* Extract first 8 IPs and VRF IDs */
> + ip_vec_256 = _mm512_castsi512_si256(ip_vec);
> + vrf32_256 = _mm256_cvtepu16_epi32(_mm_loadu_si128((const void *)vrf_ids));
> + dir24_8_vec_lookup_x8_64b_path(dp, ip_vec_256, vrf32_256, next_hops, size);
> +
> + /* Process next 8 IPs from the second half of the vector */
> + ip_vec_256 = _mm512_extracti32x8_epi32(ip_vec, 1);
> + vrf32_256 = _mm256_cvtepu16_epi32(_mm_loadu_si128((const void *)(vrf_ids + 8)));
> + dir24_8_vec_lookup_x8_64b_path(dp, ip_vec_256, vrf32_256, next_hops + 8, size);
> + }
> +}
> +
> +/* Unified function with vrf_scale parameter */
> static __rte_always_inline void
> -dir24_8_vec_lookup_x8_8b(void *p, const uint32_t *ips,
> - uint64_t *next_hops, bool be_addr)
> +dir24_8_vec_lookup_x8_8b(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, bool be_addr, enum vrf_scale vrf_scale)
> {
> struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p;
> - const __m512i zero = _mm512_set1_epi32(0);
> - const __m512i lsbyte_msk = _mm512_set1_epi64(0xff);
> - const __m512i lsb = _mm512_set1_epi64(1);
> + const __m512i zero_64 = _mm512_set1_epi64(0);
> + const __m512i lsbyte_msk_64 = _mm512_set1_epi64(0xff);
> + const __m512i lsb_64 = _mm512_set1_epi64(1);
> __m512i res, idxes, bytes;
> - __m256i idxes_256, ip_vec;
> + __m256i ip_vec, vrf32_256;
> __mmask8 msk_ext;
>
> ip_vec = _mm256_loadu_si256((const void *)ips);
> @@ -106,66 +221,207 @@ dir24_8_vec_lookup_x8_8b(void *p, const uint32_t *ips,
> );
> ip_vec = _mm256_shuffle_epi8(ip_vec, bswap32);
> }
> - /* mask 24 most significant bits */
> - idxes_256 = _mm256_srli_epi32(ip_vec, 8);
>
> - /* lookup in tbl24 */
> - res = _mm512_i32gather_epi64(idxes_256, (const void *)dp->tbl24, 8);
> + if (vrf_scale == VRF_SCALE_SINGLE) {
> + /* For single VRF: use 32-bit indices without vrf_ids */
> + __m256i idxes_256;
>
> - /* get extended entries indexes */
> - msk_ext = _mm512_test_epi64_mask(res, lsb);
> + /* mask 24 most significant bits */
> + idxes_256 = _mm256_srli_epi32(ip_vec, 8);
>
> - if (msk_ext != 0) {
> - bytes = _mm512_cvtepi32_epi64(ip_vec);
> - idxes = _mm512_srli_epi64(res, 1);
> - idxes = _mm512_slli_epi64(idxes, 8);
> - bytes = _mm512_and_epi64(bytes, lsbyte_msk);
> - idxes = _mm512_maskz_add_epi64(msk_ext, idxes, bytes);
> - idxes = _mm512_mask_i64gather_epi64(zero, msk_ext, idxes,
> - (const void *)dp->tbl8, 8);
> -
> - res = _mm512_mask_blend_epi64(msk_ext, res, idxes);
> - }
> + /* lookup in tbl24 */
> + res = _mm512_i32gather_epi64(idxes_256, (const void *)dp->tbl24, 8);
>
> - res = _mm512_srli_epi64(res, 1);
> - _mm512_storeu_si512(next_hops, res);
> + /* get extended entries indexes */
> + msk_ext = _mm512_test_epi64_mask(res, lsb_64);
> +
> + if (msk_ext != 0) {
> + bytes = _mm512_cvtepu32_epi64(ip_vec);
> + idxes = _mm512_srli_epi64(res, 1);
> + idxes = _mm512_slli_epi64(idxes, 8);
> + bytes = _mm512_and_epi64(bytes, lsbyte_msk_64);
> + idxes = _mm512_maskz_add_epi64(msk_ext, idxes, bytes);
> + idxes = _mm512_mask_i64gather_epi64(zero_64, msk_ext, idxes,
> + (const void *)dp->tbl8, 8);
> +
> + res = _mm512_mask_blend_epi64(msk_ext, res, idxes);
> + }
> +
> + res = _mm512_srli_epi64(res, 1);
> + _mm512_storeu_si512(next_hops, res);
> + } else if (vrf_scale == VRF_SCALE_SMALL) {
> + /* For < 256 VRFs: use 32-bit indices with 32-bit shift */
> + __m256i idxes_256;
> + uint32_t i;
> +
> + for (i = 0; i < 8; i++)
> + RTE_ASSERT(vrf_ids[i] < dp->num_vrfs);
> +
> + /* mask 24 most significant bits */
> + idxes_256 = _mm256_srli_epi32(ip_vec, 8);
> + vrf32_256 = _mm256_cvtepu16_epi32(_mm_loadu_si128((const
> void *)vrf_ids));
> + idxes_256 = _mm256_add_epi32(idxes_256,
> _mm256_slli_epi32(vrf32_256, 24));
> +
> + /* lookup in tbl24 */
> + res = _mm512_i32gather_epi64(idxes_256, (const void *)dp->tbl24, 8);
> +
> + /* get extended entries indexes */
> + msk_ext = _mm512_test_epi64_mask(res, lsb_64);
> +
> + if (msk_ext != 0) {
> + bytes = _mm512_cvtepu32_epi64(ip_vec);
> + idxes = _mm512_srli_epi64(res, 1);
> + idxes = _mm512_slli_epi64(idxes, 8);
> + bytes = _mm512_and_epi64(bytes, lsbyte_msk_64);
> + idxes = _mm512_maskz_add_epi64(msk_ext, idxes, bytes);
> + idxes = _mm512_mask_i64gather_epi64(zero_64, msk_ext, idxes,
> + (const void *)dp->tbl8, 8);
> +
> + res = _mm512_mask_blend_epi64(msk_ext, res, idxes);
> + }
> +
> + res = _mm512_srli_epi64(res, 1);
> + _mm512_storeu_si512(next_hops, res);
> + } else {
> + /* For >= 256 VRFs: use 64-bit indices to avoid overflow */
> + uint32_t i;
> +
> + for (i = 0; i < 8; i++)
> + RTE_ASSERT(vrf_ids[i] < dp->num_vrfs);
> +
> + vrf32_256 = _mm256_cvtepu16_epi32(_mm_loadu_si128((const void *)vrf_ids));
> + __m512i vrf64 = _mm512_cvtepu32_epi64(vrf32_256);
> +
> + /* Compute index: (vrf_id << 24) + (ip >> 8) using 64-bit arithmetic */
> + idxes = _mm512_slli_epi64(vrf64, 24);
> + idxes = _mm512_add_epi64(idxes, _mm512_cvtepu32_epi64(
> + _mm256_srli_epi32(ip_vec, 8)));
> +
> + /* lookup in tbl24 with 64-bit gather */
> + res = _mm512_i64gather_epi64(idxes, (const void *)dp->tbl24, 8);
> +
> + /* get extended entries indexes */
> + msk_ext = _mm512_test_epi64_mask(res, lsb_64);
> +
> + if (msk_ext != 0) {
> + bytes = _mm512_cvtepu32_epi64(ip_vec);
> + idxes = _mm512_srli_epi64(res, 1);
> + idxes = _mm512_slli_epi64(idxes, 8);
> + bytes = _mm512_and_epi64(bytes, lsbyte_msk_64);
> + idxes = _mm512_maskz_add_epi64(msk_ext, idxes, bytes);
> + idxes = _mm512_mask_i64gather_epi64(zero_64, msk_ext, idxes,
> + (const void *)dp->tbl8, 8);
> +
> + res = _mm512_mask_blend_epi64(msk_ext, res, idxes);
> + }
> +
> + res = _mm512_srli_epi64(res, 1);
> + _mm512_storeu_si512(next_hops, res);
> + }
> }
>
> -#define DECLARE_VECTOR_FN(suffix, nh_type, be_addr) \
> +#define DECLARE_VECTOR_FN(suffix, scalar_suffix, nh_type, be_addr, vrf_scale) \
> void \
> -rte_dir24_8_vec_lookup_bulk_##suffix(void *p, const uint32_t *ips, uint64_t *next_hops, \
> - const unsigned int n) \
> +rte_dir24_8_vec_lookup_bulk_##suffix(void *p, const uint16_t *vrf_ids, \
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n) \
> { \
> uint32_t i; \
> for (i = 0; i < (n / 16); i++) \
> - dir24_8_vec_lookup_x16(p, ips + i * 16, next_hops + i * 16, sizeof(nh_type), \
> - be_addr); \
> - dir24_8_lookup_bulk_##suffix(p, ips + i * 16, next_hops + i * 16, n - i * 16); \
> + dir24_8_vec_lookup_x16(p, vrf_ids + i * 16, ips + i * 16, \
> + next_hops + i * 16, sizeof(nh_type), be_addr, vrf_scale); \
> + dir24_8_lookup_bulk_##scalar_suffix(p, vrf_ids + i * 16, ips + i * 16, \
> + next_hops + i * 16, n - i * 16); \
> +}
> +
> +DECLARE_VECTOR_FN(1b, 1b, uint8_t, false, VRF_SCALE_SINGLE)
> +DECLARE_VECTOR_FN(1b_be, 1b_be, uint8_t, true, VRF_SCALE_SINGLE)
> +DECLARE_VECTOR_FN(2b, 2b, uint16_t, false, VRF_SCALE_SINGLE)
> +DECLARE_VECTOR_FN(2b_be, 2b_be, uint16_t, true, VRF_SCALE_SINGLE)
> +DECLARE_VECTOR_FN(4b, 4b, uint32_t, false, VRF_SCALE_SINGLE)
> +DECLARE_VECTOR_FN(4b_be, 4b_be, uint32_t, true, VRF_SCALE_SINGLE)
> +
> +DECLARE_VECTOR_FN(vrf_1b, vrf_1b, uint8_t, false, VRF_SCALE_SMALL)
> +DECLARE_VECTOR_FN(vrf_1b_be, vrf_1b_be, uint8_t, true, VRF_SCALE_SMALL)
> +DECLARE_VECTOR_FN(vrf_2b, vrf_2b, uint16_t, false, VRF_SCALE_SMALL)
> +DECLARE_VECTOR_FN(vrf_2b_be, vrf_2b_be, uint16_t, true, VRF_SCALE_SMALL)
> +DECLARE_VECTOR_FN(vrf_4b, vrf_4b, uint32_t, false, VRF_SCALE_SMALL)
> +DECLARE_VECTOR_FN(vrf_4b_be, vrf_4b_be, uint32_t, true, VRF_SCALE_SMALL)
> +
> +DECLARE_VECTOR_FN(vrf_1b_large, vrf_1b, uint8_t, false, VRF_SCALE_LARGE)
> +DECLARE_VECTOR_FN(vrf_1b_be_large, vrf_1b_be, uint8_t, true, VRF_SCALE_LARGE)
> +DECLARE_VECTOR_FN(vrf_2b_large, vrf_2b, uint16_t, false, VRF_SCALE_LARGE)
> +DECLARE_VECTOR_FN(vrf_2b_be_large, vrf_2b_be, uint16_t, true, VRF_SCALE_LARGE)
> +DECLARE_VECTOR_FN(vrf_4b_large, vrf_4b, uint32_t, false, VRF_SCALE_LARGE)
> +DECLARE_VECTOR_FN(vrf_4b_be_large, vrf_4b_be, uint32_t, true, VRF_SCALE_LARGE)
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_8b(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
> +{
> + uint32_t i;
> + for (i = 0; i < (n / 8); i++)
> + dir24_8_vec_lookup_x8_8b(p, vrf_ids + i * 8, ips + i * 8,
> + next_hops + i * 8, false, VRF_SCALE_SINGLE);
> + dir24_8_lookup_bulk_8b(p, vrf_ids + i * 8, ips + i * 8,
> + next_hops + i * 8, n - i * 8);
> +}
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_8b_be(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
> +{
> + uint32_t i;
> + for (i = 0; i < (n / 8); i++)
> + dir24_8_vec_lookup_x8_8b(p, vrf_ids + i * 8, ips + i * 8,
> + next_hops + i * 8, true, VRF_SCALE_SINGLE);
> + dir24_8_lookup_bulk_8b_be(p, vrf_ids + i * 8, ips + i * 8,
> + next_hops + i * 8, n - i * 8);
> +}
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_vrf_8b(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
> +{
> + uint32_t i;
> + for (i = 0; i < (n / 8); i++)
> + dir24_8_vec_lookup_x8_8b(p, vrf_ids + i * 8, ips + i * 8,
> + next_hops + i * 8, false, VRF_SCALE_SMALL);
> + dir24_8_lookup_bulk_vrf_8b(p, vrf_ids + i * 8, ips + i * 8,
> + next_hops + i * 8, n - i * 8);
> }
>
> -DECLARE_VECTOR_FN(1b, uint8_t, false)
> -DECLARE_VECTOR_FN(1b_be, uint8_t, true)
> -DECLARE_VECTOR_FN(2b, uint16_t, false)
> -DECLARE_VECTOR_FN(2b_be, uint16_t, true)
> -DECLARE_VECTOR_FN(4b, uint32_t, false)
> -DECLARE_VECTOR_FN(4b_be, uint32_t, true)
> +void
> +rte_dir24_8_vec_lookup_bulk_vrf_8b_be(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
> +{
> + uint32_t i;
> + for (i = 0; i < (n / 8); i++)
> + dir24_8_vec_lookup_x8_8b(p, vrf_ids + i * 8, ips + i * 8,
> + next_hops + i * 8, true, VRF_SCALE_SMALL);
> + dir24_8_lookup_bulk_vrf_8b_be(p, vrf_ids + i * 8, ips + i * 8,
> + next_hops + i * 8, n - i * 8);
> +}
>
> void
> -rte_dir24_8_vec_lookup_bulk_8b(void *p, const uint32_t *ips,
> - uint64_t *next_hops, const unsigned int n)
> +rte_dir24_8_vec_lookup_bulk_vrf_8b_large(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
> {
> uint32_t i;
> for (i = 0; i < (n / 8); i++)
> - dir24_8_vec_lookup_x8_8b(p, ips + i * 8, next_hops + i * 8, false);
> - dir24_8_lookup_bulk_8b(p, ips + i * 8, next_hops + i * 8, n - i * 8);
> + dir24_8_vec_lookup_x8_8b(p, vrf_ids + i * 8, ips + i * 8,
> + next_hops + i * 8, false, VRF_SCALE_LARGE);
> + dir24_8_lookup_bulk_vrf_8b(p, vrf_ids + i * 8, ips + i * 8,
> + next_hops + i * 8, n - i * 8);
> }
>
> void
> -rte_dir24_8_vec_lookup_bulk_8b_be(void *p, const uint32_t *ips,
> - uint64_t *next_hops, const unsigned int n)
> +rte_dir24_8_vec_lookup_bulk_vrf_8b_be_large(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
> {
> uint32_t i;
> for (i = 0; i < (n / 8); i++)
> - dir24_8_vec_lookup_x8_8b(p, ips + i * 8, next_hops + i * 8, true);
> - dir24_8_lookup_bulk_8b_be(p, ips + i * 8, next_hops + i * 8, n - i * 8);
> + dir24_8_vec_lookup_x8_8b(p, vrf_ids + i * 8, ips + i * 8,
> + next_hops + i * 8, true, VRF_SCALE_LARGE);
> + dir24_8_lookup_bulk_vrf_8b_be(p, vrf_ids + i * 8, ips + i * 8,
> + next_hops + i * 8, n - i * 8);
> }
> diff --git a/lib/fib/dir24_8_avx512.h b/lib/fib/dir24_8_avx512.h
> index 3e2bbc2490..d42ef1d17f 100644
> --- a/lib/fib/dir24_8_avx512.h
> +++ b/lib/fib/dir24_8_avx512.h
> @@ -6,35 +6,99 @@
> #define _DIR248_AVX512_H_
>
> void
> -rte_dir24_8_vec_lookup_bulk_1b(void *p, const uint32_t *ips,
> +rte_dir24_8_vec_lookup_bulk_1b(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_1b_be(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_vrf_1b(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> uint64_t *next_hops, const unsigned int n);
>
> void
> -rte_dir24_8_vec_lookup_bulk_2b(void *p, const uint32_t *ips,
> +rte_dir24_8_vec_lookup_bulk_vrf_1b_be(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> uint64_t *next_hops, const unsigned int n);
>
> void
> -rte_dir24_8_vec_lookup_bulk_4b(void *p, const uint32_t *ips,
> +rte_dir24_8_vec_lookup_bulk_vrf_1b_large(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_vrf_1b_be_large(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_2b(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_2b_be(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_vrf_2b(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> uint64_t *next_hops, const unsigned int n);
>
> void
> -rte_dir24_8_vec_lookup_bulk_8b(void *p, const uint32_t *ips,
> +rte_dir24_8_vec_lookup_bulk_vrf_2b_be(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> uint64_t *next_hops, const unsigned int n);
>
> void
> -rte_dir24_8_vec_lookup_bulk_1b_be(void *p, const uint32_t *ips,
> +rte_dir24_8_vec_lookup_bulk_vrf_2b_large(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_vrf_2b_be_large(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_4b(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_4b_be(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_vrf_4b(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> uint64_t *next_hops, const unsigned int n);
>
> void
> -rte_dir24_8_vec_lookup_bulk_2b_be(void *p, const uint32_t *ips,
> +rte_dir24_8_vec_lookup_bulk_vrf_4b_be(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> uint64_t *next_hops, const unsigned int n);
>
> void
> -rte_dir24_8_vec_lookup_bulk_4b_be(void *p, const uint32_t *ips,
> +rte_dir24_8_vec_lookup_bulk_vrf_4b_large(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_vrf_4b_be_large(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_8b(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_8b_be(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_vrf_8b(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> uint64_t *next_hops, const unsigned int n);
>
> void
> -rte_dir24_8_vec_lookup_bulk_8b_be(void *p, const uint32_t *ips,
> +rte_dir24_8_vec_lookup_bulk_vrf_8b_be(void *p, const uint16_t *vrf_ids, const uint32_t *ips,
> uint64_t *next_hops, const unsigned int n);
>
> +void
> +rte_dir24_8_vec_lookup_bulk_vrf_8b_large(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> +void
> +rte_dir24_8_vec_lookup_bulk_vrf_8b_be_large(void *p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
> +
> #endif /* _DIR248_AVX512_H_ */
> diff --git a/lib/fib/rte_fib.c b/lib/fib/rte_fib.c
> index 184210f380..efc0595a7f 100644
> --- a/lib/fib/rte_fib.c
> +++ b/lib/fib/rte_fib.c
> @@ -14,12 +14,15 @@
> #include <rte_string_fns.h>
> #include <rte_tailq.h>
>
> +#include <rte_debug.h>
> #include <rte_rib.h>
> #include <rte_fib.h>
>
> #include "dir24_8.h"
> #include "fib_log.h"
>
> +#define FIB_MAX_LOOKUP_BULK 64U
> +
> RTE_LOG_REGISTER_DEFAULT(fib_logtype, INFO);
>
> TAILQ_HEAD(rte_fib_list, rte_tailq_entry);
> @@ -40,52 +43,61 @@ EAL_REGISTER_TAILQ(rte_fib_tailq)
> struct rte_fib {
> char name[RTE_FIB_NAMESIZE];
> enum rte_fib_type type; /**< Type of FIB struct */
> - unsigned int flags; /**< Flags */
> - struct rte_rib *rib; /**< RIB helper datastructure */
> + uint16_t flags; /**< Flags */
> + uint16_t num_vrfs;/**< Number of VRFs */
> + struct rte_rib **ribs; /**< RIB helper datastructures per VRF */
> void *dp; /**< pointer to the dataplane struct*/
> rte_fib_lookup_fn_t lookup; /**< FIB lookup function */
> rte_fib_modify_fn_t modify; /**< modify FIB datastructure */
> - uint64_t def_nh;
> + uint64_t *def_nh;/**< Per-VRF default next hop array */
> };
>
> static void
> -dummy_lookup(void *fib_p, const uint32_t *ips, uint64_t *next_hops,
> - const unsigned int n)
> +dummy_lookup(void *fib_p, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n)
> {
> unsigned int i;
> struct rte_fib *fib = fib_p;
> struct rte_rib_node *node;
> + struct rte_rib *rib;
>
> for (i = 0; i < n; i++) {
> - node = rte_rib_lookup(fib->rib, ips[i]);
> + RTE_ASSERT(vrf_ids[i] < fib->num_vrfs);
> + rib = rte_fib_vrf_get_rib(fib, vrf_ids[i]);
> + node = rte_rib_lookup(rib, ips[i]);
> if (node != NULL)
> rte_rib_get_nh(node, &next_hops[i]);
> else
> - next_hops[i] = fib->def_nh;
> + next_hops[i] = fib->def_nh[vrf_ids[i]];
> }
> }
>
> static int
> -dummy_modify(struct rte_fib *fib, uint32_t ip, uint8_t depth,
> - uint64_t next_hop, int op)
> +dummy_modify(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip,
> + uint8_t depth, uint64_t next_hop, int op)
> {
> struct rte_rib_node *node;
> + struct rte_rib *rib;
> if ((fib == NULL) || (depth > RTE_FIB_MAXDEPTH))
> return -EINVAL;
>
> - node = rte_rib_lookup_exact(fib->rib, ip, depth);
> + rib = rte_fib_vrf_get_rib(fib, vrf_id);
> + if (rib == NULL)
> + return -EINVAL;
> +
> + node = rte_rib_lookup_exact(rib, ip, depth);
>
> switch (op) {
> case RTE_FIB_ADD:
> if (node == NULL)
> - node = rte_rib_insert(fib->rib, ip, depth);
> + node = rte_rib_insert(rib, ip, depth);
> if (node == NULL)
> return -rte_errno;
> return rte_rib_set_nh(node, next_hop);
> case RTE_FIB_DEL:
> if (node == NULL)
> return -ENOENT;
> - rte_rib_remove(fib->rib, ip, depth);
> + rte_rib_remove(rib, ip, depth);
> return 0;
> }
> return -EINVAL;
> @@ -125,7 +137,7 @@ rte_fib_add(struct rte_fib *fib, uint32_t ip, uint8_t depth, uint64_t next_hop)
> if ((fib == NULL) || (fib->modify == NULL) ||
> (depth > RTE_FIB_MAXDEPTH))
> return -EINVAL;
> - return fib->modify(fib, ip, depth, next_hop, RTE_FIB_ADD);
> + return fib->modify(fib, 0, ip, depth, next_hop, RTE_FIB_ADD);
> }
>
> RTE_EXPORT_SYMBOL(rte_fib_delete)
> @@ -135,7 +147,7 @@ rte_fib_delete(struct rte_fib *fib, uint32_t ip, uint8_t depth)
> if ((fib == NULL) || (fib->modify == NULL) ||
> (depth > RTE_FIB_MAXDEPTH))
> return -EINVAL;
> - return fib->modify(fib, ip, depth, 0, RTE_FIB_DEL);
> + return fib->modify(fib, 0, ip, depth, 0, RTE_FIB_DEL);
> }
>
> RTE_EXPORT_SYMBOL(rte_fib_lookup_bulk)
> @@ -143,24 +155,73 @@ int
> rte_fib_lookup_bulk(struct rte_fib *fib, uint32_t *ips,
> uint64_t *next_hops, int n)
> {
> + static const uint16_t zero_vrf_ids[FIB_MAX_LOOKUP_BULK];
> + unsigned int off = 0;
> + unsigned int total = (unsigned int)n;
> +
> FIB_RETURN_IF_TRUE(((fib == NULL) || (ips == NULL) ||
> (next_hops == NULL) || (fib->lookup == NULL)), -EINVAL);
>
> - fib->lookup(fib->dp, ips, next_hops, n);
> + while (off < total) {
> + unsigned int chunk = RTE_MIN(total - off,
> + FIB_MAX_LOOKUP_BULK);
> + fib->lookup(fib->dp, zero_vrf_ids, ips + off,
> + next_hops + off, chunk);
> + off += chunk;
> + }
> +
> + return 0;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib_vrf_lookup_bulk, 26.07)
> +int
> +rte_fib_vrf_lookup_bulk(struct rte_fib *fib, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, int n)
> +{
> + FIB_RETURN_IF_TRUE(((fib == NULL) || (vrf_ids == NULL) ||
> + (ips == NULL) || (next_hops == NULL) ||
> + (fib->lookup == NULL)), -EINVAL);
> +
> + fib->lookup(fib->dp, vrf_ids, ips, next_hops, (unsigned int)n);
> return 0;
> }
>
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib_vrf_add, 26.07)
> +int
> +rte_fib_vrf_add(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip,
> + uint8_t depth, uint64_t next_hop)
> +{
> + if ((fib == NULL) || (fib->modify == NULL) ||
> + (depth > RTE_FIB_MAXDEPTH))
> + return -EINVAL;
> + return fib->modify(fib, vrf_id, ip, depth, next_hop, RTE_FIB_ADD);
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib_vrf_delete, 26.07)
> +int
> +rte_fib_vrf_delete(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip,
> + uint8_t depth)
> +{
> + if ((fib == NULL) || (fib->modify == NULL) ||
> + (depth > RTE_FIB_MAXDEPTH))
> + return -EINVAL;
> + return fib->modify(fib, vrf_id, ip, depth, 0, RTE_FIB_DEL);
> +}
> +
> RTE_EXPORT_SYMBOL(rte_fib_create)
> struct rte_fib *
> rte_fib_create(const char *name, int socket_id, struct rte_fib_conf *conf)
> {
> char mem_name[RTE_FIB_NAMESIZE];
> + char rib_name[RTE_FIB_NAMESIZE];
> int ret;
> struct rte_fib *fib = NULL;
> struct rte_rib *rib = NULL;
> struct rte_tailq_entry *te;
> struct rte_fib_list *fib_list;
> struct rte_rib_conf rib_conf;
> + uint16_t num_vrfs;
> + uint16_t vrf;
>
> /* Check user arguments. */
> if ((name == NULL) || (conf == NULL) || (conf->max_routes < 0) ||
> @@ -170,16 +231,42 @@ rte_fib_create(const char *name, int socket_id, struct rte_fib_conf *conf)
> return NULL;
> }
>
> + num_vrfs = (conf->max_vrfs == 0) ? 1 : conf->max_vrfs;
> rib_conf.ext_sz = conf->rib_ext_sz;
> rib_conf.max_nodes = conf->max_routes * 2;
>
> - rib = rte_rib_create(name, socket_id, &rib_conf);
> - if (rib == NULL) {
> - FIB_LOG(ERR,
> - "Can not allocate RIB %s", name);
> + struct rte_rib **ribs = rte_zmalloc_socket("FIB_RIBS",
> + num_vrfs * sizeof(*fib->ribs), RTE_CACHE_LINE_SIZE, socket_id);
> + if (ribs == NULL) {
> + FIB_LOG(ERR, "FIB %s RIB array allocation failed", name);
> + rte_errno = ENOMEM;
> return NULL;
> }
>
> + uint64_t *def_nh = rte_zmalloc_socket("FIB_DEF_NH",
> + num_vrfs * sizeof(*def_nh), RTE_CACHE_LINE_SIZE, socket_id);
> + if (def_nh == NULL) {
> + FIB_LOG(ERR, "FIB %s default nexthop array allocation failed", name);
> + rte_errno = ENOMEM;
> + rte_free(ribs);
> + return NULL;
> + }
> +
> + for (vrf = 0; vrf < num_vrfs; vrf++) {
> + if (num_vrfs == 1)
> + snprintf(rib_name, sizeof(rib_name), "%s", name);
> + else
> + snprintf(rib_name, sizeof(rib_name), "%s_vrf%u", name, vrf);
> + rib = rte_rib_create(rib_name, socket_id, &rib_conf);
> + if (rib == NULL) {
> + FIB_LOG(ERR, "Can not allocate RIB %s", rib_name);
> + goto free_ribs;
> + }
> + ribs[vrf] = rib;
> + def_nh[vrf] = (conf->vrf_default_nh != NULL) ?
> + conf->vrf_default_nh[vrf] : conf->default_nh;
> + }
> +
> snprintf(mem_name, sizeof(mem_name), "FIB_%s", name);
> fib_list = RTE_TAILQ_CAST(rte_fib_tailq.head, rte_fib_list);
>
> @@ -215,11 +302,13 @@ rte_fib_create(const char *name, int socket_id, struct rte_fib_conf *conf)
> goto free_te;
> }
>
> + fib->num_vrfs = num_vrfs;
> + fib->ribs = ribs;
> + fib->def_nh = def_nh;
> +
> rte_strlcpy(fib->name, name, sizeof(fib->name));
> - fib->rib = rib;
> fib->type = conf->type;
> fib->flags = conf->flags;
> - fib->def_nh = conf->default_nh;
> ret = init_dataplane(fib, socket_id, conf);
> if (ret < 0) {
> FIB_LOG(ERR,
> @@ -242,8 +331,12 @@ rte_fib_create(const char *name, int socket_id, struct rte_fib_conf *conf)
> rte_free(te);
> exit:
> rte_mcfg_tailq_write_unlock();
> - rte_rib_free(rib);
> +free_ribs:
> + for (vrf = 0; vrf < num_vrfs; vrf++)
> + rte_rib_free(ribs[vrf]);
>
> + rte_free(def_nh);
> + rte_free(ribs);
> return NULL;
> }
>
> @@ -311,7 +404,13 @@ rte_fib_free(struct rte_fib *fib)
> rte_mcfg_tailq_write_unlock();
>
> free_dataplane(fib);
> - rte_rib_free(fib->rib);
> + if (fib->ribs != NULL) {
> + uint16_t vrf;
> + for (vrf = 0; vrf < fib->num_vrfs; vrf++)
> + rte_rib_free(fib->ribs[vrf]);
> + }
> + rte_free(fib->ribs);
> + rte_free(fib->def_nh);
> rte_free(fib);
> rte_free(te);
> }
> @@ -327,7 +426,18 @@ RTE_EXPORT_SYMBOL(rte_fib_get_rib)
> struct rte_rib *
> rte_fib_get_rib(struct rte_fib *fib)
> {
> - return (fib == NULL) ? NULL : fib->rib;
> + return (fib == NULL || fib->ribs == NULL) ? NULL : fib->ribs[0];
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_fib_vrf_get_rib, 26.07)
> +struct rte_rib *
> +rte_fib_vrf_get_rib(struct rte_fib *fib, uint16_t vrf_id)
> +{
> + if (fib == NULL || fib->ribs == NULL)
> + return NULL;
> + if (vrf_id >= fib->num_vrfs)
> + return NULL;
> + return fib->ribs[vrf_id];
> }
>
> RTE_EXPORT_SYMBOL(rte_fib_select_lookup)
> diff --git a/lib/fib/rte_fib.h b/lib/fib/rte_fib.h
> index b16a653535..883195c7d6 100644
> --- a/lib/fib/rte_fib.h
> +++ b/lib/fib/rte_fib.h
> @@ -53,11 +53,11 @@ enum rte_fib_type {
> };
>
> /** Modify FIB function */
> -typedef int (*rte_fib_modify_fn_t)(struct rte_fib *fib, uint32_t ip,
> - uint8_t depth, uint64_t next_hop, int op);
> +typedef int (*rte_fib_modify_fn_t)(struct rte_fib *fib, uint16_t vrf_id,
> + uint32_t ip, uint8_t depth, uint64_t next_hop, int op);
> /** FIB bulk lookup function */
> -typedef void (*rte_fib_lookup_fn_t)(void *fib, const uint32_t *ips,
> - uint64_t *next_hops, const unsigned int n);
> +typedef void (*rte_fib_lookup_fn_t)(void *fib, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, const unsigned int n);
>
> enum rte_fib_op {
> RTE_FIB_ADD,
> @@ -110,6 +110,10 @@ struct rte_fib_conf {
> } dir24_8;
> };
> unsigned int flags; /**< Optional feature flags from RTE_FIB_F_* */
> + /** Number of VRFs to support (0 or 1 = single VRF for backward compat) */
> + uint16_t max_vrfs;
> + /** Per-VRF default nexthops (NULL = use default_nh for all) */
> + uint64_t *vrf_default_nh;
> };
>
> /** FIB RCU QSBR configuration structure. */
> @@ -224,6 +228,71 @@ rte_fib_delete(struct rte_fib *fib, uint32_t ip, uint8_t depth);
> int
> rte_fib_lookup_bulk(struct rte_fib *fib, uint32_t *ips,
> uint64_t *next_hops, int n);
> +
> +/**
> + * Add a route to the FIB with VRF ID.
> + *
> + * @param fib
> + * FIB object handle
> + * @param vrf_id
> + * VRF ID (0 to max_vrfs-1)
> + * @param ip
> + * IPv4 prefix address to be added to the FIB
> + * @param depth
> + * Prefix length
> + * @param next_hop
> + * Next hop to be added to the FIB
> + * @return
> + * 0 on success, negative value otherwise
> + */
> +__rte_experimental
> +int
> +rte_fib_vrf_add(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip,
> + uint8_t depth, uint64_t next_hop);
> +
> +/**
> + * Delete a rule from the FIB with VRF ID.
> + *
> + * @param fib
> + * FIB object handle
> + * @param vrf_id
> + * VRF ID (0 to max_vrfs-1)
> + * @param ip
> + * IPv4 prefix address to be deleted from the FIB
> + * @param depth
> + * Prefix length
> + * @return
> + * 0 on success, negative value otherwise
> + */
> +__rte_experimental
> +int
> +rte_fib_vrf_delete(struct rte_fib *fib, uint16_t vrf_id, uint32_t ip,
> + uint8_t depth);
> +
> +/**
> + * Lookup multiple IP addresses in the FIB with per-packet VRF IDs.
> + *
> + * @param fib
> + * FIB object handle
> + * @param vrf_ids
> + * Array of VRF IDs
> + * @param ips
> + * Array of IPs to be looked up in the FIB
> + * @param next_hops
> + * Next hop of the most specific rule found for IP in the corresponding VRF.
> + * This is an array of eight byte values.
> + * If the lookup for the given IP failed, then corresponding element would
> + * contain default nexthop value configured for that VRF.
> + * @param n
> + * Number of elements in vrf_ids, ips (and next_hops) arrays to lookup.
> + * @return
> + * -EINVAL for incorrect arguments, otherwise 0
> + */
> +__rte_experimental
> +int
> +rte_fib_vrf_lookup_bulk(struct rte_fib *fib, const uint16_t *vrf_ids,
> + const uint32_t *ips, uint64_t *next_hops, int n);
> +
> /**
> * Get pointer to the dataplane specific struct
> *
> @@ -237,7 +306,7 @@ void *
> rte_fib_get_dp(struct rte_fib *fib);
>
> /**
> - * Get pointer to the RIB
> + * Get pointer to the RIB for VRF 0
> *
> * @param fib
> * FIB object handle
> @@ -248,6 +317,21 @@ rte_fib_get_dp(struct rte_fib *fib);
> struct rte_rib *
> rte_fib_get_rib(struct rte_fib *fib);
>
> +/**
> + * Get pointer to the RIB for a specific VRF
> + *
> + * @param fib
> + * FIB object handle
> + * @param vrf_id
> + * VRF ID (0 to max_vrfs-1)
> + * @return
> + * Pointer on the RIB on success
> + * NULL otherwise
> + */
> +__rte_experimental
> +struct rte_rib *
> +rte_fib_vrf_get_rib(struct rte_fib *fib, uint16_t vrf_id);
> +
> /**
> * Set lookup function based on type
> *
> --
> 2.43.0
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-23 14:53 ` Maxime Leroy
2026-03-23 15:08 ` Robin Jarry
@ 2026-03-23 18:42 ` Medvedkin, Vladimir
2026-03-24 9:19 ` Maxime Leroy
1 sibling, 1 reply; 33+ messages in thread
From: Medvedkin, Vladimir @ 2026-03-23 18:42 UTC (permalink / raw)
To: Maxime Leroy; +Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk
On 3/23/2026 2:53 PM, Maxime Leroy wrote:
> On Mon, Mar 23, 2026 at 1:49 PM Medvedkin, Vladimir
> <vladimir.medvedkin@intel.com> wrote:
>> Hi Maxime,
>>
>> On 3/23/2026 11:27 AM, Maxime Leroy wrote:
>>> Hi Vladimir,
>>>
>>>
>>> On Sun, Mar 22, 2026 at 4:42 PM Vladimir Medvedkin
>>> <vladimir.medvedkin@intel.com> wrote:
>>>> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
>>>> allowing a single FIB instance to host multiple isolated routing domains.
>>>>
>>>> Currently FIB instance represents one routing instance. For workloads that
>>>> need multiple VRFs, the only option is to create multiple FIB objects. In a
>>>> burst oriented datapath, packets in the same batch can belong to different VRFs, so
>>>> the application either does per-packet lookup in different FIB instances or
>>>> regroups packets by VRF before lookup. Both approaches are expensive.
>>>>
>>>> To remove that cost, this series keeps all VRFs inside one FIB instance and
>>>> extends lookup input with per-packet VRF IDs.
>>>>
>>>> The design follows the existing fast-path structure for both families. IPv4 and
>>>> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
>>>> first-level table scales per configured VRF. This increases memory usage, but
>>>> keeps performance and lookup complexity on par with non-VRF implementation.
>>>>
>>> Thanks for the RFC. Some thoughts below.
>>>
>>> Memory cost: the flat TBL24 replicates the entire table for every VRF
>>> (num_vrfs * 2^24 * nh_size). With 256 VRFs and 8B nexthops that is
>>> 32 GB for TBL24 alone. In grout we support up to 256 VRFs allocated
>>> on demand -- this approach forces the full cost upfront even if most
>>> VRFs are empty.
Yes, increased memory consumption is the trade-off. We make this choice
in DPDK quite often, such as pre-allocated mbufs, mempools and much
other stuff allocated in advance to gain performance.
>> For FIB, I chose to replicate TBL24 per VRF for this same reason.
>>
>> And, as Morten mentioned earlier, if memory is the priority, a table
>> instance per VRF allocated on-demand is still supported.
>>
>> The high memory cost stems from TBL24's design: for IPv4, it was
>> justified by the BGP filtering convention (no prefixes more specific
>> than /24 in BGPv4 full view), ensuring most lookups hit with just one
>> random memory access. For IPv6, we should likely switch to a 16-bit TRIE
>> scheme on all layers. For IPv4, alternative algorithms with smaller
>> footprints (like DXR or DIR16-8-8, as used in VPP) may be worth
>> exploring if BGP full view is not required for those VRFs.
>>
>>> Per-packet VRF lookup: Rx bursts come from one port, thus one VRF.
>>> Mixed-VRF bulk lookups do not occur in practice. The three AVX512
>>> code paths add complexity for a scenario that does not exist, at
>>> least for a classic router. Am I missing a use-case?
>> That's not true, you're missing out on a lot of established core use
>> cases that are at least 2 decades old:
>>
>> - VLAN subinterface abstraction. Each subinterface may belong to a
>> separate VRF
>>
>> - MPLS VPN
>>
>> - Policy based routing
>>
> Fair point on VLAN subinterfaces and MPLS VPN. SRv6 L3VPN (End.DT4/
> End.DT6) also fits that pattern after decap.
>
> I agree DPDK often pre-allocates for performance, but I wonder if the
> flat TBL24 actually helps here. Each VRF's working set is spread
> 128 MB apart in the flat table. Would regrouping packets by VRF and
> doing one bulk lookup per VRF with separate contiguous TBL24s be
> more cache-friendly than a single mixed-VRF gather? Do you have
> benchmarks comparing the two approaches?
It depends. Generally, if we assume that we are working with wide
internet traffic, then even for a single VRF we will most likely miss
the cache for TBL24; thus, regardless of the size of the tbl24, each
memory access will be performed directly to DRAM.
And if the addresses are localized (i.e. most traffic is internal), then
having multiple TBL24s won't make the situation much worse.
I don't have any benchmarks for regrouping; however, I have two things
to consider:
1. lookup is relatively fast (for IPv4 it is about 10 cycles per
address, and I don't really want to slow it down)
2. incoming addresses and their corresponding VRFs are not controlled by
"us", so this is a random set. Regrouping is effectively sorting. I'm
not really happy to have nlogn complexity on a fast path :)
>
> On the memory trade-off and VRF ID mapping: the API uses vrf_id as
> a direct index (0 to max_vrfs-1). With 256 VRFs and 8B nexthops,
> TBL24 alone costs 32 GB for IPv4 and 32 GB for IPv6 -- 64 GB total
> at startup. In grout, VRF IDs are interface IDs that can be any
> uint16_t, so we would also need to maintain a mapping between our
> VRF IDs and FIB slot indices.
Of course, this is an application responsibility. In the FIB, VRF IDs
are in a contiguous range.
> We would need to introduce a max_vrfs
> limit, which forces a bad trade-off: either set it low (e.g. 16)
> and limit deployments, or set it high (e.g. 256) and pay 64 GB at
> startup even with a single VRF. With separate FIB instances per VRF,
> we only allocate what we use.
Yes, I understand this. In the end, if the user wants to use 256 VRFs,
the memory footprint will be at least 64 GB anyway.
As a trade-off for a bad trade-off ;) I can suggest allocating it in
chunks. Let's say you are starting with 16 VRFs, and during runtime, if
the user wants to increase the number of VRFs above this limit, you can
allocate another 16-VRF FIB. Then, of course, you need to split
addresses into two bursts, one for each FIB handle.
>
>>> I am not too familiar with DPDK FIB internals, but would it be
>>> possible to keep a separate TBL24 per VRF and only share the TBL8
>>> pool?
That is how it is implemented right now, with one note - TBL24s are
pre-allocated.
>>> Something like pre-allocating an array of max_vrfs TBL24
>>> pointers, allocating each TBL24 on demand at VRF add time,
And you are suggesting to allocate TBL24 on demand by adding an extra
indirection layer. This will lead to lower performance, which I would
like to avoid.
>>> and
>>> having them all point into a shared TBL8 pool. The TBL8 index in
>>> TBL24 entries seems to already be global, so would that work without
>>> encoding changes?
>>>
>>> Going further: could the same idea extend to IPv6? The dir24_8 and
>>> trie seem to use the same TBL8 block format (256 entries, same
>>> (nh << 1) | ext_bit encoding, same size). Would unifying the TBL8
>>> allocator allow a single pool shared across IPv4, IPv6, and all
>>> VRFs? That could be a bigger win for /32-heavy and /128-heavy tables
>>> and maybe a good first step before multi-VRF.
>> So, you are suggesting merging IPv4 and IPv6 into a single unified FIB?
>> I'm not sure how this can be a bigger win, could you please elaborate
>> more on this?
> On the IPv4/IPv6 TBL8 pool: I was not suggesting merging FIBs, just
> sharing the TBL8 block allocator between separate FIB instances.
> This is possible since dir24_8 and trie use the same TBL8 block
> format (256 entries, same encoding, same size).
>
> Would it be possible to pass a shared TBL8 pool at rte_fib_create()
> time? Each FIB keeps its own TBL24 and RIB, but TBL8 is shared
> across all FIBs and potentially across IPv4/IPv6. Users would no
> longer have to guess num_tbl8 per FIB.
Yes, this is possible. However, this will significantly complicate
working with the library while solving a fairly small problem.
>>> Regards,
>>>
>>> Maxime Leroy
>>>
>>>> Vladimir Medvedkin (4):
>>>> fib: add multi-VRF support
>>>> fib: add VRF functional and unit tests
>>>> fib6: add multi-VRF support
>>>> fib6: add VRF functional and unit tests
>>>>
>>>> app/test-fib/main.c | 257 ++++++++++++++++++++++--
>>>> app/test/test_fib.c | 298 +++++++++++++++++++++++++++
>>>> app/test/test_fib6.c | 319 ++++++++++++++++++++++++++++-
>>>> lib/fib/dir24_8.c | 241 ++++++++++++++++------
>>>> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
>>>> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
>>>> lib/fib/dir24_8_avx512.h | 80 +++++++-
>>>> lib/fib/rte_fib.c | 158 ++++++++++++---
>>>> lib/fib/rte_fib.h | 94 ++++++++-
>>>> lib/fib/rte_fib6.c | 166 +++++++++++++---
>>>> lib/fib/rte_fib6.h | 88 +++++++-
>>>> lib/fib/trie.c | 158 +++++++++++----
>>>> lib/fib/trie.h | 51 +++--
>>>> lib/fib/trie_avx512.c | 225 +++++++++++++++++++--
>>>> lib/fib/trie_avx512.h | 39 +++-
>>>> 15 files changed, 2453 insertions(+), 396 deletions(-)
>>>>
>>>> --
>>>> 2.43.0
>>>>
>> --
>> Regards,
>> Vladimir
>>
>
--
Regards,
Vladimir
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-23 15:27 ` Morten Brørup
@ 2026-03-23 18:52 ` Medvedkin, Vladimir
0 siblings, 0 replies; 33+ messages in thread
From: Medvedkin, Vladimir @ 2026-03-23 18:52 UTC (permalink / raw)
To: Morten Brørup, Robin Jarry, Maxime Leroy
Cc: dev, nsaxena16, adwivedi, jerinjacobk, Stephen Hemminger
On 3/23/2026 3:27 PM, Morten Brørup wrote:
> Let's take a level up in abstraction, and consider one of the key design goals:
> What are the typical use cases we want VRF support designed for?
>
> I can imagine a use case with one or very few VRFs with a very large (internet) route table, and many VRFs with small (private) route tables.
>
> PS: I have no experience with the DPDK FIB library, so my feedback might be completely off.
No, you are completely correct :) In real-world scenarios only a few
VRFs will have a full view, so I assumed the number of VRFs would be
small.
For small (private) route tables, it would be good to use either smaller
nexthops (e.g. 2 bytes), or some other LPM algorithm that is not as
greedy for memory as DIR24_8.
>
--
Regards,
Vladimir
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-22 15:42 [RFC PATCH 0/4] VRF support in FIB library Vladimir Medvedkin
` (6 preceding siblings ...)
2026-03-23 11:27 ` Maxime Leroy
@ 2026-03-23 19:05 ` Stephen Hemminger
7 siblings, 0 replies; 33+ messages in thread
From: Stephen Hemminger @ 2026-03-23 19:05 UTC (permalink / raw)
To: Vladimir Medvedkin; +Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk
On Sun, 22 Mar 2026 15:42:11 +0000
Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote:
> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
> allowing a single FIB instance to host multiple isolated routing domains.
>
> Currently FIB instance represents one routing instance. For workloads that
> need multiple VRFs, the only option is to create multiple FIB objects. In a
> burst oriented datapath, packets in the same batch can belong to different VRFs, so
> the application either does per-packet lookup in different FIB instances or
> regroups packets by VRF before lookup. Both approaches are expensive.
>
> To remove that cost, this series keeps all VRFs inside one FIB instance and
> extends lookup input with per-packet VRF IDs.
>
> The design follows the existing fast-path structure for both families. IPv4 and
> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
> first-level table scales per configured VRF. This increases memory usage, but
> keeps performance and lookup complexity on par with non-VRF implementation.
>
> Vladimir Medvedkin (4):
> fib: add multi-VRF support
> fib: add VRF functional and unit tests
> fib6: add multi-VRF support
> fib6: add VRF functional and unit tests
>
> app/test-fib/main.c | 257 ++++++++++++++++++++++--
> app/test/test_fib.c | 298 +++++++++++++++++++++++++++
> app/test/test_fib6.c | 319 ++++++++++++++++++++++++++++-
> lib/fib/dir24_8.c | 241 ++++++++++++++++------
> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
> lib/fib/dir24_8_avx512.h | 80 +++++++-
> lib/fib/rte_fib.c | 158 ++++++++++++---
> lib/fib/rte_fib.h | 94 ++++++++-
> lib/fib/rte_fib6.c | 166 +++++++++++++---
> lib/fib/rte_fib6.h | 88 +++++++-
> lib/fib/trie.c | 158 +++++++++++----
> lib/fib/trie.h | 51 +++--
> lib/fib/trie_avx512.c | 225 +++++++++++++++++++--
> lib/fib/trie_avx512.h | 39 +++-
> 15 files changed, 2453 insertions(+), 396 deletions(-)
>
AI review found several things
Review: [RFC PATCH 1/4] fib: add multi-VRF support
[RFC PATCH 2/4] fib: add VRF functional and unit tests
[RFC PATCH 3/4] fib6: add multi-VRF support
[RFC PATCH 4/4] fib6: add VRF functional and unit tests
Overall this is a well-structured RFC that adds multi-VRF support
to both the IPv4 and IPv6 FIB libraries with AVX512-optimized
lookup paths and comprehensive test coverage. There is one
significant correctness bug in the AVX512 gather paths, several
design points worth discussing, and some minor issues.
Patch 1/4 - fib: add multi-VRF support
Error: Signed overflow in AVX512 32-bit gather for VRF IDs >= 128
The VRF_SCALE_SMALL path (num_vrfs in [2, 255]) computes the
tbl24 index in 32-bit arithmetic as (vrf_id << 24) + (ip >> 8).
For vrf_id >= 128, vrf_id << 24 sets bit 31, making the result
negative when interpreted as int32. The _mm512_i32gather_epi32
and _mm512_i32gather_epi64 intrinsics sign-extend 32-bit indices
to compute byte offsets, so a negative index produces a read
before the start of tbl24 -- a buffer underflow.
Example: vrf_id=128, ip=0 gives index 128 << 24 =
0x80000000 = -2147483648 as signed int32.
This affects all nexthop sizes in both dir24_8_avx512.c and
trie_avx512.c.
Fix: Either lower the VRF_SCALE_SMALL ceiling from 256 to 128
(so VRFs 128-255 use the 64-bit path), or switch to unsigned
gather by pre-scaling the indices into byte offsets and using
scale=1 with unsigned arithmetic.
In dir24_8_avx512.c get_vector_fn():
if (dp->num_vrfs >= 256) {
should be:
if (dp->num_vrfs >= 128) {
Same change needed in trie.c get_vector_fn().
Warning: ABI break -- public function pointer typedefs changed
rte_fib_lookup_fn_t and rte_fib_modify_fn_t in rte_fib.h (and
the corresponding fib6 typedefs in rte_fib6.h) have new
parameters (vrf_ids/vrf_id). These are installed header typedefs
used by applications setting custom lookup functions via
rte_fib_select_lookup(). Changing them is an ABI break that needs
deprecation notice or ABI versioning.
Similarly, adding max_vrfs and vrf_default_nh to rte_fib_conf and
rte_fib6_conf changes the struct layout.
Since this is RFC, this is expected, but it will need to be
addressed before non-RFC submission.
Warning: No release notes for new experimental APIs
Eight new experimental APIs are added (rte_fib_vrf_add,
rte_fib_vrf_delete, rte_fib_vrf_lookup_bulk, rte_fib_vrf_get_rib
plus the fib6 equivalents). These need entries in
doc/guides/rel_notes/.
Warning: No testpmd hooks for new APIs
Per DPDK policy, new APIs should have hooks in app/testpmd.
Patch 2/4 - fib: add VRF functional and unit tests
Warning: Resource leak in run_v4() -- conf.vrf_default_nh not freed
In app/test-fib/main.c run_v4(), conf.vrf_default_nh is allocated
via rte_malloc() but never freed on any path (success or failure).
Same issue in run_v6() in patch 4/4.
Patch 3/4 - fib6: add multi-VRF support
Error: Same signed-overflow AVX512 gather bug as patch 1/4
The trie_avx512.c VRF_SCALE_SMALL path has the identical issue:
_mm512_slli_epi32(vrf32, 24) produces a negative signed index
for vrf_id >= 128, causing the 32-bit gather to read from a
negative offset.
In trie.c get_vector_fn():
if (dp->num_vrfs >= 256) {
should be:
if (dp->num_vrfs >= 128) {
Warning: Potential 32-bit truncation in trie helper functions
build_common_root() computes idx_tbl as uint64_t but passes it
to get_tbl_val_by_idx() and get_tbl_p_by_idx(). If those helpers
take uint32_t index parameters (the original code used 32-bit
indices), the upper bits will be silently truncated for large VRF
counts. The helpers should be widened to accept uint64_t, or
confirm they already do.
In practice, large VRF counts (hundreds+) with IPv6 trie tbl24
would require terabytes of memory, so this is unlikely to
manifest, but it is a latent correctness issue.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/4] fib: add multi-VRF support
2026-03-23 15:48 ` Konstantin Ananyev
@ 2026-03-23 19:06 ` Medvedkin, Vladimir
2026-03-23 22:22 ` Konstantin Ananyev
0 siblings, 1 reply; 33+ messages in thread
From: Medvedkin, Vladimir @ 2026-03-23 19:06 UTC (permalink / raw)
To: Konstantin Ananyev, dev@dpdk.org
Cc: rjarry@redhat.com, nsaxena16@gmail.com, mb@smartsharesystems.com,
adwivedi@marvell.com, jerinjacobk@gmail.com, Maxime Leroy
Hi Konstantin,
On 3/23/2026 3:48 PM, Konstantin Ananyev wrote:
>
>> Add VRF (Virtual Routing and Forwarding) support to the IPv4
>> FIB library, allowing multiple independent routing tables
>> within a single FIB instance.
>>
>> Introduce max_vrfs and vrf_default_nh fields in rte_fib_conf
>> to configure the number of VRFs and per-VRF default nexthops.
> Thanks Vladimir, allowing multiple VRFs per same LPM table will
> definitely be a useful thing to have.
> Though, I have the same concern as Maxime:
> memory requirements are just overwhelming.
> Stupid q - why just not to store a pointer to a vector of next-hops
> within the table entry?
Do I understand correctly: a vector with max_number_of_vrfs entries,
using the vrf id to address a nexthop?
Yes, this may work.
But if we are going to do an extra memory access, I'd rather
maintain an internal hash table with 5-byte keys {24_bits_from_LPM,
16_bits_vrf_id} to retrieve a nexthop.
> And we can provide to the user with ability to specify custom
> alloc/free function for these vectors.
> That would help to avoid allocating huge chunks of memory at startup.
> I understand that it will be one extra memory dereference,
> but probably it will be not that critical in terms of performance .
> Again for bulk function we might be able to pipeline lookups and
> de-references and hide that extra load latency.
>
>> Add four new experimental APIs:
>> - rte_fib_vrf_add() and rte_fib_vrf_delete() to manage routes
>> per VRF
>> - rte_fib_vrf_lookup_bulk() for multi-VRF bulk lookups
>> - rte_fib_vrf_get_rib() to retrieve a per-VRF RIB handle
>>
>> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
>> ---
>> lib/fib/dir24_8.c | 241 ++++++++++++++++------
>> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
>> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
>> lib/fib/dir24_8_avx512.h | 80 +++++++-
>> lib/fib/rte_fib.c | 158 ++++++++++++---
>> lib/fib/rte_fib.h | 94 ++++++++-
>> 6 files changed, 988 insertions(+), 260 deletions(-)
>>
<snip>
--
Regards,
Vladimir
^ permalink raw reply [flat|nested] 33+ messages in thread
* RE: [RFC PATCH 1/4] fib: add multi-VRF support
2026-03-23 19:06 ` Medvedkin, Vladimir
@ 2026-03-23 22:22 ` Konstantin Ananyev
2026-03-25 14:09 ` Medvedkin, Vladimir
0 siblings, 1 reply; 33+ messages in thread
From: Konstantin Ananyev @ 2026-03-23 22:22 UTC (permalink / raw)
To: Medvedkin, Vladimir, dev@dpdk.org
Cc: rjarry@redhat.com, nsaxena16@gmail.com, mb@smartsharesystems.com,
adwivedi@marvell.com, jerinjacobk@gmail.com, Maxime Leroy
> >> Add VRF (Virtual Routing and Forwarding) support to the IPv4
> >> FIB library, allowing multiple independent routing tables
> >> within a single FIB instance.
> >>
> >> Introduce max_vrfs and vrf_default_nh fields in rte_fib_conf
> >> to configure the number of VRFs and per-VRF default nexthops.
> > Thanks Vladimir, allowing multiple VRFs per same LPM table will
> > definitely be a useful thing to have.
> > Though, I have the same concern as Maxime:
> > memory requirements are just overwhelming.
> > Stupid q - why just not to store a pointer to a vector of next-hops
> > within the table entry?
>
> Do I understand correctly: a vector with max_number_of_vrfs entries,
> using the vrf id to address a nexthop?
Yes.
> Yes, this may work.
> But if we are going to do an extra memory access, I'd rather
> maintain an internal hash table with 5-byte keys {24_bits_from_LPM,
> 16_bits_vrf_id} to retrieve a nexthop.
Hmm... and what to do with entries in tbl8, I mean what will be the key for them?
Or you don't plan to put entries from tbl8 to that hash table?
> > And we can provide to the user with ability to specify custom
> > alloc/free function for these vectors.
> > That would help to avoid allocating huge chunks of memory at startup.
> > I understand that it will be one extra memory dereference,
> > but probably it will be not that critical in terms of performance .
> > Again for bulk function we might be able to pipeline lookups and
> > de-references and hide that extra load latency.
> >
> >> Add four new experimental APIs:
> >> - rte_fib_vrf_add() and rte_fib_vrf_delete() to manage routes
> >> per VRF
> >> - rte_fib_vrf_lookup_bulk() for multi-VRF bulk lookups
> >> - rte_fib_vrf_get_rib() to retrieve a per-VRF RIB handle
> >>
> >> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> >> ---
> >> lib/fib/dir24_8.c | 241 ++++++++++++++++------
> >> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
> >> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
> >> lib/fib/dir24_8_avx512.h | 80 +++++++-
> >> lib/fib/rte_fib.c | 158 ++++++++++++---
> >> lib/fib/rte_fib.h | 94 ++++++++-
> >> 6 files changed, 988 insertions(+), 260 deletions(-)
> >>
> <snip>
>
> --
> Regards,
> Vladimir
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-23 18:42 ` Medvedkin, Vladimir
@ 2026-03-24 9:19 ` Maxime Leroy
2026-03-25 15:56 ` Medvedkin, Vladimir
0 siblings, 1 reply; 33+ messages in thread
From: Maxime Leroy @ 2026-03-24 9:19 UTC (permalink / raw)
To: Medvedkin, Vladimir; +Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk
Hi Vladimir,
On Mon, Mar 23, 2026 at 7:46 PM Medvedkin, Vladimir
<vladimir.medvedkin@intel.com> wrote:
>
>
> On 3/23/2026 2:53 PM, Maxime Leroy wrote:
> > On Mon, Mar 23, 2026 at 1:49 PM Medvedkin, Vladimir
> > <vladimir.medvedkin@intel.com> wrote:
> >> Hi Maxime,
> >>
> >> On 3/23/2026 11:27 AM, Maxime Leroy wrote:
> >>> Hi Vladimir,
> >>>
> >>>
> >>> On Sun, Mar 22, 2026 at 4:42 PM Vladimir Medvedkin
> >>> <vladimir.medvedkin@intel.com> wrote:
> >>>> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
> >>>> allowing a single FIB instance to host multiple isolated routing domains.
> >>>>
> >>>> Currently FIB instance represents one routing instance. For workloads that
> >>>> need multiple VRFs, the only option is to create multiple FIB objects. In a
> >>>> burst oriented datapath, packets in the same batch can belong to different VRFs, so
> >>>> the application either does per-packet lookup in different FIB instances or
> >>>> regroups packets by VRF before lookup. Both approaches are expensive.
> >>>>
> >>>> To remove that cost, this series keeps all VRFs inside one FIB instance and
> >>>> extends lookup input with per-packet VRF IDs.
> >>>>
> >>>> The design follows the existing fast-path structure for both families. IPv4 and
> >>>> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
> >>>> first-level table scales per configured VRF. This increases memory usage, but
> >>>> keeps performance and lookup complexity on par with non-VRF implementation.
> >>>>
> >>> Thanks for the RFC. Some thoughts below.
> >>>
> >>> Memory cost: the flat TBL24 replicates the entire table for every VRF
> >>> (num_vrfs * 2^24 * nh_size). With 256 VRFs and 8B nexthops that is
> >>> 32 GB for TBL24 alone. In grout we support up to 256 VRFs allocated
> >>> on demand -- this approach forces the full cost upfront even if most
> >>> VRFs are empty.
> >> Yes, increased memory consumption is the trade-off. We make this choice
> >> in DPDK quite often, such as pre-allocated mbufs, mempools and much
> >> other stuff allocated in advance to gain performance.
> >> For FIB, I chose to replicate TBL24 per VRF for this same reason.
> >>
> >> And, as Morten mentioned earlier, if memory is the priority, a table
> >> instance per VRF allocated on-demand is still supported.
> >>
> >> The high memory cost stems from TBL24's design: for IPv4, it was
> >> justified by the BGP filtering convention (no prefixes more specific
> >> than /24 in BGPv4 full view), ensuring most lookups hit with just one
> >> random memory access. For IPv6, we should likely switch to a 16-bit TRIE
> >> scheme on all layers. For IPv4, alternative algorithms with smaller
> >> footprints (like DXR or DIR16-8-8, as used in VPP) may be worth
> >> exploring if BGP full view is not required for those VRFs.
> >>
> >>> Per-packet VRF lookup: Rx bursts come from one port, thus one VRF.
> >>> Mixed-VRF bulk lookups do not occur in practice. The three AVX512
> >>> code paths add complexity for a scenario that does not exist, at
> >>> least for a classic router. Am I missing a use-case?
> >> That's not true, you're missing out on a lot of established core use
> >> cases that are at least 2 decades old:
> >>
> >> - VLAN subinterface abstraction. Each subinterface may belong to a
> >> separate VRF
> >>
> >> - MPLS VPN
> >>
> >> - Policy based routing
> >>
> > Fair point on VLAN subinterfaces and MPLS VPN. SRv6 L3VPN (End.DT4/
> > End.DT6) also fits that pattern after decap.
> >
> > I agree DPDK often pre-allocates for performance, but I wonder if the
> > flat TBL24 actually helps here. Each VRF's working set is spread
> > 128 MB apart in the flat table. Would regrouping packets by VRF and
> > doing one bulk lookup per VRF with separate contiguous TBL24s be
> > more cache-friendly than a single mixed-VRF gather? Do you have
> > benchmarks comparing the two approaches?
>
> It depends. Generally, if we assume that we are working with wide
> internet traffic, then even for a single VRF we will most likely miss
> the cache for TBL24; thus, regardless of the size of the tbl24, each
> memory access will be performed directly to DRAM.
If the lookup is DRAM-bound anyway, then the 10 cycles/addr cost
is dominated by memory latency, not CPU. The CPU cost of a bucket
sort on 32-64 packets is negligible next to a DRAM access (~80-100
ns per cache miss). That actually makes the case for regroup +
per-VRF lookup: the regrouping is pure CPU work hidden behind
memory stalls, and each per-VRF bulk lookup hits a contiguous
TBL24 instead of scattering across 128 MB-apart VRF regions.
> And if the addresses are localized (i.e. most traffic is internal), then
> having multiple TBL24s won't make the situation much worse.
>
With localized traffic, regrouping by VRF + per-VRF lookup on
contiguous TBL24s would benefit from cache locality, while the
flat multi-VRF table spreads hot entries 128 MB apart. The flat
approach may actually be worse in that scenario.
> I don't have any benchmarks for regrouping, however I have 2 things to
> consider:
>
> 1. lookup is relatively fast (for IPv4 it is about 10 cycles per
> address, and I don't really want to slow it down)
>
> 2. incoming addresses and their corresponding VRFs are not controlled by
> "us", so this is a random set. Regrouping effectively is sorting. I'm
> not really happy to have nlogn complexity on a fast path :)
Without benchmarks, we do not know whether the flat approach is
actually faster than regroup + per-VRF lookup.
>
> >
> > On the memory trade-off and VRF ID mapping: the API uses vrf_id as
> > a direct index (0 to max_vrfs-1). With 256 VRFs and 8B nexthops,
> > TBL24 alone costs 32 GB for IPv4 and 32 GB for IPv6 -- 64 GB total
> > at startup. In grout, VRF IDs are interface IDs that can be any
> > uint16_t, so we would also need to maintain a mapping between our
> > VRF IDs and FIB slot indices.
> Of course, this is an application responsibility. In the FIB, VRF IDs
> are in a contiguous range.
> > We would need to introduce a max_vrfs
> > limit, which forces a bad trade-off: either set it low (e.g. 16)
> > and limit deployments, or set it high (e.g. 256) and pay 64 GB at
> > startup even with a single VRF. With separate FIB instances per VRF,
> > we only allocate what we use.
> Yes, I understand this. In the end, if the user wants to use 256 VRFs,
> the memory footprint will be at least 64 GB anyway.
The difference is when the memory is committed. With separate FIB
instances per VRF, you allocate 128 MB only when a VRF is actually
created at runtime. With the flat multi-VRF approach, you pay
max_vrfs * 128 MB at startup, even if only one VRF is active.
On top of that, the API uses vrf_id as a direct index (0 to
max_vrfs-1). As Stephen noted, there are multiple ways to model
VRFs. Depending on the networking stack, VRFs are identified by
ifindex (Linux l3mdev), by name (Cisco, Juniper), or by some
other scheme. This means the application must maintain a mapping
between its own VRF representation and the FIB slot indices, and
choose max_vrfs upfront. What is the benefit of this flat
multi-VRF FIB if the application still needs to manage a
translation layer and pre-commit memory for VRFs that may never
exist?
> As a trade-off for a bad trade-off ;) I can suggest allocating it in
> chunks. Let's say you are starting with 16 VRFs, and during runtime, if
> the user wants to increase the number of VRFs above this limit, you can
> allocate another 16-VRF FIB. Then, of course, you need to split
> addresses into two bursts, one for each FIB handle.
But then we are back to regrouping packets -- just by chunk of
VRFs instead of by individual VRF. If we have to sort the burst
anyway, what does the flat multi-VRF table buy us?
> >
> >>> I am not too familiar with DPDK FIB internals, but would it be
> >>> possible to keep a separate TBL24 per VRF and only share the TBL8
> >>> pool?
> >> That is how it is implemented right now, with one note - TBL24s are
> >> pre-allocated.
> >>> Something like pre-allocating an array of max_vrfs TBL24
> >>> pointers, allocating each TBL24 on demand at VRF add time,
> >> And you are suggesting to allocate TBL24 on demand by adding an extra
> >> indirection layer. This will lead to lower performance, which I would
> >> like to avoid.
> >>> and
> >>> having them all point into a shared TBL8 pool. The TBL8 index in
> >>> TBL24 entries seems to already be global, so would that work without
> >>> encoding changes?
> >>>
> >>> Going further: could the same idea extend to IPv6? The dir24_8 and
> >>> trie seem to use the same TBL8 block format (256 entries, same
> >>> (nh << 1) | ext_bit encoding, same size). Would unifying the TBL8
> >>> allocator allow a single pool shared across IPv4, IPv6, and all
> >>> VRFs? That could be a bigger win for /32-heavy and /128-heavy tables
> >>> and maybe a good first step before multi-VRF.
> >> So, you are suggesting merging IPv4 and IPv6 into a single unified FIB?
> >> I'm not sure how this can be a bigger win, could you please elaborate
> >> more on this?
> > On the IPv4/IPv6 TBL8 pool: I was not suggesting merging FIBs, just
> > sharing the TBL8 block allocator between separate FIB instances.
> > This is possible since dir24_8 and trie use the same TBL8 block
> > format (256 entries, same encoding, same size).
> >
> > Would it be possible to pass a shared TBL8 pool at rte_fib_create()
> > time? Each FIB keeps its own TBL24 and RIB, but TBL8 is shared
> > across all FIBs and potentially across IPv4/IPv6. Users would no
> > longer have to guess num_tbl8 per FIB.
> Yes, this is possible. However, this will significantly complicate
> working with the library while solving a fairly small problem.
Your series already shares TBL8 across all VRFs within a single
FIB -- that part is useful, and it does not require the flat
multi-VRF TBL24.
In grout, routes arrive from FRR (BGP, OSPF, etc.) at runtime.
We cannot predict TBL8 usage per VRF in advance -- it depends on
prefix length distribution which varies per VRF and changes over
time. No production LPM (Linux kernel, JunOS, IOS) asks the
operator to size these structures per routing table upfront.
Today we do not even have TBL8 usage stats (Robin's series
addresses that), and there is no way to resize a FIB without
destroying and recreating it.
Could you share performance numbers comparing the flat multi-VRF
lookup against regroup + per-VRF lookup?
> >>> Regards,
> >>>
> >>> Maxime Leroy
> >>>
> >>>> Vladimir Medvedkin (4):
> >>>> fib: add multi-VRF support
> >>>> fib: add VRF functional and unit tests
> >>>> fib6: add multi-VRF support
> >>>> fib6: add VRF functional and unit tests
> >>>>
> >>>> app/test-fib/main.c | 257 ++++++++++++++++++++++--
> >>>> app/test/test_fib.c | 298 +++++++++++++++++++++++++++
> >>>> app/test/test_fib6.c | 319 ++++++++++++++++++++++++++++-
> >>>> lib/fib/dir24_8.c | 241 ++++++++++++++++------
> >>>> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
> >>>> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
> >>>> lib/fib/dir24_8_avx512.h | 80 +++++++-
> >>>> lib/fib/rte_fib.c | 158 ++++++++++++---
> >>>> lib/fib/rte_fib.h | 94 ++++++++-
> >>>> lib/fib/rte_fib6.c | 166 +++++++++++++---
> >>>> lib/fib/rte_fib6.h | 88 +++++++-
> >>>> lib/fib/trie.c | 158 +++++++++++----
> >>>> lib/fib/trie.h | 51 +++--
> >>>> lib/fib/trie_avx512.c | 225 +++++++++++++++++++--
> >>>> lib/fib/trie_avx512.h | 39 +++-
> >>>> 15 files changed, 2453 insertions(+), 396 deletions(-)
> >>>>
> >>>> --
> >>>> 2.43.0
> >>>>
> >> --
> >> Regards,
> >> Vladimir
> >>
> >
> --
> Regards,
> Vladimir
>
Regards,
Maxime
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/4] fib: add multi-VRF support
2026-03-23 22:22 ` Konstantin Ananyev
@ 2026-03-25 14:09 ` Medvedkin, Vladimir
2026-03-26 10:13 ` Konstantin Ananyev
0 siblings, 1 reply; 33+ messages in thread
From: Medvedkin, Vladimir @ 2026-03-25 14:09 UTC (permalink / raw)
To: Konstantin Ananyev, dev@dpdk.org
Cc: rjarry@redhat.com, nsaxena16@gmail.com, mb@smartsharesystems.com,
adwivedi@marvell.com, jerinjacobk@gmail.com, Maxime Leroy
On 3/23/2026 10:22 PM, Konstantin Ananyev wrote:
>>>> Add VRF (Virtual Routing and Forwarding) support to the IPv4
>>>> FIB library, allowing multiple independent routing tables
>>>> within a single FIB instance.
>>>>
>>>> Introduce max_vrfs and vrf_default_nh fields in rte_fib_conf
>>>> to configure the number of VRFs and per-VRF default nexthops.
>>> Thanks Vladimir, allowing multiple VRFs per same LPM table will
>>> definitely be a useful thing to have.
>>> Though, I have the same concern as Maxime:
>>> memory requirements are just overwhelming.
>>> Stupid q - why just not to store a pointer to a vector of next-hops
>>> within the table entry?
>> Do I understand correctly: a vector with max_number_of_vrfs entries,
>> using the vrf id to address a nexthop?
> Yes.
Here I can see 2 problems:
1. tbl entries must be the size of a pointer, so no way to use smaller sizes
2. those vectors will be sparsely populated and, depending on the
runtime configuration, may consume a lot of memory too (as Robin
mentioned they may have 1024 VRFs)
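For clarity, the vector-of-nexthops layout being discussed would look roughly like this. This is purely illustrative (struct and function names are mine, not from the patch), but it shows both objections: the table entry becomes pointer-sized, and every prefix pays for max_vrfs slots even when present in few VRFs.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative layout: every table entry stores a pointer to a
 * per-prefix vector of nexthops indexed by vrf id.
 * Problem 1: the entry is now pointer-sized (8 bytes) even where
 * 2-byte nexthops would suffice.
 * Problem 2: a prefix present in only a few VRFs still carries
 * max_vrfs slots. */
struct tbl_entry {
	uint64_t *nh_by_vrf; /* max_vrfs slots, mostly unused */
};

static uint64_t *
nh_vector_alloc(uint16_t max_vrfs, uint64_t default_nh)
{
	uint64_t *v = malloc((size_t)max_vrfs * sizeof(*v));

	if (v != NULL)
		for (uint16_t i = 0; i < max_vrfs; i++)
			v[i] = default_nh; /* all VRFs start at default */
	return v;
}
```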
>
>> Yes, this may work.
>> But, if we are going to do an extra memory access, I'd better to
>> maintain an internal hash table with 5 byte keys {24_bits_from_LPM,
>> 16_bits_vrf_id} to retrieve a nexthop.
> Hmm... and what to do with entries in tbl8, I mean what will be the key for them?
> Or you don't plan to put entries from tbl8 to that hash table?
The idea is to have a single LPM struct holding the union of all
prefixes existing in all VRFs. Each prefix in this LPM struct has its
own unique "nexthop", which is not the final next hop but intermediate
metadata identifying this unique prefix. A second search is then
performed in some exact-match database, like a hash table, with a key
containing this intermediate metadata + vrf_id. This approach is the
most memory friendly, since there is only one LPM data struct (which
scales well with the number of prefixes it holds) with intermediate
entries only 4 bytes long.
On the other hand it requires an extra search, so lookup will be slower.
Also, some current LPM optimizations, like tbl8 collapsing when all tbl8
entries hold the same value, will be gone.
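A minimal sketch of the second, exact-match stage of this scheme. All names, the toy hash function, the table sizing, and the linear-probing collision handling are assumptions for illustration only, not the proposed implementation; the first (LPM) stage is assumed to have already produced a unique prefix_id.

```c
#include <stdint.h>

#define EM_SLOTS 256 /* toy size, power of two */

/* One entry keyed by the 5-byte tuple {24-bit prefix id, 16-bit vrf}. */
struct em_entry {
	uint32_t prefix_id; /* intermediate metadata from the LPM stage */
	uint16_t vrf_id;
	uint8_t valid;
	uint64_t nh;        /* the final next hop */
};

static struct em_entry em_tbl[EM_SLOTS];

static uint32_t
em_hash(uint32_t prefix_id, uint16_t vrf_id)
{
	/* toy multiplicative hash over the {prefix_id, vrf_id} key */
	return (prefix_id * 2654435761u ^ vrf_id) & (EM_SLOTS - 1);
}

static int
em_add(uint32_t prefix_id, uint16_t vrf_id, uint64_t nh)
{
	uint32_t i = em_hash(prefix_id, vrf_id);

	/* linear probing until a free slot is found */
	for (uint32_t n = 0; n < EM_SLOTS; n++, i = (i + 1) & (EM_SLOTS - 1)) {
		if (!em_tbl[i].valid) {
			em_tbl[i] = (struct em_entry){prefix_id, vrf_id, 1, nh};
			return 0;
		}
	}
	return -1; /* table full */
}

static int
em_lookup(uint32_t prefix_id, uint16_t vrf_id, uint64_t *nh)
{
	uint32_t i = em_hash(prefix_id, vrf_id);

	for (uint32_t n = 0; n < EM_SLOTS; n++, i = (i + 1) & (EM_SLOTS - 1)) {
		if (em_tbl[i].valid && em_tbl[i].prefix_id == prefix_id &&
				em_tbl[i].vrf_id == vrf_id) {
			*nh = em_tbl[i].nh;
			return 0;
		}
		if (!em_tbl[i].valid)
			break; /* probe chain ended: no such entry */
	}
	return -1;
}
```

The same prefix_id resolves to different nexthops per VRF, which is the point of the scheme; the extra dependent load after the LPM stage is the cost discussed above.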
>
>>> And we can provide to the user with ability to specify custom
>>> alloc/free function for these vectors.
>>> That would help to avoid allocating huge chunks of memory at startup.
>>> I understand that it will be one extra memory dereference,
>>> but probably it will be not that critical in terms of performance .
>>> Again for bulk function we might be able to pipeline lookups and
>>> de-references and hide that extra load latency.
>>>
>>>> Add four new experimental APIs:
>>>> - rte_fib_vrf_add() and rte_fib_vrf_delete() to manage routes
>>>> per VRF
>>>> - rte_fib_vrf_lookup_bulk() for multi-VRF bulk lookups
>>>> - rte_fib_vrf_get_rib() to retrieve a per-VRF RIB handle
>>>>
>>>> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
>>>> ---
>>>> lib/fib/dir24_8.c | 241 ++++++++++++++++------
>>>> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
>>>> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
>>>> lib/fib/dir24_8_avx512.h | 80 +++++++-
>>>> lib/fib/rte_fib.c | 158 ++++++++++++---
>>>> lib/fib/rte_fib.h | 94 ++++++++-
>>>> 6 files changed, 988 insertions(+), 260 deletions(-)
>>>>
>> <snip>
>>
>> --
>> Regards,
>> Vladimir
>>
--
Regards,
Vladimir
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-24 9:19 ` Maxime Leroy
@ 2026-03-25 15:56 ` Medvedkin, Vladimir
2026-03-25 21:43 ` Maxime Leroy
0 siblings, 1 reply; 33+ messages in thread
From: Medvedkin, Vladimir @ 2026-03-25 15:56 UTC (permalink / raw)
To: Maxime Leroy; +Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk
On 3/24/2026 9:19 AM, Maxime Leroy wrote:
> Hi Vladimir,
>
> On Mon, Mar 23, 2026 at 7:46 PM Medvedkin, Vladimir
> <vladimir.medvedkin@intel.com> wrote:
>>
>> On 3/23/2026 2:53 PM, Maxime Leroy wrote:
>>> On Mon, Mar 23, 2026 at 1:49 PM Medvedkin, Vladimir
>>> <vladimir.medvedkin@intel.com> wrote:
>>>> Hi Maxime,
>>>>
>>>> On 3/23/2026 11:27 AM, Maxime Leroy wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>>
>>>>> On Sun, Mar 22, 2026 at 4:42 PM Vladimir Medvedkin
>>>>> <vladimir.medvedkin@intel.com> wrote:
>>>>>> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
>>>>>> allowing a single FIB instance to host multiple isolated routing domains.
>>>>>>
>>>>>> Currently FIB instance represents one routing instance. For workloads that
>>>>>> need multiple VRFs, the only option is to create multiple FIB objects. In a
>>>>>> burst oriented datapath, packets in the same batch can belong to different VRFs, so
>>>>>> the application either does per-packet lookup in different FIB instances or
>>>>>> regroups packets by VRF before lookup. Both approaches are expensive.
>>>>>>
>>>>>> To remove that cost, this series keeps all VRFs inside one FIB instance and
>>>>>> extends lookup input with per-packet VRF IDs.
>>>>>>
>>>>>> The design follows the existing fast-path structure for both families. IPv4 and
>>>>>> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
>>>>>> first-level table scales per configured VRF. This increases memory usage, but
>>>>>> keeps performance and lookup complexity on par with non-VRF implementation.
>>>>>>
>>>>> Thanks for the RFC. Some thoughts below.
>>>>>
>>>>> Memory cost: the flat TBL24 replicates the entire table for every VRF
>>>>> (num_vrfs * 2^24 * nh_size). With 256 VRFs and 8B nexthops that is
>>>>> 32 GB for TBL24 alone. In grout we support up to 256 VRFs allocated
>>>>> on demand -- this approach forces the full cost upfront even if most
>>>>> VRFs are empty.
>>>> Yes, increased memory consumption is the
>>>> trade-off. We make this choice in DPDK quite often, such as pre-allocated
>>>> mbufs, mempools and many other things allocated in advance to gain performance.
>>>> For FIB, I chose to replicate TBL24 per VRF for this same reason.
>>>>
>>>> And, as Morten mentioned earlier, if memory is the priority, a table
>>>> instance per VRF allocated on-demand is still supported.
>>>>
>>>> The high memory cost stems from TBL24's design: for IPv4, it was
>>>> justified by the BGP filtering convention (no prefixes more specific
>>>> than /24 in BGPv4 full view), ensuring most lookups hit with just one
>>>> random memory access. For IPv6, we should likely switch to a 16-bit TRIE
>>>> scheme on all layers. For IPv4, alternative algorithms with smaller
>>>> footprints (like DXR or DIR16-8-8, as used in VPP) may be worth
>>>> exploring if BGP full view is not required for those VRFs.
>>>>
>>>>> Per-packet VRF lookup: Rx bursts come from one port, thus one VRF.
>>>>> Mixed-VRF bulk lookups do not occur in practice. The three AVX512
>>>>> code paths add complexity for a scenario that does not exist, at
>>>>> least for a classic router. Am I missing a use-case?
>>>> That's not true, you're missing out on a lot of established core use
>>>> cases that are at least 2 decades old:
>>>>
>>>> - VLAN subinterface abstraction. Each subinterface may belong to a
>>>> separate VRF
>>>>
>>>> - MPLS VPN
>>>>
>>>> - Policy based routing
>>>>
>>> Fair point on VLAN subinterfaces and MPLS VPN. SRv6 L3VPN (End.DT4/
>>> End.DT6) also fits that pattern after decap.
>>>
>>> I agree DPDK often pre-allocates for performance, but I wonder if the
>>> flat TBL24 actually helps here. Each VRF's working set is spread
>>> 128 MB apart in the flat table. Would regrouping packets by VRF and
>>> doing one bulk lookup per VRF with separate contiguous TBL24s be
>>> more cache-friendly than a single mixed-VRF gather? Do you have
>>> benchmarks comparing the two approaches?
>> It depends. Generally, if we assume that we are working with wide
>> internet traffic, then even for a single VRF we most likely will miss
>> the cache for TBL24, thus, regardless of the size of the tbl24, each
>> memory access will be performed directly to DRAM.
> If the lookup is DRAM-bound anyway, then the 10 cycles/addr cost
> is dominated by memory latency, not CPU. The CPU cost of a bucket
> sort on 32-64 packets is negligible next to a DRAM access (~80-100
> ns per cache miss).
memory accesses are independent and executed in parallel in the CPU
pipeline
> That actually makes the case for regroup +
> per-VRF lookup: the regrouping is pure CPU work hidden behind
> memory stalls,
regrouping must be performed before the memory accesses, so it cannot be
amortized between the memory reads
> and each per-VRF bulk lookup hits a contiguous
> TBL24 instead of scattering across 128 MB-apart VRF regions.
Why is a contiguous 128 MB single-VRF TBL24 OK for you, but a bigger
contiguous multi-VRF TBL24 not OK in the context of lookup (here we are
talking about lookup, leaving aside the problem of memory consumption at
init)?
In both of these cases, memory access behaves the same way within a
single batch of packets during lookup: the first hit is likely a cache
miss regardless of whether we are dealing with one or more VRFs, and a
real dataplane app will not keep TBL24 resident in L3$ either way.
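For reference, the flat-table addressing being debated can be sketched as follows. vrf_tbl24_index() is a hypothetical helper, not from the patch, and the real dir24_8 entry encoding also carries valid/extension bits; the sketch only shows why per-VRF regions end up 128 MB apart.

```c
#include <stdint.h>

/* Hypothetical: each VRF owns a contiguous 2^24-entry plane inside
 * one flat TBL24 allocation. With 8-byte entries, consecutive VRF
 * planes are 2^24 * 8 = 128 MB apart, hence the numbers in this
 * thread. */
static inline uint64_t
vrf_tbl24_index(uint16_t vrf_id, uint32_t ip)
{
	/* top 24 bits of the address select the entry within the plane */
	return ((uint64_t)vrf_id << 24) | (ip >> 8);
}
```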
>
>> And if the addresses are localized (i.e. most traffic is internal), then
>> having multiple TBL24s won't make the situation much worse.
>>
> With localized traffic, regrouping by VRF + per-VRF lookup on
> contiguous TBL24s would benefit from cache locality,
Why so? There will be no difference within a single batch of reasonable
size (for example 64), because within the lookup session, with or
without regrouping, temporal cache locality will be the same.
Let's look at it from a different angle: is it worth regrouping IP
addresses by /8 (i.e. 8 MSBs) with the current implementation of a
single-VRF FIB?
> while the
> flat multi-VRF table spreads hot entries 128 MB apart. The flat
> approach may actually be worse in that scenario
>
>> I don't have any benchmarks for regrouping, however I have 2 things to
>> consider:
>>
>> 1. lookup is relatively fast (for IPv4 it is about 10 cycles per
>> address, and I don't really want to slow it down)
>>
>> 2. incoming addresses and their corresponding VRFs are not controlled by
>> "us", so this is a random set. Regrouping effectively is sorting. I'm
>> not really happy to have nlogn complexity on a fast path :)
> Without benchmarks, we do not know whether the flat approach is
> actually faster than regroup + per-VRF lookup.
Feel free to share benchmark results. The only thing you need to add is
the packet regrouping logic, and then use separate single-VRF FIB
instances.
>>> On the memory trade-off and VRF ID mapping: the API uses vrf_id as
>>> a direct index (0 to max_vrfs-1). With 256 VRFs and 8B nexthops,
>>> TBL24 alone costs 32 GB for IPv4 and 32 GB for IPv6 -- 64 GB total
>>> at startup. In grout, VRF IDs are interface IDs that can be any
>>> uint16_t, so we would also need to maintain a mapping between our
>>> VRF IDs and FIB slot indices.
>> of course, this is an application responsibility. In FIB, VRF IDs are
>> in a contiguous range.
>>> We would need to introduce a max_vrfs
>>> limit, which forces a bad trade-off: either set it low (e.g. 16)
>>> and limit deployments, or set it high (e.g. 256) and pay 64 GB at
>>> startup even with a single VRF. With separate FIB instances per VRF,
>>> we only allocate what we use.
>> Yes, I understand this. In the end, if the user wants to use 256 VRFs,
>> the memory footprint will be at least 64 GB anyway.
> The difference is when the memory is committed.
Yes, that's the only difference. It all comes down to the static vs
dynamic memory allocation problem, and each approach is good for a
specific task. For the task of creating a new VRF, what is preferable -
failing at init or at runtime?
> With separate FIB
> instances per VRF, you allocate 128 MB only when a VRF is actually
> created at runtime. With the flat multi-VRF approach, you pay
> max_vrfs * 128 MB at startup, even if only one VRF is active.
>
> On top of that, the API uses vrf_id as a direct index (0 to
> max_vrfs-1). As Stephen noted, there are multiple ways to model
> VRFs. Depending on the networking stack, VRFs are identified by
> ifindex (Linux l3mdev), by name (Cisco, Juniper), or by some
> other scheme. This means the application must maintain a mapping
> between its own VRF representation and the FIB slot indices, and
> choose max_vrfs upfront. What is the benefit of this flat
> multi-VRF FIB if the application still needs to manage a
> translation layer and pre-commit memory for VRFs that may never
> exist?
This is the control plane task.
>
>> As a trade-off for a bad trade-off ;) I can suggest to allocate it in
>> chunks. Let's say you are starting with 16 VRFs, and during runtime, if
>> the user wants to increase the number of VRFs above this limit, you can
>> allocate another 16xVRF FIB. Then, of course, you need to split
>> addresses into 2 bursts each for each FIB handle.
> But then we are back to regrouping packets -- just by chunk of
> VRFs instead of by individual VRF. If we have to sort the burst
> anyway, what does the flat multi-VRF table buy us?
>
>>>>> I am not too familiar with DPDK FIB internals, but would it be
>>>>> possible to keep a separate TBL24 per VRF and only share the TBL8
>>>>> pool?
>>>> that is how it is implemented right now, with one note - the TBL24
>>>> tables are pre-allocated.
>>>>> Something like pre-allocating an array of max_vrfs TBL24
>>>>> pointers, allocating each TBL24 on demand at VRF add time,
>>>> and you are suggesting to allocate TBL24 on demand by adding an extra
>>>> indirection layer. This will lead to lower performance, which I would
>>>> like to avoid.
>>>>> and
>>>>> having them all point into a shared TBL8 pool. The TBL8 index in
>>>>> TBL24 entries seems to already be global, so would that work without
>>>>> encoding changes?
>>>>>
>>>>> Going further: could the same idea extend to IPv6? The dir24_8 and
>>>>> trie seem to use the same TBL8 block format (256 entries, same
>>>>> (nh << 1) | ext_bit encoding, same size). Would unifying the TBL8
>>>>> allocator allow a single pool shared across IPv4, IPv6, and all
>>>>> VRFs? That could be a bigger win for /32-heavy and /128-heavy tables
>>>>> and maybe a good first step before multi-VRF.
>>>> So, you are suggesting merging IPv4 and IPv6 into a single unified FIB?
>>>> I'm not sure how this can be a bigger win, could you please elaborate
>>>> more on this?
>>> On the IPv4/IPv6 TBL8 pool: I was not suggesting merging FIBs, just
>>> sharing the TBL8 block allocator between separate FIB instances.
>>> This is possible since dir24_8 and trie use the same TBL8 block
>>> format (256 entries, same encoding, same size).
>>>
>>> Would it be possible to pass a shared TBL8 pool at rte_fib_create()
>>> time? Each FIB keeps its own TBL24 and RIB, but TBL8 is shared
>>> across all FIBs and potentially across IPv4/IPv6. Users would no
>>> longer have to guess num_tbl8 per FIB.
>> Yes, this is possible. However, this will significantly complicate the
>> work with the library, solving a not so big problem.
> Your series already shares TBL8 across all VRFs within a single
> FIB -- that part is useful, and it does not require the flat
> multi-VRF TBL24.
>
> In grout, routes arrive from FRR (BGP, OSPF, etc.) at runtime.
> We cannot predict TBL8 usage per VRF in advance
And you don't need it (knowing per-VRF consumption) now. If I understood
your request here properly, you want to share TBL8 between the IPv4 and
IPv6 FIBs? I don't think this is a good idea at all. At the very least,
keeping them split means that if one AF consumes all TBL8 (because of an
attack or a bogus control plane), the other AF remains intact.
> -- it depends on
> prefix length distribution which varies per VRF and changes over
> time. No production LPM (Linux kernel, JunOS, IOS) asks the
> operator to size these structures per routing table upfront.
- they are using different LPM algorithms
- when you use those facilities, their developers have properly tuned
them. FIB is a low-level library; it cannot be used without any
knowledge, and it will not solve all problems with a single red button
labelled "make it work, don't make any bugs"
P.S. how do you know how JunOS/IOS implement their LPMs? ;)
> Today we do not even have TBL8 usage stats (Robin's series
> addresses that)
I will try to find time to review this patch in the near future.
> , and there is no way to resize a FIB without
> destroying and recreating it.
>
> Could you share performance numbers comparing the flat multi-VRF
> lookup against regroup + per-VRF lookup?
>
>>>>> Regards,
>>>>>
>>>>> Maxime Leroy
>>>>>
>>>>>> Vladimir Medvedkin (4):
>>>>>> fib: add multi-VRF support
>>>>>> fib: add VRF functional and unit tests
>>>>>> fib6: add multi-VRF support
>>>>>> fib6: add VRF functional and unit tests
>>>>>>
>>>>>> app/test-fib/main.c | 257 ++++++++++++++++++++++--
>>>>>> app/test/test_fib.c | 298 +++++++++++++++++++++++++++
>>>>>> app/test/test_fib6.c | 319 ++++++++++++++++++++++++++++-
>>>>>> lib/fib/dir24_8.c | 241 ++++++++++++++++------
>>>>>> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
>>>>>> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
>>>>>> lib/fib/dir24_8_avx512.h | 80 +++++++-
>>>>>> lib/fib/rte_fib.c | 158 ++++++++++++---
>>>>>> lib/fib/rte_fib.h | 94 ++++++++-
>>>>>> lib/fib/rte_fib6.c | 166 +++++++++++++---
>>>>>> lib/fib/rte_fib6.h | 88 +++++++-
>>>>>> lib/fib/trie.c | 158 +++++++++++----
>>>>>> lib/fib/trie.h | 51 +++--
>>>>>> lib/fib/trie_avx512.c | 225 +++++++++++++++++++--
>>>>>> lib/fib/trie_avx512.h | 39 +++-
>>>>>> 15 files changed, 2453 insertions(+), 396 deletions(-)
>>>>>>
>>>>>> --
>>>>>> 2.43.0
>>>>>>
>>>> --
>>>> Regards,
>>>> Vladimir
>>>>
>> --
>> Regards,
>> Vladimir
>>
> Regards,
>
> Maxime
--
Regards,
Vladimir
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-25 15:56 ` Medvedkin, Vladimir
@ 2026-03-25 21:43 ` Maxime Leroy
2026-03-27 18:27 ` Medvedkin, Vladimir
0 siblings, 1 reply; 33+ messages in thread
From: Maxime Leroy @ 2026-03-25 21:43 UTC (permalink / raw)
To: Medvedkin, Vladimir; +Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk
Hi Vladimir,
On Wed, Mar 25, 2026 at 4:56 PM Medvedkin, Vladimir
<vladimir.medvedkin@intel.com> wrote:
>
>
> On 3/24/2026 9:19 AM, Maxime Leroy wrote:
> > Hi Vladimir,
> >
> > On Mon, Mar 23, 2026 at 7:46 PM Medvedkin, Vladimir
> > <vladimir.medvedkin@intel.com> wrote:
> >>
> >> On 3/23/2026 2:53 PM, Maxime Leroy wrote:
> >>> On Mon, Mar 23, 2026 at 1:49 PM Medvedkin, Vladimir
> >>> <vladimir.medvedkin@intel.com> wrote:
> >>>> Hi Maxime,
> >>>>
> >>>> On 3/23/2026 11:27 AM, Maxime Leroy wrote:
> >>>>> Hi Vladimir,
> >>>>>
> >>>>>
> >>>>> On Sun, Mar 22, 2026 at 4:42 PM Vladimir Medvedkin
> >>>>> <vladimir.medvedkin@intel.com> wrote:
> >>>>>> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
> >>>>>> allowing a single FIB instance to host multiple isolated routing domains.
> >>>>>>
> >>>>>> Currently FIB instance represents one routing instance. For workloads that
> >>>>>> need multiple VRFs, the only option is to create multiple FIB objects. In a
> >>>>>> burst oriented datapath, packets in the same batch can belong to different VRFs, so
> >>>>>> the application either does per-packet lookup in different FIB instances or
> >>>>>> regroups packets by VRF before lookup. Both approaches are expensive.
> >>>>>>
> >>>>>> To remove that cost, this series keeps all VRFs inside one FIB instance and
> >>>>>> extends lookup input with per-packet VRF IDs.
> >>>>>>
> >>>>>> The design follows the existing fast-path structure for both families. IPv4 and
> >>>>>> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
> >>>>>> first-level table scales per configured VRF. This increases memory usage, but
> >>>>>> keeps performance and lookup complexity on par with non-VRF implementation.
> >>>>>>
> >>>>> Thanks for the RFC. Some thoughts below.
> >>>>>
> >>>>> Memory cost: the flat TBL24 replicates the entire table for every VRF
> >>>>> (num_vrfs * 2^24 * nh_size). With 256 VRFs and 8B nexthops that is
> >>>>> 32 GB for TBL24 alone. In grout we support up to 256 VRFs allocated
> >>>>> on demand -- this approach forces the full cost upfront even if most
> >>>>> VRFs are empty.
> >>>> Yes, increased memory consumption is the
> >>>> trade-off. We make this choice in DPDK quite often, such as pre-allocated
> >>>> mbufs, mempools and many other things allocated in advance to gain performance.
> >>>> For FIB, I chose to replicate TBL24 per VRF for this same reason.
> >>>>
> >>>> And, as Morten mentioned earlier, if memory is the priority, a table
> >>>> instance per VRF allocated on-demand is still supported.
> >>>>
> >>>> The high memory cost stems from TBL24's design: for IPv4, it was
> >>>> justified by the BGP filtering convention (no prefixes more specific
> >>>> than /24 in BGPv4 full view), ensuring most lookups hit with just one
> >>>> random memory access. For IPv6, we should likely switch to a 16-bit TRIE
> >>>> scheme on all layers. For IPv4, alternative algorithms with smaller
> >>>> footprints (like DXR or DIR16-8-8, as used in VPP) may be worth
> >>>> exploring if BGP full view is not required for those VRFs.
> >>>>
> >>>>> Per-packet VRF lookup: Rx bursts come from one port, thus one VRF.
> >>>>> Mixed-VRF bulk lookups do not occur in practice. The three AVX512
> >>>>> code paths add complexity for a scenario that does not exist, at
> >>>>> least for a classic router. Am I missing a use-case?
> >>>> That's not true, you're missing out on a lot of established core use
> >>>> cases that are at least 2 decades old:
> >>>>
> >>>> - VLAN subinterface abstraction. Each subinterface may belong to a
> >>>> separate VRF
> >>>>
> >>>> - MPLS VPN
> >>>>
> >>>> - Policy based routing
> >>>>
> >>> Fair point on VLAN subinterfaces and MPLS VPN. SRv6 L3VPN (End.DT4/
> >>> End.DT6) also fits that pattern after decap.
> >>>
> >>> I agree DPDK often pre-allocates for performance, but I wonder if the
> >>> flat TBL24 actually helps here. Each VRF's working set is spread
> >>> 128 MB apart in the flat table. Would regrouping packets by VRF and
> >>> doing one bulk lookup per VRF with separate contiguous TBL24s be
> >>> more cache-friendly than a single mixed-VRF gather? Do you have
> >>> benchmarks comparing the two approaches?
> >> It depends. Generally, if we assume that we are working with wide
> >> internet traffic, then even for a single VRF we most likely will miss
> >> the cache for TBL24, thus, regardless of the size of the tbl24, each
> >> memory access will be performed directly to DRAM.
> > If the lookup is DRAM-bound anyway, then the 10 cycles/addr cost
> > is dominated by memory latency, not CPU. The CPU cost of a bucket
> > sort on 32-64 packets is negligible next to a DRAM access (~80-100
> > ns per cache miss).
> memory accesses are independent and executed in parallel in the CPU
> pipeline
> > That actually makes the case for regroup +
> > per-VRF lookup: the regrouping is pure CPU work hidden behind
> > memory stalls,
> regrouping must be performed before the memory accesses, so it cannot be
> amortized between the memory reads
With internet traffic, TBL24 lookups quickly become limited by
cache misses, not CPU cycles. Even if some bursts hit the same
routes and benefit from cache locality, the CPU has a limited
number of outstanding misses (load buffer entries, MSHRs) --
out-of-order execution helps, but it is not magic.
The whole point of vector/graph processing (VPP, DPDK graph, etc.)
is to amortize that memory latency: prefetch for packet N+1 while
processing packet N. This works because all packets in a batch
hit the same data structure in a tight loop.
With separate per-VRF TBL24s, a bucket sort by VRF -- a few
dozen cycles, all in L1 -- gives you clean batches where
prefetching works as designed. This is exactly what graph nodes
already do: classify, then process per-class in a tight loop.
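A sketch of that classify step: a two-pass counting sort of a burst by VRF id, so each group can then be fed to its own single-VRF FIB with one bulk lookup. Names and the MAX_VRFS bound are illustrative, not grout code.

```c
#include <stdint.h>

#define MAX_VRFS 256

/* Two-pass counting sort of a burst by VRF id. After the call,
 * out_ips[grp_start[v] .. grp_start[v + 1] - 1] holds the addresses
 * belonging to VRF v, ready for one bulk lookup per VRF.
 * grp_start must have MAX_VRFS + 1 entries. */
static void
regroup_by_vrf(const uint32_t *ips, const uint16_t *vrfs, int n,
		uint32_t *out_ips, int *grp_start)
{
	int cnt[MAX_VRFS] = {0};
	int pos[MAX_VRFS];
	int v, i, acc = 0;

	for (i = 0; i < n; i++)          /* pass 1: count per VRF */
		cnt[vrfs[i]]++;
	for (v = 0; v < MAX_VRFS; v++) { /* prefix sums -> group offsets */
		grp_start[v] = acc;
		pos[v] = acc;
		acc += cnt[v];
	}
	grp_start[MAX_VRFS] = acc;
	for (i = 0; i < n; i++)          /* pass 2: scatter into groups */
		out_ips[pos[vrfs[i]]++] = ips[i];
}
```

Both passes touch only small L1-resident arrays, which is the basis of the "a few dozen cycles" claim above; whether that beats the flat gather remains a question for benchmarks.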
> > and each per-VRF bulk lookup hits a contiguous
> > TBL24 instead of scattering across 128 MB-apart VRF regions.
>
> Why is a contiguous 128 MB single-VRF TBL24 OK for you, but a bigger
> contiguous multi-VRF TBL24 not OK in the context of lookup (here we are
> talking about lookup, leaving aside the problem of memory consumption at
> init)?
The performance difference may be small, but the flat approach
is not faster either -- while costing 64 GB upfront.
> In both of these cases, memory access behaves the same way within a
> single batch of packets during lookup: the first hit is likely a cache
> miss regardless of whether we are dealing with one or more VRFs, and a
> real dataplane app will not keep TBL24 resident in L3$ either way.
>
> >
> >> And if the addresses are localized (i.e. most traffic is internal), then
> >> having multiple TBL24s won't make the situation much worse.
> >>
> > With localized traffic, regrouping by VRF + per-VRF lookup on
> > contiguous TBL24s would benefit from cache locality,
>
> Why so? There will be no difference within a single batch of reasonable
> size (for example 64), because within the lookup session, with or
> without regrouping, temporal cache locality will be the same.
>
> Let's look at it from a different angle: is it worth regrouping IP
> addresses by /8 (i.e. 8 MSBs) with the current implementation of a
> single-VRF FIB?
>
> > while the
> > flat multi-VRF table spreads hot entries 128 MB apart. The flat
> > approach may actually be worse in that scenario
> >
> >> I don't have any benchmarks for regrouping, however I have 2 things to
> >> consider:
> >>
> >> 1. lookup is relatively fast (for IPv4 it is about 10 cycles per
> >> address, and I don't really want to slow it down)
> >>
> >> 2. incoming addresses and their corresponding VRFs are not controlled by
> >> "us", so this is a random set. Regrouping effectively is sorting. I'm
> >> not really happy to have nlogn complexity on a fast path :)
> > Without benchmarks, we do not know whether the flat approach is
> > actually faster than regroup + per-VRF lookup.
> Feel free to share benchmark results. The only thing you need to add is
> the packet regrouping logic, and then use separate single-VRF FIB
> instances.
Your series introduces a new API that optimizes multi-VRF lookup.
The performance numbers should come with the proposal.
> >>> On the memory trade-off and VRF ID mapping: the API uses vrf_id as
> >>> a direct index (0 to max_vrfs-1). With 256 VRFs and 8B nexthops,
> >>> TBL24 alone costs 32 GB for IPv4 and 32 GB for IPv6 -- 64 GB total
> >>> at startup. In grout, VRF IDs are interface IDs that can be any
> >>> uint16_t, so we would also need to maintain a mapping between our
> >>> VRF IDs and FIB slot indices.
> >> of course, this is an application responsibility. In FIB, VRF IDs are
> >> in a contiguous range.
> >>> We would need to introduce a max_vrfs
> >>> limit, which forces a bad trade-off: either set it low (e.g. 16)
> >>> and limit deployments, or set it high (e.g. 256) and pay 64 GB at
> >>> startup even with a single VRF. With separate FIB instances per VRF,
> >>> we only allocate what we use.
> >> Yes, I understand this. In the end, if the user wants to use 256 VRFs,
> >> the memory footprint will be at least 64 GB anyway.
> > The difference is when the memory is committed.
> Yes, that's the only difference. It all comes down to the static vs
> dynamic memory allocation problem, and each approach is good for a
> specific task. For the task of creating a new VRF, what is preferable -
> failing at init or at runtime?
The main problem is that your series imposes contiguous VRF IDs
(0 to max_vrfs-1). How a VRF is represented is a network stack
design decision -- in Linux it is an ifindex, in Cisco a name,
in grout an interface ID. Any application using this API needs
a mapping layer on top.
In grout, everything is allocated dynamically: mempools, FIBs,
conntrack tables. Pre-allocating everything at init forces
hardcoded arbitrary limits and prevents memory reuse between
subsystems -- memory reserved for FIB TBL24 cannot be used for
conntrack when the VRF has no routes, and vice versa. We prefer
to allocate resources only when needed. It is simpler for users
and more efficient for memory.
> > With separate FIB
> > instances per VRF, you allocate 128 MB only when a VRF is actually
> > created at runtime. With the flat multi-VRF approach, you pay
> > max_vrfs * 128 MB at startup, even if only one VRF is active.
> >
> > On top of that, the API uses vrf_id as a direct index (0 to
> > max_vrfs-1). As Stephen noted, there are multiple ways to model
> > VRFs. Depending on the networking stack, VRFs are identified by
> > ifindex (Linux l3mdev), by name (Cisco, Juniper), or by some
> > other scheme. This means the application must maintain a mapping
> > between its own VRF representation and the FIB slot indices, and
> > choose max_vrfs upfront. What is the benefit of this flat
> > multi-VRF FIB if the application still needs to manage a
> > translation layer and pre-commit memory for VRFs that may never
> > exist?
> This is the control plane task.
> >
> >> As a trade-off for a bad trade-off ;) I can suggest to allocate it in
> >> chunks. Let's say you are starting with 16 VRFs, and during runtime, if
> >> the user wants to increase the number of VRFs above this limit, you can
> >> allocate another 16xVRF FIB. Then, of course, you need to split
> >> addresses into 2 bursts each for each FIB handle.
> > But then we are back to regrouping packets -- just by chunk of
> > VRFs instead of by individual VRF. If we have to sort the burst
> > anyway, what does the flat multi-VRF table buy us?
> >
> >>>>> I am not too familiar with DPDK FIB internals, but would it be
> >>>>> possible to keep a separate TBL24 per VRF and only share the TBL8
> >>>>> pool?
> >>>> that is how it is implemented right now, with one note - the TBL24
> >>>> tables are pre-allocated.
> >>>>> Something like pre-allocating an array of max_vrfs TBL24
> >>>>> pointers, allocating each TBL24 on demand at VRF add time,
> >>>> and you are suggesting to allocate TBL24 on demand by adding an extra
> >>>> indirection layer. This will lead to lower performance, which I would
> >>>> like to avoid.
> >>>>> and
> >>>>> having them all point into a shared TBL8 pool. The TBL8 index in
> >>>>> TBL24 entries seems to already be global, so would that work without
> >>>>> encoding changes?
> >>>>>
> >>>>> Going further: could the same idea extend to IPv6? The dir24_8 and
> >>>>> trie seem to use the same TBL8 block format (256 entries, same
> >>>>> (nh << 1) | ext_bit encoding, same size). Would unifying the TBL8
> >>>>> allocator allow a single pool shared across IPv4, IPv6, and all
> >>>>> VRFs? That could be a bigger win for /32-heavy and /128-heavy tables
> >>>>> and maybe a good first step before multi-VRF.
> >>>> So, you are suggesting merging IPv4 and IPv6 into a single unified FIB?
> >>>> I'm not sure how this can be a bigger win, could you please elaborate
> >>>> more on this?
> >>> On the IPv4/IPv6 TBL8 pool: I was not suggesting merging FIBs, just
> >>> sharing the TBL8 block allocator between separate FIB instances.
> >>> This is possible since dir24_8 and trie use the same TBL8 block
> >>> format (256 entries, same encoding, same size).
> >>>
> >>> Would it be possible to pass a shared TBL8 pool at rte_fib_create()
> >>> time? Each FIB keeps its own TBL24 and RIB, but TBL8 is shared
> >>> across all FIBs and potentially across IPv4/IPv6. Users would no
> >>> longer have to guess num_tbl8 per FIB.
> >> Yes, this is possible. However, this will significantly complicate
> >> working with the library while solving a not-so-big problem.
> > Your series already shares TBL8 across all VRFs within a single
> > FIB -- that part is useful, and it does not require the flat
> > multi-VRF TBL24.
> >
> > In grout, routes arrive from FRR (BGP, OSPF, etc.) at runtime.
> > We cannot predict TBL8 usage per VRF in advance
> and you don't need to know per-VRF consumption now. If I understood
> your request here properly, do you want to share TBL8 between the IPv4
> and IPv6 FIBs? I don't think this is a good idea at all, if only
> because splitting them means that if one AF consumes all TBL8 (because
> of an attack or bogus CP), the other AF remains intact.
If TBL8 isolation per AF is meant as a protection against route
floods, then the same argument applies between VRFs: your series
shares TBL8 across all VRFs within a single FIB, so a bogus
control plane in one VRF exhausts TBL8 for all other VRFs.
But more fundamentally, this is not how route flood protection
works. It is handled in the control plane: the routing daemon
limits the number of prefixes accepted per BGP session
(max-prefix) and selects which routes are installed via prefix
filters -- before those routes ever reach the forwarding table.
The Linux kernel is a good reference here. IPv6 used to enforce
a max_size limit on FIB + cache entries (net.ipv6.route.max_size,
defaulting to 4096). It caused real production issues and was
removed in kernel 6.3. IPv4 never had a FIB route limit. There
is no per-VRF route limit either. The kernel relies entirely on
the control plane for route flood protection.
> > -- it depends on
> > prefix length distribution which varies per VRF and changes over
> > time. No production LPM (Linux kernel, JunOS, IOS) asks the
> > operator to size these structures per routing table upfront.
>
> - they are using different LPM algorithms
> - when you use those facilities, their developers have properly tuned
> them. FIB is a low-level library; it cannot be used without any
> knowledge, and it will not solve all problems with a single red button
> "make it work, don't make any bugs"
> P.S. how do you know how JunOS /IOS implements their LPMs? ;)
>
I do not need to know their LPM implementation -- I only need
to know how they are configured. No production router requires
the operator to size internal LPM structures.
We can impose a maximum number of IPv4/IPv6 routes on the user
-- even though the kernel does not need this either. But TBL8
is a different problem: the application cannot predict TBL8
consumption because it depends on prefix length distribution,
which varies per VRF and changes over time with dynamic routing.
Today there is no API to query TBL8 usage, and no API to resize
a FIB without destroying it.
This is exactly why a shared TBL8 pool across VRFs is useful:
VRFs with few long prefixes naturally leave room for VRFs that
need more. This is the valuable part of your series. But it
does not require a flat multi-VRF TBL24 -- separate per-VRF
TBL24s sharing a common TBL8 pool would give the same benefit
without the 64 GB upfront cost.
> > Today we do not even have TBL8 usage stats (Robin's series
> > addresses that)
> I will try to find time to review this patch in the near future
Thanks, Robin's TBL8 stats series would help users understand
their TBL8 consumption -- a more practical improvement for
current users.
Regards,
Maxime
^ permalink raw reply [flat|nested] 33+ messages in thread
* RE: [RFC PATCH 1/4] fib: add multi-VRF support
2026-03-25 14:09 ` Medvedkin, Vladimir
@ 2026-03-26 10:13 ` Konstantin Ananyev
2026-03-27 18:32 ` Medvedkin, Vladimir
0 siblings, 1 reply; 33+ messages in thread
From: Konstantin Ananyev @ 2026-03-26 10:13 UTC (permalink / raw)
To: Medvedkin, Vladimir, dev@dpdk.org
Cc: rjarry@redhat.com, nsaxena16@gmail.com, mb@smartsharesystems.com,
adwivedi@marvell.com, jerinjacobk@gmail.com, Maxime Leroy
> >>>> Add VRF (Virtual Routing and Forwarding) support to the IPv4
> >>>> FIB library, allowing multiple independent routing tables
> >>>> within a single FIB instance.
> >>>>
> >>>> Introduce max_vrfs and vrf_default_nh fields in rte_fib_conf
> >>>> to configure the number of VRFs and per-VRF default nexthops.
> >>> Thanks Vladimir, allowing multiple VRFs per same LPM table will
> >>> definitely be a useful thing to have.
> >>> Though, I have the same concern as Maxime:
> >>> memory requirements are just overwhelming.
> >>> Stupid q - why just not to store a pointer to a vector of next-hops
> >>> within the table entry?
> >> Do I understand correctly: a vector with max_number_of_vrfs entries,
> >> using the VRF ID to address a nexthop?
> > Yes.
>
> Here I can see 2 problems:
>
> 1. tbl entries must be the size of a pointer, so no way to use smaller sizes
Yes, but as we are talking about storing nexthops for multiple VRFs anyway,
I don't think it is a big deal.
> 2. those vectors will be sparsely populated and, depending on the
> runtime configuration, may consume a lot of memory too (as Robin
> mentioned they may have 1024 VRFs)
Yes, each VRF vector can become really sparse, and we waste a lot of memory.
If that's an issue, we can probably think about something smarter
than a simple flat array indexed by vrf-id: something like a 2-level B-tree or so.
The main positives that I see in that approach:
- low extra overhead at lookup - one/two extra pointer dereferences.
- it allows the CP to allocate/free space for each such vector separately,
so we don't need to pre-allocate memory for the max possible entries at startup.
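To illustrate in rough C what such an entry layout would look like (toy types and names, not the actual dir24_8 structures; `MAX_VRFS` and the flat per-prefix array are the simple variant, before any B-tree refinement):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_VRFS   16
#define INVALID_NH UINT32_MAX

/* per-prefix vector of next hops, indexed by VRF ID (sparsely populated) */
struct vrf_nh_vec {
	uint32_t nh[MAX_VRFS];
};

/* problem 1 from above: the tbl24 entry is now pointer-sized */
struct tbl24_entry {
	struct vrf_nh_vec *vec;	/* NULL: no route for this /24 in any VRF */
};

/* the "low extra overhead" point: one extra dereference per lookup */
static uint32_t
lookup_nh(const struct tbl24_entry *tbl24, uint32_t ip, uint16_t vrf_id)
{
	const struct vrf_nh_vec *v = tbl24[ip >> 8].vec;

	return (v == NULL) ? INVALID_NH : v->nh[vrf_id];
}
```

A 2-level B-tree in place of the flat `nh[MAX_VRFS]` array would swap the direct index for one or two more dereferences, trading lookup cost for density.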
> >
> >> Yes, this may work.
> >> But, if we are going to do an extra memory access, I'd rather
> >> maintain an internal hash table with 5-byte keys {24_bits_from_LPM,
> >> 16_bits_vrf_id} to retrieve a nexthop.
> > Hmm... and what to do with entries in tbl8, I mean what will be the key for
> > them?
> > Or you don't plan to put entries from tbl8 to that hash table?
>
> The idea is to have a single LPM struct with a joined superset of all
> prefixes existing in all VRFs. Each prefix in this LPM struct has its
> own unique "nexthop", which is not the final next hop, but
> intermediate metadata defining this unique prefix. Then, a second
> search is performed with a key containing this intermediate metadata +
> vrf_id in some exact-match database like a hash table. This approach is
> the most memory friendly, since there is only one LPM data struct (which
> scales well with the number of prefixes it has) with intermediate entries
> only 4b long.
> On the other hand, it requires an extra search, so lookup will be slower.
> Also, some current LPM optimizations, like tbl8 collapsing when all tbl8
> entries have the same value, will be gone.
Yes, and yes :)
Yes, it would help to save memory, and yes, lookup will most likely be slower.
The other thing that I consider a possible drawback here - with the current rte_hash
implementation we still need to allocate space for all possible max entries at startup.
But that's not new in DPDK, and for most cases it is considered an acceptable trade-off.
Overall, it seems like a possible approach to me; I suppose the main question is:
what will be the price of that extra hash lookup here?
Again, there is a bulk version of hash lookup, and in theory it might be
improved further (an AVX512 version on x86?).
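For reference, the second stage of that scheme can be sketched with a toy open-addressing table standing in for rte_hash (the key packing below assumes the intermediate prefix ID fits in 16 bits; a real implementation would use the full 5-byte {24_bits_from_LPM, 16_bits_vrf_id} key):

```c
#include <assert.h>
#include <stdint.h>

#define HASH_SLOTS 64

struct nh_ent {
	uint32_t key;
	uint32_t nh;
	int used;
};

static struct nh_ent nh_tbl[HASH_SLOTS];

/* toy key packing: prefix_id is assumed < 2^16 here */
static uint32_t
mk_key(uint32_t prefix_id, uint16_t vrf_id)
{
	return (prefix_id << 16) | vrf_id;
}

static void
nh_insert(uint32_t prefix_id, uint16_t vrf_id, uint32_t nh)
{
	uint32_t k = mk_key(prefix_id, vrf_id);
	uint32_t i;

	/* linear probing: find an empty or matching slot */
	for (i = k % HASH_SLOTS; nh_tbl[i].used && nh_tbl[i].key != k;
			i = (i + 1) % HASH_SLOTS)
		;
	nh_tbl[i].key = k;
	nh_tbl[i].nh = nh;
	nh_tbl[i].used = 1;
}

/* second lookup stage: the LPM has already mapped the address to
 * prefix_id; resolve {prefix_id, vrf_id} to the final next hop */
static uint32_t
nh_find(uint32_t prefix_id, uint16_t vrf_id)
{
	uint32_t k = mk_key(prefix_id, vrf_id);
	uint32_t i;

	for (i = k % HASH_SLOTS; nh_tbl[i].used; i = (i + 1) % HASH_SLOTS)
		if (nh_tbl[i].key == k)
			return nh_tbl[i].nh;
	return UINT32_MAX;	/* no route for this {prefix, VRF} */
}
```

The price being discussed is exactly this extra exact-match probe per packet on top of the LPM walk.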
>
> >
> >>> And we can provide to the user with ability to specify custom
> >>> alloc/free function for these vectors.
> >>> That would help to avoid allocating huge chunks of memory at startup.
> >>> I understand that it will be one extra memory dereference,
> >>> but probably it will be not that critical in terms of performance .
> >>> Again for bulk function we might be able to pipeline lookups and
> >>> de-references and hide that extra load latency.
> >>>
> >>>> Add four new experimental APIs:
> >>>> - rte_fib_vrf_add() and rte_fib_vrf_delete() to manage routes
> >>>> per VRF
> >>>> - rte_fib_vrf_lookup_bulk() for multi-VRF bulk lookups
> >>>> - rte_fib_vrf_get_rib() to retrieve a per-VRF RIB handle
> >>>>
> >>>> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> >>>> ---
> >>>> lib/fib/dir24_8.c | 241 ++++++++++++++++------
> >>>> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
> >>>> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
> >>>> lib/fib/dir24_8_avx512.h | 80 +++++++-
> >>>> lib/fib/rte_fib.c | 158 ++++++++++++---
> >>>> lib/fib/rte_fib.h | 94 ++++++++-
> >>>> 6 files changed, 988 insertions(+), 260 deletions(-)
> >>>>
> >> <snip>
> >>
> >> --
> >> Regards,
> >> Vladimir
> >>
> --
> Regards,
> Vladimir
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-25 21:43 ` Maxime Leroy
@ 2026-03-27 18:27 ` Medvedkin, Vladimir
2026-04-02 16:51 ` Maxime Leroy
0 siblings, 1 reply; 33+ messages in thread
From: Medvedkin, Vladimir @ 2026-03-27 18:27 UTC (permalink / raw)
To: Maxime Leroy
Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk,
Vladimir Medvedkin
Hi Maxime,
On 3/25/2026 9:43 PM, Maxime Leroy wrote:
> Hi Vladimir,
>
> On Wed, Mar 25, 2026 at 4:56 PM Medvedkin, Vladimir
> <vladimir.medvedkin@intel.com> wrote:
>>
>> On 3/24/2026 9:19 AM, Maxime Leroy wrote:
>>> Hi Vladimir,
>>>
>>> On Mon, Mar 23, 2026 at 7:46 PM Medvedkin, Vladimir
>>> <vladimir.medvedkin@intel.com> wrote:
>>>> On 3/23/2026 2:53 PM, Maxime Leroy wrote:
>>>>> On Mon, Mar 23, 2026 at 1:49 PM Medvedkin, Vladimir
>>>>> <vladimir.medvedkin@intel.com> wrote:
>>>>>> Hi Maxime,
>>>>>>
>>>>>> On 3/23/2026 11:27 AM, Maxime Leroy wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Mar 22, 2026 at 4:42 PM Vladimir Medvedkin
>>>>>>> <vladimir.medvedkin@intel.com> wrote:
>>>>>>>> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
>>>>>>>> allowing a single FIB instance to host multiple isolated routing domains.
>>>>>>>>
>>>>>>>> Currently FIB instance represents one routing instance. For workloads that
>>>>>>>> need multiple VRFs, the only option is to create multiple FIB objects. In a
>>>>>>>> burst oriented datapath, packets in the same batch can belong to different VRFs, so
>>>>>>>> the application either does per-packet lookup in different FIB instances or
>>>>>>>> regroups packets by VRF before lookup. Both approaches are expensive.
>>>>>>>>
>>>>>>>> To remove that cost, this series keeps all VRFs inside one FIB instance and
>>>>>>>> extends lookup input with per-packet VRF IDs.
>>>>>>>>
>>>>>>>> The design follows the existing fast-path structure for both families. IPv4 and
>>>>>>>> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
>>>>>>>> first-level table scales per configured VRF. This increases memory usage, but
>>>>>>>> keeps performance and lookup complexity on par with non-VRF implementation.
>>>>>>>>
>>>>>>> Thanks for the RFC. Some thoughts below.
>>>>>>>
>>>>>>> Memory cost: the flat TBL24 replicates the entire table for every VRF
>>>>>>> (num_vrfs * 2^24 * nh_size). With 256 VRFs and 8B nexthops that is
>>>>>>> 32 GB for TBL24 alone. In grout we support up to 256 VRFs allocated
>>>>>>> on demand -- this approach forces the full cost upfront even if most
>>>>>>> VRFs are empty.
>>>>>> Yes, increased memory consumption is the
>>>>>> trade-off. We make this choice in DPDK quite often, such as
>>>>>> pre-allocated mbufs, mempools, and many other things allocated in
>>>>>> advance to gain performance.
>>>>>> For FIB, I chose to replicate TBL24 per VRF for this same reason.
>>>>>>
>>>>>> And, as Morten mentioned earlier, if memory is the priority, a table
>>>>>> instance per VRF allocated on-demand is still supported.
>>>>>>
>>>>>> The high memory cost stems from TBL24's design: for IPv4, it was
>>>>>> justified by the BGP filtering convention (no prefixes more specific
>>>>>> than /24 in BGPv4 full view), ensuring most lookups hit with just one
>>>>>> random memory access. For IPv6, we should likely switch to a 16-bit TRIE
>>>>>> scheme on all layers. For IPv4, alternative algorithms with smaller
>>>>>> footprints (like DXR or DIR16-8-8, as used in VPP) may be worth
>>>>>> exploring if BGP full view is not required for those VRFs.
>>>>>>
>>>>>>> Per-packet VRF lookup: Rx bursts come from one port, thus one VRF.
>>>>>>> Mixed-VRF bulk lookups do not occur in practice. The three AVX512
>>>>>>> code paths add complexity for a scenario that does not exist, at
>>>>>>> least for a classic router. Am I missing a use-case?
>>>>>> That's not true, you're missing out on a lot of established core use
>>>>>> cases that are at least 2 decades old:
>>>>>>
>>>>>> - VLAN subinterface abstraction. Each subinterface may belong to a
>>>>>> separate VRF
>>>>>>
>>>>>> - MPLS VPN
>>>>>>
>>>>>> - Policy based routing
>>>>>>
>>>>> Fair point on VLAN subinterfaces and MPLS VPN. SRv6 L3VPN (End.DT4/
>>>>> End.DT6) also fits that pattern after decap.
>>>>>
>>>>> I agree DPDK often pre-allocates for performance, but I wonder if the
>>>>> flat TBL24 actually helps here. Each VRF's working set is spread
>>>>> 128 MB apart in the flat table. Would regrouping packets by VRF and
>>>>> doing one bulk lookup per VRF with separate contiguous TBL24s be
>>>>> more cache-friendly than a single mixed-VRF gather? Do you have
>>>>> benchmarks comparing the two approaches?
>>>> It depends. Generally, if we assume that we are working with wide
>>>> internet traffic, then even for a single VRF we most likely will miss
>>>> the cache for TBL24, thus, regardless of the size of the tbl24, each
>>>> memory access will be performed directly to DRAM.
>>> If the lookup is DRAM-bound anyway, then the 10 cycles/addr cost
>>> is dominated by memory latency, not CPU. The CPU cost of a bucket
>>> sort on 32-64 packets is negligible next to a DRAM access (~80-100
>>> ns per cache miss).
>> memory accesses are independent and executed in parallel in the CPU
>> pipeline
>>> That actually makes the case for regroup +
>>> per-VRF lookup: the regrouping is pure CPU work hidden behind
>>> memory stalls,
>> regrouping must be performed before memory accesses, so it cannot be
>> amortized in between memory reads
> With internet traffic, TBL24 lookups quickly become limited by
> cache misses, not CPU cycles. Even if some bursts hit the same
> routes and benefit from cache locality, the CPU has a limited
> number of outstanding misses (load buffer entries, MSHRs) --
> out-of-order execution helps, but it is not magic.
Correct, but this does not contradict what I'm saying
>
> The whole point of vector/graph processing (VPP, DPDK graph, etc.)
> is to amortize that memory latency: prefetch for packet N+1 while
> processing packet N. This works because all packets in a batch
> hit the same data structure in a tight loop.
https://github.com/DPDK/dpdk/blob/626d4e39327333cd5508885162e45ca7fb94ef7f/lib/fib/dir24_8.h#L161
>
> With separate per-VRF TBL24s, a bucket sort by VRF -- a few
> dozen cycles, all in L1 -- gives you clean batches where
> prefetching works as designed. This is exactly what graph nodes
> already do: classify, then process per-class in a tight loop.
How is lookup performed in this design? Do I understand it right:
1. sort the batch by VRF IDs, splitting the batch into IP sub-batches
belonging to the same VRF ID
2. for each sub-batch of IPs, perform lookup in tbl24[batch_common_vrf_id]
3. unsort the nexthops
Correct?
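In rough C, my reading of those three steps would be something like this (toy per-VRF lookup standing in for rte_fib_lookup_bulk(); all names invented):

```c
#include <assert.h>
#include <stdint.h>

#define NB_VRFS 4
#define BURST   64

/* stand-in for a per-VRF bulk lookup: next hop is derived from the
 * VRF ID and the 8 MSBs of the address, for demonstration only */
static void
toy_lookup_bulk(uint16_t vrf, const uint32_t *ips, uint64_t *nhs, int n)
{
	int i;

	for (i = 0; i < n; i++)
		nhs[i] = (uint64_t)vrf * 100 + (ips[i] >> 24);
}

/* steps 1-3: group addresses by VRF, one bulk lookup per group,
 * then scatter the next hops back into the original burst order */
static void
regroup_lookup(const uint32_t *ips, const uint16_t *vrfs,
		uint64_t *nhs, int n)
{
	uint32_t grp_ip[BURST];
	uint64_t grp_nh[BURST];
	int orig_idx[BURST];
	uint16_t v;
	int i, cnt;

	for (v = 0; v < NB_VRFS; v++) {
		cnt = 0;
		for (i = 0; i < n; i++)	/* step 1: select this VRF's IPs */
			if (vrfs[i] == v) {
				grp_ip[cnt] = ips[i];
				orig_idx[cnt++] = i;
			}
		if (cnt == 0)
			continue;
		toy_lookup_bulk(v, grp_ip, grp_nh, cnt);	/* step 2 */
		for (i = 0; i < cnt; i++)	/* step 3: unsort */
			nhs[orig_idx[i]] = grp_nh[i];
	}
}
```

(The naive selection pass above is O(n * NB_VRFS); a real implementation would use a single counting/bucket pass.)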
>
>>> and each per-VRF bulk lookup hits a contiguous
>>> TBL24 instead of scattering across 128 MB-apart VRF regions.
>> why is a contiguous 128 MB single-VRF TBL24 OK for you, but a bigger
>> contiguous multi-VRF TBL24 is not OK in the context of lookup (here we
>> are talking about lookup, omitting the problem of memory consumption at
>> init)?
> The performance difference may be small, but the flat approach
> is not faster either -- while costing 64 GB upfront.
it seems you implicitly assume 256 VRFs. Does my
use case with a few VRFs have a right to exist?
>
>> In both of these cases, memory access behaves the same way within a
>> single batch of packets during lookup, i.e. the first hit is likely a
>> cache miss, regardless of whether we are dealing with one or more VRFs,
>> it will not maintain TBL24 in L3$ in any way in a real dataplane app.
>>
>>>> And if the addresses are localized (i.e. most traffic is internal), then
>>>> having multiple TBL24s won't make the situation much worse.
>>>>
>>> With localized traffic, regrouping by VRF + per-VRF lookup on
>>> contiguous TBL24s would benefit from cache locality,
>> why so? There will be no differences within a single batch with a
>> reasonable size (for example 64), because within the lookup session, no
>> matter with or without regrouping, temporal cache locality will be the same.
>>
>> Let's look at it from a different angle. Is it
>> worth regrouping IP addresses by /8 (i.e. 8 MSBs) with the
>> current implementation of a single-VRF FIB?
>>
>>> while the
>>> flat multi-VRF table spreads hot entries 128 MB apart. The flat
>>> approach may actually be worse in that scenario
>>>
>>>> I don't have any benchmarks for regrouping, however I have 2 things to
>>>> consider:
>>>>
>>>> 1. lookup is relatively fast (for IPv4 it is about 10 cycles per
>>>> address, and I don't really want to slow it down)
>>>>
>>>> 2. incoming addresses and their corresponding VRFs are not controlled by
>>>> "us", so this is a random set. Regrouping effectively is sorting. I'm
>>>> not really happy to have nlogn complexity on a fast path :)
>>> Without benchmarks, we do not know whether the flat approach is
>>> actually faster than regroup + per-VRF lookup.
>> feel free to share benchmark results. The only thing you need to add is
>> the packet-regrouping logic, and then use separate single-VRF FIB
>> instances.
> Your series introduces a new API that optimizes multi-VRF lookup.
> The performance numbers should come with the proposal.
By policy, we cannot share raw performance numbers, and I think this
is unnecessary, because performance depends on the testing environment
(content of the routing table, CPU model, etc).
Tests I've done on my board with an IPv4 full view (782940 routes) and 4
VRFs, performing random lookups in all of them, came in at about 180% of
the cost of a single VRF with the same RT content.
You can test it in your environment with
dpdk-test-fib -l 1,2 --no-pci -- -f <path to your routes> -e 4 -l
100000000 -V <number of VRFs>
>
>>>>> On the memory trade-off and VRF ID mapping: the API uses vrf_id as
>>>>> a direct index (0 to max_vrfs-1). With 256 VRFs and 8B nexthops,
>>>>> TBL24 alone costs 32 GB for IPv4 and 32 GB for IPv6 -- 64 GB total
>>>>> at startup. In grout, VRF IDs are interface IDs that can be any
>>>>> uint16_t, so we would also need to maintain a mapping between our
>>>>> VRF IDs and FIB slot indices.
>>>> of course, this is an application responsibility. In FIB, VRFs are in
>>>> a contiguous range.
>>>>> We would need to introduce a max_vrfs
>>>>> limit, which forces a bad trade-off: either set it low (e.g. 16)
>>>>> and limit deployments, or set it high (e.g. 256) and pay 64 GB at
>>>>> startup even with a single VRF. With separate FIB instances per VRF,
>>>>> we only allocate what we use.
>>>> Yes, I understand this. In the end, if the user wants to use 256 VRFs,
>>>> the memory footprint will be at least 64 GB anyway.
>>> The difference is when the memory is committed.
>> yes, this is the only difference. It all comes down to the static vs
>> dynamic memory allocation problem. And each of these approaches is good
>> for solving a specific task. For the task of creating a new VRF, what is
>> preferable - to fail at init or at runtime?
> The main problem is that your series imposes contiguous VRF IDs
> (0 to max_vrfs-1). How a VRF is represented is a network stack
> design decision
exactly - a network stack decision. FIB is not a network stack.
> -- in Linux it is an ifindex,
> so every interface lives in its own private VRF?
> in Cisco a name,
are you going to pass an array of strings on lookup?
> in grout an interface ID.
> haven't we decided this is a problematic design (VLANs, L3VPN, etc.)?
> Any application using this API needs
> a mapping layer on top.
> I think from my rhetorical questions this should be obvious
>
> In grout, everything is allocated dynamically: mempools, FIBs,
> conntrack tables. Pre-allocating everything at init forces
> hardcoded arbitrary limits and prevents memory reuse between
> subsystems -- memory reserved for FIB TBL24 cannot be used for
> conntrack when the VRF has no routes, and vice versa. We prefer
> to allocate resources only when needed. It is simpler for users
> and more efficient for memory.
>
>>> With separate FIB
>>> instances per VRF, you allocate 128 MB only when a VRF is actually
>>> created at runtime. With the flat multi-VRF approach, you pay
>>> max_vrfs * 128 MB at startup, even if only one VRF is active.
>>>
>>> On top of that, the API uses vrf_id as a direct index (0 to
>>> max_vrfs-1). As Stephen noted, there are multiple ways to model
>>> VRFs. Depending on the networking stack, VRFs are identified by
>>> ifindex (Linux l3mdev), by name (Cisco, Juniper), or by some
>>> other scheme. This means the application must maintain a mapping
>>> between its own VRF representation and the FIB slot indices, and
>>> choose max_vrfs upfront. What is the benefit of this flat
>>> multi-VRF FIB if the application still needs to manage a
>>> translation layer and pre-commit memory for VRFs that may never
>>> exist?
>> This is a control-plane task.
>>>> As a trade-off for a bad trade-off ;) I can suggest allocating it in
>>>> chunks. Let's say you are starting with 16 VRFs, and during runtime, if
>>>> the user wants to increase the number of VRFs above this limit, you can
>>>> allocate another 16xVRF FIB. Then, of course, you need to split
>>>> addresses into two bursts, one for each FIB handle.
>>> But then we are back to regrouping packets -- just by chunk of
>>> VRFs instead of by individual VRF. If we have to sort the burst
>>> anyway, what does the flat multi-VRF table buy us?
>>>
>>>>>>> I am not too familiar with DPDK FIB internals, but would it be
>>>>>>> possible to keep a separate TBL24 per VRF and only share the TBL8
>>>>>>> pool?
>>>>>> that is how it is implemented right now, with one note: TBL24s are
>>>>>> pre-allocated.
>>>>>>> Something like pre-allocating an array of max_vrfs TBL24
>>>>>>> pointers, allocating each TBL24 on demand at VRF add time,
>>>>>> and you are suggesting to allocate TBL24 on demand by adding an extra
>>>>>> indirection layer. This will lead to lower performance, which I would
>>>>>> like to avoid.
>>>>>>> and
>>>>>>> having them all point into a shared TBL8 pool. The TBL8 index in
>>>>>>> TBL24 entries seems to already be global, so would that work without
>>>>>>> encoding changes?
>>>>>>>
>>>>>>> Going further: could the same idea extend to IPv6? The dir24_8 and
>>>>>>> trie seem to use the same TBL8 block format (256 entries, same
>>>>>>> (nh << 1) | ext_bit encoding, same size). Would unifying the TBL8
>>>>>>> allocator allow a single pool shared across IPv4, IPv6, and all
>>>>>>> VRFs? That could be a bigger win for /32-heavy and /128-heavy tables
>>>>>>> and maybe a good first step before multi-VRF.
>>>>>> So, you are suggesting merging IPv4 and IPv6 into a single unified FIB?
>>>>>> I'm not sure how this can be a bigger win, could you please elaborate
>>>>>> more on this?
>>>>> On the IPv4/IPv6 TBL8 pool: I was not suggesting merging FIBs, just
>>>>> sharing the TBL8 block allocator between separate FIB instances.
>>>>> This is possible since dir24_8 and trie use the same TBL8 block
>>>>> format (256 entries, same encoding, same size).
>>>>>
>>>>> Would it be possible to pass a shared TBL8 pool at rte_fib_create()
>>>>> time? Each FIB keeps its own TBL24 and RIB, but TBL8 is shared
>>>>> across all FIBs and potentially across IPv4/IPv6. Users would no
>>>>> longer have to guess num_tbl8 per FIB.
>>>> Yes, this is possible. However, this will significantly complicate
>>>> working with the library while solving a not-so-big problem.
>>> Your series already shares TBL8 across all VRFs within a single
>>> FIB -- that part is useful, and it does not require the flat
>>> multi-VRF TBL24.
>>>
>>> In grout, routes arrive from FRR (BGP, OSPF, etc.) at runtime.
>>> We cannot predict TBL8 usage per VRF in advance
>> and you don't need to know per-VRF consumption now. If I understood
>> your request here properly, do you want to share TBL8 between the IPv4
>> and IPv6 FIBs? I don't think this is a good idea at all, if only
>> because splitting them means that if one AF consumes all TBL8 (because
>> of an attack or bogus CP), the other AF remains intact.
> If TBL8 isolation per AF is meant as a protection against route
> floods, then the same argument applies between VRFs: your series
> shares TBL8 across all VRFs within a single FIB, so a bogus
> control plane in one VRF exhausts TBL8 for all other VRFs.
>
> But more fundamentally, this is not how route flood protection
> works. It is handled in the control plane: the routing daemon
> limits the number of prefixes accepted per BGP session
> (max-prefix) and selects which routes are installed via prefix
> filters -- before those routes ever reach the forwarding table.
>
> The Linux kernel is a good reference here. IPv6 used to enforce
> a max_size limit on FIB + cache entries (net.ipv6.route.max_size,
> defaulting to 4096). It caused real production issues and was
> removed in kernel 6.3. IPv4 never had a FIB route limit. There
> is no per-VRF route limit either. The kernel relies entirely on
> the control plane for route flood protection.
FIB is not the Linux kernel, nor is it a network stack. We cannot
rely on control-plane protection, since the control plane is 3rd-party
software.
Also, I think allocating a very algorithm-specific entity such as a pool
of TBL8s prior to calling rte_fib_create() and passing a pointer to it
could be confusing for many users and would bloat the API.
FIB supports pluggable lookup algorithms; you can write your own and
specify a pointer to the tbl8_pool in an algorithm-specific
configuration defined for your algorithm, where you may also create a
dynamic table of TBL24 pointers per VRF. If you need any help with
this task, I would be happy to help.
>
>>> -- it depends on
>>> prefix length distribution which varies per VRF and changes over
>>> time. No production LPM (Linux kernel, JunOS, IOS) asks the
>>> operator to size these structures per routing table upfront.
>> - they are using different LPM algorithms
>> - when you use those facilities, their developers have properly tuned
>> them. FIB is a low-level library; it cannot be used without any
>> knowledge, and it will not solve all problems with a single red button
>> "make it work, don't make any bugs"
>> P.S. how do you know how JunOS /IOS implements their LPMs? ;)
>>
> I do not need to know their LPM implementation -- I only need
> to know how they are configured. No production router requires
> the operator to size internal LPM structures.
>
> We can impose a maximum number of IPv4/IPv6 routes on the user
> -- even though the kernel does not need this either. But TBL8
> is a different problem: the application cannot predict TBL8
> consumption because it depends on prefix length distribution,
> which varies per VRF and changes over time with dynamic routing.
> Today there is no API to query TBL8 usage, and no API to resize
> a FIB without destroying it.
>
> This is exactly why a shared TBL8 pool across VRFs is useful:
> VRFs with few long prefixes naturally leave room for VRFs that
> need more.
but this is already implemented. I don't know why you are repeatedly
concerned about this. We are aligned on this, and this feature is
already there - in the patch.
On the other hand, what we disagreed on is sharing not only across
VRFs, but also across address families. If you cannot predict the
amount of TBL8 per AF, how would you magically understand the number of
TBL8s for a merged pool?
> This is the valuable part of your series. But it
> does not require a flat multi-VRF TBL24 -- separate per-VRF
> TBL24s sharing a common TBL8 pool would give the same benefit
> without the 64 GB upfront cost.
and that's a completely different problem. Please, let's separate the
problems and not mix them up.
I understand your concern about memory consumption. I have some ideas on
how to solve this problem in parallel with the proposed solution.
>
>>> Today we do not even have TBL8 usage stats (Robin's series
>>> addresses that)
>> I will try to find time to review this patch in the near future
> Thanks, Robin's TBL8 stats series would help users understand
> their TBL8 consumption -- a more practical improvement for
> current users.
>
> Regards,
> Maxime
--
Regards,
Vladimir
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/4] fib: add multi-VRF support
2026-03-26 10:13 ` Konstantin Ananyev
@ 2026-03-27 18:32 ` Medvedkin, Vladimir
0 siblings, 0 replies; 33+ messages in thread
From: Medvedkin, Vladimir @ 2026-03-27 18:32 UTC (permalink / raw)
To: Konstantin Ananyev, dev@dpdk.org
Cc: rjarry@redhat.com, nsaxena16@gmail.com, mb@smartsharesystems.com,
adwivedi@marvell.com, jerinjacobk@gmail.com, Maxime Leroy,
Vladimir Medvedkin
On 3/26/2026 10:13 AM, Konstantin Ananyev wrote:
>
>>>>>> Add VRF (Virtual Routing and Forwarding) support to the IPv4
>>>>>> FIB library, allowing multiple independent routing tables
>>>>>> within a single FIB instance.
>>>>>>
>>>>>> Introduce max_vrfs and vrf_default_nh fields in rte_fib_conf
>>>>>> to configure the number of VRFs and per-VRF default nexthops.
>>>>> Thanks Vladimir, allowing multiple VRFs per same LPM table will
>>>>> definitely be a useful thing to have.
>>>>> Though, I have the same concern as Maxime:
>>>>> memory requirements are just overwhelming.
>>>>> Stupid q - why just not to store a pointer to a vector of next-hops
>>>>> within the table entry?
>>>> Do I understand correctly: a vector with max_number_of_vrfs entries,
>>>> using the VRF ID to address a nexthop?
>>> Yes.
>> Here I can see 2 problems:
>>
>> 1. tbl entries must be the size of a pointer, so no way to use smaller sizes
> Yes, but as we are talking about storing nexthops for multiple VRFs anyway,
> I don't think it is a big deal.
>
>> 2. those vectors will be sparsely populated and, depending on the
>> runtime configuration, may consume a lot of memory too (as Robin
>> mentioned they may have 1024 VRFs)
> Yes, each VRF vector can become really sparse, and we waste a lot of memory.
> If that's an issue, we can probably think about something smarter
> than a simple flat array indexed by vrf-id: something like a 2-level B-tree or so.
> The main positives that I see in that approach:
> - low extra overhead at lookup - one/two extra pointer dereferences.
I'm afraid the overhead will be comparatively large, just because the
current implementation is fast and most likely hits with a single memory
access. However, for a low number of VRFs, a B-tree may be a good
solution
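For illustration, the vector-of-nexthops idea can be modeled in a few lines. This is a hedged Python sketch, not DPDK code; the names (route_add, lookup, MAX_VRFS) are made up for the example. Each tbl24 entry holds a reference to a per-prefix vector of nexthops indexed by VRF ID, so one shared tbl24 serves all VRFs at the cost of one extra dereference per lookup:

```python
MAX_VRFS = 4
DEFAULT_NH = 0

# tbl24: maps the top 24 bits of an IPv4 address to a per-VRF nexthop
# vector. A real implementation would use a flat 2^24-entry array of
# pointers; a dict keeps the sketch small.
tbl24 = {}

def route_add(ip24, vrf_id, nexthop):
    # Allocate the per-prefix vector lazily, on the first route for
    # this prefix (this is the memory saving Konstantin describes).
    vec = tbl24.setdefault(ip24, [DEFAULT_NH] * MAX_VRFS)
    vec[vrf_id] = nexthop

def lookup(ip, vrf_id):
    vec = tbl24.get(ip >> 8)      # first dereference: tbl24 entry
    if vec is None:
        return DEFAULT_NH
    return vec[vrf_id]            # second dereference: per-VRF nexthop

route_add(0x0A0001, vrf_id=1, nexthop=42)    # 10.0.1.0/24 in VRF 1
assert lookup(0x0A000105, vrf_id=1) == 42    # 10.0.1.5, VRF 1
assert lookup(0x0A000105, vrf_id=0) == 0     # same prefix, VRF 0 -> default
```

The sparsity concern raised below is visible here: every populated prefix pays for a MAX_VRFS-wide vector even if only one VRF uses it.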
> - it allows the CP to allocate/free space for each such vector separately,
> so we don't need to pre-allocate memory for max possible entries at startup.
>
>>>> Yes, this may work.
>>>> But, if we are going to do an extra memory access, I'd rather
>>>> maintain an internal hash table with 5-byte keys {24_bits_from_LPM,
>>>> 16_bits_vrf_id} to retrieve a nexthop.
>>> Hmm... and what to do with entries in tbl8, I mean what will be the key for
>> them?
>>> Or you don't plan to put entries from tbl8 to that hash table?
>> The idea is to have a single LPM struct with a joined superset of all
>> prefixes existing in all VRFs. Each prefix in this LPM struct has its
>> own unique "nexthop", which is not the final next hop but
>> intermediate metadata identifying this unique prefix. Then a second
>> search is performed, with a key containing this intermediate metadata +
>> vrf_id, in some exact-match database like a hash table. This approach is
>> the most memory friendly, since there is only one LPM data struct (which
>> scales well with the number of prefixes it holds) and the intermediate
>> entries are only 4 bytes long.
>> On the other hand, it requires an extra search, so lookup will be slower.
>> Also, some current LPM optimizations, like tbl8 collapsing when all tbl8
>> entries hold the same value, will be gone.
> Yes, and yes :)
> Yes it would help to save memory, and yes lookup will most likely be slower.
> The other thing that I consider as a possible drawback here - with current rte_hash
> implementation we still need to allocate space for all possible max entries at startup.
I don't think this is a big problem, since the size of this memory will
be reasonable and will not grow linearly with the number of VRFs. So I
agree it is an acceptable trade-off
> But that's not new in DPDK, and for most cases it is considered as acceptable trade-off.
> Overall, it seems like a possible approach to me, I suppose the main question is:
> what will be the price of that extra hash-lookup here.
And this is the key problem. I don't think rte_hash is well suited
here; at best we need some kind of perfect hash. I have a few ideas on
this, stay tuned :)
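To make the two-stage scheme discussed above concrete, here is a hedged Python model (not DPDK code; all names are illustrative): one shared LPM holds the superset of prefixes from all VRFs and returns an intermediate prefix ID, then an exact-match table keyed on (prefix_id, vrf_id) yields the final nexthop:

```python
DEFAULT_NH = 0

lpm = {}         # (masked_prefix, length) -> intermediate prefix ID
exact = {}       # (prefix_id, vrf_id)     -> final nexthop
_next_id = [1]

def route_add(prefix, length, vrf_id, nexthop):
    key = (prefix, length)
    if key not in lpm:
        lpm[key] = _next_id[0]   # assign a unique intermediate ID
        _next_id[0] += 1
    exact[(lpm[key], vrf_id)] = nexthop

def lookup(ip, vrf_id):
    # Stage 1: longest-prefix match over the shared prefix superset.
    for length in range(32, 0, -1):
        mask = ~((1 << (32 - length)) - 1) & 0xFFFFFFFF
        pid = lpm.get((ip & mask, length))
        if pid is not None:
            # Stage 2: exact match on (prefix_id, vrf_id). Note: a
            # complete design must also install covering-route entries
            # per VRF, or a prefix present only in another VRF would
            # shadow this VRF's shorter match.
            return exact.get((pid, vrf_id), DEFAULT_NH)
    return DEFAULT_NH

route_add(0x0A000000, 8, vrf_id=0, nexthop=7)    # 10.0.0.0/8 in VRF 0
route_add(0x0A0A0000, 16, vrf_id=1, nexthop=9)   # 10.10.0.0/16 in VRF 1
assert lookup(0x0A010101, vrf_id=0) == 7
assert lookup(0x0A0A0001, vrf_id=1) == 9
```

The sketch makes the trade-off visible: only one LPM structure regardless of the VRF count, but every lookup pays for the second exact-match search.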
>
> Again, there is a bulk version of hash lookup, and in theory it can be
> improved further (an AVX512 version on x86?).
>
>>>>> And we can provide the user with the ability to specify custom
>>>>> alloc/free functions for these vectors.
>>>>> That would help to avoid allocating huge chunks of memory at startup.
>>>>> I understand that it will be one extra memory dereference,
>>>>> but it will probably not be that critical in terms of performance.
>>>>> Again, for bulk functions we might be able to pipeline lookups and
>>>>> de-references and hide that extra load latency.
>>>>>
>>>>>> Add four new experimental APIs:
>>>>>> - rte_fib_vrf_add() and rte_fib_vrf_delete() to manage routes
>>>>>> per VRF
>>>>>> - rte_fib_vrf_lookup_bulk() for multi-VRF bulk lookups
>>>>>> - rte_fib_vrf_get_rib() to retrieve a per-VRF RIB handle
>>>>>>
>>>>>> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
>>>>>> ---
>>>>>> lib/fib/dir24_8.c | 241 ++++++++++++++++------
>>>>>> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
>>>>>> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
>>>>>> lib/fib/dir24_8_avx512.h | 80 +++++++-
>>>>>> lib/fib/rte_fib.c | 158 ++++++++++++---
>>>>>> lib/fib/rte_fib.h | 94 ++++++++-
>>>>>> 6 files changed, 988 insertions(+), 260 deletions(-)
>>>>>>
>>>> <snip>
>>>>
>>>> --
>>>> Regards,
>>>> Vladimir
>>>>
>> --
>> Regards,
>> Vladimir
>>
--
Regards,
Vladimir
* Re: [RFC PATCH 0/4] VRF support in FIB library
2026-03-27 18:27 ` Medvedkin, Vladimir
@ 2026-04-02 16:51 ` Maxime Leroy
0 siblings, 0 replies; 33+ messages in thread
From: Maxime Leroy @ 2026-04-02 16:51 UTC (permalink / raw)
To: Medvedkin, Vladimir
Cc: dev, rjarry, nsaxena16, mb, adwivedi, jerinjacobk,
Vladimir Medvedkin
Hi Vladimir,
On Fri, Mar 27, 2026 at 7:28 PM Medvedkin, Vladimir
<vladimir.medvedkin@intel.com> wrote:
>
> Hi Maxime,
>
> On 3/25/2026 9:43 PM, Maxime Leroy wrote:
> > Hi Vladimir,
> >
> > On Wed, Mar 25, 2026 at 4:56 PM Medvedkin, Vladimir
> > <vladimir.medvedkin@intel.com> wrote:
> >>
> >> On 3/24/2026 9:19 AM, Maxime Leroy wrote:
> >>> Hi Vladimir,
> >>>
> >>> On Mon, Mar 23, 2026 at 7:46 PM Medvedkin, Vladimir
> >>> <vladimir.medvedkin@intel.com> wrote:
> >>>> On 3/23/2026 2:53 PM, Maxime Leroy wrote:
> >>>>> On Mon, Mar 23, 2026 at 1:49 PM Medvedkin, Vladimir
> >>>>> <vladimir.medvedkin@intel.com> wrote:
> >>>>>> Hi Maxime,
> >>>>>>
> >>>>>> On 3/23/2026 11:27 AM, Maxime Leroy wrote:
> >>>>>>> Hi Vladimir,
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sun, Mar 22, 2026 at 4:42 PM Vladimir Medvedkin
> >>>>>>> <vladimir.medvedkin@intel.com> wrote:
> >>>>>>>> This series adds multi-VRF support to both IPv4 and IPv6 FIB paths by
> >>>>>>>> allowing a single FIB instance to host multiple isolated routing domains.
> >>>>>>>>
> >>>>>>>> Currently FIB instance represents one routing instance. For workloads that
> >>>>>>>> need multiple VRFs, the only option is to create multiple FIB objects. In a
> >>>>>>>> burst oriented datapath, packets in the same batch can belong to different VRFs, so
> >>>>>>>> the application either does per-packet lookup in different FIB instances or
> >>>>>>>> regroups packets by VRF before lookup. Both approaches are expensive.
> >>>>>>>>
> >>>>>>>> To remove that cost, this series keeps all VRFs inside one FIB instance and
> >>>>>>>> extends lookup input with per-packet VRF IDs.
> >>>>>>>>
> >>>>>>>> The design follows the existing fast-path structure for both families. IPv4 and
> >>>>>>>> IPv6 use multi-ary trees with a 2^24 associativity on a first level (tbl24). The
> >>>>>>>> first-level table scales per configured VRF. This increases memory usage, but
> >>>>>>>> keeps performance and lookup complexity on par with non-VRF implementation.
> >>>>>>>>
> >>>>>>> Thanks for the RFC. Some thoughts below.
> >>>>>>>
> >>>>>>> Memory cost: the flat TBL24 replicates the entire table for every VRF
> >>>>>>> (num_vrfs * 2^24 * nh_size). With 256 VRFs and 8B nexthops that is
> >>>>>>> 32 GB for TBL24 alone. In grout we support up to 256 VRFs allocated
> >>>>>>> on demand -- this approach forces the full cost upfront even if most
> >>>>>>> VRFs are empty.
> >>>>>> Yes, increased memory consumption is the trade-off. We make this
> >>>>>> choice in DPDK quite often, such as pre-allocated mbufs,
> >>>>>> mempools and many other things allocated in advance to gain performance.
> >>>>>> For FIB, I chose to replicate TBL24 per VRF for this same reason.
> >>>>>>
> >>>>>> And, as Morten mentioned earlier, if memory is the priority, a table
> >>>>>> instance per VRF allocated on-demand is still supported.
> >>>>>>
> >>>>>> The high memory cost stems from TBL24's design: for IPv4, it was
> >>>>>> justified by the BGP filtering convention (no prefixes more specific
> >>>>>> than /24 in BGPv4 full view), ensuring most lookups hit with just one
> >>>>>> random memory access. For IPv6, we should likely switch to a 16-bit TRIE
> >>>>>> scheme on all layers. For IPv4, alternative algorithms with smaller
> >>>>>> footprints (like DXR or DIR16-8-8, as used in VPP) may be worth
> >>>>>> exploring if BGP full view is not required for those VRFs.
> >>>>>>
> >>>>>>> Per-packet VRF lookup: Rx bursts come from one port, thus one VRF.
> >>>>>>> Mixed-VRF bulk lookups do not occur in practice. The three AVX512
> >>>>>>> code paths add complexity for a scenario that does not exist, at
> >>>>>>> least for a classic router. Am I missing a use-case?
> >>>>>> That's not true, you're missing out on a lot of established core use
> >>>>>> cases that are at least 2 decades old:
> >>>>>>
> >>>>>> - VLAN subinterface abstraction. Each subinterface may belong to a
> >>>>>> separate VRF
> >>>>>>
> >>>>>> - MPLS VPN
> >>>>>>
> >>>>>> - Policy based routing
> >>>>>>
> >>>>> Fair point on VLAN subinterfaces and MPLS VPN. SRv6 L3VPN (End.DT4/
> >>>>> End.DT6) also fits that pattern after decap.
> >>>>>
> >>>>> I agree DPDK often pre-allocates for performance, but I wonder if the
> >>>>> flat TBL24 actually helps here. Each VRF's working set is spread
> >>>>> 128 MB apart in the flat table. Would regrouping packets by VRF and
> >>>>> doing one bulk lookup per VRF with separate contiguous TBL24s be
> >>>>> more cache-friendly than a single mixed-VRF gather? Do you have
> >>>>> benchmarks comparing the two approaches?
> >>>> It depends. Generally, if we assume that we are working with wide
> >>>> internet traffic, then even for a single VRF we most likely will miss
> >>>> the cache for TBL24, thus, regardless of the size of the tbl24, each
> >>>> memory access will be performed directly to DRAM.
> >>> If the lookup is DRAM-bound anyway, then the 10 cycles/addr cost
> >>> is dominated by memory latency, not CPU. The CPU cost of a bucket
> >>> sort on 32-64 packets is negligible next to a DRAM access (~80-100
> >>> ns per cache miss).
> >> memory accesses are independent and executed in parallel in the CPU
> >> pipeline
> >>> That actually makes the case for regroup +
> >>> per-VRF lookup: the regrouping is pure CPU work hidden behind
> >>> memory stalls,
> >> regrouping must be performed before memory accesses, so it cannot be
> >> amortized in between memory reads
> > With internet traffic, TBL24 lookups quickly become limited by
> > cache misses, not CPU cycles. Even if some bursts hit the same
> > routes and benefit from cache locality, the CPU has a limited
> > number of outstanding misses (load buffer entries, MSHRs) --
> > out-of-order execution helps, but it is not magic.
> Correct, but this does not contradict what I'm saying
> >
> > The whole point of vector/graph processing (VPP, DPDK graph, etc.)
> > is to amortize that memory latency: prefetch for packet N+1 while
> > processing packet N. This works because all packets in a batch
> > hit the same data structure in a tight loop.
> https://github.com/DPDK/dpdk/blob/626d4e39327333cd5508885162e45ca7fb94ef7f/lib/fib/dir24_8.h#L161
>
> >
> > With separate per-VRF TBL24s, a bucket sort by VRF -- a few
> > dozen cycles, all in L1 -- gives you clean batches where
> > prefetching works as designed. This is exactly what graph nodes
> > already do: classify, then process per-class in a tight loop.
> How is lookup performed in this design? Do I understand it right:
> 1. sort the batch by VRF IDs, splitting the batch into IP sub-batches
> belonging to the same VRF ID
> 2. for each subset of IPs perform lookup in tbl24[batch_common_vrf_id]
> 3. unsort nexthops
>
> Correct?
No sort/unsort. This is how rte_graph classification works:
ip_input (validation)
-> ip_lookup-v0 (bulk fib4_lookup on homogeneous VRF 0 burst)
-> ip_lookup-v1 (bulk fib4_lookup on homogeneous VRF 1 burst)
-> ip_forward / ip_input_local / ...
ip_input already iterates over packets for header validation and
enqueues them to different next nodes. Adding a per-VRF edge costs
one iface->vrf_id load (already in L1) and one rte_node_enqueue_x1()
(already done today). Each ip_lookup-vN clone holds its VRF's
rte_fib in node context and calls rte_fib_lookup_bulk() on the
whole burst at once.
We do not use bulk lookups yet in grout (each packet does its own
rte_fib_lookup_bulk(..., 1) today), but this is how we would
implement it.
The tradeoff is batch fragmentation: with traffic spread across K
active VRFs, each sub-batch is ~N/K packets. But in practice, most
deployments have 1-3 hot VRFs, so batches stay large. And even
fragmented batches benefit from the vectorized lookup -- 8 packets
is still one AVX512 iteration, vs. 8 scalar lookups today.
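As a standalone approximation of this design (outside rte_graph, so it has to restore burst order explicitly; in the graph, sub-bursts are simply enqueued to per-VRF nodes and no re-ordering is needed), the bucketing could look like the following Python sketch. `fib_lookup_bulk` stands in for `rte_fib_lookup_bulk()` on a per-VRF FIB instance; the names are illustrative:

```python
from collections import defaultdict

def fib_lookup_bulk(fib, ips):
    # Stand-in for rte_fib_lookup_bulk() on one per-VRF FIB instance;
    # here a FIB is just a dict mapping the top 24 bits to a nexthop.
    return [fib.get(ip >> 8, 0) for ip in ips]

def classify_and_lookup(burst, fibs):
    """burst: list of (vrf_id, ip); fibs: vrf_id -> per-VRF FIB handle."""
    buckets = defaultdict(list)              # one O(n) pass, no sorting
    for idx, (vrf, ip) in enumerate(burst):
        buckets[vrf].append((idx, ip))
    nexthops = [0] * len(burst)
    for vrf, items in buckets.items():
        # One bulk lookup per homogeneous sub-burst.
        nhs = fib_lookup_bulk(fibs[vrf], [ip for _, ip in items])
        for (idx, _), nh in zip(items, nhs):
            nexthops[idx] = nh               # restore original order
    return nexthops

fibs = {0: {0x0A0001: 11}, 1: {0x0A0001: 22}}
burst = [(0, 0x0A000101), (1, 0x0A000102), (0, 0x0A000103)]
assert classify_and_lookup(burst, fibs) == [11, 22, 11]
```

The classification is a single linear pass over the burst, not an n log n sort, which is the crux of the disagreement above.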
> >
> >>> and each per-VRF bulk lookup hits a contiguous
> >>> TBL24 instead of scattering across 128 MB-apart VRF regions.
> >> why is a contiguous 128Mb single-VRF TBL24 OK for you, but bigger
> >> contiguous multi-VRF TBL24 is not OK in the context of lookup (here we
> >> are talking about lookup, omitting the problem of memory consumption on
> >> init)?
> > The performance difference may be small, but the flat approach
> > is not faster either -- while costing 64 GB upfront.
> it seems you implicitly assume 256 VRFs. Does my
> use case with a few VRFs have a right to exist?
> >
> >> In both of these cases, memory access behaves the same way within a
> >> single batch of packets during lookup, i.e. the first hit is likely a
> >> cache miss regardless of whether we are dealing with one or more VRFs;
> >> a real dataplane app will not keep TBL24 resident in L3$ either way.
> >>
> >>>> And if the addresses are localized (i.e. most traffic is internal), then
> >>>> having multiple TBL24 won't make the situation much worse.
> >>>>
> >>> With localized traffic, regrouping by VRF + per-VRF lookup on
> >>> contiguous TBL24s would benefit from cache locality,
> >> why so? There will be no differences within a single batch with a
> >> reasonable size (for example 64), because within the lookup session, no
> >> matter with or without regrouping, temporal cache locality will be the same.
> >>
> >> Let's look at it from a different angle. Is it
> >> worth regrouping IP addresses by /8 (i.e. 8 MSBs) with the
> >> current implementation of a single-VRF FIB?
> >>
> >>> while the
> >>> flat multi-VRF table spreads hot entries 128 MB apart. The flat
> >>> approach may actually be worse in that scenario
> >>>
> >>>> I don't have any benchmarks for regrouping, however I have 2 things to
> >>>> consider:
> >>>>
> >>>> 1. lookup is relatively fast (for IPv4 it is about 10 cycles per
> >>>> address, and I don't really want to slow it down)
> >>>>
> >>>> 2. incoming addresses and their corresponding VRFs are not controlled by
> >>>> "us", so this is a random set. Regrouping effectively is sorting. I'm
> >>>> not really happy to have nlogn complexity on a fast path :)
> >>> Without benchmarks, we do not know whether the flat approach is
> >>> actually faster than regroup + per-VRF lookup.
> >> feel free to share benchmark results. The only thing you need to add is
> >> the packet regrouping logic, and then use separate single-VRF FIB
> >> instances.
> > Your series introduces a new API that optimizes multi-VRF lookup.
> > The performance numbers should come with the proposal.
> By policy we cannot share raw performance numbers, and I think this
> is unnecessary, because performance depends on the testing environment
> (content of the routing table, CPU model, etc).
>
> Tests I've done on my board with an IPv4 full view (782940 routes) and 4
> VRFs, performing random lookups in all of them, showed 180% of the cost
> compared to a single VRF with the same RT content.
>
> You can test it in your environment with
> dpdk-test-fib -l 1,2 --no-pci -- -f <path to your routes> -e 4 -l
> 100000000 -V <number of VRFs>
The dpdk-test-fib benchmark is useful for measuring raw lookup
throughput, but it does not capture the full picture. In a real
router stack with rte_graph, the classification by VRF happens
naturally as part of packet processing -- it is not an extra
sorting step. The only way to compare both approaches fairly is
to measure end-to-end forwarding performance in a real datapath.
grout is an open source DPDK-based router built on rte_graph,
designed to exercise and validate DPDK APIs in realistic
conditions. I would be happy to help benchmark both approaches
there.
>
> >
> >>>>> On the memory trade-off and VRF ID mapping: the API uses vrf_id as
> >>>>> a direct index (0 to max_vrfs-1). With 256 VRFs and 8B nexthops,
> >>>>> TBL24 alone costs 32 GB for IPv4 and 32 GB for IPv6 -- 64 GB total
> >>>>> at startup. In grout, VRF IDs are interface IDs that can be any
> >>>>> uint16_t, so we would also need to maintain a mapping between our
> >>>>> VRF IDs and FIB slot indices.
> >>>> of course, this is an application responsibility. In FIB, VRFs are in
> >>>> a contiguous range.
> >>>>> We would need to introduce a max_vrfs
> >>>>> limit, which forces a bad trade-off: either set it low (e.g. 16)
> >>>>> and limit deployments, or set it high (e.g. 256) and pay 64 GB at
> >>>>> startup even with a single VRF. With separate FIB instances per VRF,
> >>>>> we only allocate what we use.
> >>>> Yes, I understand this. In the end, if the user wants to use 256 VRFs,
> >>>> the memory footprint will be at least 64 GB anyway.
> >>> The difference is when the memory is committed.
> >> yes, this is the only difference. It all comes down to the static vs
> >> dynamic memory allocation problem. And each of these approaches is good
> >> for solving a specific task. For the task of creating a new VRF, what is
> >> preferable - to fail at init or at runtime?
> > The main problem is that your series imposes contiguous VRF IDs
> > (0 to max_vrfs-1). How a VRF is represented is a network stack
> > design decision
> exactly - a network stack decision. FIB is not a network stack.
> > -- in Linux it is an ifindex,
> so every interface lives in its own private VRF?
> > in Cisco a name,
> are you going to pass an array of strings on lookup?
> > in grout an interface ID.
> haven't we decided this is a problematic design (VLANs, L3VPN, etc)?
> > Any application using this API needs
> > a mapping layer on top.
> I think from my rhetorical questions this should be obvious
> >
> > In grout, everything is allocated dynamically: mempools, FIBs,
> > conntrack tables. Pre-allocating everything at init forces
> > hardcoded arbitrary limits and prevents memory reuse between
> > subsystems -- memory reserved for FIB TBL24 cannot be used for
> > conntrack when the VRF has no routes, and vice versa. We prefer
> > to allocate resources only when needed. It is simpler for users
> > and more efficient for memory.
> >
> >>> With separate FIB
> >>> instances per VRF, you allocate 128 MB only when a VRF is actually
> >>> created at runtime. With the flat multi-VRF approach, you pay
> >>> max_vrfs * 128 MB at startup, even if only one VRF is active.
> >>>
> >>> On top of that, the API uses vrf_id as a direct index (0 to
> >>> max_vrfs-1). As Stephen noted, there are multiple ways to model
> >>> VRFs. Depending on the networking stack, VRFs are identified by
> >>> ifindex (Linux l3mdev), by name (Cisco, Juniper), or by some
> >>> other scheme. This means the application must maintain a mapping
> >>> between its own VRF representation and the FIB slot indices, and
> >>> choose max_vrfs upfront. What is the benefit of this flat
> >>> multi-VRF FIB if the application still needs to manage a
> >>> translation layer and pre-commit memory for VRFs that may never
> >>> exist?
> >> This is the control plane task.
> >>>> As a trade-off for a bad trade-off ;) I can suggest allocating it in
> >>>> chunks. Let's say you are starting with 16 VRFs, and during runtime, if
> >>>> the user wants to increase the number of VRFs above this limit, you can
> >>>> allocate another 16-VRF FIB. Then, of course, you need to split
> >>>> addresses into two bursts, one for each FIB handle.
> >>> But then we are back to regrouping packets -- just by chunk of
> >>> VRFs instead of by individual VRF. If we have to sort the burst
> >>> anyway, what does the flat multi-VRF table buy us?
> >>>
> >>>>>>> I am not too familiar with DPDK FIB internals, but would it be
> >>>>>>> possible to keep a separate TBL24 per VRF and only share the TBL8
> >>>>>>> pool?
> >>>>>> this is how it is implemented right now, with one note - TBL24s are
> >>>>>> pre-allocated.
> >>>>>>> Something like pre-allocating an array of max_vrfs TBL24
> >>>>>>> pointers, allocating each TBL24 on demand at VRF add time,
> >>>>>> and you are suggesting to allocate TBL24 on demand by adding an extra
> >>>>>> indirection layer. This will lead to lower performance, which I would
> >>>>>> like to avoid.
> >>>>>>> and
> >>>>>>> having them all point into a shared TBL8 pool. The TBL8 index in
> >>>>>>> TBL24 entries seems to already be global, so would that work without
> >>>>>>> encoding changes?
> >>>>>>>
> >>>>>>> Going further: could the same idea extend to IPv6? The dir24_8 and
> >>>>>>> trie seem to use the same TBL8 block format (256 entries, same
> >>>>>>> (nh << 1) | ext_bit encoding, same size). Would unifying the TBL8
> >>>>>>> allocator allow a single pool shared across IPv4, IPv6, and all
> >>>>>>> VRFs? That could be a bigger win for /32-heavy and /128-heavy tables
> >>>>>>> and maybe a good first step before multi-VRF.
> >>>>>> So, you are suggesting merging IPv4 and IPv6 into a single unified FIB?
> >>>>>> I'm not sure how this can be a bigger win, could you please elaborate
> >>>>>> more on this?
> >>>>> On the IPv4/IPv6 TBL8 pool: I was not suggesting merging FIBs, just
> >>>>> sharing the TBL8 block allocator between separate FIB instances.
> >>>>> This is possible since dir24_8 and trie use the same TBL8 block
> >>>>> format (256 entries, same encoding, same size).
> >>>>>
> >>>>> Would it be possible to pass a shared TBL8 pool at rte_fib_create()
> >>>>> time? Each FIB keeps its own TBL24 and RIB, but TBL8 is shared
> >>>>> across all FIBs and potentially across IPv4/IPv6. Users would no
> >>>>> longer have to guess num_tbl8 per FIB.
> >>>> Yes, this is possible. However, this will significantly complicate the
> >>>> work with the library while solving a not-so-big problem.
> >>> Your series already shares TBL8 across all VRFs within a single
> >>> FIB -- that part is useful, and it does not require the flat
> >>> multi-VRF TBL24.
> >>>
> >>> In grout, routes arrive from FRR (BGP, OSPF, etc.) at runtime.
> >>> We cannot predict TBL8 usage per VRF in advance
> >> and you don't need it (knowing per-VRF consumption) now. If I understood
> >> your request here properly, do you want to share TBL8 between IPv4 and
> >> IPv6 FIBs? I don't think this is a good idea at all, at least because
> >> splitting them means that if one AF consumes all TBL8 (because of an
> >> attack or a bogus CP), the other AF remains intact.
> > If TBL8 isolation per AF is meant as a protection against route
> > floods, then the same argument applies between VRFs: your series
> > shares TBL8 across all VRFs within a single FIB, so a bogus
> > control plane in one VRF exhausts TBL8 for all other VRFs.
> >
> > But more fundamentally, this is not how route flood protection
> > works. It is handled in the control plane: the routing daemon
> > limits the number of prefixes accepted per BGP session
> > (max-prefix) and selects which routes are installed via prefix
> > filters -- before those routes ever reach the forwarding table.
> >
> > The Linux kernel is a good reference here. IPv6 used to enforce
> > a max_size limit on FIB + cache entries (net.ipv6.route.max_size,
> > defaulting to 4096). It caused real production issues and was
> > removed in kernel 6.3. IPv4 never had a FIB route limit. There
> > is no per-VRF route limit either. The kernel relies entirely on
> > the control plane for route flood protection.
>
> FIB is not the Linux kernel, nor a network stack. We cannot
> rely on control plane protection, since the control plane is 3rd-party
> software.
>
> Also, I think allocating a very algorithm-specific entity such as a pool
> of TBL8s prior to calling rte_fib_create() and passing a pointer to it
> could be confusing for many users and would bloat the API.
>
> FIB supports pluggable lookup algorithms: you can write your own and
> specify a pointer to the tbl8_pool in an algorithm-specific
> configuration defined for your algorithm, where you may also create a
> dynamic table of TBL24 pointers per VRF. If you need any help with
> this task, I would be happy to help.
I have sent this RFC:
https://mails.dpdk.org/archives/dev/2026-March/335512.html
Thanks in advance for your help.
>
> >
> >>> -- it depends on
> >>> prefix length distribution which varies per VRF and changes over
> >>> time. No production LPM (Linux kernel, JunOS, IOS) asks the
> >>> operator to size these structures per routing table upfront.
> >> - they are using different lpm algorithms
> >> - when you use these facilities, developers have properly tuned them. FIB
> >> is a low-level library; it cannot be used without some knowledge, and it
> >> will not solve all problems with a single red button that says "make it
> >> work, with no bugs"
> >> P.S. how do you know how JunOS /IOS implements their LPMs? ;)
> >>
> > I do not need to know their LPM implementation -- I only need
> > to know how they are configured. No production router requires
> > the operator to size internal LPM structures.
> >
> > We can impose a maximum number of IPv4/IPv6 routes on the user
> > -- even though the kernel does not need this either. But TBL8
> > is a different problem: the application cannot predict TBL8
> > consumption because it depends on prefix length distribution,
> > which varies per VRF and changes over time with dynamic routing.
> > Today there is no API to query TBL8 usage, and no API to resize
> > a FIB without destroying it.
> >
> > This is exactly why a shared TBL8 pool across VRFs is useful:
> > VRFs with few long prefixes naturally leave room for VRFs that
> > need more.
>
> but this is already implemented. I don't know why you are repeatedly
> concerned about this. We are aligned on it, and the feature is
> already there, in the patch.
> On the other hand, what we disagreed on is sharing not only across
> VRFs, but also across address families. If you don't understand the
> amount of TBL8 per AF, how would you magically understand the number of
> TBL8s for a merged pool?
>
> > This is the valuable part of your series. But it
> > does not require a flat multi-VRF TBL24 -- separate per-VRF
> > TBL24s sharing a common TBL8 pool would give the same benefit
> > without the 64 GB upfront cost.
>
> and that's a completely different problem. Please, let's separate the
> problems and not mix them up.
>
> I understand your concern about memory consumption. I have some ideas on
> how to solve this problem in parallel to the proposed solution.
>
> >
> >>> Today we do not even have TBL8 usage stats (Robin's series
> >>> addresses that)
> >> I will try to find time to review this patch in the near future
> > Thanks, Robin's TBL8 stats series would help users understand
> > their TBL8 consumption -- a more practical improvement for
> > current users.
> >
> > Regards,
> > Maxime
>
> --
> Regards,
> Vladimir
>
--
Regards,
Maxime
end of thread, other threads:[~2026-04-02 16:51 UTC | newest]
Thread overview: 33+ messages
2026-03-22 15:42 [RFC PATCH 0/4] VRF support in FIB library Vladimir Medvedkin
2026-03-22 15:42 ` [RFC PATCH 1/4] fib: add multi-VRF support Vladimir Medvedkin
2026-03-23 15:48 ` Konstantin Ananyev
2026-03-23 19:06 ` Medvedkin, Vladimir
2026-03-23 22:22 ` Konstantin Ananyev
2026-03-25 14:09 ` Medvedkin, Vladimir
2026-03-26 10:13 ` Konstantin Ananyev
2026-03-27 18:32 ` Medvedkin, Vladimir
2026-03-22 15:42 ` [RFC PATCH 2/4] fib: add VRF functional and unit tests Vladimir Medvedkin
2026-03-22 16:40 ` Stephen Hemminger
2026-03-22 16:41 ` Stephen Hemminger
2026-03-22 15:42 ` [RFC PATCH 3/4] fib6: add multi-VRF support Vladimir Medvedkin
2026-03-22 15:42 ` [RFC PATCH 4/4] fib6: add VRF functional and unit tests Vladimir Medvedkin
2026-03-22 16:45 ` Stephen Hemminger
2026-03-22 16:43 ` [RFC PATCH 0/4] VRF support in FIB library Stephen Hemminger
2026-03-23 9:01 ` Morten Brørup
2026-03-23 11:32 ` Medvedkin, Vladimir
2026-03-23 11:16 ` Medvedkin, Vladimir
2026-03-23 9:54 ` Robin Jarry
2026-03-23 11:34 ` Medvedkin, Vladimir
2026-03-23 11:27 ` Maxime Leroy
2026-03-23 12:49 ` Medvedkin, Vladimir
2026-03-23 14:53 ` Maxime Leroy
2026-03-23 15:08 ` Robin Jarry
2026-03-23 15:27 ` Morten Brørup
2026-03-23 18:52 ` Medvedkin, Vladimir
2026-03-23 18:42 ` Medvedkin, Vladimir
2026-03-24 9:19 ` Maxime Leroy
2026-03-25 15:56 ` Medvedkin, Vladimir
2026-03-25 21:43 ` Maxime Leroy
2026-03-27 18:27 ` Medvedkin, Vladimir
2026-04-02 16:51 ` Maxime Leroy
2026-03-23 19:05 ` Stephen Hemminger