From: "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com>
To: Konstantin Ananyev <konstantin.ananyev@huawei.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: "rjarry@redhat.com" <rjarry@redhat.com>,
"nsaxena16@gmail.com" <nsaxena16@gmail.com>,
"mb@smartsharesystems.com" <mb@smartsharesystems.com>,
"adwivedi@marvell.com" <adwivedi@marvell.com>,
"jerinjacobk@gmail.com" <jerinjacobk@gmail.com>,
Maxime Leroy <maxime@leroys.fr>,
Vladimir Medvedkin <medvedkinv@gmail.com>
Subject: Re: [RFC PATCH 1/4] fib: add multi-VRF support
Date: Fri, 27 Mar 2026 18:32:41 +0000 [thread overview]
Message-ID: <165a7482-5fcb-44bb-befb-a0bde1cb4ec1@intel.com> (raw)
In-Reply-To: <11250ee33c514310aa034c0f7ae0d8e5@huawei.com>
On 3/26/2026 10:13 AM, Konstantin Ananyev wrote:
>
>>>>>> Add VRF (Virtual Routing and Forwarding) support to the IPv4
>>>>>> FIB library, allowing multiple independent routing tables
>>>>>> within a single FIB instance.
>>>>>>
>>>>>> Introduce max_vrfs and vrf_default_nh fields in rte_fib_conf
>>>>>> to configure the number of VRFs and per-VRF default nexthops.
>>>>> Thanks Vladimir, allowing multiple VRFs per same LPM table will
>>>>> definitely be a useful thing to have.
>>>>> Though, I have the same concern as Maxime:
>>>>> memory requirements are just overwhelming.
>>>>> Stupid q - why just not to store a pointer to a vector of next-hops
>>>>> within the table entry?
>>>> Do I understand correctly: a vector with max_number_of_vrfs entries, and
>>>> use the vrf id to address a nexthop?
>>> Yes.
>> Here I can see 2 problems:
>>
>> 1. tbl entries must be the size of a pointer, so no way to use smaller sizes
> Yes, but as we are talking about storing nexthops for multiple VRFs anyway,
> I don't think it is a big deal.
>
>> 2. those vectors will be sparsely populated and, depending on the
>> runtime configuration, may consume a lot of memory too (as Robin
>> mentioned they may have 1024 VRFs)
> Yes, each VRF vector can become really sparse and we waste a lot of memory.
> If that's an issue, we probably can think about something smarter
> than a simple flat array indexed by vrf-id: something like a 2-level B-tree or so.
> The main positives that I see in that approach:
> - low extra overhead at lookup - one/two extra pointer de-references.
I'm afraid the overhead will be comparatively large, just because the
current implementation is fast and most likely hits with a single memory
access. However, for a low number of VRFs a B-tree may be a good solution
> - it allows CP to allocate/free space for each such vector separately,
> so we don't need to pre-allocate memory for max possible entries at startup.
>
>>>> Yes, this may work.
>>>> But, if we are going to do an extra memory access, I'd rather
>>>> maintain an internal hash table with 5 byte keys {24_bits_from_LPM,
>>>> 16_bits_vrf_id} to retrieve a nexthop.
>>> Hmm... and what to do with entries in tbl8, I mean what will be the key for
>> them?
>>> Or you don't plan to put entries from tbl8 to that hash table?
>> The idea is to have a single LPM struct with a join superset of all
>> prefixes existing in all VRFs. Each prefix in this LPM struct has its
>> own unique "nexthop", which is not the final next hop, but an
>> intermediate metadata defining this unique prefix. Then, the following
>> search is performed with the key containing this intermediate metadata +
>> vrf_id in some exact match database like hash table. This approach is
>> the most memory friendly, since there is only one LPM data struct (which
>> scales well with number of prefixes it has) with intermediate entries
>> only 4b long.
>> On the other hand it requires an extra search, so lookup will be slower.
>> Also, some current LPM optimizations, like tbl8 collapsing if all tbl8
>> entries have a similar value, will be gone.
> Yes, and yes :)
> Yes it would help to save memory, and yes lookup will most likely be slower.
> The other thing that I consider as a possible drawback here - with current rte_hash
> implementation we still need to allocate space for all possible max entries at startup.
I don't think this is a big problem, since the size of this memory will
be reasonable and will not grow linearly with the number of VRFs. So I
agree it is an acceptable trade-off
> But that's not new in DPDK, and for most cases it is considered as acceptable trade-off.
> Overall, it seems like a possible approach to me, I suppose the main question is:
> what will be the price of that extra hash-lookup here.
And this is the key problem. I don't think rte_hash is well suited here;
ideally we need some kind of perfect hash. I have a few ideas on this,
stay tuned :)
>
> Again there is a bulk version of hash lookup and in theory it can be
> improved further (avx512 version on x86?).
>
>>>>> And we can provide the user with the ability to specify custom
>>>>> alloc/free functions for these vectors.
>>>>> That would help to avoid allocating huge chunks of memory at startup.
>>>>> I understand that it will be one extra memory dereference,
>>>>> but probably it will not be that critical in terms of performance.
>>>>> Again for bulk function we might be able to pipeline lookups and
>>>>> de-references and hide that extra load latency.
>>>>>
>>>>>> Add four new experimental APIs:
>>>>>> - rte_fib_vrf_add() and rte_fib_vrf_delete() to manage routes
>>>>>> per VRF
>>>>>> - rte_fib_vrf_lookup_bulk() for multi-VRF bulk lookups
>>>>>> - rte_fib_vrf_get_rib() to retrieve a per-VRF RIB handle
>>>>>>
>>>>>> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
>>>>>> ---
>>>>>> lib/fib/dir24_8.c | 241 ++++++++++++++++------
>>>>>> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
>>>>>> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
>>>>>> lib/fib/dir24_8_avx512.h | 80 +++++++-
>>>>>> lib/fib/rte_fib.c | 158 ++++++++++++---
>>>>>> lib/fib/rte_fib.h | 94 ++++++++-
>>>>>> 6 files changed, 988 insertions(+), 260 deletions(-)
>>>>>>
>>>> <snip>
>>>>
>>>> --
>>>> Regards,
>>>> Vladimir
>>>>
>> --
>> Regards,
>> Vladimir
>>
--
Regards,
Vladimir
Thread overview: 33+ messages
2026-03-22 15:42 [RFC PATCH 0/4] VRF support in FIB library Vladimir Medvedkin
2026-03-22 15:42 ` [RFC PATCH 1/4] fib: add multi-VRF support Vladimir Medvedkin
2026-03-23 15:48 ` Konstantin Ananyev
2026-03-23 19:06 ` Medvedkin, Vladimir
2026-03-23 22:22 ` Konstantin Ananyev
2026-03-25 14:09 ` Medvedkin, Vladimir
2026-03-26 10:13 ` Konstantin Ananyev
2026-03-27 18:32 ` Medvedkin, Vladimir [this message]
2026-03-22 15:42 ` [RFC PATCH 2/4] fib: add VRF functional and unit tests Vladimir Medvedkin
2026-03-22 16:40 ` Stephen Hemminger
2026-03-22 16:41 ` Stephen Hemminger
2026-03-22 15:42 ` [RFC PATCH 3/4] fib6: add multi-VRF support Vladimir Medvedkin
2026-03-22 15:42 ` [RFC PATCH 4/4] fib6: add VRF functional and unit tests Vladimir Medvedkin
2026-03-22 16:45 ` Stephen Hemminger
2026-03-22 16:43 ` [RFC PATCH 0/4] VRF support in FIB library Stephen Hemminger
2026-03-23 9:01 ` Morten Brørup
2026-03-23 11:32 ` Medvedkin, Vladimir
2026-03-23 11:16 ` Medvedkin, Vladimir
2026-03-23 9:54 ` Robin Jarry
2026-03-23 11:34 ` Medvedkin, Vladimir
2026-03-23 11:27 ` Maxime Leroy
2026-03-23 12:49 ` Medvedkin, Vladimir
2026-03-23 14:53 ` Maxime Leroy
2026-03-23 15:08 ` Robin Jarry
2026-03-23 15:27 ` Morten Brørup
2026-03-23 18:52 ` Medvedkin, Vladimir
2026-03-23 18:42 ` Medvedkin, Vladimir
2026-03-24 9:19 ` Maxime Leroy
2026-03-25 15:56 ` Medvedkin, Vladimir
2026-03-25 21:43 ` Maxime Leroy
2026-03-27 18:27 ` Medvedkin, Vladimir
2026-04-02 16:51 ` Maxime Leroy
2026-03-23 19:05 ` Stephen Hemminger