From: "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com>
To: Konstantin Ananyev <konstantin.ananyev@huawei.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: "rjarry@redhat.com" <rjarry@redhat.com>,
"nsaxena16@gmail.com" <nsaxena16@gmail.com>,
"mb@smartsharesystems.com" <mb@smartsharesystems.com>,
"adwivedi@marvell.com" <adwivedi@marvell.com>,
"jerinjacobk@gmail.com" <jerinjacobk@gmail.com>,
Maxime Leroy <maxime@leroys.fr>,
Vladimir Medvedkin <medvedkinv@gmail.com>
Subject: Re: [RFC PATCH 1/4] fib: add multi-VRF support
Date: Fri, 27 Mar 2026 18:32:41 +0000 [thread overview]
Message-ID: <165a7482-5fcb-44bb-befb-a0bde1cb4ec1@intel.com> (raw)
In-Reply-To: <11250ee33c514310aa034c0f7ae0d8e5@huawei.com>
On 3/26/2026 10:13 AM, Konstantin Ananyev wrote:
>
>>>>>> Add VRF (Virtual Routing and Forwarding) support to the IPv4
>>>>>> FIB library, allowing multiple independent routing tables
>>>>>> within a single FIB instance.
>>>>>>
>>>>>> Introduce max_vrfs and vrf_default_nh fields in rte_fib_conf
>>>>>> to configure the number of VRFs and per-VRF default nexthops.
>>>>> Thanks Vladimir, allowing multiple VRFs per same LPM table will
>>>>> definitely be a useful thing to have.
>>>>> Though, I have the same concern as Maxime:
>>>>> memory requirements are just overwhelming.
>>>>> Stupid q - why just not to store a pointer to a vector of next-hops
>>>>> within the table entry?
>>>> Do I understand correctly: a vector with max_number_of_vrfs entries, and
>>>> use the vrf id to address a nexthop?
>>> Yes.
>> Here I can see 2 problems:
>>
>> 1. tbl entries must be the size of a pointer, so no way to use smaller sizes
> Yes, but as we are talking about storing nexthops for multiple VRFs anyway,
> I don't think it is a big deal.
>
>> 2. those vectors will be sparsely populated and, depending on the
>> runtime configuration, may consume a lot of memory too (as Robin
>> mentioned they may have 1024 VRFs)
> Yes, each VRF vector can become really sparse and we waste a lot of memory.
> If that's an issue, we probably can think about something smarter
> than a simple flat array indexed by vrf-id: something like a 2-level B-tree or so.
> The main positives that I see in that approach:
> - low extra overhead at lookup - one/two extra pointer de-references.
I'm afraid the overhead will be comparatively large, just because the
current implementation is fast and most likely hits with a single memory
access. However, for a low number of VRFs a B-tree may be a good solution
> - it allows CP to allocate/free space for each such vector separately,
> so we don't need to pre-allocate memory for max possible entries at startup.
>
>>>> Yes, this may work.
>>>> But, if we are going to do an extra memory access, I'd rather
>>>> maintain an internal hash table with 5 byte keys {24_bits_from_LPM,
>>>> 16_bits_vrf_id} to retrieve a nexthop.
>>> Hmm... and what to do with entries in tbl8, I mean what will be the key for
>> them?
>>> Or you don't plan to put entries from tbl8 to that hash table?
>> The idea is to have a single LPM struct with a join superset of all
>> prefixes existing in all VRFs. Each prefix in this LPM struct has its
>> own unique "nexthop", which is not the final next hop, but an
>> intermediate metadata defining this unique prefix. Then, the following
>> search is performed with the key containing this intermediate metadata +
>> vrf_id in some exact match database like hash table. This approach is
>> the most memory friendly, since there is only one LPM data struct (which
>> scales well with number of prefixes it has) with intermediate entries
>> only 4b long.
>> On the other hand it requires an extra search, so lookup will be slower.
>> Also, some current LPM optimizations, like tbl8 collapsing if all tbl8
>> entries have a similar value, will be gone.
> Yes, and yes :)
> Yes it would help to save memory, and yes lookup will most likely be slower.
> The other thing that I consider as a possible drawback here - with current rte_hash
> implementation we still need to allocate space for all possible max entries at startup.
I don't think this is a big problem, since the size of this memory will
be reasonable and will not grow linearly with the number of VRFs. So I
agree it is an acceptable trade-off
> But that's not new in DPDK, and for most cases it is considered as acceptable trade-off.
> Overall, it seems like a possible approach to me, I suppose the main question is:
> what will be the price of that extra hash-lookup here.
And this is the key problem. I don't think rte_hash is well suited here;
ideally we need some kind of perfect hash. I have a few ideas on this,
stay tuned :)
>
> Again there is a bulk version of hash lookup and in theory it can be
> improved further (avx512 version on x86?).
>
>>>>> And we can provide the user with the ability to specify custom
>>>>> alloc/free functions for these vectors.
>>>>> That would help to avoid allocating huge chunks of memory at startup.
>>>>> I understand that it will be one extra memory dereference,
>>>>> but probably it will not be that critical in terms of performance.
>>>>> Again for bulk function we might be able to pipeline lookups and
>>>>> de-references and hide that extra load latency.
>>>>>
>>>>>> Add four new experimental APIs:
>>>>>> - rte_fib_vrf_add() and rte_fib_vrf_delete() to manage routes
>>>>>> per VRF
>>>>>> - rte_fib_vrf_lookup_bulk() for multi-VRF bulk lookups
>>>>>> - rte_fib_vrf_get_rib() to retrieve a per-VRF RIB handle
>>>>>>
>>>>>> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
>>>>>> ---
>>>>>> lib/fib/dir24_8.c | 241 ++++++++++++++++------
>>>>>> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
>>>>>> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
>>>>>> lib/fib/dir24_8_avx512.h | 80 +++++++-
>>>>>> lib/fib/rte_fib.c | 158 ++++++++++++---
>>>>>> lib/fib/rte_fib.h | 94 ++++++++-
>>>>>> 6 files changed, 988 insertions(+), 260 deletions(-)
>>>>>>
>>>> <snip>
>>>>
>>>> --
>>>> Regards,
>>>> Vladimir
>>>>
>> --
>> Regards,
>> Vladimir
>>
--
Regards,
Vladimir
Thread overview: 33+ messages
2026-03-22 15:42 [RFC PATCH 0/4] VRF support in FIB library Vladimir Medvedkin
2026-03-22 15:42 ` [RFC PATCH 1/4] fib: add multi-VRF support Vladimir Medvedkin
2026-03-23 15:48 ` Konstantin Ananyev
2026-03-23 19:06 ` Medvedkin, Vladimir
2026-03-23 22:22 ` Konstantin Ananyev
2026-03-25 14:09 ` Medvedkin, Vladimir
2026-03-26 10:13 ` Konstantin Ananyev
2026-03-27 18:32 ` Medvedkin, Vladimir [this message]
2026-03-22 15:42 ` [RFC PATCH 2/4] fib: add VRF functional and unit tests Vladimir Medvedkin
2026-03-22 16:40 ` Stephen Hemminger
2026-03-22 16:41 ` Stephen Hemminger
2026-03-22 15:42 ` [RFC PATCH 3/4] fib6: add multi-VRF support Vladimir Medvedkin
2026-03-22 15:42 ` [RFC PATCH 4/4] fib6: add VRF functional and unit tests Vladimir Medvedkin
2026-03-22 16:45 ` Stephen Hemminger
2026-03-22 16:43 ` [RFC PATCH 0/4] VRF support in FIB library Stephen Hemminger
2026-03-23 9:01 ` Morten Brørup
2026-03-23 11:32 ` Medvedkin, Vladimir
2026-03-23 11:16 ` Medvedkin, Vladimir
2026-03-23 9:54 ` Robin Jarry
2026-03-23 11:34 ` Medvedkin, Vladimir
2026-03-23 11:27 ` Maxime Leroy
2026-03-23 12:49 ` Medvedkin, Vladimir
2026-03-23 14:53 ` Maxime Leroy
2026-03-23 15:08 ` Robin Jarry
2026-03-23 15:27 ` Morten Brørup
2026-03-23 18:52 ` Medvedkin, Vladimir
2026-03-23 18:42 ` Medvedkin, Vladimir
2026-03-24 9:19 ` Maxime Leroy
2026-03-25 15:56 ` Medvedkin, Vladimir
2026-03-25 21:43 ` Maxime Leroy
2026-03-27 18:27 ` Medvedkin, Vladimir
2026-04-02 16:51 ` Maxime Leroy
2026-03-23 19:05 ` Stephen Hemminger