From: Ido Schimmel <idosch@idosch.org>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: davem@davemloft.net, kuba@kernel.org, roopa@nvidia.com,
dsahern@kernel.org, m@lambda.lt, john.fastabend@gmail.com,
netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH net-next 4/4] net, neigh: Add NTF_MANAGED flag for managed neighbor entries
Date: Tue, 12 Oct 2021 19:31:38 +0300 [thread overview]
Message-ID: <YWW4alF5eSUS0QVK@shredder> (raw)
In-Reply-To: <20211011121238.25542-5-daniel@iogearbox.net>
On Mon, Oct 11, 2021 at 02:12:38PM +0200, Daniel Borkmann wrote:
> Allow a user space control plane to insert entries with a new NTF_EXT_MANAGED
> flag. The flag then indicates to the kernel that the neighbor entry should be
> periodically probed for keeping the entry in NUD_REACHABLE state iff possible.
Nice idea
>
> The use case for this is targeting XDP or tc BPF load-balancers which use
> the bpf_fib_lookup() BPF helper in order to piggyback on neighbor resolution
> for their backends. Given they cannot be resolved in fast-path,
Out of curiosity, can you explain why that is? Because XDP is only fast
path? At least that's what I understand from commit 87f5fc7e48dd ("bpf:
Provide helper to do forwarding lookups in kernel FIB table") and it is
similar to L3 offload
> a control plane inserts the L3 (without L2) entries manually into the
> neighbor table and lets the kernel do the neighbor resolution either
> on the gateway or on the backend directly in case the latter resides
> in the same L2. This avoids to deal with L2 in the control plane and
> to rebuild what the kernel already does best anyway.
Are you using 'fib_multipath_use_neigh' sysctl to avoid going through
failed nexthops? Looking at how the bpf_fib_lookup() helper is
implemented, seems that you can benefit from it in XDP
>
> NTF_EXT_MANAGED can be combined with NTF_EXT_LEARNED in order to avoid GC
> eviction. The kernel then adds NTF_MANAGED flagged entries to a per-neighbor
> table which gets triggered by the system work queue to periodically call
> neigh_event_send() for performing the resolution. The implementation allows
> migration from/to NTF_MANAGED neighbor entries, so that already existing
> entries can be converted by the control plane if needed. Potentially, we could
> make the interval for periodically calling neigh_event_send() configurable;
> right now it's set to DELAY_PROBE_TIME which is also in line with mlxsw which
> has similar driver-internal infrastructure c723c735fa6b ("mlxsw: spectrum_router:
> Periodically update the kernel's neigh table"). In future, the latter could
> possibly reuse the NTF_MANAGED neighbors as well.
Yes, mlxsw can set this flag on neighbours used for its nexthops. Looks
like the use cases are similar: Avoid going to slow path, either from
XDP or HW.
In our HW the nexthop table is squashed together with the neighbour
table, so that it provides {netdev, MAC} and not {netdev, IP} with which
the kernel performs another lookup in its neighbour table. We want to
avoid situations where we perform multipathing between valid and failed
nexthop (basically, fib_multipath_use_neigh=1), so we only program valid
nexthop. But it means that nothing will trigger the resolution of the
failed nexthops, thus the need to probe the neighbours.
>
> Example:
>
> # ./ip/ip n replace 192.168.178.30 dev enp5s0 managed extern_learn
> # ./ip/ip n
> 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a managed extern_learn REACHABLE
> [...]
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Roopa Prabhu <roopa@nvidia.com>
> Link: https://linuxplumbersconf.org/event/11/contributions/953/
I was going to ask why not just default the kernel to resolve GW IPs (it
knows them when the nexthops are configured), but then I saw slide 34. I
guess that's what you meant by "... or on the backend directly in case
the latter resides in the same L2"?
next prev parent reply other threads:[~2021-10-12 16:31 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-11 12:12 [PATCH net-next 0/4] Managed Neighbor Entries Daniel Borkmann
2021-10-11 12:12 ` [PATCH net-next 1/4] net, neigh: Fix NTF_EXT_LEARNED in combination with NTF_USE Daniel Borkmann
2021-10-12 14:23 ` David Ahern
2021-10-11 12:12 ` [PATCH net-next 2/4] net, neigh: Enable state migration between NUD_PERMANENT and NTF_USE Daniel Borkmann
2021-10-12 14:25 ` David Ahern
2021-10-11 12:12 ` [PATCH net-next 3/4] net, neigh: Extend neigh->flags to 32 bit to allow for extensions Daniel Borkmann
2021-10-12 14:31 ` David Ahern
2021-10-12 14:46 ` Daniel Borkmann
2021-10-12 21:46 ` Jakub Kicinski
2021-10-11 12:12 ` [PATCH net-next 4/4] net, neigh: Add NTF_MANAGED flag for managed neighbor entries Daniel Borkmann
2021-10-12 14:51 ` David Ahern
2021-10-12 15:05 ` Daniel Borkmann
2021-10-12 15:26 ` Daniel Borkmann
2021-10-12 16:31 ` Ido Schimmel [this message]
2021-10-13 9:26 ` Daniel Borkmann
2021-10-13 9:59 ` Ido Schimmel
2021-10-13 11:23 ` Daniel Borkmann
2021-10-13 14:10 ` David Ahern
2022-01-31 20:43 ` Eric Dumazet
2022-01-31 21:17 ` Daniel Borkmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YWW4alF5eSUS0QVK@shredder \
--to=idosch@idosch.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=m@lambda.lt \
--cc=netdev@vger.kernel.org \
--cc=roopa@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).