From: Ido Schimmel <idosch@nvidia.com>
To: Cosmin Ratiu <cratiu@nvidia.com>
Cc: netdev@vger.kernel.org, David Ahern <dsahern@kernel.org>,
Kuniyuki Iwashima <kuniyu@google.com>,
"David S . Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Simon Horman <horms@kernel.org>,
Paolo Abeni <pabeni@redhat.com>
Subject: Re: [PATCH v3 net-next 2/3] ipv4: Flush the FIB once on multiple nexthop removal
Date: Thu, 7 May 2026 14:40:53 +0300
Message-ID: <20260507114053.GB908463@shredder>
In-Reply-To: <20260507075606.322405-3-cratiu@nvidia.com>

On Thu, May 07, 2026 at 10:56:05AM +0300, Cosmin Ratiu wrote:
> When a device is going down or when a net namespace is deleted, all
> nexthops on it are removed, and for each nexthop being removed the FIB
> table is flushed, which does a full trie traversal looking for entries
> marked RTNH_F_DEAD and removing them. This is O(N x R), with N being
> the number of nexthops being removed and R the number of IPv4 routes.
>
> The RTNL is held the entire time.
>
> When there are many nexthops to be removed and many routing entries,
> this can result in the RTNL being held for multiple minutes, which
> causes unhappiness in other processes trying to acquire the RTNL (e.g.
> systemd-networkd for DHCP renewals).
>
> In a complicated deployment with multiple VXLAN devices, each having
> 16K nexthops, and a total of 128K IPv4 routes, this is exactly what
> happens:
>
> nexthop_flush_dev()                  # loops over 16K nexthops
>   -> remove_nexthop()
>     -> __remove_nexthop()
>       -> __remove_nexthop_fib()      # marks fi->fib_flags |= RTNH_F_DEAD
>         -> fib_flush()               # for EACH nexthop!
>           -> fib_table_flush()       # walks the ENTIRE FIB, 128K entries
>
> This patch uses the FIB flushing signal added in the previous patch to
> do only a single FIB flush after all nexthops to be removed are marked
> as RTNH_F_DEAD:
> - __remove_nexthop_fib() no longer flushes the FIB.
> - nexthop_flush_dev() and flush_all_nexthops() now keep track of whether
>   any nexthop was removed and trigger a FIB flush at the end.
> - A new wrapper, remove_one_nexthop(), calls remove_nexthop() and
>   flushes if necessary. It is intended for places which must remove a
>   single nexthop and should not have to worry about triggering a FIB
>   flush. For now, the only caller is rtm_del_nexthop().
> - The two direct callers of __remove_nexthop() get a WARN_ON_ONCE(),
>   since the nexthop about to be removed should not have any FIB entries
>   referencing it when replacing or inserting a new one.
>
> This dramatically improves performance from O(N x R) to O(N + R).
>
> Releasing a nexthop reference in remove_nexthop() no longer frees the
> nexthop. Instead, it is freed when the last fib_info pointing to it is
> freed via free_fib_info_rcu(). All routing code already ignores routes
> marked with RTNH_F_DEAD.
>
> Tested with:
> DEV=eth2
> ip link set up dev $DEV
> ip link add testnh0 link $DEV type macvlan mode bridge
> ip addr add 198.51.100.1/24 dev testnh0
> ip link set testnh0 up
>
> seq 1 65536 | \
>   sed 's/.*/nexthop add id & via 198.51.100.2 dev testnh0/' | \
>   ip -batch -
>
> i=1
> for a in $(seq 0 255); do
>   for b in $(seq 0 255); do
>     echo "route add 10.${a}.${b}.0/32 nhid $i"
>     i=$((i + 1))
>   done
> done | ip -batch -
>
> time ip link set testnh0 down
> ip link del testnh0
>
> Without this patch:
> real 0m32.601s
> user 0m0.000s
> sys 0m32.511s
>
> With this patch:
> real 0m0.209s
> user 0m0.000s
> sys 0m0.153s
>
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>