Netdev List
From: Ido Schimmel <idosch@nvidia.com>
To: Cosmin Ratiu <cratiu@nvidia.com>
Cc: netdev@vger.kernel.org, David Ahern <dsahern@kernel.org>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Simon Horman <horms@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>
Subject: Re: [PATCH v3 net-next 2/3] ipv4: Flush the FIB once on multiple nexthop removal
Date: Thu, 7 May 2026 14:40:53 +0300
Message-ID: <20260507114053.GB908463@shredder>
In-Reply-To: <20260507075606.322405-3-cratiu@nvidia.com>

On Thu, May 07, 2026 at 10:56:05AM +0300, Cosmin Ratiu wrote:
> When a device is going down or when a net namespace is deleted, all
> nexthops on it are removed, and for each nexthop being removed the FIB
> table is flushed, which does a full trie traversal looking for entries
> marked RTNH_F_DEAD and removing them. This is O(N x R), with N being
> number of dev nexthops and R being number of IPv4 routes.
> 
> The RTNL is held the entire time.
> 
> When there are many nexthops to be removed and many routing entries,
> this can result in the RTNL being held for multiple minutes, which
> causes unhappiness in other processes trying to acquire the RTNL (e.g.
> systemd-networkd for DHCP renewals).
> 
> In a complicated deployment with multiple vxlan devices, each having
> 16K nexthops and a total of 128K IPv4 routes, this is exactly what
> happens:
> 
> nexthop_flush_dev()                # loops over 16K nexthops
>   -> remove_nexthop()
>     -> __remove_nexthop()
>       -> __remove_nexthop_fib()    # marks fi->fib_flags |= RTNH_F_DEAD
>         -> fib_flush()             # for EACH nexthop!
>           -> fib_table_flush()     # walks the ENTIRE FIB, 128K entries
> 
> This patch makes use of the previously added FIB flushing signal to only
> do a single FIB flush after all nexthops to be removed are marked as
> RTNH_F_DEAD:
> - __remove_nexthop_fib() no longer flushes the FIB.
> - nexthop_flush_dev() and flush_all_nexthops() now keep track of whether
>   any nexthop was removed and trigger a FIB flush at the end.
> - A new wrapper, remove_one_nexthop(), calls remove_nexthop() and
>   flushes if necessary. This is intended for places which must remove a
>   single nexthop and shouldn't worry about the need to trigger a FIB
>   flush. For now, the only caller is rtm_del_nexthop().
> - The two direct callers of __remove_nexthop() get a WARN_ON_ONCE, since
>   the nh about to be removed should not have any FIB entries referencing
>   it when replacing or inserting a new one.
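
The batching described in the list above can be illustrated with a toy model in user-space C. This is only a sketch under stated assumptions, not the kernel code: toy_route, toy_nexthop, toy_fib and every toy_* function are hypothetical names, the FIB is modeled as a flat array rather than a trie, and each nexthop is assumed to own a contiguous range of routes. The visited counter records how many entries the flushes examine, so the two strategies can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct toy_route {
	bool dead;	/* models RTNH_F_DEAD */
	bool present;	/* entry is still in the table */
};

struct toy_nexthop {
	size_t first;	/* index of the first route using this nexthop */
	size_t count;	/* number of routes using this nexthop */
};

struct toy_fib {
	struct toy_route *routes;
	size_t n_routes;
	unsigned long visited;	/* entries examined by all flushes so far */
};

/* Mark the routes of one nexthop as dead; report whether a flush is needed. */
bool toy_mark_dead(struct toy_fib *fib, const struct toy_nexthop *nh)
{
	assert(nh->first + nh->count <= fib->n_routes);
	for (size_t i = 0; i < nh->count; i++)
		fib->routes[nh->first + i].dead = true;
	return nh->count > 0;
}

/* Walk the whole table and drop dead entries; returns how many were removed. */
size_t toy_flush(struct toy_fib *fib)
{
	size_t removed = 0;

	for (size_t i = 0; i < fib->n_routes; i++) {
		struct toy_route *rt = &fib->routes[i];

		if (!rt->present)
			continue;
		fib->visited++;
		if (rt->dead) {
			rt->present = false;
			removed++;
		}
	}
	return removed;
}

/* Old behaviour in this model: one full flush per removed nexthop. */
size_t toy_remove_eager(struct toy_fib *fib, const struct toy_nexthop *nhs,
			size_t n)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++)
		if (toy_mark_dead(fib, &nhs[i]))
			removed += toy_flush(fib);
	return removed;
}

/* New behaviour in this model: mark everything, then flush at most once. */
size_t toy_remove_deferred(struct toy_fib *fib, const struct toy_nexthop *nhs,
			   size_t n)
{
	bool need_flush = false;

	for (size_t i = 0; i < n; i++)
		if (toy_mark_dead(fib, &nhs[i]))
			need_flush = true;
	return need_flush ? toy_flush(fib) : 0;
}
```

With three nexthops owning two routes each in a table of eight routes, toy_remove_eager() examines 8 + 6 + 4 = 18 entries while toy_remove_deferred() examines 8; both remove the same six routes.
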
> 
> This dramatically improves performance from O(N x R) to O(N + R).
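
As a rough worked example for the deployment described earlier, assuming "16K" means 16384 nexthops on the device going down and "128K" means 131072 routes, and ignoring that the table shrinks as entries are removed (so the first figure is an upper bound):

```latex
\begin{align*}
N \times R &= 2^{14} \times 2^{17} = 2^{31} \approx 2.1 \times 10^{9} \text{ steps} \\
N + R &= 16384 + 131072 = 147456 \text{ steps}
\end{align*}
```
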
> 
> Releasing a nexthop reference in remove_nexthop() no longer frees
> it. Instead, it is deleted when the last fib_info pointing to it gets
> freed via free_fib_info_rcu(). All routing code is already careful not
> to take into consideration routes marked with RTNH_F_DEAD.
> 
> Tested with:
> DEV=eth2
> ip link set up dev $DEV
> ip link add testnh0 link $DEV type macvlan mode bridge
> ip addr add 198.51.100.1/24 dev testnh0
> ip link set testnh0 up
> 
> seq 1 65536 | \
> sed 's/.*/nexthop add id & via 198.51.100.2 dev testnh0/' | \
> ip -batch -
> 
> i=1
> for a in $(seq 0 255); do
>   for b in $(seq 0 255); do
>     echo "route add 10.${a}.${b}.0/32 nhid $i"
>     i=$((i + 1))
>   done
> done | ip -batch -
> 
> time ip link set testnh0 down
> ip link del testnh0
> 
> Without this patch:
> real	0m32.601s
> user	0m0.000s
> sys	0m32.511s
> 
> With this patch:
> real	0m0.209s
> user	0m0.000s
> sys	0m0.153s
> 
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>

Thread overview: 9+ messages
2026-05-07  7:56 [PATCH net-next v3 0/3] ipv4: Flush the FIB once on multiple nexthop removal Cosmin Ratiu
2026-05-07  7:56 ` [PATCH v3 net-next 1/3] ipv4: Provide a FIB flushing signal from nexthop removal functions Cosmin Ratiu
2026-05-07 11:40   ` Ido Schimmel
2026-05-07  7:56 ` [PATCH v3 net-next 2/3] ipv4: Flush the FIB once on multiple nexthop removal Cosmin Ratiu
2026-05-07 11:40   ` Ido Schimmel [this message]
2026-05-07  7:56 ` [PATCH v3 net-next 3/3] ipv4: Add __must_check to nexthop removal functions Cosmin Ratiu
2026-05-07 11:41   ` Ido Schimmel
2026-05-07 14:57 ` [PATCH net-next v3 0/3] ipv4: Flush the FIB once on multiple nexthop removal David Ahern
2026-05-10 17:20 ` patchwork-bot+netdevbpf
