* [PATCH net-next v3 0/3] ipv4: Flush the FIB once on multiple nexthop removal
From: Cosmin Ratiu @ 2026-05-07 7:56 UTC (permalink / raw)
To: netdev
Cc: David Ahern, Ido Schimmel, Kuniyuki Iwashima, David S . Miller,
Eric Dumazet, Jakub Kicinski, Simon Horman, Paolo Abeni,
Cosmin Ratiu
This series speeds up the removal of multiple nexthops: instead of doing
a FIB flush for each nexthop being removed, it does a single FIB flush
after all of them have been removed.
This dramatically improves performance in scenarios with many nexthops
and many IPv4 routes. Please see the individual patches for more
details and for a test scenario.
V2 -> V3: https://lore.kernel.org/netdev/8fea4084-c9ec-472a-b8ab-ecc87e537216@kernel.org/T/#t
- Split the patch into 3 (Ido Schimmel, David Ahern)
- Used WARN_ON_ONCE instead of WARN_ON (Ido Schimmel)
V1 -> V2:
- Fixed reverse xmas tree ordering in a couple of places (Kuniyuki Iwashima)
- Added __must_check to remove_nexthop_from_groups() (Kuniyuki Iwashima)
Cosmin Ratiu (3):
ipv4: Provide a FIB flushing signal from nexthop removal functions
ipv4: Flush the FIB once on multiple nexthop removal
ipv4: Add __must_check to nexthop removal functions
net/ipv4/nexthop.c | 88 +++++++++++++++++++++++++++++-----------------
1 file changed, 56 insertions(+), 32 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v3 net-next 1/3] ipv4: Provide a FIB flushing signal from nexthop removal functions
From: Cosmin Ratiu @ 2026-05-07 7:56 UTC (permalink / raw)
To: netdev
Cc: David Ahern, Ido Schimmel, Kuniyuki Iwashima, David S . Miller,
Eric Dumazet, Jakub Kicinski, Simon Horman, Paolo Abeni,
Cosmin Ratiu
Plumb a bool return value through the various nexthop removal functions.
It is determined in the innermost __remove_nexthop_fib() (which still
does the FIB flushing) and propagated up through all callers.
The next patch will make use of this signal to optimize the removal of
multiple nexthops by moving the FIB flushing up the call hierarchy.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
---
net/ipv4/nexthop.c | 50 +++++++++++++++++++++++++++-------------------
1 file changed, 30 insertions(+), 20 deletions(-)
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index f92fcc39fc4c..7177092d2605 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -20,7 +20,7 @@
#define NH_RES_DEFAULT_IDLE_TIMER (120 * HZ)
#define NH_RES_DEFAULT_UNBALANCED_TIMER 0 /* No forced rebalancing. */
-static void remove_nexthop(struct net *net, struct nexthop *nh,
+static bool remove_nexthop(struct net *net, struct nexthop *nh,
struct nl_info *nlinfo);
#define NH_DEV_HASHBITS 8
@@ -2016,7 +2016,7 @@ static void nh_hthr_group_rebalance(struct nh_group *nhg)
}
}
-static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
+static bool remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
struct nl_info *nlinfo,
struct list_head *deferred_free)
{
@@ -2033,10 +2033,8 @@ static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
newg = nhg->spare;
/* last entry, keep it visible and remove the parent */
- if (nhg->num_nh == 1) {
- remove_nexthop(net, nhp, nlinfo);
- return;
- }
+ if (nhg->num_nh == 1)
+ return remove_nexthop(net, nhp, nlinfo);
newg->has_v4 = false;
newg->is_multipath = nhg->is_multipath;
@@ -2093,22 +2091,26 @@ static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
if (nlinfo)
nexthop_notify(RTM_NEWNEXTHOP, nhp, nlinfo);
+
+ return false;
}
-static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
+static bool remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
struct nl_info *nlinfo)
{
struct nh_grp_entry *nhge, *tmp;
LIST_HEAD(deferred_free);
+ bool need_flush = false;
/* If there is nothing to do, let's avoid the costly call to
* synchronize_net()
*/
if (list_empty(&nh->grp_list))
- return;
+ return false;
list_for_each_entry_safe(nhge, tmp, &nh->grp_list, nh_list)
- remove_nh_grp_entry(net, nhge, nlinfo, &deferred_free);
+ need_flush |= remove_nh_grp_entry(net, nhge, nlinfo,
+ &deferred_free);
/* make sure all see the newly published array before releasing rtnl */
synchronize_net();
@@ -2118,6 +2120,8 @@ static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
list_del(&nhge->nh_list);
free_percpu(nhge->stats);
}
+
+ return need_flush;
}
static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo)
@@ -2142,17 +2146,15 @@ static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo)
}
/* not called for nexthop replace */
-static void __remove_nexthop_fib(struct net *net, struct nexthop *nh)
+static bool __remove_nexthop_fib(struct net *net, struct nexthop *nh)
{
+ bool need_flush = !list_empty(&nh->fi_list);
struct fib6_info *f6i;
- bool do_flush = false;
struct fib_info *fi;
- list_for_each_entry(fi, &nh->fi_list, nh_list) {
+ list_for_each_entry(fi, &nh->fi_list, nh_list)
fi->fib_flags |= RTNH_F_DEAD;
- do_flush = true;
- }
- if (do_flush)
+ if (need_flush)
fib_flush(net);
spin_lock_bh(&nh->lock);
@@ -2173,12 +2175,14 @@ static void __remove_nexthop_fib(struct net *net, struct nexthop *nh)
}
spin_unlock_bh(&nh->lock);
+
+ return need_flush;
}
-static void __remove_nexthop(struct net *net, struct nexthop *nh,
+static bool __remove_nexthop(struct net *net, struct nexthop *nh,
struct nl_info *nlinfo)
{
- __remove_nexthop_fib(net, nh);
+ bool need_flush = __remove_nexthop_fib(net, nh);
if (nh->is_group) {
remove_nexthop_group(nh, nlinfo);
@@ -2189,13 +2193,17 @@ static void __remove_nexthop(struct net *net, struct nexthop *nh,
if (nhi->fib_nhc.nhc_dev)
hlist_del(&nhi->dev_hash);
- remove_nexthop_from_groups(net, nh, nlinfo);
+ need_flush |= remove_nexthop_from_groups(net, nh, nlinfo);
}
+
+ return need_flush;
}
-static void remove_nexthop(struct net *net, struct nexthop *nh,
+static bool remove_nexthop(struct net *net, struct nexthop *nh,
struct nl_info *nlinfo)
{
+ bool need_flush;
+
call_nexthop_notifiers(net, NEXTHOP_EVENT_DEL, nh, NULL);
/* remove from the tree */
@@ -2204,10 +2212,12 @@ static void remove_nexthop(struct net *net, struct nexthop *nh,
if (nlinfo)
nexthop_notify(RTM_DELNEXTHOP, nh, nlinfo);
- __remove_nexthop(net, nh, nlinfo);
+ need_flush = __remove_nexthop(net, nh, nlinfo);
nh_base_seq_inc(net);
nexthop_put(nh);
+
+ return need_flush;
}
/* if any FIB entries reference this nexthop, any dst entries
--
2.53.0
* [PATCH v3 net-next 2/3] ipv4: Flush the FIB once on multiple nexthop removal
From: Cosmin Ratiu @ 2026-05-07 7:56 UTC (permalink / raw)
To: netdev
Cc: David Ahern, Ido Schimmel, Kuniyuki Iwashima, David S . Miller,
Eric Dumazet, Jakub Kicinski, Simon Horman, Paolo Abeni,
Cosmin Ratiu
When a device is going down or when a net namespace is deleted, all
nexthops on it are removed, and for each nexthop being removed the FIB
table is flushed, which does a full trie traversal looking for entries
marked RTNH_F_DEAD and removing them. This is O(N x R), with N being the
number of the device's nexthops and R being the number of IPv4 routes.
The RTNL is held the entire time.
When there are many nexthops to be removed and many routing entries,
this can result in the RTNL being held for multiple minutes, which
stalls other processes trying to acquire the RTNL (e.g.
systemd-networkd for DHCP renewals).
In a complicated deployment with multiple VXLAN devices, each having
16K nexthops, and a total of 128K IPv4 routes, this is exactly what
happens:
nexthop_flush_dev()                    # loops over 16K nexthops
  -> remove_nexthop()
    -> __remove_nexthop()
      -> __remove_nexthop_fib()        # marks fi->fib_flags |= RTNH_F_DEAD
        -> fib_flush()                 # for EACH nexthop!
          -> fib_table_flush()         # walks the ENTIRE FIB, 128K entries
This patch uses the FIB flushing signal added in the previous patch to
do only a single FIB flush after all nexthops to be removed have been
marked RTNH_F_DEAD:
- __remove_nexthop_fib() no longer flushes the FIB.
- nexthop_flush_dev() and flush_all_nexthops() now keep track of whether
  any removed nexthop was referenced by FIB entries and, if so, trigger
  a single FIB flush at the end.
- A new wrapper, remove_one_nexthop(), calls remove_nexthop() and
  flushes if necessary. It is intended for callers that remove a single
  nexthop and should not have to deal with triggering a FIB flush
  themselves. For now, the only caller is rtm_del_nexthop().
- The two direct callers of __remove_nexthop() now wrap it in
  WARN_ON_ONCE(), since the nexthop about to be removed in those paths
  (replacing a nexthop, or failing to insert a new one) should not have
  any FIB entries referencing it.
This dramatically improves performance from O(N x R) to O(N + R).
Releasing a nexthop reference in remove_nexthop() no longer frees the
nexthop. Instead, it is freed when the last fib_info pointing to it is
freed via free_fib_info_rcu(). All routing code already ignores routes
marked with RTNH_F_DEAD.
Tested with:
DEV=eth2
ip link set up dev $DEV
ip link add testnh0 link $DEV type macvlan mode bridge
ip addr add 198.51.100.1/24 dev testnh0
ip link set testnh0 up
seq 1 65536 | \
sed 's/.*/nexthop add id & via 198.51.100.2 dev testnh0/' | \
ip -batch -
i=1
for a in $(seq 0 255); do
  for b in $(seq 0 255); do
    echo "route add 10.${a}.${b}.0/32 nhid $i"
    i=$((i + 1))
  done
done | ip -batch -
time ip link set testnh0 down
ip link del testnh0
Without this patch:
real 0m32.601s
user 0m0.000s
sys 0m32.511s
With this patch:
real 0m0.209s
user 0m0.000s
sys 0m0.153s
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
---
net/ipv4/nexthop.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 7177092d2605..703954c490d0 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -2154,8 +2154,6 @@ static bool __remove_nexthop_fib(struct net *net, struct nexthop *nh)
list_for_each_entry(fi, &nh->fi_list, nh_list)
fi->fib_flags |= RTNH_F_DEAD;
- if (need_flush)
- fib_flush(net);
spin_lock_bh(&nh->lock);
@@ -2220,6 +2218,13 @@ static bool remove_nexthop(struct net *net, struct nexthop *nh,
return need_flush;
}
+static void remove_one_nexthop(struct net *net, struct nexthop *nh,
+ struct nl_info *nlinfo)
+{
+ if (remove_nexthop(net, nh, nlinfo))
+ fib_flush(net);
+}
+
/* if any FIB entries reference this nexthop, any dst entries
* need to be regenerated
*/
@@ -2602,7 +2607,7 @@ static int replace_nexthop(struct net *net, struct nexthop *old,
if (!err) {
nh_rt_cache_flush(net, old, new);
- __remove_nexthop(net, new, NULL);
+ WARN_ON_ONCE(__remove_nexthop(net, new, NULL));
nexthop_put(new);
}
@@ -2709,6 +2714,7 @@ static void nexthop_flush_dev(struct net_device *dev, unsigned long event)
unsigned int hash = nh_dev_hashfn(dev->ifindex);
struct net *net = dev_net(dev);
struct hlist_head *head = &net->nexthop.devhash[hash];
+ bool need_flush = false;
struct hlist_node *n;
struct nh_info *nhi;
@@ -2720,22 +2726,28 @@ static void nexthop_flush_dev(struct net_device *dev, unsigned long event)
(event == NETDEV_DOWN || event == NETDEV_CHANGE))
continue;
- remove_nexthop(net, nhi->nh_parent, NULL);
+ need_flush |= remove_nexthop(net, nhi->nh_parent, NULL);
}
+
+ if (need_flush)
+ fib_flush(net);
}
/* rtnl; called when net namespace is deleted */
static void flush_all_nexthops(struct net *net)
{
struct rb_root *root = &net->nexthop.rb_root;
+ bool need_flush = false;
struct rb_node *node;
struct nexthop *nh;
while ((node = rb_first(root))) {
nh = rb_entry(node, struct nexthop, rb_node);
- remove_nexthop(net, nh, NULL);
+ need_flush |= remove_nexthop(net, nh, NULL);
cond_resched();
}
+ if (need_flush)
+ fib_flush(net);
}
static struct nexthop *nexthop_create_group(struct net *net,
@@ -3004,7 +3016,7 @@ static struct nexthop *nexthop_add(struct net *net, struct nh_config *cfg,
err = insert_nexthop(net, nh, cfg, extack);
if (err) {
- __remove_nexthop(net, nh, NULL);
+ WARN_ON_ONCE(__remove_nexthop(net, nh, NULL));
nexthop_put(nh);
nh = ERR_PTR(err);
}
@@ -3373,7 +3385,7 @@ static int rtm_del_nexthop(struct sk_buff *skb, struct nlmsghdr *nlh,
nh = nexthop_find_by_id(net, id);
if (nh)
- remove_nexthop(net, nh, &nlinfo);
+ remove_one_nexthop(net, nh, &nlinfo);
else
err = -ENOENT;
--
2.53.0
* [PATCH v3 net-next 3/3] ipv4: Add __must_check to nexthop removal functions
From: Cosmin Ratiu @ 2026-05-07 7:56 UTC (permalink / raw)
To: netdev
Cc: David Ahern, Ido Schimmel, Kuniyuki Iwashima, David S . Miller,
Eric Dumazet, Jakub Kicinski, Simon Horman, Paolo Abeni,
Cosmin Ratiu
These functions return a signal indicating whether a FIB flush is
required, and that signal must not be ignored. Use the compiler to help
enforce this requirement in the future.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
---
net/ipv4/nexthop.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 703954c490d0..6205bd57aa85 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -20,8 +20,8 @@
#define NH_RES_DEFAULT_IDLE_TIMER (120 * HZ)
#define NH_RES_DEFAULT_UNBALANCED_TIMER 0 /* No forced rebalancing. */
-static bool remove_nexthop(struct net *net, struct nexthop *nh,
- struct nl_info *nlinfo);
+static bool __must_check remove_nexthop(struct net *net, struct nexthop *nh,
+ struct nl_info *nlinfo);
#define NH_DEV_HASHBITS 8
#define NH_DEV_HASHSIZE (1U << NH_DEV_HASHBITS)
@@ -2016,9 +2016,9 @@ static void nh_hthr_group_rebalance(struct nh_group *nhg)
}
}
-static bool remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
- struct nl_info *nlinfo,
- struct list_head *deferred_free)
+static bool __must_check
+remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
+ struct nl_info *nlinfo, struct list_head *deferred_free)
{
struct nh_grp_entry *nhges, *new_nhges;
struct nexthop *nhp = nhge->nh_parent;
@@ -2095,8 +2095,9 @@ static bool remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
return false;
}
-static bool remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
- struct nl_info *nlinfo)
+static bool __must_check
+remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
+ struct nl_info *nlinfo)
{
struct nh_grp_entry *nhge, *tmp;
LIST_HEAD(deferred_free);
@@ -2146,7 +2147,8 @@ static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo)
}
/* not called for nexthop replace */
-static bool __remove_nexthop_fib(struct net *net, struct nexthop *nh)
+static bool __must_check __remove_nexthop_fib(struct net *net,
+ struct nexthop *nh)
{
bool need_flush = !list_empty(&nh->fi_list);
struct fib6_info *f6i;
@@ -2177,8 +2179,8 @@ static bool __remove_nexthop_fib(struct net *net, struct nexthop *nh)
return need_flush;
}
-static bool __remove_nexthop(struct net *net, struct nexthop *nh,
- struct nl_info *nlinfo)
+static bool __must_check __remove_nexthop(struct net *net, struct nexthop *nh,
+ struct nl_info *nlinfo)
{
bool need_flush = __remove_nexthop_fib(net, nh);
@@ -2197,8 +2199,8 @@ static bool __remove_nexthop(struct net *net, struct nexthop *nh,
return need_flush;
}
-static bool remove_nexthop(struct net *net, struct nexthop *nh,
- struct nl_info *nlinfo)
+static bool __must_check remove_nexthop(struct net *net, struct nexthop *nh,
+ struct nl_info *nlinfo)
{
bool need_flush;
--
2.53.0
* Re: [PATCH v3 net-next 1/3] ipv4: Provide a FIB flushing signal from nexthop removal functions
From: Ido Schimmel @ 2026-05-07 11:40 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: netdev, David Ahern, Kuniyuki Iwashima, David S . Miller,
Eric Dumazet, Jakub Kicinski, Simon Horman, Paolo Abeni
On Thu, May 07, 2026 at 10:56:04AM +0300, Cosmin Ratiu wrote:
> Plumb a bool value throughout the various nexthop removal functions,
> determined in the innermost __remove_nexthop_fib() (which still does the
> FIB flushing) and propagated up all callers.
>
> The next patch will make use of this signal to optimize the removal of
> multiple nexthops by moving the FIB flushing up the call hierarchy.
>
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
* Re: [PATCH v3 net-next 2/3] ipv4: Flush the FIB once on multiple nexthop removal
From: Ido Schimmel @ 2026-05-07 11:40 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: netdev, David Ahern, Kuniyuki Iwashima, David S . Miller,
Eric Dumazet, Jakub Kicinski, Simon Horman, Paolo Abeni
On Thu, May 07, 2026 at 10:56:05AM +0300, Cosmin Ratiu wrote:
> When a device is going down or when a net namespace is deleted, all
> nexthops on it are removed, and for each nexthop being removed the FIB
> table is flushed, which does a full trie traversal looking for entries
> marked RTNH_F_DEAD and removing them. This is O(N x R), with N being
> number of dev nexthops and R being number of IPv4 routes.
>
> The RTNL is held the entire time.
>
> When there are many nexthops to be removed and many routing entries,
> this can result in the RTNL being held for multiple minutes, which
> causes unhappiness in other processes trying to acquire the RTNL (e.g.
> systemd-networkd for DHCP renewals).
>
> In a complicated deployment with multiple vxlan devices, each having
> 16K nexthops and a total of 128K ipv4 routes, this is exactly what
> happens:
>
> nexthop_flush_dev() # loops over 16K nexthops
> -> remove_nexthop()
> -> __remove_nexthop()
> -> __remove_nexthop_fib() # marks fi->fib_flags |= RTNH_F_DEAD
> -> fib_flush() # for EACH nexthop!
> -> fib_table_flush() # walks the ENTIRE FIB, 128K entries
>
> This patch makes use of the previously added FIB flushing signal to only
> do a single FIB flush after all nexthops to be removed are marked as
> RTNH_F_DEAD:
> - __remove_nexthop_fib() no longer flushes the FIB.
> - nexthop_flush_dev() and flush_all_nexthops() now keep track whether
> any nexthop was removed and trigger a FIB flush at the end.
> - a new wrapper is defined, remove_one_nexthop() which calls
> remove_nexthop() and flushes if necessary. This is intended for places
> which must remove a single nexthop and shouldn't worry about the need
> to trigger a FIB flush. For now, the only caller is rtm_del_nexthop().
> - The two direct callers of __remove_nexthop() get a WARN_ON_ONCE, since
> the nh about to be removed should not have any FIB entries referencing
> it when replacing or inserting a new one.
>
> This dramatically improves performance from O(N x R) to O(N + R).
>
> Releasing a nexthop reference in remove_nexthop() now no longer frees
> it. Instead, it is deleted when the last fib_info pointing to it gets
> freed via free_fib_info_rcu(). All routing code is already careful not
> to take into consideration routes marked with RTNH_F_DEAD.
>
> Tested with:
> DEV=eth2
> ip link set up dev $DEV
> ip link add testnh0 link $DEV type macvlan mode bridge
> ip addr add 198.51.100.1/24 dev testnh0
> ip link set testnh0 up
>
> seq 1 65536 | \
> sed 's/.*/nexthop add id & via 198.51.100.2 dev testnh0/' | \
> ip -batch -
>
> i=1
> for a in $(seq 0 255); do
> for b in $(seq 0 255); do
> echo "route add 10.${a}.${b}.0/32 nhid $i"
> i=$((i + 1))
> done
> done | ip -batch -
>
> time ip link set testnh0 down
> ip link del testnh0
>
> Without this patch:
> real 0m32.601s
> user 0m0.000s
> sys 0m32.511s
>
> With this patch:
> real 0m0.209s
> user 0m0.000s
> sys 0m0.153s
>
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
* Re: [PATCH v3 net-next 3/3] ipv4: Add __must_check to nexthop removal functions
From: Ido Schimmel @ 2026-05-07 11:41 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: netdev, David Ahern, Kuniyuki Iwashima, David S . Miller,
Eric Dumazet, Jakub Kicinski, Simon Horman, Paolo Abeni
On Thu, May 07, 2026 at 10:56:06AM +0300, Cosmin Ratiu wrote:
> These functions return a signal whether FIB flushing is required which
> must not be ignored. Use the compiler to help with enforcing this
> requirement in the future.
>
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
* Re: [PATCH net-next v3 0/3] ipv4: Flush the FIB once on multiple nexthop removal
From: David Ahern @ 2026-05-07 14:57 UTC (permalink / raw)
To: Cosmin Ratiu, netdev
Cc: Ido Schimmel, Kuniyuki Iwashima, David S . Miller, Eric Dumazet,
Jakub Kicinski, Simon Horman, Paolo Abeni
On 5/7/26 1:56 AM, Cosmin Ratiu wrote:
> This series optimizes multiple nexthop removal performance from having
> to do a FIB flush for each nexthop being removed to only doing a single
> FIB flush after all nexthops are removed.
>
> This dramatically improves performance in scenarios where there are
> many nexthops and many ipv4 routes. Please see individual patches for
> more details and for a test scenario.
>
> V2 -> V3: https://lore.kernel.org/netdev/8fea4084-c9ec-472a-b8ab-ecc87e537216@kernel.org/T/#t
> - Split the patch into 3 (Ido Schimmel, David Ahern)
> - Used WARN_ON_ONCE instead of WARN_ON (Ido Schimmel)
>
> V1 -> V2:
> - Fixes xmas tree in a couple places (Kuniyuki Iwashima)
> - Added __must_check to remove_nexthop_from_groups() (Kuniyuki Iwashima)
>
> Cosmin Ratiu (3):
> ipv4: Provide a FIB flushing signal from nexthop removal functions
> ipv4: Flush the FIB once on multiple nexthop removal
> ipv4: Add __must_check to nexthop removal functions
>
> net/ipv4/nexthop.c | 88 +++++++++++++++++++++++++++++-----------------
> 1 file changed, 56 insertions(+), 32 deletions(-)
>
Much easier to follow. Thank you.
For the set:
Reviewed-by: David Ahern <dsahern@kernel.org>
* Re: [PATCH net-next v3 0/3] ipv4: Flush the FIB once on multiple nexthop removal
From: patchwork-bot+netdevbpf @ 2026-05-10 17:20 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: netdev, dsahern, idosch, kuniyu, davem, edumazet, kuba, horms,
pabeni
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 7 May 2026 10:56:03 +0300 you wrote:
> This series optimizes multiple nexthop removal performance from having
> to do a FIB flush for each nexthop being removed to only doing a single
> FIB flush after all nexthops are removed.
>
> This dramatically improves performance in scenarios where there are
> many nexthops and many ipv4 routes. Please see individual patches for
> more details and for a test scenario.
>
> [...]
Here is the summary with links:
- [v3,net-next,1/3] ipv4: Provide a FIB flushing signal from nexthop removal functions
https://git.kernel.org/netdev/net-next/c/31c777be2a2e
- [v3,net-next,2/3] ipv4: Flush the FIB once on multiple nexthop removal
https://git.kernel.org/netdev/net-next/c/35ce55100c61
- [v3,net-next,3/3] ipv4: Add __must_check to nexthop removal functions
https://git.kernel.org/netdev/net-next/c/5dcbd64e66ba
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html