All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ido Schimmel <idosch@idosch.org>
To: cpaasch@openai.com
Cc: David Ahern <dsahern@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net-next] net: Make nexthop-dumps scale linearly with the number of nexthops
Date: Fri, 25 Jul 2025 17:05:52 +0300	[thread overview]
Message-ID: <aIOPQH-S5LAPCb1u@shredder> (raw)
In-Reply-To: <20250724-nexthop_dump-v1-1-6b43fffd5bac@openai.com>

On Thu, Jul 24, 2025 at 05:10:36PM -0700, Christoph Paasch via B4 Relay wrote:
> From: Christoph Paasch <cpaasch@openai.com>
> 
> When we have a (very) large number of nexthops, they do not fit within a
> single message. rtm_dump_walk_nexthops() thus will be called repeatedly
> and ctx->idx is used to avoid dumping the same nexthops again.
> 
> The approach in which we avoid dumpint the same nexthops is by basically

s/dumpint/dumping/

> walking the entire nexthop rb-tree from the left-most node until we find
> a node whose id is >= s_idx. That does not scale well.
> 
> Instead of this non-efficient  approach, rather go directly through the
                               ^ double space
s/non-efficient/inefficient/ ?

> tree to the nexthop that should be dumped (the one whose nh_id >=
> s_idx). This allows us to find the relevant node in O(log(n)).
> 
> We have quite a nice improvement with this:
> 
> Before:
> =======
> 
> --> ~1M nexthops:
> $ time ~/libnl/src/nl-nh-list | wc -l
> 1050624
> 
> real	0m21.080s
> user	0m0.666s
> sys	0m20.384s
> 
> --> ~2M nexthops:
> $ time ~/libnl/src/nl-nh-list | wc -l
> 2101248
> 
> real	1m51.649s
> user	0m1.540s
> sys	1m49.908s
> 
> After:
> ======
> 
> --> ~1M nexthops:
> $ time ~/libnl/src/nl-nh-list | wc -l
> 1050624
> 
> real	0m1.157s
> user	0m0.926s
> sys	0m0.259s
> 
> --> ~2M nexthops:
> $ time ~/libnl/src/nl-nh-list | wc -l
> 2101248
> 
> real	0m2.763s
> user	0m2.042s
> sys	0m0.776s

I was able to reproduce these results.

> 
> Signed-off-by: Christoph Paasch <cpaasch@openai.com>
> ---
>  net/ipv4/nexthop.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> index 29118c43ebf5f1e91292fe227d4afde313e564bb..226447b1c17d22eab9121bed88c0c2b9148884ac 100644
> --- a/net/ipv4/nexthop.c
> +++ b/net/ipv4/nexthop.c
> @@ -3511,7 +3511,39 @@ static int rtm_dump_walk_nexthops(struct sk_buff *skb,
>  	int err;
>  
>  	s_idx = ctx->idx;
> -	for (node = rb_first(root); node; node = rb_next(node)) {
> +
> +	/*
> +	 * If this is not the first invocation, ctx->idx will contain the id of
> +	 * the last nexthop we processed.  Instead of starting from the very first
> +	 * element of the red/black tree again and linearly skipping the
> +	 * (potentially large) set of nodes with an id smaller than s_idx, walk the
> +	 * tree and find the left-most node whose id is >= s_idx.  This provides an
> +	 * efficient O(log n) starting point for the dump continuation.
> +	 */

Please try to keep lines at 80 characters.

> +	if (s_idx != 0) {
> +		struct rb_node *tmp = root->rb_node;
> +
> +		node = NULL;
> +		while (tmp) {
> +			struct nexthop *nh;
> +
> +			nh = rb_entry(tmp, struct nexthop, rb_node);
> +			if (nh->id < s_idx) {
> +				tmp = tmp->rb_right;
> +			} else {
> +				/* Track current candidate and keep looking on
> +				 * the left side to find the left-most
> +				 * (smallest id) that is still >= s_idx.
> +				 */

I'm aware that netdev now accepts both comment styles, but it's a bit
weird to mix both in the same commit and at the same function.

> +				node = tmp;
> +				tmp = tmp->rb_left;
> +			}
> +		}
> +	} else {
> +		node = rb_first(root);
> +	}
> +
> +	for (; node; node = rb_next(node)) {
>  		struct nexthop *nh;
>  
>  		nh = rb_entry(node, struct nexthop, rb_node);

The code below is:

if (nh->id < s_idx)
	continue;

Can't it be removed given the above code means we start at a nexthop
whose identifier is at least s_idx ?

  reply	other threads:[~2025-07-25 14:05 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-25  0:10 [PATCH net-next] net: Make nexthop-dumps scale linearly with the number of nexthops Christoph Paasch
2025-07-25  0:10 ` Christoph Paasch via B4 Relay
2025-07-25 14:05 ` Ido Schimmel [this message]
2025-07-25 17:47   ` Christoph Paasch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aIOPQH-S5LAPCb1u@shredder \
    --to=idosch@idosch.org \
    --cc=cpaasch@openai.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.