netdev.vger.kernel.org archive mirror
From: Leon Romanovsky <leon@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
	pabeni@redhat.com, mkubecek@suse.cz, lorenzo@kernel.org
Subject: Re: [PATCH net-next 1/2] net: store netdevs in an xarray
Date: Mon, 24 Jul 2023 22:09:34 +0300
Message-ID: <20230724190934.GE11388@unreal>
In-Reply-To: <20230722014237.4078962-2-kuba@kernel.org>

On Fri, Jul 21, 2023 at 06:42:36PM -0700, Jakub Kicinski wrote:
> Iterating over the netdev hash table for netlink dumps is hard.
> Dumps are done in "chunks" so we need to save the position
> after each chunk, so we know where to restart from. Because
> netdevs are stored in a hash table we remember which bucket
> we were in and how many devices we dumped.
> 
> Since we don't hold any locks across the "chunks", devices may
> come and go while we're dumping. If that happens we may miss
> a device (if a device is deleted from the bucket we were in).
> We indicate to user space that this may have happened by setting
> NLM_F_DUMP_INTR. User space is supposed to dump again (I think)
> if it sees that. Somehow I doubt most user space gets this right.
> 
> To illustrate let's look at an example:
> 
>                System state:
>   start:       # [A, B, C]
>   del:  B      # [A, C]
> 
> With the hash table we may dump [A, B], missing C completely even
> though it existed both before and after the "del B".
> 
> Add an xarray and use it to allocate ifindexes. This way we
> can iterate ifindexes in order, without the worry that we'll
> skip one. We may still generate a dump of a state which "never
> existed", for example for a set of values and sequence of ops:
> 
>                System state:
>   start:       # [A, B]
>   add:  C      # [A, C, B]
>   del:  B      # [A, C]
> 
> we may generate a dump of [A], if C got an index between A and B.
> The system has never been in such a state. But I'm 90% sure that's perfectly
> fine; the important part is that we can't _miss_ devices which exist before
> and after. User space which wants to mirror kernel's state subscribes
> to notifications and does periodic dumps so it will know that C exists
> from the notification about its creation or from the next dump
> (next dump is _guaranteed_ to include C, if it doesn't get removed).
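
The difference between the two resume strategies can be sketched in a small
userspace model. This is not the kernel code: the single "bucket", the helper
names and the ifindex values 1, 2, 3 standing for A, B, C are all assumptions
made for illustration. It replays the "del B" example with a chunk boundary
after two devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NDEVS 3

/* Hypothetical model: a device is just its ifindex, 0 marks a deleted slot. */

/* Resume by position, as with "bucket + how many we dumped":
 * skip `skip` live entries and return the next one, or 0 if none is left. */
static int next_by_count(const int *devs, size_t skip)
{
	for (size_t i = 0; i < NDEVS; i++) {
		if (devs[i] == 0)
			continue;
		if (skip == 0)
			return devs[i];
		skip--;
	}
	return 0;
}

/* Resume by key: return the smallest ifindex strictly greater than `last`,
 * or 0 if none is left. */
static int next_by_index(const int *devs, int last)
{
	int best = 0;

	for (size_t i = 0; i < NDEVS; i++)
		if (devs[i] > last && (best == 0 || devs[i] < best))
			best = devs[i];
	return best;
}

/* Dump [A, B, C]; B is deleted at the chunk boundary after two devices.
 * Stores the dumped ifindexes in out[] and returns how many there were. */
static size_t dump_with_delete(bool by_index, int *out)
{
	int devs[NDEVS] = { 1, 2, 3 };	/* A, B, C */
	size_t n = 0;			/* devices dumped so far */
	int last = 0;			/* last ifindex dumped */

	for (;;) {
		int dev = by_index ? next_by_index(devs, last)
				   : next_by_count(devs, n);
		if (dev == 0)
			break;
		assert(n < NDEVS);
		out[n++] = dev;
		last = dev;
		if (n == 2)
			devs[1] = 0;	/* "del B" between chunks */
	}
	return n;
}
```

In this model, resuming by count stops after A and B because the saved count
now points past C, while resuming by index continues with C.
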
> 
> To avoid any perf regressions keep the hash table for now. Most
> net namespaces have very few devices, and microbenchmarking 1M lookups
> on Skylake I get the following results (loopback is not counted
> in the number of devs):
> 
>  #devs | hash |  xa  | delta
>     2  | 18.3 | 20.1 | + 9.8%
>    16  | 18.3 | 20.1 | + 9.5%
>    64  | 18.3 | 26.3 | +43.8%
>   128  | 20.4 | 26.3 | +28.6%
>   256  | 20.0 | 26.4 | +32.1%
>  1024  | 26.6 | 26.7 | + 0.2%
>  8192  |541.3 | 33.5 | -93.8%
> 
> No surprises since the hash table has 256 entries.
> The microbenchmark scans indexes in order; if the pattern is more
> random, xa starts to win at 512 devices already. But that's a lot
> of devices, in practice.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  include/net/net_namespace.h |  4 +-
>  net/core/dev.c              | 82 ++++++++++++++++++++++++-------------
>  2 files changed, 57 insertions(+), 29 deletions(-)

<...>

> +	if (!ifindex)
> +		err = xa_alloc_cyclic(&net->dev_by_index, &ifindex, NULL,
> +				      xa_limit_31b, &net->ifindex, GFP_KERNEL);
> +	else
> +		err = xa_insert(&net->dev_by_index, ifindex, NULL, GFP_KERNEL);
> +	if (err)
> +		return err;

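Please note that xa_alloc_cyclic() returns 1 if the allocation succeeded
after wrapping, 0 if it succeeded without wrapping, and a negative errno
on failure. With "if (err)" a successful allocation that wrapped would be
treated as an error, so the more accurate check is "if (err < 0) ...".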
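
To make the return convention concrete, here is a minimal userspace sketch.
It is a model, not the xarray implementation: struct id_pool,
model_alloc_cyclic(), register_buggy(), register_fixed() and the tiny 1..4 ID
range are hypothetical names and values chosen for illustration. Only the
0 / 1 / negative-errno convention is taken from xa_alloc_cyclic().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define ID_MIN 1
#define ID_MAX 4

/* Hypothetical stand-in for the xarray plus its "next" cursor. */
struct id_pool {
	bool used[ID_MAX + 1];
	int next;		/* where the next search starts */
};

/* Return the first free ID in [from, to], or -1 if there is none. */
static int find_free(const struct id_pool *p, int from, int to)
{
	for (int i = from; i <= to; i++)
		if (!p->used[i])
			return i;
	return -1;
}

/* Mimics the return convention only: 0 on success without wrapping,
 * 1 on success after wrapping, -EBUSY if no ID is free. */
static int model_alloc_cyclic(struct id_pool *p, int *id)
{
	int ret = 0;
	int slot;

	assert(p->next >= ID_MIN && p->next <= ID_MAX + 1);
	slot = find_free(p, p->next, ID_MAX);
	if (slot < 0 && p->next > ID_MIN) {
		/* Nothing free above the cursor: wrap to the start. */
		slot = find_free(p, ID_MIN, p->next - 1);
		ret = 1;
	}
	if (slot < 0)
		return -EBUSY;
	p->used[slot] = true;
	*id = slot;
	p->next = slot + 1;
	return ret;
}

/* The check as written in the patch: a wrapped success looks like failure. */
static int register_buggy(struct id_pool *p, int *id)
{
	int err = model_alloc_cyclic(p, id);

	if (err)
		return err;
	return 0;
}

/* The suggested check: only negative values are errors. */
static int register_fixed(struct id_pool *p, int *id)
{
	int err = model_alloc_cyclic(p, id);

	if (err < 0)
		return err;
	return 0;
}
```

In this model, once the cursor has passed ID_MAX and an earlier ID has been
freed, register_buggy() returns 1 to its caller even though the ID was
allocated and stays marked as used, whereas register_fixed() returns 0.
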

Thanks
