From: Leon Romanovsky <leon@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
pabeni@redhat.com, sd@queasysnail.net
Subject: Re: [PATCH net-next v2 1/2] net: store netdevs in an xarray
Date: Thu, 27 Jul 2023 16:08:24 +0300
Message-ID: <20230727130824.GA2652767@unreal>
In-Reply-To: <20230726185530.2247698-2-kuba@kernel.org>
On Wed, Jul 26, 2023 at 11:55:29AM -0700, Jakub Kicinski wrote:
> Iterating over the netdev hash table for netlink dumps is hard.
> Dumps are done in "chunks" so we need to save the position
> after each chunk, so we know where to restart from. Because
> netdevs are stored in a hash table we remember which bucket
> we were in and how many devices we dumped.
>
> Since we don't hold any locks across the "chunks", devices may
> come and go while we're dumping. If that happens we may miss
> a device (if a device is deleted from the bucket we were in).
> We indicate to user space that this may have happened by setting
> NLM_F_DUMP_INTR. User space is supposed to dump again (I think)
> if it sees that. Somehow I doubt most user space gets this right.
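For illustration, a user-space dumper can test each received message for the flag as shown below. This is a minimal sketch assuming the standard `struct nlmsghdr` and `NLM_F_DUMP_INTR` from `<linux/netlink.h>`. The helper name `dump_was_interrupted()` is hypothetical, and what to do when it returns true (typically discarding the partial result and restarting the dump) is left to the caller.

```c
#include <linux/netlink.h>
#include <stdbool.h>

/* Hypothetical helper: true if the kernel marked this dump message as
 * possibly inconsistent, i.e. the dumped set may have changed while the
 * dump was in progress. */
static bool dump_was_interrupted(const struct nlmsghdr *nlh)
{
	return (nlh->nlmsg_flags & NLM_F_DUMP_INTR) != 0;
}
```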
>
> To illustrate, let's look at an example:
>
> System state:
> start: # [A, B, C]
> del: B # [A, C]
>
> With the hash table we may dump [A, B], missing C completely even
> though it existed both before and after "del B".
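The failure mode can be reproduced with a small user-space model of a bucket-plus-offset cursor. This is a hypothetical sketch, not the kernel's dump code: it assumes A, B and C hash to the same bucket, so the saved cursor reduces to "how many entries of this bucket were already sent", and each chunk holds at most two devices.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model, not kernel code: a single hash bucket
 * whose chain holds device names in insertion order. */
#define MAX_DEVS 8

struct bucket {
	const char *devs[MAX_DEVS];
	size_t n;
};

/* Cursor saved between dump chunks: number of entries of the bucket
 * already emitted (with one bucket, no bucket index is needed). */
struct cursor {
	size_t skip;
};

/* Emit up to 'budget' devices into out[], starting after the cursor;
 * returns how many were emitted and advances the cursor by that much. */
static size_t hash_dump_chunk(const struct bucket *b, struct cursor *c,
			      const char **out, size_t budget)
{
	size_t emitted = 0;

	for (size_t i = c->skip; i < b->n && emitted < budget; i++)
		out[emitted++] = b->devs[i];
	c->skip += emitted;
	return emitted;
}

/* Remove a device from the chain, shifting later entries down. */
static void bucket_del(struct bucket *b, const char *name)
{
	for (size_t i = 0; i < b->n; i++) {
		if (strcmp(b->devs[i], name) == 0) {
			memmove(&b->devs[i], &b->devs[i + 1],
				(b->n - i - 1) * sizeof(b->devs[0]));
			b->n--;
			return;
		}
	}
}
```

Replaying the example: the first chunk returns A and B and saves skip = 2; after B is deleted the chain is [A, C], so the second chunk skips both remaining entries and C is never reported.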
>
> Add an xarray and use it to allocate ifindexes. This way we
> can iterate ifindexes in order, without the worry that we'll
> skip one. We may still generate a dump of a state which "never
> existed", for example for a set of values and sequence of ops:
>
> System state:
> start: # [A, B]
> add: C # [A, C, B]
> del: B # [A, C]
>
> we may generate a dump of [A], if C got an index between A and B.
> System has never been in such state. But I'm 90% sure that's perfectly
> fine; the important part is that we can't _miss_ devices which exist both
> before and after. User space which wants to mirror the kernel's state subscribes
> to notifications and does periodic dumps so it will know that C exists
> from the notification about its creation or from the next dump
> (next dump is _guaranteed_ to include C, if it doesn't get removed).
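By contrast, a cursor that stores an ifindex rather than a position in a chain is unaffected by the deletion of devices already sent. The sketch below is a hypothetical user-space model: a fixed array indexed by ifindex stands in for the xarray, and the cursor is assumed to be left pointing at the first device that did not fit in the chunk. Both scenarios from the text can be replayed against it.

```c
#include <stddef.h>

/* Hypothetical stand-in for an xarray keyed by ifindex: slot i holds
 * the name of the device with ifindex i, or NULL if the index is free. */
#define MAX_IFINDEX 16

struct index_table {
	const char *slot[MAX_IFINDEX];
};

/* Emit up to 'budget' devices with ifindex >= *cursor, in index order.
 * If another device is reached after the budget is spent, *cursor is
 * left at its ifindex so the next chunk resumes there. */
static size_t index_dump_chunk(const struct index_table *t, size_t *cursor,
			       const char **out, size_t budget)
{
	size_t emitted = 0;
	size_t i;

	for (i = *cursor; i < MAX_IFINDEX; i++) {
		if (!t->slot[i])
			continue;
		if (emitted == budget)
			break;
		out[emitted++] = t->slot[i];
	}
	*cursor = i;
	return emitted;
}
```

With A=1, B=2, C=3 and two devices per chunk, deleting B between chunks still yields A, B, C. With A=1 and B=3, one device per chunk, then C added at ifindex 2 and B deleted between chunks, the dump is just [A]: the "never existed" state, but no device present both before and after is lost.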
>
> To avoid any perf regressions keep the hash table for now. Most
> net namespaces have very few devices and microbenchmarking 1M lookups
> on Skylake I get the following results (not counting loopback
> in the number of devs):
>
> #devs |  hash |   xa |  delta
>     2 |  18.3 | 20.1 |  +9.8%
>    16 |  18.3 | 20.1 |  +9.5%
>    64 |  18.3 | 26.3 | +43.8%
>   128 |  20.4 | 26.3 | +28.6%
>   256 |  20.0 | 26.4 | +32.1%
>  1024 |  26.6 | 26.7 |  +0.2%
>  8192 | 541.3 | 33.5 | -93.8%
>
> No surprises since the hash table has 256 entries.
> The microbenchmark scans indexes in order; if the pattern is more
> random, xa starts to win at 512 devices already. But that's a lot
> of devices in practice.
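The "256 entries" remark can be made concrete by counting chain lengths. This sketch assumes devices get consecutive ifindexes 1..n and that the bucket is chosen as `ifindex & 255`; that hash is an assumption of the model, and `longest_chain()` is a hypothetical helper.

```c
/* Bucket count stated in the patch description. */
#define HASH_BUCKETS 256u

/* Hypothetical model: place n devices with consecutive ifindexes 1..n
 * into buckets by ifindex & (HASH_BUCKETS - 1) and return the length of
 * the longest resulting chain. */
static unsigned int longest_chain(unsigned int n)
{
	unsigned int count[HASH_BUCKETS] = { 0 };
	unsigned int max = 0;

	for (unsigned int ifindex = 1; ifindex <= n; ifindex++) {
		unsigned int b = ifindex & (HASH_BUCKETS - 1);

		if (++count[b] > max)
			max = count[b];
	}
	return max;
}
```

Under these assumptions no bucket holds more than one device up to 256 devices, while chains reach 4 entries at 1024 and 32 at 8192, which is consistent with the hash column staying roughly flat and then rising sharply in the last row.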
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v2:
> - fix error checking on xa_alloc_cyclic() (Leon)
> ---
> include/net/net_namespace.h | 4 +-
> net/core/dev.c | 82 ++++++++++++++++++++++++-------------
> 2 files changed, 57 insertions(+), 29 deletions(-)
<...>
> unsigned int dev_base_seq; /* protected by rtnl_mutex */
> - int ifindex;
> + u32 ifindex;
<...>
> +static int dev_index_reserve(struct net *net, u32 ifindex)
> {
> - int ifindex = net->ifindex;
> + int err;
<...>
> + if (!ifindex)
> + err = xa_alloc_cyclic(&net->dev_by_index, &ifindex, NULL,
> + xa_limit_31b, &net->ifindex, GFP_KERNEL);
> + else
> + err = xa_insert(&net->dev_by_index, ifindex, NULL, GFP_KERNEL);
> + if (err < 0)
> + return err;
> +
> + return ifindex;
ifindex is now u32, but you return it as int. Potentially, you could
return a valid ifindex that would be treated as an error.
You should ensure that ifindex doesn't have the sign bit set.
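One possible shape for such a check is to refuse values that do not fit in a positive int before returning them. This is a minimal sketch: the helper name `ifindex_to_ret()` is hypothetical and -EINVAL is chosen only for illustration. Since xa_limit_31b caps IDs handed out by xa_alloc_cyclic() at INT_MAX, the concern appears to apply mainly to a caller-supplied ifindex passed to xa_insert().

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: map a reserved u32 ifindex onto the
 * "negative means -errno" int return convention, rejecting values
 * with the sign bit set, which a caller would misread as an error. */
static int ifindex_to_ret(uint32_t ifindex)
{
	if (ifindex > (uint32_t)INT_MAX)
		return -EINVAL;
	return (int)ifindex;
}
```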
Everything else, LGTM
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>