netdev.vger.kernel.org archive mirror
From: Paolo Abeni <pabeni@redhat.com>
To: Jakub Kicinski <kuba@kernel.org>, davem@davemloft.net
Cc: netdev@vger.kernel.org, edumazet@google.com, mkubecek@suse.cz,
	 lorenzo@kernel.org
Subject: Re: [PATCH net-next 1/2] net: store netdevs in an xarray
Date: Mon, 24 Jul 2023 10:18:04 +0200
Message-ID: <20788d4df9bbcdce9453be3fd047fdf8e0465714.camel@redhat.com>
In-Reply-To: <20230722014237.4078962-2-kuba@kernel.org>

On Fri, 2023-07-21 at 18:42 -0700, Jakub Kicinski wrote:
> Iterating over the netdev hash table for netlink dumps is hard.
> Dumps are done in "chunks" so we need to save the position
> after each chunk, so we know where to restart from. Because
> netdevs are stored in a hash table we remember which bucket
> we were in and how many devices we dumped.
> 
> Since we don't hold any locks across the "chunks" - devices may
> come and go while we're dumping. If that happens we may miss
> a device (if device is deleted from the bucket we were in).
> We indicate to user space that this may have happened by setting
> NLM_F_DUMP_INTR. User space is supposed to dump again (I think)
> if it sees that. Somehow I doubt most user space gets this right..
> 
> To illustrate let's look at an example:
> 
>                System state:
>   start:       # [A, B, C]
>   del:  B      # [A, C]
> 
> with the hash table we may dump [A, B], missing C completely even
> though it existed both before and after the "del B".
> 
> Add an xarray and use it to allocate ifindexes. This way we
> can iterate ifindexes in order, without the worry that we'll
> skip one. We may still generate a dump of a state which "never
> existed", for example for a set of values and sequence of ops:
> 
>                System state:
>   start:       # [A, B]
>   add:  C      # [A, C, B]
>   del:  B      # [A, C]
> 
> we may generate a dump of [A], if C got an index between A and B.
> System has never been in such state. But I'm 90% sure that's perfectly
> fine, important part is that we can't _miss_ devices which exist before
> and after. User space which wants to mirror kernel's state subscribes
> to notifications and does periodic dumps so it will know that C exists
> from the notification about its creation or from the next dump
> (next dump is _guaranteed_ to include C, if it doesn't get removed).
> 
> To avoid any perf regressions keep the hash table for now. Most
> net namespaces have very few devices and microbenchmarking 1M lookups
> on Skylake I get the following results (not counting loopback
> to number of devs):
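
For anyone following along, the missed-device case quoted above can be
reproduced with a small userspace model. This is only a sketch in Python,
not the kernel code: it assumes the dump cursor is a (bucket, number of
entries already dumped) pair, as the commit message describes. The names
dump_chunk and chunked_dump, the single-bucket layout and the chunk size
of 2 are made up for illustration.

```python
def dump_chunk(buckets, cursor, chunk_size):
    """Dump up to chunk_size devices, resuming from a (bucket, skip) cursor."""
    bucket_idx, skip = cursor
    out = []
    while bucket_idx < len(buckets):
        chain = buckets[bucket_idx]
        # Resume by skipping as many entries as were dumped from this bucket.
        for pos in range(skip, len(chain)):
            if len(out) == chunk_size:
                return out, (bucket_idx, pos)
            out.append(chain[pos])
        bucket_idx += 1
        skip = 0
    return out, (bucket_idx, 0)


def chunked_dump(buckets, chunk_size, mutate=None):
    """Run a full dump; optionally change the table after the first chunk."""
    seen, cursor, first = [], (0, 0), True
    while cursor[0] < len(buckets):
        chunk, cursor = dump_chunk(buckets, cursor, chunk_size)
        seen.extend(chunk)
        if first and mutate is not None:
            mutate(buckets)  # no lock is held between chunks
        first = False
    return seen


def delete_b(table):
    table[0].remove("B")


# Hypothetical layout: all three devices hash to the same bucket.
buckets = [["A", "B", "C"]]
result = chunked_dump(buckets, 2, delete_b)
```

In this model result ends up as ["A", "B"]: after B is removed, the
"skip 2 entries" cursor steps over C, although C existed the whole time.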

A possibly dumb question: why use an xarray rather than a plain list? It
looks like the idea is to also use the xarray for device lookup, not
just for dumping?
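
To make the comparison concrete, here is a rough Python model of the
property the dump needs from whichever structure is used: iteration in
ifindex order with a cursor that is an ifindex rather than a position.
It is a sketch under assumptions, not the patch: the dict stands in for
any ifindex-ordered container (xarray or sorted list), the cursor is
assumed to be the ifindex of the first device that did not fit in the
previous chunk, and dump_chunk_ordered and ordered_dump are invented
names.

```python
def dump_chunk_ordered(devices, start_ifindex, chunk_size):
    """devices maps ifindex -> name; dump in ifindex order from start_ifindex."""
    pending = [i for i in sorted(devices) if i >= start_ifindex]
    chunk, rest = pending[:chunk_size], pending[chunk_size:]
    names = [devices[i] for i in chunk]
    # Assumed cursor: ifindex of the first device that did not fit, or None.
    resume_at = rest[0] if rest else None
    return names, resume_at


def ordered_dump(devices, chunk_size, mutate=None):
    """Run a full dump; optionally change the device set after the first chunk."""
    seen, cursor, first = [], 0, True
    while cursor is not None:
        names, cursor = dump_chunk_ordered(devices, cursor, chunk_size)
        seen.extend(names)
        if first and mutate is not None:
            mutate(devices)  # no lock is held between chunks
        first = False
    return seen


def delete_b(devs):
    del devs[2]


def add_c_delete_b(devs):
    devs[2] = "C"  # C lands between A and B
    del devs[3]


# Case 1: start [A, B, C], then "del B" after the first chunk.
no_miss = ordered_dump({1: "A", 2: "B", 3: "C"}, 2, delete_b)

# Case 2: start [A, B], then "add C" and "del B" after the first chunk.
never_existed = ordered_dump({1: "A", 3: "B"}, 1, add_c_delete_b)
```

In this model no_miss still contains C, and never_existed is just ["A"],
matching the two cases in the quoted description. A sorted plain list can
give the same ordering; the resume step then has to find the first entry
with ifindex >= cursor, which this sketch does with a linear scan.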

WRT the above, have you considered replacing dev_name_head with an
rhashtable instead (and adding the list mentioned above)?

Cheers,

Paolo

