From: Simon Horman <horms@kernel.org>
To: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Alexandre Ferrieux <alexandre.ferrieux@gmail.com>,
Alexandre Ferrieux <alexandre.ferrieux@orange.com>,
Eric Dumazet <edumazet@google.com>,
netdev@vger.kernel.org
Subject: Re: RFC: Should net namespaces scale up (>10k) ?
Date: Mon, 16 Sep 2024 15:01:30 +0100
Message-ID: <20240916140130.GB415778@kernel.org>
In-Reply-To: <db6ecdc4-8053-42d6-89cc-39c70b199bde@intel.com>
On Mon, Sep 16, 2024 at 12:13:35PM +0200, Przemek Kitszel wrote:
> On 9/15/24 22:49, Alexandre Ferrieux wrote:
> > (thanks Simon, reposting with another account to avoid the offending disclaimer)
> >
> > Hi,
> >
> > Currently, netns don't really scale beyond a few thousand, for
> > mundane reasons (see below). But should they ? Is there, in the
> > design, an assumption that tens of thousands of network namespaces are
> > considered "unreasonable" ?
> >
> > A typical use case for such ridiculous numbers is a tester for
> > firewalls or carrier-grade NATs. In these, you typically want tens of
> > thousands of tunnels, each of which is perfectly instantiated as an
> > interface. And, to avoid an explosion in source routing rules, you
> > want them in separate namespaces.
> >
> > Now why don't they scale *today* ? For two independent, seemingly
> > accidental, O(N) scans of the netns list.
> >
> > 1. The "netdevice notifier" from the Wireless Extensions subsystem
> > insists on scanning the whole list regardless of the nature of the
> > change, nor wondering whether all these namespaces hold any wireless
> > interface, nor even whether the system has _any_ wireless hardware...
> >
> > 	for_each_net(net) {
> > 		while ((skb = skb_dequeue(&net->wext_nlevents)))
> > 			rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
> > 				    GFP_KERNEL);
> > 	}
> >
> > 2. When moving an interface (eg an IPVLAN slave) to another netns,
> > __dev_change_net_namespace() calls peernet2id_alloc() in order to get
> > an ID for the target namespace. This again incurs a full scan of the
> > netns list:
> >
> > int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
>
> this piece is inside of __peernet2id(), which is called in for_each_net
> loop, making it O(n^2):
>
> 	for_each_net(tmp) {
> 		int id;
>
> 		spin_lock_bh(&tmp->nsid_lock);
> 		id = __peernet2id(tmp, net);
>
> >
> > Note that, while IDR is very fast when going from ID to pointer, the
> > reverse path is awfully slow... But why are IDs needed in the first
> > place, instead of the simple netns pointers ?
> >
> > Any insight on the (possibly very good) reasons those two apparent
> > warts stand in the way of netns scaling up ?
> >
> > -Alex
> >
>
> I guess that the reason is more pragmatic: net namespaces are a decade
> older than xarray, hence the list-based implementation.

Yes, I would also guess that these limitations were not part of the
design, but rather that the implementation scaled sufficiently at the
time. If further scale is required, the implementation can be updated.
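
For illustration only, here is a userspace toy model of the asymmetry
discussed above: going from ID to pointer is a direct index, while going
from pointer to ID has to visit every slot, and doing that once per
namespace multiplies the cost. This is not kernel code. The flat array
stands in for the IDR (which is a tree in the kernel), and every name
below (id_table, id_to_ptr, ptr_to_id, scan_all, MAX_IDS) is made up for
the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDS 16	/* hypothetical table size, for the sketch only */

/* Toy stand-in for an ID table: the slot index is the ID. */
struct id_table {
	const void *slot[MAX_IDS];
};

/* ID -> pointer: a direct index, constant time in this model. */
static const void *id_to_ptr(const struct id_table *t, int id)
{
	assert(t != NULL);
	if (id < 0 || id >= MAX_IDS)
		return NULL;
	return t->slot[id];
}

/*
 * Pointer -> ID: there is nothing to index by, so slots are visited in
 * order until a match is found. *steps counts the slots visited.
 */
static int ptr_to_id(const struct id_table *t, const void *peer,
		     unsigned long *steps)
{
	for (int id = 0; id < MAX_IDS; id++) {
		(*steps)++;
		if (t->slot[id] == peer)
			return id;
	}
	return -1;
}

/*
 * One reverse lookup per table, modelling a reverse lookup done inside a
 * loop over all namespaces. Returns the total number of slots visited.
 */
static unsigned long scan_all(const struct id_table *tables, int ntables,
			      const void *peer)
{
	unsigned long steps = 0;

	for (int i = 0; i < ntables; i++)
		(void)ptr_to_id(&tables[i], peer, &steps);
	return steps;
}
```

With N tables of up to N entries each, scan_all() visits on the order of
N * N slots when the peer is absent, which is the quadratic behaviour
Przemek points out. A reverse map keyed by pointer would presumably avoid
the inner scan, but that is a design question for the thread, not
something this sketch settles.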