netdev.vger.kernel.org archive mirror
* RFC: Should net namespaces scale up (>10k) ?
@ 2024-09-14 22:34 alexandre.ferrieux
  2024-09-15 18:58 ` Simon Horman
  0 siblings, 1 reply; 14+ messages in thread
From: alexandre.ferrieux @ 2024-09-14 22:34 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet

Hi,

Currently, netns don't really scale beyond a few thousand, for mundane reasons
(see below). But should they? Is there, in the design, an assumption that tens
of thousands of network namespaces are "unreasonable"?

A typical use case for such ridiculous numbers is a tester for firewalls or
carrier-grade NATs. In these, you typically want tens of thousands of tunnels,
each of which maps naturally onto an interface. And, to avoid an explosion in
source routing rules, you want them in separate namespaces.

Now why don't they scale *today*? Because of two independent, seemingly
accidental, O(N) scans of the netns list.

1. The "netdevice notifier" from the Wireless Extensions subsystem insists on
scanning the whole list regardless of the nature of the change, without checking
whether any of these namespaces holds a wireless interface, or even whether the
system has _any_ wireless hardware...

         for_each_net(net) {
                 while ((skb = skb_dequeue(&net->wext_nlevents)))
                         rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
                                     GFP_KERNEL);
         }
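
To make the cost concrete, here is a minimal userspace sketch, not kernel
code. It models the loop above next to a guarded variant that returns early
when nothing is queued anywhere. All names (model_net, total_pending,
flush_guarded, NR_NETS) are hypothetical, and the global counter is only one
possible approach, assumed here for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NETS 20000           /* hypothetical namespace count */

/* Userspace stand-in for struct net; only the event queue length is modeled. */
struct model_net {
    int pending;                /* plays the role of net->wext_nlevents */
};

static struct model_net nets[NR_NETS];
static long total_pending;      /* assumed global count of queued events */
static long nets_visited;       /* instrumentation: namespaces walked so far */

/* Queue one event on a namespace and account for it globally. */
static void queue_event(struct model_net *net)
{
    net->pending++;
    total_pending++;
}

/* Same shape as the quoted loop: visits every namespace unconditionally. */
static int flush_full_scan(void)
{
    int delivered = 0;

    for (size_t i = 0; i < NR_NETS; i++) {
        nets_visited++;
        while (nets[i].pending > 0) {   /* models skb_dequeue() + rtnl_notify() */
            nets[i].pending--;
            total_pending--;
            delivered++;
        }
    }
    assert(total_pending == 0);
    return delivered;
}

/* Guarded variant: O(1) when nothing is queued anywhere. */
static int flush_guarded(void)
{
    if (total_pending == 0)
        return 0;
    return flush_full_scan();
}
```

On a system with no wireless events at all, flush_guarded() never walks the
list, whereas flush_full_scan() pays NR_NETS iterations on every call.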

2. When moving an interface (e.g. an IPVLAN slave) to another netns,
__dev_change_net_namespace() calls peernet2id_alloc() in order to get an ID for
the target namespace. This again incurs a full scan of the netns list:

         int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);

Note that, while IDR is very fast when going from ID to pointer, the reverse
path is awfully slow... But why are IDs needed in the first place, instead of
the simple netns pointers?
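
To illustrate the asymmetry, here is a small userspace sketch, again not
kernel code. It keeps a forward table (ID to pointer) and compares a reverse
lookup by linear scan with a reverse lookup through an auxiliary hash keyed
by pointer. Everything here (peer_net, by_id, by_peer, id_alloc, the table
sizes) is hypothetical; ID release is not modeled, and this is not a claim
about how the kernel should or does solve it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IDS   16384         /* hypothetical capacity of the ID table */
#define HASH_SIZE 32768         /* power of two, twice MAX_IDS: never full */

struct peer_net { int unused; };                 /* stand-in for struct net */

static const struct peer_net *by_id[MAX_IDS];    /* forward: id -> peer */
static int next_id;

/* Reverse index: open-addressing hash keyed by the peer pointer. */
static struct { const struct peer_net *peer; int id; } by_peer[HASH_SIZE];

static size_t hash_ptr(const struct peer_net *p)
{
    return (size_t)(((uintptr_t)p >> 4) * 2654435761u) & (HASH_SIZE - 1);
}

/* Reverse lookup in the style of the quoted idr_for_each(): walk every ID. */
static int id_of_scan(const struct peer_net *peer)
{
    for (int id = 0; id < next_id; id++)
        if (by_id[id] == peer)
            return id;
    return -1;
}

/* Reverse lookup through the auxiliary hash: expected O(1). */
static int id_of_hash(const struct peer_net *peer)
{
    for (size_t h = hash_ptr(peer); by_peer[h].peer;
         h = (h + 1) & (HASH_SIZE - 1))
        if (by_peer[h].peer == peer)
            return by_peer[h].id;
    return -1;
}

/* Return the existing ID for peer, or allocate one and update both tables. */
static int id_alloc(const struct peer_net *peer)
{
    int id = id_of_hash(peer);
    size_t h;

    if (id >= 0 || next_id == MAX_IDS)
        return id;              /* found, or table exhausted (-1) */
    id = next_id++;
    by_id[id] = peer;
    for (h = hash_ptr(peer); by_peer[h].peer; h = (h + 1) & (HASH_SIZE - 1))
        ;                       /* linear probing to the first free slot */
    by_peer[h].peer = peer;
    by_peer[h].id = id;
    return id;
}
```

With N registered peers, id_of_scan() costs up to N comparisons per call,
while id_of_hash() touches a handful of slots, at the price of maintaining a
second structure on every allocation.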

Any insight on the (possibly very good) reasons why these two apparent warts
stand in the way of netns scaling up?

-Alex

* Re: RFC: Should net namespaces scale up (>10k) ?
@ 2024-09-15 20:49 Alexandre Ferrieux
  2024-09-16 10:13 ` Przemek Kitszel
  2024-10-08 17:47 ` Kuniyuki Iwashima
  0 siblings, 2 replies; 14+ messages in thread
From: Alexandre Ferrieux @ 2024-09-15 20:49 UTC (permalink / raw)
  To: horms; +Cc: Alexandre Ferrieux, Eric Dumazet, netdev

(thanks Simon, reposting with another account to avoid the offending disclaimer)



Thread overview: 14+ messages
2024-09-14 22:34 RFC: Should net namespaces scale up (>10k) ? alexandre.ferrieux
2024-09-15 18:58 ` Simon Horman
  -- strict thread matches above, loose matches on Subject: below --
2024-09-15 20:49 Alexandre Ferrieux
2024-09-16 10:13 ` Przemek Kitszel
2024-09-16 14:01   ` Simon Horman
2024-09-16 22:05     ` Alexandre Ferrieux
2024-09-17  6:40       ` Przemek Kitszel
2024-09-17 11:06         ` Alexandre Ferrieux
2024-09-17  6:59       ` Eric Dumazet
2024-09-17 12:30         ` Nicolas Dichtel
2024-09-16 21:36   ` Alexandre Ferrieux
2024-10-08 17:47 ` Kuniyuki Iwashima
2024-10-08 18:22   ` Johannes Berg
2024-10-08 18:56     ` Kuniyuki Iwashima