RFC: Should net namespaces scale up (>10k) ?

netdev.vger.kernel.org archive mirror

* RFC: Should net namespaces scale up (>10k) ?
@ 2024-09-14 22:34 alexandre.ferrieux
  2024-09-15 18:58 ` Simon Horman
  0 siblings, 1 reply; 14+ messages in thread
From: alexandre.ferrieux @ 2024-09-14 22:34 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet

Hi,

Currently, netns don't really scale beyond a few thousand, for mundane reasons
(see below). But should they? Is there, in the design, an assumption that tens
of thousands of network namespaces are "unreasonable"?

A typical use case for such ridiculous numbers is a tester for firewalls or 
carrier-grade NATs. In these, you typically want tens of thousands of tunnels, 
each of which is perfectly instantiated as an interface. And, to avoid an 
explosion in source routing rules, you want them in separate namespaces.

Now why don't they scale *today*? Because of two independent, seemingly
accidental, O(N) scans of the netns list.

1. The "netdevice notifier" from the Wireless Extensions subsystem insists on
scanning the whole list regardless of the nature of the change, without checking
whether any of these namespaces holds a wireless interface, or even whether the
system has _any_ wireless hardware...

         for_each_net(net) {
                 while ((skb = skb_dequeue(&net->wext_nlevents)))
                         rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
                                     GFP_KERNEL);
         }

2. When moving an interface (e.g. an IPVLAN slave) to another netns,
__dev_change_net_namespace() calls peernet2id_alloc() in order to get an ID for
the target namespace. This again incurs a full scan of the netns list:

         int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);

Note that, while IDR is very fast when going from ID to pointer, the reverse
path is awfully slow... But why are IDs needed in the first place, instead of
plain netns pointers?
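
To make the asymmetry concrete, here is a minimal userspace sketch. It is a toy
model, not the kernel's IDR (which is a tree, not a flat array), and every
name in it (toy_id_table, toy_alloc_id, toy_id_to_peer, toy_peer_to_id) is
hypothetical. It only illustrates that the forward lookup is one access while
the reverse lookup has to compare entries one by one.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_IDS 16384

/* Toy stand-in for a per-namespace ID table; NOT the kernel's struct idr. */
struct toy_id_table {
	const void *slot[TOY_MAX_IDS];	/* slot[id] holds the peer pointer */
	int next_id;			/* IDs are handed out sequentially */
};

/* Assign the next free ID to peer; returns -1 when the table is full. */
static int toy_alloc_id(struct toy_id_table *t, const void *peer)
{
	assert(peer != NULL);
	if (t->next_id >= TOY_MAX_IDS)
		return -1;
	t->slot[t->next_id] = peer;
	return t->next_id++;
}

/* Forward path, ID to pointer: a single array access. */
static const void *toy_id_to_peer(const struct toy_id_table *t, int id)
{
	if (id < 0 || id >= t->next_id)
		return NULL;
	return t->slot[id];
}

/*
 * Reverse path, pointer to ID: compare entries one by one until a match.
 * *visited is incremented once per entry examined.
 */
static int toy_peer_to_id(const struct toy_id_table *t, const void *peer,
			  long *visited)
{
	for (int id = 0; id < t->next_id; id++) {
		(*visited)++;
		if (t->slot[id] == peer)
			return id;
	}
	return -1;
}
```

In this model, with 10000 peers registered, finding the ID of the last one
examines 10000 entries, whereas toy_id_to_peer() never examines more than one.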

Any insight into the (possibly very good) reasons why these two apparent warts
stand in the way of netns scaling up?

-Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RFC: Should net namespaces scale up (>10k) ?
  2024-09-14 22:34 RFC: Should net namespaces scale up (>10k) ? alexandre.ferrieux
@ 2024-09-15 18:58 ` Simon Horman
  0 siblings, 0 replies; 14+ messages in thread
From: Simon Horman @ 2024-09-15 18:58 UTC (permalink / raw)
  To: alexandre.ferrieux; +Cc: netdev, Eric Dumazet

On Sun, Sep 15, 2024 at 12:34:05AM +0200, alexandre.ferrieux@orange.com wrote:

Hi Alex,

These are good questions, but you will need to post them without the
disclaimer below.

> ____________________________________________________________________________________________________________
> This message and its attachments may contain confidential or privileged information that may be protected by law;
> they should not be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and delete this message and its attachments.
> As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
> Thank you.

* Re: RFC: Should net namespaces scale up (>10k) ?
@ 2024-09-15 20:49 Alexandre Ferrieux
  2024-09-16 10:13 ` Przemek Kitszel
  2024-10-08 17:47 ` Kuniyuki Iwashima
  0 siblings, 2 replies; 14+ messages in thread
From: Alexandre Ferrieux @ 2024-09-15 20:49 UTC (permalink / raw)
  To: horms; +Cc: Alexandre Ferrieux, Eric Dumazet, netdev

(thanks Simon, reposting with another account to avoid the offending disclaimer)

Hi,

Currently, netns don't really scale beyond a few thousand, for
mundane reasons (see below). But should they? Is there, in the
design, an assumption that tens of thousands of network namespaces are
"unreasonable"?

A typical use case for such ridiculous numbers is a tester for
firewalls or carrier-grade NATs. In these, you typically want tens of
thousands of tunnels, each of which is perfectly instantiated as an
interface. And, to avoid an explosion in source routing rules, you
want them in separate namespaces.

Now why don't they scale *today*? Because of two independent,
seemingly accidental, O(N) scans of the netns list.

1. The "netdevice notifier" from the Wireless Extensions subsystem
insists on scanning the whole list regardless of the nature of the
change, without checking whether any of these namespaces holds a
wireless interface, or even whether the system has _any_ wireless
hardware...

        for_each_net(net) {
                while ((skb = skb_dequeue(&net->wext_nlevents)))
                        rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
                                    GFP_KERNEL);
        }

2. When moving an interface (e.g. an IPVLAN slave) to another netns,
__dev_change_net_namespace() calls peernet2id_alloc() in order to get
an ID for the target namespace. This again incurs a full scan of the
netns list:

        int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);

Note that, while IDR is very fast when going from ID to pointer, the
reverse path is awfully slow... But why are IDs needed in the first
place, instead of plain netns pointers?

Any insight into the (possibly very good) reasons why these two
apparent warts stand in the way of netns scaling up?

-Alex

* Re: RFC: Should net namespaces scale up (>10k) ?
  2024-09-15 20:49 Alexandre Ferrieux
@ 2024-09-16 10:13 ` Przemek Kitszel
  2024-09-16 14:01   ` Simon Horman
  2024-09-16 21:36   ` Alexandre Ferrieux
  2024-10-08 17:47 ` Kuniyuki Iwashima
  1 sibling, 2 replies; 14+ messages in thread
From: Przemek Kitszel @ 2024-09-16 10:13 UTC (permalink / raw)
  To: Alexandre Ferrieux; +Cc: Alexandre Ferrieux, horms, Eric Dumazet, netdev

On 9/15/24 22:49, Alexandre Ferrieux wrote:
> (thanks Simon, reposting with another account to avoid the offending disclaimer)
> 
> Hi,
> 
> Currently, netns don't really scale beyond a few thousands, for
> mundane reasons (see below). But should they ? Is there, in the
> design, an assumption that tens of thousands of network namespaces are
> considered "unreasonable" ?
> 
> A typical use case for such ridiculous numbers is a tester for
> firewalls or carrier-grade NATs. In these, you typically want tens of
> thousands of tunnels, each of which is perfectly instantiated as an
> interface. And, to avoid an explosion in source routing rules, you
> want them in separate namespaces.
> 
> Now why don't they scale *today* ? For two independent, seemingly
> accidental, O(N) scans of the netns list.
> 
> 1. The "netdevice notifier" from the Wireless Extensions subsystem
> insists on scanning the whole list regardless of the nature of the
> change, nor wondering whether all these namespaces hold any wireless
> interface, nor even whether the system has _any_ wireless hardware...
> 
>          for_each_net(net) {
>                  while ((skb = skb_dequeue(&net->wext_nlevents)))
>                          rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
>                                      GFP_KERNEL);
>          }
> 
> 2. When moving an interface (eg an IPVLAN slave) to another netns,
> __dev_change_net_namespace() calls peernet2id_alloc() in order to get
> an ID for the target namespace. This again incurs a full scan of the
> netns list:
> 
>          int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);

This piece is inside __peernet2id(), which is called in a for_each_net()
loop, making it O(n^2):

        for_each_net(tmp) {
                int id;

                spin_lock_bh(&tmp->nsid_lock);
                id = __peernet2id(tmp, net);
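
The quadratic growth can be sketched in userspace as follows. This is a toy
model under a loudly stated assumption: every one of the n namespaces has
assigned an ID to every other namespace, so each reverse lookup walks a table
of n entries. The function name toy_unhash_cost is hypothetical and nothing
here is kernel code; it only counts how many entries would be examined.

```c
#include <assert.h>

/*
 * Toy model: n namespaces, each holding a table in which slot j refers to
 * namespace j. Returns how many table entries are examined when namespace
 * `dying` is looked up in every table, as a per-namespace reverse lookup
 * inside an all-namespaces loop would do.
 */
static long long toy_unhash_cost(int n, int dying)
{
	long long visited = 0;

	assert(n > 0 && dying >= 0 && dying < n);
	for (int ns = 0; ns < n; ns++) {		/* outer loop over namespaces */
		for (int slot = 0; slot < n; slot++) {	/* linear reverse lookup */
			visited++;
			if (slot == dying)
				break;
		}
	}
	return visited;
}
```

Under that assumption, removing the most recently registered of 1000
namespaces examines 1000 * 1000 entries, and the count grows with the square
of the number of namespaces.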

> 
> Note that, while IDR is very fast when going from ID to pointer, the
> reverse path is awfully slow... But why are IDs needed in the first
> place, instead of the simple netns pointers ?
> 
> Any insight on the (possibly very good) reasons those two apparent
> warts stand in the way of netns scaling up ?
> 
> -Alex
> 

I guess the reason is more pragmatic: net namespaces are a decade
older than xarray, hence the list-based implementation.

* Re: RFC: Should net namespaces scale up (>10k) ?
  2024-09-16 10:13 ` Przemek Kitszel
@ 2024-09-16 14:01   ` Simon Horman
  2024-09-16 22:05     ` Alexandre Ferrieux
  2024-09-16 21:36   ` Alexandre Ferrieux
  1 sibling, 1 reply; 14+ messages in thread
From: Simon Horman @ 2024-09-16 14:01 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: Alexandre Ferrieux, Alexandre Ferrieux, Eric Dumazet, netdev

On Mon, Sep 16, 2024 at 12:13:35PM +0200, Przemek Kitszel wrote:
> On 9/15/24 22:49, Alexandre Ferrieux wrote:
> > (thanks Simon, reposting with another account to avoid the offending disclaimer)
> > 
> > Hi,
> > 
> > Currently, netns don't really scale beyond a few thousands, for
> > mundane reasons (see below). But should they ? Is there, in the
> > design, an assumption that tens of thousands of network namespaces are
> > considered "unreasonable" ?
> > 
> > A typical use case for such ridiculous numbers is a tester for
> > firewalls or carrier-grade NATs. In these, you typically want tens of
> > thousands of tunnels, each of which is perfectly instantiated as an
> > interface. And, to avoid an explosion in source routing rules, you
> > want them in separate namespaces.
> > 
> > Now why don't they scale *today* ? For two independent, seemingly
> > accidental, O(N) scans of the netns list.
> > 
> > 1. The "netdevice notifier" from the Wireless Extensions subsystem
> > insists on scanning the whole list regardless of the nature of the
> > change, nor wondering whether all these namespaces hold any wireless
> > interface, nor even whether the system has _any_ wireless hardware...
> > 
> >          for_each_net(net) {
> >                  while ((skb = skb_dequeue(&net->wext_nlevents)))
> >                          rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
> >                                      GFP_KERNEL);
> >          }
> > 
> > 2. When moving an interface (eg an IPVLAN slave) to another netns,
> > __dev_change_net_namespace() calls peernet2id_alloc() in order to get
> > an ID for the target namespace. This again incurs a full scan of the
> > netns list:
> > 
> >          int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
> 
> this piece is inside of __peernet2id(), which is called in for_each_net
> loop, making it O(n^2):
> 
>         for_each_net(tmp) {
>                 int id;
>
>                 spin_lock_bh(&tmp->nsid_lock);
>                 id = __peernet2id(tmp, net);
> 
> > 
> > Note that, while IDR is very fast when going from ID to pointer, the
> > reverse path is awfully slow... But why are IDs needed in the first
> > place, instead of the simple netns pointers ?
> > 
> > Any insight on the (possibly very good) reasons those two apparent
> > warts stand in the way of netns scaling up ?
> > 
> > -Alex
> > 
> 
> I guess that the reason is more pragmatic, net namespaces are decade
> older than xarray, thus list-based implementation.

Yes, I would also guess that these limitations were not part of the
design, just that the implementation scaled sufficiently at the time, and
that if further scale is required, the implementation can be updated.

* Re: RFC: Should net namespaces scale up (>10k) ?
  2024-09-16 14:01   ` Simon Horman
@ 2024-09-16 22:05     ` Alexandre Ferrieux
  2024-09-17  6:40       ` Przemek Kitszel
  2024-09-17  6:59       ` Eric Dumazet
  0 siblings, 2 replies; 14+ messages in thread
From: Alexandre Ferrieux @ 2024-09-16 22:05 UTC (permalink / raw)
  To: Simon Horman, Przemek Kitszel; +Cc: Alexandre Ferrieux, Eric Dumazet, netdev

On 16/09/2024 16:01, Simon Horman wrote:
> 
>> > Any insight on the (possibly very good) reasons those two apparent
>> > warts stand in the way of netns scaling up ?
>> 
>> I guess that the reason is more pragmatic, net namespaces are decade
>> older than xarray, thus list-based implementation.
> 
> Yes, I would also guess that the reason is not that these limitations were
> part of the design. But just that the implementation scaled sufficiently at
> the time. And that if further scale is required, then the implementation
> can be updated.

Okay, thank you for confirming my fears :}
Now, what shall we do:

 1. Ignore this corner case and carve the "few netns" assumption in stone;

 2. Migrate netns IDs to xarrays (not to mention other leftover uses of IDR).

Note that this funny workload of mine is a typical situation where the "DPDK
beats Linux" myth gets reinforced. I find this pretty disappointing, as it
implies reinventing the whole network stack in userspace. All the more so, as
the other typical case for DPDK is now moot thanks to XDP.

What do you think ?

* Re: RFC: Should net namespaces scale up (>10k) ?
  2024-09-16 22:05     ` Alexandre Ferrieux
@ 2024-09-17  6:40       ` Przemek Kitszel
  2024-09-17 11:06         ` Alexandre Ferrieux
  2024-09-17  6:59       ` Eric Dumazet
  1 sibling, 1 reply; 14+ messages in thread
From: Przemek Kitszel @ 2024-09-17  6:40 UTC (permalink / raw)
  To: Alexandre Ferrieux, Simon Horman
  Cc: Eric Dumazet, netdev, Tony Nguyen, Knitter, Konrad

On 9/17/24 00:05, Alexandre Ferrieux wrote:
> On 16/09/2024 16:01, Simon Horman wrote:
>>
>>>> Any insight on the (possibly very good) reasons those two apparent
>>>> warts stand in the way of netns scaling up ?
>>>
>>> I guess that the reason is more pragmatic, net namespaces are decade
>>> older than xarray, thus list-based implementation.
>>
>> Yes, I would also guess that the reason is not that these limitations were
>> part of the design. But just that the implementation scaled sufficiently at
>> the time. And that if further scale is required, then the implementation
>> can be updated.
> 
> Okay, thank you for confirming my fears :}
> Now, what shall we do:
> 
>   1. Ignore this corner case and carve the "few netns" assumption in stone;
> 
>   2. Migrate netns IDs to xarrays (not to mention other leftover uses of IDR).
> 
> Note that this funny workload of mine is a typical situation where the "DPDK
> beats Linux" myth gets reinforced. I find this pretty disappointing, as it
> implies reinventing the whole network stack in userspace. All the more so, as
> the other typical case for DPDK is now moot thanks to XDP.
> 
> What do you think ?

It would help to describe (here) in more detail the typical scenario in
which users bother to set up DPDK for perf gains.

With that, I think there is a legitimate reason to rewrite parts of netns,
if only to allow companies to shuffle engineers out of DPDK-support
teams and into upstream-related ones :) [in the long term ofc]

* Re: RFC: Should net namespaces scale up (>10k) ?
  2024-09-17  6:40       ` Przemek Kitszel
@ 2024-09-17 11:06         ` Alexandre Ferrieux
  0 siblings, 0 replies; 14+ messages in thread
From: Alexandre Ferrieux @ 2024-09-17 11:06 UTC (permalink / raw)
  To: Przemek Kitszel, Alexandre Ferrieux, Simon Horman
  Cc: Eric Dumazet, netdev, Tony Nguyen, Knitter, Konrad,
	nicolas.dichtel

On 17/09/2024 08:40, Przemek Kitszel wrote:
> On 9/17/24 00:05, Alexandre Ferrieux wrote:
>> Now, what shall we do:
>> 
>>   1. Ignore this corner case and carve the "few netns" assumption in stone;
>> 
>>   2. Migrate netns IDs to xarrays (not to mention other leftover uses of IDR).
>> 
>> Note that this funny workload of mine is a typical situation where the "DPDK
>> beats Linux" myth gets reinforced. I find this pretty disappointing, as it
>> implies reinventing the whole network stack in userspace. All the more so, as
>> the other typical case for DPDK is now moot thanks to XDP.
>> 
>> What do you think ?
> 
> I would describe (here) more what is this typical scenario where users
> bother to set up DPDK for perf gains.

Two cases from my experience:

(1) "Bump-in-the-wire" rx/tx on the same port, trying to reach line rate on one
or several 100Gbps interfaces. On this one, Linux performs beautifully, with "no
fat": just use XDP (verdict XDP_TX) along with some packet tweaking in a kfunc.
Of course you need to get the number of queues, coalescing, IRQs and NUMA right.
And you need a well-written native-XDP mode in the driver (not all NICs have
one). Here, the "DPDK advantage" is a lie.

(2) "Many-tunnels" as in my CGNAT tester case. Due to the limitations we are
talking about, people are right (so far) to turn to DPDK, as they do for example
in TRex (https://trex-tgn.cisco.com/).

> With that I think that is a legitimate reason to rewrite parts of netns,
> if only to allow companies to shuffle engineers out from DPDK-support
> teams into upstream-related ones :) [in the long term ofc]

I violently agree :)

* Re: RFC: Should net namespaces scale up (>10k) ?
  2024-09-16 22:05     ` Alexandre Ferrieux
  2024-09-17  6:40       ` Przemek Kitszel
@ 2024-09-17  6:59       ` Eric Dumazet
  2024-09-17 12:30         ` Nicolas Dichtel
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2024-09-17  6:59 UTC (permalink / raw)
  To: Alexandre Ferrieux; +Cc: Simon Horman, Przemek Kitszel, netdev

On Tue, Sep 17, 2024 at 12:06 AM Alexandre Ferrieux
<alexandre.ferrieux@gmail.com> wrote:
>
> On 16/09/2024 16:01, Simon Horman wrote:
> >
> >> > Any insight on the (possibly very good) reasons those two apparent
> >> > warts stand in the way of netns scaling up ?
> >>
> >> I guess that the reason is more pragmatic, net namespaces are decade
> >> older than xarray, thus list-based implementation.
> >
> > Yes, I would also guess that the reason is not that these limitations were
> > part of the design. But just that the implementation scaled sufficiently at
> > the time. And that if further scale is required, then the implementation
> > can be updated.
>
> Okay, thank you for confirming my fears :}
> Now, what shall we do:
>
>  1. Ignore this corner case and carve the "few netns" assumption in stone;
>
>  2. Migrate netns IDs to xarrays (not to mention other leftover uses of IDR).
>
> Note that this funny workload of mine is a typical situation where the "DPDK
> beats Linux" myth gets reinforced. I find this pretty disappointing, as it
> implies reinventing the whole network stack in userspace. All the more so, as
> the other typical case for DPDK is now moot thanks to XDP.
>
> What do you think ?

I do not see any blocker for making things more scalable.

It is only a matter of time and interest. I think that 99.99% of
Linux hosts around the world have fewer than 10 netns.

RTNL removal is a little bit harder (and we hit RTNL contention even
with fewer than 10 netns around).

* Re: RFC: Should net namespaces scale up (>10k) ?
  2024-09-17  6:59       ` Eric Dumazet
@ 2024-09-17 12:30         ` Nicolas Dichtel
  0 siblings, 0 replies; 14+ messages in thread
From: Nicolas Dichtel @ 2024-09-17 12:30 UTC (permalink / raw)
  To: Eric Dumazet, Alexandre Ferrieux; +Cc: Simon Horman, Przemek Kitszel, netdev

On 17/09/2024 at 08:59, Eric Dumazet wrote:
[snip]

> 
> I do not see any blocker for making things more scalable.
+1

> 
> It is only a matter of time and interest. I think that 99.99 % of
> linux hosts around the world
> have less than 10 netns.
I agree. My target was 4k netns at the time I pushed these ids.

> 
> RTNL removal is a little bit harder (and we hit RTNL contention even
> with less than 10 netns around)
+1

* Re: RFC: Should net namespaces scale up (>10k) ?
  2024-09-16 10:13 ` Przemek Kitszel
  2024-09-16 14:01   ` Simon Horman
@ 2024-09-16 21:36   ` Alexandre Ferrieux
  1 sibling, 0 replies; 14+ messages in thread
From: Alexandre Ferrieux @ 2024-09-16 21:36 UTC (permalink / raw)
  To: Przemek Kitszel, Alexandre Ferrieux; +Cc: horms, Eric Dumazet, netdev

On 16/09/2024 12:13, Przemek Kitszel wrote:
> On 9/15/24 22:49, Alexandre Ferrieux wrote:
>>
>> 2. When moving an interface (eg an IPVLAN slave) to another netns,
>> __dev_change_net_namespace() calls peernet2id_alloc() in order to get
>> an ID for the target namespace. This again incurs a full scan of the
>> netns list:
>>
>>          int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
> 
> this piece is inside of __peernet2id(), which is called in for_each_net
> loop, making it O(n^2):
> 
>         for_each_net(tmp) {
>                 int id;
>
>                 spin_lock_bh(&tmp->nsid_lock);
>                 id = __peernet2id(tmp, net);

You're right, though that happens only within unhash_nsid(), which is called
when deleting a netns.

Obviously this quadratic horror you found is even worse than the linear one I
reported, but it can arguably be worked around in tester-like workloads (just
never delete the namespaces). The linear one cannot, as long as you need
to move any interface into the newly created thousands of namespaces.
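
As a rough illustration of why the linear scan still hurts when repeated, the
sketch below adds up the cost of registering n new namespaces one after
another. It is a toy model resting on an assumption not spelled out in this
thread: each move first does a reverse lookup that finds nothing (so it
compares against every entry already present) and only then allocates a new
ID. The name toy_total_scan_cost is hypothetical.

```c
#include <assert.h>

/*
 * Total number of table entries examined when n peers are registered one
 * after another, each registration being preceded by a linear lookup that
 * misses. Equivalent to 0 + 1 + ... + (n - 1).
 */
static long long toy_total_scan_cost(int n)
{
	long long visited = 0;
	int entries = 0;	/* entries currently in the table */

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		visited += entries;	/* miss: every existing entry is compared */
		entries++;		/* then the new peer gets its ID */
	}
	return visited;
}
```

In this model, 10000 namespaces cost 10000 * 9999 / 2 = 49995000 comparisons
in total, so a scan that is linear per move becomes quadratic over the whole
setup phase.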

* Re: RFC: Should net namespaces scale up (>10k) ?
  2024-09-15 20:49 Alexandre Ferrieux
  2024-09-16 10:13 ` Przemek Kitszel
@ 2024-10-08 17:47 ` Kuniyuki Iwashima
  2024-10-08 18:22   ` Johannes Berg
  1 sibling, 1 reply; 14+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-08 17:47 UTC (permalink / raw)
  To: alexandre.ferrieux
  Cc: alexandre.ferrieux, edumazet, horms, netdev, Johannes Berg,
	linux-wireless, kuniyu

+Johannes and wireless ML.

From: Alexandre Ferrieux <alexandre.ferrieux@gmail.com>
Date: Sun, 15 Sep 2024 22:49:22 +0200
> (thanks Simon, reposting with another account to avoid the offending disclaimer)
> 
> Hi,
> 
> Currently, netns don't really scale beyond a few thousands, for
> mundane reasons (see below). But should they ? Is there, in the
> design, an assumption that tens of thousands of network namespaces are
> considered "unreasonable" ?
> 
> A typical use case for such ridiculous numbers is a tester for
> firewalls or carrier-grade NATs. In these, you typically want tens of
> thousands of tunnels, each of which is perfectly instantiated as an
> interface. And, to avoid an explosion in source routing rules, you
> want them in separate namespaces.
> 
> Now why don't they scale *today* ? For two independent, seemingly
> accidental, O(N) scans of the netns list.
> 
> 1. The "netdevice notifier" from the Wireless Extensions subsystem
> insists on scanning the whole list regardless of the nature of the
> change, nor wondering whether all these namespaces hold any wireless
> interface, nor even whether the system has _any_ wireless hardware...
> 
>         for_each_net(net) {
>                 while ((skb = skb_dequeue(&net->wext_nlevents)))
>                         rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
>                                     GFP_KERNEL);
>         }
>

Alex forwarded this mail to me and asked about 1.

I checked 8bf862739a778, but I didn't see why wext_netdev_notifier_call()
needs to iterate all netns.

Is there a case where flushing messages in the notified dev's netns is not
enough for a wext dev?

---8<---
diff --git a/net/wireless/wext-core.c b/net/wireless/wext-core.c
index 838ad6541a17..d4b613fc650c 100644
--- a/net/wireless/wext-core.c
+++ b/net/wireless/wext-core.c
@@ -343,17 +343,22 @@ static const int compat_event_type_size[] = {
 
 /* IW event code */
 
-void wireless_nlevent_flush(void)
+static void wireless_nlevent_flush_net(struct net *net)
 {
 	struct sk_buff *skb;
+
+	while ((skb = skb_dequeue(&net->wext_nlevents)))
+		rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
+			    GFP_KERNEL);
+}
+
+void wireless_nlevent_flush(void)
+{
 	struct net *net;
 
 	down_read(&net_rwsem);
-	for_each_net(net) {
-		while ((skb = skb_dequeue(&net->wext_nlevents)))
-			rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
-				    GFP_KERNEL);
-	}
+	for_each_net(net)
+		wireless_nlevent_flush_net(net);
 	up_read(&net_rwsem);
 }
 EXPORT_SYMBOL_GPL(wireless_nlevent_flush);
@@ -361,6 +366,8 @@ EXPORT_SYMBOL_GPL(wireless_nlevent_flush);
 static int wext_netdev_notifier_call(struct notifier_block *nb,
 				     unsigned long state, void *ptr)
 {
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
 	/*
 	 * When a netdev changes state in any way, flush all pending messages
 	 * to avoid them going out in a strange order, e.g. RTM_NEWLINK after
@@ -368,7 +375,7 @@ static int wext_netdev_notifier_call(struct notifier_block *nb,
 	 * or similar - all of which could otherwise happen due to delays from
 	 * schedule_work().
 	 */
-	wireless_nlevent_flush();
+	wireless_nlevent_flush_net(dev_net(dev));
 
 	return NOTIFY_OK;
 }
---8<---
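
The behavioural difference the diff above aims for can be modelled in
userspace. This is a minimal sketch, not kernel code: struct toy_net,
toy_flush_net and toy_flush_all are hypothetical names, and the event queue is
reduced to a counter. It shows that a per-namespace flush touches one
namespace and leaves events queued elsewhere untouched, while the global flush
visits every namespace.

```c
#include <assert.h>
#include <stddef.h>

/* Toy namespace: counters stand in for the net->wext_nlevents queue. */
struct toy_net {
	int pending;	/* events queued but not yet delivered */
	int delivered;	/* events already handed to listeners */
};

/* Flush a single namespace: constant work per notification. */
static void toy_flush_net(struct toy_net *net)
{
	assert(net != NULL);
	net->delivered += net->pending;
	net->pending = 0;
}

/* Flush every namespace; returns how many namespaces were visited. */
static size_t toy_flush_all(struct toy_net *nets, size_t n)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		toy_flush_net(&nets[i]);
		visited++;
	}
	return visited;
}
```

Whether leaving other namespaces' events queued until their own flush is
always acceptable is exactly the question asked above; the sketch only makes
the trade-off visible.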

* Re: RFC: Should net namespaces scale up (>10k) ?
  2024-10-08 17:47 ` Kuniyuki Iwashima
@ 2024-10-08 18:22   ` Johannes Berg
  2024-10-08 18:56     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Berg @ 2024-10-08 18:22 UTC (permalink / raw)
  To: Kuniyuki Iwashima, alexandre.ferrieux
  Cc: alexandre.ferrieux, edumazet, horms, netdev, linux-wireless

On Tue, 2024-10-08 at 10:47 -0700, Kuniyuki Iwashima wrote:

> > 1. The "netdevice notifier" from the Wireless Extensions subsystem
> > insists on scanning the whole list regardless of the nature of the
> > change, nor wondering whether all these namespaces hold any wireless
> > interface, nor even whether the system has _any_ wireless hardware...
> > 
> >         for_each_net(net) {
> >                 while ((skb = skb_dequeue(&net->wext_nlevents)))
> >                         rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
> >                                     GFP_KERNEL);
> >         }
> > 
> 
> Alex forwarded this mail to me and asked about 1.
> 
> I checked 8bf862739a778, but I didn't see why wext_netdev_notifier_call()
> needs to iterate all netns.

Agree. That code is ancient, and I don't remember why, but I'd think
it's just because I was lazy then.

> diff --git a/net/wireless/wext-core.c b/net/wireless/wext-core.c
> index 838ad6541a17..d4b613fc650c 100644
> --- a/net/wireless/wext-core.c
> +++ b/net/wireless/wext-core.c
> @@ -343,17 +343,22 @@ static const int compat_event_type_size[] = {
>  
>  /* IW event code */
>  
> -void wireless_nlevent_flush(void)
> +static void wireless_nlevent_flush_net(struct net *net)
>  {
>  	struct sk_buff *skb;
> +
> +	while ((skb = skb_dequeue(&net->wext_nlevents)))
> +		rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
> +			    GFP_KERNEL);
> +}
> +
> +void wireless_nlevent_flush(void)
> +{
>  	struct net *net;
>  
>  	down_read(&net_rwsem);
> -	for_each_net(net) {
> -		while ((skb = skb_dequeue(&net->wext_nlevents)))
> -			rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
> -				    GFP_KERNEL);
> -	}
> +	for_each_net(net)
> +		wireless_nlevent_flush_net(net);
>  	up_read(&net_rwsem);
>  }
>  EXPORT_SYMBOL_GPL(wireless_nlevent_flush);

Note 1: I just posted this patch yesterday:
https://lore.kernel.org/linux-wireless/20241007214715.3dd736dc3ac0.I1388536e99c37f28a007dd753c473ad21513d9a9@changeid/

so that would conflict here, I'd think.

Note 2: the only other caller of wireless_nlevent_flush() is
wireless_nlevent_process()/wireless_nlevent_work, and that work could
easily be made per netns since it comes along with net->wext_nlevents,
and then we don't need any global function at all. Seems this could be
implemented in wext_pernet_init()/wext_pernet_exit() pretty easily?
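
A possible shape for the per-netns work idea in Note 2, sketched in userspace.
Everything here is hypothetical (struct toy_work, the wext_work member,
toy_pernet_init, toy_pernet_exit, toy_run_work) and merely mimics the common
kernel pattern of embedding a work item in its owner and recovering the owner
from the work pointer; it is not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a deferred work item. */
struct toy_work {
	void (*func)(struct toy_work *work);
};

/* Toy namespace owning both its event queue and its own work item. */
struct toy_net {
	int pending;			/* stand-in for net->wext_nlevents */
	int delivered;
	struct toy_work wext_work;	/* hypothetical per-namespace work */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Work handler: flushes only the namespace that owns this work item. */
static void toy_wext_work_fn(struct toy_work *work)
{
	struct toy_net *net = toy_container_of(work, struct toy_net, wext_work);

	net->delivered += net->pending;
	net->pending = 0;
}

/* Per-namespace setup: no global list or global flush function involved. */
static void toy_pernet_init(struct toy_net *net)
{
	net->pending = 0;
	net->delivered = 0;
	net->wext_work.func = toy_wext_work_fn;
}

/* Stand-in for the scheduler running a queued work item. */
static void toy_run_work(struct toy_work *work)
{
	assert(work != NULL && work->func != NULL);
	work->func(work);
}

/* Per-namespace teardown: deliver whatever is still queued (assumption). */
static void toy_pernet_exit(struct toy_net *net)
{
	toy_run_work(&net->wext_work);
}
```

With this layout the work handler never needs to iterate over namespaces: the
work pointer itself identifies the one namespace to flush.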

johannes

* Re: RFC: Should net namespaces scale up (>10k) ?
  2024-10-08 18:22   ` Johannes Berg
@ 2024-10-08 18:56     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 14+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-08 18:56 UTC (permalink / raw)
  To: johannes
  Cc: alexandre.ferrieux, alexandre.ferrieux, edumazet, horms, kuniyu,
	linux-wireless, netdev

From: Johannes Berg <johannes@sipsolutions.net>
Date: Tue, 08 Oct 2024 20:22:38 +0200
> On Tue, 2024-10-08 at 10:47 -0700, Kuniyuki Iwashima wrote:
> 
> > > 1. The "netdevice notifier" from the Wireless Extensions subsystem
> > > insists on scanning the whole list regardless of the nature of the
> > > change, nor wondering whether all these namespaces hold any wireless
> > > interface, nor even whether the system has _any_ wireless hardware...
> > > 
> > >         for_each_net(net) {
> > >                 while ((skb = skb_dequeue(&net->wext_nlevents)))
> > >                         rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
> > >                                     GFP_KERNEL);
> > >         }
> > > 
> > 
> > Alex forwarded this mail to me and asked about 1.
> > 
> > I checked 8bf862739a778, but I didn't see why wext_netdev_notifier_call()
> > needs to iterate all netns.
> 
> Agree. That code is ancient, and I don't remember why, but I'd think
> it's just because I was lazy then.
> 
> > diff --git a/net/wireless/wext-core.c b/net/wireless/wext-core.c
> > index 838ad6541a17..d4b613fc650c 100644
> > --- a/net/wireless/wext-core.c
> > +++ b/net/wireless/wext-core.c
> > @@ -343,17 +343,22 @@ static const int compat_event_type_size[] = {
> >  
> >  /* IW event code */
> >  
> > -void wireless_nlevent_flush(void)
> > +static void wireless_nlevent_flush_net(struct net *net)
> >  {
> >  	struct sk_buff *skb;
> > +
> > +	while ((skb = skb_dequeue(&net->wext_nlevents)))
> > +		rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
> > +			    GFP_KERNEL);
> > +}
> > +
> > +void wireless_nlevent_flush(void)
> > +{
> >  	struct net *net;
> >  
> >  	down_read(&net_rwsem);
> > -	for_each_net(net) {
> > -		while ((skb = skb_dequeue(&net->wext_nlevents)))
> > -			rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
> > -				    GFP_KERNEL);
> > -	}
> > +	for_each_net(net)
> > +		wireless_nlevent_flush_net(net);
> >  	up_read(&net_rwsem);
> >  }
> >  EXPORT_SYMBOL_GPL(wireless_nlevent_flush);
> 
> Note 1: I just posted this patch yesterday:
> https://lore.kernel.org/linux-wireless/20241007214715.3dd736dc3ac0.I1388536e99c37f28a007dd753c473ad21513d9a9@changeid/
> 
> so that would conflict here, I'd think.
> 
> Note 2: the only other caller to wireless_nlevent_flush() is from
> wireless_nlevent_process()/wireless_nlevent_work, and that work could
> easily be made per netns since it comes along with net->wext_nlevents,
> and then we don't need any global function at all. Seems this could be
> implemented in wext_pernet_init()/wext_pernet_exit() pretty easily?

Sounds good.

I'll post a patch after yours lands on wireless-next.

Thanks!

Thread overview: 14+ messages
2024-09-14 22:34 RFC: Should net namespaces scale up (>10k) ? alexandre.ferrieux
2024-09-15 18:58 ` Simon Horman
  -- strict thread matches above, loose matches on Subject: below --
2024-09-15 20:49 Alexandre Ferrieux
2024-09-16 10:13 ` Przemek Kitszel
2024-09-16 14:01   ` Simon Horman
2024-09-16 22:05     ` Alexandre Ferrieux
2024-09-17  6:40       ` Przemek Kitszel
2024-09-17 11:06         ` Alexandre Ferrieux
2024-09-17  6:59       ` Eric Dumazet
2024-09-17 12:30         ` Nicolas Dichtel
2024-09-16 21:36   ` Alexandre Ferrieux
2024-10-08 17:47 ` Kuniyuki Iwashima
2024-10-08 18:22   ` Johannes Berg
2024-10-08 18:56     ` Kuniyuki Iwashima
