* Solving address deletion bottleneck in SCTP
@ 2024-10-28 12:48 Gilad Naaman
2024-10-29 15:06 ` Xin Long
2024-11-06 20:24 ` David Laight
0 siblings, 2 replies; 5+ messages in thread
From: Gilad Naaman @ 2024-10-28 12:48 UTC (permalink / raw)
To: Marcelo Ricardo Leitner, Xin Long, linux-sctp, netdev; +Cc: Gilad Naaman
Hello,
We've noticed that when a namespace has a large amount of IP addresses,
the list `net->sctp.local_addr_list` gets obscenely long.
This list contains both IPv4 and IPv6 addresses, of all scopes, and it is
a single long list, instead of a hashtable.
In our case we had 12K interfaces, each with an IPv4 and 2 IPv6 addresses
(GUA+LLA), which made deletion of a single address pretty expensive, since
it requires a linear search through 36K addresses.
Internally we solved it pretty naively by turning the list into hashmap, which
helped us avoid this bottleneck:
+ #define SCTP_ADDR_HSIZE_SHIFT 8
+ #define SCTP_ADDR_HSIZE (1 << SCTP_ADDR_HSIZE_SHIFT)
- struct list_head local_addr_list;
+ struct list_head local_addr_list[SCTP_ADDR_HSIZE];
I've used the same factor used by the IPv6 & IPv4 address tables.
I am not entirely sure this patch solves a big enough problem for the greater
general kernel community to warrant the increased memory usage (~2KiB-p-netns),
so I'll avoid sending it.
Recently, though, both IPv4 and IPv6 tables were namespacified, which makes
me think that maybe local_addr_list is no longer necessary, enabling us to
them directly instead of maintaining a separate list.
As far as I could tell, the only field of `struct sctp_sockaddr_entry` that
are used for items of this list, aside from the address itself, is the `valid`
bit, which can probably be folded into `struct in_ifaddr` and `struct inet6_ifaddr`.
What I'm suggesting, in short is:
- Represent `valid` inside the original address structs.
- Replace iteration of `local_addr_list` with iteration of ns addr tables
- Eliminate `local_addr_list`
Is this a reasonable proposal?
Thank you for your time,
Gilad
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Solving address deletion bottleneck in SCTP
2024-10-28 12:48 Solving address deletion bottleneck in SCTP Gilad Naaman
@ 2024-10-29 15:06 ` Xin Long
2024-11-05 9:02 ` Gilad Naaman
2024-11-06 20:24 ` David Laight
1 sibling, 1 reply; 5+ messages in thread
From: Xin Long @ 2024-10-29 15:06 UTC (permalink / raw)
To: Gilad Naaman; +Cc: Marcelo Ricardo Leitner, linux-sctp, netdev
On Mon, Oct 28, 2024 at 8:49 AM Gilad Naaman <gnaaman@drivenets.com> wrote:
>
> Hello,
>
> We've noticed that when a namespace has a large amount of IP addresses,
> the list `net->sctp.local_addr_list` gets obscenely long.
>
> This list contains both IPv4 and IPv6 addresses, of all scopes, and it is
> a single long list, instead of a hashtable.
>
> In our case we had 12K interfaces, each with an IPv4 and 2 IPv6 addresses
> (GUA+LLA), which made deletion of a single address pretty expensive, since
> it requires a linear search through 36K addresses.
>
> Internally we solved it pretty naively by turning the list into hashmap, which
> helped us avoid this bottleneck:
>
> + #define SCTP_ADDR_HSIZE_SHIFT 8
> + #define SCTP_ADDR_HSIZE (1 << SCTP_ADDR_HSIZE_SHIFT)
>
> - struct list_head local_addr_list;
> + struct list_head local_addr_list[SCTP_ADDR_HSIZE];
>
>
> I've used the same factor used by the IPv6 & IPv4 address tables.
>
> I am not entirely sure this patch solves a big enough problem for the greater
> general kernel community to warrant the increased memory usage (~2KiB-p-netns),
> so I'll avoid sending it.
>
> Recently, though, both IPv4 and IPv6 tables were namespacified, which makes
> me think that maybe local_addr_list is no longer necessary, enabling us to
> them directly instead of maintaining a separate list.
>
> As far as I could tell, the only field of `struct sctp_sockaddr_entry` that
> are used for items of this list, aside from the address itself, is the `valid`
> bit, which can probably be folded into `struct in_ifaddr` and `struct inet6_ifaddr`.
>
> What I'm suggesting, in short is:
> - Represent `valid` inside the original address structs.
> - Replace iteration of `local_addr_list` with iteration of ns addr tables
> - Eliminate `local_addr_list`
>
> Is this a reasonable proposal?
This would simplify sctp_inet6addr_event() and sctp_inetaddr_event(),
but complicate sctp_copy_laddrs() and sctp_copy_local_addr_list().
Would you like to create a patch for this and let's see how it looks?
Note I don't think that that 'valid' bit is useful:
if (addr->a.sa.sa_family == AF_INET &&
addr->a.v4.sin_addr.s_addr ==
ifa->ifa_local) {
sctp_addr_wq_mgmt(net, addr, SCTP_ADDR_DEL);
found = 1;
<-------- [1]
addr->valid = 0;
list_del_rcu(&addr->list);
break;
}
'addr' can be copied before "addr->valid = 0;" with addr->valid =1 in
another thread anyway. I think you can ignore this 'valid' bit.
Thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Solving address deletion bottleneck in SCTP
2024-10-29 15:06 ` Xin Long
@ 2024-11-05 9:02 ` Gilad Naaman
0 siblings, 0 replies; 5+ messages in thread
From: Gilad Naaman @ 2024-11-05 9:02 UTC (permalink / raw)
To: lucien.xin; +Cc: gnaaman, linux-sctp, marcelo.leitner, netdev
> On Mon, Oct 28, 2024 at 8:49 AM Gilad Naaman <gnaaman@drivenets.com> wrote:
> >
> > Hello,
> >
> > We've noticed that when a namespace has a large amount of IP addresses,
> > the list `net->sctp.local_addr_list` gets obscenely long.
> >
> > This list contains both IPv4 and IPv6 addresses, of all scopes, and it is
> > a single long list, instead of a hashtable.
> >
> > In our case we had 12K interfaces, each with an IPv4 and 2 IPv6 addresses
> > (GUA+LLA), which made deletion of a single address pretty expensive, since
> > it requires a linear search through 36K addresses.
> >
> > Internally we solved it pretty naively by turning the list into hashmap, which
> > helped us avoid this bottleneck:
> >
> > + #define SCTP_ADDR_HSIZE_SHIFT 8
> > + #define SCTP_ADDR_HSIZE (1 << SCTP_ADDR_HSIZE_SHIFT)
> >
> > - struct list_head local_addr_list;
> > + struct list_head local_addr_list[SCTP_ADDR_HSIZE];
> >
> >
> > I've used the same factor used by the IPv6 & IPv4 address tables.
> >
> > I am not entirely sure this patch solves a big enough problem for the greater
> > general kernel community to warrant the increased memory usage (~2KiB-p-netns),
> > so I'll avoid sending it.
> >
> > Recently, though, both IPv4 and IPv6 tables were namespacified, which makes
> > me think that maybe local_addr_list is no longer necessary, enabling us to
> > them directly instead of maintaining a separate list.
> >
> > As far as I could tell, the only field of `struct sctp_sockaddr_entry` that
> > are used for items of this list, aside from the address itself, is the `valid`
> > bit, which can probably be folded into `struct in_ifaddr` and `struct inet6_ifaddr`.
> >
> > What I'm suggesting, in short is:
> > - Represent `valid` inside the original address structs.
> > - Replace iteration of `local_addr_list` with iteration of ns addr tables
> > - Eliminate `local_addr_list`
> >
> > Is this a reasonable proposal?
> This would simplify sctp_inet6addr_event() and sctp_inetaddr_event(),
> but complicate sctp_copy_laddrs() and sctp_copy_local_addr_list().
>
> Would you like to create a patch for this and let's see how it looks?
I've implemented it, and to be honest, the result is neither here nor there.
Tried first with:
for (idx = 0; idx < IN4_ADDR_HSIZE; idx++)
hlist_for_each_entry_rcu(ifa, &net->ipv4.inet_addr_lst[idx], addr_lst)
But after repeating it 4 times realized it should probably be extracted into
a macro, which didn't turn out that well:
#define _ifaddr_entry(node) hlist_entry_safe(rcu_dereference_raw(node), struct in_ifaddr, addr_lst)
#define for_each_inet_addr_rcu(idx, ifa, net) for ( \
idx = 0, \
ifa = _ifaddr_entry(hlist_first_rcu(&(net)->ipv4.inet_addr_lst[idx])); \
\
idx < IN4_ADDR_HSIZE; \
\
ifa = (ifa && ifa->addr_list.next) \
? _ifaddr_entry(hlist_next_rcu(&(ifa)->addr_lst)) \
: (++idx < IN4_ADDR_HSIZE \
? _ifaddr_entry(hlist_first_rcu(&(net)->ipv4.inet_addr_lst[idx])) \
: NULL) \
) if (ifa)
sctp_copy_laddrs() and sctp_copy_local_addr_list() do contain a bit of
duplication now, but I admit I like that we can avoid iterating addresses
when they are not relevant:
if ((copy_flags & SCTP_ADDR4_ALLOWED) &&
(copy_flags & SCTP_ADDR4_PEERSUPP)) {
error = sctp_copy_local_ipv4_addrs(net, bp, scope);
if (error)
goto unlock;
}
#if IS_ENABLED(CONFIG_IPV6)
if ((copy_flags & SCTP_ADDR6_ALLOWED) &&
(copy_flags & SCTP_ADDR6_PEERSUPP)) {
error = sctp_copy_local_ipv6_addrs(net, bp, scope);
if (error)
goto unlock;
}
#endif
I'll send a patch if I can figure out how to make the for_each macro not
look like a train-wreck.
Thank you!
> Note I don't think that that 'valid' bit is useful:
>
> if (addr->a.sa.sa_family == AF_INET &&
> addr->a.v4.sin_addr.s_addr ==
> ifa->ifa_local) {
> sctp_addr_wq_mgmt(net, addr, SCTP_ADDR_DEL);
> found = 1;
> <-------- [1]
> addr->valid = 0;
> list_del_rcu(&addr->list);
> break;
> }
>
> 'addr' can be copied before "addr->valid = 0;" with addr->valid =1 in
> another thread anyway. I think you can ignore this 'valid' bit.
>
> Thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: Solving address deletion bottleneck in SCTP
2024-10-28 12:48 Solving address deletion bottleneck in SCTP Gilad Naaman
2024-10-29 15:06 ` Xin Long
@ 2024-11-06 20:24 ` David Laight
2024-11-07 8:18 ` Gilad Naaman
1 sibling, 1 reply; 5+ messages in thread
From: David Laight @ 2024-11-06 20:24 UTC (permalink / raw)
To: 'Gilad Naaman', Marcelo Ricardo Leitner, Xin Long,
linux-sctp@vger.kernel.org, netdev@vger.kernel.org
From: Gilad Naaman
> Sent: 28 October 2024 12:49
...
> the list `net->sctp.local_addr_list` gets obscenely long.
>
> This list contains both IPv4 and IPv6 addresses, of all scopes, and it is
> a single long list, instead of a hashtable.
>
> In our case we had 12K interfaces, each with an IPv4 and 2 IPv6 addresses
> (GUA+LLA), which made deletion of a single address pretty expensive, since
> it requires a linear search through 36K addresses.
...
Is that the list that SCTP uses in order to pass all of its local addresses
to the remote system during connection establishment?
In which case it really makes no sense to have the list at all if it contains
more than a handful of addresses.
Indeed the whole notion of 'send ALL my addresses' is just plain broken.
What happens in practise is that applications pretty much always have to
bind to all (typically both) the relevant addresses to stop the system
sending IP addresses that are unroutable from the remote system - and
may even refer to an entirely different local network.
Passing this buck to the application isn't really right either.
It ought to be a property of the network topology.
But that is hard to describe.
The two systems 10.1.1.1 and 10.1.1.2 could both have private 192.168.1.x
networks (without IP forwarding) and other 10.1.1.x hosts could be
randomly connected to either network.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: Solving address deletion bottleneck in SCTP
2024-11-06 20:24 ` David Laight
@ 2024-11-07 8:18 ` Gilad Naaman
0 siblings, 0 replies; 5+ messages in thread
From: Gilad Naaman @ 2024-11-07 8:18 UTC (permalink / raw)
To: david.laight; +Cc: gnaaman, linux-sctp, lucien.xin, marcelo.leitner, netdev
From: David Laight <David.Laight@ACULAB.COM>
> From: Gilad Naaman
> > Sent: 28 October 2024 12:49
> ...
> > the list `net->sctp.local_addr_list` gets obscenely long.
> >
> > This list contains both IPv4 and IPv6 addresses, of all scopes, and it is
> > a single long list, instead of a hashtable.
> >
> > In our case we had 12K interfaces, each with an IPv4 and 2 IPv6 addresses
> > (GUA+LLA), which made deletion of a single address pretty expensive, since
> > it requires a linear search through 36K addresses.
> ...
>
> Is that the list that SCTP uses in order to pass all of its local addresses
> to the remote system during connection establishment?
Yes, it is exactly that list.
> In which case it really makes no sense to have the list at all if it contains
> more than a handful of addresses.
>
> Indeed the whole notion of 'send ALL my addresses' is just plain broken.
> What happens in practise is that applications pretty much always have to
> bind to all (typically both) the relevant addresses to stop the system
> sending IP addresses that are unroutable from the remote system - and
> may even refer to an entirely different local network.
>
> Passing this buck to the application isn't really right either.
> It ought to be a property of the network topology.
> But that is hard to describe.
> The two systems 10.1.1.1 and 10.1.1.2 could both have private 192.168.1.x
> networks (without IP forwarding) and other 10.1.1.x hosts could be
> randomly connected to either network.
>
> David
Yeah, I'm not entirely sure what should even happen in this case.
I feel like I could use a CONFIG_SCTP_INIT_ADDRESS and CONFIG_SCTP_AUTO_ASCONF,
where setting both to false removes this behaviour and list.
Not sure if it makes sense, though.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-11-07 8:18 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-28 12:48 Solving address deletion bottleneck in SCTP Gilad Naaman
2024-10-29 15:06 ` Xin Long
2024-11-05 9:02 ` Gilad Naaman
2024-11-06 20:24 ` David Laight
2024-11-07 8:18 ` Gilad Naaman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).