From: sowmini varadhan <sowmini05@gmail.com>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Eric Dumazet" <eric.dumazet@gmail.com>,
"Niels Möller" <nisse@southpole.se>,
netdev <netdev@vger.kernel.org>,
"Jonas Bonn" <jonas@southpole.se>
Subject: Scaling 'ip addr add' (was Re: What's the right way to use a *large* number of source addresses?)
Date: Tue, 27 May 2014 17:29:00 -0400
Message-ID: <CACP96tRSpop=Sf58qE6TXeF+a-5nqfCj5d12bwD9WaYue_6_UA@mail.gmail.com>
On Sat, May 24, 2014 at 8:06 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 05/23/14 10:14, Eric Dumazet wrote:
>
>> Use the batch mode, and it will be much faster than ifconfig, as
>> ifconfig does not support this mode (you need one fork()/exec() per IP
>> address)
>>
>> ip -batch filename
>>
>
> The address dumping algorithm is a very likely contributor as well.
> It tries to remember indices and then skips on the next iteration
> all the way to where it left off.... has never been a big deal until
> someone tries a substantial number of addresses.
>
> cheers,
> jamal
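To make the dump behaviour Jamal describes concrete, here is a rough
user-space sketch, not the kernel's actual dump code. It assumes a
plain singly linked list and a hypothetical fixed batch of entries per
dump pass; struct entry, dump_pass() and dump_all_visits() are names
made up for illustration. Each pass restarts at the head, skips to the
remembered index, and emits one batch, so the total number of node
visits grows roughly quadratically with the list length.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one address entry on an interface. */
struct entry {
	struct entry *next;
};

/* Build a list of n entries; aborts if allocation fails. */
static struct entry *build_list(size_t n)
{
	struct entry *head = NULL;

	for (size_t i = 0; i < n; i++) {
		struct entry *e = malloc(sizeof(*e));

		if (e == NULL)
			abort();
		e->next = head;
		head = e;
	}
	return head;
}

static void free_list(struct entry *head)
{
	while (head != NULL) {
		struct entry *next = head->next;

		free(head);
		head = next;
	}
}

/*
 * One dump pass: restart at the head, skip entries below *resume,
 * emit at most 'batch' entries, then remember where we stopped.
 * Returns the number of nodes visited, including skipped ones.
 */
static unsigned long dump_pass(const struct entry *head, size_t batch,
			       size_t *resume, int *done)
{
	unsigned long visited = 0;
	size_t idx = 0, emitted = 0;
	const struct entry *e;

	for (e = head; e != NULL && emitted < batch; e = e->next, idx++) {
		visited++;
		if (idx < *resume)
			continue;	/* already dumped in an earlier pass */
		emitted++;
	}
	*resume = idx;
	*done = (e == NULL);
	return visited;
}

/* Total node visits needed to dump n entries, 'batch' entries per pass. */
unsigned long dump_all_visits(size_t n, size_t batch)
{
	struct entry *head;
	size_t resume = 0;
	int done = 0;
	unsigned long total = 0;

	assert(batch > 0);
	head = build_list(n);
	while (!done)
		total += dump_pass(head, batch, &resume, &done);
	free_list(head);
	return total;
}
```

With a batch of 10, dump_all_visits(1000, 10) counts 50500 visits and
dump_all_visits(2000, 10) counts 201000, i.e. about four times the
work for twice the entries.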
Niels (nisse@southpole.se) reported:
> I've done a simple benchmark with a script assigning n addresses
> using "ip address add", and this seems to have O(n^2) complexity.
> E.g., assigning n=25500 addresses took 26 s, and doubling n, assigning
> 51000 addresses, took 122 s, 4.6 times longer. Which isn't
> necessarily a problem once all the addresses are assigned, but it
> sounds a bit like there's a linear datastructure in there, not
> intended for a large number of addresses.
This bothered me, since the suggested workaround of "ip -b" and
the comment about the slow address dumping algorithm both
suggest that there may be some fundamental scaling issues
here.
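For anyone who wants to reproduce the batch-mode workaround, the
following is a minimal sketch of a generator for the file that
"ip -batch" reads (one command per line, without the leading "ip").
The device name, the 10.0.0.0/8 range, and the function names
format_batch_line() and write_batch() are my assumptions for
illustration; they are not taken from Niels' script.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one batch line for host number i (1-based) in 10.0.0.0/8,
 * e.g. "addr add 10.0.0.1/8 dev dummy0".
 * Returns the snprintf() result, or -1 on invalid input.
 */
int format_batch_line(char *buf, size_t len, const char *dev,
		      unsigned long i)
{
	/* Reject empty device names, host 0, and the broadcast address. */
	if (dev == NULL || strlen(dev) == 0 || i == 0 || i >= 0xffffffUL)
		return -1;
	return snprintf(buf, len, "addr add 10.%lu.%lu.%lu/8 dev %s\n",
			(i >> 16) & 0xff, (i >> 8) & 0xff, i & 0xff, dev);
}

/* Write 'count' batch lines to 'out'; returns 0 on success, -1 on error. */
int write_batch(FILE *out, const char *dev, unsigned long count)
{
	char line[128];

	assert(out != NULL);
	for (unsigned long i = 1; i <= count; i++) {
		int n = format_batch_line(line, sizeof(line), dev, i);

		if (n < 0 || (size_t)n >= sizeof(line))
			return -1;
		if (fputs(line, out) == EOF)
			return -1;
	}
	return 0;
}
```

Calling write_batch(fp, "dummy0", 51000) on a file opened for writing
and then running "ip -batch" on that file avoids one fork()/exec() per
address. All addresses land in the same subnet on one interface, which
matches the case discussed here; it does not change whatever the
kernel does per address.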
Also, my earlier comment about netlink vs ioctl was possibly
a red herring. When I compared my experiment with what Niels is
trying to do, the two turned out to be different: I was adding
an address to a (newly created) tunnel interface (thus
exploding both the number of interfaces and of addresses), whereas
Niels is adding all addresses to the same interface.
So I looked at Niels' test script with perf. Some observations:
perf tells me:
    80.13%  ip  [other]
            |
            |--30.12%-- fib_sync_up
            |          |
            |           --30.12%-- fib_inetaddr_event
            |                      notifier_call_chain
            |                      __blocking_notifier_call_chain
            |                      blocking_notifier_call_chain
            |                      __inet_insert_ifa
            |                      inet_rtm_newaddr
            |                      rtnetlink_rcv_msg
            |                      netlink_rcv_skb
            |                      rtnetlink_rcv
            |                      netlink_unicast
            |                      netlink_sendmsg
            |                      sock_sendmsg
            |                      ___sys_sendmsg
            |                      __sys_sendmsg
            |                      SyS_sendmsg
            |                      SyS_socketcall
            |                      syscall_call
So fib_sync_up() itself doesn't scale very well. I'm not sure
how much room for tuning exists here.
Further, in __inet_insert_ifa, we walk the ifa_list at least once
(which is probably unavoidable),
static int __inet_insert_ifa( /* .. */
			     u32 portid)
{
	/* ... */
	for (ifap = &in_dev->ifa_list; (ifa1 = *ifap) != NULL;
	     ifap = &ifa1->ifa_next) {
		/* ... */
	}
	/* ... */
	blocking_notifier_call_chain(&inetaddr_chain, NETDEV_UP, ifa);
	return 0;
}
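As a sanity check on why one list walk per insertion is already enough
to give the O(n^2) behaviour Niels measured, here is a toy user-space
model, not kernel code. It assumes every add walks the whole list
before appending; struct ifa_node, append_walk() and
total_walk_steps() are hypothetical names. Adding n entries this way
costs n(n-1)/2 node visits, so doubling n roughly quadruples the work,
which is in the same ballpark as the measured 122 s vs 26 s.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an address entry on the per-device list. */
struct ifa_node {
	struct ifa_node *next;
};

/*
 * Append one node at the tail, walking from the head with a
 * pointer-to-pointer the way the loop above does.
 * Returns the number of existing nodes visited.
 */
static unsigned long append_walk(struct ifa_node **head)
{
	unsigned long visited = 0;
	struct ifa_node **pp;
	struct ifa_node *n;

	assert(head != NULL);
	n = calloc(1, sizeof(*n));
	if (n == NULL)
		abort();
	for (pp = head; *pp != NULL; pp = &(*pp)->next)
		visited++;
	*pp = n;
	return visited;
}

/* Total node visits for 'count' successive appends to one list. */
unsigned long total_walk_steps(unsigned long count)
{
	struct ifa_node *head = NULL;
	unsigned long steps = 0;

	for (unsigned long i = 0; i < count; i++)
		steps += append_walk(&head);

	while (head != NULL) {
		struct ifa_node *next = head->next;

		free(head);
		head = next;
	}
	return steps;
}
```

For example, total_walk_steps(1000) is 499500 and
total_walk_steps(2000) is 1999000. Any additional per-insert walk
(such as one done from a notifier) adds another term of the same
order.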
But in addition, the fib callback, fib_inetaddr_event(), has another
potential ifa_list walk for SECONDARY addresses.
	switch (event) {
	case NETDEV_UP:
		fib_add_ifaddr(ifa);
#ifdef CONFIG_IP_ROUTE_MULTIPATH
		fib_sync_up(dev);
#endif
		/* ... */
For Niels' script, since there are many addresses in the same
subnet, we'll have a lot of IFA_F_SECONDARY addresses,
so fib_add_ifaddr() will then do another walk of the ifa_list.
Has anyone looked at consolidating some of this?
All of this could easily become a factor when the system
has a large number of interfaces and addresses, and the
control plane only wants to modify a very small subset of
that state.
--Sowmini