netdev.vger.kernel.org archive mirror
From: sowmini varadhan <sowmini05@gmail.com>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Eric Dumazet" <eric.dumazet@gmail.com>,
	"Niels Möller" <nisse@southpole.se>,
	netdev <netdev@vger.kernel.org>,
	"Jonas Bonn" <jonas@southpole.se>
Subject: Scaling 'ip addr add' (was Re: What's the right way to use a *large* number of source addresses?)
Date: Tue, 27 May 2014 17:29:00 -0400
Message-ID: <CACP96tRSpop=Sf58qE6TXeF+a-5nqfCj5d12bwD9WaYue_6_UA@mail.gmail.com>

On Sat, May 24, 2014 at 8:06 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 05/23/14 10:14, Eric Dumazet wrote:
>
>> Use the batch mode, and it will be much faster than ifconfig, as
>> ifconfig does not support this mode (you need one fork()/exec() per IP
>> address)
>>
>> ip -batch filename
>>
>
> The address dumping algorithm is a very likely contributor as well.
> It tries to remember indices and then skips on the next iteration
> all the way to where it left off.... has never been a big deal until
> someone tries a substantial number of addresses.
>
> cheers,
> jamal
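
To make Jamal's point concrete, here is a toy cost model. It is not the kernel's dump code. A dump that resumes by index restarts at the head of the list on every call and skips the entries it already sent. A dump that kept a saved position would touch each entry only once. The list type, `per_msg` (entries per netlink message), and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked list standing in for the address list. */
struct node {
    struct node *next;
};

static struct node *make_list(size_t n)
{
    struct node *head = NULL;
    for (size_t i = 0; i < n; i++) {
        struct node *nd = malloc(sizeof(*nd));
        assert(nd != NULL);
        nd->next = head;
        head = nd;
    }
    return head;
}

static void free_list(struct node *head)
{
    while (head) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}

/* Index-based resume: each call restarts at the head and skips the first
 * `start` entries before emitting up to per_msg more. The final call, which
 * finds nothing left to send, still walks the whole list. Returns the total
 * number of nodes touched across all calls. */
static size_t dump_cost_skip(struct node *head, size_t per_msg)
{
    size_t touched = 0, start = 0;
    assert(per_msg > 0);
    for (;;) {
        size_t idx = 0, emitted = 0;
        for (struct node *nd = head; nd && emitted < per_msg;
             nd = nd->next, idx++) {
            touched++;
            if (idx < start)
                continue;      /* already sent in an earlier message */
            emitted++;
        }
        if (emitted == 0)
            break;             /* empty message: dump finished */
        start += emitted;
    }
    return touched;
}

/* Cursor-based resume: a saved pointer lets each call continue directly
 * where the previous one stopped, so every node is touched once. */
static size_t dump_cost_cursor(struct node *head, size_t per_msg)
{
    size_t touched = 0;
    struct node *cursor = head;    /* kept between calls */
    assert(per_msg > 0);
    while (cursor) {
        size_t emitted = 0;
        while (cursor && emitted < per_msg) {
            touched++;
            emitted++;
            cursor = cursor->next;
        }
    }
    return touched;
}
```

With 10 entries and 3 per message, the index-based version touches 38 nodes versus 10 for the cursor. Its cost grows roughly as n^2/(2*per_msg). A saved pointer, however, goes stale if the list changes between calls. That is presumably why an index is remembered instead.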

Niels (nisse@southpole.se) reported:

   I've done a simple benchmark with a script assigning n addresses
   using "ip address add", and this seems to have O(n^2) complexity.
   E.g., assigning n=25500 addresses took 26 s, and doubling n, assigning
   51000 addresses, took 122 s, 4.6 times longer. Which isn't
   necessarily a problem once all the addresses are assigned, but it
   sounds a bit like there's a linear data structure in there, not
   intended for a large number of addresses.
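
As a rough check of the O(n^2) reading, assume a power law t(n) proportional to n^p and solve for p. This is an assumption, since only two data points are available.

$$
\frac{t(51000)}{t(25500)} = \frac{122\ \mathrm{s}}{26\ \mathrm{s}} \approx 4.69,
\qquad
p = \log_2 4.69 \approx 2.23
$$

A purely quadratic model would predict 4 x 26 s = 104 s for 51000 addresses. The measured 122 s is somewhat worse than that, which suggests growth that is at least quadratic over this range.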

This bothered me, since both the suggested workaround
("ip -b") and the comment about the slow address-dumping
algorithm suggest that there may be some fundamental
scaling issues here.
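
For reference, the `ip -batch filename` workaround Eric suggested takes a file with one `ip` command per line, each written without the leading `ip`. The following is a minimal sketch of a generator. It assumes a hypothetical 10.0.0.0/16 prefix and device name; neither comes from Niels' script.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the i-th (0-based) batch line into buf. The 10.0.0.0/16 prefix
 * and the device name are hypothetical placeholders. Returns 0 on success,
 * -1 if the address would fall outside the /16 or buf is too small. */
static int format_batch_line(char *buf, size_t len, unsigned i, const char *dev)
{
    unsigned host = i + 1;          /* skip the network address 10.0.0.0 */
    if (host > 65534)               /* 10.0.255.255 is the /16 broadcast */
        return -1;
    int n = snprintf(buf, len, "address add 10.0.%u.%u/16 dev %s\n",
                     host >> 8, host & 0xffu, dev);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write n "address add" commands to fp, one per line, for `ip -batch`. */
static int write_batch_file(FILE *fp, unsigned n, const char *dev)
{
    char line[128];
    for (unsigned i = 0; i < n; i++) {
        if (format_batch_line(line, sizeof(line), i, dev) != 0)
            return -1;
        if (fputs(line, fp) == EOF)
            return -1;
    }
    return 0;
}
```

Write the output to a file (for example a hypothetical `addrs.batch`) and run `ip -batch addrs.batch` as root. This issues every request from a single process and avoids one fork()/exec() per address. The per-address work inside the kernel is presumably unchanged.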

Also, my earlier comment about netlink vs ioctl was possibly
a red herring: when I compared my experiment with what Niels is
trying to do, I found the two were different. I was adding
an address to a (newly created) tunnel interface (thus
exploding both the number of interfaces and the number of
addresses), whereas Niels is adding all addresses to the same
interface.

So I looked at Niels' test script with perf. Some observations:

perf tells me:

   80.13%       ip  [other]
                 |
                 |--30.12%-- fib_sync_up
                 |          |
                 |           --30.12%-- fib_inetaddr_event
                 |                     notifier_call_chain
                 |                     __blocking_notifier_call_chain
                 |                     blocking_notifier_call_chain
                 |                     __inet_insert_ifa
                 |                     inet_rtm_newaddr
                 |                     rtnetlink_rcv_msg
                 |                     netlink_rcv_skb
                 |                     rtnetlink_rcv
                 |                     netlink_unicast
                 |                     netlink_sendmsg
                 |                     sock_sendmsg
                 |                     ___sys_sendmsg
                 |                     __sys_sendmsg
                 |                     SyS_sendmsg
                 |                     SyS_socketcall
                 |                     syscall_call

Thus about 30% of the profile is in fib_sync_up(), reached via
fib_inetaddr_event(), so fib_sync_up() itself doesn't scale very
well. I'm not sure how much room for tweaking exists here.

Further, in __inet_insert_ifa() we walk the ifa_list at least once
(which is probably unavoidable):

static int __inet_insert_ifa( /* .. */
                             u32 portid)
{
        /* ... */
        /* walk every address already on in_dev */
        for (ifap = &in_dev->ifa_list; (ifa1 = *ifap) != NULL;
             ifap = &ifa1->ifa_next) {
                /* ... */
        }
        /* ... */
        blocking_notifier_call_chain(&inetaddr_chain, NETDEV_UP, ifa);

        return (0);
}
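
The cost of that walk alone can be modeled with a toy list. Suppose adding the i-th address visits the i addresses already present. Then adding n addresses costs 0 + 1 + ... + (n-1) = n(n-1)/2 visits, so doubling n roughly quadruples the work, which is in line with Niels' numbers. The `struct addr` type and helpers below are hypothetical stand-ins, not `struct in_ifaddr`. The walk uses the same pointer-to-pointer idiom as the loop above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct in_ifaddr: only the link field
 * matters for counting list-walk steps. */
struct addr {
    struct addr *next;
};

/* Walk to the tail (one step per address already present), then append.
 * Returns the number of existing entries visited. */
static size_t insert_walk(struct addr **head, struct addr *a)
{
    size_t visited = 0;
    struct addr **pp = head;
    while (*pp) {
        visited++;
        pp = &(*pp)->next;
    }
    a->next = NULL;
    *pp = a;
    return visited;
}

/* Total visits to add n addresses one at a time: 0 + 1 + ... + (n-1). */
static size_t total_insert_cost(size_t n)
{
    struct addr *head = NULL;
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        struct addr *a = malloc(sizeof(*a));
        assert(a != NULL);
        total += insert_walk(&head, a);
    }
    while (head) {                  /* release the toy list */
        struct addr *next = head->next;
        free(head);
        head = next;
    }
    return total;
}
```

For example, total_insert_cost(1000) is 499500 and total_insert_cost(2000) is 1999000, a ratio just over 4.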

In addition, the fib callback fib_inetaddr_event() has another
potential ifa_list walk for IFA_F_SECONDARY addresses:

        switch (event) {
        case NETDEV_UP:
                fib_add_ifaddr(ifa);
#ifdef CONFIG_IP_ROUTE_MULTIPATH
                fib_sync_up(dev);
#endif

For Niels' script, since there are many addresses in the same
subnet, many of them will be flagged IFA_F_SECONDARY, so
fib_add_ifaddr() will then do another walk of the ifa_list.
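
The reason so many of Niels' addresses end up secondary can be sketched as a prefix check. An address whose mask and prefix match an earlier primary on the interface is marked secondary. This is a simplified model under that assumption: `struct addr_rec` and `classify()` are hypothetical and only approximate what __inet_insert_ifa() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified address record (host byte order). */
struct addr_rec {
    uint32_t addr;
    uint32_t mask;
    bool secondary;
};

/* True if a and b share the same prefix under mask. */
static bool same_subnet(uint32_t a, uint32_t b, uint32_t mask)
{
    return ((a ^ b) & mask) == 0;
}

/* Mark recs[i] secondary if an earlier primary entry has the same mask
 * and prefix; otherwise it stays primary. Returns the resulting flag. */
static bool classify(struct addr_rec *recs, size_t i)
{
    assert(recs != NULL);
    recs[i].secondary = false;
    for (size_t j = 0; j < i; j++) {
        if (!recs[j].secondary && recs[j].mask == recs[i].mask &&
            same_subnet(recs[j].addr, recs[i].addr, recs[i].mask)) {
            recs[i].secondary = true;
            break;
        }
    }
    return recs[i].secondary;
}
```

In this model, when every address is in the same /16 only the first one is primary, so each later insert takes the secondary path.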

Has anyone looked at consolidating some of this?
All of this could easily become a factor when the system
has a large number of interfaces and addresses, and the
control plane only wants to modify a very small subset of
that state.

--Sowmini
