From: ebiederm@xmission.com (Eric W. Biederman)
To: David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org, hadi@cyberus.ca, dlezcano@fr.ibm.com,
adobriyan@gmail.com, kaber@trash.net
Subject: Re: [PATCH 0/20] Batch network namespace cleanup
Date: Mon, 30 Nov 2009 16:55:37 -0800
Message-ID: <m1bpijjywm.fsf@fess.ebiederm.org>
In-Reply-To: <20091130.163454.192638570.davem@davemloft.net> (David Miller's message of "Mon, 30 Nov 2009 16:34:54 -0800 (PST)")
David Miller <davem@davemloft.net> writes:
> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Sun, 29 Nov 2009 17:46:03 -0800
>
>> Recently Jamal and Daniel performed some experiments and found that
>> large numbers of network namespaces exiting simultaneously is very
>> inefficient. 24+ minutes in some configurations. The cpu overhead
>> was negligible but it results in long hold times of net_mutex, and
>> memory being consumed a long time after the last user has gone away.
>>
>> I looked into it and discovered that by batching network namespace
>> cleanups I can reduce the time for 4k network namespaces exiting from
>> 5-7 minutes in my configuration to 44 seconds.
>>
>> This patch series is my set of changes to the network namespace core
>> and associated cleanups to allow for network namespace batching.
>
> All applied, and assuming all of the build checks pass I'll
> push this out to net-next-2.6
>
> I should look into that inet_twsk_purge performance issue you mention
> when tearing down a namespace. It walks the entire hash table and
> takes a lock for every hash chain.
>
> Eric, is it possible for us to at least slightly optimize this by
> doing a peek at whether the head pointer of each chain is NULL and
> bypass the spinlock and everything else in that case? Or is this
> not legal with sk_nulls?
>
> Something like:
>
>         if (hlist_nulls_empty(&head->twchain))
>                 continue;
>
> right before the 'restart' label?

I haven't had a chance to wrap my head around that case fully yet.

After playing with a few ideas, I think what we want algorithmically
is a batched flush like the one we do for the routing table cache. That
should get the cost down to only about 100ms, which is much better when
you have a lot of namespaces exiting but is still a lot of time.
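
To make the batching idea concrete, here is a rough userspace model, not
kernel code. It assumes a tiny fixed-size table and plain integer
namespace ids; struct table, purge_one(), and purge_batch() are
hypothetical names invented for this sketch. The point is only that the
batched variant walks the table once no matter how many namespaces are
dying.

```c
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 8            /* hypothetical; a real hash table is far larger */

struct entry {
    int  net_id;                /* stand-in for the owning namespace */
    bool live;
};

struct table {
    struct entry  slots[TABLE_SIZE];
    unsigned long walks;        /* number of full-table walks performed */
};

/* Unbatched: every dying namespace pays for its own full walk. */
static size_t purge_one(struct table *t, int net_id)
{
    size_t removed = 0;

    t->walks++;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (t->slots[i].live && t->slots[i].net_id == net_id) {
            t->slots[i].live = false;
            removed++;
        }
    }
    return removed;
}

/* Batched: a single walk removes entries of every namespace in dying[]. */
static size_t purge_batch(struct table *t, const int *dying, size_t n)
{
    size_t removed = 0;

    t->walks++;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (!t->slots[i].live)
            continue;
        for (size_t j = 0; j < n; j++) {
            if (t->slots[i].net_id == dying[j]) {
                t->slots[i].live = false;
                removed++;
                break;
            }
        }
    }
    return removed;
}
```

With N namespaces exiting, purge_one() costs N walks of the table while
purge_batch() costs one, at the price of an inner membership check per
live entry.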

From my preliminary investigation I believe we don't need to take any
locks to traverse the hash table. I think we can do the entire hash
table traversal under simple rcu protection, and only take the lock on
the individual time wait entries to delete them.
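
A rough userspace sketch of that shape, under loud assumptions: this is
not RCU and not the kernel's timewait code. C11 acquire loads stand in
for the rcu-protected walk, a per-entry atomic_flag spinlock stands in
for the per-entry lock, and entries are only marked dead; unlinking and
freeing (which in the kernel would have to wait for a grace period) are
not modelled. struct tw_entry, entry_init(), and purge_chain() are
hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct tw_entry {
    _Atomic(struct tw_entry *) next;   /* read without any chain lock */
    int         net_id;                /* immutable after init */
    atomic_flag lock;                  /* per-entry spinlock */
    atomic_bool dead;
};

static void entry_init(struct tw_entry *e, int net_id, struct tw_entry *next)
{
    atomic_init(&e->next, next);
    e->net_id = net_id;
    atomic_flag_clear(&e->lock);
    atomic_init(&e->dead, false);
}

static void entry_lock(struct tw_entry *e)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
        ;
}

static void entry_unlock(struct tw_entry *e)
{
    atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/*
 * Walk one chain with no chain-level lock. Only entries that belong to
 * the dying namespace have their own lock taken, and only long enough
 * to mark them dead. Returns how many entries this call killed.
 */
static size_t purge_chain(_Atomic(struct tw_entry *) *head, int net_id)
{
    size_t killed = 0;
    struct tw_entry *e = atomic_load_explicit(head, memory_order_acquire);

    for (; e; e = atomic_load_explicit(&e->next, memory_order_acquire)) {
        if (e->net_id != net_id)
            continue;               /* untouched entries cost no lock */
        entry_lock(e);
        if (!atomic_load(&e->dead)) {
            atomic_store(&e->dead, true);
            killed++;
        }
        entry_unlock(e);
    }
    return killed;
}
```

The re-check of dead under the entry lock is what keeps two concurrent
purgers from both counting the same entry; a second pass over the same
chain kills nothing.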
....

Also, for best batching, ipip, ipgre, ip6_tunnel, sit, vlan, and
bridging need to be taught to use rtnl_link_ops and let the generic
code delete their devices.

The changes for the vlan code are simple. For the rest, I haven't
finished wrapping my head around each driver's individual requirements.
Still, the changes should not be sweeping.
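
Roughly the shape I have in mind, as a userspace model only. Everything
here is hypothetical (struct link_ops, generic_dellink(),
unregister_many(), netns_exit_batch()); it is not the real
rtnl_link_ops definition. A driver that supplies a dellink hook lets
the generic exit path queue all of its devices and pay the expensive
synchronization once for the whole batch.

```c
#include <stddef.h>

struct device;

/* Hypothetical stand-in for a per-driver ops table. */
struct link_ops {
    const char *kind;
    /* Queue dev for deletion instead of tearing it down immediately. */
    void (*dellink)(struct device *dev, struct device **kill_list);
};

struct device {
    const char            *name;
    int                    net_id;      /* owning namespace */
    const struct link_ops *ops;         /* NULL: driver not converted yet */
    struct device         *next_all;    /* list of all devices */
    struct device         *next_kill;   /* link on the pending-kill list */
    int                    unregistered;
};

/* A dellink that simply pushes the device onto the kill list. */
static void generic_dellink(struct device *dev, struct device **kill_list)
{
    dev->next_kill = *kill_list;
    *kill_list = dev;
}

/* Tear down the whole list, then synchronize once for the batch. */
static size_t unregister_many(struct device *kill_list, unsigned *sync_passes)
{
    size_t n = 0;

    for (struct device *d = kill_list; d; d = d->next_kill) {
        d->unregistered = 1;
        n++;
    }
    (*sync_passes)++;   /* models the one expensive wait per batch */
    return n;
}

/* Generic namespace exit: collect every converted device, delete once. */
static size_t netns_exit_batch(struct device *all, int net_id,
                               unsigned *sync_passes)
{
    struct device *kill_list = NULL;

    for (struct device *d = all; d; d = d->next_all) {
        if (d->net_id == net_id && d->ops && d->ops->dellink)
            d->ops->dellink(d, &kill_list);
    }
    return unregister_many(kill_list, sync_passes);
}
```

Devices whose driver has no ops table are skipped by the generic path
in this model, which is why each of the listed drivers needs converting
before it benefits from the batch.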
Eric