From: Christian Brauner <christian.brauner@canonical.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: davem@davemloft.net, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, avagin@virtuozzo.com,
ktkhai@virtuozzo.com, serge@hallyn.com,
gregkh@linuxfoundation.org
Subject: Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
Date: Sat, 21 Apr 2018 17:49:12 +0200
Message-ID: <20180421154910.GA31964@gmail.com>
In-Reply-To: <20180420161643.GA15182@gmail.com>

On Fri, Apr 20, 2018 at 06:16:44PM +0200, Christian Brauner wrote:
> On Fri, Apr 20, 2018 at 03:56:28PM +0200, Christian Brauner wrote:
> > On Wed, Apr 18, 2018 at 11:52:47PM +0200, Christian Brauner wrote:
> > > On Wed, Apr 18, 2018 at 11:55:52AM -0500, Eric W. Biederman wrote:
> > > > Christian Brauner <christian.brauner@ubuntu.com> writes:
> > > >
> > > > > Now that it's possible to have a different set of uevents in different
> > > > > network namespaces, per-network namespace uevent sequence numbers are
> > > > > introduced. This increases performance as locking is now restricted to the
> > > > > network namespace affected by the uevent rather than locking
> > > > > everything.
> > > >
> > > > Numbers please. I personally expect that the netlink mc_list issues
> > > > will swamp any benefit you get from this.
> > >
> > > I wouldn't see how this would be the case. The gist of this is:
> > > Every time you send a uevent into a network namespace *not* owned by
> > > init_user_ns you currently *have* to take mutex_lock(uevent_sock_list)
> > > effectively blocking the host from processing uevents even though
> > > - the uevent you're receiving might be totally different from the
> > > uevent that you're sending
> > > - the uevent socket of the non-init_user_ns owned network namespace
> > > isn't even recorded in the list.
> > >
> > > The other argument is that we now have properly isolated network
> > > namespaces wrt uevents such that each netns can have its own set of
> > > uevents. This can either happen by a sufficiently privileged userspace
> > > process sending it uevents that are only dedicated to that specific
> > > netns. Or - and this *has been true for a long time* - because network
> > > devices are *properly namespaced*. Meaning a uevent for that network
> > > device is *tied to a network namespace*. For both cases the uevent
> > > sequence numbering will be absolutely misleading. For example, whenever
> > > you create e.g. a new veth device in a new network namespace it
> > > shouldn't be accounted against the initial network namespace but *only*
> > > against the network namespace that has that device added to it.
> >
> > Eric, I did the testing. Here's what I did:
> >
> > I compiled two 4.17-rc1 Kernels:
> > - one with per netns uevent seqnums with decoupled locking
> > - one without per netns uevent seqnums with decoupled locking
> >
> > # Testcase 1:
> > Only Injecting Uevents into network namespaces not owned by the initial user
> > namespace.
> > - created 1000 new user namespace + network namespace pairs
> > - opened a uevent listener in each of those namespace pairs
> > - injected uevents into each of those network namespaces 10,000 times meaning
> > 10,000,000 (10 million) uevents were injected. (The high number of
> > uevent injections should get rid of a lot of jitter.)
> > - Calculated the mean transaction time.
> > - *without* uevent sequence number namespacing:
> > 67 μs
> > - *with* uevent sequence number namespacing:
> > 55 μs
> > - makes a difference of 12 μs
> >
> > # Testcase 2:
> > Injecting Uevents into network namespaces not owned by the initial user
> > namespace and network namespaces owned by the initial user namespace.
> > - created 500 new user namespace + network namespace pairs
> > - created 500 new network namespace pairs
> > - opened a uevent listener in each of those namespace pairs
> > - injected uevents into each of those network namespaces 10,000 times meaning
> > 10,000,000 (10 million) uevents were injected. (The high number of
> > uevent injections should get rid of a lot of jitter.)
> > - Calculated the mean transaction time.
> > - *without* uevent sequence number namespacing:
> > 572 μs
> > - *with* uevent sequence number namespacing:
> > 514 μs
> > - makes a difference of 58 μs
> >
> > So there's a performance gain. The third case would be to create a bunch
> > of hanging processes that send SIGSTOP to themselves but do not actually
> > open a uevent socket in their respective namespaces and then inject
> > uevents into them. I expect an even greater performance benefit there,
> > since rtnl_table_lock() isn't hit in this case because there are no
> > listeners.
>
> I did the third test case as well:
> - created 500 new user namespace + network namespace pairs *without
> uevent listeners*
> - created 500 new network namespace pairs *without uevent listeners*
> - injected uevents into each of those network namespaces 10,000 times meaning
> 10,000,000 (10 million) uevents were injected. (The high number of
> uevent injections should get rid of a lot of jitter.)
> - Calculated the mean transaction time.
> - *without* uevent sequence number namespacing:
> 206 μs
> - *with* uevent sequence number namespacing:
> 163 μs
> - makes a difference of 43 μs
>
> So this test case shows a performance improvement as well.

Just for fun, I did a simple statistical analysis using t-tests, and they
all show significant differences at an alpha level of 0.001 (which I chose
because 0.05 seemed a bit too lax).
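
For readers unfamiliar with the test: the output format below matches R's t.test, which defaults to the Welch variant (unequal variances). A minimal sketch of what that test computes, using only the Python standard library. The function name welch_t and the toy samples are made up for illustration and are not the measured data from this thread.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's two-sample t-test on samples x and y."""
    nx, ny = len(x), len(y)
    # Per-sample squared standard errors (sample variance / sample size).
    vx, vy = variance(x) / nx, variance(y) / ny
    # t statistic: difference in means over the combined standard error.
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Toy data only (hypothetical latencies, not taken from the benchmarks).
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t_stat:.4f}, df = {dof:.4f}")
```

With roughly ten million observations per group, as in the test cases here, even small differences in means yield very large t values, so the effect size (the difference in microseconds) is the more informative figure.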

Testcase 1:

    Welch Two Sample t-test

    data:  x1 and y1
    t = 405.16, df = 18883000, p-value < 2.2e-16
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     12.14949 12.26761
    sample estimates:
    mean of x mean of y
     68.48594  56.27739

Testcase 2:

    Welch Two Sample t-test

    data:  x2 and y2
    t = 38.685, df = 19682000, p-value < 2.2e-16
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     55.10630 60.98815
    sample estimates:
    mean of x mean of y
     572.9684  514.9211

Testcase 3:

    Welch Two Sample t-test

    data:  x3 and y3
    t = 58.37, df = 17711000, p-value < 2.2e-16
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     41.77860 44.68178
    sample estimates:
    mean of x mean of y
     207.2632  164.0330
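
As a sanity check, the reported t statistics can be recomputed from the reported means and confidence intervals alone. This is a minimal sketch, assuming the degrees of freedom (tens of millions here) are large enough that the 97.5% t quantile is effectively the normal quantile. The names REPORTED and recompute are made up for illustration, and the "relative gain" is simply the difference divided by the mean of x.

```python
from statistics import NormalDist

# Figures copied from the three t-test outputs above:
# (mean of x, mean of y, CI lower bound, CI upper bound, reported t)
REPORTED = {
    "Testcase 1": (68.48594, 56.27739, 12.14949, 12.26761, 405.16),
    "Testcase 2": (572.9684, 514.9211, 55.10630, 60.98815, 38.685),
    "Testcase 3": (207.2632, 164.0330, 41.77860, 44.68178, 58.37),
}

# Two-sided 95% critical value; assumes df is large enough for the
# normal approximation to the t distribution.
Z_975 = NormalDist().inv_cdf(0.975)


def recompute(mean_x, mean_y, ci_lo, ci_hi):
    """Return (difference, standard error, t, relative gain)."""
    diff = mean_x - mean_y
    # The CI is diff +/- Z_975 * std_err, so its half-width gives std_err.
    std_err = (ci_hi - ci_lo) / 2 / Z_975
    return diff, std_err, diff / std_err, diff / mean_x


for name, (mx, my, lo, hi, t_rep) in REPORTED.items():
    diff, se, t, rel = recompute(mx, my, lo, hi)
    print(f"{name}: diff={diff:.2f} us, t={t:.2f} "
          f"(reported {t_rep}), relative gain={rel:.1%}")
```

The recomputed values should agree with the reported t statistics up to rounding of the printed confidence bounds.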
Thanks!
Christian