From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christian Brauner
Subject: Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
Date: Fri, 20 Apr 2018 15:56:28 +0200
Message-ID: <20180420135627.GA8350@gmail.com>
References: <20180418152106.18519-1-christian.brauner@ubuntu.com> <20180418152106.18519-3-christian.brauner@ubuntu.com> <874lk8wj1j.fsf@xmission.com> <20180418215246.GA24000@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: davem@davemloft.net, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, avagin@virtuozzo.com, ktkhai@virtuozzo.com, serge@hallyn.com, gregkh@linuxfoundation.org
To: "Eric W. Biederman"
Return-path:
Content-Disposition: inline
In-Reply-To: <20180418215246.GA24000@gmail.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

On Wed, Apr 18, 2018 at 11:52:47PM +0200, Christian Brauner wrote:
> On Wed, Apr 18, 2018 at 11:55:52AM -0500, Eric W. Biederman wrote:
> > Christian Brauner writes:
> >
> > > Now that it's possible to have a different set of uevents in different
> > > network namespaces, per-network namespace uevent sequence numbers are
> > > introduced. This increases performance as locking is now restricted to the
> > > network namespace affected by the uevent rather than locking
> > > everything.
> >
> > Numbers please. I personally expect that the netlink mc_list issues
> > will swamp any benefit you get from this.
>
> I wouldn't see how this would be the case. The gist of this is:
> Everytime you send a uevent into a network namespace *not* owned by
> init_user_ns you currently *have* to take mutex_lock(uevent_sock_list)
> effectively blocking the host from processing uevents even though
> - the uevent you're receiving might be totally different from the
>   uevent that you're sending
> - the uevent socket of the non-init_user_ns owned network namespace
>   isn't even recorded in the list.
>
> The other argument is that we now have properly isolated network
> namespaces wrt to uevents such that each netns can have its own set of
> uevents. This can either happen by a sufficiently privileged userspace
> process sending it uevents that are only dedicated to that specific
> netns. Or - and this *has been true for a long time* - because network
> devices are *properly namespaced*. Meaning a uevent for that network
> device is *tied to a network namespace*. For both cases the uevent
> sequence numbering will be absolutely misleading. For example, whenever
> you create e.g. a new veth device in a new network namespace it
> shouldn't be accounted against the initial network namespace but *only*
> against the network namespace that has that device added to it.

Eric, I did the testing. Here's what I did:

I compiled two 4.17-rc1 kernels:
- one with per netns uevent seqnums with decoupled locking
- one without per netns uevent seqnums with decoupled locking

# Testcase 1:
Only injecting uevents into network namespaces not owned by the initial
user namespace.
- created 1000 new user namespace + network namespace pairs
- opened a uevent listener in each of those namespace pairs
- injected uevents into each of those network namespaces 10,000 times,
  meaning 10,000,000 (10 million) uevents were injected in total. (The high
  number of uevent injections should get rid of a lot of jitter.)
- calculated the mean transaction time:
  - *without* uevent sequence number namespacing: 67 μs
  - *with* uevent sequence number namespacing: 55 μs
  - makes a difference of 12 μs

# Testcase 2:
Injecting uevents into network namespaces not owned by the initial user
namespace and network namespaces owned by the initial user namespace.
- created 500 new user namespace + network namespace pairs
- created 500 new network namespaces owned by the initial user namespace
- opened a uevent listener in each of those namespaces
- injected uevents into each of those network namespaces 10,000 times,
  meaning 10,000,000 (10 million) uevents were injected in total. (The high
  number of uevent injections should get rid of a lot of jitter.)
- calculated the mean transaction time:
  - *without* uevent sequence number namespacing: 572 μs
  - *with* uevent sequence number namespacing: 514 μs
  - makes a difference of 58 μs

So there is a performance gain. The third case would be to create a bunch
of hanging processes that send SIGSTOP to themselves but do not actually
open a uevent socket in their respective namespaces, and then inject
uevents into them. I expect an even larger performance benefit there,
since rtnl_table_lock() isn't hit in this case because there are no
listeners.

Christian
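
For reference, the mean transaction times reported for the two testcases
can also be turned into relative savings. The snippet below is only a
sketch of that arithmetic in Python: the numbers are copied from the
testcases, while the names RESULTS, improvement() and summarize() are made
up for illustration and are not part of the patch or of the test setup.

```python
# Mean transaction times in microseconds, copied from the two testcases.
# The dictionary layout and all names here are illustrative assumptions.
RESULTS = {
    "testcase 1": {"without": 67.0, "with": 55.0},
    "testcase 2": {"without": 572.0, "with": 514.0},
}


def improvement(without_us, with_us):
    """Return (absolute saving in us, saving as a percentage of the baseline)."""
    saved = without_us - with_us
    return saved, 100.0 * saved / without_us


def summarize(results):
    """Map each testcase name to its (saving, percentage) tuple."""
    return {
        name: improvement(times["without"], times["with"])
        for name, times in results.items()
    }


if __name__ == "__main__":
    for name, (saved, pct) in summarize(RESULTS).items():
        print(f"{name}: {saved:.0f} us saved ({pct:.1f}% of the baseline)")
```

This works out to roughly 18% for testcase 1 and roughly 10% for
testcase 2, relative to the kernel without uevent sequence number
namespacing.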