From: Christian Brauner <christian.brauner@canonical.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, avagin@virtuozzo.com,
	ktkhai@virtuozzo.com, serge@hallyn.com,
	gregkh@linuxfoundation.org
Subject: Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
Date: Fri, 20 Apr 2018 15:56:28 +0200
Message-ID: <20180420135627.GA8350@gmail.com>
In-Reply-To: <20180418215246.GA24000@gmail.com>

On Wed, Apr 18, 2018 at 11:52:47PM +0200, Christian Brauner wrote:
> On Wed, Apr 18, 2018 at 11:55:52AM -0500, Eric W. Biederman wrote:
> > Christian Brauner <christian.brauner@ubuntu.com> writes:
> > 
> > > Now that it's possible to have a different set of uevents in different
> > > network namespaces, per-network namespace uevent sequence numbers are
> > > introduced. This increases performance as locking is now restricted to the
> > > network namespace affected by the uevent rather than locking
> > > everything.
> > 
> > Numbers please.  I personally expect that the netlink mc_list issues
> > will swamp any benefit you get from this.
> 
> I wouldn't see how this would be the case. The gist of this is:
> Every time you send a uevent into a network namespace *not* owned by
> init_user_ns you currently *have* to take mutex_lock(uevent_sock_list)
> effectively blocking the host from processing uevents even though
> - the uevent you're receiving might be totally different from the
>   uevent that you're sending
> - the uevent socket of the non-init_user_ns owned network namespace
>   isn't even recorded in the list.
> 
> The other argument is that we now have properly isolated network
> namespaces wrt uevents such that each netns can have its own set of
> uevents. This can either happen by a sufficiently privileged userspace
> process sending it uevents that are only dedicated to that specific
> netns. Or - and this *has been true for a long time* - because network
> devices are *properly namespaced*. Meaning a uevent for that network
> device is *tied to a network namespace*. For both cases the uevent
> sequence numbering will be absolutely misleading. For example, whenever
> you create e.g. a new veth device in a new network namespace it
> shouldn't be accounted against the initial network namespace but *only*
> against the network namespace that has that device added to it.

Eric, I did the testing. Here's what I did:

I compiled two 4.17-rc1 kernels:
- one with per-netns uevent seqnums and decoupled locking
- one without per-netns uevent seqnums or decoupled locking
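
To make the difference between the two kernels concrete, here is a toy
userspace model of the two locking schemes. It is only a sketch and not
kernel code: the class names, the string namespace keys, and the use of
Python's threading.Lock are all assumptions made for illustration.

```python
import threading


class GlobalSeqnum:
    """Toy model: one lock and one counter shared by every namespace."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seqnum = 0

    def next_seqnum(self, netns):
        # Every sender contends on the same lock, whatever netns it targets.
        with self._lock:
            self._seqnum += 1
            return self._seqnum


class NetnsUeventState:
    """Hypothetical per-namespace state: its own lock and its own counter."""

    def __init__(self):
        self.lock = threading.Lock()
        self.seqnum = 0


class PerNetnsSeqnum:
    """Toy model: locking is restricted to the namespace being targeted."""

    def __init__(self, namespaces):
        self._state = {ns: NetnsUeventState() for ns in namespaces}

    def next_seqnum(self, netns):
        state = self._state[netns]
        # Only senders targeting the same netns contend on this lock.
        with state.lock:
            state.seqnum += 1
            return state.seqnum
```

In this model, a uevent sent into one namespace neither blocks senders
targeting another namespace nor advances that namespace's sequence number.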

# Testcase 1:
Only injecting uevents into network namespaces not owned by the initial user
namespace.
- created 1000 new user namespace + network namespace pairs
- opened a uevent listener in each of those namespace pairs
- injected uevents into each of those network namespaces 10,000 times meaning
  10,000,000 (10 million) uevents were injected. (The high number of
  uevent injections should get rid of a lot of jitter.)
- Calculated the mean transaction time.
- *without* uevent sequence number namespacing:
  67 μs
- *with* uevent sequence number namespacing:
  55 μs
- makes a difference of 12 μs

# Testcase 2:
Injecting uevents into network namespaces not owned by the initial user
namespace and into network namespaces owned by the initial user namespace.
- created 500 new user namespace + network namespace pairs
- created 500 new network namespaces (owned by the initial user namespace)
- opened a uevent listener in each of those namespaces
- injected uevents into each of those network namespaces 10,000 times meaning
  10,000,000 (10 million) uevents were injected. (The high number of
  uevent injections should get rid of a lot of jitter.)
- Calculated the mean transaction time.
- *without* uevent sequence number namespacing:
  572 μs
- *with* uevent sequence number namespacing:
  514 μs
- makes a difference of 58 μs
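
For reference, the relative gains can be worked out from the means above.
The helper below is just a sketch of that arithmetic; the function name and
the dictionary layout are made up for illustration, and the only inputs are
the four mean transaction times reported in the two testcases.

```python
def improvement(without_us, with_us):
    """Return (absolute gain in μs, relative gain in percent)."""
    gain = without_us - with_us
    return gain, 100.0 * gain / without_us


# Mean transaction times in μs, copied from the two testcases above:
# (without seqnum namespacing, with seqnum namespacing)
results = {
    "Testcase 1": (67, 55),
    "Testcase 2": (572, 514),
}

for name, (without_us, with_us) in results.items():
    gain, percent = improvement(without_us, with_us)
    print(f"{name}: {gain} μs faster ({percent:.1f}%)")
```

That works out to roughly 17.9% for Testcase 1 and 10.1% for Testcase 2.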

So there is a performance gain. The third case would be to create a bunch
of hanging processes that send SIGSTOP to themselves but do not actually
open a uevent socket in their respective namespaces and then inject
uevents into them. I expect an even bigger performance benefit there,
since the rtnl_table_lock() isn't hit in this case because there are no
listeners.

Christian

Thread overview: 10+ messages
2018-04-18 15:21 [PATCH net-next 0/2] netns: uevent performance tweaks Christian Brauner
2018-04-18 15:21 ` [PATCH net-next 1/2] netns: restrict uevents Christian Brauner
2018-04-18 15:21 ` [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks Christian Brauner
2018-04-18 16:55   ` Eric W. Biederman
2018-04-18 21:52     ` Christian Brauner
2018-04-20 13:56       ` Christian Brauner [this message]
2018-04-20 16:16         ` Christian Brauner
2018-04-21 15:49           ` Christian Brauner
2018-04-23  2:39   ` kbuild test robot
2018-04-23 10:12     ` Christian Brauner