Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

public inbox for linux-kernel@vger.kernel.org

From: ebiederm@xmission.com (Eric W. Biederman)
To: paulmck@linux.vnet.ibm.com
Cc: chiluk@canonical.com, Rafael Tinoco <rafael.tinoco@canonical.com>,
	linux-kernel@vger.kernel.org, davem@davemloft.net,
	Christopher Arges <chris.j.arges@canonical.com>,
	Jay Vosburgh <jay.vosburgh@canonical.com>
Subject: Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus
Date: Wed, 11 Jun 2014 16:12:15 -0700
Message-ID: <87ioo7vy5s.fsf@x220.int.ebiederm.org>
In-Reply-To: <20140611225228.GO4581@linux.vnet.ibm.com> (Paul E. McKenney's message of "Wed, 11 Jun 2014 15:52:28 -0700")

"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:

> On Wed, Jun 11, 2014 at 01:46:08PM -0700, Eric W. Biederman wrote:
>> On the chance it is dropping the old nsproxy, which calls synchronize_rcu
>> in switch_task_namespaces, that is causing you problems, I have attached
>> a patch that changes from rcu_read_lock to task_lock for code that
>> calls task_nsproxy from a different task.  The code should be safe
>> and it should be an unquestionable performance improvement, but I have only
>> compile tested it.
>> 
>> If you can try the patch it will tell us if the problem is the rcu
>> access in switch_task_namespaces (the only one I am aware of in network
>> namespace creation) or if the problem rcu case is somewhere else.
>> 
>> If nothing else, knowing which rcu accesses are causing the slowdown
>> seems important at the end of the day.
>> 
>> Eric
>> 
>
> If this is the culprit, another approach would be to use workqueues from
> RCU callbacks.  The following (untested, probably does not even build)
> patch illustrates one such approach.

For reference, the only reason we are using rcu_read_lock today for nsproxy
is an old lock ordering problem that does not exist anymore.

I can say that in some workloads setns is a bit heavy today because of
the synchronize_rcu, and setns is more important than I had previously
thought because pthreads break the classic unix ability to do things in
your process after fork() (sigh).

Today daemonize is gone, and notifying the parent process with a signal
relies on task_active_pid_ns, which does not use nsproxy.  So the old
lock ordering problem/race is gone.

Here is the description of what was happening when the code switched from
task_lock to rcu_read_lock to protect nsproxy:

commit cf7b708c8d1d7a27736771bcf4c457b332b0f818
Author: Pavel Emelyanov <xemul@openvz.org>
Date:   Thu Oct 18 23:39:54 2007 -0700

    Make access to task's nsproxy lighter
    
    When someone wants to deal with some other task's namespaces it has to lock
    the task and then to get the desired namespace if the one exists.  This is
    slow on read-only paths and may be impossible in some cases.
    
    E.g.  Oleg recently noticed a race between unshare() and the (sent for
    review in cgroups) pid namespaces - when the task notifies the parent it
    has to know the parent's namespace, but taking the task_lock() is
    impossible there - the code is under write locked tasklist lock.
    
    On the other hand switching the namespace on task (daemonize) and releasing
    the namespace (after the last task exit) is rather rare operation and we
    can sacrifice its speed to solve the issues above.
    
    The access to other task namespaces is proposed to be performed
    like this:
    
         rcu_read_lock();
         nsproxy = task_nsproxy(tsk);
         if (nsproxy != NULL) {
                 /*
                  * work with the namespaces here
                  * e.g. get the reference on one of them
                  */
         } /*
            * NULL task_nsproxy() means that this task is
            * almost dead (zombie)
            */
         rcu_read_unlock();
    
    This patch has passed the review by Eric and Oleg :) and,
    of course, tested.
    
    [clg@fr.ibm.com: fix unshare()]
    [ebiederm@xmission.com: Update get_net_ns_by_pid]
    Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Oleg Nesterov <oleg@tv-sign.ru>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Serge Hallyn <serue@us.ibm.com>
    Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
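
For comparison with the rcu_read_lock pattern in the commit message above,
here is a rough userspace sketch of the task_lock style of access that the
quoted patch description refers to.  It is not the patch from this thread
and not kernel code: struct task, struct net_ns, get_net_ns_of(),
put_net_ns() and switch_nsproxy() are hypothetical stand-ins, and a pthread
mutex plays the role of task_lock().  The point it tries to show is that
when readers take the per-task lock, the writer can hand back the old
nsproxy as soon as it drops the lock, with no grace-period wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-ins; none of these are the kernel's definitions. */
struct net_ns {
	atomic_int refcount;
};

struct nsproxy {
	struct net_ns *net_ns;
};

struct task {
	pthread_mutex_t alloc_lock;	/* plays the role of task_lock() */
	struct nsproxy *nsproxy;	/* NULL once the task is almost dead */
};

/* Reader: pin another task's net namespace while holding that task's lock. */
static struct net_ns *get_net_ns_of(struct task *tsk)
{
	struct net_ns *ns = NULL;

	pthread_mutex_lock(&tsk->alloc_lock);
	if (tsk->nsproxy != NULL) {
		ns = tsk->nsproxy->net_ns;
		/* Take the reference before dropping the lock. */
		atomic_fetch_add(&ns->refcount, 1);
	}
	pthread_mutex_unlock(&tsk->alloc_lock);
	return ns;
}

/* Drop a reference taken by get_net_ns_of(). */
static void put_net_ns(struct net_ns *ns)
{
	assert(atomic_load(&ns->refcount) > 0);
	atomic_fetch_sub(&ns->refcount, 1);
}

/*
 * Writer: swap the pointer under the same lock and return the old one.
 * No reader can still be dereferencing the old nsproxy once the lock is
 * released, so the caller does not need to wait before releasing it.
 */
static struct nsproxy *switch_nsproxy(struct task *tsk, struct nsproxy *new)
{
	struct nsproxy *old;

	pthread_mutex_lock(&tsk->alloc_lock);
	old = tsk->nsproxy;
	tsk->nsproxy = new;
	pthread_mutex_unlock(&tsk->alloc_lock);
	return old;
}
```

The trade-off in this sketch is the one the commit message describes in
reverse: readers pay for a lock acquisition on every access, while the
namespace switch itself becomes cheap.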

Eric


Thread overview: 24+ messages
2014-06-11  5:52 Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus Rafael Tinoco
2014-06-11  7:07 ` Eric W. Biederman
2014-06-11 13:39 ` Paul E. McKenney
2014-06-11 15:17   ` Rafael Tinoco
2014-06-11 15:46     ` David Chiluk
2014-06-11 16:18       ` Paul E. McKenney
2014-06-11 18:27         ` Dave Chiluk
2014-06-11 19:48           ` Paul E. McKenney
2014-06-11 20:55             ` Eric W. Biederman
2014-06-11 21:03               ` Rafael Tinoco
2014-06-11 20:46           ` Eric W. Biederman
2014-06-11 21:14             ` Dave Chiluk
2014-06-11 22:52             ` Paul E. McKenney
2014-06-11 23:12               ` Eric W. Biederman [this message]
2014-06-11 23:49                 ` Paul E. McKenney
2014-06-12  0:14                   ` Eric W. Biederman
2014-06-12  0:25                     ` Rafael Tinoco
2014-06-12  1:09                       ` Eric W. Biederman
2014-06-12  1:14                         ` Rafael Tinoco
     [not found]                           ` <CAJE_dJzjcWP=e_CPM1M64URVHiEFFb+fP6g2YKZVdoFntkQMZg@mail.gmail.com>
2014-06-13 18:22                             ` Rafael Tinoco
2014-06-14  0:02                             ` Eric W. Biederman
2014-06-16 15:01                               ` Rafael Tinoco
2014-07-17 12:05                                 ` Rafael David Tinoco
2014-07-24  7:01                                   ` Eric W. Biederman
