Linux Container Development
From: Andrei Vagin <avagin-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
To: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	Andrei Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>,
	"David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
Subject: Re: [PATCH] net: limit a number of namespaces which can be cleaned up concurrently
Date: Thu, 13 Oct 2016 13:44:06 -0700
Message-ID: <20161013204405.GA19836@outlook.office365.com>
In-Reply-To: <871szk9rl9.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>

On Thu, Oct 13, 2016 at 10:49:38AM -0500, Eric W. Biederman wrote:
> Andrei Vagin <avagin@openvz.org> writes:
> 
> > From: Andrey Vagin <avagin@openvz.org>
> >
> > The operation of destroying netns is heavy and it is executed under
> > net_mutex. If many namespaces are destroyed concurrently, net_mutex can
> > be locked for a long time. It is impossible to create a new netns during
> > this period of time.
> 
> This may be the right approach or at least the right approach to bound
> net_mutex hold times but I have to take exception to calling network
> namespace cleanup heavy.
> 
> The only particularly time-consuming operations I have ever found are calls to
> synchronize_rcu/synchronize_sched/synchronize_net.

I booted the kernel with maxcpus=1. In this case these functions are
very fast, and the problem is still there anyway.

According to perf, we spend a lot of time in kobject_uevent:

-   99.96%     0.00%  kworker/u4:1     [kernel.kallsyms]  [k] unregister_netdevice_many
   - unregister_netdevice_many
      - 99.95% rollback_registered_many
         - 99.64% netdev_unregister_kobject
            - 33.43% netdev_queue_update_kobjects
               - 33.40% kobject_put
                  - kobject_release
                     + 33.37% kobject_uevent
                     + 0.03% kobject_del
               + 0.03% sysfs_remove_group
            - 33.13% net_rx_queue_update_kobjects
               - kobject_put
               - kobject_release
                  + 33.11% kobject_uevent
                  + 0.01% kobject_del
                    0.00% rx_queue_release
            - 33.08% device_del
               + 32.75% kobject_uevent
               + 0.17% device_remove_attrs
               + 0.07% dpm_sysfs_remove
               + 0.04% device_remove_class_symlinks
               + 0.01% kobject_del
               + 0.01% device_pm_remove
               + 0.01% sysfs_remove_file_ns
               + 0.00% klist_del
               + 0.00% driver_deferred_probe_del
                 0.00% cleanup_glue_dir.isra.14.part.15
                 0.00% to_acpi_device_node
                 0.00% sysfs_remove_group
              0.00% klist_del
              0.00% device_remove_attrs
         + 0.26% call_netdevice_notifiers_info
         + 0.04% rtmsg_ifinfo_build_skb
         + 0.01% rtmsg_ifinfo_send
        0.00% dev_uc_flush
        0.00% netif_reset_xps_queues_gt

Someone may be listening for these uevents, so we can't stop sending them
without breaking backward compatibility. We can try to optimize
kobject_uevent...

> 
> Ideally we can search out those calls in the network namespace cleanup
> operations and figure out how to eliminate those operations or how to
> stack them.
> 
> > Nowadays, when userns allows unprivileged users to create network
> > namespaces, it may be a real problem.
> 
> Sorting out synchronize_rcu calls will be a much larger
> and much more effective improvement than your patch here.
> 
> > On my laptop (fedora 24, i5-5200U, 12GB) 1000 namespaces require about
> > 300MB of RAM and take 8 seconds to destroy.
> >
> > In this patch, the number of namespaces that can be cleaned up
> > concurrently is limited to 32. net_mutex is released after handling each
> > batch of net namespaces and then locked again to handle the next
> > one. This allows other users to lock it without waiting for a long
> > time.
> >
> > I am not sure whether we need to add a sysctl to customize this limit.
> > Let me know if you think it's required.
> 
> We definitely don't need an extra sysctl.

Thanks,
Andrei

> 
> Eric
> 
> 
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> > Signed-off-by: Andrei Vagin <avagin@openvz.org>
> > ---
> >  net/core/net_namespace.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> > index 989434f..33dd3b7 100644
> > --- a/net/core/net_namespace.c
> > +++ b/net/core/net_namespace.c
> > @@ -406,10 +406,20 @@ static void cleanup_net(struct work_struct *work)
> >  	struct net *net, *tmp;
> >  	struct list_head net_kill_list;
> >  	LIST_HEAD(net_exit_list);
> > +	int i = 0;
> >  
> >  	/* Atomically snapshot the list of namespaces to cleanup */
> >  	spin_lock_irq(&cleanup_list_lock);
> > -	list_replace_init(&cleanup_list, &net_kill_list);
> > +	list_for_each_entry_safe(net, tmp, &cleanup_list, cleanup_list)
> > +		if (++i == 32)
> > +			break;
> > +	if (i == 32) {
> > +		list_cut_position(&net_kill_list,
> > +				  &cleanup_list, &net->cleanup_list);
> > +		queue_work(netns_wq, work);
> > +	} else {
> > +		list_replace_init(&cleanup_list, &net_kill_list);
> > +	}
> >  	spin_unlock_irq(&cleanup_list_lock);
> >  
> >  	mutex_lock(&net_mutex);
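
For readers who want to try the batching idea outside the kernel, here is a
minimal userspace sketch of the "cut off at most 32 entries, leave the rest
for the next pass" step from the patch above. It is not the kernel code:
struct node, struct queue, queue_push() and take_batch() are hypothetical
names made up for this illustration, a plain singly linked queue stands in
for the kernel's list_head and list_cut_position(), and the locking
(cleanup_list_lock, net_mutex) and the queue_work() requeue are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry on the kernel's cleanup_list. */
struct node {
	struct node *next;
	int id;
};

/* Hypothetical FIFO queue; the kernel uses a doubly linked list_head. */
struct queue {
	struct node *head;
	struct node *tail;
};

/* Append one node to the tail of the queue. */
static void queue_push(struct queue *q, struct node *n)
{
	n->next = NULL;
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
}

/*
 * Detach at most `limit` nodes from the front of `pending` into `batch`
 * and return how many were detached. Whatever remains in `pending` is
 * left for the next pass, which is the role queue_work() plays in the
 * patch. The patch does this step under cleanup_list_lock; no locking
 * is modeled here.
 */
static size_t take_batch(struct queue *pending, struct queue *batch,
			 size_t limit)
{
	struct node *cur, *last = NULL;
	size_t n = 0;

	assert(limit > 0);
	batch->head = NULL;
	batch->tail = NULL;

	/* Walk forward to find the last node that belongs to this batch. */
	for (cur = pending->head; cur && n < limit; cur = cur->next) {
		last = cur;
		n++;
	}
	if (!last)
		return 0;	/* nothing pending */

	/* Cut the queue right after `last`. */
	batch->head = pending->head;
	batch->tail = last;
	pending->head = last->next;
	if (!pending->head)
		pending->tail = NULL;
	last->next = NULL;

	return n;
}
```

Calling take_batch() repeatedly with a limit of 32 on a queue of 1000
entries yields 31 full batches and one final batch of 8. If the cleanup
cost were spread evenly across namespaces (an assumption, not something
measured in this thread), the 8 seconds reported for 1000 namespaces would
correspond to roughly a quarter of a second of lock hold time per batch.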
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers
