From mboxrd@z Thu Jan 1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
Subject: Re: Soft-lockup on vfsmount_lock with large numbers of mount namespaces in the cloud
Date: Tue, 25 Feb 2014 00:05:47 -0800
Message-ID: <878uszboro.fsf@xmission.com>
References: <5305A600.1030209@canonical.com>
Mime-Version: 1.0
Content-Type: text/plain
Cc: linux-fsdevel@vger.kernel.org, Al Viro
To: Dave Chiluk
Return-path:
Received: from out01.mta.xmission.com ([166.70.13.231]:54891 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750708AbaBYIFz (ORCPT ); Tue, 25 Feb 2014 03:05:55 -0500
In-Reply-To: <5305A600.1030209@canonical.com> (Dave Chiluk's message of "Thu, 20 Feb 2014 00:51:44 -0600")
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Dave Chiluk writes:

> An openstack neutron gateway uses network namespaces to partition
> machines within a cloud. In order to do so it creates lots of network
> namespaces, and as a result mount namespaces. This is accomplished
> through many calls to
>
> $ ip netns add/delete/exec
>
> After roughly 3k-4k namespaces the performance of these ip calls becomes
> very slow on the order of many seconds. After a few more the machine
> starts to report "BUGs" on the stuck ip processes (BUG output below).
>
> We think the problem is contention for the vfsmount_lock which gets held
> by do_umount while it walks the mounts in the following stack
>
> do_umount
> -> umount_tree
> -> propagate_umount
> -> __propagate_umount
> -> __lookup_mnt
>
> Where lookup_mnt proceeds to spend significant time walking the
> mount_hastable.
>
> How we can mitigate or fix this expensive operation while holding the
> lock? If this has already been fixed please feel free to point me at
> requisite git hash's.

Just looking at this, the expensive operation appears to be mount/umount
propagation. I expect there is some mount propagating to all 4k mount
namespaces you have, and that is what is taking the time.

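As a starting point for finding mounts that can propagate, something
along these lines might help. It is only a rough sketch: the function
name shared_peer_groups is made up for illustration, and it relies on
the /proc/<pid>/mountinfo layout, where the optional "shared:N" tag sits
between the mount options and the "-" separator. It lists candidates in
a single namespace and does not prove which umount is the slow one.

```python
from collections import defaultdict


def shared_peer_groups(mountinfo_text):
    """Map each shared peer group ID to the mount points tagged with it.

    mountinfo_text is the contents of a /proc/<pid>/mountinfo file.
    """
    groups = defaultdict(list)
    for line in mountinfo_text.splitlines():
        fields = line.split()
        try:
            # The "-" separator ends the optional fields, which start
            # at index 6 (after ID, parent, dev, root, mount point, opts).
            sep = fields.index("-", 6)
        except ValueError:
            continue  # not a well-formed mountinfo line, skip it
        mount_point = fields[4]
        for tag in fields[6:sep]:
            if tag.startswith("shared:"):
                peer_group = int(tag.split(":", 1)[1])
                groups[peer_group].append(mount_point)
    return dict(groups)
```

Feeding it the contents of /proc/self/mountinfo from inside a few of
the namespaces and comparing the peer group IDs should show which mounts
are shared between them, since a mount or umount under one member of a
peer group is propagated to the other members.
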
You should be able to dig into the set of mounts on your system and
figure out which umount is propagating, to understand what is going on.
After that you can either modify userspace to remove the mount
propagation (perhaps just a patch to iproute) or we can figure out how
to improve the locking present when the kernel propagates mounts.

> Perhaps I'm looking in the wrong area of code, and I really just need
> aa7a574d0c54cc5a0aceb7357b5097342c0844ee. Are there any others that
> immediately stand out or is this a new problem?

I think people actually using mount/umount propagation on a large scale
is new.

> Also we've tried reproducing with 3.5, 3.8, 3.11 which yielded similar
> results. 3.13 runs into similar results but has different issues related
> to the RCU locking. When I have a better idea as to what's going on
> with 3.13 I will report back about that.

From an upstream perspective I am primarily interested in 3.13 and
3.14-rcX.

Eric