Re: Soft-lockup on vfsmount_lock with large numbers of mount namespaces in the cloud

From: ebiederm@xmission.com (Eric W. Biederman)
To: Dave Chiluk <chiluk@canonical.com>
Cc: linux-fsdevel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: Soft-lockup on vfsmount_lock with large numbers of mount namespaces in the cloud
Date: Tue, 25 Feb 2014 00:05:47 -0800
Message-ID: <878uszboro.fsf@xmission.com>
In-Reply-To: <5305A600.1030209@canonical.com> (Dave Chiluk's message of "Thu, 20 Feb 2014 00:51:44 -0600")

Dave Chiluk <chiluk@canonical.com> writes:

> An openstack neutron gateway uses network namespaces to partition
> machines within a cloud. In order to do so it creates lots of network
> namespaces, and as a result mount namespaces. This is accomplished
> through many calls to
>
> $ ip netns add/delete/exec
>
> After roughly 3k-4k namespaces the performance of these ip calls becomes
> very slow on the order of many seconds.  After a few more the machine
> starts to report "BUGs" on the stuck ip processes (BUG output below).
>
> We think the problem is contention for the vfsmount_lock which gets held
> by do_umount while it walks the mounts in the following stack
>
> do_umount
>  -> umount_tree
>     -> propagate_umount
>        -> __propagate_umount
>           -> __lookup_mnt
>
> Where __lookup_mnt proceeds to spend significant time walking the
> mount_hashtable.
>
> How can we mitigate or fix this expensive operation while holding the
> lock?  If this has already been fixed please feel free to point me at
> the requisite git hashes.

Just from looking, the expensive operation appears to be mount/umount
propagation.  I expect there is some mount propagating to all 4k mount
namespaces you have, and that is what is taking the time.

You should be able to dig into the set of mounts on your system, and
figure out which umount is propagating, to understand what is going on.
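
As a rough starting point for that digging, here is a minimal sketch (not
something from the original report) that groups mounts by peer group,
assuming the documented /proc/[pid]/mountinfo layout in which optional
fields such as "shared:N" sit between the mount options and a lone "-"
separator.  The function names parse_mountinfo and shared_peer_groups are
made up for illustration.

```python
from collections import defaultdict


def parse_mountinfo(text):
    """Yield (mount_id, mount_point, optional_fields) per mountinfo line."""
    for line in text.splitlines():
        fields = line.split()
        try:
            # Optional fields start at index 6 and end at the "-" separator.
            sep = fields.index("-", 6)
        except ValueError:
            continue  # blank or malformed line, skip it
        yield fields[0], fields[4], fields[6:sep]


def shared_peer_groups(text):
    """Map peer group ID -> mount points tagged "shared:<ID>"."""
    groups = defaultdict(list)
    for _mount_id, mount_point, optional in parse_mountinfo(text):
        for tag in optional:
            if tag.startswith("shared:"):
                groups[tag[len("shared:"):]].append(mount_point)
    return dict(groups)
```

Feeding it the contents of /proc/self/mountinfo only shows the mounts
visible in the current mount namespace.  Mounts in other namespaces that
carry the same peer group ID are the ones a umount would propagate to, so
the same parse would have to be repeated on /proc/<pid>/mountinfo for a
process in each namespace of interest.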

After that you can either modify userspace to remove the mount
propagation (perhaps just a patch to iproute) or we can figure out how
to improve the locking present when the kernel propagates mounts.
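
For the userspace side, one way a process can stop propagation is to mark
the mounts in its own mount namespace private, which is what
"mount --make-rprivate /" does.  The sketch below does that through
mount(2) via ctypes.  It is only an illustration, not the iproute patch
itself: the helper name make_rprivate is hypothetical, the flag values are
the ones from the Linux headers, and the call needs CAP_SYS_ADMIN.

```python
import ctypes
import ctypes.util
import os

# Mount flag values from the Linux headers.
MS_REC = 0x4000        # apply to the whole subtree
MS_PRIVATE = 1 << 18   # no propagation in or out of these mounts


def make_rprivate(target="/"):
    """Recursively mark every mount under target as private."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mount.argtypes = (ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_char_p, ctypes.c_ulong,
                           ctypes.c_void_p)
    # Source, filesystem type and data are ignored for propagation changes.
    if libc.mount(b"none", target.encode(), None,
                  MS_REC | MS_PRIVATE, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)
```

This only changes the namespace of the calling process, so it would have
to run after the new mount namespace is created and before the mounts that
are later unmounted are set up.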

> Perhaps I'm looking in the wrong area of code, and I really just need
> aa7a574d0c54cc5a0aceb7357b5097342c0844ee.  Are there any others that
> immediately stand out or is this a new problem?

I think people actually using mount/umount propagation on a large scale
is new.

> Also we've tried reproducing with 3.5, 3.8, 3.11 which yielded similar
> results. 3.13 runs into similar results but has different issues related
> to the RCU locking.  When I have a better idea as to what's going on
> with 3.13 I will report back about that.

From an upstream perspective I am primarily interested in 3.13 and
3.14-rcX.

Eric
