From: "Paweł Sikora" <pluto@pld-linux.org>
To: Herbert Poetzl <herbert@13thfloor.at>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
arekm@pld-linux.org, baggins@pld-linux.org,
Daniel Hokka Zakrisson <daniel@hozac.com>
Subject: Re: [2.6.38-3.x] [BUG] soft lockup - CPU#X stuck for 23s! (vfs, autofs, vserver)
Date: Thu, 15 Nov 2012 19:48:10 +0100
Message-ID: <3506450.k3Q223DJQc@localhost>
In-Reply-To: <20120925050558.GA14685@MAIL.13thfloor.at>

On Tuesday 25 of September 2012 07:05:59 Herbert Poetzl wrote:
> On Mon, Sep 24, 2012 at 11:17:42AM -0700, Eric W. Biederman wrote:
> > Herbert Poetzl <herbert@13thfloor.at> writes:
>
> >> On Mon, Sep 24, 2012 at 07:23:55AM +0200, Paweł Sikora wrote:
> >>> On Sunday 23 of September 2012 18:10:30 Linus Torvalds wrote:
> >>>> On Sat, Sep 22, 2012 at 11:09 PM, Paweł Sikora <pluto@pld-linux.org> wrote:
>
> >>>>> br_read_lock(vfsmount_lock);
>
> >>>> The vfsmount_lock is a "local-global" lock, where a read-lock
> >>>> is rather cheap and takes just a per-cpu lock, but the
> >>>> downside is that a write-lock is *very* expensive, and can
> >>>> cause serious trouble.
>
> >>>> And the write lock is taken by the [un]mount() paths. Do *not*
> >>>> do crazy things. If you do some insane "unmount and remount
> >>>> autofs" on a 1s granularity, you're doing insane things.
>
> >>>> Why do you have that 1s timeout? Insane.
>
> >>> 1s unmount timeout is *only* for fast bug reproduction (in few
> >>> seconds after opteron startup) and testing potential patches.
> >>> normally with 60s timeout it happens in few minutes..hours
> >>> (depends on machine i/o+cpu load) and makes server unusable
> >>> (permanent soft-lockup).
>
> >>> can we redesign vserver's mnt_is_reachable() for better locking
> >>> to avoid total soft-lockup?
>
> >> currently we do:
>
> >> br_read_lock(&vfsmount_lock);
> >> root = current->fs->root;
> >> root_mnt = real_mount(root.mnt);
> >> point = root.dentry;
>
> >> while ((mnt != mnt->mnt_parent) && (mnt != root_mnt)) {
> >> 	point = mnt->mnt_mountpoint;
> >> 	mnt = mnt->mnt_parent;
> >> }
>
> >> ret = (mnt == root_mnt) && is_subdir(point, root.dentry);
> >> br_read_unlock(&vfsmount_lock);
>
> >> and we have been considering moving the br_read_unlock()
> >> right before the is_subdir() call
>
> >> if there are any suggestions how to achieve the same
> >> with less locking I'm all ears ...
>
> > Herbert, why do you need to filter the mounts that show up in a
> > mount namespace at all?
>
> that is actually a really good question!
>
> > I would think a far more performant and simpler solution would
> > be to just use mount namespaces without unwanted mounts.
>
> we had this mechanism for many years, long before the
> mount namespaces existed, and I vaguely remember that
> early versions didn't get the proc entries right either
>
> I took a quick look at the code and I think we can drop
> the mnt_is_reachable() check and/or make it conditional
> on setups without a mount namespace in place in the near
> future (thanks for the input, really appreciated!)

Hi,

Herbert, can I just drop this mnt_is_reachable() method from the vserver patch?
This issue hasn't been solved for several months now. I can live without this
problematic security-through-obscurity feature on my heavily loaded machines.

> > I'd like to blame this on the silly rcu_barrier in
> > deactivate_locked_super that should really be in the module
> > remove path, but that happens after we drop the br_write_lock.
>
> > The kernel takes br_read_lock(&vfsmount_lock) during every rcu
> > path lookup so mnt_is_reachable isn't particularly crazy just for
> > taking the lock.
>
> > I am with Linus on this one. Paweł, even 60s for your mount
> > timeout looks too short for your workload. All of the readers
> > that take br_read_lock(&vfsmount_lock) seem to be showing up in
> > your oops. The only thing that seems to make sense is you have
> > a lot of unmount activity running back to back, keeping the
> > lock write held.
>
> > The only other possible culprit I can see is that it looks like
> > mnt_is_reachable changes reading /proc/mounts to be something
> > worse than linear in the number of mounts and reading /proc/mounts
> > starts taking the vfsmount_lock. All minor things but when you
> > are pushing things hard they look like things that would add up.
>
> > Eric
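
For illustration only, here is a minimal userspace sketch of the parent walk
quoted above and of Eric's point that repeating it for every mount costs more
than linear time. The toy_mount struct and the toy_* functions are
hypothetical stand-ins, not the vserver code. The locking, the mountpoint
tracking and the is_subdir() check are left out, and the chain-shaped mount
tree is an assumed worst case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct mount: only the parent
 * link matters here. A top-level mount is its own parent, which matches
 * the loop condition in the quoted code. */
struct toy_mount {
	struct toy_mount *parent;
};

/* Walk from mnt towards the top of the tree, stopping early at root_mnt.
 * Returns true if root_mnt was reached; *steps counts the parent hops. */
bool toy_reaches_root(struct toy_mount *mnt, struct toy_mount *root_mnt,
		      size_t *steps)
{
	*steps = 0;
	while (mnt != mnt->parent && mnt != root_mnt) {
		mnt = mnt->parent;
		(*steps)++;
	}
	return mnt == root_mnt;
}

/* Total parent hops when the check runs once per entry over a chain of
 * n mounts (mount i sits on mount i-1), as a /proc/mounts style listing
 * would do. For a chain this is 0 + 1 + ... + (n-1) = n*(n-1)/2. */
size_t toy_total_steps(size_t n)
{
	struct toy_mount chain[64];
	size_t total = 0, steps, i;

	assert(n >= 1 && n <= 64);
	chain[0].parent = &chain[0];
	for (i = 1; i < n; i++)
		chain[i].parent = &chain[i - 1];

	for (i = 0; i < n; i++) {
		if (toy_reaches_root(&chain[i], &chain[0], &steps))
			total += steps;
	}
	return total;
}
```

In this toy model the per-listing cost grows quadratically with the depth of
the chain, and in the real code every one of those walks would run under
br_read_lock(&vfsmount_lock).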