From: ebiederm@xmission.com (Eric W. Biederman)
To: "Paweł Sikora" <pluto@pld-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
arekm@pld-linux.org, baggins@pld-linux.org,
Herbert Poetzl <herbert@13thfloor.at>
Subject: Re: [2.6.38-3.x] [BUG] soft lockup - CPU#X stuck for 23s! (vfs, autofs, vserver)
Date: Mon, 24 Sep 2012 11:17:42 -0700
Message-ID: <87sja7uvy1.fsf@xmission.com>
In-Reply-To: <20120924112300.GE20655@MAIL.13thfloor.at> (Herbert Poetzl's message of "Mon, 24 Sep 2012 13:23:00 +0200")
Herbert Poetzl <herbert@13thfloor.at> writes:
> On Mon, Sep 24, 2012 at 07:23:55AM +0200, Paweł Sikora wrote:
>> On Sunday 23 of September 2012 18:10:30 Linus Torvalds wrote:
>>> On Sat, Sep 22, 2012 at 11:09 PM, Paweł Sikora <pluto@pld-linux.org> wrote:
>
>>>> br_read_lock(vfsmount_lock);
>
>>> The vfsmount_lock is a "local-global" lock, where a read-lock
>>> is rather cheap and takes just a per-cpu lock, but the
>>> downside is that a write-lock is *very* expensive, and can
>>> cause serious trouble.
>
>>> And the write lock is taken by the [un]mount() paths. Do *not*
>>> do crazy things. If you do some insane "unmount and remount
>>> autofs" on a 1s granularity, you're doing insane things.
>
>>> Why do you have that 1s timeout? Insane.
>
>> 1s unmount timeout is *only* for fast bug reproduction (in few
>> seconds after opteron startup) and testing potential patches.
>> normally with 60s timeout it happens in few minutes..hours
>> (depends on machine i/o+cpu load) and makes server unusable
>> (permament soft-lockup).
>
>> can we redesign vserver's mnt_is_reachable() for better locking
>> to avoid total soft-lockup?
>
> currently we do:
>
> br_read_lock(&vfsmount_lock);
> root = current->fs->root;
> root_mnt = real_mount(root.mnt);
> point = root.dentry;
>
> while ((mnt != mnt->mnt_parent) && (mnt != root_mnt)) {
> point = mnt->mnt_mountpoint;
> mnt = mnt->mnt_parent;
> }
>
> ret = (mnt == root_mnt) && is_subdir(point, root.dentry);
> br_read_unlock(&vfsmount_lock);
>
> and we have been considering to move the br_read_unlock()
> right before the is_subdir() call
>
> if there are any suggestions how to achieve the same
> with less locking I'm all ears ...
Herbert, why do you need to filter the mounts that show up in a mount
namespace at all? I would think a far more performant and simpler
solution would be to just use mount namespaces without unwanted mounts.

I'd like to blame this on the silly rcu_barrier in
deactivate_locked_super that should really be in the module remove path,
but that happens after we drop the br_write_lock.

The kernel takes br_read_lock(&vfsmount_lock) during every rcu path
lookup, so mnt_is_reachable isn't particularly crazy just for taking
the lock.

I am with Linus on this one. Paweł, even 60s for your mount timeout
looks too short for your workload. All of the readers that take
br_read_lock(&vfsmount_lock) seem to be showing up in your oops. The
only thing that seems to make sense is that you have a lot of unmount
activity running back to back, keeping the lock write held.

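The cost asymmetry described in the quoted text above (a cheap per-cpu
read-lock, an expensive write-lock) can be sketched in userspace C. This
is only an illustration under stated assumptions, not the kernel's
implementation: NR_SLOTS, struct brlock_sketch and the brlock_sketch_*
functions are hypothetical names, the slots stand in for CPUs, and C11
atomic_flag spinlocks stand in for the per-cpu locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS 4 /* assumption: stand-in for the number of CPUs */

/* Hypothetical userspace analogue of a "local-global" lock. */
struct brlock_sketch {
    atomic_flag slot[NR_SLOTS]; /* one spinlock per slot */
};

void brlock_sketch_init(struct brlock_sketch *l)
{
    for (int i = 0; i < NR_SLOTS; i++)
        atomic_flag_clear(&l->slot[i]);
}

/* Try to take only the caller's own slot; true on success. */
bool brlock_sketch_read_trylock(struct brlock_sketch *l, int slot)
{
    assert(slot >= 0 && slot < NR_SLOTS);
    return !atomic_flag_test_and_set_explicit(&l->slot[slot],
                                              memory_order_acquire);
}

/* Reader: spin on one slot. Returns the number of locks taken (1). */
int brlock_sketch_read_lock(struct brlock_sketch *l, int slot)
{
    while (!brlock_sketch_read_trylock(l, slot))
        ; /* spin until the slot is free */
    return 1;
}

void brlock_sketch_read_unlock(struct brlock_sketch *l, int slot)
{
    assert(slot >= 0 && slot < NR_SLOTS);
    atomic_flag_clear_explicit(&l->slot[slot], memory_order_release);
}

/*
 * Writer: take every slot, always in index order so two writers
 * cannot deadlock. Returns the number of locks taken (NR_SLOTS).
 */
int brlock_sketch_write_lock(struct brlock_sketch *l)
{
    int taken = 0;

    for (int i = 0; i < NR_SLOTS; i++) {
        while (atomic_flag_test_and_set_explicit(&l->slot[i],
                                                 memory_order_acquire))
            ; /* spin until this slot is free */
        taken++;
    }
    return taken;
}

void brlock_sketch_write_unlock(struct brlock_sketch *l)
{
    for (int i = 0; i < NR_SLOTS; i++)
        atomic_flag_clear_explicit(&l->slot[i], memory_order_release);
}
```

In this sketch a reader touches one slot while a writer has to take all
of them, so a reader on any slot spins for as long as a writer holds the
lock. Writers running back to back would leave readers very little
window, which would be consistent with the readers piling up in the oops.
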
The only other possible culprit I can see is that it looks like
mnt_is_reachable changes reading /proc/mounts to be something
worse than linear in the number of mounts and reading /proc/mounts
starts taking the vfsmount_lock. All minor things but when you
are pushing things hard they look like things that would add up.
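
To illustrate the "worse than linear" concern, here is a small cost
model in userspace C. It assumes the reachability check runs once per
mount entry and walks the parent chain each time, as the loop quoted
above does. struct mount_sketch, hops_to_root, total_hops and
build_chain are hypothetical names for this sketch only, and the
numbers are hop counts in the model, not measurements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's mount structure: parent link only. */
struct mount_sketch {
    struct mount_sketch *parent; /* points to itself at the top of the tree */
};

/*
 * Walk up from mnt until root_mnt or the top of the tree is reached,
 * mirroring the shape of the quoted while loop. Returns the hop count.
 */
size_t hops_to_root(const struct mount_sketch *mnt,
                    const struct mount_sketch *root_mnt)
{
    size_t hops = 0;

    assert(mnt != NULL && root_mnt != NULL);
    while (mnt != mnt->parent && mnt != root_mnt) {
        mnt = mnt->parent;
        hops++;
    }
    return hops;
}

/* Total hops when the walk is repeated for each of the first n mounts. */
size_t total_hops(const struct mount_sketch *mounts, size_t n,
                  const struct mount_sketch *root_mnt)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += hops_to_root(&mounts[i], root_mnt);
    return total;
}

/* Worst case for the model: mounts[i] is stacked on mounts[i - 1]. */
void build_chain(struct mount_sketch *mounts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        mounts[i].parent = (i == 0) ? &mounts[0] : &mounts[i - 1];
}
```

For a chain of n mounts the model gives n(n-1)/2 hops in total, so 100
nested mounts cost 4950 hops per full listing. A flat layout, where
every mount hangs directly off the root, would cost only n-1.
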
Eric