From: Andrea Arcangeli <andrea@suse.de>
To: tytso@valinux.com
Cc: linux-kernel@vger.kernel.org, alan@redhat.com, aviro@redhat.com
Subject: Re: Locking problem in 2.2.18/19-pre7? (fs/inode.c and fs/dcache.c)
Date: Tue, 16 Jan 2001 20:33:34 +0100 [thread overview]
Message-ID: <20010116203334.C19265@athlon.random> (raw)
In-Reply-To: <E14IbPR-0007Ye-00@beefcake.hdqt.valinux.com>
In-Reply-To: <E14IbPR-0007Ye-00@beefcake.hdqt.valinux.com>; from tytso@valinux.com on Tue, Jan 16, 2001 at 11:04:45AM -0800
On Tue, Jan 16, 2001 at 11:04:45AM -0800, Theodore Y. Ts'o wrote:
>
> HJ Lu recently pointed me at a potential locking problem
> try_to_free_inodes(), and when I started proding at it, I found what
> appears to be another set of SMP locking issues in the dcache code.
> (But if that were the case, why aren't we seeing huge numbers of
> complaints? So I'm wondering if I'm missing something.)
Because the code is correct ;). It is infact a fix and it took some time to fix
such bug in mid 2.2.x.
>
> Anyway, the first problem which HJ pointed out is in
> try_to_free_inodes() which attempts to implement a mutual exclusion with
> respect to itself as follows....
>
> if (block)
> {
> struct wait_queue __wait;
>
> __wait.task = current;
> add_wait_queue(&wait_inode_freeing, &__wait);
> for (;;)
> {
> /* NOTE: we rely only on the inode_lock to be sure
> to not miss the unblock event */
> current->state = TASK_UNINTERRUPTIBLE;
> spin_unlock(&inode_lock);
^^^^^^^^^^^^^^^^^^^^^^^^
> schedule();
> spin_lock(&inode_lock);
^^^^^^^^^^^^^^^^^^^^^^^^
> if (!block)
> break;
> }
> remove_wait_queue(&wait_inode_freeing, &__wait);
> current->state = TASK_RUNNING;
> }
> block = 1;
>
> Of course, this is utterly unsafe on an SMP machines, since access to
> the "block" variable isn't protected at all. So the first question is
Wrong, it's obviously protected by the inode_lock. And even if it wasn't
protected by the inode_lock in 2.2.x inode.c and dcache.c runs globally
serialized by the BKL (but it is obviously protected regardless of the BKL).
> why did whoever implemented this do it in this horribly complicated way
> above, instead of something simple, say like this:
>
> static struct semaphore block = MUTEX;
> if (down_trylock(&block)) {
> spin_unlock(&inode_lock);
> down(&block);
> spin_lock(&inode_lock);
> }
The above is overkill (there's no need to use further atomic API, when we can
rely on the inode_lock for the locking. It's overcomplex and slower.
> (with the appropriate unlocking code at the end of the function).
>
>
> Next question.... why was this there in the first place? After all,
To fix the "inode-max limit reached" faliures that you could reproduce on
earlier 2.2.x. (the freed inodes was re-used before the task that freed them
had a chance to allocate them for itself)
> most of the time try_to_free_inodes() is called with the inode_lock
> spinlock held, which would act as a automatic mutual exclusion anyway.
> The only time this doesn't happen is when we call prune_dcache(), where
> inode_lock is temporarily dropped.
Wrong:
spin_unlock(&inode_lock);
prune_dcache(0, goal);
spin_lock(&inode_lock);
sync_all_inodes();
__free_inodes(&throw_away);
the above code obviously drops the spinlock for doing things like flushing the
dirty inodes to the buffer cache that can block in balance_dirty() etc...
(and that's the real problem because it sleeps so also the BKL gets released
while prune_dcache in practice could not race because of the BKL)
> So I took a look at prune_dcache(), and discovered that (a) it's called
> from multiple places, and (b) it and shrink_dcache_sb() both iterate over
> dentry_unused and among other things, tried to free dcache structures
> without any kind of locking to prevent two kernel threads to
> from mucking with the contents of dentry_unused at the same time, and
> possibly having prune_one_dentry() being called by two processes on the
> same dentry structure. This should almost certainly cause problems.
we're running under the BKL all over the place in 2.2.x so they can't race.
> So the following patch I think is definitely necessary, assuming that
The patch is definitely not necessary.
Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
next prev parent reply other threads:[~2001-01-16 19:33 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2001-01-16 19:04 Locking problem in 2.2.18/19-pre7? (fs/inode.c and fs/dcache.c) tytso
2001-01-16 19:33 ` Andrea Arcangeli [this message]
2001-01-16 20:10 ` tytso
2001-01-16 23:05 ` Andrea Arcangeli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20010116203334.C19265@athlon.random \
--to=andrea@suse.de \
--cc=alan@redhat.com \
--cc=aviro@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=tytso@valinux.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.