public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: tytso@valinux.com
Cc: linux-kernel@vger.kernel.org, alan@redhat.com, aviro@redhat.com
Subject: Re: Locking problem in 2.2.18/19-pre7? (fs/inode.c and fs/dcache.c)
Date: Tue, 16 Jan 2001 20:33:34 +0100	[thread overview]
Message-ID: <20010116203334.C19265@athlon.random> (raw)
In-Reply-To: <E14IbPR-0007Ye-00@beefcake.hdqt.valinux.com>
In-Reply-To: <E14IbPR-0007Ye-00@beefcake.hdqt.valinux.com>; from tytso@valinux.com on Tue, Jan 16, 2001 at 11:04:45AM -0800

On Tue, Jan 16, 2001 at 11:04:45AM -0800, Theodore Y. Ts'o wrote:
> 
> HJ Lu recently pointed me at a potential locking problem
> try_to_free_inodes(), and when I started proding at it, I found what
> appears to be another set of SMP locking issues in the dcache code.
> (But if that were the case, why aren't we seeing huge numbers of
> complaints?  So I'm wondering if I'm missing something.)

Because the code is correct ;). It is infact a fix and it took some time to fix
such bug in mid 2.2.x.

> 
> Anyway, the first problem which HJ pointed out is in
> try_to_free_inodes() which attempts to implement a mutual exclusion with
> respect to itself as follows....
> 
> 	if (block)
> 	{
> 		struct wait_queue __wait;
> 
> 		__wait.task = current;
> 		add_wait_queue(&wait_inode_freeing, &__wait);
> 		for (;;)
> 		{
> 			/* NOTE: we rely only on the inode_lock to be sure
> 			   to not miss the unblock event */
> 			current->state = TASK_UNINTERRUPTIBLE;
> 			spin_unlock(&inode_lock);
			^^^^^^^^^^^^^^^^^^^^^^^^
> 			schedule();
> 			spin_lock(&inode_lock);
			^^^^^^^^^^^^^^^^^^^^^^^^
> 			if (!block)
> 				break;
> 		}
> 		remove_wait_queue(&wait_inode_freeing, &__wait);
> 		current->state = TASK_RUNNING;
> 	}
> 	block = 1;
> 
> Of course, this is utterly unsafe on an SMP machines, since access to
> the "block" variable isn't protected at all.  So the first question is

Wrong, it's obviously protected by the inode_lock. And even if it wasn't
protected by the inode_lock in 2.2.x inode.c and dcache.c runs globally
serialized by the BKL (but it is obviously protected regardless of the BKL).

> why did whoever implemented this do it in this horribly complicated way
> above, instead of something simple, say like this:
> 
> 	static struct semaphore block = MUTEX;
> 	if (down_trylock(&block)) {
> 		spin_unlock(&inode_lock);
> 		down(&block);
> 		spin_lock(&inode_lock);
> 	}

The above is overkill (there's no need to use further atomic API, when we can
rely on the inode_lock for the locking. It's overcomplex and slower.

> (with the appropriate unlocking code at the end of the function).
> 
> 
> Next question.... why was this there in the first place?  After all,

To fix the "inode-max limit reached" faliures that you could reproduce on
earlier 2.2.x. (the freed inodes was re-used before the task that freed them
had a chance to allocate them for itself)

> most of the time try_to_free_inodes() is called with the inode_lock
> spinlock held, which would act as a automatic mutual exclusion anyway.

> The only time this doesn't happen is when we call prune_dcache(), where
> inode_lock is temporarily dropped.

Wrong:

		spin_unlock(&inode_lock);
		prune_dcache(0, goal);
		spin_lock(&inode_lock);
		sync_all_inodes();
		__free_inodes(&throw_away);

the above code obviously drops the spinlock for doing things like flushing the
dirty inodes to the buffer cache that can block in balance_dirty() etc...
(and that's the real problem because it sleeps so also the BKL gets released
while prune_dcache in practice could not race because of the BKL)

> So I took a look at prune_dcache(), and discovered that (a) it's called
> from multiple places, and (b) it and shrink_dcache_sb() both iterate over
> dentry_unused and among other things, tried to free dcache structures
> without any kind of locking to prevent two kernel threads to
> from mucking with the contents of dentry_unused at the same time, and
> possibly having prune_one_dentry() being called by two processes on the
> same dentry structure.  This should almost certainly cause problems.

we're running under the BKL all over the place in 2.2.x so they can't race.

> So the following patch I think is definitely necessary, assuming that

The patch is definitely not necessary.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

  reply	other threads:[~2001-01-16 19:33 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-01-16 19:04 Locking problem in 2.2.18/19-pre7? (fs/inode.c and fs/dcache.c) tytso
2001-01-16 19:33 ` Andrea Arcangeli [this message]
2001-01-16 20:10   ` tytso
2001-01-16 23:05     ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20010116203334.C19265@athlon.random \
    --to=andrea@suse.de \
    --cc=alan@redhat.com \
    --cc=aviro@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@valinux.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox