linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH 17/21] fs: protect wake_up_inode with inode->i_lock
Date: Thu, 21 Oct 2010 11:49:42 +1100	[thread overview]
Message-ID: <1287622186-1935-18-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1287622186-1935-1-git-send-email-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

Currently we have an optimisation in place in unlock_new_inode() to
avoid taking the inode_lock. This uses memory barriers to ensure
that the the clearing of I_NEW can be seen by all other CPUs. It
also means the wake_up_inode() call relies on a memory barrier to
ensure that processes it is waking see any changes the i_state field
prior to the wakeup being issued.

This serialisation is also necessary to protect the inode against
lookup/freeing races once the inode_lock is removed. The current
lookup and wait code is atomic as it is all performed under the
inode_lock, while the modified code now splits the locks. Hence we
can get the following race:

Thread 1:				Thread 2:
					iput_final
					spin_lock(&inode->i_lock);
hlist_bl_lock()
  Find inode
    spin_lock(&inode->i_lock)
					......
					inode->i_state = I_FREEING;
					spin_unlock(&inode->i_lock);
					remove_inode_hash()
					   hlist_bl_lock()
    ......
    if (I_FREEING)
	hlist_bl_unlock()
					   ......
					   hlist_bl_del_init()
					   hlist_bl_unlock()
					wake_up_inode()

	__wait_on_freeing_inode()
	  put on waitqueue
	  spin_unlock(&inode->i_lock);
	  schedule()

To avoid this race, wake ups need to be serialised against the
waiters, and using inode->i_lock is the natural solution.

The memory barrier optimisation is no longer needed to avoid the
global inode_lock contention as the inode->i_state field is now
protected by inode->i_lock. Hence we can revert the code to a much
simpler form and correctly serialise wake ups against
__wait_on_freeing_inode() by holding the inode->i_lock while we do
the wakeup.

Given the newfound simplicity of wake_up_inode() and the fact that
we need to change i_state in unlock_new_inode() before the wakeup,
just open code the wakeup in the couple of spots it is used and
remove wake_up_inode() entirely.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/inode.c |   41 ++++++++++++++++++-----------------------
 1 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 32bac58..ba514a1 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -175,15 +175,6 @@ int proc_nr_inodes(ctl_table *table, int write,
 }
 #endif
 
-static void wake_up_inode(struct inode *inode)
-{
-	/*
-	 * Prevent speculative execution through spin_unlock(&inode->i_lock);
-	 */
-	smp_mb();
-	wake_up_bit(&inode->i_state, __I_NEW);
-}
-
 /**
  * inode_init_always - perform inode structure intialisation
  * @sb: superblock inode belongs to
@@ -540,7 +531,9 @@ static void dispose_list(struct list_head *head)
 		__inode_sb_list_del(inode);
 		spin_unlock(&inode_lock);
 
-		wake_up_inode(inode);
+		spin_lock(&inode->i_lock);
+		wake_up_bit(&inode->i_state, __I_NEW);
+		spin_unlock(&inode->i_lock);
 		destroy_inode(inode);
 	}
 }
@@ -862,6 +855,13 @@ struct inode *new_inode(struct super_block *sb)
 }
 EXPORT_SYMBOL(new_inode);
 
+/**
+ * unlock_new_inode - clear the I_NEW state and wake up any waiters
+ * @inode:	new inode to unlock
+ *
+ * Called when the inode is fully initialised to clear the new state of the
+ * inode and wake up anyone waiting for the inode to finish initialisation.
+ */
 void unlock_new_inode(struct inode *inode)
 {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -881,19 +881,11 @@ void unlock_new_inode(struct inode *inode)
 		}
 	}
 #endif
-	/*
-	 * This is special!  We do not need the spinlock when clearing I_NEW,
-	 * because we're guaranteed that nobody else tries to do anything about
-	 * the state of the inode when it is locked, as we just created it (so
-	 * there can be no old holders that haven't tested I_NEW).
-	 * However we must emit the memory barrier so that other CPUs reliably
-	 * see the clearing of I_NEW after the other inode initialisation has
-	 * completed.
-	 */
-	smp_mb();
+	spin_lock(&inode->i_lock);
 	WARN_ON(!(inode->i_state & I_NEW));
 	inode->i_state &= ~I_NEW;
-	wake_up_inode(inode);
+	wake_up_bit(&inode->i_state, __I_NEW);
+	spin_unlock(&inode->i_lock);
 }
 EXPORT_SYMBOL(unlock_new_inode);
 
@@ -1486,8 +1478,10 @@ static void iput_final(struct inode *inode)
 	spin_unlock(&inode_lock);
 	evict(inode);
 	remove_inode_hash(inode);
-	wake_up_inode(inode);
+	spin_lock(&inode->i_lock);
+	wake_up_bit(&inode->i_state, __I_NEW);
 	BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
+	spin_unlock(&inode->i_lock);
 	destroy_inode(inode);
 }
 
@@ -1690,7 +1684,8 @@ EXPORT_SYMBOL(inode_wait);
  * to recheck inode state.
  *
  * It doesn't matter if I_NEW is not set initially, a call to
- * wake_up_inode() after removing from the hash list will DTRT.
+ * wake_up_bit(&inode->i_state, __I_NEW) with the i_lock held after removing
+ * from the hash list will DTRT.
  *
  * This is called with inode_lock held.
  *
-- 
1.7.1

  parent reply	other threads:[~2010-10-21  0:49 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-21  0:49 Inode Lock Scalability V6 Dave Chinner
2010-10-21  0:49 ` [PATCH 01/21] fs: switch bdev inode bdi's correctly Dave Chinner
2010-10-21  0:49 ` [PATCH 02/21] kernel: add bl_list Dave Chinner
2010-10-21  0:49 ` [PATCH 03/21] fs: Convert nr_inodes and nr_unused to per-cpu counters Dave Chinner
2010-10-21  0:49 ` [PATCH 04/21] fs: Implement lazy LRU updates for inodes Dave Chinner
2010-10-21  2:14   ` Christian Stroetmann
2010-10-21 10:07   ` Nick Piggin
2010-10-21 12:22     ` Christoph Hellwig
2010-10-23  9:32   ` Al Viro
2010-10-21  0:49 ` [PATCH 05/21] fs: inode split IO and LRU lists Dave Chinner
2010-10-21  0:49 ` [PATCH 06/21] fs: Clean up inode reference counting Dave Chinner
2010-10-21  1:41   ` Christoph Hellwig
2010-10-21  0:49 ` [PATCH 07/21] exofs: use iput() for inode reference count decrements Dave Chinner
2010-10-21  0:49 ` [PATCH 08/21] fs: rework icount to be a locked variable Dave Chinner
2010-10-21 19:40   ` Al Viro
2010-10-21 22:32     ` Dave Chinner
2010-10-21  0:49 ` [PATCH 09/21] fs: Factor inode hash operations into functions Dave Chinner
2010-10-21  0:49 ` [PATCH 10/21] fs: Stop abusing find_inode_fast in iunique Dave Chinner
2010-10-21  0:49 ` [PATCH 11/21] fs: move i_ref increments into find_inode/find_inode_fast Dave Chinner
2010-10-21  0:49 ` [PATCH 12/21] fs: remove inode_add_to_list/__inode_add_to_list Dave Chinner
2010-10-21  0:49 ` [PATCH 13/21] fs: Introduce per-bucket inode hash locks Dave Chinner
2010-10-21  0:49 ` [PATCH 14/21] fs: add a per-superblock lock for the inode list Dave Chinner
2010-10-21  0:49 ` [PATCH 15/21] fs: split locking of inode writeback and LRU lists Dave Chinner
2010-10-21  0:49 ` [PATCH 16/21] fs: Protect inode->i_state with the inode->i_lock Dave Chinner
2010-10-22  1:56   ` Al Viro
2010-10-22  2:26     ` Nick Piggin
2010-10-22  3:14     ` Dave Chinner
2010-10-22 10:37       ` Al Viro
2010-10-22 11:40         ` Christoph Hellwig
2010-10-23 21:40           ` Al Viro
2010-10-23 21:37         ` Al Viro
2010-10-24 14:13           ` Christoph Hellwig
2010-10-24 16:21             ` Christoph Hellwig
2010-10-24 19:17               ` Al Viro
2010-10-24 20:04                 ` Christoph Hellwig
2010-10-24 20:36                   ` Al Viro
2010-10-24  2:18       ` Nick Piggin
2010-10-21  0:49 ` Dave Chinner [this message]
2010-10-21  2:17   ` [PATCH 17/21] fs: protect wake_up_inode with inode->i_lock Christoph Hellwig
2010-10-21 13:16     ` Nick Piggin
2010-10-21  0:49 ` [PATCH 18/21] fs: introduce a per-cpu last_ino allocator Dave Chinner
2010-10-21  0:49 ` [PATCH 19/21] fs: icache remove inode_lock Dave Chinner
2010-10-21  2:14   ` Christian Stroetmann
2010-10-21  0:49 ` [PATCH 20/21] fs: Reduce inode I_FREEING and factor inode disposal Dave Chinner
2010-10-21  0:49 ` [PATCH 21/21] fs: do not assign default i_ino in new_inode Dave Chinner
2010-10-21  5:04 ` Inode Lock Scalability V7 (was V6) Dave Chinner
2010-10-21 13:20   ` Nick Piggin
2010-10-21 23:52     ` Dave Chinner
2010-10-22  0:45       ` Nick Piggin
2010-10-22  2:20         ` Al Viro
2010-10-22  2:34           ` Nick Piggin
2010-10-22  2:41             ` Nick Piggin
2010-10-22  2:48               ` Nick Piggin
2010-10-22  3:12                 ` Al Viro
2010-10-22  4:48                   ` Nick Piggin
2010-10-22  3:07             ` Al Viro
2010-10-22  4:46               ` Nick Piggin
2010-10-22  5:01                 ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1287622186-1935-18-git-send-email-david@fromorbit.com \
    --to=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).