All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH 17/21] fs: protect wake_up_inode with inode->i_lock
Date: Thu, 21 Oct 2010 11:49:42 +1100	[thread overview]
Message-ID: <1287622186-1935-18-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1287622186-1935-1-git-send-email-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

Currently we have an optimisation in place in unlock_new_inode() to
avoid taking the inode_lock. This uses memory barriers to ensure
that the the clearing of I_NEW can be seen by all other CPUs. It
also means the wake_up_inode() call relies on a memory barrier to
ensure that processes it is waking see any changes the i_state field
prior to the wakeup being issued.

This serialisation is also necessary to protect the inode against
lookup/freeing races once the inode_lock is removed. The current
lookup and wait code is atomic as it is all performed under the
inode_lock, while the modified code now splits the locks. Hence we
can get the following race:

Thread 1:				Thread 2:
					iput_final
					spin_lock(&inode->i_lock);
hlist_bl_lock()
  Find inode
    spin_lock(&inode->i_lock)
					......
					inode->i_state = I_FREEING;
					spin_unlock(&inode->i_lock);
					remove_inode_hash()
					   hlist_bl_lock()
    ......
    if (I_FREEING)
	hlist_bl_unlock()
					   ......
					   hlist_bl_del_init()
					   hlist_bl_unlock()
					wake_up_inode()

	__wait_on_freeing_inode()
	  put on waitqueue
	  spin_unlock(&inode->i_lock);
	  schedule()

To avoid this race, wake ups need to be serialised against the
waiters, and using inode->i_lock is the natural solution.

The memory barrier optimisation is no longer needed to avoid the
global inode_lock contention as the inode->i_state field is now
protected by inode->i_lock. Hence we can revert the code to a much
simpler form and correctly serialise wake ups against
__wait_on_freeing_inode() by holding the inode->i_lock while we do
the wakeup.

Given the newfound simplicity of wake_up_inode() and the fact that
we need to change i_state in unlock_new_inode() before the wakeup,
just open code the wakeup in the couple of spots it is used and
remove wake_up_inode() entirely.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/inode.c |   41 ++++++++++++++++++-----------------------
 1 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 32bac58..ba514a1 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -175,15 +175,6 @@ int proc_nr_inodes(ctl_table *table, int write,
 }
 #endif
 
-static void wake_up_inode(struct inode *inode)
-{
-	/*
-	 * Prevent speculative execution through spin_unlock(&inode->i_lock);
-	 */
-	smp_mb();
-	wake_up_bit(&inode->i_state, __I_NEW);
-}
-
 /**
  * inode_init_always - perform inode structure intialisation
  * @sb: superblock inode belongs to
@@ -540,7 +531,9 @@ static void dispose_list(struct list_head *head)
 		__inode_sb_list_del(inode);
 		spin_unlock(&inode_lock);
 
-		wake_up_inode(inode);
+		spin_lock(&inode->i_lock);
+		wake_up_bit(&inode->i_state, __I_NEW);
+		spin_unlock(&inode->i_lock);
 		destroy_inode(inode);
 	}
 }
@@ -862,6 +855,13 @@ struct inode *new_inode(struct super_block *sb)
 }
 EXPORT_SYMBOL(new_inode);
 
+/**
+ * unlock_new_inode - clear the I_NEW state and wake up any waiters
+ * @inode:	new inode to unlock
+ *
+ * Called when the inode is fully initialised to clear the new state of the
+ * inode and wake up anyone waiting for the inode to finish initialisation.
+ */
 void unlock_new_inode(struct inode *inode)
 {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -881,19 +881,11 @@ void unlock_new_inode(struct inode *inode)
 		}
 	}
 #endif
-	/*
-	 * This is special!  We do not need the spinlock when clearing I_NEW,
-	 * because we're guaranteed that nobody else tries to do anything about
-	 * the state of the inode when it is locked, as we just created it (so
-	 * there can be no old holders that haven't tested I_NEW).
-	 * However we must emit the memory barrier so that other CPUs reliably
-	 * see the clearing of I_NEW after the other inode initialisation has
-	 * completed.
-	 */
-	smp_mb();
+	spin_lock(&inode->i_lock);
 	WARN_ON(!(inode->i_state & I_NEW));
 	inode->i_state &= ~I_NEW;
-	wake_up_inode(inode);
+	wake_up_bit(&inode->i_state, __I_NEW);
+	spin_unlock(&inode->i_lock);
 }
 EXPORT_SYMBOL(unlock_new_inode);
 
@@ -1486,8 +1478,10 @@ static void iput_final(struct inode *inode)
 	spin_unlock(&inode_lock);
 	evict(inode);
 	remove_inode_hash(inode);
-	wake_up_inode(inode);
+	spin_lock(&inode->i_lock);
+	wake_up_bit(&inode->i_state, __I_NEW);
 	BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
+	spin_unlock(&inode->i_lock);
 	destroy_inode(inode);
 }
 
@@ -1690,7 +1684,8 @@ EXPORT_SYMBOL(inode_wait);
  * to recheck inode state.
  *
  * It doesn't matter if I_NEW is not set initially, a call to
- * wake_up_inode() after removing from the hash list will DTRT.
+ * wake_up_bit(&inode->i_state, __I_NEW) with the i_lock held after removing
+ * from the hash list will DTRT.
  *
  * This is called with inode_lock held.
  *
-- 
1.7.1


  parent reply	other threads:[~2010-10-21  0:50 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-21  0:49 Inode Lock Scalability V6 Dave Chinner
2010-10-21  0:49 ` [PATCH 01/21] fs: switch bdev inode bdi's correctly Dave Chinner
2010-10-21  0:49 ` [PATCH 02/21] kernel: add bl_list Dave Chinner
2010-10-21  0:49 ` [PATCH 03/21] fs: Convert nr_inodes and nr_unused to per-cpu counters Dave Chinner
2010-10-21  0:49 ` [PATCH 04/21] fs: Implement lazy LRU updates for inodes Dave Chinner
2010-10-21  2:14   ` Christian Stroetmann
2010-10-21 10:07   ` Nick Piggin
2010-10-21 12:22     ` Christoph Hellwig
2010-10-23  9:32   ` Al Viro
2010-10-21  0:49 ` [PATCH 05/21] fs: inode split IO and LRU lists Dave Chinner
2010-10-21  0:49 ` [PATCH 06/21] fs: Clean up inode reference counting Dave Chinner
2010-10-21  1:41   ` Christoph Hellwig
2010-10-21  0:49 ` [PATCH 07/21] exofs: use iput() for inode reference count decrements Dave Chinner
2010-10-21  0:49 ` [PATCH 08/21] fs: rework icount to be a locked variable Dave Chinner
2010-10-21 19:40   ` Al Viro
2010-10-21 22:32     ` Dave Chinner
2010-10-21  0:49 ` [PATCH 09/21] fs: Factor inode hash operations into functions Dave Chinner
2010-10-21  0:49 ` [PATCH 10/21] fs: Stop abusing find_inode_fast in iunique Dave Chinner
2010-10-21  0:49 ` [PATCH 11/21] fs: move i_ref increments into find_inode/find_inode_fast Dave Chinner
2010-10-21  0:49 ` [PATCH 12/21] fs: remove inode_add_to_list/__inode_add_to_list Dave Chinner
2010-10-21  0:49 ` [PATCH 13/21] fs: Introduce per-bucket inode hash locks Dave Chinner
2010-10-21  0:49 ` [PATCH 14/21] fs: add a per-superblock lock for the inode list Dave Chinner
2010-10-21  0:49 ` [PATCH 15/21] fs: split locking of inode writeback and LRU lists Dave Chinner
2010-10-21  0:49 ` [PATCH 16/21] fs: Protect inode->i_state with the inode->i_lock Dave Chinner
2010-10-22  1:56   ` Al Viro
2010-10-22  2:26     ` Nick Piggin
2010-10-22  3:14     ` Dave Chinner
2010-10-22 10:37       ` Al Viro
2010-10-22 11:40         ` Christoph Hellwig
2010-10-23 21:40           ` Al Viro
2010-10-23 21:37         ` Al Viro
2010-10-24 14:13           ` Christoph Hellwig
2010-10-24 16:21             ` Christoph Hellwig
2010-10-24 19:17               ` Al Viro
2010-10-24 20:04                 ` Christoph Hellwig
2010-10-24 20:36                   ` Al Viro
2010-10-24  2:18       ` Nick Piggin
2010-10-21  0:49 ` Dave Chinner [this message]
2010-10-21  2:17   ` [PATCH 17/21] fs: protect wake_up_inode with inode->i_lock Christoph Hellwig
2010-10-21 13:16     ` Nick Piggin
2010-10-21  0:49 ` [PATCH 18/21] fs: introduce a per-cpu last_ino allocator Dave Chinner
2010-10-21  0:49 ` [PATCH 19/21] fs: icache remove inode_lock Dave Chinner
2010-10-21  2:14   ` Christian Stroetmann
2010-10-21  0:49 ` [PATCH 20/21] fs: Reduce inode I_FREEING and factor inode disposal Dave Chinner
2010-10-21  0:49 ` [PATCH 21/21] fs: do not assign default i_ino in new_inode Dave Chinner
2010-10-21  5:04 ` Inode Lock Scalability V7 (was V6) Dave Chinner
2010-10-21 13:20   ` Nick Piggin
2010-10-21 23:52     ` Dave Chinner
2010-10-22  0:45       ` Nick Piggin
2010-10-22  2:20         ` Al Viro
2010-10-22  2:34           ` Nick Piggin
2010-10-22  2:41             ` Nick Piggin
2010-10-22  2:48               ` Nick Piggin
2010-10-22  3:12                 ` Al Viro
2010-10-22  4:48                   ` Nick Piggin
2010-10-22  3:07             ` Al Viro
2010-10-22  4:46               ` Nick Piggin
2010-10-22  5:01                 ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1287622186-1935-18-git-send-email-david@fromorbit.com \
    --to=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.