* [PATCH RFC] __bd_forget should wait for inodes using the mapping
@ 2004-06-18  1:54 Chris Mason
From: Chris Mason @ 2004-06-18  1:54 UTC
  To: akpm, linux-kernel

__bd_forget changes the mapping for filesystem inodes without
first making sure that nothing is still using the block device
address space through that mapping.

In the case of background writeout, it is possible for the block
device inode to be freed while mpage_writepages is still looking
through its mapping for dirty pages.  This is because each device
node in the filesystem has a pointer to the block device address
space, and __bd_forget is used to reset those pointers before the
block device inode is freed.

There is no locking to make sure __bd_forget isn't running
at the same time as __writeback_single_inode is running on the
filesystem device node.
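
To make the interleaving concrete, here is a toy, single-threaded
model of it in plain C.  It is only a sketch under loud assumptions:
toy_mapping, toy_inode and every toy_* function are made up for
illustration and are not kernel APIs, the "locked" flag merely stands
in for I_LOCK, and the checked variant returns false where the patch
below sleeps in __wait_on_inode.

```c
/* Toy, single-threaded model of the race.  All names are hypothetical;
 * none of this is kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mapping {
	bool freed;			/* set once the owning bdev inode is gone */
};

struct toy_inode {
	struct toy_mapping *i_mapping;	/* may point at the bdev mapping */
	struct toy_mapping i_data;	/* the inode's own mapping */
	bool locked;			/* stands in for I_LOCK */
};

/* Writeback step 1: mark the inode busy and sample its mapping pointer. */
static struct toy_mapping *toy_writeback_begin(struct toy_inode *inode)
{
	inode->locked = true;
	return inode->i_mapping;
}

/* Writeback step 2: walk the sampled mapping.  Returns false if the
 * mapping was torn down underneath us (a use-after-free in real life). */
static bool toy_writeback_finish(struct toy_inode *inode,
				 struct toy_mapping *mapping)
{
	bool ok = !mapping->freed;

	inode->locked = false;
	return ok;
}

/* Unpatched behaviour: reset the pointer no matter who is using it. */
static void toy_forget_unsafe(struct toy_inode *inode)
{
	inode->i_mapping = &inode->i_data;
}

/* Patched behaviour, simplified: refuse while writeback holds the inode. */
static bool toy_forget_checked(struct toy_inode *inode)
{
	if (inode->locked)
		return false;
	inode->i_mapping = &inode->i_data;
	return true;
}

/* Replay the bad interleaving; returns true if writeback survived. */
static bool toy_replay(bool use_checked_forget)
{
	struct toy_mapping bdev_mapping = { .freed = false };
	struct toy_inode node = { .i_mapping = &bdev_mapping };
	struct toy_mapping *seen = toy_writeback_begin(&node);

	assert(seen == &bdev_mapping);
	if (use_checked_forget) {
		/* Teardown only proceeds once the forget succeeded. */
		if (toy_forget_checked(&node))
			bdev_mapping.freed = true;
	} else {
		toy_forget_unsafe(&node);
		bdev_mapping.freed = true;	/* bdev inode torn down */
	}
	return toy_writeback_finish(&node, seen);
}
```

In this model toy_replay(false) reports that writeback touched a
torn-down mapping, while toy_replay(true) does not, because the
forget step is held off until writeback has finished.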

Here's an example patch that should fix things.  Andi just found
a race where I wasn't holding onto the filesystem inode correctly,
so this rev got a last-minute fix before I wander off for the night.

It's quite ugly; I'm hoping we can hash out something better.

Index: linux.t/fs/block_dev.c
===================================================================
--- linux.t.orig/fs/block_dev.c	2004-06-17 21:14:08.000000000 -0400
+++ linux.t/fs/block_dev.c	2004-06-17 21:46:46.203782616 -0400
@@ -24,6 +24,7 @@
 #include <linux/uio.h>
 #include <linux/namei.h>
 #include <asm/uaccess.h>
+#include <linux/writeback.h>
 
 struct bdev_inode {
 	struct block_device bdev;
@@ -258,11 +259,31 @@ static void init_once(void * foo, kmem_c
 	}
 }
 
+/* 
+ * we have to make sure that we don't free the block
+ * device inode and mapping while one of the inodes using
+ * it is in background writeback. 
+ *
+ * The lock ordering required elsewhere is bdev_lock->inode_lock.
+ */
 static inline void __bd_forget(struct inode *inode)
 {
+	spin_lock(&inode_lock);
+	__iget(inode);
+	while (inode->i_state & I_LOCK) {
+		spin_unlock(&bdev_lock);
+		spin_unlock(&inode_lock);
+		__wait_on_inode(inode);
+		spin_lock(&bdev_lock);
+		spin_lock(&inode_lock);
+	}
 	list_del_init(&inode->i_devices);
 	inode->i_bdev = NULL;
 	inode->i_mapping = &inode->i_data;
+	spin_unlock(&inode_lock);
+	spin_unlock(&bdev_lock);
+	iput(inode);
+	spin_lock(&bdev_lock);
 }
 
 static void bdev_clear_inode(struct inode *inode)




