All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC] __bd_forget should wait for inodes using the mapping
@ 2004-06-18  1:54 Chris Mason
  2004-06-18  2:01 ` Chris Mason
  2004-06-18  2:10 ` viro
  0 siblings, 2 replies; 16+ messages in thread
From: Chris Mason @ 2004-06-18  1:54 UTC (permalink / raw)
  To: akpm, linux-kernel

__bd_forget will change the mapping for filesystem inodes without 
waiting to make sure no users of the block device address space are 
using that mapping.

In the case of background writeout, it is possible for __bd_forget 
to free the block device inode while mpage_writepages is still 
looking through the mapping for dirty pages.  This is because
each device node in the filesystem has a pointer to the block
device address space, and __bd_forget is used to reset those pointers
before the block device inode is freed.  

There is no locking to make sure __bd_forget isn't running 
at the same time as __writeback_single_inode is run on the 
filesystem device node.

Here's an example patch that should fix things, Andi just found
a race where I wasn't holding onto the filesystem inode correctly,
so this rev got a last minute fix before I wander off for the night.

It's quite ugly, I'm hoping we can hash out something better.

Index: linux.t/fs/block_dev.c
===================================================================
--- linux.t.orig/fs/block_dev.c	2004-06-17 21:14:08.000000000 -0400
+++ linux.t/fs/block_dev.c	2004-06-17 21:46:46.203782616 -0400
@@ -24,6 +24,7 @@
 #include <linux/uio.h>
 #include <linux/namei.h>
 #include <asm/uaccess.h>
+#include <linux/writeback.h>
 
 struct bdev_inode {
 	struct block_device bdev;
@@ -258,11 +259,31 @@ static void init_once(void * foo, kmem_c
 	}
 }
 
+/* 
+ * we have to make sure that we don't free the block
+ * device inode and mapping while one of the inodes using
+ * it is in background writeback. 
+ *
+ * The lock ordering required elsewhere is bdev_lock->inode_lock.
+ */
 static inline void __bd_forget(struct inode *inode)
 {
+	spin_lock(&inode_lock);
+	__iget(inode);
+	while (inode->i_state & I_LOCK) {
+		spin_unlock(&bdev_lock);
+		spin_unlock(&inode_lock);
+		__wait_on_inode(inode);
+		spin_lock(&bdev_lock);
+		spin_lock(&inode_lock);
+	}
 	list_del_init(&inode->i_devices);
 	inode->i_bdev = NULL;
 	inode->i_mapping = &inode->i_data;
+	spin_unlock(&inode_lock);
+	spin_unlock(&bdev_lock);
+	iput(inode);
+	spin_lock(&bdev_lock);
 }
 
 static void bdev_clear_inode(struct inode *inode)



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18  1:54 [PATCH RFC] __bd_forget should wait for inodes using the mapping Chris Mason
@ 2004-06-18  2:01 ` Chris Mason
  2004-06-18  2:10 ` viro
  1 sibling, 0 replies; 16+ messages in thread
From: Chris Mason @ 2004-06-18  2:01 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel

On Thu, 2004-06-17 at 21:54, Chris Mason wrote:

> Here's an example patch that should fix things, Andi just found
> a race where I wasn't holding onto the filesystem inode correctly,
> so this rev got a last minute fix before I wander off for the night.

Ugh, this oopses on boot.  I'll provide a better example patch tomorrow
unless someone comes up with something clean over night ;-)

-chris



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18  1:54 [PATCH RFC] __bd_forget should wait for inodes using the mapping Chris Mason
  2004-06-18  2:01 ` Chris Mason
@ 2004-06-18  2:10 ` viro
  2004-06-18 13:03   ` Chris Mason
  2004-06-18 14:20   ` Chris Mason
  1 sibling, 2 replies; 16+ messages in thread
From: viro @ 2004-06-18  2:10 UTC (permalink / raw)
  To: Chris Mason; +Cc: akpm, linux-kernel

On Thu, Jun 17, 2004 at 09:54:28PM -0400, Chris Mason wrote:
> __bd_forget will change the mapping for filesystem inodes without 
> waiting to make sure no users of the block device address space are 
> using that mapping.

Filesystem block device inodes have no business even looking at their
->i_mapping.  Where do you need to do that?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18  2:10 ` viro
@ 2004-06-18 13:03   ` Chris Mason
  2004-06-18 14:22     ` viro
  2004-06-18 14:20   ` Chris Mason
  1 sibling, 1 reply; 16+ messages in thread
From: Chris Mason @ 2004-06-18 13:03 UTC (permalink / raw)
  To: viro; +Cc: akpm, linux-kernel

On Thu, 2004-06-17 at 22:10, viro@parcelfarce.linux.theplanet.co.uk
wrote:
> On Thu, Jun 17, 2004 at 09:54:28PM -0400, Chris Mason wrote:
> > __bd_forget will change the mapping for filesystem inodes without 
> > waiting to make sure no users of the block device address space are 
> > using that mapping.
> 
> Filesystem block device inodes have no business even looking at their
> ->i_mapping.  Where do you need to do that?

sync_sb_inodes, the filesystem block device inode ends up on some dirty
list, and under memory pressure balance_dirty_pages_ratelimited will
trigger writeback on it.  

There's nothing to write back of course, the real block device address
space has no dirty pages at all.  But, writeback is looking through the
mapping and __bd_forget can't drop it until writeback has finished
checking it.

I've verified this really is happening ;-) The patch I sent is nasty but
I'm sure this is a real bug.

-chris



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18  2:10 ` viro
  2004-06-18 13:03   ` Chris Mason
@ 2004-06-18 14:20   ` Chris Mason
  1 sibling, 0 replies; 16+ messages in thread
From: Chris Mason @ 2004-06-18 14:20 UTC (permalink / raw)
  To: viro; +Cc: akpm, linux-kernel

On Thu, 2004-06-17 at 22:10, viro@parcelfarce.linux.theplanet.co.uk
wrote:
> On Thu, Jun 17, 2004 at 09:54:28PM -0400, Chris Mason wrote:
> > __bd_forget will change the mapping for filesystem inodes without 
> > waiting to make sure no users of the block device address space are 
> > using that mapping.

Well, here's a new patch, I'm confident you'll like it even less. My
fixes so far will trigger iput on inode B during the clear_inode call of
inode A.  Even if this is legal today, it seems doomed to cause bugs
later on.

Another option is to inc the reference count of the block device inode
during background write out.  But, since we're racing with the block
device clear_inode call here, that will need to involve the bdev
spinlock somewhere along the way, and has lock ordering issues since the
block device code already takes inode_lock under bdev_lock.

-chris
---

__bd_forget will change the mapping for an inode without waiting to make
sure no users of the inode are using that mapping.  In the case of
background writeout, it is possible for __bd_forget to free the
block device inode while mpage_writepages is still looking through the
mapping for dirty pages.

Index: linux.t/fs/block_dev.c
===================================================================
--- linux.t.orig/fs/block_dev.c	2004-06-17 21:14:08.000000000 -0400
+++ linux.t/fs/block_dev.c	2004-06-18 09:21:41.000000000 -0400
@@ -24,6 +24,7 @@
 #include <linux/uio.h>
 #include <linux/namei.h>
 #include <asm/uaccess.h>
+#include <linux/writeback.h>
 
 struct bdev_inode {
 	struct block_device bdev;
@@ -258,11 +259,48 @@ static void init_once(void * foo, kmem_c
 	}
 }
 
+/* 
+ * we have to make sure that we don't free the block
+ * device inode and mapping while one of the inodes using
+ * it is in background writeback. 
+ *
+ * The lock ordering required elsewhere is bdev_lock->inode_lock.
+ */
 static inline void __bd_forget(struct inode *inode)
 {
+	int iget_done = 0;
+	spin_lock(&inode_lock);
+
+	/* when the inode is being freed, we could be racing
+	 * with its clear_inode call.  The only thing saving us
+	 * is the bdev_lock, inode->i_bdev is still not null
+	 * and the clear_inode call will need to get the bdev lock
+	 * before it can zero it.
+	 *
+	 * don't __iget, wait on the inode, or iput in this case,
+	 */
+	if (inode->i_state & I_FREEING)
+		goto clear_it;			
+
+	__iget(inode);
+	iget_done = 1;
+	while (inode->i_state & I_LOCK) {
+		spin_unlock(&bdev_lock);
+		spin_unlock(&inode_lock);
+		__wait_on_inode(inode);
+		spin_lock(&bdev_lock);
+		spin_lock(&inode_lock);
+	}
+clear_it:
 	list_del_init(&inode->i_devices);
 	inode->i_bdev = NULL;
 	inode->i_mapping = &inode->i_data;
+	spin_unlock(&inode_lock);
+	if (iget_done) {
+		spin_unlock(&bdev_lock);
+		iput(inode);
+		spin_lock(&bdev_lock);
+	}
 }
 
 static void bdev_clear_inode(struct inode *inode)








^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18 13:03   ` Chris Mason
@ 2004-06-18 14:22     ` viro
  2004-06-18 14:47       ` Chris Mason
  0 siblings, 1 reply; 16+ messages in thread
From: viro @ 2004-06-18 14:22 UTC (permalink / raw)
  To: Chris Mason; +Cc: akpm, linux-kernel

On Fri, Jun 18, 2004 at 09:03:31AM -0400, Chris Mason wrote:
> sync_sb_inodes, the filesystem block device inode ends up on some dirty
> list, and under memory pressure balance_dirty_pages_ratelimited will
> trigger writeback on it.  
> 
> There's nothing to write back of course, the real block device address
> space has no dirty pages at all.  But, writeback is looking through the
> mapping and __bd_forget can't drop it until writeback has finished
> checking it.

So WTF does writeback bother with that?  _That_ is the real bug here -
the only kind of bdev inodes that can have accesses to ->i_mapping
is fs/block_dev.c-created stuff.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18 14:22     ` viro
@ 2004-06-18 14:47       ` Chris Mason
  2004-06-18 15:15         ` viro
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2004-06-18 14:47 UTC (permalink / raw)
  To: viro; +Cc: akpm, linux-kernel

On Fri, 2004-06-18 at 10:22, viro@parcelfarce.linux.theplanet.co.uk
wrote:
> On Fri, Jun 18, 2004 at 09:03:31AM -0400, Chris Mason wrote:
> > sync_sb_inodes, the filesystem block device inode ends up on some dirty
> > list, and under memory pressure balance_dirty_pages_ratelimited will
> > trigger writeback on it.  
> > 
> > There's nothing to write back of course, the real block device address
> > space has no dirty pages at all.  But, writeback is looking through the
> > mapping and __bd_forget can't drop it until writeback has finished
> > checking it.
> 
> So WTF does writeback bother with that?  _That_ is the real bug here -
> the only kind of bdev inodes that can have accesses to ->i_mapping
> is fs/block_dev.c-created stuff.

During writeback, we need to answer the question: "are there dirty pages
attached to this inode", and the only way to answer it is via the
address space.

If bdev inodes don't want other inodes using their address space, they
shouldn't be setting the i_mapping on other inodes.  Since they are, the
bdev code needs to be aware that someone else might be using it.

-chris




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18 14:47       ` Chris Mason
@ 2004-06-18 15:15         ` viro
  2004-06-18 15:41           ` Chris Mason
  0 siblings, 1 reply; 16+ messages in thread
From: viro @ 2004-06-18 15:15 UTC (permalink / raw)
  To: Chris Mason; +Cc: akpm, linux-kernel

On Fri, Jun 18, 2004 at 10:47:11AM -0400, Chris Mason wrote:
> During writeback, we need to answer the question: "are there dirty pages
> attached to this inode", and the only way to answer it is via the
> address space.
> 
> If bdev inodes don't want other inodes using their address space, they
> shouldn't be setting the i_mapping on other inodes.  Since they are, the
> bdev code needs to be aware that someone else might be using it.

Scheduled for 2.7.1; for now users of ->i_mapping (the fewer of them remain,
the better) have to be aware of bdev.

And yes, ->i_mapping flips on "normal" bdev inodes will go away - we set
->f_mapping on open directly.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18 15:15         ` viro
@ 2004-06-18 15:41           ` Chris Mason
  2004-06-18 15:43             ` viro
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2004-06-18 15:41 UTC (permalink / raw)
  To: viro; +Cc: akpm, linux-kernel

On Fri, 2004-06-18 at 11:15, viro@parcelfarce.linux.theplanet.co.uk
wrote:
> On Fri, Jun 18, 2004 at 10:47:11AM -0400, Chris Mason wrote:
> > During writeback, we need to answer the question: "are there dirty pages
> > attached to this inode", and the only way to answer it is via the
> > address space.
> > 
> > If bdev inodes don't want other inodes using their address space, they
> > shouldn't be setting the i_mapping on other inodes.  Since they are, the
> > bdev code needs to be aware that someone else might be using it.
> 
> Scheduled for 2.7.1; for now users of ->i_mapping (the fewer of them remain,
> the better) have to be aware of bdev.
> 
> And yes, ->i_mapping flips on "normal" bdev inodes will go away - we set
> ->f_mapping on open directly.

Fair enough, I'll cook up some code to bump the inode->bdev->bd_inode
i_count in __sync_single_inode  It won't be pretty either though, I'll
have to drop the inode_lock so that some function can take the bdev_lock
to safely use inode->i_bdev.

-chris



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18 15:41           ` Chris Mason
@ 2004-06-18 15:43             ` viro
  2004-06-18 16:05               ` Chris Mason
  0 siblings, 1 reply; 16+ messages in thread
From: viro @ 2004-06-18 15:43 UTC (permalink / raw)
  To: Chris Mason; +Cc: akpm, linux-kernel

On Fri, Jun 18, 2004 at 11:41:43AM -0400, Chris Mason wrote:
> > And yes, ->i_mapping flips on "normal" bdev inodes will go away - we set
> > ->f_mapping on open directly.
> 
> Fair enough, I'll cook up some code to bump the inode->bdev->bd_inode
> i_count in __sync_single_inode  It won't be pretty either though, I'll
> have to drop the inode_lock so that some function can take the bdev_lock
> to safely use inode->i_bdev.

*Ugh*

You do realize that ->i_bdev is not promised to be there either?  Could you
show the actual code that steps into this mess?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18 15:43             ` viro
@ 2004-06-18 16:05               ` Chris Mason
  2004-06-18 20:26                 ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2004-06-18 16:05 UTC (permalink / raw)
  To: viro; +Cc: akpm, linux-kernel

On Fri, 2004-06-18 at 11:43, viro@parcelfarce.linux.theplanet.co.uk
wrote:
> On Fri, Jun 18, 2004 at 11:41:43AM -0400, Chris Mason wrote:
> > > And yes, ->i_mapping flips on "normal" bdev inodes will go away - we set
> > > ->f_mapping on open directly.
> > 
> > Fair enough, I'll cook up some code to bump the inode->bdev->bd_inode
> > i_count in __sync_single_inode  It won't be pretty either though, I'll
> > have to drop the inode_lock so that some function can take the bdev_lock
> > to safely use inode->i_bdev.
> 
> *Ugh*
> 
> You do realize that ->i_bdev is not promised to be there either?  Could you
> show the actual code that steps into this mess?
> 
Grin, it won't be pretty, i_bdev can't be trusted without the bdev lock,
and I'll need to check for I_FREEING on it to make sure it isn't in
clear_inode.

The sequence leading up to all of this looks something like this:

CPU 0:					CPU 1: 

pdflush finds				umount /dev/sda1
FS inode for
/dev/sda1 in dirty list,
makes it's way down
to __sync_single_inode

mapping = inode->i_mapping
(this points bdev address
 space)
					kill_block_super
					close_bdev_excl
					blkdev_put
					bdput(bdev)
					iput(bdev->bd_inode)
					...
					clear_inode(bdev inode)
					bdev_clear_inode
					__bd_forget(inode for /dev/sda1)
					...
					destroy_inode(bdev inode)
...
do_writepages(mapping, wbc)

Since mapping on CPU0 still points to the bdev address space, and that
has been freed by the destroy inode, we run into problems.  

Maybe the real bug is the FS inode should never have ended up in the
dirty list.  This should all work fine if the bdev inode were the only
one to ever hit a dirty list.

-chris



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18 16:05               ` Chris Mason
@ 2004-06-18 20:26                 ` Andrew Morton
  2004-06-18 20:44                   ` Chris Mason
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2004-06-18 20:26 UTC (permalink / raw)
  To: Chris Mason; +Cc: viro, linux-kernel

Chris Mason <mason@suse.com> wrote:
>
> Maybe the real bug is the FS inode should never have ended up in the
> dirty list.

It'd be interesting to find out where and why it is being dirtied (atime?),
but even if we can prevent that from happening, people can still do things
like chmod on it, so we're back in the same situation.

There's tight coupling between writing back the inode and writing back its
pages, and at times it has caused problems.  It's not clear _why_ there
should be such a coupling but it's never been a sufficient problem to
justify ripping it all up.

>  This should all work fine if the bdev inode were the only
> one to ever hit a dirty list.

Something like this?

--- 25/fs/fs-writeback.c~dont-writeback-fd-bdev-inodes	Fri Jun 18 12:54:08 2004
+++ 25-akpm/fs/fs-writeback.c	Fri Jun 18 13:25:16 2004
@@ -156,7 +156,8 @@ __sync_single_inode(struct inode *inode,
 	struct address_space *mapping = inode->i_mapping;
 	struct super_block *sb = inode->i_sb;
 	int wait = wbc->sync_mode == WB_SYNC_ALL;
-	int ret;
+	int is_fs_bdev;		/* Is a block-special node */
+	int ret = 0;
 
 	BUG_ON(inode->i_state & I_LOCK);
 
@@ -167,13 +168,23 @@ __sync_single_inode(struct inode *inode,
 
 	spin_unlock(&inode_lock);
 
-	ret = do_writepages(mapping, wbc);
+	/*
+	 * blockdev address_spaces are always written back via their internal
+	 * inodes, not via their /dev/hdXX inodes, so use is_fs_bdev to skip
+	 * them here.  We still need to write back the inode itself.
+	 * And we cannot touch ->i_mapping of /dev/hdXX inodes at all, because
+	 * umount can change their ->i_mapping.
+	 */
+	is_fs_bdev = S_ISBLK(inode->i_mode) && (sb != blockdev_superblock);
+
+	if (!is_fs_bdev)
+		ret = do_writepages(mapping, wbc);
 
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC))
 		write_inode(inode, wait);
 
-	if (wait) {
+	if (wait && !is_fs_bdev) {
 		int err = filemap_fdatawait(mapping);
 		if (ret == 0)
 			ret = err;
@@ -182,7 +193,7 @@ __sync_single_inode(struct inode *inode,
 	spin_lock(&inode_lock);
 	inode->i_state &= ~I_LOCK;
 	if (!(inode->i_state & I_FREEING)) {
-		if (!(inode->i_state & I_DIRTY) &&
+		if (!(inode->i_state & I_DIRTY) && !is_fs_bdev &&
 		    mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
 			/*
 			 * We didn't write back all the pages.  nfs_writepages()
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18 20:26                 ` Andrew Morton
@ 2004-06-18 20:44                   ` Chris Mason
  2004-06-18 21:27                     ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2004-06-18 20:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: viro, linux-kernel

On Fri, 2004-06-18 at 16:26, Andrew Morton wrote:
> Chris Mason <mason@suse.com> wrote:
> >
> > Maybe the real bug is the FS inode should never have ended up in the
> > dirty list.
> 
> It'd be interesting to find out where and why it is being dirtied (atime?),
> but even if we can prevent that from happening, people can still do things
> like chmod on it, so we're back in the same situation.
> 
> There's tight coupling between writing back the inode and writing back its
> pages, and at times it has caused problems.  It's not clear _why_ there
> should be such a coupling but it's never been a sufficient problem to
> justify ripping it all up.
> 
> >  This should all work fine if the bdev inode were the only
> > one to ever hit a dirty list.
> 
> Something like this?

[ skip writing block-special inodes ]

Hmmm, any risk in missing data integrity syncs because of this?

-chris



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18 20:44                   ` Chris Mason
@ 2004-06-18 21:27                     ` Andrew Morton
  2004-06-18 23:15                       ` Chris Mason
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2004-06-18 21:27 UTC (permalink / raw)
  To: Chris Mason; +Cc: viro, linux-kernel

Chris Mason <mason@suse.com> wrote:
>
> [ skip writing block-special inodes ]
> 
> Hmmm, any risk in missing data integrity syncs because of this?

Need to think about that.  sys_fsync(), sys_fdatasync() and sys_msync() go
direct to file->f_mapping and sys_sync() will sync the blockdev via its
kernel-internal inode.  What does that leave?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18 21:27                     ` Andrew Morton
@ 2004-06-18 23:15                       ` Chris Mason
  2004-06-18 23:25                         ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2004-06-18 23:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: viro, linux-kernel

On Fri, 2004-06-18 at 17:27, Andrew Morton wrote:
> Chris Mason <mason@suse.com> wrote:
> >
> > [ skip writing block-special inodes ]
> > 
> > Hmmm, any risk in missing data integrity syncs because of this?
> 
> Need to think about that.  sys_fsync(), sys_fdatasync() and sys_msync() go
> direct to file->f_mapping and sys_sync() will sync the blockdev via its
> kernel-internal inode.  What does that leave?

I was worried about O_SYNC, That actually looks safe though,
generic_osync_inode will first write the mapping via filemap_fdatawrite
(the mapping comes from f_mapping).

It doesn't really give me warm fuzzies, but looks safe enough.  Al had a
slightly different plan, maybe with your patch we can push his larger
changes off a bit?

-chris



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] __bd_forget should wait for inodes using the mapping
  2004-06-18 23:15                       ` Chris Mason
@ 2004-06-18 23:25                         ` Andrew Morton
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2004-06-18 23:25 UTC (permalink / raw)
  To: Chris Mason; +Cc: viro, linux-kernel

Chris Mason <mason@suse.com> wrote:
>
> On Fri, 2004-06-18 at 17:27, Andrew Morton wrote:
> > Chris Mason <mason@suse.com> wrote:
> > >
> > > [ skip writing block-special inodes ]
> > > 
> > > Hmmm, any risk in missing data integrity syncs because of this?
> > 
> > Need to think about that.  sys_fsync(), sys_fdatasync() and sys_msync() go
> > direct to file->f_mapping and sys_sync() will sync the blockdev via its
> > kernel-internal inode.  What does that leave?
> 
> I was worried about O_SYNC, That actually looks safe though,
> generic_osync_inode will first write the mapping via filemap_fdatawrite
> (the mapping comes from f_mapping).
> 
> It doesn't really give me warm fuzzies, but looks safe enough.  Al had a
> slightly different plan, maybe with your patch we can push his larger
> changes off a bit?

>From a design POV the patch I sent isn't very nice, and does add code to a
warmpath.  If there's some way in which we can defer the i_mapping switch
until all references have gone away, that would be better?


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2004-06-18 23:37 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-06-18  1:54 [PATCH RFC] __bd_forget should wait for inodes using the mapping Chris Mason
2004-06-18  2:01 ` Chris Mason
2004-06-18  2:10 ` viro
2004-06-18 13:03   ` Chris Mason
2004-06-18 14:22     ` viro
2004-06-18 14:47       ` Chris Mason
2004-06-18 15:15         ` viro
2004-06-18 15:41           ` Chris Mason
2004-06-18 15:43             ` viro
2004-06-18 16:05               ` Chris Mason
2004-06-18 20:26                 ` Andrew Morton
2004-06-18 20:44                   ` Chris Mason
2004-06-18 21:27                     ` Andrew Morton
2004-06-18 23:15                       ` Chris Mason
2004-06-18 23:25                         ` Andrew Morton
2004-06-18 14:20   ` Chris Mason

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.