public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: call mark_inode_dirty when i_size is updated
@ 2009-02-02 11:00 Hisashi Hifumi
  2009-02-02 14:12 ` Chris Mason
  0 siblings, 1 reply; 6+ messages in thread
From: Hisashi Hifumi @ 2009-02-02 11:00 UTC (permalink / raw)
  To: chris.mason, linux-btrfs

Hi Chris.

I think it is needed to call mark_inode_dirty() when file size expands
in order to flush metadata updates to HDD through sync() syscall or
background_writeout().

Thanks.

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>

diff -Nrup linux-2.6.29-rc3.org/fs/btrfs/file.c linux-2.6.29-rc3/fs/btrfs/file.c
--- linux-2.6.29-rc3.org/fs/btrfs/file.c	2009-02-02 16:53:18.000000000 +0900
+++ linux-2.6.29-rc3/fs/btrfs/file.c	2009-02-02 18:21:00.000000000 +0900
@@ -152,7 +152,7 @@ static noinline int dirty_and_release_pa
 	}
 	if (end_pos > isize) {
 		i_size_write(inode, end_pos);
-		btrfs_update_inode(trans, root, inode);
+		mark_inode_dirty(inode);
 	}
 	err = btrfs_end_transaction(trans, root);
 out_unlock:


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: call mark_inode_dirty when i_size is updated
  2009-02-02 11:00 [PATCH] btrfs: call mark_inode_dirty when i_size is updated Hisashi Hifumi
@ 2009-02-02 14:12 ` Chris Mason
  2009-02-03  0:36   ` Hisashi Hifumi
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Mason @ 2009-02-02 14:12 UTC (permalink / raw)
  To: Hisashi Hifumi; +Cc: linux-btrfs

On Mon, 2009-02-02 at 20:00 +0900, Hisashi Hifumi wrote:
> Hi Chris.
> 
> I think it is needed to call mark_inode_dirty() when file size expands
> in order to flush metadata updates to HDD through sync() syscall or
> background_writeout().
> 

Thanks for reading through this code and sending the patch.

I find the I_DIRTY flags one of the more confusing parts of the generic
fs writeback cdoe.  But, I think what happens is the
btrfs_set_page_dirty function calls __set_page_dirty_nobuffers() which
does:

if (mapping->host) {
     /* !PageAnon && !swapper_space */
      __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
}

This should be enough to make sure the btrfs inodes are processed by
background writeout and sync().  Please let me know if I'm misreading
things.

-chris



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: call mark_inode_dirty when i_size is updated
  2009-02-02 14:12 ` Chris Mason
@ 2009-02-03  0:36   ` Hisashi Hifumi
  2009-02-03  1:04     ` Chris Mason
  0 siblings, 1 reply; 6+ messages in thread
From: Hisashi Hifumi @ 2009-02-03  0:36 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs


At 23:12 09/02/02, Chris Mason wrote:
>On Mon, 2009-02-02 at 20:00 +0900, Hisashi Hifumi wrote:
>> Hi Chris.
>> 
>> I think it is needed to call mark_inode_dirty() when file size expands
>> in order to flush metadata updates to HDD through sync() syscall or
>> background_writeout().
>> 
>
>Thanks for reading through this code and sending the patch.
>
>I find the I_DIRTY flags one of the more confusing parts of the generic
>fs writeback cdoe.  But, I think what happens is the
>btrfs_set_page_dirty function calls __set_page_dirty_nobuffers() which
>does:
>
>if (mapping->host) {
>     /* !PageAnon && !swapper_space */
>      __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>}
>
>This should be enough to make sure the btrfs inodes are processed by
>background writeout and sync().  Please let me know if I'm misreading
>things.

Surely, as you pointed out, btrfs_set_page_dirty calls
if (mapping->host) {
     /* !PageAnon && !swapper_space */
      __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
}
through _set_page_dirty_nobuffers.
But I_DIRTY_PAGES is not sufficient.
To flush metadata update to HDD through sync(), I_DIRTY_SYNC or 
I_DIRTY_DATASYNC flag is needed. see __sync_single_inode.

Thanks.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: call mark_inode_dirty when i_size is updated
  2009-02-03  0:36   ` Hisashi Hifumi
@ 2009-02-03  1:04     ` Chris Mason
  2009-02-03  2:43       ` Hisashi Hifumi
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Mason @ 2009-02-03  1:04 UTC (permalink / raw)
  To: Hisashi Hifumi; +Cc: linux-btrfs

On Tue, 2009-02-03 at 09:36 +0900, Hisashi Hifumi wrote:
> At 23:12 09/02/02, Chris Mason wrote:
> >On Mon, 2009-02-02 at 20:00 +0900, Hisashi Hifumi wrote:
> >> Hi Chris.
> >> 
> >> I think it is needed to call mark_inode_dirty() when file size expands
> >> in order to flush metadata updates to HDD through sync() syscall or
> >> background_writeout().
> >> 
> >
> >Thanks for reading through this code and sending the patch.
> >
> >I find the I_DIRTY flags one of the more confusing parts of the generic
> >fs writeback cdoe.  But, I think what happens is the
> >btrfs_set_page_dirty function calls __set_page_dirty_nobuffers() which
> >does:
> >
> >if (mapping->host) {
> >     /* !PageAnon && !swapper_space */
> >      __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> >}
> >
> >This should be enough to make sure the btrfs inodes are processed by
> >background writeout and sync().  Please let me know if I'm misreading
> >things.
> 
> Surely, as you pointed out, btrfs_set_page_dirty calls
> if (mapping->host) {
>      /* !PageAnon && !swapper_space */
>       __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> }
> through _set_page_dirty_nobuffers.
> But I_DIRTY_PAGES is not sufficient.
> To flush metadata update to HDD through sync(), I_DIRTY_SYNC or 
> I_DIRTY_DATASYNC flag is needed. see __sync_single_inode.

Since btrfs uses a dirty_inode callback, our inodes are never really
dirty. The btree metadata always has the same information as the in-core
inode does.

The extra transaction commit steps taken at sync time are enough to get
all the relevant metadata on disk.

So, I think what happens is that I_DIRTY_PAGES is enough to get the data
pages on disk and the transaction commit gets the metadata on disk.

-chris



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: call mark_inode_dirty when i_size is updated
  2009-02-03  1:04     ` Chris Mason
@ 2009-02-03  2:43       ` Hisashi Hifumi
  2009-02-03 16:22         ` Chris Mason
  0 siblings, 1 reply; 6+ messages in thread
From: Hisashi Hifumi @ 2009-02-03  2:43 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs


At 10:04 09/02/03, Chris Mason wrote:
>On Tue, 2009-02-03 at 09:36 +0900, Hisashi Hifumi wrote:
>> At 23:12 09/02/02, Chris Mason wrote:
>> >On Mon, 2009-02-02 at 20:00 +0900, Hisashi Hifumi wrote:
>> >> Hi Chris.
>> >> 
>> >> I think it is needed to call mark_inode_dirty() when file size expands
>> >> in order to flush metadata updates to HDD through sync() syscall or
>> >> background_writeout().
>> >> 
>> >
>> >Thanks for reading through this code and sending the patch.
>> >
>> >I find the I_DIRTY flags one of the more confusing parts of the generic
>> >fs writeback cdoe.  But, I think what happens is the
>> >btrfs_set_page_dirty function calls __set_page_dirty_nobuffers() which
>> >does:
>> >
>> >if (mapping->host) {
>> >     /* !PageAnon && !swapper_space */
>> >      __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>> >}
>> >
>> >This should be enough to make sure the btrfs inodes are processed by
>> >background writeout and sync().  Please let me know if I'm misreading
>> >things.
>> 
>> Surely, as you pointed out, btrfs_set_page_dirty calls
>> if (mapping->host) {
>>      /* !PageAnon && !swapper_space */
>>       __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>> }
>> through _set_page_dirty_nobuffers.
>> But I_DIRTY_PAGES is not sufficient.
>> To flush metadata update to HDD through sync(), I_DIRTY_SYNC or 
>> I_DIRTY_DATASYNC flag is needed. see __sync_single_inode.
>
>Since btrfs uses a dirty_inode callback, our inodes are never really
>dirty. The btree metadata always has the same information as the in-core
>inode does.
>
>The extra transaction commit steps taken at sync time are enough to get
>all the relevant metadata on disk.
>
>So, I think what happens is that I_DIRTY_PAGES is enough to get the data
>pages on disk and the transaction commit gets the metadata on disk.


metadata update transaction is made through dirty_inode call back, but
to run dirty_inode callback I_DIRTY_SYNC or  I_DIRTY_DATASYNC flag is needed.
(see __mark_inode_dirty).

Also, to commit transaction to disk through write_inode callback, I_DIRTY_SYNC 
or  I_DIRTY_DATASYNC flag is needed.(see _sync_single_inode)

So I think my patch fixes this issue.




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: call mark_inode_dirty when i_size is updated
  2009-02-03  2:43       ` Hisashi Hifumi
@ 2009-02-03 16:22         ` Chris Mason
  0 siblings, 0 replies; 6+ messages in thread
From: Chris Mason @ 2009-02-03 16:22 UTC (permalink / raw)
  To: Hisashi Hifumi; +Cc: linux-btrfs

On Tue, 2009-02-03 at 11:43 +0900, Hisashi Hifumi wrote:
> At 10:04 09/02/03, Chris Mason wrote:
> >On Tue, 2009-02-03 at 09:36 +0900, Hisashi Hifumi wrote:
> >> At 23:12 09/02/02, Chris Mason wrote:
> >> >On Mon, 2009-02-02 at 20:00 +0900, Hisashi Hifumi wrote:
> >> >> Hi Chris.
> >> >> 
> >> >> I think it is needed to call mark_inode_dirty() when file size expands
> >> >> in order to flush metadata updates to HDD through sync() syscall or
> >> >> background_writeout().
> >> >> 
> >> >
> >> >Thanks for reading through this code and sending the patch.
> >> >
> >> >I find the I_DIRTY flags one of the more confusing parts of the generic
> >> >fs writeback cdoe.  But, I think what happens is the
> >> >btrfs_set_page_dirty function calls __set_page_dirty_nobuffers() which
> >> >does:
> >> >
> >> >if (mapping->host) {
> >> >     /* !PageAnon && !swapper_space */
> >> >      __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> >> >}
> >> >
> >> >This should be enough to make sure the btrfs inodes are processed by
> >> >background writeout and sync().  Please let me know if I'm misreading
> >> >things.
> >> 
> >> Surely, as you pointed out, btrfs_set_page_dirty calls
> >> if (mapping->host) {
> >>      /* !PageAnon && !swapper_space */
> >>       __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> >> }
> >> through _set_page_dirty_nobuffers.
> >> But I_DIRTY_PAGES is not sufficient.
> >> To flush metadata update to HDD through sync(), I_DIRTY_SYNC or 
> >> I_DIRTY_DATASYNC flag is needed. see __sync_single_inode.
> >
> >Since btrfs uses a dirty_inode callback, our inodes are never really
> >dirty. The btree metadata always has the same information as the in-core
> >inode does.
> >
> >The extra transaction commit steps taken at sync time are enough to get
> >all the relevant metadata on disk.
> >
> >So, I think what happens is that I_DIRTY_PAGES is enough to get the data
> >pages on disk and the transaction commit gets the metadata on disk.
> 
> 
> metadata update transaction is made through dirty_inode call back, but
> to run dirty_inode callback I_DIRTY_SYNC or  I_DIRTY_DATASYNC flag is needed.
> (see __mark_inode_dirty).

Btrfs is using btrfs_update_inode to get inode updates into the btree
when it changes things.  The only places that trigger the dirty_inode
callback are the generic code that do atime/mtime/ctime updates and a
few other generic VFS operations.

>
> Also, to commit transaction to disk through write_inode callback, I_DIRTY_SYNC 
> or  I_DIRTY_DATASYNC flag is needed.(see _sync_single_inode)
> 
> So I think my patch fixes this issue.

I think I don't fully understand which caller of write_inode (or
__sync_single_inode) we're missing.

For O_DIRECT and O_SYNC, btrfs_file_write is responsible for getting the
metadata on disk.

For sync(), the btrfs super operations do a full transaction commit.

In all of the above cases, the btree is always up to date with the
contents of the vfs inode.  We just need to make sure a transaction
commit (or log operation) is done for the file.

-chris



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-02-03 16:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-02 11:00 [PATCH] btrfs: call mark_inode_dirty when i_size is updated Hisashi Hifumi
2009-02-02 14:12 ` Chris Mason
2009-02-03  0:36   ` Hisashi Hifumi
2009-02-03  1:04     ` Chris Mason
2009-02-03  2:43       ` Hisashi Hifumi
2009-02-03 16:22         ` Chris Mason

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox