* BUG with delayed allocation @ 2008-03-19 8:52 Aneesh Kumar K.V 2008-03-19 14:42 ` Eric Sandeen 2008-03-20 0:46 ` Mingming Cao 0 siblings, 2 replies; 9+ messages in thread From: Aneesh Kumar K.V @ 2008-03-19 8:52 UTC (permalink / raw) To: Mingming Cao, Eric Sandeen; +Cc: ext4 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: BUG with delayed allocation 2008-03-19 8:52 BUG with delayed allocation Aneesh Kumar K.V @ 2008-03-19 14:42 ` Eric Sandeen 2008-03-19 18:32 ` Aneesh Kumar K.V 2008-03-20 0:46 ` Mingming Cao 1 sibling, 1 reply; 9+ messages in thread From: Eric Sandeen @ 2008-03-19 14:42 UTC (permalink / raw) To: Aneesh Kumar K.V; +Cc: Mingming Cao, ext4 Aneesh Kumar K.V wrote: > Hi, > > Eric actually observed it yesterday. I am able to reproduce it locally. > With delayed allocation we are observing wrong value of i_size. > > kenrel: 2.6.25-rc5 + > With all the patches in the patch queue ... I also see this with only: patches/delalloc-vfs.patch patches/delalloc-ext4.patch applied. -Eric ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: BUG with delayed allocation 2008-03-19 14:42 ` Eric Sandeen @ 2008-03-19 18:32 ` Aneesh Kumar K.V 2008-03-20 4:21 ` Aneesh Kumar K.V 0 siblings, 1 reply; 9+ messages in thread From: Aneesh Kumar K.V @ 2008-03-19 18:32 UTC (permalink / raw) To: Eric Sandeen; +Cc: Mingming Cao, ext4 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: BUG with delayed allocation 2008-03-19 18:32 ` Aneesh Kumar K.V @ 2008-03-20 4:21 ` Aneesh Kumar K.V 0 siblings, 0 replies; 9+ messages in thread From: Aneesh Kumar K.V @ 2008-03-20 4:21 UTC (permalink / raw) To: ext4; +Cc: Mingming Cao, Eric Sandeen This mail didn't seem to reach the list. -aneesh On Thu, Mar 20, 2008 at 12:02:55AM +0530, Aneesh Kumar K.V wrote: > On Wed, Mar 19, 2008 at 09:42:13AM -0500, Eric Sandeen wrote: > > Aneesh Kumar K.V wrote: > > > Hi, > > > > > > Eric actually observed it yesterday. I am able to reproduce it locally. > > > With delayed allocation we are observing wrong value of i_size. > > > > > > kenrel: 2.6.25-rc5 + > > > With all the patches in the patch queue > > > > ... > > > > I also see this with only: > > > > patches/delalloc-vfs.patch > > patches/delalloc-ext4.patch > > > > applied. > > This one seems to work for me. Need to think a bit more about parallel > truncate. > > -aneesh > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 79930df..b74426d 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1488,6 +1488,43 @@ out: > > return ret; > } > +static int ext4_da_writepage(struct page *page, > + struct writeback_control *wbc) > +{ > + struct inode *inode = page->mapping->host; > + handle_t *handle = NULL; > + int ret = 0; > + int err; > + > + if (ext4_journal_current_handle()) > + goto out_fail; > + > + handle = ext4_journal_start(inode, ext4_writepage_trans_blocks(inode)); > + if (IS_ERR(handle)) { > + ret = PTR_ERR(handle); > + goto out_fail; > + } > + > + if (test_opt(inode->i_sb, NOBH) && ext4_should_writeback_data(inode)) > + ret = nobh_writepage(page, ext4_get_block, wbc); > + else > + ret = block_write_full_page(page, ext4_get_block, wbc); > + > + if (!ret && inode->i_size > EXT4_I(inode)->i_disksize) { > + EXT4_I(inode)->i_disksize = inode->i_size; > + ext4_mark_inode_dirty(handle, inode); > + } > + > + err = ext4_journal_stop(handle); > + if (!ret) > + ret = err; > + return ret; > + > +out_fail: > + redirty_page_for_writepage(wbc, page); > + unlock_page(page); > + return ret; > +} > > static int ext4_da_writepages(struct address_space *mapping, > struct writeback_control *wbc) > @@ -2015,7 +2052,7 @@ static const struct address_space_operations ext4_journalled_aops = { > static const struct address_space_operations ext4_da_aops = { > .readpage = ext4_readpage, > .readpages = ext4_readpages, > - .writepage = ext4_writeback_writepage, > + .writepage = ext4_da_writepage, > .writepages = ext4_da_writepages, > .sync_page = block_sync_page, > .write_begin = ext4_da_write_begin, ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: BUG with delayed allocation 2008-03-19 8:52 BUG with delayed allocation Aneesh Kumar K.V 2008-03-19 14:42 ` Eric Sandeen @ 2008-03-20 0:46 ` Mingming Cao 2008-03-20 5:39 ` Aneesh Kumar K.V 1 sibling, 1 reply; 9+ messages in thread From: Mingming Cao @ 2008-03-20 0:46 UTC (permalink / raw) To: Aneesh Kumar K.V; +Cc: Eric Sandeen, ext4 On Wed, 2008-03-19 at 14:22 +0530, Aneesh Kumar K.V wrote: > Hi, > > Eric actually observed it yesterday. I am able to reproduce it locally. > With delayed allocation we are observing wrong value of i_size. > The problem is current delalloc does not update on-disk i_size until writeout time, the in-core i_size is updated though. Could you try the following patch? It updates the i_disksize at the write_end time. Signed-off-by: Mingming Cao <cmm@us.ibm.com> --- fs/ext4/inode.c | 39 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) Index: linux-2.6.25-rc5/fs/ext4/inode.c =================================================================== --- linux-2.6.25-rc5.orig/fs/ext4/inode.c 2008-03-19 17:32:44.000000000 -0700 +++ linux-2.6.25-rc5/fs/ext4/inode.c 2008-03-19 17:43:19.000000000 -0700 @@ -1355,6 +1355,43 @@ static int ext4_writeback_write_end(stru return ret ? ret : copied; } +static int ext4_da_write_end(struct file *file, + struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct page *page, void *fsdata) +{ + handle_t *handle; + struct inode *inode = file->f_mapping->host; + int needed_blocks = ext4_writepage_trans_blocks(inode); + int ret = 0, ret2; + loff_t new_i_size; + + handle = ext4_journal_start(inode, needed_blocks); + if (IS_ERR(handle)) { + unlock_page(page); + page_cache_release(page); + ret = PTR_ERR(handle); + return ret; + } + + new_i_size = pos + copied; + if (new_i_size > EXT4_I(inode)->i_disksize) + EXT4_I(inode)->i_disksize = new_i_size; + + copied = ext4_generic_write_end(file, mapping, pos, len, copied, + page, fsdata); + if (copied < 0) + ret = copied; + + ret2 = ext4_journal_stop(handle); + if (!ret) + ret = ret2; + unlock_page(page); + page_cache_release(page); + + return ret ? ret : copied; +} + static int ext4_journalled_write_end(struct file *file, struct address_space *mapping, loff_t pos, unsigned len, unsigned copied, @@ -2020,7 +2057,7 @@ static const struct address_space_operat .writepages = ext4_da_writepages, .sync_page = block_sync_page, .write_begin = ext4_da_write_begin, - .write_end = generic_write_end, + .write_end = ext4_da_write_end, .bmap = ext4_bmap, .invalidatepage = ext4_da_invalidatepage, .releasepage = ext4_releasepage, ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: BUG with delayed allocation 2008-03-20 0:46 ` Mingming Cao @ 2008-03-20 5:39 ` Aneesh Kumar K.V 2008-03-20 17:29 ` Mingming Cao 0 siblings, 1 reply; 9+ messages in thread From: Aneesh Kumar K.V @ 2008-03-20 5:39 UTC (permalink / raw) To: Mingming Cao; +Cc: Eric Sandeen, ext4 On Wed, Mar 19, 2008 at 05:46:58PM -0700, Mingming Cao wrote: > On Wed, 2008-03-19 at 14:22 +0530, Aneesh Kumar K.V wrote: > > Hi, > > > > Eric actually observed it yesterday. I am able to reproduce it locally. > > With delayed allocation we are observing wrong value of i_size. > > > > The problem is current delalloc does not update on-disk i_size until > writeout time, the in-core i_size is updated though. > ext4_da_writepages actually update the i_disksize during writeout time. The writepage callback for delalloc was using ext4_writeback_writepage, which didn't update the i_disksize. That is why some of the files have correct size while some doesn't. I tested the below change and sent this to list. But i appears vger dropped the mail to the list. diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 79930df..b74426d 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1488,6 +1488,43 @@ out: return ret; } +static int ext4_da_writepage(struct page *page, + struct writeback_control *wbc) +{ + struct inode *inode = page->mapping->host; + handle_t *handle = NULL; + int ret = 0; + int err; + + if (ext4_journal_current_handle()) + goto out_fail; + + handle = ext4_journal_start(inode, ext4_writepage_trans_blocks(inode)); + if (IS_ERR(handle)) { + ret = PTR_ERR(handle); + goto out_fail; + } + + if (test_opt(inode->i_sb, NOBH) && ext4_should_writeback_data(inode)) + ret = nobh_writepage(page, ext4_get_block, wbc); + else + ret = block_write_full_page(page, ext4_get_block, wbc); + + if (!ret && inode->i_size > EXT4_I(inode)->i_disksize) { + EXT4_I(inode)->i_disksize = inode->i_size; + ext4_mark_inode_dirty(handle, inode); + } + + err = ext4_journal_stop(handle); + if (!ret) + ret = err; + return ret; + +out_fail: + redirty_page_for_writepage(wbc, page); + unlock_page(page); + return ret; +} static int ext4_da_writepages(struct address_space *mapping, struct writeback_control *wbc) @@ -2015,7 +2052,7 @@ static const struct address_space_operations ext4_journalled_aops = { static const struct address_space_operations ext4_da_aops = { .readpage = ext4_readpage, .readpages = ext4_readpages, - .writepage = ext4_writeback_writepage, + .writepage = ext4_da_writepage, .writepages = ext4_da_writepages, .sync_page = block_sync_page, .write_begin = ext4_da_write_begin, > Could you try the following patch? It updates the i_disksize at the > write_end time. > I will test the patch and update you. BTW shouldn't we update i_disksize only after actual block got allocated ? > > Signed-off-by: Mingming Cao <cmm@us.ibm.com> > --- > fs/ext4/inode.c | 39 ++++++++++++++++++++++++++++++++++++++- > 1 file changed, 38 insertions(+), 1 deletion(-) > > Index: linux-2.6.25-rc5/fs/ext4/inode.c > =================================================================== > --- linux-2.6.25-rc5.orig/fs/ext4/inode.c 2008-03-19 17:32:44.000000000 -0700 > +++ linux-2.6.25-rc5/fs/ext4/inode.c 2008-03-19 17:43:19.000000000 -0700 > @@ -1355,6 +1355,43 @@ static int ext4_writeback_write_end(stru > return ret ? ret : copied; > } > > +static int ext4_da_write_end(struct file *file, > + struct address_space *mapping, > + loff_t pos, unsigned len, unsigned copied, > + struct page *page, void *fsdata) > +{ > + handle_t *handle; > + struct inode *inode = file->f_mapping->host; > + int needed_blocks = ext4_writepage_trans_blocks(inode); > + int ret = 0, ret2; > + loff_t new_i_size; > + > + handle = ext4_journal_start(inode, needed_blocks); > + if (IS_ERR(handle)) { > + unlock_page(page); > + page_cache_release(page); > + ret = PTR_ERR(handle); > + return ret; > + } > + > + new_i_size = pos + copied; > + if (new_i_size > EXT4_I(inode)->i_disksize) > + EXT4_I(inode)->i_disksize = new_i_size; > + > + copied = ext4_generic_write_end(file, mapping, pos, len, copied, > + page, fsdata); > + if (copied < 0) > + ret = copied; > + > + ret2 = ext4_journal_stop(handle); > + if (!ret) > + ret = ret2; > + unlock_page(page); > + page_cache_release(page); > + > + return ret ? ret : copied; > +} > + > static int ext4_journalled_write_end(struct file *file, > struct address_space *mapping, > loff_t pos, unsigned len, unsigned copied, > @@ -2020,7 +2057,7 @@ static const struct address_space_operat > .writepages = ext4_da_writepages, > .sync_page = block_sync_page, > .write_begin = ext4_da_write_begin, > - .write_end = generic_write_end, > + .write_end = ext4_da_write_end, > .bmap = ext4_bmap, > .invalidatepage = ext4_da_invalidatepage, > .releasepage = ext4_releasepage, > > -aneesh ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: BUG with delayed allocation 2008-03-20 5:39 ` Aneesh Kumar K.V @ 2008-03-20 17:29 ` Mingming Cao 2008-03-20 17:56 ` Aneesh Kumar K.V 0 siblings, 1 reply; 9+ messages in thread From: Mingming Cao @ 2008-03-20 17:29 UTC (permalink / raw) To: Aneesh Kumar K.V; +Cc: Eric Sandeen, ext4 On Thu, 2008-03-20 at 11:09 +0530, Aneesh Kumar K.V wrote: > > Could you try the following patch? It updates the i_disksize at the > > write_end time. > > > > I will test the patch and update you. BTW shouldn't we update > i_disksize only after actual block got allocated ? > > Hmm...I am not 100% sure but I think we should not to change the behavior that the on-disk inode size should be updated when write() returns to user. Right now the in-memory inode size is updated, user would expecting the same when they run e2fsck, but e2fsck reads inode size from disk. Pushing the inode i_disksize update at the writeout (allocation) time will cause the window that i_size is different than the i_disksize being enlarged quite big. Mingming > > > > Signed-off-by: Mingming Cao <cmm@us.ibm.com> > > --- > > fs/ext4/inode.c | 39 ++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 38 insertions(+), 1 deletion(-) > > > > Index: linux-2.6.25-rc5/fs/ext4/inode.c > > =================================================================== > > --- linux-2.6.25-rc5.orig/fs/ext4/inode.c 2008-03-19 17:32:44.000000000 -0700 > > +++ linux-2.6.25-rc5/fs/ext4/inode.c 2008-03-19 17:43:19.000000000 -0700 > > @@ -1355,6 +1355,43 @@ static int ext4_writeback_write_end(stru > > return ret ? ret : copied; > > } > > > > +static int ext4_da_write_end(struct file *file, > > + struct address_space *mapping, > > + loff_t pos, unsigned len, unsigned copied, > > + struct page *page, void *fsdata) > > +{ > > + handle_t *handle; > > + struct inode *inode = file->f_mapping->host; > > + int needed_blocks = ext4_writepage_trans_blocks(inode); > > + int ret = 0, ret2; > > + loff_t new_i_size; > > + > > + handle = ext4_journal_start(inode, needed_blocks); > > + if (IS_ERR(handle)) { > > + unlock_page(page); > > + page_cache_release(page); > > + ret = PTR_ERR(handle); > > + return ret; > > + } > > + > > + new_i_size = pos + copied; > > + if (new_i_size > EXT4_I(inode)->i_disksize) > > + EXT4_I(inode)->i_disksize = new_i_size; > > + > > + copied = ext4_generic_write_end(file, mapping, pos, len, copied, > > + page, fsdata); > > + if (copied < 0) > > + ret = copied; > > + > > + ret2 = ext4_journal_stop(handle); > > + if (!ret) > > + ret = ret2; > > + unlock_page(page); > > + page_cache_release(page); > > + > > + return ret ? ret : copied; > > +} > > + > > static int ext4_journalled_write_end(struct file *file, > > struct address_space *mapping, > > loff_t pos, unsigned len, unsigned copied, > > @@ -2020,7 +2057,7 @@ static const struct address_space_operat > > .writepages = ext4_da_writepages, > > .sync_page = block_sync_page, > > .write_begin = ext4_da_write_begin, > > - .write_end = generic_write_end, > > + .write_end = ext4_da_write_end, > > .bmap = ext4_bmap, > > .invalidatepage = ext4_da_invalidatepage, > > .releasepage = ext4_releasepage, > > > > > > -aneesh > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: BUG with delayed allocation 2008-03-20 17:29 ` Mingming Cao @ 2008-03-20 17:56 ` Aneesh Kumar K.V 2008-03-20 23:55 ` Andreas Dilger 0 siblings, 1 reply; 9+ messages in thread From: Aneesh Kumar K.V @ 2008-03-20 17:56 UTC (permalink / raw) To: Mingming Cao; +Cc: Eric Sandeen, ext4 On Thu, Mar 20, 2008 at 10:29:50AM -0700, Mingming Cao wrote: > On Thu, 2008-03-20 at 11:09 +0530, Aneesh Kumar K.V wrote: > > > > Could you try the following patch? It updates the i_disksize at the > > > write_end time. > > > > > > > I will test the patch and update you. BTW shouldn't we update > > i_disksize only after actual block got allocated ? > > > > > Hmm...I am not 100% sure but I think we should not to change the > behavior that the on-disk inode size should be updated when write() > returns to user. Right now the in-memory inode size is updated, user > would expecting the same when they run e2fsck, but e2fsck reads inode > size from disk. Pushing the inode i_disksize update at the writeout > (allocation) time will cause the window that i_size is different than > the i_disksize being enlarged quite big. > If we are updating i_disksize during write_end and if we crash before actually allocating the blocks e2fsck will find errors because the inode doesn't really have that many blocks right ? -aneesh ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: BUG with delayed allocation 2008-03-20 17:56 ` Aneesh Kumar K.V @ 2008-03-20 23:55 ` Andreas Dilger 0 siblings, 0 replies; 9+ messages in thread From: Andreas Dilger @ 2008-03-20 23:55 UTC (permalink / raw) To: Aneesh Kumar K.V; +Cc: Mingming Cao, Eric Sandeen, ext4 On Mar 20, 2008 23:26 +0530, Aneesh Kumar K.V wrote: > On Thu, Mar 20, 2008 at 10:29:50AM -0700, Mingming Cao wrote: > > On Thu, 2008-03-20 at 11:09 +0530, Aneesh Kumar K.V wrote: > > > > Could you try the following patch? It updates the i_disksize at the > > > > write_end time. > > > > > > I will test the patch and update you. BTW shouldn't we update > > > i_disksize only after actual block got allocated ? > > > > Hmm...I am not 100% sure but I think we should not to change the > > behavior that the on-disk inode size should be updated when write() > > returns to user. Right now the in-memory inode size is updated, user > > would expecting the same when they run e2fsck, but e2fsck reads inode > > size from disk. Pushing the inode i_disksize update at the writeout > > (allocation) time will cause the window that i_size is different than > > the i_disksize being enlarged quite big. > > If we are updating i_disksize during write_end and if we crash before actually > allocating the blocks e2fsck will find errors because the inode doesn't > really have that many blocks right ? No, it would just think the file is sparse and return \0 for the reads. That said, I don't agree with Mingming - the i_disksize should only be increased at the time the blocks are allocated on disk and not when the file is extended in memory. Even if the window where i_size is different than i_disksize is large, this is only important after a crash, and at that time ordered mode users want the file to have a shorter i_disksize and the file contains only valid data, instead of the extended i_size but the file contains \0 bytes at the end. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-03-20 23:55 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-03-19 8:52 BUG with delayed allocation Aneesh Kumar K.V 2008-03-19 14:42 ` Eric Sandeen 2008-03-19 18:32 ` Aneesh Kumar K.V 2008-03-20 4:21 ` Aneesh Kumar K.V 2008-03-20 0:46 ` Mingming Cao 2008-03-20 5:39 ` Aneesh Kumar K.V 2008-03-20 17:29 ` Mingming Cao 2008-03-20 17:56 ` Aneesh Kumar K.V 2008-03-20 23:55 ` Andreas Dilger
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).