* [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() @ 2011-05-17 22:59 Jiaying Zhang 2011-05-18 2:46 ` Yongqiang Yang ` (3 more replies) 0 siblings, 4 replies; 22+ messages in thread From: Jiaying Zhang @ 2011-05-17 22:59 UTC (permalink / raw) To: tytso; +Cc: linux-ext4 There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE flag and then ftruncate the file to a size larger than the file's i_size, any allocated but unwritten blocks will be freed but the file size is set to the size that ftruncate specifies. Here is a simple test to reproduce the problem: 1. fallocate a 12k size file with KEEP_SIZE flag 2. write the first 4k 3. ftruncate the file to 8k Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs shows the file has only the first written block left. Below is the proposed patch to fix the bug: ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead of ext4_truncate(inode) when it needs to truncate an inode so that if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate to a size larger than the inode's i_size, we will only truncate the blocks beyond the specified truncate size instead of all of blocks beyond i_size. Signed-off-by: Jiaying Zhang <jiayingz@google.com> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 3424e82..3bfad57 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) } } /* ext4_truncate will clear the flag */ - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) - ext4_truncate(inode); + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { + rc = vmtruncate(inode, attr->ia_size); + if (rc) + goto err_out; + } } if ((attr->ia_valid & ATTR_SIZE) && ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-17 22:59 [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() Jiaying Zhang @ 2011-05-18 2:46 ` Yongqiang Yang 2011-05-18 2:56 ` Yongqiang Yang ` (2 subsequent siblings) 3 siblings, 0 replies; 22+ messages in thread From: Yongqiang Yang @ 2011-05-18 2:46 UTC (permalink / raw) To: Jiaying Zhang; +Cc: tytso, linux-ext4 On Wed, May 18, 2011 at 6:59 AM, Jiaying Zhang <jiayingz@google.com> wrote: > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE > flag and then ftruncate the file to a size larger than the file's i_size, > any allocated but unwritten blocks will be freed but the file size is set > to the size that ftruncate specifies. > > Here is a simple test to reproduce the problem: > 1. fallocate a 12k size file with KEEP_SIZE flag > 2. write the first 4k > 3. ftruncate the file to 8k > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs > shows the file has only the first written block left. > > Below is the proposed patch to fix the bug: > > ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). > > Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead > of ext4_truncate(inode) when it needs to truncate an inode so that > if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate > to a size larger than the inode's i_size, we will only truncate the blocks > beyond the specified truncate size instead of all of blocks beyond i_size. > > Signed-off-by: Jiaying Zhang <jiayingz@google.com> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 3424e82..3bfad57 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > } > } > /* ext4_truncate will clear the flag */ > - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) > - ext4_truncate(inode); > + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { > + rc = vmtruncate(inode, attr->ia_size); > + if (rc) > + goto err_out; > + } Hi there, It seems that the error handling has problems. 1)We should handle error as the below code does. or 2) we can add a OR condition to the if statement below so that it can handle case here. I prefer the 2nd solution. it looks like: - /* ext4_truncate will clear the flag */ - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) - ext4_truncate(inode); - if ((attr->ia_valid & ATTR_SIZE) && - attr->ia_size != i_size_read(inode)) + if (((attr->ia_valid & ATTR_SIZE) && + attr->ia_size != i_size_read(inode)) || + ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)) rc = vmtruncate(inode, attr->ia_size); Yongqiang. > } > > if ((attr->ia_valid & ATTR_SIZE) && > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Best Wishes Yongqiang Yang -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-17 22:59 [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() Jiaying Zhang 2011-05-18 2:46 ` Yongqiang Yang @ 2011-05-18 2:56 ` Yongqiang Yang 2011-05-18 3:27 ` Yongqiang Yang 2011-05-18 3:19 ` Eric Sandeen 2011-05-22 23:56 ` Ted Ts'o 3 siblings, 1 reply; 22+ messages in thread From: Yongqiang Yang @ 2011-05-18 2:56 UTC (permalink / raw) To: Jiaying Zhang; +Cc: tytso, linux-ext4 On Wed, May 18, 2011 at 6:59 AM, Jiaying Zhang <jiayingz@google.com> wrote: > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE > flag and then ftruncate the file to a size larger than the file's i_size, > any allocated but unwritten blocks will be freed but the file size is set > to the size that ftruncate specifies. > > Here is a simple test to reproduce the problem: > 1. fallocate a 12k size file with KEEP_SIZE flag > 2. write the first 4k > 3. ftruncate the file to 8k > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs > shows the file has only the first written block left. > > Below is the proposed patch to fix the bug: > > ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). > > Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead > of ext4_truncate(inode) when it needs to truncate an inode so that > if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate > to a size larger than the inode's i_size, we will only truncate the blocks > beyond the specified truncate size instead of all of blocks beyond i_size. > > Signed-off-by: Jiaying Zhang <jiayingz@google.com> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 3424e82..3bfad57 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > } > } > /* ext4_truncate will clear the flag */ > - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) > - ext4_truncate(inode); > + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { > + rc = vmtruncate(inode, attr->ia_size); > + if (rc) > + goto err_out; > + } > } Another question, does vmtruncate() write zeros to blocks beyond old EOF? I had a rough look and could not find the code doing this, maybe due to my density. Yongqiang. > > if ((attr->ia_valid & ATTR_SIZE) && > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Best Wishes Yongqiang Yang -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-18 2:56 ` Yongqiang Yang @ 2011-05-18 3:27 ` Yongqiang Yang 0 siblings, 0 replies; 22+ messages in thread From: Yongqiang Yang @ 2011-05-18 3:27 UTC (permalink / raw) To: Jiaying Zhang; +Cc: tytso, linux-ext4 On Wed, May 18, 2011 at 10:56 AM, Yongqiang Yang <xiaoqiangnk@gmail.com> wrote: > On Wed, May 18, 2011 at 6:59 AM, Jiaying Zhang <jiayingz@google.com> wrote: >> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >> flag and then ftruncate the file to a size larger than the file's i_size, >> any allocated but unwritten blocks will be freed but the file size is set >> to the size that ftruncate specifies. >> >> Here is a simple test to reproduce the problem: >> 1. fallocate a 12k size file with KEEP_SIZE flag >> 2. write the first 4k >> 3. ftruncate the file to 8k >> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >> shows the file has only the first written block left. >> >> Below is the proposed patch to fix the bug: >> >> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). >> >> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead >> of ext4_truncate(inode) when it needs to truncate an inode so that >> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate >> to a size larger than the inode's i_size, we will only truncate the blocks >> beyond the specified truncate size instead of all of blocks beyond i_size. >> >> Signed-off-by: Jiaying Zhang <jiayingz@google.com> >> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >> index 3424e82..3bfad57 100644 >> --- a/fs/ext4/inode.c >> +++ b/fs/ext4/inode.c >> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) >> } >> } >> /* ext4_truncate will clear the flag */ >> - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) >> - ext4_truncate(inode); >> + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { >> + rc = vmtruncate(inode, attr->ia_size); >> + if (rc) >> + goto err_out; >> + } >> } > Another question, does vmtruncate() write zeros to blocks beyond old > EOF? I had a rough look and could not find the code doing this, maybe > due to my density. Sorry, ignore this, the extent is unwritten, no need to zero. > > Yongqiang. >> >> if ((attr->ia_valid & ATTR_SIZE) && >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > > -- > Best Wishes > Yongqiang Yang > -- Best Wishes Yongqiang Yang -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-17 22:59 [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() Jiaying Zhang 2011-05-18 2:46 ` Yongqiang Yang 2011-05-18 2:56 ` Yongqiang Yang @ 2011-05-18 3:19 ` Eric Sandeen 2011-05-18 5:35 ` Andreas Dilger ` (3 more replies) 2011-05-22 23:56 ` Ted Ts'o 3 siblings, 4 replies; 22+ messages in thread From: Eric Sandeen @ 2011-05-18 3:19 UTC (permalink / raw) To: Jiaying Zhang; +Cc: tytso, linux-ext4 On 5/17/11 5:59 PM, Jiaying Zhang wrote: > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE > flag and then ftruncate the file to a size larger than the file's i_size, > any allocated but unwritten blocks will be freed but the file size is set > to the size that ftruncate specifies. > > Here is a simple test to reproduce the problem: > 1. fallocate a 12k size file with KEEP_SIZE flag > 2. write the first 4k > 3. ftruncate the file to 8k > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs > shows the file has only the first written block left. To be honest I'm not 100% certain what the fiesystem -should- do in this case. If I go through that same sequence on xfs, I get 4k written / 8k unwritten: # xfs_bmap -vp testfile testfile: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 size 8k: # ls -l testfile -rw-r--r-- 1 root root 8192 May 17 22:33 testfile and diskspace used 12k: # du -hc testfile 12K testfile 12K total I think this is a different result from ext4, either with or without your patch. On ext4 I get size 8k, but only the first 4k mapped, as you say. I don't recall when truncate is supposed to free fallocated blocks, and from what point? -Eric > Below is the proposed patch to fix the bug: > > ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). > > Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead > of ext4_truncate(inode) when it needs to truncate an inode so that > if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate > to a size larger than the inode's i_size, we will only truncate the blocks > beyond the specified truncate size instead of all of blocks beyond i_size. > > Signed-off-by: Jiaying Zhang <jiayingz@google.com> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 3424e82..3bfad57 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > } > } > /* ext4_truncate will clear the flag */ > - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) > - ext4_truncate(inode); > + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { > + rc = vmtruncate(inode, attr->ia_size); > + if (rc) > + goto err_out; > + } > } > > if ((attr->ia_valid & ATTR_SIZE) && > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-18 3:19 ` Eric Sandeen @ 2011-05-18 5:35 ` Andreas Dilger 2011-05-18 20:32 ` Jiaying Zhang 2011-05-18 6:13 ` Dave Chinner ` (2 subsequent siblings) 3 siblings, 1 reply; 22+ messages in thread From: Andreas Dilger @ 2011-05-18 5:35 UTC (permalink / raw) To: Eric Sandeen; +Cc: Jiaying Zhang, tytso, linux-ext4 On May 17, 2011, at 21:19, Eric Sandeen wrote: > On 5/17/11 5:59 PM, Jiaying Zhang wrote: >> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >> flag and then ftruncate the file to a size larger than the file's i_size, >> any allocated but unwritten blocks will be freed but the file size is set >> to the size that ftruncate specifies. >> >> Here is a simple test to reproduce the problem: >> 1. fallocate a 12k size file with KEEP_SIZE flag >> 2. write the first 4k >> 3. ftruncate the file to 8k >> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >> shows the file has only the first written block left. > > To be honest I'm not 100% certain what the fiesystem -should- do in this case. I think it makes sense from a usage POV to discard the blocks after EOF when a truncate is being done. For something like a PVR that is recording a show, but doesn't know the exact total size, it makes sense to fallocate() some estimated amount of space, and then when the show is finished recording it uses ftruncate() of the current size to drop the fallocated space. > If I go through that same sequence on xfs, I get 4k written / 8k unwritten: > > # xfs_bmap -vp testfile > testfile: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 > 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 > > size 8k: > # ls -l testfile > -rw-r--r-- 1 root root 8192 May 17 22:33 testfile > > and diskspace used 12k: > # du -hc testfile > 12K testfile > 12K total > > I think this is a different result from ext4, either with or without your patch. > > On ext4 I get size 8k, but only the first 4k mapped, as you say. > > I don't recall when truncate is supposed to free fallocated blocks, and from what point? > > -Eric > >> Below is the proposed patch to fix the bug: >> >> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). >> >> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead >> of ext4_truncate(inode) when it needs to truncate an inode so that >> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate >> to a size larger than the inode's i_size, we will only truncate the blocks >> beyond the specified truncate size instead of all of blocks beyond i_size. >> >> Signed-off-by: Jiaying Zhang <jiayingz@google.com> >> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >> index 3424e82..3bfad57 100644 >> --- a/fs/ext4/inode.c >> +++ b/fs/ext4/inode.c >> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) >> } >> } >> /* ext4_truncate will clear the flag */ >> - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) >> - ext4_truncate(inode); >> + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { >> + rc = vmtruncate(inode, attr->ia_size); >> + if (rc) >> + goto err_out; >> + } >> } >> >> if ((attr->ia_valid & ATTR_SIZE) && >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Cheers, Andreas ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-18 5:35 ` Andreas Dilger @ 2011-05-18 20:32 ` Jiaying Zhang 2011-05-18 20:45 ` Andreas Dilger 0 siblings, 1 reply; 22+ messages in thread From: Jiaying Zhang @ 2011-05-18 20:32 UTC (permalink / raw) To: Andreas Dilger; +Cc: Eric Sandeen, tytso, linux-ext4 On Tue, May 17, 2011 at 10:35 PM, Andreas Dilger <adilger@dilger.ca> wrote: > On May 17, 2011, at 21:19, Eric Sandeen wrote: >> On 5/17/11 5:59 PM, Jiaying Zhang wrote: >>> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >>> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >>> flag and then ftruncate the file to a size larger than the file's i_size, >>> any allocated but unwritten blocks will be freed but the file size is set >>> to the size that ftruncate specifies. >>> >>> Here is a simple test to reproduce the problem: >>> 1. fallocate a 12k size file with KEEP_SIZE flag >>> 2. write the first 4k >>> 3. ftruncate the file to 8k >>> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >>> shows the file has only the first written block left. >> >> To be honest I'm not 100% certain what the fiesystem -should- do in this case. > > I think it makes sense from a usage POV to discard the blocks after EOF when a truncate is being done. For something like a PVR that is recording a show, but doesn't know the exact total size, it makes sense to fallocate() some estimated amount of space, and then when the show is finished recording it uses ftruncate() of the current size to drop the fallocated space. > Indeed we have applications that are doing exactly the same as you described. They always fallocate files to a pre-defined size with KEEP_SIZE flag and if they end up using less than the allocated size, they ftruncate files to their written size later. Jiaying >> If I go through that same sequence on xfs, I get 4k written / 8k unwritten: >> >> # xfs_bmap -vp testfile >> testfile: >> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS >> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 >> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 >> >> size 8k: >> # ls -l testfile >> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile >> >> and diskspace used 12k: >> # du -hc testfile >> 12K testfile >> 12K total >> >> I think this is a different result from ext4, either with or without your patch. >> >> On ext4 I get size 8k, but only the first 4k mapped, as you say. >> >> I don't recall when truncate is supposed to free fallocated blocks, and from what point? >> >> -Eric >> >>> Below is the proposed patch to fix the bug: >>> >>> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). >>> >>> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead >>> of ext4_truncate(inode) when it needs to truncate an inode so that >>> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate >>> to a size larger than the inode's i_size, we will only truncate the blocks >>> beyond the specified truncate size instead of all of blocks beyond i_size. >>> >>> Signed-off-by: Jiaying Zhang <jiayingz@google.com> >>> >>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>> index 3424e82..3bfad57 100644 >>> --- a/fs/ext4/inode.c >>> +++ b/fs/ext4/inode.c >>> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) >>> } >>> } >>> /* ext4_truncate will clear the flag */ >>> - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) >>> - ext4_truncate(inode); >>> + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { >>> + rc = vmtruncate(inode, attr->ia_size); >>> + if (rc) >>> + goto err_out; >>> + } >>> } >>> >>> if ((attr->ia_valid & ATTR_SIZE) && >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > Cheers, Andreas > > > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-18 20:32 ` Jiaying Zhang @ 2011-05-18 20:45 ` Andreas Dilger 2011-05-18 20:57 ` Jiaying Zhang 0 siblings, 1 reply; 22+ messages in thread From: Andreas Dilger @ 2011-05-18 20:45 UTC (permalink / raw) To: Jiaying Zhang; +Cc: Eric Sandeen, tytso, linux-ext4 On May 18, 2011, at 14:32, Jiaying Zhang wrote: > On Tue, May 17, 2011 at 10:35 PM, Andreas Dilger <adilger@dilger.ca> wrote: >> On May 17, 2011, at 21:19, Eric Sandeen wrote: >>> On 5/17/11 5:59 PM, Jiaying Zhang wrote: >>>> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >>>> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >>>> flag and then ftruncate the file to a size larger than the file's i_size, >>>> any allocated but unwritten blocks will be freed but the file size is set >>>> to the size that ftruncate specifies. >>>> >>>> Here is a simple test to reproduce the problem: >>>> 1. fallocate a 12k size file with KEEP_SIZE flag >>>> 2. write the first 4k >>>> 3. ftruncate the file to 8k >>>> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >>>> shows the file has only the first written block left. >>> >>> To be honest I'm not 100% certain what the fiesystem -should- do in this case. >> >> I think it makes sense from a usage POV to discard the blocks after EOF when a truncate is being done. For something like a PVR that is recording a show, but doesn't know the exact total size, it makes sense to fallocate() some estimated amount of space, and then when the show is finished recording it uses ftruncate() of the current size to drop the fallocated space. >> > > Indeed we have applications that are doing exactly the same as you > described. They always fallocate files to a pre-defined size with > KEEP_SIZE flag and if they end up using less than the allocated size, > they ftruncate files to their written size later. If XFS is already handling truncate-up and truncate-down differently, I don't mind keeping consistent behaviour with XFS. I had thought about this also, that truncate-up isn't intending to throw away space while truncate-down is. However, I didn't mention it in my email because I thought the semantics of having different behaviour for truncate-up vs. truncate-down was confusing. If XFS is already doing this, then it seems that this is at least somewhat expected by applications and/or is more efficient in the long run for the on-disk allocation. >>> If I go through that same sequence on xfs, I get 4k written / 8k unwritten: >>> >>> # xfs_bmap -vp testfile >>> testfile: >>> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS >>> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 >>> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 >>> >>> size 8k: >>> # ls -l testfile >>> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile >>> >>> and diskspace used 12k: >>> # du -hc testfile >>> 12K testfile >>> 12K total >>> >>> I think this is a different result from ext4, either with or without your patch. >>> >>> On ext4 I get size 8k, but only the first 4k mapped, as you say. >>> >>> I don't recall when truncate is supposed to free fallocated blocks, and from what point? >>> >>> -Eric >>> >>>> Below is the proposed patch to fix the bug: >>>> >>>> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). >>>> >>>> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead >>>> of ext4_truncate(inode) when it needs to truncate an inode so that >>>> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate >>>> to a size larger than the inode's i_size, we will only truncate the blocks >>>> beyond the specified truncate size instead of all of blocks beyond i_size. >>>> >>>> Signed-off-by: Jiaying Zhang <jiayingz@google.com> >>>> >>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>>> index 3424e82..3bfad57 100644 >>>> --- a/fs/ext4/inode.c >>>> +++ b/fs/ext4/inode.c >>>> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) >>>> } >>>> } >>>> /* ext4_truncate will clear the flag */ >>>> - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) >>>> - ext4_truncate(inode); >>>> + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { >>>> + rc = vmtruncate(inode, attr->ia_size); >>>> + if (rc) >>>> + goto err_out; >>>> + } >>>> } >>>> >>>> if ((attr->ia_valid & ATTR_SIZE) && >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> Cheers, Andreas >> >> >> >> >> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Cheers, Andreas ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-18 20:45 ` Andreas Dilger @ 2011-05-18 20:57 ` Jiaying Zhang 0 siblings, 0 replies; 22+ messages in thread From: Jiaying Zhang @ 2011-05-18 20:57 UTC (permalink / raw) To: Andreas Dilger; +Cc: Eric Sandeen, tytso, linux-ext4 On Wed, May 18, 2011 at 1:45 PM, Andreas Dilger <adilger@dilger.ca> wrote: > On May 18, 2011, at 14:32, Jiaying Zhang wrote: >> On Tue, May 17, 2011 at 10:35 PM, Andreas Dilger <adilger@dilger.ca> wrote: >>> On May 17, 2011, at 21:19, Eric Sandeen wrote: >>>> On 5/17/11 5:59 PM, Jiaying Zhang wrote: >>>>> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >>>>> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >>>>> flag and then ftruncate the file to a size larger than the file's i_size, >>>>> any allocated but unwritten blocks will be freed but the file size is set >>>>> to the size that ftruncate specifies. >>>>> >>>>> Here is a simple test to reproduce the problem: >>>>> 1. fallocate a 12k size file with KEEP_SIZE flag >>>>> 2. write the first 4k >>>>> 3. ftruncate the file to 8k >>>>> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >>>>> shows the file has only the first written block left. >>>> >>>> To be honest I'm not 100% certain what the fiesystem -should- do in this case. >>> >>> I think it makes sense from a usage POV to discard the blocks after EOF when a truncate is being done. For something like a PVR that is recording a show, but doesn't know the exact total size, it makes sense to fallocate() some estimated amount of space, and then when the show is finished recording it uses ftruncate() of the current size to drop the fallocated space. >>> >> >> Indeed we have applications that are doing exactly the same as you >> described. They always fallocate files to a pre-defined size with >> KEEP_SIZE flag and if they end up using less than the allocated size, >> they ftruncate files to their written size later. > > If XFS is already handling truncate-up and truncate-down differently, I don't mind keeping consistent behaviour with XFS. I had thought about this also, that truncate-up isn't intending to throw away space while truncate-down is. However, I didn't mention it in my email because I thought the semantics of having different behaviour for truncate-up vs. truncate-down was confusing. > > If XFS is already doing this, then it seems that this is at least somewhat expected by applications and/or is more efficient in the long run for the on-disk allocation. With my patch, it is still a little different from what xfs does in truncating-up case. As Eric mentioned, when an application fallocates 12k, writes 4k, and then truncates to 8k, on xfs there will be 12k allocated blocks left, but on ext4 with my patch there will be 8k allocated blocks left. The reason I think we may want to free any blocks beyond the truncate size is because there may be situations that applications are running out of space and want to shrink fallocated files to a smaller size that is still larger than the current i_size. Jiaying > >>>> If I go through that same sequence on xfs, I get 4k written / 8k unwritten: >>>> >>>> # xfs_bmap -vp testfile >>>> testfile: >>>> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS >>>> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 >>>> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 >>>> >>>> size 8k: >>>> # ls -l testfile >>>> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile >>>> >>>> and diskspace used 12k: >>>> # du -hc testfile >>>> 12K testfile >>>> 12K total >>>> >>>> I think this is a different result from ext4, either with or without your patch. >>>> >>>> On ext4 I get size 8k, but only the first 4k mapped, as you say. >>>> >>>> I don't recall when truncate is supposed to free fallocated blocks, and from what point? >>>> >>>> -Eric >>>> >>>>> Below is the proposed patch to fix the bug: >>>>> >>>>> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). >>>>> >>>>> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead >>>>> of ext4_truncate(inode) when it needs to truncate an inode so that >>>>> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate >>>>> to a size larger than the inode's i_size, we will only truncate the blocks >>>>> beyond the specified truncate size instead of all of blocks beyond i_size. >>>>> >>>>> Signed-off-by: Jiaying Zhang <jiayingz@google.com> >>>>> >>>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>>>> index 3424e82..3bfad57 100644 >>>>> --- a/fs/ext4/inode.c >>>>> +++ b/fs/ext4/inode.c >>>>> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) >>>>> } >>>>> } >>>>> /* ext4_truncate will clear the flag */ >>>>> - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) >>>>> - ext4_truncate(inode); >>>>> + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { >>>>> + rc = vmtruncate(inode, attr->ia_size); >>>>> + if (rc) >>>>> + goto err_out; >>>>> + } >>>>> } >>>>> >>>>> if ((attr->ia_valid & ATTR_SIZE) && >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >>> Cheers, Andreas >>> >>> >>> >>> >>> >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > Cheers, Andreas > > > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-18 3:19 ` Eric Sandeen 2011-05-18 5:35 ` Andreas Dilger @ 2011-05-18 6:13 ` Dave Chinner 2011-05-18 14:05 ` Eric Sandeen 2011-05-18 20:42 ` Jiaying Zhang 2011-05-18 20:29 ` Jiaying Zhang [not found] ` <BANLkTi=Yv_q820aHFa2wkCL-PnYNcZdWCQ@mail.gmail.com> 3 siblings, 2 replies; 22+ messages in thread From: Dave Chinner @ 2011-05-18 6:13 UTC (permalink / raw) To: Eric Sandeen; +Cc: Jiaying Zhang, tytso, linux-ext4 On Tue, May 17, 2011 at 10:19:05PM -0500, Eric Sandeen wrote: > On 5/17/11 5:59 PM, Jiaying Zhang wrote: > > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks > > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE > > flag and then ftruncate the file to a size larger than the file's i_size, > > any allocated but unwritten blocks will be freed but the file size is set > > to the size that ftruncate specifies. > > > > Here is a simple test to reproduce the problem: > > 1. fallocate a 12k size file with KEEP_SIZE flag > > 2. write the first 4k > > 3. ftruncate the file to 8k > > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs > > shows the file has only the first written block left. > > To be honest I'm not 100% certain what the fiesystem -should- do in this case. > > If I go through that same sequence on xfs, I get 4k written / 8k unwritten: > > # xfs_bmap -vp testfile > testfile: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 > 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 Ok, so that's the case for a _truncate up_ from 4k to 8k: $ rm /mnt/test/foo $ xfs_io -f -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo fd.path = "/mnt/test/foo" fd.flags = non-sync,non-direct,read-write stat.ino = 71 stat.type = regular file stat.size = 0 stat.blocks = 24 fsxattr.xflags = 0x2 [-p------------] fsxattr.projid = 0 fsxattr.extsize = 0 fsxattr.nextents = 1 fsxattr.naextents = 0 dioattr.mem = 0x200 dioattr.miniosz = 512 dioattr.maxiosz = 2147483136 /mnt/test/foo: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..23]: 9712..9735 0 (9712..9735) 24 10000 wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 0.0000 sec (156 MiB/sec and 40000.0000 ops/sec) /mnt/test/foo: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..7]: 9712..9719 0 (9712..9719) 8 00000 1: [8..23]: 9720..9735 0 (9720..9735) 16 10000 /mnt/test/foo: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..7]: 9712..9719 0 (9712..9719) 8 00000 1: [8..23]: 9720..9735 0 (9720..9735) 16 10000 fd.path = "/mnt/test/foo" fd.flags = non-sync,non-direct,read-write stat.ino = 71 stat.type = regular file stat.size = 8192 stat.blocks = 24 fsxattr.xflags = 0x2 [-p------------] fsxattr.projid = 0 fsxattr.extsize = 0 fsxattr.nextents = 2 fsxattr.naextents = 0 dioattr.mem = 0x200 dioattr.miniosz = 512 dioattr.maxiosz = 2147483136 But you get a different result on truncate down: $rm /mnt/test/foo $ xfs_io -f -c "truncate 12k" -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo fd.path = "/mnt/test/foo" fd.flags = non-sync,non-direct,read-write stat.ino = 71 stat.type = regular file stat.size = 12288 stat.blocks = 24 fsxattr.xflags = 0x2 [-p------------] fsxattr.projid = 0 fsxattr.extsize = 0 fsxattr.nextents = 1 fsxattr.naextents = 0 dioattr.mem = 0x200 dioattr.miniosz = 512 dioattr.maxiosz = 2147483136 /mnt/test/foo: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..23]: 9584..9607 0 (9584..9607) 24 10000 wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 0.0000 sec (217.014 MiB/sec and 55555.5556 ops/sec) /mnt/test/foo: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..7]: 9584..9591 0 (9584..9591) 8 00000 1: [8..23]: 9592..9607 0 (9592..9607) 16 10000 /mnt/test/foo: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..7]: 9584..9591 0 (9584..9591) 8 00000 1: [8..15]: 9592..9599 0 (9592..9599) 8 10000 fd.path = "/mnt/test/foo" fd.flags = non-sync,non-direct,read-write stat.ino = 71 stat.type = regular file stat.size = 8192 stat.blocks = 16 fsxattr.xflags = 0x2 [-p------------] fsxattr.projid = 0 fsxattr.extsize = 0 fsxattr.nextents = 2 fsxattr.naextents = 0 dioattr.mem = 0x200 dioattr.miniosz = 512 dioattr.maxiosz = 2147483136 IOWs, on XFS a truncate up does not change the preallocation at all, while a truncate down will _always_ remove preallocation beyond the new EOF. It's always had this behaviour w.r.t. to truncate(2) and preallocation beyond EOF. > I think this is a different result from ext4, either with or without your patch. > > On ext4 I get size 8k, but only the first 4k mapped, as you say. > > I don't recall when truncate is supposed to free fallocated blocks, and from what point? It's entirely up to the filesystem how it treats blocks beyond EOF during truncation. XFS frees them on truncate down, because it is much safer to just truncate away everything beyond the new EOF than to leave written extents beyond EOF as potential landmines. Indeed, that's why calling vmtruncate() as a bad fix. If you have: NUUUUUUUUUUWWWWWWWWWOUUUUUUUUU ....----+----------+--------+--------+ A B C D Where A = new EOF (N) A->B = unwritten (U) B->C = written (W) C = old EOF (O) C->D = unwritten (U) Then just calling vmtruncate() will leave the blocks in the range B->C as written blocks. Hence then doing an extending truncate back out to D will expose stale data rather than zeros in the range B->C.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-18 6:13 ` Dave Chinner @ 2011-05-18 14:05 ` Eric Sandeen 2011-05-18 20:42 ` Jiaying Zhang 1 sibling, 0 replies; 22+ messages in thread From: Eric Sandeen @ 2011-05-18 14:05 UTC (permalink / raw) To: Dave Chinner; +Cc: Jiaying Zhang, tytso, linux-ext4 On 5/18/11 1:13 AM, Dave Chinner wrote: > It's entirely up to the filesystem how it treats blocks beyond EOF > during truncation. XFS frees them on truncate down, because it is > much safer to just truncate away everything beyond the new EOF than > to leave written extents beyond EOF as potential landmines. > > Indeed, that's why calling vmtruncate() as a bad fix. If you have: > > > NUUUUUUUUUUWWWWWWWWWOUUUUUUUUU > ....----+----------+--------+--------+ > A B C D > > Where A = new EOF (N) > A->B = unwritten (U) > B->C = written (W) > C = old EOF (O) > C->D = unwritten (U) > > Then just calling vmtruncate() will leave the blocks in the range > B->C as written blocks. Hence then doing an extending truncate back > out to D will expose stale data rather than zeros in the range > B->C.... Hm, running recent xfstests which includes fsx-with-fallocate should probably eventually catch that then. -Eric > Cheers, > > Dave. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-18 6:13 ` Dave Chinner 2011-05-18 14:05 ` Eric Sandeen @ 2011-05-18 20:42 ` Jiaying Zhang 1 sibling, 0 replies; 22+ messages in thread From: Jiaying Zhang @ 2011-05-18 20:42 UTC (permalink / raw) To: Dave Chinner; +Cc: Eric Sandeen, tytso, linux-ext4 On Tue, May 17, 2011 at 11:13 PM, Dave Chinner <david@fromorbit.com> wrote: > On Tue, May 17, 2011 at 10:19:05PM -0500, Eric Sandeen wrote: >> On 5/17/11 5:59 PM, Jiaying Zhang wrote: >> > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >> > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >> > flag and then ftruncate the file to a size larger than the file's i_size, >> > any allocated but unwritten blocks will be freed but the file size is set >> > to the size that ftruncate specifies. >> > >> > Here is a simple test to reproduce the problem: >> > 1. fallocate a 12k size file with KEEP_SIZE flag >> > 2. write the first 4k >> > 3. ftruncate the file to 8k >> > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >> > shows the file has only the first written block left. >> >> To be honest I'm not 100% certain what the fiesystem -should- do in this case. >> >> If I go through that same sequence on xfs, I get 4k written / 8k unwritten: >> >> # xfs_bmap -vp testfile >> testfile: >> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS >> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 >> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 > > Ok, so that's the case for a _truncate up_ from 4k to 8k: > > $ rm /mnt/test/foo > $ xfs_io -f -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo > fd.path = "/mnt/test/foo" > fd.flags = non-sync,non-direct,read-write > stat.ino = 71 > stat.type = regular file > stat.size = 0 > stat.blocks = 24 > fsxattr.xflags = 0x2 [-p------------] > fsxattr.projid = 0 > fsxattr.extsize = 0 > fsxattr.nextents = 1 > fsxattr.naextents = 0 > dioattr.mem = 0x200 > dioattr.miniosz = 512 > dioattr.maxiosz = 2147483136 > /mnt/test/foo: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..23]: 9712..9735 0 (9712..9735) 24 10000 > wrote 4096/4096 bytes at offset 0 > 4 KiB, 1 ops; 0.0000 sec (156 MiB/sec and 40000.0000 ops/sec) > /mnt/test/foo: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 9712..9719 0 (9712..9719) 8 00000 > 1: [8..23]: 9720..9735 0 (9720..9735) 16 10000 > /mnt/test/foo: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 9712..9719 0 (9712..9719) 8 00000 > 1: [8..23]: 9720..9735 0 (9720..9735) 16 10000 > fd.path = "/mnt/test/foo" > fd.flags = non-sync,non-direct,read-write > stat.ino = 71 > stat.type = regular file > stat.size = 8192 > stat.blocks = 24 > fsxattr.xflags = 0x2 [-p------------] > fsxattr.projid = 0 > fsxattr.extsize = 0 > fsxattr.nextents = 2 > fsxattr.naextents = 0 > dioattr.mem = 0x200 > dioattr.miniosz = 512 > dioattr.maxiosz = 2147483136 > > But you get a different result on truncate down: > > $rm /mnt/test/foo > $ xfs_io -f -c "truncate 12k" -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo > fd.path = "/mnt/test/foo" > fd.flags = non-sync,non-direct,read-write > stat.ino = 71 > stat.type = regular file > stat.size = 12288 > stat.blocks = 24 > fsxattr.xflags = 0x2 [-p------------] > fsxattr.projid = 0 > fsxattr.extsize = 0 > fsxattr.nextents = 1 > fsxattr.naextents = 0 > dioattr.mem = 0x200 > dioattr.miniosz = 512 > dioattr.maxiosz = 2147483136 > /mnt/test/foo: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..23]: 9584..9607 0 (9584..9607) 24 10000 > wrote 4096/4096 bytes at offset 0 > 4 KiB, 1 ops; 0.0000 sec (217.014 MiB/sec and 55555.5556 ops/sec) > /mnt/test/foo: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 9584..9591 0 (9584..9591) 8 00000 > 1: [8..23]: 9592..9607 0 (9592..9607) 16 10000 > /mnt/test/foo: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 9584..9591 0 (9584..9591) 8 00000 > 1: [8..15]: 9592..9599 0 (9592..9599) 8 10000 > fd.path = "/mnt/test/foo" > fd.flags = non-sync,non-direct,read-write > stat.ino = 71 > stat.type = regular file > stat.size = 8192 > stat.blocks = 16 > fsxattr.xflags = 0x2 [-p------------] > fsxattr.projid = 0 > fsxattr.extsize = 0 > fsxattr.nextents = 2 > fsxattr.naextents = 0 > dioattr.mem = 0x200 > dioattr.miniosz = 512 > dioattr.maxiosz = 2147483136 > > IOWs, on XFS a truncate up does not change the preallocation at all, > while a truncate down will _always_ remove preallocation beyond the > new EOF. It's always had this behaviour w.r.t. to truncate(2) and > preallocation beyond EOF. > >> I think this is a different result from ext4, either with or without your patch. >> >> On ext4 I get size 8k, but only the first 4k mapped, as you say. >> >> I don't recall when truncate is supposed to free fallocated blocks, and from what point? > > It's entirely up to the filesystem how it treats blocks beyond EOF > during truncation. XFS frees them on truncate down, because it is > much safer to just truncate away everything beyond the new EOF than > to leave written extents beyond EOF as potential landmines. > > Indeed, that's why calling vmtruncate() as a bad fix. If you have: > > > NUUUUUUUUUUWWWWWWWWWOUUUUUUUUU > ....----+----------+--------+--------+ > A B C D > > Where A = new EOF (N) > A->B = unwritten (U) > B->C = written (W) > C = old EOF (O) > C->D = unwritten (U) > > Then just calling vmtruncate() will leave the blocks in the range > B->C as written blocks. Hence then doing an extending truncate back > out to D will expose stale data rather than zeros in the range > B->C.... Sorry I am a little confused. If I understand correctly, in the situation you described, we call a truncate that causes EOF to change from C to A. On ext4, we should free all of blocks after A. And when we do an extending truncate to D, any blocks beyond A should be treated as unwritten blocks so we should not expose any stale data, right? Jiaying > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-18 3:19 ` Eric Sandeen 2011-05-18 5:35 ` Andreas Dilger 2011-05-18 6:13 ` Dave Chinner @ 2011-05-18 20:29 ` Jiaying Zhang [not found] ` <BANLkTi=Yv_q820aHFa2wkCL-PnYNcZdWCQ@mail.gmail.com> 3 siblings, 0 replies; 22+ messages in thread From: Jiaying Zhang @ 2011-05-18 20:29 UTC (permalink / raw) To: Eric Sandeen; +Cc: tytso, linux-ext4 On Tue, May 17, 2011 at 8:19 PM, Eric Sandeen <sandeen@redhat.com> wrote: > > On 5/17/11 5:59 PM, Jiaying Zhang wrote: > > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks > > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE > > flag and then ftruncate the file to a size larger than the file's i_size, > > any allocated but unwritten blocks will be freed but the file size is set > > to the size that ftruncate specifies. > > > > Here is a simple test to reproduce the problem: > > 1. fallocate a 12k size file with KEEP_SIZE flag > > 2. write the first 4k > > 3. ftruncate the file to 8k > > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs > > shows the file has only the first written block left. > > To be honest I'm not 100% certain what the fiesystem -should- do in this case. > > If I go through that same sequence on xfs, I get 4k written / 8k unwritten: > > # xfs_bmap -vp testfile > testfile: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 > 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 > > size 8k: > # ls -l testfile > -rw-r--r-- 1 root root 8192 May 17 22:33 testfile > > and diskspace used 12k: > # du -hc testfile > 12K testfile > 12K total > > I think this is a different result from ext4, either with or without your patch. > > On ext4 I get size 8k, but only the first 4k mapped, as you say. I agree that truncating to a size larger than i_size is un-specified by POSIX. However, I think the problem with the current behavior is that we have an inconsistency between file's i_size and its extent tree. Now we have 8k i_size but the file has only 4k space allocated. That can confuse applications. Jiaying > > I don't recall when truncate is supposed to free fallocated blocks, and from what point? > > -Eric > > > Below is the proposed patch to fix the bug: > > > > ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). > > > > Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead > > of ext4_truncate(inode) when it needs to truncate an inode so that > > if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate > > to a size larger than the inode's i_size, we will only truncate the blocks > > beyond the specified truncate size instead of all of blocks beyond i_size. > > > > Signed-off-by: Jiaying Zhang <jiayingz@google.com> > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > index 3424e82..3bfad57 100644 > > --- a/fs/ext4/inode.c > > +++ b/fs/ext4/inode.c > > @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > > } > > } > > /* ext4_truncate will clear the flag */ > > - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) > > - ext4_truncate(inode); > > + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { > > + rc = vmtruncate(inode, attr->ia_size); > > + if (rc) > > + goto err_out; > > + } > > } > > > > if ((attr->ia_valid & ATTR_SIZE) && > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <BANLkTi=Yv_q820aHFa2wkCL-PnYNcZdWCQ@mail.gmail.com>]
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() [not found] ` <BANLkTi=Yv_q820aHFa2wkCL-PnYNcZdWCQ@mail.gmail.com> @ 2011-05-18 20:31 ` Eric Sandeen 2011-05-18 20:38 ` Jiaying Zhang 0 siblings, 1 reply; 22+ messages in thread From: Eric Sandeen @ 2011-05-18 20:31 UTC (permalink / raw) To: Jiaying Zhang; +Cc: tytso, linux-ext4 On 5/18/11 3:28 PM, Jiaying Zhang wrote: > > > On Tue, May 17, 2011 at 8:19 PM, Eric Sandeen <sandeen@redhat.com <mailto:sandeen@redhat.com>> wrote: > > On 5/17/11 5:59 PM, Jiaying Zhang wrote: > > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks > > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE > > flag and then ftruncate the file to a size larger than the file's i_size, > > any allocated but unwritten blocks will be freed but the file size is set > > to the size that ftruncate specifies. > > > > Here is a simple test to reproduce the problem: > > 1. fallocate a 12k size file with KEEP_SIZE flag > > 2. write the first 4k > > 3. ftruncate the file to 8k > > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs > > shows the file has only the first written block left. > > To be honest I'm not 100% certain what the fiesystem -should- do in this case. > > If I go through that same sequence on xfs, I get 4k written / 8k unwritten: > > # xfs_bmap -vp testfile > testfile: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 > 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 > > size 8k: > # ls -l testfile > -rw-r--r-- 1 root root 8192 May 17 22:33 testfile > > and diskspace used 12k: > # du -hc testfile > 12K testfile > 12K total > > I think this is a different result from ext4, either with or without your patch. > > On ext4 I get size 8k, but only the first 4k mapped, as you say. > > I agree that truncating to a size larger than i_size is un-specified by > POSIX. However, I think the problem with the current behavior is that > we have an inconsistency between file's i_size and its extent tree. > Now we have 8k i_size but the file has only 4k space allocated. That > can confuse applications. That's called "a sparse file" right? Apps should not be confused by that ... -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-18 20:31 ` Eric Sandeen @ 2011-05-18 20:38 ` Jiaying Zhang 0 siblings, 0 replies; 22+ messages in thread From: Jiaying Zhang @ 2011-05-18 20:38 UTC (permalink / raw) To: Eric Sandeen; +Cc: tytso, linux-ext4 On Wed, May 18, 2011 at 1:31 PM, Eric Sandeen <sandeen@redhat.com> wrote: > On 5/18/11 3:28 PM, Jiaying Zhang wrote: >> >> >> On Tue, May 17, 2011 at 8:19 PM, Eric Sandeen <sandeen@redhat.com <mailto:sandeen@redhat.com>> wrote: >> >> On 5/17/11 5:59 PM, Jiaying Zhang wrote: >> > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >> > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >> > flag and then ftruncate the file to a size larger than the file's i_size, >> > any allocated but unwritten blocks will be freed but the file size is set >> > to the size that ftruncate specifies. >> > >> > Here is a simple test to reproduce the problem: >> > 1. fallocate a 12k size file with KEEP_SIZE flag >> > 2. write the first 4k >> > 3. ftruncate the file to 8k >> > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >> > shows the file has only the first written block left. >> >> To be honest I'm not 100% certain what the fiesystem -should- do in this case. >> >> If I go through that same sequence on xfs, I get 4k written / 8k unwritten: >> >> # xfs_bmap -vp testfile >> testfile: >> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS >> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 >> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 >> >> size 8k: >> # ls -l testfile >> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile >> >> and diskspace used 12k: >> # du -hc testfile >> 12K testfile >> 12K total >> >> I think this is a different result from ext4, either with or without your patch. >> >> On ext4 I get size 8k, but only the first 4k mapped, as you say. >> >> I agree that truncating to a size larger than i_size is un-specified by >> POSIX. However, I think the problem with the current behavior is that >> we have an inconsistency between file's i_size and its extent tree. >> Now we have 8k i_size but the file has only 4k space allocated. That >> can confuse applications. > > That's called "a sparse file" right? Apps should not be confused by that ... But applications don't intend to create a sparse file in this case. They may create a lot of such files in this way with 'df' showing there is still plenty of free space available but as they start writing to these files they will fill up space and get ENOSPC unexpectedly. Jiaying > > -Eric > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-17 22:59 [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() Jiaying Zhang ` (2 preceding siblings ...) 2011-05-18 3:19 ` Eric Sandeen @ 2011-05-22 23:56 ` Ted Ts'o 2011-05-23 18:30 ` Jiaying Zhang 3 siblings, 1 reply; 22+ messages in thread From: Ted Ts'o @ 2011-05-22 23:56 UTC (permalink / raw) To: Jiaying Zhang; +Cc: linux-ext4 Here's a cleaner way of fixing the problem. vmtruncate() is now deprecated, so I'm using truncate_setsize() instead; in addition, I avoid calling ext4_truncate() twice in the case where EOFBLOCKS_FL. Since ext4_truncate() is a very heavyweight function, which requires manipulating the orphan list and taking i_data_sem, avoiding a double call of ext4_truncate() is a big win. - Ted commit bb5eabd2de1e9dfdeeac822174fba17f2d2a7f2b Author: Theodore Ts'o <tytso@mit.edu> Date: Sun May 22 19:45:01 2011 -0400 ext4: use truncate_setsize() unconditionally In commit c8d46e41 (ext4: Add flag to files with blocks intentionally past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate() before calling vmtruncate(). This caused any allocated but unwritten blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE flag to be dropped. This was done to make to make sure that EOFBLOCKS_FL would not be cleared while still leaving blocks past i_size allocated. This was not necessary, since ext4_truncate() guarantees that blocks past i_size will be dropped, even in the case where truncate() has increased i_size before calling ext4_truncate(). So fix this by removing the EOFBLOCKS_FL special case treatment in ext4_setattr(). In addition, use truncate_setsize() followed by a call to ext4_truncate() instead of using vmtruncate(). This is more efficient since it skips the call to inode_newsize_ok(), which has been checked already by inode_change_ok(). This is also in a win in the case where EOFBLOCKS_FL is set since it avoids calling ext4_truncate() twice. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index df3fb20..4ca8411 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5363,8 +5363,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) if (S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE && - (attr->ia_size < inode->i_size || - (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))) { + (attr->ia_size < inode->i_size)) { handle_t *handle; handle = ext4_journal_start(inode, 3); @@ -5398,14 +5397,13 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) goto err_out; } } - /* ext4_truncate will clear the flag */ - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) - ext4_truncate(inode); } if ((attr->ia_valid & ATTR_SIZE) && - attr->ia_size != i_size_read(inode)) - rc = vmtruncate(inode, attr->ia_size); + attr->ia_size != i_size_read(inode)) { + truncate_setsize(inode, attr->ia_size); + ext4_truncate(inode); + } if (!rc) { setattr_copy(inode, attr); ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() 2011-05-22 23:56 ` Ted Ts'o @ 2011-05-23 18:30 ` Jiaying Zhang 2011-05-23 19:19 ` [PATCH -v2] ext4: use truncate_setsize() unconditionally Theodore Ts'o 0 siblings, 1 reply; 22+ messages in thread From: Jiaying Zhang @ 2011-05-23 18:30 UTC (permalink / raw) To: Ted Ts'o; +Cc: ext4 development Hi Ted, On Sun, May 22, 2011 at 4:56 PM, Ted Ts'o <tytso@mit.edu> wrote: > Here's a cleaner way of fixing the problem. vmtruncate() is now > deprecated, so I'm using truncate_setsize() instead; in addition, I > avoid calling ext4_truncate() twice in the case where EOFBLOCKS_FL. > Since ext4_truncate() is a very heavyweight function, which requires > manipulating the orphan list and taking i_data_sem, avoiding a double > call of ext4_truncate() is a big win. > > - Ted > > commit bb5eabd2de1e9dfdeeac822174fba17f2d2a7f2b > Author: Theodore Ts'o <tytso@mit.edu> > Date: Sun May 22 19:45:01 2011 -0400 > > ext4: use truncate_setsize() unconditionally > > In commit c8d46e41 (ext4: Add flag to files with blocks intentionally > past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate() > before calling vmtruncate(). This caused any allocated but unwritten > blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE > flag to be dropped. This was done to make to make sure that > EOFBLOCKS_FL would not be cleared while still leaving blocks past > i_size allocated. This was not necessary, since ext4_truncate() > guarantees that blocks past i_size will be dropped, even in the case > where truncate() has increased i_size before calling ext4_truncate(). > > So fix this by removing the EOFBLOCKS_FL special case treatment in > ext4_setattr(). In addition, use truncate_setsize() followed by a > call to ext4_truncate() instead of using vmtruncate(). This is more > efficient since it skips the call to inode_newsize_ok(), which has > been checked already by inode_change_ok(). This is also in a win in > the case where EOFBLOCKS_FL is set since it avoids calling > ext4_truncate() twice. > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index df3fb20..4ca8411 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -5363,8 +5363,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > > if (S_ISREG(inode->i_mode) && > attr->ia_valid & ATTR_SIZE && > - (attr->ia_size < inode->i_size || > - (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))) { > + (attr->ia_size < inode->i_size)) { > handle_t *handle; > > handle = ext4_journal_start(inode, 3); > @@ -5398,14 +5397,13 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > goto err_out; > } > } > - /* ext4_truncate will clear the flag */ > - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) > - ext4_truncate(inode); > } > > if ((attr->ia_valid & ATTR_SIZE) && > - attr->ia_size != i_size_read(inode)) > - rc = vmtruncate(inode, attr->ia_size); > + attr->ia_size != i_size_read(inode)) { > + truncate_setsize(inode, attr->ia_size); > + ext4_truncate(inode); > + } I think your patch doesn't cover the case when we fallocate with KEEP_SIZE a 12k file, write 4k, and then truncate the file to 4k. In this case, attr->ia_size is equal to i_size but we still want to free any allocated but unwritten blocks during truncate. Jiaying > > if (!rc) { > setattr_copy(inode, attr); > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH -v2] ext4: use truncate_setsize() unconditionally 2011-05-23 18:30 ` Jiaying Zhang @ 2011-05-23 19:19 ` Theodore Ts'o 2011-05-23 20:22 ` Jiaying Zhang 2011-05-24 14:30 ` Eric Sandeen 0 siblings, 2 replies; 22+ messages in thread From: Theodore Ts'o @ 2011-05-23 19:19 UTC (permalink / raw) To: Ext4 Developers List; +Cc: jiayingz, Theodore Ts'o In commit c8d46e41 (ext4: Add flag to files with blocks intentionally past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate() before calling vmtruncate(). This caused any allocated but unwritten blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE flag to be dropped. This was done to make to make sure that EOFBLOCKS_FL would not be cleared while still leaving blocks past i_size allocated. This was not necessary, since ext4_truncate() guarantees that blocks past i_size will be dropped, even in the case where truncate() has increased i_size before calling ext4_truncate(). So fix this by removing the EOFBLOCKS_FL special case treatment in ext4_setattr(). In addition, use truncate_setsize() followed by a call to ext4_truncate() instead of using vmtruncate(). This is more efficient since it skips the call to inode_newsize_ok(), which has been checked already by inode_change_ok(). This is also in a win in the case where EOFBLOCKS_FL is set since it avoids calling ext4_truncate() twice. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> --- Jiayingz pointed out that in the case where we fallocate 12k, write 4k, and then truncate to 4k, we should discard the excess fallocate'd blocks. So if attr->ia_size == inode.i_size, we can skip the truncate_setsize() call, but if the EOFBLOCKS_FL flag is set, we should still call ext4_truncate(). fs/ext4/inode.c | 16 ++++++++-------- 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index df3fb20..2e95819 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5363,8 +5363,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) if (S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE && - (attr->ia_size < inode->i_size || - (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))) { + (attr->ia_size < inode->i_size)) { handle_t *handle; handle = ext4_journal_start(inode, 3); @@ -5398,14 +5397,15 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) goto err_out; } } - /* ext4_truncate will clear the flag */ - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) - ext4_truncate(inode); } - if ((attr->ia_valid & ATTR_SIZE) && - attr->ia_size != i_size_read(inode)) - rc = vmtruncate(inode, attr->ia_size); + if (attr->ia_valid & ATTR_SIZE) { + if (attr->ia_size != i_size_read(inode)) { + truncate_setsize(inode, attr->ia_size); + ext4_truncate(inode); + } else if (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)) + ext4_truncate(inode); + } if (!rc) { setattr_copy(inode, attr); -- 1.7.3.1 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH -v2] ext4: use truncate_setsize() unconditionally 2011-05-23 19:19 ` [PATCH -v2] ext4: use truncate_setsize() unconditionally Theodore Ts'o @ 2011-05-23 20:22 ` Jiaying Zhang 2011-05-24 14:30 ` Eric Sandeen 1 sibling, 0 replies; 22+ messages in thread From: Jiaying Zhang @ 2011-05-23 20:22 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Ext4 Developers List On Mon, May 23, 2011 at 12:19 PM, Theodore Ts'o <tytso@mit.edu> wrote: > In commit c8d46e41 (ext4: Add flag to files with blocks intentionally > past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate() > before calling vmtruncate(). This caused any allocated but unwritten > blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE > flag to be dropped. This was done to make to make sure that > EOFBLOCKS_FL would not be cleared while still leaving blocks past > i_size allocated. This was not necessary, since ext4_truncate() > guarantees that blocks past i_size will be dropped, even in the case > where truncate() has increased i_size before calling ext4_truncate(). > > So fix this by removing the EOFBLOCKS_FL special case treatment in > ext4_setattr(). In addition, use truncate_setsize() followed by a > call to ext4_truncate() instead of using vmtruncate(). This is more > efficient since it skips the call to inode_newsize_ok(), which has > been checked already by inode_change_ok(). This is also in a win in > the case where EOFBLOCKS_FL is set since it avoids calling > ext4_truncate() twice. > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > --- > Jiayingz pointed out that in the case where we fallocate 12k, write 4k, and > then truncate to 4k, we should discard the excess fallocate'd blocks. So if > attr->ia_size == inode.i_size, we can skip the truncate_setsize() call, but > if the EOFBLOCKS_FL flag is set, we should still call ext4_truncate(). > > fs/ext4/inode.c | 16 ++++++++-------- > 1 files changed, 8 insertions(+), 8 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index df3fb20..2e95819 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -5363,8 +5363,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > > if (S_ISREG(inode->i_mode) && > attr->ia_valid & ATTR_SIZE && > - (attr->ia_size < inode->i_size || > - (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))) { > + (attr->ia_size < inode->i_size)) { > handle_t *handle; > > handle = ext4_journal_start(inode, 3); > @@ -5398,14 +5397,15 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > goto err_out; > } > } > - /* ext4_truncate will clear the flag */ > - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) > - ext4_truncate(inode); > } > > - if ((attr->ia_valid & ATTR_SIZE) && > - attr->ia_size != i_size_read(inode)) > - rc = vmtruncate(inode, attr->ia_size); > + if (attr->ia_valid & ATTR_SIZE) { > + if (attr->ia_size != i_size_read(inode)) { > + truncate_setsize(inode, attr->ia_size); > + ext4_truncate(inode); > + } else if (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)) > + ext4_truncate(inode); > + } > > if (!rc) { > setattr_copy(inode, attr); > -- > 1.7.3.1 lgtm. Jiaying > > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH -v2] ext4: use truncate_setsize() unconditionally 2011-05-23 19:19 ` [PATCH -v2] ext4: use truncate_setsize() unconditionally Theodore Ts'o 2011-05-23 20:22 ` Jiaying Zhang @ 2011-05-24 14:30 ` Eric Sandeen 2011-05-24 22:06 ` Jiaying Zhang 1 sibling, 1 reply; 22+ messages in thread From: Eric Sandeen @ 2011-05-24 14:30 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Ext4 Developers List, jiayingz On 5/23/11 2:19 PM, Theodore Ts'o wrote: > In commit c8d46e41 (ext4: Add flag to files with blocks intentionally > past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate() > before calling vmtruncate(). This caused any allocated but unwritten > blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE > flag to be dropped. This was done to make to make sure that > EOFBLOCKS_FL would not be cleared while still leaving blocks past > i_size allocated. This was not necessary, since ext4_truncate() > guarantees that blocks past i_size will be dropped, even in the case > where truncate() has increased i_size before calling ext4_truncate(). > > So fix this by removing the EOFBLOCKS_FL special case treatment in > ext4_setattr(). In addition, use truncate_setsize() followed by a > call to ext4_truncate() instead of using vmtruncate(). This is more > efficient since it skips the call to inode_newsize_ok(), which has > been checked already by inode_change_ok(). This is also in a win in > the case where EOFBLOCKS_FL is set since it avoids calling > ext4_truncate() twice. > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > --- > Jiayingz pointed out that in the case where we fallocate 12k, write 4k, and > then truncate to 4k, we should discard the excess fallocate'd blocks. So if > attr->ia_size == inode.i_size, we can skip the truncate_setsize() call, but > if the EOFBLOCKS_FL flag is set, we should still call ext4_truncate(). are there xfstests which cover this explicitly? It should be simple to write. If filesystem behavior differs we can always make ext4-only tests. -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH -v2] ext4: use truncate_setsize() unconditionally 2011-05-24 14:30 ` Eric Sandeen @ 2011-05-24 22:06 ` Jiaying Zhang 2011-05-24 22:31 ` Eric Sandeen 0 siblings, 1 reply; 22+ messages in thread From: Jiaying Zhang @ 2011-05-24 22:06 UTC (permalink / raw) To: Eric Sandeen; +Cc: Theodore Ts'o, Ext4 Developers List On Tue, May 24, 2011 at 7:30 AM, Eric Sandeen <sandeen@redhat.com> wrote: > On 5/23/11 2:19 PM, Theodore Ts'o wrote: >> In commit c8d46e41 (ext4: Add flag to files with blocks intentionally >> past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate() >> before calling vmtruncate(). This caused any allocated but unwritten >> blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE >> flag to be dropped. This was done to make to make sure that >> EOFBLOCKS_FL would not be cleared while still leaving blocks past >> i_size allocated. This was not necessary, since ext4_truncate() >> guarantees that blocks past i_size will be dropped, even in the case >> where truncate() has increased i_size before calling ext4_truncate(). >> >> So fix this by removing the EOFBLOCKS_FL special case treatment in >> ext4_setattr(). In addition, use truncate_setsize() followed by a >> call to ext4_truncate() instead of using vmtruncate(). This is more >> efficient since it skips the call to inode_newsize_ok(), which has >> been checked already by inode_change_ok(). This is also in a win in >> the case where EOFBLOCKS_FL is set since it avoids calling >> ext4_truncate() twice. >> >> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> >> --- >> Jiayingz pointed out that in the case where we fallocate 12k, write 4k, and >> then truncate to 4k, we should discard the excess fallocate'd blocks. So if >> attr->ia_size == inode.i_size, we can skip the truncate_setsize() call, but >> if the EOFBLOCKS_FL flag is set, we should still call ext4_truncate(). > > are there xfstests which cover this explicitly? It should be simple to write. > Vivek has written a xfstest to cover this and more fallocate/truncate cases: http://old.nabble.com/-PATCH--xfstests%3A-test-fallocate%2C-write%2C-ftruncate-combinations.-to31666685.html#a31666685 Jiaying > If filesystem behavior differs we can always make ext4-only tests. > > -Eric > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH -v2] ext4: use truncate_setsize() unconditionally 2011-05-24 22:06 ` Jiaying Zhang @ 2011-05-24 22:31 ` Eric Sandeen 0 siblings, 0 replies; 22+ messages in thread From: Eric Sandeen @ 2011-05-24 22:31 UTC (permalink / raw) To: Jiaying Zhang; +Cc: Theodore Ts'o, Ext4 Developers List On 5/24/11 5:06 PM, Jiaying Zhang wrote: > On Tue, May 24, 2011 at 7:30 AM, Eric Sandeen <sandeen@redhat.com> wrote: >> On 5/23/11 2:19 PM, Theodore Ts'o wrote: >>> In commit c8d46e41 (ext4: Add flag to files with blocks intentionally >>> past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate() >>> before calling vmtruncate(). This caused any allocated but unwritten >>> blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE >>> flag to be dropped. This was done to make to make sure that >>> EOFBLOCKS_FL would not be cleared while still leaving blocks past >>> i_size allocated. This was not necessary, since ext4_truncate() >>> guarantees that blocks past i_size will be dropped, even in the case >>> where truncate() has increased i_size before calling ext4_truncate(). >>> >>> So fix this by removing the EOFBLOCKS_FL special case treatment in >>> ext4_setattr(). In addition, use truncate_setsize() followed by a >>> call to ext4_truncate() instead of using vmtruncate(). This is more >>> efficient since it skips the call to inode_newsize_ok(), which has >>> been checked already by inode_change_ok(). This is also in a win in >>> the case where EOFBLOCKS_FL is set since it avoids calling >>> ext4_truncate() twice. >>> >>> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> >>> --- >>> Jiayingz pointed out that in the case where we fallocate 12k, write 4k, and >>> then truncate to 4k, we should discard the excess fallocate'd blocks. So if >>> attr->ia_size == inode.i_size, we can skip the truncate_setsize() call, but >>> if the EOFBLOCKS_FL flag is set, we should still call ext4_truncate(). >> >> are there xfstests which cover this explicitly? It should be simple to write. >> > Vivek has written a xfstest to cover this and more fallocate/truncate cases: > http://old.nabble.com/-PATCH--xfstests%3A-test-fallocate%2C-write%2C-ftruncate-combinations.-to31666685.html#a31666685 ah, right - ok, thanks! Sorry for not keeping up. -Eric > Jiaying > >> If filesystem behavior differs we can always make ext4-only tests. >> >> -Eric >> ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2011-05-24 22:31 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-05-17 22:59 [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr() Jiaying Zhang 2011-05-18 2:46 ` Yongqiang Yang 2011-05-18 2:56 ` Yongqiang Yang 2011-05-18 3:27 ` Yongqiang Yang 2011-05-18 3:19 ` Eric Sandeen 2011-05-18 5:35 ` Andreas Dilger 2011-05-18 20:32 ` Jiaying Zhang 2011-05-18 20:45 ` Andreas Dilger 2011-05-18 20:57 ` Jiaying Zhang 2011-05-18 6:13 ` Dave Chinner 2011-05-18 14:05 ` Eric Sandeen 2011-05-18 20:42 ` Jiaying Zhang 2011-05-18 20:29 ` Jiaying Zhang [not found] ` <BANLkTi=Yv_q820aHFa2wkCL-PnYNcZdWCQ@mail.gmail.com> 2011-05-18 20:31 ` Eric Sandeen 2011-05-18 20:38 ` Jiaying Zhang 2011-05-22 23:56 ` Ted Ts'o 2011-05-23 18:30 ` Jiaying Zhang 2011-05-23 19:19 ` [PATCH -v2] ext4: use truncate_setsize() unconditionally Theodore Ts'o 2011-05-23 20:22 ` Jiaying Zhang 2011-05-24 14:30 ` Eric Sandeen 2011-05-24 22:06 ` Jiaying Zhang 2011-05-24 22:31 ` Eric Sandeen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).