* Question on fallocate/ftruncate sequence
@ 2009-07-20 16:36 Curt Wohlgemuth
  2009-07-20 22:45 ` Eric Sandeen
  0 siblings, 1 reply; 42+ messages in thread
From: Curt Wohlgemuth @ 2009-07-20 16:36 UTC
To: ext4 development

We've recently seen some interesting behavior with ftruncate()
following a fallocate() call on ext4, and would like to know if this
is intended or not.

The sequence used from user space:

    fd = open()
    fallocate(fd, FALLOC_FL_KEEP_SIZE, 8MB)
    write(fd, buf, 64KB)
    ftruncate(fd, 64KB)
    close(fd)

Since inode_setattr() only does something if the input size is not the
same as inode->i_size, the ftruncate() call above does nothing; no
blocks from the fallocate() are freed up.

Yes, removing the KEEP_SIZE flag gets the behavior I'm expecting, but
KEEP_SIZE is quite convenient in recovering from errors.

I would have thought that ftruncate() would alter i_disksize even if
this value is different from i_size.

Any comments? I looked at other Linux file systems, and none that I
saw that support fallocate() have this issue.

Thanks,
Curt
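The sequence in the report can be sketched as a small C helper. This is only an illustration under stated assumptions: the helper name `run_sequence`, the two size macros, and the file mode are made up for the example, and the real `fallocate()` call takes an explicit offset and length that the shorthand above leaves out.

```c
#define _GNU_SOURCE             /* exposes fallocate() and FALLOC_FL_KEEP_SIZE */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define PREALLOC_LEN (8L * 1024 * 1024)   /* 8MB, as in the report */
#define DATA_LEN     (64L * 1024)         /* 64KB, as in the report */

/*
 * Hypothetical helper: run the reported sequence on `path` and fill `st`
 * with the file's final metadata. Returns 0 on success and -1 if any step
 * fails (for example, when the filesystem does not support fallocate()).
 */
static int run_sequence(const char *path, struct stat *st)
{
    static char buf[DATA_LEN];
    int ret = -1;
    int fd;

    assert(path != NULL && st != NULL);

    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    memset(buf, 'x', sizeof(buf));

    /* Preallocate 8MB without changing the visible file size. */
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, PREALLOC_LEN) != 0)
        goto out;
    /* Write 64KB at offset 0, so the file size becomes 64KB. */
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        goto out;
    /* Truncate to the size the file already has. */
    if (ftruncate(fd, DATA_LEN) != 0)
        goto out;
    /* Capture st_size (bytes) and st_blocks (512-byte units). */
    if (fstat(fd, st) != 0)
        goto out;
    ret = 0;
out:
    close(fd);
    return ret;
}
```

After a successful call, comparing `st.st_blocks * 512` with `st.st_size` shows whether the preallocated space survived the truncate; per the report, on the affected ext4 code the blocks would still be charged to the file.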
* Re: Question on fallocate/ftruncate sequence
From: Eric Sandeen @ 2009-07-20 22:45 UTC
To: Curt Wohlgemuth; +Cc: ext4 development

Curt Wohlgemuth wrote:
> We've recently seen some interesting behavior with ftruncate()
> following a fallocate() call on ext4, and would like to know if this
> is intended or not.
>
> The sequence used from user space:
>
>     fd = open()
>     fallocate(fd, FALLOC_FL_KEEP_SIZE, 8MB)
>     write(fd, buf, 64KB)
>     ftruncate(fd, 64KB)
>     close(fd)
>
> Since inode_setattr() only does something if the input size is not the
> same as inode->i_size, the ftruncate() call above does nothing; no
> blocks from the fallocate() are freed up.
>
> Yes, removing the KEEP_SIZE flag gets the behavior I'm expecting, but
> KEEP_SIZE is quite convenient in recovering from errors.
>
> I would have thought that ftruncate() would alter i_disksize even if
> this value is different from i_size.
>
> Any comments? I looked at other Linux file systems, and none that I
> saw that support fallocate() have this issue.
>
> Thanks,
> Curt

Yep, I think you've found a bug, I will look into this soon unless
someone beats me to it :)

-Eric
* Re: Question on fallocate/ftruncate sequence
From: Frank Mayhar @ 2009-07-21 21:29 UTC
To: Eric Sandeen; +Cc: Curt Wohlgemuth, ext4 development

On Mon, 2009-07-20 at 17:45 -0500, Eric Sandeen wrote:
> Curt Wohlgemuth wrote:
> > We've recently seen some interesting behavior with ftruncate()
> > following a fallocate() call on ext4, and would like to know if this
> > is intended or not.
> >
> > The sequence used from user space:
> >
> >     fd = open()
> >     fallocate(fd, FALLOC_FL_KEEP_SIZE, 8MB)
> >     write(fd, buf, 64KB)
> >     ftruncate(fd, 64KB)
> >     close(fd)
> >
> > Since inode_setattr() only does something if the input size is not the
> > same as inode->i_size, the ftruncate() call above does nothing; no
> > blocks from the fallocate() are freed up.
> >
> > Yes, removing the KEEP_SIZE flag gets the behavior I'm expecting, but
> > KEEP_SIZE is quite convenient in recovering from errors.
> >
> > I would have thought that ftruncate() would alter i_disksize even if
> > this value is different from i_size.
> >
> > Any comments? I looked at other Linux file systems, and none that I
> > saw that support fallocate() have this issue.
> >
> > Thanks,
> > Curt
>
> Yep, I think you've found a bug, I will look into this soon unless
> someone beats me to it :)

I've spent a little while today digging into this. My guess (only a
guess at this point until I have a chance to prove it) is that
i_disksize should be updated by fallocate() even when KEEP_SIZE is
specified. It's currently not updated in that case. It's my
understanding that i_disksize should be the real allocation, right?
While i_size is the size that has actually been used? If so, then
setting i_disksize is probably what's missing.
--
Frank Mayhar <fmayhar@google.com>
Google, Inc.
* Re: Question on fallocate/ftruncate sequence
From: Andreas Dilger @ 2009-07-21 21:54 UTC
To: Frank Mayhar; +Cc: Eric Sandeen, Curt Wohlgemuth, ext4 development

On Jul 21, 2009 14:29 -0700, Frank Mayhar wrote:
> I've spent a little while today digging into this. My guess (only a
> guess at this point until I have a chance to prove it) is that
> i_disksize should be updated by fallocate() even when KEEP_SIZE is
> specified. It's currently not updated in that case.

No, that isn't correct. The intent of KEEP_SIZE is to allow fallocate
to preallocate blocks beyond the EOF, so that it doesn't affect the
file data visible to userspace, but can avoid fragmentation from e.g.
log files or mbox files.

The i_disksize variable is just to handle the lag in updating the on-disk
file size during truncate, because the VFS updates i_size to indicate a
truncate, but in order to handle the truncation of files within finite
transaction sizes the on-disk file size needs to be shrunk incrementally.

> It's my
> understanding that i_disksize should be the real allocation, right?
> While i_size is the size that has actually been used? If so, then
> setting i_disksize is probably what's missing.

The difference is that i_size is in the VFS inode, and represents the
current in-memory state, while i_disksize is in the ext4 private inode
data and represents what is currently in the on-disk inode. If we
were to change i_disksize then on the next reboot the filesize would
become whatever is stored in i_disksize.

That said, we might need to have some kind of flag in the on-disk
inode to indicate that it was preallocated beyond EOF. Otherwise,
e2fsck will try and extend the file size to match the block count,
which isn't correct. We could also use this flag to determine if
truncate needs to be run on the inode even if the new size is the
same.

As a workaround for now, you could truncate to (size+1), then again
truncate to (size) and it should have the desired effect.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
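The workaround suggested here (truncate to size+1, then back to size) could look roughly like the following in user space. This is a hedged sketch: `truncate_twice` and `demo_truncate_twice` are names invented for the example, and the demo only checks that the file ends up at the requested size, not that any preallocated blocks were released.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper for the suggested workaround: change the size by one
 * byte so that the truncate path runs, then restore the intended size.
 * Returns 0 on success, -1 on error.
 */
static int truncate_twice(int fd, off_t size)
{
    assert(size >= 0);
    if (ftruncate(fd, size + 1) != 0)
        return -1;
    return ftruncate(fd, size);
}

/*
 * Small demo: create `path`, write 10 bytes, apply the workaround and
 * return the resulting file size (or -1 on any failure).
 */
static long demo_truncate_twice(const char *path)
{
    static const char data[] = "0123456789";   /* 10 payload bytes */
    struct stat st;
    long result = -1;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, data, 10) == 10 &&
        truncate_twice(fd, 10) == 0 &&
        fstat(fd, &st) == 0)
        result = (long)st.st_size;
    close(fd);
    unlink(path);
    return result;
}
```

The cost is a second size-changing truncate per file, which is the overhead objected to in the replies that follow.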
* Re: Question on fallocate/ftruncate sequence
From: Frank Mayhar @ 2009-07-22 16:24 UTC
To: Andreas Dilger; +Cc: Eric Sandeen, Curt Wohlgemuth, ext4 development

On Tue, 2009-07-21 at 15:54 -0600, Andreas Dilger wrote:
> On Jul 21, 2009 14:29 -0700, Frank Mayhar wrote:
> > I've spent a little while today digging into this. My guess (only a
> > guess at this point until I have a chance to prove it) is that
> > i_disksize should be updated by fallocate() even when KEEP_SIZE is
> > specified. It's currently not updated in that case.
>
> No, that isn't correct. The intent of KEEP_SIZE is to allow fallocate
> to preallocate blocks beyond the EOF, so that it doesn't affect the
> file data visible to userspace, but can avoid fragmentation from e.g.
> log files or mbox files.
>
> The i_disksize variable is just to handle the lag in updating the on-disk
> file size during truncate, because the VFS updates i_size to indicate a
> truncate, but in order to handle the truncation of files within finite
> transaction sizes the on-disk file size needs to be shrunk incrementally.

Okay, thanks, this makes this much more clear.

It does sound like there needs to be a flag somewhere (probably in the
on-disk inode) that indicates that there are allocated blocks beyond
EOF, as you say. Then use that in ftruncate(). We would really like to
avoid your workaround for performance reasons.
--
Frank Mayhar <fmayhar@google.com>
Google, Inc.
* Re: Question on fallocate/ftruncate sequence
From: Frank Mayhar @ 2009-07-22 23:10 UTC
To: Andreas Dilger; +Cc: Eric Sandeen, Curt Wohlgemuth, ext4 development

On Tue, 2009-07-21 at 15:54 -0600, Andreas Dilger wrote:
> No, that isn't correct. The intent of KEEP_SIZE is to allow fallocate
> to preallocate blocks beyond the EOF, so that it doesn't affect the
> file data visible to userspace, but can avoid fragmentation from e.g.
> log files or mbox files.
>
> The i_disksize variable is just to handle the lag in updating the on-disk
> file size during truncate, because the VFS updates i_size to indicate a
> truncate, but in order to handle the truncation of files within finite
> transaction sizes the on-disk file size needs to be shrunk incrementally.
>
> The difference is that i_size is in the VFS inode, and represents the
> current in-memory state, while i_disksize is in the ext4 private inode
> data and represents what is currently in the on-disk inode.

Okay, this makes sense, thanks.

> That said, we might need to have some kind of flag in the on-disk
> inode to indicate that it was preallocated beyond EOF. Otherwise,
> e2fsck will try and extend the file size to match the block count,
> which isn't correct. We could also use this flag to determine if
> truncate needs to be run on the inode even if the new size is the
> same.

After chatting with Curt about this today, it sounds like this needs two
things. One is your flag in the on-disk inode, set in fallocate() to
indicate that it has an allocation past EOF. E2fsck would use this to
avoid "fixing" the file size to match the block count. Truncate would
use this to notice that there are blocks allocated past i_size and get
rid of them. It would be cleared by truncate or by ext4_get_blocks when
using the last block of such an allocation.

Does this make sense? Have I missed anything?

> As a workaround for now, you could truncate to (size+1), then again
> truncate to (size) and it should have the desired effect.

Well, as bad as fallocate()/truncate() is, doing two truncates is worse,
I think.
--
Frank Mayhar <fmayhar@google.com>
Google, Inc.
* Re: Question on fallocate/ftruncate sequence
From: Eric Sandeen @ 2009-07-23 3:05 UTC
To: Frank Mayhar; +Cc: Andreas Dilger, Curt Wohlgemuth, ext4 development

Frank Mayhar wrote:
> On Tue, 2009-07-21 at 15:54 -0600, Andreas Dilger wrote:

...

>> That said, we might need to have some kind of flag in the on-disk
>> inode to indicate that it was preallocated beyond EOF. Otherwise,
>> e2fsck will try and extend the file size to match the block count,
>> which isn't correct. We could also use this flag to determine if
>> truncate needs to be run on the inode even if the new size is the
>> same.
>
> After chatting with Curt about this today, it sounds like this needs two
> things. One is your flag in the on-disk inode, set in fallocate() to
> indicate that it has an allocation past EOF. E2fsck would use this to
> avoid "fixing" the file size to match the block count. Truncate would
> use this to notice that there are blocks allocated past i_size and get
> rid of them. It would be cleared by truncate or by ext4_get_blocks when
> using the last block of such an allocation.
>
> Does this make sense? Have I missed anything?

I guess I'm not totally sold on the new on-disk flag; we can work out
blocks past EOF w/o needing a new flag can't we?

-eric
* Re: Question on fallocate/ftruncate sequence
From: Frank Mayhar @ 2009-07-23 16:27 UTC
To: Eric Sandeen; +Cc: Andreas Dilger, Curt Wohlgemuth, ext4 development

On Wed, 2009-07-22 at 22:05 -0500, Eric Sandeen wrote:
> Frank Mayhar wrote:
> > On Tue, 2009-07-21 at 15:54 -0600, Andreas Dilger wrote:
>
> ...
>
> >> That said, we might need to have some kind of flag in the on-disk
> >> inode to indicate that it was preallocated beyond EOF. Otherwise,
> >> e2fsck will try and extend the file size to match the block count,
> >> which isn't correct. We could also use this flag to determine if
> >> truncate needs to be run on the inode even if the new size is the
> >> same.
> >
> > After chatting with Curt about this today, it sounds like this needs two
> > things. One is your flag in the on-disk inode, set in fallocate() to
> > indicate that it has an allocation past EOF. E2fsck would use this to
> > avoid "fixing" the file size to match the block count. Truncate would
> > use this to notice that there are blocks allocated past i_size and get
> > rid of them. It would be cleared by truncate or by ext4_get_blocks when
> > using the last block of such an allocation.
> >
> > Does this make sense? Have I missed anything?
>
> I guess I'm not totally sold on the new on-disk flag; we can work out
> blocks past EOF w/o needing a new flag can't we?

It's on-disk because e2fsck needs it to know when not to extend i_size
to the actual allocated length of the file. Were it not for that we
could easily solve the fallocate/truncate problem with an in-memory flag
only.
--
Frank Mayhar <fmayhar@google.com>
Google, Inc.
* Re: Question on fallocate/ftruncate sequence
From: Eric Sandeen @ 2009-07-23 17:00 UTC
To: Frank Mayhar; +Cc: Andreas Dilger, Curt Wohlgemuth, ext4 development

Frank Mayhar wrote:
> On Wed, 2009-07-22 at 22:05 -0500, Eric Sandeen wrote:
>> Frank Mayhar wrote:
>>> On Tue, 2009-07-21 at 15:54 -0600, Andreas Dilger wrote:
>> ...
>>
>>>> That said, we might need to have some kind of flag in the on-disk
>>>> inode to indicate that it was preallocated beyond EOF. Otherwise,
>>>> e2fsck will try and extend the file size to match the block count,
>>>> which isn't correct. We could also use this flag to determine if
>>>> truncate needs to be run on the inode even if the new size is the
>>>> same.
>>> After chatting with Curt about this today, it sounds like this needs two
>>> things. One is your flag in the on-disk inode, set in fallocate() to
>>> indicate that it has an allocation past EOF. E2fsck would use this to
>>> avoid "fixing" the file size to match the block count. Truncate would
>>> use this to notice that there are blocks allocated past i_size and get
>>> rid of them. It would be cleared by truncate or by ext4_get_blocks when
>>> using the last block of such an allocation.
>>>
>>> Does this make sense? Have I missed anything?
>> I guess I'm not totally sold on the new on-disk flag; we can work out
>> blocks past EOF w/o needing a new flag can't we?
>
> It's on-disk because e2fsck needs it to know when not to extend i_size
> to the actual allocated length of the file. Were it not for that we
> could easily solve the fallocate/truncate problem with an in-memory flag
> only.

Sorry, I skimmed too fast and skipped over the fsck part. But:

# mkfs.ext4 /dev/sdb3
mke2fs 1.41.5 (23-Apr-2009)
...
# mount /dev/sdb3 /mnt/test
# touch /mnt/test/testfile
# /root/fallocate -n -l 16m /mnt/test/testfile
# ls -l /mnt/test/testfile
-rw-r--r-- 1 root root 0 Jul 23 12:13 /mnt/test/testfile
# du -h /mnt/test/testfile
16M     /mnt/test/testfile
# umount /mnt/test
# e2fsck -f /dev/sdb3
e2fsck 1.41.5 (23-Apr-2009)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb3: 12/244800 files (0.0% non-contiguous), 37766/977956 blocks

there doesn't seem to be a problem in fsck w/ block past EOF, or am I
missing something else?

-Eric
* Re: Question on fallocate/ftruncate sequence
From: Frank Mayhar @ 2009-07-23 18:05 UTC
To: Eric Sandeen; +Cc: Andreas Dilger, Curt Wohlgemuth, ext4 development

On Thu, 2009-07-23 at 12:00 -0500, Eric Sandeen wrote:
> Sorry, I skimmed too fast and skipped over the fsck part. But:
>
> # mkfs.ext4 /dev/sdb3
> mke2fs 1.41.5 (23-Apr-2009)
> ...
> # mount /dev/sdb3 /mnt/test
> # touch /mnt/test/testfile
> # /root/fallocate -n -l 16m /mnt/test/testfile
> # ls -l /mnt/test/testfile
> -rw-r--r-- 1 root root 0 Jul 23 12:13 /mnt/test/testfile
> # du -h /mnt/test/testfile
> 16M     /mnt/test/testfile
> # umount /mnt/test
> # e2fsck -f /dev/sdb3
> e2fsck 1.41.5 (23-Apr-2009)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/sdb3: 12/244800 files (0.0% non-contiguous), 37766/977956 blocks
>
> there doesn't seem to be a problem in fsck w/ block past EOF, or am I
> missing something else?

I was taking Andreas' word for it but now that you mention it, I see the
same thing. Andreas, did you have a specific case in mind?
--
Frank Mayhar <fmayhar@google.com>
Google, Inc.
* Re: Question on fallocate/ftruncate sequence
From: Andreas Dilger @ 2009-07-23 21:56 UTC
To: Frank Mayhar; +Cc: Eric Sandeen, Curt Wohlgemuth, ext4 development

On Jul 23, 2009 11:05 -0700, Frank Mayhar wrote:
> On Thu, 2009-07-23 at 12:00 -0500, Eric Sandeen wrote:
> > Sorry, I skimmed too fast and skipped over the fsck part. But:
> >
> > # touch /mnt/test/testfile
> > # /root/fallocate -n -l 16m /mnt/test/testfile
> > # ls -l /mnt/test/testfile
> > -rw-r--r-- 1 root root 0 Jul 23 12:13 /mnt/test/testfile
> > # du -h /mnt/test/testfile
> > 16M     /mnt/test/testfile
> >
> > there doesn't seem to be a problem in fsck w/ block past EOF, or am I
> > missing something else?
>
> I was taking Andreas' word for it but now that you mention it, I see the
> same thing. Andreas, did you have a specific case in mind?

Ted and I had discussed this in the past, maybe he fixed e2fsck to not
change the file size when there are blocks allocated beyond EOF. Having
a flag wouldn't be a terrible idea, IMHO, so that e2fsck can make a
better decision on whether the size or the blocks count are more correct.
I'm not dead set on it.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* Re: Question on fallocate/ftruncate sequence
From: Frank Mayhar @ 2009-07-23 22:46 UTC
To: Andreas Dilger; +Cc: Eric Sandeen, Curt Wohlgemuth, ext4 development

On Thu, 2009-07-23 at 15:56 -0600, Andreas Dilger wrote:
> On Jul 23, 2009 11:05 -0700, Frank Mayhar wrote:
> > On Thu, 2009-07-23 at 12:00 -0500, Eric Sandeen wrote:
> > > Sorry, I skimmed too fast and skipped over the fsck part. But:
> > >
> > > # touch /mnt/test/testfile
> > > # /root/fallocate -n -l 16m /mnt/test/testfile
> > > # ls -l /mnt/test/testfile
> > > -rw-r--r-- 1 root root 0 Jul 23 12:13 /mnt/test/testfile
> > > # du -h /mnt/test/testfile
> > > 16M     /mnt/test/testfile
> > >
> > > there doesn't seem to be a problem in fsck w/ block past EOF, or am I
> > > missing something else?
> >
> > I was taking Andreas' word for it but now that you mention it, I see the
> > same thing. Andreas, did you have a specific case in mind?
>
> Ted and I had discussed this in the past, maybe he fixed e2fsck to not
> change the file size when there are blocks allocated beyond EOF. Having
> a flag wouldn't be a terrible idea, IMHO, so that e2fsck can make a
> better decision on whether the size or the blocks count are more correct.
> I'm not dead set on it.

For the moment I'm going to table the e2fsck change and make the flag
memory-only. It'll be easy enough to change this if and when you guys
come to an agreement about what is right.

As for the flag itself, I'll pick a bit that doesn't conflict with
anything else and leave reconciling the already-conflicting bits to you
guys.
--
Frank Mayhar <fmayhar@google.com>
Google, Inc.
* Re: Question on fallocate/ftruncate sequence
From: Jiaying Zhang @ 2009-08-28 18:42 UTC
To: Frank Mayhar; +Cc: Andreas Dilger, Eric Sandeen, Curt Wohlgemuth, ext4 development

Sorry for joining the conversation late. Frank and I had a discussion on
this problem this morning. We wonder whether we can just add a check on
whether i_blocks is consistent with i_size during truncate. Here is the
patch I tried, and it seems to have solved the problem. I.e., the space
reserved in fallocate(KEEP_SIZE) is now freed in the next truncate.

--- git-linux/fs/attr.c	2009-05-20 18:05:55.000000000 -0700
+++ linux-2.6.30.5/fs/attr.c	2009-08-27 14:34:48.000000000 -0700
@@ -68,7 +68,8 @@ int inode_setattr(struct inode * inode,
 	unsigned int ia_valid = attr->ia_valid;

 	if (ia_valid & ATTR_SIZE &&
-	    attr->ia_size != i_size_read(inode)) {
+	    (attr->ia_size != i_size_read(inode) ||
+	    attr->ia_size >> 9 < inode->i_blocks - 1)) {
 		int error = vmtruncate(inode, attr->ia_size);
 		if (error)
 			return error;

One thing I am not sure about is whether adding this check in
inode_setattr may cause problems in other cases. I see that
inode_setattr is called in many places besides ftruncate.
Any opinions on this proposed solution?

Jiaying
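The condition added by the patch can be tried out in isolation with a small user-space model. This is a sketch rather than kernel code: the function name `patched_check_truncates` is made up, and the sample values assume that i_blocks counts only data blocks, in 512-byte units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * User-space model of the patched condition in inode_setattr().
 * ia_size and i_size are in bytes; i_blocks is in 512-byte sectors.
 * Assumes i_blocks >= 1 so that the unsigned subtraction cannot wrap.
 */
static bool patched_check_truncates(uint64_t ia_size, uint64_t i_size,
                                    uint64_t i_blocks)
{
    assert(i_blocks >= 1);
    return ia_size != i_size || (ia_size >> 9) < i_blocks - 1;
}
```

For the sequence from the original report (64KB of data inside an 8MB preallocation), the requested size equals i_size, but 64KB >> 9 is 128 sectors while i_blocks would be about 16384, so the check triggers the truncate. For a plain 64KB file with exactly 128 sectors charged, it does not.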
* Re: Question on fallocate/ftruncate sequence
From: Andreas Dilger @ 2009-08-28 19:40 UTC
To: Jiaying Zhang; +Cc: Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development

On Aug 28, 2009 11:42 -0700, Jiaying Zhang wrote:
> Sorry for joining the conversation late. Frank and I had a discussion on this
> problem this morning. We wonder whether we can just add the checking
> on whether i_blocks is consistent with i_size during truncate. Here is the
> patch I tried and it seems to have solved the problem. I.e., the space
> reserved in fallocate(KEEP_SIZE) is now freed in the next truncate.
>
> --- git-linux/fs/attr.c	2009-05-20 18:05:55.000000000 -0700
> +++ linux-2.6.30.5/fs/attr.c	2009-08-27 14:34:48.000000000 -0700
> @@ -68,7 +68,8 @@ int inode_setattr(struct inode * inode,
>  	unsigned int ia_valid = attr->ia_valid;
>
>  	if (ia_valid & ATTR_SIZE &&
> -	    attr->ia_size != i_size_read(inode)) {
> +	    (attr->ia_size != i_size_read(inode) ||
> +	    attr->ia_size >> 9 < inode->i_blocks - 1)) {
>  		int error = vmtruncate(inode, attr->ia_size);
>  		if (error)
>  			return error;

This isn't really correct, however, because i_blocks also counts
non-data blocks (indirect/index, EA, etc.), so even for small files
with ACLs i_blocks may always be larger than ia_size >> 9, and
for ext2/3 at least this will ALWAYS be true for files > 48kB in size.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
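The 48kB figure can be checked with a little arithmetic. The sketch below assumes a 4KB block size (the message does not state one) and relies on the ext2/3 inode having 12 direct block pointers; the function names and the simplified accounting (no holes, no EA block, at most one single-indirect block) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NDIR_BLOCKS 12u   /* direct block pointers in an ext2/3 inode */

/* Largest file size in bytes that needs no indirect block. */
static uint64_t direct_limit(uint32_t block_size)
{
    return (uint64_t)NDIR_BLOCKS * block_size;
}

/*
 * 512-byte sectors charged to a fully allocated ext2/3 file of `size` bytes.
 * Simplified: covers only files that fit in the direct blocks plus one
 * single-indirect block, with no holes and no extended-attribute block.
 */
static uint64_t sectors_used(uint64_t size, uint32_t block_size)
{
    uint64_t data = (size + block_size - 1) / block_size;   /* data blocks */
    uint64_t meta = data > NDIR_BLOCKS ? 1 : 0;              /* indirect block */

    assert(data <= NDIR_BLOCKS + block_size / 4);  /* single-indirect range only */
    return (data + meta) * (block_size / 512);
}
```

With 4KB blocks, `direct_limit` gives 48KB. A 52KB file uses 13 data blocks plus 1 indirect block, which is 112 sectors, while 52KB >> 9 is 104. Since 104 < 112 - 1, the patched condition would fire on every setattr of such a file even though nothing was preallocated.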
* Re: Question on fallocate/ftruncate sequence 2009-08-28 19:40 ` Andreas Dilger @ 2009-08-28 21:44 ` Jiaying Zhang 2009-08-28 22:14 ` Andreas Dilger 0 siblings, 1 reply; 42+ messages in thread From: Jiaying Zhang @ 2009-08-28 21:44 UTC (permalink / raw) To: Andreas Dilger Cc: Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development On Fri, Aug 28, 2009 at 12:40 PM, Andreas Dilger<adilger@sun.com> wrote: > On Aug 28, 2009 11:42 -0700, Jiaying Zhang wrote: >> Sorry for joining the conversation late. Frank and I had a discussion on this >> problem this morning. We wonder whether we can just add the checking >> on whether i_blocks is consistent with i_size during truncate. Here is the >> patch I tried and it seems to have solved the problem. I.e., the space >> reserved in fallocate(KEEP_SIZE) is now freed in the next truncate. >> >> --- git-linux/fs/attr.c 2009-05-20 18:05:55.000000000 -0700 >> +++ linux-2.6.30.5/fs/attr.c 2009-08-27 14:34:48.000000000 -0700 >> @@ -68,7 +68,8 @@ int inode_setattr(struct inode * inode, >> unsigned int ia_valid = attr->ia_valid; >> >> if (ia_valid & ATTR_SIZE && >> - attr->ia_size != i_size_read(inode)) { >> + (attr->ia_size != i_size_read(inode) || >> + attr->ia_size >> 9 < inode->i_blocks - 1)) { >> int error = vmtruncate(inode, attr->ia_size); >> if (error) >> return error; > > This isn't really correct, however, because i_blocks also contains > non-data blocks (indirect/index, EA, etc) blocks, so even with small > files with ACLs i_blocks may always be larger than ia_size >> 9, and > for ext2/3 at least this will ALWAYS be true for files > 48kB in size. I see. I guess we need to use a special flag then. Or is there any other suggestions? I also have another question related to this problem. Why those fallocated blocks are not marked as preallocated blocks that will then be automatically freed in ext4_release_file? 
Jiaying > >> On Thu, Jul 23, 2009 at 3:46 PM, Frank Mayhar<fmayhar@google.com> wrote: >> > On Thu, 2009-07-23 at 15:56 -0600, Andreas Dilger wrote: >> >> On Jul 23, 2009 11:05 -0700, Frank Mayhar wrote: >> >> > On Thu, 2009-07-23 at 12:00 -0500, Eric Sandeen wrote: >> >> > > Sorry I skimmed to fast, skipped over the fsck part. But: >> >> > > >> >> > > # touch /mnt/test/testfile >> >> > > # /root/fallocate -n -l 16m /mnt/test/testfile >> >> > > # ls -l /mnt/test/testfile >> >> > > -rw-r--r-- 1 root root 0 Jul 23 12:13 /mnt/test/testfile >> >> > > # du -h /mnt/test/testfile >> >> > > 16M /mnt/test/testfile >> >> > > >> >> > > there doesn't seem to be a problem in fsck w/ block past EOF, or am I >> >> > > missing something else? >> >> > >> >> > I was taking Andreas' word for it but now that you mention it, I see the >> >> > same thing. Andreas, did you have a specific case in mind? >> >> >> >> Ted and I had discussed this in the past, maybe he fixed e2fsck to not >> >> change the file size when there are blocks allocated beyond EOF. Having >> >> a flag wouldn't be a terrible idea, IMHO, so that e2fsck can make a >> >> better decision on whether the size or the blocks count are more correct. >> >> I'm not dead set on it. >> > >> > For the moment I'm going to table the e2fsck change and make the flag >> > memory-only. It'll be easy enough to change this if and when you guys >> > come to an agreement about what is right. >> > >> > As for the flag itself, I'll pick a bit that doesn't conflict with >> > anything else and leave reconciling the already-conflicting bits to you >> > guys. >> > -- >> > Frank Mayhar <fmayhar@google.com> >> > Google, Inc. >> > >> > -- >> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> > the body of a message to majordomo@vger.kernel.org >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > Cheers, Andreas > -- > Andreas Dilger > Sr. 
Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. > > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
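The objection to the i_blocks test can be checked with a little arithmetic. The following is a userspace sketch, not kernel code. It assumes a 4 KiB block size, i_blocks counted in 512-byte sectors, and an ext2/3-style inode with 12 direct block pointers; the helper names model_i_blocks and proposed_check are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB blocks, 12 direct pointers (ext2/3
 * style), i_blocks counted in 512-byte sectors. All names are hypothetical. */
#define BLOCK_SIZE      4096u
#define SECTORS_PER_BLK (BLOCK_SIZE / 512u)
#define DIRECT_BLOCKS   12u

/* i_blocks for a fully written file with no preallocation and no EA block,
 * small enough to need at most the single indirect block. */
static uint64_t model_i_blocks(uint64_t size)
{
    uint64_t data = (size + BLOCK_SIZE - 1) / BLOCK_SIZE;
    uint64_t meta = data > DIRECT_BLOCKS ? 1 : 0;   /* one indirect block */

    /* Stay inside the range one indirect block can map (4-byte pointers). */
    assert(data <= DIRECT_BLOCKS + BLOCK_SIZE / 4);
    return (data + meta) * SECTORS_PER_BLK;
}

/* The extra condition from the proposed fs/attr.c hunk. */
static int proposed_check(uint64_t ia_size, uint64_t i_blocks)
{
    return (ia_size >> 9) < i_blocks - 1;
}
```

Under these assumptions a 48 KiB file has i_blocks = 96 and the check is false, while a 64 KiB file has 16 data blocks plus one indirect block, so i_blocks = 136 and 128 < 135 makes the check true even though nothing was preallocated. A 4 KiB file with one EA block (i_blocks = 16) trips it as well.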
* Re: Question on fallocate/ftruncate sequence 2009-08-28 21:44 ` Jiaying Zhang @ 2009-08-28 22:14 ` Andreas Dilger 2009-08-29 0:40 ` Jiaying Zhang 0 siblings, 1 reply; 42+ messages in thread From: Andreas Dilger @ 2009-08-28 22:14 UTC (permalink / raw) To: Jiaying Zhang Cc: Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development On Aug 28, 2009 14:44 -0700, Jiaying Zhang wrote: > On Fri, Aug 28, 2009 at 12:40 PM, Andreas Dilger<adilger@sun.com> wrote: > > This isn't really correct, however, because i_blocks also contains > > non-data blocks (indirect/index, EA, etc) blocks, so even with small > > files with ACLs i_blocks may always be larger than ia_size >> 9, and > > for ext2/3 at least this will ALWAYS be true for files > 48kB in size. > > I see. I guess we need to use a special flag then. Or is there any > other suggestions? I also have another question related to this > problem. Why those fallocated blocks are not marked as preallocated > blocks that will then be automatically freed in ext4_release_file? Because fallocate() means "persistent allocation on disk", not "in memory preallocation". The "in memory" preallocation already happens in ext4, and it is released when the inode is cleaned up. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Question on fallocate/ftruncate sequence 2009-08-28 22:14 ` Andreas Dilger @ 2009-08-29 0:40 ` Jiaying Zhang 2009-08-30 2:52 ` Theodore Tso 0 siblings, 1 reply; 42+ messages in thread From: Jiaying Zhang @ 2009-08-29 0:40 UTC (permalink / raw) To: Andreas Dilger Cc: Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development On Fri, Aug 28, 2009 at 3:14 PM, Andreas Dilger<adilger@sun.com> wrote: > On Aug 28, 2009 14:44 -0700, Jiaying Zhang wrote: >> On Fri, Aug 28, 2009 at 12:40 PM, Andreas Dilger<adilger@sun.com> wrote: >> > This isn't really correct, however, because i_blocks also contains >> > non-data blocks (indirect/index, EA, etc) blocks, so even with small >> > files with ACLs i_blocks may always be larger than ia_size >> 9, and >> > for ext2/3 at least this will ALWAYS be true for files > 48kB in size. >> >> I see. I guess we need to use a special flag then. Or is there any >> other suggestions? I also have another question related to this >> problem. Why those fallocated blocks are not marked as preallocated >> blocks that will then be automatically freed in ext4_release_file? > > Because fallocate() means "persistent allocation on disk", not "in memory > preallocation". The "in memory" preallocation already happens in ext4, > and it is released when the inode is cleaned up. Right. Thanks for pointing this out! RFC, here is a patch that Frank and I have been working on. It introduces a new fs flag to mark that the file has been allocated beyond its EOF, as discussed previously in this thread. The flag is cleared in the subsequent vmtruncate or fallocate without KEEPSIZE. It is possible that a vmtruncate may be called unnecessarily in the case that the file is written beyond the allocated size, but I think it is ok to pay this cost to get correctness. 
--- .pc/fallocate_keepsizse.patch/fs/attr.c 2009-08-28 15:38:46.000000000 -0700 +++ fs/attr.c 2009-08-28 17:01:04.000000000 -0700 @@ -68,7 +68,8 @@ int inode_setattr(struct inode * inode, unsigned int ia_valid = attr->ia_valid; if (ia_valid & ATTR_SIZE && - (attr->ia_size != i_size_read(inode)) { + (attr->ia_size != i_size_read(inode) || + (inode->i_flags & FS_KEEPSIZE_FL))) { int error = vmtruncate(inode, attr->ia_size); if (error) return error; --- .pc/fallocate_keepsizse.patch/fs/ext4/extents.c 2009-08-28 15:37:45.000000000 -0700 +++ fs/ext4/extents.c 2009-08-28 17:27:27.000000000 -0700 @@ -3095,7 +3095,13 @@ static void ext4_falloc_update_inode(str i_size_write(inode, new_size); if (new_size > EXT4_I(inode)->i_disksize) ext4_update_i_disksize(inode, new_size); + inode->i_flags &= ~FS_KEEPSIZE_FL; } else { + /* + * Mark that we allocate beyond EOF so the subsequent truncate + * can proceed even if the new size is the same as i_size. + */ + inode->i_flags |= FS_KEEPSIZE_FL; } } --- .pc/fallocate_keepsizse.patch/fs/ext4/inode.c 2009-08-16 14:19:38.000000000 -0700 +++ fs/ext4/inode.c 2009-08-28 16:59:42.000000000 -0700 @@ -3973,6 +3973,8 @@ void ext4_truncate(struct inode *inode) if (!ext4_can_truncate(inode)) return; + inode->i_flags &= ~FS_KEEPSIZE_FL; + if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC)) ei->i_state |= EXT4_STATE_DA_ALLOC_CLOSE; --- .pc/fallocate_keepsizse.patch/include/linux/fs.h 2009-08-28 15:44:27.000000000 -0700 +++ include/linux/fs.h 2009-08-28 17:00:47.000000000 -0700 @@ -343,6 +343,7 @@ struct inodes_stat_t { #define FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/ #define FS_EXTENT_FL 0x00080000 /* Extents */ #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */ +#define FS_KEEPSIZE_FL 0x00200000 /* Blocks allocated beyond EOF */ #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */ #define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */ Jiaying > > Cheers, Andreas > -- > Andreas Dilger > Sr. 
Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Question on fallocate/ftruncate sequence 2009-08-29 0:40 ` Jiaying Zhang @ 2009-08-30 2:52 ` Theodore Tso 2009-08-31 19:40 ` Jiaying Zhang 0 siblings, 1 reply; 42+ messages in thread From: Theodore Tso @ 2009-08-30 2:52 UTC (permalink / raw) To: Jiaying Zhang Cc: Andreas Dilger, Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development On Fri, Aug 28, 2009 at 05:40:54PM -0700, Jiaying Zhang wrote: > --- .pc/fallocate_keepsizse.patch/fs/attr.c 2009-08-28 15:38:46.000000000 -0700 > +++ fs/attr.c 2009-08-28 17:01:04.000000000 -0700 > @@ -68,7 +68,8 @@ int inode_setattr(struct inode * inode, > unsigned int ia_valid = attr->ia_valid; > > if (ia_valid & ATTR_SIZE && > - (attr->ia_size != i_size_read(inode)) { > + (attr->ia_size != i_size_read(inode) || > + (inode->i_flags & FS_KEEPSIZE_FL))) { > int error = vmtruncate(inode, attr->ia_size); > if (error) > return error; Instead of doing this in the generic code, it really should be done in ext4_setattr. Technically speaking, we don't actually need the FS_KEEPSIZE_FL to solve this problem; instead we can simply have the ext4 code look in the extent tree to see if there are any blocks mapped beyond the logical block: i_size_read(inode) >> inode->i_sb->s_blocksize_bits Having a flag as Andreas suggested does help with the issue of e2fsck noticing whether or not i_size is incorrect (and should be fixed) or the file has been extended. So keeping the flag is an OK thing to do, but we need to be careful about a particularly subtle overloading problem. The flags FS_*_FL as defined in include/linux/fs.h are technically only for in-memory use. The ext4 on-disk format flags are EXT4_*_FL, defined in ext4.h. The flags were originally defined for use in ext2/3/4, but later on other filesystems adopted those flags so that e2fsprogs's chattr and lsattr programs could be used for their filesystems as well.
It just so happens that for ext2/3/4 the on-disk encoding of those flags and the in-memory encoding of those flags in i_flags are the same, but that means that the flags need to be defined in both places to avoid assignment overlaps. We also need to be clear whether the flags are internal flags for ext4's use only, or flags meant for use by all filesystems. This is why the testing for FS_KEEPSIZE_FL in fs/attr.c is particularly bad, if the flag is going to be set in fs/ext4/extents.c. It's better to define the flag as EXT4_KEEPSIZE_FL, and to use it as EXT4_KEEPSIZE_FL, but make a note of that bitfield position as being reserved in include/linux/fs.h. - Ted ^ permalink raw reply [flat|nested] 42+ messages in thread
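The flag-free alternative mentioned in this message (looking at the extent tree for blocks mapped past the EOF logical block) can be sketched in userspace C. This is only a model under stated assumptions: struct model_extent is a simplified stand-in for the file's last extent rather than the ext4 on-disk extent format, the function names are hypothetical, and rounding i_size up to a whole block is a choice made here so that a partially used tail block still counts as inside the file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the last extent of a file:
 * it maps logical blocks [first_block, first_block + len). */
struct model_extent {
    uint32_t first_block;
    uint32_t len;
};

/* First logical block that lies entirely past EOF. Rounding up (an
 * assumption of this sketch) keeps a partially used tail block inside
 * the file. */
static uint64_t first_block_past_eof(uint64_t i_size, unsigned blocksize_bits)
{
    uint64_t blocksize = (uint64_t)1 << blocksize_bits;

    assert(blocksize_bits >= 10 && blocksize_bits <= 16);
    return (i_size + blocksize - 1) >> blocksize_bits;
}

/* Nonzero if the last extent maps at least one block beyond EOF. */
static int has_blocks_past_eof(const struct model_extent *last,
                               uint64_t i_size, unsigned blocksize_bits)
{
    uint64_t end = (uint64_t)last->first_block + last->len;

    return end > first_block_past_eof(i_size, blocksize_bits);
}
```

For the sequence that started the thread (8MB preallocated with KEEP_SIZE, 64KB written, 4 KiB blocks), the last extent would end at block 2048 while the first block past EOF is 16, so the model reports blocks beyond EOF and a truncate to the current size would still have work to do.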
* Re: Question on fallocate/ftruncate sequence 2009-08-30 2:52 ` Theodore Tso @ 2009-08-31 19:40 ` Jiaying Zhang 2009-08-31 21:56 ` Andreas Dilger 0 siblings, 1 reply; 42+ messages in thread From: Jiaying Zhang @ 2009-08-31 19:40 UTC (permalink / raw) To: Theodore Tso Cc: Andreas Dilger, Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development Thanks a lot for the comments and suggestions! On Sat, Aug 29, 2009 at 7:52 PM, Theodore Tso<tytso@mit.edu> wrote: > On Fri, Aug 28, 2009 at 05:40:54PM -0700, Jiaying Zhang wrote: >> --- .pc/fallocate_keepsizse.patch/fs/attr.c 2009-08-28 15:38:46.000000000 -0700 >> +++ fs/attr.c 2009-08-28 17:01:04.000000000 -0700 >> @@ -68,7 +68,8 @@ int inode_setattr(struct inode * inode, >> unsigned int ia_valid = attr->ia_valid; >> >> if (ia_valid & ATTR_SIZE && >> - (attr->ia_size != i_size_read(inode)) { >> + (attr->ia_size != i_size_read(inode) || >> + (inode->i_flags & FS_KEEPSIZE_FL))) { >> int error = vmtruncate(inode, attr->ia_size); >> if (error) >> return error; > > Instead of doing this in the generic code, it really should be done in > ext4_setattr. Technically speaking, we don't actually need the > FS_KEEPSIZE_FL to solve this problem; instead we can simply have the > ext4 code look in the extent tree to see if there are any blocks > mapped beyond the logical block: > > i_size_read(inode) >> inode->i_sb->s_blocksize_bits Is it relatively cheap to scan the extent tree? Will this add the overhead to truncate? > > Having a flag as Andreas suggested does help with the issue of e2fsck > noticing whether or not i_size is incorrect (and should be fixed) or > the file has been extended. So keeping having the flag is an OK thing > to do, but we need to be careful about a particularly subtle > overloading problem. The flags FS_*_FL as defined in > include/linux/fs.h are technically only for in-memory use. The ext4 > on-disk format flags is EXT4_*_FL, and defined in ext4.h. 
> > The flags were originially defined for use in ext2/3/4, but later on > other filesystems adopted those flags so that e2fsprogs's chattr and > lsattr programs could be used for their filesystems as well. It just > so happens that for ext2/3/4 the on-disk encoding of those flags in > the in-memory encoding of those flags in i_flags are the same, but > that means that the flags need to be defined in both places to avoid > assignment overlaps. We also need to be clear whether the flags are > internal flags for ext4's use only, or flags meant for use by all > filesystems. This is why the testing for FS_KEEPSIZE_FL in fs/attr is > particularly bad, if the flag are going to be set in fs/ext4/extents.c. > > It's better to define the flag as EXT4_KEEPSIZE_FL, and to use it as > EXT4_KEEPSIZE_FL, but make a note of that bitfield position as being > reserved in include/linux/fs.h. Here is the modified patch based on your suggestions. I stick with the KEEPSIZE_FL approach that I think can allow us to handle the special truncation accordingly during fsck. Other file systems can also re-use this flag when they want to support fallocate with KEEP_SIZE. As you suggested, I moved the EXT4_KEEPSIZE_FL checking to ext4_setattr that now calls vmtruncate if the KEEPSIZE flag is set in the i_flag. Please let me know what you think about this proposed patch. Thanks a lot! Jiaying --- .pc/fallocate_keepsizse.patch/fs/ext4/extents.c 2009-08-31 12:08:10.000000000 -0700 +++ fs/ext4/extents.c 2009-08-31 12:12:16.000000000 -0700 @@ -3095,7 +3095,13 @@ static void ext4_falloc_update_inode(str i_size_write(inode, new_size); if (new_size > EXT4_I(inode)->i_disksize) ext4_update_i_disksize(inode, new_size); + inode->i_flags &= ~EXT4_KEEPSIZE_FL; } else { + /* + * Mark that we allocate beyond EOF so the subsequent truncate + * can proceed even if the new size is the same as i_size. 
+ */ + inode->i_flags |= EXT4_KEEPSIZE_FL; } } --- .pc/fallocate_keepsizse.patch/fs/ext4/inode.c 2009-08-31 12:08:10.000000000 -0700 +++ fs/ext4/inode.c 2009-08-31 12:12:16.000000000 -0700 @@ -3973,6 +3973,8 @@ void ext4_truncate(struct inode *inode) if (!ext4_can_truncate(inode)) return; + inode->i_flags &= ~EXT4_KEEPSIZE_FL; + if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC)) ei->i_state |= EXT4_STATE_DA_ALLOC_CLOSE; @@ -4807,7 +4809,9 @@ int ext4_setattr(struct dentry *dentry, } if (S_ISREG(inode->i_mode) && - attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) { + attr->ia_valid & ATTR_SIZE && + (attr->ia_size < inode->i_size || + (inode->i_flags & EXT4_KEEPSIZE_FL))) { handle_t *handle; handle = ext4_journal_start(inode, 3); @@ -4838,6 +4842,11 @@ int ext4_setattr(struct dentry *dentry, goto err_out; } } + if ((inode->i_flags & EXT4_KEEPSIZE_FL)) { + rc = vmtruncate(inode, attr->ia_size); + if (rc) + goto err_out; + } } rc = inode_setattr(inode, attr); --- .pc/fallocate_keepsizse.patch/include/linux/fs.h 2009-08-31 12:08:10.000000000 -0700 +++ include/linux/fs.h 2009-08-31 12:12:16.000000000 -0700 @@ -343,6 +343,7 @@ struct inodes_stat_t { #define FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/ #define FS_EXTENT_FL 0x00080000 /* Extents */ #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */ +#define FS_KEEPSIZE_FL 0x00200000 /* Blocks allocated beyond EOF */ #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */ #define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */ --- .pc/fallocate_keepsizse.patch/fs/ext4/ext4.h 2009-08-31 12:08:10.000000000 -0700 +++ fs/ext4/ext4.h 2009-08-31 12:12:16.000000000 -0700 @@ -235,6 +235,7 @@ struct flex_groups { #define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */ #define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */ #define EXT4_EXT_MIGRATE 0x00100000 /* Inode is migrating */ +#define EXT4_KEEPSIZE_FL 0x00200000 /* Blocks allocated beyond EOF (bit reserved 
in fs.h) */ #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */ #define EXT4_FL_USER_VISIBLE 0x000BDFFF /* User visible flags */ > > - Ted ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Question on fallocate/ftruncate sequence 2009-08-31 19:40 ` Jiaying Zhang @ 2009-08-31 21:56 ` Andreas Dilger 2009-08-31 23:33 ` Jiaying Zhang 0 siblings, 1 reply; 42+ messages in thread From: Andreas Dilger @ 2009-08-31 21:56 UTC (permalink / raw) To: Jiaying Zhang Cc: Theodore Tso, Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development On Aug 31, 2009 12:40 -0700, Jiaying Zhang wrote: > > It's better to define the flag as EXT4_KEEPSIZE_FL, and to use it as > > EXT4_KEEPSIZE_FL, but make a note of that bitfield position as being > > reserved in include/linux/fs.h. > > Here is the modified patch based on your suggestions. I stick with the > KEEPSIZE_FL approach that I think can allow us to handle the special > truncation accordingly during fsck. Other file systems can also re-use > this flag when they want to support fallocate with KEEP_SIZE. As you > suggested, I moved the EXT4_KEEPSIZE_FL checking to ext4_setattr > that now calls vmtruncate if the KEEPSIZE flag is set in the i_flag. > Please let me know what you think about this proposed patch. > > --- .pc/fallocate_keepsizse.patch/fs/ext4/extents.c 2009-08-31 > 12:08:10.000000000 -0700 > +++ fs/ext4/extents.c 2009-08-31 12:12:16.000000000 -0700 > @@ -3095,7 +3095,13 @@ static void ext4_falloc_update_inode(str > i_size_write(inode, new_size); > if (new_size > EXT4_I(inode)->i_disksize) > ext4_update_i_disksize(inode, new_size); > + inode->i_flags &= ~EXT4_KEEPSIZE_FL; Note that fallocate can be called multiple times for a file. The EXT4_KEEPSIZE_FL should only be cleared if there were writes to the end of the fallocated space. In that regard, I think the name of this flag should be changed to something like "EXT4_EOFBLOCKS_FL" to indicate that blocks are allocated beyond the end of file (i_size). > } else { > + /* > + * Mark that we allocate beyond EOF so the subsequent truncate > + * can proceed even if the new size is the same as i_size. 
> + */ > + inode->i_flags |= EXT4_KEEPSIZE_FL; Similarly, this should only be done in case the fallocate is actually beyond i_size. While that is the most common case, it isn't necessarily ALWAYS going to be true (e.g. if multiple threads are calling fallocate() on a single file, or if a program always calls fallocate() on a file without first checking if the file size is large enough). > +++ include/linux/fs.h 2009-08-31 12:12:16.000000000 -0700 > #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */ > +++ fs/ext4/ext4.h 2009-08-31 12:12:16.000000000 -0700 > #define EXT4_EXT_MIGRATE 0x00100000 /* Inode is migrating */ Should we redefine EXT4_EXT_MIGRATE not to conflict with FS_DIRECTIO_FL? I don't think much, if any, use has been made of this flag, and I can imagine a major headache in the future if this isn't changed now. Also, EXT4_EXT_MIGRATE doesn't necessarily belong in the i_flags space, since it is only used in-memory rather than on-disk as all of the others are. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 42+ messages in thread
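The two rules requested in this review (set the flag only when the fallocate range really extends past i_size, and do not clear it while preallocated space still lies past EOF) can be modelled in a few lines of userspace C. This is a sketch, not the ext4 code: struct model_inode, its alloc_end field and the MODEL_EOFBLOCKS bit are hypothetical, and alloc_end in particular is a modelling convenience rather than a field the kernel inode is known to carry.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_EOFBLOCKS 0x1u   /* hypothetical in-memory flag bit */

struct model_inode {
    uint64_t i_size;      /* logical file size in bytes */
    uint64_t alloc_end;   /* end of the highest allocated byte range */
    unsigned flags;
};

/* Recompute the flag from state: set iff space is allocated past EOF. */
static void model_update_flag(struct model_inode *ino)
{
    if (ino->alloc_end > ino->i_size)
        ino->flags |= MODEL_EOFBLOCKS;
    else
        ino->flags &= ~MODEL_EOFBLOCKS;
}

/* Models fallocate(offset, len); keep_size mirrors FALLOC_FL_KEEP_SIZE. */
static void model_fallocate(struct model_inode *ino, uint64_t offset,
                            uint64_t len, int keep_size)
{
    uint64_t new_end = offset + len;

    assert(len > 0);
    if (new_end > ino->alloc_end)
        ino->alloc_end = new_end;
    if (!keep_size && new_end > ino->i_size)
        ino->i_size = new_end;
    model_update_flag(ino);
}
```

In this model a KEEP_SIZE call that stays inside i_size leaves the flag clear, and a later non-KEEP_SIZE call that grows i_size only part of the way into the preallocated range leaves the flag set, which is the repeated-fallocate case raised above.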
* Re: Question on fallocate/ftruncate sequence 2009-08-31 21:56 ` Andreas Dilger @ 2009-08-31 23:33 ` Jiaying Zhang 2009-09-02 8:41 ` Andreas Dilger 0 siblings, 1 reply; 42+ messages in thread From: Jiaying Zhang @ 2009-08-31 23:33 UTC (permalink / raw) To: Andreas Dilger Cc: Theodore Tso, Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development On Mon, Aug 31, 2009 at 2:56 PM, Andreas Dilger<adilger@sun.com> wrote: > On Aug 31, 2009 12:40 -0700, Jiaying Zhang wrote: >> > It's better to define the flag as EXT4_KEEPSIZE_FL, and to use it as >> > EXT4_KEEPSIZE_FL, but make a note of that bitfield position as being >> > reserved in include/linux/fs.h. >> >> Here is the modified patch based on your suggestions. I stick with the >> KEEPSIZE_FL approach that I think can allow us to handle the special >> truncation accordingly during fsck. Other file systems can also re-use >> this flag when they want to support fallocate with KEEP_SIZE. As you >> suggested, I moved the EXT4_KEEPSIZE_FL checking to ext4_setattr >> that now calls vmtruncate if the KEEPSIZE flag is set in the i_flag. >> Please let me know what you think about this proposed patch. >> >> --- .pc/fallocate_keepsizse.patch/fs/ext4/extents.c 2009-08-31 >> 12:08:10.000000000 -0700 >> +++ fs/ext4/extents.c 2009-08-31 12:12:16.000000000 -0700 >> @@ -3095,7 +3095,13 @@ static void ext4_falloc_update_inode(str >> i_size_write(inode, new_size); >> if (new_size > EXT4_I(inode)->i_disksize) >> ext4_update_i_disksize(inode, new_size); >> + inode->i_flags &= ~EXT4_KEEPSIZE_FL; > > Note that fallocate can be called multiple times for a file. The > EXT4_KEEPSIZE_FL should only be cleared if there were writes to > the end of the fallocated space. In that regard, I think the name > of this flag should be changed to something like "EXT4_EOFBLOCKS_FL" > to indicate that blocks are allocated beyond the end of file (i_size). Thanks for catching this! 
I changed the patch to only clear the flag when the new_size is larger than i_size and changed the flag name as you suggested. It would be nice if we only clear the flag when we write beyond the fallocated space, but this seems hard to detect because we no longer have the allocated size once that keepsize fallocate call returns. > >> } else { >> + /* >> + * Mark that we allocate beyond EOF so the subsequent truncate >> + * can proceed even if the new size is the same as i_size. >> + */ >> + inode->i_flags |= EXT4_KEEPSIZE_FL; > > Similarly, this should only be done in case the fallocate is actually > beyond i_size. While that is the most common case, it isn't necessarily > ALWAYS going to be true (e.g. if multiple threads are calling fallocate() > on a single file, or if a program always calls fallocate() on a file > without first checking if the file size is large enough). Also fixed. > >> +++ include/linux/fs.h 2009-08-31 12:12:16.000000000 -0700 >> #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */ > > >> +++ fs/ext4/ext4.h 2009-08-31 12:12:16.000000000 -0700 >> #define EXT4_EXT_MIGRATE 0x00100000 /* Inode is migrating */ > > Should we redefine EXT4_EXT_MIGRATE not to conflict with FS_DIRECTIO_FL? > I don't think much, if any, use has been made of this flag, and I can > imagine a major headache in the future if this isn't changed now. > > Also, EXT4_EXT_MIGRATE doesn't necessarily belong in the i_flags space, > since it is only used in-memory rather than on-disk as all of the others > are. I will leave this out from my patch since it seems to belong to more general cleanup and I don't know much about the EXT4_EXT_MIGRATE flag :). Here is the new patch: --- .pc/fallocate_keepsizse.patch/fs/ext4/extents.c 2009-08-31 12:08:10.000000000 -0700 +++ fs/ext4/extents.c 2009-08-31 15:51:13.000000000 -0700 @@ -3091,11 +3091,19 @@ static void ext4_falloc_update_inode(str * the file size. 
*/ if (!(mode & FALLOC_FL_KEEP_SIZE)) { - if (new_size > i_size_read(inode)) + if (new_size > i_size_read(inode)) { i_size_write(inode, new_size); + inode->i_flags &= ~EXT4_EOFBLOCKS_FL; + } if (new_size > EXT4_I(inode)->i_disksize) ext4_update_i_disksize(inode, new_size); } else { + /* + * Mark that we allocate beyond EOF so the subsequent truncate + * can proceed even if the new size is the same as i_size. + */ + if (new_size > i_size_read(inode)) + inode->i_flags |= EXT4_EOFBLOCKS_FL; } } --- .pc/fallocate_keepsizse.patch/fs/ext4/inode.c 2009-08-31 12:08:10.000000000 -0700 +++ fs/ext4/inode.c 2009-08-31 15:50:56.000000000 -0700 @@ -3973,6 +3973,8 @@ void ext4_truncate(struct inode *inode) if (!ext4_can_truncate(inode)) return; + inode->i_flags &= ~EXT4_EOFBLOCKS_FL; + if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC)) ei->i_state |= EXT4_STATE_DA_ALLOC_CLOSE; @@ -4807,7 +4809,9 @@ int ext4_setattr(struct dentry *dentry, } if (S_ISREG(inode->i_mode) && - attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) { + attr->ia_valid & ATTR_SIZE && + (attr->ia_size < inode->i_size || + (inode->i_flags & EXT4_EOFBLOCKS_FL))) { handle_t *handle; handle = ext4_journal_start(inode, 3); @@ -4838,6 +4842,11 @@ int ext4_setattr(struct dentry *dentry, goto err_out; } } + if ((inode->i_flags & EXT4_EOFBLOCKS_FL)) { + rc = vmtruncate(inode, attr->ia_size); + if (rc) + goto err_out; + } } rc = inode_setattr(inode, attr); --- .pc/fallocate_keepsizse.patch/include/linux/fs.h 2009-08-31 12:08:10.000000000 -0700 +++ include/linux/fs.h 2009-08-31 16:21:44.000000000 -0700 @@ -343,6 +343,7 @@ struct inodes_stat_t { #define FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/ #define FS_EXTENT_FL 0x00080000 /* Extents */ #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */ +#define FS_EOFBLOCKS_FL 0x00200000 /* Blocks allocated beyond EOF */ #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */ #define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible 
flags */ --- .pc/fallocate_keepsizse.patch/fs/ext4/ext4.h 2009-08-31 12:08:10.000000000 -0700 +++ fs/ext4/ext4.h 2009-08-31 15:52:34.000000000 -0700 @@ -235,6 +235,7 @@ struct flex_groups { #define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */ #define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */ #define EXT4_EXT_MIGRATE 0x00100000 /* Inode is migrating */ +#define EXT4_EOFBLOCKS_FL 0x00200000 /* Blocks allocated beyond EOF (bit reserved in fs.h) */ #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */ #define EXT4_FL_USER_VISIBLE 0x000BDFFF /* User visible flags */ Jiaying > > Cheers, Andreas > -- > Andreas Dilger > Sr. Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Question on fallocate/ftruncate sequence 2009-08-31 23:33 ` Jiaying Zhang @ 2009-09-02 8:41 ` Andreas Dilger 2009-09-03 5:20 ` Jiaying Zhang 0 siblings, 1 reply; 42+ messages in thread From: Andreas Dilger @ 2009-09-02 8:41 UTC (permalink / raw) To: Jiaying Zhang Cc: Theodore Tso, Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development On Aug 31, 2009 16:33 -0700, Jiaying Zhang wrote: > > EXT4_KEEPSIZE_FL should only be cleared if there were writes to > > the end of the fallocated space. In that regard, I think the name > > of this flag should be changed to something like "EXT4_EOFBLOCKS_FL" > > to indicate that blocks are allocated beyond the end of file (i_size). > > Thanks for catching this! I changed the patch to only clear the flag > when the new_size is larger than i_size and changed the flag name > as you suggested. It would be nice if we only clear the flag when we > write beyond the fallocated space, but this seems hard to detect > because we no longer have the allocated size once that keepsize > fallocate call returns. The problem is that if e2fsck depends on the EXT4_EOFBLOCKS_FL set for fallocate-beyond-EOF then it is worse to clear it than to leave it set. At worst, leaving the flag set results in too many truncates on the file. Clearing the flag when not correct may result in user visible data corruption if the file size is extended... > Here is the new patch: > > --- .pc/fallocate_keepsizse.patch/fs/ext4/extents.c 2009-08-31 > 12:08:10.000000000 -0700 > +++ fs/ext4/extents.c 2009-08-31 15:51:13.000000000 -0700 > @@ -3091,11 +3091,19 @@ static void ext4_falloc_update_inode(str > * the file size. > */ > if (!(mode & FALLOC_FL_KEEP_SIZE)) { > + if (new_size > i_size_read(inode)) { > i_size_write(inode, new_size); > + inode->i_flags &= ~EXT4_EOFBLOCKS_FL; This again isn't quite correct, since the EOFBLOCKS_FL shouldn't be cleared unless new_size is beyond the allocated size. 
The allocation code itself might be a better place to clear this, since it knows whether there were new blocks being added beyond the current max allocated block. > +#define FS_EOFBLOCKS_FL 0x00200000 /* Blocks allocated beyond EOF */ > #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */ > > #define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */ It probably isn't a bad idea to make this flag user-visible, since it would allow scanning for files that have excess space reserved (e.g. if the filesystem is getting full). Making it user-settable (i.e. clearable) would essentially mean truncating the file to i_size without updating the timestamps so that the reserved space is discarded. I don't think there is any value in allowing a user to turn this flag on for a file. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 42+ messages in thread
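The user-visibility remark comes down to mask arithmetic on the values quoted in the patch. The sketch below copies those constants under MODEL_ names so it is not mistaken for the kernel headers; the two helper functions are hypothetical, and the widened mask is only what the arithmetic would give, not a value anyone in the thread proposed.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the patch hunks quoted in this thread (2.6.30-era). */
#define MODEL_EOFBLOCKS_FL         0x00200000u
#define MODEL_FS_FL_USER_VISIBLE   0x0003DFFFu
#define MODEL_EXT4_FL_USER_VISIBLE 0x000BDFFFu

/* Flags as a user-space query would see them: anything outside the
 * visible mask is filtered out. */
static uint32_t model_visible_flags(uint32_t i_flags, uint32_t visible_mask)
{
    return i_flags & visible_mask;
}

/* Hypothetical widened mask that would expose the new bit. */
static uint32_t model_mask_with_eofblocks(uint32_t visible_mask)
{
    /* The new flag must be a single bit for this to make sense. */
    assert((MODEL_EOFBLOCKS_FL & (MODEL_EOFBLOCKS_FL - 1)) == 0);
    return visible_mask | MODEL_EOFBLOCKS_FL;
}
```

With the masks as posted, bit 0x00200000 falls outside both 0x0003DFFF and 0x000BDFFF, so a query filtered through either mask would not report it; OR-ing the bit in would give 0x0023DFFF and 0x002BDFFF respectively.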
* Re: Question on fallocate/ftruncate sequence 2009-09-02 8:41 ` Andreas Dilger @ 2009-09-03 5:20 ` Jiaying Zhang 2009-09-03 5:32 ` Jiaying Zhang 2009-09-24 5:27 ` Jiaying Zhang 0 siblings, 2 replies; 42+ messages in thread From: Jiaying Zhang @ 2009-09-03 5:20 UTC (permalink / raw) To: Andreas Dilger Cc: Theodore Tso, Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development On Wed, Sep 2, 2009 at 1:41 AM, Andreas Dilger<adilger@sun.com> wrote: > On Aug 31, 2009 16:33 -0700, Jiaying Zhang wrote: >> > EXT4_KEEPSIZE_FL should only be cleared if there were writes to >> > the end of the fallocated space. In that regard, I think the name >> > of this flag should be changed to something like "EXT4_EOFBLOCKS_FL" >> > to indicate that blocks are allocated beyond the end of file (i_size). >> >> Thanks for catching this! I changed the patch to only clear the flag >> when the new_size is larger than i_size and changed the flag name >> as you suggested. It would be nice if we only clear the flag when we >> write beyond the fallocated space, but this seems hard to detect >> because we no longer have the allocated size once that keepsize >> fallocate call returns. > > The problem is that if e2fsck depends on the EXT4_EOFBLOCKS_FL set > for fallocate-beyond-EOF then it is worse to clear it than to leave > it set. At worst, leaving the flag set results in too many truncates > on the file. Clearing the flag when not correct may result in user > visible data corruption if the file size is extended... > >> Here is the new patch: >> >> --- .pc/fallocate_keepsizse.patch/fs/ext4/extents.c 2009-08-31 >> 12:08:10.000000000 -0700 >> +++ fs/ext4/extents.c 2009-08-31 15:51:13.000000000 -0700 >> @@ -3091,11 +3091,19 @@ static void ext4_falloc_update_inode(str >> * the file size. 
>> */ >> if (!(mode & FALLOC_FL_KEEP_SIZE)) { >> + if (new_size > i_size_read(inode)) { >> i_size_write(inode, new_size); >> + inode->i_flags &= ~EXT4_EOFBLOCKS_FL; > > This again isn't quite correct, since the EOFBLOCKS_FL shouldn't > be cleared unless new_size is beyond the allocated size. The > allocation code itself might be a better place to clear this, > since it knows whether there were new blocks being added beyond > the current max allocated block. We were thinking to clear this flag when we need to allocate new blocks, but I was not sure how to get the current max allocated block -- that is mostly because I just started working on the ext4 code. After digging into the ext4 allocation code today, I think we can put the check&clear in ext4_ext_get_blocks: @@ -2968,6 +2968,14 @@ int ext4_ext_get_blocks(handle_t *handle newex.ee_len = cpu_to_le16(ar.len); if (create == EXT4_CREATE_UNINITIALIZED_EXT) /* Mark uninitialized */ ext4_ext_mark_uninitialized(&newex); + + if (unlikely(inode->i_flags & EXT4_EOFBLOCKS_FL)) { + BUG_ON(!eh->eh_entries); + last_ex = EXT_LAST_EXTENT(eh); + if (iblock + max_blocks > le32_to_cpu(last_ex->ee_block) + + ext4_ext_get_actual_len(last_ex)) + inode->i_flags &= ~EXT4_EOFBLOCKS_FL; + } err = ext4_ext_insert_extent(handle, inode, path, &newex); if (err) { /* free data blocks we just allocated */ Again, I just started looking at this part of code, so please let me know if I am in the right direction. Another thing I am not sure is whether we can allocate a non-data block, like extended attributes, beyond the current max block without changing the i_size. In that case, clearing the EOFBLOCKS flag will be wrong. >> #define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */ > > It probably isn't a bad idea to make this flag user-visible, since it > would allow scanning for files that have excess space reserved (e.g. > if the filesystem is getting full). Making it user-settable (i.e. 
> clearable) would essentially mean truncating the file to i_size without > updating the timestamps so that the reserved space is discarded. I > don't think there is any value in allowing a user to turn this flag on > for a file. So to make it user-settable, we need to add handling in ext4_ioctl that calls vmtruncate when the flag is to be cleared. But how can we get the right size to truncate to in that case? Can we just set it to the max initialized block shifted left by the block size bits? But that may also truncate the blocks reserved without the KEEP_SIZE flag. Jiaying > > Cheers, Andreas > -- > Andreas Dilger > Sr. Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 42+ messages in thread
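The condition proposed in this message for clearing the flag inside the allocation path (iblock + max_blocks reaching past the end of the last extent) can be exercised on its own. The following userspace sketch mirrors only that comparison: struct model_last_extent and the function name are hypothetical, the fields are plain host-endian integers rather than the little-endian on-disk fields, and nothing here touches the real extent tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-endian copy of the last extent's start and length. */
struct model_last_extent {
    uint32_t ee_block;   /* first logical block covered */
    uint16_t ee_len;     /* number of blocks covered */
};

/* Nonzero if an allocation request for [iblock, iblock + max_blocks)
 * extends past the end of the last extent, which is the case where the
 * proposed hunk would clear the flag. */
static int alloc_reaches_past_last_extent(uint32_t iblock, uint32_t max_blocks,
                                          const struct model_last_extent *last_ex)
{
    uint64_t req_end = (uint64_t)iblock + max_blocks;
    uint64_t last_end = (uint64_t)last_ex->ee_block + last_ex->ee_len;

    assert(max_blocks > 0);
    return req_end > last_end;
}
```

With a 2048-block preallocated extent, a request for blocks 16 to 19 stays inside it and the flag would be kept, while a request starting at block 2040 for 16 blocks ends at 2056 and would clear it. The model says nothing about the open question raised above of non-data blocks being allocated past the last extent.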
* Re: Question on fallocate/ftruncate sequence 2009-09-03 5:20 ` Jiaying Zhang @ 2009-09-03 5:32 ` Jiaying Zhang 2009-09-24 5:27 ` Jiaying Zhang 1 sibling, 0 replies; 42+ messages in thread
From: Jiaying Zhang @ 2009-09-03 5:32 UTC
To: Andreas Dilger
Cc: Theodore Tso, Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development

On Wed, Sep 2, 2009 at 10:20 PM, Jiaying Zhang<jiayingz@google.com> wrote:
> So to make it user-settable, we need to add the handling in ext4_ioctl
> that calls vmtruncate when the flag to be cleared. But how can we get
> the right size to truncate in that case? Can we just set that to the
> max initialized block shift with block size? But that may also truncate
> the blocks reserved without the KEEP_SIZE flag.

Never mind, that is a stupid question. We can just truncate to the
current i_size.

Jiaying
* Re: Question on fallocate/ftruncate sequence 2009-09-03 5:20 ` Jiaying Zhang 2009-09-03 5:32 ` Jiaying Zhang @ 2009-09-24 5:27 ` Jiaying Zhang 2009-09-25 7:35 ` Andreas Dilger 2009-09-29 19:15 ` Eric Sandeen 1 sibling, 2 replies; 42+ messages in thread
From: Jiaying Zhang @ 2009-09-24 5:27 UTC
To: Andreas Dilger
Cc: Theodore Tso, Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development

Sorry for taking so long to finish this. Here is the new patch based on
Andreas's suggestions. Now the patch clears the EXT4_EOFBLOCKS_FL
flag when we allocate beyond the maximum allocated block. I also
made the EOFBLOCKS flag user visible and added the handling
in ext4_ioctl as Andreas suggested.

Index: linux-2.6.30.5/fs/ext4/inode.c
===================================================================
--- linux-2.6.30.5.orig/fs/ext4/inode.c 2009-08-31 12:08:10.000000000 -0700
+++ linux-2.6.30.5/fs/ext4/inode.c 2009-09-23 21:42:33.000000000 -0700
@@ -3973,6 +3973,8 @@ void ext4_truncate(struct inode *inode)
     if (!ext4_can_truncate(inode))
         return;

+    inode->i_flags &= ~EXT4_EOFBLOCKS_FL;
+
     if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC))
         ei->i_state |= EXT4_STATE_DA_ALLOC_CLOSE;

@@ -4285,8 +4287,8 @@ void ext4_get_inode_flags(struct ext4_in
 {
     unsigned int flags = ei->vfs_inode.i_flags;

-    ei->i_flags &= ~(EXT4_SYNC_FL|EXT4_APPEND_FL|
-            EXT4_IMMUTABLE_FL|EXT4_NOATIME_FL|EXT4_DIRSYNC_FL);
+    ei->i_flags &= ~(EXT4_SYNC_FL|EXT4_APPEND_FL|EXT4_IMMUTABLE_FL|
+            EXT4_NOATIME_FL|EXT4_DIRSYNC_FL|EXT4_EOFBLOCKS_FL);
     if (flags & S_SYNC)
         ei->i_flags |= EXT4_SYNC_FL;
     if (flags & S_APPEND)
@@ -4297,6 +4299,8 @@ void ext4_get_inode_flags(struct ext4_in
         ei->i_flags |= EXT4_NOATIME_FL;
     if (flags & S_DIRSYNC)
         ei->i_flags |= EXT4_DIRSYNC_FL;
+    if (flags & FS_EOFBLOCKS_FL)
+        ei->i_flags |= EXT4_EOFBLOCKS_FL;
 }
 static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode,
                     struct ext4_inode_info *ei)
@@ -4807,7 +4811,9 @@ int ext4_setattr(struct dentry *dentry,
     }

     if (S_ISREG(inode->i_mode) &&
-        attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) {
+        attr->ia_valid & ATTR_SIZE &&
+        (attr->ia_size < inode->i_size ||
+         (inode->i_flags & EXT4_EOFBLOCKS_FL))) {
         handle_t *handle;

         handle = ext4_journal_start(inode, 3);
@@ -4838,6 +4844,11 @@ int ext4_setattr(struct dentry *dentry,
                 goto err_out;
             }
         }
+        if ((inode->i_flags & EXT4_EOFBLOCKS_FL)) {
+            rc = vmtruncate(inode, attr->ia_size);
+            if (rc)
+                goto err_out;
+        }
     }

     rc = inode_setattr(inode, attr);
Index: linux-2.6.30.5/include/linux/fs.h
===================================================================
--- linux-2.6.30.5.orig/include/linux/fs.h 2009-08-31 12:08:10.000000000 -0700
+++ linux-2.6.30.5/include/linux/fs.h 2009-09-10 21:27:30.000000000 -0700
@@ -343,9 +343,10 @@ struct inodes_stat_t {
 #define FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
 #define FS_EXTENT_FL 0x00080000 /* Extents */
 #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */
+#define FS_EOFBLOCKS_FL 0x00200000 /* Blocks allocated beyond EOF */
 #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */

-#define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */
+#define FS_FL_USER_VISIBLE 0x0023DFFF /* User visible flags */
 #define FS_FL_USER_MODIFIABLE 0x000380FF /* User modifiable flags */


Index: linux-2.6.30.5/fs/ext4/ext4.h
===================================================================
--- linux-2.6.30.5.orig/fs/ext4/ext4.h 2009-08-31 12:08:10.000000000 -0700
+++ linux-2.6.30.5/fs/ext4/ext4.h 2009-09-10 21:28:14.000000000 -0700
@@ -235,9 +235,10 @@ struct flex_groups {
 #define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
 #define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */
 #define EXT4_EXT_MIGRATE 0x00100000 /* Inode is migrating */
+#define EXT4_EOFBLOCKS_FL 0x00200000 /* Blocks allocated beyond EOF (bit reserved in fs.h) */
 #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */

-#define EXT4_FL_USER_VISIBLE 0x000BDFFF /* User visible flags */
+#define EXT4_FL_USER_VISIBLE 0x002BDFFF /* User visible flags */
 #define EXT4_FL_USER_MODIFIABLE 0x000B80FF /* User modifiable flags */

 /* Flags that should be inherited by new inodes from their parent. */
Index: linux-2.6.30.5/fs/ext4/extents.c
===================================================================
--- linux-2.6.30.5.orig/fs/ext4/extents.c 2009-09-01 18:14:58.000000000 -0700
+++ linux-2.6.30.5/fs/ext4/extents.c 2009-09-23 22:12:22.000000000 -0700
@@ -2788,7 +2788,7 @@ int ext4_ext_get_blocks(handle_t *handle
 {
     struct ext4_ext_path *path = NULL;
     struct ext4_extent_header *eh;
-    struct ext4_extent newex, *ex;
+    struct ext4_extent newex, *ex, *last_ex;
     ext4_fsblk_t newblock;
     int err = 0, depth, ret, cache_type;
     unsigned int allocated = 0;
@@ -2968,6 +2968,14 @@ int ext4_ext_get_blocks(handle_t *handle
         newex.ee_len = cpu_to_le16(ar.len);
         if (create == EXT4_CREATE_UNINITIALIZED_EXT) /* Mark uninitialized */
             ext4_ext_mark_uninitialized(&newex);
+
+    if (unlikely(inode->i_flags & EXT4_EOFBLOCKS_FL)) {
+        BUG_ON(!eh->eh_entries);
+        last_ex = EXT_LAST_EXTENT(eh);
+        if (iblock + ar.len > le32_to_cpu(last_ex->ee_block)
+                + ext4_ext_get_actual_len(last_ex))
+            inode->i_flags &= ~EXT4_EOFBLOCKS_FL;
+    }
     err = ext4_ext_insert_extent(handle, inode, path, &newex);
     if (err) {
         /* free data blocks we just allocated */
@@ -3095,6 +3103,13 @@ static void ext4_falloc_update_inode(str
             i_size_write(inode, new_size);
         if (new_size > EXT4_I(inode)->i_disksize)
             ext4_update_i_disksize(inode, new_size);
+    } else {
+        /*
+         * Mark that we allocate beyond EOF so the subsequent truncate
+         * can proceed even if the new size is the same as i_size.
+         */
+        if (new_size > i_size_read(inode))
+            inode->i_flags |= EXT4_EOFBLOCKS_FL;
     }
 }

Index: linux-2.6.30.5/fs/ext4/ioctl.c
===================================================================
--- linux-2.6.30.5.orig/fs/ext4/ioctl.c 2009-08-16 14:19:38.000000000 -0700
+++ linux-2.6.30.5/fs/ext4/ioctl.c 2009-09-23 22:04:47.000000000 -0700
@@ -92,6 +92,16 @@ long ext4_ioctl(struct file *filp, unsig
             flags &= ~EXT4_EXTENTS_FL;
         }

+        if (flags & EXT4_EOFBLOCKS_FL) {
+            /* we don't support adding EOFBLOCKS flag */
+            if (!(oldflags & EXT4_EOFBLOCKS_FL)) {
+                err = -EOPNOTSUPP;
+                goto flags_out;
+            }
+        } else if (oldflags & EXT4_EOFBLOCKS_FL)
+            /* free the space reserved with fallocate KEEPSIZE */
+            vmtruncate(inode, inode->i_size);
+
         handle = ext4_journal_start(inode, 1);
         if (IS_ERR(handle)) {
             err = PTR_ERR(handle);

Jiaying
* Re: Question on fallocate/ftruncate sequence 2009-09-24 5:27 ` Jiaying Zhang @ 2009-09-25 7:35 ` Andreas Dilger 2009-09-25 22:08 ` Jiaying Zhang 2009-09-29 19:15 ` Eric Sandeen 1 sibling, 1 reply; 42+ messages in thread
From: Andreas Dilger @ 2009-09-25 7:35 UTC
To: Jiaying Zhang
Cc: Theodore Tso, Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development

On Sep 23, 2009 22:27 -0700, Jiaying Zhang wrote:
> Sorry for taking so long to finish this. Here is the new patch based on
> Andreas's suggestions. Now the patch clears the EXT4_EOFBLOCKS_FL
> flag when we allocate beyond the maximum allocated block. I also
> made the EOFBLOCKS flag user visible and added the handling
> in ext4_ioctl as Andreas suggested.
>
> Index: linux-2.6.30.5/include/linux/fs.h
> ===================================================================
> +#define FS_EOFBLOCKS_FL 0x00200000 /* Blocks allocated beyond EOF */

Can you please use 0x00400000 here? I've already asked Ted to reserve
the 0x00200000 inode flag for use by large extended attributes.

> #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */
>
> -#define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */
> +#define FS_FL_USER_VISIBLE 0x0023DFFF /* User visible flags */

This would need to be changed to 0x0043DFFF to match.

Sorry, I haven't looked at the rest of the patch yet, just thought I'd
mention this early on.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* Re: Question on fallocate/ftruncate sequence 2009-09-25 7:35 ` Andreas Dilger @ 2009-09-25 22:08 ` Jiaying Zhang 0 siblings, 0 replies; 42+ messages in thread
From: Jiaying Zhang @ 2009-09-25 22:08 UTC
To: Andreas Dilger
Cc: Theodore Tso, Frank Mayhar, Eric Sandeen, Curt Wohlgemuth, ext4 development

Thanks a lot for the notification! I will change it in my patch.

Jiaying
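The mask values discussed above follow from OR-ing the flag bit into the old visibility mask. A small sketch of that arithmetic, using the constants quoted in the thread; the macro and function names are made up for illustration, and the ext4-side result is derived here rather than stated by anyone in the thread:

```c
#include <assert.h>
#include <stdint.h>

#define FS_FL_USER_VISIBLE_OLD   0x0003DFFFu /* fs.h mask before the patch */
#define EXT4_FL_USER_VISIBLE_OLD 0x000BDFFFu /* ext4.h mask before the patch */
#define EOFBLOCKS_FIRST_PROPOSAL 0x00200000u /* bit wanted for large xattrs */
#define EOFBLOCKS_REQUESTED      0x00400000u /* bit Andreas asked for */

/* Return `mask` with one extra flag bit made user visible. */
uint32_t make_visible(uint32_t mask, uint32_t flag)
{
    /* A flag is expected to be a single bit (or zero). */
    assert((flag & (flag - 1)) == 0);
    return mask | flag;
}
```

Here `make_visible(FS_FL_USER_VISIBLE_OLD, EOFBLOCKS_REQUESTED)` gives 0x0043DFFF, the value Andreas names, and the same operation on the ext4 mask would give 0x004BDFFF.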
* Re: Question on fallocate/ftruncate sequence 2009-09-24 5:27 ` Jiaying Zhang 2009-09-25 7:35 ` Andreas Dilger @ 2009-09-29 19:15 ` Eric Sandeen 2009-09-29 19:38 ` Jiaying Zhang 1 sibling, 1 reply; 42+ messages in thread
From: Eric Sandeen @ 2009-09-29 19:15 UTC
To: Jiaying Zhang
Cc: Andreas Dilger, Theodore Tso, Frank Mayhar, Curt Wohlgemuth, ext4 development

Jiaying Zhang wrote:
> Sorry for taking so long to finish this. Here is the new patch based on
> Andreas's suggestions. Now the patch clears the EXT4_EOFBLOCKS_FL
> flag when we allocate beyond the maximum allocated block. I also
> made the EOFBLOCKS flag user visible and added the handling
> in ext4_ioctl as Andreas suggested.

I was testing this a bit in xfstests, with test 083 (recently I sent a
patch to the xfs list to let that test run on generic filesystems) which
runs fsstress on a small-ish 100M fs, and that fsstress does space
preallocation (on newer kernels, where the older xfs ioctls are hooked
up to do_fallocate in a generic fashion).

I'm actually seeing more corruption w/ this patch than without it,
though I don't yet see why. I'll double check that it applied properly,
since this was against 2.6.30.5....

Also it strikes me as a little odd to allow clearing of the EOF flag
from userspace, and the subsequent discarding of the blocks past EOF.

Doesn't truncating to i_size do exactly the same thing, in a more
portable way? Why make a new interface unique to ext4?
-Eric
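The portable alternative Eric describes, truncating a file to its own current size, might look like this from userspace. It is a minimal sketch: the function names are hypothetical, and per the start of this thread an unpatched ext4 treats a same-size ftruncate() as a no-op, so whether blocks past EOF are actually released depends on the kernel in use.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Truncate an open file to its current size.
 * Returns 0 on success, -1 on failure (errno set by fstat/ftruncate).
 */
int truncate_to_current_size(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    /* The size does not change; only blocks past EOF could be released. */
    return ftruncate(fd, st.st_size);
}

/* Hypothetical convenience wrapper: open `path`, trim it, close it. */
int trim_path_to_size(const char *path)
{
    int fd = open(path, O_WRONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = truncate_to_current_size(fd);
    close(fd);
    return rc;
}
```

Comparing `st_blocks` from fstat() before and after the call is one way to see whether the filesystem released any preallocated space.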
* Re: Question on fallocate/ftruncate sequence 2009-09-29 19:15 ` Eric Sandeen @ 2009-09-29 19:38 ` Jiaying Zhang 2009-09-29 19:55 ` Eric Sandeen 0 siblings, 1 reply; 42+ messages in thread
From: Jiaying Zhang @ 2009-09-29 19:38 UTC
To: Eric Sandeen
Cc: Andreas Dilger, Theodore Tso, Frank Mayhar, Curt Wohlgemuth, ext4 development

On Tue, Sep 29, 2009 at 12:15 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> Jiaying Zhang wrote:
>> Sorry for taking so long to finish this. Here is the new patch based on
>> Andreas's suggestions. Now the patch clears the EXT4_EOFBLOCKS_FL
>> flag when we allocate beyond the maximum allocated block. I also
>> made the EOFBLOCKS flag user visible and added the handling
>> in ext4_ioctl as Andreas suggested.
>
> I was testing this a bit in xfstests, with test 083 (recently I sent a
> patch to the xfs list to let that test run on generic filesystems) which
> runs fsstress on a small-ish 100M fs, and that fsstress does space
> preallocation (on newer kernels, where the older xfs ioctls are hooked
> up to do_fallocate in a generic fashion).

Does the fsstress use fallocate with KEEP_SIZE?

>
> I'm actually seeing more corruption w/ this patch than without it,
> though I don't yet see why. I'll double check that it applied properly,
> since this was against 2.6.30.5....

Do you want me to port my changes to the latest ext4 git tree?
I should have done so at the beginning.

>
> Also it strikes me as a little odd to allow clearing of the EOF Flag
> from userspace, and the subsequent discarding of the blocks past EOF.
>
> Doesn't truncating to i_size do exactly the same thing, in a more
> portable way? Why make a new interface unique to ext4?

As Andreas suggested, I think the main purpose is to allow users to
scan for any files with the EOF flag using the getflag ioctl. We could
disallow clearing it with the setflag ioctl and just rely on the
truncate interface, but supporting the setflag ioctl interface doesn't
seem to do any harm.
Jiaying
* Re: Question on fallocate/ftruncate sequence 2009-09-29 19:38 ` Jiaying Zhang @ 2009-09-29 19:55 ` Eric Sandeen 2009-09-30 8:10 ` Andreas Dilger 2009-10-02 22:10 ` Jiaying Zhang 0 siblings, 2 replies; 42+ messages in thread From: Eric Sandeen @ 2009-09-29 19:55 UTC (permalink / raw) To: Jiaying Zhang Cc: Andreas Dilger, Theodore Tso, Frank Mayhar, Curt Wohlgemuth, ext4 development Jiaying Zhang wrote: > On Tue, Sep 29, 2009 at 12:15 PM, Eric Sandeen <sandeen@redhat.com> wrote: >> Jiaying Zhang wrote: >>> Sorry for taking so long to finish this. Here is the new patch based on >>> Andreas's suggestions. Now the patch clears the EXT4_EOFBLOCKS_FL >>> flag when we allocate beyond the maximum allocated block. I also >>> made the EOFBLOCKS flag user visible and added the handling >>> in ext4_ioctl as Andreas suggested. >> I was testing this a bit in xfstests, with test 083 (recently I sent a >> patch to the xfs list to let that test run on generic filesystems) which >> runs fsstress on a small-ish 100M fs, and that fsstress does space >> preallocation (on newer kernels, where the older xfs ioctls are hooked >> up to do_fallocate in a generic fashion). > > Does the fsstress use fallocate with KEEP_SIZE? Effectively, yes. It uses the compatible xfs ioctls, which call do_fallocate with KEEP_SIZE. >> I'm actually seeing more corruption w/ this patch than without it, >> though I don't yet see why. I'll double check that it applied properly, >> since this was against 2.6.30.5.... > > Do you want me to port my changes to the latest ext4 git tree? > I should have done so at the beginning. Sure :) >> Also it strikes me as a little odd to allow clearing of the EOF Flag >> from userspace, and the subsequent discarding of the blocks past EOF. >> >> Doesn't truncating to i_size do exactly the same thing, in a more >> portable way? Why make a new interface unique to ext4? 
> > As Andreas suggested, I think the main purpose is to allow users > to scan for any files with EOF flag with the getflag ioctl. We may > not allow users to clear it with the setflag ioctl but just rely on > the truncate interface, but supporting the setflag ioctl interface > doesn't seem to do any harm. I like the idea of being able to find them, but adding the clearing interface seems redundant to me. All filesystems would need to implement this, and I don't see that we gain anything. Thanks, -Eric > Jiaying ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Question on fallocate/ftruncate sequence 2009-09-29 19:55 ` Eric Sandeen @ 2009-09-30 8:10 ` Andreas Dilger 2009-10-02 22:10 ` Jiaying Zhang 1 sibling, 0 replies; 42+ messages in thread From: Andreas Dilger @ 2009-09-30 8:10 UTC (permalink / raw) To: Eric Sandeen Cc: Jiaying Zhang, Theodore Tso, Frank Mayhar, Curt Wohlgemuth, ext4 development On Sep 29, 2009 14:55 -0500, Eric Sandeen wrote: > Jiaying Zhang wrote: > > As Andreas suggested, I think the main purpose is to allow users > > to scan for any files with EOF flag with the getflag ioctl. We may > > not allow users to clear it with the setflag ioctl but just rely on > > the truncate interface, but supporting the setflag ioctl interface > > doesn't seem to do any harm. > > I like the idea of being able to find them, but adding the clearing > interface seems redundant to me. All filesystems would need to > implement this, and I don't see that we gain anything. I hadn't originally thought about being able to clear the flag, but now that it is implemented, I don't have a big objection to it. The one useful feature is that this could be done without changing the timestamps on the files, and it can be done easily from a script. We don't even have an fallocate() or other truncate() interface that can do this without changing the timestamps today. Even if we did use truncate(), that is racy with an application writing to the file, so it is not as safe as a call which is guaranteed to only discard the unused preallocated blocks. I don't think all other filesystems need to support it; that is up to them. Whether applications use this or not remains to be seen. I doubt most applications will be written to be ext4-specific, but I like the ability to scan/clear preallocated space without writing a binary to do that, if I need it in the future. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 42+ messages in thread
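The scanning use case discussed above could look roughly like the following from user space. This is only a sketch under stated assumptions: PROPOSED_EOFBLOCKS_FL uses the bit value from the rebased patch in this thread (0x00400000), which is a proposal here and not something a given kernel is guaranteed to define, and the helper names flags_have_eofblocks() and file_has_eofblocks() are made up for illustration. FS_IOC_GETFLAGS itself is the existing getflags ioctl from <linux/fs.h>.

```c
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumption: bit value proposed in the rebased patch in this thread. */
#define PROPOSED_EOFBLOCKS_FL 0x00400000u

/* Hypothetical helper: test the proposed bit in a flags word. */
static int flags_have_eofblocks(unsigned int flags)
{
	return (flags & PROPOSED_EOFBLOCKS_FL) != 0;
}

/*
 * Hypothetical helper: returns 1 if the flag is set on the file, 0 if it
 * is clear, and -1 on any error (including a filesystem that does not
 * implement the getflags ioctl).
 */
static int file_has_eofblocks(const char *path)
{
	int flags = 0;	/* the kernel reads/writes an int for this ioctl */
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, FS_IOC_GETFLAGS, &flags);
	close(fd);
	if (ret < 0)
		return -1;
	return flags_have_eofblocks((unsigned int)flags);
}
```

A scanner would call file_has_eofblocks() on each regular file it walks; whether the bit is reported at all depends on the flag also being included in the filesystem's user-visible mask, as the patch does.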
* Re: Question on fallocate/ftruncate sequence 2009-09-29 19:55 ` Eric Sandeen 2009-09-30 8:10 ` Andreas Dilger @ 2009-10-02 22:10 ` Jiaying Zhang 2009-10-02 22:29 ` Eric Sandeen 1 sibling, 1 reply; 42+ messages in thread From: Jiaying Zhang @ 2009-10-02 22:10 UTC (permalink / raw) To: Eric Sandeen Cc: Andreas Dilger, Theodore Tso, Frank Mayhar, Curt Wohlgemuth, ext4 development FYI, here is my patch synced with the latest ext4-git tree: diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 9a99672..2a3d043 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -284,10 +284,11 @@ struct flex_groups { #define EXT4_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/ #define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */ #define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */ +#define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF (bit reserved in fs.h) */ #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */ -#define EXT4_FL_USER_VISIBLE 0x000BDFFF /* User visible flags */ -#define EXT4_FL_USER_MODIFIABLE 0x000B80FF /* User modifiable flags */ +#define EXT4_FL_USER_VISIBLE 0x004BDFFF /* User visible flags */ +#define EXT4_FL_USER_MODIFIABLE 0x004B80FF /* User modifiable flags */ /* Flags that should be inherited by new inodes from their parent. 
*/ #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\ diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 10539e3..3972f88 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3131,7 +3131,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, { struct ext4_ext_path *path = NULL; struct ext4_extent_header *eh; - struct ext4_extent newex, *ex; + struct ext4_extent newex, *ex, *last_ex; ext4_fsblk_t newblock; int err = 0, depth, ret, cache_type; unsigned int allocated = 0; @@ -3300,6 +3300,14 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, if (io && flags == EXT4_GET_BLOCKS_DIO_CREATE_EXT) io->flag = DIO_AIO_UNWRITTEN; } + + if (unlikely(inode->i_flags & EXT4_EOFBLOCKS_FL)) { + BUG_ON(!eh->eh_entries); + last_ex = EXT_LAST_EXTENT(eh); + if (iblock + ar.len > le32_to_cpu(last_ex->ee_block) + + ext4_ext_get_actual_len(last_ex)) + inode->i_flags &= ~EXT4_EOFBLOCKS_FL; + } err = ext4_ext_insert_extent(handle, inode, path, &newex, flags); if (err) { /* free data blocks we just allocated */ @@ -3418,6 +3426,13 @@ static void ext4_falloc_update_inode(struct inode *inode, i_size_write(inode, new_size); if (new_size > EXT4_I(inode)->i_disksize) ext4_update_i_disksize(inode, new_size); + } else { + /* + * Mark that we allocate beyond EOF so the subsequent truncate + * can proceed even if the new size is the same as i_size. 
+ */ + if (new_size > i_size_read(inode)) + inode->i_flags |= EXT4_EOFBLOCKS_FL; } } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f246b43..1d1857d 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4620,6 +4620,8 @@ void ext4_truncate(struct inode *inode) if (!ext4_can_truncate(inode)) return; + inode->i_flags &= ~EXT4_EOFBLOCKS_FL; + if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC)) ei->i_state |= EXT4_STATE_DA_ALLOC_CLOSE; @@ -4932,8 +4934,8 @@ void ext4_get_inode_flags(struct ext4_inode_info *ei) { unsigned int flags = ei->vfs_inode.i_flags; - ei->i_flags &= ~(EXT4_SYNC_FL|EXT4_APPEND_FL| - EXT4_IMMUTABLE_FL|EXT4_NOATIME_FL|EXT4_DIRSYNC_FL); + ei->i_flags &= ~(EXT4_SYNC_FL|EXT4_APPEND_FL|EXT4_IMMUTABLE_FL| + EXT4_NOATIME_FL|EXT4_DIRSYNC_FL|EXT4_EOFBLOCKS_FL); if (flags & S_SYNC) ei->i_flags |= EXT4_SYNC_FL; if (flags & S_APPEND) @@ -4944,6 +4946,8 @@ void ext4_get_inode_flags(struct ext4_inode_info *ei) ei->i_flags |= EXT4_NOATIME_FL; if (flags & S_DIRSYNC) ei->i_flags |= EXT4_DIRSYNC_FL; + if (flags & FS_EOFBLOCKS_FL) + ei->i_flags |= EXT4_EOFBLOCKS_FL; } static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode, @@ -5453,7 +5457,9 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) } if (S_ISREG(inode->i_mode) && - attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) { + attr->ia_valid & ATTR_SIZE && + (attr->ia_size < inode->i_size || + (inode->i_flags & EXT4_EOFBLOCKS_FL))) { handle_t *handle; handle = ext4_journal_start(inode, 3); @@ -5484,6 +5490,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) goto err_out; } } + if ((inode->i_flags & EXT4_EOFBLOCKS_FL)) { + rc = vmtruncate(inode, attr->ia_size); + if (rc) + goto err_out; + } } rc = inode_setattr(inode, attr); diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index d1fe495..e7c543d 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -104,6 +104,16 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) flags &= 
~EXT4_EXTENTS_FL; } + if (flags & EXT4_EOFBLOCKS_FL) { + /* we don't support adding EOFBLOCKS flag */ + if (!(oldflags & EXT4_EOFBLOCKS_FL)) { + err = -EOPNOTSUPP; + goto flags_out; + } + } else if (oldflags & EXT4_EOFBLOCKS_FL) + /* free the space reserved with fallocate KEEPSIZE */ + vmtruncate(inode, inode->i_size); + handle = ext4_journal_start(inode, 1); if (IS_ERR(handle)) { err = PTR_ERR(handle); diff --git a/include/linux/fs.h b/include/linux/fs.h index 2adaa25..7b3f0df 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -343,10 +343,11 @@ struct inodes_stat_t { #define FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/ #define FS_EXTENT_FL 0x00080000 /* Extents */ #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */ +#define FS_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */ #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */ -#define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */ -#define FS_FL_USER_MODIFIABLE 0x000380FF /* User modifiable flags */ +#define FS_FL_USER_VISIBLE 0x0043DFFF /* User visible flags */ +#define FS_FL_USER_MODIFIABLE 0x004380FF /* User modifiable flags */ #define SYNC_FILE_RANGE_WAIT_BEFORE 1 Jiaying On Tue, Sep 29, 2009 at 12:55 PM, Eric Sandeen <sandeen@redhat.com> wrote: > Jiaying Zhang wrote: >> On Tue, Sep 29, 2009 at 12:15 PM, Eric Sandeen <sandeen@redhat.com> wrote: >>> Jiaying Zhang wrote: >>>> Sorry for taking so long to finish this. Here is the new patch based on >>>> Andreas's suggestions. Now the patch clears the EXT4_EOFBLOCKS_FL >>>> flag when we allocate beyond the maximum allocated block. I also >>>> made the EOFBLOCKS flag user visible and added the handling >>>> in ext4_ioctl as Andreas suggested. 
>>> I was testing this a bit in xfstests, with test 083 (recently I sent a >>> patch to the xfs list to let that test run on generic filesystems) which >>> runs fsstress on a small-ish 100M fs, and that fsstress does space >>> preallocation (on newer kernels, where the older xfs ioctls are hooked >>> up to do_fallocate in a generic fashion). >> >> Does the fsstress use fallocate with KEEP_SIZE? > > Effectively, yes. It uses the compatible xfs ioctls, which call > do_fallocate with KEEP_SIZE. > >>> I'm actually seeing more corruption w/ this patch than without it, >>> though I don't yet see why. I'll double check that it applied properly, >>> since this was against 2.6.30.5.... >> >> Do you want me to port my changes to the latest ext4 git tree? >> I should have done so at the beginning. > > Sure :) > >>> Also it strikes me as a little odd to allow clearing of the EOF Flag >>> from userspace, and the subsequent discarding of the blocks past EOF. >>> >>> Doesn't truncating to i_size do exactly the same thing, in a more >>> portable way? Why make a new interface unique to ext4? >> >> As Andreas suggested, I think the main purpose is to allow users >> to scan for any files with EOF flag with the getflag ioctl. We may >> not allow users to clear it with the setflag ioctl but just rely on >> the truncate interface, but supporting the setflag ioctl interface >> doesn't seem to do any harm. > > I like the idea of being able to find them, but adding the clearing > interface seems redundant to me. All filesystems would need to > implement this, and I don't see that we gain anything. > > Thanks, > -Eric > >> Jiaying > ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: Question on fallocate/ftruncate sequence 2009-10-02 22:10 ` Jiaying Zhang @ 2009-10-02 22:29 ` Eric Sandeen 2009-10-02 23:21 ` Jiaying Zhang 0 siblings, 1 reply; 42+ messages in thread From: Eric Sandeen @ 2009-10-02 22:29 UTC (permalink / raw) To: Jiaying Zhang Cc: Andreas Dilger, Theodore Tso, Frank Mayhar, Curt Wohlgemuth, ext4 development Jiaying Zhang wrote: > FYI, here is my patch synced with the latest ext4-git tree: The patch came through pretty mangled, if you could post it in plain text w/ no wrapping, or maybe as an attachment, it'd be great. Thanks, -Eric ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Question on fallocate/ftruncate sequence 2009-10-02 22:29 ` Eric Sandeen @ 2009-10-02 23:21 ` Jiaying Zhang 0 siblings, 0 replies; 42+ messages in thread From: Jiaying Zhang @ 2009-10-02 23:21 UTC (permalink / raw) To: Eric Sandeen Cc: Andreas Dilger, Theodore Tso, Frank Mayhar, Curt Wohlgemuth, ext4 development [-- Attachment #1: Type: text/plain, Size: 441 bytes --] On Fri, Oct 2, 2009 at 3:29 PM, Eric Sandeen <sandeen@redhat.com> wrote: > Jiaying Zhang wrote: >> >> FYI, here is my patch synced with the latest ext4-git tree: > > The patch came through pretty mangled, if you could post it in plain text w/ > no wrapping, or maybe as an attachment, it'd be great. Hmm. I guess my email client re-formatted it somehow. Sorry about the problem. Patch attached in this email. Jiaying > > Thanks, > -Eric > [-- Attachment #2: fallocate_keepsizse.patch --] [-- Type: text/x-patch, Size: 5696 bytes --] diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 9a99672..2a3d043 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -284,10 +284,11 @@ struct flex_groups { #define EXT4_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/ #define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */ #define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */ +#define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF (bit reserved in fs.h) */ #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */ -#define EXT4_FL_USER_VISIBLE 0x000BDFFF /* User visible flags */ -#define EXT4_FL_USER_MODIFIABLE 0x000B80FF /* User modifiable flags */ +#define EXT4_FL_USER_VISIBLE 0x004BDFFF /* User visible flags */ +#define EXT4_FL_USER_MODIFIABLE 0x004B80FF /* User modifiable flags */ /* Flags that should be inherited by new inodes from their parent. 
*/ #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\ diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 10539e3..3972f88 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3131,7 +3131,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, { struct ext4_ext_path *path = NULL; struct ext4_extent_header *eh; - struct ext4_extent newex, *ex; + struct ext4_extent newex, *ex, *last_ex; ext4_fsblk_t newblock; int err = 0, depth, ret, cache_type; unsigned int allocated = 0; @@ -3300,6 +3300,14 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, if (io && flags == EXT4_GET_BLOCKS_DIO_CREATE_EXT) io->flag = DIO_AIO_UNWRITTEN; } + + if (unlikely(inode->i_flags & EXT4_EOFBLOCKS_FL)) { + BUG_ON(!eh->eh_entries); + last_ex = EXT_LAST_EXTENT(eh); + if (iblock + ar.len > le32_to_cpu(last_ex->ee_block) + + ext4_ext_get_actual_len(last_ex)) + inode->i_flags &= ~EXT4_EOFBLOCKS_FL; + } err = ext4_ext_insert_extent(handle, inode, path, &newex, flags); if (err) { /* free data blocks we just allocated */ @@ -3418,6 +3426,13 @@ static void ext4_falloc_update_inode(struct inode *inode, i_size_write(inode, new_size); if (new_size > EXT4_I(inode)->i_disksize) ext4_update_i_disksize(inode, new_size); + } else { + /* + * Mark that we allocate beyond EOF so the subsequent truncate + * can proceed even if the new size is the same as i_size. 
+ */ + if (new_size > i_size_read(inode)) + inode->i_flags |= EXT4_EOFBLOCKS_FL; } } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f246b43..1d1857d 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4620,6 +4620,8 @@ void ext4_truncate(struct inode *inode) if (!ext4_can_truncate(inode)) return; + inode->i_flags &= ~EXT4_EOFBLOCKS_FL; + if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC)) ei->i_state |= EXT4_STATE_DA_ALLOC_CLOSE; @@ -4932,8 +4934,8 @@ void ext4_get_inode_flags(struct ext4_inode_info *ei) { unsigned int flags = ei->vfs_inode.i_flags; - ei->i_flags &= ~(EXT4_SYNC_FL|EXT4_APPEND_FL| - EXT4_IMMUTABLE_FL|EXT4_NOATIME_FL|EXT4_DIRSYNC_FL); + ei->i_flags &= ~(EXT4_SYNC_FL|EXT4_APPEND_FL|EXT4_IMMUTABLE_FL| + EXT4_NOATIME_FL|EXT4_DIRSYNC_FL|EXT4_EOFBLOCKS_FL); if (flags & S_SYNC) ei->i_flags |= EXT4_SYNC_FL; if (flags & S_APPEND) @@ -4944,6 +4946,8 @@ void ext4_get_inode_flags(struct ext4_inode_info *ei) ei->i_flags |= EXT4_NOATIME_FL; if (flags & S_DIRSYNC) ei->i_flags |= EXT4_DIRSYNC_FL; + if (flags & FS_EOFBLOCKS_FL) + ei->i_flags |= EXT4_EOFBLOCKS_FL; } static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode, @@ -5453,7 +5457,9 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) } if (S_ISREG(inode->i_mode) && - attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) { + attr->ia_valid & ATTR_SIZE && + (attr->ia_size < inode->i_size || + (inode->i_flags & EXT4_EOFBLOCKS_FL))) { handle_t *handle; handle = ext4_journal_start(inode, 3); @@ -5484,6 +5490,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) goto err_out; } } + if ((inode->i_flags & EXT4_EOFBLOCKS_FL)) { + rc = vmtruncate(inode, attr->ia_size); + if (rc) + goto err_out; + } } rc = inode_setattr(inode, attr); diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index d1fe495..e7c543d 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -104,6 +104,16 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) flags &= 
~EXT4_EXTENTS_FL; } + if (flags & EXT4_EOFBLOCKS_FL) { + /* we don't support adding EOFBLOCKS flag */ + if (!(oldflags & EXT4_EOFBLOCKS_FL)) { + err = -EOPNOTSUPP; + goto flags_out; + } + } else if (oldflags & EXT4_EOFBLOCKS_FL) + /* free the space reserved with fallocate KEEPSIZE */ + vmtruncate(inode, inode->i_size); + handle = ext4_journal_start(inode, 1); if (IS_ERR(handle)) { err = PTR_ERR(handle); diff --git a/include/linux/fs.h b/include/linux/fs.h index 2adaa25..7b3f0df 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -343,10 +343,11 @@ struct inodes_stat_t { #define FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/ #define FS_EXTENT_FL 0x00080000 /* Extents */ #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */ +#define FS_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */ #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */ -#define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */ -#define FS_FL_USER_MODIFIABLE 0x000380FF /* User modifiable flags */ +#define FS_FL_USER_VISIBLE 0x0043DFFF /* User visible flags */ +#define FS_FL_USER_MODIFIABLE 0x004380FF /* User modifiable flags */ #define SYNC_FILE_RANGE_WAIT_BEFORE 1 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: Question on fallocate/ftruncate sequence (and flags) 2009-07-21 21:54 ` Andreas Dilger 2009-07-22 16:24 ` Frank Mayhar 2009-07-22 23:10 ` Frank Mayhar @ 2009-07-23 19:48 ` Frank Mayhar 2009-07-23 20:37 ` Eric Sandeen 2 siblings, 1 reply; 42+ messages in thread From: Frank Mayhar @ 2009-07-23 19:48 UTC (permalink / raw) To: Andreas Dilger; +Cc: Eric Sandeen, Curt Wohlgemuth, ext4 development On Tue, 2009-07-21 at 15:54 -0600, Andreas Dilger wrote: > That said, we might need to have some kind of flag in the on-disk > inode to indicate that it was preallocated beyond EOF. Otherwise, > e2fsck will try and extend the file size to match the block count, > which isn't correct. We could also use this flag to determine if > truncate needs to be run on the inode even if the new size is the > same. As it happens there's already a flag, FS_FALLOC_FL, set by ext2 in fallocate(). Unfortunately ext4 is using that bit (0x00040000) for EXT4_HUGE_FILE_FL. (Ext4 is using another bit as well, 0x00100000, for EXT4_EXT_MIGRATE_FL when fs.h defines it as FS_DIRECTIO_FL.) I really want to use the FS_FALLOC_FL bit for this purpose but that means reallocating HUGE_FILE_FL to some other bit. Objections? -- Frank Mayhar <fmayhar@google.com> Google, Inc. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Question on fallocate/ftruncate sequence (and flags) 2009-07-23 19:48 ` Question on fallocate/ftruncate sequence (and flags) Frank Mayhar @ 2009-07-23 20:37 ` Eric Sandeen 2009-07-23 21:01 ` Frank Mayhar 2009-07-23 21:53 ` Andreas Dilger 0 siblings, 2 replies; 42+ messages in thread From: Eric Sandeen @ 2009-07-23 20:37 UTC (permalink / raw) To: Frank Mayhar; +Cc: Andreas Dilger, Curt Wohlgemuth, ext4 development Frank Mayhar wrote: > On Tue, 2009-07-21 at 15:54 -0600, Andreas Dilger wrote: >> That said, we might need to have some kind of flag in the on-disk >> inode to indicate that it was preallocated beyond EOF. Otherwise, >> e2fsck will try and extend the file size to match the block count, >> which isn't correct. We could also use this flag to determine if >> truncate needs to be run on the inode even if the new size is the >> same. > > As it happens there's already a flag, FS_FALLOC_FL, set by ext2 in > fallocate(). Unfortunately ext4 is using that bit (0x00040000) for > EXT4_HUGE_FILE_FL. (Ext4 is using another bit as well, 0x00100000, for > EXT4_EXT_MIGRATE_FL when fs.h defines it as FS_DIRECTIO_FL.) I really > want to use the FS_FALLOC_FL bit for this purpose but that means > reallocating HUGE_FILE_FL to some other bit. Objections? I'm confused (again?) :). I don't see FS_FALLOC_FL in the latest kernel source, and ext2 (well, my ext2 anyway) can't do fallocate(). Google (well, my google search) can't find it either. Is this something in your tree? As for: #define EXT4_EXT_MIGRATE 0x00100000 /* Inode is migrating */ this is not in the mask that FS_IOC_GETFLAGS can see ... and I don't think anyone else uses FS_DIRECTIO_FL. I'm not sure if the flags not in FS_FL_USER_VISIBLE are supposed to be fs-unique. -Eric ^ permalink raw reply [flat|nested] 42+ messages in thread
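The point about which bits the getflags mask can see comes down to a simple mask test. A minimal sketch, using only the values quoted in this thread (they are copied from the headers as posted here, not from any particular kernel release, and the helper name flag_in_mask() is made up for illustration):

```c
#include <stdint.h>

/* Values as quoted in this thread, before the proposed patch. */
#define FS_FL_USER_VISIBLE    0x0003DFFFu
#define EXT4_FL_USER_VISIBLE  0x000BDFFFu
#define EXT4_HUGE_FILE_FL     0x00040000u
#define EXT4_EXTENTS_FL       0x00080000u
#define EXT4_EXT_MIGRATE      0x00100000u

/* Hypothetical helper: nonzero if every bit of flag is present in mask. */
static int flag_in_mask(uint32_t mask, uint32_t flag)
{
	return (mask & flag) == flag;
}
```

With these values, EXT4_EXT_MIGRATE (0x00100000) and EXT4_HUGE_FILE_FL (0x00040000) fall outside both masks, while EXT4_EXTENTS_FL (0x00080000) is covered by EXT4_FL_USER_VISIBLE but not by FS_FL_USER_VISIBLE.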
* Re: Question on fallocate/ftruncate sequence (and flags) 2009-07-23 20:37 ` Eric Sandeen @ 2009-07-23 21:01 ` Frank Mayhar 2009-07-29 15:29 ` Jan Kara 2009-07-23 21:53 ` Andreas Dilger 1 sibling, 1 reply; 42+ messages in thread From: Frank Mayhar @ 2009-07-23 21:01 UTC (permalink / raw) To: Eric Sandeen; +Cc: Andreas Dilger, Curt Wohlgemuth, ext4 development On Thu, 2009-07-23 at 15:37 -0500, Eric Sandeen wrote: > I'm confused (again?) :). I don't see FS_FALLOC_FL in the latest kernel > source, and ext2 (well, my ext2 anyway) can't do fallocate(). Google > (well, my google search) can't find it either. Is this something in > your tree? No, I'm the one who got confused, yes, that's part of a hack in our tree. You did answer my question, though, at least partly: > As for: > > #define EXT4_EXT_MIGRATE 0x00100000 /* Inode is migrating */ > > this is not in the mask that FS_IOC_GETFLAGS can see ... and I don't > think anyone else uses FS_DIRECTIO_FL. > > I'm not sure if the flags not in FS_FL_USER_VISIBLE are supposed to be > fs-unique. The flag will need to be generic in any case, since inode_setattr() has to look at it when it's deciding whether or not to call vmtruncate(). Other filesystems that properly implement fallocate() may want to use it for this purpose as well. -- Frank Mayhar <fmayhar@google.com> Google, Inc. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Question on fallocate/ftruncate sequence (and flags) 2009-07-23 21:01 ` Frank Mayhar @ 2009-07-29 15:29 ` Jan Kara 2009-07-29 15:59 ` Frank Mayhar 0 siblings, 1 reply; 42+ messages in thread From: Jan Kara @ 2009-07-29 15:29 UTC (permalink / raw) To: Frank Mayhar Cc: Eric Sandeen, Andreas Dilger, Curt Wohlgemuth, ext4 development > On Thu, 2009-07-23 at 15:37 -0500, Eric Sandeen wrote: > > As for: > > > > #define EXT4_EXT_MIGRATE 0x00100000 /* Inode is migrating */ > > > > this is not in the mask that FS_IOC_GETFLAGS can see ... and I don't > > think anyone else uses FS_DIRECTIO_FL. > > > > I'm not sure if the flags not in FS_FL_USER_VISIBLE are supposed to be > > fs-unique. > > The flag will need to be generic in any case, since inode_setattr() has > to look at it when it's deciding whether or not to call vmtruncate(). > Other filesystems that properly implement fallocate() may want to use it > for this purpose as well. Actually, Nick Piggin is changing the truncate path (the patches may already be in Al Viro's tree) so that filesystem can come in earlier in the truncate path and can make the decision when to truncate and when not on its own. I guess this would help you... Honza -- Jan Kara <jack@suse.cz> SuSE CR Labs ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Question on fallocate/ftruncate sequence (and flags) 2009-07-29 15:29 ` Jan Kara @ 2009-07-29 15:59 ` Frank Mayhar 0 siblings, 0 replies; 42+ messages in thread From: Frank Mayhar @ 2009-07-29 15:59 UTC (permalink / raw) To: Jan Kara; +Cc: Eric Sandeen, Andreas Dilger, Curt Wohlgemuth, ext4 development On Wed, 2009-07-29 at 17:29 +0200, Jan Kara wrote: > > On Thu, 2009-07-23 at 15:37 -0500, Eric Sandeen wrote: > > > As for: > > > > > > #define EXT4_EXT_MIGRATE 0x00100000 /* Inode is migrating */ > > > > > > this is not in the mask that FS_IOC_GETFLAGS can see ... and I don't > > > think anyone else uses FS_DIRECTIO_FL. > > > > > > I'm not sure if the flags not in FS_FL_USER_VISIBLE are supposed to be > > > fs-unique. > > > > The flag will need to be generic in any case, since inode_setattr() has > > to look at it when it's deciding whether or not to call vmtruncate(). > > Other filesystems that properly implement fallocate() may want to use it > > for this purpose as well. > Actually, Nick Piggin is changing the truncate path (the patches may > already be in Al Viro's tree) so that filesystem can come in earlier in > the truncate path and can make the decision when to truncate and when > not on its own. I guess this would help you... Yes, immensely, but it looks like that's a bit in the future... -- Frank Mayhar <fmayhar@google.com> Google, Inc. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Question on fallocate/ftruncate sequence (and flags) 2009-07-23 20:37 ` Eric Sandeen 2009-07-23 21:01 ` Frank Mayhar @ 2009-07-23 21:53 ` Andreas Dilger 2009-07-23 23:33 ` Greg Freemyer 1 sibling, 1 reply; 42+ messages in thread From: Andreas Dilger @ 2009-07-23 21:53 UTC (permalink / raw) To: Eric Sandeen; +Cc: Frank Mayhar, Curt Wohlgemuth, ext4 development On Jul 23, 2009 15:37 -0500, Eric Sandeen wrote: > Frank Mayhar wrote: > > On Tue, 2009-07-21 at 15:54 -0600, Andreas Dilger wrote: > >> That said, we might need to have some kind of flag in the on-disk > >> inode to indicate that it was preallocated beyond EOF. Otherwise, > >> e2fsck will try and extend the file size to match the block count, > >> which isn't correct. We could also use this flag to determine if > >> truncate needs to be run on the inode even if the new size is the > >> same. > > > > As it happens there's already a flag, FS_FALLOC_FL, set by ext2 in > > fallocate(). Unfortunately ext4 is using that bit (0x00040000) for > > EXT4_HUGE_FILE_FL. (Ext4 is using another bit as well, 0x00100000, for > > EXT4_EXT_MIGRATE_FL when fs.h defines it as FS_DIRECTIO_FL.) I really > > want to use the FS_FALLOC_FL bit for this purpose but that means > > reallocating HUGE_FILE_FL to some other bit. Objections? > > I'm confused (again?) :). I don't see FS_FALLOC_FL in the latest kernel > source, and ext2 (well, my ext2 anyway) can't do fallocate(). Google > (well, my google search) can't find it either. Is this something in > your tree? I think I recall Google working on a patch for fallocate on ext2, but it was vetoed from upstream inclusion because we don't want to flog a dead horse. Hence the Google patch to run ext4 w/o a journal. > As for: > > #define EXT4_EXT_MIGRATE 0x00100000 /* Inode is migrating */ > > this is not in the mask that FS_IOC_GETFLAGS can see ... and I don't > think anyone else uses FS_DIRECTIO_FL. > > I'm not sure if the flags not in FS_FL_USER_VISIBLE are supposed to be > fs-unique. 
Well, they are stored on disk, so we shouldn't have conflicts between ext2 and ext4 for sure. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Question on fallocate/ftruncate sequence (and flags)
  2009-07-23 21:53 ` Andreas Dilger
@ 2009-07-23 23:33 ` Greg Freemyer
  0 siblings, 0 replies; 42+ messages in thread
From: Greg Freemyer @ 2009-07-23 23:33 UTC
To: Andreas Dilger
Cc: Eric Sandeen, Frank Mayhar, Curt Wohlgemuth, ext4 development

>>
>> I'm confused (again?) :). I don't see FS_FALLOC_FL in the latest kernel
>> source, and ext2 (well, my ext2 anyway) can't do fallocate(). Google
>> (well, my google search) can't find it either. Is this something in
>> your tree?
>
> I think I recall Google working on a patch for fallocate on ext2, but
> it was vetoed from upstream inclusion because we don't want to flog
> a dead horse.

Is that ext2 patch still around?

I'm doing some r&d via ext2 and it might be handy to have fallocate()
for it.

Greg
--
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
Preservation and Forensic processing of Exchange Repositories White Paper -
<http://www.norcrossgroup.com/forms/whitepapers/tng_whitepaper_fpe.html>

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
* Re: Question on fallocate/ftruncate sequence
  2009-07-20 22:45 ` Eric Sandeen
  2009-07-21 21:29 ` Frank Mayhar
@ 2009-07-21 22:03 ` Eric Sandeen
  1 sibling, 0 replies; 42+ messages in thread
From: Eric Sandeen @ 2009-07-21 22:03 UTC
To: Curt Wohlgemuth; +Cc: ext4 development

Eric Sandeen wrote:
> Yep, I think you've found a bug, I will look into this soon unless
> someone beats me to it :)

FWIW this'd be a great test to add to the xfstests test suite (it would
test ext4 for this too).

I could do it, but external contributions are welcomed and encouraged,
hint hint ;)

The xfs_io "falloc -k" command will do fallocate with KEEP_SIZE.

-Eric
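For anyone wanting to poke at this before a proper xfstests case exists, the sequence from the original report can be replayed with a small C helper. This is a rough sketch rather than an xfstests test, and it assumes Linux with a glibc that exposes fallocate() and FALLOC_FL_KEEP_SIZE through <fcntl.h>. The names `falloc_truncate_sequence` and `synced_blocks` and the `DEMO_MAIN` guard are invented for this example; the 8MB and 64KB sizes come from Curt's report.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define PREALLOC_BYTES (8L * 1024 * 1024) /* 8MB preallocation, as in the report */
#define WRITE_BYTES    (64L * 1024)       /* 64KB written, then used as the truncate size */

/* fsync the file, then return st_blocks (512-byte units), or -1 on error. */
static long long synced_blocks(int fd)
{
    struct stat st;

    if (fsync(fd) < 0 || fstat(fd, &st) < 0)
        return -1;
    return (long long)st.st_blocks;
}

/*
 * Hypothetical helper: replay fallocate(KEEP_SIZE) -> write -> ftruncate on
 * `path`, recording st_blocks after the write and after the ftruncate.
 * Returns 0 on success, -1 with errno set on failure.
 */
int falloc_truncate_sequence(const char *path, long long *after_write,
                             long long *after_truncate)
{
    char *buf = NULL;
    ssize_t n;
    int fd, saved_errno, ret = -1;

    assert(path && after_write && after_truncate);

    fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    buf = malloc(WRITE_BYTES);
    if (!buf)
        goto out;
    memset(buf, 'x', WRITE_BYTES);

    /* Preallocate past EOF; KEEP_SIZE leaves i_size untouched. */
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, PREALLOC_BYTES) < 0)
        goto out;

    n = write(fd, buf, WRITE_BYTES);
    if (n < 0)
        goto out;
    if (n != WRITE_BYTES) {
        errno = EIO; /* treat a short write as a failure in this sketch */
        goto out;
    }

    if ((*after_write = synced_blocks(fd)) < 0)
        goto out;

    /* The new size equals the current i_size: the case discussed in the thread. */
    if (ftruncate(fd, WRITE_BYTES) < 0)
        goto out;

    if ((*after_truncate = synced_blocks(fd)) < 0)
        goto out;

    ret = 0;
out:
    saved_errno = errno;
    free(buf);
    close(fd);
    errno = saved_errno;
    return ret;
}

#ifdef DEMO_MAIN
int main(int argc, char **argv)
{
    long long w, t;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 2;
    }
    if (falloc_truncate_sequence(argv[1], &w, &t) < 0) {
        perror("falloc_truncate_sequence");
        return 1;
    }
    printf("st_blocks after write: %lld, after ftruncate: %lld\n", w, t);
    return 0;
}
#endif
```

Built with -DDEMO_MAIN and pointed at a file on the filesystem under test, the two numbers tell the story: if ftruncate() released the preallocation, the second count should fall to roughly what 64KB of data needs, while a filesystem showing the behavior Curt reported would be expected to leave it near the 8MB figure. Filesystems without fallocate() support will simply fail with an error from the fallocate() call.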
end of thread, other threads:[~2009-10-02 23:21 UTC | newest]

Thread overview: 42+ messages
2009-07-20 16:36 Question on fallocate/ftruncate sequence Curt Wohlgemuth
2009-07-20 22:45 ` Eric Sandeen
2009-07-21 21:29 ` Frank Mayhar
2009-07-21 21:54 ` Andreas Dilger
2009-07-22 16:24 ` Frank Mayhar
2009-07-22 23:10 ` Frank Mayhar
2009-07-23 3:05 ` Eric Sandeen
2009-07-23 16:27 ` Frank Mayhar
2009-07-23 17:00 ` Eric Sandeen
2009-07-23 18:05 ` Frank Mayhar
2009-07-23 21:56 ` Andreas Dilger
2009-07-23 22:46 ` Frank Mayhar
2009-08-28 18:42 ` Jiaying Zhang
2009-08-28 19:40 ` Andreas Dilger
2009-08-28 21:44 ` Jiaying Zhang
2009-08-28 22:14 ` Andreas Dilger
2009-08-29 0:40 ` Jiaying Zhang
2009-08-30 2:52 ` Theodore Tso
2009-08-31 19:40 ` Jiaying Zhang
2009-08-31 21:56 ` Andreas Dilger
2009-08-31 23:33 ` Jiaying Zhang
2009-09-02 8:41 ` Andreas Dilger
2009-09-03 5:20 ` Jiaying Zhang
2009-09-03 5:32 ` Jiaying Zhang
2009-09-24 5:27 ` Jiaying Zhang
2009-09-25 7:35 ` Andreas Dilger
2009-09-25 22:08 ` Jiaying Zhang
2009-09-29 19:15 ` Eric Sandeen
2009-09-29 19:38 ` Jiaying Zhang
2009-09-29 19:55 ` Eric Sandeen
2009-09-30 8:10 ` Andreas Dilger
2009-10-02 22:10 ` Jiaying Zhang
2009-10-02 22:29 ` Eric Sandeen
2009-10-02 23:21 ` Jiaying Zhang
2009-07-23 19:48 ` Question on fallocate/ftruncate sequence (and flags) Frank Mayhar
2009-07-23 20:37 ` Eric Sandeen
2009-07-23 21:01 ` Frank Mayhar
2009-07-29 15:29 ` Jan Kara
2009-07-29 15:59 ` Frank Mayhar
2009-07-23 21:53 ` Andreas Dilger
2009-07-23 23:33 ` Greg Freemyer
2009-07-21 22:03 ` Question on fallocate/ftruncate sequence Eric Sandeen