* fallocate(FALLOC_FL_PUNCH_HOLE)

From: Richard Laager @ 2012-03-10 20:07 UTC
To: linux-fsdevel

I've been working on a discard patch for QEMU.

I have a couple of questions about the semantics of fallocate()'s
FALLOC_FL_PUNCH_HOLE that are not addressed in the latest man-pages.git.

1. Upon successful return, are the results guaranteed to be on
   stable storage?
   1. If not, is fdatasync() sufficient, or is fsync()
      required?
   2. Does O_DSYNC on open() change any of this?
   3. Does O_DIRECT on open() change any of this?
2. If I punch a hole in a previously preallocated range, is this...
   A. required to undo the preallocation?
   B. permitted, but not required, to undo the preallocation?
   C. forbidden from undoing the preallocation?

If the answer to #2 is not C, it would appear there's no atomic way to
indicate that I'm done with certain data* but I want the filesystem to
continue to guarantee space for me. Is this correct?

* so the filesystem can send a TRIM/UNMAP to an underlying SSD.

Thanks,
Richard
* Re: fallocate(FALLOC_FL_PUNCH_HOLE)

From: Dave Chinner @ 2012-03-14 3:27 UTC
To: Richard Laager; +Cc: linux-fsdevel

On Sat, Mar 10, 2012 at 02:07:05PM -0600, Richard Laager wrote:
> I've been working on a discard patch for QEMU.
>
> I have a couple of questions about the semantics of fallocate()'s
> FALLOC_FL_PUNCH_HOLE that are not addressed in the latest man-pages.git.
>
> 1. Upon successful return, are the results guaranteed to be on
>    stable storage?

No.

>    1. If not, is fdatasync() sufficient, or is fsync()
>       required?

Will be on stable storage before fdatasync() returns.

>    2. Does O_DSYNC on open() change any of this?

Will be on stable storage before fallocate() returns.

>    3. Does O_DIRECT on open() change any of this?

Has no effect on behaviour.

> 2. If I punch a hole in a previously preallocated range, is this...
>    A. required to undo the preallocation?
>    B. permitted, but not required, to undo the preallocation?
>    C. forbidden from undoing the preallocation?

B. Most implementations will give you A, though.

> If the answer to #2 is not C, it would appear there's no atomic way to
> indicate that I'm done with certain data* but I want the filesystem to
> continue to guarantee space for me. Is this correct?

Not through fallocate() right now. XFS has an ioctl that will turn
written ranges and holes back into preallocated space:
XFS_IOC_ZERO_RANGE. I've got a patch that introduces this zeroing
capability to fallocate (see below) which currently works on XFS.

> * so the filesystem can send a TRIM/UNMAP to an underlying SSD.

It does not, however, issue discards on the range, because it is
still allocated space in the filesystem.
It could probably be made to do so, especially as the folks that
requested the XFS_IOC_ZERO_RANGE functionality were asking about
extending it to do this last week.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com


fs: Introduce FALLOC_FL_ZERO_RANGE

From: Dave Chinner <dchinner@redhat.com>

FALLOC_FL_ZERO_RANGE is the equivalent of an atomic hole-punch +
preallocation. It enables ranges of written data to be turned into
zeroes without requiring IO or having to free and reallocate the
extents in the range given, as would occur if we had to punch and
then preallocate them separately. This enables applications to zero
parts of files very quickly without changing the layout of the files
in any way.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_file.c      |    6 +++++-
 include/linux/falloc.h |    1 +
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 825390e..ce2fd17 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -912,7 +912,9 @@ xfs_file_fallocate(
 	int cmd = XFS_IOC_RESVSP;
 	int attr_flags = XFS_ATTR_NOLOCK;
 
-	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
+	if (mode & ~(FALLOC_FL_KEEP_SIZE |
+		     FALLOC_FL_PUNCH_HOLE |
+		     FALLOC_FL_ZERO_RANGE))
 		return -EOPNOTSUPP;
 
 	bf.l_whence = 0;
@@ -923,6 +925,8 @@ xfs_file_fallocate(
 
 	if (mode & FALLOC_FL_PUNCH_HOLE)
 		cmd = XFS_IOC_UNRESVSP;
+	else if (mode & FALLOC_FL_ZERO_RANGE)
+		cmd = XFS_IOC_ZERO_RANGE;
 
 	/* check the new inode size is valid before allocating */
 	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
diff --git a/include/linux/falloc.h b/include/linux/falloc.h
index 73e0b62..9160c70 100644
--- a/include/linux/falloc.h
+++ b/include/linux/falloc.h
@@ -3,6 +3,7 @@
 
 #define FALLOC_FL_KEEP_SIZE	0x01 /* default is extend size */
 #define FALLOC_FL_PUNCH_HOLE	0x02 /* de-allocates range */
+#define FALLOC_FL_ZERO_RANGE	0x04 /* zero/prealloc all blocks in range */
 
 #ifdef __KERNEL__
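Until something like FALLOC_FL_ZERO_RANGE is available, an application limited to fallocate() can only approximate it with the two separate steps the commit message mentions: punch the hole, then preallocate the range again. The following is a minimal sketch assuming Linux and glibc (`_GNU_SOURCE` exposes fallocate()); the helper name `punch_and_repreallocate` is hypothetical. It ends with fdatasync(), which Dave's reply says is sufficient to get a punched hole onto stable storage; opening the file with O_DSYNC would instead make each fallocate() call durable before it returns.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: drop the data in [off, off + len) but keep space
 * reserved for it. This is NOT atomic: between the two fallocate() calls
 * the range is a plain hole, so the freed blocks can be consumed by another
 * writer and the second call may fail with ENOSPC.
 * Returns 0 on success or -errno from the first failing call.
 */
int punch_and_repreallocate(int fd, off_t off, off_t len)
{
    assert(off >= 0 && len > 0);

    /* Deallocate the range; Linux requires PUNCH_HOLE to be ORed with KEEP_SIZE. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, off, len) != 0)
        return -errno;

    /* Reserve the range again (it reads back as zeros) without changing the file size. */
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, off, len) != 0)
        return -errno;

    /* Push the changes to stable storage; fdatasync() rather than fsync(). */
    if (fdatasync(fd) != 0)
        return -errno;

    return 0;
}
```

The window between the two fallocate() calls is exactly the non-atomicity Richard asked about: for that moment the space guarantee is gone, which a single FALLOC_FL_ZERO_RANGE call would avoid.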
* Re: fallocate(FALLOC_FL_PUNCH_HOLE)

From: Richard Laager @ 2012-03-14 6:01 UTC
To: Dave Chinner; +Cc: linux-fsdevel

On Wed, 2012-03-14 at 14:27 +1100, Dave Chinner wrote:
> On Sat, Mar 10, 2012 at 02:07:05PM -0600, Richard Laager wrote:
> > If the answer to #2 is not C, it would appear there's no atomic way to
> > indicate that I'm done with certain data* but I want the filesystem to
> > continue to guarantee space for me. Is this correct?
>
> Not through fallocate() right now. XFS has an ioctl that will turn
> written ranges and holes back into preallocated space:
> XFS_IOC_ZERO_RANGE.

Do filesystems generally track the data necessary to tell the difference
between fallocate() + write() and just a regular write()? If so, it
might be nice for applications to be able to say "I'm done with this
data" and effectively "undo" the write(). In other words, the space
would return to being unallocated or preallocated, whichever it was
originally.

I suspect they don't track preallocation of data ranges once they're
filled with data. So, for example, QEMU will have to be told whether the
administrator wants thin (i.e. use PUNCH_HOLE) or thick (i.e. use
ZERO_RANGE) provisioning.

--
Richard
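Richard's thin/thick distinction comes down to which fallocate() mode a guest discard is translated into. A minimal sketch, assuming Linux headers recent enough to define FALLOC_FL_ZERO_RANGE: the flag was merged later (Linux 3.15) with the value 0x10 rather than the 0x04 proposed in the patch above, and not every filesystem implements it, in which case fallocate() fails with EOPNOTSUPP. The `enum provisioning`, `discard_mode` and `handle_guest_discard` names are hypothetical, not QEMU's actual code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>

/* Hypothetical provisioning policy chosen by the administrator. */
enum provisioning { PROV_THIN, PROV_THICK };

/* Map the policy to fallocate() mode flags; both leave the file size alone. */
int discard_mode(enum provisioning prov)
{
    if (prov == PROV_THICK)
        /* Thick: zero the range but keep its blocks allocated. */
        return FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE;
    /* Thin: free the blocks so the filesystem can reuse or discard them. */
    return FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
}

/* Hypothetical guest-discard handler: returns 0 or -errno (e.g. -EOPNOTSUPP). */
int handle_guest_discard(int fd, off_t off, off_t len, enum provisioning prov)
{
    assert(off >= 0 && len > 0);
    if (fallocate(fd, discard_mode(prov), off, len) != 0)
        return -errno;
    return 0;
}
```

A caller that gets -EOPNOTSUPP for the thick mode has to decide for itself whether falling back to writing zeroes is acceptable; the sketch deliberately leaves that policy out.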
* Re: fallocate(FALLOC_FL_PUNCH_HOLE)

From: Ted Ts'o @ 2012-03-14 12:56 UTC
To: Dave Chinner; +Cc: Richard Laager, linux-fsdevel

On Wed, Mar 14, 2012 at 02:27:09PM +1100, Dave Chinner wrote:
> > If the answer to #2 is not C, it would appear there's no atomic way to
> > indicate that I'm done with certain data* but I want the filesystem to
> > continue to guarantee space for me. Is this correct?
>
> Not through fallocate() right now. XFS has an ioctl that will turn
> written ranges and holes back into preallocated space:
> XFS_IOC_ZERO_RANGE. I've got a patch that introduces this zeroing
> capability to fallocate (see below) which currently works on XFS.

If this is something that an important application or set of
applications really want, maybe we should provide an interface through
fallocate(2) TO DO THIS.

> > * so the filesystem can send a TRIM/UNMAP to an underlying SSD.
>
> It does not, however, issue discards on the range, because it is
> still allocated space in the filesystem. It could probably be
> made to do so, especially as the folks that requested the
> XFS_IOC_ZERO_RANGE functionality were asking about extending it to do
> this last week.

... and if we know that discards will persistently cause blocks to
return zero (the device exports a flag indicating whether this is
true), and that trims are fast (i.e., they export the SATA 3.1
queueable trim command), it might make sense to simply issue a discard
on the range and not even mess with the metadata flags (since messing
with the metadata flags has overhead at punch time and the next time
you want to write to that block).

The choice of what to do should be hidden from the application,
though, and be handled at the file system level.

- Ted