* Hole Punching V3
@ 2010-11-18 1:46 Josef Bacik
  2010-11-18 1:46 ` [PATCH 1/6] fs: add hole punching to fallocate Josef Bacik
  ` (6 more replies)
  0 siblings, 7 replies; 35+ messages in thread
From: Josef Bacik @ 2010-11-18 1:46 UTC (permalink / raw)
  To: david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-dev

This is version 3 of the hole punching series I've been posting. Not much has
changed, the history is below

V2->V3
-FALLOC_FL_PUNCH_HOLE must also have FALLOC_FL_KEEP_SIZE in order to work
-formatting fixes

V1->V2
-Hole punching doesn't change file size
-Fixed the mode checks in ext4/btrfs/gfs2 so they do what they are supposed to

I've updated my local copies of the xfsprogs patches I have to test this to use
KEEP_SIZE and PUNCH_HOLE together, I'll post them after it looks like these
patches are good to go, including the manpage update. The xfstest I wrote ran
fine both on xfs and btrfs (failing on btrfs obviously). Thanks,

Josef

^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 1/6] fs: add hole punching to fallocate 2010-11-18 1:46 Hole Punching V3 Josef Bacik @ 2010-11-18 1:46 ` Josef Bacik 2010-11-18 23:43 ` Jan Kara 2010-11-18 1:46 ` [PATCH 2/6] XFS: handle hole punching via fallocate properly Josef Bacik ` (5 subsequent siblings) 6 siblings, 1 reply; 35+ messages in thread From: Josef Bacik @ 2010-11-18 1:46 UTC (permalink / raw) To: david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-dev Hole punching has already been implemented by XFS and OCFS2, and has the potential to be implemented on both BTRFS and EXT4 so we need a generic way to get to this feature. The simplest way in my mind is to add FALLOC_FL_PUNCH_HOLE to fallocate() since it already looks like the normal fallocate() operation. I've tested this patch with XFS and BTRFS to make sure XFS did what it's supposed to do and that BTRFS failed like it was supposed to. Thank you, Signed-off-by: Josef Bacik <josef@redhat.com> --- fs/open.c | 7 ++++++- include/linux/falloc.h | 1 + 2 files changed, 7 insertions(+), 1 deletions(-) diff --git a/fs/open.c b/fs/open.c index 4197b9e..5b6ef7e 100644 --- a/fs/open.c +++ b/fs/open.c @@ -223,7 +223,12 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len) return -EINVAL; /* Return error if mode is not supported */ - if (mode && !(mode & FALLOC_FL_KEEP_SIZE)) + if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) + return -EOPNOTSUPP; + + /* Punch hole must have keep size set */ + if ((mode & FALLOC_FL_PUNCH_HOLE) && + !(mode & FALLOC_FL_KEEP_SIZE)) return -EOPNOTSUPP; if (!(file->f_mode & FMODE_WRITE)) diff --git a/include/linux/falloc.h b/include/linux/falloc.h index 3c15510..73e0b62 100644 --- a/include/linux/falloc.h +++ b/include/linux/falloc.h @@ -2,6 +2,7 @@ #define _FALLOC_H_ #define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */ +#define FALLOC_FL_PUNCH_HOLE 0x02 /* de-allocates range */ #ifdef __KERNEL__ -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 35+ 
messages in thread
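For readers following the thread, the mode check that this patch adds to do_fallocate() can be tried outside the kernel. What follows is a minimal user-space sketch, not the kernel code itself: check_fallocate_mode() is a hypothetical helper name, and the flag values are copied from the include/linux/falloc.h hunk in the patch.

```c
#include <assert.h>
#include <errno.h>

/* Flag values as given in the include/linux/falloc.h hunk of this patch. */
#define FALLOC_FL_KEEP_SIZE	0x01
#define FALLOC_FL_PUNCH_HOLE	0x02

static_assert((FALLOC_FL_KEEP_SIZE & FALLOC_FL_PUNCH_HOLE) == 0,
	      "the two flags must occupy distinct bits");

/*
 * Hypothetical user-space mirror of the two checks added to do_fallocate().
 * Returns 0 if the mode would pass them, -EOPNOTSUPP otherwise.
 */
static int check_fallocate_mode(int mode)
{
	/* Any bit outside the two known flags is rejected. */
	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
		return -EOPNOTSUPP;

	/* Punch hole must have keep size set. */
	if ((mode & FALLOC_FL_PUNCH_HOLE) &&
	    !(mode & FALLOC_FL_KEEP_SIZE))
		return -EOPNOTSUPP;

	return 0;
}
```

Under these assumptions 0, KEEP_SIZE alone, and PUNCH_HOLE | KEEP_SIZE are accepted, while PUNCH_HOLE on its own or any unknown bit gives -EOPNOTSUPP.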
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-18 1:46 ` [PATCH 1/6] fs: add hole punching to fallocate Josef Bacik @ 2010-11-18 23:43 ` Jan Kara 0 siblings, 0 replies; 35+ messages in thread From: Jan Kara @ 2010-11-18 23:43 UTC (permalink / raw) To: Josef Bacik Cc: david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-devel, joel.becker, jack On Wed 17-11-10 20:46:15, Josef Bacik wrote: > Hole punching has already been implemented by XFS and OCFS2, and has the > potential to be implemented on both BTRFS and EXT4 so we need a generic way to > get to this feature. The simplest way in my mind is to add FALLOC_FL_PUNCH_HOLE > to fallocate() since it already looks like the normal fallocate() operation. > I've tested this patch with XFS and BTRFS to make sure XFS did what it's > supposed to do and that BTRFS failed like it was supposed to. Thank you, Looks nice now. Acked-by: Jan Kara <jack@suse.cz> Honza > > Signed-off-by: Josef Bacik <josef@redhat.com> > --- > fs/open.c | 7 ++++++- > include/linux/falloc.h | 1 + > 2 files changed, 7 insertions(+), 1 deletions(-) > > diff --git a/fs/open.c b/fs/open.c > index 4197b9e..5b6ef7e 100644 > --- a/fs/open.c > +++ b/fs/open.c > @@ -223,7 +223,12 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len) > return -EINVAL; > > /* Return error if mode is not supported */ > - if (mode && !(mode & FALLOC_FL_KEEP_SIZE)) > + if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) > + return -EOPNOTSUPP; > + > + /* Punch hole must have keep size set */ > + if ((mode & FALLOC_FL_PUNCH_HOLE) && > + !(mode & FALLOC_FL_KEEP_SIZE)) > return -EOPNOTSUPP; > > if (!(file->f_mode & FMODE_WRITE)) > diff --git a/include/linux/falloc.h b/include/linux/falloc.h > index 3c15510..73e0b62 100644 > --- a/include/linux/falloc.h > +++ b/include/linux/falloc.h > @@ -2,6 +2,7 @@ > #define _FALLOC_H_ > > #define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */ > +#define FALLOC_FL_PUNCH_HOLE 
0x02 /* de-allocates range */ > > #ifdef __KERNEL__ > > -- > 1.6.6.1 > -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 2/6] XFS: handle hole punching via fallocate properly 2010-11-18 1:46 Hole Punching V3 Josef Bacik 2010-11-18 1:46 ` [PATCH 1/6] fs: add hole punching to fallocate Josef Bacik @ 2010-11-18 1:46 ` Josef Bacik 2010-11-18 1:46 ` [PATCH 3/6] Ocfs2: " Josef Bacik ` (4 subsequent siblings) 6 siblings, 0 replies; 35+ messages in thread From: Josef Bacik @ 2010-11-18 1:46 UTC (permalink / raw) To: david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-dev This patch simply allows XFS to handle the hole punching flag in fallocate properly. I've tested this with a little program that does a bunch of random hole punching with FL_KEEP_SIZE and without it to make sure it does the right thing. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> --- fs/xfs/linux-2.6/xfs_iops.c | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c index 96107ef..07d2164 100644 --- a/fs/xfs/linux-2.6/xfs_iops.c +++ b/fs/xfs/linux-2.6/xfs_iops.c @@ -516,6 +516,7 @@ xfs_vn_fallocate( loff_t new_size = 0; xfs_flock64_t bf; xfs_inode_t *ip = XFS_I(inode); + int cmd = XFS_IOC_RESVSP; /* preallocation on directories not yet supported */ error = -ENODEV; @@ -528,6 +529,9 @@ xfs_vn_fallocate( xfs_ilock(ip, XFS_IOLOCK_EXCL); + if (mode & FALLOC_FL_PUNCH_HOLE) + cmd = XFS_IOC_UNRESVSP; + /* check the new inode size is valid before allocating */ if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > i_size_read(inode)) { @@ -537,8 +541,7 @@ xfs_vn_fallocate( goto out_unlock; } - error = -xfs_change_file_space(ip, XFS_IOC_RESVSP, &bf, - 0, XFS_ATTR_NOLOCK); + error = -xfs_change_file_space(ip, cmd, &bf, 0, XFS_ATTR_NOLOCK); if (error) goto out_unlock; -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 3/6] Ocfs2: handle hole punching via fallocate properly 2010-11-18 1:46 Hole Punching V3 Josef Bacik 2010-11-18 1:46 ` [PATCH 1/6] fs: add hole punching to fallocate Josef Bacik 2010-11-18 1:46 ` [PATCH 2/6] XFS: handle hole punching via fallocate properly Josef Bacik @ 2010-11-18 1:46 ` Josef Bacik 2010-11-18 1:46 ` [PATCH 4/6] Ext4: fail if we try to use hole punch Josef Bacik ` (3 subsequent siblings) 6 siblings, 0 replies; 35+ messages in thread From: Josef Bacik @ 2010-11-18 1:46 UTC (permalink / raw) To: david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-dev This patch just makes ocfs2 use its UNRESERVP ioctl when we get the hole punch flag in fallocate. I didn't test it, but it seems simple enough. Thanks, Acked-by: Jan Kara <jack@suse.cz> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Josef Bacik <josef@redhat.com> --- fs/ocfs2/file.c | 8 ++++++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 77b4c04..ad23a18 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1992,6 +1992,7 @@ static long ocfs2_fallocate(struct inode *inode, int mode, loff_t offset, struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); struct ocfs2_space_resv sr; int change_size = 1; + int cmd = OCFS2_IOC_RESVSP64; if (!ocfs2_writes_unwritten_extents(osb)) return -EOPNOTSUPP; @@ -2002,12 +2003,15 @@ static long ocfs2_fallocate(struct inode *inode, int mode, loff_t offset, if (mode & FALLOC_FL_KEEP_SIZE) change_size = 0; + if (mode & FALLOC_FL_PUNCH_HOLE) + cmd = OCFS2_IOC_UNRESVSP64; + sr.l_whence = 0; sr.l_start = (s64)offset; sr.l_len = (s64)len; - return __ocfs2_change_file_space(NULL, inode, offset, - OCFS2_IOC_RESVSP64, &sr, change_size); + return __ocfs2_change_file_space(NULL, inode, offset, cmd, &sr, + change_size); } int ocfs2_check_range_for_refcount(struct inode *inode, loff_t pos, -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 4/6] Ext4: fail if we try to use hole punch 2010-11-18 1:46 Hole Punching V3 Josef Bacik ` (2 preceding siblings ...) 2010-11-18 1:46 ` [PATCH 3/6] Ocfs2: " Josef Bacik @ 2010-11-18 1:46 ` Josef Bacik 2010-11-18 1:46 ` [PATCH 5/6] Btrfs: " Josef Bacik ` (2 subsequent siblings) 6 siblings, 0 replies; 35+ messages in thread From: Josef Bacik @ 2010-11-18 1:46 UTC (permalink / raw) To: david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-dev Ext4 doesn't have the ability to punch holes yet, so make sure we return EOPNOTSUPP if we try to use hole punching through fallocate. This support can be added later. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> --- fs/ext4/extents.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 0554c48..35bca73 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3622,6 +3622,10 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len) struct ext4_map_blocks map; unsigned int credits, blkbits = inode->i_blkbits; + /* We only support the FALLOC_FL_KEEP_SIZE mode */ + if (mode && (mode != FALLOC_FL_KEEP_SIZE)) + return -EOPNOTSUPP; + /* * currently supporting (pre)allocate mode for extent-based * files _only_ -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 5/6] Btrfs: fail if we try to use hole punch 2010-11-18 1:46 Hole Punching V3 Josef Bacik ` (3 preceding siblings ...) 2010-11-18 1:46 ` [PATCH 4/6] Ext4: fail if we try to use hole punch Josef Bacik @ 2010-11-18 1:46 ` Josef Bacik 2010-11-18 1:46 ` [PATCH 6/6] Gfs2: " Josef Bacik 2011-01-03 21:57 ` Hole Punching V3 Josef Bacik 6 siblings, 0 replies; 35+ messages in thread From: Josef Bacik @ 2010-11-18 1:46 UTC (permalink / raw) To: david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-dev Btrfs doesn't have the ability to punch holes yet, so make sure we return EOPNOTSUPP if we try to use hole punching through fallocate. This support can be added later. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> --- fs/btrfs/inode.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 78877d7..6f08892 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6936,6 +6936,10 @@ static long btrfs_fallocate(struct inode *inode, int mode, alloc_start = offset & ~mask; alloc_end = (offset + len + mask) & ~mask; + /* We only support the FALLOC_FL_KEEP_SIZE mode */ + if (mode && (mode != FALLOC_FL_KEEP_SIZE)) + return -EOPNOTSUPP; + /* * wait for ordered IO before we have any locks. We'll loop again * below with the locks held. -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 6/6] Gfs2: fail if we try to use hole punch 2010-11-18 1:46 Hole Punching V3 Josef Bacik ` (4 preceding siblings ...) 2010-11-18 1:46 ` [PATCH 5/6] Btrfs: " Josef Bacik @ 2010-11-18 1:46 ` Josef Bacik 2011-01-03 21:57 ` Hole Punching V3 Josef Bacik 6 siblings, 0 replies; 35+ messages in thread From: Josef Bacik @ 2010-11-18 1:46 UTC (permalink / raw) To: david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-dev Gfs2 doesn't have the ability to punch holes yet, so make sure we return EOPNOTSUPP if we try to use hole punching through fallocate. This support can be added later. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> --- fs/gfs2/ops_inode.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c index 12cbea7..65cb5f6 100644 --- a/fs/gfs2/ops_inode.c +++ b/fs/gfs2/ops_inode.c @@ -1439,6 +1439,10 @@ static long gfs2_fallocate(struct inode *inode, int mode, loff_t offset, loff_t next = (offset + len - 1) >> sdp->sd_sb.sb_bsize_shift; next = (next + 1) << sdp->sd_sb.sb_bsize_shift; + /* We only support the FALLOC_FL_KEEP_SIZE mode */ + if (mode && (mode != FALLOC_FL_KEEP_SIZE)) + return -EOPNOTSUPP; + offset = (offset >> sdp->sd_sb.sb_bsize_shift) << sdp->sd_sb.sb_bsize_shift; -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
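From user space, the visible effect of patches 4-6 would be that a hole punch request on ext4, btrfs, or gfs2 fails with EOPNOTSUPP. A minimal sketch of a caller follows, assuming a Linux libc that declares fallocate() under _GNU_SOURCE. punch_hole() is a hypothetical wrapper name and is not part of this series; the fallback flag definitions use the values from patch 1/6 in case the libc headers predate them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Fall back to the values from patch 1/6 if the headers lack them. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE	0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE	0x02
#endif

static_assert(FALLOC_FL_PUNCH_HOLE != FALLOC_FL_KEEP_SIZE,
	      "punch hole and keep size must be separate flags");

/*
 * Hypothetical wrapper: deallocate [offset, offset + len) in fd without
 * changing the file size. Returns 0 on success or a negative errno value.
 * KEEP_SIZE is always passed along, as the V3 series requires.
 */
static int punch_hole(int fd, off_t offset, off_t len)
{
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, len) == 0)
		return 0;

	/* -EOPNOTSUPP here means the filesystem cannot punch holes. */
	return -errno;
}
```

A caller that gets -EOPNOTSUPP back could, for example, fall back to writing zeroes over the range, at the cost of not freeing the blocks.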
* Re: Hole Punching V3 2010-11-18 1:46 Hole Punching V3 Josef Bacik ` (5 preceding siblings ...) 2010-11-18 1:46 ` [PATCH 6/6] Gfs2: " Josef Bacik @ 2011-01-03 21:57 ` Josef Bacik 6 siblings, 0 replies; 35+ messages in thread From: Josef Bacik @ 2011-01-03 21:57 UTC (permalink / raw) To: Josef Bacik Cc: david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-devel, joel.becker, jack, akpm, torvalds On Wed, Nov 17, 2010 at 08:46:14PM -0500, Josef Bacik wrote: > This is version 3 of the hole punching series I've been posting. Not much has > changed, the history is below > > V2->V3 > -FALLOC_FL_PUNCH_HOLE must also have FALLOC_FL_KEEP_SIZE in order to work > -formatting fixes > > V1->V2 > -Hole punching doesn't change file size > -Fixed the mode checks in ext4/btrfs/gfs2 so they do what they are supposed to > > I've updated my local copies of the xfsprogs patches I have to test this to use > KEEP_SIZE and PUNCH_HOLE together, I'll post them after it looks like these > patches are good to go, including the manpage update. The xfstest I wrote ran > fine both on xfs and btrfs (failing on btrfs obviously). Thanks, > I'd like to try to get this into the next merge window. It seems everybody is happy with it so far; any other comments? Provided everybody is OK with it, how would you like me to send it to you, Linus? Would you prefer a pull request, or will you just pull the patches off the mailing list? Thanks, Josef ^ permalink raw reply [flat|nested] 35+ messages in thread
* Hole Punching V2
@ 2010-11-15 17:05 Josef Bacik
  2010-11-15 17:05 ` [PATCH 1/6] fs: add hole punching to fallocate Josef Bacik
  0 siblings, 1 reply; 35+ messages in thread
From: Josef Bacik @ 2010-11-15 17:05 UTC (permalink / raw)
  To: david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-dev

This is version 2 of the hole punching series I posted last week. The following
things have changed

-Hole punching doesn't change file size
-Fixed the mode checks in ext4/btrfs/gfs2 so they do what they are supposed to

I posted updates to xfstests and xfsprogs in order to test this new interface,
and ran the test on xfs to make sure hole punching worked properly for xfs and I
tested it on btrfs to make sure it failed properly. This series also adds
support for doing hole punching to ocfs2, but I did not test this part of it,
albeit it's an obvious fix so it should work fine. Thanks,

Josef

^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 1/6] fs: add hole punching to fallocate 2010-11-15 17:05 Hole Punching V2 Josef Bacik @ 2010-11-15 17:05 ` Josef Bacik 2010-11-16 11:16 ` Jan Kara 0 siblings, 1 reply; 35+ messages in thread From: Josef Bacik @ 2010-11-15 17:05 UTC (permalink / raw) To: david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-dev Hole punching has already been implemented by XFS and OCFS2, and has the potential to be implemented on both BTRFS and EXT4 so we need a generic way to get to this feature. The simplest way in my mind is to add FALLOC_FL_PUNCH_HOLE to fallocate() since it already looks like the normal fallocate() operation. I've tested this patch with XFS and BTRFS to make sure XFS did what it's supposed to do and that BTRFS failed like it was supposed to. Thank you, Signed-off-by: Josef Bacik <josef@redhat.com> --- fs/open.c | 2 +- include/linux/falloc.h | 1 + 2 files changed, 2 insertions(+), 1 deletions(-) diff --git a/fs/open.c b/fs/open.c index 4197b9e..ab8dedf 100644 --- a/fs/open.c +++ b/fs/open.c @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len) return -EINVAL; /* Return error if mode is not supported */ - if (mode && !(mode & FALLOC_FL_KEEP_SIZE)) + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))) return -EOPNOTSUPP; if (!(file->f_mode & FMODE_WRITE)) diff --git a/include/linux/falloc.h b/include/linux/falloc.h index 3c15510..851cba2 100644 --- a/include/linux/falloc.h +++ b/include/linux/falloc.h @@ -2,6 +2,7 @@ #define _FALLOC_H_ #define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */ +#define FALLOC_FL_PUNCH_HOLE 0X02 /* de-allocates range */ #ifdef __KERNEL__ -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-15 17:05 ` [PATCH 1/6] fs: add hole punching to fallocate Josef Bacik @ 2010-11-16 11:16 ` Jan Kara 2010-11-16 11:43 ` Jan Kara 2010-11-16 12:53 ` Josef Bacik 0 siblings, 2 replies; 35+ messages in thread From: Jan Kara @ 2010-11-16 11:16 UTC (permalink / raw) To: Josef Bacik Cc: david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-devel, ocfs2-devel On Mon 15-11-10 12:05:18, Josef Bacik wrote: > diff --git a/fs/open.c b/fs/open.c > index 4197b9e..ab8dedf 100644 > --- a/fs/open.c > +++ b/fs/open.c > @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len) > return -EINVAL; > > /* Return error if mode is not supported */ > - if (mode && !(mode & FALLOC_FL_KEEP_SIZE)) > + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))) Why not just: if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) ? > diff --git a/include/linux/falloc.h b/include/linux/falloc.h > index 3c15510..851cba2 100644 > --- a/include/linux/falloc.h > +++ b/include/linux/falloc.h > @@ -2,6 +2,7 @@ > #define _FALLOC_H_ > > #define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */ > +#define FALLOC_FL_PUNCH_HOLE 0X02 /* de-allocates range */ ^ use lowercase 'x' please... Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 35+ messages in thread
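Jan's first point can be checked mechanically: when mode is 0, mode & ~mask is 0 as well, so the leading "mode &&" in the V2 check adds nothing. Below is a small sketch that compares the two forms over a range of values. The function names are made up for illustration, and the flag values are the ones from the patch.

```c
#include <assert.h>

#define FALLOC_FL_KEEP_SIZE	0x01
#define FALLOC_FL_PUNCH_HOLE	0x02
#define FALLOC_SUPPORTED	(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)

static_assert(FALLOC_SUPPORTED == 0x03, "mask covers exactly the two flags");

/* The condition as written in the V2 patch. */
static int rejected_v2(int mode)
{
	return mode && (mode & ~FALLOC_SUPPORTED);
}

/* The simplified condition suggested in the review. */
static int rejected_simplified(int mode)
{
	return (mode & ~FALLOC_SUPPORTED) != 0;
}

/* Returns 1 if both forms agree for every mode in [0, limit), else 0. */
static int forms_agree(int limit)
{
	int mode;

	for (mode = 0; mode < limit; mode++)
		if (rejected_v2(mode) != rejected_simplified(mode))
			return 0;
	return 1;
}
```

The loop only samples non-negative values, but the argument does not depend on the range: the only input where "mode &&" short-circuits is 0, and there the mask test is already false.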
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-16 11:16 ` Jan Kara @ 2010-11-16 11:43 ` Jan Kara 2010-11-16 12:52 ` Josef Bacik 2010-11-16 12:53 ` Josef Bacik 1 sibling, 1 reply; 35+ messages in thread From: Jan Kara @ 2010-11-16 11:43 UTC (permalink / raw) To: Josef Bacik Cc: david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-devel, ocfs2-devel On Tue 16-11-10 12:16:11, Jan Kara wrote: > On Mon 15-11-10 12:05:18, Josef Bacik wrote: > > diff --git a/fs/open.c b/fs/open.c > > index 4197b9e..ab8dedf 100644 > > --- a/fs/open.c > > +++ b/fs/open.c > > @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len) > > return -EINVAL; > > > > /* Return error if mode is not supported */ > > - if (mode && !(mode & FALLOC_FL_KEEP_SIZE)) > > + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))) > Why not just: > if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) ? And BTW, since FALLOC_FL_PUNCH_HOLE does not change the file size, should not we enforce that FALLOC_FL_KEEP_SIZE is / is not set? I don't mind too much which way but keeping it ambiguous (ignored) in the interface usually proves as a bad idea in future when we want to further extend the interface... Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-16 11:43 ` Jan Kara @ 2010-11-16 12:52 ` Josef Bacik 2010-11-16 13:14 ` Jan Kara 0 siblings, 1 reply; 35+ messages in thread From: Josef Bacik @ 2010-11-16 12:52 UTC (permalink / raw) To: Jan Kara Cc: Josef Bacik, david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-devel, ocfs2-devel On Tue, Nov 16, 2010 at 12:43:46PM +0100, Jan Kara wrote: > On Tue 16-11-10 12:16:11, Jan Kara wrote: > > On Mon 15-11-10 12:05:18, Josef Bacik wrote: > > > diff --git a/fs/open.c b/fs/open.c > > > index 4197b9e..ab8dedf 100644 > > > --- a/fs/open.c > > > +++ b/fs/open.c > > > @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len) > > > return -EINVAL; > > > > > > /* Return error if mode is not supported */ > > > - if (mode && !(mode & FALLOC_FL_KEEP_SIZE)) > > > + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))) > > Why not just: > > if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) ? > And BTW, since FALLOC_FL_PUNCH_HOLE does not change the file size, should > not we enforce that FALLOC_FL_KEEP_SIZE is / is not set? I don't mind too > much which way but keeping it ambiguous (ignored) in the interface usually > proves as a bad idea in future when we want to further extend the interface... > Yeah I went back and forth on this. KEEP_SIZE won't change the behavior of PUNCH_HOLE since PUNCH_HOLE implicitly means keep the size. I figured since it's "mode" and not "flags" it would be ok to make either way accepted, but if you prefer that PUNCH_HOLE means you have to have KEEP_SIZE set then I'm cool with that, just let me know one way or the other. Thanks, Josef ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-16 12:52 ` Josef Bacik @ 2010-11-16 13:14 ` Jan Kara 2010-11-17 0:22 ` Andreas Dilger 0 siblings, 1 reply; 35+ messages in thread From: Jan Kara @ 2010-11-16 13:14 UTC (permalink / raw) To: Josef Bacik Cc: Jan Kara, david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-devel, ocfs2-devel On Tue 16-11-10 07:52:50, Josef Bacik wrote: > On Tue, Nov 16, 2010 at 12:43:46PM +0100, Jan Kara wrote: > > On Tue 16-11-10 12:16:11, Jan Kara wrote: > > > On Mon 15-11-10 12:05:18, Josef Bacik wrote: > > > > diff --git a/fs/open.c b/fs/open.c > > > > index 4197b9e..ab8dedf 100644 > > > > --- a/fs/open.c > > > > +++ b/fs/open.c > > > > @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len) > > > > return -EINVAL; > > > > > > > > /* Return error if mode is not supported */ > > > > - if (mode && !(mode & FALLOC_FL_KEEP_SIZE)) > > > > + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))) > > > Why not just: > > > if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) ? > > And BTW, since FALLOC_FL_PUNCH_HOLE does not change the file size, should > > not we enforce that FALLOC_FL_KEEP_SIZE is / is not set? I don't mind too > > much which way but keeping it ambiguous (ignored) in the interface usually > > proves as a bad idea in future when we want to further extend the interface... > > > > Yeah I went back and forth on this. KEEP_SIZE won't change the behavior of > PUNCH_HOLE since PUNCH_HOLE implicitly means keep the size. I figured since its > "mode" and not "flags" it would be ok to make either way accepted, but if you > prefer PUNCH_HOLE means you have to have KEEP_SIZE set then I'm cool with that, > just let me know one way or the other. Thanks, I was wondering about 'mode' vs 'flags' as well. The manpage says: The mode argument determines the operation to be performed on the given range. Currently only one flag is supported for mode... 
So we call it "mode" but speak about "flags"? That seems a bit inconsistent. I'd maybe lean a bit toward the "flags" side and just make sure that only one of FALLOC_FL_KEEP_SIZE, FALLOC_FL_PUNCH_HOLE is set (interpreting FALLOC_FL_KEEP_SIZE as allocate blocks beyond i_size). But I'm not sure what others think. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-16 13:14 ` Jan Kara @ 2010-11-17 0:22 ` Andreas Dilger 2010-11-17 2:11 ` Dave Chinner 0 siblings, 1 reply; 35+ messages in thread From: Andreas Dilger @ 2010-11-17 0:22 UTC (permalink / raw) To: Jan Kara Cc: Josef Bacik, david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-devel, ocfs2-devel On 2010-11-16, at 07:14, Jan Kara wrote: >> Yeah I went back and forth on this. KEEP_SIZE won't change the behavior of PUNCH_HOLE since PUNCH_HOLE implicitly means keep the size. I figured since its "mode" and not "flags" it would be ok to make either way accepted, but if you prefer PUNCH_HOLE means you have to have KEEP_SIZE set then I'm cool with that, just let me know one way or the other. > > So we call it "mode" but speak about "flags"? Seems a bit inconsistent. > I'd maybe lean a bit at the "flags" side and just make sure that only one of FALLOC_FL_KEEP_SIZE, FALLOC_FL_PUNCH_HOLE is set (interpreting FALLOC_FL_KEEP_SIZE as allocate blocks beyond i_size). But I'm not sure what others think. IMHO, it makes more sense for consistency and "get what users expect" that these be treated as flags. Some users will want KEEP_SIZE, but in other cases it may make sense that a hole punch at the end of a file should shrink the file (i.e. the opposite of an append). Cheers, Andreas ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-17 0:22 ` Andreas Dilger @ 2010-11-17 2:11 ` Dave Chinner 2010-11-17 2:28 ` Josef Bacik 2010-11-17 9:19 ` Andreas Dilger 0 siblings, 2 replies; 35+ messages in thread From: Dave Chinner @ 2010-11-17 2:11 UTC (permalink / raw) To: Andreas Dilger Cc: Jan Kara, Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-devel, ocfs2-devel On Tue, Nov 16, 2010 at 06:22:47PM -0600, Andreas Dilger wrote: > On 2010-11-16, at 07:14, Jan Kara wrote: > >> Yeah I went back and forth on this. KEEP_SIZE won't change the > >> behavior of PUNCH_HOLE since PUNCH_HOLE implicitly means keep > >> the size. I figured since its "mode" and not "flags" it would > >> be ok to make either way accepted, but if you prefer PUNCH_HOLE > >> means you have to have KEEP_SIZE set then I'm cool with that, > >> just let me know one way or the other. > > > > So we call it "mode" but speak about "flags"? Seems a bit > > inconsistent. I'd maybe lean a bit at the "flags" side and just > > make sure that only one of FALLOC_FL_KEEP_SIZE, > > FALLOC_FL_PUNCH_HOLE is set (interpreting FALLOC_FL_KEEP_SIZE as > > allocate blocks beyond i_size). But I'm not sure what others > > think. > > IMHO, it makes more sense for consistency and "get what users > expect" that these be treated as flags. Some users will want > KEEP_SIZE, but in other cases it may make sense that a hole punch > at the end of a file should shrink the file (i.e. the opposite of > an append). What's wrong with ftruncate() for this? There's plenty of open questions about the interface if we allow hole punching to change the file size. e.g. where do we set the EOF (offset or offset+len)? What do we do with the rest of the blocks that are now beyond EOF? We weren't asked to punch them out, so do we leave them behind? What if we are leaving written blocks beyond EOF - does any filesystem other than XFS support that (i.e. 
are we introducing different behaviour on different filesystems)? And what happens if the offset is beyond EOF? Do we extend the file, and if so why wouldn't you just use ftruncate() instead? IMO, allowing hole punching to change the file size makes it much more complicated and hence less likely to simply do what the user expects. It also is harder to implement and testing becomes much more intricate. From that perspective, it does not seem desirable to me... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-17 2:11 ` Dave Chinner @ 2010-11-17 2:28 ` Josef Bacik 2010-11-17 2:34 ` Josef Bacik 2010-11-17 9:19 ` Andreas Dilger 1 sibling, 1 reply; 35+ messages in thread From: Josef Bacik @ 2010-11-17 2:28 UTC (permalink / raw) To: Dave Chinner Cc: Andreas Dilger, Jan Kara, Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-devel, ocfs2-devel On Wed, Nov 17, 2010 at 01:11:50PM +1100, Dave Chinner wrote: > On Tue, Nov 16, 2010 at 06:22:47PM -0600, Andreas Dilger wrote: > > On 2010-11-16, at 07:14, Jan Kara wrote: > > >> Yeah I went back and forth on this. KEEP_SIZE won't change the > > >> behavior of PUNCH_HOLE since PUNCH_HOLE implicitly means keep > > >> the size. I figured since its "mode" and not "flags" it would > > >> be ok to make either way accepted, but if you prefer PUNCH_HOLE > > >> means you have to have KEEP_SIZE set then I'm cool with that, > > >> just let me know one way or the other. > > > > > > So we call it "mode" but speak about "flags"? Seems a bit > > > inconsistent. I'd maybe lean a bit at the "flags" side and just > > > make sure that only one of FALLOC_FL_KEEP_SIZE, > > > FALLOC_FL_PUNCH_HOLE is set (interpreting FALLOC_FL_KEEP_SIZE as > > > allocate blocks beyond i_size). But I'm not sure what others > > > think. > > > > IMHO, it makes more sense for consistency and "get what users > > expect" that these be treated as flags. Some users will want > > KEEP_SIZE, but in other cases it may make sense that a hole punch > > at the end of a file should shrink the file (i.e. the opposite of > > an append). > > What's wrong with ftruncate() for this? > > There's plenty of open questions about the interface if we allow > hole punching to change the file size. e.g. where do we set the EOF > (offset or offset+len)? What do we do with the rest of the blocks > that are now beyond EOF? We weren't asked to punch them out, so do > we leave them behind? 
What if we are leaving written blocks beyond > EOF - does any filesystem other than XFS support that (i.e. are we > introducing different behaviour on different filesystems)? And what > happens if the offset is beyond EOF? Do we extend the file, and if > so why wouldn't you just use ftruncate() instead? > > IMO, allowing hole punching to change the file size makes it much > more complicated and hence less likely to simply do what the user > expects. It also is harder to implement and testing becomes much > more intricate. From that perspective, it does not seem desirable to > me... > FWIW I agree with Dave, the only question at this point is do we force users to specify KEEP_SIZE with PUNCH_HOLE? On one hand it makes the interface a bit more consistent, on the other hand it makes the documentation a little weird "We have mode here, but if you want to use PUNCH_HOLE you also have to specify KEEP_SIZE, so really it's like a flags field it's just named poorly" I have no strong opinions the other way so if nobody else does then I'll just do it Jan's way. Thanks, Josef ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-17 2:28 ` Josef Bacik @ 2010-11-17 2:34 ` Josef Bacik 2010-11-17 9:30 ` Andreas Dilger 0 siblings, 1 reply; 35+ messages in thread From: Josef Bacik @ 2010-11-17 2:34 UTC (permalink / raw) To: Josef Bacik Cc: Dave Chinner, Andreas Dilger, Jan Kara, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-devel, ocfs2-devel On Tue, Nov 16, 2010 at 09:28:14PM -0500, Josef Bacik wrote: > On Wed, Nov 17, 2010 at 01:11:50PM +1100, Dave Chinner wrote: > > On Tue, Nov 16, 2010 at 06:22:47PM -0600, Andreas Dilger wrote: > > > On 2010-11-16, at 07:14, Jan Kara wrote: > > > >> Yeah I went back and forth on this. KEEP_SIZE won't change the > > > >> behavior of PUNCH_HOLE since PUNCH_HOLE implicitly means keep > > > >> the size. I figured since its "mode" and not "flags" it would > > > >> be ok to make either way accepted, but if you prefer PUNCH_HOLE > > > >> means you have to have KEEP_SIZE set then I'm cool with that, > > > >> just let me know one way or the other. > > > > > > > > So we call it "mode" but speak about "flags"? Seems a bit > > > > inconsistent. I'd maybe lean a bit at the "flags" side and just > > > > make sure that only one of FALLOC_FL_KEEP_SIZE, > > > > FALLOC_FL_PUNCH_HOLE is set (interpreting FALLOC_FL_KEEP_SIZE as > > > > allocate blocks beyond i_size). But I'm not sure what others > > > > think. > > > > > > IMHO, it makes more sense for consistency and "get what users > > > expect" that these be treated as flags. Some users will want > > > KEEP_SIZE, but in other cases it may make sense that a hole punch > > > at the end of a file should shrink the file (i.e. the opposite of > > > an append). > > > > What's wrong with ftruncate() for this? > > > > There's plenty of open questions about the interface if we allow > > hole punching to change the file size. e.g. where do we set the EOF > > (offset or offset+len)? 
What do we do with the rest of the blocks > > that are now beyond EOF? We weren't asked to punch them out, so do > > we leave them behind? What if we are leaving written blocks beyond > > EOF - does any filesystem other than XFS support that (i.e. are we > > introducing different behaviour on different filesystems)? And what > > happens if the offset is beyond EOF? Do we extend the file, and if > > so why wouldn't you just use ftruncate() instead? > > > > IMO, allowing hole punching to change the file size makes it much > > more complicated and hence less likely to simply do what the user > > expects. It also is harder to implement and testing becomes much > > more intricate. From that perspective, it does not seem desirable to > > me... > > > > FWIW I agree with Dave, the only question at this point is do we force users to > specify KEEP_SIZE with PUNCH_HOLE? On one hand it makes the interface a bit > more consistent, on the other hand it makes the documentation a little weird > > "We have mode here, but if you want to use PUNCH_HOLE you also have to specify > KEEP_SIZE, so really it's like a flags field it's just named poorly" > > I have no strong opinions the other way so if nobody else does then I'll just do > it Jan's way. Thanks, > Sorry, child-induced sleep deprivation bleeding in there, that should read "I have no strong opinions one way or the other." Sheesh, Josef ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-17 2:34 ` Josef Bacik @ 2010-11-17 9:30 ` Andreas Dilger 0 siblings, 0 replies; 35+ messages in thread From: Andreas Dilger @ 2010-11-17 9:30 UTC (permalink / raw) To: Josef Bacik Cc: Dave Chinner, Jan Kara, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-devel, ocfs2-devel On 2010-11-16, at 20:34, Josef Bacik wrote: > FWIW I agree with Dave, the only question at this point is do we force users to specify KEEP_SIZE with PUNCH_HOLE? On one hand it makes the interface a bit more consistent, on the other hand it makes the documentation a little weird > > "We have mode here, but if you want to use PUNCH_HOLE you also have to specify KEEP_SIZE, so really it's like a flags field it's just named poorly" Even if this is the case, and we decide today that PUNCH_HOLE without KEEP_SIZE is not desirable to implement, it would be better to just return -EOPNOTSUPP if both flags are not set than assume one or the other is what the user wanted. That allows the ability to implement this in the future without breaking every application, while if it is assumed that KEEP_SIZE is always implicit there will never be a way to add that functionality without something awful like a separate CHANGE_SIZE flag for PUNCH_HOLE. One option is to define FALLOC_FL_PUNCH_HOLE as 0x3 (so that KEEP_SIZE is always passed) and in the future we can define some new flag name like TRUNCATE_HOLE (or whatever) that is 0x2 only. Cheers, Andreas ^ permalink raw reply [flat|nested] 35+ messages in thread
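The check Andreas argues for here (reject PUNCH_HOLE unless KEEP_SIZE is also passed, rather than silently assuming it) can be sketched in plain user-space C. This is only an illustration of the flag logic, not kernel code: check_falloc_mode() is a hypothetical helper name, and the two flag values are the ones from the patch under discussion.

```c
#include <errno.h>

/* Flag values as defined by the patch under discussion. */
#define FALLOC_FL_KEEP_SIZE  0x01
#define FALLOC_FL_PUNCH_HOLE 0x02

/*
 * Hypothetical helper (not a kernel function): returns 0 when the
 * mode bits are acceptable and -EOPNOTSUPP otherwise.
 */
static int check_falloc_mode(int mode)
{
	/* Any bit outside the two known flags is unsupported. */
	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
		return -EOPNOTSUPP;

	/* PUNCH_HOLE on its own is rejected, leaving room to define it later. */
	if ((mode & FALLOC_FL_PUNCH_HOLE) && !(mode & FALLOC_FL_KEEP_SIZE))
		return -EOPNOTSUPP;

	return 0;
}
```

Because PUNCH_HOLE without KEEP_SIZE fails today, that combination stays free to acquire size-changing semantics later without breaking existing callers, which is the compatibility argument made above.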
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-17 2:11 ` Dave Chinner 2010-11-17 2:28 ` Josef Bacik @ 2010-11-17 9:19 ` Andreas Dilger 1 sibling, 0 replies; 35+ messages in thread From: Andreas Dilger @ 2010-11-17 9:19 UTC (permalink / raw) To: Dave Chinner Cc: Jan Kara, Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-devel, ocfs2-devel On 2010-11-16, at 20:11, Dave Chinner wrote: > On Tue, Nov 16, 2010 at 06:22:47PM -0600, Andreas Dilger wrote: >> IMHO, it makes more sense for consistency and "get what users >> expect" that these be treated as flags. Some users will want >> KEEP_SIZE, but in other cases it may make sense that a hole punch >> at the end of a file should shrink the file (i.e. the opposite of >> an append). > > What's wrong with ftruncate() for this? It makes the API usage from applications more consistent. It would be inconvenient, for example, if applications had to use a different system call if they were writing in the middle of the file vs. at the end, wouldn't it? Similarly, if multiple threads are appending vs. punching (let's assume non-overlapping regions, for sanity, like a producer/consumer model punching out completed records) then using ftruncate() to remove the last record and shrink the file would require locking the whole file from userspace (unlike the append, which does this in the kernel), or risk discarding unprocessed data beyond the record that was punched out. > There's plenty of open questions about the interface if we allow > hole punching to change the file size. e.g. where do we set the EOF > (offset or offset+len)? I would think it natural that the new size is the start of the region, like an "anti-write" (where write sets the size at the end of the added bytes). > What do we do with the rest of the blocks that are now beyond EOF? > We weren't asked to punch them out, so do we leave them behind? I definitely think they should be left as is. 
If they were in the punched-out range, they would be deallocated, and if they are beyond EOF they will remain as they are - we didn't ask to remove them unless the punched-out range went to ~0ULL (which would make it equivalent to an ftruncate()). > What if we are leaving written blocks beyond EOF - does any filesystem other than XFS support that (i.e. are we introducing different behaviour on different filesystems)? I'm not sure I understand what a "written block beyond EOF" means. How can there be data beyond EOF? I think the KEEP_SIZE flag is only relevant if the punch is spanning EOF, like the opposite of a write that is spanning EOF. If KEEP_SIZE is set, then it leaves the size unchanged, and if unset and punch spans EOF it reduces the file size. If the punch is not at EOF it doesn't change the file size, just like a write that is not at EOF. > And what happens if the offset is beyond EOF? Do we extend the file, and if so why wouldn't you just use ftruncate() instead? Even if the effects were the same, it makes sense because applications may be using fallocate(PUNCH_HOLE) to punch out records, and having them special case the use of ftruncate() to get certain semantics at the end of the file adds needless complexity. Cheers, Andreas ^ permalink raw reply [flat|nested] 35+ messages in thread
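The size semantics Andreas proposes in this message can be written down as a small function. This is a sketch of the proposal only (the posted V2/V3 patches keep the file size unchanged); size_after_punch() and file_off_t are hypothetical names, and leaving the size alone when the punch starts at or beyond EOF is an assumption, since the message does not settle that case.

```c
#include <assert.h>
#include <stdbool.h>

typedef long long file_off_t; /* stand-in for the kernel's loff_t */

/*
 * Hypothetical model of the proposed semantics: returns the file size
 * after punching [offset, offset + len). Assumes offset + len does not
 * overflow.
 */
static file_off_t size_after_punch(file_off_t size, file_off_t offset,
				   file_off_t len, bool keep_size)
{
	assert(size >= 0 && offset >= 0 && len > 0);

	/* With KEEP_SIZE the size is never changed. */
	if (keep_size)
		return size;

	/*
	 * Punch starts inside the file and reaches or passes EOF:
	 * the new size is the start of the punched region ("anti-write").
	 */
	if (offset < size && offset + len >= size)
		return offset;

	/* Punch entirely inside the file, or entirely beyond EOF (assumed). */
	return size;
}
```

For a 100-byte file, punching 20 bytes at offset 80 without KEEP_SIZE would give a size of 80 under this model, while punching 20 bytes at offset 10 leaves the size at 100.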
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-16 11:16 ` Jan Kara 2010-11-16 11:43 ` Jan Kara @ 2010-11-16 12:53 ` Josef Bacik 1 sibling, 0 replies; 35+ messages in thread From: Josef Bacik @ 2010-11-16 12:53 UTC (permalink / raw) To: Jan Kara Cc: Josef Bacik, david, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, cmm, cluster-devel, ocfs2-devel On Tue, Nov 16, 2010 at 12:16:11PM +0100, Jan Kara wrote: > On Mon 15-11-10 12:05:18, Josef Bacik wrote: > > diff --git a/fs/open.c b/fs/open.c > > index 4197b9e..ab8dedf 100644 > > --- a/fs/open.c > > +++ b/fs/open.c > > @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len) > > return -EINVAL; > > > > /* Return error if mode is not supported */ > > - if (mode && !(mode & FALLOC_FL_KEEP_SIZE)) > > + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))) > Why not just: > if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) ? > Good point, thank you. > > diff --git a/include/linux/falloc.h b/include/linux/falloc.h > > index 3c15510..851cba2 100644 > > --- a/include/linux/falloc.h > > +++ b/include/linux/falloc.h > > @@ -2,6 +2,7 @@ > > #define _FALLOC_H_ > > > > #define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */ > > +#define FALLOC_FL_PUNCH_HOLE 0X02 /* de-allocates range */ > ^ use lowercase 'x' please... > Argh bitten by caps-lock again. Thanks I'll fix this up, Josef ^ permalink raw reply [flat|nested] 35+ messages in thread
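Jan's simplification works because a zero mode has no bits set at all, so the leading "mode &&" test never changes the result. A minimal user-space sketch that compares the two forms over a range of modes follows; the function names are made up for the illustration, and the flag values are taken from the patch.

```c
/* Flag values as defined by the patch under discussion. */
#define FALLOC_FL_KEEP_SIZE  0x01
#define FALLOC_FL_PUNCH_HOLE 0x02
#define FALLOC_SUPPORTED     (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)

/* Form used in the first posting: explicit non-zero test first. */
static int rejects_v1(int mode)
{
	return mode && (mode & ~FALLOC_SUPPORTED);
}

/* Form suggested in review: the mask test alone. */
static int rejects_simplified(int mode)
{
	return (mode & ~FALLOC_SUPPORTED) != 0;
}

/* Returns 1 when both forms agree for every mode in [0, limit). */
static int forms_agree(int limit)
{
	for (int mode = 0; mode < limit; mode++)
		if (rejects_v1(mode) != rejects_simplified(mode))
			return 0;
	return 1;
}
```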
* [PATCH 1/6] fs: add hole punching to fallocate @ 2010-11-08 20:32 Josef Bacik 2010-11-09 1:12 ` Dave Chinner 0 siblings, 1 reply; 35+ messages in thread From: Josef Bacik @ 2010-11-08 20:32 UTC (permalink / raw) To: linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, joel.becker, cmm, cluster- Hole punching has already been implemented by XFS and OCFS2, and has the potential to be implemented on both BTRFS and EXT4 so we need a generic way to get to this feature. The simplest way in my mind is to add FALLOC_FL_PUNCH_HOLE to fallocate() since it already looks like the normal fallocate() operation. I've tested this patch with XFS and BTRFS to make sure XFS did what it's supposed to do and that BTRFS failed like it was supposed to. Thank you, Signed-off-by: Josef Bacik <josef@redhat.com> --- fs/open.c | 2 +- include/linux/falloc.h | 1 + 2 files changed, 2 insertions(+), 1 deletions(-) diff --git a/fs/open.c b/fs/open.c index 4197b9e..ab8dedf 100644 --- a/fs/open.c +++ b/fs/open.c @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len) return -EINVAL; /* Return error if mode is not supported */ - if (mode && !(mode & FALLOC_FL_KEEP_SIZE)) + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))) return -EOPNOTSUPP; if (!(file->f_mode & FMODE_WRITE)) diff --git a/include/linux/falloc.h b/include/linux/falloc.h index 3c15510..851cba2 100644 --- a/include/linux/falloc.h +++ b/include/linux/falloc.h @@ -2,6 +2,7 @@ #define _FALLOC_H_ #define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */ +#define FALLOC_FL_PUNCH_HOLE 0X02 /* de-allocates range */ #ifdef __KERNEL__ -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-08 20:32 Josef Bacik @ 2010-11-09 1:12 ` Dave Chinner 2010-11-09 2:10 ` Josef Bacik ` (2 more replies) 0 siblings, 3 replies; 35+ messages in thread From: Dave Chinner @ 2010-11-09 1:12 UTC (permalink / raw) To: Josef Bacik Cc: linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, joel.becker, cmm, cluster-devel On Mon, Nov 08, 2010 at 03:32:02PM -0500, Josef Bacik wrote: > Hole punching has already been implemented by XFS and OCFS2, and has the > potential to be implemented on both BTRFS and EXT4 so we need a generic way to > get to this feature. The simplest way in my mind is to add FALLOC_FL_PUNCH_HOLE > to fallocate() since it already looks like the normal fallocate() operation. > I've tested this patch with XFS and BTRFS to make sure XFS did what it's > supposed to do and that BTRFS failed like it was supposed to. Thank you, > > Signed-off-by: Josef Bacik <josef@redhat.com> > --- > fs/open.c | 2 +- > include/linux/falloc.h | 1 + > 2 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/fs/open.c b/fs/open.c > index 4197b9e..ab8dedf 100644 > --- a/fs/open.c > +++ b/fs/open.c > @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len) > return -EINVAL; > > /* Return error if mode is not supported */ > - if (mode && !(mode & FALLOC_FL_KEEP_SIZE)) > + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))) > return -EOPNOTSUPP; > > if (!(file->f_mode & FMODE_WRITE)) > diff --git a/include/linux/falloc.h b/include/linux/falloc.h > index 3c15510..851cba2 100644 > --- a/include/linux/falloc.h > +++ b/include/linux/falloc.h > @@ -2,6 +2,7 @@ > #define _FALLOC_H_ > > #define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */ > +#define FALLOC_FL_PUNCH_HOLE 0X02 /* de-allocates range */ Hole punching was not included originally in fallocate() for a variety of reasons. 
IIRC, they were along the lines of: 1 de-allocating of blocks in an allocation syscall is wrong. People wanted a new syscall for this functionality. 2 no glibc interface needs it 3 at the time, only XFS supported punching holes, so there was no need to support it in a generic interface 4 the use cases presented were not considered compelling enough to justify the additional complexity (!) In the end, I gave up arguing for it to be included because just getting the FALLOC_FL_KEEP_SIZE functionality was a hard enough battle. Anyway, #3 isn't the case any more, #4 was just an excuse not to support anything ext4 couldn't do and lots of apps are calling fallocate directly (because glibc can't use FALLOC_FL_KEEP_SIZE) so #2 isn't an issue, either. I guess that leaves #1 to be debated; I don't think there is any problem with doing what you propose. What I will suggest is that this requires a generic xfstest to be written and support added to xfs_io to enable that test (and others) to issue hole punches. Something along the lines of test 242 which I wrote for testing all the edge cases of XFS_IOC_ZERO_RANGE (*) would be good. Cheers, Dave. (*) fallocate() version: http://git.kernel.org/?p=linux/kernel/git/dgc/xfsdev.git;a=commitdiff;h=45f3e1831e3abc8bd12ec1e6c548f73a8dd9e36d -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-09 1:12 ` Dave Chinner @ 2010-11-09 2:10 ` Josef Bacik 2010-11-09 3:30 ` Ted Ts'o 2010-11-09 20:51 ` Josef Bacik 2 siblings, 0 replies; 35+ messages in thread From: Josef Bacik @ 2010-11-09 2:10 UTC (permalink / raw) To: Dave Chinner Cc: Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, joel.becker, cmm, cluster-devel On Tue, Nov 09, 2010 at 12:12:22PM +1100, Dave Chinner wrote: > On Mon, Nov 08, 2010 at 03:32:02PM -0500, Josef Bacik wrote: > > Hole punching has already been implemented by XFS and OCFS2, and has the > > potential to be implemented on both BTRFS and EXT4 so we need a generic way to > > get to this feature. The simplest way in my mind is to add FALLOC_FL_PUNCH_HOLE > > to fallocate() since it already looks like the normal fallocate() operation. > > I've tested this patch with XFS and BTRFS to make sure XFS did what it's > > supposed to do and that BTRFS failed like it was supposed to. Thank you, > > > > Signed-off-by: Josef Bacik <josef@redhat.com> > > --- > > fs/open.c | 2 +- > > include/linux/falloc.h | 1 + > > 2 files changed, 2 insertions(+), 1 deletions(-) > > > > diff --git a/fs/open.c b/fs/open.c > > index 4197b9e..ab8dedf 100644 > > --- a/fs/open.c > > +++ b/fs/open.c > > @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len) > > return -EINVAL; > > > > /* Return error if mode is not supported */ > > - if (mode && !(mode & FALLOC_FL_KEEP_SIZE)) > > + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))) > > return -EOPNOTSUPP; > > > > if (!(file->f_mode & FMODE_WRITE)) > > diff --git a/include/linux/falloc.h b/include/linux/falloc.h > > index 3c15510..851cba2 100644 > > --- a/include/linux/falloc.h > > +++ b/include/linux/falloc.h > > @@ -2,6 +2,7 @@ > > #define _FALLOC_H_ > > > > #define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */ > > +#define FALLOC_FL_PUNCH_HOLE 0X02 /* de-allocates range */ > > 
Hole punching was not included originally in fallocate() for a > variety of reasons. IIRC, they were along the lines of: > > 1 de-allocating of blocks in an allocation syscall is wrong. > People wanted a new syscall for this functionality. > 2 no glibc interface needs it > 3 at the time, only XFS supported punching holes, so there > was no need to support it in a generic interface > 4 the use cases presented were not considered compelling > enough to justify the additional complexity (!) > > In the end, I gave up arguing for it to be included because just > getting the FALLOC_FL_KEEP_SIZE functionality was a hard enough > battle. > > Anyway, #3 isn't the case any more, #4 was just an excuse not to > support anything ext4 couldn't do and lots of apps are calling > fallocate directly (because glibc can't use FALLOC_FL_KEEP_SIZE) so > #2 isn't an issue, either. I guess that leaves #1 to be debated; > I don't think there is any problem with doing what you propose. > > What I will suggest is that this requires a generic xfstest to be > written and support added to xfs_io to enable that test (and others) > to issue hole punches. Something along the lines of test 242 which I > wrote for testing all the edge cases of XFS_IOC_ZERO_RANGE (*) would be > good. Sounds good. Do you want me to build my PUNCH_HOLE patch on top of your ZERO_RANGE patch? Thanks, Josef ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-09 1:12 ` Dave Chinner 2010-11-09 2:10 ` Josef Bacik @ 2010-11-09 3:30 ` Ted Ts'o 2010-11-09 4:42 ` Dave Chinner 2010-11-09 20:51 ` Josef Bacik 2 siblings, 1 reply; 35+ messages in thread From: Ted Ts'o @ 2010-11-09 3:30 UTC (permalink / raw) To: Dave Chinner Cc: Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, joel.becker, cmm, cluster-devel On Tue, Nov 09, 2010 at 12:12:22PM +1100, Dave Chinner wrote: > Hole punching was not included originally in fallocate() for a > variety of reasons. IIRC, they were along the lines of: > > 1 de-allocating of blocks in an allocation syscall is wrong. > People wanted a new syscall for this functionality. > 2 no glibc interface needs it > 3 at the time, only XFS supported punching holes, so there > is not need to support it in a generic interface > 4 the use cases presented were not considered compelling > enough to justify the additional complexity (!) > > In the end, I gave up arguing for it to be included because just > getting the FALLOC_FL_KEEP_SIZE functionality was a hard enough > battle. > > Anyway, #3 isn't the case any more, #4 was just an excuse not to > support anything ext4 couldn't do and lots of apps are calling > fallocate directly (because glibc can't use FALLOC_FL_KEEP_SIZE) so > #2 isn't an issue, either. I don't recall anyone arguing #4 because of ext4, but I get very tired of the linux-fsdevel bike-shed painting parties, so I often will concede whatever is necessary just to get the !@#! interface in, assuming we could add more flags later.... glibc does support fallocate(), BTW; it's just posix_fallocate() that doesn't use FALLOC_FL_KEEP_SIZE. > I guess that leaves #1 to be debated; > I don't think there is any problem with doing what you propose. I don't have a problem either. 
As a completely separate proposal, what do people think about a FALLOCATE_FL_ZEROIZE flag, after which the blocks are allocated but reading from them returns zero? This could be done either by (a) sending a discard in the case of devices where discard_zeroes_data is true and discard_granularity is less than the fs block size, or (b) by setting the uninitialized flag in the extent tree. - Ted ^ permalink raw reply [flat|nested] 35+ messages in thread
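The choice between options (a) and (b) in Ted's proposal amounts to a small decision function. The sketch below is purely illustrative: struct dev_limits, enum zero_strategy and pick_zero_strategy() are hypothetical names, the fields are modelled on the block-queue attributes mentioned in the message, and treating a granularity of 0 as "no discard support" is an assumption.

```c
#include <stdbool.h>

enum zero_strategy {
	ZERO_BY_DISCARD,        /* option (a): let the device zero the range */
	ZERO_BY_UNWRITTEN_FLAG, /* option (b): mark the extent uninitialized */
};

/* Hypothetical summary of what the filesystem knows about the device. */
struct dev_limits {
	bool discard_zeroes_data;         /* discarded blocks read back as zero */
	unsigned int discard_granularity; /* bytes; 0 assumed to mean no discard */
};

static enum zero_strategy pick_zero_strategy(const struct dev_limits *dev,
					     unsigned int fs_block_size)
{
	/* Option (a) only when the device zeroes on discard at sub-block granularity. */
	if (dev->discard_zeroes_data &&
	    dev->discard_granularity > 0 &&
	    dev->discard_granularity < fs_block_size)
		return ZERO_BY_DISCARD;

	/* Otherwise fall back to flagging the extent in the extent tree. */
	return ZERO_BY_UNWRITTEN_FLAG;
}
```

The strict "less than" comparison follows the wording of the proposal; a real implementation might well accept a granularity equal to the block size too.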
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-09 3:30 ` Ted Ts'o @ 2010-11-09 4:42 ` Dave Chinner 2010-11-09 21:41 ` Ted Ts'o 0 siblings, 1 reply; 35+ messages in thread From: Dave Chinner @ 2010-11-09 4:42 UTC (permalink / raw) To: Ted Ts'o, Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs On Mon, Nov 08, 2010 at 10:30:38PM -0500, Ted Ts'o wrote: > On Tue, Nov 09, 2010 at 12:12:22PM +1100, Dave Chinner wrote: > > Hole punching was not included originally in fallocate() for a > > variety of reasons. IIRC, they were along the lines of: > > > > 1 de-allocating of blocks in an allocation syscall is wrong. > > People wanted a new syscall for this functionality. .... > > I guess that leaves #1 to be debated; > > I don't think there is any problem with doing what you propose. > > I don't have a problem either. > > As a completely separate proposal, what do people think about an > FALLOCATE_FL_ZEROIZE after which time the blocks are allocated, but > reading from them returns zero. That's exactly what the new XFS_IOC_ZERO_RANGE ioctl in 2.6.36 does (commit 447223520520b17d3b6d0631aa4838fbaf8eddb4 "xfs: Introduce XFS_IOC_ZERO_RANGE") The git commit I pointed to in the last email is the rudimentary fallocate() interface support I have for that code which goes along with an xfs_io patch I have. Given that there seems to be interest in this operation, I'll flesh it out into a proper patch.... > This could be done either by (a) > sending a discard in the case of devices where discard_zeroes_data is > true and discard_granularity is less than the fs block size, or (b) by > setting the uninitialized flag in the extent tree. Implementation is up to the filesystem. 
However, XFS does (b) because: 1) it was extremely simple to implement (one of the advantages of having an exceedingly complex allocation interface to begin with :P) 2) conversion is atomic, fast and reliable 3) it is independent of the underlying storage; and 4) reads of unwritten extents operate at memory speed, not disk speed. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-09 4:42 ` Dave Chinner @ 2010-11-09 21:41 ` Ted Ts'o 2010-11-09 21:53 ` Jan Kara 2010-11-09 23:40 ` Dave Chinner 0 siblings, 2 replies; 35+ messages in thread From: Ted Ts'o @ 2010-11-09 21:41 UTC (permalink / raw) To: Dave Chinner Cc: Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, joel.becker, cmm, cluster-devel On Tue, Nov 09, 2010 at 03:42:42PM +1100, Dave Chinner wrote: > Implementation is up to the filesystem. However, XFS does (b) > because: > > 1) it was extremely simple to implement (one of the > advantages of having an exceedingly complex allocation > interface to begin with :P) > 2) conversion is atomic, fast and reliable > 3) it is independent of the underlying storage; and > 4) reads of unwritten extents operate at memory speed, > not disk speed. Yeah, I was thinking that using a device-style TRIM might be better since future attempts to write to it won't require a separate seek to modify the extent tree. But yeah, there are a bunch of advantages of simply mutating the extent tree. While we're on the subject of changes to fallocate, what do people think of FALLOC_FL_EXPOSE_OLD_DATA, which requires either root privileges or (if capabilities are in use) CAP_DAC_OVERRIDE && CAP_MAC_OVERRIDE && CAP_SYS_ADMIN. This would allow a trusted process to fallocate blocks with the extent already marked initialized. I've had two requests for such functionality for ext4 already. (Take for example a trusted cluster filesystem backend that checks the object checksum before returning any data to the user; and if the check fails the cluster file system will try to use some other replica stored on some other server.) - Ted ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-09 21:41 ` Ted Ts'o @ 2010-11-09 21:53 ` Jan Kara 2010-11-09 23:40 ` Dave Chinner 1 sibling, 0 replies; 35+ messages in thread From: Jan Kara @ 2010-11-09 21:53 UTC (permalink / raw) To: Ted Ts'o Cc: Dave Chinner, Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, joel.becker, cmm, cluster-devel On Tue 09-11-10 16:41:47, Ted Ts'o wrote: > On Tue, Nov 09, 2010 at 03:42:42PM +1100, Dave Chinner wrote: > > Implementation is up to the filesystem. However, XFS does (b) > > because: > > > > 1) it was extremely simple to implement (one of the > > advantages of having an exceedingly complex allocation > > interface to begin with :P) > > 2) conversion is atomic, fast and reliable > > 3) it is independent of the underlying storage; and > > 4) reads of unwritten extents operate at memory speed, > > not disk speed. > > Yeah, I was thinking that using a device-style TRIM might be better > since future attempts to write to it won't require a separate seek to > modify the extent tree. But yeah, there are a bunch of advantages of > simply mutating the extent tree. > > While we're on the subject of changes to fallocate, what do people > think of FALLOC_FL_EXPOSE_OLD_DATA, which requires either root > privileges or (if capabilities are in use) CAP_DAC_OVERRIDE && > CAP_MAC_OVERRIDE && CAP_SYS_ADMIN. This would allow a trusted process > to fallocate blocks with the extent already marked initialized. I've > had two requests for such functionality for ext4 already. > > (Take for example a trusted cluster filesystem backend that checks the > object checksum before returning any data to the user; and if the > check fails the cluster file system will try to use some other replica > stored on some other server.) Hum, could you elaborate a bit? I fail to see how above fallocate() flag could be used to help solving this problem... Just curious... 
Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-09 21:41 ` Ted Ts'o 2010-11-09 21:53 ` Jan Kara @ 2010-11-09 23:40 ` Dave Chinner 2011-01-11 21:13 ` Lawrence Greenfield 1 sibling, 1 reply; 35+ messages in thread From: Dave Chinner @ 2010-11-09 23:40 UTC (permalink / raw) To: Ted Ts'o, Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs On Tue, Nov 09, 2010 at 04:41:47PM -0500, Ted Ts'o wrote: > On Tue, Nov 09, 2010 at 03:42:42PM +1100, Dave Chinner wrote: > > Implementation is up to the filesystem. However, XFS does (b) > > because: > > > > 1) it was extremely simple to implement (one of the > > advantages of having an exceedingly complex allocation > > interface to begin with :P) > > 2) conversion is atomic, fast and reliable > > 3) it is independent of the underlying storage; and > > 4) reads of unwritten extents operate at memory speed, > > not disk speed. > > Yeah, I was thinking that using a device-style TRIM might be better > since future attempts to write to it won't require a separate seek to > modify the extent tree. But yeah, there are a bunch of advantages of > simply mutating the extent tree. > > While we're on the subject of changes to fallocate, what do people > think of FALLOC_FL_EXPOSE_OLD_DATA, which requires either root > privileges or (if capabilities are in use) CAP_DAC_OVERRIDE && > CAP_MAC_OVERRIDE && CAP_SYS_ADMIN. This would allow a trusted process > to fallocate blocks with the extent already marked initialized. I've > had two requests for such functionality for ext4 already. We removed that ability from XFS about three years ago because it's a massive security hole. e.g. what happens if the file is world readable, even though the process that called FALLOC_FL_EXPOSE_OLD_DATA was privileged and was allowed to expose such data? Or the file is chmod 777 after being exposed? 
The historical reason for such behaviour existing in XFS was that in 1997 the CPU and IO latency cost of unwritten extent conversion was significant, so users with real physical security (i.e. marines with guns) were able to make use of fast preallocation with no conversion overhead without caring about the security implications. These days, the performance overhead of unwritten extent conversion is minimal - I generally can't measure a difference in IO performance as a result of it - so there is simply no good reason for leaving such a gaping security hole in the system. If anyone wants to read the underlying data, then use fiemap to map the physical blocks and read it directly from the block device. That requires root privileges but does not open any new stale data exposure problems.... > (Take for example a trusted cluster filesystem backend that checks the > object checksum before returning any data to the user; and if the > check fails the cluster file system will try to use some other replica > stored on some other server.) IOWs, all they want to do is avoid the unwritten extent conversion overhead. Time has shown that a bad security/performance tradeoff decision was made 13 years ago in XFS, so I see little reason to repeat it for ext4 today.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-09 23:40 ` Dave Chinner @ 2011-01-11 21:13 ` Lawrence Greenfield 2011-01-11 21:30 ` Ted Ts'o 2011-01-12 12:44 ` Dave Chinner 0 siblings, 2 replies; 35+ messages in thread From: Lawrence Greenfield @ 2011-01-11 21:13 UTC (permalink / raw) To: Dave Chinner Cc: Ted Ts'o, Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, joel.becker, cmm, cluster-devel On Tue, Nov 9, 2010 at 6:40 PM, Dave Chinner <david@fromorbit.com> wrote: > On Tue, Nov 09, 2010 at 04:41:47PM -0500, Ted Ts'o wrote: >> On Tue, Nov 09, 2010 at 03:42:42PM +1100, Dave Chinner wrote: >> > Implementation is up to the filesystem. However, XFS does (b) >> > because: >> > >> > 1) it was extremely simple to implement (one of the >> > advantages of having an exceedingly complex allocation >> > interface to begin with :P) >> > 2) conversion is atomic, fast and reliable >> > 3) it is independent of the underlying storage; and >> > 4) reads of unwritten extents operate at memory speed, >> > not disk speed. >> >> Yeah, I was thinking that using a device-style TRIM might be better >> since future attempts to write to it won't require a separate seek to >> modify the extent tree. But yeah, there are a bunch of advantages of >> simply mutating the extent tree. >> >> While we're on the subject of changes to fallocate, what do people >> think of FALLOC_FL_EXPOSE_OLD_DATA, which requires either root >> privileges or (if capabilities are in use) CAP_DAC_OVERRIDE && >> CAP_MAC_OVERRIDE && CAP_SYS_ADMIN. This would allow a trusted process >> to fallocate blocks with the extent already marked initialized. I've >> had two requests for such functionality for ext4 already. > > We removed that ability from XFS about three years ago because it's > a massive security hole. e.g. what happens if the file is world > readable, even though the process that called > FALLOC_FL_EXPOSE_OLD_DATA was privileged and was allowed to expose > such data? 
Or the file is chmod 777 after being exposed? > > The historical reason for such behaviour existing in XFS was that in > 1997 the CPU and IO latency cost of unwritten extent conversion was > significant, so users with real physical security (i.e. marines with > guns) were able to make use of fast preallocation with no conversion > overhead without caring about the security implications. These days, > the performance overhead of unwritten extent conversion is minimal - > I generally can't measure a difference in IO performance as a result > of it - so there is simply no good reason for leaving such a gaping > security hole in the system. > > If anyone wants to read the underlying data, then use fiemap to map > the physical blocks and read it directly from the block device. That > requires root privileges but does not open any new stale data > exposure problems.... > >> (Take for example a trusted cluster filesystem backend that checks the >> object checksum before returning any data to the user; and if the >> check fails the cluster file system will try to use some other replica >> stored on some other server.) > > IOWs, all they want to do is avoid the unwritten extent conversion > overhead. Time has shown that a bad security/performance tradeoff > decision was made 13 years ago in XFS, so I see little reason to > repeat it for ext4 today.... I'd make use of FALLOC_FL_EXPOSE_OLD_DATA. It's not the CPU overhead of extent conversion. It's that extent conversion causes more metadata operations than what you'd have otherwise, which means systems that want to use O_DIRECT and make sure the data doesn't go away either have to write with O_DIRECT|O_DSYNC or need to call fdatasync(). cluster file system implementor, Larry > > Cheers, > > Dave. 
> -- > Dave Chinner > david@fromorbit.com ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2011-01-11 21:13 ` Lawrence Greenfield @ 2011-01-11 21:30 ` Ted Ts'o 2011-01-12 11:48 ` Dave Chinner 2011-01-12 12:44 ` Dave Chinner 1 sibling, 1 reply; 35+ messages in thread From: Ted Ts'o @ 2011-01-11 21:30 UTC (permalink / raw) To: Lawrence Greenfield Cc: Dave Chinner, Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, joel.becker, cmm, cluster-devel On Tue, Jan 11, 2011 at 04:13:42PM -0500, Lawrence Greenfield wrote: > > IOWs, all they want to do is avoid the unwritten extent conversion > > overhead. Time has shown that a bad security/performance tradeoff > > decision was made 13 years ago in XFS, so I see little reason to > > repeat it for ext4 today.... I suspect things may have changed somewhat; both in terms of requirements and nature of cluster file systems, and the performance of various storage systems (including PCIe-attached flash devices). > I'd make use of FALLOC_FL_EXPOSE_OLD_DATA. It's not the CPU overhead > of extent conversion. It's that extent conversion causes more metadata > operations than what you'd have otherwise, which means systems that > want to use O_DIRECT and make sure the data doesn't go away either > have to write O_DIRECT|O_DSYNC or need to call fdatasync(). > > cluster file system implementor, One possibility might be to make it an optional feature which is only enabled via a mount option. That way someone would have to explicitly ask for this feature two ways (via a new flag to fallocate) and a mount option. It might not make sense for XFS, but for people who are using ext4 as the local storage file system back-end, and are doing all sorts of things to get the best performance, including disabling the journal, I suspect it really would make sense. So it could always be an optional-to-implement flag, that not all file systems should feel obliged to implement. - Ted
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2011-01-11 21:30 ` Ted Ts'o @ 2011-01-12 11:48 ` Dave Chinner 0 siblings, 0 replies; 35+ messages in thread From: Dave Chinner @ 2011-01-12 11:48 UTC (permalink / raw) To: Ted Ts'o, Lawrence Greenfield, Josef Bacik, linux-kernel, linux-btrfs, linux-ext4 On Tue, Jan 11, 2011 at 04:30:07PM -0500, Ted Ts'o wrote: > On Tue, Jan 11, 2011 at 04:13:42PM -0500, Lawrence Greenfield wrote: > > > IOWs, all they want to do is avoid the unwritten extent conversion > > > overhead. Time has shown that a bad security/performance tradeoff > > > decision was made 13 years ago in XFS, so I see little reason to > > > repeat it for ext4 today.... > > I suspect things may have changed somewhat; both in terms of > requirements and nature of cluster file systems, and the performance of > various storage systems (including PCIe-attached flash devices). We can throw 1000x more CPU power and memory at the problem than we could 13 years ago. IOW the system balance hasn't changed (even considering pci-e SSDs) compared to 13 years ago. Hence if it was a bad tradeoff 13 years ago, it's still a bad tradeoff today. > > I'd make use of FALLOC_FL_EXPOSE_OLD_DATA. It's not the CPU overhead > > of extent conversion. It's that extent conversion causes more metadata > > operations than what you'd have otherwise, which means systems that > > want to use O_DIRECT and make sure the data doesn't go away either > > have to write O_DIRECT|O_DSYNC or need to call fdatasync(). > > cluster file system implementor, > > One possibility might be to make it an optional feature which is only > enabled via a mount option. That way someone would have to explicitly > ask for this feature two ways (via a new flag to fallocate) and a > mount option. Proliferation of mount options just to enable feature X of API Y for filesystem Z is not a good idea. Either you enable it via the fallocate API or you don't allow it at all. 
> It might not make sense for XFS, but for people who are using ext4 > as the local storage file system back-end, How does this differ from a local filesystem? Are you talking about storage nodes for clustered/cloudy storage? If so, I know of quite a few places that use XFS for this purpose and they all seem to measure storage in petabytes made up of small boxes containing anywhere between 30-100TB each. The only request for additional preallocation functionality I've got from people running such applications recently is for XFS_IOC_ZERO_RANGE. This is quite relevant, because that specifically converts allocated extents to unwritten extents. i.e. they like to be able to efficiently re-initialise allocated space to zeros rather than have it contain stale data. > and are doing all sorts of things to get the best performance, > including disabling the journal, I suspect it really would make > sense. That's not really a convincing argument for a new interface that needs to be maintained forever. > So it could always be an > optional-to-implement flag, that not all file systems should feel > obliged to implement. It could, but it still needs better justification. Cheers, Dave. -- Dave Chinner david@fromorbit.com
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2011-01-11 21:13 ` Lawrence Greenfield 2011-01-11 21:30 ` Ted Ts'o @ 2011-01-12 12:44 ` Dave Chinner 2011-01-28 18:13 ` Ric Wheeler 1 sibling, 1 reply; 35+ messages in thread From: Dave Chinner @ 2011-01-12 12:44 UTC (permalink / raw) To: Lawrence Greenfield Cc: Ted Ts'o, Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, joel.becker, cmm, cluster-devel On Tue, Jan 11, 2011 at 04:13:42PM -0500, Lawrence Greenfield wrote: > On Tue, Nov 9, 2010 at 6:40 PM, Dave Chinner <david@fromorbit.com> wrote: > > The historical reason for such behaviour existing in XFS was that in > > 1997 the CPU and IO latency cost of unwritten extent conversion was > > significant, ..... > >> (Take for example a trusted cluster filesystem backend that checks the > >> object checksum before returning any data to the user; and if the > >> check fails the cluster file system will try to use some other replica > >> stored on some other server.) > > > > IOWs, all they want to do is avoid the unwritten extent conversion > > overhead. Time has shown that a bad security/performance tradeoff > > decision was made 13 years ago in XFS, so I see little reason to > > repeat it for ext4 today.... > > I'd make use of FALLOC_FL_EXPOSE_OLD_DATA. It's not the CPU overhead > of extent conversion. It's that extent conversion causes more metadata > operations than what you'd have otherwise, Yes, that's the "IO latency" part of the cost I mentioned above. > which means systems that > want to use O_DIRECT and make sure the data doesn't go away either > have to write O_DIRECT|O_DSYNC or need to call fdatasync(). Seriously, we tell application writers _all the time_ that they *must* use fsync/fdatasync to guarantee their data is on stable storage and that they cannot rely on side-effects of filesystem or storage specific behaviours (like ext3 ordered mode) to do that job for them. 
You're suggesting that by introducing FALLOC_FL_EXPOSE_OLD_DATA, applications can rely on filesystem/storage specific behaviour to guarantee data is on stable storage without the use of fdatasync/fsync. What you describe is definitely storage specific, because volatile write caches still need the fdatasync to issue a cache flush. Do you see the same conflict here that I do? > cluster file system implementor Which one? Cheers, Dave. -- Dave Chinner david@fromorbit.com
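A minimal sketch of the pattern Dave is describing, where the application asks for durability explicitly instead of relying on filesystem side effects. The helper name write_durably() is hypothetical and not from the thread. The sketch uses buffered I/O rather than O_DIRECT (which would additionally need aligned buffers), and it does not fsync the parent directory.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): preallocate space, write the
 * buffer, then flush it with fdatasync(). Returns 0 on success or a
 * negative errno value on failure.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    /*
     * Preallocation is only an optimisation in this sketch, so tolerate
     * filesystems that cannot do it. Note that posix_fallocate() returns
     * the error number directly instead of setting errno.
     */
    int err = len ? posix_fallocate(fd, 0, (off_t)len) : 0;
    if (err && err != EOPNOTSUPP && err != EINVAL) {
        close(fd);
        return -err;
    }

    /* Write everything, retrying on short writes and EINTR. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            err = errno;
            close(fd);
            return -err;
        }
        p += n;
        left -= (size_t)n;
    }

    /* The explicit durability request: data is not assumed stable before this. */
    if (fdatasync(fd) < 0) {
        err = errno;
        close(fd);
        return -err;
    }

    if (close(fd) < 0)
        return -errno;
    return 0;
}
```

A caller would treat the data as stable only once write_durably() has returned 0. Opening with O_DSYNC instead moves the same cost into each write() call.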
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2011-01-12 12:44 ` Dave Chinner @ 2011-01-28 18:13 ` Ric Wheeler 0 siblings, 0 replies; 35+ messages in thread From: Ric Wheeler @ 2011-01-28 18:13 UTC (permalink / raw) To: Dave Chinner Cc: Lawrence Greenfield, Ted Ts'o, Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, joel.becker, cmm, cluster-devel On 01/12/2011 07:44 AM, Dave Chinner wrote: > On Tue, Jan 11, 2011 at 04:13:42PM -0500, Lawrence Greenfield wrote: >> On Tue, Nov 9, 2010 at 6:40 PM, Dave Chinner<david@fromorbit.com> wrote: >>> The historical reason for such behaviour existing in XFS was that in >>> 1997 the CPU and IO latency cost of unwritten extent conversion was >>> significant, > ..... > >>>> (Take for example a trusted cluster filesystem backend that checks the >>>> object checksum before returning any data to the user; and if the >>>> check fails the cluster file system will try to use some other replica >>>> stored on some other server.) >>> IOWs, all they want to do is avoid the unwritten extent conversion >>> overhead. Time has shown that a bad security/performance tradeoff >>> decision was made 13 years ago in XFS, so I see little reason to >>> repeat it for ext4 today.... >> I'd make use of FALLOC_FL_EXPOSE_OLD_DATA. It's not the CPU overhead >> of extent conversion. It's that extent conversion causes more metadata >> operations than what you'd have otherwise, > Yes, that's the "IO latency" part of the cost I mentioned above. > >> which means systems that >> want to use O_DIRECT and make sure the data doesn't go away either >> have to write O_DIRECT|O_DSYNC or need to call fdatasync(). > Seriously, we tell application writers _all the time_ that they > *must* use fsync/fdatasync to guarantee their data is on stable > storage and that they cannot rely on side-effects of filesystem or > storage specific behaviours (like ext3 ordered mode) to do that job > for them. 
> > You're suggesting that by introducing FALLOC_FL_EXPOSE_OLD_DATA, > applications can rely on filesystem/storage specific behaviour to > guarantee data is on stable storage without the use of > fdatasync/fsync. What you describe is definitely storage specific, > because volatile write caches still need the fdatasync to issue a > cache flush. > > Do you see the same conflict here that I do? > The very concept seems quite "non-enterprise". I also agree that the cost of maintaining extra mount options (and code) for something that no sane end user would ever do seems to be a loss. Why wouldn't you want to convert the punched hole to an unwritten extent? Thanks! Ric
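For reference, a rough user-space sketch of what a caller sees after punching a hole with the flag combination this series proposes: the file size is unchanged and the punched range reads back as zeros, whether the filesystem leaves a hole or an unwritten extent behind. The helper name punch_and_check(), the file size, and the hole offsets are illustrative assumptions. The sketch also assumes a kernel and filesystem that accept FALLOC_FL_PUNCH_HOLE, and returns -EOPNOTSUPP where they do not.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo (not from the thread): fill a 16 KiB file with 0xAA,
 * punch an 8 KiB hole at offset 4 KiB, then verify that the size is
 * unchanged and only the punched range reads back as zeros.
 * Returns 0 if all checks pass, -EIO if a check fails, or a negative
 * errno value from the failing system call.
 */
int punch_and_check(const char *path)
{
    enum { FILE_SIZE = 16384, HOLE_OFF = 4096, HOLE_LEN = 8192 };
    unsigned char buf[FILE_SIZE];
    struct stat st;
    int ret = 0;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    memset(buf, 0xAA, sizeof(buf));
    if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) {
        ret = -EIO;
        goto out;
    }

    /* PUNCH_HOLE must be combined with KEEP_SIZE, as V3 of the series requires. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  HOLE_OFF, HOLE_LEN) < 0) {
        ret = -errno;
        goto out;
    }

    if (fstat(fd, &st) < 0) {
        ret = -errno;
        goto out;
    }
    if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) {
        ret = -EIO;
        goto out;
    }

    /* Hole punching must not change the file size. */
    if (st.st_size != FILE_SIZE)
        ret = -EIO;

    /* Bytes inside the hole must be zero; bytes outside must be intact. */
    for (size_t i = 0; i < sizeof(buf) && ret == 0; i++) {
        int in_hole = i >= HOLE_OFF && i < HOLE_OFF + HOLE_LEN;
        if (buf[i] != (in_hole ? 0x00 : 0xAA))
            ret = -EIO;
    }

out:
    close(fd);
    unlink(path);
    return ret;
}
```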
* Re: [PATCH 1/6] fs: add hole punching to fallocate 2010-11-09 1:12 ` Dave Chinner 2010-11-09 2:10 ` Josef Bacik 2010-11-09 3:30 ` Ted Ts'o @ 2010-11-09 20:51 ` Josef Bacik 2 siblings, 0 replies; 35+ messages in thread From: Josef Bacik @ 2010-11-09 20:51 UTC (permalink / raw) To: Dave Chinner Cc: Josef Bacik, linux-kernel, linux-btrfs, linux-ext4, linux-fsdevel, xfs, joel.becker, cmm, cluster-devel On Tue, Nov 09, 2010 at 12:12:22PM +1100, Dave Chinner wrote: > On Mon, Nov 08, 2010 at 03:32:02PM -0500, Josef Bacik wrote: > > Hole punching has already been implemented by XFS and OCFS2, and has the > > potential to be implemented on both BTRFS and EXT4 so we need a generic way to > > get to this feature. The simplest way in my mind is to add FALLOC_FL_PUNCH_HOLE > > to fallocate() since it already looks like the normal fallocate() operation. > > I've tested this patch with XFS and BTRFS to make sure XFS did what it's > > supposed to do and that BTRFS failed like it was supposed to. Thank you, > > > > Signed-off-by: Josef Bacik <josef@redhat.com> > > --- > > fs/open.c | 2 +- > > include/linux/falloc.h | 1 + > > 2 files changed, 2 insertions(+), 1 deletions(-) > > > > diff --git a/fs/open.c b/fs/open.c > > index 4197b9e..ab8dedf 100644 > > --- a/fs/open.c > > +++ b/fs/open.c > > @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len) > > return -EINVAL; > > > > /* Return error if mode is not supported */ > > - if (mode && !(mode & FALLOC_FL_KEEP_SIZE)) > > + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))) > > return -EOPNOTSUPP; > > > > if (!(file->f_mode & FMODE_WRITE)) > > diff --git a/include/linux/falloc.h b/include/linux/falloc.h > > index 3c15510..851cba2 100644 > > --- a/include/linux/falloc.h > > +++ b/include/linux/falloc.h > > @@ -2,6 +2,7 @@ > > #define _FALLOC_H_ > > > > #define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */ > > +#define FALLOC_FL_PUNCH_HOLE 0X02 /* de-allocates range */ > > 
Hole punching was not included originally in fallocate() for a > variety of reasons. IIRC, they were along the lines of: > > 1 de-allocating of blocks in an allocation syscall is wrong. > People wanted a new syscall for this functionality. > 2 no glibc interface needs it > 3 at the time, only XFS supported punching holes, so there > is no need to support it in a generic interface > 4 the use cases presented were not considered compelling > enough to justify the additional complexity (!) > > In the end, I gave up arguing for it to be included because just > getting the FALLOC_FL_KEEP_SIZE functionality was a hard enough > battle. > > Anyway, #3 isn't the case any more, #4 was just an excuse not to > support anything ext4 couldn't do and lots of apps are calling > fallocate directly (because glibc can't use FALLOC_FL_KEEP_SIZE) so > #2 isn't an issue, either. I guess that leaves #1 to be debated; > I don't think there is any problem with doing what you propose. > > What I will suggest is that this requires a generic xfstest to be > written and support added to xfs_io to enable that test (and others) > to issue hole punches. Something along the lines of test 242 which I > wrote for testing all the edge cases of XFS_IOC_ZERO_RANGE (*) would be > good. > So this was relatively simple, adding a flag to falloc for xfs_io and such. Got a test going and it worked great on XFS. Then I went to make sure it worked on non-XFS, and that's where I've run into pain. Turns out xfs_io -c "bmap" only works on XFS. So I thought to myself "well how hard could it be to make this thing use fiemap?", hahaha I'm an idiot. So I've been adding an xfs_io -c "fiemap" that spits things out similar to bmap, and it will probably be tomorrow when I finish it. So good news is my simple patches seem to work just fine for hole-punch, bad news is it's going to take me another day to have all the infrastructure to test it on non-XFS filesystems. 
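For readers unfamiliar with the interface, a rough sketch of querying extents through the generic FS_IOC_FIEMAP ioctl. This is not the xfs_io code mentioned above. The helper name dump_extents(), the MAX_EXTENTS cap, and the output format are assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MAX_EXTENTS 32 /* arbitrary cap chosen for this sketch */

/*
 * Hypothetical helper (not the xfs_io implementation): print up to
 * MAX_EXTENTS extents of a file via FS_IOC_FIEMAP. Returns the number of
 * extents mapped, or a negative errno value (for example -EOPNOTSUPP on
 * filesystems without fiemap support).
 */
int dump_extents(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    /* struct fiemap ends in a flexible array of extents; size it ourselves. */
    struct fiemap *fm = calloc(1, sizeof(*fm) +
                               MAX_EXTENTS * sizeof(struct fiemap_extent));
    if (!fm) {
        close(fd);
        return -ENOMEM;
    }

    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;  /* map the whole file */
    fm->fm_flags = FIEMAP_FLAG_SYNC;    /* flush dirty data before mapping */
    fm->fm_extent_count = MAX_EXTENTS;

    int ret;
    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
        ret = -errno;
    } else {
        for (unsigned int i = 0; i < fm->fm_mapped_extents; i++) {
            const struct fiemap_extent *fe = &fm->fm_extents[i];
            printf("%u: logical %llu physical %llu length %llu%s\n", i,
                   (unsigned long long)fe->fe_logical,
                   (unsigned long long)fe->fe_physical,
                   (unsigned long long)fe->fe_length,
                   (fe->fe_flags & FIEMAP_EXTENT_UNWRITTEN) ? " unwritten" : "");
        }
        ret = (int)fm->fm_mapped_extents;
    }

    free(fm);
    close(fd);
    return ret;
}
```

A punched range should show up as a gap between the logical offsets of consecutive extents, which is the kind of check a generic hole-punch test needs.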
Also did you want me to rebase my patches on your fallocate() version of ZERO_RANGE? Thanks, Josef
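The mode check quoted in this message is from the first posting. V3 of the series tightened it so that FALLOC_FL_PUNCH_HOLE is rejected unless FALLOC_FL_KEEP_SIZE is also set. A stand-alone sketch of that V3 logic follows. The function name falloc_mode_check() is hypothetical; in the patch itself the checks sit inline in do_fallocate().

```c
#include <errno.h>

/* Flag values as defined by the patch to include/linux/falloc.h. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE  0x01 /* default is extend size */
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE 0x02 /* de-allocates range */
#endif

/*
 * Hypothetical stand-alone version of the V3 mode validation.
 * Returns 0 if the mode is acceptable, -EOPNOTSUPP otherwise.
 */
int falloc_mode_check(int mode)
{
    /* Return error if mode contains any unsupported flag. */
    if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
        return -EOPNOTSUPP;

    /* Punch hole must have keep size set. */
    if ((mode & FALLOC_FL_PUNCH_HOLE) && !(mode & FALLOC_FL_KEEP_SIZE))
        return -EOPNOTSUPP;

    return 0;
}
```

With this check, mode 0, KEEP_SIZE alone, and KEEP_SIZE | PUNCH_HOLE are accepted, while PUNCH_HOLE alone and any unknown bit are refused.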