* [PATCH 0/1] Batched discard support @ 2010-09-29 8:54 Lukas Czerner 2010-09-29 8:54 ` [PATCH] Add ioctl FITRIM Lukas Czerner 2010-09-30 0:55 ` [PATCH 0/1] Batched discard support Dave Chinner 0 siblings, 2 replies; 5+ messages in thread From: Lukas Czerner @ 2010-09-29 8:54 UTC (permalink / raw) To: linux-fsdevel; +Cc: tytso, sandeen, adilger, lczerner Hi, I am working on something I have called "batched discard support" for Ext3 and Ext4 filesystems. Traditional discard support for filesystems like Ext4 has been implemented the way that whenever the file is unlinked the disk-space that the file was using is trimmed (discarded) by sb_issue_discard() to let the device know that this portion of disk is no longer in use by the filesystem and can be safely used for wear-leveling. However, this approach comes with very noticeable performance loss on most of SSD devices and LUN's I have the opportunity to test it on. The fact is, that bigger discard ranges are more efficient than smaller ones, so it make sense try to batch the ranges together wherever it is possible. I have introduced new filesystem independent ioctl (FITRIM) which can be used to send the "trim this portion of filesystem" command down to the filesystem which (if implemented) discards all free extents in that range. The implementation for Ext3 and Ext4 is complete and you can see it here: http://www.spinics.net/lists/linux-ext4/msg21050.html Why I am sending it here to linux-fsdevel is because I am introducing new fs independent ioctl and new member of super_operations (trim_fs) and we would like let you know about this approach (which any filesystem can take advantage from) and we would like your comment on this patch before we send it upstream. Thanks! -Lukas [PATCH] 9c8c3a5 Add ioctl FITRIM fs/ioctl.c | 39 +++++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 8 ++++++++ 2 files changed, 47 insertions(+), 0 deletions(-) ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] Add ioctl FITRIM. 2010-09-29 8:54 [PATCH 0/1] Batched discard support Lukas Czerner @ 2010-09-29 8:54 ` Lukas Czerner 2010-09-30 0:55 ` [PATCH 0/1] Batched discard support Dave Chinner 1 sibling, 0 replies; 5+ messages in thread From: Lukas Czerner @ 2010-09-29 8:54 UTC (permalink / raw) To: linux-fsdevel; +Cc: tytso, sandeen, adilger, lczerner Adds an filesystem independent ioctl to allow implementation of file system batched discard support. I takes fstrim_range structure as an argument. fstrim_range is definec in the include/fs.h and its definition is as follows. struct fstrim_range { start; len; minlen; } start - first Byte to trim len - number of Bytes to trim from start minlen - minimum extent length to trim, free extents shorter than this number of Bytes will be ignored. This will be rounded up to fs block size. It is also possible to specify NULL as an argument. In this case the arguments will set itself as follows: start = 0; len = ULLONG_MAX; minlen = 0; So it will trim the whole file system at one run. After the FITRIM is done, the number of actually discarded Bytes is stored in fstrim_range.len to give the user better insight on how much storage space has been really released for wear-leveling. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> --- fs/ioctl.c | 39 +++++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 8 ++++++++ 2 files changed, 47 insertions(+), 0 deletions(-) diff --git a/fs/ioctl.c b/fs/ioctl.c index f855ea4..e92fdbb 100644 --- a/fs/ioctl.c +++ b/fs/ioctl.c @@ -530,6 +530,41 @@ static int ioctl_fsthaw(struct file *filp) return thaw_super(sb); } +static int ioctl_fstrim(struct file *filp, void __user *argp) +{ + struct super_block *sb = filp->f_path.dentry->d_inode->i_sb; + struct fstrim_range range; + int ret = 0; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + /* If filesystem doesn't support trim feature, return. */ + if (sb->s_op->trim_fs == NULL) + return -EOPNOTSUPP; + + /* If a blockdevice-backed filesystem isn't specified, return EINVAL. */ + if (sb->s_bdev == NULL) + return -EINVAL; + + if (argp == NULL) { + range.start = 0; + range.len = ULLONG_MAX; + range.minlen = 0; + } else if (copy_from_user(&range, argp, sizeof(range))) + return -EFAULT; + + ret = sb->s_op->trim_fs(sb, &range); + if (ret < 0) + return ret; + + if ((argp != NULL) && + (copy_to_user(argp, &range, sizeof(range)))) + return -EFAULT; + + return 0; +} + /* * When you add any new common ioctls to the switches above and below * please update compat_sys_ioctl() too. @@ -580,6 +615,10 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, error = ioctl_fsthaw(filp); break; + case FITRIM: + error = ioctl_fstrim(filp, argp); + break; + case FS_IOC_FIEMAP: return ioctl_fiemap(filp, arg); diff --git a/include/linux/fs.h b/include/linux/fs.h index 0ec4d60e..63a0843 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -32,6 +32,12 @@ #define SEEK_END 2 /* seek relative to end of file */ #define SEEK_MAX SEEK_END +struct fstrim_range { + uint64_t start; + uint64_t len; + uint64_t minlen; +}; + /* And dynamically-tunable limits and defaults: */ struct files_stat_struct { int nr_files; /* read only */ @@ -312,6 +318,7 @@ struct inodes_stat_t { #define FIGETBSZ _IO(0x00,2) /* get the block size used for bmap */ #define FIFREEZE _IOWR('X', 119, int) /* Freeze */ #define FITHAW _IOWR('X', 120, int) /* Thaw */ +#define FITRIM _IOWR('X', 121, struct fstrim_range) /* Trim */ #define FS_IOC_GETFLAGS _IOR('f', 1, long) #define FS_IOC_SETFLAGS _IOW('f', 2, long) @@ -1573,6 +1580,7 @@ struct super_operations { ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); #endif int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t); + int (*trim_fs) (struct super_block *, struct fstrim_range *); }; /* -- 1.7.2.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 0/1] Batched discard support 2010-09-29 8:54 [PATCH 0/1] Batched discard support Lukas Czerner 2010-09-29 8:54 ` [PATCH] Add ioctl FITRIM Lukas Czerner @ 2010-09-30 0:55 ` Dave Chinner 2010-09-30 12:17 ` Lukas Czerner 2010-09-30 16:11 ` Eric Sandeen 1 sibling, 2 replies; 5+ messages in thread From: Dave Chinner @ 2010-09-30 0:55 UTC (permalink / raw) To: Lukas Czerner; +Cc: linux-fsdevel, tytso, sandeen, adilger On Wed, Sep 29, 2010 at 10:54:25AM +0200, Lukas Czerner wrote: > Hi, > > I am working on something I have called "batched discard support" for Ext3 > and Ext4 filesystems. Traditional discard support for filesystems like Ext4 > has been implemented the way that whenever the file is unlinked the > disk-space that the file was using is trimmed (discarded) by > sb_issue_discard() to let the device know that this portion of disk is no > longer in use by the filesystem and can be safely used for wear-leveling. > > However, this approach comes with very noticeable performance loss on most > of SSD devices and LUN's I have the opportunity to test it on. The fact is, > that bigger discard ranges are more efficient than smaller ones, so it make > sense try to batch the ranges together wherever it is possible. > > I have introduced new filesystem independent ioctl (FITRIM) which can be used > to send the "trim this portion of filesystem" command down to the filesystem > which (if implemented) discards all free extents in that range. > > The implementation for Ext3 and Ext4 is complete and you can see it here: > > http://www.spinics.net/lists/linux-ext4/msg21050.html > > Why I am sending it here to linux-fsdevel is because I am introducing new fs > independent ioctl and new member of super_operations (trim_fs) and we would > like let you know about this approach (which any filesystem can take > advantage from) and we would like your comment on this patch before we > send it upstream. My first question is: how do you test a filesystem implements ->trim_fs correctly? That is, if we are going to include a data-destroying ioctl, I really want some filesystem independent tests written first so that as filesystems implement ->trim_fs they can be tested for correct implementation. Perhaps adding FITRIM support to xfs_io, and a generic test to xfstests would be the way to go. e.g. write a set of patterned files to the filesystem, unlink a number of the files, then run some trim commands on the filesystem exercising corner cases and check that none of the data in still-active files is damaged (e.g. via md5sum comparison).... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 0/1] Batched discard support 2010-09-30 0:55 ` [PATCH 0/1] Batched discard support Dave Chinner @ 2010-09-30 12:17 ` Lukas Czerner 2010-09-30 16:11 ` Eric Sandeen 1 sibling, 0 replies; 5+ messages in thread From: Lukas Czerner @ 2010-09-30 12:17 UTC (permalink / raw) To: Dave Chinner; +Cc: Lukas Czerner, linux-fsdevel, tytso, sandeen, adilger On Thu, 30 Sep 2010, Dave Chinner wrote: > On Wed, Sep 29, 2010 at 10:54:25AM +0200, Lukas Czerner wrote: > > Hi, > > > > I am working on something I have called "batched discard support" for Ext3 > > and Ext4 filesystems. Traditional discard support for filesystems like Ext4 > > has been implemented the way that whenever the file is unlinked the > > disk-space that the file was using is trimmed (discarded) by > > sb_issue_discard() to let the device know that this portion of disk is no > > longer in use by the filesystem and can be safely used for wear-leveling. > > > > However, this approach comes with very noticeable performance loss on most > > of SSD devices and LUN's I have the opportunity to test it on. The fact is, > > that bigger discard ranges are more efficient than smaller ones, so it make > > sense try to batch the ranges together wherever it is possible. > > > > I have introduced new filesystem independent ioctl (FITRIM) which can be used > > to send the "trim this portion of filesystem" command down to the filesystem > > which (if implemented) discards all free extents in that range. > > > > The implementation for Ext3 and Ext4 is complete and you can see it here: > > > > http://www.spinics.net/lists/linux-ext4/msg21050.html > > > > Why I am sending it here to linux-fsdevel is because I am introducing new fs > > independent ioctl and new member of super_operations (trim_fs) and we would > > like let you know about this approach (which any filesystem can take > > advantage from) and we would like your comment on this patch before we > > send it upstream. > > My first question is: how do you test a filesystem implements > ->trim_fs correctly? > > That is, if we are going to include a data-destroying ioctl, I > really want some filesystem independent tests written first so that > as filesystems implement ->trim_fs they can be tested for correct > implementation. > > Perhaps adding FITRIM support to xfs_io, and a generic test to > xfstests would be the way to go. e.g. write a set of patterned files > to the filesystem, unlink a number of the files, then run some trim > commands on the filesystem exercising corner cases and check that > none of the data in still-active files is damaged (e.g. via md5sum > comparison).... > > Cheers, > > Dave. Hi Dave, When chasing bugs in those patches I have extensively used stress.sh script which is part of my gensuit package: https://sourceforge.net/projects/gensuit/ ..and I found it very helpful for testing this. Basically, it does exactly what you have proposed. It creates checksums of all files in defined directory (e.g. /usr/share/doc) and run several processes which will clear its working directory, then copy everything from this previously checksummed directory (e.g. /usr/share/doc) into its working directory, create list of files in working directory and its checksums and compare it with the original list of checksums. Every process works in the loop so it repeat remove->copy->check. While the stress.sh tool is running I have run the FITRIM ioctl in the loop, so when the FITRIM is really buggy (thus data-destroying) the stress.sh will notice, because checksums will most change change. As I said I found it sufficient to test FITRIM ioctl on ext3 and ext4 filesystem. So if you think it is good enough I will create new xfstests test which will do the same thing as stress.sh. Actually I will post this new test shortly if everyone is ok with this approach. Thanks! -Lukas ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 0/1] Batched discard support 2010-09-30 0:55 ` [PATCH 0/1] Batched discard support Dave Chinner 2010-09-30 12:17 ` Lukas Czerner @ 2010-09-30 16:11 ` Eric Sandeen 1 sibling, 0 replies; 5+ messages in thread From: Eric Sandeen @ 2010-09-30 16:11 UTC (permalink / raw) To: Dave Chinner; +Cc: Lukas Czerner, linux-fsdevel, tytso, adilger On 09/29/2010 07:55 PM, Dave Chinner wrote: > On Wed, Sep 29, 2010 at 10:54:25AM +0200, Lukas Czerner wrote: >> Hi, >> >> I am working on something I have called "batched discard support" for Ext3 >> and Ext4 filesystems. Traditional discard support for filesystems like Ext4 >> has been implemented the way that whenever the file is unlinked the >> disk-space that the file was using is trimmed (discarded) by >> sb_issue_discard() to let the device know that this portion of disk is no >> longer in use by the filesystem and can be safely used for wear-leveling. >> >> However, this approach comes with very noticeable performance loss on most >> of SSD devices and LUN's I have the opportunity to test it on. The fact is, >> that bigger discard ranges are more efficient than smaller ones, so it make >> sense try to batch the ranges together wherever it is possible. >> >> I have introduced new filesystem independent ioctl (FITRIM) which can be used >> to send the "trim this portion of filesystem" command down to the filesystem >> which (if implemented) discards all free extents in that range. >> >> The implementation for Ext3 and Ext4 is complete and you can see it here: >> >> http://www.spinics.net/lists/linux-ext4/msg21050.html >> >> Why I am sending it here to linux-fsdevel is because I am introducing new fs >> independent ioctl and new member of super_operations (trim_fs) and we would >> like let you know about this approach (which any filesystem can take >> advantage from) and we would like your comment on this patch before we >> send it upstream. > > My first question is: how do you test a filesystem implements > ->trim_fs correctly? > > That is, if we are going to include a data-destroying ioctl, I > really want some filesystem independent tests written first so that > as filesystems implement ->trim_fs they can be tested for correct > implementation. > > Perhaps adding FITRIM support to xfs_io, and a generic test to > xfstests would be the way to go. e.g. write a set of patterned files > to the filesystem, unlink a number of the files, then run some trim > commands on the filesystem exercising corner cases and check that > none of the data in still-active files is damaged (e.g. via md5sum > comparison).... xfstests could use the scsi-debug module for testing too - at least once it has http://www.spinics.net/lists/linux-scsi/msg46008.html so that tests can be run even if you don't have the hardware. -Eric > Cheers, > > Dave. ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2010-09-30 16:11 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-09-29 8:54 [PATCH 0/1] Batched discard support Lukas Czerner 2010-09-29 8:54 ` [PATCH] Add ioctl FITRIM Lukas Czerner 2010-09-30 0:55 ` [PATCH 0/1] Batched discard support Dave Chinner 2010-09-30 12:17 ` Lukas Czerner 2010-09-30 16:11 ` Eric Sandeen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).