* [PATCH 0/1] Batched discard support
@ 2010-09-29 8:54 Lukas Czerner
2010-09-29 8:54 ` [PATCH] Add ioctl FITRIM Lukas Czerner
2010-09-30 0:55 ` [PATCH 0/1] Batched discard support Dave Chinner
0 siblings, 2 replies; 5+ messages in thread
From: Lukas Czerner @ 2010-09-29 8:54 UTC (permalink / raw)
To: linux-fsdevel; +Cc: tytso, sandeen, adilger, lczerner
Hi,
I am working on something I have called "batched discard support" for Ext3
and Ext4 filesystems. Traditional discard support for filesystems like Ext4
has been implemented the way that whenever the file is unlinked the
disk-space that the file was using is trimmed (discarded) by
sb_issue_discard() to let the device know that this portion of disk is no
longer in use by the filesystem and can be safely used for wear-leveling.
However, this approach comes with very noticeable performance loss on most
of SSD devices and LUN's I have the opportunity to test it on. The fact is,
that bigger discard ranges are more efficient than smaller ones, so it make
sense try to batch the ranges together wherever it is possible.
I have introduced new filesystem independent ioctl (FITRIM) which can be used
to send the "trim this portion of filesystem" command down to the filesystem
which (if implemented) discards all free extents in that range.
The implementation for Ext3 and Ext4 is complete and you can see it here:
http://www.spinics.net/lists/linux-ext4/msg21050.html
Why I am sending it here to linux-fsdevel is because I am introducing new fs
independent ioctl and new member of super_operations (trim_fs) and we would
like let you know about this approach (which any filesystem can take
advantage from) and we would like your comment on this patch before we
send it upstream.
Thanks!
-Lukas
[PATCH] 9c8c3a5 Add ioctl FITRIM
fs/ioctl.c | 39 +++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 8 ++++++++
2 files changed, 47 insertions(+), 0 deletions(-)
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] Add ioctl FITRIM.
2010-09-29 8:54 [PATCH 0/1] Batched discard support Lukas Czerner
@ 2010-09-29 8:54 ` Lukas Czerner
2010-09-30 0:55 ` [PATCH 0/1] Batched discard support Dave Chinner
1 sibling, 0 replies; 5+ messages in thread
From: Lukas Czerner @ 2010-09-29 8:54 UTC (permalink / raw)
To: linux-fsdevel; +Cc: tytso, sandeen, adilger, lczerner
Adds an filesystem independent ioctl to allow implementation of file
system batched discard support. I takes fstrim_range structure as an
argument. fstrim_range is definec in the include/fs.h and its
definition is as follows.
struct fstrim_range {
start;
len;
minlen;
}
start - first Byte to trim
len - number of Bytes to trim from start
minlen - minimum extent length to trim, free extents shorter than this
number of Bytes will be ignored. This will be rounded up to fs
block size.
It is also possible to specify NULL as an argument. In this case the
arguments will set itself as follows:
start = 0;
len = ULLONG_MAX;
minlen = 0;
So it will trim the whole file system at one run.
After the FITRIM is done, the number of actually discarded Bytes is stored
in fstrim_range.len to give the user better insight on how much storage
space has been really released for wear-leveling.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
fs/ioctl.c | 39 +++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 8 ++++++++
2 files changed, 47 insertions(+), 0 deletions(-)
diff --git a/fs/ioctl.c b/fs/ioctl.c
index f855ea4..e92fdbb 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -530,6 +530,41 @@ static int ioctl_fsthaw(struct file *filp)
return thaw_super(sb);
}
+static int ioctl_fstrim(struct file *filp, void __user *argp)
+{
+ struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
+ struct fstrim_range range;
+ int ret = 0;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ /* If filesystem doesn't support trim feature, return. */
+ if (sb->s_op->trim_fs == NULL)
+ return -EOPNOTSUPP;
+
+ /* If a blockdevice-backed filesystem isn't specified, return EINVAL. */
+ if (sb->s_bdev == NULL)
+ return -EINVAL;
+
+ if (argp == NULL) {
+ range.start = 0;
+ range.len = ULLONG_MAX;
+ range.minlen = 0;
+ } else if (copy_from_user(&range, argp, sizeof(range)))
+ return -EFAULT;
+
+ ret = sb->s_op->trim_fs(sb, &range);
+ if (ret < 0)
+ return ret;
+
+ if ((argp != NULL) &&
+ (copy_to_user(argp, &range, sizeof(range))))
+ return -EFAULT;
+
+ return 0;
+}
+
/*
* When you add any new common ioctls to the switches above and below
* please update compat_sys_ioctl() too.
@@ -580,6 +615,10 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
error = ioctl_fsthaw(filp);
break;
+ case FITRIM:
+ error = ioctl_fstrim(filp, argp);
+ break;
+
case FS_IOC_FIEMAP:
return ioctl_fiemap(filp, arg);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0ec4d60e..63a0843 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -32,6 +32,12 @@
#define SEEK_END 2 /* seek relative to end of file */
#define SEEK_MAX SEEK_END
+struct fstrim_range {
+ uint64_t start;
+ uint64_t len;
+ uint64_t minlen;
+};
+
/* And dynamically-tunable limits and defaults: */
struct files_stat_struct {
int nr_files; /* read only */
@@ -312,6 +318,7 @@ struct inodes_stat_t {
#define FIGETBSZ _IO(0x00,2) /* get the block size used for bmap */
#define FIFREEZE _IOWR('X', 119, int) /* Freeze */
#define FITHAW _IOWR('X', 120, int) /* Thaw */
+#define FITRIM _IOWR('X', 121, struct fstrim_range) /* Trim */
#define FS_IOC_GETFLAGS _IOR('f', 1, long)
#define FS_IOC_SETFLAGS _IOW('f', 2, long)
@@ -1573,6 +1580,7 @@ struct super_operations {
ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
#endif
int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
+ int (*trim_fs) (struct super_block *, struct fstrim_range *);
};
/*
--
1.7.2.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 0/1] Batched discard support
2010-09-29 8:54 [PATCH 0/1] Batched discard support Lukas Czerner
2010-09-29 8:54 ` [PATCH] Add ioctl FITRIM Lukas Czerner
@ 2010-09-30 0:55 ` Dave Chinner
2010-09-30 12:17 ` Lukas Czerner
2010-09-30 16:11 ` Eric Sandeen
1 sibling, 2 replies; 5+ messages in thread
From: Dave Chinner @ 2010-09-30 0:55 UTC (permalink / raw)
To: Lukas Czerner; +Cc: linux-fsdevel, tytso, sandeen, adilger
On Wed, Sep 29, 2010 at 10:54:25AM +0200, Lukas Czerner wrote:
> Hi,
>
> I am working on something I have called "batched discard support" for Ext3
> and Ext4 filesystems. Traditional discard support for filesystems like Ext4
> has been implemented the way that whenever the file is unlinked the
> disk-space that the file was using is trimmed (discarded) by
> sb_issue_discard() to let the device know that this portion of disk is no
> longer in use by the filesystem and can be safely used for wear-leveling.
>
> However, this approach comes with very noticeable performance loss on most
> of SSD devices and LUN's I have the opportunity to test it on. The fact is,
> that bigger discard ranges are more efficient than smaller ones, so it make
> sense try to batch the ranges together wherever it is possible.
>
> I have introduced new filesystem independent ioctl (FITRIM) which can be used
> to send the "trim this portion of filesystem" command down to the filesystem
> which (if implemented) discards all free extents in that range.
>
> The implementation for Ext3 and Ext4 is complete and you can see it here:
>
> http://www.spinics.net/lists/linux-ext4/msg21050.html
>
> Why I am sending it here to linux-fsdevel is because I am introducing new fs
> independent ioctl and new member of super_operations (trim_fs) and we would
> like let you know about this approach (which any filesystem can take
> advantage from) and we would like your comment on this patch before we
> send it upstream.
My first question is: how do you test a filesystem implements
->trim_fs correctly?
That is, if we are going to include a data-destroying ioctl, I
really want some filesystem independent tests written first so that
as filesystems implement ->trim_fs they can be tested for correct
implementation.
Perhaps adding FITRIM support to xfs_io, and a generic test to
xfstests would be the way to go. e.g. write a set of patterned files
to the filesystem, unlink a number of the files, then run some trim
commands on the filesystem exercising corner cases and check that
none of the data in still-active files is damaged (e.g. via md5sum
comparison)....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 0/1] Batched discard support
2010-09-30 0:55 ` [PATCH 0/1] Batched discard support Dave Chinner
@ 2010-09-30 12:17 ` Lukas Czerner
2010-09-30 16:11 ` Eric Sandeen
1 sibling, 0 replies; 5+ messages in thread
From: Lukas Czerner @ 2010-09-30 12:17 UTC (permalink / raw)
To: Dave Chinner; +Cc: Lukas Czerner, linux-fsdevel, tytso, sandeen, adilger
On Thu, 30 Sep 2010, Dave Chinner wrote:
> On Wed, Sep 29, 2010 at 10:54:25AM +0200, Lukas Czerner wrote:
> > Hi,
> >
> > I am working on something I have called "batched discard support" for Ext3
> > and Ext4 filesystems. Traditional discard support for filesystems like Ext4
> > has been implemented the way that whenever the file is unlinked the
> > disk-space that the file was using is trimmed (discarded) by
> > sb_issue_discard() to let the device know that this portion of disk is no
> > longer in use by the filesystem and can be safely used for wear-leveling.
> >
> > However, this approach comes with very noticeable performance loss on most
> > of SSD devices and LUN's I have the opportunity to test it on. The fact is,
> > that bigger discard ranges are more efficient than smaller ones, so it make
> > sense try to batch the ranges together wherever it is possible.
> >
> > I have introduced new filesystem independent ioctl (FITRIM) which can be used
> > to send the "trim this portion of filesystem" command down to the filesystem
> > which (if implemented) discards all free extents in that range.
> >
> > The implementation for Ext3 and Ext4 is complete and you can see it here:
> >
> > http://www.spinics.net/lists/linux-ext4/msg21050.html
> >
> > Why I am sending it here to linux-fsdevel is because I am introducing new fs
> > independent ioctl and new member of super_operations (trim_fs) and we would
> > like let you know about this approach (which any filesystem can take
> > advantage from) and we would like your comment on this patch before we
> > send it upstream.
>
> My first question is: how do you test a filesystem implements
> ->trim_fs correctly?
>
> That is, if we are going to include a data-destroying ioctl, I
> really want some filesystem independent tests written first so that
> as filesystems implement ->trim_fs they can be tested for correct
> implementation.
>
> Perhaps adding FITRIM support to xfs_io, and a generic test to
> xfstests would be the way to go. e.g. write a set of patterned files
> to the filesystem, unlink a number of the files, then run some trim
> commands on the filesystem exercising corner cases and check that
> none of the data in still-active files is damaged (e.g. via md5sum
> comparison)....
>
> Cheers,
>
> Dave.
Hi Dave,
When chasing bugs in those patches I have extensively used stress.sh
script which is part of my gensuit package:
https://sourceforge.net/projects/gensuit/
..and I found it very helpful for testing this. Basically, it does
exactly what you have proposed. It creates checksums of all files in
defined directory (e.g. /usr/share/doc) and run several processes which
will clear its working directory, then copy everything from this
previously checksummed directory (e.g. /usr/share/doc) into its working
directory, create list of files in working directory and its
checksums and compare it with the original list of checksums. Every
process works in the loop so it repeat remove->copy->check.
While the stress.sh tool is running I have run the FITRIM ioctl in the
loop, so when the FITRIM is really buggy (thus data-destroying) the
stress.sh will notice, because checksums will most change change.
As I said I found it sufficient to test FITRIM ioctl on ext3 and ext4
filesystem. So if you think it is good enough I will create new
xfstests test which will do the same thing as stress.sh. Actually I will
post this new test shortly if everyone is ok with this approach.
Thanks!
-Lukas
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 0/1] Batched discard support
2010-09-30 0:55 ` [PATCH 0/1] Batched discard support Dave Chinner
2010-09-30 12:17 ` Lukas Czerner
@ 2010-09-30 16:11 ` Eric Sandeen
1 sibling, 0 replies; 5+ messages in thread
From: Eric Sandeen @ 2010-09-30 16:11 UTC (permalink / raw)
To: Dave Chinner; +Cc: Lukas Czerner, linux-fsdevel, tytso, adilger
On 09/29/2010 07:55 PM, Dave Chinner wrote:
> On Wed, Sep 29, 2010 at 10:54:25AM +0200, Lukas Czerner wrote:
>> Hi,
>>
>> I am working on something I have called "batched discard support" for Ext3
>> and Ext4 filesystems. Traditional discard support for filesystems like Ext4
>> has been implemented the way that whenever the file is unlinked the
>> disk-space that the file was using is trimmed (discarded) by
>> sb_issue_discard() to let the device know that this portion of disk is no
>> longer in use by the filesystem and can be safely used for wear-leveling.
>>
>> However, this approach comes with very noticeable performance loss on most
>> of SSD devices and LUN's I have the opportunity to test it on. The fact is,
>> that bigger discard ranges are more efficient than smaller ones, so it make
>> sense try to batch the ranges together wherever it is possible.
>>
>> I have introduced new filesystem independent ioctl (FITRIM) which can be used
>> to send the "trim this portion of filesystem" command down to the filesystem
>> which (if implemented) discards all free extents in that range.
>>
>> The implementation for Ext3 and Ext4 is complete and you can see it here:
>>
>> http://www.spinics.net/lists/linux-ext4/msg21050.html
>>
>> Why I am sending it here to linux-fsdevel is because I am introducing new fs
>> independent ioctl and new member of super_operations (trim_fs) and we would
>> like let you know about this approach (which any filesystem can take
>> advantage from) and we would like your comment on this patch before we
>> send it upstream.
>
> My first question is: how do you test a filesystem implements
> ->trim_fs correctly?
>
> That is, if we are going to include a data-destroying ioctl, I
> really want some filesystem independent tests written first so that
> as filesystems implement ->trim_fs they can be tested for correct
> implementation.
>
> Perhaps adding FITRIM support to xfs_io, and a generic test to
> xfstests would be the way to go. e.g. write a set of patterned files
> to the filesystem, unlink a number of the files, then run some trim
> commands on the filesystem exercising corner cases and check that
> none of the data in still-active files is damaged (e.g. via md5sum
> comparison)....
xfstests could use the scsi-debug module for testing too - at least
once it has http://www.spinics.net/lists/linux-scsi/msg46008.html
so that tests can be run even if you don't have the hardware.
-Eric
> Cheers,
>
> Dave.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2010-09-30 16:11 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-29 8:54 [PATCH 0/1] Batched discard support Lukas Czerner
2010-09-29 8:54 ` [PATCH] Add ioctl FITRIM Lukas Czerner
2010-09-30 0:55 ` [PATCH 0/1] Batched discard support Dave Chinner
2010-09-30 12:17 ` Lukas Czerner
2010-09-30 16:11 ` Eric Sandeen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).