* RE: fs: Add FITRIM ioctl
@ 2010-10-28 10:30 Boaz Harrosh
2010-10-28 11:25 ` Christoph Hellwig
2010-10-29 8:58 ` Lukas Czerner
0 siblings, 2 replies; 4+ messages in thread
From: Boaz Harrosh @ 2010-10-28 10:30 UTC (permalink / raw)
To: Lukas Czerner, linux-fsdevel; +Cc: Theodore Ts'o
Lukas Czerner <lczerner@redhat.com> wrote:
> Gitweb: http://git.kernel.org/linus/367a51a339020ba4d9edb0ce0f21d65bd50b00c9
> Commit: 367a51a339020ba4d9edb0ce0f21d65bd50b00c9
> Parent: 77ca6cdf0ab8a42f481ec997911bc89e79138723
> Author: Lukas Czerner <lczerner@redhat.com>
> AuthorDate: Wed Oct 27 21:30:11 2010 -0400
> Committer: Theodore Ts'o <tytso@mit.edu>
> CommitDate: Wed Oct 27 21:30:11 2010 -0400
>
I've just noticed this patch. Was it posted to linux-fsdevel? Sorry
for missing it.
So I have a comment, and question.
---
> fs: Add FITRIM ioctl
>
> Adds an filesystem independent ioctl to allow implementation of file
> system batched discard support. I takes fstrim_range structure as an
> argument. fstrim_range is definec in the include/fs.h and its
> definition is as follows.
>
> struct fstrim_range {
> start;
> len;
> minlen;
> }
>
> start - first Byte to trim
> len - number of Bytes to trim from start
> minlen - minimum extent length to trim, free extents shorter than this
> number of Bytes will be ignored. This will be rounded up to fs
> block size.
>
Was the range done so it could be called in parts, and progress displayed?
> It is also possible to specify NULL as an argument. In this case the
> arguments will set itself as follows:
>
> start = 0;
> len = ULLONG_MAX;
> minlen = 0;
>
> So it will trim the whole file system at one run.
>
> After the FITRIM is done, the number of actually discarded Bytes is stored
> in fstrim_range.len to give the user better insight on how much storage
> space has been really released for wear-leveling.
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
> fs/ioctl.c | 39 +++++++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 8 ++++++++
> 2 files changed, 47 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index f855ea4..e92fdbb 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -530,6 +530,41 @@ static int ioctl_fsthaw(struct file *filp)
> return thaw_super(sb);
> }
>
> +static int ioctl_fstrim(struct file *filp, void __user *argp)
> +{
> + struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
> + struct fstrim_range range;
> + int ret = 0;
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EPERM;
> +
> + /* If filesystem doesn't support trim feature, return. */
> + if (sb->s_op->trim_fs == NULL)
> + return -EOPNOTSUPP;
> +
> + /* If a blockdevice-backed filesystem isn't specified, return EINVAL. */
> + if (sb->s_bdev == NULL)
> + return -EINVAL;
> +
What is the point of that? sb->s_bdev is not used anywhere in this code.
If an FS published an sb->s_op->trim_fs, it should know what it wants
No? Note that the FS in question does not even need to check. It already
knows if it's block based or not.
Thanks
Boaz
> + if (argp == NULL) {
> + range.start = 0;
> + range.len = ULLONG_MAX;
> + range.minlen = 0;
> + } else if (copy_from_user(&range, argp, sizeof(range)))
> + return -EFAULT;
> +
> + ret = sb->s_op->trim_fs(sb, &range);
> + if (ret < 0)
> + return ret;
> +
> + if ((argp != NULL) &&
> + (copy_to_user(argp, &range, sizeof(range))))
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> /*
> * When you add any new common ioctls to the switches above and below
> * please update compat_sys_ioctl() too.
> @@ -580,6 +615,10 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
> error = ioctl_fsthaw(filp);
> break;
>
> + case FITRIM:
> + error = ioctl_fstrim(filp, argp);
> + break;
> +
> case FS_IOC_FIEMAP:
> return ioctl_fiemap(filp, arg);
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 63d069b..7008268 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -32,6 +32,12 @@
> #define SEEK_END 2 /* seek relative to end of file */
> #define SEEK_MAX SEEK_END
>
> +struct fstrim_range {
> + uint64_t start;
> + uint64_t len;
> + uint64_t minlen;
> +};
> +
> /* And dynamically-tunable limits and defaults: */
> struct files_stat_struct {
> int nr_files; /* read only */
> @@ -316,6 +322,7 @@ struct inodes_stat_t {
> #define FIGETBSZ _IO(0x00,2) /* get the block size used for bmap */
> #define FIFREEZE _IOWR('X', 119, int) /* Freeze */
> #define FITHAW _IOWR('X', 120, int) /* Thaw */
> +#define FITRIM _IOWR('X', 121, struct fstrim_range) /* Trim */
>
> #define FS_IOC_GETFLAGS _IOR('f', 1, long)
> #define FS_IOC_SETFLAGS _IOW('f', 2, long)
> @@ -1581,6 +1588,7 @@ struct super_operations {
> ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
> #endif
> int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
> + int (*trim_fs) (struct super_block *, struct fstrim_range *);
> };
>
> /*
> --
> To unsubscribe from this list: send the line "unsubscribe git-commits-head" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: fs: Add FITRIM ioctl
2010-10-28 10:30 fs: Add FITRIM ioctl Boaz Harrosh
@ 2010-10-28 11:25 ` Christoph Hellwig
2010-10-29 9:50 ` Lukas Czerner
2010-10-29 8:58 ` Lukas Czerner
1 sibling, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2010-10-28 11:25 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Lukas Czerner, linux-fsdevel, Theodore Ts'o
On Thu, Oct 28, 2010 at 12:30:45PM +0200, Boaz Harrosh wrote:
> I've just noticed this patch. Was it posted to linux-fsdevel? Sorry
> for missing it.
It has been, but only very recently.
I can't say I overly like it. There's no real point in dispatching this
out to a separate vector instead of just through ->ioctl. It's got
rather useless special case for ommiting the argument, and the whole
thing just doesn't seem generic enough to do it in the VFS.
Why did this common code go in through the ext4 tree as a start, and
without hitting linux-next before the merge window?
> What is the point of that? sb->s_bdev is not used anywhere in this code.
> If an FS published an sb->s_op->trim_fs, it should know what it wants
> No? Note that the FS in question does not even need to check. It already
> knows if it's block based or not.
Indeed. Looks like copied from freeze/thaw which used to be block
device based, but don't really need this anymore either.
^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: fs: Add FITRIM ioctl
2010-10-28 10:30 fs: Add FITRIM ioctl Boaz Harrosh
2010-10-28 11:25 ` Christoph Hellwig
@ 2010-10-29 8:58 ` Lukas Czerner
1 sibling, 0 replies; 4+ messages in thread
From: Lukas Czerner @ 2010-10-29 8:58 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Lukas Czerner, linux-fsdevel, Theodore Ts'o
On Thu, 28 Oct 2010, Boaz Harrosh wrote:
> Lukas Czerner <lczerner@redhat.com> wrote:
> > Gitweb: http://git.kernel.org/linus/367a51a339020ba4d9edb0ce0f21d65bd50b00c9
> > Commit: 367a51a339020ba4d9edb0ce0f21d65bd50b00c9
> > Parent: 77ca6cdf0ab8a42f481ec997911bc89e79138723
> > Author: Lukas Czerner <lczerner@redhat.com>
> > AuthorDate: Wed Oct 27 21:30:11 2010 -0400
> > Committer: Theodore Ts'o <tytso@mit.edu>
> > CommitDate: Wed Oct 27 21:30:11 2010 -0400
> >
>
> I've just noticed this patch. Was it posted to linux-fsdevel? Sorry
> for missing it.
Yes it was, not long time ago.
>
> So I have a comment, and question.
> ---
> > fs: Add FITRIM ioctl
> >
> > Adds an filesystem independent ioctl to allow implementation of file
> > system batched discard support. I takes fstrim_range structure as an
> > argument. fstrim_range is definec in the include/fs.h and its
> > definition is as follows.
> >
> > struct fstrim_range {
> > start;
> > len;
> > minlen;
> > }
> >
> > start - first Byte to trim
> > len - number of Bytes to trim from start
> > minlen - minimum extent length to trim, free extents shorter than this
> > number of Bytes will be ignored. This will be rounded up to fs
> > block size.
> >
>
> Was the range done so it could be called in parts, and progress displayed?
Yes, this is one of the reasons. Other is, that discard on some
devices may take quite a while to complete and it is not very cheap
operation either. So, instead of discarding the whole device, which may
cause noticeable slowdown for other IO on that device for some time, we
can do it for example by one allocation group at a time (or just
generally per partes).
>
> > It is also possible to specify NULL as an argument. In this case the
> > arguments will set itself as follows:
> >
> > start = 0;
> > len = ULLONG_MAX;
> > minlen = 0;
> >
> > So it will trim the whole file system at one run.
> >
> > After the FITRIM is done, the number of actually discarded Bytes is stored
> > in fstrim_range.len to give the user better insight on how much storage
> > space has been really released for wear-leveling.
> >
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
> > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> > ---
> > fs/ioctl.c | 39 +++++++++++++++++++++++++++++++++++++++
> > include/linux/fs.h | 8 ++++++++
> > 2 files changed, 47 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > index f855ea4..e92fdbb 100644
> > --- a/fs/ioctl.c
> > +++ b/fs/ioctl.c
> > @@ -530,6 +530,41 @@ static int ioctl_fsthaw(struct file *filp)
> > return thaw_super(sb);
> > }
> >
> > +static int ioctl_fstrim(struct file *filp, void __user *argp)
> > +{
> > + struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
> > + struct fstrim_range range;
> > + int ret = 0;
> > +
> > + if (!capable(CAP_SYS_ADMIN))
> > + return -EPERM;
> > +
> > + /* If filesystem doesn't support trim feature, return. */
> > + if (sb->s_op->trim_fs == NULL)
> > + return -EOPNOTSUPP;
> > +
> > + /* If a blockdevice-backed filesystem isn't specified, return EINVAL. */
> > + if (sb->s_bdev == NULL)
> > + return -EINVAL;
> > +
>
> What is the point of that? sb->s_bdev is not used anywhere in this code.
> If an FS published an sb->s_op->trim_fs, it should know what it wants
> No? Note that the FS in question does not even need to check. It already
> knows if it's block based or not.
Right, this was copied from fsfreeze and you are right it is not needed
so it can be removed.
>
> Thanks
> Boaz
Thanks!
-Lukas
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: fs: Add FITRIM ioctl
2010-10-28 11:25 ` Christoph Hellwig
@ 2010-10-29 9:50 ` Lukas Czerner
0 siblings, 0 replies; 4+ messages in thread
From: Lukas Czerner @ 2010-10-29 9:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Boaz Harrosh, Lukas Czerner, linux-fsdevel, Theodore Ts'o
Hi Christoph,
On Thu, 28 Oct 2010, Christoph Hellwig wrote:
> On Thu, Oct 28, 2010 at 12:30:45PM +0200, Boaz Harrosh wrote:
> > I've just noticed this patch. Was it posted to linux-fsdevel? Sorry
> > for missing it.
>
> It has been, but only very recently.
>
>
> I can't say I overly like it. There's no real point in dispatching this
> out to a separate vector instead of just through ->ioctl. It's got
> rather useless special case for ommiting the argument, and the whole
> thing just doesn't seem generic enough to do it in the VFS.
Well, learning from examples, that's what I am doing and ioctl_fiemap()
is doing the same thing so I was using that as an example. And, of
course, I am returning the amount of really discarded Bytes to the user
and I just can not simply use user-space pointer, or maybe I do not
understand you correctly.
About the special case I am not entirely sure that it is useless, since
there is no point in allocating the structure and setting range when you
just want to discard the whole thing (in user-space of course). But the
reason why to do it in ioctl_fstrim() is that otherwise some filesystems
might decide to return EINVAL when no argument has been passed and some
may handle it differently, hence it would not be consistent.
>
> Why did this common code go in through the ext4 tree as a start, and
> without hitting linux-next before the merge window?
I have posted it on both linux-fsdevel and linux-ext4 and we needed it
for batched discard implementation in ext4 so that is probably why it
went through ext4 tree. Unfortunately, I can not answer the second
question...
>
> > What is the point of that? sb->s_bdev is not used anywhere in this code.
> > If an FS published an sb->s_op->trim_fs, it should know what it wants
> > No? Note that the FS in question does not even need to check. It already
> > knows if it's block based or not.
>
> Indeed. Looks like copied from freeze/thaw which used to be block
> device based, but don't really need this anymore either.
>
You're right.
Thanks!
-Lukas
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2010-10-29 9:50 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-28 10:30 fs: Add FITRIM ioctl Boaz Harrosh
2010-10-28 11:25 ` Christoph Hellwig
2010-10-29 9:50 ` Lukas Czerner
2010-10-29 8:58 ` Lukas Czerner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).