From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	Vlastimil Babka <vbabka@suse.cz>,
	xfs@oss.sgi.com
Subject: Re: [PATCH 8/9] vfs: hoist the btrfs deduplication ioctl to the vfs
Date: Thu, 28 Jul 2016 11:07:20 -0700
Message-ID: <20160728180720.GA15753@birch.djwong.org>
In-Reply-To: <20160727215130.GA18996@node.shutemov.name>

On Thu, Jul 28, 2016 at 12:51:30AM +0300, Kirill A. Shutemov wrote:
> On Sat, Dec 19, 2015 at 12:55:59AM -0800, Darrick J. Wong wrote:
> > Hoist the btrfs EXTENT_SAME ioctl up to the VFS and make the name
> > more systematic (FIDEDUPERANGE).
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > fs/compat_ioctl.c | 1
> > fs/ioctl.c | 38 ++++++++++++++++++
> > fs/read_write.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/fs.h | 4 ++
> > include/uapi/linux/fs.h | 30 ++++++++++++++
> > 5 files changed, 173 insertions(+)
> >
> >
> > diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
> > index 70d4b10..eab31e7 100644
> > --- a/fs/compat_ioctl.c
> > +++ b/fs/compat_ioctl.c
> > @@ -1582,6 +1582,7 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
> >
> > case FICLONE:
> > case FICLONERANGE:
> > + case FIDEDUPERANGE:
> > goto do_ioctl;
> >
> > case FIBMAP:
> > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > index 84c6e79..fcdd33b 100644
> > --- a/fs/ioctl.c
> > +++ b/fs/ioctl.c
> > @@ -568,6 +568,41 @@ static int ioctl_fsthaw(struct file *filp)
> > return thaw_super(sb);
> > }
> >
> > +static long ioctl_file_dedupe_range(struct file *file, void __user *arg)
> > +{
> > +	struct file_dedupe_range __user *argp = arg;
> > +	struct file_dedupe_range *same = NULL;
> > +	int ret;
> > +	unsigned long size;
> > +	u16 count;
> > +
> > +	if (get_user(count, &argp->dest_count)) {
> > +		ret = -EFAULT;
> > +		goto out;
> > +	}
> > +
> > +	size = offsetof(struct file_dedupe_range __user, info[count]);

(I still hate this interface.)

> Vlastimil triggered this during fuzzing:
>
> http://paste.opensuse.org/view/raw/99203426
>
> High order allocation without __GFP_NOWARN + fallback. That's not good.
>
> Basically, we don't have any sanity check of 'dest_count' here. This u16
> comes directly from userspace. And we call memdup_user() based on it.
>
> Here's a program which makes kernel allocate order-9 page:
>
> https://gist.github.com/kiryl/2b344b51da1fd2725be420a996b10d22
>
> Should we put some reasonable upper limit for the 'dest_count'?
> What is typical 'dest_count'?

There are two userland programs I know of that call this ioctl. The
first is xfs_io, which always sets dest_count = 1.

The other is duperemove, which seems capable of setting dest_count to
however many fragments it finds, up to a max of 120. Capping size to
x86's 4k page size yields 127 entries. On bigger machines with 64k
pages, that increases to 2047. I think that's enough for anybody.

(Honestly, 127 dedupe candidates * max 16M extent length is already
2GB of IO for a single call.)

--D
>
> > +
> > +	same = memdup_user(argp, size);
> > +	if (IS_ERR(same)) {
> > +		ret = PTR_ERR(same);
> > +		same = NULL;
> > +		goto out;
> > +	}
> > +
> > +	ret = vfs_dedupe_file_range(file, same);
> > +	if (ret)
> > +		goto out;
> > +
> > +	ret = copy_to_user(argp, same, size);
> > +	if (ret)
> > +		ret = -EFAULT;
> > +
> > +out:
> > +	kfree(same);
> > +	return ret;
> > +}
> > +
>
> --
> Kirill A. Shutemov