All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gabriel de Perthuis <g2p.code@gmail.com>
To: Mark Fasheh <mfasheh@suse.de>
Cc: linux-btrfs@vger.kernel.org,
	Chris Mason <chris.mason@fusionio.com>,
	Josef Bacik <josef@redhat.com>, David Sterba <dsterba@suse.cz>
Subject: Re: [PATCH 0/4] btrfs: offline dedupe v2
Date: Tue, 11 Jun 2013 22:56:59 +0200	[thread overview]
Message-ID: <51B78F1B.7000100@gmail.com> (raw)
In-Reply-To: <1370982698-757-1-git-send-email-mfasheh@suse.de>

Le 11/06/2013 22:31, Mark Fasheh a écrit :
> Perhaps this isn't a limiation per-se but extent-same requires read/write
> access to the files we want to dedupe.  During my last series I had a
> conversation with Gabriel de Perthuis about access checking where we tried
> to maintain the ability for a user to run extent-same against a readonly
> snapshot. In addition, I reasoned that since the underlying data won't
> change (at least to the user) that we ought only require the files to be
> open for read.
> 
> What I found however is that neither of these is a great idea ;)
> 
> - We want to require that the inode be open for writing so that an
>   unprivileged user can't do things like run dedupe on a performance
>   sensitive file that they might only have read access to.  In addition I
>   could see it as kind of a surprise (non-standard behavior) to an
>   administrator that users could alter the layout of files they are only
>   allowed to read.
> 
> - Readonly snapshots won't let you open for write anyway (unsuprisingly,
>   open() returns -EROFS).  So that kind of kills the idea of them being able
>   to open those files for write which we want to dedupe.
> 
> That said, I still think being able to run this against a set of readonly
> snapshots makes sense especially if those snapshots are taken for backup
> purposes. I'm just not sure how we can sanely enable it.

The check could be: if (fmode_write || cap_sys_admin).

This isn't incompatible with mnt_want_write, that check is at the
level of the superblocks and vfsmount and not the subvolume fsid.

> Code review is very much appreciated. Thanks,
>      --Mark
> 
> 
> ChangeLog
> 
> - check that we have appropriate access to each file before deduping. For
>   the source, we only check that it is opened for read. Target files have to
>   be open for write.
> 
> - don't dedupe on readonly submounts (this is to maintain 
> 
> - check that we don't dedupe files with different checksumming states
>  (compare BTRFS_INODE_NODATASUM flags)
> 
> - get and maintain write access to the mount during the extent same
>   operation (mount_want_write())
> 
> - allocate our read buffers up front in btrfs_ioctl_file_extent_same() and
>   pass them through for re-use on every call to btrfs_extent_same(). (thanks
>   to David Sterba <dsterba@suse.cz> for reporting this
> 
> - As the read buffers could possibly be up to 1MB (depending on user
>   request), we now conditionally vmalloc them.
> 
> - removed redundant check for same inode. btrfs_extent_same() catches it now
>   and bubbles the error up.
> 
> - remove some unnecessary printks
> 
> Changes from RFC to v1:
> 
> - don't error on large length value in btrfs exent-same, instead we just
>   dedupe the maximum allowed.  That way userspace doesn't have to worry
>   about an arbitrary length limit.
> 
> - btrfs_extent_same will now loop over the dedupe range at 1MB increments (for
>   a total of 16MB per request)
> 
> - cleaned up poorly coded while loop in __extent_read_full_page() (thanks to
>   David Sterba <dsterba@suse.cz> for reporting this)
> 
> - included two fixes from Gabriel de Perthuis <g2p.code@gmail.com>:
>    - allow dedupe across subvolumes
>    - don't lock compressed pages twice when deduplicating
> 
> - removed some unused / poorly designed fields in btrfs_ioctl_same_args.
>   This should also give us a bit more reserved bytes.
> 
> - return -E2BIG instead of -ENOMEM when arg list is too large (thanks to
>   David Sterba <dsterba@suse.cz> for reporting this)
> 
> - Some more reserved bytes are now included as a result of some of my
>   cleanups. Quite possibly we could add a couple more.
> 


  parent reply	other threads:[~2013-06-11 20:57 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-11 20:31 [PATCH 0/4] btrfs: offline dedupe v2 Mark Fasheh
2013-06-11 20:31 ` [PATCH 1/4] btrfs: abtract out range locking in clone ioctl() Mark Fasheh
2013-06-11 20:31 ` [PATCH 2/4] btrfs_ioctl_clone: Move clone code into it's own function Mark Fasheh
2013-06-11 20:31 ` [PATCH 3/4] btrfs: Introduce extent_read_full_page_nolock() Mark Fasheh
2013-06-11 20:31 ` [PATCH 4/4] btrfs: offline dedupe Mark Fasheh
2013-07-15 20:55   ` Zach Brown
2013-07-17  0:14     ` Gabriel de Perthuis
2013-06-11 20:56 ` Gabriel de Perthuis [this message]
2013-06-11 21:04   ` [PATCH 0/4] btrfs: offline dedupe v2 Mark Fasheh
2013-06-11 21:31     ` Gabriel de Perthuis
2013-06-11 21:45       ` Mark Fasheh
2013-06-12 18:10 ` Josef Bacik
2013-06-17 20:04   ` Mark Fasheh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51B78F1B.7000100@gmail.com \
    --to=g2p.code@gmail.com \
    --cc=chris.mason@fusionio.com \
    --cc=dsterba@suse.cz \
    --cc=josef@redhat.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=mfasheh@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.