linux-btrfs.vger.kernel.org archive mirror
From: Simon Farnsworth <simon@farnz.org.uk>
To: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: add extent-same ioctl for dedup
Date: Wed, 05 Jan 2011 17:50:42 +0000
Message-ID: <ig2b1i$ph9$1@dough.gmane.org>
In-Reply-To: 1294245410-4739-2-git-send-email-josef@redhat.com

Josef Bacik wrote:

> This adds the ability for userspace to tell btrfs which extents match
> each other.  You pass in
>
> -a logical offset
> -a length
> -a hash type (currently only sha256 is supported)
> -the hash
> -a list of file descriptors with their logical offset
>
> and this ioctl will split up the extent on the target file and then link
> all of the files with the target file's extent and free up their original
> extent.  The hash is to make sure nothing has changed between the userspace
> app running and us doing the actual linking, so we hash everything to make
> sure it's all still the same.  This doesn't work in a few key cases
>
> 1) Any data transformation whatsoever.  This includes compression or any
> encryption that happens later on.  This is just to make sure we're not
> deduping things that don't turn out to be the same stuff on disk as it is
> uncompressed/decrypted.
>
> 2) Across subvolumes.  This can be fixed later, but this is just to keep
> odd problems from happening, like oh say trying to dedup things that are
> snapshots of each other already.  Nothing bad will happen, it's just
> needless work so just don't allow it for the time being.
>
> 3) If the target file's data is split across extents.  We need one extent
> to point everybody at, so if the target file's data spans different
> extents we won't work.  In this case I return ERANGE so the userspace app
> can call defrag and then try again, but currently I don't do that, so that
> will have to be fixed at some point.
>
> I think that's all of the special cases.  Thanks,
> 
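To check that I have read the interface correctly, here is a rough model of
the request as described above. It is only a sketch in Python, not the
structure from the patch: the names ExtentSameRequest, LinkedFile and
validate() are made up for illustration, and the only rules encoded are the
ones stated in the description (sha256 is the sole supported hash type).

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinkedFile:
    """One entry in the list of files to be linked (hypothetical name)."""
    fd: int              # file descriptor of a file that should share the extent
    logical_offset: int  # offset of the matching data within that file


@dataclass
class ExtentSameRequest:
    """Illustrative model of the ioctl arguments (hypothetical name)."""
    logical_offset: int  # offset of the extent in the target file
    length: int          # length of the range to deduplicate
    hash_type: str       # only "sha256" per the description
    digest: bytes        # hash of the data, computed by userspace
    files: List[LinkedFile] = field(default_factory=list)

    def validate(self) -> None:
        """Reject requests that contradict the description above."""
        if self.hash_type != "sha256":
            raise ValueError("only sha256 is supported")
        if len(self.digest) != hashlib.sha256().digest_size:
            raise ValueError("digest has the wrong size for sha256")
        if self.length <= 0 or self.logical_offset < 0:
            raise ValueError("bad offset or length")
        if not self.files:
            raise ValueError("no files to link")


# Example: one other file whose data at offset 4096 should match the target.
data = b"x" * 4096
request = ExtentSameRequest(
    logical_offset=0,
    length=len(data),
    hash_type="sha256",
    digest=hashlib.sha256(data).digest(),
    files=[LinkedFile(fd=5, logical_offset=4096)],
)
request.validate()
```

Note that nothing in this request carries the data itself, only its hash,
which is what prompts the question below.
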
I'm going to ask the stupid question: What happens if an attacker can race
against the dedupe process?

In particular, consider the following hypothetical scenario:

Attacker has discovered a hash collision for some important data that they
can read but not write (e.g. /etc/passwd or
/home/user/company-critical-data.ods). They copy the important data from its
original location to somewhere they can write to on the same filesystem.

Now for the evil bit; they wait, watching for the dedupe process to run.
When it's had time to verify the hash and memcmp the data, but before it
calls the ioctl, Attacker swaps the copy of the data under their control for
the bad one with the hash collision.

If I've understood the code correctly, if Attacker's version of the data is 
the source from the perspective of the ioctl, the kernel will hash the data, 
determine that the hash matches, not cross-check the entire extent with 
memcmp or equivalent, and will then splice Attacker's version of data into 
the original file. If the collision merely lets Attacker trash the original, 
that's bad enough; if it lets them put other interesting content in place, 
it's a really bad problem.
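
To make the scenario concrete, here is a small userspace simulation in
Python. It is a sketch under loud assumptions, not the kernel code: files are
modelled as entries in a dict, the function names are invented, and since I
obviously don't have a sha256 collision to hand, weak_hash() truncates
sha256 to a single byte so that a collision can be brute-forced. The point is
only to contrast a hash-only check with one that also compares the bytes.

```python
import hashlib


def weak_hash(data: bytes) -> bytes:
    """Toy stand-in for a broken hash: only the first byte of sha256."""
    return hashlib.sha256(data).digest()[:1]


def find_collision(original: bytes) -> bytes:
    """Brute-force different data of the same length with the same weak_hash."""
    wanted = weak_hash(original)
    counter = 0
    while True:
        candidate = (b"evil-" + str(counter).encode()).ljust(len(original), b"\0")
        if candidate != original and weak_hash(candidate) == wanted:
            return candidate
        counter += 1


def dedupe_hash_only(files, victim, source, expected):
    """Link victim to source's data if source still hashes to expected."""
    if weak_hash(files[source]) != expected:
        return False
    files[victim] = files[source]  # victim now shares source's extent
    return True


def dedupe_with_compare(files, victim, source, expected):
    """As above, but also compare the data byte for byte before linking."""
    if weak_hash(files[source]) != expected:
        return False
    if files[victim] != files[source]:
        return False  # data differs despite the matching hash: refuse
    files[victim] = files[source]
    return True


def original_survives(dedupe) -> bool:
    """Run the race and report whether the important file is still intact."""
    original = b"root:x:0:0:root:/root:/bin/sh\n"
    files = {"/etc/passwd": original, "/tmp/copy": original}

    # Dedupe tool (userspace): hash and compare while the copy is still honest.
    expected = weak_hash(files["/etc/passwd"])
    assert files["/etc/passwd"] == files["/tmp/copy"]

    # Race window: Attacker swaps in the colliding data before the call.
    files["/tmp/copy"] = find_collision(original)

    # The call is made with Attacker's copy as the source of the shared extent.
    dedupe(files, "/etc/passwd", "/tmp/copy", expected)
    return files["/etc/passwd"] == original


print("hash only:   ", original_survives(dedupe_hash_only))
print("with compare:", original_survives(dedupe_with_compare))
```

In this model the hash-only variant lets the colliding data replace the
original, while the variant with the byte comparison refuses to link. With
real sha256 the attack depends on a collision actually being known, which is
the hypothetical part of the scenario.
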

-- 
Here's hoping I simply missed something,

Simon Farnsworth


