linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* de-duplicating +C files
@ 2020-10-07  6:29 Eric Levy
  2020-10-07 10:07 ` Adam Borowski
  2020-10-07 11:57 ` Zygo Blaxell
  0 siblings, 2 replies; 5+ messages in thread
From: Eric Levy @ 2020-10-07  6:29 UTC (permalink / raw)
  To: linux-btrfs

Recently a discussion [1] began about the desirability or risk of
applying a de-duplication operation on files with a C (no-CoW)
attribute set. The controversy rests largely on the observation that
calls to Btrfs currently fail for de-duplication between two files if
exactly one has the attribute set, but succeed in other cases, even in
which both have the attribute set. It may seem more natural that
success depends on neither file having the attribute set.

Comments would be welcome in either this thread, or perhaps even more
conveniently, in the Github discussion, over how this behavior of
Btrfs relates to the preferred operation of a de-duplication
application.


[1] https://github.com/markfasheh/duperemove/issues/251

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: de-duplicating +C files
  2020-10-07  6:29 de-duplicating +C files Eric Levy
@ 2020-10-07 10:07 ` Adam Borowski
  2020-10-07 10:12   ` Nikolay Borisov
  2020-10-07 11:35   ` Eric Levy
  2020-10-07 11:57 ` Zygo Blaxell
  1 sibling, 2 replies; 5+ messages in thread
From: Adam Borowski @ 2020-10-07 10:07 UTC (permalink / raw)
  To: Eric Levy; +Cc: linux-btrfs

On Wed, Oct 07, 2020 at 02:29:34AM -0400, Eric Levy wrote:
> Recently a discussion [1] began about the desirability or risk of
> applying a de-duplication operation on files with a C (no-CoW)
> attribute set. The controversy rests largely on the observation that
> calls to Btrfs currently fail for de-duplication between two files if
> exactly one has the attribute set, but succeed in other cases, even in
> which both have the attribute set. It may seem more natural that
> success depends on neither file having the attribute set.

Why?  It's just that the flag is misnamed, it should be "overwrite in-place"
as it doesn't block CoW in any way.  It works like most uses of the word
"CoW" elsewhere: a page is CoWed once whenever it's written to when its
refcount is more than 1.

You can already reflink by snapshotting the subvolume that contains your
file, thus there's no point in blocking explicit deduplication.

The behaviour when trying to dedupe a -C extent with a +C one is less clear.
I'd argue to either:
 * prefer the -C copy, or
 * refuse (current state)


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Imagine there are bandits in your house, your kid is bleeding out,
⢿⡄⠘⠷⠚⠋⠀ the house is on fire, and seven giant trumpets are playing in the
⠈⠳⣄⠀⠀⠀⠀ sky.  Your cat demands food.  The priority should be obvious...

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: de-duplicating +C files
  2020-10-07 10:07 ` Adam Borowski
@ 2020-10-07 10:12   ` Nikolay Borisov
  2020-10-07 11:35   ` Eric Levy
  1 sibling, 0 replies; 5+ messages in thread
From: Nikolay Borisov @ 2020-10-07 10:12 UTC (permalink / raw)
  To: Adam Borowski, Eric Levy; +Cc: linux-btrfs



On 7.10.20 г. 13:07 ч., Adam Borowski wrote:
> On Wed, Oct 07, 2020 at 02:29:34AM -0400, Eric Levy wrote:
>> Recently a discussion [1] began about the desirability or risk of
>> applying a de-duplication operation on files with a C (no-CoW)
>> attribute set. The controversy rests largely on the observation that
>> calls to Btrfs currently fail for de-duplication between two files if
>> exactly one has the attribute set, but succeed in other cases, even in
>> which both have the attribute set. It may seem more natural that
>> success depends on neither file having the attribute set.
> 
> Why?  It's just that the flag is misnamed, it should be "overwrite in-place"
> as it doesn't block CoW in any way.  It works like most uses of the word
> "CoW" elsewhere: a page is CoWed once whenever it's written to when its
> refcount is more than 1.
> 
> You can already reflink by snapshotting the subvolume that contains your
> file, thus there's no point in blocking explicit deduplication.
> 
> The behaviour when trying to dedupe a -C extent with a +C one is less clear.
> I'd argue to either:
>  * prefer the -C copy, or
>  * refuse (current state)
> 

The -C and +C distinction is necessary because when a file is nodatacow
it's also nodatasum so deduplicating a -C into +C file would result in
+C file having some extents with checksums and without. So for the sake
of consistency btrfs enforces 2 invariants:

a) All file extents must have checksums
or
b) No file extents shall have csums.

> 
> Meow!
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: de-duplicating +C files
  2020-10-07 10:07 ` Adam Borowski
  2020-10-07 10:12   ` Nikolay Borisov
@ 2020-10-07 11:35   ` Eric Levy
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Levy @ 2020-10-07 11:35 UTC (permalink / raw)
  Cc: linux-btrfs

What is the practical utility of the attribute? What is the benefit of
the constraints it enforces, for example, in the case of swap files,
for which it is required?

On Wed, Oct 7, 2020 at 6:07 AM Adam Borowski <kilobyte@angband.pl> wrote:
> Why?  It's just that the flag is misnamed, it should be "overwrite in-place"
> as it doesn't block CoW in any way.  It works like most uses of the word
> "CoW" elsewhere: a page is CoWed once whenever it's written to when its
> refcount is more than 1.

On Wed, Oct 7, 2020 at 6:07 AM Adam Borowski <kilobyte@angband.pl> wrote:
>
> On Wed, Oct 07, 2020 at 02:29:34AM -0400, Eric Levy wrote:
> > Recently a discussion [1] began about the desirability or risk of
> > applying a de-duplication operation on files with a C (no-CoW)
> > attribute set. The controversy rests largely on the observation that
> > calls to Btrfs currently fail for de-duplication between two files if
> > exactly one has the attribute set, but succeed in other cases, even in
> > which both have the attribute set. It may seem more natural that
> > success depends on neither file having the attribute set.
>
> Why?  It's just that the flag is misnamed, it should be "overwrite in-place"
> as it doesn't block CoW in any way.  It works like most uses of the word
> "CoW" elsewhere: a page is CoWed once whenever it's written to when its
> refcount is more than 1.
>
> You can already reflink by snapshotting the subvolume that contains your
> file, thus there's no point in blocking explicit deduplication.
>
> The behaviour when trying to dedupe a -C extent with a +C one is less clear.
> I'd argue to either:
>  * prefer the -C copy, or
>  * refuse (current state)
>
>
> Meow!
> --
> ⢀⣴⠾⠻⢶⣦⠀
> ⣾⠁⢠⠒⠀⣿⡁ Imagine there are bandits in your house, your kid is bleeding out,
> ⢿⡄⠘⠷⠚⠋⠀ the house is on fire, and seven giant trumpets are playing in the
> ⠈⠳⣄⠀⠀⠀⠀ sky.  Your cat demands food.  The priority should be obvious...

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: de-duplicating +C files
  2020-10-07  6:29 de-duplicating +C files Eric Levy
  2020-10-07 10:07 ` Adam Borowski
@ 2020-10-07 11:57 ` Zygo Blaxell
  1 sibling, 0 replies; 5+ messages in thread
From: Zygo Blaxell @ 2020-10-07 11:57 UTC (permalink / raw)
  To: Eric Levy; +Cc: linux-btrfs

On Wed, Oct 07, 2020 at 02:29:34AM -0400, Eric Levy wrote:
> Recently a discussion [1] began about the desirability or risk of
> applying a de-duplication operation on files with a C (no-CoW)
> attribute set. The controversy rests largely on the observation that
> calls to Btrfs currently fail for de-duplication between two files if
> exactly one has the attribute set, but succeed in other cases, even in
> which both have the attribute set. It may seem more natural that
> success depends on neither file having the attribute set.
> 
> Comments would be welcome in either this thread, or perhaps even more
> conveniently, in the Github discussion, over how this behavior of
> Btrfs relates to the preferred operation of a de-duplication
> application.
> 
> [1] https://github.com/markfasheh/duperemove/issues/251

It came up as a bees issue 3 years ago:

	https://github.com/Zygo/bees/issues/31

The btrfs behavior results from an optimization for data checksums:
rather than looking up csums on an extent-by-extent basis, there is a
flag in the inode which specifies whether the entire file has csums.
If the inode flag indicates no csums, the entire csum lookup can be
skipped for any read from the file.  The inode flag also means that
extents which are referenced by a file that has csums cannot be shared
(dedupe, reflink, or snapshot) with a file that does not have csums
or vice versa.  nodatacow implies nodatasum because atomic csum update
requires copy-on-write to work, and that is where the +C attribute comes
in.  Other filesystems don't have data csums so they don't hit this issue.

nodatacow files can be reflinked or snapshotted on btrfs.  Unlike
nodatasum, any individual extent in any file can be datacow or not.
Extents in nodatacow files which are shared will operate temporarily in
datacow mode until they are no longer shared, then revert to nodatacow
behavior.  Prealloc does something similar, but only to extents created
with fallocate (and in reverse, starting in nodatacow and ending in
datacow state).  Dedupe is just reflink with an extra verification step.

As a temporary hack to avoid bad behavior for people running VM image
files in nodatacow mode, bees currently ignores nodatacow (+C) files.
This isn't correct--it should be ignoring nodatasum files instead, but
the nodatasum flag is only visible in the raw inode metadata, and has
to be fetched by the btrfs-specific TREE_SEARCH ioctl.  So far, nobody
has complained about datacow+nodatasum files not getting deduped, and
datasum+nodatacow is disallowed by btrfs, so the temporary hack seems
fine for now.

Proper support for nodatasum is planned in future bees versions, with a
user option to either exclude nodatasum and/or nodatacow files completely
from dedupe (as is done now, but using the correct inode flag) or support
separate dedupe domains for datasum and nodatasum extents.  This can be
done easily in most dedupers by hashing extents from datasum files with a
different hash salt than extents from nodatasum files, so that no datasum
block could match a non-datasum block, but nodatasum blocks could match
other nodatasum blocks, and datasum blocks can match other datasum blocks.
(bees has extent-splitting, which adds some additional requirements not
addressed in this paragraph)

Duperemove can easily implement both options:  Check for FS_NOCOW_FL
attribute on file open, and either drop the file, or invert the bits
of the hash as a crude but effective hash salt for hashes coming from
the file, and everything else works as before.  Some hash functions
have an explicit separate salt parameter or IV that can be set to
implement segregated collision domains.  Based on the zero complaints
I've received so far, I doubt any duperemove users will complain about
incorrect handling of datacow+nodatasum files either--and those users
can run the current duperemove anyway, they just need to make sure
they only pass files on the command line that have nodatasum set.

Filesystem tricks and quirks aside, it seems likely most users would
prefer the "ignore all nodatacow files for dedupe" behavior, regardless
of the datasum issue.  The whole point of nodatacow files would seem
to be to not have shared extents in them, while the whole point of
dedupe would seem to be to have as many shared extents as possible, and
only one of those points can be reached at a time.  Since chattr +C is
localized a statement of user intent wrt a specific filesystem object,
the sensible default is to not dedupe +C files.

That said, there are some edge use cases where it's useful, like big
read-only segments in VM images.  My plan is to keep that as default
bees behavior, but make dedupe on nodatacow and nodatasum fully supported
as optional modes.

Aside:  It looks like it should be possible to support a reflink or dedupe
from a datasum file to a nodatasum file (the nodatasum file would ignore
the csums), but the current btrfs code doesn't allow it.  That would make
some things easier and others a bit more complicated.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-10-07 11:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-10-07  6:29 de-duplicating +C files Eric Levy
2020-10-07 10:07 ` Adam Borowski
2020-10-07 10:12   ` Nikolay Borisov
2020-10-07 11:35   ` Eric Levy
2020-10-07 11:57 ` Zygo Blaxell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).