* About per-file dedup flag
From: Qu Wenruo @ 2016-01-12 3:09 UTC
To: btrfs

Hi all,

As some already know, we are implementing btrfs in-band de-duplication,
and we already have a working and stable version internally.

But that is filesystem-wide de-duplication.
Now we hope to add support to enable/disable dedup per-file.
Much like current NODATACOW/NOCOMPRESS for inode.

But we are not sure where to add such a flag.
Here are our current ideas:

1) XATTR
   Make a btrfs-internal xattr, just like some btrfs props.

2) Inode flag, like FS_NOCOMP_FL
   Although only btrfs is going to support in-band dedup, who knows what
   will happen in the future?

Any advice is welcome.

Thanks,
Qu
* Re: About per-file dedup flag
From: Duncan @ 2016-01-12 4:13 UTC
To: linux-btrfs

Qu Wenruo posted on Tue, 12 Jan 2016 11:09:23 +0800 as excerpted:

> Now we hope to add support to enable/disable dedup per-file.
> Much like current NODATACOW/NOCOMPRESS for inode.

How is this going to work?

NODATACOW/NOCOMPRESS can apply to a single file. But a dup flag, by
definition, needs two files, except for the special case of parts of a
file duplicating other parts of the same file. Is there going to be some
background thread that checks for dups and reflinks duplicated extents if
both files have the dup flag set? What if one has it on and one has it
off?

Presumably, if a file has it on and it is copied (so a new file), the
copy would be reflinked. But if the flag is off, does that make the file
actually data-copy, by default, even if cp decides to do a reflink copy
by default? And does the copy automatically have the dup flag set as
well, or does the original instance set dup, while the new copy, reflinked
to the old one due to that dup flag, still have the dup flag unset, until
the user sets it?

OTOH, I can see such an attribute for dirs making more sense, since it
could be inherited much like the NOCOW attribute, and new files created
there could automatically be checked against the current files to see if
parts are dup, and reflink them if so.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: About per-file dedup flag
From: Qu Wenruo @ 2016-01-12 4:51 UTC
To: Duncan, linux-btrfs

Duncan wrote on 2016/01/12 04:13 +0000:
> Qu Wenruo posted on Tue, 12 Jan 2016 11:09:23 +0800 as excerpted:
>
>> Now we hope to add support to enable/disable dedup per-file.
>> Much like current NODATACOW/NOCOMPRESS for inode.
>
> How is this going to work?
>
> NODATACOW/NOCOMPRESS can apply to a single file. But a dup flag, by
> definition, needs two files, except for the special case of parts of a
> file duplicating other parts of the same file. Is there going to be some
> background thread that checks for dups and reflinks duplicated extents if
> both files have the dup flag set? What if one has it on and one has it
> off?

You are still thinking in the way of off-band dedup.

For off-band dedup, we need two extents to compare.
But for in-band dedup, we are not using reflink or a similar facility.
Instead, we have a hash pool, recording part or all of the known hashes
of extents.

So things should be quite easy to understand:

For the normal case (no NODEDUP flag), valid data (page cache) will be
hashed to find whether it is a duplicate.

For the NODEDUP flag case, all its page cache is just written directly to
disk, or compressed and then written to disk.
No hash will be calculated.

Thanks,
Qu

> Presumably, if a file has it on and it is copied (so a new file), the
> copy would be reflinked. But if the flag is off, does that make the file
> actually data-copy, by default, even if cp decides to do a reflink copy
> by default? And does the copy automatically have the dup flag set as
> well, or does the original instance set dup, while the new copy, reflinked
> to the old one due to that dup flag, still have the dup flag unset, until
> the user sets it?
>
> OTOH, I can see such an attribute for dirs making more sense, since it
> could be inherited much like the NOCOW attribute, and new files created
> there could automatically be checked against the current files to see if
> parts are dup, and reflink them if so.
* Re: About per-file dedup flag
From: Duncan @ 2016-01-12 5:11 UTC
To: linux-btrfs

Qu Wenruo posted on Tue, 12 Jan 2016 12:51:33 +0800 as excerpted:

> Duncan wrote on 2016/01/12 04:13 +0000:
>> Qu Wenruo posted on Tue, 12 Jan 2016 11:09:23 +0800 as excerpted:
>>
>>> Now we hope to add support to enable/disable dedup per-file.
>>> Much like current NODATACOW/NOCOMPRESS for inode.
>>
>> How is this going to work?
>>
>> NODATACOW/NOCOMPRESS can apply to a single file. But a dup flag, by
>> definition, needs two files, except for the special case of parts of a
>> file duplicating other parts of the same file.
>
> You are still thinking in the way of off-band dedup.

> So things should be quite easy to understand:
>
> For the normal case (no NODEDUP flag), valid data (page cache) will be
> hashed to find whether it is a duplicate.
>
> For the NODEDUP flag case, all its page cache is just written directly to
> disk, or compressed and then written to disk.
> No hash will be calculated.

Oh, _NO_DEDUP. =:^) Opposite the dedup logic implied by the subject,
with no hint in the original post indicating logic actually the reverse
of that.

NODEDUP indeed makes more sense, since with a mount or filesystem option
enabling dedup, it would then be the default and nodedup as a per-file
exception is the next logical extension.

Thanks. I knew I must be missing something. A little negation makes a
big difference! =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
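
For readers following the thread, the write path Qu describes (hash lookup
by default, no hashing at all for NODEDUP files) can be sketched roughly as
below. This is only a toy user-space model, not the btrfs implementation:
the Inode and Disk classes, the nodedup attribute, and the choice of SHA-256
are assumptions made for illustration, and the real hash pool may hold only
part of the known hashes.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Toy stand-in for an inode; `nodedup` models the per-file flag (hypothetical)."""
    name: str
    nodedup: bool = False
    extents: list = field(default_factory=list)  # indexes into Disk.extents


@dataclass
class Disk:
    """Toy extent store plus a hash pool mapping digest -> extent index."""
    extents: list = field(default_factory=list)
    hash_pool: dict = field(default_factory=dict)

    def _allocate(self, data: bytes) -> int:
        # Store the data as a new extent and return its index.
        self.extents.append(data)
        return len(self.extents) - 1

    def write(self, inode: Inode, data: bytes) -> int:
        if inode.nodedup:
            # NODEDUP case: no hash is calculated, data always gets a new extent.
            idx = self._allocate(data)
        else:
            # Normal case: hash the data and look it up in the hash pool.
            # SHA-256 is an assumption; the thread does not name the hash.
            digest = hashlib.sha256(data).digest()
            idx = self.hash_pool.get(digest)
            if idx is None:
                # Unknown hash: write a new extent and remember its digest.
                idx = self._allocate(data)
                self.hash_pool[digest] = idx
        inode.extents.append(idx)
        return idx
```

In this model, two files without the flag that write identical data end up
pointing at the same extent, while a file with nodedup set always gets its
own extent and never touches the hash pool. No second file needs to carry a
matching flag, which is the point Qu makes against the off-band reading.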