* duperemove : some real world figures on BTRFS deduplication
@ 2016-12-08 15:11 Swâmi Petaramesh
2016-12-08 15:42 ` Austin S. Hemmelgarn
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Swâmi Petaramesh @ 2016-12-08 15:11 UTC (permalink / raw)
To: linux-btrfs
Hi, Some real world figures about running duperemove deduplication on
BTRFS :
I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
typically at the same update level, and all of them more or less sharing
the entirety or part of the same set of user files.
For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
for having complete backups at different points in time.
The HD was full to 93% and made a good testbed for deduplicating.
So I ran duperemove on this HD, on a machine doing "only this", using a
hashfile. The machine being an Intel i5 with 6 GB of RAM.
Well, the damn thing has been running for 15 days uninterrupted !
...Until I [Ctrl]-C it this morning as I had to move with the machine (I
wasn't expecting it to last THAT long...).
It took about 48 hours just for calculating the files hashes.
Then it took another 48 hours just for "loading the hashes of duplicate
extents".
Then it took 11 days deduplicating until I killed it.
At the end, the disk that was 93% full is now 76% full, so I saved 17%
of 1 TB (170 GB) by deduplicating for 15 days.
Well the thing "works" and my disk isn't full anymore, so that's a very
partial success, but still I wonder if the gain is worth the effort...
Best regards.
ॐ
--
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E
* Re: duperemove : some real world figures on BTRFS deduplication
2016-12-08 15:11 duperemove : some real world figures on BTRFS deduplication Swâmi Petaramesh
@ 2016-12-08 15:42 ` Austin S. Hemmelgarn
2016-12-08 18:00 ` Timofey Titovets
2016-12-08 20:07 ` Jeff Mahoney
2016-12-08 20:07 ` Jeff Mahoney
` (2 subsequent siblings)
3 siblings, 2 replies; 12+ messages in thread
From: Austin S. Hemmelgarn @ 2016-12-08 15:42 UTC (permalink / raw)
To: Swâmi Petaramesh, linux-btrfs
On 2016-12-08 10:11, Swâmi Petaramesh wrote:
> Hi, Some real world figures about running duperemove deduplication on
> BTRFS :
>
> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
> BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
> typically at the same update level, and all of them more or less sharing
> the entirety or part of the same set of user files.
>
> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
> for having complete backups at different points in time.
>
> The HD was full to 93% and made a good testbed for deduplicating.
>
> So I ran duperemove on this HD, on a machine doing "only this", using a
> hashfile. The machine being an Intel i5 with 6 GB of RAM.
>
> Well, the damn thing has been running for 15 days uninterrupted !
> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
> wasn't expecting it to last THAT long...).
>
> It took about 48 hours just for calculating the files hashes.
>
> Then it took another 48 hours just for "loading the hashes of duplicate
> extents".
>
> Then it took 11 days deduplicating until I killed it.
>
> At the end, the disk that was 93% full is now 76% full, so I saved 17%
> of 1 TB (170 GB) by deduplicating for 15 days.
>
> Well the thing "works" and my disk isn't full anymore, so that's a very
> partial success, but still I wonder if the gain is worth the effort...
So, some general explanation here:
Duperemove hashes data in blocks of (by default) 128kB, which means for
~930GB, you've got about 7618560 blocks to hash, which partly explains
why it took so long to hash. Once that's done, it then has to compare
hashes for all combinations of those blocks, which totals to
58042456473600 comparisons (hence that taking a long time). The block
size thus becomes a trade-off between performance when hashing and
actual space savings (smaller block size makes hashing take longer, but
gives overall slightly better results for deduplication).
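To make the arithmetic concrete, here is a rough back-of-the-envelope
sketch (illustrative only; it counts unordered pairs, i.e. the naive
upper bound, not what duperemove literally executes):

#include <stdio.h>

int main(void)
{
    unsigned long long data_bytes  = 930ULL * 1024 * 1024 * 1024; /* ~930 GiB of data */
    unsigned long long block_bytes = 128ULL * 1024;               /* duperemove's default block size */

    unsigned long long blocks = data_bytes / block_bytes;
    unsigned long long pairs  = blocks * (blocks - 1) / 2;        /* naive all-pairs count */

    printf("blocks to hash: %llu\n", blocks);             /* 7618560 */
    printf("naive pairwise comparisons: %llu\n", pairs);  /* ~2.9e13 */
    return 0;
}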
As far as the rest, given your hashing performance (which is not
particularly good I might add, roughly 5.6MB/s), the amount of time it
was taking to do the actual deduplication is reasonable since the
deduplication ioctl does a byte-wise comparison of the extents to be
deduplicated prior to actually ref-linking them to ensure you don't lose
data.
Because of this, generic batch deduplication is not all that great on
BTRFS. There are cases where it can work, but usually they're pretty
specific cases. In most cases though, you're better off doing a custom
tool that knows about how your data is laid out and what's likely to be
duplicated (I've actually got two tools for this for the two cases where
I use deduplication, they use knowledge of the data-set itself to figure
out what's duplicated, then just call the ioctl through a wrapper
(previously the one included in duperemove, currently xfs_io)).
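For anyone who wants to experiment with the kernel interface directly,
here is a minimal sketch (not duperemove's code, and error handling is
kept to a minimum). It assumes the VFS-level FIDEDUPERANGE ioctl from
linux/fs.h on kernel 4.5 or later, which is the same interface the
xfs_io dedupe command drives:

#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s <src> <dst> <len>\n", argv[0]);
        return 1;
    }

    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_RDWR);
    if (src < 0 || dst < 0) { perror("open"); return 1; }

    /* one source range, one destination range */
    struct file_dedupe_range *r =
        calloc(1, sizeof(*r) + sizeof(struct file_dedupe_range_info));
    if (!r) return 1;
    r->src_offset = 0;
    r->src_length = strtoull(argv[3], NULL, 0);
    r->dest_count = 1;
    r->info[0].dest_fd = dst;
    r->info[0].dest_offset = 0;

    /* the kernel byte-compares the two ranges and only then shares extents */
    if (ioctl(src, FIDEDUPERANGE, r) < 0) { perror("FIDEDUPERANGE"); return 1; }

    if (r->info[0].status == FILE_DEDUPE_RANGE_DIFFERS)
        fprintf(stderr, "ranges differ, nothing deduped\n");
    else if (r->info[0].status < 0)
        fprintf(stderr, "dedupe failed: %s\n", strerror(-r->info[0].status));
    else
        printf("deduped %llu bytes\n",
               (unsigned long long)r->info[0].bytes_deduped);

    free(r);
    close(src);
    close(dst);
    return 0;
}

The kernel may cap how much it processes per call, so a real tool loops
over the file in chunks and checks bytes_deduped each time.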
* Re: duperemove : some real world figures on BTRFS deduplication
2016-12-08 15:42 ` Austin S. Hemmelgarn
@ 2016-12-08 18:00 ` Timofey Titovets
2016-12-08 20:07 ` Jeff Mahoney
1 sibling, 0 replies; 12+ messages in thread
From: Timofey Titovets @ 2016-12-08 18:00 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: Swâmi Petaramesh, linux-btrfs
2016-12-08 18:42 GMT+03:00 Austin S. Hemmelgarn <ahferroin7@gmail.com>:
> On 2016-12-08 10:11, Swâmi Petaramesh wrote:
>>
>> Hi, Some real world figures about running duperemove deduplication on
>> BTRFS :
>>
>> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
>> BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
>> typically at the same update level, and all of them more or less sharing
>> the entirety or part of the same set of user files.
>>
>> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
>> for having complete backups at different points in time.
>>
>> The HD was full to 93% and made a good testbed for deduplicating.
>>
>> So I ran duperemove on this HD, on a machine doing "only this", using a
>> hashfile. The machine being an Intel i5 with 6 GB of RAM.
>>
>> Well, the damn thing has been running for 15 days uninterrupted !
>> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
>> wasn't expecting it to last THAT long...).
>>
>> It took about 48 hours just for calculating the files hashes.
>>
>> Then it took another 48 hours just for "loading the hashes of duplicate
>> extents".
>>
>> Then it took 11 days deduplicating until I killed it.
>>
>> At the end, the disk that was 93% full is now 76% full, so I saved 17%
>> of 1 TB (170 GB) by deduplicating for 15 days.
>>
>> Well the thing "works" and my disk isn't full anymore, so that's a very
>> partial success, but still I wonder if the gain is worth the effort...
>
> So, some general explanation here:
> Duperemove hashes data in blocks of (by default) 128kB, which means for
> ~930GB, you've got about 7618560 blocks to hash, which partly explains why
> it took so long to hash. Once that's done, it then has to compare hashes
> for all combinations of those blocks, which totals to 58042456473600
> comparisons (hence that taking a long time). The block size thus becomes a
> trade-off between performance when hashing and actual space savings (smaller
> block size makes hashing take longer, but gives overall slightly better
> results for deduplication).
>
> As far as the rest, given your hashing performance (which is not
> particularly good I might add, roughly 5.6MB/s), the amount of time it was
> taking to do the actual deduplication is reasonable since the deduplication
> ioctl does a byte-wise comparison of the extents to be deduplicated prior to
> actually ref-linking them to ensure you don't lose data.
>
> Because of this, generic batch deduplication is not all that great on BTRFS.
> There are cases where it can work, but usually they're pretty specific
> cases. In most cases though, you're better off doing a custom tool that
> knows about how your data is laid out and what's likely to be duplicated
> (I've actually got two tools for this for the two cases where I use
> deduplication, they use knowledge of the data-set itself to figure out
> what's duplicated, then just call the ioctl through a wrapper (previously
> the one included in duperemove, currently xfs_io)).
>
Zygo has done good work on this too.
Try:
https://github.com/Zygo/bees
It's cool and can work better on large data sets, because it
dedupes at the same time as it scans.
--
Have a nice day,
Timofey.
* Re: duperemove : some real world figures on BTRFS deduplication
2016-12-08 15:11 duperemove : some real world figures on BTRFS deduplication Swâmi Petaramesh
2016-12-08 15:42 ` Austin S. Hemmelgarn
@ 2016-12-08 20:07 ` Jeff Mahoney
2016-12-09 14:06 ` Swâmi Petaramesh
2016-12-09 2:58 ` Chris Murphy
[not found] ` <CAEtw4r2Q3pz8FQrKgij_fWTBw7p2YRB6DqYrXzoOZ-g0htiKAw@mail.gmail.com>
3 siblings, 1 reply; 12+ messages in thread
From: Jeff Mahoney @ 2016-12-08 20:07 UTC (permalink / raw)
To: Swâmi Petaramesh, linux-btrfs
On 12/8/16 10:11 AM, Swâmi Petaramesh wrote:
> Hi, Some real world figures about running duperemove deduplication on
> BTRFS :
>
> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
> BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
> typically at the same update level, and all of them more or less sharing
> the entirety or part of the same set of user files.
>
> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
> for having complete backups at different points in time.
>
> The HD was full to 93% and made a good testbed for deduplicating.
>
> So I ran duperemove on this HD, on a machine doing "only this", using a
> hashfile. The machine being an Intel i5 with 6 GB of RAM.
>
> Well, the damn thing has been running for 15 days uninterrupted !
> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
> wasn't expecting it to last THAT long...).
>
> It took about 48 hours just for calculating the files hashes.
>
> Then it took another 48 hours just for "loading the hashes of duplicate
> extents".
>
> Then it took 11 days deduplicating until I killed it.
>
> At the end, the disk that was 93% full is now 76% full, so I saved 17%
> of 1 TB (170 GB) by deduplicating for 15 days.
>
> Well the thing "works" and my disk isn't full anymore, so that's a very
> partial success, but still I wonder if the gain is worth the effort...
What version were you using? I know Mark had put a bunch of effort into
reducing the memory footprint and runtime. The earlier versions were
"can we get this thing working" while the newer versions are more efficient.
What throughput are you getting to that disk? I get that it's USB3, but
reading 1TB doesn't take a terribly long time so 15 days is pretty
ridiculous.
At any rate, the good news is that when you run it again, assuming you
used the hash file, it will not have to rescan most of your data set.
-Jeff
--
Jeff Mahoney
SUSE Labs
* Re: duperemove : some real world figures on BTRFS deduplication
2016-12-08 15:42 ` Austin S. Hemmelgarn
2016-12-08 18:00 ` Timofey Titovets
@ 2016-12-08 20:07 ` Jeff Mahoney
2016-12-08 20:46 ` Austin S. Hemmelgarn
1 sibling, 1 reply; 12+ messages in thread
From: Jeff Mahoney @ 2016-12-08 20:07 UTC (permalink / raw)
To: Austin S. Hemmelgarn, Swâmi Petaramesh, linux-btrfs
On 12/8/16 10:42 AM, Austin S. Hemmelgarn wrote:
> On 2016-12-08 10:11, Swâmi Petaramesh wrote:
>> Hi, Some real world figures about running duperemove deduplication on
>> BTRFS :
>>
>> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
>> BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
>> typically at the same update level, and all of them more or less sharing
>> the entirety or part of the same set of user files.
>>
>> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
>> for having complete backups at different points in time.
>>
>> The HD was full to 93% and made a good testbed for deduplicating.
>>
>> So I ran duperemove on this HD, on a machine doing "only this", using a
>> hashfile. The machine being an Intel i5 with 6 GB of RAM.
>>
>> Well, the damn thing has been running for 15 days uninterrupted !
>> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
>> wasn't expecting it to last THAT long...).
>>
>> It took about 48 hours just for calculating the files hashes.
>>
>> Then it took another 48 hours just for "loading the hashes of duplicate
>> extents".
>>
>> Then it took 11 days deduplicating until I killed it.
>>
>> At the end, the disk that was 93% full is now 76% full, so I saved 17%
>> of 1 TB (170 GB) by deduplicating for 15 days.
>>
>> Well the thing "works" and my disk isn't full anymore, so that's a very
>> partial success, but still I wonder if the gain is worth the effort...
> So, some general explanation here:
> Duperemove hashes data in blocks of (by default) 128kB, which means for
> ~930GB, you've got about 7618560 blocks to hash, which partly explains
> why it took so long to hash. Once that's done, it then has to compare
> hashes for all combinations of those blocks, which totals to
> 58042456473600 comparisons (hence that taking a long time). The block
> size thus becomes a trade-off between performance when hashing and
> actual space savings (smaller block size makes hashing take longer, but
> gives overall slightly better results for deduplication).
IIRC, the core of the duperemove duplicate matcher isn't an O(n^2)
algorithm. I think Mark used a bloom filter to reduce the data set
prior to matching, but I haven't looked at the code in a while.
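For readers who haven't seen the idea, here is a toy sketch of that
kind of pre-filter (hypothetical and simplified, not duperemove's
actual implementation): block digests are run through a small bloom
filter first, and only digests the filter has already seen are kept as
duplicate candidates, so unique blocks never reach the expensive
matching stage.

#include <stdint.h>
#include <stdio.h>

#define FILTER_BITS (1u << 24)              /* 16 Mbit filter, 2 MiB of RAM */

static uint8_t filter[FILTER_BITS / 8];

/* two cheap hash functions derived from a 64-bit block digest */
static uint32_t h1(uint64_t d) { return (uint32_t)(d % FILTER_BITS); }
static uint32_t h2(uint64_t d) { return (uint32_t)((d >> 24) * 2654435761u % FILTER_BITS); }

static int  test_bit(uint32_t b) { return filter[b / 8] >> (b % 8) & 1; }
static void set_bit(uint32_t b)  { filter[b / 8] |= 1u << (b % 8); }

/* Returns 1 if `digest` was possibly seen before (keep as a duplicate
 * candidate, false positives are possible), 0 if definitely new. */
static int seen_before(uint64_t digest)
{
    uint32_t a = h1(digest), b = h2(digest);
    if (test_bit(a) && test_bit(b))
        return 1;
    set_bit(a);
    set_bit(b);
    return 0;
}

int main(void)
{
    uint64_t digests[] = { 0xdeadbeef, 0x12345678, 0xdeadbeef, 0xcafef00d };
    for (unsigned i = 0; i < sizeof(digests) / sizeof(digests[0]); i++)
        printf("digest %llx: %s\n", (unsigned long long)digests[i],
               seen_before(digests[i]) ? "candidate duplicate" : "new");
    return 0;
}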
-Jeff
--
Jeff Mahoney
SUSE Labs
* Re: duperemove : some real world figures on BTRFS deduplication
2016-12-08 20:07 ` Jeff Mahoney
@ 2016-12-08 20:46 ` Austin S. Hemmelgarn
0 siblings, 0 replies; 12+ messages in thread
From: Austin S. Hemmelgarn @ 2016-12-08 20:46 UTC (permalink / raw)
To: Jeff Mahoney, Swâmi Petaramesh, linux-btrfs
On 2016-12-08 15:07, Jeff Mahoney wrote:
> On 12/8/16 10:42 AM, Austin S. Hemmelgarn wrote:
>> On 2016-12-08 10:11, Swâmi Petaramesh wrote:
>>> Hi, Some real world figures about running duperemove deduplication on
>>> BTRFS :
>>>
>>> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
>>> BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
>>> typically at the same update level, and all of them more or less sharing
>>> the entirety or part of the same set of user files.
>>>
>>> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
>>> for having complete backups at different points in time.
>>>
>>> The HD was full to 93% and made a good testbed for deduplicating.
>>>
>>> So I ran duperemove on this HD, on a machine doing "only this", using a
>>> hashfile. The machine being an Intel i5 with 6 GB of RAM.
>>>
>>> Well, the damn thing has been running for 15 days uninterrupted !
>>> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
>>> wasn't expecting it to last THAT long...).
>>>
>>> It took about 48 hours just for calculating the files hashes.
>>>
>>> Then it took another 48 hours just for "loading the hashes of duplicate
>>> extents".
>>>
>>> Then it took 11 days deduplicating until I killed it.
>>>
>>> At the end, the disk that was 93% full is now 76% full, so I saved 17%
>>> of 1 TB (170 GB) by deduplicating for 15 days.
>>>
>>> Well the thing "works" and my disk isn't full anymore, so that's a very
>>> partial success, but still I wonder if the gain is worth the effort...
>> So, some general explanation here:
>> Duperemove hashes data in blocks of (by default) 128kB, which means for
>> ~930GB, you've got about 7618560 blocks to hash, which partly explains
>> why it took so long to hash. Once that's done, it then has to compare
>> hashes for all combinations of those blocks, which totals to
>> 58042456473600 comparisons (hence that taking a long time). The block
>> size thus becomes a trade-off between performance when hashing and
>> actual space savings (smaller block size makes hashing take longer, but
>> gives overall slightly better results for deduplication).
>
> IIRC, the core of the duperemove duplicate matcher isn't an O(n^2)
> algorithm. I think Mark used a bloom filter to reduce the data set
> prior to matching, but I haven't looked at the code in a while.
>
You're right, I had completely forgotten about that.
Regardless of that though, it's still a lot of processing that needs to be done.
* Re: duperemove : some real world figures on BTRFS deduplication
2016-12-08 15:11 duperemove : some real world figures on BTRFS deduplication Swâmi Petaramesh
2016-12-08 15:42 ` Austin S. Hemmelgarn
2016-12-08 20:07 ` Jeff Mahoney
@ 2016-12-09 2:58 ` Chris Murphy
2016-12-09 13:45 ` Swâmi Petaramesh
[not found] ` <CAEtw4r2Q3pz8FQrKgij_fWTBw7p2YRB6DqYrXzoOZ-g0htiKAw@mail.gmail.com>
3 siblings, 1 reply; 12+ messages in thread
From: Chris Murphy @ 2016-12-09 2:58 UTC (permalink / raw)
To: Swâmi Petaramesh; +Cc: Btrfs BTRFS
On Thu, Dec 8, 2016 at 8:11 AM, Swâmi Petaramesh <swami@petaramesh.org> wrote:
> Well, the damn thing has been running for 15 days uninterrupted !
> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
> wasn't expecting it to last THAT long...).
Can you check some bigger files and see if they've become fragmented?
I'm seeing 1.4GiB files with 2-3 extents reported by filefrag, go to
over 5000 fragments during dedupe. This is not something I recall
happening some months ago.
I inadvertently replied to the wrong dedupe thread about my test and
what I'm finding; it's here:
https://www.spinics.net/lists/linux-btrfs/msg61304.html
But if you're seeing something similar, then it would explain why it's
so slow in your case.
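If you want to script that check over a lot of files, a minimal extent
counter along the lines of what filefrag prints could look like the
sketch below (it uses the FIEMAP ioctl; keep in mind that compressed
btrfs files report one extent per 128KiB, so the count overstates
physical fragmentation there):

#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct fiemap fm;
    memset(&fm, 0, sizeof(fm));
    fm.fm_start        = 0;
    fm.fm_length       = ~0ULL;            /* whole file */
    fm.fm_flags        = FIEMAP_FLAG_SYNC; /* flush dirty data first */
    fm.fm_extent_count = 0;                /* only count, don't return extent records */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) { perror("FS_IOC_FIEMAP"); return 1; }

    printf("%s: %u extents\n", argv[1], fm.fm_mapped_extents);
    close(fd);
    return 0;
}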
--
Chris Murphy
* Re: duperemove : some real world figures on BTRFS deduplication
[not found] ` <CAEtw4r2Q3pz8FQrKgij_fWTBw7p2YRB6DqYrXzoOZ-g0htiKAw@mail.gmail.com>
@ 2016-12-09 7:56 ` Peter Becker
0 siblings, 0 replies; 12+ messages in thread
From: Peter Becker @ 2016-12-09 7:56 UTC (permalink / raw)
To: Swâmi Petaramesh; +Cc: linux-btrfs
> 2016-12-08 16:11 GMT+01:00 Swâmi Petaramesh <swami@petaramesh.org>:
>
> Then it took another 48 hours just for "loading the hashes of duplicate
> extents".
>
I am currently addressing this issue with the following patches:
https://github.com/Floyddotnet/duperemove/commits/digest_trigger
Tested with a 3.9 TB directory containing 4723 objects:
the old implementation of dbfile_load_hashes took 36593 ms,
the new implementation of dbfile_load_hashes took 11 ms.
You can use this version safely, but I still have more work to do (for
example, a migration script for existing hashfiles).
* Re: duperemove : some real world figures on BTRFS deduplication
2016-12-09 2:58 ` Chris Murphy
@ 2016-12-09 13:45 ` Swâmi Petaramesh
2016-12-09 15:43 ` Chris Murphy
0 siblings, 1 reply; 12+ messages in thread
From: Swâmi Petaramesh @ 2016-12-09 13:45 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
Hi Chris, thanks for your answer,
On 12/09/2016 03:58 AM, Chris Murphy wrote:
> Can you check some bigger files and see if they've become fragmented?
> I'm seeing 1.4GiB files with 2-3 extents reported by filefrag, go to
> over 5000 fragments during dedupe. This is not something I recall
> happening some months ago.
I have checked directories containing VM hard disks, which would be good
candidates. As they're backed up using full rsyncs, I wouldn't expect
them to be heavily fragmented (OTOH the whole BTRFS filesystem is lzo
compressed, and I believe that it may affect the number of extents
reported by filefrag...?)
Anyway this is the number of fragments that I get for a bunch of VM HD
files which are in the range from a couple GB to about 20 GB.
The number of fragments reported by filefrag: 2907, 2560, 314, 10107
If compression has nothing to do with this, then this is heavy
fragmentation.
Kind regards.
ॐ
--
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E
* Re: duperemove : some real world figures on BTRFS deduplication
2016-12-08 20:07 ` Jeff Mahoney
@ 2016-12-09 14:06 ` Swâmi Petaramesh
0 siblings, 0 replies; 12+ messages in thread
From: Swâmi Petaramesh @ 2016-12-09 14:06 UTC (permalink / raw)
To: Jeff Mahoney, linux-btrfs
Hi Jeff, thanks for your reply,
On 12/08/2016 09:07 PM, Jeff Mahoney wrote:
> What version were you using?
That's v0.11.beta4, installed rather recently
> What throughput are you getting to that disk? I get that it's USB3, but
> reading 1TB doesn't take a terribly long time so 15 days is pretty
> ridiculous.
This is run from inside a VM, onto a physical USB3 HD. Copying to/from
this HD shows a speed that corresponds to what I would expect on the
same HD connected to a physical (not virtual) setup.
The only quick data that I can get is from hdparm, which says:
- Timed cached reads: 5976 MB/sec
- Timed buffered disk reads: 105 MB/sec
Kind regards.
ॐ
--
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E
* Re: duperemove : some real world figures on BTRFS deduplication
2016-12-09 13:45 ` Swâmi Petaramesh
@ 2016-12-09 15:43 ` Chris Murphy
2016-12-09 16:07 ` Holger Hoffstätte
0 siblings, 1 reply; 12+ messages in thread
From: Chris Murphy @ 2016-12-09 15:43 UTC (permalink / raw)
To: Swâmi Petaramesh; +Cc: Chris Murphy, Btrfs BTRFS
On Fri, Dec 9, 2016 at 6:45 AM, Swâmi Petaramesh <swami@petaramesh.org> wrote:
> Hi Chris, thanks for your answer,
>
> On 12/09/2016 03:58 AM, Chris Murphy wrote:
>> Can you check some bigger files and see if they've become fragmented?
>> I'm seeing 1.4GiB files with 2-3 extents reported by filefrag, go to
>> over 5000 fragments during dedupe. This is not something I recall
>> happening some months ago.
>
> I have checked directories containing VM hard disks, which would be good
> candidates. As they're backed up using full rsyncs, I wouldn't expect
> them to be heavily fragmented (OTOH the whole BTRFS filesystem is lzo
> compressed, and I believe that it may affect the number of extents
> reported by filefrag...?)
>
> Anyway this is the number of fragments that I get for a bunch of VM HD
> files which are in the range from a couple GB to about 20 GB.
>
> The number of fragments reported by filefrag: 2907, 2560, 314, 10107
>
> If compression has nothing to do with this, then this is heavy
> fragmentation.
It's probably not that fragmented. Due to compression, metadata
describes 128KiB extents even though the data is actually contiguous.
And it might be the same thing in my case also, even though no
compression is involved.
--
Chris Murphy
* Re: duperemove : some real world figures on BTRFS deduplication
2016-12-09 15:43 ` Chris Murphy
@ 2016-12-09 16:07 ` Holger Hoffstätte
0 siblings, 0 replies; 12+ messages in thread
From: Holger Hoffstätte @ 2016-12-09 16:07 UTC (permalink / raw)
To: Chris Murphy, Swâmi Petaramesh; +Cc: Btrfs BTRFS
On 12/09/16 16:43, Chris Murphy wrote:
>> If compression has nothing to do with this, then this is heavy
>> fragmentation.
>
> It's probably not that fragmented. Due to compression, metadata
> describes 128KiB extents even though the data is actually contiguous.
>
> And it might be the same thing in my case also, even though no
> compression is involved.
In that case you can quickly collapse physically contiguous ranges by
reflink-mv'ing (i.e. a recent mv) the file across subvolume boundaries
and back. :)
-h
Thread overview: 12+ messages
2016-12-08 15:11 duperemove : some real world figures on BTRFS deduplication Swâmi Petaramesh
2016-12-08 15:42 ` Austin S. Hemmelgarn
2016-12-08 18:00 ` Timofey Titovets
2016-12-08 20:07 ` Jeff Mahoney
2016-12-08 20:46 ` Austin S. Hemmelgarn
2016-12-08 20:07 ` Jeff Mahoney
2016-12-09 14:06 ` Swâmi Petaramesh
2016-12-09 2:58 ` Chris Murphy
2016-12-09 13:45 ` Swâmi Petaramesh
2016-12-09 15:43 ` Chris Murphy
2016-12-09 16:07 ` Holger Hoffstätte
[not found] ` <CAEtw4r2Q3pz8FQrKgij_fWTBw7p2YRB6DqYrXzoOZ-g0htiKAw@mail.gmail.com>
2016-12-09 7:56 ` Peter Becker