* Why do full balance and deduplication reduce available free space?
@ 2017-10-02 10:02 Niccolò Belli
2017-10-02 10:16 ` Hans van Kranenburg
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Niccolò Belli @ 2017-10-02 10:02 UTC (permalink / raw)
To: linux-btrfs
Hi,
I have several subvolumes mounted with compress-force=lzo and autodefrag.
Since I use lots of snapshots (snapper keeps around 24 hourly snapshots, 7
daily snapshots and 4 weekly snapshots) I had to create a systemd timer to
perform a full balance and deduplication each night. In fact data needs to
be already deduplicated when snapshots are created, otherwise I have no
other way to deduplicate snapshots.
This is how I perform the balance: btrfs balance start --full-balance rootfs
This is how I perform deduplication (duperemove is from git master):
duperemove -drh --dedupe-options=noblock --hashfile=../rootfs.hash
<all_subvols_except_snapshots_ones>
Looking at the logs I noticed something weird: available free space
actually decreases after balance or deduplication.
This is just before the timer starts:
Overall:
Device size: 128.00GiB
Device allocated: 49.03GiB
Device unallocated: 78.97GiB
Device missing: 0.00B
Used: 43.78GiB
Free (estimated): 82.97GiB (min: 82.97GiB)
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 512.00MiB (used: 0.00B)
Data,single: Size:44.00GiB, Used:40.00GiB
/dev/sda5 44.00GiB
Metadata,single: Size:5.00GiB, Used:3.78GiB
/dev/sda5 5.00GiB
System,single: Size:32.00MiB, Used:16.00KiB
/dev/sda5 32.00MiB
Unallocated:
/dev/sda5 78.97GiB
I also manually performed a full balance just before the timer starts:
Overall:
Device size: 128.00GiB
Device allocated: 46.03GiB
Device unallocated: 81.97GiB
Device missing: 0.00B
Used: 43.78GiB
Free (estimated): 82.96GiB (min: 82.96GiB)
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 512.00MiB (used: 0.00B)
Data,single: Size:41.00GiB, Used:40.01GiB
/dev/sda5 41.00GiB
Metadata,single: Size:5.00GiB, Used:3.77GiB
/dev/sda5 5.00GiB
System,single: Size:32.00MiB, Used:16.00KiB
/dev/sda5 32.00MiB
Unallocated:
/dev/sda5 81.97GiB
As you can see even doing a full balance was enough to reduce the available
free space!
Then the timer started and it performed the deduplication:
Overall:
Device size: 128.00GiB
Device allocated: 46.03GiB
Device unallocated: 81.97GiB
Device missing: 0.00B
Used: 43.87GiB
Free (estimated): 82.94GiB (min: 82.94GiB)
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 512.00MiB (used: 176.00KiB)
Data,single: Size:41.00GiB, Used:40.03GiB
/dev/sda5 41.00GiB
Metadata,single: Size:5.00GiB, Used:3.84GiB
/dev/sda5 5.00GiB
System,single: Size:32.00MiB, Used:16.00KiB
/dev/sda5 32.00MiB
Unallocated:
/dev/sda5 81.97GiB
Once again it reduced the available free space!
Then, after the deduplication, the timer also performed a full balance:
Overall:
Device size: 128.00GiB
Device allocated: 46.03GiB
Device unallocated: 81.97GiB
Device missing: 0.00B
Used: 44.00GiB
Free (estimated): 82.93GiB (min: 82.93GiB)
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 512.00MiB (used: 0.00B)
Data,single: Size:41.00GiB, Used:40.04GiB
/dev/sda5 41.00GiB
Metadata,single: Size:5.00GiB, Used:3.97GiB
/dev/sda5 5.00GiB
System,single: Size:32.00MiB, Used:16.00KiB
/dev/sda5 32.00MiB
Unallocated:
/dev/sda5 81.97GiB
It further reduced the available free space! Balance and deduplication
actually reduced my available free space by 400MB!
400MB each night!
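One way to quantify the change per step is to pull the `Used:` and `Free (estimated):` lines out of each report and diff them. The following is a minimal Python sketch, not a tool from this thread: the helper names `parse_gib` and `deltas_mib` are hypothetical, the regular expression assumes every value is printed in GiB, and the sample strings are copied from the four reports above.

```python
import re


def parse_gib(report: str, label: str) -> float:
    """Return the GiB value of the overall line that starts with `label`."""
    # Anchoring at line start skips the "Data,single: Size:..., Used:..." lines.
    match = re.search(rf"^\s*{re.escape(label)}:\s+([\d.]+)GiB", report, re.M)
    if match is None:
        raise ValueError(f"no '{label}' line found")
    return float(match.group(1))


def deltas_mib(values):
    """Differences between consecutive GiB values, rounded to whole MiB."""
    return [round((b - a) * 1024) for a, b in zip(values, values[1:])]


# Excerpts of the four reports: before, after manual balance,
# after deduplication, after the final balance.
REPORTS = [
    "Used: 43.78GiB\nFree (estimated): 82.97GiB (min: 82.97GiB)\n",
    "Used: 43.78GiB\nFree (estimated): 82.96GiB (min: 82.96GiB)\n",
    "Used: 43.87GiB\nFree (estimated): 82.94GiB (min: 82.94GiB)\n",
    "Used: 44.00GiB\nFree (estimated): 82.93GiB (min: 82.93GiB)\n",
]

used = [parse_gib(r, "Used") for r in REPORTS]
free = [parse_gib(r, "Free (estimated)") for r in REPORTS]

print("Used deltas (MiB):", deltas_mib(used))
print("Free (estimated) deltas (MiB):", deltas_mib(free))
```

With the figures above, `Used` grows by roughly 225 MiB over the three steps, while `Free (estimated)` drops by roughly 40 MiB.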
How is it possible? Should I avoid doing balances and deduplications at
all?
Thanks,
Niccolò
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Why do full balance and deduplication reduce available free space?
2017-10-02 10:02 Why do full balance and deduplication reduce available free space? Niccolò Belli
@ 2017-10-02 10:16 ` Hans van Kranenburg
2017-10-02 10:29 ` Niccolò Belli
2017-10-02 14:15 ` Why do full balance and deduplication reduce available free space? Niccolò Belli
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Hans van Kranenburg @ 2017-10-02 10:16 UTC (permalink / raw)
To: Niccolò Belli, linux-btrfs
On 10/02/2017 12:02 PM, Niccolò Belli wrote:
> [...]
>
> Since I use lots of snapshots [...] I had to
> create a systemd timer to perform a full balance and deduplication each
> night.
Can you explain what's your reasoning behind this 'because X it needs
Y'? I don't follow.
--
Hans van Kranenburg
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Why do full balance and deduplication reduce available free space?
2017-10-02 10:16 ` Hans van Kranenburg
@ 2017-10-02 10:29 ` Niccolò Belli
2017-10-02 11:14 ` Paul Jones
0 siblings, 1 reply; 10+ messages in thread
From: Niccolò Belli @ 2017-10-02 10:29 UTC (permalink / raw)
To: Hans van Kranenburg; +Cc: linux-btrfs
On 2017-10-02 12:16, Hans van Kranenburg wrote:
> On 10/02/2017 12:02 PM, Niccolò Belli wrote:
>> [...]
>>
>> Since I use lots of snapshots [...] I had to
>> create a systemd timer to perform a full balance and deduplication
>> each
>> night.
>
> Can you explain what's your reasoning behind this 'because X it needs
> Y'? I don't follow.
Available free space is important to me, so I want snapshots to be
deduplicated as well. Since I cannot deduplicate snapshots because they
are read-only, then the data must be already deduplicated before the
snapshots are taken. I do not consider the hourly snapshots because in a
day they will be gone anyway, but daily snapshots will stay there for
much longer so I want them to be deduplicated.
Niccolò
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: Why do full balance and deduplication reduce available free space?
2017-10-02 10:29 ` Niccolò Belli
@ 2017-10-02 11:14 ` Paul Jones
2017-10-02 11:26 ` Is it really possible to dedupe read-only snapshots!? Niccolò Belli
0 siblings, 1 reply; 10+ messages in thread
From: Paul Jones @ 2017-10-02 11:14 UTC (permalink / raw)
To: Niccolò Belli; +Cc: linux-btrfs@vger.kernel.org
> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> owner@vger.kernel.org] On Behalf Of Niccolò Belli
> Sent: Monday, 2 October 2017 9:29 PM
> To: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: Why do full balance and deduplication reduce available free
> space?
>
> On 2017-10-02 12:16, Hans van Kranenburg wrote:
> > On 10/02/2017 12:02 PM, Niccolò Belli wrote:
> >> [...]
> >>
> >> Since I use lots of snapshots [...] I had to create a systemd timer
> >> to perform a full balance and deduplication each night.
> >
> > Can you explain what's your reasoning behind this 'because X it needs
> > Y'? I don't follow.
>
> Available free space is important to me, so I want snapshots to be
> deduplicated as well. Since I cannot deduplicate snapshots because they are
> read-only, then the data must be already deduplicated before the snapshots
> are taken. I do not consider the hourly snapshots because in a day they will
> be gone anyway, but daily snapshots will stay there for much longer so I want
> them to be deduplicated.
I use bees for deduplication and it will quite happily dedupe read-only snapshots. You could always change them to RW while dedupe is running then change back to RO.
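The read-write/read-only toggle mentioned above could be wrapped so that the snapshots are flipped back even if the dedupe step fails. This is only a sketch under assumptions: it relies on the `btrfs property set -ts <path> ro <true|false>` command, the helper names `ro_command` and `temporarily_writable` are made up for illustration, and the snapshot paths are placeholders.

```python
import subprocess
from contextlib import contextmanager


def ro_command(snapshot: str, read_only: bool) -> list:
    """Build the btrfs command that sets the read-only flag of a subvolume."""
    flag = "true" if read_only else "false"
    return ["btrfs", "property", "set", "-ts", snapshot, "ro", flag]


@contextmanager
def temporarily_writable(snapshots, run=subprocess.check_call):
    """Make the snapshots writable inside the `with` body, then restore them."""
    flipped = []
    try:
        for snap in snapshots:
            run(ro_command(snap, False))
            flipped.append(snap)
        yield flipped
    finally:
        # Restore only what was actually flipped, in reverse order.
        for snap in reversed(flipped):
            run(ro_command(snap, True))


# Hypothetical usage (paths are placeholders, requires root):
# with temporarily_writable(["/.snapshots/1/snapshot"]):
#     subprocess.check_call(["duperemove", "-drh", "/.snapshots/1/snapshot"])
```

Passing a different `run` callable makes it possible to log the commands instead of executing them.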
Paul.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Why do full balance and deduplication reduce available free space?
2017-10-02 10:02 Why do full balance and deduplication reduce available free space? Niccolò Belli
2017-10-02 10:16 ` Hans van Kranenburg
@ 2017-10-02 14:15 ` Niccolò Belli
2017-10-02 19:35 ` Kai Krakow
2017-10-02 20:27 ` Goffredo Baroncelli
3 siblings, 0 replies; 10+ messages in thread
From: Niccolò Belli @ 2017-10-02 14:15 UTC (permalink / raw)
To: linux-btrfs
Maybe this is because of the autodefrag mount option? I thought it
wasn't supposed to unshare lots of extents...
Niccolò
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Why do full balance and deduplication reduce available free space?
2017-10-02 10:02 Why do full balance and deduplication reduce available free space? Niccolò Belli
2017-10-02 10:16 ` Hans van Kranenburg
2017-10-02 14:15 ` Why do full balance and deduplication reduce available free space? Niccolò Belli
@ 2017-10-02 19:35 ` Kai Krakow
2017-10-02 20:19 ` Niccolò Belli
2017-10-02 20:27 ` Goffredo Baroncelli
3 siblings, 1 reply; 10+ messages in thread
From: Kai Krakow @ 2017-10-02 19:35 UTC (permalink / raw)
To: linux-btrfs
On Mon, 02 Oct 2017 12:02:16 +0200,
Niccolò Belli <darkbasic@linuxsystems.it> wrote:
> This is how I perform the balance: btrfs balance start --full-balance rootfs
> This is how I perform deduplication (duperemove is from git master):
> duperemove -drh --dedupe-options=noblock --hashfile=../rootfs.hash
> <all_subvols_except_snapshots_ones>
Besides defragging removing the reflinks, duperemove will unshare your
snapshots when used in this way: If it sees duplicate blocks within the
subvolumes you give it, it will potentially unshare blocks from the
snapshots while rewriting extents.
BTW, you should be able to use duperemove with read-only snapshots if
used in read-only-open mode. But I'd rather suggest to use bees
instead: It works at whole-volume level, walking extents instead of
files. That way it is much faster, doesn't reprocess already
deduplicated extents, and it works with read-only snapshots.
Until my patch it didn't like mixed nodatasum/datasum workloads.
Currently this is fixed by just leaving nocow data alone as users
probably set nocow for exactly the reason to not fragment extents and
relocate blocks.
--
Regards,
Kai
Replies to list-only preferred.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Why do full balance and deduplication reduce available free space?
2017-10-02 19:35 ` Kai Krakow
@ 2017-10-02 20:19 ` Niccolò Belli
2017-10-09 17:38 ` Kai Krakow
0 siblings, 1 reply; 10+ messages in thread
From: Niccolò Belli @ 2017-10-02 20:19 UTC (permalink / raw)
To: linux-btrfs
On 2017-10-02 21:35, Kai Krakow wrote:
> Besides defragging removing the reflinks, duperemove will unshare your
> snapshots when used in this way: If it sees duplicate blocks within the
> subvolumes you give it, it will potentially unshare blocks from the
> snapshots while rewriting extents.
>
> BTW, you should be able to use duperemove with read-only snapshots if
> used in read-only-open mode. But I'd rather suggest to use bees
> instead: It works at whole-volume level, walking extents instead of
> files. That way it is much faster, doesn't reprocess already
> deduplicated extents, and it works with read-only snapshots.
>
> Until my patch it didn't like mixed nodatasum/datasum workloads.
> Currently this is fixed by just leaving nocow data alone as users
> probably set nocow for exactly the reason to not fragment extents and
> relocate blocks.
From the bees README, under "Bad Btrfs Feature Interactions": btrfs
read-only snapshots (never tested, probably wouldn't work well)
Unfortunately it seems that bees doesn't support read-only snapshots, so
it's a no-go.
P.S.
I tried duperemove with -A, but besides taking much longer it didn't
improve the situation.
Are you sure that the culprit is duperemove? AFAIK it shouldn't unshare
extents...
Niccolò
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Why do full balance and deduplication reduce available free space?
2017-10-02 20:19 ` Niccolò Belli
@ 2017-10-09 17:38 ` Kai Krakow
0 siblings, 0 replies; 10+ messages in thread
From: Kai Krakow @ 2017-10-09 17:38 UTC (permalink / raw)
To: linux-btrfs
On Mon, 02 Oct 2017 22:19:32 +0200,
Niccolò Belli <darkbasic@linuxsystems.it> wrote:
> On 2017-10-02 21:35, Kai Krakow wrote:
> > Besides defragging removing the reflinks, duperemove will unshare
> > your snapshots when used in this way: If it sees duplicate blocks
> > within the subvolumes you give it, it will potentially unshare
> > blocks from the snapshots while rewriting extents.
> >
> > BTW, you should be able to use duperemove with read-only snapshots
> > if used in read-only-open mode. But I'd rather suggest to use bees
> > instead: It works at whole-volume level, walking extents instead of
> > files. That way it is much faster, doesn't reprocess already
> > deduplicated extents, and it works with read-only snapshots.
> >
> > Until my patch it didn't like mixed nodatasum/datasum workloads.
> > Currently this is fixed by just leaving nocow data alone as users
> > probably set nocow for exactly the reason to not fragment extents
> > and relocate blocks.
>
> Bad Btrfs Feature Interactions: btrfs read-only snapshots (never
> tested, probably wouldn't work well)
>
> Unfortunately it seems that bees doesn't support read-only snapshots,
> so it's a no way.
>
> P.S.
> I tried duperemove with -A, but besides taking much longer it didn't
> improve the situation.
> Are you sure that the culprit is duperemove? AFAIK it shouldn't
> unshare extents...
Whether extents get unshared depends... If an extent is shared between a
r/o and a r/w snapshot, rewriting the extent for deduplication ends up in
a shared extent again, but it is no longer reflinked with the original
r/o snapshot. At least that is what happens if btrfs doesn't allow
changing extents that are part of a r/o snapshot, which you all say is
the case.
And then, there's unsharing of metadata by the deduplication process
itself.
Both effects should be minimal, though. But since chunks are allocated in
1GB sizes, the allocation may jump by 1GB just for a few extra MB
needed. A metadata rebalance may fix this.
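The chunk-granularity effect can be illustrated with a deliberately simplified model, assuming (as stated above) that space is allocated in whole 1GB chunks and that chunks are packed perfectly, which a real filesystem does not guarantee. The function names are hypothetical.

```python
def chunks_needed(used_mib: int, chunk_mib: int = 1024) -> int:
    """Smallest number of whole chunks that can hold `used_mib` (ceiling division)."""
    return -(-used_mib // chunk_mib)


def allocated_mib(used_mib: int, chunk_mib: int = 1024) -> int:
    """Space allocated when usage is rounded up to whole chunks."""
    return chunks_needed(used_mib, chunk_mib) * chunk_mib


if __name__ == "__main__":
    # Growing usage by only 10 MiB across a chunk boundary allocates a full extra chunk.
    for used in (4090, 4100):
        print(f"{used} MiB used -> {allocated_mib(used)} MiB allocated")
```

In this model the chunk size is a parameter, so other chunk sizes can be tried by passing a different `chunk_mib`.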
--
Regards,
Kai
Replies to list-only preferred.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Why do full balance and deduplication reduce available free space?
2017-10-02 10:02 Why do full balance and deduplication reduce available free space? Niccolò Belli
` (2 preceding siblings ...)
2017-10-02 19:35 ` Kai Krakow
@ 2017-10-02 20:27 ` Goffredo Baroncelli
3 siblings, 0 replies; 10+ messages in thread
From: Goffredo Baroncelli @ 2017-10-02 20:27 UTC (permalink / raw)
To: Niccolò Belli, linux-btrfs
On 10/02/2017 12:02 PM, Niccolò Belli wrote:
> Hi,
> I have several subvolumes mounted with compress-force=lzo and autodefrag.
> Since I use lots of snapshots (snapper keeps around 24 hourly snapshots, 7 daily
> snapshots and 4 weekly snapshots) I had to create a systemd timer to perform a
> full balance and deduplication each night. In fact data needs to be
> already deduplicated when snapshots are created, otherwise I have no other way to deduplicate snapshots.
[...]
> Data,single: Size:44.00GiB, Used:40.00GiB
> /dev/sda5 44.00GiB
>
> Metadata,single: Size:5.00GiB, Used:3.78GiB
> /dev/sda5 5.00GiB
[...]
> Data,single: Size:41.00GiB, Used:40.01GiB
> /dev/sda5 41.00GiB
>
> Metadata,single: Size:5.00GiB, Used:3.77GiB
> /dev/sda5 5.00GiB
[...]
> Data,single: Size:41.00GiB, Used:40.03GiB
> /dev/sda5 41.00GiB
>
> Metadata,single: Size:5.00GiB, Used:3.84GiB
> /dev/sda5 5.00GiB
[...]
> Data,single: Size:41.00GiB, Used:40.04GiB
> /dev/sda5 41.00GiB
>
> Metadata,single: Size:5.00GiB, Used:3.97GiB
> /dev/sda5 5.00GiB
[....]
>
> It further reduced the available free space! Balance and deduplication actually reduced my available free space of 400MB!
> 400MB each night!
Your data increased by 40MB (out of 40GB, so about 0.1%), whereas your metadata increased by about 200MB (out of ~4GB, so about 5%); so
1) it seems to me that your data is already quite well deduplicated
2) (NB: this is my guess) I think that deduping (and/or re-balancing) rearranges the metadata, leading to increased disk usage. The only explanation I have found is that deduping breaks the sharing of metadata with the snapshots:
- a snapshot shares its metadata, which in turn refers to the data. Because the metadata is shared, there is only one copy. The metadata remains shared until it is changed/updated.
- dedupe, when it shares a file block, updates the metadata, breaking the sharing with the snapshot and thus creating a copy of that metadata.
NB: updating snapshot metadata is the same as updating subvolume metadata
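The growth figures can be checked with a few lines of Python. This is just a sketch of the arithmetic: the `growth` helper is hypothetical, and the inputs are the `Used` values of the first and last quoted reports (data 40.00GiB to 40.04GiB, metadata 3.78GiB to 3.97GiB).

```python
def growth(before_gib: float, after_gib: float):
    """Return (growth in whole MiB, growth in percent of the initial value)."""
    delta_gib = after_gib - before_gib
    return round(delta_gib * 1024), round(100 * delta_gib / before_gib, 1)


data = growth(40.00, 40.04)
metadata = growth(3.78, 3.97)

print(f"data:     +{data[0]} MiB ({data[1]}%)")
print(f"metadata: +{metadata[0]} MiB ({metadata[1]}%)")
```

Data grows by about 41 MiB (0.1%), while metadata grows by about 195 MiB, roughly 5% of the used metadata, so almost all of the growth is metadata.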
> How is it possible? Should I avoid doing balances and deduplications at all?
Try a few days without deduplication, and check whether something changes. It may be sufficient to dedupe less often: not each night, but each week or month.
Another option is running dedupe on all the files (including the snapshotted ones). This would still break the metadata sharing, but the extents should still be shared (IMHO :-) ). Of course the cost of deduping will increase a lot (by about 24+7+4 = 35 times)
>
> Thanks,
> Niccolò
BR
G.Baroncelli
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-10-09 17:38 UTC | newest]
Thread overview: 10+ messages
2017-10-02 10:02 Why do full balance and deduplication reduce available free space? Niccolò Belli
2017-10-02 10:16 ` Hans van Kranenburg
2017-10-02 10:29 ` Niccolò Belli
2017-10-02 11:14 ` Paul Jones
2017-10-02 11:26 ` Is it really possible to dedupe read-only snapshots!? Niccolò Belli
2017-10-02 14:15 ` Why do full balance and deduplication reduce available free space? Niccolò Belli
2017-10-02 19:35 ` Kai Krakow
2017-10-02 20:19 ` Niccolò Belli
2017-10-09 17:38 ` Kai Krakow
2017-10-02 20:27 ` Goffredo Baroncelli