* Btrfs send bloat
From: Newbugreport @ 2019-05-19 8:11 UTC
To: linux-btrfs@vger.kernel.org

I have 3-4 years' worth of snapshots that I use for backup purposes. I
keep R-O live snapshots, two local backups, and AWS Glacier Deep Freeze.
I use both send | receive and send > file. This works well, but I get
massive deltas when files are moved around in a GUI via Samba.
Reorganize a bunch of files and the next snapshot is 50 or 100 GB.
Perhaps mv, or cp with --reflink=always, would fix the problem, but it's
just not usable enough for my family.

I'd like a solution to the massive delta problem. Perhaps someone
already has a solution; that would be great. If not, I need advice on a
few ideas.

A realistic solution seems to be to deduplicate the subvolume before
each snapshot is taken, and in theory I could write a small program to
do that. However, I don't know if that would work. Will Btrfs let me
deduplicate between a file on the live subvolume and a file on the R-O
snapshot (really the same file but at a different path)? If so, will
Btrfs send with -p result in a small delta?

Failing that, I could probably make changes to the send data stream, but
that's suboptimal for the live volume and any backup volumes where data
has been received.

Also, is it possible to access the Btrfs hash values for files so I
don't have to recalculate file hashes for the whole volume myself?

Thanks in advance for any advice.
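The "small program" idea in this message would start by finding files whose contents are identical under different paths (for example, a file in a read-only snapshot and the same file after it was moved in the live subvolume). The following is only a minimal sketch of that first step in Python, not an existing tool: the function names are made up for illustration, and it hashes whole files with SHA-256 instead of reusing any Btrfs checksum.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(roots):
    """Return sorted lists of paths under `roots` that have identical contents."""
    # Cheap first pass: only files of equal size can be identical.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                by_size[os.path.getsize(path)].append(path)

    # Expensive second pass: hash only the files that share a size.
    groups = []
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return sorted(groups)
```

Calling find_duplicate_groups with the snapshot directory and the live subvolume as roots would return the candidate pairs. Grouping by size first keeps the number of files that must be read in full as small as possible.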
* Re: Btrfs send bloat
From: Andrei Borzenkov @ 2019-05-19 20:06 UTC
To: Newbugreport, linux-btrfs@vger.kernel.org

19.05.2019 11:11, Newbugreport wrote:
> I have 3-4 years worth of snapshots I use for backup purposes. I keep
> R-O live snapshots, two local backups, and AWS Glacier Deep Freeze. I
> use both send | receive and send > file. This works well but I get
> massive deltas when files are moved around in a GUI via samba.

Did you analyze whether it is a client or a server problem? If the
client does a file copy (instead of a move, as you imply), maybe the
simplest solution would be to use a different tool on the client. If the
problem is on the server side, it is something to discuss with the Samba
folks.

> Reorganize a bunch of files and the next snapshot is 50 or 100 GB.
> Perhaps mv or cp with reflink=always would fix the problem but it's
> just not usable enough for my family.
>
> I'd like a solution to the massive delta problem. Perhaps someone
> already has a solution, that would be great. If not, I need advice on
> a few ideas.
>
> It seems a realistic solution to deduplicate the subvolume before
> each snapshot is taken, and in theory I could write a small program
> to do that.

You mean that none of the half a dozen existing tools that perform
deduplication on btrfs fits your requirements?

> However I don't know if that would work. Will Btrfs will
> let me deduplicate between a file on the live subvolume and a file on
> the R-O snapshot (really the same file but different path). If so,

Btrfs does not care, because it does not perform any deduplication
itself. All tools compute identical file ranges and then invoke a kernel
ioctl to replace the reference to a range in the destination file with a
reference to the identical range in the source file. So nothing prevents
using read-only data as the source for deduplication of read-write data.
Whether each of the existing tools supports it (or makes it easy to do),
I do not know.

> will Btrfs send with -p result in a small delta?
>

Well, if all data is replaced by references to existing extents in some
snapshot, then the delta to this snapshot will be small.

> Failing that I could probably make changes to the send data stream,
> but that's suboptimal for the live volume and any backup volumes
> where data has been received.
>
> Also, is it possible to access the Btrfs hash values for files so I
> don't have to recalculate file hashes for the whole volume myself?
>

Currently btrfs does not compute hashes suitable for deduplication. It
only stores CRC32C checksums. You can access the checksum tree, and at
least one tool makes use of it to speed up scanning, but it then
computes a second hash to avoid false positives. Recently a patch series
was posted to add support for different hashes (I believe SHA256 at
least); these would be more useful for deduplication when merged.
* Re: Btrfs send bloat
From: David Disseldorp @ 2019-05-20 9:20 UTC
To: Andrei Borzenkov
Cc: Newbugreport, linux-btrfs@vger.kernel.org

On Sun, 19 May 2019 23:06:25 +0300, Andrei Borzenkov wrote:

> 19.05.2019 11:11, Newbugreport wrote:
> > I have 3-4 years worth of snapshots I use for backup purposes. I keep
> > R-O live snapshots, two local backups, and AWS Glacier Deep Freeze. I
> > use both send | receive and send > file. This works well but I get
> > massive deltas when files are moved around in a GUI via samba.
>
> Did you analyze whether it is client or server problem? If client does
> file copy (instead of move as you imply) may be the simplest solution
> would be to use different tool on client. If problem is on server side,
> it is something to discuss with SAMBA folks.

Samba supports copy offload via FSCTL_SRV_COPYCHUNK and
FSCTL_DUPLICATE_EXTENTS_TO_FILE, which can be translated to
BTRFS_IOC_CLONE_RANGE via the btrfs Samba VFS module. Windows Explorer
and Linux (cifs.ko) are capable of using these fsctls during copy. See
https://wiki.samba.org/index.php/Server-Side_Copy for details.

Cheers, David
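For comparison, the same kind of clone can be requested locally on the server. The sketch below is not Samba's code: it assumes the generic whole-file FICLONE ioctl (the simpler sibling of the range clone named above, with the number as encoded on x86 and ARM), and clone_or_copy is a hypothetical helper that falls back to an ordinary copy when the filesystem refuses, roughly what cp --reflink=auto does.

```python
import fcntl
import shutil

FICLONE = 0x40049409  # _IOW(0x94, 9, int); assumes the x86/ARM ioctl encoding


def clone_or_copy(src_path, dst_path):
    """Return "reflink" if extents were shared, "copy" if data was duplicated."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # Ask the filesystem to make dst reference src's extents.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return "reflink"
        except OSError:
            # Not supported here (or different filesystems): copy the bytes.
            shutil.copyfileobj(src, dst)
            return "copy"
```

A reflinked copy adds almost nothing to the next incremental send, while a byte copy shows up as new data.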
* Re: Btrfs send bloat
From: Patrik Lundquist @ 2019-05-20 10:34 UTC
To: Newbugreport
Cc: linux-btrfs@vger.kernel.org, Andrei Borzenkov

On Mon, 20 May 2019 at 02:36, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>
> 19.05.2019 11:11, Newbugreport wrote:
> > I have 3-4 years worth of snapshots I use for backup purposes. I keep
> > R-O live snapshots, two local backups, and AWS Glacier Deep Freeze. I
> > use both send | receive and send > file. This works well but I get
> > massive deltas when files are moved around in a GUI via samba.
>
> Did you analyze whether it is client or server problem? If client does
> file copy (instead of move as you imply) may be the simplest solution
> would be to use different tool on client. If problem is on server side,
> it is something to discuss with SAMBA folks.

Also try the Btrfs module in Samba.
https://wiki.samba.org/index.php/Server-Side_Copy#Btrfs_Enhanced_Server-Side_Copy_Offload
* Re: Btrfs send bloat
From: Newbugreport @ 2019-05-20 11:15 UTC
To: Patrik Lundquist
Cc: linux-btrfs@vger.kernel.org, Andrei Borzenkov

Patrik, thank you. I've enabled the Samba module, which may help in the
future. Does the GUI file manager (e.g. Nautilus) need special support?

Andrea, thank you for the link. bup is impressive, but does it work well
with btrfs snapshots? My live drive contains the main volume alongside
many snapshots and the associated bloat from moved/deleted files.
There's not room for another copy of everything, even if it's
deduplicated. Perhaps I could switch one of the backup drives and the
cloud to bup, but how well would bup work diffing all those snapshots
when the backup drive is plugged in?
* Re: Btrfs send bloat
From: Austin S. Hemmelgarn @ 2019-05-20 11:58 UTC
To: Newbugreport, Patrik Lundquist
Cc: linux-btrfs@vger.kernel.org, Andrei Borzenkov

On 2019-05-20 07:15, Newbugreport wrote:
> Patrik, thank you. I've enabled the SAMBA module, which may help in the future. Does the GUI file manager (i.e. Nautilus) need special support?

It shouldn't (Windows' default file manager doesn't, and most stuff on
Linux uses Samba so it shouldn't either; I'm not sure about macOS
though). Keep in mind, however, that server-side copies in SMB only work
within a single share. Moving files between two independent shares, even
if they're on the same server (or even the same filesystem on the same
server), will always translate to a copy+delete, because the client
system has no other way to tell the server to move the file across
shares.

> Andrea, thank you for the link. bup is impressive but does it work well with btrfs snapshots? My live drive contains the main volume alongside many snapshots and the associated bloat from moved/deleted files. There's not room for another copy of everything, even if it's deduplicated. Perhaps I could switch one of the backup drives and the cloud to bup, but how well would bup work diffing all those snapshots when the backup drive is plugged in?

Deduplication will almost never increase the total amount of data, and
it absolutely won't need a second copy of everything. The initial pass
will probably be very slow though, as the ioctl that gets used does a
bytewise comparison of the ranges that get passed in to make sure they
are actually identical before it merges them. Once the data is mostly
deduplicated, this shouldn't be an issue for most tools, as they will see
the existing deduplicated ranges and not try to re-merge them.
* Re: Btrfs send bloat
From: Patrik Lundquist @ 2019-05-20 12:14 UTC
To: Austin S. Hemmelgarn
Cc: Newbugreport, linux-btrfs@vger.kernel.org, Andrei Borzenkov, ddiss

On Mon, 20 May 2019 at 13:58, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
>
> On 2019-05-20 07:15, Newbugreport wrote:
> > Patrik, thank you. I've enabled the SAMBA module, which may help in the future. Does the GUI file manager (i.e. Nautilus) need special support?
> It shouldn't (Windows' default file manager doesn't, and most stuff on
> Linux uses Samba so it shouldn't either, not sure about macOS though).

The client side needs support for FSCTL_SRV_COPYCHUNK. Nautilus uses
gvfsd-smb which in turn uses the Samba libs, but I have no idea if it
works. Maybe David Disseldorp knows?

Try copying a large file and compare used space.
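The suggested check (copy a large file, then compare used space) can be scripted on the server. This is a rough sketch under stated assumptions: the helper names and the 10% slack are arbitrary choices, and the "used" figure from statvfs is only approximate on Btrfs, so a tool such as btrfs filesystem usage may give a clearer picture.

```python
import os
import shutil


def used_bytes(path):
    """Flush pending writes, then report used space on path's filesystem."""
    os.sync()  # delayed allocation can hide a copy until data is written out
    return shutil.disk_usage(path).used


def looks_reflinked(used_before, used_after, file_size, slack=0.1):
    """True if used space grew by clearly less than the copied file's size."""
    growth = used_after - used_before
    return growth < file_size * slack
```

Record used_bytes for the share's mount point before the copy in the file manager and again afterwards, then pass both values and the file size to looks_reflinked. Other writes to the filesystem during the test will skew the result, so a large test file on an otherwise idle share works best.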
* Re: Btrfs remote reflink with Samba
From: David Disseldorp @ 2019-05-20 12:40 UTC
To: Patrik Lundquist
Cc: Austin S. Hemmelgarn, Newbugreport, linux-btrfs@vger.kernel.org, Andrei Borzenkov, Samba Technical

On Mon, 20 May 2019 14:14:48 +0200, Patrik Lundquist wrote:

> On Mon, 20 May 2019 at 13:58, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
> >
> > On 2019-05-20 07:15, Newbugreport wrote:
> > > Patrik, thank you. I've enabled the SAMBA module, which may help in the future. Does the GUI file manager (i.e. Nautilus) need special support?
> > It shouldn't (Windows' default file manager doesn't, and most stuff on
> > Linux uses Samba so it shouldn't either, not sure about macOS though).
>
> The client side needs support for FSCTL_SRV_COPYCHUNK. Nautilus uses
> gvfsd-smb which in turn uses the Samba libs, but I have no idea if it
> works. Maybe David Disseldorp knows?

libsmbclient copychunk functionality was added via:
https://git.samba.org/?p=samba.git;a=commit;h=f73bcf4934be
IIRC, it was added with the intention of being used by Nautilus.
That said, I've not tried it myself, and I don't see any reference to
splice in:
https://gitlab.gnome.org/GNOME/gvfs/blob/master/daemon/gvfsbackendsmb.c
(Perhaps I'm looking in the wrong place?).

Cheers, David
* Re: Btrfs remote reflink with Samba
From: Patrik Lundquist @ 2019-05-20 20:33 UTC
To: David Disseldorp
Cc: Newbugreport, linux-btrfs@vger.kernel.org, Samba Technical

On Mon, 20 May 2019 at 14:40, David Disseldorp <ddiss@samba.org> wrote:
>
> On Mon, 20 May 2019 14:14:48 +0200, Patrik Lundquist wrote:
>
> > On Mon, 20 May 2019 at 13:58, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
> > >
> > > On 2019-05-20 07:15, Newbugreport wrote:
> > > > Patrik, thank you. I've enabled the SAMBA module, which may help in the future. Does the GUI file manager (i.e. Nautilus) need special support?
> > > It shouldn't (Windows' default file manager doesn't, and most stuff on
> > > Linux uses Samba so it shouldn't either, not sure about macOS though).
> >
> > The client side needs support for FSCTL_SRV_COPYCHUNK. Nautilus uses
> > gvfsd-smb which in turn uses the Samba libs, but I have no idea if it
> > works. Maybe David Disseldorp knows?
>
> libsmbclient copychunk functionality was added via:
> https://git.samba.org/?p=samba.git;a=commit;h=f73bcf4934be
> IIRC, it was added with the intention of being used by Nautilus.
> That said, I've not tried it myself, and I don't see any reference to
> splice in:
> https://gitlab.gnome.org/GNOME/gvfs/blob/master/daemon/gvfsbackendsmb.c
> (Perhaps I'm looking in the wrong place?).

https://gitlab.gnome.org/GNOME/gvfs/issues/286 is unfortunately
blocked by https://bugzilla.samba.org/show_bug.cgi?id=11413

I don't know if Nautilus tries reflink copying on a cifs mounted Samba
share but Mr. Newbugreport can at least move around (ctrl-x, ctrl-v)
files in Nautilus within the same share without making new copies.
* Re: Btrfs remote reflink with Samba
From: Chris Murphy @ 2019-05-20 22:50 UTC
To: linux-btrfs@vger.kernel.org
Cc: Samba Technical

On Mon, May 20, 2019 at 2:35 PM Patrik Lundquist <patrik.lundquist@gmail.com> wrote:
>
> On Mon, 20 May 2019 at 14:40, David Disseldorp <ddiss@samba.org> wrote:
> >
> > On Mon, 20 May 2019 14:14:48 +0200, Patrik Lundquist wrote:
> >
> > > On Mon, 20 May 2019 at 13:58, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
> > > >
> > > > On 2019-05-20 07:15, Newbugreport wrote:
> > > > > Patrik, thank you. I've enabled the SAMBA module, which may help in the future. Does the GUI file manager (i.e. Nautilus) need special support?
> > > > It shouldn't (Windows' default file manager doesn't, and most stuff on
> > > > Linux uses Samba so it shouldn't either, not sure about macOS though).
> > >
> > > The client side needs support for FSCTL_SRV_COPYCHUNK. Nautilus uses
> > > gvfsd-smb which in turn uses the Samba libs, but I have no idea if it
> > > works. Maybe David Disseldorp knows?
> >
> > libsmbclient copychunk functionality was added via:
> > https://git.samba.org/?p=samba.git;a=commit;h=f73bcf4934be
> > IIRC, it was added with the intention of being used by Nautilus.
> > That said, I've not tried it myself, and I don't see any reference to
> > splice in:
> > https://gitlab.gnome.org/GNOME/gvfs/blob/master/daemon/gvfsbackendsmb.c
> > (Perhaps I'm looking in the wrong place?).
>
> https://gitlab.gnome.org/GNOME/gvfs/issues/286 is unfortunately
> blocked by https://bugzilla.samba.org/show_bug.cgi?id=11413
>
> I don't know if Nautilus tries reflink copying on a cifs mounted Samba
> share but Mr. Newbugreport can at least move around (ctrl-x, ctrl-v)
> files in Nautilus within the same share without making new copies.

I just did ctrl-c, ctrl-v for a file in one dir to another dir, and it
takes forever. It's clearly being copied over the network to my local
machine and then pushed back to the server. Three minutes to copy a
2GiB file.

Server side:
kernel 5.1.0-1.fc31.x86_64
samba-4.9.5-0.fc29.x86_64
smb.conf contains 'vfs objects = btrfs' for this share

Client side:
samba-client-4.10.2-1.1.fc30.x86_64
gvfs-smb-1.40.1-2.fc30.x86_64
nautilus-3.32.1-1.fc30.x86_64

--
Chris Murphy