Subject: Re: bedup --defrag freezing
From: Austin S Hemmelgarn
To: Chris Murphy, Konstantin Svist, Btrfs BTRFS
Date: Thu, 13 Aug 2015 07:30:38 -0400

On 2015-08-12 15:30, Chris Murphy wrote:
> On Wed, Aug 12, 2015 at 12:44 PM, Konstantin Svist wrote:
>> On 08/06/2015 04:10 AM, Austin S Hemmelgarn wrote:
>>> On 2015-08-05 17:45, Konstantin Svist wrote:
>>>> Hi,
>>>>
>>>> I've been running btrfs on Fedora for a while now, with bedup --defrag
>>>> running in a night-time cronjob.
>>>> Last few runs seem to have gotten stuck, without possibility of even
>>>> killing the process (kill -9 doesn't work) -- all I could do is hard
>>>> power cycle.
>>>>
>>>> Did something change recently? Is bedup simply too out of date? What
>>>> should I use to de-duplicate across snapshots instead? Etc.?
>>>>
>>> AFAIK, bedup hasn't been actively developed for quite a while (I'm
>>> actually kind of surprised it runs with the newest btrfs-progs).
>>> Personally, I'd suggest using duperemove
>>> (https://github.com/markfasheh/duperemove)
>>
>> Thanks, good to know.
>> Tried duperemove -- it looks like it builds a database of its own
>> checksums every time it runs... why won't it use BTRFS internal
>> checksums for fast rejection? Would run a LOT faster...
>
> I think the reason is duperemove does extent based deduplication.
> Where Btrfs checksums are 4KiB block based, not extent based. And so
> many 4KiB CRC32C checksums would need to be in memory, that could be
> kinda expensive. And also, I don't know if CRC32C checksums have
> essentially no practical chance of collision. If it's really rare,
> rather than "so improbable as to be impossible" then you could end up
> with "really rare" corruption where incorrect deduplication happens.

Yeah, duperemove doesn't use them because of the memory limitations.
Theoretically it's possible to take the CRC checksums of the individual
blocks and combine them to get a checksum of the blocks as a whole, but
it really isn't worth it for that (it would take just about as long as
the current hashing).

As for the collision properties of CRC32C, it's actually almost trivial
to construct collisions. The reason it is used in BTRFS is that there is
a functional guarantee that any single-bit error in a block _will_
result in a different CRC, and most larger errors will too. In other
words, BTRFS uses CRC32C for error detection, and because it's
ridiculously fast on all modern processors (x86 CPUs with SSE4.2, for
example, have a dedicated instruction for it). As for the possibility of
incorrect deduplication, the kernel does a bytewise comparison of the
submitted extents before actually deduplicating them, so there's no
chance of it happening (barring hardware issues and/or external
influence from an ill-intentioned third party).
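Both CRC32C points above can be checked with a small, self-contained Python sketch. This is not BTRFS or duperemove code, and the function names are hypothetical. It assumes the standard CRC32C parameters (reflected Castagnoli polynomial 0x82F63B78, initial value and final XOR 0xFFFFFFFF) and does not model how BTRFS stores checksums on disk. It shows that per-block CRCs can be folded into a CRC over the whole range, and that, because CRC is affine over GF(2), a different message of the same length with the same CRC can be built directly.

```python
# Minimal table-driven CRC32C (Castagnoli), standard parameters assumed.
POLY = 0x82F63B78  # reflected form of the Castagnoli polynomial


def _make_table():
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()


def _advance(state, data):
    """Run the raw CRC register over data (no pre/post inversion)."""
    for b in data:
        state = TABLE[(state ^ b) & 0xFF] ^ (state >> 8)
    return state


def crc32c(data):
    """Standard CRC32C: init 0xFFFFFFFF, final XOR 0xFFFFFFFF."""
    return _advance(0xFFFFFFFF, data) ^ 0xFFFFFFFF


def crc32c_combine(crc_a, crc_b, len_b):
    """CRC of A||B from crc(A), crc(B) and len(B).

    Naive O(len_b): shifts crc_a through len_b zero bytes, so it costs
    about as much as hashing B again.
    """
    return _advance(crc_a, bytes(len_b)) ^ crc_b


def forge_collision(original, tweak):
    """Return a different message of the same length with the same CRC32C.

    The delta is the tweak followed by its own raw CRC (little-endian),
    which drives the raw register to zero; leading zero bytes keep it
    there. XORing such a delta into the original leaves the CRC unchanged.
    """
    if not any(tweak) or len(tweak) + 4 > len(original):
        raise ValueError("tweak must be non-zero and at least 4 bytes "
                         "shorter than original")
    residue = _advance(0, tweak)
    delta = tweak + residue.to_bytes(4, "little")
    delta = bytes(len(original) - len(delta)) + delta
    return bytes(x ^ y for x, y in zip(original, delta))
```

Folding four 4 KiB blocks left to right with crc32c_combine gives the same value as crc32c over the concatenated data. Because this naive combine walks len_b zero bytes, it costs roughly as much as hashing again, which matches the "not worth it" point above. zlib's crc32_combine() does the shift in logarithmic time, though only for plain CRC-32. The forged message is exactly the kind of collision the kernel's bytewise comparison protects against.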
Because of this, you could theoretically just call the dedupe ioctl on
every possible combination of extents in the FS, but that would take a
ridiculous amount of time (especially because calls involving the same
byte ranges get serialized internally by the kernel). That's why we have
programs like duperemove: the hashing has to read all the data too, but
it's still a lot faster than comparing all of it directly.

>
> There was a patch late last year I think to re-introduce sha256 hash
> as the checksum, but as far as I know it's not in btrfs-progs yet. I
> forget if that's file, extent or block based.

I'm pretty sure that patch never made it into the kernel. The original
was for the kernel, not the userspace programs, and it never got merged
because the argument for it (better protection against malicious
intent) was inherently invalid for how BTRFS uses checksums: if someone
can rewrite your data arbitrarily on disk, they can rewrite the
checksums too. I'm also pretty sure it was block based, and as such less
useful for deduplication than the CRC32C that we are currently using.
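The trade-off described above (hash everything once, then let the kernel's bytewise check confirm each match, instead of asking the kernel to compare every pair of extents) can be sketched in Python. This is a hypothetical illustration, not how duperemove is implemented. It works on in-memory buffers, uses fixed 4 KiB blocks rather than real extents, and its final bytewise comparison only stands in for the kernel's check. No dedupe ioctl is called.

```python
import hashlib
from collections import defaultdict

BLOCK = 4096  # hypothetical fixed block size for this sketch


def find_duplicate_blocks(files, block=BLOCK):
    """Return groups of (name, offset) whose block contents are identical.

    Pass 1 reads every full block once and buckets it by SHA-256 digest,
    so the cost is linear in the amount of data (a trailing partial
    block is ignored). Pass 2 compares bytes within each bucket, standing
    in for the check the kernel does before deduplicating; entries that
    differ from the bucket's first block are dropped.
    """
    buckets = defaultdict(list)
    for name, data in files.items():
        for off in range(0, len(data) - block + 1, block):
            digest = hashlib.sha256(data[off:off + block]).digest()
            buckets[digest].append((name, off))

    groups = []
    for locs in buckets.values():
        if len(locs) < 2:
            continue  # unique block, nothing to submit
        first_name, first_off = locs[0]
        ref = files[first_name][first_off:first_off + block]
        same = [(n, o) for n, o in locs
                if files[n][o:o + block] == ref]  # bytewise verify
        if len(same) > 1:
            groups.append(same)
    return groups
```

Hashing touches each block once, so the work grows linearly with the data, plus comparisons inside buckets that almost always contain real duplicates. Submitting every pair of extents to the kernel instead would mean a quadratic number of comparisons.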