From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mailgw-01.dd24.net ([193.46.215.41]:45320 "EHLO mailgw-01.dd24.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750920AbbLIFnF (ORCPT ); Wed, 9 Dec 2015 00:43:05 -0500 Message-ID: <1449639781.7835.3.camel@scientia.net> Subject: Re: [auto-]defrag, nodatacow - general suggestions?(was: btrfs: poor performance on deleting many large files?) From: Christoph Anton Mitterer To: Hugo Mills Cc: Austin S Hemmelgarn , linux-btrfs@vger.kernel.org, Duncan <1i5t5.duncan@cox.net> Date: Wed, 09 Dec 2015 06:43:01 +0100 In-Reply-To: <20151126003342.GA24333@carfax.org.uk> References: <56530DAD.9080607@gmail.com> <1448497439.28195.33.camel@scientia.net> <20151126003342.GA24333@carfax.org.uk> Content-Type: multipart/signed; micalg="sha-512"; protocol="application/x-pkcs7-signature"; boundary="=-2rGOBY9GIH1rMOAHOyMh" Mime-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org List-ID: --=-2rGOBY9GIH1rMOAHOyMh Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hey Hugo, On Thu, 2015-11-26 at 00:33 +0000, Hugo Mills wrote: > =C2=A0 =C2=A0Answering the second part first, no, it can't. Thanks so far :) > =C2=A0=C2=A0=C2=A0The issue is that nodatacow bypasses the transactional = nature of > the FS, making changes to live data immediately. This then means that > if you modify a modatacow file, the csum for that modified section is > out of date, and won't be back in sync again until the latest > transaction is committed. So you can end up with an inconsistent > filesystem if there's a crash between the two events. Sure,... (and btw: is there some kind of journal planned for nodatacow'ed files?),... but why not simply trying to write an updated checksum after the modified section has been flushed to disk... of course there's no guarantee that both are consistent in case of crash ( but that's also the case without any checksum)... but at least one would have the csum protection against everything else (blockerrors and that like) in case no crash occurs? > > For me the checksumming is actually the most important part of > > btrfs > > (not that I wouldn't like its other features as well)... so turning > > it > > off is something I really would want to avoid. > >=20 > > Plus it opens questions like: When there are no checksums, how can > > it > > (in the RAID cases) decide which block is the good one in case of > > corruptions? > =C2=A0=C2=A0=C2=A0It doesn't decide -- both copies look equally good, bec= ause > there's > no checksum, so if you read the data, the FS will return whatever > data > was on the copy it happened to pick. Hmm I see... so one gets basically the behaviour of RAID. Isn't that kind of a big loss? I always considered the guarantee against block errors and that like one of the big and basic features of btrfs. It seems that for certain (not too unimportant cases: DBs, VMs) one has to decide between either evil, loosing the guaranteed consistency via checksums... or basically running into severe troubles (like Mitch's reported fragmentation issues). > > 3) When I would actually disable datacow for e.g. a subvolume that > > holds VMs or DBs... what are all the implications? > > Obviously no checksumming, but what happens if I snapshot such a > > subvolume or if I send/receive it? >=20 > =C2=A0=C2=A0=C2=A0After snapshotting, modifications are CoWed precisely o= nce, and > then it reverts to nodatacow again. This means that making a snapshot > of a nodatacow object will cause it to fragment as writes are made to > it. I see... something that should possibly go to some advanced admin documentation (if not already in). It means basically, that one must assure that any such files (VM images, DB data dirs) are already created with nodatacow (perhaps on a subvolume which is mounted as such. > > 4) Duncan mentioned that defrag (and I guess that's also for auto- > > defrag) isn't ref-link aware... > > Isn't that somehow a complete showstopper? > =C2=A0=C2=A0=C2=A0It is, but the one attempt at dealing with it caused ma= ssive data > corruption, and it was turned off again. So... does this mean that it's still planned to be implemented some day or has it been given up forever? And is it (hopefully) also planned to be implemented for reflinks when compression is added/changed/removed? Given that you (or Duncan?,... sorry I sometimes mix up which of said exactly what, since both of you are notoriously helpful :-) ) mentioned that autodefrag basically fails with larger files,... and given that it seems to be quite important for btrfs to not be fragmented too heavily, it sounds a bit as if anything that uses (multiple) reflinks (e.g. snapshots) cannot be really used very well. > autodefrag, however, has > always been snapshot aware and snapshot safe, and would be the > recommended approach here. Ahhh... so autodefag *is* snapshot aware, and that's basically why the suggestion is (AFAIU) that it's turned on, right? So, I'm afraid O:-), that triggers a follow-up question: Why isn't it the default? Or in other words what are its drawbacks (e.g. other cases where ref-links would be broken up,... or issues with compression)? And also, when I now activate it on an already populated fs, will it defrag also any old files (even if they're not rewritten or so)? I tried to have a look for some general (rather "for dummies" than for core developers) description of how defrag and autodefrag work... but couldn't find anything in the usual places... :-( btw: The wiki (https://btrfs.wiki.kernel.org/index.php/UseCases#How_do_ I_defragment_many_files.3F) doesn't mention that auto-defrag doesn't suffer from that problem. > (Actually, it was broken in the same > incident I just described -- but fixed again when the broken patches > were reverted). So it just couldn't be fixed (hopfully: yet) for the (manual) online defragmentation?! > > 5) Especially keeping (4) in mind but also the other comments in > > from > > Duncan and Austin... > > Is auto-defrag now recommended to be generally used? > > =C2=A0=C2=A0=C2=A0Absolutely, yes. I see... well, I'll probably wait for some answers about its drawbacks and then give it a try. > =C2=A0=C2=A0=C2=A0It's late for me, and this email was longer than I susp= ected, so > I'm going to stop here, but I'll try to pick it up again and answer > your other questions tomorrow. Thanks so far :) I know I haven't replied to that thread for some days, but if you have anything to add to the remaining questions, I'd be still happy to read it :) Thanks and best wishes, Chris. --=-2rGOBY9GIH1rMOAHOyMh Content-Type: application/x-pkcs7-signature; name="smime.p7s" Content-Disposition: attachment; filename="smime.p7s" Content-Transfer-Encoding: base64 MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCCEZIw ggW/MIIDp6ADAgECAgMCOakwDQYJKoZIhvcNAQENBQAwVDEUMBIGA1UEChMLQ0FjZXJ0IEluYy4x HjAcBgNVBAsTFWh0dHA6Ly93d3cuQ0FjZXJ0Lm9yZzEcMBoGA1UEAxMTQ0FjZXJ0IENsYXNzIDMg Um9vdDAeFw0xNDA2MTIxNjM2MThaFw0xNjA2MTExNjM2MThaMHwxITAfBgNVBAMTGENocmlzdG9w aCBBbnRvbiBNaXR0ZXJlcjEkMCIGCSqGSIb3DQEJARYVY2FsZXN0eW9Ac2NpZW50aWEubmV0MTEw LwYJKoZIhvcNAQkBFiJtYWlsQGNocmlzdG9waC5hbnRvbi5taXR0ZXJlci5uYW1lMIIBIjANBgkq hkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA4phP/j9vT9dZT+k3ffHxvRWMOuzBnu5O3Fl4y2+WL7pL rfLiEhWzGXhHvjSqpt4vCNSdqy43453nnu8+hMb+uEtqSIL1AHU5eLhuDNVN9S4bt9E7nA2WKYBU LCUi/xCD/GL7ToyJNwhrhzcCZ7pXSc3xVqFoC4f6weU9ExhoEZQNRpTM0BFCOi4fRxvKFNnUYgjK hqy0Ta5H0Xx86mAp0Q4dxoD7mhI5iTF6TRkUheELxF24JCuAf04M89Cwft6DRH1FpJ3yvgW2B5U5 aFSL4ZnF4N/wyCB7Dkm1rQ7RCAvw5btkf0VdPnU7ccDCx8HEc2nxK/lbCjrznvh3sa1CCwIDAQAB o4IBcDCCAWwwDAYDVR0TAQH/BAIwADBWBglghkgBhvhCAQ0ESRZHVG8gZ2V0IHlvdXIgb3duIGNl cnRpZmljYXRlIGZvciBGUkVFIGhlYWQgb3ZlciB0byBodHRwOi8vd3d3LkNBY2VydC5vcmcwDgYD VR0PAQH/BAQDAgOoMEAGA1UdJQQ5MDcGCCsGAQUFBwMEBggrBgEFBQcDAgYKKwYBBAGCNwoDBAYK KwYBBAGCNwoDAwYJYIZIAYb4QgQBMDIGCCsGAQUFBwEBBCYwJDAiBggrBgEFBQcwAYYWaHR0cDov L29jc3AuY2FjZXJ0Lm9yZzA4BgNVHR8EMTAvMC2gK6AphidodHRwOi8vY3JsLmNhY2VydC5vcmcv Y2xhc3MzLXJldm9rZS5jcmwwRAYDVR0RBD0wO4EVY2FsZXN0eW9Ac2NpZW50aWEubmV0gSJtYWls QGNocmlzdG9waC5hbnRvbi5taXR0ZXJlci5uYW1lMA0GCSqGSIb3DQEBDQUAA4ICAQBefctiLgGl e5baspuozyA4k7Up7SVhGHbif6pQfoFc/9Thx9GXnYpX+U64PMyWBfWwHZIy52Vg0RVkvPi1t6mi GyBfoSpC6ooR0bKWtUIogw/ymqKWlTLVR8kbLqRmRk4juMtCXG2K3yMygX/rjkuUSuFj2Bjpkmzg CtMojbUMYbszePmhQ7DJ62YEdtKpcjN94QAsI5GWlIAbs3KJazAcaNCRJeXCLcUMchyKHJA+NXH5 az/ekBxBMBzJP2An20PP88UI4JW18z31KiG9UVGa2uO4l4aWgVe2GnhNEdCD/o48msJEWKAt5vl2 yMqr7ihmNPocU2+/FW0xPe/vftdOTD9pgXdSGf4prdD+23q2YvpalOCzr2p8yCJZNVBPMxAP4mL0 3OEktXza4wohqAmceXKfGUNwRGBaPvtIGnPrpLhCQ+2YJDg8g1UEsk23bKyZlJWeKJyVqOBsDJmj aBsN/qKhQFnav+zQdqGhMeaSisF/53mD3gyVYg2JRl18apgGbg32kyLmomqa0JbhnY3Dc3FVtZfe +P+s2Cyep3pVKvFer2llRoGm8TwraG5Yhyx8Oq/1qETpstjbURJOVBLDCV4AjOEUj0ZnE/tEo/DK yexgGaViNvjp+IZdFdJhYmsVjw4Q3vG7O0pfsLiYEyQjeDgjNEWDfa5/MufPywIfxzCCBb8wggOn oAMCAQICAwI5qTANBgkqhkiG9w0BAQ0FADBUMRQwEgYDVQQKEwtDQWNlcnQgSW5jLjEeMBwGA1UE CxMVaHR0cDovL3d3dy5DQWNlcnQub3JnMRwwGgYDVQQDExNDQWNlcnQgQ2xhc3MgMyBSb290MB4X DTE0MDYxMjE2MzYxOFoXDTE2MDYxMTE2MzYxOFowfDEhMB8GA1UEAxMYQ2hyaXN0b3BoIEFudG9u IE1pdHRlcmVyMSQwIgYJKoZIhvcNAQkBFhVjYWxlc3R5b0BzY2llbnRpYS5uZXQxMTAvBgkqhkiG 9w0BCQEWIm1haWxAY2hyaXN0b3BoLmFudG9uLm1pdHRlcmVyLm5hbWUwggEiMA0GCSqGSIb3DQEB AQUAA4IBDwAwggEKAoIBAQDimE/+P29P11lP6Td98fG9FYw67MGe7k7cWXjLb5Yvukut8uISFbMZ eEe+NKqm3i8I1J2rLjfjneee7z6Exv64S2pIgvUAdTl4uG4M1U31Lhu30TucDZYpgFQsJSL/EIP8 YvtOjIk3CGuHNwJnuldJzfFWoWgLh/rB5T0TGGgRlA1GlMzQEUI6Lh9HG8oU2dRiCMqGrLRNrkfR fHzqYCnRDh3GgPuaEjmJMXpNGRSF4QvEXbgkK4B/Tgzz0LB+3oNEfUWknfK+BbYHlTloVIvhmcXg 3/DIIHsOSbWtDtEIC/Dlu2R/RV0+dTtxwMLHwcRzafEr+VsKOvOe+HexrUILAgMBAAGjggFwMIIB bDAMBgNVHRMBAf8EAjAAMFYGCWCGSAGG+EIBDQRJFkdUbyBnZXQgeW91ciBvd24gY2VydGlmaWNh dGUgZm9yIEZSRUUgaGVhZCBvdmVyIHRvIGh0dHA6Ly93d3cuQ0FjZXJ0Lm9yZzAOBgNVHQ8BAf8E BAMCA6gwQAYDVR0lBDkwNwYIKwYBBQUHAwQGCCsGAQUFBwMCBgorBgEEAYI3CgMEBgorBgEEAYI3 CgMDBglghkgBhvhCBAEwMgYIKwYBBQUHAQEEJjAkMCIGCCsGAQUFBzABhhZodHRwOi8vb2NzcC5j YWNlcnQub3JnMDgGA1UdHwQxMC8wLaAroCmGJ2h0dHA6Ly9jcmwuY2FjZXJ0Lm9yZy9jbGFzczMt cmV2b2tlLmNybDBEBgNVHREEPTA7gRVjYWxlc3R5b0BzY2llbnRpYS5uZXSBIm1haWxAY2hyaXN0 b3BoLmFudG9uLm1pdHRlcmVyLm5hbWUwDQYJKoZIhvcNAQENBQADggIBAF59y2IuAaV7ltqym6jP IDiTtSntJWEYduJ/qlB+gVz/1OHH0Zedilf5Trg8zJYF9bAdkjLnZWDRFWS8+LW3qaIbIF+hKkLq ihHRspa1QiiDD/KaopaVMtVHyRsupGZGTiO4y0JcbYrfIzKBf+uOS5RK4WPYGOmSbOAK0yiNtQxh uzN4+aFDsMnrZgR20qlyM33hACwjkZaUgBuzcolrMBxo0JEl5cItxQxyHIockD41cflrP96QHEEw HMk/YCfbQ8/zxQjglbXzPfUqIb1RUZra47iXhpaBV7YaeE0R0IP+jjyawkRYoC3m+XbIyqvuKGY0 +hxTb78VbTE97+9+105MP2mBd1IZ/imt0P7berZi+lqU4LOvanzIIlk1UE8zEA/iYvTc4SS1fNrj CiGoCZx5cp8ZQ3BEYFo++0gac+ukuEJD7ZgkODyDVQSyTbdsrJmUlZ4onJWo4GwMmaNoGw3+oqFA Wdq/7NB2oaEx5pKKwX/neYPeDJViDYlGXXxqmAZuDfaTIuaiaprQluGdjcNzcVW1l974/6zYLJ6n elUq8V6vaWVGgabxPCtobliHLHw6r/WoROmy2NtREk5UEsMJXgCM4RSPRmcT+0Sj8MrJ7GAZpWI2 +On4hl0V0mFiaxWPDhDe8bs7Sl+wuJgTJCN4OCM0RYN9rn8y58/LAh/HMIIGCDCCA/CgAwIBAgIB ATANBgkqhkiG9w0BAQQFADB5MRAwDgYDVQQKEwdSb290IENBMR4wHAYDVQQLExVodHRwOi8vd3d3 LmNhY2VydC5vcmcxIjAgBgNVBAMTGUNBIENlcnQgU2lnbmluZyBBdXRob3JpdHkxITAfBgkqhkiG 9w0BCQEWEnN1cHBvcnRAY2FjZXJ0Lm9yZzAeFw0wNTEwMTQwNzM2NTVaFw0zMzAzMjgwNzM2NTVa MFQxFDASBgNVBAoTC0NBY2VydCBJbmMuMR4wHAYDVQQLExVodHRwOi8vd3d3LkNBY2VydC5vcmcx HDAaBgNVBAMTE0NBY2VydCBDbGFzcyAzIFJvb3QwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIK AoICAQCrSTURSHzSJn5TlM9Dqd0o10Iqi/OHeBlYfA+e2ol94fvrcpANdKGWZKufoCSZc9riVXbH F3v1BKxGuMO+f2SNEGwk82GcwPKQ+lHm9WkBY8MPVuJKQs/iRIwlKKjFeQl9RrmK8+nzNCkIReQc n8uUBByBqBSzmGXEQ+xOgo0J0b2qW42S0OzekMV/CsLj6+YxWl50PpczWejDAz1gM7/30W9HxM3u YoNSbi4ImqTZFRiRpoWSR7CuSOtttyHshRpocjWr//AQXcD0lKdq1TuSfkyQBX6TwSyLpI5idBVx bgtxA+qvFTia1NIFcm+M+SvrWnIl+TlG43IbPgTDZCciECqKT1inA62+tC4T7V2qSNfVfdQqe1z6 RgRQ5MwOQluM7dvyz/yWk+DbETZUYjQ4jwxgmzuXVjit89Jbi6Bb6k6WuHzX1aCGcEDTkSm3ojyt 9Yy7zxqSiuQ0e8DYbF/pCsLDpyCaWt8sXVJcukfVm+8kKHA4IC/VfynAskEDaJLM4JzMl0tF7zoQ CqtwOpiVcK01seqFK6QcgCExqa5geoAmSAC4AcCTY1UikTxW56/bOiXzjzFU6iaLgVn5odFTEcV7 nQP2dBHgbbEsPyyGkZlxmqZ3izRg0RS0LKydr4wQ05/EavhvE/xzWfdmQnQeiuP43NJvmJzLR5iV QAX76QIDAQABo4G/MIG8MA8GA1UdEwEB/wQFMAMBAf8wXQYIKwYBBQUHAQEEUTBPMCMGCCsGAQUF BzABhhdodHRwOi8vb2NzcC5DQWNlcnQub3JnLzAoBggrBgEFBQcwAoYcaHR0cDovL3d3dy5DQWNl cnQub3JnL2NhLmNydDBKBgNVHSAEQzBBMD8GCCsGAQQBgZBKMDMwMQYIKwYBBQUHAgEWJWh0dHA6 Ly93d3cuQ0FjZXJ0Lm9yZy9pbmRleC5waHA/aWQ9MTAwDQYJKoZIhvcNAQEEBQADggIBAH8IiKHa GlBJ2on7oQhy84r3HsQ6tHlbIDCxRd7CXdNlafHCXVRUPIVfuXtCkcKZ/RtRm6tGpaEQU55tiKxz biwzpvD0nuB1wT6IRanhZkP+VlrRekF490DaSjrxC1uluxYG5sLnk7mFTZdPsR44Q4Dvmw2M77in YACHV30eRBzLI++bPJmdr7UpHEV5FpZNJ23xHGzDwlVks7wU4vOkHx4y/CcVBc/dLq4+gmF78CEQ GPZE6lM5+dzQmiDgxrvgu1pPxJnIB721vaLbLmINQjRBvP+LivVRIqqIMADisNS8vmW61QNXeZvo 3MhN+FDtkaVSKKKs+zZYPumUK5FQhxvWXtaMzPcPEAxSTtAWYeXlCmy/F8dyRlecmPVsYGN6b165 Ti/Iubm7aoW8mA3t+T6XhDSUrgCvoeXnkm5OvfPi2RSLXNLrAWygF6UtEOucekq9ve7O/e0iQKtw OIj1CodqwqsFYMlIBdpTwd5Ed2qz8zw87YC8pjhKKSRf/lk7myV6VmMAZLldpGJ9VzZPrYPvH5JT oI53V93lYRE9IwCQTDz6o2CTBKOvNfYOao9PSmCnhQVsRqGP9Md246FZV/dxssRuFFxtbUFm3xuT sdQAw+7Lzzw9IYCpX2Nl/N3gX6T0K/CFcUHUZyX7GrGXrtaZghNB0m6lG5kngOcLqagAMYIC7TCC AukCAQEwWzBUMRQwEgYDVQQKEwtDQWNlcnQgSW5jLjEeMBwGA1UECxMVaHR0cDovL3d3dy5DQWNl cnQub3JnMRwwGgYDVQQDExNDQWNlcnQgQ2xhc3MgMyBSb290AgMCOakwDQYJYIZIAWUDBAIDBQCg ggFjMBgGCSqGSIb3DQEJAzELBgkqhkiG9w0BBwEwHAYJKoZIhvcNAQkFMQ8XDTE1MTIwOTA1NDMw MVowTwYJKoZIhvcNAQkEMUIEQFHMbVccof38Xue8xuupcen5zdntmZrzP3d84t47Zj+muRUonWdL IYBAAoXnNXQnzdQ7xkxe+yXNh6OXvKMqKXMwagYJKwYBBAGCNxAEMV0wWzBUMRQwEgYDVQQKEwtD QWNlcnQgSW5jLjEeMBwGA1UECxMVaHR0cDovL3d3dy5DQWNlcnQub3JnMRwwGgYDVQQDExNDQWNl cnQgQ2xhc3MgMyBSb290AgMCOakwbAYLKoZIhvcNAQkQAgsxXaBbMFQxFDASBgNVBAoTC0NBY2Vy dCBJbmMuMR4wHAYDVQQLExVodHRwOi8vd3d3LkNBY2VydC5vcmcxHDAaBgNVBAMTE0NBY2VydCBD bGFzcyAzIFJvb3QCAwI5qTANBgkqhkiG9w0BAQEFAASCAQCoY4a4P1pQ7jXTJndZgEpGxP4wacsc NVoPhMWifznC/r6asg0InswEM2ThHLmzMpmiDqI4w64itIm0fveEyCb+O2AxEKcBb2guomz/aEXl 0/Jyp1oOzlyN1DLAvs9AYm3KY6rtzYQSUIr0LpPKONg2X99JaeX9b3C/6OQq3h+9hqUCdSjg5unF 65TsSbBnC6scTwGZaFeAWCoo+dar0aY6JZKkCPrT5VOpV7MKGA3HQJU829CfsGha2Uf83yqY5BOT crHSfr/VvbBa+wK0JlKLPB/l01KX8Gfx7uYTdBLqdhcm0hlyFOf0kTgzYoYr4Sb69Pfnd4lfvPtH 5rm9Lz95AAAAAAAA --=-2rGOBY9GIH1rMOAHOyMh--