Date: Sat, 15 Aug 2015 11:19:07 +0200
From: Marc Joliet
To: linux-btrfs@vger.kernel.org
Subject: Re: Deleted files cause btrfs-send to fail

On Sat, 15 Aug 2015 05:10:57 +0000 (UTC), Duncan <1i5t5.duncan@cox.net> wrote:

> Marc Joliet posted on Fri, 14 Aug 2015 23:37:37 +0200 as excerpted:
>
> > (One other thing I found interesting was that "btrfs scrub" didn't care
> > about the link count errors.)
>
> A lot of people are confused about exactly what btrfs scrub does, and
> expect it to detect and possibly fix stuff it has nothing to do with.
> It's *not* an fsck.
>
> Scrub does one very useful, but limited, thing. It systematically
> verifies that the computed checksums for all data and metadata covered by
> checksums match the corresponding recorded checksums. For
> dup/raid1/raid10 modes, if there's a match failure, it will look up the
> other copy and see if it matches, replacing the invalid block with a new
> copy of the other one, assuming it's valid. For raid56 modes, it attempts
> to compute the valid copy from parity and, again assuming a match after
> doing so, does the replace. If a valid copy cannot be found or computed,
> either because it's damaged too or because there's no second copy or
> parity to fall back on (single and raid0 modes), then scrub will detect
> but cannot correct the error.
>
> In routine usage, btrfs automatically does the same thing if it happens
> to come across checksum errors in its normal IO stream, but it has to
> come across them first. Scrub's benefit is that it systematically
> verifies (and corrects errors where it can) checksums on the entire
> filesystem, not just the parts that happen to appear in the normal IO
> stream.

I know all that; I just thought it was interesting and wanted to remark on
it. After thinking about it a bit, of course, it makes perfect sense and is
not very interesting at all: scrub will just verify that the checksums match,
no matter whether the underlying (meta)data is valid or not.

> Such checksum errors can occur for a few reasons...
>
> I have one ssd that's gradually failing and returns checksum errors
> fairly regularly. Were I using a normal filesystem I'd have had to
> replace it some time ago. But with btrfs in raid1 mode and regular
> scrubs (and backups, should they be needed; sometimes I let them get a
> bit stale, but I do have them and am prepared to live with the stale
> restored data if I have to), I've been able to keep using the failing
> device. When the scrubs hit errors and btrfs does the rewrite from the
> good copy, a block relocation on the failing device is triggered as well,
> with the bad block taken out of service and a new one from the set of
> spares all modern devices have taking its place. Currently, smartctl -A
> reports 904 reallocated sectors raw value, with a standardized value of
> 92. Before the first reallocated sector, the standardized value was 253,
> perfect. With the first reallocated sector, it immediately dropped to
> 100, apparently the rounded percentage of spare sectors left. It has
> gradually dropped since then to its current 92, with a threshold value of
> 36. So while it's gradually failing, there's still plenty of spare
> sectors left. Normally I would have replaced the device even so, but
> I've never had the opportunity to actually watch a slow failure
> continue to get worse over time, and now that I do I'm a bit curious how
> things will go, so I'm just letting it happen, tho I do have a
> replacement device already purchased and ready, when the time comes.

I'm curious how that will pan out. My experience with HDDs is that at some
point the sector reallocations start picking up at a somewhat constant (maybe
even accelerating) rate. I wonder how SSDs behave in this regard.

> So real media failure, bitrot, is one reason for bad checksums. The data
> read back from the device simply isn't the same data that was stored to
> it, and the checksum fails as a result.
>
> Of course bad connector cables or storage chipset firmware or hardware is
> another "hardware" cause.
>
> Sudden reboot or power loss, with data being actively written and one
> copy either already updated or not yet touched, while the other is
> actually being written at the time of the crash so the write isn't
> completed, is yet another reason for checksum failure. This one is
> actually why a scrub can appear to do so much more than it does, because
> where there's a second copy (or parity) of the data available, scrub can
> use it to recover the partially written copy (which being partially
> written fails its checksum verification) to either the completed write
> state, if the other copy was already written, or the pre-write state, if
> the other copy hadn't been written at all, yet. In this way the result
> is often the same one an fsck would normally produce, detecting and
> fixing the error, but the mechanism is entirely different -- it only
> detected and fixed the error because the checksum was bad and it had a
> good copy it could replace it with, not because it had any smarts about
> how the filesystem actually worked, and could actually tell what the
> error was and correct it by actually correcting it.
>
>
> Meanwhile, in your case the problem was an actual btrfs logic bug -- it
> didn't track the inode ref-counts correctly, and didn't remove the inode
> when the last reference to it was deleted, because it still thought there
> were more references. So the metadata actually written to storage was
> incorrect due to the logic flaw, but the checksum covering it was indeed
> the correct checksum for that metadata, as wrong as the metadata actually
> happened to be. So scrub couldn't detect the error, because it was an
> error not in checksum, which was computed correctly over the metadata,
> but in the logic of the metadata itself as it was written. Scrub
> therefore had nothing to do with that error and was in fact totally
> oblivious to the fact that the valid checksum covered flawed data in the
> first place. Only a tool that could follow the actual logic, send in
> this case, since it has to follow the logic in order to properly send
> it, could detect the error, and only btrfs check knew enough about the
> logic to both detect the problem and correct it -- tho even then, it
> couldn't totally fix it, as part of the metadata was irretrievably
> missing, so it simply dropped what it could retrieve in lost-and-found.
>
>
> That should make the answer to the question of why scrub couldn't detect
> and fix the problem clearer -- scrub only detects and possibly fixes a
> very specific problem: checksum verification failure, and that's not the
> problem you had. As far as scrub was concerned, the checksums were fine,
> and that's all it knows about, so to it, the data and metadata were fine.

Yeah, that's a more verbose way to put it :) . Thanks anyway.

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup
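
For anyone who finds the distinction easier to follow in code, here is a
rough toy model of the behaviour described in this message: a scrub-like
pass over two mirrored copies that repairs a copy whose checksum no longer
matches, but has no way to notice a block whose contents are logically
wrong yet correctly checksummed. This is only a sketch under loud
assumptions, not how btrfs is implemented: the names Block, write_block,
is_valid and scrub are made up for illustration, zlib's CRC-32 stands in
for the crc32c that btrfs uses by default, and the "inode 257 nlink=2"
payload is an invented stand-in for metadata written by a logic bug.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    payload: bytes
    csum: int  # checksum recorded at write time


def checksum(payload: bytes) -> int:
    # Stand-in checksum: plain CRC-32 from zlib, NOT btrfs's crc32c.
    return zlib.crc32(payload)


def write_block(payload: bytes) -> Block:
    # The checksum is computed over whatever is written, right or wrong.
    return Block(payload, checksum(payload))


def is_valid(block: Block) -> bool:
    return checksum(block.payload) == block.csum


def scrub(mirror_a: list, mirror_b: list) -> dict:
    """Verify every block on both copies; repair from the other copy."""
    stats = {"ok": 0, "corrected": 0, "uncorrectable": 0}
    for i, (a, b) in enumerate(zip(mirror_a, mirror_b)):
        a_ok, b_ok = is_valid(a), is_valid(b)
        if a_ok and b_ok:
            stats["ok"] += 1
        elif a_ok:
            mirror_b[i] = Block(a.payload, a.csum)  # rewrite bad copy B
            stats["corrected"] += 1
        elif b_ok:
            mirror_a[i] = Block(b.payload, b.csum)  # rewrite bad copy A
            stats["corrected"] += 1
        else:
            stats["uncorrectable"] += 1  # detected, but nothing to repair from
    return stats


# Block 0 is ordinary data. Block 1 models metadata written by a logic bug:
# the link count is wrong, but its checksum was computed over that wrong
# content, so it verifies fine on both copies.
payloads = [b"file data", b"inode 257 nlink=2"]
mirror_a = [write_block(p) for p in payloads]
mirror_b = [write_block(p) for p in payloads]

# Simulate bitrot on copy A of block 0: the payload changes, the recorded
# checksum does not.
mirror_a[0].payload = b"file dat\x00"

result = scrub(mirror_a, mirror_b)
print(result)
print(mirror_a[0].payload, mirror_a[1].payload)
```

In this toy run the bitrotted copy of block 0 is rewritten from the good
copy, while block 1 is counted as fine on both mirrors even though its
contents are wrong. Catching that second kind of error takes something that
understands what the metadata is supposed to mean, which is the role send
and btrfs check played in this thread.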