From: Sébastien Luttringer
To: Chris Murphy
Cc: linux-btrfs
Date: Mon, 18 Feb 2019 21:14:47 +0100
Subject: Re: Corrupted filesystem, looking for guidance
Message-ID: <91e2c9ef095eae21f9e88f7b5cf49102571dcba8.camel@seblu.net>
References: <7ef0e91501a04cd4c5e0d942db638a0b50ef3ec3.camel@seblu.net>

On Tue, 2019-02-12 at 15:40 -0700, Chris Murphy wrote:
> On Mon, Feb 11, 2019 at 8:16 PM Sébastien Luttringer wrote:
>
> FYI: This only does full stripe reads, recomputes parity and overwrites the
> parity strip. It assumes the data strips are correct, so long as the
> underlying member devices do not return a read error. And the only way they
> can return a read error is if their SCT ERC time is less than the kernel's
> SCSI command timer. Otherwise errors can accumulate.
>
> smartctl -l scterc /dev/sdX
> cat /sys/block/sdX/device/timeout
>
> The first must be a lesser value than the second.
> If the first is disabled
> and can't be enabled, then the generally accepted assumed maximum time for
> recoveries is an almost unbelievable 180 seconds; so the second needs to be
> set to 180 and is not persistent. You'll need a udev rule or startup script
> to set it at every boot.

None of my disks' firmware allows ERC to be modified through SCT.

# smartctl -l scterc /dev/sda
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.20-seblu] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control command not supported

I was not aware of that timer. I needed time to read and experiment on this.
Sorry for the long response time. I hope you didn't time out. :)

After simulating several errors and timeouts with scsi_debug[1],
fault_injection[2], and dmsetup[3], I don't understand why you suggest this
could lead to corruption. When a SCSI command times out, the mid-layer[4]
makes several error recovery attempts. These attempts are logged to the kernel
ring buffer and, at worst, the device is put offline.

From my experiments, the md layer has no timeout of its own: it waits for as
long as the underlying layer takes to return, whether during a check or a
normal read/write.

I understand the benefit of keeping the disk's error recovery time below the
HBA timeout. It prevents the disk from being kicked out of the array.
However, I don't see how this could lead to a difference between check and
repair in the md layer, or even trigger corruption between the chunks
inside a stripe.

>
> It is sufficient to merely run a check, rather than repair, to trigger the
> proper md RAID fixup from a device read error.
>
> Getting a mismatch on a check means there's a hardware problem somewhere. The
> mismatch count only tells you there is a mismatch between data strips and
> their parity strip. It doesn't tell you which device is wrong.
> And if there
> are no read errors, and no link resets, and yet you get mismatches, that
> suggests silent data corruption.

After reading the whole md (5) manual, I realize how bad it is to rely on the
md layer to guarantee data integrity. There is no mechanism to know which chunk
in a stripe is corrupted.
I'm wondering whether btrfs raid5, despite its known flaws, might be safer
than md.

> Further, if the mismatches are consistently in the same sector range, it
> suggests the repair scrub returned one set of data, and the subsequent check
> scrub returned different data - that's the only way you get mismatches
> following a repair scrub.

It was the same range. That was my understanding too.

I finally got rid of these errors by removing a disk, wiping its superblock
and adding it back to the RAID. Since then, no check errors (tested twice).

> If it's bad RAM, then chances are both copies of metadata will be identically
> wrong and thus no help in recovery.

The RAM is not ECC. I tested it recently and no errors were found.

However, I needed more memory to rsync all the data with hardlinks, so I added
a swap file on my system disk (an SSD). The filesystem on it is also btrfs, so
I used a loop device to work around the hole issue.
I can find some link resets on this drive from the time it was used for the
swap file. Maybe this could be a reason.

> > How could I save my filesystem? Should I try --repair or --init-csum-tree?
>
> If it mounts read-only, update your backups. That is the first priority. Be
> prepared to need them. If it will not mount read only anymore then I suggest
> 'btrfs restore' to scrape data out of the volume to a backup while it's still
> possible. Any repair attempt means writing changes, and any writes are
> inherently risky in this situation. So yeah - if the data is important, focus
> on backups first.

Fortunately, the data is safe, as I was in the middle of restoring it back to
the server after a first issue with an old BTRFS filesystem[5].

> Next, I expect until the RAID is healthy that it's difficult to make a
> successful repair of the file system. And for the RAID to be healthy, first
> memory and storage hardware needs to be certainly healthy - the fact there
> are mismatches following an md repair scrub directly suggests hardware
> issues. The linux-raid list is usually quite helpful tracking down such
> problems, including which devices are suspect, but they're going to ask the
> same questions about SCT ERC and SCSI command timer values I mentioned
> earlier, and will want to figure out why you're continuing to see mismatches
> even after a repair scrub - not normal.

I think I will remove the md layer and use only BTRFS, to be able to recover
from silent data corruption.
But I would still like to be able to repair a broken BTRFS without moving the
whole dataset somewhere else. It's the second time this has happened to me.

I tried:
# btrfs check --init-extent-tree /dev/md127
# btrfs check --clear-space-cache v2 /dev/md127
# btrfs check --clear-space-cache v1 /dev/md127
# btrfs rescue super-recover /dev/md127
# btrfs check -b --repair /dev/md127
# btrfs check --repair /dev/md127
# btrfs rescue zero-log /dev/md127

The detailed output is here [6]. But none of the above allowed me to drop the
broken part of the btrfs tree and move forward. Is there a way to repair (by
losing the corrupted data) without needing to drop all the correct data?
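
For reference, the timer relationship discussed earlier in this message (SCT
ERC time below the kernel's SCSI command timer, with 180 seconds as the
fallback when ERC is unsupported or disabled) can be sketched as a small shell
helper. This is only a sketch under assumptions, not something taken from the
thread: the function names, the 10-second margin, the use of the read ERC value
alone, and the expected shape of the smartctl output ("Read: 70 (7.0 seconds)")
are all illustrative.

```shell
#!/bin/sh
# Sketch only: pick a SCSI command timer that stays above a drive's SCT ERC time.
# All names and the margin below are hypothetical, chosen for illustration.

FALLBACK_TIMEOUT=180   # seconds; the "assumed maximum" recovery time quoted above
MARGIN=10              # seconds; arbitrary safety margin (assumption)

# erc_from_smartctl: read `smartctl -l scterc` output on stdin and print the
# read ERC value in deciseconds. Prints nothing when ERC is unsupported or
# disabled. Assumes a line of the form "Read: 70 (7.0 seconds)".
erc_from_smartctl() {
    awk '$1 == "Read:" && $2 ~ /^[0-9]+$/ { print $2; exit }'
}

# pick_timeout ERC_DECISECONDS CURRENT_TIMEOUT_SECONDS
# Print the timeout (seconds) to use: never lower than the current value,
# and always above the drive's recovery time (or the fallback).
pick_timeout() {
    erc_ds=$1
    current=$2
    if [ -z "$erc_ds" ] || [ "$erc_ds" -eq 0 ]; then
        needed=$FALLBACK_TIMEOUT
    else
        # Round deciseconds up to whole seconds, then add the margin.
        needed=$(( (erc_ds + 9) / 10 + MARGIN ))
    fi
    if [ "$current" -gt "$needed" ]; then
        echo "$current"
    else
        echo "$needed"
    fi
}

# apply_timeout DEVICE_NAME (e.g. "sda"); needs root, and is not persistent.
apply_timeout() {
    dev=$1
    erc=$(smartctl -l scterc "/dev/$dev" | erc_from_smartctl)
    cur=$(cat "/sys/block/$dev/device/timeout")
    pick_timeout "$erc" "$cur" > "/sys/block/$dev/device/timeout"
}
```

Since the sysfs value does not survive a reboot, something like
`apply_timeout sda` would have to be run as root for each member disk from a
startup script. With disks like the ones here, where SCT ERC is not supported,
`pick_timeout` falls back to 180 seconds.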

Regards,

[1] http://sg.danny.cz/sg/sdebug26.html
[2] https://www.kernel.org/doc/Documentation/fault-injection/fault-injection.txt
[3] https://linux.die.net/man/8/dmsetup
[4] https://www.tldp.org/HOWTO/SCSI-Generic-HOWTO/x215.html
[5] https://lore.kernel.org/linux-btrfs/6e66eb52e4c13fc4206d742e1dade38b04592e49.camel@seblu.net/
[6] http://cloud.seblu.net/s/EPieGzGm9xcyQzd

-- 
Sébastien Luttringer