From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-lf0-f51.google.com ([209.85.215.51]:44859 "EHLO
	mail-lf0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751366AbeABKCn (ORCPT );
	Tue, 2 Jan 2018 05:02:43 -0500
Received: by mail-lf0-f51.google.com with SMTP id g63so33118399lfl.11
	for ; Tue, 02 Jan 2018 02:02:43 -0800 (PST)
Message-ID: <5A4B58BF.1090804@gmail.com>
Date: Tue, 02 Jan 2018 11:02:39 +0100
From: ein
MIME-Version: 1.0
To: swestrup@gmail.com, Kai Krakow
CC: linux-btrfs@vger.kernel.org
Subject: Re: A Big Thank You, and some Notes on Current Recovery Tools.
References: <1clphe-a7q.ln1@hurikhan77.spdns.de>
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On 01/01/2018 08:44 PM, Stirling Westrup wrote:
> On Mon, Jan 1, 2018 at 7:15 AM, Kai Krakow wrote:
>> On Mon, 01 Jan 2018 18:13:10 +0800, Qu Wenruo wrote:
>>
>>> On 2018-01-01 08:48, Stirling Westrup wrote:
>>>>
>>>> 1) I had a 2T drive die with exactly 3 hard-sector errors and those 3
>>>> errors exactly coincided with the 3 super-blocks on the drive.
>>>
>>> WTF, why all these corruption all happens at btrfs super blocks?!
>>>
>>> What a coincident.
>>
>> Maybe it's a hybrid drive with flash? Or something that went wrong in the
>> drive-internal cache memory the very time when superblocks where updated?
>>
>> I bet that the sectors aren't really broken, just the on-disk checksum
>> didn't match the sector. I remember such things happening to me more than
>> once back in the days when drives where still connected by molex power
>> connectors. Those connectors started to get loose over time, due to
>> thermals or repeated disconnect and connect. That is, drives sometimes
>> started to no longer have a reliable power source which let to all sorts
>> of very strange problems, mostly resulting in pseudo-defective sectors.
>>
>> That said, the OP would like to check the power supply after this
>> coincidence... Maybe it's aging and no longer able to support all four
>> drives, CPU, GPU and stuff with stable power.
>
> You may be right about the cause of the error being a power-supply issue.
> For those that are curious, the drive that failed was a Seagate Barracuda
> LP 2000G drive (ST2000DL003).
>

Forgive me if it's not relevant, but I own quite a few disks from that
series, like:

root@iomega-ordo:~# hdparm -i /dev/sda

/dev/sda:

 Model=ST2000DM001-1CH164, FwRev=CC27, SerialNo=Z1E6EV85
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }

root@iomega-acm:~# smartctl -d sat -a /dev/sda
=== START OF INFORMATION SECTION ===
Device Model:     ST3000DM001-9YN166
Serial Number:    S1F0PGQJ
LU WWN Device Id: 5 000c50 0516fce00
Firmware Version: CC4B

root@iomega-europol:~# smartctl -d sat -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [armv5tel-linux-2.6.31.8] (local build)
=== START OF INFORMATION SECTION ===
Device Model:     ST3000DM001-9YN166
Serial Number:    Z1F1H5KA
LU WWN Device Id: 5 000c50 04ec18fda

Different locations, different environments, different boards, some with
more stable power than others. I have replaced at least three or four of
them in the past 3 years. All of them died under a heavy random-write
workload (rsnapshot, massive cp -al of millions of files every day).

In my case bad sectors showed up every time too, but I didn't analyze
where exactly; they were just backup destination drives. I'm pretty
convinced they could have been ext2 superblocks too, though.

-- 
PGP Public Key (RSA/4096b):
ID: 0xF2C6EA10
SHA-1: 51DA 40EE 832A 0572 5AD8 B3C0 7AFF 69E1 F2C6 EA10