From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from plane.gmane.org ([80.91.229.3]:40718 "EHLO plane.gmane.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750781AbaBIFlU (ORCPT ); Sun, 9 Feb 2014 00:41:20 -0500
Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1WCN8o-0001Ag-MY for linux-btrfs@vger.kernel.org; Sun, 09 Feb 2014 06:41:18 +0100
Received: from ip68-231-22-224.ph.ph.cox.net ([68.231.22.224]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 09 Feb 2014 06:41:18 +0100
Received: from 1i5t5.duncan by ip68-231-22-224.ph.ph.cox.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 09 Feb 2014 06:41:18 +0100
To: linux-btrfs@vger.kernel.org
From: Duncan <1i5t5.duncan@cox.net>
Subject: Re: lost with degraded RAID1
Date: Sun, 9 Feb 2014 05:40:55 +0000 (UTC)
Message-ID:
References: <20140130175831.GU3314@carfax.org.uk> <6C293A14-9A38-4DAA-A720-1F77B9CB083D@colorremedies.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

Johan Kröckel posted on Sat, 08 Feb 2014 12:09:46 +0100 as excerpted:

> Ok, I did nuke it now and created the fs again using 3.12 kernel. So far
> so good. Runs fine.
> Finally, I know its kind of offtopic, but can some help me interpreting
> this (I think this is the error in the smart-log which started the whole
> mess)?
>
> Error 1 occurred at disk power-on lifetime: 2576 hours (107 days + 8
> hours)
> When the command that caused the error occurred, the device was
> active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 04 71 00 ff ff ff 0f
> Device Fault; Error: ABRT at LBA = 0x0fffffff = 268435455

I'm no SMART expert, but that LBA number is incredibly suspicious.
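A quick sanity check of that number, as a rough Python sketch. The 512-byte
sector size and the variable names are my assumptions for illustration, not
anything reported by the drive:

```python
# Sketch: check that the reported LBA sits exactly at the 28-bit limit.
SECTOR_BYTES = 512          # assumed logical sector size
reported_lba = 0x0FFFFFFF   # value from the SMART error log quoted above

max_lba28 = 2**28 - 1                   # highest sector addressable with 28 bits
lba28_capacity = 2**28 * SECTOR_BYTES   # bytes reachable with LBA28
lba48_capacity = 2**48 * SECTOR_BYTES   # bytes reachable with LBA48

print(reported_lba == max_lba28)        # True
print(lba28_capacity // 2**30, "GiB")   # 128 GiB
print(lba48_capacity // 2**50, "PiB")   # 128 PiB
```
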
With standard 512-byte sectors that's the last sector before the 128 GiB
boundary, the old 28-bit LBA limit. (LBA28 was introduced with ATA-1 in
1994; modern drives use LBA48, introduced in 2003 with ATA-6 and offering
an addressing capacity of 128 PiB, according to Wikipedia's article on
LBA.) It looks like something flipped back to LBA28, and when a continuing
operation happened to write past that value, it triggered the abort you
see in the SMART log.

Double-check your BIOS to be sure it didn't somehow revert to the old
LBA28 compatibility mode or some such, and check the drives to make sure
they aren't "clipped" to LBA28 compatibility mode as well.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman