To: linux-btrfs@vger.kernel.org
From: Duncan <1i5t5.duncan@cox.net>
Subject: Re: csum failed : d-raid0, m-raid1
Date: Sat, 30 Jan 2016 05:58:45 +0000 (UTC)

John Smith posted on Fri, 29 Jan 2016 19:04:42 +0100 as excerpted:

> Hi
>
> i built btrfs volume using 2x3tb brand new /tested for badblocks drives.
> I copied into volume around 5Tb of data.
>
> I tried to read one file which is around 4GB and i got input / output
> error.
>
> Dmesg contains:
>
> [154159.040059] BTRFS warning (device sdd): csum failed ino 9995246 off
> 4506214400 csum 383964635 expected csum 6478505
>
> Any idea what is it? Whats the reason that this happened? Can I recover?

Btrfs crc32c-checksums all blocks on write, both data (except for data
written while mounted with nodatasum, and for files with the nocow
attribute set) and metadata (always), and verifies the checksum on read.
The read-time csum verification failed on the block at that offset of the
file, and as your data is raid0, there's no second copy to fall back on as
there would be for raid1 data, and no parity available to rebuild from as
there would be for raid56 data. So the file can only be read up to that
point, and, if you skip the bad block and there are no further checksum
failures, from beyond that point to the end of the file.

Of course the sysadmin's first rule of backups, in simplest form, is that
if you don't have at least one backup, you are by your failure to back up
defining the value of the data as less than the value of the
time/hassle/resources you'd otherwise spend making that backup. So you
either have a backup to fall back to, or your data is self-defined by that
lack of a backup as of only trivial value, not worth the trouble.

And of course btrfs, while stabilizing, isn't yet considered fully stable
and mature, so that sysadmin's rule of backups applies to an even stronger
degree than it does on fully stable and mature filesystems.

As a result, for recovery, you can either fall back to the backup,
rewriting the file from backup to the btrfs in question, or, as the lack
of a backup defined the data as too trivial to be worth backing up, you
can simply delete the file in question and not worry about it.

The question then becomes one of finding out which file is involved, in
order to either delete it or recover it from backup.

Keep in mind that unlike most filesystems, inode numbers on btrfs are
subvolume-specific, so it's possible to have multiple inodes with the same
inode number on the filesystem if you have multiple subvolumes. Thus it's
not as simple as looking up what file that inode corresponds to, unless of
course you have only the primary/root subvolume, no others.
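If it turns out you do have only that one default subvolume, a plain
lookup by inode number should already do the job. This is only a rough
sketch, with /mnt/data standing in for wherever the filesystem is
actually mounted:

    # any subvolumes besides the toplevel one?
    btrfs subvolume list /mnt/data

    # single-subvolume case: brute-force lookup of the inode number from
    # the dmesg line, staying on this one filesystem
    find /mnt/data -xdev -inum 9995246

With multiple subvolumes the same number can resolve to a different file
in each one, so a bare find isn't conclusive on its own.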
There are two ways to find what file corresponds to that inode on that
subvolume. One involves the btrfs debugging tools and is targeted at
devs. While I know this is possible and I've seen the method posted, I'm
not a dev, only a btrfs user and list regular, and I've not kept track of
the specifics, so I won't attempt to describe it further here.

The other is btrfs scrub (a rough command sketch is at the end of this
mail), which will systematically verify all checksums on the filesystem,
repairing errors where it can (metadata in your case, since it's raid1,
assuming of course that the second copy of the block isn't also bad) and
reporting those it can't (the raid0 data). Where it can't fix the
problem, dmesg should name the file with the problem (unless it's
metadata and thus not a file, of course).

Of course, on 5 TB of data, scrub is going to take a while... likely over
a day and possibly two. 5 TiB at 30 MiB/sec works out to about 48 hours,
and while 30 MiB/sec may be a bit pessimistically slow, it isn't out of
real-world range for spinning rust. Even on relatively fast (for spinning
rust) drives doing 100 MiB/sec, you're looking at about 14.5 hours. Tho
because scrub checksum-verifies all blocks, it'll cover any problems in
other files and in metadata too, not just the one file.

FWIW, maintenance time is one of several reasons I use multiple smaller
btrfs on partitioned-up devices here, instead of a single huge multi-TB
btrfs. My btrfs are also all raid1 for both data and metadata, save for
/boot (and its backup on the other device), which are both mixed-mode
dup, two copies on the same device, so there's always that second copy to
pull from to repair the failed one if something fails checksum
verification. They also happen to be on SSD, with the largest btrfs on a
pair of 24 GiB partitions. As such, scrubs, balances, checks, etc. all
take under 10 minutes per filesystem, with scrubs often complete in under
a minute, instead of the day or longer it's likely to take you for 5 TiB
on spinning rust.

Of course I have multiple btrfs, so scrubbing them all takes somewhat
longer than the minute or so for just one, say half an hour; but some of
them aren't even routinely mounted, and my 8 GiB (per device, two
devices) btrfs raid1 / is mounted read-only by default, so it too is
unlikely to be damaged. As such, generally only 2-3 btrfs need to be
scrubbed at once, and often it's only 1-2, so on fast SSD I'm done in
under 5 minutes. /Much/ more feasible maintenance time than several
/days/! =:^)
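For the scrub itself, this is roughly what running it looks like; as
above, /mnt/data is only a placeholder for your actual mount point:

    # kick off a scrub of the whole filesystem (runs in the background)
    btrfs scrub start /mnt/data

    # check progress and error counts while it runs, or after it's done
    btrfs scrub status /mnt/data

    # meanwhile, watch dmesg for the path of anything scrub can't repair
    dmesg | grep -i csum

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman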