From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from plane.gmane.org ([80.91.229.3]:45014 "EHLO plane.gmane.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751343AbaIYFeT (ORCPT ); Thu, 25 Sep 2014 01:34:19 -0400
Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1XX1h3-0005HU-W5 for linux-btrfs@vger.kernel.org; Thu, 25 Sep 2014 07:34:17 +0200
Received: from ip68-231-22-224.ph.ph.cox.net ([68.231.22.224]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 25 Sep 2014 07:34:17 +0200
Received: from 1i5t5.duncan by ip68-231-22-224.ph.ph.cox.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 25 Sep 2014 07:34:17 +0200
To: linux-btrfs@vger.kernel.org
From: Duncan <1i5t5.duncan@cox.net>
Subject: Re: mount problem
Date: Thu, 25 Sep 2014 05:34:05 +0000 (UTC)
Message-ID:
References: <20140923120641.GA27624@galliera.it> <20140924142835.GA18272@galliera.it>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

Simone Ferretti posted on Wed, 24 Sep 2014 16:28:35 +0200 as excerpted:

> Wed, Sep 24, 2014 at 01:23:32PM +0000, Duncan wrote:
>> Simone Ferretti posted on Tue, 23 Sep 2014 14:06:41 +0200 as excerpted:
>>
>>> we're testing BTRFS on our Debian server. After a lot of operations
>>> simulating a RAID1 failure, every time I mount my BTRFS RAID1 volume
>>> the kernel logs these messages:
>>>
>>> [73894.436173] BTRFS: bdev /dev/etherd/e30.20 errs:
>>> wr 33036, rd 0, flush 0, corrupt 2806, gen 0
>>> [73894.436181] BTRFS: bdev /dev/etherd/e60.28 errs:
>>> wr 244165, rd 0, flush 0, corrupt 1, gen 4
>>>
>>> Everything seems to work nice but I'm courious to know what these
>>> messages mean (in particular what do "gen" and "corrupt" mean?).
>>
>> Gen=generation.
>> The generation or transaction-ID (different names for
>> the exact same thing) is a monotonically increasing integer that gets
>> updated every time a tree update reaches all the way to the superblock.
>> In the error context, it means the superblock had one generation number
>> but N other blocks had a different (presumably older) generation
>> number.
>>
>> Corrupt is simply the number of blocks where the calculated checksum
>> didn't match the recorded checksum, thus indicating an error.
>>
>> See btrfs device stats -z to reset the numbers to zero (after printing
>> them one last time).
>
> Thank you much for your quick and illuminating answer.
>
> I'm wondering if you (or anyone else of course) know if there is btrfs
> documentation/papers/anything (besides wiki I did not find anything), in
> which it's possible to learn this kind of informations?

I've learned it from the list and the wiki, from general background
experience, and at times by reading between the lines.

Take the monotonically increasing counts and the zero-out option. The
manpage and help output for btrfs device stats say that -z resets the
counts to zero, which implies that they otherwise keep counting up. I
think a dev did confirm that on-list at one point, but the implication
is easy enough to read without such confirmation, particularly when it
matches observed behavior, as it does.

The gen/trans-id equivalence is in fact covered in the wiki, but at
least on the user-wiki side I believe only in passing, on the btrfs
restore page:

https://btrfs.wiki.kernel.org/index.php/Restore

(That page is in turn linked from the problem-FAQ entry "filesystem
won't mount and none of the above helped, is there any hope", as well as
from the built-in-tools section of the main page.)
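
As an aside for anyone who wants to watch those counters from the kernel
log rather than by eye, here is a minimal Python sketch. It is not an
official tool: the names ERRS_RE and parse_errs are made up for
illustration, and the pattern is modeled only on the two log lines quoted
at the top of this thread, with the mail's line wrapping joined back onto
one line. Other kernel versions may format the message differently.

```python
import re

# Assumption: pattern modeled only on the two log lines quoted in this thread.
ERRS_RE = re.compile(
    r"BTRFS: bdev (?P<dev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_errs(line):
    """Return (device, counters) for a matching log line, else None."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    # Every named group except the device path is an integer counter.
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m.group("dev"), counters


# The two quoted log lines, rejoined after the mail client wrapped them.
log = [
    "[73894.436173] BTRFS: bdev /dev/etherd/e30.20 errs: "
    "wr 33036, rd 0, flush 0, corrupt 2806, gen 0",
    "[73894.436181] BTRFS: bdev /dev/etherd/e60.28 errs: "
    "wr 244165, rd 0, flush 0, corrupt 1, gen 4",
]

for line in log:
    parsed = parse_errs(line)
    if parsed is not None:
        dev, counters = parsed
        # Show only the counters that have actually counted something.
        nonzero = {k: v for k, v in counters.items() if v}
        print(dev, nonzero)
```

Since the counts only go up until they are reset with -z, comparing the
parsed values from two successive mounts shows whether new errors are
still accumulating or whether the numbers are just history.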

Of course, people who only search for specific things might miss that
passing mention on the restore page. I did general research before
diving head-first into a new filesystem, and so read most of at least
the user section of the wiki. Even so, it took an actual problem, and
actually trying to use restore on my own system, before the equivalence
of trans-id and generation really sank in.

The corrupt count probably came from my previous experience with mdraid
and its scrub, and with ECC RAM and the related BIOS scrub features. In
general, any admin who has worked with (and understood) any sort of
checksumming and error detection and correction should have a general
idea of what's going on there, at least after reading the btrfs-scrub
manpage and running scrub to correct errors a few times, thus seeing how
its output matches the corresponding stats.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman