From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-io0-f174.google.com ([209.85.223.174]:32785 "EHLO mail-io0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752539AbcCMUKs (ORCPT); Sun, 13 Mar 2016 16:10:48 -0400
Received: by mail-io0-f174.google.com with SMTP id n190so201359491iof.0; Sun, 13 Mar 2016 13:10:48 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20160313222442.1fa22a57@natsu>
References: <20160312204847.2092f3f3@natsu> <20160312221524.646e1a66@natsu> <20160313142428.377b51b8@natsu> <20160313222442.1fa22a57@natsu>
Date: Sun, 13 Mar 2016 14:10:47 -0600
Message-ID:
Subject: Re: parent transid verify failed on snapshot deletion
From: Chris Murphy
To: Roman Mamedov
Cc: Duncan <1i5t5.duncan@cox.net>, Btrfs BTRFS
Content-Type: text/plain; charset=UTF-8
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On Sun, Mar 13, 2016 at 11:24 AM, Roman Mamedov wrote:
>
> "Blowing away" a 6TB filesystem just because some block randomly went "bad",

I'm going to guess it's a metadata block, and the profile is single.
Otherwise, if it were data it'd just be a corrupt file, and you'd be
told which one is affected. And if metadata had more than one copy, it
should recover from the copy.

The exact nature of the loss isn't clear; a kernel message from the
time of the bad-block report might help, but I'm going to guess again
that it's a 4096-byte missing block of metadata. Depending on what it
is, that could be a pretty serious hole for any file system.

> I'm running --init-extent-tree right now in a "what if" mode, using
> the copy-on-write feature of 'nbd-server' (this way the original block device
> is not modified, and all changes are saved in a separate file).

So it's Btrfs on NBD, with no replication either from Btrfs or from
the storage backing it on the server?
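For anyone wanting to reproduce Roman's "what if" setup, a minimal sketch of an nbd-server config with the copy-on-write option enabled (the export name and path are placeholders, not from this thread):

```
# /etc/nbd-server/config -- sketch, paths are hypothetical
[generic]

[fs-under-test]
exportname = /srv/images/btrfs-6tb.img
# copyonwrite: writes from the client go to a separate diff
# file, leaving the original block device/image untouched.
copyonwrite = true
```

With this, btrfs check --init-extent-tree (or --repair) can be run against the NBD device on the client without risking the original data; discarding the diff file rolls everything back.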
Off hand I'd say one of them needs redundancy to avoid this very
problem; otherwise it's just too easy for even network corruption (NBD
or iSCSI) to cause a problem.

Not related to your problem, but I'm not sure whether, and how many
times, Btrfs retries corrupt reads. That is, the device returns the
read command OK (no error), but Btrfs detects corruption. Does it
retry, or immediately fail? For flash- and network-backed Btrfs, the
result may be intermittent, so it should try again.

> It's been
> running for a good 8 hours now, with 100% CPU use of btrfsck and very little
> disk access.

Yeah, btrfs check is very RAM intensive.

--
Chris Murphy