From: Kai Krakow <hurikhan77@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: csum failed root -9
Date: Thu, 15 Jun 2017 08:46:04 +0200
Message-ID: <20170615084604.2794d8b0@jupiter.sol.kaishome.de>
In-Reply-To: <CAPmG0jaVbfoL8-PpDT-=B+w_JH9H4ukJQm5UUkZFRKS6uSAs+Q@mail.gmail.com>
Am Wed, 14 Jun 2017 15:39:50 +0200
schrieb Henk Slager <eye1tm@gmail.com>:
> On Tue, Jun 13, 2017 at 12:47 PM, Henk Slager <eye1tm@gmail.com>
> wrote:
> > On Tue, Jun 13, 2017 at 7:24 AM, Kai Krakow <hurikhan77@gmail.com>
> > wrote:
> >> Am Mon, 12 Jun 2017 11:00:31 +0200
> >> schrieb Henk Slager <eye1tm@gmail.com>:
> >>
> [...]
> >>
> >> There's btrfs-progs v4.11 available...
> >
> > I started:
> > # btrfs check -p --readonly /dev/mapper/smr
> > but it stopped with printing 'Killed' while checking extents. The
> > board has 8G RAM, no swap (yet), so I just started lowmem mode:
> > # btrfs check -p --mode lowmem --readonly /dev/mapper/smr
> >
> > Now after a 1 day 77 lines like this are printed:
> > ERROR: extent[5365470154752, 81920] referencer count mismatch (root:
> > 6310, owner: 1771130, offset: 33243062272) wanted: 1, have: 2
> >
> > It is still running, hopefully it will finish within 2 days. But
> > lateron I can compile/use latest progs from git. Same for kernel,
> > maybe with some tweaks/patches, but I think I will also plug the
> > disk into a faster machine then ( i7-4770 instead of the J1900 ).
> >
> [...]
> >>
> >> What looks strange to me is that the parameters of the error
> >> reports seem to be rotated by one... See below:
> >>
> [...]
> >>
> >> Why does it say "ino 1"? Does it mean devid 1?
> >
> > On a 3-disk btrfs raid1 fs I see in the journal also "read error
> > corrected: ino 1" lines for all 3 disks. This was with a 4.10.x
> > kernel, ATM I don't know if this is right or wrong.
> >
> [...]
> >>
> >> And why does it say "root -9"? Shouldn't it be "failed -9 root 257
> >> ino 515567616"? In that case the "off" value would be completely
> >> missing...
> >>
> >> Those "rotations" may mess up with where you try to locate the
> >> error on disk...
> >
> > I hadn't looked at the numbers like that, but as you indicate, I
> > also think that the 1-block csum fail location is bogus because the
> > kernel calculates that based on some random corruption in critical
> > btrfs structures, also looking at the 77 referencer count
> > mismatches. A negative root ID is already a sort of red flag. When
> > I can mount the fs again after the check is finished, I can
> > hopefully use the output of the check to get clearer how big the
> > 'damage' is.
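If the logical address in those csum messages were trustworthy, it
could be mapped back to the owning file(s) with logical-resolve while
the fs is mounted; just a sketch, <logical> and the mount point are
placeholders:

# btrfs inspect-internal logical-resolve <logical> /mnt/smr

But given that the fields look rotated, I wouldn't put much weight on
whatever path that returns.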
>
> The btrfs lowmem mode check ends with:
>
> ERROR: root 7331 EXTENT_DATA[928390 3506176] shouldn't be hole
> ERROR: errors found in fs roots
> found 6968612982784 bytes used, error(s) found
> total csum bytes: 6786376404
> total tree bytes: 25656016896
> total fs tree bytes: 14857535488
> total extent tree bytes: 3237216256
> btree space waste bytes: 3072362630
> file data blocks allocated: 38874881994752
> referenced 36477629964288
>
> In total 2000+ of those "shouldn't be hole" lines.
>
> A non-lowmem check, now done with kernel 4.11.4 and progs v4.11 and
> 16G swap added ends with 'noerrors found'
Don't trust lowmem mode too much. The developer of lowmem mode may tell
you more about specific edge cases.
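Side note on the regular-mode check getting "Killed": that's the OOM
killer, and adding swap before re-running it is usually enough, as you
already found with the 16G swap. Roughly, assuming a non-btrfs
filesystem with enough free space for a swap file (swap files on btrfs
are not supported; the path is just a placeholder):

# dd if=/dev/zero of=/mnt/ext4/swapfile bs=1M count=16384
# chmod 600 /mnt/ext4/swapfile
# mkswap /mnt/ext4/swapfile
# swapon /mnt/ext4/swapfile

A dedicated swap partition works just as well, of course.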
> W.r.t. holes, maybe it is worth mentioning the super-flags:
> incompat_flags 0x369
> ( MIXED_BACKREF |
> COMPRESS_LZO |
> BIG_METADATA |
> EXTENDED_IREF |
> SKINNY_METADATA |
> NO_HOLES )
I don't think it's worth following up on the holes topic: I guess it
was a false report from lowmem mode that has been fixed in btrfs-progs
4.11.
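For reference, the incompat_flags block quoted above is exactly what
dump-super prints, so it can be re-checked at any time (device path as
in your check commands):

# btrfs inspect-internal dump-super /dev/mapper/smr

Look for the incompat_flags line and the decoded flag names below it.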
> The fs has received snapshots from a source fs that had NO_HOLES
> enabled for some time, but after registering this bug:
> https://bugzilla.kernel.org/show_bug.cgi?id=121321
> I set that NO_HOLES flag back to zero on the source fs. It seems I
> forgot to do that on the 8TB target/backup fs. But I don't know if
> there is a relation between this flag flipping and the btrfs check
> error messages.
>
> I think I leave it as is for the time being, unless there is some news
> how to fix things with low risk (or maybe via a temp overlay snapshot
> with DM). But the lowmem check took 2 days, that's not really fun.
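The temporary overlay idea is sound if you ever want to try a --repair
without risking the real data: all writes go to a scratch copy-on-write
device and the original stays untouched. A rough sketch, assuming the
fs is unmounted and /dev/sdX1 is a placeholder for a big enough scratch
device:

# ORIG=/dev/mapper/smr
# COW=/dev/sdX1
# dmsetup create smr-overlay \
    --table "0 $(blockdev --getsz $ORIG) snapshot $ORIG $COW N 8"
# btrfs check --repair /dev/mapper/smr-overlay
# dmsetup remove smr-overlay

The COW device only needs to hold the blocks the repair actually
changes; removing the overlay throws those changes away again.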
> The goal for the 8TB fs is to have an up to 7 year snapshot history at
> sometime, now the oldest snapshot is from early 2014, so almost
> halfway :)
Btrfs is still much too unstable to trust 7 years' worth of backups to
it. You will probably lose it at some point, especially while many
snapshots are still such a huge performance killer in btrfs. For a
project like that I'd suggest also trying alternatives such as borg
backup.
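Just as a rough idea of what a borg-based rotation could look like
(repo path, source path and retention numbers are placeholders):

# borg init --encryption=repokey /backup/borg-repo
# borg create --stats --compression lz4 \
    /backup/borg-repo::"data-$(date +%Y-%m-%d)" /data
# borg prune --keep-daily 7 --keep-weekly 8 --keep-monthly 24 \
    --keep-yearly 7 /backup/borg-repo

Deduplication between archives gives you a long history without
keeping thousands of btrfs snapshots around.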
--
Regards,
Kai
Replies to list-only preferred.