From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs scrub and kernel parent transid
Date: Fri, 9 Jan 2015 23:57:58 +0000 (UTC)
Message-ID: <pan$2fc5a$d7961867$68930ae0$a3bcf496@cox.net>
In-Reply-To: 1420816425.4982.7.camel@AMD

Lars Callenbach posted on Fri, 09 Jan 2015 16:13:45 +0100 as excerpted:

> in kernel logs I get an error message of the type 'parent transid verify
> failed' but 'btrfs scrub ...' does not show any errors/inconsistencies.
> Do I have to worry about these kernel error messages?

I believe you're failing to understand either the nature of these 
messages, or what scrub actually does (if not both), as scrub wouldn't 
be /expected/ to detect/fix this sort of problem.

> ~ # uname -a
> [...] Linux 3.17.8-MYVERSION [...]
> ~ #   btrfs --version
> Btrfs v3.18+20141230

Thanks. =:^)

> ~ # dmesg 
> ...
> [    3.405363] parent transid verify failed on 1514311680
> wanted 21004 found 35043
> [    3.406370] parent transid verify failed on 17290989568
> wanted 21054 found 32893

In btrfs-speak, transid and generation (number) refer to the same thing, 
a monotonically increasing commit number.  In btrfs, various individual 
file changes can occur, with the changes cascading up the tree until an 
atomic root commit, which in theory represents a known valid state of the 
filesystem at that point.  Thus, in theory at least, the filesystem 
progresses from one known valid root commit to the next, and an invalid 
in-between state should not occur.  After a crash it should always fall 
back to the last root tree commit, which should always be valid.

Of course that's the theory, not the reality we live in, complete with 
bugs and hardware that doesn't exactly behave as it is documented to 
behave.  And with btrfs not being a fully mature filesystem, there are 
certainly bugs!

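Anyway, generation, aka transaction ID, aka transid, is simply an 
increasing number reflecting the number of root-tree commits to date.  A 
lower transid therefore reflects an earlier state of the tree, and 
"parent" refers to the next level of the commit tree, closer to root.  
As I understand it, the parent records the transid it expects its child 
to carry ("wanted"), and that is compared against the transid stored in 
the child itself ("found").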

Here we're seeing HIGHER transids than expected.  The expected 
transaction ID is (for the first one above) 21004, but a much higher 
transid, 35043, was found.
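
For anyone who'd rather pull those numbers out of a log mechanically, 
here's a minimal sketch in Python.  It assumes each message has been 
rejoined onto a single line (the dmesg output quoted above was wrapped in 
the mail) and that the format is exactly as quoted.  The function name is 
made up for illustration, and treating the first number as a block 
address is my reading of the message, not something stated above.

```python
import re

# Matches the message format quoted above, once rejoined onto one line.
PATTERN = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)

def parse_transid_failures(log_text):
    """Return a list of (block, wanted, found) integer tuples."""
    return [tuple(int(g) for g in m.groups())
            for m in PATTERN.finditer(log_text)]

sample = (
    "[    3.405363] parent transid verify failed on 1514311680 "
    "wanted 21004 found 35043\n"
    "[    3.406370] parent transid verify failed on 17290989568 "
    "wanted 21054 found 32893\n"
)

failures = parse_transid_failures(sample)
for block, wanted, found in failures:
    # A found transid above the wanted one means the block on disk is
    # from a later commit than its parent expects.
    direction = "newer" if found > wanted else "older"
    print(f"block {block}: found is {abs(found - wanted)} "
          f"generations {direction} than wanted")
```

For the two messages above, the found transids are 14039 and 11839 
generations newer than wanted.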

I'm not a dev and thus don't grok the full implications here, but I do 
know that at one point fairly recently there was a bug that would reset 
transid for some nodes to zero in some circumstances.  It's possible that 
happened in your case, and that the differing transids reflect resets to 
zero of differing parts of the tree.


Whatever the cause, however, it's not a problem that scrub would be 
expected to handle.

What scrub does is simply this.  Btrfs data and metadata nodes are 
checksummed so that corruption can be detected and if there's a valid 
copy of the data/metadata elsewhere (as there normally is in raid1 or dup 
modes, for instance, but not in single or raid0 modes), it can be 
rewritten over the bad copy.

While this occurs in normal usage as btrfs reads requested nodes, what 
scrub does is systematically read /everything/, validating the checksums 
and reporting and/or correcting errors where it has a valid copy 
elsewhere and can do so.
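
To make that concrete, here's a toy model of a scrub pass in Python.  It 
is only a sketch of the idea described above, not of the kernel's code: 
the devices are plain lists of blocks, the recorded checksums live in a 
separate list, and zlib.crc32 stands in for the checksum btrfs really 
uses (crc32c).  All the names are invented for illustration.

```python
import zlib

def checksum(data):
    # Stand-in only: btrfs uses crc32c, zlib.crc32 is plain CRC-32.
    return zlib.crc32(data)

def scrub(mirrors, recorded_sums):
    """Toy scrub over one or more mirrored devices.

    mirrors is a list of devices, each a list of blocks (bytes).
    Bad copies are rewritten in place from a good copy if one exists.
    Returns (corrected_copies, uncorrectable_blocks).
    """
    corrected = uncorrectable = 0
    for i, want in enumerate(recorded_sums):
        bad = [dev for dev in mirrors if checksum(dev[i]) != want]
        if not bad:
            continue
        good = next((dev[i] for dev in mirrors
                     if checksum(dev[i]) == want), None)
        if good is None:
            # No valid copy anywhere (always the case for a single copy).
            uncorrectable += 1
            continue
        for dev in bad:
            dev[i] = good
            corrected += 1
    return corrected, uncorrectable

blocks = [b"alpha", b"beta", b"gamma"]
sums = [checksum(b) for b in blocks]

dev_a = list(blocks)
dev_b = list(blocks)
dev_b[1] = b"bXta"            # one device holds a damaged copy

result = scrub([dev_a, dev_b], sums)
print(result)
```

With two mirrors the damaged copy is counted as corrected and rewritten 
from the good one.  Pass a single device instead and the same damage can 
only be reported as uncorrectable, which is the raid1/dup versus 
single/raid0 difference mentioned above.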


Now what we're seeing above is actually valid data/metadata in terms of 
what scrub sees -- the calculated checksum matches that recorded, so 
there's nothing for scrub to report or fix.  It's simply not the 
/expected/ metadata -- the generation-number/transid is wrong.  Scrub 
doesn't detect or fix /that/ sort of problem, as it's designed to detect 
and fix an entirely /different/ sort of problem.
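
The distinction can be sketched in a few lines of Python.  This is a toy 
model of the two checks as described in this message, not a rendering of 
what the kernel actually does: the Block class, its fields and both 
function names are hypothetical, and zlib.crc32 again stands in for 
crc32c.

```python
import zlib
from dataclasses import dataclass

@dataclass
class Block:
    generation: int      # transid recorded in the block itself ("found")
    payload: bytes
    csum: int = 0        # checksum recorded when the block was written

    def body(self):
        # The checksum covers the generation as well as the payload.
        return self.generation.to_bytes(8, "little") + self.payload

    def seal(self):
        self.csum = zlib.crc32(self.body())
        return self

def checksum_ok(block):
    # The scrub-style check: does the content match its own checksum?
    return zlib.crc32(block.body()) == block.csum

def parent_transid_ok(block, wanted):
    # The check behind the kernel message: does the block carry the
    # generation its parent expects?
    return block.generation == wanted

# A block that was written out intact, but by a different commit than
# the one its parent points at.
blk = Block(generation=35043, payload=b"some tree node").seal()

scrub_view = checksum_ok(blk)
kernel_view = parent_transid_ok(blk, wanted=21004)
print(scrub_view, kernel_view)
```

The block is self-consistent, so the checksum test passes, while the 
comparison against the parent's expectation fails.  Nothing in the 
first test can notice what the second one is complaining about.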

Now what /should/ detect and hopefully be able to fix the problem, is 
btrfs check.  Run it in read-only mode (without --repair or other 
"fixing" options) first, and see what it reports.

However, at least in the past, btrfs check in --repair mode could 
actually make certain problems it didn't properly understand worse.  I'm 
not sure if that remains the case or not, but the recommendation has 
always been NOT to run it in --repair mode until you're sure it can 
actually repair the problems you're seeing, without creating new problems.

What that means in practice is this:  Either make a full backup (and as 
every good sysadmin knows, a backup isn't a backup until you've tested 
that you can properly recover from it) before attempting btrfs check
--repair, or post the results of the read-only run, and let a dev tell 
you whether it's safe to run in --repair mode or not, based on the 
results of the read-only run and the problems you're experiencing that 
you're trying to fix.


Meanwhile, here's an example of something that scrub /can/ fix.

Most of my btrfs are pair-device raid1 mode for both data and metadata, 
so there are two copies of everything, one on each of two different 
devices.  My root filesystem is normally mounted read-only unless I'm 
actively updating it, but my (separate btrfs) /home and /var/log 
filesystems are of course mounted read-write by default, and in the event 
of a loss of system power or a crash, particularly /var/log is often in 
the middle of writing something.

But for raid1 mode, btrfs writes first to one device, and then when it's 
done, to the other.  So when the power goes out or the system crashes, 
at least one of the two copies should be intact, either fully written or 
not yet started.  When I reboot after such an unclean shutdown, I 
normally do a 
scrub, and it will frequently find and fix errors on the device that was 
being written to at the time, as those blocks will fail checksum 
verification and their (meta)data will be rewritten using the valid copy 
from the other device.

This actually works rather nicely and has saved my filesystems from worse 
problems a number of times. =:^)

Of course I keep my root filesystem read-only unless I'm actually 
updating it, so unlike /home and /var/log, it doesn't tend to be subject 
to such crash-related problems in the first place.  That does help me 
sleep better: if I /do/ end up trying to recover a bad filesystem, the 
root filesystem itself, where all the tools and documentation I'll be 
using to fix the problem are located, is normally unaffected, as it 
wasn't mounted writable when the problem occurred. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

