From: Hugo Mills <hugo@carfax.org.uk>
To: Eric Wolf <19wolf@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11
Date: Thu, 31 Aug 2017 20:11:24 +0000
Message-ID: <20170831201124.GD30990@carfax.org.uk>
In-Reply-To: <CAJ_hD5DLufKpJPHpvFhsq8fj-oL++=NMkMPEHe9y=dCcsbUwtg@mail.gmail.com>

On Thu, Aug 31, 2017 at 03:21:07PM -0400, Eric Wolf wrote:
> I've previously confirmed it's a bad ram module which I have already
> submitted an RMA for. Any advice for manually fixing the bits?

   What I'd do... use a hex editor and the contents of ctree.h as
documentation to find the byte in question, change it back to what it
should be, mount the FS, try reading the directory again, look up the
csum failure in dmesg, edit the block again to fix up the csum, and
it's done. (Yes, I've done this before, and I'm a massive nerd).
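
For reference, the arithmetic behind the hex-editor step can be sketched
in Python. This is a minimal sketch, not a repair tool: it only works on
an in-memory copy of the leaf. The sizes used (101-byte header, 25-byte
item entries, a 32-byte checksum field whose first four bytes hold a
little-endian CRC-32C of the rest of the block) are my reading of
ctree.h for a filesystem using the default crc32c checksum, and the
function names are made up for illustration. Note that
block=293438636032 is a logical address, so it still has to be mapped to
a physical offset on /dev/sda2, which is not shown here.

```python
import struct

# Assumed sizes, taken from the on-disk structures in ctree.h.
HEADER_SIZE = 101  # struct btrfs_header
ITEM_SIZE = 25     # struct btrfs_item: 17-byte key + u32 offset + u32 size
CSUM_SIZE = 32     # checksum field at the start of every tree block


def _make_table():
    # Table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
        table.append(c)
    return table


_TABLE = _make_table()


def crc32c(data):
    """Standard CRC-32C of a bytes-like object."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def objectid_offset(slot):
    """Byte offset, within the leaf, of the objectid of the key in `slot`."""
    return HEADER_SIZE + slot * ITEM_SIZE


def patch_leaf(block, slot, bad, good):
    """Return a copy of `block` with the objectid fixed and the csum redone."""
    buf = bytearray(block)
    off = objectid_offset(slot)
    (found,) = struct.unpack_from("<Q", buf, off)  # objectid is a LE u64
    if found != bad:
        raise ValueError(f"slot {slot} holds objectid {found}, expected {bad}")
    struct.pack_into("<Q", buf, off, good)
    # The checksum covers everything after the checksum field itself.
    struct.pack_into("<I", buf, 0, crc32c(bytes(buf[CSUM_SIZE:])))
    return bytes(buf)
```

With these assumptions the objectid of item 12 starts at byte 401 of the
leaf. The field is little-endian, so the damaged byte is the one at
offset 402, which holds 0x12 and should hold 0x96.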

   It's also possible to use Hans van Kranenberg's python-btrfs to fix
up this kind of thing, but I've not done it myself. There should be a
couple of walk-throughs from Hans in various archives -- both this
list (find it on, say, http://www.spinics.net/lists/linux-btrfs/), and
the IRC archives (http://logs.tvrrug.org.uk/logs/%23btrfs/latest.html).

> Sorry for top leveling, not sure how mailing lists work (again sorry
> if this message is top leveled, how do I ensure it's not?)

   Just write your answers _after_ the quoted text that you're
replying to, not before. It's a convention, rather than a technical
thing...

   Hugo.

> ---
> Eric Wolf
> (201) 316-6098
> 19wolf@gmail.com
> 
> 
> On Thu, Aug 31, 2017 at 2:59 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> >    (Please don't top-post; edited for conversation flow)
> >
> > On Thu, Aug 31, 2017 at 02:44:39PM -0400, Eric Wolf wrote:
> >> On Thu, Aug 31, 2017 at 2:33 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> >> > On Thu, Aug 31, 2017 at 01:53:58PM -0400, Eric Wolf wrote:
> >> >> I'm having issues with a bad block(?) on my root ssd.
> >> >>
> >> >> dmesg is consistently outputting "BTRFS critical (device sda2):
> >> >> corrupt leaf, bad key order: block=293438636032, root=1, slot=11"
> >> >>
> >> >> "btrfs scrub stat /" outputs "scrub status for b2c9ff7b-[snip]-48a02cc4f508
> >> >> scrub started at Wed Aug 30 11:51:49 2017 and finished after 00:02:55
> >> >> total bytes scrubbed: 53.41GiB with 2 errors
> >> >> error details: verify=2
> >> >> corrected errors: 0, uncorrectable errors: 2, unverified errors: 0"
> >> >>
> >> >> Running "btrfs check --repair /dev/sda2" from a live system stalls
> >> >> after telling me corrupt leaf etc etc then "11 12". CPU usage hits
> >> >> 100% and disk activity remains at 0.
> >> >
> >> >    This error is usually attributable to bad hardware. Typically RAM,
> >> > but might also be marginal power regulation (blown capacitor
> >> > somewhere) or a slightly broken CPU.
> >> >
> >> >    Can you show us the output of "btrfs-debug-tree -b 293438636032 /dev/sda2"?
> >
> >    Here's the culprit:
> >
> > [snip]
> >> item 10 key (890553 EXTENT_DATA 0) itemoff 14685 itemsize 269
> >>    inline extent data size 248 ram 248 compress 0
> >> item 11 key (890554 INODE_ITEM 0) itemoff 14525 itemsize 160
> >>    inode generation 5386763 transid 5386764 size 135 nbytes 135
> >>    block group 0 mode 100644 links 1 uid 100000 gid 100000
> >>    rdev 0 flags 0x0
> >> item 12 key (856762 INODE_REF 31762) itemoff 14496 itemsize 29
> >>    inode ref index 2745 namelen 19 name: dpkg.statoverride.0
> >> item 13 key (890554 EXTENT_DATA 0) itemoff 14340 itemsize 156
> >>    inline extent data size 135 ram 135 compress 0
> > [snip]
> >
> >    Note the objectid field -- the first number in the brackets after
> > "key" for each item. This sequence of values should be non-decreasing.
> > Thus, item 12 should have an objectid of 890554 to match the items
> > either side of it, and instead it has 856762.
> >
> >    In hex, these are:
> >
> > >>> hex(890554)
> > '0xd96ba'
> > >>> hex(856762)
> > '0xd12ba'
> >
> >    Which means you've had two bitflips close together:
> >
> > >>> hex(856762 ^ 890554)
> > '0x8400'
> >
> >    Given that everything else is OK, and it's just one byte affected
> > in the middle of a load of data that's really quite sensitive to
> > errors, it's very unlikely that it's the result of a misplaced pointer
> > in the kernel, or some other subsystem accidentally walking over that
> > piece of RAM. It is, therefore, almost certainly your hardware that's
> > at fault.
> >
> >    I would strongly suggest running memtest86 on your machine -- I'd
> > usually say a minimum of 8 hours, or longer if you possibly can (24
> > hours), or until you have errors reported. If you get errors reported
> > in the same place on multiple passes, then it's the RAM. If you have
> > errors scattered around seemingly at random, then it's probably your
> > power regulation (PSU or motherboard).
> >
> >    Sadly, btrfs check on its own won't be able to fix this, as it's
> > two bits flipped. (It can cope with one bit flipped in the key, most
> > of the time, but not two). It can be fixed manually, if you're
> > familiar with a hex editor and the on-disk data structures.
> >
> >    Hugo.
> >
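
The key-order reasoning in the quoted message can be reproduced with a
short Python sketch. It assumes that keys compare as (objectid, type,
offset) tuples and that the numeric type values are those defined in
ctree.h (INODE_ITEM = 1, INODE_REF = 12, EXTENT_DATA = 108). The items
are the four shown in the debug-tree output; the helper name is made up
for illustration.

```python
# Key type values as defined in ctree.h (assumed here, not read from disk).
INODE_ITEM, INODE_REF, EXTENT_DATA = 1, 12, 108

# slot -> (objectid, type, offset), copied from the debug-tree output.
items = {
    10: (890553, EXTENT_DATA, 0),
    11: (890554, INODE_ITEM, 0),
    12: (856762, INODE_REF, 31762),
    13: (890554, EXTENT_DATA, 0),
}


def first_bad_slot(items):
    """Return the first slot whose key is not strictly below the next one."""
    slots = sorted(items)
    for a, b in zip(slots, slots[1:]):
        if items[a] >= items[b]:
            return a
    return None


bad = first_bad_slot(items)

# Compare the suspect objectid with its neighbour to see what changed.
flipped = items[12][0] ^ items[11][0]
bits = [i for i in range(64) if flipped >> i & 1]

# The same leaf with item 12's objectid restored.
fixed = dict(items)
fixed[12] = (890554, INODE_REF, 31762)

print(bad, hex(flipped), bits, first_bad_slot(fixed))
```

The first out-of-order pair is slots 11 and 12, which is consistent with
slot=11 in the dmesg message. The two differing bits are 10 and 15, both
in the second-lowest byte of the objectid, and with 890554 restored the
four keys are back in order.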

-- 
Hugo Mills             | "There's a Martian war machine outside -- they want
hugo@... carfax.org.uk | to talk to you about a cure for the common cold."
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                           Stephen Franklin, Babylon 5