linux-btrfs.vger.kernel.org archive mirror
From: Kai Krakow <hurikhan77@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfsck: backpointer mismatch (and multiple other errors)
Date: Sun, 3 Apr 2016 06:02:02 +0200
Message-ID: <20160403060202.03e67651@jupiter.sol.kaishome.de>
In-Reply-To: CAJCQCtQ+RnQqghDs060hrLJ850e0n=FiKNn9W8UuUP3DQV9u5g@mail.gmail.com

On Sat, 2 Apr 2016 18:14:17 -0600,
Chris Murphy <lists@colorremedies.com> wrote:

> On Sat, Apr 2, 2016 at 2:16 PM, Kai Krakow <hurikhan77@gmail.com>
> wrote:
> 
> > I'll go checking the RAM for problems, though that would be the first
> > time in twenty years that a RAM module developed errors it didn't
> > have from the beginning. Well, you never know. But I expect no errors,
> > since bad RAM usually causes all sorts of different and random
> > problems, which I don't have. My problems are very specific, which is
> > atypical for RAM errors.
> 
> Well so far it's just the VDI that's experiencing csum mismatch
> errors, right? So that's not bad RAM, which would affect other files
> too. And same for a failing SSD.

No, other files are affected too. And it looks like those files get
corrupted again easily, even after being removed and restored from
whatever backup source.

> I think you've got a bug somewhere and it's just hard to say where it
> is based on the available information. I've already lost track if
> others have all of the exact same setup you do: bcache + nossd +
> autodefrag + lzo + VirtualBox writing to VDI on this Btrfs volume.
> There are others who have some of those options, but I don't know if
> there's anyone who has all of those going on.

I haven't run VirtualBox since the incident, so I'd rule it out.
Currently there seem to be no csum errors for the VDI file; instead,
another file now gets corrupted, even after being recreated. I think
this is the result of another corruption and thus a side effect.

Also, I think the combination nossd+autodefrag+lzo shouldn't be an
exotic or unsupported set of options, and running it on top of bcache
should just work.

Let's not rule out that bcache had a problem, although in that case I
would usually expect bcache to freak out with internal btree corruption.

> Maybe Qu has some suggestions, but if it were me I'd do this. Build
> mainline 4.5.0, it's a known quantity by Btrfs devs.

4.5.0-gentoo currently carries only a few patches on top of mainline, so
I could easily build vanilla.

> Build the kernel
> with BTRFS_FS_CHECK_INTEGRITY enabled in kernel config. And when you
> mount the file system, don't use mount option check_int, just use your
> regular mount options and try to reproduce the VDI corruption. If you
> can reproduce it, then start over, this time with check_int mount
> option included along with the others you're using and try to
> reproduce. It's possible there will be fairly verbose kernel messages,
> so use boot parameter log_buf_len=1M and then that way you can use
> dmesg rather than depending on journalctl -k which sometimes drops
> messages if there are too many.
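The preparation steps above (kernel built with the integrity checker, a larger log buffer, check_int added to the usual options) can be verified before each run. A minimal Python sketch, assuming the kernel config is available as text (from the build tree's .config, or a decompressed /proc/config.gz if the kernel exposes it) and the boot command line from /proc/cmdline; in .config the symbol is spelled CONFIG_BTRFS_FS_CHECK_INTEGRITY. The helper names and the nossd,autodefrag,compress=lzo option string (taking "lzo" above to mean compress=lzo) are illustrative assumptions, and the sketch only builds the option string without mounting anything.

```python
from typing import Optional


def kernel_option_enabled(config_text: str,
                          symbol: str = "CONFIG_BTRFS_FS_CHECK_INTEGRITY") -> bool:
    """Return True if `symbol` is built in (=y) according to .config text."""
    for line in config_text.splitlines():
        if line.strip() == f"{symbol}=y":
            return True
    return False


def cmdline_log_buf_len(cmdline: str) -> Optional[str]:
    """Return the value of log_buf_len= from a kernel command line, if set."""
    for token in cmdline.split():
        if token.startswith("log_buf_len="):
            return token.split("=", 1)[1]
    return None


def with_integrity_check(base_options: str,
                         level: Optional[str] = "check_int") -> str:
    """Append an integrity-check mount option (check_int or check_int_data)
    to a comma-separated option string, unless it is already present."""
    opts = [o for o in base_options.split(",") if o]
    if level is not None and level not in opts:
        opts.append(level)
    return ",".join(opts)
```

For example, with_integrity_check("nossd,autodefrag,compress=lzo") yields nossd,autodefrag,compress=lzo,check_int for the second round of testing, and passing level=None reproduces the plain first round.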

Does that make sense while the FS still has these corruptions? I'd
like to wait for Qu's advice on whether I should recreate the FS, take
an image, or send info to help improve btrfsck...

I'm pretty sure every corruption I can reproduce is caused by another,
already existing corruption, so check_int would probably be of little
use at the moment.

> If you reproduce the corruption while check_int is enabled, kernel
> messages should have clues and then you can put that in a file and
> attach to the list or open a bug. FWIW, I'm pretty sure your MUA is
> wrapping poorly, when I look at this URL for your post with smartctl
> output, it wraps in a way that's essentially impossible to sort out at
> a glance. Whether it's your MUA or my web browser pretty much doesn't
> matter, it's not legible so what I do is just attach as file to a bug
> report or if small enough onto the list itself.
> http://www.spinics.net/lists/linux-btrfs/msg53790.html
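Collecting the relevant kernel messages into a plain file keeps them legible no matter how a mail client wraps them. A minimal Python sketch, assuming that lines mentioning "btrfs" or "bcache" (case-insensitive) are the interesting ones; the keyword list and any output filename are hypothetical choices, not something prescribed in this thread.

```python
from typing import Iterable, List, Sequence


def filter_kernel_messages(lines: Iterable[str],
                           keywords: Sequence[str] = ("btrfs", "bcache")) -> List[str]:
    """Keep kernel log lines that mention any keyword (case-insensitive),
    stripping trailing newlines."""
    wanted = [k.lower() for k in keywords]
    return [line.rstrip("\n") for line in lines
            if any(k in line.lower() for k in wanted)]


def save_report(lines: Iterable[str], path: str) -> None:
    """Write one message per line to a plain-text file for attaching to a bug."""
    with open(path, "w", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")
```

The input can come from subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines(); reading the ring buffer may require root when kernel.dmesg_restrict is set.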

Claws Mail is just too smart for me... The message looked correct in
the editor before I hit the send button. I wish I could go back to
KNode (which did its job right), but it's currently an unsupported,
orphaned KDE project. :-(

> Finally, I would retest yet again with check_int_data as a mount
> option and try to reproduce. This is reported to be dirt slow, but it
> might capture something that check_int doesn't. But I admit this is
> throwing spaghetti on the wall, and is something of a goose chase just
> because I don't know what else to recommend other than iterating all
> of your mount options from none, adding just one at a time, and trying
> to reproduce. That somehow sounds more tedious. But chances are you'd
> find out what mount option is causing it; OR maybe you'd find out the
> corruption always happens, even with defaults, even without bcache, in
> which case that'd seem to implicate either a gentoo patch, or a
> virtual box bug of some sort.
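The "start from no options and add one at a time" approach can be planned mechanically. A minimal Python sketch, assuming the list below matches the volume's current mount options (taking "lzo" to mean compress=lzo); each generated string would be passed to mount -o for a separate reproduction attempt. It only produces the sequence and mounts nothing.

```python
from typing import List, Sequence


def incremental_option_sets(options: Sequence[str]) -> List[str]:
    """Return one mount option string per test run, starting with no extra
    options and adding the next entry of `options` at each step."""
    runs = []
    for i in range(len(options) + 1):
        # An empty prefix means no extra options; mount accepts "defaults" for that.
        runs.append(",".join(options[:i]) or "defaults")
    return runs


# Hypothetical option list for this volume; adjust to the real fstab entry.
CURRENT_OPTIONS = ["nossd", "autodefrag", "compress=lzo"]
```

With CURRENT_OPTIONS this gives defaults, nossd, nossd,autodefrag and nossd,autodefrag,compress=lzo. The first run that reproduces the corruption points at the option added in that step, although an interaction between options would need further runs with a different ordering.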

I think the latter two are by far the least probable causes. But I'll
give it a try. For the time being, I could switch bcache to write-around
mode, so that at least it could not corrupt btrfs during writes.
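bcache exposes its cache mode in sysfs as cache_mode under /sys/block/<device>/bcache/. Reading it lists all modes with the active one in brackets, and writing a mode name switches it; in writearound mode, writes bypass the cache and go straight to the backing device. A minimal Python sketch, assuming the bcache device is named bcache0 (hypothetical) and that the script runs as root:

```python
from pathlib import Path
from typing import Optional

# Mode names accepted by bcache's cache_mode sysfs attribute.
VALID_MODES = ("writethrough", "writeback", "writearound", "none")


def active_cache_mode(cache_mode_text: str) -> Optional[str]:
    """Return the bracketed (active) mode from the contents of cache_mode,
    e.g. "writethrough [writeback] writearound none" -> "writeback"."""
    for token in cache_mode_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def cache_mode_path(device: str = "bcache0") -> Path:
    """Sysfs path of the cache_mode attribute for a bcache device."""
    return Path("/sys/block") / device / "bcache" / "cache_mode"


def set_cache_mode(device: str = "bcache0", mode: str = "writearound") -> None:
    """Switch a bcache device to `mode` by writing to sysfs (needs root)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown bcache cache mode: {mode}")
    cache_mode_path(device).write_text(mode + "\n")
```

active_cache_mode(cache_mode_path().read_text()) reports the current mode before and after calling set_cache_mode().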

-- 
Regards,
Kai

Replies to list-only preferred.



Thread overview: 22+ messages
2016-03-31 20:44 btrfsck: backpointer mismatch (and multiple other errors) Kai Krakow
2016-03-31 23:27 ` Henk Slager
2016-04-01  1:10   ` Qu Wenruo
2016-04-02  8:47     ` Kai Krakow
2016-04-02  9:00   ` Kai Krakow
2016-04-02 17:17     ` Henk Slager
2016-04-02 20:16       ` Kai Krakow
2016-04-03  0:14         ` Chris Murphy
2016-04-03  4:02           ` Kai Krakow [this message]
2016-04-03  5:06             ` Duncan
2016-04-03 22:19               ` Kai Krakow
2016-04-04  0:51                 ` Chris Murphy
2016-04-04 19:36                   ` Kai Krakow
2016-04-04 19:57                     ` Chris Murphy
2016-04-04 20:50                       ` Kai Krakow
2016-04-04 21:00                         ` Kai Krakow
2016-04-04 23:09                         ` Chris Murphy
2016-04-05  7:05                           ` Kai Krakow
2016-04-04  4:34                 ` Duncan
2016-04-04 19:26                   ` Kai Krakow
2016-04-05  1:44                     ` Duncan
2016-04-03 19:03             ` Chris Murphy
