From: Alexander Wetzel <alexander.wetzel@web.de>
To: Hugo Mills <hugo@carfax.org.uk>, linux-btrfs@vger.kernel.org
Subject: Re: btrfs filesystem corruptions with 4.18. git kernels
Date: Sat, 21 Jul 2018 08:16:40 +0200
Message-ID: <cd28fb92-61ef-45a5-fd18-200b7153eecf@web.de>
In-Reply-To: <20180720231221.GE21293@carfax.org.uk>
>> I'm running my normal workstation with git kernels from git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-testing.git
>> and just got the second file system corruption in three weeks. I do
>> not have issues with stable kernels, and just want to give you a
>> heads up that there might be something seriously broken in current
>> development kernels.
>>
>> The first corruption was with a kernel based on 4.18.0-rc1
>> (wt-2018-06-20) and the second one today based on 4.18.0-rc4
>> (wt-2018-07-09).
>> The first corruption definitely destroyed data; the second one has
>> not been looked at at all yet.
>>
>> After the reinstall I ran some scrubs; the last successful one was a
>> week ago.
>>
>> Of course this could be unrelated to the development kernels or even
>> btrfs, but two corruptions within weeks after years without problems
>> is very suspect.
>> And since btrfs also allowed to read corrupted data (with a stable
>> ubuntu kernel, see below for more details) it looks like this is
>> indeed an issue in btrfs, correct?
>>
>> A btrfs subvolume is used as the rootfs on a "Samsung SSD 850 EVO
>> mSATA 1TB" and I'm running Gentoo ~amd64 on a Thinkpad W530. Discard
>> is enabled as mount option and there were roughly 5 other
>> subvolumes.
>>
>> I'm currently backing up the full btrfs partition after the second
>> corruption which announced itself with the following log entries:
>>
>> [ 979.223767] BTRFS critical (device sdc2): corrupt leaf: root=2
>> block=1029783552 slot=1, unexpected item end, have 16161 expect
>> 16250
>
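
A rough sketch of what the "unexpected item end" message above is
complaining about, as far as I understand the leaf check: item data is
packed from the end of the leaf backwards, so each item has to end
exactly where the previous slot's data begins. The function name, the
item sizes and the 16 KiB node / 101-byte header are assumptions chosen
for illustration, picked only so that the toy numbers line up with the
log line. This is not the kernel code.

```python
from typing import List, Optional, Tuple

# Assumed toy geometry: 16 KiB node minus a 101-byte header.
# Not read from the reported filesystem.
LEAF_DATA_SIZE = 16384 - 101


def check_item_ends(items: List[Tuple[int, int]]) -> Optional[str]:
    """Check a list of (offset, size) pairs, one per slot.

    Slot 0 must end at the end of the data area; every later slot must
    end where the previous slot's data starts. Returns a message for
    the first inconsistent slot, or None if the layout is consistent.
    """
    expected_end = LEAF_DATA_SIZE
    for slot, (offset, size) in enumerate(items):
        end = offset + size
        if end != expected_end:
            return (f"slot={slot}, unexpected item end, "
                    f"have {end} expect {expected_end}")
        expected_end = offset
    return None


# Consistent layout: slot 1 ends exactly where slot 0 begins.
good = [(16250, 33), (16161, 89)]
# Inconsistent layout: slot 1 ends at 16161, leaving a gap before 16250.
bad = [(16250, 33), (16072, 89)]

print(check_item_ends(good))
print(check_item_ends(bad))
```

With the made-up `bad` layout the second call reports slot 1 with
"have 16161 expect 16250", the same shape as the kernel message.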
> This means that the metadata block matches the checksum in its
> header, but is internally inconsistent. This means that the error in
> the block was made before the csum was computed -- i.e., it was that
> way in RAM. This can happen in a couple of different ways, but the
> most likely cause is bad RAM.
>
> In this case, it's not a single bitflip in the metadata page
> itself, so it's more likely to be something writing spurious data on
> the page in RAM that was holding this metadata block. This is either a
> bug in the kernel, or a hardware problem.
>
> I would strongly recommend checking your RAM (memtest86 for a
> minimum of 8 hours, preferably 24).
The system has 24G of RAM, but since the reinstall compiled the
complete OS from scratch (with a stable kernel), I would have expected
to hit the bad RAM there as well, so I more or less dismissed that
possibility. I'll run the tests and report back on that.
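
To make the checksum argument quoted above concrete, here is a minimal
sketch of why a block that was damaged in RAM still passes its
checksum, while damage that happens after the checksum was computed
does not. `zlib.crc32` is only a stand-in for the crc32c that btrfs
uses, and the 4-byte prefix layout and helper names are assumptions
for illustration, not the on-disk format.

```python
import zlib


def write_block(payload: bytes) -> bytes:
    """Prefix the payload with its checksum, as done at write-out time."""
    return zlib.crc32(payload).to_bytes(4, "little") + payload


def checksum_ok(block: bytes) -> bool:
    """Recompute the checksum over the payload and compare to the prefix."""
    stored = int.from_bytes(block[:4], "little")
    return stored == zlib.crc32(block[4:])


payload = b"item data " * 8

# Case 1: something scribbles on the page in RAM *before* the checksum
# is computed. The checksum then covers the bad data and still matches.
in_ram = bytearray(payload)
in_ram[10:14] = b"\xde\xad\xbe\xef"
block_a = write_block(bytes(in_ram))

# Case 2: a bit flips *after* the checksum was computed (for example on
# disk or in transit). The stored checksum no longer matches.
block_b = bytearray(write_block(payload))
block_b[14] ^= 0x01

print(checksum_ok(block_a))         # True: corruption is invisible to the csum
print(checksum_ok(bytes(block_b)))  # False: corruption is caught
```

Case 1 is the situation described for this leaf: the checksum is valid,
so only a structural check on the contents can notice the problem.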
>> [ 979.223808] BTRFS: error (device sdc2) in __btrfs_cow_block:1080:
>> errno=-5 IO failure
>> [ 979.223810] BTRFS info (device sdc2): forced readonly
>> [ 979.224599] BTRFS warning (device sdc2): Skipping commit of
>> aborted transaction.
>> [ 979.224603] BTRFS: error (device sdc2) in
>> cleanup_transaction:1847: errno=-5 IO failure
>>
>> I'll restore the system from a backup - and stick to stable kernels
>> for now - after that, but if needed I can of course also restore the
>> partition backup to another disk for testing.
>
> It may be a kernel issue, but it's not necessarily in btrfs. It
> could be a bug in some other kernel component where it does some
> pointer arithmetic wrong, or uses some uninitialised data as a
> pointer. My money's on bad RAM, though (by a small margin).
>
I also had two out-of-tree kernel modules:
https://github.com/hhfeuer/nvhda and the gentoo packaged version of
https://github.com/mkottman/acpi_call
Alexander