From: David Sterba <dsterba@suse.cz>
To: Stefan Malte Schumacher <s.schumacher@netcologne.de>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Massive filesystem errors - possible new HDD
Date: Fri, 15 Sep 2023 17:11:25 +0200
Message-ID: <20230915151125.GC2747@twin.jikos.cz>
In-Reply-To: <CAA3ktqmgtdDGsubOiCZR+vS=5J3Wf2Hu8vi-t1z48zZB18mC0A@mail.gmail.com>
On Fri, Sep 15, 2023 at 12:31:44PM +0200, Stefan Malte Schumacher wrote:
> Hello,
>
> I have some serious problems with my btrfs-filesystem. It started with
> a smart error and "btrfs fi show" reporting the drive as missing about
> once every two weeks. The error went away after a reboot.
>
> Device: /dev/sda [SAT], unable to open ATA device
> Device info: WDC WUH722020ALE6L4, S/N:2LG7DJJK,
> WWN:5-000cca-2b3c35da5, FW:PQGNW108, 20.0 TB
>
> This is the latest hard disk I bought about four weeks ago. Half an
> hour ago I got the error message again, switched the monitor input to
> my file server and watched it boot. Booting produced some serious
> btrfs errors, but would finish. The filesystem is mounted and I can
> create files on it, but dmesg shows massive errors. I have pasted a
> selection of the errors after my message.
>
> Is this - in your opinion - a logical error of the filesystem or
> should I immediately exchange the new drive?
This kind of error looks like a problem on the hardware side; the
filesystem is merely detecting it. If the device disappears, the cause
could be loose cables, the controller or the disk itself, possibly
depending on the load.
> Should I try to scrub my
> data? I have a backup but it's rather recent, meaning it could include
> corrupted files because I also bought a new NAS in addition to the new
> drive for the fileserver since I urgently needed space on both. Note:
> The former /dev/sda now is /dev/sdb after the reboot.
Better not to use it for anything other than reading the remaining data.
> Thanks in advance and yours faithfully
> Stefan Malte Schumacher
>
> Errors from journalctl. Repeat probably from the point the drive was
> not recognized any more.
> Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
> errs: wr 176296937, rd 88151246, flush 477, corrupt 0, gen 0
> Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
> errs: wr 176296938, rd 88151246, flush 477, corrupt 0, gen 0
> Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
> errs: wr 176296938, rd 88151247, flush 477, corrupt 0, gen 0
> Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
> errs: wr 176296938, rd 88151248, flush 477, corrupt 0, gen 0
> Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
> errs: wr 176296939, rd 88151248, flush 477, corrupt 0, gen 0
Lots of missed writes and reads, plus some flush errors (i.e. failed
super block writes).
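
To see how fast the counters grow, the "errs:" lines can be parsed and
compared. A minimal Python sketch, assuming each log line has been
unwrapped to a single line; the names parse_errs and counter_delta are
made up for this illustration and are not part of any btrfs tool:

```python
import re

# Matches the per-device counter line that btrfs logs, e.g.
# "bdev /dev/sda errs: wr 176296937, rd 88151246, flush 477, corrupt 0, gen 0"
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+)\s+errs: wr\s+(?P<wr>\d+), rd\s+(?P<rd>\d+), "
    r"flush\s+(?P<flush>\d+), corrupt\s+(?P<corrupt>\d+), gen\s+(?P<gen>\d+)"
)


def parse_errs(line):
    """Return (device, counters) for an 'errs:' line, or None if no match."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m.group("dev"), counters


def counter_delta(first, last):
    """Per-counter difference between two snapshots of the same device."""
    return {name: last[name] - first[name] for name in first}


# First and last counter lines from the journalctl excerpt, unwrapped.
log = [
    "Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda "
    "errs: wr 176296937, rd 88151246, flush 477, corrupt 0, gen 0",
    "Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda "
    "errs: wr 176296939, rd 88151248, flush 477, corrupt 0, gen 0",
]

dev, first = parse_errs(log[0])
_, last = parse_errs(log[-1])
delta = counter_delta(first, last)
print(dev, delta)
```

For the two lines above the delta is 2 write errors and 2 read errors
within the same second, with flush, corrupt and gen unchanged.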
> Sep 15 11:49:59 mars kernel: scrub_handle_errored_block: 16462
> callbacks suppressed
> Sep 15 11:49:59 mars kernel: BTRFS warning (device sdc): i/o error at
> logical 104647456002048 on dev /dev/sda, physical 3193421500416, root
> 5, inode 7253233, offset 67033587712, length 4096, links 1 (path:
> Film>
> Sep 15 11:49:59 mars kernel: BTRFS warning (device sdc): i/o error at
> logical 104647456100352 on dev /dev/sda, physical 3193421598720, root
> 5, inode 7253233, offset 67033686016, length 4096, links 1 (path:
> Film>
> Sep
>
> dmesg after reboot:
> [ 128.675658] BTRFS warning (device sdc): super block error on device
> /dev/sdb, physical 65536
> [ 128.675674] BTRFS error (device sdc): bdev /dev/sdb errs: wr
> 177062143, rd 88533832, flush 479, corrupt 1, gen 0
> [ 128.683734] BTRFS warning (device sdc): super block error on device
> /dev/sdb, physical 67108864
> [ 128.684228] BTRFS error (device sdc): bdev /dev/sdb errs: wr
> 177062143, rd 88533832, flush 479, corrupt 2, gen 0
> [ 128.687400] BTRFS warning (device sdc): super block error on device
> /dev/sdb, physical 274877906944
> [ 128.687956] BTRFS error (device sdc): bdev /dev/sdb errs: wr
> 177062143, rd 88533832, flush 479, corrupt 3, gen 0
Here "corrupt" means that garbage was read from the disk, which could
mean that the sector was e.g. zeroed (as if it had been replaced from
the internal HDD spare pool), that stale data was found, or that there
was a crc mismatch.
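
The "physical" values in the super block warnings are the three fixed
locations where btrfs keeps super block copies: 64KiB, 64MiB and 256GiB.
A minimal Python sketch of that arithmetic, modelled on how the kernel
derives the mirror offsets; the constant and function names here are
chosen for illustration only:

```python
# Primary super block lives at 64KiB; mirror N is at 16KiB << (12 * N).
SUPER_INFO_OFFSET = 64 * 1024
SUPER_MIRROR_MAX = 3
SUPER_MIRROR_SHIFT = 12


def sb_offset(mirror):
    """Byte offset of super block copy number `mirror` (0, 1 or 2)."""
    if mirror == 0:
        return SUPER_INFO_OFFSET
    return (16 * 1024) << (SUPER_MIRROR_SHIFT * mirror)


offsets = [sb_offset(i) for i in range(SUPER_MIRROR_MAX)]

# "physical" values copied from the dmesg excerpt, per device.
reported = {
    "/dev/sdb": [65536, 67108864, 274877906944],
    "/dev/sdc": [274877906944, 67108864],
}

for dev, physicals in reported.items():
    for physical in physicals:
        # offsets.index() raises ValueError if the value is not a known copy.
        print(dev, physical, "super block copy", offsets.index(physical))
```

So /dev/sdb reported errors on all three copies, while /dev/sdc reported
them on the two mirror copies only.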
> [ 128.688552] BTRFS info (device sdc): scrub: started on devid 8
> [ 128.688561] BTRFS info (device sdc): scrub: started on devid 9
> [ 128.688596] BTRFS info (device sdc): scrub: started on devid 10
> [ 128.709283] BTRFS info (device sdc): scrub: started on devid 11
> [ 128.709320] BTRFS info (device sdc): scrub: started on devid 12
> [ 128.720429] BTRFS warning (device sdc): super block error on device
> /dev/sdc, physical 274877906944
> [ 128.721452] BTRFS error (device sdc): bdev /dev/sdc errs: wr 0, rd
> 0, flush 0, corrupt 1, gen 0
> [ 128.723426] BTRFS warning (device sdc): super block error on device
> /dev/sdc, physical 67108864
> [ 128.724651] BTRFS error (device sdc): bdev /dev/sdc errs: wr 0, rd
> 0, flush 0, corrupt 2, gen 0
> [ 160.484366] BTRFS error (device sdc): space cache generation
> (1395658) does not match inode (1395902)
> [ 209.258959] BTRFS warning (device sdc): failed to load free space
> cache for block group 49783599267840, rebuilding it now
> [ 210.267125] BTRFS error (device sdc): space cache generation
> (1395658) does not match inode (1395664)
> [ 210.272770] BTRFS warning (device sdc): failed to load free space
> cache for block group 53557835333632, rebuilding it now
> [ 210.333332] BTRFS warning (device sdc): failed to load free space
> cache for block group 53599711264768, rebuilding it now
> [ 210.682350] BTRFS warning (device sdc): failed to load free space
> cache for block group 53763993763840, rebuilding it now
> [ 210.693379] BTRFS warning (device sdc): failed to load free space
> cache for block group 53766141247488, rebuilding it now
> [ 211.035247] BTRFS warning (device sdc): failed to load free space
> cache for block group 53904653942784, rebuilding it now
> [ 212.349634] BTRFS warning (device sdc): failed to load free space
> cache for block group 54425418727424, rebuilding it now
> [ 212.760877] io_ctl_check_generation: 5 callbacks suppressed
> [ 212.760887] BTRFS error (device sdc): space cache generation
> (1395658) does not match inode (1396097)
> [ 212.768138] BTRFS warning (device sdc): failed to load free space
> cache for block group 54615471030272, rebuilding it now
The free space cache load warnings are just a consequence of the
previous errors. That part is recoverable, but at this point the
HDD/filesystem has worse problems.