linux-btrfs.vger.kernel.org archive mirror
* Massive filesystem errors - possible new HDD
@ 2023-09-15 10:31 Stefan Malte Schumacher
  2023-09-15 15:11 ` David Sterba
  0 siblings, 1 reply; 2+ messages in thread
From: Stefan Malte Schumacher @ 2023-09-15 10:31 UTC (permalink / raw)
  To: Btrfs BTRFS

Hello,

I have some serious problems with my btrfs filesystem. It started with
a SMART error and "btrfs fi show" reporting the drive as missing about
once every two weeks. The error went away after a reboot.

Device: /dev/sda [SAT], unable to open ATA device
Device info: WDC  WUH722020ALE6L4, S/N:2LG7DJJK,
WWN:5-000cca-2b3c35da5, FW:PQGNW108, 20.0 TB

This is the latest hard disk, which I bought about four weeks ago. Half
an hour ago I got the error message again, switched the monitor input
to my file server and watched it boot. Booting produced some serious
btrfs errors, but it did finish. The filesystem is mounted and I can
create files on it, but dmesg shows massive errors. I have pasted a
selection of the errors after my message.

Is this, in your opinion, a logical error of the filesystem, or should
I replace the new drive immediately? Should I try to scrub my data? I
have a backup, but it is rather recent and could therefore include
corrupted files: I bought a new NAS along with the new drive for the
file server, since I urgently needed space on both. Note: the former
/dev/sda is now /dev/sdb after the reboot.

Thanks in advance and yours faithfully
Stefan Malte Schumacher

Errors from journalctl. They probably repeat from the point where the
drive was no longer recognized.
Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
errs: wr 176296937, rd 88151246, flush 477, corrupt 0, gen 0
Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
errs: wr 176296938, rd 88151246, flush 477, corrupt 0, gen 0
Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
errs: wr 176296938, rd 88151247, flush 477, corrupt 0, gen 0
Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
errs: wr 176296938, rd 88151248, flush 477, corrupt 0, gen 0
Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
errs: wr 176296939, rd 88151248, flush 477, corrupt 0, gen 0
Sep 15 11:49:59 mars kernel: scrub_handle_errored_block: 16462
callbacks suppressed
Sep 15 11:49:59 mars kernel: BTRFS warning (device sdc): i/o error at
logical 104647456002048 on dev /dev/sda, physical 3193421500416, root
5, inode 7253233, offset 67033587712, length 4096, links 1 (path:
Film>
Sep 15 11:49:59 mars kernel: BTRFS warning (device sdc): i/o error at
logical 104647456100352 on dev /dev/sda, physical 3193421598720, root
5, inode 7253233, offset 67033686016, length 4096, links 1 (path:
Film>
Sep

dmesg after reboot:
[  128.675658] BTRFS warning (device sdc): super block error on device
/dev/sdb, physical 65536
[  128.675674] BTRFS error (device sdc): bdev /dev/sdb errs: wr
177062143, rd 88533832, flush 479, corrupt 1, gen 0
[  128.683734] BTRFS warning (device sdc): super block error on device
/dev/sdb, physical 67108864
[  128.684228] BTRFS error (device sdc): bdev /dev/sdb errs: wr
177062143, rd 88533832, flush 479, corrupt 2, gen 0
[  128.687400] BTRFS warning (device sdc): super block error on device
/dev/sdb, physical 274877906944
[  128.687956] BTRFS error (device sdc): bdev /dev/sdb errs: wr
177062143, rd 88533832, flush 479, corrupt 3, gen 0
[  128.688552] BTRFS info (device sdc): scrub: started on devid 8
[  128.688561] BTRFS info (device sdc): scrub: started on devid 9
[  128.688596] BTRFS info (device sdc): scrub: started on devid 10
[  128.709283] BTRFS info (device sdc): scrub: started on devid 11
[  128.709320] BTRFS info (device sdc): scrub: started on devid 12
[  128.720429] BTRFS warning (device sdc): super block error on device
/dev/sdc, physical 274877906944
[  128.721452] BTRFS error (device sdc): bdev /dev/sdc errs: wr 0, rd
0, flush 0, corrupt 1, gen 0
[  128.723426] BTRFS warning (device sdc): super block error on device
/dev/sdc, physical 67108864
[  128.724651] BTRFS error (device sdc): bdev /dev/sdc errs: wr 0, rd
0, flush 0, corrupt 2, gen 0
[  160.484366] BTRFS error (device sdc): space cache generation
(1395658) does not match inode (1395902)
[  209.258959] BTRFS warning (device sdc): failed to load free space
cache for block group 49783599267840, rebuilding it now
[  210.267125] BTRFS error (device sdc): space cache generation
(1395658) does not match inode (1395664)
[  210.272770] BTRFS warning (device sdc): failed to load free space
cache for block group 53557835333632, rebuilding it now
[  210.333332] BTRFS warning (device sdc): failed to load free space
cache for block group 53599711264768, rebuilding it now
[  210.682350] BTRFS warning (device sdc): failed to load free space
cache for block group 53763993763840, rebuilding it now
[  210.693379] BTRFS warning (device sdc): failed to load free space
cache for block group 53766141247488, rebuilding it now
[  211.035247] BTRFS warning (device sdc): failed to load free space
cache for block group 53904653942784, rebuilding it now
[  212.349634] BTRFS warning (device sdc): failed to load free space
cache for block group 54425418727424, rebuilding it now
[  212.760877] io_ctl_check_generation: 5 callbacks suppressed
[  212.760887] BTRFS error (device sdc): space cache generation
(1395658) does not match inode (1396097)
[  212.768138] BTRFS warning (device sdc): failed to load free space
cache for block group 54615471030272, rebuilding it now


* Re: Massive filesystem errors - possible new HDD
  2023-09-15 10:31 Massive filesystem errors - possible new HDD Stefan Malte Schumacher
@ 2023-09-15 15:11 ` David Sterba
  0 siblings, 0 replies; 2+ messages in thread
From: David Sterba @ 2023-09-15 15:11 UTC (permalink / raw)
  To: Stefan Malte Schumacher; +Cc: Btrfs BTRFS

On Fri, Sep 15, 2023 at 12:31:44PM +0200, Stefan Malte Schumacher wrote:
> Hello,
> 
> I have some serious problems with my btrfs-filesystem. It started with
> a smart error and "btrfs fi show" reporting the drive as missing about
> once every two weeks. The error went away after a reboot.
> 
> Device: /dev/sda [SAT], unable to open ATA device
> Device info: WDC  WUH722020ALE6L4, S/N:2LG7DJJK,
> WWN:5-000cca-2b3c35da5, FW:PQGNW108, 20.0 TB
> 
> This is the latest hard disk I bought about four weeks ago. Half an
> hour ago I got the error message again, switched the monitor input to
> my file server and watched it boot. Booting produced some serious
> btrfs errors, but would finish. The filesystem is mounted and I can
> create files on it, but dmesg shows massive errors. I have pasted a
> selection of the errors after my message.
> 
> Is this - in your opinion - a logical error of the filesystem or
> should I immediately exchange the new drive?

Errors of this kind look like a problem on the hardware side; the
filesystem is merely detecting it. If the device disappears, the cause
could be loose cables, the controller, or the disk itself, depending on
the load.

> Should I try to scrub my
> data? I have a backup but it's rather recent, meaning it could include
> corrupted files because I also bought a new NAS in addition to the new
> drive for the fileserver since I urgently needed space on both. Note:
> The former /dev/sda now is /dev/sdb after the reboot.

Better not to use it for anything other than reading the remaining data.

> Thanks in advance and yours faithfully
> Stefan Malte Schumacher
> 
> Errors from journalctl. Repeat probably from the point the drive was
> not recognized any more.
> Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
> errs: wr 176296937, rd 88151246, flush 477, corrupt 0, gen 0
> Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
> errs: wr 176296938, rd 88151246, flush 477, corrupt 0, gen 0
> Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
> errs: wr 176296938, rd 88151247, flush 477, corrupt 0, gen 0
> Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
> errs: wr 176296938, rd 88151248, flush 477, corrupt 0, gen 0
> Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
> errs: wr 176296939, rd 88151248, flush 477, corrupt 0, gen 0

Lots of missed writes and reads, plus some flush errors (i.e. failed
super block writes).
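
To see how fast these counters are growing, the "bdev ... errs:" lines
can be pulled out of the log and compared. The following is only a
minimal sketch: parse_errs() and counter_growth() are hypothetical
helper names, the regular expression is an assumption based on the log
lines quoted in this thread, and the sample uses the first journalctl
entry, the last journalctl entry, and the first dmesg entry after the
reboot (where the poster notes that /dev/sda became /dev/sdb).

```python
import re

# Assumed layout of the per-device counter line, taken from the log
# excerpts quoted in this thread.
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_errs(log_text):
    """Return a list of (device, counters) for each 'bdev ... errs:' entry."""
    # The mail client wrapped long log lines, so collapse all whitespace
    # (including newlines) into single spaces before matching.
    flat = " ".join(log_text.split())
    entries = []
    for match in ERRS_RE.finditer(flat):
        fields = match.groupdict()
        device = fields.pop("dev")
        entries.append((device, {k: int(v) for k, v in fields.items()}))
    return entries


def counter_growth(entries):
    """Difference of every counter between the first and the last entry."""
    first, last = entries[0][1], entries[-1][1]
    return {name: last[name] - first[name] for name in first}


SAMPLE = """
Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
errs: wr 176296937, rd 88151246, flush 477, corrupt 0, gen 0
Sep 15 11:49:59 mars kernel: BTRFS error (device sdc): bdev /dev/sda
errs: wr 176296939, rd 88151248, flush 477, corrupt 0, gen 0
[  128.675674] BTRFS error (device sdc): bdev /dev/sdb errs: wr
177062143, rd 88533832, flush 479, corrupt 1, gen 0
"""

entries = parse_errs(SAMPLE)
for device, counters in entries:
    print(device, counters)
print("growth:", counter_growth(entries))
```

The sketch treats all entries as belonging to the same physical disk,
which only holds here because of the rename mentioned by the poster;
with several failing devices the entries would need to be grouped first.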

> Sep 15 11:49:59 mars kernel: scrub_handle_errored_block: 16462
> callbacks suppressed
> Sep 15 11:49:59 mars kernel: BTRFS warning (device sdc): i/o error at
> logical 104647456002048 on dev /dev/sda, physical 3193421500416, root
> 5, inode 7253233, offset 67033587712, length 4096, links 1 (path:
> Film>
> Sep 15 11:49:59 mars kernel: BTRFS warning (device sdc): i/o error at
> logical 104647456100352 on dev /dev/sda, physical 3193421598720, root
> 5, inode 7253233, offset 67033686016, length 4096, links 1 (path:
> Film>
> Sep
> 
> dmesg after reboot:
> [  128.675658] BTRFS warning (device sdc): super block error on device
> /dev/sdb, physical 65536
> [  128.675674] BTRFS error (device sdc): bdev /dev/sdb errs: wr
> 177062143, rd 88533832, flush 479, corrupt 1, gen 0
> [  128.683734] BTRFS warning (device sdc): super block error on device
> /dev/sdb, physical 67108864
> [  128.684228] BTRFS error (device sdc): bdev /dev/sdb errs: wr
> 177062143, rd 88533832, flush 479, corrupt 2, gen 0
> [  128.687400] BTRFS warning (device sdc): super block error on device
> /dev/sdb, physical 274877906944
> [  128.687956] BTRFS error (device sdc): bdev /dev/sdb errs: wr
> 177062143, rd 88533832, flush 479, corrupt 3, gen 0

Here "corrupt" means that garbage was read from the disk. That could
mean that the sector was, e.g., zeroed (as when it is replaced from the
HDD's internal spare pool), that stale data was found, or that the CRC
did not match.
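
The three "physical" values in the super block errors above are not
random: they line up with the locations where btrfs keeps its super
block copies (64 KiB, 64 MiB and 256 GiB). The sketch below checks that
arithmetic. superblock_offset() is a hypothetical helper written for
this illustration, and the shift-based formula reflects my understanding
of the on-disk layout rather than a quote from the kernel source.

```python
KIB = 1024

# Primary super block copy; matches "physical 65536" in the log.
PRIMARY_OFFSET = 64 * KIB


def superblock_offset(mirror):
    """Byte offset of super block copy `mirror` (0 is the primary)."""
    if mirror == 0:
        return PRIMARY_OFFSET
    # Assumed layout: 16 KiB shifted left by 12 bits per mirror index,
    # giving 64 MiB for mirror 1 and 256 GiB for mirror 2.
    return (16 * KIB) << (12 * mirror)


# "physical" values reported for /dev/sdb in the dmesg output.
reported = [65536, 67108864, 274877906944]
expected = [superblock_offset(i) for i in range(3)]

for mirror, (got, want) in enumerate(zip(reported, expected)):
    status = "matches" if got == want else "differs from"
    print(f"copy {mirror}: reported {got} {status} expected {want}")
```

Under that assumption, all three super block copies on /dev/sdb failed
verification, while /dev/sdc reported errors only for the copies at
64 MiB and 256 GiB.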

> [  128.688552] BTRFS info (device sdc): scrub: started on devid 8
> [  128.688561] BTRFS info (device sdc): scrub: started on devid 9
> [  128.688596] BTRFS info (device sdc): scrub: started on devid 10
> [  128.709283] BTRFS info (device sdc): scrub: started on devid 11
> [  128.709320] BTRFS info (device sdc): scrub: started on devid 12
> [  128.720429] BTRFS warning (device sdc): super block error on device
> /dev/sdc, physical 274877906944
> [  128.721452] BTRFS error (device sdc): bdev /dev/sdc errs: wr 0, rd
> 0, flush 0, corrupt 1, gen 0
> [  128.723426] BTRFS warning (device sdc): super block error on device
> /dev/sdc, physical 67108864
> [  128.724651] BTRFS error (device sdc): bdev /dev/sdc errs: wr 0, rd
> 0, flush 0, corrupt 2, gen 0
> [  160.484366] BTRFS error (device sdc): space cache generation
> (1395658) does not match inode (1395902)
> [  209.258959] BTRFS warning (device sdc): failed to load free space
> cache for block group 49783599267840, rebuilding it now
> [  210.267125] BTRFS error (device sdc): space cache generation
> (1395658) does not match inode (1395664)
> [  210.272770] BTRFS warning (device sdc): failed to load free space
> cache for block group 53557835333632, rebuilding it now
> [  210.333332] BTRFS warning (device sdc): failed to load free space
> cache for block group 53599711264768, rebuilding it now
> [  210.682350] BTRFS warning (device sdc): failed to load free space
> cache for block group 53763993763840, rebuilding it now
> [  210.693379] BTRFS warning (device sdc): failed to load free space
> cache for block group 53766141247488, rebuilding it now
> [  211.035247] BTRFS warning (device sdc): failed to load free space
> cache for block group 53904653942784, rebuilding it now
> [  212.349634] BTRFS warning (device sdc): failed to load free space
> cache for block group 54425418727424, rebuilding it now
> [  212.760877] io_ctl_check_generation: 5 callbacks suppressed
> [  212.760887] BTRFS error (device sdc): space cache generation
> (1395658) does not match inode (1396097)
> [  212.768138] BTRFS warning (device sdc): failed to load free space
> cache for block group 54615471030272, rebuilding it now

The free space load warnings are just a consequence of the previous
errors. That part is recoverable, but at this point the HDD/filesystem
has worse problems.
