public inbox for linux-btrfs@vger.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Carsten Grommel <c.grommel@profihost.ag>,
	Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: AW: How to (attempt to) repair these btrfs errors
Date: Mon, 7 Mar 2022 15:11:36 +0800	[thread overview]
Message-ID: <5379ca8e-0384-b447-52c1-a41ef0ded7e7@gmx.com> (raw)
In-Reply-To: <AM0PR08MB3265F930C35B1AFBA7E981B18E089@AM0PR08MB3265.eurprd08.prod.outlook.com>



On 2022/3/7 15:03, Carsten Grommel wrote:
> Thank you for the answer. We are using space_cache v2:
>
> /dev/sdc1 on /vmbackup type btrfs (rw,noatime,nobarrier,compress-force=zlib:3,ssd_spread,noacl,space_cache=v2,skip_balance,subvolid=5,subvol=/,x-systemd.mount-timeout=4h)
>
>> Data is raid0, so data repair is not possible.  Delete all the files
>> that contain corrupt data.
>
> I tried, but as soon as I access the broken blocks btrfs falls back to read-only, so I am kind of stuck in a deadlock there.

Btrfs only falls back to RO for very critical errors (ones that could
affect on-disk metadata consistency).

Thus plain data corruption should not cause the RO fallback.

Would you mind sharing a dmesg captured just after the RO fallback?

Thanks,
Qu
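For reference, a filter like the one below can pull the relevant BTRFS lines out of a saved kernel log. This is a minimal sketch: a sample line from this thread stands in for the live dmesg output, and the grep pattern is an assumption about which messages matter.

```shell
# Sketch: keep only BTRFS error/warning lines from kernel log output.
# Against the live system you would use something like:
#   dmesg -T | grep -E 'BTRFS (error|warning)' > btrfs-ro.log
# Here, sample lines from this thread stand in for the live log.
printf '%s\n' \
  '[Mon Feb 28 18:54:23 2022] BTRFS error (device sda1): unable to fixup (regular) error at logical 776693776384 on dev /dev/sdd1' \
  '[Mon Feb 28 18:54:25 2022] scrub_handle_errored_block: 21812 callbacks suppressed' |
  grep -E 'BTRFS (error|warning)'
```

Run just after the RO fallback (or via `journalctl -k`), this captures the lines closest to the moment the filesystem went read-only.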

>
>> I don't see any errors in these logs that would indicate a metadata issue,
>> but huge numbers of messages are suppressed.  Perhaps a log closer
>> to the moment when the filesystem goes read-only will be more useful.
>
>> I would expect that if there are no problems on sda1 or sdb1 then it
>> should be possible to repair the metadata errors on sdd1 by scrubbing
>> that device.
>
> I have run a number of scrubs now; at some point it always fails and btrfs remounts read-only.
> I did not yet try to scrub specifically on sdd, though; I am going to try that.
>
> Should it remount again I will provide the most recent dmesg output from right before it crashes.
>
> ________________________________________
> From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> Sent: Sunday, March 6, 2022 02:36
> To: Carsten Grommel
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: How to (attempt to) repair these btrfs errors
>
> On Tue, Mar 01, 2022 at 10:55:50AM +0000, Carsten Grommel wrote:
>> Follow-up pastebin with the most recent errors in dmesg:
>>
>> https://pastebin.com/4yJJdQPJ
>
> This seems to have expired.
>
>> ________________________________________
>> From: Carsten Grommel
>> Sent: Monday, February 28, 2022 19:41
>> To: linux-btrfs@vger.kernel.org
>> Subject: How to (attempt to) repair these btrfs errors
>>
>> Hi,
>>
>> Short background: a btrfs filesystem used for storing Ceph RBD backups within subvolumes got corrupted.
>> The underlying storage is three RAID 6 arrays; btrfs is mounted on top as RAID 0 across these arrays for performance (we have to store massive amounts of data).
>>
>> Linux cloud8-1550 5.10.93+2-ph #1 SMP Fri Jan 21 07:52:51 UTC 2022 x86_64 GNU/Linux
>>
>> But it was Kernel 5.4.121 before
>>
>> btrfs --version
>> btrfs-progs v4.20.1
>>
>> btrfs fi show
>> Label: none  uuid: b634a011-28fa-41d7-8d6e-3f68ccb131d0
>>                  Total devices 3 FS bytes used 56.74TiB
>>                  devid    1 size 25.46TiB used 22.70TiB path /dev/sda1
>>                  devid    2 size 25.46TiB used 22.69TiB path /dev/sdb1
>>                  devid    3 size 25.46TiB used 22.70TiB path /dev/sdd1
>>
>> btrfs fi df /vmbackup/
>> Data, RAID0: total=66.62TiB, used=56.45TiB
>> System, RAID1: total=8.00MiB, used=4.36MiB
>> Metadata, RAID1: total=750.00GiB, used=294.90GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>> Attached is dmesg.log; a few dmesg messages regarding the different errors follow (some information redacted):
>>
>> [Mon Feb 28 18:53:57 2022] BTRFS error (device sda1): bdev /dev/sdd1 errs: wr 0, rd 0, flush 0, corrupt 69074516, gen 184286
>>
>> [Mon Feb 28 18:53:57 2022] BTRFS error (device sda1): bdev /dev/sdd1 errs: wr 0, rd 0, flush 0, corrupt 69074517, gen 184286
>>
>> [Mon Feb 28 18:54:23 2022] BTRFS error (device sda1): unable to fixup (regular) error at logical 776693776384 on dev /dev/sdd1
>>
>> [Mon Feb 28 18:54:25 2022] scrub_handle_errored_block: 21812 callbacks suppressed
>>
>> [Mon Feb 28 18:54:31 2022] BTRFS warning (device sda1): checksum error at logical 777752285184 on dev /dev/sdd1, physical 259607957504, root 108747, inode 257, offset 59804737536, length 4096, links 1 (path: cephstorX_vm-XXX-disk-X-base.img_1645337735)
>>
>> I am able to mount the filesystem in read-write mode, but accessing specific blocks seems to make btrfs remount read-only.
>> I am currently running a scrub over the filesystem.
>>
>> The system got rebooted and the fs got remounted 2-3 times. In my experience btrfs would and could usually fix these kinds of errors after a remount, but not this time.
>>
>> Before I run "btrfs check --repair" I would like some advice on how to tackle these errors.
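A safer first step than `--repair` is a read-only consistency check (`btrfs check` does not write unless `--repair` is given; `--readonly` makes that explicit). A hedged sketch, with the device name taken from this thread and the command echo-guarded so the sketch itself does nothing:

```shell
# Sketch: read-only consistency check before attempting any repair.
# /dev/sda1 is one device of the filesystem from this thread; the
# filesystem should be unmounted when check is run for real.
dev=/dev/sda1
echo btrfs check --readonly "$dev"   # drop the leading 'echo' to actually run it
# -> prints: btrfs check --readonly /dev/sda1
```

`--repair` is widely advised as a last resort only, after the read-only output has been reviewed.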
>
> The corruption and generation event counts indicate sdd1 (or one of its
> component devices) was offline for a long time or suffered corruption
> on a large scale.
>
> Data is raid0, so data repair is not possible.  Delete all the files
> that contain corrupt data.
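The files to delete can be listed from the scrub/checksum warnings themselves, since each one ends in a `(path: ...)` clause. A minimal sketch, using the sample warning from this thread in place of a saved dmesg; note the printed path is relative to the containing subvolume, not the mount point:

```shell
# Sketch: extract the affected file names from checksum-error warnings,
# so they can be deleted and restored from backup.
# In practice, feed this the saved dmesg instead of the sample line.
echo '[Mon Feb 28 18:54:31 2022] BTRFS warning (device sda1): checksum error at logical 777752285184 on dev /dev/sdd1, physical 259607957504, root 108747, inode 257, offset 59804737536, length 4096, links 1 (path: cephstorX_vm-XXX-disk-X-base.img_1645337735)' |
  sed -n 's/.*(path: \(.*\))$/\1/p' | sort -u
# -> prints: cephstorX_vm-XXX-disk-X-base.img_1645337735
```

`sort -u` collapses the duplicates that appear when the same file is hit by many checksum errors.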
>
> If you are using space_cache=v1, now is a good time to upgrade to
> space_cache=v2.  v1 space cache is stored in the data profile, and it has
> likely been corrupted.  btrfs will usually detect and repair corruption
> in space_cache=v1, but there is no need to take any such risk here
> when you can easily use v2 instead (or at least clear the v1 cache).
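Which cache version is active can be read straight off the mount options. A small sketch, with a sample `/proc/mounts`-style line (modeled on the mount line in this thread) standing in for the real file:

```shell
# Sketch: check whether the filesystem already uses space_cache=v2.
# In practice, read the real line from /proc/mounts for the mount point.
line='/dev/sdc1 /vmbackup btrfs rw,noatime,compress-force=zlib:3,space_cache=v2,subvol=/ 0 0'
case "$line" in
  *space_cache=v2*) echo 'space_cache=v2 already in use' ;;
  *)                echo 'consider mounting once with -o clear_cache,space_cache=v2' ;;
esac
# -> prints: space_cache=v2 already in use
```

A one-time mount with `clear_cache` discards the (possibly corrupt) v1 cache; `space_cache=v2` then builds the free-space tree, which is stored as metadata rather than in the data profile.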
>
> I don't see any errors in these logs that would indicate a metadata issue,
> but huge numbers of messages are suppressed.  Perhaps a log closer
> to the moment when the filesystem goes read-only will be more useful.
>
> I would expect that if there are no problems on sda1 or sdb1 then it
> should be possible to repair the metadata errors on sdd1 by scrubbing
> that device.
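Scrub accepts a device path instead of a mount point, which limits it to the suspect device. A hedged sketch (device name from this thread; the command is echo-guarded so the sketch itself does nothing):

```shell
# Sketch: scrub only the suspect device rather than the whole filesystem.
# -B runs the scrub in the foreground; -d reports statistics per device.
dev=/dev/sdd1
echo btrfs scrub start -Bd "$dev"   # drop the leading 'echo' to actually run it
# -> prints: btrfs scrub start -Bd /dev/sdd1
```

With RAID1 metadata, scrub can rewrite bad metadata blocks on sdd1 from the good copies on sda1/sdb1; the raid0 data errors will still be reported as uncorrectable.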
>
>> Kind regards
>> Carsten Grommel
>>


Thread overview: 11+ messages
2022-02-28 18:41 How to (attempt to) repair these btrfs errors Carsten Grommel
2022-03-01 10:55 ` AW: " Carsten Grommel
2022-03-06  1:36   ` Zygo Blaxell
2022-03-07  7:03     ` AW: " Carsten Grommel
2022-03-07  7:11       ` Qu Wenruo [this message]
2022-03-07  7:25         ` AW: " Carsten Grommel
2022-03-07  7:27           ` Carsten Grommel
2022-03-07  7:39             ` Qu Wenruo
2022-03-07  7:34           ` Qu Wenruo
2022-03-07  7:48             ` AW: " Carsten Grommel
2022-03-07  8:00               ` Qu Wenruo
