From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Kai Krakow <hurikhan77@gmail.com>,
linux-btrfs <linux-btrfs@vger.kernel.org>,
Oliver Wien <ow@netactive.de>
Subject: Re: btrfs crashes during routine btrfs-balance-least-used
Date: Mon, 15 Jul 2024 07:23:57 +0930
Message-ID: <0bedfc5f-4658-4d01-98b3-34bc14f736f3@gmx.com>
In-Reply-To: <CAMthOuPjg5RDT-G_LXeBBUUtzt3cq=JywF+D1_h+JYxe=WKp-Q@mail.gmail.com>
On 2024/7/15 01:43, Kai Krakow wrote:
> Hello btrfs list!
>
> (also reported in irc)
>
> Our btrfs pool crashed during a routine btrfs-balance-least-used.
> Maybe of interest: bees is also running on this filesystem, snapper
> takes hourly snapshots with retention policy.
>
> I'm currently still collecting diagnostics, "btrfs check" log is
> already 3 GB and growing.
>
> The btrfs runs on three devices vd{c,e,f}1 with data=single meta=raid1.
>
> Here's an excerpt from dmesg (full log https://gist.tnonline.net/TE):
Unfortunately, the "full" log is not actually complete.
There should be an extent leaf dump and, after that dump, a line showing
the reason why we believe it's a problem.
Is there a truly complete dmesg dump?
Overall, though, most of the errors raised inside __btrfs_free_extent()
point to extent tree corruption.
> [...]
>
> "btrfs check" can only run in lowmem mode, it will crash with "out of
> memory" (the system has 74G of RAM). Here's the beginning of the log:
>
> [1/7] checking root items
> [2/7] checking extents
> ERROR: shared extent 15929577472 referencer lost (parent: 1147747794944)
I believe that's the cause: some form of extent tree corruption.
> ERROR: shared extent 15929577472 referencer lost (parent: 1148095201280)
> ERROR: shared extent 15929577472 referencer lost (parent: 1175758274560)
> (repeating thousands of similar lines)
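With a check log of several GB, it may help to condense the repeated lines before sharing them. The following is only a minimal sketch, not part of btrfs-progs: it assumes the log consists of lines in exactly the format quoted above, and the function name `summarize_lost_refs` is made up for illustration.

```python
import re
from collections import Counter

# Line format copied from the "btrfs check" excerpt quoted above.
LOST_REF = re.compile(
    r"ERROR: shared extent (\d+) referencer lost \(parent: (\d+)\)"
)


def summarize_lost_refs(lines):
    """Count 'referencer lost' errors per extent bytenr.

    `lines` can be any iterable of strings, e.g. an open file object,
    so a multi-GB log is streamed instead of loaded into memory.
    """
    per_extent = Counter()
    for line in lines:
        match = LOST_REF.search(line)
        if match:
            per_extent[int(match.group(1))] += 1
    return per_extent
```

Hypothetical usage: `with open("check.log", errors="replace") as log: counts = summarize_lost_refs(log)`, then print `counts.most_common(10)` to see which extents account for most of the errors.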
>
> Last gist: https://gist.tnonline.net/Z4 (meanwhile, this log is over
> 3GB, I can upload it somewhere later).
>
> We have backups (daily backups stored inside borg on a remote host).
>
> Is there anything we can do? Restoring from backup will probably take
> more than 24h (3 TB). The system runs web and mail hosts for more than
> 100 customers.
>
> We did not try to run "btrfs check --repair" yet, nor
> "--init-extent-tree". I'd rather try a quick repair before restoring.
> But OTOH, I don't want to make it worse and waste time by trying.
Considering the size of the metadata, I do not believe either --repair or
--init-extent-tree is going to fully fix the problem.
>
> Unfortunately, the btrfs has been mounted rw again after unmounting
> following the incident. This restarted the balance, and it seems it
> changed the first error "btrfs check" found. I'll try
> "ro,skip-balance" after btrfs-check finished. I think the file-system
> is still fully readable and we can take one last backup.
>
> Also, I happily provide the logs collected if a dev wanted to look into it.
I guess there is no truly complete dmesg of that incident?
The corrupted extent leaf has 260 items, but the dump contains only 36 of
them and is missing the final reason line.
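A quick way to check whether a captured dmesg contains the whole leaf dump is to compare the item count announced in the leaf header with the number of item lines that actually follow. This is a rough sketch under assumptions: the two regular expressions are modelled on how the kernel's leaf printer usually formats its output ("leaf ... total ptrs N" followed by "item K key (...)" lines) and should be verified against your own log; `leaf_dump_coverage` is a hypothetical helper name.

```python
import re

# ASSUMED line formats, modelled on the kernel's leaf dump output;
# verify them against the actual dmesg before trusting the result.
LEAF_HEADER = re.compile(r"leaf (\d+) gen (\d+) total ptrs (\d+)")
ITEM_LINE = re.compile(r"\bitem (\d+) key \(")


def leaf_dump_coverage(lines):
    """Return (expected, seen) item counts for the first leaf dump found.

    `expected` is the "total ptrs" value from the leaf header, or None
    if no header was found; `seen` is the number of distinct item
    indexes printed after it.
    """
    expected = None
    seen = set()
    for line in lines:
        if expected is None:
            header = LEAF_HEADER.search(line)
            if header:
                expected = int(header.group(3))
            continue
        if LEAF_HEADER.search(line):
            break  # a second leaf dump starts here; stop counting
        item = ITEM_LINE.search(line)
        if item:
            seen.add(int(item.group(1)))
    return expected, len(seen)
```

If the function returns something like `(260, 36)`, the log was truncated (for example by the kernel ring buffer or log rate limiting), and a more complete source such as the persistent journal is worth checking.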
The other thing is: does the server have ECC memory?
It's not uncommon for bitflips to cause various problems (we see reports
almost monthly).
If the machine doesn't have ECC memory, running a memtest would be advisable.
Thanks,
Qu
>
>
> Thanks in advance
> Kai
>