From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Pierre Abbat <phma@bezitopo.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Computer stalled, apparently from filesystem corruption
Date: Mon, 4 Apr 2022 19:05:44 -0400
Message-ID: <Ykt5yORl1OsMRODL@hungrycats.org>
In-Reply-To: <3205109.rnzMqkiUVr@puma>

On Sun, Apr 03, 2022 at 02:14:34AM -0400, Pierre Abbat wrote:
> On Friday, March 11, 2022 9:48:35 PM EDT Zygo Blaxell wrote:
> > There's no indication of corruption in those logs. Above the kernel
> > is complaining that it's taking too long to finish transactions, which
> > could be a btrfs problem, or a hardware problem, or even simply a large
> > filesystem running normally on very slow disks. Not enough information
> > to tell.
>
> It couldn't be a very slow disk, because all the drives are NVM or SSD.

Those can be slow too. It's up to the firmware and the health of the
underlying flash media, and some devices also implement throttling when
over temperature.

For example, the Samsung 860 EVO has a firmware quirk that makes it
significantly slower than spinning drives under continuous load. The drive
drops to 2 iops every 5 seconds, an almost complete stop.

> > When posting logs, extract all lines with 'btrfs' on them, plus context
> > lines, e.g.
> >
> > grep -B9 -i btrfs /var/log/kern.log
> >
> > or
> >
> > dmesg | grep -B9 -i btrfs
> >
> > If you can reproduce the hang, enable sysrq and do Alt-SysRq-W when it
> > hangs (or run
> >
> > echo w > /proc/sysrq-trigger
> >
> > from a command line). This will provide stack traces of all blocked
> > processes so we can see what the transaction is waiting for.
>
> Here are the whole sections of the logfiles from the first error until the
> computer hung, compressed. I'm afraid to run the rsync script again because it
> might hang the computer.

It's probably not the rsync--I've retired quite a few btrfs bugs triggered
by rsync, and your stack traces are pointing to system calls that rsync
never uses (though rsync is caught up in the hang, which affects everything
writing to the filesystem).

> Is there a way to find out if the filesystem access
> hung on a bad sector or something?

Generally that would eventually trigger a timeout which would be reported
by the device layer, but some device drivers fail to recover from those.

If the device is truly hung, then you wouldn't be able to access the
device while the filesystem is locked up. Try this command (with "..."
replaced with the device name):

	dd if=/dev/... of=/dev/null bs=4k status=progress

If it shows progress as data is read from the device, then the device
isn't hung. If it shows very slow progress (like <100K per 5 seconds)
then the device isn't hung, but it is having some kind of problem
that needs to be addressed (pick a different device model, add a heat
sink, replace a failing device, replace a failing HBA, whatever).
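
A rough sketch of that check as a script, in case it helps. The function
names and the fixed sample window are my own illustrative assumptions, and
the threshold is just the "100K per 5 seconds" figure above. It reads from
the device for a few seconds through timeout(1), counts the bytes that
arrived, and classifies the result:

```shell
#!/bin/sh
# classify_read BYTES SECONDS
# Compare the observed read rate with the rough threshold of
# 100K per 5 seconds (bytes/secs < 102400/5).
classify_read() {
    bytes=$1
    secs=$2
    if [ "$bytes" -eq 0 ]; then
        echo "no-progress"      # nothing came back: the device may be hung
    elif [ $((bytes * 5)) -lt $((102400 * secs)) ]; then
        echo "slow"             # responding, but far too slowly
    else
        echo "ok"
    fi
}

# sample_read DEVICE [SECONDS]   (hypothetical helper, needs read access)
# Assumes the device holds more data than can be read in the window,
# and ignores page-cache effects on repeated runs.
sample_read() {
    dev=$1
    secs=${2:-5}
    # dd writes to stdout when of= is omitted; wc -c counts what arrived
    # before timeout stopped it.
    bytes=$(timeout "$secs" dd if="$dev" bs=4k 2>/dev/null | wc -c)
    classify_read "$bytes" "$secs"
}
```

Usage would be something like "sample_read /dev/... 5" with the device name
filled in. If dd gets stuck in uninterruptible sleep, timeout cannot kill it
and the script itself will stall, which is also an answer.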

This looks like a more conventional btrfs deadlock bug, though:
everything seems to be waiting for a fsync on a directory. You might
try a different kernel, either the previous LTS (5.10) or the next one
(5.15) or the current release (5.17) to see if it is already resolved.

If it still happens on a current release, try

	echo w > /proc/sysrq-trigger
	echo d > /proc/sysrq-trigger

and post the logs. Maybe someone can figure out what the deadlock is.
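
A minimal sketch of collecting those logs in one step, assuming a root
shell is still usable during the hang. The function names and the default
output file name are hypothetical, and the output path should be somewhere
that is not on the hung filesystem:

```shell
#!/bin/sh
# filter_btrfs: keep lines mentioning btrfs plus 9 lines of leading
# context, the same grep used earlier in this thread. Reads stdin.
filter_btrfs() {
    grep -B9 -i btrfs
}

# capture_hang_traces [OUTFILE]   (hypothetical helper, must run as root)
capture_hang_traces() {
    out=${1:-hang-traces.log}       # illustrative default name
    echo w > /proc/sysrq-trigger    # stack traces of blocked tasks
    echo d > /proc/sysrq-trigger    # held locks (only if the kernel has lockdep)
    dmesg > "$out"                  # save the full kernel log
}
```

Post the full file if possible, since the blocked-task traces usually do
not mention btrfs on every line. "filter_btrfs < hang-traces.log" is only
useful for a quick look at the btrfs messages themselves.
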
> Pierre
>
> --
> Lanthanidia deliciosa: What the kiwifruit would be
> if it weren't so radioactive.