From: Martin Steigerwald <martin@lichtvoll.de>
To: linux-btrfs@vger.kernel.org
Subject: Re: [REGRESSION] Hang during backup with rsync
Date: Fri, 01 May 2015 12:13:55 +0200
Message-ID: <2075166.qtr43mmylJ@merkaba>
In-Reply-To: <pan$b4d5f$f3e14022$274533d9$ff98a48f@cox.net>
On Friday, 1 May 2015, 01:48:23, Duncan wrote:
> Martin Steigerwald posted on Thu, 30 Apr 2015 19:29:57 +0200 as excerpted:
> > The hang was: Mouse pointer in KDE not movable anymore, Ctrl-Alt-F1
> > had no effect. I waited for a minute at least. Maybe it would have
> > reacted after a longer time, but I wanted my machine back. Disks were
> > idle, if I remember correctly. After reboot both filesystems mount okay.
>
> This response is in regard to what to do at an apparent hang, and has
> nothing directly to do with btrfs...
>
> Two comments:
>
> 1) Depending on your graphics hardware and driver config, a modern
> "KMS" (kernel modesetting) setup is more likely to "soft" hang in X mode
> and not switch back to text mode, even when the system is otherwise not
> hung and a VT switch would have worked fine pre-KMS-era.
>
> While I'm no kernel or graphics expert, the problem from here /seems/ to
> be that a modern KMS kernel generally uses high-res framebuffer mode at
> the CLI as well. Because the kernel handles framebuffer and mode
> switching in a unified way for both X and CLI modes, switching from X to
> CLI no longer involves switching to the entirely separate VGA mode
> driver, nor the forced hardware reset that came with it. Without that
> driver switch and forced reset, even if the switch actually succeeds in
> terms of what you might type, what is actually displayed may remain
> frozen. If you only have a local session, you generally have to reboot
> anyway. But if you already have a CLI login going in the VT you tried to
> switch to, or can login blind, sometimes you can at least manage a
> controlled reboot by doing an init 6 or systemctl reboot or whatever,
> even if the display is frozen and shows nothing. Of course it doesn't
> always work, but given the chance to avoid an unclean shutdown, try it
> and see.
>
> So no response at an attempted VT switch (your ctrl-alt-F1) doesn't mean
> what it used to...
I have never read about this. Also, it is not obvious to me why a hardware
reset would be needed if the embedded Intel gfx is already initialized
properly. I do not believe that it was the GPU that hung.

I assume a simpler explanation: that the X.org process was in D state and
thus not able to respond to the keypress anymore. Or that the kernel was
stuck in a way that it didn't do anything anymore. Next time I may try to
ping the machine from my other laptop, because in my experience it doesn't
even respond to a ping anymore in that case.
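
The D state theory could be checked from a second TTY or an SSH session,
as long as the machine still responds at all. The following is only a
minimal sketch, assuming the usual Linux /proc layout; the function names
parse_stat and blocked_tasks are made up for illustration.

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state

def blocked_tasks(proc="/proc"):
    """List (pid, comm) of all processes currently in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling blocked_tasks() a few times in a row and seeing X.org or rsync
listed every time would support the theory. A single hit proves little,
since processes pass through D state briefly during normal I/O.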
> 2) Along the same lines, there's the kernel's magic-sysrequest
> (sysrq/srq) functionality. Assuming you have it enabled in your
> kernel, you can try a series of alt-sysrq-key sequences and very
> possibly use that to avoid an entirely uncontrolled shutdown, even when
> major functionality up to and including all of userspace is
> non-functional.
I didn't try these, although I am aware they exist. I didn't think of them
and I haven't memorized them. Maybe I will dig up some kind of reference
card to stick somewhere I can look at it in that case.

Thing is, I wanted to have the machine back. Now. So I took the quickest
way out. Yet, I still wanted to report what I could gather easily enough
in a short time.

Thank you for your detailed explanation. I may just print your mail as a
reference :)
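
Such a reference card could also be generated rather than dug up. This is
a small sketch only: the key meanings follow the kernel's sysrq
documentation, while SYSRQ_CARD and render_card are hypothetical names
chosen for this example.

```python
# K/REISUB reference card: (key, what the kernel does when it is pressed
# together with Alt-SysRq).
SYSRQ_CARD = [
    ("k", "SAK: kill all programs on the current virtual console"),
    ("r", "take the keyboard out of raw mode"),
    ("e", "send SIGTERM to all processes except init"),
    ("i", "send SIGKILL to all processes except init"),
    ("s", "sync all mounted filesystems"),
    ("u", "remount all filesystems read-only"),
    ("b", "reboot immediately, without syncing or unmounting"),
]

def render_card(card=SYSRQ_CARD):
    """Return the card as plain text, one key per line."""
    lines = ["Alt-SysRq-<key>: K on its own, or R E I S U B in order"]
    for key, action in card:
        lines.append("  %s  %s" % (key.upper(), action))
    return "\n".join(lines)
```

Printing render_card() once and sticking the result next to the keyboard
covers the case where nothing on the screen can be looked up anymore.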
But my plan for the next backup attempt is to quit X11 and run the backup
on TTY1, while also logging into TTY2 and TTY3 so that I can possibly
still issue some commands to gather further debug information. For that,
those sysrq combinations may be helpful.
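
From one of those extra TTYs, two things could be prepared in advance:
checking which sysrq keys the keyboard is allowed to trigger, and
requesting a dump of blocked tasks without the keyboard. What follows is
a hedged sketch, not a tested tool. The bit values are the ones listed in
the kernel's sysrq documentation; keyboard_allows and trigger are
hypothetical helper names, and trigger needs root on a real system.

```python
# Bit values of /proc/sys/kernel/sysrq; the exact value 1 enables everything.
SYSRQ_BITS = {
    "k": 4,    # keyboard control (SAK, unraw)
    "r": 4,
    "w": 8,    # debugging dumps of processes (w: blocked tasks)
    "t": 8,
    "s": 16,   # sync
    "u": 32,   # remount read-only
    "e": 64,   # signalling of processes (term, kill)
    "i": 64,
    "b": 128,  # reboot/poweroff
}

def keyboard_allows(mask, key):
    """Would Alt-SysRq-<key> be honoured with this sysrq mask?"""
    if mask == 1:
        return True
    return bool(mask & SYSRQ_BITS[key])

def trigger(key, path="/proc/sysrq-trigger"):
    """Request a sysrq action by writing its key to the trigger file.

    The mask only restricts keyboard invocation; writes to the trigger
    file are accepted from root regardless of it.
    """
    with open(path, "w") as f:
        f.write(key)
```

For example, trigger("w") asks the kernel to log all tasks in
uninterruptible sleep, and the result can then be read with dmesg if the
system is still alive enough to run it.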
> So, when I see descriptions of apparent system hangs such as yours,
> above, a big thing I look for is whether the K/REISUB magic-srq
> sequences were tried, and if so, at which step, if any, the kernel
> responded.
>
> * If the user was in X and the secure-term K sequence worked, the
> problem wasn't too bad, and may have been a graphics system issue.
>
> * If the S and R sequences worked, then the problem was worse, but
> either wasn't storage related, or at least was minor enough that the
> kernel felt it safe to sync and remount.
>
> * If only the B sequence responded, then at least the kernel was still
> alive, but it considered the situation serious enough that it dare not
> do the sync/remount writes lest it risk scribbling on other partitions,
> etc.
>
> * If not even the B sequence responded, then the kernel was effectively
> dead as well, and the problem was very serious indeed!
>
> Unfortunately, the above hang description doesn't mention trying magic
> sysrq at all, and assuming you didn't try them, not only did you
> potentially needlessly endanger your data (if the S/R steps would have
> worked), but now we are missing that key bit of information about how
> badly the kernel /itself/ thought things were.
While I do think that these key combinations can be helpful for further
debugging, I doubt they would have done anything to ensure data
integrity, because…

… BTRFS was hung. And in my past experience, issuing a "sync" command
from the shell, when that was still possible, just put the "sync"
process into D state, and that was it.

When this happens, usually after some time various parts of the KDE
desktop stop responding, as their processes try to write data to the
BTRFS filesystem and get stuck in uninterruptible sleep.
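
Because a hung "sync" takes the calling shell down with it, any debug
script that wants to probe the filesystem should not wait on the command
unconditionally. A minimal sketch of that idea follows, with
finishes_within as a made-up name. It polls with a deadline instead of
calling wait(), on the assumption that a child in D state cannot be
killed and would block a plain wait forever.

```python
import subprocess
import time

def finishes_within(cmd, seconds, poll_interval=0.1):
    """Start cmd and report whether it exits within `seconds`.

    Never waits on the child without a deadline: a process stuck in
    D state ignores signals, so waiting for it would hang the caller too.
    """
    child = subprocess.Popen(cmd)
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        if child.poll() is not None:
            return True  # child has exited
        time.sleep(poll_interval)
    # One last check; a still-running child is simply left behind.
    return child.poll() is not None
```

Something like finishes_within(["sync"], 30) returning False would then
be a cheap hint that the filesystem is stuck, without losing the TTY
the check was started from.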
Journaling and copy-on-write filesystems are supposed to deal with a sudden
interruption of write operations just fine, and it is a bug if they are
corrupted afterwards. The only risk would be unwritten data, but as I
assumed BTRFS was frozen, and the backtraces seem to suggest that as well,
it probably wouldn't have written a single bit anymore anyway, unless I had
waited for it to eventually come out of the hang after some time. And that
is time I didn't want to invest at that moment.
What was new this time, compared to the regular BTRFS hangs that still
happen when BTRFS has allocated all space of the devices into chunks, is
that even the mouse pointer was frozen. Also, in this case clearly not all
space of the devices was allocated into chunks, so what I have seen is a
different issue.
Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7