From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: unable to handle kernel paging request - btrfs
Date: Fri, 23 Sep 2016 04:58:53 +0000 (UTC)
Message-ID: <pan$afe1d$3cd285bd$f0b68da9$67150f63@cox.net>
In-Reply-To: CAGfcS_m0KvAjmWpnGC7agLYBKROqjPoSCVrziaEJXzW1Dyw5iQ@mail.gmail.com

Rich Freeman posted on Thu, 22 Sep 2016 07:18:35 -0500 as excerpted:

> I have been getting panics consistently after doing a btrfs replace
> operation on a raid1 and rebooting. I linked a photo of the panic; I
> haven't been able to get a text capture of it.
>
> https://ibin.co/2vx0HhDeViu3.jpg
>
> I'm getting this error on the latest 4.4, 4.1, and even on an old
> 3.18.26 kernel I had lying around.
>
> I tried the remove root_log_ctx from ctx list before btrfs_sync_log
> returns patch on 4.1 and that did not solve my problem either.
>
> I'm able to boot into single-user mode and if I don't start any
> processes the system seems fairly stable. I am also able to start a
> btrfs balance and run that for several hours without issue. If I start
> launching services the system will tend to panic, though how many
> processes I can launch will vary. I don't think that it is a particular
> file being accessed that is triggering the issue since the point where
> it fails varies. I suspect it may be load-related.
>
> Mounting with compress=no doesn't seem to help either. Granted, I see
> lzo_decompress in the backtrace and that is probably a read operation.
>
> Any suggestions? Google hasn't been helpful on this one...

Btrfs raid1, you say, and the backtrace shows it trying to read existing
compressed files?

That sounds like an issue I see sometimes and have posted about: after a
crash that leaves one device of my raid1 pair behind the other, the
kernel will crash if it hits too many csum errors, even though it's
/supposed/ to check the other copy and read from that if it's valid
(which it is, since a btrfs scrub resolves the issue).

When booted to rescue/single-user mode, can you run a scrub? If it's the
csum-related problem I see and the replace worked, the scrub should
complete fine, repairing the bad copy from the mirror, and the problem
should be resolved. If the replace bugged out and you now have only one
copy of some chunks, scrub obviously won't be able to repair errors in
those chunks from a good mirror, but it should at least report some csum
errors it can't repair.

If the scrub crashes too, or completes without finding any errors to
correct, or finds and corrects errors but the panics persist, then it's
unlikely to be the issue I've seen.
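
The scrub outcomes above can be laid out as a small decision table. What
follows is only a sketch of that reasoning in Python, not anything btrfs
provides: the function name interpret_scrub and its parameters are made up
for illustration, and the counts are assumed to be read by hand from the
output of `btrfs scrub status <mountpoint>` once the scrub has finished.

```python
def interpret_scrub(crashed, corrected, uncorrectable, panics_persist):
    """Map a scrub outcome onto the cases described above.

    All four arguments are hypothetical inputs filled in by hand:
    crashed:        True if the scrub itself took the kernel down
    corrected:      number of errors scrub reports as corrected
    uncorrectable:  number of errors scrub reports as uncorrectable
    panics_persist: True if the panics still occur after the scrub
    """
    if crashed:
        return "probably a different issue: the scrub crashed too"
    if uncorrectable > 0:
        # Scrub found bad data with no valid second copy to repair from.
        return "some chunks lack a good mirror: the replace may have bugged out"
    if corrected == 0:
        return "probably a different issue: nothing to correct"
    if panics_persist:
        return "probably a different issue: errors corrected but panics persist"
    return "consistent with the csum issue: bad copy repaired from the mirror"


if __name__ == "__main__":
    # Example: scrub fixed 12 errors, none uncorrectable, panics stopped.
    print(interpret_scrub(False, 12, 0, False))
```

The uncorrectable check comes before the others because a chunk with only
one copy is a separate problem from the crash-on-csum-error bug, and it
would need attention either way.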

FWIW, the issue I've seen appears to be related to attempts to read
compressed files. It doesn't appear to affect users who have no such
files, or who have them but don't access them in ordinary operation.
It affects raid1 and likely raid10; it may or may not affect other
profiles, but those two make it easiest to verify, because one copy can
get out of sync with the other and scrub can confirm that as the problem
by repairing the bad copy from the good one. The kernel should do the
same repair dynamically at read time, but that's where the bug is: too
many csum errors at read time trigger a crash even when there's a second
copy available that scrub later verifies as valid.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman