Re: unable to handle kernel paging request - btrfs

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: unable to handle kernel paging request - btrfs
Date: Fri, 23 Sep 2016 04:58:53 +0000 (UTC)
Message-ID: <pan$afe1d$3cd285bd$f0b68da9$67150f63@cox.net>
In-Reply-To: CAGfcS_m0KvAjmWpnGC7agLYBKROqjPoSCVrziaEJXzW1Dyw5iQ@mail.gmail.com

Rich Freeman posted on Thu, 22 Sep 2016 07:18:35 -0500 as excerpted:

> I have been getting panics consistently after doing a btrfs replace
> operation on a raid1 and rebooting.  I linked a photo of the panic; I
> haven't been able to get a text capture of it.
> 
> https://ibin.co/2vx0HhDeViu3.jpg
> 
> I'm getting this error on the latest 4.4, 4.1, and even on an old
> 3.18.26 kernel I had lying around.
> 
> I tried the remove root_log_ctx from ctx list before btrfs_sync_log
> returns patch on 4.1 and that did not solve my problem either.
> 
> I'm able to boot into single-user mode and if I don't start any
> processes the system seems fairly stable.  I am also able to start a
> btrfs balance and run that for several hours without issue.  If I start
> launching services the system will tend to panic, though how many
> processes I can launch will vary.  I don't think that it is a particular
> file being accessed that is triggering the issue since the point where
> it fails varies.  I suspect it may be load-related.
> 
> Mounting with compress=no doesn't seem to help either.  Granted, I see
> lzo_decompress in the backtrace and that is probably a read operation.
> 
> Any suggestions?  Google hasn't been helpful on this one...

Btrfs raid1, you say, and the backtrace shows it trying to read existing
compressed files?

That sounds like an issue I sometimes see and have posted about: after a
crash leaves one device of my raid1 pair behind the other, the kernel
will crash if it sees too many csum errors, even though it's /supposed/
to check the other copy and read from that if it's valid (which it is, as
a btrfs scrub resolves the issue).

When booted to rescue/single-user mode, can you run a scrub?  If it's the
csum-related problem I see and the replace worked, a scrub should
complete fine, repairing the bad copy from the mirror, and the problem
should be resolved.  If the replace bugged out and left you with only one
copy of some chunks, scrub obviously won't be able to repair errors in
those chunks from a good mirror, but it should at least report some csum
errors it can't repair.

If a scrub crashes too, if it completes without finding any errors to 
correct, or if it finds and corrects errors but the issue persists, then 
it's unlikely to be the issue I've seen.
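
To make that decision tree explicit, here's a rough Python sketch.  It's
purely illustrative: ScrubResult, diagnose() and the result strings are
names I made up, and you'd fill in the fields by hand from what you
observe after something like "btrfs scrub start -B <mountpoint>".
Nothing here parses real btrfs output.

```python
from dataclasses import dataclass


@dataclass
class ScrubResult:
    """Hand-recorded outcome of one scrub run (hypothetical structure)."""
    crashed: bool = False       # the kernel crashed while scrub was running
    corrected: int = 0          # csum errors repaired from the good mirror
    uncorrectable: int = 0      # csum errors with no valid second copy
    still_panics: bool = False  # panics continue under load after the scrub


def diagnose(r: ScrubResult) -> str:
    """Map a scrub outcome onto the cases described above."""
    if r.crashed:
        return "unlikely the csum issue: scrub itself crashed"
    if r.uncorrectable > 0:
        return "replace may have left some chunks with a single copy"
    if r.corrected == 0:
        return "unlikely the csum issue: nothing to correct"
    if r.still_panics:
        return "unlikely the csum issue: errors fixed but panics persist"
    return "consistent with the csum issue: bad copy repaired from mirror"


print(diagnose(ScrubResult(corrected=12)))
```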

FWIW, the issue I've seen appears to be related to attempts to read
compressed files.  It does not appear to affect users who don't have any
such files, or who have them but simply don't access them in ordinary
operation.  It may or may not affect modes other than raid1 and (likely)
raid10, but those make it easiest to verify, because one copy can get out
of sync with the other and scrub can confirm that as the problem by
repairing the bad copy from the good one.  The kernel should do the same
thing dynamically at read time, but that's where the bug is: too many
dynamic csum errors trigger a crash even when there's a second copy
available that scrub later verifies as valid.
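
For anyone unfamiliar with what "read from the other copy" means, here's
a toy Python model of the intended behavior.  It is not kernel code: the
function names are invented, a block is just a list of two byte strings,
and zlib.crc32 stands in for the checksum (btrfs defaults to crc32c).

```python
import zlib
from typing import List, Tuple


def checksum(data: bytes) -> int:
    # Stand-in checksum for this sketch; btrfs itself defaults to crc32c.
    return zlib.crc32(data)


def read_with_fallback(copies: List[bytes], expected: int) -> Tuple[bytes, List[int]]:
    """Return the first copy matching the checksum, plus indices of bad copies seen."""
    bad = []
    for i, data in enumerate(copies):
        if checksum(data) == expected:
            return data, bad
        bad.append(i)
    raise OSError("no copy matches the checksum")


def scrub_block(copies: List[bytes], expected: int) -> int:
    """Overwrite every bad copy with a good one; return how many were repaired."""
    good, _ = read_with_fallback(copies, expected)
    repaired = 0
    for i, data in enumerate(copies):
        if checksum(data) != expected:
            copies[i] = good
            repaired += 1
    return repaired


good = b"extent data v2"
mirror = [b"extent data v1", good]   # device 0 fell behind after a crash
data, bad = read_with_fallback(mirror, checksum(good))
print(data == good, bad, scrub_block(mirror, checksum(good)))
```

A read should behave like read_with_fallback() and carry on; the crash
I'm describing happens on that path, while the scrub_block() equivalent
works fine.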

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

