From: Eric Sandeen <sandeen@sandeen.net>
To: Juerg Haefliger <juergh@gmail.com>
Cc: xfs@oss.sgi.com
Subject: Re: Daily crash in xfs_cmn_err
Date: Tue, 30 Oct 2012 14:02:39 -0500
Message-ID: <5090244F.7060900@sandeen.net>
In-Reply-To: <CADLDEKv8Y0Z+4gfJYhxi=-CejkkdaGBjtx1og6n8G6z1o9xpSA@mail.gmail.com>
On 10/30/12 3:58 AM, Juerg Haefliger wrote:
> On Mon, Oct 29, 2012 at 1:53 PM, Dave Chinner <david@fromorbit.com> wrote:
>> On Mon, Oct 29, 2012 at 11:55:15AM +0100, Juerg Haefliger wrote:
>>> Hi,
>>>
>>> I have a node that used to crash every day at 6:25am in xfs_cmn_err
>>> (Null pointer dereference).
>>
>> Stack trace, please.
>
>
> [128185.204521] BUG: unable to handle kernel NULL pointer dereference
> at 00000000000000f8
...
>
> mp passed to xfs_cmn_err was a Null pointer and mp->m_fsname in the
> printk line caused the crash (offset of m_fsname is 0xf8).
>
> Error message extracted from the dump:
> XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1449 of file
> fs/xfs/xfs_alloc.c.
>
> And the comments in the source:
> 1444 /*
> 1445 * If this failure happens the request to free this
> 1446 * space was invalid, it's (partly) already free.
> 1447 * Very bad.
> 1448 */
> 1449 XFS_WANT_CORRUPTED_GOTO(gtbno >= bno + len, error0);

and:

    #define XFS_WANT_CORRUPTED_GOTO(x,l) \
    ...
            XFS_ERROR_REPORT("XFS_WANT_CORRUPTED_GOTO", \
                             XFS_ERRLEVEL_LOW, NULL); \

so it explicitly passes a NULL to XFS_ERROR_REPORT(), which sends
it down the xfs_error_report->xfs_cmn_err path and boom.
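
To spell out where the 0xf8 in the oops comes from: with mp == NULL,
reading mp->m_fsname touches address 0 plus the offset of that member.
Here is a minimal userspace sketch of that arithmetic. It assumes a
made-up struct fake_mount whose padding only exists to put m_fsname at
offset 0xf8 (the offset reported above); it is not the real
struct xfs_mount layout, and fsname_fault_address() is a hypothetical
helper, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct xfs_mount. The padding is sized so
 * that m_fsname lands at offset 0xf8, matching the offset quoted in
 * the report. The real structure has many real fields here instead.
 */
struct fake_mount {
	char	pad[0xf8];
	char	*m_fsname;
};

/*
 * Compute the address that a read of mp->m_fsname would touch,
 * without actually dereferencing mp.
 */
static uintptr_t fsname_fault_address(const struct fake_mount *mp)
{
	return (uintptr_t)mp + offsetof(struct fake_mount, m_fsname);
}
```

On the usual platforms where a null pointer converts to 0,
fsname_fault_address(NULL) comes out as 0xf8, which lines up with the
faulting address in the BUG line.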

So you have a persistent on-disk corruption, but it's causing
this to blow up due to an old bug.

I think it got fixed in 2.6.39; there it finds its way to
__xfs_printk() which does:

        if (mp && mp->m_fsname) {
                printk("%sXFS (%s): %pV\n", level, mp->m_fsname, vaf);
                return;
        }

and so handles the null mp situation.
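
For illustration only, the same guard can be tried in userspace. This
is a sketch under assumptions, not the kernel code: struct demo_mount
and format_xfs_msg() are hypothetical names, snprintf() stands in for
printk(), a plain string stands in for the %pV va_format argument, and
the fallback format without a filesystem name is my assumption about
what the NULL branch should print.

```c
#include <stdio.h>

/* Hypothetical stand-in for the one xfs_mount field the guard looks at. */
struct demo_mount {
	const char	*m_fsname;
};

/*
 * Format a log line into buf. If mp or mp->m_fsname is NULL, fall back
 * to a message without the filesystem name instead of dereferencing.
 * Returns whatever snprintf() returns.
 */
static int format_xfs_msg(char *buf, size_t len, const char *level,
			  const struct demo_mount *mp, const char *msg)
{
	if (mp && mp->m_fsname)
		return snprintf(buf, len, "%sXFS (%s): %s\n",
				level, mp->m_fsname, msg);

	/* NULL mp (or unnamed mount): no dereference, generic prefix. */
	return snprintf(buf, len, "%sXFS: %s\n", level, msg);
}
```

Passing mp == NULL here, as the XFS_WANT_CORRUPTED_GOTO path does,
just takes the second branch rather than faulting.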

Anyway, I'd repair the fs; if you are paranoid, do:

    xfs_metadump -o /dev/blah metadumpfile
    xfs_mdrestore metadumpfile filesystem.img
    xfs_repair filesystem.img
    mount -o loop filesystem.img /some/place

first, and you can see for sure what xfs_repair will do to the real
device, and what the fs looks like when it's done (no data will be
present in the metadumped image, just metadata).

-Eric

>
>>> 1) I was under the impression that during the mounting of an XFS
>>> volume some sort of check/repair is performed. How does that differ
>>> from running xfs_check and/or xfs_repair?
>>
>> Journal recovery is performed at mount time, not a consistency
>> check.
>>
>> http://en.wikipedia.org/wiki/Filesystem_journaling
>
> Ah OK. Thanks for the clarification.
>
>
>>> 2) Any ideas how the filesystem might have gotten into this state? I
>>> don't have the history of that node but it's possible that it crashed
>>> previously due to an unrelated problem. Could this have left the
>>> filesystem in this state?
>>
>> <shrug>
>>
>> How long is a piece of string?
>>
>>> 3) What exactly does the output of the xfs_check mean? How serious is
>>> it? Are those warnings or errors? Will some of them get cleaned up
>>> during the mounting of the filesystem?
>>
>> xfs_check is deprecated. The output of xfs_repair indicates
>> cross-linked extent indexes. Will only get properly detected and
>> fixed by xfs_repair. And "fixed" may mean corrupt files are removed
>> from the filesystem - repair does not guarantee that your data is
>> preserved or consistent after it runs, just that the filesystem is
>> consistent and error free.
>>
>>> 4) We have a whole bunch of production nodes running the same kernel.
>>> I'm more than a little concerned that we might have a ticking timebomb
>>> with some filesystems being in a state that might trigger a crash
>>> eventually. Is there any way to perform a live check on a mounted
>>> filesystem so that I can get an idea of how big of a problem we have
>>> (if any)?
>>
>> Read the xfs_repair man page?
>>
>> -n No modify mode. Specifies that xfs_repair should not
>> modify the filesystem but should only scan the filesystem
>> and indicate what repairs would have been made.
>> .....
>>
>> -d Repair dangerously. Allow xfs_repair to repair an XFS
>> filesystem mounted read only. This is typically done on a
>> root filesystem from single user mode, immediately followed by
>> a reboot.
>>
>> So, remounting read only and running xfs_repair -d -n will check the
>> filesystem as best as can be done online. If there are any problems,
>> then you can repair them and immediately reboot.
>>
>>> I don't claim to know exactly what I'm doing but I picked a
>>> node, froze the filesystem and then ran a modified xfs_check (which
>>> bypasses the is_mounted check and ignores non-committed metadata) and
>>> it did report some issues. At this point I believe those are false
>>> positives. Do you have any suggestions short of rebooting the nodes and
>>> running xfs_check on the unmounted filesystem?
>>
>> Don't bother with xfs_check. xfs_repair will detect all the same
>> errors (and more) and can fix them at the same time.
>
> Thanks for the hints.
>
> ...Juerg
>
>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
>