From: "Theodore Ts'o" <tytso@mit.edu>
To: linux kernel mailing list <linux-kernel@vger.kernel.org>
Cc: martin f krafft <madduck@madduck.net>
Subject: Re: EXT4-fs error, kernel BUG
Date: Tue, 5 Aug 2014 08:51:14 -0400
Message-ID: <20140805125114.GG5263@thunk.org>
In-Reply-To: <20140805103436.GA7531@fishbowl.rw.madduck.net>

On Tue, Aug 05, 2014 at 12:34:36PM +0200, martin f krafft wrote:
> Dear kernel people,
> 
> Yesterday, I encountered something weird on one of our NAS machines:
> 
>   Aug  4 20:09:40 julia kernel: [342873.007709] EXT4-fs error (device dm-6): ext4_ext_check_inode:481: inode #30414321: comm du: pblk 0 bad header/extent: invalid extent entries - magic f30a, entries 1, max 4(4), depth 0(0)
> 
> but a fsck -f of the filesystem revealed no problems.

One likely cause of this issue is that the hardware hiccuped on a
read and returned garbage, which is what triggered the "EXT4-fs
error" message (which is really a report of a detected file system
inconsistency).  A common cause of this is the block address getting
corrupted, so that the hard drive read valid data, but from the wrong
location.

The other likely cause is that you are using something like RAID1, and
one of the copies of the disk block really is corrupted, so the
kernel read the bad version of the block, but fsck managed to read the
good version of the block.

It's possible that this was caused by memory corruption, but it
wouldn't have been high on my suspect list.  Still, if this is a new
machine, it might not be a bad idea to run memtest86+ for 24-48 hours.
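
To illustrate what the "bad header/extent" check in that message is
looking at, here is a minimal Python sketch.  It assumes the standard
on-disk layout of the 60-byte extent root stored in an inode's i_block
(a 12-byte header followed by up to four 12-byte extents, which is
where "max 4" comes from).  The function names and the fs_blocks
parameter are hypothetical, and the validity test is a deliberate
simplification of what the kernel does, not a copy of it.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A      # the "magic f30a" shown in the log line
EXT_INIT_MAX_LEN = 1 << 15   # lengths above this mark an unwritten extent


def parse_extent_root(i_block: bytes):
    """Decode the extent header and, for a leaf (depth 0), its extents."""
    magic, entries, max_entries, depth, _generation = struct.unpack_from(
        "<HHHHI", i_block, 0)
    extents = []
    if depth == 0:
        for i in range(min(entries, max_entries)):
            first_lblk, raw_len, start_hi, start_lo = struct.unpack_from(
                "<IHHI", i_block, 12 + 12 * i)
            # Unwritten extents store their length offset by EXT_INIT_MAX_LEN.
            length = raw_len if raw_len <= EXT_INIT_MAX_LEN \
                else raw_len - EXT_INIT_MAX_LEN
            pblk = (start_hi << 32) | start_lo
            extents.append((first_lblk, length, pblk))
    return magic, entries, max_entries, depth, extents


def check_extent_root(i_block: bytes, fs_blocks: int):
    """Return None if the root looks plausible, else a short reason.

    Simplified assumption: an extent is bad if it is empty or if it
    points past the end of the file system.  Index nodes (depth > 0)
    are not examined here.
    """
    magic, entries, max_entries, depth, extents = parse_extent_root(i_block)
    if magic != EXT4_EXT_MAGIC:
        return "invalid magic"
    if max_entries == 0 or entries > max_entries:
        return "invalid entry count"
    for _first_lblk, length, pblk in extents:
        if length == 0 or pblk + length > fs_blocks:
            return "invalid extent entries"
    return None
```

The point of the sketch is that the header fields can all look sane
(magic f30a, entries 1, max 4, depth 0, as in the log line) while a
single garbled extent entry is still enough to fail the check.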

> So I set up another filesystem and tried to copy over the data from
> /dev/dm-6, using tar.
> 
> Shortly afterwards, there a wall message like
> 
>   BUG: soft lockup - CPU#0 stuck for 23s! [kswapd0:28]

From the stack traces, it looks like the system was thrashing trying
to free memory to make forward progress (i.e., it was under high memory
pressure).  Exactly why this happened is not something I can determine
from the stack traces, sorry.  It could be that when the soft lockup
happened, you had more processes running, or that some of the processes
(samba? apache?) were using more memory, and this was a factor.  Why the
OOM killer didn't kill the processes I can't tell you.

> Is there anything in the following back traces that would help me
> identify the source of the problem with greater confidence?

Sorry, that's about all that can be divined from your kernel stack
traces.

It might be worth checking the system logs for any suspicious error
messages beyond just the EXT4-fs error message, but you may have done
that already.
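
If it helps, a log check along those lines could be sketched roughly
as follows.  The list of patterns is only an assumption about what
might count as "suspicious" here (storage errors, lockups, OOM
activity), and the function name is hypothetical; adjust both to
taste.

```python
# Substrings that commonly accompany storage or memory trouble in
# kernel logs.  This list is an illustrative assumption, not exhaustive.
SUSPICIOUS_PATTERNS = (
    "EXT4-fs error",
    "I/O error",
    "soft lockup",
    "Out of memory",
    "Medium Error",
)


def suspicious_lines(lines):
    """Return the log lines containing any of the patterns above."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if any(p.lower() in lowered for p in SUSPICIOUS_PATTERNS):
            hits.append(line.rstrip("\n"))
    return hits
```

It can be fed any iterable of lines, for example an open log file, and
returns the matching lines in their original order.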

Good luck,

						- Ted


