From: "Bruno Prémont" <bonbons@linux-vserver.org>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: XFS crashing system with general protection fault
Date: Mon, 9 Feb 2015 09:47:01 +0100
Message-ID: <20150209094701.6b1d480d@pluto.restena.lu>
In-Reply-To: <20150205221516.GT4251@dastard>

Hi Dave,

On Fri, 6 Feb 2015 09:15:16 +1100 Dave Chinner wrote:
> On Thu, Feb 05, 2015 at 03:10:07PM +0100, Bruno Prémont wrote:
> > Hi Dave,
> > 
> > New crash, new trace, this time on 3.18.2.
> > It looks like this time a NULL dereference happened prior to touched memory poison being detected.
> > 
> > Once again it's during normal system operation (no mount/umount activity)
> 
> Can you rebuild the kernel with CONFIG_XFS_WARN=y and see if that
> throws any interesting messages into logs?

I will try that and see.

> However:
> 
> > [1900390.261491] =============================================================================
> > [1900390.272989] BUG task_struct (Tainted: G      D W     ): Poison overwritten
> > [1900390.283021] -----------------------------------------------------------------------------
> > [1900390.283021] 
> > [1900390.297056] INFO: 0xffff880213d651b3-0xffff880213d651b3. First byte 0x6d instead of 0x6b
> > [1900390.309044] INFO: Slab 0xffffea00084f5800 objects=16 used=16 fp=0x          (null) flags=0x8000000000004080
> > [1900390.323087] INFO: Object 0xffff880213d64ba0 @offset=19360 fp=0xffff880213d61e40
> > [1900390.323087] 
> > [1900390.336988] Bytes b4 ffff880213d64b90: 60 2d d6 13 02 88 ff ff 5a 5a 5a 5a 5a 5a 5a 5a  `-......ZZZZZZZZ
> > [1900390.350988] Object ffff880213d64ba0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> > [1900390.364943] Object ffff880213d64bb0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> ....
> > [1900391.674636] Object ffff880213d651b0: 6b 6b 6b 6d 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkmkkkkkkkkkkkk
>                                                      ^^
> 
> There's a single bit that has been flipped in the task_struct slab.
> So more than just XFS is seeing memory corruption - this is in core
> kernel structure slab caches. I'm not sure, either, how XFS could
> cause corruption in this slab.
> 
> So, I'd be checking all the previous memory corruptions to see if
> they are single bit errors, and if there is any pattern to the
> addresses at which they occur. The above bit flip makes me think
> "hardware issue" and everything else stems from that...

The system has ECC RAM, so faulty RAM looks less likely (no errors
reported by the kernel or recorded by the firmware).

All previous crashes for which I have logs were dereferences after
free; none involved an attempt to allocate memory from a free slab
whose poison had been modified.


But what would that single bit represent in that area if the object
was used or modified after being freed?
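
To help narrow that down, here is a minimal sketch (Python, not part of
the original report) that pulls the corrupted address, the object
address and the two byte values out of the SLUB report quoted above,
then computes the offset inside the object and which bits differ from
the poison value. The function and variable names are illustrative
assumptions; only the two log line formats are taken from the trace.

```python
import re

# SLUB "INFO: <start>-<end>. First byte X instead of Y" line, as in the trace.
RANGE_RE = re.compile(
    r"INFO: 0x([0-9a-f]+)-0x([0-9a-f]+)\. "
    r"First byte 0x([0-9a-f]+) instead of 0x([0-9a-f]+)"
)
# SLUB "INFO: Object <addr> @offset=..." line, as in the trace.
OBJECT_RE = re.compile(r"INFO: Object 0x([0-9a-f]+) @offset=")


def analyze_poison_report(log_text):
    """Return offset and bit differences for one SLUB poison report.

    Hypothetical helper: returns None if either INFO line is missing.
    """
    rng = RANGE_RE.search(log_text)
    obj = OBJECT_RE.search(log_text)
    if rng is None or obj is None:
        return None
    start = int(rng.group(1), 16)      # first corrupted address
    found = int(rng.group(3), 16)      # byte value actually present
    expected = int(rng.group(4), 16)   # poison value that was expected
    obj_addr = int(obj.group(1), 16)   # start of the slab object
    diff = found ^ expected            # set bits mark positions that changed
    return {
        "offset_in_object": start - obj_addr,
        "xor": diff,
        "changed_bits": [b for b in range(8) if diff & (1 << b)],
    }


SAMPLE = """\
INFO: 0xffff880213d651b3-0xffff880213d651b3. First byte 0x6d instead of 0x6b
INFO: Object 0xffff880213d64ba0 @offset=19360 fp=0xffff880213d61e40
"""

if __name__ == "__main__":
    print(analyze_poison_report(SAMPLE))
```

For the quoted report this gives offset 0x613 within the task_struct
object and an XOR of 0x06, that is, bits 1 and 2 differ between 0x6b
and 0x6d. Mapping offset 0x613 to a task_struct field would need the
layout of the exact kernel build (for example from pahole run on the
matching vmlinux), which I do not have at hand.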


As I have seen some other servers mysteriously crash after months (or
sometimes only weeks) of uptime, I think a kernel issue or race
condition is more likely, though I have no traces or other information
that would help here. As systems receive newer kernels, I'm enabling
netconsole so that kernel messages from any future crashes are
captured for analysis.

Cheers,
Bruno

> Cheers,
> 
> Dave.
