From: Dave Chinner <david@fromorbit.com>
To: Alex Gorbachev <ag@iss-integration.com>
Cc: xfs@oss.sgi.com
Subject: Re: Failing XFS filesystem underlying Ceph OSDs
Date: Mon, 6 Jul 2015 09:24:43 +1000
Message-ID: <20150705232443.GA3902@dastard>
In-Reply-To: <CADb9452UYPqo_i2M=-sSwPKwqUrRSMJ4bhV10S=1k0sNCKCVfg@mail.gmail.com>
[ Please turn off line wrap when pasting kernel traces ]
On Sun, Jul 05, 2015 at 12:25:47AM -0400, Alex Gorbachev wrote:
> > > sysctl vm.swappiness=20 (can probably be 1 as per article)
> > >
> > > sysctl vm.min_free_kbytes=262144
> >
> > That's not an explanation for what looks to be page cache radix
> > tree corruption. Memory reclaim still occurs with the settings you
> > have now and, well, those changes occurred back in 3.5 - some
> > 3 years ago - so it's not really an explanation for a problem with a
> > recent 4.1 kernel...
> >
> > > So far no issues, but I need to wait a week to see if anything shows up.
> > > Thank you for reviewing the error codes.
> >
> > I expect that you'll see the problems again...
>
> We have experienced the problem in various guises with kernels 3.14, 3.19,
> 4.1-rc2 and now 4.1, so it's not new to us, just different error stack.
> Below are some other stack dumps of what manifested as the same error.
>
> [<ffffffff817cf4b9>] schedule+0x29/0x70
> [<ffffffffc07caee7>] _xfs_log_force+0x187/0x280 [xfs]
> [<ffffffff810a4150>] ? try_to_wake_up+0x2a0/0x2a0
> [<ffffffffc07cb019>] xfs_log_force+0x39/0xc0 [xfs]
> [<ffffffffc07d6542>] xfsaild_push+0x552/0x5a0 [xfs]
> [<ffffffff817d2264>] ? schedule_timeout+0x124/0x210
> [<ffffffffc07d662f>] xfsaild+0x9f/0x140 [xfs]
> [<ffffffffc07d6590>] ? xfsaild_push+0x5a0/0x5a0 [xfs]
> [<ffffffff81095e29>] kthread+0xc9/0xe0
> [<ffffffff81095d60>] ? flush_kthread_worker+0x90/0x90
> [<ffffffff817d3718>] ret_from_fork+0x58/0x90
> [<ffffffff81095d60>] ? flush_kthread_worker+0x90/0x90
> INFO: task xfsaild/sdg1:2606 blocked for more than 120 seconds.
> Not tainted 3.19.4-031904-generic #201504131440
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
That's a hung task warning, which is indicative of IO completion
problems, but it's not a crash.
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<ffffffffc04be80f>] xfs_count_page_state+0x3f/0x70 [xfs]
....
> [<ffffffffc04be880>] xfs_vm_releasepage+0x40/0x120 [xfs]
> [<ffffffff8118a7d2>] try_to_release_page+0x32/0x50
> [<ffffffff8119fe6d>] shrink_page_list+0x69d/0x720
> [<ffffffff811a058d>] shrink_inactive_list+0x1dd/0x5d0
....
Again, this is indicative of a page cache issue: a page without
buffers has been passed to xfs_vm_releasepage(), which implies the
page flags are not correct, i.e. a PAGE_FLAGS_PRIVATE flag is set but
page->private is NULL...
Again, this is unlikely to be an XFS issue.
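To make that inconsistency concrete, here is a minimal user-space
sketch in C. It is an illustration only, not kernel code: struct
toy_page, TOY_PG_PRIVATE and both helper functions are hypothetical
names that merely model a "private" flag bit and page->private.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel definition. */
struct toy_page {
	unsigned long flags;
	void *private_data;	/* models page->private (attached buffers) */
};

/* Hypothetical bit that models the "page has private data" flag. */
#define TOY_PG_PRIVATE	(1UL << 0)

/* True when the page claims to have private data attached. */
static bool toy_page_has_private(const struct toy_page *p)
{
	return (p->flags & TOY_PG_PRIVATE) != 0;
}

/*
 * The inconsistent state described above: the flag says buffers are
 * attached, but the pointer that should reference them is NULL.
 * Code that trusts the flag and dereferences the pointer will fault.
 */
static bool toy_page_state_is_inconsistent(const struct toy_page *p)
{
	return toy_page_has_private(p) && p->private_data == NULL;
}
```

A page with the flag set and a valid pointer, or with neither, is
consistent in this model; only the flag-without-pointer combination
matches the NULL dereference in the trace.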
> Do you think we need to look at RAM handling by this Supermicro machine
> type?
Not sure what you mean by that. Problems like this can be caused by
bad hardware, but it's unusual for a machine using ECC memory to
have undetected RAM problems...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com