Re: I have a blaze of 353 page allocation failures, all alike

From: Mel Gorman <mel@csn.ul.ie>
To: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>,
	eric.dumazet@gmail.com, Peter Kruse <pk@q-leap.de>,
	linux-kernel@vger.kernel.org
Subject: Re: I have a blaze of 353 page allocation failures, all alike
Date: Tue, 5 Jul 2011 18:20:43 +0100
Message-ID: <20110705172043.GA15147@csn.ul.ie>
In-Reply-To: <alpine.DEB.2.00.1107051013500.16869@router.home>

On Tue, Jul 05, 2011 at 10:18:44AM -0500, Christoph Lameter wrote:
> On Tue, 28 Jun 2011, Peter Kruse wrote:
> 
> > the server crashed again, I attach the dmesg and kern.log that
> > contain some call traces.  Could you have a look again?
> > This time the server runs 2.6.32.29.
> 
> These are messages about a hung task. I do not see any page allocation
> failures in this log anymore. Mel can you have a look. Could it be that
> something holds up reclaim?

This is a new issue to me, and I see it's against 2.6.32.29, which
isn't even the latest stable 2.6.32, so I don't think I'll be able to
dedicate a lot of time to this.

If kswapd is stuck in a loop, it may be repeatedly trying to shrink
slab, failing, and so unable to free the associated pages. The sysrq+m
output might have given a hint, but I also see some worrying things
like this while glancing through the traces:

Jun 28 09:01:14 beosrv1-t kernel: [3702691.182966] dsmc          R  running task        0 27835      1 0x00020000
Jun 28 09:01:14 beosrv1-t kernel: [3702691.183139]  ffff88006ad6b728 0000000000000082 ffffffff8100b92e ffff88006ad6b728
Jun 28 09:01:14 beosrv1-t kernel: [3702691.183423]  000000000000dd08 ffff88006ad6bfd8 00000000000115c0 00000000000115c0
Jun 28 09:01:14 beosrv1-t kernel: [3702691.183707]  0000000000000000 ffffffffa00ef883 ffff880a18d5b810 ffff88006eccc770
Jun 28 09:01:14 beosrv1-t kernel: [3702691.183990] Call Trace:
Jun 28 09:01:14 beosrv1-t kernel: [3702691.184060]  [<ffffffff8100b92e>] ? apic_timer_interrupt+0xe/0x20
Jun 28 09:01:14 beosrv1-t kernel: [3702691.184149]  [<ffffffffa00ef883>] ? xfs_reclaim_inode+0x0/0xef [xfs]
Jun 28 09:01:14 beosrv1-t kernel: [3702691.184237]  [<ffffffffa00f0370>] ? xfs_inode_ag_iterator+0x6f/0xaf [xfs]
Jun 28 09:01:14 beosrv1-t kernel: [3702691.184327]  [<ffffffffa00ef883>] ? xfs_reclaim_inode+0x0/0xef [xfs]
Jun 28 09:01:14 beosrv1-t kernel: [3702691.184406]  [<ffffffff81034ad5>] __cond_resched+0x25/0x31
Jun 28 09:01:14 beosrv1-t kernel: [3702691.184483]  [<ffffffff813dd3d3>] ? _cond_resched+0x0/0x2f
Jun 28 09:01:14 beosrv1-t kernel: [3702691.184560]  [<ffffffff813dd3f7>] _cond_resched+0x24/0x2f
Jun 28 09:01:14 beosrv1-t kernel: [3702691.184637]  [<ffffffff810779cb>] shrink_slab+0x10d/0x154
Jun 28 09:01:14 beosrv1-t kernel: [3702691.184713]  [<ffffffff81078694>] try_to_free_pages+0x221/0x31c
Jun 28 09:01:14 beosrv1-t kernel: [3702691.184791]  [<ffffffff810757de>] ? isolate_pages_global+0x0/0x1f0
Jun 28 09:01:14 beosrv1-t kernel: [3702691.184869]  [<ffffffff8107260d>] __alloc_pages_nodemask+0x3fd/0x600
Jun 28 09:01:14 beosrv1-t kernel: [3702691.184948]  [<ffffffff81094d83>] kmem_getpages+0x5c/0x127
Jun 28 09:01:14 beosrv1-t kernel: [3702691.185024]  [<ffffffff81094f6d>] fallback_alloc+0x11f/0x195
Jun 28 09:01:14 beosrv1-t kernel: [3702691.185102]  [<ffffffff8109510c>] ____cache_alloc_node+0x129/0x138
Jun 28 09:01:14 beosrv1-t kernel: [3702691.185180]  [<ffffffff81095ad5>] kmem_cache_alloc+0xd1/0xfe
Jun 28 09:01:14 beosrv1-t kernel: [3702691.185268]  [<ffffffffa00e5aab>] kmem_zone_alloc+0x67/0xaf [xfs]
Jun 28 09:01:14 beosrv1-t kernel: [3702691.185358]  [<ffffffffa00cbdb1>] xfs_inode_alloc+0x24/0xcd [xfs]
Jun 28 09:01:14 beosrv1-t kernel: [3702691.185448]  [<ffffffffa00c0473>] ? xfs_dir_lookup+0xfc/0x153 [xfs]
Jun 28 09:01:14 beosrv1-t kernel: [3702691.185539]  [<ffffffffa00cc10c>] xfs_iget+0x2b2/0x472 [xfs]
Jun 28 09:01:14 beosrv1-t kernel: [3702691.185628]  [<ffffffffa00e3b1a>] xfs_lookup+0x7d/0xae [xfs]
Jun 28 09:01:14 beosrv1-t kernel: [3702691.185715]  [<ffffffffa00ebf43>] xfs_vn_lookup+0x3f/0x7e [xfs]
Jun 28 09:01:14 beosrv1-t kernel: [3702691.185793]  [<ffffffff810a2d29>] do_lookup+0xd5/0x1b9
Jun 28 09:01:14 beosrv1-t kernel: [3702691.185869]  [<ffffffff810a4f35>] __link_path_walk+0x92e/0xdea
Jun 28 09:01:14 beosrv1-t kernel: [3702691.185946]  [<ffffffff810a2cb7>] ? do_lookup+0x63/0x1b9
Jun 28 09:01:14 beosrv1-t kernel: [3702691.186024]  [<ffffffff810a5639>] path_walk+0x69/0xd4
Jun 28 09:01:14 beosrv1-t kernel: [3702691.186100]  [<ffffffff810a5783>] do_path_lookup+0x29/0x51
Jun 28 09:01:14 beosrv1-t kernel: [3702691.186176]  [<ffffffff810a61ab>] user_path_at+0x52/0x93
Jun 28 09:01:14 beosrv1-t kernel: [3702691.186261]  [<ffffffffa00a4cff>] ? posix_acl_access_exists+0x2e/0x38 [xfs]
Jun 28 09:01:14 beosrv1-t kernel: [3702691.186351]  [<ffffffffa00f0988>] ? xfs_vn_listxattr+0xbd/0x100 [xfs]
Jun 28 09:01:14 beosrv1-t kernel: [3702691.186430]  [<ffffffff810a61bd>] ? user_path_at+0x64/0x93
Jun 28 09:01:14 beosrv1-t kernel: [3702691.186506]  [<ffffffff8109e40e>] vfs_fstatat+0x35/0x62
Jun 28 09:01:14 beosrv1-t kernel: [3702691.186592]  [<ffffffffa00f073c>] ? xfs_xattr_put_listent_sizes+0x0/0x30 [xfs]
Jun 28 09:01:14 beosrv1-t kernel: [3702691.186688]  [<ffffffff8109e454>] vfs_lstat+0x19/0x1b
Jun 28 09:01:14 beosrv1-t kernel: [3702691.186765]  [<ffffffff8102afbb>] sys32_lstat64+0x1a/0x34
Jun 28 09:01:14 beosrv1-t kernel: [3702691.186841]  [<ffffffff810a2b18>] ? path_put+0x2c/0x30
Jun 28 09:01:14 beosrv1-t kernel: [3702691.186917]  [<ffffffff810b3594>] ? sys_llistxattr+0x4d/0x5c
Jun 28 09:01:14 beosrv1-t kernel: [3702691.186995]  [<ffffffff8102a0d3>] ia32_sysret+0x0/0x5

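That looks like XFS could be recursing into itself: an XFS inode
allocation enters direct reclaim, and shrink_slab then appears to call
back into XFS inode reclaim. Maybe it detects that and bails; I'm not
familiar enough with XFS to know if it copes with this properly.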
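
As a rough illustration of the pattern in the trace above, here is a
minimal Python sketch (not part of the original report) that flags any
module appearing on both sides of the direct reclaim entry point in a
trace. The regular expression, the names parse_frames and
modules_reentered_from_reclaim, and the choice of try_to_free_pages as
the reclaim marker are all assumptions made for this example. Frames
marked "?" are ones the unwinder is unsure of, so a hit is a hint, not
proof of recursion.

```python
import re

# Matches one frame of a call trace line as printed in the log above, e.g.
#   [<ffffffffa00ef883>] ? xfs_reclaim_inode+0x0/0xef [xfs]
# The leading "[3702691.184149]" timestamp does not match because it has no "<".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module_or_None, reliable) tuples, innermost frame first."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (
                    match.group("symbol"),
                    match.group("module"),
                    match.group("unreliable") is None,
                )
            )
    return frames


def modules_reentered_from_reclaim(log_text, reclaim_entry="try_to_free_pages"):
    """Modules seen both as a caller of reclaim and inside reclaim.

    Traces list the innermost frame first, so frames printed before
    reclaim_entry ran inside reclaim and frames printed after it are
    the callers that triggered it.
    """
    frames = parse_frames(log_text)
    symbols = [symbol for symbol, _, _ in frames]
    if reclaim_entry not in symbols:
        return set()
    idx = symbols.index(reclaim_entry)
    # Inside reclaim: keep "?" frames too, since they are the only hint here.
    inside = {module for _, module, _ in frames[:idx] if module}
    # Callers: only trust frames the unwinder did not mark with "?".
    callers = {
        module for _, module, reliable in frames[idx + 1:] if module and reliable
    }
    return inside & callers


if __name__ == "__main__":
    sample = """\
 [<ffffffffa00ef883>] ? xfs_reclaim_inode+0x0/0xef [xfs]
 [<ffffffff810779cb>] shrink_slab+0x10d/0x154
 [<ffffffff81078694>] try_to_free_pages+0x221/0x31c
 [<ffffffff8107260d>] __alloc_pages_nodemask+0x3fd/0x600
 [<ffffffffa00e5aab>] kmem_zone_alloc+0x67/0xaf [xfs]
"""
    print(modules_reentered_from_reclaim(sample))
```

Fed the dsmc trace above, this would report xfs, which is only the
same observation made by eye; it says nothing about whether XFS
handles the re-entry safely.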

There are also patches that should be in 2.6.32.29, like [e52af507:
xfs: Non-blocking inode locking in IO completion], which talks
about deadlocks during IO completion, but the kernel name is
2.6.32.29-ql-server-20. Does that kernel include this patch, or is it
some other fork?

What seems more plausible is that 2.6.32.29 is missing patches like
[081003ff: xfs: properly account for reclaimed inodes], which sounds
like a very similar problem to what is happening here. There is a good
chance this problem is already fixed but, unfortunately, the fix
wasn't backported to 2.6.32.x.
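
One way to check whether a given tree carries such a patch is to
search its history by subject line, since a backported commit gets a
new hash and looking for 081003ff directly would not find it. The
sketch below is only an illustration under assumptions: the helper
name find_by_subject is made up, it assumes the backport kept the
upstream subject unchanged, and it expects text produced by
`git log --format='%H %s'` run in whatever tree the
2.6.32.29-ql-server-20 kernel was built from. The hashes in the sample
are placeholders, not real commits.

```python
def find_by_subject(log_text, subject):
    """Return the hashes of commits whose subject line equals `subject`.

    log_text is expected to hold one "<hash> <subject>" pair per line,
    as printed by: git log --format='%H %s'
    """
    hits = []
    for line in log_text.splitlines():
        commit_hash, _, commit_subject = line.partition(" ")
        if commit_subject.strip() == subject:
            hits.append(commit_hash)
    return hits


if __name__ == "__main__":
    # Placeholder history; the hashes are hypothetical.
    sample_log = (
        "1111111 xfs: some unrelated change\n"
        "2222222 xfs: properly account for reclaimed inodes\n"
    )
    print(find_by_subject(sample_log, "xfs: properly account for reclaimed inodes"))
```

An empty result only shows that no commit with that exact subject is
present; a vendor tree may have reworded or squashed the change.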

-- 
Mel Gorman
SUSE Labs
