public inbox for linux-xfs@vger.kernel.org
From: Eric Sandeen <sandeen@sandeen.net>
To: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Fabio Coatti <cova@ferrara.linux.it>,
	linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [BUG] spinlock lockup on CPU#0
Date: Thu, 09 Apr 2009 10:27:37 -0500
Message-ID: <49DE13E9.6040605@sandeen.net>
In-Reply-To: <19f34abd0904090707v7eb8b677gbda42595aa04a090@mail.gmail.com>

Vegard Nossum wrote:
> 2009/3/30 Fabio Coatti <cova@ferrara.linux.it>:
>> Hi all, I've got the following BUG: report on one of our servers running
>> 2.6.28.8; some background:
>> we are seeing several lockups in db (mysql) servers that show up as a sudden
>> load increase, after which the server very quickly freezes. It happens
>> randomly, sometimes after weeks, sometimes very soon after a system
>> reboot. Trying to discover the problem, we installed the latest (at the time of
>> testing) 2.6.28.X kernel and loaded it with heavy disk I/O (find,
>> dd, rsync and so on).
>> We were able to crash a server with these tests; unfortunately we were
>> only able to capture a remote screen snapshot, so I copied the data by hand
>> (hopefully without typos), and the result is the following:
> 
> Hi,
> 
> Thanks for the report.
> 
>>  [<ffffffff80213590>] ? default_idle+0x30/0x50
>>  [<ffffffff8021358e>] ? default_idle+0x2e/0x50
>>  [<ffffffff80213793>] ? c1e_idle+0x73/0x120
>>  [<ffffffff80259f11>] ? atomic_notifier_call_chain+0x11/0x20
>>  [<ffffffff8020a31f>] ? cpu_idle+0x3f/0x70
>> BUG: spinlock lockup on CPU#0, find/13114, ffff8801363d2c80
>> Pid: 13114, comm: find Tainted: G      D W  2.6.28.8 #5
>> Call Trace:
>>  [<ffffffff8041a02e>] _raw_spin_lock+0x14e/0x180
>>  [<ffffffff8060b691>] _spin_lock+0x51/0x70
>>  [<ffffffff80231ca4>] ? task_rq_lock+0x54/0xa0
>>  [<ffffffff80231ca4>] task_rq_lock+0x54/0xa0
>>  [<ffffffff80234501>] try_to_wake_up+0x91/0x280
>>  [<ffffffff80234720>] wake_up_process+0x10/0x20
>>  [<ffffffff803bf863>] xfsbufd_wakeup+0x53/0x70
>>  [<ffffffff802871e0>] shrink_slab+0x90/0x180
>>  [<ffffffff80287526>] try_to_free_pages+0x256/0x3a0
>>  [<ffffffff80285280>] ? isolate_pages_global+0x0/0x280
>>  [<ffffffff80281166>] __alloc_pages_internal+0x1b6/0x460
>>  [<ffffffff802a186d>] alloc_page_vma+0x6d/0x110
>>  [<ffffffff8028d3ab>] handle_mm_fault+0x4ab/0x790
>>  [<ffffffff80225293>] do_page_fault+0x463/0x870
>>  [<ffffffff8060b199>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>>  [<ffffffff8060bf52>] error_exit+0x0/0xa9
> 
> Seems like you hit this:

In _xfs_buf_lookup_pages?  That's not on the stack, and we didn't see
the printk below...

> /*
>  * This could deadlock.
>  *
>  * But until all the XFS lowlevel code is revamped to
>  * handle buffer allocation failures we can't do much.
>  */
> if (!(++retries % 100))
>         printk(KERN_ERR
>                         "XFS: possible memory allocation "
>                         "deadlock in %s (mode:0x%x)\n",
>                         __func__, gfp_mask);
> 
...

so I don't think so.  From the trace:

>>  [<ffffffff803bf863>] xfsbufd_wakeup+0x53/0x70
>>  [<ffffffff802871e0>] shrink_slab+0x90/0x180

this is the shrinker kicking off:

static struct shrinker xfs_buf_shake = {
        .shrink = xfsbufd_wakeup,
        .seeks = DEFAULT_SEEKS,
};
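For readers unfamiliar with the shrinker mechanism: under memory pressure, page reclaim (try_to_free_pages in the trace) calls shrink_slab(), which walks the list of registered shrinkers and invokes each one's .shrink callback. The userspace C sketch below models only that dispatch. The names `demo_shrinker`, `demo_register_shrinker`, `demo_shrink_slab` and `fake_xfsbufd_wakeup`, and the simplified callback signature, are hypothetical and are not the exact 2.6.28 kernel API.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct shrinker: a callback plus
 * a "seeks" cost hint. Simplified; not the real kernel definition. */
struct demo_shrinker {
    int (*shrink)(int nr_to_scan, unsigned int gfp_mask);
    int seeks;
    struct demo_shrinker *next;
};

#define DEMO_DEFAULT_SEEKS 2

static struct demo_shrinker *shrinker_list = NULL;

/* Hypothetical registration helper: pushes the shrinker onto a list. */
static void demo_register_shrinker(struct demo_shrinker *s)
{
    s->next = shrinker_list;
    shrinker_list = s;
}

static int xfsbufd_wakeups = 0;

/* Models xfsbufd_wakeup: it does not free memory itself, it only asks the
 * xfsbufd thread to flush buffers. The real callback calls
 * wake_up_process() here, which is where the trace shows it stuck. */
static int fake_xfsbufd_wakeup(int nr_to_scan, unsigned int gfp_mask)
{
    (void)nr_to_scan;
    (void)gfp_mask;
    xfsbufd_wakeups++;
    return 0;
}

static struct demo_shrinker demo_xfs_buf_shake = {
    .shrink = fake_xfsbufd_wakeup,
    .seeks = DEMO_DEFAULT_SEEKS,
};

/* Models shrink_slab(): run every registered shrinker once and report
 * how many callbacks were invoked. */
static int demo_shrink_slab(int nr_to_scan, unsigned int gfp_mask)
{
    int called = 0;
    for (struct demo_shrinker *s = shrinker_list; s != NULL; s = s->next) {
        s->shrink(nr_to_scan, gfp_mask);
        called++;
    }
    return called;
}
```

The point of the model is the call direction: reclaim calls into XFS, so XFS is on the stack here because it was asked to help free memory, not because an XFS allocation failed.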


> ...so my guess is that you ran out of memory (and XFS simply can't
> handle it -- an error in the XFS code, of course).

Wrong guess, I think.  XFS has been called via the shrinker mechanism
to *free* memory, and we're not able to get the task rq lock in the
wakeup path, but I'm not sure why...
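The "BUG: spinlock lockup" message comes from the kernel's spinlock debugging code (the _raw_spin_lock frame at the top of the trace): instead of spinning silently forever, the debug build retries the lock a bounded number of times and then reports that it could not get it. The C11 sketch below models only that idea in userspace. The names `demo_spinlock`, `demo_spin_lock_debug` and the `spin_budget` parameter are hypothetical, and unlike this sketch the real debug code keeps spinning after printing its warning.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical minimal spinlock built on a C11 atomic_flag. */
struct demo_spinlock {
    atomic_flag locked;
};

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static bool demo_spin_trylock(struct demo_spinlock *l)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void demo_spin_unlock(struct demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Try the lock up to spin_budget times. Returns true once it is taken;
 * otherwise prints a lockup warning and returns false (the sketch gives
 * up so it can be tested; the kernel's debug code would keep trying). */
static bool demo_spin_lock_debug(struct demo_spinlock *l,
                                 unsigned long spin_budget, const char *who)
{
    for (unsigned long i = 0; i < spin_budget; i++) {
        if (demo_spin_trylock(l))
            return true;
    }
    fprintf(stderr, "BUG: spinlock lockup (demo), %s, %p\n", who, (void *)l);
    return false;
}
```

Holding the lock and then requesting it again from the same thread is just the simplest way to make it unavailable for a test; it is not a claim about which task was holding the rq lock in this report.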

> My first tip, if you simply want your servers not to crash, is to
> switch to another filesystem. You could at least try it and see if it
> helps your problem -- that's the most straightforward solution I can
> think of.
> 
>> The machine is a dual 2216HE (2 cores) AMD with 4 GB RAM; below you can find
>> the .config file (from /proc/config.gz).
>>
>> we have been seeing similar lockups (at least similar in their effects) for several
>> kernel revisions (starting from 2.6.25.X) and on different hardware. Several
>> machines are hit by this, mostly database servers (maybe because of their specific
>> workload, the other machines being apache servers; I don't know).
>>
>> Could someone give us some hints about this issue, or at least some
>> suggestions on how to dig into it? Of course we can do any sort of
>> testing.

If sysrq-t (show all running tasks) still works post-oops, capturing
that might help to see where other threads are at.  Hook up a serial
console to make capturing the output possible.
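If the machine is still partly responsive, the same 't' command can also be issued without the keyboard or a serial break by writing the character to /proc/sysrq-trigger as root (the kernel must be built with CONFIG_MAGIC_SYSRQ). The sketch below takes the target path as a parameter so it can be exercised against an ordinary file; the helper name `send_sysrq` is hypothetical. After a hard lockup, userspace may not run at all, in which case the serial-console break sequence remains the practical route.

```c
#include <stdio.h>

/* Write a single SysRq command character to `path`, normally
 * "/proc/sysrq-trigger" (requires root). Returns 0 on success, -1 on error. */
static int send_sysrq(const char *path, char cmd)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
        return -1;
    }
    if (fputc(cmd, f) == EOF) {
        perror(path);
        fclose(f);
        return -1;
    }
    /* fclose flushes the buffered character; check for write errors. */
    if (fclose(f) != 0) {
        perror(path);
        return -1;
    }
    return 0;
}
```

On the affected server this would be `send_sysrq("/proc/sysrq-trigger", 't')`. The task dump goes to the kernel log and console, which is why a serial console is needed to capture it once the box is wedged.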

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 3+ messages
     [not found] <200903301936.08477.cova@ferrara.linux.it>
2009-04-09 14:07 ` [BUG] spinlock lockup on CPU#0 Vegard Nossum
2009-04-09 14:21   ` Vegard Nossum
2009-04-09 15:27   ` Eric Sandeen [this message]
