From: Brian Candler <B.Candler@pobox.com>
To: Stan Hoeppner <stan@hardwarefreak.com>
Cc: xfs@oss.sgi.com
Subject: Re: Storage server, hung tasks and tracebacks
Date: Fri, 4 May 2012 17:32:37 +0100
Message-ID: <20120504163237.GA6128@nsrc.org>
In-Reply-To: <4FA3047D.8060908@hardwarefreak.com>
On Thu, May 03, 2012 at 05:19:41PM -0500, Stan Hoeppner wrote:
> Glad to hear you've got one running somewhat stable. Could be a driver
> problem, but it's pretty rare for a SCSI driver to hard lock a box isn't
> it?
Yes, that bothers me too.
> Keep us posted.
Last night I fired up two more instances of bonnie++ on that box, so there
were four at once. Going back to the box now, I find that they have all
hung :-(
They are stuck at:
Delete files in random order...
Stat files in random order...
Stat files in random order...
Stat files in sequential order...
respectively.
iostat 5 shows no activity. There are 9 hung processes:
$ uptime
17:23:35 up 1 day, 20:39, 1 user, load average: 9.04, 9.08, 8.91
$ ps auxwww | grep " D" | grep -v grep
root 35 1.5 0.0 0 0 ? D May02 42:10 [kswapd0]
root 1179 0.0 0.0 0 0 ? D May02 1:50 [xfsaild/md126]
root 3127 0.0 0.0 25096 312 ? D 16:55 0:00 /usr/lib/postfix/master
tomi 29138 1.1 0.0 378860 3708 pts/1 D+ 12:43 3:06 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi 29390 1.0 0.0 378860 3560 pts/3 D+ 12:52 2:53 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi 30356 1.1 0.0 378860 3512 pts/2 D+ 13:32 2:36 bonnie++ -d /disk/scratch/testb -s 16384k -n 98:800k:500k:1000
root 31075 0.0 0.0 0 0 ? D 14:00 0:04 [kworker/0:0]
tomi 31796 0.6 0.0 378860 3864 pts/4 D+ 14:30 1:05 bonnie++ -d /disk/scratch/testb -s 16384k -n 98:800k:500k:1000
root 31922 0.0 0.0 0 0 ? D 14:35 0:00 [kworker/1:0]
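The `grep " D"` filter above can also match the string " D" inside a command line, so a stricter check looks only at the STAT column. A minimal sketch in Python, assuming the default `ps aux` column layout shown above (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); the function name `uninterruptible` is made up for illustration:

```python
import subprocess


def uninterruptible(ps_lines):
    """Return (pid, command) for every row whose STAT field starts with 'D'."""
    hung = []
    for line in ps_lines:
        # Split into at most 11 fields so the command line stays in one piece.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        if fields[7].startswith("D"):
            hung.append((int(fields[1]), fields[10]))
    return hung


if __name__ == "__main__":
    # Assumes a procps-style `ps` that accepts BSD options.
    out = subprocess.run(["ps", "auxwww"], capture_output=True, text=True).stdout
    for pid, command in uninterruptible(out.splitlines()):
        print(pid, command)
```

Run as a script, it prints one PID and command per task in uninterruptible sleep, which should agree with the nine processes listed above.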
dmesg shows hung tasks and backtraces, starting with:
[150927.599920] INFO: task kswapd0:35 blocked for more than 120 seconds.
[150927.600263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[150927.600698] kswapd0 D ffffffff81806240 0 35 2 0x00000000
[150927.600704] ffff880212389330 0000000000000046 ffff880212389320 ffffffff81082df5
[150927.600710] ffff880212389fd8 ffff880212389fd8 ffff880212389fd8 0000000000013780
[150927.600715] ffff8802121816f0 ffff88020e538000 ffff880212389320 ffff88020e538000
[150927.600719] Call Trace:
[150927.600728] [<ffffffff81082df5>] ? __queue_work+0xe5/0x320
[150927.600733] [<ffffffff8165a55f>] schedule+0x3f/0x60
[150927.600739] [<ffffffff814e82c6>] md_flush_request+0x86/0x140
[150927.600745] [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200
[150927.600756] [<ffffffffa0010419>] raid0_make_request+0x119/0x1c0 [raid0]
...
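To see at a glance which tasks the kernel has reported, the "blocked for more than N seconds" lines can be pulled out of the dmesg output. A minimal sketch in Python, assuming the message format shown in the trace above; the names `HUNG_RE` and `hung_tasks` are hypothetical:

```python
import re

# Matches e.g. "INFO: task kswapd0:35 blocked for more than 120 seconds."
# The task name is matched greedily because names such as "kworker/0:0"
# contain a colon themselves; the PID is the number after the last colon.
HUNG_RE = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")


def hung_tasks(dmesg_text):
    """Map (task name, pid) to the timeout in seconds from its first report."""
    seen = {}
    for match in HUNG_RE.finditer(dmesg_text):
        key = (match.group(1), int(match.group(2)))
        seen.setdefault(key, int(match.group(3)))
    return seen
```

Feeding it the text of `dmesg` gives one entry per distinct hung task, however many times the kernel repeated the warning.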
Now, the only other thing I have found by googling is a suggestion that LSI
drivers lock up when there is any SMART or hddtemp activity: see the end of
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/906873
On this system the smartmontools package is installed, but I have not
configured it, and smartd is not running. I don't have hddtemp installed
either.
I am completely at a loss with all this... I've never seen a Unix/Linux
system behave so unreliably. One of the company's directors has reminded me
that we have a Windows storage server with 48 disks which has been running
without incident for the last 3 or 4 years, and I don't have a good answer
for that :-(
Regards,
Brian.