From: Brian Candler <B.Candler@pobox.com>
To: Stan Hoeppner <stan@hardwarefreak.com>
Cc: xfs@oss.sgi.com
Subject: Re: Storage server, hung tasks and tracebacks
Date: Fri, 4 May 2012 17:32:37 +0100
Message-ID: <20120504163237.GA6128@nsrc.org>
In-Reply-To: <4FA3047D.8060908@hardwarefreak.com>
On Thu, May 03, 2012 at 05:19:41PM -0500, Stan Hoeppner wrote:
> Glad to hear you've got one running somewhat stable. Could be a driver
> problem, but it's pretty rare for a SCSI driver to hard lock a box isn't
> it?
Yes, that bothers me too.
> Keep us posted.
Last night I fired up two more instances of bonnie++ on that box, so there
were four at once. Going back to the box now, I find that they have all
hung :-(
They are stuck at:

  Delete files in random order...
  Stat files in random order...
  Stat files in random order...
  Stat files in sequential order...

respectively.
iostat 5 shows no activity. There are 9 hung processes:
$ uptime
17:23:35 up 1 day, 20:39, 1 user, load average: 9.04, 9.08, 8.91
$ ps auxwww | grep " D" | grep -v grep
root 35 1.5 0.0 0 0 ? D May02 42:10 [kswapd0]
root 1179 0.0 0.0 0 0 ? D May02 1:50 [xfsaild/md126]
root 3127 0.0 0.0 25096 312 ? D 16:55 0:00 /usr/lib/postfix/master
tomi 29138 1.1 0.0 378860 3708 pts/1 D+ 12:43 3:06 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi 29390 1.0 0.0 378860 3560 pts/3 D+ 12:52 2:53 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi 30356 1.1 0.0 378860 3512 pts/2 D+ 13:32 2:36 bonnie++ -d /disk/scratch/testb -s 16384k -n 98:800k:500k:1000
root 31075 0.0 0.0 0 0 ? D 14:00 0:04 [kworker/0:0]
tomi 31796 0.6 0.0 378860 3864 pts/4 D+ 14:30 1:05 bonnie++ -d /disk/scratch/testb -s 16384k -n 98:800k:500k:1000
root 31922 0.0 0.0 0 0 ? D 14:35 0:00 [kworker/1:0]
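
As an aside, grepping for " D" can also match command lines that merely contain that text (a daemon started with a -D flag, say). A rough sketch of a stricter filter follows, which is not what was run above. It assumes a procps-style ps that accepts the -o column list shown, and d_state_filter is just a hypothetical helper name:

```shell
# Keep the header line plus any row whose STAT column (field 2) starts
# with D, i.e. tasks in uninterruptible sleep. Matching on the column
# avoids false hits on " D" elsewhere in the line.
#
# Intended usage (assumes procps ps):
#   ps -eo pid,stat,wchan:32,args | d_state_filter
d_state_filter() {
    awk 'NR == 1 || $2 ~ /^D/'
}
```

The wchan column should show the kernel function each task is sleeping in, which may help when correlating with the dmesg backtraces.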
dmesg shows hung tasks and backtraces, starting with:
[150927.599920] INFO: task kswapd0:35 blocked for more than 120 seconds.
[150927.600263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[150927.600698] kswapd0 D ffffffff81806240 0 35 2 0x00000000
[150927.600704] ffff880212389330 0000000000000046 ffff880212389320 ffffffff81082df5
[150927.600710] ffff880212389fd8 ffff880212389fd8 ffff880212389fd8 0000000000013780
[150927.600715] ffff8802121816f0 ffff88020e538000 ffff880212389320 ffff88020e538000
[150927.600719] Call Trace:
[150927.600728] [<ffffffff81082df5>] ? __queue_work+0xe5/0x320
[150927.600733] [<ffffffff8165a55f>] schedule+0x3f/0x60
[150927.600739] [<ffffffff814e82c6>] md_flush_request+0x86/0x140
[150927.600745] [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200
[150927.600756] [<ffffffffa0010419>] raid0_make_request+0x119/0x1c0 [raid0]
...
Now, the only other thing I have found by googling is a suggestion that LSI
drivers lock up when there is any SMART or hddtemp activity: see the end of
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/906873
On this system the smartmontools package is installed, but I have not
configured it, and smartd is not running. I don't have hddtemp installed
either.
I am completely at a loss with all this... I've never seen a Unix/Linux
system behave so unreliably. One of the company's directors has reminded me
that we have a Windows storage server with 48 disks which has been running
without incident for the last 3 or 4 years, and I don't have a good answer
for that :-(
Regards,
Brian.