From: Brian Candler <B.Candler@pobox.com>
To: Stan Hoeppner <stan@hardwarefreak.com>
Cc: xfs@oss.sgi.com
Subject: Re: Storage server, hung tasks and tracebacks
Date: Tue, 15 May 2012 15:02:37 +0100
Message-ID: <20120515140237.GA3630@nsrc.org>
In-Reply-To: <4FA4C321.2070105@hardwarefreak.com>
Update:
After a week away, I am continuing to try to narrow down the problem of this
system with hanging I/O.
I can fairly reliably repeat the problem on a system with 24 disks, and I've
embarked on trying some different configs to see what's the simplest way I
can make this die.
During this, I found something of interest: I happened to leave an 'iostat
5' process running, and that hung too, i.e. ps showed it in 'D+' state and
it was unkillable.
root 34 0.6 0.0 0 0 ? D 11:29 1:18 [kswapd0]
root 1258 0.0 0.0 15976 532 ? Ds 11:29 0:00 /usr/sbin/irqbalance
root 1421 0.0 0.0 0 0 ? D 12:49 0:01 [xfsaild/md127]
snmp 1430 0.0 0.0 48608 3440 ? D 11:29 0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
xxxx 1614 1.1 0.0 378860 3812 pts/1 D+ 12:50 1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1669 1.2 0.0 378860 3816 pts/2 D+ 12:50 1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1727 0.5 0.0 383424 692 pts/3 Dl+ 12:51 0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1782 1.2 0.0 378860 3824 pts/4 D+ 12:51 1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1954 0.0 0.0 5912 544 pts/0 D+ 12:58 0:00 iostat 5
root 2642 0.2 0.0 0 0 ? D 13:25 0:09 [kworker/0:1]
root 3233 0.0 0.0 5044 168 ? Ds 13:50 0:00 /usr/sbin/sshd -D -R
xxxx 4648 0.0 0.0 8104 936 pts/6 S+ 14:41 0:00 grep --color=auto D
root 29491 0.0 0.0 0 0 ? D 12:45 0:00 [kworker/1:2]
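A listing like the one above can also be collected directly from /proc, which avoids the grep matching itself and can show where in the kernel each task is sleeping. The following is only a minimal sketch: the function names and the `proc_root` parameter are made up for illustration, it looks at processes only (not individual threads), and /proc/<pid>/wchan may be empty or "0" depending on the kernel configuration.

```python
import os

def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis rather than on whitespace.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]

def read_optional(path):
    """Return the stripped contents of path, or '' if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # The process may have exited, or we may lack permission.
        return ""

def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm, wchan) for each process in state D.

    proc_root is a hypothetical parameter so the function can be pointed
    at a copy of /proc for testing.
    """
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat = read_optional(os.path.join(proc_root, entry, "stat"))
        if not stat:
            continue
        pid, comm, state = parse_stat(stat)
        if state == "D":
            wchan = read_optional(os.path.join(proc_root, entry, "wchan"))
            yield pid, comm, wchan
```

Calling `list(blocked_tasks())` on the affected machine would give one tuple per uninterruptible process. If the wchan values turn out to be the same for iostat, sync and umount, that would be a hint about which wait they share.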
I wonder if iostat actually communicates with the device driver at all? If
not, then presumably it's looking at some kernel data structure. Maybe
there is a lock being held on that by someone/something.
At the same time, I notice that 'cat /proc/diskstats' still works, and
starting a new 'iostat 5' process works too.
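For reference, /proc/diskstats is plain text, and to my understanding iostat from the sysstat package normally takes its numbers from there (and from sysfs) rather than talking to the driver. Since the file stays readable during the hang, two samples of it can be compared to spot devices that have requests in flight but are completing nothing. This is a rough sketch under assumptions: the function names are invented for illustration, the field positions follow the kernel's documented diskstats layout, and "in flight with no completions" is only a heuristic for a stuck device.

```python
def parse_diskstats(text):
    """Parse /proc/diskstats text into {device: counters}."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            # Skip blank or unexpectedly short lines.
            continue
        # f[0], f[1] are major/minor numbers; f[2] is the device name.
        stats[f[2]] = {
            "reads_completed": int(f[3]),
            "writes_completed": int(f[7]),
            "in_flight": int(f[11]),
        }
    return stats

def stuck_devices(before, after):
    """Return names of devices that have I/O in flight in the second
    sample but completed no reads or writes between the two samples."""
    stuck = []
    for name, now in sorted(after.items()):
        prev = before.get(name)
        if prev is None:
            continue
        no_progress = (
            now["reads_completed"] == prev["reads_completed"]
            and now["writes_completed"] == prev["writes_completed"]
        )
        if now["in_flight"] > 0 and no_progress:
            stuck.append(name)
    return stuck
```

In use, one would read /proc/diskstats, sleep for a few seconds, read it again, and pass both parsed samples to `stuck_devices`. Whether the result names a single member disk or the md device itself could help narrow down where requests are getting lost.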
After issuing halt -p I get this:
root 34 0.6 0.0 0 0 ? D 11:29 1:18 [kswapd0]
root 1258 0.0 0.0 15976 532 ? Ds 11:29 0:00 /usr/sbin/irqbalance
root 1421 0.0 0.0 0 0 ? D 12:49 0:01 [xfsaild/md127]
snmp 1430 0.0 0.0 48608 3440 ? D 11:29 0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
xxxx 1614 1.0 0.0 378860 3812 pts/1 D+ 12:50 1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1669 1.1 0.0 378860 3816 pts/2 D+ 12:50 1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1727 0.5 0.0 383424 692 pts/3 Dl+ 12:51 0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1782 1.1 0.0 378860 3824 pts/4 D+ 12:51 1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1954 0.0 0.0 5912 544 pts/0 D+ 12:58 0:00 iostat 5
root 2642 0.1 0.0 0 0 ? D 13:25 0:09 [kworker/0:1]
root 3233 0.0 0.0 5044 168 ? Ds 13:50 0:00 /usr/sbin/sshd -D -R
root 4753 0.0 0.0 15056 928 ? D 14:42 0:00 umount /run/rpc_pipefs
root 4828 0.0 0.0 4296 348 ? D 14:42 0:00 sync
root 4834 0.0 0.0 8100 624 pts/6 R+ 14:50 0:00 grep --color=auto D
root 29491 0.0 0.0 0 0 ? D 12:45 0:00 [kworker/1:2]
I see even umount'ing rpc_pipefs is hanging. So this suggests there's some
sort of global lock involved.
Anyway, I just wonder if this jogs a memory in anyone, as to why iostat
would hang in an unkillable way.
Regards,
Brian.