From: Brian Candler <B.Candler@pobox.com>
To: Stan Hoeppner <stan@hardwarefreak.com>
Cc: xfs@oss.sgi.com
Subject: Re: Storage server, hung tasks and tracebacks
Date: Tue, 15 May 2012 15:02:37 +0100
Message-ID: <20120515140237.GA3630@nsrc.org>
In-Reply-To: <4FA4C321.2070105@hardwarefreak.com>
Update:
After a week away, I am continuing to try to narrow down the problem of this
system with hanging I/O.
I can fairly reliably repeat the problem on a system with 24 disks, and I've
embarked on trying some different configs to see what's the simplest way I
can make this die.
During this, I found something of interest: I happened to leave an 'iostat
5' process running, and that hung too. i.e. ps showed it in 'D+' state, and
it was unkillable.
root 34 0.6 0.0 0 0 ? D 11:29 1:18 [kswapd0]
root 1258 0.0 0.0 15976 532 ? Ds 11:29 0:00 /usr/sbin/irqbalance
root 1421 0.0 0.0 0 0 ? D 12:49 0:01 [xfsaild/md127]
snmp 1430 0.0 0.0 48608 3440 ? D 11:29 0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
xxxx 1614 1.1 0.0 378860 3812 pts/1 D+ 12:50 1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1669 1.2 0.0 378860 3816 pts/2 D+ 12:50 1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1727 0.5 0.0 383424 692 pts/3 Dl+ 12:51 0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1782 1.2 0.0 378860 3824 pts/4 D+ 12:51 1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1954 0.0 0.0 5912 544 pts/0 D+ 12:58 0:00 iostat 5
root 2642 0.2 0.0 0 0 ? D 13:25 0:09 [kworker/0:1]
root 3233 0.0 0.0 5044 168 ? Ds 13:50 0:00 /usr/sbin/sshd -D -R
xxxx 4648 0.0 0.0 8104 936 pts/6 S+ 14:41 0:00 grep --color=auto D
root 29491 0.0 0.0 0 0 ? D 12:45 0:00 [kworker/1:2]
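For anyone wanting to capture the same list without depending on ps, here is a minimal sketch that walks /proc and reports tasks in uninterruptible sleep (state 'D'). It assumes a Linux /proc layout; the helper names parse_stat and d_state_tasks are made up for illustration and are not part of any existing tool.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return pid, comm, state


def d_state_tasks(proc_root="/proc"):
    """List (pid, comm) for every task whose state is 'D'."""
    if not os.path.isdir(proc_root):
        return []  # no procfs here (not Linux, or not mounted)
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were looking, or unreadable
        if state == "D":
            tasks.append((pid, comm))
    return sorted(tasks)


if __name__ == "__main__":
    for pid, comm in d_state_tasks():
        print(pid, comm)
```

Running this periodically and comparing the output would show which tasks enter 'D' state first, which may help order the hang.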
I wonder if iostat actually communicates with the device driver at all? If
not, then presumably it's looking at some kernel data structure. Maybe
there is a lock being held on that by someone/something.
At the same time, I notice that 'cat /proc/diskstats' still works, and
starting a new 'iostat 5' process works too.
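Since /proc/diskstats remains readable, one thing that could be pulled from it is the per-device count of I/Os currently in progress (the ninth statistics field after the device name). The sketch below is only an illustration of parsing that file; the function name inflight_by_device is hypothetical, and whether a stuck non-zero in-flight count would actually pinpoint this hang is an assumption, not something established above.

```python
import os


def inflight_by_device(diskstats_text):
    """Map device name -> I/Os currently in progress, from /proc/diskstats text."""
    result = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # blank or short line; skip rather than guess
        # fields: major, minor, name, then the statistics columns;
        # index 11 is "I/Os currently in progress".
        result[fields[2]] = int(fields[11])
    return result


if __name__ == "__main__":
    path = "/proc/diskstats"
    if os.path.exists(path):
        with open(path) as f:
            for name, inflight in sorted(inflight_by_device(f.read()).items()):
                if inflight:
                    print(name, inflight)
```

Sampling this a few seconds apart and looking for devices whose in-flight count never drains would be one way to see whether requests are stuck below the filesystem.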
After issuing halt -p I get this:
root 34 0.6 0.0 0 0 ? D 11:29 1:18 [kswapd0]
root 1258 0.0 0.0 15976 532 ? Ds 11:29 0:00 /usr/sbin/irqbalance
root 1421 0.0 0.0 0 0 ? D 12:49 0:01 [xfsaild/md127]
snmp 1430 0.0 0.0 48608 3440 ? D 11:29 0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
xxxx 1614 1.0 0.0 378860 3812 pts/1 D+ 12:50 1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1669 1.1 0.0 378860 3816 pts/2 D+ 12:50 1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1727 0.5 0.0 383424 692 pts/3 Dl+ 12:51 0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1782 1.1 0.0 378860 3824 pts/4 D+ 12:51 1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1954 0.0 0.0 5912 544 pts/0 D+ 12:58 0:00 iostat 5
root 2642 0.1 0.0 0 0 ? D 13:25 0:09 [kworker/0:1]
root 3233 0.0 0.0 5044 168 ? Ds 13:50 0:00 /usr/sbin/sshd -D -R
root 4753 0.0 0.0 15056 928 ? D 14:42 0:00 umount /run/rpc_pipefs
root 4828 0.0 0.0 4296 348 ? D 14:42 0:00 sync
root 4834 0.0 0.0 8100 624 pts/6 R+ 14:50 0:00 grep --color=auto D
root 29491 0.0 0.0 0 0 ? D 12:45 0:00 [kworker/1:2]
I see even umount'ing rpc_pipefs is hanging. So this suggests there's some
sort of global lock involved.
Anyway, I just wonder if this jogs a memory in anyone, as to why iostat
would hang in an unkillable way.
Regards,
Brian.