From: Brian Candler <B.Candler@pobox.com>
To: Stan Hoeppner <stan@hardwarefreak.com>
Cc: xfs@oss.sgi.com
Subject: Re: Storage server, hung tasks and tracebacks
Date: Tue, 15 May 2012 15:02:37 +0100
Message-ID: <20120515140237.GA3630@nsrc.org>
In-Reply-To: <4FA4C321.2070105@hardwarefreak.com>

Update:

After a week away, I am continuing to try to narrow down the problem of this
system with hanging I/O.

I can fairly reliably repeat the problem on a system with 24 disks, and I've
embarked on trying some different configs to see what's the simplest way I
can make this die.

During this, I found something of interest: I happened to leave an 'iostat
5' process running, and that hung too.  ps showed it in 'D+' state
(uninterruptible sleep), and it was unkillable.

root        34  0.6  0.0      0     0 ?        D    11:29   1:18 [kswapd0]
root      1258  0.0  0.0  15976   532 ?        Ds   11:29   0:00 /usr/sbin/irqbalance
root      1421  0.0  0.0      0     0 ?        D    12:49   0:01 [xfsaild/md127]
snmp      1430  0.0  0.0  48608  3440 ?        D    11:29   0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
xxxx      1614  1.1  0.0 378860  3812 pts/1    D+   12:50   1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1669  1.2  0.0 378860  3816 pts/2    D+   12:50   1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1727  0.5  0.0 383424   692 pts/3    Dl+  12:51   0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1782  1.2  0.0 378860  3824 pts/4    D+   12:51   1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1954  0.0  0.0   5912   544 pts/0    D+   12:58   0:00 iostat 5
root      2642  0.2  0.0      0     0 ?        D    13:25   0:09 [kworker/0:1]
root      3233  0.0  0.0   5044   168 ?        Ds   13:50   0:00 /usr/sbin/sshd -D -R
xxxx      4648  0.0  0.0   8104   936 pts/6    S+   14:41   0:00 grep --color=auto  D
root     29491  0.0  0.0      0     0 ?        D    12:45   0:00 [kworker/1:2]
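
For anyone wanting to collect the same list without grepping ps output, here
is a minimal Python sketch that scans /proc for tasks in state D.  The
function names are mine, and it assumes the /proc/<pid>/stat layout described
in proc(5), where the state letter is the first field after the parenthesised
command name.

```python
import os

def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state

def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for every process in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable mid-scan
        if state == "D":
            found.append((pid, comm))
    return sorted(found)

if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Running this from cron or a loop with a timestamp would show which task
enters D state first, which ps snapshots taken after the fact cannot.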

I wonder if iostat actually communicates with the device driver at all? If
not, then presumably it's looking at some kernel data structure.  Maybe
there is a lock being held on that by someone/something.

At the same time, I notice that 'cat /proc/diskstats' still works, and
starting a new 'iostat 5' process works too.
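
Since /proc/diskstats is still readable, the counters can be compared by hand
between two samples.  As far as I know, iostat from sysstat takes its numbers
from /proc/diskstats (and sysfs) rather than from the driver.  A rough Python
sketch follows; the field positions are taken from the kernel's iostats
documentation, the function names are mine, and the "stalled" test (I/O in
flight but no completions between samples) is only my guess at a useful
heuristic.

```python
def parse_diskstats(text):
    """Map device name -> selected counters from /proc/diskstats content."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip short or malformed lines
        stats[fields[2]] = {
            "reads_completed": int(fields[3]),
            "writes_completed": int(fields[7]),
            "in_flight": int(fields[11]),  # I/Os currently in progress
            "io_ms": int(fields[12]),      # time spent doing I/O, in ms
        }
    return stats

def stalled(before, after):
    """Devices with I/O outstanding but no completions between two samples."""
    names = []
    for name, now in after.items():
        prev = before.get(name)
        if prev is None:
            continue  # device appeared between samples
        no_progress = (now["reads_completed"] == prev["reads_completed"]
                       and now["writes_completed"] == prev["writes_completed"])
        if now["in_flight"] > 0 and no_progress:
            names.append(name)
    return sorted(names)

if __name__ == "__main__":
    import time
    with open("/proc/diskstats") as f:
        first = parse_diskstats(f.read())
    time.sleep(5)
    with open("/proc/diskstats") as f:
        second = parse_diskstats(f.read())
    print(stalled(first, second))
```

If one member disk of the array shows up here and the others do not, that
would point at a device or controller rather than at the filesystem.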

After issuing halt -p I get this:

root        34  0.6  0.0      0     0 ?        D    11:29   1:18 [kswapd0]
root      1258  0.0  0.0  15976   532 ?        Ds   11:29   0:00 /usr/sbin/irqbalance
root      1421  0.0  0.0      0     0 ?        D    12:49   0:01 [xfsaild/md127]
snmp      1430  0.0  0.0  48608  3440 ?        D    11:29   0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
xxxx      1614  1.0  0.0 378860  3812 pts/1    D+   12:50   1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1669  1.1  0.0 378860  3816 pts/2    D+   12:50   1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1727  0.5  0.0 383424   692 pts/3    Dl+  12:51   0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1782  1.1  0.0 378860  3824 pts/4    D+   12:51   1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1954  0.0  0.0   5912   544 pts/0    D+   12:58   0:00 iostat 5
root      2642  0.1  0.0      0     0 ?        D    13:25   0:09 [kworker/0:1]
root      3233  0.0  0.0   5044   168 ?        Ds   13:50   0:00 /usr/sbin/sshd -D -R
root      4753  0.0  0.0  15056   928 ?        D    14:42   0:00 umount /run/rpc_pipefs
root      4828  0.0  0.0   4296   348 ?        D    14:42   0:00 sync
root      4834  0.0  0.0   8100   624 pts/6    R+   14:50   0:00 grep --color=auto  D
root     29491  0.0  0.0      0     0 ?        D    12:45   0:00 [kworker/1:2]

I see even umount'ing rpc_pipefs is hanging. So this suggests there's some
sort of global lock involved.
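
One way to test the global-lock idea would be to check whether the stuck
tasks are all waiting in the same kernel function.  A sketch in Python,
assuming /proc/<pid>/wchan is readable on this kernel (it names the symbol a
sleeping task is blocked in, though on some kernels it only shows 0); the
helper names are mine.  If sysrq is enabled, 'echo w > /proc/sysrq-trigger'
should give fuller traces of the blocked tasks in dmesg.

```python
import os
from collections import defaultdict

def read_wchan(pid, proc_root="/proc"):
    """Return the kernel symbol a task is sleeping in, or None if unavailable."""
    try:
        with open(os.path.join(proc_root, str(pid), "wchan")) as f:
            name = f.read().strip()
    except OSError:
        return None
    return name or None

def group_by_wchan(pids, reader=read_wchan):
    """Group pids by wait channel; one large group hints at a shared wait point."""
    groups = defaultdict(list)
    for pid in pids:
        # Tasks whose wait channel cannot be read are collected under "?"
        groups[reader(pid) or "?"].append(pid)
    return dict(groups)

if __name__ == "__main__":
    # PIDs of some of the D-state tasks from the listing above
    for wchan, members in group_by_wchan([34, 1421, 1954, 4753, 4828]).items():
        print(wchan, members)
```

If everything lands in one group, that supports a single shared lock; several
distinct groups would suggest a chain of waiters instead.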

Anyway, I just wonder if this jogs a memory in anyone, as to why iostat
would hang in an unkillable way.

Regards,

Brian.

