From: Dave Chinner <david@fromorbit.com>
To: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Cc: xfs@oss.sgi.com
Subject: Re: task blocked for more than 120 seconds
Date: Sat, 21 Apr 2012 10:29:32 +1000
Message-ID: <20120421002932.GG9541@dastard>
In-Reply-To: <20120420135820.GB9189@poseidon.cudanet.local>

On Fri, Apr 20, 2012 at 09:58:20AM -0400, Josef 'Jeff' Sipek wrote:
> On Fri, Apr 20, 2012 at 08:56:03AM +1000, Dave Chinner wrote:
> > On Thu, Apr 19, 2012 at 11:46:02AM -0400, Josef 'Jeff' Sipek wrote:
> FWIW, the filesystem was made using the default mkfs options (IOW: mkfs.xfs
> /dev/sda4).  I didn't even notice the log being 2GB.  (I just assumed it'd
> be the good ol' 128MB.)

OK.
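
(If it helps, the log size is visible in xfs_info output; here's a
sketch of what the relevant line looks like, with sizes that will
differ on your filesystem:

	# xfs_info /dev/sda4
	...
	log      =internal             bsize=4096   blocks=521728, version=2
	...

521728 4k blocks is roughly the 2GB log you ended up with.)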

> ...
> > > The following applies to all of the spikes, but I'm specifically talking about
> > > the spike from this morning.  During the ~45 minute spike, there seems to be
> > > very little disk I/O (<1 MByte/s compared to the usual 10 MBytes/s).  Since
> > 
> > That sounds like it might have dropped into random 4k write IO or
> > inode cluster RMW cycles - a single large RAID6 volume is going to
> > be no faster than a single spindle at this. Can you get `iostat -d
> > -m -x 5` output when the next slowdown occurs so we can see the IOPS
> > and utilisation as well as the bandwidth?
> 
> Right.  Here's the output from a few minutes ago (I removed the empty lines
> and redundant column headings):

Thanks, that makes it easy to read ;)

> fs3.ess ~ # iostat -d -m -x 5
> Linux 3.2.2 (fs3.ess.sfj.cudaops.com) 	04/20/12 	_x86_64_	(6 CPU)
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda              11.90   129.18  261.81  457.22     2.81     6.76    27.26     4.44    6.18    9.60    4.22   0.82  59.02
> sda               0.00     0.20   61.20  112.60     0.49     0.88    16.17     1.50    8.64   17.66    3.74   5.74  99.76
> sda               0.00     0.60   61.40  117.60     0.48     0.92    16.02     1.33    7.41   16.18    2.82   5.56  99.52
> sda               0.00     1.20   63.40  104.20     0.50     0.82    16.09     1.28    7.65   15.70    2.75   5.94  99.60

OK, so the queue depth here shows serialised IO (avgqu-sz barely
above 1), and the request size is 8k (avgrq-sz of ~16 sectors),
which will be inode clusters. The read service time is around 16ms,
which, assuming SATA drives, is a full disk seek. The writes are
much faster due to the battery-backed write cache (BBWC). It looks,
however, like it is reading inodes from all over the drive, and
that is the delay factor here. The serialisation indicates that
either log tail pushing or low-memory inode reclaim is slowing
everyone down here.
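
(For reference, iostat reports avgrq-sz in 512-byte sectors, so
during the slowdown:

	16.17 sectors * 512 bytes ~= 8.1 KiB per request

which is inode-cluster sized, versus ~13.6 KiB per request in the
mixed baseline sample on the first line.)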

Now comes the "more data needed" bit. I still need the sysrq-w
output, but given the length of the incident some other data is
definitely needed as well (a capture sketch follows the list):

	- a 30s event trace - it'll compress pretty well
	  (trace-cmd record -e xfs* sleep 30; trace-cmd report > output.txt)
	- vmstat 1 output for the same period
	- output of /proc/meminfo at the same time
	- the same iostat output for comparison.
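
Something like this untested sketch would capture all four at once
during the next incident (output filenames are illustrative):

	#!/bin/sh
	# capture ~30s of data while the slowdown is in progress
	trace-cmd record -e 'xfs*' sleep 30 &   # event trace -> trace.dat
	vmstat 1 30 > vmstat.out &              # per-second memory/CPU stats
	iostat -d -m -x 5 6 > iostat.out &      # IOPS, bandwidth, utilisation
	cat /proc/meminfo > meminfo.out         # memory snapshot
	wait                                    # let the 30s samplers finish
	trace-cmd report > trace.txt            # human-readable event trace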

> And now while the system is coming back to the usual loadavg (of 30):
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               0.00    26.00   29.00 1235.40     0.13    17.10    27.91   202.30  161.37 1792.52  123.08   0.79 100.00
> sda               0.00    29.80   26.20 1095.60     0.11    15.34    28.22   185.88  187.19 2248.92  137.89   0.89 100.00
> sda               0.00    28.20   11.80 1527.20     0.06    19.80    26.44   165.23  112.77  665.83  108.50   0.65 100.00

Yeah, that looks more like a properly working filesystem :/

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
