public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Michael Monnerie <michael.monnerie@is.it-management.at>
Cc: xfs@oss.sgi.com
Subject: Re: XFS hangs and freezes with LSI 9265-8i controller on high i/o
Date: Fri, 15 Jun 2012 22:29:34 +1000	[thread overview]
Message-ID: <20120615122934.GB19223@dastard> (raw)
In-Reply-To: <47854255.KfXFdqTbOZ@saturn>

On Fri, Jun 15, 2012 at 11:52:17AM +0200, Michael Monnerie wrote:
> Am Freitag, 15. Juni 2012, 10:16:02 schrieb Dave Chinner:
> > So, the average service time for an IO is 10-16ms, which is a seek
> > per IO. You're doing primarily 128k read IOs, and maybe one or 2
> > writes a second. You have a very deep request queue: > 512 requests.
> > Have you tuned /sys/block/sda/queue/nr_requests up from the default
> > of 128? This is going to be one of the causes of your problems - you
> > have 511 outstanding write requests, and only one read at a time.
> > Reduce the io scheduler queue depth, and potentially also the device
> > CTQ depth.
> 
> Dave, I'm puzzled by this. I'd believe that a higher #req. would help 
> the block layer to re-sort I/O in the elevator, and therefore help to 
> gain throughput. Why would 128 be better than 512 here?

512 * 16ms per IO = 7-8s IO latency.

Fundamentally, deep queues are as harmful to latency as shallow
queues are to throughput. Everyone says "make the queues deeper" to
get the highest benchmark numbers, but in reality most benchmarks
measure throughput and aren't IO latency sensitive.
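To put numbers on that trade-off, here's the back-of-the-envelope
arithmetic, using the ~16ms per-IO service time from the iostat
output earlier in the thread:

```python
# Worst-case latency for a newly queued read when the request queue
# is already full: it waits behind every request ahead of it.
def worst_case_latency(queue_depth, service_time_ms=16):
    return queue_depth * service_time_ms / 1000.0  # seconds

for depth in (4, 128, 512):
    print(f"depth {depth:4d}: up to {worst_case_latency(depth):.1f}s")
```

At the default nr_requests of 128 that's already ~2s of worst-case
read latency; at 512 it's over 8s.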

I did a bunch of measurements 7 or 8 years ago on high end FC HW
RAID, and found that a CTQ depth per lun of 4 was all that was
needed to reach maximum write bandwidth under almost all
circumstances. When doing concurrent read and write with a CTQ
depth of 4, the balance was roughly 50/50 read/write. All things
the same except for a CTQ depth of 6, and it was 30/70 read/write.
And any CTQ depth deeper than 8 was roughly 10/90 read/write. That
hardware supported a CTQ depth of 240 IOs per lun....

So even high end hardware that can support a maximum CTQ depth of
256 IOs will see this problem - you'll get 255 writes and a single
read at a time, resulting in terrible read IO latency. There is
always another async write ready to be queued, but the application
doesn't queue another read until the first one completes. Hence
reads are always issued in small numbers, and when any IO completes
there isn't another read queued ready for dispatch - all that
happens is that async writes are sent to the drive.
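That starvation pattern is easy to reproduce in a toy model: the
async writer always has another IO ready, while the sync reader has
at most one in flight. A sketch with invented numbers (one IO
completed per tick, FIFO dispatch for simplicity):

```python
def simulate(ticks, queue_depth):
    # Queue starts with one read and the rest writes. The device
    # completes one IO per tick; the writer refills the queue
    # immediately, the reader only after its IO completes.
    queue = ["R"] + ["W"] * (queue_depth - 1)
    reads = writes = 0
    read_in_flight = True
    for _ in range(ticks):
        io = queue.pop(0)
        if io == "R":
            reads += 1
            read_in_flight = False
        else:
            writes += 1
        if not read_in_flight:
            queue.append("R")       # reader resubmits
            read_in_flight = True
        else:
            queue.append("W")       # writer always has more
    return reads, writes

print(simulate(5120, 4))    # shallow queue: reads get through often
print(simulate(5120, 512))  # deep queue: one read per 512 IOs
```

With a depth of 512 the lone read completes once per full queue
drain; with a depth of 4 the same workload completes 128x as many
reads in the same time.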

And then when the BBWC fills up and has to flush all those writes,
everything slows right down because the cache effectively becomes
a write-through cache - it can't take another read or write until
the flush completes another IO and space is freed in the BBWC for
the next IO.

> And maybe Matthew could profit from limiting vm.dirty_bytes; I've 
> seen that when this value is too high the server gets stuck on lots 
> of writes. For streaming it's better to have this smaller so the 
> disk writes can keep up and delays are not too long.

I pretty much never tune dirty limits anymore - most writeback
problems are storage stack related these days...

> > Oh, I just noticed you might be using CFQ (it's the default in
> > dmesg). Don't - CFQ is highly unsuited for hardware RAID - it's
> > heuristically tuned to work well on single SATA drives. Use
> > deadline, or preferably for hardware RAID, noop.
> 
> Wouldn't deadline be better with a higher rq_qu size? As I understand 
> it, noop only groups adjacent I/Os together, while deadline does a bit 
> more and should be able to get bigger adjacent I/O areas because it 
> waits a bit longer before a flush.

The BBWC does a much better job of sorting and batching IOs than the
io scheduler can ever possibly hope to. Think about it - 512MB can
hold 100,000 4k IOs and reorder and batch them far more effectively
than an io scheduler with even a 512 request deep queue.
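The arithmetic behind that comparison:

```python
cache_bytes = 512 * 1024 * 1024   # 512MB BBWC
io_size = 4 * 1024                # 4k IOs
print(cache_bytes // io_size)     # 131072 - well over 100,000 IOs
```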

That's why making the IO scheduler queue deeper with HW RAID is
harmful - it's not needed to reach maximum performance for almost
all workloads, and all it does is add latency to the IO path...
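For reference, the tunables discussed above live in sysfs under
/sys/block/<dev>/queue/. A hedged sketch of setting them from
Python (the device name and values are examples only, not
recommendations for any particular setup; the sysfs root is a
parameter so the function can be exercised against a scratch
directory):

```python
from pathlib import Path

def tune_block_queue(dev, scheduler="noop", nr_requests=128,
                     sysfs_root="/sys"):
    # Set the io scheduler and request queue depth for a block
    # device via /sys/block/<dev>/queue/{scheduler,nr_requests}.
    # Needs root on a real system.
    q = Path(sysfs_root) / "block" / dev / "queue"
    (q / "scheduler").write_text(scheduler + "\n")
    (q / "nr_requests").write_text(str(nr_requests) + "\n")
    return q
```

On a real machine this is just the usual
"echo noop > /sys/block/sda/queue/scheduler" and
"echo 128 > /sys/block/sda/queue/nr_requests".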

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 20+ messages
2012-06-11 21:37 XFS hangs and freezes with LSI 9265-8i controller on high i/o Matthew Whittaker-Williams
2012-06-12  1:18 ` Dave Chinner
2012-06-12 15:56   ` Matthew Whittaker-Williams
2012-06-12 17:40     ` Matthew Whittaker-Williams
2012-06-13  0:12     ` Stan Hoeppner
2012-06-13  1:19     ` Dave Chinner
2012-06-13  3:56       ` Stan Hoeppner
2012-06-13  8:54       ` Matthew Whittaker-Williams
2012-06-13 11:59         ` Andre Noll
2012-06-13 12:13           ` Michael Monnerie
2012-06-13 16:12             ` Stan Hoeppner
2012-06-14  7:31               ` Michael Monnerie
2012-06-14  0:04         ` Dave Chinner
2012-06-14 14:31           ` Matthew Whittaker-Williams
2012-06-15  0:16             ` Dave Chinner
2012-06-15  9:52               ` Michael Monnerie
2012-06-15 12:29                 ` Dave Chinner [this message]
2012-06-15 11:25               ` Bernd Schubert
2012-06-15 12:30                 ` Dave Chinner
2012-06-15 14:22                   ` Bernd Schubert
