Understanding I/O behaviour - next try

From: Martin Knoblauch <spamtrap@knobisoft.de>
To: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	mingo@redhat.com, spam trap <spamtrap@knobisoft.de>
Subject: Understanding I/O behaviour - next try
Date: Tue, 28 Aug 2007 08:53:07 -0700 (PDT)
Message-ID: <713252.42570.qm@web32614.mail.mud.yahoo.com>

Keywords: I/O, bdi-v9, cfs

Hi,

 a while ago I asked a few questions about Linux I/O behaviour,
because I was (and still am) fighting some "misbehaviour" related to
heavy I/O.

 The basic setup is a dual x86_64 box with 8 GB of memory. The DL380
has a HW RAID5, made from 4x72GB disks and about 100 MB write cache.
The performance of the block device with O_DIRECT is about 90 MB/sec.

 The problematic behaviour comes when we are moving large files through
the system. The file usage in this case is mostly "use once" or
streaming. As soon as the amount of file data is larger than 7.5 GB, we
see occasional unresponsiveness of the system (e.g. no more ssh
connections into the box) of more than 1 or 2 minutes (!) duration
(kernels up to 2.6.19). Load goes up, mainly due to pdflush threads and
some other poor guys being in "D" state.

 The data flows in basically three modes. All of them are affected:

local-disk -> NFS
NFS -> local-disk
NFS -> NFS

 NFS is V3/TCP.

 So, I ran a few experiments over the last few days, using three
different kernels: 2.6.22.5, 2.6.22.5+cfs20.4 and 2.6.22.5+bdi-v9.

 The first observation (independent of the kernel) is that we *should*
use O_DIRECT, at least for output to the local disk. Here we see about
90 MB/sec write performance. A simple "dd" using 1, 2 and 3 parallel
threads to the same block device (through an ext2 FS) gives:

O_DIRECT: 88 MB/s, 2x44, 3x29.5
non-O_DIRECT: 51 MB/s, 2x19, 3x12.5
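
To make the comparison easier to read, the per-thread figures can be multiplied out into aggregate throughput. The following is only a small arithmetic sketch in Python; the dictionary layout and the `aggregate` helper are illustrative names, and the numbers are copied from the two lines above.

```python
# Per-thread dd throughput in MB/s, copied from the figures above.
# Keys are the number of parallel dd threads.
results = {
    "O_DIRECT": {1: 88.0, 2: 44.0, 3: 29.5},
    "non-O_DIRECT": {1: 51.0, 2: 19.0, 3: 12.5},
}


def aggregate(per_thread):
    """Return total MB/s per thread count (threads * per-thread rate)."""
    return {n: n * rate for n, rate in per_thread.items()}


for mode, per_thread in results.items():
    totals = aggregate(per_thread)
    line = ", ".join(f"{n} thread(s): {mbs:.1f} MB/s" for n, mbs in totals.items())
    print(f"{mode}: {line}")
```

Read this way, the O_DIRECT total stays at roughly 88 MB/s regardless of thread count, while the non-O_DIRECT total drops from 51 MB/s with one thread to about 38 MB/s with two or three.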

- Observation 1a: IO schedulers are mostly equivalent, with CFQ
slightly worse than AS and DEADLINE
- Observation 1b: when using a 2.6.22.5+cfs20.4, the non-O_DIRECT
performance goes [slightly] down. With three threads it is 3x10 MB/s.
Ingo?
- Observation 1c: bdi-v9 does not help in this case, which is not
surprising.

 The real question here is why the non-O_DIRECT case is so slow. Is
this a general thing? Is this related to the CCISS controller? Using
O_DIRECT is unfortunately not an option for us.

 When using three different targets (local disk plus two different NFS
Filesystems) bdi-v9 is a big winner. Without it, all threads are [seem
to be] limited to the speed of the slowest FS. With bdi-v9 we see a
considerable speedup.

 Just by chance I found out that doing all I/O in sync-mode does
prevent the load from going up. Of course, I/O throughput is not
stellar (but not much worse than the non-O_DIRECT case). But the
responsiveness seems OK. Maybe a solution, as this can be controlled via
mount (would be great for O_DIRECT :-).
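
For anyone who wants to reproduce the buffered versus synchronous comparison without remounting, here is a rough Python sketch. It is not the test used above (that was "dd" plus the sync mount option); it approximates sync-mode per file with the O_SYNC open flag, and `timed_write` is a hypothetical helper name. The demo writes only a few MiB to a temporary directory, so its rates mean little; a real test needs files much larger than RAM on the target filesystem.

```python
import os
import tempfile
import time


def timed_write(path, total_bytes, chunk_bytes=1 << 20, sync=False):
    """Write total_bytes of zeros to path and return the rate in MiB/s.

    With sync=True the file is opened with O_SYNC, so each write returns
    only after the data has been handed to the storage device. Without it,
    the timing mostly measures copying into the page cache.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        flags |= os.O_SYNC
    chunk = bytes(chunk_bytes)
    fd = os.open(path, flags, 0o644)
    start = time.monotonic()
    try:
        remaining = total_bytes
        while remaining > 0:
            # os.write may write fewer bytes than requested; count what it reports.
            written = os.write(fd, chunk[:min(chunk_bytes, remaining)])
            remaining -= written
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return total_bytes / (1 << 20) / max(elapsed, 1e-9)


# Tiny demonstration run in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for sync in (False, True):
        rate = timed_write(os.path.join(tmp, "testfile"), 4 << 20, sync=sync)
        print(f"sync={sync}: {rate:.1f} MiB/s")
```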

 In general 2.6.22 seems to be better than 2.6.19, but this is highly
subjective :-( I am using the following settings in /proc. They seem to
provide the smoothest responsiveness:

vm.dirty_background_ratio = 1
vm.dirty_ratio = 1
vm.swappiness = 1
vm.vfs_cache_pressure = 1
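
As a rough feel for what a ratio of 1 means on this box, the sketch below converts the two dirty ratios into megabytes. It assumes the percentage is applied to the full 8 GB of RAM; the kernel actually applies it to the memory it considers dirtyable, so the real thresholds are somewhat lower. `approx_dirty_limit_mb` is an illustrative helper, not a kernel interface.

```python
def approx_dirty_limit_mb(total_ram_mb, ratio_percent):
    """Upper-bound estimate: ratio applied to total RAM (an assumption)."""
    return total_ram_mb * ratio_percent / 100.0


ram_mb = 8 * 1024  # 8 GB box, as described at the top of this mail

for name, ratio in (("vm.dirty_background_ratio", 1), ("vm.dirty_ratio", 1)):
    limit = approx_dirty_limit_mb(ram_mb, ratio)
    print(f"{name} = {ratio} -> at most about {limit:.0f} MB of dirty pages")
```

Under that assumption both limits come out at about 82 MB, which is in the same range as the roughly 100 MB write cache of the RAID controller.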

 Another thing I saw during my tests is that when writing to NFS, the
"dirty" or "nr_dirty" numbers are always 0. Is this a conceptual thing,
or a bug?
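
For reference, the counters can be watched with a few lines of Python. This is a sketch that parses /proc/vmstat and prints "nr_dirty" next to "nr_writeback" and "nr_unstable"; the last two are included on the assumption that NFS pages may be accounted there rather than as dirty, and not every kernel exposes all three names. The helper names and the fallback sample text are made up for illustration.

```python
def parse_vmstat(text):
    """Parse 'name value' lines as found in /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def writeback_counters(text, names=("nr_dirty", "nr_writeback", "nr_unstable")):
    """Return the selected counters (in pages); None if a counter is absent."""
    counters = parse_vmstat(text)
    return {name: counters.get(name) for name in names}


try:
    with open("/proc/vmstat") as f:
        vmstat_text = f.read()
except OSError:
    # Fallback sample with invented values, for systems without /proc/vmstat.
    vmstat_text = "nr_dirty 0\nnr_writeback 1200\nnr_unstable 34000\n"

for name, pages in writeback_counters(vmstat_text).items():
    print(f"{name}: {pages if pages is not None else 'not exposed'}")
```

Running this in a loop (for example once per second) during an NFS write shows whether the pages turn up under one of the other counters while "nr_dirty" stays at 0.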

 In any case, view this as a report for one specific load case that does
not behave very well. It seems there are ways to make things better
(sync, per-device throttling, ...), but nothing "perfect" yet. Use-once
does seem to be a problem.

Cheers
Martin

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de
