From: Robert Hancock <hancockr@shaw.ca>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: Martin Knoblauch <spamtrap@knobisoft.de>,
linux-kernel@vger.kernel.org,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
mingo@redhat.com
Subject: Re: Understanding I/O behaviour - next try
Date: Wed, 29 Aug 2007 08:27:48 -0600
Message-ID: <46D58264.6050403@shaw.ca>
In-Reply-To: <fa.NN9klzYbZhoZ+YoOWgrMeLtzlHE@ifi.uio.no>
Jens Axboe wrote:
> On Tue, Aug 28 2007, Martin Knoblauch wrote:
>> Keywords: I/O, bdi-v9, cfs
>>
>> Hi,
>>
>> a while ago I asked a few questions on the Linux I/O behaviour,
>> because I was (and still am) fighting some "misbehaviour" related to heavy
>> I/O.
>>
>> The basic setup is a dual x86_64 box with 8 GB of memory. The DL380
>> has a HW RAID5, made from 4x72GB disks and about 100 MB write cache.
>> The performance of the block device with O_DIRECT is about 90 MB/sec.
>>
>> The problematic behaviour comes when we are moving large files through
>> the system. The file usage in this case is mostly "use once" or
>> streaming. As soon as the amount of file data is larger than 7.5 GB, we
>> see occasional unresponsiveness of the system (e.g. no more ssh
>> connections into the box) of more than 1 or 2 minutes (!) duration
>> (kernels up to 2.6.19). Load goes up, mainly due to pdflush threads and
>> some other poor guys being in "D" state.
>>
>> The data flows in basically three modes. All of them are affected:
>>
>> local-disk -> NFS
>> NFS -> local-disk
>> NFS -> NFS
>>
>> NFS is V3/TCP.
>>
>> So, I made a few experiments in the last few days, using three
>> different kernels: 2.6.22.5, 2.6.22.5+cfs20.4 and 2.6.22.5+bdi-v9.
>>
>> The first observation (independent of the kernel) is that we *should*
>> use O_DIRECT, at least for output to the local disk. Here we see about
>> 90 MB/sec write performance. A simple "dd" using 1, 2 and 3 parallel
>> threads to the same block device (through an ext2 FS) gives:
>>
>> O_DIRECT: 88 MB/s, 2x44, 3x29.5
>> non-O_DIRECT: 51 MB/s, 2x19, 3x12.5
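
For what it's worth, multiplying out the per-thread dd figures quoted above gives the aggregate throughput of each run. A small Python sketch, using only the numbers from the report (the dictionary and function names are mine, purely for illustration):

```python
# Per-thread dd throughput in MB/s, copied from the figures quoted above.
# Inner keys are the number of parallel dd threads.
PER_THREAD_MBPS = {
    "O_DIRECT": {1: 88.0, 2: 44.0, 3: 29.5},
    "non-O_DIRECT": {1: 51.0, 2: 19.0, 3: 12.5},
}


def aggregate_mbps(per_thread):
    """Return total throughput (threads * per-thread rate) for each run."""
    return {n: n * rate for n, rate in per_thread.items()}


if __name__ == "__main__":
    for mode, per_thread in PER_THREAD_MBPS.items():
        for n, total in sorted(aggregate_mbps(per_thread).items()):
            print(f"{mode:>13}: {n} thread(s) -> {total:.1f} MB/s total")
```

So the O_DIRECT aggregate stays at roughly 88 MB/s regardless of thread count, while the buffered aggregate drops from 51 MB/s to about 38 MB/s as soon as a second writer is added.
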
>>
>> - Observation 1a: IO schedulers are mostly equivalent, with CFQ
>> slightly worse than AS and DEADLINE
>> - Observation 1b: when using a 2.6.22.5+cfs20.4, the non-O_DIRECT
>> performance goes [slightly] down. With three threads it is 3x10 MB/s.
>> Ingo?
>> - Observation 1c: bdi-v9 does not help in this case, which is not
>> surprising.
>>
>> The real question here is why the non-O_DIRECT case is so slow. Is
>> this a general thing? Is this related to the CCISS controller? Using
>> O_DIRECT is unfortunately not an option for us.
>>
>> When using three different targets (local disk plus two different NFS
>> Filesystems) bdi-v9 is a big winner. Without it, all threads are [seem
>> to be] limited to the speed of the slowest FS. With bdi-v9 we see a
>> considerable speedup.
>>
>> Just by chance I found out that doing all I/O in sync-mode does
>> prevent the load from going up. Of course, I/O throughput is not
>> stellar (but not much worse than the non-O_DIRECT case). But the
>> responsiveness seems OK. Maybe a solution, as this can be controlled via
>> mount (would be great for O_DIRECT :-).
>>
>> In general 2.6.22 seems to be better than 2.6.19, but this is highly
>> subjective :-( I am using the following settings in /proc. They seem to
>> provide the smoothest responsiveness:
>>
>> vm.dirty_background_ratio = 1
>> vm.dirty_ratio = 1
>> vm.swappiness = 1
>> vm.vfs_cache_pressure = 1
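
To confirm that those four values are really the ones in effect on the test box, they can be read back from /proc/sys/vm. A minimal Python sketch; the helper name and the `base` parameter are my own and only there so the function can be pointed at a different directory:

```python
import os

# The four tunables listed in the quoted report.
VM_KEYS = (
    "dirty_background_ratio",
    "dirty_ratio",
    "swappiness",
    "vfs_cache_pressure",
)


def read_vm_settings(base="/proc/sys/vm", keys=VM_KEYS):
    """Return {"vm.<key>": int or None} for each tunable under `base`.

    A value of None means the file was missing or did not hold an integer.
    """
    settings = {}
    for key in keys:
        try:
            with open(os.path.join(base, key)) as fh:
                settings["vm." + key] = int(fh.read().strip())
        except (OSError, ValueError):
            settings["vm." + key] = None
    return settings


if __name__ == "__main__":
    for name, value in read_vm_settings().items():
        print(f"{name} = {value}")
```
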
>>
>> Another thing I saw during my tests is that when writing to NFS, the
>> "dirty" or "nr_dirty" numbers are always 0. Is this a conceptual thing,
>> or a bug?
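
One way to check this independently of whichever tool printed those numbers is to sample the raw counters in /proc/vmstat while the NFS write is running. A minimal Python sketch; the function names are mine, and I am assuming nr_dirty and nr_writeback are the counters of interest (both are reported in pages):

```python
import time


def parse_vmstat(text, wanted=("nr_dirty", "nr_writeback")):
    """Extract the wanted counters from /proc/vmstat-style "name value" lines."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in wanted:
            counters[parts[0]] = int(parts[1])
    return counters


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the counters from the live system."""
    with open(path) as fh:
        return parse_vmstat(fh.read())


if __name__ == "__main__":
    # Print one sample per second for ten seconds while the copy runs.
    for _ in range(10):
        print(read_vmstat())
        time.sleep(1)
```
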
>>
>> In any case, view this as a report for one specific loadcase that does
>> not behave very well. It seems there are ways to make things better
>> (sync, per device throttling, ...), but nothing "perfect" yet. Use once
>> does seem to be a problem.
>
> Try limiting the queue depth on the cciss device, some of those are
> notoriously bad at starving commands. Something like the below hack, see
> if it makes a difference (and please verify in dmesg that it prints the
> message about limiting depth!):
I saw a bulletin from HP recently that suggested disabling the
write-back cache on some Smart Array controllers as a workaround, because
the cache reduced performance in applications that do large bulk writes.
Presumably they are planning on releasing some updated firmware that
fixes this eventually.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/