public inbox for linux-kernel@vger.kernel.org
* Understanding I/O behaviour
@ 2007-07-05 15:40 Martin Knoblauch
  2007-07-05 18:15 ` Andrew Lyon
  2007-07-05 20:22 ` Jesper Juhl
  0 siblings, 2 replies; 17+ messages in thread
From: Martin Knoblauch @ 2007-07-05 15:40 UTC (permalink / raw)
  To: linux-kernel

Hi,

 for a customer we are operating a rackful of HP/DL380/G4 boxes that
have given us some problems with system responsiveness under [I/O
triggered] system load.

 The systems in question have the following HW:

2x Intel/EM64T CPUs
8GB memory
CCISS Raid controller with 4x72GB SCSI disks as RAID5
2x BCM5704 NIC (using tg3)

 The distribution is RHEL4. We have tested several kernels including
the original 2.6.9, 2.6.19.2, 2.6.22-rc7 and 2.6.22-rc7+cfs-v18.

 One part of the workload is when several processes try to write 5 GB
each to the local filesystem (ext2->LVM->CCISS). When this happens, the
load goes up to 12 and responsiveness goes down. This means that from
one moment to the next, things like opening an ssh connection to the
host in question, or doing "df", take forever (minutes). It is
especially bad with the vendor kernel, better (but not perfect) with
2.6.19 and 2.6.22-rc7.

 The load basically comes from the writing processes and up to 12
"pdflush" threads all being in "D" state.

 So, what I would like to understand is how we can maximize the
responsiveness of the system, while keeping disk throughput at maximum.

 During my investigation I basically performed the following test,
because it reproduces the kind of trouble situation:

----
$ cat dd3.sh
#!/bin/sh
# Write three 5GB files in parallel, then sync and clean up.
echo "Start 3 dd processes: "`date`
dd if=/dev/zero of=/scratch/X1 bs=1M count=5000 &
dd if=/dev/zero of=/scratch/X2 bs=1M count=5000 &
dd if=/dev/zero of=/scratch/X3 bs=1M count=5000 &
wait
echo "Finish 3 dd processes: "`date`
sync
echo "Finish sync: "`date`
rm -f /scratch/X?
echo "Files removed: "`date`
----

 This results in the following timings. All with the anticipatory
scheduler, because it gives the best results:

2.6.19.2, HT: 10m
2.6.19.2, non-HT: 8m45s
2.6.22-rc7, HT: 10m
2.6.22-rc7, non-HT: 6m
2.6.22-rc7+cfs_v18, HT: 10m40s
2.6.22-rc7+cfs_v18, non-HT: 10m45s

 The "felt" responsiveness was best with the last two kernels, although
the load profile over time looks identical in all cases.

 So, a few questions:

a) Any idea why disabling HT improves throughput, except for the cfs
kernels? For plain 2.6.22-rc7 the difference is quite substantial.
b) Any ideas how to optimize the settings of the /proc/sys/vm/
parameters? The documentation is a bit thin here.
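
 For reference, the writeback-related knobs under /proc/sys/vm can be
inspected without root; a minimal sketch (knob names as on 2.6-era
mainline kernels):

```shell
# Dump the current values of the VM writeback tunables.
# All of these are world-readable under /proc/sys/vm.
for knob in dirty_ratio dirty_background_ratio \
            dirty_expire_centisecs dirty_writeback_centisecs \
            vfs_cache_pressure swappiness; do
    printf '%-28s %s\n' "$knob" "$(cat /proc/sys/vm/$knob)"
done
```

 Lowering dirty_ratio and dirty_background_ratio (as root) bounds how
much dirty page-cache may accumulate before writers are throttled.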

Thanks in advance
Martin

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

^ permalink raw reply	[flat|nested] 17+ messages in thread
[parent not found: <fa.gAvf+r9fiPwNwNVqahYy5u1/Is0@ifi.uio.no>]
* Re: Understanding I/O behaviour
@ 2007-07-06 10:18 Martin Knoblauch
  0 siblings, 0 replies; 17+ messages in thread
From: Martin Knoblauch @ 2007-07-06 10:18 UTC (permalink / raw)
  To: linux-kernel

>>    b) any ideas how to optimize the settings of the /proc/sys/vm/
>>    parameters? The documentation is a bit thin here.
>>
>>
>I can't offer any advice there, but is raid-5 really the best choice
>for your needs? I would not choose raid-5 for a system that is
>regularly performing lots of large writes at the same time; don't
>forget that each write can require several reads to recalculate the
>parity.
>
>Does the raid card have much cache ram?
>

 192 MB, split 50/50 between read and write.

>If you can afford to lose some space, raid-10 would probably perform
>better.

 RAID5 most likely is not the best solution and I would not use it if
the described use-case was happening all the time. It happens a few
times a day, and then things go downhill when all memory is filled
with page-cache.

 And the same also happens when copying large amounts of data from one
NFS-mounted FS to another NFS-mounted FS. No local disk is involved
there. Memory fills with page-cache until it reaches a ceiling, and
then for some time responsiveness is really, really bad.
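
 That ceiling can be watched directly from /proc/meminfo while such a
copy runs; a small sketch:

```shell
# Show how much memory is sitting in page cache and how much of it
# is currently dirty or under writeback (read-only, no root needed).
grep -E '^(MemFree|Cached|Dirty|Writeback):' /proc/meminfo
```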

 I am just now playing with the dirty_* stuff. Maybe it helps.

Cheers
Martin




* Re: Understanding I/O behaviour
@ 2007-07-06 11:03 Martin Knoblauch
  0 siblings, 0 replies; 17+ messages in thread
From: Martin Knoblauch @ 2007-07-06 11:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: spam trap

Martin Knoblauch wrote:
>--- Robert Hancock <hancockr@xxxxxxx> wrote:
>
>>
>> Try playing with reducing /proc/sys/vm/dirty_ratio and see how that
>> helps. This workload will fill up memory with dirty data very
>> quickly,
>> and it seems like system responsiveness often goes down the toilet
>> when
>> this happens and the system is going crazy trying to write it all
>> out.
>>
>
>Definitely the "going crazy" part is the worst problem I see with
>2.6-based kernels (late 2.4 was really better in this corner case).
>
>I am just now playing with dirty_ratio. Does anybody know what the
>lower limit is? "0" seems acceptable, but does it actually imply
>"write out immediately"?
>
>Another problem: the VM parameters are not really well documented in
>their behaviour and interdependence.

 Lowering dirty_ratio just leads to more imbalanced write speed for
the three dd's. Even when lowering the number to 0, the high load
stays.

 Now, in another experiment, I mounted the FS with "sync", and now the
load stays below/around 3. No more "pdflush" daemons going wild, and
the responsiveness is good, with no drops.

 My question is now: is there a parameter that one can use to force
immediate writeout for every process? This may hurt overall performance
of the system, but might really help my situation. Setting dirty_ratio
to 0 does not seem to do it.
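
 One per-process alternative to the "sync" mount (assuming GNU dd, no
kernel tunable involved) is to open the output file with O_DSYNC, so
each block is committed before the next write; a sketch with a reduced
count for illustration:

```shell
# oflag=dsync opens the output file with O_DSYNC: every 1MB block is
# flushed to disk before the next write, mimicking a "sync" mount for
# this one writer only. Small count here just for illustration.
OUT=$(mktemp)
dd if=/dev/zero of="$OUT" bs=1M count=8 oflag=dsync
rm -f "$OUT"
```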

Cheers
Martin


* Re: Understanding I/O behaviour
@ 2007-07-06 12:44 Martin Knoblauch
  0 siblings, 0 replies; 17+ messages in thread
From: Martin Knoblauch @ 2007-07-06 12:44 UTC (permalink / raw)
  To: linux-kernel

Brice Figureau wrote:

>> CFQ gives less (about 10-15%) throughput except for the kernel
>> with the
>> cfs cpu scheduler, where CFQ is on par with the other IO
>> schedulers.
>>
>
>Please have a look to kernel bug #7372:
>http://bugzilla.kernel.org/show_bug.cgi?id=7372
>
>It seems I encountered the almost same issue.
>
>The fix on my side, beside running 2.6.17 (which was working fine
>for me) was to:
>1) have /proc/sys/vm/vfs_cache_pressure=1
>2) have /proc/sys/vm/dirty_ratio=1 and 
> /proc/sys/vm/dirty_background_ratio=1
>3) have /proc/sys/vm/swappiness=2
>4) run Peter Zijlstra: per dirty device throttling patch on the
> top of 2.6.21.5:
>http://www.ussg.iu.edu/hypermail/linux/kernel/0706.1/2776.html

Brice,

 Are any of them sufficient on their own, or are all needed together?
Just to avoid confusion.
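
 For what it's worth, items 1-3 above are plain sysctl writes and can
be tried (as root) without rebooting; only item 4 requires building a
patched kernel. A sketch of the sysctl part:

```shell
# Apply the suggested VM settings (root required; not persistent
# across reboots -- add them to /etc/sysctl.conf to make them stick).
sysctl -w vm.vfs_cache_pressure=1
sysctl -w vm.dirty_ratio=1
sysctl -w vm.dirty_background_ratio=1
sysctl -w vm.swappiness=2
```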

Cheers
Martin



* Re: Understanding I/O behaviour
@ 2007-07-06 14:25 Daniel J Blueman
  2007-07-06 15:17 ` Martin Knoblauch
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel J Blueman @ 2007-07-06 14:25 UTC (permalink / raw)
  To: Martin Knoblauch; +Cc: Linux Kernel

On 5 Jul, 16:50, Martin Knoblauch <spamtrap@knobisoft.de> wrote:
> Hi,
>
>  for a customer we are operating a rackful of HP/DL380/G4 boxes that
> have given us some problems with system responsiveness under [I/O
> triggered] system load.
[snip]

IIRC, the locking in the CCISS driver was pretty heavy until later in
the 2.6 series (2.6.16?) kernels; I don't think those changes made it
into the 1000 or so patches that comprise the RHEL 4 kernels.

With write performance being really poor on the SmartArray controllers
without the battery-backed write cache, and with the coarser locking,
performance can really suck.

On a totally quiescent HP DL380 G2 (dual PIII, 1.13GHz Tualatin 512KB
L2$) running RHEL 5 (2.6.18) with a 32MB SmartArray 5i controller
with 6x36GB 10K RPM SCSI disks and all the latest firmware:

# dd if=/dev/cciss/c0d0p2 of=/dev/null bs=1024k count=1000
509+1 records in
509+1 records out
534643200 bytes (535 MB) copied, 11.6336 seconds, 46.0 MB/s

# dd if=/dev/zero of=/dev/cciss/c0d0p2 bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 22.3091 seconds, 4.7 MB/s

Oh dear! There are internal performance problems with this controller.
The SmartArray 5i in the newer DL380 G3 (dual P4 2.8GHz, 512KB L2$) has
perhaps twice the read performance (PCI-X helps some) but still sucks.

I'd get the BBWC in or install another controller.

Daniel
-- 
Daniel J Blueman


end of thread, other threads:[~2007-07-09  8:47 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-05 15:40 Understanding I/O behaviour Martin Knoblauch
2007-07-05 18:15 ` Andrew Lyon
2007-07-05 20:22 ` Jesper Juhl
2007-07-08 21:28   ` Jesper Juhl
2007-07-09  8:47     ` Martin Knoblauch
     [not found] <fa.gAvf+r9fiPwNwNVqahYy5u1/Is0@ifi.uio.no>
2007-07-05 23:47 ` Robert Hancock
2007-07-05 23:53   ` Jesper Juhl
2007-07-06  7:54     ` Martin Knoblauch
2007-07-06 10:15       ` Brice Figureau
2007-07-06 10:11   ` Martin Knoblauch
2007-07-07 13:23     ` Leroy van Logchem
  -- strict thread matches above, loose matches on Subject: below --
2007-07-06 10:18 Martin Knoblauch
2007-07-06 11:03 Martin Knoblauch
2007-07-06 12:44 Martin Knoblauch
2007-07-06 14:25 Daniel J Blueman
2007-07-06 15:17 ` Martin Knoblauch
2007-07-06 15:44   ` Daniel J Blueman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox