public inbox for linux-kernel@vger.kernel.org
* Understanding I/O behaviour
@ 2007-07-05 15:40 Martin Knoblauch
  2007-07-05 18:15 ` Andrew Lyon
  2007-07-05 20:22 ` Jesper Juhl
  0 siblings, 2 replies; 17+ messages in thread
From: Martin Knoblauch @ 2007-07-05 15:40 UTC (permalink / raw)
  To: linux-kernel

Hi,

 for a customer we are operating a rackful of HP/DL380/G4 boxes that
have given us some problems with system responsiveness under [I/O
triggered] system load.

 The systems in question have the following HW:

2x Intel/EM64T CPUs
8GB memory
CCISS Raid controller with 4x72GB SCSI disks as RAID5
2x BCM5704 NIC (using tg3)

 The distribution is RHEL4. We have tested several kernels including
the original 2.6.9, 2.6.19.2, 2.6.22-rc7 and 2.6.22-rc7+cfs-v18.

 One part of the workload is when several processes try to write 5 GB
each to the local filesystem (ext2->LVM->CCISS). When this happens, the
load goes up to 12 and responsiveness goes down. This means that from
one moment to the next, things like opening an ssh connection to the
host in question, or doing "df", take forever (minutes). It is
especially bad with the vendor kernel, better (but not perfect) with
2.6.19 and 2.6.22-rc7.

 The load basically comes from the writing processes and up to 12
"pdflush" threads all being in "D" state.

 So, what I would like to understand is how we can maximize the
responsiveness of the system, while keeping disk throughput at maximum.

 During my investigation I basically performed the following test,
because it reproduces the kind of trouble situation:

----
$ cat dd3.sh
#!/bin/sh
# Write three 5GB files in parallel, then sync and clean up.
echo "Start 3 dd processes: "`date`
dd if=/dev/zero of=/scratch/X1 bs=1M count=5000 &
dd if=/dev/zero of=/scratch/X2 bs=1M count=5000 &
dd if=/dev/zero of=/scratch/X3 bs=1M count=5000 &
wait
echo "Finish 3 dd processes: "`date`
sync
echo "Finish sync: "`date`
rm -f /scratch/X?
echo "Files removed: "`date`
----

 This results in the following timings. All with the anticipatory
scheduler, because it gives the best results:

2.6.19.2, HT: 10m
2.6.19.2, non-HT: 8m45s
2.6.22-rc7, HT: 10m
2.6.22-rc7, non-HT: 6m
2.6.22-rc7+cfs_v18, HT: 10m40s
2.6.22-rc7+cfs_v18, non-HT: 10m45s

 The "felt" responsiveness was best with the last two kernels, although
the load profile over time looks identical in all cases.

 So, a few questions:

a) Any idea why disabling HT improves throughput, except for the cfs
kernels? For plain 2.6.22-rc7 the difference is quite substantial.
b) Any ideas how to optimize the settings of the /proc/sys/vm/
parameters? The documentation is a bit thin here.
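
 For reference, the writeback-related knobs under /proc/sys/vm can be
inspected without root; a minimal sketch (knob names as on 2.6-era
mainline kernels):

```shell
# Dump the current values of the VM writeback tunables.
# All of these are world-readable under /proc/sys/vm.
for knob in dirty_ratio dirty_background_ratio \
            dirty_expire_centisecs dirty_writeback_centisecs \
            vfs_cache_pressure swappiness; do
    printf '%-28s %s\n' "$knob" "$(cat /proc/sys/vm/$knob)"
done
```

 Lowering dirty_ratio and dirty_background_ratio (as root) bounds how
much dirty page-cache may accumulate before writers are throttled.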

Thanks in advance
Martin

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

^ permalink raw reply	[flat|nested] 17+ messages in thread
[parent not found: <fa.gAvf+r9fiPwNwNVqahYy5u1/Is0@ifi.uio.no>]
* Re: Understanding I/O behaviour
@ 2007-07-06 10:18 Martin Knoblauch
  0 siblings, 0 replies; 17+ messages in thread
From: Martin Knoblauch @ 2007-07-06 10:18 UTC (permalink / raw)
  To: linux-kernel

>>    b) any ideas how to optimize the settings of the /proc/sys/vm/
>>    parameters? The documentation is a bit thin here.
>>
>>
>I can't offer any advice there, but is raid-5 really the best choice
>for your needs? I would not choose raid-5 for a system that is
>regularly performing lots of large writes at the same time; don't
>forget that each write can require several reads to recalculate the
>parity.
>
>Does the raid card have much cache ram?
>

 192 MB, split 50/50 between read and write.

>If you can afford to lose some space, raid-10 would probably perform
>better.

 RAID5 most likely is not the best solution and I would not use it if
the described use-case was happening all the time. It happens a few
times a day, and then things go downhill when all memory is filled
with page-cache.

 And the same also happens when copying large amounts of data from one
NFS-mounted FS to another NFS-mounted FS. No local disk is involved
there. Memory fills with page-cache until it reaches a ceiling, and
then for some time responsiveness is really, really bad.
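
 That ceiling can be watched directly from /proc/meminfo while such a
copy runs; a small sketch:

```shell
# Show how much memory is sitting in page cache and how much of it
# is currently dirty or under writeback (read-only, no root needed).
grep -E '^(MemFree|Cached|Dirty|Writeback):' /proc/meminfo
```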

 I am just now playing with the dirty_* stuff. Maybe it helps.

Cheers
Martin




* Re: Understanding I/O behaviour
@ 2007-07-06 11:03 Martin Knoblauch
  0 siblings, 0 replies; 17+ messages in thread
From: Martin Knoblauch @ 2007-07-06 11:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: spam trap

Martin Knoblauch wrote:
>--- Robert Hancock <hancockr@xxxxxxx> wrote:
>
>>
>> Try playing with reducing /proc/sys/vm/dirty_ratio and see how that
>> helps. This workload will fill up memory with dirty data very
>> quickly,
>> and it seems like system responsiveness often goes down the toilet
>> when
>> this happens and the system is going crazy trying to write it all
>> out.
>>
>
>Definitely the "going crazy" part is the worst problem I see with
>2.6-based kernels (late 2.4 was really better in this corner case).
>
>I am just now playing with dirty_ratio. Does anybody know what the
>lower limit is? "0" seems acceptable, but does it actually imply
>"write out immediately"?
>
>Another problem: the VM parameters are not really well documented in
>their behaviour and interdependence.

 Lowering dirty_ratio just leads to more imbalanced write speed for
the three dd's. Even when lowering the number to 0, the high load
stays.

 Now, in another experiment, I mounted the FS with "sync", and now the
load stays below/around 3. No more "pdflush" daemons going wild, and
the responsiveness is good, with no drops.

 My question is now: is there a parameter that one can use to force
immediate writeout for every process? This may hurt overall performance
of the system, but might really help my situation. Setting dirty_ratio
to 0 does not seem to do it.
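
 One per-process alternative to the "sync" mount (assuming GNU dd, no
kernel tunable involved) is to open the output file with O_DSYNC, so
each block is committed before the next write; a sketch with a reduced
count for illustration:

```shell
# oflag=dsync opens the output file with O_DSYNC: every 1MB block is
# flushed to disk before the next write, mimicking a "sync" mount for
# this one writer only. Small count here just for illustration.
OUT=$(mktemp)
dd if=/dev/zero of="$OUT" bs=1M count=8 oflag=dsync
rm -f "$OUT"
```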

Cheers
Martin


* Re: Understanding I/O behaviour
@ 2007-07-06 12:44 Martin Knoblauch
  0 siblings, 0 replies; 17+ messages in thread
From: Martin Knoblauch @ 2007-07-06 12:44 UTC (permalink / raw)
  To: linux-kernel

Brice Figureau wrote:

>> CFQ gives less (about 10-15%) throughput except for the kernel
>> with the
>> cfs cpu scheduler, where CFQ is on par with the other IO
>> schedulers.
>>
>
>Please have a look to kernel bug #7372:
>http://bugzilla.kernel.org/show_bug.cgi?id=7372
>
>It seems I encountered the almost same issue.
>
>The fix on my side, beside running 2.6.17 (which was working fine
>for me) was to:
>1) have /proc/sys/vm/vfs_cache_pressure=1
>2) have /proc/sys/vm/dirty_ratio=1 and 
> /proc/sys/vm/dirty_background_ratio=1
>3) have /proc/sys/vm/swappiness=2
>4) run Peter Zijlstra: per dirty device throttling patch on the
> top of 2.6.21.5:
>http://www.ussg.iu.edu/hypermail/linux/kernel/0706.1/2776.html

Brice,

 Are any of them sufficient on their own, or are all needed together?
Just to avoid confusion.
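
 For what it's worth, items 1-3 above are plain sysctl writes and can
be tried (as root) without rebooting; only item 4 requires building a
patched kernel. A sketch of the sysctl part:

```shell
# Apply the suggested VM settings (root required; not persistent
# across reboots -- add them to /etc/sysctl.conf to make them stick).
sysctl -w vm.vfs_cache_pressure=1
sysctl -w vm.dirty_ratio=1
sysctl -w vm.dirty_background_ratio=1
sysctl -w vm.swappiness=2
```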

Cheers
Martin



* Re: Understanding I/O behaviour
@ 2007-07-06 14:25 Daniel J Blueman
  2007-07-06 15:17 ` Martin Knoblauch
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel J Blueman @ 2007-07-06 14:25 UTC (permalink / raw)
  To: Martin Knoblauch; +Cc: Linux Kernel

On 5 Jul, 16:50, Martin Knoblauch <spamtrap@knobisoft.de> wrote:
> Hi,
>
>  for a customer we are operating a rackful of HP/DL380/G4 boxes that
> have given us some problems with system responsiveness under [I/O
> triggered] system load.
[snip]

IIRC, the locking in the CCISS driver was pretty heavy until later in
the 2.6 series (2.6.16?) kernels; I don't think those changes made it
into the 1000 or so patches that comprise the RHEL 4 kernels.

With write performance being really poor on the SmartArray controllers
without the battery-backed write cache, and with the coarser locking,
performance can really suck.

On a totally quiescent HP DL380 G2 (dual PIII, 1.13GHz Tualatin 512KB
L2$) running RHEL 5 (2.6.18) with a 32MB SmartArray 5i controller
with 6x36GB 10K RPM SCSI disks and all the latest firmware:

# dd if=/dev/cciss/c0d0p2 of=/dev/null bs=1024k count=1000
509+1 records in
509+1 records out
534643200 bytes (535 MB) copied, 11.6336 seconds, 46.0 MB/s

# dd if=/dev/zero of=/dev/cciss/c0d0p2 bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 22.3091 seconds, 4.7 MB/s

Oh dear! There are internal performance problems with this controller.
The SmartArray 5i in the newer DL380 G3 (dual P4 2.8GHz, 512KB L2$) has
perhaps twice the read performance (PCI-X helps some) but still sucks.

I'd get the BBWC in or install another controller.

Daniel
-- 
Daniel J Blueman


end of thread, other threads:[~2007-07-09  8:47 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-05 15:40 Understanding I/O behaviour Martin Knoblauch
2007-07-05 18:15 ` Andrew Lyon
2007-07-05 20:22 ` Jesper Juhl
2007-07-08 21:28   ` Jesper Juhl
2007-07-09  8:47     ` Martin Knoblauch
     [not found] <fa.gAvf+r9fiPwNwNVqahYy5u1/Is0@ifi.uio.no>
2007-07-05 23:47 ` Robert Hancock
2007-07-05 23:53   ` Jesper Juhl
2007-07-06  7:54     ` Martin Knoblauch
2007-07-06 10:15       ` Brice Figureau
2007-07-06 10:11   ` Martin Knoblauch
2007-07-07 13:23     ` Leroy van Logchem
  -- strict thread matches above, loose matches on Subject: below --
2007-07-06 10:18 Martin Knoblauch
2007-07-06 11:03 Martin Knoblauch
2007-07-06 12:44 Martin Knoblauch
2007-07-06 14:25 Daniel J Blueman
2007-07-06 15:17 ` Martin Knoblauch
2007-07-06 15:44   ` Daniel J Blueman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox