public inbox for linux-kernel@vger.kernel.org
* bdi_threshold slow to reach steady state
@ 2009-10-14 11:09 Richard Kennedy
  2009-10-14 11:37 ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Kennedy @ 2009-10-14 11:09 UTC
  To: Peter Zijlstra; +Cc: Wu Fengguang, lkml

Hi Peter,

I've been running simple tests that use fio to write 2GB while reading the
bdi dirty threshold once a second from debugfs.
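
A minimal sketch of how such a once-a-second sample could be extracted, assuming the per-bdi debugfs stats text contains a line of the form "BdiDirtyThresh: <n> kB". The field name, the kB unit and the helper name parse_bdi_dirty_thresh are assumptions for illustration, not details given in this thread:

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the number following "BdiDirtyThresh:" from the text of a bdi
 * stats file. Returns 0 on success, -1 if the field is missing or malformed.
 * The field name and its kB unit are assumptions about the stats format.
 */
static int parse_bdi_dirty_thresh(const char *stats, unsigned long *kb)
{
	const char *key = "BdiDirtyThresh:";
	const char *p = strstr(stats, key);

	if (!p)
		return -1;
	/* %lu skips the padding whitespace before the number. */
	return sscanf(p + strlen(key), "%lu", kb) == 1 ? 0 : -1;
}
```

A sampling loop would read the stats file for the device under test into a buffer once a second and pass it to this helper; the location of that file under debugfs depends on where debugfs is mounted and is likewise an assumption here.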

The graph of bdi dirty threshold is nice and smooth but takes a long
time to reach a steady state, 60 seconds or more. (run on 2.6.32-rc4)

By eye it seems as though a first-order control system is a good model
for its behavior, so it approximates to 1-e^(-t/T). It just seems too
heavily damped (at least on my machine).
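
Written out, the first-order step response referred to above has the form below. This is only a sketch of the model as observed by eye: y_\infty (the steady-state threshold) and T (the time constant) are illustrative symbols, and the threshold is assumed to start from zero:

```latex
y(t) = y_\infty \left(1 - e^{-t/T}\right), \qquad
y(3T) = y_\infty \left(1 - e^{-3}\right) \approx 0.95\, y_\infty
```

Under this model the threshold is within about 5% of its final value after three time constants, so a shorter T directly shortens the time to reach a steady state.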

For fun, I changed calc_period_shift to
	return ilog2(dirty_total - 1) - 2;

and it now reaches a steady state much quicker, around 4-5 seconds.

Tests that write to 2 disks at the same time show no significant
performance differences but are much more consistent, i.e. the standard
deviation is lower across multiple runs.

I have noticed that the first test run on a freshly booted machine is
always the slowest of any sequence of tests, but this change to
calc_period_shift greatly reduces this effect. 

So I wondered how you chose these values. Are there any other tests
that would be useful for exploring this?

I know that my machine is getting a bit old now; it's an AMD X2 and only has
SATA 150 drives, so I'm not suggesting that this change is going to be
correct for all machines. But maybe we can set a better default, or take
more factors into account than just memory size?

BTW why is it ilog2(dirty_total - 1)? What does the -1 do?

regards
Richard




* Re: bdi_threshold slow to reach steady state
  2009-10-14 11:09 bdi_threshold slow to reach steady state Richard Kennedy
@ 2009-10-14 11:37 ` Peter Zijlstra
  2009-10-14 13:55   ` Richard Kennedy
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2009-10-14 11:37 UTC
  To: Richard Kennedy; +Cc: Wu Fengguang, lkml, Martin Bligh

On Wed, 2009-10-14 at 12:09 +0100, Richard Kennedy wrote:
> Hi Peter,
> 
> I've been running simple tests that use fio to write 2GB while reading the
> bdi dirty threshold once a second from debugfs.
> 
> The graph of bdi dirty threshold is nice and smooth but takes a long
> time to reach a steady state, 60 seconds or more. (run on 2.6.32-rc4)
> 
> By eye it seems as though a first-order control system is a good model
> for its behavior, so it approximates to 1-e^(-t/T). It just seems too
> heavily damped (at least on my machine).
> 
> For fun, I changed calc_period_shift to
> 	return ilog2(dirty_total - 1) - 2;
> 
> and it now reaches a steady state much quicker, around 4-5 seconds.
> 
> Tests that write to 2 disks at the same time show no significant
> performance differences but are much more consistent, i.e. the standard
> deviation is lower across multiple runs.
> 
> I have noticed that the first test run on a freshly booted machine is
> always the slowest of any sequence of tests, but this change to
> calc_period_shift greatly reduces this effect. 
> 
> So I wondered how you chose these values? and are there any other tests
> that are useful to explore this?

Right, so we measure time in page writeback completions, and the measure
I used was the round up power of two of the dirty_thresh. We adjust in
the same time it takes to write out a full dirty_thresh amount of data.
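
A minimal userspace sketch of the "round up power of two" measure, assuming ilog2() behaves as floor(log2(n)). The helper names ilog2_ul and roundup_pow2_shift are hypothetical and are not kernel functions. Subtracting 1 before taking the log keeps an exact power of two from being rounded up to the next one:

```c
/*
 * Userspace stand-in for the kernel's ilog2(): floor(log2(n)) for n > 0.
 * Returns -1 for n == 0, where the logarithm is undefined.
 */
static int ilog2_ul(unsigned long n)
{
	int log = -1;

	while (n) {
		n >>= 1;
		log++;
	}
	return log;
}

/* Exponent of the smallest power of two that is >= n, valid for n >= 2. */
static int roundup_pow2_shift(unsigned long n)
{
	return ilog2_ul(n - 1) + 1;
}
```

For example, 1024 pages map to an exponent of 10 while 1025 pages map to 11, so the period grows in power-of-two steps as the dirty threshold grows.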

The idea was that people would scale their dirty thresh according to
their writeout capacity, etc.

Martin J Bligh complained about this very same issue and I told them to
experiment with that same scale function. But I guess the result of that
got lost in the google filter (stuff goes in, nothing ever comes back
out).

Anyway, the dirty_thresh relation seems sensible still, but the exact
parameters could be poked at. I have no objection to reducing the period
by a factor of 16 like you did, except that we need some more
feedback, preferably from people with more than a few spindles.
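
The factor of 16 follows from the shift dropping by 4. A minimal sketch, assuming the period scales as a power of two of the value returned by calc_period_shift(); ilog2_ul, period_shift_old, period_shift_new and period_ratio are hypothetical userspace names used only for illustration:

```c
/* Userspace stand-in for the kernel's ilog2(): floor(log2(n)) for n > 0. */
static int ilog2_ul(unsigned long n)
{
	int log = -1;

	while (n) {
		n >>= 1;
		log++;
	}
	return log;
}

/* Shift as returned in 2.6.32-rc4: 2 + ilog2(dirty_total - 1). */
static int period_shift_old(unsigned long dirty_total)
{
	return 2 + ilog2_ul(dirty_total - 1);
}

/* Shift with the proposed change: ilog2(dirty_total - 1) - 2. */
static int period_shift_new(unsigned long dirty_total)
{
	return ilog2_ul(dirty_total - 1) - 2;
}

/* Old period divided by new period, assuming period is 2^shift. */
static unsigned long period_ratio(unsigned long dirty_total)
{
	return 1UL << (period_shift_old(dirty_total) -
		       period_shift_new(dirty_total));
}
```

The ratio is 16 for any dirty_total of 2 or more, because the two shifts always differ by exactly 4 regardless of memory size.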

(The initial ramp will be roughly twice as slow, since the steady state
of this approximation is half-full).

> I know that my machine is getting a bit old now; it's an AMD X2 and only has
> SATA 150 drives, so I'm not suggesting that this change is going to be
> correct for all machines. But maybe we can set a better default, or take
> more factors into account than just memory size?
> 
> BTW why is it ilog2(dirty_total -1) -- what does the -1 do?

http://lkml.org/lkml/2007/1/26/143



* Re: bdi_threshold slow to reach steady state
  2009-10-14 11:37 ` Peter Zijlstra
@ 2009-10-14 13:55   ` Richard Kennedy
  2009-10-14 14:04     ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Kennedy @ 2009-10-14 13:55 UTC
  To: Peter Zijlstra; +Cc: Wu Fengguang, lkml, Martin Bligh

On Wed, 2009-10-14 at 13:37 +0200, Peter Zijlstra wrote:
> On Wed, 2009-10-14 at 12:09 +0100, Richard Kennedy wrote:
> > Hi Peter,

> 
> Right, so we measure time in page writeback completions, and the measure
> I used was the round up power of two of the dirty_thresh. We adjust in
> the same time it takes to write out a full dirty_thresh amount of data.
> 
> The idea was that people would scale their dirty thresh according to
> their writeout capacity, etc.
> 
> Martin J Bligh complained about this very same issue and I told them to
> experiment with that same scale function. But I guess the result of that
> got lost in the google filter (stuff goes in, nothing ever comes back
> out).
> 
> Anyway, the dirty_thresh relation seems sensible still, but the exact
> parameters could be poked at. I have no objection to reducing the period
> by a factor of 16 like you did, except that we need some more
> feedback, preferably from people with more than a few spindles.

Sure, hopefully big fast machines have large amounts of memory so it
should be a good fit.

Yes, it would be good if someone with a big box tested this ;)
Here's a patch just in case anyone does feel like giving it a spin.

> (The initial ramp will be roughly twice as slow, since the steady state
> of this approximation is half-full).
> 
> > I know that my machine is getting a bit old now; it's an AMD X2 and only has
> > SATA 150 drives, so I'm not suggesting that this change is going to be
> > correct for all machines. But maybe we can set a better default, or take
> > more factors into account than just memory size?
> > 
> > BTW why is it ilog2(dirty_total -1) -- what does the -1 do?
> 
> http://lkml.org/lkml/2007/1/26/143
> 
Thanks for that.
regards
Richard

(patch against 2.6.32-rc4)

commit 11735a2336ba08cf21aebf79a706c86aca5e44b2
Author: Richard Kennedy <richard@rsk.demon.co.uk>
Date:   Wed Oct 14 14:46:21 2009 +0100

    mm: speed up per bdi dirty threshold calculations
    
    
    Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index a3b1409..018024e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -144,7 +144,7 @@ static int calc_period_shift(void)
 	else
 		dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
 				100;
-	return 2 + ilog2(dirty_total - 1);
+	return ilog2(dirty_total - 1) - 2;
 }
 
 /*




* Re: bdi_threshold slow to reach steady state
  2009-10-14 13:55   ` Richard Kennedy
@ 2009-10-14 14:04     ` Peter Zijlstra
  2009-10-15  9:22       ` Richard Kennedy
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2009-10-14 14:04 UTC
  To: Richard Kennedy; +Cc: Wu Fengguang, lkml, Martin Bligh

On Wed, 2009-10-14 at 14:55 +0100, Richard Kennedy wrote:
> 
> commit 11735a2336ba08cf21aebf79a706c86aca5e44b2
> Author: Richard Kennedy <richard@rsk.demon.co.uk>
> Date:   Wed Oct 14 14:46:21 2009 +0100
> 
>     mm: speed up per bdi dirty threshold calculations

I think the subject is confusing; we don't actually compute things
faster in the fewer-cycles sense.

We reduce the damping of the control system, yielding faster
convergence.

>     Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index a3b1409..018024e 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -144,7 +144,7 @@ static int calc_period_shift(void)
>  	else
>  		dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
>  				100;
> -       return 2 + ilog2(dirty_total - 1);
> +       return ilog2(dirty_total - 1) - 2;
>  }



* Re: bdi_threshold slow to reach steady state
  2009-10-14 14:04     ` Peter Zijlstra
@ 2009-10-15  9:22       ` Richard Kennedy
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Kennedy @ 2009-10-15  9:22 UTC
  To: Peter Zijlstra; +Cc: Wu Fengguang, lkml, Martin Bligh

On Wed, 2009-10-14 at 16:04 +0200, Peter Zijlstra wrote:
> On Wed, 2009-10-14 at 14:55 +0100, Richard Kennedy wrote:
> > 
> > commit 11735a2336ba08cf21aebf79a706c86aca5e44b2
> > Author: Richard Kennedy <richard@rsk.demon.co.uk>
> > Date:   Wed Oct 14 14:46:21 2009 +0100
> > 
> >     mm: speed up per bdi dirty threshold calculations
> 
> I think the subject is confusing; we don't actually compute things
> faster in the fewer-cycles sense.
> 
> We reduce the damping of the control system, yielding faster
> convergence.
Ah yes, sorry about that. That was a bit of a placeholder.

I'll write a proper changelog and re-post.
regards
Richard 


