public inbox for linux-kernel@vger.kernel.org
From: Peter Zijlstra <peterz@infradead.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	davej@redhat.com, tim.c.chen@linux.intel.com,
	linux-kernel@vger.kernel.org
Subject: Re: Change in default vm_dirty_ratio
Date: Mon, 25 Jun 2007 02:15:52 +0200	[thread overview]
Message-ID: <1182730552.6174.45.camel@lappy> (raw)
In-Reply-To: <alpine.LFD.0.98.0706240907560.3593@woody.linux-foundation.org>

On Sun, 2007-06-24 at 09:40 -0700, Linus Torvalds wrote:
> 
> On Sat, 23 Jun 2007, Peter Zijlstra wrote:
> 
> > On Thu, 2007-06-21 at 16:08 -0700, Linus Torvalds wrote:
> > > 
> > > The vm_dirty_ratio thing is a global value, and I think we need that 
> > > regardless (for the independent issue of memory deadlocks etc), but if we 
> > > *additionally* had a per-device throttle that was based on some kind of 
> > > adaptive thing, we could probably raise the global (hard) vm_dirty_ratio a 
> > > lot.
> > 
> > I just did quite a bit of that:
> > 
> >   http://lkml.org/lkml/2007/6/14/437
> 
> Ok, that does look interesting.
> 
> A few comments:
> 
>  - Cosmetic: please please *please* don't do this:
> 
> 	-	if (atomic_long_dec_return(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH) {
> 	+	if (atomic_long_dec_return(&nfss->writeback) <
> 	+			NFS_CONGESTION_OFF_THRESH)
> 
>    we had a discussion about this not that long ago, and it drives me wild 
>    to see people split lines like that. It used to be readable. Now it's 
>    not.
> 
>    If it's the checkpatch.pl thing that caused you to split it like that, 
>    I think we should just change the value where we start complaining. 
>    Maybe set it at 95 characters per line or something.

It was.

>  - I appreciate the extensive comments on floating proportions, and the 
>    code looks really quite clean (apart from the cosmetic thing about) 
>    from a quick look-through.
> 
>    HOWEVER. It does seem to be a bit of an overkill. Do we really need to 
>    be quite that clever,

Hehe, it is the simplest thing I could come up with. It is
deterministic, fast and has a nice physical model :-)

>  and do we really need to do 64-bit calculations 
>    for this?

No, we don't (well, on 64-bit arches we do). I actually only use unsigned
long, and even cast whatever comes out of the percpu_counter thing to
unsigned long.

>  The 64-bit ops in particular seem quite iffy: if we ever 
>    actually pass in an amount that doesn't fit in 32 bits, we'll turn 
>    those per-cpu counters into totally synchronous global counters, which 
>    seems to defeat the whole point of them. So I'm a bit taken aback by 
>    that whole "mod64" thing

That is only needed on 64-bit arches, and even there, actually
encountering such large values will be rare at best.

Also, the re-normalisation event that uses this call is low frequency.
That is, that path will run roughly once per ~total_dirty_limit/nr_bdis
written out.

>    (I also hate the naming. I don't think "..._mod()" was a good name to 
>    begin with: "mod" means "modulus" to me, not "modify". Making it be 
>    called "mod64" just makes it even *worse*, since it's now _obviously_ 
>    about modulus - but isn't)

Agreed.

> So I'd appreciate some more explanations, but I'd also appreciate some 
> renaming of those functions. What used to be pretty bad naming just turned 
> *really* bad, imnsho.

It all just stems from Andrew asking if I could please re-use something
instead of duplicating a lot of things. I picked percpu_counter because
that was the closest to what was needed. An unsigned-long-based per-cpu
counter would suit better.

There is another problem I have with this percpu_counter: it is rather
space hungry. It does a node-affine kmalloc of sizeof(s32) on each cpu,
which will end up using the smallest slab, and that is quite a bit
bigger than needed. Then again, the object should be about the size of a
cacheline anyway (otherwise we might still end up with false sharing).

I've been thinking of extending this per-cpu allocator thing a bit to be
a little smarter about these things. What would be needed is a strict
per-cpu slab allocator. The current ones are node-affine, which can
still cause false sharing (unless, as should be the case, these objects
are both cacheline-aligned and of cacheline size). Once we have that, we
can start using smaller objects.



Thread overview: 29+ messages
2007-06-18 21:14 Change in default vm_dirty_ratio Tim Chen
2007-06-18 23:47 ` Andrew Morton
2007-06-19  0:06   ` Linus Torvalds
2007-06-19  0:09     ` Arjan van de Ven
2007-06-19 18:41   ` John Stoffel
2007-06-19 19:04     ` Linus Torvalds
2007-06-19 19:06       ` Linus Torvalds
2007-06-19 22:33       ` David Miller
2007-06-19 19:57   ` Andi Kleen
2007-06-20  4:24   ` Dave Jones
2007-06-20  4:44     ` Andrew Morton
2007-06-20  8:35       ` Peter Zijlstra
2007-06-20  8:58         ` Andrew Morton
2007-06-20  9:14           ` Jens Axboe
2007-06-20  9:19             ` Peter Zijlstra
2007-06-20  9:20               ` Jens Axboe
2007-06-20  9:43                 ` Peter Zijlstra
2007-06-21 22:53                 ` Matt Mackall
2007-06-21 23:08                   ` Linus Torvalds
2007-06-23 18:23                     ` Peter Zijlstra
2007-06-24 16:40                       ` Linus Torvalds
2007-06-25  0:15                         ` Peter Zijlstra [this message]
2007-06-20  9:19           ` Peter Zijlstra
2007-06-21 16:54           ` Mark Lord
2007-06-21 16:55             ` Peter Zijlstra
2007-06-20 17:17         ` Linus Torvalds
2007-06-20 18:12           ` Arjan van de Ven
2007-06-20 18:28             ` Linus Torvalds
2007-06-21 12:37     ` Nadia Derbey
