Re: Change in default vm_dirty_ratio

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	davej@redhat.com, tim.c.chen@linux.intel.com,
	linux-kernel@vger.kernel.org
Subject: Re: Change in default vm_dirty_ratio
Date: Mon, 25 Jun 2007 02:15:52 +0200	[thread overview]
Message-ID: <1182730552.6174.45.camel@lappy> (raw)
In-Reply-To: <alpine.LFD.0.98.0706240907560.3593@woody.linux-foundation.org>

On Sun, 2007-06-24 at 09:40 -0700, Linus Torvalds wrote:
> 
> On Sat, 23 Jun 2007, Peter Zijlstra wrote:
> 
> > On Thu, 2007-06-21 at 16:08 -0700, Linus Torvalds wrote:
> > > 
> > > The vm_dirty_ratio thing is a global value, and I think we need that 
> > > regardless (for the independent issue of memory deadlocks etc), but if we 
> > > *additionally* had a per-device throttle that was based on some kind of 
> > > adaptive thing, we could probably raise the global (hard) vm_dirty_ratio a 
> > > lot.
> > 
> > I just did quite a bit of that:
> > 
> >   http://lkml.org/lkml/2007/6/14/437
> 
> Ok, that does look interesting.
> 
> A few comments:
> 
>  - Cosmetic: please please *please* don't do this:
> 
> 	-	if (atomic_long_dec_return(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH) {
> 	+	if (atomic_long_dec_return(&nfss->writeback) <
> 	+			NFS_CONGESTION_OFF_THRESH)
> 
>    we had a discussion about this not that long ago, and it drives me wild 
>    to see people split lines like that. It used to be readable. Now it's 
>    not.
> 
>    If it's the checkpatch.pl thing that caused you to split it like that, 
>    I think we should just change the value where we start complaining. 
>    Maybe set it at 95 characters per line or something.

It was.

>  - I appreciate the extensive comments on floating proportions, and the 
>    code looks really quite clean (apart from the cosmetic thing about) 
>    from a quick look-through.
> 
>    HOWEVER. It does seem to be a bit of an overkill. Do we really need to 
>    be quite that clever,

Hehe, it is the simplest thing I could come up with. It is
deterministic, fast and has a nice physical model :-)

>  and do we really need to do 64-bit calculations 
>    for this?

No we don't (well, on 64bit arches we do). I actually only use unsigned
long, and even cast whatever comes out of the percpu_counter thing to
unsigned long.

>  The 64-bit ops in particular seem quite iffy: if we ever 
>    actually pass in an amount that doesn't fit in 32 bits, we'll turn 
>    those per-cpu counters into totally synchronous global counters, which 
>    seems to defeat the whole point of them. So I'm a bit taken aback by 
>    that whole "mod64" thing

That is only needed on 64bit arches, and even there, actually
encountering such large values will be rare at best. 

Also, this re-normalisation event that uses the call is low frequency.
That is, that part will be used once every ~ total_dirty_limit/nr_bdis
written out.

>    (I also hate the naming. I don't think "..._mod()" was a good name to 
>    begin with: "mod" means "modulus" to me, not "modify". Making it be 
>    called "mod64" just makes it even *worse*, since it's now _obviously_ 
>    about modulus - but isn't)

Agreed.

> So I'd appreciate some more explanations, but I'd also appreciate some 
> renaming of those functions. What used to be pretty bad naming just turned 
> *really* bad, imnsho.

It all just stems from Andrew asking if I could please re-use something
instread of duplication a lot of things. I picked percpu_counter because
that was the closest to what was needed. An unsigned long based per-cpu
counter would suit better.

There is another problem I have with this percpu_counter, it is rather
space hungry. It does a node affine sizeof(s32) kalloc on each cpu.
Which will end up using the smallest slab, and that is quite a bit
bigger than needed. But should be about the size of a cacheline
(otherwise we might still end up with false sharing).

I've been thinking of extending this per cpu allocator thing a bit to be
a little smarter about these things. What would be needed is a strict
per-cpu slab allocator. The current ones are node affine, which can
still cause false sharing (unless - as should be the case - these
objects are both cacheline aligned and of cacheline size). When we have
that, we can start using smaller objects.

next prev parent reply	other threads:[~2007-06-25  0:16 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-18 21:14 Change in default vm_dirty_ratio Tim Chen
2007-06-18 23:47 ` Andrew Morton
2007-06-19  0:06   ` Linus Torvalds
2007-06-19  0:09     ` Arjan van de Ven
2007-06-19 18:41   ` John Stoffel
2007-06-19 19:04     ` Linus Torvalds
2007-06-19 19:06       ` Linus Torvalds
2007-06-19 22:33       ` David Miller
2007-06-19 19:57   ` Andi Kleen
2007-06-20  4:24   ` Dave Jones
2007-06-20  4:44     ` Andrew Morton
2007-06-20  8:35       ` Peter Zijlstra
2007-06-20  8:58         ` Andrew Morton
2007-06-20  9:14           ` Jens Axboe
2007-06-20  9:19             ` Peter Zijlstra
2007-06-20  9:20               ` Jens Axboe
2007-06-20  9:43                 ` Peter Zijlstra
2007-06-21 22:53                 ` Matt Mackall
2007-06-21 23:08                   ` Linus Torvalds
2007-06-23 18:23                     ` Peter Zijlstra
2007-06-24 16:40                       ` Linus Torvalds
2007-06-25  0:15                         ` Peter Zijlstra [this message]
2007-06-20  9:19           ` Peter Zijlstra
2007-06-21 16:54           ` Mark Lord
2007-06-21 16:55             ` Peter Zijlstra
2007-06-20 17:17         ` Linus Torvalds
2007-06-20 18:12           ` Arjan van de Ven
2007-06-20 18:28             ` Linus Torvalds
2007-06-21 12:37     ` Nadia Derbey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1182730552.6174.45.camel@lappy \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=davej@redhat.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpm@selenic.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.