From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrea Arcangeli <andrea@novell.com>
Cc: Andrew Morton <akpm@osdl.org>, linux-kernel@vger.kernel.org
Subject: Re: ZONE_PADDING wastes 4 bytes of the new cacheline
Date: Fri, 22 Oct 2004 13:02:24 +1000
Message-ID: <41787840.3060807@yahoo.com.au>
In-Reply-To: <20041022011057.GC14325@dualathlon.random>

Andrea Arcangeli wrote:
> On Fri, Oct 22, 2004 at 10:34:13AM +1000, Nick Piggin wrote:
>
>>Andrea Arcangeli wrote:
>>
>>
>>>looks reasonable. only cons is that this rejects on my tree ;), pages_*
>>>and protection is gone in my tree, replaced by watermarks[] using the
>>>very same optimal and proven algo of 2.4 (enabled by default of course).
>>>I'll reevaluate the false sharing later on.
>>>
>>
>>May I again ask what you think is wrong with ->protection[] apart from
>>it being turned off by default? (I don't think our previous conversation
>>ever reached a conclusion...)
>
>
> the API is flawed, how can you set it up by default? if somebody tweaks
> pages_min it gets screwed.
>

The setup code leaves a bit to be desired, but for the purpose of
kswapd and the page allocator they are fine.

> plus it's worthless to have pages_min/low/high and protection[] when you
> can combine the two things together. those pages_min/low/high and
> protection combined when protection itself is calculated in function of
> pages_min/low/high just creates confusion. I believe this comment
> explains it well enough:
>

I don't agree, there are times when you need to know the bare pages_xxx
watermark, and times when you need to know the whole ->protection thing.

> /*
> * setup_per_zone_protection - called whenver min_free_kbytes or
> * sysctl_lower_zone_protection changes. Ensures that each zone
> * has a correct pages_protected value, so an adequate number of
> * pages are left in the zone after a successful __alloc_pages().
> *
> * This algorithm is way confusing. I tries to keep the same
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> * behavior
> * as we had with the incremental min iterative algorithm.
> */
>
> I've to agree with the comment, if I've to start doing math on top of
> this algorithm, it's almost faster to replace it with the 2.4 one that
> has clear semantics.
>
> Example of the runtime behaviour of the very well understood
> lowmem_reserve from 2.4, it's easier to make an example than to explain
> it with words:
>
> 1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
> 1G machine -> (16M dma, 784M normal, 224M high)
>
> sysctl defaults are:
>
> int sysctl_lower_zone_reserve_ratio[MAX_NR_ZONES-1] = { 256, 32 };
>
> the results will be:
>
> 1) NORMAL allocation will leave 784M/256 of ram reserved in the ZONE_DMA
>
> 2) HIGHMEM allocation will leave 224M/32 of ram reserved in ZONE_NORMAL
>
> 3) HIGHMEM allocation will leave (224M+784M)/256 of ram reserved in ZONE_DMA
>
> I invented this algorithm to scale in all machines and to make the
> memory reservation not noticeable and only beneficial. With this API it is
> trivial to tune and to understand what will happen, the default values
> look good and they're proven by years of production in 2.4. Most
> important: it's only a function of the sizes of the zones, the
> pages_min levels have nothing to do with it.
>

OK, I don't disagree that your setup calculations are much nicer, and
the current ones are pretty broken...

> I'm still unsure if the 2.6 lower_zone_protection completely mimics the
> 2.4 lowmem_zone_reserve algorithm if tuned by reversing the pages_min
> settings accordingly, but I believe it's easier to drop it and replace
> with a clear understandable API that as well drops the pages_min levels
> that have no reason to exist anymore, than to leave it in its current
> state and to start reversing pages_min algorithm to tune it from
> userspace (in the hope nobody could ever tweak pages_min calculation in
> the kernel, to avoid breaking the userspace that would require
> kernel-internal knowledge to have a chance to tune lowmem_protection
> from a rc.d script).
>
But please no wholesale replacement of the ->pages_xxx / ->protection
thing unless you really show it is needed (which I'm pretty sure it
isn't). alloc_pages is very nice right now ;)
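
The lowmem_reserve arithmetic in the quoted 1G example can be worked through with a minimal sketch in C. This is not the 2.4 kernel code: reserve_kb(), example_zone_kb[], print_example_reserves() and the choice of KiB units are hypothetical names and assumptions made for illustration. Only the ratio table { 256, 32 } and the zone sizes (16M dma, 784M normal, 224M high) are taken from the message.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_NR_ZONES 3

enum zone_index { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM };

/* Default ratios as quoted in the message; entry i governs the reserve in zone i. */
static const int sysctl_lower_zone_reserve_ratio[MAX_NR_ZONES - 1] = { 256, 32 };

/*
 * Hypothetical helper (not a kernel function): KiB left reserved in
 * lower_zone when an allocation aimed at alloc_zone falls back to it.
 * The reserve is the total size of the zones above lower_zone, up to and
 * including alloc_zone, divided by lower_zone's ratio.
 */
static unsigned long reserve_kb(const unsigned long zone_kb[MAX_NR_ZONES],
                                enum zone_index alloc_zone,
                                enum zone_index lower_zone)
{
    unsigned long higher_kb = 0;
    int z;

    assert(lower_zone < alloc_zone && alloc_zone < MAX_NR_ZONES);

    for (z = (int)lower_zone + 1; z <= (int)alloc_zone; z++)
        higher_kb += zone_kb[z];

    return higher_kb / sysctl_lower_zone_reserve_ratio[lower_zone];
}

/* Zone sizes of the 1G example machine, in KiB: 16M dma, 784M normal, 224M high. */
static const unsigned long example_zone_kb[MAX_NR_ZONES] = {
    16UL * 1024, 784UL * 1024, 224UL * 1024
};

/* Print the three cases listed in the message. */
static void print_example_reserves(void)
{
    printf("1) NORMAL  allocation, reserved in ZONE_DMA:    %lu KiB\n",
           reserve_kb(example_zone_kb, ZONE_NORMAL, ZONE_DMA));
    printf("2) HIGHMEM allocation, reserved in ZONE_NORMAL: %lu KiB\n",
           reserve_kb(example_zone_kb, ZONE_HIGHMEM, ZONE_NORMAL));
    printf("3) HIGHMEM allocation, reserved in ZONE_DMA:    %lu KiB\n",
           reserve_kb(example_zone_kb, ZONE_HIGHMEM, ZONE_DMA));
}
```

Calling print_example_reserves() from a main() gives 3136 KiB for case 1 (784M/256), 7168 KiB for case 2 (224M/32) and 4032 KiB for case 3 ((224M+784M)/256). Every input is a zone size or a ratio; no pages_min level enters the calculation, which is the property the quoted text emphasises.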