From: Nick Piggin <nickpiggin@yahoo.com.au>
To: William Lee Irwin III <wli@holomorphy.com>
Cc: Marcelo Tosatti <marcelo.tosatti@cyclades.com>,
Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org
Subject: Re: [pagevec] resize pagevec to O(lg(NR_CPUS))
Date: Sun, 12 Sep 2004 16:03:50 +1000
Message-ID: <4143E6C6.40908@yahoo.com.au>
In-Reply-To: <20040912062703.GF2660@holomorphy.com>
William Lee Irwin III wrote:
> William Lee Irwin III wrote:
>
>>>No, it DTRT. Batching does not directly compensate for increases in
>>>arrival rates, rather most directly compensates for increases to lock
>>>transfer times, which do indeed increase on systems with large numbers
>>>of cpus.
>
>
> On Sun, Sep 12, 2004 at 02:28:46PM +1000, Nick Piggin wrote:
>
>>Generally though I think you could expect the lru lock to be most
>>often taken by the scanner by node local CPUs. Even on the big
>>systems. We'll see.
>
>
> No, I'd expect zone->lru_lock to be taken most often for lru_cache_add()
> and lru_cache_del().
>
Well, lru_cache_del() will often be coming from the scanner.
lru_cache_add() should be performed on newly allocated pages,
which should be node local most of the time.
>
> William Lee Irwin III wrote:
>
>>>A 511 item pagevec is 4KB on 64-bit machines.
>
>
> On Sun, Sep 12, 2004 at 02:28:46PM +1000, Nick Piggin wrote:
>
>>Sure. And when you fill it with pages, they'll use up 32KB of dcache
>>by using a single 64B line per page. Now that you've blown the cache,
>>when you go to move those pages to another list, you'll have to pull
>>them out of L2 again one at a time.
>
>
> There can be no adequate compile-time metric of L1 cache size. 64B
> cachelines with 16KB caches sounds a bit small, 256 entries, which is
> even smaller than TLB's on various systems.
>
Although I'm pretty sure that is what Itanium 2 has (16KB L1, 64B
lines). P4s may even have 128B lines and a 16K L1, IIRC.
> In general a hard cap at the L1 cache size would be beneficial for
> operations done in tight loops, but there is no adequate detection
> method. Also recall that the page structures things will be touched
> regardless if they are there to be touched in a sufficiently large
> pagevec. Various pagevecs are meant to amortize locking done in
> scenarios where there is no relationship between calls. Again,
> lru_cache_add() and lru_cache_del() are the poster children. These
> operations are often done for one page at a time in some long codepath,
> e.g. fault handlers, and the pagevec is merely deferring the work until
> enough has accumulated. radix_tree_gang_lookup() and mpage_readpages()
> OTOH execute the operations to be done under the locks in tight loops,
> where the lock acquisitions are to be done immediately by the same caller.
>
> This differentiation between the characteristics of pagevec users
> happily matches the cases where they're used on-stack and per-cpu.
> In the former case, larger pagevecs are desirable, as the cachelines
> will not be L1-hot regardless; in the latter, L1 size limits apply.
>
Possibly, I don't know. Performing a large stream of faults to
map in a file could easily keep all pages of a small pagevec
in cache.
Anyway, the point I'm making is just that you don't want to be
expanding this thing just because you can. If all else is equal,
a smaller size is obviously preferable. So obviously, simple
testing is required - but I don't think I need to be telling you
that ;)
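
The deferral described in the quoted text above (one page at a time is
queued, and the lock is only taken once enough has accumulated) can be
sketched roughly as follows. This is a minimal userspace model, not the
kernel's pagevec code: BATCH_SIZE, struct batch, struct lru_model,
batch_add() and batch_flush() are all hypothetical names, and a counter
stands in for actually taking zone->lru_lock.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 16	/* hypothetical batch size, chosen only for illustration */

/* Hypothetical stand-in for a pagevec: a count plus a small array of items. */
struct batch {
	size_t nr;
	void *items[BATCH_SIZE];
};

/* Hypothetical stand-in for the LRU: counts lock round trips and items moved. */
struct lru_model {
	unsigned long lock_acquisitions;
	unsigned long items_added;
};

/* Drain the batch under a single (modelled) lock acquisition. */
void batch_flush(struct batch *b, struct lru_model *lru)
{
	if (b->nr == 0)
		return;
	lru->lock_acquisitions++;	/* models one lock/unlock pair */
	lru->items_added += b->nr;	/* all queued items move under that one hold */
	b->nr = 0;
}

/* Queue one item; the lock is only touched when the batch fills up. */
void batch_add(struct batch *b, struct lru_model *lru, void *item)
{
	assert(b->nr < BATCH_SIZE);
	b->items[b->nr++] = item;
	if (b->nr == BATCH_SIZE)
		batch_flush(b, lru);
}
```

In this model the number of lock acquisitions is the number of items
divided by BATCH_SIZE, rounded up once the tail is flushed, which is the
amortization being argued over. It says nothing about the cache cost of
a larger batch.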
>
> On Sun, Sep 12, 2004 at 02:28:46PM +1000, Nick Piggin wrote:
>
>>OK, so a 511 item pagevec is pretty unlikely. How about a 64 item one
>>with 128 byte cachelines, and you're touching two cachelines per
>>page operation? That's 16K.
>
>
> 4*lg(NR_CPUS) is 64 for 16x-31x boxen. No constant number suffices.
> Adaptation to systems and the usage cases would be an improvement.
>
Ignore my comments about disliking compile-time sizing: the main
thing is to just find improvements, and a merge-worthy implementation
can follow.
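
For reference, the size figures traded in this thread can be reproduced
with a few lines of arithmetic. The sketch below is only an
illustration: pagevec_bytes() and dcache_footprint() are hypothetical
helpers, and the 8-byte header and 8-byte pointer are assumptions about
a 64-bit layout rather than values taken from any particular kernel
tree.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes occupied by the pagevec itself: a small header plus one pointer
 * per slot. Header and pointer sizes are passed in explicitly so the
 * result does not depend on the machine compiling this sketch.
 */
size_t pagevec_bytes(size_t nr_slots, size_t header_bytes, size_t ptr_bytes)
{
	return header_bytes + nr_slots * ptr_bytes;
}

/*
 * Data cache touched when every page tracked by a full pagevec is
 * visited, assuming lines_per_page distinct cachelines per page.
 */
size_t dcache_footprint(size_t nr_pages, size_t lines_per_page,
			size_t line_bytes)
{
	assert(line_bytes > 0);
	return nr_pages * lines_per_page * line_bytes;
}
```

With these assumptions, pagevec_bytes(511, 8, 8) gives 4096 (the "4KB"
pagevec), dcache_footprint(511, 1, 64) gives 32704 (just under 32KB),
and dcache_footprint(64, 2, 128) gives 16384 (the 16K case).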