From: Nick Piggin <nickpiggin@yahoo.com.au>
To: William Lee Irwin III <wli@holomorphy.com>
Cc: Marcelo Tosatti <marcelo.tosatti@cyclades.com>,
Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org
Subject: Re: [pagevec] resize pagevec to O(lg(NR_CPUS))
Date: Sun, 12 Sep 2004 16:03:50 +1000
Message-ID: <4143E6C6.40908@yahoo.com.au>
In-Reply-To: <20040912062703.GF2660@holomorphy.com>
William Lee Irwin III wrote:
> William Lee Irwin III wrote:
>
>>>No, it DTRT. Batching does not directly compensate for increases in
>>>arrival rates; rather, it most directly compensates for increases in
>>>lock transfer times, which do indeed increase on systems with large
>>>numbers of cpus.
>
>
> On Sun, Sep 12, 2004 at 02:28:46PM +1000, Nick Piggin wrote:
>
>>Generally though I think you could expect the lru lock to be most
>>often taken by the scanner by node local CPUs. Even on the big
>>systems. We'll see.
>
>
> No, I'd expect zone->lru_lock to be taken most often for lru_cache_add()
> and lru_cache_del().
>
Well "lru_cache_del" will be often coming from the scanner.
lru_cache_add should be being performed on newly allocated pages,
which should be node local most of the time.
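
To make the batching concrete: below is a minimal userspace model of
the lru_cache_add() pattern. PAGEVEC_SIZE, the pthread mutex standing
in for zone->lru_lock, and every name here are illustrative, not the
kernel's actual code.

/*
 * Userspace model (not the kernel code) of how a pagevec amortizes
 * zone->lru_lock traffic: pages queue up in a small array and the
 * lock is taken once per batch instead of once per page.
 */
#include <stdio.h>
#include <pthread.h>

#define PAGEVEC_SIZE 16			/* assumed batch size */

struct page { int id; };

struct pagevec {
	unsigned int nr;
	struct page *pages[PAGEVEC_SIZE];
};

static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long lock_acquisitions;

/* Drain the batch to the LRU under a single lock hold. */
static void pagevec_drain(struct pagevec *pvec)
{
	pthread_mutex_lock(&lru_lock);
	lock_acquisitions++;
	/* ... splice pvec->pages[0..nr-1] onto the LRU list here ... */
	pthread_mutex_unlock(&lru_lock);
	pvec->nr = 0;
}

/* The lru_cache_add() pattern: defer until the batch fills. */
static void lru_cache_add_model(struct pagevec *pvec, struct page *page)
{
	pvec->pages[pvec->nr++] = page;
	if (pvec->nr == PAGEVEC_SIZE)
		pagevec_drain(pvec);
}

int main(void)
{
	static struct page pages[4096];
	struct pagevec pvec = { .nr = 0 };

	for (int i = 0; i < 4096; i++)
		lru_cache_add_model(&pvec, &pages[i]);
	if (pvec.nr)
		pagevec_drain(&pvec);

	/* 4096 adds, but only 4096/16 = 256 lock round-trips. */
	printf("lock acquisitions: %lu\n", lock_acquisitions);
	return 0;
}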
>
> William Lee Irwin III wrote:
>
>>>A 511 item pagevec is 4KB on 64-bit machines.
>
>
> On Sun, Sep 12, 2004 at 02:28:46PM +1000, Nick Piggin wrote:
>
>>Sure. And when you fill it with pages, they'll use up 32KB of dcache
>>by using a single 64B line per page. Now that you've blown the cache,
>>when you go to move those pages to another list, you'll have to pull
>>them out of L2 again one at a time.
>
>
> There can be no adequate compile-time metric of L1 cache size. 64B
> cachelines with a 16KB cache sounds a bit small: that's only 256
> lines, which is even smaller than the TLBs on various systems.
>
Although I'm pretty sure that is what Itanium 2 has. P4s may even
have 128B lines and a 16K L1, IIRC.
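
Spelling out the arithmetic behind those numbers (a throwaway
userspace check; all sizes are the thread's examples, not
measurements):

#include <stdio.h>

int main(void)
{
	/* Numbers from the thread; purely illustrative. */
	unsigned int entries   = 511;		/* pagevec slots	*/
	unsigned int ptr_size  = 8;		/* 64-bit pointer	*/
	unsigned int line_size = 64;		/* cacheline, bytes	*/
	unsigned int l1_size   = 16 * 1024;	/* assumed L1 size	*/

	/* The pagevec struct itself: 511 pointers plus a small header. */
	printf("pagevec struct        ~ %u bytes (~4KB)\n",
	       entries * ptr_size);

	/* One cacheline of each page's struct page touched on drain: */
	printf("struct page footprint ~ %u KB (~32KB)\n",
	       entries * line_size / 1024);

	/* A 16KB L1 with 64B lines holds only: */
	printf("L1 capacity           = %u lines\n",
	       l1_size / line_size);
	return 0;
}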
> In general a hard cap at the L1 cache size would be beneficial for
> operations done in tight loops, but there is no adequate detection
> method. Also recall that the page structures themselves will be
> touched regardless, if they are there to be touched in a sufficiently
> large pagevec. Various pagevecs are meant to amortize locking done in
> scenarios where there is no relationship between calls. Again,
> lru_cache_add() and lru_cache_del() are the poster children. These
> operations are often done for one page at a time in some long codepath,
> e.g. fault handlers, and the pagevec is merely deferring the work until
> enough has accumulated. radix_tree_gang_lookup() and mpage_readpages()
> OTOH execute the operations under the locks in tight loops, where the
> lock acquisitions are done immediately by the same caller.
>
> This differentiation between the characteristics of pagevec users
> happily matches the cases where they're used on-stack and per-cpu.
> In the former case, larger pagevecs are desirable, as the cachelines
> will not be L1-hot regardless; in the latter, L1 size limits apply.
>
Possibly; I don't know. Performing a large stream of faults to
map in a file could easily keep all pages of a small pagevec
in cache.
Anyway, the point I'm making is just that you don't want to be
expanding this thing just because you can. If all else is equal,
a smaller size is obviously preferable. So simple testing is
required - but I don't think I need to be telling you that ;)
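
The two usage patterns being contrasted look roughly like this in
code (a compilable schematic with made-up names, not the actual
kernel callers):

#include <stdio.h>

#define PAGEVEC_SIZE 16			/* assumed batch size */

struct page { int id; };

struct pagevec {
	unsigned int nr;
	struct page *pages[PAGEVEC_SIZE];
};

static void drain(struct pagevec *pvec, const char *who)
{
	printf("%s: one lock hold covers %u pages\n", who, pvec->nr);
	pvec->nr = 0;
}

/* Pattern 1: per-cpu, deferred across unrelated calls (the fault
 * path shape). The pagevec outlives each caller, so its lines are
 * cold by the time it drains; batch size mainly trades memory
 * against lock traffic. */
static struct pagevec percpu_pvec;	/* one per CPU in the kernel */

static void fault_path_add(struct page *page)
{
	percpu_pvec.pages[percpu_pvec.nr++] = page;
	if (percpu_pvec.nr == PAGEVEC_SIZE)
		drain(&percpu_pvec, "per-cpu");
}

/* Pattern 2: on-stack, filled and drained in one tight loop (the
 * radix_tree_gang_lookup()/mpage_readpages() shape). The same caller
 * touches everything while it is still L1-hot, so L1 size is the
 * natural cap on useful batch size. */
static void readpages_batch(struct page *pages, int n)
{
	struct pagevec pvec = { .nr = 0 };

	for (int i = 0; i < n; i++) {
		pvec.pages[pvec.nr++] = &pages[i];
		if (pvec.nr == PAGEVEC_SIZE)
			drain(&pvec, "on-stack");
	}
	if (pvec.nr)
		drain(&pvec, "on-stack");
}

int main(void)
{
	static struct page pages[64];

	for (int i = 0; i < 64; i++)
		fault_path_add(&pages[i]);
	readpages_batch(pages, 64);
	return 0;
}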
>
> On Sun, Sep 12, 2004 at 02:28:46PM +1000, Nick Piggin wrote:
>
>>OK, so a 511 item pagevec is pretty unlikely. How about a 64 item one
>>with 128 byte cachelines, and you're touching two cachelines per
>>page operation? That's 16K.
>
>
> 4*lg(NR_CPUS) is 64 for 16x-31x boxen. No constant number suffices;
> adaptation to the system and the usage cases would be an improvement.
>
Ignore my comments about disliking compile-time sizing: the main
thing is just to find improvements; a merge-worthy implementation
can follow.
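
For reference, the O(lg(NR_CPUS)) sizing in the subject line boils
down to something of this shape at compile time. The constant factor,
the floor, and the ceiling-log2 rounding below are assumptions for
illustration, not necessarily what the patch used:

#include <stdio.h>

/* Ceiling log2; the rounding is an assumption, not the patch's code. */
static unsigned int lg_ceil(unsigned int n)
{
	unsigned int lg = 0;

	while ((1u << lg) < n)
		lg++;
	return lg;
}

int main(void)
{
	/* Illustrative pagevec sizes as NR_CPUS grows; the factor of 4
	 * and the floor of 4 are made up, the point is the lg growth. */
	for (unsigned int cpus = 1; cpus <= 1024; cpus *= 4) {
		unsigned int size = 4 * lg_ceil(cpus);

		if (size < 4)
			size = 4;
		printf("NR_CPUS=%4u -> pagevec size %u\n", cpus, size);
	}
	return 0;
}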