public inbox for linux-kernel@vger.kernel.org
From: William Lee Irwin III <wli@holomorphy.com>
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [pagevec] resize pagevec to O(lg(NR_CPUS))
Date: Sat, 11 Sep 2004 21:56:36 -0700	[thread overview]
Message-ID: <20040912045636.GA2660@holomorphy.com> (raw)
In-Reply-To: <20040910174915.GA4750@logos.cnet>

William Lee Irwin III wrote:
>>> In order to attempt to compensate for arrival rates to zone->lru_lock
>>> increasing as O(num_cpus_online()), this patch resizes the pagevec to
>>> O(lg(NR_CPUS)) for lock amortization that adjusts better to the size of
>>> the system. Compile-tested on ia64.

On Fri, Sep 10, 2004 at 02:56:11PM +1000, Nick Piggin wrote:
>> Yuck. I don't like things like this to depend on NR_CPUS, because your
>> kernel may behave quite differently depending on the value. But in this
>> case I guess "quite differently" is probably "a little bit differently",
>> and practical reality may dictate that you need to do something like
>> this at compile time, and NR_CPUS is your best approximation.

On Fri, Sep 10, 2004 at 02:49:15PM -0300, Marcelo Tosatti wrote:
> For me, Bill's patch (with the recursive thingie) is very cryptic. It's
> just doing log2(n); it took me an hour to figure it out with his help.

Feel free to suggest other ways to discover lg(n) at compile-time.
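
For comparison, one recursion-free way to get floor(lg(n)) for a
compile-time constant is a plain ternary chain; a minimal sketch (the
LG() name is illustrative, not the macro from the patch — much later
kernels grew ilog2() for exactly this):

```c
#include <assert.h>

/* Compile-time floor(log2(n)) for constant n in [1, 1024].
 * The whole expression folds to a constant, so it can be used to
 * size an array the way PAGEVEC_SIZE is. */
#define LG(n) ((n) < 2    ? 0 : \
               (n) < 4    ? 1 : \
               (n) < 8    ? 2 : \
               (n) < 16   ? 3 : \
               (n) < 32   ? 4 : \
               (n) < 64   ? 5 : \
               (n) < 128  ? 6 : \
               (n) < 256  ? 7 : \
               (n) < 512  ? 8 : \
               (n) < 1024 ? 9 : 10)
```

A pagevec sized as, say, 4 + LG(NR_CPUS) would then give 4 entries on
UP and 14 on a hypothetical 1024-way configuration.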


On Fri, Sep 10, 2004 at 02:56:11PM +1000, Nick Piggin wrote:
>> That said, I *don't* think this should go in hastily.
>> First reason is that the lru lock is per zone, so the premise that
>> zone->lru_lock acquisitions increase as O(cpus) is wrong for anything large
>> enough to care (ie. it will be NUMA). It is possible that a 512 CPU Altix
>> will see less lru_lock contention than an 8-way Intel box.

On Fri, Sep 10, 2004 at 02:49:15PM -0300, Marcelo Tosatti wrote:
> Oops, right. wli's patch is borked for NUMA. Clamping it at 64 should
> do fine.

No, it DTRT. Batching does not directly compensate for increases in
arrival rates; rather, it most directly compensates for increases in
lock transfer times, which do indeed increase on systems with large
numbers of cpus.


On Fri, Sep 10, 2004 at 02:56:11PM +1000, Nick Piggin wrote:
>> Second, you might really start putting pressure on small L1
>> caches (eg. Itanium 2) if you bite off too much in one go. If you blow
>> it, you'll have to pull all the pages into cache again as you process
>> the pagevec.

On Fri, Sep 10, 2004 at 02:49:15PM -0300, Marcelo Tosatti wrote:
> What's the L1 cache size of Itanium 2? Each page is huge compared to the
> pagevec structure (you need a 64 item pagevec array on 64-bits to occupy
> the space of one 4KB page). So I think you won't blow up the cache even
> with a really big pagevec.

A 511 item pagevec is 4KB on 64-bit machines.
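
The arithmetic, as a sketch (the layout below approximates the 2.6-era
struct pagevec, with an 8-byte header of two unsigned counters ahead of
the pointer array; field names are from memory):

```c
#include <assert.h>
#include <stddef.h>

/* Approximation of the 2.6-era pagevec layout.  With 8-byte
 * pointers, 8 + 511 * 8 = 4096 bytes: exactly one 4KB page. */
struct page;                        /* opaque, as in the real headers */
struct pagevec_sketch {
	unsigned nr;
	unsigned cold;
	struct page *pages[511];
};
```

So a 64-entry pagevec is roughly 520 bytes, nowhere near the 4KB
Marcelo suggests, and even PAGEVEC_SIZE of 16 (about 136 bytes) is
small next to Itanium 2's 16KB L1 data cache.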


On Fri, Sep 10, 2004 at 02:56:11PM +1000, Nick Piggin wrote:
>> I don't think the smallish loop overhead constant (mainly pulling a lock
>> and a couple of hot cachelines off another CPU) would gain much from
>> increasing these a lot, either. The overhead should already be at least
>> an order of magnitude smaller than the actual work cost.
>> Lock contention isn't a good argument either, because it shouldn't
>> significantly change as you trade off hold time vs. frequency if we assume
>> that the lock transfer and other overheads aren't significant (which
>> should be a safe assumption at PAGEVEC_SIZE of >= 16, I think).
>> Probably a PAGEVEC_SIZE of 4 on UP would be an interesting test, because
>> your loop overheads get a bit smaller.

On Fri, Sep 10, 2004 at 02:49:15PM -0300, Marcelo Tosatti wrote:
> Not very noticeable on reaim. I want to do more tests (different
> workloads, nr CPUs, etc).

The results I got suggest the tests will not differ much unless the
machines differ significantly in the overhead of acquiring a cacheline
in an exclusive state.


On Fri, Sep 10, 2004 at 02:49:15PM -0300, Marcelo Tosatti wrote:
> kernel: pagevec-4
> plmid: 3304
> Host: stp1-002
> Reaim test
> http://khack.osdl.org/stp/297484
> kernel: 3304
> Filesystem: ext3
> Peak load Test: Maximum Jobs per Minute 992.40 (average of 3 runs)
> Quick Convergence Test: Maximum Jobs per Minute 987.72 (average of 3 runs)
> If some fields are empty or look unusual you may have an old version.
> Compare to the current minimal requirements in Documentation/Changes.
> kernel: 2.6.9-rc1-mm4
> plmid: 3294
> Host: stp1-003
> Reaim test
> http://khack.osdl.org/stp/297485
> kernel: 3294
> Filesystem: ext3
> Peak load Test: Maximum Jobs per Minute 989.85 (average of 3 runs)
> Quick Convergence Test: Maximum Jobs per Minute 982.07 (average of 3 runs)

Unsurprising. If the expected response time given batching factor K is
T(K) (which also depends on the lock transfer time), then T(K)/(K*T(1))
may have nontrivial maxima and minima in K. I've checked the expected
waiting times of a few queueing models (e.g. M/M/1) and verified that
batching is not a degradation for them, though I've not gone over it in
generality (G/G/m is hard to get results of any kind for anyway). I
refrained from posting a lengthier discussion of the results.
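
A toy version of that kind of check (purely illustrative parameters,
not the analysis above): treat the batched lock as an M/M/1 queue whose
customers are whole batches, arriving at lambda/K and taking
transfer + K*work to service, and confirm the batch's response time
stays well under K serialized single-page acquisitions.

```c
#include <assert.h>

/* M/M/1 mean response time: W = 1/(mu - lambda), valid for lambda < mu. */
static double mm1_response(double lambda, double mu)
{
	return 1.0 / (mu - lambda);
}

/* With batching K, batches arrive at lambda/K and the batch service
 * rate is 1/(transfer + K*work). */
static double batched_response(double lambda, double transfer,
			       double work, unsigned k)
{
	return mm1_response(lambda / k, 1.0 / (transfer + k * work));
}
```

With transfer=10, work=1 and a per-page arrival rate of 0.05, a batch
of 16 completes in about 28 time units versus about 391 for 16
single-page acquisitions back to back, i.e. T(K)/(K*T(1)) < 1 for
these parameters.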


-- wli

Thread overview: 28+ messages
2004-09-09 16:39 [PATCH] cacheline align pagevec structure Marcelo Tosatti
2004-09-09 22:49 ` Andrew Morton
2004-09-09 21:41   ` Marcelo Tosatti
2004-09-09 23:20     ` Andrew Morton
2004-09-09 22:52 ` Andrew Morton
2004-09-09 23:09   ` William Lee Irwin III
2004-09-09 22:12     ` Marcelo Tosatti
2004-09-09 23:59       ` William Lee Irwin III
2004-09-09 23:22     ` Andrew Morton
2004-09-10  0:07       ` [pagevec] resize pagevec to O(lg(NR_CPUS)) William Lee Irwin III
2004-09-10  4:56         ` Nick Piggin
2004-09-10  4:59           ` Nick Piggin
2004-09-10 17:49           ` Marcelo Tosatti
2004-09-12  0:29             ` Nick Piggin
2004-09-12  5:23               ` William Lee Irwin III
2004-09-12  4:36                 ` Nick Piggin
2004-09-12  4:56             ` William Lee Irwin III [this message]
2004-09-12  4:28               ` Nick Piggin
2004-09-12  6:27                 ` William Lee Irwin III
2004-09-12  6:03                   ` Nick Piggin
2004-09-12  7:19                     ` William Lee Irwin III
2004-09-12  7:42                       ` Andrew Morton
2004-09-14  2:18                         ` William Lee Irwin III
2004-09-14  2:57                           ` Andrew Morton
2004-09-14  3:12                             ` William Lee Irwin III
2004-09-12  8:57                       ` William Lee Irwin III
2004-09-13 22:21                 ` Marcelo Tosatti
2004-09-14  1:59                   ` Nick Piggin
