public inbox for linux-kernel@vger.kernel.org
From: Andrew Morton <akpm@zip.com.au>
To: Daniel Phillips <phillips@arcor.de>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Rmap speedup
Date: Sat, 03 Aug 2002 14:40:41 -0700	[thread overview]
Message-ID: <3D4C4DD9.779C057B@zip.com.au> (raw)
In-Reply-To: E17b3sE-0001T4-00@starship

Daniel Phillips wrote:
> 
> On Saturday 03 August 2002 07:24, Andrew Morton wrote:
> > - page_add_rmap has vanished
> > - page_remove_rmap has halved (80% of the remaining is the
> >   list walk)
> > - we've moved the cost into the new locking site, zap_pte_range
> >   and copy_page_range.
> 
> > So rmap locking is still a 15% slowdown on my soggy quad, which generally
> > seems relatively immune to locking costs.
> 
> What is it about your quad?

Dunno.  I was discussing related things with David M-T yesterday.  A
`lock;incl' on a P4 takes ~120 cycles, everything in-cache.  On my PIII
it's 33 cycles.  12-16 on ia64.   Lots of variation there.

Instrumentation during your 35 second test shows:

- Performed an rmap lock 6.5M times
- Got a hit on the cached lock 9.2M times
- Got a miss on the cached lock 2.6M times.

So the remaining (6.5M - 2.6M) locks were presumably in the 
pagefault handlers.

lockmeter results are at http://www.zip.com.au/~akpm/linux/rmap-lockstat.txt

- We took 3,000,000 rwlocks (mainly pagecache)
- We took 20,000,000 spinlocks
- the locking is spread around copy_mm, copy_page_range,
  do_no_page, handle_mm_fault, page_add_rmap, zap_pte_range
- total amount of CPU time lost spinning on locks is 1%, mainly
  in page_add_rmap and zap_pte_range.

That's not much spintime.   The total system time with this test went
from 71 seconds (2.5.26) to 88 seconds (2.5.30). (4.5 seconds per CPU)
So all the time is presumably spent waiting on cachelines to come from
other CPUs, or from local L2.

lockmeter results for 2.5.26 are at
http://www.zip.com.au/~akpm/linux/2.5.26-lockstat.txt

- 2.5.26 took 17,000,000 spinlocks
- but 3,000,000 of those were kmap_lock and pagemap_lru_lock, which
  have been slaughtered in my tree.  rmap really added 6,000,000
  locks to 2.5.30.


Running the same test on 2.4:

2.4.19-pre7:
	./daniel.sh  35.12s user 65.96s system 363% cpu 27.814 total
	./daniel.sh  35.95s user 64.77s system 362% cpu 27.763 total
	./daniel.sh  34.99s user 66.46s system 364% cpu 27.861 total

2.4.19-pre7+rmap:
	./daniel.sh  36.20s user 106.80s system 363% cpu 39.316 total
	./daniel.sh  38.76s user 118.69s system 399% cpu 39.405 total
	./daniel.sh  35.47s user 106.90s system 364% cpu 39.062 total

2.4.19-pre7+rmap-13b+your patch:
	./daniel.sh  33.72s user 97.20s system 364% cpu 35.904 total
	./daniel.sh  35.18s user 94.48s system 363% cpu 35.690 total
	./daniel.sh  34.83s user 95.66s system 363% cpu 35.921 total

The system time is pretty gross, isn't it?

And it's disproportional to the increased number of lockings.

> ...
> 
> But before we start on the micro-optimization we need to know why your quad
> is so unaffected by the big change.

We need a little test proggy to measure different platforms' cache
load latency, locked-operation cost, etc.  I have various bits-n-pieces,
will put something together.

>  Are you sure the slab cache batching of
> pte chain allocation performs as well as my simpleminded inline batching?

Slab is pretty good, I find.  And there's no indication of a problem
in the profiles.

> (I batched the pte chain allocation lock quite nicely.)  What about the bit
> test/set for the direct rmap pointer, how is performance affected by dropping
> the direct lookup optimization?

It didn't show in the instruction-level oprofiling.

>  Note that you are holding the rmap lock
> considerably longer than I was, by holding it across __page_add_rmap instead
> of just across the few instructions where pointers are actually updated.  I'm
> also wondering if gcc is optimizing your cached_rmap_lock inline as well as
> you think it is.
> 
> I really need to be running on 2.5 so I can crosscheck your results.  I'll
> return to the matter of getting the dac960 running now.

Sigh.  No IDE?

> Miscellaneous question: we are apparently adding rmaps to reserved pages, why
> is that?

That's one for Rik...


Thread overview: 44+ messages
2002-08-02 19:42 [PATCH] Rmap speedup Daniel Phillips
2002-08-02 20:20 ` Andrew Morton
2002-08-02 21:40   ` William Lee Irwin III
2002-08-03  0:14   ` Rik van Riel
2002-08-03  0:31     ` Andrew Morton
2002-08-03  0:52       ` William Lee Irwin III
2002-08-03  0:56       ` Rik van Riel
2002-08-03  3:47   ` Daniel Phillips
2002-08-03  5:24     ` Andrew Morton
2002-08-03 18:43       ` Daniel Phillips
2002-08-03 21:40         ` Andrew Morton [this message]
2002-08-03 21:54           ` Rik van Riel
2002-08-03 22:49           ` Daniel Phillips
2002-08-03 23:55             ` Gerrit Huizenga
2002-08-04  0:47             ` Andrew Morton
2002-08-04  1:01               ` Daniel Phillips
2002-08-04 14:11                 ` Thunder from the hill
2002-08-04 14:47                   ` Zwane Mwaikambo
2002-08-04 16:55                   ` Tobias Ringstrom
2002-08-03 23:36           ` Daniel Phillips
2002-08-04  0:44             ` Andrew Morton
2002-08-03 21:05       ` Rik van Riel
2002-08-03 21:36         ` Daniel Phillips
2002-08-03 21:43         ` Andrew Morton
2002-08-03 21:41           ` Daniel Phillips
2002-08-03 21:24       ` [PATCH] Rmap speedup... call for testing Daniel Phillips
2002-08-03 22:05       ` [PATCH] Rmap speedup Daniel Phillips
2002-08-03 22:39         ` Andrew Morton
2002-08-03 22:35           ` Daniel Phillips
2002-08-04 23:33 ` Andrew Morton
2002-08-05  0:35   ` Daniel Phillips
2002-08-05  7:05   ` Andrew Morton
2002-08-05 13:48     ` Daniel Phillips
2002-08-05 13:57       ` Rik van Riel
2002-08-05 18:16         ` Andrew Morton
2002-08-07 18:59     ` Daniel Phillips
2002-08-07 19:40       ` Andrew Morton
2002-08-07 20:17         ` Daniel Phillips
2002-08-07 20:34           ` Andrew Morton
2002-08-07 20:51             ` Daniel Phillips
2002-08-07 20:54               ` Rik van Riel
2002-08-07 22:21                 ` Daniel Phillips
2002-08-07 22:48                   ` Andrew Morton
2002-08-07 20:39           ` Daniel Phillips
