public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@zip.com.au>
To: Daniel Phillips <phillips@arcor.de>
Cc: lkml <linux-kernel@vger.kernel.org>
Subject: Re: [patch 1/13] misc fixes
Date: Mon, 29 Jul 2002 13:00:14 -0700	[thread overview]
Message-ID: <3D459ECE.C5BD53DE@zip.com.au> (raw)
In-Reply-To: E17Z4v0-0002io-00@starship

Daniel Phillips wrote:
> 
>...
> >
> > It is apparent that these problems will not be solved by tweaking -
> > some redesign is needed.  In the 2.5 timeframe the only practical
> > solution appears to be page table sharing, based on Daniel's February
> > work.  Daniel and Dave McCracken are working that.
> 
> Sadly, it turns out that there are no possibilities for page table sharing
> when forking from bash.  It turns out there are only about 200 pages being
> shared amonst three page tables (stack, text and .interp) and at least one
> page in each of these gets written during the exec, so all are unshared and
> hence there is no reduction in the number of pte chains that have to be
> created.  For forking from a larger parent, page table sharing has a
> measurable benefit, but not from these little guys.

Thanks for looking into this.
 
> But there is something massively wrong with this whole picture.  Your kickass
> 4 way is managing to set up and tear down only one pte chain node per
> microsecond, if I'm reading your benchmark results correctly.

s/kickass/slow as a wet sock/.

That little script which did 12,000 fork/exec/exits ran page_add_rmap
and page_remove_rmap about 4,000,000 times each, and total runtime
went from 30 seconds to 34.  So the combined overhead of both adding and
removing the reverse mapping is around a microsecond per page, yes.
That's four locked operations.

>  That's really
> horrible.  I think we need to revisit the locking.

Anton did some testing on a 4-way PPC.  Similar results, I think.
As an experiment he added a spinlock to the page structure and used
that instead of the PG_chainlock flag.  It helped a lot.  He thinks
that is because their spin_unlock() is not buslocked (like ia32).

Which tends to imply that a __clear_bit() in pte_chain_unlock()
will be beneficial.  I don't know how portable that would be though.

 
> The idea I'm playing with now is to address an array of locks based on
> something like:
> 
>         spin_lock(pte_chain_locks + ((page->index >> 4) & 0xff));
> 
> so that 16 consecutive filemap pages use the same lock and there is a limited
> total number of locks to keep cache pressure down.  Since we are creating the
> vast majority of the pte chain nodes while walking across page tables, this
> should give nice locality.

Something like that could help.

At some point, when the reverse map is as CPU efficient as we can make it,
we need to decide whether the remaining cost is worth the benefit.  I
wonder how to do that.
 
> For this to work, anon pages will need to have something in page->index.
> This isn't too much of a challenge.  A reasonable value to put in there is
> the creator's virtual address, shifted right, and perhaps mangled a little to
> reduce contention.

Well you want the likely-to-be-temporally-adjacent anon pages to
use the same lock.  So maybe

	page->index = some_global_int++;

Except ->index gets stomped on when the page gets added to swapcache.
Which means that the address of its lock will change.  I can't immediately
think of a fix for that.

  parent reply	other threads:[~2002-07-29 19:59 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-07-28  7:32 [patch 1/13] misc fixes Andrew Morton
2002-07-29  7:26 ` Daniel Phillips
2002-07-29  8:44   ` Rik van Riel
2002-07-29  8:40     ` David S. Miller
2002-07-29 20:00   ` Andrew Morton [this message]
2002-07-29 20:55     ` Rik van Riel
2002-07-29 22:36       ` Daniel Phillips
2002-07-29 21:51     ` Daniel Phillips
2002-07-31 22:22       ` Rmap setup/teardown suckage Daniel Phillips
2002-07-30 10:11     ` [patch 1/13] misc fixes Daniel Phillips
2002-07-30 10:18     ` Daniel Phillips

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3D459ECE.C5BD53DE@zip.com.au \
    --to=akpm@zip.com.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=phillips@arcor.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox