public inbox for linux-kernel@vger.kernel.org
From: Andrea Arcangeli <andrea@suse.de>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@osdl.org>,
	Linux Kernel list <linux-kernel@vger.kernel.org>,
	Rik van Riel <riel@surriel.com>, Andrew Morton <akpm@osdl.org>
Subject: Re: Page aging broken in 2.6
Date: Sat, 27 Dec 2003 03:37:53 +0100	[thread overview]
Message-ID: <20031227023752.GF1676@dualathlon.random> (raw)
In-Reply-To: <1072487027.15476.105.camel@gaston>

On Sat, Dec 27, 2003 at 12:03:48PM +1100, Benjamin Herrenschmidt wrote:
> For accessed, we currently do not use the HW bit neither. Accessed = in
> the hash, not accessed = not in the hash. A bit basic, but the cost of
> faulting them back in isn't that bad. Still, I always found it a bit
> stupid that we end up having the harvesting of accessed bits actually
> evict pages that _are_ accessed, and thus potentially here to be
> accessed again ;)

It's hard for me to evaluate how much the young bit matters just by
thinking about it, but I know for sure that behaviour under heavy
swapping on the alpha was noticeably less smooth than on x86 (alpha
has^Hd no way to implement the young bit, not even in software through
hash faults like you do). So I guess it's worthwhile for you to account
for it even if in software (i.e. ppc not ppc64).

> Paul did some experiments using the HW bits and didn't see a great
> perf increase (or what is even a decrease ?), but I should try that

It should be an I/O dominated workload anyways and it sounds like the
hardware way involves hash manipulation too (it only avoids the fault to
set it back on).

> > I'll let Rik and Andrea argue that part - it's entirely possible that 
> > getting lots of positive results is a _good_ thing, if the same page is 
> > mapped multiple times. That would just make us less eager to unmap it, 

That sounds like correct behaviour to me: if a page is mapped multiple
times we should be less eager to unmap it. More precisely, we should give
every user the opportunity to increase the youngness of the page, so a
page with multiple users will go away after a page with just a single
user, assuming all users access their pages at the same frequency.
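
To make the "every user can increase the youngness" point concrete, here
is a minimal userspace sketch. It is not kernel code: `struct toy_pte`
and `toy_page_referenced()` are hypothetical names made up for this
illustration. It just tests and clears the young bit in every mapping of
one page and counts the hits, so a page with more active mappers reports
a higher count than a page with a single mapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one pte mapping a page; not a kernel type. */
struct toy_pte {
    bool young;   /* accessed ("young") bit for this mapping */
};

/*
 * Test and clear the young bit in every mapping of a single page.
 * Returns how many mappings had the bit set, i.e. how many users
 * touched the page since the last scan.
 */
static int toy_page_referenced(struct toy_pte *ptes, size_t nr_mappings)
{
    int referenced = 0;

    assert(ptes != NULL || nr_mappings == 0);
    for (size_t i = 0; i < nr_mappings; i++) {
        if (ptes[i].young) {
            ptes[i].young = false;  /* harvest the bit */
            referenced++;
        }
    }
    return referenced;
}
```

With equal access frequency per user, the shared page collects more
positive results per scan and so would be aged out later than the
single-user page.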

Returning to "how to flush the tlb after clearing the young bit": at
least on x86 I find it more desirable to flush based on the mm (on UP
that's the most efficient and it provides accurate behaviour; on SMP
it may still be too costly, but surely a lot less costly than a
broadcast per pte).  In 2.4, with the pagetable scan, the flush per mm is
straightforward and it provides a very high probability of optimizing
away a huge number of spurious IPI broadcasts. But even in 2.6 the vm is
unmapping stuff with an aggressive clustering algorithm, so that when
it starts unmapping it drops quite a lot at once and there's still a
relevant probability that only a few mms have to be flushed, which on
SMP can greatly decrease the need for IPIs.  Not sure how these
flush_tlb_mm ideas translate to ppc though.
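
A rough way to see the saving, as a toy userspace model only: the
functions below and the `TOY_MAX_MM` bound are hypothetical, and each
"flush" is just a counter standing in for one broadcast. Given the mm
that owns each pte whose young bit was cleared in one scan, compare one
flush per pte against one flush per distinct mm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_MM 64   /* hypothetical bound on mm ids in this sketch */

/* One flush for every pte whose young bit was cleared. */
static size_t toy_flushes_per_pte(const int *mm_ids, size_t n)
{
    (void)mm_ids;       /* the owner does not matter in this scheme */
    return n;
}

/* One flush for every distinct mm touched during the scan. */
static size_t toy_flushes_per_mm(const int *mm_ids, size_t n)
{
    bool seen[TOY_MAX_MM] = { false };
    size_t flushes = 0;

    for (size_t i = 0; i < n; i++) {
        assert(mm_ids[i] >= 0 && mm_ids[i] < TOY_MAX_MM);
        if (!seen[mm_ids[i]]) {
            seen[mm_ids[i]] = true;
            flushes++;
        }
    }
    return flushes;
}
```

The more the scan clusters its work inside a few mms, the closer the
second count gets to one flush per scan.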

The dirty and accessed bitflags are instead quite a different matter
w.r.t. tlb flushing: on SMP we can't defer the tlb flush after
atomically clearing the pte while we clear the dirty bit. The tlb
shootdown is the clustered version of that: the shootdown runs a
broadcast IPI no more often than once every 508 ptes freed per mm. For
the same reason, we can try to coalesce the tlb flush post-clear-young
with an mm flush: we can achieve a similar coalescing without the need
for an exact tlb shootdown like in the pte freeing.
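
The clustering in the shootdown can be sketched like this. It is a
simplified userspace model, not the kernel's actual gather code:
`struct toy_gather` and its helpers are hypothetical names, and the
batch size is a parameter so that the 508 figure above is just one
possible value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-mm batch state; not a kernel structure. */
struct toy_gather {
    size_t batch_size;  /* flush once this many ptes are queued */
    size_t queued;      /* ptes freed since the last flush */
    size_t flushes;     /* broadcasts issued so far */
};

static void toy_gather_init(struct toy_gather *tlb, size_t batch_size)
{
    assert(batch_size > 0);
    tlb->batch_size = batch_size;
    tlb->queued = 0;
    tlb->flushes = 0;
}

/* Queue one freed pte; issue a flush only when the batch is full. */
static void toy_gather_add(struct toy_gather *tlb)
{
    if (++tlb->queued == tlb->batch_size) {
        tlb->flushes++;
        tlb->queued = 0;
    }
}

/* Flush whatever is left when the unmap operation ends. */
static void toy_gather_finish(struct toy_gather *tlb)
{
    if (tlb->queued > 0) {
        tlb->flushes++;
        tlb->queued = 0;
    }
}
```

In this model, freeing 1017 ptes with a batch of 508 costs three
broadcasts instead of 1017.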
