From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Hugh Dickins <hugh@veritas.com>,
Linux Memory Management List <linux-mm@kvack.org>,
Linux Kernel list <linux-kernel@vger.kernel.org>,
Nick Piggin <npiggin@suse.de>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Peter Chubb <peterc@gelato.unsw.edu.au>
Subject: Re: PTE access rules & abstraction
Date: Thu, 25 Sep 2008 08:07:21 +1000
Message-ID: <1222294041.8277.104.camel@pasglop>
In-Reply-To: <48DAB7E2.5030009@goop.org>
> What do you propose then? Ideally one would like to get something that
> works for powerpc, s390, all the wacky ia64 modes as well as x86. The
> ia64 folks proposed something, but I've not looked at it closely. From
> an x86 virtualization perspective, something that's basically x86 with
> as much scope for batching and deferring as possible would be fine.
That's where things get interesting. I liked Nick's idea of doing
something transactional that could encompass the lock, batch and flushing,
but that may be too much at this stage...
> As a start, what's the state machine for a pte? What states can it be
> in, and how does it move from state to state? It sounds like powerpc
> has at least one extra state above x86 (hashed, with the hash key stored
> in the pte itself?).
We store in the PTE whether it was hashed, and the location within a
hash bucket. (For each hash value, there are 8 slots in the bucket, or
rather 16 if you count our secondary hashing).
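To make the "hashed bit plus location" idea concrete, here is a minimal sketch of how such state could be packed into spare PTE bits. The bit positions and every DEMO_* / demo_* name are invented for illustration; they are not the real powerpc definitions. It only assumes what is said above: one "hashed" flag, a primary/secondary selector, and a slot number 0..7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only (not the real powerpc bits). */
#define DEMO_PTE_VALID      (UINT64_C(1) << 0)
#define DEMO_PTE_HASHED     (UINT64_C(1) << 1) /* a hash entry exists for this PTE */
#define DEMO_PTE_SECONDARY  (UINT64_C(1) << 2) /* entry sits in the secondary bucket */
#define DEMO_PTE_SLOT_SHIFT 3
#define DEMO_PTE_SLOT_MASK  (UINT64_C(7) << DEMO_PTE_SLOT_SHIFT) /* slot 0..7 */

/* Record that the PTE was inserted in the hash, and where. */
static inline uint64_t demo_pte_set_hashed(uint64_t pte, bool secondary,
                                           unsigned slot)
{
    assert(slot < 8);
    pte &= ~(DEMO_PTE_SECONDARY | DEMO_PTE_SLOT_MASK);
    pte |= DEMO_PTE_HASHED;
    if (secondary)
        pte |= DEMO_PTE_SECONDARY;
    pte |= (uint64_t)slot << DEMO_PTE_SLOT_SHIFT;
    return pte;
}

static inline bool demo_pte_hashed(uint64_t pte)
{
    return (pte & DEMO_PTE_HASHED) != 0;
}

/* Candidate index 0..15: 0..7 are primary slots, 8..15 secondary slots. */
static inline unsigned demo_pte_hash_index(uint64_t pte)
{
    unsigned slot = (unsigned)((pte & DEMO_PTE_SLOT_MASK) >> DEMO_PTE_SLOT_SHIFT);

    return ((pte & DEMO_PTE_SECONDARY) ? 8u : 0u) + slot;
}
```

With this layout, a later flush can go straight to the right slot instead of searching all 16 candidates, which is presumably why the location is worth keeping in the PTE.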
We must never write a new valid PTE after we cleared a hashed one
without having a flush in between.
On 32-bit we have less state (only the 'hashed' bit) but the problem is
similar, though we handle it differently: we never clear the hash bit
until we flush the hash, i.e., pte_clear doesn't clear the hash bit. On
64-bit we do clear PTEs, and we pile up in a per-cpu batch what needs to
be flushed; the flush then happens when leaving lazy mode.
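The 64-bit scheme and the rule above (no new valid PTE after clearing a hashed one without a flush in between) can be modelled with a toy batch. This is a sketch under stated assumptions, not the kernel code: all demo_* names are hypothetical, a plain struct stands in for the per-cpu batch, and having the set path force a flush when the address is still pending is just one way to honour the rule in a model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BATCH_MAX 8

/* Toy stand-in for a per-cpu flush batch; all names are hypothetical. */
struct demo_batch {
    bool lazy;                       /* inside a lazy MMU section? */
    size_t n;                        /* pending hash invalidations */
    uintptr_t addr[DEMO_BATCH_MAX];
    size_t flushed;                  /* total invalidations performed */
};

struct demo_pte {
    bool valid;
    bool hashed;
};

/* Invalidate every pending hash entry (modelled as a counter). */
static void demo_flush(struct demo_batch *b)
{
    b->flushed += b->n;
    b->n = 0;
}

static bool demo_pending(const struct demo_batch *b, uintptr_t addr)
{
    for (size_t i = 0; i < b->n; i++)
        if (b->addr[i] == addr)
            return true;
    return false;
}

static void demo_enter_lazy(struct demo_batch *b)
{
    b->lazy = true;
}

/* Leaving lazy mode is where the batched flush happens. */
static void demo_leave_lazy(struct demo_batch *b)
{
    demo_flush(b);
    b->lazy = false;
}

/* Clear the PTE now; queue the hash flush if an entry exists. */
static void demo_pte_clear(struct demo_batch *b, struct demo_pte *pte,
                           uintptr_t addr)
{
    if (pte->hashed) {
        if (b->n == DEMO_BATCH_MAX)
            demo_flush(b);           /* batch full: drain it first */
        b->addr[b->n++] = addr;
        if (!b->lazy)
            demo_flush(b);           /* not batching: flush immediately */
    }
    pte->valid = false;
    pte->hashed = false;
}

/* Enforce the rule: flush before a new valid PTE replaces a cleared one. */
static void demo_set_pte(struct demo_batch *b, struct demo_pte *pte,
                         uintptr_t addr)
{
    if (demo_pending(b, addr))
        demo_flush(b);
    pte->valid = true;
    pte->hashed = false;             /* gets hashed again on a later fault */
}
```

In this model a clear followed by a set of the same address inside one lazy section triggers exactly one early flush; everything else waits for demo_leave_lazy().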
> ptep_get_and_clear() is not batchable anyway, because the x86
> implementation requires an atomic xchg on the pte, which will likely
> result in some sort of trap (and if it doesn't then it doesn't need
> batching).
Well, ptep_get_and_clear() used to be used by zap_pte_range() which I
_HOPE_ was batchable on x86 :-)
Nowadays, there's this new ptep_get_and_clear_full() (yet another
totally meaningless name for an ad-hoc API added for some random special
purpose) that zap_pte_range() uses. Maybe that one is now subtly
different, such that it can be used to batch on x86?
In any case, powerpc batches -everything- (unless it's called *_flush in
which case the flush is immediate) in a private per-cpu batch and
flushes the hash when leaving lazy mode.
> The start/commit API was specifically so that we can do the
> mprotect (and fork COW updates) in a batchable way (in Xen it's
> implemented with a pte update hypercall which updates the pte without
> affecting the A/D bits).
I think we have different ideas of what batch means but yeah, we do
batch everything including these on powerpc without the new start/commit
interface.
Ben.
> J