From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Nick Piggin <npiggin@suse.de>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Linux Kernel list <linux-kernel@vger.kernel.org>,
	Hugh Dickins <hugh@veritas.com>
Subject: Re: PTE access rules & abstraction
Date: Tue, 23 Sep 2008 16:49:32 +1000
Message-ID: <1222152572.12085.129.camel@pasglop>
In-Reply-To: <48D88904.4030909@goop.org>


> A good first step might be to define some conventions.  For example,
> define that set_pte*() *always* means setting a non-valid pte to either
> a new non-valid state (like a swap reference) or to a valid state. 
> modify_pte() would modify the flags of a valid
> pte, giving a new valid pte.  etc...

Yup. Or make it clear that ptep_set_access_flags() should only be used
to -relax- access (i.e. set dirty, writeable, accessed, ... but never
clear any of them).
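
To make the "relax only" rule concrete, here is a rough userspace sketch
of the check I have in mind. Everything in it (the toy_* names, the flag
layout with the flags in the low four bits) is made up for illustration;
real PTE layouts are per-arch and this is not how any architecture
implements ptep_set_access_flags().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy PTE: low four bits are flags, the rest is the pfn. */
typedef uint64_t toy_pte_t;
#define TOY_PRESENT   (1ull << 0)
#define TOY_WRITE     (1ull << 1)
#define TOY_ACCESSED  (1ull << 2)
#define TOY_DIRTY     (1ull << 3)
#define TOY_FLAG_MASK 0xfull

/* True if 'entry' maps the same pfn as 'old' and clears none of its flags. */
static bool toy_pte_is_relaxation(toy_pte_t old, toy_pte_t entry)
{
    bool same_pfn = (old & ~TOY_FLAG_MASK) == (entry & ~TOY_FLAG_MASK);
    bool no_bits_lost = (old & TOY_FLAG_MASK & ~entry) == 0;

    return (old & TOY_PRESENT) && same_pfn && no_bits_lost;
}

/*
 * Model of a relax-only update. Returns true if the update was accepted
 * and applied, false if it would have removed access or changed the pfn.
 */
static bool toy_set_access_flags(toy_pte_t *ptep, toy_pte_t entry)
{
    assert(ptep != NULL);
    if (!toy_pte_is_relaxation(*ptep, entry))
        return false;   /* a debug build could warn loudly here */
    *ptep = entry;
    return true;
}
```

With this model, adding ACCESSED and DIRTY to a present PTE is accepted,
while an attempt to drop DIRTY is refused and leaves the PTE untouched.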

> It may be that a given architecture collapses some or all of these down
> to the same underlying functionality, but it would allow the core intent
> to be clearly expressed.
> 
> What is the complete set of primitives we need?  I also noticed that a
> number of the existing pagetable operations are used only once or twice
> in the core code; I wonder if we really need such special cases, or
> whether we can make each arch pte operation carry a bit more weight?

Yes, that was some of my concern. It's getting close to having one API
per call site :-)

> Also, rather than leaving all the rule enforcing to documentation and a
> maintainer, we should also consider having a debug mode which adds
> enough paranoid checks to each operation so that any rule breakage will
> fail obviously on all architectures.

We could do both.
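
As a strawman for the debug mode, something along these lines would
catch misuse of the set/modify split you describe. It is a userspace
model only: dbg_set_pte(), dbg_modify_pte() and the violation counter
are hypothetical names, the single VALID bit is an assumption, and a
real version would sit behind a config option and warn rather than
count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy PTE with a single "valid" bit. */
typedef uint64_t dbg_pte_t;
#define DBG_VALID (1ull << 0)

/* Number of convention violations seen so far. */
static unsigned long dbg_pte_violations;

/* Record a violation when the rule does not hold. */
static void dbg_pte_check(bool rule_holds)
{
    if (!rule_holds)
        dbg_pte_violations++;
}

/* Convention: set_pte only ever overwrites a non-valid PTE. */
static void dbg_set_pte(dbg_pte_t *ptep, dbg_pte_t val)
{
    assert(ptep != NULL);
    dbg_pte_check(!(*ptep & DBG_VALID));
    *ptep = val;
}

/* Convention: modify_pte turns a valid PTE into another valid PTE. */
static void dbg_modify_pte(dbg_pte_t *ptep, dbg_pte_t val)
{
    assert(ptep != NULL);
    dbg_pte_check((*ptep & DBG_VALID) && (val & DBG_VALID));
    *ptep = val;
}
```

Since the checks only look at the old and new values, the same wrappers
could in principle be shared by all architectures, which is the point of
making any rule breakage fail the same way everywhere.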

Now, regarding operations, let's first find the major call sites and see
what I've missed. I'm omitting free_* in memory.c, as those are for
freeing pte pages, not accessing PTEs themselves. I'm also ignoring
read-only call sites and hugetlb for now.

* Non-iterative accessors

 - handle_pte_fault in memory.c, on "fixup" faults (pte is present and
it's not a COW), for fixing up DIRTY and ACCESSED. (Btw, could we make
that also fix up EXEC? I would like this for some stuff I'm working on
at the moment, i.e. set it if the vma has VM_EXEC and it was lost from
the PTE, as I might want to mask it out of PTEs under some
circumstances.) Textbook usage of ptep_set_access_flags(), so that's
fine.

 - do_wp_page() in memory.c, for COW or for fixing up the writeable-ness
of a shared writeable mapping. It doesn't overwrite the existing PTE for
COW anymore (it uses clear_flush nowadays), and the fixup of a shared
writeable mapping uses ptep_set_access_flags() as it should, so that's
all good.

 - insert_pfn() and insert_page(), still in memory.c, for fancy page
faults. Just a trivial set_pte_at() of a !present one, no big deal here.

 - RMAP ones? Some ad-hoc stuff due to _notify thingies.

* Iterative accessors (some don't batch, maybe they could/should).

 - zapping a mapping (zap_p*) in memory.c
 - fork (copy_p*) in memory.c, which could maybe batch better?
 - setting linear user mappings (remap_p*) in memory.c, trivial
set_pte_at() on a range, pte's should be !present I think.
 - mprotect (change_p*) in mprotect.c, which has the problem I mentioned
 - moving page tables (move_p*), pretty trivial clear_flush + set_pte_at
 - clear_refs_pte_range via walk_page_range in fs/proc/task_mmu.c, which
does a test_and_clear_young and flushes the mm afterward; it could use
some lazy stuff so we can batch properly on ppc64.
 - vmalloc, that's a bit special and kernel only, doesn't have nasty
races between creating/tearing down mappings vs. using them
 - highmem I leave alone for now, it's mostly trivial set_pte_at &
flushing for normal kmap but kmap_atomic can be nasty, though it's arch
specific.
 - some stuff in fremap I'm not too familiar with and I need to run...
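
To illustrate what I mean by batching in the iterative cases (the
clear_refs one, for example), here is a toy userspace model: clear the
young bit over a whole range and issue a single flush for the span that
was actually touched, instead of one flush per PTE. The rng_* names, the
flag bits and the flush counter are all invented for the example; it
says nothing about how any arch really flushes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy PTE flags. */
typedef uint64_t rng_pte_t;
#define RNG_PRESENT (1ull << 0)
#define RNG_YOUNG   (1ull << 1)

/* Counts flushes so the effect of batching is visible. */
static unsigned long rng_flush_count;

/* Stand-in for a range TLB flush; it only counts invocations. */
static void rng_flush_range(size_t first, size_t last)
{
    (void)first;
    (void)last;
    rng_flush_count++;
}

/* Clear the young bit of a present PTE; report whether it was set. */
static bool rng_test_and_clear_young(rng_pte_t *ptep)
{
    bool young = (*ptep & RNG_PRESENT) && (*ptep & RNG_YOUNG);

    if (young)
        *ptep &= ~RNG_YOUNG;
    return young;
}

/*
 * Clear young on ptes[0..n) and issue at most one flush covering the
 * touched span. Returns the number of PTEs that were young.
 */
static size_t rng_clear_young_batched(rng_pte_t *ptes, size_t n)
{
    size_t cleared = 0, first = 0, last = 0;

    for (size_t i = 0; i < n; i++) {
        if (!rng_test_and_clear_young(&ptes[i]))
            continue;
        if (cleared == 0)
            first = i;
        last = i;
        cleared++;
    }
    if (cleared)
        rng_flush_range(first, last);
    return cleared;
}
```

A second pass over the same range finds nothing young and so issues no
flush at all, which is the kind of laziness I'd like the real walkers to
have.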

What did I miss?

Cheers,
Ben.


