how set_pte_at()'s vaddr and ptep args relate

virtualization.lists.linux-foundation.org archive mirror

* how set_pte_at()'s vaddr and ptep args relate
@ 2006-11-07 19:57 Jeremy Fitzhardinge
  2006-11-07 22:19 ` Zachary Amsden
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2006-11-07 19:57 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Chris Wright, Virtualization Mailing List

Hi Zach,

I'm wondering what the interface requirements of set_pte_at()'s "addr" 
and "ptep" args are.  I presume that in general the ptep points to the 
pte entry which corresponds to the vaddr, but is this necessarily the case?

For example, it is valid to pass a non-highmem page to kmap_atomic(), which
will simply return a direct pointer to the page.

kunmap_atomic() takes this address, as well as the kmap slot index, and 
ends up calling:

    set_pte_at(&init_mm, lowmem_vaddr, kmap_ptep, 0);

ie, the vaddr and the ptep bear no relationship to each other.  Is this
a bug in kunmap_atomic (it shouldn't try to clear the pte for lowmem
addresses), or should set_pte_at's implementation be able to cope with this?

Certainly at the moment, having mismatched ptep and vaddr makes the
interface useless for Xen, since it will use one or the other depending
on whether we are modifying the current pagetable or not, and it assumes
they correspond to the same thing.

For now I've changed kunmap_atomic() to only clear the kmap pte for 
mapped high page addresses, but I'm wondering what other places might 
use set_pte_at in this way.
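
A minimal user-space sketch of that workaround follows. It is a model, not
the kernel code: the window base, the slot count, and the names
in_kmap_window() and model_kunmap_atomic() are all made up for illustration.
The only point it shows is the guard, which leaves the kmap pte alone when
the address is a lowmem pointer that kmap_atomic() handed back directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a tiny kmap window above an assumed base address. */
#define KMAP_SLOTS        4u
#define MODEL_PAGE_SHIFT  12u
#define KMAP_BASE         0xffc00000u

/* One model pte per kmap slot (illustrative, not the kernel's kmap_pte). */
uint32_t kmap_pte[KMAP_SLOTS];

/* True if vaddr lies inside the kmap window, i.e. it is a mapped high page. */
int in_kmap_window(uint32_t vaddr)
{
    return vaddr >= KMAP_BASE &&
           vaddr < KMAP_BASE + (KMAP_SLOTS << MODEL_PAGE_SHIFT);
}

/* Returns 1 if the slot's pte was cleared, 0 if the call was a no-op. */
int model_kunmap_atomic(uint32_t vaddr, unsigned slot)
{
    assert(slot < KMAP_SLOTS);
    if (!in_kmap_window(vaddr))
        return 0;           /* lowmem pointer: there is no kmap pte to undo */
    kmap_pte[slot] = 0;     /* vaddr and the slot's pte now refer to the same mapping */
    return 1;
}
```

With this guard, the only calls that reach the pte write are ones where the
address and the pte describe the same mapping.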

Also, it would be useful for Xen to have a set_pte_at_sync, which also 
does a TLB flush if necessary, since we can do that in a single operation.

    J


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-07 19:57 how set_pte_at()'s vaddr and ptep args relate Jeremy Fitzhardinge
@ 2006-11-07 22:19 ` Zachary Amsden
  2006-11-07 22:38   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-11-07 22:19 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Chris Wright, Virtualization Mailing List

Jeremy Fitzhardinge wrote:
> Hi Zach,
>
> I'm wondering what the interface requirements of set_pte_at()'s "addr" 
> and "ptep" args are.  I presume that in general the ptep points to the 
> pte entry which corresponds to the vaddr, but is this necessarily the 
> case?

Yes, it must pass a pointer to the PTE in ptep.  The "addr" field must 
match the linear address of the mapping changed by the pte - the address 
you would invlpg if required.

> For example, it is valid to pass a non-highmem page to kmap_atomic(),
> which will simply return a direct pointer to the page.
>
> kunmap_atomic() takes this address, as well as the kmap slot index, 
> and ends up calling:
>
>    set_pte_at(&init_mm, lowmem_vaddr, kmap_ptep, 0);
>
> ie, the vaddr and the ptep bear no relationship to each other.  Is
> this a bug in kunmap_atomic (it shouldn't try to clear the pte for
> lowmem addresses), or should set_pte_at's implementation be able to
> cope with this?

Ok, that is really strange, but it seems harmless.

>
> Certainly at the moment, having mismatched ptep and vaddr makes the
> interface useless for Xen, since it will use one or the other
> depending on whether we are modifying the current pagetable or not, and
> it assumes they correspond to the same thing.
>
> For now I've changed kunmap_atomic() to only clear the kmap pte for 
> mapped high page addresses, but I'm wondering what other places might 
> use set_pte_at in this way.

None that I'm aware of.  The interface here is supposed to be passing 
the "addr" field as the linear address whose mapping in the current 
address space is changing, and the "ptep" field as a pointer to the PTE.

> Also, it would be useful for Xen to have a set_pte_at_sync, which also 
> does a TLB flush if necessary, since we can do that in a single 
> operation.

We could either add new operators or use a flags field which passes a 
"defer this update and piggyback on the next TLB flush" hint - which is 
how the Xen VMI interface worked.

In any case, the address field and hints / operators here are supposed 
to be liberal enough to accommodate using LA hints to update currently 
mapped PTEs and piggyback the flush, and if they are not, it is a bug or 
an oversight.

I haven't really dealt with the HIGHMEM_PTE case thoroughly yet - do we 
want to bother with that?

Zach


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-07 22:19 ` Zachary Amsden
@ 2006-11-07 22:38   ` Jeremy Fitzhardinge
  2006-11-07 23:33     ` Zachary Amsden
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2006-11-07 22:38 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Chris Wright, Virtualization Mailing List

Zachary Amsden wrote:
>> ie, the vaddr and the ptep bear no relationship to each other.  Is
>> this a bug in kunmap_atomic (it shouldn't try to clear the pte for
>> lowmem addresses), or should set_pte_at's implementation be able to
>> cope with this?
>
> Ok, that is really strange, but it seems harmless.

Well, it kills Xen.  It ends up zeroing the pte for vaddr, ignoring the 
ptep; in my case, it meant that get_zeroed_page ends up returning an 
unmapped address.  I just pushed a patch to fix this (into the repo, not 
upstream).
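
To make the failure concrete, here is a small user-space model, offered only
as a sketch. The flat model_pt[] array and both setter names are
hypothetical. One setter trusts ptep, the way a plain memory write would; the
other keys on the address and ignores ptep, which is the behaviour described
above for the Xen case.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGES       8u
#define MODEL_PAGE_SHIFT  12u

/* Toy single-level page table: one entry per page of a tiny address space. */
uint32_t model_pt[MODEL_PAGES];

/* The pte that really maps vaddr in this model. */
uint32_t *model_pte_for(uint32_t vaddr)
{
    uint32_t idx = vaddr >> MODEL_PAGE_SHIFT;
    assert(idx < MODEL_PAGES);
    return &model_pt[idx];
}

/* Pointer-keyed update: writes through ptep and never looks at vaddr. */
void ptep_keyed_set_pte_at(uint32_t vaddr, uint32_t *ptep, uint32_t val)
{
    (void)vaddr;
    *ptep = val;
}

/* Address-keyed update: finds the pte from vaddr and never looks at ptep. */
void vaddr_keyed_set_pte_at(uint32_t vaddr, uint32_t *ptep, uint32_t val)
{
    (void)ptep;
    *model_pte_for(vaddr) = val;
}
```

When ptep == model_pte_for(vaddr) the two setters agree. With a lowmem vaddr
paired with a kmap ptep, the first clears the kmap entry and the second
clears the lowmem mapping instead.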

> None that I'm aware of.  The interface here is supposed to be passing 
> the "addr" field as the linear address whose mapping in the current 
> address space is changing, and the "ptep" field as a pointer to the PTE.

You mean for the mm that's passed in?

>> Also, it would be useful for Xen to have a set_pte_at_sync, which 
>> also does a TLB flush if necessary, since we can do that in a single 
>> operation.
>
> We could either add new operators or use a flags field which passes a 
> "defer this update and piggyback on the next TLB flush" hint - which 
> is how the Xen VMI interface worked.

Do you mean by queuing updates to then submit them all in a single 
batched hypercall?  Or something else?  That sort of batching certainly 
works for Xen.

I guess _sync and "may batch" are opposite senses of the same thing; if 
you don't sync the tlb, then I presume any pagetable update is 
effectively deferred until the tlb sync.  Though isn't there some rule 
about not needing to do an explicit tlb flush if you're increasing the 
access permissions on a page (since the tlb miss/fault will rewalk the 
pagetable before actually deciding to raise an exception)?

> I haven't really dealt with the HIGHMEM_PTE case thoroughly yet - do 
> we want to bother with that?

I'm planning a patch to get rid of them altogether.

    J


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-07 22:38   ` Jeremy Fitzhardinge
@ 2006-11-07 23:33     ` Zachary Amsden
  2006-11-07 23:42       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-11-07 23:33 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Chris Wright, Virtualization Mailing List

Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>>> ie, the vaddr and the ptep bear no relationship to each other.  Is
>>> this a bug in kunmap_atomic (it shouldn't try to clear the pte for
>>> lowmem addresses), or should set_pte_at's implementation be able to
>>> cope with this?
>>
>> Ok, that is really strange, but it seems harmless.
>
> Well, it kills Xen.  It ends up zeroing the pte for vaddr, ignoring 
> the ptep; in my case, it meant that get_zeroed_page ends up returning 
> an unmapped address.  I just pushed a patch to fix this (into the 
> repo, not upstream).

Yes, seems the right thing to do.

>> None that I'm aware of.  The interface here is supposed to be passing 
>> the "addr" field as the linear address whose mapping in the current 
>> address space is changing, and the "ptep" field as a pointer to the PTE.
>
> You mean for the mm that's passed in?

Yes - which had better be either the init_mm or the current mm if you want
to make use of the addr field constructively.

>>> Also, it would be useful for Xen to have a set_pte_at_sync, which 
>>> also does a TLB flush if necessary, since we can do that in a single 
>>> operation.
>>
>> We could either add new operators or use a flags field which passes a 
>> "defer this update and piggyback on the next TLB flush" hint - which 
>> is how the Xen VMI interface worked.
>
> Do you mean by queuing updates to then submit them all in a single 
> batched hypercall?  Or something else?  That sort of batching 
> certainly works for Xen.

Yes, I believe Xen already has a hypercall for set_pte+flush.

> I guess _sync and "may batch" are opposite senses of the same thing; 
> if you don't sync the tlb, then I presume any pagetable update is 
> effectively deferred until the tlb sync.  Though isn't there some rule 
> about not needing to do an explicit tlb flush if you're increasing the 
> access permissions on a page (since the tlb miss/fault will rewalk the 
> pagetable before actually deciding to raise an exception)?

Not quite - the effects of delayed PTE writes can create read hazards or 
consistency hazards.  If you want to use batching as such, you must 
explicitly flush under the protection of the PTE lock.  This rule 
applies to both shadow and non-shadow MMU consistency with either 
explicit queues (as shadow mode VMI uses), or implicit queues (direct 
writable page tables).

So there are two senses of "batching" - there is combining multiple PTE 
updates into one, and there is piggybacking the flush onto the PTE write.

In some cases, the TLB flush is too far away (not under the page table 
lock) to be piggybacked, so the PTE update must happen immediately.
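
The explicit-queue flavour of batching can be sketched in user space roughly
as below. This is an illustration only: the queue size and the names
queue_pte_update() and flush_pte_updates() are invented, the PTE lock is not
modelled, and a real implementation would submit the batch to the hypervisor
rather than write memory directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_MAX 16u

/* One deferred update: where to write and what to write. */
struct pte_update {
    uint32_t *ptep;
    uint32_t  val;
};

struct pte_update pending[QUEUE_MAX];
size_t   pending_count;
unsigned batched_calls;   /* counts simulated batched submissions */

/* Apply every queued update in order, then empty the queue. */
void flush_pte_updates(void)
{
    if (pending_count == 0)
        return;
    for (size_t i = 0; i < pending_count; i++)
        *pending[i].ptep = pending[i].val;
    pending_count = 0;
    batched_calls++;
}

/* Defer an update; the target pte keeps its old value until the flush. */
void queue_pte_update(uint32_t *ptep, uint32_t val)
{
    if (pending_count == QUEUE_MAX)
        flush_pte_updates();          /* queue full: submit what we have */
    pending[pending_count].ptep = ptep;
    pending[pending_count].val  = val;
    pending_count++;
}
```

Between queue_pte_update() and flush_pte_updates() a reader of the pte still
sees the old value, which is the read hazard mentioned above. In this model
the caller would have to call flush_pte_updates() before releasing whatever
lock protects the ptes.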

Anything where you implicitly defer pagetable updates is far too 
vulnerable to bugs.  We played with several such schemes before, and 
although they could be made to work for a shadow mode hypervisor, 
getting them to work for both shadow and direct mode, with performance 
opportunities for everyone was just too risky and a burden on the Linux 
mm code.

There is no architectural rule about tlb flush that I am aware of, 
however, most cores will allow you to do NP->P transitions without a 
flush.  YMMV.  I believe the Linux use is fine.

>> I haven't really dealt with the HIGHMEM_PTE case thoroughly yet - do 
>> we want to bother with that?
>
> I'm planning a patch to get rid of them altogether.

Good.  It does not seem worth the effort.  I do have the code to make it 
work, but it is really ugly.  If some user comes screaming for it later, 
we can always add it back.

Zach


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-07 23:33     ` Zachary Amsden
@ 2006-11-07 23:42       ` Jeremy Fitzhardinge
  2006-11-07 23:59         ` Zachary Amsden
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2006-11-07 23:42 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Chris Wright, Virtualization Mailing List

Zachary Amsden wrote:
> Anything where you implicitly defer pagetable updates is far too 
> vulnerable to bugs.  We played with several such schemes before, and 
> although they could be made to work for a shadow mode hypervisor, 
> getting them to work for both shadow and direct mode, with performance 
> opportunities for everyone was just too risky and a burden on the 
> Linux mm code.

Yep.

> There is no architectural rule about tlb flush that I am aware of, 
> however, most cores will allow you to do NP->P transitions without a 
> flush.  YMMV.  I believe the Linux use is fine.

Hm, I was under the impression there's an actual architectural guarantee 
there, but I don't know chapter&verse.

> Good.  It does not seem worth the effort.  I do have the code to make 
> it work, but it is really ugly.  If some user comes screaming for it 
> later, we can always add it back.

I'm working on linear pagetables, so that ptes can be allocated from
anywhere and be directly accessible.  This eliminates the need for
CONFIG_HIGHPTE, and it also simplifies a lot of the pagetable walking.
Manipulating other processes' pagetables would still need kmap (or a
second window for cross-process pagetable manipulation), but I should
think that's pretty rare.

    J


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-07 23:42       ` Jeremy Fitzhardinge
@ 2006-11-07 23:59         ` Zachary Amsden
  2006-11-08  0:15           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-11-07 23:59 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Chris Wright, Virtualization Mailing List

Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> Anything where you implicitly defer pagetable updates is far too 
>> vulnerable to bugs.  We played with several such schemes before, and 
>> although they could be made to work for a shadow mode hypervisor, 
>> getting them to work for both shadow and direct mode, with 
>> performance opportunities for everyone was just too risky and a 
>> burden on the Linux mm code.
>
> Yep.
>
>> There is no architectural rule about tlb flush that I am aware of, 
>> however, most cores will allow you to do NP->P transitions without a 
>> flush.  YMMV.  I believe the Linux use is fine.
>
> Hm, I was under the impression there's an actual architectural 
> guarantee there, but I don't know chapter&verse.

There isn't one explicitly stated in the book I'm looking at.  Ps 19:12 
NIV seems relevant, although a little cryptic.

"Who can discern his errors?  Forgive my hidden faults."

> I'm working on linear pagetables, so that ptes can be allocated from
> anywhere and be directly accessible.  This eliminates the need for
> CONFIG_HIGHPTE, and it also simplifies a lot of the pagetable
> walking.  Manipulating other processes' pagetables would still need
> kmap (or a second window for cross-process pagetable manipulation),
> but I should think that's pretty rare.

Oh, wow.  Unfortunately, the complexity isn't from how frequent or rare 
a kmapped PT access is, it is from it being there at all.

Zach


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-07 23:59         ` Zachary Amsden
@ 2006-11-08  0:15           ` Jeremy Fitzhardinge
  2006-11-08  0:19             ` Zachary Amsden
  2006-11-08  8:34             ` Keir Fraser
  0 siblings, 2 replies; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2006-11-08  0:15 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Chris Wright, Virtualization Mailing List

Zachary Amsden wrote:
> "Who can discern his errors?  Forgive my hidden faults."

In general that Book doesn't give much in the way of architectural
certainty.  Section 3.12 of the IA32 SDM isn't much better:

    Whenever a page-directory or page-table entry is changed (including
    when the present flag is set to zero), the operating-system must
    immediately invalidate the corresponding entry in the TLB so that it
    can be updated the next time the entry is referenced.

The parenthetic clause is annoyingly ambiguous: does it mean "when the 
present flag is currently set to 0" or "when setting the present flag to 
0"?  The passive voice should always be avoided.


> Oh, wow.  Unfortunately, the complexity isn't from how frequent or 
> rare a kmapped PT access is, it is from it being there at all.

We'll see how it works out.  What actually needs to do cross-process 
pagetable manipulation?  Fork?  ptrace?

    J


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-08  0:15           ` Jeremy Fitzhardinge
@ 2006-11-08  0:19             ` Zachary Amsden
  2006-11-08  8:34             ` Keir Fraser
  1 sibling, 0 replies; 15+ messages in thread
From: Zachary Amsden @ 2006-11-08  0:19 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Chris Wright, Virtualization Mailing List

Jeremy Fitzhardinge wrote:
>> Oh, wow.  Unfortunately, the complexity isn't from how frequent or 
>> rare a kmapped PT access is, it is from it being there at all.
>
> We'll see how it works out.  What actually needs to do cross-process 
> pagetable manipulation?  Fork?  ptrace?

Fork yes, ptrace, maybe, and also the swapper for A-bit clocking.

Zach


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-08  0:15           ` Jeremy Fitzhardinge
  2006-11-08  0:19             ` Zachary Amsden
@ 2006-11-08  8:34             ` Keir Fraser
  2006-11-08 19:59               ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-11-08  8:34 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Zachary Amsden
  Cc: Chris Wright, Virtualization Mailing List

On 8/11/06 12:15 am, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:

> The parenthetic clause is annoyingly ambiguous: does it mean "when the
> present flag is currently set to 0" or "when setting the present flag to
> 0"?  The passive voice should always be avoided.

If the PTE was previously NP, and you already flushed the TLB since it
became NP, then you can be quite sure that there is no stale TLB entry. x86
TLBs do not cache failed lookups. If they did, many OSes would not work!

Changing from read-only to writable is more exciting. You get a spurious
page fault when the CPU finds the read-only TLB entry on your first write
access. The CPU does throw away the TLB entry at that point though, so the
fault handler can do nothing and yet still make progress. That's guaranteed.
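
Both behaviours can be mimicked with a toy one-entry TLB. The model below is
a sketch under stated assumptions, not a description of any real CPU: the
struct and function names are invented, failed lookups are never cached, and
a write that hits a stale read-only entry drops the entry and faults.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a single pte and a single TLB entry caching it. */
struct model_pte { bool present; bool writable; };
struct model_tlb { bool valid;   bool writable; };

struct model_pte the_pte;
struct model_tlb the_tlb;
unsigned faults;            /* number of (model) page faults raised */

/* Attempt one write. Returns true on success, false if it faulted. */
bool model_write_access(void)
{
    if (the_tlb.valid) {
        if (the_tlb.writable)
            return true;
        /* Stale read-only entry: discard it and raise a fault. */
        the_tlb.valid = false;
        faults++;
        return false;
    }
    /* TLB miss: walk the (one-entry) page table. */
    if (!the_pte.present || !the_pte.writable) {
        faults++;           /* assumption: a failed lookup is not cached */
        return false;
    }
    the_tlb.valid = true;
    the_tlb.writable = true;
    return true;
}
```

In this model a not-present pte that is made present needs no flush, because
nothing was cached for it. A read-only entry that is upgraded without a flush
costs exactly one spurious fault, after which the retried access refills the
TLB from the updated pte.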

 -- Keir


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-08  8:34             ` Keir Fraser
@ 2006-11-08 19:59               ` Jeremy Fitzhardinge
  2006-11-08 20:18                 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2006-11-08 19:59 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Chris Wright, Virtualization Mailing List

Keir Fraser wrote:
> Changing from read-only to writable is more exciting. You get a spurious
> page fault when the CPU finds the read-only TLB entry on your first write
> access. The CPU does throw away the TLB entry at that point though, so the
> fault handler can do nothing and yet still make progress. That's guaranteed.

Do you mean the tlb entry gets invalidated as part of raising the 
fault?  Do you know where this is documented?

Thanks,
    J


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-08 19:59               ` Jeremy Fitzhardinge
@ 2006-11-08 20:18                 ` Jeremy Fitzhardinge
  2006-11-08 23:17                   ` Keir Fraser
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2006-11-08 20:18 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Chris Wright, Virtualization Mailing List, Keir Fraser

Jeremy Fitzhardinge wrote:
> Do you mean the tlb entry gets invalidated as part of raising the 
> fault?

Specifically, does it simply invalidate the TLB at that point, or does 
it re-walk the page table and populate it with the new PTE?

    J


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-08 20:18                 ` Jeremy Fitzhardinge
@ 2006-11-08 23:17                   ` Keir Fraser
  2006-11-08 23:25                     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-11-08 23:17 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Chris Wright, Virtualization Mailing List

On 8/11/06 8:18 pm, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:

>> Do you mean the tlb entry gets invalidated as part of raising the
>> fault?
> 
> Specifically, does it simply invalidate the TLB at that point, or does
> it re-walk the page table and populate it with the new PTE?

It invalidates the TLB entry, to avoid a fault loop, but it will then #PF
rather than walk the page tables in hardware. I believe this is to speed up
the fairly common CoW fault path. I don't know whether it is actually
documented anywhere: the x86 'architecture' seems to be defined by the
behaviour of the various hardware implementations. I discovered this
particular factoid from talking to one of the Intel hardware guys. Another
factoid I discovered at the same meeting is that the CPU may cache partial
page walks. So, for example, just because you 'detach' a page table from a
page-directory entry, doesn't mean that page table won't be accessed on
future hardware TLB fills. I confirmed both these factoids by constructing
simple test cases: IIRC both AMD and Intel CPUs exhibit both behaviours.
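
The partial-walk caching can be illustrated with a toy two-level walk that
remembers its last directory lookup. This is a sketch only: the table sizes,
the one-entry walk_cache, and the function names are made up, and real
hardware caches are more elaborate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES 4u

/* A page table: each entry is a frame number, 0 meaning not present. */
uint32_t table_a[ENTRIES];
/* The page directory: each entry points at a page table, NULL = not present. */
uint32_t *page_dir[ENTRIES];

/* Hypothetical one-entry cache of the directory-level lookup. */
struct {
    bool      valid;
    unsigned  dir_idx;
    uint32_t *table;
} walk_cache;

/* Translate (dir_idx, tbl_idx) to a frame number; 0 means fault. */
uint32_t model_translate(unsigned dir_idx, unsigned tbl_idx)
{
    uint32_t *table;

    assert(dir_idx < ENTRIES && tbl_idx < ENTRIES);
    if (walk_cache.valid && walk_cache.dir_idx == dir_idx) {
        table = walk_cache.table;        /* page_dir[] is not consulted */
    } else {
        table = page_dir[dir_idx];
        if (table == NULL)
            return 0;
        walk_cache.valid = true;
        walk_cache.dir_idx = dir_idx;
        walk_cache.table = table;
    }
    return table[tbl_idx];
}

/* Stand-in for a flush that also drops the cached partial walk. */
void model_flush(void)
{
    walk_cache.valid = false;
}
```

After page_dir[i] is cleared without a flush, a lookup of a previously
untouched page under that directory entry still reads the detached table.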

 -- Keir


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-08 23:17                   ` Keir Fraser
@ 2006-11-08 23:25                     ` Jeremy Fitzhardinge
  2006-11-09  8:29                       ` Keir Fraser
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2006-11-08 23:25 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Chris Wright, Virtualization Mailing List

Keir Fraser wrote:
> Another
> factoid I discovered at the same meeting is that the CPU may cache partial
> page walks. So, for example, just because you 'detach' a page table from a
> page-directory entry, doesn't mean that page table won't be accessed on
> future hardware TLB fills.

Do you know if these intermediate TLB entries are level-sensitive?  Ie, 
if you have a linear pagetable mapping where the pagetable points back 
to itself, will that result in multiple TLB entries for the pmd pages 
(pmd as pmd, and pmd as pte)?

    J


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-08 23:25                     ` Jeremy Fitzhardinge
@ 2006-11-09  8:29                       ` Keir Fraser
  2006-11-09  9:15                         ` Zachary Amsden
  0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-11-09  8:29 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Chris Wright, Virtualization Mailing List

On 8/11/06 11:25 pm, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:

> Keir Fraser wrote:
>> Another
>> factoid I discovered at the same meeting is that the CPU may cache partial
>> page walks. So, for example, just because you 'detach' a page table from a
>> page-directory entry, doesn't mean that page table won't be accessed on
>> future hardware TLB fills.
> 
> Do you know if these intermediate TLB entries are level-sensitive?  Ie,
> if you have a linear pagetable mapping where the pagetable points back
> to itself, will that result in multiple TLB entries for the pmd pages
> (pmd as pmd, and pmd as pte)?

I think so. I can't think why the CPU would bother to disallow this. It does
mean you have to be careful when changing page-directory entries that your
linear mapping (if you have one) doesn't go stale.

 -- Keir


* Re: how set_pte_at()'s vaddr and ptep args relate
  2006-11-09  8:29                       ` Keir Fraser
@ 2006-11-09  9:15                         ` Zachary Amsden
  0 siblings, 0 replies; 15+ messages in thread
From: Zachary Amsden @ 2006-11-09  9:15 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Chris Wright, Virtualization Mailing List

Keir Fraser wrote:
> On 8/11/06 11:25 pm, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:
>
>> Keir Fraser wrote:
>>
>>> Another
>>> factoid I discovered at the same meeting is that the CPU may cache partial
>>> page walks. So, for example, just because you 'detach' a page table from a
>>> page-directory entry, doesn't mean that page table won't be accessed on
>>> future hardware TLB fills.

Yes.  The rule is, if you change a mapping at any level, your results of 
using any descendants of that mapping for memory access are undefined 
unless you have flushed the mappings you are trying to use.

>> Do you know if these intermediate TLB entries are level-sensitive?  Ie,
>> if you have a linear pagetable mapping where the pagetable points back
>> to itself, will that result in multiple TLB entries for the pmd pages
>> (pmd as pmd, and pmd as pte)?
>
> I think so. I can't think why the CPU would bother to disallow this. It does
> mean you have to be careful when changing page-directory entries that your
> linear mapping (if you have one) doesn't go stale.

No, as long as the pmd is only mapped at one linear address, there is
only one TLB entry for it.  You can't mix large pages and circular
pagetables, so you can't have a pmd-as-pmd level TLB mapping in addition
to a pmd-as-pte level mapping.  Thinking of it as a pmd-as-pmd /
pmd-as-pte level mapping is confusing.  It is better to think of it in
terms of physical page walks during TLB fills and virtual walks during
circular mapping access.

You do need to issue appropriate page invalidations after changing 
page-directory level mappings for the page tables.  If you update PDEs 
and you use a circular mapping of page tables, then you may need to 
issue invlpg's for the page tables that may have been disconnected or 
changed.  This is for your consistency, not the TLB consistency.  The 
TLB will fetch new mappings from physical space, and knows nothing about 
your circular mapping.  So a PDE update to an entire 4M/2M range could 
require multiple flushes - one or more to invalidate mappings of 
underlying PTEs that have changed, and one to invalidate the mapping of 
the circularly mapped page table itself.

Of course, depending on the context and your consistency requirements, 
some or all of these flushes may be redundant or subsumed by subsequent 
flushes.
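
For the 4M (non-PAE, two-level) case, the addresses involved can be worked
out as in the sketch below. It assumes the circular mapping sits in the last
page-directory slot; that slot choice and the function names are
illustrative, not taken from any particular kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: page-directory slot 0x3ff points back at the page directory. */
#define LINEAR_SLOT  0x3ffu
#define LINEAR_BASE  (LINEAR_SLOT << 22)      /* 0xffc00000 */

/* Virtual address at which the pte mapping vaddr is visible (4-byte ptes). */
uint32_t linear_pte_vaddr(uint32_t vaddr)
{
    return LINEAR_BASE + ((vaddr >> 12) << 2);
}

/* Virtual address of the whole page table covering vaddr's 4M region. */
uint32_t linear_pt_page_vaddr(uint32_t vaddr)
{
    return LINEAR_BASE + ((vaddr >> 22) << 12);
}
```

After changing the PDE that covers vaddr, the page at
linear_pt_page_vaddr(vaddr) is the one circularly mapped address whose stale
translation would show the old page table, in addition to any data pages
under that PDE whose mappings changed. For example, 0x08048000 has its pte
visible at 0xffc20120, inside the page-table page at 0xffc20000.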

Zach

