* pte_offset_map + lazy mmu
From: Jeremy Fitzhardinge @ 2007-03-10 6:37 UTC
To: Zachary Amsden; +Cc: Chris Wright, Virtualization Mailing List
Is pte_offset_map allowed to happen within lazy mmu? I presume not,
because you definitely don't want the mapping pte update to be deferred.
Or, more specifically, is kunmap_atomic ever allowed within lazy mmu?
I'm looking at kpte_clear_flush; I've already got a patch which turns
this into a pv_op, along with a Xen implementation. But I think it's
probably an excess pv_op for a relatively minor corner case. It seems
to me that it would be better to define kpte_clear_flush as:
#define kpte_clear_flush(ptep, vaddr)           \
do {                                            \
        arch_enter_lazy_mmu_mode();             \
        pte_clear(&init_mm, vaddr, ptep);       \
        __flush_tlb_one(vaddr);                 \
        arch_leave_lazy_mmu_mode();             \
} while (0)
and take advantage of mmu batching to make this operation efficient.
But I'm not sure if this is safe.
(Also, kmap_atomic could use set_pte_at rather than set_pte.)
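I.e., roughly this (untested sketch; the real function also has the
!PageHighMem early return and pagefault_disable(), omitted here, and
this is just to show the set_pte() -> set_pte_at() change so the
backend sees the mm and vaddr):

void *kmap_atomic(struct page *page, enum km_type type)
{
        enum fixed_addresses idx = type + KM_TYPE_NR * smp_processor_id();
        unsigned long vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);

        /* give the paravirt backend the mm and vaddr, not just the pte */
        set_pte_at(&init_mm, vaddr, kmap_pte - idx, mk_pte(page, kmap_prot));

        return (void *)vaddr;
}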
What do you think?
J
* Re: pte_offset_map + lazy mmu
From: Zachary Amsden @ 2007-03-10 6:54 UTC
To: Jeremy Fitzhardinge; +Cc: Chris Wright, Virtualization Mailing List
Jeremy Fitzhardinge wrote:
> Is pte_offset_map allowed to happen within lazy mmu? I presume not,
> because you definitely don't want the mapping pte update to be deferred.
>
> Or, more specifically, is kunmap_atomic ever allowed within lazy mmu?
> I'm looking at kpte_clear_flush; I've already got a patch which turns
> this into a pv_op, along with a Xen implementation. But I think it's
> probably an excess pv_op for a relatively minor corner case. It seems
> to me that it would be better to define kpte_clear_flush as:
>
> #define kpte_clear_flush(ptep, vaddr)           \
> do {                                            \
>         arch_enter_lazy_mmu_mode();             \
>         pte_clear(&init_mm, vaddr, ptep);       \
>         __flush_tlb_one(vaddr);                 \
>         arch_leave_lazy_mmu_mode();             \
> } while (0)
>
>
> and take advantage of mmu batching to make this operation efficient.
> But I'm not sure if this is safe.
>
> (Also, kmap_atomic could use set_pte_at rather than set_pte.)
>
> What do you think?
>
We (artificially) don't allow nesting of lazy modes; they are intended
mostly to be isolated to make the guest interface simpler. We could fix
that rather easily, but I think a better solution is possible.
This could also be genericized in a different way. PTE updates should
be limited in scope to just a couple operations. We don't want to have
pv-ops for every possible combination of weirdness that can be
efficiently combined, because then pv-ops explodes.
I propose adding a hint or flags field to the PV-op. In fact, the
address argument vaddr can be compressed down to a VPN, which gives 12
more bits for flags to use. Now, you can pass hints:
PV_LAZY_UPDATE
PV_DEAD_TABLE
PV_VADDR_VALID
which the backend can use. In the kpte_clear_flush example, you can now pass
PV_LAZY_UPDATE to combine the pte write with the flush. And in address
space destruction, you can pass PV_DEAD_TABLE, which allows you to
optimize away the pte writes which would otherwise trap to the
hypervisor, but are not needed, because in the Xen case, you switch off
the current process page tables during exit() (or should, anyway, to
avoid spurious traps to Xen before the page tables are freed), and in
our case, it gets rid of these pte clears that don't need to be reflected
in the shadows because the shadow is about to die.
And for updates in the current address space, you can pass
PV_VADDR_VALID to indicate the virtual address field is actually valid
(note that vaddr == 0 is not a sufficient test, since there can be
legitimate updates to VPN 0 mappings). This allows for various flush
optimizations as well.
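Roughly the packing I have in mind (illustrative only; the flag names,
values and bit layout are not meant to be final):

/* vaddr compressed to a VPN leaves 12 bits free for hints */
#define PV_LAZY_UPDATE  0x001   /* update may be combined with a later flush */
#define PV_DEAD_TABLE   0x002   /* pagetable is being torn down */
#define PV_VADDR_VALID  0x004   /* the VPN really is the mapped virtual address */

static inline u32 pv_pack_vaddr(unsigned long vaddr, u32 flags)
{
        /* VPN in the low 20 bits, hint flags in the top 12 */
        return (vaddr >> PAGE_SHIFT) | (flags << 20);
}

static inline u32 pv_vaddr_flags(u32 packed)
{
        return packed >> 20;
}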
This also gets rid of all the update_pte_defer junk in asm-i386
includes. As long as we cooperate on the flags definition and native is
not adversely affected by shifting the vaddr down (P4 shifts are slow; our
metrics with VMI showed no measurable high-level disadvantage here for
native, but the design has changed, and we should re-verify), then this
solution is workable. It just requires us to cooperate to come up with
a common flags definition.
Zach
* Re: pte_offset_map + lazy mmu
From: Zachary Amsden @ 2007-03-10 7:26 UTC
To: Jeremy Fitzhardinge; +Cc: Chris Wright, Virtualization Mailing List
Jeremy Fitzhardinge wrote:
> Is pte_offset_map allowed to happen within lazy mmu? I presume not,
> because you definitely don't want the mapping pte update to be deferred.
>
> Or, more specifically, is kunmap_atomic ever allowed within lazy mmu?
> I'm looking at kpte_clear_flush; I've already got a patch which turns
> this into a pv_op, along with a Xen implementation. But I think its
> probably an excess pv_op for a relatively minor corner case. It seems
> to me that it would be better to define kpte_clear_flush as:
>
> #define kpte_clear_flush(ptep, vaddr)           \
> do {                                            \
>         arch_enter_lazy_mmu_mode();             \
>         pte_clear(&init_mm, vaddr, ptep);       \
>         __flush_tlb_one(vaddr);                 \
>         arch_leave_lazy_mmu_mode();             \
> } while (0)
>
>
> and take advantage of mmu batching to make this operation efficient.
> But I'm not sure if this is safe.
>
> (Also, kmap_atomic could use set_pte_at rather than set_pte.)
>
> What do you think?
>
I'm sorry, I was wrong. This does work for us, as the batching is not
nested (as you point out, that would be a bug). I already took care to
make sure that all the arch_enter_lazy_mmu_mode() hooks in mm code
happen after the pagetables are mapped.
Still, I think the hint-based solution allows for expansion of the
capabilities without requiring new paravirt-ops. What do you think
about my proposal?
Zach
* Re: pte_offset_map + lazy mmu
From: Jeremy Fitzhardinge @ 2007-03-10 16:06 UTC
To: Zachary Amsden; +Cc: Chris Wright, Virtualization Mailing List
Zachary Amsden wrote:
> This could also be genericized in a different way. PTE updates should
> be limited in scope to just a couple operations. We don't want to
> have pv-ops for every possible combination of weirdness that can be
> efficiently combined, because then pv-ops explodes.
Yes, but...
> I propose adding a hint or flags field to the PV-op. In fact, the
> address argument vaddr can be compressed down to a VPN, which gives 12
> more bits for flags to use. Now, you can pass hints:
>
> PV_LAZY_UPDATE
> PV_DEAD_TABLE
> PV_VADDR_VALID
> which the backend can use. In the kpte_clear_flush example, you can now
> pass PV_LAZY_UPDATE to combine the pte write with the flush.
Are you saying that PV_LAZY_UPDATE would open a one-operation lazy-mmu
window? I don't like this much at all; this kind of stateful interface
really makes it complex to use, implement and understand. What happens
if you keep setting this flag on a whole series of operations? Does it
make them all lazy? Or are they paired? How does this interact with
explicitly setting lazy_mmu_mode?
No, I think we should implement laziness with just one mechanism, and
the current one seems just fine to me - though I'd consider adding args
to it to give advance hints about what you're going to be doing in the
lazy region.
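Something along these lines, say (strawman only; neither the enum nor
the _hinted variant exists, the names are purely illustrative):

/* let the caller describe the lazy region up front */
enum lazy_mmu_hint {
        LAZY_MMU_HINT_NONE,             /* no particular pattern expected */
        LAZY_MMU_HINT_KMAP,             /* one kmap pte update plus a flush */
        LAZY_MMU_HINT_TEARDOWN,         /* the pagetable is about to die */
};

static inline void arch_enter_lazy_mmu_mode_hinted(enum lazy_mmu_hint hint)
{
        /* a backend could stash the hint and pick a batching strategy;
         * the default just ignores it */
        (void)hint;
        arch_enter_lazy_mmu_mode();
}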
> And in address space destruction, you can pass PV_DEAD_TABLE, which
> allows you to optimize away the pte writes which would otherwise trap
> to the hypervisor, but are not needed, because in the Xen case, you
> switch off the current process page tables during exit() (or should,
> anyway, to avoid spurious traps to Xen before the page tables are freed),
I don't need that because I detach and unpin the pagetable entirely in
the exit_mmap hook. All the teardown happens on just ordinary unpinned
memory. Couldn't you do that too?
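For reference, the hook is roughly shaped like this (sketch only;
xen_pgd_unpin() stands in for the real unpinning path, and the
cross-CPU step that stops other processors using the pagetable is
elided):

static void xen_exit_mmap(struct mm_struct *mm)
{
        preempt_disable();

        /* make sure no CPU is still running on mm's pagetable ... */

        /* ... then unpin it, so the generic teardown code can write the
         * ptes directly without trapping to the hypervisor */
        xen_pgd_unpin(mm->pgd);

        preempt_enable();
}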
> and in our case, it gets rid of these pte clears that don't need to be
> reflected in the shadows because the shadow is about to die.
>
> And for updates in the current address space, you can pass
> PV_VADDR_VALID to indicate the virtual address field is actually valid
> (note that vaddr == 0 is not a sufficient test, since there can be
> legitimate updates to VPN 0 mappings). This allows for various flush
> optimizations as well.
Hm. I think if you're using a set_pte_at interface, you should always
pass a valid vaddr. If you don't have a valid vaddr to pass, you should
use set_pte.
> This also gets rid of all the update_pte_defer junk in asm-i386
> includes. As long as we cooperate on the flags definition and native
> is not adversely affected by shifting the vaddr down (P4 shifts are slow;
> our metrics with VMI showed no measurable high-level disadvantage
> here for native, but the design has changed, and we should re-verify),
> then this solution is workable. It just requires us to cooperate to
> come up with a common flags definition.
I don't use that hook, and I never really understood what it's for.
J