public inbox for kvm@vger.kernel.org
* [ofa-general] Re: [PATCH][RFC]: pte notifiers -- support for external page tables
       [not found]     ` <46DF0234.7090504@redhat.com>
@ 2007-09-05 19:32       ` Avi Kivity
       [not found]         ` <46DF045F.4020806-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Avi Kivity @ 2007-09-05 19:32 UTC (permalink / raw)
  To: Rik van Riel; +Cc: kvm-devel, linux-mm, linux-kernel, general, shaohua.li

Rik van Riel wrote:
>>
>> I imagine that many of the paravirt_ops mmu hooks will need to be 
>> exposed as pte notifiers.  This can't be done as part of the 
>> paravirt_ops code due to the need to pass high level data structures, 
>> though.
>
> Wait, I thought that paravirt_ops was all on the side of the
> guest kernel, where these host kernel operations are invisible?
>

It is, but the hooks are in much the same places.  It could be argued 
that you'd embed pte notifiers in paravirt_ops for a host kernel, but 
that's not doable because pte notifiers use higher-level data structures 
(like vmas).

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


* [ofa-general] Re: [PATCH][RFC]: pte notifiers -- support for external page tables
       [not found] ` <20070905204012.GA29272@sgi.com>
@ 2007-09-05 20:42   ` Avi Kivity
  0 siblings, 0 replies; 5+ messages in thread
From: Avi Kivity @ 2007-09-05 20:42 UTC (permalink / raw)
  To: Jack Steiner; +Cc: kvm-devel, linux-mm, linux-kernel, general, shaohua.li

[resend due to broken cc list in my original post]

Jack Steiner wrote:
> On Wed, Sep 05, 2007 at 07:38:48PM +0300, Avi Kivity wrote:
>   
>> Some hardware and software systems maintain page tables outside the normal
>> Linux page tables, which reference userspace memory.  This includes
>> Infiniband, other RDMA-capable devices, and kvm (with a pending patch).
>>
>>     
>
> I like it. 
>
> We have 2 special devices with external TLBs that can
> take advantage of this.
>
> One suggestion - at least for what we need. Can the notifier be
> registered against the mm_struct instead of (or in addition to) the
> vma?
>   

Yes.  It's a lot simpler since this way we don't have to support vma
creation/splitting/merging/destruction.  There's a tiny performance hit
for kvm, but it isn't worth the bother.

Will implement for v2 of this patch.
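
A minimal user-space sketch of what registering against the mm rather
than the vma could look like. Every name here (struct mm_model, struct
pte_notifier, pte_notifier_register(), pte_notify_clear(), struct
ext_tlb) is hypothetical and is not taken from the patch; the only
callback modelled is ->clear(), the one the RFC implements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: these types stand in for kernel structures. */
struct mm_model;

struct pte_notifier {
	/* Called after a pte covering 'address' in 'mm' has been cleared. */
	void (*clear)(struct pte_notifier *self, struct mm_model *mm,
		      unsigned long address);
	struct pte_notifier *next;
};

struct mm_model {
	/* One list per address space, so vma split/merge needs no handling. */
	struct pte_notifier *notifiers;
};

static void pte_notifier_register(struct mm_model *mm, struct pte_notifier *n)
{
	n->next = mm->notifiers;
	mm->notifiers = n;
}

static void pte_notifier_unregister(struct mm_model *mm, struct pte_notifier *n)
{
	for (struct pte_notifier **pp = &mm->notifiers; *pp; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			n->next = NULL;
			return;
		}
	}
}

/* What the mm code would call wherever it clears a pte. */
static void pte_notify_clear(struct mm_model *mm, unsigned long address)
{
	for (struct pte_notifier *n = mm->notifiers; n; n = n->next)
		if (n->clear)
			n->clear(n, mm, address);
}

/* Example driver with an external TLB; the notifier is the first member. */
struct ext_tlb {
	struct pte_notifier notifier;
	unsigned long last_flushed;
	int flushes;
};

static void ext_tlb_clear(struct pte_notifier *self, struct mm_model *mm,
			  unsigned long address)
{
	/* Valid because 'notifier' is the first member of struct ext_tlb. */
	struct ext_tlb *tlb = (struct ext_tlb *)self;

	(void)mm;
	tlb->last_flushed = address;	/* a real driver would flush here */
	tlb->flushes++;
}
```

The cost mentioned above shows up in pte_notify_clear(): every notifier
on the mm is called for every cleared pte, even when the address is
outside the range that driver cares about, so the driver has to filter
by address itself.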

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


* [ofa-general] Re: [PATCH][RFC] pte notifiers -- support for external page tables
  2007-09-06  4:28 ` Shaohua Li
@ 2007-09-06  8:38   ` Avi Kivity
  0 siblings, 0 replies; 5+ messages in thread
From: Avi Kivity @ 2007-09-06  8:38 UTC (permalink / raw)
  To: Shaohua Li; +Cc: kvm-devel, linux-mm, linux-kernel, general

Shaohua Li wrote:
> On Wed, 2007-09-05 at 22:32 +0300, Avi Kivity wrote:
>   
>> [resend due to bad alias expansion resulting in some recipients
>>  being bogus]
>>
>> Some hardware and software systems maintain page tables outside the normal
>> Linux page tables, which reference userspace memory.  This includes
>> Infiniband, other RDMA-capable devices, and kvm (with a pending patch).
>>
>> Because these systems maintain external page tables (and external tlbs),
>> Linux cannot demand page this memory and it must be locked.  For kvm at
>> least, this is a significant reduction in functionality.
>>
>> This sample patch adds a new mechanism, pte notifiers, that allows drivers
>> to register an interest in changes to ptes. Whenever Linux changes a
>> pte, it will call a notifier to allow the driver to adjust the external
>> page table and flush its tlb.
>>
>> Note that only one notifier is implemented, ->clear(), but others should be
>> similar.
>>
>> pte notifiers are different from paravirt_ops: they extend the normal
>> page tables rather than replace them; and they provide high-level information
>> such as the vma and the virtual address for the driver to use.
>>     
> Looks great. So for kvm, all guest pages will be vma mapped?
> There are lock issues in kvm between kvm lock and page lock. 
>   

Yes, locking will be a headache.

> Will shadow page table be still stored in page->private? If yes, the
> page->private must be cleaned before add_to_swap.
>   

page->private can be in use by filesystems, so we will need to move rmap 
somewhere else.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


* Re: [PATCH][RFC]: pte notifiers -- support for external page tables
       [not found]         ` <46DF045F.4020806-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-09-06 11:28           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 5+ messages in thread
From: Jeremy Fitzhardinge @ 2007-09-06 11:28 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm-devel, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Avi Kivity wrote:
> It is, but the hooks are in much the same places.  It could be argued
> that you'd embed pte notifiers in paravirt_ops for a host kernel, but
> that's not doable because pte notifiers use higher-level data
> structures (like vmas).

Also, I wouldn't like to preclude the possibility of having a kernel
that's both a guest and a host (ie, nested vmms).

    J



* [ofa-general] Re: [PATCH][RFC] pte notifiers -- support for external page tables
       [not found] ` <p73myw09g5w.fsf@bingen.suse.de>
@ 2007-09-06 15:17   ` Avi Kivity
  0 siblings, 0 replies; 5+ messages in thread
From: Avi Kivity @ 2007-09-06 15:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: kvm-devel, linux-mm, linux-kernel, general

Andi Kleen wrote:
> Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> writes:
>   
>> pte notifiers are different from paravirt_ops: they extend the normal
>> page tables rather than replace them; and they provide high-level information
>> such as the vma and the virtual address for the driver to use.
>>     
>
> Sounds like a locking horror to me.  To do anything with page tables
> you need locks. Both for the kernel page tables and for your new tables.
>
> What happens when people add all
> kinds of complicated operations in these notifiers? That will likely
> happen and then every time you change something in VM code they 
> will break. This has the potential to increase the cost of maintaining
> VM code considerably, which would be a bad thing.
>
> This is quite different from paravirt ops because low level pvops
> can typically run lockless by just doing some kind of hypercall directly.
> But that won't work for maintaining your custom page tables.
>   

Okay, here's a possible fix: add ->lock() and ->unlock() callbacks, to 
be called when mmap_sem is taken either for read or write.  Also add a 
->release() for when the mm goes away to avoid the need to care about 
the entire data structure going away.

The notifier list would need to be kept sorted to avoid deadlocks.
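
A rough user-space sketch of that idea, under stated assumptions: all
names (struct ext_notifier, mm_lock_notifiers(), struct demo_dev, ...)
are hypothetical, the sort key is simply the notifier's address, and
the "locks" are plain flags. The point is only that every path takes
the per-notifier locks in the same order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callbacks modelled on the ->lock/->unlock/->release idea. */
struct ext_notifier {
	void (*lock)(struct ext_notifier *self);
	void (*unlock)(struct ext_notifier *self);
	void (*release)(struct ext_notifier *self);
	struct ext_notifier *next;
};

struct mm_model {
	struct ext_notifier *notifiers;	/* kept sorted by ascending address */
};

/* Sorted insert: a fixed global order means all lockers agree on it. */
static void ext_notifier_register(struct mm_model *mm, struct ext_notifier *n)
{
	struct ext_notifier **pp = &mm->notifiers;

	while (*pp && (uintptr_t)*pp < (uintptr_t)n)
		pp = &(*pp)->next;
	n->next = *pp;
	*pp = n;
}

/* Would run when mmap_sem is taken: lock in list order. */
static void mm_lock_notifiers(struct mm_model *mm)
{
	for (struct ext_notifier *n = mm->notifiers; n; n = n->next)
		if (n->lock)
			n->lock(n);
}

/* Unlock in reverse order (conventional; only lock order matters for deadlock). */
static void unlock_from(struct ext_notifier *n)
{
	if (!n)
		return;
	unlock_from(n->next);
	if (n->unlock)
		n->unlock(n);
}

static void mm_unlock_notifiers(struct mm_model *mm)
{
	unlock_from(mm->notifiers);
}

/* Would run when the mm goes away: detach each notifier, then tell it. */
static void mm_release_notifiers(struct mm_model *mm)
{
	struct ext_notifier *n = mm->notifiers;

	mm->notifiers = NULL;
	while (n) {
		struct ext_notifier *next = n->next;	/* release may free n */

		n->next = NULL;
		if (n->release)
			n->release(n);
		n = next;
	}
}

/* Demo driver: records lock order so the sorted behaviour is observable. */
static int demo_seq;

struct demo_dev {
	struct ext_notifier n;	/* first member, so the casts below are valid */
	int locked;
	int lock_seq;
	int released;
};

static void demo_lock(struct ext_notifier *self)
{
	struct demo_dev *d = (struct demo_dev *)self;

	assert(!d->locked);
	d->locked = 1;
	d->lock_seq = ++demo_seq;
}

static void demo_unlock(struct ext_notifier *self)
{
	((struct demo_dev *)self)->locked = 0;
}

static void demo_release(struct ext_notifier *self)
{
	((struct demo_dev *)self)->released = 1;
}

static void demo_dev_init(struct demo_dev *d)
{
	d->n.lock = demo_lock;
	d->n.unlock = demo_unlock;
	d->n.release = demo_release;
	d->n.next = NULL;
	d->locked = d->lock_seq = d->released = 0;
}
```

Sorting by address is just one possible key; any total order that all
mms sharing a set of notifiers agree on would serve the same purpose.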

-- 
Any sufficiently difficult bug is indistinguishable from a feature.



Thread overview: 5+ messages
     [not found] <11890103283456-git-send-email-avi@qumranet.com>
     [not found] ` <46DEFDF4.5000900@redhat.com>
     [not found]   ` <46DF0013.4060804@qumranet.com>
     [not found]     ` <46DF0234.7090504@redhat.com>
2007-09-05 19:32       ` [ofa-general] Re: [PATCH][RFC]: pte notifiers -- support for external page tables Avi Kivity
     [not found]         ` <46DF045F.4020806-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-09-06 11:28           ` Jeremy Fitzhardinge
     [not found] ` <20070905204012.GA29272@sgi.com>
2007-09-05 20:42   ` [ofa-general] " Avi Kivity
2007-09-05 19:32 [PATCH][RFC] " Avi Kivity
2007-09-06  4:28 ` Shaohua Li
2007-09-06  8:38   ` [ofa-general] " Avi Kivity
     [not found] ` <p73myw09g5w.fsf@bingen.suse.de>
2007-09-06 15:17   ` Avi Kivity
