public inbox for kvm@vger.kernel.org
* [RFT] mmu optimizations branch
@ 2007-01-01 10:32 Avi Kivity
       [not found] ` <4598E33B.608-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2007-01-01 10:32 UTC (permalink / raw)
  To: kvm-devel

This is a request for testing of the mmu optimizations branch.

Currently the shadow page tables are discarded every time the guest 
performs a context switch.  The mmu branch allows shadow page tables to 
be cached across context switches, greatly reducing the cpu utilization 
on multi process workloads.  It is now stable enough for testing (though 
perhaps not for general use).

I've tested 32-bit Linux (pae and non-pae), 64-bit Linux, and pae 
Windows guests on both 32-bit and 64-bit Intel hosts.

Known problems:

 - no AMD support yet
 - will fail horribly in low host memory situations (so run it with 
plenty of free memory)

I will fix these issues in the next few days.

There are still many optimizations that can be had, and I expect 
performance to improve steadily once it is fully stabilized.

You can download the code from

   http://people.qumranet.com/avi/kvm-mmu-4221.tar.gz

or directly from the subversion repository.

-- 
error compiling committee.c: too many arguments to function


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFT] mmu optimizations branch
       [not found] ` <4598E33B.608-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-01-02 16:11   ` Ingo Molnar
       [not found]     ` <20070102161117.GA3306-X9Un+BFzKDI@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2007-01-02 16:11 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel


* Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:

> This is a request for testing of the mmu optimizations branch.
> 
> Currently the shadow page tables are discarded every time the guest 
> performs a context switch.  The mmu branch allows shadow page tables 
> to be cached across context switches, greatly reducing the cpu 
> utilization on multi process workloads.  It is now stable enough for 
> testing (though perhaps not for general use).

i have tested it with Fedora Core 6 guest (32-bit, nopae), under a FC6 
host (32-bit CoreDuo2, nopae, enough RAM), and it's working great!

Here are some quick numbers. Context-switch overhead with lmbench 
lat_ctx -s 0 [zero memory footprint]:
              
  -------------------------------------------------
    #tasks    native    kvm-r4204    kvm-r4232(mmu) 
  -------------------------------------------------
        2:      2.02       180.91         9.19
       20:      4.04       183.21        10.01
       50:      4.30       185.95        11.27

so here it's a /massive/, almost 20 times speedup!

Context-switch overhead with -s 1000 (1MB memory footprint):

  -------------------------------------------------
    #tasks    native    kvm-r4204    kvm-r4232(mmu)
  -------------------------------------------------
        2:     150.5      1032.97       295.16
       20:     216.6      1020.34       393.01
       50:     218.1      1015.58      2335.99[*]

the speedup is nice here too. Note the outlier at 50 tasks: it's 
consistently reproducible. Could KVM be thrashing the pagetable cache due 
to some sort of internal limit? It's not due to guest size.
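
As a quick check of the ratios in the two tables above, the following sketch recomputes the r4204-to-r4232 speedup per row. The numbers are copied from the tables; the variable and function names are made up for illustration.

```python
# lat_ctx results copied from the tables above:
# (tasks, native, kvm-r4204, kvm-r4232 with mmu caching)
ZERO_FOOTPRINT = [
    (2, 2.02, 180.91, 9.19),
    (20, 4.04, 183.21, 10.01),
    (50, 4.30, 185.95, 11.27),
]
ONE_MB_FOOTPRINT = [
    (2, 150.5, 1032.97, 295.16),
    (20, 216.6, 1020.34, 393.01),
    (50, 218.1, 1015.58, 2335.99),
]


def speedups(rows):
    """Return {tasks: old_latency / new_latency} for each table row."""
    return {tasks: old / new for tasks, _native, old, new in rows}


if __name__ == "__main__":
    for name, rows in (("-s 0", ZERO_FOOTPRINT), ("-s 1000", ONE_MB_FOOTPRINT)):
        for tasks, ratio in speedups(rows).items():
            print(f"{name:8s} {tasks:3d} tasks: {ratio:5.2f}x")
```

A ratio below 1 for the 50-task row of the 1MB table corresponds to the outlier noted above: in that one case the mmu build is slower than r4204.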

The -mmu FC6 guest is visibly faster, so it's not just microbenchmarks 
that benefit from this change. KVM got /massively/ faster in every 
aspect, kudos Avi!  (Note that r4204 already included the interactivity 
IRQ fixes so the improvements are i think purely due to pagetable 
caching speedups.)

on a related note, i also got:

 vmwrite error: reg 6802 value cfd3c4a4 (err 17408)

and:

 kvm: unhandled wrmsr: 0xc1
 inject_general_protection: rip 0xc011f7f3
 kvm: unhandled wrmsr: 0x186
 inject_general_protection: rip 0xc011f7f3
 kvm: unhandled wrmsr: 0xc1
 inject_general_protection: rip 0xc011f7f3
 kvm: unhandled wrmsr: 0x186
 inject_general_protection: rip 0xc011f7f3

unfortunately 0xc011f7f3 is in native_write_msr(), which isn't very 
helpful. (i have CONFIG_PARAVIRT enabled in the -rt guest and host 
kernels) But the MSR values suggest that this is the NMI watchdog thing 
again, trying to program MSR_ARCH_PERFMON_EVENTSEL0 and 
MSR_ARCH_PERFMON_PERFCTR0, but this time Linux recovered due to a more 
robust MSR handling. The guest disabled the NMI watchdog with:

  Testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!

the FC6 installer hang that i saw with earlier MMU-branch snapshots is 
fixed.

	Ingo


* Re: [RFT] mmu optimizations branch
       [not found]     ` <20070102161117.GA3306-X9Un+BFzKDI@public.gmane.org>
@ 2007-01-02 16:32       ` Avi Kivity
       [not found]         ` <459A8909.7020600-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2007-01-02 16:32 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kvm-devel

Ingo Molnar wrote:
> * Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>   
>> This is a request for testing of the mmu optimizations branch.
>>
>> Currently the shadow page tables are discarded every time the guest 
>> performs a context switch.  The mmu branch allows shadow page tables 
>> to be cached across context switches, greatly reducing the cpu 
>> utilization on multi process workloads.  It is now stable enough for 
>> testing (though perhaps not for general use).
>>     
>
> i have tested it with Fedora Core 6 guest (32-bit, nopae), under a FC6 
> host (32-bit CoreDuo2, nopae, enough RAM), and it's working great!
>
> Here are some quick numbers. Context-switch overhead with lmbench 
> lat_ctx -s 0 [zero memory footprint]:
>               
>   -------------------------------------------------
>     #tasks    native    kvm-r4204    kvm-r4232(mmu) 
>   -------------------------------------------------
>         2:      2.02       180.91         9.19
>        20:      4.04       183.21        10.01
>        50:      4.30       185.95        11.27
>
> so here it's a /massive/, almost 20 times speedup!
>   

Excellent. 10us is approximately the vmexit overhead on intel (we 
regularly see 100-120k exits/sec), so it means a context switch is 
exactly one exit.  Hard to beat without nested page tables.
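
The arithmetic behind the 10us figure can be sketched as follows. It assumes, purely for illustration, that a guest sustaining the quoted 100-120k exits per second spends the whole second on exits, so the reciprocal of the rate is an upper bound on the cost of one exit round trip. The function name is hypothetical.

```python
def max_cost_per_exit_us(exits_per_second):
    """Upper bound on one exit round trip, in microseconds.

    Assumes the whole second is spent on exits; any time the guest
    spends doing useful work makes the real per-exit cost smaller.
    """
    return 1_000_000 / exits_per_second


if __name__ == "__main__":
    for rate in (100_000, 120_000):
        bound = max_cost_per_exit_us(rate)
        print(f"{rate} exits/sec -> at most {bound:.1f} us per exit")
```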

> Context-switch overhead with -s 1000 (1MB memory footprint):
>
>   -------------------------------------------------
>     #tasks    native    kvm-r4204    kvm-r4232(mmu)
>   -------------------------------------------------
>         2:     150.5      1032.97       295.16
>        20:     216.6      1020.34       393.01
>        50:     218.1      1015.58      2335.99[*]
>
> the speedup is nice here too. Note the outlier at 50 tasks: it's 
> consistently reproducible. Could KVM be thrashing the pagetable cache due 
> to some sort of internal limit? It's not due to guest size.
>   

kvm now caches 256 page tables; so if every process uses 5 page tables, 
plus some for the kernel, you'd get thrashing.  I don't understand why 
we're lower than native with 2 processes.  Maybe background work causes 
page tables to be evicted (see page replacement, below).

I plan to add a tunable for the cache size, and autotuning later on.

The shadow page replacement algorithm can also use some work, currently 
it's FIFO.  It can be easily made to mimic the Linux active/inactive 
lists to approximate LRU by examining the accessed bits on parent page 
tables.
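
To illustrate why a fixed 256-entry FIFO cache can thrash at 50 tasks, here is a toy simulation. It is a sketch under stated assumptions, not KVM's code: each task is assumed to use 5 page tables (the figure used above), the kernel is assumed to use 10 more (an invented number), and every round touches all tables once in the same order. The function name is hypothetical.

```python
from collections import deque


def fifo_miss_rate(tasks, tables_per_task=5, kernel_tables=10,
                   capacity=256, rounds=20):
    """Steady-state miss rate of a FIFO cache under a cyclic access pattern."""
    # One round touches every table once, always in the same order.
    working_set = [("kernel", k) for k in range(kernel_tables)]
    working_set += [(task, t) for task in range(tasks)
                    for t in range(tables_per_task)]

    fifo = deque()   # insertion order, oldest entry at the left
    cached = set()   # same entries, for fast membership tests
    hits = misses = 0
    for rnd in range(rounds + 1):          # round 0 is warm-up, not counted
        for key in working_set:
            hit = key in cached
            if not hit:
                if len(fifo) >= capacity:
                    cached.discard(fifo.popleft())   # evict oldest insertion
                fifo.append(key)
                cached.add(key)
            if rnd > 0:
                hits += hit
                misses += not hit
    return misses / (hits + misses)


if __name__ == "__main__":
    for n in (2, 20, 50):
        print(f"{n:2d} tasks: miss rate {fifo_miss_rate(n):.2f}")
```

Under these assumptions 20 tasks (110 tables) fit and never miss after warm-up, while 50 tasks (260 tables) miss on every access, because FIFO evicts each entry shortly before the cycle comes back to it. Real guests are far less regular, so actual miss rates will differ.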


> The -mmu FC6 guest is visibly faster, so it's not just microbenchmarks 
> that benefit from this change. KVM got /massively/ faster in every 
> aspect, kudos Avi!  (Note that r4204 already included the interactivity 
> IRQ fixes so the improvements are i think purely due to pagetable 
> caching speedups.)
>
> on a related note, i also got:
>
>  vmwrite error: reg 6802 value cfd3c4a4 (err 17408)
>
>   

This is already fixed on the trunk (which now has mmu merged).

> and:
>
>  kvm: unhandled wrmsr: 0xc1
>  inject_general_protection: rip 0xc011f7f3
>  kvm: unhandled wrmsr: 0x186
>  inject_general_protection: rip 0xc011f7f3
>  kvm: unhandled wrmsr: 0xc1
>  inject_general_protection: rip 0xc011f7f3
>  kvm: unhandled wrmsr: 0x186
>  inject_general_protection: rip 0xc011f7f3
>
> unfortunately 0xc011f7f3 is in native_write_msr(), which isn't very 
> helpful. (i have CONFIG_PARAVIRT enabled in the -rt guest and host 
> kernels) But the MSR values suggest that this is the NMI watchdog thing 
> again, trying to program MSR_ARCH_PERFMON_EVENTSEL0 and 
> MSR_ARCH_PERFMON_PERFCTR0, but this time Linux recovered due to a more 
> robust MSR handling. The guest disabled the NMI watchdog with:
>
>   Testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!
>
> the FC6 installer hang that i saw with earlier MMU-branch snapshots is 
> fixed.
>   


Good.  Handling the counter well would have been very difficult, 
especially if attempting to support cross migration.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFT] mmu optimizations branch
       [not found]         ` <459A8909.7020600-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-01-02 16:49           ` Ingo Molnar
       [not found]             ` <20070102164912.GA25271-X9Un+BFzKDI@public.gmane.org>
  2007-01-02 17:01           ` Michael Riepe
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2007-01-02 16:49 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel


* Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:

> >on a related note, i also got:
> >
> > vmwrite error: reg 6802 value cfd3c4a4 (err 17408)
> 
> This is already fixed on the trunk (which now has mmu merged).

which version? I used the merged trunk version from today, revision 
4232, as indicated in the table.

	Ingo


* Re: [RFT] mmu optimizations branch
       [not found]         ` <459A8909.7020600-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-01-02 16:49           ` Ingo Molnar
@ 2007-01-02 17:01           ` Michael Riepe
       [not found]             ` <459A8FE0.2030202-0QoEqw4nQxo@public.gmane.org>
  2007-01-02 17:02           ` Ingo Molnar
  2007-01-03  2:22           ` Ingo Molnar
  3 siblings, 1 reply; 16+ messages in thread
From: Michael Riepe @ 2007-01-02 17:01 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel

Hi!

Avi Kivity wrote:

>>on a related note, i also got:
>>
>> vmwrite error: reg 6802 value cfd3c4a4 (err 17408)
> 
> This is already fixed on the trunk (which now has mmu merged).

Actually not. Now it reads:

	vmwrite error: reg 6802 value 6802 (err 17408)

(trunk revision 4236)

-- 
Michael "Tired" Riepe <michael-0QoEqw4nQxo@public.gmane.org>
X-Tired: Each morning I get up I die a little


* Re: [RFT] mmu optimizations branch
       [not found]         ` <459A8909.7020600-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-01-02 16:49           ` Ingo Molnar
  2007-01-02 17:01           ` Michael Riepe
@ 2007-01-02 17:02           ` Ingo Molnar
       [not found]             ` <20070102170212.GB25271-X9Un+BFzKDI@public.gmane.org>
  2007-01-03  2:22           ` Ingo Molnar
  3 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2007-01-02 17:02 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel


* Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:

> >unfortunately 0xc011f7f3 is in native_write_msr(), which isn't very 
> >helpful. (i have CONFIG_PARAVIRT enabled in the -rt guest and host 
> >kernels) But the MSR values suggest that this is the NMI watchdog 
> >thing again, trying to program MSR_ARCH_PERFMON_EVENTSEL0 and 
> >MSR_ARCH_PERFMON_PERFCTR0, but this time Linux recovered due to a 
> >more robust MSR handling. The guest disabled the NMI watchdog with:
> >
> >  Testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!
> >
> >the FC6 installer hang that i saw with earlier MMU-branch snapshots 
> >is fixed.
> 
> 
> Good.  Handling the counter well would have been very difficult, 
> especially if attempting to support cross migration.

as far as the NMI watchdog goes, it's in fact better to keep it disabled 
this way - it's not like the guest context could 'lock up' in an 
undebuggable way. Any NMI activity in the guest context would be pretty 
pointless. I'd suggest simulating a non-working performance counter: 
i.e. don't inject a #GP when doing the wrmsr, and maybe preserve the 
values that were written into the MSR register, but otherwise don't try 
to implement the functionality by injecting NMIs. Worst-case this could 
result in user-space debugging tools seeing non-working 
performance-counter functionality.

	Ingo


* Re: [RFT] mmu optimizations branch
       [not found]             ` <20070102164912.GA25271-X9Un+BFzKDI@public.gmane.org>
@ 2007-01-02 17:07               ` Avi Kivity
  0 siblings, 0 replies; 16+ messages in thread
From: Avi Kivity @ 2007-01-02 17:07 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kvm-devel

Ingo Molnar wrote:
> * Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>   
>>> on a related note, i also got:
>>>
>>> vmwrite error: reg 6802 value cfd3c4a4 (err 17408)
>>>       
>> This is already fixed on the trunk (which now has mmu merged).
>>     
>
> which version? I used the merged trunk version from today, revision 
> 4232, as indicated in the table.
>   

That's supposed to contain it.  Maybe it's another bug.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFT] mmu optimizations branch
       [not found]             ` <20070102170212.GB25271-X9Un+BFzKDI@public.gmane.org>
@ 2007-01-02 17:09               ` Avi Kivity
  0 siblings, 0 replies; 16+ messages in thread
From: Avi Kivity @ 2007-01-02 17:09 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kvm-devel

Ingo Molnar wrote:
> * Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>   
>>> unfortunately 0xc011f7f3 is in native_write_msr(), which isn't very 
>>> helpful. (i have CONFIG_PARAVIRT enabled in the -rt guest and host 
>>> kernels) But the MSR values suggest that this is the NMI watchdog 
>>> thing again, trying to program MSR_ARCH_PERFMON_EVENTSEL0 and 
>>> MSR_ARCH_PERFMON_PERFCTR0, but this time Linux recovered due to a 
>>> more robust MSR handling. The guest disabled the NMI watchdog with:
>>>
>>>  Testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!
>>>
>>> the FC6 installer hang that i saw with earlier MMU-branch snapshots 
>>> is fixed.
>>>       
>> Good.  Handling the counter well would have been very difficult, 
>> especially if attempting to support cross migration.
>>     
>
> as far as the NMI watchdog goes, it's in fact better to keep it disabled 
> this way - it's not like the guest context could 'lock up' in an 
> undebuggable way. Any NMI activity in the guest context would be pretty 
> pointless. I'd suggest simulating a non-working performance counter: 
> i.e. don't inject a #GP when doing the wrmsr, and maybe preserve the 
> values that were written into the MSR register, but otherwise don't try 
> to implement the functionality by injecting NMIs. Worst-case this could 
> result in user-space debugging tools seeing non-working 
> performance-counter functionality.
>
>   

My worry is that when emulating an msr incorrectly, software can fail 
without any clue as to what went wrong.  I'll add the emulation as you 
suggest but with a printk() to warn that we're bending the rules.
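
The behaviour discussed here could look roughly like the following toy model. It is not KVM code: the class, its method names, the warning list (standing in for the printk()), and the GeneralProtection exception are all invented for illustration. Only the two MSR indices, 0xc1 and 0x186 from the log earlier in this thread, come from the source.

```python
MSR_ARCH_PERFMON_PERFCTR0 = 0xC1
MSR_ARCH_PERFMON_EVENTSEL0 = 0x186


class GeneralProtection(Exception):
    """Stands in for injecting #GP into the guest."""


class ToyMsrEmulator:
    # MSRs that are accepted and remembered but otherwise have no effect.
    DUMMY_MSRS = {MSR_ARCH_PERFMON_PERFCTR0, MSR_ARCH_PERFMON_EVENTSEL0}

    def __init__(self):
        self.values = {}
        self.warnings = []   # stand-in for kernel log messages

    def wrmsr(self, index, value):
        if index not in self.DUMMY_MSRS:
            raise GeneralProtection(f"unhandled wrmsr: {index:#x}")
        # Preserve the written value, but never count events or raise NMIs.
        self.values[index] = value
        self.warnings.append(f"wrmsr {index:#x}: value stored, not emulated")

    def rdmsr(self, index):
        if index not in self.DUMMY_MSRS:
            raise GeneralProtection(f"unhandled rdmsr: {index:#x}")
        return self.values.get(index, 0)


if __name__ == "__main__":
    emu = ToyMsrEmulator()
    emu.wrmsr(MSR_ARCH_PERFMON_EVENTSEL0, 0x1234)
    print(hex(emu.rdmsr(MSR_ARCH_PERFMON_EVENTSEL0)), emu.warnings)
```

In this model the guest reads back what it wrote and never takes a fault on these two registers, while the warning leaves a trace that the counter is not really implemented.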


-- 
error compiling committee.c: too many arguments to function



* Re: [RFT] mmu optimizations branch
       [not found]         ` <459A8909.7020600-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
                             ` (2 preceding siblings ...)
  2007-01-02 17:02           ` Ingo Molnar
@ 2007-01-03  2:22           ` Ingo Molnar
       [not found]             ` <20070103022241.GA13840-X9Un+BFzKDI@public.gmane.org>
  3 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2007-01-03  2:22 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel


* Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:

> >lat_ctx -s 0 [zero memory footprint]:
> >              
> >  -------------------------------------------------
> >    #tasks    native    kvm-r4204    kvm-r4232(mmu) 
> >  -------------------------------------------------
> >        2:      2.02       180.91         9.19
> >       20:      4.04       183.21        10.01
> >       50:      4.30       185.95        11.27
> >
> >so here it's a /massive/, almost 20 times speedup!
> >  
> 
> Excellent. 10us is approximately the vmexit overhead on intel (we 
> regularly see 100-120k exits/sec), so it means a context switch is 
> exactly one exit.  Hard to beat without nested page tables.

actually, the VM entry+exit cost on this CPU is around 3-4 microseconds, 
so it's still 2 VM exits per context switch.

I debugged this a bit, and what happens is that when Linux does a 
task-switch it does a cr3 load /and/ a write (look at __flush_tlb()) - 
and both are causing a vm exit!

I have started paravirtualizing the Linux kernel for KVM. I have 
eliminated the cr3 load from the Linux kernel via paravirtualization and 
that way lat_ctx shows a ~5-6 usecs context-switch cost. That's pretty 
good i think, compared to the 2-3 usecs of native. I'll send patches for 
this tomorrow.
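
The estimate of two exits per context switch can be reproduced with a back-of-the-envelope model: subtract the native latency from the guest latency and divide by the per-exit cost. The model itself and the 3.5us midpoint are assumptions made here for illustration; the latencies are the 2-task figures from the table.

```python
NATIVE_US = 2.02      # lat_ctx, 2 tasks, native
KVM_MMU_US = 9.19     # lat_ctx, 2 tasks, kvm-r4232 (mmu)
EXIT_COST_US = 3.5    # assumed midpoint of the quoted 3-4 us entry+exit cost


def estimated_exits(guest_us, native_us=NATIVE_US, exit_cost_us=EXIT_COST_US):
    """Virtualization overhead expressed as a number of exit round trips."""
    return (guest_us - native_us) / exit_cost_us


def predicted_latency(exits, native_us=NATIVE_US, exit_cost_us=EXIT_COST_US):
    """Guest context-switch latency if each switch costs `exits` round trips."""
    return native_us + exits * exit_cost_us


if __name__ == "__main__":
    print(f"estimated exits per switch: {estimated_exits(KVM_MMU_US):.2f}")
    print(f"predicted latency with one exit: {predicted_latency(1):.2f} us")
```

With these inputs the overhead comes to roughly two exits, and removing one of them predicts about 5.5us, in line with the ~5-6 usecs reported above.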

	Ingo


* Re: [RFT] mmu optimizations branch
       [not found]             ` <20070103022241.GA13840-X9Un+BFzKDI@public.gmane.org>
@ 2007-01-03  3:05               ` Anthony Liguori
       [not found]                 ` <459B1D8A.6040604-NZpS4cJIG2HvQtjrzfazuQ@public.gmane.org>
  2007-01-03  8:32               ` Avi Kivity
  1 sibling, 1 reply; 16+ messages in thread
From: Anthony Liguori @ 2007-01-03  3:05 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kvm-devel

Ingo Molnar wrote:
> * Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>   
>>> lat_ctx -s 0 [zero memory footprint]:
>>>
>>>  -------------------------------------------------
>>>    #tasks    native    kvm-r4204    kvm-r4232(mmu)
>>>  -------------------------------------------------
>>>        2:      2.02       180.91         9.19
>>>       20:      4.04       183.21        10.01
>>>       50:      4.30       185.95        11.27
>>>
>>> so here it's a /massive/, almost 20 times speedup!
>>>
>>>       
>> Excellent. 10us is approximately the vmexit overhead on intel (we
>> regularly see 100-120k exits/sec), so it means a context switch is
>> exactly one exit.  Hard to beat without nested page tables.
>>     
>
> actually, the VM entry+exit cost on this CPU is around 3-4 microseconds,
> so it's still 2 VM exits per context switch.
>
> I debugged this a bit, and what happens is that when Linux does a
> task-switch it does a cr3 load /and/ a write (look at __flush_tlb()) -
> and both are causing a vm exit!
>
> I have started paravirtualizing the Linux kernel for KVM. I have
> eliminated the cr3 load from the Linux kernel via paravirtualization and
> that way lat_ctx shows a ~5-6 usecs context-switch cost. That's pretty
> good i think, compared to the 2-3 usecs of native. I'll send patches for
> this tomorrow.
>   

This should be hookable via arch_{enter,leave}_cpu_mode() via 
paravirt_ops.  I was actually just looking at this myself (although I 
was focusing on lazy mmu hooks).  I've taken the route of using a VMI 
ROM to actually hook it (instead of implementing a custom paravirt_ops 
for KVM).

I can post the ROM if you're interested in this sort of approach.  I 
like using VMI as we can access most of the things hookable by 
paravirt_ops without having to change a kernel binary.

Regards,

Anthony Liguori

> 	Ingo
>



* Re: [RFT] mmu optimizations branch
       [not found]             ` <20070103022241.GA13840-X9Un+BFzKDI@public.gmane.org>
  2007-01-03  3:05               ` Anthony Liguori
@ 2007-01-03  8:32               ` Avi Kivity
       [not found]                 ` <459B6A15.4010208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2007-01-03  8:32 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kvm-devel

Ingo Molnar wrote:
> * Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>   
>>> lat_ctx -s 0 [zero memory footprint]:
>>>              
>>>  -------------------------------------------------
>>>    #tasks    native    kvm-r4204    kvm-r4232(mmu) 
>>>  -------------------------------------------------
>>>        2:      2.02       180.91         9.19
>>>       20:      4.04       183.21        10.01
>>>       50:      4.30       185.95        11.27
>>>
>>> so here it's a /massive/, almost 20 times speedup!
>>>  
>>>       
>> Excellent. 10us is approximately the vmexit overhead on intel (we 
>> regularly see 100-120k exits/sec), so it means a context switch is 
>> exactly one exit.  Hard to beat without nested page tables.
>>     
>
> actually, the VM entry+exit cost on this CPU is around 3-4 microseconds, 
> so it's still 2 VM exits per context switch.
>
> I debugged this a bit, and what happens is that when Linux does a 
> task-switch it does a cr3 load /and/ a write (look at __flush_tlb()) - 
> and both are causing a vm exit!
>
> I have started paravirtualizing the Linux kernel for KVM. I have 
> eliminated the cr3 load from the Linux kernel via paravirtualization and 
> that way lat_ctx shows a ~5-6 usecs context-switch cost. That's pretty 
> good i think, compared to the 2-3 usecs of native. I'll send patches for 
> this tomorrow.
>   

Does this really call for paravirtualization? Caching the old cr3 value 
will work for native as well.


-- 
error compiling committee.c: too many arguments to function



* Re: [RFT] mmu optimizations branch
       [not found]                 ` <459B1D8A.6040604-NZpS4cJIG2HvQtjrzfazuQ@public.gmane.org>
@ 2007-01-03  8:35                   ` Avi Kivity
  0 siblings, 0 replies; 16+ messages in thread
From: Avi Kivity @ 2007-01-03  8:35 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel

Anthony Liguori wrote:
>
> This should be hookable via arch_{enter,leave}_cpu_mode() via 
> paravirt_ops.  I was actually just looking at this myself (although I 
> was focusing on lazy mmu hooks).  I've taken the route of using a VMI 
> ROM to actually hook it (instead of implementing a custom paravirt_ops 
> for KVM).
>
> I can post the ROM if you're interested in this sort of approach.  I 
> like using VMI as we can access most of the things hookable by 
> paravirt_ops without having to change a kernel binary.

VMI has the benefit of working for other OSes, not just Linux, if it 
catches on.  Please post it; it's certainly interesting.

My feeling is that if you have a paravirt_ops capable guest, you might 
as well go all the way and use lhype.  I'm willing to be convinced 
otherwise though.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFT] mmu optimizations branch
       [not found]             ` <459A8FE0.2030202-0QoEqw4nQxo@public.gmane.org>
@ 2007-01-03  8:49               ` Avi Kivity
  0 siblings, 0 replies; 16+ messages in thread
From: Avi Kivity @ 2007-01-03  8:49 UTC (permalink / raw)
  To: Michael Riepe; +Cc: kvm-devel

Michael Riepe wrote:
> Hi!
>
> Avi Kivity wrote:
>
>   
>>> on a related note, i also got:
>>>
>>> vmwrite error: reg 6802 value cfd3c4a4 (err 17408)
>>>       
>> This is already fixed on the trunk (which now has mmu merged).
>>     
>
> Actually not. Now it reads:
>
> 	vmwrite error: reg 6802 value 6802 (err 17408)
>   

I added a dump_stack() to vmwrite so we can see where it comes from.


-- 
error compiling committee.c: too many arguments to function



* Re: [RFT] mmu optimizations branch
       [not found]                 ` <459B6A15.4010208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-01-03 10:02                   ` Ingo Molnar
       [not found]                     ` <20070103100235.GA17168-X9Un+BFzKDI@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2007-01-03 10:02 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel


* Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:

> >actually, the VM entry+exit cost on this CPU is around 3-4 
> >microseconds, so it's still 2 VM exits per context switch.
> >
> >I debugged this a bit, and what happens is that when Linux does a 
> >task-switch it does a cr3 load /and/ a write (look at __flush_tlb()) 
> >- and both are causing a vm exit!
> >
> >I have started paravirtualizing the Linux kernel for KVM. I have 
> >eliminated the cr3 load from the Linux kernel via paravirtualization 
> >and that way lat_ctx shows a ~5-6 usecs context-switch cost. That's 
> >pretty good i think, compared to the 2-3 usecs of native. I'll send 
> >patches for this tomorrow.
> 
> Does this really call for paravirtualization? Caching the old cr3 value 
> will work for native as well.

it's already cached (and my changes use that cached value), but it (used 
to be) faster/smaller to just move from cr3 than to dereference down to 
the cached pgd pointer, because we used to inline those instructions 
heavily. But i agree that in the current kernel it's probably worth 
doing this for native too - i'll cook up a patch for upstream.
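
A toy model of why the cr3 read matters under virtualization: if both reading and writing cr3 trap to the hypervisor, a flush that rereads cr3 from the register costs two exits, while one that uses a cached copy costs one. Everything here (class and function names, the exit counter) is hypothetical and only mirrors the reasoning in this thread, not the kernel's or KVM's actual code.

```python
class ToyVcpu:
    """Counts exits, assuming every cr3 access traps to the hypervisor."""

    def __init__(self, cr3):
        self._cr3 = cr3
        self.exits = 0

    def read_cr3(self):
        self.exits += 1
        return self._cr3

    def write_cr3(self, value):
        self.exits += 1
        self._cr3 = value


def flush_tlb_via_register(vcpu):
    # Read cr3 and write the same value back: two trapping accesses.
    vcpu.write_cr3(vcpu.read_cr3())


def flush_tlb_via_cache(vcpu, cached_cr3):
    # Use a copy the guest already keeps in memory: one trapping access.
    vcpu.write_cr3(cached_cr3)


if __name__ == "__main__":
    a, b = ToyVcpu(0x1000), ToyVcpu(0x1000)
    flush_tlb_via_register(a)
    flush_tlb_via_cache(b, 0x1000)
    print(a.exits, b.exits)
```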

	Ingo


* Re: [RFT] mmu optimizations branch
       [not found]                     ` <20070103100235.GA17168-X9Un+BFzKDI@public.gmane.org>
@ 2007-01-03 10:16                       ` Avi Kivity
       [not found]                         ` <459B8267.5080000-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2007-01-03 10:16 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kvm-devel

Ingo Molnar wrote:
> * Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>   
>>> actually, the VM entry+exit cost on this CPU is around 3-4 
>>> microseconds, so it's still 2 VM exits per context switch.
>>>
>>> I debugged this a bit, and what happens is that when Linux does a 
>>> task-switch it does a cr3 load /and/ a write (look at __flush_tlb()) 
>>> - and both are causing a vm exit!
>>>
>>> I have started paravirtualizing the Linux kernel for KVM. I have 
>>> eliminated the cr3 load from the Linux kernel via paravirtualization 
>>> and that way lat_ctx shows a ~5-6 usecs context-switch cost. That's 
>>> pretty good i think, compared to the 2-3 usecs of native. I'll send 
>>> patches for this tomorrow.
>>>       
>> Does this really call for paravirtualization? Caching the old cr3 value 
>> will work for native as well.
>>     
>
> it's already cached (and my changes use that cached value), but it (used 
> to be) faster/smaller to just move from cr3 than to dereference down to 
> the cached pgd pointer, because we used to inline those instructions 
> heavily. But i agree that in the current kernel it's probably worth 
> doing this for native too - i'll cook up a patch for upstream.
>
>   

Ok.

A question. What's __flush_tlb() doing in the context switch path?  
Shouldn't it just load the new cr3 and be done with it?


-- 
error compiling committee.c: too many arguments to function



* Re: [RFT] mmu optimizations branch
       [not found]                         ` <459B8267.5080000-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-01-03 11:30                           ` Ingo Molnar
  0 siblings, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2007-01-03 11:30 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel


* Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:

> A question. What's __flush_tlb() doing in the context switch path?  
> Shouldn't it just load the new cr3 and be done with it?

hmm .... it /does/ use load_cr3, which uses write_cr3(). Maybe i'm wrong 
about this analysis and the 'speedup' was a cache alignment fluke ...

	Ingo
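
One rough way to sanity-check the figures quoted in this thread (2-3
usecs native, 3-4 usecs per VM entry+exit, ~5-6 usecs measured with
lat_ctx) is a purely additive model. The sketch below assumes that
latency is the native cost plus a fixed cost per exit, which nobody in
the thread claims; struct range and ctx_switch_estimate are hypothetical
names used only for illustration.

```c
/* Hypothetical additive model: latency = native + exits * exit cost.
 * All values are in microseconds; the inputs are the ranges quoted in
 * the thread, the additivity itself is an assumption. */
struct range {
    double lo;
    double hi;
};

static struct range ctx_switch_estimate(struct range native,
                                        struct range vm_exit,
                                        unsigned int exits)
{
    struct range r = {
        native.lo + exits * vm_exit.lo,   /* best case */
        native.hi + exits * vm_exit.hi,   /* worst case */
    };
    return r;
}
```

With native = {2, 3} and vm_exit = {3, 4}, one exit gives 5-7 usecs and
two exits give 8-11 usecs. Under this model the measured ~5-6 usecs is
what a single exit per context switch would look like, but the model
cannot say why only one exit remains.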


end of thread

Thread overview: 16+ messages
2007-01-01 10:32 [RFT] mmu optimizations branch Avi Kivity
     [not found] ` <4598E33B.608-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-01-02 16:11   ` Ingo Molnar
     [not found]     ` <20070102161117.GA3306-X9Un+BFzKDI@public.gmane.org>
2007-01-02 16:32       ` Avi Kivity
     [not found]         ` <459A8909.7020600-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-01-02 16:49           ` Ingo Molnar
     [not found]             ` <20070102164912.GA25271-X9Un+BFzKDI@public.gmane.org>
2007-01-02 17:07               ` Avi Kivity
2007-01-02 17:01           ` Michael Riepe
     [not found]             ` <459A8FE0.2030202-0QoEqw4nQxo@public.gmane.org>
2007-01-03  8:49               ` Avi Kivity
2007-01-02 17:02           ` Ingo Molnar
     [not found]             ` <20070102170212.GB25271-X9Un+BFzKDI@public.gmane.org>
2007-01-02 17:09               ` Avi Kivity
2007-01-03  2:22           ` Ingo Molnar
     [not found]             ` <20070103022241.GA13840-X9Un+BFzKDI@public.gmane.org>
2007-01-03  3:05               ` Anthony Liguori
     [not found]                 ` <459B1D8A.6040604-NZpS4cJIG2HvQtjrzfazuQ@public.gmane.org>
2007-01-03  8:35                   ` Avi Kivity
2007-01-03  8:32               ` Avi Kivity
     [not found]                 ` <459B6A15.4010208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-01-03 10:02                   ` Ingo Molnar
     [not found]                     ` <20070103100235.GA17168-X9Un+BFzKDI@public.gmane.org>
2007-01-03 10:16                       ` Avi Kivity
     [not found]                         ` <459B8267.5080000-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-01-03 11:30                           ` Ingo Molnar
