* [RFT] mmu optimizations branch
@ 2007-01-01 10:32 Avi Kivity
From: Avi Kivity @ 2007-01-01 10:32 UTC
To: kvm-devel
This is a request for testing of the mmu optimizations branch.

Currently the shadow page tables are discarded every time the guest
performs a context switch. The mmu branch allows shadow page tables to
be cached across context switches, greatly reducing the cpu utilization
on multi-process workloads. It is now stable enough for testing (though
perhaps not for general use).

I've tested 32-bit Linux (pae and non-pae), 64-bit Linux, and pae
Windows guests on both 32-bit and 64-bit Intel hosts.

Known problems:
- no AMD support yet
- will fail horribly in low host memory situations (so run it with
  plenty of free memory)

I will fix these issues in the next few days.

There are still many optimizations that can be had, and I expect
performance to improve steadily once it is fully stabilized.

You can download the code from

    http://people.qumranet.com/avi/kvm-mmu-4221.tar.gz

or directly from the subversion repository.

--
error compiling committee.c: too many arguments to function
* Re: [RFT] mmu optimizations branch
@ 2007-01-02 16:11 Ingo Molnar
From: Ingo Molnar @ 2007-01-02 16:11 UTC
To: Avi Kivity; +Cc: kvm-devel

* Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:

> This is a request for testing of the mmu optimizations branch.
>
> Currently the shadow page tables are discarded every time the guest
> performs a context switch. The mmu branch allows shadow page tables
> to be cached across context switches, greatly reducing the cpu
> utilization on multi-process workloads. It is now stable enough for
> testing (though perhaps not for general use).

i have tested it with Fedora Core 6 guest (32-bit, nopae), under a FC6
host (32-bit CoreDuo2, nopae, enough RAM), and it's working great!

Here are some quick numbers. Context-switch overhead with lmbench
lat_ctx -s 0 [zero memory footprint]:

 -------------------------------------------------
  #tasks    native   kvm-r4204   kvm-r4232(mmu)
 -------------------------------------------------
       2:     2.02      180.91             9.19
      20:     4.04      183.21            10.01
      50:     4.30      185.95            11.27

so here it's a /massive/, almost 20 times speedup!

Context-switch overhead with -s 1000 (1MB memory footprint):

 -------------------------------------------------
  #tasks    native   kvm-r4204   kvm-r4232(mmu)
 -------------------------------------------------
       2:    150.5     1032.97           295.16
      20:    216.6     1020.34           393.01
      50:    218.1     1015.58          2335.99[*]

the speedup is nice here too. Note the outlier at 50 tasks: it's
consistently reproducible. Could KVM be thrashing the pagetable cache
due to some sort of internal limit? It's not due to guest size.

The -mmu FC6 guest is visibly faster, so it's not just microbenchmarks
that benefit from this change. KVM got /massively/ faster in every
aspect, kudos Avi! (Note that r4204 already included the interactivity
IRQ fixes so the improvements are i think purely due to pagetable
caching speedups.)

on a related note, i also got:

 vmwrite error: reg 6802 value cfd3c4a4 (err 17408)

and:

 kvm: unhandled wrmsr: 0xc1
 inject_general_protection: rip 0xc011f7f3
 kvm: unhandled wrmsr: 0x186
 inject_general_protection: rip 0xc011f7f3
 kvm: unhandled wrmsr: 0xc1
 inject_general_protection: rip 0xc011f7f3
 kvm: unhandled wrmsr: 0x186
 inject_general_protection: rip 0xc011f7f3

unfortunately 0xc011f7f3 is in native_write_msr(), which isn't very
helpful. (i have CONFIG_PARAVIRT enabled in the -rt guest and host
kernels) But the MSR values suggest that this is the NMI watchdog thing
again, trying to program MSR_ARCH_PERFMON_EVENTSEL0 and
MSR_ARCH_PERFMON_PERFCTR0, but this time Linux recovered due to a more
robust MSR handling. The guest disabled the NMI watchdog with:

 Testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!

the FC6 installer hang that i saw with earlier MMU-branch snapshots is
fixed.

	Ingo

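The speedup figures in this message can be recomputed from the two tables. The following is a minimal Python sketch that uses only the numbers quoted above; the dictionary and function names are illustrative and are not part of lmbench or KVM.

```python
# lat_ctx results copied from the two tables, keyed by task count:
# (native, kvm-r4204, kvm-r4232 with the mmu branch merged).
ZERO_FOOTPRINT = {
    2: (2.02, 180.91, 9.19),
    20: (4.04, 183.21, 10.01),
    50: (4.30, 185.95, 11.27),
}
ONE_MB_FOOTPRINT = {
    2: (150.5, 1032.97, 295.16),
    20: (216.6, 1020.34, 393.01),
    50: (218.1, 1015.58, 2335.99),
}


def speedups(table):
    """Return {tasks: r4204 latency / r4232 latency}.

    A value above 1 means r4232 is faster; below 1 means it regressed.
    """
    return {tasks: old / new for tasks, (_native, old, new) in table.items()}


if __name__ == "__main__":
    for name, table in (("-s 0", ZERO_FOOTPRINT), ("-s 1000", ONE_MB_FOOTPRINT)):
        for tasks, ratio in speedups(table).items():
            print(f"{name:8s} {tasks:3d} tasks: {ratio:.2f}x")
```

With the zero-footprint numbers the ratio stays between 16x and 20x, while the 50-task row of the 1MB table comes out below 1, which is the outlier flagged with [*].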
* Re: [RFT] mmu optimizations branch
@ 2007-01-02 16:32 Avi Kivity
From: Avi Kivity @ 2007-01-02 16:32 UTC
To: Ingo Molnar; +Cc: kvm-devel

Ingo Molnar wrote:
> * Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>> This is a request for testing of the mmu optimizations branch.
>>
>> Currently the shadow page tables are discarded every time the guest
>> performs a context switch. The mmu branch allows shadow page tables
>> to be cached across context switches, greatly reducing the cpu
>> utilization on multi-process workloads. It is now stable enough for
>> testing (though perhaps not for general use).
>
> i have tested it with Fedora Core 6 guest (32-bit, nopae), under a FC6
> host (32-bit CoreDuo2, nopae, enough RAM), and it's working great!
>
> Here are some quick numbers. Context-switch overhead with lmbench
> lat_ctx -s 0 [zero memory footprint]:
>
>  -------------------------------------------------
>   #tasks    native   kvm-r4204   kvm-r4232(mmu)
>  -------------------------------------------------
>        2:     2.02      180.91             9.19
>       20:     4.04      183.21            10.01
>       50:     4.30      185.95            11.27
>
> so here it's a /massive/, almost 20 times speedup!

Excellent. 10us is approximately the vmexit overhead on intel (we
regularly see 100-120k exits/sec), so it means a context switch is
exactly one exit. Hard to beat without nested page tables.

> Context-switch overhead with -s 1000 (1MB memory footprint):
>
>  -------------------------------------------------
>   #tasks    native   kvm-r4204   kvm-r4232(mmu)
>  -------------------------------------------------
>        2:    150.5     1032.97           295.16
>       20:    216.6     1020.34           393.01
>       50:    218.1     1015.58          2335.99[*]
>
> the speedup is nice here too. Note the outlier at 50 tasks: it's
> consistently reproducible. Could KVM be thrashing the pagetable cache
> due to some sort of internal limit? It's not due to guest size.

kvm now caches 256 page tables; so if every process uses 5 page tables,
plus some for the kernel, you'd get thrashing.

I don't understand why we're slower than native with 2 processes. Maybe
background work causes page tables to be evicted (see page replacement,
below).

I plan to add a tunable for the cache size, and autotuning later on.

The shadow page replacement algorithm can also use some work; currently
it's FIFO. It can be easily made to mimic the Linux active/inactive
lists to approximate LRU by examining the accessed bits on parent page
tables.

> The -mmu FC6 guest is visibly faster, so it's not just microbenchmarks
> that benefit from this change. KVM got /massively/ faster in every
> aspect, kudos Avi! (Note that r4204 already included the interactivity
> IRQ fixes so the improvements are i think purely due to pagetable
> caching speedups.)
>
> on a related note, i also got:
>
>  vmwrite error: reg 6802 value cfd3c4a4 (err 17408)

This is already fixed on the trunk (which now has mmu merged).

> and:
>
>  kvm: unhandled wrmsr: 0xc1
>  inject_general_protection: rip 0xc011f7f3
>  kvm: unhandled wrmsr: 0x186
>  inject_general_protection: rip 0xc011f7f3
>  kvm: unhandled wrmsr: 0xc1
>  inject_general_protection: rip 0xc011f7f3
>  kvm: unhandled wrmsr: 0x186
>  inject_general_protection: rip 0xc011f7f3
>
> unfortunately 0xc011f7f3 is in native_write_msr(), which isn't very
> helpful. (i have CONFIG_PARAVIRT enabled in the -rt guest and host
> kernels) But the MSR values suggest that this is the NMI watchdog thing
> again, trying to program MSR_ARCH_PERFMON_EVENTSEL0 and
> MSR_ARCH_PERFMON_PERFCTR0, but this time Linux recovered due to a more
> robust MSR handling. The guest disabled the NMI watchdog with:
>
>  Testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!
>
> the FC6 installer hang that i saw with earlier MMU-branch snapshots is
> fixed.

Good. Handling the counter well would have been very difficult,
especially if attempting to support cross migration.

--
error compiling committee.c: too many arguments to function

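The thrashing explanation in this reply (a 256-entry cache, 5 page tables per process, FIFO replacement) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the count of 10 kernel page tables, the strict round-robin schedule, and every name in the code are hypothetical and do not come from the KVM source.

```python
from collections import OrderedDict


def fifo_misses(num_tasks, tables_per_task=5, kernel_tables=10,
                capacity=256, rounds=20):
    """Count shadow-page-table misses after the first (warm-up) round.

    Each context switch touches the kernel tables plus the tables of the
    task being switched to. The cache is plain FIFO: a hit does not
    refresh an entry, and the oldest insertion is evicted first.
    kernel_tables=10 is an assumed figure, not one from the thread.
    """
    cache = OrderedDict()
    misses = 0
    for rnd in range(rounds):
        for task in range(num_tasks):
            touched = [("kernel", k) for k in range(kernel_tables)]
            touched += [(task, t) for t in range(tables_per_task)]
            for key in touched:
                if key in cache:
                    continue  # FIFO hit: eviction order is unchanged
                if rnd > 0:
                    misses += 1  # ignore compulsory misses in round 0
                if len(cache) >= capacity:
                    cache.popitem(last=False)  # evict the oldest entry
                cache[key] = True
    return misses


if __name__ == "__main__":
    for tasks in (2, 20, 50):
        print(tasks, "tasks:", fifo_misses(tasks), "misses after warm-up")
```

Under these assumptions the working set is 20 tables for 2 tasks and 110 for 20 tasks, both below 256, so nothing is evicted after warm-up. At 50 tasks it reaches 260 and the cache keeps evicting entries that are about to be reused, which matches the shape of the 50-task outlier.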
* Re: [RFT] mmu optimizations branch
@ 2007-01-02 16:49 Ingo Molnar
From: Ingo Molnar @ 2007-01-02 16:49 UTC
To: Avi Kivity; +Cc: kvm-devel

* Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:

> > on a related note, i also got:
> >
> >  vmwrite error: reg 6802 value cfd3c4a4 (err 17408)
>
> This is already fixed on the trunk (which now has mmu merged).

which version? I used the merged trunk version from today, revision
4232, as indicated in the table.

	Ingo

* Re: [RFT] mmu optimizations branch
@ 2007-01-02 17:07 Avi Kivity
From: Avi Kivity @ 2007-01-02 17:07 UTC
To: Ingo Molnar; +Cc: kvm-devel

Ingo Molnar wrote:
> * Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>>> on a related note, i also got:
>>>
>>>  vmwrite error: reg 6802 value cfd3c4a4 (err 17408)
>>
>> This is already fixed on the trunk (which now has mmu merged).
>
> which version? I used the merged trunk version from today, revision
> 4232, as indicated in the table.

That's supposed to contain it. Maybe it's another bug.

--
error compiling committee.c: too many arguments to function

* Re: [RFT] mmu optimizations branch
@ 2007-01-02 17:01 Michael Riepe
From: Michael Riepe @ 2007-01-02 17:01 UTC
To: Avi Kivity; +Cc: kvm-devel

Hi!

Avi Kivity wrote:

>> on a related note, i also got:
>>
>>  vmwrite error: reg 6802 value cfd3c4a4 (err 17408)
>
> This is already fixed on the trunk (which now has mmu merged).

Actually not. Now it reads:

 vmwrite error: reg 6802 value 6802 (err 17408)

(trunk revision 4236)

--
Michael "Tired" Riepe <michael-0QoEqw4nQxo@public.gmane.org>
X-Tired: Each morning I get up I die a little

* Re: [RFT] mmu optimizations branch
@ 2007-01-03 8:49 Avi Kivity
From: Avi Kivity @ 2007-01-03 8:49 UTC
To: Michael Riepe; +Cc: kvm-devel

Michael Riepe wrote:
> Hi!
>
> Avi Kivity wrote:
>
>>> on a related note, i also got:
>>>
>>>  vmwrite error: reg 6802 value cfd3c4a4 (err 17408)
>>
>> This is already fixed on the trunk (which now has mmu merged).
>
> Actually not. Now it reads:
>
>  vmwrite error: reg 6802 value 6802 (err 17408)

I added a dump_stack() to vmwrite so we can see where it comes from.

--
error compiling committee.c: too many arguments to function

* Re: [RFT] mmu optimizations branch
@ 2007-01-02 17:02 Ingo Molnar
From: Ingo Molnar @ 2007-01-02 17:02 UTC
To: Avi Kivity; +Cc: kvm-devel

* Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:

> > unfortunately 0xc011f7f3 is in native_write_msr(), which isn't very
> > helpful. (i have CONFIG_PARAVIRT enabled in the -rt guest and host
> > kernels) But the MSR values suggest that this is the NMI watchdog
> > thing again, trying to program MSR_ARCH_PERFMON_EVENTSEL0 and
> > MSR_ARCH_PERFMON_PERFCTR0, but this time Linux recovered due to a
> > more robust MSR handling. The guest disabled the NMI watchdog with:
> >
> >  Testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!
> >
> > the FC6 installer hang that i saw with earlier MMU-branch snapshots
> > is fixed.
>
> Good. Handling the counter well would have been very difficult,
> especially if attempting to support cross migration.

as far as the NMI watchdog goes, it's in fact better to keep it disabled
this way - it's not like the guest context could 'lock up' in an
undebuggable way. Any NMI activity in the guest context would be pretty
pointless.

I'd suggest simulating a non-working performance counter: i.e. don't
inject a #GPF when doing the wrmsr, and maybe preserve the values that
were written into the MSR register, but otherwise don't try to
implement the functionality by injecting NMIs. Worst-case this could
result in user-space debugging tools seeing non-working
performance-counter functionality.

	Ingo

* Re: [RFT] mmu optimizations branch
@ 2007-01-02 17:09 Avi Kivity
From: Avi Kivity @ 2007-01-02 17:09 UTC
To: Ingo Molnar; +Cc: kvm-devel

Ingo Molnar wrote:
> * Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>>> unfortunately 0xc011f7f3 is in native_write_msr(), which isn't very
>>> helpful. (i have CONFIG_PARAVIRT enabled in the -rt guest and host
>>> kernels) But the MSR values suggest that this is the NMI watchdog
>>> thing again, trying to program MSR_ARCH_PERFMON_EVENTSEL0 and
>>> MSR_ARCH_PERFMON_PERFCTR0, but this time Linux recovered due to a
>>> more robust MSR handling. The guest disabled the NMI watchdog with:
>>>
>>>  Testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!
>>>
>>> the FC6 installer hang that i saw with earlier MMU-branch snapshots
>>> is fixed.
>>
>> Good. Handling the counter well would have been very difficult,
>> especially if attempting to support cross migration.
>
> as far as the NMI watchdog goes, it's in fact better to keep it disabled
> this way - it's not like the guest context could 'lock up' in an
> undebuggable way. Any NMI activity in the guest context would be pretty
> pointless. I'd suggest simulating a non-working performance counter:
> i.e. don't inject a #GPF when doing the wrmsr, and maybe preserve the
> values that were written into the MSR register, but otherwise don't try
> to implement the functionality by injecting NMIs. Worst-case this could
> result in user-space debugging tools seeing non-working
> performance-counter functionality.

My worry is that when emulating an msr incorrectly, software can fail
without any clue as to what went wrong. I'll add the emulation as you
suggest, but with a printk() to warn that we're bending the rules.

--
error compiling committee.c: too many arguments to function

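The behaviour agreed on in this exchange (accept the write, keep the value, never count, and log a warning) can be modelled in a few lines. The sketch below is a toy Python model, not KVM code: the class, the exception, and the warning text are hypothetical, and only the two MSR numbers (0xc1 and 0x186) are taken from the log output quoted in the thread.

```python
# MSR numbers as they appear in the "unhandled wrmsr" log lines.
MSR_ARCH_PERFMON_PERFCTR0 = 0xC1
MSR_ARCH_PERFMON_EVENTSEL0 = 0x186
DUMMY_PERF_MSRS = {MSR_ARCH_PERFMON_PERFCTR0, MSR_ARCH_PERFMON_EVENTSEL0}


class GeneralProtectionFault(Exception):
    """Stands in for injecting #GP into the guest (hypothetical)."""


class ToyMsrEmulator:
    """Models a non-working performance counter.

    Writes to the two perfmon MSRs are stored and can be read back, but
    nothing ever counts and no NMI is generated. Each such write records
    a warning, playing the role of the printk() mentioned above.
    """

    def __init__(self):
        self.values = {}
        self.warnings = []

    def wrmsr(self, msr, value):
        if msr in DUMMY_PERF_MSRS:
            self.values[msr] = value
            self.warnings.append(
                f"wrmsr 0x{msr:x}: perf counter not emulated, value stored only"
            )
            return
        raise GeneralProtectionFault(f"unhandled wrmsr: 0x{msr:x}")

    def rdmsr(self, msr):
        if msr in DUMMY_PERF_MSRS:
            return self.values.get(msr, 0)
        raise GeneralProtectionFault(f"unhandled rdmsr: 0x{msr:x}")


if __name__ == "__main__":
    emu = ToyMsrEmulator()
    emu.wrmsr(MSR_ARCH_PERFMON_EVENTSEL0, 0x1234)
    print(hex(emu.rdmsr(MSR_ARCH_PERFMON_EVENTSEL0)), emu.warnings)
```

Keeping the warning separate from the stored value reflects the concern raised in the reply: the guest keeps running, while the host log still shows that the MSR is only partially emulated.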
* Re: [RFT] mmu optimizations branch
@ 2007-01-03 2:22 Ingo Molnar
From: Ingo Molnar @ 2007-01-03 2:22 UTC
To: Avi Kivity; +Cc: kvm-devel

* Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:

> > lat_ctx -s 0 [zero memory footprint]:
> >
> >  -------------------------------------------------
> >   #tasks    native   kvm-r4204   kvm-r4232(mmu)
> >  -------------------------------------------------
> >        2:     2.02      180.91             9.19
> >       20:     4.04      183.21            10.01
> >       50:     4.30      185.95            11.27
> >
> > so here it's a /massive/, almost 20 times speedup!
>
> Excellent. 10us is approximately the vmexit overhead on intel (we
> regularly see 100-120k exits/sec), so it means a context switch is
> exactly one exit. Hard to beat without nested page tables.

actually, the VM entry+exit cost on this CPU is around 3-4 microseconds,
so it's still 2 VM exits per context switch.

I debugged this a bit, and what happens is that when Linux does a
task-switch it does a cr3 load /and/ a write (look at __flush_tlb()) -
and both are causing a vm exit!

I have started paravirtualizing the Linux kernel for KVM. I have
eliminated the cr3 load from the Linux kernel via paravirtualization and
that way lat_ctx shows a ~5-6 usecs context-switch cost. That's pretty
good i think, compared to the 2-3 usecs of native. I'll send patches for
this tomorrow.

	Ingo

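The "2 VM exits per context switch" estimate follows from simple arithmetic on the figures in this message. The sketch below assumes that all of the extra latency over native is spent in VM entry+exit round trips, and it uses midpoints (3.5 us per round trip, 5.5 us after paravirtualization) for the quoted ranges; both are simplifications, and the function name is illustrative.

```python
def exits_per_switch(virt_us, native_us, exit_cost_us):
    """Estimate VM exits per context switch.

    Assumes the whole gap between virtualized and native latency is
    made up of entry+exit round trips costing exit_cost_us each.
    """
    return (virt_us - native_us) / exit_cost_us


if __name__ == "__main__":
    # 2-task row of the -s 0 table: 9.19 us under kvm-r4232, 2.02 us native.
    for cost in (3.0, 3.5, 4.0):
        estimate = exits_per_switch(9.19, 2.02, cost)
        print(f"exit cost {cost} us -> {estimate:.2f} exits")
    # After removing the cr3 read: about 5-6 us, midpoint 5.5 us.
    print(f"paravirt -> {exits_per_switch(5.5, 2.02, 3.5):.2f} exits")
```

Across the 3-4 us range the first estimate rounds to two exits, and the 5.5 us figure rounds to one, which is consistent with the claim that one exit per context switch was removed.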
* Re: [RFT] mmu optimizations branch
@ 2007-01-03 3:05 Anthony Liguori
From: Anthony Liguori @ 2007-01-03 3:05 UTC
To: Ingo Molnar; +Cc: kvm-devel

Ingo Molnar wrote:
> * Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>>> lat_ctx -s 0 [zero memory footprint]:
>>>
>>>  -------------------------------------------------
>>>   #tasks    native   kvm-r4204   kvm-r4232(mmu)
>>>  -------------------------------------------------
>>>        2:     2.02      180.91             9.19
>>>       20:     4.04      183.21            10.01
>>>       50:     4.30      185.95            11.27
>>>
>>> so here it's a /massive/, almost 20 times speedup!
>>
>> Excellent. 10us is approximately the vmexit overhead on intel (we
>> regularly see 100-120k exits/sec), so it means a context switch is
>> exactly one exit. Hard to beat without nested page tables.
>
> actually, the VM entry+exit cost on this CPU is around 3-4 microseconds,
> so it's still 2 VM exits per context switch.
>
> I debugged this a bit, and what happens is that when Linux does a
> task-switch it does a cr3 load /and/ a write (look at __flush_tlb()) -
> and both are causing a vm exit!
>
> I have started paravirtualizing the Linux kernel for KVM. I have
> eliminated the cr3 load from the Linux kernel via paravirtualization and
> that way lat_ctx shows a ~5-6 usecs context-switch cost. That's pretty
> good i think, compared to the 2-3 usecs of native. I'll send patches for
> this tomorrow.

This should be hookable via arch_{enter,leave}_cpu_mode() via
paravirt_ops. I was actually just looking at this myself (although I
was focusing on lazy mmu hooks). I've taken the route of using a VMI
ROM to actually hook it (instead of implementing a custom paravirt_ops
for KVM).

I can post the ROM if you're interested in this sort of approach. I
like using VMI as we can access most of the things hookable by
paravirt_ops without having to change a kernel binary.

Regards,

Anthony Liguori

* Re: [RFT] mmu optimizations branch
@ 2007-01-03 8:35 Avi Kivity
From: Avi Kivity @ 2007-01-03 8:35 UTC
To: Anthony Liguori; +Cc: kvm-devel

Anthony Liguori wrote:
>
> This should be hookable via arch_{enter,leave}_cpu_mode() via
> paravirt_ops. I was actually just looking at this myself (although I
> was focusing on lazy mmu hooks). I've taken the route of using a VMI
> ROM to actually hook it (instead of implementing a custom paravirt_ops
> for KVM).
>
> I can post the ROM if you're interested in this sort of approach. I
> like using VMI as we can access most of the things hookable by
> paravirt_ops without having to change a kernel binary.

VMI has the benefit of working for other OSes, not just Linux, if it
catches on.

Please post it; it's certainly interesting. My feeling is that if you
have a paravirt_ops capable guest, you might as well go all the way and
use lhype. I'm willing to be convinced otherwise though.

--
error compiling committee.c: too many arguments to function

* Re: [RFT] mmu optimizations branch
@ 2007-01-03 8:32 Avi Kivity
From: Avi Kivity @ 2007-01-03 8:32 UTC
To: Ingo Molnar; +Cc: kvm-devel

Ingo Molnar wrote:
> * Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>>> lat_ctx -s 0 [zero memory footprint]:
>>>
>>>  -------------------------------------------------
>>>   #tasks    native   kvm-r4204   kvm-r4232(mmu)
>>>  -------------------------------------------------
>>>        2:     2.02      180.91             9.19
>>>       20:     4.04      183.21            10.01
>>>       50:     4.30      185.95            11.27
>>>
>>> so here it's a /massive/, almost 20 times speedup!
>>
>> Excellent. 10us is approximately the vmexit overhead on intel (we
>> regularly see 100-120k exits/sec), so it means a context switch is
>> exactly one exit. Hard to beat without nested page tables.
>
> actually, the VM entry+exit cost on this CPU is around 3-4 microseconds,
> so it's still 2 VM exits per context switch.
>
> I debugged this a bit, and what happens is that when Linux does a
> task-switch it does a cr3 load /and/ a write (look at __flush_tlb()) -
> and both are causing a vm exit!
>
> I have started paravirtualizing the Linux kernel for KVM. I have
> eliminated the cr3 load from the Linux kernel via paravirtualization and
> that way lat_ctx shows a ~5-6 usecs context-switch cost. That's pretty
> good i think, compared to the 2-3 usecs of native. I'll send patches for
> this tomorrow.

Does this really call for paravirtualization? Caching the old cr3 value
will work for native as well.

--
error compiling committee.c: too many arguments to function

* Re: [RFT] mmu optimizations branch
@ 2007-01-03 10:02 Ingo Molnar
From: Ingo Molnar @ 2007-01-03 10:02 UTC
To: Avi Kivity; +Cc: kvm-devel

* Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:

> > actually, the VM entry+exit cost on this CPU is around 3-4
> > microseconds, so it's still 2 VM exits per context switch.
> >
> > I debugged this a bit, and what happens is that when Linux does a
> > task-switch it does a cr3 load /and/ a write (look at __flush_tlb())
> > - and both are causing a vm exit!
> >
> > I have started paravirtualizing the Linux kernel for KVM. I have
> > eliminated the cr3 load from the Linux kernel via paravirtualization
> > and that way lat_ctx shows a ~5-6 usecs context-switch cost. That's
> > pretty good i think, compared to the 2-3 usecs of native. I'll send
> > patches for this tomorrow.
>
> Does this really call for paravirtualization? Caching the old cr3 value
> will work for native as well.

it's already cached (and my changes use that cached value), but it (used
to be) faster/smaller to just move from cr3 than to dereference down to
the cached pgd pointer, because we used to inline those instructions
heavily. But i agree that in the current kernel it's probably worth
doing this for native too - i'll cook up a patch for upstream.

	Ingo

* Re: [RFT] mmu optimizations branch
@ 2007-01-03 10:16 Avi Kivity
From: Avi Kivity @ 2007-01-03 10:16 UTC
To: Ingo Molnar; +Cc: kvm-devel

Ingo Molnar wrote:
> * Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>>> actually, the VM entry+exit cost on this CPU is around 3-4
>>> microseconds, so it's still 2 VM exits per context switch.
>>>
>>> I debugged this a bit, and what happens is that when Linux does a
>>> task-switch it does a cr3 load /and/ a write (look at __flush_tlb())
>>> - and both are causing a vm exit!
>>>
>>> I have started paravirtualizing the Linux kernel for KVM. I have
>>> eliminated the cr3 load from the Linux kernel via paravirtualization
>>> and that way lat_ctx shows a ~5-6 usecs context-switch cost. That's
>>> pretty good i think, compared to the 2-3 usecs of native. I'll send
>>> patches for this tomorrow.
>>
>> Does this really call for paravirtualization? Caching the old cr3 value
>> will work for native as well.
>
> it's already cached (and my changes use that cached value), but it (used
> to be) faster/smaller to just move from cr3 than to dereference down to
> the cached pgd pointer, because we used to inline those instructions
> heavily. But i agree that in the current kernel it's probably worth
> doing this for native too - i'll cook up a patch for upstream.

Ok.

A question. What's __flush_tlb() doing in the context switch path?
Shouldn't it just load the new cr3 and be done with it?

--
error compiling committee.c: too many arguments to function

* Re: [RFT] mmu optimizations branch
@ 2007-01-03 11:30 Ingo Molnar
From: Ingo Molnar @ 2007-01-03 11:30 UTC
To: Avi Kivity; +Cc: kvm-devel

* Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:

> A question. What's __flush_tlb() doing in the context switch path?
> Shouldn't it just load the new cr3 and be done with it?

hmm .... it /does/ use load_cr3, which uses write_cr3(). Maybe i'm
wrong about this analysis and the 'speedup' was a cache alignment fluke
...

	Ingo