* [PATCH][RFC] Improve PIO latency
@ 2007-02-04 4:31 Anthony Liguori
[not found] ` <45C56188.2050408-NZpS4cJIG2HvQtjrzfazuQ@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Anthony Liguori @ 2007-02-04 4:31 UTC (permalink / raw)
To: kvm-devel
[-- Attachment #1: Type: text/plain, Size: 2616 bytes --]
The attached patch modifies libkvmctl to make the KVM_GET_REGS/KVM_SET_REGS
ioctls only when they are needed for PIO instructions. I was only able to do
this for out instructions because I didn't want to break the kernel ABI.
I think we should change the API though so we can do this with other
types of IO instructions. The timeline for a PIO instruction, with and
without this patch, looks something like this:
All times are in nanoseconds and are round trips from the guest's perspective
for an out instruction on an AMD X2 4200.
1015 - immediately after saving guest registers and restoring host registers
1991 - handled within the kernel in io_interception
2294 - libkvmctl returns immediately
2437 - w/ patch
3311 - w/o patch
The first data point is the best we could possibly do. The only work
being done after the VMRUN is a VMSAVE/VMLOAD, saving the guest
registers, and restoring the host registers. The VMSAVE/VMLOAD is
needed so that vmcb->save.eip can be updated.[1] I played around with
reducing the register saves but the differences weren't noticeable.
I suspect that more intelligent handling of things like FPU save/restore
should be able to reduce the second data point. This will also improve
some other exit paths (like shadow paging). We save/restore an awful
lot of state considering that we probably return back to the guest for
the vast majority of exits.
On this system, a sysenter based syscall is roughly 100 nsec so I'm
pretty happy with the third data point. This is just what one would expect.
With the attached patch, we reduce the time we spend in QEMU by
eliminating unnecessary ioctl()s. This cuts the total round-trip time by
roughly a quarter (from 3311 to 2437 nsec). We should be able to do this
for in{b,w,l}s too.
With these patches, we get an improvement in disk performance.
virtbench can measure disk latency and small (16 KB) disk read speeds.
According to virtbench:
w/o patch
80% of native - latency
61% of native - bandwidth
w/ patch
96% of native - latency
99% of native - bandwidth
Before getting too excited, we're still only 25% of native with dbench.
We see a small improvement with the patch (around 10%) but there's
an awful lot of variability.
There are quite a few things that should improve disk performance in
QEMU. Moving to an asynchronous IO model (QEMU CVS) and utilizing
linux-aio should make a pretty significant difference.
The last interesting bit is that the native latency for an IDE PIO
operation is around 750 nsec on this system. Theoretically, we should
be able to get pretty close to native IDE performance with emulation.
At least, that's the theory :-)
Regards,
Anthony Liguori
[-- Attachment #2: pio-performance.diff --]
[-- Type: text/x-patch, Size: 1359 bytes --]
Avoid making system calls for out{b,w,l} instructions since it is not necessary
to sync GP registers.
Signed-off-by: Anthony Liguori <anthony-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
diff -r 29119439ef33 user/kvmctl.c
--- a/user/kvmctl.c Sat Feb 03 18:50:24 2007 -0600
+++ b/user/kvmctl.c Sat Feb 03 19:07:24 2007 -0600
@@ -234,11 +234,14 @@ static int handle_io(kvm_context_t kvm,
int first_time = 1;
int delta;
struct translation_cache tr;
+ int _in = (run->io.direction == KVM_EXIT_IO_IN);
translation_cache_init(&tr);
- regs.vcpu = run->vcpu;
- ioctl(kvm->fd, KVM_GET_REGS, &regs);
+ if (run->io.string || _in) {
+ regs.vcpu = run->vcpu;
+ ioctl(kvm->fd, KVM_GET_REGS, &regs);
+ }
delta = run->io.string_down ? -run->io.size : run->io.size;
@@ -246,9 +249,12 @@ static int handle_io(kvm_context_t kvm,
void *value_addr;
int r;
- if (!run->io.string)
- value_addr = &regs.rax;
- else {
+ if (!run->io.string) {
+ if (_in)
+ value_addr = &regs.rax;
+ else
+ value_addr = &run->io.value;
+ } else {
r = translate(kvm, run->vcpu, &tr, run->io.address,
&value_addr);
if (r) {
@@ -326,7 +332,8 @@ static int handle_io(kvm_context_t kvm,
}
}
- ioctl(kvm->fd, KVM_SET_REGS, &regs);
+ if (run->io.string || _in)
+ ioctl(kvm->fd, KVM_SET_REGS, &regs);
run->emulated = 1;
return 0;
}
[-- Attachment #3: Type: text/plain, Size: 374 bytes --]
-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier.
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
[-- Attachment #4: Type: text/plain, Size: 186 bytes --]
_______________________________________________
kvm-devel mailing list
kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/kvm-devel
* Re: [PATCH][RFC] Improve PIO latency
[not found] ` <45C56188.2050408-NZpS4cJIG2HvQtjrzfazuQ@public.gmane.org>
@ 2007-02-06 10:46 ` Avi Kivity
[not found] ` <45C85C6D.5030501-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Avi Kivity @ 2007-02-06 10:46 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm-devel
Anthony Liguori wrote:
> The attached patch modifies libkvmctl to only make SET_REG/GET_REG
> ioctls when needed for PIO instructions. I was only able to do this
> for out instructions because I didn't want to break the kernel ABI.
>
> I think we should change the API though so we can do this with other
> types of IO instructions. Before this patch, the time line for a PIO
> instruction looks something like this:
>
> All times are in nanoseconds and are round trips from the guest's
> perspective for an out instruction on an AMD X2 4200.
>
> 1015 - immediately after saving guest registers and restoring host registers
> 1991 - handled within the kernel in io_interception
> 2294 - libkvmctl returns immediately
> 2437 - w/ patch
> 3311 - w/o patch
>
> The first data point is the best we could possibly do. The only work
> being done after the VMRUN is a VMSAVE/VMLOAD, saving the guest
> registers, and restoring the host registers. The VMSAVE/VMLOAD is
> needed so that vmcb->save.eip can be updated.[1] I played around
> reducing the register savings but the differences weren't noticeable.
>
> I suspect that more intelligent handling of things like FPU
> save/restore should be able to reduce the second data point. This
> will also improve some other exit paths (like shadow paging). We
> save/restore an awful lot of state considering that we probably return
> back to the guest for the vast majority of exits.
>
These are very encouraging numbers. I'd expected the vmexit to be more
expensive, and fpu save/restore to be less expensive. Since, as you
say, we can eliminate the fpu save/restore in many cases, we have a net
win :)
The planned userspace api changes will eliminate registers and virtual
addresses for pio. This will both improve performance and make the api
more architecture agnostic.
I'm applying the patch now, even though it will be obsoleted soon, as
it's always nice to have a performance improvement.
ps. that [1] is a dangling reference?
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [PATCH][RFC] Improve PIO latency
[not found] ` <45C85C6D.5030501-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-02-06 11:56 ` Dor Laor
[not found] ` <64F9B87B6B770947A9F8391472E032160A4D2550-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
2007-02-06 14:35 ` Anthony Liguori
1 sibling, 1 reply; 5+ messages in thread
From: Dor Laor @ 2007-02-06 11:56 UTC (permalink / raw)
To: Avi Kivity, Anthony Liguori; +Cc: kvm-devel
>Anthony Liguori wrote:
>> The attached patch modifies libkvmctl to only make SET_REG/GET_REG
>> ioctls when needed for PIO instructions. I was only able to do this
>> for out instructions because I didn't want to break the kernel ABI.
>>
>> I think we should change the API though so we can do this with other
>> types of IO instructions. Before this patch, the time line for a PIO
>> instruction looks something like this:
>>
>> All times are in nanoseconds and are round trips from the guest's
>> perspective for an out instruction on an AMD X2 4200.
>>
>> 1015 - immediately after saving guest registers and restoring host registers
>> 1991 - handled within the kernel in io_interception
>> 2294 - libkvmctl returns immediately
>> 2437 - w/ patch
>> 3311 - w/o patch
>>
>> The first data point is the best we could possibly do. The only work
>> being done after the VMRUN is a VMSAVE/VMLOAD, saving the guest
>> registers, and restoring the host registers. The VMSAVE/VMLOAD is
>> needed so that vmcb->save.eip can be updated.[1] I played around
>> reducing the register savings but the differences weren't noticeable.
>>
>> I suspect that more intelligent handling of things like FPU
>> save/restore should be able to reduce the second data point. This
>> will also improve some other exit paths (like shadow paging). We
>> save/restore an awful lot of state considering that we probably
>> return back to the guest for the vast majority of exits.
>>
>
>These are very encouraging numbers. I'd expected the vmexit to be more
>expensive, and fpu save/restore to be less expensive. Since, as you
I tried to see the performance gain using dd if=/file iflag=direct ...
but didn't get any visible gain. So maybe the vmexit/disk latency is
overshadowing the performance gain?
Anthony, can you please send the virtbench patch?
>say, we can eliminate the fpu save/restore in many cases, we have a net
>win :)
>
>The planned userspace api changes will eliminate registers and virtual
>addresses for pio. This will both improve performance and make the api
>more architecture agnostic.
>
>I'm applying the patch now, even though it will be obsoleted soon, as
>it's always nice to have a performance improvement.
>
>
>ps. that [1] is a dangling reference?
>
>--
>Do not meddle in the internals of kernels, for they are subtle and
>quick to panic.
>
>
* Re: [PATCH][RFC] Improve PIO latency
[not found] ` <45C85C6D.5030501-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-02-06 11:56 ` Dor Laor
@ 2007-02-06 14:35 ` Anthony Liguori
1 sibling, 0 replies; 5+ messages in thread
From: Anthony Liguori @ 2007-02-06 14:35 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel
Avi Kivity wrote:
> Anthony Liguori wrote:
>> The attached patch modifies libkvmctl to only make SET_REG/GET_REG
>> ioctls when needed for PIO instructions. I was only able to do this
>> for out instructions because I didn't want to break the kernel ABI.
>>
>> I think we should change the API though so we can do this with other
>> types of IO instructions. Before this patch, the time line for a PIO
>> instruction looks something like this:
>>
>> All times are in nanoseconds and are round trips from the guest's
>> perspective for an out instruction on an AMD X2 4200.
>>
>> 1015 - immediately after saving guest registers and restoring host registers
>> 1991 - handled within the kernel in io_interception
>> 2294 - libkvmctl returns immediately
>> 2437 - w/ patch
>> 3311 - w/o patch
>>
>> The first data point is the best we could possibly do. The only work
>> being done after the VMRUN is a VMSAVE/VMLOAD, saving the guest
>> registers, and restoring the host registers. The VMSAVE/VMLOAD is
>> needed so that vmcb->save.eip can be updated.[1] I played around
>> reducing the register savings but the differences weren't noticeable.
>>
>> I suspect that more intelligent handling of things like FPU
>> save/restore should be able to reduce the second data point. This
>> will also improve some other exit paths (like shadow paging). We
>> save/restore an awful lot of state considering that we probably
>> return back to the guest for the vast majority of exits.
>>
>
> These are very encouraging numbers. I'd expected the vmexit to be
> more expensive, and fpu save/restore to be less expensive. Since, as
> you say, we can eliminate the fpu save/restore in many cases, we have
> a net win :)
>
> The planned userspace api changes will eliminate registers and virtual
> addresses for pio. This will both improve performance and make the
> api more architecture agnostic.
Cool!
> I'm applying the patch now, even though it will be obsoleted soon, as
> it's always nice to have a performance improvement.
>
>
> ps. that [1] is a dangling reference?
Sorry, it was supposed to be something like:
[1] I have no idea why there are VM{SAVE,LOAD} instructions in the
first place. AFAIK, there is nothing useful you can do after a VMRUN
without doing a VMSAVE since you cannot re-enter the VM without updating
the VMCB.
Regards,
Anthony Liguori
* Re: [PATCH][RFC] Improve PIO latency
[not found] ` <64F9B87B6B770947A9F8391472E032160A4D2550-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
@ 2007-02-06 14:38 ` Anthony Liguori
0 siblings, 0 replies; 5+ messages in thread
From: Anthony Liguori @ 2007-02-06 14:38 UTC (permalink / raw)
To: Dor Laor; +Cc: kvm-devel
Dor Laor wrote:
>> Anthony Liguori wrote:
>>
>>> The attached patch modifies libkvmctl to only make SET_REG/GET_REG
>>> ioctls when needed for PIO instructions. I was only able to do this
>>> for out instructions because I didn't want to break the kernel ABI.
>>>
>>> I think we should change the API though so we can do this with other
>>> types of IO instructions. Before this patch, the time line for a PIO
>>> instruction looks something like this:
>>>
>>> All times are in nanoseconds and are round trips from the guest's
>>> perspective for an out instruction on an AMD X2 4200.
>>>
>>> 1015 - immediately after saving guest registers and restoring host registers
>>> 1991 - handled within the kernel in io_interception
>>> 2294 - libkvmctl returns immediately
>>> 2437 - w/ patch
>>> 3311 - w/o patch
>>>
>>> The first data point is the best we could possibly do. The only work
>>> being done after the VMRUN is a VMSAVE/VMLOAD, saving the guest
>>> registers, and restoring the host registers. The VMSAVE/VMLOAD is
>>> needed so that vmcb->save.eip can be updated.[1] I played around
>>> reducing the register savings but the differences weren't noticeable.
>>>
>>> I suspect that more intelligent handling of things like FPU
>>> save/restore should be able to reduce the second data point. This
>>> will also improve some other exit paths (like shadow paging). We
>>> save/restore an awful lot of state considering that we probably
>>> return back to the guest for the vast majority of exits.
>>>
>>>
>> These are very encouraging numbers. I'd expected the vmexit to be more
>> expensive, and fpu save/restore to be less expensive. Since, as you
>>
>
>
> I tried to see the performance gain using dd if=/file iflag=direct ...
>
Have you compared against the host? I used dbench.
> And didn't get any visible gain. So maybe all the vmexit/disk latency
> are shadowing the performance gain?
> Anthony can you please send the virt bench patch?
>
You don't need a patched version of virtbench. Just use:
virtbench local pio
http://ozlabs.org/~rusty/virtbench
Regards,
Anthony Liguori
>> say, we can eliminate the fpu save/restore in many cases, we have a net
>> win :)
>>
>> The planned userspace api changes will eliminate registers and virtual
>> addresses for pio. This will both improve performance and make the api
>> more architecture agnostic.
>>
>> I'm applying the patch now, even though it will be obsoleted soon, as
>> it's always nice to have a performance improvement.
>>
>>
>> ps. that [1] is a dangling reference?
>>
>> --
>> Do not meddle in the internals of kernels, for they are subtle and
>> quick to panic.
>>
>>